Encyclopedia of Optimization 


With 613 Figures and 247 Tables 


ABS Algorithms for Linear Equations and Linear Least Squares 




EMILIO SPEDICATO 
Department Math., University Bergamo, 
Bergamo, Italy 


MSC2000: 65K05, 65K10 


Article Outline 


Keywords 
Synonyms 
The Scaled ABS Class: General Properties 
Subclasses of the ABS Class 
The Implicit LU Algorithm and the Huang Algorithm
Other ABS Linear Solvers 
ABS Methods for Linear Least Squares 
See also 
References 


Keywords 


Linear algebraic equations; Linear least squares; ABS 
methods; Abaffian matrices; Huang algorithm; Implicit 
LU algorithm; Implicit LX algorithm 


Synonyms 
The Scaled ABS Class: General Properties


ABS methods were introduced by [1], in a paper dealing originally only with solving linear equations via what is now called the basic or unscaled ABS class. The basic ABS class was later generalized to the so-called scaled ABS class and subsequently applied to linear least squares, nonlinear equations and optimization problems, see [2]. Preliminary work has also been initiated concerning Diophantine equations, with possible extensions to combinatorial optimization, and the eigenvalue problem. There are presently (1998) over 350 papers in the ABS field, see [11]. In this contribution we review the basic properties and results of ABS methods for solving linear determined or underdetermined systems and overdetermined linear systems in the least squares sense.

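Let us consider the linear determined or underdetermined system, where rank(A) is arbitrary,

$$Ax = b, \qquad x \in \mathbb{R}^n,\ b \in \mathbb{R}^m,\ m \le n, \tag{1}$$

or

$$a_i^T x - b_i = 0, \qquad i = 1, \ldots, m, \tag{2}$$

where

$$A = \begin{pmatrix} a_1^T \\ \vdots \\ a_m^T \end{pmatrix}. \tag{3}$$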


The steps of the scaled ABS class algorithms are as follows:

A) Let $x_1 \in \mathbb{R}^n$ be arbitrary, $H_1 \in \mathbb{R}^{n \times n}$ be nonsingular arbitrary, $v_1$ be an arbitrary nonzero vector in $\mathbb{R}^m$; set $i = 1$.

B) Compute the residual $r_i = Ax_i - b$. If $r_i = 0$, stop ($x_i$ solves the problem); else compute $s_i = H_i A^T v_i$. If $s_i \ne 0$, then go to C). If $s_i = 0$ and $\tau = v_i^T r_i = 0$, then set $x_{i+1} = x_i$, $H_{i+1} = H_i$ and go to F), else stop (the system has no solution).

C) Compute the search vector $p_i$ by
$$p_i = H_i^T z_i, \tag{4}$$
where $z_i \in \mathbb{R}^n$ is arbitrary save for the condition
$$v_i^T A H_i^T z_i \ne 0. \tag{5}$$

D) Update the estimate of the solution by
$$x_{i+1} = x_i - \alpha_i p_i, \tag{6}$$
where the stepsize $\alpha_i$ is given by
$$\alpha_i = \frac{v_i^T r_i}{v_i^T A p_i}. \tag{7}$$

E) Update the matrix $H_i$ by
$$H_{i+1} = H_i - \frac{H_i A^T v_i\, w_i^T H_i}{w_i^T H_i A^T v_i}, \tag{8}$$
where $w_i \in \mathbb{R}^n$ is arbitrary save for the condition
$$w_i^T H_i A^T v_i \ne 0. \tag{9}$$

F) If $i = m$, then stop ($x_{m+1}$ solves the system), else define $v_{i+1}$ as an arbitrary vector in $\mathbb{R}^m$ but linearly independent from $v_1, \ldots, v_i$, increment $i$ by one and go to B).

The matrices $H_i$ appearing in step E) are generalizations of (oblique) projection matrices. They probably first appeared in [16]. They have been named Abaffians since the first international conference on ABS methods (Luoyang, China, 1991), and this name will be used here.

The above recursion defines a class of algorithms, each particular method being determined by the choice of the parameters $H_1, v_i, z_i, w_i$. The basic ABS class is obtained by taking $v_i = e_i$, $e_i$ being the $i$th unit vector in $\mathbb{R}^m$. The parameters $w_i$, $z_i$, $H_1$ were introduced respectively by J. Abaffy, C.G. Broyden and E. Spedicato, whose initials give the class its name. It can be shown that the scaled ABS class is a complete realization of the so-called Petrov-Galerkin iteration for solving a linear system (although the principle can be applied to more general problems), where the iteration has the form $x_{i+1} = x_i - \alpha_i p_i$ with $\alpha_i$, $p_i$ chosen so that the orthogonality relation $r_{i+1}^T v_j = 0$, $j = 1, \ldots, i$, holds, the vectors $v_j$ being arbitrary linearly independent vectors. It appears that all deterministic algorithms in the literature having finite termination on a linear system are members of the scaled ABS class. This statement has recently been shown to be true also for the quasi-Newton methods, which are known to terminate, under some conditions, in at most $2n$ steps: the iterate of index $2i - 1$ generated by Broyden's iteration corresponds to the $i$th iterate of a certain algorithm in the ABS class.
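As an illustration, steps A) to F) can be transcribed into a short dense Python sketch. It is not a reference implementation: the function and argument names (`scaled_abs`, `choose_v`, `choose_z`, `choose_w`) are hypothetical, $x_1 = 0$ and $H_1 = I$ are fixed for simplicity, the parameters $v_i, z_i, w_i$ are supplied as callables, and the exact zero tests of step B) are replaced by an absolute tolerance. The two parameter sets shown are the implicit LU and the Huang choices described later in this article.

```python
# Minimal dense sketch of the scaled ABS class, steps A) to F).
# The names scaled_abs, choose_v, choose_z, choose_w are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, x):
    return [dot(row, x) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def unit(k, size):
    return [1.0 if j == k else 0.0 for j in range(size)]

def scaled_abs(A, b, choose_v, choose_z, choose_w, tol=1e-12):
    m, n = len(A), len(A[0])
    x = [0.0] * n                                   # A) x_1 = 0 (any x_1 is allowed)
    H = [unit(j, n) for j in range(n)]              # A) H_1 = I (any nonsingular H_1)
    At = transpose(A)
    for i in range(m):
        v = choose_v(i, m)
        r = [ax - bi for ax, bi in zip(matvec(A, x), b)]    # B) r_i = A x_i - b
        if max(abs(t) for t in r) <= tol:
            break                                   # B) x_i solves the system
        s = matvec(H, matvec(At, v))                # B) s_i = H_i A^T v_i
        tau = dot(v, r)
        if max(abs(t) for t in s) <= tol:
            if abs(tau) <= tol:
                continue                            # B) keep x_i, H_i and go to F)
            raise ValueError("the system has no solution")
        Ht = transpose(H)
        p = matvec(Ht, choose_z(i, A))              # C) p_i = H_i^T z_i, eq. (4)
        alpha = tau / dot(v, matvec(A, p))          # D) stepsize, eq. (7)
        x = [xj - alpha * pj for xj, pj in zip(x, p)]       # D) eq. (6)
        w = choose_w(i, A)
        hw = matvec(Ht, w)                          # H_i^T w_i
        denom = dot(w, s)                           # w_i^T H_i A^T v_i, condition (9)
        H = [[H[j][k] - s[j] * hw[k] / denom for k in range(n)]
             for j in range(n)]                     # E) eq. (8)
    return x, H                                     # F) stop after at most m steps

# Implicit LU: v_i = z_i = w_i = e_i.  Huang: v_i = e_i, z_i = w_i = a_i.
IMPLICIT_LU = dict(choose_v=lambda i, m: unit(i, m),
                   choose_z=lambda i, A: unit(i, len(A[0])),
                   choose_w=lambda i, A: unit(i, len(A[0])))
HUANG = dict(choose_v=lambda i, m: unit(i, m),
             choose_z=lambda i, A: list(A[i]),
             choose_w=lambda i, A: list(A[i]))

A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
b = [7.0, 4.0, 7.0]                                 # exact solution (1, 1, 1)
x_lu, H_lu = scaled_abs(A, b, **IMPLICIT_LU)
x_hu, H_hu = scaled_abs(A, b, **HUANG)
```

On this nonsingular $3 \times 3$ system both parameter sets return $x \approx (1, 1, 1)$ and a final Abaffian $H_4 \approx 0$, consistent with the property $H_{n+1} A^T V_n = 0$ stated below, since $A^T V_n$ is nonsingular here.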

Referring to [2] for proofs, we give some of the general properties of methods of the scaled ABS class, assuming, for simplicity, that $A$ has full rank.

• Define $V_i = (v_1, \ldots, v_i)$, $W_i = (w_1, \ldots, w_i)$. Then $H_{i+1} A^T V_i = 0$ and $H_{i+1}^T W_i = 0$, meaning that the vectors $A^T v_j$ and $w_j$, $j = 1, \ldots, i$, span the null spaces of $H_{i+1}$ and of its transpose, respectively.

• The vectors $H_i A^T v_i$ and $H_i^T w_i$ are nonzero if and only if $A^T v_i$ and $w_i$ are linearly independent from $A^T v_1, \ldots, A^T v_{i-1}$ and from $w_1, \ldots, w_{i-1}$, respectively.

• Define $P_i = (p_1, \ldots, p_i)$. Then the implicit factorization $V_i^T A P_i = L_i$ holds, where $L_i$ is nonsingular lower triangular. From this relation, if $m = n$, one obtains the following semi-explicit factorization of the inverse, with $P = P_n$, $V = V_n$, $L = L_n$:
$$A^{-1} = P L^{-1} V^T. \tag{10}$$
For several choices of the matrix $V$ the matrix $L$ is diagonal, hence formula (10) gives a fully explicit factorization of the inverse as a byproduct of the ABS solution of a linear system, a property that does not hold for the classical solvers. It can also be shown that all possible factorizations of the form (10) can be obtained by proper parameter choices in the scaled ABS class, another completeness result.

• Define $S_i$ and $R_i$ by $S_i = (s_1, \ldots, s_i)$, $R_i = (r_1, \ldots, r_i)$, where $s_j = H_j A^T v_j$ and $r_j = H_j^T w_j / (w_j^T H_j A^T v_j)$ (here $r_j$ does not denote the residual). Then the Abaffian can be written in the form $H_{i+1} = H_1 - S_i R_i^T$, and the vectors $s_j$, $r_j$ can be built via Gram-Schmidt type iterations involving the previous vectors (the search vector $p_i$ can be built in a similar way). This representation of the Abaffian in terms of $2i$ vectors is computationally convenient when the number of equations is much smaller than the number of variables. Notice that there is also a representation in terms of $n - i$ vectors.
• A compact formula of the Abaffian in terms of the parameter matrices is the following:
$$H_{i+1} = H_1 - H_1 A^T V_i \left(W_i^T H_1 A^T V_i\right)^{-1} W_i^T H_1. \tag{11}$$
Letting $V = V_m$, $W = W_m$, one can show that the parameter matrices $H_1$, $V$, $W$ are admissible (i.e. are such that condition (9) is satisfied) if and only if the matrix $Q = V^T A H_1^T W$ is strongly nonsingular (i.e. is LU factorizable). Notice that this condition can always be satisfied by suitable exchanges of the columns of $V$ or $W$, equivalent to a row or a column pivoting on the matrix $Q$. If $Q$ is strongly nonsingular and we take, as is done in all algorithms considered so far, $z_i = w_i$, then condition (5) is also satisfied.
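The implicit factorization and formula (10) can be checked numerically. The sketch below uses a hypothetical helper, `abs_search_vectors`, that runs the recursion with $H_1 = I$, $v_i = e_i$ (so that $V = I$) and the Huang choices $z_i = w_i = a_i$, collects the search vectors, and forms $L = V^T A P = AP$; it then applies $A^{-1} = P L^{-1} V^T$ to $b$ via forward substitution. It only illustrates the identity and is not meant as an efficient way to invert a matrix.

```python
# Check of the implicit factorization V^T A P = L (lower triangular) and of
# A^{-1} = P L^{-1} V^T, for the basic class (V = I) with Huang parameters.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def abs_search_vectors(A):
    """Return p_1..p_n for H_1 = I, v_i = e_i, z_i = w_i = a_i (hypothetical helper)."""
    n = len(A)
    H = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
    P = []
    for a in A:
        s = [dot(row, a) for row in H]                        # s_i = H_i a_i
        hz = [sum(H[j][k] * a[j] for j in range(n)) for k in range(n)]  # H_i^T a_i
        P.append(hz)                                          # p_i = H_i^T z_i, z_i = a_i
        denom = dot(a, s)                                     # w_i^T H_i a_i, w_i = a_i
        H = [[H[j][k] - s[j] * hz[k] / denom for k in range(n)] for j in range(n)]
    return P                                                  # P[k] is the column p_{k+1}

A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
b = [7.0, 4.0, 7.0]
P = abs_search_vectors(A)
n = len(A)
L = [[dot(A[j], P[k]) for k in range(n)] for j in range(n)]   # L = V^T A P with V = I

# x = P L^{-1} V^T b: forward substitution L y = b, then x = sum_k y_k p_k.
y = []
for j in range(n):
    y.append((b[j] - sum(L[j][k] * y[k] for k in range(j))) / L[j][j])
x = [sum(y[k] * P[k][c] for k in range(n)) for c in range(n)]
```

The entries of $L$ above the diagonal vanish up to rounding, because $v_j^T A p_k = (H_k A^T v_j)^T z_k = 0$ for $j < k$, and $x$ reproduces the solution $(1, 1, 1)$.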
It can be shown that the scaled ABS class corresponds to applying (implicitly) the unscaled ABS algorithm to the scaled (or preconditioned) system $V^T A x = V^T b$, where $V$ is an arbitrary nonsingular matrix of order $m$. Therefore the scaled ABS class is also complete with respect to all possible left preconditioning matrices, which in the ABS context are defined implicitly and dynamically (only the $i$th column of $V$ is needed at the $i$th iteration, and it can also be a function of the previous column choices).


Subclasses of the ABS Class 


In [1], nine subclasses of the scaled ABS class are considered. Here we quote three important subclasses.

• The conjugate direction subclass. This class is well defined under the condition (sufficient but not necessary) that $A$ is symmetric and positive definite. It contains the implicit Choleski algorithm, the Hestenes-Stiefel and the Lanczos algorithms. This class generates all possible algorithms whose search directions are $A$-conjugate. The vector $x_{i+1}$ minimizes the energy or $A$-weighted Euclidean norm of the error over $x_1 + \mathrm{Span}(p_1, \ldots, p_i)$. If $x_1 = 0$, then the solution is approached monotonically from below in the energy norm.

• The orthogonally scaled subclass. This class is well defined if $A$ has full column rank and remains well defined even if $m$ is greater than $n$. It contains the ABS formulation of the QR algorithm (the so-called implicit QR algorithm), of the GMRES and of the conjugate residual algorithms. The scaling vectors are orthogonal and the search vectors are $A^T A$-conjugate. The vector $x_{i+1}$ minimizes the Euclidean norm of the residual over $x_1 + \mathrm{Span}(p_1, \ldots, p_i)$. In general, the methods in this class can be applied to overdetermined systems to obtain the solution in the least squares sense.

• The optimally scaled subclass. This class is obtained by the choice $v_i = A^{-T} p_i$. The inverse of $A^T$ disappears in the actual formulas if we make the change of variables $z_i = A^T u_i$, $u_i$ being now the parameter that defines the search vector. For $u_i = e_i$ the Huang method is obtained, and for $u_i = r_i$ a method equivalent to Craig's conjugate gradient type algorithm. From the general implicit factorization relation one obtains $P^T P = D$ or $V^T A A^T V = D$, with $D$ diagonal, a relation which was shown in [5] to characterize the optimal choice of the parameters in the general Petrov-Galerkin process in terms of minimizing the effect of a single error in $x_i$ on the final computed solution. Such a property is therefore satisfied by the Huang (and the Craig) algorithm, but not, for instance, by the implicit LU or the implicit QR algorithms. A. Galantai [8] has shown that the condition characterizing the optimal choice of the scaling parameters in terms of minimizing the final residual Euclidean norm is $V^T V = D$, a condition satisfied by the implicit QR algorithm, the GMRES method, the implicit LU algorithm and again by the Huang algorithm, which therefore satisfies both conditions. The methods in the optimally scaled subclass have the property that $x_{i+1}$ minimizes the Euclidean norm of the error over $x_1 + \mathrm{Span}(p_1, \ldots, p_i)$.


The Implicit LU Algorithm and the Huang Algorithm


Specific algorithms of the scaled ABS class are obtained by choosing the available parameters. The implicit LU algorithm is given by the choices $H_1 = I$, $z_i = w_i = v_i = e_i$. We quote the following properties of the implicit LU algorithm.

a) The algorithm is well defined if and only if $A$ is regular (i.e. all leading principal submatrices are nonsingular). Otherwise column pivoting has to be performed (or, if $m = n$, equation pivoting).

b) The Abaffian $H_{i+1}$ has the following structure, with $K_i \in \mathbb{R}^{(n-i) \times i}$:
$$H_{i+1} = \begin{pmatrix} 0 & 0 \\ K_i & I_{n-i} \end{pmatrix}. \tag{12}$$

c) Only the first $i$ components of $p_i$ can be nonzero, and the $i$th component is one. Hence the matrix $P_i$ is unit upper triangular, so that the implicit factorization $A = L P^{-1}$ is of the LU type, with units on the diagonal, justifying the name.

d) Only $K_i$ has to be updated. The algorithm requires $nm^2 - 2m^3/3$ multiplications plus lower order terms, hence, for $m = n$, $n^3/3$ multiplications plus lower order terms. This is the same overhead required by the classical LU factorization or Gaussian elimination (which are two essentially equivalent processes).

e) The main storage requirement is the storage of $K_i$, whose maximum size is $n^2/4$. This is half the storage needed by Gaussian elimination and a quarter of the storage needed by the LU factorization algorithm (assuming that $A$ is not overwritten). Hence the implicit LU algorithm is computationally better than the classical Gaussian elimination or LU algorithm, having the same overhead but a lower memory cost.
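Properties b) to e) suggest an implementation that stores only $K_i$. The following sketch is one way to do this for $m = n$ without pivoting; the function name `implicit_lu_solve` and the list-of-rows layout of $K_i$ are assumptions made for illustration. The search vector is read off the first stored row, as in property c), and each update touches only the $(n-i) \times i$ block of (12); the search vectors are returned only so that their unit upper triangular pattern can be inspected.

```python
# Implicit LU algorithm (H_1 = I, v_i = z_i = w_i = e_i), storing only K_i.
# With a zero-based step index i, K holds the n - i nontrivial rows of H_i,
# each of length i; the identity block I_{n-i} of (12) is never stored.

def implicit_lu_solve(A, b, tol=1e-14):
    n = len(A)
    x = [0.0] * n
    K = [[] for _ in range(n)]                  # H_1 = I: K is empty
    P = []                                      # search vectors, kept for inspection
    for i in range(n):
        a = A[i]
        p = K[0] + [1.0] + [0.0] * (n - i - 1)  # p_i = row i of H_i, property c)
        # nontrivial part of s_i = H_i a_i (components i, ..., n-1)
        s = [sum(kc * ac for kc, ac in zip(row, a[:i])) + a[i + r]
             for r, row in enumerate(K)]
        pivot = s[0]                            # equals a_i^T p_i = w_i^T H_i a_i
        if abs(pivot) <= tol:
            raise ValueError("A is not regular: pivoting is required")
        alpha = (sum(ac * xc for ac, xc in zip(a, x)) - b[i]) / pivot
        x = [xj - alpha * pj for xj, pj in zip(x, p)]
        P.append(p)
        # H_{i+1} = H_i - s_i p_i^T / pivot: row i vanishes, the others gain a column
        K = [[K[r][c] - (s[r] / pivot) * K[0][c] for c in range(i)] + [-s[r] / pivot]
             for r in range(1, len(K))]
    return x, P

A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
x, P = implicit_lu_solve(A, [7.0, 4.0, 7.0])    # exact solution (1, 1, 1)
```

A matrix with a zero leading entry, such as $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, is not regular, and the sketch then raises an error where a practical code would pivot.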

The implicit LU algorithm, implemented in the case $m = n$ with row pivoting, has been shown in experiments of M. Bertocchi and Spedicato [3] to be numerically stable and, in experiments of E. Bodon [4] on the vector processor Alliant FX 80 with 8 processors, to be about twice as fast as the LAPACK implementation of the classical LU algorithm.

The Huang algorithm is obtained by the parameter choices $H_1 = I$, $z_i = w_i = a_i$, $v_i = e_i$. A mathematically equivalent, but numerically more stable, formulation of this algorithm is the so-called modified Huang algorithm, where the search vectors and the Abaffians are given by the formulas $p_i = H_i(H_i a_i)$ and $H_{i+1} = H_i - p_i p_i^T / (p_i^T p_i)$. Some properties of this algorithm follow.

• The search vectors are orthogonal and are the same vectors obtained by applying the classical Gram-Schmidt orthogonalization procedure to the rows of $A$. The modified Huang algorithm is related, but not numerically identical, to the Daniel-Gragg-Kaufman-Stewart reorthogonalized Gram-Schmidt algorithm [6].

• If $x_1$ is the zero vector, then the vector $x_{i+1}$ is the solution with least Euclidean norm of the first $i$ equations, and the solution $x^*$ of least Euclidean norm of the whole system is approached monotonically and from below by the sequence $x_i$. L. Zhang [17] has shown that the Huang algorithm can be applied, via the Goldfarb-Idnani active set strategy [9], to systems of linear inequalities. In a finite number of steps the process either finds the solution with least Euclidean norm or determines that the system has no solution.

• While the error growth in the Huang algorithm is governed by the square of the number $\eta_i = \|a_i\| / \|H_i a_i\|$, which is certainly large for some $i$ if $A$ is ill conditioned, the error growth depends only on $\eta_i$ if $p_i$ or $H_i$ are defined as in the modified Huang algorithm and, to first order, there is no error growth for the modified Huang algorithm.

• Numerical experiments, see [15], have shown that the modified Huang algorithm is very stable, usually giving better accuracy in the computed solution than both the implicit LU algorithm and the classical LU factorization method.
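The modified Huang formulas can be sketched as follows, assuming $x_1 = 0$ so that the result is the least Euclidean norm solution of a consistent system. The function name `modified_huang` and the relative tolerance used to detect $H_i a_i = 0$ (a dependent equation, handled as in step B)) are illustrative choices, not part of the original description. Because $H_i$ is symmetric here, both the reprojection $p_i = H_i(H_i a_i)$ and the rank-one update can be written with row products.

```python
# Modified Huang algorithm: H_1 = I, v_i = e_i, p_i = H_i (H_i a_i),
# H_{i+1} = H_i - p_i p_i^T / (p_i^T p_i). From x_1 = 0 the result is the
# least Euclidean norm solution of a consistent system.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def modified_huang(A, b, tol=1e-10):
    n = len(A[0])
    x = [0.0] * n
    H = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]   # H_1 = I
    for a, bi in zip(A, b):
        Ha = [dot(row, a) for row in H]                # s_i = H_i a_i
        tau = dot(a, x) - bi
        if dot(Ha, Ha) <= (tol ** 2) * dot(a, a):      # s_i = 0: dependent equation
            if abs(tau) > tol * (1.0 + abs(bi)):
                raise ValueError("the system has no solution")
            continue
        p = [dot(row, Ha) for row in H]                # reprojection: p_i = H_i (H_i a_i)
        alpha = tau / dot(a, p)
        x = [xj - alpha * pj for xj, pj in zip(x, p)]
        pp = dot(p, p)
        H = [[H[j][k] - p[j] * p[k] / pp for k in range(n)] for j in range(n)]
    return x

x_min = modified_huang([[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]], [1.0, 0.0])
x_dep = modified_huang([[1.0, 1.0], [2.0, 2.0]], [2.0, 4.0])   # second row is dependent
```

For the first example the exact least-norm solution is $(4/3, 1/3, -2/3)$; in the second example the second equation is a multiple of the first, so it is skipped and the result is $(1, 1)$.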

The implicit LX algorithm is defined by the choices $H_1 = I$, $v_i = e_i$, $z_i = w_i = e_{k_i}$, where $k_i$ is an integer, $1 \le k_i \le n$, such that
$$e_{k_i}^T H_i a_i \ne 0. \tag{13}$$
Notice that, by a general property of the ABS class, for $A$ with full rank there is at least one index $k_i$ such that (13) is satisfied. For stability reasons it may be recommended to select $k_i$ such that $|e_{k_i}^T H_i a_i|$ is maximized.

The following properties are valid for the implicit LX algorithm. Let $N$ be the set of integers from 1 to $n$, $N = \{1, \ldots, n\}$. Let $B_i$ be the set of indices $k_1, \ldots, k_i$ chosen for the parameters of the implicit LX algorithm up to step $i$. Let $N_i$ be the set $N \setminus B_i$. Then:

• The index $k_i$ is selected in the set $N_{i-1}$.

• The rows of $H_{i+1}$ of index $k \in B_i$ are null rows.

• The vector $p_i$ has $n - i$ zero components; its $k_i$th component is equal to one.

• If $x_1 = 0$, then $x_{i+1}$ is a basic type solution of the first $i$ equations, whose nonzero components may lie only in the positions corresponding to the indices $k \in B_i$.

• The columns of $H_{i+1}$ of index $k \in N_i$ are the unit vectors $e_k$, while the columns of $H_{i+1}$ of index $k \in B_i$ have zero components in the $j$th position, with $j \in B_i$, implying that only $i(n - i)$ elements of such columns have to be computed.

• At the $i$th step $i(n - i)$ multiplications are needed to compute $H_i a_i$ and $i(n - i)$ to update the nontrivial part of $H_i$. Hence the total number of multiplications is the same as for the implicit LU algorithm (i.e. $n^3/3$), but no pivoting is necessary, reflecting the fact that no condition is required on the matrix $A$.

• The storage requirement is the same as for the implicit LU algorithm, i.e. at most $n^2/4$. Hence the implicit LX algorithm shares the storage advantage of the implicit LU algorithm over the classical LU algorithm, with the additional advantage of not requiring pivoting.

• Numerical experiments by K. Mirnia [10] have shown that the implicit LX method usually gives better accuracy, in terms of error in the computed solution, than the implicit LU algorithm and often even better than the modified Huang algorithm. In terms of size of the final residual, its accuracy is comparable to that of the LU algorithm as implemented (with row pivoting) in the MATLAB or LAPACK libraries, but it is again better in terms of error in the solution.
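A dense sketch of the implicit LX algorithm, with $k_i$ chosen to maximize $|e_{k_i}^T H_i a_i|$ as recommended above, is given below. It does not exploit the sparsity pattern of $H_{i+1}$ described in the properties, so it does not achieve the stated operation count; the name `implicit_lx_solve` is hypothetical. The test matrix has $a_{11} = 0$, so the implicit LU algorithm without pivoting would break down at the first step, whereas the LX choice of $k_i$ needs no pivoting.

```python
# Implicit LX algorithm: H_1 = I, v_i = e_i, z_i = w_i = e_{k_i}, with k_i
# maximizing |e_k^T H_i a_i|. Dense storage is used for clarity.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def implicit_lx_solve(A, b, tol=1e-14):
    n = len(A[0])
    x = [0.0] * n
    H = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]   # H_1 = I
    chosen = []                                        # the index set B_i (zero-based)
    for a, bi in zip(A, b):
        s = [dot(row, a) for row in H]                 # s_i = H_i a_i
        # Rows of H_i with index in B_{i-1} are null, so the maximizer lies in N_{i-1}.
        k = max(range(n), key=lambda j: abs(s[j]))
        if abs(s[k]) <= tol:
            raise ValueError("H_i a_i = 0: A does not have full rank")
        p = H[k][:]                                    # p_i = H_i^T e_{k_i} = row k_i of H_i
        alpha = (dot(a, x) - bi) / s[k]                # a_i^T p_i = e_{k_i}^T H_i a_i
        x = [xj - alpha * pj for xj, pj in zip(x, p)]
        # E) with w_i = e_{k_i}: H_{i+1} = H_i - s_i p_i^T / s_i[k_i]; row k_i becomes zero
        H = [[H[j][c] - (s[j] / s[k]) * p[c] for c in range(n)] for j in range(n)]
        chosen.append(k)
    return x, chosen

A = [[0.0, 2.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 0.0]]   # a_11 = 0
b = [7.0, 6.0, 4.0]                                         # exact solution (1, 2, 3)
x_lx, chosen = implicit_lx_solve(A, b)
```

On this example the selected indices are, in zero-based form, $1, 0, 2$, and the iterate reaches $(1, 2, 3)$ after three steps.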


Other ABS Linear Solvers 


ABS reformulations have been obtained for most algorithms proposed in the literature. The availability of several formulations of the linear algebra of the ABS process allows alternative formulations of each method, with possibly different values of overhead and storage and different properties of numerical stability, vectorization and parallelization. The reprojection technique, already seen in the case of the modified Huang algorithm and based upon the identities $H_i q = H_i(H_i q)$, $H_i^T q = H_i^T(H_i^T q)$, valid for any vector $q$ if $H_1 = I$, remarkably improves the stability of the algorithm. The ABS versions of the Hestenes-Stiefel and the Craig algorithms, for instance, are very stable under the above reprojection. The implicit QR algorithm, defined by the choices $H_1 = I$, $v_i = A p_i$, $z_i = w_i = e_i$, can be implemented in a very stable way using the reprojection in both the definition of the search vector and the scaling vector. It should also be noticed that the classical iterative refinement procedure, which amounts to a Newton iteration on the system $Ax - b = 0$ using the approximate factors of $A$, can be reformulated in the ABS context using the previously defined search vectors $p_i$. Experiments of Mirnia [11] have shown that ABS refinement works excellently.

For problems with special structure ABS methods can often be implemented taking into account the effect of the structure on the Abaffian matrix, which often tends to reflect the structure of the matrix $A$. For instance, if $A$ has a banded structure, the same is true for the Abaffian matrix generated by the implicit LU, the implicit QR and the Huang algorithm, albeit with an increased band size. If $A$ is SPD and has an ND structure, the same is true for the Abaffian matrix. In this case the implementation of the implicit LU algorithm has a much lower storage cost, for large $n$, than an implementation of the Choleski algorithm. For matrices having the Kuhn-Tucker structure (KT structure) large classes of ABS methods have been devised, see ▸ ABS algorithms for optimization. For matrices with general sparsity patterns little is presently known about minimizing the fill-in in the Abaffian matrix. Careful use of BLAS4 routines can however substantially reduce the number of operations and make the ABS implementation competitive with a sparse implementation of, say, the LU factorization (e.g. by the code MA28) for values of $n$ that are not too large.

It is possible to implement the ABS process also in 
block form, where several equations, instead of just one, 
are dealt with at each step. The block formulation does 
not deteriorate the numerical accuracy and can lead to 
reduction of overhead on special problems or to faster 
implementations on vector or parallel computers. 

Finally, infinite iterative methods can be obtained from the finite ABS methods via two approaches. The first one consists in restarting the iteration after $k < m$ steps, so that the storage will be of order $2kn$ if the representation of the Abaffian in terms of $2i$ vectors is used. The second approach consists in using only a limited number of terms in the Gram-Schmidt type processes that are alternative formulations of the ABS procedure. For both cases convergence at a linear rate has been established using the technique developed in [7]. The infinite iteration methods obtained by these approaches define a very large class of methods, which contains not only all Krylov space type methods of the literature, but also non-Krylov type methods such as the Gauss-Seidel, the De La Garza and the Kaczmarz methods, with their generalizations.


ABS Methods for Linear Least Squares 


There are several ways of using ABS methods for solving an overdetermined linear system in the least squares sense without forming the normal equations of Gauss, which are usually avoided on account of their higher conditioning. One possibility is to compute explicitly the factors associated with the implicit factorization and then use them in the standard way. According to the results of [14] the methods so obtained work well, usually giving better results than the methods using the QR factorization computed in the standard way. A second possibility is to use the representation of the Moore-Penrose pseudo-inverse that is provided explicitly by the ABS technique described in [13]. Again this approach has given very good numerical results. A third possibility is based upon the equivalence of the normal system $A^T A x = A^T b$ with the extended system in the variables $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, given by the two subsystems $Ax = y$, $A^T y = A^T b$. The first of the subsystems is overdetermined but must be solvable. Hence $y$ must lie in the range of $A$, which means that $y$ must be the solution of least Euclidean norm of the second, underdetermined, subsystem (the solutions of $A^T y = A^T b$ differ by elements of the null space of $A^T$, which is orthogonal to the range of $A$). Such a solution is computed by the Huang algorithm. Then the ABS algorithm, applied to the first subsystem, recognizes in step B) and eliminates the $m - k$ dependent equations, where $k$ is the rank of $A$. If $k < n$ there are infinitely many solutions, and the one of least Euclidean norm is obtained by using the Huang algorithm again on the first subsystem.
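The third approach can be sketched by calling a Huang-type solver twice: once on $A^T y = A^T b$ to get the least-norm $y$, and once on $Ax = y$, where the dependent equations are detected in step B) and skipped. The helper names (`huang_min_norm`, `lsq_extended`) and the tolerances are assumptions of this sketch; it uses the reprojected (modified Huang) formulas but is otherwise unoptimized.

```python
# Least squares through the extended system A x = y, A^T y = A^T b.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def huang_min_norm(A, b, tol=1e-10):
    """Least-norm solution of a consistent A x = b (modified Huang, x_1 = 0)."""
    n = len(A[0])
    x = [0.0] * n
    H = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
    for a, bi in zip(A, b):
        Ha = [dot(row, a) for row in H]                # s_i = H_i a_i
        tau = dot(a, x) - bi
        if dot(Ha, Ha) <= (tol ** 2) * dot(a, a):      # step B): dependent equation
            if abs(tau) > tol * (1.0 + abs(bi)):
                raise ValueError("the system has no solution")
            continue
        p = [dot(row, Ha) for row in H]                # p_i = H_i (H_i a_i)
        alpha = tau / dot(a, p)
        x = [xj - alpha * pj for xj, pj in zip(x, p)]
        pp = dot(p, p)
        H = [[H[j][k] - p[j] * p[k] / pp for k in range(n)] for j in range(n)]
    return x

def lsq_extended(A, b):
    """Least-norm least squares solution via A x = y, A^T y = A^T b."""
    At = [list(col) for col in zip(*A)]
    y = huang_min_norm(At, [dot(col, b) for col in At])   # y lies in the range of A
    return huang_min_norm(A, y)                           # skips the m - k dependent rows

x_ls = lsq_extended([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 2.0, 4.0])
x_rank_def = lsq_extended([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0])
```

In the first example $A$ has full column rank and the result is the ordinary least squares solution $(5/6, 3/2)$; in the second example $A$ has rank one, and the least-norm least squares solution $(1, 1)$ is returned.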

Finally, a large class of ABS methods can be applied directly to an overdetermined system, stopping after $n$ iterations at a least squares solution. The class is obtained by defining $V = AU$, where $U$ is an arbitrary nonsingular matrix in $\mathbb{R}^{n \times n}$. Indeed at the point $x_{n+1}$ the satisfied Petrov-Galerkin condition $V^T r_{n+1} = U^T A^T r_{n+1} = 0$ is equivalent to the normal equations of Gauss. If $U = P$ then the orthogonally scaled class is obtained, implying, as already stated in the discussion of the subclasses, that the methods of this class can be applied to solve linear least squares problems (but a suitable modification has to be made for the GMRES method). A version of the implicit QR algorithm, with reprojection on both the search vector and the scaling vector, tested in [12], has outperformed other ABS algorithms for linear least squares as well as methods in the LINPACK and NAG libraries based upon the classical QR factorization via Householder matrices.
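A sketch of this direct approach, using the implicit QR choices $H_1 = I$, $v_i = A p_i$, $z_i = w_i = e_i$ and stopping after $n$ steps, is shown below. It assumes $A$ has full column rank and omits the reprojection mentioned above, so it should not be taken as the version tested in [12]; the function name `implicit_qr_lsq` is hypothetical. Since $Ap_i = v_i$, the denominators in (7) and (9) both reduce to $\|Ap_i\|^2$.

```python
# n steps of the scaled ABS class with H_1 = I, v_i = A p_i, z_i = w_i = e_i
# (implicit QR choices) on an overdetermined m x n system. No reprojection.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def implicit_qr_lsq(A, b):
    m, n = len(A), len(A[0])
    x = [0.0] * n
    H = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
    for i in range(n):
        p = H[i][:]                                        # p_i = H_i^T e_i
        v = [dot(row, p) for row in A]                     # v_i = A p_i
        r = [dot(row, x) - bj for row, bj in zip(A, b)]    # r_i = A x_i - b
        vv = dot(v, v)                                     # v_i^T A p_i = ||A p_i||^2
        alpha = dot(v, r) / vv                             # eq. (7)
        x = [xj - alpha * pj for xj, pj in zip(x, p)]
        atv = [sum(A[j][k] * v[j] for j in range(m)) for k in range(n)]   # A^T v_i
        s = [dot(row, atv) for row in H]                   # s_i = H_i A^T v_i
        # w_i = e_i: w_i^T H_i = p_i^T and w_i^T H_i A^T v_i = s_i[i] (= vv)
        H = [[H[j][k] - s[j] * p[k] / s[i] for k in range(n)] for j in range(n)]
    return x

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 4.0]
x_qr = implicit_qr_lsq(A, b)
```

After $n$ steps the Petrov-Galerkin conditions $p_j^T A^T r_{n+1} = 0$, $j = 1, \ldots, n$, imply $A^T r_{n+1} = 0$, i.e. the normal equations; for these data the result is $(5/6, 3/2)$.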
The scaled ABS (Abaffy-Broyden-Spedicato) class of algorithms, see [1] and ▸ ABS algorithms for linear equations and linear least squares, is a very general process for solving linear equations, realizing the so-called Petrov-Galerkin approach. In addition to solving general determined or underdetermined linear systems $Ax = b$, $x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $m \le n$, $\mathrm{rank}(A) \le m$, $A = [a_1, \ldots, a_m]^T$, ABS methods can also solve linear least squares problems and nonlinear algebraic equations. In this article we consider applications of ABS methods to optimization problems. We consider only the so-called basic ABS class, defined by the following procedure for solving $Ax = b$:

A) Let $x_1 \in \mathbb{R}^n$ be arbitrary, $H_1 \in \mathbb{R}^{n \times n}$ be nonsingular arbitrary, set $i = 1$.

B) Compute $s_i = H_i a_i$. IF $s_i \ne 0$, go to C). IF $s_i = 0$ and $\tau = a_i^T x_i - b_i = 0$, THEN set $x_{i+1} = x_i$, $H_{i+1} = H_i$ and go to F), ELSE stop, the system has no solution.

C) Compute the search vector $p_i$ by $p_i = H_i^T z_i$, where $z_i \in \mathbb{R}^n$ is arbitrary save for the condition $a_i^T H_i^T z_i \ne 0$.

D) Update the estimate of the solution by $x_{i+1} = x_i - \alpha_i p_i$, where the stepsize $\alpha_i$ is given by $\alpha_i = (a_i^T x_i - b_i)/(a_i^T p_i)$.

E) Update the matrix $H_i$ by $H_{i+1} = H_i - H_i a_i w_i^T H_i / (w_i^T H_i a_i)$, where $w_i \in \mathbb{R}^n$ is arbitrary save for the condition $w_i^T H_i a_i \ne 0$.

F) IF $i = m$, THEN stop; $x_{m+1}$ solves the system, ELSE increment $i$ by one and go to B).

Among the properties of the ABS class the following is fundamental in the applications to optimization. Let $m < n$ and, for simplicity, assume that $\mathrm{rank}(A) = m$. Then the linear variety containing all solutions of the underdetermined system $Ax = b$ is represented by the vectors $x$ of the form
$$x = x_{m+1} + H_{m+1}^T q, \tag{1}$$
where $q \in \mathbb{R}^n$ is arbitrary. In the following the matrices generated by the ABS process will be called Abaffians. It is recalled that the matrix $H_{i+1}$ can be represented in terms of either $2i$ vectors or $n - i$ vectors, which is also true for the representation of the search vector $p_i$. The first representation is computationally convenient for systems where the number of equations is small (less than $n/2$), while the second one is suitable for problems where $m$ is close to $n$. In the applications to optimization, the first case corresponds to problems with few constraints (many degrees of freedom), the second case to problems with many constraints (few degrees of freedom).
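The representation $x = x_{m+1} + H_{m+1}^T q$ can be checked with a few lines of Python. The sketch below, with the hypothetical names `basic_abs_implicit_lu` and `general_solution`, runs the basic class with the implicit LU choices $H_1 = I$, $z_i = w_i = e_i$ on a small $2 \times 4$ system, assumed regular so that no pivoting is needed, and then forms $x_{m+1} + H_{m+1}^T q$ for an arbitrary $q$.

```python
# Basic ABS class with the implicit LU choices (H_1 = I, z_i = w_i = e_i) on an
# underdetermined m x n system; every x = x_{m+1} + H_{m+1}^T q solves A x = b.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def basic_abs_implicit_lu(A, b):
    """Return x_{m+1} and H_{m+1}; assumes A is regular (no pivoting) and m <= n."""
    n = len(A[0])
    x = [0.0] * n
    H = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]   # A) H_1 = I
    for i, (a, bi) in enumerate(zip(A, b)):
        s = [dot(row, a) for row in H]                 # B) s_i = H_i a_i
        p = H[i][:]                                    # C) p_i = H_i^T e_i = row i of H_i
        alpha = (dot(a, x) - bi) / dot(a, p)           # D) a_i^T p_i = s_i[i]
        x = [xj - alpha * pj for xj, pj in zip(x, p)]
        # E) w_i = e_i, so w_i^T H_i = p_i^T and w_i^T H_i a_i = s_i[i]
        H = [[H[j][k] - s[j] * p[k] / s[i] for k in range(n)] for j in range(n)]
    return x, H

A = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 1.0, 1.0]]
b = [4.0, 3.0]
x_part, H = basic_abs_implicit_lu(A, b)               # x_{m+1} and H_{m+1}

def general_solution(q):
    """x = x_{m+1} + H_{m+1}^T q for an arbitrary q in R^n."""
    n = len(q)
    return [x_part[k] + sum(H[j][k] * q[j] for j in range(n)) for k in range(n)]

x_gen = general_solution([1.0, -2.0, 3.0, 0.5])
```

Every choice of $q$ yields a solution because $H_{m+1} a_i = 0$ for $i = 1, \ldots, m$, so that $A H_{m+1}^T = 0$.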

Among the algorithms of the basic ABS class, the following are particularly important.

a) The implicit LU algorithm is given by the choices $H_1 = I$, $z_i = w_i = e_i$, where $e_i$ is the $i$th unit vector in $\mathbb{R}^n$. This algorithm is well defined if and only if $A$ is regular (otherwise pivoting of the columns has to be performed, or of the equations, if $m = n$). Due to the special structure of the Abaffian induced by the parameter choices (the first $i$ rows of $H_{i+1}$ are identically zero, while the last $n - i$ columns are unit vectors), the maximum storage is $n^2/4$, hence a quarter of that needed by the classical LU factorization and half of that needed by Gaussian elimination; the number of multiplications is $nm^2 - 2m^3/3$, hence, for $m = n$, $n^3/3$, i.e. the same as for Gaussian elimination or the LU factorization algorithm.

b) The Huang algorithm is obtained by the parameter choices $H_1 = I$, $z_i = w_i = a_i$. A mathematically equivalent, but numerically more stable, formulation of this algorithm is the so-called modified Huang algorithm, where the search vectors and the Abaffians are given by the formulas $p_i = H_i(H_i a_i)$ and $H_{i+1} = H_i - p_i p_i^T/(p_i^T p_i)$. The search vectors are orthogonal and are equal to the vectors obtained by applying the classical Gram-Schmidt orthogonalization procedure to the rows of $A$. If $x_1$ is the zero vector, then the vector $x_{i+1}$ is the solution of least Euclidean norm of the first $i$ equations, and the solution $x^*$ of least Euclidean norm of the whole system is approached monotonically and from below by the sequence $x_i$.

c) The implicit LX algorithm, where ‘L’ refers to the lower triangular left factor while ‘X’ refers to the right factor, which is a matrix obtainable by row permutation of an upper triangular matrix, considered by Z. Xia, is defined by the choices $H_1 = I$, $z_i = w_i = e_{k_i}$, where $k_i$ is an integer, $1 \le k_i \le n$, such that
$$e_{k_i}^T H_i a_i \ne 0. \tag{2}$$
If $A$ has full rank, from a property of the basic ABS class the vector $H_i a_i$ is nonzero, hence there is at least one index $k_i$ such that (2) is satisfied. The implicit LX algorithm has the same overhead as the implicit LU algorithm, hence the same as Gaussian elimination, and the same storage requirement, i.e. less than Gaussian elimination or the LU factorization algorithm. It has the additional advantage of not requiring any condition on the matrix $A$, hence pivoting is not necessary. The structure of the Abaffian matrix is somewhat more complicated than for the implicit LU algorithm, the zero rows of $H_{i+1}$ being now in the positions $k_1, \ldots, k_i$ and the columns that are unit vectors being in the positions that do not correspond to the already chosen indices $k_j$.

The vector $p_i$ has $n - i$ zero components and its $k_i$th component is equal to one. It follows that if $x_1 = 0$, then $x_{i+1}$ is a basic type solution of the first $i$ equations, whose nonzero components correspond to the chosen indices $k_j$.

In this paper we will present the following appli- 
cations of ABS methods to optimization problems. In 
Section 2 we describe a class of ABS related methods 
for the unconstrained optimization problem. In Sec- 
tion 3 we show how ABS methods provide the general 
solution of the quasi-Newton equation, also with spar- 
sity and symmetry and we discuss how SPD solutions 
can be obtained. In Section 4 we present several special 
ABS methods for solving the Kuhn-Tucker equations. 
In Section 5 we consider the application of the implicit 
LX algorithm to the linear programming (LP) problem. 
In Section 6 we present ABS approaches to the general 
linearly constrained optimization problem, which unify 
linear and nonlinear problems. 
A Class of ABS Projection Methods for Unconstrained Optimization


ABS methods can be applied directly to solve unconstrained optimization problems via the iteration $x_{i+1} = x_i - \alpha_i H_i^T z_i$, where $H_i$ is reset after $n$ or fewer steps and $z_i$ is chosen so that the descent condition holds, i.e. $g_i^T H_i^T z_i > 0$, with $g_i$ the gradient of the function at $x_i$. If the function to be minimized is quadratic, one can identify the matrix $A$ in the Abaffian update formula with the Hessian of the quadratic function. Defining a perturbed point $x'$ by $x' = x_i - \beta v_i$, one has on quadratic functions $g' = g_i - \beta A v_i$, hence the update of the Abaffian takes the form $H_{i+1} = H_i - H_i y_i w_i^T H_i / (w_i^T H_i y_i)$, where $y_i = g' - g_i$. The class defined above has termination on quadratic functions and a local superlinear ($n$-step Q-quadratic) rate of convergence on general functions. It is a special case of a class of projection methods developed in [7]. Almost no numerical results are available about the performance of the methods in this class.


Applications to Quasi-Newton Methods 


ABS methods have been used to provide the general solution of the quasi-Newton equation, also with the additional conditions of symmetry, sparsity and positive definiteness. While the general solution of only the quasi-Newton equation was already known from [2], the explicit formulas obtained for the sparse symmetric case are new, and so is the way of constructing sparse SPD updates.

Let us consider the quasi-Newton equation defining the new approximation $B'$ to a Jacobian or a Hessian, in the transpose form
$$d^T B' = y^T, \tag{3}$$
where $d = x' - x$, $y = g' - g$. We observe that (3) can be seen as a set of $n$ linear underdetermined systems, each one having just one equation and differing only in the right-hand side. Hence the general solution can be obtained by one step of the ABS method. It can be written in the following way


where $Q \in \mathbb{R}^{n \times n}$ is arbitrary and $s \in \mathbb{R}^n$ is arbitrary subject to $s^T d \ne 0$. Formula (4), derived in [9], is equivalent to the formula in [2].
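One way of writing a general solution of (3) follows from a single ABS step on each column equation $d^T b'_j = y_j$ with $x_1 = b_j$, $H_1 = I$ and $z_1 = w_1 = s$ (choices assumed for this sketch): $B' = B - s(d^T B - y^T)/(s^T d) + (I - s d^T/(s^T d))\,Q$. It is not claimed to coincide term by term with (4). The function name `qn_general_solution` and the small data below are hypothetical.

```python
# General solution of the quasi-Newton equation d^T B' = y^T from one ABS step
# (H_1 = I, z_1 = w_1 = s), applied columnwise:
#   B' = B - s (d^T B - y^T) / (s^T d) + (I - s d^T / (s^T d)) Q.

def qn_general_solution(B, d, y, s, Q):
    n = len(d)
    sd = sum(si * di for si, di in zip(s, d))
    if sd == 0.0:
        raise ValueError("s^T d must be nonzero")
    dB = [sum(d[k] * B[k][j] for k in range(n)) for j in range(n)]   # row vector d^T B
    dQ = [sum(d[k] * Q[k][j] for k in range(n)) for j in range(n)]   # row vector d^T Q
    return [[B[i][j] - s[i] * (dB[j] - y[j]) / sd + Q[i][j] - s[i] * dQ[j] / sd
             for j in range(n)] for i in range(n)]

B = [[2.0, 0.0], [0.0, 1.0]]          # current approximation
d = [1.0, 2.0]                         # d = x' - x
y = [3.0, 1.0]                         # y = g' - g
s = [1.0, 1.0]                         # s^T d = 3, nonzero
B_new = qn_general_solution(B, d, y, s, [[0.0, 0.0], [0.0, 0.0]])   # Q = 0
# A particular solution B* is recovered with Q = B* - B:
B_star = [[1.0, 1.0], [1.0, 0.0]]      # d^T B* = (3, 1) = y^T
B_rec = qn_general_solution(B, d, y, s,
                            [[B_star[i][j] - B[i][j] for j in range(2)] for i in range(2)])
```

With $Q = 0$ the update is $B' = \begin{pmatrix} 7/3 & -1/3 \\ 1/3 & 2/3 \end{pmatrix}$, and choosing $Q = B^* - B$ for any particular solution $B^*$ reproduces $B^*$, which is why this family covers all solutions of (3).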

Now the conditions that some elements of $B'$ should be zero, or have constant value, or that $B'$ should be symmetric can be written as additional linear constraints, where $b'_i$ is the $i$th column of $B'$:
$$(b'_i)^T e_k = \eta_{ik}, \tag{5}$$
where $\eta_{ik} = 0$ implies sparsity, $\eta_{ik} = \text{const}$ implies that some elements do not change their value and $\eta_{ik} = \eta_{ki}$ implies symmetry. The ABS algorithm can deal with these extra conditions, see [11], giving the solution in explicit form, columnwise in the presence of symmetry. By adding the condition that the diagonal elements be sufficiently large, it is possible to obtain formulas where $B'$ is quasi positive definite or quasi diagonally dominant, in the sense that the principal submatrix of order $n - 1$ is positive definite or diagonally dominant. It is not possible in general to force $B'$ to be SPD, since SPD solutions may not exist, which is reflected in the fact that no additional conditions can be put on the last diagonal element, since the last column is fully determined by the $n - 1$ symmetry conditions and the quasi-Newton equation. This result can however be exploited to provide SPD approximations by imbedding the original minimization problem of $n$ variables in a problem of $n + 1$ variables, whose solution with respect to the first $n$ variables is the original solution (just set, for instance, $\tilde f(x, x_{n+1}) = f(x) + x_{n+1}^2$). This imbedding modifies the quasi-Newton equation so that SPD solutions exist.


ABS Methods for Kuhn-Tucker Equations 
The Kuhn-Tucker equations (KT equations), which should more appropriately be named Kantorovich-Karush-Kuhn-Tucker equations (KKKT equations), are a special linear system obtained by writing the optimality conditions of the problem of minimizing a quadratic function with Hessian G subject to the linear equality constraint Cx = b. They are the system Ax = b, where A is a symmetric indefinite matrix of the following form, with $G \in \mathbb{R}^{n,n}$ and $C \in \mathbb{R}^{m,n}$:

$$A = \begin{pmatrix} G & C^T \\ C & 0 \end{pmatrix}. \qquad (6)$$
If G is nonsingular, then A is nonsingular if and only if $C G^{-1} C^T$ is nonsingular. Usually G is nonsingular, symmetric and positive definite, but this assumption, required by several classical solvers, is not necessary for the ABS solvers.

ABS classes for solving the KT problem can be derived in several ways. Observe that system (6) is equivalent to the two subsystems

$$G p + C^T z = g, \qquad (7)$$

$$C p = c, \qquad (8)$$

where $x = (p^T, z^T)^T$ and $b = (g^T, c^T)^T$. The general solution of subsystem (8) has the form, see (1),

$$p = p_{m+1} + H_{m+1}^T q, \qquad (9)$$

with q arbitrary. The parameter choices made to construct $p_{m+1}$ and $H_{m+1}$ are arbitrary and therefore define a class of algorithms.

Since the KT equations have a unique solution, there must be a choice of q in (9) which makes p the unique n-dimensional subvector defined by the first n components of the solution x. Notice that since $H_{m+1}$ is singular, q is not uniquely defined (but it would be uniquely defined if one took the representation of the Abaffian in terms of n - m vectors).

By multiplying equation (7) on the left by $H_{m+1}$ and using the ABS property $H_{m+1} C^T = 0$, we obtain the equation

$$H_{m+1} G p = H_{m+1} g, \qquad (10)$$


which does not contain z. Now there are two possibilities to determine p:

A1) Consider the system formed by equations (8) and (10). Such a system is solvable but overdetermined. Since $\operatorname{rank}(H_{m+1}) = n - m$, m equations are recognized as dependent and are eliminated in step B) of any ABS algorithm applied to this system.

A2) In equation (10) substitute p with the expression of the general solution (9), obtaining

$$H_{m+1} G H_{m+1}^T q = H_{m+1} g - H_{m+1} G p_{m+1}. \qquad (11)$$

The above system can be solved by any ABS method for a particular solution q, m equations being again removed at step B) of the ABS algorithm as linearly dependent.


Once p is determined, there are two approaches to determine z, namely:

B1) Solve by any ABS method the overdetermined compatible system

$$C^T z = g - G p \qquad (12)$$

by removing at step B) of the ABS algorithm the n - m dependent equations.

B2) Let $P = (p_1, \ldots, p_m)$ be the matrix whose columns are the search vectors generated on the system Cp = c. Now CP = L, with L nonsingular lower triangular. Multiplying equation (12) on the left by $P^T$ we obtain a triangular system defining z uniquely:

$$L^T z = P^T g - P^T G p. \qquad (13)$$
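The combination of approaches A1 and B1 can be sketched in plain Python. The sketch below is an illustration, not the tested implementations mentioned in the next paragraph. Assumptions: every ABS solve uses the unscaled Huang choice (x_1 = 0, H_1 = I, z_i = w_i = a_i), dependent equations are detected with a fixed tolerance, no pivoting is done, and the helper names and the 2 x 1 test problem are hypothetical.

```python
def abs_huang_solve(rows, rhs, tol=1e-10):
    """Unscaled ABS with Huang parameters; dependent equations are skipped (step B)."""
    n = len(rows[0])
    x = [0.0] * n
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # H_1 = I
    for a, b in zip(rows, rhs):
        s = [sum(H[r][c] * a[c] for c in range(n)) for r in range(n)]  # H_i a_i
        residual = sum(a[c] * x[c] for c in range(n)) - b
        if max(abs(v) for v in s) < tol:
            if abs(residual) > 1e-8:
                raise ValueError("incompatible system")
            continue  # a_i depends on the previous rows: drop the equation
        denom = sum(a[c] * s[c] for c in range(n))  # a_i^T H_i a_i > 0
        x = [x[c] - residual / denom * s[c] for c in range(n)]  # search vector p_i = H_i a_i
        H = [[H[r][c] - s[r] * s[c] / denom for c in range(n)] for r in range(n)]
    return x, H


def kt_solve(G, C, g, c):
    n, m = len(G), len(C)
    _, Hm = abs_huang_solve(C, c)  # Abaffian H_{m+1} of the system Cp = c
    HG = [[sum(Hm[i][k] * G[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    Hg = [sum(Hm[i][k] * g[k] for k in range(n)) for i in range(n)]
    # A1: equations (8) and (10) together; m of the n + m equations are dependent
    p, _ = abs_huang_solve(C + HG, list(c) + Hg)
    # B1: C^T z = g - G p, n equations in m unknowns, n - m of them dependent
    rhs = [g[i] - sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    CT = [[C[k][i] for k in range(m)] for i in range(n)]
    z, _ = abs_huang_solve(CT, rhs)
    return p, z


G = [[2.0, 0.0], [0.0, 2.0]]
C = [[1.0, 1.0]]
p, z = kt_solve(G, C, g=[1.0, 0.0], c=[1.0])
print(p, z)  # [0.75, 0.25] [-0.5]
```

In this example the third row of the A1 system, $-p_1 + p_2 = -0.5$, is recognized as dependent and dropped, exactly as described in A1.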


Extensive numerical testing has evaluated the accuracy 
of the above considered ABS algorithms for KT equa- 
tions for certain choices of the ABS parameters (cor- 
responding to the implicit LU algorithm with row piv- 
oting and the modified Huang algorithm). The meth- 
ods have been tested against classical methods, in par- 
ticular the method of Aasen and methods using the QR 
factorization. The experiments have shown that some 
ABS methods are the most accurate, in both residual 
and solution error; moreover, some ABS algorithms are cheaper in storage and in overhead, by up to one order of magnitude, especially when m is close to n.

In many interior point methods the main computa- 
tional cost is to compute the solution for a sequence of 
KT problems where only G, which is diagonal, changes. 
In such a case the ABS methods, which initially work on the matrix C, which is unchanged, have an advantage, particularly when m is large: the dominant cubic term decreases with m and disappears for m = n,
so that the overhead is dominated by second order 
terms. Again numerical experiments show that some 
ABS methods are more accurate than the classical ones. 
For details see [8]. 


Reformulation of the Simplex Method 
via the Implicit LX Algorithm 


The implicit LX algorithm has a natural application to a reformulation of the simplex method for the LP problem in standard form, i.e. the problem

$$\min\ c^T x \quad \text{s.t.} \quad Ax = b,\quad x \ge 0.$$


The applicability of the implicit LX method is a consequence of the fact that the iterate $x_{i+1}$ generated by the method, started from the zero vector, is a basic type vector, with a unit component in the position $k_i$ and non identically zero components corresponding to indices $j \in B_i$. Here $B_i$ is the set of indices of the unit vectors chosen as the $z_i, w_i$ parameters, i.e. the set $B_i = \{k_1, \ldots, k_i\}$, while the components of $x_{i+1}$ with indices in the set $N_i = N \setminus B_i$ are identically zero, where $N = \{1, \ldots, n\}$. Therefore, if the nonzero components are nonnegative, the point defines a vertex of the polytope containing the feasible points defined by the constraints of the LP problem.

In the simplex method one moves from one vertex to another, according to some rules and usually reducing the value of the function $c^T x$ at each step. The direction along which one moves from one vertex to another is an edge direction of the polytope and is determined by solving a linear system whose coefficient matrix $A_B$, the basic matrix, is defined by m linearly independent columns of the matrix A, called the basic columns. Usually such a system is solved by the LU factorization method or occasionally by the QR method, see [5]. The new vertex is associated with a new basic matrix $A_{B'}$, which is obtained by substituting one of the columns in $A_B$ by a column of the matrix $A_N$, which comprises the columns of A that do not belong to $A_B$. The most efficient algorithm for solving the modified system after the column interchange is the Forrest-Goldfarb method [6], requiring $m^2$ multiplications. Notice that the classical simplex method requires $m^2$ storage for the matrix $A_B$ plus mn storage for the matrix A, which must in general be kept to provide the columns for the exchange.

The application of the implicit LX method to the simplex method, developed in [4,10,13,17], exploits the fact that in the implicit LX algorithm the interchange of a jth column in $A_B$ with a kth column in $A_N$ corresponds to the interchange of a previously chosen parameter vector $z_j = w_j = e_j$ with a new parameter $z_k = w_k = e_k$. This operation is a special case of the perturbation of the Abaffian after a change in the parameters and can be done using a general formula of [15], without explicit use of the kth column in $A_N$. Moreover, since all quantities needed for the construction of the search direction (the edge direction) and for the interchange criteria can also be computed without explicit use of the columns of A, the ABS approach needs only the storage of the matrix $H_{m+1}$, which, in the case of the implicit LX algorithm, has a cost of at most $n^2/4$. Therefore, for values of m close to n, the storage required by the ABS formulation is about 8 times less than for the classical simplex method.

Here we give the basic formulas of the simplex method in the classical and in the ABS formulation. The column in $A_N$ substituting an old column in $A_B$ is often taken as the column with minimal relative cost. In terms of the ABS formulation this is equivalent to minimizing, with respect to $i \in N_m$, the scalar $\eta_i = c^T H^T e_i$, where $H = H_{m+1}$. Let $N^*$ be the index chosen in this way. The column in $A_B$ to be exchanged is usually chosen with the criterion of the maximum displacement along an edge which keeps the basic variables nonnegative. Define $\omega_i = x^T e_i / (e_i^T H^T e_{N^*})$, where x is the current basic feasible solution. Then the above criterion is equivalent to minimizing $\omega_i$ with respect to the set of indices $i \in B_m$ such that

$$e_i^T H^T e_{N^*} > 0. \qquad (14)$$

Notice that $H^T e_{N^*} \neq 0$ and that an index i such that (14) is satisfied always exists, unless x is a solution of the LP problem.

The update of the Abaffian after the interchange of the unit vectors, which corresponds to the update of the LU factors after the interchange of the basic with the nonbasic column, is given by the following formula, where $B^*$ denotes the index of the basic column leaving the basis:

$$H' = H - \frac{(H e_{B^*} - e_{B^*})\, e_{N^*}^T H}{e_{N^*}^T H e_{B^*}}. \qquad (15)$$

The search direction d, which in the classical formulation is obtained by solving the system $A_B d = -A e_{N^*}$, is given by $d = H_{m+1}^T e_{N^*}$, hence at no cost. Finally, the relative cost vector r, classically given by $r = c - A^T A_B^{-T} c_B$, where $c_B$ consists of the components of c with indices corresponding to those of the basic columns, is simply given by $r = H_{m+1} c$.
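These relations can be checked on a tiny example. The sketch below is only an illustration: the data, the function names and the hand-picked leaving index are hypothetical, and the ratio test (14) is not implemented. In exact rational arithmetic it builds $H_{m+1}$ by implicit LX steps ($z_i = w_i = e_{k_i}$, with $k_i$ taken from a prescribed basis, so the rows of H with basic indices vanish). It then reads off the relative costs $r = H_{m+1} c$ and the edge direction $d = H_{m+1}^T e_{N^*}$, and confirms that update (15) reproduces the Abaffian of the new basis.

```python
from fractions import Fraction


def implicit_lx_abaffian(A, basis):
    """H_{m+1} from implicit LX steps with z_i = w_i = e_{k_i}, k_i taken from `basis`."""
    n = len(A[0])
    H = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # H_1 = I
    remaining = list(basis)
    for a in A:
        s = [sum(H[r][c] * a[c] for c in range(n)) for r in range(n)]  # H_i a_i
        k = max(remaining, key=lambda j: abs(s[j]))  # pivot among the basic indices
        if s[k] == 0:
            raise ValueError("basic columns are linearly dependent")
        remaining.remove(k)
        row_k = H[k][:]  # e_k^T H_i
        H = [[H[r][c] - s[r] * row_k[c] / s[k] for c in range(n)] for r in range(n)]
    return H


def exchange(H, b_out, n_in):
    """Update (15): basic index b_out leaves the basis, nonbasic index n_in enters."""
    n = len(H)
    denom = H[n_in][b_out]  # e_{N*}^T H e_{B*}
    if denom == 0:
        raise ValueError("pivot e_{N*}^T H e_{B*} is zero")
    u = [H[r][b_out] - (1 if r == b_out else 0) for r in range(n)]  # H e_{B*} - e_{B*}
    v = H[n_in][:]  # e_{N*}^T H
    return [[H[r][c] - u[r] * v[c] / denom for c in range(n)] for r in range(n)]


A = [[1, 2, 1, 0],
     [3, 1, 0, 1]]
c = [-1, -2, 0, 0]
basis = [2, 3]  # the slack columns form the initial basis
H = implicit_lx_abaffian(A, basis)

r = [sum(H[i][j] * c[j] for j in range(4)) for i in range(4)]  # r = H_{m+1} c
n_in = min((i for i in range(4) if i not in basis), key=lambda i: r[i])  # minimal relative cost
d = H[n_in][:]  # d = H_{m+1}^T e_{N*} is row N* of H

b_out = 2  # hypothetical leaving index, fixed by hand for this example
H_new = implicit_lx_abaffian(A, [1, 3])  # Abaffian of the new basis, rebuilt from scratch
print(exchange(H, b_out, n_in) == H_new)  # True
```

Only H is touched by `exchange`, which mirrors the point made above that the columns of A are not needed for the interchange.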
Let us now consider the computational cost of update (15). Since $H e_{B^*}$ has at most n - m nonzero components, while $H^T e_{N^*}$ has at most m, no more than m(n - m) multiplications are required. The update is most expensive for m = n/2 and gets cheaper the smaller m is or the closer it is to n. In the dual steepest edge Forrest-Goldfarb method [6] the overhead for replacing a column is $m^2$, hence formula (15) is faster for m > n/2 and is recommended on overhead considerations for m sufficiently large. However, we notice that ABS updates having an $O(m^2)$ cost can also be obtained by using the representation of the Abaffian in terms of 2m vectors. No computational experience has been obtained until now on the new ABS formulation of the simplex method.

Finally, a generalization of the simplex method, 
based upon the use of the Huang algorithm started with 
a suitable singular matrix, has been developed in [16]. 
In this formulation the solution is approached by points 
lying on a face of the polytope. Whenever the point hits 
a vertex the remaining iterates move among vertices 
and the method is reduced to the simplex method. 


ABS Unification of Feasible Direction Methods 
for Minimization with Linear Constraints 


ABS algorithms can be used to provide a unification of feasible point methods for nonlinear minimization with linear constraints, including the LP problem as a special case. Let us first consider the problem with only linear equality constraints:

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad Ax = b,\quad A \in \mathbb{R}^{m,n},\ m \le n,\ \operatorname{rank}(A) = m.$$


Let $x_1$ be a feasible starting point; then for an iteration procedure of the form $x_{i+1} = x_i - \alpha_i d_i$, the search direction will generate feasible points if and only if

$$A d_i = 0. \qquad (16)$$

Solving the underdetermined system (16) for $d_i$ by the ABS algorithm, the solution can be written in the following form, taking, without loss of generality, the zero vector as a special solution:

$$d_i = H_{m+1}^T q, \qquad (17)$$

where the matrix $H_{m+1}$ depends on the arbitrary choice of the parameters $H_1$, $w_i$ and $v_i$ used in solving (16), and $q \in \mathbb{R}^n$ is arbitrary. Hence the general feasible direction iteration has the form

$$x_{i+1} = x_i - \alpha_i H_{m+1}^T q. \qquad (18)$$


The search direction is a descent direction if and only if $d^T \nabla f(x) = q^T H_{m+1} \nabla f(x) > 0$. Such a condition can always be satisfied by a choice of q unless $H_{m+1} \nabla f(x) = 0$, which implies, from the null space structure of $H_{m+1}$, that $\nabla f(x) = A^T \lambda$ for some λ, hence that $x_i$ is a KT point and λ is the vector of the Lagrange multipliers. When $x_i$ is not a KT point, it is immediate to see that the search direction is a descent direction if we select q as $q = W H_{m+1} \nabla f(x)$, where W is a symmetric and positive definite matrix, since then $d^T \nabla f(x) = (H_{m+1} \nabla f(x))^T W (H_{m+1} \nabla f(x)) > 0$.
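As a concrete instance of this choice, the sketch below builds $H_{m+1}$ with the Huang algorithm ($H_1 = I$, $z_i = w_i = a_i$), which yields the orthogonal projector onto the null space of A. With W = I the direction $d = H_{m+1}^T q$ is then the projected gradient, the Rosen-type choice listed below. The objective, the data and the helper names are assumptions made for the example.

```python
def huang_abaffian(A, tol=1e-12):
    """H_{m+1} from the Huang algorithm (H_1 = I, z_i = w_i = a_i); a symmetric projector."""
    n = len(A[0])
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    for a in A:
        s = [sum(H[r][c] * a[c] for c in range(n)) for r in range(n)]  # H_i a_i
        denom = sum(a[c] * s[c] for c in range(n))  # a_i^T H_i a_i
        if denom < tol:
            continue  # a_i depends on earlier rows
        H = [[H[r][c] - s[r] * s[c] / denom for c in range(n)] for r in range(n)]
    return H


def feasible_descent_direction(A, grad):
    """d = H^T q with q = W H grad and W = I, cf. (17)."""
    H = huang_abaffian(A)
    n = len(grad)
    q = [sum(H[i][j] * grad[j] for j in range(n)) for i in range(n)]
    return [sum(H[j][i] * q[j] for j in range(n)) for i in range(n)]


A = [[1.0, 1.0, 1.0]]
x = [1.0, 2.0, 3.0]  # feasible for Ax = b with b = 6
grad = x[:]          # gradient of the hypothetical objective f(x) = ||x||^2 / 2
d = feasible_descent_direction(A, grad)
print(d)                                          # approximately [-1, 0, 1]
print(sum(A[0][j] * d[j] for j in range(3)))      # A d, zero up to rounding
print(sum(d[j] * grad[j] for j in range(3)) > 0)  # descent test d^T grad f > 0
```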

Particular well-known algorithms from the literature are obtained by the following choices of q, with W = I:

• The Wolfe reduced gradient method. Here, $H_{m+1}$ is constructed by the implicit LU (or the implicit LX) algorithm.

• The Rosen gradient projection method. Here, $H_{m+1}$ is built using the Huang algorithm.

• The Goldfarb-Idnani method. Here, $H_{m+1}$ is built via the modification of the Huang algorithm where $H_1$ is a symmetric positive definite matrix approximating the inverse Hessian of f(x).

If there are inequalities, two approaches are possible:

A) The active set approach. In this approach the set of linear equality constraints is modified at every iteration by adding and/or dropping some of the linear inequality constraints. Adding or deleting a single constraint can be done, for every ABS algorithm, with a cost of order two (quadratic), see [15]. In the ABS reformulation of the Goldfarb-Idnani method, the initial matrix is related to a quasi-Newton approximation
of the Hessian and an efficient update of the Abaf- 
fian after a change in the initial matrix is discussed 
in [14]. 
B) The standard form approach. In this approach, by introducing slack variables, the problem with both types of linear constraints is written in the equivalent form

$$\min f(x) \quad \text{s.t.} \quad Ax = b,\quad x \ge 0.$$


The following general iteration, started with a feasible point $x_1$, generates a sequence of feasible points for the problem in standard form:

$$x_{i+1} = x_i - \alpha_i \beta_i H_{m+1}^T \nabla f(x_i), \qquad (19)$$

where the parameter $\alpha_i$ can be chosen by a line search along the vector $H_{m+1}^T \nabla f(x_i)$, while the relaxation parameter $\beta_i > 0$ is selected so that the new point has no negative components.
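One step of (19) can be sketched as follows. Several assumptions go beyond the text: $H_{m+1}$ comes from the Huang algorithm, the objective is a hypothetical quadratic (so an exact line search gives $\alpha_i$ in closed form), and $\beta_i$ is taken as the largest fraction, shrunk by a safeguard factor 0.95, of the step that would drive the first component to zero. The article itself only requires $\beta_i > 0$ to prevent negative components.

```python
def huang_abaffian(A, tol=1e-12):
    """H_{m+1} from the Huang algorithm (H_1 = I, z_i = w_i = a_i)."""
    n = len(A[0])
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    for a in A:
        s = [sum(H[r][c] * a[c] for c in range(n)) for r in range(n)]
        denom = sum(a[c] * s[c] for c in range(n))
        if denom < tol:
            continue
        H = [[H[r][c] - s[r] * s[c] / denom for c in range(n)] for r in range(n)]
    return H


target = [1.0, 0.0, -1.0]


def f(x):
    # hypothetical objective f(x) = ||x - target||^2 / 2, whose Hessian is I
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def grad_f(x):
    return [xi - ti for xi, ti in zip(x, target)]


def relaxed_step(A, x, safeguard=0.95):
    """One iteration of (19) for the quadratic f above."""
    n = len(x)
    H = huang_abaffian(A)
    g = grad_f(x)
    d = [sum(H[j][i] * g[j] for j in range(n)) for i in range(n)]  # H^T grad f
    dd = sum(di * di for di in d)
    if dd == 0.0:
        return x  # x is a KT point of the equality constrained problem
    alpha = sum(di * gi for di, gi in zip(d, g)) / dd  # exact line search (Hessian = I)
    ratios = [xi / di for xi, di in zip(x, d) if di > 0]  # components that decrease
    beta = min(1.0, safeguard * min(ratios) / alpha) if ratios else 1.0
    return [xi - alpha * beta * di for xi, di in zip(x, d)]


A = [[1.0, 1.0, 1.0]]  # single constraint x1 + x2 + x3 = 1
x0 = [1 / 3, 1 / 3, 1 / 3]
x1 = relaxed_step(A, x0)
print(x1, f(x0), f(x1))  # x1 stays feasible and positive, and f decreases
```

Here the unrelaxed line-search step would make the third component negative, so $\beta_i < 1$ is what keeps the iterate in the positive orthant.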

If f(x) is nonlinear, then $H_{m+1}$ can be determined once and for all at the first step, since $\nabla f(x)$ generally changes from iteration to iteration, therefore modifying the search direction. If, however, $f(x) = c^T x$ is linear (we then have the LP problem), to modify the search direction we need to change $H_{m+1}$. As observed before, the simplex method is obtained by constructing $H_{m+1}$ with the implicit LX algorithm, every step of the method corresponding to a change of the parameters $e_{k_i}$. It can be shown, see [13], that the method of Karmarkar (equivalent to an earlier method of Evtushenko [3]) corresponds to using the generalized Huang algorithm, with initial matrix $H_1 = \operatorname{Diag}(x_i)$ changing from iteration to iteration. Another method, faster than Karmarkar's and having a superlinear instead of a linear rate of convergence and $O(\sqrt{n})$ instead of $O(n)$ complexity, again first proposed by Y. Evtushenko, is obtained by the generalized Huang algorithm with initial matrix $H_1 = \operatorname{Diag}(x_i^2)$.


See also 


> ABS Algorithms for Linear Equations and Linear 
Least Squares 

> Gauss-Newton Method: Least Squares, Relation to 
Newton’s Method 

> Generalized Total Least Squares 

> Least Squares Orthogonal Polynomials 


> Least Squares Problems 

> Nonlinear Least Squares: Newton-type Methods 
> Nonlinear Least Squares Problems 

> Nonlinear Least Squares: Trust Region Methods 
Introduction


The adaptive convexification algorithm is a method 
to solve semi-infinite optimization problems via a se- 
quence of feasible iterates. Its main idea [6] is to 
adaptively construct convex relaxations of the lower 
level problem, replace the relaxed lower level problems 
equivalently by their Karush-Kuhn-Tucker conditions, 
and solve the resulting mathematical programs with 
complementarity constraints. The convex relaxations 
are constructed with ideas from the αBB method of
global optimization. 


Feasibility in Semi-Infinite Optimization 


In a (standard) semi-infinite optimization problem a finite-dimensional decision variable is subject to infinitely many inequality constraints. For adaptive convexification one assumes the form

$$\mathrm{SIP}:\quad \min_{x \in X} f(x) \quad \text{subject to} \quad g(x,y) \le 0 \ \text{ for all } y \in [0,1]$$

with objective function $f \in C^2(\mathbb{R}^n, \mathbb{R})$, constraint function $g \in C^2(\mathbb{R}^n \times \mathbb{R}, \mathbb{R})$, a box constraint set $X = [x^\ell, x^u] \subset \mathbb{R}^n$ with $x^\ell < x^u \in \mathbb{R}^n$, and the set of infinitely many indices Y = [0,1]. Adaptive convexification easily generalizes to problems with additional inequality and equality constraints, a finite number of semi-infinite constraints, as well as higher-dimensional box index sets [6]. Reviews on semi-infinite programming are given in [8,13], and [9,14,15] give overviews of the existing numerical methods.

Classical numerical methods for SIP suffer from the drawback that their approximations of the feasible set $X \cap M$ with

$$M = \{x \in \mathbb{R}^n \mid g(x,y) \le 0 \text{ for all } y \in [0,1]\}$$


may contain infeasible points. In fact, discretization 
and exchange methods approximate M by finitely many 
inequalities corresponding to finitely many indices in 
Y = [0,1], yielding an outer approximation of M, 
and reduction based methods solve the Karush-Kuhn- 
Tucker system of SIP by a Newton-SQP approach. As 
a consequence, the iterates of these methods are not 
necessarily feasible for SIP, but only their limit might 
be. On the other hand, a first method producing feasible 
iterates for SIP was presented in the articles [3,4], where 
a branch-and-bound framework for the global solution 
of SIP generates convergent sequences of lower and up- 
per bounds for the globally optimal value. 

In fact, checking feasibility of a given point $x \in \mathbb{R}^n$ is the crucial problem in semi-infinite optimization. Clearly we have $x \in M$ if and only if $\varphi(x) \le 0$ holds with the function

$$\varphi:\ \mathbb{R}^n \to \mathbb{R},\quad x \mapsto \max_{y \in [0,1]} g(x,y).$$

The latter function is the optimal value function of the so-called lower level problem of SIP,

$$Q(x):\quad \max_{y \in \mathbb{R}} g(x,y) \quad \text{subject to} \quad 0 \le y \le 1.$$
The difficulty lies in the fact that φ(x) is the globally optimal value of Q(x), which might be hard to determine numerically. In fact, standard NLP solvers can only be expected to produce a local maximizer $y_{loc}$ of Q(x), which is not necessarily a global maximizer $y_{glob}$. Even if $g(x, y_{loc}) \le 0$ is satisfied, x might be infeasible, since $g(x, y_{loc}) \le 0 < \varphi(x) = g(x, y_{glob})$ may hold.
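The following sketch reproduces this failure mode on a hypothetical constraint function, $g(x,y) = y \sin(10y) - x$, which has two local maximizers in y on [0,1]. Projected gradient ascent stands in for a local NLP solver, and a dense grid stands in for the global value φ(x). Both choices, and the point x = 0.5, are assumptions made for the illustration.

```python
import math


def g(x, y):
    # hypothetical constraint function with two local maximizers in y on [0, 1]
    return y * math.sin(10.0 * y) - x


def dg_dy(x, y):
    return math.sin(10.0 * y) + 10.0 * y * math.cos(10.0 * y)


def local_max(x, y0, step=0.01, iters=2000):
    """Projected gradient ascent on [0, 1], a stand-in for a local NLP solver."""
    y = y0
    for _ in range(iters):
        y = min(1.0, max(0.0, y + step * dg_dy(x, y)))
    return y


def phi_grid(x, n=10001):
    """Brute-force approximation of phi(x) = max over y in [0, 1] of g(x, y)."""
    return max(g(x, k / (n - 1)) for k in range(n))


x = 0.5
y_loc = local_max(x, y0=0.1)  # converges to the local maximizer near y = 0.203
print(g(x, y_loc), phi_grid(x))  # negative local value, positive global value: x is infeasible
```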


Convex Lower Level Problems 


Assume for a moment that Q(x) is a convex optimization problem for all x ∈ X, that is, g(x, ·) is concave on Y = [0,1] for these x. An approach developed for so-called generalized semi-infinite programs in [18,19] then takes advantage of the fact that the solution set of a differentiable convex lower level problem satisfying a constraint qualification is characterized by its first order optimality condition. In fact, SIP and the Stackelberg game

$$\mathrm{SG}:\quad \min_{x \in X,\ y \in \mathbb{R}} f(x) \quad \text{subject to} \quad g(x,y) \le 0 \ \text{ and } y \text{ solves } Q(x)$$

are equivalent problems, and the restriction 'y solves Q(x)' in SG can be equivalently replaced by its Karush-Kuhn-Tucker condition. For this reformulation we use that the Lagrange function of Q(x),

$$L(x, y, \gamma_\ell, \gamma_u) = g(x,y) + \gamma_\ell\, y + \gamma_u (1 - y),$$

satisfies

$$\nabla_y L(x, y, \gamma_\ell, \gamma_u) = \nabla_y g(x,y) + \gamma_\ell - \gamma_u,$$

and obtain that the Stackelberg game is equivalent to the following mathematical program with complementarity constraints:

$$\begin{aligned}
\mathrm{MPCC}:\quad \min_{x \in X,\ y,\ \gamma_\ell,\ \gamma_u}\ & f(x) \quad \text{subject to} \quad g(x,y) \le 0,\\
& \nabla_y g(x,y) + \gamma_\ell - \gamma_u = 0,\\
& \gamma_\ell\, y = 0,\\
& \gamma_u (1 - y) = 0,\\
& \gamma_\ell,\ \gamma_u \ge 0,\\
& y,\ 1 - y \ge 0.
\end{aligned}$$

Overviews of solution methods for MPCC are given in [10,11,17]. One approach to solve MPCC is the reformulation of the complementarity constraints by a so-called NCP function, that is, a function $\phi: \mathbb{R}^2 \to \mathbb{R}$ with

$$\phi(a,b) = 0 \quad \text{if and only if} \quad a \ge 0,\ b \ge 0,\ ab = 0.$$


For numerical purposes one can regularize these nondifferentiable NCP functions. Although MPCC does not necessarily have to be solved via the NCP function formulation, in the following we will use NCP functions to keep the notation concise. In fact, MPCC can be equivalently rewritten as the nonsmooth problem

$$\begin{aligned}
P:\quad \min_{x \in X,\ y,\ \gamma_\ell,\ \gamma_u}\ & f(x) \quad \text{subject to} \quad g(x,y) \le 0,\\
& \nabla_y g(x,y) + \gamma_\ell - \gamma_u = 0,\\
& \phi(\gamma_\ell, y) = 0,\\
& \phi(\gamma_u, 1 - y) = 0.
\end{aligned}$$
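To see the equality constraints of P at work, the sketch below uses the Fischer-Burmeister function $\phi(a,b) = a + b - \sqrt{a^2 + b^2}$, one standard NCP function. The article does not prescribe a particular choice, so this one is an assumption, as is the concave lower level $g(x,y) = -(y - x)^2$. At the global maximizer y of Q(x) with the matching multipliers all three residuals vanish, while a non-optimal y leaves a nonzero residual.

```python
import math


def fischer_burmeister(a, b):
    """NCP function: zero exactly when a >= 0, b >= 0 and a * b = 0."""
    return a + b - math.hypot(a, b)


def dg_dy(x, y):
    # hypothetical concave lower level g(x, y) = -(y - x)**2
    return -2.0 * (y - x)


def kkt_residual(x, y, gl, gu):
    """Residuals of the three equality constraints of P at (x, y, gamma_l, gamma_u)."""
    return (dg_dy(x, y) + gl - gu,
            fischer_burmeister(gl, y),
            fischer_burmeister(gu, 1.0 - y))


results = {}
for x in (-0.5, 0.3, 1.7):
    y = min(1.0, max(0.0, x))                   # global maximizer of g(x, .) on [0, 1]
    slope = dg_dy(x, y)
    gl, gu = max(0.0, -slope), max(0.0, slope)  # multipliers of y >= 0 and 1 - y >= 0
    results[x] = kkt_residual(x, y, gl, gu)
print(results)                            # every residual vanishes
print(kkt_residual(0.3, 0.0, 0.0, 0.0))  # y = 0 is not optimal for x = 0.3
```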


The αBB Method


In αBB, a convex underestimator of a nonconvex function is constructed by decomposing it into a sum of
nonconvex terms of special type (e. g., linear, bilinear, 
trilinear, fractional, fractional trilinear, convex, uni- 
variate concave) and nonconvex terms of arbitrary type. 
The first type is then replaced by its convex envelope 
or very tight convex underestimators which are already 
known. A complete list of the tight convex underesti- 
mators of the above special type nonconvex terms is 
provided in [5]. 

For ease of presentation, here we will treat all terms as arbitrarily nonconvex. For these terms, αBB constructs convex underestimators by adding a quadratic relaxation function ψ. With the obvious modification we use this approach to construct a concave overestimator for a nonconcave function $g: [y^\ell, y^u] \to \mathbb{R}$ that is $C^2$ on an open neighborhood of $[y^\ell, y^u]$. With

$$\psi(y; \alpha, y^\ell, y^u) = \frac{\alpha}{2}\,(y - y^\ell)(y^u - y) \qquad (1)$$

we put

$$g_\alpha(y; \alpha, y^\ell, y^u) = g(y) + \psi(y; \alpha, y^\ell, y^u).$$

In the sequel we will suppress the dependence of $g_\alpha$ on $y^\ell, y^u$. For $\alpha \ge 0$ the function $g_\alpha$ clearly is an overestimator of g on $[y^\ell, y^u]$, and it coincides with g at the endpoints $y^\ell, y^u$ of the domain. Moreover, $g_\alpha$ is twice continuously differentiable with second derivative

$$\nabla^2 g_\alpha(y; \alpha) = \nabla^2 g(y) - \alpha$$
on $[y^\ell, y^u]$. Consequently $g_\alpha$ is concave on $[y^\ell, y^u]$ for

$$\alpha \ge \max_{y \in [y^\ell, y^u]} \nabla^2 g(y) \qquad (2)$$

(cf. also [1,2]). The computation of α thus involves a global optimization problem itself. Note, however, that one may use any upper bound for the right-hand side in (2). Such upper bounds can be provided by interval methods (see, e.g., [5,7,12]). An α satisfying (2) is called a convexification parameter.

Combining these facts shows that for

$$\alpha \ge \max\Big(0,\ \max_{y \in [y^\ell, y^u]} \nabla^2 g(y)\Big)$$

the function $g_\alpha(y; \alpha)$ is a concave overestimator of g on $[y^\ell, y^u]$.
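A quick numerical check of these properties is given below for the hypothetical function g(y) = cos(3y) on [0, 1], which is not concave there. Since $|g''(y)| = 9|\cos(3y)| \le 9$, the value α = 9 is an upper bound in the sense of (2), which the text allows in place of the exact maximum. The grid test only illustrates the properties; it does not prove them.

```python
import math


def psi(y, alpha, yl, yu):
    """Quadratic relaxation function (1)."""
    return 0.5 * alpha * (y - yl) * (yu - y)


def g(y):
    return math.cos(3.0 * y)  # hypothetical nonconcave function on [0, 1]


yl, yu = 0.0, 1.0
alpha = 9.0  # upper bound for max g'' on [0, 1], since |g''| <= 9


def g_alpha(y):
    return g(y) + psi(y, alpha, yl, yu)


n = 1001
ys = [yl + (yu - yl) * k / (n - 1) for k in range(n)]
vals = [g_alpha(y) for y in ys]
overestimates = all(v >= g(y) for v, y in zip(vals, ys))
matches_at_ends = vals[0] == g(yl) and vals[-1] == g(yu)
# concavity on the grid: all second differences are nonpositive
second_diffs = [vals[k - 1] - 2 * vals[k] + vals[k + 1] for k in range(1, n - 1)]
print(overestimates, matches_at_ends, max(second_diffs) <= 0.0)
```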


Formulation 


For $N \in \mathbb{N}$ let $0 = \eta^0 < \eta^1 < \cdots < \eta^{N-1} < \eta^N = 1$ define a subdivision of Y = [0,1], that is, with $K = \{1, \ldots, N\}$ and

$$Y^k = [\eta^{k-1}, \eta^k], \quad k \in K,$$

we have

$$Y = \bigcup_{k \in K} Y^k.$$


A trivial but very useful observation is that the single semi-infinite constraint

$$g(x,y) \le 0 \quad \text{for all } y \in Y$$

is equivalent to the finitely many semi-infinite constraints

$$g(x,y) \le 0 \quad \text{for all } y \in Y^k,\ k \in K.$$


Given a subdivision, one can construct concave over- 
estimators for each of these finitely many semi- 
infinite constraints, solve the corresponding optimiza- 
tion problem, and adaptively refine the subdivision. 

The following lemma formulates the obvious fact 
that replacing g by overestimators on each subdivision 
node $Y^k$ results in an approximation of M by feasible
points. 


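Lemma 1 For each $k \in K$ let $g^k: X \times Y^k \to \mathbb{R}$, and let $\bar x \in X$ be given such that for all $k \in K$ and all $y \in Y^k$ we have $g(\bar x, y) \le g^k(\bar x, y)$. Then the constraints

$$g^k(\bar x, y) \le 0 \quad \text{for all } y \in Y^k,\ k \in K,$$

entail $\bar x \in M$.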


αBB for the Lower Level


For the construction of these overestimators one uses ideas of the αBB method. In fact, for each k ∈ K we put

$$g^k_{\alpha_k}:\ X \times Y^k \to \mathbb{R},\quad (x,y) \mapsto g(x,y) + \psi(y; \alpha_k, \eta^{k-1}, \eta^k) \qquad (3)$$

with the quadratic relaxation function ψ from (1) and

$$\alpha_k > \max\Big(0,\ \max_{(x,y) \in X \times Y^k} \nabla_y^2 g(x,y)\Big). \qquad (4)$$

Note that the latter condition on $\alpha_k$ is uniform in x. We emphasize that with the single bound

$$\bar\alpha > \max\Big(0,\ \max_{(x,y) \in X \times Y} \nabla_y^2 g(x,y)\Big) \qquad (5)$$

the choices $\alpha_k := \bar\alpha$ satisfy (4) for all $k \in K$. Moreover, the $\alpha_k$ can always be chosen such that $\alpha_k \le \bar\alpha$, $k \in K$. The following properties of $g^k_{\alpha_k}$ are easily verified.


Lemma 2 ([6]) For each k ∈ K let $g^k_{\alpha_k}$ be given by (3). Then the following holds:

(i) For all $(x,y) \in X \times Y^k$ we have $g(x,y) \le g^k_{\alpha_k}(x,y)$.

(ii) For all $x \in X$, the function $g^k_{\alpha_k}(x, \cdot)$ is concave on $Y^k$.

Now consider the following approximation of the feasible set M, where $E = \{\eta^k \mid k \in K\}$ denotes the set of subdivision points, and α the vector of convexification parameters:

$$M_{\alpha BB}(E, \alpha) = \{x \in \mathbb{R}^n \mid g^k_{\alpha_k}(x,y) \le 0 \text{ for all } y \in Y^k,\ k \in K\}.$$

By Lemma 1 and Lemma 2(i) we have $M_{\alpha BB}(E, \alpha) \subset M$. This means that any solution concept for

$$\mathrm{SIP}_{\alpha BB}(E, \alpha):\quad \min_{x \in X} f(x) \quad \text{subject to} \quad x \in M_{\alpha BB}(E, \alpha),$$

be it global solutions, local solutions or stationary points, will at least lead to feasible points of SIP (provided that $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ is consistent).
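The inner approximation can be made concrete for a hypothetical SIP with n = 1 and f(x) = x: minimize x subject to $\sin(5y) - x \le 0$ for all y ∈ [0,1], whose optimal value is 1. Here $\nabla_y^2 g = -25\sin(5y) \le 25$, so α = 26 satisfies (4) on every $Y^k$, and the optimal value of $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ is the largest maximum of the concave overestimators. The sketch below computes that value by ternary search, which is valid because of Lemma 2(ii), for a coarse and a finer subdivision E. The example, the solver choice and the names are assumptions.

```python
import math

ALPHA = 26.0  # max of d2g/dy2 = -25 sin(5y) on [0, 1] is 25 < 26, so (4) holds on every Y^k


def g(y):
    # hypothetical SIP with n = 1:  min x  s.t.  g(y) - x <= 0 for all y in [0, 1]
    return math.sin(5.0 * y)


def overestimator_max(lo, hi, alpha=ALPHA, iters=200):
    """Maximize the concave function g(y) + alpha/2 (y - lo)(hi - y) on [lo, hi] by ternary search."""
    def h(y):
        return g(y) + 0.5 * alpha * (y - lo) * (hi - y)
    a, b = lo, hi
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if h(m1) < h(m2):
            a = m1
        else:
            b = m2
    return h(0.5 * (a + b))


def smallest_feasible_x(E):
    """Smallest x in M_aBB(E, alpha): x has to dominate every overestimator on its Y^k."""
    return max(overestimator_max(E[k - 1], E[k]) for k in range(1, len(E)))


coarse = smallest_feasible_x([0.0, 1.0])
fine = smallest_feasible_x([k / 8 for k in range(9)])
print(coarse, fine)  # both exceed the SIP optimal value 1; the finer E is far less conservative
```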

The problem $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ has finitely many lower level problems $Q^k(x)$, k ∈ K, with

$$Q^k(x):\quad \max_{y \in \mathbb{R}} g^k_{\alpha_k}(x,y) \quad \text{subject to} \quad \eta^{k-1} \le y \le \eta^k.$$

Since the inequality (4) is strict, the convex problem $Q^k(x)$ has a unique solution $y^k(x)$ for each k ∈ K and x ∈ X. Recall that $y \in Y^k$ is called active for the constraint $\max_{y \in Y^k} g^k_{\alpha_k}(x,y) \le 0$ at $\bar x$ if $g^k_{\alpha_k}(\bar x, y) = 0$ holds. By the uniqueness of the global solution of $Q^k(\bar x)$ there exists at most one active index for each k ∈ K, namely $y^k(\bar x)$. Thus, one can consider the finite active index sets

$$K_0(\bar x) = \{k \in K \mid g^k_{\alpha_k}(\bar x, y^k(\bar x)) = 0\}, \qquad Y_0^{\alpha BB}(\bar x) = \{y^k(\bar x) \mid k \in K_0(\bar x)\}.$$


The MPCC Reformulation 
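Following the ideas used to treat convex lower level problems, $y^k$ solves $Q^k(x)$ if and only if $(x, y^k, \gamma_\ell^k, \gamma_u^k)$ solves the system

$$\begin{aligned}
\nabla_y g^k_{\alpha_k}(x, y^k) + \gamma_\ell^k - \gamma_u^k &= 0,\\
\phi(\gamma_\ell^k, y^k - \eta^{k-1}) &= 0,\\
\phi(\gamma_u^k, \eta^k - y^k) &= 0
\end{aligned}$$

with some $\gamma_\ell^k, \gamma_u^k$, and φ denoting some NCP function. With

$$\begin{aligned}
w &:= (x, y^k, \gamma_\ell^k, \gamma_u^k,\ k \in K),\\
F(w) &:= f(x),\\
G^k(w; E, \alpha) &:= g(x, y^k) + \frac{\alpha_k}{2}(y^k - \eta^{k-1})(\eta^k - y^k),\\
H^k(w; E, \alpha) &:= \begin{pmatrix} \nabla_y g(x, y^k) + \alpha_k\Big(\dfrac{\eta^{k-1} + \eta^k}{2} - y^k\Big) + \gamma_\ell^k - \gamma_u^k\\[4pt] \phi(\gamma_\ell^k, y^k - \eta^{k-1})\\[4pt] \phi(\gamma_u^k, \eta^k - y^k) \end{pmatrix},
\end{aligned}$$

one can thus replace $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ equivalently by the nonsmooth problem

$$P(E, \alpha):\quad \min_w F(w) \quad \text{subject to} \quad G^k(w; E, \alpha) \le 0,\quad H^k(w; E, \alpha) = 0,\quad k \in K.$$

The latter problem can be solved to local optimality by MPCC algorithms [10,11,17]. For a local minimizer $\bar w$ of $P(E, \alpha)$ the subvector $\bar x$ of $\bar w$ is a local minimizer and, hence, a stationary point of $\mathrm{SIP}_{\alpha BB}(E, \alpha)$.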


Method 


The main idea of the adaptive convexification algorithm is to compute a stationary point $\bar x$ of $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ by the approach from the previous section, and to terminate if $\bar x$ is also stationary for SIP within given tolerances. If $\bar x$ is not stationary, the algorithm refines the subdivision E in the spirit of exchange methods [8,15] by adding the active indices $Y_0^{\alpha BB}(\bar x)$ to E, and constructs a refined problem $\mathrm{SIP}_{\alpha BB}(E \cup Y_0^{\alpha BB}(\bar x), \tilde\alpha)$ by the following procedure. Note that, in view of Carathéodory's theorem, the number of elements of $Y_0^{\alpha BB}(\bar x)$ may be bounded by n + 1.


Refinement Step 

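For any $\tilde\eta \in Y_0^{\alpha BB}(\bar x)$, let k ∈ K be the index with $\tilde\eta \in (\eta^{k-1}, \eta^k)$. Put $Y^{k,1} = [\eta^{k-1}, \tilde\eta]$ and $Y^{k,2} = [\tilde\eta, \eta^k]$, let $\alpha_{k,1}$ and $\alpha_{k,2}$ be the corresponding convexification parameters, put

$$\begin{aligned}
g^{k,1}(x,y) &= g(x,y) + \frac{\alpha_{k,1}}{2}(y - \eta^{k-1})(\tilde\eta - y),\\
g^{k,2}(x,y) &= g(x,y) + \frac{\alpha_{k,2}}{2}(y - \tilde\eta)(\eta^k - y),
\end{aligned}$$

and define $M_{\alpha BB}(E \cup \{\tilde\eta\}, \tilde\alpha)$ by replacing the constraint

$$g^k_{\alpha_k}(x,y) \le 0 \quad \text{for all } y \in Y^k$$

in $M_{\alpha BB}(E, \alpha)$ by the two new constraints

$$g^{k,i}(x,y) \le 0 \quad \text{for all } y \in Y^{k,i},\ i = 1, 2,$$

and by replacing the entry $\alpha_k$ of α by the two new entries $\alpha_{k,i}$, i = 1, 2.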


The Algorithm 
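The point $\bar x$ is stationary for $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ (in the sense of Fritz John) if $\bar x \in M_{\alpha BB}(E, \alpha)$ and if there exist $y^k \in Y_0^{\alpha BB}(\bar x)$, $1 \le k \le n+1$, and $(\kappa, \lambda) \in S^{n+1}$ (the (n+1)-dimensional standard simplex) with

$$\kappa \nabla f(\bar x) + \sum_{k=1}^{n+1} \lambda_k \nabla_x g(\bar x, y^k) = 0,$$

$$\lambda_k\, g^k(\bar x, y^k) = 0, \quad 1 \le k \le n+1.$$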
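For the adaptive convexification algorithm the notions of active index, stationarity, and set unification are relaxed by certain tolerances.


Definition 1 For $\varepsilon_{act}, \varepsilon_{stat}, \varepsilon_\cup > 0$ we say that

(i) $y^k$ is $\varepsilon_{act}$-active for $g^k$ at $\bar x$ if $g^k(\bar x, y^k) \in [-\varepsilon_{act}, 0]$;

(ii) $\bar x$ is $\varepsilon_{stat}$-stationary for SIP with $\varepsilon_{act}$-active indices if $\bar x \in M$ and if there exist $y^k \in Y$, $1 \le k \le n+1$, and $(\kappa, \lambda) \in S^{n+1}$ such that

$$\Big\| \kappa \nabla f(\bar x) + \sum_{k=1}^{n+1} \lambda_k \nabla_x g(\bar x, y^k) \Big\| \le \varepsilon_{stat},$$

$$\lambda_k\, g(\bar x, y^k) \in [-\lambda_k \varepsilon_{act}, 0], \quad 1 \le k \le n+1,$$

hold, and

(iii) the $\varepsilon_\cup$-union of E and $\tilde\eta$ is $E \cup \{\tilde\eta\}$ if

$$\min\{\tilde\eta - \eta^{k-1},\ \eta^k - \tilde\eta\} > \varepsilon_\cup \cdot (\eta^k - \eta^{k-1})$$

holds for the k ∈ K with $\tilde\eta \in [\eta^{k-1}, \eta^k]$, and E otherwise (i.e., $\tilde\eta$ is not unified with E if its distance from E is too small).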
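Rule (iii) is a small piece of bookkeeping that may be clearer as code. The sketch below is an illustration: it assumes E is stored as a sorted Python list of subdivision points, and the function name `eps_union` is hypothetical.

```python
import bisect


def eps_union(E, eta, eps_u):
    """eps_U-union of a sorted subdivision E with the point eta, Definition 1(iii)."""
    k = bisect.bisect_left(E, eta)
    if k == 0 or k == len(E):
        return E  # eta is not inside (eta^0, eta^N]: nothing to add
    lo, hi = E[k - 1], E[k]  # eta lies in Y^k = [eta^{k-1}, eta^k]
    if min(eta - lo, hi - eta) > eps_u * (hi - lo):
        return E[:k] + [eta] + E[k:]
    return E  # eta is too close to an existing subdivision point


print(eps_union([0.0, 0.5, 1.0], 0.26, 0.1))  # [0.0, 0.26, 0.5, 1.0]
print(eps_union([0.0, 0.5, 1.0], 0.49, 0.1))  # [0.0, 0.5, 1.0]
```

The relative threshold $\varepsilon_\cup (\eta^k - \eta^{k-1})$ prevents the subdivision from accumulating points that almost coincide with existing ones.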


In [6] it is shown that Algorithm 1 is well-defined, con- 
vergent and finitely terminating. Furthermore, the fol- 
lowing feasibility result holds. 


Theorem 2 ([6]) Let $(x^\nu)_\nu$ be a sequence of points generated by Algorithm 1. Then all $x^\nu$, $\nu \in \mathbb{N}$, are feasible for SIP, the sequence $(x^\nu)_\nu$ has an accumulation point, each such accumulation point $x^*$ is feasible for SIP, and $f(x^*)$ provides an upper bound for the optimal value of SIP.


Numerical examples for the performance of the method 
from Chebyshev approximation and design centering 
are given in [6]. 


A Consistent Initial Approximation 


Even if the feasible set M of SIP is consistent, there is no guarantee that its approximations $M_{\alpha BB}(E, \alpha)$ are also consistent. For Step 1 of Algorithm 1, [6] suggests the following phase I approach: use Algorithm 1 to construct adaptive convexifications of

$$\mathrm{SIP}^{ph\,I}:\quad \min_{(x,z) \in X \times \mathbb{R}} z \quad \text{subject to} \quad g(x,y) \le z \ \text{ for all } y \in [0,1]$$

until a feasible point (x, z) with z ≤ 0 of $\mathrm{SIP}^{ph\,I}_{\alpha BB}(E, \alpha)$ is found with some subdivision E and convexification parameters α. The point x is then obviously also feasible for $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ and can be used as an initial point to solve the latter problem. Due to the possible nonconvexity of the upper level problem of SIP, this phase I approach is not necessarily successful, but possible remedies for this situation are given in [6].


Algorithm 1 (Adaptive convexification algorithm)

Step 1: Determine a uniform convexification parameter $\bar\alpha$ with (5), choose $N \in \mathbb{N}$, $E = \{\eta^0, \ldots, \eta^N\} \subset Y$ and $\alpha_k \le \bar\alpha$, $k \in K = \{1, \ldots, N\}$, such that $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ is consistent, as well as tolerances $\varepsilon_{act}, \varepsilon_{stat}, \varepsilon_\cup > 0$ with $\varepsilon_\cup < 2\varepsilon_{act}/\bar\alpha$.

Step 2: By solving $P(E, \alpha)$, compute a stationary point $\bar x$ of $\mathrm{SIP}_{\alpha BB}(E, \alpha)$ with $\varepsilon_{act}$-active indices $y^k$, $1 \le k \le n+1$, and multipliers $(\kappa, \lambda)$.

Step 3: Terminate if $\bar x$ is $\varepsilon_{stat}$-stationary for SIP with $(2\varepsilon_{act})$-active indices $y^k$, $1 \le k \le n+1$, from Step 2 and multipliers $(\kappa, \lambda)$ from Step 2. Otherwise construct a new set $\tilde E$ of subdivision points as the $\varepsilon_\cup$-union of E and $\{y^k \mid 1 \le k \le n+1\}$, and perform a refinement step for the elements in $\tilde E \setminus E$ to construct a new feasible set $M_{\alpha BB}(\tilde E, \tilde\alpha)$.

Step 4: Put $E = \tilde E$, $\alpha = \tilde\alpha$, and go to Step 2.

To initialize Algorithm 1 for phase I, select some point $\bar x$ in the box X and put $E^1 = \{0, 1\}$, that is, $Y^1 = Y = [0,1]$. Compute $\alpha_1$ according to (4) and solve the convex optimization problem $Q^1(\bar x)$ with standard software. With its optimal value $\bar z$, the point $(\bar x, \bar z)$ is feasible for $\mathrm{SIP}^{ph\,I}_{\alpha BB}(E^1, \alpha_1)$.


A Certificate for Global Optimality 


After termination of Algorithm 1 one can exploit the fact that the set E ⊂ [0,1] contains indices that should also yield a good outer approximation of M. The optimal value of the problem

$$P_{outer}(E):\quad \min_{x \in X} f(x) \quad \text{subject to} \quad g(x, \eta) \le 0,\ \eta \in E,$$

yields a rigorous lower bound for the optimal value of SIP. If $P_{outer}(E)$ can actually be solved to global optimality (e.g., with a standard NLP solver when the problem is convex with respect to x), then a comparison of this lower bound for the optimal value of SIP with the upper bound from Algorithm 1 can yield a certificate of global optimality for SIP up to some tolerance.
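For a hypothetical example the two bounds can be compared directly. Take n = 1, f(x) = x and $g(x,y) = \sin(5y) - x$, so the SIP optimal value is 1 and α = 26 satisfies (4). The sketch below evaluates the upper bound (the optimal value of $\mathrm{SIP}_{\alpha BB}(E, \alpha)$, via ternary search on the concave overestimators) and the lower bound (the optimal value of $P_{outer}(E)$) for a uniform subdivision E with eight subintervals. The subdivision is an assumption and is not produced by Algorithm 1 here.

```python
import math

ALPHA = 26.0  # hypothetical example: f(x) = x, g(x, y) = sin(5y) - x, max d2g/dy2 = 25


def overestimator_max(lo, hi, alpha=ALPHA, iters=200):
    """Max of the concave overestimator sin(5y) + alpha/2 (y - lo)(hi - y) on [lo, hi]."""
    def h(y):
        return math.sin(5.0 * y) + 0.5 * alpha * (y - lo) * (hi - y)
    a, b = lo, hi
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if h(m1) < h(m2):
            a = m1
        else:
            b = m2
    return h(0.5 * (a + b))


E = [k / 8 for k in range(9)]
upper = max(overestimator_max(E[k - 1], E[k]) for k in range(1, len(E)))  # value of SIP_aBB(E, alpha)
lower = max(math.sin(5.0 * eta) for eta in E)  # value of P_outer(E): x >= sin(5 eta), eta in E
print(lower, upper, upper - lower)  # the true optimal value 1 lies in [lower, upper]
```

Because $P_{outer}(E)$ here is a linear problem in x, it is solved globally, so the computed gap is a valid optimality tolerance for this example.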


Conclusions 


The adaptive convexification algorithm provides an 
easily implementable way to solve semi-infinite opti- 
mization problems with feasible iterates. To explain its 
basic ideas, in [6] the algorithm is presented in its sim- 
plest form. It can be improved in a number of ways, 
for example in the magnitude of the convexification pa- 
rameters and in their adaptive refinement, or by using 
other convexification techniques. Although the numer- 
ical results from [6] are very promising, further work is 
needed on error estimates on the numerical solution of 
the auxiliary problem P(E, α), which is assumed to be
solved to exact local optimality by the present adaptive 
convexification algorithm. 


See also 


> αBB Algorithm

> Bilevel Optimization: Feasibility Test and Flexibility 
Index 

> Convex Discrete Optimization 

> Generalized Semi-infinite Programming: Optimality 
Conditions 
This article contains a survey of some well-known facts about the complexity of global optimization, and also describes some results concerning the average-case complexity.

Consider the following optimization problem. Given a class $F$ of objective functions $f$ defined on a compact subset of $d$-dimensional Euclidean space, the goal is to approximate the global minimum of $f$ based on evaluation of the function at sequentially selected points. The focus will be on the error after $n$ observations,

$$\Delta_n = f_n - f^*,$$

where $f_n$ is the smallest of the first $n$ observed function values and $f^*$ is the global minimum of $f$ (other approximations besides $f_n$ are often considered).

Complexity of optimization is usually studied in the worst- or average-case setting. For a worst-case analysis to be useful, the class of objective functions $F$ must be quite restricted. Consider the case where $F$ is a subset of the continuous functions on a compact set. It is convenient to consider the class $F = C^r([0, 1]^d)$ of real-valued functions on $[0, 1]^d$ with continuous derivatives up to order $r \ge 0$. Suppose that $r > 0$ and $f^{(r)}$ is bounded. In this case $\Theta(\varepsilon^{-d/r})$ function evaluations are needed to ensure that the error is at most $\varepsilon$ for any $f \in F$; see [8].
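A quick calculation shows what this bound means in practice. Taking, purely for illustration, $r = 1$ and $\varepsilon = 10^{-2}$, and ignoring the constants hidden in $\Theta$:

$$\varepsilon^{-d/r} = \left(10^{-2}\right)^{-d} = 10^{2d}, \qquad d = 1:\ 10^{2}, \quad d = 2:\ 10^{4}, \quad d = 10:\ 10^{20}.$$

For fixed smoothness $r$, the worst-case number of evaluations therefore grows exponentially with the dimension $d$.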

An adaptive algorithm is one for which the (n + 1)st 
observation point is determined as a function of the 
previous observations, while a nonadaptive algorithm 
chooses each point independently of the function val- 
ues. In the worst-case setting, adaptation does not help 
much under quite general assumptions. If $F$ is convex and symmetric (in the sense that $-F = F$), then the maximum error of an adaptive algorithm with $n$ observations is not smaller than the maximum error of a nonadaptive method with $n + 1$ observations; see [4].

Virtually all global optimization methods in practical use are adaptive. For a survey of such methods see [6,9]. The fact that the worst-case performance cannot be significantly improved by adaptation leads to consideration of alternative settings that may be more appropriate. One such setting is the average-case setting, in which a probability measure $P$ on $F$ is chosen. The object of study is then the sequence of random variables $\Delta_n(f)$, and the questions include under what conditions (for what algorithms) the error converges to zero and, for convergent algorithms, the speed of convergence. While the average-case error is often defined as the mathematical expectation of the error, it is useful to take a broader view and consider, for example, convergence in probability of $a_n \Delta_n$ for some normalizing sequence $\{a_n\}$.

With the average-case setting one can consider less restricted classes $F$ than in the worst-case setting. As $F$ gets larger, the worst case deviates more and more from the average case, but it may occur on only a small portion of the set $F$. Even for continuous functions the worst case is arbitrarily bad.

Most of what is known about the average-case complexity of optimization is in the one-dimensional setting under the Wiener probability measure on $C([0, 1])$. Under the Wiener measure, the increments $f(t) - f(s)$, $s < t$, have a normal distribution with mean zero and variance $t - s$, and are independent for disjoint intervals. Almost every $f$ is nowhere differentiable, and the set of local minima is dense in the unit interval. One can thus think of the Wiener measure as corresponding to assuming ‘only’ continuity, i.e., a worst-case probabilistic assumption.

K. Ritter proved [5] that the best nonadaptive algorithms have error of order $n^{-1/2}$ after $n$ function evaluations; the optimal order is achieved by observing at equally spaced points. Since the choice of each new observation point does not depend on any of the previous observations, the computation can be carried out in parallel. Thus, under the Wiener measure, the optimal nonadaptive order of convergence can be accomplished with an algorithm whose computational cost grows linearly with the number of observations and which uses constant storage. This gives the baseline against which adaptive algorithms are compared.
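The nonadaptive baseline can be checked empirically. The sketch below is a Monte Carlo experiment under stated assumptions, not Ritter's proof: a Wiener path is approximated on a fine grid of $m$ points, the minimum over that fine grid stands in for the true minimum, and the equally spaced rule observes $t_k = k/n$, $k = 1, \ldots, n$. The helper names and default sizes are hypothetical.

```python
import math
import random
from typing import List


def brownian_path(m: int, rng: random.Random) -> List[float]:
    """Sample W(0), W(1/m), ..., W(1) of a standard Wiener process (W(0) = 0)."""
    step_sd = math.sqrt(1.0 / m)  # increments over length 1/m have variance 1/m
    w = [0.0]
    for _ in range(m):
        w.append(w[-1] + rng.gauss(0.0, step_sd))
    return w


def grid_error(w: List[float], n: int) -> float:
    """Error of observing the n equally spaced points t_k = k/n, k = 1, ..., n.

    The minimum over the fine sample path stands in for the true minimum,
    so n must divide len(w) - 1.
    """
    stride = (len(w) - 1) // n
    best_observed = min(w[k * stride] for k in range(1, n + 1))
    return best_observed - min(w)


def mean_grid_error(n: int, trials: int = 100, m: int = 4096, seed: int = 0) -> float:
    """Monte Carlo estimate of the average error of the equally spaced rule."""
    rng = random.Random(seed)
    return sum(grid_error(brownian_path(m, rng), n) for _ in range(trials)) / trials


for n in (16, 64, 256):
    err = mean_grid_error(n)
    # err * sqrt(n) should be of roughly the same size for each n, up to Monte
    # Carlo noise and a downward bias from the finite fine grid at large n.
    print(n, err, err * math.sqrt(n))
```

Each observation point is fixed in advance, so the loop over $k$ could be evaluated in parallel, which is the property the text relies on.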

Recent studies (as of 2000) have formally established the improved power of adaptive methods in the average-case setting by analyzing the convergence rates of certain adaptive algorithms. A randomized algorithm is described in [1] with the property that, for any $0 < \delta < 1$, a version can be constructed so that under the Wiener measure the error converges to zero at rate $n^{-1+\delta}$. This algorithm maintains a memory of two past observation values, and its computational cost grows linearly with the number of iterations. Therefore, the convergence rate of this adaptive algorithm improves from the nonadaptive $n^{-1/2}$ rate to $n^{-1+\delta}$ with only a constant increase in storage.

Algorithms based on a random model for the objective function are well suited to average-case analysis. H. Kushner proposed [3] a global optimization method based on modeling the objective function as a Wiener process. Let $\{z_n\}$ be a sequence of positive numbers, and let the $(n + 1)$st point be chosen to maximize the probability that the new function value is less than the previously observed minimum minus $z_n$. This class of algorithms, often called P-algorithms, was given a formal justification by A. Žilinskas [7].
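A minimal sketch of such a P-algorithm on $[0, 1]$ is given below, assuming a fixed offset $z$ in place of the sequence $\{z_n\}$. Under the Wiener model, $f$ between neighbouring observations $(t_i, y_i)$ and $(t_{i+1}, y_{i+1})$ is a Brownian bridge, with mean given by linear interpolation and variance $(t - t_i)(t_{i+1} - t)/(t_{i+1} - t_i)$. A short calculation with this conditional distribution (done here, not taken from [3] or [7]) shows that, with threshold $c = \min_j y_j - z$, $a = y_i - c$ and $b = y_{i+1} - c$, the probability $P(f(t) < c)$ is maximized inside the interval at $t = t_i + (t_{i+1} - t_i)\,a/(a + b)$, and across intervals it is largest where $(t_{i+1} - t_i)/(ab)$ is largest. The function name and test objective are hypothetical.

```python
from typing import Callable, List, Tuple


def p_algorithm(f: Callable[[float], float], n_evals: int, z: float = 0.01) -> Tuple[float, float]:
    """Sketch of a Kushner-type P-algorithm on [0, 1] with a fixed threshold offset z > 0.

    The next point maximizes P(f(t) < min(y) - z) under the Brownian-bridge model
    between the two neighbouring observations.
    """
    ts: List[float] = [0.0, 1.0]
    ys: List[float] = [f(0.0), f(1.0)]
    while len(ts) < n_evals:
        c = min(ys) - z  # target: improve on the record value by at least z
        # Interval with the largest success probability.
        i = max(
            range(len(ts) - 1),
            key=lambda j: (ts[j + 1] - ts[j]) / ((ys[j] - c) * (ys[j + 1] - c)),
        )
        a, b = ys[i] - c, ys[i + 1] - c  # both >= z > 0
        t_new = ts[i] + (ts[i + 1] - ts[i]) * a / (a + b)
        ts.insert(i + 1, t_new)  # keeps ts sorted because a, b > 0
        ys.insert(i + 1, f(t_new))
    k = min(range(len(ys)), key=ys.__getitem__)
    return ts[k], ys[k]


t_best, y_best = p_algorithm(lambda t: (t - 0.3) ** 2, 200)
print(t_best, y_best)
```

Because every past observation is kept and each step scans all intervals, this sketch uses linear storage and quadratic total work.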

By allowing $\{z_n\}$ to depend on the past observations instead of being a fixed deterministic sequence, it is possible to establish a much better convergence rate than that of the randomized algorithm described above. In [2] an algorithm was constructed with the property that the error converges to zero for any continuous function and, furthermore, the error is of order $e^{-n c_n}$, where $\{c_n\}$ (a parameter of the algorithm) is a deterministic sequence that can be chosen to approach zero at an arbitrarily slow rate. Notice that the convergence rate is now almost exponential in the number of observations $n$. The computational cost of the algorithm grows quadratically, and the storage increases linearly, since all past observations must be stored.


See also 


> Adaptive Simulated Annealing and its Application 
to Protein Folding 
> Global Optimization Based on Statistical Models 
The adaptive simulated annealing (ASA) algorithm [3] has been shown to be faster and more efficient than simulated annealing and genetic algorithms [4]. In this article we first outline some of the aspects of the method and specific computational details, and then review the application of the ASA method to biomolecular structure determination [15], specifically for Met-Enkephalin and a model of the poly(L-Alanine) system.


The ASA Method 


For a system described by a cost function $E(\{p^i\})$, where all $p^i$ ($i = 1, \ldots, D$) are parameters (variables) having ranges $[A_i, B_i]$, the ASA procedure to find the global optimum of $E$ contains the following elements.


Monte-Carlo Configurations 


Given the $k$th point $p_k$ in the $D$-dimensional configuration space, the new point $p_{k+1}$ is generated by

$$p^i_{k+1} = p^i_k + y^i (B_i - A_i), \qquad (1)$$

where the random variables $y^i \in [-1, 1]$ (non-uniform) are generated from a random number $u^i$ uniformly distributed in $[0, 1]$ and the temperature $T_i$ associated with parameter $p^i$, as follows:

$$y^i = \operatorname{sgn}\left(u^i - \tfrac{1}{2}\right) T_i \left[\left(1 + \frac{1}{T_i}\right)^{|2u^i - 1|} - 1\right]. \qquad (2)$$


Note that if $p^i_{k+1}$ falls outside the range $[A_i, B_i]$, it is discarded and the process is repeated until it falls within the range. The choice of $y^i$ is made so that each of the $D$ parameters follows the probability density

$$g_i(y^i; T_i) = \frac{1}{2\left(|y^i| + T_i\right) \ln\left(1 + 1/T_i\right)}, \qquad (3)$$


which is chosen to ensure that any point in configuration space can be sampled infinitely often in annealing time with the cooling schedule outlined below. Thus, at any annealing time $k_0$, the probability of not generating a global optimum, given infinite time, is zero:

$$\prod_{k=k_0}^{\infty} \left(1 - g_k\right) = 0, \qquad (4)$$

where $g_k$ is the distribution function at time step $k$. Note that all atoms move at each Monte-Carlo step in ASA. A Boltzmann acceptance criterion is then applied to the difference in the cost function.
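The generation step (1)-(2), with the out-of-range redraw and a Boltzmann acceptance test, can be sketched as follows. This is an illustration under stated assumptions, not Ingber's ASA code: the function names are hypothetical, the redraw is done coordinate by coordinate as the text describes, and the Metropolis form $\exp(-\Delta E/T)$ is assumed for the acceptance rule.

```python
import math
import random
from typing import List, Sequence


def asa_y(u: float, T: float) -> float:
    """Eq. (2): map u ~ U[0, 1] to y in [-1, 1]; small T concentrates y near 0."""
    sign = 1.0 if u >= 0.5 else -1.0
    return sign * T * ((1.0 + 1.0 / T) ** abs(2.0 * u - 1.0) - 1.0)


def asa_generate(
    p: Sequence[float], A: Sequence[float], B: Sequence[float],
    T: Sequence[float], rng: random.Random,
) -> List[float]:
    """Eq. (1): p_{k+1}^i = p_k^i + y^i (B_i - A_i), redrawing y^i until the
    new coordinate lies in [A_i, B_i]. Every coordinate moves at each step."""
    new_p = []
    for p_i, a_i, b_i, t_i in zip(p, A, B, T):
        while True:
            cand = p_i + asa_y(rng.random(), t_i) * (b_i - a_i)
            if a_i <= cand <= b_i:
                new_p.append(cand)
                break
    return new_p


def boltzmann_accept(delta_e: float, t_cost: float, rng: random.Random) -> bool:
    """Boltzmann (Metropolis) acceptance of a cost change delta_e at cost temperature t_cost."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / t_cost)


rng = random.Random(42)
angles = [0.0, 1.0, -2.0]  # hypothetical dihedral angles in radians
lo, hi = [-math.pi] * 3, [math.pi] * 3
print(asa_generate(angles, lo, hi, [0.1, 0.1, 0.1], rng))
```

Inverting the cumulative distribution of (3) gives exactly the map (2), which is why a single uniform draw per parameter suffices.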


Annealing Schedule 


The annealing schedule for each parameter temperature, starting from $T_{0i}$, and similarly for the cost temperature, is given by

$$T_i(k_i) = T_{0i} \exp\left(-c_i k_i^{1/D}\right), \qquad (5)$$

where $c_i$ and $k_i$ are the annealing scale and ASA step of parameter $p^i$. The index for re-annealing the cost function is determined by the number of accepted points instead of the number of generated points, which is what is used for the parameters. This choice was made since the Boltzmann acceptance criterion uses an exponential distribution, which is not as ‘fat-tailed’ as the ASA distribution used for the parameters.


Re-Annealing 


The temperatures may be periodically re-annealed, or re-scaled, according to the sensitivity of the cost function. At any given annealing time, the temperature range is ‘stretched out’ over the relatively insensitive parameters, thus guiding the search ‘fairly’ among the parameters. The sensitivity of the energy to each parameter is calculated by

$$s_i = \frac{\partial E}{\partial p^i}, \qquad (6)$$

while the re-annealing temperature is determined by

$$T_i(k') = T_i(k) \, \frac{s_{\max}}{s_i}. \qquad (7)$$

In this way, less sensitive parameters anneal faster. This is done approximately every 100 accepted events.

For comparison, within conventional simulated annealing [6] the cooling schedule is given by

$$T_k = T_0 \, e^{(c-1)k} \qquad (0 < c < 1), \qquad (8)$$

where trial and error are applied to determine the annealing rate $c - 1$ as well as the starting temperature $T_0$. A Monte-Carlo simulation is carried out at each temperature step $k$ with temperature $T_k$. Since $e^{c-1} \approx c$ for $c$ close to 1, this cooling schedule is essentially $T_{k+1} = c \, T_k$.

The ASA algorithm is mainly suited to problems about which little is known, and it has proven to be more robust than other simulated annealing techniques for complex problems with multiple local minima, e.g., compared with Cauchy annealing, where $T_k = T_0/k$, and Boltzmann annealing, where $T_k = T_0/\ln k$. The annealing schedule in (8), although faster than ASA for large dimension $D$, does not pass the infinitely-often annealing-time test in (4), and is therefore referred to as simulated quenching in the terminology of ASA.
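The four schedules can be compared numerically. The sketch below uses the ASA schedule $T(k) = T_0 \exp(-c k^{1/D})$ of eq. (5), the exponential schedule $T_k = T_0 e^{(c-1)k}$ of eq. (8), and the Cauchy and Boltzmann schedules named above; the settings $T_0 = 1$, $c = 7.2$ with $D = 19$ for ASA, and $c = 0.95$ for (8) are assumptions chosen only to show the shapes.

```python
import math


def asa_temperature(T0: float, c: float, k: int, D: int) -> float:
    """ASA schedule, eq. (5): T(k) = T0 exp(-c k^(1/D))."""
    return T0 * math.exp(-c * k ** (1.0 / D))


def quench_temperature(T0: float, c: float, k: int) -> float:
    """Exponential schedule, eq. (8): T_k = T0 exp((c - 1) k), 0 < c < 1."""
    return T0 * math.exp((c - 1.0) * k)


def cauchy_temperature(T0: float, k: int) -> float:
    """Cauchy annealing: T_k = T0 / k."""
    return T0 / k


def boltzmann_temperature(T0: float, k: int) -> float:
    """Boltzmann annealing: T_k = T0 / ln k (k >= 2)."""
    return T0 / math.log(k)


# Illustrative (assumed) settings: T0 = 1; c = 7.2, D = 19 for ASA; c = 0.95 for (8).
for k in (10, 100, 1000):
    print(
        k,
        asa_temperature(1.0, 7.2, k, 19),
        quench_temperature(1.0, 0.95, k),
        cauchy_temperature(1.0, k),
        boltzmann_temperature(1.0, k),
    )
```

Schedule (8) decays geometrically in $k$, whereas for large $D$ the ASA temperature depends on $k$ only through $k^{1/D}$, which grows very slowly; this is the trade-off between speed and the test (4) described above.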


Application to Protein Folding 
Computational Details 


A protein can be defined as a biopolymer of hundreds of amino acids bonded by peptide bonds, while the test models in this article contain fewer amino acids, namely oligopeptides. The Met-Enkephalin model was constructed as H-Tyr-Gly-Gly-Phe-Met-OH. For 14(L-Alanine), neutral $-\mathrm{NH_2}$ and $-\mathrm{COOH}$ end groups were substituted at the termini. The conformation of a protein is described by the dihedral angles of the backbone ($\phi_i$, $\psi_i$), the side chains ($\chi_i$), and the peptide bonds ($\omega_i$, often very close to 180°). Therefore, determining the most stable conformation amounts to finding the set $\{\phi, \psi, \chi, \omega\}$ that gives the global minimum of the potential energy $E(\phi, \psi, \chi, \omega)$. In ASA nomenclature, the ‘cost function’ is the potential energy, while a ‘parameter’ is a dihedral angle variable.

Conformational analyses using conventional simu- 
lated annealing were carried out previously [9,11]. The 
modifications in these works include moving a number 
of dihedral angles in a Monte-Carlo step; adjusting the 
maximum deviation of the variables as the temperature 
decreases to ensure that the acceptance ratio is more
than 25%; and treating the variables differently accord- 
ing to their importance in the folding process, e. g., by 
increasing sampling for the backbone dihedral angles as 
compared to those of the side-chains. It is interesting to 
point out that within ASA these modifications are im- 
plicitly included. 

Each ASA run in our work was started from a random initial configuration $\{\phi, \psi, \chi\}$. The dihedral angle $\omega$ was fixed to 180° in all of the ASA runs. The initial temperature was determined by the average energy of 5 or 10 random samplings, and a full search range of the dihedral angles, $(-\pi, \pi)$, was set. The typical maximum number of calls to the energy function was 30000. An ASA run was terminated if it repeated the best energy value for 3 or 5 re-annealing cycles (each cycle generates 100 configurations). Further refinement of the final ASA-optimized configuration was carried out using the local minimizer SUMSL [1] or the conjugate gradient method. Combining ASA with a local minimizer improved the efficiency of the search.

The ASA calculation is governed by various control 
parameters [3], for which the most important setting 
is the annealing rate for the temperatures of ‘cost’ and 
‘parameters’, determined by the so-called ‘temperature- 
ratio-scale’ (the ratio of the final to the initial tem- 
perature after certain annealing steps) and the ‘cost- 
parameter-scale’. The control parameters were varied 
to improve the search efficiency. Adequate control parameters used for obtaining the results reported in this study were ‘temperature-ratio-scale’ = $10^{-4}$ and ‘cost-parameter-scale’ = 0.5. These settings correspond to an annealing rate for the energy of $c_{\text{cost}} = 3.6$ and for all dihedral angles of $c_{\text{parameter}} = 7.2$. Note that the annealing rate was chosen to be the same for all dihedral angles.
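The link between these control parameters and the annealing rates can be checked against eq. (5). If the temperature is required to fall to $T_0 e^{-m}$ after $k_f = e^{n}$ steps, then $c\,e^{n/D} = m$, i.e. $c = m\,e^{-n/D}$. The sketch below assumes that ‘temperature-ratio-scale’ $= e^{-m} = 10^{-4}$, that the corresponding anneal-scale setting is $e^{n} = 100$, and that $D = 19$ free dihedral angles are annealed (24 ECEPP dihedral angles of Met-Enkephalin minus the 5 fixed $\omega$); all three are assumptions, and the function name is hypothetical.

```python
import math


def asa_annealing_rate(temperature_ratio_scale: float,
                       temperature_anneal_scale: float,
                       D: int) -> float:
    """c = m exp(-n / D) with m = -ln(ratio scale) and n = ln(anneal scale).

    Derived from eq. (5): T(k_f) = T0 exp(-m) is reached at k_f = exp(n).
    """
    m = -math.log(temperature_ratio_scale)
    n = math.log(temperature_anneal_scale)
    return m * math.exp(-n / D)


# Assumptions: anneal scale 100 and D = 19 free dihedral angles.
c_parameter = asa_annealing_rate(1e-4, 100.0, 19)
c_cost = 0.5 * c_parameter  # 'cost-parameter-scale' = 0.5
print(round(c_parameter, 2), round(c_cost, 2))
```

Under these assumptions the computed rates come out close to the reported $c_{\text{parameter}} = 7.2$ and $c_{\text{cost}} = 3.6$.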


Met-Enkephalin 


Met-Enkephalin has a complicated energy surface [11,16]. The lowest energy for Met-Enkephalin was found to be −12.9 kcal/mol with the ECEPP/2 force field (Empirical Conformation Energy Program for Peptides) [8]. With all $\omega$ fixed, the lowest energy was found to be −10.7 kcal/mol by MCM [14]. Using different initial conformations and control parameter settings of the cooling schedule as described above, 55 independent ASA runs were carried out. Table 1 summarizes the energy distribution of these calculations. Most of the ASA calculations result in energies in the range of −8 to −3 kcal/mol, with 7 of the runs yielding conformations whose energies are at most 3 kcal/mol above the known lowest energy, which demonstrates the effectiveness of the approach. Moreover, when the search range was somewhat narrowed, almost all of the ASA runs reached the global energy minimum.


Adaptive Simulated Annealing and its Application to Protein 
Folding, Table 1 

The energy (in kcal/mol) distribution of ASA runs for Met- 
Enkephalin using a full search range 
Adaptive Simulated Annealing and its Application to Protein Folding, Table 2

Energy (in kcal/mol) and dihedral angles (in degrees) of the lowest energy conformations of Met-Enkephalin calculated by ASA. RMSD1 is the root-mean-square deviation (in Å) for backbone atoms, while RMSD2 is for all atoms


           A0      A       1       2       3       4
E          −12.9   −10.7   −10.6   −10.4   −10.1   −8.5
φ1         −86     −87     −87     −87     −87     −87
ψ1         156     154     153     153     156     153
φ2         −155    −162    −161    −162    −166    −166
ψ2         84      71      72      …       87      72
φ3         84      64      64      63      68      63
ψ3         −74     −93     −94     −95     −91     −97
φ4         −137    −82     −83     −81     −103    −74
ψ4         19      −29     −26     −30     −13     −30
φ5         −164    −81     −79     −76     −76     −82
ψ5         160     144     133     132     137     143
Tyr χ1     −173    −180    180     179     −166    −180
Tyr χ2     79      −111    −110    71      88      73
Tyr χ6     −166    145     145     −35     −148    −179
Phe χ1     59      180     72      −179    71      179
Phe χ2     −86     −100    84      −100    −93     −100
Met χ1     53      −65     −171    −173    −65     −65
Met χ2     175     −179    176     176     −178    −179
Met χ3     −180    −179    180     179     −178    −179
Met χ4     −58     −180    −60     60      −178    −179
RMSD1              0       0.04    0.07    0.51    0.26
RMSD2              0       2.52    1.92    2.08    1.29


For the full-range search, we identified three conformations, with energies of −10.6, −10.4, and −10.1 kcal/mol, that exhibit the configuration of the known lowest-energy geometry of −10.7 kcal/mol. Table 2 lists these lowest energy conformations, as well as an additional low-energy structure. Conformations A0 and A are the lowest-energy conformations with $\omega$ not fixed and fixed, respectively, taken from [11,14]. The first two conformations, #1 and #2, have almost the same backbone configuration as that of A (−10.7 kcal/mol), with a backbone root-mean-square deviation (RMSD) of only 0.04 and 0.07 Å, respectively. The all-atom RMSDs of the listed conformations, with energies ranging from −8.5 to −10.6 kcal/mol, are about 2 Å. For conformations #1 and #2, the differences are in the side chains, corresponding to energy differences of 0.1 and 0.3 kcal/mol, respectively.
Adaptive Simulated Annealing and its Application to Protein Folding, Table 3

The conformation of a model 14(L-Alanine) peptide as calculated by ASA (dihedral angles in degrees)

Residue    2       3       4       5       6
φ          −99.4   −68.2   −68.0   −69.3   −66.9
ψ          158.1   −34.3   −38.8   −38.5   −38.6

Residue    7       8       9       10      11
φ          −68.3   −66.7   −68.8   −67.1   −69.4
ψ          …       …       …       …       …

Residue    12      13      14      15
φ          …       …       …       …
ψ          −40.0   −44.6   65.8    −40.1


Poly(L-Alanine) 


The ASA algorithm was applied to a model of (L-Alanine) that is known to assume a dominant right-handed α-helical structure [13]. For a search range of dihedral angles that includes both the right-handed (RH) α-helix and the β-sheet regions of the Ramachandran diagram, ψ: (−115°, 180°) and φ: (−115°, 0°), RH α-helices with φ ≈ −68° and ψ ≈ −38° were found in all backbone positions except those near the end groups, as shown in Table 3. The energy of such a geometry is typically −10.2 kcal/mol after a local minimization. The energy surfaces of the RH α-helical regions were found to be less complex than those of Met-Enkephalin. These results are consistent with a previous study [16].


Conclusion 


Adaptive simulated annealing, as a global optimization method, intrinsically includes some of the modifications of conventional simulated annealing used for biomolecular structure determination. As applied to Met-Enkephalin, the performance of ASA is comparable to that of the simulated annealing study reported in [12] and better than that reported in [11], although some differences other than the algorithms are noted. Utilizing a partial search range improves the efficiency significantly, showing that ASA may be useful for refinement of a molecular structure predicted or measured by other methods. A dominant right-handed α-helical conformation was found for the 14-residue (L-Alanine) model, with deviations observed only near the end groups.
Recent Studies and Future Directions


Recent studies have shown improved efficiency in the conformational search of Met-Enkephalin, e.g., with the so-called conformation space annealing (CSA), which combines the ideas of genetic algorithms, simulated annealing, a build-up procedure, and local minimization [7]. The multicanonical ensemble algorithm (ME), one of the generalized-ensemble algorithms [2], allows free random walks in energy space and can thus escape from any energy barrier. Both the ME and CSA algorithms outperform genetic algorithms (GA), simulated annealing (SA), GA with minimization (GAM), and Monte-Carlo with minimization (MCM). Our own work (unpublished) and the work in [5] both show that simple GA alone underperforms simulated annealing for the Met-Enkephalin conformational search problem. Table 4 compares these algorithms for efficiency (the number of evaluations of energy and energy gradient, or the number of local minimizations) and effectiveness (the number of runs reaching the ground-state conformation (hits) versus the total number of independent runs). Caution should be exercised since some differences exist between these studies, such as the version of the ECEPP potential used, the treatment of the peptide dihedral angle ω, etc. Ground-state conformations are those having energy within approximately 1 eV of the known global minimum energy. Note that the generalized-ensemble method can be carried out with both Monte-Carlo and molecular dynamics.


Adaptive Simulated Annealing and its Application to Protein Folding, Table 4

Comparison of the conformation search efficiency and effectiveness of Met-Enkephalin using different algorithms. N_E, N_∇E, and N_minz are the numbers of evaluations of the energy and of the energy gradient, and the number of local minimizations, of each run, in units of 10^3

            hits/total   N_E      N_∇E    N_minz
ME [2]      10/10        < 1900   0       0
MCM [11]    24/24        *        *       15
GAM [10]    5/5          *        *       50
ME [2]      18/20        950      0       0
CSA [7]     99/100       300      250     5
ME [2]      21/50        400      0       0
CSA [7]     50/100       170      130     2.6
SA [2]      8/20         1000     0       0
GA [5]      …            100      0       0.001

*: The total numbers of E and ∇E evaluations are not given, but can be estimated based on roughly 100 evaluations for each minimization.

In comparison to the studies summarized in Ta- 
ble 4, ASA seems to be using too small a number 
of function evaluations. Optimizing control parame- 
ters such as the annealing schedule and increasing the 
number of energy evaluations may improve the effec- 
tiveness. Search efficiency could also be improved by 
adopting parallellization to achieve scalable simulation 
for various algorithms. Extensive research on the pro- 
tein conformational search using various hybrids of ge- 
netic algorithms and parallelization is in progress (as of 
1999). 
The airline industry was one of the first to apply operations research methodology and techniques on a large scale. As early as the late 1950s, operations researchers were beginning to study how the developing fields of mathematical programming could be used to address a number of very difficult problems faced by the airline industry. Since that time many airline-related problems have been the topics of active research [26]. Most optimization-related research in the airline industry can be placed in one of the following areas:

network design and schedule construction; 

fleet assignment; 

aircraft routing; 

crew scheduling; 

revenue management; 

irregular operations; 

air traffic control and ground delay programs. 

In the following, each of these problem areas will 
be defined along with a brief discussion of some of 
the operations research techniques that have been applied to solve them. The majority of applications utilize network-based models. Solution methods for these models range from traditional mathematical programming approaches to a variety of novel heuristic approaches. A very brief selection of references is also provided.

Construction of flight schedules is the starting point 
for all other airline optimization problems and is a crit- 
ical operational planning task faced by an airline. The 
flight schedule defines a set of flight segments that an 
airline will service along with corresponding origin and 
destination points and departure and arrival times for 
each flight segment. An airline’s decision to offer cer- 
tain flights will depend in large part on market de- 
mand forecasts, available aircraft operating characteris- 
tics, available manpower, and the behavior of compet- 
ing airlines [11,12]. 

Of course, prior to the construction of flight sched- 
ules, an airline must decide which markets it will serve. 
Before the 1978 ‘Airline Deregulation Act’, airlines 
had to fly routes as assigned by the Civil Aeronautics 
Board regardless of the demand for service. During this 
period, most airlines emphasized long point-to-point 
routes. Since deregulation, airlines have gained the freedom to choose which markets to serve and how often to serve them. This change led to a fundamental shift in most airlines' routing strategies from point-to-point flight networks to hub-and-spoke oriented flight networks. This, in turn, led to new research activities for finding optimal hub [3,18] and maintenance base [13] locations.

Following network design and schedule construc- 
tion, an aircraft type must be assigned to each flight 
segment in the schedule. This is called the fleet assign- 
ment problem. Airlines generally operate a number of 
different fleet types, each having different characteris- 
tics and costs such as seating capacity, landing weights, 
and crew and fuel costs. The majority of fleet assign- 
ment methods represent the flight schedule via some 
variant of a time-space network with flight arcs between 
stations and inventory arcs at each station. A multicom- 
modity network flow problem can then be formulated 
with arcs and nodes duplicated as appropriate for all 
fleets that can take a particular flight. Side constraints 
must be implemented to ensure each flight segment is 
assigned to only one fleet. In domestic fleet assignment 
problems, a common simplifying assumption is that ev- 
ery flight is flown every day of the week. Under this 
assumption, the network model need only account for 
one day’s flights and a looping arc connects the end of 
the day with the beginning. The resulting models are 
mixed integer programs [1,16,27,30]. 

Aircraft routing is a fleet-by-fleet process of assigning individual aircraft to fly each flight segment assigned to a particular fleet. A primary consideration
at this stage is maintenance requirements mandated by 
the Federal Aviation Administration. There are differ- 
ent types of maintenance activities that must be per- 
formed after a given number of flight hours. The ma- 
jority of these maintenance activities can be performed 
overnight; however, not all stations are equipped with 
proper maintenance facilities for all fleets. During the 
aircraft routing process, individual aircraft from each 
fleet must be assigned to fly all flight segments assigned 
to that fleet in a manner that provides maintenance op- 
portunities for all aircraft at appropriate stations within 
the required time intervals. This problem has been for- 
mulated and solved in a number of ways including as 
a general integer programming problem solved by La- 
grangian relaxation [9] and as a set partitioning prob- 
lem solved with a branch and bound algorithm [10]. 

As described above, the problems of fleet assign- 
ment and aircraft routing have been historically solved 
in a sequential manner. Recently, work has been done 
to solve these problems simultaneously using a string-based model and a branch-and-price solution approach [5].

Crew scheduling, like aircraft routing, is done fol- 
lowing fleet assignment. The first of two sequentially 
solved crew scheduling problems is the crew pairing 
problem. A crew pairing is a sequence of flight legs 
beginning and ending at a crew base that satisfies all 
governmental and contractual restrictions (sometimes called legalities). These crew pairings generally cover
a period of 2-5 days. The problem is to find a mini- 
mum cost set of such crew pairings such that all flight 
segments are covered. This problem has generally been 
modeled as a set partitioning problem in which pair- 
ings are enumerated or generated dynamically [15,17]. 
Other attempts to solve this problem have employed 
a decomposition approach based on graph partitioning 
[4] and a linear programming relaxation of a set covering problem [21]. Flight crews are often repositioned through a practice called deadheading, in which a crew flies a flight segment as passengers. Therefore, in solving the crew-pairing problem, all flight segments must be covered, but they may be covered by more than one crew.
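The difference between the set partitioning model and the covering view that allows deadheading can be seen on a toy instance. The sketch below enumerates all subsets of a small, hypothetical list of candidate pairings (names, costs, and flight legs are invented for illustration); exhaustive search is only viable at this size, whereas real instances rely on the generation and relaxation methods cited above.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Pairing = Tuple[str, float, Set[int]]  # (name, cost, flight legs covered)


def best_crew_selection(flights: Set[int], pairings: List[Pairing],
                        partition: bool) -> Optional[Tuple[float, List[str]]]:
    """Exhaustive search for a minimum-cost set of pairings covering all flights.

    partition=True: every flight covered exactly once (set partitioning).
    partition=False: every flight covered at least once (set covering), so extra
    coverage corresponds to deadheading crews.
    """
    best: Optional[Tuple[float, List[str]]] = None
    for r in range(1, len(pairings) + 1):
        for combo in combinations(pairings, r):
            counts = {f: 0 for f in flights}
            for _, _, legs in combo:
                for leg in legs:
                    counts[leg] += 1
            if min(counts.values()) == 0:
                continue  # some flight is uncovered
            if partition and max(counts.values()) > 1:
                continue  # some flight is covered twice
            cost = sum(c for _, c, _ in combo)
            if best is None or cost < best[0]:
                best = (cost, [name for name, _, _ in combo])
    return best


flights = {1, 2, 3, 4}
pairings = [("A", 4.0, {1, 2}), ("B", 7.0, {3, 4}), ("C", 3.0, {2, 3}),
            ("D", 2.0, {4}), ("E", 6.0, {1})]
print(best_crew_selection(flights, pairings, partition=True))
print(best_crew_selection(flights, pairings, partition=False))
```

In this instance the covering solution is cheaper because one crew deadheads on flight 2, which the partitioning model forbids.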

The second problem to be solved relating to crew 
scheduling is the monthly crew rostering problem. This 
is the problem of assigning individual crew members to 
crew pairings to create their monthly schedules. These 
schedules must incorporate time off, training periods, 
and other contractual obligations. Generally, a prefer- 
ential bidding system is used to make the assignments 
in which each personalized schedule takes into account 
an employee’s pre-assigned activities and weighted bids 
representing their preferences. While the crew pairing 
problem has been widely studied, only a limited number of
publications have dealt with the monthly crew rostering 
problem. Approaches include an integer programming 
scheme [14] and a network model [24]. 

Revenue management is the problem of determin- 
ing fare classes for each flight in the flight schedule as 
well as the allocation of available seats to each fare class. 
Not only are seats on an airplane partitioned physically 
into sections such as first class and coach, but also seats 
in the same section are generally priced at many differ- 
ent levels. The goal is to maximize the expected revenue 
from a particular flight segment by finding the proper 
balance between gaining additional revenue by selling 
more inexpensive seats and losing revenue by turning away higher-fare customers. A standard assumption is that fare classes are filled sequentially from the lowest to the highest. This is often the case when discounted fares are offered in advance, while last-minute
tickets are sold at a premium. Recent research includes 
a probabilistic decision model [6], a dynamic program- 
ming formulation [31] and some calculus-based book- 
ing policies [8]. 
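The trade-off between selling a cheap seat now and holding it for a higher-fare customer is classically illustrated by Littlewood's two-class rule, a standard textbook model that is not discussed in the text above. The sketch assumes two fare classes, low-fare demand booking first (matching the sequential-filling assumption), and Poisson-distributed high-fare demand; the function names and numbers are hypothetical.

```python
import math


def poisson_tail(y: int, mean: float) -> float:
    """P(D > y) for Poisson-distributed demand D with the given mean."""
    pmf = math.exp(-mean)  # P(D = 0)
    cdf = pmf
    for k in range(1, y + 1):
        pmf *= mean / k
        cdf += pmf
    return max(0.0, 1.0 - cdf)


def littlewood_protection(p_high: float, p_low: float,
                          mean_high: float, capacity: int) -> int:
    """Smallest protection level y with P(D_high > y) <= p_low / p_high.

    Protecting one more seat is worth p_high * P(D_high > y); selling it now
    at the low fare is worth p_low.
    """
    ratio = p_low / p_high
    for y in range(capacity + 1):
        if poisson_tail(y, mean_high) <= ratio:
            return y
    return capacity


protect = littlewood_protection(400.0, 200.0, 20.0, 100)
print("protect", protect, "seats; low-fare booking limit", 100 - protect)
```

A larger gap between the two fares lowers the ratio $p_{\text{low}}/p_{\text{high}}$ and increases the number of seats held back for late, high-fare demand.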

When faced with a lack of resources, airlines of- 
ten are not able to fly their published flight schedule. 
This is frequently the result of aircraft mechanical dif- 
ficulties, inclement weather, or crew shortages. As situ- 
ations like these arise, decisions must be made to deal 
with the shortage of resources in a manner that returns 
the airline to the originally planned flight schedule in 
a timely fashion while attempting to reduce operational 
cost and keep passengers satisfied. This general situation is called the airline irregular operations problem, and it involves aircraft, crew, gate, and passenger recovery.

The aircraft schedule recovery problem deals with 
re-routing aircraft during irregular operations. This 
problem has received significant attention among ir- 
regular operations topics; papers dealing with crew 
scheduling during irregular operations have only re- 
cently started to appear [28,35]. Most approaches for 
dealing with aircraft schedule recovery have been based 
on network models. Some early models were pure net- 
works [19]. Recently, more comprehensive models have 
been developed that better represent the problem, but 
are more difficult to solve because side constraints have been added to the otherwise pure network structure of these problems [2,33,36]. In practice, many airlines use heuristic
methods to solve these problems as their real-time na- 
ture does not allow for lengthy optimization run times. 

Closely related to the irregular operations prob- 
lem is the ground delay problem in air traffic control. 
Ground delay is a program implemented by the Fed- 
eral Aviation Administration in cases of station conges- 
tion. During ground delay, aircraft departing for a con- 
gested station are held on the ground before departure. 
The rationale for this practice is that ground delays are less expensive and safer than airborne delays. Several optimization models have been formulated to decrease the total minutes of delay experienced throughout the system during a ground delay program. These problems have generally been modeled as integer programs [22,23], but the problem has also been solved using stochastic linear programming [25] and by heuristic methods [34].

Optimization based methods have also been ap- 
plied to a myriad of other airline related topics such as 
gate assignment [7], fuel management [29], short term 
fleet assignment swapping [32], demand modeling [20], 
and others. The airline industry is an exciting arena for the interplay between optimization theory and practice. Many more optimization applications in the airline industry will emerge in the future.


See also 


> Integer Programming 
> Vehicle Scheduling 
Introduction


Interval optimization methods (> interval analysis: unconstrained and constrained optimization) are guaranteed not to lose global optimizer points. To achieve this, a deterministic branch-and-bound framework is applied. Still, heuristic algorithmic improvements may increase the convergence speed while keeping the guaranteed reliability.
The indicator parameter called RejectIndex

$$pf^*(X) = \frac{f^* - \underline{F}(X)}{\overline{F}(X) - \underline{F}(X)}$$

was suggested by L.G. Casado as a measure of the closeness of the interval X to a global minimizer point [1]. It was first applied to improve the workload balance of global optimization algorithms.

A subinterval X of the search space with the minimal lower bound of the inclusion function F(X) is usually considered the best candidate to contain a global minimum. However, the larger the interval X, the larger the overestimation of the range f(X) by F(X). Therefore, a box could be considered a good candidate to contain a global minimum just because it is larger than the others. To compare subintervals of different sizes, we normalize the distance between the global minimum value $f^*$ and the lower bound $\underline{F}(X)$ by the width of F(X).

The idea behind $pf^*$ is that, in general, we expect the overestimation to be symmetric, i.e., the overestimation above f(X) is roughly equal to the overestimation below f(X) for small subintervals containing a global minimizer point. Hence, for such intervals X the relative position of the global optimum value inside the F(X) interval should be high, while for intervals far from global minimizer points $pf^*$ should be small. Obviously, there are exceptions, and there is no theoretical proof that $pf^*$ is a reliable indicator of nearby global minimizer points.

The value of the global minimum is not available in most cases. A generalized expression for a wider class of indicators is

$$p(\tilde f, X) = \frac{\tilde f - \underline{F}(X)}{\overline{F}(X) - \underline{F}(X)},$$

where the $\tilde f$ value is a kind of approximation of the global minimum. We assume that $\tilde f \in F(X)$, i.e., this estimate is realistic in the sense that $\tilde f$ lies within the known bounds of the objective function on the search region. According to the numerical experience collected, a good approximation of the $f^*$ value is needed to improve the efficiency of the algorithm.
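A minimal Python sketch of the indicator, assuming the inclusion function has already been evaluated so that only its bounds $\underline{F}(X)$ and $\overline{F}(X)$ are passed in (the function and parameter names are illustrative). Passing $\tilde f = f^*$ gives $pf^*(X)$. The example shows the normalization at work: box B has the smaller lower bound, but because its enclosure is much wider it receives a lower index than box A.

```python
def reject_index(f_tilde: float, F_lo: float, F_hi: float) -> float:
    """Return p(f_tilde, X) = (f_tilde - F_lo) / (F_hi - F_lo).

    F_lo and F_hi are the lower and upper bounds of the inclusion
    function value F(X); f_tilde approximates the global minimum.
    """
    width = F_hi - F_lo
    if width <= 0.0:
        # Degenerate enclosure: the normalization is undefined.
        raise ValueError("F(X) must have positive width")
    return (f_tilde - F_lo) / width


f_star = 0.0
box_a = (-1.0, 1.0)   # narrow enclosure, lower bound -1
box_b = (-4.0, 10.0)  # wide enclosure, smaller lower bound -4
print(reject_index(f_star, *box_a))  # 0.5
print(reject_index(f_star, *box_b))  # about 0.2857
```

A box lying entirely above $f^*$ (for example F(X) = [0.5, 2]) gets a negative index, which matches the intuition that it is far from any global minimizer.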


Subinterval Selection 


I. Among the possible applications of these indicators, the most promising and straightforward one is subinterval selection. The theoretical and computational properties of interval branch-and-bound optimization have been investigated extensively [6,7,8,9]. The most important statements proved for algorithms with balanced subdivision direction selection are the following:

1. Assume that the inclusion function of the objective function is isotone, that it has the zero convergence property, and that the $p(\tilde f_k, Y)$ parameters are calculated with the $\tilde f_k$ parameters converging to $\hat f > f^*$, for which there exists a point $\hat x \in X$ with $f(\hat x) = \hat f$. Then the branch-and-bound algorithm that selects the interval Y from the working list with the maximal $p(\tilde f_k, Y)$ value can converge to a point $\hat x \in X$ for which $f(\hat x) > f^*$, i.e., to a point that is not a global minimizer point of the given problem.

2. Assume that the inclusion function of the objective function has the zero convergence property and that $\tilde f_k$ converges to $\hat f < f^*$. Then the optimization branch-and-bound algorithm will produce an everywhere dense sequence of subintervals converging to each point of the search region X, regardless of the objective function value.

3. Assume that the inclusion function of the objective function is isotone and has the zero convergence property. Consider the interval branch-and-bound optimization algorithm that uses the cutoff test, the monotonicity test, the > interval Newton step, and the concavity test as accelerating devices, and that selects as the next leading interval the interval Y from the working list with the maximal $p(\tilde f_k, Y)$ value. A necessary and sufficient condition for the convergence of this algorithm to a set of global minimizer points is that the sequence $\{\tilde f_k\}$ converges to the global minimum value $f^*$ and that at most a finite number of $\tilde f_k$ values lie below $f^*$.

4. If our algorithm applies the interval selection rule of maximizing the $p(f^*, X) = pf^*(X)$ values for the members of the list L (i.e., if we can use the known exact global minimum value), then the algorithm converges exclusively to global minimizer points.

5. If our algorithm applies the interval selection rule of maximizing the $p(\tilde f, X)$ values for the members of the list L, where $\tilde f$ is the best available upper bound for the global minimum, and its convergence to $f^*$ can be ensured, then the algorithm converges exclusively to global minimizer points.

6. Assume that for an optimization problem $\min_{x \in X} f(x)$ the inclusion function F(X) of f(x) is isotone and α-convergent with given positive constants α and C. Assume further that the $pf^*$ parameter is less than 1 for all the subintervals of X. Then an arbitrarily large number N (> 0) of consecutive leading intervals of the basic B&B algorithm that selects the subinterval with the smallest lower bound as the next leading interval may have the following properties:

   i. None of these processed intervals contains a stationary point.

   ii. During this phase of the search, the $pf^*$ values are maximal for these intervals.

7. Assume that the inclusion function of the objective function is isotone and has the zero convergence property. Consider the interval branch-and-bound optimization algorithm that uses the cutoff test, the monotonicity test, the interval Newton step, and the concavity test as accelerating devices and that selects as the next leading interval the interval Y from the working list with the maximal $p(\tilde f_k, Y)$ value.

   i. The algorithm converges exclusively to global minimizer points if


f, 2h <0 i) +f, 


   holds for each iteration number k, where $0 < \delta < 1$.

   ii. The above condition is sharp in the sense that $\delta = 1$ allows convergence to nonoptimal points.

Here f, = mint FY"), . = 1,...i |g} = Sk < fk = f,, where |L| denotes the number of elements of the list L.

II. These theoretical results are in part promising (e.g., 7) and in part disappointing (5 and 6). The detailed numerical comparisons showed that if the global minimum value is known, then using the $pf^*$ parameter in the described way can accelerate the interval optimization method by orders of magnitude, and this improvement is especially strong for hard problems.

If the global minimum value is not available, an estimate $\tilde f_k$ that fulfills the conditions of 7 can be used with similar efficacy, and again the best results were achieved on difficult problems.
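To make the selection rule concrete, here is a hedged Python sketch contrasting the classical rule (smallest lower bound) with the rule discussed above (largest $p(\tilde f_k, Y)$). The `Box` record and all function names are hypothetical; a real implementation would also subdivide, run the accelerating tests, and update $\tilde f_k$ in every iteration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    F_lo: float  # lower bound of the inclusion function F(Y)
    F_hi: float  # upper bound of the inclusion function F(Y)


def p_value(f_tilde: float, box: Box) -> float:
    # RejectIndex-type indicator p(f_tilde, Y); assumes F_hi > F_lo.
    return (f_tilde - box.F_lo) / (box.F_hi - box.F_lo)


def select_by_lower_bound(work: list[Box]) -> Box:
    # Classical rule: the box with the smallest lower bound.
    return min(work, key=lambda b: b.F_lo)


def select_by_reject_index(work: list[Box], f_tilde: float) -> Box:
    # Rule discussed above: the box with the maximal p(f_tilde, Y).
    return max(work, key=lambda b: p_value(f_tilde, b))


working_list = [Box("A", -1.0, 1.0), Box("B", -4.0, 10.0), Box("C", 0.5, 2.0)]
print(select_by_lower_bound(working_list).name)        # B
print(select_by_reject_index(working_list, 0.0).name)  # A
```

Because $\tilde f_k$ may change from one iteration to the next, the sketch recomputes the indicator at selection time with a linear scan instead of keeping a priority queue keyed on values that could become stale.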


Multisection 


I. The multisection technique is a way to accelerate branch-and-bound methods by subdividing the current interval into several subintervals in a single algorithm step. In the extreme case, half of the function evaluations can be saved [5,10]. On the basis of the RejectIndex value of a given interval, it is decided whether simple bisection or one of two higher-degree multisections is to be applied [2,11]. Two threshold values, $0 < P_1 < P_2 < 1$, are used for selecting the proper multisection type.
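The text does not spell out which subdivision each threshold triggers, so the following Python sketch assumes, purely for illustration, that a larger RejectIndex justifies a more aggressive split: below $P_1$ the box is bisected along its widest coordinate, between $P_1$ and $P_2$ it is halved along its two widest coordinates (4 subboxes), and from $P_2$ upward along three (8 subboxes). This mapping and the function names are hypothetical, not the rule of [2,11].

```python
from itertools import product

Interval = tuple[float, float]


def split_directions(p: float, P1: float, P2: float) -> int:
    # Hypothetical mapping from the RejectIndex value p to the number of
    # coordinate directions to halve (1 -> bisection, 2 or 3 -> multisection).
    if not 0.0 < P1 < P2 < 1.0:
        raise ValueError("thresholds must satisfy 0 < P1 < P2 < 1")
    if p < P1:
        return 1
    if p < P2:
        return 2
    return 3


def multisect(box: list[Interval], directions: int) -> list[list[Interval]]:
    # Halve the `directions` widest coordinates, producing 2**d subboxes.
    d = min(directions, len(box))
    order = sorted(range(len(box)), key=lambda i: box[i][1] - box[i][0], reverse=True)
    widest = set(order[:d])
    pieces = []
    for i, (lo, hi) in enumerate(box):
        if i in widest:
            mid = 0.5 * (lo + hi)
            pieces.append([(lo, mid), (mid, hi)])
        else:
            pieces.append([(lo, hi)])
    return [list(sub) for sub in product(*pieces)]


box = [(0.0, 4.0), (0.0, 2.0), (0.0, 1.0)]
for p in (0.1, 0.5, 0.9):
    print(p, len(multisect(box, split_directions(p, P1=0.3, P2=0.7))))
```

Halving several coordinates at once is what allows a multisection to skip the intermediate boxes (and their function evaluations) that repeated bisection would create.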

This algorithm improvement can also be misled, in the sense that there exist global optimization problems for which the new method will follow, for an arbitrarily large number of iterations, a nested interval sequence that contains no global minimizer point, or for which the intervals containing a global minimizer have misleading indicator values.

According to the numerical tests, the new multisec- 
tion strategies result in a substantial decrease both in 
the number of function evaluations and in the memory 
complexity. 

II. The multisection strategy can also be applied to constrained global optimization problems [11]. The feasibility degree index for the constraint $g_j(x) \le 0$ can be formulated as

$$pu_{G_j}(X) = \min\left\{ \frac{-\underline{G}_j(X)}{w(G_j(X))},\ 1 \right\},$$

where $G_j$ is an inclusion function of $g_j$ and $w(\cdot)$ denotes the width of an interval. Notice that if $pu_{G_j}(X) < 0$, then the box is certainly infeasible, and if $pu_{G_j}(X) = 1$, then X certainly satisfies the constraint. Otherwise, the box is undetermined for that constraint. For boxes that are not certainly infeasible, i.e., for which $pu_{G_j}(X) \ge 0$ holds for all $j = 1, \dots, r$, the total infeasibility index is given by

$$pu(X) = \prod_{j=1}^{r} pu_{G_j}(X).$$


The index needs to be defined only for such boxes, since certainly infeasible boxes are immediately removed by the algorithm from further consideration. With this definition,
• $pu(X) = 1 \iff$ X is certainly feasible, and
• $pu(X) \in [0, 1) \iff$ X is undetermined.

Using the pu(X) index, we now propose the following modification of the RejectIndex for constrained problems:

$$pup(\tilde f, X) = pu(X) \cdot p(\tilde f, X),$$

where $\tilde f$ is a parameter of this indicator, usually an approximation of $f^*$. This new index works like $p(\tilde f, X)$ if X is certainly feasible, but if the box is undetermined, then it takes the feasibility degree of the box into account: the less feasible the box, the lower the value of pu(X).
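A short Python sketch of these feasibility indices, assuming each constraint enclosure $G_j(X)$ is given as a pair of bounds. The text does not say how to treat a zero-width enclosure; the sketch simply assumes it is feasible when its value is nonpositive and infeasible otherwise. Returning `None` marks a certainly infeasible box that the algorithm would discard. All names are illustrative.

```python
import math
from typing import Optional


def pu_single(G_lo: float, G_hi: float) -> float:
    # Feasibility degree of X for one constraint g_j(x) <= 0.
    w = G_hi - G_lo
    if w == 0.0:
        # Assumption: a point enclosure is either satisfied or violated.
        return 1.0 if G_hi <= 0.0 else -math.inf
    return min(-G_lo / w, 1.0)


def pu(G_bounds: list[tuple[float, float]]) -> Optional[float]:
    values = [pu_single(lo, hi) for lo, hi in G_bounds]
    if any(v < 0.0 for v in values):
        return None  # certainly infeasible: removed from the working list
    return math.prod(values)


def pup(f_tilde: float, F_lo: float, F_hi: float,
        G_bounds: list[tuple[float, float]]) -> Optional[float]:
    # pup(f_tilde, X) = pu(X) * p(f_tilde, X); assumes F_hi > F_lo.
    u = pu(G_bounds)
    if u is None:
        return None
    return u * (f_tilde - F_lo) / (F_hi - F_lo)


constraints = [(-3.0, -1.0), (-1.0, 1.0)]  # certainly satisfied, undetermined
print(pu(constraints))                   # 0.5
print(pup(0.0, -1.0, 1.0, constraints))  # 0.25
```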

A careful theoretical analysis proved that the new interval selection and multisection rules enable the branch-and-bound interval optimization algorithm to converge to a set of global optimizer points, assuming we have a proper sequence of $\{\tilde f_k\}$ parameter values. The convergence properties obtained are very similar to those proven for the unconstrained case, and they give a firm basis for computational implementation.

A comprehensive numerical study on standard global optimization test problems and on facility location problems indicated [11] that the constrained versions of the interval selection rules and, to a lesser extent, the new adaptive multisection rules have several advantageous features that can contribute to the efficiency of interval optimization techniques.


Heuristic Rejection 


RejectIndex can also be used to improve the efficiency of interval global optimization algorithms on very hard problems by applying a rejection strategy to discard subintervals not containing global minimizer points. This heuristic rejection technique selects those subintervals on the basis of a typical pattern of changes in the $pf^*$ values [3,4].

The RejectIndex is not always reliable: assume that the inclusion function F(X) of f(x) is isotone and α-convergent. Assume further that the RejectIndex parameter $pf^*$ is less than 1 for all the subintervals of X. Then an arbitrarily large number N (> 0) of consecutive leading intervals may have the following properties:

i. None of these processed intervals contains a stationary point, and

ii. During this phase of the search, the $pf^*$ values are maximal for these intervals as compared with the subintervals of the current working list.

Also, when a global optimization problem has a unique global minimizer point $x^*$, there always exists an isotone and α-convergent inclusion function F(X) of f(x) such that the new algorithm does not converge to $x^*$.

In spite of the possibility of losing the global minimum, there exist implementations that allow heuristic rejection to be used safely. For example, the subintervals selected for rejection can be saved on a hard disk for possible further processing if necessary.

Although the above theoretical results were not 
encouraging, the computational tests on very hard 
global optimization problems were convincing: when 
the whole list of subintervals produced by the B&B al- 
gorithm is too large for the given computer memory, 
then the use of the suggested heuristic rejection tech- 
nique decreases the number of working list elements 
without missing the global minimum. The new rejec- 
tion test may also make it possible to solve hard-to- 
solve problems that are otherwise unsolvable with the 
usual techniques. 
Abstract


The genome of an organism not only serves as its 
blueprint that holds the key for diagnosing and cur- 
ing diseases, but also plays a pivotal role in obtaining 
a holistic view of its ancestry. Recent years have wit- 
nessed a large number of innovations in this field, as 
exemplified by the Human Genome Project. This chap- 
ter provides an overview of popular algorithms used in 
genome analysis and in particular explores two important and deeply interconnected problems: phylogenetic analysis and multiple sequence alignment. We also describe our novel graph-theoretical approach that encompasses a wide variety of genome sequence analysis problems within a single model.


Introduction 


Genomics encompasses the study of the genome in humans and other organisms. The rate of innovation in this field has been breathtaking over the last decade, especially with the completion of the Human Genome Project. The purpose of this chapter is to review some well-known algorithms that facilitate genome analysis. The material is presented in a way intended to interest both specialists working in this area and other readers.
Thus, this review includes a brief sketch of the al- 
gorithms to facilitate a deeper understanding of the 
concepts involved. The list of problems related to ge- 
nomics is very extensive; hence, the scope of this chap- 
ter is restricted to the following two related important 
problems: (1) phylogenetic analysis and (2) multiple 
sequence alignment. Readers interested in algorithms used in other fields of computational biology are referred to the reviews by Abbas and Holmes [1] and Blazewicz et al. [7].

Genome refers to the complete DNA sequence con- 
tained in the cell. The DNA sequence consists of the 
four nucleotides adenine (A), thymine (T), cytosine 
(C), and guanine (G). Associated with each DNA strand 
(sequence) is a complementary DNA strand of the same 
length. The strands are complementary in that each nu- 
cleotide in one strand uniquely defines an associated 
nucleotide in the other: A and T are always paired, and 
C and G are always paired. Each pairing is referred to as 
a base pair; and bound complementary strands make up 
a DNA molecule. Typically, the number of base pairs in 
a DNA molecule is between thousands and billions, de- 
pending on the complexity of a given organism. For ex- 
ample, a bacterium contains about 600,000 base pairs, 
while human and mouse have some three billion base 
pairs. Among humans, 99.9% of base pairs are the same 
between any two unrelated persons. But that leaves mil- 
lions of single-letter differences, which provide genetic 
variation between people. 

Understanding the DNA sequence is extremely important. It is considered the blueprint for an organism’s structure and function. The sequence order underlies all of life’s diversity, even dictating whether an organism is human or another species such as yeast or a fruit fly. It helps in understanding the evolution of mankind, identifying genetic diseases, and creating new approaches for treating and controlling those diseases. To achieve these goals, research in genome analysis has progressed rapidly over the last decade.
The rest of this chapter is organized as follows. 
Section “Phylogenetic Analysis” discusses techniques 
used to infer the evolutionary history of species and 
Sect. “Multiple Sequence Alignment” presents the mul- 
tiple sequence alignment problem and recent advances. 
In Sect. “Novel Graph-Theoretical Genomic Models”, 
we describe our research effort for advancing genomic 
analysis through the design of a novel graph-theoretical 
approach for representing a wide variety of genomic se- 
quence analysis problems within a single model. We 
summarize our theoretical findings, and present com- 
putational models based on two integer programming 
formulations. Finally, Sect. “Summary” summarizes the interdependence and the pivotal role played by the two abovementioned problems in computational biology.


Phylogenetic Analysis 


Phylogenetic analysis is a major aspect of genome re- 
search. It refers to the study of evolutionary relation- 
ships of a group of organisms. These hierarchical rela- 
tionships among organisms arising through evolution 
are usually represented by a phylogenetic tree (Fig. 1). 
The idea of using trees to represent evolution dates back 
to Darwin. Both rooted and unrooted tree representa- 
tions have been used in practice [17]. The branches of 
a tree represent the time of divergence and the root rep- 
resents the ancestral sequence (Fig. 2). 

Tree terminology

The study of phylogenies and processes of evolution by the analysis of DNA or amino acid sequence data is called molecular phylogenetics. In this article, we focus on methods that use DNA sequence data. There are two processes involved in inferring both rooted and unrooted trees. The first is estimating the branching structure, or topology, of the tree. The second is estimating the branch lengths for a given tree. Currently, a wide variety of methods is available to conduct this analysis [16,19,55,79]. These approaches can be classified into three broad groups: (1) distance methods; (2) parsimony methods; and (3) maximum likelihood methods. Each of them is discussed in detail below.


Methods Based on Pairwise Distance 


In distance methods, an evolutionary distance $d_{ij}$ is computed between each pair i, j of sequences, and a phylogenetic tree is constructed from these pairwise distances. There are many different ways of defining the pairwise evolutionary distance used for this purpose. Most approaches estimate the number of nucleotide substitutions per site, but other measures have also been used [70,71]. The most popular one is the Jukes–Cantor distance [37], which defines

$$d_{ij} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} f\right),$$

where f is the fraction of sites at which the nucleotides differ in the pairwise alignment.
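The formula translates directly into code. This Python sketch assumes two already aligned, gap-free sequences of equal length. When $f \ge 3/4$ the logarithm is undefined (the sequences look saturated), and the sketch returns infinity in that case; that is a design choice, not part of the definition.

```python
import math


def jukes_cantor_distance(s1: str, s2: str) -> float:
    if len(s1) != len(s2) or not s1:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    # f: fraction of sites at which the nucleotides differ.
    f = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    if f >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * f / 3.0)


print(jukes_cantor_distance("CCC", "GGC"))  # f = 2/3, so d = (3/4) ln 9
```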

There are a large number of distance methods for 
constructing evolutionary trees [78]. In this article, we 
discuss methods based on cluster analysis and neighbor 
joining. 


Cluster Analysis: Unweighted Pair Group Method Using Arithmetic Averages The conceptually simplest and best-known distance method is the unweighted pair group method using arithmetic averages (UPGMA) developed by Sokal and Michener [66]. Given a matrix of pairwise distances between each pair of sequences, it starts by assigning each sequence to its own cluster. The distance between two clusters is defined as

$$d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d_{pq},$$

where $C_i$ and $C_j$ denote the sets of sequences in clusters i and j, respectively. At each stage of the process, the closest pair of clusters is merged to create a new cluster. This process continues until only one cluster is left. Given n sequences, the general schema of UPGMA is shown in Algorithm 1.


Algorithm 1 (UPGMA)
1. Input: Distance matrix $d_{ij}$, $1 \le i, j \le n$
2. For i = 1 to n do
3.    Define singleton cluster $C_i$ comprising sequence i
4.    Place cluster $C_i$ as a tree leaf at height zero
5. End for
6. Repeat
7.    Determine two clusters i, j such that $d_{ij}$ is minimal.
8.    Merge these two clusters to form a new cluster k having a distance from other clusters defined as the weighted average of the two comprising clusters. If $C_k$ is the union of the two clusters $C_i$ and $C_j$, and if $C_l$ is any other cluster, then $d_{kl} = \dfrac{d_{il}|C_i| + d_{jl}|C_j|}{|C_i| + |C_j|}$.
9.    Define a node k at height $d_{ij}/2$ with daughter nodes i and j.
10. Until just a single cluster remains
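A compact Python sketch of Algorithm 1, written for clarity rather than speed. It keeps the cluster-to-cluster distances in a dictionary and scans all of them in every iteration, so it runs in $O(n^3)$ time; faster implementations need more careful bookkeeping. Clusters are returned as nested tuples, node heights follow step 9 ($d_{ij}/2$), and the function name and data layout are illustrative assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """Return (tree, heights) for the labels in `names`.

    dist[i][j] is the distance between sequences i and j; internal
    nodes are nested tuples and `heights` maps each one to d_ij / 2.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    d = {frozenset((i, j)): dist[i][j] for i, j in combinations(range(len(names)), 2)}
    heights = {}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)            # step 7: closest pair of clusters
        i, j = sorted(pair)
        d_ij = d.pop(pair)
        (t_i, n_i), (t_j, n_j) = clusters.pop(i), clusters.pop(j)
        k = next_id
        next_id += 1
        for l in clusters:                  # step 8: size-weighted average
            d_il = d.pop(frozenset((i, l)))
            d_jl = d.pop(frozenset((j, l)))
            d[frozenset((k, l))] = (d_il * n_i + d_jl * n_j) / (n_i + n_j)
        clusters[k] = ((t_i, t_j), n_i + n_j)
        heights[(t_i, t_j)] = d_ij / 2.0    # step 9: node height
    ((tree, _),) = clusters.values()
    return tree, heights


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 8],
    [10, 10, 8, 0],
]
tree, heights = upgma(names, dist)
print(tree)  # ('D', ('C', ('A', 'B')))
```

In the usage example, A and B merge first (distance 2), then C joins at average distance 6, and D joins last at the size-weighted distance (8 + 2 · 10)/3.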


The time and space complexity of UPGMA is $O(n^2)$, since there are n − 1 iterations of complexity O(n).
A number of approaches have been developed which 
are motivated by UPGMA. Li [52] developed a sim- 
ilar approach which also makes corrections for un- 
equal rates of evolution among lineages. Klotz and 
Blanken [43] presented a method where a present-day 
sequence serves as an ancestor in order to determine the 
tree regardless of the rates of evolution of the sequences 
involved. 
Neighbor Joining Neighbor joining is another very popular algorithm based on pairwise distances [63]. This approach yields an unrooted tree and does not require the UPGMA assumption that the same rate of evolution applies to each branch.

Given a matrix $d_{ij}$ of pairwise distances between each pair of sequences, it first defines the modified distance matrix $D_{ij}$. This matrix is calculated by subtracting the average distances to all other sequences from $d_{ij}$, thus compensating for long edges. In each stage, the two nearest nodes (minimal $D_{ij}$) of the tree are chosen and defined as neighbors in the tree. This is done recursively until all of the nodes are paired together.

Given n sequences, the general schema of neighbor 
joining is shown in Algorithm 2. 


Algorithm 2 (Neighbor joining)
1. Input: Distance matrix $d_{ij}$, $1 \le i, j \le n$
2. For i = 1 to n
3.    Assign sequence i to the set of leaf nodes of the tree (T)
4. End for
5. Set list of active nodes (L) = T
6. Repeat
7.    Calculate the modified distance matrix $D_{ij} = d_{ij} - (r_i + r_j)$, where $r_i = \frac{1}{|L| - 2} \sum_{k \in L} d_{ik}$
8.    Find the pair i, j in L having the minimal value of $D_{ij}$
9.    Define a new node u and set $d_{uk} = \frac{1}{2}(d_{ik} + d_{jk} - d_{ij})$ for all k in L
10.   Add u to T, joining nodes i, j with edges of length $d_{iu} = \frac{1}{2}(d_{ij} + r_i - r_j)$ and $d_{ju} = d_{ij} - d_{iu}$
11.   Remove i and j from L and add u
12. Until only two nodes remain in L
13. Connect the remaining two nodes i and j by a branch of length $d_{ij}$
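A Python sketch of Algorithm 2 follows, using the same formulas for $r_i$, $D_{ij}$, $d_{uk}$, and the edge lengths. The nested-dictionary layout, the internal node names (u0, u1, and so on), and the returned edge list are illustrative choices. On the additive four-taxon distances in the usage example, it recovers the edge lengths 1, 2, 4, 5, and 3 of the tree those distances were computed from.

```python
def neighbor_joining(names, dist):
    """Return a list of (node, node, length) edges of an unrooted tree."""
    d = {a: {} for a in names}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j:
                d[a][b] = float(dist[i][j])
    active = list(names)
    edges = []
    counter = 0
    while len(active) > 2:
        m = len(active)
        # Step 7: r_i = (1 / (|L| - 2)) * sum of distances from i to active nodes.
        r = {i: sum(d[i][k] for k in active if k != i) / (m - 2) for i in active}
        # Step 8: pair minimizing D_ij = d_ij - (r_i + r_j).
        i, j = min(
            ((a, b) for x, a in enumerate(active) for b in active[x + 1:]),
            key=lambda pair: d[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]],
        )
        u = f"u{counter}"
        counter += 1
        # Step 9: distances from the new node u to the other active nodes.
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        # Step 10: edge lengths from i and j to u.
        d_iu = 0.5 * (d[i][j] + r[i] - r[j])
        edges += [(i, u, d_iu), (j, u, d[i][j] - d_iu)]
        # Step 11: replace i and j by u.
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active                           # step 13
    edges.append((a, b, d[a][b]))
    return edges


names = ["A", "B", "C", "D"]
dist = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]
for edge in neighbor_joining(names, dist):
    print(edge)
```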


Neighbor joining has an execution time of $O(n^3)$ in its standard formulation. It has given extremely good results in practice and is computationally efficient [63,72]. Many practitioners have developed algorithms based on this approach. Gascuel [24] improved the neighbor-joining approach by using a simple first-order model of the variances and covariances of evolutionary distance estimates. Bruno et al. [10] developed a weighted neighbor joining using a likelihood-based approach. Goeffon et al. [25] investigated a local search algorithm under the maximum parsimony criterion by introducing a new subtree swapping neighborhood with an effective array-based tree representation.


Parsimony Methods 


In science, the notion of parsimony refers to the preference for simpler hypotheses over complicated ones. In the parsimony approach to tree building, the goal is to identify the phylogeny that requires the fewest changes to explain the differences among the observed sequences. Of the existing numerical approaches for reconstructing ancestral relationships directly from sequence data, this approach is the most popular one. Unlike distance-based methods, which build a tree from pairwise distances, it evaluates all possible trees and gives each a score based on the number of evolutionary changes needed to explain the observed sequences. The most parsimonious tree is the one that requires the fewest evolutionary changes for all sequences to derive from a common ancestor [69]. As an example, consider the trees in Fig. 3 and Fig. 4. The tree in Fig. 3 requires only one evolutionary change (marked by the star), whereas the tree in Fig. 4 requires two changes. Thus, Fig. 3 shows the more parsimonious tree.

There are two distinct components in parsimony methods: (1) determining the score of a given labeled tree, and (2) determining the global minimum score by evaluating all possible trees, as discussed below.


Score Computation Given a set of nucleotide se- 
quences, parsimony methods treat each site (position) 
independently. The algorithm evaluates the score at 
each position and then sums them up over all the po- 
sitions. As an example, suppose we have the following 
three aligned nucleotide sequences: 


CCC 
GGC 
CGC 


Then, for a given tree topology, we would calculate the minimal number of changes required at each of the three sites and then sum them up. Here, we investigate a traditional parsimony algorithm developed by Fitch [21], in which the number of substitutions required is taken as the score. For a particular topology, this approach starts by placing nucleotides at the leaves and traverses toward the root of the tree. At each internal node, the set of nucleotides common to both child nodes is placed; if this intersection is empty, the union of the two sets is placed instead. This continues until the root of the tree is reached. The number of union sets equals the number of substitutions required.

Algorithms for Genomic Analysis, Figure 3
Parsimony tree 1

Algorithms for Genomic Analysis, Figure 4
Parsimony tree 2

The general scheme for every position is shown in Algorithm 3.

Algorithm 3 (Parsimony: score computation)
1. Each leaf l is labeled with the set $R_l$ containing the observed nucleotide at that position.
2. Score S = 0
3. For all internal nodes k with children i and j having labels $R_i$ and $R_j$, visited from the leaves toward the root, do
4.    $R_k = R_i \cap R_j$
5.    if $R_k = \emptyset$ then
6.       $R_k = R_i \cup R_j$
7.       S = S + 1
8.    end if
9. End for
10. Minimal score = S

Algorithms for Genomic Analysis, Figure 5
The sets $R_k$ for the first site of the given three sequences


Figure 5 shows the sets $R_k$ obtained by Algorithm 3 for the first site of the three sequences shown above. The minimal score given by the algorithm is 1.
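A small Python sketch of Algorithm 3, assuming a rooted binary topology written as nested tuples of sequence indices (a representation chosen only for illustration). `fitch_site` performs the leaf-to-root pass for one site, and `fitch_score` sums the per-site scores over all positions, as described for the three example sequences.

```python
def fitch_site(tree):
    """Return (R_k, score) for one site; leaves are single nucleotides."""
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    r_i, s_i = fitch_site(left)
    r_j, s_j = fitch_site(right)
    r_k = r_i & r_j               # step 4: intersection
    score = s_i + s_j
    if not r_k:                   # steps 5-7: empty, so take the union
        r_k = r_i | r_j
        score += 1
    return r_k, score


def fitch_score(topology, sequences):
    """Sum the Fitch score over all sites for a topology of sequence indices."""
    def label(node, pos):
        if isinstance(node, int):
            return sequences[node][pos]
        return tuple(label(child, pos) for child in node)

    return sum(fitch_site(label(topology, pos))[1] for pos in range(len(sequences[0])))


sequences = ["CCC", "GGC", "CGC"]
print(fitch_site((("C", "G"), "C")))        # first site, as in Figure 5: ({'C'}, 1)
print(fitch_score(((0, 1), 2), sequences))  # 2
```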

A wide variety of approaches have been developed 
by modifying Fitch’s algorithm [68]. Sankoff and Ced- 
ergren [64] presented a generalized parsimony method 
which does not just count the number of substitutions, 
but also assigns a weighted cost for each substitution. 

Ronquist [62] reduced the computational time by including strategies for rapid evaluation of tree lengths and by increasing the exhaustiveness of branch swapping while searching topologies.


Search of Possible Tree Topologies The number of possible tree topologies increases dramatically with the number of sequences. Consequently, in practice usually only a subset of them is examined using efficient search strategies. The most commonly used strategy is branch and bound for selecting branching patterns [60]. For large-scale problems, heuristic methods are typically used [69]. These exact and heuristic tree search strategies are implemented in various programs such as PHYLIP (phylogeny inference package) and MEGA (molecular evolutionary genetic analysis) [20,47].
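To see how quickly the search space grows, one can use the standard counts for binary trees with n labeled leaves: $(2n-5)!! = 1 \cdot 3 \cdot 5 \cdots (2n-5)$ unrooted topologies for $n \ge 3$, and $(2n-3)!!$ rooted ones. The short Python sketch below, with illustrative function names, evaluates these double factorials.

```python
def unrooted_topologies(n: int) -> int:
    # (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5), valid for n >= 3.
    if n < 3:
        raise ValueError("need at least three leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def rooted_topologies(n: int) -> int:
    # A rooted tree on n leaves corresponds to an unrooted tree on n + 1 leaves.
    return unrooted_topologies(n + 1)


for n in (4, 5, 10, 20):
    print(n, unrooted_topologies(n), rooted_topologies(n))
```

Already at ten sequences there are about two million unrooted topologies, which is why exhaustive scoring quickly gives way to branch and bound and heuristics.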


Maximum Likelihood Methods 


The method of maximum likelihood is one of the most 
popular statistical tools used in practice. In molecular 
phylogenetics, maximum likelihood methods find the 
tree that has the highest probability of generating ob- 
served sequences, given an explicit model of evolution. 
The method was first introduced by Felsenstein [18]. We
discuss herein both the evolution models and the calcu- 
lation of tree likelihood. 


Model of Evolution A model of evolution describes events such as mutation, which change one sequence into another over a period of time. It is required to determine the probability of a sequence $S_2$ arising from an ancestral sequence $S_1$ over a period of time t. Various sophisticated models of evolution have been suggested, but simple models like the Jukes–Cantor model are preferred in maximum likelihood methods.

The Jukes–Cantor [37] model assumes that all nucleotides (A, C, T, G) undergo mutation with equal probability and change to each of the other three possible nucleotides with the same probability. If the mutation rate is 3α per unit time per site, the mutation matrix $P_{ij}$ (the probability that nucleotide i changes to nucleotide j in unit time) takes the form

$$P = \begin{pmatrix} 1-3\alpha & \alpha & \alpha & \alpha \\ \alpha & 1-3\alpha & \alpha & \alpha \\ \alpha & \alpha & 1-3\alpha & \alpha \\ \alpha & \alpha & \alpha & 1-3\alpha \end{pmatrix}$$

The above matrix is integrated to evaluate mutation rates over time t and is then used to calculate $P(nt_2 \mid nt_1, t)$, defined as the probability of nucleotide $nt_1$ being substituted by nucleotide $nt_2$ over time t.
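Under the usual continuous-time reading of this model (each of the three possible substitutions occurring at rate α), the integration gives the closed forms $P(nt \mid nt, t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$ and $P(nt_2 \mid nt_1, t) = \tfrac14 - \tfrac14 e^{-4\alpha t}$ for $nt_2 \ne nt_1$. The Python sketch below, with illustrative names, evaluates them. Since the expected fraction of differing sites is $f = 1 - P(nt \mid nt, t)$, inverting this relation recovers the Jukes–Cantor distance $d = 3\alpha t$ (expected substitutions per site), which the last lines check numerically.

```python
import math

NUCLEOTIDES = "ACGT"


def jc_prob(nt1: str, nt2: str, t: float, alpha: float) -> float:
    """P(nt2 | nt1, t) under the Jukes-Cantor model with per-pair rate alpha."""
    decay = math.exp(-4.0 * alpha * t)
    if nt1 == nt2:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def jc_matrix(t: float, alpha: float) -> list[list[float]]:
    return [[jc_prob(a, b, t, alpha) for b in NUCLEOTIDES] for a in NUCLEOTIDES]


alpha, t = 0.1, 2.0
f = 1.0 - jc_prob("A", "A", t, alpha)      # expected fraction of differing sites
d = -0.75 * math.log(1.0 - 4.0 * f / 3.0)  # Jukes-Cantor distance
print(round(d, 6))
```

Each row of `jc_matrix` sums to 1, at t = 0 the matrix is the identity, and for large t every entry approaches 1/4, the uniform equilibrium distribution.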


Algorithms for Genomic Analysis, Figure 6
A simple tree (leaves $S^1$ and $S^2$, branch lengths $t_1$ and $t_2$)


Various other evolution models, such as the Kimura model, have also been proposed in the literature [9,42].


Likelihood of a Tree The likelihood of a tree is calculated as the probability of observing a set of sequences given the tree:

$$L(\text{tree}) = P(\text{sequences} \mid \text{tree})$$


We begin with the simple case of two sequences $S^1$ and $S^2$ of length n having a common ancestor a, as shown in Fig. 6. It is assumed that all sites (positions) evolve independently, and thus the total likelihood is calculated as the product of the likelihoods of all sites [15]. Here, the likelihood of each site is obtained using substitution probabilities based on an evolution model.

Given that $q_a$ is the equilibrium distribution of nucleotide a, the likelihood for the simple tree in Fig. 6 is calculated as

$$L(\text{tree}) = P(S^1, S^2) = \prod_{i=1}^{n} P(S^1_i, S^2_i), \qquad P(S^1_i, S^2_i) = \sum_{a} q_a\, P(S^1_i \mid a)\, P(S^2_i \mid a).$$

To generalize this approach to m sequences, it is assumed that sequences evolve independently after diverging. Hence, the likelihood for every node in a tree depends only on its immediate ancestral node, and a recursive procedure is used to evaluate the likelihood of the tree. The conditional likelihood $L_{k,a}$ is defined as the likelihood of the subtree rooted at node k, given that the nucleotide at node k is a. The general schema for every site is shown in Algorithm 4. The likelihood is then maximized over all possible tree topologies and branch lengths.
Algorithm 4 (Likelihood: computation at given site)
1. For all leaves l do
2.    if leaf l has nucleotide a at that site then
3.       $L_{l,a} = 1$
4.    else
5.       $L_{l,a} = 0$
6.    end if
7. End for
8. For all internal nodes k with children i and j, visited from the leaves toward the root,
9.    define the conditional likelihood $L_{k,a} = \sum_{b,c} [P(b \mid a) L_{i,b}]\,[P(c \mid a) L_{j,c}]$
10. End for
11. Likelihood at given site = $\sum_a q_a L_{\text{root},a}$
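A recursive Python sketch of Algorithm 4 for a single site, assuming Jukes–Cantor substitution probabilities with α = 1, a uniform equilibrium distribution $q_a = 1/4$, and a binary tree written as nested tuples in which each child is paired with the length of the branch leading to it (all choices made for illustration). Because the sum over b and c in step 9 factorizes, the code computes the two bracketed sums separately.

```python
import math

NUCLEOTIDES = "ACGT"


def p_jc(b: str, a: str, t: float, alpha: float = 1.0) -> float:
    # P(b | a, t) under the Jukes-Cantor model.
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def conditional(node) -> dict:
    """Return {a: L_{node,a}}; a leaf is a nucleotide, an internal node
    is ((left, t_left), (right, t_right))."""
    if isinstance(node, str):                          # steps 1-7
        return {a: 1.0 if a == node else 0.0 for a in NUCLEOTIDES}
    (left, t_i), (right, t_j) = node
    L_i, L_j = conditional(left), conditional(right)
    return {                                           # steps 8-10
        a: sum(p_jc(b, a, t_i) * L_i[b] for b in NUCLEOTIDES)
        * sum(p_jc(c, a, t_j) * L_j[c] for c in NUCLEOTIDES)
        for a in NUCLEOTIDES
    }


def site_likelihood(tree) -> float:
    q = {a: 0.25 for a in NUCLEOTIDES}                 # uniform equilibrium (assumption)
    L_root = conditional(tree)
    return sum(q[a] * L_root[a] for a in NUCLEOTIDES)  # step 11


# Tree ((C, G), C), matching the first site of the example sequences.
tree = (((("C", 0.1), ("G", 0.1)), 0.2), ("C", 0.3))
print(site_likelihood(tree))
```

The full likelihood of a topology with given branch lengths is the product of such site likelihoods (in practice, the sum of their logarithms), which is then maximized over branch lengths and topologies.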


Recent Improvements The maximum likelihood approach has received great attention owing to the existence of powerful statistical tools. It has been made more sophisticated using advanced tree search algorithms, sequence evolution models, and statistical approaches. Yang [80] extended it to the case where the rate of nucleotide substitution differs across sites. Huelsenbeck and Crandall [34] incorporated improvements in substitution models. Piontkivska [59] evaluated the use of various substitution models in the maximum likelihood approach and concluded that simple models are comparable with complex models in terms of both efficiency and reliability.

The enormous number of possible tree topologies, especially when working with a large number of sequences, makes this approach computationally intensive [72]. It has been proved that reconstructing the maximum likelihood tree is NP-hard (nondeterministic polynomial-time hard), even for certain approximations [14]. To reduce computational time, Guindon and Gascuel [31] developed a simple hill-climbing algorithm based on the maximum-likelihood principle that adjusts tree topology and branch lengths simultaneously. Recently, parallel computation has been used to address the huge computational requirements. Stamatakis et al. [67] used OpenMP parallelization for symmetric multiprocessing machines, and Keane et al. [39] developed a distributed platform for phylogeny reconstruction by maximum likelihood.

Algorithms for Genomic Analysis, Figure 7
Two possible alignments for the given three sequences


Multiple Sequence Alignment 


Multiple sequence alignment is arguably among the 
most studied and difficult problems in computational 
biology. It is a vital tool because it compactly repre- 
sents conserved or variable features among the family 
members. Alignment also enables character-based analysis, as opposed to purely distance-based analysis, and thus helps to elucidate evolutionary relationships better. Consequently, it plays a pivotal role in a wide range of sequence analysis problems such as identifying conserved
motifs among given sequences, predicting secondary 
and tertiary structures of protein sequences, and molec- 
ular phylogenetic analysis. It is also used for sequence 
comparison to find the similarity of a new sequence 
with pre-existing ones. This helps in gathering infor- 
mation about the function and structure of newly found 
sequences from existing ones in databases like GenBank 
in the USA and EMBL in Europe. 

The multiple sequence alignment problem can be stated formally as follows. Let $\Sigma$ be the alphabet and let $\Sigma' = \Sigma \cup \{-\}$, where “−” is a symbol used to represent “gaps” in sequences. For DNA sequences, $\Sigma' = \{A, T, C, G, -\}$.

An alignment of N sequences $S_1, \ldots, S_N$ is given by a set $\hat{S} = \{\hat{S}_1, \ldots, \hat{S}_N\}$ of strings over the alphabet $\Sigma'$ that satisfies the following two properties: (1) the strings in $\hat{S}$ all have the same length; (2) $S_i$ can be obtained from $\hat{S}_i$ by removing the gaps. Thus, an alignment in which each string $\hat{S}_i$ has length K can be interpreted as an alignment matrix of N rows and K columns, where row i corresponds to sequence $S_i$. Letters that are placed in the same column of the alignment matrix are said to be aligned with each other.

Figure 7 shows two possible alignments for the three sequences $S_1 =$ CCC, $S_2 =$ CGGC, and $S_3 =$ CGC.
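The two defining properties of an alignment translate directly into a check. The Python sketch below uses the sequences $S_1, S_2, S_3$ above together with one hypothetical alignment (not necessarily either of those drawn in Figure 7); the function names are illustrative.

```python
def is_alignment(sequences, aligned, gap="-"):
    """Check the two defining properties of an alignment of `sequences`.

    (1) the aligned strings all have the same length K;
    (2) removing the gaps from the i-th aligned string gives back S_i.
    """
    if len(sequences) != len(aligned):
        return False
    if len({len(row) for row in aligned}) > 1:
        return False
    return all(row.replace(gap, "") == s for s, row in zip(sequences, aligned))

def columns(aligned):
    """Read the alignment as an N x K matrix and return its K columns."""
    return ["".join(col) for col in zip(*aligned)]

seqs = ["CCC", "CGGC", "CGC"]
aln = ["CC-C", "CGGC", "CG-C"]   # one possible alignment (hypothetical)
print(is_alignment(seqs, aln))   # True
print(columns(aln))              # ['CCC', 'CGG', '-G-', 'CCC']
```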

For two sequences, the optimal alignment is easily obtained using dynamic programming (the Needleman–Wunsch algorithm). Unfortunately, the problem becomes much harder for more than two sequences, and the optimal solution can be found only for a limited number of sequences of moderate length (approximately 100) [8]. Researchers have tried to solve it by generalizing the dynamic programming approach to a multidimensional space. However, this approach has huge time and memory requirements and thus cannot be used in practice even for small problems of five sequences of length 100 each. The algorithm has been improved by identifying the portion of the hyperspace that does not contribute to the solution and excluding it from the computation [11]. But even this approach of Carrillo and Lipman, as implemented in the multiple sequence alignment program MSA, can only align up to ten sequences [53]. Although Gupta et al. [32] improved the space and time usage of this approach, it still cannot align large data sets. To reduce the huge time and memory expenses, a wide variety of heuristic approaches for multiple sequence alignment have been developed [56].
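For reference, a minimal Python sketch of the two-sequence dynamic program mentioned above: global alignment with a linear gap cost. The scores (match +1, mismatch −1, gap −1) are illustrative assumptions, not values prescribed by the text. The table has (n + 1)(m + 1) entries, which is why the direct generalization to N sequences needs a table whose size grows exponentially with N.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment of two sequences with a linear gap cost.

    Returns (score, aligned_s, aligned_t). The scoring values are illustrative.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(s), len(t)
    # F[i][j] = best score for aligning the prefixes s[:i] and t[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,   # s[i-1] against a gap
                          F[i][j - 1] + gap)   # t[j-1] against a gap
    # Trace back from F[n][m] to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))

print(needleman_wunsch("CGGC", "CGC"))
```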

There are two components for finding the multiple 
sequence alignment: (1) searching over all the possible 
multiple alignments; (2) scoring each of them to find 
the best one. 

The problem becomes more complex for remotely related homologous sequences, i.e., sequences that derive from a common ancestor but have diverged so much that little sequence similarity remains [28]. Numerous approaches have been proposed, but the search for an approach that is both accurate and fast continues. Moreover, even the choice of sequences and the scoring of an alignment are nontrivial tasks and active research fields in their own right.


Scoring Alignment 


There is no universally accepted way of characterizing an alignment as the correct one, and the strategy depends on the biological context. Different alignments are possible, and we never know for sure which one is correct. Thus, one scores every alignment according to an appropriate objective function, and alignments with higher scores are deemed better. A typical alignment scoring scheme consists of the following steps.


Independent Columns The score of an alignment is calculated in terms of its columns. The individual columns are assumed to be independent, and thus the total score of an alignment is a simple summation over column scores: $\mathrm{score}(A) = \sum_j \mathrm{score}(A_j)$, where $A_j$ is column j of the multiple alignment A. The score of every column j is calculated as the “sum-of-pairs” function using the scoring matrices described below. The sum-of-pairs score for column $A_j$ is $\mathrm{score}(A_j) = \sum_{k<l} \mathrm{score}(A_j^k, A_j^l)$, where $A_j^k$ and $A_j^l$ are the nucleotides in column j of the alignment corresponding to sequences k and l, respectively. If the gap costs are linear, score(nucleotide, −) and score(−, nucleotide) are equal to the insertion cost. However, this approach does not differentiate between opening a gap and extending it. Hence, affine gap penalties are often used, in which the gap-opening and gap-extension penalties are treated as two different parameters. Choosing correct values for these two parameters is a major concern, since they can only be set empirically [75]. Also, most schemes used in practice score columns as a weighted sum of pairwise substitution scores instead of the plain sum described above. The weights are chosen according to the amount of independent information each sequence possesses [4].
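A minimal Python sketch of the column-wise sum-of-pairs score described above, assuming a linear gap cost. The values (match +1, mismatch −1, gap −2, gap against gap 0) are illustrative assumptions, and the weighted variant uses the product of the two sequence weights for each pair, which is one common choice and not necessarily the scheme of [4].

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score of one pair of entries; linear gap cost, gap-gap scores 0 (assumed)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sp_column(column, weights=None):
    """Sum-of-pairs score of one column: sum over k < l of w_k * w_l * score(A_j^k, A_j^l)."""
    n = len(column)
    w = weights or [1.0] * n
    return sum(w[k] * w[l] * pair_score(column[k], column[l])
               for k, l in combinations(range(n), 2))

def sp_alignment(aligned, weights=None):
    """score(A) = sum over columns j of score(A_j); columns are treated independently."""
    return sum(sp_column(col, weights) for col in zip(*aligned))

aln = ["CC-C", "CGGC", "CG-C"]
print(sp_alignment(aln))   # 1.0  (columns score 3, -1, -4, 3)
```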

Both the assumption that every column is independent and the use of the sum-of-pairs score for a column have limitations, which become more serious as the number of sequences increases.


Scoring Matrices Any alignment can be obtained by performing three evolutionary operations: insertion, deletion, and substitution. It is assumed that the different operations occur independently, and thus the complete score is evaluated as the sum of the scores of the individual operations. Insertion and deletion scores are calculated using either a linear or an affine gap penalty. Substitution scores are stored in a substitution score matrix, which contains the score for every pair of nucleotides. Thus, a score S(A, B) can be treated as the score of aligning nucleotide A with nucleotide B.

These substitution score matrices can be obtained 
in various ways. One could adopt an ad hoc approach 
of setting up a score matrix which produces good align- 
ments for a given set of sequences. The second ap- 
proach would be more fundamental and look into the 
physical and chemical properties of nucleotides. If two 
nucleotides have similar properties, they would be more 
likely to be substituted for one another. The third, and most prominent, is a statistical approach in which the maximum likelihood principle is used in conjunction with probabilistic models of evolution [3].
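Statistically derived substitution matrices are usually expressed as log-odds (log-likelihood-ratio) scores, $s(a, b) = \log \frac{p_{ab}}{q_a q_b}$, comparing the probability $p_{ab}$ of seeing a and b aligned in related sequences with the chance $q_a q_b$ of seeing them together at random. The Python sketch below computes such scores in half-bit units (the convention used for BLOSUM matrices); the frequencies are invented for illustration, and this is not necessarily the exact procedure of [3].

```python
import math

def log_odds_scores(pair_freq, base_freq, scale=2.0):
    """Integer log-odds score s(a, b) = round(scale * log2(p_ab / (q_a * q_b))).

    pair_freq[(a, b)] : probability that a and b are aligned in related sequences
    base_freq[a]      : background frequency of nucleotide a
    scale = 2 expresses the scores in half-bit units.
    """
    return {
        (a, b): round(scale * math.log2(p / (base_freq[a] * base_freq[b])))
        for (a, b), p in pair_freq.items()
    }

# Hypothetical frequencies: identical pairs are observed more often than chance.
bases = "ACGT"
q = {a: 0.25 for a in bases}
p = {(a, b): (0.10 if a == b else 0.05) for a in bases for b in bases}
S = log_odds_scores(p, q)
print(S[("A", "A")], S[("A", "C")])   # 1 -1
```

Positive scores mark pairs seen more often than chance in related sequences, negative scores pairs seen less often.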


Alignment Approaches 


The number of different approaches to the multiple sequence alignment problem has steadily increased over the last decade, so an exhaustive survey is not possible. In this chapter, we emphasize the most widely used class of algorithms together with newly emerging, promising approaches:

1. Progressive alignment algorithms: The most widely used type of algorithm, based on pairwise alignment information about the input sequences. It assumes that the input sequences are phylogenetically related and uses these relationships to guide the alignment [13].

2. Graph-based algorithms: A new trend where graph- 
based models are used to approach this problem. 

3. Iterative alignment algorithms: Typically an align- 
ment is produced and is then refined through a se- 
ries of iterations until no more improvement can be 
made. 


Progressive Algorithms 


Progressive alignment constitutes one of the simplest and most effective approaches to multiple alignment. This strategy was introduced by several researchers, including Waterman and Perlwitz [77]. Among all the progressive algorithms, ClustalW is the best known. It is a noniterative, deterministic algorithm that attempts to optimize the weighted sum-of-pairs score with affine gap penalties [73].

The typical progressive algorithm scheme is as fol- 
lows: 

• Compute the distance between all pairs of given sequences by aligning them. The distances represent the divergence of each pair of sequences. These distances can be calculated by fast approximation methods or by slower but more precise methods such as complete dynamic programming. Since N(N − 1)/2 pairwise scores have to be calculated for N given sequences, and the scores are used only to construct a guide tree and not the alignment itself, it is desirable to use approximation methods such as k-tuple matches.

• Find a guide tree from the distance matrix. This is typically achieved using the clustering algorithms discussed in the construction of an evolutionary tree. Once again, since the aim is to obtain the alignment and not the tree itself, approximation methods are used to construct the tree.

• Align sequences progressively according to the branching order in the guide tree. The basic idea is to start from the leaves of the guide tree, move toward its root, and use a series of pairwise alignments to align larger and larger groups of sequences. Some algorithms have only a single growing alignment to which every remaining sequence is aligned, whereas other approaches align subgroups of sequences and then merge the alignments.
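The first two steps of this scheme can be sketched in a few lines of Python. Both concrete choices here are illustrative assumptions: the distance is one minus the fraction of shared distinct k-tuples (a simplified stand-in for the k-tuple scores used by real programs such as ClustalW), and the guide tree is built by UPGMA-style average-linkage clustering. The third step, profile alignment along the tree, is omitted.

```python
from itertools import combinations

def kmers(s, k):
    """Distinct k-tuples (length-k substrings) occurring in s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def ktuple_distance(s, t, k=2):
    """Approximate distance: 1 - (shared k-tuples) / (k-tuples in the smaller set)."""
    a, b = kmers(s, k), kmers(t, k)
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / min(len(a), len(b))

def distance_matrix(seqs, k=2):
    """Step 1: distances for all pairs of named sequences (both orders stored)."""
    return {(x, y): ktuple_distance(seqs[x], seqs[y], k) for x in seqs for y in seqs}

def upgma_guide_tree(names, d):
    """Step 2: average-linkage (UPGMA-style) clustering; returns nested tuples.

    Each cluster is (subtree, member names); the closest pair of clusters is
    merged repeatedly, so the nesting gives the order used in step 3.
    """
    clusters = [(n, [n]) for n in names]

    def cluster_distance(c1, c2):
        return sum(d[x, y] for x in c1[1] for y in c2[1]) / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters[0][0]

seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
tree = upgma_guide_tree(list(seqs), distance_matrix(seqs))
print(tree)   # ('s3', ('s1', 's2'))
```

In the result, s1 and s2 would be aligned first and s3 added last, following the branching order of the guide tree.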

There are three main shortcomings of progressive algorithms.

1. There is no undisputed “best” way of ordering the given sequences.

2. Once a sequence has been aligned, that alignment will not be modified even if it conflicts with sequences added later in the process. Hence, the order in which sequences are added becomes crucial, and since there is no undisputed best way to order the sequences, this approach can return suboptimal solutions.

3. For a given set of n sequences, $\binom{n}{2}$ pairwise alignments are generated, but most of these algorithms use fewer than n of them when computing the final multiple alignment. Thus, the resulting multiple alignment agrees with only a small part of the information available in the data.

Therefore, there is a growing need for algorithms that can align extremely divergent sequences whose pairwise alignments are likely to be incorrect. Several techniques have been developed to address these issues; although innovative, each has its own assumptions and drawbacks.


Graph-Based Algorithms 


Over the last few years, the field of genomics has undergone rapid change, with many new solution strategies. The use of graph-based models is one of the most prominent and far-reaching of these trends. Just and Vedova [38] used a relation between the facility location problem and sequence alignment to prove the NP-hardness of multiple sequence alignment. In this section, we review the most prominent graph-based approaches, including integer programming formulations, for finding a multiple sequence alignment.


Maximum-Weight Trace Kececioglu et al. [40] used a solution of the maximum weight trace problem to construct an alignment. The algorithm starts by calculating all pairwise alignments and using them to find a trace. To achieve this, given n sequences, an input alignment graph G = (V, E) is constructed. It is an n-partite graph whose vertex set V represents the characters of the given sequences and whose edge set E represents the pairs of characters matched in the pairwise alignments. The subset of edges in E realized by an alignment is called a trace.

The alignment graph G = (V, E) is extended to a mixed graph G′ = (V, E, A) by adding an arc set A that connects each character of every sequence to the next character in the same sequence. The objective of the algorithm is to find a maximum weight trace, i.e., a maximum weight subset of E that contains no so-called “critical mixed cycle” of G′; excluding these cycles ensures that the selected edges satisfy the sequence alignment properties [61].

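The integer programming model for this problem is formulated as

$$\max \sum_{e \in E} w_e x_e \qquad (1)$$

$$\text{subject to} \sum_{e \in P \cap E} x_e \le |E \cap P| - 1 \quad \text{for all critical mixed cycles } P \text{ in } G', \qquad x_e \in \{0, 1\} \text{ for all } e \in E, \qquad (2)$$

where $w_e$ is the weight of edge e and $x_e = 1$ if and only if edge e belongs to the trace.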


A branch-and-cut algorithm is used to solve the above problem. Various valid inequalities for the polytope, some of which are facet-defining, are added as cuts. The algorithm is capable of giving an exact solution under the sum-of-pairs objective function with linear gap costs. Kececioglu et al. [40] made a significant contribution by introducing a polyhedral approach capable of obtaining exact solutions for a subclass of multiple sequence alignment problems. However, this method has its own drawbacks, such as its inability to capture the order of insertions and deletions between two matchings or to handle affine gap costs. Recently, Althaus et al. [2] proposed a general model based on this approach in which arbitrary gap costs are allowed.


Minimum-Spanning Tree and Traveling Salesman 
Problem Shyu et al. [65] explored the use of min- 
imum spanning trees to determine the order of se- 
quences. The idea of the approach is to preserve the 
most informative distances among the set of given sequences. The criterion used is meaningful and can work better than traditional criteria such as sum-of-pairs. The algorithm itself is efficient in practice and easy to implement. However, it does not use all the information in the pairwise alignments, since it uses only their scores and not the alignments themselves. Moreover, this approach has all the drawbacks of the progressive strategy.

A similar approach was also developed by Korosten- 
sky and Gonnet [44] using the traveling salesman prob- 
lem. In this technique, a circular sum measure is used 
instead of a sum-of-pairs score. The cities in the travel- 
ing salesman problem correspond to the sequences and 
the scores of pairwise alignment are taken as the dis- 
tances. The problem is to find the longest tour where 
each sequence is visited exactly once [45]. 
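For a handful of sequences the longest tour can be found by exhaustive search, as in the Python sketch below. The brute-force search and the function name are illustrative assumptions; practical implementations rely on TSP algorithms and heuristics, since the number of tours grows factorially. Fixing the first sequence removes rotations of the same tour.

```python
from itertools import permutations

def best_circular_order(names, score):
    """Exhaustive search for the cyclic order of the sequences that maximizes
    the circular sum of pairwise alignment scores (the longest tour).

    score[(x, y)] is the pairwise alignment score, assumed symmetric.
    Fixing names[0] as the start removes equivalent rotations of a tour.
    """
    first, rest = names[0], list(names[1:])
    best_total, best_tour = float("-inf"), None
    for perm in permutations(rest):
        tour = (first,) + perm
        total = sum(score[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))
        if total > best_total:
            best_total, best_tour = total, tour
    return best_tour, best_total

# Hypothetical scores: neighbours on the ring a-b-c-d-a score 10, other pairs 0.
names = ["a", "b", "c", "d"]
ring = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}
score = {(x, y): 10 if (x, y) in ring or (y, x) in ring else 0 for x in names for y in names}
print(best_circular_order(names, score))   # (('a', 'b', 'c', 'd'), 40)
```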


Eulerian Path Approach Zhang and Waterman [81] 
proposed a new approach motivated by the Eulerian 
method for fragment assembly in DNA sequencing. In 
their work, a consensus sequence is found and later 
pairwise alignments are obtained between each input 
sequence and the consensus sequence. Finally, the multiple sequence alignment is obtained from these pairwise alignments. The most significant advantage of this method is the linear time and memory cost of finding the consensus sequence. Moreover, if the consensus sequence is the one closest to all given sequences, a good-quality alignment can be obtained in a reasonable amount of time. Once again, however, this approach suffers from the main drawback of the progressive strategy, as well as from issues in graph construction while finding the consensus sequence.


Iterative Algorithms 


The main shortcoming of the progressive strategy is its failure to correct errors introduced early in the alignment. Iterative algorithms were developed precisely to overcome this flaw. They are based on the idea of reconsidering and realigning previously aligned sequences with the goal of improving the overall alignment score. Each modification step is an iteration that attempts to improve the quality of the alignment.

These approaches can be classified into two broad categories: probabilistic iterative algorithms and deterministic iterative algorithms. We briefly discuss them below.


Probabilistic Algorithms We discuss both traditional probabilistic optimization approaches, such as the genetic algorithm, and relatively recent approaches based on Bayesian ideas.

• Simulated annealing and genetic algorithm. Simulated annealing and the genetic algorithm are very popular stochastic methods for solving complex optimization problems. While they are often viewed as separate and competing paradigms, both are iterative algorithms that search for new solutions “near” already known good solutions. The fundamental difference between them is that simulated annealing performs a local move on only one solution to create a new solution, whereas the genetic algorithm also creates solutions by combining information from two different solutions. The performance of simulated annealing and the genetic algorithm varies with the problem and the representation used. The algorithms start with an initial alignment, and the alignment score is taken to be the objective function [57]. Various operations such as mutation, insertion, and substitution constitute the local move that is used to get a new solution from existing ones. Flexibility in the scoring systems and the ability to correct errors introduced during the early phase make these approaches desirable [41].

• Hidden Markov model and Gibbs sampler. The hidden Markov model and the Gibbs sampler are relatively recent approaches that view multiple sequence alignment in a statistical context. Both use the central Bayesian idea of simultaneously optimizing the alignment and the model. The Gibbs sampler finds motifs using local alignment techniques [49]. It is essentially similar to a hidden Markov model with no insert and delete states.

The hidden Markov model is a statistical model based on a Markov process, and it has gained importance in various fields related to pattern recognition. It infers the hidden parameters of the system from the observable parameters of the model. For multiple sequence alignment, the hidden Markov model consists of three types of states: match states, insert states, and delete states [46]. Match and insert states have their own emission probabilities for nucleotides (delete states are silent), and every state has transition probabilities to other states. The standard expectation-maximization algorithm or gradient descent algorithms are used to train the model and estimate its parameters.

Although the hidden Markov model has been used successfully in other areas, it faces many challenges here. A minimum number of sequences (approximately 50) is required to train the model, and, like other hill-climbing approaches, the hidden Markov model can easily be trapped in local optima [35].


Deterministic Algorithms A deterministic iterative algorithm starts with an initial alignment and then attempts to improve it. This helps overcome the drawback of the progressive alignment strategy, in which partial alignments are “frozen” [6]. A typical scheme is as follows:

• Given N sequences $S_1, S_2, \ldots, S_N$, find an alignment A.

• Remove sequence $S_1$ from alignment A and realign it to the profile of the other aligned sequences $S_2, \ldots, S_N$ to get a new alignment A′.

• Calculate the score of the new alignment A′, and if it is better, replace A by A′.

• Remove sequence $S_2$ from the current alignment and realign it. Continue this procedure for $S_3, \ldots, S_N$.

• Repeat the realignment steps until the alignment score converges or the number of iterations reaches the user-specified limit.

Many iteration strategies that yield very accurate alignments have been developed [76]. The aim is to reduce the greedy nature of the algorithm and avoid getting trapped in a local optimum. One approach is to remove and realign every sequence against the rest in each iteration; the alignment with the best score is then taken as the input for the next iteration. Another popular approach is to randomly split the set of sequences into two subsets, which are then realigned against each other.

Some researchers have incorporated the iterative 
strategy in the progressive alignment procedure it- 
self. For instance, a double iteration loop has been 
used to make the alignment, guide tree, and sequence
weights mutually consistent [27]. Recently, Chakra- 
barti et al. [12] developed an approach which provides 
a fast and accurate method for refining existing block- 
based alignments. 


Novel Graph-Theoretical Genomic Models 


In this section, we present our research effort for a novel 
graph-theoretical approach for representing a wide va- 
riety of genomic sequence analysis problems within 
a single model [50]. The model allows incorporation 
of the operations “insertion,” “deletion,” and “substitution,” as well as various parameters such as relative distances and weights. We refer to the problem as the minimum weight common mutated sequence (MWCMS) problem. The MWCMS model has many
applications, including the multiple sequence align- 
ment problem, phylogenetic analysis, the DNA se- 
quencing problem, and the sequence comparison prob- 
lem, which encompass a core set of very difficult prob- 
lems in computational biology. Thus, the model pre- 
sented in this section lays out a mathematical model- 
ing framework that allows one to investigate theoretical 
and computational issues, and to forge new advances 
for these distinct, but related problems. 

DNA sequencing refers to determining the exact order of nucleotides in a segment of DNA.
This was the greatest technical challenge in the Human 
Genome Project. Achieving this goal has helped reveal 
the estimated 30,000 human genes that are the basic 
physical and functional units of heredity. The resulting 
DNA sequence maps are being used by scientists to ex- 
plore human biology and other complex phenomena. 

The structure of a DNA strand (sequence) is deter- 
mined by experimentation. Typically, short sequences 
are determined to be in the strand, and the short 
sequences identified are then “connected” to form 
a long sequence. Recent advances attempting to identify DNA strand structure involve sequencing by hybridization [5,36]. Sequencing by hybridization is the process in which every possible sequence of length n ($4^n$ possibilities) is compared with a full DNA strand. Practical values for n are 8 to 12. Each short string either binds
or does not bind to the full strand. Biologists can thus 
determine exactly which short strings are contained in 
the DNA strand and which are not. 
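An idealized, error-free version of this experiment can be sketched in Python: test each of the $4^n$ probes and keep those that occur in the strand. The function names are hypothetical. The result is a set, so the positions of the short strings are lost, which is the reconstruction issue discussed next.

```python
from itertools import product

def spectrum(strand, n):
    """All length-n substrings of the strand (the probes that would bind)."""
    return {strand[i:i + n] for i in range(len(strand) - n + 1)}

def hybridize(strand, n, alphabet="ACGT"):
    """Idealized, error-free experiment: test each of the 4**n probes and
    report which ones occur in the strand. Positions are not reported."""
    present = spectrum(strand, n)
    return {"".join(p) for p in product(alphabet, repeat=n) if "".join(p) in present}

print(sorted(hybridize("ACGTAC", 3)))   # ['ACG', 'CGT', 'GTA', 'TAC']
```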


However, the experiment does not identify the ex- 
act location of each short string in the full strand. 
Hence, an important issue involves how these short 
strings are connected together to form the complete 
strand. This problem can be viewed as a shortest common superstring problem and has been studied extensively [22,23,54]. Unfortunately, errors may arise during sequencing experiments. The three types of errors are deletions (a letter appears in an input string that should not be in the final sequence), insertions (a letter is missing from an input string), and substitutions (a letter in an input string should be replaced by another letter). The MWCMS problem can be used to model and solve this shortest common superstring problem while addressing the issue of possible errors.

Sequence comparison is one of the most crucial problems faced by researchers in bioinformatics. Sequence patterns tend to be conserved during evolution, so, given a new sequence, it is of interest to understand how similar it is to pre-existing sequences. Significant similarity between two sequences suggests similarities in their structures and/or functions. There are many DNA databases containing DNA sequences and their functions; the major ones are GenBank in the USA and the EMBL data library in Europe. If one finds a new sequence similar to existing ones in these databases, one can transfer information about function and structure [78]. Hence, an algorithm for sequence comparison that is efficient for a large number of sequences will play a pivotal role in rapid sequence analysis. The MWCMS problem can be used to address this issue.


Definitions 


Our motivation for first defining the problem arose 
from the desire to help quantify the concept of the 
“best” representative sequence in the evolutionary dis- 
tance problem. The evolutionary distance problem in- 
volves finding the DNA sequence of the most likely an- 
cestor associated with a given set of DNA sequences 
from distinct but similar organisms. In other words, the goal is to find the DNA strand that best represents a possible ancestor, assuming that each of the organisms evolved from the
same ancestor. Changes that contribute to differences 
between the given sequences and the ancestor are re- 
ferred to as insertions, deletions, and substitutions. 
These operations account for both evolutionary mutations and experimental errors in sequencing. Mathematically, given two sequences S and B, let ord(S, B) be an ordered collection of insertions, deletions, and substitutions that converts sequence S to sequence B. (For any two sequences S and B, there are infinitely many such collections ord(S, B).) Let w(ord(S, B)) be the weight of the conversion from S to B, where the weight is the sum of the weights of the individual operations, and $\eta, \delta, \omega \in \mathbb{R}^+$ represent the weights associated with a single insertion, deletion, and substitution, respectively; that is, w(ord(S, B)) = (number of insertions)$\cdot\eta$ + (number of deletions)$\cdot\delta$ + (number of substitutions)$\cdot\omega$. Let ord*(S, B) be such that w(ord*(S, B)) ≤ w(ord(S, B)) for all ord(S, B), and define d(S, B) = w(ord*(S, B)). Formally, the MWCMS problem can be stated as follows: Given positive weights $\eta$, $\delta$, and $\omega$ corresponding to a single insertion, deletion, and substitution, respectively, a positive threshold $\kappa$, and finite sequences $S_1, \ldots, S_m$ from a finite alphabet, does there exist a sequence B such that $\sum_{i=1}^{m} d(S_i, B) \le \kappa$?
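For position-independent weights, d(S, B) can be computed with the standard edit-distance dynamic program (the Wagner–Fischer recurrence), generalized to the weights $\eta$, $\delta$, $\omega$. The Python sketch below assumes exactly that setting; the names `d` and `mwcms_objective` are illustrative. It evaluates the left-hand side $\sum_i d(S_i, B)$ for a given candidate B; finding the best B is the hard part of the problem and is not attempted here.

```python
def d(S, B, eta, delta, omega):
    """Minimum total weight of insertions (eta), deletions (delta), and
    substitutions (omega) converting S into B (Wagner-Fischer recurrence)."""
    n, m = len(S), len(B)
    # D[i][j] = d(S[:i], B[:j])
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * delta          # delete every letter of S[:i]
    for j in range(1, m + 1):
        D[0][j] = j * eta            # insert every letter of B[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            keep_or_sub = D[i - 1][j - 1] + (0 if S[i - 1] == B[j - 1] else omega)
            D[i][j] = min(keep_or_sub,            # keep s_i or substitute it
                          D[i - 1][j] + delta,    # delete s_i
                          D[i][j - 1] + eta)      # insert b_j
    return D[n][m]

def mwcms_objective(sequences, B, eta, delta, omega):
    """sum_i d(S_i, B): the quantity compared with the threshold kappa."""
    return sum(d(S, B, eta, delta, omega) for S in sequences)

print(mwcms_objective(["CCC", "CGGC", "CGC"], "CGC", 1, 1, 1))   # 2
```

With a large substitution weight, for example $\omega = 5$ and $\eta = \delta = 1$, the recurrence automatically prefers a deletion plus an insertion over a substitution.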

We have defined the MWCMS problem—which in- 
corporates the notions of insertion, deletion, and sub- 
stitution—to help quantify the concept of the “best” 
representative sequence in the evolutionary distance 
problem. We now define precisely the operations of insertion, deletion, and substitution. Let $S = \{s_1, \ldots, s_n\}$ be a finite sequence of letters from a finite alphabet:

1. An insertion of an element x in position i of the sequence S is characterized by the addition of x between elements $s_i$ and $s_{i+1}$. An insertion carries an associated penalty cost of $\eta$.

2. A deletion of an element in position i of S amounts to deleting $s_i$ from the sequence S. The penalty for deletion is represented by $\delta$.

3. A substitution of an element in position i of S amounts to replacing $s_i$ with another letter from the alphabet. The penalty for substitution is represented by $\omega$.


We remark that a penalty cost for an operation could, 
more generally, depend on the position where the op- 
eration is performed and/or the element to be in- 
serted/deleted/substituted. 

Let $S_1 = \{s_{11}, \ldots, s_{1m}\}$ and $S_2 = \{s_{21}, \ldots, s_{2n}\}$ be two finite sequences of letters from a finite alphabet $\Sigma$. We say that the relative distance between elements $s_{1i}$ and $s_{2j}$ is k if |i − j| = k. We define a k-restrictive bipartite graph as a graph $G_k = (V_1, V_2, E_k)$ such that the nodes in $V_1$ and $V_2$ correspond, respectively, to the elements of the first and the second sequences. We assume the nodes in $V_i$ are ordered in the same order as they appear in the sequence $S_i$. There is an edge between nodes $u \in V_1$ and $v \in V_2$ if u and v are identical (i.e., the same letter of the alphabet $\Sigma$) and the relative distance between these two elements is less than or equal to k. The problem of identifying the “greatest similarity” between these two sequences can then be approached as the problem of finding a maximum cardinality matching between the associated node sets, subject to restrictions on which matchings are allowed. In particular, one must take into consideration the ordering of nodes so as to preserve the relative order of the elements in the matching; that is, matchings with edge crossings must be prevented. When $k = \max\{|S_1|, |S_2|\} - 1$, we denote the graph by $G = (V_1, V_2, E)$, and the problem is equivalent to the well-studied longest common subsequence problem for two sequences, which is polynomial-time solvable [23].
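A maximum matching without edge crossings in $G_k$ can be found with the longest-common-subsequence recurrence restricted to the edges of $G_k$. The Python sketch below (0-based positions, hypothetical function names) builds the edge set and computes the size of such a matching; with $k = \max\{|S_1|, |S_2|\} - 1$ every pair of equal letters is an edge, and the result is the ordinary LCS length.

```python
def k_restrictive_edges(s1, s2, k):
    """Edges (i, j) of G_k: equal letters whose relative distance |i - j| is at most k."""
    return [(i, j) for i, a in enumerate(s1) for j, b in enumerate(s2)
            if a == b and abs(i - j) <= k]

def max_noncrossing_matching(s1, s2, k):
    """Size of a maximum matching in G_k with no edge crossings, computed by
    dynamic programming over prefixes (LCS recurrence limited to edges of G_k)."""
    n, m = len(s1), len(s2)
    # M[i][j] = best matching using only s1[:i] and s2[:j]
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(M[i - 1][j], M[i][j - 1])
            if s1[i - 1] == s2[j - 1] and abs(i - j) <= k:
                M[i][j] = max(M[i][j], M[i - 1][j - 1] + 1)
    return M[n][m]

print(k_restrictive_edges("ACGT", "AGCT", 0))        # [(0, 0), (3, 3)]
print(max_noncrossing_matching("ACGT", "AGCT", 0))   # 2
print(max_noncrossing_matching("ACGT", "AGCT", 3))   # 3 (plain LCS length)
```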


Construction of a Conflict Graph 
from Paths of Multiple Sequences 


Let $S_i$, i = 1, ..., m, be a collection of finite sequences, each of length n, over a common alphabet $\Sigma$. Let $G_k = (V_1, \ldots, V_m, E_1, E_2, \ldots, E_{m-1})$ be the k-restrictive multilayer graph in which each element of $S_i$ forms a distinct node in $V_i$. Assume the nodes in $V_i$ are ordered in the same order as they appear in the sequence $S_i$. $E_i$ denotes the set of edges between nodes in $V_i$ and $V_{i+1}$. There is an edge between nodes $u \in V_i$ and $v \in V_{i+1}$ if and only if u and v are the same letter of the alphabet $\Sigma$ and the relative distance between them is less than or equal to k. The multiple sequence comparison problem involves finding the longest common subsequence of the sequences $S_i$, i = 1, ..., m. We call a path $P = p_1, p_2, \ldots, p_m$ a complete path in $G_k$ if $p_i \in V_i$ and $p_i p_{i+1} \in E_i$. Two complete paths are said to be parallel if their node sets are disjoint and their edges do not cross. Hence, a set of parallel complete paths in $G_k$ corresponds to a feasible solution to the longest common subsequence problem on the collection of sequences $S_i$, i = 1, ..., m. We say that two complete paths $P_1$ and $P_2$ cross if they are not parallel. We remark that the longest common subsequence problem with a bounded number of sequences is polynomial-time solvable using dynamic programming [23]. In general, the problem remains NP-complete.

We can incorporate insertions by generating new 
paths which include inserted nodes on various layers. 
The weight for such a new path will be affected by the 
total number of insertions in the path. In particular, if 
L is a common subsequence of the $S_i$ and $|S_i| = n$ for all i = 1, ..., m, then the total number of unmatched elements remaining will be m(n − |L|). These elements can be deleted completely, or, for a given unmatched element, one can increase the size of L by 1 by appropriately inserting this element into various sequences. By doing so, one decreases the number of unmatched elements. Let I be the number of insertions needed to generate a new complete path. Then the number of unmatched elements will decrease by m − I. If we assume that at the end of the sequencing process all unmatched elements will be deleted, then the penalty for generating this new complete path will be given by $I\eta - (m - I)\delta$.
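The bookkeeping above amounts to two small formulas. The Python sketch below evaluates them for hypothetical numbers (m = 4 sequences of length n = 10 with |L| = 7); the function names are illustrative. A negative penalty means that creating the new complete path is cheaper than deleting the elements it matches.

```python
def unmatched_elements(m, n, lcs_length):
    """Unmatched elements left over when a common subsequence L of m
    sequences of length n is fixed: m * (n - |L|)."""
    return m * (n - lcs_length)

def new_path_penalty(m, insertions, eta, delta):
    """Net penalty I*eta - (m - I)*delta of one new complete path that needs
    I insertions, assuming all remaining unmatched elements are deleted."""
    return insertions * eta - (m - insertions) * delta

print(unmatched_elements(4, 10, 7))   # 12
print(new_path_penalty(4, 1, 2, 1))   # -1: the new path lowers the total weight
print(new_path_penalty(4, 3, 2, 1))   # 5: here deleting the elements is cheaper
```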

We next define the concept of a conflict graph rela- 
tive to the complete paths in Gx. 


Definition 1 Let P = {P,,...,P;} be a finite col- 

lection of complete paths in Gy. The conflict graph 

Cp = (Vp, Ep) associated with P is constructed as fol- 

lows: 

e@ Vp = {P,,...,Ps}s 

e there is an edge between two nodes P; and P; in Vp 
if and only if P; and P; cross each other. 


This definition applies to any multilayer graph. Note that any stable set of nodes in $C_P$ corresponds to a set of parallel complete paths in $G_L$, and thereby to a feasible solution to the longest common subsequence problem on the collection of sequences $S_i$, $i = 1, \dots, m$.
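Definition 1 can be made concrete with a small Python sketch. It assumes, hypothetically, that a complete path is recorded as the left-to-right position of its node on each layer and that arcs are drawn as straight segments between consecutive layers; under that layout, two paths are parallel exactly when one stays strictly to the left of the other on every layer. The helper names are illustrative and not taken from [50].

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Path = Tuple[int, ...]  # position of the path's node on each layer, first layer first


def parallel(p: Path, q: Path) -> bool:
    """True if p and q are node-disjoint and no two of their edges cross.

    With nodes drawn left to right by position and edges drawn as straight
    segments between consecutive layers, this holds exactly when p lies
    strictly left of q on every layer, or strictly right on every layer.
    """
    diffs = [a - b for a, b in zip(p, q)]
    return all(d < 0 for d in diffs) or all(d > 0 for d in diffs)


def conflict_graph(paths: Sequence[Path]) -> Dict[int, Set[int]]:
    """Adjacency sets of C_P: one vertex per path, one edge per crossing pair."""
    adj: Dict[int, Set[int]] = {k: set() for k in range(len(paths))}
    for a, b in combinations(range(len(paths)), 2):
        if not parallel(paths[a], paths[b]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def is_stable(adj: Dict[int, Set[int]], nodes: List[int]) -> bool:
    """A stable set of C_P is a set of pairwise parallel complete paths."""
    return all(b not in adj[a] for a, b in combinations(nodes, 2))


# Three hypothetical complete paths in a three-layer graph.
paths = [(0, 0, 0), (1, 2, 1), (2, 1, 2)]
adj = conflict_graph(paths)
print(adj)                     # {0: set(), 1: {2}, 2: {1}}
print(is_stable(adj, [0, 1]))  # True
print(is_stable(adj, [1, 2]))  # False
```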

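We remark that when $m = 2$, the resulting conflict graph is weakly triangulated, and thus is perfect. For $m > 2$, the conflict graph can contain an antihole of size 6, so it need not be weakly triangulated. However, the complete paths can be viewed as continuous functions on the interval $[0, 1]$, and two paths cross exactly when their functions intersect; by construction, $C_P$ is therefore the intersection graph of a family of continuous functions, and such graphs are perfect [26].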


Complexity Theory 

Recall that the notation $\mathrm{ord}(S, B)$, $w(\mathrm{ord}(S, B))$, $\mathrm{ord}^*(S, B)$, and the formal definition of the MWCMS problem were given in Sect. “Definitions”. As an optimization problem, the MWCMS problem can be stated as follows. Given a set of input sequences, the MWCMS problem seeks to mutate every input sequence into the same a priori unknown sequence using the operations of insertion, deletion, and substitution; a weight is assigned to each operation, and the total weight of all mutations is to be minimized. Levenshtein [51] first considered a special case of this problem, in which a single input sequence is transformed into another given sequence using insertions, deletions, and substitutions. Our study involves transforming multiple input sequences into an a priori unknown common sequence.

Given positive weights $\gamma$, $\delta$, and $\omega$ corresponding, respectively, to insertions, deletions, and substitutions, and any two sequences $S$ and $B$, clearly any $\mathrm{ord}^*(S, B)$ will never contain more than $|B|$ insertions or substitutions, since in an optimal conversion each such operation yields a distinct letter of $B$. Proving that the MWCMS is in NP is not obvious. While one can transform the MWCMS to special applications (as described at the beginning of Sect. “Novel Graph-Theoretical Genomic Models”) to conclude that it is in NP, here we prove it directly for the general case. To do so, one needs to be able to evaluate $d(S, B)$ in polynomial time for any two sequences $S$ and $B$. We next construct a graph that can be used to establish the existence of a polynomial-time algorithm for obtaining $d(S, B)$. The constructs and arguments used here typify those used to establish many of the results presented in this chapter. It is noteworthy that the notions of both conflict graph and perfect graph come into play.

Let $\Sigma$ be a finite alphabet, and define a $\Sigma$-cross to be a directed bipartite graph consisting of $|\Sigma|$ vertices in each bipartition, such that each vertex in a bipartition represents a distinct element of $\Sigma$. There is an arc between two vertices if the vertices correspond to the same element of $\Sigma$, and the geometric layout is rigidly constructed so that every arc crosses every other arc. This graph will be used as a “supernode” for insertion and substitution operations in our model. Figure 8 shows an example of a $\Sigma$-cross when $\Sigma = \{A, C, G, T\}$.

Algorithms for Genomic Analysis, Figure 8: An example of a $\Sigma$-cross when $\Sigma = \{A, C, G, T\}$

We now construct a three-layer supergraph, $G_T$, using the sequences $S$ and $B$ along with the $\Sigma$-cross graphs. Layers 1 and 2 consist of exactly $|B|(|S| + 1) + |S|$ $\Sigma$-crosses. The first $|B|$ $\Sigma$-crosses represent potential insertions before the first letter of $S$. The next $\Sigma$-cross represents either the first letter of $S$ or a substitution of this letter. The next $|B|$ $\Sigma$-crosses represent potential insertions between the first and second letters of $S$, and they are followed by a $\Sigma$-cross representing either the second letter of $S$ or a substitution of this letter. This continues for each letter of $S$, with the final $|B|$ $\Sigma$-crosses representing up to $|B|$ insertions after the last letter of $S$. Each $\Sigma$-cross is called either an insertion supernode or a substitution supernode, according to what it represents. The weight of every arc in an insertion supernode is $\gamma$. An arc in a substitution supernode has weight $-\delta$ if it represents the original letter of the sequence, or $\omega - \delta$ if it represents a substitution of the original letter. Layer 3 consists of the vertices represented by $B$. A vertex in layer 2 is connected to a vertex in layer 3 if they have the same letter. The weight of every arc between layers 2 and 3 is $M < -(\gamma + \delta + \omega)$. A sample three-layer supergraph is given in Fig. 9. The bold arcs denote the original letters in $S$ (the weight of these arcs is $-\delta$). For simplicity, we omit the first two insertion supernodes before the first letter G. The first supernode shown thus represents the letter G of the original sequence and allows for substitution. The second and third supernodes are insertion supernodes, and the fourth supernode corresponds to the letter C and also allows substitution. The two insertion supernodes after the last letter are likewise omitted from the graph.
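The layout of layers 1 and 2 of $G_T$ and the arc weights just described can be sketched in Python as follows. The function names and the tuple encoding of supernodes are hypothetical; the example rebuilds the full supernode sequence for $S = GC$, $B = TC$, of which Fig. 9 shows only the middle four supernodes.

```python
from typing import List, Tuple

Supernode = Tuple[str, str]  # ("ins", "") or ("sub", original letter)


def supernode_layout(s: str, slots: int) -> List[Supernode]:
    """Left-to-right order of the Sigma-crosses in layers 1 and 2.

    `slots` insertion supernodes precede the first letter of s and follow
    every letter, including the last one; in G_T, slots = |B|.  The result
    has slots * (len(s) + 1) + len(s) entries.
    """
    layout: List[Supernode] = [("ins", "")] * slots
    for letter in s:
        layout.append(("sub", letter))
        layout.extend([("ins", "")] * slots)
    return layout


def arc_weight(node: Supernode, letter: str, gamma: float, delta: float, omega: float) -> float:
    """Weight of the arc labelled `letter` inside the given supernode."""
    kind, original = node
    if kind == "ins":
        return gamma          # every insertion arc
    if letter == original:
        return -delta         # keep the original letter
    return omega - delta      # substitute the original letter


layout = supernode_layout("GC", slots=2)  # S = GC, |B| = 2 (B = TC)
print(len(layout))                        # 8 = 2 * (2 + 1) + 2
print(layout[2], layout[5])               # ('sub', 'G') ('sub', 'C')
print(arc_weight(layout[2], "T", gamma=1.0, delta=1.0, omega=1.0))  # 0.0
```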

The main step in proving that $d(S, B)$ can be computed in polynomial time for any sequences $S$ and $B$ involves the conflict graph as defined in Definition 1. We state some preliminary theoretical results below. Detailed proofs can be found in Lee et al. [50].


Algorithms for Genomic Analysis, Figure 9: An example of the three-layer supergraph (layers 1, 2, and 3) for converting the sequence $S = GC$ to $B = TC$. Bold arcs denote the original letters in $S$ (the weight of these arcs is $-\delta$). The two insertion supernodes before the first letter G and the two after the last letter C are omitted; the supernodes shown are a substitution supernode for G, two insertion supernodes, and a substitution supernode for C.


Lemma 1 The following statements are equivalent:

1. There exists a conversion from $S$ to $B$ using no more than a total of $|B|$ insertions or substitutions.

2. There exists a set of $|B|$ noncrossing complete paths in the associated three-layer supergraph $G_T$.

3. There exists a node packing of size $|B|$ in the associated conflict graph $C$.

Lemma 2 Calculating $d(S, B)$ for any sequences $S$ and $B$ can be accomplished in polynomial time.
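Lemma 2 is established in [50] through the supergraph and its perfect conflict graph. For comparison only, the classical dynamic program for weighted edit distance (in the style of Wagner and Fischer), which is a different technique from the construction above, also computes $d(S, B)$ in $O(|S||B|)$ time, assuming $d(S, B)$ denotes the minimum total weight of converting $S$ into $B$. The Python sketch below assumes insertion, deletion, and substitution weights $\gamma$, $\delta$, $\omega$, with matching letters kept at no cost; the function name is hypothetical.

```python
def edit_weight(s: str, b: str, gamma: float, delta: float, omega: float) -> float:
    """Minimum total weight of insertions (gamma), deletions (delta) and
    substitutions (omega) that turn s into b, in O(|s| * |b|) time."""
    # prev[j] holds the cheapest conversion of s[:i-1] into b[:j].
    prev = [j * gamma for j in range(len(b) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * delta] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            keep_or_sub = prev[j - 1] + (0.0 if s[i - 1] == b[j - 1] else omega)
            cur[j] = min(prev[j] + delta,     # delete s[i-1]
                         cur[j - 1] + gamma,  # insert b[j-1]
                         keep_or_sub)         # keep or substitute s[i-1]
        prev = cur
    return prev[len(b)]


print(edit_weight("GC", "TC", 1.0, 1.0, 1.0))  # 1.0 (substitute G by T)
print(edit_weight("GC", "TC", 1.0, 1.0, 3.0))  # 2.0 (delete G, insert T)
```

When a substitution costs more than a deletion plus an insertion, the recurrence automatically prefers the latter, as the second call shows.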


The three-layer supergraph can be generalized to a multilayer supergraph when multiple sequences are considered. Such multilayer supergraphs are far too large for practical computation, but their size remains polynomial in the size of the input, which is all the complexity argument requires. We can now arrive at the result that the MWCMS is in NP.


Theorem 1 The MWCMS is in NP. 


To prove that the MWCMS is polynomial time solv- 
able when the number of input sequences is bounded 
by a positive constant, the following lemma is crucial, 
though trivial. 


Lemma 3 Given $\gamma, \delta, \omega \in \mathbb{R}^+$, an optimal solution $B$ to any MWCMS problem has the following properties. $B$ has no substitutions from letters other than the original letters in the $S_i$, and $B$ never contains an element that is inserted into every sequence (at the same location). Therefore, there are at most $\sum_{i=1}^m |S_i|$ insertions in any sequence.


In addition, we require the construction of a (directed) $2m$-layer supergraph, $G_T^m$, similar to the three-layer supergraph $G_T$.

Given sequences $S_1, \dots, S_m$, generate a $2m$-layer (directed) graph $G_T^m = (V, E)$ as follows. For $i = 1, \dots, m$, layers $2i - 1$ and $2i$ consist of $\left(\sum_{j=1}^m |S_j|\right)(|S_i| + 1) + |S_i|$ copies of $\Sigma$-crosses, constructed in exactly the same manner as layers 1 and 2 of the three-layer supergraph, using the input sequence $S_i$. The first $\sum_{j=1}^m |S_j|$ $\Sigma$-crosses represent the possibility that up to $\sum_{j=1}^m |S_j|$ letters can be inserted before the first element of $S_i$. The next $\Sigma$-cross corresponds to either the first letter of $S_i$ or a substitution of this letter. This is repeated for each of the $|S_i|$ letters of $S_i$, and the final $\sum_{j=1}^m |S_j|$ $\Sigma$-crosses represent insertions after the last letter of $S_i$. Thus, the first $\sum_{j=1}^m |S_j|$ $\Sigma$-crosses are insertion supernodes, followed by one substitution supernode representing a letter of $S_i$ or its substitution, and so forth. An arc exists from a vertex in layer $2i$ to a vertex in layer $2i + 1$ if the vertices correspond to the same letter. Observe that $G_T^m$ is an acyclic directed graph whose size is polynomial in the size of the input sequences. Assign every arc between layers $2i$ and $2i + 1$ a weight of 0. There are three different weights for arcs between layers $2i - 1$ and $2i$, each corresponding to an insertion, deletion, or substitution. The assignment of weights to such arcs is analogous to the assignment in $G_T$: a weight of $\gamma$ is assigned to every arc contained in an insertion supernode, and an arc in a substitution supernode is assigned a weight of $-\delta$ if it corresponds to the original letter, or $\omega - \delta$ otherwise.
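The size of $G_T^m$ follows directly from this construction. The Python sketch below (function names hypothetical) counts the $\Sigma$-crosses in each pair of layers $2i - 1$, $2i$ and the resulting number of vertices per layer; the optional `slots` argument models the reduced number of insertion supernodes used for Fig. 10.

```python
from typing import List, Optional, Sequence


def sigma_crosses_per_sequence(seqs: Sequence[str], slots: Optional[int] = None) -> List[int]:
    """Number of Sigma-crosses in layers 2i-1 and 2i of G_T^m, one entry per S_i.

    By default each gap offers sum_j |S_j| insertion supernodes, giving
    (sum_j |S_j|) * (|S_i| + 1) + |S_i|; a smaller `slots` models a reduced graph.
    """
    k = sum(len(s) for s in seqs) if slots is None else slots
    return [k * (len(s) + 1) + len(s) for s in seqs]


def vertices_per_layer(seqs: Sequence[str], alphabet: str, slots: Optional[int] = None) -> List[int]:
    """Each Sigma-cross contributes |Sigma| vertices to each of its two layers."""
    return [c * len(alphabet) for c in sigma_crosses_per_sequence(seqs, slots)]


print(sigma_crosses_per_sequence(["GC", "TG"]))           # [14, 14]
print(sigma_crosses_per_sequence(["GC", "TG"], slots=2))  # [8, 8]
print(vertices_per_layer(["GC", "TG"], "ACGT"))           # [56, 56]
```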

Figure 10 shows a sample graph for two sequences, $S_1 = GC$ and $S_2 = TG$. Observe that at most two insertions are needed in an optimal solution; thus, we can reduce the number of $\Sigma$-crosses used as insertion supernodes from $\sum_{i=1}^2 |S_i| = 4$ to 2. For simplicity, the graph shown in Fig. 10 omits the two insertion supernodes before the first letter and those after the last letter of each sequence. Thus, in the figure, the first $\Sigma$-cross represents the substitution supernode associated with the first letter of $S_1$, the second and third $\Sigma$-crosses represent two insertion supernodes, and the last $\Sigma$-cross represents the substitution supernode associated with the second letter of $S_1$. Also for simplicity, we include only the arcs between layers 2 and 3 that connect vertices associated with the letter G; the arcs for the other vertices follow similarly.

A conflict graph $C$ associated with $G_T^m$ can be generated by finding all complete paths (paths from layer 1 to layer $2m$) in $G_T^m$. These complete paths correspond to the vertices of $C$, as in Definition 1. If we assign to each vertex a weight equal to the weight of the associated complete path, then the following result can be established.


Theorem 2 Every node packing in $C$ represents a candidate solution to the MWCMS if and only if at most $\sum_{i=1}^m |S_i|$ letters can be inserted between any two original letters. Furthermore, the weight of the node packing is equal to the weight of the MWCMS minus $\sum_{i=1}^m |S_i|\,\delta$.


The supergraph $G_T^m$ and its associated conflict graph are fundamental to our proof of the following theorem on the polynomial-time solvability of a restricted version of the MWCMS problem.


Theorem 3 The MWCMS problem restricted to instances for which the number of sequences is bounded by a positive constant is polynomial time solvable.

Algorithms for Genomic Analysis, Figure 10: A sample graph $G_T^m$ of the MWCMS with $S_1 = GC$ and $S_2 = TG$, where $\Sigma = \{A, C, G, T\}$


Special Cases of MWCMS 


The MWCMS encompasses a very broad class of problems. In computational biology, as discussed in this chapter, it represents first and foremost a model for phylogenetic analysis. The MWCMS as defined is the “most likely ancestor problem,” and the three-layer supergraph described in Sect. “Complexity Theory” models the evolutionary distance problem. An optimal solution to a multiple sequence alignment instance can be found from the solution of the MWCMS problem obtained on the $2m$-layer supergraph $G_T^m$: the alignment is the character matrix obtained by placing the given sequences together after incorporating the insertions found in the solution of the MWCMS problem. Furthermore, DNA sequencing can be viewed as the shortest common superstring problem, while the comparison of a given sequence $B$ with a collection of $N$ sequences $S_1, \dots, S_N$ is the MWCMS problem itself.

Beyond the computational biology applications, special cases of the MWCMS include shortest common supersequences, longest common subsequences, and shortest common superstrings; these problems are of interest in their own right as combinatorial optimization problems and for their role in complexity theory.

Computational Models: Integer Programming Formulation


The construction of the multilayer supergraphs described in our theoretical study lays the foundation and provides direction for computational models and solution strategies that we will explore in future research. Although the theoretical results obtained are polynomial-time in nature, they present computational challenges. In many cases, calculating the worst-case scenario is not trivial. Furthermore, the polynomial-time result for the node-packing problem on a perfect graph by Grötschel et al. [29,30] is existential in nature and relies on the polynomial-time nature of the ellipsoid algorithm. The process itself involves solving a relaxation of an integer program multiple times. In our case, the variables of the integer program generated are the complete paths in the multilayer supergraph $G_T^m$. Formally, the integer program corresponding to our conflict graph can be stated as follows.

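Let $x_p$ be the binary variable denoting the use or nonuse of the complete path $p$ with weight $w_p$. Then the corresponding node-packing problem is

$$
\begin{aligned}
\text{Minimize} \quad & \sum_p w_p x_p \\
\text{subject to} \quad & x_p + x_q \le 1 && \text{if complete paths } p \text{ and } q \text{ cross}, \\
& x_p \in \{0, 1\} && \text{for all complete paths } p \text{ in } G_T^m.
\end{aligned}
\qquad \text{(MIP1)}
$$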
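To make (MIP1) concrete, the following Python sketch solves it by exhaustive enumeration on a toy instance. It only illustrates the model: it is exponential in the number of paths and is not a solution method proposed here. The instance (path weights and crossing pairs) is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple


def solve_mip1_brute_force(weights: Sequence[float],
                           crossing: Set[FrozenSet[int]]) -> Tuple[float, List[int]]:
    """Minimize sum(w_p x_p) subject to x_p + x_q <= 1 for every crossing pair.

    Enumerates all subsets of paths, so it is only usable on toy instances.
    """
    best_value: float = 0.0  # the empty packing is always feasible
    best_set: List[int] = []
    n = len(weights)
    for size in range(1, n + 1):
        for chosen in combinations(range(n), size):
            if any(frozenset(pair) in crossing for pair in combinations(chosen, 2)):
                continue  # violates an adjacency constraint
            value = sum(weights[p] for p in chosen)
            if value < best_value:
                best_value, best_set = value, list(chosen)
    return best_value, best_set


# Hypothetical instance: path 0 crosses paths 1 and 2; paths 1 and 2 are parallel.
weights = [-3.0, -2.0, -2.0]
crossing = {frozenset({0, 1}), frozenset({0, 2})}
print(solve_mip1_brute_force(weights, crossing))  # (-4.0, [1, 2])
```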


We call the inequality $x_p + x_q \le 1$ an adjacency constraint. A natural approach to improving the solution time for (MIP1) is to decrease the size of the graph $G_T^m$, and thus the number of variables. Reductions in the size of $G_T^m$ can be accomplished for shortest common superstrings, longest common subsequences, and shortest common supersequences. Among these three problems, the graph $G_T^m$ is smallest for longest common subsequences, where all insertion and substitution supernodes can be eliminated.

Our theoretical results thus far rely on the creation of all complete paths. The number of complete paths is typically on the order of $n^m$, where $n = \max_i |S_i|$. Thus, an instance with three sequences of 300 letters each generates more than one million variables; hence, an exact formulation with all complete paths is impractical in general. A simultaneous column and row generation approach within a parallel implementation may lead to computational advances for this formulation.
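For the three-sequence example, the order-of-magnitude estimate $n^m$ (an estimate, not an exact count of complete paths) works out as follows:

$$
n^m = 300^3 = 27\,000\,000 = 2.7 \times 10^{7} > 10^{6}.
$$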

An alternative formulation can be obtained by examining $G_T^m$ from a network perspective, using the arcs (instead of the complete paths) of $G_T^m$ as variables. Namely, let $x_{i,j}$ denote the use or nonuse of arc $(i, j)$ in the final sequence, with $c_{i,j}$ the cost of the arc in $G_T^m$. The network formulation can be stated as

$$
\begin{aligned}
\text{Minimize} \quad & \sum_{(i,j) \in E} c_{i,j} x_{i,j} \\
\text{subject to} \quad & \sum_{i:(i,j) \in E} x_{i,j} = \sum_{k:(j,k) \in E} x_{j,k} && \text{for all } j \in V \text{ in layers } 2, \dots, 2m - 1, \\
& x_{i,j} + x_{k,l} \le 1 && \text{for all crossing arcs } (i, j) \text{ and } (k, l) \in E, \\
& x_{i,j} \in \{0, 1\} && \text{for all } (i, j) \in E.
\end{aligned}
\qquad \text{(MIP2)}
$$


The first set of constraints ensures that flow in equals flow out at every vertex in layers $2, \dots, 2m - 1$, so that the selected arcs form complete paths. The second set of constraints ensures that no two selected arcs cross. This model grows linearly in the number of sequences. This alternative integer programming formulation is still large, but it is manageable even for fairly large instances.
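The two constraint families of (MIP2) can be checked for a candidate arc selection with the Python sketch below. It assumes, hypothetically, that vertices are identified by (layer, left-to-right position) and that two arcs between the same pair of layers conflict when they cross or share an endpoint, mirroring the notion of parallel paths used for the conflict graph. The function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

Vertex = Tuple[int, int]  # (layer, left-to-right position within the layer)
Arc = Tuple[Vertex, Vertex]


def arcs_conflict(a: Arc, b: Arc) -> bool:
    """Arcs between the same two layers conflict if they share an endpoint or cross."""
    (u1, v1), (u2, v2) = a, b
    if u1[0] != u2[0]:
        return False  # arcs between different layer pairs never interact
    du, dv = u1[1] - u2[1], v1[1] - v2[1]
    return du == 0 or dv == 0 or (du > 0) != (dv > 0)


def mip2_feasible(selected: Iterable[Arc], num_layers: int) -> bool:
    """Check flow conservation (interior layers) and the non-crossing constraints."""
    arcs = list(selected)
    inflow: Dict[Vertex, int] = {}
    outflow: Dict[Vertex, int] = {}
    for u, v in arcs:
        outflow[u] = outflow.get(u, 0) + 1
        inflow[v] = inflow.get(v, 0) + 1
    for vertex in set(inflow) | set(outflow):
        interior = 1 < vertex[0] < num_layers  # layers are numbered 1..num_layers
        if interior and inflow.get(vertex, 0) != outflow.get(vertex, 0):
            return False
    return not any(arcs_conflict(a, b) for a, b in combinations(arcs, 2))


# Two parallel complete paths through three layers, and a crossing variant.
ok = [((1, 0), (2, 0)), ((2, 0), (3, 0)), ((1, 1), (2, 1)), ((2, 1), (3, 1))]
crossed = [((1, 0), (2, 1)), ((2, 1), (3, 1)), ((1, 1), (2, 0)), ((2, 0), (3, 0))]
print(mip2_feasible(ok, 3))       # True
print(mip2_feasible(crossed, 3))  # False: the layer 1-2 arcs cross
print(mip2_feasible(ok[:1], 3))   # False: flow enters (2, 0) but never leaves
```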

Computational tests of our graph-theoretical models are under way, using a collection of DNA sequences from a bacterium (each 40,000 base pairs long) and a collection of short sequences associated with genes found in breast cancer patients. We are seeking to develop computational strategies that provide reasonable running times for evolutionary distance problem instances derived from these data. In an initial test with three sequences of 100 letters each, the initial linear program required more than 10,000 s to solve when tight constraints were employed (in this case, each adjacency constraint is replaced by a maximal clique constraint). Our ongoing computational effort will focus on developing and investigating solution techniques for practical problem instances, including those based on the two integer programming formulations above, as well as on the development of fast heuristic procedures.

In [50], we outline a simple yet practical heuristic 
based on (MIP2) that we developed for solving the mul- 
tiple sequence alignment problem; and we report on 
preliminary tests of the algorithm using different sets of 
sequence data. Motivation for the heuristic is derived 
from the desire to reduce computational time through 
various strategies for reducing the number of variables 
in (MIP2). 


Summary 


Multiple sequence alignment and phylogenetic analysis 
are deeply interconnected problems in computational 
biology. A good multiple alignment is crucial for reli- 
able reconstruction of the phylogenetic tree [58]. On 
the other hand, most of the multiple alignment meth- 
ods require a phylogenetic tree as the guide tree for pro- 
gressive iteration. 

Thus, the evolutionary tree construction might 
be biased by the guide tree used for obtaining the 
alignment. In order to avoid this pitfall, various al- 
gorithms have been developed which simultaneously 
find alignment and phylogenetic relationship among 
given sequences. Sankoff and Cedergren [64] devel- 
oped a parsimony-based algorithm using a character- 
substitution model of gaps. The algorithm is guar- 
anteed to find the evolutionary tree and alignment 
which minimizes tree-based parsimony cost. Hein [33] 
also developed a parsimony-type algorithm but used 
an affine gap cost, which is more realistic than the 
character-substitution gap model. This algorithm is 
also faster than Sankoff and Cedergren’s approach but
makes simplifying assumptions in choosing ancestral 
sequences. 

Like parsimony methods for finding a phylogenetic 
tree, both of the abovementioned approaches require 
a search over all possible trees to find the global op- 
timum. This makes these algorithms computationally 
very intensive. Hence, there has been a strong focus on 
developing an efficient algorithm that considers both 
alignment and the tree. Vingron and Haeseler [74] have 
developed an approach based on three-way alignment 
of prealigned groups of sequences. It also allows alignment decisions made early in the computation to be revised. Several programs, such as MEGA, aim to provide an efficient integrated computing environment that supports both sequence alignment and evolutionary analysis [48].

We addressed this issue of simultaneously finding 
alignment and phylogenetic relationships by presenting 
a novel graph-theoretical approach. Indeed, our model 
can be easily tailored to find theoretically provable opti- 
mum solutions to a wide range of crucial sequence anal- 
ysis problems. These sequence analysis problems were 
proven to be NP-hard, and thus understandably present 
computational challenges. To strike a balance between running time and solution quality, a variety of parameters are provided. Ongoing research efforts are exploring the development of efficient computational models and solution strategies in a massively parallel environment.

Since the mid-1990s, the need for techniques to parallelize numerical applications has increased. When parallelizing nested loops for distributed memory parallel computers, two major problems have to be solved: the scheduling of the loop iterations and the mapping of the computations and data elements onto the processors. The scheduling functions must satisfy all the data dependences existing in the sequential loop nests. The mapping functions should maximize the degree of parallelism obtained. Furthermore, they should minimize the communication overhead due to nonlocal data references.

This survey presents the alignment problem, that is, the problem of mapping computations and data onto the processors. The alignment problem has been studied extensively since the beginning of the 1990s, that is, since the introduction of massively parallel distributed memory computers. For the different subproblems of the alignment problem, the most interesting results are surveyed.


Alignment Problem 


The alignment problem is the problem of finding an 
alignment of loop iterations with the array elements ac- 
cessed. This means computing mapping functions of 
the loop iterations, called computations, and mapping 
functions of the array elements, called data, to a mul- 
tidimensional grid of virtual processors. The name of 
the problem comes from the idea of aligning the pro- 
cessors computing with the ones owning the data. The 
alignment problem is tightly related to the mapping of 
the computation and data objects onto a grid of virtual 
processors. 

As input, programs containing nested loops are considered. Each loop nest may contain one or more instructions. For the sake of simplicity, only assignment instructions are considered. The data access functions are described by functions $F_l: I_j \to \mathcal{D}_K$, where $I_j$ represents the iteration space surrounding instruction $S_j$ and $\mathcal{D}_K$ the index domain of array $K$.

To solve the alignment problem, computation and data mapping functions $C_j$ and $D_K$ have to be computed so as to minimize the overall execution time of the resulting parallel program:

$$
C_j: I_j \to P, \qquad D_K: \mathcal{D}_K \to P,
$$

where $P$ represents a multidimensional grid of virtual processors.

To minimize the overall execution time a solution 
to the alignment problem has to address the following 
needs: 

i) maximize the degree of parallelism, that is, use as 
many dimensions of the virtual grid of processors 
as possible, 

ii) minimize the need for nonlocal data accesses, that is, distribute the array elements such that a minimal amount of communication overhead is required to access data elements stored on processors other than the ones accessing them,

iii) guarantee the existence of scheduling functions 
compatible with the computation mapping func- 
tions. 

Clearly, needs i) to iii) depend on each other. In this survey we focus only on the first two needs.

Need i) can be expressed by maximizing the dimen- 
sion of the virtual processor grid P onto which the com- 
putations and data elements are mapped. 

The requirement that a given data access $F_l$ be local is expressed by the following equation being satisfied:

$$
C_j(i) = D_K(F_l(i)) \quad \text{for all } i \in I_j. \qquad (1)
$$

Equation (1) is called the alignment constraint or locality constraint. Depending on how needs i) and ii) are satisfied, various subproblems of the alignment problem can be defined.


Communication-Free Alignment Problem 


The communication-free alignment problem (CFAP) is 
the problem of finding computation and data mapping 
functions for each instruction and for each data array 
such that no communication is needed and the degree 
of parallelism obtained is maximal. The CFAP can be formulated as an optimization problem:

$$
\max_{C_j, D_K} \ \dim P \quad \text{s.t.} \quad \forall j, l, K: \ C_j(i) = D_K(F_l(i)).
$$


Constant-Degree Parallelism Alignment Problem 


Let $F$ be the set of data access functions from a set of loop nests forming an alignment problem, and let $d$ be a positive constant. Let $c(F', F)$ be a cost function on a subset $F' \subseteq F$ of data access functions. The constant-degree parallelism alignment problem (CDPAP), denoted by $(F, d)$, is the problem of finding a subset $F' \subseteq F$ of data access functions such that:

1) There exists a solution to the CFAP consisting of all 
data accesses in the set F’ admitting a degree of par- 
allelism of at least d. 

2) The cost function c(F’, F) on the subset F’ is mini- 
mized. 


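As for the CFAP, the CDPAP can be formulated as an optimization problem:

$$
\max_{C_j, D_K} \ \sum_{j, l, K} \big[\, C_j(i) = D_K(F_l(i)) \,\big] \quad \text{s.t.} \quad \dim P \ge d,
$$

where the bracket equals 1 if the alignment constraint holds and 0 otherwise.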


Example 1 The data accesses in this example are encoded by the three functions $F_1(i, j) = (i, j + 1)$, $F_2(i, j) = (i - 1, j + 1)$, and $F_3(i, j) = (i + 1, j + 1)$. A possible solution requiring no communication and admitting one degree of parallelism is given by $C(i, j) = j$ and $D_1(i, j) = j - 1$, $P$ being a one-dimensional processor set. Indeed, $D_1(F_l(i, j)) = (j + 1) - 1 = j = C(i, j)$ for $l = 1, 2, 3$.


DO i = 2, n-1
   DO j = 2, n-1
      A(i, j+1) = A(i-1, j+1) + A(i+1, j+1)
   END DO
END DO
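The locality claims of Example 1 can be checked mechanically. The Python sketch below evaluates equation (1) for each access function over a small iteration space (the value n = 6 is arbitrary) and counts how many accesses are local, which is one possible reading of the CDPAP objective. Besides the mapping of Example 1, it tries a hypothetical two-dimensional mapping, $C(i, j) = (i, j)$ and $D_1(x, y) = (x, y - 1)$, which offers more parallelism but keeps only $F_1$ local. The sketch ignores the scheduling requirement iii), and all names are illustrative.

```python
from typing import Callable, Iterable, List, Tuple

Index = Tuple[int, int]
IndexMap = Callable[[Index], Tuple[int, ...]]

# Access functions of Example 1 (all three refer to the same array).
ACCESSES: List[Callable[[Index], Index]] = [
    lambda it: (it[0], it[1] + 1),      # F1(i, j) = (i, j + 1)
    lambda it: (it[0] - 1, it[1] + 1),  # F2(i, j) = (i - 1, j + 1)
    lambda it: (it[0] + 1, it[1] + 1),  # F3(i, j) = (i + 1, j + 1)
]


def local_accesses(comp: IndexMap, data: IndexMap, iterations: Iterable[Index]) -> int:
    """Count the accesses F_l for which C(i) = D(F_l(i)) holds on every iteration."""
    its = list(iterations)
    return sum(all(comp(it) == data(f(it)) for it in its) for f in ACCESSES)


n = 6  # arbitrary problem size
space = [(i, j) for i in range(2, n) for j in range(2, n)]  # i, j = 2, ..., n-1

# Example 1: C(i, j) = j and D_1(x, y) = y - 1 on a one-dimensional grid.
print(local_accesses(lambda it: (it[1],), lambda x: (x[1] - 1,), space))  # 3
# Hypothetical 2-D alternative: C(i, j) = (i, j), D_1(x, y) = (x, y - 1).
print(local_accesses(lambda it: it, lambda x: (x[0], x[1] - 1), space))   # 1
```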


Solving the Alignment Problem 
Communication-Free Alignment Approaches 


C.-H. Huang and P. Sadayappan [17], in 1991, were the 
first to formulate the alignment problem in a linear al- 
gebra framework. They focus on a communication-free 
solution. The data array elements as well as the loop 
iterations are partitioned in disjoint sets represented 
by hyperplanes. Each set is mapped onto a different 
processor. The partitions are sought such that they re- 
sult in the elimination of communication. A charac- 
terization of a necessary and sufficient condition for 
communication-free hyperplane partitioning is pro- 
vided. Various results are given characterizing the sit- 
uation where the iteration and data space can be parti- 
tioned along hyperplanes so that no communication is 
necessary. More precisely, two data elements accessed 
during a single iteration in a single instruction must be 
located on a single processor and two iterations in the 
same instruction accessing a single data element must 
be executed on the same processor. 

In [30], a matrix notation is presented to de- 
scribe array accesses in fully parallel loop nests. 
A sufficient condition on the matrices for computing 
a communication-free mapping of the arrays onto the 
processors is given. The owner computes rule is assumed for the computation mapping. The presented existence condition for communication-free partitions
is based on the connectivity of the data access graph 
which models the data access patterns. To compute data 
mapping functions, a set of systems of linear equations 
is constructed, one system of linear equations per pair 
of read and write data accesses. If there exists a solu- 
tion to the set of systems of linear equations, then there 
exists a communication-free partitioning of the array 
elements into parallel hyperplanes. 

In [2] a linear algebra approach is proposed, based 
on [17]. The communication-free alignment problem 
is solved by computing a basis of the null space of the linear mapping representing the alignment constraints. The
problem of data replication is addressed. 
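The null-space idea can be illustrated on Example 1. The Python sketch below is not the algorithm of [2]; it assumes a single statement accessing a single array, one dimension of the processor grid, affine mappings $C(i) = c \cdot i + c_0$ and $D(x) = d \cdot x + d_0$, and access functions $F_l(i) = M_l i + f_l$. Requiring equation (1) on every iteration of a full-dimensional iteration space gives the linear conditions $c^T = d^T M_l$ and $c_0 = d \cdot f_l + d_0$, whose solution set is the null space computed below with exact rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Matrix = List[List[Fraction]]
Access = Tuple[Sequence[Sequence[int]], Sequence[int]]  # (M_l, f_l)


def null_space(a: Matrix, ncols: int) -> Matrix:
    """Basis of {z : a z = 0}, by exact Gauss-Jordan elimination."""
    rows = [row[:] for row in a]
    pivots: List[int] = []
    r = 0
    for col in range(ncols):
        piv = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if piv is None:
            continue  # no pivot: col is a free variable
        rows[r], rows[piv] = rows[piv], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        pivots.append(col)
        r += 1
    basis: Matrix = []
    for free in (c for c in range(ncols) if c not in pivots):
        z = [Fraction(0)] * ncols
        z[free] = Fraction(1)
        for row_idx, pcol in enumerate(pivots):
            z[pcol] = -rows[row_idx][free]
        basis.append(z)
    return basis


def alignment_system(accesses: Sequence[Access], depth: int, rank: int) -> Matrix:
    """Rows of C(i) = D(F_l(i)) for all i, with C(i) = c.i + c0 and D(x) = d.x + d0.

    Unknowns are ordered (c_1..c_depth, c0, d_1..d_rank, d0).
    """
    width = depth + rank + 2
    rows: Matrix = []
    for m, f in accesses:
        for k in range(depth):  # coefficient of i_k: c_k - sum_r d_r M[r][k] = 0
            row = [Fraction(0)] * width
            row[k] = Fraction(1)
            for r in range(rank):
                row[depth + 1 + r] = Fraction(-m[r][k])
            rows.append(row)
        row = [Fraction(0)] * width  # constant term: c0 - d.f - d0 = 0
        row[depth] = Fraction(1)
        for r in range(rank):
            row[depth + 1 + r] = Fraction(-f[r])
        row[width - 1] = Fraction(-1)
        rows.append(row)
    return rows


eye = [[1, 0], [0, 1]]
example1 = [(eye, [0, 1]), (eye, [-1, 1]), (eye, [1, 1])]  # F1, F2, F3 of Example 1
for z in null_space(alignment_system(example1, depth=2, rank=2), ncols=6):
    print([int(x) for x in z])
# [0, 1, 1, 0, 1, 0]
# [0, 0, 1, 0, 0, 1]
```

For Example 1 the null space has dimension 2. The first basis vector, $C(i, j) = j + 1$ and $D(x, y) = y$, has a nonzero linear part and gives one degree of parallelism; the second is a pure translation that maps everything onto a single processor. Their difference is the mapping $C(i, j) = j$, $D_1(x, y) = y - 1$ of Example 1. The first component of $d$ is forced to zero because the three accesses differ in their first index.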

In [6], T.-S. Chen and J.-P. Sheu consider perfect 
loop nests. They compute iteration and data space par- 
titioning functions requiring no communication. Their 
work focuses only on uniformly generated data ref- 
erences. Sufficient conditions are given for the exis- 
tence of a communication-free partition. The method 
for partitioning the data onto the processors is based on 
the computation of independent blocks called iteration 
and data partitions respectively. If no communication- 
free partitioning exists, data replication is considered. 

In [24], an algorithm is presented that extracts all 
the degrees of communication-free parallelism that can 
be obtained via loop fission, fusion, interchange, re- 
versal, skewing, scaling, re-indexing and statement re- 
ordering. The algorithm first assigns the iterations of 
the instructions in the program to processors via affine 
processor mapping functions. Then it generates correct code by ensuring that the semantics of the sequential program are preserved.


Alignment Approaches Based on Generating HPF-like Data Distributions


J. Li and M. Chen [22,23] are interested in the indices 
of the arrays that have to be aligned with one another 
to minimize remote data references. The techniques 
were initially developed for compiling the functional 
language ‘Crystal’, but can be applied in the process 
of compiling imperative languages like ‘Fortran’. The 
parallelism is assumed to be specified explicitly and the 
single assignment form is used. The goal of their approach is to find alignment functions such that the dimensions of each array are projected onto the same space of a virtual processor grid. They consider four basic alignments:

i) permutations of the indices, 

ii) embeddings, 

iii) translations by a constant, and 

iv) reflections. 

To find a set of data accesses for which valid align- 
ment functions exist, a component affinity graph is con- 
structed. It represents the affinities between cross ref- 
erence patterns. The nodes of the graph represent the 
components of the index domains to be aligned. An 
edge represents an affinity between the two corresponding domain components. The alignment problem then consists of partitioning the set of nodes of the component affinity graph into disjoint subsets, with the restriction that no two nodes belonging to the same array are allowed in the same subset. A fast and reasonably efficient heuristic algorithm is presented.
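The partitioning step can be illustrated with a simple greedy rule. The Python sketch below is not Li and Chen's heuristic; it assumes, for illustration, that edges of the component affinity graph carry numeric weights, takes them in order of decreasing weight, and merges the groups of the two endpoints unless the merged group would contain two dimensions of the same array. Names and the sample graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (array name, dimension index)
Edge = Tuple[Node, Node, float]


def greedy_alignment(nodes: List[Node], edges: List[Edge]) -> List[Set[Node]]:
    """Partition affinity-graph nodes so that no group holds two dimensions of one array.

    Edges are processed by decreasing weight; the endpoints' groups are merged
    unless the merge would violate that restriction.
    """
    group: Dict[Node, int] = {v: k for k, v in enumerate(nodes)}
    members: Dict[int, Set[Node]] = {k: {v} for k, v in enumerate(nodes)}
    for u, v, _weight in sorted(edges, key=lambda e: e[2], reverse=True):
        gu, gv = group[u], group[v]
        if gu == gv:
            continue
        arrays_u = {name for name, _ in members[gu]}
        if any(name in arrays_u for name, _ in members[gv]):
            continue  # would align two dimensions of the same array
        for x in members[gv]:
            group[x] = gu
        members[gu] |= members.pop(gv)
    return list(members.values())


nodes = [("A", 0), ("A", 1), ("B", 0), ("B", 1)]
edges = [(("A", 0), ("B", 0), 3.0), (("A", 1), ("B", 1), 2.0), (("A", 0), ("B", 1), 1.0)]
print([sorted(g) for g in greedy_alignment(nodes, edges)])
# [[('A', 0), ('B', 0)], [('A', 1), ('B', 1)]]
```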

M. Gupta, in his thesis in 1992 [16], presents a data 
distribution algorithm that operates in four passes. The 
first pass serves to compute an alignment of the array 
dimensions. The algorithm developed is based on the 
notion of component affinity graph introduced by Li 
and Chen [22]. In the second pass, the arrays are partitioned using either block or cyclic data distributions. In
the third pass, the block sizes of the arrays distributed 
are computed whereas the last pass computes the num- 
ber of processors on which each array dimension is dis- 
tributed. 

K. Kunchithapadam and B.P. Miller [20], in contrast to other approaches, assume that a user-defined
data distribution is given. The data accesses of a pro- 
gram are modeled by a colored proximity graph. Each 
vertex of the graph represents a part of an array and 
the color of a vertex represents the current processor 
to which this array part is assigned. Edges of the graph 
represent assignments of values arising from part of 
one or more arrays to part of another array assum- 
ing the owner computes rule for the computation map- 
ping. Edges between vertices of different colors are as- 
signed a weight representing the associated communi- 
cation costs. The problem of improving a given set of 
data mapping functions is to find a sequence of color 
exchanges, that is, data redistributions, that minimize 
the weight of the graph, that is, the communication 
costs. A possible algorithm for solving this problem is 
presented. 
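The cost model of the colored proximity graph is straightforward to evaluate. The Python sketch below computes the communication cost of a coloring and, as a deliberately simplified stand-in for the search over sequences of color exchanges, tries every single-vertex recoloring. It is not the algorithm of [20], and the names and the sample graph are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (array part, array part, communication cost)


def communication_cost(color: Dict[str, int], edges: List[Edge]) -> float:
    """Total weight of edges whose endpoints are assigned to different processors."""
    return sum((w for u, v, w in edges if color[u] != color[v]), 0.0)


def best_single_recoloring(color: Dict[str, int], edges: List[Edge],
                           processors: List[int]) -> Tuple[float, Optional[str], Optional[int]]:
    """Cheapest (cost, vertex, new processor) reachable by moving one vertex.

    Returns (current cost, None, None) when no single move lowers the cost.
    """
    best: Tuple[float, Optional[str], Optional[int]] = (communication_cost(color, edges), None, None)
    for v in color:
        for p in processors:
            if p == color[v]:
                continue
            cost = communication_cost({**color, v: p}, edges)
            if cost < best[0]:
                best = (cost, v, p)
    return best


color = {"A1": 0, "A2": 1, "B1": 0}
edges = [("A1", "A2", 2.0), ("A2", "B1", 3.0)]
print(communication_cost(color, edges))              # 5.0
print(best_single_recoloring(color, edges, [0, 1]))  # (0.0, 'A2', 0)
```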
B. Sinharoy and B.K. Szymanski [32] study the
problem of finding computation and data alignment 
functions for regular iterative algorithms. A loop nest 
can be represented by a regular iterative algorithm if 
and only if all the data access functions are constant off- 
set functions and the loop nest’s instructions are in sin- 
gle assignment form. The communication cost function 
used is based on the distance of the processors exchang- 
ing data on the virtual processor grid. The authors show 
that finding computation and data mapping functions is equivalent to minimizing a sum of absolute values of sums. An exact enumeration algorithm is
presented and a polynomial time algorithm for finding 
an approximate solution is described. 
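The flavor of such objectives already shows up in the simplest special case: a single unknown offset $t$ with cost $\sum_k w_k |t - a_k|$ and $w_k > 0$. A standard fact is that any weighted median of the points $a_k$ minimizes this cost. The Python sketch below illustrates only that special case, not the enumeration or approximation algorithms of [32]; the names and data are hypothetical.

```python
from typing import List, Tuple


def weighted_median(points: List[Tuple[int, float]]) -> int:
    """A minimizer of sum(w * |t - a|) over t, for (a, w) pairs with w > 0.

    Returns the first a, in sorted order, at which the cumulative weight
    reaches half of the total weight.
    """
    total = sum(w for _, w in points)
    acc = 0.0
    for a, w in sorted(points):
        acc += w
        if 2 * acc >= total:
            return a
    raise ValueError("points must be non-empty")


def total_cost(t: int, points: List[Tuple[int, float]]) -> float:
    """Weighted sum of absolute distances from t to the points."""
    return sum(w * abs(t - a) for a, w in points)


pts = [(0, 1.0), (4, 1.0), (10, 3.0)]
t = weighted_median(pts)
print(t, total_cost(t, pts))  # 10 16.0
```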


Approaches Using a Graph Based Framework 


K. Knobe, J.D. Lukas and G.L. Steele Jr. [19] study the problem of aligning the accessed array elements with one another. They target their approach towards SIMD machines. Two different kinds of preferences are distinguished:

i) identity preferences representing alignment prefer- 
ences due to different data accesses to the same ar- 
ray, and 

ii) conformance preferences relating two different ar- 
rays. 

To compute which preferences can be satisfied without
losing parallelism, a cyclic preference graph is con-
structed. Each data access is represented by a vertex and
two vertices are related by an undirected weighted edge
if there exists a preference between the two data ac-
cesses. The weight of each edge is defined by the loop
depth at which the data accesses occur. Conflicts be-
tween preferences are represented by cycles in the cyclic
preference graph. A greedy heuristic is presented to
remove conflicting cycles or to reduce the parallelism.
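One simple greedy way to break every cycle of a preference graph is to keep preferences in order of decreasing weight (loop depth) and to drop any edge that would close a cycle. The result is a maximum-weight spanning forest. This is a stand-in chosen for illustration and not necessarily the heuristic of [19]: it conservatively treats every cycle as a potential conflict. The function name is hypothetical.

```python
def greedy_acyclic_preferences(vertices, edges):
    """Keep the heaviest preferences that form no cycle (Kruskal-style).

    edges: list of (u, v, weight). Returns (kept_edges, dropped_edges).
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    kept, dropped = [], []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            dropped.append((u, v, w))  # would close a cycle: give up this preference
        else:
            parent[ru] = rv
            kept.append((u, v, w))
    return kept, dropped
```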

In [5] an intermediate representation of a program 
called the alignment-distribution graph is described. 
The alignment-distribution graph is a directed graph in 
which nodes represent communication and edges rep- 
resent the data flow. It exposes the communication re- 
quirements of the program. The framework restricts 
the alignments computed to alignments in which each 
axis of an array maps to a different axis of an HPF 
like template and data elements are evenly spaced along
the template axis. The alignments computed have three
components: 

i) the axis, 

ii) the stride, and 

iii) the offset. 

The papers present two separate algorithms called the 
compact dynamic programming algorithm and the 
constraint graph method for minimizing a communi- 
cation cost function. 
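The axis, stride and offset components can be made concrete for a one-dimensional array. Element i is placed at template cell stride·i + offset on the chosen axis, and an assignment A(i) = B(s(i)) is local exactly when both operands land on the same cell. The class and helper below are illustrative assumptions, not the representation used in [5].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Places element i of a 1-D array at cell stride*i + offset of a template axis."""
    axis: int
    stride: int
    offset: int

    def cell(self, i):
        return (self.axis, self.stride * i + self.offset)


def remote_accesses(lhs, rhs, subscript, iterations):
    """Count iterations of A(i) = B(subscript(i)) whose operands are not co-located."""
    return sum(1 for i in iterations if lhs.cell(i) != rhs.cell(subscript(i)))
```

For A(i) = B(2i+1), aligning A with stride 2 and offset 1 against an identity alignment of B removes all communication.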

A. Darte and Y. Robert [8] study the problem of 
mapping perfectly nested affine loops onto distributed 
memory parallel computers. The problem is formulated 
by introducing the communication graph that captures 
all the required information to align data and compu- 
tations. Each instruction and each array is represented 
by a vertex, the directed edges representing read and 
write data accesses. The problem of message vectoriza- 
tion and the use of global communication operations, 
like broadcasting, is addressed. 

In [11] an algorithm is presented for computing 
HPF like data distribution functions. A distribution 
graph is constructed representing the relation between 
the data access functions and the array accessed. Based 
on the distribution graph a decision tree, modeling all 
possible combinations of data distribution functions, is 
traversed using a branch and bound algorithm. The cost 
function minimized by the algorithm is based on a com- 
munication analysis tool. The computation mapping is 
done in accordance with the owner computes rule. 

M. Wolfe and M. Ikey [33] proposed in 1994 an
adaptation of the techniques introduced by Li and Chen
[22,23] for the language ‘Crystal’ to the imperative lan- 
guage ‘Tiny’. The alignment phase is decomposed into 
four operations: 

i) finding reference patterns, 

ii) adding implicit dimensions to the arrays when re- 
quired, 

iii) building a component affinity graph, and 

iv) partitioning the component affinity graph. 

As the partitioning problem is NP-hard, a heuristic is
used. The authors furthermore describe an algorithm
to generate SPMD code based on the alignments com-
puted.

J. Garcia, E. Ayguadé and J. Labarta [15] proposed
an algorithm to compute data distribution functions
that can be expressed using HPF distribute statements.
This algorithm is based on the construction and traver-
sal of a single data structure, called the computation-
parallelism graph. The computation-parallelism graph
represents all possible data distributions along the di- 
mensions of the arrays. Parallelism constraints are 
modeled as hyper-edges. Weights are associated with the
edges to represent the associated communication costs. 
Negative costs are associated with the hyperedges to 
represent the associated parallelism. It is shown that 
distributing the data according to one dimension is 
equivalent to finding a path through the computation- 
parallelism graph fulfilling some additional constraints. 
The problem is formulated as a 0-1 integer program- 
ming problem. In contrast to other graph based ap- 
proaches, the computation-parallelism graph models 
both the possible data distribution, that is, the locality 
constraints within a single data structure, and the pos- 
sible parallelism. 

W. Kelly and W. Pugh [18] describe a technique 
to minimize communication while preserving paral- 
lelism. The approach is not sensitive to the original pro- 
gram structure. For each array, the possible data map- 
ping functions form a finite set of candidate space map-
pings, each of which corresponds to distributing one
dimension of the original iteration space. Next, for each
candidate space, that is, for each possible data distri- 
bution function, all possible permutations of the sur- 
rounding loops are considered and the obtained par- 
allelism measured. In a third step a weighted graph is 
constructed to model the parallelism as well as the com- 
munication cost associated with various data decom- 
positions. One node in this weighted graph represents 
one candidate space mapping for each statement. The 
weight associated with a node is its degree of paral- 
lelism obtained. The edges represent the communica- 
tion required and their weight models the communi- 
cation costs. The alignment problem, as formulated in 
[18], is the problem of selecting one node per statement 
such that the sum of the weights of the selected nodes 
and edges is minimized. An algorithm to find such a set 
using various pruning strategies to reduce the size of the 
search space is presented. 
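The node-selection problem can be illustrated by a small depth-first branch and bound. This sketch assumes that all node and edge costs are nonnegative (for instance, with parallelism expressed as a penalty for lost parallelism rather than as a reward), so a partial selection whose cost already reaches the best complete selection can be pruned. The cost tables and the function name are hypothetical, and this is not the pruning strategy of [18].

```python
import math


def select_candidates(node_cost, edge_cost):
    """Pick one candidate per statement minimizing node plus edge costs.

    node_cost[s][k]: cost of candidate k for statement s (assumed >= 0).
    edge_cost[(t, s)]: dict mapping (k_t, k_s) -> cost for t < s (assumed >= 0).
    Returns (best cost, tuple of chosen candidate indices).
    """
    n = len(node_cost)
    best_cost, best_choice = math.inf, None

    def search(choice, cost):
        nonlocal best_cost, best_choice
        if cost >= best_cost:
            return  # prune: nonnegative costs can only grow along this branch
        s = len(choice)
        if s == n:
            best_cost, best_choice = cost, tuple(choice)
            return
        for k, c in enumerate(node_cost[s]):
            # Node cost plus communication with every statement already fixed.
            extra = c + sum(edge_cost.get((t, s), {}).get((choice[t], k), 0)
                            for t in range(s))
            choice.append(k)
            search(choice, cost + extra)
            choice.pop()

    search([], 0)
    return best_cost, best_choice
```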


Approaches Using a Linear Algebra Framework 


Sheu and T.-H. Toi [31] introduced a method for the 
parallel execution of nested loops with constant loop- 
carried data dependences by reducing the communi-
cation overhead. First the nested loops are partitioned
into large blocks which result in little inter-block com- 
munication. For a given linear transformation found by 
the hyperplane method [21], the iterations are parti- 
tioned into blocks such that the communication among 
the blocks is reduced while the execution order defined 
by the time transformation is not disturbed. The par- 
titioning is based on projection techniques. In a sec- 
ond step these blocks are mapped onto message-passing 
multiprocessor systems according to specific properties 
of the target machine. 
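The hyperplane method [21] that supplies the time transformation looks for an integer vector π such that π·d > 0 for every dependence vector d, so that all iterations on a hyperplane π·i = t can run in parallel. The brute-force sketch below searches a small box. It assumes constant integer dependence vectors and uses the sum of absolute components as a simple tie-breaking heuristic. The blocking and mapping steps of [31] are not shown, and the function name is illustrative.

```python
from itertools import product


def hyperplane_time_vector(deps, bound=3):
    """Search integer pi in [-bound, bound]^n with pi . d >= 1 for every
    dependence vector d; return the one with smallest sum |pi_k| (or None)."""
    dim = len(deps[0])
    best = None
    for pi in product(range(-bound, bound + 1), repeat=dim):
        if all(sum(p * d for p, d in zip(pi, dep)) >= 1 for dep in deps):
            size = sum(abs(p) for p in pi)
            if best is None or size < best[0]:
                best = (size, pi)
    return None if best is None else best[1]
```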

M. O’Boyle and G.A. Hedayat [26,27] express the 
alignment problem in a linear algebra framework. In 
this framework, aligned data can be viewed as forming 
a subspace in the iteration space. The problem solved 
is the computation of a transformation of the data ac-
cess functions relative to one another so as to maxi-
mize the number of iteration points in the loop iteration
space for which no communication is needed. 

P. Feautrier [14] addresses the problem of find- 
ing an alignment function that maps the computations 
on a one-dimensional grid of virtual processors. The 
data mapping functions are defined by the owner com- 
putes rule which is imposed. The alignment constraints 
between computation and data accesses are derived 
from the data-flow graph of the program, procedure or 
loop nest considered. The data-flow graph is a directed 
graph. Vertices correspond to statements and the arcs 
to producers and consumers of data. For each state- 
ment, the alignment function is assumed to be an affine 
function of the iteration vectors with unknown param- 
eters. The locality of data accesses is imposed by asking 
that the producer and the consumer of a data element 
be the same processor. Feautrier defines distance vec- 
tors between all pairs of producers and consumers. To 
any arc of the data-flow graph corresponds a distance 
vector that expresses the difference of the indices of the 
processor that computes the data and the one that uses 
it. Thus, a computation is local if and only if the cor- 
responding distance vector is zero. The edges are hence 
transformed into affine equations and the problem con- 
sists in determining nontrivial parameters for the com- 
putation mappings that zero out as many distance vec- 
tors as possible. A heuristic is used to sort the equa- 
tions in decreasing order of the communication traffic 
induced. The system of equations, which usually does
not have a nontrivial solution, is solved by successive
Gauss-Jordan eliminations as long as a feasible solution
remains nontrivial. A solution is nontrivial if it has one
degree of parallelism. 
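The successive elimination step can be sketched under simplifying assumptions. Each zero-distance condition is treated as a homogeneous linear equation in the unknown alignment parameters (the affine constant parts are ignored). Equations are tried in decreasing order of traffic, and an equation is kept only if the system still has at least one free parameter, which we take as the meaning of a nontrivial solution. The rank is computed by Gauss-Jordan elimination in exact rational arithmetic. The names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix computed by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def greedy_zero_distances(equations, traffic, n_unknowns):
    """Add equations by decreasing traffic while a nontrivial solution survives."""
    kept = []
    for k in sorted(range(len(equations)), key=lambda k: -traffic[k]):
        trial = kept + [k]
        if n_unknowns - rank([equations[j] for j in trial]) >= 1:
            kept = trial  # at least one free parameter remains
    return sorted(kept)
```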

J.M. Anderson and M.S. Lam [1] describe neces- 
sary conditions for the data elements accessed by each 
processor to be local. They present a greedy algorithm 
to compute the computation and data mapping func- 
tions that can be satisfied. They incrementally add con- 
straints as long as their conditions are satisfied, starting 
with the most frequently used array access functions. 
They only consider the linear part of the data access 
functions, taking care of the constant offsets in a sec- 
ond step. Their heuristic technique is close to the one 
defined in [9]. 

A. Platonoff [28,29] develops extensions to Feau- 
trier’s [14] automatic data distribution algorithm. 
A method is presented to extract global broadcast oper- 
ations as well as translation operations to optimize the 
data mapping functions. In the data-flow graph, pat-
terns representing broadcast and other global commu-
nication patterns are searched for. The data distribu-
tion is then chosen so as to maximize the number of
global communication operations possible. 

M. Dion and Robert [12,13] consider a problem 
in which all data access functions are of full rank and 
no smaller than d, the required degree of parallelism. 
This ensures that the parallelism obtained is indeed as 
large as wanted. By considering only the linear parts 
they compute the largest set of alignment constraints 
that can be satisfied while yielding the given degree of 
parallelism d. The constant offsets are considered sub- 
sequently, using techniques developed by Darte and 
Robert [8]. They consider a set of candidate solutions 
and search for an optimal one that verifies the largest 
number of constraints while effectively yielding the de- 
gree of parallelism desired. In their approach, Dion 
and Robert consider three basic cases depending on the 
structure of the data access function. Then, they build 
a directed graph defined as follows. Vertices correspond 
either to statements or arrays. There is an arc from ver- 
tex p to vertex q if and only if a mapping of rank d can 
be computed for q from a given mapping of rank d for 
p according to the basic cases enumerated previously. 
In this graph they search for a tree containing a maxi- 
mal number of arcs. Obviously, choosing a mapping of 
rank d for the root of the computed tree implicitly de- 
termines mappings of rank d for all other vertices. 
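The tree search on this graph can be sketched with a breadth-first search from each candidate root. Every vertex reachable from the root can receive a rank-d mapping, so a spanning tree of the reachable set has the largest possible number of arcs for that root. This is an illustrative reading of the description above, not the authors' procedure; the vertex names are placeholders.

```python
from collections import deque


def best_mapping_tree(vertices, arcs):
    """Return (root, tree_arcs) where tree_arcs spans everything reachable
    from root; the root with the largest reachable set wins (ties: first)."""
    succ = {v: [] for v in vertices}
    for p, q in arcs:
        succ[p].append(q)
    best_root, best_tree = None, []
    for root in vertices:
        seen, tree, queue = {root}, [], deque([root])
        while queue:
            p = queue.popleft()
            for q in succ[p]:
                if q not in seen:  # q derives its mapping from p
                    seen.add(q)
                    tree.append((p, q))
                    queue.append(q)
        if best_root is None or len(tree) > len(best_tree):
            best_root, best_tree = root, tree
    return best_root, best_tree
```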


C. Mongenet [25] is interested in minimizing com- 
munication costs in the presence of systems of affine 
recurrence equations, that is, single assignment loop 
nests. The data dependences are subdivided into two 
classes: 

i) auto dependences, and 

ii) cross dependences. 

Auto-dependences are data dependences between two 
data accesses to the same array. The domains of these 
arrays are projected onto hyperplanes such as to min- 
imize the number of remote data accesses. Cross- 
dependences are dependences between data accesses to 
different arrays. Unimodular transformations are ap-
plied to the projected domains to align the different
data arrays and so minimize the resulting communi-
cations. A heuristic based on these two steps is intro-
duced.

C.G. Diderich [9] and Diderich and M. Gengler [10] 
present and extend the algorithm for solving this prob- 
lem introduced in [2]. In a second step they introduce 
the constant degree parallelism alignment problem. It 
is the problem of finding computation and data map- 
ping functions that minimize the number of remote 
data accesses for a given degree of parallelism. An ex- 
act implicit enumeration algorithm is presented. It pro- 
ceeds by enumerating all interesting subsets of align- 
ment constraints to be satisfied. To allow large align- 
ment problems to be solved an efficient heuristic is pre- 
sented and applied to various benchmarks. 


Other Approaches 


B.M. Chapman, T. Fahringer and H.P. Zima [4] describe
a software tool that provides automatic support for the
mapping of the data onto the processors of the tar- 
get machine. The computation is mapped by using the 
owner computes rule. The tool is integrated within 
the Vienna Fortran Compilation System, a compiler 
for Vienna Fortran, an HPF like Fortran dialect. The 
tool makes use of performance analysis methods and 
uses, via heuristics, empirical performance data. Once 
the performance data has been obtained for a given 
program, an inter-procedural alignment and pattern 
matching phase determines a suitable alignment of the 
arrays within each procedure. The alignments are then 
propagated through the call graph of the program. 
If necessary, several versions of a procedure are generated,
corresponding to differently distributed actual argu- 
ments. Finally code is generated using the selected data 
distributions. 

In [7], P. Crooks and R.H. Perrott present an algo- 
rithm for determining data mapping functions by gen- 
erating HPF like directives. Their approach is based 
on identifying reference patterns. To each read/write 
pair is associated an ideal data distribution that mini-
mizes inter-processor communication. Once the pref-
erences for the individual accesses are determined,
a performance estimator is used to select the combi- 
nation of preferences that gives the best performance 
estimate. 

R. Bixby, K. Kennedy and U. Kremer [3] present an 
automatic data layout algorithm based on 0-1 integer 
programming techniques. The data mapping functions, 
following the HPF alignment structure, are optimized 
for a target distributed memory machine, a specific 
problem size and the number of available processors. 
The distribution analysis uses the alignment search 
space, that is, the space of all possible HPF like align- 
ments, to build candidate data layout search spaces of 
reasonable data mapping functions for each loop nest. 
In a second step the inter-phase or inter-loop nest data
layout problem is addressed. By using an integer pro- 
gramming formulation, a data mapping function is se- 
lected for each loop nest such that a single global cost 
function, modeling the communication costs, is mini- 
mized. 
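When the program phases form a simple sequence, the inter-phase selection that [3] solves with 0-1 integer programming can also be solved exactly by dynamic programming, which makes the structure of the cost function easy to see. The sketch assumes a linear chain of phases, a per-phase cost for each candidate layout, and a remapping cost between consecutive phases. The cost figures and function names are made up for illustration and do not come from [3].

```python
def select_layouts(phase_cost, remap_cost):
    """Exact layout choice for a linear chain of phases by dynamic programming.

    phase_cost[p][k]: estimated cost of phase p under candidate layout k.
    remap_cost(k, l): cost of redistributing from layout k to layout l.
    Returns (total cost, chosen layout index per phase).
    """
    n_layouts = len(phase_cost[0])
    best = list(phase_cost[0])  # best[k]: cheapest cost so far ending in layout k
    back = []                   # back[p-1][l]: best predecessor of layout l
    for p in range(1, len(phase_cost)):
        prev, best, choice = best, [], []
        for l in range(n_layouts):
            k = min(range(n_layouts), key=lambda k: prev[k] + remap_cost(k, l))
            best.append(prev[k] + remap_cost(k, l) + phase_cost[p][l])
            choice.append(k)
        back.append(choice)
    l = min(range(n_layouts), key=lambda j: best[j])
    total, layout = best[l], [l]
    for choice in reversed(back):  # follow predecessor links backwards
        l = choice[l]
        layout.append(l)
    return total, layout[::-1]
```

With three phases that alternately favor layouts 0, 1 and 0, and a remapping cost of 3, the sketch switches layouts twice: the total of 9 beats keeping a single layout throughout, which costs 12.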


Conclusion 


This article presents major advancements made in solv- 
ing the alignment problem. Different subproblems are 
defined and described. One major open problem is 
how to incorporate scheduling information into the al- 
gorithms computing efficient alignment functions. See 
[9] for a first approach towards computing scheduling 
functions compatible with computation and data map- 
ping functions. The question of which cost function to 
use when computing alignment functions needs to be ad-
dressed in more detail.


See also 


> Integer Programming 

Deterministic global optimization techniques for non-
convex NLPs have been the subject of growing inter-
est because they can potentially provide a very com-
plete characterization of the problem being considered.
In addition to guaranteeing identification of the global 
solution within arbitrary accuracy, they enable the lo- 
cation of all local and global solutions of the problem. 
As a result, they can be used to determine the feasibility 
of a given problem with certainty [1,2,3,4], or to find all 
solutions of a nonlinear system of equations [13]. They 
are especially valuable in the study of systems in which 
the global optimum solution is the only physically 
meaningful solution, as is the case for the phase equilib-
rium of nonideal mixtures [16,17,18,19,20]. Tradition-
ally, a major theoretical limitation of these approaches
has been their inability to tackle problems with ar- 
bitrary nonconvexities. However, the recent develop- 
ment of rigorous convex relaxation techniques for gen- 
eral twice continuously differentiable functions [2,3,4] 
has greatly expanded the class of problems that can 
be addressed through deterministic global optimiza- 
tion. These approaches have been incorporated within 
a branch and bound framework to create the αBB global
optimization algorithm for twice continuously differen- 
tiable problems [3,6,12]. The theoretical basis of the al- 
gorithm as well as the efficient search strategies it uses 
are discussed in this article. 


General Framework 


The αBB algorithm guarantees finite ε-convergence to
the global solution of nonlinear programming prob- 
lems (NLPs) belonging to the general class 


$$
\begin{aligned}
\min_{x}\quad & f(x)\\
\text{s.t.}\quad & g(x) \le 0\\
& h(x) = 0
\end{aligned}
\qquad (1)
$$

where $f(x)$, $g(x)$ and $h(x)$ are twice continuously
differentiable functions.

The solution scheme is based on the generation of 
a nonincreasing sequence of upper bounds and a non- 
decreasing sequence of lower bounds on the global so- 
lution. The monotonicity of these sequences is ensured 
through successive partitioning of the search space 
which enables the construction of increasingly tight re- 
laxations of the problem. The validity of the bounds ob- 
tained is of crucial importance in a rigorous global op- 
timization approach. The upper bounding step does not 
present any theoretical difficulties and consists of a lo- 
cal optimization of the nonconvex problem. The lower 
bounding step is a more challenging operation in which 
the nonconvex problem must be convexified and un- 
derestimated in the current subdomain. The strategy 
adopted dictates the applicability of the algorithm and 
plays a pivotal role in its performance as it determines 
the tightness of the lower bounds obtained. The pro-
cedure followed in the αBB algorithm is discussed in
the next section. Finally, the branching step involves the 
partition of the solution domain with the smallest lower 
bound on the global optimum solution into a covering 
set of subdomains. Although this is a simple task, the 
choice of partition has implications for the rate of con- 
vergence of the algorithm and efficient branching rules 
must be used. 
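The sketch below is a minimal one-dimensional version of the branch and bound loop described above, written only to make the interplay of the three steps concrete. The following assumptions are not from the source:

- a valid α is supplied by the caller;
- the convex lower bounding problem is solved by ternary search on the separable quadratic underestimator used for general nonconvex terms;
- the upper bound comes from evaluating f at the underestimator's minimizer instead of from a local NLP solve;
- constraints are absent;
- the node with the smallest lower bound is bisected.

The names `alpha_bb_1d` and `ternary_min` are hypothetical.

```python
import heapq
import math


def ternary_min(g, a, b, iters=100):
    """Minimize a convex function g on [a, b] by ternary search."""
    for _ in range(iters):
        m1 = a + (b - a) / 3
        m2 = b - (b - a) / 3
        if g(m1) <= g(m2):
            b = m2
        else:
            a = m1
    x = 0.5 * (a + b)
    return x, g(x)


def alpha_bb_1d(f, alpha, lo, hi, eps=1e-4, max_nodes=10_000):
    """Sketch of an alphaBB-style branch and bound for min f(x) on [lo, hi].

    alpha must satisfy alpha >= max(0, -min f''(x) / 2) on [lo, hi] so that the
    underestimator below is convex; this is assumed, not checked.
    """
    def lower(a, b):
        under = lambda x: f(x) - alpha * (x - a) * (b - x)  # convex underestimator
        return ternary_min(under, a, b)

    x0, lb0 = lower(lo, hi)
    best_x, best_f = x0, f(x0)  # upper bound from a feasible point
    heap = [(lb0, lo, hi)]      # open nodes ordered by lower bound
    for _ in range(max_nodes):
        if not heap:
            break
        lb, a, b = heapq.heappop(heap)
        if best_f - lb <= eps:  # smallest lower bound is within eps: done
            break
        mid = 0.5 * (a + b)     # branch: bisect the node
        for ca, cb in ((a, mid), (mid, b)):
            xc, lbc = lower(ca, cb)
            fc = f(xc)
            if fc < best_f:
                best_x, best_f = xc, fc
            if lbc < best_f - eps:  # keep only nodes that may still improve
                heapq.heappush(heap, (lbc, ca, cb))
    return best_x, best_f
```

As a usage example, for f(x) = sin 3x + 0.1x² on [-3, 3] we have f''(x) ≥ -8.8, so α = 4.4 satisfies the convexity requirement. The loop then locates the global minimum near x ≈ -0.512, with f ≈ -0.973.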


Convexification and Underestimation Strategy 


A convex relaxation of problem (1) is obtained by con- 
structing convex underestimators for the nonconvex 
objective function and inequality constraints and by 
relaxing the nonlinear equality constraints, replacing 
them with less stringent linear equality constraints or 
a set of two convex inequalities. The general convexi- 
fication/relaxation procedure used is first discussed for 
the objective function and nonconvex inequalities. 


Function Decomposition 


A convex underestimator for a twice continuously dif- 
ferentiable function is constructed by following a two- 
stage procedure. In the first stage, the function is de- 
composed into a summation of terms of special struc- 
ture, such as linear, convex, bilinear, trilinear, frac- 
tional, fractional trilinear, concave in one variable and 
general nonconvex terms. Then, based on the fact that
the summation of convex functions results in a con- 
vex function, a tailored convex underestimator is used 
for each different term type. Thus, a twice-differentiable 
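function $F(x)$ defined over the domain $[x^L, x^U]$ is written as

$$
\begin{aligned}
F(x) ={}& c^T x + F_C(x) + \sum_{i=1}^{bt} b_i\, x_{B_i,1} x_{B_i,2} + \sum_{i=1}^{tt} t_i\, x_{T_i,1} x_{T_i,2} x_{T_i,3}\\
&+ \sum_{i=1}^{ft} f_i \frac{x_{F_i,1}}{x_{F_i,2}} + \sum_{i=1}^{ftt} ft_i \frac{x_{FT_i,1}\, x_{FT_i,2}}{x_{FT_i,3}} + \sum_{i=1}^{uct} F_{UC_i}(x_{UC_i}) + \sum_{i=1}^{nct} F_{NC_i}(x),
\end{aligned}
\qquad (2)
$$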


where $c$ is a vector of constant coefficients; $F_C(x)$ is a convex function; $bt$
is the number of bilinear terms, $b_i$ is the coefficient of
the $i$th bilinear term and $x_{B_i,1}$ and $x_{B_i,2}$ are the two vari-
ables participating in the bilinear term; $tt$ is the number
of trilinear terms, $t_i$ is the coefficient of the $i$th trilin-
ear term and $x_{T_i,1}$, $x_{T_i,2}$ and $x_{T_i,3}$ are the three variables
participating in the trilinear term; $ft$ is the number of
fractional terms, $f_i$ is the coefficient of the $i$th fractional
term and $x_{F_i,1}$ and $x_{F_i,2}$ are the two variables participat-
ing in the fractional term; $ftt$ is the number of fractional
trilinear terms, $ft_i$ is the coefficient of the $i$th fractional
trilinear term and $x_{FT_i,1}$, $x_{FT_i,2}$ and $x_{FT_i,3}$ are the three
variables participating in the fractional trilinear term;
$uct$ is the number of univariate concave terms, $F_{UC_i}$ is
the $i$th univariate concave term and $x_{UC_i}$ is the variable
participating in the univariate concave term; $nct$ is the
number of general nonconvex terms and $F_{NC_i}(x)$ is the
$i$th general nonconvex term.

The decomposition phase serves two purposes: it 
can lead to the construction of a tight underestima- 
tor by taking advantage of the special structure of the 
function and it may reduce the complexity of the un- 
derestimation strategy by permitting the treatment of 
terms which involve a smaller number of variables than 
the overall nonconvex function. As will become appar- 
ent, this is especially important for general nonconvex 
terms. 


Linear and Convex Terms 


Any term that has been identified as linear or convex 
does not need to be modified during the convexifica- 
tion/underestimation procedure. 


Bilinear Terms 


The bilinear terms can be replaced by their convex en-
velope [5,15]. A new variable $w_B$ substitutes a bilinear
term $x_1 x_2$ and is bounded by a set of four inequality
constraints which depend on the variable bounds:

$$
\begin{aligned}
w_B &\ge x_1^L x_2 + x_2^L x_1 - x_1^L x_2^L,\\
w_B &\ge x_1^U x_2 + x_2^U x_1 - x_1^U x_2^U,\\
w_B &\le x_1^U x_2 + x_2^L x_1 - x_1^U x_2^L,\\
w_B &\le x_1^L x_2 + x_2^U x_1 - x_1^L x_2^U.
\end{aligned}
\qquad (3)
$$
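The four inequalities (3) translate directly into code. At any point of the box, the true product lies between the resulting lower and upper bounds, and the bounds coincide at the corners of the box. The function name and the tuple layout of the bounds are illustrative.

```python
def mccormick_bounds(x1, x2, b1, b2):
    """Return (lower, upper) bounds on w_B = x1 * x2 from the four inequalities.

    b1 = (x1^L, x1^U), b2 = (x2^L, x2^U).
    """
    (l1, u1), (l2, u2) = b1, b2
    lower = max(l1 * x2 + l2 * x1 - l1 * l2,
                u1 * x2 + u2 * x1 - u1 * u2)
    upper = min(u1 * x2 + l2 * x1 - u1 * l2,
                l1 * x2 + u2 * x1 - l1 * u2)
    return lower, upper
```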


Trilinear, Fractional and Fractional Trilinear Terms 


For trilinear, fractional and fractional trilinear terms, 
the convex underestimators proposed in [13] can be 
used. They are constructed in a fashion similar to the 
bilinear term underestimators: a new variable replaces 
the term and a set of inequality constraints provides 
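bounds on this variable. For a trilinear term $x_1 x_2 x_3$, for
instance, the substitution variable $w_T$ is subject to

$$
\begin{aligned}
w_T &\ge x_1 x_2^L x_3^L + x_1^L x_2 x_3^L + x_1^L x_2^L x_3 - 2 x_1^L x_2^L x_3^L,\\
w_T &\ge x_1 x_2^U x_3^U + x_1^U x_2 x_3^L + x_1^U x_2^L x_3 - x_1^U x_2^L x_3^L - x_1^U x_2^U x_3^U,\\
w_T &\ge x_1 x_2^L x_3^L + x_1^L x_2 x_3^U + x_1^L x_2^U x_3 - x_1^L x_2^U x_3^U - x_1^L x_2^L x_3^L,\\
w_T &\ge x_1 x_2^U x_3^L + x_1^U x_2 x_3^U + x_1^L x_2^U x_3 - x_1^L x_2^U x_3^L - x_1^U x_2^U x_3^U,\\
w_T &\ge x_1 x_2^L x_3^U + x_1^L x_2 x_3^L + x_1^U x_2^L x_3 - x_1^U x_2^L x_3^U - x_1^L x_2^L x_3^L,\\
w_T &\ge x_1 x_2^L x_3^U + x_1^L x_2 x_3^U + x_1^U x_2^U x_3 - x_1^L x_2^L x_3^U - x_1^U x_2^U x_3^U,\\
w_T &\ge x_1 x_2^U x_3^L + x_1^U x_2 x_3^L + x_1^L x_2^L x_3 - x_1^U x_2^U x_3^L - x_1^L x_2^L x_3^L,\\
w_T &\ge x_1 x_2^U x_3^U + x_1^U x_2 x_3^U + x_1^U x_2^U x_3 - 2 x_1^U x_2^U x_3^U.
\end{aligned}
\qquad (4)
$$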
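The following is a quick numerical check of the eight trilinear inequalities listed below as a function. The sketch evaluates their right-hand sides and confirms on random samples that none exceeds x1·x2·x3, and that at every corner of the box at least one of them is tight. The check is illustrative and assumes nonnegative variable bounds, which is the case sampled here. With negative lower bounds the first inequality, for example, can overestimate the product. The function name is hypothetical.

```python
import itertools
import random


def trilinear_underestimators(x, lo, up):
    """Right-hand sides of the eight inequalities w_T >= ... for x1*x2*x3."""
    x1, x2, x3 = x
    (l1, l2, l3), (u1, u2, u3) = lo, up
    return [
        x1*l2*l3 + l1*x2*l3 + l1*l2*x3 - 2*l1*l2*l3,
        x1*u2*u3 + u1*x2*l3 + u1*l2*x3 - u1*l2*l3 - u1*u2*u3,
        x1*l2*l3 + l1*x2*u3 + l1*u2*x3 - l1*u2*u3 - l1*l2*l3,
        x1*u2*l3 + u1*x2*u3 + l1*u2*x3 - l1*u2*l3 - u1*u2*u3,
        x1*l2*u3 + l1*x2*l3 + u1*l2*x3 - u1*l2*u3 - l1*l2*l3,
        x1*l2*u3 + l1*x2*u3 + u1*u2*x3 - l1*l2*u3 - u1*u2*u3,
        x1*u2*l3 + u1*x2*l3 + l1*l2*x3 - u1*u2*l3 - l1*l2*l3,
        x1*u2*u3 + u1*x2*u3 + u1*u2*x3 - 2*u1*u2*u3,
    ]
```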
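For a fractional term $x_1/x_2$ with $x_2^L > 0$, the new variable
$w_F$ is bounded by

$$
\begin{aligned}
w_F &\ge \begin{cases}
\dfrac{x_1^L}{x_2} + \dfrac{x_1}{x_2^U} - \dfrac{x_1^L}{x_2^U} & \text{if } x_1^L \ge 0,\\[2ex]
\dfrac{x_1}{x_2^U} - \dfrac{x_1^L x_2}{x_2^L x_2^U} + \dfrac{x_1^L}{x_2^L} & \text{if } x_1^L < 0,
\end{cases}\\[1ex]
w_F &\ge \begin{cases}
\dfrac{x_1^U}{x_2} + \dfrac{x_1}{x_2^L} - \dfrac{x_1^U}{x_2^L} & \text{if } x_1^U \ge 0,\\[2ex]
\dfrac{x_1}{x_2^L} - \dfrac{x_1^U x_2}{x_2^L x_2^U} + \dfrac{x_1^U}{x_2^U} & \text{if } x_1^U < 0.
\end{cases}
\end{aligned}
\qquad (5)
$$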
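The piecewise bounds for a fractional term can be evaluated directly. Taking the larger of the two applicable expressions gives a lower bound on x1/x2 over the box. This is a sketch assuming x2^L > 0, with an illustrative function name.

```python
def fractional_lower_bound(x1, x2, b1, b2):
    """Largest applicable right-hand side of the bounds on w_F = x1 / x2.

    b1 = (x1^L, x1^U), b2 = (x2^L, x2^U) with x2^L > 0.
    """
    (l1, u1), (l2, u2) = b1, b2
    if l2 <= 0:
        raise ValueError("these bounds require x2^L > 0")
    if l1 >= 0:
        first = l1 / x2 + x1 / u2 - l1 / u2
    else:
        first = x1 / u2 - l1 * x2 / (l2 * u2) + l1 / l2
    if u1 >= 0:
        second = u1 / x2 + x1 / l2 - u1 / l2
    else:
        second = x1 / l2 - u1 * x2 / (l2 * u2) + u1 / u2
    return max(first, second)
```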


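Finally, for a fractional trilinear term $x_1 x_2/x_3$ with $x_1^L, x_2^L \ge 0$
and $x_3^L > 0$, the substitution variable $w_{FT}$ is subject
to

$$
\begin{aligned}
w_{FT} &\ge \frac{x_1 x_2^L}{x_3^U} + \frac{x_1^L x_2}{x_3^U} + \frac{x_1^L x_2^L}{x_3} - \frac{2 x_1^L x_2^L}{x_3^U},\\
w_{FT} &\ge \frac{x_1 x_2^U}{x_3^L} + \frac{x_1^U x_2}{x_3^U} + \frac{x_1^U x_2^L}{x_3} - \frac{x_1^U x_2^L}{x_3^U} - \frac{x_1^U x_2^U}{x_3^L},\\
w_{FT} &\ge \frac{x_1 x_2^L}{x_3^U} + \frac{x_1^L x_2}{x_3^L} + \frac{x_1^L x_2^U}{x_3} - \frac{x_1^L x_2^U}{x_3^L} - \frac{x_1^L x_2^L}{x_3^U},\\
w_{FT} &\ge \frac{x_1 x_2^U}{x_3^U} + \frac{x_1^U x_2}{x_3^L} + \frac{x_1^L x_2^U}{x_3} - \frac{x_1^L x_2^U}{x_3^U} - \frac{x_1^U x_2^U}{x_3^L},\\
w_{FT} &\ge \frac{x_1 x_2^L}{x_3^L} + \frac{x_1^L x_2}{x_3^U} + \frac{x_1^U x_2^L}{x_3} - \frac{x_1^U x_2^L}{x_3^L} - \frac{x_1^L x_2^L}{x_3^U},\\
w_{FT} &\ge \frac{x_1 x_2^L}{x_3^L} + \frac{x_1^L x_2}{x_3^L} + \frac{x_1^U x_2^U}{x_3} - \frac{x_1^L x_2^L}{x_3^L} - \frac{x_1^U x_2^U}{x_3^L},\\
w_{FT} &\ge \frac{x_1 x_2^U}{x_3^U} + \frac{x_1^U x_2}{x_3^U} + \frac{x_1^L x_2^L}{x_3} - \frac{x_1^U x_2^U}{x_3^U} - \frac{x_1^L x_2^L}{x_3^U},\\
w_{FT} &\ge \frac{x_1 x_2^U}{x_3^L} + \frac{x_1^U x_2}{x_3^L} + \frac{x_1^U x_2^U}{x_3} - \frac{2 x_1^U x_2^U}{x_3^L}.
\end{aligned}
\qquad (6)
$$


Univariate Concave Terms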


For univariate concave terms, the convexifica-
tion/underestimation procedure does not require the
introduction of new variables or constraints: a simple
linearization of the term suffices. Thus, a univariate
concave term $F_{UC}(x)$ is replaced by the linear term

$$
F_{UC}(x^L) + \frac{F_{UC}(x^U) - F_{UC}(x^L)}{x^U - x^L}\,(x - x^L). \qquad (7)
$$
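The secant used for univariate concave terms is straightforward to compute. Because F_UC is concave, its chord over [x^L, x^U] lies below it and matches it at both endpoints. The helper name is illustrative.

```python
def concave_secant(F, xl, xu):
    """Linear underestimator of a univariate concave F on [xl, xu] (its chord)."""
    slope = (F(xu) - F(xl)) / (xu - xl)
    return lambda x: F(xl) + slope * (x - xl)
```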


General Nonconvex Terms 


For a general nonconvex term $F_{NC}(x)$, a convex un-
derestimator $\breve{F}_{NC}(x)$ over $[x^L, x^U]$ is constructed by
subtracting a positive separable quadratic term from
$F_{NC}(x)$ [12]:

$$
\breve{F}_{NC}(x) = F_{NC}(x) - \sum_{j=1}^{n} \alpha_j\,(x_j - x_j^L)(x_j^U - x_j), \qquad (8)
$$

where $n$ is the number of variables and the $\alpha$ parameters
are positive scalars.

The magnitude of the $\alpha$ parameters determines both
the quality of the convex underestimator, that is, its
tightness, and its convexity. It was shown in [12] that
the maximum separation distance, $d_{\max}$, between the
nonconvex term $F_{NC}(x)$ and its convex underestimator
$\breve{F}_{NC}(x)$ is given by

$$
d_{\max} = \max_{x \in [x^L, x^U]} \bigl(F_{NC}(x) - \breve{F}_{NC}(x)\bigr) = \frac{1}{4} \sum_{j=1}^{n} \alpha_j \bigl(x_j^U - x_j^L\bigr)^2. \qquad (9)
$$


Thus, small $\alpha$ values are needed to construct a tight un-
derestimator. The dependence of the maximum sepa-
ration distance on the square of the variable ranges is
especially important for the convergence proof of the
algorithm [12]. Provided that the $\alpha$ values do not in-
crease from a parent node to a child node, relation (9)
guarantees that the convex relaxations become increas- 
ingly tight as the branch and bound iterations progress 
and smaller subdomains are generated. In the limit, the 
convex underestimators match the original functions. 
As a result, the monotonicity of the lower bound se- 
quence can be ensured. 
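The underestimator (8) and the separation formula (9) can be checked on a small example. The sketch assumes the test function F(x) = sin x1 cos x2, whose Hessian eigenvalues are -sin(x1 - x2) and -sin(x1 + x2) and therefore never below -1, so α_j = 0.5 satisfies the convexity condition. The function names are hypothetical. The separation is largest at the midpoint of the box, and bisecting both variable ranges divides d_max by four.

```python
import math


def alpha_underestimator(F, alpha, lo, up):
    """Return x -> F(x) - sum_j alpha_j (x_j - lo_j)(up_j - x_j), as in (8)."""
    def breve(x):
        return F(x) - sum(a * (xj - l) * (u - xj)
                          for a, xj, l, u in zip(alpha, x, lo, up))
    return breve


def max_separation(alpha, lo, up):
    """d_max = 1/4 * sum_j alpha_j (up_j - lo_j)^2, as in (9)."""
    return 0.25 * sum(a * (u - l) ** 2 for a, l, u in zip(alpha, lo, up))
```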

To meet the convexity requirement of $\breve{F}_{NC}(x)$, the
positive quadratic term needs to be sufficiently large to
overcome the nonconvexity of $F_{NC}(x)$. This is achieved
by manipulating the value of the $\alpha$ parameters. Based
on the properties of convex functions, a necessary and
sufficient condition for the convexity of $\breve{F}_{NC}(x)$ is the
positive semidefiniteness of the matrix $H_{F_{NC}}(x) + 2\,\mathrm{diag}(\alpha_i)$
for all $x \in [x^L, x^U]$, where $H_{F_{NC}}(x)$ is the Hes-
sian matrix of the nonconvex term $F_{NC}(x)$. The diago-
nal matrix $\Delta = \mathrm{diag}(\alpha_i)$ results in a shift in the diagonal
elements of the matrix $H_{F_{NC}}(x)$ and is therefore referred
to as the diagonal shift matrix. The rigorous derivation
of a matrix $\Delta$ that satisfies the convexity condition is
a difficult matter in the general case, primarily because
of the nonlinear dependence of the Hessian matrix on
the $x$ variables. This problem can be alleviated by using
interval arithmetic to generate an interval Hessian ma-
trix $[H_{F_{NC}}]$ such that $H_{F_{NC}}(x) \in [H_{F_{NC}}]$ for all $x \in [x^L, x^U]$
[1,3,4]. This process allows the formulation of a suf-
ficient convexity condition for the underestimator: if all
real symmetric matrices in $[H_{F_{NC}}] + 2\,\mathrm{diag}(\alpha_i)$ are pos-
itive semidefinite, then $\breve{F}_{NC}(x)$ is convex over $[x^L, x^U]$.

Based on the interval Hessian matrix, a number of
methods may be used to automatically and rigorously
compute a diagonal shift matrix $\Delta$ that guarantees the
convexity of $\breve{F}_{NC}(x)$. The first class of techniques generates
a uniform diagonal shift matrix by equating all the
diagonal elements of $\Delta$ with a single $\alpha$ value. In the second
class of techniques, different $\alpha$ values are used and
a nonuniform diagonal shift matrix is obtained [1,3].

In the first class of methods, the convexity condition
is equivalent to the positive semidefiniteness of all real
symmetric matrices in $[H_{F_{NC}}] + 2\,\mathrm{diag}(\alpha)$ and is satis-
fied by any $\alpha$ parameter such that

$$
\alpha \ge \max\Bigl\{0,\; -\tfrac{1}{2}\,\lambda_{\min}\bigl([H_{F_{NC}}]\bigr)\Bigr\}, \qquad (10)
$$

where $\lambda_{\min}([H_{F_{NC}}])$ is the minimum eigenvalue of
$[H_{F_{NC}}]$ [3,12].

Consider a square symmetric interval Hessian ma-
trix family $[H]$ whose element $(i,j)$ is the interval
$[\underline{h}_{ij}, \overline{h}_{ij}]$ and whose radius matrix $\Delta H$ is defined as
$(\Delta H)_{ij} = (\overline{h}_{ij} - \underline{h}_{ij})/2$. A lower bound on the minimum
eigenvalue of $[H]$ can be obtained using one of the fol-
lowing methods [1,3,4]:

• Method I.1: the Gershgorin theorem approach;

• Method I.2a: the E-matrix approach with $E = 0$;

• Method I.2b: the E-matrix approach with $E = \mathrm{diag}(\Delta H)$;

• Method I.3: Mori-Kokame's approach;

• Method I.4: the lower bounding Hessian approach;

• Method I.5: an approach based on the Kharitonov theorem;

• Method I.6: the Hertz approach.

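Method I.1 is an extension of the Gershgorin theorem
for real matrices to interval matrices. The minimum
eigenvalue of $[H]$ is such that

$$
\lambda_{\min}([H]) \ge \min_i \Bigl[\underline{h}_{ii} - \sum_{j \ne i} \max\bigl(|\underline{h}_{ij}|, |\overline{h}_{ij}|\bigr)\Bigr].
$$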
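Method I.1 and the uniform shift (10) combine into a few lines of code. The sketch takes the interval Hessian as two nested lists of lower and upper bounds, which is an illustrative layout. It returns the Gershgorin lower bound on λ_min and the resulting α.

```python
def gershgorin_lower_bound(h_lo, h_up):
    """Interval Gershgorin bound: min_i [h_lo_ii - sum_{j != i} max(|h_lo_ij|, |h_up_ij|)]."""
    n = len(h_lo)
    return min(
        h_lo[i][i] - sum(max(abs(h_lo[i][j]), abs(h_up[i][j]))
                         for j in range(n) if j != i)
        for i in range(n))


def uniform_alpha(lambda_min_bound):
    """Uniform shift alpha = max(0, -lambda_min / 2), as in (10)."""
    return max(0.0, -0.5 * lambda_min_bound)
```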


Methods I.2a and I.2b are a generalization of the re-
sults presented in [8,23]. They require the computation of
the modified midpoint matrix $\tilde{H}_M$ such that $(\tilde{H}_M)_{ij} =
(\underline{h}_{ij} + \overline{h}_{ij})/2$ for $i \ne j$ and $(\tilde{H}_M)_{ii} = \underline{h}_{ii}$, as well as the com-
putation of the modified radius matrix $\widetilde{\Delta H}$ such that
$(\widetilde{\Delta H})_{ij} = (\overline{h}_{ij} - \underline{h}_{ij})/2$ for $i \ne j$ and $(\widetilde{\Delta H})_{ii} = 0$. Given an
arbitrary real symmetric matrix $E$, the minimum eigen-
value of the interval Hessian matrix $[H]$ is such that

$$
\lambda_{\min}([H]) \ge \lambda_{\min}(\tilde{H}_M + E) - \rho\bigl(\widetilde{\Delta H} + |E|\bigr),
$$

where $\rho(M)$ denotes the spectral radius of the real ma-
trix $M$. In practice, two E-matrices have been used: $E =
0$ (Method I.2a) and $E = \mathrm{diag}(\Delta H)$ (Method I.2b).

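Method I.3 is based on a result presented in [21],
which uses the lower vertex matrix $\underline{H}$ such that $(\underline{H})_{ij} =
\underline{h}_{ij}$, and the upper vertex matrix $\overline{H}$ such that $(\overline{H})_{ij} = \overline{h}_{ij}$.
The minimum eigenvalue of $[H]$ is such that

$$
\lambda_{\min}([H]) \ge \lambda_{\min}(\underline{H}) - \rho(\overline{H} - \underline{H}).
$$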


Method I.4 uses a lower bounding Hessian of the
interval Hessian matrix. Such a matrix is defined in
[24] as a real symmetric matrix whose minimum eigen-
value is smaller than the minimum eigenvalue of any
real symmetric matrix in the interval Hessian family. It
therefore suffices to compute the minimum eigenvalue
of this real matrix to obtain the desired lower bound.
A lower bounding Hessian $L = (l_{ij})$ can be constructed
from the following rule:

$$
l_{ij} = \begin{cases}
\underline{h}_{ii} + \displaystyle\sum_{k \ne i} \frac{\underline{h}_{ik} - \overline{h}_{ik}}{2}, & i = j,\\[2ex]
\dfrac{\underline{h}_{ij} + \overline{h}_{ij}}{2}, & i \ne j.
\end{cases}
$$
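Methods I.2a, I.3 and I.4 all reduce the interval problem to eigenvalue computations on real symmetric matrices. The sketch below uses NumPy (`numpy.linalg.eigvalsh`, which returns the eigenvalues of a symmetric matrix in ascending order). It assumes the interval Hessian is supplied as two symmetric float arrays of lower and upper bounds. The function names are hypothetical.

```python
import numpy as np


def lmin(m):
    """Smallest eigenvalue of a real symmetric matrix."""
    return float(np.linalg.eigvalsh(m)[0])


def spectral_radius(m):
    """Spectral radius of a real symmetric matrix."""
    return float(np.max(np.abs(np.linalg.eigvalsh(m))))


def method_i2a(h_lo, h_up):
    """E-matrix bound with E = 0: lmin(modified midpoint) - rho(modified radius)."""
    mid = (h_lo + h_up) / 2
    rad = (h_up - h_lo) / 2
    np.fill_diagonal(mid, np.diag(h_lo))  # modified midpoint keeps lower diagonal
    np.fill_diagonal(rad, 0.0)            # modified radius has zero diagonal
    return lmin(mid) - spectral_radius(rad)


def method_i3(h_lo, h_up):
    """Mori-Kokame bound: lmin(lower vertex) - rho(upper - lower)."""
    return lmin(h_lo) - spectral_radius(h_up - h_lo)


def method_i4(h_lo, h_up):
    """Minimum eigenvalue of the lower bounding Hessian L."""
    L = (h_lo + h_up) / 2  # off-diagonal entries are midpoints
    n = h_lo.shape[0]
    for i in range(n):
        L[i, i] = h_lo[i, i] + sum((h_lo[i, k] - h_up[i, k]) / 2
                                   for k in range(n) if k != i)
    return lmin(L)
```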


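Method I.5 is based on the Kharitonov theorem [11]
which, by extension, gives a lower bound on the min-
imum eigenvalue of an interval Hessian matrix family
[2]. First, the corresponding characteristic polynomial
family must be derived:

$$
[K] = [\underline{c}_0, \overline{c}_0] + [\underline{c}_1, \overline{c}_1]\lambda + [\underline{c}_2, \overline{c}_2]\lambda^2 + [\underline{c}_3, \overline{c}_3]\lambda^3 + [\underline{c}_4, \overline{c}_4]\lambda^4 + [\underline{c}_5, \overline{c}_5]\lambda^5 + \cdots,
$$

where the coefficients of $\lambda$ depend on the elements of
the interval Hessian matrix $[H]$. A lower bound on the
roots of this polynomial can then be obtained by calcu-
lating the minimum roots of only four real polyno-
mials. The appropriate bounding polynomials are the
Kharitonov polynomials

$$
\begin{aligned}
K_1 &= \underline{c}_0 + \underline{c}_1\lambda + \overline{c}_2\lambda^2 + \overline{c}_3\lambda^3 + \underline{c}_4\lambda^4 + \underline{c}_5\lambda^5 + \cdots,\\
K_2 &= \overline{c}_0 + \overline{c}_1\lambda + \underline{c}_2\lambda^2 + \underline{c}_3\lambda^3 + \overline{c}_4\lambda^4 + \overline{c}_5\lambda^5 + \cdots,\\
K_3 &= \overline{c}_0 + \underline{c}_1\lambda + \underline{c}_2\lambda^2 + \overline{c}_3\lambda^3 + \overline{c}_4\lambda^4 + \underline{c}_5\lambda^5 + \cdots,\\
K_4 &= \underline{c}_0 + \overline{c}_1\lambda + \overline{c}_2\lambda^2 + \underline{c}_3\lambda^3 + \underline{c}_4\lambda^4 + \overline{c}_5\lambda^5 + \cdots.
\end{aligned}
$$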


Method I.6 allows the computation of the exact
minimum eigenvalue of the family of symmetric ma-
trices represented by the interval Hessian matrix. It re-
quires the construction of $2^{n-1}$ vertex matrices $H_u$ of
the interval matrix $[H]$ as defined by

$$
(H_u)_{ij} = \begin{cases}
\underline{h}_{ii} & \text{if } i = j,\\
\underline{h}_{ij} & \text{if } u_i u_j \ge 0,\ i \ne j,\\
\overline{h}_{ij} & \text{if } u_i u_j < 0,\ i \ne j,
\end{cases}
$$

where all possible combinations of the signs of the ar-
bitrary scalars $u_i$ and $u_j$ are enumerated. It was shown
in [4,10] that the lowest minimum eigenvalue from this
set of real matrices is the minimum eigenvalue of the
interval matrix.
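The vertex enumeration of Method I.6 can be sketched with NumPy in the same input layout as above. Flipping all signs of u leaves every product u_i u_j unchanged, so fixing u_1 = +1 leaves 2^(n-1) sign patterns, one per vertex matrix. The function returns the smallest of their minimum eigenvalues. The cost grows exponentially with n, so this sketch is only practical for small terms; the function name is illustrative.

```python
import itertools

import numpy as np


def hertz_min_eigenvalue(h_lo, h_up):
    """Exact minimum eigenvalue of a symmetric interval matrix by vertex enumeration."""
    n = h_lo.shape[0]
    best = np.inf
    for signs in itertools.product((1, -1), repeat=n - 1):
        u = (1,) + signs  # u_1 fixed to +1
        v = np.empty_like(h_lo, dtype=float)
        for i in range(n):
            for j in range(n):
                if i == j:
                    v[i, j] = h_lo[i, i]
                elif u[i] * u[j] >= 0:
                    v[i, j] = h_lo[i, j]
                else:
                    v[i, j] = h_up[i, j]
        best = min(best, float(np.linalg.eigvalsh(v)[0]))
    return best
```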
Three rigorous techniques for the generation of
a nonuniform shift matrix $\Delta$ can be used [1,3]:
• Method II.1a: the scaled Gershgorin theorem ap-
proach with scaling vector $d = 1$;
• Method II.1b: the scaled Gershgorin theorem ap-
proach with scaling vector $d = x^U - x^L$;
• Method II.2: the H-matrix approach;
• Method II.3: an approach based on the minimiza-
tion of the maximum separation distance.
The main advantage of these techniques is that resort-
ing to a different value of the $\alpha$ parameter for each vari-
able may lead to tighter underestimators by taking into
account the individual contribution of each variable to
the overall nonconvexity of the term being considered.
In the case of a uniform diagonal shift, the worst con-
tribution is uniformly assigned to all variables.


Methods II.1a and II.1b bear a resemblance to the Gershgorin theorem used for Method I.1. In the present case, however, each row is considered independently, and the $i$th element of the diagonal shift matrix, $\alpha_i$, is

$$\alpha_i = \max\left\{0,\ -\frac{1}{2}\left(\underline{h}_{ii} - \sum_{j \neq i} \max\{|\underline{h}_{ij}|, |\overline{h}_{ij}|\}\,\frac{d_j}{d_i}\right)\right\},$$

where $d$ is an arbitrary positive vector. In practice, $d = 1$ (Method II.1a) and $d = x^U - x^L$ (Method II.1b) have been used. The latter choice of scaling often helps to reduce the maximum separation distance between the nonconvex term and its underestimator by assigning smaller $\alpha$ values to variables with a larger range.
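The scaled Gershgorin rule is a single pass over the rows of the interval Hessian. A minimal sketch, assuming the interval Hessian is given as two matrices of elementwise lower and upper bounds; the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def scaled_gershgorin_alphas(
    h_lo: Sequence[Sequence[float]],
    h_up: Sequence[Sequence[float]],
    d: Optional[Sequence[float]] = None,
) -> List[float]:
    """Nonuniform shifts alpha_i from the scaled Gershgorin rule.

    h_lo, h_up: elementwise lower/upper bounds of the interval Hessian [H].
    d: positive scaling vector (None means d = 1, i.e. Method II.1a;
       passing d = x^U - x^L corresponds to Method II.1b).
    """
    n = len(h_lo)
    d = [1.0] * n if d is None else list(d)
    if len(d) != n or any(di <= 0 for di in d):
        raise ValueError("d must be a positive vector of length n")
    alphas = []
    for i in range(n):
        # Worst-case off-diagonal magnitude in row i, weighted by d_j / d_i.
        radius = sum(
            max(abs(h_lo[i][j]), abs(h_up[i][j])) * d[j] / d[i]
            for j in range(n)
            if j != i
        )
        alphas.append(max(0.0, -0.5 * (h_lo[i][i] - radius)))
    return alphas


h_lo = [[-3.0, -1.0], [-1.0, 2.0]]
h_up = [[-1.0, 2.0], [2.0, 4.0]]
print(scaled_gershgorin_alphas(h_lo, h_up))              # Method II.1a
print(scaled_gershgorin_alphas(h_lo, h_up, [1.0, 2.0]))  # ranges used as d (II.1b)
```

In this small example the first variable needs a positive shift while the second, whose row is diagonally dominant, receives $\alpha_2 = 0$; the computation costs $O(n^2)$ operations.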
Method II.2 is an iterative method based on the properties of H-matrices: a square interval matrix that has the H-matrix property is regular and does not have 0 as an eigenvalue [22]. In order to determine whether a square interval matrix $[H]$ is an H-matrix, its comparison matrix $\langle H \rangle$ must first be defined. For $i \neq j$, the off-diagonal element $\langle H \rangle_{ij}$ of the comparison matrix is given by $-\max\{|\underline{h}_{ij}|, |\overline{h}_{ij}|\}$. A diagonal element $\langle H \rangle_{ii}$ is given by

$$\langle H \rangle_{ii} = \begin{cases} 0 & \text{if } 0 \in [\underline{h}_{ii}, \overline{h}_{ii}],\\ \min\{|\underline{h}_{ii}|, |\overline{h}_{ii}|\} & \text{if } 0 \notin [\underline{h}_{ii}, \overline{h}_{ii}]. \end{cases}$$

A real matrix such as $\langle H \rangle$ is an M-matrix if all its off-diagonal elements are nonpositive (this is always true for $\langle H \rangle$) and if there exists a real positive vector $u$ such that $\langle H \rangle u > 0$. The interval matrix $[H]$ is an H-matrix if its comparison matrix $\langle H \rangle$ is an M-matrix. Method II.2 follows an iterative procedure to construct a nonuniform diagonal shift matrix $\Delta$ such that $[H] + 2\Delta$ is an H-matrix whose modified midpoint matrix is positive definite. If these conditions are met, the diagonal elements of the shift matrix are guaranteed to lead to the construction of a convex underestimator for the nonconvex term. The initial guess chosen for $\Delta$ is the uniform diagonal shift matrix given by Method I.2.
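The H-matrix test at the core of Method II.2 can be sketched as follows. This covers only the test, not the iterative construction of $\Delta$. The sketch relies on a standard property of matrices with nonpositive off-diagonal elements: such a matrix satisfies the M-matrix condition above exactly when the linear system $\langle H \rangle u = \mathbf{1}$ has a positive solution. The reason is that the inverse of such an M-matrix is entrywise nonnegative with no zero row. The function names are hypothetical, and a small tolerance is used to declare singularity.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def comparison_matrix(h_lo: Sequence[Sequence[float]], h_up: Sequence[Sequence[float]]) -> Matrix:
    """Comparison matrix <H> of the interval matrix [h_lo, h_up]."""
    n = len(h_lo)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lo, up = h_lo[i][j], h_up[i][j]
            if i != j:
                c[i][j] = -max(abs(lo), abs(up))
            elif lo <= 0.0 <= up:
                c[i][j] = 0.0
            else:
                c[i][j] = min(abs(lo), abs(up))
    return c


def solve(a: Matrix, b: List[float]) -> Optional[List[float]]:
    """Gaussian elimination with partial pivoting; None if (numerically) singular."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def is_h_matrix(h_lo: Sequence[Sequence[float]], h_up: Sequence[Sequence[float]]) -> bool:
    """True if a positive u with <H> u > 0 exists (here: <H> u = 1 with u > 0)."""
    c = comparison_matrix(h_lo, h_up)
    u = solve(c, [1.0] * len(c))
    return u is not None and all(ui > 0 for ui in u)


print(is_h_matrix([[4.0, -1.0], [-1.0, 3.0]], [[5.0, 1.0], [1.0, 4.0]]))
```

A diagonal interval containing zero contributes a zero diagonal entry to $\langle H \rangle$, which typically makes the test fail; this is why Method II.2 adds the shift $2\Delta$ before testing.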
Method II.3 aims to generate a nonuniform diagonal shift matrix which minimizes the maximum separation distance between the nonconvex term and its underestimator. For this purpose, the following semidefinite programming problem is solved using an interior point method [25]:

$$\begin{aligned} \min_{\alpha}\ & (x^U - x^L)^T \Delta\, (x^U - x^L)\\ \text{s.t.}\ & L + 2\,\mathrm{diag}(\alpha_i) \succeq 0,\\ & \alpha_i \ge 0 \quad \forall i, \end{aligned}$$

where $\Delta = \mathrm{diag}(\alpha_i)$ and $L$ is the lower bounding Hessian matrix defined in Method I.4. Because this approach is based on the lower bounding Hessian matrix rather than the exact $x$-dependent Hessian matrix, the solution found does not correspond to the smallest achievable maximum separation distance, but it can be expected to be smaller than the one obtained when Method I.4 is used.
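The objective of this problem can be connected to the maximum separation distance by a short derivation. It assumes a single general nonconvex term $F$ underestimated by subtracting $\sum_i \alpha_i (x_i^U - x_i)(x_i - x_i^L)$, which is the form used for the general nonconvex terms in (11):

```latex
\begin{aligned}
F(x) - \breve{F}(x) &= \sum_{i=1}^{n} \alpha_i (x_i^U - x_i)(x_i - x_i^L),\\
\max_{x_i \in [x_i^L, x_i^U]} (x_i^U - x_i)(x_i - x_i^L)
  &= \frac{(x_i^U - x_i^L)^2}{4}
  \quad \text{(attained at } x_i = \tfrac{1}{2}(x_i^L + x_i^U)\text{)},\\
\max_{x \in [x^L, x^U]} \bigl(F(x) - \breve{F}(x)\bigr)
  &= \frac{1}{4} \sum_{i=1}^{n} \alpha_i (x_i^U - x_i^L)^2
   = \frac{1}{4} (x^U - x^L)^T \Delta\, (x^U - x^L).
\end{aligned}
```

Under this assumption, the objective of Method II.3 is four times the maximum separation distance. The Hessian of the underestimator is $\nabla^2 F(x) + 2\Delta$, so requiring $L + 2\,\mathrm{diag}(\alpha_i) \succeq 0$ for a matrix $L$ that bounds the Hessian from below is a sufficient convexity condition.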

A comparative study [1,3] of all the methods available for the generation of a diagonal shift matrix found that Methods II.1a, II.1b and II.3 usually give the tightest underestimators. However, Method II.3 is computationally intensive and therefore results in poorer convergence rates than Methods II.1a and II.1b. Since even the least computationally expensive techniques for the generation of the diagonal shift matrix, Methods I.1, II.1a and II.1b, are of order $O(n^2)$, the decomposition of the nonconvex terms into a summation of terms involving a smaller number of variables may have a significant impact on the performance of the algorithm.


Overall Convexification/Relaxation Strategy 


Based on the rigorous convexification/underestimation schemes for bilinear, trilinear, fractional, fractional trilinear, univariate concave and general nonconvex terms, the overall convex underestimator $\breve{F}(x, w)$ for a twice continuously differentiable function $F(x)$ decomposed according to (2) is

$$\begin{aligned}
\breve{F}(x, w) = {} & c^T x + F_C(x) + \sum_{i=1}^{\mathit{bt}} b_i w_{B_i} + \sum_{i=1}^{\mathit{tt}} t_i w_{T_i} + \sum_{i=1}^{\mathit{ft}} f_i w_{F_i} + \sum_{i=1}^{\mathit{ftt}} \mathit{ft}_i w_{FT_i}\\
& + \sum_{i=1}^{\mathit{uct}} \left[ F_{UC_i}(x_{UC_i}^L) + \frac{F_{UC_i}(x_{UC_i}^U) - F_{UC_i}(x_{UC_i}^L)}{x_{UC_i}^U - x_{UC_i}^L}\,(x_{UC_i} - x_{UC_i}^L) \right]\\
& + \sum_{i=1}^{\mathit{nt}} \left[ F_{NT_i}(x) - \sum_{j=1}^{n} \alpha_{ij} (x_j^U - x_j)(x_j - x_j^L) \right], \qquad (11)
\end{aligned}$$

where the notation is as defined for (2). The introduction of the new variables $w_{B_i}$, $w_{T_i}$, $w_{F_i}$ and $w_{FT_i}$ is accompanied by the addition of convex inequalities of the type given in (3), (4), (5) and (6). For the trilinear, fractional and fractional trilinear terms, the specific form of these equations depends on the sign of the term coefficients and on the variable bounds.

The form given by (11) can be used to construct 
convex underestimators for the objective function and 
inequality constraints. 


Equality Constraints 


For nonlinear equality constraints, two different convexification/relaxation schemes are used, depending on the mathematical structure of the function. If the equality $h(x) = 0$ involves only linear, bilinear, trilinear, fractional and fractional trilinear terms, it is first decomposed into the equivalent equality constraint

$$c^T x + \sum_{i=1}^{\mathit{bt}} b_i x_{B_i,1} x_{B_i,2} + \sum_{i=1}^{\mathit{tt}} t_i x_{T_i,1} x_{T_i,2} x_{T_i,3} + \sum_{i=1}^{\mathit{ft}} f_i \frac{x_{F_i,1}}{x_{F_i,2}} + \sum_{i=1}^{\mathit{ftt}} \mathit{ft}_i \frac{x_{FT_i,1} x_{FT_i,2}}{x_{FT_i,3}} = 0, \qquad (12)$$

where the notation is as previously defined. (12) is then replaced by

$$c^T x + \sum_{i=1}^{\mathit{bt}} b_i w_{B_i} + \sum_{i=1}^{\mathit{tt}} t_i w_{T_i} + \sum_{i=1}^{\mathit{ft}} f_i w_{F_i} + \sum_{i=1}^{\mathit{ftt}} \mathit{ft}_i w_{FT_i} = 0, \qquad (13)$$

with the addition of convex inequalities of the type given by (3), (4), (5) and (6). If the nonlinear equality contains at least one convex, univariate concave or general nonconvex term, the convexification/relaxation strategy must first transform the equality constraint $h(x) = 0$ into a set of two equivalent inequality constraints

$$h(x) \le 0, \qquad -h(x) \le 0,$$

which can then be convexified and underestimated independently using (11).

The transformation of a nonconvex twice-differentiable problem into a convex lower bounding problem described in this section allows the generation of valid and increasingly tight lower bounds on the global optimum solution.


Branching Variable Selection 


Once upper and lower bounds have been obtained for 
all the existing nodes of the branch and bound tree, 
the region with the smallest lower bound is selected 
for branching. The partitioning of the solution space 
can have a significant effect on the quality of the lower 
bounds obtained because of the strong dependence of 
the convex underestimators described by (3)-(8) on 
the variable bounds. It is therefore important to iden- 
tify the variables which most contribute to the separa- 
tion between the original problem and the convex lower 
bounding problem at the current node. Several branch- 
ing variable selection criteria have been designed for 
this purpose [1]. 


Least Reduced Axis Rule 


The first strategy leads to the selection of the variable that has been branched on least to arrive at the current node. It is characterized by the largest ratio

$$\frac{x_i^U - x_i^L}{x_{i,0}^U - x_{i,0}^L},$$

where $x_{i,0}^L$ and $x_{i,0}^U$ are the lower and upper bounds on variable $x_i$ at the first node of the branch and bound tree, and $x_i^L$ and $x_i^U$ are the current lower and upper bounds on variable $x_i$.
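The least reduced axis rule needs only the root-node and current bounds. A minimal sketch, with a hypothetical function name; variables whose initial range is zero are skipped, and ties go to the lowest index.

```python
from typing import Sequence


def least_reduced_axis(
    lower: Sequence[float],
    upper: Sequence[float],
    lower0: Sequence[float],
    upper0: Sequence[float],
) -> int:
    """Index of the variable with the largest ratio (x_i^U - x_i^L) / (x_i0^U - x_i0^L).

    lower/upper: bounds at the current node; lower0/upper0: bounds at the root node.
    Returns -1 if no variable has a nonzero initial range.
    """
    best_i, best_ratio = -1, -1.0
    for i, (lo, up, lo0, up0) in enumerate(zip(lower, upper, lower0, upper0)):
        width0 = up0 - lo0
        if width0 <= 0:
            continue  # variable fixed from the start: nothing to branch on
        ratio = (up - lo) / width0
        if ratio > best_ratio:  # strict '>' keeps the lowest index on ties
            best_i, best_ratio = i, ratio
    return best_i


# Ranges reduced to 50%, 75% and 50% of their initial widths: x_1 (index 1) is chosen.
print(least_reduced_axis([0, 1, 0], [5, 4, 1], [0, 0, -1], [10, 4, 1]))
```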

The main disadvantage of this simple rule is that it does not account for the specific way in which each variable participates in the problem, and therefore it cannot accurately identify the critical variables that determine the quality of the underestimators.


Term Measure 


A more sophisticated rule is based on the computation of a term measure $\mu_j$ for term $t_j$, defined as

$$\mu_j = t_j(x^*) - \breve{t}_j(x^*, w^*), \qquad (15)$$

where $t_j(x)$ is a bilinear, trilinear, fractional, fractional trilinear, univariate concave or general nonconvex term, $\breve{t}_j(x, w)$ is the corresponding convex underestimator, $x^*$ is the solution vector corresponding to the minimum of the convex lower bounding problem, and $w^*$ is the solution vector for the new variables at the minimum of the convex lower bounding problem. One of the variables participating in the term with the largest measure $\mu_j$ is selected for branching.
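As an illustration of (15), consider a bilinear term $x_1 x_2$. The sketch assumes that its convex relaxation consists of the two McCormick underestimators; the article's own inequalities (3) are not reproduced in this section. It also assumes that, when $w^*$ is not supplied, it equals the larger of the two underestimators at $x^*$. The function name is hypothetical.

```python
from typing import Optional


def bilinear_term_measure(
    x1: float, x2: float,
    x1_lo: float, x1_up: float,
    x2_lo: float, x2_up: float,
    w: Optional[float] = None,
) -> float:
    """Term measure mu = t(x*) - t_breve(x*, w*) for the bilinear term t = x1 * x2.

    (x1, x2) is the solution x* of the convex lower bounding problem and w the
    value w* of the new variable. If w is omitted it is taken as the larger of
    the two McCormick underestimators at (x1, x2), an assumption of this sketch.
    """
    if w is None:
        w = max(
            x1_lo * x2 + x2_lo * x1 - x1_lo * x2_lo,
            x1_up * x2 + x2_up * x1 - x1_up * x2_up,
        )
    return x1 * x2 - w


# At the center of [0, 2] x [0, 2] the relaxation is loosest; at a corner it is exact.
print(bilinear_term_measure(1.0, 1.0, 0.0, 2.0, 0.0, 2.0))
print(bilinear_term_measure(2.0, 2.0, 0.0, 2.0, 0.0, 2.0))
```

The measure vanishes wherever the relaxation is exact, so terms whose solutions lie in the interior of their bounding box tend to be selected.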


Variable Measure 


A third strategy is based on a variable measure $\mu_i^v$ which is computed from the term measures $\mu_j$. For variable $x_i$, this measure is

$$\mu_i^v = \sum_{j \in T_i} \mu_j, \qquad (16)$$

where $T_i$ is the set of terms in which $x_i$ participates. The variable with the largest measure $\mu_i^v$ is branched on.
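Given the term measures, the variable measure (16) is a sum over the terms in which each variable appears. A minimal sketch, assuming each term is described by the names of its participating variables and its measure $\mu_j$; the function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def select_branching_variable(
    terms: Iterable[Tuple[Iterable[str], float]],
) -> Tuple[str, Dict[str, float]]:
    """Variable measure mu_i^v = sum of mu_j over the terms j in which x_i appears.

    terms: (variables participating in term j, term measure mu_j) pairs.
    Returns the variable with the largest measure together with all measures.
    """
    measures: Dict[str, float] = defaultdict(float)
    for variables, mu in terms:
        for v in set(variables):  # a variable is counted once per term
            measures[v] += mu
    if not measures:
        raise ValueError("no terms given")
    best = max(measures, key=measures.__getitem__)
    return best, dict(measures)


# x2 appears in both terms, so it accumulates the largest measure.
print(select_branching_variable([(["x1", "x2"], 1.0), (["x2", "x3"], 0.5)]))
```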


Variable Bound Updates 


The effect of the variable bounds on the convexifica- 
tion/relaxation procedure motivates the tightening of 
the variable bounds. However, the trade-off between 
tight underestimators generated at a large computa- 
tional cost and looser underestimators obtained more 
rapidly must be taken into account when designing 
a variable bound update strategy. For this reason, one 
of several approaches can be adopted, depending on the 
degree of nonconvexity of the problem [1,3]: 

• variable bound updates
  - at the beginning of the algorithmic procedure only; or
  - at each iteration;
• bound updates
  - for all variables in the problem; or
  - only for those variables that most affect the quality of the lower bounds, as measured by the variable measure $\mu^v$.

Two different techniques can be used to tighten the 
variable bounds. The first is based on the generation 
and solution of a series of convex optimization prob- 
lems while the second is an iterative procedure relying 
on the interval evaluation of the functions in the non- 
convex NLP. 


Optimization-Based Approach 


In the optimization approach, a new lower or upper bound for variable $x_i$ is obtained by solving the convex problem

$$\begin{aligned}
\min_{x,w}\ \text{or}\ \max_{x,w}\ & x_i\\
\text{s.t.}\ & \breve{f}(x, w) \le f^*\\
& \breve{g}(x, w) \le 0\\
& \breve{h}_n^{+}(x, w) \le 0\\
& \breve{h}_n^{-}(x, w) \le 0\\
& \tilde{h}_l(x, w) = 0 \qquad (17)\\
& n(x, w) \le 0\\
& x \in [x^L, x^U]\\
& w \in [w^L, w^U],
\end{aligned}$$

where $\breve{p}(x, w)$ denotes the convex underestimator of function $p(x)$ as defined in (11), $f^*$ denotes the current best upper bound on the global optimum solution, $h_l(x)$ denotes the set of equality constraints which involve only linear, bilinear, trilinear, fractional and fractional trilinear terms (with $\tilde{h}_l(x, w)$ their reformulation as in (13)), $h_n^{+}(x)$ denotes the set of equality constraints that involve other term types and $h_n^{-}(x)$ denotes the negative of that set, $n(x, w)$ denotes the set of additional constraints that arise from the underestimation of bilinear, trilinear, fractional and fractional trilinear terms, and $w$ is the corresponding set of new variables.


Interval-Based Approach 


In the interval-based approach, an iterative procedure is followed for each variable whose bounds are to be updated. The original functions in the problem are used without any transformations. An inequality constraint $g(x) \le 0$ is infeasible in the domain $[x^L, x^U]$ if its range $[g^L, g^U]$, computed so that $g(x) \in [g^L, g^U]\ \forall x \in [x^L, x^U]$, is such that $g^L > 0$. Similarly, an equality constraint $h(x) = 0$ is infeasible in this domain if its range $[h^L, h^U]$, computed so that $h(x) \in [h^L, h^U]\ \forall x \in [x^L, x^U]$, is such that $0 \notin [h^L, h^U]$. The variable bounds are updated based on the feasibility of the constraints in the original problem and the additional constraint that the objective function should be less than or equal to the current best upper bound $f^*$. The feasible region is therefore defined as

$$F = \{x : g(x) \le 0,\ h(x) = 0,\ f(x) \le f^*\}.$$

The lower (upper) bound on variable $x_i \in [x_i^L, x_i^U]$ is updated as follows:


PROCEDURE interval-based bound update()
  Set initial bounds L = x_i^L and U = x_i^U;
  Set iteration counter k = 0;
  Set maximum number of iterations K;
  DO k < K
    Compute midpoint M = (U + L)/2;
    Set left region {x ∈ F : x_i ∈ [L, M]};
    Set right region {x ∈ F : x_i ∈ [M, U]};
    Test interval feasibility of left (right) region;
    IF feasible,
      Set U = M (L = M);
    ELSE,
      Test interval feasibility of right (left) region;
      IF feasible,
        Set L = M (U = M);
      ELSE,
        Set L = U (U = L);
        Set U = x_i^U (L = x_i^L);
        IF k = 0 and L = x_i^U (U = x_i^L),
          RETURN (infeasible node);
    Set k = k + 1;
  OD;
  RETURN (L (U));
END interval-based bound update;


Interval-based bound update procedure
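The lower-bound branch of the procedure can be sketched as a bisection loop. The sketch assumes a caller-supplied predicate `may_be_feasible(a, b)` that returns False only when interval evaluation of the constraints defining F proves that no point with $x_i \in [a, b]$ is feasible. The predicate and function names are hypothetical. As a small generalization of the $k = 0$ test, the node is declared infeasible whenever both halves are excluded and the search window already extends to $x_i^U$.

```python
from typing import Callable, Optional


def update_lower_bound(
    x_lo: float,
    x_hi: float,
    may_be_feasible: Callable[[float, float], bool],
    max_iter: int = 20,
) -> Optional[float]:
    """Bisection-based lower-bound update for one variable x_i.

    may_be_feasible(a, b) must return False only if interval arithmetic proves
    that no point of F has x_i in [a, b]. Returns the new lower bound, or None
    if the node is proven infeasible.
    """
    lo, hi = x_lo, x_hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if may_be_feasible(lo, mid):    # left half not excluded: search it
            hi = mid
        elif may_be_feasible(mid, hi):  # left half excluded: bound is >= mid
            lo = mid
        else:                           # the whole window [lo, hi] is excluded
            if hi == x_hi:              # nothing remains above it
                return None
            lo, hi = hi, x_hi           # restart the search above the window
    return lo


# Constraint g(x) = 2 - x <= 0: its interval range over [a, b] is [2 - b, 2 - a],
# so the region is infeasible exactly when 2 - b > 0.
print(update_lower_bound(0.0, 4.0, lambda a, b: b >= 2.0))
```

The upper-bound update is symmetric, with the roles of the left and right halves exchanged.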


In general, the interval-based bound update 
strategy is less computationally expensive than the 
optimization-based approach. However, at the begin- 
ning of the branch and bound search, when the bound 
updates are most critical and the variable ranges are 
widest, the overestimations inherent in interval com- 
putations often lead to looser updated bounds in the 
interval-based approach than in the optimization-based 
technique. 


Algorithmic Procedure 


Based on the developments presented in previous sections, the procedure for the αBB algorithm can be summarized by the following pseudocode:

PROCEDURE αBB algorithm()
  Decompose functions in problem;
  Set tolerance ε;
  Set f_* = f_*^0 = -∞ and f^* = f^{*,0} = +∞;
  Initialize list of lower bounds {f_*^0};
  DO f^* - f_* > ε
    Select node k with smallest lower bound, f_*^k, from list of lower bounds;
    Set f_* = f_*^k;
    (Optional) Update variable bounds for current node using optimization or interval approach;
    Select branching variable;
    Partition to create new nodes;
    DO for each new node i
      Generate convex lower bounding NLP
        Introduce new variables, constraints;
        Linearize univariate concave terms;
        Compute interval Hessian matrices;
        Compute α values;
      Find solution f_*^i of convex lower bounding NLP;
      IF infeasible or f_*^i > f^*
        Fathom node;
      ELSE
        Add f_*^i to list of lower bounds;
        Find a solution f^{*,i} of nonconvex NLP;
        IF f^{*,i} < f^*
          Set f^* = f^{*,i};
    OD;
  OD;
  RETURN(f^* and variable values at corresponding node);
END αBB algorithm;


A pseudocode for the αBB algorithm
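To make the loop concrete, here is a toy one-variable branch and bound in the spirit of the pseudocode. It is a sketch under strong simplifying assumptions, not the αBB implementation. The problem has no constraints, and a valid α (one with $f'' + 2\alpha \ge 0$ on the domain) is supplied by the caller instead of being computed from an interval Hessian. The convex underestimator $f(x) - \alpha (x^U - x)(x - x^L)$ is minimized by ternary search, which is not rigorous in floating point. The function name and parameters are hypothetical.

```python
import heapq
import math
from typing import Callable, Tuple


def alpha_bb_1d(
    f: Callable[[float], float],
    alpha: float,
    x_lo: float,
    x_hi: float,
    eps: float = 1e-6,
    max_nodes: int = 10_000,
) -> Tuple[float, float]:
    """Toy alphaBB-style branch and bound on [x_lo, x_hi] (illustration only)."""

    def lower_bound(lo: float, hi: float) -> Tuple[float, float]:
        def under(x: float) -> float:  # convex if f'' + 2 * alpha >= 0
            return f(x) - alpha * (hi - x) * (x - lo)

        a, b = lo, hi
        for _ in range(100):  # ternary search on a convex function
            m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
            if under(m1) <= under(m2):
                b = m2
            else:
                a = m1
        x = 0.5 * (a + b)
        return x, under(x)

    x, lb = lower_bound(x_lo, x_hi)
    best_x, best_f = x, f(x)             # incumbent upper bound from the root node
    heap = [(lb, x_lo, x_hi)]            # open nodes ordered by lower bound
    for _ in range(max_nodes):
        if not heap:
            break
        lb, lo, hi = heapq.heappop(heap)  # node with the smallest lower bound
        if best_f - lb <= eps:            # gap closed: stop
            break
        mid = 0.5 * (lo + hi)             # bisect the only variable
        for a, b in ((lo, mid), (mid, hi)):
            x, lb_child = lower_bound(a, b)
            fx = f(x)                     # any point of the node gives an upper bound
            if fx < best_f:
                best_x, best_f = x, fx
            if lb_child < best_f - eps:   # otherwise the node is fathomed
                heapq.heappush(heap, (lb_child, a, b))
    return best_x, best_f


# sin'' = -sin >= -1 on [0, 2*pi], so alpha = 0.5 gives a convex underestimator.
print(alpha_bb_1d(math.sin, 0.5, 0.0, 2.0 * math.pi))
```

The maximum gap between $f$ and its underestimator on a node of width $w$ is $\alpha w^2/4$, so bisection closes the gap quadratically fast around the global minimum at $x = 3\pi/2$.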


Computational Experience 


Significant computational experience with the αBB algorithm has been acquired through the solution of a wide variety of problems involving different types of nonconvexities and up to 16000 variables [1,2,3,4,6,9,12]. These include problems such as pooling/blending, design of reactor networks, design of batch plants under uncertainty [9], stability studies belonging to the class of generalized geometric programming problems, characterization of phase equilibrium using activity coefficient models, identification of stable molecular conformations and the determination of all solutions of systems of nonlinear equations.

In order to illustrate the performance of the algo- 
rithm and the importance of variable bound updates, 
a medium-size example is presented. The objective is to 
maximize the profit for the simplified alkylation process 
presented in [7] and shown in Fig. 1. 

An olefin feed (100% butene), a pure isobutane recycle and a 100% isobutane make-up stream are introduced in a reactor together with an acid catalyst. The reactor product stream is then passed through a fractionator where the isobutane and the alkylate product are separated. The spent acid is also removed from the reactor. The formulation used here includes 7 variables and 16 constraints, 12 of which are nonlinear. The variables are defined as follows: $x_1$ is the olefin feed rate in barrels per day; $x_2$ is the acid addition rate in thousands of pounds per day; $x_3$ is the alkylate yield in barrels per day; $x_4$ is the acid strength (weight percent); $x_5$ is the motor octane number; $x_6$ is the external isobutane-to-olefin ratio; $x_7$ is the F-4 performance number. The profit maximization problem is then expressed as:


$$\text{Profit} = -\min\ (1.715 x_1 + 0.035 x_1 x_6 + 4.0565 x_3 + 10.0 x_2 - 0.063 x_3 x_5)$$

subject to:

$$\begin{aligned}
0.0059553571 x_6^2 x_1 + 0.88392857 x_3 - 0.1175625 x_6 x_1 - x_1 &\le 0,\\
1.1088 x_1 + 0.1303533 x_1 x_6 - 0.0066033 x_1 x_6^2 - x_3 &\le 0,\\
6.66173269 x_6^2 + 172.39878 x_5 - 56.596669 x_4 - 191.20592 x_6 &\le 10000,\\
1.08702 x_6 + 0.32175 x_4 - 0.03762 x_6^2 - x_5 &\le -56.85075,\\
0.006198 x_7 x_4 x_3 + 2462.3121 x_2 - 25.125634 x_2 x_4 - x_3 x_4 &\le 0,\\
161.18996 x_3 x_4 + 5000.0 x_2 x_4 - 489510.0 x_2 - x_3 x_4 x_7 &\le 0,\\
0.33 x_7 - x_5 + 44.333333 &\le 0,\\
0.022556 x_5 - 0.007595 x_7 &\le 1,\\
0.00061 x_3 - 0.0005 x_1 &\le 1,\\
0.819672 x_1 - x_3 + 0.819672 &\le 0,\\
24500.0 x_2 - 250.0 x_2 x_4 - x_3 x_4 &\le 0,\\
1020.4082 x_4 x_2 + 1.2244898 x_3 x_4 - 100000 x_2 &\le 0,\\
6.25 x_1 x_6 + 6.25 x_1 - 7.625 x_3 &\le 100000,\\
1.22 x_3 - x_6 x_1 - x_1 + 1 &\le 0,
\end{aligned}$$

$$1500 \le x_1 \le 2000, \quad 1 \le x_2 \le 120, \quad 3000 \le x_3 \le 3500, \quad 85 \le x_4 \le 93,$$
$$90 \le x_5 \le 95, \quad 3 \le x_6 \le 12, \quad 145 \le x_7 \le 162.$$


The maximum profit is \$1772.77 per day, and the optimal variable values are $x_1^* = 1698.18$, $x_2^* = 53.66$, $x_3^* = 3031.30$, $x_4^* = 90.11$, $x_5^* = 95.00$, $x_6^* = 10.50$, $x_7^* = 153.53$. In this example, variable bound tightening is performed using the optimization-based approach. An update of all the variable bounds therefore involves the solution of 14 convex NLPs (a lower and an upper bound for each of the 7 variables). The computational cost is significant and may not always be justified by the corresponding decrease in the number of iterations. Two extreme tightening strategies were used to illustrate this trade-off: an update of all variable bounds at the onset of the algorithm only (‘Single Up’), or an update of all bounds at each iteration of the αBB algorithm (‘One Up/Iter’). An intermediate strategy might involve bound updates for only those variables that affect the underestimators most significantly, or bound updates at only a few levels of the branch and bound tree. The results of runs performed on an HP9000/730 are summarized in the table below. $t_U$ denotes the percentage of CPU time devoted to the construction of the convex underestimating problem.
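The reported profit can be checked by evaluating the objective at the rounded optimal point. The sketch below is a plain arithmetic check with a hypothetical function name; it uses only the coefficients of the profit expression and does not verify the constraints.

```python
from typing import Sequence


def alkylation_profit(x: Sequence[float]) -> float:
    """Profit, i.e. the negative of the minimized cost expression, of the alkylation example."""
    x1, x2, x3, _x4, x5, x6, _x7 = x  # x4 and x7 do not enter the objective
    cost = 1.715 * x1 + 0.035 * x1 * x6 + 4.0565 * x3 + 10.0 * x2 - 0.063 * x3 * x5
    return -cost


x_star = (1698.18, 53.66, 3031.30, 90.11, 95.00, 10.50, 153.53)
print(round(alkylation_profit(x_star), 2))
```

The result, about 1772.8, agrees with the reported maximum profit to within the rounding of the variable values to two decimals.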

Although the approach relying most heavily on variable bound updates results in tighter underestimators, and hence a smaller number of iterations, the time requirements for each iteration are significantly larger than when fewer bound updates are performed. Thus, the overall CPU requirements often increase when all variable bounds are updated at each iteration.


                   Single Up                      One Up/Iter
Meth.     Iter.   CPU (sec.)   t_U (%)    Iter.   CPU (sec.)   t_U (%)
I.1         74       37.5        0.5        31       41.6        0.0
I.2a        61       30.6        1.6         ?       37.2          ?
I.2b        61       29.2        1.0        25       35.4        0.1
I.3         69       32.8        1.9         ?          ?        0.2
I.4         61       31.6        1.4         ?          ?          ?
I.5         61          ?          ?         ?          ?        1.7
I.6         59       32.9        1.4         ?          ?          ?
II.1a       56       24.9        0.3        30       36.5        0.3
II.1b       38          ?          ?        17          ?        0.5
II.2        62       32.7        0.6        25       34.5        0.3
II.3        54       21.8       16.7        23       30.4        5.0


Alkylation process design results 


In order to determine the best technique for the construction of convex underestimators, the percentage of computational effort dedicated to this purpose, $t_U$, is tracked. As can be seen in the above table, the generation of the convex lower bounding problem does not consume a large share of the computational cost, regardless of the method. It is, however, significantly larger for Methods I.5 and II.3, as they require the solution of a polynomial and a semidefinite programming problem, respectively. $t_U$ decreases when bound updates are performed at each iteration, as a large amount of time is spent solving the bound update problems. In this example, the scaled Gershgorin approach with $d_i = x_i^U - x_i^L$ (Method II.1b) gives the best results both in terms of number of iterations and CPU time.


Conclusions 


The αBB algorithm is guaranteed to identify the global
optimum solution of problems belonging to the broad 
class of twice continuously differentiable NLPs. It is 
a branch and bound approach based on a rigorous con- 
vex relaxation strategy, which involves the decomposi- 
tion of the functions into a sum of terms with special 
mathematical structure and the construction of differ- 
ent convex underestimators for each class of term. In 
particular, the treatment of general nonconvex terms 
requires the analysis of their Hessian matrix through 
interval arithmetic. Efficient branching and variable 
bound update strategies can be used to enhance the per- 
formance of the algorithm. 
Alternative set theory (AST) was created by P. Vopěnka and has been developed, together with his colleagues at Charles University, since the 1970s. In agreement with Husserl’s phenomenology, he based his theory on the natural world and the human view thereof.

The most important feature of any set theory is the way it treats infinity. A different approach to infinity forms the key difference between AST and the classical set theories based on Cantor’s set theory (CST). Cantor’s approach led to the creation of a rigid, abstract world with an enormous scale of infinite cardinalities, while Vopěnka’s infinity, based on the notion of horizon, is more natural and acceptable.

Another source of inspiration was the nonstandard models of Peano arithmetic with infinitely large (nonstandard) numbers. The way to build them in AST is easy and natural.

The basic references are [9,10,11]. 


Classes, Sets and Semisets 


AST, as well as CST, builds on the notions of ‘set’, ‘class’ and ‘element of a set’ and, in addition, introduces the notion of ‘semiset’. A class is the most general notion used for any collection of distinct objects. Sets are classes that are so clearly defined and clear-cut that their elements could, if necessary, be included in a list. Semisets are classes which are not sets, because their borders are vague; however, they are parts of sets. For example, all living people in the world form a class: some are being born, some are dying, and we do not know where all of them are. The citizens of Prague registered in the register at a given moment form a set. However, all the beautiful women in Prague or the brave men in Prague form a semiset, since it is not clear who belongs to this collection and who does not.

In the real world, we may find many other semisets. Almost every property defines a semiset of objects, e.g., people who are big, happy or sick. Many properties are naturally connected with vagueness. Also, what we see and perceive can be vague and limited by a horizon. Objects described in this way may form a semiset, e.g., the flowers I can see in a blooming meadow, all my friends, the sounds I can hear.


Infinity 


This interpretation differs from the usual one and corresponds more closely to the etymological origin of the word infinity. We call finite those classes any part of which is surveyable and forms a set. Any finite class is a set.

$$\mathrm{Fin}(X) \Leftrightarrow (\forall Y)(Y \subseteq X \Rightarrow \mathrm{Set}(Y)).$$


On the other hand, infinite classes include ungrasped parts, semisets. This phenomenon may also occur when we watch large sets that cannot be captured clearly as a whole.

There are two different forms of infinity tradition- 
ally called denumerability and continuum. 

A countable (denumerable) class, in a way, represents a road towards the horizon. Its beginning is clear and definite, but it becomes less and less clear, and its end is lost in vagueness. A countable class is defined as an infinite class with a linear ordering such that each initial part (segment) is finite. Examples are a railway track with cross-ties leading straight to the horizon, the days of our life that we are yet to live, or the ever smaller reflections in two mirrors facing each other. The most important example is the class of natural numbers, which will be discussed later.

The phenomenon of denumerability corresponds to a road towards the horizon. Even when we reach the last point we can see, we can still go a bit further; the road does not disappear immediately. People have always tried to look a bit beyond the horizon, to gain understanding and to overcome it in their thinking. This experience is expressed here by the important axiom of prolongation (see Axiom A6).

The other type of infinity, continuum, is based on the following experience. We watch an object but are not able to distinguish the individual elements which form it, since they lie beyond the horizon of our perception. Examples are the class of all geometric points in the plane, the class of all atoms forming a table, or the grains of sand which together form a heap.

In fact, classical infinite mathematics, when applied to the real world, is applied solely to the above two types of infinity.

The intention of AST is to build on the natural world and human intuition. There is no reason for other types of infinity, which are enforced in CST by its assumptions that the natural numbers form a set and that a power set is a set. That is why there are only two infinite cardinalities in AST: denumerability and continuum (see Axiom A8).

All examples from the mathematical and real worlds are intentionally set out here together. They are meant as inspiration, showing where the idea of infinity comes from, and they should be kept in mind when one deals with infinity.

The mathematical world is an ideal one; it is a perfect world of objective truths abstracted from all that is external. There is little room in it for subjectivity of perception. That is why not all semisets from the real world may be interpreted directly.

The axiomatic system below describes that part of AST which can be expressed in a strictly formal way. This basis provides room for extending AST by semisets which are parts of big but classically finite sets, and thus makes many applications possible.


Axiomatic System of AST 


[3] The language of AST uses the symbols $\in$ and $=$, the symbols $X, Y, Z, \ldots$ for class variables and the symbols $x, y, z, \ldots$ for set variables. Sets are created by iteration from the empty set by Axiom A3. Classes are defined by formulas by Axiom A2. Every set is a class. Formally, a set is a class that is a member of another class:

$$\mathrm{Set}(X) \Leftrightarrow (\exists Y)(X \in Y).$$


AST is a theory with the following axioms:

• A1 (extensionality). $(X = Y) \Leftrightarrow (\forall Z)(Z \in X \Leftrightarrow Z \in Y)$;
• A2 (existence of classes). If $\varphi$ is a formula, then $(\exists Y)(\forall x)(x \in Y \Leftrightarrow \varphi(x, X_1, \ldots, X_n))$;
• A3 (existence of sets). $\mathrm{Set}(\emptyset) \wedge (\forall x, y)\,\mathrm{Set}(x \cup \{y\})$.

A set-formula is a formula in which only set variables and constants occur.

• A4 (induction). If $\varphi$ is a set-formula, then $\bigl(\varphi(\emptyset) \wedge (\forall x, y)(\varphi(x) \Rightarrow \varphi(x \cup \{y\}))\bigr) \Rightarrow (\forall x)\varphi(x)$.
• A5 (regularity). If $\varphi$ is a set-formula, then $(\exists x)\varphi(x) \Rightarrow (\exists x)(\varphi(x) \wedge (\forall y \in x)\neg\varphi(y))$.
As usual, the class of natural numbers $N$ is defined in the von Neumann way:

$$N = \{x : (\forall y \in x)(y \subseteq x) \wedge (\forall y, z \in x)(y \in z \vee y = z \vee z \in y)\}.$$

The class of finite natural numbers ($FN$) consists of the numbers represented by a finite set. They are accessible, easy to survey and lie before the horizon:

$$FN = \{x \in N : \mathrm{Fin}(x)\}.$$

$FN$ forms a countable class in the sense described above. The class $FN$ corresponds to the classical natural numbers and the class $N$ to their nonstandard model. Both $N$ and $FN$ satisfy the axioms of Peano arithmetic. Two classes $X, Y$ are equivalent, written $X \approx Y$, if there is a one-to-one mapping of $X$ onto $Y$.

• A6 (prolongation). Every countable function can be prolonged to a function which is a set, i.e. $(\forall F)\bigl((\mathrm{Fnc}(F) \wedge F \approx FN) \Rightarrow (\exists f)(\mathrm{Fnc}(f) \wedge F \subseteq f)\bigr)$.

An easy corollary is that a countable class is a semiset. Also, $FN$ is a semiset, and it can be prolonged to a set which is an element of $N$ and which is greater than all finite natural numbers, so it represents an infinitely large natural number. Consequently, the class $N$ is not countable.

The universal class $V$ includes all sets created by iteration from the empty set.

• A7 (choice). The universal class $V$ can be well ordered.
• A8 (two cardinalities). Every two infinite classes that are not countable are equivalent.

Thus, any infinite class is equivalent either to $FN$ or to $N$. Using ultrapowers, the relative consistency of AST can be proved.


Rational and Real Numbers 


Rational numbers $Q$ are constructed in the usual way from $N$ as the quotient field of the class $N \cup \{-n : n \in N\}$. Because $N$ includes infinitely large numbers, $Q$ includes infinitely small numbers.

Finite rational numbers $FQ$ are similarly constructed from the finite natural numbers $FN$. They include quantities that are before the horizon with respect to both distance and depth. Clearly $FQ \subseteq Q$.

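We define that $x, y \in Q$ are infinitely near by

$$x \doteq y \Leftrightarrow (\forall n \in FN \setminus \{0\})\left(|x - y| < \tfrac{1}{n} \vee (x > n \wedge y > n) \vee (x < -n \wedge y < -n)\right).$$

This relation is an equivalence. The corresponding partition classes are called monads. For $x \in Q$,

$$\mathrm{Mon}(x) = \{y : y \doteq x\}.$$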


Rational numbers $x$ that are elements of $\mathrm{Mon}(0)$, i.e. $x \doteq 0$, are infinitely small. All monads are of the same nature except for the two limit ones, which consist of the infinitely large positive and the infinitely large negative numbers, respectively. The class of bounded rational numbers is

$$BQ = \{x \in Q : (\exists n)(n \in FN \wedge |x| < n)\}.$$

Now, it is easy and natural to construct real numbers:

$$R = \{\mathrm{Mon}(x) : x \in BQ\}.$$


Real numbers built in this way display the same charac- 
teristics as real numbers in CST. 

This motivation for expressing real numbers as 
monads of rational numbers corresponds rather to et- 
ymology than to the traditional interpretation. Ratio- 
nal numbers are constructed by reason, perfectly exact; 
their existence is purely abstract. On the other hand, 
real numbers are more similar to those that are used 
in the real world. If we say: one eighth of a cake, we 
surely do not expect it to be the ideal eighth, it is rather 
a portion which differs from the ideal one by a differ- 
ence which is beyond the horizon of our perception. 
A similar situation occurs in the case of a pint of milk 
or twenty miles. 
Infinitesimal Calculus


[12] Infinitesimal calculus in AST is based on the same point of view and intuition as that of its founders, I. Newton and G.W. Leibniz. This is because infinitely small, or infinitesimal, quantities are naturally available in AST. For example, the limit of a function and its continuity at $a \in Q$ are defined, respectively, by:

$$\lim_{x \to a} f(x) = b \Leftrightarrow (\forall x)\bigl((x \doteq a \wedge x \neq a) \Rightarrow f(x) \doteq b\bigr);$$
$$(\forall x)\bigl(x \doteq a \Rightarrow f(x) \doteq f(a)\bigr).$$


This topic is discussed in detail in [9]. These definitions have also been used successfully in teaching students.


Topology 


Classes described by arbitrary formulas can be complex and difficult to capture. The easiest to describe are sets; classes described by set-formulas, the so-called set-definable classes (Sd-classes), can also be described well. Semisets which are defined by a positive property (big, blue or happy, and also distinguishable or being a finite natural number) can be described as countable unions of Sd-classes, the so-called σ-classes. On the other hand, classes whose definition is based on negation (not big, not happy, indiscernible) are the so-called π-classes: countable intersections of Sd-classes. A class which is at the same time a π-class and a σ-class is an Sd-class. Using combinations of π and σ, a set hierarchy can be described.

One of the most important tasks of mathematics is to handle the notion of the continuum. AST is based on the assumption that this phenomenon is caused by the indiscernibility of the elements of the observed class. That is why, for the study of topology, the basic notion is a certain relation of indiscernibility ($\doteq$). Two elements are indiscernible if, when they are observed, the available criteria that might distinguish them fail. This is a negative feature; therefore the relation must be a π-class. The relation of indiscernibility is naturally reflexive and symmetric. In pure mathematics, it is in addition transitive (because $FN$ is closed under addition); thus it is an equivalence. This relation must also be compact, i.e. for each infinite set $u \subseteq \mathrm{dom}(\doteq)$ there are $x, y \in u$ such that $x \neq y \wedge x \doteq y$. The corresponding topological space is a compact metric space.

The relation of infinite nearness in rational numbers 
represents a special case of equivalence of indiscernibil- 
ity. 

Monads and figures correspond to the phenomena of points and shapes, respectively:

$$\mathrm{Mon}(x) = \{y : y \doteq x\}, \qquad \mathrm{Fig}(X) = \{y : (\exists x \in X)(y \doteq x)\}.$$


Basic Definitions 


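Two classes X, Y are separable, Sep(X, Y), if

$$(\exists Z)\big(\mathrm{Sd}(Z) \wedge \mathrm{Fig}(X) \subseteq Z \wedge \mathrm{Fig}(Y) \cap Z = \emptyset\big).$$

The closure $\overline{X}$ of a class X is defined as $\overline{X} = \{x : \neg\mathrm{Sep}(\{x\}, X)\}$.

A class X is closed if $X = \overline{X}$.

A set u is connected if $(\forall w)\big(\emptyset \neq w \subsetneq u \Rightarrow \mathrm{Fig}(w) \cap (u - w) \neq \emptyset\big)$.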

It is quite easy to prove the basic topological theorems, and proofs of some classical theorems are much simpler here. One example is the Sierpiński theorem: if v is a connected set, then Fig(v) cannot be expressed as a countable union of disjoint closed sets.

The fundamental indiscernibility $\doteq_c$ is defined as follows. If c is a set, then $x \doteq_c y$ if $\varphi(x) \iff \varphi(y)$ for every set-formula φ with constants from c.

This relation has a special position: for any relation of indiscernibility $\doteq$ there is a set c such that $\doteq_c$ is finer than $\doteq$, i.e. $\doteq_c \subseteq \doteq$.


Motion 


Unlike in classical mathematics, motion is captured in AST by means of a relation of indiscernibility $\doteq$.

Everybody knows how films work. Successive pictures are almost indiscernible from each other, but when they are shown in rapid sequence they start to move. Continuous motion may be viewed in the same way, as a sequence of indiscernible stages in certain time intervals.

A function d is a motion of a point in the time δ ∈ N if $\mathrm{dom}(d) = \delta \wedge (\forall \alpha < \delta)\big(d(\alpha) \doteq d(\alpha+1)\big)$.

If δ ∈ FN, the point does not move: by transitivity of $\doteq$, finitely many indiscernible steps only produce states indiscernible from the initial one. A point can move only in an infinitely large time interval.

A sequence $\{d(\alpha) : \alpha \in \mathrm{dom}(d)\}$ is a sequence of states. The number δ = dom(d) is the number of moments, and rng(d) is the trace of the moving point.

A trace is a connected set, and for each nonempty connected set u there is a motion d of a point such that u is the trace of d.

A motion of a set is defined similarly. Only the last condition differs: $(\forall \alpha < \delta)\big(\mathrm{Fig}(d(\alpha)) = \mathrm{Fig}(d(\alpha+1))\big)$.

The following theorem is proved in [10,11]: each motion of a set may be divided into motions of points. This covers not only mechanical motion but any motion describing a continuous change. For example, even the growth of a tree from a planted seed may be divided into movements of individual points, all of whose initial stages are already contained in the seed. In addition, it is possible to describe conditions under which such a change is still continuous.


Utility Theory 


Utility theory [7] is a nice example of an application of AST. Its aim is to find a valuation of the elements of a class S. There is a preference relation ≻ on linear combinations of elements of S with finite rational coefficients, i.e. on the class

$$\Big\{\sum_{i=1}^{n} a_i u_i : n \in FN \wedge (\forall i \le n)(u_i \in S \wedge a_i \in FQ) \wedge \sum_{i=1}^{n} a_i = 1\Big\}.$$

A combination is interpreted as a game in which each $u_i$ can be won with probability $a_i$. The preference relation ≻ declares which of two games is preferred.

A valuation is a function F from the class S to Q for which

$$\sum_{i=1}^{n} a_i u_i \succ \sum_{j=1}^{m} b_j u_j \iff \sum_{i=1}^{n} a_i F(u_i) > \sum_{j=1}^{m} b_j F(u_j).$$


It is not necessary to require the so-called Archimedean property of the preference relation, because infinitely small and infinitely large rational numbers can be used. This makes it possible to capture finer and more complex relations than in classical mathematics. One example is the fact that the value of one element is incomparably higher than that of another. Another is the comparison of infinitely small differences of values.
For each class S with a preference relation a valuation may be found. Such a valuation is not uniquely determined, and it can be constructed so that rng(F) ⊆ N.


Conclusion 


The aim of this short survey is to demonstrate the basic ideas of AST. Other areas of mathematics have also been studied within it. Examples are measurability [8], ultrafilters [6], endomorphic universes [5], automorphisms of natural numbers [2], representability [1], metamathematics [3] and models of AST [4].
To ensure a certain level of reliability of the solution of an extremum problem under uncertainty, it has become common practice to introduce probabilistic (chance) costs and/or constraints into the model. The stability analysis of chance constrained problems is rather complicated due to the complicated properties of the probability function $v_t(x)$, defined as


$$v_t(x) = P\{s : f(x, s) \le t\}. \qquad (1)$$


Here f(x, s) is a real-valued function defined on $\mathbb{R}^r \times \mathbb{R}^r$, t is a fixed level of reliability, s is a random vector and P denotes probability. The function $v_t(x)$ is in general not convex. Only in some special cases (e.g. f(x, s) linear in s and the random parameter s normally distributed) is it quasiconvex. Note that for a fixed x the function $v_t(x)$, as a function of t, is the distribution function of the random variable f(x, s).

The 'inverse' of the probability function $v_t(x)$ is the quantile function $w_\alpha(x)$. It is defined by fixing the probability level α, 0 < α < 1, in advance and minimizing the reliability level t:


$$w_\alpha(x) = \min\{t : P\{s : f(x, s) \le t\} \ge \alpha\}. \qquad (2)$$
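For a fixed x, both (1) and (2) can be estimated from a sample of the random parameter s. The sketch below is only an illustration. It assumes that independent samples of s are available and that f is given as a Python callable; the function names are hypothetical. It returns the empirical probability function and the smallest sample value of f(x, s) at which the empirical distribution reaches α.

```python
def probability_function(f, x, samples, t):
    """Empirical estimate of v_t(x) = P{s : f(x, s) <= t} from samples of s."""
    return sum(1 for s in samples if f(x, s) <= t) / len(samples)


def quantile_function(f, x, samples, alpha):
    """Empirical estimate of w_alpha(x) = min{t : P{s : f(x, s) <= t} >= alpha}.

    The empirical distribution function of f(x, s) is a step function that
    jumps only at the sample values, so the minimum is one of them.
    """
    values = sorted(f(x, s) for s in samples)
    n = len(values)
    for k, value in enumerate(values, start=1):
        if k / n >= alpha:  # empirical P{f(x, s) <= value} is at least k / n
            return value
    return values[-1]  # only reached if alpha > 1
```

For instance, with f(x, s) = x·s and s uniform on [0, 1], the two estimates approach t/x (for 0 ≤ t ≤ x) and αx as the sample grows.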


Varied examples of extremum problems with probability and quantile functions are presented in [7] and in [8]. Some of these models, e.g. the correction of a satellite orbit [8, Chap. 1.8], have such a complicated structure that we are forced to look for a solution x in a certain class of strategies. That is, the solution x itself depends on the random parameter s, x = x(s).

This class of probability functions was introduced into stochastic programming by E. Raik. Lower semicontinuity and continuity properties of $v_t(x)$ and $w_\alpha(x)$ in Lebesgue $L^p$-spaces, 1 < p < ∞, were studied in [12]. At the same time, problems with various classes of solutions x(s) (measurable, continuous, linear, etc.) were considered in [4]. Since the paper [4], solutions x(s) have been called decision rules, and we will follow this terminology.

Unlike [4], here we will consider the approximation of a decision rule x(s) by sequences of vectors $\{x_n\}$, $x_n = (x_{1n}, \dots, x_{nn})$, n = 1, 2, ..., of increasing dimension. The goal is to maximize the value of the probability functional $v_t(x)$ over a certain set C of decision rules. It will be assumed that the set C is bounded in the space $L^1(S, \Sigma, \sigma) = L^1(\sigma)$ of integrable functions x(s), $x \in L^1(\sigma)$:

$$\max_{x \in C} v_t(x) = \max_{x \in C} P\{s : f(x(s), s) \le t\}. \qquad (3)$$

Here S is the support of the random variable s with distribution (probability measure) σ(·), and Σ denotes the sigma-algebra of Borel measurable sets from $\mathbb{R}^r$.

For technical reasons we are forced to assume that the random parameter s has bounded support $S \subset \mathbb{R}^r$, diam S < ∞, and that its distribution σ is atomless:

$$\sigma\{s : |s - s_0| = \mathrm{const}\} = 0 \quad \forall s_0 \in \mathbb{R}^r. \qquad (4)$$


Since problem (3) is formulated in the function space $L^1(\sigma)$ of σ-integrable functions, the first step in its solution is approximation. In this step we replace the initial problem (3) by a sequence of finite-dimensional optimization problems of increasing dimension. The second step, solution methods, was considered in a series of papers by the author (see, e.g., [9]). There the gradient projection method was suggested together with a simultaneous Parzen-Rosenblatt kernel-type smooth approximation of the discontinuous integrand in (1).

There are several ways to divide the support S of the probability measure σ into smaller parts for discretization. One is to take disjoint subsets $S_j$, j = 1, ..., k, of S from the initial sigma-algebra Σ, as in [11]. Another is to use only convex sets from Σ in the partition of S, as in [5].

We will divide the support S into smaller parts using only sets $A_{in}$, i = 1, ..., n, n ∈ N = {1, 2, ...}, whose boundaries have σ-measure zero, i.e. $\sigma(\mathrm{int}\, A_{in}) = \sigma(A_{in}) = \sigma(\mathrm{cl}\, A_{in})$. Here int A and cl A denote the topological interior and closure of a set A, respectively. Such a division is equivalent to weak convergence of a sequence of discrete measures $\{(m_n, s_n)\}$ to the initial probability measure σ, see, e.g., [14]:


$$\sum_{i=1}^{n} h(s_{in})\, m_{in} \to \int_S h(s)\, \sigma(ds), \quad n \in N, \qquad (5)$$

for any function h continuous on S, h ∈ C(S).
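The left-hand side of (5) can be sketched for a concrete partition. This example assumes that σ is the uniform distribution on S = [0, 1]² and that the sets $A_{in}$ are the n = k² congruent squares of a grid, i.e. squares of decreasing diagonal, one of the partitions the article mentions. Every square has a σ-null boundary, so A7) holds. The choice of centres as $s_{in}$ and the function names are illustrative assumptions.

```python
from itertools import product


def grid_partition(k):
    """Split S = [0, 1]^2 into n = k * k congruent squares A_in.

    Returns the weights m_in = sigma(A_in) = 1 / n for the uniform measure
    sigma and the points s_in, taken as the centres of the squares.
    """
    h = 1.0 / k
    points = [((i + 0.5) * h, (j + 0.5) * h) for i, j in product(range(k), repeat=2)]
    weights = [h * h] * len(points)
    return weights, points


def discrete_integral(h_func, weights, points):
    """Left-hand side of (5): sum over i of h(s_in) * m_in."""
    return sum(m * h_func(s) for m, s in zip(weights, points))
```

As k grows, the diameters √2/k shrink (property A5)) and the sums approach $\int_S h\, d\sigma$. For h(s) = s_1² + s_2 the error of these midpoint sums is 1/(12k²).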

The use of weak convergence of discrete measures in stochastic programming has both disadvantages and advantages. An example in [13] shows that, in general, stability of a probability function with respect to weak convergence cannot be expected without additional smoothness assumptions on the measure σ. This is one of the reasons why we use only continuous measures with property (4). An advantage of weak convergence is that the approximation process can use the simpler grid point approximation scheme instead of conditional means [11].

Since the functional $v_t(x)$ is not convex, the stability analysis of the discrete approximation of problem (3) cannot use the more convenient weak topology, only the strong (norm) topology. As the first step, we approximate $v_t(x)$ so that the discrete analogue of continuous convergence of the sequence of approximate functionals is guaranteed.

Schemes of stability analysis (e.g. finite-dimensional approximations) of extremum problems in Banach spaces require a certain kind of compactness of the sequence of solutions of the 'approximate' problems. We assume that the constraint set C is compact in $L^1(\sigma)$. As the second step, we approximate C by a sequence of finite-dimensional sets $\{C_n\}$ of increasing dimension, chosen so that the sequence of solutions of the approximate problems is compact in a certain (discrete convergence) sense in $L^1(\sigma)$. The discrete approximation of (3) then follows the standard schemes of approximation of extremum problems in Banach spaces, see e.g. [2,3,15].


Redefine the functional $v_t(x)$ using the Heaviside zero-one function χ:

$$v_t(x) = \int_S \chi\big(t - f(x(s), s)\big)\, \sigma(ds), \qquad (6)$$

where

$$\chi\big(t - f(x(s), s)\big) = \begin{cases} 1 & \text{if } f(x(s), s) \le t,\\ 0 & \text{if } f(x(s), s) > t. \end{cases}$$

Since the integrand χ(·) is a zero-one function and therefore discontinuous, we assume that f(x, s) is continuous in (x, s) and satisfies the following growth and 'platform' conditions:

$$|f(x, s)| \le a(s) + \alpha |x|, \quad a \in L^1(\sigma), \ \alpha > 0, \qquad (7)$$
$$\sigma\{s : f(x, s) = \mathrm{const}\} = 0 \quad \forall x \in \mathbb{R}^r. \qquad (8)$$

The continuity assumption is technical and simplifies the description of the approximation scheme below. The growth condition (7) is essential. Without it the superposition operator $f(x) = f(x(s), s)$ does not, in general, map an element of $L^1$ into $L^1$, and may not even be defined. Condition (8) means that f(x, s) should not have horizontal platforms of positive measure.

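The constraint set C is assumed to be a set of integrable functions x(s), $x \in L^1(\sigma)$, with the properties

$$\int_S |x(s)|\, \sigma(ds) \le M < \infty \quad \forall x \in C \qquad (9)$$

for some M > 0 (C is bounded in $L^1(\sigma)$);

$$\int_D |x(s)|\, \sigma(ds) \le K \sigma(D) \quad \forall x \in C, \ D \in \Sigma \qquad (10)$$

for some K > 0;

$$\big(x(s) - x(t), s - t\big) \ge 0 \quad \text{for a.a. } s, t \in S \qquad (11)$$

(the functions x ∈ C are monotone almost everywhere; a.a. abbreviates 'almost all').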

Conditions (9) and (10) guarantee that the set C is weakly compact, i.e. compact in the $(L^1, L^\infty)$-topology (see, e.g., [6, Chap. 9.1.2]). Following [1, Lemma 3], condition (11) then guarantees that C is strongly compact in $L^1(\sigma)$. Following [11], we can conclude that assumptions (7)-(11), together with the atomless assumption (4) on σ, guarantee the existence of a solution of problem (3) in the Banach space $L^1(\sigma)$ of σ-integrable functions. The reason is that the cost functional $v_t(x)$ is continuous in x and the constraint set C is compact in $L^1(\sigma)$.

Since the approximate problems will be defined in $\mathbb{R}^{rn}$, we need a system of connection operators $P = \{P_n\}$ between the spaces $L^1(\sigma)$ and $\mathbb{R}^{rn}$, n ∈ N. In $L^p$-spaces, 1 < p < ∞, systems of connection operators should be defined in piecewise integral form (as conditional means):

$$(P_n x)_{in} = \sigma(A_{in})^{-1} \int_{A_{in}} x(s)\, \sigma(ds), \qquad (12)$$

where i = 1, ..., n. The sets $A_{in}$, i = 1, ..., n, n ∈ N, that define the connection operators (12) satisfy the following conditions A1)-A7):

A1) $\sigma(A_{in}) > 0$;

A2) $A_{in} \cap A_{jn} = \emptyset$, i ≠ j;

A3) $\bigcup_{i=1}^{n} A_{in} = S$;

A4) $\sum_{i=1}^{n} |m_{in} - \sigma(A_{in})| \to 0$, n ∈ N;

A5) $\max_i \mathrm{diam}\, A_{in} \to 0$, n ∈ N;

A6) $s_{in} \in A_{in}$;

A7) $\sigma(\mathrm{int}\, A_{in}) = \sigma(A_{in}) = \sigma(\mathrm{cl}\, A_{in})$.


Remark 1 Weak convergence (5) is equivalent to the existence of a partition $\{A_n\}$ of S, $A_n = \{A_{1n}, \dots, A_{nn}\}$, with properties A1)-A7), see [14].


Remark 2 The collection of sets $\{A_{in}\}$ with property A7) constitutes an algebra $\Sigma' \subset \Sigma$. If S = [0, 1] and σ is the Lebesgue measure on [0, 1], then integrability with respect to $\sigma|_{\Sigma'}$ means Riemann integrability.


We now define discrete convergence in the space $L^1(\sigma)$ of σ-integrable functions.


Definition 3 A sequence of vectors $\{x_n\}$, $x_n \in \mathbb{R}^{rn}$, P-converges (or converges discretely) to an integrable function x(s) if

$$\sum_{i=1}^{n} |x_{in} - (P_n x)_{in}|\, m_{in} \to 0, \quad n \in N. \qquad (13)$$
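The connection operator (12) and the discrete distance in (13) can be sketched for the scalar case r = 1. The sketch assumes that σ is uniform on S = [0, 1] and that $A_{in}$ = [(i-1)/n, i/n), so $m_{in} = \sigma(A_{in})$ = 1/n. The integral in (12) is approximated by a midpoint rule with a hypothetical `quad_points` parameter, which is an additional assumption of the sketch.

```python
def conditional_means(x_func, n, quad_points=200):
    """(P_n x)_in = sigma(A_in)^(-1) * integral of x over A_in, cf. (12).

    Assumes sigma is uniform on S = [0, 1] and A_in = [(i-1)/n, i/n); the
    integral is approximated by a midpoint rule with quad_points nodes.
    """
    means = []
    h = 1.0 / (n * quad_points)
    for i in range(n):
        left = i / n
        nodes = (left + (q + 0.5) * h for q in range(quad_points))
        means.append(sum(x_func(s) for s in nodes) / quad_points)
    return means


def discrete_distance(x_n, x_func):
    """Left-hand side of (13) with m_in = sigma(A_in) = 1/n (scalar case r = 1)."""
    n = len(x_n)
    projected = conditional_means(x_func, n)
    return sum(abs(a - b) for a, b in zip(x_n, projected)) / n
```

For x(s) = s, vectors of left-endpoint values have distance 1/(2n) from $P_n x$. So they P-converge to x, while vectors of midpoint values coincide with $P_n x$.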


Remark 4 Note that in the space $L^1(\sigma)$ of σ-integrable functions we can also use the projection methods approach. Convergence of $\{x_n\}$ to x(s) is then defined as follows, where $\mathbf{1}_{A_{in}}$ denotes the indicator function of $A_{in}$:

$$\int_S \Big| x(s) - \sum_{i=1}^{n} x_{in}\, \mathbf{1}_{A_{in}}(s) \Big|\, \sigma(ds) \to 0, \quad n \in N.$$


Remark 5 The projection methods approach does not work in the space $L^\infty(\sigma)$ of essentially bounded measurable functions with the vraisup-norm topology. The reason is that $L^\infty(\sigma)$ is a nonseparable Banach space, and the space C(S) of continuous functions is not dense in it.


We need the space $L^\infty(\sigma)$, the topological dual of the space $L^1(\sigma)$ of σ-integrable functions, in order to define the discrete analogue of weak convergence in $L^1(\sigma)$ as well.


Definition 6 A sequence of vectors $\{x_n\}$, $x_n \in \mathbb{R}^{rn}$, n ∈ N, wP-converges (or converges weakly discretely) to an integrable function x(s), $x \in L^1(\sigma)$, if

$$\sum_{i=1}^{n} (z_{in}, x_{in})\, m_{in} \to \int_S \big(z(s), x(s)\big)\, \sigma(ds), \quad n \in N, \qquad (14)$$

for any sequence $\{z_n\}$ of vectors, $z_n \in \mathbb{R}^{rn}$, n ∈ N, and function z(s), $z \in L^\infty(\sigma)$, such that

$$\max_{1 \le i \le n} |z_{in} - (P_n z)_{in}| \to 0, \quad n \in N. \qquad (15)$$

To formulate the discretized problem and simplify the presentation, consider the partition $\{A_n\}$ of S, where $A_n = \{A_{1n}, \dots, A_{nn}\}$, with properties A1)-A7). We assume that in property A4) we identify $m_{in}$ and $\sigma(A_{in})$, i.e. $m_{in} = \sigma(A_{in})$ (e.g. squares with decreasing diagonal in $\mathbb{R}^2$).

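We now discretize the probability functional $v_t(x)$:

$$V_{tn}(x_n) = \sum_{i=1}^{n} \chi\big(t - f(x_{in}, s_{in})\big)\, m_{in}, \qquad (16)$$

and formulate the discretized problem:

$$\max_{x_n \in C_n} V_{tn}(x_n) = \max_{x_n \in C_n} \sum_{i=1}^{n} \chi\big(t - f(x_{in}, s_{in})\big)\, m_{in}, \qquad (17)$$

where the constraint set $C_n$ satisfies discrete analogues of conditions (9)-(11) imposed on the set C:

$$\sum_{i=1}^{n} |x_{in}|\, m_{in} \le M \quad \forall x_n \in C_n, \qquad (18)$$

$$\sum_{i \in I_n} |x_{in}|\, m_{in} \le K \sum_{i \in I_n} m_{in} \quad \forall x_n \in C_n, \ \forall I_n \subset \{1, \dots, n\}, \qquad (19)$$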
$$\sum_{k=1}^{r} \big(x_{i_n k} - x_{j_n k}\big)\big(s_{i_n k} - s_{j_n k}\big) \ge 0 \quad \forall\, i_n, j_n, \qquad (20)$$

and such that $0 < i_n, j_n \le n$, ∀ n ∈ N.
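The discretized functional (16) is a finite weighted sum and can be evaluated directly. In the sketch below, the brute-force search over a short list of candidate vectors is a hypothetical stand-in for maximization over $C_n$ in (17). It is not the author's solution method (the text refers to gradient projection), and constraints (18)-(20) are not enforced. The scalar case r = 1 and all names are assumptions.

```python
def heaviside(u):
    """chi(u) = 1 if u >= 0 (i.e. f <= t), else 0."""
    return 1.0 if u >= 0 else 0.0


def discrete_probability_functional(f, t, x_n, points, weights):
    """V_tn(x_n) = sum_i chi(t - f(x_in, s_in)) * m_in, cf. (16)."""
    return sum(m * heaviside(t - f(x_i, s_i))
               for x_i, s_i, m in zip(x_n, points, weights))


def best_candidate(f, t, candidates, points, weights):
    """Brute-force maximiser of V_tn over a finite list of candidate vectors."""
    return max(candidates,
               key=lambda x_n: discrete_probability_functional(f, t, x_n, points, weights))
```

With f(x, s) = (x - s)², t = 0.01 and ten equal cells on [0, 1], the rule $x_{in} = s_{in}$ gives $V_{tn}$ = 1. The constant rule $x_{in}$ = 0.5 only gives 0.2, because just two grid points lie within 0.1 of 0.5.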


Definition 7 A sequence of sets $\{C_n\}$, $C_n \subset \mathbb{R}^{rn}$, n ∈ N, converges to the set $C \subset L^1(\sigma)$ in the discrete Mosco sense if

1) for any subsequence $\{x_n\}$, $n \in N' \subset N$, such that $x_n \in C_n$, the convergence $\text{wP-}\lim x_n = x$, $n \in N'$, implies that x ∈ C;

2) for any x ∈ C there exists a sequence $\{x_n\}$, $x_n \in C_n$, which P-converges to x, $\text{P-}\lim x_n = x$, n ∈ N.


Remark 8 Suppose that in the above definition the 'for any' part 1) is also defined for P-convergence of vectors. Then the sequence of sets $\{C_n\}$ is said to converge to the set C in the discrete Painlevé-Kuratowski sense.


Denote the optimal values and optimal solutions of problems (3) and (17) by $v^*$, $x^*$ and $v_n^*$, $x_n^*$, respectively.

Let the function f(x, s) be continuous in both variables (x, s) and satisfy the growth and platform conditions (7) and (8). Then, for any function x(s) monotone a.e., the convergence $\text{P-}\lim x_n = x$, n ∈ N, implies the convergence $V_{tn}(x_n) \to v_t(x)$, n ∈ N.

Verification of this statement is quite lengthy and technically complicated. First we approximate the discontinuous function χ(t - f(x, s)) by the continuous function $\chi_\delta(t - f(x, s))$ as follows:

$$\chi_\delta\big(t - f(x, s)\big) = \begin{cases} 1 & \text{if } f(x, s) \le t,\\ 1 - \delta^{-1}\,[f(x, s) - t] & \text{if } t < f(x, s) < t + \delta,\\ 0 & \text{if } f(x, s) \ge t + \delta, \end{cases}$$

for some (small) δ > 0. Then we approximate a discontinuous solution x(s), $x \in L^1(\sigma)$, by a continuous function $x_\varepsilon(s)$ in the $L^1$-norm topology.
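The piecewise-linear function $\chi_\delta$ is continuous in the value of f, so the smoothed sum is continuous in $x_n$ whenever f is continuous. The sketch below evaluates $\chi_\delta$ and the correspondingly smoothed version of (16). The names are hypothetical and the scalar case is assumed.

```python
def chi_delta(t, f_value, delta):
    """Continuous approximation of chi(t - f): 1 up to t, linear on (t, t + delta), then 0."""
    if f_value <= t:
        return 1.0
    if f_value < t + delta:
        return 1.0 - (f_value - t) / delta
    return 0.0


def smoothed_functional(f, t, x_n, points, weights, delta):
    """V_tn with chi replaced by chi_delta."""
    return sum(m * chi_delta(t, f(x_i, s_i), delta)
               for x_i, s_i, m in zip(x_n, points, weights))
```

As δ → 0, $\chi_\delta(t - f)$ tends to χ(t - f) at every point where f ≠ t. Condition (8) makes the exceptional set σ-null.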

Let the constraint sets C and $C_n$ satisfy conditions (9)-(11) and (18)-(20), respectively, and let the discrete measures $\{(m_n, s_n)\}$ converge weakly to the measure σ. Then the sequence of sets $\{C_n\}$ converges to the set C in the discrete Painlevé-Kuratowski sense.

Verification of this statement relies on the following two convergences:

1) assuming weak convergence (5) of the discrete measures, the sequence of sets determined by inequalities (18), (19) converges in the discrete Mosco sense to the set determined by inequalities (9), (10), which is weakly compact in $L^1(\sigma)$;

2) by adding the monotonicity conditions (20) and (11) to the approximate and initial sets of admissible solutions, respectively, we can guarantee the discrete convergence of the sequence $\{C_n\}$ to C in the Painlevé-Kuratowski sense.

Now we can formulate the discrete approximation conditions for a stochastic programming problem with a probability cost function in the class of integrable decision rules.

Let the function f(x, s) be continuous in both variables (x, s) and satisfy the growth and platform conditions (7) and (8). Let the constraint set C satisfy conditions (9)-(11), and let the discrete measures $\{(m_n, s_n)\}$ converge weakly to the atomless measure σ. Then $v_n^* \to v^*$, n ∈ N. Moreover, the sequence of solutions $\{x_n^*\}$ of the approximate problems (17) has a subsequence which converges discretely to a solution of the initial problem (3).


Remark 9 The use of the space $L^1(\sigma)$ of integrable functions is essential. In reflexive $L^p$-spaces, 1 < p < ∞, serious difficulties arise in applying the strong (norm) compactness criterion to a maximizing sequence.


As a rule, problems with a probability cost function are maximized, whereas stochastic programs with a quantile cost are minimized, see, e.g., [8,10].

Finally, consider the discrete approximation of the quantile minimization problem (2):

$$\min_{x \in C} w_\alpha(x) = \min_{x \in C} \min_t \big\{t : P\{s : f(x(s), s) \le t\} \ge \alpha\big\}. \qquad (21)$$

It was verified in [10] that, under certain (quasi)convexity-concavity assumptions, the quantile minimization problem (21) is equivalent to the following Nash game:

$$\max_{x \in C} v_t(x) = J_1^*, \qquad (22)$$

$$\min_t \big(v_t(x^*) - \alpha\big)^2 = J_2^*. \qquad (23)$$
Discretize $v_t(x)$ as in (16) and $w_\alpha(x)$ as

$$W_{\alpha n}(x_n) = \min_t \Big\{t : \sum_{i=1}^{n} \chi\big(t - f(x_{in}, s_{in})\big)\, m_{in} \ge \alpha\Big\}.$$

Then, analogously to the approximation of the probability functional, we can approximate the quantile minimization problem (21) as well. In other words, we replace the Nash game (22), (23) with the following finite-dimensional game:

$$\max_{x_n \in C_n} V_{tn}(x_n) = J_{1n}^*, \qquad (24)$$

$$\min_t \big(V_{tn}(x_n^*) - \alpha\big)^2 = J_{2n}^*. \qquad (25)$$

Verifying the convergences $J_{1n}^* \to J_1^*$ and $J_{2n}^* \to J_2^*$, n ∈ N, is somewhat more laborious than the approximate maximization of the probability functional $v_t(x)$. The reason is that we must also guarantee convergence of the sequence of optimal quantiles $\{t_n^*\}$ of the minimization problems (25).
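For a fixed $x_n$, the discrete quantile $W_{\alpha n}(x_n)$ is a weighted α-quantile of the values $f(x_{in}, s_{in})$ with weights $m_{in}$. The weighted sum in its definition is a step function of t that jumps only at these values. The sketch assumes the weights sum to 1, so that α ≤ 1 is reachable. The tolerance `eps`, which guards against round-off in the cumulative sums, is an assumption of the sketch.

```python
def discrete_quantile(values, weights, alpha, eps=1e-12):
    """W_alpha_n = min{t : sum_i chi(t - f_in) * m_in >= alpha}.

    values[i] = f(x_in, s_in), weights[i] = m_in. The minimum is attained
    at one of the values; eps absorbs round-off in the cumulative weights.
    """
    total = 0.0
    for value, weight in sorted(zip(values, weights)):
        total += weight
        if total >= alpha - eps:
            return value
    raise ValueError("alpha exceeds the total weight of the discrete measure")
```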
Approximation of multivariate probability integrals is a hard problem in general. However, if the domain of the probability integral is a multidimensional interval, the problem reduces to approximating values of a multivariate probability distribution function.


Lower and Upper Bounds 


Let $\xi^T = (\xi_1, \dots, \xi_n)$ be a random vector with a given multivariate probability distribution. Introduce the events

$$A_1 = \{\xi_1 < x_1\}, \dots, A_n = \{\xi_n < x_n\},$$

where $x_1, \dots, x_n$ are arbitrary real values. The multivariate probability distribution function of the random vector ξ can then be expressed as follows:

$$F(x_1, \dots, x_n) = P(\xi_1 < x_1, \dots, \xi_n < x_n) = P(A_1 \cap \dots \cap A_n) = 1 - P(\bar{A}_1 \cup \dots \cup \bar{A}_n) = 1 - S_1 + S_2 - \dots + (-1)^n S_n,$$

where

$$\bar{A}_i = \{\xi_i \ge x_i\}, \quad i = 1, \dots, n,$$

and

$$S_k = \sum_{1 \le i_1 < \dots < i_k \le n} P(\bar{A}_{i_1} \cap \dots \cap \bar{A}_{i_k}), \quad k = 1, \dots, n.$$


First one shows that $S_1$ and $S_2$ can be expressed by the one- and two-dimensional marginal probability distribution functions of the random vector ξ. So can the individual probabilities involved in them, $P(\bar{A}_i)$, i = 1, ..., n, and $P(\bar{A}_i \cap \bar{A}_j)$, i = 1, ..., n - 1, j = i + 1, ..., n. The marginals are $F_i(x_i)$, i = 1, ..., n, and $F_{ij}(x_i, x_j)$, i = 1, ..., n - 1, j = i + 1, ..., n. One has

$$S_1 = \sum_{i=1}^{n} P(\bar{A}_i) = \sum_{i=1}^{n} P(\xi_i \ge x_i) = n - \sum_{i=1}^{n} P(\xi_i < x_i) = n - \sum_{i=1}^{n} F_i(x_i)$$

and

$$\begin{aligned} S_2 &= \sum_{1 \le i < j \le n} P(\bar{A}_i \cap \bar{A}_j) = \sum_{1 \le i < j \le n} P(\xi_i \ge x_i, \xi_j \ge x_j)\\ &= \sum_{1 \le i < j \le n} \big\{1 - P(\xi_i < x_i) - P(\xi_j < x_j) + P(\xi_i < x_i, \xi_j < x_j)\big\}\\ &= \frac{n(n-1)}{2} - (n-1) \sum_{i=1}^{n} F_i(x_i) + \sum_{1 \le i < j \le n} F_{ij}(x_i, x_j). \end{aligned}$$
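The two identities above need only the marginal values. A minimal sketch, assuming the marginals have already been evaluated at the given point. `F2` is a hypothetical nested list that is read only for i < j.

```python
def s1_s2_from_marginals(F1, F2):
    """S_1 and S_2 from the one- and two-dimensional marginals.

    F1[i] = F_i(x_i); F2[i][j] = F_ij(x_i, x_j), only entries with i < j are read.
    """
    n = len(F1)
    sum_f1 = sum(F1)
    sum_f2 = sum(F2[i][j] for i in range(n) for j in range(i + 1, n))
    s1 = n - sum_f1
    s2 = n * (n - 1) / 2 - (n - 1) * sum_f1 + sum_f2
    return s1, s2
```

For three independent components with $F_i$ = 0.5, 0.6, 0.7 one gets $S_1$ = 1.2 and $S_2$ = 0.5·0.4 + 0.5·0.3 + 0.4·0.3 = 0.47.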


So if one can calculate the one- and two-dimensional marginal probability distribution functions of the random vector ξ, one can bound the multivariate probability distribution function by the very simple bounds given by C.E. Bonferroni [1]:

$$1 - S_1 \le F(x_1, \dots, x_n) \le 1 - S_1 + S_2,$$

or by the sharp bounds called Boole-Bonferroni bounds, discovered independently by many authors (see [11] for a summary):

$$1 - S_1 + \frac{2}{n} S_2 \le F(x_1, \dots, x_n) \le 1 - \frac{2}{k^*+1} S_1 + \frac{2}{k^*(k^*+1)} S_2,$$

where $k^* = 1 + \lfloor 2S_2/S_1 \rfloor$.
In applications of the above bounds, the upper bound usually proves to be sharper. However, the lower bound can be improved by the bound discovered independently by D. Hunter [5] and K.J. Worsley [18]. This is an upper bound for $P(\bar{A}_1 \cup \dots \cup \bar{A}_n)$ in terms of $S_1$ and the individual probabilities $P(\bar{A}_i \cap \bar{A}_j)$, 1 ≤ i < j ≤ n. It is constructed as follows. Build an undirected complete graph with n nodes. Assign to node i the event $\bar{A}_i$ (or the probability $P(\bar{A}_i)$) and to edge (i, j) the weight $P(\bar{A}_i \cap \bar{A}_j)$. Let T* be a maximum weight spanning tree in this complete graph. Then one has

$$P(\bar{A}_1 \cup \dots \cup \bar{A}_n) \le S_1 - \sum_{(i,j) \in T^*} P(\bar{A}_i \cap \bar{A}_j),$$

which is called the Hunter-Worsley upper bound. It yields the following lower bound on the multivariate probability distribution function:

$$1 - S_1 + \sum_{(i,j) \in T^*} P(\bar{A}_i \cap \bar{A}_j) \le F(x_1, \dots, x_n).$$


The individual probabilities $P(\bar{A}_i \cap \bar{A}_j)$, 1 ≤ i < j ≤ n, can be stored while the value of $S_2$ is calculated. The maximum weight spanning tree can be found by several fast algorithms, for example Kruskal's algorithm, see [9]. There are now three lower and two upper bounds on the multivariate probability distribution function. All of them are computable if the one- and two-dimensional marginal probability distribution functions are known. Let us denote these bounds as follows:

$$L_1 = 1 - S_1,$$
$$L_2 = 1 - S_1 + \frac{2}{n} S_2,$$
$$L_3 = 1 - S_1 + \sum_{(i,j) \in T^*} P(\bar{A}_i \cap \bar{A}_j),$$
$$U_1 = 1 - S_1 + S_2,$$
$$U_2 = 1 - \frac{2}{k^*+1} S_1 + \frac{2}{k^*(k^*+1)} S_2.$$

Since $L_1 \le L_2 \le L_3$ and $U_2 \le U_1$, the best lower bound is $L_3$ and the best upper bound is $U_2$.
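All five bounds can be computed from the marginal values alone. The sketch below assumes the same input layout as before (F2[i][j] read for i < j). It finds T* with Kruskal's algorithm on edges sorted by decreasing weight, using a union-find structure. The dictionary keys and the function name are illustrative, not taken from the article.

```python
import math


def five_bounds(F1, F2):
    """Bounds L1, L2, L3, U1, U2 on F(x_1, ..., x_n) from its marginals.

    F1[i] = F_i(x_i); F2[i][j] = F_ij(x_i, x_j) for i < j.
    """
    n = len(F1)
    # Edge weights P(xi_i >= x_i, xi_j >= x_j) = 1 - F_i - F_j + F_ij.
    edges = [(1.0 - F1[i] - F1[j] + F2[i][j], i, j)
             for i in range(n) for j in range(i + 1, n)]
    s1 = sum(1.0 - f for f in F1)
    s2 = sum(w for w, _, _ in edges)

    # Kruskal's algorithm on edges in decreasing weight order gives a
    # maximum weight spanning tree T*; union-find tracks the components.
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree, tree_weight = [], 0.0
    for w, i, j in sorted(edges, reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j))
            tree_weight += w

    k_star = 1 + math.floor(2 * s2 / s1) if s1 > 0 else 1
    return {
        "L1": 1 - s1,
        "L2": 1 - s1 + 2 * s2 / n,
        "L3": 1 - s1 + tree_weight,
        "U1": 1 - s1 + s2,
        "U2": 1 - 2 * s1 / (k_star + 1) + 2 * s2 / (k_star * (k_star + 1)),
        "tree": tree,
        "k_star": k_star,
    }
```

For three independent components with $F_i$ = 0.5, 0.6, 0.7 the true value is 0.21. The sketch gives $L_3$ = 0.15 and $U_2$ = 0.27, with T* = {(0, 1), (0, 2)}.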


Monte-Carlo Simulation Algorithm 


One can take the differences between the multivariate probability distribution function and the lower and upper bounds introduced before:

$$F(x_1, \dots, x_n) - L_1 = S_2 - S_3 + \dots + (-1)^n S_n,$$
$$F(x_1, \dots, x_n) - L_2 = \Big(1 - \frac{2}{n}\Big) S_2 - S_3 + \dots + (-1)^n S_n,$$
$$F(x_1, \dots, x_n) - L_3 = -\sum_{(i,j) \in T^*} P(\bar{A}_i \cap \bar{A}_j) + S_2 - S_3 + \dots + (-1)^n S_n,$$
$$F(x_1, \dots, x_n) - U_1 = -S_3 + \dots + (-1)^n S_n,$$
$$F(x_1, \dots, x_n) - U_2 = \Big(\frac{2}{k^*+1} - 1\Big) S_1 + \Big(1 - \frac{2}{k^*(k^*+1)}\Big) S_2 - S_3 + \dots + (-1)^n S_n.$$


Next we give a Monte-Carlo simulation procedure for the multivariate probability distribution function value, based on estimating the differences above. First, however, we describe the so-called crude Monte-Carlo simulation procedure. Let the random vectors $(\xi_1^s, \dots, \xi_n^s)$, s = 1, ..., S, be independent and distributed according to the multivariate probability distribution function to be approximated. One must check the inequalities $\xi_1^s < x_1, \dots, \xi_n^s < x_n$ for all sample elements, s = 1, ..., S. For this purpose, define the random values

$$\nu_0^s = \begin{cases} 1 & \text{if } \xi_1^s < x_1, \dots, \xi_n^s < x_n,\\ 0 & \text{otherwise}, \end{cases} \qquad s = 1, \dots, S.$$


These random values are identically distributed and stochastically independent. Each of them takes the value 1 with probability equal to the multivariate probability distribution function value being approximated. Their sum has a binomial probability distribution with parameters S and $F(x_1, \dots, x_n)$. So the random variable

$$\nu_0 = \frac{1}{S}\big(\nu_0^1 + \dots + \nu_0^S\big)$$

has expected value $P = F(x_1, \dots, x_n)$ and variance $P(1-P)/S$. This is why $\nu_0$ can be regarded as an estimate of $F(x_1, \dots, x_n)$, the so-called crude Monte-Carlo estimate. Introduce $\kappa^s$, s = 1, ..., S, as any of the following equivalent counts:
- the number of the inequalities $\xi_1^s < x_1, \dots, \xi_n^s < x_n$ which are not fulfilled;
- the number of the inequalities $\xi_1^s \ge x_1, \dots, \xi_n^s \ge x_n$ which are fulfilled;
- the number of the events $\bar{A}_1^s, \dots, \bar{A}_n^s$ which occur.

The random values $\nu_0^s$ can then be expressed as

$$\nu_0^s = \begin{cases} 1 & \text{if } \kappa^s = 0,\\ 0 & \text{otherwise}, \end{cases} \qquad s = 1, \dots, S.$$

On the other hand, for the binomial moments of $\kappa^s$ one has

$$E\binom{\kappa^s}{k} = S_k, \quad k = 0, 1, \dots, n, \quad s = 1, \dots, S,$$

with $S_0 = 1$.
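The crude estimate and the counts $\kappa^s$ take only a few lines. In the sketch below, `sample_vector` is a hypothetical callable that draws one realization of ξ. The sample means of $\binom{\kappa^s}{k}$ are returned as well, so that the binomial-moment identity can be checked empirically. The standard error uses the binomial variance P(1 - P)/S with P replaced by its estimate.

```python
import math


def crude_monte_carlo(sample_vector, x, num_samples, rng):
    """Crude estimate nu_0 of F(x) = P(xi_1 < x_1, ..., xi_n < x_n).

    sample_vector(rng) must return one realisation of (xi_1, ..., xi_n).
    Also returns the sample means of binomial(kappa^s, k), k = 0..n,
    which estimate S_k (with S_0 = 1).
    """
    n = len(x)
    hits = 0
    moments = [0.0] * (n + 1)
    for _ in range(num_samples):
        xi = sample_vector(rng)
        # kappa^s = number of events {xi_i >= x_i} that occur
        kappa = sum(1 for xi_i, x_i in zip(xi, x) if xi_i >= x_i)
        if kappa == 0:
            hits += 1
        for k in range(n + 1):
            moments[k] += math.comb(kappa, k)  # comb is 0 when k > kappa
    nu0 = hits / num_samples
    std_error = math.sqrt(nu0 * (1 - nu0) / num_samples)
    return nu0, std_error, [m / num_samples for m in moments]
```

With three independent uniform components and x = (0.5, 0.6, 0.7), the estimates should be close to F = 0.21, $S_1$ = 1.2, $S_2$ = 0.47 and $S_3$ = 0.06.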


The simplest proof of these equalities was given by L. Takács [17], and it was reproduced by A. Prékopa in [11]. Also introduce the random numbers $\lambda^s$, s = 1, ..., S, as the number of those events $\bar{A}_i \cap \bar{A}_j = \{\xi_i^s \ge x_i, \xi_j^s \ge x_j\}$, (i, j) ∈ T*, which occur. Their expected value is

$$E(\lambda^s) = \sum_{(i,j) \in T^*} P(\bar{A}_i \cap \bar{A}_j), \quad s = 1, \dots, S.$$


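Using these equalities, one can easily see that the following random values have expected values equal to the differences between the multivariate probability distribution function and its bounds:

$$\nu_{L_1}^s = \binom{\kappa^s}{2} - \binom{\kappa^s}{3} + \dots + (-1)^n \binom{\kappa^s}{n},$$
$$\nu_{L_2}^s = \Big(1 - \frac{2}{n}\Big)\binom{\kappa^s}{2} - \binom{\kappa^s}{3} + \dots + (-1)^n \binom{\kappa^s}{n},$$
$$\nu_{L_3}^s = -\lambda^s + \binom{\kappa^s}{2} - \binom{\kappa^s}{3} + \dots + (-1)^n \binom{\kappa^s}{n},$$
$$\nu_{U_1}^s = -\binom{\kappa^s}{3} + \dots + (-1)^n \binom{\kappa^s}{n},$$
$$\nu_{U_2}^s = \Big(\frac{2}{k^*+1} - 1\Big)\binom{\kappa^s}{1} + \Big(1 - \frac{2}{k^*(k^*+1)}\Big)\binom{\kappa^s}{2} - \binom{\kappa^s}{3} + \dots + (-1)^n \binom{\kappa^s}{n}.$$

By the binomial theorem one has

$$\binom{\kappa^s}{0} - \binom{\kappa^s}{1} + \binom{\kappa^s}{2} - \dots + (-1)^{\kappa^s}\binom{\kappa^s}{\kappa^s} = 0 \quad \text{if } \kappa^s \ge 1,$$

and the above random values can be expressed as

$$\nu_{L_1}^s = \begin{cases} \kappa^s - 1 & \text{if } \kappa^s \ge 2,\\ 0 & \text{otherwise}, \end{cases}$$
$$\nu_{L_2}^s = \begin{cases} \frac{1}{n}(\kappa^s - 1)(n - \kappa^s) & \text{if } \kappa^s \ge 2,\\ 0 & \text{otherwise}, \end{cases}$$
$$\nu_{L_3}^s = \begin{cases} \kappa^s - 1 - \lambda^s & \text{if } \kappa^s \ge 2,\\ 0 & \text{otherwise}, \end{cases}$$
$$\nu_{U_1}^s = \begin{cases} -\frac{1}{2}(\kappa^s - 1)(\kappa^s - 2) & \text{if } \kappa^s \ge 3,\\ 0 & \text{otherwise}, \end{cases}$$
$$\nu_{U_2}^s = \begin{cases} \dfrac{(k^* - \kappa^s)(\kappa^s - k^* - 1)}{k^*(k^*+1)} & \text{if } \kappa^s \ge 1,\\ 0 & \text{otherwise}. \end{cases}$$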
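The closed forms above replace the alternating binomial sums by O(1) formulas per sample. The sketch below assumes that $\kappa^s$, $\lambda^s$, $k^*$ and the five bounds have already been computed, for instance as in the earlier formulas. It returns the per-sample values and the six averaged estimates $\nu_0$, $\nu_{L_1}$, ..., $\nu_{U_2}$. The dictionary keys and function names are hypothetical.

```python
def nu_values(kappa, lam, n, k_star):
    """Per-sample values nu_0^s and nu_B^s in closed form (kappa = kappa^s, lam = lambda^s)."""
    return {
        "0": 1.0 if kappa == 0 else 0.0,
        "L1": kappa - 1.0 if kappa >= 2 else 0.0,
        "L2": (kappa - 1) * (n - kappa) / n if kappa >= 2 else 0.0,
        "L3": kappa - 1.0 - lam if kappa >= 2 else 0.0,
        "U1": -0.5 * (kappa - 1) * (kappa - 2) if kappa >= 3 else 0.0,
        "U2": ((k_star - kappa) * (kappa - k_star - 1) / (k_star * (k_star + 1))
               if kappa >= 1 else 0.0),
    }


def six_estimates(kappas, lambdas, n, k_star, bounds):
    """nu_0 = mean(nu_0^s) and nu_B = B + mean(nu_B^s) for B in L1, L2, L3, U1, U2."""
    num_samples = len(kappas)
    totals = dict.fromkeys(["0", "L1", "L2", "L3", "U1", "U2"], 0.0)
    for kappa, lam in zip(kappas, lambdas):
        for key, value in nu_values(kappa, lam, n, k_star).items():
            totals[key] += value
    estimates = {"0": totals["0"] / num_samples}
    for key in ["L1", "L2", "L3", "U1", "U2"]:
        estimates[key] = bounds[key] + totals[key] / num_samples
    return estimates
```

Comparing `nu_values` with the alternating binomial sums for every κ from 0 to n is a cheap consistency check on the closed forms.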


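Take the new random values $\nu_{L_1}^s, \nu_{L_2}^s, \nu_{L_3}^s, \nu_{U_1}^s, \nu_{U_2}^s$ and the estimate $\nu_0$ introduced before:

$$\nu_0 = \frac{1}{S}\big(\nu_0^1 + \dots + \nu_0^S\big),$$
$$\nu_{L_1} = L_1 + \frac{1}{S}\big(\nu_{L_1}^1 + \dots + \nu_{L_1}^S\big),$$
$$\nu_{L_2} = L_2 + \frac{1}{S}\big(\nu_{L_2}^1 + \dots + \nu_{L_2}^S\big),$$
$$\nu_{L_3} = L_3 + \frac{1}{S}\big(\nu_{L_3}^1 + \dots + \nu_{L_3}^S\big),$$
$$\nu_{U_1} = U_1 + \frac{1}{S}\big(\nu_{U_1}^1 + \dots + \nu_{U_1}^S\big),$$
$$\nu_{U_2} = U_2 + \frac{1}{S}\big(\nu_{U_2}^1 + \dots + \nu_{U_2}^S\big).$$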


This gives six estimates of the multivariate probability distribution function in all. These estimates are obviously not stochastically independent, so one can combine them into a new estimate with the minimal possible variance. This technique is called the regression method. It means forming the estimate

$$\nu = w_0 \nu_0 + w_{L_1} \nu_{L_1} + w_{L_2} \nu_{L_2} + w_{L_3} \nu_{L_3} + w_{U_1} \nu_{U_1} + w_{U_2} \nu_{U_2}$$

with $w_0 + w_{L_1} + w_{L_2} + w_{L_3} + w_{U_1} + w_{U_2} = 1$, where the weights are chosen so that the variance of ν is minimized. Let


$$C = \begin{pmatrix}
c_{00} & c_{0L_1} & c_{0L_2} & c_{0L_3} & c_{0U_1} & c_{0U_2}\\
c_{L_10} & c_{L_1L_1} & c_{L_1L_2} & c_{L_1L_3} & c_{L_1U_1} & c_{L_1U_2}\\
c_{L_20} & c_{L_2L_1} & c_{L_2L_2} & c_{L_2L_3} & c_{L_2U_1} & c_{L_2U_2}\\
c_{L_30} & c_{L_3L_1} & c_{L_3L_2} & c_{L_3L_3} & c_{L_3U_1} & c_{L_3U_2}\\
c_{U_10} & c_{U_1L_1} & c_{U_1L_2} & c_{U_1L_3} & c_{U_1U_1} & c_{U_1U_2}\\
c_{U_20} & c_{U_2L_1} & c_{U_2L_2} & c_{U_2L_3} & c_{U_2U_1} & c_{U_2U_2}
\end{pmatrix}$$

be the covariance matrix C of the six estimates; C is a symmetric matrix. Then the variance of ν is $w^T C w$, where $w = (w_0, w_{L_1}, w_{L_2}, w_{L_3}, w_{U_1}, w_{U_2})^T$. The Lagrangian problem

$$\min\ w^T C w \quad \text{s.t.} \quad w_0 + w_{L_1} + w_{L_2} + w_{L_3} + w_{U_1} + w_{U_2} = 1$$

can easily be solved. The gradient of $w^T C w$ equals $2 w^T C$. Absorbing the factor 2 into the Lagrange multiplier λ, one has to solve the system of linear equations

$$\sum_{b \in I} c_{ab}\, w_b - \lambda = 0, \quad a \in I = \{0, L_1, L_2, L_3, U_1, U_2\}, \qquad \sum_{b \in I} w_b = 1,$$

for the unknowns $w_0, w_{L_1}, w_{L_2}, w_{L_3}, w_{U_1}, w_{U_2}$ and λ. The covariance matrix C is not known in advance, so its elements must be estimated from the random sample during the Monte-Carlo simulation procedure. This means summing not only the individual random values $\nu_0^s, \nu_{L_1}^s, \nu_{L_2}^s, \nu_{L_3}^s, \nu_{U_1}^s, \nu_{U_2}^s$ but also their cross-products. Many of the cross-products are trivial, so they need not be calculated. For example, $(\nu_0^s)^2$ equals $\nu_0^s$. Further, when $\nu_0^s$ is nonzero ($\kappa^s = 0$), all the other random values $\nu_{L_1}^s, \nu_{L_2}^s, \nu_{L_3}^s, \nu_{U_1}^s, \nu_{U_2}^s$ are zero, so the corresponding cross-products are all zero. Note also that $\nu_{L_1}^s, \nu_{L_2}^s, \nu_{L_3}^s$ are always nonnegative, while $\nu_{U_1}^s, \nu_{U_2}^s$ are always nonpositive. So the corresponding cross-products cannot be positive. In fact they are often negative, which yields a real variance reduction in the final estimate.
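The bordered linear system above can be solved with ordinary elimination. This is a minimal sketch under two assumptions: C is estimated by the usual unbiased sample covariance of the six per-sample estimator sequences, and the system is solved by Gauss-Jordan elimination with partial pivoting. Both are choices made for the illustration; the article does not prescribe a particular solver. The function names are hypothetical.

```python
def sample_covariance(columns):
    """Unbiased sample covariance matrix of equally long sequences (one per estimator)."""
    m, count = len(columns), len(columns[0])
    means = [sum(col) / count for col in columns]
    return [[sum((columns[a][s] - means[a]) * (columns[b][s] - means[b])
                 for s in range(count)) / (count - 1)
             for b in range(m)] for a in range(m)]


def min_variance_weights(C):
    """Solve sum_b c_ab w_b - lam = 0 (all a) and sum_b w_b = 1; return the weights w."""
    m = len(C)
    size = m + 1  # unknowns w_0..w_{m-1} and lam
    # Augmented rows [coefficients | right-hand side].
    A = [[float(v) for v in C[a]] + [-1.0, 0.0] for a in range(m)]
    A.append([1.0] * m + [0.0, 1.0])
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-15:
            raise ValueError("singular system: covariance matrix is (nearly) degenerate")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(size):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, size + 1):
                    A[r][c] -= factor * A[col][c]
    return [A[i][size] / A[i][i] for i in range(m)]
```

The combined estimate is then ν = Σ_a w_a ν_a. For two uncorrelated estimators with variances 1 and 4, the weights come out as 0.8 and 0.2.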
One- and Two-Dimensional Marginal Distribution Functions


To apply the Monte-Carlo simulation algorithm of the previous section, one has to show that the one- and two-dimensional marginal distribution function values can be evaluated efficiently. For the multivariate normal, (one-parameter) gamma and Dirichlet distributions, the marginal distributions are again normal, gamma and Dirichlet, respectively. The one-dimensional Dirichlet distribution is the beta distribution. Hence the one-dimensional marginal probability distribution functions can be evaluated by known algorithms. For example, in the IMSL subroutine library [6] the subroutines MDNOR, MDGAM and MDBETA provide these calculations. For the normal distribution, the two-dimensional marginal probability distribution function can also be evaluated by a standard IMSL subroutine called MDBNOR. Some details of the calculations provided by these subroutines can be found in [8].

For the multivariate gamma distribution introduced by Prékopa and T. Szántai in [12], one only needs the joint probability distribution function of the random variables

$$\xi_1 = \eta_1 + \eta_2, \qquad \xi_2 = \eta_1 + \eta_3.$$

Here the random variables $\eta_1$, $\eta_2$ and $\eta_3$ are independent and gamma distributed with parameters $\vartheta_1$, $\vartheta_2$ and $\vartheta_3$. Taking the joint characteristic function of $\xi_1$ and $\xi_2$ and applying the inversion formula, one easily obtains their joint probability density function. It has the form of a series expansion involving Laguerre polynomials. Using some integral formulae for these orthogonal polynomials, one can integrate the joint probability density function. This gives the following formula for the joint probability distribution function:

$$F(z_1, z_2) = F_{\vartheta_1+\vartheta_2}(z_1)\, F_{\vartheta_1+\vartheta_3}(z_2) + \sum_{k=1}^{\infty} C(\vartheta_1, \vartheta_2, \vartheta_3, k)\, f_{\vartheta_1+\vartheta_2+1}(z_1)\, L_{k-1}^{(\vartheta_1+\vartheta_2)}(z_1)\, f_{\vartheta_1+\vartheta_3+1}(z_2)\, L_{k-1}^{(\vartheta_1+\vartheta_3)}(z_2),$$

where

$$C(\vartheta_1, \vartheta_2, \vartheta_3, k) = \frac{(k-1)!}{k}\, \frac{\Gamma(\vartheta_1 + k)}{\Gamma(\vartheta_1)}\, \frac{\Gamma(\vartheta_1 + \vartheta_2 + 1)\, \Gamma(\vartheta_1 + \vartheta_3 + 1)}{\Gamma(\vartheta_1 + \vartheta_2 + k)\, \Gamma(\vartheta_1 + \vartheta_3 + k)},$$

and $f_\vartheta(z)$ and $F_\vartheta(z)$ are the one-dimensional gamma probability density and distribution functions, respectively. The Laguerre polynomials can be calculated with the following recursion formula:

$$(k+1)\, L_{k+1}^{(\vartheta)}(z) = (2k + \vartheta + 1 - z)\, L_k^{(\vartheta)}(z) - (k + \vartheta)\, L_{k-1}^{(\vartheta)}(z), \quad k = 1, 2, \dots,$$

where $L_0^{(\vartheta)}(z) = 1$ and $L_1^{(\vartheta)}(z) = \vartheta + 1 - z$. The convergence of the series for $F(z_1, z_2)$ was established by Szántai in [14].
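The series can be evaluated by combining the Laguerre recursion with log-gamma arithmetic for the coefficients C. The sketch below makes three assumptions: standard (scale 1) gamma variables, a regularized incomplete gamma function computed from its classical power series, and truncation after a fixed number of terms. The `terms` default is a heuristic, not a bound from [14], and the function names are hypothetical.

```python
import math


def gamma_pdf(z, a):
    """Standard gamma density f_a(z) = z^(a-1) e^(-z) / Gamma(a)."""
    if z <= 0:
        return 0.0
    return math.exp((a - 1) * math.log(z) - z - math.lgamma(a))


def gamma_cdf(z, a, tol=1e-15):
    """Standard gamma distribution function F_a(z) via the power series of the
    lower incomplete gamma function: z^a e^(-z) sum_k z^k / (a (a+1) ... (a+k))."""
    if z <= 0:
        return 0.0
    term = total = 1.0 / a
    k = 1
    while term > tol * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def laguerre_sequence(alpha, z, count):
    """L_0^(alpha)(z), ..., L_{count-1}^(alpha)(z) by the three-term recursion."""
    values = [1.0, alpha + 1.0 - z]
    for k in range(1, count - 1):
        values.append(((2 * k + alpha + 1 - z) * values[k]
                       - (k + alpha) * values[k - 1]) / (k + 1))
    return values[:count]


def bivariate_gamma_cdf(z1, z2, t1, t2, t3, terms=500):
    """Truncated series for P(eta1 + eta2 < z1, eta1 + eta3 < z2)."""
    a, b = t1 + t2, t1 + t3
    total = gamma_cdf(z1, a) * gamma_cdf(z2, b)
    lag_a = laguerre_sequence(a, z1, terms)
    lag_b = laguerre_sequence(b, z2, terms)
    fa, fb = gamma_pdf(z1, a + 1), gamma_pdf(z2, b + 1)
    for k in range(1, terms + 1):
        # log C(t1, t2, t3, k), using (k-1)! = Gamma(k)
        log_c = (math.lgamma(k) - math.log(k) + math.lgamma(t1 + k) - math.lgamma(t1)
                 + math.lgamma(a + 1) + math.lgamma(b + 1)
                 - math.lgamma(a + k) - math.lgamma(b + k))
        total += math.exp(log_c) * fa * lag_a[k - 1] * fb * lag_b[k - 1]
    return total
```

A quick plausibility check is to compare the result with a simulation based on `random.gammavariate`, which is what the accompanying test does.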

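For the Dirichlet distribution, the two-dimensional marginal probability density function of the components $\xi_i$, $\xi_j$ is given by

$$f(z_1, z_2) = \begin{cases} \dfrac{\Gamma(a+b+c)}{\Gamma(a)\Gamma(b)\Gamma(c)}\, z_1^{a-1} z_2^{b-1} (1 - z_1 - z_2)^{c-1} & \text{if } z_1 + z_2 \le 1,\ z_1 \ge 0,\ z_2 \ge 0,\\ 0 & \text{otherwise}, \end{cases}$$

where $a = \vartheta_i$, $b = \vartheta_j$ and $c = \sum_{k=1}^{n+1} \vartheta_k - \vartheta_i - \vartheta_j$. Direct calculation gives the two-dimensional probability distribution function

$$\begin{aligned} F(z_1, z_2) &= \frac{\Gamma(a+b+c)}{\Gamma(a)\Gamma(b)\Gamma(c)} \int_0^{z_1}\!\!\int_0^{z_2} x^{a-1} y^{b-1} (1 - x - y)^{c-1}\, dy\, dx\\ &= \frac{\Gamma(a+b+c)}{\Gamma(a)\Gamma(b)\Gamma(c)} \sum_{m=0}^{\infty} (-1)^m \binom{c-1}{m}\, m! \sum_{k=0}^{m} \frac{z_1^{a+k}\, z_2^{b+m-k}}{(a+k)\, k!\, (b+m-k)\, (m-k)!}. \end{aligned}$$

The above formula is valid only if $z_1 + z_2 \le 1$, $z_1 \ge 0$, $z_2 \ge 0$. Otherwise statement a) of the following, more general theorem can be applied.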


Theorem 1 Let $z_1^* \le \cdots \le z_n^*$ be the ordered sequence of $z_1, \ldots, z_n$, the arguments of the $n$-dimensional Dirichlet distribution function.

a) If $z_1^* + z_2^* \ge 1$, then one has

$$F(z_1, \ldots, z_n) = 1 - n + \sum_{i=1}^{n} F_i(z_i).$$

b) If $z_1^* + z_2^* + z_3^* \ge 1$, then one has

$$F(z_1, \ldots, z_n) = \frac{(n-1)(n-2)}{2} - (n-2) \sum_{i=1}^{n} F_i(z_i) + \sum_{1 \le i < j \le n} F_{ij}(z_i, z_j).$$

Here $F_i(z_i)$ and $F_{ij}(z_i, z_j)$ are the one- and two-dimensional marginal probability distribution functions.


This theorem was formulated and proved by Szantai in [13]. It can also be found in [11].
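The following Python sketch evaluates the two-dimensional Dirichlet distribution function: it sums the double series above when $z_1 + z_2 < 1$ and otherwise falls back on statement a) of Theorem 1 with $n = 2$, using the fact that the one-dimensional marginals of a Dirichlet vector are beta distributions (here $\mathrm{Beta}(a, b+c)$ and $\mathrm{Beta}(b, a+c)$). The truncation lengths and function names are illustrative assumptions, and the simple beta series used for the marginals converges slowly for arguments close to 1.

```python
import math
import random


def beta_cdf(x, p, q, n_terms=2000):
    """Regularized incomplete beta I_x(p, q) from the binomial series of (1 - t)**(q - 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    total, coef = 0.0, 1.0  # coef = (1 - q)(2 - q)...(m - q) / m!
    for m in range(n_terms):
        total += coef * x ** (p + m) / (p + m)
        coef *= (m + 1 - q) / (m + 1)
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return total / math.exp(log_beta)


def dirichlet_cdf_2d(z1, z2, a, b, c, n_terms=300):
    """F(z1, z2) for (xi_i, xi_j) with a = theta_i, b = theta_j, c = sum(theta) - a - b."""
    if z1 <= 0.0 or z2 <= 0.0:
        return 0.0
    if z1 + z2 >= 1.0:
        # Statement a) of Theorem 1 with n = 2: F = 1 - 2 + F_1(z1) + F_2(z2).
        return beta_cdf(z1, a, b + c) + beta_cdf(z2, b, a + c) - 1.0
    log_norm = math.lgamma(a + b + c) - math.lgamma(a) - math.lgamma(b) - math.lgamma(c)
    total, coef = 0.0, 1.0  # coef = (1 - c)(2 - c)...(m - c) / m!
    for m in range(n_terms):
        # binom(m, k) / m! = 1 / (k! (m - k)!), matching the double sum in the text
        inner = sum(math.comb(m, k) * z1 ** (a + k) * z2 ** (b + m - k)
                    / ((a + k) * (b + m - k)) for k in range(m + 1))
        total += coef * inner
        coef *= (m + 1 - c) / (m + 1)
    return math.exp(log_norm) * total


def dirichlet_mc(z1, z2, a, b, c, n=100_000, seed=7):
    """Crude Monte Carlo cross-check via normalized gamma variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        g1, g2, g3 = (rng.gammavariate(s, 1.0) for s in (a, b, c))
        total = g1 + g2 + g3
        hits += (g1 / total <= z1) and (g2 / total <= z2)
    return hits / n


print(dirichlet_cdf_2d(0.3, 0.4, 1.5, 2.0, 2.5), dirichlet_cdf_2d(0.6, 0.7, 1.5, 2.0, 2.5))
```

For $a = b = c = 1$ the distribution is uniform on the triangle with density 2, so the series reduces to its first term $2 z_1 z_2$, a convenient sanity check.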


Examples 


To illustrate the lower and upper bounds on the multivariate normal probability distribution function value and the efficiency of the variance reduction technique described earlier, consider the following examples.


Example 2
$n = 10$,
$x_1 = 1.7,\ x_2 = 0.8,\ x_3 = 5.1,$
$x_4 = 3.2,\ x_5 = 2.4,\ x_6 = 1.8,$
$x_7 = 2.7,\ x_8 = 1.5,\ x_9 = 1.2,$
$x_{10} = 2.6,$
$r_{ij} = 0.0,\ i = 2, \ldots, 10,\ j = 1, \ldots, i-1,$
except $r_{21} = -0.6$, $r_{43} = 0.9$, $r_{65} = 0.4$, $r_{87} = 0.2$, $r_{10,9} = -0.8$.
Number of trials: 10000.

Lower bound by S1, S2          0.524736
Lower bound by Hunter          0.563719
Upper bound by S1, S2          0.588646
Estimated value                0.582743
Standard deviation             0.000608
Time in seconds (PC-586)       0.77
Efficiency                     65.73


Example 3
$n = 15$,
$x_1 = x_2 = \cdots = x_{11} = 2.9,$
$x_{12} = 2.7,\ x_{13} = 1.6,\ x_{14} = 1.2,\ x_{15} = 2.1,$
$r_{ij} = 0.2,\ i = 2, \ldots, 10,\ j = 1, \ldots, i-1,$
$r_{ij} = 0.0,\ i = 11, \ldots, 15,\ j = 1, \ldots, i-1,$
except $r_{13,12} = 0.3$, $r_{15,14} = -0.95$.
Number of trials: 10000.

Lower bound by S1, S2          0.790073
Lower bound by Hunter          0.798730
Upper bound by S1, S2          0.801745
Estimated value                0.801304
Standard deviation             0.000193
Time in seconds (PC-586)       1.38
Efficiency                     417.84


Both of the above examples are taken from [2, Exam. 4; 6]; they concern standard multivariate normal probability distributions, i.e., all components of the normally distributed random vector have expected value zero and variance one. The efficiency of the Monte-Carlo simulation algorithm was calculated relative to the crude Monte-Carlo algorithm in the usual way, i.e., it equals the fraction $(t_0 \sigma_0^2)/(t_1 \sigma_1^2)$, where $t_0$, $t_1$ are the calculation times and $\sigma_0^2$, $\sigma_1^2$ are the variances of the crude and the compared simulation algorithms, respectively.
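The efficiency measure can be read as the factor by which crude Monte-Carlo would have to run longer to reach the variance of the compared algorithm, since the variance of a Monte-Carlo estimate is inversely proportional to the computing time spent on it. A minimal Python sketch, with hypothetical input values that are not taken from the examples above:

```python
def efficiency(t_crude, sd_crude, t_new, sd_new):
    """(t0 * sigma0^2) / (t1 * sigma1^2): how many times longer crude Monte-Carlo
    would need to run to match the variance of the compared method."""
    return (t_crude * sd_crude ** 2) / (t_new * sd_new ** 2)


# Hypothetical timings (seconds) and standard deviations of the two estimators.
print(efficiency(t_crude=0.5, sd_crude=0.004, t_new=1.0, sd_new=0.0005))
```

Note that the formula takes variances, so reported standard deviations have to be squared first.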


Remarks 


In many applications one also needs the gradient of the multivariate distribution function. Since one has the general formula

$$\frac{\partial F(z_1, \ldots, z_n)}{\partial z_i} = F(z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_n \mid z_i)\, f_i(z_i),$$

where $F(z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_n \mid z_i)$ is the conditional probability distribution function of the random variables $\xi_1, \ldots, \xi_{i-1}, \xi_{i+1}, \ldots, \xi_n$, given that $\xi_i = z_i$, and $f_i(z)$ is the probability density function of the random variable $\xi_i$, finding the gradient of a multivariate probability distribution function can be reduced to finding conditional distribution functions. In the cases of the multivariate normal and Dirichlet distributions, the conditional distributions are also multivariate normal and Dirichlet, respectively, while in the case of the multivariate gamma distribution they are different and more complicated, as shown by Prékopa and Szantai [12].
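For the bivariate standard normal distribution with correlation $\rho$, the conditional distribution of one component given $\xi_i = z_i$ is normal with mean $\rho z_i$ and variance $1 - \rho^2$, so the gradient formula above becomes explicit. The Python sketch below (function names are ours) evaluates it and checks it against a direct numerical integration of the joint density along the line $x = z_1$.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def grad_bivariate_normal_cdf(z1, z2, rho):
    """(dF/dz1, dF/dz2) via dF/dz_i = F(other | xi_i = z_i) * f_i(z_i)."""
    s = math.sqrt(1.0 - rho * rho)
    d1 = std_normal_pdf(z1) * std_normal_cdf((z2 - rho * z1) / s)
    d2 = std_normal_pdf(z2) * std_normal_cdf((z1 - rho * z2) / s)
    return d1, d2


def density_line_integral(z1, z2, rho, lower=-10.0, n=20_000):
    """Independent check: dF/dz1 = integral_{-inf}^{z2} f(z1, y) dy (Simpson's rule, n even)."""
    s2 = 1.0 - rho * rho

    def f(y):
        q = (z1 * z1 - 2.0 * rho * z1 * y + y * y) / s2
        return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(s2))

    h = (z2 - lower) / n
    total = f(lower) + f(z2)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lower + i * h)
    return total * h / 3.0


print(grad_bivariate_normal_cdf(0.3, -0.5, 0.6))
```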

In the case of the multivariate normal probability distribution, I. Deak [2] proposed another simulation technique which proved to be as efficient as the method described here. The main advantage of Deak’s method is that it can easily be generalized to calculate the probability content of more general sets in the multidimensional space, like convex polyhedra, hyperellipsoids, circular cones, etc. Its main drawback is that it works only for the multivariate normal probability distribution. The methods of Szantai and Deak have been combined by H. Gassmann to compute the probability of an $n$-dimensional rectangle in the case of the multivariate normal distribution (see [3]). Also in the case of the multivariate normal probability distribution, A. Genz proposed the transformation of the original integration region to the unit hypercube $[0, 1]^n$ and then the application of a crude Monte-Carlo method or some lattice rules
for the numerical integration of the resulting multidi- 
mensional integral. A comparison of methods for the 
computation of multivariate normal probabilities can 
be found in [4]. When the three-dimensional marginal 
probability distribution function values are also calcu- 
lated by numerical integration there exist some new, 
sharper bounds. See [16] for these bounds and their ef- 
fect on the efficiency of the Monte-Carlo simulation al- 
gorithm. 
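One common way to carry out a transformation to the unit hypercube is sequential conditioning on the Cholesky factor of the covariance matrix; the Python sketch below combines it with crude Monte-Carlo sampling. It is an illustration under that assumption, not a reproduction of Genz's published algorithm; lattice rules are not shown, and the function names and sample sizes are hypothetical. It relies on `statistics.NormalDist` from the Python standard library.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov assumed symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvn_orthant_prob(upper, cov, n_samples=20_000, seed=3):
    """Estimate P(X <= upper) for X ~ N(0, cov). Writing X = L Y, each Y_i is drawn
    from a normal truncated by the previous draws, using uniforms in [0, 1]^(n-1);
    the estimator averages the product of the conditional probabilities e_i."""
    L = cholesky(cov)
    n = len(upper)
    rng = random.Random(seed)
    eps = 1e-15
    total = 0.0
    for _ in range(n_samples):
        y = [0.0] * n
        weight = 1.0
        for i in range(n):
            shift = sum(L[i][j] * y[j] for j in range(i))
            e_i = _N.cdf((upper[i] - shift) / L[i][i])
            weight *= e_i
            if i < n - 1:
                u = min(max(rng.random() * e_i, eps), 1.0 - eps)
                y[i] = _N.inv_cdf(u)
        total += weight
    return total / n_samples


print(mvn_orthant_prob([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]))
```

For correlation 0.5 and zero upper limits the exact value is $1/4 + \arcsin(0.5)/(2\pi) = 1/3$, which makes a convenient check of the estimator.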

Approximation of multivariate probability integrals 
has a central role in probabilistic constrained stochas- 
tic programming when the probabilistic constraints 
are joint. The computer code PCSP (probabilistic constrained stochastic programming) was originally developed for handling multivariate normal probability distributions in this framework (see [15]). A new version of the code can now handle multivariate gamma and Dirichlet distributions as well. The calculation procedures of this paper have also been applied by J. Mayer in his code that solves this type of stochastic programming problem by a reduced gradient algorithm (see [10]).


These codes have been integrated by P. Kall and Mayer 
into a more advanced computer system for modeling in 
stochastic linear programming (see [7]). 


See also 


> Approximation of Extremum Problems with 
Probability Functionals 

> Discretely Distributed Stochastic Programs: Descent 
Directions and Efficient Points 

> Extremum Problems with Probability Functions: 
Kernel Type Solution Methods 

> General Moment Optimization Problems 

> Logconcave Measures, Logconvexity 

> Logconcavity of Discrete Distributions 

> L-shaped Method for Two-stage Stochastic 
Programs with Recourse 

> Multistage Stochastic Programming: Barycentric 
Approximation 

> Preprocessing in Stochastic Programming 

> Probabilistic Constrained Linear Programming: 
Duality Theory 

> Probabilistic Constrained Problems: Convexity 
Theory 

> Simple Recourse Problem: Dual Method 

> Simple Recourse Problem: Primal Method 

> Stabilization of Cutting Plane Algorithms for 
Stochastic Linear Programming Problems 

> Static Stochastic Programming Models 

> Static Stochastic Programming Models: Conditional 
Expectations 

> Stochastic Integer Programming: Continuity, 
Stability, Rates of Convergence 

> Stochastic Integer Programs 

> Stochastic Linear Programming: Decomposition 
and Cutting Planes 

> Stochastic Linear Programs with Recourse and 
Arbitrary Multivariate Distributions 

> Stochastic Network Problems: Massively Parallel 
Solution 

> Stochastic Programming: Minimax Approach 

> Stochastic Programming Models: Random Objective 

> Stochastic Programming: Nonanticipativity and 
Lagrange Multipliers 

> Stochastic Programming with Simple Integer 
Recourse 

> Stochastic Programs with Recourse: Upper Bounds 
> Stochastic Quasigradient Methods in Minimax
Problems 

> Stochastic Vehicle Routing Problems 

> Two-stage Stochastic Programming: Quasigradient 
Method 

> Two-stage Stochastic Programs with Recourse 
Introduction


We consider a general conic optimization problem under parameter uncertainty, as follows:

$$\begin{aligned} \max\ & c'x \\ \text{s.t.}\ & \sum_{j=1}^{n} A_j x_j - B \in K \\ & x \in X, \end{aligned} \tag{1}$$


where the cone $K$ is a regular cone, i.e., a closed, convex and pointed cone. The space of the data $(A_1, \ldots, A_n, B)$ depends on the cone $K$. The most common cone is the non-negative orthant, $\Re^m_+$, for which the conic constraint in Problem (1) becomes a set of $m$ linear constraints. Two important cones with many applications are the second-order cone,

$$L^{m+1} = \{(y_0, y) : \|y\|_2 \le y_0,\ y_0 \in \Re,\ y \in \Re^m\},$$

and the cone of symmetric positive semidefinite matrices,

$$S^m_+ = \{Y : Y \text{ is an } m \times m \text{ symmetric positive semidefinite matrix}\}.$$

The interested reader may refer to Ben-Tal and Nemirovski [3] and Pardalos and Wolkowicz [13].

In the uncertain conic optimization problem (1), the data $(A_1, \ldots, A_n, B)$ are uncertain. It is therefore conceivable that, as the data take values different from the nominal ones, the conic constraint may be violated, and the optimal solution found using the nominal data may no longer be feasible. To control the feasibility level of the conic constraint, one may consider a conic chance constrained model as follows:

$$\begin{aligned} \max\ & c'x \\ \text{s.t.}\ & P\Big(\sum_{j=1}^{n} A_j x_j - B \in K\Big) \ge 1 - \epsilon \\ & x \in X, \end{aligned} \tag{2}$$

in which the level of constraint violation is controlled probabilistically. Unfortunately, the chance constraint in problem (2) generally destroys the convexity of the problem and hence its computational tractability.


Formulation 


In modern robust optimization, we represent data uncertainty using uncertainty sets instead of probability distributions. We require the conic constraint to remain satisfied for all data $(A_1, \ldots, A_n, B)$ varying within an uncertainty set $U$. We call the following problem the robust counterpart of Problem (1):

$$\begin{aligned} \max\ & c'x \\ \text{s.t.}\ & \sum_{j=1}^{n} A_j x_j - B \in K \quad \forall (A_1, \ldots, A_n, B) \in U \\ & x \in X. \end{aligned} \tag{3}$$


The robust counterpart was introduced by Ben-Tal and Nemirovski [1] and independently by El Ghaoui et al. [9]. An immediate consequence of the robust counterpart is the preservation of convexity. Unfortunately, due to the possibly infinite number of scenarios corresponding to the extreme points of the uncertainty set $U$, optimizing the robust counterpart for general conic optimization problems is intractable.

It is noteworthy that in robust optimization the ellipsoidal uncertainty set is a popular choice, motivated by the laws of large numbers and by normal distributions. Under the assumption of normality, we could design an ellipsoidal set that is large enough so that the robust model will remain feasible with high probability. However, it turns out that this approach can grossly overestimate the size of the ellipsoid necessary to ensure the same level of robustness. To illustrate this issue, consider a linear constraint $a'x \ge b$ such that $a$ is multivariate normal with mean $\bar a$ and covariance $\Sigma$. It is natural to design an ellipsoidal uncertainty set of the form $U = \{a : (a - \bar a)'\Sigma^{-1}(a - \bar a) \le \alpha^2\}$ so that the problem remains feasible if $\tilde a \in U$, which has a probability of $\chi^2(\alpha^2)$, the chi-square distribution function (with as many degrees of freedom as the dimension of $a$) evaluated at $\alpha^2$. However, when solving the equivalent robust counterpart, $\bar a'x - \alpha\sqrt{x'\Sigma x} \ge b$, the robust solution has a feasibility probability of at least $\Phi(\alpha)$, where $\Phi$ is the standard normal distribution function. Clearly, the value $\chi^2(\alpha^2)$ grossly understates the robustness of the uncertain linear constraint compared with the value $\Phi(\alpha)$. The reason for this disparity is that the chosen uncertainty set does not take into account the structure of the cone.
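The disparity described above is easy to quantify. The following Python sketch (names are ours) compares the probability $\chi^2(\alpha^2)$ that $\tilde a$ falls in the ellipsoid with the lower bound $\Phi(\alpha)$ on the feasibility of the robust solution, for a few hypothetical dimensions of $a$.

```python
import math


def chi2_cdf(x, dof, tol=1e-16, max_iter=10_000):
    """Chi-square distribution function = regularized lower incomplete gamma P(dof/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_iter):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ellipsoid_vs_robust(alpha, dim):
    """(P(a lies in the ellipsoid of radius alpha), lower bound Phi(alpha) on robust feasibility)."""
    return chi2_cdf(alpha * alpha, dim), std_normal_cdf(alpha)


for dim in (1, 10, 100):
    print(dim, ellipsoid_vs_robust(2.0, dim))
```

For $\alpha = 2$ the ellipsoid probability drops quickly as the dimension grows, while the feasibility guarantee $\Phi(\alpha)$ does not depend on the dimension at all.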

We focus on the robust optimization framework 
proposed by Bertsimas and Sim [5], which offers a sim- 
ple and tractable approximation of uncertain conic 
optimization problems. Moreover, under reasonable 
probabilistic assumptions on data variation, the framework approximates the conic chance constrained problem (2) by relating its feasibility probability to the size of the uncertainty set and the structure of the cone. Note that more refined approximations of chance constrained problems are available for the case of linear cones, $K = \Re^m_+$. Interested readers can refer to Ben-Tal and Nemirovski [2], Bertsimas and Sim [4], Chen, Sim and Sun [8], Chen and Sim [6], Chen et al. [7], Lin et al. [10] and Janak et al. [11].


Affine Data Dependency 


We first assume that the uncertain data $(A_1, \ldots, A_n, B)$ are affinely dependent on a primitive uncertainty vector $\tilde z \in \Re^N$, as follows:

$$A_i = A_i(\tilde z) \triangleq A_i^0 + \sum_{j=1}^{N} A_i^j \tilde z_j, \quad i = 1, \ldots, n, \qquad B = B(\tilde z) \triangleq B^0 + \sum_{j=1}^{N} B^j \tilde z_j.$$
Note that we can always define a bijection mapping from a vector space of $\tilde z$ to the data space of $(A_1, \ldots, A_n, B)$. Therefore, under the affine data dependency, it is always possible to map all the data uncertainties affecting the conic constraint to the primitive uncertainty vector $\tilde z$. It is more convenient to define the following linear function mapping with respect to $(z_0, z)$:

$$Y((z_0, z)) = \sum_{j=0}^{N} Y_j z_j,$$

in which the variables $x$ are affinely mapped to the variables $(Y_0, \ldots, Y_N)$ as follows:

$$Y_j = \sum_{i=1}^{n} A_i^j x_i - B^j \quad \forall j = 0, \ldots, N.$$

For instance, under such a transformation, Problem (2) is equivalent to

$$\begin{aligned} \max\ & c'x \\ \text{s.t.}\ & Y_j = \sum_{i=1}^{n} A_i^j x_i - B^j \quad \forall j = 0, \ldots, N \\ & P\big(Y((1, \tilde z)) \in K\big) \ge 1 - \epsilon \\ & x \in X, \end{aligned} \tag{4}$$

and Problem (3) is the same as

$$\begin{aligned} \max\ & c'x \\ \text{s.t.}\ & Y_j = \sum_{i=1}^{n} A_i^j x_i - B^j \quad \forall j = 0, \ldots, N \\ & Y((1, z)) \in K \quad \forall z \in V \\ & x \in X, \end{aligned} \tag{5}$$

in which the uncertainty set $U$ is mapped accordingly to the uncertainty set $V$.


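Example: Quadratic Chance Constraint Consider the following quadratic chance constraint,

$$P\big(\|A(\tilde z)x\|_2^2 + b(\tilde z)'x + c(\tilde z) \le 0\big) \ge 1 - \epsilon,$$

where $x \in \Re^n$ is the decision variable and $(A(\tilde z), b(\tilde z), c(\tilde z)) \in \Re^{m \times n} \times \Re^n \times \Re$ are the input data, which are affinely dependent on the primitive uncertainties as follows:

$$A(\tilde z) \triangleq A^0 + \sum_{j=1}^{N} A^j \tilde z_j, \qquad b(\tilde z) \triangleq b^0 + \sum_{j=1}^{N} b^j \tilde z_j, \qquad c(\tilde z) \triangleq c^0 + \sum_{j=1}^{N} c^j \tilde z_j.$$

Note that a quadratic constraint $\|A(z)x\|_2^2 + b(z)'x + c(z) \le 0$ is second-order cone representable as follows:

$$\begin{pmatrix} \frac{1 - b(z)'x - c(z)}{2} \\ A(z)x \\ \frac{1 + b(z)'x + c(z)}{2} \end{pmatrix} \in L^{m+2},$$

since, with $t = b(z)'x + c(z)$, we have $\big(\tfrac{1-t}{2}\big)^2 - \big(\tfrac{1+t}{2}\big)^2 = -t$. Therefore, under the affine relation, with

$$Y_0 = \begin{pmatrix} \frac{1 - b^{0\prime}x - c^0}{2} \\ A^0 x \\ \frac{1 + b^{0\prime}x + c^0}{2} \end{pmatrix} \quad \text{and} \quad Y_j = \begin{pmatrix} -\frac{b^{j\prime}x + c^j}{2} \\ A^j x \\ \frac{b^{j\prime}x + c^j}{2} \end{pmatrix}, \quad j = 1, \ldots, N,$$

we transform the quadratic chance constraint into the following conic chance constraint:

$$P\Big(Y_0 + \sum_{j=1}^{N} Y_j \tilde z_j \in L^{m+2}\Big) \ge 1 - \epsilon.$$

Hence, we treat the quadratic constraint as a special case of a second-order cone constraint.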
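The two steps of this example, the affine data dependency and the second-order cone representation, can be checked numerically. The Python sketch below uses hypothetical random data (dimensions, ranges and names are ours) and verifies that $Y_0 + \sum_j Y_j z_j$ coincides with the vector built directly from $A(z)$, $b(z)$, $c(z)$, and that membership in $L^{m+2}$ agrees with the sign of the quadratic function. The first entry of each vector plays the role of $y_0$.

```python
import math
import random


def mat_vec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def soc_vector(A, b, c, x, const):
    """((const - t)/2, A x, (const + t)/2) with t = b'x + c.
    const = 1 gives the SOC representation (and Y_0); const = 0 gives Y_j, j >= 1."""
    t = sum(bi * xi for bi, xi in zip(b, x)) + c
    return [(const - t) / 2.0] + mat_vec(A, x) + [(const + t) / 2.0]


def in_soc(w):
    """(w_0, w_rest) in the second-order cone  <=>  ||w_rest||_2 <= w_0."""
    return math.sqrt(sum(u * u for u in w[1:])) <= w[0]


def check_equivalence(trials=2000, seed=0, m=2, n=3, N=2):
    """Returns (#sign mismatches, max |Y_affine - Y_direct|) over random hypothetical data."""
    rng = random.Random(seed)
    A = [[[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)] for _ in range(N + 1)]
    b = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(N + 1)]
    c = [rng.uniform(-1.5, -0.5)] + [rng.uniform(-0.3, 0.3) for _ in range(N)]
    mismatches, max_err = 0, 0.0
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(n)]
        w = [1.0] + [rng.gauss(0.0, 0.5) for _ in range(N)]      # the point (1, z)
        # Data at z: A(z) = A^0 + sum_j A^j z_j, and likewise b(z), c(z).
        Az = [[sum(w[k] * A[k][i][j] for k in range(N + 1)) for j in range(n)] for i in range(m)]
        bz = [sum(w[k] * b[k][i] for k in range(N + 1)) for i in range(n)]
        cz = sum(w[k] * c[k] for k in range(N + 1))
        q = sum(v * v for v in mat_vec(Az, x)) + sum(bi * xi for bi, xi in zip(bz, x)) + cz
        direct = soc_vector(Az, bz, cz, x, 1.0)
        # Y((1, z)) = Y_0 + sum_j Y_j z_j
        Ys = [soc_vector(A[k], b[k], c[k], x, 1.0 if k == 0 else 0.0) for k in range(N + 1)]
        affine = [sum(w[k] * Ys[k][i] for k in range(N + 1)) for i in range(m + 2)]
        max_err = max(max_err, max(abs(u - v) for u, v in zip(affine, direct)))
        if abs(q) > 1e-9 and (q <= 0) != in_soc(affine):
            mismatches += 1
    return mismatches, max_err


print(check_equivalence())
```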


Tractable Approximations 
of a Conic Chance Constrained Problem 


We focus on deriving a tractable approximation of the following conic chance constraint:

$$P\big(Y((1, \tilde z)) \in K\big) \ge 1 - \epsilon. \tag{6}$$

For notational convenience, we define

$$X \triangleq (Y_0, \ldots, Y_N).$$
For a given reference vector (or matrix) $V \in \mathrm{int}(K)$, where $\mathrm{int}(K)$ denotes the interior of the cone $K$, we can define the function

$$f(X, (z_0, z)) = \max\{\theta : Y((z_0, z)) - \theta V \in K\},$$

which has the following properties:

Proposition 1 For any $V \in \mathrm{int}(K)$, the function $f(X, (z_0, z))$ satisfies the following properties:

(a) $f(X, (z_0, z))$ is bounded and concave in $X$ and in $(z_0, z)$.

(b) $f(X, k(z_0, z)) = k\, f(X, (z_0, z))$, $\forall k \ge 0$.

(c) $f(X, (z_0, z)) \ge s$ if and only if $Y((z_0, z)) - sV \in K$.

(d) $f(X, (z_0, z)) > s$ if and only if $Y((z_0, z)) - sV \in \mathrm{int}(K)$.


Hence, the conic chance constraint (6) is equivalent to the following chance constraint:

$$P\big(f(X, (1, \tilde z)) \ge 0\big) \ge 1 - \epsilon. \tag{7}$$


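In order to build a tractable framework that approximates the conic chance constrained problem, we first analyze the robust counterpart approach to uncertainty. Given an ellipsoidal uncertainty set

$$E(\rho) = \{z : \|z\|_2 \le \rho\},$$

the robust counterpart

$$f(X, (1, z)) \ge 0 \quad \forall z \in E(\rho), \tag{8}$$

despite its convexity, is generally intractable. Instead we consider the following robust counterpart:

$$f(X, (1, 0)) + \sum_{j=1}^{N} \big\{ f(X, (0, e_j))\, v_j + f(X, (0, -e_j))\, w_j \big\} \ge 0 \quad \forall (v, w) \in V(\rho), \tag{9}$$

where $e_j \in \Re^N$ is a unit vector with one at the $j$th entry and the uncertainty set is

$$V(\rho) = \{(v, w) \in \Re^N_+ \times \Re^N_+ : \|v + w\|_2 \le \rho\}. \tag{10}$$

Proposition 2 The robust counterpart (9) is a tractable relaxation of the robust counterpart (8), in the sense that its uncertainty set $V(\rho)$ contains the pairs $(z^+, z^-)$ of positive and negative parts of all $z \in E(\rho)$; consequently, any $X$ satisfying (9) also satisfies (8).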


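Theorem 1
(a) The constraint (9) is equivalent to

$$f(X, (1, 0)) \ge \rho\, \|s\|_2, \tag{11}$$

where

$$s_j = \max\{-f(X, (0, e_j)),\ -f(X, (0, -e_j))\}, \quad \forall j = 1, \ldots, N.$$

(b) Eq. (11) can be written as:

$$\begin{aligned} & f(X, (1, 0)) \ge \rho y \\ & f(X, (0, e_j)) + t_j \ge 0 \quad \forall j = 1, \ldots, N \\ & f(X, (0, -e_j)) + t_j \ge 0 \quad \forall j = 1, \ldots, N \\ & \|t\|_2 \le y \\ & \text{for some } y \in \Re,\ t \in \Re^N. \end{aligned} \tag{12}$$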


From Proposition 1 and noting that $Y((1, 0)) = Y_0$ and $Y((0, \pm e_j)) = \pm Y_j$, we can also represent formulation (12) explicitly in conic constraints as follows:

$$\begin{aligned} & Y_0 - \rho y V \in K \\ & Y_j + t_j V \in K \quad \forall j = 1, \ldots, N \\ & -Y_j + t_j V \in K \quad \forall j = 1, \ldots, N \\ & \|t\|_2 \le y \\ & \text{for some } y \in \Re,\ t \in \Re^N, \end{aligned} \tag{13}$$

for a given reference vector $V$ in the interior of the cone $K$. In this form, formulation (12) becomes a Cartesian product of $2N + 1$ cones of the nominal problem plus an additional second-order cone, which is a computationally tractable cone. Hence, in theory, formulation (12) is not much harder to solve than its nominal problem.
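For the non-negative orthant $K = \Re^m_+$ with a positive reference vector $V$, $f(X, (z_0, z))$ reduces to $\min_i Y_i((z_0, z))/V_i$, so every quantity in Theorem 1 can be computed directly. The Python sketch below (hypothetical data and names) builds $Y_0$ so that (11) holds and then evaluates $f(X, (1, z))$ at random points on the sphere $\|z\|_2 = \rho$, where constraints that are componentwise linear in $z$ are hardest to satisfy.

```python
import math
import random


def norm2(u):
    return math.sqrt(sum(ui * ui for ui in u))


def f_orthant(y, v):
    """f = max{theta : y - theta * v in R^m_+} = min_i y_i / v_i (entries of v positive)."""
    return min(yi / vi for yi, vi in zip(y, v))


def s_coefficients(ys, v):
    """s_j = max{-f(X, (0, e_j)), -f(X, (0, -e_j))}, using Y((0, +-e_j)) = +-Y_j."""
    return [max(-f_orthant(yj, v), -f_orthant([-u for u in yj], v)) for yj in ys]


def robust_demo(seed=0, m=3, n_unc=4, rho=2.0, trials=5000):
    """Hypothetical data: choose Y_0 so that (11) holds, then sample z with ||z||_2 = rho.
    Returns (whether (11) holds, smallest f(X, (1, z)) found)."""
    rng = random.Random(seed)
    v = [1.0] * m                                   # reference vector in int(R^m_+)
    ys = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n_unc)]
    s = s_coefficients(ys, v)
    y0 = [(rho * norm2(s) + 0.01) * vi for vi in v]
    holds_11 = f_orthant(y0, v) >= rho * norm2(s)
    worst = math.inf
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n_unc)]
        r = norm2(g)
        z = [rho * gi / r for gi in g]
        yz = [y0[i] + sum(zj * yj[i] for zj, yj in zip(z, ys)) for i in range(m)]
        worst = min(worst, f_orthant(yz, v))
    return holds_11, worst


print(robust_demo())
```

Because every sampled $f(X, (1, z))$ stays non-negative, the sampled points of $E(\rho)$ satisfy (8), as Proposition 2 predicts.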

One natural question is whether the simple approximation is overly conservative with respect to Problem (8). While there is a lack of theoretical evidence on the closeness of the approximation, the framework does
lead to an approximation of the conic chance constraint 
problem. An important component of the analysis is 
the relation among different norms, which we will sub- 
sequently present. 
Recall that a norm satisfies $\|A\| \ge 0$, $\|kA\| = |k|\,\|A\|$, $\|A + B\| \le \|A\| + \|B\|$, and $\|A\| = 0$ implies that $A = 0$. For a given regular cone $K$ and a vector $V$ in its interior, we define the following cone-induced norm:

$$\|Y\|_{K,V} \triangleq \min\{y : yV - Y \in K,\ Y + yV \in K\}. \tag{14}$$

Proposition 3

$$\max\{-f(X, (z_0, z)),\ -f(X, -(z_0, z))\} = \|Y((z_0, z))\|_{K,V}.$$


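We consider the common cones and the respective norms.
(a) Second-order cone: Let $e_{n+1} \in \mathrm{int}(L^{n+1})$ be the reference vector (the unit vector in the direction of the $y_0$ component); then for any vector $(y_0, y) \in \Re^{n+1}$,

$$\|(y_0, y)\|_{L^{n+1}, e_{n+1}} = \min\{\theta : \|y\|_2 \le \theta - y_0,\ \|y\|_2 \le \theta + y_0\} = \|y\|_2 + |y_0|.$$

(b) Cone of symmetric positive semidefinite matrices: Let the identity matrix $I$ be the reference matrix; then for any $m \times m$ symmetric matrix $Y$,

$$\|Y\|_{S^m_+, I} = \min\{y : yI - Y \in S^m_+,\ Y + yI \in S^m_+\} = \|Y\|_2,$$

the spectral norm of $Y$.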
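Since the feasible values of $y$ in (14) form a half-line, the cone-induced norm can be computed by bisection from nothing more than a membership test for $K$. The Python sketch below (names and the initial upper bound `hi` are assumptions) reproduces the closed form of case (a), storing the $y_0$ component first. For the non-negative orthant it gives $\max_i |Y_i|/V_i$, which shows how the norm depends on the choice of $V$.

```python
import math


def cone_norm(y, v, in_cone, hi=1e6, iters=200):
    """||Y||_{K,V} = min{t : t V - Y in K, Y + t V in K}, found by bisection.
    Assumes V lies in the interior of K and that t = hi is feasible."""
    def feasible(t):
        return (in_cone([t * vi - yi for vi, yi in zip(v, y)])
                and in_cone([yi + t * vi for vi, yi in zip(v, y)]))

    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


def in_soc(w):
    """(w_0, w_1, ..., w_n) in L^{n+1}  <=>  ||(w_1, ..., w_n)||_2 <= w_0."""
    return math.sqrt(sum(u * u for u in w[1:])) <= w[0]


def in_orthant(w):
    return all(u >= 0.0 for u in w)


y = [0.3, -1.2, 0.5]
print(cone_norm(y, [1.0, 0.0, 0.0], in_soc))       # case (a): ||y||_2 + |y_0|
print(cone_norm(y, [1.0, 1.0, 1.0], in_orthant))   # max_i |Y_i|
print(cone_norm(y, [1.0, 2.0, 1.0], in_orthant))   # max_i |Y_i| / V_i
```

For the semidefinite cone the same bisection would need an eigenvalue-based membership test, which is not shown here.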


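Proposition 4 Suppose $X$ is feasible in Problem (12); then

$$P\big(f(X, (1, \tilde z)) < 0\big) \le P\left(\Big\|\sum_{j=1}^{N} Y_j \tilde z_j\Big\|_{K,V} > \rho \sqrt{\sum_{j=1}^{N} \|Y_j\|_{K,V}^2}\right).$$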


To obtain explicit bounds, we focus on primitive uncertainties $\tilde z$ that are normally and independently distributed with mean zero and variance one. For a sum of random scalars, which is then normal with variance $\sum_j y_j^2$, we have

$$P\left(\Big|\sum_{j=1}^{N} y_j \tilde z_j\Big| > \rho \sqrt{\sum_{j=1}^{N} y_j^2}\right) = 2\big(1 - \Phi(\rho)\big).$$

To derive a similar large deviation result for the sum of random vectors used in Proposition 4, we consider the following generalization:

$$P\left(\Big\|\sum_{j=1}^{N} Y_j \tilde z_j\Big\|_{K,V} > \rho \sqrt{\sum_{j=1}^{N} \|Y_j\|_{K,V}^2}\right) \le \phi(\rho),$$

where $\phi(\rho)$ is a non-trivial probability bound that depends on the choice of the cone $K$, and possibly on the dimension and the reference vector $V$.

An important component of the analysis is the relation among different norms. We denote by $\langle \cdot, \cdot \rangle$ the inner product on a vector space, $\Re^m$, or on the space of $m \times m$ symmetric matrices. The inner product induces a norm $\|X\| = \sqrt{\langle X, X \rangle}$. For a vector space, the natural inner product is the Euclidean inner product, $\langle x, y \rangle = x'y$, and the induced norm is the Euclidean norm $\|x\|_2$. For the space of symmetric matrices, the natural inner product is the trace product, $\langle X, Y \rangle = \mathrm{trace}(XY)$, and the corresponding induced norm is the Frobenius norm, $\|Y\|_F$.

We analyze the relation of the inner product norm $\sqrt{\langle X, X \rangle}$ with the norm $\|X\|_{K,V}$ for the conic optimization problems we consider. Since $\|X\|_{K,V}$ and the inner product norm $\|X\|$ are valid norms in a finite-dimensional space, there exist finite $\omega_1, \omega_2 > 0$ such that

$$\frac{1}{\omega_1}\, \|X\|_{K,V} \le \|X\| \le \omega_2\, \|X\|_{K,V}$$

for all $X$ in the relevant space. Hence, we define the parameter

$$\alpha_{K,V} = \underbrace{\Big(\max_{\|X\| = 1} \|X\|_{K,V}\Big)}_{=\,\omega_1}\ \underbrace{\Big(\max_{\|X\|_{K,V} = 1} \|X\|\Big)}_{=\,\omega_2}, \tag{15}$$

which measures the disparity between the norm $\|\cdot\|_{K,V}$ and the inner product norm $\|\cdot\|$.


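Parameter $\alpha_{K,V}$ of Common Cones
(a) Second-order cone: Let $e_{n+1}$ be the reference vector; then

$$\|(y, y_{n+1})\|_{L^{n+1}, e_{n+1}} = \|y\|_2 + |y_{n+1}|.$$

Therefore,

$$\frac{1}{\sqrt 2}\, \|(y, y_{n+1})\|_{L^{n+1}, e_{n+1}} \le \|(y, y_{n+1})\|_2 \le \|(y, y_{n+1})\|_{L^{n+1}, e_{n+1}},$$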


Approximations to Robust Conic Optimization Problems, Table 1
Probability bounds of $P(f(X, (1, \tilde z)) < 0)$ for $\tilde z \sim N(0, I)$ (columns: Type; Probability bound of infeasibility)

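and hence $\alpha_{L^{n+1}, e_{n+1}} = \sqrt 2$.

(b) Cone of symmetric positive semidefinite matrices: Let $I$ be the reference matrix; then for any $m \times m$ symmetric matrix $Y$,

$$\|Y\|_{S^m_+, I} = \|Y\|_2.$$

Let $\lambda_j$, $j = 1, \ldots, m$, be the eigenvalues of the matrix $Y$. Since $\|Y\|_F = \sqrt{\mathrm{trace}(Y^2)} = \sqrt{\sum_{j=1}^{m} \lambda_j^2}$ and $\|Y\|_2 = \max_j |\lambda_j|$, we have

$$\|Y\|_2 \le \|Y\|_F \le \sqrt m\, \|Y\|_2.$$

Hence, $\alpha_{S^m_+, I} = \sqrt m$.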
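The value $\alpha_{L^{n+1}, e_{n+1}} = \sqrt 2$ can also be checked by sampling. The Python sketch below (names ours) estimates $\omega_1$ and $\omega_2$ in (15) from random directions, which is enough because both norms are positively homogeneous, and uses the closed form $\|y\|_2 + |y_{n+1}|$ for the cone-induced norm. Because it only samples, the estimate approaches $\sqrt 2$ from below.

```python
import math
import random


def soc_cone_norm(w):
    """Closed form ||(y, y_{n+1})||_{L^{n+1}, e_{n+1}} = ||y||_2 + |y_{n+1}|,
    with the distinguished component y_{n+1} stored in w[0]."""
    return math.sqrt(sum(u * u for u in w[1:])) + abs(w[0])


def euclid(w):
    return math.sqrt(sum(u * u for u in w))


def estimate_alpha(cone_norm, inner_norm, dim, samples=50_000, seed=0):
    """Sampled estimate of alpha_{K,V} = omega_1 * omega_2 in (15); never exceeds the true value."""
    rng = random.Random(seed)
    omega1 = omega2 = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        ratio = cone_norm(x) / inner_norm(x)
        omega1 = max(omega1, ratio)          # max ||X||_{K,V} over ||X|| = 1
        omega2 = max(omega2, 1.0 / ratio)    # max ||X|| over ||X||_{K,V} = 1
    return omega1 * omega2


print(estimate_alpha(soc_cone_norm, euclid, dim=4), math.sqrt(2.0))
```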
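Theorem 2 Given an inner product norm $\|\cdot\|$ and under the assumption that the $\tilde z_j$ are normally and independently distributed with mean zero and variance one, i.e., $\tilde z \sim N(0, I)$, then

$$P\left(\Big\|\sum_{j=1}^{N} Y_j \tilde z_j\Big\|_{K,V} > \rho \sqrt{\sum_{j=1}^{N} \|Y_j\|_{K,V}^2}\right) \le \frac{\sqrt e\, \rho}{\alpha_{K,V}} \exp\left(-\frac{\rho^2}{2\alpha_{K,V}^2}\right) \tag{16}$$

for all $\rho > \alpha_{K,V}$.

In order to have the smallest budget of uncertainty $\rho$, it is reasonable to select $V$ that minimizes $\alpha_{K,V}$, i.e.,

$$\alpha_K = \min_{V \in \mathrm{int}(K)} \alpha_{K,V}.$$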
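The bound (16) depends on $\rho$ only through $\rho/\alpha_{K,V}$, equals 1 at $\rho = \alpha_{K,V}$ and decreases for larger $\rho$. The Python sketch below (names ours) uses this to find, by bisection, the smallest budget of uncertainty $\rho$ for which the right-hand side of (16) does not exceed a target $\epsilon$.

```python
import math


def infeasibility_bound(rho, alpha):
    """Right-hand side of (16): sqrt(e) * rho / alpha * exp(-rho**2 / (2 * alpha**2))."""
    return math.sqrt(math.e) * rho / alpha * math.exp(-rho * rho / (2.0 * alpha * alpha))


def budget_for(eps, alpha, iters=200):
    """Smallest rho >= alpha with infeasibility_bound(rho, alpha) <= eps (0 < eps < 1),
    using that the bound equals 1 at rho = alpha and decreases afterwards."""
    lo, hi = alpha, 2.0 * alpha
    while infeasibility_bound(hi, alpha) > eps:
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if infeasibility_bound(mid, alpha) > eps:
            lo = mid
        else:
            hi = mid
    return hi


for name, alpha in (("second-order cone", math.sqrt(2.0)), ("semidefinite cone, m = 9", 3.0)):
    print(name, budget_for(1e-3, alpha))
```

Since the required budget scales linearly with $\alpha_{K,V}$, the semidefinite cone with $\alpha = \sqrt m$ needs a budget that grows like $\sqrt m$, whereas for the second-order cone it does not depend on the dimension.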


For general conic optimization, we have shown that the probability bound depends on the choice of $V \in \mathrm{int}(K)$. A cone $K \subseteq \Re^n$ is homogeneous if for any pair of points $A, B \in \mathrm{int}(K)$ there exists an invertible linear map $M : \Re^n \to \Re^n$ such that $M(A) = B$ and $M(K) = K$. It turns out that for homogeneous cones, of which semidefinite and second-order cones are special cases, the probability bound does not depend on $V \in \mathrm{int}(K)$.


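Theorem 3 Suppose the cone $K$ is homogeneous. For any $V \in \mathrm{int}(K)$, the probability bound of conic infeasibility satisfies

$$P\big(f(X, (1, \tilde z)) < 0\big) \le \frac{\sqrt e\, \rho}{\alpha_K} \exp\left(-\frac{\rho^2}{2\alpha_K^2}\right).$$

For the second-order cone, $\alpha_{L^{n+1}} = \sqrt 2$, and for the symmetric positive semidefinite cone, $\alpha_{S^m_+} = \sqrt m$.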


While different V lead to the same probability bounds, 
some choices of V may lead to better objectives. The 
following theorem suggests an iterative improvement 
strategy. 


Theorem 4 For any $V \in \mathrm{int}(K)$, if $X$, $t$ and $y > 0$ are feasible in (13), then they are also feasible in the same problem in which $V$ is replaced by

$$W = \frac{Y_0}{\rho y}.$$
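Theorem 4 can be illustrated for $K = \Re^m_+$, where every conic constraint in (13) is a componentwise inequality. In the Python sketch below (hypothetical data and names), the same $(t, y)$ that is feasible with $V = (1, 1, 1)$ stays feasible after replacing $V$ by $W = Y_0/(\rho y)$, and the first constraint of (13) then holds with equality.

```python
import math


def feasible_13_orthant(y0, ys, v, rho, t, y, tol=1e-12):
    """Constraints (13) for K = R^m_+, checked componentwise with a small tolerance."""
    if any(a - rho * y * b < -tol for a, b in zip(y0, v)):
        return False
    for yj, tj in zip(ys, t):
        if any(a + tj * b < -tol or -a + tj * b < -tol for a, b in zip(yj, v)):
            return False
    return math.sqrt(sum(tj * tj for tj in t)) <= y + tol


# Hypothetical data: N = 2 uncertain directions in R^3, reference vector V = (1, 1, 1).
ys = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
v = [1.0, 1.0, 1.0]
rho = 2.0
t = [max(abs(a) / b for a, b in zip(yj, v)) for yj in ys]   # smallest t_j allowed by (13)
y = math.sqrt(sum(tj * tj for tj in t))
y0 = [rho * y + 0.3, rho * y + 0.1, rho * y + 0.5]
w = [a / (rho * y) for a in y0]                             # W = Y_0 / (rho * y)

print(feasible_13_orthant(y0, ys, v, rho, t, y))   # feasible with V
print(feasible_13_orthant(y0, ys, w, rho, t, y))   # still feasible with W
```

Feasibility carries over because the first constraint of (13) gives $W - V = (Y_0 - \rho y V)/(\rho y) \in K$, and $t_j \ge 0$.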


While we focus on the primitive uncertainty vector Z 
being normally distributed, using the large deviation 
bounds of Nemirovski [12], we can also apply the same 
framework to other distributions. The interested reader 
may refer to Bertsimas and Sim [5]. 
Biographical Sketch


Archimedes (287-212 B.C.) was a famous Greek math- 
ematician, engineer and philosopher. Born in the city 
of Syracuse on the Island of Sicily to an astronomer 
and mathematician named Phidias, Archimedes spent 
the first years of his life in his home city and went to 
Alexandria in Egypt to study mathematics. He soon 
became friends with Konon of Samos and Eratos- 
thenes. After spending a considerable amount of time 


in Alexandria, he returned to Syracuse, where he remained for the rest of his life conducting mathematical research. He had good relations with King Hieron of Syracuse and his son Gelon. We know that he assisted King Hieron numerous times, either with his inventions during the Second Punic War or, in peacetime, by solving problems such as the well-known case of King Hieron’s crown (the one in which Archimedes jumped out of his bathtub crying out “eureka”).

In this article we will concentrate on the work of 
Archimedes, which is closely related to what we call 
today industrial engineering (including the mathemat- 
ical theory of optimization, operations research, the- 
ory of algorithms, etc.). In particular, we will present 
Archimedes’ definition of convex sets, his method of 
exhaustion for computing finite integrals, his contribu- 
tion to recursive algorithms, and his approach to solv- 
ing real-life operations research problems during the 
Second Punic War. 


Archimedes’ Work 


One very important concept for optimization is the 
definition of convex sets. The first such definition was 
given by Euclid in his Elements, but Archimedes elaborated on it and gave his own definition, which was used until the first decades of the 20th century. In his work On the sphere and the cylinder he gives
the following definition of the convex arc: 


Definition 1 I call convex in one and the same direction the surfaces for which the straight line joining two arbitrary points lies on the same side of the surface.


In his work On the equilibrium of planes he gives a definition of the convex set using the center-of-gravity concept:


Definition 2 In any figure whose perimeter is convex 
the center of gravity must be within the figure. 


It is worth mentioning that Archimedes’ definitions of 
convex arcs and convex sets were those used until 1913, 
when E. Steinitz introduced the modern definitions of 
convexity. 

Archimedes invented a geometrical method called the method of exhaustion (or method of infinitesimals) in order to be able to compute areas under convex curves. This was one of the first geometrical methods devised to compute what we call today definite integrals. In modern notation Archimedes was able to compute

$$\int_a^b [f(x) - g(x)]\, dx, \tag{1}$$

where $f(x)$ is a line segment and $g(x)$ a convex function (usually a parabola). An illustration of this method can be found in Fig. 1.

Archimedes and the Foundations of Industrial Engineering, Figure 1
Illustration of Archimedes’ exhaustion method

Suppose that we want to compute the area above a curve and below the line segment AB. Archimedes considered the triangle ABC, where C is the point of the curve below the midpoint M of the line segment AB (MC is the vertical line through the midpoint of AB). If we iteratively repeat this process on the segments AC and CB, we can see that the next two parabolic triangles together have an area that is $\frac14$ of the initial triangle. Therefore, the area of the parabolic segment was the infinite sum $1 + \frac14 + \frac1{16} + \cdots = \frac43$, where 1 corresponds to the area of the initial triangle ABC. In this way Archimedes was able to geometrically approximate the area of a convex parabolic curve.
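Archimedes' construction can be reproduced numerically. In the Python sketch below (an illustration with the parabola $g(x) = x^2$ and the chord over $[0, 1]$, both our choices), each level of refinement adds inscribed triangles whose total area is one quarter of the previous level, so the partial sums approach $\frac43$ of the first triangle, i.e. the exact value $\int_0^1 (x - x^2)\,dx = \frac16$.

```python
def triangle_area(p, q, r):
    """Area of the triangle with vertices p, q, r given as (x, y) pairs."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def exhaustion_area(g, a, b, depth):
    """Total area of the inscribed triangles on [a, b]: the triangle on the chord with
    its third vertex on the curve above the midpoint, then the same construction on
    both halves, `depth` more times."""
    m = 0.5 * (a + b)
    area = triangle_area((a, g(a)), (m, g(m)), (b, g(b)))
    if depth == 0:
        return area
    return area + exhaustion_area(g, a, m, depth - 1) + exhaustion_area(g, m, b, depth - 1)


def parabola(x):
    return x * x


for depth in (0, 1, 2, 12):
    print(depth, exhaustion_area(parabola, 0.0, 1.0, depth))
print("exact", 1.0 / 6.0)    # integral of (x - x^2) over [0, 1]
```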

According to [7], Archimedes was the first (around 220 B.C.) to use a double recursive algorithm, in order to solve the problem of the sand reckoner (Psammitis). In this book he tries to come up with a number that is much larger than the number of grains of sand in the world and therefore prove that the number of grains of sand in the world is not infinite. For this he fixes a number $a$ and defines the numbers $p_k(x)$ as follows (using a double recursion scheme):

$$p_0(x) = 1, \qquad p_{k+1}(0) = p_k(a), \qquad p_{k+1}(x+1) = a\, p_{k+1}(x). \tag{2}$$

Therefore, $p_k(x) = a^{(k-1)a + x}$ for $k \ge 1$. Then he considers $p_a(a)$ for $a = 10^8$, which was the largest number known at that time, and he comes up with the number $10^{8 \cdot 10^{16}}$, which was the largest number used in mathematics until 1933.
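The double recursion (2) translates directly into code. The Python sketch below (names ours) implements it with exact integers, checks the closed form for small hypothetical values of $a$, and computes the decimal exponent of $p_a(a) = a^{a^2}$ for Archimedes' choice $a = 10^8$.

```python
def p(k, x, a):
    """Double recursion (2) for non-negative integers k, x, a."""
    if k == 0:
        return 1                      # p_0(x) = 1
    if x == 0:
        return p(k - 1, a, a)         # p_{k+1}(0) = p_k(a)
    return a * p(k, x - 1, a)         # p_{k+1}(x + 1) = a * p_{k+1}(x)


def closed_form(k, x, a):
    """p_k(x) = a ** ((k - 1) * a + x) for k >= 1."""
    return a ** ((k - 1) * a + x)


# Archimedes' case: a = 10**8, so p_a(a) = a ** (a * a) = 10 ** (8 * a * a).
a = 10 ** 8
exponent_of_ten = 8 * a * a
print(p(3, 3, 3), closed_form(3, 3, 3), exponent_of_ten)
```

The recursion itself is only practical for tiny arguments; for $a = 10^8$ the closed form is the only feasible route.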

Apart from Archimedes’ exceptional skills in the- 
oretical research, he also became famous for his abil- 
ity to deal with everyday life problems. Although op- 
erations research was developed during World War II, 
when mathematicians were looking for ways to make 
better decisions in utilizing certain materials subject to 
some constraint, some consider Archimedes the father 
of operations research as he helped his home city de- 
fend itself against the Romans during the Second Punic 
War. 

Before King Hieron died, he asked Archimedes to 
organize the complete defense of Syracuse against the Roman general Marcellus. Archimedes is said to have invented many mechanical war machines, such as the claw of
Archimedes, a new version of catapult, an array of mir- 
rors that was able to burn enemy ships, etc. 

Archimedes was also responsible for organizing the 
defense of Syracuse and the redecoration of Fort Eu- 
ryalus [6]. Due to Archimedes’ clever defense plans, 
Syracuse managed to survive the Roman siege for 
2 years. 


Conclusion 


Archimedes was a perfect example of a scientist who 
managed to combine theoretical research with practical 
problem solving. He managed to distinguish between 
the two by referring to his mechanical inventions as 
parergon. This shows that Archimedes was capable of 
performing both basic and applied research, but he re- 
garded basic research as more important. In this sense 
he can be considered the father of the modern indus- 
trial engineer who utilizes theoretical methods to solve 
problems that arise in everyday life. 
Introduction


Asset Liability Management (ALM) is an important di- 
mension of risk management, where the exposure to 


various risks is minimized while maintaining the ap- 
propriate combination of asset and liability, in order to 
satisfy the goals of the firm or the financial institution 
(Kosmidou and Zopounidis [18]). 

Up to the 1960s, liability management was aimless. Most banking institutions considered liabilities as exogenous factors that constrained asset management. Indeed, for a long period the greater part of capital resources originated from savings deposits and deposits with agreed maturity.

Nevertheless, the financial system has changed radically. Competition among banks for obtaining capital has become intense. Liability management is the main component of each bank’s strategy to ensure the cheapest possible financing. At the same time, decisions regarding the level of capital adequacy have become more important. Indeed, adequate bank equity contributes to the elimination of bankruptcy risk, a situation in which the bank cannot satisfy its debts towards clients who make deposits or others who take out loans. Moreover, the capital adequacy of banks is influenced by changes in stock prices relative to the size of the capital stock portfolio. Finally, maintaining a minimum amount of equity is an obligation of commercial
banks to the Central Bank for supervisory reasons. It is 
worth mentioning that based on the last published data 
(31/12/2001) the Bank of Greece assigns the coefficient 
for the Tier 1 capital at 8%, while the corresponding Eu- 
ropean average is equal to 6%. This results in the con- 
figuration of the capital adequacy of the Greek banking 
system at higher levels than the European average rate. 
The high capital adequacy index denotes large margins for improving profitability, which reduces the risk of a systemic crisis.

Asset management in a contemporary bank cannot 
be distinct from liability management. The simultane- 
ous management of assets and liabilities, in order to 
maximize the profits and minimize the risk, demands 
the analysis of a series of issues. 

Firstly, there is the substantive issue of strategic 
planning and expansion. That is, the evaluation of the 
total size of deposits that the bank wishes to attract and 
the total number of loans that it wishes to provide. 

Secondly, there is the issue of determination of the 
“best temporal structure” of the asset liability manage- 
ment, in order to maximize the profits and to ensure 
the robustness of the bank. Deposits cannot all be liquidated in the same way. From the point of view of assets, the loans and various placements in securities constitute commitments of the bank’s funds with different durations. The coordination of the temporal structure of the asset liability management is of major importance in order to avoid problems of temporary liquidity shortage, which might be very damaging.

Thirdly, there is the issue of risk management of assets and liabilities. The main focus is placed on the assets, where the quality of the loan portfolio (credit risk) and of the securities portfolio (market risk) is more easily measurable.

Fourthly, there is the issue of configuring an integrated tariff (pricing schedule) covering the entire range of bank operations. It mainly concerns the determination of the interest rates on all loans and deposits, as well as the various commissions that the bank charges for specific intermediation services. In a perfectly competitive bank market there is no pricing issue, since banks take prices as given. The same holds when all interest rates and commissions are set by the monetary authorities, as was the situation in Greece before the liberalization of the banking system.

In reality, bank markets have the basic characteristics of monopolistic competition. Thus, designing a system of differentiated pricing and product diversification is of major importance. On the asset side, the problem of differentiated pricing is connected to risk management: banks commonly set the interest rate charged to each borrower so that it increases in proportion to the risk they assess in that case. The product diversification policy covers all loan and deposit products and is based on thorough research that ensures the best possible knowledge of market conditions.

Lastly, the management of operating cost and technology constitutes an important issue. The collaboration of well-selected and highly skilled personnel, together with contemporary computerization systems and other technological applications, is an important element in creating a low-cost bank. This gives the bank a significant competitive advantage over other banks, which can ultimately be expressed through a more aggressive policy of attracting loans and deposits with low loan interest rates and high deposit interest rates. The result of this policy is an increase in market share. However, the ability of a bank to absorb the best strategic technological innovations depends on its human resources management.

The present research focuses on the study of bank asset liability management. There are many reasons for studying bank asset liability management as an application of ALM. Firstly, bank asset/liability management has always been of concern to bank managers, but in recent years, and especially today, its importance has grown steadily. The development of information technology has raised public awareness to the point that the bank's performance, its policies and its management are closely monitored by the press and by the bank's competitors, shareholders and customers, and thereby strongly affect the bank's public standing.

The increasing competition in the national and in- 
ternational banking markets, the changeover towards 
the monetary union and the new technological innova- 
tions herald major changes in the banking environment 
and challenge all banks to make timely preparations in 
order to enter into the new competitive monetary and 
financial environment. 

All the above drove banks to seek out greater effi- 
ciency in the management of their assets and liabilities. 
Thus, the central problem of ALM revolves around the 
bank’s balance sheet and the main question that arises 
is: What should be the composition of a bank’s assets 
and liabilities on average given the corresponding re- 
turns and costs, in order to achieve certain goals, such 
as maximization of the bank’s gross revenues? 

It is well known that finding an appropriate balance 
between profitability, risk and liquidity considerations 
is one of the main problems in ALM. The optimal bal- 
ance between these factors cannot be found without 
considering important interactions that exist between 
the structure of a bank’s liabilities and capital and the 
composition of its assets. 

Bank asset/liability management is defined as the simultaneous planning of all asset and liability positions on the bank's balance sheet, taking into account the different banking and bank management objectives as well as legal, managerial and market constraints. Banks seek to maximize profit and minimize risk.

Taking into consideration all the above, the purpose of this paper is to develop a goal programming system in a stochastic environment, focusing mainly on interest rate risk. This system allows the board of directors and the managers of the bank to examine various scenarios related to the bank's future economic course, aiming mainly at managing the risks that emerge from changes in market parameters.

The rest of the paper is organized as follows. The 
next section includes a brief overview of bank ALM 
techniques. Section “Model” outlines the methodology 
used and describes the development of the ALM deci- 
sion support system. Finally, the conclusions of the pa- 
per as well as future research perspectives are discussed 
in the last section. 


Background 


Looking to the past, we find the first mathematical 
models in the field of bank management. Asset and 
liability management models can be deterministic or 
stochastic (Kosmidou and Zopounidis [17]). 

Deterministic models use linear programming, as- 
sume particular realizations for random events, and 
are computationally tractable for large problems. The deterministic linear programming model of Chambers and Charnes [6] was the pioneering model in ALM. Chambers and Charnes were concerned with formulating, exploring and interpreting the uses of a mathematical programming model that expresses the actual conditions of current operations more realistically than past efforts. Their model corresponds to the problem of determining an optimal portfolio for an individual bank over several time periods, in accordance with requirements laid down by bank examiners, which are interpreted as defining the limits within which the level of risk associated with the return on the portfolio is acceptable.

Cohen and Hammer [9], Robertson [31], Lifson and Blackman [23], and Fielitz and Loeffler [14] have successfully applied Chambers and Charnes' model. Even though these models differ in their treatment of disaggregation, uncertainty and dynamic considerations, they all share the fact that they are specified to optimize a single objective (profit) function subject to the relevant linear constraints.

Eatman and Sealey [12] developed a multiobjective linear programming model for commercial bank balance sheet management considering profitability and solvency objectives subject to policy and managerial constraints.

Giokas and Vassiloglou [15] developed a goal-pro- 
gramming model for bank asset and liability manage- 
ment. They supported the idea that apart from at- 
tempting to maximize revenues, management tries to 
minimize risks involved in the allocation of the bank’s 
capital, as well as to fulfill other goals of the bank, such 
as retaining its market share, increasing the size of its 
deposits and loans, etc. Conventional linear program- 
ming is unable to deal with this kind of problem, as it 
can only handle a single goal in the objective function. 
Goal programming is one of the most widely used approaches for solving large-scale multi-criteria decision making problems.

Apart from the deterministic models, several stochastic models have been proposed since the 1970s. These models, which include the use of chance-constrained programming [7,8,29], dynamic programming [13,25,26,32], sequential decision theory [3,35] and stochastic linear programming under uncertainty [2,10,11,16], presented computational difficulties. Most stochastic models originate from the portfolio selection theory of Markowitz [24] and are known as static mean-variance methods. Pyle [30] and Brodt [4] adapted Markowitz's theory: the former presented a balance sheet management plan that considers only the risk of the portfolio and not other possible uncertainties, while the latter presented an efficient dynamic balance sheet management plan that maximizes profits for a given amount of risk over a multi-period planning horizon.

Wolf [35] proposed the sequential decision theo- 
retic approach that employs sequential decision anal- 
ysis to find an optimal solution through the use of im- 
plicit enumeration. 

An alternative stochastic approach is stochastic linear programming with simple recourse. Kusy and Ziemba [19] employed a multi-period stochastic linear program with simple recourse to model the management of assets and liabilities in banking while maintaining computational feasibility. Their results indicate that the proposed ALM model is theoretically and operationally superior to a corresponding deterministic linear programming model and that the computational effort required for its implementation is comparable to that of the deterministic model. Another application of multistage stochastic programming is the Russell-Yasuda Kasai model [5], which aims at maximizing the long-term wealth of the firm while producing high income returns.

Mulvey and Vladimirou [27] used dynamic general- 
ized network programs for financial planning problems 
under uncertainty and they developed a model in the 
framework of multi-scenario generalized network that 
captures essential features of various discrete time fi- 
nancial decision problems. 

Finally, Mulvey and Ziemba [28] present a more de- 
tailed overview of various asset and liability modeling 
techniques, including models for individuals and finan- 
cial institutions such as banks and insurance compa- 
nies. 

Moreover, over the years, many models have been 
developed in the area of financial analysis and fi- 
nancial planning techniques. Kvanli [20], Lee and 
Lerro [22], Lee and Chesser [21], Baston [1] and Sharma et al. [34], among others, have applied goal programming
to investment planning. Giokas and Vassiloglou [15], 
Seshadri et al. [33] presented bank models using goal 
programming. These studies focus on the areas of bank- 
ing and financial institutions and they use data from the 
bank financial statements. 


Model 


Kosmidou and Zopounidis [18] developed an asset liability management (ALM) methodology in a stochastic interest rate environment in order to select the best strategies for bank financial planning. The ALM model was developed through goal programming over a one-year time horizon. The model used balance sheet and income statement information for the previous year t to produce an ALM strategy for the year t + 1. As far as the model variables are concerned, we used variables familiar to management, which facilitated the specification of the constraints and goals. For example, goals concerning measurements such as liquidity, return and risk have to be expressed in terms of the variables used.

More precisely, the asset liability management model that was developed can be expressed as follows:


$$\min z = \sum_{k \in P} p_k \left( d_k^- + d_k^+ \right) \tag{1}$$

subject to constraints:

$$K_{X'} \le X' \le A_{X'} \tag{2}$$

$$K_{Y'} \le Y' \le A_{Y'} \tag{3}$$

$$\sum_{i=1}^{n} X_i = \sum_{j=1}^{m} Y_j \tag{4}$$

$$\sum_{j \in \Pi_Y} Y_j - a \sum_{i \in E_X} X_i = 0 \tag{5}$$

$$\sum_{j \in \Pi_1} Y_j - k_1 \sum_{i \in E} w_i X_i + d_1^- - d_1^+ = 0 \tag{6}$$

$$\sum_{i \in E_1} X_i - k_2 \sum_{j \in \Pi_2} Y_j + d_2^- - d_2^+ = 0 \tag{7}$$

$$\sum_{i=1}^{n} R_i^X X_i - \sum_{j=1}^{m} R_j^Y Y_j + d_3^- - d_3^+ = k_3 \tag{8}$$

$$\sum_{i \in E_p} X_i + d_p^- - d_p^+ = L_p, \quad \forall p \in P \tag{9}$$

$$X_i,\ Y_j,\ d_k^-,\ d_k^+ \ge 0, \quad \forall i = 1,\dots,n,\ j = 1,\dots,m,\ k \in P \tag{11}$$


where

$X_i$: the $i$-th asset variable, $i = 1,\dots,n$, where $n$ is the number of asset variables

$Y_j$: the $j$-th liability variable, $j = 1,\dots,m$, where $m$ is the number of liability variables

$K_{X'}$ ($K_{Y'}$): the lower bound of specific asset accounts $X'$ (liability accounts $Y'$)

$A_{X'}$ ($A_{Y'}$): the upper bound of specific asset accounts $X'$ (liability accounts $Y'$)

$E_X$: specific categories of asset accounts

$\Pi_Y$: specific categories of liability accounts

$a$: the desirable value of specific asset and liability items

$\Pi_1$: the liability set, which includes the equity

$E$: the set of assets

$w_i$: the degree of riskiness of the asset items

$k_1$: the solvency ratio, as defined by the European Central Bank




$k_2$: the liquidity ratio, as defined by the bank policy

$E_1$: the set of asset items, which includes the loans

$\Pi_2$: the set of liability items, which includes the deposits

$R_i^X$: the expected return of asset $i$, $i = 1,\dots,n$

$R_j^Y$: the expected return of liability $j$, $j = 1,\dots,m$

$k_3$: the expected value for the goal of asset and liability return

$P$: the set of goals imposed by the bank

$L_p$: the desirable value for goal constraint $p$, defined by the bank

$d_k^+$: the over-achievement of goal $k$, $k \in P$

$d_k^-$: the under-achievement of goal $k$, $k \in P$

$p_k$: the priority degree (weight) of goal $k$


Certain constraints are imposed by banking regulation on particular categories of accounts. Specific categories of asset accounts (X') and liability accounts (Y') are identified, and the minimum and maximum allowed limits for these categories are set according to the strategy and policy that the bank intends to follow (constraints 2-3).

The structural constraints (4-5) include those that shape the structure of the balance sheet and, in particular, ensure that the balance sheet identity Assets = Liabilities + Net Capital holds.

The bank management should determine specific goals, such as the desirable structure of the institution's assets and liabilities for the surplus and deficit units, balancing low cost against high return. The structure of assets and liabilities is significant, since it quickly affects the income and profits of the bank.

Referring to the goals of the model, the solvency goal (6) is used as a risk measure and is defined as the ratio of the bank's equity capital to its total weighted assets. The weighting of the assets reflects their respective risk, with greater weights corresponding to a higher degree of risk. This hierarchy is established by assigning degrees of significance to the asset and liability variables: the variables with the largest degrees of significance correspond to the categories of balance sheet accounts with the highest risk.

Moreover, a basic policy of commercial banks is the management of their liquidity, and specifically the measurement of their liquidity needs in relation to the evolution of deposits and loans. The liquidity goal (7) is defined as the ratio of liquid assets to current liabilities and indicates the liquidity risk, that is, the ability of the bank to meet its current liabilities with a safety margin that allows for a probable reduction in the value of some current assets.
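The solvency and liquidity goals can be made concrete for a fixed balance sheet. The Python sketch below is a minimal illustration, not the authors' implementation: it assumes the linearized goals $\sum_{j \in \Pi_1} Y_j - k_1 \sum_{i \in E} w_i X_i + d_1^- - d_1^+ = 0$ and $\sum_{i \in E_1} X_i - k_2 \sum_{j \in \Pi_2} Y_j + d_2^- - d_2^+ = 0$, and it computes the deviation variables for given asset and liability values. The function names and all balance sheet figures are hypothetical.

```python
def goal_deviations(achieved, target):
    """Return (d_minus, d_plus) with achieved + d_minus - d_plus == target.

    Only one deviation is nonzero, as in an optimal goal programming
    solution where both deviations are penalized in the objective.
    """
    gap = target - achieved
    return max(gap, 0.0), max(-gap, 0.0)


def solvency_lhs(pi1_liabilities, assets, risk_weights, k1):
    # Assumed linearization of the solvency goal:
    # sum_{j in Pi_1} Y_j - k1 * sum_{i in E} w_i X_i
    weighted_assets = sum(w * x for w, x in zip(risk_weights, assets))
    return sum(pi1_liabilities) - k1 * weighted_assets


def liquidity_lhs(e1_assets, pi2_liabilities, k2):
    # Assumed linearization of the liquidity goal:
    # sum_{i in E_1} X_i - k2 * sum_{j in Pi_2} Y_j
    return sum(e1_assets) - k2 * sum(pi2_liabilities)


# Hypothetical balance sheet figures (e.g. in millions of euros).
solv = solvency_lhs([10.0], [100.0, 50.0], [1.0, 0.5], k1=0.10)
liq = liquidity_lhs([60.0, 20.0], [100.0], k2=0.70)
print(goal_deviations(solv, 0.0))  # capital shortfall appears as d_1^-
print(goal_deviations(liq, 0.0))   # excess liquidity appears as d_2^+
```

In the goal programming model itself the deviations are decision variables chosen by the solver together with $X_i$ and $Y_j$; here they are only evaluated for a fixed strategy.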

Furthermore, the bank aims at maximizing its efficiency, that is, achieving the largest possible profit from the best placement of its funds. Its aim is the maximization of its profitability, and therefore precise and consistent decisions should be taken in bank management. These decisions must account for the combined effect of all the variables that enter the calculation of profits. This decision making emphasizes several selected variables related to bank management, such as the management of the difference between the asset return and the liability cost, the expenses, liquidity management and capital management. The goal (8) determines the total expected return based on the expected returns of all the assets $R^X$ and liabilities $R^Y$.
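The return goal and the weighted objective can be evaluated in the same spirit. This sketch additionally assumes that the return goal is linear with liability returns entering as costs, $\sum_i R_i^X X_i - \sum_j R_j^Y Y_j + d_3^- - d_3^+ = k_3$, and that the objective is $z = \sum_{k \in P} p_k (d_k^- + d_k^+)$; the sign convention for the liability term, the target $k_3$, the priorities $p_k$ and all figures are hypothetical.

```python
def net_expected_return(asset_returns, assets, liability_returns, liabilities):
    # Assumed left-hand side of the return goal:
    # sum_i R_i^X X_i - sum_j R_j^Y Y_j (liability returns treated as costs).
    income = sum(r * x for r, x in zip(asset_returns, assets))
    cost = sum(r * y for r, y in zip(liability_returns, liabilities))
    return income - cost


def goal_deviations(achieved, target):
    # Split the gap into under-achievement d^- and over-achievement d^+.
    gap = target - achieved
    return max(gap, 0.0), max(-gap, 0.0)


def weighted_objective(priorities, deviations):
    # z = sum_k p_k (d_k^- + d_k^+)
    return sum(p * (dm + dp) for p, (dm, dp) in zip(priorities, deviations))


ret = net_expected_return([0.06, 0.04], [100.0, 50.0], [0.02], [120.0])
d3 = goal_deviations(ret, target=6.0)  # hypothetical k_3 = 6.0
# Hypothetical priorities for the solvency, liquidity and return goals,
# with hypothetical deviations for the first two goals.
z = weighted_objective([3.0, 2.0, 1.0], [(2.5, 0.0), (0.0, 10.0), d3])
print(ret, d3, z)
```

Raising a priority $p_k$ makes the solver trade other goals off in favour of goal $k$; an optimizer would minimize $z$ over $X_i$, $Y_j$ and the deviations rather than evaluate it for one fixed strategy as here.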

Beside the goals of solvency, liquidity and return of 
assets and liabilities, the bank could determine other 
goals that concern specific categories of assets and li- 
abilities, in proportion to the demands and preferences 
of the bank managers. These goals are the deposit goal, 
the loan goal and the goal of asset and liability return. 

Raising capital, especially through deposits, constitutes a major part of commercial bank management. All sorts of deposits constitute the major source of capital for commercial banks, enabling them to finance the economy through the financing of firms. Thus, special significance is given to the deposit goal.

The goal of asset and liability return defines the goal 
for the overall expected return of the selected asset- 
liability strategy over the year of the analysis. 

Finally, there are goals reflecting that variables such as cash, cheques receivable, deposits with the Bank of Greece and fixed assets should remain at the levels of previous years. More specifically, fixed assets are the permanent assets that have a physical existence, such as buildings, machines, premises and equipment. Intangible assets are fixed assets that have no physical existence but constitute rights and benefits. They have significant economic value, which is sometimes larger than the value of the tangible fixed assets. These items are stable in character and are used productively by the bank for its regular operation and the achievement of its objectives. Since fixed assets, tangible or intangible, are presented in the balance sheet at their book value, that is, their initial cost minus the accumulated depreciation to date, it is assumed that their value does not change during the application of the present methodology.

At this point, Kosmidou and Zopounidis [18] took 
into account that the banks should manage the interest 
rate risk, the operating risk, the credit risk, the market 
risk, the foreign exchange risk, the liquidity risk and the 
country risk. 

More specifically, interest rate risk reflects the effect on the net interest margin between lending and deposit rates of changes in the prevailing interest rates of assets and liabilities. When interest rates fall, banks achieve higher profits, since they can refinance their liabilities at lower borrowing costs; the reverse holds when borrowing costs are high. Changes in inflation also have a relevant impact on these types of risk.

Considering interest rate risk as the basic uncertainty parameter in determining a bank asset liability management strategy, the crucial question that arises is how this source of uncertainty affects the profitability of a pre-specified strategy. Estimating the expected return of the pre-specified strategy and its variance can provide a satisfactory answer to this question.

The use of Monte Carlo techniques is a particularly widespread approach for estimating this information (the expected return and variance of bank asset liability management strategies). Monte Carlo simulation consists of generating various random scenarios for the uncertain variables (interest rates) and estimating the essential statistical measures (expected return and variance), which describe the effect of interest rate risk on the selected strategy. The general procedure for implementing Monte Carlo simulation on this basis is presented in Fig. 1.

[Figure 1 (flowchart): Determination of the uncertain parameters (interest rates) → Determination of the statistical distribution → Development of random scenarios → Evaluation of the strategy for each scenario → Determination of appropriate statistical measures]

Asset Liability Management Decision Support System, Figure 1
General Monte Carlo simulation procedure for the evaluation of the asset liability management strategies

During the first stage of the procedure, the various categories of interest rate risk are identified. The risk and return of the various bank asset and liability items are determined by different forms of interest rates. For example, a bank's investments in government or corporate bonds are determined by the interest rates prevailing in the bond market, which are affected both by the general economic environment and by supply and demand. Similarly, the bank's deposits and loans are determined by the corresponding deposit and loan interest rates, which the bank sets according to the conditions prevailing in the banking market. At this stage, the categories of interest rates that constitute crucial uncertain variables for the analysis are identified. These categories depend on the type of bank. For example, for a commercial bank the deposit and loan interest rates play the main role, whereas for an investment bank more emphasis is given to the interest rates and returns of various investment products (repos, bonds, interest-bearing notes, etc.).

After the various categories of interest rates that determine the total interest rate risk have been identified, the second stage of the analysis determines the statistical distribution followed by each of these categories.

Having determined the statistical distribution that describes the uncertain variables of the analysis (interest rates), a series of independent random scenarios is generated through a random number generator. Generally, the larger the number of scenarios, the more reliable the conclusions that can be derived. However, the computational effort increases significantly, since for each scenario the optimal asset liability strategy must be determined and, moreover, evaluated under every other scenario. Thus, the number N of simulations (scenarios) should be chosen taking into account both the reliability of the results and the available computational resources.
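Stages two and three can be sketched as follows. The source does not fix the distribution, so this example assumes that each interest rate category is normally distributed and independent of the others; the function name `generate_scenarios` and the parameter values are hypothetical.

```python
import random


def generate_scenarios(n_scenarios, means, stdevs, seed=None):
    """Draw N independent interest-rate scenarios.

    Each scenario holds one simulated rate per interest-rate category
    (e.g. deposit rate, loan rate). Assumes independent normal
    distributions; any other distribution could be substituted.
    """
    rng = random.Random(seed)  # seeded generator for reproducible runs
    return [
        [rng.gauss(mu, sigma) for mu, sigma in zip(means, stdevs)]
        for _ in range(n_scenarios)
    ]


# Hypothetical parameters: deposit rate ~ N(3%, 0.5%), loan rate ~ N(6%, 1%).
scenarios = generate_scenarios(1000, [0.03, 0.06], [0.005, 0.01], seed=42)
print(scenarios[0])
```

Normal draws can occasionally produce negative rates; where that matters, a lognormal distribution or a term-structure model may be a better choice for stage two.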

For each interest rate scenario $s_i$ ($i = 1, 2, \dots, N$), the optimal asset liability management strategy $Y_i$ is determined by solving the goal programming problem. This strategy is not expected to be optimal for the other scenarios $s_j$ ($j \neq i$). Therefore, the results obtained by implementing strategy $Y_i$ under the remaining $N-1$ possible scenarios $s_j$ should be evaluated. The results can be evaluated in various ways; the most usual one uses the return. Denoting by $r_{ij}$ the outcome (return) of strategy $Y_i$ under scenario $s_j$, the expected return $\bar{r}_i$ of the strategy can easily be determined from all the other $N-1$ scenarios $s_j$ ($j \neq i$) as follows:

$$\bar{r}_i = \frac{1}{N-1} \sum_{j=1,\, j \neq i}^{N} r_{ij} \tag{12}$$

At the same time, the variance $\sigma_i^2$ of the return can be determined as a risk measure of strategy $Y_i$, as follows:

$$\sigma_i^2 = \frac{1}{N-1} \sum_{j=1,\, j \neq i}^{N} \left( r_{ij} - \bar{r}_i \right)^2 \tag{13}$$

These two statistical measures (mean and variance) help to draw useful conclusions about the expected efficiency of the asset liability management strategy, as well as the risks that it carries. Moreover, they can be used to extend the analysis to other useful statistical information, such as the confidence interval for the expected return, the quantiles, etc.
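The cross-evaluation step can be sketched as below. Each strategy's mean and variance are taken over the $N-1$ off-diagonal scenarios, dividing by $N-1$ in both cases; the confidence interval uses a normal approximation, which is an added assumption and only a rough guide for small $N$. The name `evaluate_strategies` and the return matrix are hypothetical; in the methodology each $r_{ij}$ would come from applying the goal programming solution $Y_i$ under scenario $s_j$.

```python
import math
from statistics import NormalDist


def evaluate_strategies(returns, confidence=0.95):
    """Cross-evaluate N strategies over N scenarios.

    returns[i][j] is r_ij, the return of strategy Y_i (optimal for
    scenario s_i) under scenario s_j. The diagonal entry is skipped.
    Returns (mean, variance, (ci_low, ci_high)) for each strategy.
    """
    n = len(returns)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    results = []
    for i in range(n):
        others = [returns[i][j] for j in range(n) if j != i]
        mean = sum(others) / (n - 1)                          # mean over j != i
        var = sum((r - mean) ** 2 for r in others) / (n - 1)  # variance over j != i
        half_width = z * math.sqrt(var / (n - 1))  # normal approximation
        results.append((mean, var, (mean - half_width, mean + half_width)))
    return results


# Hypothetical 3 x 3 matrix of returns r_ij.
r = [[0.0, 2.0, 4.0],
     [1.0, 0.0, 1.0],
     [3.0, 5.0, 0.0]]
for i, (mean, var, ci) in enumerate(evaluate_strategies(r)):
    print(f"Y_{i + 1}: mean={mean:.2f} variance={var:.2f} CI={ci}")
```

A strategy with a slightly lower mean but a much smaller variance may be preferred by a risk-averse decision maker; the two measures are reported side by side rather than combined into a single score.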


Conclusions 


The banking business has recently become more sophisticated due to technological expansion, economic development, the creation of financial institutions and increased competition. Moreover, the mergers and acquisitions that have taken place in recent years have created large groups of banking institutions. The success of a bank depends mainly on the quality of its asset and liability management, since the latter deals with the efficient management of the sources and uses of bank funds, concentrating on profitability, liquidity, capital adequacy and risk factors.

In the last two decades, modern finance has developed into a complex, mathematically challenging field. Various complicated risks exist in financial markets. For banks, interest rate risk is at the core of their business, and managing it successfully is crucial to whether or not they remain profitable. Therefore, the creation of financial risk management departments within banks has become essential. Asset liability management is closely associated with interest rate risk. Although several models exist for asset liability management, most of them focus on the general aspects and methodologies of this field and do not refer extensively to the hedging of bank interest rate risk through asset liability management. Thus, the main purpose of the present paper was to describe the development of a bank ALM decision support system, which allows the decision maker to examine various scenarios of the economic course of the bank in order to monitor its financial situation and to determine the optimal strategic composition of assets and liabilities. Moreover, we believe that the development of a bank asset liability management model that takes into account exogenous factors, the economic parameters of the market and the uncertainty of variations in financial risks has become essential.

Finally, despite the approaches described in this paper, little academic work has been done so far to develop a model for the management of assets and liabilities in the European banking industry. Based on the above, we conclude that the quality of asset liability management in the European banking system has become a significant source of competitive advantage. Therefore, the development of new technological approaches to bank asset liability management in Europe is worth further research.

Matching problems comprise an important set of problems that link the areas of graph theory and combinatorial optimization. The maximum cardinality matching problem (see below) is one of the first integer programming problems that were solved in polynomial time. Matchings are of great importance in graph theory (see [9]) as well as in combinatorial optimization (see e.g. [15]).

The matching problem and its variations arise in 
cases when we want to find an ‘optimal’ pairing of the 
members of two (not necessarily disjoint) sets. In par- 
ticular, if we are given two sets of ‘objects’ and a ‘weight’ 
for each pair of objects, we want to match the objects 
into pairs in such a way that the total weight is maximal. In graph theory, the problem is defined on a graph G = (V, E), where V is the node set of the graph, corresponding to the union of the two sets of objects, and E is
the edge set of the graph corresponding to the possible 
pairs. A pair is possible if there exists an edge between 
the corresponding nodes. A matching M is a subset of 
the edges E with the property that each node in V is in- 
cident to at most one edge in M. If each node in V is 
met by exactly one edge in M, then M is called a perfect 
matching. There exist several versions of the matching 
problem, depending on whether the graph G is bipar- 
tite or not (i.e., the two sets of objects are disjoint or 
not), and on whether we want to find the maximum size 
(cardinality) or the maximum weight of the matching. 
The book [1] gives several applications of the matching 
problem. 
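The definitions of a matching and a perfect matching translate directly into checks on an edge list. In this small sketch the helper names are hypothetical, edges are given as node pairs and are assumed to belong to E already.

```python
def is_matching(edges):
    """True if no node is incident to more than one edge in `edges`."""
    seen = set()
    for u, v in edges:
        # A self-loop, or an endpoint already covered, violates the definition.
        if u == v or u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_perfect_matching(nodes, edges):
    """True if `edges` is a matching that meets every node exactly once."""
    covered = {x for edge in edges for x in edge}
    return is_matching(edges) and covered == set(nodes)


# Path graph 1-2-3-4.
print(is_matching([(1, 2), (3, 4)]), is_perfect_matching([1, 2, 3, 4], [(1, 2), (3, 4)]))
print(is_matching([(1, 2), (2, 3)]))
```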


Maximum Cardinality Bipartite Matching 
Problem 


The graph G is bipartite if the node set V can be partitioned into two disjoint sets $V_1$ and $V_2$ such that no edge in E connects nodes from the same set. The maximum cardinality matching problem on a bipartite graph can be solved by several efficient algorithms with a worst-case bound of $O(\sqrt{n}\,m)$, where n is the number of nodes and m the number of edges of the graph. See [1] for details.
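The augmenting path idea behind these algorithms can be seen in a simpler method. The sketch below is Kuhn's augmenting path algorithm, which runs in $O(nm)$, not one of the $O(\sqrt{n}\,m)$ algorithms mentioned above (such as Hopcroft-Karp); the function name and the example graph are hypothetical.

```python
def max_bipartite_matching(left, adj):
    """Maximum cardinality matching in a bipartite graph (Kuhn's method).

    left: the nodes of V1; adj[u]: the neighbours in V2 of node u in V1.
    Returns a dict mapping each matched node of V2 to its partner in V1.
    Runs in O(n * m): one depth-first search per node of V1.
    """
    match_of = {}  # node in V2 -> matched node in V1

    def try_augment(u, visited):
        # Search for an augmenting path that starts at node u.
        for v in adj.get(u, ()):
            if v in visited:
                continue
            visited.add(v)
            # Use v if it is free, or if its current partner can be re-matched.
            if v not in match_of or try_augment(match_of[v], visited):
                match_of[v] = u
                return True
        return False

    for u in left:
        try_augment(u, set())
    return match_of


adj = {"a": [1, 2], "b": [1], "c": [2, 3]}
matching = max_bipartite_matching(["a", "b", "c"], adj)
print(sorted(matching.items()))
```

When node "b" arrives, its only neighbour 1 is already taken by "a", so the search re-routes "a" to node 2 along an augmenting path. The recursive search is fine for small graphs; large instances would need an iterative version or a raised recursion limit.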


Weighted Bipartite Matching Problem 


This problem is known as the assignment or the marriage problem. In the traditional definition it is required that the sets $V_1$ and $V_2$ be of equal size, but even if they are not, one can add ‘dummy’ nodes to the smaller set to satisfy this condition. This problem can be formulated as a zero-one linear programming problem as follows:


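$$
\begin{aligned}
\min\ & \sum_{(u,v) \in E} f(u,v)\, x_{uv} \\
\text{s.t.}\ & \sum_{v:\,(u,v) \in E} x_{uv} = 1 \quad \forall u \in V_1, \\
& \sum_{u:\,(u,v) \in E} x_{uv} = 1 \quad \forall v \in V_2, \\
& x_{uv} \in \{0,1\} \quad \forall u \in V_1,\ v \in V_2.
\end{aligned}
$$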


The assignment problem has the property that, if solved as a linear programming problem in nonnegative $x_{uv}$, it yields an integer solution, i.e., the zero-one integrality condition in the formulation is not necessary. This is so because the constraint matrix of the equations is totally unimodular, i.e., the determinant of every square submatrix of it is 0, +1 or -1. This means that if the right-hand sides of the equations are integers, as is the case in the assignment problem, then every basic feasible solution of the linear program (in particular, the vertex solution found by the simplex method) is integer.
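Total unimodularity can be checked by brute force on tiny examples. The sketch below enumerates every square submatrix (exponentially many, so it is only an illustration) and computes determinants by Laplace expansion. It compares the node-edge incidence matrix of the complete bipartite graph $K_{2,2}$, which is the constraint matrix of a 2 × 2 assignment problem, with that of a triangle, a non-bipartite graph whose 3 × 3 incidence matrix has determinant 2. The function names are hypothetical.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, a in enumerate(m[0]):
        if a:
            minor = [row[:c] + row[c + 1:] for row in m[1:]]
            total += (-1) ** c * a * det(minor)
    return total


def is_totally_unimodular(a):
    """Brute-force check that every square submatrix has det in {-1, 0, 1}."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Columns: edges u1v1, u1v2, u2v1, u2v2; rows: nodes u1, u2, v1, v2.
k22 = [[1, 1, 0, 0],
       [0, 0, 1, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1]]
# Columns: edges 12, 23, 13; rows: nodes 1, 2, 3.
triangle = [[1, 0, 1],
            [1, 1, 0],
            [0, 1, 1]]
print(is_totally_unimodular(k22), is_totally_unimodular(triangle), det(triangle))
```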

Linear programming algorithms are not as efficient 
as specialized algorithms for solving the assignment 
problem. The assignment problem is a special case of 
the minimum cost flow problem, and adaptations of al- 
gorithms for that problem that take into account the 
special structure of the assignment problem yield the 
most efficient algorithms. Probably the best known al- 
gorithm is the so called Hungarian algorithm, see [8], 
which is a primal-dual algorithm for the minimum cost 
flow problem. See [1] for details and other algorithms. 
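For reference, one common $O(n^3)$ formulation of the Hungarian method (shortest augmenting paths with dual potentials u and v) is sketched below. It is not necessarily the presentation in [8]. It assumes a square cost matrix, which the dummy-node device described above provides, and it minimizes cost. For the maximization form one would negate the weights, and pairs outside E could be given a large cost; both are assumptions of this sketch, and the function name is hypothetical.

```python
def hungarian(cost):
    """Minimum-cost perfect assignment for a square cost matrix.

    Shortest augmenting path version of the Hungarian method with dual
    potentials u (rows) and v (columns); O(n^3). Index 0 is a sentinel.
    Returns (total_cost, assignment) with assignment[row] = column.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)
    v = [0.0] * (n + 1)
    p = [0] * (n + 1)    # p[j]: row matched to column j (0 = none)
    way = [0] * (n + 1)  # way[j]: previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):  # update the dual potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augment
                break
        while True:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[r][assignment[r]] for r in range(n))
    return total, assignment


print(hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))
```

The dual potentials keep every reduced cost nonnegative, which is what makes each augmenting path a shortest path; this is the primal-dual structure the text refers to.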

Variations of bipartite matching include, among others, the order preserving assignment problem and the stable marriage problem. In the order preserving assignment problem, the assignment must be such that a prespecified order among the objects of one of the node partitions is preserved. Although the linear programming formulation of this problem is more complicated than that of the assignment problem, the problem itself is easier to solve than the assignment problem and can be solved in O(m) time, where m is the number of edges in the graph; see [2,12]. In the stable marriage problem, each object in either partition has a ranking (or preference) over the objects of the other partition, and the assignment must be such that there is no unmatched pair of objects whose members both prefer each other to the partners they are matched with. This problem can be solved in $O(n^2)$ time using a greedy algorithm (n is the number of nodes in one partition). See [1].
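A standard algorithm of this kind is the Gale-Shapley proposal algorithm, sketched below together with a stability check. Preference lists are assumed complete and both partitions of equal size; the function names and the example preferences are hypothetical.

```python
def stable_marriage(men_prefs, women_prefs):
    """Gale-Shapley proposal algorithm, O(n^2) proposals in the worst case.

    men_prefs[m] lists women in m's order of preference (most preferred
    first); women_prefs[w] likewise for men. Returns wife[m] for each man.
    """
    n = len(men_prefs)
    # rank[w][m]: position of man m in woman w's list (lower is better).
    rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
    next_choice = [0] * n  # index of the next woman each man proposes to
    wife = [None] * n
    husband = [None] * n
    free = list(range(n))
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if husband[w] is None:                   # w is free: accept provisionally
            husband[w], wife[m] = m, w
        elif rank[w][m] < rank[w][husband[w]]:   # w trades up
            rejected = husband[w]
            wife[rejected] = None
            free.append(rejected)
            husband[w], wife[m] = m, w
        else:                                    # w rejects m
            free.append(m)
    return wife


def is_stable(wife, men_prefs, women_prefs):
    """True if no man and woman both prefer each other to their partners."""
    husband = {w: m for m, w in enumerate(wife)}
    for m, prefs in enumerate(men_prefs):
        for w in prefs[: prefs.index(wife[m])]:  # women m prefers to his wife
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False
    return True


men = [[0, 1, 2], [1, 0, 2], [0, 1, 2]]
women = [[1, 0, 2], [0, 1, 2], [0, 1, 2]]
wife = stable_marriage(men, women)
print(wife, is_stable(wife, men, women))
```

Each man proposes to each woman at most once, which gives the $O(n^2)$ bound; the proposing side obtains its best partner among all stable matchings.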


Weighted Matching Problem 


The weighted matching problem can be formulated as 
a 0-1 programming problem as follows: 


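$$
\begin{aligned}
\max\ & \sum_{(u,v) \in E} f(u,v)\, x_{uv} \\
\text{s.t.}\ & \sum_{v:\,(u,v) \in E} x_{uv} \le 1 \quad \forall u \in V, \\
& x_{uv} \in \{0,1\} \quad \forall (u,v) \in E.
\end{aligned}
$$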


Unlike the case of the assignment problem, relaxing the 
integrality constraints yields, in general, a fractional so- 
lution. 
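A standard small example shows why. Take the triangle with nodes 1, 2, 3, edges 12, 23, 13 and unit weights $f \equiv 1$. Setting every edge variable to one half satisfies all degree constraints, and summing the three degree constraints shows that no fractional point does better, while any two edges of a triangle share a node:

```latex
\begin{aligned}
& x_{12} = x_{23} = x_{13} = \tfrac{1}{2}
  \;\Rightarrow\; x_{12} + x_{13} = x_{12} + x_{23} = x_{23} + x_{13} = 1 \le 1, \\
& 2\,(x_{12} + x_{23} + x_{13}) \le 3
  \;\Rightarrow\; \text{LP optimum} = x_{12} + x_{23} + x_{13} = \tfrac{3}{2}, \\
& \text{every integer matching has at most one edge, so the integer optimum is } 1 .
\end{aligned}
```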


Maximum Cardinality Matching Problem 


J. Edmonds showed in [5] that one more set of inequalities, the odd-set constraints, is needed in order to get a linear programming formulation of the matching problem. The odd-set or blossom inequalities are

$$\sum_{(u,v)\in E(U)} x_{uv} \le \frac{|U|-1}{2} \quad \text{for all odd } U \subseteq V,\ |U| \ge 3,$$

where E(U) is the set of all edges in E with both end nodes in U. An odd set is a set of odd cardinality. See also [11].
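A small example shows why the degree constraints alone do not suffice. On a triangle, setting x = 1/2 on every edge satisfies every degree constraint, yet the three edges inside U = V carry 3/2 > (3 − 1)/2 = 1, so the blossom inequality cuts this fractional point off. The sketch below checks this by brute force over all odd subsets; it is exponential in |V| and meant only for tiny graphs, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def degree_ok(nodes, x):
    """Check that the x-values on edges incident to each node sum to at most 1."""
    return all(sum(val for e, val in x.items() if v in e) <= 1 for v in nodes)

def violated_blossoms(nodes, x):
    """Odd sets U, |U| >= 3, whose internal edges carry more than (|U| - 1) / 2."""
    violated = []
    for size in range(3, len(nodes) + 1, 2):
        for U in combinations(nodes, size):
            inside = sum(val for (a, b), val in x.items() if a in U and b in U)
            if inside > Fraction(size - 1, 2):
                violated.append(U)
    return violated

nodes = [1, 2, 3]
half = Fraction(1, 2)
x = {(1, 2): half, (2, 3): half, (1, 3): half}   # fractional LP point on a triangle
print(degree_ok(nodes, x))           # True
print(violated_blossoms(nodes, x))   # [(1, 2, 3)]
```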

Solving the matching problem on nonbipartite graphs is considerably more difficult than on bipartite ones. This is because the augmenting path algorithms used in the case of bipartite matchings may fail when a structure called a blossom is encountered. Edmonds provided an O(n^4) algorithm that finds an integer solution to the linear programming relaxation of the formulation (including the odd-set constraints) for any objective function, thereby proving the completeness of the formulation. Several implementations that improved the performance of the algorithm have been proposed (see [1,10], among others), as well as data structures for the efficient implementation of such algorithms (see [3]). M. Grötschel and O. Holland [6] gave a cutting plane algorithm for the weighted matching problem, in which they used an efficient separation algorithm to identify violated blossom inequalities, based on the algorithm of M.W. Padberg and M.R. Rao [14] for the b-matching problem.

The b-matching problem is an important generalization of the matching problem. In the b-matching problem each node v ∈ V is met by no more than b_v edges; thus, in this context, the previous definition of matching corresponds to a 1-matching. A perfect b-matching is one in which each node v ∈ V is met by exactly b_v edges. If an edge may be chosen more than once, the problem becomes a general integer program instead of a 0-1 program. The b-matching problem can be reduced to a 1-matching problem on an appropriately constructed graph. Although this reduction is not polynomial in general, so that Edmonds' algorithm cannot be readily applied, the b-matching problem is polynomially solvable; see [14] and [7]. A linear inequality description for the integer b-matching problem is given in [15]. See also [11]. The perfect 0-1 2-matching problem is a relaxation of the traveling salesman problem (TSP). Solving the 0-1 2-matching problem yields a heuristic solution to the TSP, which is an NP-hard problem; see [13].


See also 


> Assignment Methods in Clustering 

> Bi-Objective Assignment Problem 

> Communication Network Assignment Problem 
> Frequency Assignment Problem 

> Maximum Partition Matching 

> Quadratic Assignment Problem 
The use of assignment methods in the formulation of various optimization problems encountered in clustering and classification can be introduced through the well-known quadratic assignment (QA) model (see [5] for a comprehensive discussion of most of the topics presented in this entry). In its most basic form the QA optimization task can be stated using two n × n matrices, say P = {p_ij} and Q = {q_ij}, and the identification of a one-to-one function (or a permutation), ρ(·), on the first n integers, to optimize (either by minimizing or maximizing) the cross-product index

$$\Gamma(\rho) = \sum_{i,j} p_{\rho(i)\rho(j)}\, q_{ij}. \qquad (1)$$


Typically, the main diagonal entries in P and Q are considered irrelevant and can be set equal to zero. For arbitrary matrices P and Q, the cross-product index in (1) may be rewritten as

$$\Gamma(\rho) = \sum_{i,j} \left(\frac{p_{\rho(i)\rho(j)} + p_{\rho(j)\rho(i)}}{2}\right)\left(\frac{q_{ij} + q_{ji}}{2}\right) + \sum_{i,j} \left(\frac{p_{\rho(i)\rho(j)} - p_{\rho(j)\rho(i)}}{2}\right)\left(\frac{q_{ij} - q_{ji}}{2}\right),$$

indicating that the optimization of (1) jointly involves the symmetric ([P + P′]/2 versus [Q + Q′]/2) and skew-symmetric ([P − P′]/2 versus [Q − Q′]/2) components of both P and Q. Because of this separation of P and Q into symmetric and skew-symmetric components, it is possible, in the context of the clustering/classification tasks to be discussed below, to assume that both P and Q are symmetric or that both are skew-symmetric.
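The index (1) and its split into symmetric and skew-symmetric parts can be checked numerically. The following minimal Python sketch uses 0-based indices, with `rho[i]` playing the role of ρ(i); the function names are illustrative. It evaluates Γ(ρ) directly and as the sum of the two components for random matrices.

```python
import random

def gamma(P, Q, rho):
    """Cross-product index (1): sum over i, j of P[rho[i]][rho[j]] * Q[i][j]."""
    n = len(rho)
    return sum(P[rho[i]][rho[j]] * Q[i][j] for i in range(n) for j in range(n))

def gamma_split(P, Q, rho):
    """Return the (symmetric, skew-symmetric) contributions to Gamma(rho)."""
    n = len(rho)
    sym = skew = 0.0
    for i in range(n):
        for j in range(n):
            a, b = P[rho[i]][rho[j]], P[rho[j]][rho[i]]
            sym += (a + b) / 2 * (Q[i][j] + Q[j][i]) / 2
            skew += (a - b) / 2 * (Q[i][j] - Q[j][i]) / 2
    return sym, skew

rng = random.Random(1)
n = 5
P = [[0 if i == j else rng.random() for j in range(n)] for i in range(n)]
Q = [[0 if i == j else rng.random() for j in range(n)] for i in range(n)]
rho = list(range(n))
rng.shuffle(rho)
sym, skew = gamma_split(P, Q, rho)
print(abs(gamma(P, Q, rho) - (sym + skew)) < 1e-12)  # True
```

For a symmetric P the skew-symmetric part vanishes, which is why P and Q can be assumed to be of the same type.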

In applications to clustering, the matrix P usually contains numerical proximity information between distinct pairs of the n objects from some given set S = {O_1, ..., O_n} that is of substantive interest. If P is symmetric, p_ij (= p_ji) denotes the degree to which objects O_i and O_j are similar (and is keyed as what is referred to as a dissimilarity [or as a similarity] measure if smaller [or larger] values reflect greater object similarity). If P is skew-symmetric, p_ij (= −p_ji) is an index of dominance (or flow) between objects O_i and O_j, with the sign reflecting the directionality of dominance and the absolute value indicating the degree. The (target) matrix Q, as developed in detail in the next section, will typically be fixed, with the specific pattern of entries characterizing the type of structure to be identified for the set S, e.g., a single object cluster, a partition, or a partition hierarchy. An optimal permutation, say ρ*(·), based on the cross-product index in (1) for a specific target matrix Q will identify the (salient) combinatorial structure sought.

The QA optimization task as formulated through (1) has an enormous literature that will not be reviewed here (for an up-to-date and comprehensive source on QA, see [11]). For current purposes, one might consider the optimization of (1) through a simple object interchange heuristic that begins with some permutation (possibly chosen at random) and then implements local interchanges until no improvement in the index can be made. By repeatedly initializing such a process randomly, a distribution over a set of local optima can be obtained. At least within the context of clustering/classification, such a distribution may be highly relevant diagnostically for explaining whatever structure is inherent in the data matrix P, and possibly of even greater interest than the identification of just a single optimal permutation. In a related framework, there are considerable applications for the QA model in a confirmatory context where the distribution of Γ(ρ) is constructed over all n! possible permutations considered equally likely, and the index value associated with some identified permutation is compared to this distribution. Most nonparametric statistical methods popular in the literature can be rephrased through the device of defining the matrices P and Q appropriately (see [5] for a comprehensive development of these special cases as well as approximation methods based on closed-form expressions for the first three moments of Γ(ρ)). A few of these applications will be briefly noted below.
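A minimal sketch of the pairwise-interchange heuristic with random restarts, assuming maximization of Γ(ρ) and a first-improvement rule (both are our choices; the text does not prescribe them). For simplicity it recomputes Γ from scratch after every trial swap, which costs O(n^2) per evaluation; practical codes update the index incrementally. The list returned by `local_optima` is the empirical distribution of local optima mentioned above.

```python
import random

def gamma(P, Q, rho):
    """Cross-product index (1) with 0-based indices."""
    n = len(rho)
    return sum(P[rho[i]][rho[j]] * Q[i][j] for i in range(n) for j in range(n))

def interchange(P, Q, rho):
    """Swap pairs of positions while some swap strictly increases Gamma."""
    rho = list(rho)
    best = gamma(P, Q, rho)
    improved = True
    while improved:
        improved = False
        for a in range(len(rho) - 1):
            for b in range(a + 1, len(rho)):
                rho[a], rho[b] = rho[b], rho[a]
                value = gamma(P, Q, rho)
                if value > best + 1e-12:
                    best, improved = value, True       # keep the improving swap
                else:
                    rho[a], rho[b] = rho[b], rho[a]    # undo the swap
    return rho, best

def local_optima(P, Q, starts=20, seed=0):
    """Run the heuristic from random permutations; return all local optima found."""
    rng = random.Random(seed)
    results = []
    for _ in range(starts):
        start = list(range(len(P)))
        rng.shuffle(start)
        results.append(interchange(P, Q, start))
    return results

# A path-shaped target Q rewards placing large proximities next to each other.
P = [[0, 3, 1, 0], [3, 0, 2, 1], [1, 2, 0, 5], [0, 1, 5, 0]]
Q = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
for rho, value in local_optima(P, Q, starts=5):
    print(rho, value)
```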


Weighting Schemes for the Fixed (Target) Matrix Q


Single Cluster Statistics 


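To identify a single salient cluster of fixed size K (which can be varied by the user), consider Q to have the partitioned form

$$Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix},$$

where within each submatrix of the size indicated, the (off-diagonal) entries are constant:

$$Q_{11} = \begin{pmatrix} 0 & q_{11} & \cdots & q_{11} \\ q_{11} & 0 & \cdots & q_{11} \\ \vdots & \vdots & \ddots & \vdots \\ q_{11} & q_{11} & \cdots & 0 \end{pmatrix}_{K\times K}, \qquad Q_{12} = \begin{pmatrix} q_{12} & \cdots & q_{12} \\ \vdots & & \vdots \\ q_{12} & \cdots & q_{12} \end{pmatrix}_{K\times(n-K)},$$

$$Q_{21} = \begin{pmatrix} q_{21} & \cdots & q_{21} \\ \vdots & & \vdots \\ q_{21} & \cdots & q_{21} \end{pmatrix}_{(n-K)\times K}, \qquad Q_{22} = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}_{(n-K)\times(n-K)}.$$

Depending on how the values for q_11, q_12, and q_21 are defined, different indices can be generated that measure the salience of the subset constructed by any permutation ρ(·), i.e., for the identified cluster S_ρ = {O_ρ(1), ..., O_ρ(K)}.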

For symmetric P:

A) letting

$$q_{11} = \frac{1}{K(K-1)}, \qquad q_{12} = q_{21} = 0,$$

the index Γ(ρ) is the average proximity within the subset S_ρ, and defines a measure of cluster ‘compactness’;

B) letting

$$q_{11} = 0, \qquad q_{12} = q_{21} = \frac{1}{2K(n-K)},$$

Γ(ρ) is the average proximity between the subset S_ρ and its complement, and defines a measure of cluster ‘isolation’ for either S_ρ or S − S_ρ; alternatively, it can be considered a measure of ‘separation’ between S_ρ and S − S_ρ;

C) by contrasting A) and B) as

$$q_{11} = \frac{1}{K(K-1)}, \qquad q_{12} = q_{21} = -\frac{1}{2K(n-K)},$$

Γ(ρ) characterizes the salience of the subset S_ρ by a trade-off between compactness and isolation. The optimization of Γ(ρ) based on these latter weights identifies a cluster that would be both relatively compact and isolated, whereas the emphasis in A) and B) is on clusters that may be either compact or isolated but not necessarily both.

For skew-symmetric P:

D) letting

$$q_{11} = 0, \qquad q_{12} = \frac{1}{2K(n-K)}, \qquad q_{21} = -q_{12},$$

the index Γ(ρ) is the average dominance (or flow) from the subset S_ρ to its complement, minus the average dominance (or flow) from the complement to the subset. Thus, its optimization (e.g., maximization) identifies a subset of S whose members tend to dominate those in its complement (or where aggregate outflow exceeds aggregate inflow).
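The four weighting schemes can be tried out directly. The sketch below, a minimal illustration with hypothetical function names and 0-based indices, builds the block target Q for a cluster formed by the first K positions (assuming 2 ≤ K < n) and evaluates Γ for the identity permutation on a small dissimilarity matrix with two tight pairs.

```python
def gamma(P, Q, rho):
    """Cross-product index (1) with 0-based indices; rho[i] plays the role of rho(i)."""
    n = len(rho)
    return sum(P[rho[i]][rho[j]] * Q[i][j] for i in range(n) for j in range(n))

def single_cluster_target(n, K, scheme):
    """Block target Q for a cluster made of the first K positions (schemes A-D)."""
    w_in = 1.0 / (K * (K - 1))           # 1 / (K(K-1))
    w_between = 1.0 / (2 * K * (n - K))  # 1 / (2K(n-K))
    q11, q12, q21 = {
        "A": (w_in, 0.0, 0.0),                 # compactness
        "B": (0.0, w_between, w_between),      # isolation
        "C": (w_in, -w_between, -w_between),   # compactness versus isolation
        "D": (0.0, w_between, -w_between),     # net dominance, skew-symmetric P
    }[scheme]
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                 # diagonal is irrelevant and stays 0
            if i < K and j < K:
                Q[i][j] = q11
            elif i < K:
                Q[i][j] = q12            # row in the cluster, column outside
            elif j < K:
                Q[i][j] = q21            # row outside, column in the cluster
            # the Q22 block (both outside the cluster) stays 0
    return Q

# Two tight pairs {0, 1} and {2, 3} under a dissimilarity keying.
P = [[0, 1, 5, 5],
     [1, 0, 5, 5],
     [5, 5, 0, 1],
     [5, 5, 1, 0]]
identity = [0, 1, 2, 3]
for scheme in "ABC":
    print(scheme, gamma(P, single_cluster_target(4, 2, scheme), identity))
# A 1.0   (average within-cluster dissimilarity)
# B 5.0   (average dissimilarity to the complement)
# C -4.0
```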

In a confirmatory comparison context, the single-cluster statistic Γ(ρ) can be used to generate a number of nonparametric test statistics for comparing the difference between two independent groups. For example, suppose observations are available on n objects, x_1, ..., x_n, where the first K belong to group I and the last n − K to group II. If the (now asymmetric) proximity matrix is defined as P = {p_ij}, where p_ij = 1 if x_i < x_j and p_ij = 0 if x_i > x_j, then the weighting scheme in B) gives (a simple linear transform of) the well-known Mann-Whitney statistic for comparing two independent groups, i.e., if two observations are drawn at random from groups I and II, then Γ(ρ_0), for ρ_0 the identity permutation, is the probability that the group I observation is the larger. The distribution of Γ(ρ) over all n! permutations generates the null distribution against which the observed index Γ(ρ_0) can be compared. Because of the structure of Q, this null distribution is based on all n!/(K!(n − K)!) distinct subsets considered equally likely to be formed from the collection of size n. (See [3, Chap. 7] for a more complete discussion of the two-independent-sample problem in this type of nonparametric framework.)

Although single-cluster statistics that depend on the comparison of mean proximities may be the most obvious to consider, a number of possible alternatives can be constructed by varying the definition of the weight matrices in Q. For example, for symmetric P, if Q_11 is (re)defined to have the form

$$Q_{11} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix}_{K\times K},$$

with entries of all ones immediately above and below the main diagonal, and q_12 = q_21 = 0, the salience of S_ρ is now based on (twice) the sum of adjacent proximities along a path of length K considered in the object order O_ρ(1) ↔ ⋯ ↔ O_ρ(K). Or, if Q_11 is (re)defined to have the form

$$Q_{11} = \begin{pmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix}_{K\times K},$$

and q_12 = q_21 = 0, the salience of S_ρ is now based on (twice) the sum of proximities between O_ρ(1) and the remaining objects O_ρ(2), ..., O_ρ(K) (this is called a ‘star’ cluster of size K with object O_ρ(1) as its center; see [10, Sect. 4.5.2] for a further discussion of clustering based on stars).


Partition Statistics 


To identify a salient partition of S into M subsets, S_1, ..., S_M, of fixed sizes n_1, ..., n_M, respectively, consider Q to have the partitioned form

$$Q = \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{M1} & Q_{M2} & \cdots & Q_{MM} \end{pmatrix},$$

where the (off-diagonal) entries in each submatrix Q_mm′ of size n_m × n_m′ are all equal to a constant q_mm′, 1 ≤ m, m′ ≤ M. Again, depending on how these latter values are defined, a variety of different indices can be generated that now measure the salience of the partition generated by a permutation ρ(·). For a symmetric P, three of the most popular alternatives are noted below; they differ only in how the weights q_mm, 1 ≤ m ≤ M, are defined, and all assume q_mm′ = 0 for m ≠ m′:

a) q_mm = 1: each subset in a partition contributes in direct proportion to the number of object pairs it contains;

b) q_mm = 1/(n_m(n_m − 1)): each subset contributes equally irrespective of the number of objects (or object pairs) it contains;

c) q_mm = 1/n_m: each subset contributes in direct proportion to the number of objects it contains.

In a confirmatory comparison context, the partition statistic Γ(ρ) with weighting option c) can be used to construct a test statistic equivalent to the common F-ratio in a one-way analysis of variance for assessing whether mean differences exist over M independent groups. Explicitly, suppose observations are available on n objects, x_1, ..., x_n, with the first n_1 belonging to group 1, the second n_2 belonging to group 2, and so on. If proximity is defined as P = {p_ij}, where p_ij = (x_i − x_j)^2, then the weights in c) produce Γ(ρ_0), for ρ_0 the identity permutation, equal to twice the within-group sum of squares. The distribution of Γ(ρ) over all n! permutations generates a distribution over all n!/(n_1! ⋯ n_M!) equally likely ways the n observations can be grouped into subsets of sizes n_1, ..., n_M, against which the observed index Γ(ρ_0) can be compared. (See [9] for a more thorough discussion of thus evaluating a priori classifications.)
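The identity behind this equivalence can be verified numerically: for the observations in a group of size n_m, the sum of (x_i − x_j)^2 over ordered pairs equals 2 n_m times the group's sum of squared deviations, so weighting by 1/n_m leaves twice the within-group sum of squares. The minimal sketch below (hypothetical helper names, groups assumed to be consecutive blocks) compares the two quantities.

```python
def gamma(P, Q, rho):
    """Cross-product index (1) with 0-based indices."""
    n = len(rho)
    return sum(P[rho[i]][rho[j]] * Q[i][j] for i in range(n) for j in range(n))

def partition_target(sizes, scheme="c"):
    """Block-diagonal Q: within-block weight q_mm by scheme, q_mm' = 0 for m != m'."""
    label = [m for m, size in enumerate(sizes) for _ in range(size)]
    n = len(label)
    weight = {
        "a": lambda nm: 1.0,
        "b": lambda nm: 1.0 / (nm * (nm - 1)),
        "c": lambda nm: 1.0 / nm,
    }[scheme]
    return [[weight(sizes[label[i]]) if i != j and label[i] == label[j] else 0.0
             for j in range(n)] for i in range(n)]

def within_ss(x, sizes):
    """Within-group sum of squares for consecutive groups of the given sizes."""
    total, start = 0.0, 0
    for size in sizes:
        group = x[start:start + size]
        mean = sum(group) / size
        total += sum((v - mean) ** 2 for v in group)
        start += size
    return total

x = [2.0, 4.0, 9.0, 5.0, 7.0, 6.0]   # group 1: first 2 values, group 2: last 4
sizes = [2, 4]
P = [[(xi - xj) ** 2 for xj in x] for xi in x]
print(gamma(P, partition_target(sizes, "c"), list(range(len(x)))))  # 21.5
print(2 * within_ss(x, sizes))                                       # 21.5
```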

For a skew-symmetric P, the partitioning of S would now be into M ordered subsets, S_1 < ⋯ < S_M, of fixed sizes n_1, ..., n_M, with the most natural weights being q_mm = 0 for 1 ≤ m ≤ M, and q_mm′ = +1 if m < m′ and −1 if m > m′. Maximizing Γ(ρ) in this case would be a search for an ordered partition in which objects in S_m tend to dominate those in S_m′ if m < m′, i.e., there are generally positive dominance values from a lower-placed subset to one that is higher.

There are several special cases of interest for the partition statistic:

i) for symmetric P, and if for convenience it is assumed that n is even and n_m = 2 for 1 ≤ m ≤ M (so n = 2M), the weights in a) make Γ(ρ) the index for a matching of the objects in S induced by ρ(·);

ii) if the proximity matrix P is itself constructed from a partition of S, then the index Γ(ρ) can be interpreted as a measure of association for a contingency table defined by the n objects cross-classified using ρ(·) and the two partitions underlying P and Q.

Depending on the choice of weights for Q, and how proximity is defined in P based on its underlying partition, a number of well-known indices of association can be obtained: Pearson's chi-square statistic, Goodman-Kruskal's τ_b, and Rand's index. For a more complete discussion of these special cases, including the necessary definitions for P, consult [5].


Partition Hierarchy Statistics 


One straightforward strategy for extending QA to identify salient partition hierarchies having a specific form begins with a given collection of T partitions of S, P_1, ..., P_T, that are hierarchically related. Here, P_1 contains all n objects in n separate classes, P_T contains all n objects in one class, and P_{t+1} is formed from P_t for t ≥ 1 by uniting one or more of the classes in the latter. If Q = {q_ij} is defined by q_ij = min{t − 1 : O_i, O_j ∈ common object class in P_t}, then these latter entries satisfy the defining property of being an ultrametric, i.e., q_ij ≤ max{q_ik, q_jk} for all O_i, O_j, O_k ∈ S (see [2,10, Chap. 7] for an extensive discussion of ultrametrics). For symmetric P, the optimization of Γ(ρ) in (1) would be the search for a salient partition hierarchy having the generic form defined by P_1, ..., P_T, and which optimizes the cross-product between the proximity information in P and the levels at which the object pairs are first placed into common classes in the hierarchy. It might be noted that both single clusters and partitions could be considered special cases of a partition hierarchy when T = 3 and the only nontrivial partition is P_2, i.e., to obtain a single cluster, P_2 can be defined by one subset of size K and n − K subsets each of size one; to obtain a single partition, P_2 merely has to be that partition with the desired number of classes and class sizes.
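The construction of the target from a hierarchy is easy to mechanize. In the minimal sketch below (hypothetical names), a hierarchy is given as a list of label vectors, where `partitions[t - 1][i]` is the class of object i in P_t; the target entry q_ij is the 0-based index of the first partition in which O_i and O_j share a class, which equals min{t − 1 : ...} in the text's 1-based notation. A brute-force check confirms the ultrametric inequality.

```python
def hierarchy_target(partitions):
    """q_ij = min{t - 1 : O_i and O_j share a class in P_t}.

    partitions[t - 1][i] is the class label of object i in P_t; P_1 must be all
    singletons and P_T a single class, each P_{t+1} merging classes of P_t.
    """
    n = len(partitions[0])
    return [[next(t for t, labels in enumerate(partitions) if labels[i] == labels[j])
             for j in range(n)] for i in range(n)]

def is_ultrametric(Q):
    """Check q_ij <= max(q_ik, q_jk) for every triple of objects."""
    n = len(Q)
    return all(Q[i][j] <= max(Q[i][k], Q[j][k])
               for i in range(n) for j in range(n) for k in range(n))

hierarchy = [
    [0, 1, 2, 3],   # P_1: singletons
    [0, 0, 2, 3],   # P_2: O_1 and O_2 merged
    [0, 0, 2, 2],   # P_3: O_3 and O_4 merged
    [0, 0, 0, 0],   # P_4: one class
]
Q = hierarchy_target(hierarchy)
print(Q[0][1], Q[2][3], Q[0][2])  # 1 2 3
print(is_ultrametric(Q))          # True
```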


Alternative Assignment Indices 


There are a variety of alternatives for replacing the cross-product in the QA index in (1) by a different function between the entries in P and Q. Depending on how the proximity information in P and the target given by Q are specified, one might adopt, for example, the sum of absolute differences, Σ_{i,j} |p_ρ(i)ρ(j) − q_ij|, or the sum of dichotomous indicators for equality, Σ_{i,j} g(p_ρ(i)ρ(j), q_ij), where g(x, y) = 1 if x = y and 0 otherwise, or even use ‘bottleneck’ measures such as min_{i,j} p_ρ(i)ρ(j) q_ij or max_{i,j} p_ρ(i)ρ(j) q_ij. Somewhat more well developed in the literature than these possibilities (e.g., see [5, Chap. 5]) are generalizations of (1) that maintain the basic cross-product structure but rely on higher-order functions of the entries in P and Q before the cross-products are taken. Again, variations would be possible, but two of the more obvious forms of extension are given below that depend solely on the order of the entries within P and within Q:

• Three-argument functions: Given P and Q, and letting sign(x) = +1 if x > 0, = 0 if x = 0, and = −1 if x < 0, define

$$A(\rho) = \sum_{i \ne j,\ i \ne k} \operatorname{sign}\big(p_{\rho(i)\rho(j)} - p_{\rho(i)\rho(k)}\big)\, \operatorname{sign}\big(q_{ij} - q_{ik}\big).$$

The index A(ρ) can be interpreted as the difference between two counts, say A⁺(ρ) and A⁻(ρ), where A⁺(ρ) (respectively, A⁻(ρ)) is the number of consistencies (inconsistencies) in the ordering of pairs of off-diagonal entries in {p_ρ(i)ρ(j)} and their counterparts in {q_ij}, where the former pairs share a common (row) object O_ρ(i).

• Four-argument functions: Define

$$B(\rho) = \sum_{i \ne j,\ k \ne l} \operatorname{sign}\big(p_{\rho(i)\rho(j)} - p_{\rho(k)\rho(l)}\big)\, \operatorname{sign}\big(q_{ij} - q_{kl}\big).$$

Again, the index B(ρ) can be viewed as the difference between B⁺(ρ) and B⁻(ρ), where B⁺(ρ) (respectively, B⁻(ρ)) is the number of consistencies (inconsistencies) in the ordering of pairs of off-diagonal entries in {p_ρ(i)ρ(j)} and their counterparts in {q_ij}. In contrast to A(ρ), however, no common object need be present in the pairs of off-diagonal entries. The distinction between A(ρ) and B(ρ) in measuring the correspondence between P and Q rests on whether the proximity entries in P are strictly comparable only within rows (i.e., what are called row conditional proximity data, e.g., see [1, p. 192]) or whether such comparisons make sense when performed across rows.
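Both order-based indices can be computed by direct enumeration, as in the following minimal sketch (hypothetical names, 0-based indices, O(n^3) and O(n^4) work respectively). When Q orders the entries exactly as P does, every comparable pair is a consistency; reversing the order of Q turns every consistency into an inconsistency and flips the sign.

```python
def sign(x):
    """+1, 0 or -1, as in the definitions of A and B."""
    return (x > 0) - (x < 0)

def index_A(P, Q, rho):
    """Three-argument index: compares off-diagonal pairs sharing a row object."""
    n = len(rho)
    return sum(sign(P[rho[i]][rho[j]] - P[rho[i]][rho[k]]) * sign(Q[i][j] - Q[i][k])
               for i in range(n) for j in range(n) for k in range(n)
               if j != i and k != i)

def index_B(P, Q, rho):
    """Four-argument index: compares any two off-diagonal entries."""
    n = len(rho)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    return sum(sign(P[rho[i]][rho[j]] - P[rho[k]][rho[l]]) * sign(Q[i][j] - Q[k][l])
               for i, j in cells for k, l in cells)

P = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
reversed_Q = [[-v for v in row] for row in P]
rho = [0, 1, 2]
print(index_A(P, P, rho), index_B(P, P, rho))                    # 6 24
print(index_A(P, reversed_Q, rho), index_B(P, reversed_Q, rho))  # -6 -24
```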
To illustrate the interpretation of A(ρ) and B(ρ) in the single cluster statistic context, suppose Q has the weight structure in A) that generated through (1) the measure of cluster compactness as the average within-group proximity in S_ρ = {O_ρ(1), ..., O_ρ(K)}. In using this specific target Q for A(ρ), the index is, in words, twice the difference between the number of instances in which a proximity for two objects both within S_ρ is greater than the proximity from one of these two objects to another in S − S_ρ, and the number of instances in which it is less. Depending on whether proximity is keyed as a similarity or a dissimilarity, a compact subset would be one for which A(ρ) is maximized or minimized, respectively. If instead the weight structure for Q given in B), which defined the measure of cluster isolation, is used, the index A(ρ) would now be twice the difference between the number of instances in which a proximity between two objects that span S_ρ and S − S_ρ is greater than the proximity between two objects within S_ρ or within S − S_ρ (where the latter have one member in common with the two that span S_ρ and S − S_ρ), and the number of instances in which it is less. Now, an isolated subset would be identified by maximizing or minimizing A(ρ) depending on the keying of proximity as a dissimilarity or similarity, respectively. For B(ρ) and the weight structure in A), the index is, in words, twice the difference between the number of instances in which a proximity for two objects both within S_ρ is greater than the proximity between any two objects that span S_ρ and S − S_ρ, and the number of instances in which it is less. The index B(ρ) for the weight matrix in B) would be twice the difference between the number of instances in which a proximity between two objects that span S_ρ and S − S_ρ is greater than the proximity between any two objects within S_ρ or within S − S_ρ, and the number of instances in which it is less.

In the partition context, a similar interpretation to the use of the single subset compactness measure would be present for A(ρ) and B(ρ) and for all three weighting options mentioned, but now aggregated over the M subsets of the partition. In the partition hierarchy framework, the correspondence between {p_ρ(i)ρ(j)} and Q is measured by the degree of consistency in the ordering of the object pairs by proximity and the ordering of the object pairs by the levels in which the objects are first placed into a common class.

In addition to replacing the QA index in (1) by the higher-order functions adopted in A(ρ) and B(ρ) to effect a reliance only on the order properties of the entries within P and Q, there are several other uses in a clustering/classification context for the definition of three- or four-argument functions. One alternative will be mentioned here that deals with what can be called the generalized single cluster statistic. Explicitly, suppose three- and four-argument functions of the entries in P are denoted by u(·,·,·) and r(·,·,·,·), respectively, and those in Q by v(·,·,·) and s(·,·,·,·), and consider the general cross-product forms

$$C(\rho) = \sum_{i,j,k} u\big(\rho(i),\rho(j),\rho(k)\big)\, v(i,j,k),$$

$$D(\rho) = \sum_{i,j,k,l} r\big(\rho(i),\rho(j),\rho(k),\rho(l)\big)\, s(i,j,k,l).$$

It will be assumed here that both v(·,·,·) and s(·,·,·,·) are merely indicator functions for a subset of size K, so v(i, j, k) = 1 if 1 ≤ i, j, k ≤ K, and = 0 otherwise; s(i, j, k, l) = 1 if 1 ≤ i, j, k, l ≤ K, and = 0 otherwise. Thus, the optimization of C(ρ) or D(ρ) can be viewed as the search for a subset of size K with extreme values for the indices

$$\sum_{1 \le i,j,k \le K} u\big(\rho(i),\rho(j),\rho(k)\big) \quad\text{or}\quad \sum_{1 \le i,j,k,l \le K} r\big(\rho(i),\rho(j),\rho(k),\rho(l)\big),$$

and, depending on how the functions u(·,·,·) and r(·,·,·,·) are defined, a subset that is very salient with respect to the property that characterizes the latter.

A number of properties that may be desirable to optimize in a subset of size K have been considered (see [4] for a more complete discussion), of which the two listed below are directly relevant to the clustering/classification context:

i) a proximity matrix (with a dissimilarity keying) represents a perfect partition hierarchy if it satisfies the property of being an ultrametric: for all 1 ≤ i, j, k ≤ n, p_ij ≤ max{p_ik, p_jk}, or equivalently, the two largest values among p_ij, p_ik, and p_jk are equal. Thus, if u(ρ(i), ρ(j), ρ(k)) equals the absolute difference between the two largest values among p_ρ(i)ρ(j), p_ρ(i)ρ(k), and p_ρ(j)ρ(k), the minimization of C(ρ) seeks a subset of size K that is as close to being an ultrametric as possible (as measured by C(ρ));

ii) a proximity matrix (again, with a dissimilarity keying) represents a perfect additive tree, where proximities can be reconstructed by minimum path lengths in a tree, if it satisfies the four-point property: for all 1 ≤ i, j, k, l ≤ n, p_ij + p_kl ≤ max{p_ik + p_jl, p_il + p_jk}, or equivalently, the largest two sums among p_ij + p_kl, p_ik + p_jl, and p_il + p_jk are equal. Thus, if r(ρ(i), ρ(j), ρ(k), ρ(l)) equals the absolute difference between the two largest values among p_ρ(i)ρ(j) + p_ρ(k)ρ(l), p_ρ(i)ρ(k) + p_ρ(j)ρ(l), and p_ρ(i)ρ(l) + p_ρ(j)ρ(k), the minimization of D(ρ) seeks a subset of size K that is as close to satisfying the four-point condition as possible (as measured by D(ρ)).
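The two departure measures translate directly into code. The minimal sketch below (hypothetical names) evaluates C(ρ) and D(ρ) with v and s taken as indicators of the first K positions; like the formulas, it sums over all index triples or quadruples, not only distinct ones. For a metric input the degenerate tuples contribute zero, so a perfect ultrametric or tree metric scores 0.

```python
from itertools import product

def two_largest_gap(values):
    """Difference between the largest and second-largest of the values."""
    a, b = sorted(values, reverse=True)[:2]
    return a - b

def ultrametric_departure(P, rho, K):
    """C(rho) with v = indicator of the first K positions and u as in i)."""
    S = rho[:K]
    return sum(two_largest_gap([P[a][b], P[a][c], P[b][c]])
               for a, b, c in product(S, repeat=3))

def four_point_departure(P, rho, K):
    """D(rho) with s = indicator of the first K positions and r as in ii)."""
    S = rho[:K]
    return sum(two_largest_gap([P[a][b] + P[c][d], P[a][c] + P[b][d], P[a][d] + P[b][c]])
               for a, b, c, d in product(S, repeat=4))

U = [[0, 1, 3, 3], [1, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]   # an ultrametric
M = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]                         # not an ultrametric
L = [[0, 1, 3, 6], [1, 0, 2, 5], [3, 2, 0, 3], [6, 5, 3, 0]]  # points 0, 1, 3, 6 on a line
print(ultrametric_departure(U, [0, 1, 2, 3], 4))  # 0
print(ultrametric_departure(M, [0, 1, 2], 3))     # 12
print(four_point_departure(L, [0, 1, 2, 3], 4))   # 0
```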


Modifications of the Target Matrix Q 


The optimization of an assignment index such as (1) assumes that the target matrix Q is fixed and given a priori. Based on this invariance, maximizing (1), for example, could be equivalently stated as the minimization of

$$\sum_{i,j} \big(p_{\rho(i)\rho(j)} - q_{ij}\big)^2. \qquad (2)$$

This equivalence holds because

$$\sum_{i,j} \big(p_{\rho(i)\rho(j)} - q_{ij}\big)^2 = \sum_{i,j} p_{ij}^2 - 2\,\Gamma(\rho) + \sum_{i,j} q_{ij}^2,$$

and only the middle term depends on ρ when Q is fixed.

There has been a substantial recent literature (e.g., [6,7,8]) where not only is an optimal permutation, say ρ*(·), sought that would minimize (2), but in which a specific target matrix Q is also constructed based on a collection of (linear inequality) constraints that would characterize some type of classificatory structure fitted to {p_ρ(i)ρ(j)}. The constraints imposed on Q are possibly based on the (sought-for) permutation ρ*(·).

In minimizing (2) but allowing the target matrix Q to itself be estimated, a typical iterative process would proceed as follows: on the basis of an initial target matrix Q, find a permutation, say ρ^(1)(·), to maximize the cross-product in (1). Using ρ^(1)(·), fit a target matrix Q^(1) to {p_ρ^(1)(i)ρ^(1)(j)} by minimizing (2). Continue the process for ρ^(t) and Q^(t) for t > 1 until convergence. A variety of constraints for Q have been considered. Among these, there are

i) a sum of matrices each having what are called anti-Robinson forms (i.e., a matrix is anti-Robinson if within each row and column, the entries never decrease moving in any direction away from the main diagonal [6]);

ii) a sum of ultrametric matrices (characterized by the ultrametric condition given earlier [7]);

iii) a sum of additive tree matrices (again, as characterized by the four-point condition given earlier [7]);

iv) unidimensional scales (i.e., a matrix is a linear unidimensional scale if its entries can be given by {|x_j − x_i| + c}, where the estimated coordinates are x_1 ≤ ⋯ ≤ x_n and c is an estimated constant [8]); and

v) circular unidimensional scales (i.e., a matrix is so characterized if it can be represented as {min{|x_j − x_i|, x_0 − |x_j − x_i|} + c}, where x_1 ≤ ⋯ ≤ x_n, x_0 is the circumference of the circular structure, and c is an estimated constant [8]).
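Two of these structures can be illustrated concretely. The sketch below, with hypothetical names, builds a linear and a circular unidimensional scale from given coordinates (diagonal entries are set to 0, since they are treated as irrelevant) and checks the anti-Robinson form row by row, which for a symmetric matrix also covers the columns. With sorted coordinates the linear scale is anti-Robinson, whereas the circular one generally is not, because distances wrap around the circle.

```python
def unidimensional_scale(x, c=0.0):
    """Entries |x_j - x_i| + c off the diagonal; the diagonal is left at 0."""
    n = len(x)
    return [[0.0 if i == j else abs(x[j] - x[i]) + c for j in range(n)]
            for i in range(n)]

def circular_scale(x, x0, c=0.0):
    """Entries min(|x_j - x_i|, x0 - |x_j - x_i|) + c, x0 the circumference."""
    n = len(x)
    return [[0.0 if i == j else min(abs(x[j] - x[i]), x0 - abs(x[j] - x[i])) + c
             for j in range(n)] for i in range(n)]

def is_anti_robinson(A):
    """Entries never decrease moving away from the diagonal along each row.

    For a symmetric matrix the row condition also covers the columns.
    """
    for i, row in enumerate(A):
        right = row[i + 1:]          # moving right from the diagonal
        left = row[:i][::-1]         # moving left from the diagonal
        for seq in (right, left):
            if any(a > b for a, b in zip(seq, seq[1:])):
                return False
    return True

x = [0.0, 1.0, 3.0, 6.0]
print(is_anti_robinson(unidimensional_scale(x, c=0.5)))  # True
print(is_anti_robinson(circular_scale(x, x0=8.0)))       # False
```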


See also 


> Assignment and Matching 

> Bi-Objective Assignment Problem 

> Communication Network Assignment Problem 
> Frequency Assignment Problem 

> Maximum Partition Matching 

> Quadratic Assignment Problem 
Introduction


The Multidimensional Assignment Problem (MAP) is
a higher-dimensional version of the two-dimensional,
or Linear Assignment Problem (LAP) [24]. If a classical
textbook formulation of the Linear Assignment Prob- 
lem is to find an optimal assignment of “N jobs to M 
workers”, then, for example, the 3-dimensional Assign- 
ment Problem can be interpreted as finding an optimal 
assignment of “N jobs to M workers in K time slots”, 
etc. In general, the objective of the MAP is to find tu- 
ples of elements from given sets, such that the total cost 
of the tuples is minimized. The MAP was first intro- 
duced by Pierskalla [26], and since then has found nu- 
merous applications in the areas of data association [4], 
image recognition [31], multisensor multitarget track- 
ing [18,27], tracking of elementary particles [28], etc. 
For a discussion of the MAP and its applications see, 
for example, [7] and references therein. 

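Without loss of generality, a d-dimensional axial MAP can be written in a form where each dimension has the same number n of elements, i.e.,

$$\min_{x \in \{0,1\}^{n^d}} \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} c_{i_1\cdots i_d}\, x_{i_1\cdots i_d}$$

$$\text{s.t.} \quad \sum_{\substack{i_k \in \{1,\ldots,n\} \\ k \in \{1,\ldots,d\}\setminus\{j\}}} x_{i_1\cdots i_d} = 1 \quad \text{for all } i_j \in \{1,\ldots,n\},\ j \in \{1,\ldots,d\}. \qquad (1)$$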


An instance of the MAP with different numbers of elements in each dimension, n_1 ≥ n_2 ≥ ⋯ ≥ n_d, is reducible to form (1) by the introduction of dummy variables.

Problem (1) admits the following geometric interpretation: given a d-dimensional cubic matrix, find a permutation of its rows and columns such that the sum of the diagonal elements is minimized (which explains the term “axial”). This rendition leads to an alternative formulation of the MAP (1) in terms of permutations π_1, ..., π_{d−1} of the numbers 1 to n, i.e., one-to-one mappings π_i: {1, ..., n} → {1, ..., n},

$$\min_{\pi_1,\ldots,\pi_{d-1} \in \Pi^n} \sum_{i=1}^{n} c_{i\,\pi_1(i)\cdots\pi_{d-1}(i)},$$

where Π^n is the set of all permutations of the set {1, ..., n}. A feasible solution to the MAP (1) can be conveniently described by specifying its cost,

$$Z = c_{i_1^{(1)} \cdots i_d^{(1)}} + c_{i_1^{(2)} \cdots i_d^{(2)}} + \cdots + c_{i_1^{(n)} \cdots i_d^{(n)}}, \qquad (2)$$

where (i_j^(1), i_j^(2), ..., i_j^(n)) is a permutation of the set {1, 2, ..., n} for every j = 1, ..., d. In contrast to the LAP, which represents the d = 2 special case of the MAP (1) and is polynomially solvable [7], the MAP with d ≥ 3 is generally NP-hard, a fact that follows from a reduction of the 3-dimensional matching problem (3DM) [8].
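The permutation formulation suggests an exhaustive solver for very small instances, useful as a reference when testing heuristics. The minimal sketch below (hypothetical name, 0-based indices, costs stored in a dictionary keyed by d-tuples) enumerates all (n!)^{d−1} choices of π_1, ..., π_{d−1}; its running time grows accordingly, in line with the NP-hardness of the problem for d ≥ 3.

```python
from itertools import permutations, product

def solve_map_bruteforce(cost, n, d):
    """Exhaustively minimize sum_i c[i, pi_1(i), ..., pi_{d-1}(i)].

    cost maps d-tuples of 0-based indices to numbers; all (n!)^(d-1)
    feasible solutions are enumerated, so only tiny instances are practical.
    """
    best_value, best_perms = float("inf"), None
    for perms in product(permutations(range(n)), repeat=d - 1):
        value = sum(cost[(i,) + tuple(p[i] for p in perms)] for i in range(n))
        if value < best_value:
            best_value, best_perms = value, perms
    return best_value, best_perms

# d = 3, n = 2: only the "diagonal" tuples (0, 0, 0) and (1, 1, 1) are cheap.
cost = {t: 10 for t in product(range(2), repeat=3)}
cost[(0, 0, 0)] = cost[(1, 1, 1)] = 1
print(solve_map_bruteforce(cost, n=2, d=3))  # (2, ((0, 1), (0, 1)))
```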

Despite its inherent difficulty, several exact and heuristic algorithms [1,6,11,25] have been proposed for this problem. Most of these algorithms rely, at least partly, on repeated local searches in neighborhoods of feasible solutions, which raises the question of how the number of local minima in a MAP impacts these solution algorithms. Intuitively, if the number of local minima is small, then one may expect better performance from meta-heuristic algorithms that rely on local neighborhood searches. A solution landscape is considered to be rugged if the number of local minima is exponential with respect to the dimensions of the problem [21]. Evidence in [5] showed that ruggedness of the solution landscape has a direct impact on the effectiveness of the simulated annealing heuristic in solving at least one other hard problem, the quadratic assignment problem. Thus, one of the issues that we address below is the estimation of the expected number E[M] of local minima in random MAPs with respect to different local neighborhoods.
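For tiny instances, E[M] can be estimated by brute force. The sketch below assumes one particular neighborhood, chosen purely for illustration: a 2-exchange that swaps two entries of a single permutation π_k. A solution counts as a local minimum if no neighbor is strictly cheaper. Costs are drawn iid uniform on (0, 1), and all names are hypothetical. Because the global minimum is always a local minimum, every count is at least 1.

```python
import random
from itertools import permutations, product

def map_cost(cost, perms):
    """Cost of the solution given by permutations pi_1, ..., pi_{d-1} (0-based)."""
    n = len(perms[0])
    return sum(cost[(i,) + tuple(p[i] for p in perms)] for i in range(n))

def is_local_min(cost, perms):
    """No 2-exchange neighbor (two entries of one pi_k swapped) is strictly cheaper."""
    base = map_cost(cost, perms)
    n = len(perms[0])
    for k in range(len(perms)):
        for a in range(n - 1):
            for b in range(a + 1, n):
                p = list(perms[k])
                p[a], p[b] = p[b], p[a]
                neighbor = perms[:k] + (tuple(p),) + perms[k + 1:]
                if map_cost(cost, neighbor) < base:
                    return False
    return True

def count_local_minima(cost, n, d):
    """Number of local minima among all (n!)^(d-1) feasible solutions."""
    return sum(is_local_min(cost, perms)
               for perms in product(permutations(range(n)), repeat=d - 1))

def estimate_expected_minima(n, d, trials=50, seed=0):
    """Average count over random instances with iid uniform(0, 1) costs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        cost = {t: rng.random() for t in product(range(n), repeat=d)}
        total += count_local_minima(cost, n, d)
    return total / trials

print(estimate_expected_minima(n=3, d=3, trials=20))
```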

Another problem that we discuss is the behavior of the expected optimal value Z*_{d,n} of random large-scale MAPs, whose assignment costs are assumed to be independent identically distributed (iid) random variables from a given continuous distribution.

During the last two decades, expected optimal values of random assignment problems have been studied intensively in the context of the random LAP. Perhaps the most widely known result in this area is the conjecture by Mézard and Parisi [17] that the expected optimal value E[L_n] := Z*_{2,n} of a LAP of size n with iid uniform or exponential with mean 1 cost coefficients satisfies lim_{n→∞} E[L_n] = π²/6. In fact, this conjecture was preceded by an upper bound on the expected optimal value of the LAP with uniform (0,1) costs, lim sup_{n→∞} E[L_n] < 3, due to Walkup [32], which was soon improved by Karp [12] to lim sup_{n→∞} E[L_n] < 2. A lower bound on the limiting value of E[L_n] was first provided by Lazarus [14], lim inf_{n→∞} E[L_n] ≥ 1 + e^{−1} ≈ 1.37, and has since been improved to 1.44 by Goemans and Kodialam [9] and to 1.51 by Olin [20]. Experimental evidence in support of the Mézard-Parisi conjecture was provided by Pardalos and Ramakrishnan [22]. Recently, Aldous [2] has shown that indeed lim_{n→∞} E[L_n] = π²/6, thereby proving the conjecture. Another conjecture, due to Parisi [23], stating that the expected optimal value of a random LAP of finite size n with exponentially distributed iid costs is equal to E[L_n] = Z*_{2,n} = Σ_{i=1}^{n} 1/i², has been proven independently in [16] and [19].
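Parisi's formula is easy to check empirically for small n. The sketch below assumes exponential costs with mean 1, evaluates Σ_{i=1}^{n} 1/i², and compares it with a seeded Monte Carlo estimate of E[L_n] obtained by brute force over all n! assignments (feasible only for small n). The partial sums also approach the limit π²/6 of the Mézard-Parisi conjecture. Function names are illustrative.

```python
import math
import random
from itertools import permutations

def parisi_value(n):
    """E[L_n] = sum_{i=1}^n 1/i^2 for iid exponential(1) costs."""
    return sum(1.0 / i ** 2 for i in range(1, n + 1))

def simulate_lap(n, trials=20000, seed=0):
    """Monte Carlo estimate of E[L_n] by brute force over all n! assignments."""
    rng = random.Random(seed)
    perms = list(permutations(range(n)))
    total = 0.0
    for _ in range(trials):
        c = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
        total += min(sum(c[i][p[i]] for i in range(n)) for p in perms)
    return total / trials

print(parisi_value(3), simulate_lap(3))    # both close to 1.3611
print(parisi_value(10 ** 5), math.pi ** 2 / 6)
```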

Our work contributes to the existing literature on random assignment problems by establishing the limiting value and asymptotic behavior of the expected optimal cost Z*_{d,n} of random MAPs with iid cost coefficients for a broad class of continuous distributions. The presented approach is constructive in the sense that it allows for deriving converging asymptotic lower and upper bounds for Z*_{d,n}, as well as for estimating the rate of convergence of Z*_{d,n} in special cases.


Expected Optimal Value of Random MAP 


Our approach to determining the asymptotic behavior of the expected optimal cost Z*_{d,n} of an MAP (1) with random cost coefficients involves analysis of the so-called index tree, a graph structure that represents the set of feasible solutions of the MAP. First introduced by Pierskalla [26], the index tree graph G = (V, E) of the MAP (1) has a set of vertices V which is partitioned into n levels¹ and a distinct root node. A node at level j of the graph represents an assignment (i_1, ..., i_d) with i_1 = j and cost c_{i_1⋯i_d}, whereby each level contains k = n^{d−1} nodes. The set E of arcs in the index tree graph is constructed in such a way that any feasible solution of the MAP (1) can be represented as a path connecting the root node to a leaf node at level n (such a path is called a feasible path); evidently, the index tree contains (n!)^{d−1} feasible paths, by the number of feasible solutions of the MAP (1).

¹In the general case of MAP with n_i elements in dimension i = 1, ..., d, the index graph would contain n_1 levels.

The index tree representation of MAP aids in the construction of lower and upper bounds for the expected optimal cost of MAP (1) with random iid costs via the following lemmata [10].


Lemma 1. Given the index tree graph $G = (V, E)$ of a
$d \ge 3$, $n \ge 3$ MAP whose assignment costs are iid random
variables from an absolutely continuous distribution,
construct a set $A \subset V$ by randomly selecting α different
nodes from each level of the index tree. Then, A is
expected to contain a feasible solution of the MAP if

$$\alpha = \left\lceil \frac{n^{d-1}}{(n!)^{(d-1)/n}} \right\rceil . \tag{3}$$
Lemma 2. For a $d \ge 3$, $n \ge 3$ MAP whose cost coefficients
are iid random variables from an absolutely continuous
distribution F with existing first moment, define

$$\underline{Z}^*_{d,n} := n\,E_F\!\left[X_{(1|\kappa)}\right] \quad \text{and} \quad \overline{Z}^*_{d,n} := n\,E_F\!\left[X_{(\alpha|\kappa)}\right], \tag{4}$$

where $X_{(i|\kappa)}$ is the ith order statistic of $\kappa = n^{d-1}$ iid random
variables with distribution F, and the parameter α is
determined as in (3). Then, $\underline{Z}^*_{d,n}$ and $\overline{Z}^*_{d,n}$ constitute
lower and upper bounds for the expected optimal cost
$Z^*_{d,n}$ of the MAP, respectively: $\underline{Z}^*_{d,n} \le Z^*_{d,n} \le \overline{Z}^*_{d,n}$.


Proofs of the lemmas are based on the probabilistic
method [3] and can be found in [10]. In particular, the
proof of Lemma 2 considers a set $A_{\min}$ that is constructed
by selecting from each level of the index tree
the α nodes with the smallest costs among the κ nodes at
that level. The continuity of distribution F ensures that
assignment costs in the MAP (1) are all different almost
surely, hence the locations of the nodes that comprise the
set $A_{\min}$ are random with respect to the array of nodes
in each level of $G(V, E)$. In the remainder of the paper,
we always refer to α and κ as defined above.
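For the two distributions used in the experiments below, the bounds (4) have closed forms, because the expected order statistics are known: $E[X_{(i|\kappa)}] = i/(\kappa+1)$ for U(0,1), and $E[X_{(i|\kappa)}] = \sum_{j=\kappa-i+1}^{\kappa} 1/j$ for the exponential distribution with mean 1. The minimal sketch below assumes these two distributions and computes α from (3) in log space so that n! never overflows. The function names are hypothetical.

```python
import math


def alpha(d: int, n: int) -> int:
    """Parameter (3): ceil((n / (n!)^(1/n))^(d-1)), evaluated in log space."""
    log_ratio = math.log(n) - math.lgamma(n + 1) / n
    return math.ceil(math.exp((d - 1) * log_ratio))


def bounds_uniform(d: int, n: int):
    """Bounds (4) for U(0,1) costs, using E[X_(i|k)] = i / (k + 1)."""
    kappa, a = n ** (d - 1), alpha(d, n)
    return n / (kappa + 1), n * a / (kappa + 1)


def bounds_exponential(d: int, n: int):
    """Bounds (4) for exp(1) costs, using E[X_(i|k)] = sum_{j=k-i+1}^{k} 1/j."""
    kappa, a = n ** (d - 1), alpha(d, n)
    lower = n / kappa  # E[X_(1|k)] = 1/k
    upper = n * sum(1.0 / j for j in range(kappa - a + 1, kappa + 1))
    return lower, upper


for d, n in [(3, 3), (3, 10), (5, 5)]:
    print(d, n, alpha(d, n), bounds_uniform(d, n), bounds_exponential(d, n))
```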

By definition, the parameter $\kappa = n^{d-1}$ approaches
infinity whenever n or d does; this allows us to denote
the corresponding cases by $\kappa \xrightarrow{n} \infty$ and $\kappa \xrightarrow{d} \infty$,
respectively. If a certain statement holds for both cases
$n \to \infty$ and $d \to \infty$, we indicate this by $\kappa \xrightarrow{n,d} \infty$. The
behavior of the quantity α (3) when n or d increases is more
contrasting. In the case $n \to \infty$ it approaches a finite
limiting value (since $(n!)^{1/n} \sim n/e$ by Stirling's formula),

$$\alpha \to \alpha^* := \left\lceil e^{d-1} \right\rceil, \quad \kappa \xrightarrow{n} \infty, \tag{5}$$

while in the case of fixed n and unbounded d it increases
exponentially:

$$\alpha \sim \kappa^{\gamma_n}, \quad \kappa \xrightarrow{d} \infty, \quad \text{where} \quad \gamma_n = 1 - \frac{\ln n!}{n \ln n}, \tag{6}$$

and it is important to observe that $0 < \gamma_n < \tfrac{1}{2}$ for
$n \ge 3$ [13].
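The limit (5) and the exponent (6) can be checked directly. The sketch below evaluates α for growing n at fixed d = 4, compares it with $\lceil e^{d-1} \rceil$, and tabulates $\gamma_n$. The helper names are illustrative.

```python
import math


def alpha(d: int, n: int) -> int:
    """Parameter (3), computed in log space to avoid overflow of n!."""
    return math.ceil(math.exp((d - 1) * (math.log(n) - math.lgamma(n + 1) / n)))


def gamma_n(n: int) -> float:
    """Exponent (6): gamma_n = 1 - ln(n!) / (n ln n)."""
    return 1.0 - math.lgamma(n + 1) / (n * math.log(n))


d = 4
print("alpha(4, n):", [alpha(d, n) for n in (3, 10, 100, 10_000, 1_000_000)])
print("limit (5):", math.ceil(math.e ** (d - 1)))
print("gamma_n:", [round(gamma_n(n), 4) for n in (3, 4, 5, 10, 100)])
```

The convergence in (5) is slow: $(n/(n!)^{1/n})^{d-1}$ approaches $e^{d-1}$ from below, so the ceiling only settles once n is large.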

The presented lemmata address MAPs with
$d \ge 3$, $n \ge 3$. The case $d = 2$ represents, as noted earlier,
the Linear Assignment Problem, whose asymptotic
behavior is distinctly different from that of MAPs with
$d \ge 3$. It can be shown that in the case $d = 2$ Lemmas
1 and 2 produce only trivial bounds that are rather
inefficient in determining the asymptotic behavior of
the expected optimal value of the LAP within the presented
approach. In the case $n = 2$ the costs of feasible
solutions to the MAP (1) have the form

$$z = c_{i_1^{(1)} \cdots i_d^{(1)}} + c_{i_1^{(2)} \cdots i_d^{(2)}}, \quad \text{where } i_k^{(j)} \in \{1, 2\},\ i_k^{(1)} \ne i_k^{(2)},\ k = 1, \dots, d,$$

and consequently are iid random variables (every assignment
belongs to exactly one of the $2^{d-1}$ feasible solutions) with
distribution $F_2$, which is the convolution of F with itself:
$F_2 = F * F$ [11]. This fact allows for computing the expected
optimal value of the $n = 2$ MAP exactly, without resorting
to bounds (4):

$$Z^*_{d,2} = E_{F * F}\!\left[X_{(1|2^{d-1})}\right]. \tag{7}$$

In the general case $d \ge 3$, $n \ge 3$ the main challenge
is the computation of the upper bound
$\overline{Z}^*_{d,n} = n E_F[X_{(\alpha|\kappa)}]$, where $X_{(\alpha|\kappa)}$ is the αth order
statistic among κ independent F-distributed random
variables. The subsequent analysis relies on the representation
of $\overline{Z}^*_{d,n}$ in the form

$$\overline{Z}^*_{d,n} = \frac{n\,\Gamma(\kappa+1)}{\Gamma(\alpha)\,\Gamma(\kappa-\alpha+1)} \int_0^1 F^{-1}(u)\, u^{\alpha-1} (1-u)^{\kappa-\alpha}\, du, \tag{8}$$

where $F^{-1}$ denotes the inverse of the c.d.f. F of the
distribution of assignment costs in MAP (1). While it
is practically impossible to evaluate the integral in (8)
exactly in the general case, its asymptotic behavior for
large n and d can be determined for a wide range of distributions
F. For instance, in the case when the distribution
F has a finite left endpoint of its support set, the asymptotic
behavior of the integral in (8) is obtained by means
of the following


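Lemma 3. Let the function h(u) have the following asymptotic
expansion at 0+:

$$h(u) \sim \sum_{s=0}^{\infty} a_s\, u^{(s+\lambda-\mu)/\mu}, \quad u \to 0+, \tag{9}$$

where $\lambda, \mu > 0$. Then for any positive integer m one has

$$\int_0^1 h(u)\, u^{\alpha-1} (1-u)^{\kappa-\alpha}\, du = \sum_{s=0}^{m-1} a_s \Phi_s(\kappa) + O\big(\Phi_m(\kappa)\big), \quad \kappa \xrightarrow{n,d} \infty, \tag{10}$$

where $\Phi_s(\kappa) = B\!\left(\frac{s+\lambda}{\mu} + \alpha - 1,\ \kappa - \alpha + 1\right)$, $s = 0, 1, \dots$,
provided that the integral is absolutely convergent for
$\kappa = \alpha = 1$.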


Above, B(x, y) is the Beta function. Using similar re- 
sults for the cases when the support set of distribution 
F is unbounded from below, we obtain that the limiting 
behavior of the expected optimal value $Z^*_{d,n}$ of random
MAP is determined by the location of the left endpoint 
of the support of F [13]. 
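Representation (8) can be sanity-checked numerically whenever $F^{-1}$ is known in closed form. The sketch below uses a midpoint-rule quadrature, keeping the Beta weights in log space. It evaluates (8) for U(0,1) and exponential(1) costs and compares the results with the closed forms $n\alpha/(\kappa+1)$ and $n\sum_{j=\kappa-\alpha+1}^{\kappa} 1/j$. Names and the quadrature choice are illustrative assumptions.

```python
import math


def upper_bound_via_8(finv, d: int, n: int, alpha: int, points: int = 20_000) -> float:
    """Evaluate representation (8) of the upper bound by the midpoint rule.

    finv  -- inverse c.d.f. F^{-1} of the cost distribution (assumed known)
    alpha -- the parameter from (3), passed in explicitly
    """
    kappa = n ** (d - 1)
    # log of Gamma(kappa+1) / (Gamma(alpha) Gamma(kappa-alpha+1))
    log_c = math.lgamma(kappa + 1) - math.lgamma(alpha) - math.lgamma(kappa - alpha + 1)
    h = 1.0 / points
    total = 0.0
    for i in range(points):
        u = (i + 0.5) * h  # midpoints never hit u = 0 or u = 1
        log_w = log_c + (alpha - 1) * math.log(u) + (kappa - alpha) * math.log1p(-u)
        total += finv(u) * math.exp(log_w)
    return n * total * h


def uniform_inv(u: float) -> float:
    return u


def exponential_inv(u: float) -> float:
    return -math.log1p(-u)


# d = n = 3 gives kappa = 9 and alpha = 3 from (3)
print(upper_bound_via_8(uniform_inv, 3, 3, 3), 3 * 3 / (9 + 1))
print(upper_bound_via_8(exponential_inv, 3, 3, 3), 3 * (1 / 7 + 1 / 8 + 1 / 9))
```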


Theorem 1 (Expected Optimal Value of Random
MAP). Consider a $d \ge 3$, $n \ge 2$ MAP (1) with cost coefficients
that are iid random variables from an absolutely
continuous distribution F with existing first moment. If
the distribution F satisfies either of the following conditions,

1. $F^{-1}(u) = F^{-1}(0+) + O(u^{\beta})$, $u \to 0+$, $\beta > 0$;
2. $F^{-1}(u) \sim -\nu\, u^{-\beta_1} \left(\ln u^{-1}\right)^{\beta_2}$, $u \to 0+$, $0 \le \beta_1 < 1$, $\beta_2 \ge 0$, $\beta_1 + \beta_2 > 0$, $\nu > 0$,

where $F^{-1}(0+) = \lim_{u \to 0+} F^{-1}(u)$, then the expected optimal
value of the MAP satisfies

$$\lim Z^*_{d,n} = \lim n F^{-1}(0+),$$

where both limits are taken as either $n \to \infty$ or $d \to \infty$.


The obtained results can be readily employed to con- 
struct upper and lower asymptotical bounds for the ex- 
pected optimal value of MAP when one of the param- 
eters n or d is large but finite. The following statement 
follows directly from Lemma 3 and Theorem 1. 
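Corollary 1. Consider a $d \ge 3$, $n \ge 3$ MAP (1) with
cost coefficients that are iid random variables from an
absolutely continuous distribution with existing first moment.
Let $a \in \mathbb{R}$ be the left endpoint of the support set of
this distribution, $a = F^{-1}(0+)$, and assume that the inverse
$F^{-1}(u)$ of the c.d.f. $F(u)$ of the distribution is such
that

$$F^{-1}(u) \sim a + \sum_{s=1}^{\infty} a_s u^{s/\mu}, \quad u \to 0+, \quad \mu > 0. \tag{11}$$

Then, for any integer $m \ge 1$, the lower and upper bounds
$\underline{Z}^*_{d,n}$, $\overline{Z}^*_{d,n}$ (4) on the expected optimal cost $Z^*_{d,n}$ of the
MAP can be asymptotically evaluated as

$$\underline{Z}^*_{d,n} = an + \sum_{s=1}^{m-1} a_s \frac{n\,\Gamma(\kappa+1)\,\Gamma\!\left(\frac{s}{\mu}+1\right)}{\Gamma\!\left(\kappa+\frac{s}{\mu}+1\right)} + O\!\left(n\, \frac{\Gamma(\kappa+1)}{\Gamma\!\left(\kappa+\frac{m}{\mu}+1\right)}\right), \tag{12a}$$

$$\overline{Z}^*_{d,n} = an + \sum_{s=1}^{m-1} a_s \frac{n\,\Gamma(\kappa+1)\,\Gamma\!\left(\frac{s}{\mu}+\alpha\right)}{\Gamma(\alpha)\,\Gamma\!\left(\kappa+\frac{s}{\mu}+1\right)} + O\!\left(n\, \frac{\Gamma(\kappa+1)\,\Gamma\!\left(\frac{m}{\mu}+\alpha\right)}{\Gamma(\alpha)\,\Gamma\!\left(\kappa+\frac{m}{\mu}+1\right)}\right). \tag{12b}$$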

It can be shown that the lower and upper bounds defined
by (12a, 12b) are convergent, i.e., $|\overline{Z}^*_{d,n} - \underline{Z}^*_{d,n}| \to 0$,
$\kappa \xrightarrow{n,d} \infty$, whereas the corresponding asymptotical
bounds for the case of distributions with support unbounded
from below may be divergent in the sense that
$|\overline{Z}^*_{d,n} - \underline{Z}^*_{d,n}| \not\to 0$ when $\kappa \xrightarrow{n,d} \infty$.
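For exponential(1) costs, $F^{-1}(u) = -\ln(1-u) = \sum_{s \ge 1} u^s/s$, so (11) holds with $a = 0$, $\mu = 1$ and $a_s = 1/s$. The sketch below evaluates the partial sums of (12b) with the remainder term dropped and compares them with the exact upper bound. Every term is positive in this case, so the partial sums increase towards the exact value. The function names are illustrative.

```python
import math


def alpha(d: int, n: int) -> int:
    """Parameter (3), computed in log space."""
    return math.ceil(math.exp((d - 1) * (math.log(n) - math.lgamma(n + 1) / n)))


def upper_bound_series(d: int, n: int, m: int) -> float:
    """Partial sum of (12b) for exp(1) costs (a = 0, mu = 1, a_s = 1/s), O-term dropped."""
    kappa, a = n ** (d - 1), alpha(d, n)
    total = 0.0
    for s in range(1, m):
        log_term = (math.lgamma(kappa + 1) + math.lgamma(s + a)
                    - math.lgamma(a) - math.lgamma(kappa + s + 1))
        total += math.exp(log_term) / s
    return n * total


def upper_bound_exact(d: int, n: int) -> float:
    """Exact n E[X_(alpha|kappa)] for exp(1) costs."""
    kappa, a = n ** (d - 1), alpha(d, n)
    return n * sum(1.0 / j for j in range(kappa - a + 1, kappa + 1))


for d, n in [(3, 3), (3, 20), (6, 5)]:
    exact = upper_bound_exact(d, n)
    print(d, n, exact, [upper_bound_series(d, n, m) / exact for m in (2, 3, 4)])
```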

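The asymptotical representations (12a, 12b) for the
bounds $\underline{Z}^*_{d,n}$ and $\overline{Z}^*_{d,n}$ are simplified when the inverse
$F^{-1}$ of the c.d.f. of the distribution has a regular power
series expansion in the vicinity of zero. Assume, for example,
that the function $F^{-1}$ can be written as

$$F^{-1}(u) = a_1 u + O(u^2), \quad u \to 0+. \tag{13}$$

It is then easy to see (take $a = 0$, $\mu = 1$, $m = 2$ in (12a, 12b)
and use (5)) that for $n \gg 1$ and d fixed the expected optimal
value of the MAP is asymptotically bounded as

$$\frac{a_1}{n^{d-2}} + O\!\left(\frac{1}{n^{2d-3}}\right) \le Z^*_{d,n} \le \frac{a_1 \left\lceil e^{d-1} \right\rceil}{n^{d-2}} + O\!\left(\frac{1}{n^{2d-3}}\right), \quad n \to \infty, \tag{14}$$

which immediately yields the rate of convergence to
zero for $Z^*_{d,n}$ as n approaches infinity: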


Corollary 2. Consider a $d \ge 3$, $n \ge 3$ MAP (1) with
cost coefficients that are iid random variables from an
absolutely continuous distribution with existing first moment.
Let the inverse $F^{-1}$ of the c.d.f. of the distribution
satisfy (13). Then, for a fixed d and $n \to \infty$, the expected
optimal value $Z^*_{d,n}$ of the MAP converges to zero
as $O(n^{-(d-2)})$.


For example, the expected optimal value of a 3-dimensional
(d = 3) MAP with uniform U(0, 1) or exponential
distributions converges to zero as $O(n^{-1})$ when
$n \to \infty$.

We illustrate the tightness of the developed bounds
(12a, 12b) by comparing them to the computed expected
optimal values of MAPs with coefficients $c_{i_1 \cdots i_d}$
drawn from the uniform U(0,1) distribution and the exponential
distribution with mean 1. It is elementary
that the inverse functions $F^{-1}(\cdot)$ of the c.d.f.'s of both
these distributions are representable in the form (13) with
$a_1 = 1$: $F^{-1}(u) = u$ for U(0,1), and $F^{-1}(u) = -\ln(1-u) = u + O(u^2)$
for the exponential distribution.
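Corollary 2 can be illustrated with U(0,1) costs, for which $a_1 = 1$ and the bounds (4) are $n/(\kappa+1)$ and $n\alpha/(\kappa+1)$. Assuming these closed forms, the sketch below multiplies both bounds by $n^{d-2}$. By (14), the two results should approach $a_1 = 1$ and $a_1 \lceil e^{d-1} \rceil$. The names are illustrative.

```python
import math


def alpha(d: int, n: int) -> int:
    """Parameter (3), computed in log space."""
    return math.ceil(math.exp((d - 1) * (math.log(n) - math.lgamma(n + 1) / n)))


def scaled_bounds_uniform(d: int, n: int):
    """n^(d-2) times the bounds (4) for U(0,1) costs (a_1 = 1 in (13))."""
    kappa = n ** (d - 1)
    lower = n / (kappa + 1)
    upper = n * alpha(d, n) / (kappa + 1)
    scale = n ** (d - 2)
    return lower * scale, upper * scale


for n in (10, 100, 1000):
    print(n, scaled_bounds_uniform(3, n))
print("expected limits:", 1, math.ceil(math.e ** 2))
```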

The numerical experiments involved solving mul- 
tiple instances of randomly generated MAPs with the 
number of dimensions d ranging from 3 to 10, and 
the number n of elements in each dimension running 
from 3 to 20. The number of instances generated for 
estimation of the expected optimal value of the MAP 
with a given distribution of cost coefficients varied from 
1000 (for smaller values of d and n) to 50 (for problems 
with largest n and d). 

To solve the problems to optimality, we used
a branch-and-bound algorithm that navigated through
the index tree representation of the MAP. Figures 1
and 2 display the obtained expected optimal values
of MAP with uniform and exponential iid cost coefficients
when d is fixed at d = 3 or 5 and n = 3, …, 20,
and when n = 3 or 5 and d runs from 3 to 10. This
"asymmetry" in reporting of the results is explained by
the fact that the implemented branch-and-bound algorithm
based on the index tree is more efficient in solving
"shallow" MAPs, i.e., instances that have larger n and
smaller d. The solution times varied from several seconds
to 20 hours on a 2 GHz PC.

Figure 1: Expected optimal value $Z^*_{d,n}$, lower and upper bounds $\underline{Z}^*_{d,n}$, $\overline{Z}^*_{d,n}$ of an MAP with fixed d = 3 (left) and d = 5 (right) for uniform U(0, 1) and exponential(1) distributions

Figure 2: Expected optimal value $Z^*_{d,n}$, lower and upper bounds $\underline{Z}^*_{d,n}$, $\overline{Z}^*_{d,n}$ of an MAP with fixed n = 3 (left) and n = 5 (right) for uniform U(0, 1) and exponential(1) distributions

The conducted numerical experiments suggest that
the constructed lower and upper bounds for the expected
optimal cost of random MAPs are quite tight,
with the upper bound $\overline{Z}^*_{d,n}$ being tighter for the case of
fixed n and large d (see Figs. 1, 2).


Expected Number of Local Minima 
in Random MAP 


Local Minima and p-exchange Neighborhoods 
in MAP 


As mentioned in the Introduction, we
consider local minima of a MAP with respect to
a local neighborhood, in the sense of [15]. For
any $p = 2, \dots, n$, we define the p-exchange local
neighborhood $N_p(i)$ of the ith feasible solution
$\{i_1^{(1)} \cdots i_d^{(1)}, \dots, i_1^{(n)} \cdots i_d^{(n)}\}$ of the MAP (1) as
the set of solutions obtained from i by permuting
p or fewer elements in one of the dimensions $1, \dots, d$.
More formally, $N_p(i)$ is the set of n-tuples
$j = \{j_1^{(1)} \cdots j_d^{(1)}, \dots, j_1^{(n)} \cdots j_d^{(n)}\}$ such that $\{j_k^{(1)}, \dots, j_k^{(n)}\}$ is
a permutation of $\{i_k^{(1)}, \dots, i_k^{(n)}\}$ for all $1 \le k \le d$, and, furthermore,
there exists only one $k_0 \in \{1, \dots, d\}$ such
that

$$2 \le \sum_{r=1}^{n} \bar\delta_{i_{k_0}^{(r)} j_{k_0}^{(r)}} \le p, \quad \text{while} \quad \sum_{r=1}^{n} \bar\delta_{i_k^{(r)} j_k^{(r)}} = 0 \quad \text{for all } k \in \{1, \dots, d\} \setminus k_0, \tag{15}$$

where $\bar\delta_{ij}$ is the negation of the Kronecker delta, $\bar\delta_{ij} = 1 - \delta_{ij}$.
As an example, consider the following feasible
solution to a d = 3, n = 3 MAP: {111, 222, 333}. Then,
one of its 2-exchange neighbors is {111, 322, 233}, another
one is {131, 222, 313}; a 3-exchange neighbor
is given by {311, 122, 233}, etc. Evidently, one has

$$N_p \subset N_{p+1} \quad \text{for } p = 2, \dots, n-1.$$


Proposition 1. For any $p = 2, \dots, n$, the size $|N_p|$ of
the p-exchange local neighborhood of a feasible solution
of a MAP (1) is equal to

$$|N_p| = d \sum_{k=2}^{p} \binom{n}{k} D(k), \quad \text{where} \quad D(k) = k! \sum_{i=0}^{k} \frac{(-1)^i}{i!}. \tag{16}$$


The quantity D(k) in (16) is known as the number of
derangements of a k-element set [29], i.e., the number
of permutations $\{1, 2, \dots, k\} \mapsto \{i_1, i_2, \dots, i_k\}$
such that $i_1 \ne 1, \dots, i_k \ne k$, and can be easily calculated
by means of the recurrence relation (see [29])

$$D(k) = k D(k-1) + (-1)^k, \quad D(1) = 0,$$

so that, for example, D(2) = 1, D(3) = 2, D(4) = 9,
and so on. Then, according to Proposition 1, the size of
a 2-exchange neighborhood is $|N_2| = d\binom{n}{2}$, the size of
a 3-exchange neighborhood is $|N_3| = d\left[\binom{n}{2} + 2\binom{n}{3}\right]$,
etc.

Note also that the size of the p-exchange neighborhood
is linear in the number of dimensions d. Depending on
p, $|N_p|$ is either polynomial or exponential in the number
of elements n per dimension (for fixed p it grows like $n^p$,
while for p = n it contains the term $D(n)$), as follows from the
representation

$$D(n) = n!\left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + \frac{(-1)^n}{n!}\right) \approx \frac{n!}{e}, \quad n \gg 1.$$
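Proposition 1 and the derangement recurrence are easy to verify by brute force on small instances. The sketch below computes $D(k)$ and $|N_p|$ from (16). It also enumerates $N_p$ directly from definition (15), representing a feasible solution as a set of index tuples. The brute-force count checks that, for $d \ge 3$, distinct exchanges produce distinct solutions. All names are illustrative.

```python
import itertools
import math


def derangements(k: int) -> int:
    """D(k) via D(k) = k D(k-1) + (-1)^k, starting from D(0) = 1 (so D(1) = 0)."""
    D = 1
    for j in range(1, k + 1):
        D = j * D + (-1) ** j
    return D


def neighborhood_size(n: int, d: int, p: int) -> int:
    """|N_p| from (16)."""
    return d * sum(math.comb(n, k) * derangements(k) for k in range(2, p + 1))


def p_exchange_neighbors(solution, d: int, p: int):
    """Brute-force N_p: derange between 2 and p entries of exactly one
    dimension k, leaving all other dimensions untouched (definition (15))."""
    tuples = sorted(solution)
    n = len(tuples)
    result = set()
    for k in range(d):
        for size in range(2, p + 1):
            for subset in itertools.combinations(range(n), size):
                values = [tuples[r][k] for r in subset]
                for perm in itertools.permutations(range(size)):
                    if any(perm[i] == i for i in range(size)):
                        continue  # not a derangement: some entry would stay put
                    new = list(tuples)
                    for i, r in enumerate(subset):
                        t = list(tuples[r])
                        t[k] = values[perm[i]]
                        new[r] = tuple(t)
                    result.add(frozenset(new))
    return result


start = frozenset({(1, 1, 1), (2, 2, 2), (3, 3, 3)})
for p in (2, 3):
    print(p, len(p_exchange_neighbors(start, 3, p)), neighborhood_size(3, 3, p))
```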


The definition of a local minimum with respect
to the p-exchange neighborhood is then straightforward.
The kth feasible solution with cost $z_k$ is
a p-exchange local minimum iff $z_k \le z_j$ for all
$j \in N_p(k)$. Continuing the example above, the solution
{111, 222, 333} is a 2-exchange local minimum iff its
cost $z_1 = c_{111} + c_{222} + c_{333}$ is less than or equal to the costs
of all of its 2-exchange neighbors.

The number $M_p$ of local minima of the MAP is obtained
by counting the feasible solutions that are local
minima with respect to neighborhoods $N_p$. In a random
MAP, where the assignment costs are random
variables, $M_p$ becomes a random quantity itself. In this
paper we are interested in determining the expected
number $E[M_p]$ of local minima in random MAPs that
have iid assignment costs with continuous distribution.


Expected Number of Local Minima in MAP
with n = 2


As noted above, in the special case of random
MAP with n = 2, $d \ge 3$, the costs of feasible solutions
are iid random variables with distribution $F * F$, where
F is the distribution of the assignment costs. This special
structure of the feasible set allows for a closed-form
expression for the expected number of local minima
$E[M]$ (note that in an n = 2 MAP the largest local neighborhood
is $N_2$, thus $M = M_2$), as established in [11].


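Theorem 2. In an n = 2, $d \ge 3$ MAP with cost coefficients
that are iid continuous random variables, the expected
number of local minima is given by

$$E[M] = \frac{2^{d-1}}{d+1}. \tag{17}$$

Indeed, each of the $2^{d-1}$ feasible solutions has $|N_2| = d$
neighbors, and since the $d + 1$ solution costs involved are iid
and continuous, each solution is a local minimum with probability
$1/(d+1)$. Equality (17) implies that in an n = 2, $d \ge 3$ MAP the
expected number of local minima $E[M]$ is exponential in d when
the cost coefficients are independently drawn from any
continuous distribution.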
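Theorem 2 can be checked by simulation. The sketch below enumerates all feasible solutions of a small MAP, counts the 2-exchange local minima of random instances, and compares the Monte Carlo average for n = 2, d = 4 with $2^{d-1}/(d+1) = 1.6$. The names and the choice of U(0,1) costs are assumptions made for illustration. For n = 2, d = 3, every solution is a neighbor of every other one, so exactly one local minimum exists.

```python
import itertools
import random


def feasible_solutions(n: int, d: int):
    """All (n!)^(d-1) feasible solutions, each a frozenset of n index tuples:
    tuple r has first index r and one permutation per remaining dimension."""
    perms = list(itertools.permutations(range(n)))
    for choice in itertools.product(perms, repeat=d - 1):
        yield frozenset((r,) + tuple(pi[r] for pi in choice) for r in range(n))


def two_exchange_neighbors(solution, d: int):
    """Solutions obtained by swapping the k-th index of two tuples (set N_2)."""
    tuples = sorted(solution)
    neighbors = []
    for k in range(d):
        for a, b in itertools.combinations(range(len(tuples)), 2):
            ta, tb = list(tuples[a]), list(tuples[b])
            ta[k], tb[k] = tb[k], ta[k]
            rest = set(tuples) - {tuples[a], tuples[b]}
            neighbors.append(frozenset(rest | {tuple(ta), tuple(tb)}))
    return neighbors


def count_local_minima(n: int, d: int, rng: random.Random) -> int:
    """M_2 for one random instance with iid U(0,1) assignment costs."""
    cost = {idx: rng.random() for idx in itertools.product(range(n), repeat=d)}
    solutions = list(feasible_solutions(n, d))
    z = {s: sum(cost[t] for t in s) for s in solutions}
    return sum(
        all(z[s] <= z[nb] for nb in two_exchange_neighbors(s, d)) for s in solutions
    )


rng = random.Random(2024)
d, trials = 4, 4000
estimate = sum(count_local_minima(2, d, rng) for _ in range(trials)) / trials
print("Monte Carlo E[M]:", estimate, " formula (17):", 2 ** (d - 1) / (d + 1))
```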
Expected Number of Local Minima
in a Random MAP with Normally Distributed Costs


Our ability to derive a closed-form expression (17) for
the expected number of local minima $E[M]$ in the previous
section has relied on the independence of feasible
solution costs (2) in an n = 2 MAP. As is easy to verify
directly, in the case $n \ge 3$ the costs of feasible solutions
are generally not independent. This complicates the
analysis significantly if an arbitrary continuous distribution
for the assignment costs $c_{i_1 \cdots i_d}$ in (1) is assumed.
However, as we show below, one can derive upper and
lower bounds for $E[M]$ in the case when the cost coefficients
of (1) are independent normally distributed random
variables. First, we develop bounds for the number
of local minima $E[M_2]$ defined with respect to 2-exchange
neighborhoods $N_2$, which are most widely used in
practice.


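2-exchange Local Neighborhoods Noting that in the
general case the number N of feasible solutions to
MAP (1) is equal to $N = (n!)^{d-1}$, the expected number
of local minima $E[M_2]$ with respect to local 2-exchange
neighborhoods can be written in the form

$$E[M_2] = \sum_{k=1}^{N} P\Big[\bigcap_{j \in N_2(k)} z_k - z_j \le 0\Big], \tag{18}$$

where $N_2(k)$ is the 2-exchange neighborhood of the kth
feasible solution, and $z_i$ is the cost of the ith feasible solution,
$i = 1, \dots, N$. If we allow the $n^d$ cost coefficients
$c_{i_1 \cdots i_d}$ of the MAP to be independent normal
$N(\mu, \sigma^2)$ random variables, then the probability term
in (18) can be expressed as

$$P\Big[\bigcap_{j \in N_2(k)} z_k - z_j \le 0\Big] = F_\Sigma(0), \tag{19}$$

where $F_\Sigma$ is the c.d.f. of the $|N_2|$-dimensional random
vector

$$Z = (Z_{rsq}), \quad Z_{rsq} = z_k - z_{j(r,s,q)}, \quad 1 \le r < s \le n,\ q = 1, \dots, d, \tag{20}$$

with $j(r,s,q) \in N_2(k)$ denoting the neighbor obtained from the
kth solution by exchanging the qth indices of its rth and sth
assignments. Vector Z has a normal distribution $N(0, \Sigma)$ with
the covariance matrix Σ defined as

$$\mathrm{Cov}(Z_{rsq}, Z_{ijk}) = \begin{cases} 4\sigma^2, & \text{if } i = r,\ j = s,\ q = k, \\ 2\sigma^2, & \text{if } i = r,\ j = s,\ q \ne k, \\ \sigma^2, & \text{if } (i = r,\ j \ne s) \text{ or } (i \ne r,\ j = s), \\ 0, & \text{if } i \ne r,\ j \ne s. \end{cases} \tag{21}$$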


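While the value of $F_\Sigma(0)$ in (19) is difficult to compute
exactly for large d and n, lower and upper bounds
can be constructed using Slepian's inequality [30]. To
this end, we order the components of Z so that the d entries
sharing the same pair (r, s) are consecutive, and introduce
covariance matrices $\underline{\Sigma} = (\underline{\sigma}_{ij})$ and $\overline{\Sigma} = (\overline{\sigma}_{ij})$ as

$$\underline{\sigma}_{ij} = \begin{cases} 4\sigma^2, & \text{if } i = j, \\ 2\sigma^2, & \text{if } i \ne j \text{ and } (i-1) \operatorname{div} d = (j-1) \operatorname{div} d, \\ 0, & \text{otherwise}, \end{cases} \tag{22a}$$

$$\overline{\sigma}_{ij} = \begin{cases} 4\sigma^2, & \text{if } i = j, \\ 2\sigma^2, & \text{otherwise}, \end{cases} \tag{22b}$$

so that $\underline{\sigma}_{ij} \le \sigma_{ij} \le \overline{\sigma}_{ij}$ holds for all $1 \le i, j \le |N_2|$,
with $\sigma_{ij}$ being the components of the covariance matrix
Σ (21). Then, Slepian's inequality claims that

$$F_{\underline{\Sigma}}(0) \le F_\Sigma(0) \le F_{\overline{\Sigma}}(0), \tag{23}$$

where $F_{\underline{\Sigma}}(0)$ and $F_{\overline{\Sigma}}(0)$ are the c.d.f.'s of random variables
$X_{\underline{\Sigma}} \sim N(0, \underline{\Sigma})$ and $X_{\overline{\Sigma}} \sim N(0, \overline{\Sigma})$, respectively.
The structure of the matrices $\underline{\Sigma}$ and $\overline{\Sigma}$ allows the corresponding
values $F_{\underline{\Sigma}}(0)$ and $F_{\overline{\Sigma}}(0)$ to be computed in
closed form, which leads to the following bounds for
the expected number of local minima in a random MAP
with iid normal coefficients: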


Theorem 3. In an $n \ge 3$, $d \ge 3$ MAP with iid normal
cost coefficients, the expected number of 2-exchange local
minima is bounded as

$$\frac{(n!)^{d-1}}{(d+1)^{n(n-1)/2}} \le E[M_2] \le \frac{2\,(n!)^{d-1}}{n(n-1)d + 2}. \tag{24}$$

Note that both the lower and upper bounds in (24) coincide
with the exact expression (17) for $E[M_2]$ in the
case n = 2. Also, from (24) it follows that for fixed
$n \ge 3$ the expected number of local minima is exponential
in the number of dimensions d.
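The bounds (24) follow from the orthant probability of an equicorrelated normal vector. If m components have common correlation 1/2, the probability that all of them are nonpositive is $1/(m+1)$. Under $\underline{\Sigma}$ the vector splits into $\binom{n}{2}$ independent blocks of size d, and under $\overline{\Sigma}$ it forms a single block of size $|N_2|$. The sketch below evaluates (24) and checks that both bounds reduce to (17) for n = 2. The names are illustrative.

```python
import math


def bounds_two_exchange(n: int, d: int):
    """Lower and upper bounds (24) on E[M_2] for iid normal costs."""
    N = math.factorial(n) ** (d - 1)           # number of feasible solutions
    lower = N / (d + 1) ** (n * (n - 1) // 2)  # C(n,2) independent blocks, each 1/(d+1)
    upper = 2 * N / (n * (n - 1) * d + 2)      # one block of size |N_2|: 1/(|N_2|+1)
    return lower, upper


for n, d in [(2, 5), (3, 3), (3, 6), (4, 4)]:
    lo, hi = bounds_two_exchange(n, d)
    N = math.factorial(n) ** (d - 1)
    print(n, d, lo, hi, "fraction of solutions <=", hi / N)
```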
Higher-Order Neighborhoods $(p \ge 3)$ The outlined
approach is applicable to general p-exchange neighborhoods.
For convenience, here we consider the neighborhoods
$N_p^*$, defined similarly to those of Sect. "Local Minima
and p-exchange Neighborhoods in MAP" but obtained
from a given feasible solution by permuting exactly p
elements in one of the d dimensions, so that for any feasible
solution $i = \{i_1^{(1)} \cdots i_d^{(1)}, \dots, i_1^{(n)} \cdots i_d^{(n)}\}$ and its p-exchange
neighbor $j = \{j_1^{(1)} \cdots j_d^{(1)}, \dots, j_1^{(n)} \cdots j_d^{(n)}\} \in N_p^*(i)$ one has
(compare to (15))

$$\sum_{r=1}^{n} \bar\delta_{i_{k_0}^{(r)} j_{k_0}^{(r)}} = p, \quad k_0 \in \{1, \dots, d\}, \quad \text{and} \quad \sum_{r=1}^{n} \bar\delta_{i_k^{(r)} j_k^{(r)}} = 0 \quad \text{for all } k \in \{1, \dots, d\} \setminus k_0. \tag{25}$$


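Then, upper and lower bounds for the expected
number of local minima $E[M_p^*]$ defined with respect
to p-exchange neighborhoods $N_p^*$ can be derived in
a similar fashion. Namely, the sought probability

$$P\Big[\bigcap_{j \in N_p^*(k)} z_k - z_j \le 0\Big] = F_{\Sigma_p}(0)$$

can be bounded as $F_{\underline{\Sigma}_p}(0) \le F_{\Sigma_p}(0) \le F_{\overline{\Sigma}_p}(0)$, where
the matrices $\overline{\Sigma}_p, \underline{\Sigma}_p \in \mathbb{R}^{|N_p^*| \times |N_p^*|}$ are such that

$$(\overline{\Sigma}_p)_{ij} = \begin{cases} 2p\sigma^2, & \text{if } i = j, \\ (2p-2)\sigma^2, & \text{if } i \ne j, \end{cases} \tag{26a}$$

$$(\underline{\Sigma}_p)_{ij} = \begin{cases} 2p\sigma^2, & \text{if } i = j, \\ p\sigma^2, & \text{if } i \ne j \text{ and } (i-1) \operatorname{div} (dD(p)) = (j-1) \operatorname{div} (dD(p)), \\ 0, & \text{otherwise}. \end{cases} \tag{26b}$$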


The corresponding bounds for the expected number
of local minima $E[M_p^*]$ are established by the following
theorem [11].


Theorem 4. In an $n \ge 3$, $d \ge 3$ MAP with iid normal
cost coefficients, the expected number of local minima
$M_p^*$ with respect to p-exchange local neighborhoods $N_p^*$
is bounded as

$$\frac{(n!)^{d-1}}{\left[dD(p) + 1\right]^{\binom{n}{p}}} \le E[M_p^*] \le (n!)^{d-1} \int_{-\infty}^{\infty} \Phi\big(\sqrt{p-1}\, z\big)^{dD(p)\binom{n}{p}}\, d\Phi(z), \tag{27}$$

where $\Phi(z)$ is the c.d.f. of the standard normal N(0, 1)
distribution. For 3-exchange neighborhoods $N_3^*$, an improved
upper bound holds:

$$E[M_3^*] \le \frac{3\,(n!)^{d-1}}{n(n-1)(n-2)d + 3}. \tag{28}$$
It is interesting to note that for a fixed p the ratio of
the number of local minima to the number of feasible solutions
tends to zero as the dimensions of the
problem increase (see (17), (24), and (27)).


Conclusions 


We have discussed asymptotical analysis of the ex- 
pected optimal value and the expected number of lo- 
cal minima of the Multidimensional Assignment Prob- 
lem whose assignment costs are iid random variables 
drawn from a continuous distribution. It has been 
demonstrated that for a broad class of distributions, 
the asymptotical behavior of the expected optimal cost 
of a random MAP in the case when one of the prob- 
lem’s dimension parameters approaches infinity is de- 
termined by the location of the left endpoint of the sup- 
port set of the distribution. The presented analysis is 
constructive in the sense that it allows for derivation of 
lower and upper asymptotical bounds for the expected 
optimal value of the problem for a prescribed probabil- 
ity distribution. 

In addition, we have derived a closed-form expression
for the expected number of local minima in
an n = 2 random MAP with arbitrary distribution of assignment
costs. In the case $n \ge 3$, bounds for the expected
number of local minima have been derived under
the assumption that assignment costs are iid normal
random variables. It has been demonstrated that the expected
number of local minima is exponential in the
number of dimensions d of the problem.
Many iterative algorithms, deterministic or stochastic,
admit distributed implementations, whereby the workload
for performing computational steps identified as
bottlenecks is distributed among a variety of computational
nodes. An extensive literature on distributed
implementations of optimization algorithms in particular
is available [19]. In recent years, there has been an
extremely fruitful interface between mathematical programming
algorithms and computer science. This has
resulted in major advances in the development of algorithms
and the implementation of sophisticated optimization
algorithms on high performance parallel and distributed
computers [11,12]. Two major issues are important
in designing an efficient distributed implementation,
namely, task allocation and the communication protocol.
Task allocation relates to the breakdown of the
total workload, and it can be either static or dynamic.
Communication patterns and frequency are
important since they can induce substantial overhead in
cases where workload irregularities occur. Various important
implementation details have been presented,
among others, in [10]. The straightforward translation
of a serial algorithm into a distributed one would assume some
sort of global synchronization mechanism guaranteeing
that information among processing nodes is
exchanged once a computational step has been
performed. Processors must then synchronize so as to
exchange information, and all proceed with the same
type of information to their next computational step.

Asynchronous algorithms relax the assumption of a predetermined
synchronization protocol, and allow each
processing element to compute and communicate following
local rates. The primary motivation for developing
asynchronous algorithms was to address situations in which:

• processors do not need to communicate with every
other processor at each time instant;

• processors may keep performing computations
without having to wait until they receive the messages
that have been transmitted to them;

• processors are allowed to remain idle some of the
time;

• some processors may be performing computations
faster than others.


Such algorithms can alleviate communication overloads,
and they are not excessively slowed down either by
communication delays or by differences in the
time it takes processors to perform one computation
[18]. Another major motivation is clearly to develop
robust algorithms for distributed computation on heterogeneous
networks of computers. The ideas of asynchronous,
also known as chaotic, iterative schemes can
be traced back to [9], in which special schemes for solving
linear systems of equations were developed. For
discussing the basic principles and conditions of asynchronous
iterations, the formalism of [8] will be followed.
This work presented the first comprehensive
treatment of the recent developments in the theory
and practice of asynchronous iterations for a variety of
problems, including deterministic and stochastic optimization.
In essence, most iterative algorithms can be
viewed as the search for a fixed point that corresponds
to the solution of the original problem. The basic assumptions
of the model of asynchronous (chaotic) iterations
for determining fixed points of (non)linear mappings
are as follows:

1) Let X be a vector space whose elements $x = (x_1, \dots, x_n) \in X$ are
n-tuples describing any vector from this set. It is also
assumed that $X = X_1 \times \cdots \times X_n$, with $x_i \in X_i$, $i = 1, \dots, n$.

2) Let $f: X \to X$ be a function defined by $f(x) = (f_1(x), \dots, f_n(x))$,
$\forall x \in X$.

3) A point $x^* \in X$ is a fixed point of f(x) if $x^* = f(x^*)$
or, equivalently, $x_i^* = f_i(x^*)$, $i = 1, \dots, n$.

For the solution of the aforementioned problem, one
can define an iterative method as

$$x_i(t+1) = f_i\big(x_1(t), \dots, x_n(t)\big), \quad i = 1, \dots, n,$$

with $x_i(t)$ being the value of the ith component at time
(iteration) t. In order to comprehend the concept of
asynchronous iterations, we assume that there exists
a set of times $T = \{0, 1, \dots\}$ at which one or more (possibly
none) components $x_i$ of x are updated by some
processor of a distributed computing system. We denote
by $T^i$ the set of times at which $x_i$ is updated. Given
that no synchronization protocol dictating the information
exchange exists, it is quite conceivable that not all
processors have access to the same and most recent values
of the corresponding components of x. It will
therefore be assumed that:

$$x_i(t+1) = \begin{cases} f_i\big(x_1(\tau_1^i(t)), \dots, x_n(\tau_n^i(t))\big), & \forall t \in T^i,\ 0 \le \tau_j^i(t) \le t, \\ x_i(t), & \forall t \notin T^i. \end{cases}$$

In the aforementioned definition of the iterative process,
the difference $t - \tau_j^i(t)$ between the current time t
and the time $\tau_j^i(t)$ corresponding to the jth component
available at the processor updating $x_i(t)$ can be viewed
as some form of communication delay. In studying the
convergence behavior of algorithms of this type, two
cases have to be considered: the operation can be either
totally asynchronous or partially asynchronous. The
concept of totally asynchronous algorithms was first introduced
in [9], and subsequently analyzed in, among
others, [1,5,15]. [5] proposed a general framework that
encompasses a variety of instances. The cornerstone of this
approach is the asynchronous convergence theorem
[8], which defines a general pattern for proving convergence
of the asynchronous counterparts of certain
sequential algorithms. The asynchronous convergence
theorem can be applied to a variety of problems, including:
• problems involving maximum norm contraction
mappings;
• problems involving monotone mappings;
• the shortest path problem;
• linear and nonlinear network flow problems.
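The totally asynchronous model can be simulated on a single machine. The sketch below is a hypothetical illustration, not an algorithm from the cited works. It applies the model above to an affine map $f(x) = Mx + c$ that is a contraction in the maximum norm. Each component is updated at random times and reads randomly outdated values of the other components, and the result is compared with the synchronous fixed point.

```python
import random


def f_component(i, x, M, c):
    """i-th component of the affine map f(x) = M x + c."""
    return sum(M[i][j] * x[j] for j in range(len(x))) + c[i]


def async_fixed_point(M, c, steps=2000, max_delay=5, update_prob=0.5, seed=0):
    """At each time t, component i is updated with probability update_prob
    (t in T^i); each read of x_j uses a random time tau in [t - max_delay, t]."""
    rng = random.Random(seed)
    n = len(c)
    history = [[0.0] * n]  # history[t] holds x(t)
    for t in range(steps):
        x_next = list(history[t])  # components with t not in T^i keep x_i(t)
        for i in range(n):
            if rng.random() < update_prob:
                stale = [history[rng.randint(max(0, t - max_delay), t)][j] for j in range(n)]
                x_next[i] = f_component(i, stale, M, c)
        history.append(x_next)
    return history[-1]


# Example map: the maximum row sum of |M| is 0.7, so f is a max-norm contraction.
M = [[0.3, 0.2, -0.1], [0.1, 0.4, 0.2], [-0.2, 0.1, 0.3]]
c = [1.0, 2.0, 3.0]

x_sync = [0.0, 0.0, 0.0]
for _ in range(200):  # synchronous reference iteration
    x_sync = [f_component(i, x_sync, M, c) for i in range(3)]

x_async = async_fixed_point(M, c)
print(x_sync, x_async)
```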
Qualitatively speaking, the fundamental difference between
a synchronous and an asynchronous iterative
mapping is similar to the difference between a Jacobi
and a Gauss-Seidel iteration. Consider the implementation
of both these approaches in the minimization of
a function F(x); the specifics of the minimization algorithm
are irrelevant:
• Jacobi:
$$x_i(t+1) = \arg\min_{x_i} F\big(x_1(t), \dots, x_i, \dots, x_n(t)\big),$$
• Gauss-Seidel:
$$x_i(t+1) = \arg\min_{x_i} F\big(x_1(t+1), \dots, x_{i-1}(t+1), x_i, x_{i+1}(t), \dots, x_n(t)\big).$$

The Gauss-Seidel approach corresponds to the instantaneous
communication, in a sequential manner, of the
information as it is being generated. The Jacobi iteration
forces processors to perform iterations utilizing
'outdated' information. The asynchronous iteration is
reminiscent of a Jacobi one. A thorough analysis and
comparison of these two extremes is presented in [16].
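The two update rules can be compared on a quadratic $F(x) = \frac{1}{2}x^\top A x - b^\top x$, for which the coordinate-wise arg min is explicit. In the sketch below, A is an assumed example with nonpositive off-diagonal entries. For such matrices the Stein-Rosenberg theorem guarantees that, whenever the Jacobi iteration converges, the Gauss-Seidel iteration converges asymptotically faster. The names are illustrative.

```python
def coordinate_min(A, b, x, i):
    """Exact minimizer of F(x) = 1/2 x^T A x - b^T x over x_i, others fixed."""
    s = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
    return (b[i] - s) / A[i][i]


def jacobi_step(A, b, x):
    return [coordinate_min(A, b, x, i) for i in range(len(x))]  # all read the old x


def gauss_seidel_step(A, b, x):
    x = list(x)
    for i in range(len(x)):
        x[i] = coordinate_min(A, b, x, i)  # later coordinates see fresh values
    return x


def iterations_to_converge(step, A, b, tol=1e-10, max_iter=10_000):
    x = [0.0] * len(b)
    for k in range(1, max_iter + 1):
        x_next = step(A, b, x)
        if max(abs(u - v) for u, v in zip(x_next, x)) < tol:
            return k, x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter")


A = [[4.0, -1.0, -1.0], [-1.0, 3.0, -1.0], [-1.0, -1.0, 5.0]]  # symmetric, diagonally dominant
b = [1.0, 2.0, 3.0]
jacobi_iters, x_jacobi = iterations_to_converge(jacobi_step, A, b)
gs_iters, x_gs = iterations_to_converge(gauss_seidel_step, A, b)
print("Jacobi:", jacobi_iters, x_jacobi)
print("Gauss-Seidel:", gs_iters, x_gs)
```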
A major class of iterative schemes that can be shown
to be convergent when implemented asynchronously
is defined by mappings which are contraction mappings
with respect to a suitably defined
weighted maximum norm:

$$\|x\|_\infty^w = \max_i \frac{|x_i|}{w_i}, \quad x \in \mathbb{R}^n,\ w \in \mathbb{R}^n,\ w_i > 0.$$

Let us consider the minimization of an unconstrained
quadratic function F:

$$\min F(x) = \tfrac{1}{2} x^\top A x - b^\top x \quad \text{s.t. } x \in \mathbb{R}^n,$$

where A is an $n \times n$ positive definite symmetric matrix,
and $b \in \mathbb{R}^n$. Since $\nabla F(x) = Ax - b$, a gradient iteration takes the form

$$x := x - \gamma(Ax - b) = (I - \gamma A)x + \gamma b,$$

and it will be convergent provided that the maximum row
sum of $|I - \gamma A|$ is less than 1, i.e.:

$$|1 - \gamma a_{ii}| + \sum_{j \ne i} \gamma |a_{ij}| < 1, \quad i = 1, \dots, n,$$

implying the diagonal dominance condition:

$$\sum_{j \ne i} |a_{ij}| < a_{ii}, \quad i = 1, \dots, n.$$
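The row-sum condition can be checked mechanically. The sketch below computes $\max_i \big(|1 - \gamma a_{ii}| + \gamma \sum_{j \ne i} |a_{ij}|\big)$, the maximum-norm contraction modulus of $I - \gamma A$, and runs the gradient iteration for a diagonally dominant A. It also shows that for a positive definite but not diagonally dominant matrix no stepsize satisfies the condition, even though the synchronous iteration may still converge in that case. The names and example matrices are illustrative.

```python
def max_row_sum(A, gamma):
    """Max-norm of I - gamma*A: max_i |1 - gamma a_ii| + gamma sum_{j != i} |a_ij|."""
    n = len(A)
    return max(
        abs(1 - gamma * A[i][i]) + gamma * sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


def gradient_iteration(A, b, gamma, iters=500):
    """x := (I - gamma A) x + gamma b, i.e. x := x - gamma (A x - b)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [x[i] - gamma * (sum(A[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)]
    return x


A = [[4.0, 1.0, 0.5], [1.0, 3.0, 1.0], [0.5, 1.0, 5.0]]  # diagonally dominant example
b = [1.0, 2.0, 3.0]
gamma = 0.1
x = gradient_iteration(A, b, gamma)
residual = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
print("contraction modulus:", max_row_sum(A, gamma), "residual:", residual)

B = [[1.0, 2.0], [2.0, 5.0]]  # positive definite, not diagonally dominant
print("smallest modulus on a stepsize grid:", min(max_row_sum(B, g / 100) for g in range(1, 300)))
```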


If we consider the general nonlinear unconstrained optimization
problem

$$\min g(x) \quad \text{s.t. } x \in \mathbb{R}^n,$$

where $g: \mathbb{R}^n \to \mathbb{R}$ is a twice-differentiable convex function
with Hessian matrix $\nabla^2 g(x)$ which is positive definite,
and the Newton mapping given by

$$f(x) = x - [\nabla^2 g(x)]^{-1} \nabla g(x),$$

then the norm $\|x\| = \max_i |x_i|$ makes f a contraction mapping
in the neighborhood of $x^*$ (the optimal point). Extensions
of the ordinary gradient method

$$f(x) = x - \alpha \nabla g(x)$$




are also discussed in [5]. The shortest path problem
is defined in terms of a directed graph consisting of
n nodes. We denote by A(i) the set of all nodes j for
which there is an outgoing arc (i, j) from node i, and by $a_{ij}$
the length of that arc. The problem is to find a path of minimum
length from each node i to a destination node, here node 1.
[4] considered the application of the asynchronous
convergence theorem to fixed point iterations involving
monotone mappings by considering the Bellman-Ford
algorithm [3] applied to the shortest path problem.
This takes the form:

$$x_i(t+1) = \min_{j \in A(i)} \big(a_{ij} + x_j(\tau_j^i(t))\big), \quad i = 2, \dots, n,\ t \in T^i,$$

$$x_1(t+1) = 0.$$
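The asynchronous Bellman-Ford iteration can be simulated in the same way. The sketch uses a hypothetical five-node graph with 0-based labels, where node 0 plays the role of the destination (node 1 in the text). Every node updates at random times using estimates that are up to three steps old. With positive arc lengths and initial values $+\infty$, the estimates should reach the shortest distances in finite time. For this graph, the distances computed by hand are 0, 3, 2, 4 and 6.

```python
import random

INF = float("inf")

# Hypothetical example graph: arcs[i] maps j to the length a_ij of arc (i, j).
arcs = {
    0: {},
    1: {0: 4.0, 2: 1.0},
    2: {0: 2.0, 3: 5.0},
    3: {1: 1.0, 0: 9.0},
    4: {3: 2.0, 2: 7.0},
}


def async_bellman_ford(arcs, steps=300, max_delay=3, update_prob=0.4, seed=1):
    rng = random.Random(seed)
    n = len(arcs)
    history = [[0.0] + [INF] * (n - 1)]  # x(0): 0 at the destination, +inf elsewhere
    for t in range(steps):
        x = list(history[t])
        for i in range(1, n):  # the destination keeps x_0 = 0
            if rng.random() < update_prob:  # t belongs to T^i
                x[i] = min(a + history[rng.randint(max(0, t - max_delay), t)][j]
                           for j, a in arcs[i].items())
        history.append(x)
    return history[-1]


print(async_bellman_ford(arcs))
```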


Linear network flow problems and asynchronous distributed
versions of the auction algorithm are discussed in [8].
In the general linear network flow problem we are given
a set N of nodes and a set A of arcs; each arc (i, j) has associated
with it an integer $a_{ij}$, referred to as the cost coefficient.
The problem is to optimally assign flows $f_{ij}$ to each one
of the arcs, and it is represented mathematically as follows:

$$\begin{aligned} \min \ & \sum_{(i,j) \in A} a_{ij} f_{ij} \\ \text{s.t.} \ & \sum_{j:(i,j) \in A} f_{ij} - \sum_{j:(j,i) \in A} f_{ji} = s_i, \quad \forall i \in N, \\ & b_{ij} \le f_{ij} \le c_{ij}, \quad \forall (i,j) \in A, \end{aligned}$$

where $a_{ij}$, $b_{ij}$, $c_{ij}$ and $s_i$ are integers. Extensions of
the sequential auction algorithms are discussed in [6],
in which asynchronism manifests itself in the sense
that certain processors may be calculating bids
while others update object prices. [7] extended the analysis
to cover certain classes of nonlinear network flow
problems in which the costs $a_{ij}$ are functions of the flows
$f_{ij}$:

$$\begin{aligned} \min \ & \sum_{(i,j) \in A} a_{ij}(f_{ij}) \\ \text{s.t.} \ & \sum_{j:(i,j) \in A} f_{ij} - \sum_{j:(j,i) \in A} f_{ji} = s_i, \quad \forall i \in N, \\ & b_{ij} \le f_{ij} \le c_{ij}, \quad \forall (i,j) \in A. \end{aligned}$$

Imposing additional reasonable assumptions on the
general framework of totally asynchronous iterative algorithms
can substantially increase the applicability of
the concept. A natural extension is therefore the partially
asynchronous iterative methods, whereby two major
assumptions must be satisfied:
a) each processor performs an update at least once
during any time interval of length B;
b) the information used by any processor is outdated
by at most B time units.
In other words, the partial asynchronism assumption
extends the original model of computation by stating
that:


There exists a positive integer B such that:


• For every i and for every $t \ge 0$, at least one
of the elements of the set $\{t, \dots, t + B - 1\}$
belongs to $T^i$.

• There holds

$$t - B < \tau_j^i(t) \le t,$$

for all i and j, and all $t \ge 0$ belonging to $T^i$.
• There holds $\tau_i^i(t) = t$ for all i and $t \in T^i$.


[17] developed a very elegant framework with important implications for the asynchronous minimization of continuous functions. It was established that, when minimizing a function F(x), the asynchronous implementation of the gradient algorithm
$$x := x - \gamma \nabla F(x)$$
converges provided that the stepsize $\gamma$ is small compared to the inverse of the asynchronism measure B.
Specifically, let $F: \mathbb{R}^n \to \mathbb{R}$ be a cost function to be minimized subject to no constraints. It will be further assumed that:

1) $F(x) \ge 0$, $\forall x \in \mathbb{R}^n$;

2) the gradient of F is Lipschitz continuous:
$$\|\nabla F(x) - \nabla F(y)\| \le K_1 \|x - y\|, \quad \forall x, y \in \mathbb{R}^n.$$


The asynchronous version of the synchronous gradient iteration
$$x := x - \gamma \nabla F(x)$$
is written as
$$x_i(t+1) := x_i(t) + \gamma s_i(t), \quad i = 1, \dots, n,$$
where $\gamma$ is a positive stepsize and $s_i(t)$ is the update direction. It will be assumed that
$$s_i(t) = 0, \quad \forall t \notin T^i.$$


It is important to realize that processor i at time t has knowledge of a vector $x^i(t)$ that is a possibly outdated version of $x(t)$. In other words, $x^i(t) = \big(x_1(\tau^i_1(t)), \dots, x_n(\tau^i_n(t))\big)$. It is further assumed that, when $x_i$ is being updated, the update direction $s_i$ is a descent direction: for every i and t,
$$s_i(t)\, \nabla_i F(x^i(t)) \le 0,$$
and there exist positive constants $K_2$, $K_3$ such that
$$K_2\, |\nabla_i F(x^i(t))| \le |s_i(t)| \le K_3\, |\nabla_i F(x^i(t))|, \quad \forall t \in T^i, \ \forall i.$$

If all of the above is satisfied, then for the asynchronous gradient iteration it can be shown that there exists some $\gamma_0$, depending on n, B, $K_1$ and $K_3$, such that if $0 < \gamma < \gamma_0$, then $\lim_{t \to \infty} \nabla F(x(t)) = 0$.

It can actually be further shown that the choice
$$\gamma_0 = \frac{1}{K_1 K_3 (1 + B + nB)}$$
guarantees convergence of the asynchronous algorithm. This result states that one can always, in principle, identify an adequate stepsize for any finite delay bound B.
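The following Python sketch simulates the partially asynchronous gradient iteration on a small quadratic to illustrate the stepsize rule above. Everything specific here is an assumption made for the example. The cost is $F(x) = \tfrac12 (x-c)^T Q (x-c)$, which satisfies $F \ge 0$. The direction is $s_i(t) = -\nabla_i F(x^i(t))$, so $K_2 = K_3 = 1$. The update times and delays are random. Finally, the largest absolute row sum of the symmetric matrix $Q$ is used as the Lipschitz constant $K_1$, since it bounds the spectral norm.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def grad_i(Q: Matrix, c: List[float], x: List[float], i: int) -> float:
    """i-th partial derivative of F(x) = 0.5 (x - c)^T Q (x - c), Q symmetric."""
    return sum(Q[i][j] * (x[j] - c[j]) for j in range(len(x)))


def async_gradient(Q: Matrix, c: List[float], x0: List[float], B: int,
                   steps: int, seed: int = 0) -> Tuple[List[float], float]:
    """Simulate x_i(t+1) = x_i(t) + gamma * s_i(t) with s_i(t) = -grad_i F(x^i(t))."""
    rng = random.Random(seed)
    n = len(x0)
    K1 = max(sum(abs(q) for q in row) for row in Q)  # bounds ||Q||_2 for symmetric Q
    K3 = 1.0                                          # |s_i| = |grad_i F|, so K2 = K3 = 1
    gamma = 0.99 / (K1 * K3 * (1 + B + n * B))        # just below gamma_0
    history = [list(x0)]                              # history[t] holds x(t)
    for t in range(steps):
        x = history[t]
        x_next = list(x)                              # s_i(t) = 0 when t is not in T^i
        for i in range(n):
            # t in T^i: random, but forced once in every B consecutive times
            if rng.random() < 0.5 or t % B == B - 1:
                # outdated view x^i(t): component j was read at some tau in (t - B, t],
                # while the processor's own component is current (tau^i_i(t) = t)
                view = [x[j] if j == i else
                        history[rng.randint(max(0, t - B + 1), t)][j]
                        for j in range(n)]
                x_next[i] = x[i] - gamma * grad_i(Q, c, view, i)
        history.append(x_next)
    return history[-1], gamma


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
c = [1.0, -2.0, 0.5]
x_final, gamma = async_gradient(Q, c, [0.0, 0.0, 0.0], B=3, steps=4000)
```

Because $Q$ is positive definite in this example, $\nabla F(x(t)) \to 0$ means that $x(t) \to c$.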

Furthermore, [14] elaborated on the use of the gradient projection algorithm, within the asynchronous iterative framework, for addressing certain classes of constrained nonlinear optimization problems. The constrained optimization problem considered is that of minimizing a convex function $F: \mathbb{R}^n \to \mathbb{R}$ over the set $X = \prod_{i=1}^m X_i$, where the $X_i \subset \mathbb{R}^{n_i}$ are lower-dimensional sets and $\sum_{i=1}^m n_i = n$. The ith component of the solution vector is now updated by


$$x_i(t+1) = \big[x_i(t) - \gamma \nabla_i F(x^i(t))\big]^+,$$


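where $[\cdot]^+$ denotes the projection on the set $X_i$. Once again, $x_i(t+1) = x_i(t)$ for $t \notin T^i$, and a gradient-based algorithm of the form $x_i(t+1) = x_i(t) + \gamma s_i(t)$ is defined, for which
$$s_i(t) = \begin{cases} \dfrac{1}{\gamma}\Big(\big[x_i(t) - \gamma \nabla_i F(x^i(t))\big]^+ - x_i(t)\Big), & t \in T^i, \\ 0, & t \notin T^i. \end{cases}$$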


It can actually be shown that, provided the partial asynchronism assumption holds, one can always define, in principle, a suitable stepsize $\gamma_0$ such that for any $0 < \gamma < \gamma_0$ every limit point $x^*$ of the sequence generated by the partially asynchronous gradient projection iteration minimizes the convex function F (whose gradient is Lipschitz continuous) over the set X. Recently, [2] analyzed asynchronous algorithms for minimizing a function when the communication delays among processors are assumed to be stochastic with Markovian character. The approach is also based on a gradient projection algorithm and was used to address an optimal routing problem.
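A similar sketch can be written for the partially asynchronous gradient projection iteration. It again rests on illustrative assumptions. The cost is the same kind of quadratic as before. Each block $X_i$ is an interval $[lo_i, hi_i]$, so $n_i = 1$ and the projection reduces to a clip. The stepsize $\gamma = 0.01$ is simply assumed to lie below $\gamma_0$. Convergence is checked through the fixed-point property: for convex F and closed convex X, a point of X minimizes F over X exactly when the projected gradient step leaves it unchanged.

```python
import random
from typing import List

Matrix = List[List[float]]


def grad_i(Q: Matrix, c: List[float], x: List[float], i: int) -> float:
    """i-th partial derivative of F(x) = 0.5 (x - c)^T Q (x - c), Q symmetric."""
    return sum(Q[i][j] * (x[j] - c[j]) for j in range(len(x)))


def project(v: float, lo: float, hi: float) -> float:
    """[.]^+ : projection onto the interval X_i = [lo, hi]."""
    return min(max(v, lo), hi)


def async_projected_gradient(Q: Matrix, c: List[float], lo: List[float], hi: List[float],
                             B: int, steps: int, gamma: float, seed: int = 1) -> List[float]:
    rng = random.Random(seed)
    n = len(c)
    history = [[project(0.0, lo[i], hi[i]) for i in range(n)]]
    for t in range(steps):
        x = history[t]
        x_next = list(x)                      # x_i(t+1) = x_i(t) for t not in T^i
        for i in range(n):
            if rng.random() < 0.5 or t % B == B - 1:   # t in T^i
                view = [x[j] if j == i else
                        history[rng.randint(max(0, t - B + 1), t)][j]
                        for j in range(n)]     # outdated x^i(t), delays below B
                x_next[i] = project(x[i] - gamma * grad_i(Q, c, view, i), lo[i], hi[i])
        history.append(x_next)
    return history[-1]


def fixed_point_residual(Q: Matrix, c: List[float], lo: List[float], hi: List[float],
                         x: List[float], gamma: float) -> float:
    """Largest change made by one synchronous projected gradient step at x."""
    return max(abs(project(x[i] - gamma * grad_i(Q, c, x, i), lo[i], hi[i]) - x[i])
               for i in range(len(x)))


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
c = [2.0, -1.0, 1.0]
lo, hi = [0.0] * 3, [1.0] * 3
x_star = async_projected_gradient(Q, c, lo, hi, B=3, steps=5000, gamma=0.01)
```

For these data the constrained minimizer works out to $(1, 0, 0.5)$. The first two coordinates end on bounds and the third ends in the interior.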

A major consideration in asynchronous distributed computing is that, since no global control mechanism exists, any termination criterion based only on local information becomes unreliable. Clearly, when a distributed iteration of the form $x_i := f_i(x)$ is executed asynchronously, local error estimates can be, and will be, misleading in terms of the global state of the system. Recently [13] made several suggestions as to how the standard model can be supplemented with an additional interprocessor communication protocol so as to address the issue of finite termination of asynchronous iterative algorithms.
The auction algorithm is an intuitive method for solving the classical assignment problem. It substantially outperforms its main competitors for important types of problems, both in theory and in practice, and is also naturally well suited for parallel computation. In this article, we will sketch the basic principles of the algorithm, explain its computational properties, and discuss its extensions to more general network flow problems. For a detailed presentation, see the survey paper [3] and the textbooks [2,4]. For an extensive computational study, see [8]. The algorithm was first proposed in the 1979 report [1].

In the classical assignment problem there are n persons and n objects that we have to match on a one-to-one basis. There is a benefit $a_{ij}$ for matching person i with object j and we want to assign persons to objects so as to maximize the total benefit. Mathematically, we want to find a one-to-one assignment [a set of person-object pairs $(1, j_1), \dots, (n, j_n)$, such that the objects $j_1, \dots, j_n$ are all distinct] that maximizes the total benefit $\sum_{i=1}^n a_{i j_i}$.

The assignment problem is important in many 
practical contexts. The most obvious ones are resource 
allocation problems, such as assigning personnel to 
jobs, machines to tasks, and the like. There are also situ- 
ations where the assignment problem appears as a sub- 
problem in various methods for solving more complex 
problems. 

The assignment problem is also of great theoreti- 
cal importance because, despite its simplicity, it em- 
bodies a fundamental linear programming structure. 
The most important type of linear programming problems, the linear network flow problem, can be reduced
to the assignment problem by means of a simple refor- 
mulation. Thus, any method for solving the assignment 
problem can be generalized to solve the linear network 
flow problem, and in fact this approach is particularly 
helpful in understanding the extension of auction algo- 
rithms to network flow problems that are more general 
than assignment. 

The classical methods for assignment are based on 
iterative improvement of some cost function; for exam- 
ple a primal cost (as in primal simplex methods), or 
a dual cost (as in Hungarian-like methods, dual simplex 
methods, and relaxation methods). The auction algo- 
rithm departs significantly from the cost improvement 
idea; at any one iteration, it may deteriorate both the 
primal and the dual cost, although in the end it finds an 
optimal assignment. It is based on a notion of approximate optimality, called ε-complementary slackness, and while it implicitly tries to solve a dual problem, it actually attains a dual solution that is not quite optimal.


The Auction Process 


To develop an intuitive understanding of the auction 
algorithm, it is helpful to introduce an economic equi- 
librium problem that turns out to be equivalent to the 
assignment problem. Let us consider the possibility of 
matching the n objects with the n persons through 
a market mechanism, viewing each person as an eco- 
nomic agent acting in his own best interest. Suppose 
that object j has a price $p_j$ and that the person who receives the object must pay the price $p_j$. Then, the (net) value of object j for person i is $a_{ij} - p_j$ and each person i would logically want to be assigned to an object $j_i$ with maximal value, that is, with
$$a_{ij_i} - p_{j_i} = \max_{j=1,\dots,n} \{a_{ij} - p_j\}. \tag{1}$$
We will say that a person i is ‘happy’ if this condition 
holds and we will say that an assignment and a set of 
prices are at equilibrium when all persons are happy. 
Equilibrium assignments and prices are naturally of 
great interest to economists, but there is also a funda- 
mental relation with the assignment problem; it turns 
out that an equilibrium assignment offers maximum to- 
tal benefit (and thus solves the assignment problem), 
while the corresponding set of prices solves an associated dual optimization problem. This is a consequence
of the celebrated duality theorem of linear program- 
ming. 

Let us consider now a natural process for finding 
an equilibrium assignment. We will call this process the
naive auction algorithm, because it has a serious flaw, 
as will be seen shortly. Nonetheless, this flaw will help 
motivate a more sophisticated and correct algorithm. 

The naive auction algorithm proceeds in ‘rounds’ 
(or ‘iterations’) starting with any assignment and any 
set of prices. There is an assignment and a set of prices 
at the beginning of each round, and if all persons are 
happy with these, the process terminates. Otherwise 
some person who is not happy is selected. This person, 
call him i, finds an object $j_i$ which offers maximal value, that is,
$$j_i \in \arg\max_{j=1,\dots,n} \{a_{ij} - p_j\}, \tag{2}$$
and then: 

a) Exchanges objects with the person assigned to $j_i$ at the beginning of the round;

b) Sets the price of the best object $j_i$ to the level at which he is indifferent between $j_i$ and the second best object, that is, he sets $p_{j_i}$ to
$$p_{j_i} + \gamma_i, \tag{3}$$
where
$$\gamma_i = v_i - w_i, \tag{4}$$
$v_i$ is the best object value,
$$v_i = \max_{j} \{a_{ij} - p_j\}, \tag{5}$$
and $w_i$ is the second best object value
$$w_i = \max_{j \neq j_i} \{a_{ij} - p_j\}, \tag{6}$$
that is, the best value over objects other than $j_i$. (Note that $\gamma_i$ is the largest increment by which the best object price $p_{j_i}$ can be increased, with $j_i$ still being the best object for person i.)

This process is repeated in a sequence of rounds until all persons are happy.
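A single bid of the naive auction algorithm, computed from (2) and (4) to (6), can be sketched in Python as follows. The function name `naive_bid` and the data layout are illustrative choices: one row of benefits and one list of prices, both plain lists. At least two objects are assumed, so that the second best value (6) exists.

```python
from typing import List, Tuple


def naive_bid(a_row: List[float], prices: List[float]) -> Tuple[int, float]:
    """Return (j_i, gamma_i) for one bidder: preferred object and bidding increment."""
    values = [a - p for a, p in zip(a_row, prices)]             # net values a_ij - p_j
    j_best = max(range(len(values)), key=values.__getitem__)    # j_i, as in (2)
    v = values[j_best]                                           # best value (5)
    w = max(val for j, val in enumerate(values) if j != j_best)  # second best value (6)
    return j_best, v - w                                         # increment gamma_i (4)


print(naive_bid([10, 7, 3], [4, 0, 0]))  # object 1 is preferred; increment 7 - 6 = 1
print(naive_bid([5, 5, 1], [0, 0, 0]))   # two objects tie, so the increment is zero
```

The second call shows the situation that causes the difficulty discussed below: when two objects tie for the best value, the bidder raises no price at all.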
We may view this process as an auction, where at each round the bidder i raises the price of his or her preferred object by the bidding increment $\gamma_i$. Note that $\gamma_i$ cannot be negative since $v_i \ge w_i$ (compare (5) and (6)), so the object prices tend to increase. Just as in a real auction, bidding increments and price increases spur competition by making the bidder's own preferred object less attractive to other potential bidders.

Does this auction process work? Unfortunately, not 
always. The difficulty is that the bidding increment $\gamma_i$
is zero when more than one object offers maximum 
value for the bidder i (cf. (4) and (6)). As a result, a sit- 
uation may be created where several persons contest 
a smaller number of equally desirable objects without 
raising their prices, thereby creating a never ending cy- 
cle. 

To break such cycles, we introduce a perturbation 
mechanism, motivated by real auctions where each bid 
for an object must raise its price by a minimum positive 
increment, and bidders must on occasion take risks to 
win their preferred objects. In particular, let us fix a positive scalar ε and say that a person i is ‘almost happy’ with an assignment and a set of prices if the value of its assigned object $j_i$ is within ε of being maximal, that is,
$$a_{ij_i} - p_{j_i} \ge \max_{j=1,\dots,n} \{a_{ij} - p_j\} - \varepsilon. \tag{7}$$

We will say that an assignment and a set of prices 
are almost at equilibrium when all persons are almost 
happy. The condition (7), introduced first in 1979 in conjunction with the auction algorithm, is known as ε-complementary slackness and plays a central role in several optimization contexts. For ε = 0 it reduces to ordinary complementary slackness (compare (1)).

We now reformulate the previous auction process so that the bidding increment is always at least equal to ε. The resulting method, the auction algorithm, is the same as the naive auction algorithm, except that the bidding increment $\gamma_i$ is
$$\gamma_i = v_i - w_i + \varepsilon \tag{8}$$
(rather than $\gamma_i = v_i - w_i$ as in (4)). With this choice, the bidder of a round is almost happy at the end of the round (rather than happy). The particular increment $\gamma_i = v_i - w_i + \varepsilon$ used in the auction algorithm is the maximum amount with this property. Smaller increments $\gamma_i$ would also work as long as $\gamma_i \ge \varepsilon$, but using the largest possible increment accelerates the algorithm. This is consistent with experience from real auctions, which tend to terminate faster when the bidding is aggressive.
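Below is a minimal Python sketch of the auction algorithm with increment (8), processing one bidder at a time. The sketch makes several assumptions. It starts from an empty assignment with zero prices, unless prices are supplied. Instead of the exchange in step a), the person displaced by a bid simply becomes unassigned and bids again later. This is a common way to implement the method, not necessarily the one used in the cited references. Benefits are assumed dense, so every person may bid for every object.

```python
from typing import List, Optional, Tuple


def auction(a: List[List[float]], eps: float,
            prices: Optional[List[float]] = None) -> Tuple[List[int], List[float]]:
    """Auction algorithm for the n x n assignment problem (maximize sum_i a[i][j_i]).

    Returns (assignment, prices) where assignment[i] is the object of person i.
    """
    n = len(a)
    p = list(prices) if prices is not None else [0.0] * n
    owner = [-1] * n              # owner[j]: person currently holding object j
    obj = [-1] * n                # obj[i]: object currently held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()      # an unassigned person places the next bid
        values = [a[i][j] - p[j] for j in range(n)]
        j_best = max(range(n), key=values.__getitem__)                    # j_i, (2)
        v = values[j_best]                                                 # (5)
        w = max((values[j] for j in range(n) if j != j_best), default=v)  # (6)
        p[j_best] += v - w + eps                                           # (8)
        if owner[j_best] != -1:   # the previous holder loses the object
            obj[owner[j_best]] = -1
            unassigned.append(owner[j_best])
        owner[j_best] = i
        obj[i] = j_best
    return obj, p


assignment, prices = auction([[10, 5, 8], [7, 9, 6], [4, 3, 7]], eps=0.2)
```

In the example the benefits are integers and ε = 0.2 < 1/3, so the returned assignment should be the optimal one, person i to object i, with total benefit 26.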

We can now show that this reformulated auction 
process terminates in a finite number of rounds, nec- 
essarily with an assignment and a set of prices that are 
almost at equilibrium. To see this, note that once an ob- 
ject receives a bid for the first time, then the person as- 
signed to the object at every subsequent round is almost 
happy; the reason is that a person is almost happy just 
after acquiring an object through a bid, and continues 
to be almost happy as long as he holds the object (since 
the other object prices cannot decrease in the course of 
the algorithm). Therefore, the persons that are not al- 
most happy must be assigned to objects that have never 
received a bid. In particular, once each object receives 
at least one bid, the algorithm must terminate. Next 
note that if an object receives a bid in m rounds, its 
price must exceed its initial price by at least mε. Thus,
for sufficiently large m, the object will become ‘expensive’
enough to be judged ‘inferior’ to some object that
has not received a bid so far. It follows that only for 
a limited number of rounds can an object receive a bid 
while some other object still has not yet received any 
bid. Therefore, there are two possibilities: either 
a) the auction terminates in a finite number of rounds, with all persons almost happy, before every object receives a bid; or

b) the auction continues until, after a finite number 
of rounds, all objects receive at least one bid, at 
which time the auction terminates. (This argument 
assumes that any person can bid for any object, but 
it can be generalized for the case where the set of 
feasible person-object pairs is limited, as long as at 
least one feasible assignment exists.) 


Optimality Properties at Termination 


When the auction algorithm terminates, we have an as- 
signment that is almost at equilibrium, but does this as- 
signment maximize the total benefit? The answer here 
depends strongly on the size of ε. In a real auction, a prudent bidder would not place an excessively high bid for fear that he might win the object at an unnecessarily high price. Consistent with this intuition, we can show that if ε is small, then the final assignment will be ‘almost optimal’. In particular, we can show that the total benefit of the final assignment is within nε of being optimal. To see this, note that an assignment and a set of prices that are almost at equilibrium may be viewed as being at equilibrium for a slightly different problem where all benefits $a_{ij}$ are the same as before, except for the n benefits of the assigned pairs, which are modified by an amount no more than ε.

Suppose now that the benefits $a_{ij}$ are all integer, which is the typical practical case (if the $a_{ij}$ are rational numbers, they can be scaled up to integers by multiplication with a suitable common number). Then the total benefit of any assignment is an integer, so if nε < 1, a complete assignment that is within nε of being optimal must be optimal. It follows that if
$$\varepsilon < \frac{1}{n}$$
and the benefits $a_{ij}$ are all integer, then the assignment obtained upon termination of the auction algorithm is optimal. Let us also note that the final set of prices is within nε of being an optimal solution of the dual problem
$$\min_{p_j,\ j=1,\dots,n} \Big\{ \sum_{j=1}^n p_j + \sum_{i=1}^n \max_{j=1,\dots,n} \{a_{ij} - p_j\} \Big\}. \tag{9}$$

This leads to the interpretation of the auction algorithm as a dual algorithm (in fact an approximate coordinate ascent algorithm; see the cited literature).
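The optimality argument can be checked numerically on a tiny instance. The helper names below are hypothetical. The prices are illustrative values chosen so that (7) holds with ε = 0.2. The sketch evaluates the primal benefit, the dual cost (9), and, because the instance is tiny, the true optimum by exhaustive search.

```python
from itertools import permutations


def primal_benefit(a, assignment):
    """Total benefit sum_i a[i][j_i] of an assignment (assignment[i] = j_i)."""
    return sum(a[i][j] for i, j in enumerate(assignment))


def dual_cost(a, p):
    """Objective of the dual problem (9) at prices p."""
    return sum(p) + sum(max(row[j] - p[j] for j in range(len(p))) for row in a)


def satisfies_eps_cs(a, assignment, p, eps, tol=1e-9):
    """Check epsilon-complementary slackness (7) for every person (tol absorbs rounding)."""
    return all(a[i][j] - p[j] >= max(a[i][k] - p[k] for k in range(len(p))) - eps - tol
               for i, j in enumerate(assignment))


def brute_force_optimum(a):
    """Exhaustive search over all n! assignments; only sensible for tiny n."""
    n = len(a)
    return max(sum(a[i][perm[i]] for i in range(n)) for perm in permutations(range(n)))


a = [[10, 5, 8], [7, 9, 6], [4, 3, 7]]
assignment, p, eps = [0, 1, 2], [5.4, 2.2, 3.2], 0.2   # illustrative data satisfying (7)
gap = dual_cost(a, p) - primal_benefit(a, assignment)  # weak duality: 0 <= gap <= n * eps
```

Here the gap between the dual cost and the primal benefit is about 0.2, within the bound nε = 0.6. Because the benefits are integer and nε < 1, the assignment coincides with the exhaustive-search optimum.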


Computational Aspects: ε-Scaling


The auction algorithm exhibits interesting computa- 
tional behavior, and it is essential to understand this 
behavior to implement the algorithm efficiently. First 
note that the amount of work to solve the problem can depend strongly on the value of ε and on the maximum absolute object value
$$C = \max_{i,j} |a_{ij}|.$$

Basically, for many types of problems, the number of bidding rounds up to termination tends to be proportional to C/ε. Note also that there is a dependence on
the initial prices; if these prices are ‘near optimal,’ we 
expect that the number of rounds to solve the problem 
will be relatively small. 

The preceding observations suggest the idea of ε-scaling, which consists of applying the algorithm several times, starting with a large value of ε and successively reducing ε up to an ultimate value that is less than some critical value (for example, 1/n, when the benefits $a_{ij}$ are integer). Each application of the algorithm provides good initial prices for the next application. This is a very common idea in nonlinear programming, encountered, for example, in barrier and penalty function methods. An alternative form of scaling, called cost scaling, is based on successively representing the benefits $a_{ij}$ with an increasing number of bits, while keeping ε at a constant value.
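ε-scaling can be sketched by wrapping an auction routine in a loop that shrinks ε and carries the prices over from one stage to the next. Several choices here are assumptions of this sketch: the reduction factor `theta = 4`, the starting value C/2, and the final value 1/(n + 1), which lies below the critical value 1/n for integer benefits. Each stage restarts from an empty assignment but keeps the prices of the previous stage.

```python
from typing import List, Tuple


def auction_stage(a: List[List[float]], eps: float, p: List[float]) -> List[int]:
    """One auction run from an empty assignment; prices p are updated in place."""
    n = len(a)
    owner, obj = [-1] * n, [-1] * n
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        values = [a[i][j] - p[j] for j in range(n)]
        j_best = max(range(n), key=values.__getitem__)
        w = max((values[j] for j in range(n) if j != j_best), default=values[j_best])
        p[j_best] += values[j_best] - w + eps          # bidding increment (8)
        if owner[j_best] != -1:
            obj[owner[j_best]] = -1
            unassigned.append(owner[j_best])
        owner[j_best], obj[i] = i, j_best
    return obj


def eps_scaling_auction(a: List[List[float]], theta: float = 4.0) -> Tuple[List[int], List[float]]:
    n = len(a)
    C = max(abs(x) for row in a for x in row)          # maximum absolute object value
    eps_final = 1.0 / (n + 1)                          # below 1/n for integer benefits
    eps = max(C / 2.0, eps_final)
    p = [0.0] * n
    while True:
        obj = auction_stage(a, eps, p)                 # prices from the previous stage reused
        if eps <= eps_final:
            return obj, p
        eps = max(eps / theta, eps_final)
```

Only the last stage, run with ε below 1/n, determines optimality. The earlier stages serve only to produce good starting prices.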

In practice, it is a good idea to at least consider scal- 
ing. For sparse assignment problems, that is, problems 
where the set of feasible assignment pairs is severely 
restricted, scaling seems almost universally helpful. 
In theory, scaling leads to auction algorithms with 
a particularly favorable polynomial complexity (with- 
out scaling, the algorithm is pseudopolynomial; see the 
cited literature). 


Parallel and Asynchronous Implementation 


Both the bidding and the assignment phases of the auc- 
tion algorithm are highly parallelizable. In particular, 
the bidding and the assignment can be carried out for 
all persons and objects simultaneously. Such an imple- 
mentation can be termed synchronous. There are also 
totally asynchronous implementations of the auction al- 
gorithm, which are interesting because they are quite 
flexible and also tend to result in faster solution in some 
types of parallel machines. To understand these imple- 
mentations, it is useful to think of a person as an au- 
tonomous decision maker who at unpredictable times 
obtains information about the prices of the objects. 
Each person who is not almost happy makes a bid at 
arbitrary times on the basis of its current object price 
information (that may be outdated because of commu- 
nication delays). 

See [7] for a careful formulation of the totally asyn- 
chronous model, and a proof of its validity, including 
extensive computational results on a shared memory 
machine, confirming the advantage of asynchronous 
over synchronous implementations. 


Variations and Extensions 


The auction algorithm can be extended to solve a num- 
ber of variations of the assignment problem, such as the 
asymmetric assignment problem where the number of
objects is larger than the number of persons and there is 
a requirement that all persons be assigned to some ob- 
ject. Naturally, the notion of an assignment must now 
be modified appropriately. To solve this problem, the 
auction algorithm need only be modified in the choice 
of initial conditions. It is sufficient to require that all 
initial prices be zero. A similar algorithm can be used 
for the case where there is no requirement that all per- 
sons be assigned. Other variations handle efficiently the 
cases where there are several groups of ‘identical’ per- 
sons or objects ([5]). 

There have been extensions of the auction algo- 
rithm for other types of linear network optimization 
problems. The general approach for constructing auc- 
tion algorithms for such problems is to convert them 
to assignment problems, and then to suitably apply the 
auction algorithm and streamline the computations. 
In particular, the classical shortest path problem can 
be solved correctly by the naive auction algorithm de- 
scribed earlier, once the method is streamlined. Sim- 
ilarly, auction algorithms can be constructed for the 
max-flow problems, and are very efficient. These algo- 
rithms bear a close relation to preflow-push algorithms 
for the max-flow problem, which were developed inde- 
pendently of auction ideas. 

The auction algorithm has been extended to solve 
linear transportation problems ([5]). The basic idea is 
to convert the transportation problem into an assign- 
ment problem by creating multiple copies of persons 
(or objects) for each source (or sink respectively), and 
then to modify the auction algorithm to take advantage 
of the presence of the multiple copies. 

There are extensions of the auction algorithm for 
linear minimum cost flow (transshipment) problems, 
such as the so-called ε-relaxation method, and the auction/sequential shortest path algorithm (see
the cited literature for a detailed description). These 
methods have interesting theoretical properties and like 
the auction algorithm, are well suited for parallelization 
(see the survey [6], and the textbook [7]). 

Let us finally note that there have been propos- 
als of auction algorithms for convex separable network 
optimization problems with and without gains (but 
with a single commodity and without side constraints); 
see [9]. 
The Hessian of a scalar function f(x) can be computed
automatically in at least two ways. The first is a natural 
extension of the forward method for calculating gradi- 
ents. The others extend the reverse method. 


The Forward Mode 


The concept of forward automatic differentiation was 
described by L.B. Rall [14]. When calculating the gradi- 
ent vector of a function of n variables, a doublet data 
structure is introduced, consisting of n + 1 floating 
point numbers. To calculate the Hessian matrix, this 
data structure is extended to a triplet. 

A triplet is a data structure that, in the simplest 
form, contains 1 + n + n(n+1)/2 floating point num- 
bers. If X is a variable that occurs in the evaluation of 
f(x), then the triplet of X consists of 


$$X, \quad \frac{\partial X}{\partial x_i}, \quad \frac{\partial^2 X}{\partial x_i \partial x_j}$$
for $i = 1, \dots, n$ and $j \le i$.


The doublet consists of the first n + 1 elements of 
the triplet. 

At the start of the function evaluation the triplets of the variables $x_k$ must be set, and these are simply $(x_k, e_k, 0)$, where $e_k$ is the unit vector with 1 in the kth place and 0 is the null matrix. If the function evaluation is expanded as a Wengert list [17] consisting of three types of operations,

e addition and subtraction, 

e multiplication and division, 

e nonlinear scalar functions, 

then the arithmetic required to correctly update the triplets is easily deduced.

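• If $X_k = X_l \pm X_m$, $l, m < k$, then to obtain the triplet of $X_k$, the elements of the triplets of $X_l$ and $X_m$ are simply added (or subtracted) element by element.

• If $X_k = X_l X_m$, $l, m < k$, then the background arithmetic is more complex, as
$$\frac{\partial X_k}{\partial x_i} = X_l \frac{\partial X_m}{\partial x_i} + X_m \frac{\partial X_l}{\partial x_i}$$
and
$$\frac{\partial^2 X_k}{\partial x_i \partial x_j} = X_l \frac{\partial^2 X_m}{\partial x_i \partial x_j} + \frac{\partial X_l}{\partial x_i} \frac{\partial X_m}{\partial x_j} + \frac{\partial X_m}{\partial x_i} \frac{\partial X_l}{\partial x_j} + X_m \frac{\partial^2 X_l}{\partial x_i \partial x_j}.$$
As all these terms are stored in the triplets of $X_l$ and $X_m$, the triplet of $X_k$ can be computed from them by a standard routine.

• If $X_k = \phi(X_m)$, $m < k$, then
$$\frac{\partial X_k}{\partial x_i} = \phi'(X_m) \frac{\partial X_m}{\partial x_i}$$
and
$$\frac{\partial^2 X_k}{\partial x_i \partial x_j} = \phi'(X_m) \frac{\partial^2 X_m}{\partial x_i \partial x_j} + \phi''(X_m) \frac{\partial X_m}{\partial x_i} \frac{\partial X_m}{\partial x_j}.$$
To perform this operation the values of $\phi'(X_m)$ and $\phi''(X_m)$ must be calculated with $\phi(X_m)$; all the other data is contained in the triplet of $X_m$.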
Illustrative Example 1: Forward Mode


Consider the simple function
$$f(x) = (x_1 x_2 + \sin x_1 + 4)(3x_2^2 + 6).$$


In this case n = 2 and each triplet contains 6 float- 
ing point numbers, the value of X, its gradient, and 
the upper half of its Hessian. To evaluate the function, 
gradient, and Hessian, first expand the function in the 
Wengert list as shown in column 1 and then evaluate 
the triplets one by one. The evaluation is performed at 
the point (0, 1) below. 


X_k                 | triplet(X_k)
X_1 = x_1           | 0, 1, 0, 0, 0, 0
X_2 = x_2           | 1, 0, 1, 0, 0, 0
X_3 = X_1 X_2       | 0, 1, 0, 0, 1, 0
X_4 = sin X_1       | 0, 1, 0, 0, 0, 0
X_5 = X_3 + X_4     | 0, 2, 0, 0, 1, 0
X_6 = X_5 + 4       | 4, 2, 0, 0, 1, 0
X_7 = X_2^2         | 1, 0, 2, 0, 0, 2
X_8 = 3 X_7         | 3, 0, 6, 0, 0, 6
X_9 = X_8 + 6       | 9, 0, 6, 0, 0, 6
X_10 = X_6 X_9      | 36, 18, 24, 0, 21, 24


The last row contains the values of the function, gradi- 
ent and Hessian. The values for this simple problem can 
be easily verified by direct differentiation. 
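A minimal Python sketch of the triplet data type with overloaded operators, following the three update rules above, reproduces the last row of the table. The class name `Triplet`, its fields, and the helper `tsin` are illustrative. The Hessian is stored densely as its lower triangle ($j \le i$), so this sketch does not exploit the sparsity discussed below.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Triplet:
    val: float                 # X
    grad: List[float]          # dX/dx_i
    hess: List[List[float]]    # d2X/dx_i dx_j for j <= i, stored as hess[i][j]

    @staticmethod
    def variable(x: List[float], k: int) -> "Triplet":
        """Initial triplet (x_k, e_k, 0) of the independent variable x_k."""
        n = len(x)
        return Triplet(x[k], [1.0 if i == k else 0.0 for i in range(n)],
                       [[0.0] * (i + 1) for i in range(n)])

    def _lift(self, other) -> "Triplet":
        """Treat a plain number as a triplet with zero derivatives."""
        if isinstance(other, Triplet):
            return other
        n = len(self.grad)
        return Triplet(float(other), [0.0] * n, [[0.0] * (i + 1) for i in range(n)])

    def __add__(self, other):
        o = self._lift(other)  # X_k = X_l + X_m: add element by element
        return Triplet(self.val + o.val,
                       [g + h for g, h in zip(self.grad, o.grad)],
                       [[u + v for u, v in zip(r, s)] for r, s in zip(self.hess, o.hess)])

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)  # X_k = X_l X_m: product rule at both derivative levels
        n = len(self.grad)
        grad = [self.val * o.grad[i] + o.val * self.grad[i] for i in range(n)]
        hess = [[self.val * o.hess[i][j] + self.grad[i] * o.grad[j]
                 + o.grad[i] * self.grad[j] + o.val * self.hess[i][j]
                 for j in range(i + 1)] for i in range(n)]
        return Triplet(self.val * o.val, grad, hess)

    __rmul__ = __mul__

    def apply(self, phi, dphi, d2phi) -> "Triplet":
        """X_k = phi(X_m), given phi and its first two derivatives."""
        d1, d2 = dphi(self.val), d2phi(self.val)
        n = len(self.grad)
        return Triplet(phi(self.val), [d1 * g for g in self.grad],
                       [[d1 * self.hess[i][j] + d2 * self.grad[i] * self.grad[j]
                         for j in range(i + 1)] for i in range(n)])


def tsin(X: Triplet) -> Triplet:
    return X.apply(math.sin, math.cos, lambda v: -math.sin(v))


def f(x1, x2):
    # f(x) = (x1 x2 + sin x1 + 4)(3 x2^2 + 6), written as ordinary code
    return (x1 * x2 + tsin(x1) + 4) * (3 * (x2 * x2) + 6)


x = [0.0, 1.0]
F = f(Triplet.variable(x, 0), Triplet.variable(x, 1))
```

`F.val`, `F.grad` and `F.hess` come out as 36, (18, 24) and the lower-triangle entries 0, 21, 24, matching the last row of the table.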

In practice forward automatic differentiation may be implemented in many ways. One possibility in many modern computer languages is to introduce the new data type triplet and overload the standard operators and functions so that they perform the arithmetic described above. The code for the function evaluation can then be written normally without recourse to the Wengert list. Details of an implementation in Ada are given in [13]. A single run through a function evaluation code then computes the function, gradient and Hessian. If S is the store required to compute f(x), then this method requires $(1 + n + n(n + 1)/2)S$ store. If M is the number of operations required to compute f(x), then $(1 + 3n + 7n^2)M$ is a pessimistic bound on the operations required to compute the function, gradient and Hessian. Additional overheads are incurred to access the data type and the overloaded operator subroutines. The efficiency is often improved by treating the triplet as a vector array and using sparse storage techniques. The number of zeros in the triplets of the above simple example illustrates the strength of the sparse form for calculating full Hessians. Maany reports the following results for the CPU time to differentiate the 50-dimensional Helmholtz function (for details see [10]).


Computed       | Doublets full | Doublets sparse | Triplets full | Triplets sparse
f              | 1.36          | 0.44            | 60.29         | 0.44
f, ∇f          | 9.24          | 3.42            | 68.68         | 3.52
f, ∇f, ∇²f     | N/A           | N/A             | 476.36        | 20.69


The CPU time for calculating f alone within the full triplet package rises dramatically because, although the derivative calculations are switched off, the full package still allocates the space for the full triplet. Using the sparse package is also especially helpful if n is large and f(x) is a partially separable function, i.e.
$$f(x) = \sum_k f_k(x),$$
where $f_k(x)$ only depends on a small number $V_k$ of the n variables. Then, throughout the calculation of $f_k(x)$, the sparse triplet will only contain at most $1 + V_k + V_k(V_k + 1)/2$ nonzeros, and $V_k$ will replace n in all the operation bounds, to give $\sum_k (1 + 3V_k + 7V_k^2) M_k$ operations, where $M_k$ is the number of operations required to compute $f_k(x)$.

One of the main purposes for calculating the Hessian matrix is to use it in optimization calculations. The truncated Newton method can be written so that it either requires the user to provide $f$, $\nabla f$ and $\nabla^2 f$ at each outer iteration, or $f$ and $\nabla f$ at each outer iteration and $(\nabla^2 f)p$ at each inner iteration. The first method is ideally suited to be combined with sparse triplet differentiation. The algorithm is described in [9] and results are given for functions of up to n = 3000 in [8]. The calculation of $(\nabla^2 f)p$ can also be undertaken simply by a modification of the triplet method.

In [7] the conclusion was drawn that 


the sparse doublet and sparse triplet codes in Ada enable normal code to be written for the function f and accurate values of ∇f and ∇²f to be obtained reliably by the computer. The major hope for automatic differentiation is therefore achieved.


Implementations are also available in Pascal.SC, C++, 
and Fortran90. The NOC Optima Library [1] code, OP- 
FAD, implements the sparse doublet and triplet meth- 
ods described above in Fortran90. 


The Mixed Method 


The advent of reverse automatic differentiation (A. Griewank [10]) raised the hope that quicker ways could be found. The bound on the operations needed to compute the Hessian by the full forward triplet method contains a term proportional to $n^2 M$; by using a mixed method this is not required. The simplest mixed method is to use reverse automatic differentiation to compute the gradient, which according to [10] requires only 5M operations to compute the function and gradient, for any value of n. This can be repeated at appropriate steps h along each axis, i.e. at $x + h e_i$, $i = 1, \dots, n$, and simple differences applied to the gradient vectors to calculate the Hessian in less than $5(n + 1)(M + 1)$ operations.
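The difference step of the mixed method can be sketched as follows. In place of a reverse-mode gradient code, the sketch uses a hand-coded gradient of the function of Example 1. This is an assumption for illustration. The step h = 1e-6 and the final symmetrization are also choices of this sketch. Only n + 1 gradient evaluations are needed.

```python
import math
from typing import Callable, List

Vector = List[float]


def grad_f(x: Vector) -> Vector:
    """Gradient of f(x) = (x1 x2 + sin x1 + 4)(3 x2^2 + 6); stands in for a
    reverse-mode gradient routine."""
    x1, x2 = x
    u = x1 * x2 + math.sin(x1) + 4.0
    v = 3.0 * x2 * x2 + 6.0
    return [(x2 + math.cos(x1)) * v, x1 * v + u * 6.0 * x2]


def hessian_by_gradient_differences(grad: Callable[[Vector], Vector], x: Vector,
                                    h: float = 1e-6) -> List[List[float]]:
    n = len(x)
    g0 = grad(x)                          # gradient at the base point
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xh = list(x)
        xh[i] += h                        # step along the ith axis, x + h e_i
        gi = grad(xh)
        for j in range(n):
            H[j][i] = (gi[j] - g0[j]) / h # column i approximates d(grad)/dx_i
    # average with the transpose, since the exact Hessian is symmetric
    return [[0.5 * (H[i][j] + H[j][i]) for j in range(n)] for i in range(n)]


H = hessian_by_gradient_differences(grad_f, [0.0, 1.0])
```

At (0, 1) the result approximates the exact Hessian entries 0, 21 and 24 to within roughly 1e-5, the truncation error of a one-sided difference with this h.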


Illustrative Example 2: Reverse Differentiation 


To obtain the gradient by reverse differentiation we must introduce the adjoint variables $X_k^*$ and reverse back through the list. These rules are discussed in the previous article, but for convenience are repeated. If in the calculation of f(x)
$$X_k = \phi(X_i, X_j), \quad i, j < k,$$
then in the reverse pass
$$X_i^* = X_i^* + \frac{\partial \phi}{\partial X_i} X_k^*$$
and
$$X_j^* = X_j^* + \frac{\partial \phi}{\partial X_j} X_k^*.$$


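For the same example the steps needed to calculate the gradient by reverse differentiation are

Step                          | Value
X_10^* = 1                    | 1
X_6^* = X_9 X_10^*            | 9
X_9^* = X_6 X_10^*            | 4
X_8^* = X_9^*                 | 4
X_7^* = 3 X_8^*               | 12
X_2^* = 2 X_2 X_7^*           | 24
X_5^* = X_6^*                 | 9
X_3^* = X_5^*                 | 9
X_4^* = X_5^*                 | 9
X_1^* = X_4^* cos X_1         | 9
X_2^* = X_2^* + X_1 X_3^*     | 24
X_1^* = X_1^* + X_2 X_3^*     | 18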


giving the gradient as (18, 24), in agreement with the forward calculation. To perform this calculation the values of $X_6$ and $X_9$ were required, which had been calculated during the function value calculation. The reverse gradient calculation must, therefore, follow a forward function evaluation calculation, and the required data must be stored.
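The reverse sweep in the table can be written out directly. This Python sketch hard-codes the Wengert list of the example, and the function name is illustrative. The forward sweep keeps every intermediate value, which the text notes is required. The adjoints start from zero and are accumulated with the update rule above.

```python
import math
from typing import Tuple


def f_and_gradient_reverse(x1: float, x2: float) -> Tuple[float, Tuple[float, float]]:
    # forward sweep: evaluate the Wengert list, keeping all intermediate values
    X1, X2 = x1, x2
    X3 = X1 * X2
    X4 = math.sin(X1)
    X5 = X3 + X4
    X6 = X5 + 4.0
    X7 = X2 * X2
    X8 = 3.0 * X7
    X9 = X8 + 6.0
    X10 = X6 * X9
    # reverse sweep: A[k] holds the adjoint X_k^* (index 0 unused)
    A = [0.0] * 11
    A[10] = 1.0
    A[6] += X9 * A[10]                 # X10 = X6 X9
    A[9] += X6 * A[10]
    A[8] += A[9]                       # X9 = X8 + 6
    A[7] += 3.0 * A[8]                 # X8 = 3 X7
    A[2] += 2.0 * X2 * A[7]            # X7 = X2^2
    A[5] += A[6]                       # X6 = X5 + 4
    A[3] += A[5]                       # X5 = X3 + X4
    A[4] += A[5]
    A[1] += math.cos(X1) * A[4]        # X4 = sin X1
    A[2] += X1 * A[3]                  # X3 = X1 X2
    A[1] += X2 * A[3]
    return X10, (A[1], A[2])


value, gradient = f_and_gradient_reverse(0.0, 1.0)
```

At (0, 1) it returns 36 and the gradient (18, 24).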

The bound 5M on the number of operations required to calculate the gradient is often very pessimistic, especially when the function evaluation uses matrix operations [15] or standard subroutines [5], or when efficient sparse storage is used [6]. The store required by this simple approach is simply that needed to calculate the gradient by reverse differentiation. The original reverse method required O(M) store, but Griewank [11] describes how the store required can be reduced to O(S log M) at the cost of increasing the operation bound to O(M log M).

The accuracy obtained by calculating the Hessian by 
simple differences will depend on h but will often be 
sufficient as accurate Hessians are rarely required in op- 
timization. Many software packages for calculating the 
gradient by reverse differentiation now exist, including 
the Optima Library Code OPRAD [1]. 

In 1998 the most widely used code to calculate gradients automatically is probably the ADIFOR code [3]; many examples of its use are given in that reference. Unfortunately, this code implements a ‘statement level hybrid mode’. In this mode, each assignment statement
$$Y_i = \psi(Y_j), \quad j < i, \ j \in J,$$
is treated in turn, and the gradient $\partial \psi / \partial Y_j$, $j \in J$, is computed efficiently by reverse automatic differentiation (RAD). But then, to obtain the doublet
$$\frac{\partial Y_i}{\partial x_m} = \sum_{j \in J} \frac{\partial \psi}{\partial Y_j} \frac{\partial Y_j}{\partial x_m},$$
many multiplications and additions may be required, leading to a high operation count.


Reverse Method 


A fully automatic approach could start by obtaining the 
Wengert list for the function and gradient as calculated 
by reverse automatic differentiation. This list will con- 
tain at most 5M steps. Then a forward sparse Doublet 
pass through this list could be performed that would 
need less than (1 + 3m) 5M operations. The Doublet 
formed for the same example is illustrated below. In 
the Wengert list all identical Doublets are merged and 
composite steps involving more than one operation are 
split, it will be observed that the last two rows of the 
Doublet contain the gradient and Hessian, as desired, 
and that the number of operations, 22, is much less than 
the bound 5M = 50. The storage requirement for this 
approach, when n is large, is considerably greater than 
that needed by the difference method. An alternative 
would be to perform a reverse pass through the gradient list. A full discussion is given in [4], where it is shown that the two are identical in arithmetic, storage and operation count. Experience with the Ada implementation described there showed that the performance was very machine dependent. If the sparse Doublet approach is used with this reverse method on the partially separable function described above, then the bound on the operations needed to obtain the Hessian reduces to $\sum_i 5(N_i + 1)(M_i + 1)$, a considerable saving. An early implementation, PADRE2, is described in [12]. A more recent code, ADOL-F, is described in [16]. Christianson's method is implemented in OPRAD, mentioned above.
It should perhaps be mentioned that all the above methods can be hand-coded to solve any important problem without incurring the overheads still associated with most automatic packages; many of the 'helping hands' described in [5] are still not implemented in any automatic package.


Further methods for speeding up the calculation of the Hessian are described in » Automatic Differentiation: Calculation of Newton Steps.


Illustrative Example 3: 
Reverse Gradient, Forward Hessian 


The variables in the Wengert list of the function and 
gradient calculation will be denoted by Y. 


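$$\begin{array}{ll}
Y_i & \text{Doublet } Y_i \\
Y_1 = x_1 & 0,\ 1,\ 0 \\
Y_2 = x_2 & 1,\ 0,\ 1 \\
Y_3 = Y_1 Y_2 & 0,\ 1,\ 0 \\
Y_4 = \sin Y_1 & 0,\ 1,\ 0 \\
Y_5 = Y_3 + Y_4 & 0,\ 2,\ 0 \\
Y_6 = Y_5 + 4 & 4,\ 2,\ 0 \\
Y_7 = Y_2^2 & 1,\ 0,\ 2 \\
Y_8 = 3Y_7 & 3,\ 0,\ 6 \\
Y_9 = Y_8 + 6 & 9,\ 0,\ 6 \\
Y_{10} = Y_6 Y_9 & 36,\ 18,\ 24 \\
Y_{11} = 1 & 1,\ 0,\ 0 \\
Y_{12} = Y_{11} Y_6 & 4,\ 2,\ 0 \\
Y_{13} = Y_{11} Y_9 & 9,\ 0,\ 6 \\
Y_{14} = 3Y_{12} & 12,\ 6,\ 0 \\
Y_{15} = Y_2 Y_{14} & 12,\ 6,\ 12 \\
Y_{16} = 2Y_{15} & 24,\ 12,\ 24 \\
Y_{17} = \cos Y_1 & 1,\ 0,\ 0 \\
Y_{18} = Y_{17} Y_{13} & 9,\ 0,\ 6 \\
Y_{19} = Y_1 Y_{13} & 0,\ 9,\ 0 \\
Y_{20} = Y_2 Y_{13} & 9,\ 0,\ 15 \\
Y_{21} = Y_{18} + Y_{20} & 18,\ 0,\ 21 \\
Y_{22} = Y_{16} + Y_{19} & 24,\ 21,\ 24
\end{array}$$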
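The table can be reproduced with a short Python sketch. It assumes a hand-written `Doublet` type (a value plus its gradient with respect to x_1 and x_2) and evaluates at x = (0, 1), the point implied by the Doublets of Y_1 and Y_2; the class and function names are illustrative and are not taken from OPRAD or any other package. Rows 11 to 22 replay the reverse gradient sweep as ordinary Wengert steps, so a single forward Doublet pass through them differentiates the gradient.

```python
import math
from dataclasses import dataclass


@dataclass
class Doublet:
    """Value of a Wengert-list variable and its gradient w.r.t. (x1, x2)."""
    val: float
    grad: tuple

    def __add__(self, other):
        if isinstance(other, Doublet):
            return Doublet(self.val + other.val,
                           tuple(a + b for a, b in zip(self.grad, other.grad)))
        return Doublet(self.val + other, self.grad)          # add a constant
    __radd__ = __add__

    def __mul__(self, other):
        if isinstance(other, Doublet):                       # product rule
            return Doublet(self.val * other.val,
                           tuple(a * other.val + self.val * b
                                 for a, b in zip(self.grad, other.grad)))
        return Doublet(self.val * other, tuple(a * other for a in self.grad))
    __rmul__ = __mul__


def d_sin(u):
    return Doublet(math.sin(u.val), tuple(math.cos(u.val) * g for g in u.grad))


def d_cos(u):
    return Doublet(math.cos(u.val), tuple(-math.sin(u.val) * g for g in u.grad))


def example3(x1, x2):
    """Doublets of the 22-step list for f = (x1 x2 + sin x1 + 4)(3 x2^2 + 6)."""
    Y = {1: Doublet(x1, (1.0, 0.0)), 2: Doublet(x2, (0.0, 1.0))}
    # forward sweep for f
    Y[3] = Y[1] * Y[2]
    Y[4] = d_sin(Y[1])
    Y[5] = Y[3] + Y[4]
    Y[6] = Y[5] + 4
    Y[7] = Y[2] * Y[2]
    Y[8] = 3 * Y[7]
    Y[9] = Y[8] + 6
    Y[10] = Y[6] * Y[9]
    # reverse sweep for the gradient, differentiated forward by the Doublets
    Y[11] = Doublet(1.0, (0.0, 0.0))   # seed: adjoint of Y10
    Y[12] = Y[11] * Y[6]               # adjoint of Y9 (and of Y8)
    Y[13] = Y[11] * Y[9]               # adjoint of Y6 (and of Y5, Y3, Y4)
    Y[14] = 3 * Y[12]                  # adjoint of Y7
    Y[15] = Y[2] * Y[14]
    Y[16] = 2 * Y[15]                  # contribution of Y7 = Y2^2 to adjoint of Y2
    Y[17] = d_cos(Y[1])
    Y[18] = Y[17] * Y[13]              # contribution of sin Y1 to adjoint of Y1
    Y[19] = Y[1] * Y[13]               # contribution of Y3 = Y1 Y2 to adjoint of Y2
    Y[20] = Y[2] * Y[13]               # contribution of Y3 to adjoint of Y1
    Y[21] = Y[18] + Y[20]              # df/dx1, with its gradient (Hessian row 1)
    Y[22] = Y[16] + Y[19]              # df/dx2, with its gradient (Hessian row 2)
    return Y
```

Calling `example3(0.0, 1.0)` gives Y_21 = (18, 0, 21) and Y_22 = (24, 21, 24): the gradient (18, 24) together with the two rows of the Hessian, in 22 steps.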
Many algorithms for solving optimization problems require either the minimization of a merit function, which may be the original objective function, or the solution of sets of simultaneous nonlinear equations, which may involve the constraints of the problem. To obtain second-order convergence near the solution, algorithms for both tasks rely on the calculation of Newton steps.

When solving a set of nonlinear equations

$$s_j(x) = 0, \quad j = 1,\dots,n,$$

the Newton step d at a point x ∈ R^n is obtained by solving the linear set of equations

$$\sum_{i=1}^{n} \frac{\partial s_j}{\partial x_i}\, d_i = -s_j, \quad j = 1,\dots,n,$$

where both the derivatives $\partial s_j/\partial x_i$ and the vector function s are evaluated at a point, which we will denote by $x^{(0)}$.

For convenience we introduce the Jacobian matrix J and write the equation as

$$J d = -s.$$


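When minimizing a function f(x), the Newton equation becomes

$$\sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i\,\partial x_j}\, d_j = -\frac{\partial f}{\partial x_i}, \quad i = 1,\dots,n,$$

where all the derivatives are calculated at a point, again denoted by $x^{(0)}$.

In terms of the Hessian, H, and the gradient, g, this can be written

$$H d = -g.$$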


Automatic differentiation can be used to calculate 
the gradient, Hessian and Jacobian, but it can also be 
used to calculate the Newton step directly without cal- 
culating the matrices. In this article we will first discuss 
the calculation of the Jacobian, then briefly extend the discussion to the gradient and Hessian, which were the subject of » Automatic differentiation: Calculation of the Hessian, and finally discuss the direct calculation of the Newton step.


Jacobian Calculations 


If the functions $s_j$ were each evaluated as separate entities, requiring $M_j$ operations, then the derivatives could be evaluated by reverse automatic differentiation in $5M_j$ operations. For many sets of functions it would, however, be very inefficient to evaluate the set s in this way, as considerable savings could be made by calculating threads of operations common to more than one $s_j$ only once. In such situations the number of operations M required to evaluate the set s may be much less than $\sum_j M_j$. Under these circumstances the decision on how the Jacobian should be evaluated becomes much more complicated.

Before the advent of automatic differentiation, the Jacobian was frequently approximated by one-sided differences

$$\frac{\partial s_j}{\partial x_i} \approx \frac{s_j(x + h e_i) - s_j(x)}{h}.$$

If the vector function s requires M Wengert operations, then the Jacobian would need (n + 1)M operations by this approach, and the accuracy of the result depends on a suitable choice of h. If simple forward automatic differentiation using doublets (see » Automatic differentiation: Calculation of the Hessian) is used, an accurate Jacobian is obtained at a cost of 3nM operations. If a Newton step is to be calculated, then the Jacobian must be square, and so the cost of the simple reverse mode, which involves a backward pass through the Wengert list for each subfunction, would be bounded by 5nM operations.

Most large Jacobians are sparse, and M.J.D. Powell, A.R. Curtis and J.R. Reid [5] introduced the idea of combining columns that have no nonzeros in a common row. Then, provided the sparsity pattern of J is known, the values in those columns can be reconstructed from a reduced number of differences. If the number of such PCR groups required to cover all the columns is c, then the operation count is reduced to (c + 1)M. For example, the columns of the following 5 × 5 sparse Jacobian could be divided into 3 groups


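$$\begin{pmatrix}
\blacksquare & \blacktriangle & 0 & 0 & \bigstar \\
\blacksquare & \blacktriangle & 0 & \bigstar & 0 \\
0 & \blacktriangle & \blacksquare & \bigstar & 0 \\
0 & 0 & \blacksquare & 0 & 0 \\
\blacksquare & \blacktriangle & 0 & 0 & \bigstar
\end{pmatrix}$$

indicated by ■, ▲ and ★.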
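A minimal Python sketch of the column grouping and the compressed differences, assuming a simple greedy first-fit rule for forming the groups (an illustrative choice; the text does not prescribe a grouping algorithm) and a hypothetical test function `s` whose Jacobian has the 5 × 5 pattern above. Each group costs one extra evaluation of s, so c groups need c + 1 evaluations in all instead of n + 1.

```python
import numpy as np


def cpr_groups(pattern):
    """Greedily group columns so that no two columns in a group share a nonzero row."""
    groups, rows_used = [], []
    for col in range(pattern.shape[1]):
        rows = set(np.flatnonzero(pattern[:, col]))
        for g, used in zip(groups, rows_used):
            if not rows & used:          # no common nonzero row: join this group
                g.append(col)
                used |= rows
                break
        else:                            # no compatible group: start a new one
            groups.append([col])
            rows_used.append(set(rows))
    return groups


def sparse_fd_jacobian(s, x, pattern, h=1e-7):
    """One-sided differences with one perturbation per column group."""
    groups = cpr_groups(pattern)
    s0 = s(x)
    J = np.zeros(pattern.shape)
    for g in groups:
        step = np.zeros_like(x)
        step[g] = h                      # perturb every column of the group at once
        diff = (s(x + step) - s0) / h
        for col in g:                    # each row belongs to at most one column of g
            rows = np.flatnonzero(pattern[:, col])
            J[rows, col] = diff[rows]
    return J, len(groups)


# Sparsity pattern of the 5 x 5 example (True marks a structural nonzero).
P = np.array([[1, 1, 0, 0, 1],
              [1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 0, 0],
              [1, 1, 0, 0, 1]], dtype=bool)


def s(x):
    """Hypothetical function with the sparsity pattern P."""
    x1, x2, x3, x4, x5 = x
    return np.array([x1 * x2 + x5 ** 2, x1 + x2 * x4, x2 * x3 + x4,
                     x3 ** 2, x1 * x5 + x2])
```

With this pattern the greedy rule returns the groups {1, 3}, {2} and {4, 5} (columns numbered from 1), matching the ■, ▲ and ★ groups, so the whole Jacobian costs 4 evaluations of s rather than 6.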

This same grouping could be used with forward au- 
tomatic differentiation to produce an accurate Jacobian 
in at most 3cM operations. If the sparse Doublet is used, 
the full benefit of sparsity within the calculation of the 
$s_j$ is obtained, as well as the benefit due to sparsity in
the Jacobian, without the need to determine the column 
groupings. Results showing the advantage of calculating 
large (n = 5000) Jacobians this way are given in [15] and 
summarised in [7]. 

It is possible for the calculation of some $s_j$ to be independent of the other $s_j$, which do contain a common thread. It would obviously be efficient to calculate these $s_j$ by reverse differentiation, requiring $5M_j$ operations. Reverse differentiation will also be appropriate if the common thread has fewer outputs than inputs. Then sparse reverse doublets [2] should be used. These are implemented in OPRAD, see » Automatic differentiation: Calculation of the Hessian.


Automatic Differentiation: Calculation of Newton Steps 




T.F. Coleman et al. [3,4] demonstrated that calcu- 
lating some columns using groups in the forward mode 
and some rows using groups in the reverse mode is 
considerably more efficient than using either alone. All 
nonzeros of the Jacobian must be included in a row 
and/or column computed. Similar results follow if some 
columns are computed using sparse doublets and some 
rows using the sparse reverse method. If C is the maxi- 
mum number of nonzeros in a row within the columns 
computed forward and R the maximum number of 
nonzeros in a column within the rows computed in re- 
verse then a crude bound on the number of operations 
is (3C + 5R)M. This bound does not allow for the addi- 
tional sparsity in the early calculations nor for the fact 
that for some reverse calculations $M_j$ should replace M.
The selection of rows and columns taking account of 
such considerations is still unresolved. 

But the advantages to be obtained can be appre- 
ciated by considering the arrow-head Jacobian, where 
only the diagonal elements and the last row and column contain nonzeros. If the gradient of $s_n$ is computed using sparse reverse doublets, this will require at most $5M_n$ operations, and if the other gradients are computed using sparse forward doublets, no doublet will contain more than 2 nonzeros, so the operations will be bounded by 6M. The total number of operations required in this case is independent of n.


The Extended Matrix 


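If the calculation of the functions $s_j$ proceeds by a sequence of steps

$$X_k = x_k, \quad k = 1,\dots,n,$$
$$X_k = \phi_k(X_l,\ l \in \mathcal{L}_k,\ l < k), \quad k = n+1,\dots,M+n,$$

with

$$s_j = X_{M+j}, \quad j = 1,\dots,n,$$

then

$$\frac{\partial X_k}{\partial x_m} = 0, \quad m \ne k, \quad k = 1,\dots,n, \qquad \frac{\partial X_k}{\partial x_k} = 1,$$

and

$$\frac{\partial X_k}{\partial x_m} = \sum_{l \in \mathcal{L}_k} \frac{\partial \phi_k}{\partial X_l}\,\frac{\partial X_l}{\partial x_m}, \quad k = n+1,\dots,M+n.$$

If we now denote $\partial X_k/\partial x_m$ by $Y_k$ and $\partial \phi_k/\partial X_l$ by $L_{kl}$, then this becomes

$$Y_k = \sum_l L_{kl}\, Y_l,$$

i.e. the kth row of the matrix-vector product LY, where the elements in the first n rows of L are all zeros, and then

$$\frac{\partial s_i}{\partial x_m} = Y_{M+i}.$$

Obtaining the Jacobian by the forward method may be considered as equivalent to solving

$$(I - L)\,Y = e_m.$$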
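Turning now to the reverse method, if

$$X_k = \phi_k(X_l),$$

then the adjoint variable $X_l^*$ contains a term

$$\frac{\partial \phi_k}{\partial X_l}\, X_k^* = L_{kl}\, X_k^*,$$

which is the lth row of the matrix-vector product $L^T X^*$. Obtaining the gradient of $s_m$ is therefore equivalent to solving

$$(I - L^T)\, X^* = e_{M+m},$$

and then

$$\frac{\partial s_m}{\partial x_i} = X_i^*.$$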


So the calculation of the Jacobian by both the forward and the backward method is equivalent to solving a very sparse set of equations. If the Wengert list is used, each row of L contains at most two nonzeros. It has therefore been suggested that methods for solving linear equations with sparse matrices could be used to calculate J: A. Griewank and S. Reese [14] suggested using the Markowitz rule, while U. Geitner, J. Utke and Griewank [11] applied the method of Newsam and Ramsdell.
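To make the equivalence concrete, here is a small Python sketch, assuming a hypothetical toy system $s_1 = x_1 x_2$, $s_2 = \sin x_1 + x_2$ written as a five-step Wengert list (n = 2, M = 3). It forms L explicitly and solves both $(I - L)Y = e_m$ and $(I - L^T)X^* = e_{M+j}$. A dense `numpy.linalg.solve` is used only for brevity; a real implementation would exploit the fact that I − L is unit lower triangular and very sparse.

```python
import numpy as np

# Hypothetical toy Wengert list (n = 2 inputs, M = 3 further steps):
#   X1 = x1, X2 = x2, X3 = sin(X1), X4 = X1 * X2 (= s1), X5 = X3 + X2 (= s2)
n, M = 2, 3


def build_L(x1, x2):
    """L[k, l] = d(phi_k)/d(X_l), 0-based; the first n rows are zero."""
    L = np.zeros((n + M, n + M))
    L[2, 0] = np.cos(x1)            # X3 = sin(X1)
    L[3, 0], L[3, 1] = x2, x1       # X4 = X1 * X2
    L[4, 2], L[4, 1] = 1.0, 1.0     # X5 = X3 + X2
    return L


def jacobian_forward(x1, x2):
    """Column m of J: last n entries of the solution of (I - L) Y = e_m."""
    L, I = build_L(x1, x2), np.eye(n + M)
    cols = [np.linalg.solve(I - L, I[:, m])[-n:] for m in range(n)]
    return np.column_stack(cols)


def jacobian_reverse(x1, x2):
    """Row j of J: first n entries of the solution of (I - L^T) X* = e_{M+j}."""
    L, I = build_L(x1, x2), np.eye(n + M)
    rows = [np.linalg.solve(I - L.T, I[:, M + j])[:n] for j in range(n)]
    return np.vstack(rows)
```

Both functions return $\begin{pmatrix} x_2 & x_1 \\ \cos x_1 & 1 \end{pmatrix}$: the forward solve delivers J one column at a time, the reverse solve one row at a time.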
Hessian Calculations


The calculation of the Hessian, as discussed in » Automatic differentiation: Calculation of the Hessian, can also be formulated as a sparse matrix calculation. Using the notation of » Automatic differentiation: Calculation of the Hessian, if the calculation of f(x) consists of

$$X_k = \phi_k(X_m,\ m < k,\ m \in \mathcal{M}_k),$$

then the reverse gradient calculation consists of

$$X_m^* = X_m^* + X_k^*\,\frac{\partial \phi_k}{\partial X_m}, \quad m \in \mathcal{M}_k.$$
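If we now denote

$$Y_k = \frac{\partial X_k}{\partial x_i}, \quad k = 1,\dots,M,$$

and

$$Y_{2M+1-k} = \frac{\partial X_k^*}{\partial x_i}, \quad k = 1,\dots,M,$$

then we obtain

$$Y_k = \sum_{m \in \mathcal{M}_k} \frac{\partial \phi_k}{\partial X_m}\, Y_m, \quad k = 1,\dots,M,$$

and

$$Y_{2M+1-m} = Y_{2M+1-m} + Y_{2M+1-k}\,\frac{\partial \phi_k}{\partial X_m} + X_k^* \sum_{j \in \mathcal{M}_k} \frac{\partial^2 \phi_k}{\partial X_m\,\partial X_j}\, Y_j.$$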


The second derivatives are 1 if $\phi_k$ is a multiplication, 0 if $\phi_k$ is an addition, and, if $\phi_k$ is unary, nonzero only if j = m. If we denote these second-order terms by B, the calculation of $He_i$ is equivalent to solving

$$\begin{pmatrix} I - L & 0 \\ -B & I - L^S \end{pmatrix} Y = \begin{pmatrix} e_i \\ 0 \end{pmatrix}.$$

Here the superscript S indicates that L has been transposed through both diagonals. The ith column of the Hessian is then given by the last n values of Y. For the illustrative example


$$f(x) = (x_1 x_2 + \sin x_1 + 4)(3x_2^2 + 6)$$


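used in » Automatic differentiation: Calculation of the Hessian, the off-diagonal nonzeros in the matrix, which we will denote by K, are

$$\begin{aligned}
K_{3,1} &= K_{20,18} = X_2, \\
K_{3,2} &= K_{19,18} = X_1, \\
K_{4,1} &= K_{20,17} = \cos X_1, \\
K_{5,3} &= K_{18,16} = 1, \\
K_{5,4} &= K_{17,16} = 1, \\
K_{6,5} &= K_{16,15} = 1, \\
K_{7,2} &= K_{19,14} = 2X_2, \\
K_{8,7} &= K_{14,13} = 3, \\
K_{9,8} &= K_{13,12} = 1, \\
K_{10,6} &= K_{15,11} = X_9, \\
K_{10,9} &= K_{12,11} = X_6, \\
K_{20,2} &= K_{19,1} = X_3^*, \\
K_{20,1} &= -X_4^* \sin X_1, \\
K_{19,2} &= 2X_7^*, \\
K_{15,9} &= K_{12,6} = X_{10}^*.
\end{aligned}$$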


L contains 11 nonzeros and B contains 6. The matrix 
is very sparse and the same sparse matrix techniques 
could be used to solve this system of equations. 
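A Python sketch of this formulation for the illustrative example, assuming the index conventions reconstructed above ($Y_{2M+1-m}$ holds $\partial X_m^*/\partial x_i$, with M = 10). It runs the forward and reverse sweeps to obtain the $X_k$ and $X_k^*$, assembles the 20 × 20 matrix from the 11 entries of L and the 6 entries of B, and solves for $He_i$ with a dense solver purely for brevity. The function names are hypothetical.

```python
import numpy as np


def hessian_extended_system(x1, x2):
    """[[I - L, 0], [-B, I - L^S]] for f = (x1 x2 + sin x1 + 4)(3 x2^2 + 6)."""
    M = 10
    X = [0.0] * (M + 1)                              # 1-based X[1..M]
    X[1], X[2] = x1, x2
    X[3] = X[1] * X[2]; X[4] = np.sin(X[1]); X[5] = X[3] + X[4]
    X[6] = X[5] + 4;    X[7] = X[2] ** 2;    X[8] = 3 * X[7]
    X[9] = X[8] + 6;    X[10] = X[6] * X[9]
    A = [0.0] * (M + 1)                              # adjoints X*_k
    A[10] = 1.0
    A[6] += A[10] * X[9]; A[9] += A[10] * X[6]
    A[8] += A[9]; A[7] += 3 * A[8]; A[2] += 2 * X[2] * A[7]
    A[5] += A[6]; A[3] += A[5]; A[4] += A[5]
    A[1] += np.cos(X[1]) * A[4]
    A[1] += X[2] * A[3]; A[2] += X[1] * A[3]
    # (k, m, dphi_k/dX_m): the 11 nonzeros of L
    L = [(3, 1, X[2]), (3, 2, X[1]), (4, 1, np.cos(X[1])), (5, 3, 1.0),
         (5, 4, 1.0), (6, 5, 1.0), (7, 2, 2 * X[2]), (8, 7, 3.0),
         (9, 8, 1.0), (10, 6, X[9]), (10, 9, X[6])]
    # (m, j, X*_k d2phi_k / dX_m dX_j): the 6 nonzeros of B
    B = [(1, 2, A[3]), (2, 1, A[3]), (1, 1, -A[4] * np.sin(X[1])),
         (2, 2, 2 * A[7]), (6, 9, A[10]), (9, 6, A[10])]
    K = np.eye(2 * M)
    for k, m, val in L:
        K[k - 1, m - 1] -= val                       # I - L block
        K[2 * M - m, 2 * M - k] -= val               # I - L^S block
    for m, j, val in B:
        K[2 * M - m, j - 1] -= val                   # -B block
    return K, M


def hessian_column(x1, x2, i):
    """(d2f/dx1 dx_i, d2f/dx2 dx_i) from the last n = 2 entries of Y."""
    K, M = hessian_extended_system(x1, x2)
    rhs = np.zeros(2 * M)
    rhs[i - 1] = 1.0                                 # e_i in the top half
    Y = np.linalg.solve(K, rhs)
    # Y_{2M} = dX*_1/dx_i and Y_{2M-1} = dX*_2/dx_i (0-based 2M-1 and 2M-2)
    return np.array([Y[2 * M - 1], Y[2 * M - 2]])
```

At x = (0, 1) the two columns come out as (0, 21) and (21, 24), agreeing with the last two Doublets of Illustrative Example 3.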


The Newton Step 


Because the notation is simpler, we will consider the Jacobian case.

We have shown that if we solve (I − L)Y = $e_m$, then column m of the Jacobian J is in the last n terms of Y. If we wish to evaluate Jp, we simply have to solve

$$(I - L)\,Y = p',$$

where p' has its first n terms equal to p and the remaining terms zero. The solution is then again in the last n terms of Y. To calculate the Newton step we know Jd, as it must be equal to −s, but we do not know d. We must therefore add the equations

$$Y_{M+i} = -s_i, \quad i = 1,\dots,n,$$

to the system and delete the equations $Y_i = p_i$. For convenience we will partition L, putting the first n columns into A and retaining L for the remainder. So we have to solve

$$\begin{pmatrix} -A & I - L \\ 0 & E \end{pmatrix} \begin{pmatrix} d \\ Y \end{pmatrix} = \begin{pmatrix} 0 \\ -s \end{pmatrix}$$
for d. The matrix E is rectangular, with all entries zero except for ones on one diagonal, so that EY picks out the last n terms of Y. Solving for d gives

$$E\,(I - L)^{-1} A\, d = -s,$$

so

$$J = E\,(I - L)^{-1} A,$$

which is also the Schur complement of the sparse set of equations.

One popular way of solving a sparse set of equations is to form the Schur complement and solve the resulting equations; in this instance this becomes 'form J and solve Jd = −s', which would be the normal indirect method. This also justifies the attention given in this article to the efficient calculation of J.

Griewank [12] observed that it may be possible to 
calculate the Newton step more cheaply than forming 
J and then solving the Newton equations. Utke [16] 
demonstrated that a number of ways of solving the 
sparse set of equations were indeed quicker. His imple- 
mentation was compatible with ADOL-C and included 
many rules for eliminating variables. This approach was 
motivated by noting that if the Jacobian is $J = D + ab^T$, where D is diagonal and a and b are vectors, then J is full, and so solving Jx = −s is an $O(n^3)$ operation. However, introducing one extra variable $z = b^T x$ enables the extended system to be solved very cheaply:

$$b^T x - z = 0, \qquad Dx + az = -s$$

gives

$$x = -D^{-1}(az + s), \qquad z = -b^T D^{-1}(az + s),$$

so

$$z = -(1 + b^T D^{-1} a)^{-1}\, b^T D^{-1} s,$$

and then x may be determined by substitution, which is an O(n) operation. The challenge of finding an automatic process that discovers such shortcuts is still open.
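A minimal NumPy sketch of this shortcut, assuming D is stored as the vector of its diagonal entries and that $1 + b^T D^{-1} a \ne 0$; the function name is hypothetical. It never forms J, and every step is a vector operation, so the cost is O(n) rather than the $O(n^3)$ of a dense factorization.

```python
import numpy as np


def newton_step_diag_plus_rank_one(D_diag, a, b, s):
    """Solve (D + a b^T) x = -s using the extra variable z = b^T x."""
    Dinv_s = s / D_diag
    Dinv_a = a / D_diag
    z = -(b @ Dinv_s) / (1.0 + b @ Dinv_a)   # z = -(1 + b^T D^-1 a)^-1 b^T D^-1 s
    return -(Dinv_a * z + Dinv_s)            # x = -D^-1 (a z + s)
```

A quick check is to form $D + ab^T$ densely for a small n and confirm that the returned x satisfies Jx = −s.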

L.C.W. Dixon [6] noted that the extended matrix is an echelon form. An echelon matrix of degree k has ones on the kth superdiagonal and zeros above it. If the lower part is sparse and contains NNZ nonzeros, then the Schur complement can be computed in k · NNZ operations, and the Newton step obtained by solving the resulting equations in $O(k^3)$ steps. The straightforward sparse system is an echelon form with k = n, so he suggested that by rearranging rows and columns it might be possible to reduce k. This would reduce the operations needed for both parts of the calculation. Many sorting algorithms have been proposed for reducing the
echelon index of sparse matrices. I.S. Duff et al. [9] discuss the performance of methods known as P4 and P5. R. Fletcher [10] introduced SPK1. Dixon and Z. Maany [8] introduced another, which, when applied to the extended matrix of the extended Rosenbrock function, reduces the echelon index from n to n/2 and gives a diagonal Schur complement. It follows that this method, too, has considerable potential.
All these approaches still require further research. 


Truncated Methods 


Experience using the truncated Newton code has led 
many researchers to doubt the wisdom of calculating 
accurate Newton steps. Approximate solutions are often preferred, in which the conjugate gradient method is applied to Hd = −g; this can be implemented by calculating Hp at each inner iteration. Hp can be calculated very cheaply by a single forward doublet pass, with initial values set at p, through the list for g obtained by reverse differentiation. The operations required to compute Hp are therefore bounded by 15M.

If an iterative method is used to solve Jd = −s, the products Jp and $J^T v$ can both be obtained cheaply, the first by forward and the second by reverse automatic differentiation.
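A pure-Python sketch of the forward-over-reverse idea, using the example function $f(x) = (x_1 x_2 + \sin x_1 + 4)(3x_2^2 + 6)$ from » Automatic differentiation: Calculation of the Hessian. The reverse sweep for g is written by hand (an assumption, standing in for an AD-generated gradient list), and a scalar `Dual` type plays the role of the forward doublet seeded with p. The conjugate gradient loop has no safeguard against negative curvature, which a practical truncated Newton code would need; all names are illustrative.

```python
import math


class Dual:
    """Forward-mode value with one directional derivative (a scalar doublet)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(other):
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__


def d_sin(u): return Dual(math.sin(u.val), math.cos(u.val) * u.dot)
def d_cos(u): return Dual(math.cos(u.val), -math.sin(u.val) * u.dot)


def gradient(x1, x2):
    """Hand-written reverse sweep for f; works on Dual inputs."""
    X6 = x1 * x2 + d_sin(x1) + 4          # forward sweep
    X9 = 3 * (x2 * x2) + 6
    A6, A9 = X9, X6                       # adjoints of X6 and X9 (seed f* = 1)
    A7 = 3 * A9                           # adjoint of X7 = x2^2
    g1 = A6 * x2 + A6 * d_cos(x1)
    g2 = A6 * x1 + 2 * x2 * A7
    return g1, g2


def grad_and_hvp(x, p):
    """One forward pass with tangent p through the gradient list: (g, Hp)."""
    g1, g2 = gradient(Dual(x[0], p[0]), Dual(x[1], p[1]))
    return [g1.val, g2.val], [g1.dot, g2.dot]


def truncated_cg(x, max_iter=10, tol=1e-10):
    """Conjugate gradients on H d = -g using only Hessian-vector products."""
    g, _ = grad_and_hvp(x, [0.0, 0.0])
    d = [0.0, 0.0]
    r = [-gi for gi in g]
    p = list(r)
    rr = rr0 = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rr <= tol ** 2 * rr0:
            break
        _, Hp = grad_and_hvp(x, p)
        alpha = rr / sum(pi * hi for pi, hi in zip(p, Hp))
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rr_new = sum(ri * ri for ri in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return d
```

At x = (0, 1), `grad_and_hvp` with p = (1, 0) returns g = (18, 24) and Hp = (0, 21); each product costs one pass through the gradient list rather than a full Hessian.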
Satellites are used in a variety of systems for communication and data collection. Familiar examples of
these systems include satellite networks for broadcast- 
ing video programming, meteorological and geophysi- 
cal data observation systems, the global positioning sys- 
tem (GPS) for navigation, and military surveillance sys- 
tems. Strictly speaking, these are systems in which satel- 
lites are just one component, and in which there are 
other primary subsystems that have no direct involve- 
ment with satellites. Nevertheless, they will be referred 
to as satellite systems for ease of reference. 

Simple geometric models are often incorporated in 
simulations of satellite system performance. Important 
operational aspects of these systems, such as the times 
when satellites can communicate with each other or 
with installations on the ground (e.g., tracking stations), depend on the dynamics of satellite and station motion.
The geometric models represent these motions, as well 
as constraints on communication or data collection. 
For example, the region of space from which an an- 
tenna on the ground can receive a signal might be mod- 
eled as a cone, with its vertex centered on the antenna 
and axis extending vertically upward. The antenna can 
receive a signal from a satellite only when the satellite 
is within the cone. Taking into account the motions of 
the satellite and the earth, the geometric model predicts 
when the satellite and tracking station can communi- 
cate. 

Elementary optimization problems often arise in 
these geometric models. It may be of interest to de- 
termine the closest approach of two satellites, or when 
a satellite reaches a maximum elevation as observed 
from a tracking station, or the extremes of angular ve- 
locity and acceleration for a rotating antenna tracking 
a satellite. Optimization problems like these are formulated in terms of geometric variables, primarily distances and angles, as well as their derivatives with respect to time. The derivatives appear both in the optimization algorithms and in the functions to be optimized. One of the previously mentioned examples illustrates this. When a satellite is being tracked from the
ground, the antenna often rotates about one or more 
axes so as to remain pointed at the satellite. The angular 
velocity and acceleration necessary for this motion are 
the first and second derivatives of variables expressed 
as angles in the geometric configuration of the antenna 
and satellite. Determining the extreme values of these derivatives is one of the optimization problems mentioned earlier.

Automatic differentiation is a feature that can be in- 
cluded in a computer programming language to sim- 
plify programs that compute derivatives. In the situa- 
tion described above, satellite system simulations are 
developed as computer programs that include com- 
puted values for the distance and angle variables of 
interest. With automatic differentiation, the values of 
derivatives are an automatic by-product of the compu- 
tation of variable values. As a result, the computer pro- 
grammer does not have to develop and implement the 
computer instructions that go into calculating deriva- 
tive values. As a specific example of this idea, consider 
again the rotating antenna tracking a satellite. Imag- 
ine that the programmer has worked out the proper 
equations to describe the angular position of the an- 
tenna at any time. The simulation also needs to com- 
pute values for the angular velocity and acceleration, 
the first and second derivatives of angular position. 
However, the programmer does not need to work out 
the proper equations for these derivatives. As soon as 
the equations for angular position are included in the 
computer program, the programming language pro- 
vides for the calculation of angular velocity and accel- 
eration automatically. That is the effect of automatic 
differentiation. Because the derivatives of geometric 
variables such as distances and angles can be quite in- 
volved, automatic differentiation results in computer 
programs that are much easier to develop, debug, and 
maintain. 

The preceding comments have provided a brief 
overview of geometric models for satellite systems, as 
well as associated optimization problems and the use of 
automatic differentiation. The discussion will now turn 
to a more detailed examination of these topics. 


Geometric Models 


The geometric models for satellite systems are formu- 
lated in the context of three-dimensional real space. 
A conventional rectangular coordinate system is de- 
fined by mutually perpendicular x, y, and z axes. The 
earth is modeled as a sphere or ellipsoid centered at the 
origin (0, 0, 0), with the north pole on the positive z 
axis, and the equator in the xy plane. The coordinate 
axes are considered to retain a constant orientation relative to the fixed stars, so that the earth rotates about the z axis.

In this setting, tracking station and satellite loca- 
tions are represented by points moving in space. Each 
such moving point is specified by a vector-valued function r(t) = (x(t), y(t), z(t)), where t represents time. Geometric variables such as angles and distances can be
determined using standard vector operations: 


$$c\,(x, y, z) = (cx, cy, cz),$$
$$(x, y, z) + (u, v, w) = (x + u,\ y + v,\ z + w),$$
$$(x, y, z)\cdot(u, v, w) = xu + yv + zw,$$
$$(x, y, z)\times(u, v, w) = (yw - zv,\ zu - xw,\ xv - yu),$$
$$\|(x, y, z)\| = \sqrt{x^2 + y^2 + z^2} = \sqrt{(x, y, z)\cdot(x, y, z)}.$$


The distance between two points r and s is then given by ‖r − s‖. The angle θ defined by rays from point r through points p and q is determined by

$$\cos\theta = \frac{(p - r)\cdot(q - r)}{\|p - r\|\,\|q - r\|}.$$

A more complete discussion of vector operations, their 
properties, and geometric interpretation can be found 
in any calculus textbook; [9] is one example. 
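The vector operations above translate directly into code. A minimal Python sketch using plain tuples for points; the helper names are illustrative, and the cosine is clamped to [−1, 1] to guard against rounding before `math.acos` is applied.

```python
import math


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def cross(p, q):
    x, y, z = p
    u, v, w = q
    return (y * w - z * v, z * u - x * w, x * v - y * u)


def norm(p):
    return math.sqrt(dot(p, p))


def distance(r, s):
    return norm(sub(r, s))


def angle(r, p, q):
    """Angle at vertex r between the rays r->p and r->q, in radians."""
    a, b = sub(p, r), sub(q, r)
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))
```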

There are a variety of models for the motions of 
points representing satellites and tracking stations. The 
familiar conception of a uniformly rotating earth circled by satellites that travel in stable closed orbits is only approximately correct. For qualitative simulations
of the performance of satellite systems, particularly at 
preliminary stages of system design, these models may 
be adequate. More involved models can take into ac- 
count such effects as the asphericity of the gravitational 
field of the earth, periodic wobbling of the earth’s axis of 
rotation, or atmospheric drag, to name a few. Modeling 
the motions of the earth and satellites with high fidelity 
is a difficult endeavor, and one that has been studied 
extensively. Good general references for this subject are 
[1,2,3,10]. 

For illustrative purposes, a few of the details will be presented for the simplest models: circular orbits around a spherical earth, uniformly spinning on a fixed axis. The radius of the earth will be denoted $R_e$.

As a starting point, the rotation of the earth can be specified by a single function of time, Ω(t), representing the angular displacement of the prime meridian from a fixed direction, typically the direction specified by the positive x axis (see Fig. 1). At any time, the positive x axis emerges from the surface of the earth at some point on the equator. Suppose that at a particular time t, the point where the positive x axis emerges happens to be on the prime meridian, located at latitude 0 and longitude 0. Then Ω(t) = 0 for that t. As time progresses, the prime meridian rotates away from the x axis, counter-clockwise as viewed by an observer above the north pole. The function Ω measures the angle of rotation, starting at 0 each time the prime meridian is aligned with the x axis, and increasing toward a maximum of 360° (2π in radian measure) with each rotation of the earth. With a uniformly spinning earth, Ω increases linearly with t during each rotation.

Once Ω is specified, any terrestrial location given by a latitude φ, longitude λ, and altitude a can be transformed into absolute coordinates in space, according to the equations

$$\theta = \lambda + \Omega(t), \tag{1}$$
$$r = R_e + a, \tag{2}$$
$$x = r\cos\theta\cos\phi, \tag{3}$$
$$y = r\sin\theta\cos\phi, \tag{4}$$
$$z = r\sin\phi. \tag{5}$$


Holding latitude, longitude, and altitude constant, these 
equations express the position in space of a fixed loca- 
tion on the earth for any time, thereby modeling the 
point’s motion. It is also possible to develop models for 
tracking stations that are moving on the surface of the 
earth, say on an aircraft or on a ship in the ocean. For 
example, if it is assumed that the moving craft is trav- 
eling at constant speed on a great circle arc or along 
a line of constant latitude, it is not difficult to express 
latitude and longitude as functions of time. In this case, 
the equations above reflect a dependence on t in λ and φ, as well as in Ω. A more complicated example would be to model the motion of a missile or rocket launched from the ground. This can be accomplished in a similar way: specify the trajectory in earth-relative terms, that is, using latitude, longitude, and altitude, and then compute the absolute spatial coordinates (x, y, z). In each case, the rotation of the earth is accounted for solely by the effect of Ω(t).
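A Python sketch of equations (1)-(5), assuming an equatorial radius $R_e$ = 6378 km and a uniformly spinning earth whose rotation angle Ω(t) advances by 2π every sidereal day of about 23.9345 h, with Ω(0) = 0. Both constants and the function names are assumptions chosen for illustration, not part of the model description above.

```python
import math

R_E_KM = 6378.0            # assumed equatorial radius (km)
SIDEREAL_DAY_H = 23.9345   # assumed rotation period of the earth (h)


def earth_rotation_angle(t_hours):
    """Omega(t) for a uniformly spinning earth, taking Omega(0) = 0 (assumption)."""
    return (2.0 * math.pi * t_hours / SIDEREAL_DAY_H) % (2.0 * math.pi)


def station_position(lat_deg, lon_deg, alt_km, t_hours, Re=R_E_KM):
    """Absolute (x, y, z) in km of a terrestrial point, equations (1)-(5)."""
    phi = math.radians(lat_deg)
    theta = math.radians(lon_deg) + earth_rotation_angle(t_hours)   # (1)
    r = Re + alt_km                                                  # (2)
    return (r * math.cos(theta) * math.cos(phi),                     # (3)
            r * math.sin(theta) * math.cos(phi),                     # (4)
            r * math.sin(phi))                                       # (5)
```

For a moving craft, the same function can be called with latitude, longitude and altitude that are themselves evaluated at time t.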

For a satellite in circular orbit, the position at any 
time is specified by an equation of the following form: 


$$\mathbf{r}(t) = r\,[\cos(\omega t)\,\mathbf{u} + \sin(\omega t)\,\mathbf{v}].$$

In this equation, ωt is understood as an angle in radian measure for the sin and cos operations; r, ω, u, and v are constants. The first, r, is the radius of the orbit circle. It is equal to the sum of the earth's radius $R_e$ and the satellite's altitude. The constant ω is the angular speed of the satellite. The satellite completes an orbit every 2π/ω units of time, which is therefore the orbital period. Both u and v are unit vectors: u is parallel to the initial position of the satellite; v is parallel to the initial velocity. See Fig. 2.

Mathematically, the equation above describes some 
sort of orbit no matter how the constants are selected. 
But not all of these are accurate descriptions of a free-falling satellite in circular orbit. For one thing, u and v must be perpendicular to produce a circular orbit. In addition, there is a physical relationship linking r and ω. Assuming that the circular orbit follows Newton's laws of motion and gravitation, r and ω satisfy

$$\omega = K\,r^{-3/2}, \tag{6}$$
where K is a physical constant that depends on both Newton's universal gravitational constant and the mass of the earth. Its numerical value also depends on the units of measurement used for time and distance. For units of hours and kilometers, the value of K is 2.27285 × 10^6. As this relationship shows, for a given altitude (and
hence a given value of r), there is a unique angular speed 
at which a satellite will maintain a circular orbit. Equiv- 
alently, the altitude of a circular orbit determines the 
constant speed of the satellite, as well as the period of 
the satellite. 

Generally, constants are chosen for a circular orbit 
based on some geometric description. Here is a typical 
approach. Assume that the initial position of the satel- 
lite is directly above the equator, with latitude 0, a given 
longitude, and a given altitude. In other words, assume 
that the initial position is in the plane of the equator, 
and so has a z coordinate of 0. (This is the situation de- 
picted in Fig. 2.) Moreover, the initial heading of the 
satellite can be specified in terms of the angle it makes 
with the xy plane (which is the plane of the equator). 
Call that angle δ. From these assumptions we can determine values for the constants r, ω, u, and v in the equation for r(t). Now the altitude for the orbit is constant, so the initial altitude determines r, as well as ω via equation (6). The initial latitude, longitude, and altitude also provide enough information to determine absolute
coordinates (x, y, z) for the initial satellite position using equations (1)-(5). Accordingly, the unit vector u is given by

$$\mathbf{u} = \frac{(x, y, z)}{\|(x, y, z)\|}.$$

As already observed, the z coordinate of u will be 0. Finally, the unit vector v is determined from the initial position and heading. It is known that v makes an angle of δ with the xy plane, and hence makes an angle of π/2 − δ with the z axis. This observation can be expressed as the equation

$$\mathbf{v}\cdot(0, 0, 1) = \sin\delta.$$

It is also known that v must be perpendicular to u, so

$$\mathbf{v}\cdot\mathbf{u} = 0.$$

Finally, since v is a unit vector,

$$\mathbf{v}\cdot\mathbf{v} = 1.$$

If u = ($u_1$, $u_2$, 0), then these three equations lead to v = (±$u_2$ cos δ, ∓$u_1$ cos δ, sin δ). The ambiguous sign can be resolved by assuming that the direction of orbit is either in agreement with or contrary to the direction of the earth's rotation. Assuming that the orbit is in the same direction as the earth's rotation, v = (−$u_2$ cos δ, $u_1$ cos δ, sin δ). The alternative possibility, that the satellite
orbit opposes the rotation of the earth, is generally not 
practically feasible, so is rarely encountered. 
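Putting these pieces together, a Python sketch that builds the circular-orbit model from an initial longitude, altitude and heading δ, using equation (6) with the quoted value of K (hours and kilometers). It assumes $R_e$ = 6378 km, an initial position over the equator with Ω(0) = 0, and the prograde choice of v; the function names are hypothetical.

```python
import math

K = 2.27285e6      # km^(3/2)/h, the value of K quoted for hours and kilometers
R_E_KM = 6378.0    # assumed earth radius (km)


def angular_speed(r_km):
    """Equation (6): omega = K r^(-3/2), in radians per hour."""
    return K * r_km ** -1.5


def make_circular_orbit(lon0_deg, alt_km, delta_deg, Re=R_E_KM):
    """Orbit starting over the equator at longitude lon0 (Omega(0) = 0 assumed)
    with heading delta above the equatorial plane: (position(t), omega, u, v)."""
    r = Re + alt_km
    w = angular_speed(r)
    theta = math.radians(lon0_deg)                   # equation (1) with Omega = 0
    u = (math.cos(theta), math.sin(theta), 0.0)      # (x, y, z)/||(x, y, z)||
    d = math.radians(delta_deg)
    v = (-u[1] * math.cos(d), u[0] * math.cos(d), math.sin(d))   # prograde v

    def position(t_hours):
        c, s = math.cos(w * t_hours), math.sin(w * t_hours)
        return tuple(r * (c * ui + s * vi) for ui, vi in zip(u, v))

    return position, w, u, v
```

As a sanity check, a radius of 42164 km gives a period 2π/ω of about 23.93 h, close to one sidereal day, as expected for a geostationary orbit.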

The preceding paragraphs are intended to provide 
some insight about the mathematics used to describe 
the movement of satellites and terrestrial observers. Al- 
though the models presented here are the simplest ones 
available, they appear in the same general framework as 
much more sophisticated models. In particular, in any 
of these models, it is necessary to be able to compute 
instantaneous positions for satellites and terrestrial ob- 
servers at any time during a simulation. Moreover, the 
use of vector algebra and geometry to set up the simple 
models is representative of the methods used in more 
complicated cases. 


Sample Optimization Problems 


Computer simulations of satellite system performance 
provide one tool for comparing alternative designs and 
making cost/benefit trade-offs in the design process. 
Optimization problems contribute both directly and in- 
directly. In many cases, system performance is charac- 
terized in terms of extreme values of variables: what is 
the maximum number of users that can be accommo- 
dated by a communications system? At a given latitude, 
what is the longest period of time during which at most 
three satellites can be detected from some point on the 
ground? In these examples, the optimization problems 
are directly connected with the goals of the simulation. 


Optimization problems also arise indirectly as part 
of the logistics of the simulation software. This is par- 
ticularly the case when a simulation involves events that 
trigger some kind of system response. Examples of such 
events include the passage of a satellite into or out of 
sunlight, reaching a critical level of some resource such 
as power or data storage, or the initiation or termina- 
tion of radio contact with a tracking station. The de- 
tection of these events typically involves either root lo- 
cation or optimization. These processes are closely re- 
lated: the root of an equation can usually be charac- 
terized as an extreme value of a variable within a suit- 
able domain; conversely, optimization algorithms often 
generate candidate solutions by solving equations. 

In many of these event identification problems, the 
independent variable is time. The objective functions 
ultimately depend on the geometric models for satel- 
lite and tracking station motion, and so can be formu- 
lated in terms of explicit functions of time. In contrast, 
some of the optimization problems that concern di- 
rect estimation of system performance seek to optimize 
that performance by varying design parameters. A typ- 
ical approach to this kind of problem is to treat perfor- 
mance measures as functions of the parameters, where 
the values of the functions are determined through sim- 
ulation. Both kinds of optimization are illustrated in the 
following examples. 


Minimum Range 


As a very simple example of an optimization problem, 
it is sometimes of interest to determine the closest ap- 
proach of two orbiting bodies. Assume that a model has 
been developed, with r(t) and s(t) representing the po- 
sitions at time t for the two bodies. The distance be- 
tween them is then expressed as || r(t) — s(t) ||. This is 
the objective function to be minimized. Observe that it 
is simply expressed as a composition of vector opera- 
tions and the motion models for the two bodies. 
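The minimum-range problem can be sketched in Python for two hypothetical coplanar circular orbits, with ω taken from equation (6). The search first samples ‖r(t) − s(t)‖ on a grid and then refines the best sample by golden-section search. This two-stage approach is an illustrative choice, not a method prescribed by the text, and it assumes the minimum of interest is bracketed by the neighbouring grid points.

```python
import math

K = 2.27285e6  # km^(3/2)/h, equation (6) with hours and kilometers


def coplanar_circular_orbit(r_km, phase_rad):
    """Position (km) at time t (h) on a circular orbit in the xy plane."""
    w = K * r_km ** -1.5
    return lambda t: (r_km * math.cos(w * t + phase_rad),
                      r_km * math.sin(w * t + phase_rad), 0.0)


def golden_section_min(f, a, b, tol=1e-9):
    """Minimize a unimodal function f on [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                    # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                          # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def closest_approach(r, s, t0, t1, samples=2000):
    """Return (t, range) minimizing ||r(t) - s(t)|| over [t0, t1]."""
    separation = lambda t: math.dist(r(t), s(t))
    ts = [t0 + (t1 - t0) * k / samples for k in range(samples + 1)]
    k = min(range(samples + 1), key=lambda i: separation(ts[i]))
    t = golden_section_min(separation, ts[max(k - 1, 0)], ts[min(k + 1, samples)])
    return t, separation(t)
```

For orbits of radius 7000 km and 7500 km starting on opposite sides of the earth, the closest approach is 500 km, reached when the faster inner satellite has gained half a revolution on the outer one.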

A variation of this problem occurs when several 
satellites are required to stay in radio communication. 
In that case, an antenna on one satellite (at position A, 
say) may need to detect signals from two others (at positions B and C). In this setting, the measure of ∠BAC
is of interest. If the angle is wide, the antenna requires 
a correspondingly wide field of view. As the satellites 
proceed in their orbits, what is the maximum value of 
the angle? Equivalently, what is the minimum value of
the cosine of the angle? As before, the objective function 
in this minimization problem is easily expressed by ap- 
plying vector operations to the position models for the 
satellites. If a(t), b(t), and c(t) are the position functions 
for the three satellites, then 

(b —a)-(c—a) 

[|b —al] - Je— all 
This is a good example of combining vector operations 
with the models for satellite motion to derive the ob- 
jective function in an optimization problem. The next 
example is similar in style, but mathematically more in- 
volved. 


cos ZBAC = 
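The same pattern can be sketched for the angle problem. The helper names are hypothetical. `cos_bac` evaluates the formula above for three position vectors, and `max_angle` minimizes the cosine over a list of sample times that the caller supplies. The position functions passed in stand in for real motion models.

```python
import math

def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def cos_bac(a, b, c):
    # cos(angle BAC) = (b - a).(c - a) / (|b - a| |c - a|)
    u, v = sub(b, a), sub(c, a)
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def max_angle(a_of_t, b_of_t, c_of_t, times):
    # Minimizing the cosine over the sampled times maximizes the angle.
    t_best = min(times, key=lambda t: cos_bac(a_of_t(t), b_of_t(t), c_of_t(t)))
    cosine = cos_bac(a_of_t(t_best), b_of_t(t_best), c_of_t(t_best))
    return t_best, math.degrees(math.acos(max(-1.0, min(1.0, cosine))))
```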


Direction Angles and Their Derivatives 


A common aspect of satellite system simulation is the 
representation of sensors of various kinds. The images of weather systems and geophysical features that satellites beam to earth are captured by sensors. Sensors
are also used to locate prominent astronomical features 
such as the sun, the earth, and in some cases bright 
stars, in order to evaluate and control the satellite’s at- 
titude. Even the antenna used for communication is 
a kind of sensor. It is frequently convenient to define 
a coordinate system that is attached to a sensor, that is, 
define three mutually perpendicular axes which inter- 
sect at the sensor location, and which can be used as an 
alternate means to assign coordinates to points in space. 
Such a coordinate system is then used to describe the 
vectors from the sensor to other objects, and to model 
sensor sensitivity to signals arriving from various direc- 
tions. With several different coordinate systems in use, 
it is necessary to transform information described rela- 
tive to one system into a form that makes sense in the 
context of another system. This process also often in- 
volves what are called direction angles. 

As a concrete example, consider an antenna at 
a fixed location on the earth, tracking a satellite in orbit. 
The coordinate system attached to the tracking antenna 
is the natural map coordinate system at that point on 
the earth: the local x and y axes point east and north, respectively, and the z axis points straight up (Fig. 3). The direction from the station to the satellite is expressed in terms of two angles: the elevation $\delta$ of the satellite above the local xy plane, and the compass angle $\alpha$ measured clockwise from north. (See Fig. 4.) To illustrate, here is the meaning of an elevation of 30 degrees and a compass angle of 270 degrees. Begin by looking due north. Turn clockwise through 270 degrees, maintaining a line of sight that is parallel to the local xy plane. At that point you are looking due west. Now raise the line of sight until it makes a 30 degree angle with the local xy plane. This direction of view, with elevation 30 degrees and compass angle 270 degrees, might thus be described as 30 degrees above a ray 270 degrees clockwise from due north. The elevation and compass angle are examples of direction angles. Looked at another way, if a spherical coordinate system is imposed on the local rectangular system at the antenna, then every point in space is described by a distance and two angles. The angles
are direction angles. Direction angles can be defined in 
a similar way for any local coordinate system attached 
to a sensor. 
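As a check on the worked example, the following sketch converts a compass angle (measured clockwise from north) and an elevation into a unit vector in the local (east, north, up) frame. The function name is illustrative. The formula projects the direction onto the horizontal plane, where it has length $\cos\delta$, and splits that projection between north ($\cos\alpha$) and east ($\sin\alpha$).

```python
import math

def direction_to_enu(compass_deg, elevation_deg):
    # Unit vector (east, north, up) for a compass angle measured clockwise
    # from north and an elevation above the local xy plane.
    a, d = math.radians(compass_deg), math.radians(elevation_deg)
    return (math.cos(d) * math.sin(a), math.cos(d) * math.cos(a), math.sin(d))

e, n, u = direction_to_enu(270, 30)
# e is about -0.866 (westward), n is about 0, and u is about 0.5
```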

How are direction angles computed? In general terms, the basic idea is to define the local coordinate system in terms of moving vectors, and then to use vector operations to define the instantaneous values of the direction angles. Here is a formulation for the earth-based antenna. First, the local z axis points straight up. That means the vector from the center of the earth to the location of the antenna on the surface is parallel to the z axis. Given the latitude, longitude, and altitude of the antenna, its absolute position r(t) = (x, y, z) is computed using equations (1)-(5), as discussed earlier. The parallel unit vector is then given by $r/\|r\|$. To distinguish this from the global z axis, we denote it as the vector up.
The vector pointing due east must be perpendicular to 
the up direction. It also must be parallel to the equato- 
rial plane, and hence perpendicular to the global z axis. 
Using properties of vector cross products, a unit vector 
pointing east can therefore be expressed as 


$$\mathrm{east} = \frac{(0, 0, 1) \times \mathrm{up}}{\|(0, 0, 1) \times \mathrm{up}\|}.$$


Finally, the third perpendicular vector is given by the 
cross product of the other two: $\mathrm{north} = \mathrm{up} \times \mathrm{east}$. Note
that these vectors are defined as functions of time. At 
each value of t the earth motion model gives an instan- 
taneous value for r(t), and that, in turn, determines the 
vectors up, east, and north. 
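Here is a minimal sketch of how up, east, and north can be built from a station position r at one instant, with plain tuples for vectors. The helper names are illustrative. The construction assumes that the station is not at a pole, because there $(0,0,1) \times \mathrm{up}$ vanishes and east is undefined.

```python
import math

def normalize(p):
    n = math.sqrt(sum(x * x for x in p))
    return tuple(x / n for x in p)

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def local_frame(r):
    # r: station position in earth-centred coordinates (assumed not at a pole).
    up = normalize(r)
    east = normalize(cross((0.0, 0.0, 1.0), up))
    north = cross(up, east)  # already a unit vector: up and east are orthonormal
    return up, east, north
```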

Next, suppose that a satellite is included in the 
model, with instantaneous position s(t). The view vector from the antenna to the satellite is given by $v(t) = [s(t) - r(t)]/\|s(t) - r(t)\|$. The goal is to calculate the direction angles $\alpha$ and $\delta$ for v. Since $\delta$ measures the angle between v and the plane of east and north, the complementary angle $90^\circ - \delta$ is the angle between v and up. Because both are unit vectors, this leads to the equation

$$\sin\delta = \mathrm{up}\cdot v.$$
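The angle $\alpha$ is found from

$$v_n = v\cdot\mathrm{north}, \qquad v_e = v\cdot\mathrm{east}$$

according to the equations

$$\cos\alpha = \frac{v_n}{\sqrt{v_n^2 + v_e^2}}, \qquad \sin\alpha = \frac{v_e}{\sqrt{v_n^2 + v_e^2}}.$$

These follow from the fact that the projection of v into the local xy plane is given by $v_e\,\mathrm{east} + v_n\,\mathrm{north}$.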
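Putting these formulas together, the direction-angle computation at a single instant can be sketched as follows. The sketch assumes that the local unit vectors up, east, and north have already been computed. It uses `atan2`, which combines the cosine and sine relations above into one call and returns the correct quadrant. Angles are returned in degrees, with the compass angle folded into [0, 360).

```python
import math

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def direction_angles(r, s, up, east, north):
    """Compass angle alpha and elevation delta (degrees) of s as seen from r."""
    d = [si - ri for si, ri in zip(s, r)]
    length = math.sqrt(dot(d, d))
    v = [x / length for x in d]                               # unit view vector
    delta = math.asin(max(-1.0, min(1.0, dot(up, v))))        # sin(delta) = up . v
    alpha = math.atan2(dot(v, east), dot(v, north))           # from v_e and v_n
    return math.degrees(alpha) % 360.0, math.degrees(delta)
```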

In this example, direction angles play a role in sev- 
eral optimization problems. First, it may be of interest 
to predict the maximum value of $\delta$ as a satellite passes
over the tracking station. This maximum value of ele- 
vation is an indication of how close the satellite comes 
to passing directly overhead, and may be used to deter- 
mine whether communication will be possible between 
satellite and tracking station. 

Additional optimization problems concern the derivatives of $\alpha$ and $\delta$. In many designs, an antenna can turn about horizontal and vertical axes to point the center of the field of view in a particular direction. To stay pointed at a passing satellite, the antenna must be rotated on its axes so as to match the motion of the satellite, and $\alpha$ and $\delta$ specify exactly how far the antenna must be rotated about each axis at each time. However, there are mechanical limits on how fast the antenna can turn and accelerate. For this reason, during the time that the satellite is in view, the maximum values of the first and second derivatives of $\alpha$ and $\delta$ are of interest.
If the first derivatives exceed the antenna’s maximum 
turning speed, or if the second derivatives exceed the 
antenna’s maximum acceleration, the antenna will not 
be able to remain pointed at the satellite. 
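Without automatic differentiation (introduced below), the rate check can be sketched with central finite differences on sampled angle histories. The limits `max_rate` and `max_accel` are hypothetical antenna parameters, and the samples are assumed to be equally spaced by `dt`. The compass angle is unwrapped first so that a step from 359 to 0 degrees is not mistaken for a fast slew.

```python
def unwrap_deg(angles):
    # Remove 360-degree jumps so that the series is continuous.
    out = [angles[0]]
    for a in angles[1:]:
        prev = out[-1]
        step = (a - prev + 180.0) % 360.0 - 180.0
        out.append(prev + step)
    return out

def peak_rates(values, dt):
    # Central-difference estimates of the largest |first| and |second| derivative.
    d1 = [(values[k + 1] - values[k - 1]) / (2 * dt) for k in range(1, len(values) - 1)]
    d2 = [(values[k + 1] - 2 * values[k] + values[k - 1]) / dt**2
          for k in range(1, len(values) - 1)]
    return max(map(abs, d1)), max(map(abs, d2))

def can_track(alpha_deg, delta_deg, dt, max_rate, max_accel):
    # max_rate (deg/s) and max_accel (deg/s^2) are hypothetical antenna limits.
    for series in (unwrap_deg(alpha_deg), delta_deg):
        rate, accel = peak_rates(series, dt)
        if rate > max_rate or accel > max_accel:
            return False
    return True
```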


Design Parameter Optimization 


The preceding examples all involve simple kinds of op- 
timization problems with objective functions depend- 
ing only on time. There are also many situations in 
which system performance variables are optimized over 
some domain of design parameters. As one example of 
this, consider a system with a single satellite traveling in 
a circular orbit. Assume that the initial point of the orbit falls on the equator, with angle $\delta$ between the initial heading and the xy plane, as in Fig. 2. In this example, the object is to choose an optimal value of $\delta$. The optimization problem includes several tracking stations on the ground that are capable of communicating with the satellite. As it orbits, there may be times when the satellite cannot communicate with any of the tracking stations. At other times, one or more stations may be accessible. Over the simulation period, the total amount of time during which at least one tracking station is accessible will depend on the value of $\delta$. It is this total amount of access time (denoted A) that is to be maximized.

In this problem, the objective function A is not given as a mathematical expression involving the variable $\delta$. Instead, an appropriate simulation can be created to compute A for any particular $\delta$ of interest. This can then be used in conjunction with an optimization algorithm, with the simulation executed each time it is necessary to calculate $A(\delta)$.
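The following toy sketch shows the structure of such a simulation-based objective. It makes several simplifying assumptions that are not in the article:

- the earth is spherical and does not rotate;
- the orbit is circular and starts on the x axis;
- a station has access whenever the satellite is above its local horizon plane;
- time advances in fixed steps.

`access_time` plays the role of $A(\delta)$. A simple grid search stands in for the optimization algorithm, and each candidate value costs a full simulation run.

```python
import math

RE = 6378.0      # earth radius, km (spherical earth assumed)
RS = 7000.0      # orbit radius, km (illustrative)
MU = 398600.0    # earth's gravitational parameter, km^3/s^2 (approximate)
PERIOD = 2 * math.pi * math.sqrt(RS**3 / MU)   # circular-orbit period, s

def station(lat_deg, lon_deg):
    """Position of a ground station on a non-rotating spherical earth."""
    la, lo = math.radians(lat_deg), math.radians(lon_deg)
    return (RE * math.cos(la) * math.cos(lo),
            RE * math.cos(la) * math.sin(lo),
            RE * math.sin(la))

def satellite(t, delta):
    """Circular orbit starting at (RS, 0, 0) on the equator; delta is the
    angle between the initial heading and the xy plane."""
    th = 2 * math.pi * t / PERIOD
    return (RS * math.cos(th),
            RS * math.sin(th) * math.cos(delta),
            RS * math.sin(th) * math.sin(delta))

def visible(p, q):
    """True when satellite p is above the horizon plane of station q."""
    return sum((pi - qi) * qi for pi, qi in zip(p, q)) > 0

def access_time(delta, stations, duration=PERIOD, dt=10.0):
    """A(delta): sampled time during which at least one station has access."""
    steps = int(duration / dt)
    return dt * sum(any(visible(satellite(k * dt, delta), q) for q in stations)
                    for k in range(steps))

def best_inclination(stations, candidates):
    """Grid search: every objective evaluation reruns the simulation."""
    return max(candidates, key=lambda d: access_time(d, stations))
```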

The preceding example is a simple one, and the execution time required to compute $A(\delta)$ is small. For
more complicated situations, each execution of the sim- 
ulation can require a significant amount of time. In 
these cases, it may be more practical to use some sort 
of interpolation scheme. The idea would be to run the 
simulation for some values of the parameter(s), and to 
interpolate between these values as needed during the 
optimization process. 
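One way to realize this idea is sketched below. The sketch assumes that the objective is smooth and has a single interior maximum near the best sample. It runs the simulation at a few equally spaced parameter values and fits a parabola through the best sample and its two neighbours, which is one step of parabolic interpolation. The function name and the default of seven runs are illustrative.

```python
def optimize_with_surrogate(expensive, lo, hi, n_runs=7):
    # Run the expensive simulation only n_runs times.
    xs = [lo + (hi - lo) * k / (n_runs - 1) for k in range(n_runs)]
    ys = [expensive(x) for x in xs]
    k = max(range(n_runs), key=lambda i: ys[i])
    if k == 0 or k == n_runs - 1:
        return xs[k]  # maximum on the boundary: no interior bracket to interpolate
    (x0, x1, x2), (y0, y1, y2) = xs[k - 1:k + 2], ys[k - 1:k + 2]
    # Vertex of the parabola through the three points around the best sample.
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    return x1 if den == 0 else x1 - 0.5 * num / den
```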

In some situations, there is a resource allocation 
problem that can add yet another level of complexity to 
optimizing system performance. For example, if there 
are several satellites that must compete for connection 
time with the various tracking stations, just determin- 
ing how to assign the tracking stations to the satellites 
is not a simple matter. In this situation, there may be 
one kind of optimization problem performed during 
the simulation to make the resource allocations, and 
then a secondary optimization that considers the effect 
of changing system design parameters. An example of 
this kind of problem is described in detail in [6]. 

The preceding examples have been provided to il- 
lustrate the kinds of optimization problems that arise 
in simulations of satellite systems. Although there has 
been very little discussion of methods to solve these op- 
timization problems, it should be clear that standard 
methods apply, especially in the cases for which the in- 
dependent variable is time. In that context, the ability 
to compute derivatives relative to time for the objec- 
tive function is of interest. In addition, it sometimes 
occurs that the objective function is, itself, defined as 
a derivative of some geometric variable, providing another motivation for computing derivatives. The next
topic of discussion concerns the use of automatic dif- 
ferentiation for computing the desired derivatives. 


Automatic Differentiation 


Automatic differentiation refers to a family of tech- 
niques for automatically computing derivatives as 
a byproduct of function evaluation. A survey of differ- 
ent approaches and applications can be found in [5] and 
in-depth treatment appears in [4]. For the present dis- 
cussion, attention will be restricted to what is called the 
forward mode of automatic differentiation, and in par- 
ticular, the approach described in [8]. In this approach, 
to provide automatic calculation of the first m deriva- 
tives of real valued expressions of a single variable x, 
an algebraic system is defined consisting of real (m+1)-tuples, to which are extended the familiar binary operations and elementary functions generally defined on
real variables. For concreteness, m will be assumed to be 
3 below, but the discussion can be generalized to other 
values in an obvious way. 

With m = 3, the objects manipulated by the auto- 
matic differentiation system are 4-tuples. The idea is 
that each 4-tuple represents the value of a function and 
its first 3 derivatives, and that the operations on tuples 
preserve this interpretation. Thus, if $a = (a_0, a_1, a_2, a_3)$ consists of the values of f(t), f'(t), f''(t), and f'''(t) at some t, and if $b = (b_0, b_1, b_2, b_3)$ is similarly defined for a function g, then the product ab that is defined for the automatic differentiation system will consist of the value at t of fg and its first 3 derivatives. Similarly, the extension of the square root function to 4-tuples is so contrived that $\sqrt{a}$ will consist of the value of $\sqrt{f(t)}$ and its first 3 derivatives.
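Here is a minimal Python sketch of such a 4-tuple type, written with operator overloading. The class name `Jet` is hypothetical. It stores (f, f', f'', f''') and supports addition, subtraction, multiplication, and square root. Multiplication uses the product rule for 4-tuples developed in this article. `sqrt` uses the chain-rule formula with the derivatives of $\sqrt{\cdot}$ written out explicitly.

```python
import math

class Jet:
    """Value and first three derivatives (f, f', f'', f''') at one t."""

    def __init__(self, f0, f1=0.0, f2=0.0, f3=0.0):
        self.c = (f0, f1, f2, f3)

    def __add__(self, o):
        o = o if isinstance(o, Jet) else Jet(o)
        return Jet(*(p + q for p, q in zip(self.c, o.c)))
    __radd__ = __add__

    def __sub__(self, o):
        o = o if isinstance(o, Jet) else Jet(o)
        return Jet(*(p - q for p, q in zip(self.c, o.c)))

    def __mul__(self, o):
        # Leibniz rule: (fg)^(k) = sum_i C(k, i) f^(i) g^(k-i).
        o = o if isinstance(o, Jet) else Jet(o)
        a, b, c, d = self.c
        u, v, w, x = o.c
        return Jet(a * u, a * v + b * u, a * w + 2 * b * v + c * u,
                   a * x + 3 * b * w + 3 * c * v + d * u)
    __rmul__ = __mul__

    def sqrt(self):
        # Chain rule with h = sqrt and s = sqrt(a):
        # h' = 1/(2s), h'' = -1/(4s^3), h''' = 3/(8s^5).
        a, b, c, d = self.c
        s = math.sqrt(a)
        h1, h2, h3 = 1 / (2 * s), -1 / (4 * s**3), 3 / (8 * s**5)
        return Jet(s, h1 * b, h2 * b * b + h1 * c,
                   h3 * b**3 + 3 * h2 * b * c + h1 * d)
```

For example, with the identity tuple `t = Jet(3.0, 1.0)` at t = 3, `(t * t).c` is `(9.0, 6.0, 2.0, 0.0)`, which is the value and the first three derivatives of $t^2$.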

In the preceding remarks, the functions f and g are 
assumed to be real valued, but similar ideas work for 
vector valued functions. The principal difference is this: when f(t) is a vector, then so are its derivatives, and the $a_i$ referred to above are then vectors rather than scalars. In addition, vector valued functions have different operations than scalar valued functions. For example, vector functions may be combined with a dot product, as opposed to the conventional product of real scalars. Also, while the square root operation is not defined for vector valued functions, the norm operation $\|f(t)\|$ is.

In an automatic differentiation system built along
these lines, there must be some functions that are evalu- 
ated directly to produce 4-tuples. For example, the con- 
stant function with value c can be evaluated directly to 
produce the tuple (c, 0, 0, 0), and the identity function 
I(t) = t can be evaluated directly to produce (t, 1, 0, 
0). For geometric satellite system simulations, it is also 
convenient to provide direct evaluation of tuples for the 
motion models. For example, let r(t) be the position 
vector for a tracking station, as developed in equations 
(1)-(5). It is a simple matter to work out appropriate 
formulas for the first three derivatives of r(t), each of 
which is also a vector. This is included in the automatic 
differentiation system so that when a particular value of 
t is given, the motion model computes the 4-tuple $(r(t), r'(t), r''(t), r'''(t))$. A similar arrangement is made for
every moving object represented in the simulation, in- 
cluding satellites, tracking stations, ships, aircraft, and 
so on. 
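The directly evaluated tuples can be sketched as follows, with vectors as plain 3-tuples. The circular orbit $r(t) = R(\cos\omega t, \sin\omega t, 0)$ is a hypothetical stand-in for the motion models of equations (1)-(5). Its derivatives are worked out by hand once. This is exactly the information that the automatic differentiation system cannot generate on its own.

```python
import math

def constant(c):
    # The constant function has zero derivatives.
    return (c, 0.0, 0.0, 0.0)

def identity(t):
    # I(t) = t, so I'(t) = 1 and the higher derivatives vanish.
    return (t, 1.0, 0.0, 0.0)

def circular_orbit_jet(t, radius, omega):
    # Hypothetical motion model r(t) = R (cos wt, sin wt, 0) with hand-derived
    # derivatives: each one rotates the vector by 90 degrees and scales it by w.
    c, s = math.cos(omega * t), math.sin(omega * t)
    R, w = radius, omega
    return ((R * c, R * s, 0.0),
            (-R * w * s, R * w * c, 0.0),
            (-R * w**2 * c, -R * w**2 * s, 0.0),
            (R * w**3 * s, -R * w**3 * c, 0.0))
```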

Here is a simple example of how automatic differen- 
tiation is used. In the earlier discussion of optimization 
problems, there appeared the following equation: 


$$\cos\angle BAC = \frac{(b-a)\cdot(c-a)}{\|b-a\|\,\|c-a\|}.$$


Using automatic differentiation, a, b, and c would be 
4-tuples, each consisting of four vectors. These are pro- 
duced by the motion models for three satellites, as the 
values of position and its first three derivatives at a spe- 
cific time. The operations used in the equation, vector 
difference, dot product, and norm, as well as scalar mul- 
tiplication and division, are all special modified opera- 
tions that work directly on 4-tuples. The end result is 
also a 4-tuple, consisting of the cosine of angle BAC, 
as well as the first three derivatives of that function, all 
at the specified value of t. As a result, the programmer 
can obtain computed values for the derivatives of the 
function without explicitly coding equations for these 
derivatives. More generally, after defining appropriate 
4-tuples for all of the motion models, the programmer 
automatically obtains derivatives for any function that 
is defined by operating on the motion models, just by 
defining the operations. No explicit representation of 
the derivatives of the operations is needed. Some details 
of how the system works follow. 


Scalar Functions and Operations 


Consider first operations which apply to scalars. There 
are two basic types: binary operations (+, −, ×, ÷) and elementary functions (square root, exponential and logarithm, trigonometric functions, etc.). These operations
must be defined for the 4-tuples of the automatic dif- 
ferentiation system in such a way that derivatives are 
correctly propagated. 

The definition for multiplication will illustrate the 
general approach for binary operations. Suppose that 
(a, b, c, d) and (u, v, w, x) are two 4-tuples of scalars. They represent values of functions and their derivatives, say, (a, b, c, d) = (f(t), f'(t), f''(t), f'''(t)) and (u, v, w, x) = (g(t), g'(t), g''(t), g'''(t)). The product is supposed to give ((fg)(t), (fg)'(t), (fg)''(t), (fg)'''(t)). Each of these derivatives can be computed using the derivatives of f and g:

$$\begin{aligned}
(fg)(t) &= f(t)g(t),\\
(fg)'(t) &= f'(t)g(t) + f(t)g'(t),\\
(fg)''(t) &= f''(t)g(t) + 2f'(t)g'(t) + f(t)g''(t),\\
(fg)'''(t) &= f'''(t)g(t) + 3f''(t)g'(t) + 3f'(t)g''(t) + f(t)g'''(t).
\end{aligned}$$


On the right side of each equation, now substitute the 
entries of (a, b, c, d) and (u, v, w, x). 


$$\begin{aligned}
(fg)(t) &= au,\\
(fg)'(t) &= av + bu,\\
(fg)''(t) &= aw + 2bv + cu,\\
(fg)'''(t) &= ax + 3bw + 3cv + du.
\end{aligned}$$


This shows that 4-tuples must be multiplied according 
to the rule 


$$(a, b, c, d)(u, v, w, x) = (au,\ av + bu,\ aw + 2bv + cu,\ ax + 3bw + 3cv + du).$$
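The rule can be checked numerically with a small sketch. Multiplying the tuples for $t^2$ and $t^3$ at t = 2 should give the value and first three derivatives of $t^5$, namely $(32, 80, 160, 240)$. The function name `jet_mul` is illustrative.

```python
def jet_mul(p, q):
    # (a, b, c, d)(u, v, w, x) = (au, av + bu, aw + 2bv + cu, ax + 3bw + 3cv + du)
    a, b, c, d = p
    u, v, w, x = q
    return (a * u, a * v + b * u, a * w + 2 * b * v + c * u,
            a * x + 3 * b * w + 3 * c * v + d * u)

t = 2.0
f = (t**2, 2 * t, 2.0, 0.0)          # t^2 and its derivatives
g = (t**3, 3 * t**2, 6 * t, 6.0)     # t^3 and its derivatives
print(jet_mul(f, g))  # (32.0, 80.0, 160.0, 240.0)
```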


For addition, subtraction, and division a similar ap- 
proach can be used. All that is required is that succes- 
sive derivatives of the combination of f and g be ex- 
pressed in terms of the derivatives of f and g separately. 
Replacing these derivatives with the appropriate com- 
ponents of (a, b, c, d) and (u, v, w, x) produces the de- 
sired formula for operating on 4-tuples. 
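For division, one convenient route is to write $f = hg$ for the unknown quotient h and apply the product rule, instead of differentiating the quotient directly. Matching components gives one equation per derivative, and these can be solved for $h, h', h'', h'''$ in turn. Here is a sketch, with addition included for completeness. The function names are illustrative.

```python
def jet_add(p, q):
    # Sums differentiate term by term.
    return tuple(pi + qi for pi, qi in zip(p, q))

def jet_div(p, q):
    """Quotient h = f/g of 4-tuples p = (a, b, c, d) and q = (u, v, w, x), u != 0."""
    a, b, c, d = p
    u, v, w, x = q
    # From f = h g: a = h0 u, b = h1 u + h0 v, c = h2 u + 2 h1 v + h0 w,
    # d = h3 u + 3 h2 v + 3 h1 w + h0 x; solve for h0, h1, h2, h3 in turn.
    h0 = a / u
    h1 = (b - h0 * v) / u
    h2 = (c - 2 * h1 * v - h0 * w) / u
    h3 = (d - 3 * h2 * v - 3 * h1 * w - h0 * x) / u
    return (h0, h1, h2, h3)

# t^5 / t^3 at t = 2 gives the tuple for t^2: (4.0, 4.0, 2.0, 0.0)
print(jet_div((32.0, 80.0, 160.0, 240.0), (8.0, 12.0, 12.0, 6.0)))
```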
To define how an elementary function operates on a 4-tuple, a similar approach will work. Consider defining how a function h should apply to a 4-tuple (a, b, c, d) = (f(t), f'(t), f''(t), f'''(t)). This time, the desired end result should contain derivatives for the composite function $h \circ f$, and so should have the form $((h\circ f)(t), (h\circ f)'(t), (h\circ f)''(t), (h\circ f)'''(t))$. The derivative of $h \circ f$ is given by $h'(f(t))\,f'(t)$, which becomes $h'(a)\,b$ after substitution. Similar computations produce expressions for the second and third derivatives:


$$(h\circ f)''(t) = h''(f(t))\,f'(t)^2 + h'(f(t))\,f''(t) = h''(a)\,b^2 + h'(a)\,c$$

and

$$(h\circ f)'''(t) = h'''(f(t))\,f'(t)^3 + 3h''(f(t))\,f'(t)\,f''(t) + h'(f(t))\,f'''(t) = h'''(a)\,b^3 + 3h''(a)\,bc + h'(a)\,d.$$


These results lead to

$$h(a, b, c, d) = \bigl(h(a),\ h'(a)b,\ h''(a)b^2 + h'(a)c,\ h'''(a)b^3 + 3h''(a)bc + h'(a)d\bigr).$$

As an example of how this is applied, let $h(t) = e^t$. Then $h(a) = h'(a) = h''(a) = h'''(a) = e^a$, so

$$\exp(a, b, c, d) = (e^a,\ e^a b,\ e^a b^2 + e^a c,\ e^a b^3 + 3e^a bc + e^a d) = e^a(1,\ b,\ b^2 + c,\ b^3 + 3bc + d).$$


Other functions are a little more complicated, but the same approach applies.
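The general rule for elementary functions can be sketched as follows. The caller supplies h and its first three derivatives as ordinary functions, and `exp` and `sin` are shown as instances. The function names are illustrative.

```python
import math

def apply_elementary(h0, h1, h2, h3, jet):
    # h(a, b, c, d) = (h(a), h'(a) b, h''(a) b^2 + h'(a) c,
    #                  h'''(a) b^3 + 3 h''(a) b c + h'(a) d)
    a, b, c, d = jet
    H0, H1, H2, H3 = h0(a), h1(a), h2(a), h3(a)
    return (H0, H1 * b, H2 * b * b + H1 * c, H3 * b**3 + 3 * H2 * b * c + H1 * d)

def jet_exp(jet):
    return apply_elementary(math.exp, math.exp, math.exp, math.exp, jet)

def jet_sin(jet):
    # sin' = cos, sin'' = -sin, sin''' = -cos
    return apply_elementary(math.sin, math.cos,
                            lambda a: -math.sin(a), lambda a: -math.cos(a), jet)

W = (0.0, 1.0, 0.0, 0.0)   # identity tuple at t = 0
print(jet_exp(W))          # (1.0, 1.0, 1.0, 1.0)
print(jet_sin(W))          # value 0 and derivatives 1, 0, -1
```

Applied to a tuple such as $(\omega t, \omega, 0, 0)$ for a linearly increasing angle, `jet_sin` returns $\sin\omega t$ together with $\omega\cos\omega t$, $-\omega^2\sin\omega t$, and $-\omega^3\cos\omega t$.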

The preceding discussion indicates how operations 
on 4-tuples would be built into an automatic differenti- 
ation system. However, the user of such a system would 
simply apply the operations. So, if an appropriate definition has been provided for $\Omega(t)$ as discussed earlier, along with its derivatives, the program would compute a 4-tuple for $\Omega$ and its derivatives at a particular time. Say that this tuple is represented in the program by the variable W. If the program later includes the call sin(W), the result would be a 4-tuple with values for $\sin(\Omega(t))$ and its first three derivatives.


Vector Functions and Operations 


The approach for vector functions is basically the same 
as for scalar functions. The only modification that is 
needed is to recognize that the components of 4-tu- 
ples are now vectors. Because the rules for computing 
derivatives of vector operations are so similar to those 
for scalar operations, there is little difference in the ap- 
pearance of the definitions. For example, here is the def- 
inition for the dot product of two 4-tuples, whose com- 
ponents are vectors: 


$$(a, b, c, d)\cdot(u, v, w, x) = (a\cdot u,\ a\cdot v + b\cdot u,\ a\cdot w + 2b\cdot v + c\cdot u,\ a\cdot x + 3b\cdot w + 3c\cdot v + d\cdot u).$$

The formulation for the vector cross product is virtually identical, as is the product of a scalar 4-tuple with a vector 4-tuple. For the vector norm, simply define

$$\|(a, b, c, d)\| = \sqrt{(a, b, c, d)\cdot(a, b, c, d)}.$$


Since both dot product of vector 4-tuples and square- 
root of scalar 4-tuples have already been defined in 
the automatic differentiation system, this equation will 
propagate derivatives correctly. 
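The vector operations can be sketched as follows, with each 4-tuple holding four 3-vectors (plain Python tuples). `vcomb` forms weighted sums of vectors, so the dot, cross, and scalar-times-vector rules can be written term by term as in the formula above. The norm is defined exactly as in the text, through the scalar square root rule. All names are illustrative.

```python
import math

def vdot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def vcross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def vcomb(*terms):
    """Weighted sum of 3-vectors: vcomb((k1, v1), (k2, v2), ...)."""
    return tuple(sum(k * vec[i] for k, vec in terms) for i in range(3))

def jet_dot(A, B):
    a, b, c, d = A
    u, v, w, x = B
    return (vdot(a, u),
            vdot(a, v) + vdot(b, u),
            vdot(a, w) + 2 * vdot(b, v) + vdot(c, u),
            vdot(a, x) + 3 * vdot(b, w) + 3 * vdot(c, v) + vdot(d, u))

def jet_cross(A, B):
    a, b, c, d = A
    u, v, w, x = B
    return (vcross(a, u),
            vcomb((1, vcross(a, v)), (1, vcross(b, u))),
            vcomb((1, vcross(a, w)), (2, vcross(b, v)), (1, vcross(c, u))),
            vcomb((1, vcross(a, x)), (3, vcross(b, w)), (3, vcross(c, v)), (1, vcross(d, u))))

def jet_scale(S, V):
    """Scalar 4-tuple S times vector 4-tuple V."""
    a, b, c, d = S
    u, v, w, x = V
    return (vcomb((a, u)),
            vcomb((a, v), (b, u)),
            vcomb((a, w), (2 * b, v), (c, u)),
            vcomb((a, x), (3 * b, w), (3 * c, v), (d, u)))

def jet_sqrt(S):
    a, b, c, d = S
    s = math.sqrt(a)
    h1, h2, h3 = 1 / (2 * s), -1 / (4 * s**3), 3 / (8 * s**5)
    return (s, h1 * b, h2 * b * b + h1 * c, h3 * b**3 + 3 * h2 * b * c + h1 * d)

def jet_norm(A):
    # ||(a, b, c, d)|| = sqrt((a, b, c, d) . (a, b, c, d))
    return jet_sqrt(jet_dot(A, A))
```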

With a full complement of scalar and vector oper- 
ations provided by the automatic differentiation sys- 
tem, all of the geometric variables discussed in previ- 
ous examples can be included in a computer program, 
with derivatives generated automatically. As a particular case, reconsider the discussion earlier of computing elevation $\delta$ and compass angle $\alpha$ for a satellite as
viewed from a tracking station. Assuming that r and 
s have been defined as 4-tuples for the vector posi- 
tions of that station and satellite, the following fragment 
of pseudocode would carry out the computations de- 
scribed earlier: 


up    = r/norm(r)
pole  = (0, 0, 1)
east  = cross(pole, up)
east  = east/norm(east)
north = cross(up, east)
v     = (s - r)/norm(s - r)
vn    = dot(v, north)
ve    = dot(v, east)
vu    = dot(v, up)
delta = asin(vu)
alpha = atan2(ve, vn)
Executed in an automatic differentiation system, this code produces not just the instantaneous values of the angles $\alpha$ and $\delta$, but their first three derivatives, as
well. The programmer does not need to derive and code 
explicit equations for these derivatives, a huge savings 
in this problem. And all of the derivative information is 
useful. Recall that the first and second derivatives are of 
interest for their physical interpretations as angular ve- 
locities and accelerations. The third derivatives are used 
in finding the maximum values of the second deriva- 
tives (accelerations). 


Implementation Methods 


One of the simplest ways to implement automatic differentiation is to use a language like C++ that supports the definition of abstract data types and operator
overloading. Then the automatic differentiation system 
would be implemented as a series of data types and op- 
erations, and included as part of the code for a simula- 
tion. A discussion of one such implementation can be 
found in [7]. 

Another approach is to develop a preprocessor that 
automatically augments code with the steps needed to 
compute derivatives. With such a system, the program- 
mer develops code in a conventional language such as 
FORTRAN, with some additional features that con- 
trol the application of automatic differentiation. Next, 
this code is operated on by the preprocessor, produc- 
ing a modified program. That is then compiled and ex- 
ecuted in the usual way. Examples of this approach can 
be found in [5]. 


Summary 


Geometric models are very useful in representing the 
motions of satellites and terrestrial objects in simula- 
tions of satellite systems. These models are defined in 
terms of vector operations, which permit the conve- 
nient formulation of equations for geometric constructs 
such as distances and angles arising in the satellite sys- 
tem configuration. Equations which specify instanta- 
neous positions in space of moving objects are a funda- 
mental component of the geometric modeling frame- 
work. 

Optimization problems occur in this framework in 
two guises. First, there are problems in which the objective functions are directly defined as features of the geometric setting. An example of this would be to find
the minimum distance between two satellites. Second, 
measures of system performance are derived via sim- 
ulation as a function of design parameters, and these 
measures are optimized by varying the parameters. An 
example of this kind of problem would be to seek a par- 
ticular orbit geometry in order to maximize the total 
amount of time a satellite has available to communicate 
with a network of tracking stations. 

Automatic differentiation is a feature of an envi- 
ronment for implementing simulations as computer 
programs. In an automatic differentiation system, the 
equations which define values of variables automati- 
cally produce the values of the derivatives, as well. In 
the geometric models of satellite systems, derivatives of 
some variables are of intrinsic interest as velocities and 
accelerations. Derivatives are also useful in solving op- 
timization problems. 

Automatic differentiation can be provided by re- 
placing single operands with tuples, representing the 
operands and their derivatives. For some tuples, the 
derivatives must be explicitly provided. This is the case 
for the motion models. For tuples representing combi- 
nations of the motion models, the derivatives are gen- 
erated automatically. These combinations can be de- 
fined using any of the supported operations provided 
by the automatic differentiation system, typically in- 
cluding the operations of scalar and vector arithmetic, 
as well as scalar functions such as exponential, loga- 
rithmic, and trigonometric functions. Languages which 
support abstract data types and operator overloading 
are a convenient setting for implementing an automatic 
differentiation system. 
Introduction


Most numerical algorithms for analyzing or optimizing the performance of a nonlinear system require the partial derivatives of functions that describe a mathematical model of the system. Automatic differentiation (abbreviated as AD in the following), also called computational differentiation, is an efficient method for computing the numerical values of the derivatives. AD combines advantages of numerical computation and those of symbolic computation [2,4].
Consider a vector-valued function $f: \mathbb{R}^n \to \mathbb{R}^m$ of n variables,

$$y = f(x) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix}, \tag{1}$$

that is represented by a large program with hundreds or thousands of program statements. Before the advent of AD, computing the partial derivatives $\partial f_i/\partial x_j$ of such a function with conventional methods often ran into difficulties (as will be shown below). Now, one can successfully differentiate such programs with AD, deriving from the program for f another program that efficiently computes the numerical values of the partial derivatives.

AD is entirely different from the well-known numerical approximation with quotients of finite differences, or numerical differentiation. The quotients of finite differences, such as $(f(x+h) - f(x))/h$ and $(f(x+h) - f(x-h))/(2h)$, approximate the derivative $f'(x)$ with truncation errors of $O(h)$ and $O(h^2)$, respectively. However, better and better approximations cannot be obtained simply by shrinking h. Even if an appropriately small value of h is chosen, f cannot be evaluated when x + h lies outside its domain. Furthermore, rounding errors in the computed function values become a problem, because their difference is divided by the small quantity h.
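Both effects can be seen in a short experiment, sketched here with $f = \exp$ at x = 1, where the exact derivative is known. As h shrinks, the errors first fall roughly like h and $h^2$. They then grow again once rounding errors in $f(x+h) - f(x)$ dominate. The exact figures depend on the platform's floating-point arithmetic, so none are quoted here.

```python
import math

def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

exact = math.exp(1.0)  # d/dx exp(x) at x = 1
for h in (1e-2, 1e-4, 1e-6, 1e-8, 1e-10, 1e-12):
    fe = abs(forward_diff(math.exp, 1.0, h) - exact)
    ce = abs(central_diff(math.exp, 1.0, h) - exact)
    print(f"h={h:.0e}  forward error={fe:.1e}  central error={ce:.1e}")
```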
AD is also different from symbolic differentiation with a symbolic manipulator. Symbolic differentiation derives expressions for the partial derivatives rather than their values. The mathematical model of a large scale system may be described in thousands of program statements, so that it becomes very difficult to handle all of them with an existing symbolic manipulator. (There are a few manipulators combined with AD that can handle such large scale programs. They should be regarded as AD rather than as symbolic manipulators.)


Example 1 Program 1 computes an output value $y_1$ as a composite function $f_1$ for given input values $x_1 = 2$, $x_2 = 3$, $x_3 = 4$:

$$y_1 = f_1(x_1, x_2, x_3) = \frac{x_1(x_2 - x_3)}{\exp(x_1(x_2 - x_3)) + 1}. \tag{2}$$


IF (x2.le.x3) THEN
   y1 = x1*(x2 - x3)
ELSE
   y1 = x1*(x2 + x3)
ENDIF
y1 = y1/(exp(y1) + 1)

Program 1: Example
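For readers who want to run the example, a direct Python transcription of Program 1 is sketched below. The function name is illustrative. The transcription makes explicit that the branch taken depends on the inputs. As a result, a straight-line trace recorded for $x_1 = 2$, $x_2 = 3$, $x_3 = 4$ describes f only where $x_2 \le x_3$.

```python
import math

def program1(x1, x2, x3):
    # The IF branch is chosen at run time from the input values.
    if x2 <= x3:
        y1 = x1 * (x2 - x3)
    else:
        y1 = x1 * (x2 + x3)
    y1 = y1 / (math.exp(y1) + 1)
    return y1

print(program1(2.0, 3.0, 4.0))  # about -1.7616
```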


The execution of this program is traced by a se- 
quence of assignment statements (Program 2). 


y1 ← x1*(x2 - x3),
y1 ← y1/(exp(y1) + 1).

Program 2: Program 1 expanded to a straight-line program for the specified input values


A set of unary or binary arithmetic operators (+, —, 
*, /) and elementary transcendental functions (exp, log, 
sin, cos, ...) that may be used in the programs will be 
called basic operations. (Some special operations such 
as those generating ‘constant’ and ‘input’ are also to be 
counted among basic operations.) Program 2 can be ex- 
panded into a sequence of assignment statements each 
of whose right side has only one basic operation (Program 3), where $z_1, \ldots, z_s$ are temporary variables (s = 2 for this example).


1  z1 ← x2 - x3,
2  z1 ← x1 * z1,
3  z2 ← exp(z1),
4  z2 ← z2 + 1,
5  z1 ← z1/z2.

Program 3: Expanded history of execution with each line having only one basic operation


Moreover, it is useful to rewrite Program 3 into a se- 
quence of single assignment statements, in which each 
variable appears at most once on the left-hand sides (Program 4); hence, ‘←’ can be replaced by ‘=’.


1  v1 ← x2 - x3,
2  v2 ← x1 * v1,
3  v3 ← exp(v2),
4  v4 ← v3 + 1,
5  v5 ← v2/v4.

Program 4: Computational process


The sequence is called a computational process, 
where the additional variables $v_1, \ldots, v_5$ are called intermediate variables and hold the intermediate results.
A graph called a computational graph, G = (V, A), may 
be used to represent the process (see Fig. 1). 
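Below is a sketch of Program 4 in Python that returns every intermediate value. It comes with one possible encoding of the computational graph: a mapping from each node to the nodes it depends on, where each such pair is an arc of A. The dictionary layout is an assumption made for illustration.

```python
import math

def program4(x1, x2, x3):
    # Single-assignment form: every intermediate v_k is kept.
    v1 = x2 - x3
    v2 = x1 * v1
    v3 = math.exp(v2)
    v4 = v3 + 1
    v5 = v2 / v4
    return {"v1": v1, "v2": v2, "v3": v3, "v4": v4, "v5": v5}

# Computational graph G = (V, A): node -> nodes it depends on.
GRAPH = {"v1": ("x2", "x3"), "v2": ("x1", "v1"), "v3": ("v2",),
         "v4": ("v3",), "v5": ("v2", "v4")}
```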


Algorithms 


There are two modes for AD algorithms: the forward mode and the reverse mode. The forward mode computes $\partial y_i/\partial x_j$ $(i = 1, \ldots, m)$ for a fixed j, whereas the reverse mode computes $\partial y_i/\partial x_j$ $(j = 1, \ldots, n)$ for a fixed i.

The forward mode corresponds to tracing an ex- 
panded program such as Program 3 in the natural or- 
der. Assume that execution of the kth assignment in the 
program is represented as 


$$z_c \leftarrow \psi_k(z_a, z_b). \tag{3}$$


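When the values of both $\partial z_a/\partial x_j$ and $\partial z_b/\partial x_j$ are known, $\partial z_c/\partial x_j$ can be computed by applying the chain rule of differentiation to (3):

$$\frac{\partial z_c}{\partial x_j} = \frac{\partial \psi_k}{\partial z_a}\frac{\partial z_a}{\partial x_j} + \frac{\partial \psi_k}{\partial z_b}\frac{\partial z_b}{\partial x_j}. \tag{4}$$

Fig. 1: Computational graph

Here $\partial\psi_k/\partial z_a$ and $\partial\psi_k/\partial z_b$ are called elementary partial derivatives, and are computed by Table 1 for various $\psi_k$.

Introducing new variables $\dot z_1, \ldots, \dot z_s, \dot x_1, \ldots, \dot x_n$ corresponding to $\partial z_1/\partial x_j, \ldots, \partial z_s/\partial x_j, \partial x_1/\partial x_j, \ldots, \partial x_n/\partial x_j$, respectively, and initializing $\dot x_k \leftarrow 0$ $(1 \le k \le n,\ k \ne j)$ and $\dot x_j \leftarrow 1$, we may express (4) as

$$\dot z_c \leftarrow \frac{\partial \psi_k}{\partial z_a}\dot z_a + \frac{\partial \psi_k}{\partial z_b}\dot z_b. \tag{5}$$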


Thus, we can write down the whole program for the for- 
ward mode as shown in Program 5. 
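A Python transcription of Program 5 is sketched below. `dz1`, `dz2`, `dx1`, and so on hold the dotted variables, and the names are illustrative. Step 2' must run before step 2 because step 2 overwrites $z_1$. Step 5' runs after step 5 and therefore uses the new $z_1 = z_1/z_2$ in its second term. One call yields the derivative of $y_1$ with respect to a single $x_j$.

```python
import math

def forward_mode(x1, x2, x3, j):
    """Program 5: value of y1 and its derivative with respect to x_j (j = 1, 2, 3)."""
    # Initialization: dx_j = 1 and dx_k = 0 for k != j.
    dx1, dx2, dx3 = (1.0 if k == j else 0.0 for k in (1, 2, 3))
    z1 = x2 - x3                            # 1
    dz1 = 1 * dx2 - 1 * dx3                 # 1'
    dz1 = z1 * dx1 + x1 * dz1               # 2' (before 2, which overwrites z1)
    z1 = x1 * z1                            # 2
    z2 = math.exp(z1)                       # 3
    dz2 = z2 * dz1                          # 3'
    z2 = z2 + 1                             # 4
    dz2 = 1 * dz2                           # 4'
    z1 = z1 / z2                            # 5
    dz1 = (1 / z2) * dz1 - (z1 / z2) * dz2  # 5' (uses the new z1)
    return z1, dz1

gradient = [forward_mode(2.0, 3.0, 4.0, j)[1] for j in (1, 2, 3)]  # three sweeps
```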

The reverse mode corresponds to tracing a com- 
putational process such as Program 4 backwards. The 
kth computational step, i. e., execution of the kth assign- 
ment in the program, can be written in general as 


$$v_k = \psi_k(u_{k,1}, u_{k,2}), \qquad u_{k,1} = v_{\alpha_k},\ u_{k,2} = v_{\beta_k}, \tag{6}$$


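Automatic Differentiation: Introduction, History and Rounding Error Estimation, Table 1
Elementary partial derivatives

z_c = ψ(z_a, z_b)   ∂ψ/∂z_a                         ∂ψ/∂z_b
z_a ± z_b           1                               ±1
z_a * z_b           z_b                             z_a
z_a / z_b           1/z_b                           -z_a/z_b^2 (= -z_c/z_b)
sqrt(z_a)           1/(2 sqrt(z_a)) (= 1/(2 z_c))
log(z_a)            1/z_a
exp(z_a)            exp(z_a) (= z_c)
sin(z_a)            cos(z_a)
cos(z_a)            -sin(z_a)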


Initialization:
   ẋj ← 1,
   ẋk ← 0 (1 ≤ k ≤ n, k ≠ j),
Forward algorithm:
1  | z1 ← x2 - x3,
1' | ż1 ← 1 * ẋ2 - 1 * ẋ3,
2' | ż1 ← z1 * ẋ1 + x1 * ż1,
2  | z1 ← x1 * z1,
3  | z2 ← exp(z1),
3' | ż2 ← z2 * ż1,
4  | z2 ← z2 + 1,
4' | ż2 ← 1 * ż2,
5  | z1 ← z1/z2,
5' | ż1 ← (1/z2) * ż1 - (z1/z2) * ż2.

Program 5: Forward mode program for differentiation


where $u_{k,1}$ and $u_{k,2}$ are formal parameters, and $v_{\alpha_k}$ and $v_{\beta_k}$ are real parameters representing some of $x_1, \ldots, x_n, v_1, \ldots, v_{k-1}$. If $\psi_k$ is unary, $u_{k,2}$ and $v_{\beta_k}$ are omitted. Let r be the total number of computational steps. In Program 4, we have r = 5 and, for k = 2, e.g., $\psi_2$ = ‘*’, $v_{\alpha_2} = x_1$, and $v_{\beta_2} = v_1$.

The total differentiation of (6) yields the following relations among $dx_1, \ldots, dx_n, dv_1, \ldots, dv_r$:

$$dv_k = \frac{\partial \psi_k}{\partial u_{k,1}}\,dv_{\alpha_k} + \frac{\partial \psi_k}{\partial u_{k,2}}\,dv_{\beta_k} \qquad (k = 1, \ldots, r). \tag{7}$$


The computation of the partial derivatives of the ith component of the final result $y_i = f_i(x_1, \ldots, x_n)$ in (1) with respect to $x_1, \ldots, x_n$ amounts to computing the coefficients of the relation among $dx_1, \ldots, dx_n$ and $dy_i$.

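Here, new variables $\bar{x}_1, \dots, \bar{x}_n, \bar{v}_1, \dots, \bar{v}_r$ are introduced for the computation of those coefficients. Without loss of generality, we may assume that the value of $y_i$ is computed at $v_r$. After Program 4 is executed in the natural order with all the information on intermediate results preserved, these new variables are initialized as $\bar{x}_j \leftarrow 0$ ($j = 1, \dots, n$), $\bar{v}_k \leftarrow 0$ ($k = 1, \dots, r-1$) and $\bar{v}_r \leftarrow 1$; then the relation

$$dy_i = \sum_{j=1}^{n} \bar{x}_j\, dx_j + \sum_{k=1}^{r} \bar{v}_k\, dv_k \tag{8}$$

holds. Next, $dv_r, dv_{r-1}, \dots, dv_1$ can be eliminated from (8) in this order: substituting (7) for $dv_k$ in (8) amounts to the modifications

$$\bar{v}_{\alpha_k} \leftarrow \bar{v}_{\alpha_k} + \bar{v}_k \frac{\partial \psi_k}{\partial v_{\alpha_k}}, \tag{9}$$

$$\bar{v}_{\beta_k} \leftarrow \bar{v}_{\beta_k} + \bar{v}_k \frac{\partial \psi_k}{\partial v_{\beta_k}}, \tag{10}$$

where $\bar{v}_{\alpha_k}$ (or $\bar{v}_{\beta_k}$) stands for $\bar{x}_j$ when the argument is an input $x_j$. Letting $k$ run in the reverse order, i.e., $k = r, r-1, \dots, 1$, we can thus eliminate all the $dv_k$ ($k = 1, \dots, r$) and obtain

$$dy_i = \sum_{j=1}^{n} \bar{x}_j\, dx_j. \tag{11}$$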

The final coefficient $\bar{x}_j$ gives the value of $\partial f_i/\partial x_j$ ($j = 1, \dots, n$). Program 6, in which modifications (9) and (10) are embedded, is the reverse mode program, sometimes called the adjoint program of Program 4.
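Program 6 can likewise be sketched in C++. The function below (a hypothetical name, reverse_mode) keeps the forward-sweep values, initializes $\bar{v}_5 = 1$ and all other adjoints to 0, and applies (9) and (10) for $k = 5, \dots, 1$ in the order of the statements 5'' to 1'' of Program 6.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct ReverseResult {
  double value;                // v5 = f(x)
  std::array<double, 3> grad;  // xbar_j = df/dx_j
};

ReverseResult reverse_mode(const std::array<double, 3>& x) {
  // Forward sweep (Program 4); the intermediate values are kept.
  const double v1 = x[1] - x[2];
  const double v2 = x[0] * v1;
  const double v3 = std::exp(v2);
  const double v4 = v3 + 1.0;
  const double v5 = v2 / v4;

  // Initialization: all adjoints 0 except vbar5 = 1.
  std::array<double, 3> xbar = {0.0, 0.0, 0.0};
  double vbar1 = 0.0, vbar2 = 0.0, vbar3 = 0.0, vbar4 = 0.0;
  const double vbar5 = 1.0;

  // Reverse elimination, modifications (9) and (10).
  vbar2 += (1.0 / v4) * vbar5;  // 5''
  vbar4 += (-v5 / v4) * vbar5;
  vbar3 += 1.0 * vbar4;         // 4''
  vbar2 += v3 * vbar3;          // 3''
  xbar[0] += v1 * vbar2;        // 2''
  vbar1 += x[0] * vbar2;
  xbar[1] += 1.0 * vbar1;       // 1''
  xbar[2] += (-1.0) * vbar1;
  return {v5, xbar};
}
```

At $x = (1, 2, 1)$ one reverse sweep yields all three partial derivatives, $(g, g, -g)$ with $g = 1/(e+1)^2$.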

It is easy to extend the algorithms to compute
a linear combination of the column vectors of the Jaco- 
bian matrix J with the forward mode, and a linear com- 
bination of the row vectors of J with the reverse mode. 
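To illustrate these linear combinations, the sketch below uses a hypothetical two-output function $F(x_1, x_2) = (x_1 x_2, \sin x_1 + x_2)$ with Jacobian $J = \begin{pmatrix} x_2 & x_1 \\ \cos x_1 & 1 \end{pmatrix}$. Seeding the forward mode with $\dot{x} = u$ yields $Ju$; seeding the reverse mode with weights $w$ on the two outputs, instead of a single $\bar{v}_r = 1$, yields $w^{\mathsf T} J$. Function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Code list: v1 = x1*x2, v2 = sin(x1), v3 = v2 + x2; outputs y1 = v1, y2 = v3.
using Vec2 = std::array<double, 2>;

// Forward mode seeded with xdot = u: a linear combination of J's columns.
Vec2 jacobian_times(const Vec2& x, const Vec2& u) {
  const double v1dot = x[1] * u[0] + x[0] * u[1];  // d(x1*x2)
  const double v2dot = std::cos(x[0]) * u[0];      // d(sin x1)
  const double v3dot = v2dot + u[1];               // d(v2 + x2)
  return {v1dot, v3dot};
}

// Reverse mode seeded with (vbar1, vbar3) = w: a combination of J's rows.
Vec2 transpose_times(const Vec2& x, const Vec2& w) {
  const double vbar1 = w[0];
  const double vbar3 = w[1];
  double vbar2 = 0.0;
  Vec2 xbar = {0.0, 0.0};
  vbar2 += 1.0 * vbar3;               // step 3: v3 = v2 + x2
  xbar[1] += 1.0 * vbar3;
  xbar[0] += std::cos(x[0]) * vbar2;  // step 2: v2 = sin(x1)
  xbar[0] += x[1] * vbar1;            // step 1: v1 = x1*x2
  xbar[1] += x[0] * vbar1;
  return xbar;
}
```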


Complexity 


It is proved that, for a constant $C$ (about 4 to 6, depending on the computational model), the total operation count for the $\partial y_i/\partial x_j$'s with a fixed $j$ in the forward mode algorithm, as well as that for the $\partial y_i/\partial x_j$'s with a fixed $i$ in the reverse mode algorithm, is at most $C \cdot r$, i.e., in $O(r)$. Roughly speaking, $r$ is proportional to the execution time $T$ of the given program, so the time complexity is in $O(T)$. To get all the required partial derivatives, such a computation has to be repeated $n$ times in the forward mode and $m$ times in the reverse mode. Note that the computational time of the forward or reverse mode algorithm for one set of derivatives does not depend on $m$ or $n$, but only on $r$.

Forward sweep:
(insert Program 4 here)
Initialization: ($n = 3$, $r = 5$)
$\bar{x}_j \leftarrow 0$ ($j = 1, \dots, n$),
$\bar{v}_k \leftarrow 0$ ($k = 1, \dots, r-1$),
$\bar{v}_r \leftarrow 1$.
Reverse elimination:
5'' | $\bar{v}_2 \leftarrow \bar{v}_2 + (1/v_4) * \bar{v}_5$,
    | $\bar{v}_4 \leftarrow \bar{v}_4 + (-v_5/v_4) * \bar{v}_5$,
4'' | $\bar{v}_3 \leftarrow \bar{v}_3 + 1 * \bar{v}_4$,
3'' | $\bar{v}_2 \leftarrow \bar{v}_2 + v_3 * \bar{v}_3$,
2'' | $\bar{x}_1 \leftarrow \bar{x}_1 + v_1 * \bar{v}_2$,
    | $\bar{v}_1 \leftarrow \bar{v}_1 + x_1 * \bar{v}_2$,
1'' | $\bar{x}_2 \leftarrow \bar{x}_2 + 1 * \bar{v}_1$,
    | $\bar{x}_3 \leftarrow \bar{x}_3 + (-1) * \bar{v}_1$.
Program 6: Reverse mode program

Denoting the spatial complexity of the original pro- 
gram by S, that of the forward mode algorithm is in 
O(S). However, the spatial complexity of the reverse 
mode is in O(T), since the reverse mode requires a his- 
tory of the forward sweep recorded in storage whose 
size is in O(T). 

A rough sketch of the proof is as follows. Without loss of generality, assume that the given program is expanded into a sequence of single assignment statements, each with a binary or unary basic operation, as shown in Programs 3 and 4. The operation count for computing the elementary partial derivatives (Table 1) is bounded by a constant. The additional operation count for modifying the derivative variables in (5), (9) and (10) is also bounded, since each step involves at most two additions and two multiplications. There are $r$ operations in the original program, so the total operation count in the forward mode algorithm, as well as that in the reverse mode algorithm, is in $O(r)$.
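The counting argument can be made concrete with a minimal tape, i.e., the recorded history of the forward sweep that the reverse mode needs. In the sketch below (hypothetical names Tape, TapeEntry and record_example; it does not reproduce any particular AD tool), every executed basic operation stores its argument indices and elementary partial derivatives, and the reverse sweep performs at most two multiplications and two additions per entry while counting them.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One entry per variable: argument indices and elementary partials of psi_k.
struct TapeEntry {
  int a, b;       // argument indices, -1 if absent (inputs have none)
  double da, db;  // d(psi_k)/d(v_a) and d(psi_k)/d(v_b)
};

// The tape grows with the number of executed operations: O(T) storage.
struct Tape {
  std::vector<double> val;
  std::vector<TapeEntry> ops;

  int push(double v, TapeEntry e) {
    val.push_back(v);
    ops.push_back(e);
    return static_cast<int>(val.size()) - 1;
  }
  int input(double x) { return push(x, {-1, -1, 0.0, 0.0}); }
  int unary(double v, int a, double da) { return push(v, {a, -1, da, 0.0}); }
  int binary(double v, int a, double da, int b, double db) {
    return push(v, {a, b, da, db});
  }

  // Reverse sweep applying (9) and (10) for k = r, ..., 1; 'flops' counts
  // multiplications plus additions (at most four per entry).
  std::vector<double> adjoints(int out, long& flops) const {
    std::vector<double> bar(val.size(), 0.0);
    bar[out] = 1.0;
    for (int k = static_cast<int>(ops.size()) - 1; k >= 0; --k) {
      const TapeEntry& e = ops[k];
      if (e.a >= 0) { bar[e.a] += e.da * bar[k]; flops += 2; }
      if (e.b >= 0) { bar[e.b] += e.db * bar[k]; flops += 2; }
    }
    return bar;
  }
};

// Records Program 4 on the tape and returns the index of v5.
int record_example(Tape& t, double x1, double x2, double x3) {
  const int i1 = t.input(x1), i2 = t.input(x2), i3 = t.input(x3);
  const int k1 = t.binary(x2 - x3, i2, 1.0, i3, -1.0);
  const double v1 = t.val[k1];
  const int k2 = t.binary(x1 * v1, i1, v1, k1, x1);
  const double v3 = std::exp(t.val[k2]);
  const int k3 = t.unary(v3, k2, v3);
  const int k4 = t.unary(v3 + 1.0, k3, 1.0);
  const double v5 = t.val[k2] / t.val[k4];
  return t.binary(v5, k2, 1.0 / t.val[k4], k4, -v5 / t.val[k4]);
}
```

For the example program ($r = 5$) the reverse sweep performs 16 operations, within the bound $4r = 20$, and the tape holds one entry per executed operation.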

Note that the computational complexities of the for- 
ward mode and the reverse mode may not be optimal, 
but at least one can compute them in time proportional 
to that for the computation of the given original program.

One can extend the AD algorithms to compute higher derivatives. In particular, it is well known how to compute a truncated Taylor series to obtain derivatives of arbitrarily high order of a function of one variable [14]. One may regard a special function, such as a Bessel function, or a block of several arithmetic operations, such as the inner product of vectors, as a basic operation if the corresponding elementary partial derivatives are given by computational definitions. An analogy is pointed out in [7] between the algorithms for the partial derivatives and those for computing shortest paths in an acyclic graph.
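For a function of one variable, the truncated Taylor series can be propagated with the standard coefficient recurrences. The sketch below shows only multiplication (a Cauchy product) and exp, whose recurrence follows from $w' = u'w$; the function names are hypothetical, and the $k$th derivative is recovered as $k!$ times the $k$th coefficient.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// c[k] is the coefficient of t^k in u(x0 + t), so u^(k)(x0) = k! * c[k].
using Taylor = std::vector<double>;

// The independent variable x = x0 + t, truncated at degree d.
Taylor variable(double x0, std::size_t d) {
  Taylor c(d + 1, 0.0);
  c[0] = x0;
  if (d >= 1) c[1] = 1.0;
  return c;
}

// Product: w_k = sum_{j=0}^{k} u_j * v_{k-j}.
Taylor mul(const Taylor& u, const Taylor& v) {
  assert(u.size() == v.size());
  Taylor w(u.size(), 0.0);
  for (std::size_t k = 0; k < u.size(); ++k)
    for (std::size_t j = 0; j <= k; ++j) w[k] += u[j] * v[k - j];
  return w;
}

// Exponential: w = exp(u) satisfies w' = u' * w, hence
// w_0 = exp(u_0) and k * w_k = sum_{j=1}^{k} j * u_j * w_{k-j}.
Taylor texp(const Taylor& u) {
  Taylor w(u.size(), 0.0);
  w[0] = std::exp(u[0]);
  for (std::size_t k = 1; k < u.size(); ++k) {
    for (std::size_t j = 1; j <= k; ++j) w[k] += j * u[j] * w[k - j];
    w[k] /= k;
  }
  return w;
}

// k-th derivative from the k-th Taylor coefficient.
double derivative(const Taylor& c, std::size_t k) {
  double factorial = 1.0;
  for (std::size_t i = 2; i <= k; ++i) factorial *= i;
  return factorial * c[k];
}
```

For $f(x) = x e^x$ at $x_0 = 0$ one has $f^{(k)}(0) = k$, which the sketch reproduces for $k = 3$ and $k = 5$.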

It has also been pointed out that programs derived by AD may contain pitfalls. For example, the tricky program

      IF (x .NE. 1.0) THEN
         y = x*x
      ELSE
         y = 1.0 + (x - 1.0)*b
      ENDIF

computes the value of the function $f(x) = x^2$ correctly for all $x$. However, the derived program fails to compute $f'(1.0)$, because the derivative of the second assignment with respect to $x$ is not 2.0 but $b$. Thus conditional branches (or constructs equivalent to conditional branches) should be dealt with carefully.
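The pitfall can be reproduced with a minimal forward-mode dual-number type. In this sketch Dual and tricky are hypothetical names, and $b = 5$ in the test is an arbitrary value chosen for illustration.

```cpp
#include <cassert>

// Minimal forward-mode number: val + der * epsilon, with epsilon^2 = 0.
struct Dual {
  double val, der;
};
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }
Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Transcription of the tricky program; b is a passive parameter.
// The branch tests only the value of x, so AD differentiates whichever
// assignment is actually executed.
Dual tricky(Dual x, double b) {
  const Dual one{1.0, 0.0};
  if (x.val != 1.0) return x * x;
  return one + (x - one) * Dual{b, 0.0};
}
```

At $x = 1$ the value is correct (1.0) but the derivative comes out as $b$ instead of 2; at any other point, e.g. $x = 3$, both value and derivative are correct.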


History 


A brief history of AD is as follows. Quite a few researchers around the world proposed essentially the same algorithms more or less independently.

The first publication on the forward mode algorithm was presumably the paper by R.E. Wengert in 1964 [16]. Some 15 years later, books by L.B. Rall [14] and by H. Kagiwada et al. [9] were published; they have been influential in the numerical computation community. A well-known practical software system for forward mode automatic differentiation was Pascal-SC, and its descendants Pascal-XSC and C-XSC are now popular.

The paper [13] might be the first to propose the reverse mode algorithm systematically. But there are many ways to approach the reverse mode algorithm. In fact, it is related to Lagrange multipliers, error analysis, generation of adjoint systems, reduction of the computational complexity of computing the gradient, neural networks, etc. Of course, the principles of the derived algorithms are the same. Notable work on the reverse mode algorithm was done by S. Linnainmaa [11] and W. Miller and C. Wrathall [12] from the viewpoint of error analysis, by W. Baur and V. Strassen [1] from that of complexity, and by P.J. Werbos [17] from that of the optimization of neural networks. A practical program was developed by B. Speelpenning in 1980 [15] and rewritten in Fortran by K.E. Hillstrom in 1985 (now registered in Netlib [5,6]).

The proceedings of two international workshops, held in 1991 and 1996, collect theories, techniques, practical programs, current work, and open problems, as well as the history of automatic differentiation [2,4]. It should be noted that, in 1992, A. Griewank proposed a drastic improvement of the reverse mode algorithm using the so-called checkpointing technique. He succeeded in reducing the order of the storage required for the reverse mode algorithm [3]. Several software tools for automatic differentiation have been developed and are widely used, e.g., ADIC, ADIFOR, ADMIT-1, ADOL-C, ADOL-F, FADBAD, GRESS, Odyssée, PADRE2, TAMC, etc. (See [2,4].)


Estimates of Rounding Errors 


In order to solve practical real-world problems, approximation by floating-point numbers is inevitable, so it is important to analyze and estimate the rounding errors accumulated in a large numerical computation. Moreover, in terms of estimates of the accumulated rounding errors, one can define a normalized (or weighted) norm for a numerically computed vector, which is useful for checking whether the computed vector can be regarded as zero from the viewpoint of numerical computation [8].

For the previous example, let $\delta_k$ denote the rounding error generated when the basic operation computing the value of $v_k$ is executed. Then the computation including rounding errors can be written explicitly as

$$\begin{aligned}
\tilde{v}_1 &= x_2 - x_3 + \delta_1,\\
\tilde{v}_2 &= x_1 \tilde{v}_1 + \delta_2,\\
\tilde{v}_3 &= \exp(\tilde{v}_2) + \delta_3,\\
\tilde{v}_4 &= \tilde{v}_3 + 1 + \delta_4,\\
\tilde{v}_5 &= \tilde{v}_2 / \tilde{v}_4 + \delta_5.
\end{aligned}$$

Here, $\tilde{v}_k$ is the value with accumulated rounding errors. Defining a function $f$ as

$$f(x_1, x_2, x_3; \delta_1, \dots, \delta_5) = \frac{x_1(x_2 - x_3 + \delta_1) + \delta_2}{\exp\bigl(x_1(x_2 - x_3 + \delta_1) + \delta_2\bigr) + \delta_3 + 1 + \delta_4} + \delta_5,$$

one has

$$\tilde{v}_5 = f(x_1, x_2, x_3; \delta_1, \dots, \delta_5), \qquad v_5 = f(x_1, x_2, x_3; 0, \dots, 0).$$


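Here, $\tilde{v}_5 - v_5$ is the accumulated rounding error in the function value. For $v_5 = v_2/v_4 = \psi_5(v_2, v_4)$, the mean value theorem gives

$$\tilde{v}_5 - v_5 = \psi_5(\tilde{v}_2, \tilde{v}_4) - \psi_5(v_2, v_4) + \delta_5 = \frac{\partial \psi_5}{\partial v_2}(\xi_2, \xi_4)\,(\tilde{v}_2 - v_2) + \frac{\partial \psi_5}{\partial v_4}(\xi_2, \xi_4)\,(\tilde{v}_4 - v_4) + \delta_5,$$

where $\xi_2 = \theta'\tilde{v}_2 + (1-\theta')v_2$ and $\xi_4 = \theta''\tilde{v}_4 + (1-\theta'')v_4$ for some $0 < \theta', \theta'' < 1$. Expanding $\tilde{v}_2 - v_2$ and $\tilde{v}_4 - v_4$ similarly, and then the other intermediate variables in turn, one derives [10] the approximation

$$\tilde{v}_r - v_r \approx \sum_{k=1}^{r} \frac{\partial f}{\partial \delta_k}\,\delta_k, \tag{12}$$

where the partial derivatives are evaluated at $\delta_1 = \dots = \delta_r = 0$. Note that the $\partial f/\partial \delta_k$ are computed as the $\bar{v}_k$ in Program 6, i.e., as the final results of (9) and (10).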
The locally generated rounding error $\delta_k$ in a floating-point number system is bounded by

$$|\delta_k| \le c \cdot |v_k| \cdot \varepsilon_M, \tag{13}$$

where $\varepsilon_M$ denotes the so-called machine epsilon, and $c = 1$ may be adopted for arithmetic operations according to the IEEE 754 standard. Then $A[f]_A$, called the absolute estimate, is defined by

$$A[f]_A = \sum_{k=1}^{r} \left|\frac{\partial f}{\partial \delta_k}\right| \cdot |v_k| \cdot \varepsilon_M, \tag{14}$$


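which, to first order, is an upper bound on the accumulated rounding error. Regarding the locally generated errors $\delta_k$ as independent pseudo-probabilistic variables uniformly distributed over $[-|v_k|\varepsilon_M, |v_k|\varepsilon_M]$, $A[f]_P$, called the probabilistic estimate, is defined by

$$A[f]_P = \varepsilon_M \sqrt{\frac{1}{3} \sum_{k=1}^{r} \left(\frac{\partial f}{\partial \delta_k}\right)^2 v_k^2}. \tag{15}$$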


There are several reports in which these estimates give 
quite good approximations to the actual accumulated 
rounding errors [8]. 
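Estimates (14) and (15) need only the values $v_k$ from the forward sweep and the adjoints $\bar{v}_k = \partial f/\partial \delta_k$ from the reverse sweep of Program 6. The sketch below computes both for the example function, assuming $c = 1$ and taking $\varepsilon_M$ to be std::numeric_limits<double>::epsilon(); which value counts as machine epsilon is a convention, so treat that constant as an assumption. The factor 1/3 is the variance of a uniform distribution on $[-1, 1]$, and the name estimate_rounding is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

struct RoundingEstimates {
  double value;          // computed v5 = f(x)
  double absolute;       // A[f]_A, eq. (14)
  double probabilistic;  // A[f]_P, eq. (15)
};

// Assumes c = 1 in (13) and eps_M = double's numeric_limits epsilon.
RoundingEstimates estimate_rounding(double x1, double x2, double x3) {
  const double eps = std::numeric_limits<double>::epsilon();
  // Forward sweep: v[k-1] holds v_k.
  std::array<double, 5> v;
  v[0] = x2 - x3;
  v[1] = x1 * v[0];
  v[2] = std::exp(v[1]);
  v[3] = v[2] + 1.0;
  v[4] = v[1] / v[3];
  // Reverse sweep of Program 6: vbar[k-1] = df/d(delta_k).
  std::array<double, 5> vbar = {0.0, 0.0, 0.0, 0.0, 1.0};
  vbar[1] += (1.0 / v[3]) * vbar[4];
  vbar[3] += (-v[4] / v[3]) * vbar[4];
  vbar[2] += 1.0 * vbar[3];
  vbar[1] += v[2] * vbar[2];
  vbar[0] += x1 * vbar[1];
  double sum_abs = 0.0, sum_sq = 0.0;
  for (int k = 0; k < 5; ++k) {
    const double t = std::fabs(vbar[k] * v[k]);  // |df/d(delta_k)| * |v_k|
    sum_abs += t;
    sum_sq += t * t;
  }
  return {v[4], eps * sum_abs, eps * std::sqrt(sum_sq / 3.0)};
}
```

Since $\sqrt{\tfrac{1}{3}\sum t_k^2} \le \sum t_k$, the probabilistic estimate never exceeds the absolute one.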

Moreover, these estimates answer the question of how to choose a norm for measuring the size of a numerically computed vector. By means of the rounding error estimates, a weighted norm of a vector $f = [f_1, \dots, f_m]$ whose components are numerically computed is defined by

$$\|f\|_N = \left\| \left[ \frac{f_1}{A[f_1]}, \dots, \frac{f_m}{A[f_m]} \right] \right\|_p \quad (p = 1, 2 \text{ or } \infty), \tag{16}$$

where $A[f_i]$ is an estimate, (14) or (15), of the rounding error accumulated in $f_i$. This weighted norm is called the normalized norm, because it is normalized with respect to the accumulated rounding errors. With this normalized norm, one can determine whether a computed vector approaches zero or not relative to the rounding errors accumulated in its components. Note that, since all the components of the vector are divided by the estimates of their accumulated rounding errors, they have no physical dimension. The normalized norm may be used effectively as a stopping criterion for iterative methods such as the Newton-Raphson method.
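A sketch of the normalized norm (16) and its use as a stopping test follows. The threshold 1 (stop once every residual component is no larger than its estimated rounding error) is an illustrative assumption, not a rule from the text, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class PNorm { One, Two, Infinity };

// Normalized norm (16): the p-norm of (f_1/A[f_1], ..., f_m/A[f_m]),
// where err[i] holds a rounding-error estimate A[f_i] such as (14) or (15).
double normalized_norm(const std::vector<double>& f,
                       const std::vector<double>& err, PNorm p) {
  assert(f.size() == err.size());
  double acc = 0.0;
  for (std::size_t i = 0; i < f.size(); ++i) {
    const double r = std::fabs(f[i] / err[i]);  // dimensionless component
    switch (p) {
      case PNorm::One: acc += r; break;
      case PNorm::Two: acc += r * r; break;
      case PNorm::Infinity: acc = std::max(acc, r); break;
    }
  }
  return p == PNorm::Two ? std::sqrt(acc) : acc;
}

// Hypothetical stopping test for an iteration such as Newton-Raphson.
bool residual_is_numerically_zero(const std::vector<double>& residual,
                                  const std::vector<double>& err) {
  return normalized_norm(residual, err, PNorm::Infinity) <= 1.0;
}
```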
Research in the field of automatic differentiation (AD)
has blossomed since A. Griewank’s paper [15] in 1989 
and the Breckenridge conference [17] in 1991. During 
that same period, the power and availability of parallel 
machines have increased dramatically. A natural con- 
sequence of these developments has been research on 
the interplay between AD and parallel computations. 
This relationship can take one of two forms. One can 
examine how AD can be applied to existing parallel 
programs. Alternatively, one can consider how AD in- 
troduces new potential for parallelism into existing se- 
quential programs. 


Background 


Automatic differentiation relies upon the fact that all programming languages are based on a finite number of elementary functions. By providing rules for the differentiation of these elementary functions, and by combining these elementary derivatives according to the chain rule of differential calculus, an AD system can differentiate arbitrarily complex functions. The chain rule is associative: partial derivatives can be combined in any order. The forward mode of AD combines the partial derivatives in the order of evaluation of the elementary functions to which they correspond. The reverse mode combines them in the reverse order. For systems with a large ratio of independent to dependent variables, the reverse mode offers lower operation counts, at the cost of increased storage [15].
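Because the chain rule is associative, the Jacobian $J = J_3 J_2 J_1$ of a three-phase computation can be accumulated in the forward order $J_3 (J_2 J_1)$ or in the reverse order $(J_3 J_2) J_1$. The sketch below counts scalar multiplications for both orders using dense matrices; the sizes (100 independent variables, two layers of 10 intermediates, 1 dependent variable) are arbitrary illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix; illustrative only.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> a;
  Matrix(std::size_t r, std::size_t c, double fill = 0.0)
      : rows(r), cols(c), a(r * c, fill) {}
  double& at(std::size_t i, std::size_t j) { return a[i * cols + j]; }
  double at(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

// C = A * B; adds the number of scalar multiplications to 'mults'.
Matrix multiply(const Matrix& A, const Matrix& B, long& mults) {
  assert(A.cols == B.rows);
  Matrix C(A.rows, B.cols);
  for (std::size_t i = 0; i < A.rows; ++i)
    for (std::size_t k = 0; k < A.cols; ++k)
      for (std::size_t j = 0; j < B.cols; ++j)
        C.at(i, j) += A.at(i, k) * B.at(k, j);
  mults += static_cast<long>(A.rows * A.cols * B.cols);
  return C;
}
```

With these sizes the forward order costs $10 \cdot 10 \cdot 100 + 1 \cdot 10 \cdot 100 = 11000$ multiplications and the reverse order $1 \cdot 10 \cdot 10 + 1 \cdot 10 \cdot 100 = 1100$, while both yield the same Jacobian.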

The forward and the reverse mode are the extreme 
ends of a wide algorithmic spectrum of accumulating 
derivatives. Recently, hybrid approaches have been de- 
veloped which combine the forward and the reverse 
mode [5,10], or apply them in a hierarchical fashion 
[8,25]. In addition, efficient checkpointing schemes have 
been developed which address the potential storage ex- 
plosion of the reverse mode by judicious recomputa- 
tion of intermediate states [16,19]. Viewing the prob- 
lem of automatic differentiation as an edge elimination 
problem on the program graph corresponding to a par- 
ticular code, one can in fact show that the problem of 
computing derivatives with minimum cost is NP-hard 
[21]. The development of more efficient heuristics is an 
area of active research (see, for example, several of the 
papers in [3]). 


Implementation Approaches 


Automatic differentiation is a particular instantiation of 
a rule-based semantic transformation process. That is, 
whenever a floating-point variable changes, an associ- 
ated derivative object must be updated according to the 
chain rule of differential calculus. For example, in the 
forward mode of AD, a derivative object carries the par- 
tial derivative(s) of an associated variable with respect 
to the independent variable(s). In the reverse mode of 
AD, a derivative object carries the partial derivative(s) 
of the dependent variable(s) with respect to an associ- 
ated variable. Thus, any AD tool must provide an in- 
stantiation of a ‘derivative object’, maintain the associ- 
ation between an original variable and its derivative ob- 
ject, and update derivative objects in a timely fashion. 
Typically AD is implemented in one of two ways: 
operator overloading or source transformation. In lan- 
guages that allow operator overloading, such as C++ 


and Fortran90, each elementary function can be rede- 
fined so that in addition to the normal function, deriva- 
tives are computed as well, and either saved for later use 
or propagated by the chain rule. A simple class defini- 
tion using the forward mode might be implemented as 
follows: 


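#include <iostream>

const int GRAD_LENGTH = 2;  /* number of independent variables */

class adouble {
private:
  double value, grad[GRAD_LENGTH];
public:
  adouble() : value(0.0) {
    for (int i = 0; i < GRAD_LENGTH; i++) grad[i] = 0.0;
  }
  adouble(double v, const double g[]) : value(v) {
    for (int i = 0; i < GRAD_LENGTH; i++) grad[i] = g[i];
  }
  friend adouble operator*(const adouble &, const adouble &);
  friend std::ostream &operator<<(std::ostream &, const adouble &);
  /* similar decs for other ops */
};

/* product rule: grad(g1*g2) = g1*grad(g2) + g2*grad(g1) */
adouble operator*(const adouble &g1, const adouble &g2) {
  double newgrad[GRAD_LENGTH];
  for (int i = 0; i < GRAD_LENGTH; i++) {
    newgrad[i] = (g1.value)*(g2.grad[i]) +
                 (g2.value)*(g1.grad[i]);
  }
  return adouble(g1.value*g2.value, newgrad);
}

/* writes (value, [grad[0] grad[1] ...]) */
std::ostream &operator<<(std::ostream &os, const adouble &a) {
  os << "(" << a.value << ", [";
  for (int i = 0; i < GRAD_LENGTH; i++)
    os << (i > 0 ? " " : "") << a.grad[i];
  return os << "])";
}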

An example of how this class could be used is given below.

int main() {
  double temp[GRAD_LENGTH];
  adouble y;
  /* initialize x1 to (3.0, [1.0 0.0]) */
  temp[0] = 1.0; temp[1] = 0.0;
  adouble x1(3.0, temp);
  /* initialize x2 to (4.0, [0.0 1.0]) */
  temp[0] = 0.0; temp[1] = 1.0;
  adouble x2(4.0, temp);
  y = x1 * x2;
  /* output (y, [dy/dx1 dy/dx2]) */
  std::cout << y << std::endl;
  /* prints (12, [4 3]) */
  return 0;
}

In languages that do not support operator overloading, it can be faked by manually or automatically replacing operators such as + and * with calls to subroutines.
As an alternative to operator overloading, a preprocessor can be used to transform source code for computing
the function and its derivatives. This approach relies 
heavily on compiler technology and typically involves 
a combination of in-lining and subroutine calls to im- 
plement the propagation of derivatives. An example of 
ADIFOR-generated code (edited for clarity) invoking 
the SparsLinC library for transparent exploitation of 
sparsity [6] follows. 


c     derivative code for f = x*y/z
c     preaccumulate partial derivatives
      temp1 = x*y/z
      temp2 = 1.0/z
      temp3 = temp2*y
      temp4 = temp2*x
      temp5 = -temp1/z
c     propagate derivatives
c     (g_x, g_y, g_z, g_f may be sparse)
      call sspg3q(g_f,temp3,g_x,temp4,
     +            g_y,temp5,g_z)
      f = temp1


The advantage of this approach is that it allows the 
exploitation of computational context in deciding how 
to propagate derivatives. For example, a recently developed Hessian module [1] adaptively determines the
best strategy for each assignment statement in the code 
based on a machine-specific performance model for the 
implementation kernels employed. 

A comparison of these two implementation ap- 
proaches is provided in [9]. This paper also introduces 
an implementation design that separates the core issues 
of automatic differentiation from language-specific is- 
sues through the use of an interface layer called AIF 
(AD intermediate form), thus arriving at a system de- 
sign that allows reuse of differentiation components 
across front-ends for different languages. Long-term, 
such a system design also allows the exploitation of the 
best features of both source transformation and opera- 
tor overloading. 

Current AD tools based on operator overloading include ADOL-C [18] and ADOL-F [29], both of which offer the option of using either the forward or the reverse mode, and of computing derivatives of arbitrary order. Source transformation tools that use mostly the forward mode to provide first- and second-order derivatives include ADIC [9] and ADIFOR [6]. The Odyssee [28] and TAMC [14] tools use the reverse mode in a source transformation context to provide first-order derivatives. A more comprehensive survey of AD tools can be found at the website [31].


AD of Parallel Programs 


In 1994, R.L. Hinkins reported on the application of 
AD to magnetic field calculations implemented in the 
data parallel languages MPFortran (MasPar Fortran) 
and CMFortran [22]. In 1997, P. Hovland addressed 
the larger issue of AD of parallel programs in gen- 
eral, paying close attention to message-passing paral- 
lel programs [23], but also considering other paral- 
lel programming paradigms, and A. Carle developed 
ADIFOR-MP, a prototype tool supporting a subset of 
MPI [30] and PVM [13] constructs. The focus on par- 
allel programs employing a message-passing paradigm 
can be attributed to the popularity of this parallel 
programming paradigm and its relevance to all par- 
allel programs targeting nonuniform memory access 
(NUMA) machines. 

Correct AD of message-passing parallel programs 
requires that we maintain an association between 
a variable and its derivative object. In particular, when 
a variable is sent from one processor to another via 
a message, we must also send the associated derivative 
object. There are two ways of accomplishing this goal:
we can pack the variable and derivative object together 
in one message or send two separate messages. Pack- 
ing a variable and its associated derivative object into 
a single message may incur a copying overhead. On the 
other hand, sending separate messages requires a mech- 
anism for associating the messages with one another 
at the receiving end and will increase delivery time on 
high-latency systems. In general, it is preferable to pack 
the variable and derivative object together in one mes- 
sage [24], minimizing copying cost through judiciously 
chosen derivative data structures. Other issues in en- 
suring correct AD of parallel programs include proper 
handling of nondeterminism, reduction operations at 
points of nondifferentiability, and seed matrix initial- 
ization [23]. 
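The packing strategy can be sketched independently of any message-passing library: the value and its derivative object are copied into one contiguous buffer that a single send could transmit, and the receiver splits the buffer again. The layout [value, gradient...] and the names ActiveVar, pack and unpack are assumptions of this sketch; no MPI or PVM calls are shown, and the copies made by pack are the copying overhead discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A variable together with its derivative object (a forward-mode gradient).
struct ActiveVar {
  double value;
  std::vector<double> grad;
};

// Sender side: one buffer holding [value, grad...] for each variable.
std::vector<double> pack(const std::vector<ActiveVar>& vars, std::size_t glen) {
  std::vector<double> buf;
  buf.reserve(vars.size() * (1 + glen));
  for (const ActiveVar& v : vars) {
    assert(v.grad.size() == glen);
    buf.push_back(v.value);
    buf.insert(buf.end(), v.grad.begin(), v.grad.end());
  }
  return buf;
}

// Receiver side: restores the value/derivative association.
std::vector<ActiveVar> unpack(const std::vector<double>& buf, std::size_t glen) {
  assert(buf.size() % (1 + glen) == 0);
  std::vector<ActiveVar> vars;
  for (std::size_t i = 0; i < buf.size(); i += 1 + glen) {
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(i + 1);
    const auto last = first + static_cast<std::ptrdiff_t>(glen);
    vars.push_back({buf[i], std::vector<double>(first, last)});
  }
  return vars;
}
```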

In many instances, only a subset of the program 
input- and output variables is considered as independent or dependent variables with respect to differentiation. An optimization technique that tries to exploit
this fact is activity analysis, which seeks to reduce time 
and storage costs by identifying variables that do not lie 
on the computational path from independent to depen- 
dent variables. Such variables are termed passive and 
do not require an associated derivative object. Activity 
analysis depends on sophisticated compiler technology, 
namely interprocedural dataflow analysis. In message- 
passing parallel programs, sends and receives greatly 
complicate such an analysis. As the analysis needs to 
guarantee correctness, this fact leads to much more 
conservative assumptions, and as a result much opti- 
mization potential may be lost. Among the available op- 
tions to circumvent this situation are user annotations, 
runtime analysis, or the use of a higher-level language 
such as HPF [26]. These issues are investigated in more 
detail in [23]. 
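Activity analysis can be sketched as two reachability problems on a program's dependence graph: a variable is active if it is reachable from an independent variable and can reach a dependent variable. The sketch below works on an explicit, hypothetical graph; a real tool must derive the graph by interprocedural dataflow analysis, and in a message-passing program every send/receive pair whose matching cannot be resolved forces conservative extra edges, which shrinks the set of variables that can be proven passive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // edges[i]: variables computed from i

// Marks every variable reachable from the seeds (iterative depth-first search).
std::vector<bool> reachable(const Graph& edges, const std::vector<int>& seeds) {
  std::vector<bool> seen(edges.size(), false);
  std::vector<int> stack(seeds);
  while (!stack.empty()) {
    const int v = stack.back();
    stack.pop_back();
    if (seen[v]) continue;
    seen[v] = true;
    for (int w : edges[v])
      if (!seen[w]) stack.push_back(w);
  }
  return seen;
}

// Active = varied (depends on an independent) AND useful (influences a
// dependent); every other variable is passive and needs no derivative object.
std::vector<bool> active_variables(const Graph& edges,
                                   const std::vector<int>& independents,
                                   const std::vector<int>& dependents) {
  Graph reversed(edges.size());
  for (std::size_t v = 0; v < edges.size(); ++v)
    for (int w : edges[v]) reversed[w].push_back(static_cast<int>(v));
  const std::vector<bool> varied = reachable(edges, independents);
  const std::vector<bool> useful = reachable(reversed, dependents);
  std::vector<bool> active(edges.size());
  for (std::size_t v = 0; v < edges.size(); ++v) active[v] = varied[v] && useful[v];
  return active;
}
```

In the test graph, variable 0 is the independent $x$, variable 4 the dependent $y = t + s$ with $t = 2x$ (variable 2) and $s = c + 1$ (variable 3, built from a constant $c$, variable 1), and variable 5 is $z = 3x$, which never reaches $y$; only 0, 2 and 4 come out active.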

Another issue arising in the parallel setting is the 
computation of partial derivatives of new elementary 
functions, such as parallel reduction operations. For 
most of the common reduction operations, such as 
sum, maximum, and minimum, computing the partial derivatives is trivial. For the product reduction, the situation is more complex. The partial derivative of $y = \prod_{i=1}^{p} x_i$ with respect to $x_i$ is $\partial y/\partial x_i = \prod_{k=1}^{i-1} x_k \cdot \prod_{k=i+1}^{p} x_k$. These partial derivatives can be computed using a parallel prefix and a reverse parallel prefix operation.
derivatives requires an additional sum reduction. We 
could instead combine the partial derivative computa- 
tion and propagation into a single reduction. This in- 
creases the computational cost, but reduces the com- 
munication cost. In [24], Hovland and C. Bischof dis- 
cuss the conditions under which each approach should 
be preferred and give experimental results to support 
the theory. 
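The prefix-product formulation can be sketched sequentially: an exclusive prefix product and an exclusive reverse (suffix) product give every $\partial y/\partial x_i$ without division, so zero factors need no special treatment. In a parallel setting the two loops would be replaced by parallel prefix operations; the sequential loops and the function names are illustrative assumptions. The second function performs the additional sum reduction that propagates the derivatives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dy/dx_i for y = x_1 * ... * x_p via exclusive prefix and suffix products.
// Sequential loops stand in for the parallel prefix operations.
std::vector<double> product_partials(const std::vector<double>& x) {
  const std::size_t p = x.size();
  if (p == 0) return {};
  std::vector<double> prefix(p, 1.0), suffix(p, 1.0), d(p);
  for (std::size_t i = 1; i < p; ++i) prefix[i] = prefix[i - 1] * x[i - 1];
  for (std::size_t i = p - 1; i-- > 0;) suffix[i] = suffix[i + 1] * x[i + 1];
  for (std::size_t i = 0; i < p; ++i) d[i] = prefix[i] * suffix[i];  // no division
  return d;
}

// Propagation: the extra sum reduction ydot = sum_i (dy/dx_i) * xdot_i.
double product_derivative(const std::vector<double>& x,
                          const std::vector<double>& xdot) {
  assert(x.size() == xdot.size());
  const std::vector<double> d = product_partials(x);
  double ydot = 0.0;
  for (std::size_t i = 0; i < d.size(); ++i) ydot += d[i] * xdot[i];
  return ydot;
}
```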


AD-Enabled Parallelism 


As early as 1991, Bischof considered the problem of 
parallelizing the computation of derivatives computed 
via AD [4] to distribute the additional work introduced 
by AD. Applying AD to a program introduces two ba- 
sic types of parallelism: data parallelism and time paral- 
lelism. 


Data Parallelism 


The potential for data parallelism arises whenever there 
are multiple independent variables (for the forward 
mode) or multiple dependent variables (for the reverse 
mode). Different processes can be employed to propa- 
gate partial derivatives with respect to a subset of the 
independent variables in parallel. 

Such an implementation is feasible if one can em- 
ploy light-weight threads for the parallel derivative 
computation. A limiting factor is the fact that the 
derivative computations are interspersed with the func- 
tion computation. Thus, an alternative approach is 
to replicate the sequential computation on each pro- 
cessor, thereby virtually eliminating communication 
costs. This approach has proven effective for compu- 
tations involving a large number of independent vari- 
ables [7,32]. 
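The replicated-computation strategy can be sketched with C++ threads: every worker reevaluates the whole function with forward-mode dual numbers, but only for its own share of the independent variables, so the workers write disjoint gradient entries and never communicate. The test function (a cyclic sum of products), the round-robin split, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Dual {
  double val, der;
};
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Hypothetical test function f(x) = x_1 x_2 + x_2 x_3 + ... + x_n x_1.
Dual cyclic_sum(const std::vector<Dual>& x) {
  Dual s{0.0, 0.0};
  for (std::size_t i = 0; i < x.size(); ++i) s = s + x[i] * x[(i + 1) % x.size()];
  return s;
}

// Each worker replicates the whole evaluation once per independent variable
// in its round-robin share and writes only its own gradient entries.
std::vector<double> parallel_gradient(const std::vector<double>& x, unsigned workers) {
  assert(workers > 0);
  const std::size_t n = x.size();
  std::vector<double> grad(n, 0.0);
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    pool.emplace_back([&x, &grad, n, workers, w] {
      std::vector<Dual> xd(n);
      for (std::size_t j = w; j < n; j += workers) {
        for (std::size_t i = 0; i < n; ++i) xd[i] = Dual{x[i], i == j ? 1.0 : 0.0};
        grad[j] = cyclic_sum(xd).der;  // one forward sweep with seed e_j
      }
    });
  }
  for (std::thread& t : pool) t.join();
  return grad;
}
```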


Time Parallelism 


Time parallelism arises as a consequence of the asso- 
ciativity of the chain rule. By breaking the computa- 
tion into several phases, we can compute and propa- 
gate partial derivatives over each phase simultaneously, 
then combine the results according to the chain rule. 
This approach is illustrated in Fig. 1. Before each phase, 
a derivative computation for that phase is forked off, 
using as input the results of the previous phase. At the 
conclusion of the derivative computations, the partial 
derivatives are combined according to the chain rule. 

This illustration assumes the forward mode. If we 
were using the reverse mode, the derivative computa- 
tion for phase A would be forked off after phase A had 
completed. The effectiveness of this approach has been 
demonstrated for both the forward mode [10] and the 
reverse mode [2]. The associativity of the chain rule 
makes it possible to apply this time-parallel approach 
to arbitrary computational structures, not just the lin- 
ear schedule illustrated here. 
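A minimal sketch of forward-mode time parallelism follows, assuming three hypothetical phases A, B and C that each map $\mathbb{R}^2 \to \mathbb{R}^2$ and whose Jacobians are coded by hand. The Jacobian of each phase is forked off with std::async as soon as the input of that phase is known, while the function evaluation proceeds sequentially; the pieces are combined at the end as $J = J_C J_B J_A$.

```cpp
#include <array>
#include <cassert>
#include <future>

using Vec = std::array<double, 2>;
using Mat = std::array<std::array<double, 2>, 2>;

Mat matmul(const Mat& p, const Mat& q) {
  Mat r{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 2; ++k) r[i][j] += p[i][k] * q[k][j];
  return r;
}

// Three hypothetical phases and their Jacobians.
Vec phaseA(const Vec& x) { return {x[0] + x[1], x[0] * x[1]}; }
Mat jacA(const Vec& x) { return {{{1.0, 1.0}, {x[1], x[0]}}}; }
Vec phaseB(const Vec& u) { return {u[0] * u[1], u[1]}; }
Mat jacB(const Vec& u) { return {{{u[1], u[0]}, {0.0, 1.0}}}; }
Vec phaseC(const Vec& w) { return {w[0] - w[1], 2.0 * w[0]}; }
Mat jacC(const Vec&) { return {{{1.0, -1.0}, {2.0, 0.0}}}; }

struct TimeParallelResult {
  Vec y;
  Mat jacobian;
};

TimeParallelResult time_parallel(const Vec& x) {
  // Fork each phase's derivative work as soon as its input is available.
  std::future<Mat> ja = std::async(std::launch::async, jacA, x);
  const Vec u = phaseA(x);
  std::future<Mat> jb = std::async(std::launch::async, jacB, u);
  const Vec w = phaseB(u);
  std::future<Mat> jc = std::async(std::launch::async, jacC, w);
  const Vec y = phaseC(w);
  // Combine by the chain rule: J = J_C * J_B * J_A.
  const Mat jba = matmul(jb.get(), ja.get());
  return {y, matmul(jc.get(), jba)};
}
```

At $x = (1, 2)$ the sketch returns $y = (4, 12)$ and $J = \begin{pmatrix} 6 & 4 \\ 16 & 10 \end{pmatrix}$, which agrees with differentiating the composed function directly.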


Parallel AD Tools 


Research in AD and parallelism is relatively new. 
Nonetheless, there are several such tools, at varying 
stages of development. 

Hinkins developed special purpose libraries for 
the AD of programs written in MPFortran or CMFortran [22]. Use of these libraries required that each
arithmetic operation be manually replaced by a subrou- 
tine call. As part of his thesis [23], Hovland developed 
prototype tools for AD of FortranM [12], Fortran with 
a subset of MPI [20] message passing, and C with MPI. 
Carle is developing (1999) a prototype version of ADI- 
FOR [11] supporting MPI and PVM. Roh is developing 
an extension to ADIC that seeks to automatically ex- 
ploit the parallelism introduced by AD through the use 
of threads [27]. 


Summary 


Since 1989, a great deal of progress has been made in the fields of automatic differentiation and parallel computation. Parallel computation and AD interact in two
ways. AD can be applied to a parallel program. Alterna- 
tively, AD can be used as a source of new parallelism in 
a computation. Effective strategies exist for exploiting 
each of the two types of parallelism introduced: time 
parallelism and data parallelism. 

In either case, ensuring that the resulting derivative 
computation is both correct and efficient requires AD 
tools that are more sophisticated than in the serial set- 
ting. Most of the existing tools are early in their devel- 
opment cycle, but can be expected to mature swiftly as 
they adopt advanced computational infrastructure de- 
veloped in other fields of computer science, e.g., par- 
allelizing compilers or parallel runtime systems. Thus, 
we expect the beginning of 2000 to also provide robust 
and effective tools for the differentiation of parallel programs and the introduction of parallelism through differentiation.


See also 


> Asynchronous Distributed Optimization 
Algorithms 

> Automatic Differentiation: Calculation of the 
Hessian 

> Automatic Differentiation: Calculation of Newton 
Steps 

> Automatic Differentiation: Geometry of Satellites 
and Tracking Stations 

> Automatic Differentiation: Introduction, History 
and Rounding Error Estimation 

> Automatic Differentiation: Point and Interval 

> Automatic Differentiation: Point and Interval 
Taylor Operators 

> Automatic Differentiation: Root Problem and 
Branch Problem 

> Heuristic Search 

> Interval Analysis: Parallel Methods for Global 
Optimization 

> Load Balancing for Parallel Optimization 
Techniques 

> Nonlocal Sensitivity Analysis with Automatic 
Differentiation 

> Parallel Computing: Complexity Classes 

> Parallel Computing: Models 

> Parallel Heuristic Search 

> Stochastic Network Problems: Massively Parallel 
Solution 


References 


1. Abate J, Bischof Ch, Carle A, Roh L (1997) Algorithms and 
design for a second-order automatic differentiation mod- 
ule. Proc. Internat. Symp. Symbolic and Algebraic Comput- 
ing (ISSAC) ‘97, ACM, New York, pp 149-155 

2. Benary J (1996) Parallelism in the reverse mode. In: Berz 
M, Bischof Ch, Corliss G, Griewank A (eds) Computational 
Differentiation: Techniques, Applications, and Tools. SIAM, 
Philadelphia, pp 137-147 

3. Berz M, Bischof Ch, Corliss G, Griewank A (1996) Computa- 
tional differentiation: Techniques, applications, and tools. 
SIAM, Philadelphia 

4. Bischof ChH (1991) Issues in parallel automatic differentia- 
tion. In: Griewank A, Corliss G (eds) Automatic Differentia- 
tion of Algorithms. SIAM, Philadelphia, pp 100-113 


5. Bischof Ch, Carle A, Corliss G, Griewank A, Hovland P (1992) 
ADIFOR: Generating derivative codes from Fortran pro- 
grams. Scientif Program 1(1):11-29 

6. Bischof Ch, Carle A, Khademi P, Mauer A (1996) ADIFOR 
2.0: Automatic differentiation of Fortran 77 programs. IEEE 
Comput Sci Eng 3(3):18-32 

7. Bischof Ch, Green L, Haigler K, Knauff T (1994) Paral- 
lel calculation of sensitivity derivatives for aircraft de- 
sign using automatic differentiation. Proc. 5th AIAA/ 
NASA/USAF/ISSMO Symp. Multidisciplinary Analysis and 
Optimization, AIAA-94-4261, Amer Inst Aeronautics and 
Astronautics, Reston, VA, pp 73-84 

8. Bischof ChH, Haghighat MR (1996) On hierarchical differ- 
entiation. In: Berz M, Bischof Ch, Corliss G, Griewank A (eds) 
Computational Differentiation: Techniques, Applications, 
and Tools. SIAM, Philadelphia, pp 83-94 

9. Bischof Ch, Roh L, Mauer A (1997) ADIC - An extensible au- 
tomatic differentiation tool for ANSI-C. Software Practice 
and Experience 27(12):1427-1456 

10. Bischof Ch, Wu Po-Ting (1997) Time-parallel computation 
of pseudo-adjoints for a leapfrog scheme. Preprint Math 
and Computer Sci Div Argonne Nat Lab no. ANL/MCS- 
P639-0197 

11. Carle A (1997) ADIFOR-MP - A prototype automatic differ- 
entiation tool for Fortran 77 with message-passing exten- 
sions. Personal communication 

12. Foster IT, Chandy KM (1995) Fortran M: A language for 
modular parallel programming. J Parallel Distributed Com- 
put 25(1) 

13. Geist A, Beguelin A, Dongarra J, Jiang W, Manchek R, Sun- 
deram V (1994) PVM - Parallel virtual machine: A users’ 
guide and tutorial for network parallel computing. MIT, 
Cambridge, MA 

14. Giering R, Kaminski Th (1996) Recipes for adjoint code 
construction. Max-Planck Inst Meteorologie, Hamburg 
no. 212 

15. Griewank A (1989) On automatic differentiation. In: 
Iri M, Tanabe K (eds) Mathematical Programming: Recent Developments and Applications. Kluwer, Dordrecht,
pp 83-108 

16. Griewank A (1992) Achieving logarithmic growth of tem- 
poral and spatial complexity in reverse automatic differen- 
tiation. Optim Methods Softw 1(1):35-54 

17. Griewank A, Corliss G (1991) Automatic differentiation of 
algorithms. SIAM, Philadelphia 

18. Griewank A, Juedes D, Utke J (1996) ADOL-C, a package 
for the automatic differentiation of algorithms written in 
C/C++. ACM Trans Math Softw 22(2):131-167 

19. Grimm J, Pottier L, Rostaing-Schmidt N (1996) Opti- 
mal time and minimum space time product for revers- 
ing a certain class of programs. In: Berz M, Bischof Ch, 
Corliss G, Griewank A (eds) Computational Differentiation, 
Techniques, Applications, and Tools. SIAM, Philadelphia, 
pp 95-106 




20. Gropp W, Lusk E, Skjellum A (1994) Using MPI - Portable
parallel programming with the message passing interface. 
MIT, Cambridge, MA 

21. Herley K (1993) On the NP-completeness of optimum accu- 
mulation by vertex elimination. Unpublished Manuscript 

22. Hinkins RL (Sept. 1994) Parallel computation of automatic
differentiation applied to magnetic field calculations. MSc 
Thesis Univ Calif 

23. Hovland P (1997) Automatic differentiation of parallel pro- 
grams. PhD Thesis Univ. Illinois at Urbana-Champaign 

24. Hovland P, Bischof Ch (1998) Automatic differentiation of 
message-passing parallel programs. Proc. First Merged In- 
ternat. Parallel Processing Symp. and Symp. on Parallel 
and Distributed Processing, IEEE Computer Soc Press, New 
York 

25. Hovland P, Bischof Ch, Spiegelman D, Casella M (1997) Ef- 
ficient derivative codes through automatic differentiation 
and interface contraction: An application in biostatistics. 
SIAM J Sci Comput 18(4):1056-1066 

26. Koelbel C, Loveman D, Schreiber R, Steele G Jr, Zosel M 
(1994) The high performance Fortran handbook. MIT, Cam- 
bridge, MA 

27. Roh L (1997) Personal Communication 

28. Rostaing N, Dalmas St, Galligo A (Oct. 1993) Automatic dif- 
ferentiation in Odyssee. Tellus 45a(5):558-568 

29. Shiriaev D, Griewank A (1996) ADOL-F: Automatic dif- 
ferentiation of Fortran codes. In: Berz M, Bischof Ch, 
Corliss G, Griewank A (eds) Computational Differentiation: 
Techniques, Applications, and Tools. SIAM, Philadelphia, 
pp 375-384 

30. Snir M, Otto SW, Huss-Lederman S, Walker DW, Dongarra J (1996) MPI: The complete reference. MIT, Cambridge,
MA 

31. WEB http://www.mcs.anl.gov/autodiff/adtools/ 

32. Zhang Y, Bischof Ch, Easter R, Wu Po-Ting (1997) Sensi- 
tivity analysis of O3 and photochemical indicators using 
a mixed-phase chemistry box model and automatic dif- 
ferentiation techniques. 90th Air and Waste Management 
Assoc. Annual Meeting and Exhibition June 8-13, 1997, 
Toronto. vol 97-WA68A.04, Air and Waste Management As- 
soc, Pittsburgh, PA, pp 1-16 


Automatic Differentiation:
Point and Interval
AD 


L. B. RALL¹, GEORGE F. CORLISS²
¹ University Wisconsin-Madison, Madison, USA
² Marquette University, Milwaukee, USA


MSC2000: 65H99, 65K99 


Article Outline 


Keywords 
See also 
References 


Keywords 


Differentiation; Computational methods 


Automatic differentiation (abbreviated AD) is a com- 
putational method for evaluating derivatives or Taylor 
coefficients of algorithmically defined functions. Sim- 
ply speaking, an algorithmic definition of a function is 
a step-by-step specification of its evaluation by arith- 
metic operations and library functions. Application of 
the rules of differentiation to the algorithmic definition 
of a differentiable function yields values of its deriva- 
tives. Examples of algorithmic definitions of functions 
are code lists, computer subroutines, and even entire 
computer programs. 

Automatic differentiation differs from numerical 
differentiation based on difference quotients of func- 
tion values in that automatic differentiation is exact in 
principle, but of course is subject to roundoff error in 
practice. In addition to roundoff error, difference quotients entail truncation error. Attempts to reduce this truncation error by decreasing the stepsize result in cancellation of significant digits and a catastrophic increase
in roundoff error in general. Automatic differentiation 
also differs significantly from the symbolic differentia- 
tion taught in school, the goal of which is the transfor- 
mation of formulas for functions into formulas for their 
derivatives. Although automatic differentiation uses the 
same rules of differentiation as symbolic differentiation, 
these rules are applied to the algorithmic definition of 
the function, not to a formula for it, and the results 
are values of derivatives, not formulas. Furthermore, 
formulas may not be available for functions of inter- 
est defined only algorithmically by computer subrou- 
tines or programs to which automatic differentiation 
can be applied. In summary, automatic differentiation 
is more accurate than numerical differentiation, and it requires fewer resources and is more generally applicable than symbolic differentiation.
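The trade-off between truncation error and cancellation can be seen in a few lines. The sketch below assumes IEEE double precision and the standard C++ math library. It forms the forward difference quotient of sin at x = 1 and compares it with the exact derivative cos 1. The function names are illustrative only.

```cpp
#include <cmath>

// Forward difference quotient (sin(x + h) - sin(x)) / h.
// Truncation error is about (h/2)|sin x|; roundoff in the numerator,
// amplified by the division by h, is about (machine epsilon)/h.
double forward_difference(double x, double h) {
    return (std::sin(x + h) - std::sin(x)) / h;
}

// Absolute error against the exact derivative cos x.
double difference_error(double x, double h) {
    return std::fabs(forward_difference(x, h) - std::cos(x));
}
```

With h = 1e-8 the error is of order 1e-8. With h = 1e-15, cancellation leaves at most one or two correct significant digits, even though the truncation error has nearly vanished.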

The simplest type of algorithmic definition of 
a function is a code list, which is similar to the segment 
of computer code for the evaluation of an expression 
(i.e., a formula). For illustration, consider the function
defined by the formula 


$$f(x, y) = (xy + \sin x + 4)(3y^2 + 6).$$


An equivalent algorithmic definition of this function by 
a code list is 


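$$
\begin{aligned}
t_1 &= x, & t_6 &= t_5 + 4,\\
t_2 &= y, & t_7 &= t_2^2,\\
t_3 &= t_1 t_2, & t_8 &= 3t_7,\\
t_4 &= \sin t_1, & t_9 &= t_8 + 6,\\
t_5 &= t_3 + t_4, & t_{10} &= t_6 t_9.
\end{aligned}
$$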


Given the values of $x$ and $y$, evaluation of the subsequent entries in the code list gives $t_{10} = f(x, y)$. Indeed, the first step in evaluation or symbolic differentiation of a function defined by a formula is to form a corresponding code list, perhaps subconsciously. The conversion of well-formed expressions into code lists is a fundamental process in computer science, sometimes called ‘formula translation’. Although both automatic differentiation and symbolic differentiation are applicable in this case, automatic differentiation requires only the code list and produces only values of derivatives for given values of the input variables. To compute the gradient $\nabla f$, the rules of differentiation applied to the code list above give


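$$
\begin{aligned}
\nabla t_1 &= \nabla x, & \nabla t_6 &= \nabla t_5,\\
\nabla t_2 &= \nabla y, & \nabla t_7 &= 2t_2\,\nabla t_2,\\
\nabla t_3 &= t_2\,\nabla t_1 + t_1\,\nabla t_2, & \nabla t_8 &= 3\,\nabla t_7,\\
\nabla t_4 &= (\cos t_1)\,\nabla t_1, & \nabla t_9 &= \nabla t_8,\\
\nabla t_5 &= \nabla t_3 + \nabla t_4, & \nabla t_{10} &= t_6\,\nabla t_9 + t_9\,\nabla t_6.
\end{aligned}
$$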
It is evident from the chain rule that

$$\nabla t_{10} = \nabla f(x, y) = f_x(x, y)\,\nabla x + f_y(x, y)\,\nabla y.$$

Thus, once the code list for $f(x, y)$ is given and the ‘seed’ values of $x$, $\nabla x$ and $y$, $\nabla y$ are known, the values of the function and its gradient can be computed without formulas for either. In case $x$, $y$ are independent variables, then $\nabla x = [1, 0]$, $\nabla y = [0, 1]$ and

$$\nabla f(x, y) = [f_x(x, y),\ f_y(x, y)] = [t_9(t_2 + \cos t_1),\ 6t_2 t_6 + t_1 t_9].$$


This example illustrates the forward mode of automatic differentiation. This process is not restricted to first derivatives as long as the entries $t_i$ of the code list have the desired number of derivatives.
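The forward sweep for the example can be written out directly. The following C++ sketch evaluates the code list for f(x, y) and applies the gradient rules listed above, one entry at a time, with the seeds ∇x = [1, 0] and ∇y = [0, 1]. The names `Entry`, `lin`, and `forward_mode` are hypothetical.

```cpp
#include <array>
#include <cmath>

using Vec = std::array<double, 2>;  // gradient with respect to (x, y)

// One code-list entry: its value and its gradient.
struct Entry {
    double v;
    Vec g;
};

// Linear combination a*u + b*v of two gradients.
Vec lin(double a, const Vec& u, double b, const Vec& v) {
    return {a * u[0] + b * v[0], a * u[1] + b * v[1]};
}

// Forward mode on the code list for f(x, y) = (xy + sin x + 4)(3y^2 + 6):
// each entry t_i is evaluated together with its gradient rule.
Entry forward_mode(double x, double y) {
    Entry t1{x, {1.0, 0.0}};                                         // seed grad x = [1, 0]
    Entry t2{y, {0.0, 1.0}};                                         // seed grad y = [0, 1]
    Entry t3{t1.v * t2.v, lin(t2.v, t1.g, t1.v, t2.g)};              // t3 = t1 t2
    Entry t4{std::sin(t1.v), lin(std::cos(t1.v), t1.g, 0.0, t1.g)};  // t4 = sin t1
    Entry t5{t3.v + t4.v, lin(1.0, t3.g, 1.0, t4.g)};                // t5 = t3 + t4
    Entry t6{t5.v + 4.0, t5.g};                                      // t6 = t5 + 4
    Entry t7{t2.v * t2.v, lin(2.0 * t2.v, t2.g, 0.0, t2.g)};         // t7 = t2^2
    Entry t8{3.0 * t7.v, lin(3.0, t7.g, 0.0, t7.g)};                 // t8 = 3 t7
    Entry t9{t8.v + 6.0, t8.g};                                      // t9 = t8 + 6
    return {t6.v * t9.v, lin(t6.v, t9.g, t9.v, t6.g)};               // t10 = t6 t9
}
```

For instance, `forward_mode(1.0, 2.0)` returns the value 18(6 + sin 1) and the gradient [18(2 + cos 1), 12(6 + sin 1) + 18]. This agrees with the formulas t9(t2 + cos t1) and 6t2t6 + t1t9.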

Although the forward mode illustrated above is easy to understand and implement, it is usually more efficient to compute gradients in what is called the reverse mode. To explain this process, consider a general code list $t = (t_1, \ldots, t_n)$ which begins with $m$ input variables $t_1, \ldots, t_m$ and ends with $p$ output variables $t_{n-p+1}, \ldots, t_n$. For $i > m$, the entry $t_i = t_j \circ t_k$, where $j, k < i$ and $\circ$ denotes an arithmetic operation, or $t_i = \phi(t_j)$ with $j < i$, where $\phi$ is a function belonging to a library of standard functions. For convenience, arithmetic operations between constants and entries will be considered library functions in addition to the usual sine, cosine, and so on.

If $K_i$ denotes the set of indices $k < i$ such that the entry $t_i$ of the code list depends explicitly on $t_k$, then the forward mode of automatic differentiation consists of application of the chain rule in the form

$$\nabla t_i = \sum_{k \in K_i} \frac{\partial t_i}{\partial t_k}\,\nabla t_k, \qquad i = m+1, \ldots, n,$$

to obtain the gradients of the intermediate variables and output. This process works because $\nabla t_1, \ldots, \nabla t_{i-1}$ are known or have been computed before they are needed for the evaluation of $\nabla t_i$. If the seed gradients have dimension at most $d$, then the forward mode of automatic differentiation requires computational effort proportional to $nd$, that is, $d$ times the effort required for evaluation of the output $t_n$. If $d > m$, then it is more efficient to consider the input variables to be independent and then compose $\nabla f$ by the standard formula given below. This limits the computational effort for the forward mode to an amount essentially proportional to $nm$.

The reverse mode is another way to apply the chain rule. Instead of propagating the seed gradients $\nabla t_1, \ldots, \nabla t_m$ throughout the computation, differentiation is applied to the code list in reverse order. In the case of a single output variable $t_n$, first $t_n$ is differentiated with respect to itself, then with respect to $t_{n-1}, \ldots, t_1$. The resulting adjoints $\partial t_n/\partial t_m, \ldots, \partial t_n/\partial t_1$ and the seed gradients then give

$$\nabla t_n = \sum_{i=1}^{m} \frac{\partial t_n}{\partial t_i}\,\nabla t_i.$$


Formally, the adjoints are given by

$$\frac{\partial t_n}{\partial t_n} = 1, \qquad \frac{\partial t_n}{\partial t_k} = \sum_{i \in I_k} \frac{\partial t_n}{\partial t_i}\,\frac{\partial t_i}{\partial t_k},$$

$k = n-1, \ldots, 1$, where $I_k$ is the set of indices $i > k$ such that $t_i$ depends explicitly on $t_k$. It follows that the computational effort to obtain adjoints in the reverse mode is proportional to $n$, the length of the code list, and is essentially independent of the number of input variables and the dimensionalities of the seed gradients. This can result in significant savings in computational time. In the general case of several output variables, the same technique is applied to each to obtain their gradients.
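Here is a minimal sketch of the reverse mode. It assumes a ‘tape’ that records each entry's value and the local partial derivatives with respect to its (at most two) arguments. The sweep runs from the output back to the inputs. When entry i is reached, its adjoint is already complete, because every entry that depends on t_i has a larger index. Its contribution can therefore be pushed to its arguments, which is equivalent to the sum over I_k above. Indices are 0-based here. The `Tape` structure and its methods are illustrative and are not taken from any particular AD tool.

```cpp
#include <cmath>
#include <vector>

// One recorded entry t_i with the local partials dt_i/dt_j of its arguments.
struct Node {
    int arg[2];         // argument indices j (-1 if unused)
    double partial[2];  // local partial derivatives dt_i/dt_j
    double value;       // value of t_i
};

struct Tape {
    std::vector<Node> nodes;  // entries in evaluation order

    int size() const { return static_cast<int>(nodes.size()); }
    double val(int i) const { return nodes[i].value; }
    int input(double v) { nodes.push_back(Node{{-1, -1}, {0.0, 0.0}, v}); return size() - 1; }
    int unary(double v, int j, double dj) {
        nodes.push_back(Node{{j, -1}, {dj, 0.0}, v}); return size() - 1;
    }
    int binary(double v, int j, double dj, int k, double dk) {
        nodes.push_back(Node{{j, k}, {dj, dk}, v}); return size() - 1;
    }

    // Reverse sweep: adjoint of t_n is 1; each completed adjoint of t_i is
    // pushed to its arguments, accumulating dt_n/dt_k = sum (dt_n/dt_i)(dt_i/dt_k).
    std::vector<double> adjoints(int n) const {
        std::vector<double> adj(n + 1, 0.0);
        adj[n] = 1.0;
        for (int i = n; i >= 0; --i)
            for (int s = 0; s < 2; ++s)
                if (nodes[i].arg[s] >= 0)
                    adj[nodes[i].arg[s]] += adj[i] * nodes[i].partial[s];
        return adj;
    }
};

// Records the example code list and returns {f, df/dx, df/dy}.
std::vector<double> reverse_mode(double x, double y) {
    Tape T;
    int t1 = T.input(x), t2 = T.input(y);
    int t3 = T.binary(x * y, t1, y, t2, x);
    int t4 = T.unary(std::sin(x), t1, std::cos(x));
    int t5 = T.binary(T.val(t3) + T.val(t4), t3, 1.0, t4, 1.0);
    int t6 = T.unary(T.val(t5) + 4.0, t5, 1.0);
    int t7 = T.unary(y * y, t2, 2.0 * y);
    int t8 = T.unary(3.0 * T.val(t7), t7, 3.0);
    int t9 = T.unary(T.val(t8) + 6.0, t8, 1.0);
    int t10 = T.binary(T.val(t6) * T.val(t9), t6, T.val(t9), t9, T.val(t6));
    std::vector<double> adj = T.adjoints(t10);
    return {T.val(t10), adj[t1], adj[t2]};
}
```

One reverse sweep yields both partial derivatives. These are the adjoints ∂t10/∂t1 and ∂t10/∂t2 of the worked example.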

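The reverse mode applied to the example code list gives

$$
\begin{aligned}
\frac{\partial t_{10}}{\partial t_{10}} &= 1,\\
\frac{\partial t_{10}}{\partial t_9} &= t_6,\\
\frac{\partial t_{10}}{\partial t_8} &= \frac{\partial t_{10}}{\partial t_9}\,\frac{\partial t_9}{\partial t_8} = t_6 \cdot 1,\\
\frac{\partial t_{10}}{\partial t_7} &= \frac{\partial t_{10}}{\partial t_8}\,\frac{\partial t_8}{\partial t_7} = t_6 \cdot 3,\\
\frac{\partial t_{10}}{\partial t_6} &= t_9,\\
\frac{\partial t_{10}}{\partial t_5} &= \frac{\partial t_{10}}{\partial t_6}\,\frac{\partial t_6}{\partial t_5} = t_9 \cdot 1,\\
\frac{\partial t_{10}}{\partial t_4} &= \frac{\partial t_{10}}{\partial t_5}\,\frac{\partial t_5}{\partial t_4} = t_9 \cdot 1,\\
\frac{\partial t_{10}}{\partial t_3} &= \frac{\partial t_{10}}{\partial t_5}\,\frac{\partial t_5}{\partial t_3} = t_9 \cdot 1,\\
\frac{\partial t_{10}}{\partial t_2} &= \frac{\partial t_{10}}{\partial t_7}\,\frac{\partial t_7}{\partial t_2} + \frac{\partial t_{10}}{\partial t_3}\,\frac{\partial t_3}{\partial t_2} = (3t_6)(2t_2) + t_9 t_1,\\
\frac{\partial t_{10}}{\partial t_1} &= \frac{\partial t_{10}}{\partial t_4}\,\frac{\partial t_4}{\partial t_1} + \frac{\partial t_{10}}{\partial t_3}\,\frac{\partial t_3}{\partial t_1} = t_9 \cos t_1 + t_9 t_2.
\end{aligned}
$$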


Although this computation appears to be complicated, 
a comparison of operation counts in the case x, y are 
independent variables shows that even for this low-dimensional example, the reverse mode requires 13 operations to evaluate $\nabla f$ in addition to the operations required to evaluate $f$ itself, while the forward mode requires $22 = 2 + 10m$ (with $m = 2$). In reverse mode, the entire code list has to be evaluated and its values stored before the reverse sweep begins. In forward mode, since the computation of $t_i$ and each component of $\nabla t_i$ can be carried out independently, a parallel computer with a sufficient number of processors could compute $t_n$, $\nabla t_n$ in a single pass through the code list, that is, with effort proportional to $n$. A more detailed comparison of forward and
reverse modes for calculating gradients can be found in 
the tutorial article [1, pp. 1-18] and the book [3]. 

Implementation of automatic differentiation can be 
by interpretation, operator overloading, or code trans- 
formation. Early software for automatic differentiation 
simply interpreted a code list by calling the appropri- 
ate subroutines for each arithmetic operation or library 
function. Although inefficient, this approach is still use- 
ful in interactive applications in which functions en- 
tered from the keyboard are parsed to form code lists, 
which are then interpreted to evaluate the functions and 
their derivatives. 

Operator overloading is a familiar concept in math- 
ematics, as the symbol ‘+’ is used to denote addition of 
such disparate objects as integers, real or complex num- 
bers, vectors, matrices, functions, etc. It follows that 
a code list as defined above can be evaluated in any 
mathematical system in which the required arithmetic 
operations and library functions are available, including
differentiation arithmetics [14, pp. 73-90]. These arith- 
metics can be used to compute derivatives or Taylor co- 
efficients of any order of sufficiently smooth functions. 
In optimization, gradient and Hessian arithmetics are 
most frequently used. In gradient arithmetic, the basic 
data type is the ordered pair $(f, \nabla f)$ of a number and
a vector representing values of a function and its gra- 
dient vector. Arithmetic operations in this system are 
defined by 


$$
\begin{aligned}
(f, \nabla f) + (g, \nabla g) &= (f + g,\ \nabla f + \nabla g),\\
(f, \nabla f) \cdot (g, \nabla g) &= (fg,\ f\,\nabla g + g\,\nabla f),\\
\frac{(f, \nabla f)}{(g, \nabla g)} &= \left(\frac{f}{g},\ \frac{g\,\nabla f - f\,\nabla g}{g^2}\right),
\end{aligned}
$$

division by 0 excluded. If $\phi$ is a differentiable library function, then its extension to gradient arithmetic is defined by

$$\phi((f, \nabla f)) = (\phi(f),\ \phi'(f)\,\nabla f),$$

which is just the chain rule. Hessian arithmetic extends the same idea to triples $(f, \nabla f, Hf)$, where $Hf$ is a matrix representing the value of the Hessian of $f$, $Hf = [\partial^2 f/\partial x_i \partial x_j]$.
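Gradient arithmetic maps naturally onto operator overloading. The C++ sketch below defines pairs (f, ∇f) with a fixed number N of gradient components. It implements the rules for +, ·, / and one library function (sin) exactly as stated above. The type `Grad`, the helper `constant`, and `example_f` are hypothetical names, and only the operations needed for the example are provided.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Gradient arithmetic: the pair (f, grad f) with N gradient components.
template <int N>
struct Grad {
    double f;                 // function value
    std::array<double, N> g;  // gradient value
};

// (f, grad f) + (g, grad g) = (f + g, grad f + grad g)
template <int N>
Grad<N> operator+(const Grad<N>& a, const Grad<N>& b) {
    Grad<N> r{a.f + b.f, {}};
    for (int i = 0; i < N; ++i) r.g[i] = a.g[i] + b.g[i];
    return r;
}

// (f, grad f) * (g, grad g) = (fg, f grad g + g grad f)
template <int N>
Grad<N> operator*(const Grad<N>& a, const Grad<N>& b) {
    Grad<N> r{a.f * b.f, {}};
    for (int i = 0; i < N; ++i) r.g[i] = a.f * b.g[i] + b.f * a.g[i];
    return r;
}

// (f, grad f) / (g, grad g) = (f/g, (g grad f - f grad g) / g^2)
template <int N>
Grad<N> operator/(const Grad<N>& a, const Grad<N>& b) {
    assert(b.f != 0.0);  // division by 0 excluded
    Grad<N> r{a.f / b.f, {}};
    for (int i = 0; i < N; ++i) r.g[i] = (b.f * a.g[i] - a.f * b.g[i]) / (b.f * b.f);
    return r;
}

// Library function: phi((f, grad f)) = (phi(f), phi'(f) grad f), with phi = sin.
template <int N>
Grad<N> sin(const Grad<N>& a) {
    Grad<N> r{std::sin(a.f), {}};
    for (int i = 0; i < N; ++i) r.g[i] = std::cos(a.f) * a.g[i];
    return r;
}

// A constant is a pair with zero gradient.
template <int N>
Grad<N> constant(double c) { return Grad<N>{c, {}}; }

// f(x, y) = (xy + sin x + 4)(3y^2 + 6) with seeds grad x = [1, 0], grad y = [0, 1].
Grad<2> example_f(double xv, double yv) {
    Grad<2> x{xv, {1.0, 0.0}}, y{yv, {0.0, 1.0}};
    return (x * y + sin(x) + constant<2>(4.0)) *
           (constant<2>(3.0) * y * y + constant<2>(6.0));
}
```

`example_f` is written exactly like the formula for f. The overloaded operators carry out the forward mode without an explicit code list.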

Programming differentiation arithmetic is conve- 
nient in modern computer languages which support 
operator overloading [9, pp. 291-309]. In this setting, 
the program is written with expressions or routines for 
functions in the regular form, and the compiler pro- 
duces executable code for evaluation of these functions 
and the desired derivatives. For straightforward imple- 
mentations such as the one cited above, the differenti- 
ation mode will be forward, which has implications for 
efficiency. 

Code transformation essentially consists of analyz- 
ing the code for functions to generate code for deriva- 
tives. This results in a new computer program which 
then can be compiled and run as usual. To illustrate this 
idea, note that in the simple example given above, the 
expressions 


$$f_x(x, y) = t_9(t_2 + \cos t_1), \qquad f_y(x, y) = 6t_2 t_6 + t_1 t_9,$$


were obtained for the partial derivatives of the function 
in either forward or reverse mode. This differs from 
symbolic differentiation in that values of intermediate 
entries in the code list for f(x, y) are involved rather 
than the variables x, y. The corresponding lists for these 
expressions 


$$
\begin{aligned}
tx_1 &= \cos t_1, & ty_1 &= t_2 t_6,\\
tx_2 &= t_2 + tx_1, & ty_2 &= 6\,ty_1,\\
tx_3 &= t_9\,tx_2, & ty_3 &= t_1 t_9,\\
 & & ty_4 &= ty_2 + ty_3,
\end{aligned}
$$


can then be appended to the code list for the function to obtain a routine with output values $t_{10} = f(x, y)$, $tx_3 = f_x(x, y)$, and $ty_4 = f_y(x, y)$. Further, automatic differentiation can be applied to this list to obtain routines for
higher derivatives of f [13]. As a practical matter, dupli- 
cate assignments can be removed from such lists before 
compilation. 

Up to this point, the discussion has been of point AD: values have been assumed to be real or complex
numbers with all operations and library functions eval- 
uated exactly. In reality, the situation is quite differ- 
ent. Expressions, meaning their equivalent code lists, 
are evaluated in an approximate computer arithmetic 
known as floating-point arithmetic. This often yields 
very accurate results, but examples of simple expres- 
sions are known for which double and even higher pre- 
cision calculation gives an answer in which even the 
sign is wrong for certain input values. Furthermore, 
such failures can occur without any outward indication 
of trouble. In addition, values of input variables may 
not be known exactly, thus increasing the uncertainty 
in the accuracy of outputs. The use of interval arith- 
metic (abbreviated IA) provides a computational way to 
attack these problems [11]. 

The basic quantities in interval arithmetic are finite closed real intervals $X = [x_1, x_2]$, which represent all real numbers $x$ such that $x_1 \le x \le x_2$. Arithmetic operations $\circ$ on intervals are defined by

$$X \circ Y = \{x \circ y : x \in X,\ y \in Y\},$$

again an interval, division by an interval containing zero excluded. Library functions $\phi$ are similarly extended to interval functions $\Phi$ such that $\phi(x) \in \Phi(X)$ for all $x \in X$, with $\Phi(X)$ expected to be an accurate inclusion of the range $\phi(X)$ of $\phi$ on $X$. Thus, if $f(x)$ is a function defined by a code list, then assignment of the interval value $X$ to the input variable and evaluation of the entries in interval arithmetic yields the output $F(X)$ such that $f(x) \in F(X)$ for all $x \in X$. The interval function $F$ obtained in this way is called the united extension of $f$ [11].

In the floating-point version of interval arithmetic, 
all endpoints are floating-point numbers and hence ex- 
actly representable in the computer. Results of arith- 
metic operations and calls of library functions are 
rounded outwardly (upper endpoints up, lower endpoints down) to the closest or very close floating-point numbers to maintain the guarantee of inclusion. Thus, one is still certain that for the interval extension $F$ of $f$ actually computed, $f(x) \in F(X)$ for all $x \in X$. For example, an output interval $F(X)$ which is very wide for a point input interval $X = [x, x]$ would serve as
a warning that the algorithm is inappropriate or ill- 
conditioned, in contrast to the lack of such information 
in ordinary floating-point arithmetic. 
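Interval operations with outward rounding can be sketched in portable C++. The sketch assumes IEEE 754 round-to-nearest arithmetic. Instead of switching the hardware rounding mode, it computes each endpoint normally and then moves it one floating-point number outward with `std::nextafter`. This keeps the inclusion guarantee for +, −, · and /, at the cost of intervals slightly wider than directed rounding would give. Library functions are not covered, and the type and function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Closed interval [lo, hi] with floating-point endpoints.
struct Interval { double lo, hi; };

// Widen a round-to-nearest result by one floating-point number on each side.
// The exact endpoint lies within half a unit in the last place of the computed
// one, so one step outward encloses it (a substitute for directed rounding).
Interval outward(double lo, double hi) {
    const double inf = std::numeric_limits<double>::infinity();
    return {std::nextafter(lo, -inf), std::nextafter(hi, inf)};
}

Interval operator+(Interval x, Interval y) { return outward(x.lo + y.lo, x.hi + y.hi); }
Interval operator-(Interval x, Interval y) { return outward(x.lo - y.hi, x.hi - y.lo); }

// The extremes of x*y and x/y over the two intervals occur at endpoint pairs.
Interval operator*(Interval x, Interval y) {
    double p[] = {x.lo * y.lo, x.lo * y.hi, x.hi * y.lo, x.hi * y.hi};
    return outward(*std::min_element(p, p + 4), *std::max_element(p, p + 4));
}
Interval operator/(Interval x, Interval y) {
    assert(y.lo > 0.0 || y.hi < 0.0);  // division by an interval containing 0 excluded
    double q[] = {x.lo / y.lo, x.lo / y.hi, x.hi / y.lo, x.hi / y.hi};
    return outward(*std::min_element(q, q + 4), *std::max_element(q, q + 4));
}

bool contains(Interval x, double v) { return x.lo <= v && v <= x.hi; }
```

For example, summing three copies of [0.1, 0.1] gives an interval that contains both the double nearest 0.3 and the rounded sum 0.1 + 0.1 + 0.1. It therefore also contains the exact sum of the three stored values, which lies between them.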

Automatic differentiation carried out in interval 
arithmetic is called interval automatic differentiation. 
Interval computation has numerous implications for 
optimization, with or without automatic differentiation [6]. For example, maxima and minima of functions can ‘slip through’ approximate sampling of values at points of the floating-point grid, but they have to be contained in the computable interval inclusion $F(X)$ of $f(x)$ over the same interval region $X$.

Although interval arithmetic properly applied can 
solve many optimization and other computational 
problems, a word of warning is in order. The properties 
of interval arithmetic differ significantly from those of 
real arithmetic, and simple ‘plugging in’ of intervals for 
numbers will not always yield useful results. In partic- 
ular, interval arithmetic lacks additive and multiplica- 
tive inverses, and multiplication is only subdistributive 
across addition, $X(Y + Z) \subseteq XY + XZ$ [11]. A real algorithm which uses one or more of these properties of real
arithmetic is usually inappropriate for interval compu- 
tation, and should be replaced by one that is suitable if 
possible. 
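Two of these differences are easy to check with small integer endpoints, for which the operations below are exact and no rounding is involved. With X = [1, 2], X − X = [−1, 1] rather than [0, 0]. With X = [−1, 1], Y = [1, 1] and Z = [−1, −1], one gets X(Y + Z) = [0, 0] but XY + XZ = [−2, 2]. The minimal interval type `Iv` and its helpers are a hypothetical sketch.

```cpp
#include <algorithm>

// Minimal interval [lo, hi] without outward rounding; with small integer
// endpoints every operation below is exact.
struct Iv { double lo, hi; };

Iv add(Iv x, Iv y) { return {x.lo + y.lo, x.hi + y.hi}; }
Iv sub(Iv x, Iv y) { return {x.lo - y.hi, x.hi - y.lo}; }
Iv mul(Iv x, Iv y) {
    double a = x.lo * y.lo, b = x.lo * y.hi, c = x.hi * y.lo, d = x.hi * y.hi;
    return {std::min({a, b, c, d}), std::max({a, b, c, d})};
}

// No additive inverse: X - X for X = [1, 2] is [-1, 1], not [0, 0].
Iv self_difference() {
    Iv X{1, 2};
    return sub(X, X);
}

// Subdistributivity: X(Y + Z) is strictly contained in XY + XZ here.
Iv distributed_left() {   // X(Y + Z) = [0, 0]
    Iv X{-1, 1}, Y{1, 1}, Z{-1, -1};
    return mul(X, add(Y, Z));
}
Iv distributed_right() {  // XY + XZ = [-2, 2]
    Iv X{-1, 1}, Y{1, 1}, Z{-1, -1};
    return add(mul(X, Y), mul(X, Z));
}
```

An algorithm that relies on cancellation such as x − x = 0, or on rewriting x(y + z) as xy + xz, can thus produce needlessly wide intervals.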

To this point, automatic differentiation has been ap- 
plied only to code lists, which programmers customar- 
ily refer to as ‘straight-line code’. Automatic differenti- 
ation also applies to subroutines and programs, which 
ordinarily contain loops and branches in addition to 
expressions. These latter present certain difficulties in 
many cases. A loop which is traversed a fixed num- 
ber of times can be ‘unrolled,’ and thus is equivalent to 
straight-line code. However, in case the stopping crite- 
rion is based on result values, the derivatives may not 
have achieved the same accuracy as the function val- 
ues. For example, if the inverse function of a known 
function is being computed by iterative solution of the 
equation $f(x) = y$ for $x = f^{-1}(y)$, then automatic differentiation should be applied to $f$ and the derivative of the inverse function obtained from the standard formula $(f^{-1})'(y) = (f'(x))^{-1}$. Branches essentially produce
piecewise defined functions, and automatic differentia- 
tion then provides the derivative of the function defined 
by whatever branch is taken. This can create difficul- 
ties as described by H. Fischer [4, pp. 43-50], especially 
since a smooth function can be approximated well in 
value by highly oscillatory or other nonsmooth func- 
tions such as result from table lookups and piecewise 
rational approximations. For example, one would not 
expect to obtain an accurate approximation to the co- 
sine function by applying automatic differentiation to 
the library subroutine for the sine. As with any powerful 
tool, automatic differentiation should not be expected 
to provide good results if applied indiscriminately, es- 
pecially to ‘legacy’ code. As with interval arithmetic, au- 
tomatic differentiation will yield the best results if ap- 
plied to programs written with it in mind. 
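The recommended treatment of an inverse function can be sketched as follows. The sketch assumes the hypothetical example f(x) = x³ + x, which is strictly increasing. Newton's method solves f(x) = y, using a minimal forward-mode dual number for f'(x). The derivative of the inverse is then formed from (f⁻¹)'(y) = (f'(x))⁻¹ at the converged x, rather than by differentiating the iteration itself. The names `Dual`, `cubic` and `invert` are illustrative.

```cpp
#include <cmath>

// Minimal forward-mode AD value: v = value, d = derivative.
struct Dual { double v, d; };
Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.v * b.d + a.d * b.v}; }

// Hypothetical known function f(x) = x^3 + x (strictly increasing).
Dual cubic(Dual x) { return x * x * x + x; }

struct InverseResult {
    double x;     // x = f^{-1}(y)
    double dinv;  // (f^{-1})'(y) = 1 / f'(x)
};

// Solve f(x) = y by Newton's method with f'(x) supplied by AD, then take the
// inverse derivative from the formula instead of differentiating the loop.
InverseResult invert(double y, double x0 = 0.0) {
    double x = x0;
    for (int it = 0; it < 50; ++it) {
        Dual fx = cubic(Dual{x, 1.0});  // seed dx/dx = 1
        double step = (fx.v - y) / fx.d;
        x -= step;
        if (std::fabs(step) <= 1e-15 * (1.0 + std::fabs(x))) break;
    }
    Dual fx = cubic(Dual{x, 1.0});
    return {x, 1.0 / fx.d};
}
```

For y = 2 the solution is x = 1, and the derivative of the inverse is 1/f'(1) = 1/4, whatever stopping tolerance the Newton loop used.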

Current state-of-the-art software for point automatic differentiation of programs includes ADOL-C, for programs written in C/C++ [5], and ADIFOR, for programs in Fortran 77 [1, pp. 385-392].

Numerous applications of automatic differentiation 
to optimization and other problems can be found in the 
conference proceedings [1,4], which also contain exten- 
sive bibliographies. An important result with implica- 
tions for optimization is that automatic differentiation 
can be used to obtain Newton steps without forming 
Jacobians and solving linear systems, see [1, pp. 253- 
264]. 

From a historical standpoint, the principles of au- 
tomatic differentiation go back to the early days of cal- 
culus, but implementation is a product of the computer 
age, hence the designation ‘automatic’. The terminology ‘algorithmic differentiation’, to which the acronym AD also applies, is perhaps better. Since differentiation is widely understood, automatic differentiation literature contains many anticipations and rediscoveries. The 1962 Stanford Ph.D. thesis of R.E. Moore deals with both interval arithmetic and automatic differentiation of code lists to obtain Taylor coefficients of series solutions of systems of ordinary differential equations. In 1964, R.E. Wengert [15] published on automatic differentiation of code lists and noted that derivatives could be recovered from Taylor coefficients. Early results in automatic differentiation were applied to code lists in forward mode, as described in [13]. G. Kedem [8] showed that automatic differentiation applies to subroutines and programs, again in forward mode. The reverse mode was anticipated by S. Linnainmaa in 1976 [10] and in the Ph.D. thesis of B. Speelpenning (Illinois, 1980), and was published in more complete form by M. Iri in 1984 [7]. Automatic differentiation via operator overloading and the concept of differentiation arithmetics, which are commutative rings
with identity, were introduced by L.B. Rall [9, pp. 291- 
309], [14, pp. 73-90], [4, pp. 17-24]. For additional in- 
formation about the early history of automatic differ- 
entiation, see [13] and the article by Iri [4, pp. 3-16] for 
later developments. 

Analysis of algorithms for automatic differentiation 
has been carried out on the basis of graph theory by Iri 
[7], A. Griewank [12, pp. 128-161], [3], and equivalent 
matrix formulation by Rall [2, pp. 233-240]. 
Automatic Differentiation: Point and Interval Taylor Operators

Frequently of use in optimization problems, automatic differentiation may be used to generate Taylor coefficients. Specialized software tools generate Taylor series
approximations, one term at a time, more efficiently 
than the general AD software used to compute (par- 
tial) derivatives. Through the use of operator overload- 
ing, these tools provide a relatively easy-to-use interface 
that minimizes the complications of working with both 
point and interval operations. 


Introduction 


First, we briefly survey the tools of automatic differenti- 
ation and operator overloading used to compute point- 
and interval-valued Taylor coefficients. We assume that 


$f$ is an analytic function $f: \mathbb{R} \to \mathbb{R}$. Automatic differentiation (AD or computational differentiation) is the process of computing the derivatives of a function $f$ at a point $t = t_0$ by applying rules of calculus for differentiation [9,10,17,18]. One way to implement AD uses
overloaded operators. 


Operator Overloading 


An overloaded (or generic) operator invokes a proce- 
dure corresponding to the types of its operands. Most 
programming languages implement this technique for 
arithmetic operations. The sums of two floating point 
numbers, two integers, or one floating point number 
and one integer are computed using three different 
procedures for addition. Fortran 77 and C deny the
programmer the ability to replace or modify the vari- 
ous routines used implicitly for integer, floating point, 
or mixed-operand arithmetic, but Fortran 95, C++, 
and Ada support operator overloading for user-defined 
types. Once we have defined an overloaded operator 
for each rule of differentiation, AD software performs 
those operations on program code for f, as shown be- 
low. The operators either propagate derivative values 
or construct a code list for their computation. We give 
prototypical examples of operators overloaded to prop- 
agate Taylor coefficients below. 


Automatic Differentiation 


The AD process requires that we have f in the form of 
an algorithm (e.g. computer program) so that we can 
easily separate and order its operations. For example, 
given $f(t) = e^t/(2 + t)$, we can express $f$ as an algorithm in Fortran 95 or in C++ (using an assumed AD module or class), as shown in Fig. 1.

In this section, we use AD to compute first deriva- 
tives. In the next section, we extend to point- and 
interval-valued Taylor series. To understand the AD 
process, we parse the program above into a sequence 
of unary and binary operations, called a code list, com- 
putational graph, or ‘tape’ [9]: 


$$x_0 = t_0, \qquad x_1 = \exp(x_0), \qquad x_2 = 2 + x_0, \qquad x_3 = \frac{x_1}{x_2}.$$
program Example1
  use AD_Module
  type(AD_Independent) :: t
  type(AD_Dependent) :: f
  t = AD_Independent(0)
  f = exp(t)/(2 + t)
end program Example1

#include "AD_class.h"

int main (void) {
  AD_Independent t(0);
  AD_Dependent f;
  f = exp(t)/(2 + t);
  return 0;
}

Automatic Differentiation: Point and Interval Taylor Operators, Figure 1
Fortran and C++ calls to AD operators


Differentiation is a simple mechanical process for 
propagating derivative values. Let t = to represent the 
value of the independent variable with respect to which 
we differentiate. We know how to take the derivative of 
a variable, a constant, and unary and binary operations 
(i.e. +, -, *, /, sin, cos, exp, etc.). Then AD software
annotates the code list: 


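$$
\begin{aligned}
x_0 &= t_0; & \nabla x_0 &= 1,\\
x_1 &= \exp(x_0); & \nabla x_1 &= \exp(x_0) \cdot \nabla x_0,\\
x_2 &= 2 + x_0; & \nabla x_2 &= 0 + \nabla x_0,\\
x_3 &= \frac{x_1}{x_2}; & \nabla x_3 &= \frac{\nabla x_1 - x_3 \cdot \nabla x_2}{x_2}.
\end{aligned}
$$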

AD propagates values of derivatives, not expres- 
sions as symbolic differentiation does. AD values are 
exact (up to round-off), not approximations of unknown quality as are finite differences. For more information regarding AD and its applications, see [2,8,9,10,17,18], or the bibliography [21].


AD software can use overloaded operators in two 
different ways. Operators can propagate both the value 
$x_i$ and its derivative $\nabla x_i$, as suggested by the annotated
code list above. This approach is easy to understand and 
to program. We give prototypical Taylor operators of 
this flavor below. 

The second approach has the operators construct 
and store the code list. Various optimizations and par- 
allel scheduling [1,4,12] may be applied to the code list. 
Then the code list is interpreted to propagate deriva- 
tive values. This is the approach of AD tools such 
as ADOL-C [11], ADOL-F [20], AD01 [16], or INTOPT_90 [13]. The second approach is much more
flexible, allowing the code list to be traversed in either 
the forward or reverse modes of AD (see [9]) or with 
various arithmetics (e.g. point- or interval-valued se- 
ries). 

AD may be applied to functions of more than one 
variable, in which partial derivatives with respect to 
each are computed in turn, and to vector functions, 
in which the component functions are differentiated 
in succession. In addition, we can compute higher or- 
der derivative values. One application of AD involving 
higher order derivatives of f is the computation of Tay- 
lor (series) coefficients to which we turn in the next sec- 
tion. 

Source code transformation is a third approach to 
AD software used by ATOMFT [5] for Taylor series and 
by ADIFOR [3], PADRE2 [14], or Odyssée [19] for par- 
tial derivatives. Such tools accept the algorithm for f as 
data, rather than for execution, and produce code for 
computing the desired derivatives. The resulting code 
often executes more rapidly than code using overloaded 
operators. 


Taylor Coefficients 


We define the Taylor coefficients of the analytic function $f$ at the point $t = t_0$:

$$(f|t_0)_i := \frac{1}{i!}\,\frac{d^i f(t)}{dt^i}\bigg|_{t = t_0}$$

for $i = 0, 1, \ldots$, and let $F := ((f|t_0)_i)$ denote the vector of Taylor coefficients. Then Taylor’s theorem says that there exists some point $\tau$ (usually not practically obtainable) between $t$ and $t_0$ such that

$$f(t) = \sum_{i=0}^{p} (f|t_0)_i\,(t - t_0)^i + \frac{1}{(p+1)!}\,\frac{d^{p+1} f(\tau)}{dt^{p+1}}\,(t - t_0)^{p+1}. \tag{1}$$


Computation of Taylor coefficients requires differ- 
entiation of f. We generate Taylor coefficients automat- 
ically using recursion formulas for unary and binary 
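operations. For example, the recurrences we need for our example $f(t) = e^t/(2 + t)$ are

$$
\begin{aligned}
x(t) = \exp u(t) \;\Rightarrow\; x' = x u': \quad & (x)_0 = \exp (u)_0, \quad (x)_i = \sum_{j=0}^{i-1}\Big(1 - \frac{j}{i}\Big)(x)_j\,(u)_{i-j}, \quad i \ge 1,\\
x(t) = u(t) + v(t): \quad & (x)_i = (u)_i + (v)_i,\\
x(t) = \frac{u(t)}{v(t)}: \quad & (x)_i = \frac{1}{(v)_0}\Big((u)_i - \sum_{j=0}^{i-1}(x)_j\,(v)_{i-j}\Big).
\end{aligned}
$$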

The recursion relations are described in more de- 
tail in [17]. Except for + and —, each recurrence follows 
from Leibniz’ rule for the Taylor coefficients of a prod- 
uct. The relations can be viewed as a lower triangular 
system. The recurrence represents a solution by for- 
ward substitution, but there are sometimes accuracy or 
stability advantages in an iterative solution to the lower 
triangular system. The recurrences for each operation 
can be evaluated in floating-point, complex, interval, or 
other appropriate arithmetic. 

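To compute the formal series for $f(t) = e^t/(2 + t)$ expanded at $t = 0$,

$$
\begin{aligned}
X_0 &= (t_0, 1, 0, \ldots) = (0, 1, 0, \ldots),\\
X_1 &= \exp X_0 = \big(1, 1, \tfrac{1}{2}, \tfrac{1}{6}, \ldots\big),\\
X_2 &= 2 + X_0 = (2, 1, 0, \ldots),\\
X_3 &= \frac{X_1}{X_2} = \big(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{48}, \ldots\big).
\end{aligned}
\tag{2}
$$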

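#include <cmath>

typedef double Value_type;  // or an interval type; or make Taylor a template

class Taylor {
private:
    static const int Max_Length = 20;
    Value_type coef[Max_Length];
public:
    Taylor ( Value_type t_0 ) {
        // Constructor for independents: coefficients (t_0, 1, 0, ...)
        coef[0] = t_0; coef[1] = 1;
        for (int i = 2; i < Max_Length; i++)
            { coef[i] = 0; }
    }
    Taylor ( void ) {
        // Constructor for dependents
        for (int i = 0; i < Max_Length; i++)
            { coef[i] = 0; }
    }
    Taylor ( const Taylor &U ) {
        // Copy constructor
        for (int i = 0; i < Max_Length; i++)
            { coef[i] = U.coef[i]; }
    }
    friend Taylor operator + (int u, Taylor V) {
        // Adding a constant changes only the 0th coefficient
        V.coef[0] += u; return V;
    }
    friend Taylor operator / (Taylor U, Taylor V) {
        // (x)_i = ((u)_i - sum_{j<i} (x)_j (v)_{i-j}) / (v)_0
        Taylor X;
        for (int i = 0; i < Max_Length; i++) {
            Value_type sum = U.coef[i];
            for (int j = 0; j < i; j++)
                { sum -= X.coef[j] * V.coef[i-j]; }
            X.coef[i] = sum / V.coef[0];
        }
        return X;
    }
    friend Taylor exp (Taylor U) {
        // (x)_0 = exp (u)_0, (x)_i = sum_{j<i} (1 - j/i) (x)_j (u)_{i-j}
        Taylor X;
        X.coef[0] = std::exp(U.coef[0]);
        for (int i = 1; i < Max_Length; i++)
            for (int j = 0; j < i; j++)
                { X.coef[i] += (1 - Value_type(j) / i) * X.coef[j] * U.coef[i-j]; }
        return X;
    }
    Value_type getCoef (int i)
        { return coef[i]; }
}; // end class Taylor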


Point and Interval Taylor Operators 


As foreshadowed by this example, we define an abstract 
data type for Taylor series and use operator overloading 
to define actions on objects of that type using previously 
defined floating-point and interval operations. 
Design of Operators


In this section, we give prototypical operators for the di- 
rect propagation of Taylor coefficients such as might be 
called from code similar to that shown in Fig. 1. Direct 
propagation of values works by translating each opera- 
tion into a call to the appropriate AD routine at compile 
time. Thus, simply compiling the source code for f and 
linking it with the overloaded operator routines creates 
a program that computes the Taylor coefficients of f at 
$t = t_0$. For illustration, we provide only a stripped-down prototype with operators required for the example $f(t) = e^t/(2 + t)$. We suppress issues of references and the like that are essential to the design of a useful class. See [6] for a description of a set of interval Taylor operators in Ada.

If instead, operators for AD_type record a code list, 
then an interpreter reads each node from the code list 
and calls the appropriate operator from class Taylor: 


Taylor Operand[MemSize];
for (int i = 0; i < CodeSize; i++) {
    Node = getNextOperation ();
    switch (Node.OpCode) {
    case PLUS : Operand[Node.Result]
                    = Operand[Node.Left]
                    + Operand[Node.Right];
                break;
    // ... other operations ...
    case EXP :  Operand[Node.Result]
                    = exp ( Operand[Node.Left] );
                break;
    }
}
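The interpreter loop above is a fragment. Below is a self-contained sketch of the same idea. It uses hypothetical opcodes, a `Node` layout chosen for illustration, and `std::vector<double>` in place of class Taylor. It records the code list for f(t) = e^t/(2 + t) and evaluates it with the exp, constant-addition and quotient recurrences given earlier.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Taylor coefficients (x)_0, ..., (x)_{p-1}; stands in for class Taylor.
using Series = std::vector<double>;

// Hypothetical opcodes and node layout for a recorded code list.
enum OpCode { PLUS, ADD_CONST, DIV, EXP };
struct Node {
    OpCode op;
    int result, left, right;  // operand slots (right unused for unary ops)
    double c;                 // constant for ADD_CONST
};

// (x)_i = (u)_i + (v)_i   (u and v have the same length)
Series series_plus(const Series& u, const Series& v) {
    Series x(u.size());
    for (std::size_t i = 0; i < u.size(); ++i) x[i] = u[i] + v[i];
    return x;
}

// Adding a constant changes only (x)_0.
Series series_add_const(double c, Series v) {
    v[0] += c;
    return v;
}

// (x)_i = ((u)_i - sum_{j<i} (x)_j (v)_{i-j}) / (v)_0
Series series_div(const Series& u, const Series& v) {
    Series x(u.size());
    for (std::size_t i = 0; i < u.size(); ++i) {
        double sum = u[i];
        for (std::size_t j = 0; j < i; ++j) sum -= x[j] * v[i - j];
        x[i] = sum / v[0];
    }
    return x;
}

// (x)_0 = exp (u)_0, (x)_i = sum_{j<i} (1 - j/i) (x)_j (u)_{i-j}
Series series_exp(const Series& u) {
    Series x(u.size(), 0.0);
    x[0] = std::exp(u[0]);
    for (std::size_t i = 1; i < u.size(); ++i)
        for (std::size_t j = 0; j < i; ++j)
            x[i] += (1.0 - double(j) / double(i)) * x[j] * u[i - j];
    return x;
}

// Interpret a code list; slot 0 holds the independent variable, and the
// result slot of the last node is returned.
Series interpret(const std::vector<Node>& code, const Series& t, int mem_size) {
    std::vector<Series> operand(mem_size, Series(t.size(), 0.0));
    operand[0] = t;
    for (const Node& n : code) {
        switch (n.op) {
        case PLUS:      operand[n.result] = series_plus(operand[n.left], operand[n.right]); break;
        case ADD_CONST: operand[n.result] = series_add_const(n.c, operand[n.left]); break;
        case DIV:       operand[n.result] = series_div(operand[n.left], operand[n.right]); break;
        case EXP:       operand[n.result] = series_exp(operand[n.left]); break;
        }
    }
    return operand[code.back().result];
}

// Code list for f(t) = e^t / (2 + t): x1 = exp(x0), x2 = 2 + x0, x3 = x1 / x2.
std::vector<Node> example_code_list() {
    return {{EXP, 1, 0, -1, 0.0}, {ADD_CONST, 2, 0, -1, 2.0}, {DIV, 3, 1, 2, 0.0}};
}
```

Interpreting `example_code_list()` with the independent variable `Series{0.0, 1.0, 0.0, 0.0}` and four operand slots should reproduce 1/2, 1/4, 1/8, 1/48 from Equation (2), up to rounding. Because the list is data, the same recording could be re-interpreted with an interval element type.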


Use of Interval Operators 
We have mentioned the possibility of working with interval values but not the significance of doing so. From equation (1), for an interval $\mathbf{t}$ and for all $t \in \mathbf{t}$,

$$f(t) \in \sum_{i=0}^{p} (f|t_0)_i\,(t - t_0)^i + \frac{1}{(p+1)!}\,\frac{d^{p+1} f(\mathbf{t})}{dt^{p+1}}\,(t - t_0)^{p+1}. \tag{3}$$
Automatic Differentiation: Point and Interval Taylor Operators, Figure 2
Taylor series enclosures of f


In a computer implementation, the summation is 
done in interval arithmetic to ensure enclosure. The 
series Taylor coefficients $(f|t_0)_i$ are narrow intervals whose width comes only from outward rounding. The remainder term is the Taylor coefficient $(f|\mathbf{t})_{p+1}$, where the recurrence relations are evaluated in interval arithmetic. The series (3) can be used to bound the range of $f$, for validated quadrature [7], or for rigorous solution of ODEs [15]. For the example $f(t) = e^t/(2 + t)$, we repeat the sequence of computations of Equation (2) for the interval $t_0 = [0, 0]$ and for $\mathbf{t} = [-1, 1]$:


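$$
\begin{aligned}
((f|[0])_i) &= \big(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{48}, \ldots\big),\\
((f|[-1, 1])_i) &= ([0.12, 2.72], [-2.59, 2.68], [-2.64, 4.04], \ldots).
\end{aligned}
$$

Assembling these according to (3) yields enclosures for all $t \in [-1, 1]$:

$$
\begin{aligned}
f(t) &\in (f|[-1, 1])_0 = [0.12, 2.72]\\
&\in (f|[0])_0 + (f|[-1, 1])_1\,(t - 0) = \tfrac{1}{2} + [-2.59, 2.68]\,t\\
&\in (f|[0])_0 + (f|[0])_1\,(t - 0) + (f|[-1, 1])_2\,(t - 0)^2 = \tfrac{1}{2} + \tfrac{t}{4} + [-2.64, 4.04]\,t^2.
\end{aligned}
$$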
To demonstrate the true power of this approximation
technique, we plot the corresponding 5, 10, and 20 term 
enclosures in Fig. 2. 
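The enclosure (3) can be sketched with a small interval type and the same recurrences evaluated in interval arithmetic. Here t0 = 0, and the remainder coefficient is computed over **t** = [−1, 1]. This sketch rounds to nearest rather than outwardly, so it is illustrative rather than rigorous. Its coefficients over [−1, 1] need not reproduce the printed endpoints exactly, since these depend on how the recurrences are arranged in a given implementation; for instance, the first-order coefficient comes out narrower in this arrangement. Only the zeroth coefficient and the enclosure property are checked. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Interval [lo, hi]; rounding to nearest only (no outward rounding).
struct Ival { double lo, hi; };

Ival operator+(Ival a, Ival b) { return {a.lo + b.lo, a.hi + b.hi}; }
Ival operator-(Ival a, Ival b) { return {a.lo - b.hi, a.hi - b.lo}; }
Ival operator*(Ival a, Ival b) {
    double p[] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}
Ival operator/(Ival a, Ival b) {  // assumes 0 is not in b
    double q[] = {a.lo / b.lo, a.lo / b.hi, a.hi / b.lo, a.hi / b.hi};
    return {*std::min_element(q, q + 4), *std::max_element(q, q + 4)};
}
Ival scale(double c, Ival a) {    // c * a for a real number c
    return c >= 0.0 ? Ival{c * a.lo, c * a.hi} : Ival{c * a.hi, c * a.lo};
}
Ival iexp(Ival a) { return {std::exp(a.lo), std::exp(a.hi)}; }  // exp is increasing

// First n interval Taylor coefficients of f(t) = e^t / (2 + t) at the
// interval T, via the exp and quotient recurrences in interval arithmetic.
std::vector<Ival> taylor_f(Ival T, int n) {
    const Ival zero{0.0, 0.0}, one{1.0, 1.0};
    std::vector<Ival> u(n, zero), v(n, zero), w(n, zero), x(n, zero);
    u[0] = T;                      // independent variable: (T, 1, 0, ...)
    v[0] = Ival{2.0, 2.0} + T;     // v = 2 + t: (2 + T, 1, 0, ...)
    if (n > 1) { u[1] = one; v[1] = one; }
    w[0] = iexp(u[0]);             // w = exp(u)
    for (int i = 1; i < n; ++i)
        for (int j = 0; j < i; ++j)
            w[i] = w[i] + scale(1.0 - double(j) / i, w[j] * u[i - j]);
    for (int i = 0; i < n; ++i) {  // x = w / v
        Ival sum = w[i];
        for (int j = 0; j < i; ++j) sum = sum - x[j] * v[i - j];
        x[i] = sum / v[0];
    }
    return x;
}

// Enclosure (3) with t0 = 0 and remainder taken over [-1, 1]:
// sum_{i=0}^{p} (f|0)_i t^i + (f|[-1,1])_{p+1} t^{p+1}, for t in [-1, 1].
Ival enclosure(double t, int p) {
    std::vector<Ival> c = taylor_f(Ival{0.0, 0.0}, p + 1);
    std::vector<Ival> r = taylor_f(Ival{-1.0, 1.0}, p + 2);
    Ival sum{0.0, 0.0};
    double tk = 1.0;  // t^i
    for (int i = 0; i <= p; ++i, tk *= t) sum = sum + scale(tk, c[i]);
    return sum + scale(tk, r[p + 1]);  // tk now equals t^{p+1}
}
```

The zeroth coefficient over [−1, 1] is [e⁻¹/3, e] ≈ [0.12, 2.72], as printed above. With the point expansion at 0, the coefficients reduce to those of Equation (2).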


One-at-a-Time Coefficient Generation 


The Taylor operators described in the preceding section accept vectors of $p$ Taylor coefficients for operands $u$ and $v$ and return Taylor coefficients for result $x$ with complexity $O(p^2)$. However, for applications such as ODEs or order-adaptive quadrature, the entire operand series is not known, and we need to compute terms one at a time [6]. For example, for the DE

$$u'(t) = \frac{\exp(u)}{2 + t}, \qquad u(0) = 1, \tag{4}$$

the initial condition $u(0) = 1$ implies $(u|0)_0 = 1$, and the DE $u' = \exp(u)/(2 + t)$ implies

$$
\begin{aligned}
(u|0)_1 &= u'(0) = \frac{\exp(1)}{2} = \frac{e}{2},\\
u'' &= \frac{u'\exp(u)(2 + t) - \exp(u)}{(2 + t)^2} \quad\text{implies}\quad (u|0)_2 = \frac{u''(0)}{2} = \frac{(e/2)\cdot e \cdot 2 - e}{2 \cdot 4} = \frac{e(e - 1)}{8},
\end{aligned}
$$

etc.


Successive terms can be computed by interpreting 
the code list for f(t, u) repeatedly for series of increasing 
length for u. Each iteration of the automatic generation 
process yields an additional Taylor coefficient. Unfortu- 
nately, a simple implementation of Taylor operators has 
complexity $O(p^3)$ because already known coefficients of
u’ are recomputed. However, since the order of oper- 
ations is the same in each iteration, we can increase 
the efficiency of the computations by storing interme- 
diate results [6]. Each overloaded operator routine calls 
a memory allocation procedure that refers it to the next 
space in an array. If that space is empty, we store Tay- 
lor coefficient values for that variable. Otherwise, the 
space must contain the previously computed Taylor co- 
efficients of that variable, which we can then use to 
more quickly compute the next coefficient in the set. 
With clever book-keeping, we compute p floating-point
or interval-valued Taylor coefficients one at a time in
O(p^2) time.
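A minimal sketch of one-at-a-time generation for the example u' = exp(u)/(2 + t), u(0) = 1, assuming the same exp and division recurrences as in the previous sketch. Instead of the article's array-based memory allocation, it simply keeps the partial series of exp(u) and of the quotient between iterations, so step k costs O(k) operations and p coefficients cost O(p^2) in total. The function name ode_taylor_coeffs is hypothetical.

```python
import math

def ode_taylor_coeffs(p):
    """First p Taylor coefficients at t = 0 of the solution of
    u' = exp(u) / (2 + t), u(0) = 1, generated one at a time."""
    u = [1.0]           # (u|0)_0 from the initial condition
    w = []              # coefficients of exp(u), extended by one per step
    q = []              # coefficients of exp(u) / (2 + t), extended by one per step
    v = [2.0, 1.0]      # coefficients of 2 + t (a finite series)
    for k in range(p - 1):
        # k-th coefficient of exp(u): needs only u_0 .. u_k, all known by now
        if k == 0:
            w.append(math.exp(u[0]))
        else:
            w.append(sum(j * u[j] * w[k - j] for j in range(1, k + 1)) / k)
        # k-th coefficient of the quotient, reusing the stored q_0 .. q_{k-1}
        s = w[k] - sum(v[j] * q[k - j] for j in range(1, min(k, len(v) - 1) + 1))
        q.append(s / v[0])
        # u' = q  implies  (u|0)_{k+1} = q_k / (k + 1)
        u.append(q[k] / (k + 1))
    return u

print(ode_taylor_coeffs(3))  # [1.0, e/2, e(e-1)/8] in floating point
```

A naive version would rebuild w and q from scratch in every iteration, which is where the O(p^3) cost comes from.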


Trade-Offs 


We may strive for three goals when writing software for 
point and interval Taylor operations: storage space effi- 
ciency, time efficiency, and ease of use. These three fac- 
tors are often at odds with each other. 

Carefully implemented operator overloading provides an easy-to-use interface with reasonable time and space efficiency. We may achieve greater time and space efficiency by using source code transformation.

In conclusion, automatic differentiation through 
Taylor operators shows merit as a technique for computing guaranteed interval enclosures of a function f. Further efforts to refine this technique may pro-
vide us with a tool that handles multivariate functions, 
and runs significantly faster thanks to parallelization 
and improved optimization techniques. 
Keywords


Automatic differentiation; Root problem; Branch problem

Automatic differentiation is a method in which a pro-
gram for evaluating a function f is transformed into an- 
other program that evaluates both the function f and 
some of its derivatives. The key idea is the repeated 
use of the chain-rule for composing the derivatives of 
f from derivatives of parts of f. For more about auto- 
matic differentiation (AD), consult [2,3,5]. 

Proper combinations of differentiable functions 
produce differentiable functions. Some combinations 
of nondifferentiable functions also produce differen- 
tiable functions. Therefore the mere fact that a program 
defines a differentiable function is no guarantee that 
AD will work. Here we investigate two cases, where AD, 
applied to a program for a differentiable function, fails. 

The root problem arises when a square-root is com- 
bined with other functions so that the resulting func- 
tion is differentiable but the chain-rule is not applicable 
for certain arguments. 

The branch problem arises when a program for eval- 
uating a differentiable function f employs statements of 
the form IF B(x) THEN S1 ELSE S2, where x is from the domain of f, B is a Boolean function, and S1 and S2 represent subprograms. This reflects a piece-wise definition
of the function f, and the derivative of one or the other 
piece may be quite different from the derivative of the 
function f. 


Root Problem 


An example that is typical of the root problem is shown in Table 1. The program P defines the function

$$f\colon \mathbb{R}^2 \to \mathbb{R}$$

with

$$f(x) = \sqrt{x_1^4 + x_2^4}.$$

This function is differentiable at any x ∈ R², in particular f'(0) = [0, 0]. Standard AD (in the forward mode) transforms P into a program P' by inserting assignment statements for derivatives in proper places (see Table 2).

The program P' is supposed to compute f(x) and f'(x). But for x = 0 it does not compute the correct value f'(0) = [0, 0]; rather, it fails because of division by zero.

One can easily see that this failure is not limited to the forward mode, because the reverse mode encounters the same division-by-zero problem. Symbolic manipulation packages such as MAPLE also fail to produce f'(0).


Automatic Differentiation: Root Problem and Branch Problem, Table 1
Program P for evaluating f at x

input: x = (x1, x2) ∈ R²
y1 ← x1
y2 ← x2
y3 ← y1^4
y4 ← y2^4
y5 ← y3 + y4
y6 ← √y5
f(x) ← y6
output: f(x)


Automatic Differentiation: Root Problem and Branch Problem, Table 2
Program P' for evaluating f and f' at x

input: x = (x1, x2) ∈ R²
y1 ← x1            y1' ← [1, 0]
y2 ← x2            y2' ← [0, 1]
y3 ← y1^4          y3' ← 4·y1^3·y1'
y4 ← y2^4          y4' ← 4·y2^3·y2'
y5 ← y3 + y4       y5' ← y3' + y4'
y6 ← √y5           y6' ← y5'/(2·y6)
f(x) ← y6          f'(x) ← y6'
output: f(x)       output: f'(x)


Automatic Differentiation: Root Problem and Branch Problem, Table 3
Program Q for evaluating f at x

input: x ∈ D ⊂ R^n
y1 ← A(x)
y2 ← √y1
y3 ← B(x, y2)
f(x) ← y3
output: f(x)
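The failure of P' at x = 0 is easy to reproduce with a minimal forward-mode sketch. The Dual class and its methods pow4 and sqrt are our own illustration, chosen to mirror the steps of Table 2, not an AD tool from the references: each intermediate y_k carries its gradient, and the last step divides by 2·y6, which is zero at the origin.

```python
import math

class Dual:
    """Forward-mode AD value: a number plus its gradient w.r.t. (x1, x2)."""

    def __init__(self, val, grad):
        self.val, self.grad = val, tuple(grad)

    def __add__(self, o):
        return Dual(self.val + o.val, [a + b for a, b in zip(self.grad, o.grad)])

    def pow4(self):
        # y' = 4 y^3 y'
        return Dual(self.val ** 4, [4 * self.val ** 3 * g for g in self.grad])

    def sqrt(self):
        r = math.sqrt(self.val)
        # y' = y5' / (2 sqrt(y5)): raises ZeroDivisionError when y5 == 0
        return Dual(r, [g / (2 * r) for g in self.grad])

def program_p_prime(x1, x2):
    """Program P' of Table 2: returns (f(x), f'(x))."""
    y1 = Dual(x1, (1.0, 0.0))
    y2 = Dual(x2, (0.0, 1.0))
    y3 = y1.pow4()
    y4 = y2.pow4()
    y5 = y3 + y4
    y6 = y5.sqrt()
    return y6.val, y6.grad

print(program_p_prime(1.0, 0.0))   # (1.0, (2.0, 0.0))
# program_p_prime(0.0, 0.0) raises ZeroDivisionError, although f'(0) = [0, 0]
```

Away from the origin the sketch returns the correct gradient; at x = (0, 0) both the value and the gradient of y5 are zero, so the chain rule for the square root breaks down even though f itself is differentiable there.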
A more general setting for the root problem is shown in Table 3. Here, it is assumed that:
1) D_A is a nonempty open subset of R^n;
2) the function A: D_A ⊂ R^n → R is differentiable;
3) D_B is a nonempty open subset of R^{n+1};
4) the function B: D_B ⊂ R^{n+1} → R is differentiable;
5) D := {x ∈ D_A : A(x) ≥ 0, (x, √A(x)) ∈ D_B};
6) D is nonempty.
The program Q defines the function

$$f\colon D \subset \mathbb{R}^n \to \mathbb{R}$$
with
$$f(x) = B\bigl(x, \sqrt{A(x)}\bigr).$$


Standard AD (in the forward mode) transforms Q into a program Q'. The steps of Q' in evaluating f'(x) can be seen in the formula

$$f'(x) = B_1(x, y_2) + B_2(x, y_2)\cdot\frac{1}{2\sqrt{A(x)}}\cdot A'(x), \qquad y_2 = \sqrt{A(x)},$$

where [B_1(x, y_2), B_2(x, y_2)] is an appropriate partition of B'(x, y_2). For x ∈ D with A(x) > 0, the program Q' will produce f'(x). And for x ∈ D with A(x) = 0, the program Q' fails because of division by zero. The case in which x* ∈ D with A(x*) = 0 is ambiguous: it says nothing about the existence of f'(x*). In this case, we distinguish the following four situations:

A) f'(x*) does not exist, for instance n = 2, A(x) = x1² + x2², B(x, y) = y, x* = 0.

B) A alone guarantees existence of f'(x*), for instance n = 2, A(x) = x1⁴ + x2⁴, x* = 0.

C) B alone guarantees existence of f'(x*), for instance B(x, y) = y².

D) A and B together guarantee existence of f'(x*), for instance n = 2, A(x) = x1² + x2², B(x, y) = x1·x2·y, x* = 0.

What can be done to resolve the root problem? The use of AD tools for higher derivatives may be helpful. Consider the simple case n = 1, A ∈ C^∞, D_B = R^{n+1}, B(x, y) = y. So we have

$$D := \{x : x \in D_A,\ A(x) \ge 0\}$$

and f: D ⊂ R → R with f(x) = √A(x).

Assume that for x ∈ R it can be decided whether or not x ∈ D, for instance by testing x in a program for evaluating A.


For x* ∈ D, we require the value of the derivative f'(x*). Below, we list the relevant implications:

• A(x*) > 0 ⇒ f'(x*) = A'(x*)/(2√A(x*)).
• A(x*) = 0 ⇒ no answer possible.
• A(x*) = 0, A'(x*) ≠ 0 ⇒ f'(x*) does not exist.
• A(x*) = 0, A'(x*) = 0 ⇒ no answer possible.
• A(x*) = 0, A'(x*) = 0, A''(x*) ≠ 0 ⇒ f'(x*) does not exist.
• A(x*) = 0, A'(x*) = 0, A''(x*) = 0 ⇒ no answer possible.
• A(x*) = 0, A'(x*) = 0, A''(x*) = 0, A'''(x*) ≠ 0 ⇒ f'(x*) does not exist.
• A(x*) = 0, A'(x*) = 0, A''(x*) = 0, A'''(x*) = 0 ⇒ no answer possible.
• A(x*) = 0, A'(x*) = 0, A''(x*) = 0, A'''(x*) = 0, A^(4)(x*) > 0 ⇒ f'(x*) = 0.
• A(x*) = 0, A'(x*) = 0, A''(x*) = 0, A'''(x*) = 0, A^(4)(x*) < 0 ⇒ f'(x*) does not exist.
• A(x*) = 0, A'(x*) = 0, A''(x*) = 0, A'''(x*) = 0, A^(4)(x*) = 0 ⇒ no answer possible.

Let n ∈ {1, 2, 3, ...} and A^(k)(x*) = 0 for k = 0, ..., 2n.

• A^(2n+1)(x*) ≠ 0 ⇒ f'(x*) does not exist.
• A^(2n+1)(x*) = 0, A^(2n+2)(x*) > 0 ⇒ f'(x*) = 0.
• A^(2n+1)(x*) = 0, A^(2n+2)(x*) < 0 ⇒ f'(x*) does not exist.
• A^(2n+1)(x*) = 0, A^(2n+2)(x*) = 0 ⇒ no answer possible.
For a nonstandard treatment of these implications see [6]. Of course, in the general situation given in Table 3, the classification of cases is more problematic.
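For the simple case n = 1, f = √A, the implications above amount to a small decision procedure on the first nonvanishing derivative of A at x*. The sketch assumes the values A^(k)(x*) are available exactly (for instance from a Taylor-mode AD tool), applies only the rules listed, and reports a point with A(x*) < 0 as outside D; the function name classify is hypothetical.

```python
import math

def classify(a):
    """Classify f'(x*) for f = sqrt(A), n = 1, B(x, y) = y.

    a[k] is assumed to hold the exact value A^(k)(x*), k = 0, ..., len(a) - 1.
    Returns the derivative as a number, or a string describing the outcome.
    """
    if a and a[0] < 0:
        return "x* not in D"
    # index of the first nonzero derivative among those supplied
    k = next((i for i, ak in enumerate(a) if ak != 0), None)
    if k is None:
        return "no answer possible"        # all supplied derivatives vanish
    if k == 0:                             # A(x*) > 0: ordinary chain rule
        if len(a) < 2:
            return "no answer possible"    # A'(x*) is needed for the value
        return a[1] / (2 * math.sqrt(a[0]))
    if k % 2 == 1 or k == 2:               # A behaves like c (x - x*)^k near x*
        return "does not exist"
    return 0.0 if a[k] > 0 else "does not exist"   # even k >= 4

print(classify([0.0, 0.0, 0.0, 0.0, 24.0]))  # A(x) = x^4 at x* = 0: prints 0.0
```

With only finitely many derivatives available, the procedure can return "no answer possible" exactly where the list above does.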


Branch Problem 


A typical example for the branch problem is Gauss elimination for solving a system of linear equations with parameters. For illustrative purposes, it suffices to consider two equations with a two-dimensional parameter x (see Table 4). Here, it is assumed that:

a) D is a nonempty open subset of R²;

b) the function M: D ⊂ R² → R^{2×2} is differentiable;

c) the function R: D ⊂ R² → R² is differentiable;

d) x ∈ D ⇒ the matrix M(x) is regular.

The program GAUSS defines the function

$$F\colon D \subset \mathbb{R}^2 \to \mathbb{R}^2$$
with
$$M(x)\cdot F(x) = R(x).$$
Automatic Differentiation: Root Problem and Branch Problem, Table 4
Program GAUSS for evaluating F at x

input: x ∈ D
M11 ← M11(x)
M12 ← M12(x)
M21 ← M21(x)
M22 ← M22(x)
R1 ← R1(x)
R2 ← R2(x)
IF M11 ≠ 0 THEN
S1: E ← M21/M11
    M22 ← M22 − E·M12
    R2 ← R2 − E·R1
    F2 ← R2/M22
    F1 ← (R1 − M12·F2)/M11
ELSE
S2: F2 ← R1/M12
    F1 ← (R2 − M22·F2)/M21

output: F(x) = (F1, F2)


Since the matrix M(x) is regular for x ∈ D, the program GAUSS and the function F are well-defined. Furthermore, the function F is differentiable.

Standard AD (in the forward mode) transforms GAUSS into a new program GAUSS' by inserting assignment statements for derivatives in proper places. The resulting program GAUSS' is also well-defined, and for x ∈ D it is supposed to produce F(x) and F'(x).

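Now choose

$$D = \{x \in \mathbb{R}^2 : 0 < x_1 < 2,\ 0 < x_2 < 2\}$$

and

$$M(x) = \begin{pmatrix} M_{11}(x) & M_{12}(x) \\ M_{21}(x) & M_{22}(x) \end{pmatrix} = \begin{pmatrix} x_1 - x_2 & 1 \\ 10 & x_1 + x_2 \end{pmatrix},$$

$$R(x) = \begin{pmatrix} R_1(x) \\ R_2(x) \end{pmatrix} = \begin{pmatrix} 100(x_1 + 2x_2) \\ 100(x_1 - 2x_2) \end{pmatrix}.$$

It is easy to see that D is a nonempty open subset of R², that the functions M and R are differentiable, and that M(x) is regular for x ∈ D.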


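At x = (1, 1) we have M11(x) = 0, so GAUSS executes S2, and GAUSS' produces

$$F'(1,1) = \begin{pmatrix} -40 & -90 \\ 100 & 200 \end{pmatrix},$$

but the correct value is

$$F'(1,1) = \begin{pmatrix} -54 & -76 \\ 170 & 130 \end{pmatrix}.$$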


One can easily check that the wrong result is not limited 
to the forward mode, because the reverse mode yields 
exactly the same wrong result. 

To better understand the situation we define

$$D_1 := \{x : x \in D,\ M_{11}(x) \ne 0\},$$
$$D_2 := \{x : x \in D,\ M_{11}(x) = 0\}.$$

The program GAUSS can be considered as a piecewise definition of the function F,

$$F(x) = \begin{cases} F(x) \text{ according to S1}, & x \in D_1,\\ F(x) \text{ according to S2}, & x \in D_2. \end{cases}$$


Normally, one is not too concerned about the domain 
of a function. But indeed in this case, we must be con- 
cerned. 

Let F|D1 denote the restriction of F to D1, and let F|D2 denote the restriction of F to D2. Then, of course,

$$F(x) = \begin{cases} (F|_{D_1})(x) & \text{for } x \in D_1,\\ (F|_{D_2})(x) & \text{for } x \in D_2. \end{cases}$$

The domain D1 of the function F|D1 is an open set, x ∈ D1 is an interior point of D1, and hence

$$F'(x) = (F|_{D_1})'(x) \quad \text{for } x \in D_1,$$

and this is the value GAUSS' produces.

The domain D2 of the function F|D2 is too thin: it has no interior points, and hence F|D2 is not differentiable. In other words, the function F|D2 does not provide enough information to obtain F'(x) for x ∈ D2. Thus GAUSS' cannot produce F'(x) for x ∈ D2. What GAUSS' actually presents for F'(x) is the value for the derivative of another function, which is of no interest here. For more see [1].

In [4] it is claimed that the use of a certain branching function method makes the branch problem vanish. This is true in certain cases; in our example, however, the branching function method fails because it encounters division by zero. At least this suggests that something went wrong. For a partial solution to the branch problem, see [1], and for a nonstandard treatment of the branch problem, see [6].

A simple example of the branch problem is shown in the informal program

IF x ≠ 1 THEN f(x) ← x·x
ELSE f(x) ← 1.

This program defines the function

f: R → R with f(x) = x².

Of course, f is differentiable; in particular we have f'(1) = 2.

Standard AD software produces the wrong result 
f'(1) = 0. It is not surprising that symbolic manipula- 
tion packages produce the same wrong result. Here it 
is obvious that the else-branch does not carry enough 
information for computing the correct f’(1). 
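A hand-written forward-mode version of this informal program makes the mechanism visible. It is a sketch, and the function name f_forward is hypothetical: at x = 1 only the ELSE branch is differentiated, and the derivative of the constant 1 is 0.

```python
def f_forward(x):
    """Forward-mode AD of: IF x != 1 THEN f <- x*x ELSE f <- 1.
    Returns (f(x), f'(x)) as standard AD would compute them."""
    v, dv = x, 1.0                      # seed dx/dx = 1
    if v != 1:
        return v * v, dv * v + v * dv   # product rule for x*x
    return 1.0, 0.0                     # a constant carries derivative 0

print(f_forward(3.0))   # (9.0, 6.0)
print(f_forward(1.0))   # (1.0, 0.0), although f'(1) = 2
```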

Sometimes branching is done to save work. Consider the function

$$f\colon D \subset \mathbb{R}^n \to \mathbb{R}$$
with
$$f(x) = s(x) + c(x)\cdot E(x),$$

where D is an open set. The real-valued functions s, c, E may be given explicitly or by subroutines. Assume that f(x) has to be evaluated many times for varying x, that c(x) = 0 for many interesting values of x, and that E(x) is computationally costly. Then it is effective to set up a program for computing f(x) as shown in Table 5.

Assume that the functions s, c, E are differentiable. Then f is differentiable too. For given x ∈ D we ask for f'(x).

Standard AD (in the forward mode) transforms SW 
into a new program by inserting assignment statements 
concerning derivatives. The resulting program SW’ is 
well-defined, and for given x € D it is supposed to pro- 
duce f(x) and f’(x). 

Define the sets

$$D_1 := \{x : x \in D,\ c(x) \ne 0\},$$
$$D_2 := \{x : x \in D,\ c(x) = 0\}.$$


Automatic Differentiation: Root Problem and Branch Problem, Table 5
Program SW for computing f(x)

input: x ∈ D
c(x) ← ...
IF c(x) ≠ 0 THEN
S1: s(x) ← ...
    E(x) ← ...
    f(x) ← s(x) + c(x)·E(x)
ELSE
S2: s(x) ← ...
    f(x) ← s(x)

output: f(x)


SW' correctly produces f'(x) for x ∈ D1.

Looking at SW, it is tempting to assume

$$f'(x) = s'(x) \quad \text{for } x \in D_2,$$

and SW' actually follows this assumption. But it is clear that

$$f'(x) = s'(x) + E(x)\cdot c'(x) + c(x)\cdot E'(x) \quad \text{for } x \in D,$$

and in particular

$$f'(x) = s'(x) + E(x)\cdot c'(x) \quad \text{for } x \in D_2.$$

If x ∈ D2, and if either E(x) = 0 or c'(x) = 0, then SW' produces the correct f'(x); otherwise SW' fails.
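To see the failure concretely, the sketch below hand-differentiates SW with the hypothetical choices s(x) = x, c(x) = x − 1 and E(x) = x², which are not from the article. Then f(x) = x³ − x² + x, and at x = 1 ∈ D2 the correct derivative s'(1) + E(1)·c'(1) = 2 differs from the value s'(1) = 1 produced by SW'.

```python
def sw_forward(x):
    """Forward-mode version of program SW (Table 5) with the hypothetical
    s(x) = x, c(x) = x - 1, E(x) = x**2; returns (f(x), f'(x))."""
    c, dc = x - 1.0, 1.0
    if c != 0.0:                       # S1: x in D1
        s, ds = x, 1.0
        E, dE = x * x, 2.0 * x
        return s + c * E, ds + dc * E + c * dE
    s, ds = x, 1.0                     # S2: x in D2, E(x) is never evaluated
    return s, ds

def f_prime_exact(x):
    # f(x) = x + (x - 1) x^2 = x^3 - x^2 + x, so f'(x) = 3x^2 - 2x + 1
    return 3 * x * x - 2 * x + 1

print(sw_forward(2.0), f_prime_exact(2.0))   # (6.0, 9.0) 9.0
print(sw_forward(1.0), f_prime_exact(1.0))   # (1.0, 1.0) 2.0
```

Skipping E in S2 is harmless for the value f(x) but discards the term E(x)·c'(x) that the derivative still needs.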
The traditional numerical analysis considers optimization algorithms which guarantee some accuracy for all functions to be optimized. This includes the exact algorithms (that is, the worst-case analysis). Limiting the maximal error requires a computational effort that often increases exponentially with the size of the problem. An alternative is average case analysis, where the average error is made as small as possible. The average is taken over a set of functions to be optimized. The average case analysis is called the Bayesian approach (BA) [7,14].
There are several ways of applying the BA in optimization. The direct Bayesian approach (DBA) is defined by fixing a prior distribution P on a set of functions f(x) and by minimizing the Bayesian risk function R(x) [6,14]. The risk function describes the average deviation from the global minimum. The distribution P is regarded as a stochastic model of f(x), x ∈ R^m, where f(x) might be a deterministic or a stochastic function. In the Gaussian case, assuming (see [14]) that the (n + 1)th observation is the last one,

$$R(x) = \frac{1}{\sqrt{2\pi}\, s_n(x)} \int_{-\infty}^{+\infty} \min(c_n, z)\, e^{-\frac{1}{2}\left(\frac{z - m_n(x)}{s_n(x)}\right)^2} dz. \qquad (1)$$

Here, c_n = min_i z_i − ε, z_i = f(x_i), m_n(x) is the conditional expectation given the values of z_i, i = 1, ..., n, d_n(x) = s_n²(x) is the conditional variance, and ε > 0 is a correction parameter.
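Because (1) is the expectation of min(c_n, Z) for Z normally distributed with mean m_n(x) and standard deviation s_n(x), it has a closed form in terms of the standard normal distribution. The sketch below implements that closed form (a standard identity, not a formula quoted from the article) and checks it against direct quadrature of (1); the function names are hypothetical.

```python
import math

def risk(m, s, c):
    """Closed form of (1): E[min(c, Z)] for Z ~ N(m, s^2)."""
    if s == 0.0:
        return min(c, m)                       # degenerate case: Z = m
    a = (c - m) / s
    Phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))          # normal CDF at a
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)   # normal PDF at a
    return c - (c - m) * Phi - s * phi

def risk_quadrature(m, s, c, half_width=10.0, steps=20000):
    """Midpoint-rule evaluation of (1) on [m - 10 s, m + 10 s], for checking."""
    lo, h = m - half_width * s, 2.0 * half_width * s / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h
        total += min(c, z) * math.exp(-0.5 * ((z - m) / s) ** 2)
    return total * h / (math.sqrt(2.0 * math.pi) * s)

print(risk(0.3, 0.7, -0.2), risk_quadrature(0.3, 0.7, -0.2))
```

Since R(x) < c_n whenever s_n(x) > 0, minimizing R(x) favors points with a low conditional mean or a large conditional variance.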

The objective of DBA (used mainly in continuous 
cases) is to provide as small average error as possible 
while keeping the convergence conditions. 

The Bayesian heuristic approach (BHA) means fixing a prior distribution P on a set of functions f_K(x) that define the best values obtained using K times some heuristic h(x) to optimize a function v(y) of variables y ∈ R^n [15]. Usually, the components of y are discrete variables. The heuristic h(x) defines an expert opinion about the decision priorities. It is assumed that the heuristics or their ‘mixture’ depend on some continuous parameters x ∈ R^m, where m < n.

The Bayesian stopping rules (BSR) [3] define the 
best on average stopping rule. In the BSR, the prior dis- 
tribution is determined regarding only those features of 
the objective function f(x) which are relevant for the 
stopping of the algorithm of global optimization. 

Now all these ways will be considered in detail, starting from the DBA. The Wiener process is a common stochastic model [11,16,19] when applying the DBA in the one-dimensional case m = 1.

The Wiener model implies that almost all the sample functions f(x) are continuous, that increments f(x4) − f(x3) and f(x2) − f(x1), x1 < x2 < x3 < x4, are stochastically independent, and that f(x) is Gaussian (0, σx) at any fixed x > 0. Note that the Wiener process originally provided a mathematical model of a particle in Brownian motion.
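For the one-dimensional Wiener model, the conditional mean and variance between two neighboring observations are given by the Brownian-bridge formulas: linear interpolation of the z_i, and variance σ²(x − x_i)(x_{i+1} − x)/(x_{i+1} − x_i). One DBA step can therefore be sketched as a search for the minimizer of (1). The grid search, the parameter values and the function names are illustrative assumptions, a simplification of the auxiliary optimization problem rather than the method of [14].

```python
import math

def risk(m, s, c):
    """Closed form of (1): E[min(c, Z)] for Z ~ N(m, s^2)."""
    if s == 0.0:
        return min(c, m)
    a = (c - m) / s
    Phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    return c - (c - m) * Phi - s * phi

def wiener_moments(x, xs, zs, sigma2=1.0):
    """Conditional mean and variance of a Wiener-model f at x given f(xs[i]) = zs[i].

    Assumes xs is sorted and xs[0] <= x <= xs[-1] (Brownian-bridge formulas)."""
    for i in range(len(xs) - 1):
        xa, xb = xs[i], xs[i + 1]
        if xa <= x <= xb:
            t = (x - xa) / (xb - xa)
            mean = zs[i] + t * (zs[i + 1] - zs[i])
            var = sigma2 * (x - xa) * (xb - x) / (xb - xa)
            return mean, var
    raise ValueError("x lies outside the observed interval")

def dba_next_point(xs, zs, eps=0.1, sigma2=1.0, grid=1000):
    """Grid search for the minimizer of R(x), i.e. the next observation point."""
    c = min(zs) - eps                          # c_n = min_i z_i - eps
    best_x, best_r = None, math.inf
    for k in range(grid + 1):
        x = min(xs[0] + (xs[-1] - xs[0]) * k / grid, xs[-1])
        m, d = wiener_moments(x, xs, zs, sigma2)
        r = risk(m, math.sqrt(max(d, 0.0)), c)
        if r < best_r:
            best_x, best_r = x, r
    return best_x, best_r

print(dba_next_point([0.0, 0.4, 1.0], [0.5, -0.3, 0.8]))
```

At observed points the conditional variance is zero and R equals c_n, so the minimizer always lies strictly between observations, where the model is still uncertain.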

The Wiener model has been extended to the multidimensional case, too [14]. However, simple approximate stochastic models are preferable if m > 1. These models are designed by replacing the traditional Kolmogorov consistency conditions, because those conditions require the inversion of matrices of nth order for computing the conditional expectation m_n(x) and variance d_n(x). The favorable exception is the Markov process, including the Wiener one. When the Wiener process is extended to m > 1, the Markovian property disappears.

Replacing the regular consistency conditions by:

• continuity of the risk function R(x);

• convergence of x_n to the global minimum;

• simplicity of expressions of m_n(x) and s_n(x),

the following simple expression of R(x) is obtained using the results of [14]:


$$R(x) = \min_{1 \le i \le n} z_i - \min \ldots$$


The aim of the DBA is to minimize the expected deviation. In addition, DBA has some good asymptotic properties, too. It is shown in [14] that

$$\frac{d^*}{d_a} = \left(\frac{f_a - f^* + \varepsilon}{\varepsilon}\right)^{1/2}, \qquad n \to \infty,$$

where d^* is the density of x_i around the global optimum f^*, d_a and f_a are the average density of x_i and the average value of f(x), and ε is the correction parameter in expression (1). That means that DBA provides convergence to the global minimum for any continuous f(x) and a greater density of observations x_i around the global optimum if n is large. Note that the correction parameter ε has an influence similar to that of the temperature in simulated annealing. However, that is a superficial similarity. Using DBA, the good asymptotic behavior should be regarded just as an interesting ‘by-product’. The reason is that Bayesian decisions are applied to small samples, where asymptotic properties are not noticeable.

Choosing the optimal point x_{n+1} for the next iteration by DBA, one solves a complicated auxiliary optimization problem, minimizing the expected deviation R(x) from the global optimum (see Fig. 1). That makes the DBA useful mainly for computationally expensive functions of a few (m < 20) continuous variables. This happens in a wide variety of problems, such as maximization of the yield of differential amplifiers, optimization of the mechanical system of a shock absorber, optimization of composite laminates, estimation of parameters of an immunological model and of nonlinear time series, and planning of extremal experiments on thermostable polymeric composition [14].

Bayesian Global Optimization, Figure 1
The Wiener model (observation points x^(3), x^(4), x^(5) and the next point x^(next))

Using DBA the expert knowledge is included by 
defining the prior distribution. In BHA the expert 
knowledge is involved by defining the heuristics and 
optimizing their parameters using DBA. 

If the number of variables is large and the objec- 
tive function is not expensive, the Bayesian heuristic 
approach is preferable. That is the case in many dis- 
crete optimization problems. As usual, these problems 
are solved using heuristics based on an expert opinion.
Heuristics often involve randomization procedures de- 
pending on some empirically defined parameters. The 
examples of such parameters are the initial tempera- 
ture, if the simulated annealing is applied, or the prob- 
abilities of different randomization algorithms, if their 
mixture is used. In these problems, the DBA is a conve- 
nient tool for optimization of the continuous parame- 
ters of various heuristic techniques. That is the Bayesian 
heuristic approach [15]. 

The example of the knapsack problem illustrates the basic principles of BHA in discrete optimization. Given a set of objects j = 1, ..., n with values c_j and weights g_j, find the most valuable collection of limited weight:

$$\max_y v(y) = \sum_{j=1}^{n} c_j y_j \quad \text{s.t.} \quad \sum_{j=1}^{n} g_j y_j \le g.$$

Here the objective function v(y) depends on n Boolean variables y = (y_1, ..., y_n), where y_j = 1 if object j is in the collection, and y_j = 0 otherwise. The well-known greedy heuristic h_j = c_j/g_j is the specific value of object j. The greedy heuristic algorithm, ‘take the greatest feasible h_j’, is very fast, but it may get stuck in some nonoptimal decision.

A way to force the heuristic algorithm out of such nonoptimal decisions is to make decision j with probability r_j = p_x(h_j), where p_x(h_j) is an increasing function of h_j and x = (x_1, ..., x_m) is a parameter vector. The DBA is used to optimize the parameters x by minimizing the best result f_K(x) obtained by applying the randomized heuristic algorithm p_x(h_j) K times. That is the most expensive operation of BHA. Therefore, parallel computation of f_K(x) should be used when possible, reducing the computing time in proportion to the number of parallel processors.

Optimization of x adapts the heuristic algorithm p_x(h_j) to a given problem. Let us illustrate the parameterization of p_x(h_j) using three randomization functions: r_j^l = h_j^l / Σ_i h_i^l, l = 0, 1, ∞. Here, the upper index l = 0 denotes the uniformly distributed component and l = 1 defines the linear component of randomization. The index ∞ denotes the pure heuristics with no randomization, where r_j^∞ = 1 if h_j = max_i h_i and r_j^∞ = 0 otherwise. Here, the parameter x = (x_0, x_1, x_∞) defines the probabilities of using the randomizations l = 0, 1, ∞, respectively. The optimal x may be applied in different but related problems, too [15]. That is very important in ‘on-line’ optimization, adapting the BHA algorithms to some unpredicted changes.
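A sketch of the randomized greedy heuristic with the mixture parameter x = (x_0, x_1, x_∞) described above, assuming positive values and weights. It computes f_K(x) as the best of K runs; the DBA search over x itself is not shown (for the knapsack, which maximizes, one would minimize −f_K(x)). The function names and the three-object instance are hypothetical.

```python
import random

def randomized_greedy(c, g, cap, x, rng):
    """One run of the randomized greedy knapsack heuristic.

    c, g: positive values and weights; cap: the weight limit g;
    x = (x0, x1, x_inf): probabilities of the randomizations l = 0, 1, inf.
    """
    free = set(range(len(c)))
    weight, value = 0, 0
    while True:
        feasible = sorted(j for j in free if weight + g[j] <= cap)
        if not feasible:
            return value
        h = [c[j] / g[j] for j in feasible]            # greedy heuristic h_j
        l = rng.choices([0, 1, "inf"], weights=x)[0]   # pick a randomization
        if l == "inf":                                 # pure heuristic: argmax h_j
            k = max(range(len(feasible)), key=h.__getitem__)
        else:                                          # r_j^l proportional to h_j^l
            k = rng.choices(range(len(feasible)), weights=[hj ** l for hj in h])[0]
        j = feasible[k]
        free.remove(j)
        weight += g[j]
        value += c[j]

def f_K(c, g, cap, x, K, seed=0):
    """Best value found in K independent runs of the randomized heuristic."""
    rng = random.Random(seed)
    return max(randomized_greedy(c, g, cap, x, rng) for _ in range(K))

c, g = [60, 100, 120], [10, 20, 30]
print(f_K(c, g, 50, (0.0, 0.0, 1.0), K=1))    # pure greedy: 160
print(f_K(c, g, 50, (1.0, 0.0, 0.0), K=200))  # uniform randomization
```

On this instance the pure greedy rule gets stuck at 160, while the randomized runs can reach the optimal collection with value 220; BHA would tune x to balance such exploration against the quality of the greedy rule.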

Another simple example of BHA application is trying different permutations of some feasible solution y^0. Then heuristics are defined as the difference h_j = v(y^j) − v(y^0) between the permuted solution y^j and the original one y^0. The well-known simulated annealing algorithm illustrates the parameterization of p_x(h_j) related to a single parameter x. Here the probability of accepting a worse solution is equal to e^{−h_j/x}, where x is the ‘annealing temperature’.

The comparison of BHA with exact branch and bound algorithms solving a set of flow-shop problems is shown by the table from [15]:

R = 100, K = 1, J = 10, S = 10, O = 10
Technique | f_B   | d_B  | x_0  | x_1  | x_∞
BHA       | 6.18  | 0.13 | 0.28 | 0.45 | 0.26
CPLEX     | 12.23 | 0.00 | -    | -    | -

Here S is the number of tools, J is the number of jobs, O is the number of operations, f_B, x_0, x_1, x_∞ are the mean results, d_B is the variance, and ‘CPLEX’ denotes the standard MILP technique truncated after 5000 iterations. The table shows that in the randomly generated flow-shop problems the average make-span obtained by BHA was almost half of that obtained by the exact branch and bound procedure truncated at the same time as BHA. The important conclusion is that stopping the exact methods before they reach the exact solution is not a good way to obtain an approximate solution.

The BHA has been used to solve the batch schedul- 
ing [15] and the clustering (parameter grouping) prob- 
lems. In the clustering problem the only parameter x 
was the initial annealing temperature [8]. 

The main objective of BHA is to improve any given 
heuristic by defining the best parameters and/or the 
best ‘mixtures’ of different heuristics. Heuristic decision 
rules mixed and adapted by BHA often outperform (in 
terms of speed) even the best individual heuristics as 
judged by the considered examples. In addition, BHA 
provides almost sure convergence. However, the final 
results of BHA depend on the quality of the specific
heuristics including the expert knowledge. That means 
the BHA should be regarded as a tool for enhancing the 
heuristics but not for replacing them. 

Many well-known optimization algorithms such as 
genetic algorithms (GA) [10], GRASP [13], and tabu 
search (TS) [14], may be regarded as generalized heuris- 
tics that can be improved using BHA. There are many 
heuristics tailored to fit specific problems. For exam- 
ple, the Gupta heuristic was the best one while applying 
BHA to the flow-shop problem [15]. 

Genetic algorithms [10] are an important ‘source’ of
interesting and useful stochastic search heuristics. It 
is well known [2] that the results of the genetic algo- 
rithms depend on the mutation and cross-over param- 
eters. The Bayesian heuristic approach could be used in 
optimizing those parameters. 

In the GRASP system [13] the heuristic is repeated 
many times. During each iteration a greedy randomized 
solution is constructed and the neighborhood around 
that solution is searched for the local optimum. The 
‘greedy’ component constructs a solution one element
at a time until a solution is constructed. A possible
application of the BHA in GRASP is in optimizing 
a random selection of a candidate to be in the solu- 
tion because different random selection rules could be 
used and their best parameters should be defined. BHA 
might be useful as a local component, too, by randomiz- 
ing the local decisions and optimizing the correspond- 
ing parameters. 

In tabu search the issues of identifying best com- 
binations of short and long term memory and best 
balances of intensification and diversification strategies 
may be obtained using BHA. 

Hence the Bayesian heuristic approach may be
considered when applying almost any stochastic or 
heuristic algorithm of discrete optimization. The 
proven convergence of a discrete search method (see, 
for example, [1]) is an asset. Otherwise, the conver- 
gence conditions are provided by tuning the BHA [15], 
if needed. 

The third way to apply the Bayesian approach is 
the Bayesian stopping rules (BSR) [3]. The first way, the 
DBA, considers a stochastic model of the whole func- 
tion to be optimized. In the BSR the stochastic models 
regard only the features of the objective function which 
are relevant for the stopping of the multistart algorithm. 


In [20] a statistical estimate of the structure of multi- 
modal problems is investigated. The results are applied 
developing BSR for the multistart global optimization 
methods [4,5,18]. 

Besides these three ways, there are other ways to 
apply the Bayesian approach in global optimization. 
For example, the Bayes theorem was used to derive the 
posterior distribution of the values of parameters in 
the simulated annealing algorithm to make an optimal 
choice in the trade-off between small steps in the con- 
trol parameter and short Markov chains and large steps 
and long Markov chains [12]. 

In the information approach [17] a prior distribu- 
tion is considered on the location parameter a of the 
global optimum of a one-dimensional objective function. Then an estimate of a is obtained maximizing
the likelihood function after a number of evaluations of 
the objective function. This estimate is assumed as the 
next search point. For the solution of multidimensional 
problems, it is proposed to transform the problem into 
a one-dimensional problem by means of Peano maps. 
Introduction


After the initial introduction in 1982, Bayesian net- 
works (BN) have quickly developed into a dynamic area 
of research. This is largely due to the special structure of 
Bayesian networks that allows them to be very efficient 
in modeling domains with inherent uncertainty. In ad- 
dition, there is a strong connection between Bayesian 
networks and other adjacent areas of research, includ- 
ing data mining and optimization. 

Bayesian networks have their lineage in statistics, 
and were first formally introduced in the field of arti- 
ficial intelligence and expert systems by Pearl [17] in 
1982 and Spiegelhalter and Knill-Jones [21] in 1984. 
The first real-life applications of Bayesian networks 
were Munin [1] in 1989 and Pathfinder [7] in 1992. 
Since the 1990s, the amount of research in Bayesian 
networks has increased dramatically, resulting in many 
modern applications of Bayesian networks to various 
problems of data mining, pattern recognition, image 
processing and data fusion, engineering, etc. 

Bayesian networks comprise a class of interesting 
special cases, many of which were under consideration
long before the first introduction of Bayesian networks. 
Among such interesting cases are some frequently used 
types of the model simplifying assumptions includ- 
ing naive Bayes, the noisy-OR and noisy-AND mod- 
els, as well as different models with specialized struc-
ture, in particular the time-stamped models, the strictly 
repetitive models, dynamic Bayesian networks, hidden 
Markov models, Kalman filter, Markov chains. Artifi- 
cial neural networks are another subclass of Bayesian 
networks, which has many applications, in particular in 
biology and computer science. 


Definitions 


Based on classical probability calculus, the idea of 
a Bayesian network has its early origins in Bayesian 
statistics. On the other hand, it has an added benefit of 
incorporating the notions of graph theory and networks 
that allows us to visualize the relationships between the 
variables represented by the nodes of a Bayesian net- 
work. In other words, a Bayesian network is a graphical 
model providing a compact representation for commu- 
nicating causal relationships in a knowledge domain. 
Below we introduce two alternative definitions of the 
general notion of a Bayesian network, based on the 
usual concepts of probability and graph theory (e.g. 
joint probability distribution, conditional probability 
distribution; nodes and edges of a graph, a parent of 
a node, a child of a node, etc.). 

Roughly speaking, a Bayesian network can be 
viewed as an application of Bayesian calculus on 
a causal network. More precisely, one can describe 
a Bayesian network as a mathematical model for rep- 
resenting the joint distribution of some set of random 
variables as a graph with the edges characterized by the 
conditional distributions for each variable given its par- 
ents in the graph. 

Given a finite collection of random variables X = 
{X1, X2,..., Xn}, the formal definition of a Bayesian 
network can be stated as follows: 


Definition 1 A Bayesian network is an ordered pair 

(G,D), where 

• The first component G represents a directed acyclic graph with nodes, which correspond to the random variables X1, X2, ..., Xn, and directed arcs, which symbolize conditional dependencies between
the variables. The set of all the arcs of G satisfies 
the following assumption: Each random variable in 
the graph is conditionally independent of its non- 
descendants in G, given its parents in G. 


• The second component D corresponds to the set of parameters that, for each variable X_i, 1 ≤ i ≤ n, define its conditional distribution given its parents in the graph G.


Note that the variables in a Bayesian network can follow discrete or continuous distributions. Clearly, for continuously distributed variables, there is a corresponding conditional probability density function f(x_i | Pa(x_i)) of X_i given its parents Pa(X_i). (From now on we denote by x_i the realization of the corresponding random variable X_i.)

In many real-life applications modeled by Bayesian 
networks the set of states for each variable (node) in 
the network is finite. In the special case when all vari- 
ables have finite sets of mutually exclusive states and 
follow the discrete distributions, the previous definition 
of a Bayesian network can be reformulated in the fol- 
lowing fashion: 


Definition 2 A Bayesian network is a structure that 

consists of the following elements: 

• A collection of variables with a finite set of mutually exclusive states;

• A set of directed arcs between the variables symbolizing conditional dependencies of variables;

• A directed acyclic graph formed by the variables and the arcs between them;

• A potential table Pr(X_i | Pa(X_i)) associated with each variable X_i having a set of parent variables denoted by Pa(X_i).


Observe that we do not require causality in Bayesian networks, i.e. the arcs of a graph do not have to symbolize causal relationships between the variables. However, it is imperative that the so-called d-separation rules implied by the structure are satisfied [12,19]. If variables $X$ and $Y$ are d-separated in a Bayesian network given evidence $e$, then $\Pr(X \mid Y, e) = \Pr(X \mid e)$, i.e. the variables are conditionally independent given the evidence.

Furthermore, the d-separation rules are used to prove one of the key laws of Bayesian networks, the so-called chain rule for Bayesian networks.

The joint probability table $\Pr(X) = \Pr(X_1, X_2, \ldots, X_n)$ sufficiently describes the belief structure on the set $X = \{X_1, X_2, \ldots, X_n\}$ of variables in the model. In particular, for each variable $X_i$, one can easily calculate from the joint probability table the prior probabilities $\Pr(X_i)$ as well as the conditional probability $\Pr(X_i \mid e)$ given evidence $e$. Nevertheless, as the number of variables increases, the joint probability table quickly becomes unmanageably large, since its size grows exponentially with the size $n$ of the variable set. Thus, another representation is needed that describes the belief structure in the model adequately and more efficiently. A Bayesian network over $X = \{X_1, X_2, \ldots, X_n\}$ provides such a representation. In fact, the graph of a Bayesian network gives a compact representation of the conditional dependencies in the network, which allows one to compute the joint probability table from the conditional probabilities specified by the network using the chain rule below.


The Chain Rule for Bayesian Networks [8] 


The joint probability distribution $\Pr(X) = \Pr(X_1, X_2, \ldots, X_n)$ of the variables $X = \{X_1, X_2, \ldots, X_n\}$ in a Bayesian network is given by the formula

$$\Pr(X) = \prod_{i=1}^{n} \Pr(X_i \mid \mathrm{Pa}(X_i)), \qquad (1)$$

where $\mathrm{Pa}(X_i)$ denotes the set of all parents of variable $X_i$.
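As a minimal sketch of the chain rule (1), the Python code below stores a small Bayesian network over binary variables as a dictionary of conditional probability tables and multiplies the relevant entries to obtain joint probabilities. The three-variable network (Rain and Sprinkler as parents of WetGrass) and all its numbers are hypothetical, chosen only for illustration. The sketch also compares the number of stored table entries with the 2^n entries of the full joint table, which is the efficiency argument made above.

```python
from itertools import product

# Hypothetical network over binary variables (values 0/1):
# each entry maps a variable to (list of parents, CPT), where the CPT maps
# a tuple of parent values to Pr(variable = 1 | parents).
NETWORK = {
    "Rain":      ([], {(): 0.2}),
    "Sprinkler": ([], {(): 0.4}),
    "WetGrass":  (["Rain", "Sprinkler"], {
        (0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99,
    }),
}

def joint_probability(network, assignment):
    """Chain rule (1): product over i of Pr(X_i = x_i | Pa(X_i))."""
    prob = 1.0
    for var, (parents, cpt) in network.items():
        p_one = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob

def full_joint(network):
    """Enumerate every configuration; only feasible for small n."""
    names = list(network)
    return {values: joint_probability(network, dict(zip(names, values)))
            for values in product((0, 1), repeat=len(names))}

joint = full_joint(NETWORK)
stored = sum(len(cpt) for _, cpt in NETWORK.values())  # entries kept by the BN
print(joint[(1, 0, 1)])          # Pr(Rain=1, Sprinkler=0, WetGrass=1)
print(sum(joint.values()))       # sums to 1 (up to rounding)
print(stored, "vs", len(joint))  # 6 stored parameters vs 8 joint entries
```

With n binary variables the joint table always has 2^n rows, while the stored tables grow only with the number of parents of each node.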

The chain rule for Bayesian networks also provides an efficient way of updating probabilities when new information about the model is received. There is a variety of types of such new information, i.e. evidence. The two most common types of evidence are findings and likelihood evidence. A finding is evidence that specifies which states are possible for some variables, while likelihood evidence gives a ratio between the probabilities of two given states. Note that some types of evidence, including likelihood evidence, cannot be given in the form of findings.


Cases/Models 


Bayesian networks provide a general framework for a number of specialized models, many of which were identified long before the concept of a Bayesian network was proposed. Such special cases of BNs vary in their graph structures as well as in their probability distributions.

The probability distributions for a Bayesian network can be defined in several ways. In some situations, it is possible to use theoretically well-defined distributions. In others, the probabilities can be estimated from data as frequencies. In addition, purely subjective probability estimates are often used for practical purposes. For instance, when the number of conditional probability distributions to acquire from the data is very large, some simplifying assumptions may be appropriate.

The simplest Bayesian network model is the well-known naive Bayes (or simple Bayes) model [4], which can be summarized as follows:

• The graph structure of the model consists of one hypothesis variable $H$ and a finite set of information variables $I = \{I_1, I_2, \ldots, I_n\}$, with arcs from $H$ to every $I_k$, $1 \le k \le n$. In other words, the variables form a diverging connection, where the hypothesis variable $H$ is a common parent of the variables $I_1, I_2, \ldots, I_n$.

• The probability distributions are given by the values $\Pr(I_k \mid H)$ for every information variable $I_k$, $1 \le k \le n$.

The probability updating procedure based on the naive Bayes model works in the following manner. Given a collection of observations $e_1, e_2, \ldots, e_n$ on the variables $I_1, I_2, \ldots, I_n$, respectively, the likelihood of $H$ given $e_1, e_2, \ldots, e_n$ is computed:

$$L(H \mid e_1, e_2, \ldots, e_n) = \prod_{i=1}^{n} \Pr(e_i \mid H). \qquad (2)$$

Then the posterior probability of $H$ is obtained from the formula

$$\Pr(H \mid e_1, e_2, \ldots, e_n) = C \cdot \Pr(H) \cdot L(H \mid e_1, e_2, \ldots, e_n), \qquad (3)$$

where $C$ is a normalization constant.
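Formulas (2) and (3) translate directly into code. The sketch below is a minimal illustration, assuming a hypothesis variable H with two states and binary information variables; the priors and conditional probabilities are made-up numbers, and the normalization constant C is obtained by dividing by the sum of the unnormalized scores.

```python
def naive_bayes_posterior(prior, likelihood_tables, observations):
    """Posterior Pr(H | e_1..e_n) for a naive Bayes model.

    prior:             dict h -> Pr(H = h)
    likelihood_tables: list of dicts, one per information variable I_k,
                       mapping (h, observed value) -> Pr(I_k = value | H = h)
    observations:      list of observed values e_1..e_n
    """
    unnormalized = {}
    for h, p_h in prior.items():
        # Likelihood (2): product of Pr(e_k | H = h) over all observations.
        likelihood = 1.0
        for table, e in zip(likelihood_tables, observations):
            likelihood *= table[(h, e)]
        unnormalized[h] = p_h * likelihood
    # Formula (3): C = 1 / sum of the unnormalized scores.
    c = 1.0 / sum(unnormalized.values())
    return {h: c * score for h, score in unnormalized.items()}

# Hypothetical example: H in {"faulty", "ok"}, two binary indicators.
prior = {"faulty": 0.1, "ok": 0.9}
tables = [
    {("faulty", 1): 0.8, ("faulty", 0): 0.2, ("ok", 1): 0.1, ("ok", 0): 0.9},
    {("faulty", 1): 0.7, ("faulty", 0): 0.3, ("ok", 1): 0.2, ("ok", 0): 0.8},
]
posterior = naive_bayes_posterior(prior, tables, [1, 1])
print(posterior)
```

Here the two positive indicators raise Pr(H = faulty) from the prior 0.1 to 0.056/0.074, roughly 0.76.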

Another special case of BNs is a model based on the simplifying assumption called noisy-OR [18]. This model can be constructed as follows.

Let $A_1, A_2, \ldots, A_n$ be binary variables listing all parents of a binary variable $B$. Each event $A_i = x$, $x \in \{0, 1\}$, causes $B = x$ except when an inhibitor prevents it, which happens with probability $p_i$, i.e. $\Pr(B = 1 - x \mid A_i = x) = p_i$. Suppose that all inhibitors are independent.
Then the graph of the corresponding Bayesian network is a converging connection with $B$ as the child node of $A_1, A_2, \ldots, A_n$, while the conditional probabilities are given by $\Pr(B = x \mid A_i = x) = 1 - p_i$. Since the inhibitors act independently of each other, when every parent is in state $x$ the effect fails only if all $n$ inhibitors are active:

$$\Pr(B = 1 - x \mid A_1 = x, A_2 = x, \ldots, A_n = x) = \prod_{i=1}^{n} p_i. \qquad (4)$$


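The noisy-OR assumption gives a significant advantage for efficient probability updating, since the number of parameters to be specified (one $p_i$ per parent) increases only linearly with the number of parents, whereas a general conditional probability table for $B$ grows exponentially.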
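To make the parameter saving concrete, the following sketch builds the full conditional probability table of B from just n inhibitor probabilities. It assumes the usual reading of noisy-OR in which B = 1 is caused by the parents that are on (x = 1) and no other cause (leak) exists, so that Pr(B = 0 | a) is the product of p_i over the parents with a_i = 1; this is the x = 1 case of (4) extended to mixed parent configurations. The inhibitor values are hypothetical.

```python
from itertools import product
from math import prod

def noisy_or_table(inhibitors):
    """Return {parent configuration: Pr(B = 1 | configuration)}.

    inhibitors[i] is p_i, the probability that the inhibitor of A_i
    blocks its effect. Inhibitors are assumed independent and no leak
    term is modeled, so B = 1 requires at least one active parent.
    """
    table = {}
    for config in product((0, 1), repeat=len(inhibitors)):
        # B stays off only if every active parent is inhibited
        # (math.prod of an empty sequence is 1, so all-off gives Pr(B = 1) = 0).
        p_off = prod(p for p, a in zip(inhibitors, config) if a == 1)
        table[config] = 1.0 - p_off
    return table

p = [0.1, 0.3, 0.5]          # n = 3 parameters ...
table = noisy_or_table(p)     # ... generate 2**3 = 8 table rows
print(len(table), table[(1, 1, 1)])
```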

The construction complementary to noisy-OR is called noisy-AND. In the noisy-AND model, the graph is a converging connection just as in the noisy-OR model, all the causes are required to be on in order to have an effect, and all the causes have mutually independent random inhibitors. Both noisy-OR and noisy-AND are special cases of a general method called noisy functional dependence.

Many modeling approaches introduce mediating variables into a Bayesian network. One of these methods, called divorcing, is the process of separating the parents $A_1, A_2, \ldots, A_i$ and $A_{i+1}, \ldots, A_n$ of a node $B$ by introducing a mediating variable $C$ as a child of the divorced parent nodes $A_1, A_2, \ldots, A_i$ and a parent of the initial child node $B$. The divorcing of $A_1, A_2, \ldots, A_i$ is possible if the following condition is satisfied:

The set of all configurations of $A_1, A_2, \ldots, A_i$ can be partitioned into the sets $c_1, c_2, \ldots, c_s$ so that for every $1 \le j \le s$, any two configurations $y_1, y_2 \in c_j$ have the same conditional probabilities:

$$\Pr(B \mid y_1, A_{i+1}, \ldots, A_n) = \Pr(B \mid y_2, A_{i+1}, \ldots, A_n). \qquad (5)$$
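Condition (5) can be checked mechanically: two configurations of the divorced parents belong to the same set c_j exactly when their rows of the conditional probability table agree for every configuration of the remaining parents. The sketch below groups configurations this way; each resulting group would become one state of the mediating variable C. The table format (a dict keyed by pairs of parent configurations) and the example numbers are hypothetical.

```python
from collections import defaultdict

def divorcing_partition(cpt, divorced_configs, other_configs):
    """Group configurations y of A_1..A_i with identical rows Pr(B | y, rest).

    cpt maps (y, rest) -> tuple of probabilities over the states of B.
    Returns a list of groups c_1..c_s; len(groups) is the number of
    states the mediating variable C needs.
    """
    groups = defaultdict(list)
    for y in divorced_configs:
        # Signature: B's distribution for every configuration of A_{i+1}..A_n.
        signature = tuple(cpt[(y, rest)] for rest in other_configs)
        groups[signature].append(y)
    return list(groups.values())

# Hypothetical example: B depends on (A1, A2) only through "A1 or A2".
divorced = [(0, 0), (0, 1), (1, 0), (1, 1)]
others = [(0,), (1,)]
cpt = {}
for y in divorced:
    for rest in others:
        on = (y[0] or y[1]) + rest[0]    # number of "active" causes: 0, 1 or 2
        cpt[(y, rest)] = {0: (0.9, 0.1), 1: (0.4, 0.6), 2: (0.1, 0.9)}[on]
groups = divorcing_partition(cpt, divorced, others)
print(groups)   # C needs len(groups) states
```

In this example the four configurations of (A1, A2) collapse into two groups, so C needs only two states.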


Other modeling methods that use mediating variables address undirected relations and situations with expert disagreement. Various types of undirected dependencies, including logical constraints, are represented by adding an artificial child $C$ of the constrained nodes $A_1, A_2, \ldots, A_n$ so that the conditional probability $\Pr(C \mid A_1, A_2, \ldots, A_n)$ emulates the relation. A situation where $k$ experts disagree on the conditional probabilities for different variables $B_1, B_2, \ldots$ in the model can be modeled by introducing a mediating node $M$ with $k$ states $m_1, m_2, \ldots, m_k$, so that the variables $B_1, B_2, \ldots$ on whose probabilities the experts disagree become the only children of the expert node $M$. Another approach to modeling expert disagreement is to introduce alternative models with weights assigned to each model.

An important type of Bayesian network is the so-called time-stamped model [10]. These models represent structures that change over time. By introducing a discrete time stamp, a time-stamped model is partitioned into submodels, one for every unit of time. Each local submodel is called a time slice. The complete time-stamped model consists of all its time slices connected to each other by temporal links.

A strictly repetitive model is a special case of a time-stamped model such that all its time slices have the same structure and all the temporal links are alike. The well-studied hidden Markov models are a special class of strictly repetitive time-stamped models for which the Markov property holds, i.e. given the present, the past is independent of the future.

A hidden Markov model with only one variable in 
each time slice connected to the variables outside the 
time slice is a Kalman filter. Furthermore, a Markov 
chain can be represented as a Kalman filter with only 
one variable in every time slice. It is possible to convert 
a hidden Markov model into a Markov chain by cross- 
multiplying all variables in each time slice. 

Time-stamped models can have either a finite or an infinite horizon. An infinite Markov chain is an example of a time-stamped model with an infinite horizon. Furthermore, repetitive time-stamped models with an infinite horizon are also known as dynamic Bayesian networks. By exploiting the special structure of many repetitive temporal models, they can be represented compactly [2]. Such a special representation can often facilitate the design of efficient updating algorithms.

Artificial neural networks can also be viewed as a special case of Bayesian networks, where the nodes are partitioned into $n$ mutually exclusive layers, and the set of arcs consists of the links from the nodes on layer $i$ to the nodes on layer $i + 1$, $1 \le i \le n - 1$. Layer 1 is usually called the input layer, while layer $n$ is known as the output layer.
Methods


Just as BNs have their roots in statistics, approaches for discovering a BN structure rely on statistical methods. That is why a database of cases is instrumental for discovering the graph configuration of a Bayesian network as well as for probability updating. There are three basic types of approaches to extracting BNs from data: batch learning, adaptation, and tuning.

Batch Learning. Batch learning is the process of extracting information from a database of collected cases in order to establish the graph structure and the probability distributions of a Bayesian network.

Often there are many ways to model a Bayesian network. For example, we may obtain two different probability distributions to model the true distribution of a variable in the network. To make an intelligent choice between the two available distributions, it is important to have an appropriate measure of their accuracy. A logical way to approach this is to assign penalties for a wrong forecast on the basis of a specified distribution. Two widely accepted ways of assigning penalties are the quadratic (Brier) scoring rule and the logarithmic scoring rule.

Given the true distribution $p = (p_1, p_2, \ldots, p_m)$ of a discrete random variable with $m$ states, and some approximate distribution $q = (q_1, q_2, \ldots, q_m)$, the quadratic scoring rule assigns the expected penalty as

$$ES_Q(p, q) = \sum_{i=1}^{m} p_i \Big[ (1 - q_i)^2 + \sum_{j \ne i} q_j^2 \Big]. \qquad (6)$$

The distance between the true distribution $p$ and the approximation $q$ is given by the formula

$$d_Q(p, q) = ES_Q(p, q) - ES_Q(p, p). \qquad (7)$$

Hence, from (6) we have

$$d_Q(p, q) = \sum_{i=1}^{m} (p_i - q_i)^2. \qquad (8)$$

The distance $d_Q(p, q)$ given in (8) is called the Euclidean distance (strictly speaking, it is the squared Euclidean distance between $p$ and $q$).
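A quick numerical check of the step from (6) and (7) to (8): the sketch evaluates the expected quadratic penalty directly from (6) and confirms that the difference (7) coincides with the sum of squared differences. The distributions p and q are arbitrary illustrative values.

```python
def expected_quadratic_score(p, q):
    """ES_Q(p, q) from (6): expected Brier penalty when forecasting q."""
    total = 0.0
    for i, p_i in enumerate(p):
        others = sum(q_j ** 2 for j, q_j in enumerate(q) if j != i)
        total += p_i * ((1.0 - q[i]) ** 2 + others)
    return total

def quadratic_distance(p, q):
    """d_Q(p, q) from (7): excess expected penalty over forecasting p itself."""
    return expected_quadratic_score(p, q) - expected_quadratic_score(p, p)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
direct = sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, q))  # right-hand side of (8)
print(quadratic_distance(p, q), direct)   # both approximately 0.02
```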

The logarithmic scoring rule assigns to each outcome $i$ the corresponding penalty $S_L(q, i) = -\log q_i$. Hence, the expected penalty is calculated as

$$ES_L(p, q) = -\sum_{i=1}^{m} p_i \log q_i. \qquad (9)$$

Applying (7) with $ES_L$ in place of $ES_Q$, we obtain an expression for the distance between the true distribution $p$ and the approximation $q$:

$$d_L(p, q) = \sum_{i=1}^{m} p_i \log \frac{p_i}{q_i}, \qquad (10)$$

which is called the Kullback-Leibler divergence.
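Analogously, the sketch below computes the expected logarithmic penalty (9) and checks that the distance obtained through (7) equals the Kullback-Leibler divergence (10). It assumes natural logarithms and strictly positive q_i; states with p_i = 0 are skipped, following the usual convention 0 log 0 = 0.

```python
import math

def expected_log_score(p, q):
    """ES_L(p, q) from (9); terms with p_i = 0 contribute nothing."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def kl_divergence(p, q):
    """d_L(p, q) from (10)."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
via_scores = expected_log_score(p, q) - expected_log_score(p, p)  # definition (7)
print(via_scores, kl_divergence(p, q))   # equal up to rounding, and > 0
```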

Note that both definitions, the Euclidean distance and the Kullback-Leibler divergence, can easily be extended to the case of continuous random variables. Moreover, both scoring rules, the quadratic and the logarithmic, possess the following useful property: the expected penalty is minimized only by the true distribution. Scoring rules that exhibit this property are called strictly proper. Since the quadratic and the logarithmic scoring rules are strictly proper, the corresponding distance measures $d_Q$ and $d_L$ both satisfy the following:

$$d(p, q) = 0 \quad \text{if and only if} \quad p = q.$$

Different scoring rules and corresponding distance measures for discrete and continuous random variables have been extensively studied in statistics. A comprehensive review of strictly proper scoring rules is given in [6].

Naturally, among several different Bayesian networks that model the situation equally closely, the one of the smallest “size” would be preferred.

Let $M$ denote a Bayesian network over the variable set $X = \{X_1, X_2, \ldots, X_n\}$. Then the size of $M$ is given by

$$\mathrm{Size}(M) = \sum_{i=1}^{n} s(X_i), \qquad (11)$$

where $s(X_i)$ denotes the number of entries in the conditional probability table $\Pr(X_i \mid \mathrm{Pa}(X_i))$, and $\mathrm{Pa}(X_i)$ is the set of parents of $X_i$.
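Formula (11) only requires the number of states of each variable and its parent set: the table Pr(X_i | Pa(X_i)) has one entry per state of X_i and per configuration of its parents. The sketch below assumes exactly that counting convention (all entries counted, not only the free parameters) and uses a hypothetical three-variable network to show how one extra arc enlarges the model.

```python
from math import prod

def network_size(states, parents):
    """Size(M) from (11): total number of CPT entries.

    states:  dict variable -> number of states
    parents: dict variable -> list of parent variables
    """
    return sum(states[v] * prod(states[p] for p in parents[v]) for v in states)

states = {"A": 2, "B": 3, "C": 2}
chain = {"A": [], "B": ["A"], "C": ["B"]}          # A -> B -> C
dense = {"A": [], "B": ["A"], "C": ["A", "B"]}     # adds the arc A -> C
print(network_size(states, chain), network_size(states, dense))
```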

The following measure accounts for both the size of 
the model and its accuracy. 

Given a Bayesian network $M$ over $X$ with the true probability distribution $p$, and an approximate Bayesian network model $N$ with distribution $q$, we define the acceptance measure as

$$a(p, N) = \mathrm{Size}(N) + C \cdot d(p, q), \qquad (12)$$

where $\mathrm{Size}(\cdot)$ is the network size defined by (11), $d(p, q)$ is a distance measure between the probability distributions $p$ and $q$, and $C$ is a positive real constant.

The general approach to batch learning of a Bayesian network from a data set of cases can be summarized as follows:

• Select an appropriate threshold $\tau$ for the distance measure $d(p, q)$ between two distributions;

• Fix a suitable constant $C$ in the definition of the acceptance measure $a(p, N)$;

• Among all Bayesian network models over $X$ with distribution $q$ such that $d(p, q) < \tau$, select the model that minimizes $a(p, N)$.
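The three steps above amount to a constrained minimization over candidate models. In the sketch below, each candidate is assumed to be already described by its size (11) and its distance d(p, q) to the true distribution; how the candidates are generated and how d is estimated from data are outside the sketch. The candidate names, numbers, threshold tau and constant C are all hypothetical.

```python
def select_model(candidates, tau, c):
    """Return the candidate minimizing a(p, N) = Size(N) + C * d(p, q), eq. (12),
    among those with d(p, q) < tau; None if no candidate qualifies.

    candidates: list of (name, size, distance) triples.
    """
    admissible = [(name, size, dist) for name, size, dist in candidates if dist < tau]
    if not admissible:
        return None
    return min(admissible, key=lambda cand: cand[1] + c * cand[2])

candidates = [
    ("empty graph", 6, 0.30),     # tiny but inaccurate: violates the threshold
    ("chain", 14, 0.04),
    ("full", 30, 0.01),
]
best = select_model(candidates, tau=0.1, c=100.0)
print(best)
```

The constant C sets the exchange rate between size and accuracy: with C = 100 the smaller chain model wins (18 versus 31), while with C = 1000 the more accurate full model would be preferred.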

Although simple, this approach raises many practical issues. The data sets in batch learning are usually very large, the model space grows exponentially in the number of variables, there may be missing data in the data set, etc. To extract structure from such data, one often has to employ special heuristics for searching the model space. For instance, causality can be used to cluster the variables according to a causal hierarchy. In other words, we partition the variable set $X$ into subsets $S_1, S_2, \ldots$ so that the arcs satisfy a partial order relation. If we find a model $N$ with distance $d(p, q) < \tau$, the search stops; otherwise we consider a submodel of $N$.


Adaptation. It is often desirable to build a system capable of automatically adapting to different settings. Adaptation is the process of adjusting a Bayesian network model so that it better accommodates newly accumulated cases.

When building a Bayesian network, there is usually uncertainty as to whether the chosen conditional probabilities are correct. This is called second-order uncertainty.

Suppose that we are not sure which of $m$ different conditional probability tables $T_1, T_2, \ldots, T_m$ represents the true distribution $\Pr(X_i \mid \mathrm{Pa}(X_i))$ for some variable $X_i$ in a network. By introducing a so-called type variable $T$ with states $t_1, t_2, \ldots, t_m$ into the graph so that $T$ is a parent of $X_i$, we can incorporate this uncertainty into the network. Then the prior probabilities $\Pr(t_1), \Pr(t_2), \ldots, \Pr(t_m)$ represent our belief about the correctness of the tables $T_1, T_2, \ldots, T_m$, respectively. Next, we set $\Pr(X_i \mid \mathrm{Pa}(X_i), t_j) = T_j$. Our belief about the correctness of the tables is updated each time we receive new evidence $e$. In other words, for the next case, we use the posterior probabilities $\Pr(t_1 \mid e), \ldots, \Pr(t_m \mid e)$ as the new prior probabilities of the tables’ accuracy.
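The belief update over the type variable T is ordinary Bayesian conditioning. As a minimal sketch, assume each case is a fully observed pair (parent configuration, value of X_i), so that the likelihood of the evidence under table T_j is simply the corresponding table entry; the posterior over t_1, ..., t_m then becomes the prior for the next case. The two candidate tables and the observed cases are hypothetical.

```python
def update_type_beliefs(beliefs, tables, case):
    """One Bayesian update of Pr(t_1..t_m) after a fully observed case.

    beliefs: list of current probabilities Pr(t_j)
    tables:  list of candidate tables T_j, each mapping
             (parent_config, x) -> Pr(X_i = x | parents, t_j)
    case:    (parent_config, x) observed in the new case
    """
    unnormalized = [b * table[case] for b, table in zip(beliefs, tables)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Two hypothetical tables for a binary X_i with a single binary parent.
t1 = {(0, 1): 0.1, (0, 0): 0.9, (1, 1): 0.8, (1, 0): 0.2}
t2 = {(0, 1): 0.5, (0, 0): 0.5, (1, 1): 0.5, (1, 0): 0.5}
beliefs = [0.5, 0.5]
for case in [(1, 1), (0, 0), (1, 1)]:
    beliefs = update_type_beliefs(beliefs, [t1, t2], case)  # posterior -> new prior
print(beliefs)
```

After three cases that fit T_1 well, the belief in T_1 rises from 0.5 to about 0.82.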

Sometimes the second-order uncertainty about the 
conditional probabilities cannot be modeled by intro- 
ducing type variables. In such cases, various statistical 
methods can be applied. Normally such methods ex- 
ploit various properties of parameters, such as global 
independence, local independence, etc. 

The property of global independence states that the 
second-order uncertainty for the variables is indepen- 
dent, i.e. the probability tables for the variables can be 
adjusted independently from each other. 

The local independence property holds if and only if for any two different parent configurations $\pi_1, \pi_2$, the second-order uncertainty on $\Pr(A \mid \pi_1)$ is independent of the second-order uncertainty on $\Pr(A \mid \pi_2)$, and the two distributions can be updated independently of each other. In other words, local independence means the independence of the uncertainties of the distributions for different configurations of parents.

The fractional updating scheme [22] is an algorithm for reducing the second-order uncertainty about the distributions based on the received evidence. Suppose that the properties of global and local independence for the second-order uncertainty hold simultaneously. For every configuration $\pi$ of parents of variable $X_i$, the certainty about $\Pr(X_i \mid \pi)$ is expressed through an artificially selected sample size parameter $n_i$, and for any state $x_i^j$ of variable $X_i$ we have a corresponding count $n_i^j = n_i \cdot \Pr(x_i^j \mid \pi)$. After receiving evidence $e$, we compute the probabilities $\Pr(x_i^j, \pi \mid e)$. Then the updated count $n_i^j$ is the sum of $\Pr(x_i^j, \pi \mid e)$ and the old $n_i^j$. Since $n_i = \sum_j n_i^j$, the old sample size parameter $n_i$ becomes $n_i + \Pr(\pi \mid e)$.
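The fractional updating rule can be sketched for a single parent configuration π of X_i. The sketch assumes that the inference step has already produced Pr(x_i^j, π | e) for each state j (passed in as a list); the counts are then incremented and the estimate Pr(x_i^j | π) is recovered as n_i^j / n_i. The initial sample size and the evidence probabilities are hypothetical.

```python
def fractional_update(counts, joint_posteriors):
    """Fractional updating for one parent configuration pi.

    counts:           list of counts n_i^j, one per state x_i^j
    joint_posteriors: list of Pr(x_i^j, pi | e) from inference on the new case
    Returns the new counts; their sum is the new sample size n_i.
    """
    return [n + p for n, p in zip(counts, joint_posteriors)]

def estimate(counts):
    """Current estimate Pr(x_i^j | pi) = n_i^j / n_i."""
    n = sum(counts)
    return [c / n for c in counts]

sample_size = 10.0
prior_table = [0.7, 0.3]                          # initial Pr(x_i^j | pi)
counts = [sample_size * p for p in prior_table]   # n_i^j = n_i * Pr(x_i^j | pi)
# A case in which pi is certain (Pr(pi | e) = 1) and X_i was observed in its second state:
counts = fractional_update(counts, [0.0, 1.0])
print(sum(counts), estimate(counts))
```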

Although efficient in reducing the uncertainty about the distributions, this scheme has some serious drawbacks. In fact, it tends to reduce the second-order uncertainty too fast by overestimating the counts. To avoid this, one can introduce a so-called fading factor $f$. Then, after receiving evidence $e$, the sample size $n_i$ is changed to $f \cdot n_i + \Pr(\pi \mid e)$, and the counts $n_i^j$ are updated to $f \cdot n_i^j + \Pr(x_i^j, \pi \mid e)$. Therefore, the fading factor $f$ ensures that the influence of the past decreases exponentially [16].
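The effect of the fading factor can be seen by tracking the sample size alone. The sketch below assumes that every case makes π certain (Pr(π | e) = 1) and compares f = 1, which is plain fractional updating, with a hypothetical f = 0.9: without fading the sample size grows without bound, while with 0 < f < 1 it approaches 1/(1 - f), because a case seen k steps ago carries weight f^k.

```python
def sample_size_after(cases, f, initial=50.0, pr_pi=1.0):
    """Iterate n_i <- f * n_i + Pr(pi | e) for a number of cases."""
    n = initial
    for _ in range(cases):
        n = f * n + pr_pi
    return n

for f in (1.0, 0.9):
    print(f, [round(sample_size_after(k, f), 3) for k in (10, 100, 1000)])
# f = 1.0 grows linearly (60, 150, 1050); f = 0.9 approaches 1 / (1 - 0.9) = 10
```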

Having described some approaches to adapting a Bayesian network to different settings of distribution parameters, it is equally important to discuss the uncertainty in the graph structure. In many cases, we can
compensate for the variability in the graph structure of 
a Bayesian network just by modifying the parameters of 
distributions in the network. Sometimes it may not be 
sufficient to adjust the distribution parameters in order 
to account for the change in the model. In fact, the dif- 
ference in the graph structure may be so significant that 
it becomes impossible to accurately reflect the situation 
by a mere parameter change. 

There are two main approaches to graph structure adaptation in Bayesian networks. The first method works by collecting cases and re-running the batch learning procedure to update the graph structure. The second method, also known as the expert disagreement approach, works simultaneously with a set of different models and updates the weight of each model according to the evidence. More precisely, suppose there are $m$ alternative models $M_1, M_2, \ldots, M_m$ with corresponding initial weights $w_1, w_2, \ldots, w_m$ that express our confidence in the models. Let $Y$ be some variable in the network. After receiving evidence $e$, we obtain the probabilities $\Pr_i(Y \mid e) := \Pr(Y \mid e, M_i)$ and $\Pr_i(e) := \Pr(e \mid M_i)$ according to each model $M_i$, for $1 \le i \le m$. Then

$$\Pr(Y \mid e) := \sum_{i=1}^{m} w_i \cdot \Pr_i(Y \mid e), \qquad (13)$$

and the updated weights $w_i$ are computed as the probabilities of the corresponding models $M_i$ given the past evidence: $w_i = \Pr(M_i \mid e)$. Hence, by the well-known Bayes formula,

$$w_i = \frac{\Pr(e \mid M_i)\, \Pr(M_i)}{\sum_{j=1}^{m} w_j \Pr_j(e)}, \qquad (14)$$

where $\Pr(M_i)$ is the weight of $M_i$ before the update.
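Formulas (13) and (14) can be sketched as follows, assuming each model M_i has already produced Pr_i(Y | e) (as a list over the states of Y) and the evidence probability Pr_i(e). The combination (13) uses the current weights; (14) then yields the weights for the next case. The two models and their numbers are hypothetical.

```python
def combine_and_reweight(weights, predictions, evidence_probs):
    """Expert-disagreement update for m alternative models.

    weights:        current weights w_1..w_m (summing to 1)
    predictions:    predictions[i] is the list Pr_i(Y | e) over the states of Y
    evidence_probs: evidence_probs[i] is Pr_i(e) = Pr(e | M_i)
    Returns (Pr(Y | e) from (13), updated weights from (14)).
    """
    n_states = len(predictions[0])
    mixture = [sum(w * pred[s] for w, pred in zip(weights, predictions))
               for s in range(n_states)]
    normalizer = sum(w * pe for w, pe in zip(weights, evidence_probs))
    new_weights = [w * pe / normalizer for w, pe in zip(weights, evidence_probs)]
    return mixture, new_weights

weights = [0.5, 0.5]
predictions = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical Pr_i(Y | e)
evidence_probs = [0.02, 0.06]            # hypothetical Pr_i(e)
mixture, weights = combine_and_reweight(weights, predictions, evidence_probs)
print(mixture, weights)
```

The second model explained the evidence three times better, so its weight rises from 0.5 to 0.75.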


Note that the expert disagreement approach to graph 
structure adaptation can be further extended to include 
the adaptation of distribution parameters based on the 
above methods, such as fractional updating. 


Tuning. Tuning is the process of adjusting the distribution parameters so that some prescribed requirements on the model distributions are satisfied. A commonly used approach to tuning is gradient descent on the parameters, similar to training in neural networks.


Let $\tau$ represent the set of parameters that are chosen to be altered. Let $p(\tau)$ denote the current model distribution, and $q$ the target distribution. Suppose $d(p, q)$ represents the distance between the two distributions. The following gradient descent tuning algorithm is given in [9]:

• Compute the gradient of $d(p, q)$ with respect to the parameters $\tau$;

• Select a step size $\alpha > 0$, and let $\Delta\tau = -\alpha \cdot \nabla_\tau d(p, q)(\tau_0)$, where $\tau_0$ is the current parameter value; i.e. give $\tau$ a displacement $\Delta\tau$ in the direction opposite to the gradient of $d(p, q)$ at $\tau_0$;

• Repeat this procedure until the gradient is sufficiently close to zero.
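As a small illustration of these steps, consider a hypothetical two-node network A → B with fixed Pr(B = 1 | A = 1) = 0.9 and Pr(B = 1 | A = 0) = 0.2, where the only tunable parameter τ is Pr(A = 1) and the request is that the marginal of B equal a target q. The sketch uses the squared Euclidean distance (8) as d and approximates the gradient by a central finite difference; both choices are assumptions made for the example, not part of the algorithm in [9].

```python
def marginal_b(tau):
    """Model distribution p(tau) = (Pr(B = 0), Pr(B = 1)) with tau = Pr(A = 1)."""
    p1 = 0.9 * tau + 0.2 * (1.0 - tau)
    return [1.0 - p1, p1]

def distance(p, q):
    """Squared Euclidean distance d_Q from (8)."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def tune(tau, q, alpha=0.5, tol=1e-10, h=1e-6, max_iter=1000):
    for _ in range(max_iter):
        # Gradient of d(p(tau), q) with respect to tau (central difference).
        grad = (distance(marginal_b(tau + h), q)
                - distance(marginal_b(tau - h), q)) / (2 * h)
        # Stop once the gradient is sufficiently close to zero.
        if abs(grad) < tol:
            break
        # Move against the gradient, keeping tau a valid probability.
        tau = min(1.0, max(0.0, tau - alpha * grad))
    return tau

tau = tune(0.9, q=[0.5, 0.5])
print(tau, marginal_b(tau))   # tau close to 3/7, Pr(B = 1) close to 0.5
```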

Evolutionary methods, simulated annealing, expec- 
tation-maximization and non-parametric methods are 
among other commonly used methods for tuning or 
training Bayesian networks. 


Applications 


The concept of a Bayesian network can be interpreted in different contexts. From a statistical point of view, a Bayesian network can be defined as a compact representation of the joint probability distribution over a given set of variables. From a broader point of view, a Bayesian network is a special type of graphical model capable of reflecting causality, as well as updating its beliefs in view of received evidence. All these features make a Bayesian network a versatile instrument that can be used for various purposes, including facilitating communication between humans and computers, extracting hidden information and patterns from data, simplifying decision making, etc.

Due to their special structure, Bayesian networks have found many applications in areas such as artificial intelligence and expert systems, machine learning and data mining. Bayesian networks are used for modeling knowledge in text analysis, image processing, speech pattern analysis, data fusion, engineering, biomedicine, gene and protein regulatory networks, and even meteorology. Furthermore, it has been suggested that inductive inference procedures based on Bayesian networks can be used to introduce inductive reasoning into mathematics, a previously strictly deductive science.

The wide range of applications of Bayesian networks is especially impressive considering that the theory of Bayesian networks has only been around for about a quarter of a century. Next, several examples of recent real-life applications of Bayesian networks are considered to illustrate this point.

Recent research in the field of automatic speech recognition [13] indicates that dynamic Bayesian networks can effectively model hidden features in speech, including articulatory and other phonological features. Both hidden Markov models (HMMs), which are a special case of dynamic Bayesian networks (DBNs), and more general dynamic Bayesian networks have been applied to modeling audio-visual speech recognition. In particular, a paper by A.V. Nefian et al. [15] describes an application of the coupled HMM and the factorial HMM as two suitable statistical models for audio-video integration. The factorial HMM is a generalization of the HMM where the hidden state is represented by a collection of variables, also called factors. These factors, although independent of each other, all affect the observations and hence become connected indirectly. The coupled HMM is a DBN represented as two regular HMMs whose hidden state nodes have links to the hidden state nodes of the next time slice. The coupled HMM has also been applied to model hand gestures, the interaction between speech and hand gestures, etc. In addition, face detection and recognition problems have been studied with the help of Bayesian networks.

Note that different fields of application may call for 
specialized employment of Bayesian network methods, 
and conversely, similar approaches can be successfully 
used in different application areas. For instance, along 
with the applications to speech recognition above, cou- 
pled hidden Markov models have been employed in 
modeling multi-channel EEG (electroencephalogram) 
data. 

An interesting example of the application of a Bayesian network to expert systems is the development of strategies for troubleshooting complex electromechanical systems, presented in [23]. The constructed Bayesian network has the structure of a naive Bayes model. In the decision tree for the troubleshooting model, the utility function is given by the cost of repair. Hence, the goal is to find a strategy minimizing the expected cost of repair.

A recent study [3] describes some applications of Bayesian networks in meteorology from a data mining point of view. A large database of daily observations of precipitation levels and maximum wind speed is collected. The Bayesian network structure is constructed from the meteorological data using various approaches, including a batch learning procedure and simulation techniques. In addition, an important data mining application of Bayesian networks is illustrated by an example of estimating missing data values from the received evidence.


Applications of Bayesian Networks to Data Mining; Naive Bayes. Rapid progress in data collection techniques and data storage has enabled the accumulation of huge amounts of experimental, observational and operational data. As a result, massive data sets containing a large amount of information can be found almost everywhere. A well-known example is the data set containing the observed information about the human genome. The need to quickly and correctly analyze or manipulate such enormous data sets spurred the development of data mining techniques.

Data mining is research aimed at discovering various types of knowledge in large data warehouses. Data mining can also be seen as an integral part of the more general process of knowledge discovery in databases; the two other parts of this process are preprocessing and postprocessing. As seen above, Bayesian networks can also extract knowledge from data, which is called evidence in the Bayesian framework. In fact, Bayesian network techniques can be applied to solve data mining problems, in particular, classification.

Many effective techniques in data mining use methods from other multidisciplinary research areas such as database systems, pattern recognition, machine learning, and statistics. Many of these areas have a close connection to Bayesian networks. Indeed, data mining uses a special case of Bayesian networks, namely naive Bayes, to perform effective classification. In a data mining context, classification is the task of assigning objects to their relevant categories. The incentive for classifying data is to attain a comprehensive understanding of the differences and similarities between objects in different classes.

In the Bayesian framework, the data mining classification problem translates into finding the value of the class variable that maximizes the posterior probability given the observed attributes of the unknown instance. This is called the maximum a posteriori principle. As mentioned earlier, naive Bayes is an example of a simple Bayesian network model.
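A naive Bayes classifier following the maximum a posteriori principle can be sketched in a few lines: class priors and the tables Pr(attribute value | class) are estimated as frequencies from labeled cases, and an unknown instance is assigned the class with the largest posterior score. The add-one (Laplace) smoothing used to avoid zero frequencies and the tiny weather-style data set are assumptions of this sketch, not details from the text.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Collect class counts and per-attribute value counts from categorical data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values = defaultdict(set)             # attribute index -> observed values
    for row, label in zip(rows, labels):
        for k, v in enumerate(row):
            value_counts[(k, label)][v] += 1
            values[k].add(v)
    return class_counts, value_counts, values

def classify(model, row):
    """MAP class: argmax_h Pr(h) * prod_k Pr(row[k] | h), with add-one smoothing."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for h, n_h in class_counts.items():
        score = n_h / total
        for k, v in enumerate(row):
            score *= (value_counts[(k, h)][v] + 1) / (n_h + len(values[k]))
        if score > best_score:
            best, best_score = h, score
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
        ("rain", "cool"), ("overcast", "hot"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]
model = train_naive_bayes(rows, labels)
print(classify(model, ("rain", "mild")), classify(model, ("sunny", "hot")))
```

The normalization constant of (3) is omitted because it does not change which class attains the maximum.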

Like the naive Bayes classifier, classification by way of building suitable Bayesian networks is capable of handling noise in the data as well as missing values. Artificial neural networks can serve as an example of a Bayesian network classifier designed for a special case.


Application to Global and Combinatorial Optimization. In the late 1990s, a number of studies were conducted that described how BN methodology can be applied to solve problems of global and combinatorial optimization. The connection between graphical models (e.g. Bayesian networks) and evolutionary algorithms (applied to optimization problems) was established. In particular, P. Larrañaga et al. combined some techniques for learning BN structure from data with an evolutionary computation procedure called the Estimation of Distribution Algorithm [11] to devise a procedure for solving combinatorial optimization problems. R. Etxeberria and P. Larrañaga proposed a similar approach for global optimization [5].

Another method based on learning and simulation of BNs, known as the Bayesian Optimization Algorithm (BOA), was suggested by M. Pelikan et al. [20]. The method works by randomly generating an initial population of solutions and then updating the population by using selection and variation. The operation of selection makes multiple copies of better solutions and removes the worst ones. The operation of variation first constructs a Bayesian network as a model of the promising solutions obtained by selection. Then new candidate solutions are obtained by sampling the constructed Bayesian network. New solutions are incorporated into the population in place of some old candidate solutions, and the next iteration is executed unless a termination criterion is met.
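The selection, variation and replacement loop described above can be sketched compactly if the Bayesian network built in the variation step is replaced by the simplest possible one, a network with no arcs (independent bits). That simplification turns the procedure into a univariate marginal distribution algorithm rather than BOA proper, which would learn arcs between bits; it is shown only to illustrate the loop structure. Truncation selection (keeping the better half) stands in for the copying selection of the text, and the OneMax objective (number of ones in a bit string), population size and other settings are hypothetical.

```python
import random

def onemax(bits):
    return sum(bits)

def eda_loop(n_bits=16, pop_size=60, generations=100, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    low, high = 1.0 / n_bits, 1.0 - 1.0 / n_bits   # keep every bit value reachable
    for _ in range(generations):
        population.sort(key=onemax, reverse=True)
        if onemax(population[0]) == n_bits:        # termination criterion
            break
        # Selection: keep the better half as the promising solutions.
        selected = population[: pop_size // 2]
        # Variation, step 1: fit the model. With no arcs, the network reduces to
        # the marginal frequency of a one at each position (clamped to [low, high]).
        probs = [min(high, max(low, sum(s[k] for s in selected) / len(selected)))
                 for k in range(n_bits)]
        # Variation, step 2: sample new candidate solutions from the model.
        offspring = [[1 if rng.random() < probs[k] else 0 for k in range(n_bits)]
                     for _ in range(pop_size - len(selected))]
        # Replacement: new solutions take the place of the worse half.
        population = selected + offspring
    return max(population, key=onemax)

best = eda_loop()
print(onemax(best), best)
```

Learning arcs in the variation step, as BOA does, matters on problems whose bits interact; on OneMax the bits are independent, so the empty network already suffices.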

For additional information on some real-world ap- 
plications of Bayesian networks to classification, relia- 
bility analysis, image processing, data fusion and bio- 
informatics, see the recent book edited by A. Mittal 
et al. [14]. 


See also 


> Bayesian Global Optimization 

> Evolutionary Algorithms in Combinatorial 
Optimization 

> Neural Networks for Combinatorial Optimization 
Synonyms


Beam orientation optimization; Beam angle optimiza- 
tion 


Introduction 


Cancer is typically treated with three standard procedures: 1) surgery, the intent of which is to physically remove the disease; 2) chemotherapy, drug treatment that attacks fast-proliferating cells; and 3) radiotherapy, the targeted treatment of cancer with ionizing beams of radiation. About half of all cancer patients receive radiotherapy, which is delivered by focusing high-energy beams of radiation on a patient’s tumor(s). Treatment design is traditionally considered in three phases:


Beam Selection The process of deciding the number 
and trajectory of the beams that will pass through 
the patient. 

Fluence Optimization Calculating the amount of dose 
to deliver along each of the selected beams so that 
the patient is treated as well as possible. 

Delivery Optimization Deciding how to best deliver 
the treatment designed in the first two steps. 


The fundamental question in optimizing radiotherapy treatments is how best to treat the patient, and such research requires detailed knowledge of medical physics and optimization. Unlike the numerous research pursuits within the field of optimization that require a specific expertise, this research relies on a broad understanding of modeling, solving and analyzing optimization problems as well as an understanding of medical physics. The necessary spectrum of knowledge is commonly brought together in a research group comprising medical physicists, operations researchers, computer scientists, industrial engineers, and mathematicians.

In a modern clinic, the first phase of selecting beams 
is accomplished by a treatment planner, and hence, the 
quality of the resulting treatment depends on the ex- 
pertise of this person. Fluence optimization is automat- 
ically conducted once beams are selected, and the re- 
sulting treatment is judged with a variety of metrics 
and visualization tools. If the treatment is acceptable, 
the process ends. However, unacceptable treatments are
common, and in this scenario the collection of beams is 
updated and fluence optimization is repeated with the 
new beams. This trial-and-error approach oscillates be- 
tween the first two phases of treatment design and of- 
ten continues for hours until an acceptable treatment 
is rendered. The third phase of delivery optimization 
strives to orient the treatment machinery so that the 
patient is treated as efficiently as possible, where effi- 
ciency is interpreted as shortest delivery time, shortest 
exposure time, etc. 

The focus of this entry is Beam Selection, which has a substantial literature in the medical physics community and a growing one in the operations research community. As one would expect, no single phase of treatment design exists in isolation. Although the three-phase approach pervades contemporary thinking, readers should be aware that efforts to optimize treatment design as a whole are being discussed. The presentation below is viewed as part of this bigger goal.


Definitions 


An understanding of the technical terms used to de- 
scribe radiotherapy is needed to understand the scope 
of Beam Selection. Patient images such as CAT scans 
or MRI images are used to identify and locate the ex- 
tent of the disease. Treatment design begins with the 
tedious task of delineating the target and surrounding 
tissues on each of the hundreds of images. The resulting 
3D structures are individually classified as either a tar- 
get, a critical structure, or normal tissue. An oncologist 
prescribes a goal dose for the target and upper bounds 
on the remaining tissues. This prescription is tailored 
to the optimization model used in the second phase of 
treatment design and is far from unique. A discussion 
of the myriad of models used for fluence optimization 
exceeds the confines of this article and is fortunately not 
needed. 

The method of treatment depends on the clinic's technology, and we begin with the general concepts common to all modalities. A patient lies on a treatment couch that can be moved vertically and horizontally and rotated in the plane parallel to the floor. A gantry rotates around the patient in a great circle, and the gantry's head focuses the beam on the patient, see Fig. 1.

Shaping and modulating the beam is important in all forms of treatment. Although these tasks are accomplished differently depending on the technology, it is common to control smaller divisions of each beam called sub-beams. As an example, the gantry's head often contains a multileaf collimator that is capable of dividing the beam (Fig. 2). This technology is modeled by replacing the whole beam with a grid of rectangular sub-beams. Previous technology shaped and modulated the beam without a collimator, but the concept of a sub-beam remains appropriate.

Beam Selection in Radiotherapy Treatment Design, Figure 1
A typical treatment configuration

Beam Selection in Radiotherapy Treatment Design, Figure 2
A multileaf collimator

The center of the gantry's rotation is called the isocenter, a point that is placed near the center of the target by repositioning the patient via couch adjustments. The beam can essentially be focused on the patient from any point on a sphere with a one-meter radius that encompasses the patient. Some positions are not possible, however, due to patient-gantry interference. The beam selection problem is to choose a few of these positions so that the resulting treatment is of high quality. If the selection process is restricted to a single great circle, then the term beam is often replaced with angle (in fact these terms are used synonymously in much of the literature).

The collection of positions on the sphere from which we are allowed to select is denoted by $\mathcal{A}$. In principle this set contains every point of the sphere in the continuum, but in practice $\mathcal{A}$ is a finite set of candidate beams.

The problem of selecting beams depends on a judgment function. This is a mapping from the power set of $\mathcal{A}$, denoted $\mathcal{P}(\mathcal{A})$, into the nonnegative extended reals, denoted $\mathbb{R}^*_+ = \{x \in \mathbb{R} : x \ge 0\} \cup \{\infty\}$. Assume that low values correspond with high-quality treatments. Then a judgment function is a mapping $f : \mathcal{P}(\mathcal{A}) \to \mathbb{R}^*_+$ with the following monotonicity property: if $\mathcal{A}'$ and $\mathcal{A}''$ are subsets of $\mathcal{A}$ such that $\mathcal{A}' \supseteq \mathcal{A}''$, then $f(\mathcal{A}') \le f(\mathcal{A}'')$. The monotonicity condition guarantees that treatment quality cannot degrade if beams are added to an existing treatment.

Beam Selection in Radiotherapy Treatment Design, Figure 3
A dose-volume histogram; the horizontal axis is the anatomical dose (measured in Grays) and the vertical axis is the percent of volume

The judgment function is commonly the optimal value from the second phase of treatment design. For any $\mathcal{A}' \in \mathcal{P}(\mathcal{A})$ we let $X(\mathcal{A}')$ be the feasible region of the optimization problem that decides fluences. An algebraic description of this set relies on the fact that we can accurately model how radiation is deposited as it passes through the anatomy.

There are several competing radiobiological models that accomplish this task. Each produces the rate coefficient $A(j, a, i)$, the rate at which sub-beam $i$ in beam $a$ deposits energy into the anatomical position $j$. These values form a dose matrix $A$, with rows indexed by $j$ and columns by $(a, i)$. The term used to measure a sub-beam's energy is fluence. Experiments confirm that anatomical dose, which is measured in Grays (Gy), is linear in fluence. So, if $x(a, i)$ is the fluence of sub-beam $i$ in beam $a$, then the linear map $x \mapsto Ax$ transforms fluence values into anatomical dose.

We partition the rows of the dose matrix according to the anatomical positions they correspond with:
- positions in the target form the submatrix $A_T$;
- positions in a critical structure form the submatrix $A_C$;
- positions in normal tissue form the submatrix $A_N$.

With this notation, $A_T x$, $A_C x$ and $A_N x$ are the delivered doses to the target, the critical structures, and the normal tissues under treatment $x$.
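As a minimal sketch of this notation, the NumPy code below applies the linear map $x \mapsto Ax$ and splits the result into the three row blocks. The 4 × 4 dose matrix, the assignment of columns to beams, and the row index lists are hypothetical toy values, not clinical data.

```python
import numpy as np

# Hypothetical dose matrix A: rows are voxels j, columns are sub-beams (a, i).
# Columns 0-1 are sub-beams of beam a = 0; columns 2-3 are sub-beams of beam a = 1.
A = np.array([
    [0.9, 0.8, 0.7, 0.6],  # voxel 0: target
    [0.8, 0.9, 0.6, 0.7],  # voxel 1: target
    [0.3, 0.1, 0.0, 0.0],  # voxel 2: critical structure
    [0.1, 0.1, 0.2, 0.2],  # voxel 3: normal tissue
])
target_rows, critical_rows, normal_rows = [0, 1], [2], [3]

# Row partition of A into the submatrices A_T, A_C and A_N.
A_T, A_C, A_N = A[target_rows], A[critical_rows], A[normal_rows]

def delivered_doses(x):
    """Return (A_T x, A_C x, A_N x) for a fluence vector x >= 0."""
    return A_T @ x, A_C @ x, A_N @ x

x = np.array([1.0, 0.0, 2.0, 0.0])  # fluence of each sub-beam (a, i)
d_T, d_C, d_N = delivered_doses(x)
```

Because dose is linear in fluence, doubling `x` doubles every entry of `d_T`, `d_C` and `d_N`.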

Treatment planners use visual and numerical methods to evaluate treatments. The two most common visual tools are the dose-volume histogram (DVH) and a collection of isocontours. A DVH is a plot of dose versus volume that allows a treatment planner to quickly gauge the extent to which each structure is irradiated; an example is found in Fig. 3.

The curve in the upper right side of the figure corresponds to the target, which is the growth to the left of the brain stem in Fig. 4. The ideal curve for the target would remain at 100% until the desired dose and then fall immediately to zero. The ideal curves for the remaining structures would fall immediately to zero. The curve passing through the middle of Fig. 3 corresponds to the brain stem. It indicates that approximately 80% of the brain stem receives at least half of the target dose.
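A cumulative DVH is straightforward to compute from voxel doses. This sketch assumes that all voxels have equal volume. For each dose level it reports the percentage of a structure's voxels receiving at least that dose. The function name and the dose values are illustrative assumptions.

```python
def dose_volume_histogram(doses, dose_levels):
    """Cumulative DVH: percent of a structure's voxels receiving >= each dose level.

    Assumes every voxel has the same volume, so volume fractions are voxel counts.
    """
    n = len(doses)
    return [100.0 * sum(1 for d in doses if d >= level) / n for level in dose_levels]

# Hypothetical voxel doses (Gy) for a target whose goal dose is 80 Gy.
target = [78.0, 80.5, 81.0, 79.5, 82.0]
levels = [0, 40, 79, 80, 85]
curve = dose_volume_histogram(target, levels)
```

An ideal target curve stays at 100% up to the goal dose and then drops to 0%. The toy curve above falls off gradually around 80 Gy instead.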

Beam Selection in Radiotherapy Treatment Design, Figure 4
A collection of isocontours on a single patient image

What a DVH lacks is spatial detail about the anatomical dose. This information is provided by the isocontours, which are level curves drawn on each of the patient images. For example, if the target's goal is 80 Gy, then the 90% isocontour contains the anatomical region that receives at least 0.9 × 80 = 72 Gy. Figure 4 illustrates the 100%, 90%, ..., 10% isocontours on a single patient image. One would hope that these isocontours tightly contain the target on each of the patient images, a goal commonly referred to as conformality. A DVH is often used to decide if a treatment is unacceptable, but both the DVH and the isocontours are used to decide if a treatment is acceptable.

Although treatments are commonly evaluated exclusively with a DVH and the isocontours, there are well-established numerical scores that are also used. Such scores are called conformality indices. They consider the ratios of under- and over-irradiated tissue and so collapse the DVH into a single numerical value. We do not discuss these measures here, but the reader should be aware that they exist.
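The isocontour arithmetic can be checked with a few lines of Python. The sketch assumes each voxel's dose on an image is known and lists the voxels inside a given percentage isocontour. The function names and doses are hypothetical.

```python
def isocontour_threshold(goal_dose, percent):
    """Dose (Gy) bounding the `percent`% isocontour for a target goal of `goal_dose` Gy."""
    return percent / 100.0 * goal_dose

def inside_isocontour(doses, goal_dose, percent):
    """Indices j of voxels receiving at least the isocontour's threshold dose."""
    threshold = isocontour_threshold(goal_dose, percent)
    return [j for j, d in enumerate(doses) if d >= threshold]

# 90% isocontour for an 80 Gy goal: the region receiving at least 72 Gy.
doses = [75.0, 72.0, 60.0, 81.0]  # hypothetical voxel doses (Gy) on one image
inside = inside_isocontour(doses, 80.0, 90)
```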


Formulation 


The $N$-beam selection problem for the judgment function $f$ and candidate set of beams $\mathcal{A}$ is

$$\min\{\, f(\mathcal{A}') : \mathcal{A}' \in \mathcal{P}(\mathcal{A}),\ |\mathcal{A}'| = N \,\}. \tag{1}$$
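For intuition only, problem (1) can be solved by exhaustive enumeration when the candidate set is tiny. The judgment function here is a hypothetical monotone stand-in, not a fluence model: each beam has a per-beam "benefit" that is subtracted from a required total.

```python
from itertools import combinations

def solve_n_beam_problem(candidates, n, f):
    """Exhaustively solve (1): min{ f(A') : A' subset of candidates, |A'| = n }.

    `f` stands in for the judgment function and receives a frozenset of beams.
    Every n-element subset is visited, so this is only usable on toy instances.
    """
    best_set, best_value = None, float("inf")
    for subset in combinations(candidates, n):
        value = f(frozenset(subset))
        if value < best_value:
            best_set, best_value = frozenset(subset), value
    return best_set, best_value

# Hypothetical monotone judgment function: each gantry angle contributes a fixed
# benefit, and f is the unmet part of a required total (never negative).
benefit = {0: 3, 45: 5, 90: 4, 135: 1}

def toy_f(beams):
    return max(0, 10 - sum(benefit[a] for a in beams))

best_pair = solve_n_beam_problem(list(benefit), 2, toy_f)
```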


The parameter $N$ is provided by the treatment planner and is intended to control the complexity of the treatment. The prevailing thought is that fewer beams are preferred if all other treatment goals remain satisfactory. If $f$ adequately measures treatment quality, a model that represents this sentiment is

$$\min\{\, N : \min\{ f(\mathcal{A}') : \mathcal{A}' \in \mathcal{P}(\mathcal{A}),\ |\mathcal{A}'| = N \} \le \varepsilon \,\},$$

where $\varepsilon$ defines the quality of an acceptable treatment.
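Under the same toy assumptions, the fewest-beams model can be sketched by scanning $N$ upward. The benefit values and the tolerance are hypothetical.

```python
from itertools import combinations

# Hypothetical per-beam benefits and a monotone toy stand-in for f.
benefit = {0: 3, 45: 5, 90: 4, 135: 1}

def toy_f(beams):
    return max(0, 10 - sum(benefit[a] for a in beams))

def fewest_beams(candidates, f, epsilon):
    """Return (N, A') for the smallest N with min{f(A') : |A'| = N} <= epsilon.

    Scanning N upward finds the smallest such N. Monotonicity of f also means
    every larger N would meet the tolerance. Returns None if no N does.
    """
    for n in range(1, len(candidates) + 1):
        best = min((frozenset(s) for s in combinations(candidates, n)), key=f)
        if f(best) <= epsilon:
            return n, best
    return None
```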

As mentioned in the previous section, the judgment function is typically the objective value from fluence optimization. A common least-squares approach defines $X(\mathcal{A}')$ to be

$$\Big\{\, x : x \ge 0,\ \sum_i x(a, i) = 0 \text{ for } a \in \mathcal{A} \setminus \mathcal{A}' \,\Big\}$$

and $f(\mathcal{A}')$ to be

$$\min\{\, \omega_T \|A_T x - TG\|_2 + \omega_C \|A_C x\|_2 + \omega_N \|A_N x\|_2 : x \in X(\mathcal{A}') \,\}, \tag{2}$$

Here $TG$ is a vector that expresses the target's treatment goal, and $\omega_T$, $\omega_C$ and $\omega_N$ weight the objective terms to express clinical desires. The prescription for this model is $TG$, but more complicated models with sophisticated prescriptions are common. In particular, dose-volume constraints are common; these restrict the amount of each structure that is permitted to violate a bound. Readers interested in fluence optimization are directed to the entry on Cancer Radiation Treatment: Optimization Models.
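The sketch below evaluates a judgment function of this form for a given beam set $\mathcal{A}'$. It departs from (2) as written in two ways. First, the norms are squared, which is the usual least-squares variant. Second, the resulting nonnegative least-squares problem is solved with plain projected gradient descent rather than a production solver. The argument names (`beam_of_column`, `selected`) and the toy data are hypothetical.

```python
import numpy as np

def judgment_value(A_T, A_C, A_N, TG, beam_of_column, selected,
                   w_T=1.0, w_C=1.0, w_N=1.0, iters=2000):
    """Approximate f(A') = min over x in X(A') of
    w_T*||A_T x - TG||^2 + w_C*||A_C x||^2 + w_N*||A_N x||^2 (squared norms assumed).

    Columns (sub-beams) whose beam is not in `selected` are dropped, i.e. fixed
    at zero, which enforces sum_i x(a, i) = 0 for every beam a outside A'.
    """
    keep = np.array([beam in selected for beam in beam_of_column], dtype=bool)
    # Stack the weighted blocks so the objective becomes ||M x - rhs||^2.
    M = np.vstack([np.sqrt(w_T) * A_T, np.sqrt(w_C) * A_C, np.sqrt(w_N) * A_N])[:, keep]
    rhs = np.concatenate([np.sqrt(w_T) * TG, np.zeros(A_C.shape[0]), np.zeros(A_N.shape[0])])
    sigma = np.linalg.norm(M, 2) if M.size else 0.0  # largest singular value of M
    if sigma == 0.0:
        return float(rhs @ rhs)  # no usable sub-beams: the value at x = 0
    step = 1.0 / (2.0 * sigma ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(M.shape[1])
    for _ in range(iters):
        grad = 2.0 * M.T @ (M @ x - rhs)
        x = np.maximum(0.0, x - step * grad)  # projection onto x >= 0
    r = M @ x - rhs
    return float(r @ r)

# Hypothetical toy data: two beams "a" and "b" with one sub-beam each.
A_T = np.array([[1.0, 0.0], [0.0, 1.0]])
A_C = np.array([[0.5, 0.0]])
A_N = np.array([[0.0, 0.0]])
TG = np.array([1.0, 1.0])
beam_of_column = ["a", "b"]
```

On this toy data the values are 2.0, 1.2, 1.0 and 0.2 for the beam sets ∅, {a}, {b} and {a, b}. This illustrates the monotonicity property: adding a beam never increases $f$.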


Models 


The $N$-beam selection problem is often addressed as a mixed integer problem. As an example, for the judgment function in (2) the $N$-beam selection problem can be expressed as

$$\begin{aligned}
\min\ & \omega_T \|A_T x - TG\|_2 + \omega_C \|A_C x\|_2 + \omega_N \|A_N x\|_2 \\
\text{subject to: } & \sum_i x(a, i) \le M\, y_a, \quad \text{for } a \in \mathcal{A}, \\
& \sum_{a \in \mathcal{A}} y_a = N, \\
& x \ge 0, \\
& y \in \{0, 1\}^{|\mathcal{A}|},
\end{aligned} \tag{3}$$

where $M$ is a sufficiently large value that bounds each beam's fluence. This is one of many possible models. Simple adjustments include replacing the 2-norm with the 1- or $\infty$-norm, both of which result in a linear mixed integer problem.

Consider a modest discretization of the sphere: 72 great circles through the north and south poles, equally spaced at 5 degrees at the equator, with beams equally spaced at 5 degrees on each great circle. This produces a set of 4902 candidate beams. The search tree associated with the mixed integer model above therefore has $\binom{4902}{N}$ terminal nodes, which for the clinically valid $N = 10$ is approximately $2.2 \times 10^{30}$.

Beyond the immenseness of this search space, branch-and-bound procedures are difficult for two reasons:
1) the number of $N$-element subsets leading to near-optimal solutions is substantial, and
2) evaluating the judgment function at each node requires solving an underlying fluence model, which is itself time consuming.

This inherent difficulty has driven the development of heuristic approaches, which separate into two steps:
1) assigning each beam a value that measures its worth to the overall treatment, and
2) using the individual beam values to select a collection of $N$ beams.

As a simple example, a scoring technique evaluates each beam and then simply selects the top $N$ beams. The remainder of this section discusses several of the common heuristics.
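The size of this search space can be confirmed with Python's standard `math.comb` (Python 3.8 or later).

```python
import math

def search_tree_leaves(n_candidates, n_beams):
    """Number of N-element beam subsets, i.e. terminal nodes of the search tree."""
    return math.comb(n_candidates, n_beams)

leaves = search_tree_leaves(4902, 10)
print(f"{leaves:.1e}")  # 2.2e+30
```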

A selection technique is called informed if it requires the evaluation of the underlying judgment function. One example would be to let $\mathcal{A}'$ range over the singleton beam sets and evaluate $f(\mathcal{A}')$ for each; the $N$ beams with the best scores would then be selected for the treatment. Suppose instead that a selection method uses the data forming the optimization problem that defines $f$ but does not evaluate $f$. Such a technique is called weakly informed. The preponderance of techniques suggested in the medical physics literature falls into this category. An example based solely on the dose matrix $A$ is to value beam $a$ with

$$\frac{\max_{(i,j)}\{ A(j, a, i) : j \in T \}}{\min_{(i,j)}\{ A(j, a, i) : j \in C \cup N \}},$$

where we assume the minimums in the denominator are nonzero. This ratio is high if a beam can deliver large amounts of dose to the target without damaging other tissues. A scoring technique based on this would terminate with the collection of $N$ beams with the highest values. Since weakly informed methods do not require the solution of an optimization problem, they tend to be fast.
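A minimal sketch of this weakly informed score and of the top-$N$ scoring rule follows. The dose data are stored in a dictionary keyed by $(j, a, i)$, which is an assumed layout, and the numbers are hypothetical. Non-target voxels are collected in `other` to avoid a clash with the beam count $N$.

```python
def ratio_value(dose, beam, target, other):
    """max{A(j,a,i) : j in target} / min{A(j,a,i) : j in other} for beam a.

    `dose` maps (j, a, i) to the rate coefficient A(j, a, i). As in the text,
    the minimum in the denominator is assumed to be nonzero.
    """
    numerator = max(v for (j, a, i), v in dose.items() if a == beam and j in target)
    denominator = min(v for (j, a, i), v in dose.items() if a == beam and j in other)
    return numerator / denominator

def top_n(beams, value, n):
    """Scoring heuristic: keep the n beams with the highest values."""
    return sorted(beams, key=value, reverse=True)[:n]

# Hypothetical data: one target voxel "t" and two non-target voxels "c", "n".
dose = {("t", 0, 0): 4.0, ("c", 0, 0): 1.0, ("n", 0, 0): 2.0,
        ("t", 1, 0): 3.0, ("c", 1, 0): 0.5, ("n", 1, 0): 1.0,
        ("t", 2, 0): 2.0, ("c", 2, 0): 2.0, ("n", 2, 0): 1.0}
target, other = {"t"}, {"c", "n"}
chosen = top_n([0, 1, 2], lambda a: ratio_value(dose, a, target, other), 2)
```

Each beam is valued in isolation. This is exactly the property that makes such scores fast, and also the weakness discussed in the conclusions.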


The concern about the size of the underlying fluence model has led to a sampling heuristic that reduces the accuracy of the radiobiological model. Clinical relevance mandates that the anatomy be discretized so that dose is measured at distances no greater than 2 mm. Consider a $(20\,\text{cm})^3$ portion of the anatomy, roughly the volume of the cranium. The coarsest 3D grid permitted in the clinic divides this portion into $10^6$ sub-regions called voxels, which are indexed by $j$. Cases in the chest and abdomen are substantially larger and require a significant increase in the number of voxels.

A natural question is whether all of these regions are needed for beam selection. One approach is to repeatedly sample these regions together with the candidate set of beams and solve (1). Each beam is valued by the number of times it has a high fluence. Beams with high values then form the candidate set $\mathcal{A}$ in (1), with $j$ indexed over all regions. The goal of this technique is to identify a candidate set of beams whose size is slightly larger than $N$. This keeps the search space manageable with the full complement of voxels. The sampling procedure is crucial to the success of the method, since it is known that beam selection depends on the collection of voxels.

Once beams are valued, there are many ways to use this information to construct a collection of favorable beams. As already discussed, common scoring methods select the best $N$ beams. Another approach is based on set covering, which uses a high-pass filter to decide if a beam adequately treats the target. Let $\varepsilon$ be the threshold at which we say beam $a$ treats position $j$ within the target, and let

$$U(j, a) = \begin{cases} 1, & \sum_i A(j, a, i) \ge \varepsilon, \\ 0, & \sum_i A(j, a, i) < \varepsilon, \end{cases}$$

for each $j \in T$. If each beam has a value of $c_a$, where low values are preferred, the set cover heuristic forms a collection of beams by solving

$$\min\Big\{ \sum_a c_a y_a : \sum_a U(j, a)\, y_a \ge 1 \text{ for each } j \in T,\ y_a \in \{0, 1\} \Big\}. \tag{4}$$


This is itself a binary optimization problem. If $\varepsilon$ is small enough to guarantee that every beam treats the target, which is typical, then the search space is as large as that of the original problem in (1). However, the set cover problem has favorable solution properties, and this problem solves efficiently in practice. The search space decreases in size as $\varepsilon$ increases. Designing an appropriate heuristic therefore requires both a judicious selection of $\varepsilon$ and an appropriate objective. This method can be informed or weakly informed depending on how the objective coefficients are constructed.
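The set cover heuristic can be sketched on a toy instance. The code builds $U(j, a)$ from the threshold $\varepsilon$ and then solves (4) exactly by enumeration, which is only viable for a handful of beams. A practical implementation would instead pass (4) to an integer programming solver. The rate coefficients and the costs $c_a$ are hypothetical.

```python
from itertools import combinations

def coverage(dose, beams, target, eps):
    """U(j, a) = 1 if sum_i A(j, a, i) >= eps, else 0, for each target voxel j."""
    totals = {}
    for (j, a, i), rate in dose.items():
        totals[(j, a)] = totals.get((j, a), 0.0) + rate
    return {(j, a): int(totals.get((j, a), 0.0) >= eps) for j in target for a in beams}

def exact_set_cover(beams, target, U, cost):
    """Solve (4) by enumeration: the cheapest beam set covering every target voxel."""
    best, best_cost = None, float("inf")
    for k in range(1, len(beams) + 1):
        for subset in combinations(beams, k):
            if all(any(U[(j, a)] for a in subset) for j in target):
                total = sum(cost[a] for a in subset)
                if total < best_cost:
                    best, best_cost = set(subset), total
    return best, best_cost

# Hypothetical rate coefficients A(j, a, i) for three target voxels and beams 0, 1, 2.
dose = {("t1", 0, 0): 0.6, ("t1", 0, 1): 0.6, ("t2", 0, 0): 1.0, ("t3", 0, 0): 0.2,
        ("t1", 1, 0): 0.1, ("t2", 1, 0): 1.5, ("t3", 1, 0): 1.1,
        ("t1", 2, 0): 0.0, ("t2", 2, 0): 0.3, ("t3", 2, 0): 2.0}
beams, target = [0, 1, 2], ["t1", "t2", "t3"]
U = coverage(dose, beams, target, eps=1.0)
cover = exact_set_cover(beams, target, U, cost={0: 1.0, 1: 1.0, 2: 0.5})
```

Raising `eps` zeroes out more entries of `U`, which shrinks the set of feasible covers. This mirrors the remark that the search space decreases as $\varepsilon$ increases.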

Another approach is to normalize the beam values and use them as a probability distribution. This allows one to address the problem probabilistically, a perspective that has been suggested within column generation and vector quantization. The column generation approach prices beams with respect to the likelihood that they will improve the judgment function, and beams with high probabilities are added to the current collection. The process of adding and deleting beams produces a sequence of beam sets $\mathcal{A}^1, \mathcal{A}^2, \ldots, \mathcal{A}^n$. Problem (1) is solved with $\mathcal{A}$ replaced by $\mathcal{A}^k$, $k = 1, 2, \ldots, n$. In principle this technique can price all subsets of $\mathcal{A}$ whose cardinality is greater than $N$, a number significantly greater than the size of the original search space in (1). In practice, however, the pricing scheme tends to limit the number of sets $\mathcal{A}^k$.

The probabilistic perspective is further incorporated with heuristics based in information science. In particular, a method has been suggested based on vector quantization, which is a modeling and solution procedure used in data compression. Let $\alpha(a)$ be the probability associated with beam $a$. This heuristic constructs a collection of beams by solving

$$\min\Big\{ \sum_a \alpha(a)\, \rho(a, Q(a)) : |Q(\mathcal{A})| = N \Big\}, \tag{5}$$

where $Q$ is a mapping from $\mathcal{A}$ into itself and $\rho$ is a metric appropriate to the application. A common choice is to let $\rho(a, Q(a))$ be the arc length between $a$ and $Q(a)$.
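For beams on the full sphere, the arc length follows from the angle between the two beam positions. The sketch below parameterizes a beam by (latitude, longitude) in degrees on a sphere of radius 1 m, matching the gantry sphere described earlier. That parameterization is an assumption made for illustration.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Cartesian unit vector for a beam position given in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

def arc_length(a, b, radius=1.0):
    """Great-circle distance rho(a, b) between beams a, b = (latitude, longitude)."""
    u, v = to_unit_vector(*a), to_unit_vector(*b)
    dot = sum(p * q for p, q in zip(u, v))
    return radius * math.acos(max(-1.0, min(1.0, dot)))  # clamp guards rounding error
```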

In the finite case, each $N$-element subset $\mathcal{A}'$ of $\mathcal{A}$ uniquely defines $Q$ by setting $Q(\mathcal{A}) = \mathcal{A}'$. Assuming this equality, we complete the definition by setting $Q(a) = a' \in \mathcal{A}'$ if and only if $\rho(a, a') \le \rho(a, a'')$ for all $a'' \in \mathcal{A}'$. This is referred to as the nearest neighbor condition. Since the optimization problem in (5) is defined over the collection of these functions, its feasible region is as large as that of the original beam selection problem in (1).

The set cover approach solves (4) to optimality, and the column generation technique repeatedly solves (1) to optimality with a restricted beam set. The vector quantization method, by contrast, often solves (5) heuristically. The most common heuristic is the Lloyd algorithm, a technique that begins with an initial collection of $N$ beams and then iterates between
1. defining $Q$ with the nearest neighbor condition, and
2. forming a new collection of beams with the centroids of $Q^{-1}(a)$, where beam $a$ is in the current collection.

This technique guarantees that the objective in (5) does not increase with each new collection.
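The Lloyd iteration can be sketched for the coplanar case, where beams are gantry angles on a single great circle and $\rho$ is the arc length in degrees. Because the candidate set is finite, the sketch assumes that the "centroid" of a cell $Q^{-1}(a)$ is the cell member that minimizes the weighted sum of arc lengths to the other members. This keeps every new beam on the candidate grid. Like the nearest neighbor step, it never increases the objective in (5). The probabilities `alpha` are hypothetical.

```python
def circ_dist(a, b):
    """Arc length, in degrees, between two gantry angles on one great circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def nearest(a, centers):
    """Nearest neighbor condition; ties go to the smallest center angle."""
    return min(sorted(centers), key=lambda c: circ_dist(a, c))

def distortion(alpha, centers):
    """Objective of (5): sum over beams a of alpha(a) * rho(a, Q(a))."""
    return sum(w * circ_dist(a, nearest(a, centers)) for a, w in alpha.items())

def lloyd(alpha, centers, max_iter=100):
    """Lloyd iteration over candidate beams; `centers` must be candidate beams."""
    centers = set(centers)
    for _ in range(max_iter):
        cells = {c: [] for c in centers}  # Q^{-1}(c) for each current beam c
        for a in alpha:
            cells[nearest(a, centers)].append(a)
        # Discrete "centroid": the cell member minimizing the weighted arc length.
        new = {min(cell, key=lambda m: sum(alpha[a] * circ_dist(a, m) for a in cell))
               for cell in cells.values()}
        if new == centers:
            break
        centers = new
    return centers

# Hypothetical beam probabilities (unnormalized weights work the same way).
alpha = {0: 1.0, 10: 1.0, 20: 1.0, 180: 1.0, 190: 1.0, 200: 1.0}
result = lloyd(alpha, {0, 180})
```

Starting from {0, 180}, the iteration moves to {10, 190} and lowers the objective from 60 to 40.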


Conclusions 


Selecting beams is one of the three sub-problems in the 
design of radiotherapy treatments, a problem that cur- 
rently does not have an appropriate solution outside the 
clinical practice of manually selecting beams through 
trial-and-error. However, research into automating the 
selection of beams with optimization is promising. We 
conclude with a few words on the totality of treatment 
design. 

The overriding goal of treatment design is to remove the threat of cancer while sparing non-cancerous tissues. The status quo is to assume that a patient is static while designing a treatment. Indeed, treatment planners account for patient movement within the static approach by expanding targeted regions. That is, the target is enlarged to the gross volume that contains the estimated movement of the actual target. The primary goal of the third phase of treatment design is to deliver the treatment as efficiently as possible to limit patient movement.

This leads to a dilemma. The monotonicity property of the judgment function encourages treatments with many beams, but conventional wisdom dictates that the number of beams and the efficiency of the delivery are inversely proportional. However, in many settings the number of beams is a poor surrogate for efficiency. As an example, the most time-demanding maneuver is to rotate the couch, since it requires a technician to enter the treatment vault. So, treatments with many beams but fewer couch rotations are preferred to treatments with fewer beams but more couch rotations.

The point to emphasize from the previous paragraph is that the problem of selecting beams is always expressed in terms of the number of beams, which is a byproduct of the three-phase approach. Although the separation of the design process into phases is natural and useful for computation, the division has drawbacks. Fluence models are large and difficult to solve, and every attempt is made to reduce their size. As already discussed, the voxels need to be under $(2\,\text{mm})^3$ to reach clinical viability, and hence the index set for $j$ is necessarily large. The number and complexity of the sub-beams has increased dramatically with advanced technology, similarly making the index set for $i$ large. This leaves the number of beams as the only control, and treatment designers are asked to select beams so that the fluence model is manageable.

Years of experience have produced standard collections for many cancers. In a non-standard case, however, asking a designer to select one of the $2.2 \times 10^{30}$ possible collections for a 10-beam treatment is daunting. A designer's instinct is to value a beam individually rather than as part of a collection, and several of the weakly informed selection methods from the medical physics literature share this weakness. Such individual valuation typically identifies all but a few beams of a quality solution, but the last few are often unintuitive. Automating beam selection with an optimization process, so that beams are considered within a collection, is a step in the right direction.

The future of treatment design is to build global models and solution procedures that simultaneously address all three phases of treatment design. Such models are naturally viewed from the perspective of beam selection. What is missing is a judgment function that includes both fluence and delivery optimization. Learning how to model and solve these holistic models would free the design process from its dependence on a designer's expertise. It would also provide a uniform level of care across clinics with comparable technology. Such improvements are the promise of the field.


See also 


> Credit Rating and Optimization Methods 

> Evolutionary Algorithms in Combinatorial 
Optimization 

> Optimization Based Framework for Radiation Therapy


The literature on beam selection is mature within the medical physics community but is in its infancy within optimization. The five citations below cover the topics discussed in this article and contain bibliographies that adequately cite the work in medical physics.


References 


1. Acosta R, Ehrgott M, Holder A, Nevin D, Reese J, Salter B 
(2007) Comparing Beam Selection Strategies in Radiother- 
apy Treatment Design: The Influence of Dose Point Reso- 
lution. In: Alves C, Pardalos P, Vicente L (eds) Optimization 
in Medicine, International Center for Mathematics, Springer 
Optimization and Its Applications. Springer, pp 1-25 

2. Aleman D, Romeijn E, Dempsey J (2006) Beam orienta- 
tion optimization methods in intensity modulated radiation 
therapy. IIE Conference Proceedings 

3. Ehrgott M, Holder A, Reese J (2008) Beam Selection in Radiotherapy Design. Linear Algebra Appl 428:1272-1312. doi:10.1016/j.laa.2007.05.039

4. Lim G, Choi J, Mohan R Iterative Solution Methods for Beam Angle and Fluence Map Optimization in Intensity Modulated Radiation Therapy Planning. OR Spectrum, to appear. doi:10.1007/s00291-007-0096-1

5. Lim G, Ferris M, Shepard D, Wright S, Earl M (2007) An Op- 
timization Framework for Conformal Radiation Treatment 
Planning. INFORMS J Comput 19(3):366-380 


Best Approximation 


in Ordered Normed Linear Spaces 


HOSSEIN MOHEBI 

Mahani Mathematical Research Center, 
and Department of Mathematics, 
University of Kerman, Kerman, Iran 


MSC2000: 90C46, 46B40, 41A50, 41A65 


Article Outline 


Keywords and Phrases 
Introduction 
Metric Projection onto Downward and Upward Sets 
Sets Z4 and Z_ 
Downward Hull and Upward Hull 
Metric Projection onto a Closed Set 
Best Approximation in a Class of Normed Spaces 
with Star-Shaped Cones 
Characterization of Best Approximations 
Strictly Downward Sets 
and Their Best Approximation Properties 
References 




Keywords and Phrases 


Best approximation; Downward and upward sets; 
Global minimum; Necessary and sufficient conditions; 
Star-shaped set; Proximinal set 


Introduction 


We study the minimization of the distance to an arbitrary closed set in a class of ordered normed spaces (see [8]). This class is fairly broad. It contains the space $C(Q)$ of all continuous functions defined on a compact topological space $Q$. It also contains the space $L^\infty(S, \Sigma, \mu)$ of all essentially bounded functions defined on a measure space $(S, \Sigma, \mu)$. These spaces are assumed to be equipped with the natural order relation and the uniform norm. The class further contains direct products $X = \mathbb{R} \times Y$ with the norm $\|(c, y)\| = |c| + \|y\|$, where $Y$ is an arbitrary normed space. The space $X$ is equipped with the order relation induced by the cone $K = \{(c, y) : c \ge \|y\|\}$.

Let $U$ be a closed subset of $X$, where $X$ is a normed space from the given class, and let $t \in X$. We consider the problem $Pr(U, t)$:

$$\text{minimize } \|u - t\| \text{ subject to } u \in U. \tag{1}$$

It is assumed that there exists a solution of $Pr(U, t)$. This solution is called a metric projection of $t$ onto $U$, or a best approximation of $t$ by elements of $U$. We use the structure of the objective function to present necessary and sufficient conditions for the global minimum of $Pr(U, t)$. These conditions give a clear understanding of the structure of a metric projection and can be easily verified for some classes of problems under consideration.

We use the so-called downward and upward subsets of a space $X$ as a tool for analysis of $Pr(U, t)$. A set $U \subseteq X$ is called downward if $(u \in U,\ x \le u) \Rightarrow x \in U$. A set $V \subseteq X$ is called upward if $(v \in V,\ x \ge v) \Rightarrow x \in V$. Downward and upward sets have a simple structure, so the problem $Pr(U, t)$ can be easily analyzed for these sets $U$.

If $U$ is an arbitrary closed subset of $X$, we can consider its downward hull $U_* = U - K$ and its upward hull $U^* = U + K$, where $K = \{x \in X : x \ge 0\}$ is the cone of positive elements. These hulls can be used to examine $Pr(U, t)$. We also suggest an approach based on a division of the normed space under consideration into two homogeneous, not necessarily linear, subspaces. Combining this approach with the downward-upward technique allows us to give simple proofs of the proposed necessary and sufficient conditions.

Properties of downward and upward sets play a crucial role in this article. These properties have been studied in [6,13] for $X = \mathbb{R}^n$. We show that some results obtained in [6,13] are valid in a much more general case. In fact, the first necessary and sufficient conditions for metric projection onto closed downward sets in $\mathbb{R}^n$ were given in [1, p. 132, Theorem 9]. Our results relate to [1] as follows:
- Proposition 1(1) and (2) extend [1, Proposition 1(a) and (b)], respectively, which were stated for $\mathbb{R}^n$ and $\mathbf{1} = (1, \ldots, 1)$.
- Propositions 2 and 3 extend [1, p. 116, Proposition 2].
- Corollary 3 extends [1, p. 116, Corollary 2 and p. 117, Remark 2].
- In connection with Proposition 6: the downward hull $U_*$ was introduced in [1, Sect. 1]. That section also gave the first results on the connection between $d(t, U)$ and $d(t, U_*)$, for the particular case where $U$ is a normal subset of $\mathbb{R}^n_+$.

We use methods of abstract convexity and monotonic analysis (see [11]) in this study.

Let $X$ be a normed space. Let $K \subseteq X$ be a closed, convex and pointed cone. (The latter means that $K \cap (-K) = \{0\}$.) The cone $K$ generates the order relation $\ge$ on $X$. By definition $x \ge y \iff x - y \in K$. We say that $x$ is greater than $y$ and write $x > y$ if $x - y \in K \setminus \{0\}$. Assume that $K$ is solid, that is, the interior $\operatorname{int} K$ of $K$ is nonempty. Let $\mathbf{1} \in \operatorname{int} K$. Using $\mathbf{1}$ we can define the following function:

$$p(x) = \inf\{\lambda \in \mathbb{R} : x \le \lambda \mathbf{1}\}, \quad (x \in X). \tag{2}$$


It is easy to check that $p$ is finite. It follows from (2) that

$$x \le p(x)\mathbf{1}, \quad (x \in X). \tag{3}$$

It is easy to check (and well known) that $p$ is a sublinear function, that is,

$$p(\lambda x) = \lambda p(x) \quad (\lambda > 0,\ x \in X),$$
$$p(x + y) \le p(x) + p(y) \quad (x, y \in X).$$


We need the following definition (see [13] and references therein). A function $s : X \to \mathbb{R}$ is called topical if $s$ is increasing ($x \ge y$ implies $s(x) \ge s(y)$) and $s(x + \lambda \mathbf{1}) = s(x) + \lambda$ for all $x \in X$ and $\lambda \in \mathbb{R}$. It follows from the definition of $p$ that $p$ is topical. Consider the function

$$\|x\| := \max(p(x), p(-x)). \tag{4}$$

It is easy to check (and well known) that $\|\cdot\|$ is a norm on $X$. In what follows we assume that the norm (4) coincides with the norm of the space $X$.
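Consider the simplest member of this class: $X = \mathbb{R}^n$ with $K$ the nonnegative orthant and $\mathbf{1} = (1, \ldots, 1)$, i.e. the space $C(Q)$ for a finite $Q$. Here $x \le \lambda \mathbf{1}$ holds exactly when every coordinate is at most $\lambda$. So the infimum in (2) is $\max_i x_i$, and (4) is the sup norm. The sketch below encodes this special case and spot-checks topicality and sublinearity on hypothetical vectors.

```python
def p(x):
    """p(x) = inf{lam : x <= lam*1} in R^n with K the nonnegative orthant.

    x <= lam*1 holds exactly when every coordinate is <= lam, so the
    infimum is attained at max_i x_i.
    """
    return max(x)

def norm(x):
    """The norm (4): max(p(x), p(-x)), which here is the sup norm max_i |x_i|."""
    return max(p(x), p([-v for v in x]))

x, y, lam = [1.0, -3.0, 2.0], [0.5, 4.0, -1.0], 5.0
```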

It follows from (3) that

$$x \le \|x\|\mathbf{1}, \quad -x \le \|x\|\mathbf{1}, \quad (x \in X). \tag{5}$$

The ball $B(t, r) = \{x \in X : \|x - t\| \le r\}$ has the form

$$B(t, r) = \{x \in X : t - r\mathbf{1} \le x \le t + r\mathbf{1}\}. \tag{6}$$
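In the same finite-dimensional special case (sup norm on $\mathbb{R}^n$, coordinatewise order), (6) says that a closed ball is an order interval. The sketch below checks this on an integer grid, so that no rounding is involved. The center $t$ and radius $r$ are hypothetical.

```python
from itertools import product

def in_ball(x, t, r):
    """||x - t|| <= r for the sup norm, i.e. the norm (4) with 1 = (1, ..., 1)."""
    return max(abs(a - b) for a, b in zip(x, t)) <= r

def in_order_interval(x, t, r):
    """t - r*1 <= x <= t + r*1 in the coordinatewise order of the orthant."""
    return all(b - r <= a <= b + r for a, b in zip(x, t))

# Compare both descriptions of B(t, r) on every grid point near t = (0, 1), r = 2.
t, r = (0, 1), 2
agree = all(in_ball(x, t, r) == in_order_interval(x, t, r)
            for x in product(range(-4, 5), repeat=2))
```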


We now present three examples of spaces under consideration.

Example 1 Let $X$ be a vector lattice with a strong unit $\mathbf{1}$. The latter means that for each $x \in X$ there exists $\lambda \in \mathbb{R}$ such that $|x| \le \lambda \mathbf{1}$. Then

$$\|x\| = \inf\{\lambda > 0 : |x| \le \lambda \mathbf{1}\},$$

where the norm $\|\cdot\|$ is defined by (4). It is well known (see, for example, [21]) that each vector lattice $X$ with a strong unit is isomorphic, as a vector-ordered space, to the space $C(Q)$ of all continuous functions defined on a compact topological space $Q$. For a given strong unit $\mathbf{1}$, the corresponding isomorphism $\psi$ can be chosen so that $\psi(\mathbf{1})(q) = 1$ for all $q \in Q$. The cone $\psi(K)$ coincides with the cone of all nonnegative functions defined on $Q$. If $X = C(Q)$ and $\mathbf{1}(q) = 1$ for all $q$, then

$$p(x) = \max_{q \in Q} x(q) \quad \text{and} \quad \|x\| = \max_{q \in Q} |x(q)|.$$

A well-known example of a vector lattice with a strong unit is the space $L^\infty(S, \Sigma, \mu)$ of all essentially bounded functions defined on a measure space $(S, \Sigma, \mu)$. If $\mathbf{1}(s) = 1$ for all $s \in S$, then $p(x) = \operatorname{ess\,sup}_{s \in S} x(s)$ and $\|x\| = \operatorname{ess\,sup}_{s \in S} |x(s)|$.


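Example 2 Let $X = \mathbb{R} \times Y$, where $Y$ is a normed space with a norm $\|\cdot\|$. Let $K \subseteq X$ be the epigraph of the norm, $K = \{(\lambda, x) : \lambda \ge \|x\|\}$. The cone $K$ is closed, solid, convex and pointed. It is easy to check and well known that $\mathbf{1} = (1, 0)$ is an interior point of $K$. For each $(c, y) \in X$ we have

$$\begin{aligned}
p(c, y) &= \inf\{\lambda \in \mathbb{R} : (c, y) \le \lambda \mathbf{1}\} \\
&= \inf\{\lambda \in \mathbb{R} : (\lambda, 0) - (c, y) \in K\} \\
&= \inf\{\lambda \in \mathbb{R} : (\lambda - c, -y) \in K\} \\
&= \inf\{\lambda \in \mathbb{R} : \lambda - c \ge \|-y\|\} = c + \|y\|.
\end{aligned}$$

Hence

$$\begin{aligned}
\|(c, y)\| &= \max(p(c, y), p(-(c, y))) \\
&= \max(c + \|y\|, -c + \|y\|) = |c| + \|y\|.
\end{aligned}$$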
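Example 2 can be checked numerically for $Y = \mathbb{R}^2$ with the Euclidean norm, a choice made only for illustration. The check confirms that $\lambda = c + \|y\|$ is exactly the smallest $\lambda$ with $\lambda\mathbf{1} - (c, y) \in K$, and that the resulting norm is $|c| + \|y\|$. The sample point is hypothetical.

```python
import math

def in_cone(lam, y):
    """Membership in K = {(lam, y) : lam >= ||y||}, with Y = R^m and the Euclidean norm."""
    return lam >= math.hypot(*y)

def p(c, y):
    """Closed form from Example 2: p(c, y) = c + ||y||."""
    return c + math.hypot(*y)

def norm(c, y):
    """max(p(c, y), p(-(c, y))), which equals |c| + ||y||."""
    return max(p(c, y), p(-c, [-v for v in y]))

c, y = 1.0, [3.0, 4.0]
lam = p(c, y)
```

The point $(\lambda - c, -y)$ lies exactly on the boundary of $K$. Any smaller $\lambda$ leaves the cone.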


Example 3 Consider the space $\ell^1$ of all summable sequences with the usual norm. Let $Y = \{x = (x_i) \in \ell^1 : x_1 = 0\}$. Then we can identify $\ell^1$ with the space $\mathbb{R} \times Y$. Let $y \in Y$ and $x = (x_1, y) \in \ell^1$. Then $\|x\| = |x_1| + \|y\|$. Let $K = \{x = (x_i) \in \ell^1 : x_1 \ge \sum_{i=2}^\infty |x_i|\}$. Assume that $\ell^1$ is equipped with the order relation $\ge$ generated by $K$: if $x = (x_i)$ and $z = (z_i)$, then

$$x \ge z \iff x_1 - z_1 \ge \sum_{i=2}^\infty |x_i - z_i|.$$

Let $\mathbf{1} = (1, 0, \ldots, 0, \ldots)$. Consider the function $p$ defined on $\ell^1$ by

$$p(x) = x_1 + \sum_{i=2}^\infty |x_i|, \quad x = (x_1, x_2, \ldots) \in \ell^1.$$

Then (see the previous example) $p(x) = \inf\{\lambda \in \mathbb{R} : x \le \lambda \mathbf{1}\}$, and $\|x\| = \sum_{i=1}^\infty |x_i|$ coincides with $\max(p(x), p(-x))$.


Let $X$ be a normed vector space. For a nonempty subset $U$ of $X$ and $t \in X$, define $d(t, U) = \inf_{u \in U} \|t - u\|$. A point $u_0 \in U$ is called a metric projection of $t$ onto $U$, or a best approximation of $t$ by elements of $U$, if $\|t - u_0\| = d(t, U)$.

Let $U \subseteq X$. For $t \in X$, denote by $P_U(t)$ the set of all metric projections of $t$ onto $U$:

$$P_U(t) = \{u \in U : \|t - u\| = d(t, U)\}. \tag{7}$$

It is well known that $P_U(t)$ is a closed and bounded subset of $X$. If $t \notin U$, then $P_U(t)$ is located in the boundary of $U$.

We shall use the following definitions. Let $U \subseteq X$ and $t \in X$.
- The pair $(U, t)$ is called proximinal if there exists a metric projection of $t$ onto $U$.
- The pair $(U, t)$ is called Chebyshev if there exists a unique metric projection of $t$ onto $U$.
- The set $U$ is called proximinal if the pair $(U, t)$ is proximinal for all $t \in X$.
- The set $U$ is called Chebyshev if the pair $(U, t)$ is Chebyshev for all $t \in X$.

A set $U \subseteq X$ is called boundedly compact if the set $U_r = \{u \in U : \|u\| \le r\}$ is compact for each $r > 0$. (This is equivalent to the following: the intersection of a closed neighborhood of a point $u \in U$ with $U$ is compact.) Each boundedly compact set is proximinal.

For any subset U of a normed space X we shall de- 
note by int U, cl U, and bd U the interior, the closure, 
and the boundary of U, respectively. 


Metric Projection onto Downward and Upward Sets


Definition 1 A set $U \subset X$ is called downward if $(u \in U,\ x \le u) \Rightarrow x \in U$.


First we describe some simple properties of downward 
sets. 


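Proposition 1 Let $U$ be a downward subset of $X$ and $x \in X$. Then the following assertions are true:

(1) If $x \in U$, then $x - \varepsilon\mathbf{1} \in \operatorname{int} U$ for all $\varepsilon > 0$.

(2) $\operatorname{int} U = \{x \in X: x + \varepsilon\mathbf{1} \in U \text{ for some } \varepsilon > 0\}$.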


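Proof

(1) Let $\varepsilon > 0$ be given and $x \in U$. Let $N = \{y \in X: \|y - (x - \varepsilon\mathbf{1})\| < \varepsilon\}$ be an open neighborhood of $x - \varepsilon\mathbf{1}$. Then, by (6), $N \subset \{y \in X: x - 2\varepsilon\mathbf{1} \le y \le x\}$. Since $U$ is a downward set and $x \in U$, it follows that $N \subset U$, and so $x - \varepsilon\mathbf{1} \in \operatorname{int} U$.

(2) Let $x \in \operatorname{int} U$. Then there exists $\varepsilon_0 > 0$ such that the closed ball $B(x, \varepsilon_0) \subset U$. In view of (6), we get $x + \varepsilon_0\mathbf{1} \in U$.

Conversely, suppose that there exists $\varepsilon > 0$ such that $x + \varepsilon\mathbf{1} \in U$. Then, by (1), $x = (x + \varepsilon\mathbf{1}) - \varepsilon\mathbf{1} \in \operatorname{int} U$, which completes the proof. □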


Corollary 1 Let $U$ be a closed downward subset of $X$ and $u \in U$. Then $u \in \operatorname{bd} U$ if and only if $\lambda\mathbf{1} + u \notin U$ for all $\lambda > 0$.


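Lemma 1 The closure $\operatorname{cl} U$ of a downward set $U$ is downward.

Proof Let $x_k \in U$, $k = 1, 2, \ldots$, and $x_k \to x$ as $k \to +\infty$. Let $\|x_k - x\| = \varepsilon_k$ $(k = 1, 2, \ldots)$. Using (6) we get $x - \varepsilon_k\mathbf{1} \le x_k$ for all $k \ge 1$. Since $U$ is a downward set and $x_k \in U$ for all $k \ge 1$, we conclude that $x - \varepsilon_k\mathbf{1} \in U$ for all $k \ge 1$. Let $y \le x$ be arbitrary and $y_k = y - \varepsilon_k\mathbf{1} \le x - \varepsilon_k\mathbf{1}$ $(k = 1, 2, \ldots)$. Then $y_k \in U$ $(k = 1, 2, \ldots)$. Since $y_k \to y$ as $k \to +\infty$, it follows that $y \in \operatorname{cl} U$. □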


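Proposition 2 A closed downward subset $U$ of $X$ is proximinal.

Proof Let $t \in X \setminus U$ be arbitrary and $r := d(t, U) = \inf_{u \in U} \|t - u\| > 0$. This implies that for each $\varepsilon > 0$ there exists $u_\varepsilon \in U$ such that $\|t - u_\varepsilon\| < r + \varepsilon$. Then, by (6),

$$
-(r + \varepsilon)\mathbf{1} \le u_\varepsilon - t \le (r + \varepsilon)\mathbf{1}. \tag{8}
$$

Let $u_0 = t - r\mathbf{1}$. Then

$$
\|t - u_0\| = \|r\mathbf{1}\| = r = d(t, U).
$$

In view of (8), we have $u_0 - \varepsilon\mathbf{1} = t - r\mathbf{1} - \varepsilon\mathbf{1} \le u_\varepsilon$. Since $U$ is a downward set and $u_\varepsilon \in U$, it follows that $u_0 - \varepsilon\mathbf{1} \in U$ for all $\varepsilon > 0$. The closedness of $U$ implies $u_0 \in U$, and so $u_0 \in P_U(t)$. Thus the result follows. □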


Remark 1 We proved that for each $t \in X \setminus U$ the set $P_U(t)$ contains the element $u_0 = t - r\mathbf{1}$ with $r = d(t, U)$. If $t \in U$, then $u_0 = t$ and $P_U(t) = \{u_0\}$.


Proposition 3 Let $U$ be a closed downward subset of $X$ and $t \in X$. Then there exists the least element $u_0 := \min P_U(t)$ of the set $P_U(t)$, namely, $u_0 = t - r\mathbf{1}$, where $r := d(t, U)$.

Proof If $t \in U$, then the result holds. Assume that $t \notin U$ and $u_0 = t - r\mathbf{1}$. Then, by Remark 1, $u_0 \in P_U(t)$. Applying (6) and the equality $\|t - u_0\| = r$ we get

$$
x \ge t - r\mathbf{1} = u_0 \qquad \forall x \in B(t, r).
$$

This implies that $u_0$ is the least element of the closed ball $B(t, r)$. Now, let $u \in P_U(t)$ be arbitrary. Then $\|t - u\| = r$, and so $u \in B(t, r)$. Therefore, $u \ge u_0$. Hence, $u_0$ is the least element of the set $P_U(t)$. □


Corollary 2 Let $U$ be a closed downward subset of $X$, $t \in X$ and $u_0 = \min P_U(t)$. Then $u_0 \le t$.


Corollary 3 Let $U$ be a closed downward subset of $X$ and $t \in X$ be arbitrary. Then

$$
d(t, U) = \min\{\lambda \ge 0: t - \lambda\mathbf{1} \in U\}.
$$

Proof Let $\Lambda = \{\lambda \ge 0: t - \lambda\mathbf{1} \in U\}$. If $t \in U$, then $t - 0 \cdot \mathbf{1} = t \in U$, and so $\min \Lambda = 0 = d(t, U)$. Suppose that $t \notin U$; then $r := d(t, U) > 0$. Let $\lambda \ge 0$ be arbitrary such that $t - \lambda\mathbf{1} \in U$. Thus

$$
\lambda = \|\lambda\mathbf{1}\| = \|t - (t - \lambda\mathbf{1})\| \ge d(t, U) = r.
$$

Since, by Proposition 3, $t - r\mathbf{1} \in U$, it follows that $r \in \Lambda$. Hence, $\min \Lambda = r$, which completes the proof. □


The results obtained demonstrate that to find a metric projection of an element $t$ onto a downward set $U$ we need to solve the following optimization problem:

$$
\text{minimize } \lambda \quad \text{subject to } t - \lambda\mathbf{1} \in U,\ \lambda \ge 0. \tag{9}
$$

This is a one-dimensional optimization problem that is much easier than the original problem $\mathrm{Pr}(U, t)$. Problem (9) can be solved, for example, by a common bisection procedure: first find numbers $\rho_1$ and $\sigma_1$ such that $t - \rho_1\mathbf{1} \in U$ and $t - \sigma_1\mathbf{1} \notin U$. Let $k \ge 1$. Assume that numbers $\rho_k$ and $\sigma_k$ are known such that $t - \rho_k\mathbf{1} \in U$ and $t - \sigma_k\mathbf{1} \notin U$. Then consider the number $\omega_k = \frac{1}{2}(\rho_k + \sigma_k)$. If $t - \omega_k\mathbf{1} \in U$, then put $\rho_{k+1} = \omega_k$, $\sigma_{k+1} = \sigma_k$. If $t - \omega_k\mathbf{1} \notin U$, then put $\rho_{k+1} = \rho_k$, $\sigma_{k+1} = \omega_k$. The number $r = \lim_k \rho_k = \lim_k \sigma_k$ is the optimal value of (9).
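The bisection procedure for problem (9) can be sketched in Python as follows. The sketch assumes $X = \mathbb{R}^n$ ordered by the nonnegative orthant with $\mathbf{1} = (1, \ldots, 1)$, so that $\|\cdot\|$ is the max-norm, and that $U$ is a nonempty closed downward set available only through a membership test; the function names and the doubling step used to find an initial feasible $\rho_1$ are illustrative choices, not part of the text.

```python
def bisect_distance(t, in_U, tol=1e-12):
    """Solve problem (9): minimize lam >= 0 subject to t - lam*1 in U.

    Assumes X = R^n with the orthant order, 1 = (1, ..., 1), and a nonempty
    closed downward U given by the membership test in_U (hypothetical helper).
    Returns rho with t - rho*1 in U and rho - r <= tol, where r = d(t, U).
    """
    def shifted(lam):
        return [ti - lam for ti in t]

    if in_U(t):
        return 0.0                      # t in U, hence d(t, U) = 0
    sigma, rho = 0.0, 1.0               # invariant: t - sigma*1 is not in U
    while not in_U(shifted(rho)):       # enlarge until t - rho*1 lies in U
        sigma, rho = rho, 2.0 * rho
    while rho - sigma > tol:            # the bisection step described above
        omega = 0.5 * (rho + sigma)
        if in_U(shifted(omega)):
            rho = omega
        else:
            sigma = omega
    return rho

# A non-convex closed downward set: the union of two lower orthants {x <= a}.
corners = [(1.0, 3.0), (3.0, 1.0)]

def in_U(x):
    return any(all(xi <= ai for xi, ai in zip(x, a)) for a in corners)

t = [4.0, 2.5]
r = bisect_distance(t, in_U)
u0 = [ti - r for ti in t]   # the least metric projection t - r*1 (Proposition 3)
print(r, u0)                # approximately 1.5 and [2.5, 1.0]
```

Only membership queries are used, which reflects the point of the text: the search for a metric projection onto a downward set reduces to a one-dimensional problem.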

The following necessary and sufficient conditions 
for the global minimum easily follow from the results 
obtained. 


Theorem 1 Let $U$ be a closed downward set and $t \notin U$. Then $u_0 \in U$ is a solution of the problem $\mathrm{Pr}(U, t)$ if and only if

(i) $u_0 \ge \bar{u} := t - r\mathbf{1}$, where $r = \min\{\lambda \ge 0: t - \lambda\mathbf{1} \in U\}$;

(ii) $p(t - u_0) \ge p(u_0 - t)$.

Proof Let $u_0 \in P_U(t)$. Since $\bar{u} := t - r\mathbf{1}$ is the least element of $P_U(t)$, it follows that $u_0 \ge \bar{u}$, so (i) is proved. We now demonstrate that (ii) is valid. In view of the equality $r = \|t - u_0\| = \max(p(t - u_0), p(u_0 - t))$, we conclude that $p(u_0 - t) \le r$ and $p(t - u_0) \le r$. We need to prove that $p(t - u_0) = r$. Assume on the contrary that $p(t - u_0) := \inf\{\lambda: t - u_0 \le \lambda\mathbf{1}\} < r$. Then there exists $\varepsilon > 0$ such that $t - u_0 \le (r - \varepsilon)\mathbf{1}$. This implies that $u_0 \ge t - r\mathbf{1} + \varepsilon\mathbf{1} = \bar{u} + \varepsilon\mathbf{1}$. Since $u_0 \in U$ and $U$ is downward, it follows that $\bar{u} + \varepsilon\mathbf{1} \in U$, so $\bar{u}$ is an interior point of $U$ (Proposition 1(2)). This contradicts the fact that $\bar{u}$ is a best approximation of $t$ by $U$, since metric projections of $t \notin U$ lie in the boundary of $U$.

Assume now that both items (i) and (ii) hold. It follows from (i) that $t - u_0 \le r\mathbf{1}$. Since $p$ is a topical function, we conclude that $p(t - u_0) \le r$. Item (ii) implies $\|t - u_0\| = p(t - u_0) \le r$. Since $r = \min_{u \in U} \|t - u\|$, we conclude that $u_0 \in P_U(t)$. □
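The criterion of Theorem 1 can be checked mechanically in a finite-dimensional setting. The following Python sketch assumes $X = \mathbb{R}^n$ with the orthant order and $\mathbf{1} = (1, \ldots, 1)$, so that $p(x) = \max_i x_i$; the helper names are hypothetical. It tests conditions (i) and (ii) for two candidate points of the closed downward set $U = \{x \in \mathbb{R}^2: x \le (10, 0)\}$ with $t = (0, 5)$. The second candidate satisfies (i) but not (ii), and indeed it is not a metric projection.

```python
def p(x):
    """p(x) = inf{lam : x <= lam*1} = max_i x_i for the orthant order on R^n."""
    return max(x)

def norm(x):
    """Order-unit norm ||x|| = max(p(x), p(-x)) = max_i |x_i|."""
    return max(p(x), p([-v for v in x]))

def theorem1_conditions(u0, t, r):
    """Check (i) u0 >= t - r*1 and (ii) p(t - u0) >= p(u0 - t); u0 is assumed to lie in U."""
    cond_i = all(ui >= ti - r for ui, ti in zip(u0, t))
    diff = [ti - ui for ti, ui in zip(t, u0)]
    cond_ii = p(diff) >= p([-v for v in diff])
    return cond_i and cond_ii

# U = {x in R^2 : x <= (10, 0)} is closed and downward; t = (0, 5) is not in U.
a, t = (10.0, 0.0), (0.0, 5.0)
# For a lower orthant, t - lam*1 lies in U iff lam >= max_i (t_i - a_i) (Corollary 3).
r = max(0.0, max(ti - ai for ti, ai in zip(t, a)))

for u0 in [(3.0, 0.0), (10.0, 0.0)]:
    dist = norm([ti - ui for ti, ui in zip(t, u0)])
    print(u0, theorem1_conditions(u0, t, r), dist == r)
# (3.0, 0.0) True True
# (10.0, 0.0) False False
```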


We now turn to upward sets. 


Definition 2 A set $V \subset X$ is called upward if $(v \in V,\ x \ge v) \Rightarrow x \in V$.


Clearly $V$ is upward if and only if $U = -V$ is downward, so all results obtained for downward sets can be easily reformulated for upward sets.


Proposition 4 A closed upward subset $V$ of $X$ is proximinal.

Proof This is an immediate consequence of Proposition 2. □


Theorem 2 Let $U$ be a closed upward set and $t \notin U$. Then $u_0$ is a solution of the problem $\mathrm{Pr}(U, t)$ if and only if

(i) $u_0 \le t + r\mathbf{1}$, where $r = \min\{\lambda \ge 0: t + \lambda\mathbf{1} \in U\}$;
(ii) $p(u_0 - t) \ge p(t - u_0)$.

Proof The result can be obtained by application of Theorem 1 to the problem $\mathrm{Pr}(-U, -t)$. □


Corollary 4 Let $V \subset X$ be a closed upward set and $t \in X$. Then $d(t, V) = \min\{\lambda \ge 0: t + \lambda\mathbf{1} \in V\}$.


Sets $Z_+$ and $Z_-$


Consider the function $s$ defined on $X$ by

$$
s(x) = \frac{1}{2}\bigl(p(x) - p(-x)\bigr), \qquad x \in X.
$$


We now indicate some properties of the function $s$.

(1) $s$ is homogeneous of degree one, that is, $s(\lambda x) = \lambda s(x)$ for $\lambda \in \mathbb{R}$. Indeed, we need to check that $s(-x) = -s(x)$ for all $x \in X$ and $s(\lambda x) = \lambda s(x)$ for all $x \in X$ and all $\lambda > 0$. Both assertions directly follow from the definition of $s$.

(2) $s$ is topical. It follows directly from the definition of $s$ that $s$ is increasing. We now check that $s(x + \mu\mathbf{1}) = s(x) + \mu$ for all $x \in X$ and all $\mu \in \mathbb{R}$. Indeed,

$$
s(x + \mu\mathbf{1}) = \frac{1}{2}\bigl(p(x + \mu\mathbf{1}) - p(-x - \mu\mathbf{1})\bigr) = \frac{1}{2}\bigl((p(x) + \mu) - (p(-x) - \mu)\bigr) = s(x) + \mu.
$$


We will be interested in the level sets $Z_+ = \{x \in X: s(x) \ge 0\}$ and $Z_- = \{x \in X: s(x) \le 0\}$ of the function $s$. The following holds:

$$
\begin{aligned}
x \in Z_+ &\iff p(x) \ge p(-x) \iff p(x) = \|x\|, \\
x \in Z_- &\iff p(x) \le p(-x) \iff p(-x) = \|x\|.
\end{aligned}
$$


Since $s$ is homogeneous, it follows that $Z_- = -Z_+$. Let $Z_0 = \{x: s(x) = 0\}$. Then

$$
Z_+ \cap Z_- = Z_0, \qquad Z_+ \cup Z_- = X.
$$


Since $s$ is continuous, it follows that $Z_+$ and $Z_-$ are closed subsets of $X$. Note that both $Z_+$ and $Z_-$ are conic sets. (Recall that a set $C \subset X$ is called conic if $(x \in C,\ \lambda > 0) \Rightarrow \lambda x \in C$.)

Since $s$ is increasing, it follows that $Z_+$ is upward and $Z_-$ is downward. Let $R = \{\lambda\mathbf{1}: \lambda \ge 0\}$ be the ray passing through $\mathbf{1}$. In view of the topicality of $s$,

$$
Z_+ = Z_0 + R, \qquad Z_- = Z_0 - R.
$$

Indeed, let $x \in Z_+$; then $s(x) := \lambda \ge 0$. Let $u = x - \lambda\mathbf{1}$. Then $s(u) = 0$, hence $u \in Z_0$. We demonstrated that $x \in Z_0 + R$, so $Z_+ \subset Z_0 + R$. The opposite inclusion trivially holds. Thus, $Z_+ = Z_0 + R$. We also have $Z_- = -Z_+ = -Z_0 - R = Z_0 - R$. We now give some examples.


Example 4 Let $X = C(Q)$ be the space of all continuous functions defined on a compact topological space $Q$ and $p(x) = \max_{q \in Q} x(q)$. Then $s(x) = \frac{1}{2}\bigl(\max_{q \in Q} x(q) + \min_{q \in Q} x(q)\bigr)$; therefore $Z_0 = \{x \in C(Q): \max_{q \in Q} x(q) = -\min_{q \in Q} x(q)\}$. Thus $x \in Z_0$ if and only if there exist points $q_+, q_- \in Q$ such that $|x(q_+)| = |x(q_-)| = \|x\|$ and $x(q_+) \ge 0$, $x(q_-) \le 0$. Further, $x \in Z_+$ if and only if $\|x\| = \max_{q \in Q} x(q) \ge -\min_{q \in Q} x(q)$, and $x \in Z_-$ if and only if $\|x\| = \max_{q \in Q}(-x(q)) \ge -\min_{q \in Q}(-x(q)) = \max_{q \in Q} x(q)$.

Let $Q$ consist of two points. Then $C(Q)$ coincides with $\mathbb{R}^2$ and $s(x) = \frac{1}{2}(x_1 + x_2)$, that is, $s$ is a linear function. If $Q$ contains more than two points, then $s$ is not linear.
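For a finite $Q$ the space $C(Q)$ is simply $\mathbb{R}^n$, and the function $s$ of Example 4 is easy to evaluate. The Python sketch below (helper names are illustrative) checks the topicality identity $s(x + \mu\mathbf{1}) = s(x) + \mu$, the oddness $s(-x) = -s(x)$, membership in $Z_0$, and the failure of additivity when $Q$ has three points.

```python
def p(x):
    """p(x) = max_q x(q) on C(Q) with Q finite, i.e. R^n with the max-norm."""
    return max(x)

def s(x):
    """s(x) = (p(x) - p(-x)) / 2 = (max_q x(q) + min_q x(q)) / 2."""
    return 0.5 * (p(x) - p([-v for v in x]))

def zone(x):
    """Report the level set of s containing x (Z0 belongs to both Z+ and Z-)."""
    v = s(x)
    return "Z0" if v == 0 else ("Z+" if v > 0 else "Z-")

x = [1.0, -3.0, 2.0]
print(s(x), s([v + 1.0 for v in x]), s([-v for v in x]))  # -0.5 0.5 0.5
print(zone([1.0, -1.0, 0.5]))                               # Z0
# With three points s is not additive: s(e1) + s(e2) = 1.0, but s(e1 + e2) = 0.5.
print(s([1.0, 0.0, 0.0]) + s([0.0, 1.0, 0.0]), s([1.0, 1.0, 0.0]))  # 1.0 0.5
```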


Example 5 Let $X = \mathbb{R} \times Y$, where $Y$ is a normed space (Example 2). Let $x = (c, y)$; then $p(x) = c + \|y\|$. Hence

$$
s(x) = \frac{1}{2}\bigl[(c + \|y\|) - (-c + \|y\|)\bigr] = c,
$$

so $s$ is linear. The following holds:

$$
Z_0 = \{(c, y): c = 0\}, \qquad Z_+ = \{(c, y): c \ge 0\}, \qquad Z_- = \{(c, y): c \le 0\}.
$$


Example 6 Let $X = \ell^1$ (see Example 3). Then $s(x) = x_1$ and

$$
Z_0 = \{x = (x_i) \in \ell^1: x_1 = 0\}, \quad Z_+ = \{x = (x_i) \in \ell^1: x_1 \ge 0\}, \quad Z_- = \{x = (x_i) \in \ell^1: x_1 \le 0\}.
$$


Downward Hull and Upward Hull 


Let $U$ be a subset of $X$. The intersection $U_*$ of all downward sets that contain $U$ is called the downward hull of $U$. Since the intersection of an arbitrary family of downward sets is downward, it follows that $U_*$ is downward. Clearly $U_*$ is the least (by inclusion) downward set which contains $U$. The intersection $U^*$ of all upward sets containing $U$ is called the upward hull of $U$. The set $U^*$ is upward and is the least (by inclusion) upward set containing $U$.


Proposition 5 ([15], Proposition 3) Let $U \subset X$. Then

$$
U_* = U - K := \{u - v: u \in U,\ v \in K\}, \qquad U^* = U + K := \{u + v: u \in U,\ v \in K\}.
$$


We need the following result: 


Proposition 6 Consider a closed subset $U$ of $X$.

(1) Let $t \in X$ be an element such that $t - U \subset Z_+$. Then $d(t, U) = d(t, U_*)$.

(2) Let $t \in X$ be an element such that $t - U \subset Z_-$. Then $d(t, U) = d(t, U^*)$.
Proof We shall prove only the first part of the proposition. The second part can be proved in a similar way. Let $r = d(t, U_*)$. Since $U \subset U_*$, it follows that $r \le d(t, U)$, so we need only check the reverse inequality. Let $u_* \in U_*$ be arbitrary. Then, by Proposition 5, there exist $u \in U$ and $v \in K$ such that $u_* = u - v$. Hence

$$
t - u_* = t - u + v = x - u \qquad \text{with } x := t + v \ge t.
$$

By hypothesis, $t - u \in Z_+$. Since $x - u \ge t - u$ and $Z_+$ is upward, it follows that $x - u \in Z_+$. Since $\|z\| = p(z)$ for all $z \in Z_+$ and $p$ is increasing, we have

$$
\|t - u_*\| = \|x - u\| = p(x - u) \ge p(t - u) = \|t - u\|.
$$

Thus for each $u_* \in U_*$ there exists $u \in U$ such that $\|t - u_*\| \ge \|t - u\|$. This means that $r := d(t, U_*) \ge d(t, U)$. We proved that $d(t, U) = r$. □
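Proposition 6 can be illustrated numerically for a finite set $U \subset \mathbb{R}^2$ with the orthant order and $\mathbf{1} = (1, 1)$. In the Python sketch below, $d(t, U_*)$ is computed from Corollary 3 using $U_* = U - K$ (Proposition 5) and compared with $d(t, U)$. This is a sketch under these assumptions, with illustrative helper names; the second set $V$ shows that the equality can fail when $t - V \not\subset Z_+$.

```python
def s(z):
    """s(z) = (max_i z_i + min_i z_i) / 2 for the orthant order on R^n with 1 = (1, ..., 1)."""
    return 0.5 * (max(z) + min(z))

def dist(t, U):
    """d(t, U) = min over u in U of max_i |t_i - u_i|, for a finite set U."""
    return min(max(abs(ti - ui) for ti, ui in zip(t, u)) for u in U)

def dist_downward_hull(t, U):
    """d(t, U_*) with U_* = U - K, via Corollary 3:
    t - lam*1 lies in U - K iff t - lam*1 <= u for some u in U."""
    return min(max(0.0, max(ti - ui for ti, ui in zip(t, u))) for u in U)

t = (0.0, 0.0)
U = [(-2.0, 1.0), (-1.0, -3.0)]
print(all(s([ti - ui for ti, ui in zip(t, u)]) >= 0 for u in U))  # True: t - U lies in Z+
print(dist(t, U), dist_downward_hull(t, U))                        # 2.0 2.0

V = [(1.0, 0.5)]   # here t - V lies in Z-, and the two distances differ
print(dist(t, V), dist_downward_hull(t, V))                        # 1.0 0.0
```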


Proposition 7

(1) Let $t \in X$ be an element such that $t - U \subset Z_+$ and let $U_*$ be a closed set. Then $(U, t)$ is a proximinal pair.

(2) Let $t \in X$ be an element such that $t - U \subset Z_-$ and let $U^*$ be a closed set. Then $(U, t)$ is a proximinal pair.


Proof We shall prove only the first part of the proposition. Since $U_*$ is a closed downward set in $X$, it follows, by Proposition 3, that the least element $u_0$ of the set $P_{U_*}(t)$ exists and $u_0 = t - r\mathbf{1}$, where $r = d(t, U_*)$. In view of Proposition 6, $r = d(t, U)$. Since $u_0 \in U_*$, by Proposition 5 there exist $u \in U$ and $v \in K$ such that $u_0 = t - r\mathbf{1} = u - v$. Then $t - u = r\mathbf{1} - v$ and

$$
p(t - u) = p(r\mathbf{1} - v) \le p(r\mathbf{1}) = r.
$$

Since, by hypothesis, $t - u \in Z_+$, it follows that $\|t - u\| = p(t - u) \le r$. On the other hand, $\|t - u\| \ge d(t, U) = r$. Hence $\|t - u\| = r$, and so $u \in P_U(t)$, which completes the proof. □


Remark 2 Let $U \subset X$ be a closed set. Assume that there exists a set $V \subset X$ such that $V \subset U \subset V_*$ and $V_*$ is closed. Then $U_* = V_*$; hence $U_*$ is closed. In particular, $U_*$ is closed if there exists a compact set $V$ such that $V \subset U \subset V_*$.


Proposition 7 can be used to search for a metric projection of an element $t$ onto a set $U$ such that $t - U \subset Z_+$ and $U_*$ is closed. In particular, we can give the following necessary and sufficient conditions for a solution of the problem $\mathrm{Pr}(U, t)$ for these sets.


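Theorem 3
(1) Let $t - U \subset Z_+$ and let $U_*$ be closed. Then $u_0 \in U$ is a solution of $\mathrm{Pr}(U, t)$ if and only if
(i) $u_0 \ge t - r\mathbf{1}$, where $r = \min\{\lambda \ge 0: t - \lambda\mathbf{1} \in U - K\}$;
(ii) $p(t - u_0) \ge p(u_0 - t)$.
(2) Let $t - U \subset Z_-$ and let $U^*$ be closed. Then $u_0 \in U$ is a solution of $\mathrm{Pr}(U, t)$ if and only if
(i') $u_0 \le t + r\mathbf{1}$, where $r = \min\{\lambda \ge 0: t + \lambda\mathbf{1} \in U + K\}$;
(ii') $p(u_0 - t) \ge p(t - u_0)$.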


Proof We again prove only the first part of the theorem. Due to Proposition 6, we get $d(t, U) = d(t, U_*) = r$. Since $U_*$ is closed and downward, it follows (Proposition 3) that $\bar{u} := t - r\mathbf{1} \in P_{U_*}(t)$. Let $u_0 \in U$ satisfy (i) and (ii), so that $u_0 \ge \bar{u}$. Then $u_0 \in U_*$ and, in view of Proposition 6 and Corollary 3, it holds:

$$
d(t, U) = d(t, U_*) = r = \min\{\lambda \ge 0: t - \lambda\mathbf{1} \in U_*\}.
$$

Applying Theorem 1 we conclude that $u_0$ is a best approximation of $t$ by $U_*$. Since $u_0 \in U$, it follows that $u_0$ is a best approximation of $t$ by $U$.

Consider now a best approximation $u_0$ of $t$ by $U$. Applying again Proposition 6 we deduce that $\|t - u_0\| = d(t, U) = d(t, U_*) = r$. Theorem 1 demonstrates that both (i) and (ii) hold. □


Metric Projection onto a Closed Set 


Downward and upward sets can be used for examina- 
tion of best approximations by arbitrary closed sets (it 
is assumed that a metric projection exists). 

We start with the following assertion. 


Proposition 8 Let $U$ be a closed subset of $X$ and $t \in X$. Consider the following sets:

$$
U_t^+ = U \cap (t - Z_+), \qquad U_t^- = U \cap (t - Z_-). \tag{10}
$$

Then

(1) $t - U_t^+ \subset Z_+$, $t - U_t^- \subset Z_-$.

(2) $U_t^+ \cup U_t^- = U$.

(3) $U_t^+ \cap U_t^- = U \cap (t - Z_0)$, where $Z_0 = \{x \in X: s(x) = 0\}$.

(4) $U_t^+$ and $U_t^-$ are closed.

(5) If $U$ is downward, then $U_t^+$ is downward; if $U$ is upward, then $U_t^-$ is upward.

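Proof
(1) It is easy to check that

$$
(t - U) \cap Z_+ = t - [U \cap (t - Z_+)] = t - U_t^+.
$$

Hence $t - U_t^+ \subset Z_+$. A similar argument shows that $t - U_t^- \subset Z_-$.
(2) The following holds:

$$
\begin{aligned}
U_t^+ \cup U_t^- &= [(t - Z_+) \cap U] \cup [(t - Z_-) \cap U] \\
&= [(t - Z_+) \cup (t - Z_-)] \cap U \\
&= [t - (Z_+ \cup Z_-)] \cap U.
\end{aligned}
$$

Since $Z_+ \cup Z_- = X$, it follows that $U_t^+ \cup U_t^- = U$.
(3) The following holds:

$$
\begin{aligned}
U_t^+ \cap U_t^- &= [U \cap (t - Z_+)] \cap [U \cap (t - Z_-)] \\
&= U \cap [(t - Z_+) \cap (t - Z_-)] \\
&= U \cap [t - (Z_+ \cap Z_-)].
\end{aligned}
$$

Since $Z_+ \cap Z_- = Z_0$, the result follows.
(4) This is clear.
(5) It follows from the fact that $t - Z_+$ is downward and $t - Z_-$ is upward. □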


Consider a fixed proximinal pair $(U, t)$. Let $U_t^+$ and $U_t^-$ be the sets defined by (10). Since $U_t^+ \cup U_t^- = U$, it follows that

$$
\inf_{u \in U} \|t - u\| = \min\Bigl(\inf_{u^+ \in U_t^+} \|t - u^+\|,\ \inf_{u^- \in U_t^-} \|t - u^-\|\Bigr). \tag{11}
$$

It follows from (11) that at least one of the pairs $(U_t^+, t)$ and $(U_t^-, t)$ is proximinal and a metric projection of $t$ onto $U$ coincides with a metric projection onto at least one of the sets $U_t^+$ or $U_t^-$. Let

$$
r_+ = \inf_{u \in U_t^+} \|t - u\|, \qquad r_- = \inf_{u \in U_t^-} \|t - u\|, \qquad r = \inf_{u \in U} \|t - u\| = \min(r_+, r_-). \tag{12}
$$
ueU 


To examine metric projections of $t$ onto $U$ we need to find the numbers $r_+$ and $r_-$. The number $r_+$ can be found by solving a one-dimensional optimization problem of the form (9); $r_-$ can be found by solving a similar problem.
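For a finite set $U$ the splitting (10) and the numbers $r_+$, $r_-$ of (12) can be computed directly. The Python sketch below assumes $\mathbb{R}^2$ with the orthant order and $\mathbf{1} = (1, 1)$, so $p(x) = \max_i x_i$. Since $\|t - u\| = p(t - u)$ on $U_t^+$ and $\|t - u\| = p(u - t)$ on $U_t^-$, no bisection is needed in this finite case. The helper names are illustrative, not from the text.

```python
def s(z):
    """s(z) = (max_i z_i + min_i z_i) / 2 (orthant order, 1 = (1, ..., 1))."""
    return 0.5 * (max(z) + min(z))

def split(t, U):
    """U_t^+ = {u in U : t - u in Z+} and U_t^- = {u in U : t - u in Z-}, see (10).
    Points with s(t - u) = 0 belong to both parts."""
    plus = [u for u in U if s([ti - ui for ti, ui in zip(t, u)]) >= 0]
    minus = [u for u in U if s([ti - ui for ti, ui in zip(t, u)]) <= 0]
    return plus, minus

def r_plus(t, Up):
    # On U_t^+ the norm of t - u equals p(t - u) = max_i (t_i - u_i).
    return min(max(ti - ui for ti, ui in zip(t, u)) for u in Up) if Up else float("inf")

def r_minus(t, Um):
    # On U_t^- the norm of t - u equals p(u - t) = max_i (u_i - t_i).
    return min(max(ui - ti for ti, ui in zip(t, u)) for u in Um) if Um else float("inf")

t = (0.0, 0.0)
U = [(-2.0, 1.0), (-1.0, -3.0), (1.5, 0.5)]
Up, Um = split(t, U)
rp, rm = r_plus(t, Up), r_minus(t, Um)
print(Up, Um)               # [(-2.0, 1.0), (-1.0, -3.0)] [(1.5, 0.5)]
print(rp, rm, min(rp, rm))  # 2.0 1.5 1.5
```

Here $r_- < r_+$, so the metric projection of $t$ onto $U$ is found in $U_t^-$.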

If $r_+ < r_-$, then a metric projection of $t$ onto $U$ coincides with a metric projection of $t$ onto $U_t^+$. Since $t - U_t^+ \subset Z_+$, we can use the results of this section for analyzing the problem $\mathrm{Pr}(U, t)$ and its solution. In particular, if the downward hull $(U_t^+)_*$ of the set $U_t^+$ is closed, we can assert that the set $P_U(t)$ coincides with the set $P_{U_t^+}(t)$. Using Theorem 3 we can give necessary and sufficient conditions for the global minimum in this case in terms of the set $U_t^+$. They can be expressed in the following form:

$$
P_U(t) = P_{U_t^+}(t) = \{u \in U_t^+: u \ge t - r_+\mathbf{1},\ p(t - u) \ge p(u - t)\}.
$$


If $r_- < r_+$, then a metric projection of $t$ onto $U$ coincides with a metric projection of $t$ onto $U_t^-$. If the set $(U_t^-)^*$ is closed, we can assert that

$$
P_U(t) = P_{U_t^-}(t) = \{u \in U_t^-: u \le t + r_-\mathbf{1},\ p(u - t) \ge p(t - u)\}.
$$

If $r_- = r_+$, then we can use both sets $U_t^+$ and $U_t^-$.

We assume in the rest of this section that both pairs $(U_t^+, t)$ and $(U_t^-, t)$ are proximinal. In particular, these pairs are proximinal for arbitrary $t$ if $U$ is a locally compact set.

We are now interested in metric projections $u$ of $t$ onto $U$ such that $s(u - t) = 0$. We introduce the following definition.


Definition 3 A pair $(U, t)$ with $U \subset X$, $t \in X$ is called strongly proximinal if $s(u - t) = 0$ for each metric projection $u$ of $t$ onto $U$.

Recall that $s(u - t) = 0$ if and only if $u - t \in Z_+ \cap Z_-$.


Proposition 9 The following assertions (i) and (ii) are equivalent:

(i) $(U, t)$ is a strongly proximinal pair;

(ii) $P_U(t) = P_{U_t^+}(t) \cap P_{U_t^-}(t)$.


Proof
(i) ⟹ (ii). Let $u \in P_U(t)$. Since, by (i), $u - t \in Z_0 \subset Z_- = -Z_+$ and $u \in U$, it follows that $u \in U \cap (t - Z_+) = U_t^+$. Then $\|t - u\| = \min_{u' \in U} \|t - u'\| \le \min_{u' \in U_t^+} \|t - u'\|$. Since $u \in U_t^+$, we conclude that the equality $\|t - u\| = \min_{u' \in U_t^+} \|t - u'\|$ holds. Thus $u \in P_{U_t^+}(t)$. A similar argument shows that $u \in P_{U_t^-}(t)$. Conversely, let $u \in P_{U_t^+}(t) \cap P_{U_t^-}(t)$. Then

$$
\|u - t\| = d(t, U_t^+) = d(t, U_t^-).
$$

Combining the equality $U = U_t^+ \cup U_t^-$ with (11), we get $\|u - t\| = \min_{u' \in U} \|u' - t\|$, and hence $u \in P_U(t)$.
(ii) ⟹ (i). Since (ii) holds, it follows that

$$
\begin{aligned}
P_U(t) &= P_{U_t^+}(t) \cap P_{U_t^-}(t) \\
&= \{u \in U_t^+: t - r\mathbf{1} \le u\} \cap \{u \in U_t^-: u \le t + r\mathbf{1}\} \\
&= \{u \in U_t^+ \cap U_t^-: t - r\mathbf{1} \le u \le t + r\mathbf{1}\}.
\end{aligned}
$$

Applying Proposition 8(3), we conclude that

$$
P_U(t) = \{u \in U \cap (t - Z_0): t - r\mathbf{1} \le u \le t + r\mathbf{1}\} = U \cap (t - Z_0) \cap B(t, r).
$$

Since $P_U(t) = U \cap B(t, r)$ (by definition), it follows that $P_U(t) \subset t - Z_0$, that is, the pair $(U, t)$ is strongly proximinal. □


Let $(U, t)$ be a proximinal pair. We are interested in a description of conditions that guarantee that $\bar{v} := t - \bar{u}$, where $\bar{u}$ is a metric projection of $t$ onto $U$, belongs to $Z_+ \cap Z_- = Z_0$. First, we give the following definition:


Definition 4 We say that a set $U \subset X$ is weakly $K$-open if for each $u \in U$ there exists an element $q \in \operatorname{int} K$ such that $u + \delta q \in U$ for all $\delta$ with small enough $|\delta|$.


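Proposition 10 Assume that $(U, t)$ is a proximinal pair such that the set $U$ is weakly $K$-open. Let $\bar{u} \in P_U(t)$. Then $\bar{v} := t - \bar{u} \in Z_0$.

Proof Let $\bar{v} \notin Z_0$; then $\bar{v} \notin Z_+ \cap Z_-$. Assume for the sake of definiteness that $\bar{v} \in Z_+$, that is, $\|\bar{v}\| = p(\bar{v}) > p(-\bar{v})$. Since $U$ is weakly $K$-open and $\bar{u} \in U$, it follows that there exists $q \in \operatorname{int} K$ such that $\bar{u} + \delta q \in U$ for all small enough $\delta > 0$. Then, for all small enough $\delta > 0$,

$$
p(\bar{v}) > p(\bar{v} - \delta q) \ge p(-\bar{v} + \delta q) = p(-(\bar{v} - \delta q)).
$$

(The first inequality holds because $q \in \operatorname{int} K$; the second follows from $p(\bar{v}) > p(-\bar{v})$ and the continuity of $p$.) Hence $\|\bar{v} - \delta q\| = p(\bar{v} - \delta q) < p(\bar{v}) = \|\bar{v}\|$. Let $\tilde{u} = \bar{u} + \delta q$. Because $U$ is weakly $K$-open, we conclude that $\tilde{u} \in U$ for all small enough $\delta > 0$. Since $\bar{v} - \delta q = t - \bar{u} - \delta q = t - \tilde{u}$, we obtain

$$
\min_{u \in U} \|t - u\| \le \|t - \tilde{u}\| = \|\bar{v} - \delta q\| < \|\bar{v}\| = \|t - \bar{u}\|.
$$

This is a contradiction because $\bar{u} \in P_U(t)$. □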


Example 7 Let $U' \subset X$ be a locally compact set and $q \in \operatorname{int} K$. Consider the set

$$
U = U' + \{\lambda q: \lambda \in \mathbb{R}\} = \{u' + \lambda q: u' \in U',\ \lambda \in \mathbb{R}\}.
$$

Clearly $U$ is a locally compact set and $U$ is weakly $K$-open. Then for each $t \in X$ the pair $(U, t)$ is strongly proximinal.


Best Approximation in a Class of Normed Spaces with Star-Shaped Cones


The theory of best approximation by elements of 
convex sets in normed linear spaces is well devel- 
oped and has found many applications [1,2,4,5,10, 
16,17,18,19,20]. However, convexity is sometimes a re- 
strictive assumption, and therefore the problem arises 
of how to examine best approximation by not necessar- 
ily convex sets. Special tools for this are needed. 

The aim of the present article is to develop a theory of best approximation by elements of closed sets in a class of normed spaces with star-shaped cones (see [9]). A star-shaped cone $K$ in a normed space $X$ generates a relation $\le_K$ on $X$, which is an order relation if and only if $K$ is convex. It can be shown that each star-shaped cone $K$ such that the interior of the kernel of $K$ is not empty can be represented as the union of closed solid convex pointed cones $K_i$ ($i \in I$, where $I$ is an index set) such that the interior of the cone $K_* := \bigcap_{i \in I} K_i$ is not empty. A point $\mathbf{1} \in \operatorname{int} K_*$ generates the norm $\|\cdot\|_*$ on $X$, where $\|x\|_* = \inf\{\lambda > 0: x \le_{K_*} \lambda\mathbf{1},\ -x \le_{K_*} \lambda\mathbf{1}\}$, and we assume that $X$ is equipped with this norm. In the special case $I = \{1\}$ (that is, $K$ is a closed convex solid pointed cone) the class of spaces under consideration contains such Banach lattices as the space $L^\infty(S, \Sigma, \mu)$ of all essentially bounded functions defined on a measure space $(S, \Sigma, \mu)$ and the space $C(Q)$ of all continuous functions defined on a compact topological space $Q$.

Now, let $X$ be a normed space and $U \subset X$. The set $\operatorname{kern} U$ consisting of all $u \in U$ such that $(x \in U,\ 0 \le \alpha \le 1) \Rightarrow u + \alpha(x - u) \in U$ is called the convex kernel of $U$. A nonempty set $U$ is called star-shaped if $\operatorname{kern} U$ is not empty. It is known (see, for example, [12]) that $\operatorname{kern} U$ is convex for an arbitrary star-shaped set $U$. If $U$ is closed, then $\operatorname{kern} U$ is also closed. Indeed, let $u_k \in \operatorname{kern} U$, $k = 1, 2, \ldots$, and $u_k \to u$. For each $k = 1, 2, \ldots$, $x \in U$ and $\alpha \in [0, 1]$, we have $u_k + \alpha(x - u_k) \in U$, and so $u + \alpha(x - u) \in U$. This means that $u \in \operatorname{kern} U$.
We need the following statement. 


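Proposition 11 Let $U \subset X$ be a set and let $u \in U$. Then the following assertions are equivalent:

(i) There exist $\varepsilon > 0$, an index set $I$, and a family of convex sets $(U_i)_{i \in I}$ such that

$$
U = \bigcup_{i \in I} U_i \quad \text{and} \quad U_i \supset B(u, \varepsilon) \quad (i \in I). \tag{13}
$$

(ii) $U$ is a star-shaped set and $u \in \operatorname{int} \operatorname{kern} U$.

Proof

(i) ⟹ (ii). Let $z \in B(u, \varepsilon)$ and let $x \in U$, $\alpha \in [0, 1]$. It follows from (13) that there exists $i \in I$ such that $x \in U_i$. Since $U_i$ is convex and $z \in B(u, \varepsilon) \subset U_i$, we conclude that $z + \alpha(x - z) \in U_i \subset U$. Hence, $z \in \operatorname{kern} U$ for each $z \in B(u, \varepsilon)$, and so $B(u, \varepsilon) \subset \operatorname{kern} U$.

(ii) ⟹ (i). Let $I = U$. Since $u \in \operatorname{int} \operatorname{kern} U$, it follows that there exists $\varepsilon > 0$ such that $B(u, \varepsilon) \subset \operatorname{kern} U$. Let $x \in U$ and $U_x = \operatorname{co}(\{x\} \cup B(u, \varepsilon))$. Then the set $U_x$ is convex and closed and $x \in U_x$. Hence, $U \subset \bigcup_{x \in U} U_x$. Applying the definition of the convex kernel we conclude that $U_x \subset U$. Hence, $\bigcup_{x \in U} U_x \subset U$. □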


If $0 \in \operatorname{kern} U$, then the Minkowski gauge $\mu_U$ of $U$ can be defined as follows:

$$
\mu_U(x) = \inf\{\lambda > 0: x \in \lambda U\}. \tag{14}
$$

(It is assumed that $\inf \emptyset = +\infty$.)
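The Minkowski gauge (14) of a star-shaped set can be evaluated by bisection, because $0 \in \operatorname{kern} U$ makes the sets $\lambda U$ increase with $\lambda$. The Python sketch below assumes a concrete non-convex set in $\mathbb{R}^2$: the union of the rectangles $[-2, 2] \times [-\tfrac12, \tfrac12]$ and $[-\tfrac12, \tfrac12] \times [-2, 2]$, which is star-shaped with $0$ in its kernel. For this set the gauge equals the smaller of the two rectangle gauges, which gives a closed form to compare against. The function names are illustrative.

```python
import math

def in_cross(x):
    """Union of two rectangles, each containing 0: star-shaped with 0 in its kernel, not convex."""
    a, b = x
    return (abs(a) <= 2.0 and abs(b) <= 0.5) or (abs(a) <= 0.5 and abs(b) <= 2.0)

def gauge(x, in_U, tol=1e-12, max_doublings=200):
    """mu_U(x) = inf{lam > 0 : x in lam*U}, see (14), computed by bisection on lam.

    Because 0 is in kern U, lam*U grows with lam, so {lam : x/lam in U} is a half-line.
    Returns math.inf when no tested lam works (convention inf of the empty set = +inf).
    """
    if all(v == 0 for v in x):
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(max_doublings):       # find hi with x in hi*U
        if in_U([v / hi for v in x]):
            break
        lo, hi = hi, 2.0 * hi
    else:
        return math.inf
    while hi - lo > tol:                 # keep lo infeasible and hi feasible
        mid = 0.5 * (lo + hi)
        if in_U([v / mid for v in x]):
            hi = mid
        else:
            lo = mid
    return hi

print(gauge((1.0, 1.0), in_cross))   # approximately 2.0
print(gauge((3.0, 0.25), in_cross))  # approximately 1.5
```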

Let $u \in \operatorname{kern} U$. Then $0 \in \operatorname{kern}(U - u)$, and so we can consider the Minkowski gauge $\mu_{U-u}$ of the set $U - u$.

Theorem 4 Let $u \in \operatorname{int} \operatorname{kern} U$. Then the Minkowski gauge $\mu_{U-u}$ of the set $U - u$ is Lipschitz.

Theorem 4 has been proved in [11] (Theorem 5.2) for finite-dimensional spaces. The proof from [11] holds for an arbitrary normed space and we omit it.


In the sequel, we shall study star-shaped cones. Recall that a set $K \subset X$ is called a cone (or conic set) if $(\lambda > 0,\ x \in K) \Rightarrow \lambda x \in K$. Let $K$ be a star-shaped cone and $K_* = \operatorname{kern} K$. Then $K_*$ is also a cone. Indeed, let $u \in K_*$, $\lambda > 0$ and $x \in K$. Let $x' = x/\lambda$. Then $x' \in K$, and so $u + \alpha(x' - u) \in K$ for all $\alpha \in [0, 1]$. Since $K$ is a cone, we have $\lambda u + \alpha(x - \lambda u) = \lambda\bigl(u + \alpha(x' - u)\bigr) \in K$. Since $x$ is an arbitrary element of $K$, it follows that $\lambda u \in \operatorname{kern} K = K_*$. We now give an example.


Example 7 Let $X$ coincide with the space $C(Q)$ of all continuous functions defined on a compact metric space $Q$ and $K = \{x \in C(Q): \max_{q \in Q} x(q) \ge 0\}$. Clearly $K$ is a nonconvex cone. It is easy to check that $K$ is a star-shaped cone and $\operatorname{kern} K = K_+$, where

$$
K_+ = \{x \in C(Q): x(q) \ge 0 \text{ for all } q \in Q\} = \{x \in C(Q): \min_{q \in Q} x(q) \ge 0\}.
$$

Indeed, let $u \in K_+$. Consider a point $x \in K$. Then there exists a point $q' \in Q$ such that $x(q') \ge 0$. Since $u(q) \ge 0$ for all $q \in Q$, it follows that $\alpha u(q') + (1 - \alpha)x(q') \ge 0$ for all $\alpha \in [0, 1]$. Therefore, $\alpha u + (1 - \alpha)x \in K$. We proved that $K_+ \subset \operatorname{kern} K$. Now, consider $u \notin K_+$. Then there exists a point $q'$ such that $u(q') < 0$. Since $u$ is continuous, we can find an open set $G \subset Q$ such that $u(q) < 0$ for $q \in G$. Let $x \in K$ be a function such that $x(q) < 0$ for all $q \notin G$ (such a function exists). Since the set $Q \setminus G$ is compact, it follows that $\max_{q \notin G} x(q) < 0$; hence $\alpha x(q) + (1 - \alpha)u(q) < 0$ for all $q \in Q$ and small enough $\alpha > 0$. Therefore $\alpha x + (1 - \alpha)u \notin K$ for these numbers $\alpha$. The equality $\operatorname{kern} K = K_+$ has been proved. Note that $\operatorname{int} \operatorname{kern} K \ne \emptyset$.


The following statement plays an important role in this paper.

Theorem 5 Let $K \subset X$ be a closed cone and let $u \in K$. Then the following assertions are equivalent:

(i) There exist $\varepsilon > 0$, an index set $I$ and a family of closed convex cones $(K_i)_{i \in I}$ such that

$$
K = \bigcup_{i \in I} K_i \quad \text{and} \quad K_i \supset B(u, \varepsilon) \quad (i \in I). \tag{15}
$$

(ii) $K$ is a star-shaped cone and $u \in \operatorname{int} \operatorname{kern} K$.


Proof

(i) ⟹ (ii). It follows from Proposition 11 that $K$ is a star-shaped set and $u \in \operatorname{int} \operatorname{kern} K$. Since $K_i$ is a cone for each $i \in I$, it follows that $K$ is a cone.



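(ii) ⟹ (i). In view of Proposition 11, there exists a family of convex sets $U_i$ $(i \in I)$ such that $U_i \supset B(u, \varepsilon)$ and $K = \bigcup_{i \in I} U_i$. Let $K_i$ be the closed conic hull of $U_i$: $K_i = \operatorname{cl} \bigcup_{\lambda > 0} \lambda U_i$. Then $K = \bigcup_{i \in I} K_i$. □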


Remark 3

(1) Let $K$ be a closed star-shaped cone with $\operatorname{int} \operatorname{kern} K \ne \emptyset$. Then the set $K_* = \operatorname{kern} K$ is a closed solid convex cone. (Recall that a convex cone $K$ is called solid if $\operatorname{int} K \ne \emptyset$.)

(2) Note that in Theorem 5, the family $(K_i)_{i \in I}$ can be chosen such that each $K_i$ is a closed solid pointed convex cone. Indeed, if $u \in \operatorname{int} \operatorname{kern} K$, then $u \ne 0$ and a neighborhood $B(u, \varepsilon) \subset \operatorname{kern} K$ can be chosen in such a way that $0 \notin B(u, \varepsilon)$. Then the closed conic hull $K_i = \operatorname{cl} \bigcup_{\lambda > 0} \lambda U_i$ is a closed solid pointed convex cone.


Let $K$ be a star-shaped cone and $K = \bigcup_{i \in I} K_i$, where each $K_i$ is a convex cone, and let $K_* = \bigcap_{i \in I} K_i$. Then $\operatorname{kern} K \supset K_*$. Indeed, let $u \in K_*$ and $x \in K$. Then there exists $j \in I$ such that $x \in K_j$. The inclusion $u \in \bigcap_{i \in I} K_i$ implies that $u \in K_j$. Since $K_j$ is a convex cone, it follows that $\alpha x + (1 - \alpha)u \in K_j$ for all $\alpha \in [0, 1]$. This means that $u \in \operatorname{kern} K$.

Let $K$ be a closed star-shaped cone and $u \in \operatorname{int} \operatorname{kern} K$. Consider the function

$$
p_{u,K}(x) = \inf\{\lambda \in \mathbb{R}: \lambda u - x \in K\}. \tag{16}
$$

Functions (16) are well known if $K$ is a convex cone. These functions have been defined and studied in [12] for the so-called strongly star-shaped cones (see [11] for the definition of strongly star-shaped sets). Each star-shaped set $U$ with $\operatorname{int} \operatorname{kern} U \ne \emptyset$ is strongly star-shaped. (It was shown in [11] for finite-dimensional spaces; however, the same argument is valid for arbitrary normed spaces.) It was shown in [12] that $p_{u,K}$ is a finite positively homogeneous function of the first degree and the infimum in (16) is attained, so $p_{u,K}(x)u - x \in K$. The following equality holds for every $\gamma \le p_{u,K}(x)$:

$$
p_{u,K}(x - \gamma u) = \mu_{K-u}(\gamma u - x), \tag{17}
$$

where $\mu_{K-u}$ is the Minkowski gauge of $K - u$. In view of Theorem 4, the function $\mu_{K-u}$ is Lipschitz; therefore $p_{u,K}$ is also Lipschitz. If $K$ is a convex cone, then $p_{u,K}$ is a sublinear function. This function is also increasing in the sense of the order relation induced by the convex cone $K$. The following assertion holds (see [12]).


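Proposition 12 Let $K$ be a star-shaped cone and $u \in \operatorname{int} \operatorname{kern} K$. Then

$$
p_{u,K}(x + \lambda u) = p_{u,K}(x) + \lambda, \qquad x \in X,\ \lambda \in \mathbb{R}, \tag{18}
$$

and

$$
\{x: p_{u,K}(x) \le \lambda\} = \lambda u - K, \qquad \lambda \in \mathbb{R}. \tag{19}
$$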


We also need the following assertion. 


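Proposition 13 Let $(K_i)_{i \in I}$ be a family of closed star-shaped cones such that $\bigcap_{i \in I} \operatorname{int} \operatorname{kern} K_i \ne \emptyset$. Let $u \in \bigcap_{i \in I} \operatorname{int} \operatorname{kern} K_i$. Let $K = \bigcup_{i \in I} K_i$ and $K_* = \bigcap_{i \in I} K_i$. Then

$$
p_{u,K}(x) = \inf_{i \in I} p_{u,K_i}(x), \qquad p_{u,K_*}(x) = \sup_{i \in I} p_{u,K_i}(x) \qquad (x \in X).
$$

Proof Let $L$ be a cone such that $u \in \operatorname{int} \operatorname{kern} L$. For each $x \in X$ consider the set $\Lambda_x(L) = \{\lambda \in \mathbb{R}: \lambda u - x \in L\}$. It was proved in [12], Proposition 1, that this set is a closed segment of the form $[\lambda_x, +\infty)$, where $\lambda_x = p_{u,L}(x)$. We have

$$
\begin{aligned}
\Lambda_x(K) &= \Bigl\{\lambda \in \mathbb{R}: \lambda u \in x + \bigcup_{i \in I} K_i\Bigr\} = \Bigl\{\lambda \in \mathbb{R}: \lambda u \in \bigcup_{i \in I} (x + K_i)\Bigr\} \\
&= \bigcup_{i \in I} \{\lambda \in \mathbb{R}: \lambda u \in x + K_i\} = \bigcup_{i \in I} \Lambda_x(K_i).
\end{aligned}
$$

Hence

$$
p_{u,K}(x) = \inf \Lambda_x(K) = \inf \bigcup_{i \in I} \Lambda_x(K_i) = \inf_{i \in I} \inf \Lambda_x(K_i) = \inf_{i \in I} p_{u,K_i}(x).
$$

The second part of the proposition can be proved by a similar argument. □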


Let $K$ be a closed star-shaped cone with $\operatorname{int} \operatorname{kern} K \ne \emptyset$. Then $K$ can be represented as the union of a family of closed convex cones $(K_i)_{i \in I}$. One such family has been described in the proofs of Proposition 11 and Theorem 5: $I = K$, $K_i = \operatorname{cl} \operatorname{cone} \operatorname{co}(\{i\} \cup B(u, \varepsilon))$, where $u \in \operatorname{int} \operatorname{kern} K$ and $\varepsilon > 0$ is so small that $B(u, \varepsilon) \subset \operatorname{kern} K$. This family is very large; often we can find a much simpler representation. For example, assume that a cone $K$ is given as the union of a family of closed convex cones $(K_i)_{i \in I}$ such that the cone $\bigcap_{i \in I} K_i$ has a nonempty interior. Then this cone is contained in $\operatorname{kern} K$; we can use the given cones $K_i$ in such a case. We always assume that the cones $K_i$ are pointed for all $i \in I$, that is, $K_i \cap (-K_i) = \{0\}$.

An arbitrary star-shaped cone $K$ induces a relation $\ge_K$ on $X$, where $x \le_K y$ means that $y - x \in K$. This relation is a preorder relation if and only if $K$ is a convex set. Although $\ge_K$ is not necessarily an order relation, we will say that $x$ is greater than or equal to $y$ in the sense of $K$ if $x \ge_K y$. We say that $x$ is greater than $y$ and write $x >_K y$ if $x - y \in K \setminus \{0\}$. Let $K = \bigcup_{i \in I} K_i$, where each $K_i$ is a convex cone. The cone $K_i$ induces the order relation $\ge_{K_i}$. The relation $\ge_K$, which is induced by the cone $K$, can be represented in the following form:

$$
x \ge_K y \quad \text{if and only if there exists } i \in I \text{ such that } x \ge_{K_i} y. \tag{20}
$$


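In the rest of this article, we assume that $X$ is equipped with a closed star-shaped cone $K$ with $\operatorname{int} \operatorname{kern} K \ne \emptyset$. We also assume that a family $(K_i)_{i \in I}$ of closed solid convex pointed cones $K_i$ is given such that $K = \bigcup_{i \in I} K_i$ and $K_* = \bigcap_{i \in I} K_i$ has a nonempty interior. Let an element $\mathbf{1} \in \operatorname{int} K_*$ be fixed. It is clear that $\mathbf{1} \in \operatorname{int} K_i$ for all $i \in I$. We will also use the following notation:

$$
p_{\mathbf{1},K} = p, \qquad p_{\mathbf{1},K_i} = p_i, \qquad p_{\mathbf{1},K_*} = p_*. \tag{21}
$$

It follows from Proposition 13 that

$$
p(x) = \inf_{i \in I} p_i(x), \qquad p_*(x) = \sup_{i \in I} p_i(x). \tag{22}
$$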


A function $f: X \to \mathbb{R}$ is called plus-homogeneous (with respect to $\mathbf{1}$) if

$$
f(x + \lambda\mathbf{1}) = f(x) + \lambda \quad \text{for all } x \in X \text{ and } \lambda \in \mathbb{R}.
$$

(The term plus-homogeneous was coined in [13].) It follows from (18) that $p_i$ $(i \in I)$, $p$ and $p_*$ are plus-homogeneous functions.

Let

$$
B_i = \{x \in X: \mathbf{1} \ge_{K_i} x \ge_{K_i} -\mathbf{1}\}, \qquad i \in I. \tag{23}
$$

Since $K_i$ is a closed solid convex pointed cone, it is easy to check that $B_i$ $(i \in I)$ can be considered as the unit ball of the norm $\|\cdot\|_i$ defined on $X$ by

$$
\|x\|_i := \max\bigl(p_i(x), p_i(-x)\bigr), \qquad x \in X. \tag{24}
$$


Let

$$
\|x\|_* = \sup_{i \in I} \|x\|_i \qquad (x \in X). \tag{25}
$$


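We now show that $\|x\|_* < +\infty$ for each $x \ne 0$. Indeed, since $\mathbf{1} \in \operatorname{int} K_* \subset \operatorname{int} K_i$, it follows that there exists $\varepsilon > 0$ such that $\mathbf{1} + \varepsilon B \subset K_i$ for all $i \in I$, where $B = \{x \in X: \|x\| \le 1\}$ is the closed unit ball with respect to the initial norm $\|\cdot\|$ of the normed space $X$. Let $x \ne 0$. Then $x' = (\varepsilon/\|x\|)x \in \varepsilon B$; hence $\mathbf{1} - x' \in K_i$. This implies that

$$
p_i(x') = \inf\{\lambda \in \mathbb{R}: \lambda\mathbf{1} - x' \in K_i\} \le 1.
$$

Since $p_i$ is a positively homogeneous function, it follows that

$$
p_i(x) = p_i\Bigl(\frac{\|x\|}{\varepsilon}x'\Bigr) = \frac{\|x\|}{\varepsilon}p_i(x') \le \frac{\|x\|}{\varepsilon}.
$$

The same argument demonstrates that $p_i(-x) \le \|x\|/\varepsilon$. Hence

$$
\|x\|_* = \sup_{i \in I} \|x\|_i = \sup_{i \in I} \max\bigl(p_i(x), p_i(-x)\bigr) \le \frac{\|x\|}{\varepsilon} < +\infty.
$$

Clearly $\|\cdot\|_*$ is a norm on $X$. It is easy to see that

$$
\|x\|_* = \max\bigl(p_*(x), p_*(-x)\bigr), \qquad x \in X. \tag{26}
$$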


Due to (23), we have

$$
B_i(x, r) := \{y \in X: \|y - x\|_i \le r\} = \{y \in X: x + r\mathbf{1} \ge_{K_i} y \ge_{K_i} x - r\mathbf{1}\}, \tag{27}
$$

where $x \in X$, $i \in I$ and $r > 0$. Let $x \in X$ and $r > 0$. Consider the closed ball $B(x, r)$ with center $x$ and radius $r$ with respect to $\|\cdot\|_*$:

$$
B(x, r) = \{y \in X: \|y - x\|_* \le r\} = \{y \in X: x + r\mathbf{1} \ge_{K_*} y \ge_{K_*} x - r\mathbf{1}\}. \tag{28}
$$

It follows from (20), (27), and (28) that

$$
B(x, r) = \bigcap_{i \in I} B_i(x, r), \tag{29}
$$

and

$$
B(x, r) \subset \{y \in X: x + r\mathbf{1} \ge_K y \ge_K x - r\mathbf{1}\}. \tag{30}
$$


We now present an example. 
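Example 8 Let $X = \mathbb{R}^2$. Consider the cones

$$
\begin{aligned}
A &= \{(x, y) \in X: x \ge 0 \text{ and } y \ge 2x\}, \\
B &= \{(x, y) \in X: x \le 0 \text{ and } y \ge \tfrac{1}{2}x\}, \\
C &= \{(x, y) \in X: x \ge 0 \text{ and } y \ge -\tfrac{1}{2}x\}, \\
D &= \{(x, y) \in X: x \le 0 \text{ and } y \ge -2x\}.
\end{aligned}
$$

Set $K_1 = A \cup B$, $K_2 = C \cup D$, $K = K_1 \cup K_2$, and $K_* := K_1 \cap K_2 = A \cup D$. It is easy to check that $K$ is not a convex set while $K_1$, $K_2$ and $K_*$ are convex sets. We also have:

$$
p_*(x, y) = \max(y - 2x,\ y + 2x), \qquad \|(x, y)\|_* = |y| + 2|x| \qquad \text{for all } (x, y) \in X.
$$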
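The functions of Example 8 can be checked numerically. The Python sketch below encodes the four pieces $A$, $B$, $C$, $D$, computes $p_{\mathbf{1},L}(z) = \inf\{\lambda: \lambda\mathbf{1} - z \in L\}$ by bisection for $L = K_1, K_2, K, K_*$, and compares the results with (22) and with the formulas for $p_*$ and $\|\cdot\|_*$ given above. The example does not state $\mathbf{1}$ explicitly; the sketch assumes $\mathbf{1} = (0, 1)$, which is consistent with the stated formula for $p_*$. Helper names and the bracketing interval are illustrative.

```python
# Pieces of Example 8 in R^2; K1 = A u B, K2 = C u D and K_* = A u D are convex cones.
PIECES = {
    "A": lambda x, y: x >= 0 and y >= 2 * x,
    "B": lambda x, y: x <= 0 and y >= x / 2,
    "C": lambda x, y: x >= 0 and y >= -x / 2,
    "D": lambda x, y: x <= 0 and y >= -2 * x,
}

def cone(names):
    """Membership test for the union of the named pieces."""
    return lambda x, y: any(PIECES[n](x, y) for n in names)

K1, K2, K, K_star = cone("AB"), cone("CD"), cone("ABCD"), cone("AD")

def p_cone(in_L, z, lo=-1e6, hi=1e6, iters=200):
    """p_{1,L}(z) = inf{lam : lam*1 - z in L}, with 1 = (0, 1) assumed.
    For these cones the feasible lam form a half-line, so bisection applies."""
    x, y = z
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_L(-x, mid - y):           # lam*1 - z = (-x, lam - y)
            hi = mid
        else:
            lo = mid
    return hi

z = (1.0, 3.0)
p1, p2 = p_cone(K1, z), p_cone(K2, z)
print(p1, p2)                                     # approximately 2.5 5.0
print(p_cone(K, z), p_cone(K_star, z))            # approximately 2.5 5.0: min and max, as in (22)
norm_star = max(p_cone(K_star, z), p_cone(K_star, (-1.0, -3.0)))
print(norm_star)                                  # approximately 5.0 = |y| + 2|x|
print(K(2.0, -1.0), K(-2.0, -1.0), K(0.0, -1.0))  # True True False: K is not convex
```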

Example 9 Let $X$ be a normed space with a norm $\|\cdot\|$. Let $Y = X \times \mathbb{R}$ and $K := \operatorname{epi}\|\cdot\| \subset Y$ be the epigraph of $\|\cdot\|$. (Recall that $\operatorname{epi}\|\cdot\| = \{(x, \lambda) \in Y: \lambda \ge \|x\|\}$.) Then $K$ is a convex closed cone and $(0, 1) \in \operatorname{int} K$. Assume now that $X$ is equipped with two equivalent norms $\|\cdot\|_1$ and $\|\cdot\|_2$. Let $K_i = \operatorname{epi}\|\cdot\|_i$, $i = 1, 2$, and $K = K_1 \cup K_2$. If there exist $x' \in X$ and $x'' \in X$ such that $\|x'\|_1 < \|x'\|_2$ and $\|x''\|_1 > \|x''\|_2$, then $K$ is not convex. Clearly $K$ is a pointed cone. The set $\operatorname{int} K$ contains $(0, 1)$; hence it is nonempty. Clearly $K \setminus \{0\}$ is contained in the open half-space $\{(x, \lambda): \lambda > 0\}$. The cone $K$ is star-shaped. It can be proved that $\operatorname{kern} K = K_1 \cap K_2$.


In the remainder of the article, we consider a normed space $X$ with a closed star-shaped cone $K$ such that $\operatorname{int}\operatorname{kern} K$ is not empty. Assume that $K$ is given as $K = \bigcup_{i\in I} K_i$, where

• $I$ is an arbitrary index set;
• $K_i$ ($i \in I$) is a closed solid convex pointed cone;
• the interior $\operatorname{int} K_*$ of the cone $K_* = \bigcap_{i\in I} K_i$ is nonempty.

In the sequel, assume that the norm $\|\cdot\|$ of $X$ coincides with the norm $\|\cdot\|_*$ defined by (26).


Characterization of Best Approximations 


Let $\varphi: X \times X \to \mathbb{R}$ be the function defined by

$$\varphi(x,y) := \sup\{\lambda \in \mathbb{R} : x + y \geq_K \lambda 1\}, \qquad x, y \in X. \tag{31}$$

Since $1 \in \operatorname{int} K_*$, the set $\{\lambda \in \mathbb{R} : x + y \geq_K \lambda 1\}$ is nonempty and bounded from above (by the number $\|x + y\|_*$). Clearly this set is closed. It follows from the definition of $\varphi$ that the function $\varphi$ has the following properties:


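$$-\infty < \varphi(x,y) \le \|x + y\|_* \quad \text{for all } x, y \in X, \tag{32}$$
$$x + y \geq_K \varphi(x,y)\,1 \quad \text{for all } x, y \in X, \tag{33}$$
$$\varphi(x,y) = \varphi(y,x) \quad \text{for all } x, y \in X, \tag{34}$$
$$\varphi(x,-x) = \sup\{\lambda \in \mathbb{R} : 0 = x - x \geq_K \lambda 1\} = 0 \quad \text{for all } x \in X, \tag{35}$$
$$\varphi(x, y + \lambda 1) = \varphi(x,y) + \lambda \quad \text{for all } x, y \in X \text{ and } \lambda \in \mathbb{R}, \tag{36}$$
$$\varphi(x + \lambda 1, y) = \varphi(x,y) + \lambda \quad \text{for all } x, y \in X \text{ and } \lambda \in \mathbb{R}, \tag{37}$$
$$\varphi(\gamma x, \gamma y) = \gamma\,\varphi(x,y) \quad \text{for all } x, y \in X \text{ and } \gamma > 0. \tag{38}$$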


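Proposition 14 Let $\varphi$ be the function defined by (31). Then

$$\varphi(x,y) = -p(-x-y), \qquad x, y \in X, \tag{39}$$

and hence

$$\varphi(x,y) = \sup_{i\in I}\big[-p_i(-x-y)\big], \qquad x, y \in X. \tag{40}$$

Proof For each $x, y \in X$, we have

$$\begin{aligned}
-\varphi(-x,-y) &= -\sup\{\lambda \in \mathbb{R} : -(x+y) \geq_K \lambda 1\} \\
&= \inf\{-\lambda : \lambda \in \mathbb{R},\ -(x+y) \geq_K \lambda 1\} \\
&= \inf\{\lambda' \in \mathbb{R} : -(x+y) \geq_K -\lambda' 1\} \\
&= \inf\{\lambda' \in \mathbb{R} : \lambda' 1 \geq_K x + y\} \\
&= p(x+y).
\end{aligned}$$

Hence $\varphi(x,y) = -p(-x-y)$. In view of (21), we get (40). □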
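Formulas (39) and (40) can be checked numerically in the setting of Example 8, where $K = K_1 \cup K_2$ is not convex. The sketch below again assumes the order unit $1 = (0,1)$. It evaluates $\varphi$ directly from definition (31) by bisection and compares the result with $-p(-x-y)$, where $p(z) = \inf\{\lambda : \lambda 1 - z \in K\}$, and with $\sup_i[-p_i(-x-y)]$. For this $K$, a short calculation gives the reference value $\varphi(x,y) = b + |a|/2$ with $(a,b) = x + y$. All function names are hypothetical.

```python
def in_K1(a, b):
    """K_1 = A ∪ B from Example 8."""
    return (a >= 0 and b >= 2 * a) or (a <= 0 and b >= a / 2)


def in_K2(a, b):
    """K_2 = C ∪ D from Example 8."""
    return (a >= 0 and b >= -a / 2) or (a <= 0 and b >= -2 * a)


def in_K(a, b):
    """The nonconvex cone K = K_1 ∪ K_2."""
    return in_K1(a, b) or in_K2(a, b)


def threshold(pred, lo=-1e3, hi=1e3, iters=200):
    """Bisection for the point where pred switches from False (left) to True (right)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi


def gauge(in_cone, z):
    """p(z) = inf{lam : lam*1 - z in cone} with the assumed unit 1 = (0, 1)."""
    return threshold(lambda lam: in_cone(-z[0], lam - z[1]))


def phi(x, y):
    """Definition (31): sup{lam : x + y - lam*1 in K}."""
    a, b = x[0] + y[0], x[1] + y[1]
    # x + y - lam*1 = (a, b - lam) is in K for small lam and leaves K past the sup.
    return threshold(lambda lam: not in_K(a, b - lam))


x, y = (1.0, 2.0), (0.5, -1.0)
m = (-(x[0] + y[0]), -(x[1] + y[1]))               # the point -x - y
print(phi(x, y))                                    # definition (31)
print(-gauge(in_K, m))                              # formula (39)
print(max(-gauge(in_K1, m), -gauge(in_K2, m)))      # formula (40)
```

The bisection is valid for the union $K$ as well. Each $K_i$ is closed under adding nonnegative multiples of $1$, so membership of $x + y - \lambda 1$ in $K$ switches only once as $\lambda$ increases.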


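Now, consider $x, y \in X$. We define the functions $\varphi_x: X \to \mathbb{R}$ and $\varphi_y: X \to \mathbb{R}$ by

$$\varphi_x(t) = \varphi(x,t), \qquad t \in X, \tag{41}$$

and

$$\varphi_y(t) = \varphi(t,y), \qquad t \in X. \tag{42}$$

Note that $\varphi_x$ and $\varphi_y$ are nonincreasing functions with respect to the relation generated by $K$ on $X$. We have the following result: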


Corollary 5 Let $\varphi$ be the function defined by (31). Then $\varphi$ is Lipschitz continuous.

Proof This is an immediate consequence of the Lipschitz continuity of $p$ and Proposition 14. □

Corollary 6 For each $x, y \in X$, the functions defined by (41) and (42) are Lipschitz continuous.

Proof It follows from Corollary 5. □


Proposition 15 Let $\varphi$ be the function defined by (31) and set

$$A(y,\alpha) = \{x \in X : \varphi(x,y) \ge \alpha\}, \qquad y \in X,\ \alpha \in \mathbb{R}.$$

Then $A(y,\alpha) = K + \alpha 1 - y$ for all $y \in X$ and all $\alpha \in \mathbb{R}$.

Proof Fix $y \in X$ and $\alpha \in \mathbb{R}$. Then

$$x \in A(y,\alpha) \iff \varphi(x,y) \ge \alpha.$$

Due to Proposition 14, this happens if and only if $-p(-x-y) \ge \alpha$, and hence, by Proposition 12, if and only if $-x - y \in -\alpha 1 - K$. This is equivalent to $x \in K + \alpha 1 - y$, which completes the proof. □

Corollary 7 Under the hypotheses of Proposition 15, we have

$$\varphi(x,y) \ge \alpha \quad \text{if and only if} \quad x + y \geq_K \alpha 1, \qquad x, y \in X,\ \alpha \in \mathbb{R}.$$


Lemma 2 Let $W$ be a closed downward subset of $X$, let $y_0 \in \operatorname{bd} W$, and let $\varphi$ be the function defined by (31). Then

$$\varphi(w, -y_0) \le 0 = \varphi(y_0, -y_0), \qquad \forall w \in W. \tag{43}$$

Proof The proof is similar to the proof of Lemma 4.3 in [7]. □


For $x \in X$ and a nonempty subset $W$ of $X$, we will use the following notation:

$$d^i(x,W) := \inf_{w\in W}\|x - w\|_i, \qquad i \in I,$$

and

$$P^i_W(x) := \{w \in W : \|x - w\|_i = d^i(x,W)\}, \qquad i \in I.$$


Lemma 3 Let $W$ be a closed downward subset of $X$, $x \in X \setminus W$, $r > 0$, and $i \in I$. Then $r = d^i(x,W)$ if and only if $x - r1 \in W$ and $p_i(x - w - r1) \ge 0$ for all $w \in W$.

Proof Let $r = d^i(x,W)$. In a manner analogous to the proof of Proposition 3, one can prove that $x - r1 \in P^i_W(x) \subset W$. Since $P^i_W(x) \subset \operatorname{bd} W$, it follows from Lemma 2 and Proposition 14 that $p_i(x - w - r1) \ge 0$ for all $w \in W$. Conversely, suppose that $x - r1 \in W$ and $p_i(x - w - r1) \ge 0$ for all $w \in W$. Let $w \in W$ be arbitrary. Since $p_i$ is plus-homogeneous, $p_i(x - w - r1) = p_i(x - w) - r$, and it follows from (24) that

$$\|x - w\|_i \ge p_i(x - w) \ge r.$$

Since $\|x - (x - r1)\|_i = r$ and $x - r1 \in W$, we conclude that $r = d^i(x,W)$. □


Lemma 4 Let $W$ be a closed downward subset of $X$, $x \in X \setminus W$, and $r > 0$. Then $r = d(x,W)$ if and only if $x - r1 \in W$ and, for some $i \in I$, $p_i(x - w - r1) \ge 0$ for all $w \in W$.

Proof Let $r = d(x,W)$. By Proposition 3 we have $x - r1 \in P_W(x) \subset \operatorname{bd} W$. Then it follows from Lemma 2 that $\varphi(w, r1 - x) \le 0$ for all $w \in W$. In view of (40), we get $p_i(x - w - r1) \ge 0$ for all $w \in W$ and all $i \in I$. Conversely, suppose that $x - r1 \in W$ and, for some $i \in I$, $p_i(x - w - r1) \ge 0$ for all $w \in W$. Consider $w \in W$. Since $p_i$ is plus-homogeneous, $p_i(x - w - r1) = p_i(x - w) - r$, and it follows from (24) and (25) that

$$\|x - w\|_* \ge \|x - w\|_i \ge p_i(x - w) \ge r.$$

Since $r = \|x - (x - r1)\|_*$ and $x - r1 \in W$, one thus has $r = d(x,W)$. □
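Lemma 4 can be illustrated in the simplest special case of a single cone, $I = \{1\}$ and $K = K_* = \mathbb{R}^2_+$, with the assumed unit $1 = (1,1)$. Then $p(z) = \max(z_1, z_2)$ and $\|\cdot\|_*$ is the max-norm. The sketch below finds the least $r \ge 0$ with $x - r1 \in W$ by bisection. It then spot-checks the conditions of Lemma 4, $p(x - w - r1) \ge 0$ and $\|x - w\|_* \ge r$, on random points $w \in W$. The set $W$, the point $x$ and all names are illustrative choices, not taken from the article.

```python
import random

# Single-cone special case: K = R^2_+, assumed order unit 1 = (1, 1).
def p(z):
    """p(z) = inf{lam : lam*1 - z in R^2_+} = max(z1, z2)."""
    return max(z[0], z[1])


def norm(z):
    """(26) for one cone: max(p(z), p(-z)), i.e. the max-norm."""
    return max(p(z), p((-z[0], -z[1])))


def in_W(w):
    """Closed downward set W = {w : w1 + w2 <= 0} (illustrative choice)."""
    return w[0] + w[1] <= 0.0


def least_shift(x, in_set, hi=1e3, iters=200):
    """Smallest r >= 0 with x - r*1 in the set; monotone in r because the set is downward."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if in_set((x[0] - mid, x[1] - mid)):
            hi = mid
        else:
            lo = mid
    return hi


def random_point_of_W(rng):
    a = rng.uniform(-5.0, 5.0)
    return (a, -a - rng.uniform(0.0, 5.0))  # w1 + w2 <= 0 by construction


x = (1.0, 2.0)  # a point outside W
r = least_shift(x, in_W)
rng = random.Random(0)
sample = [random_point_of_W(rng) for _ in range(1000)]
# Conditions of Lemma 4 on the sample (small tolerance for bisection error).
cond_p = all(p((x[0] - w[0] - r, x[1] - w[1] - r)) >= -1e-9 for w in sample)
cond_d = all(norm((x[0] - w[0], x[1] - w[1])) >= r - 1e-9 for w in sample)
print(r, (x[0] - r, x[1] - r), cond_p, cond_d)
```

For this half-plane the max-norm distance can also be found by hand as $(x_1 + x_2)/2 = 1.5$. The point $x - r1 = (-0.5, 0.5)$ is then the nearest point described in the proof.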
The following result is an immediate consequence of Lemmas 3 and 4.


Corollary 8 Let $W$ be a closed downward subset of $X$ and $x \in X \setminus W$. Then

$$d(x,W) = d^i(x,W) \quad \text{for all } i \in I. \tag{44}$$


Corollary 9 Let $W$ be a closed downward subset of $X$, $x \in X \setminus W$, and $w_0 \in W$. Then $w_0 \in P_W(x)$ if and only if $w_0 \in P^i_W(x)$ for each $i \in I$.

Proof Let $w_0 \in P_W(x)$. Then $\|x - w_0\|_* = d(x,W)$. In view of (25) and (44), we have $\|x - w_0\|_i = d^i(x,W)$ for each $i \in I$. Therefore, $w_0 \in P^i_W(x)$ for each $i \in I$. Conversely, let $w_0 \in P^i_W(x)$ for each $i \in I$. Then $\|x - w_0\|_i = d^i(x,W)$ for each $i \in I$. Hence, by (44), we get $\|x - w_0\|_* = \sup_{i\in I}\|x - w_0\|_i = d(x,W)$, that is, $w_0 \in P_W(x)$. □


Theorem 6 Let $W$ be a closed downward subset of $X$, $x_0 \in X \setminus W$, $y_0 \in W$, and $r_0 := \|x_0 - y_0\|_*$. Assume that $\varphi$ is the function defined by (31). Then the following assertions are equivalent:

(1) $y_0 \in P_W(x_0)$.

(2) There exists $l \in X$ such that

$$\varphi(w,l) \le 0 \le \varphi(y,l), \qquad \forall w \in W,\ y \in B(x_0,r_0). \tag{45}$$

Moreover, if (45) holds with $l = -y_0$, then $y_0 = w_0 = \min P_W(x_0)$, where $w_0 = x_0 - r1$ is the least element of the set $P_W(x_0)$ and $r := d(x_0,W)$.


Proof

(1) ⇒ (2). Suppose that $y_0 \in P_W(x_0)$. Then $r_0 = \|x_0 - y_0\|_* = d(x_0,W) = r$. Since $W$ is a closed downward subset of $X$, it follows from Proposition 3 that the least element $w_0 = x_0 - r_0 1$ of the set $P_W(x_0)$ exists. Let $l = -w_0$ and let $y \in B(x_0,r_0)$ be arbitrary. Then, by (30), we have $y \geq_K -l$, that is, $y + l \geq_K 0$. It follows from Corollary 7 that $\varphi(y,l) \ge 0$. On the other hand, since $w_0 \in P_W(x_0)$, it follows that $w_0 \in \operatorname{bd} W$. Hence, by Lemma 2, we have $\varphi(w,l) \le 0$ for all $w \in W$.

(2) ⇒ (1). Assume that (2) holds. By (28) it is clear that $x_0 - r_0 1 \in B(x_0,r_0)$. Therefore, by (45), we have $\varphi(x_0 - r_0 1, l) \ge 0$. By Corollary 7, we get $x_0 - r_0 1 + l \geq_K 0$, and so $l - r_0 1 \geq_K -x_0$. Hence there exists $j \in I$ such that

$$l - r_0 1 \geq_{K_j} -x_0. \tag{46}$$

Now, let $w \in W$ be arbitrary. Since $p_j$ is topical and (21), (39), and (45) hold, it follows from (46) that

$$\begin{aligned}
p_j(x_0 - w) &\ge p_j(r_0 1 - l - w) = p_j(-l - w) + r_0 \\
&\ge p(-l - w) + r_0 \\
&= -\varphi(w,l) + r_0 \\
&\ge 0 + r_0 = r_0.
\end{aligned}$$

Then, by (24) and (25), we have

$$r_0 \le p_j(x_0 - w) \le \|x_0 - w\|_j \le \|x_0 - w\|_* \quad \text{for all } w \in W.$$

Thus $\|x_0 - y_0\|_* = d(x_0,W)$. Consequently, $y_0 \in P_W(x_0)$. Finally, suppose that (45) holds with $l = -y_0$. Then, by the implication (2) ⇒ (1), we have $y_0 \in P_W(x_0)$. So $r_0 = \|x_0 - y_0\|_* = d(x_0,W)$ and $y_0 \geq_K w_0$, where $w_0 = x_0 - r1$ is the least element of the set $P_W(x_0)$ and $r := d(x_0,W)$. Now, let $w \in P_W(x_0)$ be arbitrary. Then $\|x_0 - w\|_* = d(x_0,W) = r_0$, that is, $w \in B(x_0,r_0)$. It follows from (45) that $\varphi(w,-y_0) \ge 0$. In view of Corollary 7, we have $w - y_0 \geq_K 0$, and so $w \geq_K y_0$. This means that $y_0 = \min P_W(x_0) = w_0$. This completes the proof. □


Strictly Downward Sets and Their Best Approximation Properties


We start with the following definitions, which were introduced in [7] for downward subsets of a Banach lattice.


Definition 5 A downward subset $W$ of $X$ is called strictly downward if, for each boundary point $w_0$ of $W$, the inequality $w >_K w_0$ implies $w \notin W$.


Definition 6 Let $W$ be a downward subset of $X$. We say that $W$ is strictly downward at a point $w' \in \operatorname{bd} W$ if, for all $w_0 \in \operatorname{bd} W$ with $w' \geq_K w_0$, the inequality $w >_K w_0$ implies $w \notin W$.


The following lemmas were proved in [7]; the same proofs remain valid in the case under consideration.

Lemma 5 Let $f: X \to \mathbb{R}$ be a continuous strictly increasing function. Then all nonempty level sets $S_c(f)$ ($c \in \mathbb{R}$) of $f$ are strictly downward.


Lemma 6 Let $W$ be a closed downward subset of $X$. Then $W$ is strictly downward at $w' \in \operatorname{bd} W$ if and only if

(i) $w >_K w' \implies w \notin W$;

(ii) $(w' \geq_K w_0,\ w_0 \in \operatorname{bd} W) \implies w_0 = w'$.


Lemma 7 Let W be a closed downward subset of X. 
Then W is strictly downward if and only if W is strictly 
downward at each of its boundary points. 


Lemma 8 Let $\varphi$ be the function defined by (31) and let $W$ be a closed downward subset of $X$ that is strictly downward at a point $w' \in \operatorname{bd} W$. Then there exists a unique $l \in X$ such that

$$\varphi(w,l) \le 0 = \varphi(w',l), \qquad \forall w \in W.$$

Theorem 7 Let $\varphi$ be the function defined by (31). Then, for a closed downward subset $W$ of $X$, the following assertions are equivalent:

(1) $W$ is strictly downward.

(2) For each $w_0 \in \operatorname{bd} W$ there exists a unique $l \in X$ such that

$$\varphi(w,l) \le 0 = \varphi(w_0,l), \qquad \forall w \in W.$$

Proof The implication (1) ⇒ (2) follows from Lemma 8. We now prove the implication (2) ⇒ (1). Assume that for each $w_0 \in \operatorname{bd} W$ there exists a unique $l \in X$ such that

$$\varphi(w,l) \le 0 = \varphi(w_0,l), \qquad \forall w \in W.$$

Let $w_0 \in \operatorname{bd} W$ and $y \in X$ with $y >_K w_0$. Assume that $y \in W$. We claim that $y + \lambda 1 \notin W$ for all $\lambda > 0$. Suppose that there exists $\lambda_0 > 0$ such that $y + \lambda_0 1 \in W$. Since $y + \lambda_0 1 >_K w_0 + \lambda_0 1$ and $W$ is a downward set, we have $w_0 + \lambda_0 1 \in W$. In view of Corollary 1, this contradicts $w_0 \in \operatorname{bd} W$, and so the claim is true. Then, by Corollary 1, we have $y \in \operatorname{bd} W$. Let $l = -y$. It follows from Lemma 2 that

$$\varphi(w,l) \le 0 = \varphi(y,l), \qquad \forall w \in W. \tag{47}$$

On the other hand, applying Lemma 2 to the point $w_0$, we have for $l' = -w_0$:

$$\varphi(w,l') \le 0 = \varphi(w_0,l'), \qquad \forall w \in W. \tag{48}$$

Since $y \geq_{K_i} w_0$ for some $i \in I$ and $p_i$ is increasing, it follows from (21), (39), and (48) that $0 = p_i(-w_0 - l') \ge p_i(-y - l') \ge p(-y - l') = -\varphi(y,l') \ge 0$. This, together with (48), implies that

$$\varphi(w,l') \le 0 = \varphi(y,l'), \qquad \forall w \in W. \tag{49}$$

Since $w_0 \neq y$, it follows that $l' \neq l$. Hence (47) and (49) contradict the uniqueness of $l$. We have shown that the assumption $y \in W$ leads to a contradiction. Thus $y \notin W$. This means that $W$ is strictly downward. □


Corollary 10 Let $f: X \to \mathbb{R}$ be a continuous strictly increasing function and let $\varphi$ be the function defined by (31). Then, for each $x \in X$, $l = -x$ is the unique element of $X$ such that

$$\varphi(w,l) \le 0 = \varphi(x,l), \qquad \forall w \in S_c(f),$$

where $c = f(x)$.

Proof This is an immediate consequence of Lemma 5 and Theorem 7. □


Definition 7 Let $W$ be a downward subset of $X$. A point $w' \in \operatorname{bd} W$ is said to be a Chebyshev point if, for each $w_0 \in \operatorname{bd} W$ with $w' \geq_K w_0$ and for each $x_0 \notin W$ such that $w_0 \in P_W(x_0)$, it follows that $P_W(x_0) = \{w_0\}$, that is, the best approximation of $x_0$ is unique.


Definition 7 was introduced in [7] for a downward sub- 
set of a Banach lattice. 


Definition 8 Let $W$ be a downward subset of $X$. A point $w' \in \operatorname{bd} W$ is said to be a Chebyshev point of $W$ with respect to each $K_i$ ($i \in I$) if, for each $w_0 \in \operatorname{bd} W$ with $w' \geq_K w_0$ and for each $x_0 \notin W$ such that $w_0 \in P^i_W(x_0)$ for each $i \in I$, it follows that $P^i_W(x_0) = \{w_0\}$ for each $i \in I$.


Remark 4 In view of Corollary 8, Definitions 7 and 8 are equivalent.


Theorem 8 Let $W$ be a closed downward subset of $X$ and $w' \in \operatorname{bd} W$. If $w'$ is a Chebyshev point of $W$ with respect to each $K_i$ ($i \in I$), then $W$ is a strictly downward set at $w'$.
Proof Suppose that $w'$ is a Chebyshev point of $W$ with respect to each $K_i$ ($i \in I$). Assume, if possible, that $W$ is not strictly downward at $w'$. Then we can find $w_0 \in \operatorname{bd} W$ and $w \in W$ such that $w' \geq_K w_0$ and $w >_K w_0$. Let $r > \|w - w_0\|_* > 0$. It follows from (27) that

$$r1 \geq_{K_i} w - w_0, \qquad \forall i \in I.$$

Thus $w_0 + r1 \geq_{K_i} w$ for all $i \in I$. Set $x_0 = w_0 + r1 \in X$. Since $w_0 \in \operatorname{bd} W$, by Lemma 6 we have $\varphi(y, -w_0) \le 0$ for all $y \in W$. Also, $x_0 - r1 = w_0 \in W$. Thus, by (21), Proposition 14, and Lemma 4, we get $r = d(x_0,W)$. Since $\|x_0 - w_0\|_i = \|r1\|_i = r$ for all $i \in I$, it follows from (25) that $\|x_0 - w_0\|_* = r$, and hence $w_0 \in P_W(x_0)$. In view of Corollary 9, we obtain $w_0 \in P^i_W(x_0)$ for all $i \in I$.

On the other hand, we have $x_0 = w_0 + r1 \geq_{K_i} w$ for all $i \in I$. Since $w >_K w_0$, there exists $j \in I$ such that $w >_{K_j} w_0$. It follows that $r1 = x_0 - w_0 \geq_{K_j} x_0 - w \geq_{K_j} 0$. Hence

$$\|x_0 - w\|_j \le \|r1\|_j = r = d^j(x_0,W) \le \|x_0 - w\|_j.$$

Thus $\|x_0 - w\|_j = d^j(x_0,W)$, and so $w \in P^j_W(x_0)$ with $w \neq w_0$. Hence there exist a point $x_0 \in X \setminus W$ and a point $w_0 \in \operatorname{bd} W$ with $w' \geq_K w_0$ such that $w_0 \in P^i_W(x_0)$ for each $i \in I$ and $P^j_W(x_0)$ contains at least one point different from $w_0$. This is a contradiction, because $w'$ is a Chebyshev point of $W$ with respect to each $K_i$ ($i \in I$). This completes the proof. □


Proposition 16 Let $W$ be a closed downward subset of $X$ and $w' \in \operatorname{bd} W$. If $W$ is a strictly downward set at $w'$, then $w'$ is a Chebyshev point of $W$.


Proof The proof is similar to that of Theorem 4.2 (the implication (2) ⇒ (1)) in [7]. □


Corollary 11 Let $f: X \to \mathbb{R}$ be a continuous strictly increasing function. Then $S_c(f)$ ($c \in \mathbb{R}$) is a Chebyshev subset of $X$.


Proof This is an immediate consequence of Lemma 5 and Proposition 16. □
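Corollary 11 can be contrasted with the level set of a function that is increasing but not strictly increasing. The sketch works in the single-cone case $K = \mathbb{R}^2_+$ with the assumed unit $1 = (1,1)$, so $\|\cdot\|_*$ is the max-norm. It assumes $S_0(f) = \{w : f(w) \le 0\}$ and compares the strictly increasing $f(w) = w_1 + w_2$ with the merely increasing $f(w) = \max(w_1, w_2)$. It lists the grid points of each level set that are nearest to $x = (1,2)$. The grid, the point and the helper names are illustrative assumptions, and a grid search only samples the set $P_W(x)$.

```python
def max_norm(z):
    return max(abs(z[0]), abs(z[1]))


def distance(x, in_W, hi=100.0, iters=200):
    """d(x, W) as the least r >= 0 with x - r*1 in W, 1 = (1, 1) (cf. Lemma 4)."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if in_W((x[0] - mid, x[1] - mid)):
            hi = mid
        else:
            lo = mid
    return hi


def nearest_grid_points(x, in_W, step=0.25, n=16):
    """Grid points of W whose max-norm distance to x equals d(x, W) up to 1e-9."""
    d = distance(x, in_W)
    grid = ((i * step, j * step) for i in range(-n, n + 1) for j in range(-n, n + 1))
    return d, [w for w in grid
               if in_W(w) and max_norm((x[0] - w[0], x[1] - w[1])) <= d + 1e-9]


x = (1.0, 2.0)
# Level set of the strictly increasing f(w) = w1 + w2: a single nearest point.
d_sum, best_sum = nearest_grid_points(x, lambda w: w[0] + w[1] <= 0.0)
# Level set of f(w) = max(w1, w2), increasing but not strictly:
# a whole segment of nearest points.
d_max, best_max = nearest_grid_points(x, lambda w: max(w) <= 0.0)
print(d_sum, best_sum)
print(d_max, best_max)
```

The first set returns only $(-0.5, 0.5)$. The second returns every grid point of the segment $\{(a, 0) : -1 \le a \le 0\}$, so the best approximation is not unique for that set.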
Introduction


Fractional bilevel programming (FBP), a class of bilevel programming [6,10], has been proposed as a generalization of standard fractional programming [9] for dealing with hierarchical systems with two decision levels. FBP problems assume that the objective functions of both levels are ratios of functions and that the constraint region common to both levels is a nonempty and compact polyhedron.


Formulation 


Using the common notation in bilevel programming, the FBP problem [1] can be formulated as:

$$\begin{aligned}
\min_{x_1,x_2}\ & f_1(x_1,x_2) = \frac{h_1(x_1,x_2)}{g_1(x_1,x_2)} \\
& \text{where } x_2 \text{ solves} \\
\min_{x_2}\ & f_2(x_1,x_2) = \frac{h_2(x_1,x_2)}{g_2(x_1,x_2)} \\
\text{s.t.}\ & (x_1,x_2) \in S,
\end{aligned}$$

where $x_1 \in \mathbb{R}^{n_1}$ and $x_2 \in \mathbb{R}^{n_2}$ are the variables controlled by the upper level and the lower level decision maker, respectively. The functions $h_i$ and $g_i$ are continuous, the $h_i$ are nonnegative and concave on $S$, and the $g_i$ are positive and convex on $S$. The set $S = \{(x_1,x_2) : A_1 x_1 + A_2 x_2 \le b,\ x_1 \ge 0,\ x_2 \ge 0\}$ is assumed to be nonempty and bounded.

Let $S_1$ be the projection of $S$ on $\mathbb{R}^{n_1}$. For each $\bar x_1 \in S_1$ provided by the upper level decision maker, the lower level one solves the fractional problem:

$$\begin{aligned}
\min_{x_2}\ & f_2(\bar x_1,x_2) = \frac{h_2(\bar x_1,x_2)}{g_2(\bar x_1,x_2)} \\
\text{s.t.}\ & A_2 x_2 \le b - A_1 \bar x_1, \\
& x_2 \ge 0.
\end{aligned}$$

Let $M(\bar x_1)$ denote the set of optimal solutions to this problem. To ensure that the FBP problem is well posed, it is also assumed that $M(\bar x_1)$ is a singleton for all $\bar x_1 \in S_1$.

The feasible region of the upper level decision maker, also called the inducible region (IR), is implicitly defined by the lower level decision maker:

$$\mathrm{IR} = \big\{(x_1,x_2) : x_1 \ge 0,\ x_2 = \operatorname{argmin}\{f_2(x_1,x_2) : A_1 x_1 + A_2 x_2 \le b,\ x_2 \ge 0\}\big\}.$$

Therefore, the FBP problem can be stated as:

$$\min_{x_1,x_2}\ f_1(x_1,x_2) = \frac{h_1(x_1,x_2)}{g_1(x_1,x_2)} \quad \text{s.t.}\ (x_1,x_2) \in \mathrm{IR}.$$


Theoretical Results 


The FBP problem is a nonconvex optimization problem. However, [1] uses the quasiconcavity of $f_2$ and the properties of polyhedra to prove that the inducible region is formed by the connected union of faces of the polyhedron $S$.

One of the main features of FBP problems is that, even with these more complex objective functions, they retain the most important property of the optimal solution of linear bilevel programming problems: there is an extreme point of $S$ which solves the FBP problem [1]. This result is a consequence of the properties of IR and of the quasiconcavity of $f_1$. The same conclusion holds when the objective functions of both levels are defined as the minimum of a finite number of ratios satisfying the previously stated conditions or, more generally, when they are quasiconcave.
Under the additional assumption that the upper level objective function is explicitly quasimonotonic, another geometrical property of the optimal solution of the FBP problem can be obtained by introducing the concept of a boundary feasible extreme point. According to [7], a point $(x_1,x_2) \in \mathrm{IR}$ is a boundary feasible extreme point if there exists an edge $E$ of $S$ such that $(x_1,x_2)$ is an extreme point of $E$ and the other extreme point of $E$ is not an element of IR.

Let us consider the relaxed problem:

$$\min_{x_1,x_2}\ f_1(x_1,x_2) = \frac{h_1(x_1,x_2)}{g_1(x_1,x_2)} \quad \text{s.t.}\ (x_1,x_2) \in S. \tag{1}$$

Since $f_1$ is a quasiconcave function and $S$ is a nonempty and compact polyhedron, an extreme point of $S$ exists which solves (1). Obviously, if an optimal solution of (1) is a point of IR, then it is an optimal solution to the FBP problem. In general, however, this will not be the case, since the two decision makers usually have conflicting objectives.

In that case, if $f_1$ is explicitly quasimonotonic and there exists an extreme point of $S$ not in IR that is an optimal solution of the relaxed problem (1), then a boundary feasible extreme point exists that solves the FBP problem [3].

Although FBP problems retain some important properties of linear bilevel problems, there are differences related to the existence of multiple optima of the lower level problem for a given $x_1 \in S_1$. Different approaches have been proposed in the literature to make sure that the bilevel problem is well posed [6]. The most common one is to assume that $M(x_1)$ is single-valued for all $x_1 \in S_1$. Other approaches give rules for selecting $x_2 \in M(x_1)$ so that the upper level objective function $f_1(x_1,x_2)$ can be evaluated. The optimistic approach assumes that the upper level decision maker has the right to influence the lower level decision maker so that the latter selects the $x_2$ that provides the best value of $f_1$. By contrast, the pessimistic approach assumes that the lower level decision maker always selects the $x_2$ that gives the worst value of $f_1$.

It is well known [8] that, under the optimistic approach, at least one optimal solution of the linear bilevel problem is attained at an extreme point of the polyhedron defined by the common constraints. However, [3] presents an FBP problem in which $M(x_1)$ is not single-valued for a given $x_1 \in S_1$, and for which this assertion does not hold. Firstly, IR no longer consists of a union of faces of the polyhedron $S$. Secondly, if the pessimistic approach is used, then an optimal solution to the example does not exist. Finally, if the optimistic approach is taken, the optimal solution to the example is not an extreme point of the polyhedron $S$.


Algorithms 


Since there is an extreme point of $S$ which solves the FBP problem, an enumerative algorithm can be devised that examines the extreme points of $S$ and identifies the best one, with respect to $f_1$, among those that are points of IR. The bottlenecks of such an algorithm are the generally large number of extreme points of a polyhedron and the process of checking whether an extreme point of $S$ is a point of IR.
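The enumerative scheme described above can be sketched for a tiny hypothetical instance with one upper level and one lower level variable. The instance uses $S = \{x_1, x_2 \ge 0,\ x_1 + x_2 \le 4,\ -x_1 + x_2 \le 2,\ x_1 \le 3\}$, $f_1 = (x_2 + 1)/(x_1 + 1)$ and $f_2 = (5 - x_2)/(x_1 + x_2 + 1)$, which satisfy the sign and concavity/convexity assumptions on $S$. The sketch enumerates the vertices of $S$ by intersecting pairs of constraint lines. It tests membership in IR by solving the lower level problem for the given $x_1$. Here $f_2$ is a ratio of affine functions of the scalar $x_2$ with a positive denominator, so it is monotone in $x_2$ and an endpoint of the feasible interval is optimal. This shortcut is valid only for the scalar linear fractional case. All names are illustrative.

```python
from itertools import combinations
from math import inf

# Hypothetical instance: rows (a1, a2, b) mean a1*x1 + a2*x2 <= b.
ROWS = [(1.0, 1.0, 4.0), (-1.0, 1.0, 2.0), (1.0, 0.0, 3.0),
        (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]  # last two rows: x1 >= 0, x2 >= 0
TOL = 1e-9


def f1(x1, x2):  # upper level objective h1 / g1
    return (x2 + 1.0) / (x1 + 1.0)


def f2(x1, x2):  # lower level objective h2 / g2
    return (5.0 - x2) / (x1 + x2 + 1.0)


def feasible(p):
    return all(a1 * p[0] + a2 * p[1] <= b + TOL for a1, a2, b in ROWS)


def vertices():
    """Extreme points of S: feasible intersections of two constraint lines."""
    pts = []
    for (a1, a2, b), (c1, c2, d) in combinations(ROWS, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < TOL:
            continue  # parallel lines
        p = ((b * c2 - a2 * d) / det, (a1 * d - b * c1) / det)  # Cramer's rule
        if feasible(p) and all(max(abs(p[0] - q[0]), abs(p[1] - q[1])) > TOL for q in pts):
            pts.append(p)
    return pts


def lower_level_solution(x1):
    """min f2(x1, .) over {x2 : (x1, x2) in S}; f2 is monotone in x2 on this interval."""
    lo, hi = -inf, inf
    for a1, a2, b in ROWS:
        if a2 > 0:
            hi = min(hi, (b - a1 * x1) / a2)
        elif a2 < 0:
            lo = max(lo, (b - a1 * x1) / a2)
    return lo if f2(x1, lo) <= f2(x1, hi) else hi


def in_IR(p):
    return abs(p[1] - lower_level_solution(p[0])) <= 1e-7


def solve_by_enumeration():
    """Best vertex of S with respect to f1 among those lying in IR."""
    candidates = [p for p in vertices() if in_IR(p)]
    return min(candidates, key=lambda p: f1(*p))


relaxed = min(vertices(), key=lambda p: f1(*p))  # optimum of the relaxed problem (1)
print("relaxed optimum:", relaxed, "in IR:", in_IR(relaxed))
print("FBP optimum:", solve_by_enumeration())
```

On this instance the relaxed optimum $(3, 0)$ is not in IR, and the enumeration returns $(3, 1)$ with $f_1 = 0.5$. This point is also a boundary feasible extreme point, since the other endpoint $(3, 0)$ of the edge $x_1 = 3$ lies outside IR.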

In the particular case in which $f_1$ is linear and $f_2$ is linear fractional (LLFBP problem), [2] proposes an enumerative algorithm that finds a global optimum in a finite number of stages by implicitly examining only bases of the matrix $A_2$. This algorithm connects the points of IR with the bases of $A_2$ by applying the parametric approach to solve the fractional problem of the lower level. One of the main advantages of the procedure is that only linear problems have to be solved.

When $f_1$ is linear fractional and $f_2$ is linear (LFLBP problem), the algorithm developed in [2] combines two components. A local search finds an extreme point of IR with a better value of $f_1$ than any of its adjacent extreme points in IR. A penalty method then looks for another point of IR from which a new local search can start.

The Kth-best algorithm has been proposed in [3] 
to globally solve the FBP problem when both objective 
functions are linear fractional (LFBP). It essentially as- 
serts that the best (in terms of the upper level objective 
function) of the extreme points of S which is a point 
of IR is an optimal solution to the problem. Moreover, 
the search for this point can be made sequentially by 
computing adjacent extreme points to the incumbent 
extreme point. 
More recently, two genetic algorithms have been proposed [4,5] for solving LLFBP, LFBP and LFLBP problems. Both algorithms have been reported to give excellent results in terms of both solution accuracy and computing time, which indicates that they are effective and useful approaches for these problems. Both algorithms associate chromosomes with extreme points of $S$. The fitness of a chromosome evaluates its quality and penalizes it if the associated extreme point is not in IR. The algorithms differ mainly in the procedure used to check whether an extreme point is in IR. When $f_2$ is linear, all lower level problems have the same dual feasible region, so several properties that simplify this check can be proved.
Many hierarchical optimization problems involving two or more decision makers can be modeled as a multilevel mathematical program. The two-level structure is commonly known as a Stackelberg game, in which a leader and a follower try to minimize their individual objective functions $F(x,y)$ and $f(x,y)$, respectively, subject to a series of interdependent constraints [2,9]. Play is defined as sequential and the mood as noncooperative. The decision variables are partitioned between the players in such a way that neither can dominate the other. The leader goes first and, through his choice of $x \in \mathbb{R}^n$, is able to influence but not control the actions of the follower. He does this by reducing the set of feasible choices available to the follower. The follower then reacts to the leader's decision by choosing a $y \in \mathbb{R}^m$ in an effort to minimize his costs. In so doing, he indirectly affects the leader's solution space and outcome.

Two basic assumptions underlying the Stackelberg 
game are that full information is available to the play- 
ers and that cooperation is prohibited. This precludes 
the use of correlated strategies and side payments. The 
vast majority of research on this problem has centered 
on the linear case known as the linear bilevel program 
(BLP) [3,6]. Relevant notation, the basic model, and 
a discussion of its theoretical properties follow. 

Forx€X CR",yEeYCR",F:XxY—>R',andf: 
X x Y > R', the linear bilevel programming problem 
can be written as follows: 


F(x, y)=cx+ dy, (1) 


min 
xEX 
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s.t. Ax ++ Buy < by, (2) 
min f(x,y) =cax+doy, (3) 
yey 

s.t. Arx + Buy < bp, (4) 


where C), C2 € R", dj, dy € R”, b; € R?, bb € R1, A, € 
R?*", By € R?*™, A, € RI*", By € R1*"™. The sets X and 
Y place additional restrictions on the variables, such as 
upper and lower bounds or integrality requirements. 
Of course, once the leader selects an x, the first term 
in the follower’s objective function becomes a constant 
and can be removed from the problem. In this case, we 
replace f(x, y) with f(y). 

The sequential nature of the decisions in (1)-(4) im- 
plies that y can be viewed as a function of x; i. e., y = y(x). 
For convenience, this dependence will not be written 
explicitly. 


Definitions 


a) Constraint region of the linear BLP:
$$S = \{(x,y) : x \in X,\ y \in Y,\ A_1 x + B_1 y \le b_1,\ A_2 x + B_2 y \le b_2\}.$$

b) Feasible set for the follower for each fixed $x \in X$:
$$S(x) = \{y \in Y : A_2 x + B_2 y \le b_2\}.$$

c) Projection of $S$ onto the leader's decision space:
$$S(X) = \{x \in X : \exists\, y \in Y,\ A_1 x + B_1 y \le b_1,\ A_2 x + B_2 y \le b_2\}.$$

d) Follower's rational reaction set for $x \in S(X)$:
$$P(x) = \{y \in Y : y \in \operatorname{argmin}\{f(x,\hat y) : \hat y \in S(x)\}\}.$$

e) Inducible region:
$$\mathrm{IR} = \{(x,y) : (x,y) \in S,\ y \in P(x)\}.$$


To ensure that (1)-(4) is well posed, it is common to assume that $S$ is nonempty and compact, and that for every decision taken by the leader the follower has some room to respond, i.e., $P(x) \neq \emptyset$. The rational reaction set $P(x)$ defines the response, while the inducible region IR represents the set over which the leader may optimize. Thus, in terms of the above notation, the BLP can be written as

$$\min\{F(x,y) : (x,y) \in \mathrm{IR}\}. \tag{5}$$


Even with the stated assumptions, problem (5) may 
not have a solution. In particular, if P(x) is not single- 
valued for all permissible x, the leader may not achieve 
his minimum payoff over IR. To avoid this situation in 
the development of algorithms, it is usually assumed 
that P(x) is a point-to-point map. Because a simple 
check is available to see whether the solution to (1)-(4) 
is unique (see [2]) this assumption does not appear to 
be unduly restrictive. 

It should be mentioned that in practice the leader 
will incur some cost in determining the decision space 
S(X) over which he may operate. For example, when 
BLP is used as a model for a decentralized firm with 
headquarters representing the leader and the divisions 
representing the follower, coordination of lower level 
activities by headquarters requires detailed knowledge 
of production capacities, technological capabilities, and 
routine operating procedures. Up-to-date information 
in these areas is not likely to be available to corporate 
planners without constant monitoring and oversight. 


Theoretical Properties 


The linear bilevel program was first shown to be NP- 
hard by R.G. Jeroslow [7] using satisfiability arguments 
common in computer science. The complexity of the 
problem is further elaborated in » Bilevel linear programming: Complexity, equivalence to minmax, concave programs. Issues related to the geometry of the solution space are discussed next. The main result is that, when the linear BLP is written as a standard mathematical program (5), the corresponding constraint set, or inducible region, consists of connected faces of $S$, and a solution occurs at a vertex (see [1] or [8] for the proofs). For ease of presentation, it will be assumed that $P(x)$ is single-valued and bounded, that $S$ is bounded and nonempty, and that $Y = \{y : y \ge 0\}$.


Theorem 1 The inducible region can be written equivalently as a piecewise linear equality constraint comprised of supporting hyperplanes of $S$.
A straightforward corollary of this theorem is that the linear BLP is equivalent to minimizing $F$ over a feasible region comprised of a piecewise linear equality constraint. Because a linear function $F = c_1 x + d_1 y$ is minimized over IR, and because $F$ is bounded below on $S$ by, say, $\min\{c_1 x + d_1 y : (x,y) \in S\}$, it can also be concluded that the solution to the linear BLP occurs at a vertex of IR. An alternative proof of this result was given by W.F. Bialas and M.H. Karwan [4], who noted that (5) could be written equivalently as

$$\min\{c_1 x + d_1 y : (x,y) \in \operatorname{co}\mathrm{IR}\},$$

where $\operatorname{co}\mathrm{IR}$ is the convex hull of the inducible region. Of course, $\operatorname{co}\mathrm{IR}$ is not the same as IR, but the next theorem states their relationship with respect to BLP solutions.


Theorem 2 The solution $(x^*,y^*)$ of the linear BLP occurs at a vertex of $S$.


In general, at the solution $(x^*,y^*)$ the hyperplane $\{(x,y) : c_1 x + d_1 y = c_1 x^* + d_1 y^*\}$ will not be a support of the set $S$. Furthermore, a by-product of the proof of Theorem 2 is that any vertex of IR is also a vertex of $S$, implying that IR consists of faces of $S$. Bialas and Karwan derived comparable results. They began by showing that any point of $S$ that enters with a positive weight into a convex combination of points of $S$ forming a point of IR must itself be in IR. This leads to the fact that if $x$ is an extreme point of IR, then it is an extreme point of $S$. A final observation about the solution of the linear BLP can be inferred from this last assertion: because the inducible region is not convex in general, the set of optimal solutions to (1)-(4), when not single-valued, is not necessarily convex.

In searching for a way to solve the linear BLP, it would be helpful to have an explicit representation of IR rather than the implicit representation given by Definition e). This can be achieved by replacing the follower's problem (3)-(4) with his Kuhn-Tucker conditions and appending the resulting system to the leader's problem. Let $u \in \mathbb{R}^q$ and $v \in \mathbb{R}^m$ be the dual variables associated with constraints (4) and $y \ge 0$, respectively. A necessary condition for $(x^*,y^*)$ to solve the linear BLP is then that there exist (row) vectors $u^*$ and $v^*$ such that $(x^*,y^*,u^*,v^*)$ solves:

$$\begin{aligned}
\min\ & c_1 x + d_1 y && (6)\\
\text{s.t.}\ & A_1 x + B_1 y \le b_1 && (7)\\
& u B_2 - v = -d_2 && (8)\\
& u(b_2 - A_2 x - B_2 y) + v y = 0 && (9)\\
& A_2 x + B_2 y \le b_2 && (10)\\
& x \ge 0,\ y \ge 0,\ u \ge 0,\ v \ge 0. && (11)
\end{aligned}$$


This formulation has played a key role in the development of algorithms. One of its advantages is that it allows a more robust model to be solved without introducing any new computational difficulties. In particular, suppose the follower's objective function (3) is replaced with the quadratic form

$$f(x,y) = c_2 x + d_2 y + x^{\mathsf T} Q_1 y + \tfrac{1}{2}\, y^{\mathsf T} Q_2 y, \tag{12}$$

where $Q_1$ is an $n \times m$ matrix and $Q_2$ is an $m \times m$ symmetric positive semidefinite matrix. The only thing that then changes in (6)-(11) is constraint (8). The new constraint remains linear but now includes all problem variables, i.e.,

$$x^{\mathsf T} Q_1 + y^{\mathsf T} Q_2 + u B_2 - v = -d_2. \tag{13}$$


From a conceptual point of view, (6)-(11) is a standard 
mathematical program and should be relatively easy to 
solve because all but one constraint is linear. Neverthe- 
less, virtually all commercial nonlinear codes find com- 
plementarity terms like (9) notoriously difficult to han- 
dle so some ingenuity is required to maintain feasibility 
and guarantee global optimality. 
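When the follower controls a single scalar variable, the multiplier conditions (8)-(11) can be tested directly, which shows how they single out IR. The sketch below uses a small illustrative instance of (1)-(4) without upper level constraints (2). The leader minimizes $x - 4y$, and the follower minimizes $y$ subject to $-x - y \le -3$, $-2x + y \le 0$, $2x + y \le 12$, $3x - 2y \le 4$ and $x, y \ge 0$. Because $u$, $v$, the slacks and $y$ are all nonnegative, condition (9) forces $u_i = 0$ on rows with positive slack and $v = 0$ whenever $y > 0$. Condition (8) then asks whether $-d_2$ lies in the one-dimensional cone spanned by the remaining coefficients. The data and names are hypothetical.

```python
# Illustrative instance of (1)-(4) with no upper level constraints (2):
# leader min x - 4y; follower min d2*y s.t. A2*x + B2*y <= b2 (row-wise), x, y >= 0.
A2 = [-1.0, -2.0, 2.0, 3.0]   # coefficients of x in the follower's rows
B2 = [-1.0, 1.0, 1.0, -2.0]   # coefficients of y in the follower's rows
b2 = [-3.0, 0.0, 12.0, 4.0]
d2 = 1.0
TOL = 1e-9


def follower_kkt_holds(x, y):
    """Check whether multipliers u >= 0, v >= 0 exist that satisfy (8)-(11) at (x, y).

    The follower variable y is a scalar, so the cone test below is one-dimensional."""
    slacks = [bi - ai * x - ci * y for ai, ci, bi in zip(A2, B2, b2)]
    if x < -TOL or y < -TOL or any(s < -TOL for s in slacks):
        return False  # (10) or (11) is violated
    # (9) with u, v >= 0: u_i = 0 on rows with positive slack, v = 0 if y > 0.
    generators = [ci for ci, s in zip(B2, slacks) if abs(s) <= TOL]
    if abs(y) <= TOL:
        generators.append(-1.0)  # the term -v in (8) is available only when y = 0
    # (8): -d2 must be a nonnegative combination of the generators.
    target = -d2
    if target == 0:
        return True
    if target > 0:
        return any(g > 0 for g in generators)
    return any(g < 0 for g in generators)


for point in [(4.0, 4.0), (3.0, 2.5), (3.0, 6.0), (3.0, 4.0)]:
    print(point, follower_kkt_holds(*point))
```

On this instance the follower's best reply is $y = \max(3 - x, (3x - 4)/2)$ for $1 \le x \le 4$. The check therefore accepts $(4, 4)$ and $(3, 2.5)$, which lie in IR. It rejects $(3, 6)$ and $(3, 4)$, which are feasible but are not rational reactions.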


Algorithmic Approaches 


There have been nearly two dozen algorithms proposed 
for solving the linear BLP since the field caught the at- 
tention of researchers in the mid-1970s. Many of these 
are of academic interest only because they are either im- 
practical to implement or highly inefficient. In general, 
there are three different approaches to solving (1)-(4) 
that can be considered workable. The first makes use of 
Theorem 2 and involves some form of vertex enumeration in the context of the simplex method. W. Candler and R. Townsley [5] were the first to develop a globally optimal algorithm. Their scheme repeatedly solves two linear programs. The first is for the leader, in all of the $x$ variables and a subset of the $y$ variables associated with an optimal basis of the follower's problem. The second is for the follower, with all the $x$ variables fixed. In a systematic way they explore optimal bases of the follower's problem for fixed $x$ and then return to the leader's problem with the corresponding basic $y$ variables. By focusing on the reduced cost coefficients of the $y$ variables not in an optimal basis of the follower's problem, they are able to provide a monotonic decrease in the number of follower bases that have to be examined. Bialas and Karwan [4] offered a different approach that systematically explores vertices, beginning with the basis associated with the optimal solution to the linear program obtained by removing (3). This linear program is known as the high point problem.
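Because Theorem 2 places a solution at a vertex of $S$, the most naive vertex-enumeration scheme ranks the vertices of $S$ by the leader's objective and returns the first one that lies in IR. The top-ranked vertex is the solution of the high point problem. The sketch below is such a brute-force, K-th best style search on a small illustrative instance. The leader minimizes $x - 4y$, and the follower minimizes $y$ subject to $-x - y \le -3$, $-2x + y \le 0$, $2x + y \le 12$, $3x - 2y \le 4$ and $x, y \ge 0$. It is not the Candler and Townsley or the Bialas and Karwan algorithm, which work with simplex bases rather than an explicit vertex list. All names are hypothetical.

```python
from itertools import combinations

# Illustrative instance of (1)-(4): leader min F = x - 4y, follower min f = y.
# Rows (a, b, c) mean a*x + b*y <= c (follower rows plus x >= 0, y >= 0).
ROWS = [(-1.0, -1.0, -3.0), (-2.0, 1.0, 0.0), (2.0, 1.0, 12.0),
        (3.0, -2.0, 4.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
TOL = 1e-9


def leader_obj(p):
    return p[0] - 4.0 * p[1]


def feasible(p):
    return all(a * p[0] + b * p[1] <= c + TOL for a, b, c in ROWS)


def vertices():
    """Vertices of S: feasible intersections of pairs of constraint lines."""
    pts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(ROWS, 2):
        det = a1 * b2 - b1 * a2
        if abs(det) < TOL:
            continue  # parallel lines
        p = ((c1 * b2 - b1 * c2) / det, (a1 * c2 - c1 * a2) / det)  # Cramer's rule
        if feasible(p) and all(max(abs(p[0] - q[0]), abs(p[1] - q[1])) > TOL for q in pts):
            pts.append(p)
    return pts


def follower_response(x):
    """The follower minimizes y (d2 = 1 > 0), so it takes the smallest feasible y."""
    return max((c - a * x) / b for a, b, c in ROWS if b < 0)


def kth_best():
    """Rank vertices by F and return (rank, vertex) for the first one lying in IR."""
    for k, p in enumerate(sorted(vertices(), key=leader_obj), start=1):
        if abs(p[1] - follower_response(p[0])) <= 1e-7:
            return k, p
    return None


print(kth_best())
```

The top-ranked vertex $(3, 6)$, which solves the high point problem, is rejected because the follower would answer $x = 3$ with $y = 2.5$. The second vertex $(4, 4)$, with $F = -12$, lies in IR and is returned.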

The second and most popular method for solving 
the linear BLP is known as the Kuhn-Tucker approach 
and concentrates on (6)-(11). The fundamental idea is 
to use a branch and bound strategy to deal with the com- 
plementarity constraint (9). Omitting or relaxing this 
constraint leaves a standard linear program which is 
easy to solve. The various methods proposed employ 
different techniques for assuring that complementarity 
is ultimately satisfied (e. g., see [3,6]). 
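Checking a candidate point against the Kuhn-Tucker reformulation shows which condition is the hard one. The sketch below is a hypothetical helper, `follower_kkt_check`, written in plain Python with list-based matrices. It assumes the follower's problem is min d2 y subject to A2 x + B2 y <= b2, y >= 0, with multipliers u and v. It tests the linear conditions (feasibility and stationarity u B2 - v = -d2) and returns the value of the complementarity term u(b2 - A2 x - B2 y) + v y. That term must be zero, and it is the condition on which a branch-and-bound scheme branches.

```python
def follower_kkt_check(A2, B2, b2, d2, x, y, u, v):
    """Check the follower's optimality conditions used in the Kuhn-Tucker approach.

    Follower LP for fixed x:  min d2.y  s.t.  A2 x + B2 y <= b2,  y >= 0.
    u has one entry per row of A2/B2, v one entry per follower variable.
    Returns (feasible, stationary, complementarity_value).
    """
    q, m = len(b2), len(y)
    slack = [b2[i] - sum(A2[i][k] * x[k] for k in range(len(x)))
             - sum(B2[i][j] * y[j] for j in range(m)) for i in range(q)]
    feasible = (all(s >= 0 for s in slack) and all(yj >= 0 for yj in y)
                and all(ui >= 0 for ui in u) and all(vj >= 0 for vj in v))
    # Stationarity u B2 - v = -d2: one linear equation per follower variable.
    stationary = all(sum(u[i] * B2[i][j] for i in range(q)) - v[j] == -d2[j]
                     for j in range(m))
    # Complementarity u(b2 - A2 x - B2 y) + v y: the single nonlinear condition.
    comp = sum(u[i] * slack[i] for i in range(q)) + sum(v[j] * y[j] for j in range(m))
    return feasible, stationary, comp
```

Take the instance with follower constraints -x + y <= 2 and x + 2y <= 10 and follower cost -y. At x = 2, y = 4 with u = (1, 0), v = 0, the helper returns (True, True, 0). At x = 0, y = 1 with the same multipliers, it leaves a complementarity violation of 1.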

The third method is based on some form of penalty 
approach. E. Aiyoshi and K. Shimizu (see [8, Chap. 
15]) addressed the general BLP by first converting the 
follower’s problem to an unconstrained mathemati- 
cal program using a barrier method. The correspond- 
ing stationarity conditions are then appended to the 
leader’s problem which is solved repeatedly for de- 
creasing values of the barrier parameter. To guarantee 
convergence the follower’s objective function must be 
strictly convex. This rules out the linear case, at least in 
theory. A different approach using an exterior penalty method was proposed by Shimizu and M. Lu [8]; it requires only convexity of all the functions to guarantee global convergence.

In the approach of D.J. White and G. Anandalingam 
[10], the gap between the primal and dual solutions of 
the follower’s problem for x fixed is used as a penalty 
term in the leader’s problem. Although this results in 
a nonlinear objective function, it can be decomposed 
to provide a set of linear programs conditioned on ei- 
ther the decision variables (x, y) or the dual variables (u, 
v) of the follower’s problem. They show that an exact 
penalty function exists that yields the global solution. 
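Fix x and write the follower's problem in (1)-(4) with Y = {y >= 0} (an assumption made here for the sketch) as min{d_2 y : B_2 y <= b_2 - A_2 x, y >= 0}. Its LP dual is max{-u(b_2 - A_2 x) : u B_2 - v = -d_2, u >= 0, v >= 0}. Substituting d_2 = v - u B_2 from dual feasibility, the gap between the two objective values reduces to a complementarity expression. This is a standard LP duality computation, not reproduced from [10]:

$$\begin{aligned} d_2 y - \big(-u(b_2 - A_2 x)\big) &= (v - uB_2)\,y + u(b_2 - A_2 x)\\ &= u(b_2 - A_2 x - B_2 y) + v y \;\ge\; 0. \end{aligned}$$

The gap is nonnegative for any primal-dual feasible pair. By LP strong duality it vanishes exactly when y is optimal for the follower, so penalizing the gap is the same as penalizing violation of complementarity.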


Related theory and algorithmic details are highlighted in [8, Chap. 16], along with presentations of several vertex enumeration and Kuhn-Tucker-based implementations.

A sequential optimization problem in which independent decision makers act in a noncooperative manner to minimize their individual costs may be categorized as a Stackelberg game. The bilevel programming problem (BLPP) is a static, open-loop version of this game, where the leader controls the decision variables $x \in X \subseteq \mathbb{R}^n$, while the follower separately controls the decision variables $y \in Y \subseteq \mathbb{R}^m$ (e.g., see [3,9]).

In the model, it is common to assume that the leader 
goes first and chooses an x to minimize his objective 
function F(x, y). The follower then reacts by selecting 
a y to minimize his individual objective function f(x, y) without regard to the impact this choice has on the leader. Here, $F: X \times Y \to \mathbb{R}^1$ and $f: X \times Y \to \mathbb{R}^1$. The focus of this article is on the linear case introduced in ▶ Bilevel linear programming and given by:


$$\min_{x \in X}\ F(x,y) = c_1x + d_1y \qquad (1)$$

$$\text{s.t.}\quad A_1x + B_1y \le b_1 \qquad (2)$$

$$\min_{y \in Y}\ f(x,y) = c_2x + d_2y \qquad (3)$$

$$\text{s.t.}\quad A_2x + B_2y \le b_2 \qquad (4)$$


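where $c_1, c_2 \in \mathbb{R}^n$, $d_1, d_2 \in \mathbb{R}^m$, $b_1 \in \mathbb{R}^p$, $b_2 \in \mathbb{R}^q$, $A_1 \in \mathbb{R}^{p\times n}$, $B_1 \in \mathbb{R}^{p\times m}$, $A_2 \in \mathbb{R}^{q\times n}$, $B_2 \in \mathbb{R}^{q\times m}$. The sets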
X and Y place additional restrictions on the variables, 
such as upper and lower bounds. Note that it is always 
possible to drop components separable in x from the 
follower’s objective function (3). 

Out of practical considerations, it is further sup- 
posed that the feasible region given by (2), (4), X and 
Y is nonempty and compact, and that for each decision 
taken by the leader, the follower has some room to re- 
spond. The rational reaction set, P(x), defines these re- 
sponses while the inducible region, IR, represents the 
set over which the leader may optimize. These terms 
are defined precisely in ▶ Bilevel linear programming.
In the play, y is restricted to P(x). 

Given these assumptions, the BLPP may still not 
have a well-defined solution. In particular, difficulties 
may arise when P(x) is multivalued and discontinuous. 
This is illustrated by way of example in [2,3]. 


Related Optimization Problems 


The linear minmax problem (LMMP) is a special case of (1)-(4) obtained by omitting constraint (2) and setting $c_2 = -c_1$, $d_2 = -d_1$. It is often written compactly without the subscripts as

$$\min_{x \in X}\ \max_{y \in Y}\ \{cx + dy : Ax + By \le b\} \qquad (5)$$

or equivalently as

$$\min_{x \in X}\Big(cx + \max_{y \in S(x)} dy\Big), \qquad (6)$$

where $S(x) = \{y \in Y : By \le b - Ax\}$. Several restrictive versions of (5), for example where X and Y are polyhedral sets and $Ax + By \le b$ is absent, as well as related optimality conditions, are discussed in [8]. Although important in its own right, the LMMP plays a key role in determining the computational complexity of the linear BLPP, as shown next.

Consider now the inner maximization problem in (6) with $Y = \{y \ge 0\}$. Its dual is $\min\{u^{\top}(b - Ax) : u \in U\}$, where u is a q-dimensional decision vector and $U = \{u : u^{\top}B \ge d,\ u \ge 0\}$. Note that the dual objective function is parameterized with respect to the vector x. Replacing the inner maximization problem with its dual leads to a second representation of (5):

$$\min_{x \in X,\ u \in U}\ \big(cx - u^{\top}Ax + u^{\top}b\big), \qquad (7)$$

which is known as a disjoint bilinear programming problem. The theoretical properties of (7) along with its relationship to other optimization problems are highlighted in [1].

A more general version of a bilinear programming 
problem can be obtained directly from the linear BLPP. 
To see this, it is necessary to examine the Kuhn-Tucker formulation of the latter given by (6)-(11) in ▶ Bilevel linear programming. Placing the complementarity constraint in the objective function as a penalty term gives the following bilinear programming problem:

$$\begin{aligned}
\min\ & c_1x + d_1y + M\big[u(b_2 - A_2x - B_2y) + vy\big]\\
\text{s.t. }\ & A_1x + B_1y \le b_1,\\
& uB_2 - v = -d_2,\\
& A_2x + B_2y \le b_2,\\
& x \ge 0,\ y \ge 0,\ u \ge 0,\ v \ge 0,
\end{aligned} \qquad (8)$$


where M is a sufficiently large constant. In [10] it is 
shown that a finite M exists for the solution of (8) to be 
a solution of (1)-(4), and that (8) is a concave program; 
that is, its objective function is concave. This point is 
further elaborated in the next section. 


Complexity of the Linear BLPP Problem 


Problem (1)-(4) can be classified as NP-hard, which loosely means that no polynomial time algorithm exists for solving it unless P = NP. To substantiate this claim, it is
necessary to demonstrate that through a polynomial 
transformation, some known NP-hard problem can be 
reduced to a linear BLPP. This will be done below con- 
structively by showing that the problem of minimizing 
a strictly concave quadratic function over a polyhedron 
(see [5]) is equivalent to solving a linear minmax prob- 
lem (cf. [4]). For an alternative proof based on satisfia- 
bility arguments from computer science see [7]. 


Theorem 1 The linear minmax problem is NP-hard. 


To begin, let x be an n-dimensional vector of decision variables, and let $c \in \mathbb{R}^n$, $b \in \mathbb{R}^q$, $A \in \mathbb{R}^{q\times n}$, $D \in \mathbb{R}^{n\times n}$ be constant arrays. For A of full row rank and D positive definite, it will be shown that the following minimization problem can be transformed into an LMMP:

$$\theta^* = \min_x\ cx - \tfrac{1}{2}x^{\top}Dx \quad \text{s.t.}\quad Ax \le b, \qquad (9)$$


where it is assumed that the feasible region in (9) is 
bounded and contains all nonnegativity constraints on 
the variables. The core argument centers on the fact that the Kuhn-Tucker conditions associated with the concave program (9) must necessarily be satisfied at optimality. These conditions may be stated as follows:

$$Ax \le b \qquad (10)$$

$$x^{\top}D - u^{\top}A = c \qquad (11)$$

$$u^{\top}(b - Ax) = 0 \qquad (12)$$

$$u \ge 0 \qquad (13)$$

where u is a q-dimensional vector of dual variables. Now, multiplying (11) on the right by x/2, adding cx/2 to both sides of the equation, and rearranging gives

$$\tfrac{1}{2}\big(cx - u^{\top}Ax\big) = cx - \tfrac{1}{2}x^{\top}Dx. \qquad (14)$$

From (12) we observe that $u^{\top}b = u^{\top}Ax$, so (14) becomes

$$\tfrac{1}{2}\big(cx - u^{\top}b\big) = cx - \tfrac{1}{2}x^{\top}Dx. \qquad (15)$$
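The algebra leading to (15) can be checked numerically. The Python sketch below uses hypothetical helper names and exact rational arithmetic. It verifies the Kuhn-Tucker conditions (10)-(13) at a given pair (x, u) and returns both sides of (15). The one-dimensional test instance is an assumption chosen for illustration: min x - x^2 subject to 0 <= x <= 1, written as A = [[1], [-1]], b = [1, 0], c = [1], D = [[2]]. It does not satisfy the full-row-rank condition on A, which the identity itself does not need.

```python
from fractions import Fraction


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def check_identity_15(c, D, A, b, x, u):
    """Verify KKT conditions (10)-(13) of problem (9) at (x, u); return both sides of (15)."""
    Ax = [dot(row, x) for row in A]
    if any(ax > bi for ax, bi in zip(Ax, b)):
        raise ValueError("(10) violated: Ax <= b")
    xD = [dot(x, col) for col in zip(*D)]   # row vector x^T D
    uA = [dot(u, col) for col in zip(*A)]   # row vector u^T A
    if any(p - r != ci for p, r, ci in zip(xD, uA, c)):
        raise ValueError("(11) violated: x^T D - u^T A = c")
    if dot(u, [bi - ax for ax, bi in zip(Ax, b)]) != 0:
        raise ValueError("(12) violated: u^T (b - Ax) = 0")
    if any(ui < 0 for ui in u):
        raise ValueError("(13) violated: u >= 0")
    lhs = Fraction(1, 2) * (dot(c, x) - dot(u, b))
    rhs = dot(c, x) - Fraction(1, 2) * dot(x, [dot(row, x) for row in D])
    return lhs, rhs
```

At the KKT points x = 1 (u = (1, 0)), x = 0 (u = (0, 1)) and x = 1/2 (u = 0), both sides agree, giving 0, 0 and 1/4 respectively.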
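Replacing the objective function in (9) with the left-hand side of (15) (dropping the positive factor 1/2, which does not change the minimizers), and appending the Kuhn-Tucker conditions to (9) results in

$$\begin{aligned}
\theta^* = \min_{x,u}\ & cx - u^{\top}b\\
\text{s.t. }\ & Ax \le b,\\
& x^{\top}D - u^{\top}A = c,\\
& u^{\top}(b - Ax) = 0,\\
& u \ge 0,
\end{aligned} \qquad (16)$$

which is an alternative representation of (9). Thus a quadratic objective function in (9) has been traded for a complementarity constraint in (16).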

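Turning attention to this term, let z be a q-dimensional nonnegative vector with components $z_i = \min[u_i, (b - Ax)_i]$, $i = 1,\dots,q$, where $(b - Ax)_i$ is the ith component of $b - Ax$. Because $u \ge 0$ and $b - Ax \ge 0$, the complementarity constraint $u^{\top}(b - Ax) = 0$ holds if and only if $\sum_i z_i = 0$, so it can be replaced by this condition. The aim is to show that the following linear minmax problem is equivalent to (16):

$$\begin{aligned}
\theta^{\circ} = \min_{x,u}\ & cx - u^{\top}b + \sum_{i=1}^{q} M z_i\\
\text{s.t. }\ & Ax \le b,\quad x^{\top}D - u^{\top}A = c,\quad u \ge 0,\\
& \min_{z}\ -\sum_{i=1}^{q} M z_i\\
& \quad \text{s.t. }\ z_i \le u_i,\quad z_i \le (b - Ax)_i,\quad z_i \ge 0,\quad i = 1,\dots,q,
\end{aligned} \qquad (17)$$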


where M in the objective functions of problem (17) is 
a sufficiently large constant whose value must be deter- 
mined. 
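The role of the inner problem in (17) is easy to see computationally. For M > 0 the follower drives every z_i to its upper bound, so the leader pays M times the sum of min[u_i, (b - Ax)_i], which is zero exactly when the complementarity condition of (16) holds. The sketch below evaluates this response and the first objective of (17) for given (x, u). The function names are hypothetical, and the sketch does not check the constraint x^T D - u^T A = c.

```python
from fractions import Fraction


def inner_response(A, b, x, u):
    """Follower's optimal z in (17) for fixed leader variables (x, u), assuming M > 0.

    The follower minimizes -M*sum(z) s.t. 0 <= z_i <= u_i and z_i <= (b - Ax)_i,
    so each z_i goes to its upper bound min(u_i, (b - Ax)_i).
    """
    slack = [Fraction(bi) - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
    if any(s < 0 for s in slack) or any(ui < 0 for ui in u):
        raise ValueError("expects Ax <= b and u >= 0")
    return [min(ui, si) for ui, si in zip(u, slack)]


def leader_objective_17(c, A, b, x, u, M):
    """First objective of (17): cx - u^T b + M * sum(z), with z the follower's response."""
    z = inner_response(A, b, x, u)
    return (sum(ci * xi for ci, xi in zip(c, x))
            - sum(ui * bi for ui, bi in zip(u, b)) + M * sum(z))
```

With A = [[1], [-1]], b = [1, 0], c = [1], the complementary pair x = 1, u = (1, 0) gets z = (0, 0). The pair x = 3/4, u = (1/2, 0) satisfies x^T D - u^T A = c for D = [[2]] but violates complementarity, so it gets z = (1/4, 0) and hence a penalty of M/4.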

Before proceeding, observe that an optimal solution to (16), call it (x*, u*), is feasible to (17) and yields the same value for the first objective function in (17). This follows because $\sum_i z_i^* = 0$, where $z_i^* = z_i(x^*, u^*)$. It must now be shown that (x*, u*, z*) also solves (17). Assume the contrary; i.e., there exists a vector (x°, u°, z°) in the inducible region of (17) such that θ° < θ* and $\sum_i z_i^{\circ} > 0$. (Of course, if $\sum_i z_i^{\circ} = 0$ and θ° < θ*, this would contradict the optimality of (x*, u*).)

To exhibit a contradiction an appropriate value of M is needed. Accordingly, let S be the polyhedron defined by all the constraints in (17) and let

$$\theta^{+} = \min\{cx - u^{\top}b : (x,u,z) \in S\}. \qquad (18)$$

Evidently, because S is compact, θ⁺ in (18) is finite. Compactness follows from the assumption that {x : Ax ≤ b} is bounded, and the fact that A has full row rank, which implies that u is bounded in the second constraint in (17). Now define:


where ε is any value in $(0, \sum_i z_i^{\circ})$. This leads to the following series of inequalities:

$$\theta^{\circ} = cx^{\circ} - b^{\top}u^{\circ} + M\sum_i z_i^{\circ} < cx^{*} - b^{\top}u^{*} = \theta^{*}$$

or

$$(cx^{*} - b^{\top}u^{*}) - (cx^{\circ} - b^{\top}u^{\circ}) > M\sum_i z_i^{\circ}. \qquad (19)$$

But from the definition of M along with (19), one has

$$M\sum_i z_i^{\circ} - M\varepsilon > \theta^{*} - \theta^{+} \ge (cx^{*} - b^{\top}u^{*}) - (cx^{\circ} - b^{\top}u^{\circ}) > M\sum_i z_i^{\circ},$$

which implies that the open interval $(0, \sum_i z_i^{\circ})$ does not exist, so $\sum_i z_i^{\circ} = 0$, the desired contradiction.

Similar arguments can be used to show the reverse; 
therefore, if (x*, u*) solves (16), it also solves (17) and 
vice versa. Finally, note that the transformation from 
(9) to (17) is polynomial because it only involves the ad- 
dition of 2q variables and 2q + n constraints to the for- 
mulation. The statement of the theorem follows from 
these developments. A straightforward corollary is that 
the linear BLPP is NP-hard. 

In describing the size of a problem instance I, it is common to reference two variables:

1) its Length[I], which is an integer corresponding to the number of symbols required to describe I under some reasonable encoding scheme, and

2) its Max[I], also an integer, corresponding to the magnitude of the largest number in I.

When a problem is said to be solvable in polynomial time, it means that an algorithm exists that will return an optimal solution in an amount of time that is a polynomial function of Length[I]. A closely related concept is that of a pseudopolynomial time algorithm, whose time complexity is bounded above by a polynomial function of the two variables Length[I] and Max[I].

By definition, any polynomial time algorithm is also a pseudopolynomial time algorithm because it runs in time bounded by a polynomial in Length[I]. The reverse is not true.
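A standard illustration of the distinction, not taken from this article, is the (weakly) NP-complete subset-sum problem. The dynamic program below runs in O(n t) time for n weights and target t. This is polynomial in Length[I] and Max[I] but not in Length[I] alone, since t can be exponential in the number of bits used to write it down. The function name is hypothetical.

```python
def subset_sum(weights, target):
    """Return True if some subset of the nonnegative integers `weights` sums to `target`.

    Time O(len(weights) * target): pseudopolynomial, because `target` enters
    through its magnitude (Max[I]) rather than through its encoding length.
    """
    reachable = [False] * (target + 1)
    reachable[0] = True                      # the empty subset
    for w in weights:
        # Scan sums downward so each weight is used at most once.
        for s in range(target, w - 1, -1):
            if reachable[s - w]:
                reachable[s] = True
    return reachable[target]
```

For example, with weights 3, 5 and 7 the target 12 is reachable (5 + 7) while 11 is not.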

The theory of NP-completeness states that NP-hard 
problems are not solvable with polynomial time algo- 
rithms unless P = NP; however, a certain subclass may 
be solvable with pseudopolynomial time algorithms. 
Problems that admit no pseudopolynomial time algorithm unless P = NP are classified as NP-hard in the strong sense. The linear BLPP falls into this category. The proof in [6], once again, is actually a corollary to the following theorem.


Theorem 2 The linear minmax problem is strongly NP- 
hard. 


The proof is based on the notion of a kernel K of a directed graph G = (V, E), which is a vertex set that is stable (no two vertices of K are joined by an arc) and absorbing (every vertex not in K has an arc into a vertex of K). It is shown that
the strongly NP-hard problem of determining whether 
or not G has a kernel (see [5]) is equivalent to determin- 
ing whether or not a particular LMMP has an optimal 
objective function value of zero. 
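The definition of a kernel is easy to state as code. The brute-force Python sketch below uses a hypothetical helper name and has exponential running time. It takes a directed graph given as a vertex list and a list of arcs (u, v), and returns a kernel if one exists. It only illustrates the definition, not the reduction to an LMMP.

```python
from itertools import combinations


def find_kernel(vertices, arcs):
    """Brute-force kernel search in a directed graph (hypothetical helper).

    A kernel K is stable (no arc joins two vertices of K) and absorbing
    (every vertex outside K has an arc into K). Returns the first kernel found,
    smallest size first, or None. Exponential in len(vertices).
    """
    arcs = set(arcs)
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            K = set(subset)
            stable = not any((p, q) in arcs for p in K for q in K)
            absorbing = all(any((w, k) in arcs for k in K)
                            for w in vertices if w not in K)
            if stable and absorbing:
                return K
    return None
```

A directed 3-cycle has no kernel. The directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0 has the kernel {0, 2}.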
Production systems typically involve significant uncertainty in their operation due to either external or internal resources. Variability of process parameters during operation and plant-model mismatch (both parametric and structural) could give rise to suboptimality and even infeasibility of the deterministic solutions. Consequently, plant flexibility has been recognized as one of the important components in the operability of production processes.

In a broad sense the area covers:

• a feasibility test that requires constraint satisfaction over a specified space of uncertain parameters;

• a flexibility index associated with a given design that represents a quantitative measure of the range of the uncertainty space that satisfies the feasibility requirement; and

• the integration of design and operations, where trade-offs between design cost and plant flexibility are considered.

K.P. Halemane and I.E. Grossmann [21] proposed a feasibility measure for a given design based on the worst points for feasible operation, which can be mathematically formulated as a max-min-max optimization problem, as discussed in detail in the next section.

The approaches in the literature that quantify the flexibility of a given design involve deterministic measures, such as the resilience index (RI) proposed in [38] and the flexibility index proposed in [41,42], and stochastic measures, such as the design reliability proposed in [27] and the stochastic flexibility index proposed in [37] and [40].

The incorporation of uncertainty into design opti- 
mization problems transforms the deterministic pro- 
cess models to stochastic/parametric problems, the so- 
lution of which requires the application of specialized 
optimization techniques. The consideration of the fea- 
sibility objective within the design optimization can 
be targeted towards the following two design capabilities. The first concerns the design with a fixed degree of flexibility, which has the capability to cope with a finite number of different operating conditions ([19,20,32,34,40]). The second considers design optimization with an optimal degree of flexibility, which can be achieved by trading off the cost of the plant against its flexibility ([22,33,35,36]). In the next section the
feasibility test and the flexibility index problem will be 
considered in detail. 


Problem Statement 


The design problem can be described by a set of equality constraints I and inequality constraints J, representing plant operation and design specifications:

$$\begin{aligned}
h_i(d,z,x,\theta) &= 0, && i \in I,\\
g_j(d,z,x,\theta) &\le 0, && j \in J,
\end{aligned} \qquad (1)$$

where d corresponds to the vector of design variables, z the vector of control variables, x the state variables and θ the vector of uncertain parameters. As has been shown in [21], for a specific design d and this set of constraints, the design feasibility test problem can be formulated as the max-min-max problem

$$\chi(d) = \max_{\theta \in T}\ \min_{z}\ \max_{j \in J}\ \big\{ g_j(d,z,x,\theta) : h_i(d,z,x,\theta) = 0,\ i \in I \big\}, \qquad (2)$$

where the function χ(d) represents a feasibility measure for design d. If χ(d) ≤ 0, design d is feasible for all θ ∈ T, whereas if χ(d) > 0, the design cannot operate for at least some values of θ ∈ T. The above max-min-max problem defines a nondifferentiable global optimization problem, which, however, can be reformulated as the following two-level optimization problem:

$$\begin{aligned}
\chi(d) = \max_{\theta \in T}\ & \psi(d,\theta)\\
\text{s.t. }\ & \psi(d,\theta) = \min_{z,u}\ u\\
& \qquad \text{s.t. }\ h_i(d,z,x,\theta) = 0,\ i \in I,\\
& \qquad \phantom{\text{s.t. }}\ g_j(d,z,x,\theta) \le u,\ j \in J,
\end{aligned} \qquad (3)$$

where the function ψ(d, θ) = 0 defines the boundary of the feasible region in the space of the uncertain parameters θ.

Plant feasibility can be quantified by determining the flexibility index of the design. Following the definition of the flexibility index proposed in [41], this metric expresses the largest scaled deviation δ, relative to the expected deviations Δθ⁺, Δθ⁻, that the design can handle. The mathematical formulation for the evaluation of the design's flexibility is the following:

$$\begin{aligned}
F = \max\ & \delta\\
\text{s.t. }\ & \chi(d) = \max_{\theta \in T(\delta)}\ \min_z\ \max_{j \in J}\ \big\{ g_j(d,z,x,\theta) : h_i(d,z,x,\theta) = 0,\ i \in I \big\} \le 0,\\
& T(\delta) = \{\theta : \theta^N - \delta\,\Delta\theta^- \le \theta \le \theta^N + \delta\,\Delta\theta^+\},\\
& \delta \ge 0.
\end{aligned} \qquad (4)$$
The design flexibility index problem can be refor- 
mulated to represent the determination of the largest 
hyperrectangle that can be inscribed within the feasible 
region of the design [41]. Following this idea, the math- 
ematical formulation of the flexibility problem has the 
following form: 


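$$\begin{aligned}
F = \min\ & \delta\\
\text{s.t. }\ & \psi(d,\theta) = 0,\\
& \psi(d,\theta) = \min_{z,u}\ u\\
& \qquad \text{s.t. }\ h_i(d,z,x,\theta) = 0,\ i \in I,\quad g_j(d,z,x,\theta) \le u,\ j \in J,\\
& \theta \in T(\delta) = \{\theta : \theta^N - \delta\,\Delta\theta^- \le \theta \le \theta^N + \delta\,\Delta\theta^+\},\\
& \delta \ge 0.
\end{aligned} \qquad (5)$$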


Local Optimization Framework 


For the case where the constraints are jointly 1-D quasiconvex in θ and quasiconvex in z, it was proven [41] that the point θ^c that defines the solution to (3) lies at one of the vertices of the parameter set T. Based on this assumption, the critical uncertain parameter points correspond to the vertices, and the feasibility test problem is reformulated in the following manner:

$$\chi(d) = \max_{k \in V}\ \psi(d,\theta^k), \qquad (6)$$

where ψ(d, θ^k) is the evaluation of the function ψ(d, θ) at the parameter vertex θ^k and V is the index set for the $2^{n_\theta}$ vertices of the $n_\theta$ uncertain parameters θ. In a similar fashion, for the flexibility index, problem (4) is reformulated in the following way:

$$F = \min_{k \in V}\ \delta^k, \qquad (7)$$

where δ^k is the maximum deviation along each vertex direction Δθ^k, k ∈ V, and is determined by the following problem:

$$\begin{aligned}
\delta^k = \max_{\delta,z}\ & \delta\\
\text{s.t. }\ & g_j(d,z,x,\theta) \le 0, && j \in J,\\
& h_i(d,z,x,\theta) = 0, && i \in I,\\
& \theta = \theta^N + \delta\,\Delta\theta^k,\\
& \delta \ge 0.
\end{aligned} \qquad (8)$$
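For constraints that are linear in z and θ, ψ(d, θ) is convex in θ, so the maximum in (3) is attained at a vertex of T and (6) can be evaluated by enumeration. The Python sketch below makes several assumptions for illustration only: a single scalar control z with bounds, no state variables, and constraints g_j = a_j z + e_j·θ + f_j, with the design d absorbed into the constants. Helper names are hypothetical. For each vertex the sketch computes ψ exactly by checking the breakpoints of the convex piecewise-linear function max_j g_j.

```python
from fractions import Fraction
from itertools import combinations, product


def psi(theta, rows, z_bounds):
    """psi(theta) = min over zl <= z <= zu of max_j g_j, with g_j = a_j*z + e_j.theta + f_j."""
    lines = [(Fraction(a), Fraction(sum(ei * ti for ei, ti in zip(e, theta)) + f))
             for a, e, f in rows]
    zl, zu = Fraction(z_bounds[0]), Fraction(z_bounds[1])
    # max_j g_j is convex piecewise linear in z, so its minimum over [zl, zu]
    # is attained at an endpoint or at an intersection of two lines.
    cands = {zl, zu}
    for (a1, b1), (a2, b2) in combinations(lines, 2):
        if a1 != a2:
            z = (b2 - b1) / (a1 - a2)
            if zl <= z <= zu:
                cands.add(z)
    return min(max(a * z + b0 for a, b0 in lines) for z in cands)


def chi_by_vertices(rows, theta_nominal, dev_minus, dev_plus, z_bounds):
    """Feasibility test (6): maximum of psi over the 2**n_theta vertices of T."""
    best = None
    for signs in product((-1, 1), repeat=len(theta_nominal)):
        theta = [Fraction(tn) - Fraction(dm) if s < 0 else Fraction(tn) + Fraction(dp)
                 for tn, dm, dp, s in zip(theta_nominal, dev_minus, dev_plus, signs)]
        value = psi(theta, rows, z_bounds)
        best = value if best is None else max(best, value)
    return best
```

With g_1 = θ - z, g_2 = z - 2 and 0 <= z <= 10, one has ψ(θ) = (θ - 2)/2 in the range used here. For T = [1, 3] the worst vertex is θ = 3 and χ(d) = 1/2 > 0, so the design is infeasible there. For T = [0, 2] one gets χ(d) = 0, which places that vertex on the boundary of the feasible region.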


Based on the above problem reformulations, a direct search method was proposed [21] that explicitly enumerates all the parameter set vertices. To avoid explicit vertex enumeration, two algorithms were proposed [41,42]: a heuristic vertex search and an implicit enumeration scheme. These algorithms, however, rely on the assumption that the critical points correspond to the vertices of the parameter set T, which is valid only for the type of constraints assumed above. To circumvent this limitation, a solution approach was proposed based on the following ideas [18]:

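a) They replace the inner optimization problem

$$\begin{aligned}
\psi(d,\theta) = \min\ & u\\
\text{s.t. }\ & h_i(d,z,x,\theta) = 0, && \forall i \in I,\\
& g_j(d,z,x,\theta) \le u, && \forall j \in J,
\end{aligned}$$

by the Karush-Kuhn-Tucker (KKT) optimality conditions:

$$\begin{aligned}
& \textstyle\sum_{j \in J} \lambda_j = 1,\\
& \textstyle\sum_{j \in J} \lambda_j \frac{\partial g_j}{\partial z} + \sum_{i \in I} \mu_i \frac{\partial h_i}{\partial z} = 0,\\
& \textstyle\sum_{j \in J} \lambda_j \frac{\partial g_j}{\partial x} + \sum_{i \in I} \mu_i \frac{\partial h_i}{\partial x} = 0,\\
& \lambda_j s_j = 0, && j \in J,\\
& s_j = u - g_j(d,z,x,\theta), && j \in J,\\
& \lambda_j,\ s_j \ge 0, && j \in J,
\end{aligned}$$

where s_j are the slack variables of constraints j, and λ_j, μ_i are the Lagrange multipliers for the inequality and equality constraints, respectively.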

b) For the inner problem the following property holds: if each square submatrix of dimension ($n_z \times n_z$), where $n_z$ is the number of control variables, of the partial derivatives of the constraints $g_j$, $\forall j \in J$, with respect to the control variables z is of full rank, then the number of active constraints is equal to $n_z + 1$.

c) They utilize the discrete nature of the selection of the active constraints by introducing a set of binary variables $y_j$ to express whether constraint $g_j$ is active. In particular:

$$\begin{aligned}
& \lambda_j - y_j \le 0, && j \in J,\\
& s_j - U(1 - y_j) \le 0, && j \in J,\\
& \textstyle\sum_{j \in J} y_j = n_z + 1,\\
& y_j \in \{0,1\},\quad \lambda_j,\ s_j \ge 0, && j \in J,
\end{aligned}$$

where U represents an upper bound on the slack variables $s_j$. Note that if $y_j = 1$, then $\lambda_j \ge 0$ and $s_j = 0$, which indicates that constraint j is active; on the other hand, if $y_j = 0$, then $\lambda_j = 0$ and $s_j \ge 0$, which indicates that constraint j is inactive.

Based on these ideas, the feasibility test problem can be reformulated in the following way:

$$\begin{aligned}
\chi(d) = \max\ & u\\
\text{s.t. }\ & h_i(d,z,x,\theta) = 0, && i \in I,\\
& g_j(d,z,x,\theta) + s_j - u = 0, && j \in J,\\
& \textstyle\sum_{j \in J} \lambda_j = 1,\\
& \textstyle\sum_{j \in J} \lambda_j \frac{\partial g_j}{\partial z} + \sum_{i \in I} \mu_i \frac{\partial h_i}{\partial z} = 0,\\
& \textstyle\sum_{j \in J} \lambda_j \frac{\partial g_j}{\partial x} + \sum_{i \in I} \mu_i \frac{\partial h_i}{\partial x} = 0,\\
& \lambda_j - y_j \le 0, && j \in J,\\
& s_j - U(1 - y_j) \le 0, && j \in J,\\
& \textstyle\sum_{j \in J} y_j = n_z + 1,\\
& \theta^L \le \theta \le \theta^U,\\
& y_j \in \{0,1\},\quad \lambda_j,\ s_j \ge 0, && j \in J,
\end{aligned}$$

which corresponds to a mixed integer optimization problem, either linear or nonlinear depending on the nature of the constraints. In a similar way, the flexibility index problem takes the following form:

$$\begin{aligned}
F = \min\ & \delta\\
\text{s.t. }\ & h_i(d,z,x,\theta) = 0, && i \in I,\\
& g_j(d,z,x,\theta) + s_j = 0, && j \in J,\\
& \textstyle\sum_{j \in J} \lambda_j = 1,\\
& \textstyle\sum_{j \in J} \lambda_j \frac{\partial g_j}{\partial z} + \sum_{i \in I} \mu_i \frac{\partial h_i}{\partial z} = 0,\\
& \textstyle\sum_{j \in J} \lambda_j \frac{\partial g_j}{\partial x} + \sum_{i \in I} \mu_i \frac{\partial h_i}{\partial x} = 0,\\
& \lambda_j - y_j \le 0, && j \in J,\\
& s_j - U(1 - y_j) \le 0, && j \in J,\\
& \textstyle\sum_{j \in J} y_j = n_z + 1,\\
& \theta^N - \delta\,\Delta\theta^- \le \theta \le \theta^N + \delta\,\Delta\theta^+,\\
& \delta \ge 0,\\
& y_j \in \{0,1\},\quad \lambda_j,\ s_j \ge 0, && j \in J.
\end{aligned}$$

Grossmann and C.A. Floudas [18] proposed the active set strategy for the solution of the above reformulated problems, based on the property that for any combination of $n_z + 1$ binary variables that is selected (i.e., for a given set of active constraints), all the other variables can be determined as a function of θ. They proposed a procedure for systematically identifying the potential candidates for the active sets based on the signs of the gradients $\nabla_z g_j(d,z,x,\theta)$. The algorithm for the feasibility test problem involves the following steps:

a) For every potential active set, determine the value $u^k$, $k = 1,\dots,n_{AS}$, through the solution of the following nonlinear programming problem:

$$\begin{aligned}
u^k = \max\ & u\\
\text{s.t. }\ & h_i(d,z,x,\theta) = 0, && i \in I,\\
& g_j(d,z,x,\theta) - u = 0, && j \in AS(k),\\
& \theta^L \le \theta \le \theta^U,
\end{aligned}$$

or, if the active set AS(k) involves lower and upper bound constraints, $u^k$ is given by:


YY am- DS baw], 


j()EAS(K) j(u)EAS(k) 


where j(l), j(u) are the indices corresponding to those pairs of constraints representing the lower and upper bounds on the same function, and $a_l$ is the total number of constraints of this type.


b) The solution for the feasibility test problem is given by:

$$\chi(d) = \max_{k = 1,\dots,n_{AS}}\ u^k.$$


A similar algorithm was proposed for the solution of the flexibility index problem. Under the conditions that the constraint functions g_j(d, z, x, θ) are quasiconcave in z and θ and strictly quasiconvex in z for fixed θ, the approach guarantees global optimality.
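Property b) and the multiplier conditions can be seen in the simplest setting. Take one control variable (n_z = 1) and linear constraints g_j = a_j z + b_j at a fixed θ, with every a_j nonzero (the full-rank condition). Then each candidate active set has n_z + 1 = 2 members whose gradients have opposite signs. The conditions sum(λ_j) = 1 and sum(λ_j a_j) = 0 fix the multipliers, and by LP duality ψ is the largest value of u over these candidate sets. The Python sketch below enumerates the pairs. It uses a hypothetical helper and illustrates this special case only, not the Grossmann-Floudas procedure in general.

```python
from fractions import Fraction


def psi_active_sets(lines):
    """min over z of max_j (a_j*z + b_j) for a scalar control z, by active-set enumeration.

    lines: (a_j, b_j) pairs with every a_j != 0 (full-rank condition for n_z = 1).
    Each active set has n_z + 1 = 2 members p, n with a_p > 0 > a_n; the multipliers
    solve lambda_p + lambda_n = 1 and lambda_p*a_p + lambda_n*a_n = 0.
    Returns None if no valid active set exists (psi unbounded below).
    """
    if any(a == 0 for a, _ in lines):
        raise ValueError("full-rank condition requires every a_j != 0")
    best = None
    for ap, bp in lines:
        for an, bn in lines:
            if ap > 0 > an:
                lam_p = Fraction(-an) / (ap - an)
                lam_n = Fraction(ap) / (ap - an)
                u = lam_p * bp + lam_n * bn   # common value of the two active g_j
                best = u if best is None else max(best, u)
    return best
```

For g_1 = -z + 3 and g_2 = z - 2 the only active set is {1, 2}, with λ = (1/2, 1/2) and u = 1/2. If all a_j share one sign there is no valid active set, and ψ is unbounded below.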

For the linear case, where the feasibility function problem has the following form:

$$\begin{aligned}
\psi(d,\theta) = \min_{z,u}\ & u\\
\text{s.t. }\ & f_j(d,z,\theta) = B_{1j}d + B_{2j}(\theta)z - b_{2j}(\theta) \le u, \quad \forall j \in J,
\end{aligned}$$

where f_j are the inequality constraints after the elimination of the state variables, analytical expressions have been derived [32,33] for the function ψ^k(d, θ) for a given active set k:

$$\psi^k(d,\theta) = \sum_{j \in J^k_{AS}} \lambda_j^k \big(B_{1j}d + B_{2j}(\theta)z - b_{2j}(\theta)\big).$$


A branch and bound approach based on the evaluation of upper and lower bounds of the function χ(d) was proposed in [30]. Although the suggested bounding problems are simpler than the original feasibility test problem, they correspond to bilevel optimization problems where global optimality cannot be guaranteed using local optimization methods (see [29]).


Design Optimization 


As mentioned above, the incorporation of the feasibility 
objectives within the design optimization framework 


can be targeted towards the design with fixed degree 
of flexibility that is able to accommodate a finite num- 
ber of changing operational conditions and the design 
with optimal degree of flexibility determined by proper 
balance of economic optimality and plant feasibility. 
The design optimization for fixed degree of flexibility 
was considered in [20], which presents a general for- 
mulation for designing multipurpose plants consider- 
ing a deterministic multiperiod model of the following 
form: 


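$$\begin{aligned}
\min\ & C^0(d) + \sum_{i=1}^{N} C^i(d, z^i, x^i, t^i)\\
\text{s.t. }\ & h^i(d, z^i, x^i, t^i) = 0, && i = 1,\dots,N,\\
& g^i(d, z^i, x^i, t^i) \le 0, && i = 1,\dots,N,\\
& r(d, z^1,\dots,z^N, x^1,\dots,x^N, t^1,\dots,t^N) \le 0,
\end{aligned}$$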


where d is the vector of design variables; z^i is the vector of control variables in period i; x^i is the vector of state variables in period i; t^i is the length of time for period i; h^i is the vector of equalities for period i; g^i is the vector of inequalities for period i; r is the vector of inequalities that involve variables of all periods; and N is the number of periods in which different operating conditions are considered. This problem formulation exhibits a block diagonal structure, which has been exploited for computational efficiency. See [19] for the projection-restriction strategy, which is an iterative scheme between economic optimization and design feasibility. Based on the flexibility analysis and the assumption that the critical points lie on the vertices of the uncertain parameter set T, Halemane and Grossmann [21] proposed an iterative algorithm to solve the problem of design considering a specific range of uncertainty. See [31] for a nested solution procedure combining generalized Benders decomposition and outer approximation algorithms to address the problem of multiperiod design of heat integrated distillation sequences. See [44] for an outer approximation based decomposition method for the solution of multiperiod design problems.

For design optimization with an optimal degree of flexibility, E.N. Pistikopoulos and Grossmann proposed [33] an iterative scheme to construct the trade-off curves relating retrofit cost and expected revenue to flexibility. In their later work [34,35], they extended their approach to nonlinear systems. Briefly, their iterative scheme consists of two phases: first, the trade-off curve of retrofit cost and flexibility is determined; from this curve a number of designs are obtained, for which, in the second phase, the expected revenue is evaluated by employing a modified Cartesian integration method. See [40] for a solution method based on generalized decomposition for the maximization of plant flexibility subject to a cost constraint. Recently (1996), a decomposition-based approach for the simultaneous optimization of design economics and plant feasibility was proposed [22]. The main ideas of the proposed approach are:

• the utilization of a modified Benders decomposition scheme where the design variables correspond to the complicating variables;

• the use of a numerical integration formula for the approximation of the multiple integral of expected revenue; and

• the determination of the unknown integration points as part of the optimization procedure through the solution of a series of feasibility subproblems.

The same approach has been employed for the solution of planning and capacity expansion problems [24], and for the design of batch plants, where additional properties simplified the solution procedure [25]. The limitation of the latter approach, however, is that it cannot guarantee global optimality for the general case.


Bilevel Optimization 


It should be noted that the feasibility test problem and the flexibility index problem correspond to bilevel optimization problems where the inner level consists of the evaluation of the function ψ(d, θ) that defines the boundary of the feasible region. The approaches in the literature for the solution of the bilevel optimization problem in the linear case include the following:

• enumeration techniques based on the fact that the solution must occur at an extreme point of the feasible set, such as the methods proposed in [13,14] and [12];

• reformulation techniques that transform the original problem into a single optimization problem by employing the KKT optimality conditions of the inner level problem, such as the methods proposed in [11], methods based on branch and bound principles [17], methods based on mixed integer programming [26], and methods based on parametric complementarity pivoting [8];

• local optimization approaches for nonlinear programming, such as penalty and barrier function methods; and

• global optimization techniques based on the reformulation of the complementary slackness constraint as a separable quadratic reverse convex inequality constraint ([6]), or on the restatement of the original problem as a reverse convex program ([43]).

Recently (1996), a global optimization framework was proposed [45] based on the reformulation of the bilevel linear problem utilizing the KKT optimality conditions of the inner level and the primal-dual global optimization approach proposed in [15,16].

For the nonlinear case, local optimization techniques have been proposed based on the one-dimensional search algorithm [10] and on penalty function methods, such as the approach proposed in [5]. Recently (1998), a general global optimization approach was proposed [23] for the solution of the feasibility test and flexibility index problems, based on a branch and bound framework and the ideas of the deterministic global optimization algorithm αBB [1,3,4,9]. Although the approach was applied to design feasibility/flexibility problems, it can be extended to general nonlinear bilevel problems. In the next section, the main ideas and basic steps of the latter approach for the solution of the feasibility test and flexibility index problems are presented.


Global Optimization Framework 


The basic idea of the proposed framework that leads 
to the determination of the global optimal solution for 
both the feasibility test and the flexibility index prob- 
lem is to generate a relaxation/enlargement of the fea- 
sible region based on the convexification of the original 
problem constraints. Since the enlarged feasible region 
involves more feasible points than the original feasible 
region, the resulting feasibility test and flexibility index 
problem will provide lower bounds to the global solu- 
tions. Based on this relaxation idea, the proposed ap- 
proach involves the following key steps: 
a) Since the constraints gj(d, z, x, 0), hi(d, z, x, @) 
are nonconvex functions, the Karush-Kuhn-Tucker 
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b) 


optimality conditions (KKT) of the inner problem 
that correspond to the optimization of the w(d, 0) 
function, are not necessary and sufficient to guaran- 
tee global optimality of the feasibility test and the 
flexibility index problems. Hence, the first step of 
the proposed framework involves the convexifica- 
tion of the constraints gj(d, z, x, 0), hi(d, z, x, @) 
of the original problem. For the convexified prob- 
lem and assuming that the constraint qualification 
holds, the KKT optimality conditions are necessary 
and sufficient, [18], and therefore we maintain the 
equivalence of the transformed single stage opti- 
mization problem. The solution of the single stage 
problem provides a lower bound of the design flex- 
ibility function y(d) and the flexibility index of the 
design, F. 

An upper bound to the design flexibility function 
x(d) and flexibility index F is determined through 
a feasible solution of the original MINLP formula- 
tion obtained by substituting the inner problem by 
the KKT optimality conditions. 

c) The next step, after establishing an upper and a lower bound on the global solution, is to refine them. This is accomplished by successively partitioning the initial region of the uncertain and control variables into smaller ones. The partitioning strategy involves the successive subdivision of a hyperrectangle into two subrectangles by halving the longest side of the rectangle at its midpoint (bisection). In each iteration, the lower bound of the feasibility test and flexibility index problem is the minimum over all the minima found in the subrectangles composing the initial rectangle. Consequently, a nondecreasing sequence of lower bounds is generated by halving the subrectangle that is responsible for the infimum over the minima obtained at each iteration. A nonincreasing sequence of upper bounds is derived by solving the nonconvex single-stage MINLP problem obtained after substituting the inner problem by the KKT optimality conditions, and selecting as the upper bound the minimum over all previously determined upper bounds. If at any iteration the solution of the convexified MINLP in a subrectangle is found to be greater than the upper bound, this subrectangle is fathomed, since the global solution cannot lie inside it.
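The interplay of bounding, bisection and fathoming in steps a) to c) can be illustrated on the simplest possible case, a nonconvex function of a single variable over an interval. The following Python sketch is a hypothetical illustration, not the authors' implementation. It assumes an αBB-type underestimator $L(x) = f(x) + \alpha(x^L - x)(x^U - x)$, with a user-chosen α large enough to make $L$ convex. The minimum of $L$ on each box serves as that box's lower bound, and $f$ at that minimizer serves as a candidate upper bound. The sketch always bisects the box with the smallest lower bound and fathoms boxes whose lower bound cannot beat the incumbent. The names `underestimator_min` and `branch_and_bound` are illustrative.

```python
import heapq
import math


def underestimator_min(f, alpha, lo, hi, iters=100):
    """Minimize the alphaBB-type underestimator of f on [lo, hi].

    L(x) = f(x) + alpha * (lo - x) * (hi - x) never exceeds f on [lo, hi],
    because the quadratic term is nonpositive there.  It is convex when
    alpha >= -min f''(x) / 2 on the box (assumed; chosen by the caller),
    so a ternary search finds its minimum.  Returns (min value, minimizer).
    """
    def under(x):
        return f(x) + alpha * (lo - x) * (hi - x)

    a, b = lo, hi
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if under(m1) < under(m2):
            b = m2
        else:
            a = m1
    x = 0.5 * (a + b)
    return under(x), x


def branch_and_bound(f, alpha, lo, hi, eps=1e-6, max_iter=100_000):
    """Spatial branch and bound with bisection and fathoming (1-D sketch)."""
    lb, x = underestimator_min(f, alpha, lo, hi)
    ub, x_best = f(x), x                  # any evaluated point is an upper bound
    heap = [(lb, lo, hi)]                 # open boxes keyed by their lower bound
    for _ in range(max_iter):
        if not heap:
            break                         # every box has been fathomed
        lb, a, b = heapq.heappop(heap)    # box responsible for the smallest LB
        if ub - lb <= eps:
            break                         # smallest LB is within eps of UB
        mid = 0.5 * (a + b)               # bisection
        for c, d in ((a, mid), (mid, b)):
            child_lb, xc = underestimator_min(f, alpha, c, d)
            fc = f(xc)
            if fc < ub:                   # better incumbent: new upper bound
                ub, x_best = fc, xc
            if child_lb < ub - eps:       # keep the box, otherwise fathom it
                heapq.heappush(heap, (child_lb, c, d))
    return x_best, ub


# f'' = -9 sin(3x) + 0.2 >= -8.8, so alpha = 5 makes L convex on any box.
x, fx = branch_and_bound(lambda t: math.sin(3 * t) + 0.1 * t * t, 5.0, -3.0, 3.0)
print(x, fx)
```

For the test function used here, $f'' = -9\sin 3x + 0.2 \ge -8.8$, so α = 5 satisfies the convexity requirement. A smaller α would make the ternary search on the underestimator unreliable.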


Feasibility Test 


The procedure for the global optimization of the design 
feasibility problem involves the following steps: 


1) Consider the whole uncertainty space. Set the lower bound $LB = -\infty$, $K = 1$, and select a tolerance $\epsilon$.

2) Evaluate valid underestimators of the original constraints $g_j(d, z, x, \theta)$, $h_i(d, z, x, \theta)$, utilizing the basic principles of the deterministic global optimization algorithm αBB [1,3,4,9].

3) Considering the convexified constraints, substitute the inner optimization problem by the necessary and sufficient KKT optimality conditions.

4) Solve the resulting MINLP formulation to global optimality using the deterministic global optimization algorithm SMIN-αBB or GMIN-αBB [2]. If the obtained solution is greater than the current LB, update the LB.

5) Substitute the inner optimization problem of the original problem (i.e., without convexifying the constraints) by its KKT optimality conditions. Solve the resulting problem using a local MINLP optimizer (e.g., DICOPT [46], MINOPT [39]). Set the upper bound UB equal to the obtained solution.

6) Check for convergence. If $UB - LB \le \epsilon$, STOP; otherwise continue to step 7).

7) Apply one of the branching criteria to partition the initial domain into two subdomains to be considered at the next iteration. Once the branching variable is selected, the subdivision is performed by halving its range at the midpoint (bisection). The selection of a branching variable can follow different branching rules. Since the aim of the branching step is the generation of problems with tighter lower bounds, the control variables $z$ that participate in nonconvex terms and the uncertain parameters $\theta$ are included in the set of candidate branching variables. The control variable or uncertain parameter selected for branching corresponds to the least-reduced axis, that is, the one with the largest ratio

$$
\kappa_i = \frac{x_i^U - x_i^L}{x_{i,0}^U - x_{i,0}^L},
$$

where $x_i^L, x_i^U$ are the current bounds of candidate variable $i$ and $x_{i,0}^L, x_{i,0}^U$ its initial bounds.

Note that alternative branching strategies may be applied as described in [1,3].
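The least-reduced-axis rule of step 7) amounts to a ratio test over the candidate variables. In the sketch below, which is only an illustration under stated assumptions, each candidate is represented by its current and initial bounds. The helper names and the two-variable box are hypothetical. The bounds are of the kind that appear in the illustrative example, $10 \le Q_c \le 236$ and $1 \le F_{H1} \le 1.8$.

```python
def select_branching_variable(current, initial):
    """Index of the least-reduced axis.

    current[i] and initial[i] are (lower, upper) bounds of candidate i; the
    ratio kappa_i = current width / initial width is largest for the axis
    that has been subdivided the least.
    """
    ratios = [(cu - cl) / (iu - il)
              for (cl, cu), (il, iu) in zip(current, initial)]
    return max(range(len(ratios)), key=lambda i: ratios[i])


def bisect_box(box, i):
    """Split a box (list of (lower, upper) pairs) at the midpoint of axis i."""
    lo, hi = box[i]
    mid = 0.5 * (lo + hi)
    left, right = list(box), list(box)
    left[i], right[i] = (lo, mid), (mid, hi)
    return left, right


# Hypothetical candidates: a control variable and an uncertain parameter.
initial = [(10.0, 236.0), (1.0, 1.8)]
current = [(10.0, 123.0), (1.0, 1.8)]      # first axis already halved once
k = select_branching_variable(current, initial)
print(k, bisect_box(current, k))
```

Because the first axis has already been halved once, its ratio is 0.5, while the untouched second axis has ratio 1. The rule therefore picks the second axis.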
Flexibility Index


The procedure for the global optimization of the flexibility index problem (5) involves the following steps:

1) Consider the whole uncertainty space. Set the lower bound $LB = -\infty$, $K = 1$, and select a tolerance $\epsilon$.

2) Substitute the inner optimization problem by the KKT optimality conditions to construct a single-stage MINLP optimization problem. Solve the resulting problem using a local MINLP optimizer (e.g., DICOPT [46], MINOPT [39]). Set the upper bound UB equal to the obtained solution.

3) Determine valid underestimators of the original constraints $g_j(d, z, x, \theta)$, $h_i(d, z, x, \theta)$, utilizing the basic principles of the deterministic global optimization algorithm αBB [1,3,4,9].

4) Considering the convexified constraints, substitute the inner optimization problem by the necessary and sufficient KKT optimality conditions.

5) Solve the resulting MINLP formulation to global optimality using the deterministic global optimization algorithm SMIN-αBB or GMIN-αBB [2]. If the obtained solution is greater than the current LB, update the LB.

6) Check for convergence. If $UB - LB \le \epsilon$, STOP; otherwise continue to step 7).

7) Apply one of the branching criteria to partition the initial domain into two subrectangles to be considered at the next iteration.


Illustrative Example 


In this section, an example of a heat exchanger network is considered to illustrate the steps of the approaches presented in the previous sections for the feasibility test and flexibility index problems.

The heat exchanger network given in [18] is considered here, as shown in Fig. 1. The uncertain parameter is the heat capacity flowrate of stream H1, which has a nominal value of 1 kW/K and an expected deviation of ±0.8 kW/K. The following inequalities determine the feasible operation of this network, as formulated after the elimination of the state variables:

$$
\begin{aligned}
f_1 &= -25F_{H1} + Q_c - 0.5Q_cF_{H1} + 10 \le 0,\\
f_2 &= -190F_{H1} + Q_c + 10 \le 0,\\
f_3 &= -270F_{H1} + Q_c + 250 \le 0,\\
f_4 &= 260F_{H1} - Q_c - 250 \le 0.
\end{aligned}
$$



Bilevel Optimization: Feasibility Test and Flexibility Index, 
Figure 1 
Heat exchanger network 


Bilevel Optimization: Feasibility Test and Flexibility Index, 
Figure 2 
Feasible region 


The feasible region of the network is illustrated in Fig. 2. Note that the feasible region consists of two disconnected domains, which are highlighted in black.
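The two disconnected domains can be checked numerically with a short script. This check is illustrative and relies on assumptions that are not part of the original procedure. $Q_c$ is treated as unbounded, and the feasibility function is evaluated as $\psi(F_{H1}) = \min_{Q_c}\max_j f_j$. A nonpositive value means that some $Q_c$ satisfies all four constraints. For fixed $F_{H1}$, each $f_j$ is affine in $Q_c$, so the minimum of their maximum lies at an intersection of two of the lines and no solver is needed. The helper names are hypothetical.

```python
import itertools


def lines(f_h1):
    """Write f1..f4 as (slope, intercept) in Qc for a fixed F_H1."""
    return [
        (1.0 - 0.5 * f_h1, -25.0 * f_h1 + 10.0),   # f1 (bilinear in Qc, F_H1)
        (1.0, -190.0 * f_h1 + 10.0),               # f2
        (1.0, -270.0 * f_h1 + 250.0),              # f3
        (-1.0, 260.0 * f_h1 - 250.0),              # f4
    ]


def psi(f_h1):
    """psi(F_H1) = min over Qc of max_j f_j(Qc, F_H1), with Qc unbounded.

    The max of affine functions is convex and piecewise linear.  Since f4
    decreases and f1..f3 increase in Qc (for F_H1 < 2), the minimum is
    attained where two of the lines intersect.
    """
    ls = lines(f_h1)
    best = float("inf")
    for (a1, b1), (a2, b2) in itertools.combinations(ls, 2):
        if a1 == a2:
            continue                      # parallel lines do not intersect
        qc = (b2 - b1) / (a1 - a2)
        best = min(best, max(a * qc + b for a, b in ls))
    return best


def feasible_intervals(lo, hi, n=1601):
    """Scan F_H1 on a grid and group consecutive points with psi <= 0."""
    intervals, start, prev = [], None, None
    for k in range(n):
        f = lo + (hi - lo) * k / (n - 1)
        ok = psi(f) <= 0.0
        if ok and start is None:
            start = f
        if not ok and start is not None:
            intervals.append((start, prev))
            start = None
        prev = f
    if start is not None:
        intervals.append((start, prev))
    return intervals


print(feasible_intervals(1.0, 1.8))
```

Scanning $F_{H1}$ over $[1, 1.8]$ in this way gives two intervals, roughly $[1, 1.118]$ and $[1.651, 1.8]$. This is consistent with the two domains shown in Fig. 2.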

First, the feasibility test problem is solved. Constraint $f_1$ is the only nonconvex constraint, involving the bilinear term $Q_cF_{H1}$. By introducing a new variable $w$ for the bilinear term $Q_cF_{H1}$, together with the four linear inequality constraints ($f_5$ to $f_8$) that define its convex envelope [4,7,28], we have:


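$$
\begin{aligned}
f_1 &= -25F_{H1} + Q_c - 0.5w + 10 \le 0,\\
f_2 &= -190F_{H1} + Q_c + 10 \le 0,\\
f_3 &= -270F_{H1} + Q_c + 250 \le 0,\\
f_4 &= 260F_{H1} - Q_c - 250 \le 0,\\
f_5 &= 10F_{H1} + Q_c - w - 10 \le 0,\\
f_6 &= 236F_{H1} + 1.8Q_c - w - 424.8 \le 0,\\
f_7 &= -10F_{H1} - 1.8Q_c + w + 18 \le 0,\\
f_8 &= -236F_{H1} - Q_c + w + 236 \le 0.
\end{aligned}
$$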
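Constraints $f_5$ to $f_8$ are the standard McCormick envelope of $w = Q_cF_{H1}$. The sketch below checks this numerically. It relies on one assumption that the text leaves implicit: the box $10 \le Q_c \le 236$, $1 \le F_{H1} \le 1.8$, whose corner products reproduce the coefficients 10, 18, 236 and 424.8. The check samples points in the box and confirms that the envelope brackets the product. It also confirms that $f_5, f_6$ are active at the lower envelope and $f_7, f_8$ at the upper one. Function names are illustrative.

```python
import random


def mccormick(q, f, qL, qU, fL, fU):
    """McCormick bounds (lower, upper) on w = q * f over [qL, qU] x [fL, fU]."""
    lower = max(fL * q + qL * f - fL * qL,   # from (q - qL)(f - fL) >= 0
                fU * q + qU * f - fU * qU)   # from (qU - q)(fU - f) >= 0
    upper = min(fU * q + qL * f - fU * qL,   # from (q - qL)(fU - f) >= 0
                fL * q + qU * f - fL * qU)   # from (qU - q)(f - fL) >= 0
    return lower, upper


def f5_to_f8(q, f, w):
    """Constraints f5..f8 of the example; all must be <= 0."""
    return (10 * f + q - w - 10,
            236 * f + 1.8 * q - w - 424.8,
            -10 * f - 1.8 * q + w + 18,
            -236 * f - q + w + 236)


def check_envelope(n=1000, seed=0, tol=1e-9):
    """Sample the assumed box and verify that lower <= q*f <= upper, that
    f5..f8 hold at both envelope values, and that f5/f6 are tight at the
    lower envelope and f7/f8 at the upper one."""
    rng = random.Random(seed)
    for _ in range(n):
        q, f = rng.uniform(10.0, 236.0), rng.uniform(1.0, 1.8)
        lo, hi = mccormick(q, f, 10.0, 236.0, 1.0, 1.8)
        if not lo - tol <= q * f <= hi + tol:
            return False
        for w in (lo, hi):
            if any(g > tol for g in f5_to_f8(q, f, w)):
                return False
        if abs(max(f5_to_f8(q, f, lo)[:2])) > tol:
            return False
        if abs(max(f5_to_f8(q, f, hi)[2:])) > tol:
            return False
    return True


print(check_envelope())
```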

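Considering this set of linear constraints and substituting the inner optimization problem of the feasibility test problem by the necessary and sufficient KKT optimality conditions, the following MILP optimization formulation is obtained:

$$
\begin{aligned}
\chi(d) = \max\ & u\\
\text{s.t.}\ & f_1 = -25F_{H1} + Q_c + 10 - 0.5w + s_1 = u,\\
& f_2 = -190F_{H1} + 10 + Q_c + s_2 = u,\\
& f_3 = -270F_{H1} + 250 + Q_c + s_3 = u,\\
& f_4 = 260F_{H1} - 250 - Q_c + s_4 = u,\\
& f_5 = 10F_{H1} + Q_c - w - 10 + s_5 = u,\\
& f_6 = 236F_{H1} + 1.8Q_c - w - 424.8 + s_6 = u,\\
& f_7 = -10F_{H1} - 1.8Q_c + w + 18 + s_7 = u,\\
& f_8 = -236F_{H1} - Q_c + w + 236 + s_8 = u,\\
& \sum_{j=1}^{8} \lambda_j = 1,\\
& \lambda_1 + \lambda_2 + \lambda_3 - \lambda_4 + \lambda_5 + 1.8\lambda_6 - 1.8\lambda_7 - \lambda_8 = 0,\\
& -0.5\lambda_1 - \lambda_5 - \lambda_6 + \lambda_7 + \lambda_8 = 0,\\
& \lambda_j \ge 0,\ s_j \ge 0,\quad j = 1, \dots, 8,\\
& s_j - U(1 - y_j) \le 0,\quad j = 1, \dots, 8,\\
& \sum_{j=1}^{8} y_j = 3,\\
& F_{H1}^N - \Delta F_{H1}^- \le F_{H1} \le F_{H1}^N + \Delta F_{H1}^+.
\end{aligned}
$$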

Note that, due to the introduction of the additional control variable $w$, the number of active constraints is increased to three. The solution of the above MILP using GAMS/CPLEX is found to be equal to 0. Since this value corresponds to a lower bound on the network feasibility function $\chi(d)$, this result suggests that the network is not feasible within the whole range of uncertainty, $F_{H1} \in (1, 1.8)$, and no further steps are required.
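The two gradient rows and the count of three active constraints in the MILP above follow mechanically from the constraint coefficients. For the inner problem $\min u$ subject to $f_j \le u$, stationarity with respect to $u$ gives $\sum_j \lambda_j = 1$. Stationarity with respect to each control variable $z$ gives $\sum_j \lambda_j\,\partial f_j/\partial z = 0$. As in the formulation, the number of active constraints is taken as the number of control variables plus one. The sketch below rebuilds the two rows with exact rational arithmetic. The coefficients of $f_1$ to $f_8$ are typed in by hand, and the helper names are hypothetical.

```python
from fractions import Fraction as Fr

# f1..f8 of the convexified example as functions of (F_H1, Qc, w).
CONSTRAINTS = [
    lambda fh, q, w: -25 * fh + q + 10 - Fr(1, 2) * w,
    lambda fh, q, w: -190 * fh + q + 10,
    lambda fh, q, w: -270 * fh + q + 250,
    lambda fh, q, w: 260 * fh - q - 250,
    lambda fh, q, w: 10 * fh + q - w - 10,
    lambda fh, q, w: 236 * fh + Fr(9, 5) * q - w - Fr(2124, 5),
    lambda fh, q, w: -10 * fh - Fr(9, 5) * q + w + 18,
    lambda fh, q, w: -236 * fh - q + w + 236,
]


def gradient_row(var):
    """Coefficients of lambda_1..lambda_8 in sum_j lambda_j * df_j/d(var) = 0.

    Every f_j is linear, so a unit step in one variable yields the exact
    partial derivative; Fractions avoid rounding (1.8 = 9/5, 424.8 = 2124/5).
    """
    base = {"fh": Fr(1), "q": Fr(0), "w": Fr(0)}
    step = dict(base)
    step[var] += 1
    return [g(**step) - g(**base) for g in CONSTRAINTS]


def kkt_summary(controls=("q", "w")):
    """Stationarity rows per control variable and the number of active
    constraints (controls + 1; stationarity in u gives sum(lambda) = 1)."""
    rows = {z: gradient_row(z) for z in controls}
    return rows, len(controls) + 1


rows, n_active = kkt_summary()
for z, row in rows.items():
    print(z, [str(c) for c in row])
print("active constraints:", n_active)
```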

The plant flexibility is then determined. First, the inner feasibility problem is substituted by the KKT optimality conditions. The resulting nonconvex MINLP problem, given below, is solved using the local MINLP solver MINOPT [39]. The solution provides an upper bound of 0.148 on the flexibility index of the heat exchanger network:

$$
\begin{aligned}
F = \min\ & \delta\\
\text{s.t.}\ & f_1 = -25F_{H1} + Q_c + 10 - 0.5Q_cF_{H1} + s_1 = 0,\\
& f_2 = -190F_{H1} + 10 + Q_c + s_2 = 0,\\
& f_3 = -270F_{H1} + 250 + Q_c + s_3 = 0,\\
& f_4 = 260F_{H1} - 250 - Q_c + s_4 = 0,\\
& \sum_{j=1}^{4} \lambda_j = 1,\\
& \lambda_1 - 0.5F_{H1}\lambda_1 + \lambda_2 + \lambda_3 - \lambda_4 = 0,\\
& \lambda_j - y_j \le 0,\quad j = 1, \dots, 4,\\
& s_j - U(1 - y_j) \le 0,\quad j = 1, \dots, 4,\\
& \sum_{j=1}^{4} y_j = 2,\\
& F_{H1}^N - \delta\Delta F_{H1}^- \le F_{H1} \le F_{H1}^N + \delta\Delta F_{H1}^+,\\
& \delta \ge 0,\quad y_j \in \{0, 1\}.
\end{aligned}
$$

Note that this formulation corresponds to a nonconvex MINLP, due to the bilinear term ($Q_cF_{H1}$) in constraint $f_1$ and the bilinear term ($F_{H1}\lambda_1$) in the gradient KKT constraint.

In step 3), the original constraints are convexified using αBB, resulting in the set of linear constraints $f_1$ to $f_8$ presented above in the solution of the feasibility test problem. In step 4), the KKT optimality conditions are written for the new set of linear constraints, leading to the following MILP problem:

$$
\begin{aligned}
F = \min\ & \delta\\
\text{s.t.}\ & f_1 = -25F_{H1} + Q_c + 10 - 0.5w + s_1 = 0,\\
& f_2 = -190F_{H1} + 10 + Q_c + s_2 = 0,\\
& f_3 = -270F_{H1} + 250 + Q_c + s_3 = 0,\\
& f_4 = 260F_{H1} - 250 - Q_c + s_4 = 0,\\
& f_5 = 10F_{H1} + Q_c - w - 10 + s_5 = 0,\\
& f_6 = 236F_{H1} + 1.8Q_c - w - 424.8 + s_6 = 0,\\
& f_7 = -10F_{H1} - 1.8Q_c + w + 18 + s_7 = 0,\\
& f_8 = -236F_{H1} - Q_c + w + 236 + s_8 = 0,\\
& \sum_{j=1}^{8} \lambda_j = 1,\\
& \lambda_1 + \lambda_2 + \lambda_3 - \lambda_4 + \lambda_5 + 1.8\lambda_6 - 1.8\lambda_7 - \lambda_8 = 0,\\
& -0.5\lambda_1 - \lambda_5 - \lambda_6 + \lambda_7 + \lambda_8 = 0,\\
& \lambda_j - y_j \le 0,\quad j = 1, \dots, 8,\\
& s_j - U(1 - y_j) \le 0,\quad j = 1, \dots, 8,\\
& \sum_{j=1}^{8} y_j = 3,\\
& F_{H1}^N - \delta\Delta F_{H1}^- \le F_{H1} \le F_{H1}^N + \delta\Delta F_{H1}^+,\\
& \delta \ge 0,\\
& y_j \in \{0, 1\},\quad \lambda_j, s_j \ge 0,\quad j = 1, \dots, 8.
\end{aligned}
$$

The solution of this MILP problem using GAMS/CPLEX results in a network flexibility of 0.06, which provides a valid lower bound for the flexibility index problem. Hence, at the end of the first iteration we have an upper bound of 0.148 and a lower bound of 0.06 for the flexibility index problem. In step 7), since only one control variable, $Q_c$, is involved in the description of the problem, it is selected as the branching variable. This results in the following subrectangles to be considered at the next iteration: subrectangle 1, described by $10 \le Q_c \le 123$, and subrectangle 2, described by $123 \le Q_c \le 236$. Steps 2) through 6) are then performed for each of these subrectangles. For subrectangle 1, the resulting upper bounding MINLP gives a value of 0.148, the same as the lower bounding MILP in this region. Subrectangle 2, on the other hand, results in a lower bound of 0.8138, which is larger than the current upper bound of 0.148. Consequently, this region is fathomed, and convergence is achieved to the global solution, a network flexibility of 0.148.
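The value 0.148 can also be cross-checked by hand, under two assumptions that the text does not state explicitly. The first is that the critical direction is an increase of $F_{H1}$ from its nominal value of 1 kW/K, with $\Delta F_{H1}^+ = 0.8$ kW/K. The second is that the constraints active at the boundary are $f_1$ and $f_4$. Setting $f_4 = 0$ gives $Q_c = 260F_{H1} - 250$. Substituting this into $f_1 = 0$ gives $-130F_{H1}^2 + 360F_{H1} - 240 = 0$, that is, $13F_{H1}^2 - 36F_{H1} + 24 = 0$. The helper below (name hypothetical) solves this quadratic and converts the smaller root into δ.

```python
import math


def flexibility_boundary(nominal=1.0, dev_plus=0.8):
    """Boundary point where f1 and f4 are both active, and the implied delta.

    f4 = 0 gives Qc = 260 F - 250; substituting into f1 = 0 gives
    -130 F^2 + 360 F - 240 = 0, i.e. 13 F^2 - 36 F + 24 = 0.
    Assumes that F_H1 increases from its nominal value (upward deviation).
    """
    a, b, c = 13.0, -36.0, 24.0
    disc = math.sqrt(b * b - 4 * a * c)
    roots = ((-b - disc) / (2 * a), (-b + disc) / (2 * a))
    f = roots[0]                               # first boundary above nominal
    qc = 260.0 * f - 250.0
    f2 = -190.0 * f + qc + 10.0                # the other two constraints
    f3 = -270.0 * f + qc + 250.0               # must hold at the boundary
    return roots, qc, (f2, f3), (f - nominal) / dev_plus


roots, qc, (f2, f3), delta = flexibility_boundary()
print(roots, qc, f2, f3, round(delta, 3))
```

The smaller root, about 1.118, gives δ ≈ 0.148, which matches the global solution reported above. The larger root, about 1.651, marks the lower edge of the second feasible domain in Fig. 2.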


Conclusions 


The incorporation of uncertainty in the design stages 
is recognized to be one of the most important problems 
in the plant design analysis. Having efficient ways to test 
future plant feasibility and furthermore to quantify the 
capability of a plant to accommodate future variations 
of the operability parameters could lead to more efficient, economic and flexible plants. Much of the work that appears in the literature to address the above problems was briefly presented in this paper. A general global optimization framework proposed in [23] was presented in more detail. Finally, an example problem
was included to illustrate the main ideas of this frame- 
work. 
Let us consider a sequential game where the first player
(‘leader’) incorporates into his optimization process the 
optimal reaction vector y of the second player (‘fol- 
lower’) to the leader’s decision vector x. This situation 
is described mathematically by the bilevel program 


$$
\text{BLP}\qquad
\begin{aligned}
\min_{x,y}\ & f(x,y)\\
\text{s.t.}\ & (x,y) \in X,\\
& y \in \arg\min_{y' \in Y(x)} g(x,y'),
\end{aligned}
$$


where it is understood that the leader is requested to 
select a vector x such that the parameterized set Y(x) is 
nonempty. 

This formulation is extremely general in that it sub- 
sumes linear zero-one optimization, quadratic concave 
programming, disjoint bilinear programming, nonlinear 
complementarity, etc. If one denotes by $\psi(x)$ the set of optimal answers to a given leader vector $x$, the above bilevel program can be recast as the 'standard' mathematical program

$$
\begin{aligned}
\min_{x,y}\ & f(x,y)\\
\text{s.t.}\ & (x,y) \in X,\\
& y \in \psi(x).
\end{aligned}
$$


The induced region of a bilevel program is defined as the feasible set of the above program. This set is usually nonconvex and might be disconnected. It is implicit that, whenever $\psi(x)$ is not a singleton, the leader is free to select the element $y \in \psi(x)$ that suits him best. This interpretation is legitimate in the case where side payments are allowed, i.e., the leader can bias the follower's objective in his favor. On the other hand, the behavior of a risk-averse leader, who seeks to minimize over the feasible set $X$ the objective

$$
\max_{y \in \psi(x)} f(x,y),
$$

has been considered in [4].

The algorithmic difficulty of bilevel programming stems mainly from the fact that the set $\psi(x)$ is ill-behaved and usually not available in closed form. To gain some insight into this difficulty, let us consider the 'simple' situation where $f$ and $g$ are affine, the constraint $(x, y) \in X$ is absent, and $Y(x) = \{y : Ax + By \ge b\}$ is a convex polyhedron. It is easy to show that, as in linear programming, bounded and feasible linear bilevel programs admit extremal solutions; hence the linear BLP lies in the class NP of problems polynomially solvable by a nondeterministic algorithm. Unfortunately, as shown in [2] and [3], the linear BLP is also strongly NP-hard. Moreover, its optimal solution(s) need not even be efficient ('Pareto optimal'). This is one of the features that distinguish bilevel programming from bicriterion optimization. Indeed, consider the linear BLP illustrated in Fig. 1, where the arrows denote the players' respective steepest descent directions:


$$
\begin{aligned}
\min_{x,y}\ & \tfrac{1}{2}x + y\\
\text{s.t.}\ & y \in \arg\min_{y'} \{-y' : x + y' \le 1\},\\
& x, y \ge 0.
\end{aligned}
$$


The induced region of this problem reduces to the 
dotted line segment of Fig. 1. Optimizing over this line 
segment yields the solution (x, y) = (1, 0), which is 
strictly dominated by all points inside the triangle T 
with vertices (0, 1/2), (0, 0) and (1, 0). Since the set of 
efficient points is the segment [(0, 0), (0, 1)], the only ra- 
tional point that is also Pareto optimal is (0, 1), which is 
actually the worst possible outcome for the leader. Note 
that, in the case where the functions f, g are affine and 
the sets X, Y(x) are polyhedral, the induced region is in 
general a nonconvex piecewise linear variety that con- 
tains several local minima. 
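The example can be reproduced with exact rational arithmetic. The sketch assumes that the leader's objective is $\tfrac12 x + y$, the reading consistent with the triangle T described above. It also assumes that the follower maximizes $y$ subject to $x + y' \le 1$, $y' \ge 0$. The grid search and helper names are illustrative. The sketch parameterizes the induced region by the follower's rational reaction $y(x) = 1 - x$, then compares the leader's optimum with a point inside T.

```python
from fractions import Fraction as Fr


def leader(x, y):
    return Fr(1, 2) * x + y          # assumed leader objective (1/2) x + y


def follower(x, y):
    return -y                        # follower objective -y (it maximizes y)


def reaction(x):
    """Follower's rational reaction: max y' s.t. x + y' <= 1, y' >= 0."""
    return max(Fr(0), 1 - x)


def leader_optimum(n=100):
    """Minimize the leader objective over the induced region, x in [0, 1]."""
    points = [(Fr(k, n), reaction(Fr(k, n))) for k in range(n + 1)]
    return min(points, key=lambda p: leader(*p))


x_opt, y_opt = leader_optimum()
inside_T = (Fr(1, 4), Fr(1, 8))      # x >= 0, y >= 0, x + 2y < 1
print((x_opt, y_opt), leader(x_opt, y_opt), leader(*inside_T), follower(*inside_T))
```

Under these assumptions, the grid search returns (1, 0). The interior point (1/4, 1/8) of T gives a smaller value to both players' objectives, which is the lack of Pareto optimality discussed in the text.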

Assume now that the following conditions are satisfied:

• $Y(x) = \{y : h_i(x, y) \le 0,\ 1 \le i \le n\}$;

• the functions $g$ and $h_i$ are continuously differentiable and convex;

• the set $Y(x)$ is regular for every $x$, i.e., some constraint qualification holds.

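Then one can substitute its Kuhn-Tucker conditions for the follower's program, yielding the equivalent single-level program

$$
\begin{aligned}
\min_{(x,y) \in X,\ \lambda}\ & f(x,y)\\
\text{s.t.}\ & \nabla_y g(x,y) + \sum_{1 \le i \le n} \lambda_i \nabla_y h_i(x,y) = 0,\\
& \lambda_i h_i(x,y) = 0,\quad 1 \le i \le n,\\
& h_i(x,y) \le 0,\quad 1 \le i \le n,\\
& \lambda_i \ge 0,\quad 1 \le i \le n.
\end{aligned}
$$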


Bilevel Programming, Figure 1 


The complementarity constraints make this single-level 
problem difficult both theoretically (the constraint set is 
almost never regular) and algorithmically. 
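The combinatorial nature of the complementarity constraints is easy to see on a toy instance. The problem below is hypothetical and not taken from the article. The follower solves $\min_y \tfrac12(y - x)^2$ subject to $h(x,y) = -y \le 0$, and the leader minimizes $(x + 1)^2 + (y - 1)^2$ over $x \in [-2, 2]$. The follower's problem is convex and satisfies a constraint qualification, so its KKT conditions are an exact substitute. The condition $\lambda h = 0$ splits them into two branches: the constraint is either inactive with $\lambda = 0$, or active with $y = 0$. The sketch enumerates both branches and minimizes the leader's objective by a grid search.

```python
def kkt_branches(x):
    """Follower KKT points (y, lam) for a given x, one per complementarity branch.

    Hypothetical follower: min_y 0.5 * (y - x)**2  s.t.  h(x, y) = -y <= 0.
    Stationarity: (y - x) - lam = 0;  complementarity: lam * h(x, y) = 0.
    """
    points = []
    if x >= 0.0:    # branch 1: h inactive, lam = 0, so y = x (needs y >= 0)
        points.append((x, 0.0))
    if x <= 0.0:    # branch 2: h active, y = 0, so lam = -x (needs lam >= 0)
        points.append((0.0, -x))
    return points


def leader_objective(x, y):
    return (x + 1.0) ** 2 + (y - 1.0) ** 2    # hypothetical leader objective


def solve_single_level(lo=-2.0, hi=2.0, n=4001):
    """Grid search over x; y and lam come from the enumerated KKT branches."""
    best = None
    for k in range(n):
        x = lo + (hi - lo) * k / (n - 1)
        for y, lam in kkt_branches(x):
            val = leader_objective(x, y)
            if best is None or val < best[0]:
                best = (val, x, y, lam)
    return best


print(solve_single_level())
```

The optimum, $(x, y) = (-1, 0)$ with $\lambda = 1$, lies on the branch where the follower's constraint is active. The other branch only reaches an objective value of 2, at $x = 0$.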

A useful variant of BLP occurs when $\psi(x)$ corresponds to the solution of an equilibrium system parameterized in $x$. If this system is modeled by means of a variational inequality, one obtains a generalized Kuhn-Tucker formulation where the gradient $\nabla_y g(x, y)$ is replaced by a function $F(x, y)$ (see [5]). In both cases, the complementarity constraint can be incorporated in the leader's objective as a penalty term $-M \sum_{1 \le i \le n} \lambda_i h_i(x, y)$, which is nonnegative when $\lambda_i \ge 0$ and $h_i \le 0$. This greatly simplifies the constraint set, which even becomes polyhedral in the linear case. Under suitable assumptions, and for large but finite values of the penalty multiplier $M$, the penalized problem is equivalent to the original bilevel problem.
Bilevel programming (see ▶ Bilevel programming: Introduction, history and overview; ▶ Bilevel programming) is ideally suited to model situations where the


decision maker does not have full control over all deci- 
sion variables. Five such situations are described in this 
article. 


Example 1 The first example involves the improvement of a road network through capacity expansion, traffic signal synchronization, vehicle guidance systems, etc. While management may be assumed to control the design variables, it can only indirectly affect the travel choices of the network users. Let $x$ denote the design vector, $y$ the flow vector, $X$ the set of feasible design variables and $c_i(x, y)$ the travel delay along link $i$. One wishes to minimize over the set $X$ the system travel cost $\sum_i y_i c_i(x, y)$, where the vector $y$ is required
to be an equilibrium traffic assignment corresponding 
to the design vector x. Neglecting the latter equilibrium 
requirement could lead to suboptimal policies. How- 
ever, as shown in [5] for a continuous variant of the 
network design problem, efficient heuristic procedures 
can generate near-optimal solutions at a low computa- 
tional cost. Indeed it is in the interest of both the man- 
agement and the network users to minimize travel de- 
lays, although the former is interested in minimizing to- 
tal travel time, while the users optimize their own travel 
time. 


Example 2 Next consider the maximization of rev- 
enues raised from tolls set on a transportation network. 
If tolls are set too high, traffic on the corresponding 
arcs will drop and revenues will be affected negatively. 
Conversely, low toll values will generate low revenues. 
One could strike the right balance by maximizing to- 
tal revenue, subject to the network users y achieving an 
equilibrium with respect to the toll vector x. In the case 
where the network is uncongested, users are assigned to 
shortest paths linking their respective origin and des- 
tination. This yields the bilevel program with bilinear 
objectives 


$$
\begin{aligned}
\max_{x,y}\ & \sum_{i \in I_1} x_i y_i\\
\text{s.t.}\ & y \in \arg\min_{y' \in Y}\ \sum_{i \in I_1} (c_i + x_i)\, y'_i + \sum_{i \in I_2} c_i\, y'_i,
\end{aligned}
$$

where $I_1$ represents the set of toll arcs, $I_2$ the set of toll-free arcs, and $Y$ the polyhedron of demand-feasible flow vectors. In [4] it has been shown that this problem is reducible to a linear bilevel program with an economic interpretation in terms of 'second-best' choices, and that it can also be reformulated as a zero-one integer program with few binary variables. Special cases are amenable to polynomial algorithms.
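A minimal instance shows why the optimal toll sits at the edge of the users' indifference between paths. The instance is purely illustrative. It has one origin-destination pair with a fixed demand, a toll arc with cost $c_1$ and a toll-free path with cost $c_2 > c_1$, and no congestion. Ties are broken in favor of the toll arc, following the optimistic convention. Users then pay the toll only while $c_1 + x \le c_2$, so revenue grows linearly with $x$ up to $x = c_2 - c_1$ and drops to zero beyond it. All numbers and names below are hypothetical.

```python
def follower_flow(toll, demand, c_toll, c_free):
    """Uncongested users take the cheaper path; ties go to the toll arc."""
    return demand if c_toll + toll <= c_free else 0.0


def revenue(toll, demand=100.0, c_toll=4.0, c_free=10.0):
    """Leader's revenue: toll times the flow on the toll arc."""
    return toll * follower_flow(toll, demand, c_toll, c_free)


def best_toll(max_toll=20.0, resolution=100, **data):
    """Grid search over tolls k / resolution, k = 0, 1, ..."""
    tolls = [k / resolution for k in range(int(max_toll * resolution) + 1)]
    return max(tolls, key=lambda t: revenue(t, **data))


print(best_toll(), revenue(best_toll()), revenue(6.5))
```

With $c_1 = 4$, $c_2 = 10$ and a demand of 100, the grid search returns a toll of 6, the cost difference between the two paths.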


Example 3 The third example is the Stackelberg-Nash-Cournot equilibrium studied in [8], where the leader firm maximizes its revenue $x\,p(x + \sum_{1 \le i \le n} y_i) - c(x)$ ($p$ denotes the inverse demand function and $c$ the leader firm's production cost), subject to the vector $y$ being a Cournot-Nash equilibrium with respect to the shifted inverse demand function $p_x(Q) = p(x + Q)$. This model subsumes the situations of monopoly ($n = 0$) as well as that of Stackelberg equilibrium ($n = 1$). It has been extended in [7] to the case of multiple leaders; that extension, however, no longer fits the framework of bilevel programming.
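For a closed-form check, assume a linear inverse demand $p(Q) = a - bQ$ and a constant unit cost $c_L$ for the leader, so that $c(x) = c_Lx$, plus $n$ identical followers with unit cost $c$. These are textbook assumptions, not the article's. The followers' symmetric Cournot-Nash response to $x$ is then $y_i = (a - c - bx)/(b(n + 1))$. Maximizing the leader's revenue gives $x^* = (a + nc - (n + 1)c_L)/(2b)$. The sketch below compares this formula with a direct numerical maximization. Setting $n = 0$ gives the monopoly output, and $n = 1$ with $c_L = c$ gives the classical Stackelberg leader output. Function names are illustrative.

```python
def follower_output(x, n, a, b, c):
    """Symmetric Cournot-Nash output of each of n followers, given leader output x."""
    return max(0.0, (a - c - b * x) / (b * (n + 1)))


def leader_profit(x, n, a, b, c, c_leader):
    """Revenue x * p(x + sum_i y_i) - c_leader * x with p(Q) = a - b Q."""
    total = x + n * follower_output(x, n, a, b, c)
    price = max(0.0, a - b * total)
    return x * price - c_leader * x


def leader_output_numeric(n, a, b, c, c_leader, iters=200):
    """Ternary search on [0, a/b]; the profit is unimodal for these data."""
    lo, hi = 0.0, a / b
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if leader_profit(m1, n, a, b, c, c_leader) < leader_profit(m2, n, a, b, c, c_leader):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def leader_output_closed_form(n, a, b, c, c_leader):
    return (a + n * c - (n + 1) * c_leader) / (2 * b)


for n, c_leader in ((0, 10.0), (1, 10.0), (2, 20.0)):
    print(n, leader_output_numeric(n, 100.0, 1.0, 10.0, c_leader),
          leader_output_closed_form(n, 100.0, 1.0, 10.0, c_leader))
```

For instance, with $a = 100$, $b = 1$, $c = 10$, $c_L = 20$ and $n = 2$, both routes give $x^* = 30$.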


Example 4 A fourth example is provided by the energy 
sector, which is characterized by an extensive use of 
large-scale techno-economic models describing specific
subsectors or markets: gas and electricity subsectors, in- 
dustrial and residential markets, etc. In this respect, it 
provides a rich source for bilevel models. A bilevel pro- 
gram arises when a utility, in its strategic planning pro- 
cess, takes explicitly into account the rational reaction 
of its competitors or customers to its own investment 
schedule. This approach has been applied to assess the 
impact of new demand management technologies for 
reducing power usage [3]. Another bilevel model arises 
when a utility is legally bound to buy any energy sur- 
plus from ‘qualified small producers’ at marginal cost. 
For example, a study of the impact of cogeneration in 
the pulp and paper industry on the electricity market 
has been conducted in [2]. 


Example 5 Finally we mention that bilevel program- 
ming subsumes the principal/agent paradigm of eco- 
nomics (see [1]), where the principal (leader) subcontracts a job to an agent (follower). The principal rewards the agent according to the quality of the final outcome $\omega$, which may be random, while the agent maximizes its own objective, which is a function of the effort level $y$ and the expected reward $x(\omega(y))$. The lower the effort level $y$, the lower the (expected) quality $\omega(y)$ of the finished job. Assuming that the agent agrees to perform the job only if his utility is larger than some 'reservation level' $g_{\min}$, one derives the bilevel program:

$$
\begin{aligned}
\max_{x(\cdot),\,y}\ & f\big(x(\omega(y)), \omega(y)\big)\\
\text{s.t.}\ & g\big(x(\omega(y)), y\big) \ge g_{\min},\\
& y \in \arg\max_{y' \in Y} g\big(x(\omega(y')), y'\big),
\end{aligned}
$$

where the leader's decision variable $x$ is a function defined over the set $Y$ of possible effort levels. Whenever the set $Y$ is not finite, this yields an infinite-dimensional optimization problem. The situation becomes all the more complex when the output $\omega$ is a random variable of the agent's effort $y$.
Introduction


Bilevel programming problems (BLPP) are encoun- 
tered when one optimization problem is embedded 
within another one as a constraint. BLPPs arise in many 
areas of engineering, where hierarchical decision mod- 
els are often encountered. Almost all areas of engineer- 
ing can provide some examples in which two decision 


models interact and the outcome of one decision influ- 
ences another; applications can be found in areas as di- 
verse as traffic control and reactive distillation. 

The general BLPP formulation is as follows: 


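$$
\begin{aligned}
\min_{x}\ & F(x,y) && \text{(outer optimization problem)}\\
\text{s.t.}\ & G(x,y) \ge 0,\\
& H(x,y) = 0,\\
& \min_{y}\ f(x,y) && \text{(inner optimization problem)}\\
& \quad \text{s.t.}\ g(x,y) \ge 0,\\
& \qquad\ \ \ h(x,y) = 0,\\
& x \in X \subseteq \mathbb{R}^{n_1},\quad y \in Y \subseteq \mathbb{R}^{n_2},
\end{aligned}
$$

where

$$
\begin{aligned}
& f, F : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \to \mathbb{R},\\
& g = [g_1, \dots, g_J] : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \to \mathbb{R}^{J},\\
& G = [G_1, \dots, G_{J'}] : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \to \mathbb{R}^{J'},\\
& h = [h_1, \dots, h_I] : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \to \mathbb{R}^{I},\\
& H = [H_1, \dots, H_{I'}] : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \to \mathbb{R}^{I'}.
\end{aligned}
$$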


The outer optimization problem, which minimizes 
F(x,y), is constrained by inequality constraints G, 
equality constraints H, and the inner optimization 
problem. This inner optimization minimizes its objective function by varying $y$, subject to its own inner constraints $g$ and $h$. The inner variables $y$ may also
appear in the outer constraints and objective function, 
and the inner constraints and objective function may be 
parameterized by x. Novel global optimization strate- 
gies exist to solve the BLPP with twice continuously 
differentiable nonconvex nonlinear [18] and mixed- 
integer nonlinear constraints [19]. 

This article explores a diverse sampling of bilevel 
programming including examples from civil engineer- 
ing traffic management, chemical engineering process 
design and metabolic engineering. 


Bilevel Programming in Traffic Management 


As urban populations increase and cities expand, traffic and its related problems affect the everyday life of all commuters.

Traffic problems follow the hierarchical structure of
BLPPs. Each individual commuter travels upon a net- 
work of roads that is created and organized by a cen- 
tral regulatory agency. This agency plans the layout 
and carrying capacity of highways and streets, chooses 
where to place on-off ramps that connect limited access 
highways to local streets, decides where to install traffic 
lights, and sets their signaling rate. 

As the regulatory agency plans and manages this network of roads, it must accommodate the traffic pattern formed by the individual decisions of the thousands of travelers who use the network each day. During each trip, each traveler takes a path that she believes will minimize her travel time, based on previous experience and ongoing traffic reports. Beckmann et al. [2] have shown that, when this information is perfect and all travelers have access to it, the resulting equilibrium flow pattern, in which no traveler can reduce her travel time by switching paths, solves the following optimization problem. Its objective is the sum over all links $a$ of the integrals of the link travel-time functions $t_a$, which in general is not the same as the total time spent by all drivers in the network:

$$
\min \sum_a \int_0^{v_a} t_a(x)\,dx,
$$

where $v_a$ is the flow on link $a$.
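A two-link example shows the difference between the Beckmann objective and total travel time. The link travel-time functions $t_1(v) = 10 + v$ and $t_2(v) = 15 + 0.5v$ and the demand of 20 are hypothetical. Minimizing the sum of the integrals equalizes the travel times on the two links, which is the user equilibrium. Minimizing the total travel time $\sum_a v_a t_a(v_a)$ instead gives a different split with a smaller total. Function names are illustrative.

```python
def ternary_min(fun, lo, hi, iters=200):
    """Minimize a convex function of one variable on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if fun(m1) < fun(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


DEMAND = 20.0                    # hypothetical demand between the two nodes


def t1(v):
    return 10.0 + v              # hypothetical travel time on link 1


def t2(v):
    return 15.0 + 0.5 * v        # hypothetical travel time on link 2


def beckmann(v1):
    """Sum over links of the integral of t_a from 0 to the link flow."""
    v2 = DEMAND - v1
    return (10.0 * v1 + 0.5 * v1 ** 2) + (15.0 * v2 + 0.25 * v2 ** 2)


def total_time(v1):
    """Total travel time sum_a v_a * t_a(v_a)."""
    v2 = DEMAND - v1
    return v1 * t1(v1) + v2 * t2(v2)


v_ue = ternary_min(beckmann, 0.0, DEMAND)     # user equilibrium
v_so = ternary_min(total_time, 0.0, DEMAND)   # system optimum
print(v_ue, t1(v_ue), t2(DEMAND - v_ue), total_time(v_ue))
print(v_so, total_time(v_so))
```

Here the equilibrium splits the flow 10/10, with equal travel times of 20 and a total of 400. The system optimum sends about 8.33 vehicles on link 1 and lowers the total to about 395.8.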


This behavior by the travelling public creates bilevel 
programming problems in traffic management. When 
a regulatory agency tries to set policies that minimize 
gas consumption, the travel time of all drivers, or some 
other objective, its options are constrained by the re- 
sponse of the travelling public. 

One application of BLPPs in traffic is signal opti- 
mization, where the objective is to minimize travel time 
or gasoline consumption by varying the length of green 
lights and the cycle time of traffic lights [9]: 


$$
\begin{aligned}
\min_{s}\ & \sum_a t_a\, v_a(t, s)\\
\text{s.t.}\ & \min \sum_a \int_0^{v_a} t_a(x)\,dx.
\end{aligned}
$$

In this problem, the outer objective sums the costs over all links as a function of the signaling policies.

A similar bilevel programming problem is used to plan road improvements, where a central planning agency minimizes the cost of construction and similar activities, subject to the inner optimization problem that predicts the behavior of traffic on the road [1,9,21]. It is also
used to optimize the flow of traffic onto limited access 
highways. The outer problem minimizes the total travel 


time of all travelers by optimizing traffic light lengths 
and other controls at the on- and off-ramps of the high- 
way, while the inner optimization problem predicts the 
behavior of traffic on the road [24,25,26]. 


Bilevel Programming in Chemical Process Synthesis 


Chemical process synthesis by optimization techniques 
is a vast area that includes plant design, the synthesis 
of reactor networks, separation systems, heat exchanger 
networks, and utility plants, and the planning of batch 
and multiperiod operations [3,10,11,12]. 


Inner Problems that Minimize the Gibbs Free En- 
ergy Many chemical engineering design problems in- 
volve distillation columns, liquid-liquid extractors and 
decanters, and reactors; modeling these unit opera- 
tions usually requires modeling the chemical equilibria 
and phase equilibria (vapor-liquid equilibrium, liquid- 
liquid equilibrium, and vapor-liquid-liquid equilib- 
rium) occurring within them. When the number of 
phases is known in advance, phase and chemical equi- 
librium can be modeled with a set of algebraic equa- 
tions. When, however, the number of phases is not 
known a priori, these algebraic equations cannot be 
used; in these problems, the number of phases, phase 
equilibrium, and in some problems chemical equilib- 
rium can be predicted by minimizing the Gibbs free en- 
ergy. Maximizing the profit, minimizing the cost, or op- 
timizing some other measure of a chemical process that 
contains a unit operation with an unknown number of 
phases is a bilevel programming problem: 


$$
\begin{aligned}
\max_{x}\ & F(x, n_{ik})\\
\text{s.t.}\ & G(x, n_{ik}) \ge 0,\\
& H(x, n_{ik}) = 0,\\
& \min_{n_{ik}} \sum_i \sum_k n_{ik}\,\mu_{ik}
\end{aligned}
$$

Here, the outer problem maximizes the profit $F$. Design specifications are captured by the inequality constraints $G$, while the equality constraints $H$ are the mass and energy balances. The inner optimization minimizes the Gibbs free energy, equal to the summation of $n_{ik}\mu_{ik}$, the moles of species $i$ in phase $k$ multiplied by the corresponding chemical potential. This inner problem is constrained by mass balances ensuring that the total number of atoms of each element $j$ is constant regardless of the phase or chemical distribution, and that the number of moles of species $i$ in phase $k$ is positive.
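The inner problem can be illustrated on a hypothetical single-phase system, far simpler than those in the article. Take one mole of an ideal mixture in which A isomerizes to B, with $\mu_i = \mu_i^\circ + RT\ln x_i$ (an assumption). Since $n_A + n_B = 1$, the element balance is satisfied by construction. Minimizing $G = n_A\mu_A + n_B\mu_B$ over $n_B$ should then recover the equilibrium condition $x_B/x_A = \exp(-(\mu_B^\circ - \mu_A^\circ)/RT)$. The numbers and helper names below are illustrative.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def gibbs(n_b, mu0_a, mu0_b, temp):
    """G (J) of 1 mol of an ideal A/B mixture containing n_b mol of B.

    With 1 mol in total, the mole fraction of each species equals its mole
    number, so mu_i = mu0_i + R T ln(n_i).
    """
    g = 0.0
    for n, mu0 in ((1.0 - n_b, mu0_a), (n_b, mu0_b)):
        if n > 0.0:
            g += n * (mu0 + R * temp * math.log(n))
    return g


def equilibrium_n_b(mu0_a, mu0_b, temp, iters=200):
    """Minimize G over n_b in (0, 1) by ternary search (G is convex in n_b)."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if gibbs(m1, mu0_a, mu0_b, temp) < gibbs(m2, mu0_a, mu0_b, temp):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Hypothetical data: B is 2 kJ/mol more stable than A at 300 K.
n_b = equilibrium_n_b(0.0, -2000.0, 300.0)
K = math.exp(2000.0 / (R * 300.0))
print(n_b, K / (1.0 + K))
```

The minimizer agrees with $n_B = K/(1 + K)$, which is the equilibrium composition implied by the equilibrium constant $K$.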

Clark and Westerberg [7,8] used this strategy to 
optimize a reactor making aniline from nitrobenzene 
and hydrogen. The reaction also produces water, which 
may form a two-liquid phase mixture with nitroben- 
zene and aniline, depending upon the relative amounts 
of nitrobenzene, aniline, and water. The outer problem 
optimized the reactor temperature and pressure, while 
the inner problem found the simultaneous phase and 
chemical equilibrium by minimizing the Gibbs free energy.

Gümüş and Ciric [17] used bilevel programming to
optimize a reactive distillation column that produces 
aniline from nitrobenzene and water. The outer prob- 
lem minimizes cost by varying the number of trays, re- 
flux and reboil ratios, and feed tray locations. A series 
of inner optimization problems predict the phase and 
chemical equilibrium in the condenser and on each tray 
in the column. 


Bilevel Programming and Simultaneous Design and 
Control Bilevel programming has also been used to 
integrate the design of a chemical process with the syn- 
thesis of its control scheme [4]. The outer optimization 
problem maximizes the annual profit D(z) minus the 
cost of off-spec product formed during process upsets, 
while an inner optimization problem simultaneously 
predicts the amount of off-spec product formed during 
process upsets and finds the settings of a model predic- 
tive controller that minimize this amount. The model 
is: 


$$
\begin{aligned}
\max_{z,p}\ & D(z) - \alpha \sum_{l} CO_l\{z, p, u_l(t), x_l(t), y_l(t), p_c(t)\} - C_u\\
\text{s.t.}\ & f(z,p) = 0,\\
& h(z,p) = 0,\\
& g(z,p) \ge 0,\\
& g\big(z, p, u(t), x_l(t), y_l(t), p_c(t)\big) \ge 0,\\
& \min_{u_l(t)}\ CO\{z, p, u_l(t), x_l(t), y_l(t), p_c(t)\}\\
& \quad \text{s.t.}\ \dot{x} = f\big(z, p, u(t), x_l(t), y_l(t), p_c(t)\big),\\
& \qquad\ \ x(t = 0) = x_d,\quad y(t = 0) = y_d,\quad u(t = 0) = u_d,\\
& \qquad\ \ g_l\big(z, p, u_l(t), x_l(t), y_l(t), p_c(t)\big) \ge 0,\\
& \qquad\ \ h\big(z, p, u_l(t), x_l(t), y_l(t), p_c(t)\big) = 0,\\
& \qquad\ \ u^L \le u(t) \le u^U.
\end{aligned}
$$

In this formulation, the cost of the fluctuations around the steady state, denoted by subscript $d$, increases the cost of off-spec production, $CO_l$. In the inner optimization, the actions $u$ of a model predictive controller are based on the disturbance $l$.


Bilevel Programming and Design Under Uncertainty 
In the planning stage of a design, the range of uncertain parameters that the design can tolerate for feasible operation should be determined. The design under parametric uncertainty problem can be described by a set of equality constraints, indexed by $i \in I$, and inequality constraints, indexed by $j \in J$, representing plant operation and design specifications:

$$
\begin{aligned}
h_i(d, z, x, \theta) &= 0,\quad \forall i \in I,\\
g_j(d, z, x, \theta) &\le 0,\quad \forall j \in J,
\end{aligned}
$$


where $z$ is the vector of control variables, $x$ is the vector of state variables and $\theta$ is the vector of uncertain parameters. Feasibility concerns are incorporated into the design step by quantifying design feasibility and flexibility with the feasibility test and flexibility index measures. These measures are characterized by max-min-max formulations [20] that are further reformulated in the BLPP form [13,16,23]. For a specific design $d$, the BLPP feasibility test problem is of the form:


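$$
\begin{aligned}
& \psi(d, \theta) \le 0 \quad \forall\, \theta \in T,\\
& \psi(d, \theta) = \min_{z,u}\ u\\
& \qquad \text{s.t.}\ h_i(d, z, x, \theta) = 0,\quad \forall i \in I,\\
& \qquad\phantom{\text{s.t.}\ } u - g_j(d, z, x, \theta) \ge 0,\quad \forall j \in J,\\
& T = \{\theta \mid \theta^L \le \theta \le \theta^U\},
\end{aligned}
$$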


where the function $\psi(d, \theta)$ represents a feasibility measure for design $d$. The boundary of the feasible region in the space of the uncertain parameters is at $\psi(d, \theta) = 0$. If $\psi(d, \theta) > 0$, the design cannot operate, at least for some values of $\theta$ in $T$, and the BLPP is infeasible.

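For a specific design $d$, the flexibility index problem is also formulated as a BLPP:

$$
\begin{aligned}
\min\ & \delta\\
\text{s.t.}\ & \psi(d, \theta) = 0,\\
& \psi(d, \theta) = \min_{z,u}\ u\\
& \qquad \text{s.t.}\ h_i(d, z, x, \theta) = 0,\quad \forall i \in I,\\
& \qquad\phantom{\text{s.t.}\ } g_j(d, z, x, \theta) \le u,\quad \forall j \in J,\\
& \theta \in T(\delta) = \{\theta \mid \theta^N - \delta\Delta\theta^- \le \theta \le \theta^N + \delta\Delta\theta^+\},\\
& \delta \ge 0,
\end{aligned}
$$

where $\delta$ scales the expected deviations $\Delta\theta^-$ and $\Delta\theta^+$ from the nominal point $\theta^N$, and the optimal $\delta$ is the largest scaled deviation that the design can handle [13,16,23]. A higher $\delta$ signifies a design that is more flexible towards parametric variations.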


Bilevel Programming in Metabolic Engineering 


Metabolic engineering involves optimization of genetic 
and regulatory processes within cells to increase over- 
production of desired metabolites or proteins. These 
changes can have major effects on cell growth if the de- 
sired overproduction competes with growth resources, 
so the cell will redistribute the metabolic fluxes to max- 
imize its growth rate. Metabolic flux distributions can 
be optimized utilizing in-silico genome scale metabolic 
network maps to develop overproduction strategies. 
Several different problems in this research area have re- 
cently been formulated as BLPPs. These involve the (i) 
determination of optimal gene knockouts, (ii) identifi- 
cation of stable steady state solutions and (iii) dynamic 
gene expression control strategies, all to achieve maxi- 
mum product yield. 


Gene Knockout Strategies Gene deletion strategies 
to increase the overproduction of a desired product 


can be straightforward and involve competing reaction 
pathways; however, many others can be complex and 
non-intuitive. Burgard et al. [5] introduced a BLPP formulation
to address the optimal manipulation of gene knockout
strategies to maximize overproduction, subject to
maximization of the cell's growth objective at the inner
level. The inner problem is parameterized with gene 
knockout strategies that are chosen by the outer prob- 
lem and constrained by metabolic flux balances and 
fixed substrate. This BLPP model for a steady-state 
metabolic network of N metabolites and M metabolic 
reactions fueled by a glucose substrate is formulated as: 


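$$
\begin{aligned}
\max_{y_j}\ & v_{\text{chemical}}\\
\text{s.t. }\ & \max_{v_j}\ v_{\text{biomass}}\\
& \qquad \text{s.t. } \sum_{j \in M} S_{ij} v_j = 0, \quad \forall i \in N,\\
& \qquad\phantom{\text{s.t. }} v_{\text{pts}} + v_{\text{glk}} - v_{\text{glc\_uptake}} = 0,\\
& \qquad\phantom{\text{s.t. }} v_{\text{atp}} - v_{\text{atp\_main}} \ge 0,\\
& \qquad\phantom{\text{s.t. }} v_{\text{biomass}} - v_{\text{biomass}}^{\text{target}} \ge 0,\\
& \qquad\phantom{\text{s.t. }} v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j, \quad \forall j \in M,\\
& y_j \in \{0,1\}, \quad \forall j \in M,\\
& \sum_{j \in M} (1 - y_j) \le K,
\end{aligned}
$$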


where $v_{\text{chemical}}$ is the flux of the desired product, $v_{\text{biomass}}$
is the biomass formation flux, $S_{ij}$ is the stoichiometric coefficient
of metabolite $i$ in reaction $j$, $v_j$ is the flux of reaction $j$ with
bounds $v_j^{\min}$ and $v_j^{\max}$, $y_j$ is a binary variable that
equals 0 when reaction $j$ is knocked out (forcing $v_j = 0$),
$v_{\text{pts}}$ and $v_{\text{glk}}$ respectively represent the uptake of
glucose through the phosphotransferase system and glucokinase,
$v_{\text{glc\_uptake}}$ is the basis glucose uptake scenario,
$v_{\text{atp\_main}}$ is the non-growth-associated ATP maintenance
requirement, $K$ is the number of allowable knockouts, and
$v_{\text{biomass}}^{\text{target}}$ is a minimum level of biomass
production. The BLPP can be modified further to include additional
bounds on O2, CO2 and NH3 transport rates and secretion pathways
for key metabolites in the inner problem [22].
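The bilevel structure of this formulation can be illustrated by brute force on a tiny hypothetical network: the outer level enumerates knockout sets with at most $K$ deletions, and the inner level is the growth-maximizing flux-balance LP, solved here with SciPy's `linprog`. Enumeration replaces the single-level reformulation one would need for genome-scale models and is only viable for toy sizes; the network, bounds and function names are invented for illustration.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

# Hypothetical network. Rows: metabolites A, B, P. Columns: reactions
#   R0: -> A (substrate uptake),  R1: A -> 2 B,  R2: A -> B + P,
#   R3: B -> (biomass drain),     R4: P -> (product export, v_chemical).
S = np.array([
    [1.0, -1.0, -1.0, 0.0, 0.0],   # A
    [0.0, 2.0, 1.0, -1.0, 0.0],    # B
    [0.0, 0.0, 1.0, 0.0, -1.0],    # P
])
V_MAX = [10.0, 100.0, 100.0, 100.0, 100.0]  # the R0 bound fixes the uptake scenario
BIOMASS, CHEMICAL = 3, 4

def inner_max_growth(knockouts):
    """Inner problem: max v_biomass s.t. S v = 0, bounds, knocked-out fluxes = 0."""
    bounds = [(0.0, 0.0) if j in knockouts else (0.0, V_MAX[j])
              for j in range(S.shape[1])]
    c = np.zeros(S.shape[1])
    c[BIOMASS] = -1.0                      # linprog minimizes, so negate
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds,
                  method="highs")
    return res.x

def outer_best_knockouts(K, biomass_target=1.0):
    """Outer problem: enumerate knockout sets of size <= K, maximize v_chemical."""
    best_flux, best_set = -np.inf, None
    for size in range(K + 1):
        for ko in combinations(range(S.shape[1]), size):
            v = inner_max_growth(set(ko))
            if v[BIOMASS] < biomass_target:   # minimum growth requirement
                continue
            if v[CHEMICAL] > best_flux:
                best_flux, best_set = v[CHEMICAL], ko
    return float(best_flux), best_set

print(outer_best_knockouts(K=1))  # about 10.0 with knockout set (1,)
```

In the wild type the cell routes all substrate through R1 (two units of B per A) and secretes no product; deleting R1 forces growth through R2, which couples product formation to biomass. The toy is chosen so that every inner LP has a unique optimum; with alternative optima the product flux read from the LP solution would depend on the solver.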


Stable Metabolic Networks Stability considerations 
of a redesigned metabolic network can be addressed 
within a BLPP framework, such that the new system is 
stable around a neighborhood of the new steady state. 
Here, the outer problem maximizes the product flux subject
to flux balances and an inner stability objective [6].


Temporal Flux Control Gene expression can be con- 
trolled dynamically using the BLPP structure to opti- 
mize the temporal flux profile of a key reaction, such 
that at the end of a batch, the total product formation is 
maximized [14]. In the outer problem, a flux in a spe- 
cific reaction known to have an impact on the prod- 
uct formation is varied with time to maximize the to- 
tal product formation at the end of a batch. The inner 
problem maximizes cellular growth at each sampling 
time over the batch period by optimizing the remain- 
ing fluxes. The BLPP can be modified to determine the 
optimal regulation time of the specific flux from an
initial to a final value. Glycerol and ethanol production
in E. coli have been studied using this BLPP
formulation [14].

Gadkar et al. [15] coupled this BLPP model with 
control algorithms to determine genetic manipulation 
strategies in bioprocess applications. They introduced 
three alternative BLPP models to maximize ethanol 
production in anaerobic batch fermentation of E. coli, 
optimizing ethanol production, batch time and multi-batch
scheduling in the presence of parametric uncertainty and
measurement noise. These include (i) optimizing growth
regulation time and batch duration time by penalizing
longer batch times in the outer objective; (ii) scheduling
multiple batch runs to address inhibition due to product
accumulation in the reactor, optimizing the number of
batch runs, batch duration times, glucose allocation per
run and the manipulated flux regulation time; and
(iii) optimizing genetic alterations in the presence of
growth inhibition and parametric uncertainty in the
inhibition constant.


Conclusions 


The hierarchical structure of many engineering problems
lends itself to bilevel programming formulations, where
an inner optimization problem constrains a larger, 'outer'
optimization problem. Applications in civil engineering
design include traffic control, where an inner optimization
problem predicting drivers' behavior constrains an outer
optimization problem that identifies the optimal control
strategies. In chemical engineering, BLPPs are used to
identify processes that are economically optimal
(maximizing revenue or minimizing cost) while ensuring
that multiphase equilibrium is satisfied, by determining
the global minimum of the Gibbs free energy. Other
applications include the combined optimization of a
chemical process and its controllers, and chemical process
design under parametric uncertainty to ensure operational
feasibility and flexibility. Alternative BLPP formulations
have been introduced in modeling metabolic engineering
systems. These address the maximization of product yield
by determining optimal gene knockouts, identifying stable
steady-state solutions and designing dynamic gene
expression control strategies. Metabolic engineering is
a recent and growing application field for BLPP.
Introduction


Optimisation of enterprise-wide process networks has
attracted considerable attention in recent years: since it
can yield substantial economic savings, there is growing
interest in planning operations efficiently across complex
decision networks. In such networks, a hierarchy of
decisions often has to be followed, and compromises must
be made between entities with equivalent authority. For
instance, numerous studies have addressed the optimisation
of supply chains, Fig. 1, and the plant selection problem,
Fig. 2. A detailed study of hierarchical decisions can be
found in [13,14,26].


Formulation 
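The general multilevel decentralised optimisation
problem can be described as follows:

$$
\begin{aligned}
\min_{x,\, y^2_i,\, y^3_k,\, \dots,\, y^m_l}\ & f^1(x, y^2_i, y^3_k, \dots, y^m_l) && \text{(1st level)}\\
\text{s.t. }\ & g^1(x, y^2_i, y^3_k, \dots, y^m_l) \le 0,\\
& \text{where } y^2_i, y^3_k, \dots, y^m_l \text{ solve}\\
& \quad \min_{y^2_i}\ f^2_i(x, y^2_i, y^3_k, \dots, y^m_l) && \text{(2nd level)}\\
& \quad \text{s.t. } g^2_i(x, y^2_i, y^3_k, \dots, y^m_l) \le 0,\\
& \quad \text{where } y^3_k, \dots, y^m_l \text{ solve}\\
& \qquad \ddots\\
& \qquad\quad \min_{y^m_l}\ f^m_l(x, y^2_i, y^3_k, \dots, y^m_l) && \text{($m$th level)}\\
& \qquad\quad \text{s.t. } g^m_l(x, y^2_i, y^3_k, \dots, y^m_l) \le 0,
\end{aligned} \tag{1}
$$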
Bilevel Programming Framework for Enterprise-Wide Process Networks Under Uncertainty, Figure 1
Supply chain planning example

[Diagram: a head company above manufacturing plants 1, 2, ..., k, ..., n]

Bilevel Programming Framework for Enterprise-Wide Process Networks Under Uncertainty, Figure 2
Hierarchical decision planning example


where $f$ are real convex functions, $g$ are vector-valued real
functions defining convex sets, and $x$, $y$ are vectors of real
variables; $i \in \{1,2,\dots,I\}$, $k \in \{1,2,\dots,K\}$,
$l \in \{1,2,\dots,L\}$, so that the (2nd level) has $I$ optimisation
subproblems, the (3rd level) $K$ optimisation subproblems and the
($m$th level) $L$ optimisation subproblems. For the sake of
simplicity and without loss of generality, we analyse the relations
in Problem (1) using two particular classes of multilevel
programming problems: the bilevel programming problem, which is
organised vertically in two levels, and the bilevel programming
problem with multi-followers, which is similar to bilevel
programming but with several subproblems at the second level.


Bilevel Programming 


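Bilevel programming problems (BLPP) involve a hierarchy
of two optimisation problems, of the following
form [6,17,20,25,32]:

$$
\begin{aligned}
\min_{x,y}\ & F(x,y)\\
\text{s.t. }\ & G(x,y) \le 0,\\
& x \in X,\\
& y \in \arg\min_{y}\ \{ f(x,y) : g(x,y) \le 0,\ y \in Y \},
\end{aligned} \tag{2}
$$

where $X \subseteq \mathbb{R}^{n_x}$ and $Y \subseteq \mathbb{R}^{n_y}$ are both compact
convex sets; $F$ and $f$ are real functions $\mathbb{R}^{n_x+n_y} \to \mathbb{R}$;
$G$ and $g$ are vector-valued real functions, $G: \mathbb{R}^{n_x+n_y} \to \mathbb{R}^{n_u}$
and $g: \mathbb{R}^{n_x+n_y} \to \mathbb{R}^{n_l}$; $n_x, n_y \in \mathbb{N}$ and
$n_u, n_l \in \mathbb{N} \cup \{0\}$.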
The following definitions are associated with Problem (2):

• Relaxed feasible set (or constrained region),
$$ \Omega = \{x \in X,\ y \in Y : G(x,y) \le 0,\ g(x,y) \le 0\}; \tag{3} $$

• Lower level feasible set,
$$ C(x) = \{y \in Y : g(x,y) \le 0\}; \tag{4} $$

• Follower's rational reaction set,
$$ M(x) = \{y \in Y : y \in \arg\min\{f(x,y) : y \in C(x)\}\}; \tag{5} $$

• Inducible region,
$$ IR = \{x \in X,\ y \in Y : (x,y) \in \Omega,\ y \in M(x)\}. \tag{6} $$


Note the parametric nature of the rational reaction 
set, (5), which reflects the dependence of the decisions 
taken at the upper levels on the decisions taken at the 
lower levels. This, in fact, is evidence that in bilevel 
programming problems the relations between the lev- 
els differ from the well-known Stackelberg game, where 
the decisions made by the followers do not affect the
decision already taken by the leader [32].


Bilevel Programming with Multi-Followers 


Bilevel programming problems with multi-followers
involve two optimisation levels with several optimisation
subproblems at the lower (2nd) level:

$$
\begin{aligned}
\min_{x, y_1, \dots, y_m}\ & F(x, y_1, y_2, \dots, y_m) && \text{(1st level)}\\
\text{s.t. }\ & G(x, y_1, y_2, \dots, y_m) \le 0,\\
& x \in X,\\
& y_i \in \arg\min_{y_i} \{ f_i(x, y_1, y_2, \dots, y_m) : g_i(x, y_1, y_2, \dots, y_m) \le 0,\ y_i \in Y_i \}, && \text{(2nd level)}\\
& i \in \{1, 2, \dots, m\},
\end{aligned} \tag{7}
$$

with the following definitions:

• Feasible set for the $i$th follower,
$$ \Omega_i(x, y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_m) = \{ y_i \in Y_i : g_i(x, y_1, y_2, \dots, y_m) \le 0 \}; \tag{8} $$

• Rational reaction set for the $i$th follower,
$$ \Phi_i(x, y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_m) = \{ y_i \in Y_i : y_i \in \arg\min\{ f_i(x, y_1, y_2, \dots, y_m) : y_i \in \Omega_i(x, y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_m) \} \}. \tag{9} $$


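Since one assumption is that followers may exchange
information, conflicts naturally occur. The Nash equilibrium
is often a preferred strategy to coordinate such decentralised
systems [24]. Consequently, the optimisation subproblems
positioned in the lower level reach a Nash equilibrium point,
$(x, y_1^*, y_2^*, \dots, y_m^*)$ [2]:

$$
\begin{aligned}
f_1(x, y_1^*, y_2^*, \dots, y_m^*) &\le f_1(x, y_1, y_2^*, \dots, y_m^*), && \forall y_1 \in Y_1,\\
f_2(x, y_1^*, y_2^*, \dots, y_m^*) &\le f_2(x, y_1^*, y_2, \dots, y_m^*), && \forall y_2 \in Y_2,\\
&\ \ \vdots\\
f_m(x, y_1^*, y_2^*, \dots, y_m^*) &\le f_m(x, y_1^*, y_2^*, \dots, y_m), && \forall y_m \in Y_m.
\end{aligned} \tag{10}
$$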


Once more, observe the parametric nature of the followers'
rational reaction sets (9). In this case, however, each rational
reaction set is a function of both the upper-level decision
variables and the decision variables of the other subproblems
located at the same hierarchical level. Additionally, the priority
remains to solve the leader's problem to global optimality. Thus,
we aim to compute the global optimum for the leader and the best
possible equilibrium solution for the followers.


Applications 


Applications of bilevel and multilevel programming include:

1. design optimisation problems in process systems engineering [4,5];
2. design of transportation networks [23];
3. agricultural planning [19];
4. management of multi-divisional firms [27]; and
5. hierarchical decision-making structures [19].


Cases 


Theoretical developments. Recently, Pistikopoulos 
and co-workers [9,10] have proposed novel solution 
algorithms which open the possibility of using a gen- 
eral framework to address general classes of bilevel and 
multilevel programming problems. These algorithms 
are based on parametric programming theory [1,11] 
and use of the Basic Sensitivity Theorem [15,16]. This 
approach can be classified as a Reformulation Tech- 
nique [33] since the bilevel problem is transformed into 
a number of quadratic or linear problems. The main 
idea is to divide the follower’s feasible area into differ- 
ent rational reaction sets, and search for the global op- 
timum of a simple quadratic (or linear) programming 
problem in each area. 


Global Optimum of a Bilevel Programming Problem 


While for an optimal control problem (one-player 
problem) there is a well-defined concept for optimality, 
the same is not always true for multi-person games [2]. 
In the case of bilevel programming, [7,17,18,29,32,33] 
interpret the optimisation problem as a leader’s prob- 
lem, F, and search for the global minimum of F. The 
solution point obtained for the follower’s problem, f, 
will respect the stationary (KKT) conditions and hence 
it can be any stationary point. Obviously, this solu- 
tion strategy is acceptable when the player in the upper 
level of the hierarchy is in the most “powerful” posi- 
tion, and the other levels just react to the decision of 
their leader. Such an approach is sensible in many
engineering applications of bilevel programming (for
instance, see [4,5]). It is also a valid strategy for the cases
of decentralised manufacturing and financial structures 
when the leader has a full insight and control of the 
overall objectives and strategy of the corporation, while 
the follower does not. 

However, this is not always the case. For example,
using the feedback Stackelberg solution, where a Stackelberg
equilibrium point is sought at every level of play, the
commitment of the leader to his/her decision increases with
the number of players involved. Reference [3] presents an
example where sacrificing part of the leader's objective on
behalf of the followers results in a better solution for both
levels. Similar solution strategies have also been
studied [3,22,28,30].


Theorem 1 [32] If, for each $x \in X$, $f$ and $g$ are twice
continuously differentiable functions for every $y \in C(x)$,
$f$ is strictly convex for every $y \in C(x)$, and $C(x)$ is a
convex and compact set, then $M(\cdot)$ is a real-valued function,
continuous and closed. □


If Theorem 1 applies and assuming that $M(x)$ is nonempty,
then $M(x)$ has only one element, which is $y(x)$. Thus, (2)
can be reformulated as:

$$
\begin{aligned}
\min_{x}\ & F(x, y(x))\\
\text{s.t. }\ & G(x, y(x)) \le 0,\\
& x \in C_F,
\end{aligned} \tag{11}
$$

$$ C_F = \{ x \in X : \exists\, y \in Y,\ g(x,y) \le 0 \}. $$


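When $f$ is a convex quadratic function and the lower-level
constraints are linear, the function $y(x)$ can be computed as
a piecewise affine (conditional linear) function based on
parametric programming theory, as follows [9]:

$$
y(x) = \begin{cases}
m^1 + n^1 x, & \text{if } H^1 x \le h^1,\\
m^2 + n^2 x, & \text{if } H^2 x \le h^2,\\
\quad \vdots\\
m^k + n^k x, & \text{if } H^k x \le h^k,\\
\quad \vdots\\
m^K + n^K x, & \text{if } H^K x \le h^K,
\end{cases} \tag{12}
$$

where $m^k$ and $h^k$ are real vectors and $n^k$ and $H^k$ are real
matrices.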
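To make (12) concrete, the sketch below hard-codes the critical regions of a hypothetical scalar follower, $\min_y \tfrac{1}{2}(y - x)^2$ s.t. $0 \le y \le 1$, whose solution is $y(x) = 0$ for $x \le 0$, $y(x) = x$ for $0 \le x \le 1$ and $y(x) = 1$ for $x \ge 1$, and evaluates $y(x)$ by locating a region whose inequalities $H^k x \le h^k$ hold. In a real application the tuples $(m^k, n^k, H^k, h^k)$ would be produced by an mp-QP algorithm; here they are written by hand, and the names are illustrative.

```python
import numpy as np

# Critical regions (m^k, n^k, H^k, h^k) for the hypothetical follower
#   min_y 0.5 * (y - x)^2   s.t.   0 <= y <= 1     (scalar x and y).
REGIONS = [
    (np.array([0.0]), np.array([[0.0]]),
     np.array([[1.0]]), np.array([0.0])),              # x <= 0      -> y = 0
    (np.array([0.0]), np.array([[1.0]]),
     np.array([[-1.0], [1.0]]), np.array([0.0, 1.0])), # 0 <= x <= 1 -> y = x
    (np.array([1.0]), np.array([[0.0]]),
     np.array([[-1.0]]), np.array([-1.0])),            # x >= 1      -> y = 1
]

def reaction(x, regions=REGIONS, tol=1e-12):
    """Evaluate y(x) = m^k + n^k x on the first region k with H^k x <= h^k."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    for m, n, H, h in regions:
        if np.all(H @ x <= h + tol):
            return m + n @ x
    raise ValueError("x lies outside every critical region")
```

For example, `reaction(0.25)` returns `array([0.25])`. On a shared boundary such as $x = 0$ the first matching region is used, which is harmless because $y(x)$ is continuous there.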


Theorem 2 [32] If the assumptions of Theorem 1 hold,
$F$ is a real continuous function, $X$ and the set defined by
$G(x,y)$ are compact, and there exists $x \in X$ such that
$G(x, y(x)) \le 0$, then there is a global solution for Problem (2). □


Since an explicit expression for y can be computed, if 
the assumptions of Theorem 2 hold, and the two play- 
ers have convex functions to optimise, then the global 
optimum for Problem (2) can be obtained via the para- 
metric programming approach. The advantage of using 
this approach is that the final solution will consider the 
possibility of existence of other global minima, which 
could correspond to better solutions for the follower. 
Moreover, the parametric nature of the leader’s prob- 
lem is preserved. 

Regarding computational complexity, a number of
authors have shown that bilevel programming problems
are NP-hard [8,21]. Furthermore, [31] proved that even
checking for a local optimum is an NP-hard problem.

The objective of this section is to describe a para- 
metric programming framework which can solve dif- 
ferent classes of multilevel programming problems to 
global optimality. We describe the fundamental devel- 
opments for the quadratic bilevel programming case, 
and how the theory unfolds to address the existence of 
RHS uncertainty. 


Bilevel Programming Problem 


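Consider the following general quadratic BLPP:

$$
\begin{aligned}
\min_{x,y}\ & F(x,y) = L_1 + L_2 x + L_3 y + \tfrac{1}{2} x^T L_4 x + y^T L_5 x + \tfrac{1}{2} y^T L_6 y\\
\text{s.t. }\ & G_1 x + G_2 y + G_3 \le 0,\\
& \min_{y}\ f(x,y) = l_1 + l_2 x + l_3 y + \tfrac{1}{2} x^T l_4 x + y^T l_5 x + \tfrac{1}{2} y^T l_6 y\\
& \ \text{s.t. } g_1 x + g_2 y + g_3 \le 0,
\end{aligned} \tag{13}
$$

where $x$ and $y$ are the optimisation variables, $x \in X \subseteq \mathbb{R}^{n_x}$
and $y \in Y \subseteq \mathbb{R}^{n_y}$. $[L_2]_{1\times n_x}$, $[L_3]_{1\times n_y}$,
$[L_4]_{n_x\times n_x}$, $[L_5]_{n_y\times n_x}$, $[L_6]_{n_y\times n_y}$,
$[l_2]_{1\times n_x}$, $[l_3]_{1\times n_y}$, $[l_4]_{n_x\times n_x}$,
$[l_5]_{n_y\times n_x}$ and $[l_6]_{n_y\times n_y}$ are matrices defined in the
real space. The matrices $[G_1]_{n_u\times n_x}$, $[G_2]_{n_u\times n_y}$,
$[G_3]_{n_u\times 1}$, $[g_1]_{n_l\times n_x}$, $[g_2]_{n_l\times n_y}$ and
$[g_3]_{n_l\times 1}$ correspond to the constraints, also defined in
the real space.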

By focusing attention on the follower's optimisation
problem, considering $x$ as a parameter vector and performing
the variable change $z = y + l_6^{-1} l_5 x$, it can be rewritten as
the following mp-QP problem:

$$
\begin{aligned}
\min_{z}\ & f'(x,z) = l_1' + l_2' x + \tfrac{1}{2} x^T l_4' x + l_3' z + \tfrac{1}{2} z^T l_6' z\\
\text{s.t. }\ & g_2' z \le g_3' + g_1' x,
\end{aligned} \tag{14}
$$

where $l_1' = l_1$; $l_2' = l_2 - l_3 l_6^{-1} l_5$; $l_3' = l_3$;
$l_4' = l_4 - l_5^T l_6^{-1} l_5$; $l_5' = 0$; $l_6' = l_6$;
$g_1' = -(g_1 - g_2 l_6^{-1} l_5)$; $g_2' = g_2$; $g_3' = -g_3$. The mp-QP
problem can be solved by applying the algorithm of [9]. As a result,
a set of rational reaction sets (5) is obtained for different
regions of $x$:

$$ z^k = m^k + n^k x, \quad H^k x \le h^k, \quad k = 1, 2, \dots, K. \tag{15} $$
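The change of variable $z = y + l_6^{-1} l_5 x$ can be checked numerically. The sketch below draws random follower data (sizes and seed are arbitrary), with $l_4$ symmetric and $l_6$ symmetric positive definite so that $l_6^{-1}$ exists, builds the primed coefficients of (14), and confirms at a random point that the objective value and every constraint row are unchanged while the bilinear term $y^T l_5 x$ has disappeared.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny, nl = 3, 2, 4                       # arbitrary problem sizes

# Random follower data for (13); l4 symmetric, l6 symmetric positive definite.
l1 = rng.normal()
l2, l3 = rng.normal(size=nx), rng.normal(size=ny)
A = rng.normal(size=(nx, nx)); l4 = A @ A.T
B = rng.normal(size=(ny, ny)); l6 = B @ B.T + ny * np.eye(ny)
l5 = rng.normal(size=(ny, nx))
g1, g2, g3 = rng.normal(size=(nl, nx)), rng.normal(size=(nl, ny)), rng.normal(size=nl)

def f(x, y):
    """Original follower objective of (13), including the bilinear term y^T l5 x."""
    return l1 + l2 @ x + l3 @ y + 0.5 * x @ l4 @ x + y @ l5 @ x + 0.5 * y @ l6 @ y

# Coefficients of (14) after z = y + l6^{-1} l5 x.
W = np.linalg.solve(l6, l5)                # l6^{-1} l5, shape (ny, nx)
l2p, l3p = l2 - l3 @ W, l3
l4p, l6p = l4 - l5.T @ W, l6
g1p, g2p, g3p = -(g1 - g2 @ W), g2, -g3

def f_prime(x, z):
    """Transformed objective of (14): no z-x bilinear term remains."""
    return l1 + l2p @ x + 0.5 * x @ l4p @ x + l3p @ z + 0.5 * z @ l6p @ z

x, y = rng.normal(size=nx), rng.normal(size=ny)
z = y + W @ x
assert np.isclose(f(x, y), f_prime(x, z))                           # same objective
assert np.allclose(g1 @ x + g2 @ y + g3, g2p @ z - (g3p + g1p @ x))  # same constraints
print("change of variable verified")
```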


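Incorporating the expressions (15) into Problem (13) results
in the following $K$ quadratic problems:

$$
\begin{aligned}
\min_{x}\ & F^k(x) = L_1^k + L_2^k x + \tfrac{1}{2} x^T L_4^k x\\
\text{s.t. }\ & \bar G_1^k x \le \bar G_2^k,
\end{aligned} \tag{16}
$$

with $\tilde n^k = n^k - l_6^{-1} l_5$, so that $y^k = m^k + \tilde n^k x$, and

$$
\begin{aligned}
L_1^k &= L_1 + L_3 m^k + \tfrac{1}{2} (m^k)^T L_6 m^k,\\
L_2^k &= L_2 + L_3 \tilde n^k + (m^k)^T L_5 + (m^k)^T L_6 \tilde n^k,\\
L_4^k &= L_4 + (\tilde n^k)^T L_5 + L_5^T \tilde n^k + (\tilde n^k)^T L_6 \tilde n^k,\\
G_1^k &= G_1 + G_2 n^k - G_2 l_6^{-1} l_5, \qquad G_2^k = -(G_3 + G_2 m^k),\\
\bar G_1^k &= \begin{bmatrix} G_1^k \\ H^k \end{bmatrix}, \qquad \bar G_2^k = \begin{bmatrix} G_2^k \\ h^k \end{bmatrix}.
\end{aligned}
$$

Clearly, the solution of the BLPP Problem (13) is the best among
the $K$ solutions of Problem (16).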


Remark 1 The artificial variable, $z$, introduced in Problem (14)
is only necessary if $l_5 \ne 0$. In all other cases the
multi-parametric problem can be easily formulated through
algebraic manipulations.


Remark 2 When one of the matrices $l_6'$, $L_4^k$ is null, the
optimisation problem in which it is involved becomes linear. In
particular, if $l_6' = 0$, Problem (14) is transformed into an
mp-LP; on the other hand, if $L_4^k = 0$, Problem (16) becomes an
LP problem. In both cases, the solution procedure is not affected,
due to the fact that the Basic Sensitivity Theorem [15,16] also
applies to the mp-LP problem.


Remark 3 The expression introduced for the artificial variable $z$
is only valid when $l_6$ is symmetric. If it is not, $l_6$ can be
replaced by its symmetric part,

$$ \tilde l_6 = \frac{l_6 + l_6^T}{2}, $$

which leaves $y^T l_6 y$ unchanged, and $z = y + \tilde l_6^{-1} l_5 x$ can be
used if the resulting matrix is non-singular. If the resulting matrix
is singular, the expression for the artificial variable should be
given by

$$ z = y + \Lambda x, $$

where $\Lambda$ should satisfy

$$ \Lambda \in \mathbb{R}^{n_y \times n_x}: \quad l_5 - \left(\tfrac{1}{2} l_6 + \tfrac{1}{2} l_6^T\right) \Lambda = 0. $$

In this case, several solutions of the system above can exist.
However, as long as the bilinear terms are eliminated in
Problem (14), any solution can be selected.


Remark 4 This technique is not valid when at the same 
time: 

1. f is a pure quadratic cost function, 

2. f involves bilinear terms and 

3. matrix $l_6$ is singular.


Observing Formulation (16), we can conclude that the
parametric programming approach, Alg. 1, transforms the
original quadratic bilevel programming problem into simple
single-level quadratic problems, for which a global optimum
can be reached.
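Algorithm 1 can be traced on a hypothetical scalar example. The follower is $\min_y \tfrac{1}{2}(y - x)^2$ s.t. $0 \le y \le 1$, whose critical regions give $y = 0$ ($x \le 0$), $y = x$ ($0 \le x \le 1$) and $y = 1$ ($x \ge 1$); the leader is $\min (x - 2)^2 + (y - 0.5)^2$ over an assumed $X = [-3, 3]$. In this sketch Step 2 is replaced by the hand-derived regions, and each one-level problem of Step 3 is a strictly convex scalar quadratic, minimized in closed form by clipping its stationary point to the region; it illustrates the steps, not the general mp-QP machinery of [9].

```python
# Step 2 output (hand-derived): regions as (x_lo, x_hi, m, n) with y(x) = m + n * x.
REGIONS = [(-3.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 1.0), (1.0, 3.0, 1.0, 0.0)]
A_LEAD, B_LEAD = 2.0, 0.5   # hypothetical leader F(x, y) = (x - A)^2 + (y - B)^2

def leader(x, y):
    return (x - A_LEAD) ** 2 + (y - B_LEAD) ** 2

def solve_region(x_lo, x_hi, m, n):
    """Step 3: minimize F(x, m + n x) over [x_lo, x_hi]."""
    # dF/dx = 2 (x - A) + 2 n (m + n x - B) = 0 gives the stationary point below.
    x_star = (A_LEAD - n * (m - B_LEAD)) / (1.0 + n * n)
    x = min(max(x_star, x_lo), x_hi)   # convex 1-D problem: clip to the region
    y = m + n * x
    return leader(x, y), x, y

def parametric_bilevel(regions=REGIONS):
    """Step 4: compare the K one-level optima and keep the best (F, x, y)."""
    return min(solve_region(*r) for r in regions)

print(parametric_bilevel())  # (0.25, 2.0, 1.0): leader picks x = 2, follower replies y = 1
```

The three regional optima are 4.25, 1.25 and 0.25; the comparison in Step 4 is what makes the result global over the follower's whole reaction function rather than local to one region.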


Bilevel Programming Problem 
with Multi-Followers 


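Consider the bilevel programming problem with multi-followers,
and assume quadratic objective functions, linear constraints
and two followers:

$$
\begin{aligned}
\min_{x, y_1, y_2}\ & f_1 = L_1^1 + L_2^1 x + L_3^1 y_1 + L_4^1 y_2 + \tfrac{1}{2} x^T L_5^1 x + \tfrac{1}{2} y_1^T L_6^1 y_1 + \tfrac{1}{2} y_2^T L_7^1 y_2\\
& \qquad + y_1^T L_8^1 x + y_2^T L_9^1 x + y_2^T L_{10}^1 y_1 && \text{(1st level)}\\
\text{s.t. }\ & G_1^1 x + G_2^1 y_1 + G_3^1 y_2 \le 0,\\
& \min_{y_1}\ f_2 = L_1^2 + L_2^2 x + L_3^2 y_1 + L_4^2 y_2 + \tfrac{1}{2} x^T L_5^2 x + \tfrac{1}{2} y_1^T L_6^2 y_1 + \tfrac{1}{2} y_2^T L_7^2 y_2\\
& \qquad + y_1^T L_8^2 x + y_2^T L_9^2 x + y_2^T L_{10}^2 y_1 && \text{(2nd level, Follower 1)}\\
& \ \text{s.t. } G_1^2 x + G_2^2 y_1 + G_3^2 y_2 \le 0,\\
& \min_{y_2}\ f_3 = L_1^3 + L_2^3 x + L_3^3 y_1 + L_4^3 y_2 + \tfrac{1}{2} x^T L_5^3 x + \tfrac{1}{2} y_1^T L_6^3 y_1 + \tfrac{1}{2} y_2^T L_7^3 y_2\\
& \qquad + y_1^T L_8^3 x + y_2^T L_9^3 x + y_2^T L_{10}^3 y_1 && \text{(2nd level, Follower 2)}\\
& \ \text{s.t. } G_1^3 x + G_2^3 y_1 + G_3^3 y_2 \le 0.
\end{aligned} \tag{17}
$$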


The difference between Problem (17) and Prob- 
lem (13) is the existence of two optimisation subprob- 
lems in a single level. Accordingly, the concept of Nash 
equilibrium is introduced. 

As in the bilevel programming case, each optimisation
subproblem in the (2nd level) is recast as a multi-parametric
programming problem. In this problem, the parameters are all
the variables of the optimisation problem at the (1st level) as
well as the optimisation variables of the other subproblem at
the same level (Follower 1 or Follower 2 in (17)). Thus,
defining the vectors $(\omega^2)^T = [x^T\ y_2^T]$ and
$(\omega^3)^T = [x^T\ y_1^T]$, we


Algorithm: Parametric Programming Algorithm for BLPP

1. Recast the inner problem as a multi-parametric programming problem, with the leader's variables being the parameters (14);
2. Solve the resulting problem using a suitable multi-parametric programming algorithm;
3. Substitute each of the K solutions in the leader's problem, and formulate the K one-level optimisation problems;
4. Compare the K optimum points and select the best one.

Bilevel Programming Framework for Enterprise-Wide Process Networks Under Uncertainty, Algorithm 1
Parametric Programming algorithm for a BLPP


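rewrite the (2nd level) optimisation subproblems as

$$
\begin{aligned}
\min_{y_1}\ & f_2(y_1, \omega^2) = L_1^2 + L_2^{2*} \omega^2 + L_3^2 y_1 + \tfrac{1}{2} (\omega^2)^T L_5^{2*} \omega^2 + \tfrac{1}{2} y_1^T L_6^2 y_1 + y_1^T L_8^{2*} \omega^2\\
\text{s.t. }\ & G_1^{2*} \omega^2 + G_2^2 y_1 \le 0,
\end{aligned} \tag{18}
$$

and

$$
\begin{aligned}
\min_{y_2}\ & f_3(y_2, \omega^3) = L_1^3 + L_2^{3*} \omega^3 + L_4^3 y_2 + \tfrac{1}{2} (\omega^3)^T L_5^{3*} \omega^3 + \tfrac{1}{2} y_2^T L_7^3 y_2 + y_2^T L_9^{3*} \omega^3\\
\text{s.t. }\ & G_1^{3*} \omega^3 + G_3^3 y_2 \le 0,
\end{aligned} \tag{19}
$$

where $\omega^2$ and $\omega^3$ are the vectors of parameters and the
starred matrices collect the corresponding coefficients of (17).
The bilinearities can be circumvented by using a strategy similar
to the one used in the bilevel case. By using a multi-parametric
programming algorithm [9], problems (18) and (19) result in the
following parametric expressions: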


$$
\begin{aligned}
y_1 &= \phi_1(x, y_2) && \text{(rational reaction set, Follower 1)},\\
y_2 &= \phi_2(x, y_1) && \text{(rational reaction set, Follower 2)},
\end{aligned} \tag{20}
$$
Algorithm

1. Recast each of the subproblems in the lower level as a multi-parametric programming problem, with the variables outside their control being the parameters (18) and (19);
2. Solve the resulting problems using a suitable multi-parametric programming algorithm;
3. Compute a Nash equilibrium point by direct comparison of the rational reaction sets (21);
4. Substitute each of the K solutions in the leader's problem, and formulate the K one-level optimisation problems;
5. Compare the K optimal points and select the best one.

Bilevel Programming Framework for Enterprise-Wide Process Networks Under Uncertainty, Algorithm 2
Parametric programming algorithm for bilevel programming problems with multi-followers


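which are then used to compute the Nash equilibrium
$(x, y_1^*, y_2^*)$:

$$
\begin{aligned}
f_2(x, y_1^*, y_2^*) &\le f_2(x, y_1, y_2^*), && \forall y_1 \in Y_1,\\
f_3(x, y_1^*, y_2^*) &\le f_3(x, y_1^*, y_2), && \forall y_2 \in Y_2,
\end{aligned} \tag{21}
$$

which is easily computed by direct comparison [24], that is, by
solving the two reaction functions (20) simultaneously:

$$ y_1^* = \phi_1^*(x), \tag{22a} $$
$$ y_2^* = \phi_2^*(x). \tag{22b} $$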


Finally, substituting the expressions (22) in the leader's
optimisation problem, (1st level), we end up with a single-level
convex optimisation problem involving only the leader's
optimisation variables, as follows:

$$
\begin{aligned}
\min_{x}\ & f_1\big(x, y_1(x, y_2^*(x)), y_2(x, y_1^*(x))\big)\\
\text{s.t. }\ & G^1\big(x, y_1(x, y_2^*), y_2(x, y_1^*)\big) \le 0,\\
& x \in C_F, \quad C_F = \{x \in X : \exists\, (y_1, y_2) \in Y_1 \times Y_2,\ G^2(x, y_1, y_2) \le 0,\ G^3(x, y_1, y_2) \le 0\}.
\end{aligned} \tag{23}
$$


The algorithm is summarised in Alg. 2. 
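Step 3 of Algorithm 2 can be illustrated when, inside one critical region, both reaction functions are affine: $y_1 = a_1 + b_1 x + c_1 y_2$ and $y_2 = a_2 + b_2 x + c_2 y_1$. Stacking them as $y = a + Bx + Cy$ and solving $(I - C)\,y = a + Bx$ gives the equilibrium as an affine function of $x$, which is what is substituted into the leader's problem (23). The coefficients below come from hypothetical unconstrained followers $f_2 = \tfrac{1}{2}y_1^2 - y_1(x - \tfrac{1}{2}y_2)$ and $f_3 = \tfrac{1}{2}y_2^2 - y_2(1 - \tfrac{1}{2}y_1)$, whose first-order conditions are $y_1 = x - \tfrac{1}{2}y_2$ and $y_2 = 1 - \tfrac{1}{2}y_1$.

```python
import numpy as np

# Affine reaction functions (hypothetical): y = a + B x + C y, with y = (y1, y2).
a = np.array([0.0, 1.0])          # y1 = x - 0.5 * y2 ;  y2 = 1 - 0.5 * y1
B = np.array([[1.0], [0.0]])
C = np.array([[0.0, -0.5],
              [-0.5, 0.0]])

def nash_affine(a, B, C):
    """Return (m, N) such that y*(x) = m + N x is the fixed point of y = a + B x + C y."""
    I_minus_C = np.eye(len(a)) - C   # assumed non-singular (unique equilibrium)
    m = np.linalg.solve(I_minus_C, a)
    N = np.linalg.solve(I_minus_C, B)
    return m, N

m, N = nash_affine(a, B, C)
x = np.array([2.0])
y_star = m + N @ x
print(y_star)   # approximately [2. 0.]
# Each follower is playing a best response to the other at y_star:
assert np.allclose(y_star, a + B @ x + C @ y_star)
```

Here $y_1^*(x) = (4x - 2)/3$ and $y_2^*(x) = (4 - 2x)/3$; with several critical regions, the same solve would be repeated for every compatible pair of regions.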


Bilevel Programming with Uncertainty 


[12] highlighted the importance of considering uncertainty/risk
(e.g. prices, technological attributes, etc.) in decentralised
decision-making. A comprehensive analysis of linear bilevel
programming problems under uncertainty can be found in [27],
where uncertainty is considered unstructured, taking any value
between its bounds. Here it is extended to the quadratic case.
We address the following quadratic BLPP with uncertainty, $\theta$:


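$$
\begin{aligned}
\min_{x,y}\ & F(x,y,\theta) = L_1 + L_2 x + L_3 y + \tfrac{1}{2} x^T L_4 x + y^T L_5 x + \tfrac{1}{2} y^T L_6 y\\
\text{s.t. }\ & G_1 x + G_2 y + G_3 \le G_4 \theta,\\
& \min_{y}\ f(x,y,\theta) = l_1 + l_2 x + l_3 y + \tfrac{1}{2} x^T l_4 x + y^T l_5 x + \tfrac{1}{2} y^T l_6 y\\
& \ \text{s.t. } g_1 x + g_2 y + g_3 \le g_4 \theta,
\end{aligned} \tag{24}
$$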
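The steps for solving (24) are as follows:

1. Recast the inner problem as an mp-QP, with parameters being both $x$ and $\theta$. The solution obtained is similar to (15):
$$ z^k = m^k + n^k x + \hat n^k \theta, \quad H^k x + \hat H^k \theta \le h^k, \quad k = 1, 2, \dots, K; \tag{25} $$

2. Incorporate expressions (25) in (24) to formulate $K$ mp-QPs, with parameters being the uncertainty $\theta$:
$$
\begin{aligned}
\min_{x}\ & F^k(x,\theta) = L_1^k + L_2^k x + \tfrac{1}{2} x^T L_4^k x\\
\text{s.t. }\ & G_1^k x \le G_2^k + G_3^k \theta,
\end{aligned} \tag{26}
$$
where $L_1^k$, $L_2^k$, $L_4^k$, $G_1^k$, $G_2^k$, $G_3^k$ are appropriate matrices derived by algebraic manipulations (in general, $L_1^k$ and $L_2^k$ also depend on $\theta$).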
A large number of mathematical programming problems
have optimization problems in their constraints. Arising
from the areas of game theory and multicriteria decision
making, these bilevel programming problems (BPP) take
the form:

$$
\begin{aligned}
\min_{x}\ & F(x,y)\\
\text{s.t. }\ & G(x,y) \le 0,\\
& y \in \operatorname{Argmin}_{y}\ \{ f(x,y) : g(x,y) \le 0 \},
\end{aligned} \tag{1}
$$

where $x \in \mathbb{R}^{n_1}$, $y \in \mathbb{R}^{n_2}$ and the functions $F(x,y)$,
$f(x,y)$, $G(x,y)$ and $g(x,y)$ are continuous and twice
differentiable. It is generally assumed that these functions are
convex; the case of nonconvex functions has not been considered
in the literature so far (as of 2000).

Bilevel programming has its origins in Stackelberg 
game theory, in particular from models of two-person 
nonzero-sum games. In these games, two players make 
alternate moves in a pre-established order. The first 
player (the leader) selects a move, x, that optimizes his 
own cost function. The second player (the follower) 
then has to make a move y that is constrained by the 
prior decision of the leader. The follower has access 
only to his own cost function, while the leader is aware 
of both his own as well as the follower’s cost function, 
and can thus foresee the reaction of the follower to 
any move that the leader makes. If the cost functions 
of the two players are identical (called the cooperative 
case), then the two constraint sets can be merged and 
the problem can be solved as a single level optimiza- 
tion problem. If the cost functions are exactly opposite 
(that is, $f(x,y) = -F(x,y)$), then there can be neither
cooperation nor compromise. The most interesting (and


normally studied) case is when the two objectives are 
neither identical nor opposite. 

BPP also arises in hierarchical decision making. For 
example, a central planning office might decide upon 
national budgets which act as constraints for local gov- 
ernments and businesses. Other applications include 
long-range planning problems followed by short-term
scheduling in the chemical process industries and en- 
ergy planning of businesses constrained by national 
government policy. A detailed list of references for ap- 
plications of BPP can be found in [14]. See [13] for 
a full review of algorithms and applications of bilevel 
and multilevel programming. 


Definitions 


The following definitions will be used in the sequel. The
relaxed constraint region for the BPP is defined as

$$ S = \{(x,y) : G(x,y) \le 0,\ g(x,y) \le 0\}. $$

The follower's feasible region for a fixed $x$, $\Omega(x)$, is
defined as

$$ \Omega(x) = \{y : g(x,y) \le 0\}. $$

This set is parametric in $x$ and represents the allowable
choices for the follower. The rational reaction set $M(x)$
is defined as

$$ M(x) = \{y : y \in \operatorname{Argmin}\{f(x,y) : y \in \Omega(x)\}\}. $$

Finally, the inducible region for the problem is

$$ IR = \{(x,y) : y \in M(x),\ (x,y) \in S\}. $$


The inducible region IR (which represents the leader's
feasible region) is in general nonconvex. In terms of
bimatrix or Stackelberg games, IR represents 'equilibrium'
points, that is, the set of compromise solutions between
the leader and the follower. In the presence of first-level
constraints in (1), IR may be empty, which implies that the
BPP has no solution. However, it can be shown that IR is
compact and the BPP has a solution if the following
conditions are met [7]:

a) $F(x,y)$, $f(x,y)$, $G(x,y)$ and $g(x,y)$ are continuous and twice differentiable;
b) $f(x,\cdot)$ is strictly convex in $y$;
c) $\Omega(x)$ is a compact convex set; and
d) $F(x,y)$ and $G(x,y)$ are convex in $x$ and $y$.
Note that the solution to the BPP need not be individually
optimal for each of the leader's and follower's objective
functions (that is, it need not be an efficient solution).

The specific instance of BPP in which all the functions
involved are linear has received the most interest. The
linear bilevel programming problem (BLPP) can be
written as

$$
\begin{aligned}
\min_{x}\ & F_1(x,y) = c_1^T x + d_1^T y\\
\text{s.t. }\ & A_1 x + B_1 y \le b_1,\\
& y \in \operatorname{Argmin}_{y}\ \{ c_2^T x + d_2^T y \ :\ g(x,y) = A_2 x + B_2 y \le b_2 \}.
\end{aligned} \tag{2}
$$


Complexity 

Because of the nonconvexity of the inducible region
IR, BPP can be a hard problem to solve. It is known
that even the linear problem, BLPP, is NP-hard. This
has been shown by reductions involving a knapsack
optimization problem [3], the standard KERNEL
problem [11], and a problem of minimizing a convex
quadratic function over a polyhedron [1]. In fact, even
checking for local optimality in BLPP is NP-hard [15].


Multiple Solutions to the Follower’s Problem 


In the absence of dual degeneracy, the follower's subproblem
has a single solution for every $x$. However, if the follower's
subproblem has multiple solutions for some $x$, then the overall
BPP may not be well defined. In this case, we need further
assumptions about the cooperativeness of the follower with
respect to the leader. Alternatively, the follower's objective
function can be modified as

$$ \tilde f(x,y) = f(x,y) + \epsilon F(x,y), $$

with $\epsilon > 0$ small, in effect allowing the leader to 'kick back'
a small portion of its earnings to ensure that the follower
selects a suitable solution.


Solution Methods 


From the 1980s onwards, many approaches have been 
proposed for the solution of BPP. These can be classi- 
fied as enumerative, complementary pivot, branch and 


bound, descent and penalty function methods. The last 
two categories of methods are only useful in finding sta- 
tionary points and local minima, and will not be dis- 
cussed here. The vast majority of the approaches ad- 
dress the linear case, BLPP. Some of the global opti- 
mization methods are discussed below. 


Enumeration Methods 


The linear BLPP is equivalent to minimizing the linear
function $F_1(x,y)$ over a piecewise linear constraint region
composed of the edges and hyperplanes of $S$, the feasible
region. It can be shown that the global optimum to BLPP
occurs at a vertex of $S$. This suggests an extreme point
search procedure for solving BLPP. One such procedure is
the Bialas-Karwan Kth-best algorithm [4]. The basic idea is
to find an 'ordered' set of extreme points of the relaxed
problem

$$
\begin{aligned}
\min\ & F_1(x,y) = c_1^T x + d_1^T y\\
\text{s.t. }\ & A_1 x + B_1 y \le b_1,\\
& A_2 x + B_2 y \le b_2.
\end{aligned}
$$


The algorithm has the following steps:

0 | Solve the relaxed problem. Let the solution be $(x^1, y^1)$. Set $k = 1$.
1 | Solve the inner problem with $x = x^k$. If $y^k$ is in the solution set of the inner problem, then STOP.
2 | Locate all adjacent extreme points $(x_j, y_j)$ such that $c_1^T x_j + d_1^T y_j \ge c_1^T x^k + d_1^T y^k$. Choose the adjacent extreme point $j$ that minimizes $c_1^T x_j + d_1^T y_j$. Set $k = k + 1$, $(x^k, y^k) = (x_j, y_j)$. Go to Step 1.


Since each successive pair of points tested in this algo- 
rithm is adjacent, it can be efficiently implemented us- 
ing the dual simplex method. 
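The ranking idea behind the Kth-best method can be sketched on a small hypothetical linear BLPP with scalar $x$ and $y$. For simplicity this sketch enumerates all vertices of $S$ by intersecting pairs of constraint lines and sorts them by $F_1$ up front, instead of walking adjacent extreme points with the simplex method, which is only practical for tiny instances. It then returns the first vertex whose $y$ is an optimal follower response, that is, the first vertex lying in IR. With a scalar $y$ the follower's LP reduces to picking an end of an interval; all data and names are illustrative.

```python
import itertools

import numpy as np

# Hypothetical linear BLPP (scalar x and y):
#   leader   min_x  x - 4 y
#   follower min_y  y   s.t.  -x - y <= -3,  -2x + y <= 0,  2x + y <= 12,
#                             3x - 2y <= 4,  x >= 0,  y >= 0.
# Each row [a, b, c] encodes a*x + b*y <= c.
ROWS = np.array([[-1.0, -1.0, -3.0],
                 [-2.0, 1.0, 0.0],
                 [2.0, 1.0, 12.0],
                 [3.0, -2.0, 4.0],
                 [-1.0, 0.0, 0.0],
                 [0.0, -1.0, 0.0]])
C1, D1 = 1.0, -4.0   # leader objective coefficients
D2 = 1.0             # follower objective coefficient (D2 > 0: follower wants small y)
TOL = 1e-9

def vertices():
    """All extreme points of S, from pairwise intersections of constraint lines."""
    pts = []
    for i, j in itertools.combinations(range(len(ROWS)), 2):
        M = ROWS[[i, j], :2]
        if abs(np.linalg.det(M)) < TOL:
            continue                                   # parallel lines
        p = np.linalg.solve(M, ROWS[[i, j], 2])
        feasible = np.all(ROWS[:, :2] @ p <= ROWS[:, 2] + TOL)
        if feasible and not any(np.allclose(p, q) for q in pts):
            pts.append(p)
    return pts

def follower_best(x):
    """Optimal y for fixed x: the feasible y-set is an interval [lo, hi]."""
    lo, hi = -np.inf, np.inf
    for a, b, c in ROWS:
        if b > 0:
            hi = min(hi, (c - a * x) / b)
        elif b < 0:
            lo = max(lo, (c - a * x) / b)
    return lo if D2 > 0 else hi

def kth_best():
    """Rank vertices by the leader objective; return the first one lying in IR."""
    for x, y in sorted(vertices(), key=lambda p: C1 * p[0] + D1 * p[1]):
        if abs(y - follower_best(x)) < 1e-7:
            return float(x), float(y), float(C1 * x + D1 * y)
    return None

print(kth_best())   # approximately (4.0, 4.0, -12.0)
```

The best vertex of the relaxed problem, $(3, 6)$ with $F_1 = -21$, is rejected because at $x = 3$ the follower would choose $y = 2.5$; the second-ranked vertex $(4, 4)$ passes the check.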


Complementary Pivot Methods 


Under proper regularity conditions, the inner problem of the
BPP can be replaced by its Karush-Kuhn-Tucker optimality
conditions. For the case of the BLPP, this results in the
following single-level optimization problem (KKT):

$$
\begin{aligned}
\min\ & F(x,y) = c_1^T x + d_1^T y\\
\text{s.t. }\ & A_1 x + B_1 y \le b_1,\\
& A_2 x + B_2 y \le b_2,\\
& \mu^T (A_2 x + B_2 y - b_2) = 0,\\
& d_2 + B_2^T \mu = 0,\\
& \mu \ge 0.
\end{aligned} \tag{3}
$$


The problem KKT has a linear complementarity formulation.
As such, it can be solved using a complementary pivoting
method. Consider the following parametric formulation
LCP($\lambda$):

$$
\begin{aligned}
& c_1^T x + d_1^T y \le \lambda,\\
& A_1 x + B_1 y \le b_1,\\
& A_2 x + B_2 y \le b_2,\\
& \mu^T (A_2 x + B_2 y - b_2) = 0,\\
& d_2 + B_2^T \mu = 0,\\
& \mu \ge 0.
\end{aligned}
$$

The global minimization of BLPP then corresponds to the
identification of the minimum value of $\lambda$ such that
LCP($\lambda$) has a solution. The following method can be
used to solve LCP($\lambda$):


0 | Solve LCP($\lambda$) without the first parametric constraint. Let $(x^0, y^0)$ be the solution to this problem, with $\lambda_0 = c_1^T x^0 + d_1^T y^0$.

1 | Solve LCP($\lambda_k$). If LCP($\lambda_k$) has no solution, go to Step 3. Otherwise, let $(x^k, y^k)$ be the solution.

2 | Set
$$ \lambda_{k+1} = c_1^T x^k + d_1^T y^k - \gamma \left| c_1^T x^k + d_1^T y^k \right|, \qquad (\bar x, \bar y) = (x^k, y^k), $$
where $\gamma$ is a small positive number. Set $k = k + 1$ and go to Step 1.

3 | If $k = 0$, then BLPP has no solution. Otherwise, $(\bar x, \bar y)$ is an $\epsilon$-global optimum to BLPP, where $\epsilon = \gamma \left| c_1^T \bar x + d_1^T \bar y \right|$.


The key to this algorithm is the ability to efficiently solve LCP($\lambda_k$) in Step 1. J. Judice and A. Faustino [12] have proposed a hybrid enumerative method which works by branching on the complementarity conditions $\mu^\top (A_2 x + B_2 y - b_2) = 0$. Numerous heuristics can be used in each node of the resulting branch and bound tree in order to reduce the search for a complementary solution.
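The outer loop of Steps 0-3 can be written independently of the LCP solver. In the sketch below, `solve_lcp` is a hypothetical callback standing in for an LCP($\lambda$) solver such as the enumerative method of [12]. To keep the example runnable, it is replaced by a toy oracle over a finite list of points of the inducible region. LCP($\lambda_0$) is always solved by $(x^0, y^0)$ itself, so the sketch moves straight to $\lambda_1$.

```python
def sequential_lcp(solve_lcp, leader_value, gamma=1e-3, max_iter=1000):
    """Outer loop of the parametric LCP(lambda) method.

    solve_lcp(lam) -> a point feasible for LCP(lam), or None if LCP(lam) has no
    solution; lam=None means 'without the parametric constraint' (Step 0).
    leader_value(point) -> c1^T x + d1^T y.
    Returns (point, eps) for an eps-global optimum, or (None, None).
    """
    best = solve_lcp(None)                      # Step 0
    if best is None:
        return None, None                       # BLPP has no solution
    for _ in range(max_iter):
        value = leader_value(best)
        lam = value - gamma * abs(value)        # Step 2: tighten the bound
        point = solve_lcp(lam)                  # Step 1
        if point is None:                       # Step 3: last point is eps-optimal
            return best, gamma * abs(value)
        best = point
    raise RuntimeError("iteration limit reached")


# Hypothetical stand-in for an LCP solver: a finite list of (point, F1 value).
# The oracle returns the worst admissible point, forcing as many passes as possible.
candidates = [((2, 1), -2.0), ((1, 2), -7.0), ((4, 4), -12.0)]
values = dict(candidates)


def toy_solve_lcp(lam):
    admissible = [c for c in candidates if lam is None or c[1] <= lam]
    return max(admissible, key=lambda c: c[1])[0] if admissible else None


point, eps = sequential_lcp(toy_solve_lcp, lambda p: values[p])
print(point, eps)  # (4, 4) and eps close to 0.012
```

Each pass lowers the bound by the relative amount $\gamma$. When the oracle reports that LCP($\lambda$) is infeasible, the last point found is returned together with $\varepsilon = \gamma\,|F_1|$.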


Branch and Bound Methods 


These methods work by identifying the set of inner- 
level constraints that are active at the optimal solution. 
The simplest method, due to J. Fortuny-Amat and B. 
McCarl [10], works by converting the KKT comple- 
mentarity conditions in (3) to 


$$
\begin{aligned}
& \mu_i \le M\alpha_i,\\
& (A_2 x + B_2 y - b_2)_i \ge -M(1 - \alpha_i),\\
& \alpha_i \in \{0, 1\}, \quad \forall i,
\end{aligned}
$$


where $M$ is a large constant. The variable $\alpha_i$ is equal to 1 if inner level constraint $i$ is active at the optimal solution, and zero otherwise. This converts the one-level problem to a mixed integer linear program (MILP), which can be solved with commercial MILP codes. However, this requires the addition of $2m$ constraints and $m$ binary variables, where $m$ is the number of inner-level constraints.
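The role of the big-M constraints becomes clear by taking given multipliers $\mu \ge 0$ and slacks $s = b_2 - A_2 x - B_2 y \ge 0$ and asking whether a binary $\alpha$ satisfying both inequalities exists. A minimal sketch (the function name is hypothetical):

```python
def big_m_alpha(mu, slack, M):
    """Find binary alpha with  mu_i <= M*alpha_i  and  slack_i <= M*(1 - alpha_i)
    (slack = b2 - A2 x - B2 y >= 0, mu >= 0); return None if no such alpha exists."""
    alpha = []
    for m_i, s_i in zip(mu, slack):
        if m_i <= 0 and s_i <= M:
            alpha.append(0)   # multiplier forced to zero, constraint may be slack
        elif s_i <= 0 and m_i <= M:
            alpha.append(1)   # constraint i forced to be active
        else:
            return None       # complementarity violated, or M too small
    return alpha


print(big_m_alpha([0.0, 2.5], [1.0, 0.0], M=100))  # [0, 1]
print(big_m_alpha([1.0], [0.5], M=100))            # None: mu_1 * slack_1 > 0
print(big_m_alpha([0.0], [150.0], M=100))          # None: M too small for this slack
```

A binary $\alpha$ exists exactly when $\mu_i s_i = 0$ for every $i$ and $M$ bounds every nonzero $\mu_i$ and $s_i$. This is why $M$ must be chosen large enough: a value that is too small cuts off complementary points.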

Note that at the optimal solution, at least one of the inner problem constraints must be active, that is,

$$ \sum_{i=1}^{m} \alpha_i \ge 1. \tag{4} $$

Moreover, it can be shown that the following conditions must hold [11]:

$$ \sum_{\{i:\ (B_2)_{ij} > 0\}} \alpha_i \ge 1 \quad \text{if } (d_2)_j < 0, \tag{5} $$

$$ \sum_{\{i:\ (B_2)_{ij} < 0\}} \alpha_i \ge 1 \quad \text{if } (d_2)_j > 0, \tag{6} $$

for each component $j$ of $y$. It is possible to use (4)-(6) as branching criteria in a branch and bound tree. Each of these conditions, when tight, can be used to eliminate a variable from the inner constraints. By combining these conditions with the use of linear relaxations to obtain lower bounds, a branch and bound algorithm can be developed to solve the BLPP [11].
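Conditions (4)-(6) depend only on the signs of $B_2$ and $d_2$, so the index sets used for branching can be precomputed. The sketch below assumes $B_2$ is given as a list of rows and uses hypothetical function names. The data are an illustrative instance with one follower variable, $d_2 = 1$, and a $B_2$ column of $(-1, 1, 1, -2, 0, -1)$.

```python
def branching_sets(B2, d2):
    """For each inner variable j with (d2)_j != 0, the index set of inner
    constraints of which at least one must be active, by (5)-(6)."""
    m = len(B2)
    sets = {}
    for j, dj in enumerate(d2):
        if dj < 0:
            sets[j] = [i for i in range(m) if B2[i][j] > 0]   # condition (5)
        elif dj > 0:
            sets[j] = [i for i in range(m) if B2[i][j] < 0]   # condition (6)
    return sets


def satisfies_4_to_6(alpha, B2, d2):
    """Check (4)-(6) for a 0-1 vector alpha (alpha_i = 1: constraint i active)."""
    if sum(alpha) < 1:                                         # condition (4)
        return False
    return all(any(alpha[i] for i in idx) for idx in branching_sets(B2, d2).values())


B2 = [[-1], [1], [1], [-2], [0], [-1]]
d2 = [1]
print(branching_sets(B2, d2))                        # {0: [0, 3, 5]}
print(satisfies_4_to_6([0, 0, 1, 1, 0, 0], B2, d2))  # True
print(satisfies_4_to_6([0, 1, 1, 0, 0, 0], B2, d2))  # False
```

The first 0-1 vector marks rows 3 and 4 as active, and row 4 has a negative coefficient, so (6) holds. The second vector fails (6) because none of rows 1, 4 and 6 (zero-based indices 0, 3, 5) is active.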

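An alternate method to the use of binary variables is to establish a one-to-one correspondence between each $\alpha_i$ and each $\mu_i$, as follows:

$$ \frac{1}{M}\,\alpha_i \le \mu_i \le M\alpha_i, $$

where $M$ is a suitably large number. This ensures that if $\alpha_i = 0$, then $\mu_i = 0$, while if $\alpha_i = 1$, then $\mu_i \ge 1/M$, which by complementarity implies an active constraint. With this approach, BLPP can be transformed to:

$$
\begin{aligned}
\min\ & F_1(x,y) = c_1^\top x + d_1^\top y\\
\text{s.t. }\ & A_1 x + B_1 y \le b_1,\\
& A_2 x + B_2 y \le b_2,\\
& \alpha^\top (A_2 x + B_2 y - b_2) = 0,\\
& d_2 + B_2^\top \mu = 0,\\
& \mu_i \le M\alpha_i,\\
& \alpha_i \le M\mu_i,\\
& \mu \ge 0,\ \alpha_i \in \{0, 1\}.
\end{aligned}
$$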


By partitioning the variables into $\bar x = (x, y)$ and $\bar y = (\mu, \alpha)$, it can be seen that this problem is of the form

$$
\begin{aligned}
\min_{\bar x, \bar y}\ & f(\bar x, \bar y)\\
\text{s.t. }\ & g(\bar x, \bar y) \le 0,\\
& h(\bar x, \bar y) = 0,
\end{aligned}
$$

where $f(\bar x, \bar y)$, $g(\bar x, \bar y)$ and $h(\bar x, \bar y)$ are bilinear functions. Thus, the GOP algorithm of [8,9] can be applied to solve this problem. The algorithm works by solving a set of primal and relaxed dual problems that bound the global solution. The primal problem is


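$$
\begin{aligned}
\min_{\bar x}\ & f(\bar x, \bar y^k)\\
\text{s.t. }\ & g(\bar x, \bar y^k) \le 0,\\
& h(\bar x, \bar y^k) = 0,
\end{aligned}
$$

where $\bar y^k$ is a fixed value. Because this problem is linear, it can be solved for its global solution, and it yields an upper bound on the global solution. It also provides multipliers for the constraints, $\mu^k$ and $\lambda^k$, which can be used to construct a Lagrange function of the form

$$ L(\bar x, \bar y, \mu^k, \lambda^k) = f(\bar x, \bar y) + (\mu^k)^\top g(\bar x, \bar y) + (\lambda^k)^\top h(\bar x, \bar y). $$

It is then possible to solve a dual problem

$$
\begin{aligned}
\min_{\bar y,\, u}\ & u\\
\text{s.t. }\ & u \ge L(\bar x, \bar y, \mu^k, \lambda^k),
\end{aligned}
$$

which provides a lower bound on the global solution.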
The dual problem is actually solved by partitioning the 
y-space using the gradients of L and solving a relaxed 
dual subproblem in each region. In [16] it has been 
shown that for the bilevel problems, only one dual sub- 
problem needs to be solved at each iteration. This ap- 
proach can also be used when the inner problem objec- 
tive function is quadratic. 

Another approach, proposed in [2], can also be used 
when the inner level problem has a convex quadratic 
objective function. The basic idea is to first solve the 
one-level linear problem by dropping the complemen- 
tarity conditions. At each iteration, a check is made to 
see if the complementarity condition is satisfied. If it 
is, the corresponding solution is in the inducible re- 
gion IR, and hence a candidate solution for BPP. If not, 
a branch and bound scheme is used to implicitly exam- 
ine all combinations of complementary slackness. 


Let $W_1 = \{i: \mu_i = 0\}$, $W_2 = \{i: g_i = 0\}$, $W_3 = \{i: i \notin W_1 \cup W_2\}$.

0 | Set $k = 0$, $W_1 = W_2 = \emptyset$, $W_3 = \{1, \dots, m\}$, $\bar F = \infty$.

1 | Set $\mu_i = 0$, $i \in W_1$, and $g_i = 0$, $i \in W_2$. Solve the relaxed system. Let $(x^k, y^k, \mu^k)$ be the solution. If no solution exists, or if $F(x^k, y^k) \ge \bar F$, go to Step 4.

2 | If $\mu_i g_i = 0$, $\forall i$, go to Step 3. Otherwise select $i$ such that $\mu_i g_i$ is maximal, say $i_1$. Let $W_1 = W_1 \cup \{i_1\}$, $W_3 = W_3 \setminus \{i_1\}$, and go to Step 1.

3 | Update $\bar F = F(x^k, y^k)$.

4 | If all nodes in the tree have been exhausted, go to Step 5. Else, branch to the newest unfathomed node, say $j$, and set $W_1 = W_1 \setminus \{j\}$, $W_2 = W_2 \cup \{j\}$. Go to Step 1.

5 | If $\bar F = \infty$, no solution exists to BPP. Otherwise, the point corresponding to $\bar F$ is the optimum.


Computational Results and Test Problems 


The difficulty of solving bilevel problems depends on 
a number of factors, including the number of inner 
versus outer level variables, degree of cooperation between the leader and follower objective functions, number of inner level constraints and the density of the
constraints. Computational results have been reported 
by many authors, including [2,11,12] and [16]. Gener- 
ally, these have so far been limited to problems involv- 
ing up to 100 inner level variables and constraints. See 
[5,6] for methods for automatically generating linear 
and quadratic bilevel problems which can be used to 
test any of these and other algorithms for bilevel pro- 
gramming. 
The bilevel programming problem is a hierarchical
problem in the sense that its constraints are defined in 
part by a second parametric optimization problem. Let 
W(x) be the solution set of this second problem (the so- 
called lower level problem): 


$$ W(x) := \operatorname*{Argmin}_y \{ f(x,y) : g(x,y) \le 0 \}, \tag{1} $$

where $f, g_i \in C^2(\mathbb R^n \times \mathbb R^m, \mathbb R)$, $i = 1, \dots, p$. Then, the bilevel programming problem is defined as

$$ \underset{x}{\text{"min"}} \{ F(x,y) : y \in W(x),\ x \in X \} \tag{2} $$

with $F \in C^1(\mathbb R^n \times \mathbb R^m, \mathbb R)$ and $X \subseteq \mathbb R^n$ closed. Problem (2) is also called the upper level problem. The inclusion of equality constraints in problem (1) is possible without difficulties. If inequalities and/or equations in both $x$ and $y$ appear in problem (2), this problem becomes even more difficult, since these constraints restrict the set $W(x)$ after a solution $y$ out of it has been chosen. This can make the selection of $y \in W(x)$ a posteriori infeasible [6].

The bilevel programming problem can easily be interpreted in terms of Stackelberg games, which are a special case of it widely used in economics. In Stackelberg games the lower level constraints $g(x, y) \le 0$ are replaced by $y \in Y$, where $Y \subseteq \mathbb R^m$ is a fixed closed set. Consider two decision makers who select their actions in a hierarchical manner. First the leader chooses $x \in X$ and announces his selection to the follower. Knowing the selection $x$, the follower computes his response $y(x)$ by solving problem (1). Now the leader is able to evaluate the value of his initial choice by computing $F(x, y(x))$. Having full knowledge about the follower's responses $y(x)$ for all $x \in X$, the leader's task is to minimize the function $G(x) := F(x, y(x))$ over the set $X$, i.e. to solve problem (2).
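The leader's implicit problem $\min_x G(x)$ can be made concrete with a one-dimensional example chosen for illustration; it is not from the source. The follower solves $\min_y (y - x)^2$ over $Y = [0, 1]$, so $y(x)$ is the projection of $x$ onto $Y$. The leader has $F(x, y) = (x - 1)^2 + (y - 1.5)^2$ on $X = [0, 2]$. A brute-force grid search stands in for a real upper level solver.

```python
def follower_response(x, y_lo=0.0, y_hi=1.0):
    """y(x) = argmin_y (y - x)**2 over Y = [y_lo, y_hi]: the projection of x onto Y."""
    return min(max(x, y_lo), y_hi)


def leader_objective(x, y):
    return (x - 1.0) ** 2 + (y - 1.5) ** 2


def G(x):
    """Implicit upper level objective G(x) = F(x, y(x))."""
    return leader_objective(x, follower_response(x))


# Leader: minimize G over X = [0, 2] by brute-force grid search.
grid = [i / 1000 for i in range(2001)]
x_best = min(grid, key=G)
print(x_best, G(x_best))  # 1.0 0.25

# One-sided difference quotients at x = 1 expose a kink in G.
h = 1e-6
left = (G(1.0) - G(1.0 - h)) / h   # close to -1
right = (G(1.0 + h) - G(1.0)) / h  # close to 0
print(left, right)
```

The one-sided slopes at $x = 1$ are about $-1$ and $0$. $G$ has a kink exactly where the follower's constraint $y \le 1$ becomes active, even though $F$ itself is smooth.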

The bilevel programming problem has a large number of applications, e.g. in economics, natural sciences and technology (cf. [17,25] and the references therein).

The quotation marks in (2) indicate that, due to minimization only with respect to $x$ in the upper level problem (2), this problem is not well defined if the lower level problem (1) does not have a uniquely determined optimal solution for all values of $x$ [6]. Minimization only with respect to $x$ in (2) takes place in many applications of bilevel programming, e.g. when the lower level problem represents the reactions of nature to the leader's actions. If $W(x)$ does not reduce to a singleton for all parameter values $x \in X$, either an optimistic or a pessimistic approach has to be used to obtain a well defined auxiliary problem.

In the optimistic case, problem (2) is replaced by

$$ \min_{x,y} \{ F(x,y) : y \in W(x),\ x \in X \} \tag{3} $$

[6,11], where minimization is taken with respect to both $x$ and $y$. The use of (3) instead of (2) means that the leader is able to influence the choice of the follower. If the leader is not able to force the follower to take the solution $y \in W(x)$ which is the best possible for him, he has to bound the damage resulting from an unwelcome choice of the follower. Hence, the leader has to take the worst solution in $W(x)$ into account when computing his decision. This leads to the auxiliary problem in the pessimistic case:

$$ \min_x \Bigl\{ \max_y \{ F(x,y) : y \in W(x) \} : x \in X \Bigr\} \tag{4} $$

[15,16].
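The difference between (3) and (4) shows up as soon as $W(x)$ is not a singleton. The sketch below uses a hypothetical finite example:
- $Y = \{-1, 0, 1\}$;
- follower objective $f(x, y) = xy$, so that $W(0) = Y$;
- leader objective $F(x, y) = x^2 + y$;
- a five-point grid for $X$.

```python
def W(x, Y, f):
    """Lower level solution set W(x) over a finite candidate set Y."""
    best = min(f(x, y) for y in Y)
    return [y for y in Y if f(x, y) == best]


def f(x, y):
    return x * y  # follower: at x = 0 every y in Y is optimal


def F(x, y):
    return x ** 2 + y  # leader


X = [-1.0, -0.5, 0.0, 0.5, 1.0]
Y = [-1, 0, 1]

# (3) optimistic: minimize jointly over x and over y in W(x)
optimistic = min((F(x, y), x, y) for x in X for y in W(x, Y, f))
# (4) pessimistic: minimize over x the worst value over W(x)
pessimistic = min((max(F(x, y) for y in W(x, Y, f)), x) for x in X)
print(optimistic)   # (-1.0, 0.0, -1)
print(pessimistic)  # (-0.75, 0.5)
```

The optimistic leader picks $x = 0$ and counts on the follower choosing $y = -1$. The pessimistic leader must avoid $x = 0$, where the follower might answer $y = 1$. For a continuous $X = [-1, 1]$, the pessimistic values $x^2 - 1$ for $x > 0$ approach $-1$ without attaining it. This is a simple case in which (4) has no solution.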

In the sequel it is assumed that the lower level problem (1) has a unique (global) optimal solution $y(x)$ for all $x \in X$. This is guaranteed to be true at least if the assumptions C), SCQ), and SSOC) below are satisfied. Then the implicit function approach to bilevel programming can be used, which means that problem (2) (and equivalently (3)) is replaced by

$$ \min \{ G(x) := F(x, y(x)) : x \in X \}. \tag{5} $$

C) The functions $f(x, \cdot), g_i(x, \cdot): \mathbb R^m \to \mathbb R$ are convex in $y$ for each $x \in X$.

SCQ) For each $x \in X$ there exists a point $\hat y(x)$ such that $g(x, \hat y(x)) < 0$.

For convex problems, Slater's condition SCQ) implies that a feasible point $y(x)$ of (1) is optimal if and only if the Karush-Kuhn-Tucker conditions for this problem are valid: there exists a point $\lambda \in \Lambda(x, y(x))$, where

$$ \Lambda(x, y(x)) = \{ \lambda \ge 0 : \nabla_y L(x, y(x), \lambda) = 0,\ \lambda^\top g(x, y(x)) = 0 \} \tag{6} $$

with $L(x, y, \lambda) = f(x,y) + \lambda^\top g(x,y)$ denoting the Lagrange function of problem (1).


Reformulation as a One-Level Problem 


There are several methods to reformulate (3) as an equivalent one-level problem.

The first possibility consists in replacing the lower level problem (1) by its Karush-Kuhn-Tucker conditions (6):

$$ \min_{x,y,\lambda} \{ F(x,y) : \nabla_y L(x,y,\lambda) = 0,\ \lambda^\top g(x,y) = 0,\ g(x,y) \le 0,\ \lambda \ge 0,\ x \in X \}. \tag{7} $$

This is an optimization problem with constraints given in part by a parametric complementarity condition.

A second possibility is to use a variational inequality describing the set $W(x)$. Let assumption C) be satisfied. Then problem (3) is equivalent to

$$ \min_{x,y} \{ F(x,y) : g(x,y) \le 0,\ x \in X,\ \nabla_y f(x,y)(z - y) \ge 0\ \ \forall z: g(x,z) \le 0 \}. \tag{8} $$

Both approaches (7) and (8) lead to a so-called mathematical program with equilibrium constraints (MPEC) [17].

SSOC) For each $x \in X$, for each $y \in W(x)$, for all $\lambda \in \Lambda(x, y)$ and for all $d \ne 0$ satisfying

$$ \nabla_y g_i(x, y)\, d = 0 \quad \text{for all } i: \lambda_i > 0, $$

the following inequality holds:

$$ d^\top \nabla^2_{yy} L(x, y, \lambda)\, d > 0. $$


If at an optimal solution $y(\bar x)$ of the convex problem (1) at $x = \bar x$ the assumptions SCQ) and SSOC) are satisfied, then $y(\bar x)$ is a strongly stable optimal solution in the sense of M. Kojima [13]. This means that there exist an open neighborhood $U$ of $\bar x$ and a uniquely determined continuous function $y: U \to \mathbb R^m$ such that $y(x)$ is the uniquely determined optimal solution of (1) for all $x \in U$. Hence, for convex problems (1), the assumptions SCQ) and SSOC) imply that there is a uniquely determined implicit function $y(x)$ describing the unique optimal solution of problem (1) for all $x \in X$. This function can be inserted into problem (2), which results in the third equivalent one-level problem (5). Problem (5) consists in minimizing the implicitly determined, generally nonsmooth, nonconvex objective function $F(x, y(x))$ on the set $X$. It has an optimal solution if the set $X$ is compact or the function $F(\cdot, \cdot)$ satisfies some coercivity assumption [11].

Under suitable assumptions, the parametric com- 
plementarity problem as well as the parametric varia- 
tional inequality describing the constraints in a mathe- 
matical program with equilibrium constraints also pos- 
sess a uniquely determined continuous solution func- 
tion [17]. Then, the implicit function approach can also 
be used to investigate MPECs. 


Properties of the Solution Function 


For the investigation of bilevel programming problems via (5), knowledge of the properties of the solution function $y: X \to \mathbb R^m$ is needed. If the assumptions C), SCQ), and SSOC) are satisfied, this function is continuous [13], upper Lipschitz continuous [22], Hölder continuous with exponent 1/2 [9] and directionally differentiable [3,24]. Let $\bar z = (\bar x, y(\bar x))$, $\mathcal I = \{j: g_j(\bar z) = 0\}$, $J(\lambda) := \{j: \lambda_j > 0\}$. The directional derivative

$$ y'(\bar x; r) = \lim_{t \to 0+} t^{-1}[y(\bar x + tr) - y(\bar x)] $$

of the function $y(\cdot)$ at a point $\bar x$ can be computed as the unique optimal solution $y'(\bar x; r)$ of the convex quadratic problem

$$
\begin{aligned}
& \tfrac12\, d^\top \nabla^2_{yy} L(\bar z, \lambda)\, d + d^\top \nabla^2_{xy} L(\bar z, \lambda)\, r \;\to\; \min_d,\\
& \nabla_x g_i(\bar z)\, r + \nabla_y g_i(\bar z)\, d = 0, \quad \forall i \in J(\lambda),\\
& \nabla_x g_i(\bar z)\, r + \nabla_y g_i(\bar z)\, d \le 0, \quad \forall i \in \mathcal I \setminus J(\lambda),
\end{aligned}
\tag{9}
$$

for some suitably chosen Lagrange multiplier

$$ \lambda \in \operatorname*{Argmax}_\lambda \{ \nabla_x L(\bar z, \lambda)\, r : \lambda \in \Lambda(\bar z) \} \tag{10} $$

[3]. The correct choice of $\lambda$ is a rather difficult task, since it possibly belongs to the relative interior of some facet of the polyhedral set $\Lambda(\bar z)$ [3]. To make the application of these properties of the solution function easier, a further assumption is used:

CR) For each pair $(x, y)$, $x \in X$, $y \in W(x)$, there is an open neighborhood $V \subseteq \mathbb R^n \times \mathbb R^m$ of $(x, y)$ such that, for all $I \subseteq \mathcal I$, the family of gradients $\{\nabla_y g_i(x, y): i \in I\}$ has constant rank on $V$.

If the assumptions C), SCQ), SSOC), and CR) are satisfied, the function $y: X \to \mathbb R^m$ is a piecewise continuously differentiable function [21], i.e. it is continuous and there exist an open neighborhood $U$ of $\bar x$ and a finite number of continuously differentiable functions $y^i: U \to \mathbb R^m$, $i = 1, \dots, k$, such that $y(\cdot)$ is a selection of the $y^i$:

$$ y(x) \in \{ y^i(x) : i = 1, \dots, k \} \quad \forall x \in U. $$


The functions $y^i: U \to \mathbb R^m$ describe locally optimal solutions of auxiliary problems

$$ \min_y \{ f(x, y) : g_j(x, y) = 0,\ j \in I_i \}, $$

where the sets $I_i$, $i = 1, \dots, k$, satisfy the following two conditions:
• there exists a vertex $\lambda \in \Lambda(\bar x, y(\bar x))$ such that $J(\lambda) \subseteq I_i \subseteq \mathcal I$, and
• the gradients $\{\nabla_y g_j(\bar x, y(\bar x)): j \in I_i\}$ are linearly independent [14].

Let $IS(\bar x)$ denote the family of all sets $I_i$ having these two properties. Then $k$ is the cardinality of $IS(\bar x)$. The functions $y^i: U \to \mathbb R^m$ are continuously differentiable at $\bar x$ [7]. To compute the Jacobian of the function $y^i(\cdot)$ at $x = \bar x$, the unique solution of a system of linear equations has to be computed.

Moreover, the directional derivative $y'(\bar x; r)$ is equal to the unique optimal solution of the quadratic problem (9) for each optimal solution $\lambda$ of the linear problem (10) [21]. For fixed $\bar x$, it is a continuous, piecewise linear function of the direction $r$. The quadratic problem (9) has an optimal solution if and only if $\lambda$ solves the linear problem (10). Hence, for computing a linear approximation of the function $y: X \to \mathbb R^m$ it is sufficient to solve the parametric quadratic optimization problems (9) for all vertices $\lambda \in \Lambda(\bar x, y(\bar x))$.
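For a lower level problem simple enough to solve by hand, the solution of (9) can be compared with a finite-difference quotient. The hypothetical example is $\min_y \tfrac12 (y - x)^2$ s.t. $-y \le 0$ at $\bar x = 0$. Here the constraint is active with multiplier $\lambda = 0$, so $J(\lambda) = \emptyset$, $\nabla^2_{yy}L = 1$ and $\nabla^2_{xy}L = -1$. Problem (9) then becomes $\min_d \tfrac12 d^2 - dr$ s.t. $d \ge 0$. The sketch solves this one-dimensional QP in closed form rather than with a QP solver.

```python
def y_of_x(x):
    """Solution function of the lower level  min_y 0.5*(y - x)**2  s.t.  -y <= 0."""
    return max(x, 0.0)


def qp9_direction(r, H=1.0, Lxy=-1.0):
    """Problem (9) at x_bar = 0: the constraint -y <= 0 is active with lambda = 0,
    so J(lambda) is empty and (9) reads  min_d 0.5*H*d**2 + d*Lxy*r  s.t.  -d <= 0.
    H = grad^2_yy L = 1 and Lxy = grad^2_xy L = -1 for this example."""
    d_free = -Lxy * r / H     # unconstrained minimizer of the 1-D quadratic
    return max(d_free, 0.0)   # strictly convex 1-D QP: clip onto {d >= 0}


def difference_quotient(x, r, t=1e-8):
    return (y_of_x(x + t * r) - y_of_x(x)) / t


for r in (-2.0, -0.5, 0.5, 2.0):
    print(r, qp9_direction(r), difference_quotient(0.0, r))
```

Both columns give $y'(0; r) = \max(r, 0)$, a continuous, piecewise linear function of the direction $r$.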

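Piecewise continuously differentiable functions are locally Lipschitz continuous [10]. The generalized Jacobian [1] of the function $y(\cdot)$ satisfies

$$ \partial y(\bar x) \subseteq \operatorname{conv} \{ \nabla y^I(\bar x) : I \in IS(\bar x) \} \tag{11} $$

[14]. Let $g_J(z) = (g_i(z))_{i \in J}$. If the assumption

FRR) For each $\bar x \in X$ and each vertex $\lambda \in \Lambda(\bar z)$ with $\bar z = (\bar x, y(\bar x))$, the matrix

$$
\begin{pmatrix}
\nabla^2_{yy} L(\bar z, \lambda) & \nabla_y g_{J(\lambda)}(\bar z)^\top & \nabla^2_{xy} L(\bar z, \lambda)\\
\nabla_y g_{J(\lambda)}(\bar z) & 0 & \nabla_x g_{J(\lambda)}(\bar z)
\end{pmatrix}
$$

has full row rank

is added to C), SCQ), SSOC), and CR), then equality holds in (11) [5].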


Optimality Conditions 


Even under very restrictive assumptions, problem (5) is 
a nondifferentiable, nonconvex optimization problem. 
For the derivation of necessary and sufficient optimal- 
ity conditions, various approaches of nondifferentiable 
optimization can be used. 


Conditions Using the Directional Derivative 
of the Solution Function 


Let $X = \{x: h_k(x) \le 0,\ k \in K\}$, where $h_k \in C^1(\mathbb R^n, \mathbb R)$, $k \in K$, and $K$ is a finite set. Generalizations of the following results to larger classes of constraint sets are obvious. Let $\bar x \in X$, $y(\bar x) \in W(\bar x)$, $\bar z = (\bar x, y(\bar x))$. Let the assumptions C), SCQ), SSOC), and CR) as well as

MFCQ) There exists a direction $d$ such that $\nabla h_k(\bar x)\, d < 0$ for all $k \in \bar K := \{l: h_l(\bar x) = 0\}$

be valid. Then, if $\bar x$ is a locally optimal solution of problem (5) (and thus of the bilevel problem (2)), there cannot exist a feasible direction of descent, i.e.

$$ \nabla_x F(\bar z)\, r + \nabla_y F(\bar z)\, y'(\bar x; r) \ge 0 \tag{12} $$

for all directions $r$ satisfying $\nabla h_k(\bar x)\, r \le 0$, $k \in \bar K$. By use of the above approach for computing the directional derivative of the solution function $y(\cdot)$, this necessary optimality condition can be verified by solving a bilevel optimization problem of minimizing the function (12) subject to the condition that $y'(\bar x; r)$ is an optimal solution of problem (9). By replacing problem (9) with its Karush-Kuhn-Tucker conditions and applying an active index set strategy, the following condition is obtained: if $\bar x$ is a locally optimal solution of problem (2), then

$$ v := \min \{ \varphi(\bar x, I) : I \in IS(\bar x) \} \ge 0, \tag{13} $$

where $\varphi(\bar x, I)$ denotes the optimal objective function value of the problem

$$
\begin{aligned}
& \nabla_x F(\bar z)\, r + \nabla_y F(\bar z)\, d \;\to\; \min_{r, d, \alpha},\\
& \nabla h_k(\bar x)\, r \le 0, \quad k \in \bar K,\\
& \nabla^2_{xy} L(\bar z, \lambda)\, r + \nabla^2_{yy} L(\bar z, \lambda)\, d + \nabla_y g_I(\bar z)^\top \alpha = 0,\\
& \nabla_x g_i(\bar z)\, r + \nabla_y g_i(\bar z)\, d = 0, \quad i \in I,\\
& \nabla_x g_i(\bar z)\, r + \nabla_y g_i(\bar z)\, d \le 0, \quad i \in \mathcal I \setminus I,\\
& \alpha_i \ge 0, \quad i \in I \setminus J(\lambda), \qquad \|r\| = 1,
\end{aligned}
$$

and $\lambda$ is the unique vertex of $\Lambda(\bar z)$ with $J(\lambda) \subseteq I \subseteq \mathcal I$. Problem (13) is a combinatorial optimization problem and can be solved by enumeration algorithms.

In [2] a more general necessary optimality condi- 
tion is given even without assuming CR). Then, the di- 
rectional derivative of the solution function is in gen- 
eral discontinuous with respect to perturbations of the 
direction and is to be replaced by the contingent deriva- 
tive of the solution function. 

In [17] it is shown that nonexistence of directions 
of descent in the tangent cone to the feasible set is also 
a necessary optimality condition for MPECs. In general,
this tangent cone is not convex. Using a so-called basic 
constraint qualification it is shown that it is equal to the 
union of a finite number of polyhedral cones. The re- 
sulting condition is similar to (13). Dualizing this con- 
dition, some kind of a Karush-Kuhn-Tucker condition 
for MPECs is obtained. 

It is also possible to obtain a sufficient optimality condition by use of the directional derivative. Namely, if the strict inequality $v > 0$ holds for the optimal function value in (13), then for each $c \in (0, v)$ there exists $\varepsilon > 0$ such that

$$ F(x, y(x)) \ge F(\bar x, y(\bar x)) + c\,\|x - \bar x\| $$

for all $x$ satisfying $h(x) \le 0$ and $\|x - \bar x\| < \varepsilon$ [2]. Necessary and sufficient optimality conditions of second order based on the implicit function approach (applied to the more general MPEC formulation) are given in [17].


Conditions Using the Generalized Jacobian 
of the Solution Function 


By [1], the generalized differential of the function $G(x) := F(x, y(x))$ is equal to

$$ \partial G(\bar x) = \operatorname{conv} \{ \nabla_x F(\bar z) + \nabla_y F(\bar z)\, \omega : \omega \in \partial y(\bar x) \}, \tag{14} $$

provided that the conditions C), SCQ), SSOC), and CR) are satisfied. Hence, the application of the necessary optimality conditions from Lipschitz optimization to problem (5) leads to necessary optimality conditions for the bilevel problem (2). Thus, if $\bar x$ is a locally optimal solution of problem (2) and the assumptions C), SCQ), SSOC), CR), and MFCQ) are satisfied, then there exist Lagrange multipliers $\gamma_k \ge 0$, $k \in \bar K$, such that

$$ 0 \in \partial G(\bar x) + \sum_{k \in \bar K} \gamma_k \nabla h_k(\bar x). $$

This is an obvious generalization of the necessary optimality condition given in [4], where no upper level constraints appeared in (2). It is also a special case of the results in [19], where the general constraint set $x \in X$ in the upper level problem (2) is used together with more restrictive assumptions for the lower level problem. For the use of this necessary optimality condition in computations, the explicit description of the generalized Jacobian in (11) (with equality instead of inclusion) is needed.
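Formula (14) can be evaluated by hand for a one-dimensional example chosen for illustration; it is an assumption, not part of the source. The follower solves $\min_y (y - x)^2$ over $0 \le y \le 1$, the leader has $F = (x - 1)^2 + (y - 1.5)^2$, and $X = \mathbb R$.

At $\bar x = 1$ the constraint $y \le 1$ is active with multiplier 0. The selection functions are therefore $y^1(x) = x$ (for $I = \emptyset$) and $y^2(x) = 1$ (for $I$ = the active constraint). FRR) holds here: with $J(\lambda) = \emptyset$ the matrix reduces to $(2\ \ {-2})$, so equality holds in (11). In one dimension the convex hull in (14) is just an interval.

```python
# Illustrative data: y(x) = min(max(x, 0), 1), F(x, y) = (x - 1)**2 + (y - 1.5)**2, X = R.
x_bar = 1.0
y_bar = min(max(x_bar, 0.0), 1.0)

# Jacobians of the selection functions y^1(x) = x and y^2(x) = 1 at x_bar;
# by (11) with equality, their convex hull is the generalized Jacobian of y.
piece_jacobians = [1.0, 0.0]

grad_x_F = 2.0 * (x_bar - 1.0)
grad_y_F = 2.0 * (y_bar - 1.5)

# (14): conv{grad_x F + grad_y F * w} is the interval spanned by the endpoint values.
ends = [grad_x_F + grad_y_F * w for w in piece_jacobians]
clarke_interval = (min(ends), max(ends))
print(clarke_interval)                                   # the interval [-1, 0]
print(clarke_interval[0] <= 0.0 <= clarke_interval[1])   # True: Clarke stationary
```

Since $0 \in \partial G(1) = [-1, 0]$ and there are no upper level constraints, the necessary condition above holds at $\bar x = 1$ without multipliers.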


Solution Algorithms 


The implicit function approach leads to the problem 
(5) of minimizing a nondifferentiable, nonconvex, im- 
plicitly determined function on a fixed set. Any algo- 
rithm solving nonsmooth optimization problems can 
be applied to this problem. Due to the structure of (5) 
the computation of function values and derivative in- 
formation for the objective function is expensive. Two 
types of algorithms are proposed: descent and bundle 
algorithms. The convergence proofs show that the algo- 
rithms converge to points where the above optimality 
conditions are satisfied, i.e. to solutions where no de- 
scent direction exists respectively to Clarke stationary 
points. 
Descent Algorithms

Let

$$ X = \{x: h_k(x) \le 0,\ k \in K\}. $$

Descent algorithms are iterative methods which compute a sequence of feasible points $\{x^i\}_{i \in \mathbb N}$ by $x^{i+1} = x^i + t_i r^i$, $\forall i$, where $r^i$ is a feasible direction of descent and $t_i$ is a stepsize. For bilevel problems a feasible direction of descent is obtained by minimizing the function (12)

$$ \nabla_x F(\bar z)\, r + \nabla_y F(\bar z)\, y'(\bar x; r) $$

subject to $r$ being an inner direction of the cone of feasible directions to $X$:

$$ \min_{\alpha, r} \{ \alpha : \nabla_x F(\bar z)\, r + \nabla_y F(\bar z)\, y'(\bar x; r) \le \alpha,\ \nabla h_k(\bar x)\, r \le \alpha,\ k \in \bar K,\ \|r\| \le 1 \}. $$
Inserting the Karush-Kuhn-Tucker conditions of the 
quadratic optimization problem (9) for the computa- 
tion of y’(x;r) and again using an active set strategy 
this problem is converted into an equivalent combi- 
natorial optimization problem. For the computation of 
a stepsize, e. g., Armijo’s rule can be applied. Such an 
algorithm is described in [6,8,17]. In [6] it is also in- 
vestigated how this idea can be generalized to the case 
when the lower level problem (1) is not assumed to have 
a uniquely determined optimal solution for all values of 
the parameter. In [17] this approach is applied to the 
more general MPEC. 
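A descent method of this type can be sketched for a one-dimensional example chosen for illustration. The follower solves $\min_y (y - x)^2$ over $0 \le y \le 1$, the leader has $F = (x - 1)^2 + (y - 1.5)^2$, and $X = \mathbb R$. In this example $y'(x; r)$ is available in closed form, so it is used instead of solving (9).

With $\|r\| \le 1$ in one dimension, the direction problem reduces to comparing $r = -1$ and $r = +1$. The stepsize follows Armijo's rule with backtracking. The parameter values are arbitrary choices.

```python
def y_of_x(x):
    """Follower: min_y (y - x)**2 over Y = [0, 1], so y(x) = min(max(x, 0), 1)."""
    return min(max(x, 0.0), 1.0)


def y_dir(x, r):
    """Directional derivative y'(x; r) of the piecewise linear solution function."""
    if 0.0 < x < 1.0:
        return r
    if x == 1.0:
        return min(r, 0.0)
    if x == 0.0:
        return max(r, 0.0)
    return 0.0  # outside [0, 1] the response is locally constant


def G(x):
    y = y_of_x(x)
    return (x - 1.0) ** 2 + (y - 1.5) ** 2


def dG(x, r):
    """Left-hand side of (12): grad_x F * r + grad_y F * y'(x; r)."""
    y = y_of_x(x)
    return 2.0 * (x - 1.0) * r + 2.0 * (y - 1.5) * y_dir(x, r)


def descent(x, tol=1e-6, sigma=0.1, max_iter=200):
    for _ in range(max_iter):
        # X = R and ||r|| <= 1: in 1-D the best direction is r = -1 or r = +1
        r = min((-1.0, 1.0), key=lambda s: dG(x, s))
        slope = dG(x, r)
        if slope >= -tol:  # no descent direction: (12) holds up to tol
            break
        t = 1.0
        while t > 1e-12 and G(x + t * r) > G(x) + sigma * t * slope:
            t *= 0.5       # Armijo backtracking
        x += t * r
    return x


x_star = descent(0.2)
print(x_star, G(x_star))
```

The iterates approach $\bar x = 1$, where $G$ has a kink. There (12) holds, since the directional derivative is 0 for $r = 1$ and 1 for $r = -1$.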


Bundle Algorithms 


Let $X = \mathbb R^n$. Different constraint sets can be treated by use of approaches in [12]. As in descent algorithms, in bundle algorithms for minimizing Lipschitz nonconvex functions a sequence of iterates $\{x^i\}_{i \in \mathbb N}$ with $x^{i+1} = x^i + t_i r^i$, $\forall i$, is computed. For computing a direction, a model of the function to be minimized is used. In the paper [23], the following bundle algorithm has been proposed. Let two sequences of points $\{x^i\}_{i=1}^k$, $\{z^i\}_{i=1}^k$ have already been computed. Then, for minimizing a nonconvex function $G(x)$, this model has the form

$$ \max_{1 \le i \le k} \{ v(z^i)^\top d - \alpha_{k,i} \} + \frac{u_k}{2}\, d^\top d \;\to\; \min_d, \tag{15} $$

where

$$ \alpha_{k,i} = \max \{ |G(x^k) - v(z^i)^\top (x^k - z^i) - G(z^i)|,\ c_0 \|x^k - z^i\|^2 \}, $$

$v(z^i)$ is a subgradient of the function $G(x)$ at $x = z^i$ and $u_k$ is a weight. If the direction computed by minimizing the model function (15) realizes a sufficient decrease, a serious step is made (i.e. $t_k = 1$ is used). Otherwise, either a short step (which means that $t_k$ is computed according to a stepsize rule) or a null step (only the model is updated by computing a new subgradient) is made.
For updating the model (15), in each iteration of the 
bundle algorithm a subgradient of the objective func- 
tion is needed. For its computation formula (14) can be 
used. 
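The cutting-plane model (15) and its linearization errors can be evaluated directly. The sketch below is one-dimensional and uses golden-section search to minimize the model, which is valid because the model is convex; practical bundle codes solve a quadratic program instead. The test function $G(x) = |x|$, the bundle, $c_0 = 0.1$ and $u_k = 1$ are illustrative assumptions.

```python
import math


def linearization_errors(xk, G_xk, bundle, c0):
    """alpha_{k,i} = max(|G(x^k) - v_i (x^k - z_i) - G(z_i)|, c0 |x^k - z_i|^2)
    for a bundle of triples (z_i, G(z_i), v_i), v_i a subgradient at z_i."""
    return [max(abs(G_xk - v * (xk - z) - G_z), c0 * (xk - z) ** 2)
            for z, G_z, v in bundle]


def model(d, bundle, alphas, u):
    """Model (15) in one dimension: max_i {v_i d - alpha_{k,i}} + (u/2) d^2."""
    cut = max(v * d - a for (_, _, v), a in zip(bundle, alphas))
    return cut + 0.5 * u * d * d


def minimize_model(bundle, alphas, u, lo=-10.0, hi=10.0, iters=200):
    """Golden-section search on [lo, hi]; valid because the model is convex in d."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        p = b - inv_phi * (b - a)
        q = a + inv_phi * (b - a)
        if model(p, bundle, alphas, u) < model(q, bundle, alphas, u):
            b = q
        else:
            a = p
    return 0.5 * (a + b)


# G(x) = |x|; bundle points z = 1 and z = -1 with subgradients +1 and -1; x^k = 1.
bundle = [(1.0, 1.0, 1.0), (-1.0, 1.0, -1.0)]
alphas = linearization_errors(1.0, 1.0, bundle, c0=0.1)
d = minimize_model(bundle, alphas, u=1.0)
print(alphas, d, 1.0 + d)  # alphas [0.0, 2.0]; d close to -1
```

The cut from $z = -1$ is shifted down by $\alpha_{k,2} = 2$. The model minimizer $d \approx -1$ points from $x^k = 1$ to the minimizer 0 of $|x|$.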

The bundle algorithm is applied to problem (5) in 
[4,18,20]. In [4], the lower level problem is not assumed 
to have a uniquely determined optimal solution for 
all parameter values. The Lipschitz optimization prob- 
lem (5) is obtained via a regularization approach in the 
lower level problem (1). 

Numerical experience for solving bilevel problems 
(in the formulation (2) as well as in the more general 
MPEC formulation) with the bundle algorithm is re- 
ported in [18,20]. 
The bilevel programming (BP) problem is a hierarchical
optimization problem where a subset of the variables
is constrained to be a solution of a given optimization
problem parameterized by the remaining variables. The
BP problem is a multilevel programming problem with 
two levels. The hierarchical optimization structure ap- 
pears naturally in many applications when lower level 
actions depend on upper level decisions. The applica- 
tions of bilevel and multilevel programming include 
transportation (taxation, network design, trip demand 
estimation), management (coordination of multidivi- 
sional firms, network facility location, credit alloca- 
tion), planning (agricultural policies, electric utility), 
and optimal design. 

In mathematical terms, the BP problem consists of 
finding a solution to the upper level problem 


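$$
\begin{aligned}
\min_{x,y}\ & F(x,y)\\
\text{s.t. }\ & g(x,y) \le 0,
\end{aligned}
$$

where $y$, for each value of $x$, is the solution of the lower level problem:

$$
\begin{aligned}
\min_{y}\ & f(x,y)\\
\text{s.t. }\ & h(x,y) \le 0,
\end{aligned}
$$

with $x \in \mathbb R^{n_x}$, $y \in \mathbb R^{n_y}$, $F, f: \mathbb R^{n_x + n_y} \to \mathbb R$, $g: \mathbb R^{n_x + n_y} \to \mathbb R^{n_u}$, and $h: \mathbb R^{n_x + n_y} \to \mathbb R^{n_l}$ ($n_x$, $n_y$, $n_u$, and $n_l$ are positive integers). The lower level problem is also referred to as the follower's problem or the inner problem.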
In a similar way, the upper level problem is also called 
the leader’s problem or the outer problem. One could 
generalize the BP problem in different ways. For in- 
stance, if either x or y or both are restricted to take inte- 
ger values we would obtain an integer BP problem [22]. 
Or, if we replace the lower level problem by a varia- 
tional inequality we would get a generalized BP prob- 
lem [15]. 

For each value of the upper level variables $x$, the lower level constraints $h(x, y) \le 0$ define the constraint set $\Omega(x)$ of the lower level problem:

$$ \Omega(x) = \{ y : h(x, y) \le 0 \}. $$

Then, the set $M(x)$ of solutions of the lower level problem is given by minimizing the lower level function $f(x, y)$ over all values $y \in \Omega(x)$ of the lower level variables:

$$ M(x) = \{ y : y \in \operatorname{argmin} \{ f(x, y) : y \in \Omega(x) \} \}. $$


Given these definitions, the BP problem can be reformulated as:

$$
\begin{aligned}
\min_{x,y}\ & F(x,y)\\
\text{s.t. }\ & g(x,y) \le 0,\\
& y \in M(x).
\end{aligned}
$$

The feasible set

$$ \{ (x,y) : g(x,y) \le 0,\ y \in M(x) \} $$


of the BP problem is called the induced or inducible re- 
gion. The induced region is usually nonconvex and, in 
the presence of upper level constraints, can be discon- 
nected or even empty. In fact, consider the following BP 
problem

$$
\begin{aligned}
\min_{x,y}\ & x - 2y\\
\text{s.t. }\ & -x + 3y - 4 \le 0,
\end{aligned}
$$

where $y$, for each value of $x$, is the solution of:

$$
\begin{aligned}
\min_{y}\ & x + y\\
\text{s.t. }\ & x - y \le 0,\\
& -x - y \le 0.
\end{aligned}
$$

For this problem we have

$$ \Omega(x) = \{ y : y \ge |x| \} $$

and

$$ M(x) = \{ |x| \}. $$

Thus, the induced region is given by:

$$
\begin{aligned}
&\{(x,y) : -x + 3y - 4 \le 0,\ y \in M(x)\}\\
&\quad = \{(x,y) : y = -x,\ -1 \le x \le 0\} \cup \{(x,y) : y = x,\ 0 \le x \le 2\},
\end{aligned}
$$

which is nonconvex but connected. If the upper level constraints were changed to

$$ -x + 3y - 4 \le 0, \qquad -y + \tfrac12 \le 0, $$

then the induced region would become

$$ \{(x,y) : y = -x,\ -1 \le x \le -\tfrac12\} \cup \{(x,y) : y = x,\ \tfrac12 \le x \le 2\}, $$

which would be a disconnected set. In either case the BP problem has two local minimizers, $(-1, 1)$ and $(2, 2)$, and one global minimizer, $(-1, 1)$.
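The example can be checked numerically by sampling $x$ on a grid, using the closed form $M(x) = \{|x|\}$ and keeping the points that satisfy the upper level constraints. The sketch uses a grid spacing of 0.01 over $[-3, 3]$, and its helper names are hypothetical. It marks a grid point as a local minimizer when its objective value is not larger than at its neighbours inside the induced region.

```python
def follower(x):
    """Lower level: min_y x + y s.t. x - y <= 0, -x - y <= 0, so M(x) = {|x|}."""
    return abs(x)


def induced_region(upper, lo=-300, hi=300, scale=100):
    """Grid points (i, x, y), x = i/scale, y in M(x), satisfying the upper constraints."""
    pts = []
    for i in range(lo, hi + 1):
        x = i / scale
        y = follower(x)
        if all(c(x, y) <= 0 for c in upper):
            pts.append((i, x, y))
    return pts


def local_minimizers(pts, F):
    """Grid points whose F value does not exceed that of their grid neighbours in the region."""
    idx = {i: (x, y) for i, x, y in pts}
    result = []
    for i, (x, y) in idx.items():
        nbrs = [idx[j] for j in (i - 1, i + 1) if j in idx]
        if all(F(x, y) <= F(*p) for p in nbrs):
            result.append((x, y))
    return result


def F(x, y):
    return x - 2 * y


connected = [lambda x, y: -x + 3 * y - 4]
disconnected = connected + [lambda x, y: -y + 0.5]
for upper in (connected, disconnected):
    mins = local_minimizers(induced_region(upper), F)
    print(mins, min(mins, key=lambda p: F(*p)))
```

Both variants report the local minimizers $(-1, 1)$ and $(2, 2)$ and the global minimizer $(-1, 1)$. In the disconnected case the two pieces end at $|x| = 1/2$, but those endpoints are not local minimizers.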

This simple example illustrates many features of 
bilevel programming like the nonconvexity and the dis- 
connectedness of the induced region and the existence 
of different local minimizers. In this example the in- 
duced region is compact. In fact, compactness of the in- 
duced region is important for the existence of a global 
minimizer and can be guaranteed under appropriate 
conditions [9]. 

The original formulation for bilevel programming 
appeared in 1973, in a paper authored by J. Bracken 
and J. McGill [5], although it was W. Candler and R. 
Norton [7] who first used the designation bilevel and 
multilevel programming. However, it was not until the 
early 1980s that these problems started receiving the at- 
tention they deserve. Motivated by the game theory of 
H. Stackelberg [20], several authors studied bilevel pro- 
gramming intensively and contributed to its prolifera- 
tion in the mathematical programming community. 

The theory of bilevel programming focuses on 
forms of optimality conditions and complexity results. 
A number of authors ([8,16], just to cite a few) have 
established original forms of optimality conditions for 
bilevel programming by either considering reformula- 
tions of the BP problem or by making use of nondif- 
ferentiable optimization concepts or even by appealing 
to the geometry of the induced region. The complex- 
ity of the problem has been addressed by a number 
of authors. It has been proved that even the linear BP 
problem, where all the involved functions are affine, is 
a strongly NP-hard problem [10]. It is not hard to con- 
struct a linear BP problem where the number of local 
minima grows exponentially with the number of vari- 
ables [6]. Other theoretical results of interest have been 
established connecting bilevel programming to other 
fields in mathematical programming. For instance, one 
can show that minimax problems and linear, integer, 
bilinear and quadratic programming problems are spe- 
cial cases of BP. Other classes of problems different 
from but related to BP are multi-objective optimization 
problems and static Stackelberg problems. See [21] for 
references in these topics. 

Many researchers have designed algorithms for the 
solution of the BP problem. One class of techniques 
consists of extreme point algorithms and has been 


mostly applied to the linear BP problem because for this 
problem, if there is a solution, then there is at least one 
global minimizer that is an extreme point of $\Omega$ [17]. Two other classes of algorithms are branch and bound algorithms and complementarity pivot algorithms, which have in common that they exploit the complementarity part of the necessary optimality conditions of the
lower level problem (assumed convex in y so that the 
necessary optimality conditions, under an appropriate 
constraint qualification, are also sufficient). These two 
classes of algorithms have been applied mostly to the 
case where the upper level is linear and the lower level 
is linear or convex quadratic (see for instance [10] and 
[12]) and, as the extreme point algorithms, find a global 
minimizer of the BP problem. On the other hand, the 
algorithms designed to solve nonlinear forms of BP ap- 
peal to descent directions (see, among others [14] and 
[18]) and penalty functions (for instance [1]) and are 
expected to find a local minimizer. 

For additional material about bilevel programming, 
see the books [3,19], the survey papers [2,4,11,13,23], 
and the bibliography review [21]. 
Decision-making in large, hierarchical organizations
rarely proceeds from a single point of view. Two of the
most prominent aspects of such organizations are spe- 
cialization closely followed by coordination. The for- 
mer arises from a practical need to isolate individual 
jobs or operations and to assign them to specialized 
units. This leads to departmentalization; however, to 
accomplish the overall task, the specialized units must 
be coordinated. The related process divides itself natu- 
rally into two parts: 
i) the establishment of individual goals and operating 
rules for each unit; and 
ii) the enforcement of these rules within the work en- 
vironment. 
The first deals with the selection of appropriate divi- 
sional or lower level performance criteria and, more 
generally, the selection of the modes of coordination 
and control. The second relates to the choice of coor- 
dination inputs. 
An important control variable in the theory of de- 
partmentalization is the degree of self-containment of 
the organization units. A unit is self-contained to the
extent and degree that the conditions of carrying out 
its activities are independent of what is done elsewhere 
in the system. The corporate or higher level unit is 
then faced with the coordination problem of favorably 
resolving the divisional unit interactions. Mathemati- 
cal programming has often been used as the basis for 
modeling these interactions with decomposition tech- 
niques providing solutions to problems of large scale 
(see, e.g., [9]). The central idea underlying decompo- 
sition techniques is very simple and can be envisioned 
as the following algorithmic process: top management, 
with its set of goals, asks each division of the company 
to calculate and submit an optimal production plan as 
though it were operating in isolation. Once the plans 
are submitted, they are modified with the overall bene- 
fit of the company in mind. Marginal profit figures are 
used to successively reformulate the divisional plans at 
each stage in the algorithm. An output plan ultimately 
emerges which is optimal for the company as a whole 
and which therefore represents the solution to the orig- 
inal programming problem. 

Although this procedure attempts to mimic corporate behavior, it fails on two counts. The first relates to the assumption that it is possible to derive a single objective or utility function which adequately captures the goals of both top management and each subordinate division. The second stems from a lack of communication among the components of the organization;
at an intermediary stage of the calculations there is no 
guarantee that each division’s plan will satisfy the cor- 
porate constraints. In particular, if the production of 
some output by division k imposes burdens on other 
divisions by using up a scarce company resource, or by 
causing an upward shift in the cost functions pertaining 
to some other company operation, division k’s calcula- 
tion is likely to lead it to overproduce this item from 
the point of view of the company because the costs to 
other divisions will not enter its accounts. This is the 
classical problem of external diseconomies. Similarly, if 
one of division k’s outputs yields external economies 
where a rise in its production increases the profitabil- 
ity of other divisions, division k may (considering just 
its own gains in its calculations) not produce enough 
of this product to maximize the company’s profits as 
a whole. This may result in a final solution that does not realistically reflect the production plan that probably would have been achieved had each division been given the degree of autonomy it exercises in practice.

Another way of treating the multilevel nature of 
the resource allocation problem is through goal pro- 
gramming. T. Ruefli [11] was the first to apply this 
technique by proposing a generalized goal decomposition model. Others expanded on his work, developing models capable of representing a wide range of
operational characteristics including informational au- 
tonomy, interdependent strategies, and bounded ra- 
tionality or individual goals. Combinations of these 
models have been used to solve problems related to 
government regulation, distribution, and control [9]. In 
[3], J.F. Bard presents an approach that derives from 
the complementary strategies of two-stage optimization 
[7], ▸ Bilevel linear programming, [10] and equilibrium analysis [13]. Decision-making between levels is
assumed to proceed sequentially but with some amount 
of independence to account for the divergence of cor- 
porate and subordinate objectives. At the divisional 
level each unit simultaneously attempts to maximize its 
own production function and, in so doing, produces 
a balance of opposing forces. An example based on an 
integrated paper company operating three divisions is 
given to illustrate the differences between centralized 
and decentralized control. The corporate unit has little 
direct control over divisional schedules but may set in- 
ternal transfer prices which affect production capacity 
and profits. 


Multilevel Model 


A distinguishing characteristic of multilevel systems is 
that the decision maker at one level may be able to influ- 
ence the behavior of a decision maker at another level 
but not completely control his actions. In addition, the 
objective functions of each unit may, in part, be deter- 
mined by variables controlled by other units operating 
at parallel or subordinate levels. For example, policies effected by corporate management relating to resource allocation and benefits may curtail the set of strategies available to divisional management. In turn, policies adopted at the lower levels affecting productivity
and marketing may play a role in determining overall 
profitability and growth. W.F. Bialas and M.H. Karwan 
[7] have noted the following common features of mul- 
tilevel organizations: 
1) interactive decision-making units exist within a predominantly hierarchical structure;
2) each subordinate level executes its policies after, and 
in view of, decisions made at a superordinate level; 
3) these extramural effects enter a decision maker’s 
problem through his objective function and feasible 
strategy set. 
The need for specialization and decentralization has 
traditionally been met by the establishment of profit 
centers. In this context, divisions or departments are 
viewed as more or less independent units charged with 
the responsibility of operating in the best possible man- 
ner so as to maximize profit under the given con- 
straints imposed by top management. The problem of 
decentralization is essentially how to design and impose 
constraints on the department units so that the well- 
being of the overall corporation is assured. The tradi- 
tional way to coordinate decentralized organizations is 
by means of the pricing mechanism; coordination is de- 
signed by analogy with the operation of a free market or 
competitive economy. Exchange of products between 
departments is allowed and internal prices are specified 
for the exchange commodities. The problem of effective 
decentralization reduces to the selection of the internal 
prices. 

The framework presented in this article is an exten- 
sion of the bilevel programming problem introduced in 
▸ Bilevel linear programming, and embodies a corporate management unit at the higher level and $M$ divisions or subordinate units at the lower level. The latter
may be viewed as either separate operating divisions of 
an organization or coequal departments within a firm, 
such as production, finance, and sales. This structure 
can be extended beyond two levels (e. g., see [4,9]) with 
the realization that attending behavioral and opera- 
tional relationships become much more difficult to con- 
ceptualize and describe. 

To formulate the problem mathematically, suppose the higher level decision maker wishes to maximize his objective function $F$ and each of the $M$ divisions wishes to maximize its own objective function $f^i$. Control of the decision variables is partitioned among the units such that the higher level decision maker may select a vector $x^0 \in S^0 \subseteq \mathbb{R}^{n_0}$ and each lower level decision maker may select a vector $x^i \in S^i \subseteq \mathbb{R}^{n_i}$, $i = 1,\dots,M$. Letting $x = (x^0,\dots,x^M)$ and $n = \sum_{i=0}^{M} n_i$, in the most general case we have $F, f^1,\dots,f^M : \mathbb{R}^n \to \mathbb{R}^1$. For each $i$, we write $x = (x^i, \hat{x}^i)$, where $\hat{x}^i$ collects the components of $x$ other than $x^i$. It shall be assumed that the corporate unit has the first choice and selects a strategy $x^0 \in S^0$, followed by the $M$ subordinate units, who select their strategies $x^i \in S^i$ simultaneously. In addition, the choice made at the higher level may affect the set of feasible strategies available at the lower level, while each lower-level decision maker may influence the choices available to his peers. The strategy sets will be given explicitly by

$$S^0 = \{x^0 : g^0(x^0, \hat{x}^0) \le 0\},$$
$$S^i = \{x^i : g^i(x^i, \hat{x}^i) \le 0\}, \quad i = 1,\dots,M,$$

where $g^0 = (g^0_1,\dots,g^0_{m_0}) : \mathbb{R}^n \to \mathbb{R}^{m_0}$ and $g^i : \mathbb{R}^n \to \mathbb{R}^{m_i}$, $i = 1,\dots,M$.

To assure that the problem is well posed, it is common to assume that all functions are twice continuously differentiable and that the sets $S^i$, $i = 0,\dots,M$, are nonempty and compact; i.e., the $i$th unit always has some recourse. The bilevel multidivisional programming problem (BMPP) can now be defined:

$$\max_{x^0} \; F(x^0, \hat{x}^0) \qquad (1)$$
$$\text{s.t.} \;\; g^0(x^0, \hat{x}^0) \le 0, \qquad (2)$$
$$\max_{x^i} \; f^i(x^i, \hat{x}^i) \qquad (3)$$
$$\text{s.t.} \;\; g^i(x^i, \hat{x}^i) \le 0, \quad i = 1,\dots,M. \qquad (4)$$

When $M = 0$, problem (1)-(4) reduces to a standard mathematical program; when $M = 1$, a bilevel program results; when (1) is removed, an equilibrium programming problem remains [13]. A solution to the latter is often taken as an equilibrium point; call it $x^* = (x^i, \hat{x}^i)$, where $x^i$ solves subproblem (3)-(4), $i = 1,\dots,M$, for $\hat{x}^i$ given. Thus, $x^*$ represents a point of stability: no incentive exists at $x^*$ for any of the divisions to deviate from $x^i$, because each has optimized its individual objective function. For the linear BMPP, results similar to those presented for the linear bilevel programming problem in ▸ Bilevel linear programming hold
(see [3]). 
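The equilibrium program that remains when (1) is removed can be illustrated with a minimal sketch: hold the corporate decision fixed (here a hypothetical internal transfer price `p`) and let two divisions repeatedly play best responses until neither wants to deviate. The quadratic profit functions, the interaction coefficient `beta`, the helper names `best_response` and `equilibrium`, and all numbers are invented for illustration; this is not the method of [3].

```python
# Hypothetical two-division illustration (not taken from [3]): the corporate
# decision x^0 is held fixed as an internal transfer price p, and division i
# chooses its output q_i in [0, cap_i] to maximize
#     f^i(q_i, q_j) = (p - c_i) * q_i - 0.5 * q_i**2 - beta * q_i * q_j,
# where beta > 0 models an external diseconomy imposed by the other division.


def best_response(p, c_i, beta, q_other, cap):
    """Maximizer of f^i over [0, cap] for a fixed output q_other of the peer."""
    # f^i is strictly concave in q_i, so the box-constrained maximizer is the
    # stationary point p - c_i - beta * q_other clipped to [0, cap].
    return min(max(p - c_i - beta * q_other, 0.0), cap)


def equilibrium(p, costs, beta, caps, tol=1e-12, max_iter=10_000):
    """Simultaneous (Jacobi) best-response iteration on subproblems (3)-(4)."""
    q = [0.0, 0.0]
    for _ in range(max_iter):
        new_q = [
            best_response(p, costs[0], beta, q[1], caps[0]),
            best_response(p, costs[1], beta, q[0], caps[1]),
        ]
        if max(abs(new - old) for new, old in zip(new_q, q)) < tol:
            return new_q
        q = new_q
    raise RuntimeError("best-response iteration did not converge")


q = equilibrium(p=10.0, costs=(2.0, 4.0), beta=0.5, caps=(100.0, 100.0))
print(q)  # close to [20/3, 8/3]
# At an equilibrium point no division gains by deviating on its own.
assert abs(q[0] - best_response(10.0, 2.0, 0.5, q[1], 100.0)) < 1e-9
assert abs(q[1] - best_response(10.0, 4.0, 0.5, q[0], 100.0)) < 1e-9
```

With $|\beta| < 1$ each response changes by at most $|\beta|$ times the change in the peer's output, so the best-response map is a contraction and the simple iteration converges; no such guarantee is implied for general BMPP instances.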


Applications 


Most applications of bilevel programming, including 
bilevel multidivisional programming, that have ap- 
peared in the literature have dealt with central eco- 
nomic planning at the regional or national level. In this 
context, the government is considered the leader and
controls a set of policy variables such as tax rates, sub- 
sidies, import quotas, and price supports (e. g., see [5] 
and accompanying papers). The particular industry tar- 
geted for regulation is viewed as the follower. In most 
cases, the follower tries to maximize net income subject 
to the prevailing technological, economic, and govern- 
mental constraints. Possible leader objectives include 
maximizing employment, maximizing production of 
a given product, or minimizing the use of certain re- 
sources. 

The early work of W. Candler and R. Norton [8], 
focusing on agricultural development in northern Mex- 
ico, illustrates how bilevel programming can be used 
to analyze the dynamics of a regulated economy. Simi- 
larly, J. Fortuny-Amat and B. McCarl [10] present a re- 
gional model that pits fertilizer suppliers against local 
farm communities, while E. Aiyoshi and K. Shimizu [1] 
and Bard [3] discuss resource allocation in a decentral- 
ized firm. In the case of the latter, a central unit supplies 
resources to its manufacturing facilities which make de- 
cisions concerning production mix and output. Organi- 
zational procedures and conflicting objectives over effi- 
ciency, quality and performance lead to a hierarchical 
formulation. In a work related to the original Stack- 
elberg model of a single leader-follower oligopolistic 
market in which a few firms supply a homogeneous 
product, H.D. Sherali [12] presents an extension to N 
leader firms and discusses issues related to the exis- 
tence, uniqueness, and derivation of equilibrium so- 
lutions. His analysis provides sufficient conditions for 
some useful convexity and differentiability properties of 
the followers’ reaction curves. 

In a recent study [5], the French government has 
used bilevel programming to examine the economics of 
promoting biofuel production from farm crops within 
the petrochemical industry. The stumbling block to this policy is that industry's costs for producing fuels from hydrocarbon-based raw materials are significantly lower than its costs for producing biofuels. Without incentives
in the form of tax credits, industry will not buy farm 
output for conversion. The problem faced by the gov- 
ernment is to determine the level of tax credits for each 
final product or biofuel that industry can produce while 
minimizing public outlays. A secondary objective is to 
realize some predefined level of land usage for nonfood 
crops. Industry is assumed to be neutral in this scenario and will produce any biofuel that is profitable. In the
model, the agricultural sector is represented by a subset 
of farms in an agriculturally intensive region of France 
and is a profit maximizer. It will use the land available 
for nonfood crops only as long as the revenue gener- 
ated from this activity exceeds the difference between 
the set-aside payments now received directly from the 
government and the maintenance costs incurred un- 
der the current support program. The resultant bilevel 
model contains 3628 variables and 3230 constraints at 
the lower level, and 8 variables and 10 constraints at the 
upper level. Both objective functions are quadratic and 
all constraints are linear. 

In an earlier effort, G. Anandalingam and V. Ap- 
prey [2] investigated the problem of conflict resolution 
by postulating the existence of an arbitrator who acts as 
the leader in a Stackelberg game. They presented mod- 
els for different configurations of the resulting multi- 
level linear programs and proposed a series of solution 
algorithms. The models were illustrated with an appli- 
cation involving a water conflict problem between India 
and Bangladesh; it is shown that both parties could gain 
by the arbitration of an international agency such as the 
United Nations. 

Recently, researchers have tried to apply bilevel 
models to the network design problem arising in trans- 
portation and telecommunications systems. In the ac- 
companying formulation, a central planner controls in- 
vestment costs at the system level, while operational 
costs depend on traffic flows which are determined by 
the individual users' route selections. Because users are
assumed to make decisions so as to maximize their in- 
dividual utility functions, their choices do not necessar- 
ily coincide (and may, in fact, conflict) with the choices 
that are optimal for the system. Nevertheless, the cen- 
tral planner can influence the users’ choices by improv- 
ing some links to make them relatively more attractive 
than the others. In deciding on these improvements, the 
central planner tries to influence the users’ preferences 
in such a way that total costs are minimized. The par- 
tition of the control variables between the upper and 
lower levels naturally leads to a bilevel formulation. 

A conceptual framework for the optimization of 
Tunisia’s inter-regional highways was proposed in [6]. 
The accompanying formulation included 2683 vari- 
ables (2571 at the lower level) and 820 constraints (all 
at the lower level); the follower’s problem was divided 
into two separate subproblems as a direct consequence
of the bilevel approach. The first centered on the user- 
optimized flow requirement (user-equilibrium) and the 
second on the nonconvex improvement functions. Be- 
cause none of the standard algorithmic approaches 
could handle problems of this size, a specialized algo- 
rithm was devised to deal with each of the two lower- 
level problems separately. At each iteration, the algo- 
rithm tries to find a better compromise with the user, 
while including the smallest possible number of non- 
convex improvement functions to get the exact solu- 
tion with the minimum computational effort. Despite 
the large number of variables and constraints, optimal- 
ity was achieved. 


Solutions 


An assessment of existing algorithms for solving vari- 
ous classes of bilevel programs indicates that exact solu- 
tions can only be guaranteed for problem instances with 
up to a few hundred variables and constraints, and then 
only for the linear case. When nonlinear (nonconvex) 
functions are included in the model, virtually all algo- 
rithms stumble in the presence of more than a handful 
of variables and constraints. The ability of those work- 
ing in the field to formulate problems far outstrips the 
capacity of current techniques to solve them optimally. 

When faced with the problem of actually having 
to provide solutions to large scale formulations, re- 
searchers have inevitably fallen back on heuristics and 
ad hoc procedures. Simulated annealing, tabu search 
and genetic algorithm-based approaches are examples 
of the more formal techniques adapted, at least for the 
linear case. In many instances, code developers were 
able to demonstrate global optimality by comparing re- 
sults with exact methods. The conclusion that can be 
drawn from these observations and related experience 
is that the need for efficient algorithms remains undiminished. This is the primary reason why realistic applications continue to lag behind theory and the development of new codes.
The bilevel programming problem (abbreviation: BPP) is a mathematical program in two variables $x$ and $\theta$, where $x = x^\circ(\theta)$ is an optimal solution of another program. Specifically, BPP can be formulated in terms of two ordered objective functions $\varphi$ and $\psi$ as follows:

$$\min_{(x,\theta)} \; \varphi(x,\theta) \quad \text{s.t.} \;\; f^i(x,\theta) \le 0, \;\; i \in P, \qquad (1)$$

where $x = x^\circ(\theta)$ is an optimal solution of the program

$$\min_{(x)} \; \psi(x,\theta) \quad \text{s.t.} \;\; g^j(x,\theta) \le 0, \;\; j \in Q. \qquad (2)$$

Here the functions $\varphi, \psi, f^i, g^j : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, $i \in P$, $j \in Q$, are assumed to be continuous; $x \in \mathbb{R}^n$, $\theta \in \mathbb{R}^m$; $P$, $Q$ are finite index sets. Program (1) is often called the upper (first level, outer, leader's) problem; then (2) is the lower (second level, inner, follower's) problem. Many mathematical programs, such as minimax problems, linear integer, bilinear and quadratic programs, can be stated as special cases of bilevel programs. In view of the so-called Reduction Ansatz, developed in [18,44], semi-infinite programs can be considered as special cases of bilevel programs. For stability
and deformations of these see, e. g., [20,21]. Problems 
appearing in such seemingly unrelated areas as best ap- 
proximation problems and data envelopment analysis 
can be viewed as bilevel programs. In the former, one is 
often interested in finding a least-norm solution in the 
set of all best approximate solutions, while, in the latter, 
one wants to rank, or decrease the number of, efficient 
decision making units by a ‘post-optimality analysis’. 
For the history of bilevel programs, reviews of numerical
methods and applications, especially for connections 
with von Stackelberg games of market economy see, e. g., 
[14,22,30,39]. In this contribution we will focus only on 
optimality conditions and duality. 


Basic Difficulties 


The study of bilevel programming problems requires 
some familiarity with point-to-set topology; see, e. g., 
[1,2,6,15]. Since the lower level optimal solution mapping $x^\circ : \theta \mapsto x^\circ(\theta)$ is a point-to-set mapping (rather than a vector function), the optimal value function of the BPP may be discontinuous. This is illustrated by the following example:


Example 1 Consider the bilevel program with the upper level objective $\varphi(x,\theta) = -x_1/\theta$, the lower level objective $\psi(x,\theta) = -x_1 - x_2$, and the lower level feasible set determined by $x_1 + \theta x_2 \le 1$, $x_1 \ge 0$, $x_2 \ge 0$. The lower level optimal solutions $x = x^\circ(\theta)$ are the segment $\{x_1 + x_2 = 1,\; x_1 \ge 0,\; x_2 \ge 0\}$ for $\theta = 1$, and the singleton $[0, 1/\theta]$ when $0 < \theta < 1$. The corresponding upper level optimal solutions, i.e., the BPP optimal solutions, are the points $[1, 0]$ and $[0, 1/\theta]$, respectively. Here the corresponding optimal value of the BPP equals $0$ for $0 < \theta < 1$ and jumps to $-1$ as $\theta$ assumes the value 1.
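The jump in Example 1 can be checked with a minimal sketch (an illustration added here, with hypothetical function names). For $\theta > 0$ the lower level feasible set is the triangle with vertices $(0,0)$, $(1,0)$, $(0,1/\theta)$; since the leader's objective is linear, its minimum over the lower level optimal face (a segment at $\theta = 1$) is attained at a vertex, so enumerating vertices suffices. As in the example, the leader is assumed to pick the best lower level optimum. Python's `fractions` module keeps the arithmetic exact, so the tie between the two lower level optima at $\theta = 1$ is detected exactly.

```python
from fractions import Fraction


def lower_level_vertices(theta):
    """Vertices of {x : x1 + theta*x2 <= 1, x1 >= 0, x2 >= 0} for theta > 0."""
    return [(Fraction(0), Fraction(0)), (Fraction(1), Fraction(0)), (Fraction(0), 1 / theta)]


def psi(x):
    """Lower level objective psi(x, theta) = -x1 - x2."""
    return -x[0] - x[1]


def bpp_optimal_value(theta):
    """Optimal value of the BPP of Example 1 at a fixed theta in (0, 1]."""
    theta = Fraction(theta)
    vertices = lower_level_vertices(theta)
    psi_opt = min(psi(v) for v in vertices)
    # Lower level optimal vertices (exact comparison, no tolerance needed).
    optimal = [v for v in vertices if psi(v) == psi_opt]
    # The leader's objective -x1/theta is linear, so its minimum over the
    # optimal face is attained at one of these vertices.
    return min(-v[0] / theta for v in optimal)


for theta in (Fraction(1, 2), Fraction(99, 100), Fraction(999, 1000), Fraction(1)):
    print(theta, bpp_optimal_value(theta))
# 1/2 0
# 99/100 0
# 999/1000 0
# 1 -1
```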


Note that the lower level feasible set mapping, in Example 1, is lower semicontinuous (open) at $\theta = 1$. Hence
we conclude that discontinuity of the optimal value can 
occur even if the lower level model is stable. 

The fact that the set of optimal solutions is gener- 
ally discontinuous in a stable situation is well known in 
linear programming. It may manifest itself in a chaotic 
behavior of the optimal solutions, but not the optimal 
value, when the program is solved by computer repeatedly with small perturbations of data; see ▸ Nondifferentiable optimization: Parametric programming. The
topological loss of continuity is generally unrelated to 
the conditioning, which describes numerical sensitivity 
of the solutions relative to roundoff errors. In particu- 
lar, a linear program with an ill-conditioned coefficient 
matrix can be stable. 

Another difficulty results from the fact that the optimal solutions mapping $x^\circ : \theta \mapsto x^\circ(\theta)$ is not generally closed. Hence a BPP may not have an optimal solution even if the feasible set of the lower program is compact:


Example 2 Consider the bilinear BPP:

$$\min \; x + \theta,$$

where $x = x^\circ(\theta)$ solves

$$\min \; -x \quad \text{s.t.} \;\; x\theta = 0, \;\; 0 \le x \le 1, \;\; 0 \le \theta \le 1.$$

Here the optimal solutions mapping is the function $x^\circ(\theta) = 0$ if $\theta > 0$, and $x^\circ(\theta) = 1$ if $\theta = 0$. The feasible set of the lower level problem is a unit square in the $(\theta, x)$-plane, while the feasible set of the BPP is a noncompact set consisting of two disjoint pieces: the segment $0 < \theta \le 1$ (with $x = 0$) and the point $[0, 1]$. Since the origin is not a feasible point, the BPP does not have a solution. Note that the function $x^\circ(\theta)$ is not continuous here because the lower
level feasible set mapping is not lower semicontinuous 
at the origin, i. e., the lower level problem is unstable. 
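A few lines of Python (a hypothetical sketch, not from the source) trace the leader's objective along the BPP feasible set of Example 2: the values approach 0 as $\theta \to 0^+$, but the only feasible point with $\theta = 0$ is $[0, 1]$, where the value is 1, so the infimum 0 is never attained.

```python
def lower_level_solution(theta):
    """x°(theta) for Example 2: minimize -x s.t. x*theta = 0, 0 <= x <= 1."""
    # For theta > 0 the constraint x*theta = 0 forces x = 0; for theta = 0
    # every x in [0, 1] is feasible and minimizing -x gives x = 1.
    return 0.0 if theta > 0 else 1.0


def upper_objective(theta):
    """Leader's objective x + theta evaluated along the BPP feasible set."""
    return lower_level_solution(theta) + theta


# Along the segment 0 < theta <= 1 the objective equals theta and tends to 0 ...
for k in range(1, 6):
    theta = 10.0 ** (-k)
    print(theta, upper_objective(theta))
# ... but at theta = 0 the lower level forces x = 1, so the value jumps to 1.
print(0.0, upper_objective(0.0))
```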


Optimality 


A popular approach to the study of optimality in BPP 
is to reduce the program to a one-level program. This 
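can be done as follows: Denote the optimal value of the lower level program (2) by $\psi^\circ(\theta)$ and introduce the new constraint function $f^0(x,\theta) = \psi(x,\theta) - \psi^\circ(\theta)$. Now the BPP can be reformulated as

$$\min \; \varphi(x,\theta) \quad \text{s.t.} \;\; f^i(x,\theta) \le 0, \;\; i \in R = \{0\} \cup P. \qquad (3)$$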


Difficulties with this formulation generally include discontinuity of the leading constraint $f^0$ and the lack of classical constraint qualifications. The latter can be handled in the convex case using the results on optimality conditions from, e.g., [5,15,47]. One of the first attempts to formulate optimality conditions for bilevel programming problems, using (3), was made in [2]. However, a counterexample to these conditions was given in [4,12,17]; see also [10]. The one-level approach leads,
under assumptions that guarantee Lipschitz continuity 
of the optimal value function, to necessary conditions 
of the Fritz John type. Under a partial calmness condi- 
tion, and a constraint qualification for the lower level 
problem, one obtains conditions of the Karush-Kuhn- 
Tucker type. The concept of partial calmness is equiv- 
alent to the ‘exact penalization’ and it is satisfied, in 
particular, for the minimax problem and if the lower 
level problem is linear. This approach in a nonsmooth 
framework is used in, e. g., [11] and [46]. The relation- 
ship between the BPP and an associated exact penalty 
function was explored also in [7] to derive other types 
of necessary and sufficient optimality conditions. Other 
approaches to optimality conditions that use nonsmooth analysis include [13,19,32]. Another approach
to reducing the BPP to a single-level program is to re- 
place the lower level problem by an optimality condi- 
tion. This is usually done in formulations of numerical 
methods; see, e. g., [42]. There are also approaches that 
use the specific geometry of BPP. One of these applies 
properties of the steepest descent directions to BPP and 
it yields a necessary condition for optimality, see [33]. 
Adaptations of the well-known first and second order 
optimality conditions of mathematical programming to 
BPP appeared in [40]. Checking local optimality for lin- 
ear BPP is NP-hard; see [41]. Examples of linear BPPs 
with an exponential number of local minima can be 
generated by a technique proposed in [9]. 

Many authors have studied links between two- 
objective and bilevel programming, looking for condi- 
tions that guarantee that the optimal solution of a given 
BPP be Pareto optimal for both upper and lower level 
objective functions, and vice versa; e. g., [28,29,30,37]. 
The idea is to find an optimal solution of the BPP by 
solving a bi-objective program. It was shown in [43] 
that an optimal solution in linear BPP may not be 
a Pareto optimum for the objective function of the outer 
program and the optimal value function of the lower 
program, contrary to a claim made in [38]. The authors 
of [43] also give a sufficient condition for the implication to hold. If an optimal solution exists in the linear BPP case with a compact feasible set at the lower level, then at least one optimal solution is attained at a vertex of this set; see [3]. Necessary conditions for optimality can also be stated using marginal value formulas for optimal value functions. However, these formulas cannot rely on the usual constraint qualifications if they are to be applied to the formulation (3). One such
formula in parametric convex programming is given in 
[48] and, under slightly different assumptions, in [49]. 
In the latter, it is used in the context of data envel- 
opment analysis to rank efficiently administered uni- 
versity libraries by their radii of rigidity. Existence of 
optimal solutions is studied in [16,23,24]; constraints 
in [24] are defined by an implicit variational problem. 
Both existence and stability of solutions and approximate solutions are studied in [27]. Optimality conditions are important for checking optimality, for the formulation of duality theories, and for numerical methods.


Parametric Approach To Optimality 


A parametric approach to characterizing global and lo- 
cal optimal solutions in convex BPP can be described as 
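follows: Denote, for every $\theta$, the optimal value of (3) by

$$\varphi^\circ(\theta) = \min_{(x)} \{\varphi(x,\theta) : f^i(x,\theta) \le 0, \;\; i \in R = \{0\} \cup P\}.$$

Also, denote the feasible set in the $x$ variable by $F(\theta) = \{x : f^i(x,\theta) \le 0, \; i \in R\}$, and the feasible set in the $\theta$ variable by

$$F = \{\theta \in \mathbb{R}^m : F(\theta) \ne \emptyset\}.$$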


A parametric formulation of the BPP is

$$\min \; \varphi^\circ(\theta) \quad \text{s.t.} \;\; \theta \in F. \qquad (4)$$

Here we optimize the optimal value of the outer problem over the feasible set in the variable $\theta$, considered as a 'parameter'. The problem of the form (4) is a basic problem of parametric programming; see, e.g., ▸ Nondifferentiable optimization: Parametric programming.


It has been extensively studied in the literature from 
both the theoretical and the numerical side. In particu- 
lar, various optimality conditions have been formulated 
for it, e. g., in the context of input optimization; see [48]. 
The key observation in the parametric approach is that, under the assumption that the feasible set of the lower program is compact, every $\theta^*$ that globally solves the parametric program (4), together with the corresponding optimal solution $x^*$ of the program (3), is a global optimal solution of the bilevel program, and vice versa. However, even under the compactness assumption, both sets can be empty (as demonstrated by Example 2). A necessary and sufficient condition for global optimality in convex BPP can be given over a 'region of cooperation' in terms of the existence of a saddle point; see [15]. Let $\theta^*$ be a candidate for global optimality, and let $\{x^\circ(\theta)\}$, $\theta \in F$, denote the set of all optimal solutions at the lower level. Denote by $K(\theta^*)$ the region in the $\theta$-space where the minimal index set of active constraints

$$R^{=}(\theta) = \{i \in R : x \in \{x^\circ(\theta)\} \Rightarrow f^i(x,\theta) = 0\}$$

does not strictly increase, i.e., $K(\theta^*) = \{\theta \in F : R^{=}(\theta) \subseteq R^{=}(\theta^*)\}$. Then the region of cooperation at $\theta^*$ is defined as the set $\{(\theta, x) : \theta \in K(\theta^*),\; x \in F(\theta)\}$. One can characterize global optimality on the entire feasible set for linear BPP, and also for convex BPP provided that the constraints are 'LFS functions'; see, e.g., [35,48]. These functions form a large class of convex functions that includes all linear and polyhedral functions. Characterizations of global optimality are simplified under the so-called sandwich condition. This is a two-sided global inclusion involving the set of optimal solutions of the inner program; see, e.g., [15]. Characterizations of locally optimal parameters $\theta^*$ for convex (4) require lower semicontinuity of the optimal solutions mapping $x^\circ$. The results apply to the convex BPP with the additional assumption that the corresponding optimal solution $x^* \in \{x^\circ(\theta^*)\}$ is unique; see, e.g., [15]. The uniqueness assumption in the characterization of local optimality cannot be replaced by the requirement that the set $\{x^\circ(\theta^*)\}$ be compact. The following example illustrates a situation where a local optimum of the BPP cannot be recovered by the parametric approach.


Example 3 Consider the program $\min \varphi(x,\theta) = x\theta^2$, where $x$ solves $\min \psi(x,\theta) = 0$, subject to $-1 \le x, \theta \le 1$. Here $x^* = 1$, $\theta^* = 0$ is a local minimum of the bilevel program. But $\varphi^\circ(\theta) = -\theta^2$ and $\theta^* = 0$ is not its local
minimum; in fact, it is an isolated global maximum! 
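The contrast in Example 3 can be checked with a short sketch (hypothetical helper names `phi` and `phi_opt`). Because $\psi \equiv 0$, every feasible $x$ is lower level optimal, so $\varphi^\circ(\theta)$ is the minimum of $x\theta^2$ over $x \in [-1, 1]$; the grid approximation below is exact only because the minimizing $x = -1$ lies on the grid. The first check confirms that $\varphi \ge 0$ near $(1, 0)$ on the feasible side $x \le 1$, while $\varphi^\circ(\theta) = -\theta^2$ is negative away from $\theta = 0$.

```python
def phi(x, theta):
    """Upper level objective of Example 3."""
    return x * theta ** 2


def phi_opt(theta, n=201):
    """Optimal value phi°(theta) of (3), approximated on an x-grid over [-1, 1]."""
    # psi(x, theta) = 0 is constant, so every feasible x is lower level optimal.
    # The grid contains x = -1, where the exact minimum -theta**2 is attained.
    xs = [-1 + 2 * k / (n - 1) for k in range(n)]
    return min(phi(x, theta) for x in xs)


# (x*, theta*) = (1, 0) is a local minimum of the BPP: near it (with x <= 1),
# phi(x, theta) >= 0 = phi(1, 0).
dxs = [-k / 100 for k in range(11)]
dts = [k / 100 for k in range(-10, 11)]
assert all(phi(1 + dx, dt) >= 0 for dx in dxs for dt in dts)

# Yet phi°(theta) = -theta**2 lies strictly below phi°(0) = 0 for theta != 0.
for theta in (0.1, 0.01, -0.01, -0.1):
    print(theta, phi_opt(theta), -theta ** 2)
```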
Duality


Duality theories for bilevel programming problems can 
be formulated by adjusting the duality theories of math- 
ematical programming (see, e.g., [34]) to the single- 
objective model (3). Let us outline how this works using 
a parametric approach; we follow the ideas from [15]. 
Instead of a single 'dual', one obtains a collection of several 'subduals', each closely related to the original (primal) program. The number of these subduals is the cardinality of the set

$$\Pi = \{\Omega \subseteq R : \Omega = R^{=}(\theta) \text{ for some } \theta \in F\}.$$


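First, with each $\Omega \in \Pi$, one associates the feasible subregion $S_\Omega = \{\theta \in F : R^{=}(\theta) = \Omega\}$, the Lagrangian $L_\Omega(x,\theta;u) = \varphi(x,\theta) + \sum_{i \in R \setminus \Omega} u_i f^i(x,\theta)$, and the point-to-set mapping $F_\Omega : F \to \mathbb{R}^n$ defined by $F_\Omega(\theta) = \{x : f^i(x,\theta) \le 0, \; i \in \Omega\}$. The corresponding subdual function is

$$\Phi_\Omega(u) = \inf\{L_\Omega(x,\theta;u) : \theta \in S_\Omega,\; x \in F_\Omega(\theta)\}$$

and the subdual $(D,\Omega)$ is defined as

$$\sup\{\Phi_\Omega(u) : u \in [S_\Omega \to \mathbb{R}_+^{|R \setminus \Omega|}]\}. \qquad (5)$$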


Here $u$ belongs to the set of all nonnegative vector functions defined on $S_\Omega$. The duality results stated for partly convex programs in, e.g., [47] can be reformulated for the outer convex model and hence for BPP. In particular, if, for some $\Omega \in \Pi$, $u^* \in [S_\Omega \to \mathbb{R}_+^{|R \setminus \Omega|}]$ and an optimal solution $x^*$ of the inner program for some fixed $\theta^* \in S_\Omega$, one has $\Phi_\Omega(u^*) = \varphi(x^*, \theta^*)$, then $u^*$ solves the subdual (5) and $\theta^*$ solves (4) on $S_\Omega$.

If optimization of the optimal value function in (4) is performed from some fixed 'initial' $\theta$, but using only parameter paths that preserve continuity of the optimal solutions mapping of the lower problem, then we speak of stable BPP. This approach, in the convex case, guarantees that the optimal solutions mapping in BPP is closed and that the optimal value function is continuous, thus removing the two basic difficulties described under Basic Difficulties. However, the optimal solutions now depend on the initial choice of the parameter and on the particular class of stable paths used. Stable parametric programming has been studied in [48]; stable BPP is mentioned (but not studied) in [15]; see [36].
Introduction


A function $f(x, y)$ is called bilinear if it reduces to a linear one by fixing the vector $x$ or $y$ to a particular value. In general, a bilinear function can be represented as follows:

$$f(x,y) = a^T x + x^T Q y + b^T y,$$

where $a, x \in \mathbb{R}^n$, $b, y \in \mathbb{R}^m$, and $Q$ is a matrix of dimension $n \times m$. It is easy to see that bilinear functions form a subclass of quadratic functions. We
refer to optimization problems with bilinear objective 
and/or constraints as bilinear problems, and they can 
be viewed as a subclass of quadratic programming. 
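To make the remark that bilinear functions are quadratic concrete, stack $z = (x, y)$; the identity below is a standard rewriting added here for illustration, not a formula from the source:

```latex
z = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad
f(x,y) = \begin{pmatrix} a \\ b \end{pmatrix}^{T} z
       + \frac{1}{2}\, z^{T} \begin{pmatrix} 0 & Q \\ Q^{T} & 0 \end{pmatrix} z,
\qquad
\frac{1}{2}\, z^{T} \begin{pmatrix} 0 & Q \\ Q^{T} & 0 \end{pmatrix} z
  = \frac{1}{2}\left(x^{T} Q y + y^{T} Q^{T} x\right) = x^{T} Q y .
```

The eigenvalues of the block matrix are $\pm\sigma_k(Q)$, the singular values of $Q$ (padded with zeros when $n \neq m$), so the quadratic form is indefinite whenever $Q \neq 0$. This is one way to see that bilinear problems are generally nonconvex.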
Bilinear programming has various applications in 
constrained bimatrix games, Markovian assignment 
and complementarity problems. Many 0-1 integer pro- 
grams can be formulated as bilinear problems. An ex- 
tensive discussion of different applications can be found 


in [5]. Concave piecewise linear network flow problems, fixed charge network flow problems, and multi-item dynamic pricing problems, which are very common in supply chain management, can also be solved using bilinear formulations (see, e.g., [7,8,9]). It
should be noted that more general convex/non-convex 
optimization problems can be reduced to a bilinear 
problem as well, and different reduction techniques can 
be found in [1,2,10]. 


Formulation 


Despite the variety of bilinear problems, most practical problems involve a bilinear objective function and linear constraints, and theoretical results are derived for those cases. In our discussion we consider the following bilinear problem, which we refer to as BP:

$\min_{x \in X,\, y \in Y} f(x, y) = a^T x + x^T Q y + b^T y$,

where X and Y are nonempty polyhedra. The BP formulation is also known as a bilinear problem with a disjoint feasible region because the feasibility of x (y) is independent of the choice of the vector y (x).


Equivalence to Other Problems 


Below we discuss some theoretical results that reveal the equivalence between bilinear problems and some concave minimization problems.

Let V(X) and V(Y) denote the sets of vertices of X and Y, respectively, and let $g(x) = \min_{y \in Y} f(x, y) = a^T x + \min_{y \in Y}\{x^T Q y + b^T y\}$. Note that $\min_{y \in Y} f(x, y)$ is a linear program. Because the optimum of a linear program is attained at a vertex of the feasible region, $g(x) = \min_{y \in Y} f(x, y) = \min_{y \in V(Y)} f(x, y)$. Using this notation, the BP problem can be restated as

$\min_{x \in X,\, y \in Y} f(x, y) = \min_{x \in X}\{\min_{y \in Y} f(x, y)\} = \min_{x \in X}\{\min_{y \in V(Y)} f(x, y)\} = \min_{x \in X} g(x)$. (1)

Observe that the set of vertices of Y is finite, and for each $y \in V(Y)$, f(x, y) is a linear function of x; therefore, g(x) is a piecewise linear concave function of x. From the latter it follows that BP is equivalent to a piecewise linear concave minimization problem with linear constraints.
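The concavity of g can be observed numerically. The sketch below assumes, purely for illustration, a small bilinear function with n = 1 and m = 2 and takes Y to be the unit square given by its four vertices; it evaluates $g(x) = \min_{y \in V(Y)} f(x, y)$ as a minimum of finitely many affine functions of x. All names are hypothetical.

```python
from itertools import product


def f(x, y):
    # Example bilinear function with n = 1, m = 2: f(x, y) = x*y1 - x*y2 (a = b = 0).
    return x * y[0] - x * y[1]


V_Y = list(product([0.0, 1.0], repeat=2))   # vertices of Y = [0, 1]^2


def g(x):
    """g(x) = min over the vertices of Y of f(x, y): a pointwise minimum of affine functions."""
    return min(f(x, y) for y in V_Y)


# Here g(x) = min(0, x, -x, 0) = -|x|, a piecewise linear concave function.
for x1, x2 in [(-2.0, 2.0), (-1.0, 3.0), (0.5, 4.0)]:
    mid = 0.5 * (x1 + x2)
    assert g(mid) >= 0.5 * (g(x1) + g(x2)) - 1e-12   # concavity at the midpoint
print(g(-3.0), g(1.0))
```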
It can also be shown that any concave minimization problem with a piecewise linear separable objective function can be reduced to a bilinear problem. To establish this relationship, consider the following optimization problem:

$\min_{x \in X} \phi(x) = \sum_i \phi_i(x_i)$, (2)

where X is an arbitrary nonempty set of feasible vectors, and $\phi_i(x_i)$ is a concave piecewise linear function of only one component $x_i$, i.e.,


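$\phi_i(x_i) = \begin{cases} c_i^1 x_i + s_i^1 \;(= \phi_i^1(x_i)) & x_i \in [\lambda_i^0, \lambda_i^1) \\ c_i^2 x_i + s_i^2 \;(= \phi_i^2(x_i)) & x_i \in [\lambda_i^1, \lambda_i^2) \\ \quad\vdots & \quad\vdots \\ c_i^{n_i} x_i + s_i^{n_i} \;(= \phi_i^{n_i}(x_i)) & x_i \in [\lambda_i^{n_i - 1}, \lambda_i^{n_i}] \end{cases}$

with $c_i^1 > c_i^2 > \dots > c_i^{n_i}$. Let $K_i = \{1, 2, \dots, n_i\}$. Because of the concavity of $\phi_i(x_i)$, the function can be written in the following alternative form:

$\phi_i(x_i) = \min_{k \in K_i}\{\phi_i^k(x_i)\} = \min_{k \in K_i}\{c_i^k x_i + s_i^k\}$. (3)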
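Equation (3) can be checked on a small instance. The sketch below uses three hypothetical pieces with decreasing slopes, chosen so that consecutive pieces meet at the breakpoints 2 and 5 on the interval [0, 10], and compares the interval-wise definition with the minimum over all pieces. Decreasing slopes are what make the active piece on each interval also the smallest one.

```python
# Pieces (c^k, s^k) with c^1 > c^2 > c^3; consecutive pieces agree at the breakpoints.
pieces = [(3.0, 0.0), (1.0, 4.0), (0.5, 6.5)]
breaks = [0.0, 2.0, 5.0, 10.0]          # piece k is used on [breaks[k], breaks[k + 1]]


def phi_by_interval(x):
    """Evaluate phi_i by locating the interval that contains x."""
    for k, (c, s) in enumerate(pieces):
        if breaks[k] <= x <= breaks[k + 1]:
            return c * x + s
    raise ValueError("x outside the domain")


def phi_by_min(x):
    """Right-hand side of (3): the minimum over all linear pieces."""
    return min(c * x + s for c, s in pieces)


for x in [0.0, 1.0, 2.0, 3.5, 5.0, 8.0, 10.0]:
    assert abs(phi_by_interval(x) - phi_by_min(x)) < 1e-12
```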


Construct the following bilinear problem:

$\min_{x \in X,\, y \in Y} f(x, y) = \sum_i \sum_{k \in K_i} \phi_i^k(x_i)\, y_i^k = \sum_i \sum_{k \in K_i} (c_i^k x_i + s_i^k)\, y_i^k$, (4)

where $Y = [0, 1]^{\sum_i |K_i|}$. The proof of the following theorem follows directly from Equation (3); for details we refer to the paper [7].


Theorem 1 If (x*, y*) is a solution of problem (4), then x* is a solution of problem (2).

Observe that X is not required to be a polytope. If X is a polytope, then the structure of problem (4) is similar to BP.

Furthermore, it can be shown that any quadratic concave minimization problem can be reduced to a bilinear problem. Specifically, consider the following optimization problem:

$\min_{x \in X} \phi(x) = 2a^T x + x^T Q x$, (5)

where Q is a symmetric negative semi-definite matrix. Construct the following bilinear problem:

$\min_{x \in X,\, y \in Y} f(x, y) = a^T x + a^T y + x^T Q y$, (6)

where Y = X.

Theorem 2 (see [4]) If x* is a solution of problem (5), then (x*, x*) is a solution of problem (6). If $(\hat{x}, \hat{y})$ is a solution of problem (6), then $\hat{x}$ and $\hat{y}$ solve problem (5).
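A short derivation of why the optimal values of (5) and (6) coincide may help; it is our own sketch, not the proof given in [4]. Because Y = X, every diagonal pair (x, x) is feasible for (6), and the symmetry and negative semi-definiteness of Q give the reverse inequality:

```latex
f(x, x) = 2a^T x + x^T Q x = \phi(x) \quad (x \in X = Y)
\;\Rightarrow\; \min_{x, y \in X} f(x, y) \le \min_{x \in X} \phi(x),

\phi(x) + \phi(y) - 2f(x, y) = x^T Q x + y^T Q y - 2x^T Q y = (x - y)^T Q (x - y) \le 0
\;\Rightarrow\; f(x, y) \ge \tfrac{1}{2}\big(\phi(x) + \phi(y)\big) \ge \min_{z \in X} \phi(z).
```

The second line uses $x^T Q y = y^T Q x$ (symmetry of Q) and $(x - y)^T Q (x - y) \le 0$ (negative semi-definiteness).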


Properties of a Solution 


In the previous section we have shown that BP is equivalent to a piecewise linear concave minimization problem. On the other hand, it is well known that a concave minimization problem over a polytope attains its minimum at a vertex (see, for instance, [3]). The following theorem follows from this observation.

Theorem 3 (see [4] and [3]) If X and Y are bounded, then BP has an optimal solution (x*, y*) such that x* ∈ V(X) and y* ∈ V(Y).


Let (x*, y*) denote a solution of BP. By fixing the vector x to the value x*, the BP problem reduces to a linear one, and y* must be a solution of the resulting problem. By the symmetry of the problem, a similar result holds when the vector y is fixed to the value y*. The following theorem is a necessary optimality condition and a direct consequence of the above discussion.

Theorem 4 (see [4] and [3]) If (x*, y*) is a solution of the BP problem, then

$\min_{x \in X} f(x, y^*) = f(x^*, y^*) = \min_{y \in Y} f(x^*, y)$. (7)


However, (7) is not a sufficient condition. In fact, it can only guarantee local optimality of (x*, y*) under some additional requirements. In particular, y* has to be the unique solution of the problem $\min_{y \in Y} f(x^*, y)$. From the latter it follows that $f(x^*, y^*) < f(x^*, y)$ for all $y \in V(Y)$, $y \ne y^*$. Because of the continuity of the function f(x, y), for any $y \in V(Y)$, $y \ne y^*$, we have $f(x^*, y^*) < f(x, y)$ in a small neighborhood $U_y$ of the point x*. Let $U = \bigcap_{y \in V(Y),\, y \ne y^*} U_y$. Then $f(x^*, y^*) < f(x, y)$ for all $x \in U$, $y \in V(Y)$, $y \ne y^*$, while for $y = y^*$ the first equality in (7) gives $f(x^*, y^*) \le f(x, y^*)$ for all $x \in X$. Finally, observe that Y is a polytope, so any point of Y can be expressed as a convex combination of its vertices, and f(x, ·) is affine in y. From the latter it follows that $f(x^*, y^*) \le f(x, y)$ for all $x \in U \cap X$, $y \in Y$, which completes the proof of the following theorem.

Theorem 5 If (x*, y*) satisfies condition (7) and y* is the unique solution of the problem $\min_{y \in Y} f(x^*, y)$, then (x*, y*) is a local optimum of BP.

Recall that BP is equivalent to a piecewise linear concave minimization problem. Under the assumptions of the theorem, it is easy to show that x* is a local minimum of the function g(x) as well (see [4]).


Methods 


In this section we discuss methods to find a solution of a bilinear problem. Because BP is equivalent to a piecewise linear concave minimization problem, any solution algorithm for the latter can be used to solve the former. In particular, one can employ the cutting plane algorithms developed for those problems. However, the symmetric structure of the BP problem allows the construction of more efficient cuts. In [6], the author discusses an algorithm that converges to a solution satisfying condition (7), and then proposes a cutting plane algorithm to find the global minimum of the problem.

Assume that X and Y are bounded. Algorithm 1, which is also known as the "mountain climbing" procedure, starts from an initial feasible vector $y^0$ and iteratively solves two linear problems. The first LP is obtained by fixing the vector y to the value $y^{m-1}$. Its solution $x^m$ is used to fix the value of the vector x and construct the second LP, whose solution is $y^m$. If $f(x^m, y^{m-1}) \ne f(x^m, y^m)$, then we continue solving the linear problems by fixing the vector y to the value $y^m$. If the stopping criterion is satisfied, then it is easy to show that the vector $(x^m, y^{m-1})$ satisfies condition (7). In addition, observe that V(X) and V(Y) are finite. From the latter and the fact that $f(x^m, y^{m-1}) > f(x^m, y^m)$ at every iteration that does not stop, it follows that the algorithm converges in a finite number of iterations.
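A minimal sketch of the mountain climbing procedure in Python, assuming (for illustration only) that X and Y are bounded and given by explicit vertex lists V(X) and V(Y). Since a linear function attains its minimum over a polytope at a vertex, each LP in Step 2 is replaced here by a scan over the vertex list; a practical implementation would call an LP solver instead. The function name `mountain_climbing` and the tolerance `tol` are our own.

```python
def mountain_climbing(f, VX, VY, y0, tol=1e-12, max_iter=10_000):
    """Alternate the two LPs of the mountain climbing procedure over finite vertex lists VX, VY.

    Returns a pair (x, y) satisfying condition (7) on the vertex lists.
    """
    y_prev = y0
    for _ in range(max_iter):
        x = min(VX, key=lambda v: f(v, y_prev))   # Step 2: LP in x with y fixed to y^{m-1}
        y = min(VY, key=lambda w: f(x, w))        # Step 2: LP in y with x fixed to x^m
        if f(x, y_prev) <= f(x, y) + tol:         # Step 3: no further decrease
            return x, y_prev                      # x minimizes f(., y_prev), y_prev minimizes f(x, .)
        y_prev = y
    raise RuntimeError("no convergence within max_iter iterations")


# Example: f(x, y) = x1*y1 - 2*x1*y2 - 2*x2*y1 + x2*y2 with X = Y = [0, 1]^2.
def f(x, y):
    return x[0] * y[0] - 2 * x[0] * y[1] - 2 * x[1] * y[0] + x[1] * y[1]


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for start in [(0.0, 0.0), (1.0, 1.0)]:
    xs, ys = mountain_climbing(f, square, square, start)
    print(start, "->", xs, ys, f(xs, ys))
```

Starting from $y^0 = (0, 0)$ the procedure stops at objective value 0, a pair that satisfies (7) but is not globally optimal, while the start $y^0 = (1, 1)$ reaches the global value −2; this is why the cutting plane step is needed.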

Step 1: Let $y^0 \in Y$ denote an initial feasible solution, and $m \leftarrow 1$.

Step 2: Let $x^m = \arg\min_{x \in X}\{f(x, y^{m-1})\}$, and $y^m = \arg\min_{y \in Y}\{f(x^m, y)\}$.

Step 3: If $f(x^m, y^{m-1}) = f(x^m, y^m)$, then stop. Otherwise, $m \leftarrow m + 1$ and go to Step 2.

Bilinear Programming, Algorithm 1
Mountain Climbing Procedure

Let (x*, y*) denote the solution obtained by Algorithm 1. Assuming that the vertex x* is nondegenerate, denote by D the set of directions $d_i$ along the edges emanating from the point x*. Recall that $g(x) = \min_{y \in Y} f(x, y)$ is a concave function. To construct a valid cut, for each direction $d_i$ find the maximum value of $\theta_i$ such that $g(x^* + \theta_i d_i) \ge f(x^*, y^*) - \varepsilon$, i.e.,

$\theta_i = \arg\max\{\theta_i \mid g(x^* + \theta_i d_i) \ge f(x^*, y^*) - \varepsilon\}$,

where ε is a small positive number. Let $C = (d_1, \dots, d_n)$,

$\Delta_1 = \{x \mid (1/\theta_1, \dots, 1/\theta_n)\, C^{-1}(x - x^*) \ge 1\}$,

and $X_1 = X \cap \Delta_1$. If $X_1 = \emptyset$, then

$\min_{x \in X,\, y \in Y} f(x, y) \ge f(x^*, y^*) - \varepsilon$,

and (x*, y*) is a global ε-optimum of the problem. If $X_1 \ne \emptyset$, then one can replace X by the set $X_1$, i.e., consider the optimization problem

$\min_{x \in X_1,\, y \in Y} f(x, y)$,

and run Algorithm 1 to find a better solution. However, because of the symmetric structure of the problem, a similar procedure can be applied to construct


Step 1: Apply Algorithm 1 to find a vector (x*, y*) that satisfies relationship (7).

Step 2: Based on the solution (x*, y*), compute the appropriate cuts and construct the sets $X_1$ and $Y_1$.

Step 3: If $X_1 = \emptyset$ or $Y_1 = \emptyset$, then stop; (x*, y*) is a global ε-optimal solution. Otherwise, $X \leftarrow X_1$, $Y \leftarrow Y_1$, and go to Step 1.

Bilinear Programming, Algorithm 2
Cutting Plane Algorithm
a cut for the set Y. Let $\Delta_2$ denote the corresponding half-space, and $Y_1 = Y \cap \Delta_2$. By updating both sets, i.e., by considering the optimization problem

$\min_{x \in X_1,\, y \in Y_1} f(x, y)$,

the cutting plane algorithm (see Algorithm 2) might find a global solution of the problem in fewer iterations.
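The step lengths $\theta_i$ that define the cut can be computed by bisection: because g is concave, the set $\{\theta \ge 0 : g(x^* + \theta d_i) \ge f(x^*, y^*) - \varepsilon\}$ is an interval starting at 0. The sketch below assumes the caller supplies a hypothetical function `g_on_ray(theta)` returning $g(x^* + \theta d_i)$ (for instance, by minimizing $f(x^* + \theta d_i, y)$ over the vertices of Y), and caps θ at `theta_cap` when no crossing occurs; a capped θ makes the corresponding coefficient $1/\theta_i$ in the cut close to zero.

```python
def step_length(g_on_ray, target, theta_cap=1e6, tol=1e-9):
    """Largest theta in [0, theta_cap] with g_on_ray(theta) >= target, found by bisection.

    Assumes g_on_ray(0) >= target and that g_on_ray is concave, so the feasible
    thetas form an interval [0, theta_i].
    """
    if g_on_ray(theta_cap) >= target:
        return theta_cap                  # no crossing before the cap
    lo, hi = 0.0, theta_cap               # invariant: g(lo) >= target > g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g_on_ray(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


# Example: g(x* + theta*d) = min(0, 3 - theta), f(x*, y*) = 0 and eps = 1, so theta_i = 4.
theta = step_length(lambda t: min(0.0, 3.0 - t), target=0.0 - 1.0)
print(round(theta, 6))
```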
Introduction


Many problems in supply chain management can be formulated as network flow problems with specified arc cost functions. Let G(N, A) represent a network, where N and A are the sets of nodes and arcs, respectively, and let $f_a(x_a)$ denote the cost function of arc a. The network contains supply and demand nodes, and the main objective of the problem is to minimize the total cost of satisfying the demand from the available supply. In addition, one can assume that the arc flows are bounded, which corresponds to the case where a shipment along an arc may not exceed a specified capacity. The mathematical formulation of the problem can be stated as

$\min f(x) = \sum_{a \in A} f_a(x_a)$ (1)

s.t. $Bx = b$, (2)

$x_a \in [0, \lambda_a] \quad \forall a \in A$, (3)

where B is the node-arc incidence matrix of network G, and b is a supply/demand vector. In the next section, we discuss two formulations where $f_a(x_a)$ is either a concave piecewise linear or a fixed charge function of the arc flow. Concave piecewise linear functions are typically used when merchandisers encourage customers to buy more products by offering discounts on the unit price for large orders. In [6], the authors showed that the problem in this setting is NP-hard. Some heuristic procedures to solve the problem are discussed in [10] and [13]. Fixed charge functions are used when, regardless of the quantity shipped, a fixed cost must be paid to ship along an arc. The fixed cost might be the cost of renting a truck, ship, airplane, or train to transport goods between nodes of the network. The problem can be modeled as a 0-1 mixed integer linear program, and most solution approaches utilize branch-and-bound techniques to find an exact solution (see [1,2,5,7,16]). Some heuristic procedures are discussed in [3,4,8,9,12,14]. In this article we show that both problems are equivalent to a bilinear problem with a disjoint feasible region.
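To make constraint (2) concrete, the following sketch builds a node-arc incidence matrix for a tiny hypothetical network and checks a flow vector against (2) and (3). The sign convention (+1 at the tail of an arc, −1 at its head, so that positive entries of b are supplies and negative entries are demands) is an assumption; some texts use the opposite signs.

```python
def incidence_matrix(nodes, arcs):
    """Node-arc incidence matrix B: +1 in the tail row and -1 in the head row of each arc column."""
    row = {node: r for r, node in enumerate(nodes)}
    B = [[0] * len(arcs) for _ in nodes]
    for col, (tail, head) in enumerate(arcs):
        B[row[tail]][col] = 1
        B[row[head]][col] = -1
    return B


def is_feasible(B, x, b, cap):
    """Check flow conservation Bx = b (2) and the bounds 0 <= x_a <= lambda_a (3)."""
    Bx = [sum(B_r[col] * x[col] for col in range(len(x))) for B_r in B]
    return Bx == b and all(0 <= xa <= ua for xa, ua in zip(x, cap))


nodes = ["s", "m", "t"]                       # supply node s, transshipment node m, demand node t
arcs = [("s", "m"), ("m", "t"), ("s", "t")]
B = incidence_matrix(nodes, arcs)
b = [5, 0, -5]                                # 5 units shipped from s to t
cap = [4, 4, 4]
print(is_feasible(B, [3, 3, 2], b, cap), is_feasible(B, [5, 5, 0], b, cap))
```

The second flow conserves flow at every node but violates the capacity bound (3) on two arcs, so it is rejected.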

In addition to choosing a proper production level, managers sometimes have to make pricing decisions as well. In particular, one can assume that the satisfied demand is a function of the price, i.e., lower prices generate additional demand. Such a functional relationship between prices and satisfied demand is commonly used by economists. However, because of production capacity restrictions, fixed costs related to the production process, seasonality and other factors, it is often not feasible to satisfy the optimal level of demand, and managers should consider optimal production and inventory levels in combination with pricing decisions to maximize the net profit over a specified time period. One such problem and an equivalent bilinear formulation are also discussed in the next section.

In addition to the bilinear formulations of the supply chain problems, in Sect. "Methods" we explore the structure of the bilinear problems and discuss the difficulties of applying the standard computational methods. Despite these difficulties, the section proposes some heuristic methods to find a near-optimum solution to the problems. The solution obtained by a heuristic procedure can also be used to expedite exact algorithms.


Formulation 
Concave Piecewise Linear Network Flow Problem 
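In problem (1)-(3), assume that $f_a(x_a)$ is a piecewise linear concave function, i.e.,

$f_a(x_a) = \begin{cases} c_a^1 x_a + s_a^1 \;(= f_a^1(x_a)) & x_a \in [0, \varepsilon_a^1) \\ c_a^2 x_a + s_a^2 \;(= f_a^2(x_a)) & x_a \in [\varepsilon_a^1, \varepsilon_a^2) \\ \quad\vdots & \quad\vdots \\ c_a^{n_a} x_a + s_a^{n_a} \;(= f_a^{n_a}(x_a)) & x_a \in [\varepsilon_a^{n_a - 1}, \lambda_a] \end{cases}$

with $c_a^1 > c_a^2 > \dots > c_a^{n_a}$. Let $K_a = \{1, 2, \dots, n_a\}$. Because of the concavity of $f_a(x_a)$, it can be written in the following alternative form:

$f_a(x_a) = \min_{k \in K_a}\{f_a^k(x_a)\} = \min_{k \in K_a}\{c_a^k x_a + s_a^k\}$. (4)

By introducing additional variables $y_a^k \in [0, 1]$, $k \in K_a$, construct the following bilinear problem.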


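$\min g(x, y) = \sum_{a \in A} \Big[\sum_{k \in K_a} c_a^k y_a^k\Big] x_a + \sum_{a \in A} \sum_{k \in K_a} s_a^k y_a^k = \sum_{a \in A} \sum_{k \in K_a} f_a^k(x_a)\, y_a^k$ (5)

s.t. $Bx = b$, (6)

$\sum_{k \in K_a} y_a^k = 1 \quad \forall a \in A$, (7)

$x_a \in [0, \lambda_a],\ y_a^k \ge 0 \quad \forall a \in A \text{ and } k \in K_a$. (8)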


In [13], the authors show that at any local minimum $(\bar{x}, \bar{y})$ of the bilinear problem, $\bar{y}$ is either a binary vector or can be used to construct a binary vector with the same objective function value. Although the vector $\bar{y}$ may have fractional components, the authors note that in practical problems this is highly unlikely. The proof of the theorem below follows directly from (4). Details on the proof, as well as on the transformation of problem (1)-(3) into (5)-(8), can be found in [13].

Theorem 1 If (x*, y*) is a global optimum of problem (5)-(8), then x* is a solution of problem (1)-(3).

According to the theorem, the concave piecewise linear network flow problem is equivalent to a bilinear problem in the sense that a solution of the latter is a solution of the former. It is important to notice that problem (5)-(8) does not have binary variables, i.e., all variables are continuous. However, at the optimum y* is a binary vector, which ensures that only one linear piece per arc is employed in the objective.


Fixed Charge Network Flow Problem 


In the case of the fixed charge network flow problem, we assume that the function $f_a(x_a)$ has the following structure:

$f_a(x_a) = \begin{cases} c_a x_a + s_a & x_a \in (0, \lambda_a] \\ 0 & x_a = 0 \end{cases}$

Observe that the function is discontinuous at the origin and linear on the interval $(0, \lambda_a]$.

Figure 1
Approximation of function $f_a(x_a)$


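Let $\varepsilon_a \in (0, \lambda_a]$, and define

$\phi_a^\varepsilon(x_a) = \begin{cases} c_a x_a + s_a & x_a \in [\varepsilon_a, \lambda_a] \\ c_a^\varepsilon x_a & x_a \in [0, \varepsilon_a) \end{cases}$

where $c_a^\varepsilon = c_a + s_a/\varepsilon_a$. It is easy to see that $\phi_a^\varepsilon(x_a) = f_a(x_a)$ for all $x_a \in \{0\} \cup [\varepsilon_a, \lambda_a]$ and $\phi_a^\varepsilon(x_a) < f_a(x_a)$ for all $x_a \in (0, \varepsilon_a)$, i.e., $\phi_a^\varepsilon(x_a)$ approximates the function $f_a(x_a)$ from below (see Fig. 1). Let us construct the following concave two-piece linear network flow problem:

$\min \phi^\varepsilon(x) = \sum_{a \in A} \phi_a^\varepsilon(x_a)$ (9)

s.t. $Bx = b$, (10)

$x_a \in [0, \lambda_a] \quad \forall a \in A$, (11)

where ε denotes the vector of $\varepsilon_a$. The function $\phi^\varepsilon(x)$, as well as problem (9)-(11), depends on the value of the vector ε. In [14], the authors show that for any value of $\varepsilon_a \in (0, \lambda_a]$, a global solution of problem (9)-(11) provides a lower bound for the fixed charge network flow problem, i.e., $\phi^\varepsilon(x^\varepsilon) \le f(x^*)$, where $x^\varepsilon$ and $x^*$ denote the solutions of the corresponding problems.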
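The under-estimator $\phi_a^\varepsilon$ is easy to tabulate. The sketch below, with arbitrary example values $c_a = 2$, $s_a = 10$, $\lambda_a = 4$ and $\varepsilon_a = 0.5$, checks the two properties stated above on a grid: agreement with $f_a$ on $\{0\} \cup [\varepsilon_a, \lambda_a]$ and strict under-estimation on $(0, \varepsilon_a)$. The function names are hypothetical.

```python
def fixed_charge(x, c, s):
    """f_a(x_a): zero at the origin, c*x + s on (0, lambda_a]."""
    return 0.0 if x == 0 else c * x + s


def phi_eps(x, c, s, eps):
    """phi_a^eps(x_a): slope c + s/eps on [0, eps), and c*x + s on [eps, lambda_a]."""
    return (c + s / eps) * x if x < eps else c * x + s


c, s, cap, eps = 2.0, 10.0, 4.0, 0.5
grid = [k * cap / 40 for k in range(41)]      # 0.0, 0.1, ..., 4.0
for x in grid:
    if x == 0 or x >= eps:
        assert abs(phi_eps(x, c, s, eps) - fixed_charge(x, c, s)) < 1e-12
    else:
        assert phi_eps(x, c, s, eps) < fixed_charge(x, c, s)
```

The two pieces meet at $x_a = \varepsilon_a$, since $c_a^\varepsilon \varepsilon_a = c_a \varepsilon_a + s_a$, and the slope drops from $c_a^\varepsilon$ to $c_a$ there, which is why $\phi_a^\varepsilon$ is concave.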


Theorem 2 (see [14]) For all ε such that $\varepsilon_a \in (0, \lambda_a]$ for all $a \in A$, $\phi^\varepsilon(x^\varepsilon) \le f(x^*)$.

Furthermore, by choosing a sufficiently small value of $\varepsilon_a$ one can ensure that both problems have the same solution. Let $\delta = \min\{x_a^v \mid x^v \in V(x),\ a \in A,\ x_a^v > 0\}$, where V(x) denotes the set of vertices of the polyhedron (10)-(11). Observe that δ is the minimum among all positive components of all vectors $x^v \in V(x)$; therefore, δ > 0.
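Once the vertices of the polyhedron (10)-(11) are known, δ is a one-line computation; the computational burden lies in enumerating the vertices, not in this minimum. The vertex list below is hypothetical and only illustrates the definition.

```python
def delta(vertices):
    """Smallest strictly positive component over all vertex vectors x^v."""
    return min(value for vertex in vertices for value in vertex if value > 0)


# Hypothetical vertices of (10)-(11) for a network with three arcs.
V = [(5.0, 0.0, 0.0), (2.0, 3.0, 0.0), (0.0, 0.5, 4.5)]
print(delta(V))    # any eps_a in (0, delta] meets the condition of the theorem below
```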


Theorem 3 (see [14]) For all ε such that $\varepsilon_a \in (0, \delta]$ for all $a \in A$, $\phi^\varepsilon(x^\varepsilon) = f(x^*)$.

Theorem 3 proves the equivalence between the fixed charge network flow problem and the concave two-piece linear network flow problem (9)-(11) in the sense that a solution of the latter is a solution of the former. As we have seen in the previous section, concave piecewise linear network flow problems are equivalent to bilinear problems. In particular, problem (9)-(11) is equivalent to the following bilinear problem:

$\min_{x, y} \sum_{a \in A} \big[(c_a x_a + s_a)\, y_a + c_a^\varepsilon x_a (1 - y_a)\big]$ (12)

s.t. $Bx = b$, (13)

$x_a \ge 0$ and $y_a \in [0, 1] \quad \forall a \in A$, (14)

where $\varepsilon_a \in (0, \delta]$.


Capacitated Multi-Item Dynamic Pricing Problem 


In this problem, we assume that during a discrete time period Λ a company is able to produce different commodities from a set P. In addition, we assume that at each point of time $j \in \Lambda$ and for each product $p \in P$ a functional relationship $f_{(p,j)}(d_{(p,j)})$ between the satisfied demand and the price is given, i.e., in order to satisfy the demand $d_{(p,j)}$ of product p, the price of the product at time j should be equal to $f_{(p,j)}(d_{(p,j)})$. As a result, the revenue generated from the sales of product p at time j is $g_{(p,j)}(d_{(p,j)}) = f_{(p,j)}(d_{(p,j)})\, d_{(p,j)}$. Although we do not specify the function $f_{(p,j)}(d_{(p,j)})$, it should ensure that $g_{(p,j)}(d_{(p,j)})$ is a concave function (see Fig. 2a).

Because of the concavity of $g_{(p,j)}(d_{(p,j)})$, there exists a point $\bar{d}_{(p,j)}$ at which the function reaches its maximum, and producing and selling more than $\bar{d}_{(p,j)}$ is not profitable. Therefore, without loss of generality, we can assume that $d_{(p,j)} \in [0, \bar{d}_{(p,j)}]$. By the definition of $\bar{d}_{(p,j)}$, $g_{(p,j)}(d_{(p,j)})$ is a concave monotone function on the interval $[0, \bar{d}_{(p,j)}]$. To avoid nonlinearity in the objective, one can approximate it by a concave piecewise linear function. To do so, divide $[0, \bar{d}_{(p,j)}]$ into intervals of equal length, and let $d^k_{(p,j)}$, $k \in \{1, \dots, N\} \cup \{0\} = K \cup \{0\}$, denote the end points of the intervals. Then the approximation can be defined as

$\hat{g}_{(p,j)}(d_{(p,j)}) = \sum_{k=1}^{N} g^k_{(p,j)} \lambda^k_{(p,j)}$, where $g^k_{(p,j)} = g_{(p,j)}(d^k_{(p,j)})$,

$d_{(p,j)} = \sum_{k=1}^{N} d^k_{(p,j)} \lambda^k_{(p,j)}$,

$\sum_{k=1}^{N} \lambda^k_{(p,j)} \le 1$, and $\lambda^k_{(p,j)} \ge 0 \quad \forall p \in P,\ j \in \Lambda$ (see Fig. 2b).

Figure 2
The revenue function and its approximation
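A sketch of the piecewise linear approximation under stated assumptions: a hypothetical concave revenue function $g(d) = d(10 - d)$, whose maximizer gives $\bar{d} = 5$, N = 5 equal intervals, and evaluation by linear interpolation between consecutive end points $(d^k, g^k)$, with $d^0 = 0$ and $g(0) = 0$. Because g is concave, the interpolant never exceeds g. The helper names are our own.

```python
def end_points(g, d_bar, N):
    """End points d^k = k * d_bar / N, k = 0..N, and the values g^k = g(d^k)."""
    d = [k * d_bar / N for k in range(N + 1)]
    return d, [g(dk) for dk in d]


def approx_revenue(demand, d, gk):
    """Linear interpolation of the revenue between the two end points around `demand`."""
    for k in range(1, len(d)):
        if demand <= d[k]:
            t = (demand - d[k - 1]) / (d[k] - d[k - 1])
            return (1 - t) * gk[k - 1] + t * gk[k]
    raise ValueError("demand exceeds d_bar")


def g(dem):
    return dem * (10 - dem)          # concave revenue, maximal at dem = 5


d, gk = end_points(g, 5.0, 5)        # d = [0, 1, ..., 5], gk = [0, 9, 16, 21, 24, 25]
for dem in [0.0, 0.5, 2.5, 4.2, 5.0]:
    assert approx_revenue(dem, d, gk) <= g(dem) + 1e-12   # chords lie below a concave curve
print(approx_revenue(2.5, d, gk), g(2.5))
```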
Let $x^k_{(p,i,j)}$ denote the amount of product p that is produced at time i and sold at time j using the unit price $g^k_{(p,j)}/d^k_{(p,j)} = f_{(p,j)}(d^k_{(p,j)})$. In addition, let $y_{(p,i)}$ denote a binary variable, which equals one if $\sum_{j \in \Lambda | i \le j} \sum_{k \in K} x^k_{(p,i,j)} > 0$ and zero otherwise. Costs associated with the production process include inventory costs $c_{(p,i,j)}$, production costs $c_{(p,i)}$, and setup costs $c^{st}_{(p,i)}$. Finally, let $C_i$ represent the production capacity at time i, which is "shared" by all products. Using these definitions, one can construct a linear mixed integer formulation of the problem. Below we provide a simplified formulation of the problem, where the variables $\lambda^k_{(p,j)}$ are eliminated. For details on the mathematical formulation of the problem and its simplification we refer to [15].


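$\max \sum_{p \in P} \sum_{i \in \Lambda} \Big[\sum_{j \in \Lambda | i \le j} \sum_{k \in K} q^k_{(p,i,j)} x^k_{(p,i,j)} - c^{st}_{(p,i)} y_{(p,i)}\Big]$ (15)

s.t. $\sum_{p \in P} \sum_{j \in \Lambda | i \le j} \sum_{k \in K} x^k_{(p,i,j)} \le C_i \quad \forall i \in \Lambda$, (16)

$\sum_{j \in \Lambda | i \le j} \sum_{k \in K} x^k_{(p,i,j)} \le C_i\, y_{(p,i)} \quad \forall p \in P \text{ and } i \in \Lambda$, (17)

$\sum_{k \in K} \sum_{i \in \Lambda | i \le j} \frac{x^k_{(p,i,j)}}{d^k_{(p,j)}} \le 1 \quad \forall p \in P \text{ and } j \in \Lambda$, (18)

$x^k_{(p,i,j)} \ge 0,\ y_{(p,i)} \in \{0, 1\} \quad \forall p \in P,\ i, j \in \Lambda \text{ and } k \in K$, (19)

where $q^k_{(p,i,j)} = g^k_{(p,j)}/d^k_{(p,j)} - c_{(p,i)} - c_{(p,i,j)}$ is the unit profit of $x^k_{(p,i,j)}$.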


Let X = {x | x ≥ 0 and x is feasible to (16) and (18)}, and $Y = [0, 1]^{|P||\Lambda|}$. Consider the following bilinear problem:

$\max_{x \in X,\, y \in Y} g(x, y) = \sum_{p \in P} \sum_{i \in \Lambda} \Big[\sum_{j \in \Lambda | i \le j} \sum_{k \in K} q^k_{(p,i,j)} x^k_{(p,i,j)} - c^{st}_{(p,i)}\Big] y_{(p,i)}$ (20)

Theorem 4 (see [15]) A global maximum of the bilinear problem (20) is a solution of, or can be transformed into a solution of, problem (15)-(19).


Methods 


In the previous section, we have discussed several problems arising in supply chain management. To solve the bilinear formulations of the problems, one can employ techniques applicable to general bilinear problems. In particular, the cutting plane algorithm proposed by Konno can be applied to find a global solution of the problems. In addition, Konno proposes an iterative procedure that converges to a local minimum of the problem in a finite number of iterations. For details on this procedure, which is also known as the "mountain climbing" procedure (MCP), and on the cutting plane algorithm we refer to the paper [11] or to the Bilinear Programming section of this encyclopedia.

Below, we discuss problem-specific difficulties of applying the above-mentioned algorithms and some effective heuristic procedures that are able to provide a near-optimum solution using negligible computer resources. The MCP, which is used by the heuristics to find a local minimum/maximum of the problems, is very fast due to the special structure of both LP problems employed by the procedure. However, to obtain a high-quality solution, in some problems it is necessary to solve a sequence of approximate problems. The bilinear formulations of the supply chain problems typically have many local minima. Therefore, cutting plane algorithms may require many cuts to converge. By combining the heuristic procedures with the cutting plane algorithm, one can reduce the number of cuts by generating deep cuts.

One of the main properties of a bilinear problem with a disjoint feasible region is that by fixing the vector x or y to a particular value, the problem reduces to a linear one. The "mountain climbing" procedure employs this property and iteratively solves two linear problems, each time fixing one vector to the solution of the previous linear program. In the case of the concave piecewise linear network flow problem, given the vector $\bar{x}$, problem (5)-(8) decomposes into |A| problems,

$\min_{\{y_a^k \mid k \in K_a\}} \sum_{k \in K_a} (c_a^k \bar{x}_a + s_a^k)\, y_a^k$

s.t. $\sum_{k \in K_a} y_a^k = 1, \quad y_a^k \ge 0 \quad \forall k \in K_a$.

Furthermore, it can be shown that a solution of this problem is a binary vector, which has to satisfy the inequality

$\sum_{k \in K_a} \varepsilon_a^{k-1} y_a^k \le \bar{x}_a \le \sum_{k \in K_a} \varepsilon_a^k y_a^k$,

where $\varepsilon_a^0 = 0$ and $\varepsilon_a^{n_a} = \lambda_a$. As a result, one can employ a search technique by assigning $\bar{y}_a^{\bar{k}} = 1$ if $\varepsilon_a^{\bar{k}-1} \le \bar{x}_a \le \varepsilon_a^{\bar{k}}$ and $\bar{y}_a^k = 0$ for all $k \in K_a$, $k \ne \bar{k}$. On the other hand, by fixing the vector y to the value of the constructed vector $\bar{y}$, problem (5)-(8) reduces to the following network flow problem:

$\min \sum_{a \in A} \Big[\sum_{k \in K_a} c_a^k \bar{y}_a^k\Big] x_a$

s.t. $Bx = b$, $x_a \ge 0 \quad \forall a \in A$.

Observe that $\sum_{k \in K_a} c_a^k \bar{y}_a^k = c_a^{\bar{k}}$, and different vectors $\bar{y}$ change the cost vector in the problem.
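The y-step of the MCP described above amounts to locating $\bar{x}_a$ among the breakpoints. The following is a minimal sketch for a single arc, using Python's standard `bisect` module and hypothetical piece data with breakpoints $0 = \varepsilon_a^0 < \varepsilon_a^1 < \dots < \varepsilon_a^{n_a} = \lambda_a$; the result is checked against the minimum in (4), which is the value the small LP over $y_a$ attains. It assumes $0 \le \bar{x}_a \le \lambda_a$.

```python
import bisect


def y_step(x_a, pieces, eps):
    """One-hot y_a for flow x_a, plus the resulting cost coefficient sum_k c_a^k y_a^k.

    pieces[k] = (c_a^{k+1}, s_a^{k+1}); eps = [eps^0 = 0, eps^1, ..., eps^{n_a} = lambda_a].
    """
    k = min(bisect.bisect_right(eps, x_a) - 1, len(pieces) - 1)   # eps[k] <= x_a <= eps[k+1]
    y = [0] * len(pieces)
    y[k] = 1
    return y, pieces[k][0]


pieces = [(3.0, 0.0), (1.0, 4.0), (0.5, 6.5)]     # decreasing slopes, continuous at 2 and 5
eps = [0.0, 2.0, 5.0, 10.0]
for x_a in [0.0, 1.0, 2.0, 3.0, 5.0, 7.5, 10.0]:
    y, cost = y_step(x_a, pieces, eps)
    value = sum((c * x_a + s) * yk for (c, s), yk in zip(pieces, y))
    assert abs(value - min(c * x_a + s for c, s in pieces)) < 1e-12
```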

Although the MCP converges to a local minimum, it can provide a near-optimum solution of problem (5)-(8) if the initial vector is such that $\bar{y}_a^{n_a} = 1$ and $\bar{y}_a^k = 0$ for all $k \in K_a$, $k \ne n_a$. The effectiveness of the procedure is partially due to the fact that in supply chain problems $f_a(x_a)$ is an increasing function. In addition, the procedure requires few computer resources to converge because both linear problems are relatively easy to solve. A detailed description of the procedure, properties of the linear problems, and computational experiments can be found in [13].

In the case of fixed charge network flow problems, it is not obvious how to choose the vector ε. Theorem 3 guarantees the equivalence between the fixed charge network flow problem and the bilinear problem (12)-(14) if $\varepsilon_a \in (0, \delta]$. However, according to the definition, it is necessary to find all vertices of the feasible region to compute the value of δ, which is computationally expensive. Even if the correct value of δ is known, it is typically a very small number. As a result, the value of $\varepsilon_a$ is close to zero, and $c_a^\varepsilon$ is very large compared to the value of $c_a$. The latter creates some difficulties for finding a global solution of the bilinear problem. In particular, the MCP may converge to a local minimum that is far from being a global solution.

To overcome these difficulties, [14] proposes a procedure that gradually decreases the value of ε (see Algorithm 1). The algorithm starts from an initial value of the vector ε, i.e., $\varepsilon_a = \lambda_a$. After constructing the corresponding bilinear problem, it employs the MCP to find a local minimum of the problem. If the stopping criterion is not satisfied, the value of ε is updated, i.e., $\varepsilon_a \leftarrow \alpha \varepsilon_a$, where $\alpha \in (0, 1)$, and the algorithm again solves the updated bilinear problem using the current solution as an initial vector for the MCP.

The choice of α has a direct influence on the CPU time of the algorithm and the quality of the solution. Specifically, if the value of α is close to one, then ε decreases slowly and the algorithm requires many iterations to stop. On the other hand, if the value of the parameter is close to zero, it may worsen the quality of the solution. A proper choice of the parameter depends on the problem, and it should be chosen by trial and error. In [14], the authors test the algorithm on various randomly generated test problems and found it satisfactory to choose α = 0.5.

As for the stopping criterion, it is possible to show that the solution of the final bilinear problem is a solution of the fixed charge network flow problem if in Step 2 one is able to find a global solution of the corresponding bilinear problems. For details on the numerical experiments, stopping criteria and other properties of the algorithm, we refer to [14].
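The outer loop of the ε-updating procedure can be sketched independently of the inner solver. In the sketch below, `solve_bilinear(eps, start)` is a hypothetical callback that should return a local minimum (x, y) of (12)-(14) for the current ε, e.g. computed by the MCP warm-started from `start`; we read Step 3 as shrinking $\varepsilon_a$ only on the arcs whose flow lies strictly inside $(0, \varepsilon_a)$. The toy solver in the usage example only exercises the update rule and is not a real MCP.

```python
def adaptive_eps(cap, solve_bilinear, alpha=0.5, max_rounds=100):
    """Shrink eps_a <- alpha * eps_a until no arc flow lies in (0, eps_a)."""
    eps = dict(cap)                                   # Step 1: eps_a <- lambda_a
    start = None
    for _ in range(max_rounds):
        x, y = solve_bilinear(eps, start)             # Step 2: local minimum of (12)-(14)
        small = [a for a in eps if 0 < x[a] < eps[a]]
        if not small:                                 # Step 3: stopping test
            return x, y, eps
        for a in small:
            eps[a] *= alpha
        start = (x, y)                                # warm start for the next MCP run
    raise RuntimeError("eps did not stabilize within max_rounds")


# Toy stand-in for the MCP that always returns the same flows (for illustration only).
def fake_solver(eps, start):
    return {"a1": 0.3, "a2": 0.0, "a3": 2.0}, None


x, y, eps = adaptive_eps({"a1": 4.0, "a2": 4.0, "a3": 4.0}, fake_solver)
print(eps)
```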

In problems with pricing decisions, one may also experience some difficulties in employing the MCP to find a near-optimum solution. To explore the properties of the problem, consider the following two linear problems, which are constructed from problem (20) by fixing either the vector x or the vector y to the value $\bar{x}$ or $\bar{y}$, respectively:

$LP_x: \quad \max_{y \in Y} \sum_{p \in P} \sum_{i \in \Lambda} \Big[\sum_{j \in \Lambda | i \le j} \sum_{k \in K} q^k_{(p,i,j)} \bar{x}^k_{(p,i,j)} - c^{st}_{(p,i)}\Big] y_{(p,i)}$

$LP_y: \quad \max_{x \in X} \sum_{p \in P} \sum_{i \in \Lambda} \sum_{j \in \Lambda | i \le j} \sum_{k \in K} \big[q^k_{(p,i,j)} \bar{y}_{(p,i)}\big] x^k_{(p,i,j)}$

The MCP iteratively solves the $LP_x$ and $LP_y$ problems, where the solution of the first problem is used to fix the corresponding vector in the second problem. However, if one of the components of the vector $\bar{y}$ equals zero during one of the iterations, e.g., $\bar{y}_{(p,i)} = 0$, then in the second problem the coefficients of the corresponding variables $x^k_{(p,i,j)}$ are equal to zero as well. As a result, changes in the values of those variables have no influence on the objective function value. Furthermore, because the products "share" the capacity and other products may have positive coefficients in the objective,


Step 1: Let $\varepsilon_a \leftarrow \lambda_a$, $x^0 \leftarrow 0$, $y^0 \leftarrow 0$, and $m \leftarrow 1$.

Step 2: Find a local minimum of problem (12)-(14) using the MCP. Let $(x^m, y^m)$ denote the solution found by the algorithm.

Step 3: If $\exists a \in A$ such that $x_a^m \in (0, \varepsilon_a)$, then $\varepsilon_a \leftarrow \alpha \varepsilon_a$, $m \leftarrow m + 1$, and go to Step 2. Otherwise, stop.
it is likely that at the optimum of $LP_y$, $x^k_{(p,i,j)} = 0$ for all $j \in \Lambda$, $k \in K$. From the latter it follows that $\bar{y}_{(p,i)} = 0$ during the next iteration, since the coefficient of $y_{(p,i)}$ in $LP_x$ then reduces to $-c^{st}_{(p,i)} < 0$, and one concludes that if some products are eliminated from the problem during the iterative process, the MCP does not consider them again. Therefore, it is likely that the solution returned by the algorithm is far from being a global one. To avoid zero coefficients in the objective of $LP_y$, [15] proposes an approximation to problem (20), which can be used in the MCP to find a near-optimum solution.
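To construct the approximate problem, let

$\phi^1_{(p,i)}(x_{(p,i)}) = \sum_{j \in \Lambda | i \le j} \sum_{k \in K} q^k_{(p,i,j)} x^k_{(p,i,j)} - c^{st}_{(p,i)}$,

and let $\phi^2_{(p,i)}(x_{(p,i)})$ be a second function of the profit term $\sum_{j \in \Lambda | i \le j} \sum_{k \in K} q^k_{(p,i,j)} x^k_{(p,i,j)}$ that depends on a parameter $\varepsilon_{(p,i)}$ (its exact form is given in [15]),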


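Step 1: Let $\varepsilon_{(p,i)}$ be a sufficiently large number, $\bar{y}_{(p,i)} = 1$ for all $p \in P$, $i \in \Lambda$, and $m \leftarrow 0$.

Step 2: Construct the approximation problem (21), and find a local maximum of the problem using the MCP. Let $(x^{m+1}, y^{m+1})$ denote the solution returned by the algorithm.

Step 3: If $\exists p \in P$ and $i \in \Lambda$ such that $\sum_{j \in \Lambda | i \le j} \sum_{k \in K} q^k_{(p,i,j)} x^{(m+1)k}_{(p,i,j)} - c^{st}_{(p,i)} < \varepsilon_{(p,i)}$ and $\sum_{j \in \Lambda | i \le j} \sum_{k \in K} x^{(m+1)k}_{(p,i,j)} > 0$, then $\varepsilon_{(p,i)} \leftarrow \alpha \varepsilon_{(p,i)}$, $m \leftarrow m + 1$, and go to Step 2. Otherwise, stop.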
where $\varepsilon_{(p,i)} > 0$, and $x_{(p,i)}$ is the vector of $x^k_{(p,i,j)}$. Using these functions, construct the following bilinear problem:

$\max_{x \in X,\, y \in Y} g^\varepsilon(x, y) = \sum_{p \in P} \sum_{i \in \Lambda} \big[\phi^1_{(p,i)}(x_{(p,i)})\, y_{(p,i)} + \phi^2_{(p,i)}(x_{(p,i)})\, (1 - y_{(p,i)})\big]$, (21)

where the feasible region is the same as in problem (20). The authors show that $g^\varepsilon(x, y)$ approximates the function g(x, y) from above.

Theorem 5 (see [15]) There exists a sufficiently small ε > 0 such that a solution of problem (20) is a solution of problem (21).


Algorithm 2 starts from a sufficiently large value of $\varepsilon_{(p,i)}$ and finds a local maximum of the corresponding bilinear problem (21) using the MCP. If the stopping criterion is not satisfied, then it updates the value of ε to αε, updates the bilinear problem (21), and employs the MCP to find a better solution. Similar to the fixed charge network flow problem, the choice of α has a direct influence on the CPU time of the algorithm and the quality of the returned solution. The running time of the algorithm and the quality of the solution for different values of α are studied in [15].

In addition to α, one has to find a proper initial value for the parameter $\varepsilon_{(p,i)}$. Ideally, it should be equal to the maximum profit that can be generated by producing only product p at time i. However, computing it requires solving a linear problem for each pair $(p, i) \in P \times \Lambda$, which is computationally expensive. On the other hand, it is not necessary to find an exact solution of those LPs, and one might consider a heuristic procedure that provides a good-quality solution within a reasonable time. One such procedure is discussed in [15].


References 


1. Barr R, Glover F, Klingman D (1981) A New Optimization 
Method for Large Scale Fixed Charge Transportation Prob- 
lems. Oper Res 29:448-463 

2. Cabot A, Erenguc S (1984) Some Branch-and-Bound Pro- 
cedures for Fixed-Cost Transportation Problems. Nav Res 
Logist Q 31:145-154 

3. Cooper L, Drebes C (1967) An Approximate Solution 
Method for the Fixed Charge Problem. Nav Res Logist Q 
14:101-113 


4. Diaby M (1991) Successive Linear Approximation Proce- 
dure for Generalized Fixed-Charge Transportation Prob- 
lem. J Oper Res Soc 42:991-1001 

5. Gray P (1971) Exact Solution for the Fixed-Charge Trans- 
portation Problem. Oper Res 19:1529-1538 

6. Guisewite G, Pardalos P (1990) Minimum concave-cost net- 
work flow problems: applications, complexity, and algo- 
rithms. Ann Oper Res 25:75-100 

7. Kennington J, Unger V (1976) A New Branch-and-Bound 
Algorithm for the Fixed Charge Transportation Problem. 
Manag Sci 22:1116-1126 

8. Khang D, Fujiwara O (1991) Approximate Solution of 
Capacitated Fixed-Charge Minimum Cost Network Flow 
Problems. Netw 21:689-704 

9. Kim D, Pardalos P (1999) A Solution Approach to the Fixed 
Charge Network Flow Problem Using a Dynamic Slope 
Scaling Procedure. Oper Res Lett 24:195-203 

10. Kim D, Pardalos P (2000) Dynamic Slope Scaling and Trust 
Interval Techniques for Solving Concave Piecewise Linear 
Network Flow Problems. Netw 35:216-222 

11. Konno H (1976) A Cutting Plane Algorithm for Solving Bi- 
linear Programs. Math Program 11:14-27 

12. Kuhn H, Baumol W (1962) An Approximate Algorithm for
the Fixed Charge Transportation Problem. Nav Res Logist
Q 9:1-15

13. Nahapetyan A, Pardalos P (2007) A Bilinear Relaxation 
Based Algorithm for Concave Piecewise Linear Network 
Flow Problems. J Ind Manag Optim 3:71-85 

14. Nahapetyan A, Pardalos P (2008) Adaptive Dynamic Cost 
Updating Procedure for Solving Fixed Charge Network 
Flow Problems. Comput Optim Appl 39:37-50. 
doi:10.1007/s10589-007-9060-x 

15. Nahapetyan A, Pardalos P (2008) A Bilinear Reduction 
Based Algorithm for Solving Capacitated Multi-Item 
Dynamic Pricing Problems. Comput Oper Res J 35:1601-1612. 
doi:10.1016/j.cor.2006.09.003 

16. Palekar U, Karwan M, Zionts S (1990) A Branch-and-Bound 
Method for Fixed Charge Transportation Problem. Manag 
Sci 36:1092-1105 


Bi-Objective Assignment Problem 


JACQUES TEGHEM 
Lab. Math. & Operational Research Fac., 
Polytechn. Mons, Mons, Belgium 


MSC2000: 90C35, 90C10 
Article Outline 


Keywords 
Direct Methods 




Two-Phase Methods 


First Step 
Second Step 
Heuristic Methods 
Preliminaries 
Determination of PE(λ^(l)), l = 1,...,L 
Generation of E(P) 
Concluding Remarks 
See also 
References 
Keywords 


Multi-objective programming; Combinatorial 
optimization; Assignment 


Until recently (1998), multi-objective combinatorial optimization (MOCO) did not receive much attention in spite of its potential applications. The reason probably lies in the specific difficulties of MOCO models, as pointed out in ▸ Multi-objective combinatorial optimization. Here we consider a particular bi-objective MOCO problem, the assignment problem (AP). This is a basic, well-known combinatorial optimization problem, important both for applications and as a subproblem of more complicated ones, such as the transportation problem, the distribution problem or the traveling salesman problem. Moreover, its mathematical structure is very simple, and efficient polynomial algorithms exist to solve it in the single objective case, such as the Hungarian method. In a bi-objective framework, the assignment problem can be formulated as: 


$$
(P)\qquad
\begin{aligned}
\text{``min''}\ & z_k(X) = \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}^{(k)} x_{ij}, \quad k = 1, 2,\\
\text{s.t.}\ & \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n,\\
& \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, n,\\
& x_{ij} \in \{0, 1\},
\end{aligned}
$$

where $c_{ij}^{(k)}$ are nonnegative integers and $X = (x_{11}, \dots, x_{nn})$. Our aim is to generate the set of efficient solutions E(P). It is important to stress that the distinction between the supported efficient solutions (belonging to SE(P)), i.e. those which are optimal solutions of the single objective problem obtained by a linear aggregation of the objectives, and the nonsupported efficient solutions (belonging to NSE(P) = E(P) \ SE(P)) (see ▸ Multi-objective integer linear programming) remains necessary even if the constraints of the problem satisfy the so-called 'totally unimodular' or 'integrality' property. When this property holds, the integrality constraints of the single objective problem can be relaxed without any deterioration of the objective function, i.e. the optimal values of the variables are integer even if only the linear relaxation of the problem is solved. It is well known that the single objective assignment problem satisfies this integrality property, and thus this is true for the problem (see ▸ Multi-objective combinatorial optimization): 


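$$
(P_\lambda)\qquad
\begin{aligned}
\min\ & z_\lambda(X) = \lambda_1 z_1(X) + \lambda_2 z_2(X)\\
\text{s.t.}\ & \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n,\\
& \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, n,\\
& x_{ij} \in \{0, 1\},\\
& \lambda_1 \ge 0,\ \lambda_2 \ge 0.
\end{aligned}
$$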


Nevertheless, in the multi-objective framework, 
there exist nonsupported efficient solutions, as indi- 
cated by the following didactic example: 


[Cost matrices $C^{(1)}$ and $C^{(2)}$ of the 4 × 4 didactic example]

The values of the feasible solutions are represented in the objective space in Fig. 1. 

There are four supported efficient solutions, corresponding to points $Z_1$, $Z_2$, $Z_3$ and $Z_4$; two nonsupported efficient solutions, corresponding to points $Z_5$ and $Z_6$; the eighteen other solutions are nonefficient. 


Remark 1 In [7], D.J. White analyzes a particular case 
of problem (P) corresponding to 


$$ c_{ij}^{(k)} = i\,\delta_{jk} $$
[Bi-Objective Assignment Problem, Figure 1: The feasible points in the $(z_1, z_2)$-space for the didactic example]


where $\delta_{jk} = 1$ if $j = k$ and $\delta_{jk} = 0$ if $j \neq k$. 


For this particular problem, he proves that E(P) = 
SE(P). 


We consider the problem of generating E(P); as in ▸ Multi-objective combinatorial optimization, three methodologies can be distinguished: direct methods, two-phase methods and heuristic methods. 


Direct Methods 


In [1], the authors propose a theoretical enumerative procedure to generate E(P) in order of increasing values of $z_1$: at each step they consider the admissible edges incident at the current basis and, among the set of possible new bases, they select the one with the best value of $z_1$; they claim that this basis corresponds to a new efficient solution. As the example described above proves, this procedure is incorrect: for instance, from point $Z_5 = (16, 11)$, corresponding to the solution $x_{14} = x_{22} = x_{33} = x_{41} = 1$, it is impossible to obtain by a single change of basis the next point $Z_6 = (19, 10)$, corresponding to the solution $x_{13} = x_{21} = x_{34} = x_{42} = 1$. Moreover, the real difficulties induced by the high degeneracy of the assignment problem are not taken into account in [1]. 


Two-Phase Methods 


The principle of this approach, and the first phase designed to generate SE(P), are described in ▸ Multi-objective combinatorial optimization; as a complement, we analyse here the second phase [3]. 

The purpose is to examine each triangle $\Delta Z_r Z_s$ determined by two successive solutions $X_r$ and $X_s$ of SE(P), with images $Z_r = (z_{1r}, z_{2r})$ and $Z_s = (z_{1s}, z_{2s})$ (see Fig. 2), and to determine the possible nonsupported solutions whose image lies inside this triangle. We note that 

$$ z_\lambda(X) = \lambda_1 z_1(X) + \lambda_2 z_2(X) $$

with $\lambda_1 = z_{2r} - z_{2s}$ and $\lambda_2 = z_{1s} - z_{1r}$. 

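In the first phase, the objective function $z_\lambda(X)$ has been optimized by the Hungarian method, giving 

• $\bar z_\lambda = \lambda_1 z_{1r} + \lambda_2 z_{2r} = \lambda_1 z_{1s} + \lambda_2 z_{2s}$, the optimal value of $z_\lambda(X)$; 

• the optimal values of the reduced costs $\bar c_{ij}^{(\lambda)} = c_{ij}^{(\lambda)} - (u_i + v_j)$, where $c_{ij}^{(\lambda)} = \lambda_1 c_{ij}^{(1)} + \lambda_2 c_{ij}^{(2)}$, and $u_i$ and $v_j$ are the dual variables associated respectively with constraints i and j of problem $(P_\lambda)$. 

At optimality, we have $\bar c_{ij}^{(\lambda)} \ge 0$, and $x_{ij} = 1 \Rightarrow \bar c_{ij}^{(\lambda)} = 0$. 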

First Step 

We consider $L = \{x_{ij} : \bar c_{ij}^{(\lambda)} > 0\}$. To generate nonsupported efficient solutions in triangle $\Delta Z_r Z_s$, each variable $x_{ij} \in L$ is a candidate to be fixed to 1. Nevertheless, a variable can be eliminated if we are sure that the reoptimization of problem $(P_\lambda)$ will provide a dominated point in the objective space. If $x_{ij} \in L$ is set to 1, a lower bound $l_{ij}$ of the increase of $\bar z_\lambda$ is given by 

$$ l_{ij} = \bar c_{ij}^{(\lambda)} + \max\Big( \min_{k \neq j} \bar c_{i_r k}^{(\lambda)} + \min_{k \neq i} \bar c_{k j_r}^{(\lambda)},\ \min_{k \neq j} \bar c_{i_s k}^{(\lambda)} + \min_{k \neq i} \bar c_{k j_s}^{(\lambda)} \Big), $$

where the indices $i_r$ and $j_r$ ($i_s$ and $j_s$) are such that in the solution $X_r$ (respectively, $X_s$) we have 

$$ x_{i_r j} = x_{i j_r} = 1 \qquad (x_{i_s j} = x_{i j_s} = 1). $$
[Bi-Objective Assignment Problem, Figure 2: Test 1 (level $\lambda_1 z_{1s} + \lambda_2 z_{2r}$ for $z_\lambda(X)$)]


Effectively, to re-optimize problem $(P_\lambda)$ with $x_{ij} = 1$, with regard to its optimal solution $X_r$ (respectively, $X_s$), it is necessary to determine at least a new assignment in row $i_r$ (respectively, $i_s$) and in column $j_r$ (respectively, $j_s$). But clearly, to be inside the triangle $\Delta Z_r Z_s$, we must have (see Fig. 2) 

$$ \bar z_\lambda + l_{ij} < \lambda_1 z_{1s} + \lambda_2 z_{2r}. $$

Consequently, we obtain the following fathoming test: 
• (Test 1): $x_{ij} \in L$ can be eliminated if $\bar z_\lambda + l_{ij} \ge \lambda_1 z_{1s} + \lambda_2 z_{2r}$ or, equivalently, if $l_{ij} \ge \lambda_1 \lambda_2$ (since $\lambda_1 z_{1s} + \lambda_2 z_{2r} - \bar z_\lambda = \lambda_1 (z_{1s} - z_{1r}) = \lambda_1 \lambda_2$). 

So in this first step, the lower bound $l_{ij}$ is determined for all $x_{ij} \in L$, and the list is ordered by increasing values of $l_{ij}$. Only the variables not eliminated by Test 1 are kept. Problem $(P_\lambda)$ is re-optimized successively for each noneliminated variable; note that only one iteration of the Hungarian method is needed. After the optimization, the solution is eliminated if its image in the objective space lies outside the triangle $\Delta Z_r Z_s$. Otherwise, a nondominated solution is obtained and put in a list $NS_{rs}$; at this point, the second step is applied. 
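The bound $l_{ij}$ and Test 1 can be sketched in Python as follows. This is a minimal sketch, assuming the reduced costs $\bar c^{(\lambda)}$ are available as an n × n list of lists `cbar`, and that $X_r$ and $X_s$ are given as permutations `perm_r`, `perm_s` (row i is assigned to column `perm[i]`); indices are 0-based, and the function names are hypothetical.

```python
def lower_bound(cbar, i, j, perm_r, perm_s):
    """Lower bound l_ij on the increase of z_lambda when x_ij is fixed to 1.

    cbar[i][j] holds the reduced costs; perm[i] is the column assigned to row i.
    """
    n = len(cbar)

    def repair(perm):
        i_r = perm.index(j)   # row currently assigned to column j: x_{i_r j} = 1
        j_r = perm[i]         # column currently assigned to row i: x_{i j_r} = 1
        row = min(cbar[i_r][k] for k in range(n) if k != j)  # new column for row i_r
        col = min(cbar[k][j_r] for k in range(n) if k != i)  # new row for column j_r
        return row + col

    return cbar[i][j] + max(repair(perm_r), repair(perm_s))


def fathom_test1(cbar, perm_r, perm_s, lam1, lam2):
    """Variables of L kept by Test 1, as (l_ij, (i, j)) sorted by increasing l_ij."""
    n = len(cbar)
    kept = []
    for i in range(n):
        for j in range(n):
            if cbar[i][j] > 0:                        # x_ij belongs to L
                l_ij = lower_bound(cbar, i, j, perm_r, perm_s)
                if l_ij < lam1 * lam2:                # Test 1 eliminates l_ij >= lam1 * lam2
                    kept.append((l_ij, (i, j)))
    return sorted(kept)
```

Since both $X_r$ and $X_s$ are optimal for $(P_\lambda)$, every $x_{ij} \in L$ is 0 in both, so the "repair" row and column always differ from i and j.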


Second Step 


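When nondominated points $Z_1, \dots, Z_m \in NS_{rs}$ are found inside the triangle $\Delta Z_r Z_s$, Test 1 can be improved. Effectively (see Fig. 3), in this test the value $\lambda_1 z_{1s} + \lambda_2 z_{2r}$ can be replaced by the lower value 

$$ \max_{q = 0, \dots, m} \left( \lambda_1 z_{1, q+1} + \lambda_2 z_{2, q} \right), $$

where $Z_q = (z_{1,q}, z_{2,q})$, $Z_0 = Z_r$ and $Z_{m+1} = Z_s$, the points being numbered by increasing values of $z_1$. 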


[Bi-Objective Assignment Problem, Figure 3: Test 2]


The new value corresponds to an updated upper bound of $z_\lambda(X)$ for nondominated points. More variables of L can be eliminated with the new test: 
• (Test 2): $x_{ij} \in L$ can be eliminated if 

$$ \bar z_\lambda + l_{ij} \ge \max_{q = 0, \dots, m} \left( \lambda_1 z_{1, q+1} + \lambda_2 z_{2, q} \right). $$

Each time a new nondominated point is obtained, the list $NS_{rs}$ and Test 2 are updated. The procedure stops when all the $x_{ij} \in L$ have been either eliminated or analyzed. At this moment the list $NS_{rs}$ contains the nonsupported solutions corresponding to the triangle $\Delta Z_r Z_s$. When each triangle has been examined, 

$$ NSE(P) = \bigcup_{r,s} NS_{rs}. $$

Numerical results are given in [3]. 
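The staircase bound used by Test 2 is easy to compute once the points found so far are sorted by $z_1$. The sketch below assumes the points are passed as $(z_1, z_2)$ tuples including $Z_r$ and $Z_s$; the function names are hypothetical.

```python
def staircase_bound(points, lam1, lam2):
    """max_q (lam1 * z1_{q+1} + lam2 * z2_q) over consecutive points Z_0, ..., Z_{m+1}.

    `points` contains Z_r, Z_s and the nondominated points found so far, as (z1, z2).
    """
    pts = sorted(points)  # increasing z1 (hence decreasing z2 along the frontier)
    return max(lam1 * pts[q + 1][0] + lam2 * pts[q][1] for q in range(len(pts) - 1))


def kept_by_test2(zbar_lam, l_ij, points, lam1, lam2):
    """Test 2: x_ij is eliminated when zbar_lambda + l_ij >= staircase bound."""
    return zbar_lam + l_ij < staircase_bound(points, lam1, lam2)
```

With only $Z_r$ and $Z_s$ in `points`, the bound equals $\lambda_1 z_{1s} + \lambda_2 z_{2r}$, so Test 2 reduces to Test 1; every new point can only lower the bound.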


Heuristic Methods 


As described in ▸ Multi-objective combinatorial optimization, the MOSA method is an adaptation of the simulated annealing heuristic procedure to a multi-objective framework. Its aim is to generate a good approximation, denoted $\hat E(P)$, of E(P), and the procedure is valid for any number K ≥ 2 of objectives. Just as a potentially optimal solution emerges from a single objective heuristic, in the MOSA method the set $\hat E(P)$ will contain potentially efficient solutions. 


Preliminaries 


• A widely diversified set of weights is considered: different weight vectors $\lambda^{(l)}$, $l \in L$, are generated, where $\lambda^{(l)} = (\lambda_k^{(l)},\ k = 1, \dots, K)$ with $\lambda_k^{(l)} \ge 0$, $\forall k$, and 

$$ \sum_{k=1}^{K} \lambda_k^{(l)} = 1, \quad \forall l \in L. $$

• A scalarizing function $s(z, \lambda)$ is chosen; the effect of this choice on the procedure is small, due to the stochastic character of the method. The weighted sum is very well known and is the easiest scalarizing function: 

$$ s(z, \lambda) = \sum_{k=1}^{K} \lambda_k z_k. $$

• The three classic parameters of a simulated annealing procedure are initialized: 
 - $T_0$: the initial temperature (or alternatively an initial acceptance probability $P_0$); 
 - $\alpha$ (< 1): the cooling factor; 
 - $N_{step}$: the length of the temperature step in the cooling schedule; 
 and the two stopping criteria are fixed: 
 - $T_{stop}$: the final temperature; 
 - $N_{stop}$: the maximum number of iterations without improvement. 
• A neighborhood V(X) of feasible solutions in the vicinity of X is defined. This definition is problem dependent. It is particularly easy to define V(X) in the case of the assignment problem: if X is characterized by $x_{i j_i} = 1$, $i = 1, \dots, n$, then V(X) contains all the solutions Y satisfying 

$$ y_{i j_i} = 1, \quad i \in \{1, \dots, n\} \setminus \{a, b\}, \qquad y_{a j_b} = y_{b j_a} = 1, $$

where a and b are chosen randomly in {1, ..., n}. 
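Drawing a solution from V(X) amounts to exchanging the columns assigned to two rows. A minimal Python sketch, assuming X is stored as a 0-based permutation `perm` (row i is assigned to column `perm[i]`) and assuming a ≠ b, since a = b would simply return X itself:

```python
import random


def random_neighbor(perm, rng=random):
    """Return Y in V(X): rows a and b exchange their assigned columns."""
    n = len(perm)
    a, b = rng.sample(range(n), 2)   # two distinct rows, chosen at random
    y = list(perm)
    y[a], y[b] = perm[b], perm[a]    # y_{a j_b} = y_{b j_a} = 1, other rows unchanged
    return y
```

For example, `random_neighbor([0, 1, 2, 3], random.Random(0))` returns a permutation that differs from the input in exactly two positions.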


Determination of PE(λ^(l)), l = 1,...,L 


For each $l \in L$ the following procedure is applied to determine a list $PE(\lambda^{(l)})$ of potentially efficient solutions. 

a) (Initialization): 
 - Draw at random an initial solution $X_0$. 
 - Evaluate $z_k(X_0)$, $\forall k$. 
 - $PE(\lambda^{(l)}) = \{X_0\}$; $N_c = n = 0$. 

b) (Iteration n): 
 - Draw at random a solution $Y \in V(X_n)$. 
 - Evaluate $z_k(Y)$ and determine $\Delta z_k = z_k(Y) - z_k(X_n)$, $\forall k$. 
 - Calculate $\Delta s = s(z(Y), \lambda^{(l)}) - s(z(X_n), \lambda^{(l)})$. 
   If $\Delta s \le 0$, we accept the new solution: $X_{n+1} \leftarrow Y$, $N_c = 0$. 
   Else we accept the new solution with probability $p = \exp(-\Delta s / T_n)$: 
   with probability $p$: $X_{n+1} \leftarrow Y$, $N_c = 0$; 
   with probability $1 - p$: $X_{n+1} \leftarrow X_n$, $N_c \leftarrow N_c + 1$. 
 - If necessary, update the list $PE(\lambda^{(l)})$ with regard to the solution Y. 
 - $n \leftarrow n + 1$. 

 IF $n \pmod{N_{step}} = 0$ THEN $T_n = \alpha T_{n-1}$; ELSE $T_n = T_{n-1}$. 
 IF $N_c = N_{stop}$ OR $T < T_{stop}$ THEN stop ELSE iterate. 
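Steps a) and b) can be sketched for the bi-objective assignment problem as follows. This is a minimal, non-authoritative sketch: it assumes the weighted-sum scalarizing function, solutions stored as 0-based permutations, and illustrative default parameter values (`T0`, `alpha`, `n_step`, `T_stop`, `n_stop`) that are not taken from the article; the helper names are hypothetical.

```python
import math
import random


def dominates(za, zb):
    """True if za dominates zb (minimization): no worse everywhere, better somewhere."""
    return all(a <= b for a, b in zip(za, zb)) and any(a < b for a, b in zip(za, zb))


def objectives(perm, costs):
    """z_k(X) = sum_i c^(k)[i][perm[i]] for each cost matrix c^(k)."""
    return tuple(sum(c[i][perm[i]] for i in range(len(perm))) for c in costs)


def update_pe(pe, perm, z):
    """Add (perm, z) to PE unless a stored point dominates or equals z."""
    if any(dominates(zp, z) or zp == z for _, zp in pe):
        return pe
    return [(p, zp) for p, zp in pe if not dominates(z, zp)] + [(list(perm), z)]


def mosa(costs, lam, T0=10.0, alpha=0.9, n_step=50, T_stop=0.01, n_stop=500, seed=0):
    """Potentially efficient solutions PE(lambda) for one weight vector lam."""
    rng = random.Random(seed)
    n = len(costs[0])

    def s(z):  # weighted-sum scalarizing function s(z, lambda)
        return sum(w * zk for w, zk in zip(lam, z))

    x = list(range(n))
    rng.shuffle(x)                       # (a) random initial solution X_0
    zx = objectives(x, costs)
    pe = [(list(x), zx)]
    T, it, n_c = T0, 0, 0
    while n_c < n_stop and T >= T_stop:
        a, b = rng.sample(range(n), 2)   # (b) Y in V(X_n): swap the columns of rows a, b
        y = list(x)
        y[a], y[b] = x[b], x[a]
        zy = objectives(y, costs)
        ds = s(zy) - s(zx)
        if ds <= 0 or rng.random() < math.exp(-ds / T):
            x, zx, n_c = y, zy, 0        # accept Y
        else:
            n_c += 1                     # keep X_n
        pe = update_pe(pe, y, zy)        # update PE(lambda) with regard to Y
        it += 1
        if it % n_step == 0:
            T *= alpha                   # cooling: T_n = alpha * T_{n-1}
    return pe
```

The list `pe` returned for one weight vector is mutually nondominated by construction; the union over all weight vectors still has to be filtered, as described next.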


Generation of E(P) 


Because of the use of a scalarizing function, a given set of weights $\lambda^{(l)}$ induces a privileged direction on the efficient frontier. The procedure generates only a good subset of potentially efficient solutions in that direction. Nevertheless, it is possible to obtain solutions which are not in this direction, because of the large exploration of D at high temperature; these solutions are often dominated by some solutions generated with other weight sets. 

To obtain a good approximation $\hat E(P)$ of E(P), it is thus necessary to filter the set 

$$ \bigcup_{l \in L} PE(\lambda^{(l)}) $$

by pairwise comparisons to remove the dominated solutions. This filtering procedure is denoted by ∧, so that 

$$ \hat E(P) = \bigwedge_{l \in L} PE(\lambda^{(l)}). $$

A great number of experiments is required to determine the number L of weight sets sufficient to give a good approximation of the whole efficient frontier. 
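The filtering step ∧ can be illustrated on objective vectors alone. The sketch below is an assumption-laden illustration: each $PE(\lambda^{(l)})$ is reduced to a list of $(z_1, \dots, z_K)$ tuples (minimization), and the names are hypothetical.

```python
def dominates(za, zb):
    """za dominates zb (minimization)."""
    return all(a <= b for a, b in zip(za, zb)) and any(a < b for a, b in zip(za, zb))


def filter_nondominated(pe_lists):
    """Union of the PE(lambda^(l)) lists, keeping only mutually nondominated points."""
    merged = {z for pe in pe_lists for z in pe}          # union, duplicates removed
    return sorted(z for z in merged if not any(dominates(w, z) for w in merged))
```

For instance, with the illustrative lists `[[(1, 9), (4, 4)], [(5, 5), (2, 7)], [(4, 4), (8, 1)]]`, the point (5, 5) is removed because (4, 4) dominates it. The pairwise comparison costs quadratic time in the size of the union, which is acceptable for the moderate archive sizes MOSA produces.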


Concluding Remarks 


Details and numerical results are given in [3] and [5]. Let us add that it is easy to adapt the MOSA method in an interactive way [2]; a real case study of an assignment problem is treated in this manner in [6]. 
The biquadratic assignment problem (BiQAP) was first introduced by R.E. Burkard, E. Cela and B. Klinz [2] as a nonlinear assignment problem that has applications in very large scale integrated (VLSI) circuit design. Given two four-dimensional arrays $A = (a_{ijkl})$ and $B = (b_{mpst})$ with $n^4$ elements each, the nonlinear integer programming formulation of the BiQAP is 


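$$
\begin{aligned}
\min\ & \sum_{i,j,k,l=1}^{n} \sum_{m,p,s,t=1}^{n} a_{ijkl}\, b_{mpst}\, x_{im} x_{jp} x_{ks} x_{lt}\\
\text{s.t.}\ & \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, n,\\
& \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n,\\
& x_{ij} \in \{0, 1\}, \quad i, j = 1, \dots, n.
\end{aligned}
$$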


The BiQAP is a generalization of the quadratic assignment problem (QAP) (cf. ▸ Quadratic assignment problem), in which the objective function is a fourth-degree multivariable polynomial and the feasible domain is the assignment polytope, as in the QAP. An equivalent formulation of the BiQAP using permutations is the following: 


$$ \min_{\pi \in S_n} \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n} a_{ijkl}\, b_{\pi(i)\pi(j)\pi(k)\pi(l)}, $$

where $S_n$ denotes the set of all permutations of the integer set N = {1, ..., n}. 
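The equivalence of the two formulations can be checked numerically on a tiny instance: setting $x_{im} = 1$ exactly when $\pi(i) = m$ collapses the eight-index sum to the four-index permutation form. The sketch below assumes the arrays are nested Python lists with 0-based indices; the helper names are hypothetical, and the brute-force solver is only meant for very small n.

```python
import itertools
import random


def biqap_perm(a, b, pi):
    """Permutation form: sum of a[i][j][k][l] * b[pi(i)][pi(j)][pi(k)][pi(l)]."""
    n = len(pi)
    return sum(a[i][j][k][l] * b[pi[i]][pi[j]][pi[k]][pi[l]]
               for i, j, k, l in itertools.product(range(n), repeat=4))


def biqap_x(a, b, x):
    """0-1 form: sum of a_ijkl * b_mpst * x_im * x_jp * x_ks * x_lt over 8 indices."""
    n = len(x)
    idx = list(itertools.product(range(n), repeat=4))
    return sum(a[i][j][k][l] * b[m][p][s][t] * x[i][m] * x[j][p] * x[k][s] * x[l][t]
               for (i, j, k, l) in idx for (m, p, s, t) in idx)


def perm_to_x(pi):
    """Assignment matrix with x[i][m] = 1 exactly when pi(i) = m."""
    n = len(pi)
    return [[1 if pi[i] == m else 0 for m in range(n)] for i in range(n)]


def random_array(n, rng, hi=9):
    """Random n x n x n x n array of integers in [0, hi] (illustrative data)."""
    return [[[[rng.randint(0, hi) for _ in range(n)] for _ in range(n)]
             for _ in range(n)] for _ in range(n)]


def brute_force(a, b):
    """(optimal value, optimal permutation) by enumerating all n! permutations."""
    n = len(a)
    return min((biqap_perm(a, b, pi), pi) for pi in itertools.permutations(range(n)))
```

The permutation form needs $n^4$ terms per evaluation instead of $n^8$, which is why heuristics work with permutations directly.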

Burkard, Cela and Klinz [2] showed that the BiQAP is NP-hard. They computed lower bounds for the BiQAP derived from lower bounds of the QAP. The computational results showed that these bounds are weak and deteriorate as the dimension of the problem increases. This observation suggests that branch and bound methods (cf. also ▸ Integer programming: Branch and bound methods) will only be effective on very small instances. For larger instances, efficient heuristics that find good-quality approximate solutions are needed. 

Burkard and Cela [1] developed several heuristics for the BiQAP, in particular deterministic improvement methods and variants of simulated annealing and tabu search. Computational experiments on test problems with known optimal solutions [1] suggest that one version of simulated annealing is the best among those tested. T. Mavridou, P.M. Pardalos, L.S. Pitsoulis and M.G.C. Resende developed a GRASP heuristic for solving the BiQAP in [3], which finds the optimal solution for all the test problems presented in [1]. 
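The specific procedures of [1] are not described here. The following is therefore only a generic first-improvement pairwise-exchange local search, a common kind of deterministic improvement method for permutation problems, shown under the assumption that instances are small nested-list arrays with 0-based indices. All names are hypothetical.

```python
import itertools
import random


def biqap_perm(a, b, pi):
    """BiQAP objective in permutation form."""
    n = len(pi)
    return sum(a[i][j][k][l] * b[pi[i]][pi[j]][pi[k]][pi[l]]
               for i, j, k, l in itertools.product(range(n), repeat=4))


def random_array(n, rng, hi=9):
    """Random n x n x n x n integer array (illustrative data only)."""
    return [[[[rng.randint(0, hi) for _ in range(n)] for _ in range(n)]
             for _ in range(n)] for _ in range(n)]


def pairwise_exchange(a, b, pi):
    """Swap two assignments whenever this lowers the objective; stop at a local optimum."""
    pi = list(pi)                      # work on a copy of the starting permutation
    best = biqap_perm(a, b, pi)
    improved = True
    while improved:
        improved = False
        for u, v in itertools.combinations(range(len(pi)), 2):
            pi[u], pi[v] = pi[v], pi[u]
            val = biqap_perm(a, b, pi)
            if val < best:
                best, improved = val, True      # keep the improving swap
            else:
                pi[u], pi[v] = pi[v], pi[u]     # undo the swap
    return pi, best
```

The result is only locally optimal with respect to single swaps. Simulated annealing, tabu search and GRASP all try to escape such local optima, which is consistent with their better reported performance.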


See also 


▸ Feedback Set Problems 

▸ Generalized Assignment Problem 

▸ Graph Coloring 

▸ Graph Planarization 

▸ Greedy Randomized Adaptive Search Procedures 
▸ Quadratic Assignment Problem 
The centuries-old method of bisection can be generalized to provide a global optimization algorithm for Lipschitz continuous functions. Full details of the algorithm, acceleration methods and its performance can be found in [1,6,7]. (Recall that $f: \mathbb{R}^n \to \mathbb{R}$ is Lipschitz continuous if there is an M > 0 such that $|f(x) - f(y)| \le M \|x - y\|$ for all $x, y \in \mathbb{R}^n$. We then term M a Lipschitz constant of f.) 

The familiar bisection method enables us to find 
a point of interest on the line by first bracketing the 
point in an interval, and then successively halving the 
interval. It is used in this way, for example, to find 
the root of a continuous function or to show that 
a bounded sequence always has a limit point. The bisec- 
tion method is simple and convergence is assured and 
linear. 

The bisection method can also be used (although we never think of it in this role) to find the minimum of a semi-infinite interval [m, ∞), as illustrated in the left-hand side of Table 1. Given an initial interval bracket around m, we examine the midpoint: if the midpoint is in [m, ∞) then we retain the lower interval, whereas if the midpoint is not in [m, ∞) we retain the upper interval. It is this idea that has been generalized to higher dimensions to give the algorithm, detailed here, that has been termed in the literature multidimensional bisection. 

Bisection Global Optimization Methods, Table 1 
A comparison of the bisection method and the generalization to higher dimensions 

| | Bisection (n = 0) | Multidimensional bisection (n > 0) |
|---|---|---|
| Problem | [m, ∞); find m | Epigraph of f; find m |
| Natural domain (in R^n) | Point | Line segment (n = 1), hexagon (n = 2), rhombic dodecahedron (n = 3) |
| Initial bracket (in R^(n+1)) | Single interval | Union of (n + 1)-dimensional simplexes |
| Bracket reduction | Interval halving | Reduction of simplexes, followed by elimination |
| Convergence | ∩ all brackets = {m}; bracket size halves | ∩ all brackets = {all global minima}; bracket depth reduces linearly |
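The left-hand column of Table 1 can be made concrete with a few lines of Python. This is a minimal sketch, assuming [m, ∞) is known only through a membership test `in_set(x)` (true exactly when x ≥ m) and that the initial bracket satisfies lo < m ≤ hi; the names are hypothetical.

```python
def bisect_lower_end(in_set, lo, hi, tol=1e-12):
    """Locate m for the set [m, inf), using only a membership test.

    Assumes the bracket lo < m <= hi, i.e. in_set(lo) is False and in_set(hi) is True.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_set(mid):
            hi = mid        # midpoint is in [m, inf): keep the lower interval
        else:
            lo = mid        # midpoint is not in [m, inf): keep the upper interval
    return 0.5 * (lo + hi)
```

For example, `bisect_lower_end(lambda x: x >= 2 ** 0.5, 0.0, 4.0)` brackets √2, and the bracket length halves at every step, which is the linear convergence noted in the table.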

It can be shown (see [7]) that the analogue in $\mathbb{R}^{n+1}$ of an upper semi-infinite interval in $\mathbb{R}$ is the epigraph (everything above and including the graph) of a Lipschitz continuous function. Multidimensional bisection finds the set of global minima of a Lipschitz continuous function f of n variables over a compact domain, in a manner analogous to the bisection method. At any stage in the iteration the bracket is a union of similar simplexes in $\mathbb{R}^{n+1}$, with the initial bracket a single simplex. (A simplex is the convex hull of affinely independent points: a triangle, a tetrahedron and so on.) In the raw version of the algorithm the depth of the bracket decreases linearly, and the infinite intersection of all brackets is the set of global minima of the graph of the function. 

The algorithm works thanks to two simple facts and a very convenient piece of geometry. First, however, we note a property of a Lipschitz continuous function with Lipschitz constant M: if $x \in \mathbb{R}^n$ lies in the domain of the function and (x, y) (with $y \in \mathbb{R}$) lies in the epigraph of the function, then (x, y) + C lies in the epigraph, where C is an upright spherically based cone of slope M with apex at the origin. 
[Bisection Global Optimization Methods, Figure 1: A standard simplex and the three smaller standard simplexes resulting from reduction; when (x, f(x)) − A is removed from the standard simplex, three similar standard simplexes remain. Labels: the point (x, f(x)), the standard simplex, one of the three new simplexes, an edge with slope M]


Now for the two simple facts: if we evaluate the function f at any point x in the domain, then no point higher than (x, f(x)) can be the global minimum on the graph of f, and no point in the interior of (x, f(x)) − C can be the global minimum. Informally, this means that every evaluation of f lets us slice away an upper half-space and an upside-down ice-cream cone, with apex at (x, f(x)), from the space $\mathbb{R}^{n+1}$; we are sure the global optima are not there. These two operations coalesce in the familiar bisection method. 
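The two facts translate into a simple test for whether a candidate point of $\mathbb{R}^{n+1}$ can still be a global minimum on the graph. The sketch below is illustrative only: it assumes the evaluations so far are stored as `(x, f(x))` pairs with `x` a coordinate tuple, uses the Euclidean norm for the spherically based cone, and the function name is hypothetical (the algorithm itself works with simplexes rather than testing individual points).

```python
import math


def can_discard(point, evaluations, M):
    """True if point = (x, y) in R^(n+1) cannot be a global minimum on the graph of f.

    `evaluations` is a list of (x_i, f(x_i)) pairs already computed.
    """
    x, y = point
    f_best = min(fx for _, fx in evaluations)
    if y > f_best:                     # fact 1: above the lowest evaluation
        return True
    for xi, fxi in evaluations:
        # fact 2: strictly inside the downward cone (x_i, f(x_i)) - C of slope M
        if y < fxi - M * math.dist(x, xi):
            return True
    return False
```

With one evaluation f(0, 0) = 1 and M = 1, the point ((3, 4), −5) is discarded because it lies below 1 − 5 = −4, while ((3, 4), −3) survives both tests.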

Now for the convenient geometry, which comes to 
light as soon as we attempt to generalise the bisection 
method. Spherically based cones are ideal to use, but 
hard to keep track of efficiently [3], so we use a sim- 
plicial approximation to the spherical base of the cone 
to make the bookkeeping easy. Such a simplex-based 
cone, A, has a cap which we call a standard simplex; 
one is shown as the large simplex in Fig. 1, for the case 
when n = 2. It fits snugly inside C, so the sloping edges 
have slope M. If we know that the global optimum lies 
in this simplex bracket and evaluate f at x, then we can 
remove (x, f(x)) — A from the space. Conveniently, this 
leaves three similar standard simplexes whose union 
must contain the global minima, as shown in Fig. 1. 
This process is termed reduction of the simplex. 


What does a typical iteration of the algorithm do? 
At the start of each iteration the global minima are held 
in a multidimensional bracket, a union of similar standard simplexes. We denote this set of simplexes, or system, by S. An iteration consists of reducing some (possibly all) of these simplexes, followed by elimination, that is, retaining only the portions of the bracket at the level of, or below, the current lowest function evaluation. For 
this reason an iteration can be thought of informally as 
‘chop and drop’, or formally as ‘reduce and eliminate’. 

How do we start off? The algorithm operates on cer- 
tain natural domains which we must assume contain 
a global minimizer (just as we begin in the familiar bi- 
section method by containing the point of interest in 
an interval). For functions of one variable a natural do- 
main is an interval, for functions of two variables it is 
a hexagon, while for functions of three variables the 
natural domain is a rhombic dodecahedron (the hon- 
eycomb cell). For higher dimensions the pattern con- 
tinues; in each dimension the natural domains are ca- 
pable of tiling the space. By means of n + 1 function 
evaluations at selected vertices of the natural domain it 
is possible to bracket the global optima over the natu- 
ral domain in an initial single standard simplex, termed 
the initial system. 

In brief, given a Lipschitz continuous function f on a standard domain, the algorithm can be summarised as: 

Multidimensional bisection 
1. Set i = 0 and form the initial system $S_0$. 
2. Form $S_{i+1}$ by applying reduction and then elimination to the system $S_i$. 
3. If a stopping criterion is satisfied (such as that the variation of the system is less than a preassigned amount), then stop. Otherwise, increment i and return to Step 2. 


By the variation of the system is meant the height from top to bottom of the current set of simplexes. The following example illustrates the course of a run of multidimensional bisection. 

Take $f(x_1, x_2) = -e^{-x_1^2} \sin x_1 + |x_2|$, which has a global minimum on its graph at (0.653273, 0, −0.396653). There are also local minima along the $x_1$-axis. We use as our standard domain the regular hexagon with center at (10, 10) and radius 20, and use M = 1. Table 2 provides snapshots of the progress of the algorithm to convergence; it stops when the variation is less than 0.001. We carry the best point to date, shown in the final column of the table. 

Bisection Global Optimization Methods, Table 2 
Example of a run of multidimensional bisection. Note how the number of simplexes in the system decreases in the 8th iteration; this corresponds to the elimination of simplexes around local, and nonglobal, minima 

| Iteration | Simplexes in the system | Variation | Best point to date |
|---|---|---|---|
| 0 | 1 | 33.300 | (10.000, −10.000, 10.000) |
| 1 | 3 | 20.000 | (10.000, 6.667, 6.667) |
| 2 | 9 | 9.892 | (1.340, 1.667, 1.505) |
| 5 | 108 | 1.959 | (25.637, 0.185, 0.185) |
| 7 | 264 | 0.504 | (0.839, 0.074, −0.294) |
| 8 | 39 | 0.257 | (0.649, −0.036, −0.361) |
| 15 | 369 | 0.007 | (0.669, 0.000, −0.396) |
| 18 | 924 | 0.001 | (0.653, 0.000, −0.397) |
| 19 | 1287 | 0.000 | (0.651, 0.000, −0.397) |

In this example we reduced all simplexes in the sys- 
tem at each iteration. This ensures that the infinite in- 
tersection of the brackets is the set of global minima. In 
[6] it is shown that, under certain conditions, the optimal one-step strategy is to reduce only the deepest simplex in each iteration. With this reduction and n = 1, multidimensional bisection is precisely the Piyavskii-Shubert algorithm [4,5]. 
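Since the text identifies the n = 1, deepest-simplex variant with the Piyavskii-Shubert algorithm, a one-dimensional sketch may help. Each heap entry below stands for one downward-pointing triangle under the graph, keyed by its lowest point, so always popping the top of the heap mirrors "reduce only the deepest simplex"; elimination is implicit, since triangles above the best value are never popped before the stopping test fires. The function name, the tolerance-based stopping rule and the evaluation cap are assumptions, not part of the source.

```python
import heapq
import math


def piyavskii_shubert(f, a, b, M, tol=1e-6, max_evals=10_000):
    """Minimize f on [a, b], assuming |f(x) - f(y)| <= M |x - y|.

    Each heap entry is a triangle below the graph, keyed by its lowest point,
    so the deepest triangle is always reduced first.
    """
    fa, fb = f(a), f(b)
    best_x, best_f = (a, fa) if fa <= fb else (b, fb)
    heap = []

    def push(x0, f0, x1, f1):
        # Intersection of the lines f0 - M(x - x0) and f1 - M(x1 - x): the lowest point
        x = 0.5 * (x0 + x1) + (f0 - f1) / (2.0 * M)
        depth = 0.5 * (f0 + f1) - 0.5 * M * (x1 - x0)
        heapq.heappush(heap, (depth, x, x0, f0, x1, f1))

    push(a, fa, b, fb)
    evals = 2
    while heap and evals < max_evals:
        depth, x, x0, f0, x1, f1 = heapq.heappop(heap)
        if best_f - depth < tol:        # deepest point is within tol of the best value
            break
        fx = f(x)
        evals += 1
        if fx < best_f:
            best_x, best_f = x, fx
        push(x0, f0, x, fx)             # reduction: one triangle becomes two
        push(x, fx, x1, f1)
    return best_x, best_f
```

Applied to the restriction $x_1 \mapsto f(x_1, 0) = -e^{-x_1^2}\sin x_1$ of the example above on [−2, 3] with M = 1 (a valid Lipschitz constant for this restriction), the sketch should approach the minimum near $x_1 \approx 0.653$ with value ≈ −0.3967.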

Raw multidimensional bisection can require a large 
number of function evaluations, but can be economical with computer time (see [2]). As described so far, the method does not use the full power of the spherical cone but rather a simplicial approximation, and this approximation rapidly worsens as the dimension increases. Fortunately, much of the spherical power can 
be utilized very simply, by raising the function evalua- 
tion to an effective height. This is trivial to implement 
and has been called spherical reduction [6]. Reduction, 
as described so far, removes material only from a single 
simplex, whose apex determines the evaluation point. 
Simplexes overlap when n > 2, and it is possible to re- 


move material from many simplexes rather than just 
one. This is harder to implement, but has been carried 
out in [1] where it is termed complete reduction. The al- 
gorithm operates more efficiently when such improved 
reduction methods are used. 

Multidimensional bisection collapses to bisection 
with n = 0 when we use a primitive reduction process, 
one which depends only on whether the point in $\mathbb{R}^{n+1}$ 
considered lies in the epigraph of f; this is described in 
[7]. A summary comparison of bisection and multidi- 
mensional bisection is given in Table 1. 


See also 


▸ αBB Algorithm 
The conventional nonfuzzy relations using the classical 
two-valued Boolean logic connectives for defining their 


operations will be called crisp. The extensions that replace the 2-valued Boolean logic connectives by many-valued logic connectives will be called fuzzy. A unified approach to relations is provided here, so that the 
Boolean (crisp, nonfuzzy) relations and sets are just 
special cases of fuzzy relational structures. The first 
part of this entry on nonfuzzy relations can be used 
as reference independently, without any knowledge of 
fuzzy sets. The second part on fuzzy structures, how- 
ever, refers frequently to the first part. This is so because 
most formulas in the matrix notation carry over to the 
many-valued logics based extensions. 

In order to make this material useful not only theo- 
retically but also in practical applications, we have paid 
special attention to the form in which the material is 
presented. There are seven distinguishing features of 
our approach that facilitate the unification of crisp and 
fuzzy relations and enhance their practical applicability: 
1) Relations in their predicate forms are distinguished from their satisfaction sets. 

2) Foresets and aftersets of relations are used in addition to relational predicates. 

3) Relational properties are not only global but also local (important for applications). 

4) Nonassociative BK-products are introduced and used both in definitions of relational properties and in computations. 

5) A unified treatment of computational algorithms by means of matrix notation is used, which is equally applicable to both crisp and fuzzy relations. 

6) The theory unifying crisp and fuzzy relations makes it possible to represent a whole finite nested family of crisp relations with special properties as a single cutworthy fuzzy relation for the purpose of computation. After completing the computations, the resulting fuzzy relation is again converted by α-cuts to a nested family of crisp relations, thus increasing the computing performance considerably. 

7) Homomorphisms between relations are extended from mappings used in the literature to general relations. This yields generalized morphisms important for practical solving of relational inequalities and equations. 

These features were first introduced in 1977 by W. Bandler and L.J. Kohout [1] and have been extensively developed over the years, both in theory and in practical applications [7,30,52]. 
Boolean Relations 
Propositional Form 


A binary relation (from A to B) is given by an open predicate __P__ with two empty slots; when the first is filled with the name a of an element of A and the second with the name b of an element of B, there results a proposition, which is either true or false. If aPb is true, we write a R_P b and say that 'a is R_P-related to b'. If aPb is false, we write a ¬R_P b and say that 'a is not R_P-related to b', etc. When it is unnecessary to emphasize the propositional form, the subscript in R_P is dropped, writing R, aRb and a¬Rb, respectively. 


Heterogeneous and Homogeneous Relations 


The lattice of all binary (two-place, 2-argument) relations from A to B is denoted by R(A → B). Relations of this kind are usually called heterogeneous. Nothing forbids the set B to be the same as A, in which case we speak of relations 'within a set', 'in a set' or 'on a set', and call these homogeneous. 

Relations from A to B can always be considered as relations within A ∪ B, but relations so 'homogenized' may lose some valuable properties (discussed below). For this reason, we do not attempt to assimilate relations between distinct sets to those within a set. 


The Satisfaction Set 


The satisfaction set or representative set or extension set of a relation R ∈ R(A → B) is the set of all those pairs (a, b) ∈ A × B for which aRb holds: 

R_S = {(a, b) ∈ A × B : aRb}. 

Clearly R_S is a subset of the Cartesian product A × B. Knowing R_P, we know R_S; knowing R_S, we know everything about R_P except the wording of its 'name' P. 


The Extensionality Convention 


This convention says that, regardless of their propositional wordings, two relations should be regarded as the same if they hold, or fail to hold, between exactly the same pairs: R_S = R′_S ⇒ R_P = R′_P. In set theory, this appears as the axiom of extensionality. This convention is not universally convenient; it is perhaps partly responsible for delays in the application of relation theory in engineering, the social and economic sciences, and elsewhere. 

Once the extensionality convention has been adopted, it becomes a matter of indifference, or mere convenience, whether a relation is given by an open predicate or by the specification of its satisfaction set. There is a one-to-one correspondence between the subsets R_S of A × B and the (distinguishable) relations R_P in R(A → B). Since R_S and R_P now uniquely determine each other, the current fashion for set-theoretical parsimony suggests that they be identified. This view is common in the literature, which often defines relations as being satisfaction sets. We, however, maintain the distinction in principle. 


Example of the failure of the extensionality convention 

R_≥, Q_> ∈ R(A → B); A = {1, 6, 8}, B = {0, 5, 7}. 

Predicates: 
P_1 ≡ __ ≥ __ ('__ is greater than or equal to __') 
P_2 ≡ __ > __ ('__ is greater than __') 

Relations in their predicate form: 
R_≥ = {1 ≥ 0, 8 ≥ 0, 8 ≥ 5, 8 ≥ 7, 6 ≥ 0, 6 ≥ 5} 
Q_> = {1 > 0, 8 > 0, 8 > 5, 8 > 7, 6 > 0, 6 > 5} 

The satisfaction sets: 
R_S = Q_S = {(1, 0), (8, 0), (8, 5), (8, 7), (6, 0), (6, 5)}. 

By the extensionality convention: 
R_S = Q_S ⇒ R_≥ = Q_>. 

So R should be the same relation as Q. This is not the case, because the predicates are not equivalent: 
(∀x) x P_1 x is true, but (∀x) x P_2 x is false. 

Hence the extensionality convention fails for these relations. 
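The example can be replayed directly: the two predicates yield identical satisfaction sets on A × B, yet they disagree on reflexivity. In this sketch the reflexivity check is restricted to the sample values A ∪ B, which is an assumption made only for illustration (the text quantifies over all x).

```python
A = {1, 6, 8}
B = {0, 5, 7}


def satisfaction_set(pred, A, B):
    """R_S = {(a, b) in A x B : a P b}."""
    return {(a, b) for a in A for b in B if pred(a, b)}


def geq(a, b):
    """P_1: '__ is greater than or equal to __'."""
    return a >= b


def gt(a, b):
    """P_2: '__ is greater than __'."""
    return a > b


R_S = satisfaction_set(geq, A, B)
Q_S = satisfaction_set(gt, A, B)
same_extension = R_S == Q_S            # the satisfaction sets coincide

samples = A | B                        # sample values for the check (x) P x
p1_reflexive = all(geq(x, x) for x in samples)
p2_reflexive = all(gt(x, x) for x in samples)
```

Here `same_extension` is `True` while `p1_reflexive` and `p2_reflexive` differ, which is exactly the distinction between predicate form and satisfaction set that the text maintains.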


The Digraph Representation 


When B = A, so that we are dealing with a relation within a set, we may use the digraph R_D to represent it, in which an arrow goes from a to a′ if and only if aRa′. Any relation within a finite or countably infinite set can, in principle, be shown in a digraph; conversely, every digraph (with unlabelled arrows) represents a relation in the set of its vertices. Interesting properties of relations are often derived from digraphical considerations; there is a whole literature on digraphs. 
Foresets and Aftersets of Relations 


These are defined for any relation $R$ from $A$ to $B$.
• The afterset of $a \in A$ is $aR = \{b \in B : aRb\}$.
• The foreset of $b \in B$ is $Rb = \{a \in A : aRb\}$.


Mnemonically and semantically, an afterset consists 
of all those elements which can correctly be written af- 
ter a given element, a foreset of those which can cor- 
rectly be written before it. An afterset or foreset may 
well be empty. 

Clearly, $b \in aR$ if and only if $a \in Rb$. A relation is completely known if all its foresets or all its aftersets are known.
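A minimal Python sketch of aftersets and foresets, assuming the relation is stored extensionally as a set of pairs; the element names and the helpers `afterset` and `foreset` are illustrative, not taken from the source.

```python
A = ["a1", "a2", "a3", "a4"]
B = ["b1", "b2", "b3"]
# The relation R from A to B, given as its set of related pairs (a, b).
R = {("a1", "b1"), ("a2", "b1"), ("a2", "b2"), ("a2", "b3"), ("a3", "b2")}

def afterset(R, a, B):
    """aR = {b in B : aRb}: the elements that can be written after a."""
    return {b for b in B if (a, b) in R}

def foreset(R, b, A):
    """Rb = {a in A : aRb}: the elements that can be written before b."""
    return {a for a in A if (a, b) in R}

print(sorted(afterset(R, "a2", B)))  # ['b1', 'b2', 'b3']
print(afterset(R, "a4", B))          # set(): an afterset may be empty

# b is in aR exactly when a is in Rb.
assert all((b in afterset(R, a, B)) == (a in foreset(R, b, A)) for a in A for b in B)
# The aftersets alone determine the relation completely.
assert {(a, b) for a in A for b in afterset(R, a, B)} == R
```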


Matrix Representation 


Very important computationally and even conceptually, as well as being a useful visual aid, is the incidence matrix $R_M$ of a relation $R$. This arises from a table in which the row headings are the elements of $A$ and the column headings are the elements of $B$, so that the cells represent $A \times B$. In the $(a, b)$-cell, 1 is entered if $aRb$, and 0 if not $aRb$. For visual purposes it is better to suppress the 0s, but they should be understood to be there for computational purposes.


Example: The matrix representation $R_M$ and the afterset representation of a relation $R$ (figure not reproduced; among the aftersets it lists are $a_2R = \{b_1, b_2, b_3\}$ and $a_4R = \emptyset$).


Clearly there is a one-to-one correspondence (bijection) between distinct tables and distinct relations, and, as soon as there has been agreement on the names and ordering of the row and column headings, between either of these and distinct matrices of size $|A| \times |B|$ with entries from $\{0, 1\}$.

Furthermore, the afterset $a_iR$ is in one-to-one correspondence with the nonzero entries of the $i$th row of $R_M$; the foreset $Rb_j$ is in one-to-one correspondence with the nonzero entries of the $j$th column of $R_M$.
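Continuing in the same spirit, the sketch below builds the incidence matrix from a set of pairs, with row and column orders fixed by the lists `A` and `B` (our stand-in for the agreement on names and ordering mentioned above), and reads aftersets and foresets off rows and columns. The sample relation is hypothetical and is not the one in the figure.

```python
A = ["a1", "a2", "a3", "a4"]
B = ["b1", "b2", "b3"]
R = {("a1", "b1"), ("a2", "b1"), ("a2", "b2"), ("a2", "b3"), ("a3", "b2")}

def incidence_matrix(R, A, B):
    """Entry [i][j] is 1 iff A[i] R B[j]; rows and columns follow the order of A and B."""
    return [[1 if (a, b) in R else 0 for b in B] for a in A]

M = incidence_matrix(R, A, B)
for a, row in zip(A, M):
    print(a, row)
# a1 [1, 0, 0]
# a2 [1, 1, 1]
# a3 [0, 1, 0]
# a4 [0, 0, 0]

# Nonzero entries of row i give the afterset of A[i];
# nonzero entries of column j give the foreset of B[j].
afterset_a2 = {B[j] for j, v in enumerate(M[1]) if v}
foreset_b2 = {A[i] for i in range(len(A)) if M[i][1]}
print(sorted(afterset_a2), sorted(foreset_b2))  # ['b1', 'b2', 'b3'] ['a2', 'a3']
```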


Operations and Inclusions in $\mathcal{R}(A \rightsquigarrow B)$


There is a considerable number of natural and important operations. We begin with unary operations and then proceed to several kinds of binary ones.


Unary Operations 


The negated or complementary relation of $R \in \mathcal{R}(A \rightsquigarrow B)$ is $\neg R \in \mathcal{R}(A \rightsquigarrow B)$, given by $a(\neg R)b$ if and only if it is not the case that $aRb$.

The converse or transposed relation of $R \in \mathcal{R}(A \rightsquigarrow B)$ is $R^T \in \mathcal{R}(B \rightsquigarrow A)$, given by

$$bR^Ta \Leftrightarrow aRb.$$

(It is also called the inverse and is therefore often written $R^{-1}$. In general, however, it is not an inverse in any algebraic sense.)

Both operators $T$ and $\neg$ are involutory, that is, when applied twice they give the original object: $(R^T)^T = R$ and $\neg(\neg R) = R$. They commute with each other, $\neg(R^T) = (\neg R)^T$, so the parentheses may safely be omitted and one can write $\neg R^T$.
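On incidence matrices with entries 0 and 1, negation and converse are one-liners. This Python sketch (the helper names are ours) checks that both operators are involutory and that they commute, using a sample 4 × 3 relation.

```python
def negate(M):
    """Complementary relation: 1 exactly where M has 0."""
    return [[1 - v for v in row] for row in M]

def converse(M):
    """Transposed relation R^T: converse(M)[j][i] == M[i][j]."""
    return [list(col) for col in zip(*M)]

R = [[1, 0, 0],
     [1, 1, 1],
     [0, 1, 0],
     [0, 0, 0]]  # a relation from a 4-element set A to a 3-element set B

assert converse(converse(R)) == R                   # T is involutory
assert negate(negate(R)) == R                       # negation is involutory
assert negate(converse(R)) == converse(negate(R))   # the two operators commute
print(converse(R))  # [[1, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0]]
```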


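Definition 1 (Binary operators and a binary relation on $\mathcal{R}(A \rightsquigarrow B)$)
• The intersection or meet or AND-ing:
$$a(R \cap R')b \Leftrightarrow aRb \text{ and } aR'b.$$
• The union or join or OR-ing:
$$a(R \cup R')b \Leftrightarrow aRb \text{ or } aR'b.$$
• A relation $R$ 'is contained in' (is a subrelation of) a relation $R'$, and $R'$ 'contains' (is a superrelation of) $R$, written $R \subseteq R'$:
$$R \subseteq R' \Leftrightarrow (\forall a)(\forall b)(aRb \rightarrow aR'b) \Leftrightarrow R \cap R' = R \Leftrightarrow R \cup R' = R',$$
where $\rightarrow$ is the Boolean implication operator.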


Definition 2 The relative complement of $R$ with respect to $R'$, or difference between $R'$ and $R$, is $R' \setminus R$, given by
$$a(R' \setminus R)b \Leftrightarrow aR'b \text{ but not } aRb,$$
that is, by $R' \setminus R = R' \cap \neg R$.
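A sketch of Definitions 1 and 2 on 0/1 incidence matrices, assuming both relations have the same shape. The names `meet`, `join`, `included` and `difference` are hypothetical helpers; the assertions check the two equivalent forms of inclusion.

```python
def meet(R, S):
    """Intersection (AND-ing), cell by cell."""
    return [[r & s for r, s in zip(rr, sr)] for rr, sr in zip(R, S)]

def join(R, S):
    """Union (OR-ing), cell by cell."""
    return [[r | s for r, s in zip(rr, sr)] for rr, sr in zip(R, S)]

def negate(R):
    return [[1 - r for r in row] for row in R]

def included(R, S):
    """R is contained in S iff aRb -> aSb in every cell (0 <= 1 encodes implication)."""
    return all(r <= s for rr, sr in zip(R, S) for r, s in zip(rr, sr))

def difference(S, R):
    """Relative complement of R with respect to S, computed as S meet (not R)."""
    return meet(S, negate(R))

R = [[1, 0], [0, 1]]
S = [[1, 1], [0, 1]]
assert included(R, S) and not included(S, R)
assert meet(R, S) == R and join(R, S) == S  # the equivalent forms of R contained in S
print(difference(S, R))  # [[0, 1], [0, 0]]
```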

Binary Operations on Successive Relations


Definition 3 (Circle and square products) Where $R \in \mathcal{R}(A \rightsquigarrow B)$ and $S \in \mathcal{R}(B \rightsquigarrow C)$, the following compositions give a relation in $\mathcal{R}(A \rightsquigarrow C)$:
• The circle product or round composition is $\circ$, given by $a(R \circ S)c \Leftrightarrow aR \cap Sc \neq \emptyset$.
• The square composition or square product is $\square$, given by $a(R \square S)c \Leftrightarrow aR = Sc$.

The circle product is the usual one, to be found throughout the literature going back at least to the nineteenth century. The square product is a more recent (1977) innovation. The $\square$ product belongs to the family of products sometimes called BK-products. Further interesting kinds of BK-products and their uses are discussed in the sequel.


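Proposition 4 (Properties of the $\square$-product)

1) $(R \square S) \cap (R' \square S) \subseteq ((R \cap R') \square S) \cap ((R \cup R') \square S) \subseteq (R \square S) \cup (R' \square S)$;

2) $R \square S = (S^T \square R^T)^T$;

3) $R \square S = \neg R \square \neg S$;

4) the square product is not associative.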
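The following Python sketch implements Definition 3 literally, comparing aftersets of R with foresets of S, and then checks items 2 to 4 of Proposition 4 on small hand-picked relations. The counterexample to associativity (identity, universal and empty relations on a two-element set) is our own choice, not taken from the source.

```python
def afterset(M, i):
    """Column indices j with M[i][j] = 1."""
    return {j for j, v in enumerate(M[i]) if v}

def foreset(M, k):
    """Row indices j with M[j][k] = 1."""
    return {j for j in range(len(M)) if M[j][k]}

def circle(R, S):
    """a(R o S)c iff the afterset aR meets the foreset Sc."""
    return [[1 if afterset(R, a) & foreset(S, c) else 0
             for c in range(len(S[0]))] for a in range(len(R))]

def square(R, S):
    """a(R [] S)c iff the afterset aR equals the foreset Sc."""
    return [[1 if afterset(R, a) == foreset(S, c) else 0
             for c in range(len(S[0]))] for a in range(len(R))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def negate(M):
    return [[1 - v for v in row] for row in M]

ident = [[1, 0], [0, 1]]  # identity relation
univ = [[1, 1], [1, 1]]   # universal relation
empty = [[0, 0], [0, 0]]  # empty relation

# Item 4: the square product is not associative.
print(square(square(ident, univ), empty))  # [[1, 1], [1, 1]]
print(square(ident, square(univ, empty)))  # [[0, 0], [0, 0]]
# The circle product is associative (here both sides are empty).
assert circle(circle(ident, univ), empty) == circle(ident, circle(univ, empty))

# Items 2 and 3 on a sample pair R (2 x 3) and S (3 x 2).
R = [[1, 0, 1], [0, 1, 1]]
S = [[1, 0], [0, 0], [1, 0]]
assert square(R, S) == transpose(square(transpose(S), transpose(R)))
assert square(R, S) == square(negate(R), negate(S))
```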


Matrix Formulation of the Binary Operations 


All of the binary operations on relations have a convenient formulation in matrix terms, using the matrix operations given in Proposition 6. For crisp relations, the matrix operations use the standard Boolean logic connectives in their definitions. By replacing these with the connectives of suitable many-valued logics, all the formulas easily generalize to fuzzy relations. Thus the matrix formulation of binary operations and compositions unifies the computation of crisp and fuzzy relations.


Definition 5 The Boolean connectives $\wedge$, $\vee$, $\Leftrightarrow$ on the set $\mathbb{B}_2 = \{0, 1\}$ are given by:

x  y | x ∧ y | x ∨ y | x ⇔ y
0  0 |   0   |   0   |   1
0  1 |   0   |   1   |   0
1  0 |   0   |   1   |   0
1  1 |   1   |   1   |   1

For a pair $(x_1, x_2)$ of elements from $\mathbb{B}_2$, we infix the operators: $x_1 \wedge x_2$, etc., while for a list $(x_k)_{k=1,\dots,n}$, or more generally a family $(x_k)_{k \in K}$, of elements from $\mathbb{B}_2$, we write $\bigwedge_{k=1}^{n} x_k$ or $\bigwedge_{k \in K} x_k$ or simply $\bigwedge_k x_k$. (Note that $K$ can be denumerably infinite, or even greater, without spoiling the definition; no convergence problems are involved.)


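Proposition 6 (Matrix notation)

1) $(R \cap S)_{ij} = R_{ij} \wedge S_{ij}$;

2) $(R \cup S)_{ij} = R_{ij} \vee S_{ij}$;

3) $(R \circ S)_{ik} = \bigvee_j (R_{ij} \wedge S_{jk})$;

4) $(R \bullet S)_{ik} = \bigwedge_j (R_{ij} \vee S_{jk})$;

5) $(R \square S)_{ik} = \bigwedge_j (R_{ij} \Leftrightarrow S_{jk})$;

6) $(R_1 \times R_2)_{(i_1 i_2)(j_1 j_2)} = (R_1)_{i_1 j_1} \wedge (R_2)_{i_2 j_2}$.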
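Proposition 6 says that every product has the same shape: an inner connective applied to $R_{ij}$ and $S_{jk}$, folded over $j$ by an outer connective. The sketch below (our own generic helper `product`, not an API from the literature) makes that explicit for formulas 3) to 5); formula 4) comes out as the De Morgan dual of the circle product.

```python
from functools import reduce

def product(R, S, inner, outer):
    """(R * S)[i][k] = outer, folded over j, of inner(R[i][j], S[j][k])."""
    return [[reduce(outer, (inner(R[i][j], S[j][k]) for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def AND(x, y): return x & y
def OR(x, y): return x | y
def IFF(x, y): return 1 if x == y else 0

def circle(R, S):  # formula 3): OR_j (R_ij AND S_jk)
    return product(R, S, AND, OR)

def dual(R, S):    # formula 4): AND_j (R_ij OR S_jk)
    return product(R, S, OR, AND)

def square(R, S):  # formula 5): AND_j (R_ij <=> S_jk)
    return product(R, S, IFF, AND)

def neg(M):
    return [[1 - v for v in row] for row in M]

R = [[1, 0, 1], [0, 1, 0]]
S = [[0, 1], [1, 0], [1, 1]]
print(circle(R, S))  # [[1, 1], [1, 0]]
print(square(R, S))  # [[0, 1], [0, 0]]
# Formula 4) is the De Morgan dual of the circle product.
assert dual(R, S) == neg(circle(neg(R), neg(S)))
```

Passing `min` and `max` as `inner` and `outer` gives the max-min composition used for fuzzy relations later in this article, which is one concrete form of the unification described above.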


Non-Associative Products of Relations 


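Definition 7 (Triangle products)
• Subproduct $\triangleleft$: $x(R \triangleleft S)z \Leftrightarrow xR \subseteq Sz$;
• Superproduct $\triangleright$: $x(R \triangleright S)z \Leftrightarrow xR \supseteq Sz$.

The matrix formulation of the $\triangleleft$ and $\triangleright$ products uses the Boolean connectives $\rightarrow$ and $\leftarrow$ on the set $\mathbb{B}_2 = \{0, 1\}$, given by

x  y | x → y | x ← y
0  0 |   1   |   1
0  1 |   1   |   0
1  0 |   0   |   1
1  1 |   1   |   1

Proposition 8 (Logic notation for $\triangleleft$ and $\triangleright$)
• $(R \triangleleft S)_{ik} = \bigwedge_j (R_{ij} \rightarrow S_{jk})$;
• $(R \triangleright S)_{ik} = \bigwedge_j (R_{ij} \leftarrow S_{jk})$.

Only the conventional $\circ$-product is associative. The $\square$ product is not associative [2].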


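Proposition 9 The following mixed pseudo-associativities hold for the triangle products, with $Q \in \mathcal{R}(W \rightsquigarrow X)$, $R \in \mathcal{R}(X \rightsquigarrow Y)$, $S \in \mathcal{R}(Y \rightsquigarrow Z)$ and the triple products in $\mathcal{R}(W \rightsquigarrow Z)$:

• $Q \triangleleft (R \triangleright S) = (Q \triangleleft R) \triangleright S$;

• $Q \triangleleft (R \triangleleft S) = (Q \circ R) \triangleleft S$;

• $(Q \triangleright R) \triangleright S = Q \triangleright (R \circ S)$.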
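The triangle products of Proposition 8 can be coded the same way. As a sanity check rather than a proof, the sketch below tests the three identities of Proposition 9 on random Boolean relations with the stated domains; the helper names, matrix sizes and fixed random seed are arbitrary choices of ours.

```python
import random

def circle(R, S):
    """(R o S)[i][k] = OR_j (R[i][j] AND S[j][k])."""
    return [[int(any(R[i][j] and S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def sub(R, S):
    """Subproduct: AND_j (R[i][j] -> S[j][k]), i.e. xR is a subset of Sz."""
    return [[int(all(not R[i][j] or S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def sup(R, S):
    """Superproduct: AND_j (R[i][j] <- S[j][k]), i.e. xR contains Sz."""
    return [[int(all(R[i][j] or not S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def random_relation(n, m, rng):
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]

def check_pseudo_associativity(trials, seed=0):
    """Count random triples Q: W->X, R: X->Y, S: Y->Z satisfying all three identities."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        Q = random_relation(3, 4, rng)
        R = random_relation(4, 2, rng)
        S = random_relation(2, 3, rng)
        ok += (sub(Q, sup(R, S)) == sup(sub(Q, R), S)
               and sub(Q, sub(R, S)) == sub(circle(Q, R), S)
               and sup(sup(Q, R), S) == sup(Q, circle(R, S)))
    return ok

print(check_pseudo_associativity(200))  # 200
```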


Characterization of Special Properties 
of Relations Between Two Sets 


Definition 10 (Special properties of a heterogeneous relation $R \in \mathcal{R}(X \rightsquigarrow Y)$):

• $R$ is covering if and only if $(\forall x \in X)(\exists y \in Y)$ such that $xRy$.

• $R$ is onto if and only if $(\forall y \in Y)(\exists x \in X)$ such that $xRy$.

• $R$ is univalent if and only if $(\forall x \in X)$, if $xRy$ and $xRy'$ then $y = y'$.

• $R$ is separating if and only if $(\forall y \in Y)$, if $xRy$ and $x'Ry$ then $x = x'$.

Composed properties can be defined by combining these four basic properties. The best-known combination is 'covering' and 'univalent', which defines a functional relation (a mapping). Another frequently used combination is 'onto' and 'separating'.

The self-inverse circle products $R \circ R^T$ and $R^T \circ R$ are very useful in the characterization of special properties of relations between two distinct sets. Using these products, one can characterize the properties in a purely relational way, without directly referring to individual elements of the relations involved.


Proposition 11 (Special properties of a heterogeneous relation $R \in \mathcal{R}(X \rightsquigarrow Y)$):

• $R$ is covering if and only if $E_X \subseteq R \circ R^T$.

• $R$ is univalent if and only if $R^T \circ R \subseteq E_Y$.

• $R$ is onto if and only if $E_Y \subseteq R^T \circ R$.

• $R$ is separating if and only if $R \circ R^T \subseteq E_X$.

Here $E_X$ and $E_Y$ are the identity relations on $X$ and $Y$ (the left and right identities), respectively.
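A sketch of Proposition 11 in Python, assuming relations are 0/1 matrices with rows indexed by X and columns by Y; `special_properties` and the other helper names are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def circle(R, S):
    """Boolean circle product: OR_j (R[i][j] AND S[j][k])."""
    return [[int(any(R[i][j] and S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def included(R, S):
    return all(r <= s for rr, sr in zip(R, S) for r, s in zip(rr, sr))

def special_properties(R):
    """Proposition 11 tests for R from X (rows) to Y (columns)."""
    EX, EY = identity(len(R)), identity(len(R[0]))
    RRt, RtR = circle(R, transpose(R)), circle(transpose(R), R)
    return {
        "covering": included(EX, RRt),
        "univalent": included(RtR, EY),
        "onto": included(EY, RtR),
        "separating": included(RRt, EX),
    }

F = [[1, 0, 0],
     [0, 1, 0],
     [0, 1, 0]]  # rows 1 and 2 both go to column 1; column 2 is hit by no row
print(special_properties(F))
# {'covering': True, 'univalent': True, 'onto': False, 'separating': False}
```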


Relations on a Single Set: Special Properties 


The self-inverse products are a fertile source of relations on the single set $X$. There are certain well-known special properties which a relation may possess (or may lack), of which the most important are reflexivity, symmetry, antisymmetry, strict antisymmetry and transitivity, together with their combinations, forming pre-orders (reflexive and transitive), (partial) orders (reflexive, antisymmetric and transitive) and equivalences (reflexive, transitive and symmetric).


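Definition 12 (Special properties of binary relations from $X$ to $X$)

• Covering: every $x_i$ is related by $R$ to something $\Leftrightarrow \forall i \in I\ \exists j \in I$ such that $R_{ij} = 1$.

• Locally reflexive: if $x_i$ is related to anything, or if anything is related to $x_i$, then $x_i$ is related to itself $\Leftrightarrow \forall i \in I\ R_{ii} = \max_j(\max(R_{ij}, R_{ji}))$.

• Reflexive: covering and locally reflexive $\Leftrightarrow \forall i \in I\ R_{ii} = 1$.

• Transitive: $\forall i, j, k \in I\ (x_iRx_j \text{ and } x_jRx_k \Rightarrow x_iRx_k) \Leftrightarrow R^2 \subseteq R$, where $R^2 = R \circ R$.

• Symmetric: $(x_iRx_j \Rightarrow x_jRx_i) \Leftrightarrow R^T = R$.

• Antisymmetric: $(x_iRx_j \text{ and } x_jRx_i \Rightarrow x_i = x_j) \Leftrightarrow$ if $i \neq j$ then $\min(R_{ij}, R_{ji}) = 0$.

• Strictly antisymmetric: never both $x_iRx_j$ and $x_jRx_i$ $\Leftrightarrow \forall i, j \in I\ \min(R_{ij}, R_{ji}) = 0$.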
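The matrix forms of Definition 12 translate directly into tests. The sketch below (helper names ours) uses max and min, so the same functions also accept fuzzy matrices, anticipating the fuzzy section; the sample relation is a local order in which element 3 is a nonparticipant.

```python
def locally_reflexive(R):
    """R_ii equals the largest entry found in row i or column i."""
    n = len(R)
    return all(R[i][i] == max(max(R[i][j], R[j][i]) for j in range(n)) for i in range(n))

def reflexive(R):
    return all(R[i][i] == 1 for i in range(len(R)))

def symmetric(R):
    n = len(R)
    return all(R[i][j] == R[j][i] for i in range(n) for j in range(n))

def antisymmetric(R):
    n = len(R)
    return all(min(R[i][j], R[j][i]) == 0 for i in range(n) for j in range(n) if i != j)

def transitive(R):
    """R o R contained in R, written in max-min form."""
    n = len(R)
    return all(max(min(R[i][j], R[j][k]) for j in range(n)) <= R[i][k]
               for i in range(n) for k in range(n))

# A local order on {0, 1, 2, 3}; element 3 is a nonparticipant.
R = [[1, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0]]
print(locally_reflexive(R), reflexive(R), antisymmetric(R), transitive(R), symmetric(R))
# True False True True False
```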


Most of the properties listed above are common in the literature. Local reflexivity is a worthwhile exception. It appeared in [1] and was generalized to fuzzy relations in [4], leading to new computational algorithms for both crisp and fuzzy relations [4,10]. Unfortunately, it is absent from the textbooks, yet it is extremely important in applications of relational methods to the analysis of real-life data (see the notion of participant in the next two sections).


Partitions IN and ON a Set 


A partition on a set $X$ is a division of $X$ into nonoverlapping (and nonempty) subsets called blocks. A partition in a set $X$ is a partition on a subset of $X$ [17,18], called the subset of participants.

There is a one-to-one correspondence between partitions in $X$ and local equivalences (i.e. locally reflexive, symmetric and transitive relations) in $\mathcal{R}(X \rightsquigarrow X)$. The partitions in $X$ (and so also the local equivalences in $\mathcal{R}(X \rightsquigarrow X)$) form a lattice with '__is-finer-than__' as its ordering relation. This whole subject is coextensive with classification or taxonomy, i.e., very extensive indeed. Furthermore, classification is the first step in abstraction, one of the fundamental processes in human thought.
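Reading off the partition from a local equivalence is straightforward: participants are the elements related to themselves, and each block is the afterset of a participant. A minimal sketch, assuming the input really is a local equivalence (no validation is done); `blocks` is a hypothetical name.

```python
def blocks(R):
    """Blocks of the partition in X induced by a local equivalence R (0/1 matrix).

    Participants are the i with R[i][i] = 1; each block is the afterset of a participant.
    """
    n = len(R)
    seen, result = set(), []
    for i in range(n):
        if R[i][i] and i not in seen:
            block = {j for j in range(n) if R[i][j]}
            seen |= block
            result.append(sorted(block))
    return result

# A local equivalence on {0, ..., 4}; element 4 is a nonparticipant.
R = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 0, 0, 0]]
print(blocks(R))  # [[0, 2], [1, 3]]
```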


Tolerances and Overlapping Classes 


Some tests for tolerance and equivalence are as follows:
• $R \circ R^T$ is always symmetric and locally reflexive.
• $R \circ R^T$ is a tolerance if and only if $R$ is covering.
• $R \square R^T$ is always a (local) tolerance.
• $R \square R^T \subseteq R$ if and only if $R$ is reflexive.
• $E \subseteq R \subseteq R \square R^T$ if and only if $R$ is an equivalence.
• $R \square R^T = R$ if and only if $R$ is an equivalence.
• $R \square R^T \subseteq R \circ R^T$ if and only if $R$ is covering.
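Several of these tests can be tried on small examples. In the sketch below (our helpers, 0/1 matrices), the square product is computed by its matrix formula from Proposition 6, so $R \square R^T$ relates two elements exactly when their rows are equal.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def circle(R, S):
    return [[int(any(R[i][j] and S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def square(R, S):
    """(R [] S)[i][k] = AND_j (R[i][j] <=> S[j][k])."""
    return [[int(all(R[i][j] == S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def included(R, S):
    return all(r <= s for rr, sr in zip(R, S) for r, s in zip(rr, sr))

E = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]  # an equivalence with blocks {0, 1} and {2}
P = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 0]]  # not covering: row 2 is empty

print(square(E, transpose(E)) == E)  # True: E is an equivalence
print(square(P, transpose(P)) == P)  # False
print(included(square(P, transpose(P)), circle(P, transpose(P))))  # False: P is not covering
```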

It is not always the case that one manages, or even attempts, to classify participants into nonoverlapping blocks. Local tolerance relations (i.e. locally reflexive and symmetric ones) lead to classes which may well overlap, so that one participant may belong to more than one class. The classic case, giving its name to this kind of relation, is '__is-within-one-millimeter-of__'. This is quite a different model from the severe partitions [80], and it has long been unduly neglected in both theory and applications, even when the data mutely favor it.


Boolean and Fuzzy Relations 




Hierarchies in and on a Set:
Local and Global Orders and Pre-orders 


An example of a hierarchy in a finite set X is displayed 
in Fig. 1. In such a hierarchy, there is a finite number 
of levels and there is no ambiguity in the assignment of 
a level to an element. The elements which appear even- 
tually in the hierarchy are the participants; those which 
do not are nonparticipants; if all of X participates, then 
the hierarchy is on X. 

Fig. 1: A hierarchy in a finite set $X$, shown as a Hasse diagram (levels 0, 1 and 2) and as a digraph, together with its nonparticipants.


Every local order (i.e. locally reflexive, transitive and antisymmetric relation) from a finite set to itself establishes a hierarchy in that set, that is, it can be used as the '__precedes__' relation in the hierarchy. Conversely, given any hierarchy, its '__precedes__' is a local order. The hierarchy is on $X$ exactly when the local order is a global one.

The picture of the hierarchy is called its Hasse di- 
agram. It can always be obtained from the digraph of 
the local-order relation by the suppression of loops and 
of those arrows which directly connect nodes between 
which there is also a longer path. 
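The Hasse-diagram recipe can be sketched as follows, assuming the input is already transitive (as every local order is), so that a longer path from i to k exists exactly when some third element j has $x_iRx_j$ and $x_jRx_k$. The function name `hasse_edges` is ours.

```python
def hasse_edges(R):
    """Arrows kept in the Hasse diagram of a local order R (0/1 matrix):
    drop loops, and drop i -> k whenever a longer path i -> j -> k exists."""
    n = len(R)
    edges = []
    for i in range(n):
        for k in range(n):
            if i == k or not R[i][k]:
                continue  # a loop or a missing arrow
            if any(R[i][j] and R[j][k] for j in range(n) if j not in (i, k)):
                continue  # implied by transitivity
            edges.append((i, k))
    return edges

R = [[1, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0]]  # element 3 is a nonparticipant
print(hasse_edges(R))  # [(0, 1), (1, 2)]
```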

The formulas of Theorem 13 can be used for fast computational testing of the listed properties.

Theorem 13 The following conditions universally characterize transitivity, reflexivity and the pre-order property of $R \in \mathcal{R}(X \rightsquigarrow X)$:

• $R$ is transitive if and only if $R \subseteq R \triangleright R^T$.

• $R$ is reflexive if and only if $R \triangleright R^T \subseteq R$.

• $R$ is a pre-order if and only if $R = R \triangleright R^T$.
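Theorem 13 reduces each test to one superproduct and one inclusion. A sketch with 0/1 matrices, where `sup` implements $\bigwedge_j (R_{ij} \leftarrow S_{jk})$ from Proposition 8; all helper names are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def sup(R, S):
    """Triangle superproduct: AND_j (S[j][k] -> R[i][j])."""
    return [[int(all(R[i][j] or not S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def included(R, S):
    return all(r <= s for rr, sr in zip(R, S) for r, s in zip(rr, sr))

def is_transitive(R):
    return included(R, sup(R, transpose(R)))

def is_reflexive(R):
    return included(sup(R, transpose(R)), R)

def is_preorder(R):
    return R == sup(R, transpose(R))

chain = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]  # 'precedes or equals' on a 3-element chain
succ = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # immediate successor only
print(is_preorder(chain), is_transitive(succ), is_reflexive(succ))  # True False False
```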

More complex relational structures are investigated by theories of homomorphisms, which can be further generalized [6].


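Definition 14 Let $F$, $R$, $G$, $S$ be heterogeneous relations between the sets $A$, $B$, $C$, $D$ such that $R \in \mathcal{R}(A \rightsquigarrow B)$, $F \in \mathcal{R}(A \rightsquigarrow C)$, $G \in \mathcal{R}(B \rightsquigarrow D)$ and $S \in \mathcal{R}(C \rightsquigarrow D)$. The condition that (for all $a \in A$, $b \in B$, $c \in C$, $d \in D$)
$$(aFc \wedge aRb \wedge bGd) \rightarrow cSd$$
we denote by $FRG{:}S$. When it holds, we say that $FRG{:}S$ is forward compatible, or, equivalently, that $F$, $G$ are generalized morphisms.

The following Bandler-Kohout compatibility theorem holds [6]:

Theorem 15 (Generalized morphisms)

• $FRG{:}S$ are forward compatible if and only if $F^T \circ R \circ G \subseteq S$.

• Formulas for computing the explicit compatibility criteria for $F$ and $G$ are: $FRG{:}S$ are forward compatible if and only if $R \subseteq F \triangleleft (G \triangleleft S^T)^T$.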


The $R$'s of forward compatibility constitute a lower ideal. Similarly, the backward compatibility given by $F \circ S \circ G^T \subseteq R$ gives a generalized proteromorphism. Its $R$'s constitute an upper ideal or filter: $FRG{:}S$ are backward compatible if and only if $F \circ S \circ G^T \subseteq R$ if and only if $S \subseteq F^T \triangleleft R \triangleright G$.
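Theorem 15 and the backward criterion can be cross-checked numerically: the composition forms and the triangle-product forms must agree for every choice of F, R, G, S. The sketch below samples random Boolean relations with the domains of Definition 14 (sizes and seed are arbitrary) and counts agreements; it is a test harness under those assumptions, not a proof.

```python
import random

def transpose(M):
    return [list(col) for col in zip(*M)]

def circle(R, S):
    return [[int(any(R[i][j] and S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def sub(R, S):
    """Subproduct: AND_j (R[i][j] -> S[j][k])."""
    return [[int(all(not R[i][j] or S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def sup(R, S):
    """Superproduct: AND_j (S[j][k] -> R[i][j])."""
    return [[int(all(R[i][j] or not S[j][k] for j in range(len(S))))
             for k in range(len(S[0]))] for i in range(len(R))]

def included(R, S):
    return all(r <= s for rr, sr in zip(R, S) for r, s in zip(rr, sr))

def random_relation(n, m, rng):
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]

def check(trials, seed=1):
    """Count trials in which both forward forms and both backward forms agree."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        F = random_relation(3, 2, rng)  # A -> C
        R = random_relation(3, 4, rng)  # A -> B
        G = random_relation(4, 2, rng)  # B -> D
        S = random_relation(2, 2, rng)  # C -> D
        fwd1 = included(circle(circle(transpose(F), R), G), S)
        fwd2 = included(R, sub(F, transpose(sub(G, transpose(S)))))
        bwd1 = included(circle(circle(F, S), transpose(G)), R)
        bwd2 = included(S, sup(sub(transpose(F), R), G))
        agree += (fwd1 == fwd2) and (bwd1 == bwd2)
    return agree

print(check(300))  # 300
```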

FRG : S are both-way compatible if they are both 
forward and backward compatible. The conventional 
homomorphism is a special case of both-way compati- 
bility, where F and G are not general relations but just 
many-to-one mappings. 

The generalized morphisms of Bandler and Kohout [6] are relevant not only theoretically; they also have an important practical use in solving systems of inequalities and equations on systems of relations.

For partial homomorphisms the situation becomes 
more complicated. In partial structures the conven- 
tional homomorphism splits into mutually related 
weak, strong and very strong kinds of homomor- 
phism [5]. 


Fuzzy Relations 


Mathematical relations can contribute to the investigation of properties of a large variety of structures in the sciences and engineering. The power of relational analysis stems from the elegant algebraic structure of relational systems, supplemented by the computational power of relational matrix notation. This power is further enhanced by many-valued logic based (fuzzy) extensions of the relational calculus.




As often in mathematics, where terms are used in- 
clusively, the crisp (nonfuzzy) sets and relations are 
merely special cases of fuzzy sets and relations, in which 
the actual degrees happen to be the extreme ones. On 
the theoretical side, fuzzy relations are extensions of 
standard nonfuzzy (crisp) relations. By replacing the 
usual Boolean algebra by many-valued logic algebras, 
one obtains extensions that contain the classical rela- 
tional theory as a special case. 


Definitions 


A fuzzy set is one to which any element may belong to various degrees, rather than either not at all (degree 0) or utterly (degree 1). Similarly, a fuzzy relation is one which may hold between two elements to any degree between 0 and 1 inclusive. The sentence $x_iRy_j$ takes its value $R_{ij}$ from the interval $[0, 1]$ of real numbers. In early papers on fuzzy relations, $\mu_R(x_i, y_j)$ was usually written instead of $R_{ij}$.

The matrix notation used in the previous sections for nonfuzzy (crisp) relations is directly applicable to the fuzzy case. Thus, all the definitions of operations, compositions and products can be directly extended to the fuzzy case.


Operations and Inclusion on $\mathcal{R}(X \rightsquigarrow Y)$
Fuzzy Relations with Min, Max Connectives


This has been the most common extension of relations to the fuzzy realm. Boolean $\wedge$ and $\vee$ are replaced by the many-valued connectives min and max in all crisp definitions.

In matrix terms, this yields the following intersection and union operations:
$$(R \cap S)_{ij} = \min(R_{ij}, S_{ij}), \qquad (R \cup S)_{ij} = \max(R_{ij}, S_{ij}).$$
(In the older $\mu$-notation, $\mu_{R \cap S}(x_i, y_j) = \min(\mu_R(x_i, y_j), \mu_S(x_i, y_j))$, etc.)

The negation of $R$ is given by $(\neg R)_{ij} = 1 - R_{ij}$. The converse of $R$ is given by $(R^T)_{ji} = R_{ij}$.
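With min and max as connectives, the crisp matrix formulas carry over entry by entry. A small sketch with arbitrary sample degrees (plain nested lists; the variable names are ours):

```python
R = [[0.2, 0.9], [0.6, 0.0]]
S = [[0.5, 0.4], [1.0, 0.3]]

intersection = [[min(r, s) for r, s in zip(rr, sr)] for rr, sr in zip(R, S)]
union = [[max(r, s) for r, s in zip(rr, sr)] for rr, sr in zip(R, S)]
negation = [[1 - r for r in row] for row in R]  # (not R)_ij = 1 - R_ij
converse = [list(col) for col in zip(*R)]       # (R^T)_ji = R_ij

print(intersection)  # [[0.2, 0.4], [0.6, 0.0]]
print(union)         # [[0.5, 0.9], [1.0, 0.3]]
print(converse)      # [[0.2, 0.6], [0.9, 0.0]]
```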


Fuzzy Relations Based on Łukasiewicz Connectives


When the bold (Łukasiewicz) connectives $x \oplus y = \min(1, x + y)$ and $x \odot y = \max(0, x + y - 1)$ are used to define the $\sqcup$, $\sqcap$ operations, this is an instance of relations in MV-algebras.


Fuzzy Relations With t-Norms and Co-Norms 


Fuzzy logics can be further generalized: $\wedge$ and $\vee$ are obtained by replacing min and max by a t-norm and a t-conorm, respectively. A t-norm is an operation $*: [0, 1]^2 \to [0, 1]$ which is commutative, associative and nondecreasing in both arguments, with 1 as the unit element and 0 as the zero element. Taking a continuous t-norm, by residuation we obtain a many-valued logic implication $\rightarrow$, given by $x \rightarrow y = \sup\{z : x * z \leq y\}$. Using $\{\wedge, \vee, *, \rightarrow\}$ one can define families of deductive systems for fuzzy logics called BL-logics [31]. In relational systems using BL-logics, one can again define various t-norm based relational properties [53,83], BK-products and generalized morphisms of relations [47].
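For concreteness, the sketch below lists three standard continuous t-norms (the minimum, the product, and the Łukasiewicz bold conjunction from the previous subsection) together with their residua, and checks the defining adjointness $x * z \leq y \Leftrightarrow z \leq (x \rightarrow y)$ on a small grid. The dictionary names are ours, and the grid check only illustrates residuation; it does not prove it.

```python
# Three standard continuous t-norms and their residua x -> y = sup{z : x * z <= y}.
T_NORMS = {
    "Goedel (min)": lambda x, y: min(x, y),
    "product": lambda x, y: x * y,
    "Lukasiewicz": lambda x, y: max(0.0, x + y - 1.0),  # the bold conjunction
}
RESIDUA = {
    "Goedel (min)": lambda x, y: 1.0 if x <= y else y,
    "product": lambda x, y: 1.0 if x <= y else y / x,
    "Lukasiewicz": lambda x, y: min(1.0, 1.0 - x + y),
}

def check_residuation(t, imp, grid):
    """Adjointness on a grid: t(x, z) <= y holds exactly when z <= imp(x, y)."""
    return all((t(x, z) <= y) == (z <= imp(x, y))
               for x in grid for y in grid for z in grid)

grid = [i / 4 for i in range(5)]  # 0, 0.25, ..., 1: exact in binary floating point
for name in T_NORMS:
    print(name, check_residuation(T_NORMS[name], RESIDUA[name], grid))
# Goedel (min) True
# product True
# Lukasiewicz True
```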


Definition 16 (Inclusion of relations) A relation $R$ is 'contained in' or is a subrelation of a relation $S$, written $R \subseteq S$, if and only if $(\forall i)(\forall j)\ R_{ij} \leq S_{ij}$.

This definition guarantees that $R$ is a subrelation of $R'$ if and only if every $\alpha$-cut $R_\alpha$ is a subrelation of its corresponding $R'_\alpha$. (This convenient meta-property is called cutworthiness; see Theorem 17 below.)


Products: $\mathcal{R}(X \rightsquigarrow Y) \times \mathcal{R}(Y \rightsquigarrow Z) \to \mathcal{R}(X \rightsquigarrow Z)$


For fuzzy relations, there are two versions of products: harsh and mean [3,52]. Most conveniently, again, in matrix terms the harsh products syntactically correspond to the matrix formulas for crisp relations. The fuzzy relational products are obtained by replacing the Boolean logic connectives AND, OR, both implications and the equivalence of the crisp products by connectives of some many-valued logic, chosen according to the properties required of the products. Thus the $\circ$-product and the $\square$-product are given exactly as in Proposition 6 above by formulas 3) and 5), respectively, and the triangle products as in Proposition 8 above. For the MVL implication operators most often used to define fuzzy triangle products, see ▶ Checklist Paradigm Semantics for Fuzzy Logics, Table 1, or [8]. The details of the choice of appropriate many-valued connectives are discussed in [3,7,8,40,43,52].
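A sketch of harsh fuzzy products on matrices with entries in [0, 1]: the max-min circle product and a subproduct parameterized by the implication. The Gödel and Łukasiewicz implications used here are two common choices, assumed for illustration; the sample matrices are arbitrary.

```python
def circle(R, S):
    """Harsh max-min composition: max_j min(R[i][j], S[j][k])."""
    return [[max(min(R[i][j], S[j][k]) for j in range(len(S)))
             for k in range(len(S[0]))] for i in range(len(R))]

def sub(R, S, imp):
    """Harsh fuzzy subproduct with implication imp: min_j imp(R[i][j], S[j][k])."""
    return [[min(imp(R[i][j], S[j][k]) for j in range(len(S)))
             for k in range(len(S[0]))] for i in range(len(R))]

def goedel(x, y):
    return 1.0 if x <= y else y

def lukasiewicz(x, y):
    return min(1.0, 1.0 - x + y)

R = [[1.0, 0.5], [0.25, 0.75]]
S = [[0.5, 1.0], [0.75, 0.25]]
print(circle(R, S))             # [[0.5, 1.0], [0.75, 0.25]]
print(sub(R, S, goedel))        # [[0.5, 0.25], [1.0, 0.25]]
print(sub(R, S, lukasiewicz))   # [[0.5, 0.75], [1.0, 0.5]]
```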

Given the general formula $(R \circledast S)_{ik} := \#_j (R_{ij} * S_{jk})$ for a relational product, a mean product is obtained by replacing the outer connective $\#$ by $\sum$ and normalizing the resulting product appropriately. In more concrete terms, in order to obtain the mean products, the outer connectives $\bigvee_j$ in $\circ$ and $\bigwedge_j$ in $\square$, $\triangleleft$, $\triangleright$ are replaced by $\frac{1}{n}\sum_j$, where $n$ is the number of indices $j$ [3].

Boolean and Fuzzy Relations, Table 1
Closures and an interior

• The reflexive closure of $R$: $\mathrm{ref\,clo}\,R = R \cup E_X$.
• The tolerance closure of $R$: $\mathrm{tol\,clo}\,R = \mathrm{ref\,clo}(\mathrm{sym\,clo}\,R)$.
• The pre-order closure of $R$: $\mathrm{pre\,clo}\,R = \mathrm{ref\,clo}(\mathrm{tra\,clo}\,R)$.
• The equivalence closure of $R$: $\mathrm{equ\,clo}\,R = \mathrm{tra\,clo}(\mathrm{tol\,clo}\,R)$.
• The local pre-order closure of $R$: $\mathrm{locpre\,clo}\,R = \mathrm{locref\,clo}(\mathrm{tra\,clo}\,R) = \mathrm{tra\,clo}(\mathrm{locref\,clo}\,R)$.
• The local equivalence closure of $R$: $\mathrm{locequ\,clo}\,R = \mathrm{tra\,clo}(\mathrm{sym\,clo}(\mathrm{locref\,clo}\,R))$.


N-ary Relations 


An n-ary relation $R$ is an open sentence with $n$ slots; when these are filled in order by the names of elements from sets $X_1, \dots, X_n$, there results a proposition that is either true or false if the relation is crisp, or is judged to hold to a certain degree if the relation is fuzzy. This 'intensional' definition is matched by the satisfaction set $R_S$ of $R$, which is a fuzzy subset of $X_1 \times \dots \times X_n$ and can be used, if desired, as its extensional definition. The matrix notation works equally well for n-ary relations, and all the types of BK-products are also defined. For details see [9].


Special Properties of Fuzzy Relations 


The special properties of crisp relations can be generalized to fuzzy relations exactly as they stand in Definition 12, using in each case the second of the two given definitions. It is perhaps worthwhile spelling out the requirements for transitivity in more detail:
$$R^2 \subseteq R \Leftrightarrow (\forall i, k)\ \max_j(\min(R_{ij}, R_{jk})) \leq R_{ik}.$$

Useful references provide further pointers to the literature: general [43]; on fuzzy partitions [14,69]; fuzzy similarities [69]; tolerances [34,75,85].


Alpha-cuts of Fuzzy Relations 


It is often convenient to study fuzzy relations through their $\alpha$-cuts; for any $\alpha$ in the half-open interval $(0, 1]$, the $\alpha$-cut of a fuzzy relation $R$ is the crisp relation $R_\alpha$ given by
$$(R_\alpha)_{ij} = \begin{cases} 1 & \text{if } R_{ij} \geq \alpha, \\ 0 & \text{otherwise.} \end{cases}$$

Compatibility of families of crisp relations with their fuzzy counterpart (the original relation on which the $\alpha$-cuts have been performed) is guaranteed by the following theorem on cutworthy properties [10]:


Theorem 17 It is true of each simple property P (given in Definition 12) and every compound property P (listed in Table 1) that every $\alpha$-cut of a fuzzy relation $R$ possesses P in the crisp sense if and only if $R$ itself possesses P in the fuzzy sense. (Such properties are called cutworthy.)
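Theorem 17 can be illustrated for transitivity. In this sketch (helper names ours) it suffices to test the $\alpha$-cuts at the distinct positive entries of R: any other $\alpha$ in (0, 1] yields one of those cuts or the empty relation, which is trivially transitive.

```python
def alpha_cut(R, alpha):
    """Crisp relation R_alpha: 1 where R_ij >= alpha, 0 elsewhere."""
    return [[1 if r >= alpha else 0 for r in row] for row in R]

def transitive(R):
    """Max-min transitivity; for 0/1 matrices this is ordinary transitivity."""
    n = len(R)
    return all(max(min(R[i][j], R[j][k]) for j in range(n)) <= R[i][k]
               for i in range(n) for k in range(n))

def cuts_transitive(R):
    """Transitivity of every alpha-cut, tested at the distinct positive entries."""
    levels = sorted({r for row in R for r in row if r > 0})
    return all(transitive(alpha_cut(R, a)) for a in levels)

good = [[1.0, 0.8, 0.4],
        [0.0, 1.0, 0.4],
        [0.0, 0.0, 1.0]]
bad = [[1.0, 0.8, 0.2],
       [0.0, 1.0, 0.6],
       [0.0, 0.0, 1.0]]
print(transitive(good), cuts_transitive(good))  # True True
print(transitive(bad), cuts_transitive(bad))    # False False
```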


Fuzzy Partitions, Fuzzy Clusters 
and Fuzzy Hierarchies 


Via their $\alpha$-cuts, fuzzy local and global equivalences
provide precisely the nested families of partitions in and 
on a set which are required by the theory and for the ap- 
plications in taxonomy envisaged in [17,18]. Fuzzy local 
and global tolerances similarly provide families of toler- 
ance classes for the cluster type of classification which 
allows overlaps. Fuzzy local and global orders furnish 
nested families of hierarchies in and on a set, with their 
accompanying families of Hasse diagrams. 

The importance of fuzzy extensions cannot be overestimated. Thus, one may identify approximate similarities in data, as well as approximate equivalences and orders. Such approximations are paramount in many applications, in situations where only incomplete, partial information about the domain of scientific or technological application is available.

Closures and interiors of relations play an important role in the design of fast fuzzy relational closure algorithms [4,9,10,11] for computing such approximations.

Theorem 17 and other theorems on the commuting of cuts with closures [11,42] guarantee their correctness.


Closures and Interiors with Special Properties 


For certain properties P which a fuzzy relation may have or lack, there always exists a well-defined P-closure of $R$, namely the least inclusive relation $V$ which contains $R$ and has the property P. Also, for some properties P, the P-interior of $R$ is the most inclusive relation $Q$ contained in $R$ and possessing P. Clearly, where the P-closure exists, $R$ itself possesses P if and only if $R$ is equal to P-clo($R$), and the same holds for interiors.

Certain closures use the local equality $E_R$ of $R$, given by $(E_R)_{ii} = \max_j(\max(R_{ij}, R_{ji}))$ and $(E_R)_{ij} = 0$ if $j \neq i$. Others use the equality on $X$ given by $(E_X)_{ii} = 1$ and $(E_X)_{ij} = 0$ for $j \neq i$.

Important closures and one important interior are 
given in Table 1. See [4,10] for further details. 
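A sketch of two closures from Table 1 for fuzzy relations with min/max connectives. The transitive closure is computed by naive iteration of $R := R \cup (R \circ R)$, which terminates for finite relations but is not one of the fast algorithms of [4,10]. For the local reflexive closure we assume it is $R \cup E_R$, which is how the local equality enters. The example checks the two expressions for the local pre-order closure given in Table 1.

```python
def compose(R, S):
    """Max-min composition: max_j min(R[i][j], S[j][k])."""
    return [[max(min(R[i][j], S[j][k]) for j in range(len(S)))
             for k in range(len(S[0]))] for i in range(len(R))]

def join(R, S):
    return [[max(r, s) for r, s in zip(rr, sr)] for rr, sr in zip(R, S)]

def tra_clo(R):
    """Max-min transitive closure: repeat R := R u (R o R) until nothing changes."""
    C = R
    while True:
        nxt = join(C, compose(C, C))
        if nxt == C:
            return C
        C = nxt

def locref_clo(R):
    """Join R with its local equality E_R, where (E_R)_ii = max_j max(R_ij, R_ji)."""
    n = len(R)
    diag = [max(max(R[i][j], R[j][i]) for j in range(n)) for i in range(n)]
    return [[max(R[i][j], diag[i]) if i == j else R[i][j] for j in range(n)]
            for i in range(n)]

R = [[0.0, 0.7, 0.0],
     [0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
P = locref_clo(tra_clo(R))
print(P)                            # [[0.7, 0.7, 0.5], [0.0, 0.7, 0.5], [0.0, 0.0, 0.5]]
print(P == tra_clo(locref_clo(R)))  # True, matching the two forms of locpre clo
```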


Applications of Relational Methods 
in Engineering, Medicine and Science 


Relational properties are important for obtaining knowledge about characteristics and interactions of various parts of a relational model used in real-life applications. Identification of composite properties of mathematical relations, such as local or global pre-orders, orders, tolerances or equivalences, plays an important role in the evaluation of empirical data (e.g. medical data, commercial data, or data for technological forecasting) and in building and evaluating relational models based on such data [48,49].

The local and global properties detect important semantic distinctions between various concepts captured by relational structures, for example the interactions between technological parts, processes etc., or relationships of cognitive constructs elicited experimentally [37,39,41,55]. Capturing both local and global properties is important for distinguishing participants from nonparticipants in a relational structure. This distinction is crucial for obtaining an undistorted picture of reality.

In general terms, the abstract theoretical tools supporting identification and representation of relational properties are fuzzy closures and interiors [4,10]. Having such means for testing relational properties opens the avenue to linking the empirical structures that can be observed and captured by fuzzy relations with their abstract, symbolic representations that have well-defined mathematical properties.

This opens many possibilities for computer experimentation with empirically identified logical (say, predicate) structures. These techniques have found practical use in directing resolution-based theorem prover strategy [56], in relation-based inference in medical diagnosis [48,58] and in extracting predicate structures of 'train of thought' from questionnaires presented to people by means of Kelly's repertory grids. BK-relational products and fast fuzzy relational algorithms based on fuzzy closures and interiors have been essential for computational progress in this field and for optimization of computational performance. See the survey in [52], with a list of 50 selected references on the mathematical theory and applications of BK-products in various fields of science and engineering. Further extensions or modifications of BK-products have been suggested in [19,20,21,30].

Applications of relational theories, computations and modeling include the areas of medicine [48,59], psychology [49], cognitive studies [36,38], nuclear engineering [84], industrial engineering and management [25,46], architecture and urban studies [65,66], value analysis in business and manufacturing [60], information retrieval [51,54], computer security [45,50], databases, theoretical computer science [13,68,71], software engineering [78], automated reasoning [56], and logic [12,28,63]. Particularly important for software engineering is the contribution of C.A.R. Hoare and He Jifeng [33], who use the crisp triangle BK-superproduct for software specification, calling the crisp $\triangleright$ products 'weakest prespecifications'.

Relational equations [22] play an important role in applications [70] in general, and also in AI and applications of causal reasoning [24]; fuzzy inequalities play a similar role in mathematical programming [72]. Applications of crisp relations in game theory are well established [78,79].


Brief Review of Theoretical Development 


Binary (two-place) relations were first perceived in their abstract mathematical form by Galen of Pergamon in the 2nd century AD [57]. After a long gap, the first systematic development of the calculus of relations (concerned with the study of logical operations on binary relations) was initiated by A. De Morgan, C.S. Peirce and E. Schröder [9,64]. Significant investigations into the logic of relations were the 1900 paper of B. Russell [76] and the axiomatization of the relational calculus in 1941 by A. Tarski [64,81]. Extensibility of Tarski's axioms to the fuzzy domain has been investigated by Kohout [44].

Later algebraic advances in relational calculus [9] stem jointly from the elegant work of J. Riguet (1948) [74], the less widely known but important work of O. Borůvka (1939) [15,16,17,18] and the stimulus of the fuzzy set theory of L.A. Zadeh (1965) [35,85,86], and include a sharpened perception of special properties and the construction of new kinds of relational products [3], together with the extension of the theory from Boolean to multiple-valued logic based relations [2,9]. The triangle subproduct $R \triangleleft S$, the triangle superproduct $R \triangleright S$ and the square product $R \square S$ were introduced in their general form, defined above, by Bandler and Kohout in 1977, and are referred to as the BK-products in the literature [19,20,30]. The square product, however, stems from Riguet (1948) [74], needing only to be made explicit [1,9]. E. Sanchez independently defined an $\alpha$-composition [77], which is in fact $\triangleleft$ using the Heyting-Gödel implication. Special instances of the triangle BK-products were more recently rediscovered and described in 1986 by J.P. Doignon, B. Monjardet, M. Roubens and P. Vincke [23,26], who called them 'traces of relations'. Hence, a 'trace-of-relation' is a BK-triangle superproduct in which $\rightarrow$ is the residuum of a commutative conjunction $\wedge$. The crisp square product was also independently introduced in 1986 by R. Berghammer, G. Schmidt and H. Zierer [13] as a generalization of Riguet's 'noyau' [74].

On the other hand, advances in abstract relational algebras stem from the work of Tarski [81] and his school [32,64,67,82]. Tarski's axiomatization [81] of the homogeneous relational calculus takes relations and operations over relations as the primitives. It applies only to homogeneous relations, as it has only one constant entity, the identity relation $E$. For heterogeneous relations, taking e.g. $U_{XY}$ as the universal relation, we have a finite number of separate identity relations (constants), i.e. $E_X$, $E_Y$, etc. [4,10]. Therefore, viewed syntactically through the logic axioms, the axiomatization of heterogeneous relations (containing a whole family of universal relations) would be a many-sorted theory [30], each universal relation belonging to a different sort.
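Tarski's axioms of homogeneous relations:

$$R \circ E = E \circ R = R; \qquad (R \circ S)^T = S^T \circ R^T;$$
$$(R^T)^T = R; \qquad (\neg R)^T = \neg(R^T);$$
$$(R \cup S)^T = R^T \cup S^T;$$
$$(R \circ S) \circ T = R \circ (S \circ T);$$
$$(R \cup S) \circ T = (R \circ T) \cup (S \circ T);$$
$$R \circ (S \cup T) = (R \circ S) \cup (R \circ T);$$
$$(R^T \circ \neg(R \circ S)) \cup \neg S = \neg S.$$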


Taking the axioms on their own opens the way to abstract relational algebras (RA), with new problems at hand. Tarski and his school have investigated the interrelationship of various generalizations of associative RAs in a purely abstract way. In some of these generalizations, the axiom of associativity for relational composition is dropped. This leads from representable relational algebras (RRA) to semi-associative (SA), weakly associative (WA) and nonassociative (NA) relational algebras. In 1982 R.D. Maddux [62] gave the following result:

$$\mathrm{RRA} \subset \mathrm{RA} \subset \mathrm{SA} \subset \mathrm{WA} \subset \mathrm{NA}.$$


All these generalizations deal only with one relational composition. The equations for pseudo-associativities given above (Proposition 9) and the nonassociativity of the square product (Proposition 4) show that there exist nonassociative representations of relational algebras (RA) in the relational calculus. Theorem 15 and Proposition 8 show that the interplay of several relational compositions is essentially involved in the computationally more powerful formulas of the relational calculus. The Tarskian RA axiomatizations, however, do not fully express the richness of the calculus of binary relations and the mutual interplay of the associative $\circ$, pseudo-associative $\triangleright$, $\triangleleft$ and nonassociative $\square$ products. Considerable scope for further research into new axiomatizations still remains. Our results based on the nonassociative BK-products of Bandler and Kohout, which historically precede the abstract nonassociative generalizations in relational algebras of Maddux, show that the nonassociative products have representations and that these representations offer various computational advantages. There is also a link of RA with projective geometries [61].


Basic Books and Bibliographies 


The best general books on the theory of crisp relations and their applications are [78] and [80]. In the fuzzy field, there is no general book available at present. There are, however, some more specialized monographs extant: on solving fuzzy relation equations [27], on preference modeling and multicriteria decision making [39], on representation of cognitive maps by relations [39] and on crisp and fuzzy BK-products of relations [53]. One can also find some specialized monographs on logic foundations and relational algebras: [32,82]. All these books also contain important lists of references. The most important bibliography of selected references on topics related to fuzzy sets and relations is contained in [43]. The early years of fuzzy sets (1965-1975) are covered very comprehensively in the critical survey and annotated bibliography [29]. Many-valued logic connectives form an important foundation for fuzzy sets and relations. The book of N. Rescher [73] still remains the best comprehensive survey that is also accessible to a nonlogician. It contains an almost complete bibliography of many-valued logics from the end of the 19th century to 1968.


See also 


> Alternative Set Theory 

> Checklist Paradigm Semantics for Fuzzy Logics 

> Finite Complete Systems of Many-valued Logic 
Algebras 

> Inference of Monotone Boolean Functions 

> Optimization in Boolean Classification Problems 

> Optimization in Classifying Text Documents 
A bottleneck Steiner tree (or a min-max Steiner tree) is
a Steiner tree (cf. ▸ Steiner tree problems) in which the
maximum edge weight is minimized. Several multifacility
location and VLSI routing problems ask for bottleneck
Steiner trees.

Consider the problem of choosing locations for
a number of hospitals serving homes, where the goal is
to minimize the maximum weighted distance from any home
to the hospital that serves it, and between hospitals.
The solution is a tree which spans all hospitals and
connects each home to the closest hospital. This tree
can be seen as a Steiner tree where the homes are terminals
and the hospitals are Steiner points (cf. ▸ Steiner
tree problems). Unlike the classical Steiner tree problem,
where the total length of the Steiner tree is minimized,
in this problem it is necessary to minimize the maximum
edge weight.

Another instance of the bottleneck Steiner tree
problem occurs in electronic physical design automation,
where nets are routed subject to delay minimization
[2,3]. The terminals of a net are interconnected,
possibly through intermediate nodes (Steiner points),
and for electrical reasons one would like to minimize
the maximum distance between each pair of interconnected
points.

The most popular versions of the bottleneck Steiner
tree problem in the literature are geometric. Note that if
the number of Steiner points is not bounded, then any
edge can be subdivided into arbitrarily small segments,
and the maximum edge length can be made arbitrarily
close to zero. Therefore, any meaningful formulation should bound
the number of Steiner points. One such formulation is
suggested in [9].


Problem 1 Given a set of $n$ points in the plane (called
terminals), find a bottleneck Steiner tree spanning all
terminals such that the degree of every Steiner point is at
least 3.


Instead of introducing constraints, one can minimize 
the number of Steiner points. The following formula- 
tion has been proved to be NP-hard [15] and approxi- 
mation algorithms have been suggested in [11,14]. 


Problem 2 Given a set of $n$ terminals in the plane and
$\lambda > 0$, find a Steiner tree spanning the $n$ terminals with
the minimum number of Steiner points such that no
edge is longer than $\lambda$.


Sometimes the bottleneck Steiner tree has a predefined
topology, i.e. the unweighted tree consisting of edges
between terminals and Steiner points is given [4,5,10]. Then it
is only necessary to find the optimal positions of all Steiner
points. Since the number of different topologies for
a given set of terminals grows exponentially, fixing the
topology greatly reduces the complexity of the bottleneck
Steiner tree problem.


Problem 3 Find a bottleneck Steiner tree with a given
topology $T$ which spans a set of $n$ terminals in the plane.

The first algorithms for the Euclidean case of Problem
3 are based on nonlinear optimization [7,13]. For
a given $\lambda > 0$, the algorithm from [15] decides whether
a Steiner tree $ST$ with maximum edge weight at most $\lambda$
exists, as follows.

The topology $T$ is first transformed into a forest by removing
the edges between terminals; if any such edge is
longer than $\lambda$, then $ST$ does not exist. Each connected
component $T_i$ is processed separately. The following
regions are computed in bottom-up fashion:

i) the region $R(s)$ of the plane where a Steiner point $s$
can be placed; and

ii) the region $R^*(s)$ where the Steiner point adjacent to
$s$ can be placed, which is the area within distance at
most $\lambda$ from $R(s)$.

If a Steiner point $s$ is adjacent to nodes $s_1, \dots, s_k$ in $T_i$,
then $R(s) = R^*(s_1) \cap \dots \cap R^*(s_k)$. The number $a(s)$
of arcs bounding $R(s)$ may be as high as the number of
leaves in $T_i$. In order to keep this number low, the tree
$T_i$ can be decomposed into $O(\log n)$ levels such that
there are only $O(n)$ arcs in all regions in total. Thus the
runtime of the algorithm is $O(n \log n)$ [15].

When the distance between points is rectilinear,
several efficient algorithms have been suggested for Problem
3 [4,9,10]. The algorithm above can be adjusted
for the rectilinear plane, where the regions $R(s)$ are rectangles.
The fastest known algorithm solves Problem 3 in
$O(n^2)$ time [9].
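The bottom-up region computation can be sketched in a few lines when the regions really are rectangles. The sketch below rests on several assumptions that go beyond the text. It uses the $L_\infty$ metric; rectilinear ($L_1$) instances can be mapped to this metric by the 45° rotation $(x, y) \mapsto (x + y, x - y)$, which turns $L_1$ distances into $L_\infty$ distances. It assumes each component is rooted at a Steiner point with all terminals as leaves. It finds the smallest $\lambda$ by plain bisection, which stands in for the exact search used in the cited algorithms. All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, xmax, ymin, ymax)


def rotate(p: Point) -> Point:
    """Map L1 (rectilinear) geometry to L-infinity geometry."""
    return (p[0] + p[1], p[0] - p[1])


def expand(r: Rect, lam: float) -> Rect:
    """R*(s): points within L-infinity distance lam of rectangle r (again a rectangle)."""
    return (r[0] - lam, r[1] + lam, r[2] - lam, r[3] + lam)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection of two axis-parallel rectangles, or None if it is empty."""
    xmin, xmax = max(a[0], b[0]), min(a[1], b[1])
    ymin, ymax = max(a[2], b[2]), min(a[3], b[3])
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, xmax, ymin, ymax)


def region(node: str, children: Dict[str, List[str]],
           terminals: Dict[str, Point], lam: float) -> Optional[Rect]:
    """Bottom-up R(node); None means no placement with all edges <= lam exists."""
    if node in terminals:                      # terminals are fixed points
        x, y = terminals[node]
        return (x, x, y, y)
    r: Rect = (-math.inf, math.inf, -math.inf, math.inf)
    for child in children.get(node, []):
        rc = region(child, children, terminals, lam)
        if rc is None:
            return None
        nxt = intersect(r, expand(rc, lam))    # R(s) = R*(s_1) ∩ ... ∩ R*(s_k)
        if nxt is None:
            return None
        r = nxt
    return r


def feasible(root: str, children: Dict[str, List[str]],
             terminals: Dict[str, Point], lam: float) -> bool:
    return region(root, children, terminals, lam) is not None


def smallest_lambda(root: str, children: Dict[str, List[str]],
                    terminals: Dict[str, Point], hi: float,
                    tol: float = 1e-9) -> float:
    """Bisection on lam; assumes feasible(..., hi) is True."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(root, children, terminals, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For example, with terminals A = (0, 0), B = (4, 0) and C = (2, 3) joined to a single Steiner point S, `smallest_lambda("S", {"S": ["A", "B", "C"]}, terminals, 10.0)` returns a value within the tolerance of 2, the optimal bottleneck length in the $L_\infty$ metric.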

Each bottleneck Steiner problem can be generalized
to arbitrary weights on edges and formulated for
weighted graphs [6].


Problem 4 Given a graph $G = (V, E, w)$ with nonnegative
weight $w$ on edges, and a set of terminals $S \subseteq V$,
find a Steiner tree spanning $S$ with the smallest maximum
edge weight.
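Without any restriction on the Steiner points, Problem 4 becomes a connectivity question. The optimal bottleneck value is the smallest $\lambda$ for which all terminals lie in one connected component of the subgraph formed by the edges of weight at most $\lambda$. Below is a minimal Python sketch of this idea using Kruskal-style union–find. It assumes the vertices are numbered $0, \dots, |V|-1$, and the function name is hypothetical. Because it sorts the edges, it runs in $O(|E| \log |E|)$ time, so it is a simpler stand-in for, not a reproduction of, the optimal $O(|E|)$ method cited from [6].

```python
import math
from typing import Sequence, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def bottleneck_value(n: int, edges: Sequence[Edge], terminals: Sequence[int]) -> float:
    """Smallest lam such that all terminals are connected by edges of weight <= lam.

    A tree spanning the terminals can then be extracted from those edges, and
    no Steiner tree can have a smaller maximum edge weight.
    """
    terms = set(terminals)
    if len(terms) <= 1:
        return 0.0
    parent = list(range(n))
    count = [1 if v in terms else 0 for v in range(n)]  # terminals per component root

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for w, u, v in sorted(edges):          # Kruskal order: increasing weight
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        parent[ru] = rv
        count[rv] += count[ru]
        if count[rv] == len(terms):        # all terminals are now connected
            return w
    return math.inf                        # terminals lie in different components
```

For instance, with edges (5, 0–1), (1, 1–2), (1, 2–3), (3, 0–3) and terminals {0, 2}, the path 0–3–2 gives the bottleneck value 3, beating the path 0–1–2 with value 5.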


Problem 4 can be solved in optimal $O(|E|)$
time [6]. Unfortunately, the above formulation
does not bound the number of Steiner points. To bound
the number of Steiner points, it is necessary to take into
account that, unlike in the classical Steiner tree problem in
graphs (cf. ▸ Steiner tree problems), an edge cannot be
replaced with a shortest path without affecting the bottleneck
objective. The following graph-theoretical generalization
of Problem 1, considered in [1,9], has been
proved to be NP-hard.


Problem 5 Given a complete graph $G = (V, E, w)$ with
nonnegative weight $w$ on edges, and a set of terminals
$S \subseteq V$, find a Steiner tree spanning $S$ with the smallest
maximum edge weight such that each Steiner point has
degree at least 3.


Similarly to the classical Steiner tree problem, if no
Steiner points are allowed, the minimum spanning tree
(cf. also ▸ Capacitated minimum spanning trees) is the
optimal solution for Problems 1 and 5. Therefore, as
with the Steiner ratio, it is natural to consider the bottleneck
Steiner ratio $\rho(n)$. The bottleneck Steiner ratio
is defined as the supremum, over all instances with $n$
terminals, of the ratio of the maximum edge weight of
the minimum spanning tree to the maximum edge
weight of the bottleneck Steiner tree. It has been proved
that $\rho(n) = 2\lfloor \log_2 n \rfloor - \delta$, where $\delta$ is either 0 or 1, depending
on whether the fractional part of $\log_2 n$ is greater than
$\log_2(3/2)$ [9].

Problem 5 is harder to approximate than the
classical Steiner tree problem: even a
$(2 - \epsilon)$-approximation is NP-hard for any $\epsilon > 0$ [1].
On the other hand, the best known approximation algorithm
for Problem 5 has approximation ratio $\log_2 n$
[1]. The algorithm looks for an approximate bottleneck
Steiner tree in the collection $C$ of edges between all pairs
of terminals and minimum bottleneck Steiner trees for
all triples of terminals. Using Lovász's algorithm [12] it is
possible to find out whether such a collection contains
a valid Steiner tree, i.e. a Steiner tree with all Steiner
points of degree at least three. The algorithm finds the
smallest $\lambda$ such that $C$ still contains a valid Steiner tree after
all edges of weight more than $\lambda$ are removed. It has been
shown that $\lambda \le M \cdot \log_2 n$, where $M$ is the maximum
edge weight of the optimal bottleneck Steiner tree.


See also 


> Capacitated Minimum Spanning Trees 
> Directed Tree Networks 

> Minimax Game Tree Searching 

> Shortest Path Tree Algorithms 

> Steiner Tree Problems 
In solving optimal control problems involving nonlinear
differential equations, some iterative procedure
must be used to obtain the optimal control policy. From
Pontryagin's maximum principle it is known that the
minimum of the performance index corresponds to the
minimum of the Hamiltonian. Obtaining the minimum
value of the Hamiltonian usually involves some iterative
procedure. Here we outline a procedure that uses
the necessary condition for optimality, but relaxes the
boundary conditions. In essence, at each iteration we have
the optimal control policy for a wrong problem.
Iterations are performed so that in the limit the
boundary conditions specified for the optimal control
problem are satisfied. Such a procedure is called
approximation to the problem, or the boundary condition
iteration method (BCI). Many papers have been written
about the method. As was pointed out in [1], the
method is fundamentally very simple and computationally
attractive for some optimal control problems.
In [3] some evaluations and comparisons of different
approaches were carried out, but the conclusions were
not very definitive [5]. Although many papers have been
written to describe and evaluate different approaches to
control vector iteration (CVI) on widely different optimal
control problems (see for example [14]), for BCI
such comparisons are much more limited, and there is
sometimes the feeling that the method works well only
if the answer is already known. However, BCI is a useful
procedure for determining the optimal control policy
for many problems, and it is unwise to dismiss it
prematurely.

To illustrate the boundary condition iteration procedure,
let us consider the optimal control problem
where the system is described by the differential equation

$$\frac{dx}{dt} = f(x, u), \quad \text{with } x(0) \text{ given}, \tag{1}$$

where $x$ is an $n$-dimensional state vector and $u$ is
an $r$-dimensional control vector. The optimal control
problem is to determine the control $u$ in the time interval
$0 \le t \le t_f$ so that the performance index

$$I = \int_0^{t_f} \psi(x, u)\, dt \tag{2}$$

is minimized. We consider the case where the final time
$t_f$ is given and there are no constraints on the control
or the state variables. According to Pontryagin's maximum
principle, the minimum value of the performance
index in (2) is obtained by minimizing the Hamiltonian

$$H = \psi + z^{T} f. \tag{3}$$
The adjoint variable $z$ is defined by

$$\frac{dz}{dt} = -\frac{\partial H}{\partial x}, \quad \text{with } z(t_f) = 0, \tag{4}$$

which may be written as

$$\frac{dz}{dt} = -\frac{\partial \psi}{\partial x} - \left(\frac{\partial f^{T}}{\partial x}\right) z, \quad \text{with } z(t_f) = 0. \tag{5}$$

The necessary condition for the minimum of the
Hamiltonian is

$$\frac{\partial H}{\partial u} = 0. \tag{6}$$

Let us assume that (6) can be solved explicitly for the
control vector

$$u = g(x, z). \tag{7}$$


If we now substitute (7) into (1) and (5), and integrate
these equations simultaneously backward from
$t = t_f$ to $t = 0$ with some value assumed for $x(t_f)$, we
have the optimal control policy for a wrong problem,
because there is no assurance that upon backward integration
the given value of the initial state $x(0)$ will be
obtained. Therefore it is necessary to adjust the guessed
value for the final state until an appropriate
value for $x(t_f)$ is found. For this reason the method is
called the boundary condition iteration method (BCI).

In order to find how to adjust the final value of the 
state, based on the deviation obtained from the given 
initial state, we need to find the mathematical relation- 
ship to establish the effect of the change in the final 
state on the change in initial state. Many papers have 
been written in this area. The development of the neces- 
sary sensitivity equations is presented very nicely in [1]. 
In essence, the sensitivity information can be obtained 
by getting the transition matrix for the linearized state 
equation. Linearization of (1) gives 


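$$\frac{d\,\delta x}{dt} = \left(\frac{\partial f^{T}}{\partial x}\right)^{T} \delta x + \left(\frac{\partial f^{T}}{\partial u}\right)^{T} \delta u. \tag{8}$$

The transition matrix $\Phi$ is thus obtained from solving

$$\frac{d\Phi}{dt} = \left(\frac{\partial f^{T}}{\partial x}\right)^{T} \Phi, \quad \text{with } \Phi(t_f) = I, \tag{9}$$

where $I$ is the $(n \times n)$ identity matrix.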


Suppose that at iteration $j$ the use of $x^{(j)}(t_f)$ gives the initial
state $x^{(j)}(0)$, which is different from the given initial
state $x(0)$. Then a new choice is made at iteration
$(j+1)$ through the use of

$$x^{(j+1)}(t_f) = x^{(j)}(t_f) + \epsilon\, \Phi(0)\left(x^{(j)}(0) - x(0)\right), \tag{10}$$

where a stabilizing parameter $\epsilon$ is introduced to avoid
overstepping. A convenient way of measuring the deviation
from the given initial state is to define the error as
the Euclidean norm

$$e = \left\| x^{(j)}(0) - x(0) \right\|. \tag{11}$$


Once the error is sufficiently small, say less than $10^{-6}$,
the iteration procedure can be stopped.

The algorithm for boundary condition iteration
may thus be presented as follows:

• Choose an initial value for the final state $x^{(1)}(t_f)$ and
a value for $\epsilon$; set the iteration index $j$ to 1.

• Integrate (1), (2), (5) and (9) backwards from $t = t_f$
to $t = 0$, using (7) for the control. Equation (2) is not needed for
the algorithm, but it gives the performance index.

• Evaluate the error in the initial state from (11), and
if it is less than the specified value, end the iteration.

• Increment the iteration index $j$ by one. Choose
a new value for the final state $x^{(j)}(t_f)$ from (10) and
go to step 2.

The procedure is therefore straightforward, since 
the equations are all integrated in the same direction. 
Furthermore, there is no need to store any variables 
over the trajectory. There is the added advantage that 
the control appears as a continuous variable, and there- 
fore the accuracy of results will not depend on the size 
of the integration time step. Theoretically the results 
should be as good as can be obtained by the second 
variation method in control vector iteration. It is important
to realize, however, that the Hamiltonian must
be well behaved, so that (7) can be obtained analytically.
The only drawback is the potential instability,
since the state equation and the sensitivity equation are
integrated backwards; problems may arise if the final
time $t_f$ is too large. For many problems in chemical
engineering the BCI method can be easily applied, as is
shown in the following example.
Illustration of the Boundary Condition Iteration
Procedure


Let us consider the nonlinear continuous stirred tank 
reactor that has been used for optimal control studies 
in [4, pp. 308-318], and which was shown in [13] to ex- 
hibit multiplicity of solutions. The system is described 
by the two equations 


$$\frac{dx_1}{dt} = -2(x_1 + 0.25) + (x_2 + 0.5)\exp\left(\frac{25x_1}{x_1 + 2}\right) - u(x_1 + 0.25), \tag{12}$$

$$\frac{dx_2}{dt} = 0.5 - x_2 - (x_2 + 0.5)\exp\left(\frac{25x_1}{x_1 + 2}\right), \tag{13}$$

with the initial state $x_1(0) = 0.09$ and $x_2(0) = 0.09$. The
control $u$ is a scalar quantity related to the valve opening
of the coolant. The state variables $x_1$ and $x_2$ represent
deviations from the steady state of dimensionless
temperature and concentration, respectively. The
performance index to be minimized is

$$I = \int_0^{t_f} \left(x_1^2 + x_2^2 + 0.1u^2\right) dt, \tag{14}$$

where the final time $t_f = 0.78$. The Hamiltonian is
$$H = z_1\left(-2(x_1 + 0.25) + R - u(x_1 + 0.25)\right) + z_2(0.5 - x_2 - R) + x_1^2 + x_2^2 + 0.1u^2, \tag{15}$$

where $R = (x_2 + 0.5)\exp\left(25x_1/(x_1 + 2)\right)$. The adjoint
equations are

$$\frac{dz_1}{dt} = (2 + u)z_1 - 2x_1 + \frac{50R\,(z_2 - z_1)}{(x_1 + 2)^2}, \tag{16}$$

$$\frac{dz_2}{dt} = -2x_2 + z_2 + \frac{(z_2 - z_1)R}{x_2 + 0.5}. \tag{17}$$

The gradient of the Hamiltonian is

$$\frac{\partial H}{\partial u} = 0.2u - (x_1 + 0.25)z_1, \tag{18}$$

so the optimal control is given by

$$u = 5(x_1 + 0.25)z_1. \tag{19}$$


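The equations for the transition matrix are:

$$\frac{d\Phi_{11}}{dt} = \frac{\partial f_1}{\partial x_1}\Phi_{11} + \frac{\partial f_1}{\partial x_2}\Phi_{21}, \qquad \frac{d\Phi_{12}}{dt} = \frac{\partial f_1}{\partial x_1}\Phi_{12} + \frac{\partial f_1}{\partial x_2}\Phi_{22},$$

$$\frac{d\Phi_{21}}{dt} = \frac{\partial f_2}{\partial x_1}\Phi_{11} + \frac{\partial f_2}{\partial x_2}\Phi_{21}, \qquad \frac{d\Phi_{22}}{dt} = \frac{\partial f_2}{\partial x_1}\Phi_{12} + \frac{\partial f_2}{\partial x_2}\Phi_{22},$$

where

$$\frac{\partial f_1}{\partial x_1} = -(2 + u) + \frac{50R}{(x_1 + 2)^2}, \qquad \frac{\partial f_1}{\partial x_2} = \frac{R}{x_2 + 0.5},$$

$$\frac{\partial f_2}{\partial x_1} = -\frac{50R}{(x_1 + 2)^2}, \qquad \frac{\partial f_2}{\partial x_2} = -1 - \frac{R}{x_2 + 0.5}.$$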
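The four transition-matrix equations are simply $d\Phi/dt = J\Phi$, where $J$ is the Jacobian of $(f_1, f_2)$ with respect to $(x_1, x_2)$. Below is a small Python sketch with hypothetical names that codes the partial derivatives above and this matrix product. Its test checks the partials against central finite differences of (12)–(13).

```python
import math
from typing import Tuple

Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def R(x1: float, x2: float) -> float:
    return (x2 + 0.5) * math.exp(25.0 * x1 / (x1 + 2.0))


def f(x1: float, x2: float, u: float) -> Tuple[float, float]:
    """Right-hand sides of (12) and (13)."""
    r = R(x1, x2)
    return (-(2.0 + u) * (x1 + 0.25) + r, 0.5 - x2 - r)


def jacobian(x1: float, x2: float, u: float) -> Mat2:
    """J[i][k] = d f_{i+1} / d x_{k+1}, with u held fixed."""
    r = R(x1, x2)
    a = 50.0 * r / (x1 + 2.0) ** 2     # dR/dx1
    b = r / (x2 + 0.5)                  # dR/dx2
    return ((-(2.0 + u) + a, b),
            (-a, -1.0 - b))


def phi_rhs(phi: Mat2, x1: float, x2: float, u: float) -> Mat2:
    """d(Phi)/dt = J Phi, i.e. the four transition-matrix equations."""
    J = jacobian(x1, x2, u)
    return tuple(
        tuple(J[i][0] * phi[0][k] + J[i][1] * phi[1][k] for k in range(2))
        for i in range(2))
```

Starting from $\Phi(t_f) = I$, `phi_rhs` is integrated backward alongside the state and adjoint equations; at $\Phi = I$ it returns $J$ itself.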


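The adjustment of the final state is carried out by the
following two equations:

$$x_1^{(j+1)}(t_f) = x_1^{(j)}(t_f) + \epsilon\left[\Phi_{11}(0)\left(x_1^{(j)}(0) - x_1(0)\right) + \Phi_{12}(0)\left(x_2^{(j)}(0) - x_2(0)\right)\right], \tag{20}$$

$$x_2^{(j+1)}(t_f) = x_2^{(j)}(t_f) + \epsilon\left[\Phi_{21}(0)\left(x_1^{(j)}(0) - x_1(0)\right) + \Phi_{22}(0)\left(x_2^{(j)}(0) - x_2(0)\right)\right]. \tag{21}$$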


To illustrate the computational aspects of BCI, the
above algorithm was used on a Pentium-120 personal
computer with the WATCOM Fortran compiler
version 9.5. The calculations were done in double precision.
When the performance index is included, there
are 9 differential equations to be integrated backwards
at each iteration. The standard fourth-order Runge–Kutta
method was used for integration with a stepsize of 0.01.
For stability, it was found that $\epsilon$ had to be of the
order of 0.1. For all the runs, therefore, this value of $\epsilon$
was used. As is shown in Table 1, a large number of iterations
is required to get the error below $10^{-6}$, but
the computation time is quite reasonable. The optimal
value of the performance index is very close to the value
$I = 0.133094$ reported in [13] with the second variation
method, and is essentially equivalent to $I = 0.133101$ obtained
in [6] by using 20 stages of piecewise linear control
with iterative dynamic programming. Refining
the error tolerance to $e < 10^{-8}$ required no more than
an additional thousand iterations, with an extra expenditure
of about 6 seconds of computation time in each
case. The final value of the performance index for
each of the four different initial starting points was then
$I = 0.133096$.

Boundary Condition Iteration BCI, Table 1
Application of BCI to CSTR

Initial choice       Performance   Number of    CPU time
x1(tf) = x2(tf)      index         iterations   s
0.045                0.133095      2657
0.00                 0.133007      2858
0.01                 0.133007      2805

Now that computers are very fast and computation time is no
longer prohibitively expensive, the large number of iterations
required by BCI should not discourage one from
using the method. Since the control policy is directly
inside the integration routine, results equivalent to those
obtained by the second variation method can be obtained.
The number of equations to be integrated, however, is
quite high for a moderately high-dimensional system.
If we consider a system with 10 state variables, there are
121 differential equations to be integrated simultaneously.
Although computationally this does not represent
a problem, deriving and entering the equations without
error could be a programming challenge. Therefore,
BCI methods in which the $(n \times n)$ transition matrix
is not used may find more widespread application.
One possible approach is now presented.


Sensitivity Information Without Evaluating 
the Transition Matrix 


Suppose that at iteration $j$ we have $n$ sets of final states
$x^{(j-n+1)}(t_f), \dots, x^{(j)}(t_f)$ with corresponding values for
the initial state obtained by integration, $x^{(j-n+1)}(0), \dots,
x^{(j)}(0)$. Then we can write the transformation

$$P = AQ, \tag{22}$$

where

$$P = \left(x^{(j-n+1)}(t_f) \; \cdots \; x^{(j)}(t_f)\right) \tag{23}$$

and

$$Q = \left(x^{(j-n+1)}(0) \; \cdots \; x^{(j)}(0)\right). \tag{24}$$

The transformation matrix is

$$A = PQ^{-1}, \tag{25}$$

and the next vector at $t_f$ is chosen as

$$x^{(j+1)}(t_f) = A\,x(0). \tag{26}$$

Equations (1) and (5) are integrated backward to obtain $x^{(j+1)}(0)$,
the matrices $P$ and $Q$ are updated, and the procedure
is continued. If the initial guesses are sufficiently
close to the optimum, very rapid convergence is expected.


1) Pick $n$ sets of values for $x(t_f)$ and integrate (1)
and (5) backward from $t = t_f$ to $t = 0$, using (7)
for the control, to give $n$ sets of initial state vectors.

2) From these two sets of vectors form the $(n \times n)$
matrices $P$ and $Q$.

3) Calculate $A$ from (25), and calculate a new vector
$x^{(j+1)}(t_f)$ from (26).

4) With the vector from Step 3 as a starting condition,
integrate (1) and (5) backward to give
$x^{(j+1)}(0)$.

5) Use the vectors in Steps 3 and 4 to replace
$x^{(j-n+1)}(t_f)$ and $x^{(j-n+1)}(0)$ in the matrices $P$ and
$Q$, and continue until the error calculated
from (11) is below some tolerance, such as $10^{-6}$.
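Steps 1–5 can be sketched in plain Python. To keep the check exact, the sketch is run on an illustrative linear–quadratic test problem of our own choosing, not the CSTR. The test problem has two states with $dx/dt = u$ and $I = \int_0^1 (x^Tx + u^Tu)\,dt$, for which (6) gives $u = -z/2$ and (5) gives $dz/dt = -2x$. For this problem the backward map $x(t_f) \mapsto x(0)$ is linear, so one secant step already reproduces $x(0)$. The optimal cost can then be checked against $\tanh(1)\,x(0)^Tx(0)$, which follows from the closed-form solution $x(t) = x(t_f)\cosh(t_f - t)$. Several parts are illustrative choices: the function names, the fixed-step RK4 integrator and the pure-Python linear solve. The sketch also computes $A\,x(0)$ as $P\,(Q^{-1}x(0))$ by solving a linear system instead of forming $Q^{-1}$.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def solve(M: List[Vec], b: Vec) -> Vec:
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= m * A[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (A[r][n] - sum(A[r][k] * y[k] for k in range(r + 1, n))) / A[r][r]
    return y


def bci_secant(backward: Callable[[Vec], Tuple[Vec, float]], x0: Vec,
               guesses: Sequence[Vec], tol: float = 1e-8, max_iter: int = 50):
    """Steps 1-5; backward(xf) integrates (1), (5) from t_f to 0 and returns (x(0), I)."""
    n = len(x0)
    P = [list(g) for g in guesses]            # columns x^(k)(t_f), eq. (23)
    Q = [backward(g)[0] for g in guesses]     # columns x^(k)(0), eq. (24)
    for _ in range(max_iter):
        Qmat = [[Q[k][i] for k in range(n)] for i in range(n)]
        y = solve(Qmat, list(x0))             # y = Q^{-1} x(0)
        xf = [sum(P[k][i] * y[k] for k in range(n)) for i in range(n)]  # A x(0), (25)-(26)
        x_init, cost = backward(xf)
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_init, x0)))  # (11)
        if err < tol:
            break
        P = P[1:] + [xf]                      # Step 5: drop the oldest pair
        Q = Q[1:] + [x_init]
    return xf, cost, err


def lq_backward(xf: Vec, tf: float = 1.0, steps: int = 100) -> Tuple[Vec, float]:
    """Test problem: dx/dt = u, u = -z/2, dz/dt = -2x, z(t_f) = 0; RK4 backwards."""
    n = len(xf)

    def rhs(y: Vec) -> Vec:
        x, z = y[:n], y[n:2 * n]
        u = [-0.5 * zi for zi in z]
        cost_rate = sum(xi * xi for xi in x) + sum(ui * ui for ui in u)
        return u + [-2.0 * xi for xi in x] + [cost_rate]

    y = list(xf) + [0.0] * n + [0.0]
    h = -tf / steps                           # negative step: march from t_f to 0
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = rhs([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = rhs([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y[:n], -y[2 * n]                   # x(0) and I = q(t_f) - q(0)
```

For a nonlinear system such as the CSTR, `lq_backward` would be replaced by a backward integration of (12), (13), (16) and (17) with the control (19). Several secant iterations would then be needed, and good starting guesses matter, as discussed next.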
For good starting conditions, one may use iterative
dynamic programming (IDP) [9] and pick the final
states obtained after each of the first $n$ passes. F. Hartig
and F.J. Keil [2] found that in the optimization of
spherical reactors, IDP provided excellent values, which
were then refined by sequential quadratic programming.
For convergence here we need good starting
conditions. This is now illustrated with the above
example.
Using IDP for piecewise linear continuous control,
as described in [6,7,8], with 15 time stages, 3 randomly
chosen points and 10 iterations per pass, the data for
the first four passes in Table 2 give good starting
conditions for BCI.

Using as starting conditions the final states obtained
in passes 1 and 2, as given in Table 2, convergence
with the above algorithm is very fast, as is shown
in Table 3. Only 9 iterations are required to yield
$I = 0.133096$.

As expected, if the initial set of starting points is better,
the convergence rate is also better, as is seen by
comparing Table 4 to Table 3. However, in each case
the total computation time was only 0.05 seconds on
a Pentium-120. Taking into account that it takes 0.77
seconds to generate the initial conditions with IDP,
the optimum is obtained in less than 1
second of computation time. Therefore, BCI is a very
useful procedure if (6) can be solved explicitly for the
control and the final time $t_f$ is not too large. Simple constraints
on the control can be readily handled by a clipping
technique, as shown in [12]. Further examples with this
approach are given in [10].


Boundary Condition Iteration BCI, Table 2
Results of the first four passes of IDP

Pass no.   Perf. index   x1(tf)     x2(tf)     CPU time s
1                        0.05359    −0.13101   0.39


Boundary Condition Iteration BCI, Table 3
Convergence with the above algorithm from the starting
points obtained in passes 1 and 2 by IDP

Perf. index, in order of iteration: 0.129568, 0.136682, 0.135079, 0.133218, 0.133093, 0.133096, 0.133096
Error e: $0.2414 \cdot 10^{-2}$, $0.1350 \cdot 10^{-2}$, …, $0.5209 \cdot 10^{-8}$


Boundary Condition Iteration BCI, Table 4
Convergence with the above algorithm from the starting
points obtained in passes 3 and 4 by IDP

Perf. index, in order of iteration: 0.121769, 0.135249, 0.133317, …, 0.133094, 0.133096, 0.133096
Error e, in order of iteration: $0.7353 \cdot 10^{-2}$, $0.1415 \cdot 10^{-2}$, $0.1531 \cdot 10^{-3}$, $0.2861 \cdot 10^{-4}$, $0.1703 \cdot 10^{-5}$, $0.1190 \cdot 10^{-7}$, $0.5364 \cdot 10^{-10}$


See also 
Interval arithmetic can be used to bound the range of
a real function over an interval. Here, we bound the 
ranges of its Taylor coefficients (and hence derivatives) 
by evaluating it in an interval Taylor arithmetic. In the 
context of classical numerical methods, truncation er- 
rors, Lipschitz constants, or other constants related to 
existence or convergence assertions are often phrased 
in terms of bounds for certain derivatives. Hence, inter- 
val inclusions of Taylor coefficients can be used to give 
guaranteed bounds for quantities of concern to classical 
methods. 

Evaluating the expression for a function using in- 
terval arithmetic often yields overly pessimistic bounds 
for its range. Our goal is to tighten bounds for the range 
of $f$ and its derivatives by using a differentiation arithmetic
for series generation. We apply monotonicity and
Taylor form tests to each intermediate result of the calculation,
not just to $f$ itself. The resulting inclusions for
the range of derivative values are several orders of mag- 
nitude tighter than bounds obtained from differentia- 
tion arithmetic and interval calculations alone. Tighter 
derivative ranges allow validated applications such as 
optimization, nonlinear equations, quadrature, or dif- 
ferential equations to use larger steps, thus improving 
their computational efficiency. 

Consider the set of $q$ times continuously differentiable
functions on the real interval $\mathbf{x} = [\underline{x}, \overline{x}]$, denoted
by $f(x) \in C^q[\mathbf{x}]$. We wish to compute a tight inclusion
for

$$R(f^{(p)}; \mathbf{x}) := \{f^{(p)}(x) : \underline{x} \le x \le \overline{x}\},$$

where $p \le q$. We assume that $f$ is sufficiently smooth
for all indicated computations, and that all necessary
derivatives are computed using automatic differentiation
(cf. [5], ▸ Automatic differentiation: Point and interval
Taylor operators).

Computing an inclusion for the range of $f^{(p)}$ is
a generalization of the problem of computing an inclusion
for the range of $f$, $R(f; \mathbf{x})$. Moore's natural interval
extension [3] gives an inclusion which is often too
gross an overestimation to be practical. H. Ratschek and
J. Rokne [8] give a number of improved techniques
and many references. The approach of this paper follows
from two papers of L.B. Rall [6,7] and from [1].
Taken together, Rall's papers outline four approaches
to computing tight inclusions of $R(f; \mathbf{x})$, which we apply
to derivatives:

• monotonicity,

• mean value and Taylor forms,

• intersection, and

• subinterval adaptation.

We apply the monotonicity tests and the Taylor
form to each term of the Taylor polynomial of a function.
Whenever we compute more than one enclosure
for a quantity, either a derivative or an intermediate
value, we compute the intersection of all such enclosures.
We apply these tests to each intermediate result of the
calculation, not just to $f$ itself. The bounds we compute
for $R(f^{(p)}; \mathbf{x})$ are often several orders of magnitude
tighter than bounds computed from natural interval extensions.
In one example, we improve the interval inclusion
for $R(f^{(10)}; \mathbf{x})$ from $[-3.8\mathrm{E}10, 7.8\mathrm{E}10]$ (width
$= 1.1\mathrm{E}11$) to $[-2.1\mathrm{E}03, 9.6\mathrm{E}03]$ (width $= 1.1\mathrm{E}04$).
This improvement by a factor of $10^7$ allows a Gaussian
quadrature using 5 points per panel or a 10th
order ODE solver (applications for which bounds for
$R(f^{(10)}; \mathbf{x})$ might be needed) to increase their stepsizes,
and hence their computational efficiency, by a factor of
$10^{7/10} \approx 5$.
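Here is a rough sketch of the arithmetic behind the factor of 5. It assumes that the error of either method scales like $h^{10}$ times the bound on $f^{(10)}$, with the remaining constants ignored, and that the tolerated error is kept fixed. The stepsize gain then follows from the ratio of the two bound widths $w$:

$$h_{\mathrm{new}}^{10}\, w_{\mathrm{new}} = h_{\mathrm{old}}^{10}\, w_{\mathrm{old}} \;\Longrightarrow\; \frac{h_{\mathrm{new}}}{h_{\mathrm{old}}} = \left(\frac{w_{\mathrm{old}}}{w_{\mathrm{new}}}\right)^{1/10} = \left(\frac{1.1\mathrm{E}11}{1.1\mathrm{E}04}\right)^{1/10} = (10^{7})^{1/10} = 10^{0.7} \approx 5.0.$$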

We discuss the evaluation of a function from a code
list representation (see also ▸ Automatic differentiation:
Point and interval Taylor operators). Then we discuss
how monotonicity tests and Taylor form representations
can be used to give tighter bounds for $R(f; \mathbf{x})$.


Evaluation of Functions 


Functions are expressed in most computer languages by
arithmetic operations and a set $\Phi$ of standard functions,
for example, $\Phi = \{\mathrm{abs}, \arctan, \cos, \exp, \ln, \sin, \mathrm{sqr}, \mathrm{sqrt}\}$.
A formula (or expression) can be converted into a code
list or computational graph $\{t_1, \dots, t_n\}$ (cf. [5], ▸ Automatic
differentiation: Point and interval Taylor operators).
The value of each term $t_i$ is the result of a unary or
binary operation or function applied to constants, values
of variables, or one or two previous terms of the
code list. For example, the function


$$f(x) = \frac{x^4 - 10x^2 + 9}{x^3 - 4x - 5}$$

can be converted into the code list

t1 = sqr(x);      t6 = x · t1;
t2 = sqr(t1);     t7 = 4 · x;
t3 = 10 · t1;     t8 = t6 − t7;
t4 = t2 − t3;     t9 = t8 − 5;
t5 = t4 + 9;      t10 = t5/t9.


Bounding Derivative Ranges, Figure 1 
Code list 


The final term $t_n$ of the code list ($t_{10}$ in this case)
gives the value of $f(x)$, if defined, for a given value of the
variable $x$. The conversion of a formula into an equivalent
code list can be carried out automatically by a computer
subroutine.

The code list serves equally well for various kinds
of arithmetic, provided the necessary arithmetic operations
and standard functions are defined for the type
of elements considered. Thus, the code list in Fig. 1 can
serve for the computation of $f(x)$ in real, complex, interval,
or differentiation arithmetic. When $\mathbf{x}$ is an interval,
one gets an interval inclusion $f(\mathbf{x})$ of all real values
$f(x)$ for real $x \in \mathbf{x}$ [3,4].
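The code list of Fig. 1 can be run directly in a naive interval arithmetic. Below is a minimal Python sketch built on a hypothetical `Interval` class. The class performs no outward rounding, so it illustrates the natural interval extension but is not a validated implementation. On $\mathbf{x} = [1, 2]$ it reproduces the untightened enclosures listed in Table 1, for example $t_4(\mathbf{x}) = [-39, 6]$ and $t_{10}(\mathbf{x}) = [-15, 30]$.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; naive arithmetic without outward rounding."""
    lo: float
    hi: float

    def __add__(self, other: "Operand") -> "Interval":
        o = as_interval(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __sub__(self, other: "Operand") -> "Interval":
        o = as_interval(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, other: "Operand") -> "Interval":
        return as_interval(other) - self

    def __mul__(self, other: "Operand") -> "Interval":
        o = as_interval(other)
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    __rmul__ = __mul__

    def __truediv__(self, other: "Operand") -> "Interval":
        o = as_interval(other)
        if o.lo <= 0.0 <= o.hi:
            raise ZeroDivisionError("divisor interval contains 0")
        return self * Interval(1.0 / o.hi, 1.0 / o.lo)


Operand = Union[Interval, int, float]


def as_interval(v: Operand) -> Interval:
    return v if isinstance(v, Interval) else Interval(float(v), float(v))


def sqr(a: Interval) -> Interval:
    """Exact range of a**2 (tighter than a * a when 0 lies in a)."""
    if a.lo >= 0.0:
        return Interval(a.lo ** 2, a.hi ** 2)
    if a.hi <= 0.0:
        return Interval(a.hi ** 2, a.lo ** 2)
    return Interval(0.0, max(a.lo ** 2, a.hi ** 2))


def code_list(x: Interval) -> List[Interval]:
    """Evaluate t1..t10 of Fig. 1; returns [x, t1, ..., t10]."""
    t1 = sqr(x)
    t2 = sqr(t1)
    t3 = 10 * t1
    t4 = t2 - t3
    t5 = t4 + 9
    t6 = x * t1
    t7 = 4 * x
    t8 = t6 - t7
    t9 = t8 - 5
    t10 = t5 / t9
    return [x, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10]
```

Passing floats instead of an `Interval` is not supported by this sketch; the same code list would serve real arithmetic if `sqr` accepted plain numbers.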

The process of automatic differentiation to obtain
derivatives or Taylor coefficients of $f(x)$ can be viewed
as the evaluation of the code list for $f(x)$ using a differentiation
arithmetic in which the arithmetic operations
and standard functions are defined on the basis of the
well-known recurrence relations for Taylor coefficients
(cf. also [3,4,5], ▸ Automatic differentiation: Point and
interval Taylor operators). Let $(f)_i := f^{(i)}(x)/i!$ be the
value of the $i$th Taylor coefficient of $f$ at $x$.
Then we can express a Taylor series as

$$f(x + h) = \sum_{i=0}^{\infty} f^{(i)}(x)\frac{h^i}{i!} = \sum_{i=0}^{\infty} (f)_i\, h^i,$$

and the elements of Taylor series arithmetic are vectors
$f = ((f)_0, \dots, (f)_p)$. In Taylor arithmetic, constants
$c$ have the representation $c = (c, 0, \dots, 0)$, and $x = (x_0, 1,
0, \dots, 0)$ represents the independent variable $x = x_0 + h$.
For example, multiplication $f(x) = u(x) \cdot v(x)$ of Taylor
variables is defined in terms of the Taylor coefficients of
$u$ and $v$ by

$$(f)_i = \sum_{j=0}^{i} (u)_j\,(v)_{i-j}, \qquad i = 0, \dots, p.$$
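The recurrences can be evaluated with intervals as coefficients, giving interval Taylor arithmetic. Below is a minimal Python sketch with hypothetical names covering constants, the independent variable, subtraction and the multiplication rule above. Intervals are (lo, hi) tuples with no outward rounding, which makes the sketch illustrative rather than validated. Applied to the numerator terms of Fig. 1 on $\mathbf{x} = [1, 2]$, it reproduces the untightened rows $t_1$, $t_2$ and $t_4$ of Table 1.

```python
from typing import List, Tuple

Iv = Tuple[float, float]      # interval [lo, hi]
Taylor = List[Iv]             # ((f)_0, ..., (f)_p), each an interval


def iadd(a: Iv, b: Iv) -> Iv:
    return (a[0] + b[0], a[1] + b[1])


def isub(a: Iv, b: Iv) -> Iv:
    return (a[0] - b[1], a[1] - b[0])


def imul(a: Iv, b: Iv) -> Iv:
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def constant(c: float, p: int) -> Taylor:
    """c = (c, 0, ..., 0)."""
    return [(c, c)] + [(0.0, 0.0)] * p


def variable(x: Iv, p: int) -> Taylor:
    """x = (x0, 1, 0, ..., 0), here with x0 an interval."""
    return [x, (1.0, 1.0)] + [(0.0, 0.0)] * (p - 1)


def t_sub(u: Taylor, v: Taylor) -> Taylor:
    return [isub(a, b) for a, b in zip(u, v)]


def t_mul(u: Taylor, v: Taylor) -> Taylor:
    """Cauchy product: (f)_i = sum_{j=0}^{i} (u)_j (v)_{i-j}."""
    out: Taylor = []
    for i in range(len(u)):
        acc: Iv = (0.0, 0.0)
        for j in range(i + 1):
            acc = iadd(acc, imul(u[j], v[i - j]))
        out.append(acc)
    return out
```

For example, with `X = variable((1.0, 2.0), 4)`, `t1 = t_mul(X, X)` begins `[(1, 4), (2, 4), (1, 1)]`, and `t4 = t_sub(t_mul(t1, t1), t_mul(constant(10.0, 4), t1))` gives $(t_4)_0 = [-39, 6]$, $(t_4)_1 = [-36, 12]$ and $(t_4)_2 = [-4, 14]$.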


Monotonicity 


We extend an idea of R.E. Moore for using monotonicity
[4]: we check for the monotonicity of every derivative
of $f$ and of every intermediate function $t_i$ from the
code list. If the $i$th derivative of $f$ is known to be of
one sign on $\mathbf{x}$ ($R(f^{(i)}; \mathbf{x}) \ge 0$ or $\le 0$), then $f^{(i-1)}$ is
monotonic on the interval $\mathbf{x}$, and its range is bounded
by the real values $f^{(i-1)}(\underline{x})$ and $f^{(i-1)}(\overline{x})$. This is important
because these endpoint bounds of $R(f^{(i-1)}; \mathbf{x})$
may be tighter than the bounds computed by
the naive interval evaluation of $f^{(i-1)}(\mathbf{x})$. Hence, in addition
to the ranges $R(f^{(i)}; \mathbf{x})$, we propagate enclosures
$R(f^{(i)}; \underline{x})$ and $R(f^{(i)}; \overline{x})$ of the values at the endpoints, so
that those values are available. (We write $R(f^{(i)}; \underline{x})$ and
$R(f^{(i)}; \overline{x})$ instead of $f^{(i)}(\underline{x})$ and $f^{(i)}(\overline{x})$ to denote
that $f^{(i)}$ at the endpoints is evaluated in interval arithmetic.)

Similarly, if $R(f^{(i)}; \mathbf{x}) \ge 0$ (or $\le 0$), then
$f^{(i-2)}$ is convex (resp. concave), and its maximum
value is $\max(f^{(i-2)}(\underline{x}), f^{(i-2)}(\overline{x}))$ (resp. its minimum is
$\min(f^{(i-2)}(\underline{x}), f^{(i-2)}(\overline{x}))$).
Bounding Derivative Ranges, Figure 2
$R(f'''; \mathbf{x}) \ge 0$ implies $f''$ is monotonic and $f'$ is convex


We apply the monotonicity test to each term of each
intermediate result because an intermediate result may
be monotonic when $f$ is not. Further, by proceeding
with tighter inclusions for the terms of the intermediate
results, we reduce subsequent overestimations and
improve our chances of validating the monotonicity of
higher derivatives. If $f^{(i-1)}$ is found to be monotonic,
the tightened enclosure for $R(f^{(i-1)}; \mathbf{x})$ may allow us to
validate $R(f^{(i-1)}; \mathbf{x}) \ge 0$ (or $\le 0$), so we backtrack to
lower terms of the series as long as we continue to find
monotonicity. In the recurrence relations for division and
for all of the standard functions, the value of $f^{(i)}(\mathbf{x})$ depends
on the value of $f^{(i-1)}(\mathbf{x})$. Hence, if the enclosure
for $R(f^{(i-1)}; \mathbf{x})$ is tightened, we recompute the enclosure
for $R(f^{(i)}; \mathbf{x})$ and all subsequent terms.
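A minimal Python sketch of the two tests applied to one vector of Taylor-coefficient enclosures follows; the function names are hypothetical. For brevity it assumes that the endpoint coefficients are already available as plain floats rather than as interval enclosures $R(f^{(i)}; \underline{x})$ and $R(f^{(i)}; \overline{x})$. It also omits the recomputation of dependent terms for quotients and standard functions. Sweeping from the highest order down implements the backtracking described above. On the row $t_4 = x^4 - 10x^2$ over $[1, 2]$ it reproduces the tightening reported in Table 1. Convexity first narrows $(t_4)_1$ from $[-36, 12]$ to $[-36, -8]$, which then proves $t_4$ monotonic and narrows $(t_4)_0$ to $[-24, -9]$.

```python
from typing import List, Sequence, Tuple

Iv = Tuple[float, float]


def intersect(a: Iv, b: Iv) -> Iv:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("empty intersection: enclosures are inconsistent")
    return (lo, hi)


def tighten(coeffs: Sequence[Iv], left: Sequence[float],
            right: Sequence[float]) -> List[Iv]:
    """Monotonicity/convexity tightening of Taylor-coefficient enclosures.

    coeffs[k] encloses (f)_k over the interval; left[k] and right[k] are (f)_k
    at the two endpoints. (f)_k = f^(k)/k! has the sign of f^(k), so the
    derivative tests carry over directly to Taylor coefficients.
    """
    c = list(coeffs)
    p = len(c) - 1
    for k in range(p - 1, -1, -1):            # high to low order: backtracking
        lo_end, hi_end = min(left[k], right[k]), max(left[k], right[k])
        d = c[k + 1]
        if d[0] > 0 or d[1] < 0:              # (f)_{k+1} one-signed -> (f)_k monotonic
            c[k] = intersect(c[k], (lo_end, hi_end))
        if k + 2 <= p:
            s = c[k + 2]
            if s[0] > 0:                      # convex -> maximum at an endpoint
                c[k] = intersect(c[k], (c[k][0], hi_end))
            elif s[1] < 0:                    # concave -> minimum at an endpoint
                c[k] = intersect(c[k], (lo_end, c[k][1]))
    return c
```

For $t_8 = x^3 - 4x$ the same function finds $(t_8)_2 = [3, 6] > 0$ and lowers the upper bound of $t_8$ from 4 to 0, matching the second tightening in Table 1.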

Table 1 shows (some of) the results when the monotonicity
test is applied to each of the intermediate results
of

$$f(x) = \frac{x^4 - 10x^2 + 9}{x^3 - 4x - 5}$$

on the interval $\mathbf{x} := [1, 2]$. Each row shows enclosures
for Taylor coefficients. The row 'x' has two entries: the
function $x$ evaluated on the interval $\mathbf{x}$, and its derivative.
All higher-order derivatives are zero. Similarly,
rows $t_4$ and $t_5$ have only five nonzero Taylor coefficients (orders 0 through 4).

A few entries show where tightening occurs because of the monotonicity test. For example, the third Taylor coefficient of $t_4$, $[4, 8]$, is positive. Hence, $t_4''$ is monotonic, but that knowledge yields no tightening. Also, $t_4'$ is convex, a fact which does allow us to tighten the upper bound of its enclosure from 12 to $-8$. Similarly, finding that $t_8''$ is positive (so that $t_8$ is convex) allows us to improve the upper bound for $t_8$ from 4 to 0. In this example, the monotonicity tests allow only two relatively modest tightenings. However, those two tighter values propagate through the recurrences and reduce the width of the bound finally computed for the last (order 5) Taylor coefficient of $t_{10}$ from about 2.3E6 to 300. This is an improvement of nearly a factor of $10^4$.
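The two tests used in the $t_4$ example above can be sketched as follows. This is a minimal illustration, not the Ada implementation mentioned under Software Availability. The `Interval` class and the function names are hypothetical, and the code uses ordinary floating-point arithmetic without the outward rounding a validated implementation would need. The inputs are the naive enclosures from Table 1, together with the point values $t_4'(1) = -16$, $t_4'(2) = -8$, $t_4(1) = -9$ and $t_4(2) = -24$. Since $t_4''' = 24x$, its enclosure on $[1, 2]$ is passed as $[24, 48]$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def hull(self, other: "Interval") -> "Interval":
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def intersect(self, other: "Interval") -> "Interval":
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        if lo > hi:
            raise ValueError("empty intersection: inconsistent enclosures")
        return Interval(lo, hi)

def monotonicity_test(rng: Interval, deriv: Interval,
                      at_lo: Interval, at_hi: Interval) -> Interval:
    """Tighten an enclosure rng of g over x, given an enclosure deriv of g'.

    If deriv >= 0 or deriv <= 0, g is monotonic on x, so its range is the
    hull of the endpoint enclosures at_lo and at_hi.
    """
    if deriv.lo >= 0.0 or deriv.hi <= 0.0:
        return rng.intersect(at_lo.hull(at_hi))
    return rng

def convexity_test(rng: Interval, second: Interval,
                   at_lo: Interval, at_hi: Interval) -> Interval:
    """Tighten rng using an enclosure second of g''.

    g'' >= 0: g is convex, so its maximum over x is at an endpoint.
    g'' <= 0: g is concave, so its minimum over x is at an endpoint.
    """
    if second.lo >= 0.0:
        return rng.intersect(Interval(rng.lo, max(at_lo.hi, at_hi.hi)))
    if second.hi <= 0.0:
        return rng.intersect(Interval(min(at_lo.lo, at_hi.lo), rng.hi))
    return rng

def pt(v: float) -> Interval:
    return Interval(v, v)

# Step 1: t4''' = 24x > 0 on [1, 2], so t4' is convex; tighten its upper bound.
d_t4 = convexity_test(Interval(-36.0, 12.0), Interval(24.0, 48.0),
                      pt(-16.0), pt(-8.0))
# Step 2: the tightened t4' is <= 0, so t4 is monotonic; backtrack to t4.
t4 = monotonicity_test(Interval(-39.0, 6.0), d_t4, pt(-9.0), pt(-24.0))
print(d_t4, t4)
```

Run as written, the first step tightens $t_4'$ to $[-36, -8]$. The second step can then certify that $t_4$ is monotonic and tightens it to $[-24, -9]$. Both are the tightened entries listed in Table 1.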


Taylor Form 


In [6], Rall proves that if $\check{x} \in \mathbf{x}$, then

$$R(f; \mathbf{x}) \subseteq F_p(\mathbf{x}) = \sum_{i=0}^{p-1} \frac{f^{(i)}(\check{x})}{i!}\,(\mathbf{x} - \check{x})^i + \frac{F^{(p)}(\mathbf{x})\,(\mathbf{x} - \check{x})^p}{p!}, \qquad (2)$$

where $F^{(p)}$ is an interval extension of $f^{(p)}$. The $F_p$ given by (2) is called the (elementary) Taylor form of $f$ of order $p$.

Bounding Derivative Ranges, Table 1
Numerical results of applying monotonicity tests
(each row: enclosures of the Taylor coefficients of orders 0, 1, 2, ... over x = [1, 2])

x                        [1, 2]       1
t1 = x^2                 [1, 4]       [2, 4]       1
t2 = x^4                 [1, 16]      [4, 32]      [6, 24]      [4, 8]       1
t3 = 10x^2               [10, 40]     [20, 40]     10
t4 = t2 - t3 = x^4 - 10x^2
  t4(x)                  [-39, 6]     [-36, 12]    [-4, 14]     [4, 8]       1
  Tightened to:          [-24, -9]    [-36, -8]    [-4, 14]     [4, 8]       1
t5 = t4 + 9 = x^4 - 10x^2 + 9
  t5(x)                  [-30, 15]    [-36, 12]    [-4, 14]     [4, 8]       1
  Tightened as the result of tightening t4:
                         [-15, 0]     [-36, -8]    [-4, 14]     [4, 8]       1
t8 = x^3 - 4x
  t8(x)                  [-7, 4]      [-1, 8]      [3, 6]       1
  Tightened to:          [-7, 0]      [-1, 8]      [3, 6]       1
t9 = t8 - 5 = x^3 - 4x - 5
  t9(x)                  [-12, -1]    [-1, 8]      [3, 6]       1
  Tightened as the result of tightening t8:
                         [-12, -5]    [-1, 8]      [3, 6]       1
t10 = t5/t9
  t10(x), orders 0-2:    [-15, 30]          [-132, 276]        [-1160, 2392]
          orders 3-5:    [-10097, 20823]    [-87881, 181229]   [-764851, 1577270]
  Tightened as the result of tightening t5 and t9:
          orders 0-2:    [-9.73E-06, 3.01]  [0.41, 12.01]      [-5.21, 23.61]
          orders 3-5:    [-9.68, 51.97]     [-21.84, 113.67]   [-47.58, 248.96]

We expand the Taylor series for the function $f$ and all intermediate functions $t_i$ appearing in the code list at three points: $x = a := \underline{x}$, $x = c := \operatorname{midpoint}(\mathbf{x})$, and $x = b := \overline{x}$. The series for $f$ at $\underline{x}$ and $\overline{x}$ are already available because they were computed for the monotonicity test. The extra work required to generate the series at $c$ is often justified because the midpoint form is much narrower than either of the endpoint forms. Let $h := \operatorname{width}(\mathbf{x})$. We compute the Taylor form (2) for $f$ and each $t_i$ at the left endpoint, center, and right endpoint to all available orders and intersect. The remainder using $R(f^{(i+1)}; \mathbf{x})$ has the potential for tightening all previous terms:


$$
\begin{aligned}
R(f; \mathbf{x}) \subseteq {} & \Big( f(a) + f'(a)\,h\,[0,1] + f''(a)\,\frac{h^2}{2!}\,[0,1] + \cdots + f^{(i)}(a)\,\frac{h^i}{i!}\,[0,1] + R(f^{(i+1)}; \mathbf{x})\,\frac{h^{i+1}}{(i+1)!}\,[0,1] \Big) \\
\cap {} & \Big( f(c) + f'(c)\,\frac{h}{2}\,[-1,1] + f''(c)\,\frac{h^2}{2^2\,2!}\,[-1,1]^2 + \cdots + f^{(i)}(c)\,\frac{h^i}{2^i\,i!}\,[-1,1]^i + R(f^{(i+1)}; \mathbf{x})\,\frac{h^{i+1}}{2^{i+1}(i+1)!}\,[-1,1]^{i+1} \Big) \\
\cap {} & \Big( f(b) + f'(b)\,h\,[-1,0] + f''(b)\,\frac{h^2}{2!}\,[0,1] + \cdots + f^{(i)}(b)\,\frac{h^i}{i!}\,[-1,0]^i + R(f^{(i+1)}; \mathbf{x})\,\frac{h^{i+1}}{(i+1)!}\,[-1,0]^{i+1} \Big)
\end{aligned}
$$

Here the values of $f$ and its derivatives at $a$, $c$, and $b$ are enclosures computed in interval arithmetic, and powers of intervals are interval powers (for example, $[-1, 1]^2 = [0, 1]$).

Bounding Derivative Ranges, Table 2
Numerical results of applying Taylor form tests
(each row: enclosures of the Taylor coefficients of orders 0, 1, 2, ... over x = [1, 2])

x                        [1, 2]       1
t1 = x^2                 [1, 4]       [2, 4]       1
  No tightening occurs.
t4 = t2 - t3 = x^4 - 10x^2
  t4(x)                  [-39, 6]     [-36, 12]    [-4, 14]     [4, 8]       1
  Order 1:  [-24, 12]      tightened by f(a) using Fab(2)
            [-24, -2.5]    tightened by f(c) using Fab(2)
            [-20, -7]      tightened by f(c) using Fab(3)
            [-20, -8]      tightened by f(c) using Fab(4)
  Order 0:  [-29, -9]      tightened by f(a) using Fab(1)
            [-27.438, -9]  tightened by f(c) using Fab(1)
            [-24, -9]      tightened by f(b) using Fab(1)
  Tightened to:          [-24, -9]    [-20, -8]    [-4, 14]     [4, 8]       1
t5 = t4 + 9 = x^4 - 10x^2 + 9
  t5(x)                  [-30, 15]    [-36, 12]    [-4, 14]     [4, 8]       1
  Tightened as the result of tightening t4:
                         [-15, 0]     [-20, -8]    [-4, 14]     [4, 8]       1
t8 = x^3 - 4x
  t8(x)                  [-7, 4]      [-1, 8]      [3, 6]       1
  Tightened to:          [-4, 0]      [-1, 8]      [3, 6]       1
t10 = t5/t9
  t10(x), orders 0-2:    [-15, 30]          [-132, 276]        [-1160, 2392]
          orders 3-5:    [-10097, 20823]    [-87881, 181229]   [-764851, 1577270]
  Tightened as the result of tightening t5 and t9:
          orders 0-2:    [-9.73E-06, 3.01]  [0.55, 8.81]       [-4.57, 18.49]
          orders 3-5:    [-8.57, 39.94]     [-19.27, 87.64]    [-42.02, 191.84]
  Order 0:  [-6.08E-06, 3.01]  tightened by f(c) using Fab(1)
Bounding Derivative Ranges, Figure 3
Taylor polynomial enclosures for f, remainders from naive interval evaluation

Bounding Derivative Ranges, Figure 4
Taylor polynomial enclosures for f, remainders tightened by Taylor form
For higher-order derivatives, $R(f^{(n)}; \mathbf{x})$ is contained in similar Taylor forms involving $f^{(n+i)}(\mathbf{x})$, for $n > 0$.

We apply the Taylor form to each intermediate result. Except for the operators +, -, *, and sqr, whenever one term is tightened, all following terms can be recomputed more tightly. This can result in an iterative process which is finite only by virtue of Moore's theorem on interval iteration [4]. In practice, then, we restrict the number of times subsequent terms are recomputed starting at a given order.
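A minimal sketch of the three-point intersection follows. It again assumes plain floating-point arithmetic without outward rounding and uses hypothetical names (`Interval`, `taylor_form`, `three_point_taylor`). The helper `taylor_form` evaluates (2) at an expansion point $p$, with $\mathbf{x} - p$ passed as an interval. The endpoint and midpoint forms therefore differ only in that interval: $[0, h]$, $[-h/2, h/2]$, or $[-h, 0]$. The example expands $t_4 = x^4 - 10x^2$ to order 1. It uses as remainder the tightened enclosure $[-20, -8]$ of $t_4'$ from Table 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def __pow__(self, k: int) -> "Interval":
        # Interval power: an even power of an interval containing 0 starts
        # at 0, e.g. [-1, 1]**2 == [0, 1].
        if k == 0:
            return Interval(1.0, 1.0)
        a, b = self.lo ** k, self.hi ** k
        if k % 2 == 0 and self.lo <= 0.0 <= self.hi:
            return Interval(0.0, max(a, b))
        return Interval(min(a, b), max(a, b))

    def intersect(self, other: "Interval") -> "Interval":
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        if lo > hi:
            raise ValueError("empty intersection: inconsistent enclosures")
        return Interval(lo, hi)

def taylor_form(coeffs, remainder, dx):
    """sum_k coeffs[k] * dx**k + remainder * dx**n, with n = len(coeffs).

    coeffs: Taylor coefficients f^(k)(p)/k! at the expansion point p;
    remainder: enclosure of f^(n)(x)/n! over the whole interval x;
    dx: the interval x - p.
    """
    total = Interval(0.0, 0.0)
    for k, c in enumerate(coeffs):
        total = total + Interval(c, c) * dx ** k
    return total + remainder * dx ** len(coeffs)

def three_point_taylor(x, coeffs_a, coeffs_c, coeffs_b, remainder):
    """Intersect the Taylor forms expanded at a = lo(x), c = mid(x), b = hi(x)."""
    h = x.hi - x.lo
    form_a = taylor_form(coeffs_a, remainder, Interval(0.0, h))
    form_c = taylor_form(coeffs_c, remainder, Interval(-h / 2, h / 2))
    form_b = taylor_form(coeffs_b, remainder, Interval(-h, 0.0))
    return form_a, form_c, form_b, form_a.intersect(form_c).intersect(form_b)

def t4(v: float) -> float:
    return v ** 4 - 10 * v ** 2

x = Interval(1.0, 2.0)
fa, fc, fb, tight = three_point_taylor(
    x, [t4(1.0)], [t4(1.5)], [t4(2.0)], Interval(-20.0, -8.0))
print(fa, fc, fb, tight)
```

The three forms come out as $[-29, -9]$, $[-27.4375, -7.4375]$ and $[-24, -4]$, and their intersection is $[-24, -9]$. These match the three "Fab(1)" rows for $t_4$ in Table 2, where $-27.4375$ is rounded to $-27.438$.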

Table 2 shows (some of) the results when the Taylor form is applied to each of the intermediate results of

$$f(x) = \frac{x^4 - 10x^2 + 9}{x^3 - 4x - 5}$$

on the interval $\mathbf{x} := [1, 2]$. "Tightened by f(a), f(c), or f(b)" indicates whether the left endpoint, the midpoint, or the right endpoint expansion was used. "Using Fab(n)" indicates that the enclosure was tightened using the Taylor form whose remainder term involves $f^{(n)}$.

The pattern of Table 2 is typical: Most Taylor forms 
give no tightening; there are many small improvements; 
and the compound effect of many small improvements 
is significant. Here we have reduced the width of the enclosure for the 6th Taylor coefficient of f from about 2.3E6 to 2.3E2. Figures 3 and 4 compare the Taylor polynomial enclosures for f resulting from naive interval
evaluation of the remainders with the enclosures tight- 
ened by the Taylor form computations shown in Ta- 
ble 2. 

For this example, the bounds achieved using the 
Taylor form are tighter than those achieved using the 
monotonicity test. For other examples, the monotonic- 
ity test performs better. Hence in practice, we apply 
both techniques. If the expression for f is rewritten in 
a mathematically equivalent form to yield tighter inter- 
val bounds for R(f; x), the techniques of this paper can 
still be used profitably to tighten enclosures of higher 
derivatives. 


Intersection and Subinterval Adaptation 


The third general technique described by Rall for tightening enclosures of $R(f; \mathbf{x})$ is to intersect all enclosures for each quantity, as we have done here. That is, whatever bounds for $R(f^{(i)}; \mathbf{x})$ we compute using monotonicity or a Taylor form of any degree, we intersect with the tightest bound previously computed. Each new bound may improve our lower bound, our upper bound, both, or neither. Some improvements are large. Others are so small as to seem insignificant, but even the smallest improvements may be magnified by later operations.

Rall’s fourth technique is the adaptive partitioning 
of the interval x. The over-estimation of R(f; x) by
naive interval evaluation decreases linearly with width(x),
while the over-estimation by the Taylor form decreases
quadratically. Hence, partitioning x into smaller
subintervals is very effective. However, we view subin- 
terval adaptation as more effectively controlled by the 
application (e.g., optimization, quadrature, DE solu- 
tion) than by the general-purpose interval Taylor arith- 
metic outlined here. Hence, we do not describe it fur- 
ther. 


Software Availability 


An implementation in Ada of interval Taylor arithmetic
operators for +, -, *, /, and sqr is available at
[9]. Similar implementations could be written in For- 
tran 90, C++, or any other language supporting opera- 
tor overloading. 
In this article, we present some important theoretical
results on which approaches to solving parametric nonlinear
programming problems can be based. The
need for these results arises from the fact that while sta- 
bility, continuity and convexity properties of objective 
function value for linear programs are readily available 
[7], their counterparts in nonlinear programs are valid 
only for a special class of nonlinear programs. It is not 
surprising then that a large amount of research has been 
devoted towards establishing these conditions (see [1] 
and [3] for a comprehensive list of references). Further, 
due to the existence of strong duality results for linear 
models, parametric programming can be done by ex- 
tending the simplex algorithm for linear models [6]. On 
the other hand, for nonlinear programs the parametric 
solution is given by an approximation of the optimal 
solution. This approximation or estimation of the opti- 
mal solution can be achieved by obtaining the optimal 
solution as a function of parameters. In order to derive 
these results we first state the following implicit function 
theorem: 


Theorem 1 (see for example [3,8]) Suppose that $\phi(x, \theta)$ is an $(r \times 1)$ vector function defined on $E^n \times E^m$, with $x \in E^n$ and $\theta \in E^m$. Let $D_x\phi(x, \theta)$ and $D_\theta\phi(x, \theta)$ denote the $(r \times n)$ and $(r \times m)$ matrices of first derivatives with respect to $x$ and $\theta$, respectively. Suppose that $\phi: E^{n+m} \to E^n$ (that is, $r = n$). Let $\phi(x, \theta)$ be continuously differentiable in $x$ and $\theta$ in an open set containing $(x_0, \theta_0)$, where $\phi(x_0, \theta_0) = 0$. Suppose that $D_x\phi(x_0, \theta_0)$ has an inverse.

Then there is a function $x(\theta)$ defined in a neighborhood of $\theta_0$ such that $\phi[x(\theta), \theta] = 0$ for each $\theta$ in that neighborhood. Furthermore, $x(\theta)$ is a continuously differentiable function in that neighborhood and

$$x'(\theta_0) = -D_x\phi[x(\theta_0), \theta_0]^{-1} D_\theta\phi[x(\theta_0), \theta_0] = -D_x\phi(x_0, \theta_0)^{-1} D_\theta\phi(x_0, \theta_0),$$

where $x'(\theta_0)$ denotes the derivative of $x$ evaluated at $\theta_0$.
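Theorem 1 can be checked numerically on a small case. The sketch below assumes a hypothetical two-dimensional $\phi$ with analytically known Jacobians. It computes $x'(\theta_0) = -D_x\phi(x_0, \theta_0)^{-1} D_\theta\phi(x_0, \theta_0)$ by Gaussian elimination with partial pivoting; `solve` and `implicit_derivative` are illustrative names. For $\phi(x, \theta) = (x_1^2 - \theta,\ x_1 x_2 - 1)$ the solution is $x(\theta) = (\sqrt{\theta}, 1/\sqrt{\theta})$. The result can therefore be compared with the exact derivative $(1/(2\sqrt{\theta_0}),\ -\theta_0^{-3/2}/2)$, which equals $(0.25, -0.0625)$ at $\theta_0 = 4$.

```python
def solve(a, b):
    """Solve a @ X = b (a: n x n, b: n x m, nested lists) by Gaussian
    elimination with partial pivoting."""
    n, m = len(a), len(b[0])
    aug = [list(map(float, row)) + list(map(float, rhs)) for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[piv][col] == 0.0:  # exact-zero check only
            raise ValueError("D_x phi(x0, theta0) is singular: theorem does not apply")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + m):
                aug[r][c] -= f * aug[col][c]
    x = [[0.0] * m for _ in range(n)]
    for r in reversed(range(n)):          # back substitution
        for j in range(m):
            s = aug[r][n + j] - sum(aug[r][c] * x[c][j] for c in range(r + 1, n))
            x[r][j] = s / aug[r][r]
    return x

def implicit_derivative(dx_phi, dtheta_phi):
    """x'(theta0) = -D_x phi^{-1} D_theta phi, returned as an n x m matrix."""
    return solve(dx_phi, [[-v for v in row] for row in dtheta_phi])

# Hypothetical phi(x, theta) = (x1^2 - theta, x1*x2 - 1) at theta0 = 4, x0 = (2, 0.5).
x1, x2 = 2.0, 0.5
dx_phi = [[2 * x1, 0.0],
          [x2, x1]]
dtheta_phi = [[-1.0],
              [0.0]]
print(implicit_derivative(dx_phi, dtheta_phi))
```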
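Consider the parametric nonlinear programming problem of the following form:

$$
\begin{aligned}
z(\theta) = \min_x \ & f(x, \theta) \\
\text{s.t.} \ & g_i(x, \theta) \ge 0, \quad i = 1, \ldots, p, \\
& h_j(x, \theta) = 0, \quad j = 1, \ldots, q, \\
& x \in X,
\end{aligned}
\qquad (1)
$$

where $f$, $g$ and $h$ are twice continuously differentiable in $x$ and $\theta$. The first-order KKT conditions for (1) are given as follows:

$$
\begin{aligned}
\nabla_x f(x, \theta) - \sum_{i=1}^{p} \lambda_i \nabla_x g_i(x, \theta) + \sum_{j=1}^{q} \mu_j \nabla_x h_j(x, \theta) &= 0, \\
\lambda_i g_i(x, \theta) &= 0, \quad i = 1, \ldots, p, \\
h_j(x, \theta) &= 0, \quad j = 1, \ldots, q.
\end{aligned}
\qquad (2)
$$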


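An application of the implicit function theorem (Theorem 1) to the KKT conditions (2) results in the following basic sensitivity theorem:

Theorem 2 ([2,3,8]) Let $\theta_0$ be a vector of parameter values and $(x_0, \lambda_0, \mu_0)$ a KKT triple corresponding to (2), where $\lambda_0$ is nonnegative and $x_0$ is feasible in (1). Also assume that:

i) strict complementary slackness holds;

ii) the binding constraint gradients are linearly independent;

iii) the second-order sufficiency conditions hold.

Then, in a neighborhood of $\theta_0$, there exists a unique, once continuously differentiable function $[x(\theta), \lambda(\theta), \mu(\theta)]$ satisfying (2) with $[x(\theta_0), \lambda(\theta_0), \mu(\theta_0)] = (x_0, \lambda_0, \mu_0)$, where $x(\theta)$ is a unique isolated minimizer for (1), and

$$\left.\frac{d}{d\theta}\begin{pmatrix} x(\theta) \\ \lambda(\theta) \\ \mu(\theta) \end{pmatrix}\right|_{\theta = \theta_0} = -(M_0)^{-1} N_0, \qquad (3)$$

where

$$
M_0 = \begin{pmatrix}
\nabla_x^2 L & -\nabla_x g_1 & \cdots & -\nabla_x g_p & \nabla_x h_1 & \cdots & \nabla_x h_q \\
\lambda_1 \nabla_x^T g_1 & g_1 & & & & & \\
\vdots & & \ddots & & & & \\
\lambda_p \nabla_x^T g_p & & & g_p & & & \\
\nabla_x^T h_1 & & & & & & \\
\vdots & & & & & & \\
\nabla_x^T h_q & & & & & &
\end{pmatrix},
\qquad
N_0 = \begin{pmatrix}
\nabla_{\theta x}^2 L \\
\lambda_1 \nabla_\theta^T g_1 \\
\vdots \\
\lambda_p \nabla_\theta^T g_p \\
\nabla_\theta^T h_1 \\
\vdots \\
\nabla_\theta^T h_q
\end{pmatrix}
$$

(blank entries are zero, and all quantities are evaluated at $(x_0, \lambda_0, \mu_0, \theta_0)$), and

$$L(x, \lambda, \mu, \theta) = f(x, \theta) - \sum_{i=1}^{p} \lambda_i g_i(x, \theta) + \sum_{j=1}^{q} \mu_j h_j(x, \theta).$$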
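As a sanity check of (3), consider the hypothetical one-variable problem $\min_x (x - \theta)^2$ s.t. $g(x, \theta) = x - 1 \ge 0$, with $\theta_0 < 1$. The constraint is active with $x_0 = 1$ and $\lambda_0 = 2(1 - \theta_0) > 0$, so the assumptions of Theorem 2 hold. For $\theta < 1$ the exact solution is $x(\theta) = 1$, $\lambda(\theta) = 2(1 - \theta)$. The sketch below assembles $M_0$ and $N_0$ for this case (there are no equality constraints) and solves the $2 \times 2$ system by Cramer's rule. The function names are illustrative only.

```python
def solve_2x2(m, rhs):
    """Cramer's rule for a 2 x 2 system m @ y = rhs."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("M0 is singular: the assumptions of Theorem 2 fail")
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]

def kkt_sensitivity(theta0):
    """d(x, lambda)/d(theta) at theta0 = -(M0)^{-1} N0 for the hypothetical problem
        min (x - theta)^2  s.t.  g(x, theta) = x - 1 >= 0,   theta0 < 1,
    with L = (x - theta)^2 - lambda * g, so the constraint is active at x0 = 1."""
    x0 = 1.0
    lam0 = 2.0 * (x0 - theta0)      # from dL/dx = 2(x - theta) - lambda = 0
    g0 = x0 - 1.0                   # active constraint: g = 0
    d2L_dx2 = 2.0
    dg_dx, dg_dtheta = 1.0, 0.0
    d2L_dthetadx = -2.0             # d/dtheta of 2(x - theta) - lambda
    m0 = [[d2L_dx2, -dg_dx],
          [lam0 * dg_dx, g0]]
    n0 = [d2L_dthetadx, lam0 * dg_dtheta]
    return solve_2x2(m0, [-v for v in n0])

dx_dtheta, dlam_dtheta = kkt_sensitivity(0.5)
print(dx_dtheta, dlam_dtheta)
```

At $\theta_0 = 0.5$ it returns $dx/d\theta = 0$ and $d\lambda/d\theta = -2$, which agrees with the exact solution.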


However, in the special case of (1) where the parameters are present on the right-hand side of the constraints, (1) can be rewritten in the following form:

$$
\begin{aligned}
z(\theta) = \min_x \ & f(x) \\
\text{s.t.} \ & g(x) \ge \theta, \\
& x \in X.
\end{aligned}
\qquad (4)
$$

A simplified version of an equivalent of (3) for (4) can also be obtained (see for example [8] for details). Another important result that can be derived for (4) is that the rate of change of the optimal value function $z(\theta)$ with respect to $\theta$ is given by the KKT multiplier. Thus, given an optimal solution of (4) at a fixed point $\theta_0$, an estimate of the optimal solution in the neighborhood of $\theta_0$ can be obtained by using the KKT multiplier obtained at $\theta_0$ (see [8] and [9]). In the special case where (4) is convex in $x$ and $\theta$ is bounded between certain lower and upper bounds, say 0 and 1, one can obtain a piecewise linear approximation of the optimal value function for the whole range of $\theta$. In order to derive these results, we first state the following properties.


Theorem 3 (continuity property of the objective function value; [3]) Let

$$z(\theta) = \inf \{ f(x) : x \in X,\ g(x) \ge \theta \}.$$

Suppose

i) $X$ is a compact convex set in $E^n$,

ii) $f$ and $g$ are both continuous on $X \times E^m$, and

iii) each component of $g$ is strictly concave on $X$ for each $\theta$.

Then $z(\theta)$ is continuous on its effective domain.
Bounds and Solution Vector Estimates for Parametric NLPs, Figure 1
Bounds on the optimal value function $z(\theta)$ over $[\theta_i, \theta_{i+1}]$, with the intersection point $\theta_{int}$ (LB = lower bounds; UB = upper bound)


Theorem 4 (convexity property of the solution space; [4]) Suppose

i) the $g_j$ are jointly quasiconcave on $X$ and $\theta$, and

ii) $X$ is convex.

Then the solution space $R(\theta) = \{x \in X : g(x) \ge \theta\}$ is convex.

Theorem 5 (convexity property of the objective function value; [4]) Suppose

i) $f$ is convex on $X$, and

ii) the solution space $R(\theta)$ is essentially convex.

Then $z(\theta)$ is convex in $\theta$.


Since $z(\theta)$ is continuous and convex under the above conditions, for a given interval $[\theta_i, \theta_{i+1}]$ we can obtain [5] (see Fig. 1) parametric lower and upper bounds as follows.


Parametric Lower Bound 


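A tangent of the convex function $z(\theta)$ is a global underestimator of it. Hence the linearizations at $\theta_i$ and $\theta_{i+1}$,

$$
\begin{aligned}
LB_i(\theta) &= z^*(\theta_i) + \nabla_\theta z^*(\theta_i)\,(\theta - \theta_i), \\
LB_{i+1}(\theta) &= z^*(\theta_{i+1}) + \nabla_\theta z^*(\theta_{i+1})\,(\theta - \theta_{i+1}),
\end{aligned}
$$

where $\nabla_\theta z^*(\theta)$ is given by the Lagrange multipliers as discussed earlier, provide global underestimators of $z(\theta)$.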


Parametric Upper Bound 


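A linear interpolation between the objective function values at the end points $\theta_i$ and $\theta_{i+1}$, given by

$$z^U(\theta) = \alpha\, z^*(\theta_i) + (1 - \alpha)\, z^*(\theta_{i+1}), \qquad \theta = \alpha\theta_i + (1 - \alpha)\theta_{i+1},\ \alpha \in [0, 1],$$

gives a valid upper bound because of the convexity of the optimal value function $z(\theta)$.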

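It may be mentioned that in this simple way we can obtain a region, ABC, within which the value of the objective function lies. The intersection point $\theta_{int}$ of the two lower bounds, $LB_i(\theta)$ and $LB_{i+1}(\theta)$, is then determined. Because the upper bound is linear and the larger of the two lower bounds is piecewise linear with its kink at $\theta_{int}$, the gap between the bounds on $[\theta_i, \theta_{i+1}]$ is largest at this point. At $\theta_{int}$ the values of the lower and upper bounds are compared. If the difference is within a certain tolerance $\varepsilon$, we stop. Otherwise, the interval $[\theta_i, \theta_{i+1}]$ is subdivided into the two intervals $[\theta_i, \theta_{int}]$ and $[\theta_{int}, \theta_{i+1}]$. In each of these regions a similar bounding procedure is repeated until we meet the tolerance criterion.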
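The bounding procedure can be sketched as follows for a scalar parameter. This is a minimal illustration under stated assumptions. The function $z$ and its derivative are supplied as callables; in practice $z^*(\theta_i)$ would come from solving (4) at $\theta_i$ and the slope from the KKT multiplier. The names `parametric_bounds` and `tol` are hypothetical. On each subinterval the code evaluates $LB_i$ at the tangent intersection $\theta_{int}$, compares it with the chord, and subdivides at $\theta_{int}$ until the gap is at most the tolerance.

```python
def parametric_bounds(z, dz, theta_lo, theta_hi, tol, max_depth=40):
    """Bracket a convex optimal value function z on [theta_lo, theta_hi].

    On each subinterval [ti, tj] the tangents at ti and tj (slopes dz, e.g.
    KKT multipliers) are lower bounds, the chord is an upper bound, and the
    largest gap between them occurs where the two tangents intersect.
    Returns sorted (ti, tj, theta_int, gap) with gap <= tol (or max_depth hit).
    """
    done, stack = [], [(theta_lo, theta_hi, 0)]
    while stack:
        ti, tj, depth = stack.pop()
        zi, zj, gi, gj = z(ti), z(tj), dz(ti), dz(tj)
        if gi == gj:              # z is linear on [ti, tj]: bounds coincide
            t_int = 0.5 * (ti + tj)
        else:                     # solve zi + gi (t - ti) = zj + gj (t - tj)
            t_int = (zj - zi + gi * ti - gj * tj) / (gi - gj)
        lower = zi + gi * (t_int - ti)              # LB_i(theta_int)
        alpha = (tj - t_int) / (tj - ti)
        upper = alpha * zi + (1.0 - alpha) * zj     # chord at theta_int
        gap = upper - lower
        if gap <= tol or depth >= max_depth:
            done.append((ti, tj, t_int, gap))
        else:                     # subdivide at theta_int and repeat
            stack.append((t_int, tj, depth + 1))
            stack.append((ti, t_int, depth + 1))
    return sorted(done)

# Hypothetical example: z(theta) = min{x^2 : x >= theta} = theta^2 for
# theta >= 0, whose KKT multiplier (= dz/dtheta) is 2*theta.
segments = parametric_bounds(lambda t: t * t, lambda t: 2.0 * t, 0.0, 1.0, tol=0.01)
for ti, tj, t_int, gap in segments:
    print(f"[{ti:.3f}, {tj:.3f}]  theta_int={t_int:.4f}  gap={gap:.5f}")
```

For this example the gap on a subinterval of length $L$ is $L^2/2$. With tolerance 0.01, the procedure therefore stops with eight subintervals of length 0.125.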
Branch and price is a generalization of linear programming (LP) based branch and bound specifically designed to handle integer programming (IP) formulations that contain a huge number of variables. The basic idea of branch and price is simple. Columns are left out of the LP relaxation because there are too many columns to handle efficiently and most of them will have their associated variable equal to zero in an optimal solution anyway. Then, to check the optimality of an LP solution, a subproblem, called the pricing problem, is solved to try to identify columns with a profitable reduced cost. If such columns are found, the LP is reoptimized. Branching occurs when no profitable columns are found but the LP solution does not satisfy the integrality conditions. Branch and price applies column generation at every node of the branch and bound tree.

There are several reasons for considering IP formu- 
lations with a huge number of variables. 

• A compact formulation of an IP may have a weak LP relaxation. Frequently the relaxation can be tightened by a reformulation that involves a huge number of variables.

• A compact formulation of an IP may have a symmetric structure that causes branch and bound to perform poorly because the problem barely changes after branching. A reformulation with a huge number of variables can eliminate this symmetry.

• Column generation provides a decomposition of the problem into master and subproblems. This decomposition may have a natural interpretation in the contextual setting allowing for the incorporation of additional important constraints or nonlinear cost functions.

• A formulation with a huge number of variables may be the only choice.

At first glance, it may seem that branch and price in- 
volves nothing more than combining well-known ideas 
for solving linear programs by column generation with 
branch and bound. However, it is not that straight- 
forward. There are fundamental difficulties in applying 
column generation techniques for linear programming 
in integer programming solution methods. These in- 
clude: 

• Conventional integer programming branching on variables may not be effective because fixing variables can destroy the structure of the pricing problem.

• Column generation often converges slowly and solving the LPs to optimality may become computationally prohibitive.

We illustrate the concepts of branch and price and the difficulties that may arise by means of an example.

In the generalized assignment problem (GAP) the 
objective is to find a maximum profit assignment of m 
tasks to n machines such that each task is assigned to 
precisely one machine subject to capacity restrictions 
on the machines. For reasons that will become appar- 
ent later, we will consider separately the two cases of 
nonidentical and identical machines. 
Nonidentical Machines


The natural integer programming formulation of GAP is

$$
\begin{aligned}
\max \ & \sum_{1 \le i \le m} \sum_{1 \le j \le n} p_{ij} z_{ij} \\
\text{s.t.} \ & \sum_{1 \le j \le n} z_{ij} = 1, && i = 1, \ldots, m, \\
& \sum_{1 \le i \le m} w_{ij} z_{ij} \le d_j, && j = 1, \ldots, n, \\
& z_{ij} \in \{0, 1\}, && i = 1, \ldots, m,\ j = 1, \ldots, n,
\end{aligned}
$$

where $p_{ij}$ is the profit associated with assigning task $i$ to machine $j$, $w_{ij}$ is the amount of the capacity of machine $j$ used by task $i$, $d_j$ is the capacity of machine $j$, and $z_{ij}$ is a 0-1 variable indicating whether task $i$ is assigned to machine $j$.

An alternative formulation of GAP in terms of columns representing feasible assignments of tasks to machines is

$$
\begin{aligned}
\max \ & \sum_{1 \le j \le n} \sum_{1 \le k \le K_j} \Big( \sum_{1 \le i \le m} p_{ij} y_{ik}^j \Big) \lambda_k^j \\
\text{s.t.} \ & \sum_{1 \le j \le n} \sum_{1 \le k \le K_j} y_{ik}^j \lambda_k^j = 1, && i = 1, \ldots, m, \\
& \sum_{1 \le k \le K_j} \lambda_k^j = 1, && j = 1, \ldots, n, \\
& \lambda_k^j \in \{0, 1\}, && j = 1, \ldots, n,\ k = 1, \ldots, K_j,
\end{aligned}
$$

where the first $m$ entries of a column, given by $y_k^j = (y_{1k}^j, y_{2k}^j, \ldots, y_{mk}^j)$, satisfy the knapsack constraint

$$\sum_{1 \le i \le m} w_{ij} x_i \le d_j, \qquad x_i \in \{0, 1\},\ i = 1, \ldots, m,$$

and where $K_j$ denotes the number of feasible solutions to the above knapsack constraint. The first set of constraints ensures that each task is assigned to a machine. The second set of constraints, the convexity constraints, ensures that exactly one feasible assignment of tasks to machines is selected for each machine. This is in fact the formulation that is obtained when we apply Dantzig-Wolfe decomposition to the natural formulation of GAP, with the assignment constraints defining the master problem and the machine capacity constraints defining the subproblems.

The reason for considering this alternative formulation of GAP is that the LP relaxation of the master problem is tighter than the LP relaxation of the natural formulation, because certain fractional solutions are eliminated. Namely, it eliminates all fractional solutions that are not convex combinations of 0-1 solutions to the knapsack constraints.

Unfortunately, the LP relaxation of the master problem cannot be solved directly due to the exponential number of columns. However, the LP relaxation of a restricted version of the master problem that considers only a subset of the columns can be solved directly using, for instance, the simplex method. Furthermore, if the reduced costs of all the columns that were left out are nonpositive, then the LP solution obtained is also optimal for the LP relaxation of the unrestricted master problem. To check whether there exists a column with positive reduced cost, we solve the pricing problem

$$\max_{1 \le j \le n} \{ z(KP_j) - v_j \},$$

where $v_j$ is the optimal dual price from the solution to the LP relaxation of the restricted master problem associated with the convexity constraint of machine $j$, and $z(KP_j)$ is the value of the optimal solution to the knapsack problem

$$
\begin{aligned}
\max \ & \sum_{1 \le i \le m} (p_{ij} - u_i) x_i \\
\text{s.t.} \ & \sum_{1 \le i \le m} w_{ij} x_i \le d_j, \\
& x_i \in \{0, 1\}, \quad i \in \{1, \ldots, m\},
\end{aligned}
$$

with $u_i$ being the optimal dual price from the solution to the LP relaxation of the restricted master problem associated with the assignment constraint of task $i$. If the optimal value of the pricing problem is positive, we have identified a column with positive reduced cost. In that case, we add the column to the restricted master problem and reoptimize.
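The pricing step can be sketched as one 0-1 knapsack per machine. This is a minimal illustration that assumes integer weights and capacities, so that a textbook dynamic program applies. The instance, the dual values and the function names are hypothetical, and tasks and machines are numbered from 0.

```python
def knapsack(values, weights, capacity):
    """0-1 knapsack by dynamic programming over integer capacities.

    Returns (optimal value, sorted list of chosen item indices)."""
    n = len(values)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v
    chosen, c = [], capacity
    for i in range(n, 0, -1):            # backtrack through the table
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)

def price(profit, weight, capacity, u, v):
    """GAP pricing: max_j { z(KP_j) - v_j }, with item values p_ij - u_i.

    profit[i][j], weight[i][j]: task i on machine j; capacity[j] = d_j;
    u[i], v[j]: duals of the assignment and convexity constraints.
    Returns (reduced cost, machine j, tasks of the new column)."""
    best = None
    for j, d in enumerate(capacity):
        values = [profit[i][j] - u[i] for i in range(len(u))]
        weights = [weight[i][j] for i in range(len(u))]
        z_kp, tasks = knapsack(values, weights, d)
        reduced_cost = z_kp - v[j]
        if best is None or reduced_cost > best[0]:
            best = (reduced_cost, j, tasks)
    return best

# Hypothetical instance and duals: 3 tasks, 2 machines.
profit = [[6, 4], [5, 5], [4, 6]]
weight = [[3, 2], [2, 2], [2, 3]]
capacity = [4, 4]
u, v = [3.0, 2.5, 3.0], [1.0, 1.5]
rc, j, tasks = price(profit, weight, capacity, u, v)
if rc > 0:
    print(f"add column for machine {j} with tasks {tasks} (reduced cost {rc})")
```

With these duals the best column is for machine 0 with tasks 1 and 2, and its reduced cost is 2.5 > 0, so it would be added to the restricted master problem. If the returned reduced cost were nonpositive, the current LP solution would be optimal.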

The LP relaxation of the master problem solved 
by column generation may not have an integral op- 
timal solution and applying a standard branch and 
bound procedure to the master problem over the existing
columns is unlikely to find an optimal, or good,
or even feasible solution to the original problem. There- 
fore it may be necessary to generate additional columns 
in order to solve the linear programming relaxations 
of the master problem at nonroot nodes of the search 
tree. 

Standard branching on the $\lambda$-variables creates a problem along a branch where a variable has been set to zero. Recall that $y_k^j$ represents a particular solution to the $j$th knapsack problem. Thus $\lambda_k^j = 0$ means that this solution is excluded. However, it is possible (and quite likely) that the next time the knapsack problem for the $j$th machine is solved, the optimal solution is precisely the one represented by $y_k^j$. In that case, it would be necessary to find the second best solution to the knapsack problem. At depth $l$ in the branch and bound tree we may need to find the $l$th best solution, which is very hard. Fortunately, there is a simple remedy to this difficulty. Instead of branching on the $\lambda$s in the master problem, we use a branching rule that corresponds to branching on the original variables $z$. When $z_{ij} = 1$, all existing columns in the master that do not assign task $i$ to machine $j$ are deleted and task $i$ is permanently assigned to machine $j$, i.e., variable $x_i$ is fixed to 1 in the $j$th knapsack. When $z_{ij} = 0$, all existing columns in the master that assign task $i$ to machine $j$ are deleted and task $i$ cannot be assigned to machine $j$, i.e., variable $x_i$ is removed from the $j$th knapsack. Note that each of the knapsack problems contains one fewer variable after the branching has been done.

Observe that the branching scheme discussed above is specific to the GAP. This is typical of branch and price algorithms. Each problem requires its own "problem-specific" branching scheme.

In practice, one of the computational difficulties en- 
countered when applying branch and price is the so- 
called tailing-off effect of the column generation, i.e., 
the large number of iterations needed to prove the op- 
timality of the LP solution. Potentially, this may hap- 
pen at every node of the search tree. Also, the pric- 
ing problem that needs to be solved at each column 
generation iteration may be difficult and time con- 
suming. Fortunately, the branch and bound framework 
has some inherent flexibility that can be exploited ef- 
fectively in branch and price algorithms. Branch and 
bound is an enumeration scheme that is enhanced by 
fathoming based on bound comparisons. To control the 


size of the branch and bound tree it is best to work with 
strong bounds; however, the method will work with any 
bound. Therefore, instead of solving the linear program 
to optimality, i.e., generating columns as long as prof- 
itable columns exist, we can choose to prematurely end 
the column generation process and work with bounds 
on the final LP value. 

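Again, consider the alternative formulation of GAP. By dualizing the assignment constraints, we obtain the following Lagrangian relaxation, which provides an upper bound on the value of the LP for any vector $u$:

$$
\begin{aligned}
\max \ & \sum_{1 \le j \le n} \sum_{1 \le k \le K_j} \Big( \sum_{1 \le i \le m} p_{ij} y_{ik}^j \Big) \lambda_k^j + \sum_{1 \le i \le m} u_i \Big( 1 - \sum_{1 \le j \le n} \sum_{1 \le k \le K_j} y_{ik}^j \lambda_k^j \Big) \\
\text{s.t.} \ & \sum_{1 \le k \le K_j} \lambda_k^j = 1, && j = 1, \ldots, n, \\
& \lambda_k^j \in \{0, 1\}, && j = 1, \ldots, n,\ k = 1, \ldots, K_j.
\end{aligned}
$$

After some algebraic manipulations, we obtain

$$
\begin{aligned}
\sum_{1 \le i \le m} u_i + \sum_{1 \le j \le n} \max \ & \sum_{1 \le k \le K_j} \Big( \sum_{1 \le i \le m} (p_{ij} - u_i) y_{ik}^j \Big) \lambda_k^j \\
\text{s.t.} \ & \sum_{1 \le k \le K_j} \lambda_k^j = 1, \\
& \lambda_k^j \in \{0, 1\}, \quad k = 1, \ldots, K_j,
\end{aligned}
$$

which is equivalent to

$$\sum_{1 \le i \le m} u_i + \sum_{1 \le j \le n} z(KP_j).$$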


This shows that after solving the pricing problem, we have all the information necessary to compute an upper bound on the value of the final LP solution. Therefore, after every column generation iteration, we may decide to prematurely end the column generation process if two values are sufficiently close: the value of the LP solution to the current restricted master problem, which provides a lower bound on the final LP value, and this upper bound.
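The bound above needs only the duals $u$ and the knapsack values $z(KP_j)$. A minimal sketch follows. It assumes a tiny instance, so that each $z(KP_j)$ can be found by enumeration. The instance, the duals and the tolerances are hypothetical. The restricted master value of 13.0 used below is the profit of a feasible integer assignment for this instance, so it is a valid lower bound on the LP value.

```python
from itertools import product

def knapsack_value(values, weights, capacity):
    """z(KP_j) by brute-force enumeration (adequate only for tiny instances)."""
    best = 0.0                       # the empty assignment is always feasible
    for pick in product((0, 1), repeat=len(values)):
        if sum(w for w, x in zip(weights, pick) if x) <= capacity:
            best = max(best, sum(val for val, x in zip(values, pick) if x))
    return best

def lagrangian_upper_bound(profit, weight, capacity, u):
    """sum_i u_i + sum_j z(KP_j): an upper bound on the master LP value for any u."""
    m = len(u)
    bound = sum(u)
    for j, d in enumerate(capacity):
        values = [profit[i][j] - u[i] for i in range(m)]
        weights = [weight[i][j] for i in range(m)]
        bound += knapsack_value(values, weights, d)
    return bound

def can_stop_early(restricted_master_value, upper_bound, tol):
    """The restricted master LP value is a lower bound on the final LP value."""
    return upper_bound - restricted_master_value <= tol

profit = [[6, 4], [5, 5], [4, 6]]
weight = [[3, 2], [2, 2], [2, 3]]
capacity = [4, 4]
u = [3.0, 2.5, 3.0]
ub = lagrangian_upper_bound(profit, weight, capacity, u)
print(ub, can_stop_early(13.0, ub, tol=1.0))
```

Here $\sum_i u_i = 8.5$ and both knapsacks have value 3.5, giving the upper bound 15.5. With a restricted master value of 13.0 the gap is 2.5, so column generation would continue under a tolerance of 1.0 but could stop under a tolerance of 3.0.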


Identical Machines 


This is a special case of the problem with nonidentical machines, and therefore the methodology described above applies. However, we need only one subproblem since all of the machines are identical. This implies that the $\lambda_k^j$ can be aggregated by defining $\lambda_k = \sum_j \lambda_k^j$, and that the convexity constraints can be combined into a single constraint $\sum_{1 \le k \le K} \lambda_k = n$, where $\lambda_k$ is restricted to be integer. In some cases the aggregated constraint will become redundant and can be deleted altogether. An example of this is when the objective is to minimize $\sum_k \lambda_k$, i.e., the number of machines needed to process all the tasks. Note that this special case of GAP is equivalent to a 0-1 cutting-stock problem.

A much more important issue here concerns sym- 
metry, which causes branching on the original vari- 
ables to perform very poorly. With identical machines, 
there is an exponential number of solutions that differ
only by the names of the machines, i.e., by swapping
the assignments of 2 machines we get 2 solutions that 
are the same but have different values for the variables. 
This statement is true for fractional as well as 0-1 so- 
lutions. The implication is that when a fractional solu- 
tion is excluded at some node of the tree, it pops up 
again with different variable values somewhere else in 
the tree. In addition, the large number of alternate op- 
tima dispersed throughout the tree renders pruning by 
bounds nearly useless. 

The remedy here is a different branching scheme 
that works directly on the master problem but focuses 
on pairs of tasks. In particular, we consider rows of the 
master with respect to tasks r and s. Branching is done 
by dividing the solution space into one set in which r 
and s appear together, in which case they can be com- 
bined into one task when solving the knapsack, and into 
another set in which they must appear separately, in 
which case a constraint $x_r + x_s \le 1$ is added to the knapsack.
Note that the structure of the subproblems is no
longer the same on the different branches. 
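The pair-based branching rule can be sketched as follows; in the literature it is often associated with Ryan and Foster. The sketch assumes the master LP solution is given as a list of columns, each a set of tasks, together with their $\lambda$ values. It picks a pair $(r, s)$ for which the total weight of the columns covering both tasks is fractional, and it shows which existing columns survive on each branch. All names and the example data are hypothetical. The corresponding changes to the knapsack (merging $r$ and $s$, or adding $x_r + x_s \le 1$) are not shown.

```python
from itertools import combinations

def together_value(columns, lam, r, s):
    """Sum of lambda over master columns (sets of tasks) containing both r and s."""
    return sum(l for col, l in zip(columns, lam) if r in col and s in col)

def choose_branching_pair(columns, lam, eps=1e-9):
    """Pick tasks (r, s) whose 'together' value is most fractional, or None."""
    tasks = sorted(set().union(*columns))
    best, best_dist = None, None
    for r, s in combinations(tasks, 2):
        t = together_value(columns, lam, r, s)
        if eps < t < 1.0 - eps:
            dist = abs(t - 0.5)
            if best is None or dist < best_dist:
                best, best_dist = (r, s), dist
    return best

def filter_columns(columns, r, s, together):
    """Existing columns compatible with a branch: on the 'together' branch,
    r and s appear in the same column or not at all; on the 'separate'
    branch, they never appear in the same column."""
    if together:
        return [c for c in columns if (r in c) == (s in c)]
    return [c for c in columns if not (r in c and s in c)]

# Hypothetical fractional master solution: three tasks, each column at 1/2.
columns = [frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})]
lam = [0.5, 0.5, 0.5]
pair = choose_branching_pair(columns, lam)
print(pair)
print(filter_columns(columns, *pair, together=True))
print(filter_columns(columns, *pair, together=False))
```

For this fractional solution the pair (1, 2) is chosen. The "together" branch keeps only the column {1, 2}, and the "separate" branch keeps {1, 3} and {2, 3}.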

Most of the material presented above is based on 
[3], in which the term branch and price was first introduced,
and [1], in which the concepts of branch and
price are covered in much more detail. Another impor- 
tant source of information on branch and price is [4], in 
which various general branching schemes and bound- 
ing schemes are discussed. Routing and scheduling has 
been a particularly fruitful application area of branch 
and price, see [2] for a survey of these results. 
Introduction


Let $G$ be a graph (or hypergraph) with node set $V(G)$ and edge set $E(G)$. Let $T$ be a tree having $|E(G)|$ leaves in which every non-leaf node has degree 3. Let $\nu$ be a bijection (one-to-one and onto function) from the edges of $G$ to the leaves of $T$. The pair $(T, \nu)$ is called a branch decomposition of $G$. Notice that removing an edge, say $e$, of $T$ partitions the leaves of $T$ and the edges of $G$ into two subsets $A_e$ and $B_e$. The middle set of $e$ and of $(A_e, B_e)$, denoted by $\operatorname{mid}(e)$ or $\operatorname{mid}(A_e, B_e)$, is the set $V(G[A_e]) \cap V(G[B_e])$, where $G[A_e]$ is the subgraph of $G$ induced by $A_e$, and similarly for $G[B_e]$. The width of a branch decomposition $(T, \nu)$ is the maximum order of the middle sets over all edges in $T$. The branchwidth of $G$, denoted by $\beta(G)$, is the minimum width over all branch decompositions of $G$. A branch decomposition of $G$ is optimal if its width is equal to the branchwidth of $G$. For example, Fig. 1 gives an optimal branch decomposition of an example graph, where some of the middle sets of the edges of the branch decomposition are provided.

¹This research was partially supported by NSF grant DMI-0217265
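The definitions of middle set and width translate directly into code. The sketch below computes the width of a given branch decomposition $(T, \nu)$ by removing each tree edge in turn and intersecting the node sets of the graph edges on the two sides. The function and variable names are hypothetical. The sketch does not search for an optimal decomposition.

```python
from collections import defaultdict

def branch_decomposition_width(tree_edges, leaf_to_edge):
    """Width of a branch decomposition (T, nu).

    tree_edges: edges of the tree T as pairs of tree-node labels.
    leaf_to_edge: the bijection nu, mapping each leaf of T to an edge (u, v) of G.
    For each tree edge e, mid(e) is the set of graph nodes incident both to an
    edge of G on one side of e and to an edge of G on the other side.
    """
    adj = defaultdict(set)
    for s, t in tree_edges:
        adj[s].add(t)
        adj[t].add(s)
    for node, nbrs in adj.items():
        if node not in leaf_to_edge and len(nbrs) != 3:
            raise ValueError(f"non-leaf tree node {node!r} must have degree 3")

    def graph_nodes_on_side(start, other):
        # Graph nodes covered by leaves reachable from start without
        # crossing the tree edge (start, other).
        seen, stack, nodes = {start, other}, [start], set()
        while stack:
            node = stack.pop()
            if node in leaf_to_edge:
                nodes.update(leaf_to_edge[node])
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return nodes

    return max(len(graph_nodes_on_side(s, t) & graph_nodes_on_side(t, s))
               for s, t in tree_edges)

# K4 on nodes a, b, c, d with a cubic tree n1 - n2 - n3 - n4 and six leaves.
tree = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"),
        ("n1", "ab"), ("n1", "ac"), ("n2", "bc"),
        ("n3", "ad"), ("n4", "bd"), ("n4", "cd")]
nu = {leaf: tuple(leaf) for leaf in ["ab", "ac", "bc", "ad", "bd", "cd"]}
print(branch_decomposition_width(tree, nu))
```

For $K_4$ with this caterpillar tree, every leaf edge has a middle set of size 2 and each of the three internal tree edges has a middle set of size 3, so the width is 3. This matches the value $\lceil 2|V(G)|/3 \rceil = 3$ that this article gives, later in this section, for the branchwidth of a complete graph on four nodes.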

An edge e is contracted if e is deleted and the ends 
of e are identified into one node and a graph H is a mi- 
nor of a graph G if H can be obtained from a subgraph 
of G by contracting edges. Graphs of small branchwidth 
are characterized by the following theorem. 


Theorem 1 (Robertson and Seymour [49]) A graph $G$ has branchwidth

• 0 if and only if every component of $G$ has $\le 1$ edge,

• $\le 1$ if and only if every component of $G$ has $\le 1$ node of degree $\ge 2$,

• $\le 2$ if and only if $G$ has no $K_4$ minor. □


Other classes of graphs with known branchwidth are grids, complete graphs, Halin graphs, and chordal graphs. The branchwidth of an $a \times b$ grid is the minimum of $a$ and $b$, while the branchwidth of a complete graph $G$ with at least 3 nodes is $\lceil 2|V(G)|/3 \rceil$ [49]. Halin graphs have branchwidth 3. The branchwidth of chordal graphs is bounded below by $\lceil 2\omega(G)/3 \rceil$ and above by $\omega(G)$, where $\omega(G)$ denotes the clique number of the graph [30].


Graph Minors Theorem 


A planar graph is a graph that can be drawn on a sphere 
or plane without having edges that cross. A subdivi- 
sion of a graph G is a graph obtained from G by re- 
placing its edges by internally vertex disjoint paths. In 
the 1930s, Kuratowski [42] proved that a graph $G$ is planar if and only if $G$ does not contain a subdivision of $K_5$ or $K_{3,3}$. Let $\mathcal{F}$ be a class of graphs. $\mathcal{F}$ is minor closed when all the minors of any member of $\mathcal{F}$ also belong to $\mathcal{F}$. Given a minor closed class of graphs $\mathcal{F}$, the obstruction set of $\mathcal{F}$ is the set of minor minimal graphs that are not elements of $\mathcal{F}$. Clearly, any class of graphs embeddable on a given surface is a minor closed class. Erdős, also in the 1930s, posed the question of whether the obstruction set for a given surface is finite. Wagner [57] later proved that the sphere has a finite obstruction set, $K_5$ and $K_{3,3}$. The question of characterizing the obstruction set for surfaces other than the sphere remained open until 1979-1980. At that time Archdeacon [4] and Glover et al. [27] solved the case for the projective plane, proving that there are 35 minor minimal "non-projective-planar" graphs. Archdeacon and Huneke [5] proved that the obstruction set for any non-orientable surface is finite. Robertson and Seymour [48] proved the case for any surface as a corollary of the Graph Minors Theorem (formerly known as Wagner's conjecture): every minor closed class has a finite obstruction set. Branch decompositions, tangles, and tree decompositions, discussed in later sections, were beneficial to the proof of the Graph Minors Theorem.

Branchwidth and Branch Decompositions, Figure 1
Example Graph G with Optimal Branch Decomposition (T, ν) with width 3


Tangles 


Let G be a graph (or hypergraph) and let k ≥ 1 be an integer. A separation of a graph G is a pair (G1, G2) of subgraphs of G with G1 ∪ G2 = (V(G1) ∪ V(G2), E(G1) ∪ E(G2)) = G and E(G1) ∩ E(G2) = ∅; the order of this separation is defined as |V(G1) ∩ V(G2)|, where V(G1) ∩ V(G2) is called the middle set of the separation. For a hypergraph G, define I(G) to be the bipartite graph whose nodes correspond to the nodes and edges of G, where an edge ev of I(G) corresponds to the edge e of G being incident with the node v in G. A hypergraph G is called connected if I(G) is connected. Also, denote by γ(G) the largest cardinality of a set of nodes incident to an edge of G. A tangle in G of order k is a set T of separations of G, each of order < k, such that:

(T1) for every separation (A, B) of G of order < k, one of (A, B), (B, A) is an element of T;

(T2) if (A1, B1), (A2, B2), (A3, B3) ∈ T then A1 ∪ A2 ∪ A3 ≠ G; and

(T3) if (A, B) ∈ T then V(A) ≠ V(G).


These are called the first, second and third tangle axioms. The tangle number of G, denoted by θ(G), is the maximum order of any tangle of G. Figure 2 gives an example of a tangle of order 3 for the graph in Fig. 1. Notice in Fig. 2 that adding separations of order 3 to the tangle would violate one of the tangle axioms. A tangle T of G with order k can be thought of as a "k-connected" component of G, because some "k-connected" component of G will lie on one side or the other of any separation of T.

Branchwidth and Branch Decompositions, Figure 2
Tangle of Order 3 for the Example Graph of Fig. 1
Separation of order 0: (∅, G)
Separation of order 1: (v, G) ∀v ∈ V(G)
Separations of order 2: ({v, w}, G) ∀v, w ∈ V(G); (G[e], G[E(G) \ e]) ∀e ∈ E(G); (G[0, 2, 4, 6, 8], G[1, 3, 5, 7, 9])

Robertson and Seymour [49] proved a min-max relationship between tangles and branch decompositions, given below.

Theorem 2 (Robertson and Seymour [49]) For any hypergraph G such that E(G) ≠ ∅, max{β(G), γ(G)} = θ(G). □


Related structures to tangles are respectful tangles and tangle bases. Respectful tangles of a graph G embedded on a surface Σ are tangles that are restricted according to the graph's embedding on Σ, and the order of these tangles is limited by the graph's representativeness on Σ. Respectful tangles were discussed in the work of Robertson and Seymour [50] and laid the foundation for the Seymour and Thomas [53] result for planar graphs. Tangle bases were introduced by Hicks [33] to assist a branch-decomposition-based algorithm, discussed in a later section, that computes optimal branch decompositions for general graphs. Tangle bases are also restricted in the sense that the only members of a tangle basis are edges (considering just the first part of a separation) and separations that can be constructed from unions of edges. A formal definition is given below.

For an integer k and hypergraph G, a tangle basis B of order k is a set of separations of G with order < k such that:

(B1) (G[e], G[E(G) \ e]) ∈ B ∀e ∈ E(G) if γ(e) < k;

(B2) if (C, D) ∈ B and ∄e ∈ E(G) such that G[e] = C, then ∃(A1, B1), (A2, B2) ∈ B such that A1 ∪ A2 = C and B1 ∩ B2 = D; and

(B3) B obeys the tangle axioms T2 and T3.


Branchwidth and Branch Decompositions, Figure 3
Connected Tangle Basis of Order 3 for the Graph of Fig. 1
Separations of order 2: (G[e], G[E(G) \ e]) ∀e ∈ E(G)


A tangle basis B in G of order k is connected if every separation (A, B) of B has A connected; the connected tangle basis number of G, denoted by θ'(G), is the maximum order of any connected tangle basis of G. An example of a connected tangle basis for the graph in Fig. 1 is given in Fig. 3. Notice that the connected tangle basis of Fig. 3 has fewer separations than the tangle of Fig. 2, but it still contains the essential members of the tangle. Below is a min-max theorem relating tangle bases and branchwidth.

Theorem 3 (Hicks [33]) If hypergraph G is connected and β(G) ≥ γ(G), then the connected tangle basis number θ'(G) is equal to β(G). □


Constructing Branch Decompositions 


In terms of finding branch decompositions for general graphs, there is an algorithm in Robertson and Seymour [51] that approximates the branchwidth of a graph within a factor of 3. For example, the algorithm decides if a graph has branchwidth at least 10 or finds a branch decomposition with width at most 30. This algorithm has not been used in a practical implementation, and its improvements by Bodlaender [8], Bodlaender and Kloks [13], and Reed [46] have not been shown to be practical either. Bodlaender and Thilikos [16] presented a tree-decomposition-based linear time algorithm for finding an optimal branch decomposition, but it appears to be impractical. Tree-decomposition-based algorithms are discussed in a later section. In addition, Bodlaender and Thilikos [17] gave an algorithm to compute an optimal branch decomposition of any chordal graph with maximum clique size at most 4, but the algorithm has only been shown to be practical for a particular type of 3-tree.

Among practical algorithms, Kloks et al. [39] gave a polynomial time algorithm to compute the branchwidth of interval graphs, but for general graphs one has to rely on heuristics. Cook and Seymour [20,21] gave a promising heuristic algorithm for producing branch decompositions. Hicks [30,31] found another branchwidth heuristic that is comparable to the algorithm of Cook and Seymour. Tamaki [54] has presented a linear time heuristic for constructing branch decompositions of planar graphs; this algorithm performs well when compared to the heuristics of Cook and Seymour [21] and Hicks [31]. More recently, Hicks [33] developed a branch-decomposition-based algorithm for constructing optimal branch decompositions, and it seems to be practical for sparse graphs with branchwidth at most 8.

For planar graphs, Seymour and Thomas [53] showed that the branchwidth and an optimal branch decomposition of a graph can be computed in polynomial time: computing the branchwidth takes O(n²) time, and computing an optimal branch decomposition takes O(n⁴) time. Hicks [34,35] gave a practical implementation of these algorithms. Gu and Tamaki [28] later introduced an O(n³) algorithm to compute an optimal branch decomposition of a planar graph by restricting the number of calls to the Seymour and Thomas algorithm for computing branchwidth to O(n). More work in this area is encouraged to decrease the bound further.


Branch-Decomposition-Based Algorithms 


Branch decompositions are of algorithmic importance because they can be used to solve intractable problems on graphs with bounded branchwidth. Courcelle [22] and Arnborg et al. [6] showed that several NP-complete problems can be solved in polynomial time using dynamic programming techniques on input graphs with bounded treewidth, discussed in a later section. Similar results have been obtained by Borie et al. [18]. These results also hold for graphs with bounded branchwidth, since the branchwidth and treewidth of a graph bound each other by constants [49]. In contrast, Seymour and Thomas [53] proved that testing whether a general graph has branchwidth at most k is NP-complete. The use of dynamic programming techniques in conjunction with a branch decomposition or a tree decomposition is referred to as a branch-decomposition-based or a tree-decomposition-based algorithm, and these types of algorithms are part of the class of algorithms called fixed parameter tractable algorithms [1].

Some examples of branch-decomposition-based algorithms proposed in theory are those of Fomin and Thilikos [24] and Alekhnovich and Razborov [2]. Fomin and Thilikos used their improvement of a bound of Alon et al. [3] on the branchwidth of planar graphs to design a theoretical branch-decomposition-based algorithm for vertex cover and dominating set on planar graphs [24]. Alekhnovich and Razborov [2] used the branchwidth of hypergraphs to design a theoretical branch-decomposition-based algorithm for satisfiability problems.

Although theory indicates the fruitful potential of branch-decomposition-based algorithms, the number of such algorithms in the literature is small. One notable exception is the work of Cook and Seymour [21], who produced the best known solutions for the 12 unsolved problems in TSPLIB95, a library of standard test instances for the TSP [47]. Hicks also presented practical branch-decomposition-based algorithms for general minor containment [32] and for constructing optimal branch decompositions [33]. One is also referred to the work of Christian [19].


Branchwidth of Matroids 


Since graph theory and matroid theory have a symbi- 
otic relationship, it is only natural that branch decom- 
positions can be extended to matroids. In fact, branch 
decompositions have been used to produce a matroid 
analogue of the graph minors theorem [26]. A formal 
definition for the branchwidth of a matroid is given be- 
low. 

The reader is referred to the book by Oxley [43] if not familiar with matroid theory. Let M be a matroid with finite ground set S(M) and rank function ρ. The rank function of M*, the dual of M, is denoted ρ*.

A separation (A, B) of a matroid M is a pair of complementary subsets of S(M), and the order of the separation, denoted λ(M, A, B), is defined as follows:

$$
\lambda(M, A, B) =
\begin{cases}
\rho(A) + \rho(B) - \rho(M) + 1 & \text{if } A \neq \emptyset \neq B,\\
0 & \text{otherwise.}
\end{cases}
$$

Branchwidth and Branch Decompositions, Figure 4
Example Graph G from Fig. 1 with Optimal Branch Decomposition (T, μ) with width 3 for its Cycle Matroid M(G)


A branch decomposition of a matroid M is a pair (T, μ), where T is a tree having |S(M)| leaves in which every non-leaf node has degree 3, and μ is a bijection from the ground set of M to the leaves of T. Notice that removing an edge e of T partitions the leaves of T, and hence the ground set of M, into two subsets Aₑ and Bₑ. The order of e and of (Aₑ, Bₑ), denoted order(e) or order(Aₑ, Bₑ), is equal to λ(M, Aₑ, Bₑ). The width of a branch decomposition (T, μ) is the maximum order of all edges in T. The branchwidth of M, denoted by β(M), is the minimum width over all branch decompositions of M. A branch decomposition of M is optimal if its width is equal to the branchwidth of M. The cycle matroid of graph G, denoted M(G), has E(G) as its ground set and the cycles of G as the circuits of M(G). For example, Fig. 4 gives an optimal branch decomposition of the cycle matroid of the example graph given in Fig. 1, where all of the orders for the edges of the branch decomposition are provided.
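For a cycle matroid M(G), the rank ρ(A) of an edge set A is the number of edges in a spanning forest of the subgraph formed by A, so the order λ(M, Aₑ, Bₑ) and the width of (T, μ) can be computed directly. The Python sketch below is a minimal illustration of the definitions, not an algorithm from the cited literature: it computes ranks with union-find, evaluates λ for every edge of T, and returns the maximum. The function names and the input encoding (edges of G as vertex pairs, μ as a dictionary from edge index to leaf of T) are hypothetical, and the code does not check that T is a cubic tree or that μ is a bijection.

```python
from collections import defaultdict


def cycle_matroid_rank(edge_set):
    """Rank of a set of graph edges in the cycle matroid M(G): the number of
    edges in a spanning forest of those edges, counted with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    rank = 0
    for u, v in edge_set:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it is independent
            parent[ru] = rv
            rank += 1
    return rank


def separation_order(A, B):
    """lambda(M, A, B) = rho(A) + rho(B) - rho(M) + 1 if A and B are nonempty, else 0."""
    if not A or not B:
        return 0
    return cycle_matroid_rank(A) + cycle_matroid_rank(B) - cycle_matroid_rank(A + B) + 1


def branch_decomposition_width(graph_edges, tree_edges, leaf_of):
    """Width of a branch decomposition (T, mu) of the cycle matroid M(G).

    graph_edges: list of (u, v) pairs; index i is the ground-set element e_i
    tree_edges:  list of (s, t) pairs forming the tree T
    leaf_of:     dict mapping index i to the leaf mu(e_i) of T
    """
    adj = defaultdict(list)
    for s, t in tree_edges:
        adj[s].append(t)
        adj[t].append(s)
    width = 0
    for s, t in tree_edges:
        # Tree nodes reachable from s once the tree edge (s, t) is deleted.
        side, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in side and (x, y) != (s, t):
                    side.add(y)
                    stack.append(y)
        A = [e for i, e in enumerate(graph_edges) if leaf_of[i] in side]
        B = [e for i, e in enumerate(graph_edges) if leaf_of[i] not in side]
        width = max(width, separation_order(A, B))
    return width


# A 4-cycle whose edges hang in pairs off the two internal nodes p and q.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
tree = [("p", "q"), ("p", "l0"), ("p", "l1"), ("q", "l2"), ("q", "l3")]
mu = {0: "l0", 1: "l1", 2: "l2", 3: "l3"}
print(branch_decomposition_width(cycle, tree, mu))  # 2
```

Every tree edge of this decomposition has order 2; for example, the middle edge (p, q) separates two paths of rank 2 each inside a matroid of rank 3, giving 2 + 2 − 3 + 1 = 2.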

There is also a corresponding notion of a tangle and tangle number for matroids, provided by Dharmatilake [23]. In addition, Dharmatilake gave a min-max relationship between tangles of matroids and the branchwidth of matroids, given below.

Theorem 4 (Dharmatilake [23]) Let M be a matroid. Then β(M) = θ(M) if and only if M has no coloop and β(M) ≠ 1. □


Robertson and Seymour [49] conjectured that the branchwidth of a graph and the branchwidth of the graph's cycle matroid are equal if the graph has a cycle of length at least 2. This conjecture was recently proved by Hicks and McMurray [37]. One is also referred to the work of Geelen et al. [26], Geelen et al. [25], Hall et al. [29], and Hliněný [38] for more detailed discussions of the branchwidth of matroids.


Treewidth and Tree Decompositions 


This text would be remiss if a definition for treewidth 
and tree decompositions were not given. 

A tree decomposition of a graph (or hypergraph) G is a pair (T, σ), where T is a tree and, for t ∈ V(T), σ(t) is a subset of V(G), with the following properties:
• ⋃_{t ∈ V(T)} σ(t) = V(G);
• ∀e ∈ E(G), ∃t ∈ V(T) such that the ends of e are contained in σ(t);
• for t, t', t'' ∈ V(T), if t' is on the path of T between t and t'', then σ(t) ∩ σ(t'') ⊆ σ(t').

Branchwidth and Branch Decompositions, Figure 5
Optimal Tree Decomposition (T, σ) of Example Graph in Fig. 1 with width 3
The width of a tree decomposition is the largest value of |σ(t)| − 1 over all nodes t ∈ V(T). The treewidth of a graph G, denoted by τ(G), is the minimum width over all tree decompositions of G. A tree decomposition of G is called optimal if its width is equal to the treewidth of G. For example, Fig. 5 gives an optimal tree decomposition of the example graph in Fig. 1. If T is restricted to be a path, then (T, σ) is called a path decomposition, and its corresponding connectivity invariant for a graph G is called the pathwidth of G.
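The three defining properties and the width can be checked mechanically. The Python sketch below is a minimal verifier, not a treewidth algorithm: it assumes (hypothetically) that G is given as lists of vertices and edges, and that (T, σ) is given as a list of tree edges together with a dictionary mapping each tree node t to the set σ(t). It tests the third property in the equivalent form that, for every vertex v, the tree nodes whose sets contain v form a connected subtree of T; it does not itself confirm that T is a tree.

```python
from collections import defaultdict, deque


def tree_decomposition_width(vertices, edges, tree_edges, bags):
    """Return the width of (T, sigma) if it is a tree decomposition of G.

    vertices:   iterable of the vertices V(G)
    edges:      iterable of (u, v) pairs, the edges E(G)
    tree_edges: list of (s, t) pairs forming the tree T (assumed to be a tree)
    bags:       dict mapping each node t of T to the set sigma(t)
    Raises ValueError if one of the three defining properties fails.
    """
    # Property 1: the union of all sigma(t) is V(G).
    covered = set().union(*bags.values())
    if covered != set(vertices):
        raise ValueError("the sets sigma(t) do not cover V(G)")

    # Property 2: both ends of every edge lie together in some sigma(t).
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            raise ValueError(f"edge {(u, v)} is not contained in any sigma(t)")

    # Property 3, equivalent form: for each vertex v, the nodes t with
    # v in sigma(t) induce a connected subtree of T (checked by BFS).
    adj = defaultdict(list)
    for s, t in tree_edges:
        adj[s].append(t)
        adj[t].append(s)
    for v in covered:
        nodes = {t for t, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen, queue = {start}, deque([start])
        while queue:
            s = queue.popleft()
            for t in adj[s]:
                if t in nodes and t not in seen:
                    seen.add(t)
                    queue.append(t)
        if seen != nodes:
            raise ValueError(f"the nodes whose sets contain {v} are not connected in T")

    # Width: largest |sigma(t)| minus one.
    return max(len(bag) for bag in bags.values()) - 1


# A 4-cycle 0-1-2-3-0 split into two triangles along the chord {0, 2}.
print(tree_decomposition_width(
    [0, 1, 2, 3],
    [(0, 1), (1, 2), (2, 3), (3, 0)],
    [("a", "b")],
    {"a": {0, 1, 2}, "b": {0, 2, 3}},
))  # 2
```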
The relationship between branchwidth and treewidth is characterized in the following theorem.


Theorem 5 (Robertson and Seymour [49]) For any hypergraph G, max(β(G), γ(G)) ≤ τ(G) + 1 ≤ max(⌊3β(G)/2⌋, γ(G), 1). □


Tree decompositions and the associated connectivity invariant, treewidth, have been extensively researched by Thomas [56], Seymour and Thomas [52], Bodlaender [8,10], Bodlaender and Kloks [12,13], Bodlaender et al. [11], Bodlaender et al. [15], Bodlaender et al. [14], Ramachandramurthi [44], Reed [45,46] and many others (see the survey papers by Bodlaender [7,9]). One is also referred to the work of Koster et al. [40], Koster et al. [41], Telle and Proskurowski [55], and Alber and Niedermeier [1] for literature related to tree decompositions and tree-decomposition-based algorithms. In addition, one is referred to Hicks et al. [36] for a more thorough survey of branch and tree decomposition techniques related to optimization.


References 


1. Alber J, Niedermeier R (2002) Improved tree decomposition based algorithms for domination-like problems. In: Proceedings of the 5th Latin American Theoretical Informatics (LATIN 2002). Lecture Notes in Computer Science, vol 2286. Springer, Heidelberg, pp 613-627

2. Alekhnovich M, Razborov A (2002) Satisfiability, branch-width and Tseitin tautologies. In: 43rd Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society, pp 593-603

3. Alon N, Seymour PD, Thomas R (1994) Planar separators. SIAM J Discret Math 7:184-193

4. Archdeacon D (1980) A Kuratowski Theorem for the Projective Plane. PhD thesis, Ohio State University

5. Archdeacon D, Huneke P (1989) A Kuratowski theorem for non-orientable surfaces. J Combin Theory Ser B 46(2):173-231

6. Arnborg S, Lagergren J, Seese D (1991) Easy problems for tree-decomposable graphs. J Algorithms 12:308-340

7. Bodlaender H (1993) A tourist guide through treewidth. Acta Cybernetica 11:1-21

8. Bodlaender H (1996) A linear time algorithm for finding tree-decompositions of small treewidth. SIAM J Comput 25:1305-1317

9. Bodlaender H (1997) Treewidth: Algorithmic techniques and results. In: Privara I, Ružička P (eds) Proceedings of the 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS'97. Lecture Notes in Computer Science, vol 1295. Springer, Berlin, pp 29-36

10. Bodlaender H (1998) A partial k-arboretum of graphs with bounded treewidth. Theoret Comput Sci 209:1-45

11. Bodlaender H, Gilbert J, Hafsteinsson H, Kloks T (1992) Approximating treewidth, pathwidth, and minimum elimination tree height. In: Schmidt G, Berghammer R (eds) Proceedings 17th International Workshop on Graph-Theoretic Concepts in Computer Science WG 1991. Lecture Notes in Computer Science, vol 570. Springer, Berlin, pp 1-12

12. Bodlaender H, Kloks T (1992) Approximating treewidth and pathwidth of some classes of perfect graphs. In: Proceedings Third International Symposium on Algorithms and Computation, ISAAC 1992. Lecture Notes in Computer Science, vol 650. Springer, Berlin, pp 116-125

13. Bodlaender H, Kloks T (1996) Efficient and constructive algorithms for the pathwidth and treewidth of graphs. J Algorithms 21:358-402

14. Bodlaender H, Kloks T, Kratsch D, Müller H (1998) Treewidth and minimum fill-in on d-trapezoid graphs. J Graph Algorithms Appl 2(5):1-23

15. Bodlaender H, Tan R, Thilikos D, van Leeuwen J (1997) On interval routing schemes and treewidth. Inf Comput 139:92-109


16. Bodlaender H, Thilikos D (1997) Constructive linear time algorithms for branchwidth. In: Degano P, Gorrieri R, Marchetti-Spaccamela A (eds) Lecture Notes in Computer Science: Proceedings of the 24th International Colloquium on Automata, Languages, and Programming. Springer, Berlin, pp 627-637

17. Bodlaender H, Thilikos D (1999) Graphs with branchwidth at most three. J Algorithms 32:167-194

18. Borie RB, Parker RG, Tovey CA (1992) Automatic generation of linear-time algorithms from predicate calculus descriptions of problems on recursively constructed graph families. Algorithmica 7:555-581

19. Christian WA (2003) Linear-Time Algorithms for Graphs with Bounded Branchwidth. PhD thesis, Rice University

20. Cook W, Seymour PD (1994) An algorithm for the ring-router problem. Technical report, Bellcore

21. Cook W, Seymour PD (2003) Tour merging via branch-decomposition. INFORMS J Comput 15(3):233-248

22. Courcelle B (1990) The monadic second-order logic of graphs I: Recognizable sets of finite graphs. Inf Comput 85:12-75

23. Dharmatilake JS (1996) A min-max theorem using matroid separations. Contemp Math 197:333-342

24. Fomin F, Thilikos D (2003) Dominating sets in planar graphs: Branch-width and exponential speed-up. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, MD 2003). ACM, New York, pp 168-177

25. Geelen JF, Gerards AMH, Robertson N, Whittle GP (2003) On the excluded minors for the matroids of branch-width k. J Combin Theory Ser B 88:261-265

26. Geelen JF, Gerards AMH, Whittle G (2002) Branch width and well-quasi-ordering in matroids and graphs. J Combin Theory Ser B 84:270-290

27. Glover H, Huneke P, Wang CS (1979) 103 graphs that are irreducible for the projective plane. J Combin Theory Ser B 27:332-370

28. Gu QP, Tamaki H (2005) Optimal branch-decomposition of planar graphs in O(n³) time. In: Proceedings of the 31st International Colloquium on Automata, Languages and Programming. LNCS, vol 3580, pp 373-384

29. Hall R, Oxley J, Semple C, Whittle G (2002) On matroids of branch-width three. J Combin Theory Ser B 86:148-171

30. Hicks IV (2000) Branch Decompositions and their Applications. PhD thesis, Rice University

31. Hicks IV (2002) Branchwidth heuristics. Congressus Numerantium 159:31-50

32. Hicks IV (2004) Branch decompositions and minor containment. Networks 43(1):1-9

33. Hicks IV (2005) Graphs, branchwidth, and tangles! Oh my! Networks 45:55-60

34. Hicks IV (2005) Planar branch decompositions I: The ratcatcher. INFORMS J Comput 17(4):402-412

35. Hicks IV (2005) Planar branch decompositions II: The cycle method. INFORMS J Comput 17(4):413-421


36. Hicks IV, Koster AMCA, Kolotoglu E (2005) Branch and tree decomposition techniques for discrete optimization. In: Cole Smith J (ed) Tutorials in Operations Research 2005. INFORMS, Hanover, MD, pp 1-29

37. Hicks IV, McMurray N (2007) The branchwidth of graphs and their cycle matroids. J Combin Theory Ser B 97:681-692

38. Hliněný P (2002) On the excluded minors for matroids of branch-width three. Preprint

39. Kloks T, Kratochvíl J, Müller H (1999) New branchwidth territories. In: Meinel C, Tison S (eds) STACS'99, 16th Annual Symposium on Theoretical Aspects of Computer Science, Trier, Germany, March 1999 Proceedings. Springer, Berlin, pp 173-183

40. Koster A, van Hoesel S, Kolen A (2002) Solving partial constraint satisfaction problems with tree-decompositions. Networks 40:170-180

41. Koster AMCA, Bodlaender HL, van Hoesel SPM (2001) Treewidth: Computational experiments. Electr Notes Discret Math 8:54-57

42. Kuratowski K (1930) Sur le problème des courbes gauches en topologie. Fundamenta Mathematicae 15:271-283

43. Oxley JG (1992) Matroid Theory. Oxford University Press, Oxford

44. Ramachandramurthi S (1997) The structure and number of obstructions to treewidth. SIAM J Discret Math 10:146-157

45. Reed B (1992) Finding approximate separators and computing tree width quickly. In: Proceedings of the 24th Annual Association for Computing Machinery Symposium on Theory of Computing. ACM Press, New York, pp 221-228

46. Reed B (1997) Tree width and tangles: A new connectivity measure and some applications. In: Bailey RA (ed) Surveys in Combinatorics. Cambridge University Press, Cambridge, pp 87-162

47. Reinelt G (1991) TSPLIB - a traveling salesman library. ORSA J Comput 3:376-384

48. Robertson N, Seymour PD (1985) Graph minors: A survey. In: Surveys in Combinatorics, London Math Society Lecture Note Series, vol 103. Cambridge University Press, Cambridge, pp 153-171

49. Robertson N, Seymour PD (1991) Graph minors X: Obstructions to tree-decompositions. J Combin Theory Ser B 52:153-190

50. Robertson N, Seymour PD (1994) Graph minors XI: Circuits on a surface. J Combin Theory Ser B 60:72-106

51. Robertson N, Seymour PD (1995) Graph minors XIII: The disjoint paths problem. J Combin Theory Ser B 63:65-110

52. Seymour P, Thomas R (1993) Graph searching and a min-max theorem for tree-width. J Combin Theory Ser B 58:22-33

53. Seymour PD, Thomas R (1994) Call routing and the ratcatcher. Combinatorica 14(2):217-241

54. Tamaki H (2003) A linear time heuristic for the branch-decomposition of planar graphs. Technical Report MPI-I-2003-1-010, Max-Planck-Institut fuer Informatik


55. Telle JA, Proskurowski A (1997) Algorithms for vertex partitioning problems on partial k-trees. SIAM J Discret Math 10(4):529-550

56. Thomas R (1990) A Menger-like property of tree-width: The finite case. J Combin Theory Ser B 48:67-76

57. Wagner K (1937) Über eine Eigenschaft der ebenen Komplexe. Math Annal 115:570-590


Broadcast Scheduling Problem 


CLAYTON W. COMMANDER 

Air Force Research Laboratory, Munitions Directorate, 
and Dept. of Industrial and Systems Engineering, 
University of Florida, Gainesville, USA 


Article Outline 


Synonyms 

Introduction 
Organization 
Idiosyncrasies 


Formulation 


Methods 
Sequential Vertex Coloring 
Mixed Neural-Genetic Algorithm 
Greedy Randomized Adaptive Search Procedures (GRASP) 
Multi-start Combinatorial Algorithm 
Computational Effectiveness 


Conclusion 
See also 
References 


Synonyms 


BSP; The BROADCAST SCHEDULING PROBLEM is also 
referred to as the TDMA MESSAGE SCHEDULING 
PROBLEM [6] 


Introduction 


Wireless mesh networks (WMNs) have become an im- 
portant means of communication in recent years. In 
these networks, a shared radio channel is used in con- 
junction with a packet switching protocol to provide 
high-speed communication between many potentially 
mobile users. The stations in the network act as trans- 
mitters and receivers, and are thus capable of utilizing 
a multi-hop transmission procedure. The advantage of 
this is that several stations can be used as relays to for- 
ward messages to the intended recipient. This allows 


beyond-line-of-sight communication between stations which are geographically dispersed and potentially mobile [2].

Mesh networks have grown in popularity in recent years, and the number of applications is steadily increasing [25]. As mentioned in [1], WMNs allow users to integrate various networks, such as Wi-Fi, the internet and cellular systems. WMNs can also be utilized in a military setting, in which tactical datalinks network various communication, intelligence, and weapon systems, allowing for streamlined communication between several different entities [6]. For a survey of wireless mesh networks, the reader is referred to [1].

In WMNs, the critical problem is to utilize the available bandwidth efficiently while providing collision-free message transmissions. Unfettered transmission by the network stations over the shared channel will lead to message collisions. Therefore, some medium access control (MAC) scheme should be employed to schedule message transmissions so as to avoid message collisions. The time division multiple access (TDMA) protocol is a MAC scheme introduced by Kleinrock in 1987, which was shown to provide collision-free broadcast schedules [19]. In a TDMA network, time is divided into frames, with each frame consisting of a number of unit-length slots in which the messages are scheduled. Stations scheduled in the same slot broadcast simultaneously. Thus, the goal is to schedule as many stations as possible in the same slot, so long as there are no message collisions.

When considering the broadcast scheduling problem on TDMA networks, there are two optimization problems which must be addressed [31]. The first involves finding the minimum frame length, that is, the number of slots required to schedule all stations at least once. The second problem is that of maximizing the number of stations scheduled within each slot, thus maximizing the throughput. Both of these problems, however, are known to be NP-hard [2]. Therefore, efficient heuristics are typically used to quickly provide high-quality solutions to real-world instances.


Organization 


The organization of this article is as follows. In the following section, we formally define the problem statement and provide a mathematical programming formulation. We also examine the computational complexity of the problem. In Sect. “Methods”, we review several solution techniques which appear in the literature. We provide some concluding remarks in Sect. “Conclusion” and indicate directions of future research. Finally, a list of cross references is provided in Sect. “See also”.


Idiosyncrasies 


We will now briefly introduce some of the symbols and notations we will employ throughout this article. Denote a graph G = (V, E) as a pair consisting of a set of vertices V and a set of edges E. All graphs in this article are assumed to be undirected and unweighted. We use the symbol “b := a” to mean “the expression a defines the (new) symbol b” in the sense of King [18]. Of course, this could be conveniently extended so that a statement like “(1 − ε)/2 := τ” means “define the symbol ε so that (1 − ε)/2 = τ holds”. Finally, we will use italics for emphasis and SMALL CAPS for problem names. Any other locally used terms and symbols will be defined in the sections in which they appear.


Formulation 


A TDMA network can be conveniently described as a graph G = (V, E), where the vertex set V represents the stations and the set of edges E represents the set of communication links between adjacent stations. There are two types of message collisions which must be avoided when scheduling messages in TDMA networks. The first, called a direct collision, occurs between one-hop neighboring stations, that is, stations i, j ∈ V such that (i, j) ∈ E. One-hop neighbors which broadcast during the same slot cause a direct collision. Further, if (i, j) ∉ E, but (i, k) ∈ E and (j, k) ∈ E, then i and j are called two-hop neighbors. Two-hop neighbors transmitting in the same slot cause a so-called hidden collision [2].

Assume that there are M slots per frame. Further, assume that packets are sent at the beginning of each time slot and are received in the same slot in which they are sent. Let x: {1, …, M} × V ↦ {0, 1} be a surjection defined by

$$
x_{mn} =
\begin{cases}
1, & \text{if station } n \text{ is scheduled in slot } m,\\
0, & \text{otherwise.}
\end{cases}
\qquad (1)
$$

Also, let c: V × V ↦ {0, 1} return c_ij = 1 if i and j are one-hop neighbors, i.e., if (i, j) ∈ E and i ≠ j, and c_ij = 0 otherwise.

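Using the aforementioned definitions and assumptions, we can now formulate the BROADCAST SCHEDULING PROBLEM (BSP) on TDMA networks as the following multiobjective optimization problem:

$$
\begin{aligned}
\text{Minimize}\quad & M\\
\text{Maximize}\quad & \sum_{i=1}^{M}\sum_{j=1}^{|V|} x_{ij}\\
\text{subject to:}\quad & \sum_{m=1}^{M} x_{mn} \ge 1, && \forall\, n \in V, && (2)\\
& c_{ij} + x_{mi} + x_{mj} \le 2, && \forall\, i, j \in V,\ i \ne j,\ m = 1, \dots, M, && (3)\\
& c_{ik}\,x_{mi} + c_{kj}\,x_{mj} \le 1, && \forall\, i, j, k \in V,\ i \ne j,\ j \ne k,\ k \ne i,\ m = 1, \dots, M, && (4)\\
& x_{mn} \in \{0, 1\}, && \forall\, n \in V,\ m = 1, \dots, M, && (5)\\
& M \in \mathbb{Z}. && && (6)
\end{aligned}
$$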


The objective provides a minimum frame length 
with maximum bandwidth utilization, while con- 
straint (2) ensures that all stations broadcast at least 
once. Constraints (3) and (4) prevent direct and hidden 
collisions, respectively. Constraints (5) and (6) define 
the proper domain of the decision variables. 
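Constraints (2)–(4) can be checked directly for a given schedule. The Python sketch below is an illustration under assumed encodings, not code from the cited works: stations and links are plain lists, and the 0/1 variables x_mn are stored as a list of sets, one per slot (indexed from 0), so that slots[m] holds the stations n with x_mn = 1. Under this encoding the frame length M is len(slots) and the second objective is the total size of the sets. The function name check_schedule is hypothetical, and station identifiers are assumed to be sortable.

```python
from itertools import combinations


def check_schedule(stations, links, slots):
    """Check a TDMA frame against constraints (2)-(4) of the BSP.

    stations: iterable of station ids (the vertex set V)
    links:    iterable of (i, j) pairs (the undirected edge set E)
    slots:    list of sets; slots[m] holds every station n with x_mn = 1
    Returns (True, "feasible") or (False, reason).
    """
    nbrs = {n: set() for n in stations}
    for i, j in links:
        nbrs[i].add(j)
        nbrs[j].add(i)

    # Constraint (2): every station broadcasts at least once in the frame.
    scheduled = set().union(*slots)
    missing = set(nbrs) - scheduled
    if missing:
        return False, f"unscheduled stations: {sorted(missing)}"

    for m, slot in enumerate(slots):
        for i, j in combinations(sorted(slot), 2):
            # Constraint (3): one-hop neighbors may not share a slot.
            if j in nbrs[i]:
                return False, f"direct collision between {i} and {j} in slot {m}"
            # Constraint (4): stations with a common neighbor k may not share a slot.
            if nbrs[i] & nbrs[j]:
                return False, f"hidden collision between {i} and {j} in slot {m}"
    return True, "feasible"


# Station 1 relays between stations 0 and 2, so 0 and 2 are two-hop neighbors.
print(check_schedule([0, 1, 2], [(0, 1), (1, 2)], [{0, 2}, {1}]))
# (False, 'hidden collision between 0 and 2 in slot 0')
```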

Suppose that we relax the BSP and consider only the first objective function. This is referred to as the FRAME LENGTH MINIMIZATION PROBLEM (FLMP) and is given by the integer program min{M : (2)–(6)}. Clearly any feasible solution to this problem is feasible for the BSP. Now, consider a graph G' = (V, E'), where V follows from the original communication graph G, but whose edge set is given by E' = E ∪ {(i, j) : i, j are two-hop neighbors}. Using this augmented graph, we can state the following theorem due to Butenko et al. [2].


Theorem 1 The FRAME LENGTH MINIMIZATION PROBLEM on G = (V, E) is equivalent to finding an optimal coloring of the vertices of G' = (V, E').

Proof Recall that in order for a message schedule to be feasible, all stations must broadcast at least once and no collisions, either hidden or direct, may occur. Notice now that E' contains both one-hop and two-hop neighbors, and in any feasible solution, neither of these can transmit in the same slot. Thus, there is a one-to-one correspondence between time slots in G and vertex colors in G'. Hence, a minimum coloring of the vertices of G' provides the minimum number of slots needed for a collision-free broadcast schedule on G. □
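Theorem 1 suggests a direct, if exponential, way to solve very small FLMP instances: build G' by joining every pair of stations that share a neighbor, then search for the smallest number of colors that properly colors G'. The Python sketch below does this with plain backtracking. It only illustrates the equivalence and is not a practical method, since the problem is NP-hard; the function names and the list-based input encoding are hypothetical.

```python
from itertools import combinations


def augmented_graph(stations, links):
    """Adjacency sets of G' = (V, E'), where E' = E plus all two-hop pairs."""
    nbrs = {n: set() for n in stations}
    for i, j in links:
        nbrs[i].add(j)
        nbrs[j].add(i)
    aug = {n: set(nbrs[n]) for n in nbrs}
    for k in nbrs:
        # Any two neighbors of k are one-hop or two-hop neighbors of each other.
        for i, j in combinations(nbrs[k], 2):
            aug[i].add(j)
            aug[j].add(i)
    return aug


def _try_color(order, aug, M, color, idx=0):
    """Backtracking: extend `color` to order[idx:] using colors 0..M-1."""
    if idx == len(order):
        return True
    n = order[idx]
    for c in range(M):
        if all(color.get(u) != c for u in aug[n]):
            color[n] = c
            if _try_color(order, aug, M, color, idx + 1):
                return True
            del color[n]
    return False


def min_frame_length(stations, links):
    """Smallest M such that G' has a proper M-coloring, with one optimal frame.
    Exponential time: suitable only for very small networks."""
    aug = augmented_graph(stations, links)
    order = sorted(aug, key=lambda n: len(aug[n]), reverse=True)
    for M in range(1, len(order) + 1):
        color = {}
        if _try_color(order, aug, M, color):
            slots = [set() for _ in range(M)]
            for n, c in color.items():
                slots[c].add(n)  # color c becomes time slot c
            return M, slots
    return 0, []


# Path 0-1-2-3: G' contains the triangle {0, 1, 2}, so at least 3 slots are needed.
print(min_frame_length([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]))
# (3, [{1}, {2}, {0, 3}])
```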


After one has solved the FLMP by solving the corresponding GRAPH COLORING PROBLEM, an optimal frame length M* is attained. With this, the THROUGHPUT MAXIMIZATION PROBLEM (TMP), given by

$$
\max\Big\{ \sum_{i=1}^{M}\sum_{j=1}^{|V|} x_{ij} : (2)\text{–}(6) \Big\},
$$

can be solved, where M is replaced by M* in (2)–(6). A direct result of Theorem 1 is that finding an optimal frame length for a general instance of the BSP is NP-hard [11]. The reader is referred to the paper by Butenko et al. [2] for the complete proof. The TMP was also shown to be NP-hard in [8]. Thus it is unlikely that a polynomial algorithm exists for finding an optimal broadcast schedule for an instance of the BSP [11]. It is interesting to note, however, that if we ignore constraint (4), which prevents two-hop neighbors from transmitting simultaneously, then the resulting problem is in P, and a polynomial time algorithm is provided in [13].

Due to the computational complexity of the 
BSP, several heuristics have been applied and appear 
throughout the literature [2,3,6,28,31]. In the follow- 
ing section, we highlight several of these methods and 
examine their effectiveness when applied to large-scale 
instances. 


Methods 


In this section, we review many of the heuristics which have been applied to the BSP. We analyze the techniques used and compare their relative performance as reported in [6]. The particular algorithms we examine are as follows:

• Sequential vertex coloring [31];
• Mixed neural-genetic algorithm [27];
• Greedy randomized adaptive search procedures (GRASP) [2,3];
• A multi-start combinatorial algorithm [6].


We note here that none of the heuristics which we describe in this section attempt to solve the BSP using the typical multiobjective optimization approach, in which one combines the multiple objectives into one scalar objective whose optimal solution is Pareto optimal for the original problem. Instead, all of the methods decouple the objectives and handle each independently. This is done because, for instances of the BSP, frame length minimization usually takes precedence over the utilization maximization problem [27,28,31].


Sequential Vertex Coloring 


Yeo et al. [31] propose a two-phase approach based on 
sequential vertex coloring (SVC). The first phase com- 
putes an approximate solution for the FLMP. Then us- 
ing the computed frame length, the TMP is considered 
in the second phase. Specific details are as follows. 


Frame Length Minimization For this phase, the FRAME LENGTH MINIMIZATION PROBLEM is considered, and an approximate solution is computed by solving a graph coloring problem in the augmented graph. A sequential vertex ordering approach is used, whereby the stations are first sorted in descending order of their number of one-hop and two-hop neighbors. The first vertex is colored, and the list of the other N − 1 vertices is scanned downward. Each remaining vertex is colored with the smallest color that has not already been assigned to any of its neighbors in the augmented graph, i.e., to any of its one-hop or two-hop neighboring stations. The process continues until all vertices have been colored.
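As an illustration of this first phase, here is a minimal Python sketch. It assumes the network is given as a list of undirected one-hop links between stations numbered 0..N−1. The helper names are hypothetical, and ties in the ordering are broken by station index, a detail that [31] does not specify here. Colors play the role of time slots, so the number of colors used is the frame length.

```python
def augmented_neighbors(n, edges):
    """Return, for each station 0..n-1, the set of its one-hop and two-hop neighbors."""
    one_hop = [set() for _ in range(n)]
    for u, v in edges:
        one_hop[u].add(v)
        one_hop[v].add(u)
    aug = []
    for u in range(n):
        nbrs = set(one_hop[u])
        for v in one_hop[u]:
            nbrs |= one_hop[v]  # neighbors of neighbors are two-hop neighbors
        nbrs.discard(u)
        aug.append(nbrs)
    return aug


def sequential_vertex_coloring(n, edges):
    """Greedy coloring of the augmented graph; each color is one time slot."""
    aug = augmented_neighbors(n, edges)
    # Descending order of the number of one-hop and two-hop neighbors (stable sort).
    order = sorted(range(n), key=lambda v: len(aug[v]), reverse=True)
    color = {}
    for v in order:
        used = {color[u] for u in aug[v] if u in color}
        c = 0
        while c in used:  # smallest color not used by any augmented neighbor
            c += 1
        color[v] = c
    frame_length = max(color.values()) + 1 if color else 0
    return color, frame_length


# Example: a path network 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
color, M = sequential_vertex_coloring(4, edges)
```

On this path, stations 0, 1 and 2 are pairwise within two hops of each other, so at least three slots are needed, and the greedy pass finds a three-slot frame.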


Throughput Maximization To solve the TMP within the frame length computed in phase 1, the ordering idea of the sequential vertex coloring algorithm is applied again. The stations are now sorted in ascending order of their number of one-hop and two-hop neighbors. The first station in this order is then assigned to every additional slot in which it can broadcast simultaneously with the previously assigned stations without collision. This process is repeated for every station in the ordered list.
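The second phase can be sketched as follows. The sketch assumes that the one-hop and two-hop neighbor sets `aug` and a collision-free phase-one assignment `color` (station to slot) are already available. The function name and data layout are hypothetical, not taken from [31].

```python
def fill_slots(aug, color, frame_length):
    """Phase-two sketch: add extra transmissions to a collision-free frame.

    aug[v] -- set of one-hop and two-hop neighbors of station v
    color  -- slot assigned to each station in phase one
    """
    slots = [set() for _ in range(frame_length)]
    for v, c in color.items():
        slots[c].add(v)
    # Ascending order of the number of one-hop and two-hop neighbors.
    order = sorted(range(len(aug)), key=lambda v: len(aug[v]))
    for v in order:
        for slot in slots:
            # v may join a slot only if no conflicting station already uses it
            if v not in slot and not (aug[v] & slot):
                slot.add(v)
    return slots


# Example: stations 0 and 1 are neighbors; station 2 is isolated.
aug = [{1}, {0}, set()]
slots = fill_slots(aug, {0: 0, 1: 1, 2: 2}, 3)
throughput = sum(len(s) for s in slots)
```

Visiting lightly constrained stations first lets them claim many slots cheaply. In the example, the isolated station 2 transmits in every slot, and station 0 gains slot 2, raising the throughput from 3 to 6.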


Mixed Neural-Genetic Algorithm 


As with the coloring heuristic described above, Salcedo-Sanz et al. [27] introduced a two-phase heuristic, based on combining Hopfield neural networks [15] and genetic algorithms as in [29]. As with the vertex coloring algorithm, phase one considers the FLMP and phase two attempts to maximize the throughput.


Frame Length Minimization In order to solve the 
FRAME LENGTH MINIMIZATION PROBLEM, a discrete- 
time binary Hopfield neural network (HNN) is used. 
As described in [27], the HNN can be represented as 
a graph whose vertices are the neurons (stations) and 
whose edges represent the direct collisions. The neu- 
rons are updated one at a time after a randomized ini- 
tialization until the system converges. For specific im- 
plementation details, the reader should see [27]. 
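The asynchronous dynamics described here can be illustrated with a generic discrete-time binary Hopfield network. This is a sketch only. The weights and biases that [27] uses to encode the FLMP are not given in this article. The two-neuron example, in which a negative weight penalizes two colliding stations transmitting in the same slot, is a hypothetical encoding chosen for illustration.

```python
import random


def hopfield_run(weights, bias, seed=0, max_sweeps=100):
    """Asynchronous dynamics of a discrete-time binary Hopfield network.

    With symmetric weights and a zero diagonal, no state change increases the
    network energy, so the one-at-a-time updates settle in a stable state.
    """
    rng = random.Random(seed)
    n = len(bias)
    state = [rng.randint(0, 1) for _ in range(n)]  # randomized initialization
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # neurons updated one at a time
            net = sum(weights[i][j] * state[j] for j in range(n)) + bias[i]
            new = 1 if net > 0 else 0
            if new != state[i]:
                state[i], changed = new, True
        if not changed:  # a full sweep without changes: converged
            break
    return state


# Hypothetical encoding: two colliding stations inhibit each other (weight -2),
# while the bias (+1) rewards activating a neuron.
W = [[0, -2], [-2, 0]]
states = [hopfield_run(W, [1, 1], seed=s) for s in range(5)]
```

Whatever the random start, the network settles with exactly one of the two colliding neurons active.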


Utilization Maximization In this phase, a genetic algorithm [12] is used to maximize the throughput within the frame length determined in phase one. Genetic algorithms (GAs) get their name from the biological process they mimic. Motivated by Darwin's Theory of Natural Selection [7], these algorithms evolve a population of solutions, called individuals, over several generations in search of the best solution. Each component of an individual is called an allele. Individuals in the population mate through a process called crossover, producing new solutions that inherit traits (i.e., alleles) from both parents. In successive generations, only the solutions with the best fitness are carried over to the next generation, in a process that mimics the fundamental principle of natural selection, survival of the fittest [12]. Again, the reader should consult [27] for implementation-specific information.
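For readers unfamiliar with GAs, the following is a plain textbook GA over 0/1 strings. It uses binary tournament selection, one-point crossover, bit-flip mutation and elitism. All of these operators, the parameter values, and the placeholder fitness function are assumptions, not the design of [27]. For the BSP, a fitness could, for example, count scheduled transmissions and penalize collisions.

```python
import random


def genetic_algorithm(fitness, n_genes, pop_size=20, generations=50,
                      p_mut=0.02, seed=0):
    """Maximize `fitness` over 0/1 strings (needs n_genes >= 2, pop_size >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def select():
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the fittest individual survives
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation of each allele with probability p_mut
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy usage with a placeholder fitness: the number of ones in the string.
best = genetic_algorithm(sum, 20)
```

Because of elitism, the best fitness never decreases from one generation to the next.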


Greedy Randomized Adaptive Search Procedures 
(GRASP) 


GRASP [9] is a multi-start metaheuristic that has 
been used with great success to provide solutions 
for several difficult combinatorial optimization prob- 
lems [10], including SATISFIABILITY [24], QUADRATIC 
ASSIGNMENT [21,23], and most recently the COOPER- 
ATIVE COMMUNICATION PROBLEM ON AD-HOC NET- 
WORKS [4,5]. 

GRASP is a two-phase procedure which generates solutions through the controlled use of random sampling, greedy selection, and local search. For a given problem $\Pi$, let $F$ be the set of feasible solutions for $\Pi$. Each solution $X \in F$ is composed of $k$ discrete components $a_1, \dots, a_k$. GRASP constructs a sequence $\{X_i\}$ of solutions for $\Pi$ such that each $X_i \in F$. The algorithm returns the best solution found after all iterations.
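The multi-start structure just described can be written as a small generic loop. In this sketch, `construct`, `local_search` and `cost` are hypothetical callables supplied by the user. The toy problem only demonstrates the control flow and is unrelated to the BSP.

```python
import random


def grasp(construct, local_search, cost, iterations=50, seed=0):
    """Generic multi-start GRASP loop for a minimization problem.

    construct(rng)        -> a feasible solution built greedily with randomization
    local_search(x, rng)  -> an improved (locally optimal) solution
    cost(x)               -> objective value to minimize
    """
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        x = local_search(construct(rng), rng)
        c = cost(x)
        if c < best_cost:  # keep the best solution over all iterations
            best, best_cost = x, c
    return best, best_cost


# Toy usage: minimize (x - 3)^2 over the integers.
def toy_construct(rng):
    return rng.randint(-10, 10)


def toy_local_search(x, rng):
    step = 1 if x < 3 else -1
    while (x + step - 3) ** 2 < (x - 3) ** 2:  # move while a neighbor is better
        x += step
    return x


best, best_cost = grasp(toy_construct, toy_local_search, lambda x: (x - 3) ** 2)
```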


Construction Phase The construction phase of the GRASP builds a solution iteratively from a partial broadcast schedule that is initially empty. The stations are first sorted in descending order of their number of one-hop and two-hop neighbors. Next, a so-called Restricted Candidate List (RCL) is created, consisting of the stations that may broadcast simultaneously with the stations previously assigned to the current slot. A station is chosen randomly from this RCL and assigned. A new RCL is then created and another station is randomly selected. This process continues until the RCL is empty, at which time the slot number is incremented and the procedure is repeated recursively for the subgraph induced by the set of all vertices whose corresponding stations have not yet been assigned to a time slot.
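A minimal sketch of this construction phase is given below. It assumes the one-hop and two-hop neighbor sets are precomputed as `aug[v]`. It follows the description above in letting the RCL hold every compatible unassigned station. The function name and input format are hypothetical, and the actual implementation in [2,3] may restrict the list further.

```python
import random


def grasp_construct(aug, seed=0):
    """GRASP construction phase for the BSP (one randomized greedy pass).

    aug[v] -- set of one-hop and two-hop neighbors of station v (0..N-1)
    Returns a list of slots, each a set of stations transmitting in it.
    """
    rng = random.Random(seed)
    # Stations sorted in descending order of their number of one/two-hop neighbors.
    unassigned = sorted(range(len(aug)), key=lambda v: len(aug[v]), reverse=True)
    slots = []
    while unassigned:
        slot = set()
        while True:
            # RCL: unassigned stations that can share the current slot collision-free
            rcl = [v for v in unassigned if not (aug[v] & slot)]
            if not rcl:
                break  # RCL empty: move on to the next slot
            v = rng.choice(rcl)
            slot.add(v)
            unassigned.remove(v)
        slots.append(slot)
    return slots


# Example: path network 0-1-2-3-4 (augmented neighbor sets precomputed).
aug = [{1, 2}, {0, 2, 3}, {0, 1, 3, 4}, {1, 2, 4}, {2, 3}]
schedule = grasp_construct(aug, seed=1)
```

Every station ends up in exactly one slot. The resulting number of slots is the frame length $m$ that the local search then tries to reduce.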


Local Search The local search phase is a swap-based procedure adapted from a similar method for graph coloring implemented by Laguna and Marti in [20]. First, the two slots with the fewest scheduled transmissions are combined, so the total number of slots becomes $k = m - 1$, where $m$ is the frame length of the schedule computed in the construction phase. Denote the new broadcast schedule as $\{x_{m'n} : m' = 1, \dots, k,\ n = 1, \dots, N\}$. Now let $f(x) = \sum_{m'=1}^{k} |E(m')|$, where $E(m')$ is the set of collisions in slot $m'$. $f(x)$ is then minimized by the application of a local search procedure as follows.

A colliding station in the combined slot is chosen randomly, and every attempt is made to swap this station with another from the remaining $k - 1$ slots. After a swap is made, $f(x)$ is re-evaluated. If $f(x)$ has a lower value than before the swap, the swap is kept and the process is repeated with the remaining colliding stations. If every attempted swap for a colliding station leaves the result unimproved, a new colliding station is chosen and the swap routine is attempted again. This continues until either a successful swap is made or a specified number of iterations has been performed. If a solution is improved such that $f(x) = 0$, then the frame length has been successfully decreased by one slot. The value of $k$ is then decremented and the process is repeated. If the procedure ends with $f(x) > 0$, then no improved solution was found.
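One round of this swap-based local search can be sketched as follows. The sketch merges the two least-used slots, evaluates $f(x)$ as the total number of conflicting pairs, and keeps only improving swaps of a randomly chosen colliding station. The iteration limit, the precomputed neighbor sets `aug`, and all function names are assumptions. Calling the function again on a successful result corresponds to decrementing $k$ and repeating.

```python
import random


def slot_collisions(slot, aug):
    """|E(m')|: number of conflicting station pairs sharing one slot."""
    return sum(1 for u in slot for v in aug[u] if v in slot) // 2


def objective(slots, aug):
    """f(x): total number of collisions over all slots."""
    return sum(slot_collisions(s, aug) for s in slots)


def try_swaps(slots, aug, u, cost):
    """Swap u (in merged slot 0) with stations of other slots; keep the first
    swap that lowers f(x). Returns the resulting value of f(x)."""
    for other in slots[1:]:
        for v in list(other):
            slots[0].remove(u)
            other.remove(v)
            slots[0].add(v)
            other.add(u)
            new_cost = objective(slots, aug)
            if new_cost < cost:
                return new_cost  # improving swap is kept
            slots[0].remove(v)  # otherwise undo the swap
            other.remove(u)
            slots[0].add(u)
            other.add(v)
    return cost


def reduce_frame(slots, aug, max_iters=200, seed=0):
    """Merge the two least-used slots and try to drive f(x) to 0. Returns a
    schedule with one slot fewer, or None if none was found."""
    if len(slots) < 2:
        return None
    rng = random.Random(seed)
    slots = sorted((set(s) for s in slots), key=len)
    slots = [slots[0] | slots[1]] + slots[2:]  # k = m - 1 slots
    cost = objective(slots, aug)
    for _ in range(max_iters):
        if cost == 0:
            return slots
        colliding = [u for u in slots[0] if aug[u] & slots[0]]
        if not colliding:
            break  # remaining collisions lie outside the merged slot
        cost = try_swaps(slots, aug, rng.choice(colliding), cost)
    return slots if cost == 0 else None


# Example: path 0-1-2-3, starting from a wasteful four-slot frame.
aug = [{1, 2}, {0, 2, 3}, {0, 1, 3}, {1, 2}]
result = reduce_frame([{0}, {1}, {2}, {3}], aug)
```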


Multi-start Combinatorial Algorithm 


To our knowledge, the most recent heuristic for the BSP is a hybrid multi-start method by Commander and Pardalos [6]. This heuristic combines a graph coloring heuristic with a randomized local search to provide high-quality solutions for large-scale instances of the problem. As with the previously described methods, this heuristic is a two-phase approach. The reader should see [6] for pseudo-code and other implementation-specific details.


Frame Length Minimization First, a greedy randomized construction heuristic is used to determine the value of M. As a consequence of Theorem 1, the method is based on a coloring heuristic, namely the construction phase of the Greedy Randomized Adaptive Search Procedure (GRASP) [26] for coloring sparse graphs proposed by Laguna and Marti in [20]. This particular method was chosen because it quickly provides excellent solutions for the frame length. That said, any other coloring heuristic, such as the Sequential Vertex Coloring method described above, would also provide a value for M. However, the randomized approach of the selected method allows the search space to be explored more thoroughly, because different optimal colorings may yield different solutions in the second phase.


Throughput Maximization The solution from the first phase will not, in general, provide an optimal throughput, because each station is scheduled to transmit only once in the frame. Therefore, a randomized local improvement method is used to schedule each station as many times as possible in the frame. This method locally optimizes each slot by considering the set of stations that may transmit together with the stations currently scheduled in that slot. A station from this set is randomly selected and added, and the process repeats until no other station may broadcast in the current slot. The next slot is then considered, and the process is repeated until the solution is locally optimal.
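The randomized slot-by-slot improvement can be sketched in a few lines. The sketch assumes that `aug[v]` holds the one-hop and two-hop neighbors of station $v$ and that the phase-one schedule is a list of sets. The function name is hypothetical. Each slot is filled until it forms a maximal set of mutually non-conflicting stations.

```python
import random


def randomized_fill(slots, aug, seed=0):
    """Add random compatible stations to each slot until none fits."""
    rng = random.Random(seed)
    slots = [set(s) for s in slots]
    for slot in slots:
        while True:
            candidates = [v for v in range(len(aug))
                          if v not in slot and not (aug[v] & slot)]
            if not candidates:
                break  # no other station may broadcast in this slot
            slot.add(rng.choice(candidates))
    return slots


# Example: stations 0 and 1 are neighbors; station 2 is isolated.
aug = [{1}, {0}, set()]
improved = randomized_fill([{0}, {1}, {2}], aug)
```

Different random choices can lead to different locally optimal schedules, which is what the multi-start framework exploits.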


Computational Effectiveness 


In [6], the authors performed an extensive computa- 
tional experiment comparing the effectiveness of the 
aforementioned heuristics. They tested all of the algo- 
rithms on a common platform and reported solutions 
for 63 instances ranging from 15 to 100 stations with 
varying densities. In addition, they implemented the 
integer programming model from Sect. “Formulation” 
using the Xpress-MP™ optimization suite from Dash 
Optimization [17]. Xpress-MP contains an implemen- 
tation of the simplex method [14], and uses a branch 
and bound algorithm [30] together with advanced 
cutting-plane techniques [16,22]. 

The combinatorial algorithm of [6] proved superior to the other heuristics mentioned: on all 63 instances tested, it found solutions at least as good as those of every other algorithm from the literature, and it outperformed them on 56 instances. The performance of the GRASP [2] and that of the Mixed Neural-Genetic Algorithm [27] were comparable, with GRASP performing slightly better on average. The weakest of the methods was the Sequential Vertex Coloring [31] algorithm. For specific numerical results, see [6].


Conclusion 


In this article, we introduced the BROADCAST SCHEDULING PROBLEM on TDMA networks. The BSP is an important problem arising in wireless mesh networks, concerned with efficiently scheduling collision-free broadcasts for the network stations. We formally defined the problem, examined its computational complexity, and discussed several algorithms that have been applied to the BSP, all with competitive results.
We conclude with a few words on possible directions of future research. In addition to the methods described, other metaheuristics could be considered and approximation algorithms developed. A heuristic exploration of cutting plane algorithms on the IP formulation would also be an interesting alternative. Another direction would be to consider instances of the problem in which the stations are part of a mobile ad-hoc network. In this case, the topology of the network would change as the stations change position, which could cause significant difficulties in determining the evolving sets of one-hop and two-hop neighbors. There is little doubt that as technology advances and research on ad-hoc networks increases, so too will the applications of the BSP that require advanced solution techniques [25].
Quasi-Newton methods attempt to update a Hessian approximation (or its inverse) instead of evaluating the Hessian matrix exactly at each iteration, as in the basic Newton method for unconstrained optimization.
Consider the optimization problem: 


$$\min_{x \in \mathbb{R}^n} f(x).$$


For this problem, the Newton method iteratively solves for a correction and updates the solution point according to:

$$H(x^{(k)})\,\Delta x^{(k)} = -g(x^{(k)}), \qquad (1)$$

where $H(x^{(k)})$ denotes the Hessian matrix at the point $x^{(k)}$ (the $k$th iteration of Newton's method), $g(x^{(k)})$ is the gradient vector at the same point, and $\Delta x^{(k)}$ is the correction to the point $x^{(k)}$. The correction is applied according to:

$$x^{(k+1)} = x^{(k)} + \alpha\,\Delta x^{(k)},$$

where $\alpha = 1$ for the standard Newton method. Otherwise, in practical applications, and to enforce 'global convergence' in theory (not just in the neighborhood of the minimizer), one conducts a line search at each iteration to estimate the optimal value of $\alpha$. Alternative algorithms use the concept of trust regions.
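A minimal sketch of the Newton step (1) combined with a step length $\alpha$ chosen by backtracking is given below. It is written in pure Python so that it is self-contained. The Armijo constant $10^{-4}$, the halving factor, and the test function are common illustrative choices, not taken from this article. The sketch assumes the Hessian is positive definite, so that $\Delta x^{(k)}$ is a descent direction.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def newton_line_search(f, grad, hess, x, tol=1e-10, max_iter=50):
    """Newton's method with a backtracking (Armijo) line search on alpha."""
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        dx = solve(hess(x), [-gi for gi in g])  # H(x) dx = -g(x), eq. (1)
        slope = sum(gi * di for gi, di in zip(g, dx))  # < 0 if H is positive definite
        alpha, fx = 1.0, f(x)  # alpha = 1 is the pure Newton step
        while (alpha > 1e-12 and
               f([xi + alpha * di for xi, di in zip(x, dx)]) > fx + 1e-4 * alpha * slope):
            alpha *= 0.5
        x = [xi + alpha * di for xi, di in zip(x, dx)]
    return x


# Example: f(x, y) = exp(x) - x + (y - x)^2, with positive definite Hessian everywhere.
def f(v):
    return math.exp(v[0]) - v[0] + (v[1] - v[0]) ** 2


def grad(v):
    return [math.exp(v[0]) - 1 - 2 * (v[1] - v[0]), 2 * (v[1] - v[0])]


def hess(v):
    return [[math.exp(v[0]) + 2, -2.0], [-2.0, 2.0]]


x_min = newton_line_search(f, grad, hess, [2.0, -1.0])
```

The minimizer of this test function is the origin.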

There exist symmetric rank-two updating formulae for both the inverse Hessian and the Hessian, all belonging to the broad category of Broyden methods. The general family updates either the Hessian ($H$) or the inverse Hessian ($G = H^{-1}$). There are two well-known schemes. The first is the Davidon-Fletcher-Powell rank-two update (DFP update), originally proposed by W.C. Davidon [3] and later developed by R. Fletcher and M.J.D. Powell [6]. The second is the well-known Broyden-Fletcher-Goldfarb-Shanno update formula (BFGS update), proposed by C.G. Broyden [1,2], Fletcher [4], D. Goldfarb [7], and D.F. Shanno [9]. Both of these methods preserve positive definiteness of the updated matrices, provided that the curvature condition $p_k^T q_k > 0$ holds (with $p_k$ and $q_k$ as defined below).

The vectors $p$ and $q$ used below are defined first:

$$p_k = x_{k+1} - x_k,$$

$$q_k = g_{k+1} - g_k,$$

where $g_k$ denotes the gradient at $x_k$.


The DFP updating scheme of the inverse Hessian is 
given by: 
$$G_{k+1}^{\mathrm{DFP}} = G_k + \frac{p_k p_k^{T}}{p_k^{T} q_k} - \frac{G_k q_k q_k^{T} G_k}{q_k^{T} G_k q_k}.$$
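To make the DFP formula concrete, here is a minimal pure-Python sketch that applies one update. Matrices are represented as lists of lists, an assumption made only for self-containment. The result can be checked against the secant condition $G_{k+1} q_k = p_k$, which the formula satisfies by construction.

```python
def dfp_update(G, p, q):
    """DFP update of a symmetric inverse-Hessian approximation G.

    p = x_{k+1} - x_k and q = g_{k+1} - g_k; requires p^T q > 0.
    """
    n = len(p)
    Gq = [sum(G[i][j] * q[j] for j in range(n)) for i in range(n)]
    pq = sum(pi * qi for pi, qi in zip(p, q))  # p_k^T q_k
    qGq = sum(qi * gi for qi, gi in zip(q, Gq))  # q_k^T G_k q_k
    # G q q^T G equals (Gq)(Gq)^T because G is symmetric.
    return [[G[i][j] + p[i] * p[j] / pq - Gq[i] * Gq[j] / qGq
             for j in range(n)] for i in range(n)]


p, q = [1.0, 2.0], [3.0, 1.0]
G1 = dfp_update([[1.0, 0.0], [0.0, 1.0]], p, q)
G1q = [sum(G1[i][j] * q[j] for j in range(2)) for i in range(2)]  # should equal p
```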


The complementary updating formula to any Hessian (or inverse Hessian) updating scheme can be found by exchanging $G$ with $H$ and $q$ with $p$ (as discussed, for example, in [8]). Applying this property to the DFP update above yields:


$$H_{k+1}^{\mathrm{BFGS}} = H_k + \frac{q_k q_k^{T}}{q_k^{T} p_k} - \frac{H_k p_k p_k^{T} H_k}{p_k^{T} H_k p_k},$$


which is the BFGS updating scheme for the Hessian. 
By taking the inverse of this one can obtain the in- 
verse Hessian BFGS updating formula: 


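$$G_{k+1}^{\mathrm{BFGS}} = G_k + \left(1 + \frac{q_k^{T} G_k q_k}{q_k^{T} p_k}\right)\frac{p_k p_k^{T}}{p_k^{T} q_k} - \frac{p_k q_k^{T} G_k + G_k q_k p_k^{T}}{q_k^{T} p_k}.$$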
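A companion pure-Python sketch of the inverse BFGS update follows, again with list-of-lists matrices as an assumption. In the example, $G_k = H_k = I$. The updated matrix can then be checked against the inverse of the Hessian-form BFGS update, which for these $p_k$ and $q_k$ is $\begin{pmatrix} 2.6 & 0.2 \\ 0.2 & 0.4 \end{pmatrix}$.

```python
def bfgs_inverse_update(G, p, q):
    """BFGS update of a symmetric inverse-Hessian approximation G (needs p^T q > 0)."""
    n = len(p)
    Gq = [sum(G[i][j] * q[j] for j in range(n)) for i in range(n)]
    qp = sum(qi * pi for qi, pi in zip(q, p))  # q_k^T p_k
    qGq = sum(qi * gi for qi, gi in zip(q, Gq))  # q_k^T G_k q_k
    factor = (1.0 + qGq / qp) / qp
    # p q^T G has entries p_i (Gq)_j and G q p^T has entries (Gq)_i p_j (G symmetric).
    return [[G[i][j] + factor * p[i] * p[j] - (p[i] * Gq[j] + Gq[i] * p[j]) / qp
             for j in range(n)] for i in range(n)]


p, q = [1.0, 2.0], [3.0, 1.0]
G1 = bfgs_inverse_update([[1.0, 0.0], [0.0, 1.0]], p, q)
H1 = [[2.6, 0.2], [0.2, 0.4]]  # Hessian-form BFGS update of H_k = I
product = [[sum(G1[i][k] * H1[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
```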
The general class of Broyden methods can be derived as a linear combination of the two types of updates, since both are symmetric rank-two corrections constructed from the same vectors $p_k$ and $G_k q_k$. Thus (see, for example, [5,8]):


$$G_{k+1}^{\phi} = (1 - \phi)\,G_{k+1}^{\mathrm{DFP}} + \phi\,G_{k+1}^{\mathrm{BFGS}},$$


which yields: 
$$G_{k+1}^{\phi} = G_k + \frac{p_k p_k^{T}}{p_k^{T} q_k} - \frac{G_k q_k q_k^{T} G_k}{q_k^{T} G_k q_k} + \phi\, v_k v_k^{T},$$


which is the general family of Broyden methods, with: 


$$v_k = \left(q_k^{T} G_k q_k\right)^{1/2}\left(\frac{p_k}{p_k^{T} q_k} - \frac{G_k q_k}{q_k^{T} G_k q_k}\right).$$

A pure Broyden method is one that uses a constant value of $\phi$ in all iterations. The Broyden family does not preserve positive definiteness of the updated inverse Hessian $G_{k+1}^{\phi}$ for all values of $\phi$; it does so for $\phi \in [0, 1]$ when $p_k^T q_k > 0$.
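The family formula can be checked numerically with a short pure-Python sketch (list-of-lists matrices assumed). Setting $\phi = 0$ reproduces the DFP update and $\phi = 1$ the BFGS update. Because $v_k^T q_k = 0$, every member of the family satisfies the secant condition $G_{k+1}^{\phi} q_k = p_k$.

```python
def broyden_update(G, p, q, phi):
    """Broyden-family update of the inverse Hessian: phi = 0 is DFP, phi = 1 is BFGS."""
    n = len(p)
    Gq = [sum(G[i][j] * q[j] for j in range(n)) for i in range(n)]
    pq = sum(pi * qi for pi, qi in zip(p, q))  # p_k^T q_k
    qGq = sum(qi * gi for qi, gi in zip(q, Gq))  # q_k^T G_k q_k
    # v_k = (q^T G q)^{1/2} (p / p^T q - G q / q^T G q)
    v = [qGq ** 0.5 * (p[i] / pq - Gq[i] / qGq) for i in range(n)]
    return [[G[i][j] + p[i] * p[j] / pq - Gq[i] * Gq[j] / qGq + phi * v[i] * v[j]
             for j in range(n)] for i in range(n)]


I2, p, q = [[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [3.0, 1.0]
G_dfp = broyden_update(I2, p, q, 0.0)
G_bfgs = broyden_update(I2, p, q, 1.0)
G_half = broyden_update(I2, p, q, 0.5)
```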

Generally, of all these schemes, the varying-$\phi$ variant is rarely if ever used nowadays (2000), with the BFGS scheme being the method of choice whenever an updating scheme is chosen. This is because computational experience has shown BFGS to be more effective than the DFP scheme.
The capacitated minimum spanning tree problem (CMST), or terminal layout problem, is usually described as the problem of determining a rooted spanning tree of minimum cost in which each of the subtrees off the root node contains at most K nodes. That is, the CMST is a generalization of the well-known minimum spanning tree problem (MST) in which the objective is to find a minimum cost tree spanning a given set of nodes such that some capacity constraints are observed.

As a graph theoretic problem, we consider a connected graph $G = (V, A, b, c)$ with node set $V = \{0, \dots, n\}$ and arc set $A$. Each node $i \in V$ has a nonnegative node weight $b_i$, which may be interpreted as a capacity requirement, whereas a nonnegative arc weight $c_{ij}$ represents the cost of using arc $(i, j) \in A$. Node 0, denoted as the center node, will be the root of the tree (with $b_0 := 0$). We define a subtree or component $C_i$ of a tree spanning $V$ as its maximal subgraph uniquely connected to the center by arc $(0, i)$ (denoted as a central arc). The demand of a subtree is the sum of the node weights of the included nodes. To satisfy the capacity constraint, the demand of each subtree must not exceed a given capacity $K$. (Without loss of generality we may assume $b_i \le K$ for all $i$.) By means of these definitions, the CMST is the problem of finding a minimum cost tree spanning node set $V$ in which all subtrees satisfy the capacity constraint.
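These definitions translate directly into a feasibility check. In the following sketch, a rooted tree is stored as a hypothetical predecessor array `parent`, with `parent[i]` the node preceding $i$ on its path to the center 0. The function computes the tree cost and the demand of each subtree $C_r$ hanging off a central arc $(0, r)$, and tests every demand against $K$. It assumes the input really is a tree (no cycles).

```python
def cmst_evaluate(parent, b, c, K):
    """Cost, subtree demands and capacity feasibility of a rooted spanning tree.

    parent[i] is the predecessor of node i on its path to the center 0
    (parent[0] is None). The tree is assumed to be valid (no cycles).
    """
    total_cost = 0
    demand = {}  # central-arc endpoint r -> demand of the subtree C_r
    for i in range(1, len(parent)):
        total_cost += c[parent[i]][i]
        r = i
        while parent[r] != 0:  # climb to the node r of the central arc (0, r)
            r = parent[r]
        demand[r] = demand.get(r, 0) + b[i]
    feasible = all(d <= K for d in demand.values())
    return total_cost, demand, feasible


# Example: center 0 and nodes 1..4 on a line; subtrees {1, 2} and {3, 4}.
c = [[abs(i - j) for j in range(5)] for i in range(5)]
cost, demand, ok = cmst_evaluate([None, 0, 1, 0, 3], [0, 1, 1, 1, 1], c, K=2)
```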

Although polynomial-time algorithms exist for the unconstrained MST, the CMST has been shown to be NP-hard [32], even when all $b_i$ values are identical. This case of the CMST is referred to as the unit weight CMST or equal demand CMST; otherwise it is called the nonunit weight case. Most references in the literature deal with the unit weight case, with only a few exceptions treating the more general case. For a comprehensive survey of the (unit weight) CMST up to the mid-1990s see [4]. The CMST in undirected graphs requires a symmetric cost matrix. Otherwise, the direction of the arcs has to be considered, i.e., all subtrees are directed. This capacitated minimum spanning arborescence problem (CMDT) includes the CMST as a special case.

Motivated by the intractability of the problem, both heuristic and exact algorithms have been developed. In the sequel, various algorithmic concepts are reviewed, mainly for the unit weight CMST (with special emphasis on progress made in the late nineties; for some older yet important references not given here see [4]). First, however, we sketch some applications of the CMST, some of which may also lead to important modifications of the problem.


Applications 


The CMST has a great variety of applications, especially in the field of telecommunications network design. For instance, in the design of minimum cost teleprocessing networks, terminals (nodes) have to be connected to a central facility (the center node) by so-called multipoint lines (the subtrees). These lines have to be restricted with respect to the traffic transferred between the center and the included terminals, or with respect to the number of terminals included in the line. The latter is sometimes called a reliability constraint because it limits the maximal number of terminals disconnected from the central facility in the case of a single link breakdown. Different constraints may be referred to as capacity constraints (e.g., considering arc weights instead of node weights, or even nonlinear weight functions depending on the distance of a node or arc from the center). Nevertheless, most formulations in the literature consider only one of them.


Mathematical Programming Formulations 


For the CMST a great variety of formulations may 
be found in the literature; see, e.g., [14,19,20,21,22]. 
Here we restrict ourselves to the presentation of a well-known flow-based formulation. As relaxations of directed formulations may be advantageous, we consider the CMDT.

Assume $b_i = 1$ for all $i = 1, \dots, n$, and $b_0 = 0$. Then the CMDT can be described by a mixed integer linear programming formulation as follows. Define $x_{ij} = 1$ if arc $(i, j)$ is included in the solution, and $x_{ij} = 0$ otherwise. Furthermore, let $y_{ij}$ denote the flow on arc $(i, j)$ for all $i, j$, i.e., $i = 0, \dots, n$ and $j = 1, \dots, n$. Variables $x_{ij}$ and $y_{ij}$ with $(i, j) \notin A$ are forced to zero by assigning prohibitively large weights to them. The following single-commodity flow formulation gives a minimum cost directed capacitated spanning tree with center node 0 as the root:


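$$
\begin{aligned}
(P)\qquad \min\ & \sum_{i=0}^{n}\sum_{j=1}^{n} c_{ij}\,x_{ij} \\
\text{s.t.}\ & \sum_{i=0}^{n} x_{ij} = 1, && j = 1, \dots, n,\\
& \sum_{i=0}^{n} y_{ij} - \sum_{i=1}^{n} y_{ji} = 1, && j = 1, \dots, n,\\
& x_{ij} \le y_{ij} \le (K - b_i)\,x_{ij} && \text{for all } i, j,\\
& x_{ij} \in \{0, 1\},\quad y_{ij} \ge 0 && \text{for all } i, j.
\end{aligned}
$$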
The first set of equalities ensures that exactly one arc enters each noncentral node. The coupling constraints, in combination with flow conservation, ensure that no cycles occur and that the capacity constraint is satisfied in each subtree, i.e., a tree spanning all nodes is guaranteed. For a formal proof of cycle prevention see [14].


Exact Algorithms 


Most exact algorithms for solving the CMST are based 
on the branch and bound or the branch and cut 
paradigm, while other approaches are usually not com- 
petitive due to time and space complexity (e. g. dynamic 
programming [23]). 

In describing the concepts from the literature, in most cases we do not report computational experiments, as no fair comparison is possible. When reporting problem sizes solved to optimality by a specific algorithm, different authors have conducted their experiments in various ways (e.g., in how the data are generated), so comparability is not always guaranteed. Moreover, problem instances with a larger number of nodes might be easier to solve than those with a smaller number of nodes, depending on the respective values of K [20,24]. Another aspect which seems to have considerable impact on the performance of most algorithms is the location of the root. For instance, instances with the root in the center of a rectangle in the Euclidean plane may be solved very easily compared to instances with the root in the corner of the rectangle. Recently, a set of problem instances with 40 and 80 nodes has come into consistent use (see, e.g., [20,22]).

Branch and bound methods for the CMST can be divided into two classes. Node oriented methods branch by fixing nodes; arc oriented methods branch by including an arc $(i, j)$ in the solution (i.e., fixing $x_{ij} = 1$) or excluding it from the solution ($x_{ij} = 0$). A node is called established if the path from the root to this node consists only of arcs fixed to 1 (these arcs are called established, too). Usually, only those arcs incident with exactly one established node are allowed to be fixed. If an arc is fixed to 1, it becomes established and both of its end nodes are established. Correspondingly, an arc is called disallowed if $x_{ij} = 0$ is fixed.

A well-known relaxation of the CMST is the MST 
relaxation which can be easily solved to optimality. If 
the MST solution is feasible for the CMST, then it is also 
optimal and the respective problem can be fathomed. 
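The MST relaxation and the fathoming test can be sketched as follows. The sketch uses Prim's algorithm on a symmetric cost matrix, with node 0 as the center, and then checks the capacity of each subtree hanging off a central arc. Node weights `b` and capacity `K` follow the notation above. The function name and the dense-matrix input are assumptions for illustration.

```python
def mst_relaxation(c, b, K):
    """MST lower bound for the CMST (Prim's algorithm on a symmetric cost matrix).

    Node 0 is the center. Returns (bound, parent, feasible); if the MST already
    satisfies the capacity constraint, it is optimal for the CMST as well.
    """
    n = len(c)
    in_tree = [False] * n
    dist = [float("inf")] * n
    parent = [None] * n
    dist[0] = 0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: dist[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and c[u][v] < dist[v]:
                dist[v], parent[v] = c[u][v], u
    bound = sum(dist[1:])
    demand = {}  # demand of the subtree hanging off each central arc (0, r)
    for i in range(1, n):
        r = i
        while parent[r] != 0:
            r = parent[r]
        demand[r] = demand.get(r, 0) + b[i]
    return bound, parent, all(d <= K for d in demand.values())


# Example: center 0 and nodes 1, 2, 3 on a line, unit demands.
c = [[abs(i - j) for j in range(4)] for i in range(4)]
bound, parent, feasible = mst_relaxation(c, [0, 1, 1, 1], K=2)
```

With $K = 2$ the MST (a single path of three nodes) is infeasible, so its cost 3 serves only as a lower bound. With $K = 3$ the same tree would be optimal and the subproblem could be fathomed.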

In the early 1970s an arc oriented branch and bound 
algorithm based on the MST relaxation was very pop- 
ular. Subproblems are, e.g., branched by defining the 
first not yet established arc of an infeasible subtree (the 
first counted from the center) as established or dis- 
allowed, respectively [8]. This approach may be im- 
proved by using logical tests and tighter lower bounds 
[10]. Let node i be established, then in a subproblem 
disallowing arc (i, j) all arcs (i, k) with node k being 
an established node of the same subtree as j may be 
disallowed, too, without losing optimality. If an optimal solution is lost by disallowing these arcs, the complementary subproblem with established arc $(i, j)$ contains another optimal solution. A dominance criterion
is used to fathom some subproblems. In addition, the 
lower bounds are improved using a special case of the 
degree constrained MST considering that the degree of 
the center node (and hence the number of subtrees) is greater than or equal to the ratio of the total demand to the capacity K of each subtree.

A. Kershenbaum and R.R. Boorstyn [27] propose two branch and bound algorithms, both using a last-in-first-out rule to choose the subproblem to be considered next. One of the algorithms is node oriented. It starts with n subtrees, each node being 'permissible' for each subtree. A subproblem is branched by including or excluding a node from a specific subtree.
Lower bounds are obtained from a partitioning algo- 
rithm. The node weights used in this algorithm are orig- 
inally derived from the MST solution and then, dur- 
ing the branch and bound, transformed in a weight ex- 
change process. Theoretically, these bounds are at least 
as good as those from the MST relaxation, in practice 
they are much better. With the same partitioning tech- 
nique an arc oriented branch and bound algorithm sim- 
ilar to the one of [8] is developed. 

B. Gavish [14] compares several relaxations of the 
CMST with respect to lower and upper bounds. Best re- 
sults are obtained with a Lagrangian relaxation with an 
additional degree constraint combined with a subgradi- 
ent optimization procedure. 

Outperforming his previous methods, Gavish [15] 
develops a new binary programming formulation for 
the CMST based on an extension of the subtour elimi- 
nation constraints known from the traveling salesman 
problem (TSP). Because of the large number of these 
constraints involved in the formulation an augmented 
Lagrangian procedure is developed where a dual ascent 
algorithm is used to obtain initial multipliers and a sub- 
gradient procedure to optimize them. 

L. Gouveia [20] presents a flow formulation with binary variables $z_{ijq}$, equal to 1 if a flow of $q$ units goes through arc $(i, j)$. Instead of the $O(n^2)$ constraints of the above flow formulation (P), only $O(n)$ constraints are required. The linear relaxation of the new formulation yields lower bounds as good as those produced by
the original formulation. With additional constraints 
different Lagrangian relaxation schemes are obtained 
that yield some improvements on the bounds of Gav- 
ish [15], especially for problem instances with small ca- 
pacity K and the center in the ‘corner’ of a rectangle 
containing the nodes. 

K. Malik and G. Yu [30] present another branch 
and bound algorithm with Lagrangian subgradient op- 
timization. They give a formulation for the CMST 
(closely related to the one of [15]) and additional tight- 
ening constraints which are added to the problem dur- 
ing the optimization process. Based on a multicom- 
modity flow formulation R. Kawatra [26] uses a La- 
grangian approach, too. 

L. Hall [24] reports on experience with a cutting plane algorithm for instances with up to 200 nodes, making clever use of polyhedral methods. Gouveia and P. Martins [22] propose a hop-indexed generalization of formulation (P). Further improvements on the lower bounds for problem instances with the root in the corner of a rectangle are obtained.

P. Toth and D. Vigo [38] provide an exact algo- 
rithm for the CMDT and numerical results are also 
provided for problem instances with up to 200 nodes. 
Their approach uses an additive lower bounding proce- 
dure combining a Lagrangian lower bound and a lower 
bound based on solving minimum cost flow prob- 
lems. 


Heuristics 


Before presenting heuristics for the CMST, it is useful to consider the characteristics of feasible and infeasible solutions [4]. A solution consists of a set of components $C_i = (V_i, A_i)$ with node set $V_i$ and arc set $A_i$, where usually $C_i$ is a spanning tree for $V_i$. Each component includes only one central arc, so that two different node sets $V_i$ and $V_j$ may have the center node as their only common node. The union of all node sets $V_i$ yields the entire node set $V$.

A component is called feasible if it does not violate 
the capacity constraint, and infeasible, otherwise. It is 
referred to as central if it includes the center node and 
noncentral, otherwise (i. e. a noncentral component re- 
sults from a component by eliminating the central arc 
and the center node). Sets of components having both 
infeasible and noncentral components are not consid- 
ered as a solution. 

A solution is called feasible, if every component con- 
tained in the solution is central and feasible itself. It is 
incomplete, if every component is feasible but at least 
one is noncentral. If all components are central but at 
least one is not feasible then a solution is called infeasi- 
ble. 

The following special solutions of the CMST may be emphasized. The incomplete solution with $n + 1$ components $C_i = (\{i\}, \emptyset)$, $i = 0, \dots, n$, is called an empty tree. All components are feasible, and all except $C_0$ are noncentral. The feasible solution with $n$ components $C_i = (\{0, i\}, \{(0, i)\})$, $i = 1, \dots, n$, is called a star. All components are central and feasible. In the case of a sparse graph with only a subset of nodes directly connected to the center, artificial arcs with high cost values should be introduced to complete the graph. The star then might be feasible only for the modified problem.


Finding Initial Feasible Solutions 


Most procedures for determining initial feasible solu- 
tions (start procedures) for the CMST may be classified 
as construction procedures, savings procedures or dual 
procedures. 


Construction Methods 


Construction methods start with an incomplete solu- 
tion, usually the empty tree, and successively enlarge it 
until the solution is feasible. Most procedures in this 
category replace two components and the chosen arc 
that connects them by a new component. We may dis- 
tinguish between arc oriented and node oriented meth- 
ods. 

Arc oriented (or best arc) procedures choose, in a greedy fashion, arcs which are used to join their two incident components. The procedures stop when a feasible solution is obtained. The components of the final
solution generally are not built one by one but simul- 
taneously. It is not necessary to finish one component 
before starting another one. 

As examples, one may use the basic principles of well-known MST algorithms. The modified Kruskal algorithm [7] in each iteration chooses a feasible arc with
lowest cost and joins the two corresponding compo- 
nents. All arcs that have become infeasible in this step 
are removed from consideration for the next iterations. 
Correspondingly, the modified Prim algorithm in each 
iteration chooses an arc with minimal cost which is in- 
cident to the center or a central component (with not 
yet exhausted capacity). 
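The modified Kruskal algorithm can be sketched with a union-find structure over the noncentral nodes. Each component records its demand and whether it is already central. Under the assumptions of this sketch, an arc between two noncentral nodes is feasible if it joins two different components, at most one of them central, whose combined demand does not exceed $K$. A central arc $(0, j)$ is feasible if $j$'s component is still noncentral. Arcs that have become infeasible are simply skipped when they come up. A symmetric cost matrix and the function name are assumptions.

```python
def modified_kruskal(c, b, K):
    """Greedy CMST heuristic: cheapest feasible arc first (symmetric costs, center 0)."""
    n = len(c)
    comp = list(range(n))  # union-find parent; the center 0 is never merged
    demand = list(b)
    central = [False] * n

    def find(v):
        while comp[v] != v:
            comp[v] = comp[comp[v]]  # path halving
            v = comp[v]
        return v

    arcs = sorted((c[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for _, i, j in arcs:
        if i == 0:  # central arc (0, j)
            rj = find(j)
            if not central[rj]:
                central[rj] = True
                tree.append((0, j))
            continue
        ri, rj = find(i), find(j)
        if ri == rj or (central[ri] and central[rj]) or demand[ri] + demand[rj] > K:
            continue  # arc would close a cycle or violate the capacity
        comp[ri] = rj
        demand[rj] += demand[ri]
        central[rj] = central[rj] or central[ri]
        tree.append((i, j))
    return tree


# Example: center 0 and nodes 1..4 on a line, unit demands, K = 2.
c = [[abs(i - j) for j in range(5)] for i in range(5)]
tree = modified_kruskal(c, [0, 1, 1, 1, 1], 2)
```

On this instance the greedy pass builds the subtrees {1, 2} and {3, 4} with total cost 6.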

Node oriented (or best node) procedures choose in 
a greedy fashion a node or component and join it to its 
nearest neighbor component by the best possible arc in- 
cident to the chosen component, while preserving fea- 
sibility. 

An obvious idea is to cluster the nodes into groups 
of no more than K nodes and then to choose the arc set 
according to an MST for the nodes of each group and 
the center. Assuming that coordinates of the nodes are 
given this approach may be referred to as clustering (or 
sweep) algorithm [36]. 
The Martin algorithm [25,31] chooses the component that is most distant from the center and joins it to its nearest feasible neighbor component. If $\gamma_i$ is the cost of connecting component $C_i$ to the center, the component with maximal $\gamma_i$ is chosen.

The regret method (or Vogel approximation method, VAM [8]) computes for every component $C_i$ a regret $r_i = a_2(i) - a_1(i)$. This regret has to be accepted if $C_i$ is joined not to its nearest feasible neighbor component, with cost $a_1(i)$, but to its second nearest feasible neighbor, with cost $a_2(i)$. The component with maximal regret is chosen and joined to its nearest neighbor. The regrets are recomputed and the procedure continues until the solution is feasible.

Mixed procedures combine arc and node aspects. They assign a weight $w_i$ to each node $i$ and compute for every arc $(i, j)$ the trade-off function value $t_{ij} = w_i - c_{ij}$. The feasible arc with largest $t_{ij}$ is chosen and the respective components are joined. In general, the weights have to be updated after each iteration. With an appropriate definition of the weight function and an update rule, all preceding heuristics except the clustering procedures can be incorporated in this concept [29]:

• The Kruskal algorithm is obtained for $w_i = 0$ for all
$i$. Obviously, no update is needed.

• For the Prim algorithm, assign weights $w_i = 0$ to all
central components and $w_i = -\infty$ to all other
components. If a noncentral component is joined with
a central component, the weight of the new component
is set to zero.

• The Martin algorithm requires $w_i = \gamma_i + a_1(i)$, and
with $w_i = a_2(i) = r_i + a_1(i)$ one obtains the VAM.
The weights $w_i$ have to be recomputed whenever the values
of $a_1(i)$ or $a_2(i)$ change.
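The weight-based view above can be illustrated with a minimal Python sketch of one selection step of the mixed procedure. It assumes a symmetric cost matrix with the center as node 0, singleton components (the very first iteration), and only noncentral arcs as candidates; capacity checks are left to a caller-supplied `feasible` predicate. All function names are illustrative and not taken from [29].

```python
import math
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]  # symmetric cost matrix, node 0 is the center


def nearest_costs(cost: Matrix, i: int) -> List[float]:
    """Sorted costs from node i to every other noncenter node."""
    return sorted(cost[i][j] for j in range(1, len(cost)) if j != i)


def kruskal_weights(cost: Matrix) -> List[float]:
    # w_i = 0 for all i, so t_ij = -c_ij and the cheapest arc wins
    return [0.0] * len(cost)


def martin_weights(cost: Matrix) -> List[float]:
    # w_i = gamma_i + a_1(i), gamma_i = cost of the center arc of node i
    return [0.0] + [cost[i][0] + nearest_costs(cost, i)[0] for i in range(1, len(cost))]


def vam_weights(cost: Matrix) -> List[float]:
    # w_i = a_2(i) = r_i + a_1(i); needs at least two noncenter neighbors
    return [0.0] + [nearest_costs(cost, i)[1] for i in range(1, len(cost))]


def best_tradeoff_arc(
    cost: Matrix,
    weights: List[float],
    feasible: Callable[[int, int], bool] = lambda i, j: True,
) -> Optional[Tuple[int, int]]:
    """Return the feasible noncentral arc (i, j) maximizing t_ij = w_i - c_ij."""
    best, best_t = None, -math.inf
    n = len(cost)
    for i in range(1, n):
        for j in range(1, n):
            if i == j or not feasible(i, j):
                continue
            t = weights[i] - cost[i][j]
            if t > best_t:  # strict comparison keeps the first maximizer found
                best, best_t = (i, j), t
    return best
```

For example, with the cost matrix `[[0, 5, 6, 9], [5, 0, 2, 4], [6, 2, 0, 3], [9, 4, 3, 0]]` the Kruskal weights select the cheapest arc (1, 2), the Martin weights select (3, 2), joining the node farthest from the center to its nearest neighbor, and the VAM weights select (1, 2), since node 1 has the largest regret 4 − 2 = 2.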

Mixed VAM is a combined regret-best arc procedure
[18,39]. The regret $r_i$ is used as node weight $w_i$ and thus
the trade-off function is $t_{ij} = r_i - c_{ij}$.

The unified algorithm [29] proposes a parameteri- 
zation of the weight function and the trade-off func- 
tion. 


Savings Procedures 


Savings procedures for the CMST usually start with the
star, i.e., the solution in which every node is connected
directly to the center. The best feasible change, i.e. the
change which yields the largest savings, is performed.
This is iteratively repeated until no savings can be
obtained any more. The methods could easily be applied
to other feasible solutions, so they could be classified as
improvement procedures, too.

The Esau-Williams algorithm (EW, [11]) joins the
two components which yield the maximal savings in
cost. The savings $s_{ij}$ of joining $C_i$ and $C_j$ is defined as
$s_{ij} = \max\{\gamma_i, \gamma_j\} - c_{ij}$ if joining $C_i$ and $C_j$ is feasible,
and $s_{ij} = -\infty$ otherwise, with $\gamma_i$ again being the minimal
cost of the connection from the center to the nodes of
$C_i$ and $c_{ij}$ being the minimal cost of an arc connecting
$C_i$ and $C_j$. Then all savings concerning the new
component are recomputed and again the maximal savings
is chosen. The process stops when no more positive
savings are available.
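The EW rule translates almost line by line into code. The following Python sketch is illustrative only: it assumes a symmetric cost matrix with the center as node 0, a list of node weights (each at most the capacity) and a capacity K, and it recomputes all pairwise savings in every round. That keeps it short but makes it much slower than a careful implementation. The function name `esau_williams` and its signature are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def esau_williams(cost: List[List[float]], weight: List[float], capacity: float):
    """Esau-Williams savings heuristic for the CMST (center = node 0).

    Returns (arcs, total_cost). Every weight[i] is assumed to be <= capacity.
    """
    n = len(cost)
    comps: Dict[int, Set[int]] = {i: {i} for i in range(1, n)}  # start: the star
    load = {i: weight[i] for i in range(1, n)}
    gate = {i: cost[0][i] for i in range(1, n)}  # gamma: cheapest center connection
    arcs: List[Tuple[int, int]] = []

    while True:
        best, best_s = None, 0.0  # only strictly positive savings are accepted
        ids = list(comps)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                if load[a] + load[b] > capacity:
                    continue  # infeasible join: savings are -infinity
                # c_ab: cheapest arc between the two components
                c, u, v = min((cost[x][y], x, y) for x in comps[a] for y in comps[b])
                s = max(gate[a], gate[b]) - c
                if s > best_s:
                    best, best_s = (a, b, u, v), s
        if best is None:
            break  # no positive savings left
        a, b, u, v = best
        comps[a] |= comps.pop(b)
        load[a] += load.pop(b)
        gate[a] = min(gate[a], gate.pop(b))  # merged component keeps the cheaper center arc
        arcs.append((u, v))

    # finally connect every component to the center by its cheapest center arc
    for nodes in comps.values():
        arcs.append((0, min(nodes, key=lambda x: cost[0][x])))
    return arcs, sum(cost[u][v] for u, v in arcs)
```

For instance, with the cost matrix `[[0, 5, 6, 9], [5, 0, 2, 4], [6, 2, 0, 3], [9, 4, 3, 0]]` and unit weights, capacity 2 yields the subtrees {1} and {2, 3} with total cost 14, while capacity 3 lets the sketch reach the unconstrained MST cost of 10.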

The EW is closely related to the above mentioned 
best node procedures. For instance, the Martin algo- 
rithm may be referred to as a less greedy version of the 
EW. 

The EW can also be described as a special case of
the unified algorithm starting with the empty tree and
at each step adding a feasible arc with maximal trade-off
$t_{ij} = \gamma_i - c_{ij}$.

Whitney’s savings heuristic [10,39] modifies the EW 
by allowing noncentral arcs to be deleted as well as cen- 
tral arcs. This leads to a possible recombination of seg- 
ments of the components. Here we see again that sav- 
ings algorithms are closely related to the class of im- 
provement procedures. 

The parallel savings algorithm (PSA) [18] computes 
savings like the EW. However, one iteration does not 
only join one pair of components but a set of pairs with 
maximal total savings. This set is determined by solving 
a maximum weight matching (maximal with respect to 
the savings) in an adequate graph. 

To avoid the parallel construction of nearly equal 
sized components which cannot be joined any longer if 
they exceed half of the capacity, Gavish [16] proposes 
consideration of dummy nodes which yield high sav- 
ings for any component joined with them. Thus, in this 
PSA with dummy nodes the number of joins between 
original components in one iteration is reduced by half 
the number of dummy nodes. 


Dual Procedures 


Dual procedures start with an infeasible low cost
solution, usually the MST solution. The violation of the
constraint(s) is iteratively reduced at the expense of a total
cost increase until the solution becomes feasible.

The start procedure of D. Elias and M.J. Ferguson
[10] examines every arc $(i, j)$ of any infeasible component.
If $(i, j)$ is deleted, the resulting noncentral component
$C_j$ is connected to another central component by an
arc $(k, l)$. This arc is chosen such that the total capacity
overflow is reduced; ties are broken in favor of the
smallest cost increase. The procedure deletes the
arc $(i, j)$ which leads to the minimal total cost increase
$c_{kl} - c_{ij}$ and adds $(k, l)$ to the solution. (As a modification,
the arc $(i, j)$ with the minimal ratio of cost increase to
capacity overflow reduction could be chosen.) Given integer
node weights, the procedure terminates with a feasible
solution after a finite number of iterations, because
in each iteration the total capacity overflow is reduced
by at least one unit.

Given a feasible solution, one can try to improve the
solution by recombining segments of the components
in a similar way [10]. To increase flexibility, consider
a slight modification: exchanging two arcs should be
allowed even if it increases the total capacity
overflow, provided that the cost of the solution does not
increase and the arc to be included has never been
in the solution before.

Dual procedures may well be related to other
concepts. For instance, a dual procedure may be seen as
a constructive savings procedure starting within the
infeasible region of the solution space. In that sense it
is related to metastrategies such as tabu search
(described below), as it performs a recovery
phase within a strategic oscillation approach.


Additional Procedures 


Besides construction, savings and dual procedures,
there are procedures using aggregation and
decomposition techniques combined with dynamic
programming. In addition, some heuristics which start
by generating a TSP tour do not fit into that
scheme.

Gouveia and J. Paixao [23] present two heuristics
for the CMST which are based on problem size
reduction by aggregation and decomposition techniques. In
the aggregation heuristic the nodes are clustered using
the EW (thus forming new nodes with higher and, in
general, nonidentical weights) until the resulting
aggregated problem is small enough to be solved to
optimality within practical time limits. The decomposition
heuristic creates, for each central arc of the MST
solution, a subproblem by considering only the nodes
of the respective subtree. Subproblems which are small
enough are solved to optimality. For the remaining
subproblems the aggregation heuristic is used.

Note that the above-mentioned sweep algorithm
might be classified as an aggregation procedure, too.

For the case of unit weights, K. Altinkemer and Gavish
[2] provide a modified PSA with a worst-case error
bound of $3 - 2/K$ and derive a bound of 4 for the case
of nonunit weights. First, a TSP tour is constructed and 
then it is partitioned into feasible subtrees by adding 
some central arcs and removing respective noncentral 
arcs. Note that the (noncenter) nodes of the resulting 
subtrees are always connected in the same order as in 
the TSP tour. 

In the case of unit weights a K-iterated tour
partitioning algorithm is used: K solutions are constructed.
In each solution the first subtree starts with the first
(noncenter) node of the TSP tour, and the second subtree
starts with node 2 (first solution), node 3 (second
solution), ..., node K + 1 (Kth solution). Apart from the
first and the last, each subtree contains exactly K nodes.
The best of these K solutions is chosen.
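A minimal Python sketch of the K-iterated partitioning step follows. It assumes the TSP tour is already given as a non-empty list of noncenter nodes, and, as an assumption borrowed from the path-shaped trees of the nonunit case, that each subtree is linked as a path from the center through its nodes in tour order. The names are illustrative.

```python
from typing import List, Sequence


def segment_cost(cost: List[List[float]], seg: Sequence[int]) -> float:
    """Cost of linking center 0 -> seg[0] -> seg[1] -> ... in tour order."""
    total = cost[0][seg[0]]
    for u, v in zip(seg, seg[1:]):
        total += cost[u][v]
    return total


def k_iterated_partition(cost: List[List[float]], tour: Sequence[int], K: int):
    """Unit weights: try the K shifted partitions of the tour, keep the cheapest.

    `tour` lists the noncenter nodes in TSP order (assumed non-empty).
    """
    best_segments, best_cost = None, float("inf")
    for first_len in range(1, K + 1):  # solution s: first subtree holds s nodes
        segments = [list(tour[:first_len])]
        for start in range(first_len, len(tour), K):  # then blocks of K nodes
            segments.append(list(tour[start:start + K]))
        total = sum(segment_cost(cost, s) for s in segments)
        if total < best_cost:
            best_segments, best_cost = segments, total
    return best_segments, best_cost
```

For example, with the cost matrix `[[0, 5, 6, 9], [5, 0, 2, 4], [6, 2, 0, 3], [9, 4, 3, 0]]`, tour [1, 2, 3] and K = 2, the two candidate partitions cost 14 ({1}, {2, 3}) and 16 ({1, 2}, {3}), so the first is returned.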

For nonunit weights a nearest insertion optimal
partitioning algorithm may be applied: in the nearest
insertion tour the nodes are renumbered according to their
position. Modified costs $c'_{ij}$ are computed as the cost of
a tree linking the center with node $i$, node $i$ with node
$i + 1$, etc., and node $j - 1$ with node $j$. If such a tree is
infeasible (due to capacity), the respective cost is set to
infinity. With these definitions, the shortest path with
respect to $c'_{ij}$ from the center to node $n$ represents the
optimal partitioning.
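The shortest-path formulation amounts to a simple dynamic program over tour prefixes. The Python sketch below is a minimal illustration. It assumes the tour is given as a list of noncenter nodes in visiting order (the nearest insertion construction itself is not shown), that node weights are nonnegative and each at most the capacity, and that $c'_{ij}$ is the path-shaped tree cost defined above. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def optimal_tour_partition(
    cost: List[List[float]],
    weight: Sequence[float],
    tour: Sequence[int],
    capacity: float,
) -> Tuple[float, List[List[int]]]:
    """Split `tour` into feasible consecutive segments of minimum total cost.

    best[j] is the shortest-path distance from the center to tour position j,
    where the arc covering positions i..j costs c'_ij (center -> i -> ... -> j).
    """
    n = len(tour)
    best = [0.0] + [math.inf] * n
    pred = [0] * (n + 1)
    for j in range(1, n + 1):
        load, inner = 0.0, 0.0
        for i in range(j, 0, -1):  # grow the segment tour[i-1 .. j-1] backwards
            load += weight[tour[i - 1]]
            if load > capacity:
                break  # c'_ij is infinite here and for every longer segment
            if i < j:
                inner += cost[tour[i - 1]][tour[i]]
            seg = cost[0][tour[i - 1]] + inner
            if best[i - 1] + seg < best[j]:
                best[j], pred[j] = best[i - 1] + seg, i - 1
    segments, j = [], n
    while j > 0:  # walk the shortest path back to the center
        segments.append(list(tour[pred[j]:j]))
        j = pred[j]
    segments.reverse()
    return best[n], segments
```

As a small check, with the cost matrix `[[0, 5, 6, 9], [5, 0, 2, 4], [6, 2, 0, 3], [9, 4, 3, 0]]`, unit weights and tour [1, 2, 3], capacity 2 gives cost 14 with segments [1] and [2, 3], and capacity 3 gives the single segment [1, 2, 3] at cost 10.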

In each procedure a final step can be added: The so- 
lution is improved by computing MSTs for the derived 
components. 


Improvement Procedures 


Improvement procedures for the CMST can be classi- 
fied as either local exchange procedures or second order 
procedures. 
Neighborhood Definition


Local exchange procedures start with a feasible solution 
and seek to improve it by modifying the current solu- 
tion in a prespecified way: Sets of arcs are included in 
or excluded from the solution. If more than one change 
of the solution is possible the best one (with respect to 
cost) is chosen. The procedure continues as long as im- 
provements are possible. 

Given a feasible solution, H. Frank et al. [13] ex- 
amine for every node i the following exchange: Con- 
nect i to its nearest neighbor not yet connected to i 
and remove the arc with highest cost from the result- 
ing cycle while still preserving feasibility. The exchange 
with greatest cost decrease is chosen as long as improve- 
ments are positive. The authors describe this procedure 
for a network design problem with variable arc capaci- 
ties and cost but it can be naturally applied to the special 
case with only one available capacity and fixed cost for 
each arc as in the CMST. 

Elias and Ferguson [10] try to improve the solution
by recombining segments of the components, i.e.,
deleting an arc and reconnecting the resulting noncentral
component without losing feasibility (cf. the
Whitney savings heuristic above).

The previously reported improvement procedures
alter a current solution by including or excluding arcs.
In contrast, a node exchange procedure transforms one
feasible solution into a neighbor solution by changing the
assignment of the nodes to the subtrees. Such a
transformation is called a move. A sequence of moves is
then performed in an attempt to find improved
solutions.

Starting from the EW solution, A. Amberg et al. [4]
consider two types of moves in their CMST procedure:
shift moves choose one node and shift it from its current
component to another one; exchange moves choose two
nodes belonging to different subtrees and exchange them.
Both types may be used simultaneously, but only
feasible moves are allowed, i.e., those leading again
to feasible solutions.
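Shift and exchange neighborhoods are easy to enumerate once a solution is stored as an assignment of nodes to subtrees. The Python sketch below is illustrative and is not the code of [4]. It assumes each subtree is re-optimized as an MST over its nodes plus the center (computed with a simple $O(K^2)$ Prim routine). Moves that would open a new subtree are not generated, and `solution_cost` recomputes every subtree for simplicity, whereas an efficient implementation would recompute only the two subtrees changed by a move.

```python
from typing import Dict, Iterator, List, Tuple

Move = Tuple[str, int, int]  # ("shift", node, target_subtree) or ("exchange", u, v)


def subtree_mst_cost(cost: List[List[float]], nodes) -> float:
    """Prim's algorithm on the given nodes plus the center 0."""
    rest = set(nodes)
    dist = {v: cost[0][v] for v in rest}  # cheapest link into the growing tree
    total = 0.0
    while rest:
        v = min(rest, key=dist.__getitem__)
        total += dist[v]
        rest.remove(v)
        for u in rest:
            dist[u] = min(dist[u], cost[v][u])
    return total


def solution_cost(cost: List[List[float]], assign: Dict[int, int]) -> float:
    groups: Dict[int, List[int]] = {}
    for v, t in assign.items():
        groups.setdefault(t, []).append(v)
    return sum(subtree_mst_cost(cost, g) for g in groups.values())


def feasible_moves(assign: Dict[int, int], weight: Dict[int, float], capacity: float) -> Iterator[Move]:
    load: Dict[int, float] = {}
    for v, t in assign.items():
        load[t] = load.get(t, 0) + weight[v]
    nodes = sorted(assign)
    for v in nodes:  # shift moves
        for t in sorted(load):
            if t != assign[v] and load[t] + weight[v] <= capacity:
                yield ("shift", v, t)
    for pos, u in enumerate(nodes):  # exchange moves
        for v in nodes[pos + 1:]:
            tu, tv = assign[u], assign[v]
            if tu != tv and (load[tu] - weight[u] + weight[v] <= capacity
                             and load[tv] - weight[v] + weight[u] <= capacity):
                yield ("exchange", u, v)


def apply_move(assign: Dict[int, int], move: Move) -> Dict[int, int]:
    kind, x, y = move
    new = dict(assign)
    if kind == "shift":
        new[x] = y
    else:  # exchange: swap the subtrees of x and y
        new[x], new[y] = assign[y], assign[x]
    return new
```

With unit weights, capacity 2 and the assignment {1: 1, 2: 2, 3: 2}, the generator yields two shift moves (node 2 or node 3 into subtree 1) and two exchange moves (node 1 with node 2 or node 3).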

A modified neighborhood definition involves cutting
a subtree from a given solution and pasting it
into another subtree or connecting it to the root node
[35]. Additional neighborhood structures are given in
[1]. Unlike the previous neighborhood structures, the
authors do not restrict a move to two subtrees
but consider a chain of moves performed simultaneously
(called cyclic exchanges and path exchanges). As a
consequence, the number of possible exchanges grows
exponentially with the problem size. Based on a shortest
path algorithm, some profitable exchanges may be
determined in a way which may be termed a Lin-Kernighan
neighborhood or ejection chain.


Second Order Algorithms 


Second order algorithms iteratively apply a slave
procedure to different start solutions (where some arcs
are fixed to be included) and/or modified cost matrices
(where prohibitively high cost has been assigned to
some arcs), thus forcing arcs into or out of the solution.
Savings procedures such as the EW or the PSA are applied as
slave procedures to complete the solution. In each
iteration, all possible modifications according to a given rule
are checked. The best one is realized and the respective
modifications are made permanent for the remaining
iterations. Two important second order algorithms are
inhibit and join [25].

The inhibit procedure examines, for every arc of the
current solution, the effect of excluding this arc by
applying the EW to a modified graph in which the cost of
the respective arc has been made prohibitively high. The
inhibition yielding the lowest cost solution is made
permanent (the arc is inhibited for the remaining
iterations) and the process is repeated until no further cost
reduction can be obtained. At most $O(n^2)$ iterations,
each with at most $O(n)$ inhibitions, have to be
considered.

The join procedure determines for every node $i$ its
nearest neighbor $i_1$ as well as the closest neighbor $i_2$
that is closer to the center than $i$ (if different from $i_1$). It
computes the effect on the cost of the solution if node $i$ is
directly connected to node $i_1$ or alternatively to node
$i_2$ (if this is not already the case in the current solution)
by applying the start procedure on a modified graph. The
join which produces the best solution is made
permanent and the procedure is repeated with this
solution. In each of the $O(n)$ iterations, $O(n)$ joins have to
be considered.

It should be noted that both procedures, inhibit and
join, are already look-ahead procedures (trying to
overcome shortsighted, myopic behavior). Both improvement
procedures can be used alone or in combination
with each other, performing one iteration of join
after an iteration of inhibit and vice versa. Combining
the procedures restricts the number of iterations to
$O(n)$ (from the join procedure), yielding a complexity of
$O(n^2)$ times the EW complexity.

The improvement procedure of [28] first
determines the MST solution and the EW solution.
Then the following iteration is performed.
Define $T$ as the set of arcs which are in the MST but
not in the EW solution. For every nonempty subset $S$ of
$T$, generate an (incomplete) solution including these arcs
(if this is feasible), then exclude all arcs of the remaining
subset $T \setminus S$ (by modifying the respective arc costs) and
complete the solution by applying the heuristic. Choose
the subset $S^*$ which yields the largest improvement and
permanently include these arcs into the solution. Repeat
this iteration with the modified set $T := T \setminus S^*$.

The min-exchange heuristic outlined in [17] starts
with any given feasible solution and determines for
every pair of components $C_p$ and $C_q$ the cheapest arc
$(i, j)$ connecting the two components. All arcs incident to
$i$ or $j$ are deleted, so that $C_p$ and $C_q$ are decomposed into two
noncentral single node components $C_i$ and $C_j$ and some
remaining components. Now the noncentral components
are connected with the center, using minimal cost
arcs. The PSA completes this modified solution. The
authors propose to split all components simultaneously.


Computational Results 


In the early CMST literature the EW has been found 
to perform best on average when compared to proce- 
dures with similar computation times. Therefore, even 
nowadays EW is taken as a benchmark to check the 
performance of other procedures. Kershenbaum and 
W. Chou [29] report that the unified algorithm run- 
ning with 3 to 10 different parameter combinations and 
correspondingly multiplied computation times yields 
1-5% improvement over EW. Unfortunately, no spe- 
cific parameter combination produces improvements 
in general. 

Gouveia and Paixao [23] find that the nearest insertion
optimal partitioning algorithm performs much worse
than EW on average, with some significant exceptions.
This shows that EW does not dominate on every single
problem instance.


Gavish and Altinkemer [16,18] report for test
problems with up to 400 nodes that the PSA yields
improvements of 2-4% in the unit weight case, but performs
poorly for nonidentical weights. In the latter case,
the min-exchange heuristic applied to the PSA solution
gives results comparable to those of EW [17]. Gavish
[16] reports that the PSA with dummy nodes attains
improvements over EW (up to 6% in some cases).
However, in the nonunit weight case EW still performs
better. Here, the PSA with a constant number of joins
gives consistently better results than EW. Gouveia and
Paixao [23] apply this variant of the PSA with the num- 
ber of joins varying between 1 (which is in fact the EW) 
and 12 on unit weight test problems with up to 200 
nodes: Significant improvements over EW with com- 
putation times raised by a factor of up to 250 are ob- 
tained. They also report that the (original) PSA per- 
forms best when the capacity is a power of 2 (in the 
unit weight case). Their aggregation heuristic on average
yields a slight improvement over EW (up to 3% in
some cases). In test problems that have the center in
the ‘middle’ of the rectangle containing the nodes, the
decomposition procedure has larger computation times
than the aggregation algorithm (by factors from slightly more
than 1 up to 3) and better results, whereas
in cases with the center in a ‘corner’ of the rectangle
both methods have similar running times and solutions
in almost all instances. Apart from a few cases the
PSA with constant number of joins (and varied param- 
eters) on average performs better than both procedures. 

M. Karnaugh [25] tests inhibit and join on problems
with up to 150 nodes. The combination of the
procedures gives 2-3% improvement over EW while the
running time is increased by a factor of 100 (derived for the
150-node problems). Applying only inhibit yields
slightly worse results.

Kershenbaum et al. [28] found that inhibit, join and
their own procedure yield improvements of around 2%
over EW. Thus their own procedure, which requires only 2 to
3 times the computation time of EW, outperforms
join.


Metaheuristics 


Given a local search mechanism, a metastrategy like
tabu search or simulated annealing acts as a guiding process:
it decides which of the possible moves is chosen and
forwards its decision to the application process, which then
executes the chosen move. In turn, the application process provides
information to the guiding process (depending on the
requirements of the respective metastrategy), such as the
recomputed set of possible moves.

Unlike the improvement procedures reported
in the previous section, the cost of a new solution may exceed
the cost of the previous one. Moves leading to a cost in- 
crease are allowed in order to overcome local optima. 
Which of the available feasible moves should be chosen 
to transform the current solution? The answer to this 
question is not clear and various approaches may lead 
to good solutions. The guiding process may use, e. g., 
the two metastrategies simulated annealing and tabu 
search. 

Simulated annealing (SA) randomly chooses one 
of the feasible moves and its change in cost is com- 
puted. If the change is a cost decrease the move is per- 
formed. Otherwise, the new solution is accepted with 
a certain probability. The acceptance probability is usually
an exponential function of the cost increase and, to favor
good solutions, decreases as the cost increase grows. It
also decreases with the number of iterations already
performed, thus intensifying the search in the current area
of the solution space as the execution time grows.
A parameter called start temperature has to be
specified to adapt the probability function to the actual 
problem. SA does not require any additional informa- 
tion. If the new solution is rejected the current solution 
remains unchanged in this step. The next iteration tries 
again to alter the same solution. Simulated annealing 
implementations for the CMST are given in [4,6]. 
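The acceptance rule can be sketched in a few lines of Python. This is a generic sketch, not the implementation of [4,6]. It assumes the widely used Metropolis rule, which accepts a cost increase Δ with probability exp(−Δ/T), and a geometric cooling schedule T ← αT; both are assumptions here. `random_neighbor` stands for a hypothetical problem-specific routine that returns the solution produced by a random feasible move, or None if no move is available.

```python
import math
import random
from typing import Callable, Optional, Tuple, TypeVar

S = TypeVar("S")


def accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis rule (assumed): accept non-worsening moves, else with exp(-delta/T)."""
    if delta <= 0:
        return True
    if temperature <= 0:
        return False  # at zero temperature only non-worsening moves are accepted
    return rng.random() < math.exp(-delta / temperature)


def simulated_annealing(
    initial: S,
    random_neighbor: Callable[[S, random.Random], Optional[S]],
    cost: Callable[[S], float],
    t0: float,
    iterations: int,
    alpha: float = 0.95,
    seed: int = 0,
) -> Tuple[S, float]:
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t0  # the "start temperature" parameter
    for _ in range(iterations):
        candidate = random_neighbor(current, rng)
        if candidate is not None:
            delta = cost(candidate) - current_cost
            if accept(delta, temperature, rng):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
            # if rejected, the next iteration tries to alter the same solution
        temperature *= alpha  # geometric cooling (assumed schedule)
    return best, best_cost
```

With a start temperature of 0 the sketch degenerates to a randomized descent that never accepts a cost increase, which illustrates why a positive start temperature is needed to leave local optima.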

Tabu search (TS) examines all feasible moves. The
best move (leading to the highest cost decrease or the
lowest cost increase, respectively) is chosen and
performed. Now suppose that a local optimum is reached.
Without further instructions the procedure could
permanently alternate between this local optimum and its
best neighbor. For that reason a so-called tabu list is
created: to prevent an already explored solution from
being examined again, all moves that (could) lead to such
a solution are stored in the tabu list. Which moves have to
be set tabu is derived from the running list (RL), which contains
all performed moves in their sequence of execution. Both
lists have to be updated after each iteration.
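A static tabu search (STS) can be sketched generically. In the illustrative Python sketch below, which is not the method of [4] or [35], each move carries an attribute and a reverse attribute. The reverse attributes of the last `tenure` performed moves (the tail of the running list) form the tabu list, so undoing a recent move is forbidden. The neighborhood function and the attribute encoding are assumptions; a CMST application would plug in shift and exchange moves.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")
Neighbor = Tuple[Hashable, Hashable, S]  # (move attribute, reverse attribute, new solution)


def static_tabu_search(
    initial: S,
    neighbors: Callable[[S], Iterable[Neighbor]],
    cost: Callable[[S], float],
    tenure: int = 5,
    iterations: int = 100,
):
    current = initial
    best, best_cost = initial, cost(initial)
    tabu = deque(maxlen=tenure)  # reverse attributes of the last `tenure` moves
    running_list: List[Hashable] = []  # all performed moves, in order of execution
    for _ in range(iterations):
        chosen = None
        for attr, reverse, cand in neighbors(current):
            if attr in tabu:
                continue  # this move would undo a recent one
            c = cost(cand)
            if chosen is None or c < chosen[0]:
                chosen = (c, attr, reverse, cand)  # best move, even if it worsens cost
        if chosen is None:
            break  # every available move is tabu
        c, attr, reverse, current = chosen
        running_list.append(attr)
        tabu.append(reverse)  # the deque drops its oldest entry automatically
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost, running_list
```

For example, minimizing (x − 7)² over the integers with moves x → x ± 1 and tenure 1, the search reaches 7 after seven moves and then, since the move back to 6 is tabu, continues to 8; this is why the best solution found so far is recorded separately from the current one.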

In the literature there are several distinct ways of
deriving the tabu list. They are referred to, e.g., as static
tabu search (STS), the reverse elimination method (REM) and
the cancellation sequence method (CSM) (see [4] for the
CMST). For STS and CSM some parameters have to
be specified to adapt the methods to a specific problem
and to problem instances (especially problem size and
scaling of cost).

The storage complexity of the application process
is $O(n^2)$ and the time complexity is $O(K^2)$ per iteration
because of the recomputation of MSTs in the changed
components. Using simulated annealing, the time
complexity of the guiding process is $O(K^2)$: to compute
the probability of acceptance for the new solution,
two new subtrees have to be computed. This is part
of the application process and need not be performed
twice. Thus, additional effort only arises if a solution is
rejected, which does not influence the overall complexity.
The storage complexity is not raised either if simulated
annealing is used.

The complexity of the guiding process depends on 
the special tabu search method. Different tabu search 
implementations are described in [4,35]. Whereas [4]
seems to provide better results than [35] for the benchmark
instances with up to 80 nodes, both seem to
be outperformed by the more recent (as of 2000)
algorithm in [1], which is based on more powerful
neighborhood structures.

Besides TS and SA additional modern heuristic 
search concepts have been investigated for the CMST. 
A neural network approach is investigated in [33]. 
A GRASP implementation is provided in [34]. The
results for both approaches seem to lag behind some of
those described in the previous paragraphs.


Problem Modifications and Related Problems 


In addition to arc costs, one may take into
account unreliable arcs and node outage costs, which are
incurred by the user whenever a terminal node is
unable to communicate with the central node, i.e., costs
associated with link failures [9].

An interesting modification of the CMST is the
resource-constrained minimum spanning tree problem
in directed graphs [12]. Here each node, say $i$, has a
certain amount of scarce resources available (a capacity)
which may be used to fulfill the capacity requirements of
all arcs leaving $i$. Instead of measuring capacity
requirements for subgraphs off the root node, the
constraints here are restricted to the set of arcs leaving
a node. The current state of the art for solving this
problem is a branch and cut approach [12].

In practice we might be faced with the problem that 
a solution of the design phase need not be a tree but 
a forest with more than one root node. Most of the 
approaches developed for the CMST might be applied 
in a slightly modified way to this so-called multicenter 
CMST (see e. g. [3] for an extension of the partitioning 
heuristics with corresponding worst-case bounds). 

When multiple centers are considered in arc ori- 
ented vehicle routing then the capacitated arc routing 
problem (CARP) may be transformed in a way that sub- 
problems are successively solved as CMST. Amberg et 
al. [5] develop this transformation and apply their TS 
and SA approaches to this multiple center CARP. 

Besides solving the CMST as a pure combinato- 
rial optimization problem it may also be embedded 
into a problem of users with traffic requirements who 
have to build contracts with, e.g., a telephone com- 
pany for the provision of service. This may lead to 
the consideration of some game-theoretic concepts as- 
sociated with a cost allocation problem arising from 
the CMST or more general capacitated network design 
problems [37]. 


Conclusions 


In this paper we have provided a survey on existing 
methods for solving the CMST. 

With respect to the algorithmic concepts considered, it
might be interesting to incorporate some sort of
exact or heuristic reduction techniques.


See also 


> Bottleneck Steiner Tree Problems 
> Directed Tree Networks 

> Minimax Game Tree Searching 
> Shortest Path Tree Algorithms 
Constantin Carathéodory, a mathematician of Greek
origin, was born in Berlin on September 13, 1873 and 
died on February 2, 1950, in Munich, Germany. He 
made important contributions to the theory of real 
functions, to the calculus of variations, and to measure 
theory. 

He first studied at the Military School in Brussels,
where he received a solid mathematical background.
After two years as an assistant engineer with the British
Asyut Dam project in Egypt, Carathéodory began his
study of mathematics at the Univ. of Berlin in 1900,
where he attended the courses of L. Fuchs, G. Frobenius
and H. Schwarz. He was particularly influenced by
the lectures of Schwarz, of whom he became a close friend.
In 1902 he entered the Univ. of Göttingen, where he
received his PhD [1] under the German mathematician
H. Minkowski. In 1909 he became a full Professor at
the Univ. of Hannover. In 1913 he obtained the chair
previously held by F. Klein in Göttingen, and in 1918
he succeeded Frobenius at the Univ. of Berlin. Then,
in 1920, he agreed to help the Greek Government
create the Univ. of Smyrna, Asia Minor, which then
belonged to the Greeks. When the Turks razed Smyrna
in 1922, Carathéodory managed to save the university
library, which he moved to the Univ. of Athens, where
he taught until 1924. He was then appointed professor
of mathematics at the Univ. of Munich.

Carathéodory made important contributions to var- 
ious branches of mathematics. In the calculus of vari- 
ations, besides a comprehensive study of discontinu- 
ous solutions, which was contained in his PhD thesis, 
he also added important results linking the theory with 
first order partial differential equations. His work on 
the problems of variation of m-dimensional surfaces in 
an n-dimensional space marked the first far-reaching 
results for the general case. He also applied the calcu- 
lus of variations to specific problems of mechanics and 
physics. He contributed important findings in his book 
[6]. The theory of functions and measure theory are 
two additional areas where the work of Carathéodory 
is very important. His book [3] is a classic of the field. 
In the theory of functions of several variables he sim- 
plified the proof of the main theorem of conformal rep- 
resentation of simply connected regions on the unit- 
radius circle. His investigations of the geometrical-set 
theoretic properties of boundaries resulted in his the- 
ory of boundary correspondence. Already in 1909 he 
published a far-reaching paper on the foundations of 
thermodynamics [2]. The paper remained unnoticed by
physicists because it was published in a mathematical
journal. It was only in 1921 that M. Born brought the paper to
the attention of the physics community, and since then
the paper and the Carathéodory principle have become
classics. He also contributed to Einstein’s special theory of
relativity. His published works include [4,5,7,8,9]. 
One of the basic results [3] in convexity, with many
applications in different fields. In principle it states that
every point in the convex hull of a set $S \subseteq \mathbb{R}^n$ can be
represented as a convex combination of a finite number
($n + 1$) of points in the set $S$. See for example [1,4,6,7,9,10].
Generalizations of the theorem can be found in [2]
and [5].


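Theorem 1 Let $S$ be any subset of $\mathbb{R}^n$.
For every $x \in \operatorname{conv}(S)$ (the convex hull of $S$), there
exist $n + 1$ points $x_0, \dots, x_n \in S$ such that
$x \in \operatorname{conv}(x_0, \dots, x_n)$.

Proof Since $x \in \operatorname{conv}(S)$, there exists a representation
$$x = \sum_{i=0}^{k} \alpha_i x_i, \qquad x_i \in S,\ \alpha_i \ge 0 \ \text{for } i = 0, \dots, k, \qquad \sum_{i=0}^{k} \alpha_i = 1.$$
If $k \le n$, we are finished.

Now suppose $k > n$. Then the $k > n$ vectors $x_1 - x_0, \dots, x_k - x_0$ in $\mathbb{R}^n$ are linearly dependent, so there exist scalars $\lambda_1, \dots, \lambda_k$, not all zero, such that $\sum_{i=1}^{k} \lambda_i (x_i - x_0) = 0$.

Now let $\lambda_0 = -\sum_{i=1}^{k} \lambda_i$. It then follows that $\sum_{i=0}^{k} \lambda_i x_i = 0$ and $\sum_{i=0}^{k} \lambda_i = 0$; since the $\lambda_i$ are not all zero and sum to zero, at least one $\lambda_i > 0$. So we have
$$x = \sum_{i=0}^{k} \alpha_i x_i - \gamma \cdot 0 = \sum_{i=0}^{k} \alpha_i x_i - \gamma \sum_{i=0}^{k} \lambda_i x_i = \sum_{i=0}^{k} (\alpha_i - \gamma \lambda_i)\, x_i$$
for any $\gamma \in \mathbb{R}$.
Choose $\gamma$ in the following way:
$$\gamma = \min\left\{ \frac{\alpha_i}{\lambda_i} : \lambda_i > 0 \right\} = \frac{\alpha_j}{\lambda_j}$$
for some $j \in \{0, \dots, k\}$, so that $\alpha_i - \gamma \lambda_i \ge 0$ for all $i = 0, \dots, k$.

Then we obtain $x = \sum_{i=0}^{k} (\alpha_i - \gamma \lambda_i)\, x_i$ with $\alpha_i - \gamma \lambda_i \ge 0$ for $i = 0, \dots, k$, $\sum_{i=0}^{k} (\alpha_i - \gamma \lambda_i) = 1$ (because $\sum_{i=0}^{k} \lambda_i = 0$) and $\alpha_j - \gamma \lambda_j = 0$.

And so $x$ is represented as a convex combination of at most $k$ points in $S$. We can now repeat these steps until $k \le n$.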
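The proof is constructive and translates directly into a reduction procedure. The Python sketch below is illustrative (the function names are hypothetical). It uses exact rational arithmetic (`fractions.Fraction`) and, instead of working with the differences $x_i - x_0$, finds $\lambda$ directly as a nonzero solution of $\sum_i \lambda_i x_i = 0$, $\sum_i \lambda_i = 0$, which is equivalent. It assumes the input coefficients are nonnegative and sum to 1.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def _null_vector(rows: List[List[Fraction]], ncols: int) -> List[Fraction]:
    """Nonzero z with rows @ z = 0, via reduced row echelon form.

    Requires ncols > rank(rows), which holds whenever ncols > len(rows).
    """
    m = [row[:] for row in rows]
    pivots: List[int] = []
    r = 0
    for c in range(ncols):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free column
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(ncols) if c not in pivots)
    z = [Fraction(0)] * ncols
    z[free] = Fraction(1)  # set one free variable to 1, the others to 0
    for row, c in enumerate(pivots):
        z[c] = -m[row][free]
    return z


def caratheodory(points: Sequence[Sequence], alphas: Sequence) -> Tuple[List[List[Fraction]], List[Fraction]]:
    """Reduce x = sum alpha_i x_i (alpha_i >= 0, sum = 1) to at most n + 1 points."""
    pts = [[Fraction(v) for v in p] for p in points]
    a = [Fraction(v) for v in alphas]
    n = len(pts[0])
    while True:
        keep = [i for i, v in enumerate(a) if v != 0]  # drop zero coefficients
        pts, a = [pts[i] for i in keep], [a[i] for i in keep]
        k = len(pts) - 1
        if k <= n:
            return pts, a
        # lambda with sum_i lambda_i x_i = 0 and sum_i lambda_i = 0
        rows = [[pts[i][d] for i in range(k + 1)] for d in range(n)]
        rows.append([Fraction(1)] * (k + 1))
        lam = _null_vector(rows, k + 1)
        # lam is nonzero and sums to 0, so some lam_i > 0
        gamma = min(a[i] / lam[i] for i in range(k + 1) if lam[i] > 0)
        a = [a[i] - gamma * lam[i] for i in range(k + 1)]  # a_j becomes exactly 0
```

For instance, the center (1/2, 1/2) of the unit square, given as the average of its four corners, is reduced to the combination (1/2)(1, 0) + (1/2)(0, 1).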
Why the Checklist Paradigm?


Classical logic deals with two logical values, ‘truth’
and ‘falsity’. It can be characterised algebraically and
semantically by a Boolean algebra. Not all issues of logic
can, however, be settled by a system of classical
two-valued logic. For example, some modalities, such as
necessity and possibility, cannot in general be expressed
in any system that admits only a finite number of
logical values. Also, some temporal logics [33] characterising
time require an infinite number of logical values in
their semantics.


Fuzzy Logics 


Many-valued logic algebras are needed for developing
the mathematics of fuzzy relations [28] and sets [18].
For example, in order to compute the degree $\delta$ to which
two fuzzy sets intersect, we use the formula
$\delta_{A \cap B}(x) = \delta_A(x) \wedge \delta_B(x)$, where $\wedge$ is a many-valued ‘AND’
connective and $\delta_A(x)$, $\delta_B(x)$ are some logical values: either
truth-values, possibilities, probabilities, etc. Depending
on the epistemological interpretation of the logical
values, we read the statement $\delta_A(x)$ as “The degree to which
it is true that $x \in A$”, “The degree to which it is possible
that $x \in A$”, “The degree to which it is probable that
$x \in A$”, etc.

Computing the degree of inclusion of two sets [2] is
done by the formula $\delta(A \subseteq B) = (\forall x)\,\big(\delta_A(x) \to \delta_B(x)\big)$,
where $x$ ranges over elements of the universe $U$ from
which the elements of $A$ and $B$ are drawn. Here $\to$ is
a many-valued implication operator.


Approximate Reasoning 


Many-valued logic systems are also required for the
algebraic characterization of logics of approximate
reasoning. The premises of an inference (i.e. the antecedent
formulas that form the arguments of the rules of
approximate inference) are used by the rules to generate
the succedent formulas, i.e., the conclusion(s).

If each of these logic formulas attains as its logic
value a single value from some lattice, we speak of
a point-based logic system of approximate reasoning. If
the logic value is a whole interval $[\delta_1, \delta_2]$ such that
$\delta_1 \le \delta_2$, it is an interval logic.

Hence, many-valued logics play a key role in all the
areas of mathematics and logic discussed above. There
is not one many-valued logic; there is an infinite number
of families of logic systems of various kinds. Hence,
according to the purpose of its use, one has to choose
an appropriate many-valued system. But even after the
choice is made, two key questions still remain:

• Where do the logic values come from?

• Is there any basic epistemic or semantic procedure
by which the basic logic connectives can be
meaningfully derived?

These questions are answered by the checklist paradigm. 


Many-Valued Logics in Fuzzy Sets 


The theory of fuzzy sets and relations requires a many- 
valued logic in which to manipulate the degrees of 
truth which attach to fuzzy statements. As in classical 
two-valued logic (in which the statements are judged 
to be either utterly true or utterly false), one wishes 
a truth-functional connection between the truth values
assigned to ‘p’ and to ‘q’ and those to be assigned to
‘p or q’, ‘p and q’ and ‘if p then q’, as well as to ‘not-p’
and ‘not-q’; that is, one wishes the evaluation of the
derived formulas to depend solely on the evaluation of
the original formulas, without further reference to their
contents.

There are a number of such many-valued logical 
systems, with truth values in the closed real interval 
[0, 1]. Everyone agrees that the values assigned in the 
crisp ‘corners’, where the values |p| of p and |q| of q are 
zero (false) or one (true), must accord with the classical 
Boolean logic. Most agree in setting 


$|\text{not-}p| = |\neg p| = 1 - |p|$


and the most usual ‘or’ and ‘and’ connectives are given 
by 


$|p \text{ or } q| = |p \vee q| = \max(|p|, |q|),$
$|p \text{ and } q| = |p \wedge q| = \min(|p|, |q|),$


although others have been proposed and have something to be said for them.

Selecting max and min as the functions for computing the logical values of the connectives $\vee$ and $\wedge$ does not yet determine the system of many-valued logic fully. Indeed, a number of different systems employ these. A third determining factor is the choice of the implication operator $\to$. Some frequently used implication operators are listed below.


Checklist Paradigm Semantics for Fuzzy Logics, Table 1 
Some important many-valued implication operators 


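No. | Opr. | Name | Definition
2 | S | Standard Strict | $a \to_2 b = 1$ if $a \le b$; $0$ otherwise
3 | S* | Gödel | $a \to_3 b = 1$ if $a \le b$; $b$ otherwise
4 | G43 | Product implication (also: Goguen-Gaines) | $a \to_4 b = \min(1, b/a)$
4' | G43' | Modified G43 | $a \to_{4'} b = \min\left(1, \frac{b}{a}, \frac{1-a}{1-b}\right)$
5 | L | Łukasiewicz | $a \to_5 b = \min(1, 1 - a + b)$
5.5 | KDL | Reichenbach | $a \to_{5.5} b = \min(1, 1 - a + ab)$
6 | KD | Kleene-Dienes | $a \to_6 b = (1 - a) \vee b$
7 | EZ | Early Zadeh | $a \to_7 b = (a \to_6 b) \wedge \kappa a$, where $\kappa a = (1 - a) \vee a$
8 | W | Willmott | $a \to_8 b = (a \to_7 b) \wedge \kappa b$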
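A minimal sketch of the operators in Table 1 as plain Python functions, for experimenting with their values. The function names are ours. The code uses exact `Fraction` arithmetic to avoid rounding noise. For →4 and →4′ we assume the operator returns 1 whenever a ≤ b. This avoids the zero denominators and agrees with the min(...) formulas wherever those are defined.

```python
from fractions import Fraction

# Implication operators of Table 1 for truth values a, b in [0, 1].
# Assumption: ->4 and ->4' return 1 whenever a <= b, which avoids the
# zero denominators and agrees with the min(...) formulas elsewhere.

def kappa(x):
    """kappa x = (1 - x) v x, used by the Early Zadeh and Willmott operators."""
    return max(1 - x, x)

def standard_strict(a, b):   # ->2, S
    return 1 if a <= b else 0

def goedel(a, b):            # ->3, S*
    return 1 if a <= b else b

def goguen(a, b):            # ->4, G43
    return 1 if a <= b else b / a

def modified_goguen(a, b):   # ->4', G43'
    return 1 if a <= b else min(b / a, (1 - a) / (1 - b))

def lukasiewicz(a, b):       # ->5, L
    return min(1, 1 - a + b)

def reichenbach(a, b):       # ->5.5, KDL
    return min(1, 1 - a + a * b)

def kleene_dienes(a, b):     # ->6, KD
    return max(1 - a, b)

def early_zadeh(a, b):       # ->7, EZ
    return min(kleene_dienes(a, b), kappa(a))

def willmott(a, b):          # ->8, W
    return min(early_zadeh(a, b), kappa(b))

OPERATORS = {"2": standard_strict, "3": goedel, "4": goguen, "4'": modified_goguen,
             "5": lukasiewicz, "5.5": reichenbach, "6": kleene_dienes,
             "7": early_zadeh, "8": willmott}

a, b = Fraction(1, 2), Fraction(1, 4)
for name, op in OPERATORS.items():
    print(f"a ->{name} b = {op(a, b)}")
```

On the crisp corners all nine operators reduce to Boolean implication. They differ only inside the unit square. For example, at a = 1/2, b = 1/4 the Gödel value is 1/4, while the Łukasiewicz value is 3/4.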


Not only the properties of the many-valued logic systems but also those of the systems of fuzzy sets crucially depend on the choice of the implication operator. For example, both the definition of a fuzzy power set (i.e. the set of all subsets) and that of the fuzzy set-inclusion operator depend on this choice. The very first paper on fuzzy sets by L.A. Zadeh [44,45] uses the max and min connectives to define the intersection $\cap$ and the union $\cup$ of two fuzzy sets. Zadeh defines the set inclusion operator by the formula

$$\mu(A \subseteq B) = 1 \iff (\forall x)\; \mu_A(x) \le \mu_B(x).$$


Using the ‘Standard Strict’ $\to_2$ in the formula given in the first section above, we obtain

$$\delta(A \subseteq B) = (\forall x)\,\big(\mu_A(x) \to_2 \mu_B(x)\big) = \min_{x \in U}\big(\mu_A(x) \to_2 \mu_B(x)\big).$$


This formula is equivalent to Zadeh’s early definition of fuzzy set inclusion, which is in fact crisp (nonfuzzy). Power set theories with proper fuzzy set inclusion were first investigated in [2,43] (using the implication operators listed in the table above).

Since 1965, when Zadeh wrote the first paper on fuzzy sets, not only max and min but also other many-valued logic connectives have been used to define the union $\cup$ and the intersection $\cap$ of fuzzy sets. An important pair are the so-called ‘bold connectives’: $a \wedge_5 b = \max(0, a + b - 1)$ and $a \vee_5 b = \min(1, a + b)$. As the subscript indicates, these are related to the Łukasiewicz implication operator $\to_5$. They underlie the so-called MV algebras, which play an important role in applications of fuzzy sets in quantum logics [35] and elsewhere. Both types of connectives, the ‘max-min’ pair and the ‘bold’ pair, are special instances of the so-called triangular norms (t-norms) and conorms (t-conorms) [17,36]. These associative operations with special properties, defined on [0, 1], algebraically characterise the whole infinite family of OR-AND pairs of many-valued logic connectives and play a crucial role in the theory and applications of fuzzy sets.
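The claim that both the max-min pair and the bold pair are t-norm/t-conorm pairs can be spot-checked numerically. The sketch below tests the axioms on a finite grid of exact rationals. It is an illustration, not a proof. The helper name `satisfies_axioms` and the grid spacing are our own choices.

```python
from fractions import Fraction
from itertools import product

def bold_and(a, b):   # a ∧5 b
    return max(0, a + b - 1)

def bold_or(a, b):    # a ∨5 b
    return min(1, a + b)

GRID = [Fraction(k, 10) for k in range(11)]   # exact rationals, no rounding noise

def satisfies_axioms(op, neutral):
    """Finite check on GRID of the t-norm (neutral=1) or t-conorm (neutral=0)
    axioms: commutativity, associativity, monotonicity, neutral element."""
    for x, y, z in product(GRID, repeat=3):
        if op(x, y) != op(y, x) or op(x, op(y, z)) != op(op(x, y), z):
            return False
        if y <= z and op(x, y) > op(x, z):
            return False
    return all(op(x, neutral) == x for x in GRID)

print(satisfies_axioms(min, 1), satisfies_axioms(max, 0))
print(satisfies_axioms(bold_and, 1), satisfies_axioms(bold_or, 0))
# De Morgan duality under 1 - x, and the link to Lukasiewicz implication:
# a ->5 b = min(1, 1 - a + b) = (1 - a) ∨5 b
print(all(bold_or(x, y) == 1 - bold_and(1 - x, 1 - y) for x, y in product(GRID, repeat=2)))
print(all(bold_or(1 - x, y) == min(1, 1 - x + y) for x, y in product(GRID, repeat=2)))
```

Unlike min, the bold conjunction is not idempotent: `bold_and(1/2, 1/2)` is 0. This is one of the ways the two pairs lead to different fuzzy set theories.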


The Checklist Paradigm 


The checklist paradigm provides the mechanism by 
which several types of very different families of many- 
valued logic connectives emerge from some more basic 
considerations. 
• It provides the semantics of systems that use a single value as their logic value.
• It provides the justification for interval logics.
• It provides a link between many-valued logic connectives and generalized quantifiers.


Mathematics of the Checklist Paradigm


A checklist template $Q$ is a finite family of properties $(P_1, \ldots, P_n)$. With a template $Q$ and a given proposition $A$, one can associate a specific checklist $Q_A = (Q, A)$. A valuation $f_A$ of a checklist $Q_A$ is a function from $Q$ to $\{0, 1\}$.

The value $a_Q$ of the proposition $A$ with respect to a template $Q$ (which is the summarised value of the valuation $f_A$) is given by the formula

$$a_Q = \frac{1}{n} \sum_{i=1}^{n} p_i^A,$$

where $n = \operatorname{card} Q$ and $p_i^A = f_A(P_i)$.

A fine valuation structure of a pair of propositions $A$, $B$ with respect to the template $Q$ is a function $f_{A,B}$ from $Q$ into $\{0, 1\} \times \{0, 1\}$ assigning to each attribute $P_i$ the ordered pair of its values $(p_i^A, p_i^B)$.

Let $\alpha_{jk}$ be the cardinality of the set of all attributes $P_i$ such that $f_{A,B}(P_i) = (j, k)$.

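Obviously the values satisfy the constraint $\alpha_{00} + \alpha_{01} + \alpha_{10} + \alpha_{11} = n$. Further, we define $r_0 = \alpha_{00} + \alpha_{01}$, $r_1 = \alpha_{10} + \alpha_{11}$, $c_0 = \alpha_{00} + \alpha_{10}$, $c_1 = \alpha_{01} + \alpha_{11}$.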

These entities can be displayed systematically in a contingency table. In such a table, the inner fine-summarization structure consists of the four $\alpha_{jk}$, appropriately arranged, and the margins $c_0, c_1, r_0, r_1$ (see Fig. 1).
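To make the bookkeeping concrete, the sketch below builds the fine structure α_jk and its margins from two valuations over the same template. The valuations `f_A` and `f_B` are hypothetical, and the function names are ours. Note that a = r1/n is exactly the average of the valuation of A, that is, a_Q.

```python
from collections import Counter
from fractions import Fraction

def fine_structure(f_A, f_B):
    """alpha[(j, k)] = number of properties P_i with (f_A(P_i), f_B(P_i)) = (j, k)."""
    if len(f_A) != len(f_B):
        raise ValueError("both valuations must cover the same template Q")
    counts = Counter(zip(f_A, f_B))
    return {(j, k): counts[(j, k)] for j in (0, 1) for k in (0, 1)}

def margins(alpha):
    """Coarse structure: row totals (r0, r1) for A, column totals (c0, c1) for B, and n."""
    r = (alpha[(0, 0)] + alpha[(0, 1)], alpha[(1, 0)] + alpha[(1, 1)])
    c = (alpha[(0, 0)] + alpha[(1, 0)], alpha[(0, 1)] + alpha[(1, 1)])
    return r, c, sum(r)

# Hypothetical valuations of A and B over a five-property template.
f_A = [1, 1, 0, 1, 0]
f_B = [1, 0, 0, 1, 1]
alpha = fine_structure(f_A, f_B)
(r0, r1), (c0, c1), n = margins(alpha)
a, b = Fraction(r1, n), Fraction(c1, n)   # a = r1/n and b = c1/n, as in Fig. 1
print(alpha, (r0, r1), (c0, c1), a, b)
```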

Now let $F$ be any logical propositional function of propositions $A$ and $B$. For $i, j \in \{0, 1\}$, let $f(i, j)$ be the classical truth value of $F$ for the pair $i, j$ of truth values, and let $u(i, j) = u_{ij} = \alpha_{ij}/n$, the ratio of the number in the $ij$-cell of the constraint table to the grand total. Then we define the (non-truth-functional) fuzzy assessment of the truth of the proposition $F(A, B)$ to be

$$m(F(A, B)) = \sum_{i,j} f(i, j)\, u_{ij}.$$


             | No for B      | Yes for B     | Row total
No for A     | $\alpha_{00}$ | $\alpha_{01}$ | $r_0$
Yes for A    | $\alpha_{10}$ | $\alpha_{11}$ | $r_1$
Column total | $c_0$         | $c_1$         | $n$


Checklist Paradigm Semantics for Fuzzy Logics, Figure 1
Checklist paradigm of the assignment of fuzzy values. Define: $a = r_1/n$; $b = c_1/n$


This assessment operator will be called the contrac- 
tion/approximation measure. 
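A minimal sketch of the contraction/approximation measure m(F) = Σ f(i, j) u_ij, where a connective is given by its classical truth table. The cell counts in `alpha` are hypothetical. For implication the measure reduces to 1 − u_10, and for conjunction it reduces to u_11.

```python
from fractions import Fraction

def assess(f, alpha):
    """Checklist assessment m(F(A, B)) = sum_ij f(i, j) * u_ij with u_ij = alpha_ij / n.
    `f` is the classical truth table of F; `alpha` maps (i, j) to the cell count."""
    n = sum(alpha.values())
    return sum(f(i, j) * Fraction(alpha[(i, j)], n) for i in (0, 1) for j in (0, 1))

def implies(i, j):   # classical A -> B
    return int(not i or j)

def conj(i, j):      # classical A AND B
    return i & j

alpha = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 2}   # hypothetical cell counts, n = 5
print(assess(implies, alpha))   # 1 - u_10
print(assess(conj, alpha))      # u_11
```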

The four interior cells $\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}$ of the constraint table constitute its fine structure; the margins $r_0, r_1, c_0, c_1$ constitute its coarse structure (see Fig. 1).

The fine structure gives us the appropriate fuzzy as- 
sessments for all propositional functions of A and B; the 
coarse structure gives us only the fuzzy assessments of 
A and B themselves. Our central question is: 


to what extent can the fine structure be reconstructed from the coarse?


As shown elsewhere [3,5,6,8], the coarse structure imposes bounds upon the fine structure without determining it completely. Hence, each logical connective between propositions has associated extreme values.

There are four extremes that the fine structure of the contingency table (see Fig. 1) can attain [6,8]:

i) the two mindiag fine structures with the diagonal values minimized ($\alpha_{00} = 0$ or $\alpha_{11} = 0$); and
ii) the two maxdiag fine structures with the diagonal values maximized ($\alpha_{01} = 0$ or $\alpha_{10} = 0$).

Thus we obtain the inequality restricting the possible values of $m(F)$:

$$\mathrm{con}_{\mathrm{top}} \ge m(F) \ge \mathrm{con}_{\mathrm{bot}},$$


where ‘con’ is the name of the connective represented by $f(i, j)$. Choosing implication as the logical type of the connective ‘con’ and assessing the fuzzy truth value of a proposition by the formula $m_1(F) = 1 - u_{10}$ (the measure $m$ above applied to implication), we obtain:

$$\min(1, 1 - a + b) \ge m_1(A \to B) \ge \max(1 - a, b).$$
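These bounds can be confirmed by brute force. The sketch enumerates every fine structure compatible with a given coarse structure and records the smallest and largest value of m1(A → B). Once α11 is chosen, the margins fix the other three cells, so the enumeration runs over α11 only. The sizes n = 10, r1 = 6, c1 = 3 are arbitrary illustration values.

```python
from fractions import Fraction

def fine_structures(n, r1, c1):
    """Yield every 2x2 table with grand total n, row total r1 (A true) and
    column total c1 (B true); x = alpha_11 is the only free cell."""
    for x in range(max(0, r1 + c1 - n), min(r1, c1) + 1):
        yield {(1, 1): x, (1, 0): r1 - x, (0, 1): c1 - x, (0, 0): n - r1 - c1 + x}

def implication_interval(n, r1, c1):
    """(smallest, largest) value of m1(A -> B) = 1 - alpha_10 / n."""
    values = [1 - Fraction(t[(1, 0)], n) for t in fine_structures(n, r1, c1)]
    return min(values), max(values)

n, r1, c1 = 10, 6, 3                      # illustrative sizes only
a, b = Fraction(r1, n), Fraction(c1, n)
bot, top = implication_interval(n, r1, c1)
print(bot, max(1 - a, b))                 # lower bound vs Kleene-Dienes
print(top, min(1, 1 - a + b))             # upper bound vs Lukasiewicz
```

The minimum is reached at a mindiag table and the maximum at a maxdiag table, which matches the four extremes listed above.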


We can see that the checklist paradigm generates the Łukasiewicz implication operator (the upper bound) and the Kleene-Dienes implication operator (the lower bound).

Similarly, choosing the connective type ‘AND’ for $F$ together with the measure $m_1$ [5,8], we obtain the bounds

$$\min(a, b) \ge m_1(\mathrm{AND}) \ge \max(0, a + b - 1).$$


These bounds are formally identical with those of B. Schweizer and A. Sklar [36] giving the bounds on copulas, which play an important role in their theory of t-norms and t-conorms. Surprisingly, these checklist paradigm bounds also coincide with Novák’s recent (1991) derivation [31] of bounds on fuzzy sets approximating classes of Vopěnka’s alternative set theory [41,42]. E. Hisdal derives the same inequalities as the bounds on some connectives of her TEE model and comments on a possible link (cf. [16, Appendix A2]). In the context of modalities in fuzzy logics, checklist paradigm-like inequalities for $F \in \{\mathrm{AND}, \mathrm{OR}\}$ were also discovered independently in 1992 [34]. Yet all these models are neither formally nor epistemologically identical. This indicates the need for a more precise meta- and metametalogical formulation of many-valued based mathematical systems, one that would include in their full definition a part formulating their ‘mathematical epistemology’.


Interval Inference and The Checklist Paradigm 


The checklist paradigm imposes an ordering on the pairs of distinct implication operators and on other pairs of connectives. Hence it provides a theoretical justification of interval-valued approximate inference. For the $m_1$ contraction/approximation measure, there are 16 inequalities linking the TOP and BOT types of connectives [5,8], thus yielding 16 logical types of TOP-BOT pairs of connectives. Ten of these interval pairs generated by $m_1$ are listed in Table 2.


Other Systems of Fuzzy Logic Connectives 
for Interval Inference 


In Boolean (crisp) logic, the values of a logical formula written in disjunctive normal form (DNF) are equal to the values of the same formula expressed in conjunctive normal form (CNF). This does not hold for every system of many-valued connectives.

I.B. Türksen [38,39,40] has shown that for max(a, b), min(a, b) and some other t-norm and t-conorm based CNFs and DNFs, the inequality DNF(a CON b) ≤ CNF(a CON b) holds for all 16 basic many-valued connectives CON.

Taking for example the max-min based CNF and DNF, the corresponding implications are given by

$$\mathrm{CNF}(a \to b) = \neg a \vee b,$$
$$\mathrm{DNF}(a \to b) = (a \wedge b) \vee (\neg a \wedge b) \vee (\neg a \wedge \neg b).$$
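A quick numerical illustration of Türksen's inequality for the max-min implication. We assume negation is ¬x = 1 − x. The sketch evaluates both normal forms on a grid of exact rationals and reports where they differ.

```python
from fractions import Fraction
from itertools import product

def neg(x):
    return 1 - x

def cnf_implies(a, b):
    """Max-min CNF of a -> b: (not a) or b."""
    return max(neg(a), b)

def dnf_implies(a, b):
    """Max-min DNF of a -> b: (a and b) or (not a and b) or (not a and not b)."""
    return max(min(a, b), min(neg(a), b), min(neg(a), neg(b)))

GRID = [Fraction(k, 20) for k in range(21)]
gaps = {(a, b): cnf_implies(a, b) - dnf_implies(a, b) for a, b in product(GRID, repeat=2)}
print(min(gaps.values()) >= 0)                                  # DNF <= CNF at every grid point
print(all(gaps[(a, b)] == 0 for a in (0, 1) for b in (0, 1)))   # equal on crisp corners
worst = max(gaps, key=gaps.get)
print(worst, gaps[worst])                                       # a grid point with the largest gap
```

The two forms coincide on the crisp corners, as Boolean logic requires, but can differ by as much as 1/2 inside the unit square. For example, at a = 0, b = 1/2 the CNF gives 1 while the DNF gives 1/2.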
Checklist Paradigm Semantics for Fuzzy Logics, Table 2
Two-argument interval pairs of connectives generated by $m_1$

Logical type of connective | Valuation: BOTTOM ≤ TOP
AND, $a \wedge b$ | $\max(0, a + b - 1) \le \min(a, b)$
Nicod, $a \downarrow b$ | $\max(0, 1 - a - b) \le \min(1 - a, 1 - b)$
Sheffer, $a \mid b$ | $\max(1 - a, 1 - b) \le \min(1, 2 - a - b)$
OR, $a \vee b$ | $\max(a, b) \le \min(1, a + b)$
Nonimplication, $\neg a \wedge b$ | $\max(0, b - a) \le \min(1 - a, b)$
Nonimplication, $a \wedge \neg b$ | $\max(0, a - b) \le \min(a, 1 - b)$
Implication, $a \leftarrow b$ | $\max(a, 1 - b) \le \min(1, 1 + a - b)$
Implication, $a \to b$ | $\max(1 - a, b) \le \min(1, 1 - a + b)$
Equivalence, $a \equiv b$ | $\max(1 - a - b, a + b - 1) \le \min(1 - a + b, 1 + a - b)$
Exclusive OR, $a \oplus b$ | $\max(a - b, b - a) \le \min(2 - a - b, a + b)$
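Several rows of Table 2 can be checked by brute force. For each connective, the sketch computes the extreme values of m1(F) over all fine structures with given margins and compares them with the tabulated BOTTOM and TOP formulas. The helper names and the range of table sizes checked are our own choices.

```python
from fractions import Fraction
from itertools import product

def interval(f, n, r1, c1):
    """(BOT, TOP) of m1(F) = sum_ij f(i, j) alpha_ij / n over all fine structures
    with grand total n, row total r1 (A true) and column total c1 (B true)."""
    values = []
    for x in range(max(0, r1 + c1 - n), min(r1, c1) + 1):   # x = alpha_11
        alpha = {(1, 1): x, (1, 0): r1 - x, (0, 1): c1 - x, (0, 0): n - r1 - c1 + x}
        values.append(sum(Fraction(f(i, j) * alpha[(i, j)], n) for i, j in alpha))
    return min(values), max(values)

# Some rows of Table 2: truth table of the connective, BOTTOM and TOP formulas.
ROWS = {
    "AND": (lambda i, j: i & j,
            lambda a, b: max(0, a + b - 1), lambda a, b: min(a, b)),
    "Nicod": (lambda i, j: (1 - i) & (1 - j),
              lambda a, b: max(0, 1 - a - b), lambda a, b: min(1 - a, 1 - b)),
    "Sheffer": (lambda i, j: 1 - (i & j),
                lambda a, b: max(1 - a, 1 - b), lambda a, b: min(1, 2 - a - b)),
    "OR": (lambda i, j: i | j,
           lambda a, b: max(a, b), lambda a, b: min(1, a + b)),
    "Equivalence": (lambda i, j: int(i == j),
                    lambda a, b: max(1 - a - b, a + b - 1),
                    lambda a, b: min(1 - a + b, 1 + a - b)),
    "Exclusive OR": (lambda i, j: i ^ j,
                     lambda a, b: max(a - b, b - a), lambda a, b: min(2 - a - b, a + b)),
}

def first_mismatch(max_n=8):
    """Return (row, n, r1, c1) for the first disagreement, or None if all rows hold."""
    for n in range(1, max_n + 1):
        for r1, c1 in product(range(n + 1), repeat=2):
            a, b = Fraction(r1, n), Fraction(c1, n)
            for name, (f, bot, top) in ROWS.items():
                if interval(f, n, r1, c1) != (bot(a, b), top(a, b)):
                    return name, n, r1, c1
    return None

print(first_mismatch())
```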


For further information on other systems of connectives for fuzzy interval inference, see [11,27].


Optimization of Interval Inference 


Formulas that are logically equivalent [6,27] may not be equivalent when compared by their formula complexity. This is a well-known phenomenon when expressing logical formulas in DNF or CNF [19].
The same logical function expressed in one of these 
forms may have more complicated expression in the 
other normal form. Similarly, this can be observed with 
other logic connectives [40]. So, transformations be- 
tween logically equivalent formulas expressed by differ- 
ent connectives may have different formula complexity. 
Hence, the knowledge of such transformations is useful 
in optimization of interval inference. 

For example, exclusive OR, or the eor operator, is conveniently defined in two ways: as ‘a without b’ or ‘b without a’, or else as ‘a or b but not both’, thus

$$a \text{ eor } b = (a \text{ and } \neg b) \text{ or } (\neg a \text{ and } b) = (a \text{ or } b) \text{ and } \neg(a \text{ and } b).$$

Using these definitions together with the definitions of the previous section, easy calculations yield the results of the following table [6]:

Formulas for equivalence (IFF) and exclusive-OR (EOR)


Other useful formulas are those that give universal bounds on classes of fuzzy interval pairs of formulas. If we define the unnormalized fuzziness of $x$ [1] as $\phi x = \min(x, 1 - x)$, then for $x$ in the range [0, 1], $\phi x$ is in the range [0, 0.5], with value 0 if and only if $x$ is crisp, and value 0.5 if and only if $x$ is 0.5. The following gap theorem holds [8]:

Theorem 1

$$(a \wedge_{\mathrm{top}} b) - (a \wedge_{\mathrm{bot}} b) = (a \vee_{\mathrm{top}} b) - (a \vee_{\mathrm{bot}} b) = (a \to_{\mathrm{top}} b) - (a \to_{\mathrm{bot}} b) = \min(\phi a, \phi b),$$

$$(a \equiv_{\mathrm{top}} b) - (a \equiv_{\mathrm{bot}} b) = (a \oplus_{\mathrm{top}} b) - (a \oplus_{\mathrm{bot}} b) = 2\min(\phi a, \phi b).$$
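A numerical spot-check of the gap theorem. The sketch takes the BOTTOM and TOP formulas from Table 2, computes the interval widths on a grid of exact rationals, and compares them with min(φa, φb). Passing the check on a grid is evidence, not a proof. The helper names are ours.

```python
from fractions import Fraction
from itertools import product

def phi(x):
    """Unnormalized fuzziness: 0 iff x is crisp, 1/2 iff x = 1/2."""
    return min(x, 1 - x)

# (BOT, TOP) pairs from Table 2 as functions of a = m(A) and b = m(B).
PAIRS = {
    "and": (lambda a, b: max(0, a + b - 1), lambda a, b: min(a, b)),
    "or": (lambda a, b: max(a, b), lambda a, b: min(1, a + b)),
    "implies": (lambda a, b: max(1 - a, b), lambda a, b: min(1, 1 - a + b)),
    "equiv": (lambda a, b: max(1 - a - b, a + b - 1), lambda a, b: min(1 - a + b, 1 + a - b)),
    "xor": (lambda a, b: max(a - b, b - a), lambda a, b: min(2 - a - b, a + b)),
}

def width(name, a, b):
    """Width TOP - BOT of the interval for connective `name`."""
    bot, top = PAIRS[name]
    return top(a, b) - bot(a, b)

def gap_theorem_holds(steps=12):
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    for a, b in product(grid, repeat=2):
        m = min(phi(a), phi(b))
        if not width("and", a, b) == width("or", a, b) == width("implies", a, b) == m:
            return False
        if not width("equiv", a, b) == width("xor", a, b) == 2 * m:
            return False
    return True

print(gap_theorem_holds())
print(width("implies", Fraction(1, 3), Fraction(3, 4)))   # = min(phi(1/3), phi(3/4))
```

Both widths vanish exactly when a or b is crisp, so interval inference degenerates to point inference in the classical case.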


The width of the interval produced by an application of a pair of associated connectives (i.e. TOP and BOT connectives) characterises the margins of imprecision of an interval logic expression. Because the interval between the TOP connective and the BOT connective is directly linked to the fuzziness $\phi$, the margins of imprecision can be measured directly by the degree of $\phi$.


Other Systems of Checklist Paradigm Connectives 


Several measures other than $m_1$ also yield interesting results. For implication again, but with evaluation only ‘by performance’ (that is, we are only concerned with the cases in which the evaluation of $A$ is 1; see Fig. 1), we use $m_2 = u_{11}/(u_{10} + u_{11})$ and obtain the inequality

$$\min\left(1, \frac{b}{a}\right) \ge m_2(F) \ge \max\left(0, \frac{a + b - 1}{a}\right),$$

in which the left-hand side is the well-known Goguen-Gaines implication (cf. e.g. [3]). Still another contracting measure distinguishes the proportion of satisfactions ‘by performance’, $u(1, 1)$, and ‘by default’, $u(0, 0) + u(0, 1)$. This measure, given by the formula $m_3 = u_{11} \vee (u_{00} + u_{01})$, yields [3]

$$\max[\min(a, b), 1 - a] \ge m_3(F) \ge \max(a + b - 1, 1 - a).$$

Two variations on measure $m_3$ have turned out to be of interest [3]. One is its lower contrapositivization, given by the formula

$$m_4 = \big(u_{11} \vee (u_{00} + u_{01})\big) \wedge \big(u_{00} \vee (u_{01} + u_{11})\big),$$

which gives the following inequality:

$$\min[\max(a + b - 1, 1 - a), \max(b, 1 - a - b)] \le m_4 \le \min[\max(1 - a, b), \kappa a, \kappa b],$$

where $\kappa a = a \vee (1 - a)$.

The other arises by taking the less conservative $m_2$ for the ‘performance’ part, thus obtaining the formula $m_5 = m_2 \vee (u_{00} + u_{01})$. This yields

$$\max\left[\min\left(1, \frac{b}{a}\right), 1 - a\right] \ge m_5 \ge \max\left[\frac{a + b - 1}{a}, 1 - a\right].$$
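The bounds for m2, m3 and m5 can be checked by enumerating fine structures with fixed margins. This is a sketch under the following assumptions. The default part u00 + u01 of m5 is taken to be the same as for m3. We require r1 > 0 so that m2 is defined. The lower contrapositivization m4 is not checked here.

```python
from fractions import Fraction
from itertools import product

def tables(n, r1, c1):
    """All fine structures with grand total n and margins r1 (A true), c1 (B true)."""
    for x in range(max(0, r1 + c1 - n), min(r1, c1) + 1):
        yield {(1, 1): x, (1, 0): r1 - x, (0, 1): c1 - x, (0, 0): n - r1 - c1 + x}

def m2(t):        # 'by performance' only: u11 / (u10 + u11)
    return Fraction(t[(1, 1)], t[(1, 0)] + t[(1, 1)])

def m3(t, n):     # performance u11, OR default u00 + u01
    return max(Fraction(t[(1, 1)], n), Fraction(t[(0, 0)] + t[(0, 1)], n))

def m5(t, n):     # less conservative performance part m2, same default part
    return max(m2(t), Fraction(t[(0, 0)] + t[(0, 1)], n))

def bounds_hold(max_n=8):
    for n in range(1, max_n + 1):
        for r1, c1 in product(range(1, n + 1), range(n + 1)):   # r1 > 0 keeps m2 defined
            a, b = Fraction(r1, n), Fraction(c1, n)
            ts = list(tables(n, r1, c1))
            checks = [
                ([m2(t) for t in ts], max(0, (a + b - 1) / a), min(1, b / a)),
                ([m3(t, n) for t in ts], max(a + b - 1, 1 - a), max(min(a, b), 1 - a)),
                ([m5(t, n) for t in ts], max((a + b - 1) / a, 1 - a),
                 max(min(1, b / a), 1 - a)),
            ]
            for vals, lo, hi in checks:
                if (min(vals), max(vals)) != (lo, hi):
                    return False
    return True

print(bounds_hold())
```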
For the proofs of the results presented in this subsection and further explanation, see [6, Sect. 5] (this is the first paper on the checklist paradigm, published in 1980).


Collapse of Intervals into Points Under 
the Additional Probabilistic Constraints 


When only the row and column totals $r_i$, $c_j$ of the fine structure are known (see Fig. 1), one can ask what the expected values of the $\alpha_{ij}$ are [3, Sect. 7].

Suppose the ways in which numbers can be distributed within the cells of the fine structure (so as to give the fixed coarse totals) constitute a hypergeometric distribution. Then the means of the distribution for each cell give the expected configuration of the fine structure. The inequalities determining the interval BOTCON ≤ TOPCON now turn into equalities:

$$\frac{\alpha_{ij}}{\alpha_{ik}} = \frac{c_j}{c_k}, \qquad \frac{\alpha_{ij}}{\alpha_{hj}} = \frac{r_i}{r_h},$$

that is, $\alpha_{ij} = r_i c_j / n$.

Surprisingly, introducing the expected value (a probabilistic notion) in this way causes the fuzzy interval to collapse into a single point, the expected value, thus generating the values of the mid connective [3,5]. For example, the interval pair $(\to_5, \to_6)$ generated by $m_1$, consisting of the Łukasiewicz and Kleene-Dienes implication operators, collapses into the Reichenbach implication operator.
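A small exact computation of this collapse. We assume that all arrangements with the given margins are equally likely, which makes α11 hypergeometric. The sketch computes E[α11] from the hypergeometric weights and then uses linearity of expectation to get the expected m1(A → B). That value coincides with the Reichenbach implication 1 − a + ab.

```python
from fractions import Fraction
from math import comb

def expected_alpha11(n, r1, c1):
    """Mean of alpha_11 when all arrangements with the fixed margins are equally
    likely, i.e. alpha_11 is hypergeometric: c1 'B true' items drawn from n,
    of which r1 are 'A true'."""
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    weights = {x: comb(r1, x) * comb(n - r1, c1 - x) for x in range(lo, hi + 1)}
    return Fraction(sum(x * w for x, w in weights.items()), sum(weights.values()))

def expected_m1_implication(n, r1, c1):
    # m1(A -> B) = 1 - alpha_10 / n with alpha_10 = r1 - alpha_11 (linearity of expectation)
    return 1 - (r1 - expected_alpha11(n, r1, c1)) / n

n, r1, c1 = 10, 6, 3
a, b = Fraction(r1, n), Fraction(c1, n)
print(expected_alpha11(n, r1, c1))          # 9/5, i.e. r1 * c1 / n
print(expected_m1_implication(n, r1, c1))   # 29/50, i.e. 1 - a + a*b
```

With a = 0.6 and b = 0.3, the interval [0.4, 0.7] given by Kleene-Dienes and Łukasiewicz collapses to the single Reichenbach value 0.58.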

For further details on the ‘probabilistic collapse’ 
of the interval pairs generated by other measures see 
[3,5,8,20,26]. 


Checklist Paradigm and Generalized Quantifiers 


Some of the notions and results of the checklist 
paradigm are in a remarkable relation to the theory 
of observational generalized quantifiers, as studied by 
P. Hájek and T. Havránek [14] in connection with
the method of automated hypothesis formation [12,13]. 
Namely, a particular type of implication operator and 
a particular type of implicational quantifier are mutu- 
ally definable. The link is given by the contingency ta- 
bles of the checklist paradigm and the statistics of the 
observational quantifiers [15]. 


Checklist Paradigm and Four Modes of Reasoning 


Classical two-valued logic has presented certain modes of reasoning, of which only two concern us: modus ponens and modus tollens.

The first of these derives from the two premises ‘if a then b’ and ‘a’ the conclusion ‘b’; the second derives from ‘if a then b’ and ‘not b’ the conclusion ‘not a’. The validity of these modes is trivial.

On the other hand, there are two modes of reasoning which are classically illegitimate, although in daily life we all use something very much like them all the time. These so-called plausible rules [4,32] are shown as the central pair in [26, Fig. 1]. Denial derives from ‘if a then b’ and ‘not a’ the assertion ‘not b’, while confirmation derives from ‘if a then b’ and ‘b’ the assertion ‘a’.

The reason why these errors in classical reasoning 
retain a strong intuitive attraction is that most human 
reasoning does not deal with crisp or two-valued or 
Boolean truth-versus-falsity, but with graded degrees of 
credence, or belief-worthiness, or whatever you like to 
call it. Because the plausible modes gain legitimacy in multiple-valued logics, this intuition about human reasoning gains mathematical legitimacy. Indeed, human
reasoning, ‘good’ human reasoning, is best modeled in 
multiple-valued logic which admits in addition to the 
modus ponens and modus tollens also the two modes 
of plausible reasoning. 

In classical logic, an evaluation takes each of a given 
set of propositions into one of the extreme truth-values 
0 (false) or 1 (true), subject to some semantic consis- 
tency rules. In multiple-valued logic, an evaluation is 
a mapping of the set of propositions into a somewhat 
richer set, which for present purposes may be taken to 
be the closed interval [0, 1] from 0 to 1, again subject to 
certain semantic consistency rules. 

Hence, in multiple-valued logic, for any fixed choice 
among the distinguished implication operators, to the 
classically valid modes of modus ponens and modus 
tollens are to be added fuzzily valid modes of denial and 
confirmation (modus negans and modus confirmans) 
[4,7]. Although the out-of-bounds constraints were addressed elsewhere [4], one may wonder what the checklist paradigm has to offer when applied to the four plausible modes of inference.

The checklist paradigm is applicable not only to the
components of the object language, such as logical op- 
erators and connectives, but also at the meta-level, thus 
providing an interval logic based semantics for vari- 
ous rules of inference. As shown below and in [8], it 
also provides a justification and the proofs of validity 
of nonclassical interval-based rules (plausible modes) 
of reasoning called denial and confirmation (modus ne- 
gans and modus confirmans) [4,7,8,24]. 

As already mentioned, these do not have a nontriv- 
ial analogy in Boolean crisp logic. Thus we have the fol- 
lowing theorems. 
Theorem 2 (checklist modus ponens) Given $r = m(A \to B)$ and $a = m(A)$ satisfying the consistency condition $r \ge 1 - a$, the values of $b = m(B)$ are subject to

$$r - (1 - a) \le b \le r. \tag{1}$$

Theorem 3 (checklist confirmation) Given $r = m(A \to B)$ and $b = m(B)$ subject to the consistency condition $b \le r$, the values of $a = m(A)$ are subject to

$$1 - r \le a \le 1 - (r - b). \tag{2}$$

Theorem 4 (checklist modus tollens) Given $r = m(A \to B)$ and $\neg b = m(\text{not-}B)$ satisfying the consistency condition $r \ge b$, the values of $\neg a = m(\text{not-}A)$ are subject to

$$r - b \le \neg a \le r. \tag{3}$$

Theorem 5 (checklist denial) Given $r = m(A \to B)$ and $\neg a = m(\text{not-}A)$ subject to the consistency condition $1 - a \le r$, the values of $\neg b = m(\text{not-}B)$ are subject to

$$1 - r \le \neg b \le 2 - (r + a). \tag{4}$$
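The four theorems can be written as small interval functions and checked against the fine structures. The sketch relies on one observation. Because m1(A → B) = 1 − u10 is also the value of ¬B → ¬A, modus tollens and denial are modus ponens and confirmation applied to the contrapositive. The function names are ours. The brute-force check covers Theorems 2 and 3 directly and, through that identity, Theorems 4 and 5.

```python
from fractions import Fraction

def modus_ponens(r, a):
    """Theorem 2: bounds on b = m(B) from r = m(A -> B) and a = m(A)."""
    if r < 1 - a:
        raise ValueError("inconsistent: need r >= 1 - a")
    return r - (1 - a), r

def confirmation(r, b):
    """Theorem 3: bounds on a = m(A) from r = m(A -> B) and b = m(B)."""
    if b > r:
        raise ValueError("inconsistent: need b <= r")
    return 1 - r, 1 - (r - b)

# m1(A -> B) = 1 - u10 = m1(not B -> not A), so Theorems 4 and 5 are
# Theorems 2 and 3 applied to the contrapositive implication.
def modus_tollens(r, not_b):
    """Theorem 4: bounds on m(not A) from r and m(not B)."""
    return modus_ponens(r, not_b)

def denial(r, not_a):
    """Theorem 5: bounds on m(not B) from r and m(not A)."""
    return confirmation(r, not_a)

def brute_force(n):
    """Check Theorems 2 and 3 against every fine structure with grand total n."""
    by_ra, by_rb = {}, {}
    for a11 in range(n + 1):
        for a10 in range(n + 1 - a11):
            for a01 in range(n + 1 - a11 - a10):
                r = 1 - Fraction(a10, n)
                a, b = Fraction(a10 + a11, n), Fraction(a01 + a11, n)
                by_ra.setdefault((r, a), []).append(b)
                by_rb.setdefault((r, b), []).append(a)
    ok_mp = all(modus_ponens(r, a) == (min(v), max(v)) for (r, a), v in by_ra.items())
    ok_cf = all(confirmation(r, b) == (min(v), max(v)) for (r, b), v in by_rb.items())
    return ok_mp and ok_cf

print(modus_ponens(Fraction(4, 5), Fraction(1, 2)))   # bounds on m(B)
print(brute_force(6))
```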


Group Transformations of Logic Connectives 
and the Checklist Paradigm 


Let us recall that a realization of an abstract group is any 
group of concretely realizable operations which has the 
same algebraic structure as the given abstract group. It 
is well known that any abstract group can be concretely 
realized by a family of permutations. So a specific ab- 
stract group provides a global structural characteriza- 
tion of a specific family of permutations that concretely 
represent this abstract group. This idea can be used for 
global characterization of logic connectives. 


The Piaget Group of Transformations 


Such a global characterization of two-valued connectives of logic was first given by Piaget in the context of studies of human cognitive development. J. Piaget and his collaborators have shown that the transition from more concrete to more abstract thinking plays an important role in a child’s mental development. This transition plays a role in the development of intelligence, which is viewed in the Piagetian setup as a transition from totally ambiguous and vague notions to crisp propositions in two-valued logic.

Given a family of logical connectives, one can apply various transformations to them. Individual logic connectives are 2-argument logic functions. Transformations are functors that take one connective as their argument and produce another connective.

Let four transformations on basic propositional functions $f(x, y)$ of two arguments be given as follows:

$$I(f) = f(x, y), \quad D(f) = \neg f(\neg x, \neg y), \quad C(f) = f(\neg x, \neg y), \quad N(f) = \neg f(x, y).$$

In 1940, Piaget discovered experimentally a specific concrete form of such transformations. In the set of the above transformations $T = \{I, D, C, N\}$, these individual transformations are called the identity, dual, contradual, and negation transformation, respectively.

It has been shown that the Piaget group of trans- 
formation is satisfied by some many-valued logics (cf. 
[5,6,10,37]). 

The system of connectives

$$\{\equiv_{\mathrm{top}}, \oplus_{\mathrm{bot}}, \equiv_{\mathrm{bot}}, \oplus_{\mathrm{top}}\}$$

obeys the Piaget group of transformations. Hence it possesses the abstract structure of the Klein 4-element group.
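The Klein-group structure of the Piaget transformations can be seen by computing their composition table. This is a sketch that assumes negation is ¬x = 1 − x. Transformations are represented as Python higher-order functions and compared by tabulating their effect on one sample connective, the Łukasiewicz implication. Its four images are pairwise distinct, so each composite can be identified uniquely.

```python
from fractions import Fraction
from itertools import product

def neg(x):
    return 1 - x

# The four Piaget transformations, mapping a connective f(x, y) to a new one.
def I(f): return lambda x, y: f(x, y)
def D(f): return lambda x, y: neg(f(neg(x), neg(y)))   # dual
def C(f): return lambda x, y: f(neg(x), neg(y))        # contradual
def N(f): return lambda x, y: neg(f(x, y))             # negation

T = {"I": I, "D": D, "C": C, "N": N}
GRID = [Fraction(k, 4) for k in range(5)]

def values(f):
    """Tabulate a connective on GRID x GRID so connectives can be compared."""
    return tuple(f(x, y) for x, y in product(GRID, repeat=2))

def lukasiewicz(x, y):          # a ->5 b; its four images are pairwise distinct
    return min(1, 1 - x + y)

images = {name: values(t(lukasiewicz)) for name, t in T.items()}
# cayley[(s, t)] names the transformation equal to "apply t, then s"
cayley = {(s, t): next(k for k, v in images.items() if v == values(T[s](T[t](lukasiewicz))))
          for s, t in product(T, repeat=2)}
for s in T:
    print(s, [cayley[(s, t)] for t in T])
```

Every transformation is its own inverse, the table is symmetric, and composing any two distinct non-identity transformations gives the third. These are exactly the defining properties of the Klein 4-group.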


An 8-Element Group of Logic Transformations 


Adding new nonsymmetrical transformations to those defined by Piaget enriches the algebraic structure of logic transformations. In 1979 L.J. Kohout and W. Bandler added the following nonsymmetric operations [22,23]:

$$LC(f) = f(\neg x, y), \quad RC(f) = f(x, \neg y),$$
$$LD(f) = \neg f(\neg x, y), \quad RD(f) = \neg f(x, \neg y)$$

to the four symmetrical transformations defined above. This yields a new 8-element group of transformations.

The abstract 8-element group $[T, *]$ that captures the structure of the above-defined logic transformations is also commutative and is called the symmetric $S_{2\times2\times2}$ group in the standard terminology of group theory. The interval logic system based on $m_1$ can be characterized by such groups of transformations.

Given a set of connectives CON and a set of transformations $T$, we say that $T_{\mathrm{CON},T} = T(\mathrm{CON})$ is the set of connectives generated by the application of $T$ to the set CON. For example, $a \to_5 b$ will generate such a set of connectives. This generated set is a realization of $S_{2\times2\times2}$.
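One way to see why these eight transformations form a commutative group is to encode each one as three flags: negate the first argument, negate the second argument, negate the result. Composition then amounts to XOR of the flags. This encoding is our own illustration device, not notation from the text. The sketch checks closure and commutativity. It also confirms that applying all eight transformations to a →5 b yields eight distinct connectives.

```python
from fractions import Fraction
from itertools import product

def neg(x):
    return 1 - x

# Assumed encoding for illustration: each transformation is a triple of flags
# (negate first argument, negate second argument, negate the result).
TRANSFORMS = {
    "I": (0, 0, 0), "N": (0, 0, 1), "C": (1, 1, 0), "D": (1, 1, 1),
    "LC": (1, 0, 0), "RC": (0, 1, 0), "LD": (1, 0, 1), "RD": (0, 1, 1),
}

def apply(code, f):
    """Return the connective obtained from f by the transformation `code`."""
    nx, ny, nz = code
    def g(x, y):
        v = f(neg(x) if nx else x, neg(y) if ny else y)
        return neg(v) if nz else v
    return g

def compose(p, q):
    """Apply q, then p: every negation flag is toggled once per application (XOR)."""
    return tuple(s ^ t for s, t in zip(p, q))

GRID = [Fraction(k, 4) for k in range(5)]

def values(f):
    return tuple(f(x, y) for x, y in product(GRID, repeat=2))

def lukasiewicz(x, y):   # a ->5 b
    return min(1, 1 - x + y)

codes = set(TRANSFORMS.values())
closed = all(compose(p, q) in codes for p in codes for q in codes)
commutative = all(compose(p, q) == compose(q, p) for p in codes for q in codes)
matches = all(values(apply(p, apply(q, lukasiewicz))) == values(apply(compose(p, q), lukasiewicz))
              for p in codes for q in codes)
generated = {values(apply(c, lukasiewicz)) for c in codes}
print(closed, commutative, matches, len(generated))
```

Among the eight connectives generated from →5 are the bold OR min(1, a + b) and the bold AND max(0, a + b − 1). This shows how a single operator yields a whole family of connectives.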


A 16-Element $S_{2\times2\times2\times2}$ Group
of Logic Transformations


The implication operators $a \to_5 b$ and $a \to_6 b$ yielding the measure $m_1$ are contrapositive. This means that their valuations satisfy the semantic equality $a \to b = \neg b \to \neg a$.

If we are interested in extending the interval logics into the domain of noncontrapositive $\to$, then the corresponding $\wedge$ is not commutative. In order to distinguish the contrapositive cases from noncontrapositive ones in a syntactically correct formal way, an additional operator is introduced.

This operator, called the commutator $K$, satisfies the equality $a * b = K(b * a)$. Commutativity as well as contrapositivity involves restrictions on transformations of connectives. In the abstract group, these restrictions are expressed abstractly as congruences [30]. It is convenient to express such restrictions equationally. For any contrapositive $\to$, the following equalities hold: $C[K(a \to b)] = K[C(a \to b)] = a \to b$. For a noncontrapositive $\to$, these equalities fail, but the following equality holds:

$$K(C(K(C(a \to b)))) = a \to b.$$

The following holds [29]:

Theorem 6 The closed set of connectives generated by $\{\to_4, \leftarrow_4, K\}$ is a representation of the symmetric 16-element abstract group $S_{2\times2\times2\times2}$.
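The contrapositivity conditions can be checked numerically. The operator names C and K follow the text. Two choices are our own assumptions: representing C and K as Python higher-order functions, and reading K as swapping the two arguments, as a * b = K(b * a) suggests. For →4 we assume it returns 1 whenever a ≤ b, to avoid dividing by zero.

```python
from fractions import Fraction
from itertools import product

def neg(x):
    return 1 - x

def C(f):   # contradual: f(not x, not y)
    return lambda x, y: f(neg(x), neg(y))

def K(f):   # commutator: swap the two arguments
    return lambda x, y: f(y, x)

def lukasiewicz(x, y):     # ->5
    return min(1, 1 - x + y)

def kleene_dienes(x, y):   # ->6
    return max(1 - x, y)

def goguen(x, y):          # ->4 (returns 1 when x <= y, avoiding 0/0)
    return 1 if x <= y else y / x

GRID = [Fraction(k, 8) for k in range(9)]

def same(f, g):
    """Pointwise equality of two connectives on GRID x GRID."""
    return all(f(x, y) == g(x, y) for x, y in product(GRID, repeat=2))

def is_contrapositive(f):
    """a -> b equals (not b) -> (not a), i.e. C[K(f)] = f."""
    return same(C(K(f)), f)

for name, f in [("->5", lukasiewicz), ("->6", kleene_dienes), ("->4", goguen)]:
    print(name, is_contrapositive(f), same(K(C(K(C(f)))), f))
```

→5 and →6 pass the contrapositivity test, while the Goguen operator →4 fails it. For example, 1/2 →4 1/4 = 1/2 but 3/4 →4 1/2 = 2/3. Yet K(C(K(C(f)))) = f holds for all three, since K and C are commuting involutions.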


Conclusion 


The checklist paradigm clearly demonstrates the fol- 
lowing general meta-principle: a system of logic con- 
nectives is formed by a specific family of connectives to- 
gether with some common process/structure/principles 
that involve the said family of connectives in some uni- 
fying way, causing these to interact. 

In the checklist paradigm semantic model we use 
two basic unifying principles: 
i) approximation (contraction) measures;
ii) transformations of logical types of connectives leading to a global characterization of logics by their groups of transformations [23].

The methods of the checklist paradigm surveyed here give theoretical bounds on the performance of particular many-valued implication operators and other connectives by deriving these from deeper epistemological and formal assumptions. Hence, they provide a theoretical justification of interval-valued approximate inference. The checklist paradigm, together with fuzzy questionnaires and square and triangle relational products, also plays an important role in the experimental identification of fuzzy membership functions and structures [21,25] (see also ▶ Boolean and Fuzzy Relations). The results can be extended to the groupoid-based many-valued Pinkava algebras (see ▶ Finite Complete Systems of Many-Valued Logic Algebras), which are used in the design of knowledge-based and other systems [19]. This theoretical work is supplemented by empirical studies of the adequacy of various logical connectives in practical applications of fuzzy sets and relations [9].


See also 


> Alternative Set Theory 

> Boolean and Fuzzy Relations 

> Finite Complete Systems of Many-Valued Logic 
Algebras 

> Inference of Monotone Boolean Functions 

> Optimization in Boolean Classification Problems 

> Optimization in Classifying Text Documents 
As companies are increasingly concerned about long-term stability and profitability, recent years have witnessed growing demand for long-range planning tools
in all sectors. The chemical process industries are no 
exception. New environmental regulations, rising com- 
petition, new technology, uncertainty of demand, and 
fluctuation of prices have all led to an increasing need 
for decision policies that will be ‘best’ over a long time 
horizon. Quantitative techniques have long established 
their importance in such decision making problems. 
It is therefore no surprise that there is a considerable 
number of papers in the optimization literature devoted 
to the problem of long range planning in the processing 
industries. The purpose of this article is to review recent 
advances in this area. We will describe the main mod- 
eling issues, and discuss the computational complexity, 
formulations and solution algorithms for this problem. 


The Long Range Planning Problem 


Consider a plant comprising several processes to produce a set of chemicals for sale. Each process takes in a number of raw materials and produces a main product along with some by-products. Any of these main products or by-products could be a raw material for another process. Considering the ingredients and final products of all the processes, we have a list of chemicals consisting of all raw materials we consider purchasing from the market, all products we consider offering for sale on the market, and all possible intermediates. The plant can then be represented as a network comprising nodes that represent the processes and the chemicals in the list, interconnected by arcs that represent the different alternatives that are possible for processing, and purchases from and sales to different markets.

The process planning problem then consists of 
choosing among the various alternatives in such a way as
to maximize profit. Once we know the prices of chem- 
icals in the various markets and the operating costs of 
processes, the problem is then to decide the operating 
level of each process and amount of each chemical to be 
purchased and sold to the various markets. The prob- 
lem in itself grows combinatorially with the number 
of chemicals and processes and is further complicated 
once we start planning over multiple time periods. 

Let us now consider the operation of the plant over 
a number of time periods. It is reasonable to expect 
that prices and demands of chemicals in various mar-
kets would fluctuate over the planning horizon. These 
fluctuations along with other factors, such as new en- 
vironmental regulations or technology obsolescence, 
might necessitate the decrease or complete elimination 
of the production of some chemicals while requiring in- 
crease or introduction of others. Thus, we have some 
additional new decision variables: capacity expansion 
of existing processes, installation of new processes and 
shutdown of existing processes. Moreover, due to the
broadening of the planning horizon, the effect of dis- 
count factors and interest rates will become prominent 
in the cost and price functions. Thus, the planning ob- 
jective should be to maximize the net present value in- 
stead of short term profit or revenue. This is the prob- 
lem that we shall devote our attention to. The problem 
can be stated as follows: assuming a given network of 
processes and chemicals, and characterization of future 
demands and prices of the chemicals and operating and 
installation costs of the existing as well as potential new 
processes, we want to find an operational and capacity 
planning policy that would maximize the net present 
value. 


Computational Complexity 


The number of possible alternatives, regarding which 
processes to expand and when, increases with the num- 
ber of processes and the number of time periods. Even 
though this increase is clearly exponential in the num- 
ber of processes and time periods, it was not until re- 
cently that a formal computational complexity charac- 
terization was provided for this problem. In particular, 
the general long range process planning problem has 
been shown by S. Ahmed and N.V. Sahinidis [3] to be 
NP-hard by identifying two known NP-hard problems 
as special cases. 

Consider first a single-process, multiperiod prob- 
lem where the decisions consist of determining the ex- 
pansion sequence to satisfy given demands over a num- 
ber of time periods at a minimum cost. It can be shown 
that this problem is equivalent to the NP-hard capac- 
itated lot-sizing problem, where one has to determine 
production lot sizes to satisfy demands at a minimum 
cost. Similarly, a multiple-process, single-time period 
problem, where the decisions are to determine which 
processes to install to satisfy demand at a minimum 


cost, can be shown to be equivalent to the NP-hard 
knapsack problem, where one has to select items from 
a set to place into a knapsack such that weight restric- 
tions are not violated and utility is maximized. 


Solution Strategies 


Some of the early approaches for the long-range planning problem were based on dynamic programming, as described by S.M. Roberts [18]. A.S. Manne [5,15] used integer programming approaches to account for economies of scale. D.M. Himmelblau and T.C. Bickel [7] presented a nonlinear programming formulation for a hydrodesulfurization process, and I.E. Grossmann and J. Santibañez [6] developed a multiperiod mixed integer linear programming formulation. Y. Shimizu and T. Takamatsu [22] discussed a goal programming approach where, in addition to cost minimization, minimizing the number of expansions is also suggested. M. Santiago, O.A. Iglesias and C.N. Pamiagua [21] developed a method to handle nonlinear concave cost functions arising in planning models. A.G. Jimenez and D.F. Rudd [9] presented a recursive mixed integer linear programming technique and applied it to the Mexican petrochemical industry. We next describe some of the more contemporary approaches to these problems.


Integer Programming Approach 


Under the assumption of linear mass balances in the 
processes and fixed charge cost models, Sahinidis et 
al. [20] developed a mixed integer linear programming 
(MILP) formulation of the long range process planning 
problem as described below. 

Indices 


i  for the set of NP processes
j  for the set of NC chemicals
l  for the set of NM markets
t  for the set of NT time periods


Parameters 


$X^L_{it}$, $X^U_{it}$  Lower and upper bounds on the expansion of process i in period t.

$a^L_{jlt}$, $a^U_{jlt}$  Lower and upper bounds on the availability of chemical j in market l in period t.

$d^L_{jlt}$, $d^U_{jlt}$  Lower and upper bounds on the demand of chemical j in market l in period t.

$\Gamma_{jlt}$, $\gamma_{jlt}$  Forecasted buying and selling prices of chemical j in market l in period t.

$\eta_{ij}$, $\mu_{ij}$  Input and output proportionality constants for chemical j in process i.

$\alpha_{it}$, $\beta_{it}$  Variable and fixed cost for the expansion of process i at the beginning of period t.

$\delta_{it}$  Unit operating cost of process i in period t.
Variables 


$X_{it}$  Capacity expansion of process i at the beginning of period t.

$P_{jlt}$  Amount of chemical j purchased from market l at the beginning of period t.

$Q_{it}$  Total capacity of process i in period t. The capacity of a process is expressed in terms of its main product.

$S_{jlt}$  Units of chemical j sold to market l at the end of period t.

$W_{it}$  Operating level of process i in period t, expressed in terms of output of its main product.

$y_{it}$  A 0-1 integer variable: if process i is expanded during period t then $y_{it} = 1$, else $y_{it} = 0$.


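Formulation

$$\max\ \mathrm{NPV} = -\sum_{t=1}^{NT}\sum_{i=1}^{NP}\left(\alpha_{it}X_{it} + \beta_{it}y_{it} + \delta_{it}W_{it}\right) + \sum_{t=1}^{NT}\sum_{j=1}^{NC}\sum_{l=1}^{NM}\left(\gamma_{jlt}S_{jlt} - \Gamma_{jlt}P_{jlt}\right) \qquad (1)$$

subject to

$$y_{it}X^L_{it} \le X_{it} \le y_{it}X^U_{it}, \quad \forall i,t \qquad (2)$$

$$Q_{it} = Q_{i,t-1} + X_{it}, \quad \forall i,t \qquad (3)$$

$$W_{it} \le Q_{it}, \quad \forall i,t \qquad (4)$$

$$\sum_{i=1}^{NP}\left(\mu_{ij} - \eta_{ij}\right)W_{it} = \sum_{l=1}^{NM}S_{jlt} - \sum_{l=1}^{NM}P_{jlt}, \quad \forall j,t \qquad (5)$$

$$a^L_{jlt} \le P_{jlt} \le a^U_{jlt}, \quad \forall j,l,t \qquad (6)$$

$$d^L_{jlt} \le S_{jlt} \le d^U_{jlt}, \quad \forall j,l,t \qquad (7)$$

$$X_{it},\ Q_{it},\ W_{it} \ge 0, \quad \forall i,t \qquad (8)$$

$$y_{it} \in \{0, 1\}, \quad \forall i,t \qquad (9)$$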
The objective (1) in the above formulation is to maximize the difference between the sales revenues of the final products and the investment, operating, and raw material costs. Constraint (2) bounds capacity expansions: a zero value of $y_{it}$ forces the capacity expansion of process i in period t to zero, while if the binary variable equals 1, the capacity expansion is performed within the prescribed bounds. Constraint (3) defines the total capacity available in period t as the sum of the capacity available in period t − 1 and the capacity expansion at the beginning of period t. The condition that the operating level of any process cannot exceed the installed capacity is modeled by constraint (4). Equation (5) expresses mass balances for chemicals across processes and markets. Constraints (6) and (7) are bounds on the purchase and sales quantities. The nonnegativity and binary restrictions are imposed through constraints (8) and (9). Various extensions of this general model are discussed in the recent survey article [2].
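To make constraints (2)-(4), (8) and (9) concrete, the following Python sketch checks a candidate plan for a single process and returns its investment and operating cost, that is, the process's share of the cost terms in objective (1). It is a hypothetical helper, not part of [20]. The mass balances (5), the market bounds (6)-(7) and the revenue terms are omitted, the initial capacity `Q0` is an assumed input, and periods are indexed from 0.

```python
from typing import List


def single_process_plan_cost(
    X: List[float], y: List[int], W: List[float],
    X_lo: List[float], X_hi: List[float], Q0: float,
    alpha: List[float], beta: List[float], delta: List[float],
    tol: float = 1e-9,
) -> float:
    """Check constraints (2)-(4), (8), (9) for one process over all periods.

    Returns sum_t (alpha_t X_t + beta_t y_t + delta_t W_t) and raises
    ValueError on the first violated constraint. Mass balances (5) and
    market bounds (6)-(7) are deliberately not modeled here.
    """
    Q = Q0  # installed capacity before the first period (assumed input)
    cost = 0.0
    for t in range(len(X)):
        if y[t] not in (0, 1):
            raise ValueError(f"period {t}: y must be 0 or 1")  # (9)
        if not (y[t] * X_lo[t] - tol <= X[t] <= y[t] * X_hi[t] + tol):
            raise ValueError(f"period {t}: expansion outside bounds")  # (2)
        Q += X[t]  # (3): Q_t = Q_{t-1} + X_t
        if W[t] < -tol or W[t] > Q + tol:
            raise ValueError(f"period {t}: operating level infeasible")  # (4), (8)
        cost += alpha[t] * X[t] + beta[t] * y[t] + delta[t] * W[t]
    return cost
```

For example, the plan X = (10, 0, 5), y = (1, 0, 1), W = (8, 10, 15), with expansion bounds 5 and 20, Q0 = 0, and α = 2, β = 50, δ = 1 in every period, passes all checks and costs 163. Raising the second operating level to 11 exceeds the installed capacity of 10, so the function raises ValueError.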

Sahinidis et al. [20] developed strong bounding 
techniques and cutting planes to be used within 
a branch and bound framework to solve the above prob- 
lem. The fact that the problem is decomposable in the 
number of time periods can also be exploited by us- 
ing Benders decomposition. Further improvement of 
the bounding schemes are suggested by reformulating 
the problem to exploit lot sizing substructure in [19]. 
The reformulated problem results in a large number 
of constraints and variables. In [10], the reformulated 
problem is projected onto a lower-dimensional space 
to reduce the number of variables, and is solved using 
a cutting plane strategy along with branch and bound. 
Computational results in [10,19] and [20] suggest the 
following: 

e Branch and bound with strong bounding techniques 
performs much better than Benders decomposition 
for large problems. 

e For small sized problems, the reformulation and 
projection approach do not provide appreciable 
gains. 

e For large problems, the best approach is to use a cut- 
ting plane method based on the projected model. 
In the MILP model, economies of scale in the investment cost functions were modeled by the introduction of a set of binary decision variables ($y_{it}$) to impose
a fixed charge on the decision to expand in addition 
to the linear term for variable costs. In reality, variable 
costs are not directly proportional to expansion quan- 
tity. Rather, the investment cost is a concave function 
because of the presence of quantity discounts. Thus, 
a more realistic model for the investment cost would 
be: 


ee : when Xj; = 0, 
Bit + airX;j' when Xi; > 0, 

where aj; > 0 and 0 < bj, < 1. In this formulation, the in- 
teger variables have been discarded and the linear vari- 
able cost function has been replaced by a concave func- 
tion in Xj with coefficient a; and exponent bj. Note 
that this function is discontinuous at X;, = 0. M.L. Liu, 
Sahinidis and J.P. Shectman [14] present two formula- 
tions using these concave cost functions. In the fixed 
charge concave programming model (FCP), the linear 
cost relation is retained but the discrete variables are 
eliminated by using the following concave function: 


$$f(X_{it}) = \begin{cases} 0 & \text{when } X_{it} = 0,\\ \beta_{it} + \alpha_{it}X_{it} & \text{when } X_{it} > 0. \end{cases}$$


In the continuous concave programming model (CCP), the discontinuity at $X_{it} = 0$ is avoided by using the following function:

$$f(X_{it}) = a_{it}X_{it}^{b_{it}}.$$


Both (FCP) and (CCP) are problems with concave 
objective functions to be minimized over a set of lin- 
ear constraints. These can be solved by a concave pro- 
gramming method based on the branch and bound pro- 
cedure. Computational experience with these models 
suggests that the algorithm for (FCP) outperforms the 
straightforward branch and bound for the MILP for- 
mulation. 
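The investment cost models above can be compared directly in code. This Python sketch implements them as plain functions with hypothetical names. The (CCP) function uses the power form a·X^b with no fixed charge. That form is an assumption about the intended continuous model, not a formula confirmed by [14].

```python
def power_fixed_charge_cost(x: float, beta: float, a: float, b: float) -> float:
    """0 if x == 0, else beta + a * x**b with a > 0 and 0 < b < 1."""
    if x < 0:
        raise ValueError("expansion must be nonnegative")
    return 0.0 if x == 0 else beta + a * x ** b


def fcp_cost(x: float, beta: float, alpha: float) -> float:
    """(FCP): 0 if x == 0, else fixed charge beta plus linear term alpha * x."""
    if x < 0:
        raise ValueError("expansion must be nonnegative")
    return 0.0 if x == 0 else beta + alpha * x


def ccp_cost(x: float, a: float, b: float) -> float:
    """(CCP), assumed form a * x**b: no fixed charge, continuous at x = 0."""
    if x < 0:
        raise ValueError("expansion must be nonnegative")
    return a * x ** b


if __name__ == "__main__":
    # Average cost per unit of capacity falls as the expansion grows.
    for x in (4.0, 16.0, 100.0):
        print(x, ccp_cost(x, 3.0, 0.5) / x)
```

Dividing cost by expansion size shows the economies of scale that motivate these models. With a = 3 and b = 0.5, the cost per unit falls from 1.5 at X = 4 to 0.3 at X = 100.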


Approximation Schemes 


Despite the success of optimization models and algorithms in solving problems of industrial relevance, the majority of approaches in current industrial-level planning practice are still based on heuristics rather than
integer programming techniques. However, the perfor- 
mance characterization of these approximate methods 
is based on empirical evidence and little has been done 
in the way of analytical investigations. Liu and Sahinidis 
[12] developed a simple heuristic for the process plan- 
ning problem. The method is based upon solving the 
LP relaxation of the MILP, and then shifting capacity 
expansions from later periods to earlier periods while
maintaining feasibility. Worst-case bounds on the per- 
formance of this heuristic have also been developed and 
probabilistic analysis of the heuristic has shown that, 
under standard assumptions on the problem data, the 
heuristic solution converges to the optimal solution al- 
most surely as the problem size increases. A modifi- 
cation of this heuristic for process planning problems 
with a restriction on the number of allowed expansions 
has been presented in [3]. The modified heuristic has 
been proven to be asymptotically optimal in expecta- 
tion. 


Dealing with Uncertainty 


Uncertainty is an integral part of the long range pro- 
cess planning problem. In the deterministic models dis- 
cussed above, it is assumed that all uncertainty has been 
accounted for in the estimation of the problem param- 
eters. Stochastic models, on the other hand, provide ex- 
plicit means of handling parameter uncertainties. 

In process planning problems under uncertainty, 
the decision maker is interested in a plan that optimizes 
some sort of a stochastic objective. The two most common such objective functions in the literature are the expected cost/profit of the plan and the plan’s flexibility.

Problems with the expected cost objective have 
been formulated as two-stage stochastic linear programs 
(2S-SLP). In such problems, the uncertain parameters 
are treated as random variables with known distribu- 
tions. The desired degree of flexibility of the plan is 
pre-specified by identifying the probability space over 
which the plan is required to be feasible. The decision 
variables of the problem are partitioned into two sets. The first-stage variables, which are often known as ‘design’ variables, have to be decided before the actual realization of the random parameters. Subsequently, once the values of the design variables have been decided and the random events have been realized, further policy improvements can be made by deciding the values of the second-stage variables, also known as ‘control’ or ‘operating’ variables. The design variables should be chosen so that the sum of the first-stage costs and the expected second-stage costs is minimized. These problems have been solved using decomposition schemes,
where the expectation functional over the uncertain pa- 
rameter space has been approximated using Monte- 
Carlo sampling [11], successive disaggregation [4] or 
by Gaussian quadrature [8]. Computational results in 
[11] show that a combination of Benders decomposition with Monte-Carlo sampling provides optimal or excellent near-optimal solutions. Problems with up to 10
processes, 4 products, 6 chemicals, and with up to 5** 
scenarios were solved in at most a few CPU minutes on 
a standard workstation. 
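The idea of approximating the expected second-stage cost by Monte-Carlo sampling can be sketched on a toy two-stage problem. The first-stage (design) decision is a single capacity level. The second-stage (operating) decision sells as much of a random demand as that capacity allows. Every ingredient is an illustrative assumption: uniform demand on [0, 100], price 3, unit capacity cost 1, and a grid search in place of Benders decomposition. The sketch therefore shows the sample-average idea only, not the method of [11].

```python
import random
from typing import List, Sequence, Tuple


def sample_demands(n: int, low: float, high: float, seed: int) -> List[float]:
    """Monte-Carlo sample of a uniformly distributed demand (assumed distribution)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def saa_profit(capacity: float, demands: Sequence[float],
               price: float, capacity_cost: float) -> float:
    """Sample-average estimate of expected second-stage revenue minus first-stage cost."""
    # Second stage: after demand d is revealed, sell min(capacity, d) units.
    recourse = sum(price * min(capacity, d) for d in demands) / len(demands)
    # First stage: capacity is paid for before the demand is known.
    return recourse - capacity_cost * capacity


def solve_saa(candidates: Sequence[float], demands: Sequence[float],
              price: float, capacity_cost: float) -> Tuple[float, float]:
    """Pick the first-stage capacity with the best sample-average profit."""
    best = max(candidates, key=lambda x: saa_profit(x, demands, price, capacity_cost))
    return best, saa_profit(best, demands, price, capacity_cost)


if __name__ == "__main__":
    demands = sample_demands(2000, 0.0, 100.0, seed=1)
    grid = [float(c) for c in range(101)]
    print(solve_saa(grid, demands, price=3.0, capacity_cost=1.0))
```

For these assumed numbers, the exact optimum satisfies P(demand > capacity) = 1/3, which gives a capacity of about 66.7. The sampled solution should land close to that value, and a larger sample tightens the approximation.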

From the flexibility objective point of view, one is interested in a plan that maximizes the range of the uncertain parameters over which the plan remains feasible. Problems of this type are typically harder to formulate and require the identification of a suitable measure of flexibility that one can optimize. Such a formulation has been presented in [24], which maximizes the stochastic flexibility metric of [23] subject to a cost constraint.

The objectives of optimizing cost or profit and 
maximizing flexibility are typically conflicting. For- 
mulations that combine the objectives by associating 
a retrofit cost corresponding to design flexibility have 
been presented in [16,17]. 

Liu and Sahinidis [13] applied a fuzzy programming 
approach for the problem of process planning under 
uncertainty. In this model, the uncertain parameters are 
considered to be fuzzy numbers with a known range of 
values, and constraints are treated as ‘soft,’ i.e. some vi- 
olation is allowed. The degree of satisfaction of the con- 
straints is then measured in terms of membership func- 
tions, and the objective is to optimize a measure of con- 
straint satisfaction. 

The standard stochastic programming formulation 
does not address the variability of the uncertain re- 
course costs across the uncertain parameter scenarios. 
The need for enforcing robustness of these costs is particularly important to a risk-averse planner in a high-variability environment. The stochastic programming formulation of the process planning problem has been
extended in [1] to account for robustness of the re- 
course costs through the use of an appropriate vari- 
ability criterion. In particular, upper partial mean has 
been proposed as the measure of variability for its intu- 
itive appeal and to avoid nonlinear formulations. These 
models provide the decision maker with a tool to an- 
alyze the trade-off associated with the expected profit 
and its variability. To overcome the difficulty asso- 
ciated with solving the robust models which include 
nonseparable terms, a heuristic procedure for the re- 
stricted recourse formulation has been developed. This 
method iteratively enforces recourse robustness while 
solving the standard stochastic program in each step. 
The heuristic generates similar but more conservative 
trade-off frontiers for the profit and its upper partial 
mean. 
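As a small illustration, the Python sketch below computes the upper partial mean for a discrete set of recourse-cost scenarios. It assumes the common definition UPM = E[max(0, Q − E[Q])], where Q is the scenario recourse cost. The scenario data and function names are hypothetical.

```python
from typing import Sequence


def expected_cost(costs: Sequence[float], probs: Sequence[float]) -> float:
    """Probability-weighted mean of the scenario recourse costs."""
    return sum(p * q for q, p in zip(costs, probs))


def upper_partial_mean(costs: Sequence[float], probs: Sequence[float]) -> float:
    """E[max(0, Q - E[Q])] for scenario recourse costs Q with probabilities p."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    mean = expected_cost(costs, probs)
    # Only scenarios whose cost exceeds the mean contribute.
    return sum(p * max(0.0, q - mean) for q, p in zip(costs, probs))


if __name__ == "__main__":
    probs = [1 / 3] * 3
    plan_a = [15.0, 20.0, 25.0]  # hypothetical recourse costs per scenario
    plan_b = [10.0, 20.0, 30.0]
    for name, costs in (("A", plan_a), ("B", plan_b)):
        print(name, expected_cost(costs, probs), upper_partial_mean(costs, probs))
```

Because max(0, ·) is piecewise linear, the same quantity can be written in an optimization model with one auxiliary variable per scenario, Δ_s ≥ Q_s − E[Q] and Δ_s ≥ 0. This is one way to see how the measure avoids nonlinear formulations. In the usage example, both plans have expected cost 20, but plan A has the smaller upper partial mean (5/3 versus 10/3).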


Conclusion 


The purpose of this article has been to review the re- 
cent advances in the use of optimization techniques in 
long range chemical process planning. Considerable at- 
tention has been devoted to the mixed integer linear 
programming formulation of the problem and efficient 
solution schemes that exploit the structure of the prob- 
lem have been developed. Continuous models have also 
been successfully solved using global optimization tech- 
niques. The combinatorial complexity of the problem 
has recently motivated the need for heuristics and their 
performance analysis. Some exciting new results have 
been obtained in this regard. Uncertainty of the prob- 
lem parameters has been dealt with through stochastic 
programming and fuzzy programming models. Vari- 
ous different objective criteria including expected value, 
flexibility and variability have been considered in exten- 
sions of the two-stage stochastic programming formu- 
lation and a number of efficient algorithms have been 
developed for industrially relevant problems. 
Cholesky Factorization

Solving a linear system of the form $Ax = b$, where $A \in \mathbb{R}^{n\times n}$ and $b \in \mathbb{R}^n$, is one of the most fundamental problems in mathematics and science. The two basic categories of numerical solution methods are direct methods and iterative methods. Of the direct methods, Gaussian elimination with backsolving is the most commonly used technique. Straightforward Gaussian elimination uses row operations to reduce the system to upper-triangular form before backsolving to find the solution, whereas the equivalent LU-decomposition initially factors A into the product of a lower- and an upper-triangular matrix: A = LU. In the special case where A is both symmetric and positive definite, the matrix may be decomposed into $A = LL^T$, where $L \in \mathbb{R}^{n\times n}$ is lower-triangular. This decomposition is known as the Cholesky factorization, and is named for A.L. Cholesky.

The LU-decomposition of a square matrix A is the factorization of A into the product of a lower-triangular matrix $L \in \mathbb{R}^{n\times n}$ and an upper-triangular matrix $U \in \mathbb{R}^{n\times n}$. The system $Ax = (LU)x = b$ is then solved by forward solving $Ly = b$, where $y = Ux$, and then backsolving $Ux = y$. The solution can be found in roughly the same number of floating point operations (flops) as Gaussian elimination with backward substitution. More specifically, both methods require about $n^3/3$ multiplications/divisions and $n^3/3$ additions/subtractions for large n. The main advantage of this method is that once the matrix is factored (which requires $O(n^3)$ operations), the system can be solved repeatedly for different b, each solve requiring only $O(n^2)$ operations. One drawback of this method is that pivoting may be required to find the decomposition.
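The factor-once, solve-many pattern described above can be sketched in plain Python. This is a Doolittle-style LU-decomposition without pivoting, in which L has a unit diagonal. It therefore stops on a zero pivot, which is exactly where pivoting would be required. The function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(A: Matrix) -> Tuple[Matrix, Matrix]:
    """Doolittle LU without pivoting: A = L U with unit lower-triangular L."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        L[k][k] = 1.0
        for j in range(k, n):  # row k of U
            U[k][j] = A[k][j] - sum(L[k][p] * U[p][j] for p in range(k))
        if U[k][k] == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}: pivoting required")
        for i in range(k + 1, n):  # column k of L
            L[i][k] = (A[i][k] - sum(L[i][p] * U[p][k] for p in range(k))) / U[k][k]
    return L, U


def lu_solve(L: Matrix, U: Matrix, b: List[float]) -> List[float]:
    """Forward solve L y = b, then back solve U x = y: O(n^2) per right-hand side."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][p] * y[p] for p in range(i))  # L[i][i] == 1
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][p] * x[p] for p in range(i + 1, n))) / U[i][i]
    return x


if __name__ == "__main__":
    L, U = lu_decompose([[4.0, 2.0], [2.0, 5.0]])  # factor once, O(n^3)
    for b in ([8.0, 9.0], [4.0, 2.0]):             # reuse the factors for each b
        print(lu_solve(L, U, b))
```

For A = [[4, 2], [2, 5]], this gives L = [[1, 0], [0.5, 1]] and U = [[4, 2], [0, 4]]. Each further right-hand side then costs only the two O(n^2) triangular solves.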

A special class of problems arises if the matrix in the system is positive definite, i.e. $x^T A x > 0$ for all $x \neq 0$. Note that if A is positive definite but not symmetric, this means that $\frac{1}{2}(A + A^T)$, the symmetric part of A, is positive definite. The matrix A can then be decomposed into the form $A = LDM^T$, where L and M are unit lower-triangular and D is a diagonal matrix containing the pivots of A. If, in addition to being positive definite, A is also symmetric, i.e. $A = A^T$, then $M = L$ and the matrix has the special decomposition $A = LDL^T$, where D has positive diagonal entries. Therefore $\sqrt{D}$ exists and A can be decomposed into $A = \tilde{L}\tilde{L}^T$, where $\tilde{L} = L\sqrt{D}$ is referred to as the Cholesky triangle. Hence the Cholesky factorization is often referred to as the ‘square-rooting method’ [5]. The major advantage of this is that it requires around half the flops of the standard LU-decomposition.

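The Cholesky factorization, presented below in pseudocode for symmetric and positive definite $A \in \mathbb{R}^{n\times n}$, is taken from [2]. The lower triangle of A is overwritten by the Cholesky triangle.

FOR k = 1, ..., n
    a_kk := ( a_kk − Σ_{p=1}^{k−1} a_kp^2 )^{1/2}
    FOR i = k + 1, ..., n
        a_ik := ( a_ik − Σ_{p=1}^{k−1} a_ip a_kp ) / a_kk
    END
END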

Cholesky Factorization, Algorithm 1 
A pseudocode for the Cholesky factorization 
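A direct Python transcription of Algorithm 1 follows, using 0-based indices. Unlike the pseudocode, it writes the Cholesky triangle into a new matrix instead of overwriting A. It also raises an error when a diagonal square-root argument is not positive, which signals that A is not (numerically) positive definite. The function name is illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(A: Matrix) -> Matrix:
    """Return lower-triangular Lt with A = Lt Lt^T, following Algorithm 1."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Diagonal entry: a_kk - sum_{p<k} l_kp^2 must be positive.
        s = A[k][k] - sum(L[k][p] ** 2 for p in range(k))
        if s <= 0.0:
            raise ValueError(f"not positive definite (pivot {k} is {s})")
        L[k][k] = math.sqrt(s)
        # Entries below the diagonal in column k.
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][p] * L[k][p] for p in range(k))) / L[k][k]
    return L


if __name__ == "__main__":
    print(cholesky([[4.0, 2.0], [2.0, 5.0]]))  # [[2.0, 0.0], [1.0, 2.0]]
```

Applied to the matrix [[4, 2], [2, 5]] of Example 1, it returns the Cholesky triangle [[2, 0], [1, 2]].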


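Example 1 Let

$$A = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}.$$

This matrix is both symmetric and positive definite. Therefore a Cholesky factorization exists for A (see [3] for a proof). An LU-decomposition for it is

$$A = \begin{pmatrix} 1 & 0 \\ 1/2 & 1 \end{pmatrix}\begin{pmatrix} 4 & 2 \\ 0 & 4 \end{pmatrix}.$$

Note that the LU-decomposition is not unique, as another LU-decomposition is

$$A = \begin{pmatrix} 4 & 0 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 1 & 1/2 \\ 0 & 1 \end{pmatrix}.$$

The pivots in both cases are 4 and 4. Hence, the $LDL^T$-decomposition is

$$A = \begin{pmatrix} 1 & 0 \\ 1/2 & 1 \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix}\begin{pmatrix} 1 & 1/2 \\ 0 & 1 \end{pmatrix}.$$

Finally, the Cholesky factorization is

$$A = \tilde{L}\tilde{L}^T = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.$$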

Additionally, this factorization (with positive diagonal entries in $\tilde{L}$) is unique for symmetric, positive definite matrices. The decomposition can be performed in fixed-point arithmetic with no pivoting required [9]. This implies that the Cholesky decomposition is guaranteed to be stable without pivoting.

If the matrix A is ill-conditioned, i.e. has condition
number $\kappa(A) = \|A\|\,\|A^{-1}\| \gg 1$, then the matrix may be
nearly singular and the computed solution to the sys- 
tem, Ax = b, may not be sufficiently accurate. A process 
of iterative refinement may be used to assess the accu- 
racy of the solution and then improve upon it when 
working in higher precision is not practical. See [2,9] 
for a more detailed explanation. 
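A minimal sketch of iterative refinement built on a Cholesky factorization follows. The matrix is factored once. Each step computes the residual r = b − Ax, solves A d = r with the existing factors and updates x ← x + d. Here the residual is accumulated exactly with Python's `fractions.Fraction` as a stand-in for higher precision. That choice, the number of steps and the function names are assumptions for illustration, not the procedure of [2,9].

```python
import math
from fractions import Fraction
from typing import List

Matrix = List[List[float]]


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T (as in Algorithm 1)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        s = A[k][k] - sum(L[k][p] ** 2 for p in range(k))
        if s <= 0.0:
            raise ValueError("matrix is not (numerically) positive definite")
        L[k][k] = math.sqrt(s)
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][p] * L[k][p] for p in range(k))) / L[k][k]
    return L


def cholesky_solve(L: Matrix, b: List[float]) -> List[float]:
    """Solve L L^T x = b by a forward and a backward triangular solve."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # L y = b
        y[i] = (b[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # L^T x = y
        x[i] = (y[i] - sum(L[p][i] * x[p] for p in range(i + 1, n))) / L[i][i]
    return x


def refine(A: Matrix, b: List[float], steps: int = 3) -> List[float]:
    """Solve A x = b, then apply a few steps of iterative refinement."""
    n = len(b)
    L = cholesky(A)  # factor once; every correction reuses L
    x = cholesky_solve(L, b)
    for _ in range(steps):
        # Residual r = b - A x, accumulated exactly with rationals.
        r = [float(Fraction(b[i]) - sum(Fraction(A[i][j]) * Fraction(x[j])
                                        for j in range(n)))
             for i in range(n)]
        d = cholesky_solve(L, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

A typical use is an ill-conditioned matrix such as the 6 × 6 Hilbert matrix H with b = H·(1, ..., 1)^T, where the refined solution should agree closely with the all-ones vector.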

The Cholesky factorization can also be used to find the inverse and determinant of a symmetric, positive definite matrix (the LU-decomposition can be used for general $A \in \mathbb{R}^{n\times n}$). It is important for the matrix to be positive definite for a variety of reasons; for example, if A is symmetric but not positive definite, then stability is not guaranteed. In the case of finding the inverse of a matrix A, a poor inverse may be obtained even if A is well-conditioned, i.e. κ is ‘close’ to 1.
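The determinant and inverse of a symmetric positive definite matrix both follow from the Cholesky triangle. The determinant is det A = (∏ l_kk)^2, and column j of A^{-1} solves A x = e_j using the same factors. The sketch below assumes a plain list-of-lists matrix, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T (as in Algorithm 1)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        s = A[k][k] - sum(L[k][p] ** 2 for p in range(k))
        if s <= 0.0:
            raise ValueError("matrix is not (numerically) positive definite")
        L[k][k] = math.sqrt(s)
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][p] * L[k][p] for p in range(k))) / L[k][k]
    return L


def cholesky_solve(L: Matrix, b: List[float]) -> List[float]:
    """Solve L L^T x = b with one forward and one backward solve."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[p][i] * x[p] for p in range(i + 1, n))) / L[i][i]
    return x


def det_spd(A: Matrix) -> float:
    """det(A) = det(L) det(L^T) = (product of the diagonal of L)^2."""
    L = cholesky(A)
    prod = 1.0
    for k in range(len(A)):
        prod *= L[k][k]
    return prod * prod


def inverse_spd(A: Matrix) -> Matrix:
    """Column j of A^{-1} solves A x = e_j; the factorization is computed once."""
    n = len(A)
    L = cholesky(A)
    cols = [cholesky_solve(L, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

For A = [[4, 2], [2, 5]], it gives det A = 16 and A^{-1} = (1/16)[[5, −2], [−2, 4]].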

The efficiency of the Cholesky factorization can be further improved if the matrix is ‘banded’. A matrix $A = [a_{ij}]$ is said to have upper bandwidth q if $a_{ij} = 0$ whenever $j > i + q$, and lower bandwidth p if $a_{ij} = 0$ whenever $i > j + p$. Since A is symmetric, when A has lower bandwidth p, it also has upper bandwidth p; in this case A is said to have bandwidth p. For example, if p = 1, then A is tridiagonal. The following algorithm from [2] takes advantage of the fact that A is symmetric, positive definite and has bandwidth p. It requires n square roots and approximately $n(p^2 + 3p)$ flops for $p \ll n$.


FOR i = 1, ..., n
    FOR j = max{1, i − p}, ..., i − 1
        a_ij := ( a_ij − Σ_{k=max{1,i−p}}^{j−1} a_ik a_jk ) / a_jj
    END
    a_ii := ( a_ii − Σ_{k=max{1,i−p}}^{i−1} a_ik^2 )^{1/2}
END


Cholesky Factorization, Algorithm 2 
A pseudocode for the banded Cholesky factorization 
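The following Python sketch mirrors Algorithm 2 with 0-based indices. For row i, only the columns max(0, i − p), ..., i are visited, so entries of L outside the band are never touched and remain zero. For simplicity the sketch stores the matrix densely. That is an assumption made here; a practical code would store only the band.

```python
import math
from typing import List

Matrix = List[List[float]]


def banded_cholesky(A: Matrix, p: int) -> Matrix:
    """Cholesky factor of a symmetric positive definite A with bandwidth p."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lo = max(0, i - p)  # first column inside the band for row i
        for j in range(lo, i):
            s = sum(L[i][k] * L[j][k] for k in range(lo, j))
            L[i][j] = (A[i][j] - s) / L[j][j]
        d = A[i][i] - sum(L[i][k] ** 2 for k in range(lo, i))
        if d <= 0.0:
            raise ValueError("matrix is not (numerically) positive definite")
        L[i][i] = math.sqrt(d)
    return L
```

For the tridiagonal matrix [[4, 2, 0], [2, 5, 2], [0, 2, 5]] with p = 1, it returns the bidiagonal factor [[2, 0, 0], [1, 2, 0], [0, 1, 2]].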


Another important application of the Cholesky factorization is the key role it plays in one of the most commonly used numerical techniques for solving the least squares problem (LS problem; cf. also Least squares problems). The least squares problem is to find the ‘best’ solution to Ax = b when the system is inconsistent, for $A \in \mathbb{R}^{m\times n}$ with $m \ge n$. Instead, the system $A^T A x = A^T b$, more commonly known as the normal equations, is solved by first finding the Cholesky factorization $A^T A = \tilde{L}\tilde{L}^T$ of the symmetric matrix $A^T A$, which is positive definite if A has rank n. Next, $\tilde{L} y = A^T b$ is forward-solved, and finally the ‘best least squares’ solution $\hat{x}$ is found by backsolving $\tilde{L}^T \hat{x} = y$. Note that $\hat{x}$ minimizes $\|Ax - b\|_2$. The algorithm requires $O(mn^2)$ flops, dominated by forming $A^T A$ (about $mn^2$ flops) and factoring it (about $n^3/3$ flops). For the algorithm and an analysis of the accuracy of the method, see [2].
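The three steps of the normal-equations method are sketched below in plain Python: form A^T A and A^T b, factor A^T A = L̃L̃^T, then do one forward and one backward solve. The function name is illustrative. The sketch does not address the accuracy concerns of forming A^T A that are analysed in [2].

```python
import math
from typing import List

Matrix = List[List[float]]


def normal_equations_lstsq(A: Matrix, b: List[float]) -> List[float]:
    """Least squares solution of A x = b (A is m x n, rank n) via A^T A = L L^T."""
    m, n = len(A), len(A[0])
    # Form C = A^T A (symmetric n x n) and d = A^T b: about m n^2 flops.
    C = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    d = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Cholesky factorization C = L L^T: about n^3 / 3 flops.
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        s = C[k][k] - sum(L[k][p] ** 2 for p in range(k))
        if s <= 0.0:
            raise ValueError("A^T A is not positive definite (rank(A) < n?)")
        L[k][k] = math.sqrt(s)
        for i in range(k + 1, n):
            L[i][k] = (C[i][k] - sum(L[i][p] * L[k][p] for p in range(k))) / L[k][k]
    # Forward solve L y = A^T b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (d[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    # Back solve L^T x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[p][i] * x[p] for p in range(i + 1, n))) / L[i][i]
    return x
```

Fitting a line c_0 + c_1·t to the points (0, 1), (1, 3), (2, 5), (3, 7), using rows [1, t], gives approximately (1, 2).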
Combinatorial Matrix Analysis

Broadly speaking, matrix analysis is the study of nonalgebraic properties of matrices and the analysis of matrices in order to reveal their finer properties and structure. (Here, ‘algebraic’ is understood at least in the classical sense of algebra.) The matrices are usually real or
complex matrices. Combinatorial matrix analysis is the 
study of combinatorial properties of matrices and the 
analysis of matrices which takes into account combi- 
natorial structure. Here combinatorial structure usually 
refers to the zero-nonzero pattern of a matrix, captured 
through the use of either the directed graph or bipartite 
graph associated with the nonzero entries of a matrix 
(or the graph of the nonzero entries in the case of a sym- 
metric matrix), or to the positive-negative-zero pattern 
of a real matrix, captured through the use of the signed 
digraph or signed bipartite graph of a matrix. 

This article is intended as an introduction to com- 
binatorial matrix analysis. More detail can be found in 
[5] and [6], and the references contained therein. We do 
not discuss here the many applications of matrix theory 
and linear algebra to combinatorics, graphs, and dis- 
crete structures. 


Matrix Patterns and Various Graphs 


Let $A = [a_{ij}]$ be a matrix of order n whose entries $a_{ij}$ are real or complex numbers. To A there corresponds a directed graph (or digraph) D(A) with vertex set V = {1, ..., n} and with an arc (i, j) from vertex i to vertex j if and only if $a_{ij} \neq 0$. The bipartite graph (or bigraph) BG(A) of A has vertex set $\{1_r, \ldots, n_r\}$ (corresponding to the rows of A) and $\{1_c, \ldots, n_c\}$ (corresponding to the columns of A); the edges of BG(A) are all pairs $\{i_r, j_c\}$ for which $a_{ij} \neq 0$. The bipartite graph of a matrix can be defined for a rectangular m × n matrix in the same way, except that the vertices corresponding to the rows are $\{1_r, \ldots, m_r\}$. Both the digraph and the bigraph reveal the zero-nonzero pattern of a square matrix A.

If A is a real matrix and we want to capture the sign
(+, −, 0) of the entries of A, then we assign a + or −
to each arc of D(A) (to each edge of BG(A)) according
as the corresponding entry of A is positive or negative, 
and in this way obtain the signed digraph and signed bi- 
graph of A. We use the same notations D(A) and BG(A) 
for the signed versions of the digraph and bigraph of 
A. Thus two matrices A and B have the same sign pat- 
tern if and only if they have the same signed digraphs 
(equivalently, the same signed bigraphs). 

If A is a symmetric matrix (or has a symmetric pattern in the sense that $a_{ij} \neq 0$ if and only if $a_{ji} \neq 0$), then the graph G(A) of A has vertex set {1, ..., n} with an edge {i, j} between i and j if and only if $a_{ij} \neq 0$ (equivalently, $a_{ji} \neq 0$). Thus G(A) is obtained from D(A) by ‘removing’ the directions on arcs (this may result in two edges joining certain pairs of vertices, and one edge of each such pair is removed as well). Sometimes in D(A) and G(A) it is convenient to ignore the arcs (i, i) and edges {i, i} (called loops) corresponding to nonzero entries $a_{ii}$ on the main diagonal of A.

A square matrix A of order n is irreducible provided there does not exist a permutation matrix P such that

$$PAP^T = \begin{pmatrix} A_1 & O \\ A_{21} & A_2 \end{pmatrix},$$

where $A_1$ is a square matrix of order k for some k with 0 < k < n. (The matrix $PAP^T$ is obtained from A by simultaneously permuting its rows and columns. The digraphs of A and $PAP^T$ are isomorphic.) A digraph is strongly connected provided for each ordered pair of distinct vertices i and j there is a path from i to j.


Proposition 1 The matrix A is irreducible if and only 
if the digraph D(A) is strongly connected, [5]. 
A square matrix can be brought to a very special form
by simultaneous row and column permutations.


Theorem 2 Let A be a matrix of order n. Then there exist a permutation matrix P and an integer k ≥ 1 such that

$$PAP^T = \begin{pmatrix} A_1 & O & \cdots & O \\ A_{21} & A_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \cdots & A_k \end{pmatrix},$$

where $A_1, \ldots, A_k$ are square, irreducible matrices. The matrices $A_1, \ldots, A_k$ are uniquely determined to within simultaneous permutations of their rows and columns, but their order on the diagonal is not necessarily unique.


The matrices $A_1, \ldots, A_k$ in this theorem are called the irreducible components of A and correspond to the strongly connected components of the digraph D(A).
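By Proposition 1 and the remark above, both irreducibility and the irreducible components can be read off the digraph D(A). The sketch below uses Kosaraju's two-pass depth-first search, written iteratively and ignoring loops. It returns the components in an order for which every arc between different components goes from a later component to an earlier one. Placing the components in this order therefore yields the block lower-triangular form of Theorem 2. Function names are illustrative.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def strongly_connected_components(A: Matrix) -> List[List[int]]:
    """Strongly connected components of D(A) (Kosaraju), loops ignored.

    Every arc between two different components goes from a later
    component in the returned list to an earlier one.
    """
    n = len(A)
    adj = [[j for j in range(n) if j != i and A[i][j] != 0] for i in range(n)]
    radj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            radj[j].append(i)

    # Pass 1: iterative DFS recording vertices in order of finishing time.
    seen = [False] * n
    order: List[int] = []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(v)
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(adj[w])))

    # Pass 2: DFS on the reversed digraph in decreasing finishing time.
    comp = [-1] * n
    components: List[List[int]] = []
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = len(components)
        members, stack = [s], [s]
        while stack:
            v = stack.pop()
            for u in radj[v]:
                if comp[u] == -1:
                    comp[u] = comp[s]
                    members.append(u)
                    stack.append(u)
        components.append(sorted(members))
    # Kosaraju yields source components first; reverse for block lower-triangular order.
    return components[::-1]


def is_irreducible(A: Matrix) -> bool:
    """Proposition 1: A is irreducible iff D(A) is strongly connected."""
    return len(strongly_connected_components(A)) == 1
```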

Irreducible matrices have an inductive structure that 
is revealed in the next theorem [5]. 


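Theorem 3 Let A be an irreducible matrix of order n ≥ 2. Then there exist a permutation matrix P and an integer m ≥ 2 such that

$$PAP^T = \begin{pmatrix} A_1 & O & \cdots & O & E_1 \\ E_2 & A_2 & \cdots & O & O \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ O & O & \cdots & A_{m-1} & O \\ O & O & \cdots & E_m & A_m \end{pmatrix},$$

where $A_1, \ldots, A_m$ are irreducible matrices and $E_1, \ldots, E_m$ are matrices having at least one nonzero entry.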


Allowing independent row and column permutations in the definition of irreducibility leads to full indecomposability. A square matrix A of order n is fully indecomposable provided there do not exist permutation matrices P and Q such that

$$PAQ = \begin{pmatrix} A_1 & O \\ A_{21} & A_2 \end{pmatrix},$$

where $A_1$ is a square matrix of order k for some k with 0 < k < n. The matrices A and PAQ have isomorphic bigraphs.

Theorems analogous to Theorems 2 and 3 hold 
with independent permutations replacing simultaneous 


permutations and fully indecomposable replacing irre- 
ducible. The connection is provided by the fact that 
a square matrix A is fully indecomposable if and only 
if there are permutation matrices P and Q such that 
PAQ has a nonzero main diagonal and PAQ is irre- 
ducible [5]. 


Eigenvalues and Digraphs 


The following theorem is the Perron-Frobenius theorem [7,8] and is one of the first instances of the influence of the digraph of a matrix on its spectral properties.

Theorem 4 (Perron-Frobenius theorem) Let A be a matrix of order n > 1 each of whose entries is a nonnegative real number. Assume that A is irreducible, equivalently D(A) is strongly connected. Then there is a positive number ρ(A) such that

1) ρ(A) is a simple eigenvalue of A;

2) every eigenvalue λ of A satisfies |λ| ≤ ρ(A);

3) the number of eigenvalues λ of A with |λ| = ρ(A) equals the greatest common divisor k of the lengths of the circuits of D(A), and these eigenvalues are

$$\rho(A)\, e^{2\pi i s/k} \quad (s = 0, 1, \ldots, k-1).$$
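The number k in part 3) of Theorem 4 can be computed without listing circuits. For a strongly connected digraph, take breadth-first-search levels from any vertex; k is then the gcd of level(u) + 1 − level(v) over all arcs (u, v). The sketch assumes A is irreducible, since it only checks that every vertex is reachable from vertex 0. It counts loops as circuits of length 1, and its name is illustrative.

```python
from collections import deque
from math import gcd
from typing import Sequence


def cyclic_index(A: Sequence[Sequence[float]]) -> int:
    """gcd of circuit lengths of D(A) for irreducible A (loops count as length 1)."""
    n = len(A)
    adj = [[j for j in range(n) if A[i][j] != 0] for i in range(n)]
    level = [-1] * n
    level[0] = 0
    queue = deque([0])
    while queue:  # breadth-first search levels from vertex 0
        u = queue.popleft()
        for v in adj[u]:
            if level[v] == -1:
                level[v] = level[u] + 1
                queue.append(v)
    if -1 in level:
        raise ValueError("some vertex is unreachable, so D(A) is not strongly connected")
    k = 0
    for u in range(n):
        for v in adj[u]:
            # Difference between reaching v through arc (u, v) and its BFS level.
            k = gcd(k, level[u] + 1 - level[v])
    return k
```

For the cyclic permutation matrix of order 3, it returns 3, matching its eigenvalues, the three cube roots of unity. Adding a nonzero diagonal entry creates a loop and the result drops to 1.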


A more recent application of the digraph of a matrix to localization of its eigenvalues concerns a generalization [2] of the Gershgorin theorem. Let $A = [a_{ij}]$ be a complex matrix of order n and let

$$R_i = \sum_{j \neq i} |a_{ij}| \quad (1 \le i \le n).$$

Then Gershgorin’s theorem asserts that the n eigenvalues of A lie in that part R of the complex plane determined by the union of the n closed disks

$$\{z : |z - a_{ii}| \le R_i\} \quad (1 \le i \le n).$$

If A is irreducible, then a boundary point of R is an eigenvalue of A only if it is a boundary point of each of the n closed disks.
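The disks in Gershgorin's theorem are easy to compute. The sketch below builds them and, for 2 × 2 matrices only, compares them with the exact eigenvalues obtained from the quadratic formula. The function names and the example matrix are illustrative.

```python
import cmath
from typing import List, Sequence, Tuple


def gershgorin_disks(A: Sequence[Sequence[complex]]) -> List[Tuple[complex, float]]:
    """(centre a_ii, radius R_i = sum_{j != i} |a_ij|) for each row i."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i)) for i in range(n)]


def in_gershgorin_region(z: complex, A: Sequence[Sequence[complex]]) -> bool:
    """True if z lies in the union R of the closed disks."""
    return any(abs(z - c) <= r for c, r in gershgorin_disks(A))


def eigenvalues_2x2(A: Sequence[Sequence[complex]]) -> Tuple[complex, complex]:
    """Exact eigenvalues of a 2 x 2 matrix from its characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


if __name__ == "__main__":
    A = [[4.0, 1.0], [2.0, -3.0]]
    print(gershgorin_disks(A))  # [(4.0, 1.0), (-3.0, 2.0)]
    print([in_gershgorin_region(lam, A) for lam in eigenvalues_2x2(A)])  # [True, True]
```

For A = [[4, 1], [2, −3]], the disks are centred at 4 and −3 with radii 1 and 2. The eigenvalues (1 ± √57)/2, approximately 4.27 and −3.27, lie in them, while z = 0 does not.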

By considering the circuits, or directed cycles, of the 
digraph of A, a better inclusion region can be obtained. 


Theorem 5 Let $A = [a_{ij}]$ be a complex matrix of order n. Then the n eigenvalues of A lie in that part S of the complex plane determined by the union of the regions

$$S(\gamma) = \Big\{z : \prod_{\gamma} |z - a_{ii}| \le \prod_{\gamma} R_i\Big\} \quad (\gamma \text{ a circuit of } D(A)),$$

where $\prod_{\gamma}$ denotes the product over all vertices i belonging to the circuit γ. If A is irreducible, then a boundary point of S is an eigenvalue of A only if it is a boundary point of each S(γ).


Theorems 4 and 5 demonstrate how information con- 
cerning the combinatorial structure of a matrix can be 
used to give information on spectral properties of the 
matrix. 


Sign-Nonsingular Matrices 


It is easy to characterize real matrices $A = [a_{ij}]$ of order n whose singularity is a consequence of their zero pattern, equivalently, of their nonzero pattern or bigraph BG(A). Let Z(A) denote the set of all real matrices B of order n that have the same zero pattern as A, that is, satisfy BG(B) = BG(A). Then the following are equivalent:

i) Each matrix B ∈ Z(A) is singular;

ii) Each of the n! terms in the standard determinant expansion of A is zero (the standard determinant expansion of a matrix A is

$$\det A = \sum \varepsilon(i_1 \ldots i_n)\, a_{1 i_1} \cdots a_{n i_n},$$

where the sum extends over each permutation $i_1 \ldots i_n$ of {1, ..., n}, and $\varepsilon(i_1 \ldots i_n)$ is +1 or −1 depending on whether the permutation is even or odd);

iii) The bigraph BG(A) does not have a perfect match- 
ing (i.e. a set of n pairwise vertex disjoint edges 
meeting all vertices); 

iv) There is a set of fewer than n rows and columns 
which together contain all the nonzero entries of A; 

v) There is a set of fewer than n vertices of BG(A)
which together meet all the edges of BG(A).

Properties ii) and iii) are clearly equivalent, as are properties iv) and v). Properties i) and ii) are equivalent, since if there is a nonzero term in the standard determinant expansion of A, then by making the entries of A in that term sufficiently large in absolute value we obtain a nonsingular matrix in Z(A). Properties iii) and iv) are equivalent by the Frobenius-König theorem [5].
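Property iii) gives a practical test for singularity forced by the zero pattern: look for a perfect matching in BG(A). The sketch below matches rows to columns with simple augmenting paths (Kuhn's method). It is illustrative and recursive, so it suits small matrices; the function names are hypothetical.

```python
from typing import List, Sequence


def has_perfect_matching(A: Sequence[Sequence[float]]) -> bool:
    """Kuhn's augmenting-path algorithm on the bigraph BG(A)."""
    n = len(A)
    match_col: List[int] = [-1] * n  # match_col[j] = row currently matched to column j

    def augment(i: int, visited: List[bool]) -> bool:
        for j in range(n):
            if A[i][j] != 0 and not visited[j]:
                visited[j] = True
                # Column j is free, or its current row can be re-matched elsewhere.
                if match_col[j] == -1 or augment(match_col[j], visited):
                    match_col[j] = i
                    return True
        return False

    return all(augment(i, [False] * n) for i in range(n))


def is_structurally_singular(A: Sequence[Sequence[float]]) -> bool:
    """Property i): every matrix with the zero pattern of A is singular."""
    return not has_perfect_matching(A)
```

For example, [[1, 1, 0], [1, 0, 0], [1, 0, 0]] is structurally singular, since rows 2 and 3 have nonzero entries only in column 1.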


Now let Q(A) denote the set of all real matrices of
order n that have the same sign pattern (+, −, 0) as A.
Q(A) consists of all real matrices of order n that have 
the same signed digraph (equivalently, the same signed 
bigraph) as A and is called the qualitative class of A. The 
matrix A is called sign-nonsingular provided each ma- 
trix in Q(A) is nonsingular. Some equivalent character- 
izations of sign-nonsingularity are: 

i) A is sign-nonsingular;

ii) There is a nonzero term in the standard determinant expansion of A, and each such nonzero term has the same sign;

iii) det(A) ≠ 0 and the determinants of the matrices in Q(A) all have the same sign.


The matrix

$$\begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & -1 \\ 1 & 1 & 1 \end{pmatrix}$$

is a sign-nonsingular matrix.
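Characterization ii) translates directly into a brute-force test that enumerates all n! permutations. This is only practical for small n. The Python sketch is illustrative and uses only the signs of the entries.

```python
from itertools import permutations
from typing import List, Sequence


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def permutation_parity(perm: Sequence[int]) -> int:
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    n = len(perm)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def is_sign_nonsingular(A: List[List[float]]) -> bool:
    """True iff the standard determinant expansion of A has a nonzero term
    and all nonzero terms have the same sign (characterization ii)."""
    n = len(A)
    term_signs = set()
    for perm in permutations(range(n)):
        s = permutation_parity(perm)
        for i in range(n):
            s *= sign(A[i][perm[i]])
            if s == 0:
                break
        if s != 0:
            term_signs.add(s)
    return len(term_signs) == 1
```

It confirms that the 3 × 3 matrix above is sign-nonsingular: its four nonzero determinant terms are all positive. By contrast, [[1, 1], [1, 1]] fails because its two terms have opposite signs.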

If a matrix $A = [a_{ij}]$ is sign-nonsingular, then so is every matrix PAQ where P and Q are permutation matrices, as is every matrix of the form DA where D is a nonsingular diagonal matrix. Also, a sign-nonsingular matrix must have a nonzero term in its standard determinant expansion. Thus in dealing with sign-nonsingular matrices we may assume that each entry on the main diagonal is negative, that is, A has a negative main diagonal. With this normalization, we have the following theorem [1]. The sign of a circuit

$$\gamma: i_1 \to i_2 \to \cdots \to i_k \to i_1$$

of the signed digraph of A is

$$\operatorname{sign}(\gamma) = \operatorname{sign}\left(a_{i_1 i_2} \cdots a_{i_{k-1} i_k}\, a_{i_k i_1}\right),$$

the product of the signs of the arcs of the circuit.


Theorem 6 (Bassett-Maybee-Quirk theorem) Let A 
be a real matrix of order n with a negative main diago- 
nal. Then A is a sign-nonsingular matrix if and only if 
each circuit of the signed digraph of A is negative. 
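Theorem 6 can also be checked mechanically by listing the circuits of length at least 2 in the signed digraph; loops are negative by the normalization. The sketch below enumerates each simple circuit once, starting from its smallest vertex. It is exponential in the worst case and is intended only for small examples. Names are illustrative.

```python
from typing import List, Sequence


def simple_circuits(arcs_from: Sequence[Sequence[int]]) -> List[List[int]]:
    """All simple directed circuits, each listed once from its smallest vertex."""
    circuits: List[List[int]] = []

    def extend(start: int, path: List[int], on_path: set) -> None:
        for w in arcs_from[path[-1]]:
            if w == start:
                circuits.append(list(path))
            elif w > start and w not in on_path:
                path.append(w)
                on_path.add(w)
                extend(start, path, on_path)
                path.pop()
                on_path.remove(w)

    for s in range(len(arcs_from)):
        extend(s, [s], {s})
    return circuits


def bmq_sign_nonsingular(A: Sequence[Sequence[float]]) -> bool:
    """Theorem 6: with a negative main diagonal, A is sign-nonsingular
    iff every circuit of its signed digraph is negative."""
    n = len(A)
    if any(A[i][i] >= 0 for i in range(n)):
        raise ValueError("Theorem 6 assumes a negative main diagonal")
    # Loops are excluded here: they are negative by assumption.
    arcs = [[j for j in range(n) if j != i and A[i][j] != 0] for i in range(n)]
    for circuit in simple_circuits(arcs):
        s = 1
        for a, b in zip(circuit, circuit[1:] + circuit[:1]):
            s *= 1 if A[a][b] > 0 else -1
        if s > 0:
            return False
    return True
```

Multiplying the 3 × 3 sign-nonsingular matrix above by −1 gives a negative main diagonal. Its circuits 1→2→1, 2→3→2 and 1→2→3→1 are all negative, in agreement with the theorem.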


Sign-nonsingularity allows one to characterize square homogeneous systems of linear equations Ax = 0 for which $\tilde{A}x = 0$ has only the zero solution (thus all solutions have the same sign pattern) for all matrices $\tilde{A}$ with the same sign pattern as A. A more general problem is to characterize linear systems Ax = b that are sign-solvable, in the sense that the sign pattern of the solution is determined solely by the sign patterns of A and b. More precisely, Ax = b is sign-solvable provided that for all $\tilde{A} \in Q(A)$ and all $\tilde{b} \in Q(b)$ there is a vector $\tilde{x}$ such that $\tilde{A}\tilde{x} = \tilde{b}$, and all of the vectors in

$$\{\tilde{x} : \exists\, \tilde{A} \in Q(A),\ \tilde{b} \in Q(b) \text{ s.t. } \tilde{A}\tilde{x} = \tilde{b}\}$$

have the same sign pattern.

Sign-solvable linear systems can be characterized in terms of two classes of matrices, called S*-matrices and L-matrices. An n × (n + 1) matrix B is an S*-matrix provided each matrix of order n obtained from B by deleting a column is a sign-nonsingular matrix. Cramer’s rule implies that B is an S*-matrix if and only if there is a vector w with no zero coordinates such that the right null spaces of the matrices $\tilde{B} \in Q(B)$ are contained in $\{0\} \cup Q(w) \cup Q(-w)$. The matrix


is an S*-matrix. A matrix A is an L-matrix provided ev- 
ery matrix in Q(A) has linearly independent rows. Sign- 
nonsingular matrices are square L-matrices. Every S*- 
matrix is an L-matrix and so is any matrix obtained 
from an L-matrix by appending columns. 
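The definition of an S*-matrix can be tested by deleting each column in turn and checking the remaining square matrix for sign-nonsingularity by enumerating determinant terms. The 2 × 3 example below is an illustrative choice checked by hand, not the example matrix of the original text. The code is only practical for small n.

```python
from itertools import permutations
from typing import Sequence


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def is_sign_nonsingular(A: Sequence[Sequence[float]]) -> bool:
    """A nonzero determinant term exists and all nonzero terms share one sign."""
    n = len(A)
    signs = set()
    for perm in permutations(range(n)):
        inversions = sum(perm[a] > perm[b] for a in range(n) for b in range(a + 1, n))
        s = -1 if inversions % 2 else 1
        for i in range(n):
            s *= _sign(A[i][perm[i]])
        if s != 0:
            signs.add(s)
    return len(signs) == 1


def is_s_star(B: Sequence[Sequence[float]]) -> bool:
    """An n x (n+1) matrix is an S*-matrix iff deleting any one column
    leaves a sign-nonsingular matrix."""
    n = len(B)
    if any(len(row) != n + 1 for row in B):
        raise ValueError("B must be n x (n + 1)")
    for c in range(n + 1):
        sub = [[row[j] for j in range(n + 1) if j != c] for row in B]
        if not is_sign_nonsingular(sub):
            return False
    return True
```

For B = [[1, −1, 0], [1, 1, −1]], each of the three 2 × 2 matrices obtained by deleting a column is sign-nonsingular, so B is an S*-matrix.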

The following theorem characterizes sign-solvable 
linear systems [9]. 


Theorem 7 Let $A = [a_{ij}]$ be an m × n matrix and let b be an m × 1 column vector. Let $z = (z_1, \ldots, z_n)^T$ be a solution of the linear system Ax = b, and let

$$\beta = \{j : z_j \neq 0\}, \qquad \alpha = \{i : a_{ij} \neq 0 \text{ for some } j \in \beta\}.$$

Then Ax = b is sign-solvable if and only if the matrix

$$\big[\, A[\alpha, \beta] \;\; -b[\alpha] \,\big]$$

is an S*-matrix and the matrix $A(\alpha, \beta)^T$ is an L-matrix.

(Here $A[\alpha, \beta]$, respectively $A(\alpha, \beta)$, is the submatrix of A formed by the rows in α and the columns in β, respectively the rows not in α and the columns not in β, and $b[\alpha] = b[\alpha, \{1\}]$.)

A detailed study of sign-solvability and related is- 
sues is contained in [6]. 


Doubly Stochastic Matrices 


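A real matrix $A = [a_{ij}]$ of order n is doubly stochastic provided each of its entries is nonnegative, and all row and column sums equal 1:

$$a_{ij} \ge 0 \quad (i, j = 1, \ldots, n),$$

$$\sum_{j=1}^{n} a_{ij} = 1 \quad (i = 1, \ldots, n), \qquad \sum_{i=1}^{n} a_{ij} = 1 \quad (j = 1, \ldots, n).$$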
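The definition can be checked directly. The sketch below assumes lists of lists for matrices and a floating-point tolerance `tol`; both the helper names and the tolerance are illustrative choices. The example uses the squared entries of a plane rotation, which is an instance of the orthogonal-matrix construction in item i).

```python
import math


def is_doubly_stochastic(M, tol=1e-9):
    """Check squareness, nonnegativity and unit row/column sums (within tol)."""
    n = len(M)
    if any(len(row) != n for row in M):
        return False
    if any(entry < -tol for row in M for entry in row):
        return False
    rows_ok = all(abs(sum(row) - 1) <= tol for row in M)
    cols_ok = all(abs(sum(M[i][j] for i in range(n)) - 1) <= tol for j in range(n))
    return rows_ok and cols_ok


def squared_moduli(U):
    """Return the matrix [|u_ij|^2] of a real or complex matrix U."""
    return [[abs(u) ** 2 for u in row] for row in U]


# A plane rotation is orthogonal, so its squared entries form a doubly stochastic matrix.
theta = 0.7
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
print(is_doubly_stochastic(squared_moduli(rotation)))  # True
print(is_doubly_stochastic([[1, 0], [1, 0]]))          # False: column sums are 2 and 0
```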


Doubly stochastic matrices arise quite naturally in many different contexts:

i) Let $U = [u_{ij}]$ be a real orthogonal matrix or a complex unitary matrix. Then

$$[\,|u_{ij}|^{2}\,] \quad (i, j = 1, \ldots, n)$$

is a doubly stochastic matrix.

ii) (Optimal assignment problem) Consider an assignment of n people to n positions in which the ‘value’ of the ith person to the jth position is $v_{ij} \ge 0$ (i, j = 1, ..., n). An optimal assignment is an assignment $i \to j_i$ (i = 1, ..., n) of people to positions (here $j_1 \ldots j_n$ is a permutation of {1, ..., n}) which maximizes the total value $\sum_{i=1}^{n} v_{i j_i}$. The set $\Omega_n$ of doubly stochastic matrices of order n is a convex polytope and, according to Birkhoff’s theorem [5], the set of vertices of this polytope is the set $\mathcal{P}_n$ of permutation matrices of order n. Thus the vertices of $\Omega_n$ correspond to the n! possible assignments. The optimal assignment problem can therefore be solved as a linear programming problem on $\Omega_n$: maximize $\sum_{i,j} v_{ij} s_{ij}$ over $S = [s_{ij}] \in \Omega_n$, since a linear objective attains its maximum over a polytope at a vertex.

iii) Let $\mathcal{P}_n = \{P_1, \ldots, P_{n!}\}$, and let $(c_i : i = 1, \ldots, n!)$ be a probability distribution on $\mathcal{P}_n$: $c_i \ge 0$ (i = 1, ..., n!) and $\sum_{i=1}^{n!} c_i = 1$. Then the expectation of a permutation matrix $R \in \mathcal{P}_n$ chosen at random is

$$E = \mathbb{E}[R] = \sum_{i=1}^{n!} c_i P_i = [e_{ij}],$$

a doubly stochastic matrix. It is a consequence of Birkhoff’s theorem that every doubly stochastic matrix of order n arises from a probability distribution on $\mathcal{P}_n$ in this way. Suppose that a function $f: \{1, \ldots, n\} \to \{1, \ldots, n\}$ is chosen at random, with the values f(i) chosen independently according to the probabilities

$$\operatorname{prob}(f(i) = j) = e_{ij} \quad (i, j = 1, \ldots, n).$$

The probability that f is a permutation equals the permanent of E. Here the permanent of a matrix $A = [a_{ij}]$ of order n is defined by

$$\operatorname{per}(A) = \sum a_{1 j_1} a_{2 j_2} \cdots a_{n j_n},$$

where the sum extends over all permutations $j_1 \ldots j_n$ of {1, ..., n}.
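Items ii) and iii) can be checked by brute force for tiny n. The Python sketch below enumerates all n! permutations, so it is practical only for very small n. The helper names and the sample data are hypothetical. `Fraction` keeps the probabilities exact. `prob_function_is_permutation` assumes, as in iii), that the values f(i) are drawn independently.

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod


def optimal_assignment(V):
    """Brute-force optimal assignment: maximize sum_i V[i][j_i] over all permutations."""
    n = len(V)
    return max((sum(V[i][p[i]] for i in range(n)), p) for p in permutations(range(n)))


def expectation(distribution, n):
    """E = sum_i c_i P_i for a distribution given as [(c_i, permutation_i), ...]."""
    E = [[Fraction(0)] * n for _ in range(n)]
    for c, p in distribution:
        for i in range(n):
            E[i][p[i]] += c  # P_i has a 1 in position (i, p[i])
    return E


def permanent(A):
    """per(A) = sum over permutations of a_{1 j_1} ... a_{n j_n}."""
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def prob_function_is_permutation(E):
    """Probability that f, with independent values prob(f(i) = j) = E[i][j], is a bijection."""
    n = len(E)
    return sum(prod(E[i][f[i]] for i in range(n))
               for f in product(range(n), repeat=n) if len(set(f)) == n)


V = [[3, 1, 2], [2, 4, 6], [5, 2, 1]]
print(optimal_assignment(V))  # (12, (1, 2, 0))
E = expectation([(Fraction(1, 2), (0, 1, 2)), (Fraction(1, 2), (1, 2, 0))], 3)
print(permanent(E), prob_function_is_permutation(E))  # 1/4 1/4
```

The matrix E built from the two-point distribution is doubly stochastic. Its permanent agrees with the directly computed probability that f is a permutation.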

Let $A = [a_{ij}]$ be a real, symmetric matrix (or a complex, Hermitian matrix). Then there exists a real, orthogonal matrix (respectively, a unitary matrix) U such that

$$U^{*} A U = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix},$$

where $\lambda_1 \ge \cdots \ge \lambda_n$ are the n eigenvalues of A, and $U^{*} = U^{T}$ in the real case. Since $A = U \operatorname{diag}(\lambda_1, \ldots, \lambda_n) U^{*}$, comparing diagonal entries gives

$$(a_{11}, \ldots, a_{nn})^{T} = S\, (\lambda_1, \ldots, \lambda_n)^{T}, \qquad (1)$$

where $S = [\,|u_{ij}|^{2}\,]$ is a doubly stochastic matrix. Without loss of generality, assume that $a_{11} \ge \cdots \ge a_{nn}$. Then equation (1) implies that

$$a_{11} + \cdots + a_{ii} \le \lambda_1 + \cdots + \lambda_i \quad (i = 1, \ldots, n), \qquad (2)$$

with equality for i = n. In general, let $\lambda = (\lambda_1, \ldots, \lambda_n)$ and $\mu = (\mu_1, \ldots, \mu_n)$ be two vectors arranged in nonincreasing order. If $\lambda_1 + \cdots + \lambda_i \le \mu_1 + \cdots + \mu_i$ for i = 1, ..., n, with equality for i = n, then λ is said to be majorized by μ. A Hardy–Littlewood–Pólya theorem states that if λ is majorized by μ, then there exists a doubly stochastic matrix S such that λ = Sμ [10]. Hence, by Birkhoff’s theorem, λ is majorized by μ if and only if λ is in the convex hull of all vectors obtained from μ by permuting its coordinates. When λ is majorized by μ, there exist doubly stochastic matrices S of very special form such that λ = Sμ [4,10].

By (2), the vector of main diagonal entries of a real, symmetric matrix is majorized by its vector of eigenvalues. Conversely, if λ and μ are two n-vectors with μ majorized by λ, then according to a theorem of A. Horn, there exists a real, symmetric matrix of order n whose eigenvalues are given by λ and whose main diagonal entries are given by μ [10].
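The majorization condition and Schur’s inequalities (2) can be tried on a 2 × 2 example. The sketch below is illustrative: `is_majorized` compares partial sums exactly, so exact inputs (integers or `Fraction`) are assumed, and `eigenvalues_2x2_symmetric` uses the closed form for a 2 × 2 symmetric matrix. Both helper names are hypothetical.

```python
from fractions import Fraction


def is_majorized(lam, mu):
    """True if lam is majorized by mu: after sorting both nonincreasingly, every
    partial sum of lam is <= the corresponding partial sum of mu, with equality
    for the full sum."""
    if len(lam) != len(mu):
        return False
    a, b = sorted(lam, reverse=True), sorted(mu, reverse=True)
    partial_a = partial_b = 0
    for x, y in zip(a, b):
        partial_a += x
        partial_b += y
        if partial_a > partial_b:
            return False
    return partial_a == partial_b


def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] in nonincreasing order (floating point)."""
    mean, radius = (a + c) / 2, (((a - c) / 2) ** 2 + b ** 2) ** 0.5
    return (mean + radius, mean - radius)


# A = [[2, 1], [1, 2]] has eigenvalues (3, 1) and diagonal (2, 2).
lam = eigenvalues_2x2_symmetric(2, 1, 2)
print(lam)                        # (3.0, 1.0)
print(is_majorized((2, 2), lam))  # True: the diagonal is majorized by the eigenvalues
print(is_majorized(lam, (2, 2)))  # False
# Hardy-Littlewood-Polya: (2, 2) = S (3, 1) with the doubly stochastic S = [[1/2, 1/2], [1/2, 1/2]].
S = [[Fraction(1, 2)] * 2 for _ in range(2)]
Sx = [sum(S[i][j] * (3, 1)[j] for j in range(2)) for i in range(2)]
print(Sx == [2, 2])               # True
```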

Let A be a doubly stochastic matrix, and let $A_1, \ldots, A_k$ ($k \ge 1$) be the fully indecomposable components of A. Since all row and column sums of A equal 1, it follows easily that, up to row and column permutations, A is the direct sum of its fully indecomposable components. That is, there exist permutation matrices P and Q such that

$$PAQ = A_1 \oplus \cdots \oplus A_k,$$

where ⊕ denotes direct sum. The polytope $\Omega_n$ has dimension $(n-1)^2$. Each doubly stochastic matrix A determines a face of $\Omega_n$, equal to the set of all doubly stochastic matrices S such that BG(S) is a subgraph of BG(A) (i.e., each edge of BG(S) is also an edge of BG(A)). This face is the smallest face of $\Omega_n$ containing A, and each nonempty face of $\Omega_n$ arises in this way. Since no entry of a doubly stochastic matrix can exceed 1, the nonempty faces of $\Omega_n$ can be described as follows. Let C be a (0, 1)-matrix of order n which, up to row and column permutations, is a direct sum of fully indecomposable matrices (such matrices are said to have total support). Then

$$F(C) = \{A \in \Omega_n : A \le C \text{ entrywise}\}$$

is a face of $\Omega_n$, and its dimension equals $\sigma(C) - 2n + k$, where σ(C) is the number of 1s of C and k is the number of its fully indecomposable components [3].
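The dimension formula $\sigma(C) - 2n + k$ can be evaluated mechanically. The sketch below assumes that C already has total support and does not check this. Under that assumption, the fully indecomposable components correspond to the connected components of the bipartite graph BG(C), which the sketch counts with a union-find structure. The function name is hypothetical.

```python
def face_dimension(C):
    """dim F(C) = sigma(C) - 2n + k for a square (0,1)-matrix C with total support.

    k is computed as the number of connected components of BG(C), with row
    vertices 0..n-1 and column vertices n..2n-1. For a matrix with total support
    these components are its fully indecomposable components. Total support is
    assumed, not checked.
    """
    n = len(C)
    parent = list(range(2 * n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    ones = 0
    for i in range(n):
        for j in range(n):
            if C[i][j]:
                ones += 1
                ri, cj = find(i), find(n + j)
                if ri != cj:
                    parent[ri] = cj
    k = len({find(v) for v in range(2 * n)})
    return ones - 2 * n + k


print(face_dimension([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # 0: the single vertex I
print(face_dimension([[1] * 3 for _ in range(3)]))        # 4 = (3 - 1)^2: all of Omega_3
print(face_dimension([[1, 1, 0], [1, 1, 0], [0, 0, 1]]))  # 1: a segment between two permutation matrices
```

The all-ones case reproduces the dimension $(n-1)^2$ of $\Omega_n$.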
The resource allocation problem seeks to find an optimal allocation of a fixed amount of resources to activities so as to minimize the cost incurred by the allocation. The simplest form of the problem is to minimize a separable convex function under a single constraint concerning the total amount of resources to be allocated. Depending on the case, the amount of resources allocated to each activity is treated as a continuous or an integer variable. The problem can thus be viewed as a special case of the nonlinear programming problem or of the nonlinear integer programming problem.

Due to its simple structure, this problem is encountered in a variety of application areas, including load distribution, production planning, computer resource allocation, queueing control, portfolio selection, and apportionment. The first explicit investigation of the resource allocation problem is due to B.O. Koopman [15] (1953). Koopman dealt with the problem of the optimal distribution of effort, which arises when searching for an object whose position is a random variable. Since then, a great number of papers have been published on resource allocation problems. Efficient algorithms have also been developed, depending on the form of the objective functions and constraints or on the type of variables (i.e., continuous or integer).

See [11] for a comprehensive review of the state of the art of these problems as of 1988. Since that book was published, many further papers on resource allocation problems have appeared, and significant progress has been made on the algorithmic side. New generalizations and variants of the problem have also been investigated, and new application fields have been discovered. This progress is reviewed in [13].

We first classify the resource allocation problems. A generic form of the resource allocation problem discussed in this article is described as follows:

$$(P)\qquad \begin{aligned} \min\ & f(x_1, \ldots, x_n) \\ \text{s.t.}\ & \sum_{j=1}^{n} x_j = N, \\ & x_j \ge 0, \quad j = 1, \ldots, n. \end{aligned} \qquad (1)$$

That is, given one type of resource whose total amount is equal to N, we want to allocate it to n activities so that the objective value $f(x_1, \ldots, x_n)$ is minimized. The objective value may be interpreted as the cost or loss, or the profit or reward, incurred by the resulting allocation. In the case of profit or reward, it is natural to maximize f, and we shall sometimes consider maximization problems. The difference between maximization and minimization is not essential, because maximizing f is equivalent to minimizing −f.

Each variable $x_j$ represents the amount of resource allocated to activity j. If the resource consists of persons, processors or trucks, however, $x_j$ becomes a discrete variable that takes nonnegative integer values, and the constraint

$$x_j \text{ integer}, \quad j = 1, \ldots, n, \qquad (2)$$

is added to the constraints in (1). The resource allocation problem with this constraint is often referred to as the discrete resource allocation problem.
The objective function usually has some special structure, depending on the intended application. Typically, the following special case, called separable, is considered:

$$f(x_1, \ldots, x_n) = \sum_{j=1}^{n} f_j(x_j). \qquad (3)$$

If each $f_j$ is convex, the objective function is called a separable convex objective function.

Resource allocation problems are classified according to the types of objective functions, constraints and variables. We shall describe the classification scheme, and several types of problem formulations according to it. In general, we use the notation α/β/γ to denote the type of a resource allocation problem. Here, α specifies the type of objective function, β the constraint type, and γ the variable type; γ = D and γ = C denote integer (discrete) and continuous variables, respectively. We shall now explain the notation for α and β.


α: Objective Functions


The objective function $f(x_1, \ldots, x_n)$ may take the following special structures:

1) Separable (S, for short): $\sum_{j=1}^{n} f_j(x_j)$, where each $f_j$ is a function of one variable.

2) Separable and convex (SC, for short): $\sum_{j=1}^{n} f_j(x_j)$, where each $f_j$ is a convex function of one variable. In particular, if each $f_j$ is quadratic and convex, we denote such a subclass by SQC.

3) Minimax: minimize $\max_{1 \le j \le n} f_j(x_j)$, or Maximin: maximize $\min_{1 \le j \le n} f_j(x_j)$; here, all $f_j$ are monotone nondecreasing in $x_j$.

4) Lexicographically minimax (Lexico-Minimax, for short): The objective value of Minimax is determined by the single variable $x_k$ satisfying $f_k(x_k) = \max_j f_j(x_j)$, so there may be many optimal solutions. To remove this ambiguity, we introduce the lexicographical ordering for n-dimensional vectors. Given $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$, a is lexicographically smaller than b (or b is lexicographically greater than a) if $a_j = b_j$ for $j = 1, \ldots, k-1$ and $a_k < b_k$ for some k. This is denoted by $a <_{\text{lex}} b$ or $b >_{\text{lex}} a$. For $a = (a_1, \ldots, a_n)$, let DEC(a) (respectively, INC(a)) denote the n-tuple of the $a_j$, j = 1, ..., n, arranged in nonincreasing order (respectively, nondecreasing order) of their values. For example, for a = (4, 3, 1, 5), we have DEC(a) = (5, 4, 3, 1) and INC(a) = (1, 3, 4, 5). The objective of Lexico-Minimax is to find an allocation vector $x = (x_1, \ldots, x_n)$ such that $\mathrm{DEC}(f_1(x_1), \ldots, f_n(x_n))$ is lexicographically minimal. An optimal solution to Lexico-Minimax is also optimal to Minimax, but the converse is not true in general; Lexico-Minimax is thus a refinement of Minimax. Similarly, we define Lexico-Maximin as the problem that lexicographically maximizes $\mathrm{INC}(f_1(x_1), \ldots, f_n(x_n))$.

5) Fair: minimize the expression

$$g\Big(\max_{1 \le j \le n} f_j(x_j),\ \min_{1 \le j \le n} f_j(x_j)\Big),$$

where g(u, v) is nondecreasing in u and nonincreasing in v. This objective is a generalization of Minimax and Maximin.
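The difference between Minimax and Lexico-Minimax shows up on a tiny instance. The Python sketch below uses a hypothetical instance with n = 3, N = 4, $f_j(x_j) = x_j$ and only the simple constraint, and it enumerates all allocations. Python compares tuples lexicographically, so `min` over `DEC(...)` values implements the lexicographic comparison directly.

```python
def DEC(a):
    """The entries of a in nonincreasing order."""
    return tuple(sorted(a, reverse=True))


def INC(a):
    """The entries of a in nondecreasing order."""
    return tuple(sorted(a))


def allocations(n, N):
    """All nonnegative integer vectors (x_1, ..., x_n) with x_1 + ... + x_n = N."""
    if n == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in allocations(n - 1, N - first):
            yield (first,) + rest


def values(fs, x):
    """The vector (f_1(x_1), ..., f_n(x_n))."""
    return [f(xj) for f, xj in zip(fs, x)]


fs = [lambda t: t] * 3           # hypothetical instance: f_j(x_j) = x_j
cands = list(allocations(3, 4))  # simple constraint x_1 + x_2 + x_3 = 4
best_max = min(max(values(fs, x)) for x in cands)
minimax_opt = [x for x in cands if max(values(fs, x)) == best_max]
best_dec = min(DEC(values(fs, x)) for x in cands)  # tuples compare lexicographically
lexico_opt = [x for x in cands if DEC(values(fs, x)) == best_dec]
print(DEC((4, 3, 1, 5)), INC((4, 3, 1, 5)))  # (5, 4, 3, 1) (1, 3, 4, 5)
print(best_max, len(minimax_opt))            # 2 6
print(best_dec, len(lexico_opt))             # (2, 1, 1) 3
```

Six allocations attain the Minimax value 2. Only the three permutations of (2, 1, 1) are Lexico-Minimax optimal, because (2, 1, 1) is lexicographically smaller than (2, 2, 0).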


β: Constraints


In addition to the simple resource constraint of (1), other constraints may also be imposed. Typical additional constraints appearing in various resource allocation problems are as follows. We refer to the case of no additional constraints as ‘Simple’.

1) Lower and upper bounds (LUB, for short): $l_j \le x_j \le u_j$, j = 1, ..., n.

2) Generalized upper bounds (GUB, for short): $\sum_{j \in S_i} x_j \le b_i$, i = 1, ..., m, where $S_1, \ldots, S_m$ is a partition of {1, ..., n}.

3) Nested constraints (Nested, for short): $\sum_{j \in S_i} x_j \le b_i$, i = 1, ..., m, where $S_1 \subset \cdots \subset S_m$. We can assume $b_1 < \cdots < b_m$, since if $b_i \ge b_{i+1}$, the ith constraint is redundant.

4) Tree constraints (Tree, for short): $\sum_{j \in S_i} x_j \le b_i$, i = 1, ..., m, where the sets $S_i$ are derived by some hierarchical decomposition of E = {1, ..., n} into disjoint subsets.

5) Network constraints (Network, for short): The constraint is defined in terms of a directed network with a single source and multiple sinks. Given a directed graph G = (V, A) with node set V and arc set A, let s ∈ V be the source and T ⊆ V be the set of sinks. The amount of supply from the source is N > 0, and the capacity of arc (u, v) is c(u, v). Denote the flow vector by $\varphi = \{\varphi(u, v) : (u, v) \in A\}$, and let $A^{+}(v)$ and $A^{-}(v)$ denote the sets of arcs leaving and entering node v, respectively. φ is a feasible flow in G if it satisfies

$$0 \le \varphi(u, v) \le c(u, v), \quad (u, v) \in A, \qquad (4)$$

$$\sum_{(v,w) \in A^{+}(v)} \varphi(v, w) - \sum_{(u,v) \in A^{-}(v)} \varphi(u, v) = 0, \quad v \in V - T - \{s\}, \qquad (5)$$

$$\sum_{(s,v) \in A^{+}(s)} \varphi(s, v) - \sum_{(u,s) \in A^{-}(s)} \varphi(u, s) = N, \qquad (6)$$

$$x_t(\varphi) = \sum_{(u,t) \in A^{-}(t)} \varphi(u, t) - \sum_{(t,v) \in A^{+}(t)} \varphi(t, v) \ge 0, \quad t \in T, \qquad (7)$$

$$\sum_{t \in T} x_t(\varphi) = N. \qquad (8)$$

The value $x_t(\varphi)$ denotes the amount of flow entering a sink t ∈ T. For a feasible flow φ, the vector $(x_t(\varphi) : t \in T)$ is called the feasible flow vector with respect to φ. For instance, the problem SC/Network/C (i.e., the separable convex resource allocation problem under network constraints) is defined as follows:

$$\min \sum_{t \in T} f_t(x_t(\varphi)) \quad \text{s.t. (4)–(8)}, \qquad (9)$$

where $f_t$, for each t ∈ T, is a convex function.

6) Submodular constraints (SM, for short): A set of feasible solutions is defined by a base polyhedron

$$B(r) = \{x \in \mathbb{R}^{E} : x(S) \le r(S) \text{ for all } S \in \mathcal{D},\ x(E) = r(E)\}$$

of a submodular system $(\mathcal{D}, r)$, i.e.,

$$x \in B(r). \qquad (10)$$

Here, we use the notation E = {1, ..., n}, and $x(S) = \sum_{j \in S} x_j$ for $S \subseteq E$ and $x \in \mathbb{R}^{E}$. $\mathcal{D} \subseteq 2^{E}$ is a distributive lattice such that $\emptyset, E \in \mathcal{D}$, i.e., $\mathcal{D}$ is closed under union and intersection operations. Also, the function $r: \mathcal{D} \to \mathbb{Z}$ is submodular over $\mathcal{D}$, i.e.,

$$r(X) + r(Y) \ge r(X \cup Y) + r(X \cap Y) \quad (X, Y \in \mathcal{D}).$$

For a submodular system $(\mathcal{D}, r)$,

$$P(r) = \{x \in \mathbb{R}^{E} : x(S) \le r(S) \text{ for all } S \in \mathcal{D}\}$$

is called the submodular polyhedron of $(\mathcal{D}, r)$. The first constraint in (1) is included in the constraints of (10), because x(E) = r(E) in the above definition. In the case of integer variables, the constraint is defined by

$$x \in B(r) \cap \mathbb{Z}^{E}.$$

It is assumed, in general, that B(r) in the constraint (10) is not given explicitly as an input. Instead, it is given implicitly through an oracle that returns the value r(X) when X is given.

7) General linear constraints (Linear, for short): Constraints defined by a set of linear inequalities

$$\sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i = 1, \ldots, m. \qquad (11)$$

No other special assumption is imposed on the structure of the constraints.

The constraints LUB, GUB, Nested, Tree and Network are all special cases of submodular constraints (see [11]), and SM is a special case of Linear.
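Submodularity and membership in B(r) can be tested by brute force when E is small. The sketch below makes three assumptions: $\mathcal{D}$ is the full lattice $2^{E}$, r is a Python callable standing in for the oracle, and the uniform-matroid rank used in the example is a hypothetical choice. The checks take exponential time in |E|.

```python
from itertools import combinations


def subsets(E):
    """All subsets of E as frozensets (D is taken to be the full lattice 2^E)."""
    E = list(E)
    for k in range(len(E) + 1):
        for c in combinations(E, k):
            yield frozenset(c)


def is_submodular(r, E):
    """Brute-force check of r(X) + r(Y) >= r(X | Y) + r(X & Y) over all pairs X, Y."""
    D = list(subsets(E))
    return all(r(X) + r(Y) >= r(X | Y) + r(X & Y) for X in D for Y in D)


def in_base_polyhedron(x, r, E):
    """x in B(r): x(S) <= r(S) for every S, and x(E) = r(E); r is queried as an oracle."""
    E = frozenset(E)

    def x_of(S):
        return sum(x[j] for j in S)

    return all(x_of(S) <= r(S) for S in subsets(E)) and x_of(E) == r(E)


E = {1, 2, 3}
rank = lambda X: min(len(X), 2)  # hypothetical oracle: rank function of the uniform matroid U(2, 3)
print(is_submodular(rank, E))                           # True
print(in_base_polyhedron({1: 1, 2: 1, 3: 0}, rank, E))  # True
print(in_base_polyhedron({1: 2, 2: 0, 3: 0}, rank, E))  # False: x({1}) = 2 > r({1}) = 1
print(is_submodular(lambda X: len(X) ** 2, E))          # False
```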


Algorithms 


We first introduce an incremental algorithm for the simple resource allocation problem, SC/Simple/D. We assume that each $f_j$ is defined over the interval [0, N]. Since $f_j$ is convex, we have

$$d_j(1) \le d_j(2) \le \cdots \le d_j(N), \qquad (12)$$

where

$$d_j(y_j) = f_j(y_j) - f_j(y_j - 1).$$

The incremental algorithm is a kind of greedy algorithm, and is also called a marginal allocation method. Starting with the initial solution x = (0, ..., 0), at each iteration one unit of resource is allocated to the most favorable activity, in the sense of minimizing the increase in the current objective value. This continues until $\sum_j x_j = N$ is attained.
Input: An instance of SC/Simple/D.
Output: An optimal solution x*.
Let x := (0, ..., 0) and k := 0;
WHILE k < N DO
    Find j* such that d_{j*}(x_{j*} + 1) = min_{1≤j≤n} d_j(x_j + 1);
    x_{j*} := x_{j*} + 1;
    k := k + 1
END;
Output x as x*.


Procedure INCREMENT 


It has been shown that this procedure correctly computes an optimal solution in O(N log n + n) time.
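Procedure INCREMENT translates almost line by line into Python. The sketch below is one way to get the O(N log n + n) bound: it keeps the marginal increases $d_j(x_j + 1)$ in a binary heap, which is an implementation choice assumed here. Each $f_j$ is assumed to accept arguments up to N + 1, because the final heap push evaluates one step beyond the last allocation. The function names and the test instance are hypothetical, and a brute-force check confirms optimality on the small instance.

```python
import heapq


def increment(fs, N):
    """Marginal allocation (procedure INCREMENT) for SC/Simple/D.

    fs is a list of convex functions f_j; the result is an optimal x with sum(x) == N.
    A heap keyed by d_j(x_j + 1) = f_j(x_j + 1) - f_j(x_j) finds j* in O(log n) time.
    """
    n = len(fs)
    x = [0] * n

    def d(j):  # increase of f_j if one more unit goes to activity j
        return fs[j](x[j] + 1) - fs[j](x[j])

    heap = [(d(j), j) for j in range(n)]
    heapq.heapify(heap)
    for _ in range(N):
        _, j = heapq.heappop(heap)       # most favorable activity j*
        x[j] += 1
        heapq.heappush(heap, (d(j), j))  # only activity j's marginal cost changed
    return x


def brute_force(fs, N):
    """Minimum of sum_j f_j(x_j) over all allocations, for checking small instances."""
    def rec(j, left):
        if j == len(fs) - 1:
            return fs[j](left)
        return min(fs[j](t) + rec(j + 1, left - t) for t in range(left + 1))
    return rec(0, N)


fs = [lambda t: t * t, lambda t: 2 * t * t, lambda t: (t - 3) ** 2]  # hypothetical convex costs
x = increment(fs, 6)
print(x, sum(f(t) for f, t in zip(fs, x)), brute_force(fs, 6))
```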

Several polynomial time algorithms have been de- 
veloped for problem SC/Simple/D [3,7,14]. The fastest 
among them is proposed in [3]. Its running time is 
O (max{n, n log(N/n)}), but the algorithm is very com- 
plicated. All these algorithms are based on divide-and- 
conquer. 

The incremental algorithm presented above also works for problem SC/SM/D. Let e(j) denote the jth unit vector. Among all elements j such that x + e(j) ∈ P(r), i.e., such that x + e(j) is feasible except for the constraint (x + e(j))(E) = r(E), the $x_j$ with the minimum increase in $f_j(x_j)$ is incremented by one. This process is repeated until x(E) = r(E) is attained.

A polynomial time algorithm for SC/SM/D is also 
known [2,4]. It first solves a problem of SC/Simple/D 
type, which is obtained from the original problem
by considering only the simple constraint x(E) = r(E)
but disregarding the rest. If the obtained solution y 
is feasible, we are done, i.e., it is an optimal solu- 
tion of the original problem. Otherwise, the problem 
is decomposed into subproblems using the information 
obtained from the vector y and the submodular con- 
straints. 

When specialized to problem SC/Network/D, the running time becomes $O(|T|(t(n, m, C_{\max}) + |T| \log(N|T|)))$, where $t(n, m, C_{\max})$ denotes the running time of a maximum flow algorithm for a graph with n vertices, m arcs and maximum arc capacity $C_{\max}$. A direct consequence of this result is that problems SC/GUB/D, SC/Nested/D and SC/Tree/D can be solved in $O(n^2 \log(Nn))$ time. For SC/Nested/D, the running time was improved to O(n log n log(N/n)) in [8]. The improvement rests on a general and elegant proximity theorem between the integral and continuous optimal solutions of SC/SM/D and SC/SM/C.

For SQC/-/D, where - stands for Simple, GUB, Nested, Tree or Network, D.S. Hochbaum and S. Hong [9] developed improved algorithms. These are based on proximity results between SQC/-/C and SQC/-/D and on efficient algorithms for SQC/-/C.

Minimax/-/D and Maximin/-/D can be equivalently transformed into problems of type SC/-/D, so equally efficient algorithms can be developed for minimax and maximin problems. We show the transformation only for the most general case, Minimax/SM/D, which is described as follows:

$$\text{MINIMAX:} \quad \min\ \max_{j \in E} f_j(x_j) \quad \text{s.t. } x \text{ is an integral base of } B(r).$$

Here, all $f_j$, j ∈ E, are assumed to be nondecreasing. When all $f_j$ are nonincreasing, problems MINIMAX and MAXIMIN are transformed into MAXIMIN and MINIMAX, respectively, by the following identities:

$$-\min_{x} \max_{j \in E} f_j(x_j) = \max_{x} \min_{j \in E} \big(-f_j(x_j)\big),$$

$$-\max_{x} \min_{j \in E} f_j(x_j) = \min_{x} \max_{j \in E} \big(-f_j(x_j)\big).$$


Define for j ∈ E,

$$g_j(x_j) = \sum_{y=0}^{x_j} f_j(y). \qquad (13)$$

Note that

$$g_j(x_j) - g_j(x_j - 1) = f_j(x_j)$$

holds for each $x_j = 1, 2, \ldots$. Since $f_j$ is nondecreasing, $g_j$ is convex over the nonnegative integers. Now consider the following problem of type SC/SM/D:

$$Q: \quad \min \Big\{ \sum_{j \in E} g_j(x_j) : x \in B(r) \cap \mathbb{Z}^{E} \Big\}.$$

It can then be shown that an optimal solution of problem Q is optimal to MINIMAX.
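The transformation (13) can be checked numerically in the Simple case. The sketch below assumes that B(r) is the set of nonnegative integer vectors with $\sum_j x_j = N$. It builds the functions $g_j$, solves problem Q with the marginal allocation method, and compares the resulting max-value with a brute-force MINIMAX solution. The instance and helper names are hypothetical. Because $g_j(x_j + 1) - g_j(x_j) = f_j(x_j + 1)$, the greedy step simply picks the smallest next value $f_j(x_j + 1)$.

```python
import heapq
from itertools import product


def greedy_simple(gs, N):
    """Marginal allocation for min sum_j g_j(x_j) s.t. sum_j x_j = N, x_j >= 0 integer."""
    x = [0] * len(gs)
    heap = [(gs[j](1) - gs[j](0), j) for j in range(len(gs))]
    heapq.heapify(heap)
    for _ in range(N):
        _, j = heapq.heappop(heap)
        x[j] += 1
        heapq.heappush(heap, (gs[j](x[j] + 1) - gs[j](x[j]), j))
    return x


def make_g(f):
    """g_j(x_j) = f_j(0) + f_j(1) + ... + f_j(x_j), as in (13)."""
    return lambda x: sum(f(y) for y in range(x + 1))


fs = [lambda t: t, lambda t: 2 * t, lambda t: 3 * t]  # hypothetical nondecreasing f_j
N = 6
x = greedy_simple([make_g(f) for f in fs], N)         # an optimal solution of Q (Simple case)
greedy_value = max(f(t) for f, t in zip(fs, x))
best_value = min(max(f(t) for f, t in zip(fs, y))     # brute-force MINIMAX
                 for y in product(range(N + 1), repeat=len(fs)) if sum(y) == N)
print(x, greedy_value, best_value)  # [4, 1, 1] 4 4
```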




Combinatorial Optimization Algorithms in Resource Allocation Problems 


Generalizations 


We finally note a recent development. K. Ando, S. Fu- 
jishige and T. Naitoh [1,6] considered the separable 
convex resource allocation problem for a bisubmodular 
system and for a finite jump system, whose underlying 
constraint can be viewed as a generalization of the sub- 
modular constraint. They developed greedy algorithms 
for such problems. For the case of a bisubmodular sys- 
tem, a polynomial time algorithm has been given in [5]. 
Also, Hochbaum and J.G. Shanthikumar [10] showed that efficient algorithms can be developed for a class of general linear constraints. The running time of their algorithm depends on Δ, the maximum absolute value of the subdeterminants of the constraint matrix. If Δ = 1 (i.e., the constraint matrix is totally unimodular), the running time becomes polynomial. The idea is based on a proximity result between the integral and continuous optimal solutions. For the case Δ = 1, A.V. Karzanov and S.T. McCormick [12] proposed another polynomial time algorithm.

In addition to these efforts to generalize the con- 
straints, new progress has recently been made towards 
generalizing objective functions for which efficient al- 
gorithms can still be developed. This research was done 
by K. Murota [16,17] who identified a subclass of non- 
separable convex functions, M-convex functions, which 
is defined on the base polyhedron of a submodular sys- 
tem as follows. 

A function $f: \mathbb{Z}^{E} \to \mathbb{R} \cup \{+\infty\}$ is said to be M-convex if it satisfies the following property:

(M-EXC): For any x, y ∈ dom f and for any $i \in \operatorname{supp}^{+}(x - y)$, there exists a $j \in \operatorname{supp}^{-}(x - y)$ such that

$$f(x) + f(y) \ge f(x - e(i) + e(j)) + f(y + e(i) - e(j)),$$

where

$$\operatorname{dom} f = \{x \in \mathbb{Z}^{E} : f(x) < +\infty\},$$

$$\operatorname{supp}^{+}(x - y) = \{k \in E : x_k > y_k\}, \qquad \operatorname{supp}^{-}(x - y) = \{k \in E : x_k < y_k\}.$$

M-convex functions satisfy theorems of discrete convex analysis that parallel those of traditional convex analysis. A polynomial time algorithm has been developed for this class of problems [18].
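Property (M-EXC) can be verified exhaustively on a small finite domain. The sketch below takes the domain to be the integer points of a simplex, i.e., nonnegative vectors with a fixed sum, and sets f = +∞ outside it; both are assumptions made for illustration. It confirms (M-EXC) for the separable convex function $\sum_k x_k^2$ and shows that it fails for the separable concave function $-\sum_k x_k^2$. The helper names are hypothetical.

```python
import math


def allocations(n, N):
    """All nonnegative integer vectors of length n summing to N."""
    if n == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in allocations(n - 1, N - first):
            yield (first,) + rest


def satisfies_m_exc(f, dom):
    """Brute-force check of (M-EXC) for f on the finite set dom (f = +inf elsewhere)."""
    dom = set(dom)

    def F(z):
        return f(z) if z in dom else math.inf

    for x in dom:
        for y in dom:
            n = len(x)
            for i in (k for k in range(n) if x[k] > y[k]):      # i in supp+(x - y)
                found = False
                for j in (k for k in range(n) if x[k] < y[k]):  # j in supp-(x - y)
                    x2 = list(x)
                    x2[i] -= 1
                    x2[j] += 1
                    y2 = list(y)
                    y2[i] += 1
                    y2[j] -= 1
                    if f(x) + f(y) >= F(tuple(x2)) + F(tuple(y2)):
                        found = True
                        break
                if not found:
                    return False
    return True


dom = list(allocations(3, 3))
print(satisfies_m_exc(lambda z: sum(t * t for t in z), dom))   # True: separable convex
print(satisfies_m_exc(lambda z: -sum(t * t for t in z), dom))  # False: separable concave
```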
In combinatorial optimization games, we consider cooperative games for which the value of the game is obtained via a combinatorial optimization problem. In a cooperative game (a class of games with side payments), the set of participating players is denoted by N, and each subset S of players achieves a value v(S) without any help from the other players (those in N − S). Usually, we set v(∅) = 0. In general, the representation of the game requires an input size exponential in the number of players. For a combinatorial optimization game, however, the value v(S) is often succinctly defined as the solution to a combinatorial optimization problem whose combinatorial structure is determined by the subset S of players. The income distributed to individual player i is represented by $x_i$, i ∈ N, and $x = (x_i)_{i \in N}$.

The main issue in cooperative games is how to fairly distribute the income collectively earned by the whole group of players when they cooperate with each other. For simplicity, let $x(S) = \sum_{i \in S} x_i$. The income vector x is called an imputation if x(N) = v(N) and $x_i \ge v(\{i\})$ for all i ∈ N (individual rationality). Additional requirements may be imposed to ensure fairness, stability and rationality. These lead to different sets of income vectors, which are generally referred to as solution concepts. Among these solution concepts, the core has attracted much attention from researchers and has led to many fruitful results in combinatorial optimization games. The core consists of all imputations satisfying the subgroup rationality condition x(S) ≥ v(S) for all S ⊆ N. Our focus in this article will be on the core. Readers interested in other solution concepts for cooperative games can find them in many game theory books and survey papers. For example, [12] gives an interesting discussion of several classical solution concepts in cooperative games and their applications to political economy.
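For small games, the definition of the core translates directly into a brute-force membership test. The sketch below assumes that v is a Python callable on frozensets, standing in for the value oracle, and it enumerates all $2^{|N|} - 1$ nonempty coalitions. The two example games are hypothetical. `Fraction` avoids rounding in the comparisons.

```python
from fractions import Fraction
from itertools import combinations


def in_core(x, v, players):
    """Brute-force core test: x(N) = v(N) and x(S) >= v(S) for every nonempty S.

    The singleton coalitions give individual rationality, so a vector that passes
    is an imputation; the test makes 2^|N| calls to the value oracle v.
    """
    players = list(players)
    if sum(x[i] for i in players) != v(frozenset(players)):
        return False
    for k in range(1, len(players) + 1):
        for S in combinations(players, k):
            if sum(x[i] for i in S) < v(frozenset(S)):
                return False
    return True


players = [1, 2, 3]
additive = lambda S: len(S)                   # hypothetical game v(S) = |S|
majority = lambda S: 1 if len(S) >= 2 else 0  # hypothetical three-player majority game
third = Fraction(1, 3)
print(in_core({1: 1, 2: 1, 3: 1}, additive, players))              # True
print(in_core({1: third, 2: third, 3: third}, majority, players))  # False: x({1, 2}) = 2/3 < 1
```

The exponential number of coalitions is why the polynomial-time questions discussed next are not trivial.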

Recently, computational complexity has been suggested as another metric for evaluating the rationality of these solution concepts [2]. In this argument, computational complexity serves as a measure of bounded rationality [13], under which players are not expected to spend superpolynomial time searching for the most suitable solution. For combinatorial optimization games, N. Megiddo [7] suggested seeking algorithms that are polynomial in the number of players, i.e., good algorithms in the sense introduced by J. Edmonds [3]. Since the value of any subset of players is defined as the optimal value of a combinatorial optimization problem, the input size can often be bounded by a polynomial in the number of players. This is the case for many practical collective optimization problems. The value of a subgroup of players is the optimal objective function value that this subgroup can achieve under the constraints imposed by the resources its members control. Very often the collective optimization problem requires an integer solution. It is in this context that the game is referred to as a combinatorial optimization game.

A formulation of a two-sided market (the assignment game) is given in [11]. The underlying structure is a bipartite graph $(V_1, V_2; E)$. In one interpretation, given by L.S. Shapley and M. Shubik, $V_1$ is the set of sellers and $V_2$ is the set of buyers. In the simplest case, each seller has an item (say a house) to sell and each buyer wants to purchase an item. The ith seller, $i \in V_1$, values its item at $c_i$ dollars, and the jth buyer values the item of the ith seller at $h_{ij}$ dollars. If $h_{ij} \ge c_i$, we define the value of this pair as $v(\{i, j\}) = h_{ij} - c_i$ and make (i, j) an edge in E with weight v({i, j}). Otherwise, there is no edge between i and j, since no deal is possible if the seller values the item more than the buyer does. In the game with side payments, the value v(S) of a subset S of buyers and sellers is defined to be the weight of a maximum weight matching in the bipartite graph G[S] induced by S. An edge is in G[S] if and only if both of its end vertices are in S. As a linear program, this is

$$\begin{aligned} v(S) = \max\ & \sum_{(i,j) \in E,\ i, j \in S} v(\{i, j\})\, x_{ij} \\ \text{s.t.}\ & \sum_{j \in V_2 \cap S} x_{ij} \le 1, \quad i \in V_1 \cap S, \\ & \sum_{i \in V_1 \cap S} x_{ij} \le 1, \quad j \in V_2 \cap S, \\ & x \ge 0. \end{aligned}$$

Shapley and Shubik have shown that the core of this assignment game is precisely the set of optimal solutions of the dual of the above linear program with $S = V_1 \cup V_2$. Such nice properties are not unusual in combinatorial optimization games. For example, A. Tamir established the same fact for another game, a cost allocation game on trees. He showed that the core is exactly the set of optimal solutions to the dual of the linear programming formulation for the total cost of the cost allocation problem on trees [14].
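The assignment game can be computed by brute force on a tiny market. In the hypothetical market below, one seller s with $c_s = 0$ faces two buyers, b1 and b2, who value the item at 3 and 5. The sketch computes v(S) as a maximum weight matching by enumerating edge sets and tests core membership over all coalitions. The payoff (4, 0, 1) satisfies $u_s + u_b \ge v(\{s, b\})$ for both edges and sums to v(N) = 5. It is therefore an optimal dual solution and, consistent with the Shapley–Shubik theorem, lies in the core.

```python
from itertools import combinations


def assignment_value(S, weights):
    """v(S): weight of a maximum weight matching in G[S], by brute force over edge sets.

    weights maps an edge (seller, buyer) to h_ij - c_i (only edges with h_ij >= c_i).
    """
    edges = [(e, w) for e, w in weights.items() if e[0] in S and e[1] in S]
    best = 0
    for k in range(1, len(edges) + 1):
        for chosen in combinations(edges, k):
            ends = [p for e, _ in chosen for p in e]
            if len(ends) == len(set(ends)):  # no vertex used twice
                best = max(best, sum(w for _, w in chosen))
    return best


def in_core(x, v, players):
    """x(N) = v(N) and x(S) >= v(S) for all nonempty coalitions S."""
    players = list(players)
    if sum(x[p] for p in players) != v(frozenset(players)):
        return False
    return all(sum(x[p] for p in S) >= v(frozenset(S))
               for k in range(1, len(players) + 1)
               for S in combinations(players, k))


# Hypothetical market: seller s with c_s = 0; buyers b1 and b2 value the item at 3 and 5.
weights = {("s", "b1"): 3, ("s", "b2"): 5}
players = ["s", "b1", "b2"]
v = lambda S: assignment_value(S, weights)
print(v(frozenset(players)))                             # 5
print(in_core({"s": 4, "b1": 0, "b2": 1}, v, players))   # True
print(in_core({"s": 2, "b1": 0, "b2": 3}, v, players))   # False: x({s, b1}) = 2 < 3
```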

The Shapley–Shubik model is a theoretical formulation of a pure exchange economy. The linear production game of G. Owen [8] applies their ideas to a production economy. In Owen’s model, each player j (j ∈ N) owns a resource vector $b^{j}$. The value of a subset S of players is the optimal objective function value of the following linear program:

$$v(S) = \max \Big\{ c^{T} y : A y \le \sum_{j \in S} b^{j},\ y \ge 0 \Big\}. \qquad (1)$$

Thus, the value is what the subset of players can achieve in the linear production model with the resources under their control. The core of the linear production game is always nonempty [8] if all the above linear programs have a finite optimum. A constructive proof presented by Owen obtains an imputation in the core from any optimal solution of the dual program

$$\min \Big\{ w^{T} \sum_{j \in N} b^{j} : w^{T} A \ge c^{T},\ w \ge 0 \Big\}$$

of the linear program for all the players,

$$\max \Big\{ c^{T} y : A y \le \sum_{j \in N} b^{j},\ y \ge 0 \Big\}.$$

In fact, let w be an optimal solution of the dual program, and set $x_j = w^{T} b^{j}$, j ∈ N. Then $x = (x_j)_{j \in N}$ is an imputation in the core. To see this, note first that $x(N) = w^{T} \sum_{j \in N} b^{j} = v(N)$ by linear programming duality. Next, for each subset S ⊆ N, consider $x(S) = \sum_{j \in S} x_j$. By definition of x, we have $x(S) = \sum_{j \in S} w^{T} b^{j}$. Let $y^{S}$ be an optimal solution of the linear program for v(S). Then $A y^{S} \le \sum_{j \in S} b^{j}$, and since w ≥ 0, $x(S) \ge w^{T} A y^{S}$. On the other hand, $w^{T} A \ge c^{T}$ and $y^{S} \ge 0$. It follows that $x(S) \ge c^{T} y^{S}$, which is the same as x(S) ≥ v(S), since $v(S) = c^{T} y^{S}$.
This proof depends on the assumption that, for each S ⊆ N, the linear program (1) has a finite optimal value. In general, a linear program may be unbounded or infeasible. If the linear program (1) is unbounded for some S ⊆ N, the core is obviously empty. If it is infeasible, we may define v(S) = −∞. This extends the above result to the case where the following conditions are satisfied:

1) The linear program

$$\max \Big\{ c^{T} y : A y \le \sum_{j \in N} b^{j},\ y \ge 0 \Big\}$$

has a finite optimal value.

2) For each S ⊆ N, (1) has a finite optimal value or is infeasible.

However, unlike the assignment game, there may in 
general be imputations in the core which cannot be ob- 
tained from the dual program for Owen’s linear pro- 
duction game [8]. In general, it is not known how to 
decide whether an imputation is in the core in polyno- 
mial time. 

The linear production game model has a weakness when applied to coalition optimization problems: in reality, many variables are required to take integer values. For the assignment game of Shapley and Shubik, Owen’s linear production model happens to always yield an integer solution. There are, however, many other situations in which an integer optimal solution cannot be obtained within the framework of linear programming.

A generalized linear production model introduced by D. Granot retains the main linear programming structure of Owen’s model. However, it allows the right-hand sides of the resource constraints to be nonlinear in the resource vectors $b^{j}$ of individual players [6]. Thus, v(S) is defined to be $\max\{c^{T} y : A y \le b(S),\ y \ge 0\}$, where $b(S) = (b_1(S), \ldots, b_m(S))$ is a general function of S. It is shown that if, for each i, 1 ≤ i ≤ m, the game consisting of player set N with value function $b_i(S)$ has a nonempty core, then the generalized linear production game has a nonempty core. As in Owen’s model, an imputation in the core is constructed from an optimal solution of the dual program together with vectors in the cores of the games $(N, b_i)$ [6]. In general, this construction needs an exponential number of function values b(S), one for each subset S of N. For some collective combinatorial optimization problems, however, b(S) is given implicitly as the solution to some optimization problem, and the problem input size is therefore polynomially bounded. Granot’s model can be used to prove nonemptiness of the cores of many games beyond Owen’s linear production games.

In particular, the generalized linear production game model is applied to show the nonemptiness of the core of a certain minimum cost spanning tree game [6]. In this problem, the underlying structure is a complete graph with a cost assigned to each edge and a distinguished node 0. The players are the vertices {1, ..., n}. The cost c(S) of a subset S of players is defined to be the cost of a minimum spanning tree in the graph G[S ∪ {0}] induced by S ∪ {0}. (This cost game differs from the value games defined above but can be handled similarly.) Even though an imputation in the core can be found in polynomial time for this game, [4] shows that it is NP-hard to decide whether an imputation is not in the core.

Another way to extend Owen’s model to games of a combinatorial optimization nature is to explicitly require integer solutions in the definition of the linear production model. That is, one may define the game value v(S) for a subset S ⊆ N to be the maximum value of an integer program instead of a linear program:

$$v(S) = \max \Big\{ c^{T} x : A x \le \sum_{j \in S} b^{j},\ x \text{ integer} \Big\}.$$

For the assignment game of Shapley and Shubik and for Tamir’s cost allocation game on trees, the integer program can be solved by its linear programming relaxation, because the relaxation always has an integer optimal solution. In the work of Shapley and Shubik, as well as that of Tamir, each $b^{j}$ is a unit vector and b(N) is a vector of all ones. This particular structure of the linear constraints is what allows the core to be identified with the set of optimal solutions of the dual of the linear program for the value of the set of all players [11,14]. Not surprisingly, this property is further exploited in [5] for partition games, and in [1] for packing/covering games.
The packing game, for example, is defined for a set N of players whose game value is given by the following integer program:

$$\max\ c^{T} x \quad \text{s.t. } x^{T} A_{M,N} \le \mathbf{1}^{T},\ x \in \{0,1\}^{M},$$

where $\mathbf{1}$ is a vector of |N| ones, and $A_{M,N}$ is a 0-1 matrix with rows indexed by M and columns indexed by N. For each subset S of players, its value is given by

$$v(S) = \max\ c^{T} x \quad \text{s.t. } x^{T} A_{M,S} \le \mathbf{1}_{|S|}^{T},\ x^{T} A_{M,\bar{S}} = \mathbf{0}_{|\bar{S}|}^{T},\ x \in \{0,1\}^{M},$$

where $A_{M,S}$ is the submatrix of A with row set M and column set S, $\bar{S} = N - S$, and v(∅) is defined to be 0.

The covering game and the partition game are defined
similarly. The core of the packing (and covering, and
partitioning) game is nonempty if and only if the linear
relaxation of the corresponding optimization problem has
an integer optimal solution. In addition, the core, if
nonempty, is exactly the set of optimal solutions to the
dual of the linear relaxation of the corresponding
integer program [1,5].
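The game value v(S) of the packing game can be checked by brute force on small instances. This is a minimal sketch with a hypothetical function name. It enumerates every $x \in \{0,1\}^{|M|}$ and keeps those with $x^T A_{M,S} \le \mathbf{1}^T$ and $x^T A_{M,\bar S} = \mathbf{0}^T$, so it is exponential in |M| and intended only for illustration. The example is a matching game on a triangle, with players as vertices and rows as edges.

```python
from itertools import product


def packing_value(A, c, S):
    """v(S) for the packing game by enumerating all x in {0,1}^|M|.

    A: 0-1 matrix as a list of rows (rows indexed by M, columns by players N);
    c: weight of each row; S: set of player (column) indices.
    Feasible x: every column in S is covered at most once and
    every column outside S is not covered at all."""
    m, n = len(A), len(A[0])
    best = 0
    for x in product((0, 1), repeat=m):
        # load[j] = (x^T A)_j, the number of chosen rows covering column j
        load = [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]
        if all((load[j] <= 1) if j in S else (load[j] == 0) for j in range(n)):
            best = max(best, sum(ci * xi for ci, xi in zip(c, x)))
    return best


# Matching game on a triangle: players are vertices 0, 1, 2; rows are edges.
A = [[1, 1, 0],   # edge {0, 1}
     [0, 1, 1],   # edge {1, 2}
     [1, 0, 1]]   # edge {0, 2}
c = [1, 1, 1]
print(packing_value(A, c, {0, 1, 2}))  # 1: a matching in a triangle has one edge
print(packing_value(A, c, {0, 1}))     # 1
print(packing_value(A, c, {0}))        # 0: no edge lies inside {0}
```

Here $v(N) = 1$. A payoff vector $y$ in the core would need $y_0 + y_1 \ge 1$, $y_1 + y_2 \ge 1$ and $y_0 + y_2 \ge 1$. Summing gives $2(y_0 + y_1 + y_2) \ge 3$, which contradicts $y(N) = 1$, so the core is empty. This matches the characterization above: the linear relaxation has the fractional optimum with every edge at 1/2 and value 3/2, and no integer optimal solution.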

These results allow a characterization of the
combinatorial structures for which the corresponding
combinatorial optimization game has a nonempty core. Because of
the linear programming characterization of the core, questions
such as whether the core is empty, how to find an
imputation in the core, and whether a given imputation is
in the core can often be answered in polynomial time.
Note, however, that in some cases the linear program may be
of exponential size in the number of players, so it is not
immediate that all these questions can be answered in
polynomial time. But even when there are exponentially
many constraints, the linear program may still be solvable
in polynomial time [9].

First established by Shapley and Shubik for the
assignment game, the connection between the core of a
combinatorial optimization game and the dual of the
linear programming relaxation has been a successful tool in
characterizing the core, and in designing and analyzing
algorithms that find an imputation in the core or test
whether an imputation belongs to the core. It is expected
that this approach will continue to yield fruitful
results in cooperative game theory.
Introduction


Test problems are instances of a mathematical problem 
used to establish the accuracy or efficiency of a solu- 
tion method. Test problems provide a common base- 
line against which to compare a new solution algo- 
rithm with an existing procedure. A problem genera- 
tor is an algorithm to produce a test instance for a spe- 
cific combinatorial problem. In this section, we pro- 
vide an overview of test problems and generation meth- 
ods, as well as sources of test problems for a number 
of well-known combinatorial problems. The design of 


test problems is a critical step in the design of combi- 
natorial algorithms, and a sufficient number and vari- 
ety of test problems must be available to determine the 
performance of a proposed algorithm across a range of 
problem types. 

There are four basic sources of test problems: 

1. Problems taken from real-world applications 
2. Libraries of standard test problems 
3. Test problems with parameters generated randomly 
from a specified probability distribution 
4. Test problems generated by an algorithm designed 
to produce problem instances with specific charac- 
teristics: e. g., problems with a known solution. 
Each of these sources has associated advantages and 
disadvantages. For example, problems taken from real- 
world applications have a degree of complexity consis- 
tent with at least some problems encountered in prac- 
tice [28], and provide a context for presenting proposed 
solutions that promotes understanding and acceptance. 
However, we typically cannot find a sufficient num- 
ber of such problems to constitute a satisfactory exper- 
iment. 

Libraries of standard test problems can provide 
problems that were used by other researchers, facili- 
tating comparisons with existing solution procedures. 
However, as with real-world cases, libraries may not 
provide a sufficient number or variety of problem in- 
stances. Procedures that randomly generate test prob- 
lems can quickly provide an essentially unlimited num- 
ber of problem instances, but the optimal solution to 
large randomly generated problems may remain un- 
known. An additional hazard with randomly generated 
problems is that such problems are sometimes artifi- 
cially easy to solve [6,33]. 

Constructive procedures designed to generate test 
problems with known solutions can be very useful for 
evaluating an algorithm’s performance, and can also 
provide a large number of test instances. Problem gen- 
eration procedures must be carefully examined to de- 
termine the difficulty, realism, and other characteristics 
of the problems generated. An ideal generator would 
produce problems in polynomial time, with a known 
solution, of appropriate hardness, and with sufficient 
diversity [31]. Of course, it can be difficult to simultane- 
ously meet all these requirements. For example, a triv- 
ial problem instance might be generated in polynomial 
time, but provide no real test for a proposed solution 
procedure. Problem instances should also be posed
using standardized representations [10].

A good set of test problems is only one part of 
the evaluation of an algorithm. Barr, et al. [2] provide 
guidelines for designing computational experiments 
and for reporting results of solution algorithm perfor- 
mance. 

The following section provides sources of standard 
test problems and problem generators for a number of 
well-known combinatorial optimization problems. 


Libraries and Generators 


The INFORMS OR/MS Resource Collection [16] and 
the OR-Library maintained by Beasley [3,4] both pro- 
vide extensive collections of test data sets for a variety of 
operations research problems. The Zuse Institute [19] 
maintains a collection of various problems related to 
mathematical programming. A handbook of test prob- 
lems [11] provides a collection of test problems from 
a wide variety of engineering applications. The Discrete 
Mathematics and Theoretical Computer Science (DI- 
MACS) Challenges [8] encourage experimental eval- 
uations of algorithms using standard test problems. 
Over the past decade, challenges have been held for 
TSP, cliques, coloring, and satisfiability. An overview of 
sources for specific combinatorial problems is provided 
below. 


Combinatorial Auctions 


This problem involves auctions in which bidders place 
unrestricted bids for bundles of goods. A seller faced 
with a set of offers for bundles of goods wishes to max- 
imize his revenue. The Combinatorial Auction Test 
Suite (CATS) provides an algorithm for generating 
problem instances of differing levels of realism [21]. 


Frequency Assignment Problem 


A library of frequency assignment problems in the con- 
text of wireless communication networks is available 
at [9]. This website includes an extensive bibliography 
on frequency assignment problems. 


Graph Colorability 


Sanchis [31] provides an algorithm for generating 
graph colorability problems with known solutions. This 


reference also provides a generator for the minimum 
dominating set problem. 
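The idea of a generator with a known solution can be illustrated for graph colorability. The following is a minimal sketch of a planted-solution generator. It is not Sanchis's algorithm [31], and the function name and parameters are hypothetical. Vertices are split into k nonempty color classes, and random edges are added only between different classes, so the planted coloring is proper and $\chi \le k$. A k-clique on one representative per class then forces $\chi \ge k$, so the chromatic number is exactly k.

```python
import random


def planted_coloring_instance(n, k, p, seed=None):
    """Hypothetical planted-solution generator: a graph on n vertices with
    chromatic number exactly k, together with an optimal coloring.

    Edges between different color classes appear with probability p;
    one vertex per class is joined into a k-clique."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    rng = random.Random(seed)
    colors = [v % k for v in range(n)]   # every class is nonempty since k <= n
    rng.shuffle(colors)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            # only inter-class edges, so the planted coloring stays proper
            if colors[u] != colors[v] and rng.random() < p:
                edges.add((u, v))
    # plant a k-clique on one representative per class: forces chi >= k
    reps = [colors.index(col) for col in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            edges.add(tuple(sorted((reps[a], reps[b]))))
    return sorted(edges), colors


edges, colors = planted_coloring_instance(12, 4, 0.3, seed=7)
print(len(edges), colors)
```

The parameter p controls density. It offers only crude control over difficulty and none of the diversity guarantees discussed above.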


Linear Ordering Problem 


Reinelt [29] maintains a library of problem instances
for the linear ordering problem, including problem data 
and optimal solutions. This library also includes soft- 
ware and data for several other discrete optimization 
problems. Another library is maintained by Marti [22] 
in which there are large randomly generated problems 
with best known solutions. 


Maximum Clique Problem 


Hasselberg, et al. [13] consider a number of interest- 
ing problems, including the maximum clique problem. 
They introduce different test problem generators mo- 
tivated by a variety of practical applications, including 
coding theory and fault diagnosis. 


Minimum Cut-Set 


Krishnamurthy [20] provides a problem generator for 
partitioning heuristics, including the minimum cut-set 
problem. Generated instances of this problem are useful 
in circuit design applications. 


Minimum Vertex Cover Problem 


Sanchis and Jagota [32] discuss a test problem genera- 
tor that builds instances of the minimum vertex cover 
problem. The generator provides construction param- 
eters to control problem difficulty. Sanchis [31] pro- 
vides an algorithm to generate minimum vertex cover 
problems that are diverse, hard and of known solu- 
tion. 


Multidimensional Assignment Problem (MAP) 


The axial MAP is a generalization of the linear as- 
signment problem. Grundel and Pardalos [12] provide 
a MAP generator that produces difficult problems with 
known unique optimal solutions. 


Quadratic Assignment Problem (QAP) 


Pardalos [25] provides a method for constructing test 
problems for constrained bivalent quadratic program- 
ming. This reference includes a standardized random 
test problem generator for the unconstrained quadratic
zero-one programming problem. Yong and Parda- 
los [35] provide methods for generating test problems 
with known optimal solutions for more general cases 
of the QAP. Calamai, et al. [7] describe a technique 
for generating convex, strictly concave and indefinite 
QAP instances. Palubeckis [24] provides a method for 
generating hard rectilinear instances of the QAP with 
known optimal solutions. Burkard, et al. [5] give addi- 
tional useful information concerning this difficult prob- 
lem. 


Satisfiability 


Achlioptas, et al. [1] propose a generator for satisfiabil- 
ity problems that controls the hardness of the instances. 
A web page maintained by Uchida, Motoki, and Watan- 
abe [34] is dedicated to two methods of generating in- 
stances of 3-satisfiability. A library of satisfiability prob- 
lem instances and solvers is available on a Darmstadt 
University website [15]. 
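A simple satisfiability generator with a guaranteed solution can be sketched by planting an assignment. This is not the generator of [1] or of [34], and the function name is hypothetical. Each clause picks 3 distinct variables with random signs and is kept only if a hidden assignment satisfies it. Literals follow the DIMACS convention: +v means $x_v$ and -v means its negation. Clause-to-variable ratios near 4.26 are where uniform random 3-SAT is empirically hardest. Naive planting biases clauses toward the hidden assignment and can make instances easier, which echoes the caveat above about artificially easy random problems.

```python
import random


def planted_3sat(n_vars, n_clauses, seed=None):
    """Hypothetical sketch: random 3-SAT clauses that all satisfy a hidden
    assignment. Returns (clauses, hidden), variables numbered 1..n_vars."""
    if n_vars < 3:
        raise ValueError("need at least 3 variables")
    rng = random.Random(seed)
    hidden = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    clauses = []
    while len(clauses) < n_clauses:
        vs = rng.sample(range(1, n_vars + 1), 3)          # 3 distinct variables
        clause = [v if rng.random() < 0.5 else -v for v in vs]
        # keep the clause only if some literal is true under `hidden`
        if any((lit > 0) == hidden[abs(lit)] for lit in clause):
            clauses.append(clause)
    return clauses, hidden


clauses, hidden = planted_3sat(50, 213, seed=0)   # ratio 213/50 = 4.26
print(clauses[:3])
```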


Steiner Problem in Graphs 


Khoury, et al. [17] use a binary-programming formu- 
lation to generate test problems with known solutions 
by applying the Karush-Kuhn-Tucker optimality con- 
ditions to the corresponding quadratically-constrained 
optimization problem. Koch, et al. [18] provide a li- 
brary of Steiner tree problems with information about 
the origin, solvability, and other characteristics of this 
problem. 


Traveling Salesman Problem (TSP) 


Moscato [23] maintains a web site with resources for 
the generation of TSP instances with known optimal so- 
lutions. An approach for generating discrete instances 
of the symmetric TSP with known optima is provided 
by Pilcher and Rardin [27]. A number of libraries
(e.g., [4,30]) provide test cases for the TSP.
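A very simple way to obtain Euclidean TSP instances with a known optimum relies on a classical fact. An optimal Euclidean tour never crosses itself, and for points in convex position the only non-crossing tour is the hull order. The sketch below places random points on a circle, shuffles them, and returns the angular order as the optimal tour. The function names are hypothetical. Such instances are easy for most heuristics, so this only illustrates the known-optimum idea and is not the construction of Pilcher and Rardin [27].

```python
import math
import random


def tour_length(points, tour):
    """Length of the closed tour visiting points in the order given by `tour`."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def convex_tsp_instance(n, radius=100.0, seed=None):
    """Hypothetical generator: n random points on a circle, in shuffled order,
    plus an optimal tour (angular order) and its length."""
    rng = random.Random(seed)
    angles = sorted(rng.uniform(0.0, 2 * math.pi) for _ in range(n))
    hull_order = [(radius * math.cos(a), radius * math.sin(a)) for a in angles]
    perm = list(range(n))
    rng.shuffle(perm)
    points = [hull_order[k] for k in perm]      # hide the optimal order
    where = {k: i for i, k in enumerate(perm)}  # hull position -> index in points
    tour = [where[k] for k in range(n)]         # visit points in angular order
    return points, tour, tour_length(points, tour)


points, tour, opt = convex_tsp_instance(10, seed=42)
print(round(opt, 3))
```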


Vehicle Routing Problem 


Homberger [14] provides a large set of Vehicle Rout- 
ing Problems with Time Windows, including instances 
with up to one thousand customers. 


Conclusions 


Researchers need a large set of well-designed test prob- 
lems to effectively compare the performance of exist- 
ing solution algorithms or to evaluate a new algorithm. 
Although practitioners may prefer real-world problems 
for such tests, a sufficient number of test problems may 
not be available to conduct a thorough experiment. 
Randomly generated test problems can provide an es- 
sentially limitless supply of instances. However, ran- 
dom test instances may be artificially easy to solve, or, at 
the other extreme, may have no known solution, mak- 
ing it difficult to judge the performance of a new solu- 
tion algorithm. Test problem generators, if properly de- 
signed, can provide a large supply of hard problem in- 
stances with known optimal solutions. Many such gen- 
erators are readily available to researchers. Libraries of
test problems are also available, providing a variety of
problem types and sizes.
In the communication network assignment problem
(CAP) a system of communication centers $C_1, \dots, C_n$
is given. The centers have to be embedded into a given
(undirected) network N = (V, E) with vertex set V, |V|
= n, and edge set E. The centers exchange messages at
given rates per time unit through a selected routing
pattern. Let $t_{ij}$ be the amount of messages sent from center
$C_i$ to center $C_j$ per time unit. If there is no direct
connection between $C_i$ and $C_j$, the messages sent from $C_i$ to
$C_j$ pass through several intermediate centers. The
messages exchanged between $C_i$ and $C_j$ may be sent along
a single path or they may be split into several parts, each
part being sent along its own path. For any fixed
embedding $\varepsilon$ of the centers into the network and for any
fixed routing pattern $\rho$ of the messages, let $\mathrm{IMT}_{\varepsilon,\rho}(C_i)$
denote the overall amount of traffic going through the
center $C_i$ as an intermediate center. The goal is to find an
embedding $\varepsilon$ of the centers into the network and a
routing pattern $\rho$ which minimize the maximum
intermediate traffic over all centers:

$$\min_{\varepsilon,\rho}\ \max\{\mathrm{IMT}_{\varepsilon,\rho}(C_i) : 1 \le i \le n\}. \qquad (1)$$
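On a tree the route between two vertices is unique, so the intermediate traffic depends only on the embedding. The following brute-force sketch illustrates problem (1) in that case, with hypothetical function names. It assumes the number of centers equals the number of vertices, that the tree is given as adjacency lists, and that `emb[i]` is the vertex hosting center $C_i$. Each $t_{ij}$ is charged to every center sitting strictly inside the path from $C_i$ to $C_j$. Trying all n! embeddings is feasible only for tiny instances.

```python
from itertools import permutations


def tree_path(adj, s, t):
    """Unique s-t path in a tree (adjacency lists), via DFS parent links."""
    parent = {s: None}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    path = [t]
    while path[-1] != s:
        path.append(parent[path[-1]])
    return path[::-1]


def max_intermediate_traffic(adj, emb, traffic):
    """max_i IMT(C_i) when center i sits on vertex emb[i]."""
    n = len(emb)
    center_at = {v: i for i, v in enumerate(emb)}
    imt = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and traffic[i][j]:
                for v in tree_path(adj, emb[i], emb[j])[1:-1]:  # interior vertices
                    imt[center_at[v]] += traffic[i][j]
    return max(imt)


def solve_cap_tree_bruteforce(adj, traffic):
    """Exact CAP on a tree by trying every embedding; returns (value, emb)."""
    return min((max_intermediate_traffic(adj, emb, traffic), emb)
               for emb in permutations(range(len(adj))))


# Hypothetical instance: the path 0 - 1 - 2 with an asymmetric traffic matrix.
adj = [[1], [0, 2], [1]]
T = [[0, 1, 5],
     [1, 0, 2],
     [5, 2, 0]]
print(max_intermediate_traffic(adj, (0, 1, 2), T))  # 10: C_1 relays C_0 <-> C_2
print(solve_cap_tree_bruteforce(adj, T))            # (2, (0, 2, 1))
```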

A typical application of the problem arises in the 
case of locating stations (terminals, computers) in 
a local-area computer network (LAN) as described by 
T.B. Boffey and J. Karkazis [1]. Usually, a given seg- 
ment of the LAN serves different pairs of communi- 
cating stations. In order to prevent interference and 
garbled messages, only one message at a time can be 
sent through a given segment of the LAN. On the other
hand, one has to restrict the offered traffic through the
same segment so as to maintain a reasonable throughput
in the network. To this end it is reasonable to locate
bridges at the endpoints of each segment. All bridges
will work as intermediate centers and all stations will
work as bridges. The result is that each pair of stations
(or bridges) communicates through its own segment. It
is reasonable to require an embedding of stations and
additional bridges into the nodes of the LAN such that
the intermediate traffic going through the busiest station
(or bridge) is minimized. Boffey and Karkazis [1] also
proposed and discussed a continuous version of the
problem.

A similar problem, the so-called elevator problem,
also leads to the optimization problem (1), as described
by Karkazis [5]. The elevator problem arises when a
single elevator has to be replaced by two elevators, each
covering a contiguous subset of floors. It might be
reasonable to place the connecting landing so as to
minimize the traffic intensity on the busier elevator. More
specifically, assume that the first elevator serves floors
{1, 2, ..., i} and the second elevator serves floors {i,
i+1, ..., n}, and let $t_{jk}$ represent the traffic intensity
from floor j to floor k. Then the traffic load $T^1_i$ of the first
elevator is obtained by summing the intensities $t_{jk} + t_{kj}$
over the pairs of floors served by that elevator, and the
traffic load $T^2_i$ of the second elevator is defined
analogously. We then want to choose i so as to minimize
$\max\{T^1_i, T^2_i\}$. Obviously this problem setting can be
generalized to more than two elevators.
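Choosing the landing floor is a one-dimensional search. This minimal sketch, with a hypothetical function name, rests on an assumption of its own about which trips load which elevator: a trip between floors j < k loads the first elevator if j < i and the second if k > i. A trip crossing the landing i therefore transfers there and loads both elevators. The function returns the best max load together with i (1-based) and the two loads.

```python
def elevator_split(t):
    """Pick the landing floor i (1-based) minimizing the busier elevator's load.

    t[j-1][k-1] is the traffic intensity from floor j to floor k. Assumption:
    a trip between floors j < k loads elevator 1 if j < i and elevator 2 if k > i.
    Returns (max load, i, load of elevator 1, load of elevator 2)."""
    n = len(t)
    best = None
    for i in range(1, n + 1):
        load1 = load2 = 0
        for j in range(1, n + 1):
            for k in range(j + 1, n + 1):
                pair = t[j - 1][k - 1] + t[k - 1][j - 1]  # both directions
                if j < i:
                    load1 += pair
                if k > i:
                    load2 += pair
        cand = (max(load1, load2), i, load1, load2)
        if best is None or cand < best:
            best = cand
    return best


# Hypothetical 3-floor building with heavy traffic between floors 1 and 2.
t = [[0, 4, 1],
     [4, 0, 1],
     [1, 1, 0]]
print(elevator_split(t))  # (10, 2, 10, 4)
```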
Essentially there are two distinct models of routing
patterns in (1): the single path model and the fractional
model. In the single path model, for every pair of
communication centers $C_i$ and $C_j$, a single route in the
network is selected and all $t_{ij}$ messages are sent along this
fixed route. In the fractional model, the amount $t_{ij}$ is
split into a number of positive parts and every part is
sent along its own path. Most of the results available in
the literature concern the CAP on trees. In this case, for
each pair of vertices in the network there is only one
path to join them and hence both models coincide.
R.E. Burkard, E. Cela and G.J. Woeginger have
proved in [3] that in general the CAP is NP-hard.
More specifically, it has been shown that the CAP is
NP-hard for networks that are i) paths; ii) stars of branch
length three; iii) cycles (NP-hardness in both models); or
iv) doublestars (NP-hardness in the single path model).
Moreover, it has been proved in [3] that the CAP is
polynomially solvable in the case of stars of branch
length two and in the case of doublestars in the
fractional model. In the case of a star of branch length two,
the CAP can be formulated as a maximum weight
perfect matching problem (MWPMP) if the
communication center to be assigned to the central node of the
star is kept fixed. Since there are only n possibilities for
the selection of the center to be placed at the central
node, one just has to solve n MWPMPs. In the
fractional model, finding an embedding of the
communication centers into the nodes of the network and a routing
pattern which minimize the intermediate traffic can be
done by solving a specified number of linear programs
with $O(P)$ variables and $O(n^2)$ constraints each, where
P is the number of pairwise disjoint paths in N. In the
case of doublestars $P = O(n^2)$ and there are $O(n^2)$
programs to be solved (see [3]). This implies that in this
case the CAP is polynomially solvable in the fractional
model.

Some exact algorithms and heuristic approaches to 
solve the CAP on trees have been proposed in [3,5]. 
Karkazis has proposed a branch and bound algorithm 
in the case where N is a path [5], and Burkard, Cela 
and Woeginger [3] have proposed a branch and bound 
approach in the case that N is a tree. The algorithms 
have been tested on randomly generated trees and
communication rates $t_{ij}$. The tests show that only small
instances of the CAP, of size up to 12, can be solved in
reasonable time; for larger instances the number of
branched nodes in the branch and bound tree explodes.
In order to approximately solve larger instances of the
CAP on trees, Burkard, Cela and T. Dudas proposed
simulated annealing and tabu search approaches in [2].
The proposed heuristics were tested on randomly
generated instances of size up to 32. A comparison of the
heuristic solutions with the optimal solutions produced
by the branch and bound algorithm for small instances
shows that the performance of these heuristics is quite
satisfactory.

Finally, in [2] the asymptotic behavior of the CAP on
trees has been investigated. Under natural probabilistic
assumptions on the problem data, the CAP on trees
shows a very interesting behavior: the ratio between
the worst and the best objective function values over all
feasible solutions, i.e., between the largest and the
smallest possible values of the intermediate traffic through the
busiest center, approaches 1 with probability tending to
1 as the size of the problem tends to infinity. The proof
of this fact is based on the strong relationship between
the CAP on trees and a special version of the quadratic
assignment problem. It is shown that the latter fulfills the
condition of a theorem of Burkard and U. Fincke [4] on
the asymptotic behavior of combinatorial optimization
problems. From a practical point of view, the
asymptotic behavior described above implies that the CAP on
trees becomes trivial as its size tends to infinity: every
feasible solution provides a good approximation of an
optimal solution.
Introduction


Facility location models deal, for the most part, with the 
location of plants, warehouses, distribution centers and 
other industrial facilities. These location models do not 
account for competition or for differences among fa- 
cilities and therefore allocate customers to facilities by 
proximity. In reality, retail facilities operate in a com- 
petitive environment with an objective of profit and 
market share maximization. These facilities are also dif- 
ferent from each other in their overall attractiveness 
to consumers. One branch of location analysis focuses 
on the location of retail and other commercial facilities 
which operate in a competitive environment, namely 
competitive facility location. The basic problem is the 
optimal location of one or more new facilities in a mar- 
ket area where competition already exists or will exist in 
the future. Assuming that profit increases when market 
share increases, maximizing profit is equivalent to max- 
imizing market share. It follows, then, that the location 
objective is to locate the retail outlet at the location that 
maximizes its market share. 

A unique feature of competitive facility location 
models is facility attractiveness (its appeal to con- 
sumers). Facilities differ in the total “bundle of bene- 
fits” they offer customers. They vary in one or more 
of the attributes which make up their total attractive- 
ness to customers. Furthermore, varying importance 
assigned to each of these attributes by different cus- 
tomers will result in a selective set of consumers patron- 
izing each. Facility attractiveness level, therefore, needs 


to be incorporated in the location model. Facility attrac- 
tiveness needs to first be assessed using one of a vari- 
ety of methods. Once attractiveness is assumed known, 
market share captured can be calculated. Facility attrac- 
tiveness is estimated using a utility function (a com- 
posite index of attractiveness) or some other measure 
(floor area) serving as a surrogate for a latent attractive- 
ness. Utility models are predicated on consumer spatial 
choice models as well as on the premise that facilities of 
the same type are not necessarily comparable. 

Also unique to competitive facility location is the 
modeling of demand in terms of buying power. Income 
levels and discretionary spending become a measure of 
demand. For a review of competitive models see [4,15]. 

The underlying theme running through all
competitive models is the existence of an interrelationship
between four variables: buying power (demand), distance,
facility attractiveness, and market share, with the first
three variables being independent variables and the last
the dependent variable. Buying power, or effective buying
income, is known (for example, from Sales and Marketing
Management magazine). Distance from demand
points to facilities can be measured. The most difficult 
link in the interrelationship between the four variables 
is the determination of facility attractiveness. For a dis- 
cussion of the determination of facility attractiveness 
see [6,9]. As is mentioned above and discussed below, 
it is estimated using a utility function. Once buying 
power, distance, and attractiveness are known, market 
share can be calculated. 


The Proximity Model 


The first modern paper on competitive facility location 
is generally agreed to be Hotelling’s paper on duopoly 
in a linear market [21]. Hotelling considered the loca- 
tion of two competing facilities on a segment (for ex- 
ample, two ice-cream vendors along a beach strip). The 
distribution of buying power along the segment is as- 
sumed uniform and customers patronize the closest fa- 
cility. When one facility is located and there is no com- 
petition, all customers patronize the existing facility. 
However, when a competing facility is introduced and 
is located at a different point on the segment, the
customers on one side of the midpoint between the two
facilities patronize one facility and the customers on the
other side of the midpoint patronize the second facility.
If one facility is held fixed in place, the best location
for the second is immediately to the left or right
of the fixed one, depending on which segment (left or
right of the existing facility) is longer. In models based
on Hotelling’s formulation it is assumed that customers
patronize the closest facility.
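Hotelling's setting is simple enough to compute directly. The following minimal sketch, with hypothetical function names, takes the segment as [0, 1] with uniform buying power. Co-located facilities are assumed to split demand evenly, and `eps` is an arbitrary small offset standing in for "immediately beside".

```python
def hotelling_shares(a, b):
    """Market shares of facilities at a and b on the unit segment [0, 1]
    with uniform buying power; every customer patronizes the closer facility."""
    if a == b:
        return 0.5, 0.5            # assumption: co-located facilities split demand
    mid = (a + b) / 2              # customers left of mid go to the left facility
    left, right = mid, 1.0 - mid
    return (left, right) if a < b else (right, left)


def best_response(a, eps=1e-6):
    """Location for a second facility when the first is fixed at a:
    just beside a, on the longer side."""
    return a + eps if a <= 0.5 else a - eps


b = best_response(0.3)
print(b, hotelling_shares(0.3, b))  # second facility captures almost 70%
```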


The Location-Allocation Model 


An extension to Hotelling’s approach is the location- 
allocation model for the selection of sites for facilities 
that serve a spatially dispersed population. Both the 
facilities’ locations and the allocation of customers to 
them are determined simultaneously. The allocation of 
customers to facilities is made using Hotelling’s prox- 
imity assumption - each facility attracts the consumers 
closest to it. The market share attracted by each facility 
is calculated and the best locations for the new facilities 
are then found. Multifacility location-allocation models 
analyze the system-wide interactions among all
facilities. ReVelle [28] introduced location-allocation models
to competitive location. Goodchild [19] suggested the
location-allocation market share model (MSM). A
retail firm is planning to open a chain of outlets in a
market in which a competing chain already exists. The
entering firm’s goal is to maximize the total market share
captured by the entire chain. Most location-allocation
solution methods rely on heuristic approaches that do
not guarantee an optimal solution; rather, they provide
good solutions for implementation. The best locations
are selected from a user-provided, prespecified set of 
potential sites. Typically, these problems are formulated 
on a network and the location solution is on a node. 
A book edited by Ghosh and Rushton [18] provides 
a collection of papers on the subject. A comprehen- 
sive review of location-allocation models can be found 
in [17]. 

The assumption that customers patronize the facil- 
ity closest to them implies that the competing facilities 
are equally attractive. For equally attractive facilities, 
the plane is partitioned by a Voronoi diagram [26,27]. It 
is implicitly assumed that all customers located at a de- 
mand point patronize the same facility. This, in turn, 
implies an “all or nothing” property. The combined 
buying power at a demand point is assigned entirely to 
one facility and none is assigned to other facilities,
unless two or more facilities are equidistant. A solution
procedure for the multiple competitive facility
location problem in the plane is proposed in [29].
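The "all or nothing" proximity allocation can be sketched in a few lines. In this minimal illustration, with a hypothetical function name, each demand point's entire buying power goes to its nearest facility under Euclidean distance. Splitting exact ties evenly is an assumption, since the model only notes that equidistant facilities are an exception.

```python
import math


def proximity_market_shares(demand, buying_power, facilities):
    """All-or-nothing allocation: each demand point's buying power goes to its
    closest facility (Euclidean); exact ties are split evenly (an assumption)."""
    shares = [0.0] * len(facilities)
    for p, b in zip(demand, buying_power):
        d = [math.dist(p, f) for f in facilities]
        dmin = min(d)
        closest = [j for j, dj in enumerate(d) if dj == dmin]
        for j in closest:
            shares[j] += b / len(closest)
    return shares


demand = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(proximity_market_shares(demand, [1, 2, 3, 4], [(0.5, 0), (3, 0)]))  # [3.0, 7.0]
```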


The Deterministic Utility Model 


When the facilities are not equally attractive, the
proximity premise for allocating consumers to facilities is no
longer valid. To account for variations in facility
attractiveness, a deterministic utility model for
competitive facility location was introduced by T. Drezner [2].
Hotelling’s approach is extended by relaxing the prox- 
imity assumption. Consumers are known to make their 
choice of a facility based on factors other than distance 
alone. Therefore, it is assumed that customers base their 
choice of a facility on facility attractiveness which is 
represented by a utility function. This utility function is 
a composite index of facility attributes and the distance 
to the facility, representing the expected satisfaction 
from that facility (either an additive or a multiplicative 
utility function). It is generally agreed that customers, 
through a decision-making process, choose the facility 
with the highest utility, the facility which is expected to 
maximize their satisfaction. This choice is determined 
by some formula according to which customers eval- 
uate alternative facilities’ attributes weighted by their 
personal salience to arrive at an overall facility attrac- 
tiveness. 

A trade-off between distance and attractiveness 
takes place. Based on this premise the degree of ex- 
pected satisfaction with each alternative as a function of 
the relevant characteristics of that facility is measured. 
It is suggested that a customer will patronize a better 
and farther facility as long as the extra distance to it 
does not exceed its attractiveness advantage. For exam- 
ple, paramedics transporting a motor vehicle accident 
victim will by-pass a nearby hospital in favor of a far- 
ther, better equipped trauma center as long as the dif- 
ference in quality of care exceeds the adverse effect to 
the patient caused by the extra distance and time de- 
lay. A break-even distance is defined. At the break-even 
distance the attractiveness of two competing facilities is 
equal. This break-even distance, therefore, is the maxi- 
mum distance that a customer will be willing to travel to 
a farther facility (new or existing) based on his percep- 
tion of its attractiveness and advantage relative to other 
facilities. All customers at a demand point will patron- 
ize the new facility if it is located within the break-even 
distance. While customers are no longer assumed to
patronize the closest facility, customers at a certain
demand point are assumed to apply the same utility
function; therefore, they all patronize the same facility. The
“all or nothing” property is maintained in this
extension.
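The break-even logic can be made concrete under an assumed utility form. This minimal sketch takes the utility of a facility to be its attractiveness minus the distance to it, which is a hypothetical choice since the model only requires some utility function; the function names are also hypothetical. The break-even distance at demand point i is then $r_i = a_{\text{new}} - \max_j (a_j - d_{ij})$. A customer is assumed to switch when the new facility is at least as attractive, and the whole buying power of point i is captured when the new site is within $r_i$.

```python
import math


def break_even_distances(demand, existing, attract, new_attract):
    """Break-even distance r_i per demand point, assuming additive utility
    U = attractiveness - distance: the new facility is at least as attractive
    as the best existing one iff its distance to point i is at most r_i."""
    return [new_attract - max(a - math.dist(p, f) for f, a in zip(existing, attract))
            for p in demand]


def captured_buying_power(site, demand, buying_power, radii):
    """All or nothing: a point's whole buying power goes to the new facility
    at `site` when the site lies within that point's break-even distance."""
    return sum(b for p, b, r in zip(demand, buying_power, radii)
               if math.dist(p, site) <= r)


demand, bp = [(0, 0), (10, 0)], [5, 7]
radii = break_even_distances(demand, [(0, 0)], [3.0], new_attract=5.0)
print(radii)                                               # [2.0, 12.0]
print(captured_buying_power((1, 0), demand, bp, radii))    # 12
print(captured_buying_power((5, 0), demand, bp, radii))    # 7
```

Raising `new_attract` enlarges every $r_i$, which is one way to see why the optimal location is sensitive to the new facility's attractiveness.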

Based on aggregated utility values for existing facil- 
ities and a utility function for a new facility, the best 
location is found for the new one. The optimal location 
for the new facility is sensitive to its attractiveness. Dif- 
ferent attractiveness levels may yield different optimal 
locations. 


The Random Utility Model 


To address the “all or nothing” assumption of the de- 
terministic utility model and to account for variations 
in individual utility functions, a random utility model 
is introduced by Drezner and Drezner [7]. The deter- 
ministic utility model is extended by assuming that each 
customer draws his utility from a random distribution 
of utility functions. The probability that a customer will 
prefer a certain facility over all other facilities is cal- 
culated by applying the multivariate normal distribu- 
tion. Once the probabilities are calculated, the market 
share captured by a particular facility (new or exist- 
ing) can be calculated as a weighted sum of the buy- 
ing power at all demand points. This formulation elim- 
inates the “all or nothing” property since a probability 
that a customer will patronize a particular facility can be 
established and is no longer either 0% or 100%. To cir- 
cumvent the mathematically complicated formulation 
of the random utility model, Drezner et al. [14]
suggested using the simpler logit model: the probability
that a customer will patronize a facility can be
approximated by a logit function of the distance to that
facility.
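A logit choice rule yields fractional patronage, so the "all or nothing" property disappears. This minimal sketch assumes a utility of the form attractiveness minus $\beta$ times distance. That form and the parameter `beta` are assumptions for illustration and not necessarily the specification used in [14]. The market share of each facility is the buying-power-weighted sum of choice probabilities, as described above.

```python
import math


def logit_market_shares(dist, attract, buying_power, beta=1.0):
    """Expected buying power captured by each facility under a logit rule.

    dist[i][j]: distance from demand point i to facility j. Assumed utility:
    attract[j] - beta * dist[i][j]; P_ij is proportional to exp(utility)."""
    shares = [0.0] * len(attract)
    for d_row, b in zip(dist, buying_power):
        u = [a - beta * d for a, d in zip(attract, d_row)]
        m = max(u)                              # subtract max for numerical stability
        w = [math.exp(x - m) for x in u]
        total = sum(w)
        for j, wj in enumerate(w):
            shares[j] += b * wj / total         # b_i * P_ij
    return shares


print(logit_market_shares([[1.0, 2.0], [3.0, 1.0]], [0.5, 0.0], [4.0, 6.0]))
```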


Gravity Based Models 


An alternative approach to the location of competing 
facilities, based on the gravity model, was introduced 
by Huff [22,23] and is extensively used by marketers. 
According to the gravity model two cities attract retail 
trade from an intermediate town in direct proportion 
to the populations of the two cities and in inverse pro- 
portion to the square of the distances from them to the 
intermediate town. Huff proposed that the probability 


that a consumer patronizes a retail facility is propor- 
tional to its size (floor area) and inversely proportional 
to a power of the distance to it. Facility size, or square 
footage, is a surrogate for facility attractiveness. Huff 
depicted equi-probability lines. A customer located on 
such a line between two facilities patronizes the two 
facilities with equal probability. These equi-probability 
lines divide the region into catchment areas, each dom- 
inated by a facility, in a manner similar to the Voronoi 
diagram [26]. These lines do not define an “all or noth- 
ing” assignment of customers to facilities, rather, at any 
demand point, the proportion of consumers attracted 
to each facility is a function of its square footage (at- 
tractiveness) and distance. The model finds the market 
share captured at each potential site, and thus the best 
location for new facilities whose individual measures of 
attractiveness are known. 

Suppose there are k existing facilities and n demand
points. The attractiveness of facility j is $A_j$ for
$j = 1, \dots, k$, and the distance between demand point i
and facility j is $d_{ij}$. The buying power at demand point
i is $b_i$. Therefore, the buying power captured (market
share) $M_j$ by facility j is:

$$M_j = \sum_{i=1}^{n} b_i \, \frac{A_j / d_{ij}^{\lambda}}{\sum_{m=1}^{k} A_m / d_{im}^{\lambda}}, \qquad (1)$$

where $\lambda$ is the power to which distances are raised.
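Equation (1) can be evaluated directly. This is a minimal sketch with a hypothetical function name that assumes all distances are strictly positive. The default $\lambda = 2$ echoes the inverse-square gravity analogy above but is only a placeholder. To evaluate a candidate site for a new facility, one could append it as an extra column of `dist` and `attract` and compare its captured share across the candidate sites.

```python
def huff_market_shares(dist, attract, buying_power, lam=2.0):
    """Huff model, Eq. (1): M_j = sum_i b_i (A_j / d_ij^lam) / sum_m (A_m / d_im^lam).

    dist[i][j] > 0 is the distance from demand point i to facility j."""
    shares = [0.0] * len(attract)
    for d_row, b in zip(dist, buying_power):
        w = [a / d ** lam for a, d in zip(attract, d_row)]  # attraction of each facility
        total = sum(w)
        for j, wj in enumerate(w):
            shares[j] += b * wj / total                     # b_i times patronage probability
    return shares


print(huff_market_shares([[1.0, 2.0]], [1.0, 1.0], [10.0]))  # [8.0, 2.0]
```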

In the original Huff formulation, facility floor area serves as a surrogate for attractiveness. A major improvement on Huff's approach was suggested by Nakanishi and Cooper [25], who introduced the multiplicative competitive interaction (MCI) model. The MCI model replaces the floor area with a product of factors, each a component of attractiveness, and each factor in the product is raised to a power. Thus, the attractiveness of a facility is a composite of a set of attributes rather than the floor area alone. Nakanishi and Cooper’s idea was elaborated on and applied by Jain and Mahajan [24] to food retailing using specific attractiveness attributes. Gravity-based models evaluate, in terms of market share, a user-provided discrete set of potential sites for the location of a new facility.
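The MCI composite attractiveness can be sketched as follows. Each attribute value is raised to its own exponent and the results are multiplied, and the product can then stand in for $A_j$ in Eq. (1). The attribute values and exponents below are hypothetical; in practice the exponents would be estimated from data.

```python
import math
from typing import Sequence


def mci_attractiveness(attributes: Sequence[float], exponents: Sequence[float]) -> float:
    """Composite attractiveness: product of attribute values, each raised to its own power."""
    if len(attributes) != len(exponents):
        raise ValueError("one exponent per attribute is required")
    return math.prod(a ** beta for a, beta in zip(attributes, exponents))


# Hypothetical attributes: floor area, parking capacity, product-range index.
exponents = [1.0, 0.5, 2.0]
store_a = mci_attractiveness([2.0, 1.5, 0.8], exponents)
store_b = mci_attractiveness([1.0, 3.0, 1.0], exponents)
print(store_a, store_b)
```

With a single attribute (floor area) and exponent 1, the function reduces to Huff's original surrogate.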

Huff's and Nakanishi and Cooper’s models were ex- 
tended to the location of multiple facilities by [1,16]. 
Achabal et al. [1] extended the MCI model to the location of multiple facilities that belong to the same chain. The problem was modeled as a nonlinear integer programming problem. A random search procedure combined with an interchange heuristic was employed to identify optimal and near-optimal sets of locations. Ghosh and Craig [16] proposed a franchise distribution model, in which an expanding franchise seeks to maximize sales while minimizing cannibalization of franchise outlets. This model was also formulated as a nonlinear integer programming problem but included additional factors such as advertising. These two models likewise select the best locations from a user-provided set of alternative sites.

Other papers [20,30] suggest variations on Huff's formulation that replace the distance raised to a power with an exponential function of the distance. This formulation accelerates the distance decay.
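The following sketch compares the two decay forms. It weights a facility by $d^{-\lambda}$ (Huff) or by $e^{-\lambda d}$ (the exponential variant), relative to the weight at distance 1. Using the same $\lambda = 2$ for both is purely an assumption for the comparison; each model would calibrate its own parameter.

```python
import math


def power_decay(d: float, lam: float) -> float:
    """Huff-style decay: weight proportional to d ** (-lam)."""
    return d ** -lam


def exponential_decay(d: float, lam: float) -> float:
    """Exponential decay: weight proportional to exp(-lam * d)."""
    return math.exp(-lam * d)


lam = 2.0  # same parameter for both, purely for comparison
for d in (1.0, 2.0, 5.0, 10.0):
    # Weight at distance d relative to the weight at distance 1.
    rel_power = power_decay(d, lam) / power_decay(1.0, lam)
    rel_exp = exponential_decay(d, lam) / exponential_decay(1.0, lam)
    print(f"d={d:>4}: power {rel_power:.2e}  exponential {rel_exp:.2e}")
```

For large distances the exponential weight falls off much faster than any power of the distance, which is the accelerated decay mentioned above.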

Finding the best location for a new facility (or mul- 
tiple facilities) in a continuous space using the grav- 
ity model objective is discussed in T. Drezner [3] and 
Drezner and Drezner [10] for the single facility case, 
and in T. Drezner [5] and T. Drezner et al. [12] for the 
location of multiple facilities. 

Finding the best location for a competing facil- 
ity that minimizes the probability of not meeting 
a given minimum threshold of market share is dis- 
cussed in [13]. 

All models discussed above assume that all of the demand is distributed among the competing facilities. For nonessential services, some of the demand may not be satisfied. A model that assumes that some of the demand is lost is proposed in [11].


Anticipating Future Competition 


The competitive facility location models discussed 
above are myopic and short-term oriented in that they 
attempt to find the optimal location for a new facility 
(facilities) by maximizing current market share against 
existing competition. A different approach to compet- 
itive location focuses on anticipating and preempting 
future competition. It is assumed that a new compet- 
ing facility will enter the market at some point in the 
future. The competitor will establish his facility at the 
location which maximizes his market share. Therefore, 
one’s present location decision will affect the competitor’s location decision. Conversely, assuming a future competitive entry has implications for one’s present location decision. The objective is to find the location that maximizes the market share captured by one’s own facility following the competitor’s entry. This problem is known in the economic literature as the Stackelberg equilibrium problem or the leader-follower problem, and in voting theory as Simpson’s problem. See [8] for a review of the topic.


Conclusions 


There are two main applications for competitive facil- 
ity location models. The first application is the location 
analysis of a new facility: the best location for the new facility, based on market share maximization at that location, is found. The second application is an analysis of the impact of changes in quality at existing facilities (one’s own, a competitor’s, or both) on the market share captured by one’s facility and on its optimal location. In addition, a decision maker will be able to perform a “what-if analysis” and anticipate the impact on his facility of either a competitor’s improvements or the introduction of a new facility. In this case one needs to know the overall attractiveness of the proposed new facility, or the difference in overall attractiveness before and after improvements to an existing one. Using the models,
a decision maker can assess: 
1. the impact on location of changes in attractiveness 
for his new facility; 
2. the impact on market share of change in location for 
his new facility; 
3. the impact on market share of changes in attractive- 
ness at his existing facility(ies); 
4. the impact on his facility of changes in other facilities 
or the introduction of a new facility. 
These models afford the anticipation and analysis of the 
impact of likely future scenarios. In a highly competi- 
tive market such as exists domestically, and in the face 
of increasing global competition, the ability to optimize 
location in terms of market share provides a strategic 
advantage for decision makers. 

Portfolio management is a typical decision-making problem under incomplete, sometimes unknown, information. Very often, a probability distribution is assumed for future stock/bond prices. In the classical work of H.M. Markowitz [9], investors are assumed to base their portfolio decisions on their preferences regarding return and risk. In this model, the return is specified as the expected value of the portfolio, and the risk as its variance. One of the great achievements of this work is its power to predict diversified investment decisions.

The assumption that future events follow some probability distribution is also widely accepted for many other problems in which information on future events is uncertain. Very often, uncertainty is used as a synonym for probability distribution. However, a fundamental problem still remains: what decisions should we make in the presence of future events about which we have no information at all? In such situations, the quality of a decision made under ignorance can only be known after future events reveal themselves. Therefore, the quality of a decision should be evaluated against the optimal strategy we could have chosen had we known the outcome.

Following this approach, the concept of the competitive ratio has been widely applied to computational problems under incomplete information [4,7,8,11]. This criterion optimizes the ratio of the outcome of a strategy under incomplete information to the optimal outcome under complete information. In particular, R. El-Yaniv et al. applied competitive analysis to the problem of foreign currency purchase [5], and X. Deng suggested applying competitive analysis to portfolio management problems [3].

Consider a maximization problem. Let $X = (x_1, \ldots, x_n)$ be the variables about which we will not have complete information until the future. Let $Y = (y_1, \ldots, y_m)$ be the decision variables whose values we have to choose now. Let $A = (a_1, \ldots, a_k)$ be the variables whose values we know at the time we make decisions on $Y$. A decision rule is a function $S\colon A \to Y$. Let $v(A, Y, X)$ be the value of the objective function. Denote by $v_S(A, X) = v(A, S(A), X)$ the value of the objective function achieved by the decision rule $S$ if the future outcome is $X$. Let $\mathrm{OPT}(A, X) = \max_{\text{all } Y} v(A, Y, X)$. The competitive ratio of decision rule $S$ is

$$\min_{\text{all } X} \frac{v_S(A, X)}{\mathrm{OPT}(A, X)}.$$

We are interested in a decision rule that achieves the optimal competitive ratio:

$$\max_{\text{all } S} \; \min_{\text{all } X} \frac{v_S(A, X)}{\mathrm{OPT}(A, X)}.$$
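The two formulas above can be evaluated by brute force when everything is finite. The sketch below assumes finite candidate sets of decisions and outcomes, with the known information $A$ held fixed, so a decision rule reduces to choosing one decision. It also assumes $\mathrm{OPT}(X) > 0$ for every outcome, and takes $\mathrm{OPT}$ over the finite candidate set only. All function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

D = TypeVar("D")  # a decision, i.e. a value of Y
O = TypeVar("O")  # a future outcome, i.e. a value of X


def competitive_ratio(y: D, outcomes: Sequence[O], decisions: Sequence[D],
                      value: Callable[[D, O], float]) -> float:
    """min over outcomes X of v(y, X) / OPT(X), with OPT(X) = max of v over `decisions`."""
    worst = float("inf")
    for x in outcomes:
        opt = max(value(d, x) for d in decisions)  # assumed to be positive
        worst = min(worst, value(y, x) / opt)
    return worst


def optimal_decision(outcomes: Sequence[O], decisions: Sequence[D],
                     value: Callable[[D, O], float]) -> Tuple[float, D]:
    """The max-min criterion: the decision whose competitive ratio is largest."""
    return max(((competitive_ratio(y, outcomes, decisions, value), y) for y in decisions),
               key=lambda pair: pair[0])


def portfolio_value(weights: Sequence[float], prices: Sequence[float]) -> float:
    """End-of-period value of one unit of money split according to `weights`."""
    return sum(w * p for w, p in zip(weights, prices))


decisions = [(w, 1.0 - w) for w in (0.0, 0.25, 0.5, 0.75, 1.0)]
outcomes = [(2.0, 0.0), (0.0, 2.0)]  # one stock doubles, the other becomes worthless
ratio, weights = optimal_decision(outcomes, decisions, portfolio_value)
print(ratio, weights)  # 0.5 (0.5, 0.5)
```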


Consider the portfolio management problem of choosing from a set of $n$ stocks. We may scale units so that we invest one unit of money and the current price of each stock is one. The portfolio choice decision can then be represented by a vector $(x_1, \ldots, x_n)$ with $x_i \ge 0$, $1 \le i \le n$, and $\sum_{i=1}^{n} x_i = 1$.

To illustrate the competitive analysis method, we first consider the extreme case in which we know nothing about the future prices of the stocks. A simple strategy is to distribute the fund equally among all the stocks, so that $x_1 = \cdots = x_n = 1/n$. Let $1 + c_i$ be the price of stock $i$ at the end of the period. In retrospect, the best strategy would have been to invest all the money in the best-performing stock $s$, with $1 + c_s = \max\{1 + c_i : 1 \le i \le n\}$. The income of the above strategy is $\sum_{i=1}^{n} (1 + c_i)/n$. Since we may assume that $1 + c_i \ge 0$, we have

$$\frac{\sum_{i=1}^{n} (1 + c_i)/n}{1 + c_s} \ge \frac{(1 + c_s)/n}{1 + c_s} = \frac{1}{n}.$$

This simple strategy therefore achieves a competitive ratio of $1/n$. On the other hand, this strategy is optimal when we have no information whatsoever about the stocks. Consider any strategy that invests $x_i$ in stock $i$ ($\sum_{i=1}^{n} x_i = 1$). Its outcome will be $\sum_{i=1}^{n} x_i (1 + c_i)$. Since $\sum_{i=1}^{n} x_i = 1$, there exists some $j$ such that $x_j \le 1/n$. In the worst case, it may happen that $1 + c_j = 2$ and $1 + c_i = 0$ for all other stocks, so that the optimal investment would have put all the money in stock $j$. The competitive ratio of this strategy is then no more than $x_j (1 + c_j)/(1 + c_j) = x_j \le 1/n$. Therefore, the above simple strategy achieves the optimal competitive ratio when no information is available.
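Both halves of the argument can be checked numerically. The equal split never falls below $1/n$ on randomly drawn prices. An unequal split, by contrast, is driven below $1/n$ by the adversarial scenario from the text, in which the least-weighted stock ends at price 2 and all others at 0. Prices are written as $1 + c_i \ge 0$. The sampling range, seed and trial count are arbitrary choices for this sketch.

```python
import random
from typing import List, Sequence


def ratio(weights: Sequence[float], prices: Sequence[float]) -> float:
    """Portfolio outcome divided by the best single-stock outcome (prices are 1 + c_i)."""
    return sum(w * p for w, p in zip(weights, prices)) / max(prices)


def adversarial_prices(weights: Sequence[float]) -> List[float]:
    """Worst case from the text: price 2 for the least-weighted stock, 0 for the others."""
    j = min(range(len(weights)), key=lambda i: weights[i])
    return [2.0 if i == j else 0.0 for i in range(len(weights))]


n = 4
uniform = [1.0 / n] * n
rng = random.Random(0)
for _ in range(1000):
    prices = [rng.uniform(0.0, 3.0) for _ in range(n)]
    assert ratio(uniform, prices) >= 1.0 / n - 1e-12  # never below 1/n

skewed = [0.4, 0.3, 0.2, 0.1]
print(ratio(skewed, adversarial_prices(skewed)))  # 0.1, below 1/n = 0.25
```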

To illustrate this idea further, consider the case in which we have some information about future prices of the stocks. Suppose that the only information we have is that the price of stock $i$ will fluctuate in $[1 - \varepsilon_i, 1 + \delta_i]$ ($-\varepsilon_i \le \delta_i$), $1 \le i \le n$. We may normalize the values relative to the risk-free rate of interest and make the risk-free asset the first option, so that $-\varepsilon_1 = \delta_1 = 0$. For example, we may divide the outcomes of the other securities by $1 + r$, where $r$ is the riskless interest rate.

Consider a portfolio choice decision $x_i$, $1 \le i \le n$, with $\sum_{i=1}^{n} x_i = 1$. That is, one unit of investment is distributed among $n$ options, with a fraction $x_i$ on option $i$. Let $1 + c_i$ be the unknown future price of option $i$ at the projected time of sale ($-\varepsilon_i \le c_i \le \delta_i$). In retrospect, the optimal solution would have been $\max_{1 \le i \le n} (1 + c_i)$, obtained by investing the whole unit in the option achieving the maximum. For a fixed strategy $x_i$, $1 \le i \le n$, its ratio versus the optimum is

$$\frac{\sum_{i=1}^{n} (1 + c_i) x_i}{\max\{1 + c_i : 1 \le i \le n\}}.$$

Taking all situations into consideration, the competitive ratio of this strategy is

$$\min \frac{\sum_{i=1}^{n} (1 + c_i) x_i}{\max\{1 + c_j : 1 \le j \le n\}},$$

where the minimum is taken over all values of the $c_i$ in their ranges $-\varepsilon_i \le c_i \le \delta_i$, $1 \le i \le n$.

Suppose now that $\max_{1 \le j \le n} (1 + c_j)$ is achieved at some $i$. Then the above ratio is at least $x_i$, since $x_j \ge 0$ and $1 + c_j \ge 0$ for all $j$. If $c_i < \delta_i$, the adversary can choose the new value $c_i' = \delta_i$. In this case the denominator $\max_j (1 + c_j)$ increases by $\delta_i - c_i > 0$, while the numerator $\sum_{j=1}^{n} (1 + c_j) x_j$ increases only by $(\delta_i - c_i) x_i$. Because the ratio is at least $x_i$, adding $\delta_i - c_i$ to the denominator and $x_i(\delta_i - c_i)$ to the numerator cannot increase it. It is therefore to the benefit of the adversary to choose $c_i = \delta_i$. Similarly, it is to the benefit of the adversary to set $c_j = -\varepsilon_j$ for all $j \ne i$. That is, the minimum ratio is achieved at

$$\frac{(1 + \delta_i) x_i + \sum_{j \ne i} (1 - \varepsilon_j) x_j}{1 + \delta_i}$$

for some $i$, $1 \le i \le n$. Therefore, the adversary will choose $i$ so as to attain

$$\min\left\{\frac{(1 + \delta_i) x_i + \sum_{j \ne i} (1 - \varepsilon_j) x_j}{1 + \delta_i} : 1 \le i \le n\right\}.$$

Given a portfolio decision vector $x$, we can search through all $n$ possible situations to find this minimum in polynomial time. This allows us to evaluate the quality of portfolio choices in terms of their competitive ratios.
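The $n$-case evaluation just described can be sketched in a few lines of Python. As a sanity check, the code also samples random price vectors within the stated ranges and confirms that none yields a smaller ratio. The specific allocation and ranges are made-up numbers, with option 0 playing the role of the normalized risk-free asset.

```python
import random
from typing import Sequence


def worst_case_ratio(x: Sequence[float], eps: Sequence[float], delta: Sequence[float]) -> float:
    """Competitive ratio of allocation x when option i ends in [1 - eps[i], 1 + delta[i]].

    For each i, the adversary raises option i to 1 + delta[i] and lowers every other
    option j to 1 - eps[j]; the ratio is the minimum over these n cases.
    """
    n = len(x)
    best = float("inf")
    for i in range(n):
        numerator = (1 + delta[i]) * x[i] + sum((1 - eps[j]) * x[j] for j in range(n) if j != i)
        best = min(best, numerator / (1 + delta[i]))
    return best


x = [0.4, 0.3, 0.3]      # option 0 is the normalized risk-free asset
eps = [0.0, 0.2, 0.1]
delta = [0.0, 0.5, 0.3]
r = worst_case_ratio(x, eps, delta)

# Sanity check: no randomly sampled price vector yields a smaller ratio than r.
rng = random.Random(1)
for _ in range(10_000):
    prices = [rng.uniform(1 - e, 1 + d) for e, d in zip(eps, delta)]
    assert sum(p * w for p, w in zip(prices, x)) / max(prices) >= r - 1e-12
print(round(r, 4))  # 0.7467
```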

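A portfolio manager aiming at a solution with the best competitive ratio will choose the decision vector $x$ that maximizes

$$\min_{1 \le i \le n} \frac{(1 + \delta_i) x_i + \sum_{j \ne i} (1 - \varepsilon_j) x_j}{1 + \delta_i}.$$

In a linear programming formulation, this is

$$\begin{aligned}
\max\ & Z \\
\text{s.t.}\ & \frac{(1 + \delta_i) x_i + \sum_{j \ne i} (1 - \varepsilon_j) x_j}{1 + \delta_i} \ge Z, \quad 1 \le i \le n, \\
& \sum_{i=1}^{n} x_i = 1, \\
& x_i \ge 0, \quad 1 \le i \le n.
\end{aligned}$$

Therefore, the optimal competitive ratio for the above portfolio management problem can be computed in polynomial time.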
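The linear program above can be handed to an off-the-shelf LP solver. This sketch assumes NumPy and SciPy (version 1.6 or later, for `method="highs"`) are installed. The variables are $x_1, \ldots, x_n, Z$, and since `linprog` minimizes, the objective is $-Z$. The function name `optimal_competitive_portfolio` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def optimal_competitive_portfolio(eps, delta):
    """Solve: max Z  s.t.  ratio_i(x) >= Z for all i,  sum(x) = 1,  x >= 0."""
    eps = np.asarray(eps, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = len(eps)
    c = np.zeros(n + 1)
    c[-1] = -1.0  # minimize -Z
    # Row i encodes Z - ratio_i(x) <= 0, where
    # ratio_i(x) = x_i + sum_{j != i} (1 - eps_j) / (1 + delta_i) * x_j.
    A_ub = np.zeros((n, n + 1))
    for i in range(n):
        A_ub[i, :n] = -(1.0 - eps) / (1.0 + delta[i])
        A_ub[i, i] = -1.0
        A_ub[i, -1] = 1.0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)  # sum of x equals 1; Z not involved
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]  # x >= 0, Z free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[-1]


r, lo, hi = 0.05, -0.10, 0.30  # bond pays 1 + r; security ends in [1 + lo, 1 + hi]
weights, ratio = optimal_competitive_portfolio(
    eps=[0.0, 1.0 - (1.0 + lo) / (1.0 + r)],   # normalize by 1 + r, as in the text
    delta=[0.0, (1.0 + hi) / (1.0 + r) - 1.0],
)
print(weights, ratio)
```

For these two options, the bond weight agrees with the closed form $x = (1+\delta)(r-\varepsilon)/[(1+\delta)(r-\varepsilon) + (1+r)(\delta-r)]$ of the two-investor comparison, about 0.4262, with a competitive ratio of about 0.918.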

In the general case, information about future may be 
different for different investigators. Compare two situ- 
ations where two investors each has two options, one 
government bond of riskless interest rate 1 + r and a se- 
curity. One investor knows that the future price of the 
security will be in [1 + €, 1 + 6] with € < 6 and another 
knows nothing about future prices of the security. The 
most interesting case will be 2e < r < 5. Apply the above 
analysis, the more informed investor will decide a pro- 
portion x of his fortune on riskless bond with 


x(l+r)+(1—x)Q +e) 
l+r 


_ x(l+r)+(1—x)( + 8) 
a 14+6 , 


Therefore, it invests 


a (1 + 6)(r—e) 
~ (14 6\(r-—) +0 4+n(6—71) 


on the riskless bond and 


_ (1 + r)(6 — 1) 
~ 1+ 8(r-e) + 4+N6-7 


1l-x 


on the other security. Its competitive ratio will be 


(14+ d5)\(r—6€)+ (1+ 6)(5—-1) 
(1+ 6\r—-e) ++ n(b—7) 


Applying the analysis above, we see that the person 
knowing nothing will invest 1/2 for the riskless bond 
and 1/2 for the other security. However, the worst sit- 
uations considered by the less informed investor would 
not occur at all. Therefore, its competitive ratio will be 
the minimum of 


(l+r)+(1+e) 
211 +17) 
and 
(1+r)+(1+64) 
2(1+ 8) 


From the above discussion, the decision of the investor 
knowing nothing has a worse competitive ratio than 
that of the more informed one. 
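The two investors can be compared numerically. In this sketch, `lo` and `hi` stand for $\varepsilon$ and $\delta$ of this paragraph (the security ends in $[1 + \varepsilon, 1 + \delta]$). The rates are made-up numbers satisfying $\varepsilon < r < \delta$.

```python
from typing import Tuple


def informed_allocation(r: float, lo: float, hi: float) -> Tuple[float, float]:
    """Bond fraction x and competitive ratio for the investor who knows the price range."""
    denom = (1 + hi) * (r - lo) + (1 + r) * (hi - r)
    x = (1 + hi) * (r - lo) / denom
    ratio = (1 + r) * (hi - lo) / denom
    return x, ratio


def uninformed_ratio(r: float, lo: float, hi: float) -> float:
    """Ratio of the 1/2-1/2 split, judged on the prices that can actually occur."""
    return min(((1 + r) + (1 + lo)) / (2 * (1 + r)),
               ((1 + r) + (1 + hi)) / (2 * (1 + hi)))


r, lo, hi = 0.05, -0.10, 0.30
x, informed = informed_allocation(r, lo, hi)
print(round(x, 4), round(informed, 4), round(uninformed_ratio(r, lo, hi), 4))  # 0.4262 0.918 0.9038
```

At the informed investor's $x$, the ratio is the same whether the security ends at $1 + \varepsilon$ or at $1 + \delta$. This equalization is what makes the choice optimal for that investor.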
In comparison with the general approach of using probability distributions for uncertain events, the above situation shows that the competitive analysis method allows the analysis of information asymmetry among investors. It is not easy to apply the probability method here since, in principle, the real world should not have two different probability distributions. This advantage is not limited to the above case, in which the range of future prices is known; the method can also be applied to other types of information about the future.

Decision rules based on rationality arguments other than probability have also been suggested for financial problems. In particular, T.M. Cover has suggested a solution, the universal portfolio, which requires no information (not even a probability distribution) about the future prices of the stocks under consideration [1]. Competitive analysis evaluates a strategy against all other strategies. Cover, by contrast, evaluated his solution against a class of strategies called constant rebalanced portfolios, which maintain a fixed proportion of one’s fortune in each of the securities. Notice that this requires frequent adjustment of the holdings as the prices of the securities change. Surprisingly, Cover has shown that, under mild conditions, his solution approximates the best constant rebalanced portfolio (chosen after the stock outcomes are known). That portfolio outperforms any constant rebalanced portfolio, any single stock, and indexes such as the Dow Jones Industrial Average (DJIA) [1]. However, Cover’s algorithm requires high-dimensional integration to calculate his solution, and the dimension grows with the number of securities under consideration. This may make the method computationally difficult to apply. In comparison, competitive analysis would suggest a solution which is a constant rebalanced portfolio with the same weight for all the securities.
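A constant rebalanced portfolio, and the integral behind Cover's universal portfolio, can be sketched as follows. The universal portfolio's wealth equals the average constant-rebalanced wealth under a prior on the weights. Here a uniform prior is assumed, and only two assets are used, so the integral is one-dimensional and is approximated with a midpoint rule. With more assets the integral runs over a higher-dimensional simplex, which is the computational difficulty noted above. The market data is invented.

```python
from typing import Sequence


def crp_wealth(price_relatives: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """Final wealth of a constant rebalanced portfolio starting from one unit of money.

    price_relatives[t][i] is stock i's end-of-period price divided by its start-of-period
    price; the holdings are rebalanced back to `weights` at the start of every period.
    """
    wealth = 1.0
    for period in price_relatives:
        wealth *= sum(w * x for w, x in zip(weights, period))
    return wealth


def universal_wealth_two_assets(price_relatives, grid_size: int = 1000) -> float:
    """Average CRP wealth over a uniform prior on the weight b in [0, 1] (midpoint rule)."""
    total = 0.0
    for k in range(grid_size):
        b = (k + 0.5) / grid_size
        total += crp_wealth(price_relatives, (1.0 - b, b))
    return total / grid_size


# Cash (relative 1) and a stock that alternately doubles and halves, five times.
market = [(1.0, 2.0), (1.0, 0.5)] * 5
print(crp_wealth(market, (0.5, 0.5)))   # 1.802032470703125, i.e. 1.125 ** 5
print(crp_wealth(market, (0.0, 1.0)))   # 1.0: holding the stock alone gains nothing
print(universal_wealth_two_assets(market))
```

Rebalancing to equal weights grows wealth even though neither asset gains on its own. The universal portfolio's wealth lies between that of the worst and the best constant rebalanced portfolios.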

Dembo and King have discussed a tracking model 
for asset allocation which minimizes an investor’s re- 
gret (defined as the difference of the solution of a strat- 
egy under incomplete information and the optimal so- 
lution) distribution in the L, metric [2]. In general, one 
may express the regret of a decision maker with strategy $S$ as a function $f(v_S(X), \mathrm{OPT}(X))$. Here $X$ is the revealed future event, $v_S(X)$ is the value achieved by strategy $S$ operating in ignorance of $X$, and $\mathrm{OPT}(X)$ is the optimal value achievable with complete information. One such function often used is the log metric distance between these two values in the feasible space [10]. However, since Dembo and King use the absolute difference as the basis for evaluating strategies, a probability assumption is still necessary in their model. Competitive analysis and the solution of Cover [1], by contrast, base the evaluation on a ratio: the performance of a strategy with unknown information divided by the performance of the best solution in the class of strategies under consideration.

R.M. Hogarth and H. Kunreuther have discussed situations in which financial decisions are made under ignorance. They designed experiments to study such decisions by evaluating human empirical judgements. However, the decision-making processes of economic agents are ignored in this study [6].

Some information is still available in reality, though not necessarily in the form of a well-shaped probability distribution. Competitive analysis provides an approach that does not rely on probability distributions and allows for analysis under asymmetric information among agents in the market. In principle, it also has no difficulty including available information in the analysis. The remaining difficulties in applying it successfully to portfolio management are mainly the modeling of available information and the design of efficient algorithms for computation.

Introduction


The problem of recognition is basic to the process of human understanding and learning. It includes classification and machine learning, as well as the more general approach of pattern recognition, and consists of a set of algorithms or procedures to determine to which of a number of alternative classes an object belongs.

While recognition is a human process whose functioning is largely unknown [11], pattern recognition, classification and machine learning are algorithms or heuristic procedures with a precise functional characterization. Their purpose is to determine the class membership of an object as precisely as possible.

The two approaches, pattern recognition on the one 
hand and classification and machine learning on the 
other, emphasize two different aspects of the learning 
methodology, similar to a distinction often made in nu- 
merical analysis between extrapolation and interpola- 
tion [13]. 

In pattern recognition, given a feature of a population, it is desired that all objects belonging to that population be recognized with an acceptably small error, since the paramount aspect of this activity is to recognize the object so as to be able to proceed accordingly. It is not of interest to diagnose a varying percentage of sick individuals; rather, it is essential to recognize the pathology correctly. Thus in pattern recognition, given an object, it is desired to determine whether the object belongs to the population specified and, if so, to determine precisely to which class it belongs [5].

In classification and machine learning, a population 
is considered given and some objects belong to known 
classes, while other objects belong to as yet unknown 
classes, so it is desired to determine the class member- 
ship of objects that are known to belong to that popu- 
lation. Depending on the definition of the populations 
considered and the algorithms used, the classification 
rate may differ from one application to another. Classi- 
fication and machine learning procedures are often de- 
fined in terms of heuristics, such as support vector ma- 
chines with kernel methods. The kernel to be applied to 
a given problem cannot be determined except by trial 
and error, so that the existence of a suitable kernel is 
not guaranteed. Thus results may differ markedly from 
application to application [5]. 

Here we shall be concerned with pattern recognition problems that must consider:

• The collection of objects to examine and the training set available for learning the classes.

• The attributes that can be defined precisely on the objects in the training set and on the objects to be recognized (which may be as yet unknown).

• The precision with which the recognition is required, as well as the possible structures defined on the data sets.

The pattern recognition algorithm used to perform this will be formulated as a complementarity problem rather than as an optimization algorithm, since the former may be considered more general. Moreover, the known differences that may exist in the attributes of the classes allow additional constraints to be defined, which permit more precise results to be obtained.


Definitions 


Consider a set of objects, characterized by a set of com- 
mon attributes, which have been assigned to suitable 
classes, so that their class labels are known. This is called 
a training set [5]. 


Definition 1 A subset of a data set is termed a training 
set if every entity in the training set has been assigned 
a class label. 


Definition 2 Suppose there is a set of entities $E$ and a set $P = \{P_1, P_2, \ldots, P_n\}$ of subsets of the set of entities, i.e. $P_j \subseteq E$, $j \in J = \{1, 2, \ldots, n\}$. A subset $\hat{J} \subseteq J$ forms a cover of $E$ if $\bigcup_{j \in \hat{J}} P_j = E$. If, in addition, $P_j \cap P_k = \emptyset$ for every $j, k \in \hat{J}$ with $j \ne k$, it is a partition.
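Definition 2 translates directly into two set checks. In this sketch, the selected subsets $P_j$, $j \in \hat{J}$, are passed as a list of Python sets. The function names and the small example are illustrative only.

```python
from typing import Hashable, Sequence, Set


def is_cover(entities: Set[Hashable], subsets: Sequence[Set[Hashable]]) -> bool:
    """True if the union of the selected subsets P_j equals the entity set E."""
    return set().union(*subsets) == entities


def is_partition(entities: Set[Hashable], subsets: Sequence[Set[Hashable]]) -> bool:
    """True if the subsets form a cover of E and are pairwise disjoint."""
    if not is_cover(entities, subsets):
        return False
    seen: Set[Hashable] = set()
    for p in subsets:
        if seen & p:  # p shares an entity with an earlier subset
            return False
        seen |= p
    return True


E = {1, 2, 3, 4}
P1, P2, P3 = {1, 2}, {3, 4}, {2, 3}
print(is_partition(E, [P1, P2]))                                   # True
print(is_cover(E, [P1, P2, P3]), is_partition(E, [P1, P2, P3]))    # True False
print(is_cover(E, [P1, P3]))                                       # False
```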


Definition 3 The data set is coherent if there exists a partition that satisfies the following properties:

1. The relations defined on the training set and in par- 
ticular the membership classes, defined over the data 
set, consist of disjoint unions of the subsets of the 
partition. 

2. Stability: the partition is invariant to additions to the 
data set. This invariance should apply both to the 
addition of duplicate entities and to the addition of 
new entities obtained in the same way as the objects 
under consideration. 

3. Extendability: if the dimension of the set of at- 
tributes is augmented, so that the basis will be com- 
posed of p+ 1 attributes, then the partition ob- 
tained by considering the smaller set will remain 
valid, even for the extension, as long as this exten- 
sion does not alter the relations defined on the data 
set. 


Definition 4 A data set is linearly separable if there 
exist linear functions such that the entities belonging to 
one class can be separated from the entities belonging 
to the other classes. It is pairwise linearly separable if 
every pair of classes is linearly separable. A set is piece- 
wise separable if every element of each class is separable 
from all the other elements of all the other classes. 


Clearly, if a set is linearly separable, it is pairwise linearly separable and piecewise separable, but the converse is not true. The following results are straightforward:


Theorem 1 If a data set is coherent, then it is piecewise separable.


A given class is formed from distinct subsets of the par- 
tition, so no pattern can belong to two classes. There- 
fore each pattern of a given class will be separable from 
every pattern in the other subsets of the partition. Con- 
sequently the data set is piecewise separable. 


Theorem 2 Given a data set that does not contain two 
identical patterns assigned to different classes, a correct 
classifier can be formulated that realizes the given parti- 
tion on this training set. 


Corollary 1 Given that the training set does not con- 
tain two or more identical patterns assigned to differ- 
ent classes, the given partition yields a completely correct 
classification of the patterns. 


The avoidance of juxtaposition, i.e. of two identical patterns belonging to different classes, entails that the Bayes error is zero [2].

In general this does not mean that in any given 
neighborhood of a pattern there cannot be other pat- 
terns of other classes, but only that they cannot lie on 
the same point. Thus the probability distribution of the 
patterns with respect to the classes may overlap, if such 
distributions exist, although they will exhibit discon- 
tinuities in the overlap region, so that juxtaposition is 
avoided. 
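The hypothesis of Theorem 2 and Corollary 1 can be checked directly on a training set. The sketch below reports juxtaposition, i.e. two identical patterns carrying different class labels. It assumes "identical" means exact equality of feature vectors, so nearby but distinct patterns of different classes, which the discussion above allows, are not flagged.

```python
from typing import Dict, Hashable, Sequence, Tuple


def has_juxtaposition(patterns: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> bool:
    """True if two identical feature vectors are assigned to different classes."""
    first_label: Dict[Tuple[float, ...], Hashable] = {}
    for x, label in zip(patterns, labels):
        # Remember the first label seen for this exact point; compare later ones to it.
        previous = first_label.setdefault(tuple(x), label)
        if previous != label:
            return True
    return False


patterns = [(0.0, 1.0), (1.0, 1.0), (0.0, 1.0)]
print(has_juxtaposition(patterns, ["a", "b", "a"]))  # False
print(has_juxtaposition(patterns, ["a", "b", "b"]))  # True
```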


Formulation 


The classification algorithm to be formulated may be 
specified as a combinatorial problem in binary vari- 
ables [6]. 

Suppose that a training set is available with $n$ patterns, represented by appropriate feature vectors $x_i \in \mathbb{R}^p$, $i = 1, 2, \ldots, n$, and grouped in $c$ classes. An upper bound $m$ is selected for the number of barycentres per class that may result from the classification. It can be taken “ad abundantiam”, or on the basis of a preliminary run of some classification algorithm.
The initial barycenter matrix will be a $p \times mc$ matrix which is set to zero. The barycentres, when calculated, will be written in the matrix by class. Thus a barycenter of class $k$ will occupy a column of the matrix between $m(k - 1) + 1$ and $mk$.

Since we are considering a training set, the feature vectors can be ordered by increasing class label. Thus the first $n_1$ columns of the training set matrix consist of patterns of class 1, those from $n_1 + 1$ to $n_2$ of class 2, and in general those from $n_{k-1} + 1$ to $n_k$ of class $k$.

Thus consider the following inequality-constrained optimization problem, from which we shall derive the non-linear complementarity specification. Let the following hold:

• $x_i \in \mathbb{R}^p$: the $p$-dimensional pattern vector of pattern $i$;
• $c$ classes are considered, $k = 0, 1, \ldots, c - 1$. Let the number of patterns in class $c_k$ be indicated by $n_k$; then the $n$ patterns can be subdivided by class so that $n = \sum_{k=0}^{c-1} n_k$;
• $z_j \in \{0, 1\}$, integer, $j = 1, 2, \ldots, mc$: $z_j = 1$ indicates that the barycenter vector $j \in \{mk + 1, \ldots, m(k + 1)\}$, belonging to recognition class $c_k \in \{0, \ldots, c - 1\}$, exists;
• $y_{ij} \in \{0, 1\}$, integer: $y_{ij} = 1$ if pattern $i$ has been assigned to barycenter $j$;
• $t_j \in \mathbb{R}^p$: the element-by-element sum of the vectors of the patterns assigned to barycenter $j = 1, 2, \ldots, mc$;
• $M$ is a large scalar.

$$\min Z = \sum_{j=1}^{mc} z_j \qquad (1)$$

$$\sum_{j=km+1}^{m(k+1)} y_{ij} - 1 \ge 0, \quad \forall k = 0, 1, \ldots, c - 1;\ \forall i = n_{k-1} + 1, \ldots, n_k \qquad (2)$$

$$-\sum_{i=1}^{n} \sum_{j=1}^{mc} y_{ij} + n \ge 0 \qquad (3)$$

$$M z_j - \sum_{i=1}^{n} y_{ij} \ge 0, \quad \forall j = 1, 2, \ldots, mc \qquad (4)$$

$$t_j - \sum_{i=1}^{n} x_i y_{ij} \ge 0, \quad \forall j = 1, 2, \ldots, mc \qquad (5)$$

$$-\sum_{j=1}^{mc} \left( t_j - \sum_{i=1}^{n} x_i y_{ij} \right) \ge 0 \qquad (6)$$

$$\left(x_i - \frac{t_h}{\sum_{s=1}^{n} y_{sh}}\right)^T \left(x_i - \frac{t_h}{\sum_{s=1}^{n} y_{sh}}\right) - \left(x_i - \frac{t_j}{\sum_{r=1}^{n} y_{rj}}\right)^T \left(x_i - \frac{t_j}{\sum_{r=1}^{n} y_{rj}}\right) y_{ij} \ge 0, \quad \forall i = 1, \ldots, n;\ j, h = 1, \ldots, mc;\ k, l = 0, 1, \ldots, c - 1 \qquad (7)$$

$$z_j,\ y_{ij} \in \{0, 1\} \text{ integer} \qquad (8)$$

The solution of this optimization problem assigns each pattern to a mean vector, called a barycenter ($z_j$, $j = 1, 2, \ldots, mc$). The values of the barycentres are given by the vectors $t_j \in \mathbb{R}^p$, $j = 1, 2, \ldots, mc$, divided by the number of patterns assigned to each barycenter. The solution determines the least number of barycentres, as measured by the objective function Eq. (1), that satisfies the stated constraints.

The $n$ constraints Eqs. (2) and (3) state that each feature vector from a pattern in a given class must be assigned to some barycenter vector of that class. As patterns and barycentres have been ordered by class, the summations should be run over the appropriate index sets.

The $mc$ constraints Eq. (4) impose that no pattern be assigned to a non-existing barycenter.

Constraints Eqs. (5) and (6), instead, determine the vector of the total sum, element by element, assigned to a barycenter. Notice that $x_i$ is a vector, so the number of inequalities will be $2mc$ times the number of elements in the feature vector.

The last set of inequalities Eq. (7) indicates that each feature vector must be nearer to the assigned barycenter of its own class than to any other barycenter. Should the barycenter be null, this is immediately verified, while if it is non-zero, this must be imposed.

Finally, Eq. (8) indicates that the vectors $z \in \mathbb{R}^{mc}$ and $y \in \mathbb{R}^{nmc}$ are binary.
The solution will determine that each pattern of the
training set is nearer to a barycenter of its own class 
than to a barycenter of another class. Each barycenter 
has the class label of the patterns assigned to it, which 
will belong by construction to a single class. This de- 
fines a partition of the pattern space. 

A new pattern can be assigned to a class by deter- 
mining its distance from each barycenter formed by the 
algorithm and then assigning the pattern to the class of 
the barycenter to which it is nearest. 
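The barycentres are computed as the sum $t_j$ divided by the number of patterns assigned to $j$. The sketch below does this, checks that every training pattern is nearest to a barycenter of its own class, and assigns a new pattern to the class of its nearest barycenter. The assignment of patterns to barycentres is supplied by hand here; in the method described in the text it would come from solving the complementarity problem. All names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, as in the quadratic forms of Eq. (7)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def barycentres(patterns: Sequence[Vector], assignment: Sequence[int]) -> Dict[int, Vector]:
    """t_j divided by the number of patterns assigned to barycenter j."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, j in zip(patterns, assignment):
        s = sums.setdefault(j, [0.0] * len(x))
        for k, v in enumerate(x):
            s[k] += v
        counts[j] = counts.get(j, 0) + 1
    return {j: tuple(v / counts[j] for v in s) for j, s in sums.items()}


def classify(x: Sequence[float], centres: Dict[int, Vector], centre_class: Dict[int, str]) -> str:
    """Assign x to the class of its nearest barycenter."""
    nearest = min(centres, key=lambda j: sq_dist(x, centres[j]))
    return centre_class[nearest]


def separates_training_set(patterns, labels, centres, centre_class) -> bool:
    """Every training pattern is nearest to a barycenter of its own class."""
    return all(classify(x, centres, centre_class) == y for x, y in zip(patterns, labels))


# Class "A" needs two barycentres; class "B" needs one.
patterns = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0), (2.0, 0.5)]
labels = ["A", "A", "A", "A", "B"]
assignment = [0, 0, 1, 1, 2]
centre_class = {0: "A", 1: "A", 2: "B"}
centres = barycentres(patterns, assignment)
print(centres)  # {0: (0.0, 0.5), 1: (4.0, 0.5), 2: (2.0, 0.5)}
print(separates_training_set(patterns, labels, centres, centre_class))  # True
print(classify((3.5, 0.2), centres, centre_class))  # A
```

A single barycenter for class "A" would sit at (2.0, 0.5), on top of the class "B" pattern. This is why the formulation allows up to $m$ barycentres per class.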

In general, other constraints which characterize re- 
lationships between objects of different classes can be 
easily introduced in this specification, as well as dynam- 
ical relationships regarding the attributes of the objects. 

The problem can also be formulated as a non-linear complementarity problem in binary variables. This problem will be solved by iterating over a sequence of linear complementarity problems in binary variables, using a linear programming technique with parametric variation in one scalar variable [7], which has given good results [3].

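For simplicity in the representation and analysis, write the constraints (7) as:

$$g(y, x, t) = \left(x_i - \frac{t_h}{\sum_{s=1}^{n} y_{sh}}\right)^T \left(x_i - \frac{t_h}{\sum_{s=1}^{n} y_{sh}}\right) - \left(x_i - \sum_{j=km+1}^{m(k+1)} \frac{t_j\, y_{ij}}{\sum_{r=1}^{n} y_{rj}}\right)^T \left(x_i - \sum_{j=km+1}^{m(k+1)} \frac{t_j\, y_{ij}}{\sum_{r=1}^{n} y_{rj}}\right) \ge 0 \qquad (9)$$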


The following additional notation is adopted to write the optimization problem (1)-(8) as a non-linear complementarity problem:

• $e$ is a vector of ones of appropriate dimension.
• $E \in \mathbb{R}^{n \times nmc}$ is a matrix composed of $mc$ identity matrices of dimension $n \times n$.
• $H \in \mathbb{R}^{mc \times nmc}$ is a matrix of ones.
• $\eta$ is a scalar to be assigned by dichotomous search during the iterations.

The data matrix of patterns, indicated as $X$, of dimension $(p \times m \times c) \times (n \times m \times c)$, is written in block-diagonal form with blocks of dimension $p \times n$ containing the original data matrix. This block is repeated $mc$ times, with the first element of the $j$th block placed at position $((j - 1)p + 1, (j - 1)n + 1)$, $j = 1, 2, \ldots, mc$.

In fact, the size of the matrices $E$, $H$ and $X$ can be greatly reduced in applications. This is possible because the patterns in the training set are ordered conformably with the barycenter vector $t = \{t_j\} \in \mathbb{R}^{pmc}$ and each class is of known cardinality.

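The non-linear complementarity problem can therefore be written as:

$$\begin{pmatrix} -z + e \\ -y + e \\ 0 \\ Ey - e \\ -e^T y + n \\ Mz - Hy \\ t - Xy \\ -e^T (t - Xy) \\ g(y, x, t) \\ -e^T z + \eta \end{pmatrix} \ge 0 \qquad (10)$$

$$\left(z^T, y^T, t^T, \lambda_1^T, \lambda_2^T, \lambda_3^T, \lambda_4^T, \lambda_5^T, \lambda_6^T, \lambda_7^T\right)^T \ge 0 \qquad (11)$$

$$\left(z^T, y^T, t^T, \lambda_1^T, \ldots, \lambda_7^T\right) \begin{pmatrix} -z + e \\ -y + e \\ 0 \\ Ey - e \\ -e^T y + n \\ Mz - Hy \\ t - Xy \\ -e^T (t - Xy) \\ g(y, x, t) \\ -e^T z + \eta \end{pmatrix} = 0 \qquad (12)$$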
Binary values of the $z$, $y$ variables are imposed by the constraints Eq. (10) and the complementarity condition Eq. (12). For example, $z \ge 0$, $e - z \ge 0$ and $z^T(e - z) = 0$ together force each $z_j \in \{0, 1\}$, and similarly for $y$.

Finally, by recursing on the parameter $\eta$, fewer and fewer barycentres will be created as long as the problem remains feasible, thus ensuring a minimal solution.
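The dichotomous search on $\eta$ can be sketched as a bisection over integers. This sketch assumes, as the text suggests, that $\eta$ acts as an upper bound on the number of barycentres. It also assumes that feasibility is monotone (if a bound is feasible, every larger bound is too). The feasibility oracle is a hypothetical stand-in for solving the complementarity problem with a given $\eta$.

```python
from typing import Callable, List


def smallest_feasible_eta(feasible: Callable[[int], bool], low: int, high: int) -> int:
    """Least integer eta in [low, high] with feasible(eta) True, found by bisection.

    Assumes feasibility is monotone in eta and that feasible(high) holds.
    """
    if not feasible(high):
        raise ValueError("problem infeasible even with the largest bound")
    while low < high:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid       # a solution with at most mid barycentres exists
        else:
            low = mid + 1    # more barycentres are needed
    return low


# Hypothetical oracle: pretend at least 7 barycentres are required.
calls: List[int] = []


def toy_oracle(eta: int) -> bool:
    calls.append(eta)
    return eta >= 7


print(smallest_feasible_eta(toy_oracle, 1, 20), calls)  # 7 [20, 10, 5, 8, 7, 6]
```

Bisection needs only about $\log_2$ of the initial range in oracle calls. This matters when each call is itself a complementarity solve.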


Methods and Applications 


The aim of this section is to describe the method used to solve the non-linear complementarity problem specified in the previous section. The convergence of the method for the non-linear complementarity problem Eqs. (10)-(12) has been established elsewhere [1].

In a small enough neighborhood, the approxima- 
tion of the non-linear complementarity problem by 
a linear complementarity problem will be sufficiently 
accurate so that, instead of solving the original system, 
a linear complementarity system approximation can be 
solved, which may be thus represented: 


$$
w =
\begin{bmatrix}
-I & 0 & 0 & 0 \\
0 & -I & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & E & 0 & 0 \\
0 & -e^{\top} & 0 & 0 \\
MI & -H & 0 & 0 \\
0 & -X & I & 0 \\
0 & e^{\top}X & -e^{\top} & 0 \\
0 & \nabla_y g(\hat y, x, \hat t) & \nabla_t g(\hat y, x, \hat t) & 0 \\
0 & 0 & D & 0 \\
-e^{\top} & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} z \\ y \\ t \\ \lambda \end{bmatrix}
+
\begin{bmatrix}
e \\ e \\ 0 \\ -e \\ n \\ 0 \\ 0 \\ 0 \\
g(\hat y, x, \hat t) - \nabla_y g(\hat y, x, \hat t)\,\hat y - \nabla_t g(\hat y, x, \hat t)\,\hat t \\
-d \\ \eta
\end{bmatrix}
\ge 0 \qquad (13)
$$

$$
(z,\ y,\ t,\ \lambda) \ge 0, \qquad \lambda = (\lambda_1, \dots, \lambda_8) \qquad (14)
$$

$$
\left(z^{\top}, y^{\top}, t^{\top}, \lambda^{\top}\right) w = 0 \qquad (15)
$$

where $(\hat y, \hat t)$ denotes the current iterate and the last block column, which multiplies $\lambda$, is zero.
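To make concrete what solving a linear complementarity problem of the form (13)-(15) involves, the sketch below solves a small generic LCP, w = m q + r >= 0, q >= 0, q.w = 0, by enumerating complementary index sets. This brute-force method is an assumption chosen only for illustration: it is exponential in the dimension and is not the solution procedure of CASTOR described in [1]. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_linear(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination.

    Returns the solution as a list of Fractions, or None if `a` is singular.
    """
    k = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def solve_lcp_by_enumeration(m, r):
    """Find q >= 0 with w = m q + r >= 0 and q . w = 0, or return None.

    For every index set `basis`, force w_i = 0 for i in basis and q_i = 0
    otherwise, solve the resulting square system, and return the first
    candidate whose q and w are both nonnegative.
    """
    size = len(r)
    for k in range(size + 1):
        for basis in combinations(range(size), k):
            q = [Fraction(0)] * size
            if basis:
                sub_m = [[m[i][j] for j in basis] for i in basis]
                sol = solve_linear(sub_m, [-r[i] for i in basis])
                if sol is None:
                    continue
                for idx, val in zip(basis, sol):
                    q[idx] = val
            w = [sum(Fraction(m[i][j]) * q[j] for j in range(size)) + r[i]
                 for i in range(size)]
            if all(v >= 0 for v in q) and all(v >= 0 for v in w):
                return q, w
    return None


print(solve_lcp_by_enumeration([[2, 1], [1, 2]], [-5, -6]))
```

For m = [[2, 1], [1, 2]] and r = (-5, -6), the sketch returns q = (4/3, 7/3) and w = (0, 0). Exact rational arithmetic is used so that the complementarity test w_i = 0 is not disturbed by rounding.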


The problem (10)-(12) is then solved by expanding the vector function $g(y, x, t)$ in a Taylor series around the current iteration point and solving the resulting linear complementarity approximation (13)-(15) of the given non-linear complementarity problem within a suitable trust region. It is easy to show the following:


Theorem 3 The following are equivalent: 
1. The non-linear optimization problem defined by (1)- 
(8) has a solution; 
2. The non-linear complementarity problem defined by (10)-(12) has a solution;

3. The linear complementarity problem defined by (13)- 
(15) has a solution. 


Thus the computational specification of this algorithm 
is: 


Algorithm 1 (CASTOR)
Begin;
• Given: a training set $A \in \mathbb{R}^{p \times n}$ with $n$ patterns, each with $p$ elements, belonging to $c$ classes;
• Construct: the matrices $E \in \mathbb{R}^{n \times nmc}$, $H \in \mathbb{R}^{mc \times nmc}$, $X \in \mathbb{R}^{(pmc) \times (nmc)}$, $D \in \mathbb{R}^{pmc \times pmc}$;
• Set $y^0$, $d^0$, $\eta^0$;
For $k = 1, 2, \dots$;
• while $(z^{k+1}, y^{k+1}, t^{k+1})$ is a solution to the LCP (13)-(15) Do;
  • Begin: recursion on $g(x, y, t)$
    - while $(z^{k+1}, y^{k+1}, t^{k+1}) \ne (z^k, y^k, t^k)$ Do;
      - $(z^k, y^k, t^k) \leftarrow (z^{k+1}, y^{k+1}, t^{k+1})$
      - Determine $\nabla g_k(x^k, y^k, t^k)$
      * Begin: dichotomous search on $\eta^k$
          $(z^{k+1}, y^{k+1}, t^{k+1}) \leftarrow \mathrm{LCP}(z^k, y^k, t^k)$
      * end;
  • end;
the solution is $(z^*, y^*, t^*)$
end;
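The inner step "dichotomous search on η^k" can be read as a bisection over an integer parameter η that limits the number of barycentres. The sketch below rests on two assumptions of ours, not the article's: feasibility is monotone in η (if the LCP is solvable for some η, it is solvable for every larger η), and a hypothetical oracle `is_feasible(eta)` wraps one LCP solve. It returns the smallest feasible η in a given range.

```python
from typing import Callable, Optional


def smallest_feasible_eta(lo: int, hi: int,
                          is_feasible: Callable[[int], bool]) -> Optional[int]:
    """Bisection (dichotomous search) for the smallest integer eta in [lo, hi]
    with is_feasible(eta) True, assuming feasibility is monotone in eta.

    Returns None if even eta = hi is infeasible.
    """
    if not is_feasible(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(mid):
            hi = mid          # mid works: the answer is at most mid
        else:
            lo = mid + 1      # mid fails: the answer is larger than mid
    return lo


# Hypothetical oracle: pretend the LCP becomes solvable once 7 barycentres are allowed.
print(smallest_feasible_eta(1, 100, lambda eta: eta >= 7))  # 7
```

Each oracle call costs one LCP solve, so bisection needs only about log2(hi - lo) solves instead of one per candidate value.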


The termination of the classification algorithm may 
now be proved under a consistency condition. 


Theorem 4 Given a set which does not contain two 
identical patterns assigned to different classes, a correct 
classifier will be determined by Algorithm 1. 
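The consistency condition of Theorem 4 can be checked before the algorithm is run. A minimal sketch, assuming patterns are numeric vectors compared for exact equality (no tolerance); the function name is hypothetical.

```python
def find_conflicting_patterns(patterns, labels):
    """Return the patterns that occur with two or more different class labels.

    An empty result means the training set satisfies the condition of
    Theorem 4: no identical patterns are assigned to different classes.
    """
    seen = {}
    for x, y in zip(patterns, labels):
        seen.setdefault(tuple(x), set()).add(y)
    return [list(x) for x, ys in seen.items() if len(ys) > 1]


training = [[1.0, 2.0], [1.0, 2.0], [3.0, 0.5]]
classes = ["a", "b", "a"]
print(find_conflicting_patterns(training, classes))  # [[1.0, 2.0]]
```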


Models 


Suppose that a training set is available, defined over a suitable representation space, which is piecewise separable and coherent. What properties should such a training set have in order to determine a precise classification of a data set whose classes are as yet unknown?

The algorithm CASTOR (Complementarity Algorithm System for TOtal Recognition) described in the sections “Formulation” and “Methods and Applications” determines a classification rule that applies to the data set exactly the partition found for the training set, so that each entity in the data set is assigned a class. If the training set forms a random sample and the data set which includes the training set is coherent, then this classification can be performed to any desired degree of accuracy by extending the size of the training sample. Sufficient conditions for these properties to hold are obtained by selecting the data set and the verification set through non-repetitive random sampling.


Theorem 5 Suppose that the data set is coherent; then 
the data set can be classified correctly. 


To avoid having to introduce distributional properties 
on the data set considered, the empirical risk minimiza- 
tion inductive principle may be applied [12]: 


Definition 5 A data set is stable, according to Definition 3, with respect to a partition and a population of entities if the relative frequency of misclassification is $R_{\mathrm{emp}}(\alpha^*) = 0$ and

$$\lim_{n \to \infty} \Pr\{R_{\mathrm{emp}}(\alpha^*) > \epsilon\} = 0, \qquad (16)$$

where $\alpha^*$ is the classification procedure applied, $\epsilon > 0$ is a given, arbitrarily small value, and $\Pr\{\cdot\}$ is the probability of the event included in the braces.
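The relative frequency of misclassification in Definition 5 is the fraction of misclassified entities in a sample. A minimal sketch, assuming predicted and actual class labels are given as two equal-length sequences; the function name is hypothetical.

```python
def empirical_risk(predicted, actual):
    """Relative frequency of misclassification: the fraction of entities
    whose predicted class differs from their actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("empty sample")
    errors = sum(1 for p, a in zip(predicted, actual) if p != a)
    return errors / len(actual)


print(empirical_risk([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25
```

Stability in the sense of Definition 5 asks for this value to be 0 on the training set and for the probability of it exceeding any ε > 0 to vanish as the sample grows.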


In some diagnostic studies, the set of attributes considered has no significant relationship with the outcome or the classification of the entity. For example, the classes could be eye color and the attributes the weight, height and sex of a person. Such a classification would be spurious, since there is no relation between eye color and body indices.

A spurious collection of entities, in which there are no similarity relations, may occur and should be recognized. With this algorithm, this occurrence is easily detected, as very many barycentres are formed, almost one per object. Such spuriousness may arise even in the presence of some meaningful relationships in the data, which are, however, swamped by noise, and so data reduction techniques may be useful [5].

In general, by considering smaller and smaller sub- 
sets of the attribute space X, if there exists a relation- 
ship between the attributes and the classes of the en- 
tities, for certain of these subsets the frequency of the 
entities of a given class will increase to the upper limit 
of one, while in other subsets it will decrease to a lower 
limit of zero. Thus for a very fine subdivision of the at- 
tribute space, each subset will tend to include entities 
only of a given class. 
Definition 6 A proper subset $S_k$ of the attribute space $X$ of the data set will give rise to a spurious classification if the conditional probability of a pattern belonging to a given class $c$ is equal to its unconditional probability over the attribute space. The data set is spurious if this holds for all subsets of the attribute space $X$.

$$\Pr\{y_i = c \mid (i, x_i) \in S_k\} = \Pr\{y_i = c \mid (i, x_i) \in X\} \qquad (17)$$
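Condition (17) compares a conditional with an unconditional class probability. On a finite sample both can be estimated by relative frequencies; the sketch below does this for one subset S_k, assuming membership in S_k is given as a boolean flag per entity. The function names are hypothetical, and a small gap on a finite sample is only evidence of, not proof of, spuriousness.

```python
from collections import Counter


def class_frequencies(labels):
    """Relative frequency of each class label in a sequence."""
    counts = Counter(labels)
    total = len(labels)
    return {c: k / total for c, k in counts.items()}


def max_frequency_gap(labels, in_subset):
    """Largest |freq(class c within S_k) - freq(class c overall)| over classes c.

    `in_subset` holds one boolean per entity saying whether its pattern lies in S_k.
    A gap near zero is consistent with S_k giving rise to a spurious
    classification in the sense of (17); a large gap indicates that S_k is
    informative about the class.
    """
    overall = class_frequencies(labels)
    sub = [y for y, flag in zip(labels, in_subset) if flag]
    if not sub:
        raise ValueError("subset is empty")
    conditional = class_frequencies(sub)
    return max(abs(conditional.get(c, 0.0) - overall[c]) for c in overall)


labels = [1, 1, 0, 0]
print(max_frequency_gap(labels, [True, False, True, False]))  # 0.0
print(max_frequency_gap(labels, [True, True, False, False]))  # 0.5
```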


The following results can now be presented, which are 
proved elsewhere [1]. 


Theorem 6 Consider a training set of $n$ randomly selected patterns assigned to two classes, where the unconditional probability of belonging to class one is $p$. Let $a$ be a suitably large number and let $n > a$. Let the training set form $b_n$ barycentres. Then, under CASTOR, this training set will provide a spurious classification if

$$\frac{b_n}{n} \ge 2(1 - p), \qquad n > a. \qquad (18)$$


Theorem 7 Let the probability of a pattern belonging to class one be $p$. Then the number of barycentres $b_s$ formed by the CASTOR algorithm to partition correctly a subset $S$ that contains $n_s > a$ patterns and is not spurious satisfies $b_s < n_s$, $\forall n_s > a$.


Corollary 2 ([12]) The Vapnik-Cervonenkis dimension (VC dimension) $s(C, n)$ for the class of sets defined by the CASTOR algorithm, restricted to the classification of a non-spurious, piecewise separable data set with $n_s$ elements and two classes, is less than $2^{n_s}$ if $n_s > a$.


Theorem 8 ([2]) Let $C$ be a class of decision functions and $\psi_n^*$ a classifier, restricted to the classification of a data set which is not spurious, that returns an empirical error equal to zero on the training sample $(z_1, z_2, \dots, z_n)$. Suppose that $\inf_{\psi \in C} L(\psi) = 0$, i.e., the Bayes decision is contained in $C$. Then

$$\Pr\{L(\psi_n^*) > \epsilon\} \le 2\, s(C, 2n)\, 2^{-n\epsilon/2}. \qquad (19)$$
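Assuming the right-hand side of bound (19) reads 2 s(C, 2n) 2^(-nε/2), it can be evaluated once the shatter coefficient s(C, 2n) is known or bounded. The sketch below only evaluates that expression; the function name is hypothetical and s(C, 2n) must be supplied by the caller.

```python
def zero_error_bound(shatter_2n: float, n: int, eps: float) -> float:
    """Right-hand side 2 * s(C, 2n) * 2 ** (-n * eps / 2) of the bound.

    `shatter_2n` is s(C, 2n), supplied by the caller.
    Values >= 1 are vacuous as probability bounds.
    """
    return 2.0 * shatter_2n * 2.0 ** (-n * eps / 2.0)


print(zero_error_bound(shatter_2n=4, n=10, eps=0.4))  # 2.0 (vacuous)
print(zero_error_bound(shatter_2n=1, n=20, eps=1.0))  # 0.001953125
```

The exponential factor decays with n, so the bound becomes informative only when s(C, 2n) grows more slowly than 2^(nε/2); this is why bounds on the VC dimension matter for consistency.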
By calculating bounds on the VC dimension, the uni- 
versal consistency property can be established for this 
algorithm applied to the classification of a data set 
which is not spurious. 


Corollary 3 ([5]) A non-spurious classification prob- 
lem with a piecewise separable training set is strongly 
universally consistent. 


Cases 


To use the CASTOR algorithm in applications, it is necessary to determine, first, whether the data set is spurious or not for the given problem with the specific pattern vectors adopted. The way the pattern vectors are defined from the available data may strongly affect the results obtainable.

Further, the coherence of the data set must be tested 
to ensure that the patterns extracted are sufficiently rich 
to ensure the proper classification, stability and extend- 
ability of the data set (Definition 5). Then the algorithm 
can be applied, but the results will only hold if the data 
set, training set and verification set are random sam- 
ples, taken from a known or unknown population, as 
otherwise the sample may not be representative of the 
population. 

Note that with this method, if the data come from a set of unknown populations, a suitable partition of the data set will form accordingly, even though the operator may not know to which population an individual barycenter belongs. If the number of objects coming from different populations is too high relative to the size of the training set, the problem may be recognized as spurious, which only signifies that too many barycentres are formed with respect to the available training objects [8,9].

Consider a set of proteins randomly sampled from a population of proteins, and suppose that the proteins whose structure is to be determined also belong to that population but are of unknown structure. Probability limits can then be imposed on the likelihood that the structure identified is the correct one. Therefore, accurate limits on the precision of the recognition of the new protein’s structure can be specified.

Results could also be obtained by selecting “purposefully” representative proteins and subjecting these to a suitable algorithm. The results could be better on particular sets than the asymptotic mean precision measures, but generally, and almost surely, on new data the results will turn out to have a greater variance and a lower mean precision, as is well known from sampling theory [4]. Thus, to minimize the asymptotic misclassification error, a sample as large as possible, drawn randomly from the given population, should be used; this ensures that, under mild conditions, the properties derived above are satisfied.


Complementarity Algorithms in Pattern Recognition, Table 1

Q3 classification results on the Rost 126 verification set by similarity classes (15 proteins selected)

Sim. class CASTOR PHD DSC PRED MUL NNSSP Zpred CONS

0.82 0.74 | 0.73 | 0.72 | 0.68
0.96 0.75 | 0.77 | 0.64 | 0.64
0.87 | 0.83
0.83 | 0.75
1.00 | 1.00 | 1.00 | 1.00 1.00 | 1.00
0.69 | 0.61
1.00 0.84
1.00 0.81
1.00 1.00
0.98 0.67


Complementarity Algorithms in Pattern Recognition, Table 2

Q3 estimate of the classification precision of CASTOR and other classification procedures on 56 randomly selected proteins of the Cuff 513 data set

Sim. class CASTOR PHD DSC PRED MUL NNSSP Zpred CONS

0.78
0.70
0.83
0.68
0.66
0.76
0.69
0.73
0.80
0.76
0.84
0.77
0.69
0.76
0.59 | 0.66 0.55 | 0.68

Also a distinction is often introduced regarding the 
similarity between classes of subsets of proteins [10]. 
In this case, it may be considered relevant that a cover 
be defined on the population of proteins, so as to form 
eight or more subpopulations. The samples can still be 
drawn randomly from the relevant subpopulation and 
the classifier determined for each subpopulation. To de- 
termine the structure of a new protein, first the simi- 
larity in the residue chain must be determined with re- 
spect to each subpopulation and then the classifier of 
the subpopulation with the highest similarity coefficient 
is applied. Here, the proper sampling method should 
consist of a stratified non-repetitive random sampling 
design, but this would be warranted only if there are 
significant differences in the results for the subpopula- 
tions. 

The limitations of not using stratified random data sets and of using ad hoc heuristics instead of demonstrably convergent algorithms are well brought out in Tables 1 and 2 [1], where classification results are compared between the CASTOR algorithm and seven popular alternative procedures for two well-known data sets, the Rost 126 and the Cuff 513 data sets.

Table 1 presents the Q3 classification results on the Rost 126 verification set for the various similarity classes which appeared in the sample of 15 proteins and in the random verification set. The precision of the classification results found by applying the CASTOR algorithm dominates that of all other procedures, usually by over 15%.

In Table 2, the Q3 estimates are given for the classification precision of CASTOR and the other classification procedures on 56 randomly selected proteins of the Cuff 513 data set. The precision obtained by CASTOR dominates all the other entries except four. Two of these entries occur for the CONS algorithm, for similarity classes 0 and 2, while the two other entries that dominate the results of the CASTOR algorithm occur for similarity class 2, for the procedures PRED and NNSSP.
Conclusions


The experiments described above shed some light on 
two important aspects, which are very closely related: 
the sampling procedure to adopt and the classification 
procedure to apply. Moreover, these results show the 
essential non-linearity and complexity of the pattern 
recognition problem. 

Random sampling is necessary to obtain precise and stable estimates of the prediction accuracy, with the required accuracy, since choosing special sets for verification or training invariably alters the expected prediction accuracy: a non-random sample will contain a distribution of classes different from that of the population. As the prediction precision varies with the class distribution, this has a significant effect on recognition.

Heuristics, compared to algorithms, bias the results in the same way: they will be accurate in some cases and unstable in others. Moreover, when the classification results are poor, the source of the problem usually cannot be determined with a heuristic. With an algorithm such as the one described here, the root of the problem will invariably be tied to one of the mild assumptions not being satisfied.

This can be checked and remedied.


See also 


> Generalizations of Interior Point Methods for the 
Linear Complementarity Problem 

> Generalized Eigenvalue Proximal Support Vector 
Machine Problem 
We survey a number of the basic ideas, results, and references for the complexity classes P, NP, CoNP, PSPACE, DEXPTIME, NDEXPTIME, and EXSPACE, the most important complexity classes in optimization. These ideas and results include the following:
i) the time and space complexities of both deterministic and nondeterministic multitape Turing machines as formalized in [1];

ii) the complexity classes above including the known 
results on their intercontainments; and 

iii) the concepts of polynomial reducibility, F-hardness, 
and F-completeness, for the complexity classes F 
above. 

We present brief historical surveys of the results ob- 
tained in ii) and iii). We also briefly survey many of the 
results on the above complexity classes in the basic ref- 
erences [1,12,16,25,29,30,37], emphasizing results espe- 
cially relevant to the area of optimization. 

The following is some of the basic notation and terminology used here.


Definition 1 A finite alphabet $\Sigma$ is a finite nonempty set of characters. A set of strings over some finite alphabet is said to be a language.


Further, we denote ‘infinitely often’ by ‘i.o.’.


Definition 2 By an exponential function in $n$ we mean a function $f(n) = 2^{c n^r}$, where $c, r > 0$ are constants independent of $n$.


The languages or problems 3-SATISFIABILITY (3- 
SAT), 3-DIMENSIONAL MATCHING, VERTEX 
COVER, CLIQUE, HAMILTON CIRCUIT, and PAR- 
TITION are defined as in [12]. Thus, for example, the 
language 3-SAT is defined to be the set of all satisfi- 
able CNF formulas with no more than 3 literals per 
clause, when suitably encoded as a language over some 
finite alphabet. We note that this language, its quan- 
tified variants, and its succinctly-specified variants are 
the languages in the literature most widely used to 
prove NP-, PSPACE-, DEXPTIME-, and NDEXPTIME- 
hardness results (Definition 8 and [12,21,22,29,30]). 
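As an illustration of the language 3-SAT, the sketch below decides membership for a small formula by trying every truth assignment. The clause encoding (DIMACS-style signed integers) is an assumption made for the example, and the function names are hypothetical. The exhaustive search takes time exponential in the number of variables; by contrast, checking one proposed assignment takes polynomial time, which is why 3-SAT lies in NP.

```python
from itertools import product


def is_satisfiable(clauses, num_vars):
    """Brute-force satisfiability test for a CNF formula.

    Clauses are lists of nonzero integers: literal v means variable v is true,
    -v means it is false (variables are numbered 1..num_vars).
    Tries all 2**num_vars assignments, so it is exponential in num_vars.
    """
    for bits in product([False, True], repeat=num_vars):
        # bits[v - 1] is the truth value assigned to variable v
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def in_3sat(clauses, num_vars):
    """Membership in 3-SAT: at most 3 literals per clause, and satisfiable."""
    return all(len(c) <= 3 for c in clauses) and is_satisfiable(clauses, num_vars)


# (x1 or x2 or not x3) and (not x1) and (not x2): satisfiable with x3 false
print(in_3sat([[1, 2, -3], [-1], [-2]], 3))  # True
# x1 and (not x1): unsatisfiable
print(in_3sat([[1], [-1]], 1))               # False
```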

Finally, we denote the linear programming, {0, 1}- 
integer linear programming, integer linear program- 
ming, and quadratic programming problems as defined 
in [12,25,30,37] by LP, {0, 1}-ILP, ILP, and QP, respec- 
tively. 


Time and Space Complexity of Turing Machines 


In the literature of computational complexity, the most common models of computational devices and of the problems solvable by such devices are Turing machines (TMs) and language recognition problems, respectively [1,9,12,15,17,29]. Here, we only consider multitape deterministic and nondeterministic Turing machines (denoted DTMs and NDTMs, respectively) and their associated language recognition problems as described in [1]. Informally, such a Turing machine M consists of the following:

1) a finite state control together with a finite nonempty 
set Q of possible states of the control; 

2) finite nonempty tape and input alphabets $T$ and $I$, respectively, and distinct symbols $b$ and $\vdash$, denoting ‘blank’ and ‘leftmost cell of tape’, such that $I \subset T$ and $b, \vdash \in T - I$;

3) a finite number k > 1 of tapes, each of which is infi- 
nite to the right only and is divided into individual 
tape cells such that each cell can contain exactly one 
symbol in T at any one time; 

4) k tape heads, one for each tape, each head capable of 
scanning a single cell at any one time; 

5) a start state $q^0 \in Q$ and a set of accepting states $F \subseteq Q$; and

6) a finite set $\mu$ of moves, each of the form

$$(q, s_1, \dots, s_k, q', (t_1, d_1), \dots, (t_k, d_k)) \in Q \times T^k \times Q \times (T \times \{L, R, S\})^k.$$


M is said to be deterministic if, for each $(k+1)$-tuple $(q, s_1, \dots, s_k) \in Q \times T^k$, there is at most one move in $\mu$ whose initial $k + 1$ components are $q, s_1, \dots, s_k$, respectively. Otherwise, M is said to be nondeterministic. A state $q \in Q$ is said to be final if there is no move in $\mu$ whose first component equals $q$. We assume that the following two restrictions hold on $F$ and $\mu$:

7) Each accepting state is final. 

8) There are no moves

$$(q, s_1, \dots, s_k, q', (t_1, d_1), \dots, (t_k, d_k))$$

in $\mu$ such that, for some $1 \le i \le k$, $s_i = \vdash$ and $t_i \ne \vdash$, or $s_i \ne \vdash$ and $t_i = \vdash$, or $s_i = \vdash$ and $d_i = L$.

Let $w \in I^*$. A partial computation of M on $w$ is a finite sequence $\sigma_w = (m_1, \dots, m_l)$ with $l \ge 0$ of moves of M such that the following hold:

1) Initially, M is in its start state $q^0$; the first tape of M holds the string $\vdash w$, one symbol per cell starting at its leftmost cell; the leftmost cell of each of the other tapes of M contains $\vdash$; all other cells of M contain $b$; and each of the tape heads of M scans the leftmost cell of its corresponding tape.
2) When initialized as in 1), M executes the moves $m_1, \dots, m_l$ consecutively and in that order. The length of the partial computation $\sigma_w = (m_1, \dots, m_l)$ of M on $w$ equals $l$. A partial computation $\sigma_w = (m_1, \dots, m_l)$ of M on $w$ is said to be an accepting computation (respectively, a nonaccepting computation) of M on $w$ if, after executing the sequence of moves $\sigma_w$, M is in an accepting state (respectively, M is in a final state that is not an accepting state).

The restrictions 7) and 8) on the sets $F$ and $\mu$ ensure that no accepting or nonaccepting computation of M on $w$ is an initial subsequence of any other partial computation of M on $w$, and that at no point during a partial computation of M on $w$ does one of the tape heads of M attempt to move off the left end of its corresponding tape.

The language accepted by a Turing machine M, denoted by $L(M)$, is the set of all strings $w \in I^*$ such that there exists an accepting computation of M on $w$. The language recognition problem of M is the problem of verifying, given $w \in I^* \cap L(M)$, that $w$ is, in fact, an element of $L(M)$. The time and space complexities of M are defined in terms of partial computations of M on strings $w \in I^*$ as follows:


Definition 3 Let M be a deterministic Turing machine. The time complexity of M, denoted by $T_M(\cdot)$, is the function from $\mathbb{N}$ to $\mathbb{N} \cup \{\infty\}$ defined, for all $n \in \mathbb{N}$, by

• $T_M(n) = \max\{l \in \mathbb{N} : l$ is the length of a partial computation of M on $w$, where $w \in I^n\}$, if this maximum exists;

• $T_M(n) = \infty$ otherwise.

The space complexity of M, denoted by $S_M(\cdot)$, is the function from $\mathbb{N}$ to $\mathbb{N} \cup \{\infty\}$ defined, for all $n \in \mathbb{N}$, by

• $S_M(n) = \max\{s \in \mathbb{N} : s$ is the maximum number of tape cells scanned on any of the tapes of M during a partial computation of M on $w$, where $w \in I^n\}$, if this maximum exists;

• $S_M(n) = \infty$ otherwise.

Let M be a nondeterministic Turing machine. The time complexity of M, denoted by $T_M(\cdot)$, is the function from $\mathbb{N}$ to $\mathbb{N} \cup \{\infty\}$ defined, for all $n \in \mathbb{N}$, by

• $T_M(n) = 0$ if no string $w \in I^n$ is in $L(M)$;

• $T_M(n) = \max\{l \in \mathbb{N} : l$ is the minimum length of an accepting computation of M on $w$, where $w \in I^n \cap L(M)\}$;

• $T_M(n) = \infty$ otherwise.

The space complexity of M, denoted by $S_M(\cdot)$, is the function from $\mathbb{N}$ to $\mathbb{N} \cup \{\infty\}$ defined, for all $n \in \mathbb{N}$, by

• $S_M(n) = 0$ if no string $w \in I^n$ is in $L(M)$;

• $S_M(n) = \max\{m \in \mathbb{N} : m$ equals the minimum of the maximum numbers of tape cells scanned on any of the tapes of M during an accepting computation of M on $w$, where $w \in I^n \cap L(M)\}$;

• $S_M(n) = \infty$ otherwise.
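Definition 3 can be made concrete for a small deterministic machine. The sketch below makes simplifying assumptions of its own: a single tape (k = 1), moves stored in a dictionary (which makes the machine deterministic by construction), the strings "_" and "|-" standing for b and ⊢, and T_M(n), S_M(n) computed by enumerating all inputs of length n. The function and machine names are hypothetical. For a deterministic machine that halts on w, the longest partial computation on w is the complete computation, so its length is the maximum in Definition 3.

```python
from itertools import product

BLANK, LEFT_END = "_", "|-"   # stand-ins for the symbols b and ⊢


def run_dtm(moves, start, w, max_steps=10_000):
    """Run a single-tape DTM on input w from its initial configuration.

    `moves` maps (state, scanned_symbol) -> (new_state, written_symbol, direction),
    with direction in {"L", "R", "S"}; a state with no applicable move is final.
    Returns (number_of_moves, number_of_cells_scanned, final_state), or None if
    the machine has not stopped after max_steps moves.
    """
    tape = [LEFT_END] + list(w)          # cell 0 holds the left-end marker
    head, state, steps = 0, start, 0
    scanned = {0}
    while (state, tape[head]) in moves:
        if steps == max_steps:
            return None
        new_state, written, direction = moves[(state, tape[head])]
        tape[head] = written
        state = new_state
        head += {"L": -1, "R": 1, "S": 0}[direction]
        assert head >= 0, "move off the left end (excluded by restriction 8)"
        if head == len(tape):
            tape.append(BLANK)           # the tape is infinite to the right
        scanned.add(head)
        steps += 1
    return steps, len(scanned), state


def time_and_space(moves, start, alphabet, n, max_steps=10_000):
    """T_M(n) and S_M(n): worst case over all inputs w in I^n.

    This sketch raises an error instead of returning infinity for runs that
    do not halt within max_steps.
    """
    worst_time = worst_space = 0
    for letters in product(alphabet, repeat=n):
        result = run_dtm(moves, start, "".join(letters), max_steps)
        if result is None:
            raise RuntimeError("no halt within max_steps; T_M(n) may be infinite")
        steps, cells, _ = result
        worst_time, worst_space = max(worst_time, steps), max(worst_space, cells)
    return worst_time, worst_space


# Example machine: step off the marker, scan right over the input, stop on a blank.
scan = {
    ("q0", LEFT_END): ("q1", LEFT_END, "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("qa", BLANK, "S"),
}
for n in range(4):
    print(n, time_and_space(scan, "q0", "01", n))  # (n + 2, n + 2) for each n
```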


There is a fundamental difference between the definitions of time and space complexity for DTMs and for NDTMs. For a DTM M, the functions $T_M$ and $S_M$ are defined in terms of the numbers of moves executed and tape cells scanned in an arbitrary partial computation of M on strings $w \in I^*$. In contrast, for an NDTM M, these functions are defined only in terms of the numbers of moves executed and tape cells scanned in minimum-cost accepting computations of M on strings $w \in I^* \cap L(M)$. The following are two easy implications of this difference:


Proposition 4 If L is accepted by a DTM M with time and space complexities $T_M$ and $S_M$, then L is also accepted by an NDTM $M'$ with time and space complexities $T_{M'}$ and $S_{M'}$ such that, for all $n \in \mathbb{N}$,

$$T_{M'}(n) \le T_M(n) \quad\text{and}\quad S_{M'}(n) \le S_M(n).$$


Proposition 5 If L is accepted by a DTM M with input alphabet $I$ such that, for all $n \in \mathbb{N}$, $T_M(n) \in \mathbb{N}$ (equivalently, $T_M(n) \ne \infty$), then the language $I^* - L$ is accepted by a DTM $M'$ such that, for all $n \in \mathbb{N}$, $T_{M'}(n) = T_M(n)$.


Consequently, deterministic time complexity classes as defined in Definition 6 are closed under complementation. At present (1999), nondeterministic time complexity classes are not known to be closed under complementation.


Definition of the Complexity Classes 


In Definition 6 we use Definition 3 to define the time and space complexity classes most relevant to optimization. Next, in Theorem 7, we give several basic properties of these classes, whose proofs do not require the concept of polynomial time reducibility defined in Definition 8.
Definition 6 Let $T, S : \mathbb{N} \to \mathbb{N}$.

A Turing machine M is said to be polynomially time-bounded, respectively polynomially space-bounded, if the function $T_M(n)$, respectively $S_M(n)$, is bounded above by a polynomial function in $n$. M is said to be exponentially time-bounded, respectively exponentially space-bounded, if the function $T_M(n)$, respectively $S_M(n)$, is bounded above by an exponential function in $n$.

• DTIME($T(n)$), respectively NDTIME($T(n)$), is the class of all languages L for which there exist a DTM, respectively an NDTM, M and a $c > 0$ such that

$$L = L(M) \quad\text{and, for all } n \in \mathbb{N}, \quad T_M(n) \le c \cdot T(n).$$

• DSPACE($S(n)$), respectively NDSPACE($S(n)$), is the class of all languages L for which there exist a DTM, respectively an NDTM, M and a $c > 0$ such that

$$L = L(M) \quad\text{and, for all } n \in \mathbb{N}, \quad S_M(n) \le c \cdot S(n).$$

• P, NP, PSPACE, DEXPTIME, NDEXPTIME, and EXSPACE are the classes of all languages L such that L is the language accepted by a polynomially time-bounded DTM, a polynomially time-bounded NDTM, a polynomially space-bounded TM, an exponentially time-bounded DTM, an exponentially time-bounded NDTM, and an exponentially space-bounded TM, respectively.

• CoNP, respectively CoNDEXPTIME, is the class of all languages L for which there exists an NDTM M with input alphabet $I$ such that $L = I^* - L(M)$ and M is polynomially time-bounded, respectively exponentially time-bounded.


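Theorem 7

1) The following containments hold among the complexity classes defined in Definition 6:
a) $\mathrm{P} \subseteq \mathrm{NP} \cap \mathrm{CoNP}$.
b) $\mathrm{NP}, \mathrm{CoNP} \subseteq \mathrm{PSPACE} \subseteq \mathrm{DEXPTIME}$.
c) $\mathrm{DEXPTIME} \subseteq \mathrm{NDEXPTIME} \cap \mathrm{CoNDEXPTIME}$.
d) $\mathrm{NDEXPTIME}, \mathrm{CoNDEXPTIME} \subseteq \mathrm{EXSPACE}$.
2)
a) P = NP if and only if $\mathrm{NDTIME}(n) \subseteq \mathrm{P}$;
b) P = PSPACE if and only if $\exists \epsilon > 0$ such that $\mathrm{DSPACE}(n^{\epsilon}) \subseteq \mathrm{P}$;
c) NP = PSPACE if and only if $\exists \epsilon > 0$ such that $\mathrm{DSPACE}(n^{\epsilon}) \subseteq \mathrm{NP}$;
d) for all integers $k \ge 1$, $\mathrm{NDTIME}(n^k) \subseteq \mathrm{DSPACE}(n^k)$.
3) PSPACE = DEXPTIME if and only if $\exists \epsilon > 0$ such that $\mathrm{DTIME}(2^{n^{\epsilon}}) \subseteq \mathrm{PSPACE}$.
4) If we restrict the classes P and NP to languages over a single letter alphabet, denoting these restrictions by $\mathrm{P}_{sla}$ and $\mathrm{NP}_{sla}$, respectively, then
a) $\mathrm{P}_{sla} = \mathrm{NP}_{sla}$ if and only if $\bigcup_{k \ge 1} \mathrm{DTIME}(2^{kn}) = \bigcup_{k \ge 1} \mathrm{NDTIME}(2^{kn})$;
b) $\mathrm{NP}_{sla}$ is closed under complementation if and only if the class $\bigcup_{k \ge 1} \mathrm{NDTIME}(2^{kn})$ is closed under complementation.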


Proof The claims in 1), 2), and 3) of the theorem follow directly from Definitions 3 and 6, the discussion after Definition 3, and simple well-known arguments involving ‘padding’, e.g., see [4,13,17]. As an example, let $L \in \mathrm{DSPACE}(n^l)$, where $l \ge 1$ is an integer. Let $k \ge 2$ be an integer. Let $L' = \{w\#^m : m = |w|^k - |w|,\ w \in L\}$, with $\#$ a symbol not occurring in L. Then $L' \in \mathrm{DSPACE}(n^{l/k})$; and $L' \in \mathrm{P}$ (respectively $L' \in \mathrm{NP}$) if and only if $L \in \mathrm{P}$ (respectively $L \in \mathrm{NP}$). The claims of 4) follow from simple arguments about ‘tally languages’, see [13].
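The padding map in the proof above can be sketched as follows, assuming the padded form w#^m with m = |w|^k - |w|, so that a padded string has length exactly |w|^k. A machine for L' strips the padding and runs the machine for L on w; since |w| = n^(1/k) for a padded string of length n, space |w|^l becomes n^(l/k). The function names are hypothetical.

```python
def pad(w: str, k: int, pad_symbol: str = "#") -> str:
    """Padding map w -> w #^m with m = |w|**k - |w|, so the result has length |w|**k.

    `pad_symbol` must not occur in w (it is assumed not to occur in L).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if pad_symbol in w:
        raise ValueError("pad symbol must not occur in w")
    return w + pad_symbol * (len(w) ** k - len(w))


def unpad(u: str, k: int, pad_symbol: str = "#"):
    """Return w if u == pad(w, k), else None (u is not a well-formed padded string)."""
    w = u.rstrip(pad_symbol)
    if pad_symbol in w:
        return None
    return w if pad(w, k, pad_symbol) == u else None


print(pad("ab", 2))         # ab##
print(unpad("ab######", 3)) # ab
```

Both maps run in time polynomial in their input length, which is what keeps membership in P or NP unchanged between L and L'.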


General Comments 


The Turing machine model is due to A.M. Turing [36]. 
Additional discussions of this model can be found in 
[1,9,17]. 

The time and space complexity of DTMs were first studied in [15] and [14], respectively.

The time-bounded complexity classes P, NP, CoNP, etc. are invariant under several other formal computer models, including multihead multitape Turing machines, Turing machines with multidimensional tapes, and both the RAM and RASP models of [35] and [11] under the logarithmic cost function (for a detailed discussion of this, see [1, Chap. 1]).
It is widely assumed that a computational problem
is ‘practically computationally tractable’ only if it can be 
solved by a deterministic polynomially time-bounded 
Turing machine [1,12,25,29,30,37]. The resulting im- 
portance of the class P was first observed by A. Cobham 
[7] and J. Edmonds [10]. 

By the well-known theorem of Savitch [33], $\mathrm{NDSPACE}(S(n)) \subseteq \mathrm{DSPACE}(S(n)^2)$ for every space-constructible $S(n) \ge \log n$; in particular, $\mathrm{NDSPACE}(\log n) \subseteq \mathrm{DSPACE}((\log n)^2)$. Hence the classes of languages accepted by polynomially space-bounded DTMs and NDTMs are equal, and the classes of languages accepted by exponentially space-bounded DTMs and NDTMs are equal. This is the reason for defining the complexity classes PSPACE and EXSPACE, rather than the classes DPSPACE, NDPSPACE, DEXSPACE, and NDEXSPACE.
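The divide-and-conquer idea behind Savitch's theorem can be sketched on an explicit directed graph standing in for the configuration graph of an NDTM; the explicit vertex list is a convenience of this illustration (a space-bounded machine would enumerate configurations instead), and the function name is hypothetical. The recursion depth is about log2 of the path-length bound and each level stores only a few vertex names. For a machine using space S(n) >= log n there are 2^O(S(n)) configurations, so each name takes O(S(n)) cells and the depth is O(S(n)), giving O(S(n)^2) space in total, at the price of a possibly exponential running time.

```python
def reachable(vertices, edges, u, v, steps):
    """Is there a directed path from u to v using at most `steps` edges?

    Savitch-style recursion: try every midpoint w and solve the two
    half-length subproblems. Only the current chain of (u, v, steps)
    triples lives on the call stack, whose depth is about log2(steps).
    """
    if steps == 0:
        return u == v
    if steps == 1:
        return u == v or (u, v) in edges
    half = steps // 2
    return any(reachable(vertices, edges, u, w, half)
               and reachable(vertices, edges, w, v, steps - half)
               for w in vertices)


V = ["a", "b", "c", "d"]
E = {("a", "b"), ("b", "c"), ("c", "d")}
print(reachable(V, E, "a", "d", len(V) - 1))  # True
print(reachable(V, E, "d", "a", len(V) - 1))  # False
```

A path bound of |V| - 1 suffices because any path can be shortened to one without repeated vertices.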


Efficient Reducibility and ‘Hard’ Problems 


Throughout this section F is a class of languages. Following [1,12,17], we define polynomial reducibility, F-hardness, and F-completeness under such reducibilities. To do this, we must extend the definition of DTMs given above so that the resulting machines have outputs, and thus can be viewed as computing partial functions of their inputs. This is accomplished by augmenting each DTM M with an additional ‘output’ tape $o_M$ such that the tape head of $o_M$ can only move one cell to the right or stay stationary during any move of M. In addition to all usual constraints on M, for all inputs $w$ of M:

1) initially, during a partial computation of M on $w$, all cells of $o_M$ are blank and the tape head of $o_M$ scans its leftmost cell;

2) the value computed by M on $w$ is the final nonblank contents of $o_M$ during the accepting computation or the nonaccepting computation of M on $w$, if such a computation exists, and is undefined otherwise.

The time complexity $T_M$ of such augmented DTMs (henceforth referred to simply as DTMs) is defined exactly as in Definition 3.


Definition 8 Let $\Sigma$ and $\Delta$ be finite alphabets.

A function $f$ from $\Sigma^*$ to $\Delta^*$ is said to be polynomial time computable if and only if there exists a polynomially time-bounded DTM M with input alphabet $\Sigma$ such that, for all $w \in \Sigma^*$, the value computed by M on input $w$ is $f(w)$.

Let $L \subseteq \Sigma^*$ and $M \subseteq \Delta^*$. L is said to be polynomially reducible to M, denoted $L \le_p M$, if and only if there is an $f : \Sigma^* \to \Delta^*$ such that $f$ is polynomial time computable and, for all $w \in \Sigma^*$: $w \in L$ if and only if $f(w) \in M$.

A language L is said to be F-hard if and only if for all languages $L' \in F$, $L' \le_p L$. L is said to be F-complete if and only if L is both F-hard and in F.
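Definition 8 can be illustrated with a classical reduction between two of the languages listed earlier, CLIQUE ≤p VERTEX COVER, obtained by complementing the graph: a set C is a clique in G exactly when V - C is a vertex cover of the complement of G. The sketch below assumes an explicit vertex-list and edge-list encoding of our choosing, and the function names are hypothetical. The transformation runs in time polynomial in the size of the graph; the brute-force checkers are exponential and are included only to test the equivalence "w ∈ L if and only if f(w) ∈ M" on tiny instances.

```python
from itertools import combinations


def clique_to_vertex_cover(vertices, edges, k):
    """Map a CLIQUE instance (G, k) to the VERTEX COVER instance (complement of G, |V| - k)."""
    vs = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    complement = [(a, b) for a, b in combinations(vs, 2)
                  if frozenset((a, b)) not in edge_set]
    return vs, complement, len(vs) - k


def has_clique(vertices, edges, k):
    """Brute-force check that some k vertices are pairwise adjacent."""
    edge_set = {frozenset(e) for e in edges}
    return any(all(frozenset(p) in edge_set for p in combinations(c, 2))
               for c in combinations(list(vertices), k))


def has_vertex_cover(vertices, edges, size):
    """Brute-force check that some `size` vertices touch every edge."""
    if size < 0:
        return False
    return any(all(a in cover or b in cover for a, b in edges)
               for cover in map(set, combinations(list(vertices), size)))


V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (1, 3), (3, 4)]   # a triangle plus a pendant edge
for k in range(6):
    print(k, has_clique(V, E, k), has_vertex_cover(*clique_to_vertex_cover(V, E, k)))
```

Each printed line shows the same truth value in both columns, as the reduction requires.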


Henceforth, let F be any of the complexity classes NP, 
CoNP, PSPACE, DEXPTIME, NDEXPTIME, CoNDEX- 
PTIME, or EXSPACE. The following two propositions 
underlie most of the work on F-hard and F-complete 
problems in the literature on algorithmic analysis and 
computational complexity. 


Proposition 9

1) Let $\Sigma$, $\Delta$, and $\Pi$ be finite alphabets. Let $L \subseteq \Sigma^*$, $M \subseteq \Delta^*$, and $N \subseteq \Pi^*$. If $L \le_p M$ and $M \le_p N$, then $L \le_p N$. (Thus, polynomial reducibility is transitive.)

2) If L and M are languages such that $L \le_p M$ and L is F-hard, then M is also F-hard.

3) P = NP if and only if some NP-complete language is in P.

4) P = PSPACE if and only if some PSPACE-complete language is in P.

5) NP = PSPACE if and only if some PSPACE-complete language is in NP.

6) NP = CoNP if and only if some CoNP-complete language is in NP, if and only if some NP-complete language is in CoNP.

7) If L is DEXPTIME-, NDEXPTIME-, or EXSPACE-hard, then the recognition of L requires more than $2^{n^{\epsilon}}$ time, $2^{n^{\epsilon}}$ time, respectively $2^{n^{\epsilon}}$ space, i.o. on any DTM, NDTM, respectively TM, where $\epsilon > 0$ is a constant independent of $n$.


Proof We sketch a proof: 1) follows from the fact that 
the polynomial time computable functions are closed 
under composition. The proofs of 2) through 6) fol- 
low directly from the correctness of 1) and Defini- 
tions 6 and 8. The proof of 7) follows directly from 
well-known ‘hierarchy’ theorems, for deterministic and 
nondeterministic time and space-bounded Turing ma- 
chines [1,12,14,15,17,34]. 


Proposition 10 There exists an F-complete language, 
for each of the complexity classes F. 


Proof It is easy to construct F-complete languages defined in terms of Turing machines and coded versions of their inputs, for each of the complexity classes F. For example, the language $L = \{\# \cdot M_i \cdot \# \cdot \mathrm{code}(x_1, \dots, x_n) \cdot \#^m : x_1 \cdots x_n$ is accepted by the one-tape NDTM $M_i$ in time $t\}$, where $m = 3 \cdot |M_i| \cdot t$, is both an element of NDTIME($n$) and NP-hard.


The first complete problems for NP and, consequently, the importance of the class NP in nonnumerical computation are due to S.A. Cook [8] and R.M. Karp [18]. Following the terminology of [12], these initial NP-complete problems include the problems

• 3-SAT,

• 3-DIMENSIONAL MATCHING,

• VERTEX COVER,

• CLIQUE,

• HAMILTONIAN CIRCUIT, and

• PARTITION.

The first complete problems for PSPACE and NDEXPTIME are due to A.R. Meyer and L.J. Stockmeyer [24]. Subsequently, a very large number of natural computational problems have been shown to be F-hard or F-complete, for each of the complexity classes F. Many examples and historical references can be found in [1,12,16,25,29,30,37]. References [12,25,28,30,37] are especially relevant to problems in the area of optimization, including:

• LP (which is solvable deterministically in polynomial time, as shown initially in [19]);

• {0, 1}-ILP and ILP (the feasibility problems of which are NP-complete [5,12,17,30]); and

• QP (which is NP-complete) [12,32,37].

Reference [12] also discusses much of the early work 
(prior to 1979) on the complexity of approximating NP- 
hard optimization problems. Reference [16] consists of 
several separately authored chapters surveying many of 
the more recent results on the complexity of approxi- 
mating NP-hard optimization problems. These results 
include the important result of [3] that unless P = 
NP, no MAX SNP-hard optimization problem [31] has 
a PTAS (i.e. a polynomial time approximation scheme, 
[12]). 

Many basic polynomial time solvable and NP-hard optimization problems become PSPACE-hard, DEXPTIME-hard, NDEXPTIME-hard, and even EXPSPACE-hard when problem instances are specified succinctly by hierarchical specifications or by 1- and 2-dimensional periodic specifications; see [20,21,22,26,27,29]. References [2,6] discuss algorithms for, and the computational complexity of, solving systems of multivariable polynomial equations over real closed fields. Most of these last problems are NP-hard or worse. Finally, the problems of determining the solvability of a system of multivariable polynomial equations over N or Z are recursively undecidable, by a straightforward effective reduction from Hilbert's tenth problem [9,23].


See also 


> Complexity of Degeneracy 

> Complexity of Gradients, Jacobians, and Hessians 

> Complexity Theory 

> Complexity Theory: Quadratic Programming 

> Computational Complexity Theory 

> Fractional Combinatorial Optimization 

> Information-based Complexity and 
> Kolmogorov Complexity

> Parallel Computing: Complexity Classes


Degeneracy in Linear Programming


In mathematical programming, the term degeneracy and its absence, nondegeneracy, first arose in the simplex method of linear programming (LP), where they have been given precise definitions. The notions were first introduced by G.B. Dantzig in his seminal paper [7], where he invoked the nondegeneracy assumption to prove the finite convergence of the simplex method.

In the study of the simplex method for LP, degener- 
acy and nondegeneracy are properties defined for basic 
feasible (or extreme point) solutions of systems of lin- 
ear constraints or for the systems themselves. To give 
the definition, let us consider the general system of lin- 
ear constraints 


$$A_{i\cdot}\,x \;\begin{cases} = b_i, & i = 1, \ldots, r,\\ \ge b_i, & i = r + 1, \ldots, m, \end{cases} \qquad (1)$$


where $A_{i\cdot} \in \mathbb{R}^n$ is the row vector of coefficients in the ith constraint, and we assume that the equality constraints in (1) are linearly independent. Let K denote its set of feasible solutions.

Given $x \in K$, the ith constraint in (1) is said to be: active or tight at x if either it is an equality constraint (i.e., $i \in \{1, \ldots, r\}$) or it is an inequality constraint that holds as an equation at x; inactive or slack at x otherwise. We denote the index set of active constraints at x, i.e., $\{i \in \{1, \ldots, m\} : A_{i\cdot}x = b_i\}$, by I(x).

The feasible solution x for (1) is said to be an extreme feasible solution or a BFS (basic feasible solution) if it is the unique solution of the system of equations defined by the active constraints at it in (1), i.e., $A_{i\cdot}x = b_i$ for $i \in I(x)$.

The BFS x for (1) is said to be: a nondegenerate BFS 
if the set of active constraints at it, treated as equations, 
forms a square nonsingular system of equations; a de- 
generate BFS if that system has one or more redundant 
equations, i.e., if |I(x)| > n. Thus at a degenerate BFS, 
this system of equations formed from the active con- 
straints is an overdetermined system of linear equations 
with a unique solution. 

The general system (1) is said to be: a degenerate sys- 
tem if it has at least one degenerate BFS; nondegenerate 
system if all its BFSs are nondegenerate. 
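The definitions above can be checked mechanically on small examples. The following Python sketch is illustrative only: the function names `rank` and `classify` and the example system are hypothetical, exact rational arithmetic (Python's `fractions` module) is assumed so that "holds as an equation" can be tested with `==`, and x is treated as a BFS exactly when the active rows have rank n, i.e. when x is the unique solution of the active system.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of equal-length vectors, by exact Gauss-Jordan elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * q for a, q in zip(M[i], M[r])]
        r += 1
    return r


def classify(A, b, r, x):
    """Classify x w.r.t. system (1): rows 0..r-1 are equations, the others are '>='."""
    n = len(x)
    lhs = [sum(Fraction(a) * Fraction(xj) for a, xj in zip(row, x)) for row in A]
    if any(lhs[i] != b[i] for i in range(r)) or any(lhs[i] < b[i] for i in range(r, len(A))):
        return "infeasible"
    active = [i for i in range(len(A)) if lhs[i] == b[i]]  # the index set I(x)
    if rank([A[i] for i in active]) < n:
        return "feasible, not a BFS"  # the active system does not determine x uniquely
    return "degenerate BFS" if len(active) > n else "nondegenerate BFS"


# Hypothetical system in R^2 (r = 0): x1 >= 0, x2 >= 0, -x1 - x2 >= -2, x1 - x2 >= 0.
A = [[1, 0], [0, 1], [-1, -1], [1, -1]]
b = [0, 0, -2, 0]
print(classify(A, b, 0, (0, 0)))  # three active constraints in R^2
print(classify(A, b, 0, (1, 1)))
```

At (0, 0) three constraints are active in the plane, so |I(x)| > n and the BFS is degenerate; at (1, 1) exactly two are active.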


Degeneracy in Standard Form Systems 


Before solving an LP, the simplex method transforms the constraints into the standard form

$$Ax = b, \qquad x \ge 0, \qquad (2)$$

where the matrix A is of order m × n, and without any loss of generality we assume that rank(A) = m. Let $A_{\cdot j}$ denote the jth column vector of A for j = 1, ..., n; it is the column of $x_j$ in (2), and we assume that $A_{\cdot j} \neq 0$ for all j. Let Γ denote the set of feasible solutions of (2).

Specializing the above definitions to the standard form, we conclude that a feasible solution x of (2) is a BFS if and only if $\{A_{\cdot j} : j \text{ such that } x_j > 0\}$ is linearly independent. The BFS x is: degenerate if $|\{j : x_j > 0\}| < m$; nondegenerate if $|\{j : x_j > 0\}| = m$.

So, for a nondegenerate BFS x, the submatrix with column vectors $\{A_{\cdot j} : j \text{ such that } x_j > 0\}$ is a basis for the system of equations in (2). If x is a degenerate BFS, this submatrix has to be augmented with the columns of some variables having 0 values in x in order to become a basis; usually this augmentation can be carried out in many ways. Hence, while each nondegenerate BFS is associated with a unique basis, each degenerate BFS is associated with several (usually a huge number of) bases.
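A brute-force sketch of this observation, assuming a feasible x of (2), rank(A) = m, and exact arithmetic: it augments the columns of the positive variables with every possible choice of zero-valued columns and keeps the choices that yield m linearly independent columns. The helper names and the 2 × 4 matrix (with b = (1, 1)) are hypothetical; for the degenerate BFS (1, 0, 0, 0) three different bases appear, while the nondegenerate BFS (0, 0, 1, 1) has only one.

```python
from fractions import Fraction
from itertools import combinations


def rank(vectors):
    """Rank of a list of equal-length vectors (exact Gauss-Jordan elimination)."""
    M = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for c in range(len(M[0]) if M else 0):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * q for a, q in zip(M[i], M[r])]
        r += 1
    return r


def bases_of_bfs(A, x):
    """All bases (sorted column-index tuples) associated with a BFS x of Ax = b, x >= 0."""
    m, n = len(A), len(A[0])

    def col(j):
        return [A[i][j] for i in range(m)]

    support = [j for j in range(n) if x[j] > 0]
    if rank([col(j) for j in support]) < len(support):
        raise ValueError("not a BFS: the columns of the positive variables are dependent")
    zeros = [j for j in range(n) if x[j] == 0]
    bases = []
    for extra in combinations(zeros, m - len(support)):
        cols = sorted(support + list(extra))
        if rank([col(j) for j in cols]) == m:  # m independent columns form a basis
            bases.append(tuple(cols))
    return bases


# Hypothetical data with m = 2, n = 4 and b = (1, 1).
A = [[1, 1, 1, 0],
     [1, -1, 0, 1]]
print(bases_of_bfs(A, (1, 0, 0, 0)))  # degenerate: one positive variable, m = 2
print(bases_of_bfs(A, (0, 0, 1, 1)))  # nondegenerate
```

Enumerating all augmentations is exponential in general; the sketch is only meant to make the one-to-many correspondence visible.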

System (2) is said to be: degenerate if it has at least one degenerate BFS; nondegenerate otherwise. From this we see that if system (2) is degenerate, then the right-hand side constants vector b lies in a subspace that is the linear hull of a set of m − 1 or fewer column vectors of the coefficient matrix A. These observations imply the following facts.

1) Keeping the coefficient matrix A fixed in (2), but letting the right-hand side constants vector b vary over $\mathbb{R}^m$, the set of all b for which (2) is degenerate is a set of Lebesgue measure zero in $\mathbb{R}^m$.

2) If (2) is degenerate, the right-hand side constants vector b in it can be perturbed ever so slightly to make the perturbed system nondegenerate. The perturbation technique for resolving the problem of cycling in the simplex method caused by degeneracy is based on this fact. We return to this point later.

3) If (2) has at least one nondegenerate BFS, the dimension of Γ is d = n − m.

Whether (2) is degenerate or not, every nondegenerate BFS of (2) is incident to exactly d = n − m edges of Γ.

However, a degenerate BFS is usually incident to more than d (possibly very many) edges of Γ, and the number of edges of Γ incident at different degenerate extreme points of Γ may be very different.

4) If (2) is nondegenerate, Γ is said to be a regular or simple polyhedron because every one of its vertices is incident to exactly d (the dimension of Γ) edges. This nice regular property may not hold for Γ if (2) is degenerate. Thus, degeneracy has the effect of making the polyhedron geometrically more complex.
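Facts 1) and 2) can be illustrated by brute force. The sketch below (hypothetical names; exact arithmetic; rank(A) = m assumed) declares (2) degenerate when some feasible basis has a basic variable equal to zero, by enumerating all C(n, m) column subsets, so its cost is exponential in general. The example perturbs b to b + (ε, ε², …, ε^m), in the spirit of the classical perturbation technique; the value ε = 1/10 is an arbitrary choice that simply happens to be small enough for this data.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, rhs):
    """Solve the square system M y = rhs exactly; return None if M is singular."""
    k = len(M)
    T = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(k):
        p = next((i for i in range(c, k) if T[i][c] != 0), None)
        if p is None:
            return None
        T[c], T[p] = T[p], T[c]
        for i in range(k):
            if i != c and T[i][c] != 0:
                f = T[i][c] / T[c][c]
                T[i] = [a - f * q for a, q in zip(T[i], T[c])]
    return [T[i][k] / T[i][i] for i in range(k)]


def feasible_bases(A, b):
    """Yield (basis, x_B) for every basis B of A with B^{-1} b >= 0."""
    m, n = len(A), len(A[0])
    for cols in combinations(range(n), m):
        xB = solve([[A[i][j] for j in cols] for i in range(m)], b)
        if xB is not None and all(v >= 0 for v in xB):
            yield cols, xB


def is_degenerate(A, b):
    """(2) is degenerate iff some feasible basis has a basic variable equal to 0."""
    return any(v == 0 for _, xB in feasible_bases(A, b) for v in xB)


# Hypothetical data (rank(A) = m = 2).
A = [[1, 1, 1, 0],
     [1, -1, 0, 1]]
eps = Fraction(1, 10)
print(is_degenerate(A, [1, 1]))                   # basis {0, 1} gives x = (1, 0, 0, 0)
print(is_degenerate(A, [1 + eps, 1 + eps ** 2]))  # slightly perturbed right-hand side
```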


The Complexity of Checking Whether a System 
of Constraints is Degenerate 


Despite the rarity of degeneracy among all possible LP 
models, surprisingly, many real world LP models turn 
out to be degenerate. However, R. Chandrasekaran, 
S.N. Kabadi and K.G. Murty [4] showed that the prob- 
lem of checking whether a given system of linear con- 
straints is degenerate is NP-complete. 

A nondegenerate BFS of (2) is said to be nearly de- 
generate if some variables have positive values which 
are very close to zero in it. In practice, while comput- 
ing BFSs of (2), unless exact arithmetic is used, it is very 
hard to distinguish between degenerate and nearly de- 
generate BFSs because of round-off errors introduced in 
digital computation. 


Problems Posed By Degeneracy
for the Simplex Method of LP

Cycling 

Very soon after developing the simplex method for LP, 
Dantzig realized that it may not lead to an optimum so- 
lution under degeneracy but instead may cycle indefi- 
nitely among a set of nonoptimal degenerate bases. The 
first example of cycling in the simplex method was con- 
structed by A.J. Hoffman [13]. 

For implementing the simplex method, the user has to select two tie-breaking rules to be used in each pivot step: one for selecting the entering nonbasic variable, and the other for selecting the dropping basic variable, among all those that tie. Whether cycling can occur depends crucially on these tie-breaking rules.

Any technique that makes sure that the simplex 
method cannot cycle under degeneracy is said to re- 
solve degeneracy. Quite early in the development of LP, 
techniques for resolving degeneracy in theory, using 
a virtual perturbation involving powers of an infinites- 
imal indeterminate, without altering the data were de- 
veloped ([5,8,21]; also see [16] for an extension of this 
technique to the bounded variable simplex method). 
These fix the tie breaker for the dropping variable as one 
based on lexicographic ordering, but leave the entering 
variable choice arbitrary among those eligible. Over the 
years several other techniques have been developed for resolving degeneracy in theory; some, e.g., Bland's technique [3], fix the tie breakers for both the entering and dropping variables. Bland's technique and others like it, however, lead to implementations which are very slow in practice.
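For concreteness, here is a minimal dense-tableau sketch of the simplex method with Bland's smallest-index rule for both tie breakers, using exact fractions so that ties in the ratio test are detected exactly. It assumes the problem min cx subject to (2) with b ≥ 0 and a starting basis whose columns form an identity matrix; the function name is hypothetical. The example LP is a small degenerate problem of the kind used in textbooks to exhibit cycling under other pivoting rules (its starting BFS has two basic variables at zero). With Bland's rule the first four pivots here are degenerate (step length zero), yet the method terminates at the optimal value −5/4.

```python
from fractions import Fraction


def simplex_bland(A, b, c, basis):
    """Tableau simplex for min c.x s.t. Ax = b, x >= 0, with Bland's smallest-index rule.

    Assumes b >= 0 and that the columns listed in `basis` form an identity matrix,
    so the initial tableau [A | b] is already in canonical form.
    """
    m, n = len(A), len(A[0])
    T = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    basis = list(basis)
    while True:
        # Reduced costs c_j - c_B B^{-1} A_j, read off the current tableau.
        red = [Fraction(c[j]) - sum(Fraction(c[basis[i]]) * T[i][j] for i in range(m))
               for j in range(n)]
        # Entering variable: smallest index with a negative reduced cost.
        e = next((j for j in range(n) if red[j] < 0), None)
        if e is None:  # all reduced costs >= 0: current BFS is optimal
            x = [Fraction(0)] * n
            for i, j in enumerate(basis):
                x[j] = T[i][-1]
            return x, sum(Fraction(c[j]) * x[j] for j in range(n))
        rows = [i for i in range(m) if T[i][e] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        best = min(T[i][-1] / T[i][e] for i in rows)
        # Dropping variable: among rows tied in the ratio test, smallest basic index.
        r = min((i for i in rows if T[i][-1] / T[i][e] == best), key=lambda i: basis[i])
        piv = T[r][e]
        T[r] = [v / piv for v in T[r]]
        for i in range(m):
            if i != r and T[i][e] != 0:
                f = T[i][e]
                T[i] = [a - f * q for a, q in zip(T[i], T[r])]
        basis[r] = e


F = Fraction
A = [[1, 0, 0, F(1, 4), -8, -1, 9],
     [0, 1, 0, F(1, 2), -12, F(-1, 2), 3],
     [0, 0, 1, 0, 0, 1, 0]]
b = [0, 0, 1]
c = [0, 0, 0, F(-3, 4), 20, F(-1, 2), 6]
x, z = simplex_bland(A, b, c, basis=[0, 1, 2])
print(z)  # -5/4
```

Computing reduced costs from scratch each iteration keeps the sketch short; a practical code would maintain them in an objective row and, as discussed later in this article, would not form the tableau explicitly.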

Computationally, it is also important that tech- 
niques for resolving degeneracy pay attention to the 
possible effects of round-off errors in near degenerate 
solutions. See [10]. 

It is commonly believed that the problem of cycling 
is not encountered in practice; however, degeneracy re- 
lated problems have been discovered to contribute sub- 
stantially to the difficulty in using LP based methods in 
scheduling and related combinatorial and integer pro- 
gramming problems. 


Stalling 


Even after resolving the problem of cycling, yet another phenomenon called stalling at a degenerate BFS can occur in the simplex method. Unlike cycling, which is an infinite repetition of the same sequence of degenerate bases, stalling is a finite but exponentially long sequence of consecutive degenerate pivot steps at the same objective value. Examples of stalling in network flow models have been exhibited by J. Edmonds; see [6,17].

A technique is said to resolve both cycling and 
stalling under degeneracy, if it can be established that 
the total number of consecutive degenerate pivot steps 
in the simplex method, using this technique, is bounded 
above by a polynomial function of n and the size of the 
LP. Such techniques that fix the tie breakers for both 
the entering and the dropping variables have been de- 
veloped in [6] for the special case of minimum cost pure 
network flow problems. Extending this work to the gen- 
eral LP model seems to be hard, as it can be shown 
that resolving both cycling and stalling in a general LP 
model is only possible if there exist tie breakers for both 
the entering and dropping variables which guarantee to 
make the simplex method a polynomial time method 
for LP. To establish whether such tie breakers exist has 
been a long standing open problem in LP theory. 


Degeneracy Handling in Commercial Codes 


In spite of the folklore that cycling is very unlikely to oc- 
cur in practice, commercial LP codes have sought to im- 
plement anti-cycling procedures that involve little over- 
head and are effective in practice. 

The lexicographic technique for resolving degeneracy is not very desirable, as it needs the explicit basis inverse in every step (most commercial codes do not compute the basis inverse explicitly; they use matrix factorizations of the basis inverse for preserving sparsity and for numerical stability).

For handling degeneracy, commercial codes normally use procedures based on perturbing the bounds on the variables. If there is no progress in the objective value after some number of iterations dependent on problem size, then the bounds on the variables in the present basic vector are enlarged (i.e., if the previous lower and upper bounds on $x_j$ are $\ell_j$ and $u_j$, they are changed to $\ell_j - \delta_j$ and $u_j + \delta_j$, where $\delta_j$ is a small positive quantity chosen appropriately), and the application of the algorithm is continued on the perturbed problem. When the perturbed problem reaches optimality, the bounds are reset to their original values to see if the resulting basis is optimal for the original problem (this happens very often). Otherwise, the resulting basis satisfies the optimality criterion but may be infeasible. Then a Phase I procedure is used to get feasibility; this works fine in almost all cases, since the optimal basis for the perturbed problem is close to one for the original problem. The dual simplex algorithm can also be used for this latter part. See [11] for details.


Effect of Degeneracy on the Optimum Face of an LP 


From LP theory we know that if an LP has at least one 
optimum nondegenerate BFS, then the dual problem 
has a unique optimum solution. Conversely, if the dual 
problem has at least one optimum nondegenerate BFS, 
then the primal LP optimum solution is unique. 


Effects of Degeneracy 
on Post-optimality Analysis in LP 


After having found an optimum solution for an LP, 
an integral part of a good report generator is marginal 
analysis. 

Consider the LP in standard form: minimize z = cx subject to (2), and let z*(b) denote the optimum objective value in this LP as a function of the right-hand side constants vector b while all the other data remain fixed. The marginal value or shadow price vector for this LP is defined to be $(\partial z^*(b)/\partial b_i : i = 1, \ldots, m)$ when it exists.

If the LP has a nondegenerate optimum BFS, then the dual optimum solution is unique and it is the vector of marginal values; and for each right-hand side constant $b_i$ there is an interval of positive length containing the present value of $b_i$ in its interior, which is its optimality range. As $b_i$ varies in this range while all the other data remain fixed, the dual optimum solution remains unchanged and remains the marginal value vector.

In practical applications, the right-hand side con- 
stants vector b in the LP model for a company’s op- 
erations usually contains parameters such as the lim- 
its on raw material supplies, etc. When it exists, prac- 
titioners use the marginal value vector to derive many 
facts of great use in planning, such as identifying which 
raw material supplies are critical, what the break even 
price is for additional supply of each raw material, etc. 
Marginal analysis is the process of drawing such con- 
clusions, and practitioners rely on it heavily to provide 
valuable planning information. 
The situation changes dramatically when all the optimum BFSs of the LP are degenerate. The dual optimum solution may not be unique, and the marginal value vector as defined above may not exist. In its place we have two-sided marginal values: a positive (and a negative) marginal value giving the rate of change in the optimum objective value per unit increase (decrease) in $b_i$. M. Akgul [1] proved the existence of these two-sided marginal values using convex analysis. Simple proofs based on parametric LP are given in [16], where it is shown that the positive (negative) marginal value with respect to $b_i$ is

$$\max\{\pi_i : \pi \text{ in the dual optimum face}\} \qquad \bigl(\min\{\pi_i : \pi \text{ in the dual optimum face}\}\bigr).$$
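The two-sided marginal values can be observed numerically. The sketch below (hypothetical names; brute force with exact arithmetic, suitable only for tiny problems; rank(A) = m and a feasible, bounded LP assumed) computes z*(b) by enumerating feasible bases and returns the right and left difference quotients of z* in $b_i$. Since z* is piecewise linear, these equal the one-sided derivatives once the step t is small enough. In the hypothetical example, min −x1 − x2 subject to x1 ≤ 1, x2 ≤ 1, x1 + x2 ≤ 2 (with slacks), the optimum x1 = x2 = 1 is degenerate. Working out the dual optimum face by hand gives π1 = π2 = −1 − π3 with π3 ∈ [−1, 0], so for the third constraint the right and left derivatives, 0 and −1, coincide with max π3 and min π3, when the negative marginal value is read as the left derivative of z* (a sign convention assumed here).

```python
from fractions import Fraction
from itertools import combinations


def solve(M, rhs):
    """Solve the square system M y = rhs exactly; return None if M is singular."""
    k = len(M)
    T = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(k):
        p = next((i for i in range(c, k) if T[i][c] != 0), None)
        if p is None:
            return None
        T[c], T[p] = T[p], T[c]
        for i in range(k):
            if i != c and T[i][c] != 0:
                f = T[i][c] / T[c][c]
                T[i] = [a - f * q for a, q in zip(T[i], T[c])]
    return [T[i][k] / T[i][i] for i in range(k)]


def z_star(A, b, c):
    """Optimal value of min c.x s.t. Ax = b, x >= 0, by enumerating all feasible bases."""
    m, n = len(A), len(A[0])
    best = None
    for cols in combinations(range(n), m):
        xB = solve([[A[i][j] for j in cols] for i in range(m)], b)
        if xB is not None and all(v >= 0 for v in xB):
            val = sum(Fraction(c[j]) * v for j, v in zip(cols, xB))
            best = val if best is None or val < best else best
    return best


def one_sided_marginals(A, b, c, i, t=Fraction(1, 10)):
    """(right, left) difference quotients of z*(b) with respect to b_i, step t."""
    up, down = list(b), list(b)
    up[i] += t
    down[i] -= t
    z0 = z_star(A, b, c)
    return (z_star(A, up, c) - z0) / t, (z0 - z_star(A, down, c)) / t


# Hypothetical LP: min -x1 - x2 s.t. x1 + s1 = 1, x2 + s2 = 1, x1 + x2 + s3 = 2, all >= 0.
A = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]
b = [1, 1, 2]
c = [-1, -1, 0, 0, 0]
right, left = one_sided_marginals(A, b, c, i=2)
print(right, left)
```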


Effects of Degeneracy 
in Interior Point Methods for LP 


Unlike the simplex method which walks along edges 
of the polyhedron, the paths traced by interior point 
methods (IPMs) are contained in the strict interior of 
the polyhedron. There are many different classes of 
IPMs based on the strategy used. At first glance, degen- 
eracy, a concept based on properties of extreme point 
solutions, does not seem to be as serious a problem 
for IPMs as it is for simplex methods. In fact proofs of 
polynomiality for IPMs of the projective, path follow- 
ing, and affine potential reduction categories hold true 
without any nondegeneracy assumption. 

However, degeneracy affects the convergence of the 
primal-dual pair in the affine scaling method. Under 
primal nondegeneracy, this method has been shown to 
be globally convergent for any steplength as long as all 
the iterates remain in the interior of the feasible region. 
But this technique breaks down when the primal nondegeneracy assumption is removed; in fact, L.H. Hall and R.J. Vanderbei [12] constructed a degenerate example to show that the dual sequence cannot be convergent anymore if any fixed steplength greater than 2/3 to the boundary is taken. Thus stepsize 2/3 to the boundary is the longest stepsize for the affine scaling algorithm that guarantees convergence of the primal-dual pair in the presence of degeneracy.
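To show where the steplength enters, here is a sketch of one common form of the primal affine scaling iteration for min cx subject to (2): dual estimates w = (AX²Aᵀ)⁻¹AX²c with X = diag(x), reduced costs r = c − Aᵀw, direction d = −X²r, and a step that goes a fraction β of the way to the boundary (β = 2/3 by default, echoing the bound above). The function names and the toy LP, min x1 subject to x1 + x2 = 1, x ≥ 0, are hypothetical; the toy LP is nondegenerate, so this does not reproduce the Hall and Vanderbei example, and no stopping test beyond a fixed iteration count is attempted. On this LP one can check by hand that, in exact arithmetic, each iteration multiplies x1 by exactly 1 − β.

```python
def solve(M, rhs):
    """Solve a small dense system M y = rhs (Gaussian elimination, partial pivoting)."""
    k = len(M)
    T = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(T[i][c]))
        T[c], T[p] = T[p], T[c]
        for i in range(c + 1, k):
            f = T[i][c] / T[c][c]
            T[i] = [a - f * q for a, q in zip(T[i], T[c])]
    y = [0.0] * k
    for i in reversed(range(k)):
        y[i] = (T[i][k] - sum(T[i][j] * y[j] for j in range(i + 1, k))) / T[i][i]
    return y


def affine_scaling(A, c, x, beta=2 / 3, iters=30):
    """Primal affine scaling for min c.x s.t. Ax = b, x >= 0, from an interior feasible x > 0."""
    m, n = len(A), len(x)
    x = [float(v) for v in x]
    w = None
    for _ in range(iters):
        d2 = [v * v for v in x]                                       # diagonal of X^2
        AD = [[A[i][j] * d2[j] for j in range(n)] for i in range(m)]  # A X^2
        M = [[sum(AD[i][j] * A[k][j] for j in range(n)) for k in range(m)] for i in range(m)]
        w = solve(M, [sum(AD[i][j] * c[j] for j in range(n)) for i in range(m)])  # dual estimate
        r = [c[j] - sum(A[i][j] * w[i] for i in range(m)) for j in range(n)]      # reduced costs
        d = [-d2[j] * r[j] for j in range(n)]        # search direction, satisfies A d = 0
        ratios = [-d[j] / x[j] for j in range(n) if d[j] < 0]
        if not ratios:  # d = 0 (x optimal) or d >= 0, d != 0 (LP unbounded): stop
            break
        alpha = beta / max(ratios)  # move the fraction beta of the way to the boundary
        x = [v + alpha * dv for v, dv in zip(x, d)]
    return x, w


x, w = affine_scaling([[1, 1]], [1, 0], [0.5, 0.5])
print(x, w)
```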

Although other IPMs go through the interior of the feasible region, degeneracy still has a role to play in them. But the problems here are different from the cycling and stalling problems occurring in the simplex method. Degeneracy and redundant constraints affect the central path which most IPMs aim to follow. Numerical performance of the algorithms may suffer from numerical instability and ill-conditioning if the optimum solutions are degenerate or near degenerate. Also, generating an optimum basis from the near optimum interior solution at the termination of the IPM is strongly polynomial, but the computational effort depends on the degree of degeneracy.


Effect of Degeneracy on Algorithms 
for Enumerating Extreme Point Solutions 


Consider the problem of enumerating all the extreme point solutions of a system of linear constraints, say (2). Let $f_0$ denote the unknown number of extreme point solutions.

If (2) is nondegenerate, all its extreme point solutions can be enumerated in time $O(f_0 m n)$, an effort which grows linearly with $f_0$ (D. Avis and K. Fukuda [2]). If (2) is degenerate and is the system of constraints for a network linear program, J.S. Provan [19] has an algorithm for enumerating all its extreme point solutions in time polynomial in $f_0$ and the input size.

However, it remains an open question whether there is an algorithm for enumerating all extreme point solutions in time polynomial in $f_0$ and input size when (2) is a general degenerate system of constraints.

Murty and S.-J. Chung [18] have shown that de- 
generate polyhedra have proper subsets called segments 
satisfying certain facial incidence properties. For each 
nondegenerate polyhedron, the only segment possible 
is the whole polyhedron itself. The difficulty of enu- 
merating extreme point solutions of degenerate sys- 
tems efficiently is related to the problem of recogniz- 
ing whether a given segment is the whole polyhedron 
or a proper subset of it. 


Effect of Degeneracy 
in Extreme Point Ranking Methods 


Consider the objective function z(x) = cx defined over Γ, the set of feasible solutions of (2). For simplicity assume that Γ is a convex polytope, i.e., it is bounded. An algorithm for ranking the extreme points of Γ in increasing order of z(x) has been discussed in [15]. In each step, this algorithm carries out the operation of enumerating the adjacent extreme points of a given extreme point of Γ.

If (2) is nondegenerate, every extreme point has exactly n − m adjacent extreme points, and the above operation can be carried out efficiently by pivot steps. Hence, the complexity of generating k extreme points in the ranked sequence grows linearly with k, and the ranking algorithm becomes practically effective.

If (2) is degenerate, the number of adjacent extreme points of a degenerate extreme point of Γ may be very large, and the ranking algorithm becomes almost impractical.

The assignment problem is a well-known example of a highly degenerate problem. However, all its extreme point solutions, known as assignments, are 0-1 vectors; using this property, an efficient special algorithm has been developed in [14] for ranking the assignments in increasing order of a linear objective function.


Degeneracy in Nonlinear Programming 


In contrast to linear programming, where the concept of degeneracy is defined purely using extreme point solutions, in nonlinear programming it is defined for any solution point.

Discussion of degeneracy arises in nonlinear programming particularly in methods known as active set methods. These methods are popular for solving nonlinear programs in which the constraints are linear, say of the form (1), but are also used when there are nonlinear constraints in the system. In these methods, when at a feasible point $x^0$, certain constraints indexed by an active set $\mathcal{A}$ are treated as equations and the rest are temporarily disregarded, and a search direction $y^0$ is generated. The next point is ideally taken as the best feasible point on the half-line $\{x^0 + \lambda y^0 : \lambda \ge 0\}$. However, if one or more inequality constraints not in the set $\mathcal{A}$ are violated by $x^0 + \lambda y^0$ whenever $\lambda > 0$ is sufficiently small, then those constraints allow no progress in the search direction, and we have a degenerate situation. See [10] and [11] for a discussion of this degeneracy in active set methods, and its resolution.
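The degenerate situation just described shows up as a zero result in the usual ratio test along the search direction. A minimal sketch, assuming inequality constraints of the ≥ form in (1) and a working set given as a set of row indices whose constraints the direction $y^0$ is assumed to keep satisfied (the function name and data are hypothetical):

```python
import math


def max_step(A, b, working, x0, y0):
    """Largest lam >= 0 keeping A_i (x0 + lam*y0) >= b_i for every row i not in `working`.

    Returns math.inf if no such row blocks the ray. A result of 0 is the degenerate
    situation: a row outside the working set is tight at x0 and decreases along y0.
    """
    step = math.inf
    for i, (row, bi) in enumerate(zip(A, b)):
        if i in working:
            continue  # treated as an equation; assumed to be kept satisfied by y0
        slope = sum(a * y for a, y in zip(row, y0))
        if slope < 0:  # this constraint moves toward violation along y0
            slack = sum(a * x for a, x in zip(row, x0)) - bi
            step = min(step, slack / -slope)
    return step


# Hypothetical constraints x1 >= 0, x2 >= 0, x1 - x2 >= 0 (rows 0, 1, 2).
A = [[1, 0], [0, 1], [1, -1]]
b = [0, 0, 0]
print(max_step(A, b, {0}, (0, 0), (0, 1)))   # row 2 is tight at x0 but not in the working set
print(max_step(A, b, {1}, (1, 0), (-1, 0)))
```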

There are also other generalizations of the notions of degeneracy and nondegeneracy to systems of nonlinear constraints. In these generalizations, degeneracy is taken to mean any measure of departure of problem structure from some idealized norm. Simply put, nondegenerate means well-posed in some context; degenerate means absence of such nice structure. For a nonlinear program, nondegeneracy at a solution point has been defined variously as the satisfaction of: LICQ (linear independence constraint qualification of the binding constraint gradients), KKT first order necessary conditions for a local minimum, second order sufficient conditions for a local minimum, or the strict complementary slackness condition. Connections between nondegeneracy and the performance of algorithms have also been studied, addressing the local effects of special kinds of nondegeneracy or its lack at a local minimizer. See [9].
The evaluation or approximation of derivatives is a central part of most nonlinear optimization calculations. The gradients of objectives and active constraints enter directly into the Karush-Kuhn-Tucker conditions, so that inaccuracies in their evaluation limit the achievable solution accuracy. The latter also depends crucially on the conditioning of the projected Hessian of the Lagrangian. Hence accurate values of this symmetric matrix allow the design of appropriate stopping criteria, including the verification of second order conditions. Second derivatives also facilitate a rapid final rate of convergence, provided the step-defining linear systems can be solved by factorization or iteration at a reasonable cost. The same observations apply to more general optimization calculations like the solution of nonlinear complementarity problems.

Whether or not the obvious benefits of evaluating first and higher derivatives accurately justify the costs incurred depends strongly on the suitability of the differentiation method employed for the particular problem at hand. We may distinguish five principal options for evaluating or approximating derivatives:

• symbolic differentiation;
• handcoded derivatives;
• automatic differentiation;
• difference quotients;
• secant updating.


‘Numerical’ Differentiation Methods 


The last two options are widely used in practical op- 
timization, primarily because they require no extra ef- 
fort whatsoever on the part of the user. Difference quo- 
tients are often called divided differences or finite dif- 
ferences, though the last term invites confusion with 
a related method for discretizing differential equations. 
Other popular labels are differencing or numerical dif- 
ferentiation, because the results are floating point num- 
bers rather than algebraic expressions. The latter are of- 
ten presumed to be the output of more symbolic meth- 
ods. Even though we shall see that the distinction is not 
quite that easy, there is no doubting the importance of 
the fundamental relation 


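$$F'(x)\,\dot{x} = \frac{d}{d\alpha} F(x + \alpha\dot{x})\Big|_{\alpha=0} = \frac{1}{\varepsilon}\bigl[F(x + \varepsilon\dot{x}) - F(x)\bigr] + O(\varepsilon).$$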


Here, the vector function $F: \mathbb{R}^n \to \mathbb{R}^m$ is assumed Lipschitz-continuously differentiable on some neighborhood of the base point $x \in \mathbb{R}^n$. In other words, the directional derivative of F along some vector $\dot{x} \in \mathbb{R}^n$ is the product of the Jacobian matrix $F'(x) \in \mathbb{R}^{m \times n}$ with the direction $\dot{x}$, and it can be approximated by a difference quotient. The quality of this approximation depends strongly on the choice of ε, and one must expect a halving in the number of significant digits under the best of circumstances. Quasi-Newton, or secant, methods may be viewed as an ingenious way of sequentially incorporating difference quotients into a Jacobian approximation while iterating towards the solution vector of a nonlinear system of equations. The corresponding theory of superlinear convergence is quite beautiful from a mathematical point of view, though perhaps not terribly relevant in practice for large, structured problems.
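The ε-dependence can be seen on a small example. The sketch below uses a hypothetical F: R² → R² whose Jacobian is known in closed form and compares the one-sided difference quotient with the exact product F'(x)ẋ. Typically the error first falls roughly in proportion to ε and then rises again once cancellation in F(x + εẋ) − F(x) dominates, which is why at best about half of the significant digits survive; the exact numbers printed depend on the platform's floating point arithmetic.

```python
import math


def F(x):
    """Hypothetical test function F: R^2 -> R^2."""
    return [math.sin(x[0]) * x[1], math.exp(x[0] * x[1])]


def jvp_exact(x, v):
    """Exact F'(x) v for the F above, from its hand-derived Jacobian."""
    s, c, e = math.sin(x[0]), math.cos(x[0]), math.exp(x[0] * x[1])
    J = [[c * x[1], s], [x[1] * e, x[0] * e]]
    return [J[i][0] * v[0] + J[i][1] * v[1] for i in range(2)]


def jvp_forward(F, x, v, eps):
    """One-sided difference quotient [F(x + eps v) - F(x)] / eps."""
    Fx = F(x)
    Fxe = F([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(Fxe, Fx)]


x, v = [0.7, 1.3], [1.0, -0.5]
exact = jvp_exact(x, v)
for eps in [1e-1, 1e-4, 1e-8, 1e-12]:
    approx = jvp_forward(F, x, v, eps)
    err = max(abs(a - b) for a, b in zip(approx, exact))
    print(f"eps={eps:.0e}  max error={err:.1e}")
```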

It is important to note that the quality of the approximate derivative matrices generated by quasi-Newton methods influences only the rate of convergence but not so much the solution accuracy itself. The latter depends on the accurate evaluation of residual vectors, which may be composed of gradients, as is the case for the KKT conditions. The importance of accurate residual values is particularly well understood in numerical linear algebra, and replacing them with approximations of uncertain reliability is generally a dicey proposition. Fortunately, it just so happens that gradients can usually be evaluated with working precision at a moderate cost relative to that of the underlying functions. This is far from true for Jacobians and Hessians, whose cost is very hard to predict (and even define), as we shall demonstrate further below on various examples.

The relative cost of evaluating one-sided difference quotients in p directions $\dot{x}$ from the same base point x is clearly p + 1. Theoretically one might sometimes reduce the evaluation costs by exploiting the fact that the p points $x + \varepsilon\dot{x}$ are close to x. This proximity may arise in the topological sense that the stepsize $\varepsilon\|\dot{x}\|$ is small as well as in the structural sense that $\dot{x}$ is sparse and thus leaves many components of x unchanged. In practice such savings are rarely realized, and they would certainly destroy the main advantage of differencing, namely its black box quality, which does not require any insight into or access to the process by which function values are generated. Of course, there is the optimistic assumption that they vary smoothly as a function of the argument x, and usually the selection of a suitable increment ε causes enough trouble for the user and possibly even quite a few extra trial evaluations.

Hence it is indeed fair to assume that one-sided or centered differences in p directions $\dot{x}$ at a common x require 1 + p or 1 + 2p separate function evaluations but little extra storage. By letting $\dot{x}$ range over all n Cartesian basis vectors one obtains an approximate Jacobian with first or second order accuracy at the cost of 1 + n or 1 + 2n function evaluations. The number of dependent variables does not matter for differencing, so that the cost of a gradient, where m = 1, is also 1 + n or 1 + 2n times that of the underlying scalar function. To compute the Hessian or more generally a full second derivative tensor one needs n(n + 1)/2 function evaluations for one-sided and twice that many for the more accurate centered differences.
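A sketch of both Jacobian approximations along the n Cartesian basis vectors, with a hypothetical wrapper that counts evaluations of F. The one-sided version uses 1 + n evaluations; the centered version as written uses 2n, and one more if F(x) itself is also needed, which gives the 1 + 2n above. The test function and step sizes are illustrative choices, not recommendations.

```python
import math


class Counted:
    """Wrap a function and count how many times it is evaluated."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)


def jacobian_forward(F, x, eps=1e-7):
    """One-sided differences along the n Cartesian directions: 1 + n evaluations of F."""
    Fx = F(x)
    cols = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps
        cols.append([(a - b) / eps for a, b in zip(F(xp), Fx)])
    return [list(row) for row in zip(*cols)]  # columns -> m x n row-major matrix


def jacobian_central(F, x, eps=1e-5):
    """Centered differences: 2n evaluations of F, second order accurate."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(F(xp), F(xm))])
    return [list(row) for row in zip(*cols)]


def f(x):
    """Hypothetical F: R^3 -> R^2."""
    return [x[0] * x[1] + math.sin(x[2]), x[0] ** 2 - x[1] * x[2]]


x = [1.0, 2.0, 0.5]
exact = [[x[1], x[0], math.cos(x[2])], [2 * x[0], -x[2], -x[1]]]
for approx in (jacobian_forward, jacobian_central):
    F = Counted(f)
    J = approx(F, x)
    err = max(abs(J[i][j] - exact[i][j]) for i in range(2) for j in range(3))
    print(approx.__name__, "evaluations:", F.calls, "max error:", f"{err:.1e}")
```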

Since multiple function evaluations are an 'embarrassingly' parallel task, the availability of several processors can be used to achieve a nearly perfect speedup for derivative approximations by differencing [7]. In the sparse case, the number of independent variables n can be replaced in the cost ratios above by a number p ≤ n that represents either the maximal number of nonzeros in any row of F'(x) or the usually slightly larger chromatic number of the column incidence graph. The latter reduction can be achieved by the by now classical grouping or coloring technique originally due to Curtis-Powell-Reid [6] and further developed by Coleman-Moré [4]. An alternative way to compress the rows of the Jacobian even further at the expense of some linear equation solving is due to Newsam-Ramsdell [12] and has recently been adapted to automatic differentiation.
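A minimal sketch of the grouping idea, assuming the sparsity pattern is known: columns are assigned greedily to groups such that no two columns in a group have a nonzero in a common row, and one difference quotient along the sum of the group's basis vectors then recovers all of the group's columns at once. This greedy ordering is only one simple heuristic, not necessarily the ordering used by Curtis-Powell-Reid or Coleman-Moré, and the tridiagonal test function is hypothetical. For it, 6 columns collapse into 3 groups, so 1 + 3 evaluations replace 1 + 6.

```python
def cpr_groups(pattern, n):
    """Greedily group columns 0..n-1 so that no two columns in a group share a row.

    pattern[i] is the set of column indices of the structural nonzeros in row i.
    """
    rows_of = [set() for _ in range(n)]
    for i, cols in enumerate(pattern):
        for j in cols:
            rows_of[j].add(i)
    groups, used_rows = [], []  # used_rows[g]: rows already touched by group g
    for j in range(n):
        for g, rows in enumerate(used_rows):
            if not rows & rows_of[j]:
                groups[g].append(j)
                rows |= rows_of[j]  # updates used_rows[g] in place
                break
        else:
            groups.append([j])
            used_rows.append(set(rows_of[j]))
    return groups


def sparse_jacobian(F, x, pattern, eps=1e-7):
    """Approximate a sparse Jacobian with 1 + len(groups) evaluations of F."""
    n = len(x)
    groups = cpr_groups(pattern, n)
    Fx = F(x)
    J = [[0.0] * n for _ in pattern]
    for group in groups:
        xp = list(x)
        for j in group:
            xp[j] += eps
        diff = [(a - b) / eps for a, b in zip(F(xp), Fx)]
        for j in group:
            for i, cols in enumerate(pattern):
                if j in cols:
                    J[i][j] = diff[i]  # column j is the only group member touching row i
    return J, groups


def F(x):
    """Hypothetical tridiagonal test function: F_i = x_{i-1} + 2 x_i^2 - x_{i+1}."""
    n = len(x)
    return [(x[i - 1] if i > 0 else 0.0) + 2 * x[i] ** 2 - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


n = 6
pattern = [{j for j in (i - 1, i, i + 1) if 0 <= j < n} for i in range(n)]
x = [0.1 * (k + 1) for k in range(n)]
J, groups = sparse_jacobian(F, x, pattern)
print(groups)  # each group costs one extra evaluation of F
```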


‘Analytical’ Differentiation Methods 


The first three options listed at the beginning are based 
on the chain rule and may therefore be combined under 
the label analytical differentiation. They all would yield 
exact derivative values if real arithmetic could be per- 
formed in infinite precision. Moreover, even the actual 
sequence of operations performed to evaluate a partic- 
ular partial derivative would quite likely be the same 
and thus yield identical results if the same floating point arithmetic were used. Only the way in which the instructions for this floating point calculation are generated and stored differs significantly between the three approaches.
Also, there may be more or less recalculation of inter- 
mediates that are common to several partial derivatives, 
which can have drastic effects on the computational ef- 
ficiency. 

The result of the second option, handcoding, may in principle always be obtained similarly by symbolic or automatic differentiation, provided the computer algebra package or the differentiation software is sufficiently smart. Hence we will discuss only the pure options one and three, which might of course also be combined by a highly sophisticated programmer or software tool.

Symbolic differentiation is usually performed in 
computer algebra packages like Maple, Mathematica 
and Reduce. Most users have the notion that the dif- 
ferentiation commands in these sophisticated systems 
turn formulas for functions into formulas for deriva- 
tives. Moreover there is a tendency to assume that 
having a ‘formula’ means directly expressing depen- 
dent variables as algebraic expressions of independents 
without allowing any named intermediates. The natural 
data structures for such formulas would be expression 


trees. There, every node has only one parent, so that 
the whole thing can be easily linearized and printed by 
enumeration in a depth first order. In reality computer 
algebra packages do not restrict themselves to expres- 
sion trees, because for any nontrivial function the cor- 
responding tree structure is very likely to represent an 
incredible amount of redundancy, even before any dif- 
ferentiation takes place. 


Two-Stranded Chain Scenario 


Consider for example a sequence of complex function evaluations

$$x_{k+1} + i\,y_{k+1} = \varphi_k(x_k, y_k) + i\,\psi_k(x_k, y_k)$$

for k = 0, ..., l − 1, starting from some initial $x_0 + i y_0 \in \mathbb{C}$. Suppose all function pairs $\varphi_k + i\psi_k$ are nonlinear and do not allow any algebraic simplifications. Then eliminating the intermediates $x_1$ and $y_1$ yields the formula

$$x_2 + i\,y_2 = \varphi_1\bigl(\varphi_0(x_0, y_0), \psi_0(x_0, y_0)\bigr) + i\,\psi_1\bigl(\varphi_0(x_0, y_0), \psi_0(x_0, y_0)\bigr),$$

which already involves twice as many terms as the one-level original formula. The same doubling occurs at each subsequent level, so that expressing $x_l$ and $y_l$ directly in terms of the initial components $x_0$ and $y_0$ yields an exponentially long formula, with the symbols $x_0$ and $y_0$ each occurring exactly $2^l$ times. In this case one could avoid the highly undesirable expression swell by merely substituting $z_k = x_k + i y_k$, which turns the binary expression tree into a simple chain of the same height l.

While this example may appear rather algebraic and somewhat contrived, exactly the same effect occurs if the real pairs $(x_k, y_k)$ specify straight lines in the plane. Specifically, one might think of light beams being reflected in a maze of mirrors or some other optical arrangement in the plane. Each ray $(x_k, y_k)$ that is incoming to a mirror or lens uniquely determines an outgoing ray $(x_{k+1}, y_{k+1})$ via some simple algebraic relationship. Expressing the final ray parameters $(x_l, y_l)$ directly as functions of the initial parameters $(x_0, y_0)$ will then again yield an expression of size $2^l$.

Rather than dealing with this algebraic monster, one should of course keep all the intermediate pairs $(x_k, y_k)$ with $0 < k < l$ as named variables. Along this chain one can easily propagate all information of interest, including the $2 \times 2$ Jacobian of $(x_l, y_l)$ with respect to $(x_0, y_0)$, at a temporal and spatial complexity of order $l$. To achieve this result one may employ suitable variants of computer algebra, automatic differentiation or, of course, hand-coding. Before discussing them in more detail, let us introduce a general model of function and derivative evaluations.
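The contrast between the exploding formula and the cheap chain can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method from the original text: the pair $(\varphi_k, \psi_k)$ below is a hypothetical nonlinear map of the real pair $(x_k, y_k)$, `expanded_formula` builds the fully substituted expression as a string, and `chain_with_jacobian` instead keeps the intermediates and multiplies $2 \times 2$ local Jacobians, so its work grows only linearly in $l$.

```python
import math

def step(x, y, k):
    """One link of the chain: a hypothetical nonlinear pair (phi_k, psi_k)."""
    a = 0.5 + 0.1 * k
    phi = math.sin(a * x) + y * y
    psi = math.cos(x * y) + a * y
    # 2x2 Jacobian of (phi_k, psi_k) with respect to (x_k, y_k)
    J = [[a * math.cos(a * x), 2.0 * y],
         [-y * math.sin(x * y), -x * math.sin(x * y) + a]]
    return phi, psi, J

def matmul2(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)]
            for i in range(2)]

def chain_with_jacobian(x0, y0, l):
    """Keep (x_k, y_k) as named intermediates: O(l) work."""
    x, y = x0, y0
    D = [[1.0, 0.0], [0.0, 1.0]]  # d(x_k, y_k)/d(x_0, y_0), starts as identity
    for k in range(l):
        x, y, J = step(x, y, k)   # J is evaluated at the old (x_k, y_k)
        D = matmul2(J, D)         # chain rule: left-multiply by the local Jacobian
    return x, y, D

def expanded_formula(l):
    """Eliminate all intermediates symbolically; strings for x_l and y_l."""
    X, Y = "x0", "y0"
    for k in range(l):
        X, Y = f"phi{k}({X},{Y})", f"psi{k}({X},{Y})"
    return X, Y

for l in range(1, 6):
    X, Y = expanded_formula(l)
    # occurrences of x0 in the pair (x_l, y_l), and total formula length
    print(l, (X + Y).count("x0"), len(X) + len(Y))
```

The symbol count doubles with each level, while `chain_with_jacobian` touches each link exactly once.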


Computational Model 


All analytical differentiation methods are based on the observation that most vector functions $F$ of practical interest are evaluated by a sequence of assignments

$$v_i = \varphi_i\big((v_j)_{j \prec i}\big) \quad \text{for } i = 1, \ldots, l+m. \qquad (1)$$

Here, the variables $v_i$ are real scalars and the elemental functions $\varphi_i$ are either binary arithmetic operations or univariate intrinsics. Consequently, only one or two of the partial derivatives

$$c_{ij} = \frac{\partial}{\partial v_j}\,\varphi_i\big((v_k)_{k \prec i}\big)$$

do not vanish identically, and they can be evaluated at a cost comparable to that of the underlying $\varphi_i$ itself.

Without loss of generality we may require that the first $n$ variables $v_{j-n} = x_j$ with $j = 1, \ldots, n$ represent the independent variables and the last $m$ variables $y_i = v_{l+i}$ with $i = 1, \ldots, m$ represent the dependent variables. Then the function $y = F(x)$ is defined by the program (1). Here, the nonnegative integer $l$ represents the number of intermediate variables, which we expect to be much larger than both $n$ and $m$ for seriously nonlinear problems. We will also assume that, within a small constant factor, all elemental functions have the same complexity, so that we have the approximate operations count

$$\mathrm{OPS}\{x \mapsto y\} \sim l = \#\text{intermediates}.$$


Throughout this article, $\sim$ means proportional with small constants that are independent of the particular problem at hand. Each intermediate variable may be viewed, and thus later differentiated, as a function $v_i = v_i(x)$ of the independent variable vector $x$. As long as all intermediates $v_i$ are stored in separate locations, the memory requirement for evaluating $F$ will also be proportional to $l$. This is a very unrealistic assumption, as most evaluation programs involve shared allocation of intermediates. Due to space constraints we will not be able to discuss any aspects of spatial complexity in this article. For a detailed treatment of various trade-offs between space and time see [9].
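To make the model (1) concrete, here is a minimal Python sketch of an evaluation trace: each assignment $v_i = \varphi_i(\ldots)$ is recorded together with the floating point values of its one or two elemental partials $c_{ij}$. The class and method names (`Tape`, `var`, `add`, `mul`, `sin`, `exp`) are hypothetical and cover only a few operations; for convenience the independents are numbered from 0 rather than from $1-n$.

```python
import math

class Tape:
    """Evaluation trace: the value of every v_i plus its nonzero partials c_ij."""

    def __init__(self):
        self.values = []    # v_i in evaluation order
        self.partials = []  # partials[i] = list of (j, c_ij) with j preceding i

    def _record(self, value, partials):
        self.values.append(value)
        self.partials.append(partials)
        return len(self.values) - 1

    def var(self, value):
        # independent variable: no predecessors, hence no partials
        return self._record(value, [])

    def add(self, j, k):
        return self._record(self.values[j] + self.values[k], [(j, 1.0), (k, 1.0)])

    def mul(self, j, k):
        vj, vk = self.values[j], self.values[k]
        return self._record(vj * vk, [(j, vk), (k, vj)])

    def sin(self, j):
        vj = self.values[j]
        return self._record(math.sin(vj), [(j, math.cos(vj))])

    def exp(self, j):
        e = math.exp(self.values[j])
        return self._record(e, [(j, e)])

# y = exp(x1 * x2) + sin(x1): n = 2 independents, l = 3 intermediates, m = 1
t = Tape()
x1, x2 = t.var(0.5), t.var(2.0)
v1 = t.mul(x1, x2)
v2 = t.exp(v1)
v3 = t.sin(x1)
y = t.add(v2, v3)
num_edges = sum(len(p) for p in t.partials)  # at most 2(l + m) for unary/binary ops
print(t.values[y], num_edges)
```

Each partial is computed alongside its elemental operation at comparable cost, which is the observation the operations count above rests on.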

The way in which the elemental partials $c_{ij}$ are handled differs amongst the various analytical differentiation methods. In what is variously known as automatic, algorithmic or computational differentiation, they are always evaluated as floating point numbers at the current argument. The same can be assumed for hand-written derivative codes, unless they are programmed within a computer algebra system, where the $c_{ij}$ can be defined and manipulated as algebraic expressions. In some cases applying the chain rule to these expressions may theoretically lead to significant simplifications and thus potentially provide the user with analytical insight. In the following section we reverse engineer one such class of examples and arrive at the tentative conclusion that the practical potential for symbolic simplifications during the differentiation process appears to be very slim indeed.


Indefinite Integral Scenario 


Suppose that

$$F(x) = \int^{x} \frac{P(t)}{Q(t)}\,dt$$

for two polynomials $P(x)$ and $Q(x)$ with $\deg(Q) > \deg(P)$. Besides a rational term, the symbolic expression for $F(x)$ is then likely to contain a welter of logarithms and arcus tangents, whose complexity may easily exceed that of the integrand $f(x) = P(x)/Q(x)$ by orders of magnitude. Fully symbolic differentiation will then of course lead back to an algebraic expression for $f(x)$, while automatic differentiation will combine the $c_{ij}$ in floating point arithmetic according to some variant of the chain rule and obtain ‘just’ a numerical value of $f(x)$ at the given point $x \in \mathbb{R}$. Moreover, due to cancellations that value may well be less accurate than the one obtained by plugging the particular argument $x$ into the formula for $f(x)$.

However, similar numerical instabilities are likely to affect the evaluation of $F(x)$ itself already. They may also show up in the form of an imaginary component when the coefficients of $P(x)$ and $Q(x)$ are real but given in floating point format. Then the roots of the denominator polynomial are already perturbed by unavoidable round-off, and symbolic differentiation of the resulting expression for $F(x)$ will usually not lead back to $f(x)$ but to some other rational function with a higher polynomial degree in the numerator or denominator. To avoid this effect, all coefficients of $f(x)$ must be specified as algebraic numbers so that the symbolic integration can be performed exactly. This process typically involves rational numbers with enormous coefficients and thus requires a large computational effort.

Hence on practical models one may well be better advised to evaluate $F(x)$ by a numerical quadrature, yielding highly accurate results at a fraction of the computing time. Analytically differentiating a nonadaptive quadrature procedure yields the same quadrature applied to the derivatives of the integrand, whose exact integral is $F'(x) = f(x)$. Hence the resulting values are quite likely to be good approximations to the original integrand $f(x)$, and they are the exact derivatives of the approximate values computed for $F(x)$ by the quadrature.
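The last claim can be checked with a short Python sketch. It assumes the antiderivative is taken from 0, rescales the integral to $F(x) = x\int_0^1 f(xs)\,ds$, and uses a fixed composite Simpson rule as the nonadaptive quadrature; the integrand $f(t) = 1/(1+t^2)$ (so that $F = \arctan$) and the node count are illustrative choices. Differentiating the quadrature formula with respect to $x$ by hand gives the same rule applied to $\partial_x[x f(xs)] = f(xs) + xs\,f'(xs)$.

```python
import math

def f(t):
    return 1.0 / (1.0 + t * t)            # integrand P/Q with P = 1, Q = 1 + t^2

def df(t):
    return -2.0 * t / (1.0 + t * t) ** 2  # its derivative

def simpson_weights(n):
    """Nodes and weights of the composite Simpson rule on [0, 1], n even."""
    h = 1.0 / n
    nodes = [j * h for j in range(n + 1)]
    weights = [h / 3.0 * (1 if j in (0, n) else 4 if j % 2 else 2)
               for j in range(n + 1)]
    return nodes, weights

NODES, WEIGHTS = simpson_weights(200)  # fixed grid: the rule does not adapt to x

def F_quad(x):
    """Approximate F(x) = int_0^x f(t) dt = x * int_0^1 f(x s) ds."""
    return x * sum(w * f(x * s) for s, w in zip(NODES, WEIGHTS))

def dF_quad(x):
    """Exact derivative of F_quad: the same rule applied to d/dx [x f(x s)]."""
    return sum(w * (f(x * s) + x * s * df(x * s)) for s, w in zip(NODES, WEIGHTS))

x = 0.7
print(F_quad(x), math.atan(x))  # quadrature value versus the exact antiderivative
print(dF_quad(x), f(x))         # derivative of the quadrature versus the integrand
```

Because the grid is fixed, `dF_quad` is the exact derivative of `F_quad`; an adaptive rule would lose this property, as discussed next.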


Lack of Smoothness 


Adaptive quadratures, on the other hand, may vary grid points and coefficient values in a nondifferentiable or even discontinuous fashion. Then derivatives of the quadrature value may well not exist in the classical sense at some critical arguments $x$. This difficulty is likely to arise in the form of program branches in all substantial scientific codes, and there is no agreement yet on how to deal with it. In most situations one can still compute one-sided directional derivatives as well as generalized gradients and Jacobians [9]. Naturally, computing difference quotients of nonsmooth functions is also a risky proposition. Generally, optimal results in terms of accuracy and efficiency can only be expected from a derivative code developed by a knowledgeable user, possibly with the help of program analysis and transformation tools.


Predictability of Complexities 


With regard to spatial and temporal complexity, the following basic distinction applies between the analytical differentiation methods sketched above. The cost of fully symbolic differentiation seems impossible to predict. It can sometimes be very low due to fortuitous cancellations, but it is more likely to grow drastically with the complexity of the underlying function. In contrast, the relative cost incurred by the various modes of automatic differentiation can always be bounded a priori in terms of the number of independent and dependent variables. Moreover, as we will see below, these bounds can sometimes be substantially undercut for certain structured problems.

Another advantage of automatic differentiation compared to a fully symbolic approach is that restrictions and projections of Jacobians and Hessians to certain subspaces of the function's domain and range can be built into the differentiation process, with corresponding savings in computational complexity. In the remainder of this article we will therefore focus on the complexity of various automatic differentiation techniques, always making sure that no other known approach is superior in terms of accuracy and complexity on general vector functions defined by a sequence of elemental assignments.


Goal-Oriented Differentiation 


The two-stranded chain scenario above illustrates the crucial importance of suitable representations of the mathematical objects whose complexity we try to quantify here. So one really has to be more specific about what one means by computing a function, gradient, Jacobian, Hessian, or their restriction and projection to certain subspaces. At the very least we have to distinguish the (repeated) evaluation in floating point arithmetic at various arguments from the preparation of a suitable procedure for doing so. This preparation stage actually comes first and might be considered the symbolic part of the differentiation process. It usually involves no floating point operations, except possibly the propagation and simplification of some constants. This happens for example when a source code for evaluating $F$ is precompiled into a source code for jointly evaluating $F(x)$ and its Jacobian $F'(x)$ at a given argument $x$. In the remainder we will neglect the preparation effort, presuming that it can be amortized over many numerical evaluations, as is typically the case in iterative or time-dependent computations.
In general, it is not a priori understood that $F'(x)$ should be returned as a rectangular array of floating point numbers, especially if it is sparse or otherwise structured. Its cheapest representation is the sparse triangular matrix

$$C = C(x) = (c_{ij})_{i,j = 1-n, \ldots, l+m}.$$

The nonzero entries in $C$ can be obtained during the evaluation of $F$ at a given $x$ for little extra cost in terms of arithmetic operations, so that

$$\mathrm{OPS}\{x \mapsto C\} \sim \mathrm{OPS}\{x \mapsto F\}.$$


As we will see below, the nonzeros in $C$ allow the direct calculation of the products

$$F'(x)\,\dot x \in \mathbb{R}^m \quad \text{for } \dot x \in \mathbb{R}^n$$

and

$$F'(x)^{\top}\bar y^{\top} \in \mathbb{R}^n \quad \text{for } \bar y^{\top} \in \mathbb{R}^m,$$

using just one multiplication and one addition per $c_{ij} \neq 0$. So if our goal is the iterative calculation of an approximate Newton step using just a few matrix-vector products, we are well advised to work with the collection of nonzero entries of $C$, provided it can be kept in memory. If, on the other hand, we expect to take a large number of iterations or wish to compute a matrix factorization of the Jacobian, we first have to accumulate all $mn$ partial derivatives $\partial y_i/\partial x_j$ from the elemental partials $c_{ij}$. It is well understood that a subsequent in-place triangular factorization of the Jacobian $F'(x)$ yields an ideal representation if one needs to multiply the Jacobian as well as its inverse by several vectors and matrices from the left or right. Hence we have at least three possible ways in which a Jacobian can be represented and kept in storage:

• unaccumulated: computational graph;
• accumulated: rectangular array;
• factorized: two triangular arrays.

Here the arrays may be replaced by sparse matrix structures. For the time being we note that Jacobians and Hessians can be provided in various representations at various costs for various purposes. Which one is most appropriate depends strongly on the structure of the problem function $F(x)$ at hand and on the final numerical purpose of evaluating derivatives in the first place. The interpretation of $C$ as a computational graph goes back to L.V. Kantorovich and requires a little more explanation.


The Computational Graph 


With respect to the precedence relation

$$j \prec i \iff c_{ij} \not\equiv 0 \iff (j, i) \in E,$$

the indices $i, j \in V = [1-n, \ldots, l+m]$ form a directed graph with the edge set $E$. Since by assumption $j \prec i$ implies $j < i$, the graph is acyclic, and the transitive closure of $\prec$ defines a partial ordering between the corresponding variables $v_i$ and $v_j$. The minimal and maximal elements with respect to that order are exactly the independent variables $v_{j-n} = x_j$ with $j = 1, \ldots, n$ and the dependent variables $v_{l+i} = y_i$ with $i = 1, \ldots, m$, respectively. For the two-stranded chain scenario with $l = 3$ one obtains a computational graph of the following form:

[Figure: computational graph of the two-stranded chain scenario with l = 3]


Assuming that all elemental $\varphi_i$ are unary functions or binary operations, we find $|E| \le 2(l+m) \sim l$. One may always annotate the graph vertices with the elemental functions $\varphi_i$ and the edges with the nonvanishing elemental partials $c_{ij}$. For most purposes the $\varphi_i$ do not really matter, and we may represent the graph $(V, E)$ simply by the sparse matrix $C$.
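The statement that the nonzeros of $C$ alone suffice for the products $F'(x)\dot x$ and $\bar y F'(x)$, at one multiply-add per edge, can be sketched in Python. The encoding is an assumption made for illustration: `preds[i]` lists the pairs $(j, c_{ij})$ of vertex $i$, vertices are numbered in evaluation order starting from 0, the first $n$ are the independents and the last $m$ the dependents. The small graph and its arc values are made up.

```python
def jvp(n, m, preds, xdot):
    """Forward sweep: vdot_i = sum_j c_ij * vdot_j, vertices in evaluation order."""
    N = len(preds)
    vdot = list(xdot) + [0.0] * (N - n)  # first n vertices are the independents
    for i in range(n, N):
        vdot[i] = sum(c * vdot[j] for j, c in preds[i])
    return vdot[N - m:]                   # last m vertices are the dependents

def vjp(n, m, preds, ybar):
    """Reverse sweep: vbar_j += vbar_i * c_ij, vertices in reverse order."""
    N = len(preds)
    vbar = [0.0] * (N - m) + list(ybar)
    for i in range(N - 1, n - 1, -1):
        for j, c in preds[i]:
            vbar[j] += vbar[i] * c
    return vbar[:n]

# Hypothetical graph with n = 2, l = 2, m = 2; preds[i] lists (j, c_ij)
preds = [[], [],                # vertices 0, 1: x_1, x_2
         [(0, 2.0), (1, 3.0)],  # vertex 2: intermediate
         [(2, -1.0)],           # vertex 3: intermediate
         [(2, 4.0), (3, 0.5)],  # vertex 4: y_1
         [(1, 1.0), (3, 2.0)]]  # vertex 5: y_2

print(jvp(2, 2, preds, [1.0, 0.0]))  # first column of F'(x)
print(vjp(2, 2, preds, [1.0, 0.0]))  # first row of F'(x)
```

Both sweeps visit each stored $c_{ij}$ exactly once, so their cost is proportional to the number of edges, independently of whether the Jacobian is ever accumulated.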


Forward Mode 
Given some vector $\dot x = (\dot v_{j-n})_{j=1,\ldots,n} \in \mathbb{R}^n$, there exist derivatives

$$\dot v_i = \frac{d}{d\alpha}\, v_i(x + \alpha \dot x)\Big|_{\alpha = 0} \quad \text{for } 1 \le i \le l+m.$$

By the chain rule these $\dot v_i$ satisfy the recurrence

$$\dot v_i = \sum_{j \prec i} c_{ij}\,\dot v_j \quad \text{for } i = 1, \ldots, l+m. \qquad (2)$$

The resulting tangent vector $\dot y = (\dot v_{l+i})_{i=1,\ldots,m}$ satisfies $\dot y = F'(x)\,\dot x$, and it is obtained at a cost proportional to $l$. Instead of propagating derivatives with respect to just one direction vector $\dot x$, one may amortize certain overheads by bundling $p$ of them into a matrix $\dot X \in \mathbb{R}^{n \times p}$ and then computing simultaneously $\dot Y = F'(x)\,\dot X \in \mathbb{R}^{m \times p}$. The cost of this vector forward mode of automatic differentiation is given by

$$\mathrm{OPS}\{C \mapsto \dot Y\} \sim p\,l \sim p\,\mathrm{OPS}\{x \mapsto y\}. \qquad (3)$$


If the columns of $\dot X$ are Cartesian basis vectors $e_j \in \mathbb{R}^n$, the corresponding columns of the resulting $\dot Y$ are the $j$th columns of the Jacobian. Hence by setting $\dot X = I$ with $p = n$ we may compute the whole Jacobian at a temporal complexity proportional to $nl$. Fortunately, in many applications the whole Jacobian is either not needed at all or, due to its sparsity pattern, it may be reconstructed from its compression $\dot Y = F'(x)\dot X$ for a suitable seed matrix $\dot X$. As in the case of difference quotients, this matrix may be chosen according to the Curtis–Powell–Reid [6] or the Newsam–Ramsdell [12] approach, with $p$ usually close to the maximal number of nonzeros in any row of the Jacobian.
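A minimal Python sketch of the vector forward mode combined with CPR-style compression. A hypothetical `Tangent` class carries a value together with $p$ directional derivatives, one per column of the seed $\dot X$. The test function is an illustrative assumption with a tridiagonal Jacobian, so columns $j$ with equal $j \bmod 3$ never share a nonzero row and $p = 3$ seeds suffice regardless of $n$; this fixed grouping is a special case, not the general CPR grouping algorithm.

```python
import math

class Tangent:
    """Value with p directional derivatives (vector forward mode)."""

    def __init__(self, val, dot):
        self.val, self.dot = val, list(dot)

    def __add__(self, other):
        return Tangent(self.val + other.val,
                       [a + b for a, b in zip(self.dot, other.dot)])

    def __mul__(self, other):
        return Tangent(self.val * other.val,
                       [other.val * a + self.val * b for a, b in zip(self.dot, other.dot)])

def tsin(u):
    return Tangent(math.sin(u.val), [math.cos(u.val) * a for a in u.dot])

def F(x):
    """Illustrative function with tridiagonal Jacobian:
    y_i = x_i * x_i + sin(x_{i-1}) + x_i * x_{i+1}, missing neighbours dropped."""
    n = len(x)
    y = []
    for i in range(n):
        yi = x[i] * x[i]
        if i > 0:
            yi = yi + tsin(x[i - 1])
        if i < n - 1:
            yi = yi + x[i] * x[i + 1]
        y.append(yi)
    return y

def compressed_jacobian(x, p=3):
    """Ydot = F'(x) Xdot with seed Xdot[j][k] = 1 if j % p == k, else 0."""
    n = len(x)
    xs = [Tangent(x[j], [1.0 if j % p == k else 0.0 for k in range(p)])
          for j in range(n)]
    return [yi.dot for yi in F(xs)]  # row i of the m x p compressed Jacobian

def recover_tridiagonal(Ydot, n, p=3):
    """Entry (i, j) of a tridiagonal Jacobian sits alone in group j % p of row i."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - 1), min(n, i + 2)):
            J[i][j] = Ydot[i][j % p]
    return J
```

Three forward sweeps thus deliver the full $n \times n$ Jacobian for any $n$, instead of $n$ sweeps with $\dot X = I$.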


Bauer’s Formula 


Using the recurrence for the $\dot v_i$ given above, one may also obtain an explicit expression for each individual partial derivative $\partial y_i/\partial x_j$. Namely, it is given by the sum over the products of all arc values $c_{ij}$ along all paths connecting the minimal node $v_{j-n}$ with the maximal node $v_{l+i}$. This formula, due to F.L. Bauer [1], implies in particular that the $ij$th Jacobian entry vanishes identically exactly when there is no path connecting the nodes $j-n$ and $l+i$ in the computational graph. In general the number of distinct paths in the graph is very large, and it represents exactly the length of the formulas obtained if one expresses each $y_i$ directly in terms of all $x_j$ that it depends on. Hence we may conclude

$$\mathrm{OPS}\{C \overset{\text{Bauer}}{\longmapsto} F'\} \sim \mathrm{OPS}\{x \overset{\text{formula}}{\longmapsto} y\}.$$


In the two-stranded chain scenario considered above, both operations counts would be of order $2^l$, which is obviously an unacceptable effort. Fortunately, the vector forward mode and Bauer's formula are just two special choices amongst many ways of accumulating the Jacobian $F'(x)$ from the computational graph $C$. The most celebrated alternative is the reverse or backward mode of automatic differentiation.
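Bauer's formula translates directly into a path enumeration. The Python sketch below assumes a dictionary encoding `succ[j] = [(i, c_ij), ...]` and a hypothetical layered graph resembling the two-stranded chain (every node at level $k$ feeds both nodes at level $k+1$, with made-up arc values). It checks that the path sum agrees with the forward recurrence (2) and counts the paths, whose number doubles with each additional level.

```python
def two_strand_graph(l):
    """Layered DAG: nodes (k, s) for level k = 0..l and strand s in {0, 1}.
    Every node at level k feeds both nodes at level k + 1 (illustrative arc values)."""
    succ = {}
    for k in range(l):
        for s in (0, 1):
            succ[(k, s)] = [((k + 1, t), 1.0 + 0.1 * k + 0.2 * s - 0.3 * t)
                            for t in (0, 1)]
    return succ

def bauer(succ, src, dst):
    """Sum over all src -> dst paths of the product of arc values; also count paths."""
    if src == dst:
        return 1.0, 1
    total, count = 0.0, 0
    for nxt, c in succ.get(src, []):
        val, cnt = bauer(succ, nxt, dst)
        total += c * val
        count += cnt
    return total, count

def forward_entry(succ, l, src_strand, dst_strand):
    """Same derivative via recurrence (2): one pass over the l levels."""
    dot = {(0, 0): 0.0, (0, 1): 0.0}
    dot[(0, src_strand)] = 1.0
    for k in range(l):
        new = {(k + 1, 0): 0.0, (k + 1, 1): 0.0}
        for s in (0, 1):
            for nxt, c in succ[(k, s)]:
                new[nxt] += c * dot[(k, s)]
        dot = new
    return dot[(l, dst_strand)]

for l in range(1, 7):
    g = two_strand_graph(l)
    val, cnt = bauer(g, (0, 0), (l, 1))
    print(l, cnt, val, forward_entry(g, l, 0, 1))
```

The recursive path sum visits $2^{l-1}$ paths between a single input and output, while the forward pass costs only $O(l)$.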


Reverse Mode 


Rather than propagating directional derivatives $\dot v_i$ forward through the computational graph, one may also propagate adjoint quantities $\bar v_i$ backward. To define them properly one must perturb the original evaluation loop by rounding errors $\delta_i$ so that now

$$v_i = \delta_i + \varphi_i\big((v_j)_{j \prec i}\big) \quad \text{for } i = 1-n, \ldots, l.$$

Then the resulting vector $y$ is a function not only of $x$ but also of the vector of small perturbations $(\delta_i)_{i=1-n,\ldots,l}$. Given any row vector of weights $\bar y = (\bar v_{l+i})_{i=1,\ldots,m}$, we obtain the sensitivities

$$\bar v_i = \frac{\partial}{\partial \delta_i}\,\bar y\,y\,\Big|_{\delta = 0} \quad \text{for } 1-n \le i \le l,$$

where all other perturbations $\delta_j$ with $j \neq i$ are set to zero during the differentiation. The adjoint components $\bar v_{j-n} = \bar x_j$ form the row vector $\bar x = \bar y F'(x) \in \mathbb{R}^n$, which is simply the gradient of the linear combination $\bar y F(x)$. In the optimization context this scalar-valued function is usually a Lagrangian, whose gradient and Hessian figure prominently in the first and second order optimality conditions. The amazing thing is that, as a consequence of the chain rule, such gradients can be computed at the same cost as tangents by using the backward recurrence

$$\bar v_j = \sum_{i \succ j} \bar v_i\, c_{ij} \quad \text{for } j = l, \ldots, 1-n. \qquad (4)$$


Just like in the forward scalar recurrence (2), each elemental partial $c_{ij} \neq 0$ occurs exactly once, and we may amortize costs by bundling several $\bar y$ into an adjoint seed matrix $\bar Y \in \mathbb{R}^{q \times m}$. This vector reverse mode yields the matrix $\bar X = \bar Y F'(x) \in \mathbb{R}^{q \times n}$ at the cost

$$\mathrm{OPS}\{C \mapsto \bar X\} \sim q\,l \sim q\,\mathrm{OPS}\{x \mapsto y\}.$$

Again the whole Jacobian is obtained directly if we seed $\bar Y = I$ with $q = m$. Hence we find by comparison with (3), as a rule of thumb, that the reverse mode is preferable if $m \ll n$, i.e., if there are not nearly as many dependents as independents. In classical NLPs we may think of $m$ as the number of active constraints plus one, which is often much smaller than $n$, the number of variables. In unconstrained optimization we have $m = 1$, so that the gradient of the objective $F$ can be computed with essentially the same effort as $F$ itself. In the sparse case we may now employ column rather than row compression, with $q$ roughly equal to the maximal number of nonzeros in any column of $F'(x)$.
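The backward recurrence (4) can be sketched in Python with a small operator-overloading tape: the forward pass records each elemental partial $c_{ij}$ once, and a single reverse sweep then delivers the whole gradient of a scalar function ($m = 1$, $\bar y = 1$) regardless of $n$. The `Var` class, the global `TAPE` list and the Rosenbrock test function are illustrative assumptions; real AD tools organize taping quite differently.

```python
TAPE = []  # TAPE[i] holds the list of (j, c_ij) for node i, in evaluation order

class Var:
    def __init__(self, val, preds=()):
        self.val = val
        self.idx = len(TAPE)
        TAPE.append(list(preds))  # store the elemental partials of this node

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val + other.val, [(self.idx, 1.0), (other.idx, 1.0)])

    def __sub__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val - other.val, [(self.idx, 1.0), (other.idx, -1.0)])

    def __rsub__(self, other):
        # constant - Var: only one nonconstant argument
        return Var(other - self.val, [(self.idx, -1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val * other.val, [(self.idx, other.val), (other.idx, self.val)])

    __radd__ = __add__
    __rmul__ = __mul__

def gradient(f, x):
    """Reverse mode: one forward pass to record, one backward sweep (4)."""
    TAPE.clear()
    xs = [Var(v) for v in x]
    y = f(xs)
    vbar = [0.0] * len(TAPE)
    vbar[y.idx] = 1.0                       # seed ybar = 1
    for i in range(len(TAPE) - 1, -1, -1):  # reverse order of evaluation
        for j, c in TAPE[i]:
            vbar[j] += vbar[i] * c          # vbar_j += vbar_i * c_ij
    return y.val, [vbar[v.idx] for v in xs]

def rosenbrock(x):
    s = 0.0
    for i in range(len(x) - 1):
        a = x[i + 1] - x[i] * x[i]
        b = 1.0 - x[i]
        s = s + 100.0 * a * a + b * b
    return s

print(gradient(rosenbrock, [-1.2, 1.0, 0.5, 0.3]))
```

The backward loop touches every recorded $c_{ij}$ exactly once, mirroring the cost statement above; the price is that the whole tape must be kept in memory.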

For suitable seeds $\bar Y$ the column compression $\bar X = \bar Y F'(x)$ allows the reconstruction of the complete Jacobian $F'(x)$. Furthermore, row and column compression can be combined, yielding for example Jacobians with arrowhead structure at the cost of roughly $p + q = 3$ function evaluations. In that case one may use

$$\dot X = \begin{pmatrix} 1 & \cdots & 1 & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}^{\top} \in \mathbb{R}^{n \times 2} \quad \text{and} \quad \bar Y = e_n^{\top}.$$

Then $\bar X = \bar Y F'(x)$ is the last row of the arrowhead matrix $F'(x)$, and the two columns of $\dot Y = F'(x)\dot X$ contain all other nonzero entries. For pure row or column compression, dense rows or columns always force $p = n$ or $q = m$, respectively. Hence the combination of forward and reverse differentiation offers the potential for great savings. In either case projections and restrictions of the Jacobian to subspaces of the vector function's domain and range can be built into the differentiation process, which is part of the goal orientation we alluded to before.
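The arrowhead example can be sketched in Python with hand-coded tangent and adjoint routines, which the text counts among the analytical differentiation options. The function is a hypothetical choice, $y_i = x_i^2 + x_i x_n$ for $i < n$ and $y_n = \sum_j \sin x_j$, whose Jacobian has nonzeros only on the diagonal, in the last column and in the last row; two tangents with the seed $\dot X$ above plus one adjoint with $\bar Y = e_n^{\top}$ recover all of it.

```python
import math

def jvp(x, xdot):
    """Hand-coded tangent F'(x) xdot for y_i = x_i^2 + x_i x_n (i < n), y_n = sum sin x_j."""
    n = len(x)
    ydot = [(2 * x[i] + x[-1]) * xdot[i] + x[i] * xdot[-1] for i in range(n - 1)]
    ydot.append(sum(math.cos(x[j]) * xdot[j] for j in range(n)))
    return ydot

def vjp(x, ybar):
    """Hand-coded adjoint ybar F'(x) for the same function."""
    n = len(x)
    xbar = [ybar[i] * (2 * x[i] + x[-1]) + ybar[-1] * math.cos(x[i])
            for i in range(n - 1)]
    xbar.append(sum(ybar[i] * x[i] for i in range(n - 1)) + ybar[-1] * math.cos(x[-1]))
    return xbar

def arrowhead_jacobian(x):
    """Recover F'(x) from two tangents (columns of Xdot) and one adjoint (Ybar = e_n^T)."""
    n = len(x)
    col1 = jvp(x, [1.0] * (n - 1) + [0.0])    # rows i < n: diagonal entries J_ii
    col2 = jvp(x, [0.0] * (n - 1) + [1.0])    # last column J_in
    last_row = vjp(x, [0.0] * (n - 1) + [1.0])
    J = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        J[i][i] = col1[i]
        J[i][n - 1] = col2[i]
    J[n - 1] = last_row
    return J

print(arrowhead_jacobian([0.4, -0.9, 1.3, 0.2, 0.7]))
```

Pure forward compression would need $p = n$ here because the last row is dense, and pure reverse compression $q = n$ because the last column is dense; mixing the two modes needs only three sweeps.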


Second Order Adjoints 


Rather than separately propagating some first derivatives forward and others in reverse, and then combining the results to compute Jacobian matrices efficiently, one may compose these two fundamental modes to compute second derivatives like Hessians of Lagrangians. More specifically, we obtain by directional differentiation of the adjoint relation $\bar x = \bar y F'(x)$ the second order adjoint

$$\dot{\bar x} = \bar y F''(x)\,\dot x \in \mathbb{R}^n.$$

Here we have assumed that the adjoint vector $\bar y$ is constant. We have also taken liberties with matrix-vector notation by suggesting that the $m \times n \times n$ derivative tensor $F''(x)$ can be multiplied by the row vector $\bar y \in \mathbb{R}^m$ from the left and the column vector $\dot x \in \mathbb{R}^n$ from the right, yielding a row vector $\dot{\bar x}$ of dimension $n$. In an optimization context $\bar y$ should be thought of as a vector of Lagrange multipliers and $\dot x$ as a feasible direction. By composing the complexity bounds for the reverse and the forward mode one obtains the estimates

$$\mathrm{OPS}\{x \mapsto y\} \sim \mathrm{OPS}\{x, \dot x \mapsto \dot y\} \sim \mathrm{OPS}\{x, \bar y \overset{\text{rev}}{\longmapsto} \bar x\} \sim \mathrm{OPS}\{x, \dot x, \bar y \overset{\text{ad}}{\longmapsto} \dot{\bar x}\}.$$

Here, ad represents reverse differentiation followed by forward differentiation, or vice versa. The former interpretation is a little easier to implement and involves only one forward and one backward sweep through the computational graph.
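A minimal Python sketch of the ‘reverse then forward’ interpretation: a hand-written adjoint code for a small example $F:\mathbb{R}^3 \to \mathbb{R}^2$ is simply executed on dual numbers $x + \varepsilon\dot x$, so that the $\varepsilon$-parts of its outputs form $\dot{\bar x} = \bar y F''(x)\dot x$. The `Dual` class and the example function $y_1 = x_1x_2x_3$, $y_2 = \sin x_1 + x_2x_3$ are hypothetical and cover only the operations needed here.

```python
import math

class Dual:
    """a + b*eps with eps^2 = 0: a value and one directional derivative."""

    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a + o.a, self.b + o.b)

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)

    __radd__ = __add__
    __rmul__ = __mul__

def cos(u):
    # works on plain floats as well as on Dual numbers
    return Dual(math.cos(u.a), -math.sin(u.a) * u.b) if isinstance(u, Dual) else math.cos(u)

def adjoint(x1, x2, x3, yb1, yb2):
    """Hand-coded reverse mode for y1 = x1*x2*x3, y2 = sin(x1) + x2*x3."""
    v1 = x1 * x2                    # forward sweep: only v1 is needed later
    # backward sweep computing xbar = ybar F'(x)
    v1b = yb1 * x3
    x3b = yb1 * v1 + yb2 * x2
    x2b = yb2 * x3 + v1b * x1
    x1b = yb2 * cos(x1) + v1b * x2
    return x1b, x2b, x3b

def second_order_adjoint(x, xdot, ybar):
    """Run the adjoint code on x + eps*xdot: returns xbar and xbar_dot."""
    duals = [Dual(a, b) for a, b in zip(x, xdot)]
    out = adjoint(*duals, *ybar)
    return [o.a for o in out], [o.b for o in out]

print(second_order_adjoint([0.5, -1.5, 2.0], [0.3, 0.7, -0.2], [1.5, -0.8]))
```

One forward and one backward sweep, each carried out in dual arithmetic, give both $\bar x$ and $\dot{\bar x}$, which is why the cost stays a small multiple of that of $F$.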


Operations Counts and Overheads 


From a practical point of view one would of course like to know the proportionality factors in the relations above. If one counts just multiplication operations, then $\dot y$ and $\bar x$ are at worst 3 times as expensive as $y$, and $\dot{\bar x}$ is at most 9 times as expensive. A nice intuitive example is the calculation of the determinant $y$ of a $\sqrt n \times \sqrt n$ matrix whose entries form the variable vector $x$. Then we have $m = 1$ and

$$\mathrm{OPS}\{x \mapsto y\} = \tfrac{1}{3}\,n\sqrt{n} + O(n)$$

multiplications if one uses an LU factorization. It can then be seen that $\bar y = 1/y$ makes $\bar x$ the transpose of the inverse matrix, and the resulting cost estimate of $n\sqrt{n} + O(n)$ multiplications conforms exactly with that for the usual substitution procedure.
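The claim about the determinant can be checked numerically with a short Python sketch. It computes $y = \det A$ by Gaussian elimination with partial pivoting, estimates $\partial y/\partial a_{ij}$ by central differences scaled by $\bar y = 1/y$, and compares the result with the transpose of $A^{-1}$ obtained by Gauss–Jordan elimination. The finite differences merely stand in for an actual reverse-mode code, and the test matrix is arbitrary.

```python
def det_lu(A):
    """Determinant via Gaussian elimination with partial pivoting (about N^3/3 mults)."""
    M = [row[:] for row in A]
    N = len(M)
    det = 1.0
    for k in range(N):
        p = max(range(k, N), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0.0:
            return 0.0
        if p != k:
            M[k], M[p] = M[p], M[k]
            det = -det                  # a row swap flips the sign
        det *= M[k][k]
        for r in range(k + 1, N):
            f = M[r][k] / M[k][k]
            for c in range(k, N):
                M[r][c] -= f * M[k][c]
    return det

def inverse(A):
    """Gauss-Jordan inversion with partial pivoting."""
    N = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(N)] for i, row in enumerate(A)]
    for k in range(N):
        p = max(range(k, N), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]
        for r in range(N):
            if r != k:
                f = M[r][k]
                M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    return [row[N:] for row in M]

def scaled_det_gradient(A, h=1e-6):
    """Central differences of y = det(A) with respect to each a_ij, scaled by ybar = 1/y."""
    N = len(A)
    y = det_lu(A)
    G = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[i][j] += h
            Am[i][j] -= h
            G[i][j] = (det_lu(Ap) - det_lu(Am)) / (2 * h) / y
    return G

A = [[4.0, 1.0, 2.0], [0.5, 3.0, -1.0], [1.0, -2.0, 5.0]]
print(scaled_det_gradient(A))
print(inverse(A))  # its transpose should match the scaled gradient
```

Since the determinant is linear in each single entry, the central differences are exact up to round-off here.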

However, these operations count ratios are no reliable indication of actual runtimes, which depend very strongly on the computing platform, the particular problem at hand, and the characteristics of the AD tool. Implementations of the vector forward mode like ADIFOR [3] that generate compilable source codes can easily compete with divided differences, i.e., compute $p$ directional derivatives in the form $\dot Y = F'(x)\dot X$ at the cost of about $p$ function evaluations. For sizeable $p \sim 10$ they are usually faster than divided differences, unless the roughly $p$-fold increase in storage results in too much paging onto disk. The reverse mode is an entirely different ball game, since most intermediate values $v_i$ and some control flow hints need to be saved first and retrieved later, which can easily make the calculation of adjoints memory bound. This memory access overhead can be partially amortized in the vector reverse mode, which yields a bundle $\bar X = \bar Y F'(x)$ of $q$ gradient vectors. For example, in multicriteria optimization one may well have $q \sim 10$ objectives or soft constraints whose gradients are needed simultaneously.


Worst-Case Optimality 


Counting only multiplications, we obtain for Jacobians $F' \in \mathbb{R}^{m \times n}$ the complexity bound

$$\mathrm{OPS}\{x \mapsto F'\} \le 3\min(n, m)\,\mathrm{OPS}\{x \mapsto y\}.$$

Here, $n$ and $m$ can be reduced to the maximal number of nonzero entries in the rows and columns of the Jacobian, respectively.

Similarly, we have for the one-sided projection of the Lagrangian Hessian

$$H(x, \bar y) = \bar y F''(x) = \sum_{i=1}^{m} \bar y_i\,\nabla^2 F_i(x) \in \mathbb{R}^{n \times n}$$

onto the space spanned by the columns of $\dot X \in \mathbb{R}^{n \times p}$:

$$\mathrm{OPS}\{x \mapsto H(x, \bar y)\dot X\} \le 9p\,\mathrm{OPS}\{x \mapsto y\}.$$


As we already discussed for indefinite integrals, there are certainly functions whose derivatives can be evaluated much more cheaply than the functions themselves, for example by using a computer algebra package. Note that here again we have neglected the preparation effort, which may be very substantial for symbolic differentiation. Nevertheless, the estimates given above for AD are optimal in the sense that there are vector functions $F$ defined by evaluation procedures of the form (1) for which no differentiation process imaginable can produce the Jacobian and projected Hessian significantly more cheaply than the given cost bound divided by a small constant. Here, producing these matrices is understood to mean calculating all their elements explicitly, which may or may not actually be required by the overall computation.

Consider, for example, the cubic vector function

$$F(x) = x + \frac{b\,(a^{\top}x)^3}{3} \quad \text{with } a, b \in \mathbb{R}^n.$$

Its Jacobian and projected Hessian are given by

$$F'(x) = I + b\,(a^{\top}x)^2\,a^{\top} \in \mathbb{R}^{n \times n}$$

and

$$H(x, \bar y)\dot X = 2\,a\,(\bar y b)\,(a^{\top}x)\,a^{\top}\dot X \in \mathbb{R}^{n \times p}.$$

For general $a$, $b$ and $\dot X$, all entries of the matrices $F'(x)$ and $H(x, \bar y)\dot X$ are distinct and depend nontrivially on $x$. Hence their explicit calculation by any method requires at least $n^2$ or $np$ arithmetic operations, respectively. Since the evaluation of $F$ itself can be performed using just $3n$ multiplications and a few additions, the operations count ratios given above cannot be improved by more than a constant. There are other, more meaningful examples [9] with the same property, namely that their Jacobians and projected Hessians are orders of magnitude more expensive than the vector function itself. At least this is true if we insist on representing them as rectangular arrays of reals. This does not contradict our earlier observation that gradients are cheap, because the components of $F(x)$ cannot be considered as independent scalar functions. Rather, their simultaneous evaluation may involve many common subexpressions, as is the case for our rank-one example. These appear to be less beneficial for the corresponding derivative evaluation, thus widening the gap between function and derivative complexities.


Expensive = Redundant? 


The rank-one problem and similar examples for which explicit Jacobians or Hessians appear to be expensive have a property that one might call redundancy. Namely, as $x$ varies over some open neighborhood in its domain, the Jacobian $F'(x)$ stays in a lower-dimensional manifold of the linear space of all matrices with its format and sparsity pattern. In other words, the nonzero entries of the Jacobian are not truly independent of each other, so that computing them all and storing them separately may be wasteful. In the rank-one example the Jacobian $F'(x)$ is dense but belongs at all $x$ to the one-dimensional affine variety $\{I + \alpha\,b a^{\top} : \alpha \in \mathbb{R}\}$. Note that the vectors $a, b \in \mathbb{R}^n$ are assumed to be dense and constant parameter vectors of the problem at hand. Their elements all play the role of elemental partials $c_{ij}$, with the corresponding operations $\varphi_i$ being multiplications. Hence accumulating the extremely sparse triangular matrix $C$, which involves only $O(n)$ nonzero entries, to the dense $n \times n$ array $F'(x)$ is almost certainly a bad idea, no matter what the ultimate purpose of the calculation. In particular, if one wishes to solve linear systems in the Jacobian, the inverse formula of Sherman–Morrison–Woodbury provides a way of computing the solution of rank-one perturbations to diagonal matrices with $O(n)$ effort. This formula may be seen as a very special case of embedding linear systems in $F'$ into a much larger and sparse linear system involving $C$, as demonstrated in [11] and [5].
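For the rank-one example, the Sherman–Morrison formula solves $F'(x)z = r$ with $F'(x) = I + u a^{\top}$, $u = b(a^{\top}x)^2$, in $O(n)$ operations without ever forming the dense array: $z = r - u\,(a^{\top}r)/(1 + a^{\top}u)$. The Python sketch below uses arbitrary illustrative data; the helper `dense_jacobian` is hypothetical and serves only to check the result.

```python
def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def solve_jacobian(a, b, x, r):
    """Solve (I + u a^T) z = r with u = b (a^T x)^2 in O(n) via Sherman-Morrison."""
    s = dot(a, x) ** 2
    u = [bi * s for bi in b]
    denom = 1.0 + dot(a, u)  # must be nonzero for F'(x) to be invertible
    coef = dot(a, r) / denom
    return [ri - ui * coef for ri, ui in zip(r, u)]

def dense_jacobian(a, b, x):
    """Explicit n x n array I + b (a^T x)^2 a^T: O(n^2) entries, for checking only."""
    s = dot(a, x) ** 2
    n = len(x)
    return [[(1.0 if i == j else 0.0) + b[i] * s * a[j] for j in range(n)]
            for i in range(n)]

a = [0.5, -0.3, 0.8, 0.1]
b = [0.2, 0.7, -0.4, 0.9]
x = [2.0, 0.5, 1.5, 2.0]
r = [1.0, 2.0, 3.0, 4.0]
z = solve_jacobian(a, b, x, r)
print(z)
```

Only the $O(n)$ data $a$, $b$ and the scalar $a^{\top}x$ are used, in line with the argument that accumulating the dense Jacobian is wasteful here.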

As of now, all our examples for which the array representations of Jacobians and Hessians are orders of magnitude more expensive to evaluate than the underlying vector function exhibit this redundancy property. In other words, we know of no convincing example where vectors that one may actually wish to calculate as end products are necessarily orders of magnitude more expensive than the functions themselves. Especially for large problems it seems hard to imagine that array representations of the Jacobians and Hessians themselves are really something anybody would wish to look at, rather than just use as auxiliary quantities within the overall calculation.

So evaluating complete derivative arrays is a bit like fitting a handle to a wooden crate that needs to be moved about frequently. If the crate is of small weight and size, this job is easily performed using a few screws. If, on the other hand, the crate is large and heavy, fitting a handle is likely to require additional bracing and other reinforcements. Moreover, this effort is completely pointless, since nobody can just pick up the crate by the handle anyhow, and one might as well use a forklift in the first place.


Preaccumulation and Combinatorics 


The temporal complexity of both the forward and the reverse (vector) mode is proportional to the number of edges in the linearized computational graph. Hence one may try to reduce the number of edges by certain algebraic manipulations that leave the corresponding Jacobian unchanged, i.e., the linear mapping between $\dot x$ and $\dot y = F'(x)\dot x$ and, equivalently, that between $\bar y$ and $\bar x = \bar y F'(x)$. It can be easily checked that this is the case if, given an index $j$, one first updates

$$c_{ik} \mathrel{+}= c_{ij}\,c_{jk}$$

either for fixed $i \succ j$ and all $k \prec j$, or for fixed $k \prec j$ and all $i \succ j$, and then sets $c_{ij} = 0$ or $c_{jk} = 0$, respectively. In other words, either the edge $(j, i)$ or the edge $(k, j)$ is eliminated from the graph. This leads to fill-in by the creation of new arcs, unless all updated $c_{ik}$ were already nonzero beforehand. Eliminating all edges $(k, j)$ with $k \prec j$ or all edges $(j, i)$ with $i \succ j$ is equivalent and amounts to eliminating the vertex $j$ completely from the graph. After all intermediate vertices $1 \le j \le l$ are eliminated in some arbitrary order, the remaining edges $c_{ij}$ directly connect independent variables with dependent variables and are therefore entries of the Jacobian $F'(x)$. Hence, one refers to accumulation of the Jacobian $F'$ if all intermediate nodes are eliminated, and to preaccumulation if some of them remain, so that the Jacobian is represented by a simplified graph.

As we have indicated in the section on goal-oriented differentiation, one would have to look carefully at the problem function and the overall computational task to decide how much preaccumulation should be performed. Moreover, there are $\hat l!$ different orders in which a particular set of $\hat l \le l$ intermediate nodes can be eliminated, and even many more different ways of eliminating the corresponding set of edges. So far there have been only a few studies of heuristic criteria for finding efficient elimination orderings down to an appropriate preaccumulation level [9].
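Vertex elimination is easy to sketch in Python on a dictionary of edges, here assumed to be keyed as `c[(j, i)] = c_ij` (source first). Eliminating an intermediate vertex $j$ applies $c_{ik} \mathrel{+}= c_{ij}c_{jk}$ for all predecessors $k$ and successors $i$ and then deletes every edge incident to $j$. The graph and its arc values are made up for illustration; the run over all elimination orders shows that each yields the same Jacobian entries, although the intermediate fill-in may differ.

```python
import itertools

def eliminate_vertex(c, j):
    """Remove vertex j: c_ik += c_ij * c_jk for every path k -> j -> i."""
    preds = [(k, v) for (k, t), v in c.items() if t == j]
    succs = [(i, v) for (s, i), v in c.items() if s == j]
    for k, c_jk in preds:
        for i, c_ij in succs:
            c[(k, i)] = c.get((k, i), 0.0) + c_ij * c_jk  # may create fill-in
    for k, _ in preds:
        del c[(k, j)]
    for i, _ in succs:
        del c[(j, i)]

def jacobian_by_elimination(edges, order):
    """Eliminate the given intermediates; the remaining edges are Jacobian entries."""
    c = dict(edges)
    for j in order:
        eliminate_vertex(c, j)
    return c

# Vertices 0, 1: independent; 2, 3, 4: intermediate; 5, 6: dependent.
EDGES = {(0, 2): 2.0, (1, 2): 3.0, (2, 3): -1.0, (1, 3): 0.5,
         (3, 4): 1.5, (2, 4): 2.0, (4, 5): 1.0, (3, 5): -2.0,
         (4, 6): 0.25, (0, 6): 4.0}

for order in itertools.permutations([2, 3, 4]):
    print(order, jacobian_by_elimination(EDGES, order))
```

Stopping after eliminating only some of the intermediates would correspond to preaccumulation: the result is then a smaller graph rather than an array.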


Summary 


First and second derivative vectors of the form $\dot y = F'(x)\dot x$, $\bar x = \bar y F'(x)$ and $\dot{\bar x} = \bar y F''(x)\dot x$ can be evaluated for a fixed small multiple of the temporal complexity of the underlying relation $y = F(x)$. The calculation of the gradient $\bar x$ and the second order adjoint $\dot{\bar x}$ by the basic reverse method may require storage of order $l = \#\text{intermediates}$. This possibly unacceptable amount can be reduced to order $\log(l)$ at a slight increase in the operations count (see [8]).

Jacobians and one-sided projected Hessians can be composed column by column or row by row from vectors of the kind $\dot y$, $\bar x$ and $\dot{\bar x}$. For sparse derivative matrices, row and/or column compression using suitable seed matrices of type CPR or NR allows a substantial reduction of the computational effort. In some cases the nonzero entries of derivative matrices may be redundant, so that their calculation should be avoided if the overall computational goal can be reached in some other way. The attempt to evaluate derivative arrays with absolutely minimal effort leads to hard combinatorial problems.
> Complexity Theory
> Fractional Combinatorial Optimization

> Information-Based Complexity and 
Introduction


Least squares problems and techniques for solving them have a long history, briefly addressed by Björck [4]. In this article we focus on two classes of complex least squares problems. The first one is established by models involving differential equations. The other class is formed by least squares problems involving difficult models which need to be solved for many independent observational data sets. We call these least squares problems with massive data sets.
A Standard Formulation for Unconstrained Least Squares Problem


The unconstrained least squares problem can be expressed by

$$\min_{p}\, l_2(p), \qquad l_2(p) := \|r_1(p)\|_2^2 = \sum_{k=1}^{N} [r_{1k}(p)]^2, \qquad r_1 \in \mathbb{R}^N. \qquad (1)$$

The minimization of this functional, i.e., the minimization of the sum of weighted quadratic residuals, under the assumption that the statistical errors follow a Gaussian distribution with variances as in (4), provides a maximum likelihood estimator ([7] Chap. 7) for the unknown parameter vector $p$. This objective function dates back to Gauss [14], and in the mathematical literature the problem is synonymously called the least squares or $\ell_2$ approximation problem.

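The least squares structure (1) may arise either from a nonlinear over-determined system of equations

$$r_{1k}(p) = 0, \qquad k = 1,\ldots,N, \qquad N > n, \qquad (2)$$

or from a data fitting problem with N given data points $(t_k, y_k)$ and variances $\sigma_k^2$, a model function F(t, p), and n adjustable parameters p:

$$r_{1k} = r_{1k}(p) = Y_k - F_k(p) = \sqrt{w_k}\,\left[y_k - F(t_k,p)\right], \qquad (3)$$

with $Y_k := \sqrt{w_k}\,y_k$ and $F_k(p) := \sqrt{w_k}\,F(t_k,p)$. The weights $w_k$ are related to the variances $\sigma_k^2$ by

$$w_k := \beta / \sigma_k^2. \qquad (4)$$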


Traditionally, the weights are scaled to a variance of unit weight. The factor β is chosen so as to make the weights come out in a convenient range. In short vector notation we get

$$r := Y - F(p) = \left[r_{11}(p), \ldots, r_{1N}(p)\right]^T, \qquad F(p),\ Y \in \mathbb{R}^N.$$
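A minimal sketch of how the weighted residuals (3) and the objective (1) could be evaluated. The straight-line model, the three data points, the standard deviations and the choice β = 1 are hypothetical and only illustrate the weighting $w_k = \beta/\sigma_k^2$.

```python
import math

def weighted_residuals(ts, ys, sigmas, model, p, beta=1.0):
    """r_1k = sqrt(w_k) * (y_k - F(t_k, p)) with w_k = beta / sigma_k**2, cf. (3) and (4)."""
    return [math.sqrt(beta / s ** 2) * (y - model(t, p)) for t, y, s in zip(ts, ys, sigmas)]

def l2(residuals):
    """Objective (1): sum of the squared weighted residuals."""
    return sum(r * r for r in residuals)

def line(t, p):
    """Hypothetical model F(t, p) = p0 + p1 * t."""
    return p[0] + p[1] * t

r = weighted_residuals([0.0, 1.0, 2.0], [1.0, 3.5, 5.5], [0.5, 0.5, 1.0], line, (1.0, 2.0))
print(r, l2(r))  # → [0.0, 1.0, 0.5] 1.25
```

The second data point deviates by the same amount as the third, but its smaller standard deviation doubles its weighted residual.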


Our least squares problem requires us to provide the 
following input: 

1. model, 

2. data, 

3. variances associated with the data, 


4. measure of goodness of the fit, e.g., the Euclidean 
norm. 

In many practical applications, unfortunately, less attention is paid to the variances. It is also very important to point out that the use of the Euclidean norm requires prior information about the problem and the statistical properties of the data.


Solution Methods 


Standard methods for solving the linear version of (1), i.e., F(p) = Ap, are reviewed by Björck [4]. Nonlinear methods for unconstrained least squares problems are covered in detail by Xu [35,36,37]. In addition, we mention a popular method to solve unconstrained least squares problems: the Levenberg-Marquardt algorithm, proposed independently by Levenberg [21] and Marquardt [22] and sometimes also called "damped least squares". It modifies the eigenvalues of the normal equation matrix and tries to reduce the influence of eigenvectors related to small eigenvalues (cf. [8]). Damped (step-size cutting) Gauss-Newton algorithms combined with orthogonalization methods, which control the damping by natural level functions [6,9,10], seem to be superior to Levenberg-Marquardt type schemes and can be more easily extended to nonlinear constrained least squares problems.
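The following Python sketch illustrates the Levenberg-Marquardt idea for a hypothetical two-parameter model F(t, p) = a exp(-bt): the Gauss-Newton normal equations are regularized by adding λ diag(JᵀJ), and λ is decreased after a successful step and increased after a failed one. The diagonal scaling, the factor-of-ten update and the noise-free synthetic data are common textbook choices assumed here, not details taken from [21,22].

```python
import math

def levenberg_marquardt(ts, ys, a, b, lam=1e-3, max_iter=100, tol=1e-12):
    """Fit the hypothetical model F(t; a, b) = a * exp(-b t) to (ts, ys)."""
    def residuals(a, b):
        return [y - a * math.exp(-b * t) for t, y in zip(ts, ys)]

    def cost(r):
        return sum(ri * ri for ri in r)

    r = residuals(a, b)
    for _ in range(max_iter):
        # Jacobian rows dF/d(a, b) at the current iterate
        J = [(math.exp(-b * t), -a * t * math.exp(-b * t)) for t in ts]
        g11 = sum(ja * ja for ja, _ in J)
        g12 = sum(ja * jb for ja, jb in J)
        g22 = sum(jb * jb for _, jb in J)
        h1 = sum(ja * ri for (ja, _), ri in zip(J, r))
        h2 = sum(jb * ri for (_, jb), ri in zip(J, r))
        # Damped normal equations (J^T J + lam * diag(J^T J)) d = J^T r
        m11, m22 = g11 * (1.0 + lam), g22 * (1.0 + lam)
        det = m11 * m22 - g12 * g12
        da = (m22 * h1 - g12 * h2) / det
        db = (m11 * h2 - g12 * h1) / det
        r_new = residuals(a + da, b + db)
        if cost(r_new) < cost(r):
            a, b, r = a + da, b + db, r_new
            lam /= 10.0      # success: behave more like Gauss-Newton
            if abs(da) + abs(db) < tol:
                break
        else:
            lam *= 10.0      # failure: shorter, more gradient-like step
    return a, b

ts = [0.5 * k for k in range(10)]
ys = [2.0 * math.exp(-0.7 * t) for t in ts]   # synthetic, noise-free data
a, b = levenberg_marquardt(ts, ys, a=1.0, b=0.3)
print(a, b)
```

For small λ the step approaches the Gauss-Newton step; for large λ it becomes a short, scaled gradient step, which is the damping effect described above.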


Explicit Versus Implicit Models 


A common basic feature and limitation of least squares methods, although seldom noted explicitly, is that they require some explicit model to be fitted to the data. However, not all models are explicit. For example, some pharmaceutical applications for receptor-ligand binding studies are based on specifically coupled mass equilibrium models. They are used, for instance, for the radioimmunological determination of Fenoterol or related substances, and lead to least squares problems in systems of nonlinear equations [31], in which the model function F(p) is replaced by F(t; p, z) which, besides the parameter vector p and the time t, depends on a vector function z = z(t; p) implicitly defined as the solution of the nonlinear equations

$$F_2(t;p,z) = 0, \qquad F_2 \in \mathbb{R}^{n_2}. \qquad (5)$$


This is a special case of an implicit model. There is a much broader class of implicit models.

Complexity and Large-Scale Least Squares Problems

Most models in science are based on physical, chemical and biological laws or include geometric properties, and very often lead to differential equations which, however, may not be solvable in closed analytical form. Thus, such models do not lead to explicit functions or models we want to fit to data. We rather need to fit an implicit model (represented by a system of differential equations or another implicit model). The demand for and the applications of such techniques are widespread in science, especially in the rapidly growing fields of nonlinear dynamics in physics and astronomy, nonlinear reaction kinetics in chemistry [5], nonlinear models in material sciences [16] and biology [2], and nonlinear systems describing ecosystems [28,29] in biology or the environmental sciences. Therefore, it seems desirable to focus on least squares algorithms that use nonlinear equations and differential equations as constraints or side conditions to determine the solution implicitly.


Practical Issues of Solving Least Squares Problems 


Solving least squares problems involves various difficulties, among them finding an appropriate model, non-smooth models with discontinuous derivatives, data quality and checking the assumption of the underlying error distribution, and the dependence on initial parameters or related questions of global convergence.


Models and Model Validation A model may be defined as an appropriate abstract representation of a real system. In the natural sciences (e.g., physics, astronomy, chemistry and biology) models are used to gain a deeper understanding of processes occurring in nature (an epistemological argument). The comparison of measurements and observations with the predictions of a model is used to determine the appropriateness and quality of the model. Sir Karl Popper [26], in his famous book The Logic of Scientific Discovery, uses the expressions falsification and verification to describe tasks that models can accomplish as an aid to the scientific process. Models were used in early scientific work to explain the movements of planets. Later, aspects and questions of accepting and improving global and fundamental models (e.g., general relativity or quantum physics) formed part of the discussion in the philosophy of science. In science, models are usually falsified and, eventually, replaced by modified or completely different ones.

In industry, models have a rather local meaning. A special aspect of reality is to be mapped in detail. Pragmatic and commercial aspects are usually the motivation. The model maps most of the relevant features and neglects less important aspects. The purpose is to
• provide insight into the problem,
• allow numerical, virtual experimentation but avoid expensive and/or dangerous real experiments, or
• tune a model for later usage, i.e., determine, for instance, the reaction coefficients of a chemical system; once these parameters are known, the dynamics of the process can be computed.

A (mathematical) model represents a real-world problem in the language of mathematics, i.e., by using mathematical symbols, variables (in this context: the adjustable least squares parameters), equations, inequalities, and other relations. How does one get a mathematical model for a real-world problem? Achieving that is neither easy nor unique. In some sense it is similar to solving exercises in school where problems are posed verbally [25]. The following points are useful to remember when trying to build a model:
• there will be no precise recipe telling the user how to build a model,
• experience and judgment are two important aspects of model building,
• there is nothing like a correct model,
• there is no concept of a unique model, as different models focusing on different aspects may be appropriate.

Industrial models are eventually validated, which means that they have reached a sufficient level of consensus among the community working with these models.

Statistics provides some means to discriminate between models, but this is still an art and does not replace the need for appropriate model validation. The basic notion is: with a sufficient number of parameters one can fit an elephant. This leads us to one important consequence: it seems to be necessary that one can interpret the model parameters. A reasonable model derived from the laws of science with interpretable parameters is a good candidate to become accepted, even if it may lead to somewhat worse-looking fits than a model with a larger number of formal parameters without interpretation.
Non-Smooth Models The algorithms reviewed by Xu [35,36,37] for solving least squares problems usually require continuous first derivatives of the model function with respect to the parameters. We might, however, encounter models for which the first derivatives are discontinuous. Derivative-free methods such as Nelder and Mead's [23] downhill simplex method, or direction set methods (cf. [27], p. 406), have been used successfully to solve least squares problems. The simplex method offers the benefit of exploring the parameter space and provides good starting values for derivative-based methods. Powell's direction set method with appropriate conjugate directions preserves the derivative-free nature of the method.
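A minimal sketch of a derivative-free fit, assuming a simplified Nelder-Mead variant (reflection, expansion, inside contraction and shrink with the usual coefficients 1, 2, 1/2, 1/2) rather than any particular published implementation. Only values of the least squares objective are used; the exponential model and the synthetic data are hypothetical.

```python
import math

def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=5000):
    """Simplified Nelder-Mead simplex search; f is never differentiated."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:          # function values have collapsed
            break
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]
        refl = [2 * c - w for c, w in zip(centroid, worst)]
        f_refl = f(refl)
        if f_refl < values[0]:                    # try to expand further
            expd = [3 * c - 2 * w for c, w in zip(centroid, worst)]
            f_expd = f(expd)
            simplex[-1], values[-1] = (expd, f_expd) if f_expd < f_refl else (refl, f_refl)
        elif f_refl < values[-2]:                 # accept the reflection
            simplex[-1], values[-1] = refl, f_refl
        else:                                     # contract towards the centroid
            contr = [(c + w) / 2 for c, w in zip(centroid, worst)]
            f_contr = f(contr)
            if f_contr < values[-1]:
                simplex[-1], values[-1] = contr, f_contr
            else:                                 # shrink towards the best vertex
                best = simplex[0]
                simplex = [best] + [[(bi + vi) / 2 for bi, vi in zip(best, vert)]
                                    for vert in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    return simplex[min(range(n + 1), key=values.__getitem__)]

ts = [0.5 * k for k in range(10)]
ys = [2.0 * math.exp(-0.7 * t) for t in ts]      # hypothetical noise-free data

def sse(q):
    a, b = q
    return sum((y - a * math.exp(-b * t)) ** 2 for t, y in zip(ts, ys))

a, b = nelder_mead(sse, [1.0, 0.3])
print(a, b)
```

Because the search uses only objective values, the same code would run unchanged on a model with discontinuous first derivatives; its result can then serve as a starting value for a derivative-based method, as suggested above.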


Global Convergence Nonlinear least squares algorithms usually converge only if the initial parameters are close to the best-fit parameters. Global convergence can be established for some algorithms, i.e., they converge for all initial parameters. An essential support tool accompanying the analysis of difficult least squares problems is to visualize the data and the fits; inappropriate or premature fits can then easily be excluded. Inappropriate fits are possible because all algorithms mentioned in Sects. "Introduction", "Parameter Estimation in ODE Models", and "Parameter Estimation in DAE Models" are local algorithms. Only if the least squares problem is convex do they yield the global least squares minimum. Sometimes it is possible to identify false local minima from the residuals.


Data and Data Quality Least squares analysis is concerned with fitting data to a model. The data are not exact but subject to unknown random errors $\varepsilon_k$. In ideal cases these errors follow a Gaussian normal distribution. One can test this assumption after the least squares fit by analyzing the distribution of the residuals as described in Sect. "Residual Distributions, Covariances and Parameter Uncertainties". Another important issue is whether the data are appropriate to estimate all parameters. Experimental design is the discipline which addresses this issue.


Residual Distributions, Covariances and Parameter Uncertainties Once the minimal least squares solution has been found, one should first check with the $\chi^2$-test or the Kolmogoroff-Smirnov test whether the residuals really follow a Gaussian normal distribution, as is usually assumed. With the Kolmogoroff-Smirnov test (see, e.g., [24]) it is possible to check as follows whether the residuals of a least squares solution are normally distributed around the mean value 0:
1. let $M := \{x_1, x_2, \ldots, x_n\}$ be a set of observations for which a given hypothesis should be tested;
2. let $G: x \in M \to \mathbb{R}$, $x \mapsto G(x)$, be the corresponding cumulative distribution function;
3. for each observation $x \in M$ define $S_n(x) := k/n$, where k is the number of observations less than or equal to x;
4. determine the maximum $D := \max\{\, |G(x) - S_n(x)| : x \in M \,\}$;
5. $D_{\mathrm{crit}}$ denotes the maximum deviation allowed for a given significance level and a set of n elements. $D_{\mathrm{crit}}$ is tabulated in the literature, e.g., ([24], Appendix 2, p. 560); and
6. if $D < D_{\mathrm{crit}}$, the hypothesis is accepted.

For the least squares problem formulated in Sect. "A Standard Formulation for Unconstrained Least Squares Problem" the hypothesis is "The residuals $x := r_1 = Y - F(p)$ are normally distributed around the mean value 0". Therefore, the cumulative distribution function G(x) takes the form

$$\sqrt{2\pi}\,G(x) = \int_{-\infty}^{x} g(z)\,dz = \int_{-\infty}^{-x_0} g(z)\,dz + \int_{-x_0}^{x} g(z)\,dz, \qquad g(z) := e^{-z^2/2}.$$

The value $x_0$ separates larger residuals; this is a problem-specific control parameter.
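A minimal Python sketch of steps 1 to 6, under two stated assumptions: the residuals have already been standardized (divided by their standard deviations, e.g., weighted residuals with β = 1), and the tabulated $D_{\mathrm{crit}}$ is replaced by the common large-n approximation $1.36/\sqrt{n}$ for a 5 % significance level. Following standard practice, the sketch compares G with the empirical distribution function on both sides of each jump, i.e., with (k-1)/n as well as k/n. The two synthetic residual sets are hypothetical.

```python
import math
from statistics import NormalDist

def ks_statistic(residuals):
    """Kolmogorov-Smirnov distance between the empirical CDF of standardized
    residuals and the standard normal CDF G."""
    G = NormalDist(0.0, 1.0).cdf
    xs = sorted(residuals)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        # S_n jumps from (k - 1)/n to k/n at x; check both sides of the jump
        d = max(d, abs(G(x) - k / n), abs(G(x) - (k - 1) / n))
    return d

def ks_accepts_normality(residuals):
    n = len(residuals)
    d_crit = 1.36 / math.sqrt(n)   # assumed large-n approximation, 5 % level
    return ks_statistic(residuals) < d_crit

n = 50
ideal = [NormalDist().inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
shifted = [x + 1.5 for x in ideal]   # residuals with a systematic offset
print(ks_accepts_normality(ideal), ks_accepts_normality(shifted))  # → True False
```

A systematic offset in the residuals, as produced for instance by a false local minimum or a biased model, is detected even though each individual residual looks unremarkable.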

Derivative-based least squares methods usually also provide the covariance matrix from which the uncertainties of the parameters are derived; cf. [7], Chap. 7. Least squares parameter estimates without quantified parameter uncertainties are of very doubtful value.
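For a model that is linear in its parameters, the covariance matrix takes the familiar form $\mathrm{Cov}(p) \approx s^2 (J^T J)^{-1}$ with $s^2 = \|r\|_2^2/(N-n)$. The sketch below assumes a hypothetical straight-line model with equal weights and made-up data; for nonlinear models the same expression is commonly evaluated with the Jacobian at the solution, which is a linearization assumption.

```python
def linear_fit_with_covariance(ts, ys):
    """Least squares fit of F(t, p) = p0 + p1 * t with parameter covariance.

    Jacobian rows are (1, t); Cov(p) = s^2 (J^T J)^{-1}, s^2 = SSR / (N - n).
    """
    n_obs, n_par = len(ts), 2
    a11, a12, a22 = float(n_obs), sum(ts), sum(t * t for t in ts)   # J^T J
    b1, b2 = sum(ys), sum(t * y for t, y in zip(ts, ys))             # J^T y
    det = a11 * a22 - a12 * a12
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]          # (J^T J)^{-1}
    p0 = inv[0][0] * b1 + inv[0][1] * b2
    p1 = inv[1][0] * b1 + inv[1][1] * b2
    ssr = sum((y - (p0 + p1 * t)) ** 2 for t, y in zip(ts, ys))
    s2 = ssr / (n_obs - n_par)                                        # residual variance
    cov = [[s2 * inv[i][j] for j in range(n_par)] for i in range(n_par)]
    return (p0, p1), cov

(p0, p1), cov = linear_fit_with_covariance([0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 6.8, 9.1])
print(p0, p1, cov[0][0] ** 0.5, cov[1][1] ** 0.5)   # estimates and standard errors
```

The square roots of the diagonal entries give the standard errors of intercept and slope; the off-diagonal entry shows how strongly the two estimates are correlated.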


Parameter Estimation in ODE Models 


Consider a differential equation with independent variable t for the state variable

$$\dot{x}(t) = \frac{dx}{dt} = f(t,x,p), \qquad x \in \mathbb{R}^{n_d}, \quad p \in \mathbb{R}^{n_p}, \qquad (6)$$

with a right-hand side depending on an unknown parameter vector p. Additional requirements on the solution of the ODE (6), like periodicity, initial or boundary conditions, or range restrictions on the parameters, can be formulated in vectors $r_2$ and $r_3$ of (componentwise) equations and inequalities

$$r_2\left[x(t_1),\ldots,x(t_k),p\right] = 0, \qquad r_3\left[x(t_1),\ldots,x(t_k),p\right] \ge 0. \qquad (7)$$


The multi-point boundary value problem is linked to experimental data via minimization of a least squares objective function

$$l_2(x,p) := \left\| r_1\left[x(t_1),\ldots,x(t_N),p\right] \right\|_2^2. \qquad (8)$$

In a special case of (8) the components ℓ of the vector $r_1 \in \mathbb{R}^L$ are "equations of condition" and have the form

$$r_{1\ell} = \sigma_{ij}^{-1}\left[\eta_{ij} - g_i(x(t_j),p)\right], \qquad \ell = 1,\ldots,L, \qquad L := \sum_{j=1}^{N^P} N_j. \qquad (9)$$

This case leads us to the least squares function

$$l_2(x,p) = \sum_{j=1}^{N^P} \sum_{i=1}^{N_j} \sigma_{ij}^{-2}\left[\eta_{ij} - g_i(x(t_j),p)\right]^2. \qquad (10)$$


Here, $N^P$ denotes the number of values of the independent variable (here called time) at which observed data are available, $N_j$ denotes the number of observables measured at time $t_j$, and $\eta_{ij}$ denotes the observed value which is compared with the value of observable i evaluated by the model, where the functions $g_i(x(t_j),p)$ relate the state variables x to this observable:

$$\eta_{ij} = g_i(x(t_j),p) + \varepsilon_{ij}. \qquad (11)$$

The numbers $\varepsilon_{ij}$ are the measurement errors, and the $\sigma_{ij}^2$ are weights that have to be chosen adequately on statistical grounds, e.g., as the variances. The unknown parameter vector p is determined from the measurements such that the model is optimally adjusted to the measured (observed) data. If the errors $\varepsilon_{ij}$ are independent and normally distributed with mean value zero and variances $\sigma_{ij}^2$ (up to a common factor $\beta^2$), then the solution of the least squares problem is a maximum likelihood estimate.


The Initial Value Problem Approach 


An obvious approach to estimating parameters in ODEs, which is also implemented in many commercial packages, is the initial value problem approach. The idea is to guess parameters and initial values for the trajectories, compute a solution of an initial value problem (IVP) (6), and iterate the parameters and initial values in order to improve the fit. Characteristic features and disadvantages are discussed in, e.g., [6] or [18]. In the course of the iterative solution one has to solve a sequence of IVPs. The state variable x(t) is eliminated in favor of the unknown parameters p and the initial values. Note that no use is made of the measured data while solving the IVPs; they only enter the performance criterion. Since initial guesses of the parameters may be poor, this can lead to IVPs which may be hard to solve or even have no solution at all, and one can come into badly conditioned regions of the IVPs, which can lead to a loss of stability.
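A minimal sketch of the IVP (single shooting) approach, assuming a hypothetical scalar test model $\dot{x} = -px$ with unknown initial value $x_0$ and parameter p, classical RK4 integration, and a Gauss-Newton iteration with finite-difference Jacobian and simple step halving. None of these choices is taken from a specific package; the point is that every evaluation solves an IVP from the current guess, and the data only enter through the residuals.

```python
import math

def rk4(f, x, t0, t1, p, h=0.01):
    """Integrate the scalar ODE x' = f(t, x, p) from t0 to t1 with classical RK4."""
    n = max(1, math.ceil(abs(t1 - t0) / h))
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = f(t, x, p)
        k2 = f(t + h / 2, x + h / 2 * k1, p)
        k3 = f(t + h / 2, x + h / 2 * k2, p)
        k4 = f(t + h, x + h * k3, p)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

def decay(t, x, p):
    return -p * x  # hypothetical test model x' = -p x

def simulate(q, ts):
    """Solve the IVP for q = (x0, p) and return x at the observation times ts."""
    x0, p = q
    xs, t_prev, x = [], 0.0, x0
    for t in ts:
        x = rk4(decay, x, t_prev, t, p)
        xs.append(x)
        t_prev = t
    return xs

def fit_ivp(ts, ys, q, max_iter=30, eps=1e-7):
    """Gauss-Newton on q = (x0, p); finite-difference Jacobian, step halving."""
    def cost(q):
        return sum((y - x) ** 2 for y, x in zip(ys, simulate(q, ts)))

    for _ in range(max_iter):
        base = simulate(q, ts)
        r = [y - x for y, x in zip(ys, base)]
        cols = []                      # columns dx(t_i)/dq_k of the Jacobian
        for k in range(2):
            qk = list(q)
            qk[k] += eps
            cols.append([(xk - xb) / eps for xk, xb in zip(simulate(qk, ts), base)])
        a11 = sum(u * u for u in cols[0])
        a12 = sum(u * v for u, v in zip(cols[0], cols[1]))
        a22 = sum(v * v for v in cols[1])
        b1 = sum(u * ri for u, ri in zip(cols[0], r))
        b2 = sum(v * ri for v, ri in zip(cols[1], r))
        det = a11 * a22 - a12 * a12
        dq = [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]
        alpha, c0 = 1.0, cost(q)
        while alpha > 1e-4 and cost([qi + alpha * d for qi, d in zip(q, dq)]) >= c0:
            alpha /= 2
        q = [qi + alpha * d for qi, d in zip(q, dq)]
        if abs(dq[0]) + abs(dq[1]) < 1e-10:
            break
    return q

ts = [0.5 * k for k in range(1, 9)]
ys = [3.0 * math.exp(-0.5 * t) for t in ts]   # synthetic data: x0 = 3, p = 0.5
q_hat = fit_ivp(ts, ys, [2.5, 0.4])
print(q_hat)
```

With a poor initial guess for a less benign model, the intermediate IVPs in `simulate` are exactly where the difficulties described above (blow-up or ill-conditioning) would appear.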


The Boundary Value Problem Approach 


Alternatively to the IVP approach, in the "boundary value problem approach" invented by Bock [5], the inverse problem is interpreted as an over-determined, constrained, multiple-point boundary value problem. This interpretation does not depend on whether the direct problem is an initial or a boundary value problem. The algorithm used here consists of an adequate combination of a multiple shooting method for the discretization of the boundary value problem side condition with a generalized Gauss-Newton method for the solution of the resulting structured nonlinear constrained least squares problem [5,6]. Depending on the vector of signs of the state- and parameter-dependent switching functions Q, it is even possible to allow piecewise smooth right-hand side functions f, i.e., differential equations with switching conditions

$$\dot{x} = f\left(t,x,p,\operatorname{sign}(Q(t,x,p))\right), \qquad (12)$$

where the right-hand side may change discontinuously if the vector of signs of the switching functions Q changes. Such discontinuities can occur, e.g., as a result of abrupt changes of physical quantities. The switching points are in general given by the roots of the state-dependent components of the switching functions

$$Q_i(t,x,p) = 0. \qquad (13)$$
Depending on the stability behavior of the ODE and the availability of information about the process (measured data, qualitative knowledge about the problem, etc.), a grid $T_m$,

$$\tau_1 < \tau_2 < \cdots < \tau_m, \qquad \Delta\tau_j := \tau_{j+1} - \tau_j > 0, \qquad 1 \le j \le m-1, \qquad (14)$$

of m multiple shooting nodes $\tau_j$ (m - 1 subintervals $I_j$) is chosen. The grid is adapted to the problem and data and is defined such that it includes the measuring interval ($[\tau_1,\tau_m] = [t_0,t_f]$). Usually, the grid points $\tau_j$ correspond to values of the independent variable t at which observations are available, but additional grid points may be chosen for strongly nonlinear models. At each node $\tau_j$ an IVP

$$\dot{x}(t) = f(t,x,p), \qquad x(\tau_j) = s_j \in \mathbb{R}^{n_d}, \qquad (15)$$

has to be integrated from $\tau_j$ to $\tau_{j+1}$. The m - 1 vectors of (unknown) initial values $s_j$ of the partial trajectories, the vector $s_m$ representing the state at the end point, and the parameter vector p are summarized in the (unknown) vector z

$$z := \left(s_1^T, \ldots, s_m^T, p^T\right)^T. \qquad (16)$$

For a given guess of z the solutions $x(t; s_j, p)$ of the m - 1 independent initial value problems in each subinterval $I_j$ are computed. This leads to an (at first discontinuous) representation of x(t). In order to replace (6) equivalently by these m - 1 IVPs, matching conditions

$$h_j(s_j,s_{j+1},p) := x(\tau_{j+1}; s_j, p) - s_{j+1} = 0, \qquad h_j: \mathbb{R}^{2n_d+n_p} \to \mathbb{R}^{n_d}, \qquad (17)$$

are added to the problem. Conditions (17) ensure the continuity of the final trajectory x(t).
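A minimal sketch of the multiple shooting bookkeeping (16) and (17), again assuming the hypothetical scalar test model $\dot{x} = -px$, four equidistant nodes and RK4 integration. Each subinterval IVP starts from its own node value $s_j$, independently of the others, and the matching conditions measure the jumps of the piecewise trajectory. A consistent z (the exact trajectory through the nodes) gives vanishing $h_j$; a guess whose node values are read off the data does not, and the generalized Gauss-Newton method drives these defects to zero together with the data residuals.

```python
import math

def rk4(f, x, t0, t1, p, n=100):
    """Classical RK4 for the scalar ODE x' = f(t, x, p) on [t0, t1] with n steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = f(t, x, p)
        k2 = f(t + h / 2, x + h / 2 * k1, p)
        k3 = f(t + h / 2, x + h / 2 * k2, p)
        k4 = f(t + h, x + h * k3, p)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

def decay(t, x, p):
    return -p * x  # hypothetical test model x' = -p x

def matching_conditions(z, taus):
    """h_j = x(tau_{j+1}; s_j, p) - s_{j+1} for z = (s_1, ..., s_m, p), cf. (16) and (17)."""
    *s, p = z
    return [rk4(decay, s[j], taus[j], taus[j + 1], p) - s[j + 1]
            for j in range(len(taus) - 1)]

taus = [0.0, 1.0, 2.0, 3.0]                                  # m = 4 shooting nodes
z_consistent = [3.0 * math.exp(-0.5 * t) for t in taus] + [0.5]
z_guess = [3.0, 2.0, 1.0, 0.5, 0.5]                          # e.g., node values read off data
h_consistent = matching_conditions(z_consistent, taus)
h_guess = matching_conditions(z_guess, taus)
print(h_consistent)
print(h_guess)
```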

Replacing $x(t_j)$ and p in (10) by z, the least squares problem is reformulated as a nonlinear constrained optimization problem with the structure

$$\min_z \left\{ \tfrac{1}{2}\|F_1(z)\|_2^2 \;\middle|\; F_2(z) = 0 \in \mathbb{R}^{n_2},\; F_3(z) \ge 0 \in \mathbb{R}^{n_3} \right\}, \qquad (18)$$

wherein $n_2$ denotes the number of equality and $n_3$ the number of inequality constraints. This usually large, constrained, structured nonlinear problem is solved by a damped generalized Gauss-Newton method [5]. If $J_1(z_k) := \partial_z F_1(z_k)$, $J_2(z_k) := \partial_z F_2(z_k)$ and $J_3(z_k) := \partial_z F_3(z_k)$ denote the Jacobian matrices of $F_1$, $F_2$ and $F_3$, respectively, then the iteration proceeds as

$$z_{k+1} = z_k + \alpha_k \Delta z_k, \qquad (19)$$

with damping constant $\alpha_k$, $0 < \alpha_{\min} \le \alpha_k \le 1$, and the increment $\Delta z_k$ determined as the solution of the constrained linear problem

$$\min_{\Delta z_k} \left\{ \tfrac{1}{2}\|J_1(z_k)\Delta z_k + F_1(z_k)\|_2^2 \;\middle|\; J_2(z_k)\Delta z_k + F_2(z_k) = 0,\; J_3(z_k)\Delta z_k + F_3(z_k) \ge 0 \right\}. \qquad (20)$$


Global convergence can be achieved if the damping 
strategy is properly chosen [6]. 
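A minimal sketch of damped Gauss-Newton iteration (19) for the unconstrained special case ($F_2$ and $F_3$ absent), so that (20) reduces to the normal equations $J_1^T J_1 \Delta z_k = -J_1^T F_1$. As a simplifying assumption, the damping uses plain step-size cutting with a monotonicity test on $\|F_1\|_2^2$ instead of the natural level functions of [6], and the zero-residual test problem $F_1(z) = (\arctan z_1, \arctan z_2)$ is hypothetical.

```python
import math

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def damped_gauss_newton(F, J, z, alpha_min=1e-4, tol=1e-12, max_iter=50):
    """z_{k+1} = z_k + alpha_k dz_k with step-size cutting (unconstrained case)."""
    for _ in range(max_iter):
        r, Jz = F(z), J(z)
        n = len(z)
        # Linearized problem: (J^T J) dz = -J^T r
        JtJ = [[sum(row[i] * row[j] for row in Jz) for j in range(n)] for i in range(n)]
        Jtr = [sum(row[i] * ri for row, ri in zip(Jz, r)) for i in range(n)]
        dz = solve(JtJ, [-g for g in Jtr])
        if math.sqrt(sum(d * d for d in dz)) < tol:
            break
        cost = sum(ri * ri for ri in r)
        alpha = 1.0
        while alpha >= alpha_min:
            trial = [zi + alpha * di for zi, di in zip(z, dz)]
            if sum(ri * ri for ri in F(trial)) < cost:
                z = trial
                break
            alpha /= 2        # step-size cutting
        else:
            break             # no acceptable step with alpha >= alpha_min
    return z

def F(z):
    # hypothetical zero-residual test problem
    return [math.atan(z[0]), math.atan(z[1])]

def J(z):
    return [[1.0 / (1.0 + z[0] ** 2), 0.0], [0.0, 1.0 / (1.0 + z[1] ** 2)]]

z_star = damped_gauss_newton(F, J, [2.0, 0.5])
print(z_star)
```

Without damping, the first full step would move $z_1$ from 2 to about -3.54 and the undamped iteration diverges; halving the step once is enough here to enter the region of fast local convergence.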

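The inequality constraints that are active at a feasible point are defined by the index set

$$\mathcal{J}(z_k) := \left\{ i \mid F_{3i}(z_k) = 0 \right\}. \qquad (21)$$

The inequalities which are defined by the index set $\mathcal{J}(z_k)$, and their derivatives, are denoted by $\tilde{F}_3$ and $\tilde{J}_3$ in the following. In addition to (21) we define

$$F_c := \begin{pmatrix} F_2 \\ \tilde{F}_3 \end{pmatrix}, \qquad J_c := \begin{pmatrix} J_2 \\ \tilde{J}_3 \end{pmatrix}. \qquad (22)$$

In order to derive the necessary conditions that have to be fulfilled by the solution of problem (18), the Lagrangian

$$L(z,\lambda,\mu) := \tfrac{1}{2}\|F_1(z)\|_2^2 - \lambda^T F_2(z) - \mu^T F_3(z) \qquad (23)$$

and the reduced Lagrangian, in which $\lambda_c$ collects λ and the multipliers $\mu_i$ of the active inequalities,

$$L_c(z,\lambda_c) := \tfrac{1}{2}\|F_1(z)\|_2^2 - \lambda_c^T F_c(z) \qquad (24)$$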

are defined. The Kuhn-Tucker conditions, i.e., the necessary conditions of first order, are the feasibility conditions

$$F_2(z^*) = 0, \qquad F_3(z^*) \ge 0, \qquad (25)$$

ensuring that $z^*$ is feasible, and the stationarity conditions stating that the adjoint variables $\lambda^*$, $\mu^*$ exist as solution of

$$\frac{\partial L}{\partial z}(z^*,\lambda^*,\mu^*) = F_1^T(z^*)\,J_1(z^*) - (\lambda^*)^T J_2(z^*) - (\mu^*)^T J_3(z^*) = 0 \qquad (26)$$

and

$$\mu_i^* \ge 0, \quad i \in \mathcal{J}(z^*), \qquad \mu_i^* = 0, \quad i \notin \mathcal{J}(z^*). \qquad (27)$$

If $(z^*,\lambda^*,\mu^*)$ fulfills the conditions (25), (26) and (27), it is called a Kuhn-Tucker point and $z^*$ a stationary point. The necessary condition of second order means that for all directions

$$s \in T(z^*) := \left\{ s \ne 0 \;\middle|\; J_2(z^*)s = 0,\; \tilde{J}_3(z^*)s \ge 0,\; \mu_i^*\,\tilde{J}_{3i}(z^*)s = 0 \right\} \qquad (28)$$

the Hessian $G(z^*,\lambda^*,\mu^*)$ of the Lagrangian is positive semidefinite:

$$s^T G(z^*,\lambda^*,\mu^*)\,s \ge 0, \qquad G(z^*,\lambda^*,\mu^*) := \frac{\partial^2}{\partial z^2} L(z^*,\lambda^*,\mu^*). \qquad (29)$$

As $\mu_i^* = 0$ for $i \notin \mathcal{J}(z^*)$, it is sufficient to postulate the stationarity condition for the reduced Lagrangian (24). For the linear problem (20) it follows that $(z^*,\lambda^*,\mu^*)$ is a Kuhn-Tucker point of the nonlinear problem (18) if and only if $(0,\lambda^*,\mu^*)$ is a Kuhn-Tucker point of the linear problem. The sufficient conditions for the existence of a local minimum of problem (18) are:
1. $(z^*,\lambda^*,\mu^*)$ is a Kuhn-Tucker point of the nonlinear problem, and
2. the Hessian $G(z^*,\lambda^*,\mu^*)$ of the Lagrangian is positive definite for all directions $s \in T(z^*)$, i.e., $s^T G(z^*,\lambda^*,\mu^*)\,s > 0$.

If these sufficient conditions for a local minimum and the strict complementarity condition $\mu_i^* \ne 0$ for $i \in \mathcal{J}(z^*)$ are fulfilled, two perturbation theorems [6] can be formulated. In that case it can be shown, for a neighborhood of a Kuhn-Tucker point $(z^*,\lambda^*,\mu^*)$ of the nonlinear problem (18), that the local convergence behavior of the inequality constrained problem corresponds to that of the equality constrained problem comprising the equations and the active inequalities. Under the assumption of regularity of the Jacobians $J_1$ and $J_c$, i.e.,

$$\operatorname{rank}\left(J_c(z_k)\right) = n_c, \qquad \operatorname{rank}\begin{pmatrix} J_1(z_k) \\ J_c(z_k) \end{pmatrix} = \dim z, \qquad (30)$$

where $n_c$ is the number of rows of $J_c$, a unique solution $\Delta z_k$ of the linear problem (20) exists and a unique linear mapping $J_k^+$ can be constructed which satisfies the relations

$$\Delta z_k = -J_k^+ F(z_k), \qquad J_k^+ J_k J_k^+ = J_k^+, \qquad J_k := \begin{pmatrix} J_1(z_k) \\ J_c(z_k) \end{pmatrix}, \qquad F(z_k) := \begin{pmatrix} F_1(z_k) \\ F_c(z_k) \end{pmatrix}. \qquad (31)$$

The solution $\Delta z_k$ of the linear problem, or formally the generalized inverse $J_k^+$ [5] of $J_k$, results from the Kuhn-Tucker conditions. Note, however, that for reasons of numerical efficiency $\Delta z_k$ is not calculated from (31) but by a decomposition procedure using orthogonal transformations.

By taking into consideration the special structure of the matrices $J_i$ caused by the continuity conditions of the multiple shooting discretization, (18) can be reduced by a condensation algorithm described in [5,6] to a system of lower dimension

$$\min_{x_k} \left\{ \tfrac{1}{2}\|A_1 x_k + a_1\|_2^2 \;\middle|\; A_2 x_k + a_2 = 0,\; A_3 x_k + a_3 \ge 0 \right\}, \qquad (32)$$

from which $x_k$ is derived first and $\Delta z_k$ last. This is achieved by first performing a "backward recursion", then the "solution of the condensed problem", and finally a "forward recursion" [6]. Kilian [20] has implemented an active set strategy following the description in [6] and [33], utilizing the special structure of J.

The details of the parameter estimation algorithms incorporated in the efficient software package PARFIT (a software package of stable and efficient boundary value problem methods for the identification of parameters in systems of nonlinear differential equations) are found in [6]. The damping constant $\alpha_k$ in the k-th iteration is computed with the help of natural level functions, which locally approximate the distance $\|z_k - z^*\|$ of the current iterate from the Kuhn-Tucker point $z^*$.

The integrator METANB (for the basic discretization see, for instance, [3]) embedded in PARFIT is also suitable for the integration of stiff differential equation systems. It allows the user to compute simultaneously the sensitivity matrices G,

$$G(t; t_0, x_0, p) := \frac{\partial x}{\partial x_0}(t; t_0, x_0, p) \in M(n_d, n_d), \qquad (33)$$

and H,

$$H(t; t_0, x_0, p) := \frac{\partial x}{\partial p}(t; t_0, x_0, p) \in M(n_d, n_p), \qquad (34)$$

which are the most costly blocks of the Jacobians $J_i$, via the so-called internal numerical differentiation introduced by Bock [5]. This technique does not require the user to formulate the often cumbersome and error-prone variational differential equations

$$G' = f_x(t,x,p)\cdot G, \qquad G(t_0; t_0, x_0, p) = I, \qquad (35)$$

and

$$H' = f_x(t,x,p)\cdot H + f_p(t,x,p), \qquad H(t_0; t_0, x_0, p) = 0. \qquad (36)$$
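For orientation, the sketch below writes out the variational equations (35) and (36) by hand for the hypothetical scalar model $\dot{x} = f = -px$, where $f_x = -p$ and $f_p = -x$, and integrates them together with the state using classical RK4. It is exactly this hand derivation that internal numerical differentiation spares the user; the closed-form sensitivities $G = e^{-pt}$ and $H = -t x_0 e^{-pt}$ serve as a check.

```python
import math

def augmented_rhs(t, y, p):
    """State x and sensitivities G = dx/dx0, H = dx/dp for x' = f = -p x."""
    x, G, H = y
    f_x = -p        # df/dx
    f_p = -x        # df/dp
    return [-p * x, f_x * G, f_x * H + f_p]   # f, then (35), then (36)

def rk4_system(rhs, y, t0, t1, p, n=1000):
    """Classical RK4 for a system y' = rhs(t, y, p) with n equal steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = rhs(t, y, p)
        k2 = rhs(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)], p)
        k3 = rhs(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)], p)
        k4 = rhs(t + h, [yi + h * k for yi, k in zip(y, k3)], p)
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

x0, p, T = 3.0, 0.5, 2.0
x, G, H = rk4_system(augmented_rhs, [x0, 1.0, 0.0], 0.0, T, p)   # G(t0) = 1, H(t0) = 0
# Deviations from the closed-form sensitivities of x(t) = x0 * exp(-p t)
print(G - math.exp(-p * T), H + T * x0 * math.exp(-p * T))
```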

Using the multiple shooting approach described 
above, differential equation systems with poor stability 
properties and even chaotic systems can be treated [18]. 


Parameter Estimation in DAE Models 


Another, even more complex class of problems is parameter estimation in mechanical multibody systems, e.g., in planar slider crank mechanisms, a simple model for a cylinder in an engine. These problems lead to boundary value problems for higher-index differential algebraic systems [34]. Singular controls and state constraints in optimal control also lead to this structure. Inherent to such problems are invariants that arise from index reduction, but also additional physical invariants such as the total energy in conservative mechanical systems or the Hamiltonian in optimal control problems.

A typical class of DAEs in mechanical multibody systems is given by the equations of motion

$$\dot{x} = v, \qquad M(t,x)\,\dot{v} = f(t,x) - \nabla_x g(t,x)\,\lambda, \qquad 0 = g(t,x), \qquad (37)$$

where x = x(t) and v = v(t) are the coordinates and velocities, M is the mass matrix, f denotes the applied forces, g are the holonomic constraints, and λ are the generalized constraint forces. Usually, M is symmetric and positive definite. A more general DAE system might have the structure

$$\dot{x} = f(t,x,z,p), \qquad 0 = g(t,x,z,p),$$

where p denotes some parameters and z = z(t) is a set of algebraic variables, i.e., the derivatives $\dot{z}$ do not appear; in (37) λ is the algebraic variable. In addition we might have initial values $x_0$ and $z_0$. Obviously, some care is needed regarding the choice of $z_0$ because it needs to be consistent with the constraint. In some exceptional cases (in which $Z := \nabla_z g$ has full rank and the constraint can be solved analytically for z) we might insert $z = z(t,x;p)$ into the differential equation. DAE systems with a regular matrix Z are referred to as index-1 systems. Index-1 DAEs can be transformed into equivalent ordinary differential equations by differentiating the equations w.r.t. t. At first we get the implicit system of differential equations

$$g_t + X\dot{x} + Z\dot{z} = 0, \qquad X := \nabla_x g,$$

which, by the assumed regularity of Z, can be written as the explicit system

$$\dot{z} = -Z^{-1}\left(g_t + X f\right).$$
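A minimal sketch of this index reduction for a hypothetical scalar index-1 DAE $\dot{x} = -x + z$, $0 = g(x,z) = z - x^2$. Here $Z = \partial g/\partial z = 1$, $X = \partial g/\partial x = -2x$ and $g_t = 0$, so the explicit system reads $\dot{z} = -Z^{-1}(g_t + Xf) = 2x(-x+z)$. The sketch integrates the resulting ODE with RK4 from a consistent initial value $z_0 = x_0^2$ and checks that the constraint is still satisfied at the end; for $x_0 = 1/2$ the exact solution is $x(t) = 1/(1+e^t)$.

```python
import math

def rhs(t, y):
    """Hypothetical index-1 DAE x' = -x + z, 0 = z - x**2, rewritten as an ODE.

    Z = dg/dz = 1, X = dg/dx = -2x, g_t = 0, hence z' = -Z^{-1}(g_t + X f) = 2 x f.
    """
    x, z = y
    f = -x + z
    return [f, 2.0 * x * f]

def rk4_system(rhs, y, t0, t1, n=1000):
    """Classical RK4 for a system y' = rhs(t, y) with n equal steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
        k3 = rhs(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
        k4 = rhs(t + h, [yi + h * k for yi, k in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

x0 = 0.5
x, z = rk4_system(rhs, [x0, x0 ** 2], 0.0, 1.0)   # z0 = x0**2 is consistent with g = 0
x_exact = 1.0 / (1.0 + math.exp(1.0))            # solution of x' = -x + x**2, x(0) = 1/2
print(abs(x - x_exact), abs(z - x ** 2))
```

An inconsistent $z_0$ would not be repaired by the ODE form: the differentiated system only preserves $g$ at its initial value, which is why the choice of $z_0$ needs care.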


Many practical DAEs have index 1, e.g., in some chemical engineering problems, where algebraic equations are introduced to describe, for instance, mass balances or the equation of state. However, multibody systems such as (37) have higher indices; (37) is of index 3. The reason is that the multiplier variables, i.e., the algebraic variables, do not occur in the algebraic constraints, and it is therefore not possible to extract them directly without further differentiation. If Z does not have full rank, the equations are differentiated successively until the algebraic variables can be eliminated. The smallest number of differentiations required to transform the original DAE system into an ODE system is called the index of the DAE. The approach developed and described by Schulz et al. [34] is capable of handling least squares problems without special assumptions on the index.
An essential problem for the design, optimization and control of chemical systems is the estimation of parameters from time series. These problems lead to nonlinear DAEs. The parameter estimation problem leads to a non-convex optimization problem for which several local minima exist. Esposito and Floudas [13] developed two global optimization approaches based on branch-and-bound and convex underestimators to solve this problem. In the first approach, the dynamical system is converted into an algebraic system using orthogonal collocation on finite elements. In the second approach, state profiles are computed by integration. In Esposito and Floudas [12] a similar approach is used to solve optimal control problems.


Parameter Estimation in PDE Models 


A very complex class of least squares problems are data fitting problems based on partial differential equation models. These include eigenvalue problems as well as initial and boundary value problems, and cover problems in atomic physics, elasticity, electromagnetic fields, fluid flow or heat transfer. Some recent problems arise, for instance, in models describing the water balance and solid transport used to analyze the distributions of nutrients and pesticides [1], in the determination of diffusive constants in water absorption processes in hygroscopic liquids discussed in [15], or in multispecies reactive flows through porous media [38]. Such nonlinear multispecies transport models can be used to describe the interaction between oxygen, nitrate, organic carbon and bacteria in aquifers. They may include convective transport and diffusion/dispersion processes for the mobile parts (that is, the mobile pore water) of the species. The immobile biophase represents the part where reactions caused by microbial activity take place and which is coupled to transport through the mobile pore water. The microorganisms are assumed to be immobile. The model leads to partial differential algebraic equations

$$M\,\partial_t u - \nabla\cdot(D\nabla u) + q\cdot\nabla u = f_1(u,v,z,p), \qquad \partial_t v = f_2(u,v,z,p), \qquad 0 = g(u,v,z,p), \qquad (38)$$

where D and q denote the hydraulic parameters of the model, p denotes a set of reaction parameters, u and v refer to the mobile and immobile species, and z is related to source and sink terms.


Methodology 


Solving least squares problems based on PDE models requires sophisticated numerical techniques, but also great attention to the quality of the data and the identifiability of the parameters. To solve such problems we might use the following approaches:

1. Unstructured approach: The PDE model is, for fixed parameters p, integrated by any appropriate method, yielding estimates of the observations. The parameters are adjusted by a derivative-free optimization procedure, e.g., by the simplex method of Nelder and Mead [23]. This approach is relatively easy to implement; it solves a sequence of direct problems and is comparable to what was called the IVP approach in Sect. "Parameter Estimation in ODE Models". Arning [1] uses such an approach.

2. Structured approach (for initial value PDE problems): Within the PDE model, spatial coordinates and time are discretized separately. Especially for models with only one spatial coordinate, it is advantageous to apply finite difference or finite element discretizations to the spatial coordinate. The PDE system is transformed into a system of (usually stiff) ordinary differential equations. This approach is known as the method of lines (see, for example, [30]). It reduces parameter estimation problems subject to time-dependent partial differential equations to parameter identification problems in systems of ordinary differential equations to be integrated w.r.t. time. Now it is possible to distinguish again between the IVP and the BVP approach. Schittkowski [32] in his software package EASY-FIT applies the method of lines to PDEs with one spatial coordinate and uses several explicit and implicit integration methods to solve the ODE system. The integration results are used by an SQP optimization routine or a Gauss-Newton method to estimate the parameters. Zieße et al. [38] and Dieses et al. [11], instead, couple the method of lines (in one and two spatial coordinates) with Bock's [6] BVP approach, discretize time, for instance, by multiple shooting, and use an extended version of PARFIT.
The method of lines has become one of the standard
approaches for solving time-dependent PDEs with only
one spatial coordinate. It is based on a partial discretiza-
tion, which means that only the spatial derivative is
discretized but not the time derivative. This leads to
a system of N coupled ordinary differential equations,
where N is the number of discretization points. Let us
demonstrate the method by applying it to the diffusion
equation


$$\frac{\partial c}{\partial t}(t,z) = D\,\frac{\partial^{2} c}{\partial z^{2}}(t,z) \qquad (39)$$


with constant diffusion coefficient D. We discretize the 
spatial coordinate z according to 
$$z_i = i\,\Delta z, \qquad c_i = c_i(t) = c(t,z_i), \qquad i = 0,\ldots,N. \qquad (40)$$


If we choose a finite difference approximation we get 


$$\frac{\partial^{2} c}{\partial z^{2}}(t,z_i) \approx \frac{c(t,z_i-\Delta z) - 2c(t,z_i) + c(t,z_i+\Delta z)}{(\Delta z)^{2}} = \frac{c_{i-1} - 2c_i + c_{i+1}}{(\Delta z)^{2}} \qquad (41)$$


which replaces the diffusion Eq. (39) by N ordinary dif- 
ferential equations 


$$\frac{dc_i}{dt} = D\,\frac{c_{i-1} - 2c_i + c_{i+1}}{(\Delta z)^{2}} \qquad (42)$$
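The semi-discretization (42) can be illustrated with a minimal Python sketch. It is not the software used in [15] or EASY-FIT. For simplicity the boundary values c_0 and c_N are held fixed (homogeneous Dirichlet conditions, an assumption of this example), and the ODE system is integrated with the classical explicit fourth-order Runge-Kutta method. Because the system is stiff, this explicit integrator needs a small step; the function names are hypothetical.

```python
import math


def mol_rhs(c, D, dz):
    """Right-hand side of Eq. (42) for the interior points i = 1..N-1.

    The boundary values c[0] and c[-1] are kept fixed (Dirichlet
    conditions, an assumption of this sketch), so their derivatives are zero.
    """
    dcdt = [0.0] * len(c)
    for i in range(1, len(c) - 1):
        dcdt[i] = D * (c[i - 1] - 2.0 * c[i] + c[i + 1]) / dz**2
    return dcdt


def integrate_rk4(c0, D, dz, t_end, steps):
    """Integrate the ODE system (42) with classical explicit RK4.

    Stability roughly requires h * 4*D/dz**2 < 2.78, which illustrates
    the stiffness of the semi-discrete system.
    """
    h = t_end / steps
    c = list(c0)
    for _ in range(steps):
        k1 = mol_rhs(c, D, dz)
        k2 = mol_rhs([ci + 0.5 * h * k for ci, k in zip(c, k1)], D, dz)
        k3 = mol_rhs([ci + 0.5 * h * k for ci, k in zip(c, k2)], D, dz)
        k4 = mol_rhs([ci + h * k for ci, k in zip(c, k3)], D, dz)
        c = [ci + h / 6.0 * (a + 2.0 * b + 2.0 * e + d)
             for ci, a, b, e, d in zip(c, k1, k2, k3, k4)]
    return c


if __name__ == "__main__":
    N, D = 20, 1.0
    dz = 1.0 / N
    # Initial profile sin(pi z) on [0, 1]; it vanishes at both ends.
    c0 = [math.sin(math.pi * i * dz) for i in range(N + 1)]
    c = integrate_rk4(c0, D, dz, t_end=0.05, steps=100)
    # This profile is an eigenvector of the discrete operator, so it decays like exp(lam*t).
    lam = -4.0 * D / dz**2 * math.sin(math.pi * dz / 2.0) ** 2
    print(c[N // 2], c0[N // 2] * math.exp(lam * 0.05))
```

The sine test profile is convenient because the exact solution of the semi-discrete system is known in closed form for it. A production code would use an implicit stiff integrator instead of RK4.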

A detailed example of this method is discussed in [15]. 
The water transport and absorption processes within 
a hygroscopic liquid are described by a model contain- 
ing the diffusion Eq. (39) describing the water trans- 
port within the hygroscopic liquid, a mixed Dirich- 
let-Neumann condition representing a flux balance 
equation at the surface of the liquid, and an additional 
integral relation describing the total amount of water in 
the liquid. The model included three parameters to be 
estimated. 

The available measurement data provide the total
time-dependent concentration C(t) of water in the liq-
uid. A further complication was that the mathematical
solution of the diffusion equation is the water concen-
tration c(t,z) in the hygroscopic liquid, which is a func-
tion of time and location. Therefore, in order to com-
pare the mathematical solution with the observed data,
one had to integrate c(t, z) over the space coordinate z,
i.e., the depth of the fluid.
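Once the grid values c_i(t) of (40) are available, the integral over the depth can be approximated by a quadrature rule. The sketch below uses the composite trapezoidal rule. For illustration only, it assumes that the measured total C(t) equals the integral of c(t,z) over the grid interval without further normalization. The function names are hypothetical.

```python
def total_amount(c, dz):
    """Composite trapezoidal approximation of the integral of c(t, z) over z.

    c holds the grid values c_0, ..., c_N at z_i = i * dz (Eq. (40))
    for one fixed time t.
    """
    if len(c) < 2:
        raise ValueError("need at least two grid points")
    return dz * (0.5 * c[0] + sum(c[1:-1]) + 0.5 * c[-1])


def residuals(profiles, dz, observed_totals):
    """Model-minus-measurement differences for the totals C(t_j), one per time."""
    return [total_amount(c, dz) - obs for c, obs in zip(profiles, observed_totals)]
```

These residuals are what a least squares routine would then minimize with respect to the model parameters.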


Least Squares Problems with Massive Data Sets 


We motivate the necessity to analyze massive data sets
by an example taken from astrophysics [19]. We out-
line the method for a huge set of millions of observed
data curves in which time is the independent parame-
ter and for each of the N, $N \approx 10^{6}$, curves there is a dif-
ferent underlying parameter set we want to estimate by
a least squares method. Note that we assume that there
is a model in the sense of (6) or (10) available involv-
ing an adjustable parameter vector p. We further
assume that we are dealing with nonlinear least squares
problems which are not easy to solve. The difficulties
could arise from the dependence on initial parameters,
non-smoothness of the model, the number of model
evaluations, or the CPU time required for one model
evaluation. For each available curve we can, of course,
solve this least squares problem by the techniques men-
tioned or discussed earlier in this article. However, the
CPU time required to solve this least squares problem
for several million curves is prohibitive. The archive ap-
proach described in this section is appropriate for this
situation.

Examples of massive data sets subject to least 
squares analyses are surveys in astrophysics where mil- 
lions of stars are observed over a range of time. About 
50% of them are binary stars or multiple systems. The 
observed data could be flux of photons (just called light 
in the discipline of binary star researchers) in a cer- 
tain wavelength region or radial velocity as a function 
of time. Thus we have to analyze millions of light and 
radial velocity curves. There are well-validated mod-
els and methods (cf. [17]) to compute such curves from
well-defined physical and geometrical parameters of the
binary systems, e.g., the mass ratio, the ratio of their
radii, their temperatures, inclination, semi-major axis
and eccentricity, to mention a few. Thus one faces
the problem of how to analyze the surveys and to derive
the stellar parameters P relevant to astrophysicists. In
this eclipsing binary star example it suffices to consider
the range [0, P] for the independent parameter time be-
cause the observed curves are periodic with respect to
the period P. The period could be determined a priori
from a frequency analysis of the observed curve. Under
certain assumptions, in eclipsing binary star analyses,
time can be replaced by phase.

The critical issues are speed and stability. Speed is
obviously necessary to analyze large amounts of data,
light and radial velocity curves in the example. Stabil-
ity is required to automate the procedure. Automa-
tion enables the user to analyze large sets of eclipsing
binary data produced by surveys. Stability and autom-
ation require overcoming the problem of initial pa-
rameters usually experienced in nonlinear least squares.
There is a price to be paid in terms of accuracy. Nev-
ertheless, such an approach will produce good approx-
imate results and may indicate interesting eclipsing bi-
nary stars for detailed follow-up analysis.

The method we propose to solve least squares prob-
lems with massive data sets is a matching approach:
match one or several curves to a large test set of pre-
computed archive curves for an appropriate set of com-
binations of |P| parameters.


The Matching Approach 


Let, for a given binary system, $\ell^{o}_{ic}$ be any observed light
value for observable c, c = 1, ..., C, at phase $\theta_i$, i =
1, ..., I. Correspondingly, $\ell^{c}_{ikc}$ denotes the computed
light value at the same phase $\theta_i$ for the archive light
curve k, k = 1, ..., K. Note that K might easily be a large
number such as $10^{10}$. Each archive light curve k is com-
puted by a certain parameter combination.

The idea of the matching approach is to pick that 
light curve from the archive which matches the ob- 
served curve of binary j best. The best fit solution is ob- 
tained by linear regression. The matching approach re- 
turns, for each j, the number of the archive light curve 
which fits best, a scaling parameter, a, and a shift pa- 
rameter, b, (which might be interpreted as a constant 
third light) by solving the following nested minimiza- 
tion problem for all j, j = 1,..., N: 


$$\min_{k}\ \min_{a_{kc},\,b_{kc}}\ \sum_{c=1}^{C}\sum_{i=1}^{I}\left[\ell^{o}_{ic}-\left(a_{kc}\,\ell^{c}_{ikc}+b_{kc}\right)\right]^{2}$$


Note that the inner minimization problem only requires
solving a linear regression problem. Thus, for each k,
there exists an analytic solution for the unknown pa-
rameters $a_{kc}$ and $b_{kc}$. Further note that the $\ell^{c}_{ikc}$ val-
ues might be obtained by interpolation. The archive
light curves are generated in such a way that they have
good coverage in the eclipses, while a few points will
do in those parts of the light curves which show only
small variation with phase. Thus, the phase grid points
may be distributed non-equidistantly. A cubic
interpolation will probably suffice.
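The inner minimization over a and b is an ordinary linear regression with a closed-form solution, so the outer minimization over k reduces to a scan over the archive. The following sketch makes three simplifications that are assumptions of this example, not the implementation of [19]: a single observable (C = 1), unweighted residuals, and archive curves already evaluated at the observed phases. The function names are hypothetical.

```python
def fit_scale_shift(obs, model):
    """Closed-form least squares fit of obs[i] ~ a * model[i] + b.

    Returns (a, b, rss), where rss is the residual sum of squares.
    """
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    sxx = sum((m - mean_m) ** 2 for m in model)
    sxy = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    # A constant archive curve cannot determine the scale; fit the shift only.
    a = sxy / sxx if sxx > 0.0 else 0.0
    b = mean_o - a * mean_m
    rss = sum((o - (a * m + b)) ** 2 for m, o in zip(model, obs))
    return a, b, rss


def best_match(obs, archive):
    """Outer minimization: (k, a, b, rss) of the best-fitting archive curve."""
    best = None
    for k, model in enumerate(archive):
        a, b, rss = fit_scale_shift(obs, model)
        if best is None or rss < best[3]:
            best = (k, a, b, rss)
    return best
```

For several observables, the residual sums of the separate fits for each c would be added before comparing archive curves.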

Thus, the matching approach requires us to provide 
the following components: 

1. solving linear regression problems determining a 
and b for all archive curves and all observed curves 
(the sequence of the loops is important), 

2. generating the archive curves, 

3. cubic interpolation in the independent time-like 
quantity and interpolation after the best matching 
solution has been found. 

In the sequel we briefly comment on the last two components.


Generating and Storing the Archive Curves As the
number of archive curves can easily reach $10^{10}$, one
should think carefully about how to store them. This also
requires appropriate looping over the parameters
p = 1, ..., |P|. For the eclipsing binary example the
details are given in [19]. Among the efficiency issues is
the use of non-equidistant parameter grids exploit-
ing the sensitivity of the model function $\ell^{c}$ to the
parameters.

One might think of storing the archive light curves in
a database. However, database techniques perform
poorly when dealing with $10^{10}$ curves. There-
fore, it is probably easier to use a flat storage scheme. In
the simplest case, for each k we store the physical and
geometric parameters, then those parameters describ-
ing observable c, and then the values of the observable.
If we use the same number of phase values for each ob-
servable and each k, we have the same amount of data
to be stored.
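With a fixed record length, record k of a flat file can be addressed directly at byte offset k times the record size, with no index structure. The sketch below packs each archive curve as little-endian 64-bit floats using Python's standard struct module. The field order (physical and geometric parameters, then observable parameters, then observable values) follows the text. The field counts and function names are hypothetical.

```python
import io
import struct


def record_format(n_params, n_obs_params, n_values):
    """struct format for one archive record: all fields stored as float64."""
    return "<" + "d" * (n_params + n_obs_params + n_values)


def append_record(f, fmt, params, obs_params, values):
    """Append one archive curve at the end of the flat file f."""
    f.seek(0, io.SEEK_END)
    f.write(struct.pack(fmt, *params, *obs_params, *values))


def read_record(f, fmt, k, n_params, n_obs_params):
    """Read archive curve k by seeking to its fixed offset k * record_size."""
    size = struct.calcsize(fmt)
    f.seek(k * size)
    fields = struct.unpack(fmt, f.read(size))
    split = n_params + n_obs_params
    return fields[:n_params], fields[n_params:split], fields[split:]
```

An ordinary binary file opened with open(path, "r+b") can take the place of the in-memory io.BytesIO buffer used for testing.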


Exploiting Interpolation Techniques Within the 
matching approach interpolation can be used at two 
places. The first occurrence is in the regression phase. 
The test curves in the archive are computed for a finite 
grid of the independent parameter time (phase in this 
example). The observed curves might be observed at 
time values not contained in the archive. We can inter-
polate from the archive values by linear or cubic inter-
polation to the observed time values. However, it may
well pay off to think carefully about how the time grid
points are generated.
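For the first use of interpolation, evaluating an archive curve at the observed phases, a linear interpolant on a possibly non-equidistant phase grid might look as follows. The sketch assumes phases normalized to [0, 1) with period 1 and a strictly increasing grid, and it wraps around the period boundary. Cubic interpolation, which the text also mentions, would replace the two-point formula. The function name is illustrative.

```python
import bisect


def interp_periodic(phase_grid, values, phase):
    """Linearly interpolate one archive curve at an observed phase.

    phase_grid must be strictly increasing within [0, 1); the curve is
    treated as periodic with period 1, so interpolation wraps around.
    """
    phase = phase % 1.0
    n = len(phase_grid)
    j = bisect.bisect_right(phase_grid, phase)
    if j == 0:  # before the first grid point: use the last point shifted by -1
        x0, y0 = phase_grid[-1] - 1.0, values[-1]
        x1, y1 = phase_grid[0], values[0]
    elif j == n:  # after the last grid point: use the first point shifted by +1
        x0, y0 = phase_grid[-1], values[-1]
        x1, y1 = phase_grid[0] + 1.0, values[0]
    else:
        x0, y0 = phase_grid[j - 1], values[j - 1]
        x1, y1 = phase_grid[j], values[j]
    t = (phase - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)
```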

The second occurrence is when it comes to deter-
mining the best fit. The linear regression returns the
archive parameter set whose curve best matches the
observed one. Alternatively, we could exploit several archive points to
obtain a better fit to the observed curve. Interpolation
in an appropriately defined neighborhood of the best
archive solution can improve the fit of the observed
curve.


Numerical Efficiency The efficiency of a least squares
method could be measured by the number of function
or model evaluations per unknown parameter. If we as-
sume that for each model parameter p we generate $n_p$
archive curves in the archive, the archive contains

$$N_a = \prod_{p=1}^{|P|} n_p$$

test curves and thus requires $N_a$ model evaluations;
$n_p$ is the number of archive grid points of parameter p.
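The product formula for N_a is easy to evaluate. As a purely hypothetical illustration, ten parameters with ten grid values each already give N_a = 10^10 curves, the order of magnitude quoted for K above. The number of values per curve used for the storage estimate is likewise an assumption.

```python
import math


def archive_size(grid_points):
    """N_a = product of n_p over all parameters p (one model evaluation per curve)."""
    return math.prod(grid_points)


def storage_bytes(grid_points, values_per_curve, bytes_per_value=8):
    """Flat-file size counting only the stored float64 values of each curve."""
    return archive_size(grid_points) * values_per_curve * bytes_per_value


if __name__ == "__main__":
    n = [10] * 10  # hypothetical: 10 parameters with 10 grid values each
    print(archive_size(n))                     # 10000000000
    print(storage_bytes(n, 100) / 1e12, "TB")  # 8.0 TB
```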


Conclusions 


This contribution outlines how to solve ODE and PDE 
based least squares problems. Academic and commer- 
cial least squares solvers as well as software packages 
are available. Massive data sets and observations arise 
in data mining problems, medicine, the stock market, 
and surveys in astrophysics. The approach described 
in Sect. “The Matching Approach” has proven
efficient for surveys in astrophysics. It can also sup-
port the generation of impersonal, good initial param-
eter estimates for further analysis. The archive ap-
proach is also suitable for parameter fitting problems
with non-smooth models. Another advantage is that it
provides the global least squares minimum over the
archive grid.
Complexity theory poses the question: How much com-
puting time is required to solve a problem, as a func-
tion of the size of the problem? Similar questions may
be asked about other computing resources like memory
space. In the context of optimization, the commonly
asked complexity question is how much computing
time, as a function of m and n, is required to solve
a certain class of mathematical programming problems 
with n variables and m constraints. This form of asymp- 
totic complexity analysis was introduced by J. Hartma- 
nis and R.E. Stearns [4]. 

Several different complexity theories have been de-
veloped to address this question. The best known com-
plexity theory is based on Turing machines. Before
defining this term, we start with a definition of ‘prob-
lem’. Formally, a problem is a function F that takes as
input an instance and produces as output a result. For 
example, in the context of linear programming the in- 
stance would be a triple (A, b, c) specifying a standard- 
form linear program LP: 


$$\min\ c^{T}x \quad \text{s.t.}\quad Ax = b,\quad x \ge 0.$$


The value of F(A, b, c) is the optimal value of the LP 
instance, or perhaps the optimizer. The range of F must 
also include special output values to signify an infeasi- 
ble instance, an unbounded instance, or an ill-formed 
instance, e.g., dimensions of A and b are incompati- 
ble. Thus, in the context of complexity theory, the word 
‘problem’ refers not to a specific instance but to a class 
of instances. 

For a Turing machine, all instances must be speci- 
fied as finite-length strings of symbols where the sym- 
bols are chosen from a fixed, finite alphabet. For LP 
and other optimization problems, a reasonable alphabet 
would include the ten digits and delimiter marks like 
decimal points, commas and parentheses. A cardinality ar-
gument shows that this stipulation of finite strings over a
finite alphabet precludes the consideration of problems
with arbitrary real number data. Thus, Turing machine 
solution of linear programming is generally restricted 
to rational or integer data. Rational and integer data are 
essentially equivalent since one can transform rational 
to integer by multiplying by a common denominator. 

A second limitation of the Turing machine model is 
that there is no simple way to specify a general objec- 
tive function or constraint function of an optimization 
problem as part of the input. There is a generalization 
of the Turing machine definition to overcome this lim- 
itation (so-called ‘oracle’ Turing machines), but in this 
article we limit attention to conventional Turing ma- 
chines. This limitation means that our Turing machine 
complexity analysis focuses on optimization problems 
with predefined classes of objective functions and con- 
straints in which the only free parameters are numeric 
data, e. g., (A, b, c) in linear programming. 

A Turing machine (TM) is a computational device 
equipped with an infinitely-long tape used for memory 
and a controller with a finite program. The tape con-
tains an infinite number of cells numbered 0, 1, 2, ...,
and each cell is capable of holding one symbol chosen 
from a finite alphabet. The alphabet of the tape is a su- 
perset of the alphabet used for the input. Initially the 
tape contains the input instance written one symbol per 
cell starting at the left end of the tape (cell 0). The re- 
maining cells contain a special symbol meaning ‘blank’. 

The Turing machine controller has a tape head that 
is above one cell of the tape at any particular time. The 
controller is always in a state chosen from a finite list 
of states. Finally, the TM obeys a finite list of transition 
rules. Each transition rule has the form: ‘if the current 
symbol under the head is x and the current state is y,
then change the symbol to x′, change the state to y′ and
move the tape head one cell in direction d’, where d is
either ‘left’ or ‘right’. Thus, a TM is fully specified by its
input alphabet, its tape alphabet, its list of states, and its
list of transition rules. If, for any given combination of
current symbol/current state, there is at most one appli-
cable transition rule, the TM is said to be deterministic;
otherwise it is said to be nondeterministic. In this article we
consider deterministic TMs only.

An execution of a Turing machine consists of a se- 
quence of moves. Initially, as mentioned above, the in- 
put is written on the tape, the head is at position 0, and 
the machine is in a specially designated state called the 
‘start’ state. The applicable transition rule is selected 
and executed, meaning that cell 0 is rewritten and the 
head is moved. Each execution of a transition rule is 
called a ‘move’. The Turing machine continues to make 
moves until it reaches another special state called the 
‘halt’ state. 

The Turing machine is said to solve problem F, if 
given an input instance x, the TM (eventually) writes 
F(x) on its tape starting at position 0, followed by 
blanks, and then halts. If for some input the TM could 
ever execute an illegal operation, e.g., move left from 
cell 0, or enter a state/symbol combination before halt- 
ing for which there is no applicable transition rule, then 
it does not solve F. Furthermore, we require that the 
Turing machine can correctly handle every possible fi- 
nite string that can be written with the input alphabet. 
For incorrectly formatted strings, the Turing machine 
should output a special string indicating incorrect for- 
matting. 

The running time of a Turing machine for a given 
input instance is the number of moves required before 
it halts. The running time for the whole problem F is
usually expressed as a function of the size of the input, 
that is, the number of symbols in the input string. 
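The definitions above translate almost directly into a simulator. The sketch below represents a deterministic TM as a dictionary that maps (state, symbol) to (new symbol, new state, direction). A missing rule or a left move from cell 0 is treated as an illegal operation, and the number of moves is counted as the running time. The example machine, which complements a binary string, is a toy chosen for illustration, and '_' stands for the blank symbol.

```python
BLANK = "_"


def run_tm(rules, tape_input, start="start", halt="halt", max_moves=10**6):
    """Simulate a deterministic Turing machine.

    rules maps (state, symbol) -> (new_symbol, new_state, direction);
    direction "L" moves left and any other value is treated as a right move.
    Returns (tape contents without trailing blanks, number of moves).
    """
    tape = list(tape_input) or [BLANK]
    head, state, moves = 0, start, 0
    while state != halt:
        if moves >= max_moves:
            raise RuntimeError("move limit exceeded")
        key = (state, tape[head])
        if key not in rules:
            raise RuntimeError(f"no transition rule for {key}")
        symbol, state, direction = rules[key]
        tape[head] = symbol
        if direction == "L":
            if head == 0:
                raise RuntimeError("illegal move left from cell 0")
            head -= 1
        else:
            head += 1
            if head == len(tape):
                tape.append(BLANK)  # the tape is unbounded to the right
        moves += 1
    return "".join(tape).rstrip(BLANK), moves


# Toy machine: complement every bit, then halt on the first blank.
COMPLEMENT = {
    ("start", "0"): ("1", "start", "R"),
    ("start", "1"): ("0", "start", "R"),
    ("start", BLANK): (BLANK, "halt", "R"),
}

if __name__ == "__main__":
    print(run_tm(COMPLEMENT, "0110"))  # ('1001', 5)
```

On this toy machine the running time is one move per input symbol plus one, so it is linear in the size of the input.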

It can be proved using lengthy constructions that 
a Turing machine is capable of all the operations of an 
ordinary computer: it can simulate consecutively num- 
bered memory cells each holding a separate integer or 
rational number that are individually addressable, it can 
multiply, divide, add, and subtract two such numbers, 
etc. For a more detailed treatment of Turing machines, 
see [5]. 

A Turing machine is said to solve problem F in 
polynomial time if its running time is no more than 
a polynomial function of the size of the input. Examples 
of optimization problems that can be solved in poly- 
nomial time include linear and convex quadratic pro- 
gramming. 

A decision problem is a problem F in which the 
range of F consists of just two entries, ‘YES’ and ‘NO’. 
Optimization problems can often be recast as decision 
problems. For instance, in the case of linear program-
ming, the input instance consists of (A, b, c, r), where r
is a rational number, and the TM outputs ‘YES’ if the
optimal value of the LP instance is r or less, else
it outputs ‘NO’. For incorrectly formatted and infea-
sible problems, the TM also outputs ‘NO’. The deci-
sion problem F partitions the input space into two sets
of strings, ‘YES’-instances and ‘NO’-instances. A syn-
onym for ‘decision problem’ is language recognition
problem.

The set P is defined to be all decision problems that 
can be solved in polynomial time. This set includes lin- 
ear programming (as recast in the previous paragraph) 
and many combinatorial optimization problems such 
as the minimum spanning tree problem and the short- 
est path problem (cf. also » Shortest path tree algo- 
rithms). 

Many interesting problems, such as nonconvex 
quadratic programming and Boolean satisfiability, are 
not known to be in P, but are also not proven to lie out- 
side of P. To analyze these cases, we introduce a second 
complexity class called NP. A decision problem F is said 
to lie in NP if there exists a polynomial time ‘certificate- 
checking’ machine M outputting ‘YES’ or ‘NO’, and 
polynomials p(·), q(·) with the following properties. For
every ‘YES’-instance x of F, there exists another string
y, called the certificate of x, such that the size of y is
no more than p(size(x)) and the pair (x, y) (i.e.,
the string concatenation of x and y properly delimited)
is a ‘YES’-instance of M. On the other hand, for every
‘NO’-instance x of F, and for every possible certificate
y, M outputs ‘NO’ for the input pair (x, y). Finally, in
both cases, M is required to run in time no more than
q(size(x) + size(y)).

Notice this definition is asymmetric between ‘YES’-
and ‘NO’-instances. Thus, it is not necessarily true
that if problem F is in NP, then the problem $\bar{F}$ that
results from complementing F’s output (i.e., ‘YES’-
instances of F are ‘NO’-instances of $\bar{F}$ and vice
versa) is still in NP. Indeed, the question of whether
$F \in \mathrm{NP} \Leftrightarrow \bar{F} \in \mathrm{NP}$ is a well-known open question.

Another observation from the above definition is
that P ⊆ NP. In particular, if a decision problem F lies
in P, then it has a polynomial time Turing machine T
that distinguishes ‘YES’-instances from ‘NO’-instances.
In this case, it is simple to design the certificate checking 
machine M needed for the definition of NP: in particu- 
lar, M takes as input (x, y), it discards y, i. e., overwrites 
it with blank cells, and then switches to running T on x. 

The most famous open question in complexity the-
ory is whether this containment is actually equality,
i.e., whether P = NP. It turns out that the P = NP
question hinges on the NP-complete problems, the proto-
type of which is the satisfiability problem. The satisfi-
ability problem is as follows. An instance is a Boolean
formula with variables $x_1, \ldots, x_n$ combined by con-
junctions, disjunctions and complement operations. For example,
$x_1 \wedge (\bar{x}_1 \vee \bar{x}_2) \wedge (x_2 \vee x_3)$ is a satisfiability instance. The
decision problem is to determine whether there is an
assignment of the variables, each one either ‘TRUE’ or
‘FALSE’, to make the entire formula true following the
usual laws of Boolean algebra. For example, the preced-
ing formula is a ‘YES’-instance because there is a satis-
fying assignment, namely $x_1$ = ‘TRUE’, $x_2$ = ‘FALSE’, $x_3$
= ‘TRUE’. It is easy to see that this problem is in NP: the
certificate for a ‘YES’-instance is the satisfying assign-
ment. The certificate-checking machine M substitutes
the satisfying assignment into the formula and verifies
that the formula evaluates as ‘TRUE’. Thus, every satis-
fiable formula has a certificate, but every nonsatisfiable
instance is rejected by M no matter what certificate is
given.
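The certificate-checking machine M for satisfiability is easy to sketch for formulas in conjunctive normal form, such as the example above. As an assumption of this example, the formula is encoded DIMACS-style as a list of clauses. Each clause is a list of nonzero integers, where j stands for x_j and -j for its complement. The check runs in time linear in the combined size of formula and certificate.

```python
def check_certificate(clauses, assignment):
    """Return True iff the assignment (the certificate) satisfies every clause.

    clauses: list of clauses, each a list of nonzero ints (j means x_j,
             -j means the complement of x_j).
    assignment: dict mapping variable index j -> bool.
    A certificate that does not assign every variable is rejected.
    """
    variables = {abs(lit) for clause in clauses for lit in clause}
    if not variables.issubset(assignment):
        return False
    # A clause is satisfied if some literal agrees with the assigned value.
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


# x1 AND (NOT x1 OR NOT x2) AND (x2 OR x3), the example formula in the text
FORMULA = [[1], [-1, -2], [2, 3]]

if __name__ == "__main__":
    print(check_certificate(FORMULA, {1: True, 2: False, 3: True}))  # True
    print(check_certificate(FORMULA, {1: True, 2: True, 3: True}))   # False
```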

S.A. Cook [2] proved that every problem in NP is 
polynomially transformable to satisfiability. We say that 
decision problem F is polynomially transformable to F’ 
if there exists a Turing machine T that takes as input an 
instance x of F and produces as output an instance x’ 
of F’, such that the running time of T is no more than 
a polynomial in the size of the input, and such that x is 
a ‘YES’-instance of F if and only if x’ is a ‘YES’-instance
of F’. 

Cook’s result means that given any decision prob- 
lem F in NP, there is a Turing machine M depending 
on F that takes as input an instance x of F and proceeds 
as follows. In polynomial time, M constructs a Boolean 
formula x’ from x such that x’ is satisfiable if and only if
x is a ‘YES’-instance of F. The construction of M is
as follows. The Boolean formula x’ simulates the action
of the certificate-checking machine for F on x. The actual
entries of the certificate are represented by unknown 
Boolean variables, as are the entries on the tape of M af- 
ter the first move. The formula is composed of clauses 
that require the Turing machine to obey all transition 
rules and to end up halting with ‘YES’ as the output. 

Thus, Cook’s theorem implies that if there were 
a polynomial time algorithm for satisfiability, then 
there would be a polynomial time algorithm for every 
other problem in NP. Thus, the famous open question 
‘is P = NP?’ is now reduced to the (apparently) simpler 
question ‘is there a polynomial time algorithm for the 
satisfiability problem?’ 

It is not yet (1999) known whether the answer to 
either question in the last paragraph is ‘yes’. But many 
in the field suspect that the answer is ‘no’, i.e., many 
believe that there is no polynomial time algorithm for 
satisfiability. If indeed it is proved some day that no 
such polynomial time algorithm exists, we would say 
that satisfiability is intractable. 

A decision problem F is said to be NP-complete if it 
has these two properties, namely 
1) F is in NP; and
2) every problem in NP can be polynomially transformed to F.

Cook’s result can be restated as: satisfiability is NP- 
complete. Furthermore, since polynomial transforma- 
tions can be composed, any problem F in NP to which 
any known NP-complete problem F’ can be trans- 
formed must itself be NP-complete. After Cook’s result 
was announced, R.M. Karp [6] showed that many well- 
known combinatorial problems, such as the Hamilto-
nian cycle problem (‘given an undirected graph, is there
a cycle containing each vertex exactly once?’) and the
max-clique problem (‘given an undirected graph and an 
integer k, is there a set of k vertices that are all mutu- 
ally connected by edges?’), are NP-complete. By 1979, 
already thousands of problems were known to be NP- 
complete and many were catalogued in [3]. A proof that 
a problem is NP-complete is regarded as strong evi- 
dence of the problem’s intractability. 

Although the first batch of NP-completeness proofs 
applied to combinatorial problems, many continu- 
ous optimization problems are also known to be NP- 
complete; see » NP-complete problems and proof 
methodology. 

A generalization of ‘NP-complete’ is the notion of 
‘NP-hard’. A problem F is said to be NP-hard if satisfia- 
bility (or any other NP-complete problem) can be poly- 
nomially transformed to F. Thus, an NP-hard problem 
does not necessarily lie in NP. Indeed, the term NP- 
hard is often used to describe problems that are not 
even decision problems. 

The Turing machine is not the only model of com- 
plexity used in the literature. In fact, some feel that the 
TM is inadequate for modeling continuous optimiza-
tion problems. Most continuous optimization prob-
lems are based on computation with real numbers, but
true real number computation is not possible with
a TM. One model of real number computation is the 
information-based model. In this model, an algorithm 
is composed of operations on real numbers. Operations 
on real numbers are often counted as cost-free in this 
model, and the only costly operation is the evaluation of 
the functions defining the objective and constraints of 
the optimization problem. The objective functions and 
constraints are considered external black-box subrou- 
tines that take as input a vector and return as output the 
value of the function and possibly derivative values. In 
information-based complexity, a parameter ε > 0 that
specifies the desired degree of accuracy in the solution 
is always part of the input, since the information-based 
model rarely permits any problem to be solved exactly. 
This information-based model was used to analyze the 
ellipsoid method by its inventors D.B. Yudin and A.S. 
Nemirovsky [7]. It has also been used to analyze com- 
plexity of local optimization by S.A. Vavasis [12]. This 
model has also been used extensively to analyze other 
numerical algorithms not related to optimization, e. g., 
quadrature and other linear problems [9]. 
A second model of computation is a real number
model in which each operation is unit cost, and in which 
there is no concept of external black-box function eval- 
uation. In this model it is possible to develop real num- 
ber analogs of complexity classes P and NP, and also 
a reasonable definition for NP-complete; see [1]. This 
model can be used to analyze linear programming and 
other problems specified by a finite number of real pa-
rameters. The complexity of the linear programming prob-
lem in this model is not fully understood (see [13] and
[10]).

For a more detailed look at complexity in optimiza- 
tion up to 1991, see [11]. For a more recent collection 
of papers on this topic, see [8]. 
Nowhere in optimization is the dichotomy between
convex and nonconvex programming more appar-
ent than in complexity issues for quadratic program-
ming. Quadratic programming, abbreviated QP, refers
to minimizing a quadratic function $q(x) = x^{T}Hx/2 + c^{T}x$
subject to linear constraints $Ax \ge b$. The problem
is thus specified by the four-tuple (H, A, b, c) where H is
a symmetric n × n matrix, A is an m × n matrix, b is an
m-vector and c is an n-vector. Minimizing a quadratic
function subject to convex quadratic constraints is also
an interesting problem and is considered at the end of
this article. The quadratic function q(x) is said to be
convex if the matrix H is positive semidefinite. A spe-
cial case of a convex function is when H = 0, in which
case the problem is called linear programming.

Convex quadratic programming inherits all the de-
sirable attributes of general convex programming.
In particular, there is no local minimizer other than the 
global minimizer(s). Furthermore, general convex pro- 
gramming techniques like the ellipsoid method and in- 
terior point methods can be applied. 

With either the ellipsoid or interior point method, 
convex quadratic programming can be solved in poly- 
nomial time. In more detail, assume that (H, A, b, c) 
contain all integer data so that the problem is finitely 
represented for a Turing machine (see » Complexity 
theory). Let L denote the length of the input data, that 
is, the total number of digits to write (H, A, b, c). As-
sume $L \ge n^{2}$ since H has $n^{2}$ entries. Then M.K. Ko-
zlov et al. [9] showed that the ellipsoid algorithm of
A.S. Nemirovsky and D.B. Yudin [14] can solve a con-
vex QP instance in $O(n^{2}L)$ iterations, where each
iteration requires $O(n^{2})$ arithmetic operations on in-
tegers, each of which has at most O(L) digits. This re-
sult built on an analogous result for the LP case by L.G.
Hačijan (also spelled L.G. Khachiyan) [4]. Thus, the to-
tal running time of this algorithm is polynomial in the
size of the input. Note that the global minimizer for
quadratic programming (either convex or nonconvex), 
if it exists, can be written down with O(nL) digits, and 
hence computing the true global minimizer in a Turing 
machine setting is possible. 

Later, S. Kapoor and P.M. Vaidya [6] and Y. Ye 
and E. Tse [29] proved that an interior point method 
can solve convex quadratic programming in polyno- 
mial time under similar assumptions. This result built 
on the earlier result for the LP case by N.K. Karmarkar 
[7]. The best known running time for an interior point 
method for convex QP is $O(n^{1/2}L)$ iterations, each it-
eration requiring $O(n^{3})$ arithmetic operations on inte-
gers each of which has at most O(L) digits, and is based
on work by J. Renegar [17].

The running times of both the ellipsoid and inte- 
rior point algorithms are ‘weakly’ polynomial, meaning 
that the number of arithmetic operations is bounded 
by a polynomial in L rather than by a polynomial in 
m and n. In contrast, polynomial time algorithms for 
other problems like solving a system of linear equations
or finding a minimum flow in a network are strongly poly-
nomial time, meaning that the number of operations is
bounded by a polynomial in the combinatorial dimen-
sion of the input data. A well-known open question (as of
1999) asks whether there is a strongly polynomial time
algorithm for convex QP (or, more specifically, for LP).
A strongly polynomial algorithm would involve a num- 
ber of arithmetic operations polynomially bounded in 
m, n, in which each operation involves integers with 
a number of digits bounded by a polynomial in L. Some 
progress related to this question is as follows. If the di- 
mension n is restricted to a small integer, then QP can 
be solved in time linear in m. This result is due to I. 
Adler and R. Shamir [1] and builds on [10] and [2]. 
Since the constant of proportionality (or perhaps an ad-
ditive term) is exponential in n, this algorithm is useful
only when n ≪ m. An example would be
quadratic programming arising from a geometric prob- 
lem, such as finding the point in a 3D polyhedron clos- 
est to the origin. 

In the case of linear programming, a modified ellip- 
soid algorithm has a number of operations depending 
only on $L_A$, where $L_A$ is the number of digits in A (i.e.,
the number of operations no longer depends on b or c), 
a result due to E. Tardos [19] and extended in [25]. Fi- 
nally, some special cases of quadratic programming are 
known to be solvable in strongly polynomial time such 
as the convex quadratic knapsack problem [5] which is: 


$$\min\ q_1(x_1) + \cdots + q_n(x_n) \quad \text{s.t.}\quad l \le x \le u,\quad b^{T}x = y. \qquad (1)$$

Here $q_1, \ldots, q_n$ are convex quadratic functions of one
variable (each specified by a quadratic and a linear co-
efficient); l, u, b, y are also part of the problem data.
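For intuition, consider problem (1) with strictly convex pieces q_i(x_i) = d_i x_i^2/2 + g_i x_i, d_i > 0; this parametrization is an assumption of the example. It can be solved through its optimality conditions. For a multiplier λ of the equality constraint, x_i(λ) is the minimizer of q_i(x_i) + λ b_i x_i clipped to [l_i, u_i], and b^T x(λ) is nonincreasing in λ. The sketch below simply bisects on λ. This is not the strongly polynomial algorithm of [5], and bisection only yields an approximate multiplier. The names are hypothetical.

```python
def solve_convex_knapsack(d, g, b, l, u, y, tol=1e-12, max_iter=200):
    """Approximately minimize sum(d[i]/2 * x[i]**2 + g[i] * x[i])
    subject to l <= x <= u and sum(b[i] * x[i]) == y, assuming all d[i] > 0.

    Bisection on the multiplier lam of the equality constraint.
    """
    def x_of(lam):
        # Closed-form minimizer of q_i(x) + lam*b_i*x over [l_i, u_i].
        return [min(max((-gi - lam * bi) / di, li), ui)
                for di, gi, bi, li, ui in zip(d, g, b, l, u)]

    def h(lam):
        return sum(bi * xi for bi, xi in zip(b, x_of(lam))) - y

    lo, hi = -1.0, 1.0
    # h is nonincreasing in lam; widen the bracket until h(lo) >= 0 >= h(hi).
    for _ in range(200):
        if h(lo) >= 0.0 and h(hi) <= 0.0:
            break
        lo, hi = 2.0 * lo, 2.0 * hi
    else:
        raise ValueError("infeasible: y lies outside the attainable range of b^T x")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol:
            break
    return x_of(0.5 * (lo + hi))
```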
Nonconvex quadratic programming is much harder 
than convex quadratic programming. If H is not posi- 
tive semidefinite, then the QP instance is said to be non- 
convex. A special case of nonconvex problems is when 
H is negative semidefinite, in which case the problem is 
said to be concave quadratic programming. When H is 
neither positive nor negative semidefinite, the problem 
is indefinite. Nonconvex quadratic programming was 
shown to be NP-hard by S. Sahni [18]. If the problem is posed as a decision problem, then it lies in NP (and is therefore NP-complete), a result due to S.A. Vavasis [20]. (See → Complexity theory or → NP-complete problems and proof methodology for the definitions of NP-complete and NP-hard.) Even the problem of finding a local minimizer is known to be NP-hard, a result due to K.G. Murty and S.N. Kabadi [13].
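The convex, concave and indefinite cases above depend only on the signs of the eigenvalues of H. A small sketch, assuming H is the (symmetric) Hessian of the quadratic objective; the function name and tolerance are illustrative choices:

```python
import numpy as np

def classify_qp(H, tol=1e-10):
    """Classify a QP with Hessian H by the signs of the eigenvalues of H."""
    H = np.asarray(H, dtype=float)
    w = np.linalg.eigvalsh(0.5 * (H + H.T))   # symmetrize, then eigenvalues
    if w.min() >= -tol:
        return "convex"        # H positive semidefinite (includes H = 0)
    if w.max() <= tol:
        return "concave"       # H negative semidefinite
    return "indefinite"        # eigenvalues of both signs

kinds = [classify_qp(H) for H in ([[2, 0], [0, 1]],
                                  [[-1, 0], [0, -3]],
                                  [[1, 0], [0, -1]])]
```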

Many restricted versions of the problem are still NP-hard. The nonconvex quadratic knapsack problem, that is, (1) with general (not necessarily convex) quadratic functions $q_1, \dots, q_n$, is NP-hard [18]. QP with only box constraints, that is, minimize q(x) subject to $l \le x \le u$, is also NP-hard. Similarly, minimizing q(x) subject to simplicial constraints, that is, constraints $x \ge 0$ and $x_1 + \cdots + x_n = 1$, is NP-hard, as proved in [15] using a theorem of T.S. Motzkin and E.G. Straus [12]. The simplicial case is interesting because minimizing either a concave or a convex quadratic function on a simplex can be done in polynomial time. P.M. Pardalos and Vavasis [16] showed that quadratic programming in which H has a single negative eigenvalue (i.e., H is 'almost' positive semidefinite) is NP-hard.
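The concave half of that simplicial remark is easy to see: a concave function attains its minimum over a polytope at a vertex, and the simplex has only n vertices $e_i$. A sketch, assuming $q(x) = \tfrac12 x^T H x + c^T x$ with H negative semidefinite (the normalization is an assumption of this example):

```python
import numpy as np

def min_concave_on_simplex(H, c):
    """Minimize q(x) = 0.5 x^T H x + c^T x over {x >= 0, sum(x) = 1},
    assuming H is negative semidefinite (q concave). The minimum is at a
    vertex e_i of the simplex, where q(e_i) = 0.5 * H_ii + c_i."""
    H = np.asarray(H, dtype=float)
    c = np.asarray(c, dtype=float)
    vals = 0.5 * np.diag(H) + c       # q evaluated at each vertex e_i
    i = int(np.argmin(vals))
    x = np.zeros(len(c))
    x[i] = 1.0
    return x, vals[i]

x, val = min_concave_on_simplex(-np.eye(2), [0.0, 0.1])
```

This takes O(n) evaluations; the hard simplicial case is therefore the indefinite one.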

The hardness results have motivated a search for ap- 
proximation algorithms to nonconvex quadratic pro- 
gramming problems. Vavasis [22,24] proposed ap- 
proximation algorithms for concave and indefinite QP 
in which the complexity depends exponentially on 
the number of negative eigenvalues of H. An addi- 
tional result is a fully polynomial time approximation 
scheme for the indefinite knapsack problem. Ye [28] 
gave a constant-factor polynomial time approximation 
scheme for indefinite quadratic programming with
box constraints. 

Because computing even a local minimum of 
a quadratic programming instance is hard, several re- 
searchers have looked at approximations and special 
cases for the local minimization problem. J.J. Moré and 
Vavasis [11] proved that a local minimizer for the con- 
cave knapsack problem can be found in polynomial 
time; this result was extended to the indefinite case in 
[23]. Ye [28] gave a polynomial time algorithm to find 
an approximate KKT point of general nonconvex QP. 

So far we have considered only linear constraints. A convex quadratic constraint, also called an ellipsoidal constraint, is a constraint of the form $(x - c)^T A (x - c) \le 1$, where A is a symmetric positive semidefinite matrix. The problem of minimizing a nonconvex quadratic function subject to a single ellipsoidal constraint is called the trust region problem and has received extensive attention in the literature because algorithms to solve this problem are often used as subroutines by general-purpose optimization algorithms.
A polynomial time algorithm for the trust region prob- 
lem was proposed independently by Ye [27] and Kar- 
markar [8]. The sense in which this algorithm is ‘poly- 
nomial time’ is weaker than the analogous claim for 
QP because in the trust region case, the optimizer x 
cannot be written in a finite number of digits even if 
the input data are all integers (because the solution may be irrational). But Vavasis and R. Zippel [26] showed nonetheless that this algorithm leads to a proof that the associated decision problem lies in P. The trust region problem is thus one of the very few nonconvex optimization problems solvable in polynomial time. M. Fu,
Z.-Q. Luo and Ye [3] have considered generalizing this 
result to more than one ellipsoidal constraint, although 
the results are not as strong as the single-constraint 
case. 
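For intuition about the trust region problem, the following sketch treats the ball $\|x\| \le \Delta$ (a positive definite A can be reduced to this case by the change of variables $y = A^{1/2}(x - c)$) with objective $\tfrac12 x^T H x + g^T x$; the names H, g, delta are assumptions. It is not the Ye/Karmarkar algorithm: it uses an eigendecomposition and bisection on the multiplier $\lambda$ in $(H + \lambda I)x = -g$, and it ignores the so-called hard case.

```python
import numpy as np

def trust_region_subproblem(H, g, delta, iters=200):
    """Minimize q(x) = 0.5 x^T H x + g^T x subject to ||x||_2 <= delta.

    Sketch: eigendecomposition of H plus bisection on lam in (H + lam I) x = -g.
    Ignores the 'hard case' (g orthogonal to the eigenvectors of the
    smallest eigenvalue of H).
    """
    w, V = np.linalg.eigh(H)          # H = V diag(w) V^T, w ascending
    gt = V.T @ g                      # g expressed in the eigenbasis

    def step(lam):
        # Solution of (H + lam I) x = -g for lam > -w[0].
        return -V @ (gt / (w + lam))

    # Interior case: H positive definite and the Newton step fits in the ball.
    if w[0] > 0:
        x = step(0.0)
        if np.linalg.norm(x) <= delta:
            return x

    # Boundary case: ||step(lam)|| decreases for lam > max(0, -w[0]);
    # find lam with ||step(lam)|| = delta by bisection.
    lo = max(0.0, -w[0])
    hi = lo + np.linalg.norm(g) / delta + 1.0   # guarantees ||step(hi)|| < delta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return step(hi)

H = np.diag([-1.0, 2.0])              # nonconvex objective
g = np.array([1.0, 1.0])
x = trust_region_subproblem(H, g, 1.0)
```

With a negative eigenvalue the minimizer lies on the boundary, so the returned x has norm (numerically) equal to delta.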

All of the pre-1991 material in this article is covered 
in more depth by [21]. 
By composite nonsmooth optimization (CNSO) we mean a class of optimization problems involving composite functions of the form $f(x) := g(F(x))$, where $F: \mathbb{R}^n \to \mathbb{R}^m$ is a (differentiable) smooth map and $g: \mathbb{R}^m \to \mathbb{R}$ is a nonsmooth function. The function g is often a nonsmooth convex function. Problems of CNSO occur when solving nonlinear equations $F_i(x) = 0$, $i = 1, \dots, m$, by minimizing the norm $\|F(x)\|$. Similar problems arise when finding a feasible point of a system of nonlinear inequalities $F_i(x) \le 0$, $i = 1, \dots, m$, by minimizing $\|F(x)^+\|$, where $F_i^+ = \max(F_i, 0)$. Composite functions f also appear in the form of an exact penalty function when solving a nonlinear programming problem. Another type of CNSO problem, which frequently arises in (electrical) engineering [4], is to minimize the max-function $\max_i F_i(x)$, where the maximum is taken over some (finite) set. All these examples can be cast within the structure of CNSO. Moreover, CNSO provides a unified framework in which to study theoretical properties and convergence behavior of various numerical methods for constrained optimization problems.
There have been many contributors to the study of 
CNSO problems both in finite and infinite dimensions. 
(See for example [2,6,7,10,11,13,15,16,19].) In this article we only discuss different forms of composite model problems in finite dimensions and provide a brief account of their first- and second-order Lagrangian theory. The implications for numerical optimization are not discussed here. For details on this see, for instance, [1,6].
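The three examples above differ only in the outer function g. A sketch with a hypothetical smooth map F, chosen only to show that each model is literally $f = g \circ F$; each g is convex and nonsmooth (the norm at 0, the positive part and the max where components tie):

```python
import numpy as np

def F(x):
    # Hypothetical smooth map R^2 -> R^3, chosen only for illustration.
    return np.array([x[0]**2 + x[1] - 1.0, x[0] - x[1], np.sin(x[0])])

def g_norm(y):          # solving F(x) = 0 by minimizing ||F(x)||
    return np.linalg.norm(y)

def g_plus(y):          # feasibility of F(x) <= 0 via ||F(x)^+||
    return np.linalg.norm(np.maximum(y, 0.0))

def g_max(y):           # minimax problems: max_i F_i(x)
    return np.max(y)

def composite(g, F):
    """Return the composite function f = g o F."""
    return lambda x: g(F(x))

f_norm, f_plus, f_max = (composite(g, F) for g in (g_norm, g_plus, g_max))
x0 = np.array([0.0, 0.0])             # F(x0) = (-1, 0, 0)
```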


Real-Valued CNSO 
Consider the problem

$$(\mathrm{P})\qquad \min_{x \in \mathbb{R}^n}\ g(F(x)).$$


Notably, A.D. Ioffe [7,8,9] provided the theoretical 
foundation for CNSO problems in the case where the 
function g is sublinear (convex plus positively homoge- 
neous). Then J.V. Burke [2] extended the theory to the 
case where g is convex. A fundamental local dualization 
technique plays a significant role in the development of 
first- and second-order Lagrangian theory for (P). To
see the dualization result, let us define the Lagrangian 
of (P) as 


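$$L(x, y^*) = \langle y^*, F(x)\rangle - g^*(y^*),$$

where $g^*$ is the Fenchel conjugate of g [14]. Let

$$L_0(z) = \{\, y^* : y^* \in \partial g(F(z)),\ F'(z)^T y^* = 0 \,\}$$

and let

$$L_{\eta,\varepsilon}(z) = \{\, y^* : y^* \in \partial_\varepsilon g(F(z)),\ \|F'(z)^T y^*\| \le \eta \,\},$$

where $F'(z)$ is the derivative of F at z, $\partial g(y)$ is the convex subdifferential of g at y, $\varepsilon > 0$, $\eta > 0$, and $\partial_\varepsilon g(F(z))$ is the $\varepsilon$-subdifferential of g at F(z). The set $L_0(z)$ is the set of Lagrange multipliers for (P) at z (see [2,11]). Define

$$\phi_{\eta,\varepsilon}(x) = \max_{y^* \in L_{\eta,\varepsilon}(z)} L(x, y^*).$$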


A general form of the Ioffe-Burke local dualization result [11] states that if g is a lower semicontinuous convex function and F is a locally Lipschitzian and (Gateaux) differentiable function, then the following statements are equivalent:

i) g(F(x)) attains a local minimum at z.

ii) $L_0(z) \ne \emptyset$ and $\phi_{\eta,\varepsilon}$ attains a local minimum at z, for any $\eta > 0$, $\varepsilon > 0$.

iii) $L_0(z) \ne \emptyset$ and $\phi_{\eta,\varepsilon}$ attains a local minimum at z, for some $\eta > 0$, $\varepsilon > 0$.

These conditions also provide first-order Lagrangian conditions for (P). Moreover, this local dualization result and a generalized Taylor expansion of V. Jeyakumar and X.Q. Yang [11] yield second-order optimality conditions for (P). If g is a lower semicontinuous convex function and F is a differentiable map with locally Lipschitzian derivative F' (i.e., $C^{1,1}$), then a necessary condition for $a \in \mathbb{R}^n$ to be a local minimizer of (P) is

$$\max_{y^* \in L_0(a)} L^{\circ\circ}(a, y^*; u, u) \ge 0, \quad \forall u \in K(a).$$

On the other hand, if $a \in \mathbb{R}^n$, $L_0(a) \ne \emptyset$ and

$$\max_{y^* \in L_0(a)} -L^{\circ\circ}(a, y^*; u, -u) > 0, \quad \forall u \in D(a),\ u \ne 0,$$

then a is a strict local minimizer of order 2 for (P), i.e., there exist $\varepsilon > 0$, $\rho > 0$ such that whenever $\|x - a\| \le \rho$, $f(x) \ge f(a) + \varepsilon \|x - a\|^2$. Here

$$K(a) = \{\, u \in \mathbb{R}^n : \exists\, t > 0,\ g(F(a) + tF'(a)u) \le g(F(a)) \,\},$$

$D(a) = \{u \in \mathbb{R}^n : f'(a; u) \le 0\}$, and the directional derivative of f at a is given by $f'(a; d) = g'(F(a); F'(a)d)$. The generalized second-order directional derivative of L at a in the directions $(u, v) \in \mathbb{R}^n \times \mathbb{R}^n$, $L^{\circ\circ}(a, y^*; u, v)$, is defined by

$$L^{\circ\circ}(a, y^*; u, v) = \limsup_{y \to a,\ s \downarrow 0} \frac{\langle \nabla L(y + su, y^*), v\rangle - \langle \nabla L(y, y^*), v\rangle}{s},$$

where $\nabla$ denotes the gradient with respect to x.


Special cases of these optimality conditions under twice continuous differentiability hypotheses can be found in [2,9]. Composite problems where the map F is $C^{1,1}$ but not necessarily twice continuously differentiable are discussed in [19].
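The chain rule $f'(a; d) = g'(F(a); F'(a)d)$ can be checked numerically for the max-function. For the convex $g(y) = \max_i y_i$, $g'(y; h)$ is the largest $h_i$ over the active indices (those with $y_i = \max y$). A sketch with a hypothetical F and its hand-coded Jacobian, compared against one-sided difference quotients at a kink:

```python
import numpy as np

def F(x):
    # Hypothetical smooth map R^2 -> R^2, chosen only for illustration.
    return np.array([x[0]**2 + x[1], x[0] - x[1]**2])

def JF(x):
    # Jacobian F'(x); row i is the gradient of F_i.
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, -2.0 * x[1]]])

def f(x):
    # Composite f = g o F with the convex, nonsmooth g(y) = max_i y_i.
    return np.max(F(x))

def dir_deriv(a, d, tol=1e-12):
    """f'(a; d) = g'(F(a); F'(a) d); for g = max this is the largest
    component of F'(a) d over the active indices of F(a)."""
    y = F(a)
    active = y >= y.max() - tol
    return np.max((JF(a) @ d)[active])

a = np.array([1.0, 0.0])              # F(a) = (1, 1): both pieces active
t = 1e-7
results = []
for d in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
    d = np.array(d)
    results.append((dir_deriv(a, d), (f(a + t * d) - f(a)) / t))
```

Note that $f'(a; \cdot)$ is not linear here: it equals 2 in direction $(1, 0)$ but $-1$ (not $-2$) in direction $(-1, 0)$, which is exactly the nonsmoothness the composite theory handles.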


Extended Real-Valued CNSO 


A composite problem form which has greater versatility than the traditional form (P) is the following nonfinite-valued problem [15,16]:

$$(\mathrm{PE})\qquad \min\ g(F(x)) \quad \text{s.t.}\quad x \in \mathbb{R}^n,\quad F(x) \in \operatorname{dom}(g),$$

where $g: \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ is a convex function and $F: \mathbb{R}^n \to \mathbb{R}^m$ is a smooth map. For instance, constrained CNSO problems of the form

$$\min\ g_0(F_0(x)) \quad \text{s.t.}\quad x \in C,\quad g_j(F_j(x)) \le 0,\ j = 1, \dots, m,$$

studied in [10,17], can be rewritten in the form of (PE) [11]. Here C is a closed convex subset of $\mathbb{R}^n$, $g_j$, $j = 0, \dots, m$, are locally Lipschitz functions and $F_j$, $j = 0, \dots, m$, are differentiable functions. Optimality conditions for (PE) can be derived by reducing (PE) to a real-valued minimization problem, as shown in [3].
This requires a regularity condition known as a constraint qualification in the nonlinear programming literature. The following regularity condition, introduced in [15] as a basic constraint qualification, permits one to establish a reduction theorem. If $g: \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ is a lower semicontinuous convex function and if $F: \mathbb{R}^n \to \mathbb{R}^m$ is locally Lipschitzian, then the function $f(x) = g(F(x))$ is said to satisfy the basic constraint qualification at a point $x \in \operatorname{dom}(f)$ if the only point

$$w \in N(F(x) \mid \operatorname{dom}(g)) \quad \text{for which} \quad 0 \in w^T \partial F(x)$$

is $w = 0$, where $N(F(x) \mid \operatorname{dom}(g))$ is the normal cone to dom(g) at F(x) and $\partial F(x)$ is the generalized Jacobian of F at x [5]. The basic constraint qualification is equivalent to the Mangasarian-Fromovitz constraint qualification for the standard nonlinear programming problem with inequality and equality constraints (see [15]). The Burke-Poliquin reduction result gives us the following second-order conditions for (PE). For problem (PE), suppose that $F(a) \in \operatorname{dom}(g)$, g is lower semicontinuous and convex, and F is $C^{1,1}$. Then the following statements (i) and (ii) hold.

i) If a is a local minimizer of (PE) at which the basic constraint qualification holds, then

$$\max_{y^* \in L_0(a)} L^{\circ\circ}(a, y^*; u, u) \ge 0, \quad \forall u \in K(a).$$

ii) If $L_0(a) \ne \emptyset$ and

$$\max_{y^* \in L_0(a)} -L^{\circ\circ}(a, y^*; u, -u) > 0, \quad \forall u \in D(a),\ u \ne 0,$$

then a is a strict local minimizer of order 2 for (PE).
With the aid of a representation condition, second-order conditions can also be obtained for a global minimizer of (PE) in the case where F is twice strictly differentiable. This was shown in [19]. The problems (PE) have also been extensively studied by R.T. Rockafellar [15,16] in the case where F is twice continuously differentiable and g is a proper convex function that is piecewise linear-quadratic, in the sense that dom(g) is expressible as the union of finitely many polyhedral sets, relative to each of which g is given by a formula that is quadratic (or affine).


Multi-Objective CNSO 


Nonsmooth vector optimization problems (cf. → Vector optimization) where the functions involved are compositions of convex functions and smooth functions arise in various applications. The following model problem was examined in [12]:

$$(\mathrm{MP})\qquad \text{V-min}\ \big(f_1(F_1(x)), \dots, f_p(F_p(x))\big) \quad \text{s.t.}\quad x \in C,\quad g_j(G_j(x)) \le 0,\ j = 1, \dots, m,$$

where C is a convex subset of $\mathbb{R}^n$, $f_i$, $g_j$ are real-valued convex functions on $\mathbb{R}^m$, and $F_i$, $G_j$ are locally Lipschitz and differentiable functions from $\mathbb{R}^n$ into $\mathbb{R}^m$. Note here that 'V-min' stands for vector minimization. This model is broad and flexible enough to cover many common types of vector optimization problems. In particular, this model includes the penalty representation of the standard vector nonlinear programming problems examined in [18], and many vector approximation problems. By employing the Clarke subdifferential, first-order Lagrangian optimality and duality results can be derived, as shown in [12]. Second-order optimality conditions for a special case of the problem (MP) are discussed in [19,21].
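Reading 'V-min' in the usual Pareto sense (an assumption here; some works use weak efficiency instead), a point is optimal if no other feasible point is at least as good in every objective and strictly better in at least one. For a finite list of candidate objective vectors, the nondominated ones can be filtered with this O(N²p) sketch:

```python
def dominates(u, v):
    """u Pareto-dominates v for minimization: u <= v componentwise, u != v."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_minimal(points):
    """Return the objective vectors in `points` that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

front = pareto_minimal([(1, 4), (2, 2), (3, 1), (2, 3), (4, 4)])
```

Here (2, 3) is dominated by (2, 2) and (4, 4) by every other point, leaving three mutually incomparable vectors.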
Many problems that arise in operations research and
related fields are combinatorial in nature: problems 
where we seek the optimum from a very large but finite 
number of solutions. Sometimes such problems can be 
solved quickly and efficiently, but often the best solu- 
tion procedures available are slow and tedious. It there- 
fore becomes important to assess how well a proposed 
procedure will perform. 

The theory of computational complexity addresses 
this issue. Complexity theory is a comparatively young 
field, with seminal papers dating from 1971-1972 
([1,5]). Today, it is a wide field encompassing many 
subfields. For a formal treatment, see [6]. As we shall 
see, the theory partitions all realistic problems into two 
groups: the ‘easy’ and the ‘hard’ to solve, depending on 
how complex (hence how fast or slow) the computa- 
tional procedure for that problem is. The theory defines 
still other classes, but all but the most artificial mathe- 
matical constructs fall into these two. Each of them can 
be further subdivided in various ways, but these refinements are beyond our scope. Note that 'easy' and 'hard' are not the accepted terminology, which is introduced below.


Definitions 


A problem is a well-defined question to which an unam- 
biguous answer exists. Solving the problem means an- 
swering the question. The problem is stated in terms of 
several parameters, numerical quantities which are left 
unspecified but are understood to be predetermined. 
They make up the data of the problem. An instance 
of a problem gives specified values to each parameter. 
A combinatorial optimization problem, whether maxi- 
mization or minimization, has for each instance a fi- 
nite number of candidates from which the answer, or 
optimal solution, is selected. The choice is based on 
a real-valued objective function which assigns a value 
to each candidate solution. A decision problem or recognition problem has only two possible answers, YES or NO.


Example 1 For example, consider the problem of solving a given system of linear equations. Stated as a question, it becomes: 'what is the solution to Ax = b?' with parameters m, n, $a_{ij}$, $b_i$, where i = 1, …, m, j = 1, …, n. An instance might be: 'What is the solution to $7x_1 - 3x_2 = 16$ and $2x_1 + 5x_2 = 9$?' with parameters m = 2, n = 2, etc.

This is neither an optimization problem nor a decision problem. An example of an optimization problem is a linear program, which asks: 'what is the greatest value of cx subject to $Ax \le b$?' To make this a combinatorial optimization problem, we might make the variable x bounded and integer-valued so that the number of candidate solutions is finite. A decision problem is: 'does there exist a solution to the linear program with $cx \ge k$?'


To develop complexity theory, it is convenient to state 
all problems as decision problems. An optimization 
(say, maximization) problem can always be replaced by 
a sequence of problems of determining the existence of 
solutions with values exceeding $k_1, k_2, \dots$. An algorithm
is a step-by-step procedure which provides a solution 
to a given problem; that is, to all instances of the prob- 
lem. We are interested in how fast an algorithm is. We 
now introduce a measure of algorithmic speed: the time 
complexity function. 
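The replacement of an optimization problem by a sequence of decision problems can be made efficient with binary search on k. A sketch, assuming integer objective values in a known range [lo, hi] and a decision oracle answering 'is there a solution of value at least k?'; the oracle here is a hypothetical brute-force 0-1 knapsack check, exponential and used only to exercise the search:

```python
from itertools import product

def max_via_decision(oracle, lo, hi):
    """Largest integer k in [lo, hi] with oracle(k) True.

    Assumes oracle is monotone (True up to the optimum, False above) and
    oracle(lo) is True. Uses O(log(hi - lo)) oracle calls.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the interval always shrinks
        if oracle(mid):
            lo = mid                  # a solution of value >= mid exists
        else:
            hi = mid - 1              # the optimum is below mid
    return lo

# Hypothetical 0-1 knapsack instance.
values, weights, capacity = [6, 10, 12], [1, 2, 3], 5

def knapsack_oracle(k):
    return any(
        sum(w * s for w, s in zip(weights, x)) <= capacity
        and sum(v * s for v, s in zip(values, x)) >= k
        for x in product((0, 1), repeat=len(values))
    )

best = max_via_decision(knapsack_oracle, 0, sum(values))
```

Because the number of oracle calls grows with the logarithm of the value range, it is polynomial in the input length when the data are written in binary.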


The Nature of the Time Complexity Function 


Complexity theory does not measure the speed of an 
algorithm directly; that would depend on the speed of 
the computer being used and other extraneous factors. 
Rather, it considers the rate of growth of the solution 
time as a function of the instance size. Since different 
instances of the same size may require dramatically dif- 
ferent solution times, we use the ‘worst case’ or longest 
time that any instance of that size requires. This maxi- 
mal time needed to solve a problem instance, as a func- 
tion of its size, is called the time complexity function 
(TCF) or simply the complexity of the algorithm. When 
we speak of the complexity of the problem, we mean 
the complexity of the most efficient algorithm (known 
or unknown) that solves it. 

We need to clarify what we mean by the ‘time re- 
quired’ and the ‘size of an instance’. First, note that 
we always think of solving problems using a computer.
Thus, an algorithm is a piece of computer code. Sim- 
ilarly, the size of a problem instance is technically the 
number of characters needed to specify the data, or the 
length of the input needed by the program. For a deci- 
sion problem, an algorithm receives as input any string 
of characters, and produces as output either YES or NO 
or ‘this string is not a problem instance’. An algorithm 
solves the instance or string in time m if it requires m ba- 
sic operations to reach one of the three conclusions and 
stop. 

In order to avoid detailed consideration of the ex- 
act input length (are binary or alphanumeric characters 
used? what encoding scheme is used?), as well as avoid- 
ing precise measurement of solution times, the theory 
requires no more than orders of magnitude of these 
measurements. Recall that we are only concerned with the
rate of increase of solution time as instances grow. For 
example, we may ask how much longer it takes if we 
double the instance size. As long as we enter data con- 
sistently, an instance that is twice as big as another un- 
der one data entry scheme remains twice as big under 
another. (For a rigorous proof and other technical is- 
sues, see [4]). Indeed, it is customary to use as a surro- 
gate for instance size, any number that is roughly pro- 
portional to the true value. We shall use the symbol n, 
n=1,2,..., to represent the size of a problem instance. 
In summary, for a decision problem $\Pi$:


Definition 2 The time complexity function (TCF) of algorithm A is:

$$T_A(n) = \text{maximal time for } A \text{ to solve any string of length } n.$$


In what follows, the big oh notation (O) introduced in [3] will be used when expressing the time complexity function. We say that, for two real-valued functions f and g, 'f(n) is O(g(n))', or 'f(n) is at most of the order of g(n)', if $|f(n)| \le k \cdot |g(n)|$ for all n > 0 and some k > 0.


Polynomial Versus Exponential Algorithms 


An efficient, polynomially bounded, polynomial time al- 
gorithm, or simply a polynomial algorithm, is one which 
solves a problem instance in time bounded by a power 
of the instance size. Formally: 


Definition 3. An algorithm A is polynomial time if there exists a polynomial p such that

$$T_A(n) \le p(n), \quad \forall n \in \mathbb{Z}^+ = \{1, 2, \dots\}.$$

More specifically, an algorithm is polynomial of degree c, or has complexity $O(n^c)$, or runs in $O(n^c)$ time if, for some k > 0, the algorithm never takes longer than $kn^c$ (the TCF) to solve an instance of size n.


Definition 4 The collection P comprises all problems 
for which a polynomial time algorithm exists. 


Problems which belong to P are the ones we referred to earlier as 'easy'. Algorithms that are not polynomial are called exponential time or just exponential, and problems for which nothing quicker exists are 'hard'. Although not all algorithms in this class have TCFs that are technically exponential functions, we may think of a typical one as running in $O(c^{p(n)})$ for some constant c > 1 and some polynomial p(n). Other examples of exponential rates of growth are $n^n$ and n!.

The terms ‘hard’ and ‘easy’ are somewhat mislead- 
ing, even though exponential TCFs clearly lead to far 
more rapid growth in solution times. Suppose an ‘easy’ 
problem has an algorithm with running time bounded 
by, say, $kn^{c}$ with a large exponent c. Such a TCF is not exponential, but
it may well be considered pretty rapidly growing. Fur- 
thermore, some algorithms take a long time to solve 
even small problems (large k), and hence are unsatis- 
factory in practice even if the time grows slowly. On the 
other hand, an algorithm for which the TCF is expo- 
nential is not always useless in practice. Recall that the concept of the TCF is a worst-case estimate, so complexity is
only an upper bound on the amount of time required by 
an algorithm. This is a conservative measure and usu- 
ally useful, but it is too pessimistic for some popular 
algorithms. The simplex algorithm for linear program- 
ming, for example, has a TCF that is $O(2^n)$, but it has
been shown that for the average case the complexity is 
only O(n). Thus, the algorithm is actually very fast for 
most problems encountered. 

Despite these caveats, exponential algorithms gen- 
erally have running times that tend to increase at an ex- 
ponential rate and often seem to ‘explode’ when a cer- 
tain problem size is exceeded. Polynomial time algo- 
rithms usually turn out to be of low degree ($O(n^3)$ or
better), run pretty efficiently, and are considered desir- 
able. 
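The contrast between polynomial and exponential growth is clearest through the question asked earlier: how much longer does an instance twice as large take? For $T(n) = n^c$ the ratio $T(2n)/T(n) = 2^c$ is a constant, while for $T(n) = 2^n$ it equals $2^n$ and itself grows with n. A small sketch with illustrative TCFs ($n^3$ and $2^n$ are arbitrary choices):

```python
def doubling_factor(T, n):
    """Ratio T(2n) / T(n): how much longer an instance of twice the size takes."""
    return T(2 * n) / T(n)

cubic = lambda n: n ** 3          # polynomial TCF
expo = lambda n: 2 ** n           # exponential TCF

table = [(n, doubling_factor(cubic, n), doubling_factor(expo, n))
         for n in (10, 20, 40)]
for row in table:
    print(*row)
```

The cubic column stays at 8 while the exponential column squares each time n doubles, which is the 'explosion' described above.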
Reducibility


A problem can be placed in P as soon as a polyno- 
mial time algorithm is found for it. Sometimes, rather 
than finding such an algorithm, we may place it in P 
by showing that it is ‘equivalent’ to another problem 
which is already known to be in P. We explain what 
we mean by equivalence between problems with the fol- 
lowing definitions. 


Definition 5 A problem $\Pi'$ is reducible or transformable to a problem $\Pi$ ($\Pi' \propto \Pi$) if, for any instance $I'$ of $\Pi'$, an instance I of $\Pi$ can be constructed in polynomially bounded time, such that the solution to I is sufficient to find the solution to $I'$ in polynomial time.


We call the construction of the I that corresponds to $I'$ a polynomial transformation of $I'$ into I.


Definition 6 Two problems are equivalent if each is 
reducible (or simply reduces) to the other. 


Since reduction, and hence equivalence, are clearly 
transitive properties, we can define equivalence classes 
of problems, where all problems in the same equivalence 
class are reducible (or equivalent) to each other. Con- 
sider polynomial problems. Clearly, for two equivalent 
problems, if one is known to be polynomial, the other 
must be, too. Also, if two problems are each known 
to be polynomial, they are equivalent. This is because 
any problem $\Pi' \in P$ is reducible to any other problem $\Pi \in P$, or indeed to any $\Pi \notin P$, in the following trivial sense. For any instance $I'$ of $\Pi'$, we can pick any instance of $\Pi$, ignore its solution, and find the solution to $I'$ directly. We conclude that P is an equivalence class.

We state a third simple result for polynomial prob- 
lems as a theorem. 


Theorem 7 If $\Pi \in P$, then $\Pi' \propto \Pi \Rightarrow \Pi' \in P$.


Given any instance $I'$ of $\Pi'$, one can find an instance I of $\Pi$ by applying a polynomial time transformation to $I'$. Since $\Pi \in P$, there is a polynomial time algorithm that solves I. Hence, using the transformation followed by the algorithm, $I'$ can be solved in polynomial time (the size of I is polynomial in that of $I'$, and a polynomial of a polynomial is again a polynomial).


Classification of Hard Problems 


In practice, we do not usually use reduction to show 
a problem is polynomial. We are more likely to start optimistically looking for an efficient algorithm directly, which may be easier than seeking another problem known to be polynomial, for which we can find an appropriate transformation. But suppose we cannot find either an efficient algorithm or a suitable transformation. We begin to suspect that our problem is not 'easy' (i.e., is not a member of P). How can we establish that it is in fact 'hard'? We start by defining a larger class of problems, which includes P and also all the difficult problems we ever encounter. To describe it, consider any combinatorial decision problem. For a typical instance, there may be a very large number of possible solutions which may have to be searched. Picture a candidate solution as a set of values assigned to the variables $x = (x_1, \dots, x_n)$. The question may be 'for a given vector c, is there a solution x such that $cx \le B$?', and the algorithm may search the solutions until it finds one satisfying the inequality (whereupon it stops with the answer YES) or exhausts all solutions (and stops at NO).

This may well be a big job. But suppose we are told 
‘the answer is YES, and here is a solution x that satis- 
fies the inequality’. We feel we must at least verify this, 
but that is trivial. Intuitively, even for the hardest prob- 
lems, the amount of work to check that a given candi- 
date solution confirms the answer YES should be small, 
even for very large instances. We will now define our 
‘hard’ problems as those which, though hard to solve, 
are easy to verify, where as usual ‘easy’ means taking 
a time which grows only polynomially with instance 
size. To formalize this, let: 


$$V_A(n) = \text{maximal time for } A \text{ to verify that a given solution establishes the answer YES, for any instance of length } n.$$


Definition 8 An algorithm A is nondeterministic polynomial time if there exists a polynomial p such that for every input of length n with answer YES, $V_A(n) \le p(n)$.


Definition 9 The collection NP comprises all prob- 
lems for which a nondeterministic polynomial algo- 
rithm exists. 
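The 'easy to verify' half of this definition is concrete for the traveling salesperson decision problem defined below: given a proposed tour as a certificate, checking a YES answer takes polynomial (here O(n log n)) time, although finding such a tour may not. A sketch, assuming cities are numbered 0, …, n-1 and d is an n × n distance matrix (a representation chosen for illustration):

```python
def verify_tsp_certificate(d, B, tour):
    """Check that `tour` visits every city exactly once and that the closed
    tour has total length at most B. d is an n x n distance matrix."""
    n = len(d)
    if sorted(tour) != list(range(n)):     # must be a permutation of all cities
        return False
    length = sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))
    return length <= B

# Hypothetical 4-city instance: distance 1 around the cycle 0-1-2-3-0, else 2.
d = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
ok = verify_tsp_certificate(d, 4, [0, 1, 2, 3])
```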


It may be noted that a problem in NP is solvable 
by searching a decision tree of polynomially bounded 
depth, since verifying a solution is equivalent to tracing one path through the tree. From this, it is easy to see that $P \subseteq NP$. Strangely, complexity theorists have been unable to show that $P \ne NP$; it remains possible that all the problems in NP could actually be solved by polynomial algorithms, so that P = NP. However, since so many brilliant researchers have worked on so many difficult problems in NP for so many years without success, this is regarded as being very unlikely. Assuming $P \ne NP$, as we shall hereafter, it can be shown that the problems in NP include an infinite number of equivalence classes, which can be ranked in order of increasing difficulty, where an equivalence class C is 'more difficult' than another class C' if, for every problem $\Pi \in C$ and every $\Pi' \in C'$, $\Pi' \propto \Pi$ but $\Pi \not\propto \Pi'$. There also exist problems that cannot be compared: neither $\Pi \propto \Pi'$ nor $\Pi' \propto \Pi$.

Fortunately, however, all problems that arise natu- 
rally have always been found to lie in one of two equiv- 
alence classes: the ‘easy’ problems in P, and the ‘hard’ 
ones, which we now define. 

The class of NP-hard problems (NPH) is a collection
of problems with the property that every problem in NP 
can be reduced to the problems in this class. More for- 
mally, 


Definition 10 


$$\mathrm{NPH} = \{\, \Pi : \forall \Pi' \in NP:\ \Pi' \propto \Pi \,\}.$$


Thus each problem in NPH is at least as hard as any 
problem in NP. We know that some problems in NPH 
are themselves in NP, though some are not. Those that 
are include the toughest problems in NP, and form the 
class of NP-complete problems (NPC). That is,


Definition 11 


$$\mathrm{NPC} = \{\, \Pi : (\Pi \in NP) \text{ and } (\forall \Pi' \in NP:\ \Pi' \propto \Pi) \,\}.$$


The problems in NPC form an equivalence class. This 
is so because all problems in NP reduce to them, hence, 
since they are all in NP, they reduce to each other. The 
class NPC includes the most difficult problems in NP. 
As we mentioned earlier, by a surprising but happy chance, all the problems we ever encounter outside the most abstract mathematics turn out to belong to either P or NPC.


Using Reduction to Establish Complexity 


When tackling a new problem $\Pi$, we naturally wonder
whether it belongs to P or NPC. As we said above, to 
show that the problem belongs to P, we usually try to 
find a polynomial time algorithm, though we could seek 
to reduce it to a problem known to be polynomial. If we 
are unable to show that the problem is in P, the next 
step generally is to attempt to show that it lies in NPC; 
if we can do so, we are justified in not developing an 
efficient algorithm. To do this, clearly no direct algo- 
rithmic development is possible, and only a reduction 
argument will do. This is based on the following theo- 
rem, which should be clear enough to require no proof. 


Theorem 12 $\forall \Pi, \Pi' \in NP$: ($\Pi' \in$ NPC) and ($\Pi' \propto \Pi$) imply $\Pi \in$ NPC.


Thus, we need to find a problem $\Pi' \in$ NPC and show $\Pi' \propto \Pi$, thereby demonstrating that $\Pi$ is at least as
hard as any problem in NPC. To facilitate this, we need 
a list of problems known to be in NPC. Several hun- 
dred are listed in [2] in a dozen categories such as 
graph theory, mathematical programming, sequencing 
and scheduling, number theory, etc., and more are be- 
ing added all the time. Even given an ample selection, 
a good deal of tenacity and ingenuity are needed to pick 
one with appropriate similarities to ours and to fill in 
the precise details of the transformation. 

Of course, to build up the membership in NPC us- 
ing Theorem 12, we need other problems that have already been shown to belong to that class. To begin this
process, at least one problem needs to be in NPC. It 
was S.A. Cook [1] who showed that the satisfiability 
problem is NP-complete, using direct arguments that 
did not involve reduction. This very important result is 
called Cook’s theorem. For a proof, see [2]. 

As a simple illustration of reduction, we show that 
the traveling salesperson decision problem (TSP) is in 
NPC. To do so, we first select a closely related problem, 
the Hamiltonian circuit problem (HCP), which we as- 
sume has already been shown to be NP-complete. We 
then find a reduction of HCP to TSP. The problems are 
defined as follows. 
Definition 13 (TSP, traveling salesperson problem).
Instance:

a positive integer n;

a finite set $C = \{c_1, \dots, c_n\}$ of 'cities';

'distances' $d_{ij} \in \mathbb{Z}^+$, $\forall i, j: c_i, c_j \in C$;

a bound $B \in \mathbb{Z}^+$.

Question: Does there exist a tour (i.e., a closed path that visits every city exactly once) of length no greater than B?


Definition 14 (HCP, Hamiltonian circuit problem) 
Instance: A graph G = (V, E), where V is the set of m 
vertices, and E the set of edges. 

Question: Does G contain a Hamiltonian circuit, 
i.e., a tour that traverses all vertices exactly once?


Example 15 To show: HCP $\propto$ TSP.

In TSP, we have a complete graph and seek the shortest tour, whereas in HCP, given an arbitrary graph we require any tour. Thus, given the challenge of showing that the traveling salesperson problem (or the decision version of it) is NP-complete, we have found a similar problem whose membership in NPC is already established. We may still be unable to find a polynomial transformation from HCP, in which case another problem must be sought. A transformation of $\Pi'$ to $\Pi$ is a way of computing each parameter of $\Pi$ in terms of the parameters of $\Pi'$. In this case, the reduction is relatively simple. The parameters of HCP are:

• m = cardinality of V;
• E = {(i, j) : G contains an arc between vertices i, j}.

The parameters of TSP are computed as follows:

• n = m and C = V;
• $d_{ij} = 1$ if $(i, j) \in E$, and $d_{ij} = N$ otherwise;
• B = m.

Here, N can be any number larger than 1; say, 2. Clearly, every tour in TSP has length at least m, and length m occurs only when arcs in E are used exclusively; that is, exactly when a tour in HCP exists.

To complete the reduction, we need to show that the 
transformation can be performed in polynomial time. 
For that, given a pair of nodes in TSP, we need to check if an arc exists in HCP; doing this for all $\binom{m}{2} = m(m-1)/2$ pairs requires time $O(m^2)$.
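The transformation of Example 15 can be written out directly; building the distance matrix takes O(m²) work, matching the bound just derived. In this sketch the graph representation (vertices 0, …, m-1 and an undirected edge list) is an assumption, and the brute-force TSP decision routine is exponential and serves only to check the equivalence on tiny graphs:

```python
from itertools import permutations

def hcp_to_tsp(m, edges, N=2):
    """Map an HCP instance (vertices 0..m-1, undirected edges) to a TSP
    instance (d, B): d_ij = 1 if {i, j} is an edge, N otherwise; B = m."""
    E = {frozenset(e) for e in edges}
    d = [[0 if i == j else (1 if frozenset((i, j)) in E else N)
          for j in range(m)] for i in range(m)]
    return d, m

def tsp_decision(d, B):
    """Brute-force TSP decision (exponential; only for checking tiny cases)."""
    n = len(d)
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        if sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n)) <= B:
            return True
    return False

cycle = hcp_to_tsp(4, [(0, 1), (1, 2), (2, 3), (3, 0)])   # has a Hamiltonian circuit
star = hcp_to_tsp(4, [(0, 1), (0, 2), (0, 3)])            # has none
```

On the 4-cycle a tour of length B = 4 exists; on the star every tour must use at least one non-edge of length N = 2, so the answer is NO in both problems.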
Concave programming constitutes one of the most fun-
damental and intensely-studied problem classes in de- 
terministic nonconvex optimization. There are at least 
three reasons for this. First, many of the mathematical 
properties of concave programming are intriguingly at- 
tractive. Some are even identical to properties of lin- 
ear programming. Second, concave programming has 
a remarkably broad range of direct and indirect appli- 
cations. Third, the algorithmic ideas used in concave 
programming have played and continue to play an ac- 
tive and often fundamental role in the development of 
solution procedures for other types of nonconvex pro- 
gramming problems. 

The concave programming, or concave minimization, problem (CMP) can be written

$$\operatorname{globmin}\ f(x) \quad \text{s.t.}\ x \in D,$$

where D is a nonempty, closed convex set in $\mathbb{R}^n$ and f is a real-valued, concave function defined on some open convex set A in $\mathbb{R}^n$ that contains D. The goal in CMP is to find the global minimum value that f achieves over D and, if this value is not equal to $-\infty$, to find, if it exists, at least one point in D that achieves this value. In many applications, D is compact and A equals all of $\mathbb{R}^n$. CMP typically has many points in D that are local, but not global, minimizers of f over D. For this reason, CMP is an example of a (multi-extremal) global optimization problem [7].

The application of standard algorithms designed 
for solving constrained convex programming problems 
generally will fail to solve CMP. Even instances of 
CMP with relatively simple components can apparently 
present very significant solution challenges. For exam- 
ple, B. Kalantari [8] has shown that in problems involv- 
ing the minimization of concave quadratic functions 
over rectangles in $\mathbb{R}^n$, an exponential number of extreme point local minima can exist. Additionally, P.M.
Pardalos and G. Schnitger [13] have shown that mini- 
mizing a concave quadratic function over a hypercube 
is an NP-hard problem. 


Although CMP is more difficult to solve than a con- 
vex programming problem, it possesses some highly in- 
teresting, special mathematical properties. A number of 
these properties have been exploited by researchers to 
create successful algorithms for solving the problem. 

For instance, if D contains at least one extreme 
point, and CMP has at least one global optimal solu- 
tion, then there must exist a global optimal solution 
which is an extreme point of D [14]. This is perhaps the 
most important and striking property of concave min- 
imization problems. As a result of this property, just 
as in linear programming, if CMP has a global opti- 
mal solution, then one can confine the search for such 
a solution to the set of extreme points of D, provided 
that this set is nonempty. This property holds, as in lin- 
ear programming, even when D is unbounded. A num- 
ber of algorithms for CMP are based upon this prop- 
erty. 

Another highly important property for CMP is that 
if D is a compact set, then CMP must have a global opti- 
mal solution which is an extreme point of D. This is per- 
haps the most widely-known theoretical result in con- 
cave minimization [1]. Like the property stated in the 
previous paragraph, it forms the basis for a number of 
important concave minimization algorithms. 
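As a small illustration of the compact case, the sketch below minimizes a hypothetical concave quadratic over the unit square $D = [0, 1]^2$, whose extreme points are its four corners. The function and the grid resolution are assumptions chosen for the demo; the point is only that a fine grid over D never beats the best corner.

```python
import itertools


def f(x):
    """A concave quadratic on R^2 (Hessian [[-2, -1], [-1, -2]] is negative definite)."""
    x0, x1 = x
    return -(x0 - 0.3) ** 2 - (x1 - 0.6) ** 2 - x0 * x1


# D = [0, 1]^2 is compact; its extreme points are the four corners.
corners = list(itertools.product([0.0, 1.0], repeat=2))
best_corner = min(corners, key=f)

# A fine grid over D never beats the best corner, as the property predicts.
grid = [(i / 50, j / 50) for i in range(51) for j in range(51)]
best_grid = min(grid, key=f)

print(best_corner, f(best_corner))        # (1.0, 1.0) and about -1.65
print(f(best_grid) >= f(best_corner))     # True
```

Enumerating corners is only practical for tiny examples, since a box in $\mathbb{R}^n$ has $2^n$ vertices; the algorithms described below avoid full enumeration.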

For cases where D is a polyhedron, possibly un- 
bounded, that contains at least one extreme point, it 
has been shown that either CMP has a global optimal 
solution which is an extreme point of D, or CMP must 
be unbounded and f must be unbounded from below 
over some extreme direction of D. Notice that the same 
property, remarkably, holds in the case of linear pro- 
gramming. This property is used by a large number 
of the algorithms designed to solve CMP when D is 
a nonempty polyhedron. 

CMP displays a remarkable diversity of applica- 
tions. Each application is either direct or indirect. By di- 
rect, we mean that the original model formulation takes 
the form of CMP immediately or, if not, with only rela- 
tively simple algebraic manipulations. The indirect ap- 
plications involve problems whose direct formulations 
do not take the form of CMP, but existing theory can 
be used to reformulate these problems in the form of 
CMP. 

Some of the oldest and most diverse direct applica- 
tions of CMP belong to a class of problems called fixed 
charge problems. In these problems, the objective function f is separable, i.e., it is of the form

$$f(x) = \sum_{i=1}^{n} f_i(x_i).$$


For each $i = 1, \dots, n$, in these problems $f_i$ is a concave function on $\{x_i \in \mathbb{R} : x_i \ge 0\}$ of the form

$$f_i(x_i) = \begin{cases} 0 & \text{if } x_i = 0, \\ c_i + w_i(x_i) & \text{if } x_i > 0, \end{cases}$$

where $c_i > 0$ is the fixed setup cost of undertaking activity i at a positive level, and $w_i(x_i)$ is a continuous concave function on $\{x_i \in \mathbb{R} : x_i \ge 0\}$ that represents the variable cost of undertaking the activity at level $x_i$.
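The fixed-charge form is easy to state in code. The sketch below builds $f_i$ from a setup cost $c_i$ and a variable-cost function $w_i$, using hypothetical data: one linear $w_i$ (the classical case) and one piecewise-linear $w_i$ with a price break. A sampled midpoint test then checks concavity on $x_i \ge 0$; it illustrates why the jump at zero does not break concavity, since it only lowers the value at $x_i = 0$. It is a numerical spot check, not a proof.

```python
def fixed_charge(c, w):
    """Build f_i(t) = 0 if t == 0, c + w(t) if t > 0 (c > 0, w concave on t >= 0)."""
    def f_i(t):
        if t < 0:
            raise ValueError("activity level must be nonnegative")
        return 0.0 if t == 0 else c + w(t)
    return f_i


# Hypothetical data: a linear variable cost and a piecewise-linear one with a price break.
f1 = fixed_charge(10.0, lambda t: 2.0 * t)
f2 = fixed_charge(4.0, lambda t: min(3.0 * t, 2.0 * t + 5.0))


def f(x):
    """Separable objective f(x) = f_1(x_1) + f_2(x_2)."""
    return f1(x[0]) + f2(x[1])


def midpoint_concave(g, pts, tol=1e-12):
    """Check g((a+b)/2) >= (g(a)+g(b))/2 on all pairs of sample points."""
    return all(g((a + b) / 2) >= (g(a) + g(b)) / 2 - tol for a in pts for b in pts)


pts = [0.0, 0.5, 1.0, 5.0, 10.0, 20.0]
print(f((0.0, 10.0)))                                         # 29.0: activity 1 idle
print(midpoint_concave(f1, pts), midpoint_concave(f2, pts))  # True True
```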

When the functions $w_i(x_i)$, $i = 1, \dots, n$, are linear, the classical linear fixed charge problem is obtained. Some of the oldest applications of concave minimization involve solving problems of this type. Among these are applications to transportation planning, site selection, production lot sizing and network design. For cases where at least one of the functions $w_i(x_i)$, $i = 1, \dots, n$, is piecewise linear, several types of applications have been reported. Included among these are problems involving price breaks, such as bid evaluation problems, certain inventory planning problems and various plant location problems. When $w_i(x_i)$ is a general concave function for some $i = 1, \dots, n$, applications involving economies of scale, for instance, can be solved.

More recently, CMP has been directly applied to 
a class of problems called multiplicative programming 
problems. These are problems of the form CMP where 


$$f(x) = \prod_{j=1}^{p} f_j(x) \qquad (1)$$

for some set of $p \ge 2$ functions $f_j$, $j = 1, \dots, p$, that are each nonnegative over D. Notice that if $f_k(\bar{x}) = 0$ for some $k \in \{1, \dots, p\}$ and some $\bar{x} \in D$, then, since f is nonnegative on D, it is easy to see that the global minimum value for CMP is 0, and $\bar{x}$ is a global optimal solution. Therefore, it is generally assumed in multiplicative programming problems that each function $f_j$ is positive over D. Let us make this assumption henceforth.

The objective function (1) of a multiplicative program is generally not a concave function. But, when each function $f_j$, $j = 1, \dots, p$, is a concave function on $\mathbb{R}^n$, some simple transformations of (1) yield concave functions over D. For instance, if, for each $x \in D$, we define $w_1$ and $w_2$ by

$$w_1(x) = \ln f(x) = \sum_{j=1}^{p} \ln f_j(x)$$

and

$$w_2(x) = [f(x)]^{1/p},$$

respectively, then, whenever each $f_j$, $j = 1, \dots, p$, is a concave function on $\mathbb{R}^n$, both $w_1$ and $w_2$ are concave functions on D [3,10]. Thus, by using $w_1$ or $w_2$, multiplicative programming problems in which $f_j$, $j = 1, \dots, p$, are concave functions can be easily transformed to concave minimization problems.
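A one-dimensional spot check makes the effect of the transformations visible. The instance below is hypothetical: p = 2 and $f_1(x) = f_2(x) = x$ on $D = [0.1, 10]$, which are linear (hence concave) and positive. Their product $x^2$ is convex, not concave, while $w_1$ and $w_2$ pass a sampled midpoint-concavity test. Because $\ln t$ and $t^{1/p}$ are increasing for $t > 0$, minimizing $w_1$ or $w_2$ over D yields the same minimizers as minimizing f.

```python
import math

# Hypothetical instance: p = 2 and f_1(x) = f_2(x) = x, positive and linear on D = [0.1, 10].


def f(x):
    return x * x                                   # product f_1 f_2 = x^2: convex, not concave


def w1(x):
    return math.log(x) + math.log(x)               # w_1 = sum_j ln f_j(x)


def w2(x, p=2):
    return f(x) ** (1.0 / p)                       # w_2 = f(x)^(1/p)


def midpoint_concave(g, pts, tol=1e-12):
    """Sampled check of g((a+b)/2) >= (g(a)+g(b))/2; a spot check, not a proof."""
    return all(g((a + b) / 2) >= (g(a) + g(b)) / 2 - tol for a in pts for b in pts)


pts = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
print([midpoint_concave(g, pts) for g in (f, w1, w2)])  # [False, True, True]
```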

Various applications of multiplicative programming problems with concave or linear functions $f_j$, $j = 1, \dots, p$, in (1) have arisen, especially since the 1960s.
For example, the linear case has been applied to the 
problem of optimizing value functions for multiple ob- 
jective programming problems subject to linear or non- 
linear constraints. For p = 2, the linear case has been 
used to help solve the modular design problem, to de- 
sign integrated circuit chips and to select bond port- 
folios. The concave case has been used to analyze and 
solve a number of problems in microeconomics. 

Subject to occasional restrictions, large classes of in- 
teger programming problems can be converted by var- 
ious means into the form of CMP and solved as con- 
cave minimization problems. As a result, these integer 
programming problems, indirectly, are applications of 
concave minimization. The transformation processes 
used to accomplish the conversion, however, can be 
rather involved. They may also increase the size of 
the original problem [4], and they may call for choos- 
ing values for parameters that are difficult to deter- 
mine [5,9]. 

In particular, by using a certain general transfor- 
mation process, any feasible linear integer or quadratic 
integer programming problem over a polyhedron with 
nonnegative, bounded variables can be converted into 
the form of CMP and solved as a concave minimiza- 
tion problem. By using more customized conversion 
processes, linear zero-one programs, quadratic assign- 
ment problems, and other special integer programming 
problems can also be transformed to the form of CMP 
and solved as concave minimization problems. The spe- 
cialized transformations generally take advantage of 
some aspect of the original integer programming problem that the general processes ignore.

There are many other indirect applications of CMP, 
including d.c. optimization, indefinite quadratic pro- 
gramming, and bilinear programming, for instance. For 
further details and additional direct and indirect appli- 
cations, see [1,2,6,7,11,12]. 

To solve CMP, a large number of algorithms have 
been developed. Many of these rely on one or more of 
the following four approaches. 

In the cutting plane approach, a local minimum for 
f over D is found. Subsequently, a hyperplane is con- 
structed and used to cut off all points of D whose ob- 
jective function values are not less than that of the local 
minimum. This yields a new closed convex set $D^1 \subset D$. This process is then repeated with $D^1$ in the role of D. By iterating this process, the portion of D remaining to be explored is progressively reduced. Termination occurs when it can be shown that $f(y) \ge f(x^k)$ for all $y \in D^k$, where $D^k$ is the portion of D remaining at iteration k, and $x^k$ is the local minimum found through iteration k with the smallest objective function value.

In a typical outer approximation approach, D is assumed to be compact. To initiate the approach, a simple bounded polyhedron $P^1$ containing D whose vertices can be enumerated is constructed. A vertex $v^1$ of $P^1$ of minimum objective function value among all of the vertices of $P^1$ is found. This gives a lower bound $f(v^1)$ for the optimal value of CMP. If $v^1 \in D$, $v^1$ is a global optimal solution for CMP and termination occurs. Otherwise, a new bounded polyhedron $P^2 \subset P^1$ is constructed that contains D, and its vertices are enumerated. With $P^2$ in the role of $P^1$, the process is repeated. By repeating this process, a sequence of telescoping bounded polyhedra containing D is obtained. Termination occurs in the first iteration k where the vertex $v^k$ found that minimizes f over all of the vertices of $P^k$ lies in D.

In inner approximation (polyhedral annexation) approaches for CMP, D is assumed to be a bounded polyhedron. Typically, at each major iteration of an inner approximation algorithm, a local minimum extreme point solution $\bar{x}$ for CMP is available. A sequence of expanding inner approximating compact polyhedra for $D \cap G$ is constructed via a series of subiterations, where $G = \{x \in \mathbb{R}^n : f(x) \ge f(\bar{x})\}$. During this process, either an improved local minimum extreme point $\hat{x}$ is found, or, after k subiterations, the algorithm detects that $D \subseteq P^k$, where $P^k$ is the current inner approximation of $D \cap G$. In the former case, $\hat{x}$ replaces $\bar{x}$ and a new major iteration begins. In the latter case, since $P^k \subseteq D \cap G$, it follows that $D \subseteq G$, that is, $f(x) \ge f(\bar{x})$ for all $x \in D$, and the algorithm therefore terminates with the global optimal solution $\bar{x}$.

In the branch and bound approaches for CMP, D 
is repeatedly subdivided into finer and finer partitions. 
A lower bound for f over each partition element is calculated. The lowest of these lower bounds at any step k of the process gives a global lower bound $LB_k$ for f over D. At any stage, typically some feasible solutions for CMP have been detected. A feasible solution y with the smallest f value among all feasible solutions detected so far is always available. This solution is called the incumbent solution. When, at some step k, the inequality $LB_k \ge f(y)$ holds for the first time, the algorithm stops and returns the global optimal solution y.
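The branch and bound scheme can be sketched for a special case. The code below assumes a separable concave objective $f(x) = \sum_k f_k(x_k)$ over a two-dimensional polytope $\{Ax \le b\}$ inside a box, and uses rectangular partitions with secant (convex-envelope) underestimators, a classical choice for separable concave problems (in the spirit of Falk and Soland); these bounding and branching rules are assumptions for illustration, not the specific methods surveyed in [1,2,6,7]. The helper `lp_min_2d` is a hypothetical brute-force LP oracle that only works in two dimensions.

```python
import heapq
import itertools


def lp_min_2d(c, A, b, lo, hi):
    """Minimize c.x over {A x <= b, lo <= x <= hi} in two dimensions.

    Illustrative LP oracle: enumerate pairwise intersections of constraint
    lines and keep the best feasible one. Returns (inf, None) if infeasible.
    """
    rows = list(A) + [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    rhs = list(b) + [hi[0], -lo[0], hi[1], -lo[1]]
    best = (float("inf"), None)
    for (a1, r1), (a2, r2) in itertools.combinations(zip(rows, rhs), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:
            continue  # parallel lines: no vertex
        x = ((r1 * a2[1] - a1[1] * r2) / det, (a1[0] * r2 - r1 * a2[0]) / det)
        if all(a[0] * x[0] + a[1] * x[1] <= r + 1e-9 for a, r in zip(rows, rhs)):
            val = c[0] * x[0] + c[1] * x[1]
            if val < best[0]:
                best = (val, x)
    return best


def concave_bb(fs, A, b, lo, hi, tol=1e-6):
    """Branch and bound for min sum_k fs[k](x_k) over {A x <= b} within a box.

    Each f_k is concave, so on [l_k, u_k] its secant lies below it; the sum of
    secants is an affine underestimator whose minimum gives the node's lower bound.
    """
    def node(l, u):
        slopes = [(fk(uk) - fk(lk)) / (uk - lk) if uk > lk else 0.0
                  for fk, lk, uk in zip(fs, l, u)]
        const = sum(fk(lk) - s * lk for fk, lk, s in zip(fs, l, slopes))
        val, x = lp_min_2d(slopes, A, b, l, u)
        if x is None:
            return float("inf"), None, None
        gaps = [fk(xk) - (fk(lk) + s * (xk - lk))       # f_k minus its secant at x
                for fk, xk, lk, s in zip(fs, x, l, slopes)]
        return val + const, x, gaps

    best_val, best_x = float("inf"), None
    tie = itertools.count()                              # tie-breaker for the heap
    lb, x, gaps = node(list(lo), list(hi))
    heap = [(lb, next(tie), list(lo), list(hi), x, gaps)]
    while heap:
        lb, _, l, u, x, gaps = heapq.heappop(heap)
        if lb >= best_val - tol:
            break                                        # LB_k >= f(incumbent): stop
        fx = sum(fk(xk) for fk, xk in zip(fs, x))
        if fx < best_val:
            best_val, best_x = fx, x                     # the LP point is feasible
        k = max(range(len(gaps)), key=gaps.__getitem__)
        if gaps[k] <= tol:
            continue                                     # bound is tight on this box
        left_u, right_l = list(u), list(l)
        left_u[k] = right_l[k] = x[k]                    # split coordinate k at x_k
        for nl, nu in ((l, left_u), (right_l, u)):
            clb, cx, cgaps = node(nl, nu)
            if cx is not None:
                heapq.heappush(heap, (clb, next(tie), nl, nu, cx, cgaps))
    return best_val, best_x


neg_sq = lambda t: -t * t                                # concave
val, x = concave_bb([neg_sq, neg_sq], A=[(1.0, 1.0)], b=[5.0], lo=[0.0, 0.0], hi=[4.0, 4.0])
print(val, x)   # -17.0, attained at a vertex such as (4.0, 1.0)
```

The heap always yields the node with the smallest lower bound, so the popped bound plays the role of $LB_k$ and the stopping test mirrors the rule stated above.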

Details concerning these and other solution ap- 
proaches can be found in [1,2,6,7]. 


See also 


> Bilevel Linear Programming: Complexity, 
Equivalence to Minmax, Concave Programs 
> Minimum Concave Transportation Problems 


References 


1. Benson HP (1995) Concave minimization: Theory, applica- 
tions and algorithms. In: Horst R, Pardalos PM (eds) Hand- 
book global optimization. Kluwer, Dordrecht, pp 43-148 

2. Benson HP (1996) Deterministic algorithms for constrained 
concave minimization: A unified critical survey. Naval Res 
Logist 43:765-795

3. Benson HP, Boger GM (1997) Multiplicative programming 
problems: Analysis and efficient point search heuristic. 
J Optim Th Appl 94:487-510 

4. Garfinkel RS, Nemhauser GL (1972) Integer programming. 
Wiley, New York 

5. Giannessi F, Niccolucci F (1976) Connections between non- 
linear and integer programming problems. Symp Mat Inst 
Naz di Alta Mat 19:161-176 

6. Horst R, Pardalos PM, Thoai NV (1995) Introduction to 
global optimization. Kluwer, Dordrecht 

7. Horst R, Tuy H (1993) Global optimization: Deterministic 
approaches, 2nd revised edn. Springer, Berlin 

8. Kalantari B (1986) Quadratic functions with exponential 
number of local maxima. Oper Res Lett 5:47-49 

9. Kalantari B, Rosen JB (1982) Penalty for zero-one integer 
equivalent problems. Math Program 24:229-232 




10. Konno H, Kuno T (1995) Multiplicative programming prob- 
lems. In: Horst R, Pardalos PM (eds) Handbook Global Op- 
tim. Kluwer, Dordrecht, pp 369-405 

11. Pardalos PM (1994) On the passage from local to global 
in optimization. In: Birge JR, Murty KG (eds) Mathematical 
Programming: State of the Art. Braun-Brumfield, Ann Ar- 
bor, MI, pp 220-247 

12. Pardalos PM, Rosen JB (1987) Constrained global optimiza- 
tion: Algorithms and applications. Springer, Berlin 

13. Pardalos PM, Schnitger G (1988) Checking local optimal- 
ity in constrained quadratic programming is NP-hard. Oper 
Res Lett 7:33-35 

14. Rockafellar RT (1970) Convex analysis. Princeton Univ. 
Press, Princeton 


Conjugate-Gradient Methods 


J. L. NAZARETH1,2

1 Department Pure and Applied Math.,
Washington State University, Pullman, USA

2 Department Applied Math., University Washington,
Seattle, USA 


MSC2000: 90C30 


Article Outline 


Keywords 
Introduction 
Notation and Preliminaries 


Linear CG Algorithms 
Nonlinear CG Algorithms 


Nonlinear CG-Related Algorithms 
Classical Alternatives to CG 
Nonlinear CG Variants 
Variable-Storage/Limited-Memory Algorithms 
Affine-Reduced-Hessian Algorithms 


Conclusion 
See also 
References 


Keywords 


Unconstrained minimization; Conjugate gradients; 
Linear CG method; Nonlinear CG method; Nonlinear 
CG-related algorithms; Variable-storage; 
Limited-memory; Affine-reduced-Hessian; 
Three-term-recurrence 


Introduction 


Conjugate-gradient methods (CG methods) are used to 
solve large-dimensional problems that arise in compu- 
tational linear algebra and computational nonlinear op- 
timization. These two subjects share a broad common 
frontier, and one of the most easily traversed crossing 
points is via the following simple observation: the problem of solving a system of linear equations Ax = b for the unknown vector $x \in \mathbb{R}^n$, where A is a positive definite, symmetric matrix and b is a given vector, is mathematically equivalent to finding the minimizing point of the strictly convex quadratic function

$$q(x) = -b^T x + \tfrac{1}{2} x^T A x.$$


The linear CG method for solving the system of lin- 
ear equations is able to capitalize on this equivalent 
optimization formulation. It was developed in the pi- 
oneering 1952 paper of M.R. Hestenes and E.L. Stiefel 
[11] who, in turn, cite antecedents in the contributions 
of several other authors (see [9]). The method fell out of 
favor with numerical analysts during the 1960s because 
it did not compete with direct methods, in particu- 
lar, Gaussian elimination, but it continued to be widely 
used in real-world applications by specialists in other 
areas. Interest in CG as an iterative method, down- 
playing its finite-termination properties, revived in the 
1970s when the solution of large scale linear systems was 
coming to the forefront of academic research. 

The nonlinear CG method extends the linear CG approach to the problem of minimizing a smooth, nonlinear function f(x), $x \in \mathbb{R}^n$, where n can be large. It
was developed in another landmark article published 
in 1964 by R. Fletcher and C. Reeves [8]. Optimization 
techniques for this class of problems, which are inher- 
ently iterative in nature, form a direction of descent at 
an approximation to the solution (the current iterate), 
and search along this direction to obtain a new iterate 
with an improved function value. The Fletcher–Reeves
algorithm combined a search direction derived from the 
Hestenes-Stiefel approach with an efficient line search 
procedure along this direction vector adapted from the 
1959 variable-metric breakthrough algorithm of W.C. 
Davidon [6]. The resulting CG algorithm was a marked 
enhancement of the classical steepest descent method 
of A.-L. Cauchy. 
Like steepest descent, the nonlinear CG method
is storage-efficient and requires only a few n-vectors 
of computer storage beyond that needed to specify 
the problem itself. During the three decades after its 
discovery, a large number of storage-efficient nonlin- 
ear CG-related algorithms were proposed. In particu- 
lar, a structural connection between conjugate-gradient 
and variable-metric techniques for defining search di- 
rection vectors provided the springboard for effective 
new families of variable-storage/limited-memory algo- 
rithms and affine-reduced-Hessian algorithms that oc- 
cupy a middle ground. 

These three classes of algorithms, namely, linear 
CG, nonlinear CG and nonlinear CG-related, will be 
discussed in the respective Sections below. 


Notation and Preliminaries 


Lowercase boldface letters, e.g., x, denote vectors, and uppercase boldface letters, e.g., A, denote symmetric, positive definite matrices. The residual at x of the linear system Ax = b is $r = Ax - b$. It equals the gradient vector $g = -b + Ax$ of the strictly convex, quadratic form

$$q(x) = -b^T x + \tfrac{1}{2} x^T A x$$

at x. The gradient vector of q vanishes only at the unique solution $A^{-1} b$ of the linear system.


Linear CG Algorithms 


A basic CG algorithm for solving the system of linear 
equations Ax = b, where A is a positive definite, sym- 
metric matrix, is as follows: 


(Initialization)

0 | $x_1$ = arbitrary;
  | $r_1$ = residual of linear system at $x_1$;
  | $d_1 = -r_1$;

(Iteration i)

1 | $x_{i+1}$ = unique minimizing point of q on halfline through $x_i$ along direction $d_i$;
2 | $r_{i+1}$ = residual of linear system at $x_{i+1}$;
3 | $\beta_i = \|r_{i+1}\|^2 / \|r_i\|^2$;
4 | $d_{i+1} = -r_{i+1} + \beta_i d_i$.


In the computational linear algebra setting, the matrix A is provided exogenously. The residual $r_{i+1}$ at step 2 is computed as $Ax_{i+1} - b$ or else obtained by updating $r_i$, since $r_{i+1} = r_i + \alpha_i A d_i$. The direction $d_i$ is always a descent direction for q at $x_i$. At step 1, the minimizing point is computed as follows:

$$\alpha_i = -\frac{r_i^T d_i}{d_i^T A d_i}, \qquad x_{i+1} = x_i + \alpha_i d_i.$$
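The basic linear CG iteration can be sketched in a few lines of plain Python. This is a minimal, unpreconditioned sketch assuming A is a dense symmetric positive definite matrix stored as nested lists; it updates the residual rather than recomputing $Ax_{i+1} - b$, and the helper names `dot`, `matvec` and `linear_cg` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def linear_cg(A, b, x, tol=1e-24):
    """Basic CG for A x = b with A symmetric positive definite (steps 0-4 above)."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # r_1 = A x_1 - b
    d = [-ri for ri in r]                              # d_1 = -r_1
    for _ in range(len(b)):                           # n steps suffice in exact arithmetic
        rr = dot(r, r)
        if rr <= tol:                                  # tol bounds ||r||^2
            break
        Ad = matvec(A, d)
        alpha = -dot(r, d) / dot(d, Ad)                # step 1: exact minimizer along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri + alpha * adi for ri, adi in zip(r, Ad)]   # step 2 by updating r
        beta = dot(r, r) / rr                          # step 3
        d = [-ri + beta * di for ri, di in zip(r, d)]  # step 4
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(linear_cg(A, b, [0.0, 0.0]))   # close to [1/11, 7/11]
```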

There are numerous variants on this basic algorithm 
that seek to enhance convergence through problem pre- 
conditioning (transformation of variables), to improve 
algorithm stability and to solve related computational 
linear algebra problems. A contextual overview and fur- 
ther references can be found in [2]. 

Here our focus is optimization. In this setting, the residual at $x_{i+1}$ is given its alternative interpretation and representation as the gradient vector $g_{i+1}$ of q at $x_{i+1}$, and this gradient is assumed to be provided exogenously. The minimizing point at step 1 is computed, alternatively, as follows:

$$\alpha_i = -\frac{g_i^T (\hat{x}_i - x_i)}{d_i^T (\hat{g}_i - g_i)},$$

where $\hat{x}_i \ne x_i$ is any point on the ray through $x_i$ in the direction $d_i$ and $\hat{g}_i$ is its corresponding gradient vector; $x_{i+1} = x_i + \alpha_i d_i$. The expression for $\alpha_i$ is derived from the previous linear systems version and the relation $A(\hat{x}_i - x_i) = \hat{g}_i - g_i$.

We will call the resulting optimization algorithm 
the CG-standard for minimizing q. It will provide an 
important guideline for defining CG algorithms in the 
subsequent discussion. 
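The gradient-only step length can be checked numerically. The sketch below evaluates the gradient at one extra point $\hat{x}_i = x_i + t d_i$ on the ray, where the probe distance t is an assumption of this illustration, and recovers the exact minimizer along $d_i$ for a quadratic regardless of t. The helper name `alpha_from_gradients` is hypothetical.

```python
def alpha_from_gradients(x, g, d, grad, t=1.0):
    """CG-standard step length using gradients only.

    x_hat = x + t d is any other point on the ray (t != 0 is a hypothetical
    probe distance); for a quadratic q this returns the exact minimizer along d.
    """
    x_hat = [xi + t * di for xi, di in zip(x, d)]
    g_hat = grad(x_hat)
    num = -sum(gi * (xh - xi) for gi, xh, xi in zip(g, x_hat, x))
    den = sum(di * (gh - gi) for di, gh, gi in zip(d, g_hat, g))
    return num / den


# q(x) = -b^T x + x^T A x / 2 with A = [[4, 1], [1, 3]], b = [1, 2]; gradient A x - b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
grad = lambda x: [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

x = [0.0, 0.0]
g = grad(x)              # [-1.0, -2.0]
d = [-gi for gi in g]    # steepest-descent direction [1.0, 2.0]
print(alpha_from_gradients(x, g, d, grad), alpha_from_gradients(x, g, d, grad, t=3.0))  # 0.25 0.25
```

Both probe distances return $-g_1^T d_1 / d_1^T A d_1 = 5/20$, which is the value the linear-systems formula gives.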




Nonlinear CG Algorithms 


A nonlinear CG algorithm is used to find a minimizing point of the nonlinear function f(x), $x \in \mathbb{R}^n$, when n is large and/or computer storage is at a premium. A basic algorithm can be stated as follows:

(Initialization)

0 | $x_1$ = arbitrary;
  | $g_1$ = gradient of f at $x_1$;
  | $d_1 = -g_1$;

(Iteration i)

1 | $x_{i+1}$ = an improved iterate on halfline through $x_i$ along direction $d_i$;
2 | $g_{i+1}$ = gradient of f at $x_{i+1}$;
3 | $\beta_i = \|g_{i+1}\|^2 / \|g_i\|^2$;
4 | $d_{i+1} = -g_{i+1} + \beta_i d_i$.
Note that this algorithm is closely patterned after the
CG-standard. The improved iterate at step 1 is obtained 
by a line search procedure, which is normally based 
on quadratic or cubic polynomial fitting (suitably safe- 
guarded). When f = q, such a line search procedure can 
immediately locate the minimizing point along the line 
of search, once it has gathered the requisite information 
to make an exact fit. In other words, when applied to 
the minimization of q, the foregoing nonlinear CG al- 
gorithm is able to replicate the CG-standard precisely. 
This property characterizes a nonlinear CG algorithm. 

Considerable research has gone into alternative expressions for the quantity $\beta_i$. The four leading contenders are the Fletcher–Reeves (FR), Hestenes–Stiefel (HS), Polyak–Polak–Ribière (PPR) and Dai–Yuan (DY) choices (see [5,8,11,20,22]). These define $\beta_i$ as follows:

$$\text{FR: } \beta_i = \frac{g_{i+1}^T g_{i+1}}{g_i^T g_i}, \qquad \text{HS: } \beta_i = \frac{g_{i+1}^T y_i}{d_i^T y_i}, \qquad (1)$$

$$\text{PPR: } \beta_i = \frac{g_{i+1}^T y_i}{g_i^T g_i}, \qquad \text{DY: } \beta_i = \frac{g_{i+1}^T g_{i+1}}{d_i^T y_i}, \qquad (2)$$

where $y_i = g_{i+1} - g_i$ is the gradient change that corresponds to the step $s_i = x_{i+1} - x_i$.

When line searches are exact and the function is quadratic, the following relations hold:

$$g_i^T g_i = d_i^T y_i, \qquad g_{i+1}^T g_{i+1} = g_{i+1}^T y_i. \qquad (3)$$

Thus, the values of the scalar $\beta_i$ are identical for all four choices, and each of the associated algorithms becomes the CG-standard. In general, however, they are applied to nonquadratics and use inexact line searches, resulting in four distinct nonlinear CG algorithms that can exhibit behavior very different from one another.
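Relations (3) can be observed directly. The sketch below runs the CG-standard on a hypothetical 3 × 3 positive definite quadratic with exact line searches and, at each step, evaluates all four choices of $\beta_i$; their spread stays at rounding-error level. It stops one step early so that $g_{i+1}$ has not yet vanished.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def beta_choices(g, g_new, d):
    """FR, HS, PPR and DY values of beta_i, with y_i = g_{i+1} - g_i."""
    y = [gn - gi for gn, gi in zip(g_new, g)]
    return {
        "FR": dot(g_new, g_new) / dot(g, g),
        "HS": dot(g_new, y) / dot(d, y),
        "PPR": dot(g_new, y) / dot(g, g),
        "DY": dot(g_new, g_new) / dot(d, y),
    }


# Strictly convex quadratic q(x) = -b^T x + x^T A x / 2, exact line searches.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]
g = [ax - bi for ax, bi in zip(matvec(A, x), b)]
d = [-gi for gi in g]
spreads = []
for _ in range(len(b) - 1):            # stop before the gradient vanishes
    Ad = matvec(A, d)
    alpha = -dot(g, d) / dot(d, Ad)    # exact minimizer along d
    x = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
    betas = beta_choices(g, g_new, d)
    spreads.append(max(betas.values()) - min(betas.values()))
    d = [-gn + betas["FR"] * di for gn, di in zip(g_new, d)]
    g = g_new
print(spreads)   # both entries at rounding-error level
```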

The following generalization yields a two-parameter family:

$$\beta_i = \frac{\lambda_i (g_{i+1}^T g_{i+1}) + (1 - \lambda_i)(g_{i+1}^T y_i)}{\mu_i (g_i^T g_i) + (1 - \mu_i)(d_i^T y_i)},$$

where $\lambda_i \in [0, 1]$ and $\mu_i \in [0, 1]$. For any choice of $\lambda_i$ and $\mu_i$ in these ranges, the associated algorithm reduces to the CG-standard when f is quadratic and line searches are exact, which follows from (3). If the scalars $\lambda_i$ and $\mu_i$ take only their extreme values, 0 or 1, then one obtains four possible combinations corresponding to (1)-(2). The above two-parameter family of nonlinear CG algorithms, which subsumes FR, HS, PPR and DY, and its subfamilies (defined, for example, by taking $\lambda_i = 1$) are currently a topic of active research.
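The four corners of the two-parameter family can be checked mechanically. The sketch below evaluates the family at $(\lambda_i, \mu_i) \in \{0, 1\}^2$ on arbitrary, hypothetical vectors, as would arise with a nonquadratic function or an inexact line search; here relations (3) fail, so the four named choices give four different values.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def beta_family(lam, mu, g, g_new, d):
    """Two-parameter beta_i with lam = lambda_i and mu = mu_i in [0, 1]."""
    y = [gn - gi for gn, gi in zip(g_new, g)]
    num = lam * dot(g_new, g_new) + (1 - lam) * dot(g_new, y)
    den = mu * dot(g, g) + (1 - mu) * dot(d, y)
    return num / den


# Arbitrary (non-quadratic, inexact-search) data: the four choices now differ.
g, g_new, d = [1.0, 2.0], [0.5, -1.0], [-1.0, -2.5]
corners = {"FR": (1, 1), "HS": (0, 0), "PPR": (0, 1), "DY": (1, 0)}
print({name: beta_family(lam, mu, g, g_new, d) for name, (lam, mu) in corners.items()})
# {'FR': 0.25, 'HS': 0.34375, 'PPR': 0.55, 'DY': 0.15625}
```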

When the line search is sufficiently accurate, a nonlinear CG algorithm always produces a direction of descent at step 4. Suitable inexact line search termination conditions, in conjunction with different choices of $\beta_i$, have been extensively studied, both theoretically and computationally. A good overview of the theory and convergence analysis of nonlinear CG algorithms can be found in [19]. The nonlinear CG algorithm based on the PPR choice for $\beta_i$ [20,22] and a suitable restarting strategy [23] has emerged as the most efficient in
practice. However, it is well known that no single non- 
linear CG algorithm works well all the time. There is 
enormous variability in performance on different prob- 
lems or even within different regions of the same prob- 
lem. 
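A compact PPR-based nonlinear CG can be sketched as follows. Several simplifications are assumptions of this illustration rather than the implementation of [23]: the polynomial-fitting line search is replaced by Armijo backtracking, $\beta_i$ is clipped at zero (the common "PPR+" safeguard), and the restart strategy is a plain steepest-descent restart whenever $d_i$ fails a descent test. The test function is a hypothetical strictly convex function with known minimizer c.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ppr_cg(f, grad, x, max_iter=2000, gtol=1e-8):
    """Nonlinear CG with the PPR beta (clipped at zero) and Armijo backtracking.

    Assumptions for this sketch: backtracking replaces the polynomial-fit line
    search, and a steepest-descent restart replaces a tuned restart strategy.
    """
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        gg = dot(g, g)
        if math.sqrt(gg) < gtol:
            break
        slope = dot(g, d)
        if slope >= -1e-10 * math.sqrt(gg * dot(d, d)):
            d, slope = [-gi for gi in g], -gg        # restart along -g
        t, fx = 1.0, f(x)
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5                                  # Armijo backtracking
        x = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x)
        y = [gn - gi for gn, gi in zip(g_new, g)]
        beta = max(0.0, dot(g_new, y) / gg)          # PPR choice, clipped at 0
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


c = [1.0, -2.0, 0.5]   # hypothetical minimizer of the test function


def f(x):
    u = [xi - ci for xi, ci in zip(x, c)]
    return sum(ui ** 2 + ui ** 4 for ui in u) + (u[0] - u[1]) ** 2


def grad(x):
    u = [xi - ci for xi, ci in zip(x, c)]
    g = [2 * ui + 4 * ui ** 3 for ui in u]
    g[0] += 2 * (u[0] - u[1])
    g[1] -= 2 * (u[0] - u[1])
    return g


x = ppr_cg(f, grad, [0.0, 0.0, 0.0])
print(x)   # close to c = [1.0, -2.0, 0.5]
```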


Nonlinear CG-Related Algorithms 


We informally characterize a nonlinear CG-related algorithm as follows:

• its computer storage requirements are 'similar' to those of an implemented nonlinear CG algorithm, for example, [23];

• its path traverses the iterates of the CG-standard when the function is a strictly convex quadratic, line searches are exact, and the same initialization is used.

In other words, it may use a few more n-vectors of com- 
puter storage than, say, the PPR nonlinear CG algo- 
rithm, and it is permitted to generate additional inter- 
mediate iterates and form search vectors in novel ways. 
A nonlinear CG-related algorithm does not have to im- 
itate the ‘structure’ of the basic nonlinear CG algorithm 
of the previous Section. But the above requirement that 
its path must cover the iterates of the CG-standard of 
the first Section implies the following: if the candidate 
algorithm does not exhibit finite termination when ap- 
plied to a quadratic q then it is not a CG-related algo- 
rithm. 




Let us now briefly categorize the main lines of de- 
velopment. 


Classical Alternatives to CG 


These seek to enhance or accelerate the steepest descent 
algorithm of Cauchy more directly, without explicitly 
introducing notions of conjugacy. The year of publica- 
tion of [8] was indeed a banner year for such devel- 
opments. Two particularly noteworthy contributions, 
which coincidentally also appeared in 1964, were the 
parallel-tangents or PARTAN algorithm of B.V. Shah, 
R.J. Buehler and O. Kempthorne [24] and the heavy ball 
algorithm of B.T. Polyak [21]. For a modern descrip- 
tion of the former, and its subsequently discovered CG- 
related properties, see [14]. The basic iteration of the 
latter algorithm is as follows: 


$$x_{i+1} = x_i - \alpha g_i + \beta (x_i - x_{i-1}),$$

where $\alpha$ is a constant positive stepsize and $\beta$ is a scalar with $0 < \beta < 1$. Although, strictly speaking, it is not CG-related in this form, the algorithm has CG-like rate of convergence properties on a quadratic under optimal choices of the algorithm parameters $\alpha$ and $\beta$ (see [3]).
The algorithms in [24] and [21] both used very simple 
steplength techniques to move from one iterate to the 
next, in contrast to the nonlinear CG implementation 
of [8]. The introduction of the line-search technique 
of Davidon [6] into either of these 1964 algorithms, as 
in [8], could have propelled them much more into the 
limelight at the time. More recently, the important con- 
tribution of Yu.E. Nesterov [17] on global rate of con- 
vergence is based on an algorithm akin to PARTAN 
[24]. However, in general, classical alternatives to CG 
remain on the sidelines. 
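The contrast between the heavy ball iteration and fixed-step steepest descent is visible on a small quadratic. The sketch below assumes a hypothetical diagonal quadratic with Hessian eigenvalues between $\mu = 1$ and $L = 100$, and uses the standard parameter choices from the quadratic analysis, $\alpha = 4/(\sqrt{L} + \sqrt{\mu})^2$ and $\beta = ((\sqrt{L} - \sqrt{\mu})/(\sqrt{L} + \sqrt{\mu}))^2$; setting $\beta = 0$ with $\alpha = 2/(L + \mu)$ gives steepest descent with its best fixed step.

```python
import math


def heavy_ball(grad, x0, alpha, beta, iters):
    """x_{i+1} = x_i - alpha g_i + beta (x_i - x_{i-1}), started with x_0 = x_1."""
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        g = grad(x)
        x_next = [xi - alpha * gi + beta * (xi - xp) for xi, gi, xp in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x


# Hypothetical quadratic f(x) = sum_k lam_k x_k^2 / 2 with Hessian eigenvalues in [mu, L].
lams = [1.0, 10.0, 100.0]
grad = lambda x: [lk * xk for lk, xk in zip(lams, x)]
mu, L = min(lams), max(lams)
alpha = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
beta = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

x_hb = heavy_ball(grad, [1.0, 1.0, 1.0], alpha, beta, 200)
x_gd = heavy_ball(grad, [1.0, 1.0, 1.0], 2.0 / (L + mu), 0.0, 200)   # beta = 0: fixed-step steepest descent
norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
print(norm(x_hb), norm(x_gd))   # heavy ball is many orders of magnitude closer to 0
```

With these choices the per-step contraction is roughly $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$ for the heavy ball versus $(\kappa - 1)/(\kappa + 1)$ for steepest descent, where $\kappa = L/\mu$, which is the CG-like rate mentioned above.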


Nonlinear CG Variants 


These are premised on retaining the algorithmic struc- 
ture, and the conjugacy properties on quadratics, of 
the basic nonlinear CG algorithm of the above Section 
when the initial direction is not along the negative gra- 
dient and/or line searches are inexact. For instance, the 
three-term-recurrence algorithm (TTR) is able to simul- 
taneously relax both CG-standard requirements. The 
overall iteration follows the basic algorithm of the pre- 
vious Section, but with the computation of the search 


direction at steps 3 and 4 replaced as follows: 


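$$\gamma_i = \frac{y_i^T y_{i-1}}{y_{i-1}^T d_{i-1}}, \qquad \beta_i = \frac{y_i^T y_i}{y_i^T d_i},$$

and

$$d_{i+1} = -y_i + \beta_i d_i + \gamma_i d_{i-1}.$$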


Conjugacy of search directions is retained when the ini- 
tial search direction is chosen to be an arbitrary direc- 
tion of descent. If this initial direction is along the neg- 
ative gradient and line searches are exact, then the TTR 
generates the same search directions and iterates as the 
CG-standard on a positive definite quadratic. A draw- 
back of the TTR algorithm is that it does not guarantee 
a descent direction on more general functions even if 
line searches are exact. But, in practice, the direction almost always satisfies the condition $g_i^T d_i < 0$.
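The TTR recurrence can be tried on a quadratic with a starting direction that is not the negative gradient. In the sketch below, the 5 × 5 tridiagonal matrix, right-hand side and initial descent direction are hypothetical, and the line searches are exact. Because the directions stay mutually conjugate, the gradient should vanish, up to rounding, after n = 5 steps.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def ttr(A, b, x, d, steps):
    """TTR directions on q(x) = -b^T x + x^T A x / 2 with exact line searches.

    The first direction d may be any descent direction, not only -g.
    """
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    d_prev = y_prev = None
    for _ in range(steps):
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                 # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        y = [gn - gi for gn, gi in zip(g_new, g)]       # y_i = g_{i+1} - g_i
        beta = dot(y, y) / dot(y, d)
        d_new = [-yi + beta * di for yi, di in zip(y, d)]
        if d_prev is not None:                          # gamma_i d_{i-1} term
            gamma = dot(y, y_prev) / dot(y_prev, d_prev)
            d_new = [dn + gamma * dp for dn, dp in zip(d_new, d_prev)]
        d_prev, y_prev, d, g = d, y, d_new, g_new
    return x, g


n = 5
A = [[4.0 if i == j else (1.0 if abs(i - j) == 1 else 0.0) for j in range(n)] for i in range(n)]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
d1 = [1.3, 1.8, 3.1, 4.5, 4.6]      # a descent direction that is not -g_1 = b
x, g = ttr(A, b, [0.0] * n, d1, steps=n)
print(max(abs(gi) for gi in g))     # near zero: finite termination after n steps
```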

For references to other nonlinear CG variants, see 
[10] and the survey articles in [1]. Despite theoretical 
advantages on quadratics, algorithms in this category, 
in practice, have not proved to be significantly superior 
to the PPR nonlinear CG algorithm of the above Sec- 
tion. 


Variable-Storage/Limited-Memory Algorithms 


These are premised on a key structural relation- 
ship between the nonlinear CG algorithm and the 
BFGS variable-metric algorithm, and its properties on 
quadratics, see [4,12,15]. The most effective CG-related 
algorithm, to date, is the L-BFGS algorithm of J. No- 
cedal [18]. This is described in more detail in » Un- 
constrained nonlinear optimization: Newton-Cauchy 
framework and is not repeated here. The algorithm has 
the property that it produces a descent direction un- 
der weak termination conditions on the line search. It 
has proved to be an efficient and versatile algorithm 
(it can exploit additional computer storage when avail- 
able) that generally outperforms the PPR algorithm in 
practice. 

For an overview of other variable-storage algo- 
rithms that draw on the BFGS-CG relationship, see the 
survey articles in [1]. 


Affine-Reduced-Hessian Algorithms 


These make estimates of curvature, i.e., approximations to the Hessian or its inverse, in an affine subspace usually of low dimension and defined by the most
recent gradient and one or more prior steps and/or 
gradients. For algorithms of this type and their CG- 
related properties, see [7,13,16] and references cited 
therein. 

The foregoing principle, on which such algorithms 
are premised, has the conceptual advantage that it pro- 
vides a true continuum between the nonlinear CG and 
full-storage variable-metric (and Newton) algorithms. 
But, practical affine-reduced-Hessian implementations 
are not yet widespread. 


Conclusion 


CG algorithms are among the simplest and most ele- 
gant algorithms of computational nonlinear optimiza- 
tion. They can be surprisingly effective in practice, and 
thus will always have an honored place in the repertoire. 
Nevertheless, the subject still lacks a comprehensive un- 
derlying theory, and many interesting algorithmic is- 
sues remain to be explored. 

Some references cited in the present discussion are 
listed below, and other key references can be traced, in 
turn, through their bibliographies. 


See also 


> Broyden Family of Methods and the BFGS 
Update 

> Large Scale Trust Region Problems 

> Large Scale Unconstrained Optimization 

> Local Attractors for Gradient-Related Descent 
Iterations 

> Nonlinear Least Squares: Newton-Type Methods 

> Nonlinear Least Squares: Trust Region Methods 

> Unconstrained Nonlinear Optimization: 
Newton-Cauchy Framework 

> Unconstrained Optimization in Neural Network 
Training 
Introduction


Contact map overlap maximization is a problem that 
arises in computational biology as an important ap- 
proach to compare structural similarity of proteins. 
Contact map overlap was proposed in [6] as a measure 
for protein structural similarity, and is employed in the 
Critical Assessment of Techniques for Protein Struc- 
ture Prediction (CASP). 

Proteins consist of amino acid residues and assume 
specific 3-dimensional structures. Proteins of similar 
structures often have similar function and proper- 
ties. Therefore, structure alignment provides critical in- 
sights into the relation of existing proteins, and has im- 
portant applications in designing knowledge-based po- 
tential functions that are useful for protein folding pre- 
diction. 


Definition 

Given two proteins A and B with m and n residues, respectively, we denote the residues of A with indices $i$, $i'$, and $i''$, and the residues of B with indices $j$, $j'$ and $j''$. If two residues in the same protein are close in space, we say that they are in contact. The list of residue pairs that are in contact is known as the contact map of a given protein. In this article, the contact map for protein A (resp. B) is denoted as $E^A$ (resp. $E^B$) so that $E^A_{ii'} = 1$ (resp. $E^B_{jj'} = 1$) if residues $i$ and $i'$ in protein A (resp. $j$ and $j'$ in protein B) are in contact, and $E^A_{ii'} = 0$ (resp. $E^B_{jj'} = 0$) otherwise.


From a graph-theoretic perspective, a contact map is a graph whose nodes represent residues and whose edges represent contacts. The contact map overlap maximization problem aims at identifying an ordered residue correspondence between two contact maps that yields a maximum common subgraph. To solve this problem, a correspondence (alignment) must be established between the node sets (residues) of the two contact maps so that the number of common contacts (edges) is maximized. If residue i in protein A is aligned with residue j in protein B, they form a pair (i, j). If pairs (i, j) and (i′, j′) result in a common contact, i.e., $E^A_{ii'} = 1$ and $E^B_{jj'} = 1$, then they form an overlap. If two pairs (i, j) and (i′, j′) form an overlap, we set h(i, j, i′, j′) = 1; otherwise, h(i, j, i′, j′) = 0.
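The definitions above can be made concrete in a few lines of Python. This is a minimal sketch under stated assumptions. The article only says that residues "close in space" are in contact, so the distance `threshold`, the use of one coordinate triple per residue, and the names `contact_map`, `in_contact` and `h` are hypothetical choices made for illustration. Indices are 0-based.

```python
import math
from itertools import combinations


def contact_map(coords, threshold):
    """Contact map E as a set of index pairs (i, k) with i < k.

    Assumption (not fixed by the article): residues i and k are in
    contact when their representative points lie within `threshold`.
    """
    contacts = set()
    for i, k in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[k]) <= threshold:
            contacts.add((i, k))
    return contacts


def in_contact(E, i, k):
    """E_{ik} as 0/1; the map is symmetric, so look up the sorted pair."""
    return 1 if (min(i, k), max(i, k)) in E else 0


def h(EA, EB, i, j, i2, j2):
    """Overlap indicator h(i, j, i', j') = E^A_{ii'} * E^B_{jj'}."""
    return in_contact(EA, i, i2) * in_contact(EB, j, j2)


# Toy example: four residues at the corners of a square with side 3.8.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
E = contact_map(square, threshold=4.0)
print(sorted(E))  # [(0, 1), (0, 3), (1, 2), (2, 3)]; diagonals are about 5.37 apart
```

Storing each contact once as a sorted tuple keeps the symmetry $E_{ik} = E_{ki}$ implicit.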

An important requirement for structure alignment is that the relative order of residues in the original sequences be preserved. In the CMO literature this property is known as the non-crossing property. For two pairs (i, j) and (i′, j′) to be non-crossing, either i < i′ and j < j′, or i > i′ and j > j′ must hold. In this article, non-crossing pairs are also referred to as parallel pairs.

For convenience of presentation, [i, i′] (resp. [j, j′]) denotes the set of residues {i″ : i ≤ i″ ≤ i′} (resp. {j″ : j ≤ j″ ≤ j′}). The interval product [i, i′] × [j, j′] therefore denotes the set of pairs {(i″, j″) : i ≤ i″ ≤ i′, j ≤ j″ ≤ j′}. For any given set of residue pairs S, we use $\mathcal{Q}(S)$ to denote the set of subsets of S that contain only parallel pairs. The objective of CMO is to identify a set of parallel pairs that maximizes the resulting number of overlaps. For proteins A and B, the problem can be stated as follows (the factor 1/2 compensates for counting every overlap twice, since h(i, j, i′, j′) = h(i′, j′, i, j)):

$$\max_{Q \in \mathcal{Q}([1,m]\times[1,n])} \ \frac{1}{2} \sum_{(i,j)\in Q} \ \sum_{(i',j')\in Q} h(i,j,i',j')$$
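For very small proteins, the objective above can be evaluated exhaustively, which is a useful check on any exact method. The sketch below uses 0-based indices, and its helper names are hypothetical. It relies on the fact that a set of pairwise parallel pairs is exactly an order-preserving matching. Choosing k residues of A and k residues of B and pairing them in increasing order therefore enumerates every such set exactly once. Each unordered overlap is counted once.

```python
from itertools import combinations


def count_overlaps(alignment, EA, EB):
    """Number of overlaps of an alignment given as a list of pairs (i, j).

    EA and EB store contacts as sorted tuples (i, i') with i < i'.
    Each unordered pair of aligned pairs is examined once.
    """
    total = 0
    for (i, j), (i2, j2) in combinations(alignment, 2):
        if (min(i, i2), max(i, i2)) in EA and (min(j, j2), max(j, j2)) in EB:
            total += 1
    return total


def cmo_brute_force(m, n, EA, EB):
    """Exhaustive CMO for toy instances with m and n residues."""
    best_value, best_alignment = 0, []
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                alignment = list(zip(rows, cols))  # parallel by construction
                value = count_overlaps(alignment, EA, EB)
                if value > best_value:
                    best_value, best_alignment = value, alignment
    return best_value, best_alignment


if __name__ == "__main__":
    EA = {(0, 2), (1, 3)}
    EB = {(0, 2), (1, 3)}
    print(cmo_brute_force(4, 4, EA, EB))
```

The number of candidate alignments equals the binomial coefficient C(m + n, m), so this approach is practical only for toy sizes.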


An optimal alignment of the contact maps of the human Rap30 DNA-binding domain (1BBY) and the DNA-binding domain of the Escherichia coli LexA repressor (1LEA) is shown in Fig. 1, together with their 3-dimensional structures.


Methods 


Goldman et al. [7] proved that CMO is APX-hard. Hence, unless P = NP, CMO admits neither a polynomial-time exact algorithm nor a polynomial-time approximation scheme. In the remainder of this section, we survey both exact and approximate algorithms for this problem.

Contact Map Overlap Maximization Problem, CMO, Figure 1
An instance of CMO involving 1BBY and 1LEA


Exact algorithms 


Four exact algorithms have been proposed for the CMO problem so far. All of them rely on the branch-and-bound framework to guarantee global optimality. The branch-and-reduce algorithm [12,13] currently outperforms the other exact algorithms, both in computational speed and in its ability to solve challenging instances.


Integer Linear Program Carr et al. [5] proposed an integer linear programming formulation for the problem. This formulation was further developed in [9] and involves two sets of binary variables: $x_{ij}$ and $y_{iji'j'}$. Variable $x_{ij}$ equals one if pair (i, j) is chosen in the optimal alignment and zero otherwise. Variable $y_{iji'j'}$ equals one if both pairs (i, j) and (i′, j′) are chosen in the final solution; otherwise, $y_{iji'j'}$ is set to zero. The model is as follows:


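(M1)

$$\max \sum_{\substack{(i,j,i',j'):\ E^A_{ii'}=1,\ E^B_{jj'}=1 \\ i<i',\ j<j'}} y_{iji'j'} \tag{1}$$

s.t.

$$\sum_{j':\ E^B_{jj'}=1,\ j'>j} y_{iji'j'} \le x_{ij} \qquad \forall\, i,\ j,\ i' \text{ with } E^A_{ii'}=1,\ i<i' \tag{2}$$

$$\sum_{i':\ E^A_{ii'}=1,\ i'>i} y_{iji'j'} \le x_{ij} \qquad \forall\, i,\ j,\ j' \text{ with } E^B_{jj'}=1,\ j<j' \tag{3}$$

$$\sum_{i:\ E^A_{ii'}=1,\ i<i'} y_{iji'j'} \le x_{i'j'} \qquad \forall\, j,\ i',\ j' \text{ with } E^B_{jj'}=1,\ j<j' \tag{4}$$

$$\sum_{j:\ E^B_{jj'}=1,\ j<j'} y_{iji'j'} \le x_{i'j'} \qquad \forall\, i,\ i',\ j' \text{ with } E^A_{ii'}=1,\ i<i' \tag{5}$$

$$x_{ij} + x_{i'j'} \le 1 \qquad \forall \text{ crossing pairs } (i,j) \text{ and } (i',j') \tag{6}$$

$$x_{ij} \in \{0,1\}, \quad y_{iji'j'} \in \{0,1\}$$

The sum of common contacts in Eq. (1) equals the number of overlaps, since Eq. (6) prohibits crossing pairs in the final solution. Constraints (2)-(5) link the two sets of variables. First, $y_{iji'j'}$ can equal one only if both $x_{ij}$ and $x_{i'j'}$ equal one. Second, for each chosen pair, every contact of one of its residues is matched with at most one contact of the other residue. In addition to these necessary constraints, two classes of cuts, clique cuts and odd-hole cuts, can optionally be generated in polynomial time and appended to (M1), as was shown in [9].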


Lagrangian Relaxation In [3], Caprara and Lancia proposed a Lagrangian relaxation approach for the CMO problem. Their approach begins with the integer linear programming formulation (M2) of CMO shown below. In this model, $x_{ij}$ and $y_{iji'j'}$ have the same meaning as their counterparts in (M1), and $\mathcal{I}$ is the family of all maximal sets of mutually crossing pairs.


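(M2)

$$\max \ \frac{1}{2} \sum_{(i,j,i',j')} h(i,j,i',j')\, y_{iji'j'} \tag{7}$$

s.t.

$$\sum_{(i,j) \in I} x_{ij} \le 1 \qquad \forall\, I \in \mathcal{I} \tag{8}$$

$$\sum_{(i,j) \in I} y_{iji'j'} \le x_{i'j'} \qquad \forall\, I \in \mathcal{I},\ (i',j') \tag{9}$$

$$y_{iji'j'} = y_{i'j'ij} \qquad \forall\, i < i',\ j < j' \tag{10}$$

$$x_{ij},\ y_{iji'j'} \in \{0,1\} \tag{11}$$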


It is easy to verify that Eq. (7) yields the number of overlaps, given the validity of Eq. (10). Any two pairs in a maximal set of crossing pairs cross each other. Constraint (8) therefore enforces the non-crossing property by prohibiting any two such pairs from coexisting in the final alignment. Constraint (9) applies when a pair (i′, j′) is chosen. In that case, the final solution contains no more than one pair (i, j) from any maximal crossing pair set that could form an overlap with pair (i′, j′).

By introducing multipliers $\lambda_{iji'j'}$ for Eq. (10), Caprara and Lancia obtained the following Lagrangian relaxation of (M2):


(LM2)

$$\min_{\lambda} \ \max_{x,y} \ \frac{1}{2} \sum_{(i,j,i',j')} h(i,j,i',j')\, y_{iji'j'} + \sum_{(i,j,i',j'):\ i<i',\ j<j'} \lambda_{iji'j'} \left( y_{iji'j'} - y_{i'j'ij} \right)$$

s.t. Constraints (8), (9), and (11).


A subgradient method was used to iteratively improve the multipliers, while an $O(|E^A||E^B|)$ algorithm was employed to solve (LM2) for the current set of multipliers at each iteration.


Reformulation as Maximum Clique Strickland et al. [11] showed that CMO can be cast as a maximum clique problem. To this end, they considered a two-dimensional lattice whose rows correspond to the contacts in $E^A$ and whose columns correspond to the contacts in $E^B$. Each vertex of the lattice, if chosen, contributes one overlap to the objective value. In addition, an edge is drawn between two vertices if the corresponding two overlaps can both be realized in some feasible alignment. A maximum clique in the resulting graph then corresponds to an optimal solution of the CMO problem. The algorithm involves a number of preprocessing routines to reduce the problem size before calling a maximum clique solver. In more recent work [10], the authors proposed improved data structures to enhance the algorithm's performance.
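The clique reformulation can be prototyped directly. This minimal sketch is not the implementation of [10] or [11] and omits their preprocessing. Each vertex pairs a contact (i, i′) of A with a contact (j, j′) of B, i.e., the overlap produced by aligning i with j and i′ with j′. We assume that two vertices are adjacent when all of their residue pairs are pairwise equal or parallel. This is one way to read "can both be realized in some feasible alignment". Because parallelism is a pairwise property, any clique then describes a feasible alignment, and the maximum clique size equals the CMO optimum. The clique itself is found by a plain Bron-Kerbosch search with pivoting, which is adequate only for toy graphs. All function names are hypothetical.

```python
from itertools import combinations


def compatible(p, q):
    """Residue pairs p = (i, j), q = (i', j') can coexist if equal or parallel."""
    return p == q or (p[0] < q[0] and p[1] < q[1]) or (p[0] > q[0] and p[1] > q[1])


def overlap_graph(EA, EB):
    """Adjacency sets; vertices are (contact (i, i') of A, contact (j, j') of B)."""
    vertices = [(a, b) for a in EA for b in EB]

    def pairs(v):
        (i, i2), (j, j2) = v
        return [(i, j), (i2, j2)]

    adj = {v: set() for v in vertices}
    for u, v in combinations(vertices, 2):
        if all(compatible(p, q) for p in pairs(u) for q in pairs(v)):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def max_clique_size(adj):
    """Bron-Kerbosch with pivoting; exponential in the worst case."""
    best = 0

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            best = max(best, len(r))
            return
        pivot = max(p | x, key=lambda w: len(adj[w] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


print(max_clique_size(overlap_graph({(0, 2), (1, 3)}, {(0, 2), (1, 3)})))
```

Contacts must be passed as sorted tuples (i, i′) with i < i′, so that every vertex describes two parallel pairs.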


Combinatorial Branch-and-Reduce Xie and Sahinidis [12,13] developed a branch-and-reduce algorithm, which combines the generic branch-and-bound framework with problem-specific reduction schemes. The algorithm initializes a branch-and-bound tree with a root node at which all pairs are allowed. Reduction schemes, based both on dominance and on the current best solution value, are used to remove inferior pairs. Lower and upper bounds on the number of overlaps at the current node are then computed by dynamic programming. If the lower and upper bounds agree, the search is terminated with a globally optimal alignment. Otherwise, the algorithm chooses a branching pair and creates two child nodes. The branching pair is enforced in one of the nodes and disallowed in the other.

A key step in their algorithm is computing how much a given pair contributes to the overlaps with a set of pairs. Recall that $\mathcal{Q}(S)$ is the set of all subsets of S that contain only parallel pairs. The contribution of pair (i, j) to the objective value on the set S is defined as

$$p(i,j,S) := \max_{Q \in \mathcal{Q}(S)} \sum_{(i',j') \in Q} h(i,j,i',j').$$

In particular, let $p^+(i,j,i',j')$ and $p^-(i,j,i',j')$ denote p(i, j, S) when S = [i′, m] × [j′, n] and when S = [1, i′] × [1, j′], respectively. They proved that computing a single term of $p^+(\cdot)$ or $p^-(\cdot)$ can be accomplished in O(mn) time, with preprocessing time and space complexity of O(m + n).
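When S is a full rectangle rows × cols, p(i, j, S) has a simple closed form that follows from the definitions above. First, h(i, j, i′, j′) = 1 only when i′ is a contact of i and j′ is a contact of j. Second, any such rows and columns can be matched in increasing order. Hence the maximum equals the smaller of the two contact counts inside the rectangle. The sketch below uses 0-based indices and hypothetical function names, and it checks this closed form against brute force. It is offered only as an illustration of $p^+$ and $p^-$ and is not claimed to be the procedure of [12,13]. Nor does the shortcut apply when S is restricted to the pairs that are still free at a search node.

```python
from itertools import combinations


def contacts_in(E, r, indices):
    """Number of residues k in `indices` with E_{rk} = 1 (contacts stored as sorted tuples)."""
    return sum(1 for k in indices if (min(r, k), max(r, k)) in E)


def p_rect(i, j, rows, cols, EA, EB):
    """p(i, j, rows x cols) via the closed form min(#contacts of i, #contacts of j)."""
    return min(contacts_in(EA, i, rows), contacts_in(EB, j, cols))


def p_rect_brute(i, j, rows, cols, EA, EB):
    """Same quantity by enumerating every set of parallel pairs in rows x cols."""
    rows, cols = sorted(rows), sorted(cols)
    best = 0
    for k in range(1, min(len(rows), len(cols)) + 1):
        for r in combinations(rows, k):
            for c in combinations(cols, k):
                value = sum(1 for i2, j2 in zip(r, c)
                            if (min(i, i2), max(i, i2)) in EA
                            and (min(j, j2), max(j, j2)) in EB)
                best = max(best, value)
    return best


def p_plus(i, j, i2, j2, m, n, EA, EB):
    """p^+(i, j, i', j') with S = [i', m] x [j', n], written 0-based."""
    return p_rect(i, j, range(i2, m), range(j2, n), EA, EB)


def p_minus(i, j, i2, j2, EA, EB):
    """p^-(i, j, i', j') with S = [1, i'] x [1, j'], written 0-based."""
    return p_rect(i, j, range(0, i2 + 1), range(0, j2 + 1), EA, EB)


EA = {(0, 2), (0, 3), (1, 3)}
EB = {(0, 1), (0, 3), (2, 3)}
print(p_plus(0, 0, 1, 1, 4, 4, EA, EB), p_minus(3, 3, 2, 2, EA, EB))
```

With suffix and prefix counts of contacts precomputed per residue, each term of this closed form takes constant time.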

The upper bounding scheme is summarized in Proposition 1. Here, for a node V of the search tree, C(V) is the set of pairs that must be included, and F(V) is the set of pairs that are free to be in the solution or not. In addition, define

$$g(i,j) = \sum_{(i',j') \in C(V)} h(i,j,i',j'), \qquad \forall (i,j) \in F(V).$$

Proposition 1 Define

$$t(i,j) := \begin{cases} g(i,j), & \text{if } (i,j) \in F(V);\ i = 1 \text{ or } j = 1,\\ \max\{g(i,j),\ g(i,j) + w(i,j)\}, & \text{if } (i,j) \in F(V);\ i > 1 \text{ and } j > 1,\\ -\infty, & \text{otherwise,} \end{cases}$$

where

$$w(i,j) := \begin{cases} -\infty, & \text{if } ([1,i-1]\times[1,j-1]) \cap F(V) = \emptyset,\\ \displaystyle\max_{(i',j') \in ([1,i-1]\times[1,j-1]) \cap F(V)} \left\{ t(i',j') + h(i',j',i,j) + \tfrac{1}{2}\,\mu(i',j',i,j) \right\}, & \text{otherwise,} \end{cases}$$

and

$$\mu(i',j',i,j) := p^+(i',j',i+1,j+1) + p^-(i,j,i'-1,j'-1).$$

Then

$$\frac{1}{2} \sum_{(i,j) \in C(V)} \ \sum_{(i',j') \in C(V)} h(i,j,i',j') + \max_{(i,j) \in F(V)} t(i,j)$$

is an upper bound for the current node V.

Proposition 1 suggests an $O(m^2 n^2)$ algorithm to compute the upper bound. In addition, the upper bounding scheme also provides a natural lower bound, as was shown in [13].


Approximate Algorithms 


This subsection outlines both approximation algo- 
rithms (i.e., with performance guarantee) and heuris- 
tics (i.e., without performance guarantee) that have 
been proposed in the literature for the CMO problem. 
Goldman et al. [7] considered a special class of 
CMO instances from 2-dimensional self-avoiding walk, 
and proposed a 3-approximation algorithm that runs 
in O(n°) time. Agarwal et al. [1] proposed a 6-approx- 
imation algorithm for the same class of problems, with 


however a better complexity of O(n? log n). In addition, 
they proved that the special class of CMO instances arising from 3-dimensional self-avoiding walks is MAXSNP-hard, and proposed an O(√n)-approximation algorithm for this class of CMO instances.

Carr et al. [4] proposed a memetic algorithm, which combines global search and local search, to solve general instances of CMO. In [2], Caprara et al. proposed several heuristics, including a genetic algorithm, local search, and greedy algorithms based on Lagrangian relaxation. Existing computational studies [8,13] suggest that the Lagrangian-relaxation-based greedy algorithms perform best among existing heuristics on a large set of test instances.


Conclusions 


The contact map overlap maximization problem is an important problem in computational biology, and efficient algorithms for it, both exact and approximate, are of great interest. Despite the inherent difficulty of the problem, many large-scale practical instances have been solved within reasonable amounts of time. Nevertheless, many challenging instances currently remain unsolved [13].
Introduction


In this paper we use the following notation: $\mathbb{R}^n$ is the n-dimensional space, where the scalar product is denoted by $\langle x, y \rangle$:

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i,$$

and $\|\cdot\|$ denotes the associated norm. The gradient of a function $f : \mathbb{R}^n \to \mathbb{R}^1$ is denoted by $\nabla f$.

A function f is differentiable at a point $x \in \mathbb{R}^n$ with respect to a direction $g \in \mathbb{R}^n$ if the limit

$$f'(x,g) = \lim_{\alpha \to +0} \frac{f(x + \alpha g) - f(x)}{\alpha}$$

exists. The closed unit ball is denoted by B: $B = \{x \in \mathbb{R}^n : \|x\| \le 1\}$.

Consider a function f defined on $\mathbb{R}^n$. This function is called locally Lipschitz continuous if for any bounded subset $X \subset \mathbb{R}^n$ there exists an L > 0 such that

$$|f(x) - f(y)| \le L\|x - y\| \qquad \forall x, y \in X.$$


If f is convex, then one can define the subdifferential $\partial f(x)$ of this function at a point $x \in \mathbb{R}^n$ as follows [13]:

$$\partial f(x) = \{ v \in \mathbb{R}^n : f(y) - f(x) \ge \langle v, y - x\rangle \ \ \forall y \in \mathbb{R}^n \}.$$

Elements $v \in \partial f(x)$ of the subdifferential are called subgradients of f at the point x. For a convex function f defined on $\mathbb{R}^n$, the subdifferential $\partial f(x)$ is nonempty, convex, and compact at any $x \in \mathbb{R}^n$. The set-valued mapping $x \mapsto \partial f(x)$ is upper semicontinuous.

The ε-subdifferential $\partial_\varepsilon f(x)$, ε > 0, of the convex function f at a point $x \in \mathbb{R}^n$ is defined as [13]

$$\partial_\varepsilon f(x) = \{ v \in \mathbb{R}^n : f(y) - f(x) \ge \langle v, y - x\rangle - \varepsilon \ \ \forall y \in \mathbb{R}^n \}.$$

Elements $v \in \partial_\varepsilon f(x)$ are called ε-subgradients of f at the point x. For a convex function f defined on $\mathbb{R}^n$, the ε-subdifferential $\partial_\varepsilon f(x)$ is nonempty, convex, and compact at any $x \in \mathbb{R}^n$, and the set-valued mapping $x \mapsto \partial_\varepsilon f(x)$ is continuous in the Hausdorff metric at any x.
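Consider a piecewise-affine convex function $f(x) = \max_k (\langle a_k, x\rangle + b_k)$, an example chosen here and not taken from the article. For such f, the gradient $a_k$ of every piece active at x is a subgradient. Likewise, every $a_k$ whose piece lies within ε of the maximum is an ε-subgradient, since $f(y) \ge \langle a_k, y\rangle + b_k = f_k(x) + \langle a_k, y - x\rangle \ge f(x) - \varepsilon + \langle a_k, y - x\rangle$. The sketch below, whose helper names are hypothetical, lists such gradients and tests the defining inequalities on a finite grid of points y. A finite grid can refute membership but cannot prove it.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def max_affine(A, b, x):
    """f(x) = max_k (<a_k, x> + b_k) for rows a_k of A."""
    return max(dot(a, x) + bk for a, bk in zip(A, b))


def eps_active_gradients(A, b, x, eps=0.0):
    """Gradients a_k of pieces with f_k(x) >= f(x) - eps (eps = 0: active pieces)."""
    fx = max_affine(A, b, x)
    return [a for a, bk in zip(A, b) if dot(a, x) + bk >= fx - eps]


def passes_inequality(A, b, x, v, eps, points, tol=1e-12):
    """Check f(y) - f(x) >= <v, y - x> - eps on every sample point y."""
    fx = max_affine(A, b, x)
    return all(
        max_affine(A, b, y) - fx >= dot(v, [yi - xi for yi, xi in zip(y, x)]) - eps - tol
        for y in points
    )


A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
grid = [(p / 2, q / 2) for p in range(-6, 7) for q in range(-6, 7)]
x = (1.0, 0.9)
print(eps_active_gradients(A, b, x), eps_active_gradients(A, b, x, eps=0.2))
```

At x = (1, 0.9), the gradient (0, 1) of the second piece is not a subgradient, but it is a 0.2-subgradient, because that piece is only 0.1 below the maximum.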

Many of the most efficient methods in nonsmooth optimization, such as the bundle method and its variations, are based on ε-subgradients of convex functions (see, for example, [10,11]).

The analysis of nonsmooth, nonconvex functions has been an area of intensive research for more than three decades. Clarke [4,5] introduced the notion of the generalized gradient. We define the Clarke subdifferential for locally Lipschitz continuous functions defined on $\mathbb{R}^n$. A locally Lipschitz function f is differentiable almost everywhere, and one can define for it the Clarke subdifferential [5] by

$$\partial f(x) = \operatorname{co}\{ v \in \mathbb{R}^n : \exists (x^k \in D(f),\ x^k \to x,\ k \to +\infty) : v = \lim_{k \to +\infty} \nabla f(x^k) \},$$

where D(f) denotes the set where f is differentiable and co denotes the convex hull of a set. It is shown in [5] that the mapping $\partial f(x)$ is upper semicontinuous and bounded on bounded sets. The generalized directional derivative of f at x in the direction g is defined as

$$f^0(x,g) = \limsup_{y \to x,\ \alpha \downarrow 0} \alpha^{-1}\left[f(y + \alpha g) - f(y)\right].$$

The generalized directional derivative always exists and

$$f^0(x,g) = \max\{\langle v, g\rangle : v \in \partial f(x)\}.$$

The function f is called Clarke regular on $\mathbb{R}^n$ if it is differentiable with respect to any direction $g \in \mathbb{R}^n$ and $f'(x,g) = f^0(x,g)$ for all $x, g \in \mathbb{R}^n$. For nonregular functions, the calculus of the Clarke subdifferential holds only in the form of inclusions. This makes the computation of subgradients of some complex nonsmooth, nonconvex functions very difficult. The cluster function from cluster analysis is one such example [2,3].
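The gap between the directional derivative and the generalized directional derivative can be observed numerically. Take f(t) = −|t| at t = 0, a standard nonregular example chosen here rather than taken from the article. Then f′(0, 1) = −1 while f⁰(0, 1) = 1, so f is not Clarke regular at 0. The sketch below approximates the lim sup by the largest difference quotient over a grid of base points y near x. It is a crude illustration with arbitrary step sizes, and the function names are hypothetical. It is not a reliable way to compute f⁰ in general.

```python
def directional_quotient(f, x, g, alpha=1e-8):
    """One-sided difference quotient approximating f'(x, g)."""
    return (f(x + alpha * g) - f(x)) / alpha


def clarke_quotient_estimate(f, x, g, radius=1e-4, alpha=1e-8, samples=201):
    """Largest quotient (f(y + alpha g) - f(y)) / alpha over y in [x - radius, x + radius].

    A 1-D heuristic stand-in for the lim sup that defines f°(x, g).
    """
    best = float("-inf")
    for k in range(samples):
        y = x - radius + 2.0 * radius * k / (samples - 1)
        best = max(best, (f(y + alpha * g) - f(y)) / alpha)
    return best


def f(t):
    return -abs(t)


print(directional_quotient(f, 0.0, 1.0))      # close to -1
print(clarke_quotient_estimate(f, 0.0, 1.0))  # close to +1
```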

Demyanov and Rubinov [6,7] introduced the notion of the quasidifferential. Let f be a locally Lipschitz continuous function defined on $\mathbb{R}^n$. This function is called quasidifferentiable at a point $x \in \mathbb{R}^n$ if it is directionally differentiable and there exist compact, convex sets $\underline{\partial} f(x)$ and $\overline{\partial} f(x)$ such that

$$f'(x,g) = \max\{\langle v, g\rangle : v \in \underline{\partial} f(x)\} + \min\{\langle w, g\rangle : w \in \overline{\partial} f(x)\}.$$

The pair

$$Df(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$$

is called a quasidifferential of the function f at a point x. The set $\underline{\partial} f(x)$ is called a subdifferential and the set $\overline{\partial} f(x)$ a superdifferential of the function f at x.

Unlike the Clarke subdifferential, the quasidifferential enjoys a full-scale calculus; however, the set-valued mappings $x \mapsto \underline{\partial} f(x)$ and $x \mapsto \overline{\partial} f(x)$ need not even be upper semicontinuous.
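As an illustration (our example, not the article's), consider the difference of convex functions f(x) = |x₁| − |x₂|. It is quasidifferentiable at the origin with $\underline{\partial} f(0) = [-1,1]\times\{0\}$, the subdifferential of |x₁|, and $\overline{\partial} f(0) = \{0\}\times[-1,1]$, the negated subdifferential of |x₂|. A linear function attains its extrema over a polytope at the vertices, so the max/min formula can be evaluated on the vertex lists alone. The sketch compares that formula with a one-sided difference quotient; the helper names are hypothetical.

```python
import math


def f(x):
    return abs(x[0]) - abs(x[1])


# Vertices of the two polytopes that form the quasidifferential at x = 0.
SUB_VERTICES = [(-1.0, 0.0), (1.0, 0.0)]
SUPER_VERTICES = [(0.0, -1.0), (0.0, 1.0)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quasi_formula(g):
    """Max over the subdifferential plus min over the superdifferential."""
    return max(dot(v, g) for v in SUB_VERTICES) + min(dot(w, g) for w in SUPER_VERTICES)


def difference_quotient(g, alpha=1e-9):
    """(f(0 + alpha g) - f(0)) / alpha."""
    return (f([alpha * gi for gi in g]) - f([0.0, 0.0])) / alpha


directions = [(math.cos(t), math.sin(t)) for t in (0.3, 1.1, 2.0, 3.5, 5.0)]
for g in directions:
    print(g, quasi_formula(g), difference_quotient(g))
```

Both columns equal |g₁| − |g₂| up to rounding. No single convex set reproduces this directional derivative, because the function g ↦ |g₁| − |g₂| is not sublinear.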

Unfortunately, the notion of the ε-subdifferential cannot be extended to nonsmooth, nonconvex functions. Instead, one can define the Goldstein ε-subdifferential [9]. However, in general the Goldstein ε-subdifferential is only upper semicontinuous.

In this paper, we consider continuous approxima- 
tions to subdifferentials and quasidifferentials. We will 
also describe an algorithm for computation of elements 
of such approximations. 


Definitions 
Continuous Approximations 


Let X be a compact subset of the space $\mathbb{R}^n$. We consider a family $C(x,\varepsilon) = C_\varepsilon(x)$ of set-valued mappings depending on a parameter ε > 0. For each ε > 0,

$$C_\varepsilon : X \to 2^{\mathbb{R}^n}.$$

We suppose that C(x, ε) is a compact convex set for all x ∈ X and ε > 0. It is assumed that there exists a number K > 0 such that

$$\sup\{\|v\| : v \in C(x,\varepsilon),\ x \in X,\ \varepsilon > 0\} \le K. \tag{1}$$

Definition 1 The limit $C_l(x)$ of the family $\{C(x,\varepsilon)\}$, ε > 0, at a point x is defined as follows:

$$C_l(x) = \{v \in \mathbb{R}^n : \exists (x^k \to x,\ \varepsilon_k \to +0,\ k \to +\infty,\ v^k \in C(x^k, \varepsilon_k)) : v = \lim_{k \to +\infty} v^k\}.$$

The limit $C_l(x)$ may fail to be convex even if all the sets C(x, ε) are convex. We consider $\operatorname{co} C_l(x)$, the convex hull of $C_l(x)$. It follows from Definition 1 and inequality (1) that the mapping $\operatorname{co} C_l$ has compact convex images.


Definition 2 A family $\{C(x,\varepsilon)\}$, ε > 0, is called a continuous approximation to the subdifferential ∂f on X if the following conditions hold:


Continuous Approximations to Subdifferentials 




1. C(x, ε) is a Hausdorff continuous mapping with respect to x on X for all ε > 0.

2. The subdifferential ∂f(x) is the convex hull of the limit of the family $\{C(x,\varepsilon)\}$, ε > 0, on X, i.e., for all x ∈ X

$$\partial f(x) = \operatorname{co} C_l(x).$$

Some properties of continuous approximations were studied in [1].

Such continuous approximations need not be monotonically decreasing as ε → +0. The uniform and strong continuous approximations to the subdifferential studied in [14] do have this property. Let f be a locally Lipschitz continuous function defined on an open set that contains a compact set X. We consider a family of set-valued mappings $A_\varepsilon f : \mathbb{R}^n \to 2^{\mathbb{R}^n}$, ε > 0. Assume that the sets $A_\varepsilon f(x)$ are nonempty and compact for all ε > 0 and x ∈ X. We denote by $\partial f(x + B_\delta)$ the set $\bigcup\{\partial f(y) : y \in B_\delta(x)\}$, where $B_\delta(x) = \{y \in \mathbb{R}^n : \|x - y\| < \delta\}$.

Definition 3 ([14]) We say that the family $\{A_\varepsilon f(\cdot)\}_{\varepsilon > 0}$ is a uniform continuous approximation to the subdifferential ∂f on X if the following conditions are satisfied:

1. For each given ε > 0, μ > 0 there exists τ > 0 such that for all x ∈ X

$$\partial f(x + B_\tau) \subset A_\varepsilon f(x) + B_\mu.$$

2. For each x ∈ X and for all 0 < ε₁ < ε₂:

$$A_{\varepsilon_1} f(x) \subset A_{\varepsilon_2} f(x).$$

3. $A_\varepsilon f(x)$ is Hausdorff continuous with respect to x on X.

4. For each x ∈ X

$$\bigcap_{\varepsilon > 0} A_\varepsilon f(x) = \partial f(x).$$

Definition 4 ([14]) We say that the family $\{A_\varepsilon f(\cdot)\}_{\varepsilon > 0}$ is a strong continuous approximation to the subdifferential ∂f on X if it satisfies properties 1-3 above and, instead of property 4, the following holds:

4′. For every γ, μ > 0 there exists ε > 0 such that for all x ∈ X

$$\partial f(x) \subset A_\varepsilon f(x) \subset \partial f(x + B_\gamma) + B_\mu.$$


For the set-valued mapping C(x, ε) we set

$$C_0(x) = \{v \in \mathbb{R}^n : \exists (\varepsilon_k \to +0,\ k \to +\infty,\ v^k \in C(x, \varepsilon_k)) : v = \lim_{k \to +\infty} v^k\}$$

and let

$$C(x, 0) = C_0(x).$$

Theorem 1 ([1]) Let the family $\{A_\varepsilon f(\cdot)\}_{\varepsilon > 0}$ be a uniform continuous approximation to the subdifferential ∂f on a compact set X. Then $C(x,\varepsilon) = A_\varepsilon f(x)$ is a continuous approximation to the subdifferential ∂f in the sense of Definition 2.


Corollary 1 It was shown in [14] that a strong continu- 
ous approximation is a uniform continuous approxima- 
tion. So a strong continuous approximation is a contin- 
uous approximation in the sense of Definition 2. 


Theorem 2 ([1]) Let the family C(x, ε) be a continuous approximation to the subdifferential ∂f on a compact set X, and let the mapping C(x, ε) be continuous with respect to (x, ε), x ∈ X, ε > 0. Assume $\operatorname{co} C_l(x) = C_0(x)$ for all x ∈ X. Then the mapping

$$Q(x,\varepsilon) = \operatorname{co} \bigcup \{C(x,t) : 0 \le t \le \varepsilon\}$$

is a uniform continuous approximation to ∂f on X.

Corollary 2 Let the family C(x, ε) be a continuous approximation to the subdifferential ∂f on a compact set X, and let the mapping C(x, ε) be continuous with respect to (x, ε), x ∈ X, ε > 0. Assume $\operatorname{co} C_l(x) = C_0(x)$ for all x ∈ X. Then the mapping Q is upper semicontinuous with respect to (x, ε) at the point (x, 0).


One can derive a chain rule for continuous approximations [1,14]. However, this rule cannot always be used to compute elements of continuous approximations. In the next section we propose an algorithm to compute those elements.


Methods 
Computation of the Continuous Approximations 


We consider a locally Lipschitz continuous function f defined on $\mathbb{R}^n$ and assume that this function is semismooth and quasidifferentiable (for the definition of semismooth functions see [12]). We also assume that both sets $\underline{\partial} f(x)$ and $\overline{\partial} f(x)$ are represented as the convex hull of a finite number of points at any $x \in \mathbb{R}^n$. That is, at a point $x \in \mathbb{R}^n$ there exist sets

$$A = \{a^1, \dots, a^m\},\quad a^i \in \mathbb{R}^n,\ i = 1,\dots,m,\ m \ge 1,$$

and

$$B = \{b^1, \dots, b^p\},\quad b^j \in \mathbb{R}^n,\ j = 1,\dots,p,\ p \ge 1,$$

such that

$$\underline{\partial} f(x) = \operatorname{co} A, \qquad \overline{\partial} f(x) = \operatorname{co} B.$$

In other words, we assume that the subdifferential and the superdifferential of the function f are polytopes at any $x \in \mathbb{R}^n$. This assumption holds, for example, for functions represented as a maximum, minimum, or max-min of a finite number of smooth functions.

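We take a direction $g \in \mathbb{R}^n$ such that

$$g = (g_1, \dots, g_n), \qquad |g_i| = 1,\ i = 1,\dots,n,$$

and consider the sequence of n vectors $e^j = e^j(\alpha)$, j = 1, …, n, with α ∈ (0, 1):

$$e^1 = (\alpha g_1, 0, \dots, 0),\quad e^2 = (\alpha g_1, \alpha^2 g_2, 0, \dots, 0),\quad \dots,\quad e^n = (\alpha g_1, \alpha^2 g_2, \dots, \alpha^n g_n).$$

We introduce the following sets:

$$R_0 = A, \qquad \bar{R}_0 = B,$$

$$R_j = \{v \in R_{j-1} : v_j g_j = \max\{w_j g_j : w \in R_{j-1}\}\},$$

$$\bar{R}_j = \{v \in \bar{R}_{j-1} : v_j g_j = \min\{w_j g_j : w \in \bar{R}_{j-1}\}\}.$$

It is clear that

$$R_j \ne \emptyset\ \ \forall j \in \{0,\dots,n\}, \qquad R_j \subseteq R_{j-1}\ \ \forall j \in \{1,\dots,n\},$$

and

$$\bar{R}_j \ne \emptyset\ \ \forall j \in \{0,\dots,n\}, \qquad \bar{R}_j \subseteq \bar{R}_{j-1}\ \ \forall j \in \{1,\dots,n\}.$$

Moreover,

$$v_r = w_r \qquad \forall v, w \in R_j,\ r = 1,\dots,j \tag{2}$$

and

$$v_r = w_r \qquad \forall v, w \in \bar{R}_j,\ r = 1,\dots,j. \tag{3}$$

Consider the following two sets:

$$R(x, e^j(\alpha)) = \{v \in A : \langle v, e^j\rangle = \max\{\langle u, e^j\rangle : u \in A\}\},$$

$$\bar{R}(x, e^j(\alpha)) = \{w \in B : \langle w, e^j\rangle = \min\{\langle u, e^j\rangle : u \in B\}\}.$$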


Proposition 1 Assume that the function f is quasidifferentiable and its subdifferential and superdifferential are polytopes at a point x. Then there exists α₀ > 0 such that

$$R(x, e^j(\alpha)) \subset R_j, \qquad \bar{R}(x, e^j(\alpha)) \subset \bar{R}_j, \qquad j = 1,\dots,n,$$

for all α ∈ (0, α₀).

Corollary 3 Assume that the function f is quasidifferentiable and its subdifferential and superdifferential are polytopes at a point x. Then there exists α₀ > 0 such that, with $e^0(\alpha) = 0$,

$$f'(x, e^j(\alpha)) = f'(x, e^{j-1}(\alpha)) + v_j \alpha^j g_j + w_j \alpha^j g_j, \qquad v \in R_n,\ w \in \bar{R}_n,\ j = 1,\dots,n,$$

for all α ∈ (0, α₀].

Proposition 2 Assume that the function f is quasidifferentiable and its subdifferential and superdifferential are polytopes at a point x. Then the sets $R_n$ and $\bar{R}_n$ are singletons.

In the next subsection we propose an algorithm to approximate subgradients. This algorithm finds a subgradient that can be represented as a sum of elements of the sets $R_n$ and $\bar{R}_n$.
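The sets $R_j$ perform a lexicographic selection from the vertex set A. According to Proposition 1, for small α the maximizers of ⟨v, eʲ(α)⟩ over A fall inside $R_j$. The sketch below computes $R_0, \dots, R_n$ for the subdifferential side; the helper names are hypothetical, and the example points are integers so that ties are exact. The superdifferential sets $\bar{R}_j$ would be obtained in the same way with min in place of max. With a large α, the maximizer can leave $R_1$, which is why the statement is restricted to α ∈ (0, α₀).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lexicographic_sets(A, g):
    """Return [R_0, R_1, ..., R_n]: R_j keeps points of R_{j-1} maximizing v_j * g_j."""
    sets = [list(A)]
    for j in range(len(g)):
        prev = sets[-1]
        best = max(v[j] * g[j] for v in prev)
        sets.append([v for v in prev if v[j] * g[j] == best])
    return sets


def e_vec(g, alpha, j):
    """e^j(alpha) = (alpha g_1, ..., alpha^j g_j, 0, ..., 0) for j = 1..n."""
    return [alpha ** (k + 1) * g[k] if k < j else 0.0 for k in range(len(g))]


def maximizers(A, direction, tol=1e-12):
    """R(x, e^j): points of A maximizing <v, direction> (up to a tolerance)."""
    top = max(dot(v, direction) for v in A)
    return [v for v in A if dot(v, direction) >= top - tol]


A = [(1, 0), (1, 1), (1, -1), (0, 3)]  # vertices of a hypothetical subdifferential
g = (1, 1)
R = lexicographic_sets(A, g)
print(R[1], R[2])                        # [(1, 0), (1, 1), (1, -1)] [(1, 1)]
print(maximizers(A, e_vec(g, 0.1, 2)))   # [(1, 1)], inside R_2
print(maximizers(A, e_vec(g, 0.9, 2)))   # [(0, 3)], not even in R_1
```

Here $R_2$ is the singleton {(1, 1)}, in agreement with Proposition 2.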


Computation of Subgradients 


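Let $g \in \mathbb{R}^n$, $|g_i| = 1$, i = 1, …, n, be a given vector and λ > 0, α > 0 be given numbers. We define the following points:

$$x^0 = x, \qquad x^j = x^0 + \lambda e^j(\alpha), \quad j = 1,\dots,n.$$

It is clear that

$$x^j = x^{j-1} + (0,\dots,0,\ \lambda\alpha^j g_j,\ 0,\dots,0), \quad j = 1,\dots,n.$$

Let $v = v(\alpha,\lambda) \in \mathbb{R}^n$ be a vector with the following coordinates:

$$v_j = (\lambda \alpha^j g_j)^{-1}\left[f(x^j) - f(x^{j-1})\right], \quad j = 1,\dots,n. \tag{4}$$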
For any fixed $g \in \mathbb{R}^n$, $|g_i| = 1$, i = 1, …, n, and α > 0 we introduce the following set:

$$V(g,\alpha) = \{w \in \mathbb{R}^n : \exists (\lambda_k \to +0,\ k \to +\infty),\ w = \lim_{k \to +\infty} v(\alpha, \lambda_k)\}.$$

Proposition 3 Assume that f is a quasidifferentiable function and its subdifferential and superdifferential are polytopes at x. Then there exists α₀ > 0 such that

$$V(g,\alpha) \subset \partial f(x)$$

for all α ∈ (0, α₀].


Remark 1 It follows from Proposition 3 that, to approximate subgradients of quasidifferentiable functions, one can proceed as follows. Choose a vector $g \in \mathbb{R}^n$ such that $|g_i| = 1$, i = 1, …, n, and sufficiently small α > 0 and λ > 0. Then apply (4) to compute a vector v(α, λ). This vector is an approximation to a certain subgradient.
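Formula (4) needs only n + 1 function values along a staircase path. The sketch below implements it for a given g with $|g_i| = 1$ and small α, λ; the function name is hypothetical, and the test functions are our own examples. Take first f(x) = max(x₁, x₂) at the origin, whose subdifferential is the segment co{(1, 0), (0, 1)}. For this function, different sign patterns g pick out different vertices. Next take the nonregular f(x) = |x₁| − |x₂|. For it the scheme returns (1, −1), which lies in the Clarke subdifferential [−1, 1] × [−1, 1] at the origin.

```python
def subgradient_estimate(f, x, g, alpha, lam):
    """Vector v(alpha, lam) of formula (4).

    x^0 = x and x^j = x^{j-1} + lam * alpha**j * g_j * (j-th unit vector),
    v_j = (f(x^j) - f(x^{j-1})) / (lam * alpha**j * g_j).
    """
    prev_point = list(x)
    prev_value = f(prev_point)
    v = []
    for j in range(len(x)):
        step = lam * alpha ** (j + 1) * g[j]   # alpha**(j+1) because j is 0-based
        point = prev_point.copy()
        point[j] += step
        value = f(point)
        v.append((value - prev_value) / step)
        prev_point, prev_value = point, value
    return v


def fmax(x):
    return max(x[0], x[1])


def fdiff(x):
    return abs(x[0]) - abs(x[1])


print(subgradient_estimate(fmax, [0.0, 0.0], [1, 1], alpha=0.5, lam=1e-6))   # about [1, 0]
print(subgradient_estimate(fmax, [0.0, 0.0], [-1, 1], alpha=0.5, lam=1e-6))  # about [0, 1]
print(subgradient_estimate(fdiff, [0.0, 0.0], [1, 1], alpha=0.5, lam=1e-6))  # about [1, -1]
```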


Remark 2 Quasidifferentiable functions form a broad class of nonsmooth functions, including many interesting nonregular functions. Thus, the scheme proposed in this section allows one to approximate subgradients of a broad class of nonsmooth functions.


Discrete Gradients 


In the previous subsection we demonstrated an algo- 
rithm for the computation of subgradients. In this sub- 
section we consider an algorithm for the computation 
of subdifferentials. This algorithm is based on the no- 
tion of a discrete gradient. We start with its defini- 
tion [1]. 

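Let f be a locally Lipschitz continuous function defined on $\mathbb{R}^n$. Let

$$S_1 = \{g \in \mathbb{R}^n : \|g\| = 1\},$$

$$G = \{e \in \mathbb{R}^n : e = (e_1,\dots,e_n),\ |e_j| = 1,\ j = 1,\dots,n\},$$

$$P = \{z(\lambda) : z(\lambda) \in \mathbb{R}^1,\ z(\lambda) > 0,\ \lambda > 0,\ \lambda^{-1} z(\lambda) \to 0,\ \lambda \to 0\}.$$

Here $S_1$ is the unit sphere, G is the set of vertices of the unit hypercube in $\mathbb{R}^n$, and P is the set of univariate positive infinitesimal functions.

We take any $g \in S_1$ and choose i such that $|g_i| = \max\{|g_k|,\ k = 1,\dots,n\}$. We also take any $e = (e_1,\dots,e_n) \in G$ and a positive number α ∈ (0, 1], and define the sequence of n vectors $e^j(\alpha) = (\alpha e_1, \alpha^2 e_2, \dots, \alpha^j e_j, 0, \dots, 0)$, j = 1, …, n. Then for given $x \in \mathbb{R}^n$ and z ∈ P we define a sequence of n + 1 points as follows:

$$x^0 = x + \lambda g,$$

$$x^1 = x^0 + z(\lambda) e^1(\alpha),$$

$$x^2 = x^0 + z(\lambda) e^2(\alpha),$$

$$\dots$$

$$x^n = x^0 + z(\lambda) e^n(\alpha).$$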

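Definition 5 The discrete gradient of the function f at the point $x \in \mathbb{R}^n$ is the vector $\Gamma^i(x,g,e,z,\lambda,\alpha) = (\Gamma^i_1, \dots, \Gamma^i_n) \in \mathbb{R}^n$, $g \in S_1$, with the following coordinates:

$$\Gamma^i_j = \left[z(\lambda)\alpha^j e_j\right]^{-1}\left[f(x^j) - f(x^{j-1})\right], \quad j = 1,\dots,n,\ j \ne i,$$

$$\Gamma^i_i = (\lambda g_i)^{-1}\Big[f(x + \lambda g) - f(x) - \lambda \sum_{j=1,\ j \ne i}^{n} \Gamma^i_j g_j\Big].$$

It follows from the definition that

$$f(x + \lambda g) - f(x) = \lambda \langle \Gamma^i(x,g,e,z,\lambda,\alpha), g\rangle \tag{5}$$

for all $g \in S_1$, e ∈ G, z ∈ P, λ > 0, α > 0.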


Remark 3 The discrete gradient is defined with respect to a given direction $g \in S_1$. To compute the discrete gradient $\Gamma^i(x,g,e,z,\lambda,\alpha)$, we first define the sequence of points x⁰, …, xⁿ and compute the values of the function f at these points. That is, we compute n + 2 values of this function, including the value at x. The n − 1 coordinates $\Gamma^i_j$, j ≠ i, are defined similarly to those of the vector v(α, λ) from (4). The ith coordinate is defined so as to satisfy equality (5), which can be considered a version of the mean value theorem.
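A direct transcription of Definition 5 follows. It is a sketch under our own choices, and `discrete_gradient` and `fmax` are hypothetical names. We take z(λ) = λ², which belongs to P since λ⁻¹z(λ) = λ → 0. The vectors eʲ(α) = (αe₁, …, αʲeⱼ, 0, …, 0) are built from the sign vector e ∈ G, and the test function is f(x) = max(x₁, x₂). Besides checking the mean-value identity (5), the example evaluates Γⁱ for several directions g and all sign vectors e at x = 0. Each result lies, up to rounding, at a vertex of ∂f(0) = co{(1, 0), (0, 1)}, in line with Proposition 7.

```python
import math
from itertools import product


def discrete_gradient(f, x, g, e, lam, alpha, z=lambda t: t * t):
    """Gamma^i(x, g, e, z, lam, alpha) following Definition 5 (0-based indices)."""
    n = len(x)
    i = max(range(n), key=lambda k: abs(g[k]))          # |g_i| = max_k |g_k|
    x0 = [xk + lam * gk for xk, gk in zip(x, g)]        # x^0 = x + lam g
    zl = z(lam)
    gamma = [0.0] * n
    prev = x0
    for j in range(n):
        step = zl * alpha ** (j + 1) * e[j]             # x^j - x^{j-1} along coordinate j
        point = prev.copy()
        point[j] += step
        if j != i:
            gamma[j] = (f(point) - f(prev)) / step
        prev = point
    rest = sum(gamma[j] * g[j] for j in range(n) if j != i)
    gamma[i] = (f(x0) - f(x) - lam * rest) / (lam * g[i])  # enforces identity (5)
    return gamma


def fmax(x):
    return max(x[0], x[1])


x = [0.0, 0.0]
lam, alpha = 1e-3, 0.5
directions = [(math.cos(math.radians(d)), math.sin(math.radians(d)))
              for d in range(10, 360, 45)]
signs = list(product([1.0, -1.0], repeat=2))
results = []
for g, e in product(directions, signs):
    gam = discrete_gradient(fmax, x, g, e, lam, alpha)
    lhs = fmax([xk + lam * gk for xk, gk in zip(x, g)]) - fmax(x)
    assert abs(lhs - lam * sum(a * b for a, b in zip(gam, g))) < 1e-12  # identity (5)
    results.append(gam)
print(sorted({tuple(round(c, 6) for c in gam) for gam in results}))
```

The directions avoid the line g₁ = g₂, where the two pieces of f tie. Near that line, a larger z(λ) could move the points across the kink and produce non-vertex values.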


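Proposition 4 Let f be a locally Lipschitz continuous function defined on $\mathbb{R}^n$ and let L > 0 be its Lipschitz constant. Then for any $x \in \mathbb{R}^n$, $g \in S_1$, e ∈ G, λ > 0, z ∈ P, α > 0

$$\|\Gamma^i(x,g,e,z,\lambda,\alpha)\| \le C(n) L, \qquad C(n) = \left(n^2 + 2n^{3/2} - 2n^{1/2}\right)^{1/2}.$$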
For a given α > 0 we define the following set:

$$B(x,\alpha) = \{v \in \mathbb{R}^n : \exists (g \in S_1,\ e \in G,\ z_k \in P,\ z_k \to +0,\ \lambda_k \to +0,\ k \to +\infty),\ v = \lim_{k \to +\infty} \Gamma^i(x,g,e,z_k,\lambda_k,\alpha)\}. \tag{6}$$

Proposition 5 Let the function f be differentiable with respect to any direction $g \in \mathbb{R}^n$. Then for any $g \in \mathbb{R}^n$ there exists v ∈ B(x, α), α > 0, such that

$$f'(x,g) = \langle v, g\rangle.$$

Proposition 6 Let the function f be locally Lipschitz continuous and differentiable with respect to any direction $g \in \mathbb{R}^n$, and let x ∈ D(f). Then ∇f(x) ∈ B(x, α), α > 0.

Proposition 7 Assume that f is a semismooth, quasidifferentiable function and its subdifferential and superdifferential are polytopes at a point x. Then there exists α₀ > 0 such that

$$\operatorname{co} B(x,\alpha) \subset \partial f(x)$$

for all α ∈ (0, α₀].


Remark 4 Proposition 7 implies that discrete gradients can be applied to approximate subdifferentials of a broad class of semismooth, quasidifferentiable functions.

Remark 5 The discrete gradient contains three parameters: λ > 0, z ∈ P, and α > 0. The function z ∈ P is used to exploit the semismoothness of f, and it can be chosen sufficiently small. In general, α depends on x. However, suppose f is a semismooth quasidifferentiable function whose subdifferential and superdifferential are polytopes at any $x \in \mathbb{R}^n$. Then there exist δ > 0 and α₀ > 0 such that α(y) ∈ (0, α₀] for all $y \in B_\delta(x)$. The most important parameter is λ > 0. In the sequel we assume that z ∈ P and α > 0 are sufficiently small.


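Consider the following set at a point $x \in \mathbb{R}^n$:

$$D_0(x,\lambda,z) = \operatorname{cl}\operatorname{co}\{v \in \mathbb{R}^n : \exists (g \in S_1,\ e \in G) : v = \Gamma^i(x,g,e,z,\lambda,\alpha)\}.$$

Proposition 4 implies that the set $D_0(x,\lambda,z)$ is compact and convex for any $x \in \mathbb{R}^n$.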


Corollary 4 Let f be a quasidifferentiable semismooth function. Assume that its subdifferential and superdifferential are polytopes and that in the equality

$$f(x + \lambda g) - f(x) = \lambda f'(x,g) + o(\lambda, g), \qquad g \in S_1,$$

$\lambda^{-1} o(\lambda, g) \to 0$ as λ → +0 uniformly with respect to $g \in S_1$. Then for any δ > 0 there exist λ₀ > 0 and z₀ ∈ P such that

$$D_0(x,\lambda,z) \subset \partial f(x) + B_\delta$$

for all λ ∈ (0, λ₀) and z ∈ (0, z₀).

Consider the continuous approximation C(x, ε) to the subdifferential ∂f(x). Then Corollary 4 implies that for any δ > 0 there exist ε₀ > 0, λ₀ > 0 and z₀ ∈ P such that

$$D_0(x,\lambda,z) \subset C(x,\varepsilon) + B_\delta$$

for all ε ∈ (0, ε₀), λ ∈ (0, λ₀) and z ∈ (0, z₀). Thus, discrete gradients can be used to compute subsets of continuous approximations to the subdifferential in the sense of Definition 2. Consequently, they can also be used to compute subsets of uniform and strong continuous approximations.


Continuous Approximations to the Quasidifferential 


In this subsection we consider continuous approximations to the Demyanov-Rubinov quasidifferential. We consider a function of the form

$$f(x) = F(x, y_1(x), \dots, y_m(x)), \tag{7}$$

where $x \in \mathbb{R}^n$, the function F is continuously differentiable in $\mathbb{R}^{n+m}$, and $y_i(x)$, i ∈ I = {1, …, m}, are semismooth, regular functions whose subdifferentials are polytopes. It is easy to see that the function f is differentiable with respect to any direction and

$$f'(x,g) = \left\langle \frac{\partial F(x,y(x))}{\partial x}, g\right\rangle + \sum_{i \in I} \frac{\partial F(x,y(x))}{\partial y_i}\, y_i'(x,g),$$

where $y(x) = (y_1(x), \dots, y_m(x))$, $\partial F(x,y(x))/\partial x$ is the gradient of the function F with respect to x, and $\partial F(x,y(x))/\partial y_i$ is the partial derivative of the function F with respect to $y_i$, i ∈ I.
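Let

$$I_1(x) = \{i \in I : \partial F(x,y(x))/\partial y_i > 0\},$$

$$I_2(x) = \{i \in I : \partial F(x,y(x))/\partial y_i < 0\}.$$

Consider the mappings $B_i(x,\alpha)$ corresponding to the functions $y_i(x)$, i ∈ I. We introduce the following two sets:

$$\underline{Z}(x,\alpha) = \operatorname{co}\Big\{v \in \mathbb{R}^n : v = \frac{\partial F(x,y(x))}{\partial x} + \sum_{i \in I_1(x)} \frac{\partial F(x,y(x))}{\partial y_i}\, v^i,\ v^i \in B_i(x,\alpha),\ i \in I_1(x)\Big\},$$

$$\overline{Z}(x,\alpha) = \operatorname{co}\Big\{w \in \mathbb{R}^n : w = \sum_{i \in I_2(x)} \frac{\partial F(x,y(x))}{\partial y_i}\, w^i,\ w^i \in B_i(x,\alpha),\ i \in I_2(x)\Big\}.$$

Proposition 8 Assume that the function F is continuously differentiable in $\mathbb{R}^{n+m}$, the functions $y_i(x)$, i ∈ I, are semismooth and regular, and their subdifferentials are polytopes. Then the function f is quasidifferentiable and there exists α₀ > 0 such that

$$\underline{Z}(x,\alpha) \subset \underline{\partial} f(x) \quad \text{and} \quad \overline{Z}(x,\alpha) \subset \overline{\partial} f(x)$$

for all α ∈ (0, α₀).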


Corollary 5 Suppose we are given the function $f(x) = f_1(x) - f_2(x)$, where $f_1$ and $f_2$ are semismooth, regular functions whose subdifferentials are polytopes. Let $B_1(x,\alpha)$ and $B_2(x,\alpha)$ be the mappings corresponding to the functions $f_1$ and $f_2$, respectively. Then the function f is quasidifferentiable and there exists α₀ > 0 such that

$$B_1(x,\alpha) \subset \partial f_1(x), \qquad B_2(x,\alpha) \subset \partial f_2(x)$$

for all α ∈ (0, α₀).


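Let $D_{0i}(x,\lambda,z)$ be the mappings corresponding to the functions $y_i(x)$, i ∈ I. We set

$$D_1(x,\lambda,z) = \operatorname{co}\Big\{v \in \mathbb{R}^n : v = \frac{\partial F(x,y(x))}{\partial x} + \sum_{i \in I_1(x)} \frac{\partial F(x,y(x))}{\partial y_i}\, v^i,\ v^i \in D_{0i}(x,\lambda,z),\ i \in I_1(x)\Big\},$$

$$D_2(x,\lambda,z) = \operatorname{co}\Big\{w \in \mathbb{R}^n : w = \sum_{i \in I_2(x)} \frac{\partial F(x,y(x))}{\partial y_i}\, w^i,\ w^i \in D_{0i}(x,\lambda,z),\ i \in I_2(x)\Big\}.$$

Note that the mappings $D_1(x,\lambda,z)$ and $D_2(x,\lambda,z)$ are Hausdorff continuous with respect to x for any fixed λ > 0, z ∈ P.

It follows from Corollary 4 that the sets $D_1(x,\lambda,z)$ and $D_2(x,\lambda,z)$ can be used to compute subsets of continuous approximations to the subdifferential and the superdifferential of the function (7).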


Conclusions 


In this paper we introduced continuous approximations to the subdifferential and the quasidifferential of nonsmooth, nonconvex functions, and we proposed an algorithm for their computation. This algorithm allows one to approximate subgradients of a broad class of nonsmooth functions.
Nonlinear Systems and Global Optimization


Man-made systems and processes can often be modeled to reasonable accuracy by postulating the exclusive use of continuous linear functions. For instance, one may think of the simplest production and distribution models known from the OR literature. (Models with integer variables will not be discussed here, even though they can be equivalently reformulated to fit into the
present framework.) If we attempt, however, the anal- 
ysis of natural — physical, chemical, biological, envi- 
ronmental, or even economic, financial and societal — 
systems and their governing processes, then nonlinear 
functions start to play a significant role in the quan- 
titative description. To illustrate this point, one may 
think of the most prominent (basic) function forms 
in physics: probably polynomials, power functions, the 
exponential-logarithmic pair and trigonometric func- 
tions come to mind first. Clouds, water flows, rugged 
terrains, plants and animals — as well as many other 
natural objects — all possess visible nonlinearities. For 
sophisticated examples and general principles, one may 
think of discussions of nonlinear dynamics, chaos, self- 
organizing systems and the fractal nature of the Uni- 
verse: consult, e. g., [3,5,14,19,25]. 

Prescriptive (control, management, optimization) 
models which attempt to describe and optimize the be- 
havior of inherently nonlinear systems — as a rule — 
lead to nonlinear decision problems. Since nonlinear 
decision models frequently possess multiple local op- 
tima, the general relevance of global optimization (GO) 
becomes obvious. In this brief article we present a list 
of important and challenging GO applications. We also 
provide several illustrative references: these describe 
numerous further application areas. 


The Continuous Global Optimization Model 


We shall consider problems in the general form

$$\min f(x) \quad \text{s.t.} \quad x \in D := \{x : l \le x \le u,\ g(x) \le 0\} \tag{1}$$


In (1) we use the following notation: 
• x is a vector which represents decision alternatives in $\mathbb{R}^n$;
• f(x) is a continuous objective function;
• D is a nonempty set of feasible decisions, defined by
  - g(x), an m-vector of continuous constraint functions; and
  - l, u, explicit (finite, componentwise) n-vector bounds.

Explicit bounds on the constraint function values 
can also be imposed; however, such more specialized 
models are directly amenable to the form (1). 

First of all, note that if all functions are continuous, then D is closed; since it is also bounded (by l and u) and nonempty, the classical theorem of Weierstrass guarantees that the optimal solution set of (1) is nonempty. At the same time, without further structural assumptions, (1) can be a very difficult global optimization problem. In other words, unless additional information is provided, there may well exist multiple (local) solutions of varying quality to (1). Naturally, in most cases one would like to find the ‘very best’ (global) solution to the underlying decision problem, avoiding the ‘traps’ offered by local optima. To attain this objective, a considerable variety of GO models and solution approaches has been suggested: consult, e. g., [12].


Test Problems 


Although our primary topic is real-world GO applica- 
tions, one should at least mention several standardized 
test problem suites, since these often originate from 
real-world applications. For collections of (both con- 
vex and nonconvex) nonlinear programming test prob- 
lems, consult, e.g., [11,18]. See [6,7,13,22] for collec- 
tions of GO test problems. On the WWW, see [1,8] and 
[21]; especially [21] provides numerous further links 
and pointers, including discussions of test and real- 
world problems. 


Illustrative Applications 


Since GO problems are literally ubiquitous in scientific, 
engineering and economic decision making, we shall 
only list a number of illustrative applications. All appli- 
cation areas will be listed simply in alphabetical order, 
by information source. (The reader will notice overlaps 
among the problems studied in different works.) 

The test problem collection [6] presents application 
models from the following fields: 
• chemical reactor networks;
• distillation column sequencing;
• heat exchanger network synthesis;
• indefinite quadratic programming;
• mechanical design;
• general nonlinear programming;
• phase and chemical reaction equilibrium;
• pooling and blending;
• quadratically constrained problems;
• reactor-separator-recycling systems;
• VLSI design.

The volume [7] significantly expands upon the 
above material, adding more specific classes of nonlin- 
ear programming models, combinatorial optimization 
problems, and dynamic models, as well as further prac- 
tical examples (see later on). 

The MINPACK-2 collection presented at [1] in- 
cludes models related to the following types of prob- 
lems: 
• brain activity;
• Chebychev quadrature;
• chemical and phase equilibria;
• coating thickness standardization;
• combustion of propane;
• control systems (analysis and design);
• database optimization;
• design with composites;
• elastic-plastic torsion;
• enzyme reaction analysis;
• exponential data fitting;
• flow in a channel;
• flow in a driven cavity;
• Gaussian data fitting;
• Ginzburg-Landau problem;
• human heart dipole;
• hydrodynamic modeling;
• incompressible elastic rods;
• isomerization of alpha-pinene;
• Lennard-Jones clusters;
• minimal surfaces;
• pressure distribution;
• Ramsey graphs;
• solid fuel ignition;
• steady-state combustion;
• swirling flow between disks;
• thermistor resistance analysis;
• thin film design;
• VLSI design.

A detailed discussion of several GO case studies and applications is presented in [22]. These problems were analyzed by using LGO, an integrated model development environment to formulate and solve GO problems; consult also [23]. The current list of LGO applications includes, for instance, the following areas:

• bio-mechanical design;
• ‘black box’ (closed, confidential, etc.) system design and operation;
• combination of negotiated expert opinions (forecasts, votes, assessments, etc.);
• data classification, pattern recognition;
• dynamic population and resource management;
• extremal energy (point arrangement) problems, free and surface-constrained forms;
• inverse model fitting to observation data (calibration);
• multifacility location-allocation problems;
• nonlinear approximation, nonlinear regression, and other curve/surface fitting problems;
• optimized tuning of equipment and instruments in medical research and other areas;
• reactor maintenance policy analysis;
• resource allocation (in cutting, loading, scheduling, sequencing, etc. problems);
• risk analysis and control in various environmental management contexts;
• robotics design issues;
• robust product/mixture design;
• satisfiability problems;
• statistical modeling;
• systems of nonlinear equations and inequalities;
• therapy (dosage and schedule) optimization.

The WWW site [21] discusses, inter alia, the following application areas:

• bases for finite elements;
• boundary value problems;
• chemical engineering problems;
• chemical phase equilibria;
• complete pivoting example;
• distance geometry models;
• extreme forms;
• identification of dynamical systems with matrices depending linearly on parameters;
• indefinite quadratic programming models;
• minimax problems;
• nonlinear circuits;
• optimal control problems;
• optimal design;
• parameter identification with data of bounded error;
• PDE defect bounds;
• PDE solution by least squares;
• pole assignment;
• production planning;
• propagation of discrete dynamical systems;
• protein-folding problem;
• pseudospectrum;
• quadrature formulas;
• Runge-Kutta formulas;
• spherical designs (point configurations);
• stability of parameter matrices.

The collection of test problems [7] includes models from the following application areas:

• batch plant design under uncertainty;
• chemical reactor network synthesis;
• conformational problems in clusters of atoms and molecules;
• dynamic optimization problems in parameter estimation;
• homogeneous azeotropic separation system;
• network synthesis;
• optimal control problems;
• parameter estimation and data reconciliation;
• phase and chemical reaction equilibrium;
• pooling/blending operations;
• pump network synthesis;
• robust stability analysis;
• trim loss minimization.

The article [4] reviews several significant applica- 
tions of rigorous global optimization (based on the in- 
terval branch and bound approach). These applications 
include: 


• currency trading;
• finite element analysis (in a high-tech engineering design context);
• gene prediction in genome therapeutics;
• magnetic resonance imaging (in a medical application);
• numerical mathematics (search for an approximate greatest common divisor of given polynomials);
• parameter estimation in signal processing;
• portfolio management.

One can immediately add here the application of interval techniques to an issue of paramount significance in numerical modeling:

• solving systems of nonlinear (and linear) equations; consult, e. g., [20].
The volume [24] also covers a broad range of applications from the following areas:
• agro-ecosystem management;
• analysis of nucleic acid sequences;
• assembly line design;
• cellular mobile network design;
• chemical process optimization;
• chemical product design;
• computational modeling of atomic and molecular structures;
• controller design for motors;
• electrical engineering design;
• feeding strategies in animal husbandry;
• financial modeling;
• laser equipment design;
• mechanical engineering design;
• radiotherapy equipment calibration;
• robotics design;
• satellite data analysis (interferometry problem);
• virus structure reconstruction;
• water resource distribution systems.

As the above lists illustrate, the application potential of global optimization is indeed very diverse. For additional literature on real-world applications, see, e. g., the following references:

• network problems, combinatorial optimization (knapsack, traveling salesman, flow-shop problems), batch process scheduling: [17];
• GO algorithms and their applications (primarily) in chemical engineering design: [9];
• contributions on decision support systems and techniques for solving GO problems, but also on molecular structures, queueing systems, image reconstruction, location analysis and process network synthesis: [2];
• multilevel optimization algorithms and their applications: [15];
• engineering applications of the finite element method: [16];
• a variety of applications, e.g., from the fields of environmental management, geometric design, robust product design, and parameter estimation: [10].

Numerous issues of the Journal of Global Optimization, as well as a large number of other professional OR/MS, natural science and engineering journals, also publish articles describing interesting GO applications.
The Continuous Global Optimization Model


We shall consider the continuous global optimization problem (GOP) in the general form

$$\min f(x) \quad \text{s.t.} \quad x \in D := \{x : l \le x \le u,\ g(x) \le 0\} \tag{1}$$


In (1) the following assumptions are used: 
• x is a vector representing decision alternatives in $\mathbb{R}^n$;
• D is a nonempty set of feasible decisions, defined by
  - l, u: explicit (finite, componentwise) n-vector bounds of x, and
  - g(x): an m-vector of continuous constraint functions defined on [l, u];
• f(x) is a continuous objective function defined on D.

Continuous Global Optimization: Models, Algorithms and Software, Figure 1
A two-variable multi-extremal function

Explicit bounds on the constraint function values can 
also be imposed; however, such more specialized mod- 
els are also directly amenable to the form (1). 

Since the functions f and g are continuous and D is nonempty, closed and bounded, the GOP (1) has, by the theorem of Weierstrass, a nonempty set X* of globally optimal solutions. At the same time, one can immediately realize that, in its full generality, instances of model (1) can pose a very significant numerical challenge. Since the usual convexity assumptions are absent, D may be disconnected and/or nonconvex, and the objective function f may also be multi-extremal. That is, the number of local (pseudo) solutions to (1) is typically unknown and can be large; the quality of the various local and global solutions may differ significantly. To illustrate this point, see Fig. 1, which depicts a ‘hilly landscape’ (in fact, the surface plot of a relatively simple composition of trigonometric functions with embedded polynomial arguments, in just two variables). For instance, this function could be the objective in (1) defined on the corresponding interval feasible region [l, u].

To solve the GOP (1) in a strict mathematical sense means to find the complete set of globally optimal solutions X*, and the associated global optimum value f* = f(x*), x* ∈ X*. In most cases, at least in the realm of continuous GO, we need to replace this ‘ambitious’ objective by finding a verified estimate (upper and lower bounds) of f*, and corresponding approximation(s) of points from the set X*. Naturally, such estimates are to be determined on the basis of a finite number of algorithmically generated sample points from D, or from the embedding interval [l, u].

For reasons of better analytical and numerical 
tractability, usually the following additional assump- 
tions are made: 

• D is a full-dimensional subset (a ‘body’) in $\mathbb{R}^n$;
• X* is at most countable;
• g (i.e., each of its component functions) and f are Lipschitz-continuous on [l, u].

Observe that the first assumption makes algorithmic search possible within the set D. With respect to the second assumption, note that in most practical contexts the set of global optimizers consists only of a single point, or of several points. Finally, the Lipschitz assumption, i.e., that changes in function values are uniformly controlled by changes in their argument, $|f(x) - f(y)| \le L\,\|x - y\|$ for some constant L and all x, y in [l, u] (and similarly for each component of g), is a sufficient condition for estimating f* on the basis of a finite set of search points. We emphasize that factual knowledge of the smallest suitable Lipschitz constant is not required, and in practice it is typically unknown indeed. The Lipschitz criterion is evidently met, e.g., by all continuously differentiable functions defined on [l, u]; however, the class of Lipschitz-continuous functions is even broader.
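The following Python sketch shows, for a one-dimensional f on an interval [a, b], how the Lipschitz assumption turns a finite set of samples into verified bounds on f*. It uses the classical saw-tooth (Piyavskii-Shubert type) minorant between neighboring sample points; the function name lipschitz_bounds and its parameters are hypothetical, and L is assumed to be any valid, not necessarily smallest, Lipschitz constant.

```python
def lipschitz_bounds(f, a, b, L, n):
    """Return (lower, upper) with lower <= min of f over [a, b] <= upper.

    f is sampled at n + 1 equally spaced points. Assumes |f(x) - f(y)| <= L*|x - y|
    on [a, b]; L may overestimate the smallest such constant.
    """
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    fs = [f(x) for x in xs]
    # Every sampled value is attained by f, so the best one is an upper bound on f*.
    upper = min(fs)
    # On [x_i, x_{i+1}], f stays above max(f_i - L*(x - x_i), f_{i+1} - L*(x_{i+1} - x));
    # the minimum of this saw-tooth minorant is (f_i + f_{i+1})/2 - L*(x_{i+1} - x_i)/2.
    lower = min((fs[i] + fs[i + 1]) / 2 - L * (xs[i + 1] - xs[i]) / 2
                for i in range(n))
    return lower, upper
```

For instance, f(x) = (x^2 - 1)^2 + 0.3x satisfies the assumption on [-2, 2] with L = 25, since |f'(x)| = |4x^3 - 4x + 0.3| ≤ 24.3 there. With sample spacing h, the gap upper - lower never exceeds L·h/2, so refining the sample tightens the verified estimate.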

Due to the very general model structure postu- 
lated above, classical (convexity-based) numerical ap- 
proaches are, generally speaking, not directly applicable 
to solve GOPs: instead, truly global scope methodology 
is needed. In the past decades, a considerable variety 
of GO models and solution approaches have been pro- 
posed and analyzed. Below we shall provide a concise 
review, with a view towards software development. For 
detailed discussions, consult, e. g., the illustrative list of 
references. 


Model Types 


The most important GO model classes that have been extensively studied include the following. (Note that postulated properties of g, such as convexity, are required componentwise.)

• Bilinear and biconvex programming (f is bilinear or biconvex, D is convex).
• Combinatorial optimization (problems that have discrete decision variables in f and/or in g can be equivalently reformulated as GO problems in continuous variables).
• Concave minimization (f is concave, D is convex).
• Continuous global optimization (f and g are arbitrary continuous functions).
• Differential convex (DC) optimization (f and the components in g can all be explicitly represented as the difference of two corresponding convex functions).
• Fractional programming (f is the ratio of two real functions, and g is convex).
• Linear and nonlinear complementarity problems (f is the scalar product of two vector functions, D is typically assumed to be convex).
• Lipschitz optimization (f and g are arbitrary Lipschitz-continuous functions).
• Minimax problems (f is some minimax objective, the maximum is considered over a discrete set or a convex set, D is convex).
• Multilevel optimization (e. g., models of noncooperative games, involving hierarchies of decision makers; the conflicting criteria are aggregated by f; D is usually assumed to be convex).
• Multi-objective programming (e. g., determination of the efficient set, when several conflicting objectives are to be optimized over the region D).
• Multiplicative programming (f is the product of several convex functions, and g is convex or, more generally, also multiplicative).
• Network problems (f can be taken from several nonconvex function classes, and g is typically linear or convex).
• Parametric nonconvex programming (in these, the feasible region D and/or the objective f may also depend on a parameter vector).
• Quadratic optimization (f is an arbitrary, indefinite quadratic function; g is linear or, in the more general case, is also made up of arbitrary quadratic functions).
• Reverse convex programming (at least one of the functions in g expresses a reverse convex constraint).
• Separable global optimization (f is an arbitrary nonlinear, in general nonconvex, separable function; D is typically convex).
• Stochastic (nonconvex) models, in which the functions f, g depend on random factors.
• Various other nonlinear programming problems, in the absence of a verified convex structure: this broad category includes, e. g., models in which some of the functions f, g are defined by complex ‘black box’ computational procedures.
Note that the problem classes listed are not necessar- 
ily distinct; in fact, several of them are hierarchically 
contained in the more general problem types listed. For 
detailed descriptions of most of these model types and 
their connections consult, e.g., [13], with numerous 
further references. 

Observe also that in the list presented, there are 
specifically structured models (such as e. g., a concave 
minimization problem under linear or convex con- 
straints), as well as far more general ones (such as e. g., 
differential convex, Lipschitz or continuous problems). 
Hence, one can reasonably expect that the most suit- 
ably tailored solution approaches will also vary to a con- 
siderable extent. Very general search strategies should 
work for most models - albeit their efficiency might be 
low for specialized problems. At the same time, strictly 
specialized solvers may not work at all for problem 
classes outside of their scope. 

Several of the most important GO strategies are 
listed below, together with additional remarks and ref- 
erences. Again, the items of the list are not necessar- 
ily exclusive. Most GO software implementations are 
based upon one of these approaches, possibly combin- 
ing ideas from several strategies. 


Exact Methods 
Naive Approaches 


These include the best-known passive (simultaneous) or direct (not fully adaptive) sequential GO strategies: uniform grid, space covering, and pure random searches. Note that such methods are obviously convergent under mild assumptions but, as a rule, are impracticable in higher-dimensional problems. Consult corresponding chapters in [13,24,30].
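Two of these passive strategies can be sketched in Python as follows, for the special case where D is the box [lower, upper]; the function names, parameters and the fixed random seed are hypothetical choices for illustration. The grid version needs k**n function evaluations for k points per coordinate.

```python
import itertools
import random


def uniform_grid_search(f, lower, upper, k):
    """Evaluate f on a grid with k >= 2 points per coordinate (k**n evaluations)."""
    axes = [[lo + (hi - lo) * j / (k - 1) for j in range(k)]
            for lo, hi in zip(lower, upper)]
    best_x, best_f = None, float("inf")
    for point in itertools.product(*axes):  # all grid points, one at a time
        fx = f(point)
        if fx < best_f:
            best_x, best_f = point, fx
    return best_x, best_f


def pure_random_search(f, lower, upper, n_samples, seed=0):
    """Evaluate f at n_samples points drawn uniformly from the box."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        point = tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))
        fx = f(point)
        if fx < best_f:
            best_x, best_f = point, fx
    return best_x, best_f
```

With k = 10 points per coordinate, a problem with n = 10 variables already requires 10^10 grid evaluations, which makes the impracticability in higher dimensions concrete.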


Complete (Enumerative) Search Strategies 


These are based upon an exhaustive (and typically 
streamlined) enumeration of all possible solutions. Ap- 
plicable to combinatorial problems, as well as to certain 
‘well-structured’ continuous GO problems such as, e. g., 
concave programming. See, e. g., [14]. 
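As a small illustration of why enumeration suits concave programming: a concave function attains its minimum over a polytope at one of the polytope's vertices, so over a box [l, u] it suffices to enumerate the 2^n corners. The Python sketch below assumes a box-shaped D (a hypothetical special case; general polytopes would require a vertex enumeration procedure), and its cost is exponential in n.

```python
import itertools


def concave_min_over_box(f, lower, upper):
    """Minimize a concave f over the box [lower, upper] by checking all 2**n corners.

    Valid only for concave f, whose minimum over a polytope is attained at a vertex.
    """
    corners = itertools.product(*zip(lower, upper))  # each coordinate at l_i or u_i
    best = min(corners, key=f)
    return best, f(best)
```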
Homotopy (Parameter Continuation),
Trajectory Methods, and Related Approaches 


These methods have the ‘ambitious’ objective of visiting 
all stationary points of the objective function: this, in 
turn, leads to the list of all - global as well as local - op- 
tima. This general approach includes differential equa- 
tion model based, path following search strategies, as 
well as fixed-point methods and pivoting algorithms. 
See, for instance, [5] and [8]. 


Successive Approximation (Relaxation) Methods 


The initial optimization problem is replaced by a sequence of relaxed subproblems that are easier to solve. These subproblems are successively refined to approximate the initial problem; cutting planes and more general cuts, diverse minorant function constructions, nested optimization and decomposition strategies are also possible. The approach is applicable to structured GO problems such as, e.g., concave minimization and DC problems [14].


Branch and Bound Algorithms 


A variety of partition strategies have been proposed to 
solve GOPs. These are based upon adaptive partition, 
sampling, and subsequent lower and upper bounding 
procedures: these operations are applied iteratively to 
the collection of active (remaining ‘candidate’) subsets 
within the feasible set D. Their exhaustive search fea- 
ture is similar in spirit to analogous integer program- 
ming methodology. Branch and bound subsumes many 
specific approaches, and allows for a range of imple- 
mentations. 

Branch and bound methods typically rely on some a priori structural knowledge about the problem. This information may relate, for instance, to how rapidly each function can vary (e.g., the knowledge of a suitable ‘overall’ Lipschitz constant for each function f and g), or to the availability of an analytic formulation, and guaranteed smoothness, of all functions (for instance, in interval arithmetic based methods).

The branch and bound methodology is applicable 
to broad classes of GO problems: e. g., in combinato- 
rial optimization, concave minimization, reverse con- 
vex programs, DC programming, and Lipschitz opti- 
mization. For details, consult [12,14,15,20,24,26]. 
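To make the partition, bounding and pruning steps concrete, here is a hedged Python sketch of a one-dimensional Lipschitz branch and bound; it is one of many possible implementations, not a specific method from the cited references. It assumes that a valid Lipschitz constant L is known (the kind of a priori structural knowledge mentioned above); lipschitz_branch_and_bound and its parameters are hypothetical names.

```python
import heapq


def lipschitz_branch_and_bound(f, a, b, L, tol=1e-6):
    """Return (x_best, f_best) with f_best - tol <= min of f over [a, b] <= f_best.

    Assumes |f(x) - f(y)| <= L*|x - y| on [a, b]; L may overestimate the true constant.
    """
    def lower_bound(x0, f0, x1, f1):
        # Minimum over [x0, x1] of the Lipschitz minorant built from the endpoint values.
        return (f0 + f1) / 2 - L * (x1 - x0) / 2

    fa, fb = f(a), f(b)
    x_best, f_best = (a, fa) if fa <= fb else (b, fb)  # incumbent: upper bound on f*
    active = [(lower_bound(a, fa, b, fb), a, fa, b, fb)]  # candidate subsets, by bound
    while active:
        lb, x0, f0, x1, f1 = heapq.heappop(active)
        if lb >= f_best - tol:
            break  # all remaining (and all pruned) bounds are within tol of the incumbent
        xm = (x0 + x1) / 2  # branch: bisect the most promising subinterval
        fm = f(xm)
        if fm < f_best:
            x_best, f_best = xm, fm
        for child in ((lower_bound(x0, f0, xm, fm), x0, f0, xm, fm),
                      (lower_bound(xm, fm, x1, f1), xm, fm, x1, f1)):
            if child[0] < f_best - tol:  # bound: keep only subsets that may still improve
                heapq.heappush(active, child)
    return x_best, f_best
```

The pruning test is where the structural knowledge enters: with an underestimated L the bounds are no longer valid, and the subinterval containing the global minimizer may be discarded.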


Bayesian Search (Partition) Algorithms 


These methods are based upon some postulated statis- 
tical information, to enable a prior stochastic descrip- 
tion of the function class modeled. During optimiza- 
tion, the problem instance characteristics are adaptively 
estimated and updated. Note that typically only the corresponding one-dimensional model development is exact, and that in most practical cases ‘myopic’ approximate decisions govern the search procedure.

This general approach is applicable also to (merely) 
continuous GO problems. Theoretically, convergence 
to the optimal solution set is guaranteed only by gen- 
erating an everywhere dense set of search points. One 
of the obvious challenges of using statistical methods 
is the choice and verification of an ‘appropriate’ sta- 
tistical model, for the class of problems to which they 
are applied. Additionally, it seems to be difficult to 
implement rigorous and computationally efficient ver- 
sions of these algorithms for higher-dimensional opti- 
mization problems. Note, however, that if one ‘skips’ 
the underlying Bayesian paradigm, then these meth- 
ods can also be pragmatically viewed as adaptive par- 
tition algorithms, and - as such - they can be di- 
rectly extended to higher dimensions: see [24]. For 
detailed expositions on Bayesian approaches, consult, 
e.g., [18,19,27]. 


Adaptive Stochastic Search Algorithms 


This is another broad class of methods, based upon ran- 
dom sampling in the feasible set. In its basic form, it 
includes various random search strategies that are con- 
vergent, with probability one. Search strategy adjust- 
ments, clustering and deterministic solution refinement 
options, statistical stopping rules, etc. can also be added 
as enhancements. 

The methodology is applicable to both discrete and 
continuous GO problems under very mild conditions. 
Consult, for instance, [2,24,30]. 


Heuristic Strategies 
‘Globalized’ Extensions of Local Search Methods 


These are partially heuristic algorithms, yet often successful in practice. The essential idea is to apply a preliminary global phase based on grid search or random search, followed by a local (convex programming) method. For instance, random multistart performs a local search from several points selected randomly from the search domain D. Note that even such sampling is not trivial when D has a complicated shape, being defined, e. g., by (merely) continuous nonlinear functions.

Frequently, sophisticated algorithm enhancements 
are added to this basic strategy. For instance, the clus- 
tering of sample points is aimed at selecting only a sin- 
gle point from each sampled ‘basin’ of f from which 
then a local search method is initiated. For more details, 
consult, for instance, [27]. 
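A minimal Python sketch of random multistart is given below, assuming that D is a box. As the local phase it uses a simple derivative-free compass (coordinate pattern) search, swapped in for a local convex programming method to keep the example self-contained; the clustering enhancement described above is omitted, and all names and parameter values are hypothetical.

```python
import random


def compass_search(f, x, lower, upper, step=0.5, min_step=1e-6):
    """Local phase: derivative-free compass (coordinate pattern) search inside the box."""
    x = list(x)
    fx = f(x)
    while step > min_step:
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                y = list(x)
                # Trial move along coordinate i, projected back onto [lower_i, upper_i].
                y[i] = min(max(y[i] + direction * step, lower[i]), upper[i])
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step /= 2  # no coordinate move helps: refine the pattern
    return x, fx


def random_multistart(f, lower, upper, n_starts=20, seed=0):
    """Global phase: start the local search from uniformly sampled points of the box."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        x, fx = compass_search(f, x0, lower, upper)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

For a double-well function such as (x^2 - 1)^2 + 0.3x on [-2, 2], starts on either side of the central hump end in different local minima, and the multistart simply keeps the better one.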


Genetic Algorithms, Evolution Strategies 


These methods ‘mimic’ biological evolution: namely, 
the process of natural selection and the ‘survival of the 
fittest’ principle. An adaptive search procedure based 
on a ‘population’ of candidate solution points is used. 
Iterations involve a competitive selection that drops
the poorer solutions. The remaining candidates with
higher ‘fitness value’ are then ‘recombined’ with one
another by swapping components;
they can also be ‘mutated’ by making some smaller- 
scale change to a candidate. The recombination and 
mutation moves are applied sequentially; their aim is to 
generate new solutions that are biased towards subsets 
of D in which good - although not necessarily globally 
optimized — solutions have already been found. 

Numerous variants of this general strategy, based 
on diverse evolution ‘game rules’, can be constructed. 
The different types of evolutionary search methods in- 
clude approaches that are aimed at continuous GOPs, 
and also others that are targeted towards solving com- 
binatorial problems. The latter group is often called 
genetic algorithms. For details, consult, e.g., [10,17, 
22,29]. 
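A compact real-coded sketch of these moves is shown below in Python. It is only one of the numerous possible variants: truncation selection (the better half survives), uniform crossover and Gaussian mutation clipped to a box-shaped D are illustrative assumptions, and all names and default parameters are hypothetical.

```python
import random


def genetic_algorithm(f, lower, upper, pop_size=40, generations=200,
                      mutation_rate=0.2, mutation_scale=0.1, seed=0):
    """Minimize f over the box [lower, upper] with a simple real-coded genetic algorithm."""
    rng = random.Random(seed)
    n = len(lower)
    spans = [hi - lo for lo, hi in zip(lower, upper)]
    population = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half (lower f) survives; the poorer solutions are dropped.
        population.sort(key=f)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Recombination: uniform crossover swaps components between two parents.
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            # Mutation: occasional small Gaussian change, kept inside the box.
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0.0, mutation_scale * spans[i])
                    child[i] = min(max(child[i], lower[i]), upper[i])
            children.append(child)
        population = parents + children
    best = min(population, key=f)
    return best, f(best)
```

Because the surviving parents are carried over unchanged, the best solution found so far is never lost between generations.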


Simulated Annealing 


These techniques are based upon the physical analogy 
of cooling crystal structures that spontaneously attempt 
to arrive at some stable (globally or locally minimal po- 
tential energy) equilibrium. This general principle is 
applicable to both discrete and continuous GO prob- 
lems under mild structural requirements: consult, e. g., 
[1,22,28]. 
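A minimal continuous-variable sketch of the annealing principle follows, in Python. It uses the Metropolis acceptance rule (improvements are always accepted, a deterioration Δ is accepted with probability exp(-Δ/t)) and a geometric cooling schedule; the schedule, the Gaussian neighborhood, the box-shaped D and all parameter values are assumptions for illustration, not recommendations.

```python
import math
import random


def simulated_annealing(f, x0, lower, upper, t0=1.0, cooling=0.999,
                        step=0.5, n_iter=20000, seed=0):
    """Minimize f over the box [lower, upper] starting from x0; return (best_x, best_f)."""
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(n_iter):
        # Random neighbor: Gaussian perturbation of every coordinate, clipped to the box.
        y = [min(max(xi + rng.gauss(0.0, step), lo), hi)
             for xi, lo, hi in zip(x, lower, upper)]
        fy = f(y)
        # Metropolis rule: accept improvements; accept a worse point with prob. exp(-delta/t).
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = list(x), fx
        t = max(t * cooling, 1e-12)  # geometric cooling schedule
    return best_x, best_f
```

Since accepted moves may be worse than the current point, the sketch records the best point visited rather than returning the final state.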


Tabu Search 


In this general category of metaheuristics, the essential 
idea during search is to ‘forbid’ moves to points already 
visited in the (usually discrete) search neighborhood, at 
least for a number of upcoming steps. This way, one 
can temporarily accept new inferior solutions, in order 
to avoid (sub)paths already investigated. This approach 
can lead to exploring new regions of D, with the goal of 
finding a solution by ‘globalized’ search. 

Tabu search has traditionally been applied to com- 
binatorial optimization (e. g., scheduling, routing, trav- 
eling salesman) problems. The technique can be made - 
at least, in principle - directly applicable to continu- 
ous GOPs by a discrete approximation (encoding) of 
the problem, but other extensions are also possible. See
[9,22,29].
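The sketch below shows the basic mechanism on binary vectors with single-bit-flip moves, in Python. A move is ‘forbidden’ by recording the flipped position for tenure iterations; the aspiration rule (a tabu move is still allowed if it improves on the best solution found so far) is a common addition that is assumed here. Names and parameters are hypothetical.

```python
def tabu_search(f, x0, n_iter=200, tenure=5):
    """Minimize f over bit tuples using single-bit flips and a short-term tabu memory."""
    x = tuple(x0)
    fx = f(x)
    best_x, best_f = x, fx
    tabu_until = [0] * len(x)  # flipping bit i is forbidden while it <= tabu_until[i]
    for it in range(1, n_iter + 1):
        candidates = []
        for i in range(len(x)):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            fy = f(y)
            # Non-tabu moves are admissible; a tabu move only if it beats the best (aspiration).
            if tabu_until[i] < it or fy < best_f:
                candidates.append((fy, i, y))
        if not candidates:
            continue
        fy, i, y = min(candidates)
        # Move to the best admissible neighbor, even if it is worse than the current point.
        x, fx = y, fy
        tabu_until[i] = it + tenure
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

For example, with weights w = (3, 5, 7, 11, 13, 17, 19, 23) and f(x) = (Σ w_i x_i - 50)^2, a search started from the zero vector reaches f = 0 after six moves; the last of these removes a recently added item and is admitted only by the aspiration rule.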


Approximate Convex Global Underestimation 


This heuristically attractive strategy attempts to esti- 
mate the (postulated) large scale, ‘overall’ convexity 
characteristics of the objective function f based on di- 
rected sampling in D. Applicable to smooth problems. 
See, e. g., [6]. 


Continuation Methods 


These first transform the potential function into a smoother (‘simpler’) function which has fewer local minimizers, and then attempt to trace the minimizers back to the original function. Again, this methodology is applicable to smooth problems. For theoretical background, see, for instance, [8].


Sequential Improvement of Local Optima 


These methods usually operate on adaptively con- 
structed auxiliary functions, to assist the search for 
gradually better optima. The general heuristic principle 
is realized by so-called tunneling, deflation, and filled 
function approaches; consult, for example, [16]. 


Global Optimization Software 


In spite of significant theoretical advances in GO, software development and ‘standardized’ use lag behind. This can be expected, due to the potential numerical difficulty of GOPs; recall Fig. 1. Even ‘much simpler’ problem instances, such as, e. g., concave minimization or indefinite quadratic programming, belong to the class of NP-hard mathematical programming problems.

As summarized above, there exist several broad 
classes of algorithmic GO approaches that possess 
strong theoretical convergence properties, and — at least 
in principle - are straightforward to implement. How- 
ever, all such rigorous approaches involve a computa- 
tional demand that increases exponentially as a func- 
tion of problem size, even in case of the simplest GO 
problem instances. (Consult, for example, [13] for re- 
lated discussions.) Therefore many practical GO strate- 
gies are completed by a ‘traditional’ local optimization 
phase. Global convergence, however, needs to be guar- 
anteed by the global scope algorithm component: the 
latter - at least in theory - should be used in a com- 
plete, ‘exhaustive’ fashion. The above remarks indicate 
the basic inherent theoretical (and practical) difficulty 
of developing robust, yet efficient GO software. 

Since the computational demand of rigorous strategies can be expected to be some exponential function of the problem dimensionality, GO problems in $\mathbb{R}^n$ (n being just 5, 10, 20, 50, 100, ...) may have rapidly increasing, possibly outright enormous, numerical complexity. This is (and will remain) true, in spite of the fact that computational power seems to grow at an unbelievable pace: the so-called ‘curse of dimensionality’ is here to stay.

In 1996, a survey on continuous GO software was 
prepared for the newsletter of the Mathematical Pro- 
gramming Society [23]. Additional information has 
been collected from the Internet, from several GO 
books, and from the Journal of Global Optimization. 
Drawing on the responses of software developers and 
the additional information available, over 50 software 
products were annotated in that review. (In order to as- 
sist in obtaining further information, contact person(s), 
their e-mail addresses, ftp and/or WWW sites have also 
been listed.) 

Most probably, by now the number of solvers aimed 
at GOPs is around one hundred (or even more). The 
general impression is, however, that many of these software products are still at an experimental development stage, and of a predominantly ‘academic’ character, as opposed to ‘industrial strength’ tools. (Of course, it is
not impossible that proprietary software products used 
by industry and private companies are not announced 
publicly.) 


Below we shall list some key aspects that should be 
addressed by professional quality GO software develop- 
ment: 

• well-specified hardware and software environments (supported development platforms);
• quality user guidance: clearly outlined model development procedure, sensible modeling and troubleshooting tips, user file templates, and (also) nontrivial numerical examples;
• fully functional, ‘friendly’ user interface;
• ‘fool-proof’ solver selection and execution procedures;
• good runtime communication and documentation: clear system output for all foreseeable program execution versions and situations, including proper error messages, and result file(s);
• visualization features, which are especially desirable in nonlinear systems modeling, to avoid problem misrepresentation, and to assist in finding alternative models and solution procedures;
• reliable, high-quality user support;
• continuous product maintenance and development (since not only science progresses, but hardware devices, operating systems, as well as development platforms are in permanent change).

This tentative ‘wish-list’ of requirements indicates that, although the task is not impossible, it is a challenge. As an example, we refer to LGO, an integrated model development and solver system that has been developed with a view towards the desiderata listed above. Details regarding LGO are described, e. g., in [24,25].


Software Evaluation 


In order to obtain information regarding the scope and usability of GO software, it needs to be thoroughly tested. This is a demanding task, when done properly. Consideration needs to be given to the selection of appropriate (nontrivial and practically meaningful) examples. Computational experiments should be carefully designed, and the results should be reported in sufficient detail to assure a fair and accurate assessment. For corresponding discussions and GO (or other) test problems, consult, e. g., [3,4,7,21].

A GO software evaluation framework can be pro- 
posed, for instance, along the following guidelines. 
Software Applicability Range (Solvable Model Types)


• objective function: concave, DC, Lipschitz, continuous, or some other (general or more special) function form;

• constraints: unconstrained problems, bound-constrained problems, linear constraints, general nonlinear smooth constraints;

• additional information related to solvable model types and sizes, with corresponding expected runtimes (within given hardware and software environments).


GO Methodology Applied 


• summary (or more detailed) description of basic principles;
• adequate list of references.


Hardware and Software Requirements 


• supported hardware platforms;

• minimal hardware configuration needed;

• operating systems;

• programming languages and environments;

• compiler(s) needed;

• connectivity to other development environments;

• portability to other hardware and software platforms.


Test Results 


• test problem description, mathematical and/or coded form;

• real-world background information (when applicable);

• best known results, with references;

• accuracy requirements, stopping criteria;

• hardware and software environment used in testing;

• standard timing (to facilitate comparisons among different platforms);

• time and computational demand, in order to find the (estimated) global optimum;

• comparative success rate;

• information regarding the reproducibility of results.


Additional Software Information 


• installation procedure;
• user interface features;

• academic and/or professional licenses; conditions of use;

• user support (manual, on-line help, example files, input and result handling, etc.);

• other points of interest.

Of course, at least from a practical point of view, the most meaningful test is to apply GO methods to problems that are of interest in the real world. For numerous existing and prospective GO applications, please consult the related articles » Global Optimization in the Analysis and Management of Environmental Systems and » Continuous Global Optimization: Applications.
Introduction


Nonlinear optimization problems involving discrete 
decision variables, also known as generalized disjunc- 
tive programming (GDP) or mixed-integer nonlinear 
programming (MINLP) problems, arise frequently in 
applications. Examples from process engineering in- 
clude the synthesis of heat exchanger or reactor net- 
works, the optimization of separation processes, such 
as sequencing and tray optimization problems of dis- 
tillation columns, and the optimization of entire pro- 
cess flowsheets [3]. The discrete decisions in these prob- 
lems are usually related to the structure of the process 
whereas typical continuous variables are process states 
such as temperatures, concentrations or flows. 

Connections between continuous and discrete op- 
timization problems have been studied for several 
decades (see, e. g., [4]). In particular, in [9] it was ob- 
served that discrete variables can be modeled by com- 
plementarity constraints, that is, the discrete model 
is replaced by a continuous model. A broad sur- 
vey on other approaches to model discrete decisions 
by continuous formulations is given in [10], includ- 
ing concave optimization problems and relaxation by 
semi-definite programming, with applications to the 
maximum clique problem, satisfiability, the Steiner tree 
problem, and minimax problems. 

Extensive work has been devoted to discrete-continuous problems with linear objective function and constraints, known as mixed-integer linear programming (MILP) problems. In fact, a number of powerful algorithms have been developed which are ready to solve practically relevant, large-scale problems of this type. As soon as the objective function and the constraints comprise nonlinear terms in the continuous variables, as is usually the case, for example, for problems in process engineering, the optimization problem is referred to as a mixed-integer nonlinear programming (MINLP) problem. Algorithms for MINLP problems are either based on branch and bound with nonlinear programming (NLP) subproblems or on decomposition methods that alternately solve NLP and MILP subproblems. These algorithms are guaranteed to locate the global optimum if the nonlinearities are convex.

Optimization problems involving nonconvex objective functions and constraints are far more difficult to solve. In [11] it is proposed, following the idea from [9], to reformulate discrete-continuous optimization problems as purely continuous optimization problems with complementarity constraints. In this approach, the discrete variable set of an MINLP problem is replaced by continuous variables which are restricted to take discrete values by enforcing a special type of either nondifferentiable or degenerate continuous constraints.

[15] complements this approach by purely con- 
tinuous reformulations of MINLP problems with bet- 
ter theoretical properties, as will be explained below. 
As all continuous reformulation approaches inevitably 
lead to nonconvex optimization problems, searching 
for a global solution may be numerically challenging. 
On the other hand, these continuous reformulations 
yield efficient ways to locally solve MINLP problems on 
the basis of NLP solution methods. 


Definitions 


Consider a generalized disjunctive representation [12] of nonlinear optimization problems, where an objective function is minimized subject to two different types of constraints, namely global constraints that hold irrespective of any discrete decision, and constraints contained in disjunctions that are only enforced if a corresponding Boolean variable $Y_{i,k}$ is True. The optimization problem is then formulated as follows:


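$$
\begin{aligned}
\text{(GDP)}\quad \min_{x,\,b,\,Y}\ & \Phi(x) + \sum_{k \in K} b_k \\
\text{s.t.}\ & f(x) = 0, && (1)\\
& g(x) \le 0, && (2)\\
& \bigvee_{i \in D_k} \begin{bmatrix} Y_{i,k} \\ h_{i,k}(x) = 0 \\ r_{i,k}(x) \le 0 \\ b_k = \gamma_{i,k} \end{bmatrix}, \quad k \in K,\quad D_k = \{1, 2, \ldots, n_k\}, && (3)\\
& \Omega(Y) = \text{True}, \quad Y_{i,k} \in \{\text{True}, \text{False}\}. && (4)
\end{aligned}
$$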


In GDP, $x$ represents a vector of continuous decision variables and $Y_{i,k}$ are Boolean variables. $b_k$ is a scalar and $\gamma_{i,k}$ represents a fixed charge. The objective function comprises the sum of all fixed charges and a nonlinear term $\Phi(x)$. Whereas the model equations (1) and inequality constraints (2) hold irrespective of discrete choices, there are further equations and inequality constraints (3) that are contained in the disjunctions $k \in K$. Each disjunction $k$ may consist of several terms $i \in D_k$, where the index set $D_k$ defines the number of terms for each disjunction. Note that exactly one term $i \in D_k$ holds per disjunction, that is, $\bigvee_{i \in D_k}$ is understood as an ‘exclusive or’ operator. The disjunctive constraints are only enforced if the Boolean variable $Y_{i,k}$ is True. Otherwise, if $Y_{i,k}$ is False, the corresponding constraints are removed from the optimization problem.

The Boolean variables themselves are related to each other by so-called propositional logic constraints (4). These logic constraints are used to model interrelationships between disjunctive constraints. For example, assume that the first disjunctive term of disjunction $k = 1$ has to be selected ($Y_{1,1}$ = True) if the first term of disjunction $k = 2$ is removed from the constraint set ($Y_{1,2}$ = False). This situation can be expressed by the implication $\neg Y_{1,2} \Rightarrow Y_{1,1}$, which can be transformed into a constraint of type (4):


$$ Y_{1,2} \vee Y_{1,1} = \text{True}. \qquad (5) $$


Any optimization problem in disjunctive form GDP can be posed as an equivalent MINLP problem [5] by, for example, transforming the disjunctive constraints into big-M or binary multiplication constraints and by replacing the Boolean variables $Y_{i,k}$ by binary variables $y_{i,k} \in \{0, 1\}$.

A problem reformulation based on binary multiplication is:

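$$
\begin{aligned}
\text{(BM)}\quad \min_{x,\,b,\,y}\ & \Phi(x) + \sum_{k \in K} b_k \\
\text{s.t.}\ & f(x) = 0, \\
& g(x) \le 0, \\
& y_{i,k}\, h_{i,k}(x) = 0, && i \in D_k,\ k \in K, \quad (6)\\
& y_{i,k}\, r_{i,k}(x) \le 0, && i \in D_k,\ k \in K, \quad (7)\\
& y_{i,k}\,(b_k - \gamma_{i,k}) = 0, && i \in D_k,\ k \in K, \quad (8)\\
& A y \le a, && (9)\\
& \sum_{i \in D_k} y_{i,k} = 1, && k \in K, \quad (10)\\
& y_{i,k} \in \{0, 1\}, && i \in D_k,\ k \in K, \quad (11)
\end{aligned}
$$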


where each disjunctive constraint is multiplied by a variable $y_{i,k}$. If $y_{i,k} = 0$, the corresponding constraint becomes redundant. On the other hand, a constraint contained in a disjunction is enforced with $y_{i,k} = 1$. The propositional logic constraints (4) can be modeled by the linear constraints (9) on the binary variables. Note that with these inequalities not only exclusive but also inclusive ‘or’-relations can be modeled, although each binary variable takes only the values 0 or 1. In fact, for two binary variables $y_1$ and $y_2$ the inclusive relation $y_1 + y_2 \ge 1$, modeling (5), becomes exclusive under the additional relation $y_1 + y_2 \le 1$.
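The equivalences used above can be checked exhaustively for the two binary variables of example (5). The short Python sketch below (the function names are illustrative, not from the source) enumerates all four assignments and confirms that the implication $\neg Y_{1,2} \Rightarrow Y_{1,1}$, the disjunction (5) and the linear inequality $y_{1,1} + y_{1,2} \ge 1$ agree, while adding $y_{1,1} + y_{1,2} \le 1$ keeps only the exclusive assignments.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p => q, i.e. (not p) or q."""
    return (not p) or q


def logic_table():
    """Rows (y11, y12, implication, disjunction (5), inclusive, exclusive)."""
    rows = []
    for y11, y12 in product((0, 1), repeat=2):
        implication = implies(not y12, bool(y11))  # not Y_{1,2} => Y_{1,1}
        disjunction = bool(y12) or bool(y11)       # Y_{1,2} or Y_{1,1}, i.e. (5)
        inclusive = y11 + y12 >= 1                 # linear form of type (9)
        exclusive = inclusive and y11 + y12 <= 1   # add the upper bound
        rows.append((y11, y12, implication, disjunction, inclusive, exclusive))
    return rows


for row in logic_table():
    print(row)
```

The linear inequality reproduces the logic exactly because the variables are restricted to 0 and 1; the same inequality would not encode (5) if fractional values were allowed.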

It is important to note that the formulation BM has the drawback of being nonconvex even if the nonlinear disjunctive constraints of the original optimization problem are convex. Thus, a problem reformulation based on binary multiplication would be employed only if the disjunctive optimization problem was nonconvex itself, as is the case, for example, in a large portion of process engineering applications. Hence, this drawback should not be regarded as a strong limitation. Also note that the nonconvex expressions in (7) can be convexified if $r_{i,k}$ is a convex function [16]. However, in the following this assumption will not be made.
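A two-point example makes the nonconvexity of BM visible. The sketch assumes a single hypothetical disjunctive equality with the affine (hence convex) function $h(x) = x - 1$: the points $(x, y) = (3, 0)$ and $(1, 1)$ both satisfy $y\,h(x) = 0$, but their midpoint $(2, 0.5)$ does not, so the feasible set of a constraint of type (6) is not convex.

```python
def bm_residual(x: float, y: float) -> float:
    """Residual of y * h(x) = 0 for the hypothetical affine term h(x) = x - 1."""
    return y * (x - 1.0)


p = (3.0, 0.0)  # term switched off: y = 0, x is unrestricted
q = (1.0, 1.0)  # term enforced: y = 1 forces h(x) = 0, i.e. x = 1
mid = tuple(0.5 * (a + b) for a, b in zip(p, q))  # midpoint (2.0, 0.5)

for point in (p, q, mid):
    print(point, bm_residual(*point))
```

The bilinear product $y\,h(x)$, not the function $h$ itself, is what destroys convexity.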


Formulations 


Instead of applying an MINLP algorithm directly to the discrete-continuous optimization problem BM introduced in the previous section, one can reformulate the problem such that no discrete variables are present anymore. In particular, the discrete set defined in (10),(11) can be replaced by a set of restrictions involving continuous variables only, which can be used as constraints to form a purely continuous NLP. Since NLP solvers are usually designed to work with continuous variables, that is, variables from at least one-dimensional sets, the basic idea here is to increase the dimension of the constraint sets for $y_{i,k}$. Note that the discrete variables $y_{i,k}$ as defined in (11) are contained in a set of dimension zero.

For the explanation of the main ideas, consider a single disjunction with $n_k = 2$ as it appears in (3). Put $y_k := y_{1,k}$ as well as $z_k := y_{2,k}$ and drop the fixed index $k$. This leads to a single binary decision variable $y \in \{0, 1\}$ and its negation $z$, where the pair $(y, z)$ can then attain exactly one of the values $(1, 0)$ and $(0, 1)$, that is,

$$ (y, z) \in A_0 = \{(1, 0),\ (0, 1)\} \qquad (12) $$
(cf. Fig. 1). Hence, in this case the conditions (10),(11) are replaced equivalently by (12). In the general case $n_k \ge 2$ there are several ways to use the set $A_0$ to reformulate (10),(11) equivalently. A first possibility is to introduce additional variables $z_{i,k} = 1 - y_{i,k}$ and replace only (11) by the conditions $(y_{i,k}, z_{i,k}) \in A_0$, $i \in D_k$.

Note that the constraint (10) guarantees that exactly one of the variables $y_{i,k}$, $i \in D_k$, is equal to 1, since these variables can only take the values 0 and 1. This restriction can be relaxed in conjunction with an alternative approach for modeling binary decision variables explained below. Having these later developments in mind, note that an alternative reformulation of (10),(11) using $A_0$ is

$$ \Big(y_{i,k},\ \sum_{j \in D_k \setminus \{i\}} y_{j,k}\Big) \in A_0, \quad i \in D_k,\ k \in K. \qquad (13) $$

An advantage of the latter reformulation is that it does not increase the problem dimension by auxiliary variables $z_{i,k}$.
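Condition (13) can be tested term by term, which also makes its equivalence with (10),(11) easy to check by brute force. A minimal sketch for one disjunction; `satisfies_13` is a hypothetical helper name, and the tolerance is an assumption for floating-point input.

```python
from itertools import product


def satisfies_13(y, tol=1e-12):
    """Check (13) for one disjunction: for every term i, the pair
    (y_i, sum of the other y_j) must be (1, 0) or (0, 1), i.e. lie in A_0."""
    total = sum(y)
    for yi in y:
        pair = (yi, total - yi)
        if not any(abs(pair[0] - a) <= tol and abs(pair[1] - b) <= tol
                   for a, b in ((1, 0), (0, 1))):
            return False
    return True


# Brute-force comparison with (10),(11) for a disjunction with three terms:
# among binary vectors, (13) holds exactly when the entries sum to one.
for y in product((0, 1), repeat=3):
    assert satisfies_13(y) == (sum(y) == 1)
```

Fractional vectors such as (0.5, 0.5) are also rejected, since each pair must coincide with a point of $A_0$.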


Continuous Reformulations of Discrete-Continuous Optimization Problems, Figure 2
The points which satisfy the complementarity condition (14),(15)


Representing the Discrete Decisions 
by Approximate Continuous Variables 


There are a number of ways to describe $A_0$ with continuous constraints. A suggestion of [9,11] is to replace (12) with the equivalent set of constraints

$$
\begin{aligned}
& y\,z = 0, && (14)\\
& y \ge 0,\quad z \ge 0, && (15)\\
& y + z = 1. && (16)
\end{aligned}
$$


In fact, the constraints (14),(15) are known as a complementarity condition. They model a piecewise linear set with one kink at the origin in $\mathbb{R}^2$, as depicted in Fig. 2. Together with the constraint (16) one obtains exactly the set $A_0$ (cf. Fig. 3).

It is well-known that sets whose description con- 
tains complementarity conditions are not easy to 
treat numerically. In fact, the so-called Mangasarian- 
Fromovitz constraint qualification is violated every- 
where in the feasible set as soon as a complementarity 
condition appears [14]. This constraint qualification, 
however, is known to characterize the (numerical) sta- 
bility of the described set [6,13]. 
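To see why the Mangasarian-Fromovitz constraint qualification fails, it suffices to inspect one point of $A_0$. The worked derivation below (an illustration added here, not taken from [14]) looks at $(y, z) = (1, 0)$ under the description (14)-(16). MFCQ asks for linearly independent equality gradients and a direction $d$ that keeps all equality constraints stationary to first order while strictly increasing every active inequality. The gradient of $yz$ is parallel to that of the active bound $z \ge 0$, so no such $d$ exists.

```latex
\[
\begin{aligned}
&h_1(y,z) = yz,\qquad h_2(y,z) = y + z - 1,\qquad g(y,z) = z \ge 0 \ \text{(active at } (1,0)\text{)},\\
&\nabla h_1(1,0) = \begin{pmatrix} 0 \\ 1 \end{pmatrix},\qquad
 \nabla h_2(1,0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix},\qquad
 \nabla g(1,0) = \begin{pmatrix} 0 \\ 1 \end{pmatrix},\\
&\nabla h_1(1,0)^{\top} d = d_z = 0 \quad\text{contradicts}\quad \nabla g(1,0)^{\top} d = d_z > 0 .
\end{aligned}
\]
```

The same argument applies at every point with $y > 0$, $z = 0$ (and symmetrically with $y = 0$, $z > 0$), even without (16); at the origin the gradient of $yz$ vanishes, so the equality gradients cannot be linearly independent.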
Continuous Reformulations of Discrete-Continuous Optimization Problems, Figure 3
Modeling discrete variables with a complementarity condition


There are many suggestions on how a complemen- 
tarity condition can be treated numerically, in particu- 
lar in the literature on so-called mathematical programs 
with equilibrium constraints (MPECs) which are opti- 
mization problems with complementarity conditions in 
the constraints [7,8]. 

One approach is to use regularization techniques, for example to replace the condition (14) by its relaxation $y \cdot z \le \mu$ with some positive parameter $\mu$. The idea is to trace the solutions of the corresponding auxiliary problems to a solution of the original problem while driving $\mu$ to zero.

For the reformulation of binary variables this approach means that the discrete set $A_0$ is replaced by the one-dimensional set

$$ A_\mu = \Big\{(0, 1) + t\,(1, -1) \ \Big|\ t \in \big[0,\ 0.5 - \sqrt{0.25 - \mu}\,\big] \cup \big[0.5 + \sqrt{0.25 - \mu},\ 1\big]\Big\}, $$

which is disconnected for $\mu < 0.25$, as illustrated in Fig. 4.
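The interval endpoints in $A_\mu$ follow from one line of algebra: along the segment $(y, z) = (0, 1) + t\,(1, -1)$ we have $y\,z = t(1 - t)$, so $y\,z \le \mu$ is the quadratic inequality $t^2 - t + \mu \ge 0$. A minimal sketch that returns the admissible $t$-intervals (the helper name is illustrative):

```python
import math


def a_mu_intervals(mu: float):
    """Parameter intervals of t for A_mu, where (y, z) = (0, 1) + t * (1, -1),
    y * z <= mu, y + z = 1 and y, z >= 0 (so t lies in [0, 1]).
    On this segment y * z = t * (1 - t), so y * z <= mu means t**2 - t + mu >= 0."""
    if mu < 0:
        raise ValueError("mu must be nonnegative")
    if mu >= 0.25:
        # t * (1 - t) never exceeds 0.25, so the whole segment is admissible.
        return [(0.0, 1.0)]
    root = math.sqrt(0.25 - mu)
    return [(0.0, 0.5 - root), (0.5 + root, 1.0)]


print(a_mu_intervals(0.0))   # mu = 0 recovers the two points of A_0
print(a_mu_intervals(0.09))  # two disjoint intervals
```

For $\mu = 0$ both intervals degenerate to the points $t = 0$ and $t = 1$, i.e. to $A_0$; for $0 < \mu < 0.25$ they are disjoint, which is the disconnectedness shown in Fig. 4.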

Hence, this approach replaces discrete by continuous variables, at least via an approximation. [15] refers to the variables from the set $A_\mu$ as approximate continuous. In view of (13), a possible approximation of binary variables from a general disjunction is

$$ \Big(y_{i,k},\ \sum_{j \in D_k \setminus \{i\}} y_{j,k}\Big) \in A_\mu, \quad i \in D_k,\ k \in K, $$

with $\mu > 0$.

Continuous Reformulations of Discrete-Continuous Optimization Problems, Figure 4
Continuous variables for the relaxed complementarity condition

Note that there are two serious drawbacks of the reformulation by a complementarity condition. First, a look at Fig. 3 shows that the kink at the origin is irrelevant for the description of $A_0$ because of the additional constraint (16). Thus, there is no need to use the numerically demanding complementarity condition together with (16); any function with a smooth zero set and the correct intersection points would do. For example, one can use the constraint

$$ \Big(y - \tfrac{1}{2}\Big)^2 + \Big(z - \tfrac{1}{2}\Big)^2 = \tfrac{1}{2}, $$

which is illustrated in Fig. 5.

Here, the Mangasarian-Fromovitz constraint qualification and even the stronger linear independence constraint qualification are satisfied everywhere in the set $A_0$ (for background information on constraint qualifications see [1]). This can be seen as an important advantage when compared to the properties of $A_0$ represented by the complementarity condition.
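A quick numerical check of the circle condition: intersecting it with (16) yields exactly $(1, 0)$ and $(0, 1)$, and at both points the gradients of the two equality constraints are linearly independent, which is the LICQ statement above. A minimal sketch; the helper names are illustrative.

```python
def circle(y: float, z: float) -> float:
    """Circle condition (y - 1/2)^2 + (z - 1/2)^2 - 1/2, zero on the circle."""
    return (y - 0.5) ** 2 + (z - 0.5) ** 2 - 0.5


def licq_holds(y: float, z: float, tol: float = 1e-12) -> bool:
    """LICQ for the equalities circle(y, z) = 0 and y + z - 1 = 0: their
    gradients in R^2 must be linearly independent (nonzero 2x2 determinant)."""
    a, b = 2 * (y - 0.5), 2 * (z - 0.5)  # gradient of the circle condition
    c, d = 1.0, 1.0                      # gradient of y + z - 1
    return abs(a * d - b * c) > tol


# On y + z = 1 the circle condition reduces to 2 * (y - 1/2)^2 = 1/2,
# so the only intersection points are y = 0 and y = 1, i.e. the set A_0.
for y, z in [(0.0, 1.0), (1.0, 0.0)]:
    print((y, z), circle(y, z), licq_holds(y, z))
```

By contrast, the complementarity description has three active constraints at $(1, 0)$ (namely $yz = 0$, $z \ge 0$ and (16)) in $\mathbb{R}^2$, so their gradients can never be linearly independent there.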

A second drawback, which the circle condition shares with the reformulation by a complementarity condition, is that the variables are still contained in the discrete set $A_0$. In order to obtain a one-dimensional set, one can again relax the conditions that describe $A_0$ to obtain a set $A_\nu$ corresponding to $A_\mu$ from the MPEC relaxation above. In fact, the constraints

with $\nu > 0$ describe sets $A_\nu$ (cf. Fig. 6), which correspond to the sets $A_\mu$ ($\mu > 0$) via a reparametrization, that is, one arrives at the same set of approximate continuous variables.

Continuous Reformulations of Discrete-Continuous Optimization Problems, Figure 5
The circle condition

Unfortunately, for the limiting case $\nu = 0$ the circle is not described by one equality constraint but by two inequalities with gradients pointing in opposite directions, so that the Mangasarian-Fromovitz constraint qualification is then again violated in $A_0$.
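The text leaves the defining constraints of $A_\nu$ implicit. One system that is consistent with the description (a ring around the circle of Fig. 5 that collapses to two opposing inequalities at $\nu = 0$), offered here as an assumption rather than as the formulation of [15], is $\tfrac{1}{2} - \nu \le (y - \tfrac{1}{2})^2 + (z - \tfrac{1}{2})^2 \le \tfrac{1}{2}$ together with $y + z = 1$. On that line $(y - \tfrac{1}{2})^2 + (z - \tfrac{1}{2})^2 = \tfrac{1}{2} - 2yz$, so the lower bound reads $yz \le \nu/2$ and the upper bound reads $yz \ge 0$. The sketch below checks on a grid that this assumed $A_\nu$ coincides with $A_\mu$ for $\mu = \nu/2$.

```python
def in_a_nu(t, nu, tol=1e-12):
    """Hypothetical ring relaxation (an assumption, not taken from the source):
    1/2 - nu <= (y - 1/2)^2 + (z - 1/2)^2 <= 1/2 on the line y + z = 1."""
    y, z = t, 1.0 - t
    s = (y - 0.5) ** 2 + (z - 0.5) ** 2
    return 0.5 - nu - tol <= s <= 0.5 + tol


def in_a_mu(t, mu, tol=1e-12):
    """Relaxed complementarity on the same line: y * z <= mu, y, z >= 0."""
    y, z = t, 1.0 - t
    return y >= -tol and z >= -tol and y * z <= mu + tol


nu = 0.18
grid = [i / 1000 for i in range(-100, 1101)]  # also samples t outside [0, 1]
agree = all(in_a_nu(t, nu) == in_a_mu(t, nu / 2) for t in grid)
print(agree)
```

Under this assumption the reparametrization is simply $\mu = \nu/2$, and at $\nu = 0$ the two bounds coincide as opposing inequalities, which is where MFCQ fails.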


Representing the Discrete Decisions 
by Exact Continuous Variables 


Although both the reformulation by a complementarity condition and the reformulation by a circle condition lead to well-performing numerical methods for small examples [11,15], they share the following intrinsic drawbacks:

• the replacement for $A_0$ is one-dimensional, but only approximate;

• in the (limiting) case of an exact description of $A_0$, the Mangasarian-Fromovitz constraint qualification is violated;

• the one-dimensional set becomes discrete if equality constraints are inconsistent (see below).

Continuous Reformulations of Discrete-Continuous Optimization Problems, Figure 6
The circle relaxation

Since these properties may affect the numerical solution of large problems, [15] proposes a different continuous reformulation of the integrality constraints with better theoretical features.

The subsequent considerations are based on an alternative model reformulation that allows replacing the discrete decision variables defined in (10) by variables $y_{i,k}$ which are not defined on a discrete set such as $A_0$. In fact, this model reformulation has the property of being equivalent to the corresponding disjunctive optimization problem in conjunction with one-dimensional rather than discrete variables $y_{i,k}$. Before describing the model reformulation in detail, we focus on the variables $y_{i,k}$ and show how a continuous, one-dimensional set $A_1$ can be defined using appropriate constraints.

In fact, since in BM any disjunctive constraint is not only enforced by $y_{i,k} = 1$, but alternatively also by $y_{i,k} \ge 1$, one may define $y_{i,k}$ as continuous variables of dimension one according to

$$ y_{i,k} \in \{0\} \cup [1, \infty). $$

Continuous Reformulations of Discrete-Continuous Optimization Problems, Figure 7
A one-dimensional feasible set for (y, z)


Of course, now the negation of $y_{i,k}$ in general does not coincide with $1 - y_{i,k}$, as the value of $y_{i,k}$ might exceed 1. On the other hand, for the case $n_k = 2$ as above we obtain

$$ (y, z) \in A_1 = \big([1, \infty) \times \{0\}\big) \cup \big(\{0\} \times [1, \infty)\big) \qquad (17) $$

(cf. Fig. 7). Hence, the negation of $y_{i,k}$ is coded by the variable $z_{i,k}$. Moreover, one can now describe the binary decisions via

$$ \Big(y_{i,k},\ \sum_{j \in D_k \setminus \{i\}} y_{j,k}\Big) \in A_1, \quad i \in D_k,\ k \in K. \qquad (18) $$

The set $A_1$ is obviously one-dimensional and is an exact rather than approximate model of a discrete decision. Therefore, the variables defined by the set $A_1$ are referred to as exact continuous.

To be able to apply an NLP solution algorithm, one has to describe $A_1$ by continuous constraints. One possibility, of course, is to use the (degenerate) complementarity condition (14),(15) with the additional constraint $y + z \ge 1$. However, it is also possible to choose a function with an appropriate zero set such that the linear independence constraint qualification holds everywhere in the feasible set. A function with these properties is the so-called Fischer-Burmeister function $\varphi_{FB}(y, z) = y + z - \sqrt{y^2 + z^2}$. This means that one can write

$$ A_1 = \{(y, z) \in \mathbb{R}^2 \mid \varphi_{FB}(y, z) = 0,\ y + z \ge 1\}. \qquad (19) $$


Equivalently, one could use a multitude of other so-called NCP-functions (for a survey see [2]). NCP-functions are used for the description of nonlinear complementarity problems. They are designed such that their zero set coincides with the set defined by (14),(15) (cf. Fig. 2). A description like (19) has better numerical properties than the original description via (14),(15). For example, whereas the Mangasarian-Fromovitz constraint qualification is violated everywhere in the set under a description via (14),(15), the description as the zero set of the Fischer-Burmeister function even leads to the validity of the linear independence constraint qualification everywhere in the set, except at the origin (which does not play a role here). In terms of the Fischer-Burmeister function, and using (19), the condition (18) is equivalent to


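$$
\begin{aligned}
& \varphi_{FB}\Big(y_{i,k},\ \sum_{j \in D_k \setminus \{i\}} y_{j,k}\Big) = 0, && i \in D_k,\ k \in K,\\
& \sum_{i \in D_k} y_{i,k} \ge 1, && k \in K.
\end{aligned}
$$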
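The Fischer-Burmeister form of (18) can be evaluated directly, which shows how (19) accepts any positive value of at least one for the selected term and rejects two simultaneously "active" terms. A minimal sketch for a single disjunction; the helper names and the tolerance are illustrative assumptions.

```python
import math


def phi_fb(a: float, b: float) -> float:
    """Fischer-Burmeister function: zero iff a >= 0, b >= 0 and a * b = 0."""
    return a + b - math.hypot(a, b)


def in_a1(y: float, z: float, tol: float = 1e-12) -> bool:
    """Membership in A_1 via (19): phi_FB(y, z) = 0 and y + z >= 1."""
    return abs(phi_fb(y, z)) <= tol and y + z >= 1 - tol


def satisfies_fb_system(y, tol: float = 1e-12) -> bool:
    """Fischer-Burmeister form of (18) for one disjunction:
    phi_FB(y_i, sum_{j != i} y_j) = 0 for all i, and sum_i y_i >= 1."""
    total = sum(y)
    return (all(abs(phi_fb(yi, total - yi)) <= tol for yi in y)
            and total >= 1 - tol)


print(in_a1(2.5, 0.0), in_a1(1.0, 1.0))
print(satisfies_fb_system([0.0, 3.0, 0.0]), satisfies_fb_system([1.0, 1.0]))
```

Note that the selected term may take any value in $[1, \infty)$, here 2.5 or 3.0, which is exactly the one-dimensional freedom that distinguishes exact continuous from binary variables.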


Modeling Propositional Logic Constraints 
with Exact Continuous Variables 


The question remains how logical conditions on two logical variables $Y_1$ and $Y_2$ should be modeled when $(y_1, z_1)$ and $(y_2, z_2)$ are not discrete but continuous, as proposed in (17). This can easily be done by adding inequality constraints. In fact, $Y_1 \wedge Y_2$ is true if and only if $y_1 \ge 1$ and $y_2 \ge 1$. Moreover, $Y_1 \vee Y_2$ is true if and only if $y_1 + y_2 \ge 1$. For the negation of $Y_1$ one may not use $1 - y_1$, as $y_1$ might take a value strictly larger than one. On the other hand, for $n_k = 2$ the negation of $Y_1$ is already coded in the variable $z_1$. Moreover, for $n_k > 2$ the negation of $y_{i,k}$ is coded in $\sum_{j \in D_k \setminus \{i\}} y_{j,k}$, and one can proceed as above. Just as in the discrete case, inclusive as well as exclusive ‘or’-relations can be modeled with exact continuous variables.
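The two inequality encodings can be verified against Boolean logic on sample values from the admissible set $\{0\} \cup [1, \infty)$. A minimal sketch; the sample values and helper names are illustrative, and the second call shows that the encodings rely on the variables staying in that set.

```python
from itertools import product

SAMPLES = (0.0, 1.0, 1.7, 4.0)  # sample values from {0} U [1, inf)


def is_true(y: float) -> bool:
    """The logical variable is True exactly when y is nonzero, i.e. y >= 1."""
    return y >= 1


def and_holds(y1: float, y2: float) -> bool:
    return y1 >= 1 and y2 >= 1


def or_holds(y1: float, y2: float) -> bool:
    return y1 + y2 >= 1


def check_logic(samples=SAMPLES) -> bool:
    """Compare the inequality encodings with Boolean AND / OR on all pairs."""
    for y1, y2 in product(samples, repeat=2):
        if and_holds(y1, y2) != (is_true(y1) and is_true(y2)):
            return False
        if or_holds(y1, y2) != (is_true(y1) or is_true(y2)):
            return False
    return True


print(check_logic())              # admissible values: encodings agree
print(check_logic((0.0, 0.5)))    # 0.5 is not admissible: OR encoding breaks
```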


Continuous Reformulations of Discrete-Continuous Optimization Problems 


To circumvent the introduction of nonconvexity 
into the model by binary multiplication in BM, [15] 
presents an alternative, convex reformulation approach 
on the basis of tailored big-M constraints which can 
also be used in conjunction with exact continuous vari- 
ables as defined in equation (17). A distinctive property 
of the binary multiplication-based model formulation 
BM, however, is the treatment of inconsistent equality 
constraints. 


The Case of Inconsistent Equalities 


In many applications, the constraints (6)-(8) in BM lead to implicit restrictions on the exact continuous variables. In particular, (6) and (8) have to hold simultaneously for $i \in D_k$, $k \in K$. In process engineering applications, the underlying equations (i.e., $h_{i,k}(x) = 0$, $i \in D_k$, as well as $b_k - \gamma_{i,k} = 0$, $i \in D_k$) are often inconsistent for fixed $k \in K$, that is, they do not admit a common solution or, put geometrically, the
sets described by these equations are disjoint. Note that this is an inherent property of a GDP problem with so-called disjoint disjunctions, whose terms have non-empty, non-intersecting feasible regions [17].

This is particularly the case if, for fixed $k \in K$, the values $\gamma_{i,k}$ are pairwise distinct for $i \in D_k$. It implies that at most one of the variables $y_{i,k}$, $i \in D_k$, is nonvanishing. In the case $n_k = 2$ with $y = y_{1,k}$ and $z = y_{2,k}$, this means that the equation $y \cdot z = 0$ holds automatically. As a consequence, the only constraint needed for the description of $A_1$ (cf. Fig. 7) is $y + z \ge 1$, that is, the set

$$ A_2 = \{(y, z) \in \mathbb{R}^2 \mid y + z \ge 1\} $$

coincides with $A_1$ in the case of inconsistent equalities (cf. Fig. 8).
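The mechanism can be seen on a small grid. The sketch assumes one disjunction with two terms and hypothetical, pairwise distinct fixed charges $\gamma_{1,k} = 10$ and $\gamma_{2,k} = 25$, and enforces only constraint (8) together with $y_{1,k} + y_{2,k} \ge 1$. Every feasible point found has $y_{1,k}\,y_{2,k} = 0$, so complementarity is implied rather than imposed; this is an illustration, not part of the formulation in [15].

```python
def bm_charge_residuals(b, y, gammas):
    """Residuals of constraint (8), y_i * (b_k - gamma_i) = 0, for one disjunction."""
    return [yi * (b - g) for yi, g in zip(y, gammas)]


def feasible(b, y, gammas, tol=1e-12):
    """Constraint (8) for all terms plus the covering constraint sum(y) >= 1."""
    return (all(abs(r) <= tol for r in bm_charge_residuals(b, y, gammas))
            and sum(y) >= 1 - tol)


gammas = (10.0, 25.0)            # hypothetical, pairwise distinct fixed charges
grid = [0.0, 0.5, 1.0, 2.0]      # sample values for y_1 and y_2
b_values = [0.0, 10.0, 17.5, 25.0]

feasible_points = [(b, y1, y2) for b in b_values for y1 in grid for y2 in grid
                   if feasible(b, (y1, y2), gammas)]
print(feasible_points)
```

Because $b_k$ can equal at most one of the distinct $\gamma_{i,k}$, every other $y_{i,k}$ is forced to zero.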

Although the pair $(y, z)$ does not vary in the complete two-dimensional set $A_2$ from Fig. 8, in the restrictions one does not code the same information twice. This can be expected to lead to better numerical performance when NLP solvers are applied.


Conclusions 


In [15] several example problems involving discrete and continuous decision variables from process engineering are treated numerically, with approximate as well as exact continuous variables representing the discrete decisions. It is shown that, using these reformulations, an efficient numerical treatment of disjunctive optimization problems is possible, but one can only expect to find local solutions when using standard NLP solvers. This is due to the fact that any continuous reformulation of a disjunctive optimization problem leads to a nonconvex optimization problem. Consequently, the reformulation approaches may be combined with global optimization algorithms whenever the problem size permits.

Continuous Reformulations of Discrete-Continuous Optimization Problems, Figure 8
A two-dimensional feasible set for (y, z)


See also 


> Disjunctive Programming 

> Mixed Integer Programming/Constraint 
Programming Hybrid Methods 

> Order Complementarity 
Introduction


Inventory control is an important issue in supply chain 
management. Today, many different approaches are 
used to solve the complicated inventory control prob- 
lems. While some of the approaches use a periodic re- 
view cycle, others use methods based on continuous re- 
view of inventory. In this survey, stochastic inventory 
theory that is based on continuous review is analyzed. 
One of the challenging tasks in continuous review 
inventory problems is finding the order quantity (Q) 
and the reorder point (R) such that the total cost is min- 
imized and fill rate constraints are satisfied. The total 
cost includes ordering cost, backorder cost, and inven- 
tory holding cost. The fill rate is defined as the fraction 
of demand satisfied from inventory on hand. Under the 
continuous inventory control methodology, when the 
inventory position (on-hand inventory plus outstand- 
ing orders minus backorders) drops down to or below 
a reorder point, R, an order of size Q is placed. 
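The ordering rule just described can be sketched as a small simulation. This is a minimal sketch under assumptions not stated in the text: time is divided into periods with integer demands, the inventory position is reviewed after each period's demand (a discrete-time stand-in for continuous review), unmet demand is fully backordered, and the lead time is a constant number of periods. The fill rate is the fraction of demand met immediately from on-hand stock, as defined above; `simulate_qr` is a hypothetical name.

```python
from collections import deque


def simulate_qr(demands, Q, R, lead_time, initial_on_hand):
    """Discrete-time sketch of a (Q, R) policy with full backordering.

    demands         -- demand quantity in each period
    Q               -- order quantity (must be positive)
    R               -- reorder point
    lead_time       -- constant lead time in periods (>= 1)
    Returns (fill_rate, number_of_orders).
    """
    on_hand, backorders, orders = initial_on_hand, 0, 0
    pipeline = deque()  # outstanding orders as (arrival_period, quantity)
    served, total = 0, 0
    for t, d in enumerate(demands):
        # Receive every order due by period t; backorders are cleared first.
        while pipeline and pipeline[0][0] <= t:
            on_hand += pipeline.popleft()[1]
            cleared = min(backorders, on_hand)
            backorders -= cleared
            on_hand -= cleared
        # Serve this period's demand from stock; the remainder is backordered.
        from_stock = min(d, on_hand)
        on_hand -= from_stock
        backorders += d - from_stock
        served += from_stock
        total += d
        # Review: inventory position = on hand + on order - backorders.
        position = on_hand + sum(q for _, q in pipeline) - backorders
        while position <= R:  # place an order of size Q while position <= R
            pipeline.append((t + lead_time, Q))
            position += Q
            orders += 1
    fill_rate = served / total if total else 1.0
    return fill_rate, orders


print(simulate_qr([3] * 10, Q=10, R=5, lead_time=2, initial_on_hand=10))
print(simulate_qr([3] * 6, Q=10, R=0, lead_time=2, initial_on_hand=6))
```

Raising R in the second call would trigger the first order earlier and avoid the stockouts, which illustrates the trade-off between holding cost and fill rate that (Q, R) models optimize.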
Although they describe the same policy, there are many different representations of this inventory model, such as (Q, r) (Boyaci and Gallego [5]), (Q, R) (Hing et al. [14]), and (R, Q) (Axsäter [2,3] and Marklund [16]). In addition, for some of the problems it is assumed that the order quantity (nQ) is a multiple of a minimum batch size, Q. Here, n is the minimum integer required to increase the inventory position to above R (Chen and Zheng [8]). In this case, the problem is formulated as an (R, nQ) type model.
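The rule for n in the (R, nQ) model reduces to a single formula: n is the smallest integer with IP + nQ > R, i.e. n = ⌊(R − IP)/Q⌋ + 1 whenever the inventory position IP is at or below R. A minimal sketch assuming integer inventory positions and batch sizes; `batches_to_order` is a hypothetical helper name.

```python
def batches_to_order(position: int, R: int, Q: int) -> int:
    """Smallest integer n >= 0 such that position + n * Q > R (integer units, Q > 0)."""
    if position > R:
        return 0  # inventory position already above the reorder point
    return (R - position) // Q + 1


print(batches_to_order(5, R=5, Q=10))   # position exactly at R: one batch
print(batches_to_order(-5, R=5, Q=10))  # one batch would only reach R: two batches
```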


Models 


When the literature on (Q, R) models is surveyed, some similarities and differences among the publications can easily be identified, and the publications can be classified accordingly. Two of the most distinctive attributes of (Q, R) models are as follows:

1. Type of supply chain: While some articles only consider one entity that uses a (Q, R) policy [1,5,14], others consider a multi-echelon inventory system [2,3,8,16].

2. Exact evaluation or near-optimal evaluation: (Q, R) inventory problems are not easy to solve; thus, many research papers give approximate solution approaches or try to find bounds on the solutions [1,2,4,5,20], and only a small number of articles give an exact evaluation of the (Q, R) inventory system [3,9,21].

In the next section, the literature based on type of sup- 
ply chain considered and the evaluation methods used 
is reviewed. First, heuristic methods are analyzed. Sec- 
ond, publications providing optimal methods are re- 
viewed. In the last section, we give some concluding re- 
marks. 


Single-Echelon Models 


Hing et al. [14] focus on approximating the average inventory level in a (Q, R) system with backorders. They compare different approaches proposed in the literature. Their numerical analysis shows that the approximation developed by Hadley and Whitin [13], Q/2 + safety stock, is more robust than other approximations proposed so far. The authors then propose a new methodology based on spreadsheet optimization. Using numerical examples, they show that the spreadsheet-optimization-based approach performs better than the methods proposed in the literature.
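To make the Q/2 + safety stock approximation concrete, the sketch below evaluates the classic textbook approximate expected cost per unit time of a (Q, R) policy with backorders: ordering cost A·λ/Q, holding cost h·(Q/2 + R − μ_L), where R − μ_L is the safety stock, and a penalty π per unit short times the expected shortage per cycle, λ/Q · E[(D_L − R)⁺]. It assumes normally distributed lead-time demand D_L with mean μ_L and standard deviation σ_L; all parameter names and numbers are illustrative and are not taken from [13] or [14].

```python
import math


def normal_loss(z: float) -> float:
    """Standard normal loss function E[(Z - z)^+] = pdf(z) - z * (1 - cdf(z))."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf - z * (1.0 - cdf)


def approx_cost(Q, R, demand_rate, order_cost, holding_cost, shortage_cost,
                mu_L, sigma_L):
    """Approximate expected cost per unit time of a (Q, R) policy with backorders,
    assuming normally distributed lead-time demand N(mu_L, sigma_L^2)."""
    expected_short = sigma_L * normal_loss((R - mu_L) / sigma_L)  # per cycle
    avg_inventory = Q / 2 + (R - mu_L)  # Q/2 + safety stock
    cycles = demand_rate / Q            # order cycles per unit time
    return (order_cost * cycles
            + holding_cost * avg_inventory
            + shortage_cost * cycles * expected_short)


print(approx_cost(Q=100, R=60, demand_rate=1000, order_cost=50,
                  holding_cost=2, shortage_cost=10, mu_L=50, sigma_L=10))
```

Because the average-inventory term ignores the backorder position, this is only an approximation, which is precisely the point examined by Hing et al. [14].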


Agrawal and Seshadri [1] provide upper and lower bounds for the optimal R and Q subject to fill rate constraints. Although the authors consider backorder cost, the algorithm they developed to find the bounds can also be used when backorder costs are zero. Another important feature of the algorithm is that it can be applied when there are no service level constraints.

Like Agrawal and Seshadri [1], Platt et al. [19] also 
consider fill rate constraints and propose two heuris- 
tics that can be used for (Q, R) policy models. While 
the first heuristic is suitable for deterministic lead time 
demand models, the second one assumes that demand 
during the lead time follows a normal distribution. Both 
heuristics are used to find R and Q values. The authors 
compare the proposed heuristics with others that have 
been proposed in the literature. Their analysis shows 
that their heuristics do not necessarily outperform the 
other heuristics in each problem instance. 

Boyaci and Gallego [5] propose a new (Q, R) model 
that minimizes average holding and ordering costs sub- 
ject to upper bounds on the expected and maximum 
waiting times for the backordered items. They provide 
optimality conditions and an exact algorithm for the 
problem. Boyaci and Gallego [5] conclude their study 
by performing a numerical analysis. 

Gallego [12] proposes heuristics to find distribu- 
tion-free bounds on the optimal cost and optimal batch 
size when a (Q, R) policy is used. He also shows that the 
heuristics work well when the demand distribution is 
Poisson or compound Poisson. 

Bookbinder and Cakanyildirim [4] consider a (Q, R) policy where lead time is not constant. They treat lead time as a random variable and develop two probabilistic models. While in the first model the lead time is fixed, in the second model the lead time can be reduced by using an expediting factor (rt). The order quantity, reorder point, and expediting factor are the three decision variables in the second model. The authors show that for both models the expected cost per unit time is jointly convex. They also perform a sensitivity analysis with respect to the cost parameters.

Ryu and Lee [20] consider the lead time as a de- 
cision variable. However, in this study the demand is 
constant. In their model, Ryu and Lee [20] assume that 
there are two suppliers for the items to be procured. 
They mainly consider two cases. In the first case, lead time cannot be decreased, but in the second case, orders can be expedited. The authors also assume that lead-time distributions are non-identical exponential.
For the first case, their objective is to determine a Q, 
an R, and an order-splitting proportion. In the sec- 
ond case, they find new values for the lead times using 
the order-splitting proportion. Their sensitivity analy- 
sis shows that the order-splitting proportion tends to 
be a half, and it is biased by the coefficient of the expe- 
diting function. 

Cakanyildirim et al. [6] develop a model that considers lead-time variability. The authors assume that lead time is affected by both the lot size and the reserved capacity. They derive a closed-form solution for the situation where lead time is proportional to the lot size. Cakanyildirim et al. [6] also present the effect of linear and concave lead times on the value of the cost function. In the model, in addition to the order quantity and the reorder point, the reserved capacity is also a decision variable. Finally, the authors consider a case in which a fixed proportion of capacity is allocated at the manufacturing facility.

Most of the articles in the literature consider lead 
time as a constant and focus on demand during the lead 
time. However, Wu and Ouyang [21] assume that lead 
time is a decision variable and that lead-time demand 
follows a normal distribution. They also assume that an 
arrival order may contain some defective parts and that those parts will be kept in inventory until the next delivery. Moreover, they include an inspection cost for defective parts in the model. Their model is defined as a (Q, R, L) inventory model where order quantity (Q),
reorder point (R), and lead time (L) are decision vari- 
ables. The objective is to minimize the total cost which 
includes ordering costs, inventory holding costs (de- 
fective and non-defective), lost sales costs, backorder 
costs, and inspection costs. The authors present an al- 
gorithm to find the optimal solutions for the given 
problem. 

Duran et al. [9] present a (Q, R) policy where orders can be expedited. At the time of order release, if the inventory position is less than or equal to a critical value r_e, the order is expedited at an additional cost. If the inventory level is higher than r_e and lower than or equal to the reorder point R, the order is not expedited. The aim is to find the order quantity (Q), the reorder point (R), and the expediting point r_e that minimize the average cost (note that this does not include backorder costs). The authors present an optimal algorithm to obtain the Q, R, and r_e values if they are restricted to be integers.

The model proposed by Kao and Hsu [15] is dif- 
ferent from other models reviewed in this paper be- 
cause the authors discuss the order quantity and re- 
order point with fuzzy demand. Kao and Hsu [15] use 
this fuzzy demand to construct the fuzzy total inven- 
tory cost. The authors derive five pairs of simultaneous 
nonlinear equations to find the optimal order quantity 
Q and the reorder point R. The authors show that when the demand is a trapezoidal fuzzy number, the equations can be reduced to a set of closed-form equations. Then, they prove that the solution to these equations gives an
optimal solution. Kao and Hsu [15] also present a nu- 
merical example to show that the solution methodology 
developed in the paper is easy to apply in practice. 


Multi-Echelon Models 


Moinzadeh and Lee [18] present a model to determine 
the batch size in a multi-echelon system with one cen- 
tral depot and M sites. In their problem, when the num- 
ber of failed items is equal to the order quantity Q at any 
site, then those items are sent to the depot. If the depot 
has sufficient inventory on hand, it delivers the items 
immediately; otherwise, the items are backlogged. Al- 
though all sites use a (Q, R) policy, the depot uses a (S-1, 
S) policy. In other words, whenever the depot receives 
an order of size Q, it places an order simultaneously to 
replenish its stock. After determining the Q and R val- 
ues for each site, the authors use an approximation to 
estimate the total system stock and the backorder lev- 
els. The numerical results show that the (Q, R) policy is 
better than the (S-1, S) policy for such systems. 

Forsberg [10] deals with a multi-echelon inventory 
system with one warehouse and multiple non-identical 
retailers. The author assumes that the retailers face in- 
dependent Poisson demand and both the warehouse 
and the retailers use (Q, R) policies. Forsberg [10] eval- 
uates inventory holding and shortage costs using an ex- 
act solution approach. 

Chen and Zheng [8] study a (nQ, R) policy in 
a multi-stage serial inventory system where stage 1 or- 
ders from stage 2, stage 2 from stage 3, etc., and stage N 
places orders to an outside supplier with unlimited capacity. The demand seen by stage 1 is compound Poisson and excess demand is backlogged at every stage. The transportation lead times among stages are constant. By using a two-step approach, Chen and Zheng [8] provide a near-optimal solution. In the first step, they
find the lower and upper bounds on the cost function by 
changing the penalty cost of being short on inventory. 
In the second step, the authors minimize the bounds 
by using three different heuristic approaches. Chen and 
Zheng [8] also propose an optimal algorithm that re- 
quires additional computational effort. 

Axsäter [2] considers a two-stage inventory system
with one warehouse and N non-identical retailers. He 
presents an exact method to evaluate inventory hold- 
ing and shortage costs when there are only two retail- 
ers. He focuses on the timing of the warehouse orders 
for the sub-batches of Q. He identifies three possibilities 
and evaluates the cost for each case separately. At the 
end, total cost is calculated by summing the costs for 
the three cases. When there are more than two retail- 
ers, he extends his evaluation technique by combining 
the retailers into two groups, and then uses the same 
approach he developed for the two retailer case. The 
author also presents a model where the lead times are 
constant and all facilities use (Q, R) policies with dif- 
ferent Q and R values. In this model, all stockouts are 
backordered, delayed orders are delivered on a first- 
come-first-served basis, and partial shipments are also 
allowed. In order to simplify the problem, Axsäter [2]
assumes that all batch sizes are multiples of the small- 
est batch size. In the objective function, the author only 
considers expected inventory holding cost and back- 
order cost. 

Like Axsäter [2], Marklund [16] also considers a two-stage supply chain with one central warehouse and an arbitrary number of non-identical retailers. Customer demands occur only at the retailers. The retailers use (Q, R) policies with different parameters, and they request products from the central warehouse whenever their inventory positions reach R or fall below R. The author proposes a new policy (Q0, a0) that is motivated by relating the traditional echelon stock model to the installation stock (Q, R) model where the order quantity Q is a multiple of a minimum batch size. In the article, Marklund [16] gives the detailed derivation of the exact cost function when the retailers use (Q, R) policies and the warehouse uses the new (Q0, a0) policy. The performance of the new policy is compared to the traditional echelon stock policy and the (Q, R) policy through numerical
examples. Although the results show that the proposed 
policy outperforms the other policies in all numerical 
examples, the author does not guarantee that the policy 
will always give the best result. 

Fujiwara and Sedarage [11] apply a (Q, R) policy 
for a multi-part assembly system under stochastic lead 
times. The objective of the article is to simultaneously 
determine the order quantity and the assembly lot size 
so that the average total cost per unit time is minimized. 
The total cost includes setup costs, inventory holding 
costs of parts and assembled items, and shortage costs 
of assembled items. The authors try to find separate reorder points, r_i, for each part and a global order quantity, Q, which will be used for all parts. Although the authors propose a global order quantity Q, they also mention that this kind of policy may not be optimal. They
suggest that instead of a global Q, a common Q where 
all order quantities are multiples of Q might be more 
sensible. 

Chen and Zheng [7] consider a distribution system 
with one warehouse and multiple retailers. The retail- 
ers’ demands follow an independent compound Pois- 
son process. It is assumed that the order quantity is 
a multiple of the smallest batch size. The order quantity 
and the reorder point are calculated by using a heuris- 
tic. The authors present an exact procedure for evaluat- 
ing the performance (average cost) of the (nQ, R) pol- 
icy when the demand is a Poisson process. Chen and 
Zheng [7] also give two approximation procedures for 
the case with compound Poisson processes. The ap- 
proximations are based on exact formulations of the 
case with Poisson processes. 

Axsäter [3] presents an exact analysis of a two-stage
inventory system with one warehouse and multiple re- 
tailers. The demand for each retailer follows an inde- 
pendent compound Poisson process. The retailers re- 
plenish their stock from the warehouse, and the ware- 
house replenishes its stock from an outside supplier. 
The transportation times from the warehouse to the re- 
tailers and from the outside supplier to the warehouse 
are constant. In addition, if there is a shortage, then ad- 
ditional delay may also occur since shortages and stock- 
outs are backordered. The author emphasizes that the 
approach developed is not directly applicable for items 
with large demand. Instead, it is suitable mostly for 
slow-moving parts such as spare parts. 
Moinzadeh [17] also considers a supply chain with
one warehouse and multiple identical retailers. The au- 
thor assumes that demand at the retailers is random but 
stationary and that each retailer places its order accord- 
ing to a (Q, R) policy. In addition, Moinzadeh [17] as- 
sumes that the warehouse receives online information 
about the demand. The author shows the effect of infor- 
mation sharing on order replenishment decisions of the 
supplier. In the article, the author first proposes a possi- 
ble replenishment policy for the supplier and then pro- 
vides an exact analysis for the operating measures of 
such systems. The author concludes the article by giving 
information about when information sharing is most 
beneficial. 


Conclusions 


We provide a literature review on continuous review 
(Q, R) inventory policies. Although we review most of 
the well known papers that deal with (Q, R) policy, this 
is not an exhaustive review of the literature. Our aim is 
to present the importance of the (Q, R) policy and show 
possible extensions of the simple (Q, R) model. 
Statement of the Result


The method of successive substitution, Richardson iteration, or direct iteration seeks to find a fixed point of a map K, that is, a point $u^*$ such that

$$u^* = K(u^*).$$

Given an initial iterate $u_0$, the iteration is

$$u_{k+1} = K(u_k), \quad k \ge 0. \qquad (1)$$


Let X be a Banach space and let $D \subset X$ be closed. A map $K: D \to D$ is a contraction if

$$\|K(u) - K(v)\| \le \alpha \|u - v\| \qquad (2)$$

for some $\alpha \in (0, 1)$ and all $u, v \in D$. The contraction mapping theorem, [3,7,13,14], states that if K is a contraction on D then
• K has a unique fixed point $u^*$ in D, and
• for any $u_0 \in D$ the sequence $\{u_k\}$ given by (1) converges to $u^*$.
The message of the contraction mapping theorem is 
that if one wishes to use direct iteration to solve a fixed 
point problem, then the fixed point map K must satisfy 
(2) for some D and relative to some choice of norm. The 
choice of norm need not be made explicitly; it is determined implicitly by K itself. However, if there is no
norm for which (2) holds, then another, more robust, 
method, such as Newton’s method with a line search, 
must be used, or the problem must be reformulated. 
One may wonder why a Newton-like method is not 
always better than a direct iteration. The answer is that 
the cost for a single iteration is very low for Richardson 
iteration. So, if the equation can be set up to make the 
contraction constant α in (2) small, successive substitution, while taking more iterations, can be more efficient
than a Newton-like iteration, which has costs in linear 
algebra and derivative evaluation that are not incurred 
by successive substitution. 
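A minimal sketch of direct iteration (1), assuming the illustrative map K(u) = cos u on D = [0, 1] (not an example from the text): cos maps D into itself and |K'(u)| = |sin u| ≤ sin 1 < 1 there, so (2) holds with α = sin 1 ≈ 0.84. The helper name `successive_substitution` is hypothetical.

```python
import math


def successive_substitution(K, u0, tol=1e-12, max_iter=1000):
    """Direct iteration u_{k+1} = K(u_k), eq. (1).

    Stops when two successive iterates differ by at most tol and returns
    (approximate fixed point, number of iterations used).
    """
    u = u0
    for k in range(1, max_iter + 1):
        u_next = K(u)
        if abs(u_next - u) <= tol:
            return u_next, k
        u = u_next
    raise RuntimeError("no convergence within max_iter iterations")


# K = cos is a contraction on D = [0, 1] with alpha = sin(1), about 0.84.
u_star, n_iter = successive_substitution(math.cos, 1.0)
print(u_star, n_iter)
```

Stopping on the step length is a common heuristic; for a contraction the error of the last iterate is at most α/(1 − α) times the last step, so a small α makes this test more trustworthy.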


Affine Problems 


An affine fixed point map has the form 
K(u) = Mu+b 


where M is a linear operator on the space X. The fixed 
point equation is 


$$(I - M)u = b, \qquad (3)$$


where I is the identity operator. The classical stationary 
iterative methods in numerical linear algebra, [8,13], 
are typically analyzed in terms of affine fixed point 
problems, where M is called the iteration matrix. Multi- 
grid methods, [2,4,5,9], are also stationary iterative 
methods. We give an example of how multigrid meth- 
ods are used later in this article. 
The contraction condition (2) holds if 


$$\|M\| \le \alpha < 1. \qquad (4)$$


In (4) the norm is the operator norm on X. M may be a well-defined operator on more than one space, and (4) may not hold in all of them. Similarly, even if X is finite dimensional, so that all norms are equivalent, (4) may hold in one norm and not in another. It is known, [10], that (4) holds for some norm if and only if the spectral radius of M is less than 1.
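The gap between (4) in a particular norm and the spectral radius condition can be seen numerically. In the sketch below (NumPy; the 2 × 2 matrix and right-hand side are arbitrary illustrations), the max-norm of M is 10.5, so (4) fails in that norm, yet both eigenvalues are 0.5 and direct iteration on K(u) = Mu + b still converges to the solution of (3).

```python
import numpy as np

# Upper-triangular iteration matrix: its max-row-sum norm is 10.5, so (4)
# fails in that norm, but both eigenvalues (the diagonal entries) are 0.5.
M = np.array([[0.5, 10.0],
              [0.0, 0.5]])
b = np.array([1.0, 1.0])

norm_inf = np.linalg.norm(M, np.inf)        # operator norm induced by the max-norm
rho = max(abs(np.linalg.eigvals(M)))        # spectral radius

# Direct iteration u <- Mu + b for the affine map K(u) = Mu + b.
u = np.zeros(2)
for _ in range(200):
    u = M @ u + b

u_exact = np.linalg.solve(np.eye(2) - M, b)  # solution of (I - M)u = b, eq. (3)
print(norm_inf, rho, u, u_exact)
```

Because the spectral radius is below 1, some other norm makes M a contraction, which is why the iteration converges even though the max-norm test fails.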

When (4) does not hold it is sometimes possible to 
form an approximate inverse preconditioner P so that 
direct iteration can be applied to the equivalent prob- 
lem 


$$u = (I - P(I - M))u + Pb. \qquad (5)$$


In order to apply the contraction mapping theorem and 
direct iteration to (5) we require that 


$$\|I - P(I - M)\| \le \alpha < 1$$


in some norm. In this case we say that P is an approxi- 
mate inverse for I — M. In the final section of this article 
we give an example of how approximate inverses can be 
built for discretizations of integral operators. 
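Preconditioned direct iteration on (5) can be sketched in the same way. The test matrix is an assumption chosen for illustration: the spectral radius of M = I − A is about 4.41, so plain direct iteration diverges. The Jacobi choice P = diag(A)^{-1} is one simple approximate inverse (not one proposed in the text) and gives ‖I − P(I − M)‖∞ = 0.5.

```python
import numpy as np

# A = I - M for a small tridiagonal test matrix (illustrative data).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
M = np.eye(3) - A

# Plain direct iteration u <- Mu + b would diverge: spectral radius of M > 1.
rho_plain = max(abs(np.linalg.eigvals(M)))

# Hypothetical approximate inverse: P = inverse of the diagonal of I - M (Jacobi).
P = np.diag(1.0 / np.diag(A))
contraction = np.linalg.norm(np.eye(3) - P @ A, np.inf)  # ||I - P(I - M)|| in the max-norm

# Direct iteration on the preconditioned problem (5).
u = np.zeros(3)
for _ in range(100):
    u = u - P @ (A @ u - b)      # equals (I - P(I - M))u + Pb
print(rho_plain, contraction, u)
```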


Nonlinear Problems 


If the nonlinear fixed point map K is sufficiently 
smooth, then a Newton-like method may be used to 
solve 


$$F(u) = u - K(u) = 0.$$
The transition from a current approximation $u_c$ of $u^*$ to an update $u_+$ is

$$u_+ = u_c - P\,(u_c - K(u_c)), \qquad (6)$$

where

$$P \approx F'(u^*)^{-1} = (I - K'(u^*))^{-1}.$$

$P = F'(u_c)^{-1}$ is Newton's method and $P = F'(u_0)^{-1}$ is the chord method.

It is easy to show [7,13,14] that if u is near u* and P 
is an approximate inverse for F’(u*) then the precondi- 
tioned fixed point problem 


$$u = u - P(u - K(u))$$


is a contraction on a neighborhood D of u*. This is, in 
fact, one way to analyze the convergence of Newton’s 
method. In this article our focus is on preconditioners 
that remain constant for several iterations and do not 
require computation of the derivative of K. 

The point to remember is that, if the goal is to trans- 
form a given fixed point map into a contraction, pre- 
conditioning of nonlinear problems can be done by the 
same process (formation of an approximate inverse) as 
for linear problems. 
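For the nonlinear case, the sketch below compares Newton's method and the chord method, both written in the form (6), on the illustrative scalar problem u = 2 cos u (not from the text). Here |K'(u*)| ≈ 1.7 > 1, so direct iteration (1) is not a contraction near u*. Freezing P = F'(u_0)^{-1} at u_0 = 1 nevertheless yields a contraction with a small constant. The function names are hypothetical.

```python
import math


def K(u):
    """Illustrative fixed point map K(u) = 2 cos u."""
    return 2.0 * math.cos(u)


def dK(u):
    """Derivative K'(u) = -2 sin u."""
    return -2.0 * math.sin(u)


def preconditioned_iteration(u0, newton, tol=1e-12, max_iter=100):
    """Iterate u_+ = u_c - P (u_c - K(u_c)), eq. (6), for F(u) = u - K(u).

    newton=True : P = F'(u_c)^{-1} is recomputed at every step (Newton's method).
    newton=False: P = F'(u_0)^{-1} is frozen at the initial iterate (chord method).
    """
    u = u0
    p = 1.0 / (1.0 - dK(u0))          # F'(u) = 1 - K'(u)
    for k in range(1, max_iter + 1):
        if newton:
            p = 1.0 / (1.0 - dK(u))
        u_next = u - p * (u - K(u))
        if abs(u_next - u) <= tol:
            return u_next, k
        u = u_next
    raise RuntimeError("no convergence within max_iter iterations")


u_newton, k_newton = preconditioned_iteration(1.0, newton=True)
u_chord, k_chord = preconditioned_iteration(1.0, newton=False)
print(u_newton, k_newton, u_chord, k_chord, abs(dK(u_newton)))
```

The chord variant never re-evaluates the derivative, which mirrors the focus of this article on preconditioners that stay fixed for several iterations.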


Integral Equations Example 


We close this article with the Atkinson-Brakhage pre- 
conditioner for integral operators [2,4]. We will begin 
with the linear case, from which the nonlinear algorithm is a simple step. Let $\Omega \subset \mathbb{R}^d$ be compact and let $k(x, y)$ be a continuous function on $\Omega \times \Omega$. We consider the affine fixed point problem

$$u(x) = f(x) + (Ku)(x) = f(x) + \int_\Omega k(x, y)\,u(y)\,dy,$$

where $f \in X = C(\Omega)$ is given and a solution $u^* \in X$ is sought. In this example D = X. We will assume that the linear operator I − K is nonsingular on X.

We consider a family of increasingly accurate quadrature rules, indexed with a level l, with weights $\{w_j^l\}_{j=1}^{N_l}$ and nodes $\{x_j^l\}_{j=1}^{N_l}$ that satisfy

$$\lim_{l \to \infty} \sum_{j=1}^{N_l} f(x_j^l)\,w_j^l = \int_\Omega f(x)\,dx$$


for all $f \in X$. The family of operators $\{K_l\}$ defined by

$$(K_l u)(x) = \sum_{j=1}^{N_l} k(x, x_j^l)\,u(x_j^l)\,w_j^l$$

converges strongly to K, that is

$$\lim_{l \to \infty} K_l u = Ku$$

for all $u \in X$. The family $\{K_l\}$ is also collectively compact, [1]. This means that if B is a bounded subset of X, then

$$\bigcup_l K_l(B)$$

is precompact in X. The direct consequences of the strong convergence and collective compactness are that $I - K_l$ is nonsingular for l sufficiently large and

$$(I - K_l)^{-1} \to (I - K)^{-1} \qquad (7)$$

strongly in X. The Atkinson-Brakhage preconditioner is based on these results.
For $g \in X$ one can compute

$$v = (I - K_l)^{-1} g$$

by solving the finite-dimensional linear system

$$v_i = g(x_i^l) + \sum_{j=1}^{N_l} k(x_i^l, x_j^l)\,v_j\,w_j^l \qquad (8)$$

for the values $v(x_i^l) = v_i$ of v at the nodal points and then applying the Nyström interpolation

$$v(x) = g(x) + \sum_{j=1}^{N_l} k(x, x_j^l)\,v_j\,w_j^l = g(x) + (K_l v)(x)$$

to recover v(x) for all $x \in \Omega$. (8) can be solved at a cost of $O(N_l^3)$ floating point operations if direct methods for linear equations are used, and for much less if iterative methods such as GMRES [15] are used. In that case, only O(1) matrix-vector products, each costing $O(N_l^2)$ operations, are needed to obtain a solution that is accurate to truncation error [6]. This is, up to a multiplicative factor, optimal. The Atkinson-Brakhage preconditioner can dramatically reduce this factor, however.
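The direct-method route for (8), followed by Nyström interpolation, can be sketched in NumPy as follows. The composite midpoint rule on Ω = [0, 1] and the separable test kernel k(x, y) = xy with f(x) = 2x/3 (exact solution u*(x) = x, since the integral of y² over [0, 1] is 1/3) are assumptions made for illustration; `nystrom_solve` is a hypothetical name.

```python
import numpy as np


def midpoint_rule(n):
    """Composite midpoint rule on [0, 1]: nodes x_j and weights w_j = 1/n."""
    h = 1.0 / n
    return (np.arange(n) + 0.5) * h, np.full(n, h)


def nystrom_solve(kernel, g, n):
    """Solve v = g + K_l v via the linear system (8), then return the Nystrom
    interpolant v(x) = g(x) + sum_j k(x, x_j) v_j w_j as a callable."""
    x, w = midpoint_rule(n)
    A = np.eye(n) - kernel(x[:, None], x[None, :]) * w[None, :]
    v = np.linalg.solve(A, g(x))          # dense direct solve: O(n^3) operations

    def v_of(s):
        s = np.atleast_1d(np.asarray(s, dtype=float))
        return g(s) + kernel(s[:, None], x[None, :]) @ (v * w)

    return v_of


# Test problem (illustrative): k(x, y) = x y on [0, 1]; f(x) = 2x/3 gives u*(x) = x.
u = nystrom_solve(lambda s, t: s * t, lambda s: 2.0 * s / 3.0, 200)
s = np.linspace(0.0, 1.0, 11)
print(np.max(np.abs(u(s) - s)))
```

The remaining error comes from the O(h²) accuracy of the midpoint rule, not from the linear solve.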
The results in [1] imply that

$$M_l = I + (I - K_l)^{-1} K,$$

the Atkinson-Brakhage preconditioner, converges to $(I - K)^{-1}$ in the operator norm. Hence, for l sufficiently large (coarse mesh sufficiently fine), Richardson iteration can be applied to the system

$$u = u - M_l (I - K_L) u + M_l f,$$

where L > l. Applying this idea for a sequence of grids or levels leads to the optimal form of the Atkinson-Brakhage iteration [11]. The algorithm uses a coarse mesh, which we index with l = 0, to build the preconditioner and then cycles through the grids sequentially until the solution at a desired fine (l = L) mesh is obtained. One example of this is a sequence of composite midpoint rule quadratures in which $N_{l+1} = 2N_l$. Then, [2,11], if the coarse mesh is sufficiently fine, only one Richardson iteration at each level will be needed. The cost at each level is two matrix-vector products at level l and a solve at level 0.
1) Solve $u_0 - K_0 u_0 = f$; set $u = u_0$.
2) For l = 1, ..., L:
   a) Compute $r = u - K_l u - f$;
   b) $u = u - M_0 r$.
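Here is a compact sketch of the linear algorithm above. It rests on stated assumptions: Ω = [0, 1], composite midpoint rules with N_l = 2^l N_0, the separable test kernel k(x, y) = xy with f(x) = 2x/3 (exact solution u*(x) = x), and M_0 applied at level l as M_0 r = r + (I − K_0)^{-1} K_l r. That last choice is our reading of the text, not a verified detail of [11]. Iterates are stored as functions so that each one can be evaluated on the next grid by Nyström interpolation. Because u − r = f + K_l u, step b) never re-evaluates older iterates.

```python
import numpy as np


def midpoint_rule(n):
    """Composite midpoint rule on [0, 1]: nodes and weights."""
    h = 1.0 / n
    return (np.arange(n) + 0.5) * h, np.full(n, h)


def atkinson_brakhage(kernel, f, n0, levels):
    """Linear Atkinson-Brakhage iteration for u - Ku = f on [0, 1].

    Level l uses the midpoint rule with n0 * 2**l nodes. One Richardson step
    u <- u - M_0 r is taken per level, where (assumption) M_0 is applied as
    M_0 r = r + (I - K_0)^{-1} K_l r. Iterates are kept as callables.
    """
    x0, w0 = midpoint_rule(n0)
    A0 = np.eye(n0) - kernel(x0[:, None], x0[None, :]) * w0[None, :]

    def quad_op(x, w, vals):
        # s -> sum_j k(s, x_j) vals_j w_j: the quadrature operator on nodal values.
        c = vals * w
        return lambda s: kernel(np.asarray(s, dtype=float)[:, None], x[None, :]) @ c

    def coarse_inverse(g):
        # Apply (I - K_0)^{-1} to the function g by a coarse Nystrom solve.
        K0v = quad_op(x0, w0, np.linalg.solve(A0, g(x0)))
        return lambda s: g(s) + K0v(s)

    def update(Klu, corr):
        # u - M_0 r = u - r - (I - K_0)^{-1} K_l r = f + K_l u - (I - K_0)^{-1} K_l r
        return lambda s: f(s) + Klu(s) - corr(s)

    u = coarse_inverse(f)                        # 1) solve u_0 - K_0 u_0 = f
    for l in range(1, levels + 1):               # 2) for l = 1, ..., L
        xl, wl = midpoint_rule(n0 * 2 ** l)
        ul = u(xl)                               # current iterate on level-l nodes
        Klu = quad_op(xl, wl, ul)
        rl = ul - Klu(xl) - f(xl)                # a) r = u - K_l u - f at the nodes
        corr = coarse_inverse(quad_op(xl, wl, rl))
        u = update(Klu, corr)                    # b) u = u - M_0 r
    return u


# Illustrative test problem: k(x, y) = x y, f(x) = 2x/3, exact solution u*(x) = x.
u = atkinson_brakhage(lambda s, t: s * t, lambda s: 2.0 * s / 3.0, n0=8, levels=4)
s = np.linspace(0.0, 1.0, 11)
print(np.max(np.abs(u(s) - s)))
```

For this rank-one kernel one can check by hand that each Richardson step reduces the error relative to the level-l discrete solution by a factor of order h_0². This is consistent with the statement that one iteration per level suffices once the coarse mesh is fine enough.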
Nonlinear problems can be solved with exactly the same 
idea. We will consider the special case of Hammerstein 
equations 


$$u(x) = K(u)(x) = \int_\Omega k(x, y, u(y))\,dy.$$


If we use a sequence of quadrature rules as in the linear 
case we can define 


$$K_l(u)(x) = \sum_{j=1}^{N_l} k(x, x_j^l, u(x_j^l))\,w_j^l.$$


The nonlinear form of the Atkinson-Brakhage algo- 
rithm for Hammerstein equations simply uses the ap- 
proximation 


$$I + (I - K_0'(u_0))^{-1} K_l'(u) \approx (I - K_l'(u))^{-1}$$


in a Newton-like iteration. One can see from the formal 
description below that little has changed from the linear 
case. 
1) Solve $u_0 - K_0(u_0) = 0$; set $u = u_0$.
2) For l = 1, ..., L:
   a) Compute $r = u - K_l(u)$;
   b) $u = u - \left(I + (I - K_0'(u_0))^{-1} K_l'(u)\right) r$.


The Atkinson-Brakhage algorithm can, under some 
conditions, be further improved, [12] and the number 
of fine mesh operator-function products per level re- 
duced to one. There is also no need to explicitly repre- 
sent the operator as an integral operator with a kernel. 
In solving optimal control problems involving non-
linear differential equations, some iterative procedure 
must be used to obtain the optimal control policy. As 
is true with any iterative procedure, one is concerned 
about the convergence rate and also about the reliabil- 
ity of obtaining the optimal control policy. Although 
from Pontryagin’s maximum principle it is known that 
the minimum of the performance index corresponds to 
the minimum of the Hamiltonian, to obtain the mini- 
mum value for the Hamiltonian is not always straight- 
forward. Here we outline a procedure that changes the 
control policy from iteration to iteration, improving the 
value of the performance index at each iteration, until 
the improvement is less than a certain amount. Then the iteration procedure is stopped and the results are analyzed. Such a procedure is called the control vector iteration (CVI) method, or iteration in the policy space.


Optimal Control Problem 


To illustrate the procedure, let us consider the optimal 
control problem, where the system is described by the 
differential equation 


$$\frac{dx}{dt} = f(x, u), \quad \text{with } x(0) \text{ given}, \qquad (1)$$


where x is an n-dimensional state vector and u is an r-dimensional control vector. The optimal control problem is to determine the control u in the time interval $0 \le t \le t_f$, so that the performance index

$$I = \int_0^{t_f} \phi(x, u)\,dt \qquad (2)$$


is minimized. We consider the case where the final time $t_f$ is given. To carry out the minimization of the performance index in (2) subject to the constraints in (1), we consider the augmented performance index

$$J = \int_0^{t_f} \left[\phi + z^T\left(f - \frac{dx}{dt}\right)\right] dt, \qquad (3)$$


where the n-dimensional vector of Lagrange multipliers 
z is called the adjoint vector. The last term in (3) can be 
thought of as a penalty function to ensure that the state 
equation is satisfied throughout the given time interval. 
We introduce the Hamiltonian 


$$H = \phi + z^T f \qquad (4)$$


and use integration by parts to simplify (3) to

$$J = \int_0^{t_f} \left(H + \frac{dz^T}{dt}\,x\right) dt - z^T(t_f)\,x(t_f) + z^T(0)\,x(0). \qquad (5)$$


The optimal control problem now reduces to the mini- 
mization of J. 

To minimize J numerically, we assume that we have evaluated J at iteration j by using the control policy denoted by $u^{(j)}$. Now the problem is to determine the control policy $u^{(j+1)}$ at the next iteration. Since the goal is to minimize J, obviously we want to make the change in J negative and numerically as large as possible. If we let $\delta u = u^{(j+1)} - u^{(j)}$, the corresponding change in J is obtained by using a Taylor series expansion up to the quadratic terms:


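$$\delta J = \int_0^{t_f} \left[ \left(\frac{\partial H}{\partial x} + \frac{dz}{dt}\right)^T \delta x + \left(\frac{\partial H}{\partial u}\right)^T \delta u \right] dt + \frac{1}{2}\int_0^{t_f} \left[ \delta x^T \frac{\partial^2 H}{\partial x^2}\,\delta x + 2\,\delta x^T \frac{\partial^2 H}{\partial x\,\partial u}\,\delta u + \delta u^T \frac{\partial^2 H}{\partial u^2}\,\delta u \right] dt - z^T(t_f)\,\delta x(t_f). \qquad (6)$$

The necessary condition for a minimum of J is that the first integral in (6) should be zero for arbitrary δx and δu; i.e.,

$$\frac{dz}{dt} = -\frac{\partial H}{\partial x}, \quad \text{with } z(t_f) = 0, \qquad (7)$$

and

$$\frac{\partial H}{\partial u} = 0. \qquad (8)$$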


In control vector iteration, we relax the necessary condition in (8) and choose δu to make δJ negative, so that (8) is satisfied in the limit. One approach is to choose

$$\delta u = -\varepsilon\,\frac{\partial H}{\partial u}, \qquad (9)$$

where ε is a positive parameter which may vary from iteration to iteration. With (7) satisfied, the first-order change in (6) is then $\delta J \approx -\varepsilon \int_0^{t_f} (\partial H/\partial u)^T (\partial H/\partial u)\,dt \le 0$. This method is sometimes called the first variation method, since the driving force for the change in the control policy is based only on the first-order term of the Taylor series expansion. The negative sign in (9) is required to minimize the Hamiltonian, as required by Pontryagin's maximum principle. Numerous papers have been written on the determination of the stepping parameter ε [7].


Second Variation Method 


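Instead of determining the stepping parameter ε arbitrarily, one may solve the accessory minimization problem, where δu is chosen to minimize δJ given by (6) after the requirements for the adjoint are satisfied; i.e., it is required to find δu to minimize δJ given by

$$\delta J = \int_0^{t_f} \left[ \left(\frac{\partial H}{\partial u}\right)^T \delta u + \frac{1}{2}\,\delta x^T \frac{\partial^2 H}{\partial x^2}\,\delta x + \delta x^T \frac{\partial^2 H}{\partial x\,\partial u}\,\delta u + \frac{1}{2}\,\delta u^T \frac{\partial^2 H}{\partial u^2}\,\delta u \right] dt, \qquad (10)$$

subject to the differential equation

$$\frac{d\,\delta x}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T \delta x + \left(\frac{\partial f^T}{\partial u}\right)^T \delta u, \quad \text{with } \delta x(0) = 0. \qquad (11)$$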


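The solution to this accessory minimization problem is straightforward, since (11) is linear and the performance index in (10) is almost quadratic; it is carried out in ([1], pp. 259-266) and [7]. The resulting equations, to be integrated backwards from $t = t_f$ to t = 0 with zero starting conditions, are

$$\frac{dJ}{dt} = -\frac{\partial f^T}{\partial x}\,J - J\left(\frac{\partial f^T}{\partial x}\right)^T - \frac{\partial^2 H}{\partial x^2} + S^T\left(\frac{\partial^2 H}{\partial u^2}\right)^{-1} S, \qquad (12)$$

where the (r × n)-matrix $S = \frac{\partial^2 H}{\partial u\,\partial x} + \frac{\partial f^T}{\partial u}\,J$, and

$$\frac{dg}{dt} = -\left[\frac{\partial f^T}{\partial x} - S^T\left(\frac{\partial^2 H}{\partial u^2}\right)^{-1}\frac{\partial f^T}{\partial u}\right] g + S^T\left(\frac{\partial^2 H}{\partial u^2}\right)^{-1}\frac{\partial H}{\partial u}. \qquad (13)$$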


The control policy is then updated through the equation

$$u^{(j+1)} = u^{(j)} - \left(\frac{\partial^2 H}{\partial u^2}\right)^{-1}\left(\frac{\partial H}{\partial u} + \frac{\partial f^T}{\partial u}\,g\right) - \left(\frac{\partial^2 H}{\partial u^2}\right)^{-1} S\left(x^{(j+1)} - x^{(j)}\right). \qquad (14)$$

This method of updating the control policy is called the second variation method. In (12) the (n × n)-matrix J is symmetric, so the total number of differential equations to be integrated backwards (for J, g, and the adjoint z) is n(n + 1)/2 + 2n. However, the convergence is quadratic if the initial control policy is close to the optimum.

To obtain good starting conditions, Luus and Lapidus [6] suggested using the first variation method for the first few iterations and then switching over to the second variation method.

One additional feature of the second variation 
method is that the control policy given in (14) is a func- 
tion of the present state, so that the control policy is 
treated as being continuous and is not restricted to being piecewise constant over an integration time step,
as is the case with the first variation method. As was 
shown in ([1], pp. 316-317), for the linear six-plate 
gas absorber example, when the system equation is lin- 
ear and the performance index is quadratic, the second 
variation method yields the optimal control policy in 
a single step. 


Determination of Stepping Parameter 


However, the large number and complexity of equa- 
tions required for obtaining the control policy and the 
instability of the method for very complex systems led 
to investigating different means of obtaining faster convergence with the first variation method. The effort was directed at the best means of obtaining the stepping parameter ε in (9). When ε is too large, overstepping occurs, and if ε is too small, convergence is very slow.

Numerous papers have been written on the deter- 
mination of €. Several methods were compared by Rao 
and Luus [8] in solving typical optimal control prob- 
lems. Although they suggested a means of determining 
the ‘best’ method for performance indices that are al- 
most quadratic, it is found that a very simple scheme 
is quite effective for a wide variety of optimal control 
problems. Instead of trying to get very fast convergence 
and risking instability, the emphasis is placed on robustness. The strategy is to obtain the initial value for ε from the magnitude of ∂H/∂u, then to increase ε when the iteration has been successful and to reduce its value if overstepping occurs. This type of approach was
used in [2] in solving the optimal control of a pyrolysis 
problem. When the iteration was successful, the step- 
ping parameter was increased by 10 percent, and when 
overstepping resulted, the stepping parameter was re- 
duced to half its value. The algorithm for first variation 
method may be presented as follows: 

1) Choose an initial control policy $u^{(0)}$ and a value for ε; set the iteration index j to 0.

2) Integrate (1) from t = 0 to $t = t_f$ and evaluate the performance index in (2). Store the values of the state vector at the end of each integration time step.

3) Integrate the adjoint equation (7) from $t = t_f$ to t = 0, using for x the stored values of the state vector in Step 2. At each integration time step evaluate the gradient ∂H/∂u.

4) Choose a new control policy

$$u^{(j+1)} = u^{(j)} - \varepsilon\,\frac{\partial H}{\partial u}. \qquad (15)$$

5) Integrate (1) from t = 0 to $t = t_f$ and evaluate the performance index in (2). Store the values of the state vector at the end of each integration time step. If the performance index is worse (i.e., overstepping has occurred), reduce ε to half its value and go to Step 4. If the performance index has been improved, increase ε by a small factor, such as 1.10, increment j, and go to Step 3. Continue for a number of iterations, or terminate the iterations when the change in the performance index in an iteration is less than some criterion, and interpret the results.


Illustration of the First Variation Method 


Let us consider the nonlinear continuous stirred tank 
reactor that has been used for optimal control studies in 
([1], pp. 308-318) and [6], and which was shown in [4] 
to exhibit multiplicity of solutions. The system is de- 
scribed by the two equations 


$$\frac{dx_1}{dt} = -2(x_1 + 0.25) + (x_2 + 0.5)\exp\left(\frac{25x_1}{x_1 + 2}\right) - u(x_1 + 0.25), \qquad (16)$$

$$\frac{dx_2}{dt} = 0.5 - x_2 - (x_2 + 0.5)\exp\left(\frac{25x_1}{x_1 + 2}\right), \qquad (17)$$

with the initial state $x_1(0) = 0.09$ and $x_2(0) = 0.09$. The control u is a scalar quantity related to the valve opening of the coolant. The state variables $x_1$ and $x_2$ represent deviations from the steady state of dimensionless temperature and concentration, respectively. The performance index to be minimized is

$$I = \int_0^{t_f} \left(x_1^2 + x_2^2 + 0.1u^2\right) dt, \qquad (18)$$

where the final time $t_f = 0.78$. The Hamiltonian is

$$H = z_1\left(-2(x_1 + 0.25) + R - u(x_1 + 0.25)\right) + z_2\left(0.5 - x_2 - R\right) + x_1^2 + x_2^2 + 0.1u^2, \qquad (19)$$
where $R = (x_2 + 0.5)\exp(25x_1/(x_1 + 2))$. The adjoint equations are

$$\frac{dz_1}{dt} = (u + 2)z_1 - 2x_1 + \frac{50R\,(z_2 - z_1)}{(x_1 + 2)^2}, \qquad (20)$$

$$\frac{dz_2}{dt} = -2x_2 + z_2 + \frac{(z_2 - z_1)\,R}{x_2 + 0.5}, \qquad (21)$$

and the gradient of the Hamiltonian is

$$\frac{\partial H}{\partial u} = 0.2u - (x_1 + 0.25)\,z_1. \qquad (22)$$
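The first variation algorithm above, applied to the CSTR (16)-(22), can be sketched in NumPy as follows. It is not the Fortran program used for the reported runs. The following are all assumptions:
- explicit Euler with one step per stage;
- P = 120 piecewise-constant stages (step 0.0065);
- the constant initial policy u = 1;
- a fixed initial ε = 1, rather than one derived from |∂H/∂u|.

Pairing explicit Euler for (16)-(17) with the backward recursion used here for (20)-(21) makes `grad` the exact gradient of the discretized index divided by the step length. Small enough steps therefore always improve I.

```python
import numpy as np

# CSTR of eqs. (16)-(22). Discretization choices here are assumptions:
# P piecewise-constant control stages, explicit Euler with one step per stage.
T_F, P = 0.78, 120
H = T_F / P
X0 = np.array([0.09, 0.09])


def rate(x1, x2):
    """R = (x2 + 0.5) exp(25 x1 / (x1 + 2))."""
    return (x2 + 0.5) * np.exp(25.0 * x1 / (x1 + 2.0))


def simulate(u):
    """Forward pass: states at the stage boundaries and the index I of (18)."""
    xs = np.empty((P + 1, 2))
    xs[0] = X0
    cost = 0.0
    with np.errstate(all="ignore"):   # huge trial steps may overflow; rejected later
        for k in range(P):
            x1, x2 = xs[k]
            r = rate(x1, x2)
            cost += H * (x1**2 + x2**2 + 0.1 * u[k] ** 2)
            xs[k + 1, 0] = x1 + H * (-2.0 * (x1 + 0.25) + r - u[k] * (x1 + 0.25))
            xs[k + 1, 1] = x2 + H * (0.5 - x2 - r)
    return xs, cost


def gradient(u, xs):
    """Backward pass of the adjoint eqs. (20)-(21) from z(t_f) = 0 using the
    stored states; returns dH/du of eq. (22) for each stage."""
    z1 = z2 = 0.0
    grad = np.empty(P)
    for k in reversed(range(P)):
        x1, x2 = xs[k]
        r = rate(x1, x2)
        grad[k] = 0.2 * u[k] - (x1 + 0.25) * z1
        dz1 = (u[k] + 2.0) * z1 - 2.0 * x1 + 50.0 * r * (z2 - z1) / (x1 + 2.0) ** 2
        dz2 = -2.0 * x2 + z2 + (z2 - z1) * r / (x2 + 0.5)
        z1, z2 = z1 - H * dz1, z2 - H * dz2   # step backwards in time
    return grad


def first_variation(u0, eps=1.0, max_iter=200, tol=1e-7):
    """First variation CVI with the step rule of the text: enlarge eps by 10%
    after an improvement, halve it after overstepping."""
    u = np.array(u0, dtype=float)
    xs, cost = simulate(u)
    history = [cost]
    for _ in range(max_iter):
        grad = gradient(u, xs)
        while True:
            u_new = u - eps * grad                 # eq. (15)
            xs_new, cost_new = simulate(u_new)
            if np.isfinite(cost_new) and cost_new < cost:
                break
            eps *= 0.5                             # overstepping
            if eps < 1e-12:
                return u, history
        improvement = cost - cost_new
        u, xs, cost = u_new, xs_new, cost_new
        history.append(cost)
        eps *= 1.10                                # successful iteration
        if improvement < tol:
            break
    return u, history


u_opt, history = first_variation(np.full(P, 1.0))
print(len(history) - 1, history[0], history[-1])
```

The step rule only enforces a monotone decrease of I. Which optimum is reached depends on the initial policy, as discussed for this reactor in [4].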


To illustrate the computational aspects of CVI, the 
above algorithm was used with a Pentium-120 personal 
computer using WATCOM Fortran compiler version 
9.5. The calculations were done in double precision. As 
found in [4], convergence to the local optimum was ob- 
tained when small values for the initial control policy 
were used, and the global optimum was obtained when 
large values were used as initial policy. As is seen in 
Table 1, when an integration time step of 0.0065 was 
used (allowing 120 piecewise constant steps), in spite of 
the large number of iterations, the optimal control pol- 
icy can be obtained in less than 2s of computer time. 
The iterations were stopped when the change in the 
performance index from iteration to iteration was less 
than 107°. 

The total computation time for making this run with 11 different initial control policies was 9.6 s on the Pentium-120 digital computer. When an integration time step of 0.00312 was used, the value of the performance index at the global optimum was improved to 0.133104. When a time step of 0.001 was used, giving 780 time steps, the optimal control policy yielded I = 0.133097. Even here the computation time for the 11 different initial conditions was only 31 s. With the use of piecewise linear control and only 20 time stages, a performance index of I = 0.133101 was obtained in [3] with iterative dynamic programming (IDP). Obtaining this result with IDP, using 5 randomly chosen points and 10 passes of 20 iterations each, took 13.4 s on a Pentium-120. The use of 15 time stages yielded I = 0.133112 and required 7.8 s. Therefore, computationally CVI is faster than IDP for this problem, but the present formulation does not allow piecewise linear control to be used in CVI.


Control Vector Iteration CVI, Table 1
Application of First Variation Method to CSTR

Initial policy u(0) | Performance index | Number of iterations | CPU time


Control Vector Iteration CVI, Table 2
Effect of the number of time stages P on the optimal performance index

Number of time stages P | Optimal I by CVI | Optimal I by IDP
                        | 0.13429          | 0.13416
                        | 0.13363          | 0.13357
                        | 0.13339          | 0.13336
                        | 0.13323          | 0.13321
                        | 0.13317          | 0.13316
                        | 0.13313          | 0.13313
                        | 0.13310          | 0.13310

The effect of the number of time stages for piecewise 
constant control is shown in Table 2, where CVI results 
are compared to those obtained by IDP in [5]. 

As can be seen, the given algorithm gives results very close to those obtained by IDP, and the deviations decrease as the number of time stages increases. This is because the approximations introduced in CVI, both during the backward integration (where the stored values of the state vector are used) and in the calculation of the gradient of the Hamiltonian, become negligible as the time stages become very small. As is shown in Fig. 1, when the optimal value of the performance index is plotted against $1/P^2$, the value extrapolated to $1/P^2 \to 0$ gives the value obtained with the second variation method.

The first variation method is easy to program and 
will continue to be a very useful method of determining 
the optimal control of nonlinear systems. 
Control Vector Iteration CVI, Figure 1
Linear variation of optimal performance index with $P^{-2}$ (+ CVI, △ IDP)


See also 


> Boundary Condition Iteration BCI 

> Duality in Optimal Control with First Order 
Differential Equations 

> Dynamic Programming: Continuous-time Optimal 
Control 

> Dynamic Programming and Newton’s Method in 
Unconstrained Optimal Control 

> Dynamic Programming: Optimal Control 
Applications 

> Hamilton-Jacobi-Bellman Equation 

> Infinite Horizon Control and Dynamic Games 

> MINLP: Applications in the Interaction of Design 
and Control 


> Multi-objective Optimization: Interaction of Design and Control
> Optimal Control of a Flexible Arm 
> Optimization Strategies for Dynamic Systems 
> Robust Control 
> Robust Control: Schur Stability of Polytopes of 
Polynomials 


> Semi-infinite Programming and Control Problems 
> Sequential Quadratic Programming: Interior Point Methods for Distributed Optimal Control Problems
> Suboptimal Control 


References 


1. Lapidus L, Luus R (1967) Optimal control of engineering pro- 
cesses. Blaisdell, Waltham 

2. Luus R (1978) On the optimization of oil shale pyrolysis. 
Chem Eng Sci 33:1403-1404 

3. Luus R (1993) Application of iterative dynamic program- 
ming to very high dimensional systems. Hungarian J Industr 
Chem 21: 243-250 

4. Luus R, Cormack DE (1972) Multiplicity of solutions result- 
ing from the use of variational methods in optimal control 
problems. Canad J Chem Eng 50:309-311 

5. Luus R, Galli M (1991) Multiplicity of solutions in using dy- 
namic programming for optimal control. Hungarian J In- 
dustr Chem 19:55-62 

6. Luus R, Lapidus L (1967) The control of nonlinear systems. Part II: Convergence by combined first and second variations. AIChE J 13:108-113

7. Merriam CW (1964) Optimization theory and the design of feedback control systems. McGraw-Hill, New York, pp 259-261

8. Rao SN, Luus R (1972) Evaluation and improvement of con- 
trol vector iteration procedures for optimal control. Canad J 
Chem Eng 50:777-784 


Convex Discrete Optimization 


SHMUEL ONN 
Technion - Israel Institute of Technology, 
Haifa, Israel 


MSC2000: 05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 
68R, 68U, 68W, 90B, 90C 


Article Outline 


Abstract 


Introduction 
Limitations 
Outline and Overview of Main Results and Applications 
Terminology and Complexity 


Reducing Convex to Linear Discrete Optimization 
Edge-Directions and Zonotopes 
Strongly Polynomial Reduction of Convex 
to Linear Discrete Optimization 
Pseudo Polynomial Reduction 
when Edge-Directions Are not Available 




Convex Combinatorial Optimization and More 
From Membership to Linear Optimization 
Linear and Convex Combinatorial Optimization 
in Strongly Polynomial Time 
Linear and Convex Discrete Optimization 
over any Set in Pseudo Polynomial Time 
Some Applications 


Linear N-fold Integer Programming 
Oriented Augmentation and Linear Optimization 
Graver Bases and Linear Integer Programming 
Graver Bases of N-fold Matrices 
Linear N-fold Integer Programming in Polynomial Time 
Some Applications 


Convex Integer Programming 
Convex Integer Programming 
over Totally Unimodular Systems 
Graver Bases and Convex Integer Programming 
Convex N-fold Integer Programming in Polynomial Time 
Some Applications 


Multiway Transportation Problems and Privacy in Statistical Databases
Multiway Transportation Problems and Privacy in Statistical Databases
The Universality Theorem 
The Complexity of the Multiway Transportation Problem 
Privacy and Entry-Uniqueness 


References 


Abstract 


We develop an algorithmic theory of convex optimiza- 
tion over discrete sets. Using a combination of algebraic 
and geometric tools we are able to provide polyno- 
mial time algorithms for solving broad classes of convex 
combinatorial optimization problems and convex in- 
teger programming problems in variable dimension. 
We discuss some of the many applications of this theory, including applications to quadratic programming, matroids, bin packing and cutting-stock problems, vector partitioning and clustering, multiway transportation problems, and privacy and confidential statistical data disclosure.
Highlights of our work include a strongly polynomial 
time algorithm for convex and linear combinatorial op- 
timization over any family presented by a member- 
ship oracle when the underlying polytope has few edge- 
directions; a new theory of so-termed n-fold integer 
programming, yielding polynomial time solution of im- 
portant and natural classes of convex and linear in- 
teger programming problems in variable dimension; 
and a complete complexity classification of high dimensional transportation problems, with practical applications to fundamental problems in privacy and confidential statistical data disclosure.


Introduction 


The general linear discrete optimization problem can 
be posed as follows. 

LINEAR DISCRETE OPTIMIZATION. Given a set $S \subseteq \mathbb{Z}^n$ of integer points and an integer vector $w \in \mathbb{Z}^n$, find an $x \in S$ maximizing the standard inner product $wx := \sum_{i=1}^n w_i x_i$.

The algorithmic complexity of this problem, which includes integer programming and combinatorial optimization as special cases, depends on the presentation of the set $S$ of feasible points. In integer programming, this set is presented as the set of integer points satisfying a given system of linear inequalities, which in standard form is given by

$$S = \{x \in \mathbb{N}^n : Ax = b\},$$

where $\mathbb{N}$ stands for the nonnegative integers, $A \in \mathbb{Z}^{m \times n}$ is an $m \times n$ integer matrix, and $b \in \mathbb{Z}^m$ is an integer vector. The input for the problem then consists of $A$, $b$, $w$. In combinatorial optimization, $S \subseteq \{0,1\}^n$ is a set of $\{0,1\}$-vectors, often interpreted as a family of subsets of a ground set $N := \{1,\dots,n\}$, where each $x \in S$ is the indicator of its support $\mathrm{supp}(x) \subseteq N$. The set $S$ is presented implicitly and compactly, say as the set of indicators of subsets of edges in a graph $G$ satisfying a given combinatorial property (such as being a matching, a forest, and so on), in which case the input is $G$, $w$. Alternatively, $S$ is given by an oracle, such as a membership oracle which, queried on $x \in \{0,1\}^n$, asserts whether or not $x \in S$, in which case the algorithmic complexity also includes a count of the number of oracle queries needed to solve the problem.

Here we study the following broad generalization of 
linear discrete optimization. 

CONVEX DISCRETE OPTIMIZATION. Given a set $S \subseteq \mathbb{Z}^n$, vectors $w_1,\dots,w_d \in \mathbb{Z}^n$, and a convex functional $c : \mathbb{R}^d \to \mathbb{R}$, find an $x \in S$ maximizing $c(w_1x,\dots,w_dx)$.

This problem can be interpreted as multi-objective linear discrete optimization: given $d$ linear functionals $w_1x,\dots,w_dx$ representing the values of points $x \in S$ under $d$ criteria, the goal is to maximize their “convex balancing” defined by $c(w_1x,\dots,w_dx)$. In fact, we have a hierarchy of problems of increasing generality and complexity, parameterized by the number $d$ of linear functionals. At the bottom lies the linear discrete optimization problem, recovered as the special case of $d = 1$ and $c$ the identity on $\mathbb{R}$. At the top lies the problem of maximizing an arbitrary convex functional over the feasible set $S$, arising with $d = n$ and with $w_i = \mathbf{1}_i$, the $i$th standard unit vector in $\mathbb{R}^n$, for all $i$.

The algorithmic complexity of the convex discrete 
optimization problem depends on the presentation of 
the set S of feasible points as in the linear case, as 
well as on the presentation of the convex functional c. 
When $S$ is presented as the set of integer points satisfying a given system of linear inequalities we also refer to the problem as convex integer programming, and when $S \subseteq \{0,1\}^n$ and is presented implicitly or by an oracle we also refer to the problem as convex combinatorial optimization. As for the convex functional $c$, we will assume throughout that it is presented by a comparison oracle that, queried on $x, y \in \mathbb{R}^d$, asserts whether or not $c(x) < c(y)$. This is a very broad presentation that
reveals little information on the function, making the 
problem, on the one hand, very expressive and applica- 
ble, but on the other hand, very hard to solve. 
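To fix ideas, here is a minimal Python sketch of the problem statement itself for a small, explicitly listed finite set $S$. The convex functional $c$ is accessed only through a comparison oracle, as assumed above. This is plain enumeration over $S$, not one of the polynomial time algorithms of this article. The names `make_comparison_oracle` and `convex_discrete_max` are hypothetical.

```python
from itertools import product


def make_comparison_oracle(c):
    """Hide c behind an oracle that only answers whether c(y) < c(z)."""
    def less(y, z):
        return c(y) < c(z)
    return less


def convex_discrete_max(S, ws, less):
    """Maximize c(w_1 x, ..., w_d x) over a finite, explicitly listed S by enumeration.

    The functional c is accessed only through the comparison oracle `less`.
    """
    best_x, best_val = None, None
    for x in S:
        # Image of x under the d linear functionals w_1 x, ..., w_d x.
        val = tuple(sum(wi * xi for wi, xi in zip(w, x)) for w in ws)
        if best_val is None or less(best_val, val):
            best_x, best_val = x, val
    return best_x, best_val


# Example: S = {0,1}^3, d = 2, and the convex functional c(y) = y_1^2 + y_2^2.
S = list(product((0, 1), repeat=3))
ws = [(1, -1, 2), (0, 3, -1)]
less = make_comparison_oracle(lambda y: y[0] ** 2 + y[1] ** 2)
print(convex_discrete_max(S, ws, less))  # ((0, 1, 0), (-1, 3))
```

With $d = 1$ and `c` the identity, the same function solves the linear problem, which matches the bottom of the hierarchy described above.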

There is a massive body of knowledge on the com- 
plexity of linear discrete optimization - in particular 
(linear) integer programming [55] and (linear) combi- 
natorial optimization [31]. The purpose of this mono- 
graph is to provide the first comprehensive unified 
treatment of the extended convex discrete optimiza- 
tion problem. The monograph follows the outline of 
five lectures given by the author in the Séminaire 
de Mathématiques Supérieures Series, Université de 
Montréal, during June 2006. Colorful slides of these
lectures are available online at [46] and can be
used as a visual supplement to this monograph. The 
monograph has been written under the support of the 
ISF - Israel Science Foundation. The theory devel- 
oped here is based on and is a culmination of several 
recent papers including [5,12,13,14,15,16,17,25,39,47, 
48,49,50,51] written in collaboration with several col- 
leagues - Eric Babson, Jesus De Loera, Komei Fukuda, 
Raymond Hemmecke, Frank Hwang, Vera Rosta, Uriel 
Rothblum, Leonard Schulman, Bernd Sturmfels, Rekha 
Thomas, and Robert Weismantel. By developing and using a combination of geometric and algebraic tools, we are able to provide polynomial time algorithms for several broad classes of convex discrete optimization problems. We also discuss in detail some of the many applications of our theory, including applications to quadratic programming, matroids, bin packing and cutting-stock problems, vector partitioning and clustering, multiway transportation problems, and privacy and confidential statistical data disclosure.

We hope that this monograph will, on the one hand, 
allow users of discrete optimization to enjoy the new 
powerful modelling and expressive capability of convex 
discrete optimization along with its broad polynomial 
time solvability, and on the other hand, stimulate more 
research on this new and fascinating class of problems, 
their complexity, and the study of various relaxations, 
bounds, and approximations for such problems. 


Limitations 


Convex discrete optimization is generally intractable 
even for small fixed d, since already for d = 1 it in- 
cludes linear integer programming which is NP-hard. 
When $d$ is a variable part of the input, even very simple special cases are NP-hard. One example is the following problem, so-called positive semi-definite quadratic binary programming,

$$\max\{(w_1x)^2 + \cdots + (w_dx)^2 : x \in \mathbb{N}^n,\ x_i \le 1,\ i = 1,\dots,n\}.$$


Therefore, throughout this monograph we will assume 
that d is fixed (but arbitrary). 

As explained above, we also assume throughout that 
the convex functional c which constitutes part of the 
data for the convex discrete optimization problem is 
presented by a comparison oracle. Under such broad 
presentation, the problem is generally very hard. In particular, if the feasible set is $S := \{x \in \mathbb{N}^n : Ax = b\}$ and the underlying polyhedron $P := \{x \in \mathbb{R}^n_+ : Ax = b\}$ is unbounded, then the problem is inaccessible even in one variable with no equation constraints. Indeed, consider the following family of univariate convex integer programs with convex functions parameterized by $-\infty < u \le \infty$,

$$\max\{c_u(x) : x \in \mathbb{N}\}, \qquad c_u(x) := \begin{cases} -x, & \text{if } x < u, \\ x - 2u, & \text{if } x \ge u. \end{cases}$$
Consider any algorithm attempting to solve the problem and let $u$ be the maximum value of $x$ in all queries to the oracle of $c$. Then the algorithm cannot distinguish between the problem with $c_{u+1}$, whose objective function is unbounded, and the problem with $c_\infty$, whose optimal objective value is 0. Thus, convex discrete optimization (with an oracle presented functional) over an infinite set $S \subseteq \mathbb{Z}^n$ is quite hopeless. Therefore, an algorithm that solves the convex discrete optimization problem will either return an optimal solution, assert that the problem is infeasible, or assert that the underlying polyhedron is unbounded. In most applications, such as combinatorial optimization with $S \subseteq \{0,1\}^n$ or integer programming with $S := \{x \in \mathbb{Z}^n : Ax = b,\ l \le x \le u\}$ and $l, u \in \mathbb{Z}^n$, the set $S$ is finite and the problem of unboundedness does not arise.
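The adversary argument above can be checked mechanically. The following minimal Python sketch assumes an algorithm that only ever queries points $x \le U$; the bound `U` and the query set are hypothetical. On every such point $c_{U+1}$ and $c_\infty$ agree, so every comparison-oracle answer is the same. Yet the first problem is unbounded and the second has optimal value 0.

```python
import math


def c(u, x):
    """The convex function c_u(x) = -x if x < u, and x - 2u if x >= u (u may be math.inf)."""
    return -x if x < u else x - 2 * u


U = 50                     # largest point the (hypothetical) algorithm ever queries
queries = range(U + 1)     # queried points 0, 1, ..., U

# The comparison-oracle answers "is c(x) < c(y)?" coincide for u = U + 1 and u = infinity.
same = all(
    (c(U + 1, x) < c(U + 1, y)) == (c(math.inf, x) < c(math.inf, y))
    for x in queries
    for y in queries
)
print(same)                                # True
print(c(U + 1, 10 * U), c(math.inf, 0))    # 398 0: c_{U+1} keeps growing, c_inf peaks at 0
```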


Outline and Overview of Main Results and Applications


We now outline the structure of this monograph and 
provide a brief overview of what we consider to be our 
main results and main applications. The precise rele- 
vant definitions and statements of the theorems and 
corollaries mentioned here are provided in the relevant 
sections in the monograph body. As mentioned above, 
most of these results are adaptations or extensions of re- 
sults from one of the papers [5,12,13,14,15,16,17,25,39, 
47,48,49,50,51]. The monograph gives many more ap- 
plications and results that may turn out to be useful in 
future development of the theory of convex discrete op- 
timization. 

The rest of the monograph consists of five sections. 
While the results evolve from one section to the next, 
it is quite easy to read the sections independently of 
each other (while just browsing now and then for rele- 
vant definitions and results). Specifically, Sect. “Convex 
Combinatorial Optimization and More” uses defini- 
tions and the main result of Sect. “Reducing Convex 
to Linear Discrete Optimization”; Sect. “Convex Inte- 
ger Programming” uses definitions and results from 
Sections “Reducing Convex to Linear Discrete Opti- 
mization” and “Linear N-fold Integer Programming”; 
and Sect. “Multiway Transportation Problems and Pri- 
vacy in Statistical Databases” uses the main results 
of Sections “Linear N-fold Integer Programming” and 
“Convex Integer Programming”. 


In Sect. “Reducing Convex to Linear Discrete Optimization” we show how to reduce the convex discrete optimization problem over $S \subseteq \mathbb{Z}^n$ to strongly polynomially many linear discrete optimization counterparts over $S$, provided that the convex hull conv($S$) satisfies a suitable geometric condition, as follows.


Theorem 1 For every fixed $d$, the convex discrete optimization problem over any finite $S \subseteq \mathbb{Z}^n$ presented by a linear discrete optimization oracle and endowed with a set covering all edge-directions of conv($S$), can be solved in strongly polynomial time.


This result will be incorporated in the polynomial 
time algorithms for convex combinatorial optimization 
and convex integer programming to be developed in 
Sect. “Convex Combinatorial Optimization and More” 
and Sect. “Convex Integer Programming”. 

In Sect. “Convex Combinatorial Optimization and More” we discuss convex combinatorial optimization. The main result is that convex combinatorial optimization over a set $S \subseteq \{0,1\}^n$ presented by a membership oracle can be solved in strongly polynomial time, provided it is endowed with a set covering all edge-directions of conv($S$). In particular, the standard linear combinatorial optimization problem over $S$ can be solved in strongly polynomial time as well.


Theorem 2 For every fixed $d$, the convex combinatorial optimization problem over any $S \subseteq \{0,1\}^n$ presented by a membership oracle and endowed with a set covering all edge-directions of the polytope conv($S$), can be solved in strongly polynomial time.


An important application of Theorem 2 concerns con- 
vex matroid optimization. 


Corollary 1 For every fixed $d$, convex combinatorial optimization over the family of bases of a matroid presented by a membership oracle is strongly polynomial time solvable.


In Sect. “Linear N-fold Integer Programming” we de- 
velop the theory of linear n-fold integer program- 
ming. As a consequence of this theory we are able 
to solve a broad class of linear integer program- 
ming problems in variable dimension in polynomial 
time, in contrast with the general intractability of lin- 
ear integer programming. The main theorem here 
may seem a bit technical at first glance, but is really very natural and has many applications discussed in detail in Sect. “Linear N-fold Integer Programming”, Sect. “Convex Integer Programming” and Sect. “Multiway Transportation Problems and Privacy in Statistical Databases”. To state it we need a definition. Given an $(r+s) \times t$ matrix $A$, let $A_1$ be its $r \times t$ sub-matrix consisting of the first $r$ rows and let $A_2$ be its $s \times t$ sub-matrix consisting of the last $s$ rows. We refer to $A$ explicitly as an $(r+s) \times t$ matrix, since the definition below depends also on $r$ and $s$ and not only on the entries of $A$. The n-fold matrix of an $(r+s) \times t$ matrix $A$ is then defined to be the following $(r + ns) \times nt$ matrix,


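$$A^{(n)} := (\mathbf{1}_n \otimes A_1) \oplus (I_n \otimes A_2) = \begin{pmatrix} A_1 & A_1 & \cdots & A_1 \\ A_2 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_2 \end{pmatrix}.$$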

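Given now any $n \in \mathbb{N}$, lower and upper bounds $l, u \in \mathbb{Z}_\infty^{nt}$ with $\mathbb{Z}_\infty := \mathbb{Z} \uplus \{\pm\infty\}$, right-hand side $b \in \mathbb{Z}^{r+ns}$, and linear functional $wx$ with $w \in \mathbb{Z}^{nt}$, the corresponding linear n-fold integer programming problem is the following program in variable dimension $nt$,

$$\max\{wx : x \in \mathbb{Z}^{nt},\ A^{(n)}x = b,\ l \le x \le u\}.$$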
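The n-fold matrix can be assembled mechanically from $A$ and the split point $r$. The following minimal NumPy sketch uses a hypothetical function name, `n_fold`. The top block row places $n$ copies of $A_1$ side by side; the remaining rows form the block diagonal $I_n \otimes A_2$.

```python
import numpy as np


def n_fold(A, r, n):
    """Build the (r + n*s) x (n*t) n-fold matrix of the (r + s) x t matrix A."""
    A = np.asarray(A, dtype=int)
    A1, A2 = A[:r], A[r:]                                  # first r rows, last s rows
    top = np.kron(np.ones((1, n), dtype=int), A1)          # [A1 A1 ... A1]
    bottom = np.kron(np.eye(n, dtype=int), A2)             # diag(A2, ..., A2)
    return np.vstack([top, bottom])


# r = s = 1, t = 2, with A1 = [1 1] and A2 = [1 -1].
M = n_fold([[1, 1], [1, -1]], r=1, n=3)
print(M.shape)  # (4, 6), i.e. (r + n*s) x (n*t)
print(M)
```

The dimension $nt$ of the program grows with $n$ while $A$ stays fixed. This is the sense in which the n-fold integer program above is a problem "in variable dimension".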


The main theorem of Sect. “Linear N-fold Integer Pro- 
gramming” asserts that such integer programs are poly- 
nomial time solvable. 


Theorem 3 For every fixed $(r+s) \times t$ integer matrix $A$, the linear n-fold integer programming problem with any $n$, $l$, $u$, $b$, and $w$ can be solved in polynomial time.


Theorem 3 has very important applications to high- 
dimensional transportation problems which are dis- 
cussed in Sect. “Three-Way Line-Sum Transportation 
Problems” and in more detail in Sect. “Multiway Trans- 
portation Problems and Privacy in Statistical Data- 
bases”. Another major application concerns bin pack- 
ing problems, where items of several types are to be 
packed into bins so as to maximize packing utility sub- 
ject to weight constraints. This includes as a special case 
the classical cutting-stock problem of [27]. These are discussed in detail in Sect. “Packing Problems and Cutting-Stock”.


Corollary 2 For every fixed number $t$ of types and type weights $v_1,\dots,v_t$, the corresponding integer bin packing and cutting-stock problems are polynomial time solvable.


In Sect. “Convex Integer Programming” we discuss 
convex integer programming, where the feasible set S is 
presented as the set of integer points satisfying a given 
system of linear inequalities. In particular, we consider convex integer programming over n-fold systems for any fixed (but arbitrary) $(r+s) \times t$ matrix $A$. Given $n \in \mathbb{N}$, vectors $l, u \in \mathbb{Z}_\infty^{nt}$, $b \in \mathbb{Z}^{r+ns}$ and $w_1,\dots,w_d \in \mathbb{Z}^{nt}$, and convex functional $c : \mathbb{R}^d \to \mathbb{R}$, the problem is

$$\max\{c(w_1x,\dots,w_dx) : x \in \mathbb{Z}^{nt},\ A^{(n)}x = b,\ l \le x \le u\}.$$


The main theorem of Sect. “Convex Integer Program- 
ming” is the following extension of Theorem 3, assert- 
ing that convex integer programming over n-fold sys- 
tems is polynomial time solvable as well. 


Theorem 4 For every fixed $d$ and $(r+s) \times t$ integer matrix $A$, convex n-fold integer programming with any $n$, $l$, $u$, $b$, $w_1,\dots,w_d$, and $c$ can be solved in polynomial time.


Theorem 4 broadly extends the class of objective func- 
tions that can be efficiently maximized over n-fold 
systems. Thus, all applications discussed in Sect. 
“Some Applications” automatically extend accordingly. 
These include convex high-dimensional transporta- 
tion problems and convex bin packing and cutting- 
stock problems, which are discussed in detail in Sect. 
“Transportation Problems and Packing Problems” and 
Sect. “Multiway Transportation Problems and Privacy 
in Statistical Databases”. 

Another important application of Theorem 4 con- 
cerns vector partitioning problems which have appli- 
cations in many areas including load balancing, circuit 
layout, ranking, cluster analysis, inventory, and relia- 
bility, see e. g. [7,9,25,39,50] and the references therein. 
The problem is to partition $n$ items among $p$ players so as to maximize social utility. With each item is associated a $k$-dimensional vector representing its utility under $k$ criteria. The social utility of a partition is a convex function of the sums of vectors of items that each player receives. In the constrained version of the problem, there are also restrictions on the number of items each player can receive. We have the following consequence of Theorem 4; more details on this application are in Sect. “Vector Partitioning and Clustering”.


Corollary 3 For every fixed number $p$ of players and number $k$ of criteria, the constrained and unconstrained vector partition problems with any item vectors, convex utility, and constraints on the number of items per player, are polynomial time solvable.
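For intuition about the objects in Corollary 3, here is a brute-force Python sketch of the unconstrained vector partition problem for a tiny instance. It enumerates all $p^n$ assignments, so it is exponential in $n$ and is not the polynomial time algorithm behind Corollary 3. The function names, the utility (sum of squared norms of the players' sum vectors, which is convex), and the item vectors are all hypothetical examples.

```python
from itertools import product


def best_partition(items, p, utility):
    """Exhaustively assign each item to one of p players, maximizing utility(sums).

    items: list of k-dimensional integer tuples; utility: a convex function of
    the list of the p players' sum vectors. Runs in time O(p**n).
    """
    k = len(items[0])
    best = None
    for assignment in product(range(p), repeat=len(items)):
        sums = [[0] * k for _ in range(p)]
        for item, player in zip(items, assignment):
            for j in range(k):
                sums[player][j] += item[j]
        value = utility(sums)
        if best is None or value > best[0]:
            best = (value, assignment)
    return best


def squared_norms(sums):
    """Convex social utility: total squared Euclidean norm of the players' sum vectors."""
    return sum(v * v for s in sums for v in s)


items = [(2, 0), (-2, 0), (0, 1)]   # n = 3 items, k = 2 criteria
print(best_partition(items, 2, squared_norms))  # (9, (0, 1, 0))
```

Here separating the two opposite items $(2,0)$ and $(-2,0)$ between the players is what makes the convex utility large.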


In the last Sect. “Multiway Transportation Problems 
and Privacy in Statistical Databases” we discuss mul- 
tiway (high-dimensional) transportation problems and 
secure statistical data disclosure. Multiway transporta- 
tion problems form a very important class of discrete 
optimization problems and have been used and studied 
extensively in the operations research and mathemat- 
ical programming literature, as well as in the statistics 
literature in the context of secure statistical data disclo- 
sure and management by public agencies, see e. g. [4,6, 
11,18,19,42,43,53,60,62] and the references therein. The 
feasible points in a transportation problem are the mul- 
tiway tables (“contingency tables” in statistics) such that 
the sums of entries over some of their lower dimen- 
sional sub-tables such as lines or planes (“margins” in 
statistics) are specified. We completely settle the algo- 
rithmic complexity of treating multiway tables and dis- 
cuss the applications to transportation problems and 
secure statistical data disclosure, as follows. 

In Sect. “The Universality Theorem” we show that 
“short” 3-way transportation problems, over r x c x 3 
tables with variable number r of rows and variable 
number c of columns but fixed small number 3 of lay- 
ers (hence “short”), are universal in that every inte- 
ger programming problem is such a problem (see Sect. 
“The Universality Theorem” for the precise stronger 
statement and for more details). 


Theorem 5 Every linear integer programming problem $\max\{cy : y \in \mathbb{N}^n,\ Ay = b\}$ is polynomial time representable as a short 3-way line-sum transportation problem

$$\max\Big\{wx : x \in \mathbb{N}^{r \times c \times 3},\ \sum_i x_{i,j,k} = z_{j,k},\ \sum_j x_{i,j,k} = v_{i,k},\ \sum_k x_{i,j,k} = u_{i,j}\Big\}.$$
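The three families of line-sums in Theorem 5 are simply the sums of an $r \times c \times 3$ table along each of its three axes. The following minimal NumPy sketch assumes the table is stored as an array `x` with `x[i, j, k]` $= x_{i,j,k}$. The helper names `line_sums` and `is_feasible` are hypothetical.

```python
import numpy as np


def line_sums(x):
    """For an r x c x 3 array x[i, j, k], return the three families of line-sums
    z[j, k] = sum_i x[i, j, k], v[i, k] = sum_j x[i, j, k], u[i, j] = sum_k x[i, j, k]."""
    x = np.asarray(x)
    return x.sum(axis=0), x.sum(axis=1), x.sum(axis=2)


def is_feasible(x, z, v, u):
    """Check that x is a nonnegative integer table with the prescribed line-sums."""
    x = np.asarray(x)
    zs, vs, us = line_sums(x)
    return bool(
        np.issubdtype(x.dtype, np.integer)
        and (x >= 0).all()
        and np.array_equal(zs, z)
        and np.array_equal(vs, v)
        and np.array_equal(us, u)
    )


# A random 4 x 5 x 3 table (r = 4 rows, c = 5 columns, 3 layers) with entries in {0, 1, 2}.
rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=(4, 5, 3))
z, v, u = line_sums(x)
print(z.shape, v.shape, u.shape)   # (5, 3) (4, 3) (4, 5)
print(is_feasible(x, z, v, u))     # True
```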


In Sect. “The Complexity of the Multiway Transportation Problem” we discuss $k$-way transportation problems of any dimension $k$. We provide the first polynomial time algorithm for convex and linear “long” $(k+1)$-way transportation problems, over $m_1 \times \cdots \times m_k \times n$ tables, with $k$ and $m_1,\dots,m_k$ fixed (but arbitrary), and variable number $n$ of layers (hence “long”). This is best possible in view of Theorem 21. Our algorithm works for any hierarchical collection of margins. This captures common margin collections such as all line-sums, all plane-sums, and more generally all $h$-flat sums for any $0 \le h \le k$ (see Sect. “Tables and Margins” for more details). We point out that even for the very special case of linear integer transportation over $3 \times 3 \times n$ tables with specified line-sums, our polynomial time algorithm is the only one known. We prove the following statement.


Corollary 4 For every fixed $d, k, m_1,\dots,m_k$ and family $\mathcal{F}$ of subsets of $\{1,\dots,k+1\}$ specifying a hierarchical collection of margins, the convex (and in particular linear) long transportation problem over $m_1 \times \cdots \times m_k \times n$ tables is polynomial time solvable.


In our last subsection Sect. “Privacy and Entry-Unique- 
ness” we discuss an important application concerning 
privacy in statistical databases. It is a common practice 
in the disclosure of a multiway table containing sensi- 
tive data to release some table margins rather than the 
table itself. Once the margins are released, the security 
of any specific entry of the table is related to the set of 
possible values that can occur in that entry in any
table having the same margins as those of the source
table in the database. In particular, if this set consists of
a unique value, that of the source table, then this entry 
can be exposed and security can be violated. We show 
that for multiway tables where one category is signifi- 
cantly richer than the others, that is, when each sample 
point can take many values in one category and only 
few values in the other categories, it is possible to check 
entry-uniqueness in polynomial time, allowing disclos- 
ing agencies to make learned decisions on secure dis- 
closure. 


Corollary 5 For every fixed $k, m_1,\dots,m_k$ and family $\mathcal{F}$ of subsets of $\{1,\dots,k+1\}$ specifying a hierarchical collection of margins to be disclosed, it can be decided in polynomial time whether any specified entry $x_{i_1,\dots,i_{k+1}}$ is the same in all long $m_1 \times \cdots \times m_k \times n$ tables with the disclosed margins, and hence at risk of exposure.
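To make entry-uniqueness concrete, here is a brute-force Python sketch for the simplest case: a 2-way table with released row and column sums. This is not the long multiway setting of Corollary 5, nor its polynomial time algorithm. The sketch enumerates every nonnegative integer table with the given margins and records the set of values each entry takes. An entry whose set is a singleton is exposed. The function name and the example margins are hypothetical.

```python
from itertools import product


def entry_value_sets(row_sums, col_sums):
    """For each cell (i, j), the set of values it takes over all nonnegative integer
    tables with the given row and column sums (exhaustive search, tiny tables only)."""
    m, n = len(row_sums), len(col_sums)
    cells = [(i, j) for i in range(m) for j in range(n)]
    # An entry can never exceed its row sum or its column sum.
    ranges = [range(min(row_sums[i], col_sums[j]) + 1) for i, j in cells]
    values = {cell: set() for cell in cells}
    for entries in product(*ranges):
        table = dict(zip(cells, entries))
        rows_ok = all(sum(table[i, j] for j in range(n)) == row_sums[i] for i in range(m))
        cols_ok = all(sum(table[i, j] for i in range(m)) == col_sums[j] for j in range(n))
        if rows_ok and cols_ok:
            for cell, value in table.items():
                values[cell].add(value)
    return values


# Hypothetical released margins of a 2 x 3 table.
vals = entry_value_sets([2, 1], [2, 1, 0])
exposed = sorted(cell for cell, s in vals.items() if len(s) == 1)
print(exposed)  # [(0, 2), (1, 2)]
```

In this toy example only the cells of the zero column are pinned down; the other four cells each admit two values across the two tables with these margins.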
Terminology and Complexity


We use $\mathbb{R}$ for the reals, $\mathbb{R}_+$ for the nonnegative reals, $\mathbb{Z}$ for the integers, and $\mathbb{N}$ for the nonnegative integers. The sign of a real number $r$ is denoted by $\mathrm{sign}(r) \in \{0, -1, 1\}$ and its absolute value is denoted by $|r|$. The $i$th standard unit vector in $\mathbb{R}^n$ is denoted by $\mathbf{1}_i$. The support of $x \in \mathbb{R}^n$ is the index set $\mathrm{supp}(x) := \{i : x_i \ne 0\}$ of nonzero entries of $x$. The indicator of a subset $I \subseteq \{1,\dots,n\}$ is the vector $\mathbf{1}_I := \sum_{i \in I} \mathbf{1}_i$, so that $\mathrm{supp}(\mathbf{1}_I) = I$. When several vectors are indexed by subscripts, $w_1,\dots,w_d \in \mathbb{R}^n$, their entries are indicated by pairs of subscripts, $w_i = (w_{i,1},\dots,w_{i,n})$. When vectors are indexed by superscripts, $x^1,\dots,x^k \in \mathbb{R}^n$, their entries are indicated by subscripts, $x^i = (x^i_1,\dots,x^i_n)$. The integer lattice $\mathbb{Z}^n$ is naturally embedded in $\mathbb{R}^n$. The space $\mathbb{R}^n$ is endowed with the standard inner product which, for $w, x \in \mathbb{R}^n$, is given by $wx := \sum_{i=1}^n w_i x_i$. Vectors $w$ in $\mathbb{R}^n$ will also be regarded as linear functionals on $\mathbb{R}^n$ via the inner product $wx$. Thus, we refer to elements of $\mathbb{R}^n$ as points, vectors, or linear functionals, as will be appropriate from the context. The convex hull of a set $S \subseteq \mathbb{R}^n$ is denoted by conv($S$) and the set of vertices of a polyhedron $P \subseteq \mathbb{R}^n$ is denoted by vert($P$).
In linear discrete optimization over $S \subseteq \mathbb{Z}^n$, the facets of conv($S$) play an important role. See Chvátal [10] and the references therein for earlier work, and Grötschel, Lovász and Schrijver [31,45] for the later culmination in the equivalence of separation and linear optimization via the ellipsoid method of Yudin and Nemirovskii [63]. As will turn out in Sect. “Reducing Convex to Linear Discrete Optimization”, in convex discrete optimization over $S$, the edges of conv($S$) play an important role (most significantly in a way which is not related to the Hirsch conjecture discussed in [41]). We therefore make extensive use of convex polytopes, for which we follow the terminology of [32,65].

We often assume that the feasible set $S \subseteq \mathbb{Z}^n$ is finite. We then define its radius to be its $l_\infty$ radius $\rho(S) := \max\{\|x\|_\infty : x \in S\}$ where, as usual, $\|x\|_\infty := \max_{i=1}^n |x_i|$. In other words, $\rho(S)$ is the smallest $\rho \in \mathbb{N}$ such that $S$ is contained in the cube $[-\rho, \rho]^n$.
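The following short Python sketch illustrates the definitions of support, indicator and radius. Indices are 1-based, as in the text. The function names are hypothetical.

```python
def supp(x):
    """Support of x: the (1-based) indices of its nonzero entries."""
    return {i for i, xi in enumerate(x, start=1) if xi != 0}


def indicator(I, n):
    """Indicator vector 1_I in {0,1}^n of a subset I of {1, ..., n}."""
    return tuple(1 if i in I else 0 for i in range(1, n + 1))


def radius(S):
    """l_infinity radius rho(S) = max over x in S of max_i |x_i| (S finite, nonempty)."""
    return max(max(abs(xi) for xi in x) for x in S)


print(supp((0, 3, 0, -1)))              # {2, 4}
print(indicator({2, 4}, 4))             # (0, 1, 0, 1)
print(radius([(1, -4, 2), (0, 3, 3)]))  # 4: S lies in the cube [-4, 4]^3
```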

Our algorithms are applied to rational data only, and the time complexity is as in the standard Turing machine model, see e.g. [1,26,55]. The input typically consists of rational (usually integer) numbers, vectors, matrices, and finite sets of such objects. The binary length of an integer number $z \in \mathbb{Z}$ is defined to be the number of bits in its binary representation, $\langle z \rangle := 1 + \lceil \log_2(|z| + 1) \rceil$ (with the extra bit for the sign). The length of a rational number presented as a fraction $r = \frac{p}{q}$ with $p, q \in \mathbb{Z}$ is $\langle r \rangle := \langle p \rangle + \langle q \rangle$. The length of an $m \times n$ matrix $A$ (and in particular of a vector) is the sum $\langle A \rangle := \sum_{i,j} \langle a_{i,j} \rangle$ of the lengths of its entries. Note that the length of $A$ is no smaller than the number of entries, $\langle A \rangle \ge mn$. Therefore, when $A$ is, say, part of an input to an algorithm, with $m, n$ variable, the length $\langle A \rangle$ already incorporates $mn$, and so we will typically not account additionally for $m, n$ directly. But sometimes, especially in results related to n-fold integer programming, we will also emphasize $n$ as part of the input length. Similarly, the length of a finite set $E$ of numbers, vectors or matrices is the sum of lengths of its elements and hence, since $\langle E \rangle \ge |E|$, automatically accounts for its cardinality.

Some input numbers affect the running time of some algorithms through their unary presentation, resulting in so-called “pseudo polynomial” running time. The unary length of an integer number $z \in \mathbb{Z}$ is the number $|z| + 1$ of bits in its unary representation (again, an extra bit for the sign). The unary lengths of a rational number, vector, matrix, or finite set of such objects are again defined as the sums of the lengths of their numerical constituents, and are again no smaller than the number of such numerical constituents.
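The binary and unary length conventions can be made concrete in a few lines of Python. Since $\lceil \log_2(|z|+1) \rceil$ equals the number of bits of $|z|$, Python's `int.bit_length` gives the binary length exactly, without floating point. The function names are hypothetical. Rationals are normalized to lowest terms by `fractions.Fraction`, which is an assumption beyond the text.

```python
from fractions import Fraction


def binary_length(z):
    """<z> = 1 + ceil(log2(|z| + 1)): the bits of |z| plus one sign bit."""
    return 1 + abs(z).bit_length()


def binary_length_rational(r):
    """<p/q> = <p> + <q>, here for r = p/q in lowest terms (Fraction normalizes it)."""
    r = Fraction(r)
    return binary_length(r.numerator) + binary_length(r.denominator)


def binary_length_matrix(A):
    """<A> = sum of the binary lengths of the entries; always >= number of entries."""
    return sum(binary_length(a) for row in A for a in row)


def unary_length(z):
    """Unary length |z| + 1 (again one extra bit for the sign)."""
    return abs(z) + 1


print([binary_length(z) for z in (0, 1, -1, 4, 255)])  # [1, 2, 2, 4, 9]
print(binary_length_rational(Fraction(-3, 4)))          # 7, that is 3 + 4
print(binary_length_matrix([[1, 0], [-2, 7]]))          # 10, that is 2 + 1 + 3 + 4
print(unary_length(-5))                                 # 6
```

The gap between `binary_length(255)` (9) and `unary_length(255)` (256) is what separates polynomial from pseudo polynomial running time.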

When studying convex and linear integer programming in Sect. “Linear N-fold Integer Programming” and Sect. “Convex Integer Programming” we sometimes have lower and upper bound vectors $l, u$ with entries in $\mathbb{Z}_\infty := \mathbb{Z} \uplus \{\pm\infty\}$. Both binary and unary lengths of a $\pm\infty$ entry are constant, say 3 by encoding $\pm\infty := \pm$“00”.

To make the input encoding precise, we introduce 
the following notation. In every algorithmic statement 
we describe explicitly the input encoding, by listing 
in square brackets all input objects affecting the run- 
ning time. Unary encoded objects are listed directly 
whereas binary encoded objects are listed in terms of 
their length. For example, as is often the case, if the 
input of an algorithm consists of binary encoded vectors (linear functionals) $w_1,\dots,w_d \in \mathbb{Z}^n$ and unary encoded integer $\rho \in \mathbb{N}$ (bounding the radius $\rho(S)$ of the feasible set) then we will indicate that the input is encoded as $[\rho, \langle w_1,\dots,w_d \rangle]$.
Some of our algorithms are strongly polynomial
time in the sense of [59]. For this, part of the input 
is regarded as “special”. An algorithm is then strongly 
polynomial time if it is polynomial time in the usual 
Turing sense with respect to all input, and in addi- 
tion, the number of arithmetic operations (additions, 
subtractions, multiplications, divisions, and compar- 
isons) it performs is polynomial in the special part of 
the input. To make this precise, we extend our in- 
put encoding notation above by splitting the square 
bracketed expression indicating the input encoding 
into a “left” side and a “right” side, separated by a semicolon, where the entire input is described on the right and the special part of the input on the left. For example, consider Theorem 1, which asserts that the underlying algorithm is strongly polynomial with data encoded as $[n, |E|; \langle \rho(S), w_1,\dots,w_d, E \rangle]$, where $\rho(S) \in \mathbb{N}$, $w_1,\dots,w_d \in \mathbb{Z}^n$ and $E \subset \mathbb{Z}^n$. This means that the running time is polynomial in the binary length of $\rho(S)$, $w_1,\dots,w_d$, and $E$, and the number of arithmetic operations is polynomial in $n$ and the cardinality $|E|$, which constitute the special part of the input.

Often, as in [31], part of the input is presented 
by oracles. Then the running time and the number 
of arithmetic operations count also the number of or- 
acle queries. An oracle algorithm is polynomial time 
if its running time, including the number of oracle 
queries, and the manipulations of numbers, some of 
which are answers to oracle queries, is polynomial in 
the length of the input encoding. An oracle algorithm 
is strongly polynomial time (with specified input encod- 
ing as above), if it is polynomial time in the entire input 
(on the “right”), and in addition, the number of arith- 
metic operations it performs (including oracle queries) 
is polynomial in the special part of the input (on the 
“left”). 


Reducing Convex to Linear Discrete Optimization 


In this section we show that when suitable auxiliary geometric information about the convex hull conv($S$) of a finite set $S \subseteq \mathbb{Z}^n$ is available, the convex discrete optimization problem over $S$ can be reduced to the solution of strongly polynomially many linear discrete optimization counterparts over $S$. This result will be incorporated into the polynomial time algorithms developed in Sect. “Convex Combinatorial Optimization and More” and Sect. “Convex Integer Programming” for convex combinatorial optimization and convex integer programming respectively. In Sect. “Edge-Directions and Zonotopes” we provide some preliminaries on edge-directions and zonotopes. In Sect. “Strongly Polynomial Reduction of Convex to Linear Discrete Optimization” we prove the reduction which is the main result of this section. In Sect. “Pseudo Polynomial Reduction when Edge-Directions are not Available” we prove a pseudo polynomial reduction for any finite set.


Edge-Directions and Zonotopes 


We begin with some terminology and facts that play an important role in the sequel. A direction of an edge (1-dimensional face) $e = [u, v]$ of a polytope $P$ is any nonzero scalar multiple of $u - v$. A set of vectors $E$ covers all edge-directions of $P$ if it contains a direction of each edge of $P$. The normal cone of a polytope $P \subset \mathbb{R}^n$ at its face $F$ is the (relatively open) cone $C_P^F$ of those linear functionals $h \in \mathbb{R}^n$ which are maximized over $P$ precisely at points of $F$. A polytope $Z$ is a refinement of a polytope $P$ if the normal cone of every vertex of $Z$ is contained in the normal cone of some vertex of $P$. If $Z$ refines $P$ then, moreover, the closure of each normal cone of $P$ is the union of closures of normal cones of $Z$. The zonotope generated by a set of vectors $E = \{e_1, \dots, e_m\}$ in $\mathbb{R}^d$ is the following polytope, which is the projection by $E$ of the cube $[-1, 1]^m$ into $\mathbb{R}^d$:

$$Z := \mathrm{zone}(E) := \mathrm{conv}\Big\{ \sum_{i=1}^m \lambda_i e_i \,:\, \lambda_i = \pm 1 \Big\} \subset \mathbb{R}^d.$$


The following fact goes back to Minkowski, see [32]. 


Lemma 1 Let P be a polytope and let E be a finite set 
that covers all edge-directions of P. Then the zonotope 
Z := zone(E) generated by E is a refinement of P. 


Proof Consider any vertex $u$ of $Z$. Then $u = \sum_{e \in E} \lambda_e e$ for suitable $\lambda_e = \pm 1$. Thus, the normal cone $C_Z^u$ consists of those $h$ satisfying $h\lambda_e e > 0$ for all $e$. Pick any $\hat{h} \in C_Z^u$, and let $v$ be a vertex of $P$ at which $\hat{h}$ is maximized over $P$. Consider any edge $[v, w]$ of $P$. Then $v - w = \alpha_e e$ for some scalar $\alpha_e \neq 0$ and some $e \in E$, and $0 \le \hat{h}(v - w) = \hat{h}\alpha_e e$, implying $\alpha_e \lambda_e > 0$. It follows that every $h \in C_Z^u$ satisfies $h(v - w) > 0$ for every edge $[v, w]$ of $P$ containing $v$. Therefore $h$ is maximized over $P$ uniquely at $v$ and hence is in the cone $C_P^v$ of $P$ at $v$. This shows $C_Z^u \subseteq C_P^v$. Since $u$ was arbitrary, it follows that the normal cone of every vertex of $Z$ is contained in the normal cone of some vertex of $P$. □


The next lemma provides bounds on the number of 
vertices of any zonotope and on the algorithmic com- 
plexity of constructing its vertices, each vertex along 
with a linear functional maximized over the zonotope 
uniquely at that vertex. The bound on the number of 
vertices has been rediscovered many times over the 
years. An early reference is [33], stated in the dual form 
of 2-partitions. A more general treatment is [64]. Re- 
cent extensions to p-partitions for any p are in [3,39], 
and to Minkowski sums of arbitrary polytopes are in 
[29]. Interestingly, already in [33], back in 1967, the 
question was raised about the algorithmic complexity 
of the problem; this is now settled in [20,21] (the lat- 
ter reference correcting the former). We state the pre- 
cise bounds on the number of vertices and arithmetic 
complexity, but will need later only that for any fixed d 
the bounds are polynomial in the number of gener- 
ators. Therefore, below we only outline a proof that 
the bounds are polynomial. Complete details are in the 
above references. 


Lemma 2 The number of vertices of any zonotope $Z := \mathrm{zone}(E)$ generated by a set $E$ of $m$ vectors in $\mathbb{R}^d$ is at most $2\sum_{k=0}^{d-1}\binom{m-1}{k}$. For every fixed $d$, there is a strongly polynomial time algorithm that, given $E \subseteq \mathbb{Z}^d$, encoded as $[m := |E|; \langle E \rangle]$, outputs every vertex $v$ of $Z := \mathrm{zone}(E)$ along with a linear functional $h_v \in \mathbb{Z}^d$ maximized over $Z$ uniquely at $v$, using $O(m^{d-1})$ arithmetic operations for $d \ge 3$ and $O(m^d)$ for $d \le 2$.


Proof We only outline a proof that, for every fixed $d$, the polynomial bounds $O(m^{d-1})$ on the number of vertices and $O(m^d)$ on the arithmetic complexity hold. We assume that $E$ linearly spans $\mathbb{R}^d$ (else the dimension can be reduced) and is generic, that is, no $d$ points of $E$ lie on a linear hyperplane (one containing the origin). In particular, $0 \notin E$. The same bound for arbitrary $E$ then follows using a perturbation argument (cf. [39]).

Each oriented linear hyperplane $H = \{x \in \mathbb{R}^d : hx = 0\}$ with $h \in \mathbb{R}^d$ nonzero induces a partition of $E$ by $E = H^- \uplus H^0 \uplus H^+$, with $H^- := \{e \in E : he < 0\}$, $H^0 := E \cap H$, and $H^+ := \{e \in E : he > 0\}$. The vertices of $Z = \mathrm{zone}(E)$ are in bijection with ordered 2-partitions of $E$ induced by such hyperplanes that avoid $E$. Indeed, if $E = H^- \uplus H^+$ then the linear functional $h_v := h$ defining $H$ is maximized over $Z$ uniquely at the vertex $v := \sum\{e : e \in H^+\} - \sum\{e : e \in H^-\}$ of $Z$.

We now show how to enumerate all such 2-partitions and hence vertices of $Z$. Let $M$ be any of the $\binom{m}{d-1}$ subsets of $E$ of size $d - 1$. Since $E$ is generic, $M$ is linearly independent and spans a unique linear hyperplane $\mathrm{lin}(M)$. Let $H = \{x \in \mathbb{R}^d : hx = 0\}$ be one of the two orientations of the hyperplane $\mathrm{lin}(M)$. Note that $H^0 = M$. Finally, let $L$ be any of the $2^{d-1}$ subsets of $M$. Since $M$ is linearly independent, there is a $g \in \mathbb{R}^d$ which linearly separates $L$ from $M \setminus L$, namely, satisfies $gx < 0$ for all $x \in L$ and $gx > 0$ for all $x \in M \setminus L$. Furthermore, there is a sufficiently small $\epsilon > 0$ such that the oriented hyperplane $\bar{H} := \{x \in \mathbb{R}^d : \bar{h}x = 0\}$ defined by $\bar{h} := h + \epsilon g$ avoids $E$ and the 2-partition induced by $\bar{H}$ satisfies $\bar{H}^- = H^- \uplus L$ and $\bar{H}^+ = H^+ \uplus (M \setminus L)$. The corresponding vertex of $Z$ is $v := \sum\{e : e \in \bar{H}^+\} - \sum\{e : e \in \bar{H}^-\}$ and the corresponding linear functional which is maximized over $Z$ uniquely at $v$ is $h_v := \bar{h} = h + \epsilon g$.

We claim that any ordered 2-partition arises that way from some $M$, some orientation $H$ of $\mathrm{lin}(M)$, and some $L$. Indeed, consider any oriented linear hyperplane $\tilde{H}$ avoiding $E$. It can be perturbed to a suitable oriented $H$ that touches precisely $d - 1$ points of $E$. Put $M := H^0$ so that $H$ coincides with one of the two orientations of the hyperplane $\mathrm{lin}(M)$ spanned by $M$, and put $L := \tilde{H}^- \cap M$. Let $\bar{H}$ be an oriented hyperplane obtained from $M$, $H$ and $L$ by the above procedure. Then the ordered 2-partition $E = \bar{H}^- \uplus \bar{H}^+$ induced by $\bar{H}$ coincides with the ordered 2-partition $E = \tilde{H}^- \uplus \tilde{H}^+$ induced by $\tilde{H}$.

Since there are $\binom{m}{d-1}$ many $(d-1)$-subsets $M \subseteq E$, two orientations $H$ of $\mathrm{lin}(M)$, and $2^{d-1}$ subsets $L \subseteq M$, and $d$ is fixed, the total number of 2-partitions and hence also the total number of vertices of $Z$ obey the upper bound $2^d\binom{m}{d-1} = O(m^{d-1})$. Furthermore, for each choice of $M$, $H$ and $L$, the linear functional $h$ defining $H$, as well as $g$, $\epsilon$, $h_v = \bar{h} = h + \epsilon g$, and the vertex $v = \sum\{e : e \in \bar{H}^+\} - \sum\{e : e \in \bar{H}^-\}$ of $Z$ at which $h_v$ is uniquely maximized over $Z$, can all be computed using $O(m)$ arithmetic operations. This shows the claimed bound $O(m^d)$ on the arithmetic complexity. □
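The enumeration in the proof of Lemma 2 can be made concrete in the plane. The following Python sketch assumes $d = 2$ and a generic $E$ (no zero vector, no two parallel vectors), so every $M$ is a single generator $e$, $H$ is one of the two normals of $e$, and $g = \pm e$ realizes $L = \emptyset$ or $L = \{e\}$. The names `zonotope_vertices_2d` and `dot` are illustrative; the sketch returns rational functionals (clearing denominators would make them integer) and is not the optimized algorithm of [20,21].

```python
from fractions import Fraction


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def zonotope_vertices_2d(E):
    """Map each vertex v of zone(E) in R^2 to a rational functional h_v that is
    maximized over zone(E) uniquely at v (proof of Lemma 2 with d = 2).
    Assumes E is generic: no zero vector and no two parallel vectors."""
    vertices = {}
    for e in E:
        normal = (-e[1], e[0])  # lin(M) is the line spanned by M = {e}
        for h in (normal, (-normal[0], -normal[1])):  # two orientations of H
            for g in (e, (-e[0], -e[1])):  # g = e gives L = {}, g = -e gives L = {e}
                # eps small enough that h + eps*g keeps every nonzero sign of h.f
                eps = min((Fraction(abs(dot(h, f)), 2 * abs(dot(g, f)))
                           for f in E if dot(h, f) != 0 and dot(g, f) != 0),
                          default=Fraction(1))
                hbar = (h[0] + eps * g[0], h[1] + eps * g[1])
                # v = sum over H+ minus sum over H- (hbar avoids E, so no zeros)
                v = [0, 0]
                for f in E:
                    s = 1 if dot(hbar, f) > 0 else -1
                    v[0] += s * f[0]
                    v[1] += s * f[1]
                vertices[tuple(v)] = hbar
    return vertices
```

For three pairwise non-parallel generators the sketch returns the $6 = 2m$ vertices of a hexagon, matching the bound $2\sum_{k=0}^{1}\binom{m-1}{k} = 2m$ of Lemma 2.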


We conclude with a simple fact about edge-directions 
of projections of polytopes. 


Lemma 3 If $E$ covers all edge-directions of a polytope $P$, and $Q := \omega(P)$ is the image of $P$ under a linear map $\omega: \mathbb{R}^n \to \mathbb{R}^d$, then $\omega(E)$ covers all edge-directions of $Q$.


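Proof Let $f$ be a direction of an edge $[x, y]$ of $Q$. Consider the face $F := P \cap \omega^{-1}([x, y])$ of $P$. Let $V$ be the set of vertices of $F$ and let $U = \{u \in V : \omega(u) = x\}$. Since the graph of $F$ is connected and $U$ is a nonempty proper subset of $V$, for some $u \in U$ and $v \in V \setminus U$ there must be an edge $[u, v]$ of $F$, and hence of $P$. Then $\omega(v) \in (x, y]$, hence $\omega(v) = x + \alpha f$ for some $\alpha \neq 0$. Therefore, with $e := \frac{1}{\alpha}(v - u)$, a direction of the edge $[u, v]$ of $P$, we find that $f = \frac{1}{\alpha}(\omega(v) - \omega(u)) = \omega(e) \in \omega(E)$. □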


Strongly Polynomial Reduction of Convex 
to Linear Discrete Optimization 


A linear discrete optimization oracle for a set $S \subseteq \mathbb{Z}^n$ is one that, queried on $w \in \mathbb{Z}^n$, either returns an optimal solution to the linear discrete optimization problem over $S$, that is, an $x^* \in S$ satisfying $wx^* = \max\{wx : x \in S\}$, or asserts that none exists, that is, either the problem is infeasible or the objective function is unbounded. We now show that a set $E$ covering all edge-directions of the polytope $\mathrm{conv}(S)$ underlying a convex discrete optimization problem over a finite set $S \subseteq \mathbb{Z}^n$ allows one to solve it by solving polynomially many linear discrete optimization counterparts over $S$. The following theorem extends and unifies the corresponding reductions in [49] and [12] for convex combinatorial optimization and convex integer programming, respectively. Recall from Sect. “Terminology and Complexity” that the radius of a finite set $S \subseteq \mathbb{Z}^n$ is defined to be $\rho(S) := \max\{|x_i| : x \in S,\ i = 1, \dots, n\}$.


Theorem 6 For every fixed $d$ there is a strongly polynomial time algorithm that, given finite set $S \subseteq \mathbb{Z}^n$ presented by a linear discrete optimization oracle, integer vectors $w_1, \dots, w_d \in \mathbb{Z}^n$, set $E \subseteq \mathbb{Z}^n$ covering all edge-directions of $\mathrm{conv}(S)$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $[n, |E|; \langle \rho(S), w_1, \dots, w_d, E \rangle]$, solves the convex discrete optimization problem

$$\max\{c(w_1x, \dots, w_dx) : x \in S\}.$$


Proof First, query the linear discrete optimization oracle presenting $S$ on the trivial linear functional $w = 0$. If the oracle asserts that there is no optimal solution then $S$ is empty, so terminate the algorithm asserting that no optimal solution exists to the convex discrete optimization problem either. So assume the problem is feasible. Let $P := \mathrm{conv}(S) \subset \mathbb{R}^n$ and $Q := \{(w_1x, \dots, w_dx) : x \in P\} \subset \mathbb{R}^d$. Then $Q$ is a projection of $P$, and hence by Lemma 3 the projection $D := \{(w_1e, \dots, w_de) : e \in E\}$ of the set $E$ is a set covering all edge-directions of $Q$. Let $Z := \mathrm{zone}(D) \subset \mathbb{R}^d$ be the zonotope generated by $D$. Since $d$ is fixed, by Lemma 2 we can produce in strongly polynomial time all vertices of $Z$, every vertex $v$ along with a linear functional $h_v \in \mathbb{Z}^d$ maximized over $Z$ uniquely at $v$. For each of these polynomially many $h_v$, repeat the following procedure. Define a vector $g_v \in \mathbb{Z}^n$ by $g_{v,j} := \sum_{i=1}^d w_{i,j} h_{v,i}$ for $j = 1, \dots, n$. Now query the linear discrete optimization oracle presenting $S$ on the linear functional $w := g_v \in \mathbb{Z}^n$. Let $x_v \in S$ be the optimal solution obtained from the oracle, and let $z_v := (w_1x_v, \dots, w_dx_v) \in Q$ be its projection. Since $P = \mathrm{conv}(S)$, we have that $x_v$ is also a maximizer of $g_v$ over $P$. Since for every $x \in P$ and its projection $z := (w_1x, \dots, w_dx) \in Q$ we have $h_v z = g_v x$, we conclude that $z_v$ is a maximizer of $h_v$ over $Q$. Now we claim that each vertex $u$ of $Q$ equals some $z_v$. Indeed, since $Z$ is a refinement of $Q$ by Lemma 1, there is some vertex $v$ of $Z$ such that $h_v$ is maximized over $Q$ uniquely at $u$, and therefore $u = z_v$. Since $c(w_1x, \dots, w_dx)$ is convex on $\mathbb{R}^n$ and $c$ is convex on $\mathbb{R}^d$, we find that

$$\begin{aligned}
\max_{x \in S} c(w_1x, \dots, w_dx) &= \max_{x \in P} c(w_1x, \dots, w_dx) \\
&= \max_{z \in Q} c(z) \\
&= \max\{c(u) : u \text{ vertex of } Q\} \\
&= \max\{c(z_v) : v \text{ vertex of } Z\}.
\end{aligned}$$

Using the comparison oracle of $c$, find a vertex $v$ of $Z$ attaining maximum value $c(z_v)$, and output $x_v \in S$, an optimal solution to the convex discrete optimization problem. □
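To illustrate the reduction in the proof of Theorem 6, here is a minimal Python sketch for $d = 2$. It assumes the linear discrete optimization oracle is a Python callable returning a maximizer or `None`, and it models the comparison oracle by evaluating `c` directly. One functional per vertex of $Z = \mathrm{zone}(D)$ is produced by the perturbation used in the proof of Lemma 2 (scaled to integers), not by the algorithm of [20,21]; `brute_force_oracle` is a hypothetical stand-in that scans an explicitly listed $S$.

```python
from fractions import Fraction


def project(W, x):
    """Projection z = (w_1 x, ..., w_d x)."""
    return tuple(sum(wij * xj for wij, xj in zip(row, x)) for row in W)


def cone_functionals_2d(D):
    """One integer functional h inside each maximal normal cone of zone(D) in R^2,
    obtained by perturbing each normal of a generator e by +-eps*e."""
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]

    D = [f for f in D if f != (0, 0)]
    out = []
    for e in D:
        n = (-e[1], e[0])
        for h in (n, (-n[0], -n[1])):
            for g in (e, (-e[0], -e[1])):
                # eps keeps the sign of h.f for every f with h.f != 0
                eps = min((Fraction(abs(dot(h, f)), 2 * abs(dot(g, f)))
                           for f in D if dot(h, f) != 0 and dot(g, f) != 0),
                          default=Fraction(1))
                hb = (h[0] + eps * g[0], h[1] + eps * g[1])
                scale = hb[0].denominator * hb[1].denominator  # clear denominators
                out.append((int(hb[0] * scale), int(hb[1] * scale)))
    return out


def convex_max(oracle, W, E, c):
    """Sketch of Theorem 6 for d = 2: maximize c(w_1 x, w_2 x) over S, given a
    linear optimization oracle for S and E covering the edge-directions of conv(S)."""
    n = len(W[0])
    best = oracle([0] * n)  # query w = 0: None means S is empty
    if best is None:
        return None
    D = [project(W, e) for e in E]  # by Lemma 3, D covers edge-directions of Q
    for h in cone_functionals_2d(D):
        g = [h[0] * W[0][j] + h[1] * W[1][j] for j in range(n)]  # g_v = sum_i h_i w_i
        x = oracle(g)
        if c(project(W, x)) > c(project(W, best)):  # stands in for the comparison oracle
            best = x
    return best


def brute_force_oracle(S):
    """Hypothetical linear optimization oracle scanning an explicitly listed S."""
    def oracle(w):
        return max(S, key=lambda x: sum(a * b for a, b in zip(w, x)), default=None)
    return oracle
```

With $S = \{0,1\}^3$, the unit vectors as $E$, and $c(y) = y_1^2 + y_2^2$, the sketch queries the oracle at most $1 + 4|D|$ times instead of scanning all of $S$ with respect to $c$.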
Pseudo Polynomial Reduction
when Edge-Directions Are not Available 


Theorem 6 reduces convex discrete optimization to polynomially many linear discrete optimization counterparts when a set covering all edge-directions of the underlying polytope is available. However, often such a set is not available (see e.g. [8] for the important case of bipartite matching). We now show how to reduce convex discrete optimization to many linear discrete optimization counterparts when a set covering all edge-directions is not offhand available. In the absence of such a set, the problem is much harder, and the algorithm below is polynomially bounded only in the unary length of the radius $\rho(S)$ and of the linear functionals $w_1, \dots, w_d$, rather than in their binary length $\langle \rho(S), w_1, \dots, w_d \rangle$ as in the algorithm of Theorem 6. Moreover, an upper bound $\rho \ge \rho(S)$ on the radius of $S$ is required to be given explicitly in advance as part of the input.


Theorem 7 For every fixed $d$ there is a polynomial time algorithm that, given finite set $S \subseteq \mathbb{Z}^n$ presented by a linear discrete optimization oracle, integer $\rho \ge \rho(S)$, vectors $w_1, \dots, w_d \in \mathbb{Z}^n$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $[\rho, w_1, \dots, w_d]$, solves the convex discrete optimization problem

$$\max\{c(w_1x, \dots, w_dx) : x \in S\}.$$


Proof Let $P := \mathrm{conv}(S) \subset \mathbb{R}^n$, let $T := \{(w_1x, \dots, w_dx) : x \in S\}$ be the projection of $S$ by $w_1, \dots, w_d$, and let $Q := \mathrm{conv}(T) \subset \mathbb{R}^d$ be the corresponding projection of $P$. Let $r := n\rho\max_{i=1}^d \|w_i\|_\infty$ and let $G := \{-r, \dots, -1, 0, 1, \dots, r\}^d$. Then $T \subseteq G$ and the number $(2r + 1)^d$ of points of $G$ is polynomially bounded in the input as encoded.

Let $D := \{u - v : u, v \in G,\ u \neq v\}$ be the set of differences of pairs of distinct points of $G$. It covers all edge-directions of $Q$ since $\mathrm{vert}(Q) \subseteq T \subseteq G$. Moreover, the number of points of $D$ is less than $(2r + 1)^{2d}$ and hence polynomial in the input. Now invoke the algorithm of Theorem 6: while that algorithm requires a set $E$ covering all edge-directions of $P$, it needs $E$ only to compute a set $D$ covering all edge-directions of the projection $Q$ (see the proof of Theorem 6), which here is computed directly. □


Convex Combinatorial Optimization and More 


In this section we discuss convex combinatorial optimization. The main result is that convex combinatorial optimization over a set $S \subseteq \{0,1\}^n$ presented by a membership oracle can be solved in strongly polynomial time provided it is endowed with a set covering all edge-directions of $\mathrm{conv}(S)$. In particular, the standard linear combinatorial optimization problem over $S$ can be solved in strongly polynomial time as well. In Sect. “From Membership to Linear Optimization” we provide some preparatory statements involving various oracle presentations of the feasible set $S$. In Sect. “Linear and Convex Combinatorial Optimization in Strongly Polynomial Time” we combine these preparatory statements with Theorem 6 and prove the main result of this section. An extension to arbitrary finite sets $S \subseteq \mathbb{Z}^n$ endowed with edge-directions is established in Sect. “Linear and Convex Discrete Optimization over any Set in Pseudo Polynomial Time”. We conclude with some applications in Sect. “Some Applications”.

As noted in the introduction, when $S$ is contained in $\{0,1\}^n$ we refer to discrete optimization over $S$ also as combinatorial optimization over $S$, to emphasize that $S$ typically represents a family $\mathcal{F} \subseteq 2^N$ of subsets of a ground set $N := \{1, \dots, n\}$ possessing some combinatorial property of interest (for instance, the family of bases of a matroid over $N$, see Sect. “Matroids and Maximum Norm Spanning Trees”). The convex combinatorial optimization problem then also has the following interpretation (adopted in [47,49]). We are given a weighting $\omega: N \to \mathbb{Z}^d$ of elements of the ground set by $d$-dimensional integer vectors. We interpret the weight vector $\omega(j) \in \mathbb{Z}^d$ of element $j$ as representing its value under $d$ criteria (e.g., if $N$ is the set of edges in a network then such criteria may include profit, reliability, flow velocity, etc.). The weight of a subset $F \subseteq N$ is the sum $\omega(F) := \sum_{j \in F} \omega(j)$ of weights of its elements, representing the total value of $F$ under the $d$ criteria. Now, given a convex functional $c: \mathbb{R}^d \to \mathbb{R}$, the objective function value of $F \subseteq N$ is the “convex balancing” $c(\omega(F))$ of the values of the weight vector of $F$. The convex combinatorial optimization problem is to find a family member $F \in \mathcal{F}$ maximizing $c(\omega(F))$. The usual linear combinatorial optimization problem over $\mathcal{F}$ is the special case of $d = 1$ and $c$ the identity on $\mathbb{R}$. To cast a problem of that form in our usual setup, just let $S := \{\mathbf{1}_F : F \in \mathcal{F}\} \subseteq \{0,1\}^n$ be the set of indicators of members of $\mathcal{F}$ and define weight vectors $w_1, \dots, w_d \in \mathbb{Z}^n$ by $w_{i,j} := \omega(j)_i$ for $i = 1, \dots, d$ and $j = 1, \dots, n$.
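The translation from a weighted family to the indicator setup can be checked on a toy instance. This small Python sketch uses a hypothetical helper `to_indicator_setup`, a ground set indexed from 1, and a weighting $\omega$ given as a dictionary; it builds $S$ and the rows $w_1, \dots, w_d$ so that $w_i \mathbf{1}_F = \omega(F)_i$.

```python
def to_indicator_setup(n, family, omega):
    """Cast a family of subsets of N = {1, ..., n} and a weighting omega: N -> Z^d
    (a dict j -> d-tuple) into S = {1_F : F in family} and rows w_1, ..., w_d."""
    S = [tuple(1 if j in F else 0 for j in range(1, n + 1)) for F in family]
    d = len(next(iter(omega.values())))
    # w_{i,j} := omega(j)_i, stored 0-based as W[i][j - 1]
    W = [[omega[j][i] for j in range(1, n + 1)] for i in range(d)]
    return S, W
```

For $F = \{1, 2\}$ with $\omega(1) = (1, 0)$ and $\omega(2) = (2, 5)$, the products $w_i \mathbf{1}_F$ recover $\omega(F) = (3, 5)$.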


From Membership to Linear Optimization 


A membership oracle for a set $S \subseteq \mathbb{Z}^n$ is one that, queried on $x \in \mathbb{Z}^n$, asserts whether or not $x \in S$. An augmentation oracle for $S$ is one that, queried on $x \in S$ and $w \in \mathbb{Z}^n$, either returns an $\hat{x} \in S$ with $w\hat{x} > wx$, i.e. a better point of $S$, or asserts that none exists, i.e. $x$ is optimal for the linear discrete optimization problem over $S$.

A membership oracle presentation of $S$ is very broad and available in all reasonable applications, but reveals little information on $S$, making it hard to use. However, as we now show, the edge-directions of $\mathrm{conv}(S)$ allow one to convert membership to augmentation.


Lemma 4 There is a strongly polynomial time algorithm that, given set $S \subseteq \{0,1\}^n$ presented by a membership oracle, $x \in S$, $w \in \mathbb{Z}^n$, and set $E \subseteq \mathbb{Z}^n$ covering all edge-directions of the polytope $\mathrm{conv}(S)$, encoded as $[n, |E|; \langle x, w, E \rangle]$, either returns a better point $\hat{x} \in S$, that is, one satisfying $w\hat{x} > wx$, or asserts that none exists.


Proof Each edge of $P := \mathrm{conv}(S)$ is the difference of two $\{0,1\}$-vectors. Therefore, each edge direction of $P$ is, up to scaling, a $\{-1, 0, 1\}$-vector. Thus, scaling $e := \frac{1}{\|e\|_\infty}e$ and setting $e := -e$ if necessary, we may and will assume that $e \in \{-1, 0, 1\}^n$ and $we \ge 0$ for all $e \in E$. Now, using the membership oracle, check if there is an $e \in E$ such that $x + e \in S$ and $we > 0$. If there is such an $e$ then output $\hat{x} := x + e$, which is a better point, whereas if there is no such $e$ then terminate asserting that no better point exists.

Clearly, if the algorithm outputs an $\hat{x}$ then it is indeed a better point. Conversely, suppose $x$ is not a maximizer of $w$ over $S$. Since $S \subseteq \{0,1\}^n$, the point $x$ is a vertex of $P$. Since $x$ is not a maximizer of $w$, there is an edge $[x, \hat{x}]$ of $P$ with $\hat{x}$ a vertex satisfying $w\hat{x} > wx$. But then $e := \hat{x} - x$ is the unique $\{-1, 0, 1\}$-valued edge-direction of $[x, \hat{x}]$ with $we \ge 0$, and hence $e \in E$. Thus, the algorithm will find and output $\hat{x} = x + e$ as it should. □
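The augmentation step in the proof of Lemma 4 is short enough to write out. A minimal Python sketch, assuming the membership oracle is a Python predicate `member` and $E$ is a list of integer tuples (both representations are assumptions); vectors of $E$ that are not multiples of a $\{-1,0,1\}$-vector are skipped, since they cannot be edge-directions of a polytope with $\{0,1\}$-vertices.

```python
def augment_01(member, x, w, E):
    """Return a point x + e of S with w.(x + e) > w.x, or None if x maximizes
    w over S, where S lies in {0,1}^n and E covers all edge-directions of conv(S)."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    for e in E:
        m = max((abs(t) for t in e), default=0)
        if m == 0 or any(t % m for t in e):
            continue  # not a multiple of a {-1,0,1}-vector, so not an edge of conv(S)
        e = [t // m for t in e]  # e := e / ||e||_inf, now in {-1,0,1}^n
        if dot(w, e) < 0:
            e = [-t for t in e]  # orient e so that w.e >= 0
        y = tuple(a + b for a, b in zip(x, e))
        if dot(w, e) > 0 and member(y):  # at most one membership query per e
            return y
    return None
```

The loop makes at most $|E|$ membership queries and $O(n|E|)$ arithmetic operations, in line with the strongly polynomial bound of Lemma 4.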


An augmentation oracle presentation of a finite $S$ allows one to solve the linear discrete optimization problem $\max\{wx : x \in S\}$ over $S$ by starting from any feasible $x \in S$ and repeatedly augmenting it until an optimal solution $x^* \in S$ is reached. The next lemma bounds the running time needed to reach optimality using this procedure. While the running time is polynomial in the binary length of the linear functional $w$ and the initial point $x$, it is more sensitive to the radius $\rho(S)$ of the feasible set $S$, and is polynomial only in its unary length. The lemma is an adaptation of a result of [30,57] (stated therein for $\{0,1\}$-sets), which makes use of bit-scaling ideas going back to [23].


Lemma 5 There is a polynomial time algorithm that, given finite set $S \subseteq \mathbb{Z}^n$ presented by an augmentation oracle, $x \in S$, and $w \in \mathbb{Z}^n$, encoded as $[\rho(S), \langle x, w \rangle]$, provides an optimal solution $x^* \in S$ to the linear discrete optimization problem $\max\{wz : z \in S\}$.


Proof Let $k := \max_{j=1}^n \lceil \log_2(|w_j| + 1) \rceil$ and note that $k \le \langle w \rangle$. For $i = 0, \dots, k$ define a linear functional $u_i = (u_{i,1}, \dots, u_{i,n}) \in \mathbb{Z}^n$ by $u_{i,j} := \mathrm{sign}(w_j)\lfloor 2^{i-k}|w_j| \rfloor$ for $j = 1, \dots, n$. Then $u_0 = 0$, $u_k = w$, and $u_i - 2u_{i-1} \in \{-1, 0, 1\}^n$ for all $i = 1, \dots, k$.

We now describe how to construct a sequence of points $y_0, y_1, \dots, y_k \in S$ such that $y_i$ is an optimal solution to $\max\{u_iy : y \in S\}$ for all $i$. First note that all points of $S$ are optimal for $u_0 = 0$ and hence we can take $y_0 := x$ to be the point of $S$ given as part of the input. We now explain how to determine $y_i$ from $y_{i-1}$ for $i = 1, \dots, k$. Suppose $y_{i-1}$ has been determined. Set $\tilde{y} := y_{i-1}$. Query the augmentation oracle on $\tilde{y} \in S$ and $u_i$; if the oracle returns a better point $\hat{y}$ then set $\tilde{y} := \hat{y}$ and repeat, whereas if it asserts that there is no better point then the optimal solution for $u_i$ is read off to be $y_i := \tilde{y}$. We now bound the number of calls to the oracle. Each time the oracle is queried on $\tilde{y}$ and $u_i$ and returns a better point $\hat{y}$, the improvement is by at least one, i.e. $u_i(\hat{y} - \tilde{y}) \ge 1$; this is so because $u_i$, $\hat{y}$ and $\tilde{y}$ are integer. Thus, the number of necessary augmentations from $y_{i-1}$ to $y_i$ is at most the total improvement, which we claim satisfies

$$u_i(y_i - y_{i-1}) = (u_i - 2u_{i-1})(y_i - y_{i-1}) + 2u_{i-1}(y_i - y_{i-1}) \le 2n\rho + 0 = 2n\rho,$$

where $\rho := \rho(S)$. Indeed, $u_i - 2u_{i-1} \in \{-1, 0, 1\}^n$ and $y_i, y_{i-1} \in S \subseteq [-\rho, \rho]^n$ imply $(u_i - 2u_{i-1})(y_i - y_{i-1}) \le 2n\rho$; and $y_{i-1}$ optimal for $u_{i-1}$ gives $u_{i-1}(y_i - y_{i-1}) \le 0$.

Thus, after a total number of at most $2n\rho k$ calls to the oracle we obtain $y_k$, which is optimal for $u_k$. Since $w = u_k$ we can output $x^* := y_k$ as the desired optimal solution to the linear discrete optimization problem. Clearly the number $2n\rho k$ of calls to the oracle, as well as the number of arithmetic operations and binary length of numbers occurring during the algorithm, are polynomial in $\rho(S)$, $\langle x, w \rangle$. This completes the proof. □
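The bit-scaling scheme in the proof of Lemma 5 can be sketched in Python as follows. The augmentation oracle is assumed to be a callable `augment(y, u)` returning a better point or `None`; `brute_force_augmentation` is a hypothetical oracle over an explicit list, used only to try the sketch. The code uses `abs(w_j).bit_length()`, which equals $\lceil \log_2(|w_j| + 1) \rceil$, and a right shift for $\lfloor 2^{i-k}|w_j| \rfloor$; it also counts augmentations so that the bound $2n\rho k$ can be checked.

```python
def bit_scaling_optimize(augment, x, w):
    """Maximize w.z over a finite S from a feasible x, given augment(y, u) that
    returns a point of S better than y for u, or None. Returns (x_star, augmentations)."""
    k = max((abs(wj).bit_length() for wj in w), default=0)  # ceil(log2(|w_j| + 1))
    y, augmentations = tuple(x), 0  # y_0 := x is optimal for u_0 = 0
    for i in range(1, k + 1):
        # u_i[j] = sign(w_j) * floor(2^(i-k) * |w_j|), so that u_k = w
        u = [(1 if wj > 0 else -1) * (abs(wj) >> (k - i)) for wj in w]
        better = augment(y, u)
        while better is not None:  # walk from y_{i-1} to y_i
            y, augmentations = tuple(better), augmentations + 1
            better = augment(y, u)
    return y, augmentations


def brute_force_augmentation(S):
    """Hypothetical augmentation oracle over an explicitly listed finite S."""
    def augment(y, u):
        value = sum(a * b for a, b in zip(u, y))
        return next((z for z in S if sum(a * b for a, b in zip(u, z)) > value), None)
    return augment
```

Even with a naive oracle that returns the first better point it finds, each phase needs at most $2n\rho$ augmentations, because $y_{i-1}$ is already optimal for the coarser functional $u_{i-1}$.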


We conclude this preparatory subsection by record- 
ing the following result of [24] which incorporates 
the heavy simultaneous Diophantine approximation 
of [44]. 


Proposition 1 There is a strongly polynomial time algorithm that, given $w \in \mathbb{Z}^n$, encoded as $[n; \langle w \rangle]$, produces $\hat{w} \in \mathbb{Z}^n$, whose binary length $\langle \hat{w} \rangle$ is polynomially bounded in $n$ and independent of $w$, and with $\mathrm{sign}(\hat{w}z) = \mathrm{sign}(wz)$ for every $z \in \{-1, 0, 1\}^n$.


Linear and Convex Combinatorial Optimization 
in Strongly Polynomial Time 


Combining the preparatory statements of Sect. “From Membership to Linear Optimization” with Theorem 6, we can now solve the convex combinatorial optimization problem over a set $S \subseteq \{0,1\}^n$ presented by a membership oracle and endowed with a set covering all edge-directions of $\mathrm{conv}(S)$ in strongly polynomial time. We start with the special case of linear combinatorial optimization.


Theorem 8 There is a strongly polynomial time algorithm that, given set $S \subseteq \{0,1\}^n$ presented by a membership oracle, $x \in S$, $w \in \mathbb{Z}^n$, and set $E \subseteq \mathbb{Z}^n$ covering all edge-directions of the polytope $\mathrm{conv}(S)$, encoded as $[n, |E|; \langle x, w, E \rangle]$, provides an optimal solution $x^* \in S$ to the linear combinatorial optimization problem $\max\{wz : z \in S\}$.


Proof First, an augmentation oracle for $S$ can be simulated using the membership oracle, in strongly polynomial time, by applying the algorithm of Lemma 4.

Next, using the simulated augmentation oracle for $S$, we can now do linear optimization over $S$ in strongly polynomial time as follows. First, apply to $w$ the algorithm of Proposition 1 and obtain $\hat{w} \in \mathbb{Z}^n$ whose binary length $\langle \hat{w} \rangle$ is polynomially bounded in $n$, which satisfies $\mathrm{sign}(\hat{w}z) = \mathrm{sign}(wz)$ for every $z \in \{-1, 0, 1\}^n$. Since $S \subseteq \{0,1\}^n$, it is finite and has radius $\rho(S) = 1$. Now apply the algorithm of Lemma 5 to $S$, $x$ and $\hat{w}$, and obtain a maximizer $x^*$ of $\hat{w}$ over $S$. For every $y \in \{0,1\}^n$ we then have $x^* - y \in \{-1, 0, 1\}^n$ and hence $\mathrm{sign}(\hat{w}(x^* - y)) = \mathrm{sign}(w(x^* - y))$. So $x^*$ is also a maximizer of $w$ over $S$ and hence an optimal solution to the given linear combinatorial optimization problem. Now, $\rho(S) = 1$, $\langle \hat{w} \rangle$ is polynomial in $n$, and $x \in \{0,1\}^n$ and hence $\langle x \rangle$ is linear in $n$. Thus, the entire length of the input $[\rho(S), \langle x, \hat{w} \rangle]$ to the polynomial-time algorithm of Lemma 5 is polynomial in $n$, and so its running time is in fact strongly polynomial on that input. □


Combining Theorems 6 and 8 we recover at once the 
following result of [49]. 


Theorem 9 For every fixed $d$ there is a strongly polynomial time algorithm that, given set $S \subseteq \{0,1\}^n$ presented by a membership oracle, $x \in S$, vectors $w_1, \dots, w_d \in \mathbb{Z}^n$, set $E \subseteq \mathbb{Z}^n$ covering all edge-directions of the polytope $\mathrm{conv}(S)$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $[n, |E|; \langle x, w_1, \dots, w_d, E \rangle]$, provides an optimal solution $x^* \in S$ to the convex combinatorial optimization problem

$$\max\{c(w_1z, \dots, w_dz) : z \in S\}.$$


Proof Since $S$ is nonempty, a linear discrete optimization oracle for $S$ can be simulated in strongly polynomial time by the algorithm of Theorem 8. Using this simulated oracle, we can apply the algorithm of Theorem 6 and solve the given convex combinatorial optimization problem in strongly polynomial time. □


Linear and Convex Discrete Optimization 
over any Set in Pseudo Polynomial Time 


In Sect. “Linear and Convex Combinatorial Optimization in Strongly Polynomial Time” above we developed strongly polynomial time algorithms for linear and convex discrete optimization over $\{0,1\}$-sets. We now provide extensions of these algorithms to arbitrary finite sets $S \subseteq \mathbb{Z}^n$. As can be expected, the algorithms become slower.

We start by recording the following fundamental result of Khachiyan [40] asserting that linear programming is polynomial time solvable via the ellipsoid method [63]. This result will be used below as well as several more times later in the monograph.


Proposition 2 There is a polynomial time algorithm that, given $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$, and $w \in \mathbb{Z}^n$, encoded as $[\langle A, b, w \rangle]$, either asserts that $P := \{x \in \mathbb{R}^n : Ax \le b\}$ is empty, or asserts that the linear functional $wx$ is unbounded over $P$, or provides a vertex $v \in \mathrm{vert}(P)$ which is an optimal solution to the linear program $\max\{wx : x \in P\}$.


The following analog of Lemma 4 shows how to convert membership to augmentation in polynomial time, albeit no longer in strongly polynomial time. Here, both the given initial point $x$ and the returned better point $\hat{x}$, if any, are vertices of $\mathrm{conv}(S)$.


Lemma 6 There is a polynomial time algorithm that, given finite set $S \subseteq \mathbb{Z}^n$ presented by a membership oracle, vertex $x$ of the polytope $\mathrm{conv}(S)$, $w \in \mathbb{Z}^n$, and set $E \subseteq \mathbb{Z}^n$ covering all edge-directions of $\mathrm{conv}(S)$, encoded as $[\rho(S), \langle x, w, E \rangle]$, either returns a better vertex $\hat{x}$ of $\mathrm{conv}(S)$, that is, one satisfying $w\hat{x} > wx$, or asserts that none exists.


Proof Dividing each vector $e \in E$ by the greatest common divisor of its entries and setting $e := -e$ if necessary, we can and will assume that each $e$ is primitive, that is, its entries are relatively prime integers, and $we \ge 0$. Using the membership oracle, construct the subset $F \subseteq E$ of those $e \in E$ for which $x + re \in S$ for some $r \in \{1, \dots, 2\rho(S)\}$. Let $G \subseteq F$ be the subset of those $f \in F$ for which $wf > 0$. If $G$ is empty then terminate asserting that there is no better vertex. Otherwise, consider the convex cone $\mathrm{cone}(F)$ generated by $F$. It is clear that $x$ is incident on an edge of $\mathrm{conv}(S)$ in direction $f$ if and only if $f$ is an extreme ray of $\mathrm{cone}(F)$. Moreover, since $G = \{f \in F : wf > 0\}$ is nonempty, there must be an extreme ray of $\mathrm{cone}(F)$ which lies in $G$. Now $f \in F$ is an extreme ray of $\mathrm{cone}(F)$ if and only if there do not exist nonnegative $\lambda_e$, $e \in F \setminus \{f\}$, such that $f = \sum_{e \in F \setminus \{f\}} \lambda_e e$; this can be checked in polynomial time using linear programming (Proposition 2). Applying this procedure to each $f \in G$, identify an extreme ray $g \in G$. Now, using the membership oracle, determine the largest $r \in \{1, \dots, 2\rho(S)\}$ for which $x + rg \in S$. Output $\hat{x} := x + rg$, which is a better vertex of $\mathrm{conv}(S)$. □


We now prove the extensions of Theorems 8 and 9 to arbitrary, not necessarily $\{0,1\}$-valued, finite sets. While the running time remains polynomial in the binary length of the weights $w_1, \dots, w_d$ and the set of edge-directions $E$, it is more sensitive to the radius $\rho(S)$ of the feasible set $S$, and is polynomial only in its unary length. Here, the initial feasible point and the optimal solution output by the algorithms are vertices of $\mathrm{conv}(S)$. Again, we start with the special case of linear discrete optimization.


Theorem 10 There is a polynomial time algorithm that, given finite $S \subseteq \mathbb{Z}^n$ presented by a membership oracle, vertex $x$ of the polytope $\mathrm{conv}(S)$, $w \in \mathbb{Z}^n$, and set $E \subseteq \mathbb{Z}^n$ covering all edge-directions of $\mathrm{conv}(S)$, encoded as $[\rho(S), \langle x, w, E \rangle]$, provides an optimal solution $x^* \in S$ to the linear discrete optimization problem $\max\{wz : z \in S\}$.


Proof Apply the algorithm of Lemma 5 to the given data. Consider any query $x' \in S$, $w' \in \mathbb{Z}^n$ made by that algorithm to an augmentation oracle for $S$. To answer it, apply the algorithm of Lemma 6 to $x'$ and $w'$. Since the first query made by the algorithm of Lemma 5 is on the given input vertex $x' := x$, and any subsequent query is on a point $x' := \hat{x}$ which was the reply of the augmentation oracle to the previous query (see proof of Lemma 5), we see that the algorithm of Lemma 6 will always be asked on a vertex of $\mathrm{conv}(S)$ and reply with another. Thus, the algorithm of Lemma 6 can answer all augmentation queries and enables the polynomial time solution of the given problem. □


Theorem 11 For every fixed $d$ there is a polynomial time algorithm that, given finite set $S \subseteq \mathbb{Z}^n$ presented by a membership oracle, vertex $x$ of $\mathrm{conv}(S)$, vectors $w_1, \dots, w_d \in \mathbb{Z}^n$, set $E \subseteq \mathbb{Z}^n$ covering all edge-directions of the polytope $\mathrm{conv}(S)$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $[\rho(S), \langle x, w_1, \dots, w_d, E \rangle]$, provides an optimal solution $x^* \in S$ to the convex discrete optimization problem

$$\max\{c(w_1z, \dots, w_dz) : z \in S\}.$$


Proof Since $S$ is nonempty, a linear discrete optimization oracle for $S$ can be simulated in polynomial time by the algorithm of Theorem 10. Using this simulated oracle, we can apply the algorithm of Theorem 6 and solve the given problem in polynomial time. □
Some Applications


Positive Semidefinite Quadratic Binary Programming
The quadratic binary programming problem is the following: given an $n \times n$ matrix $M$, find a vector $x \in \{0,1\}^n$ maximizing the quadratic form $x^TMx$ induced by $M$. We consider here the instance where $M$ is positive semidefinite, in which case it can be assumed to be presented as $M = W^TW$ with $W$ a given $d \times n$ matrix. Already this restricted version is very broad: if the rank $d$ of $W$ and $M$ is variable then, as mentioned in the introduction, the problem is NP-hard. We now show that, for fixed $d$, Theorem 9 implies at once that the problem is strongly polynomial time solvable (see also [2]).


Corollary 6 For every fixed $d$ there is a strongly polynomial time algorithm that, given $W \in \mathbb{Z}^{d \times n}$, encoded as $[n; \langle W \rangle]$, finds $x^* \in \{0,1\}^n$ maximizing the form $x^TW^TWx$.

Proof Let $S := \{0,1\}^n$ and let $E := \{\mathbf{1}_1, \dots, \mathbf{1}_n\}$ be the set of unit vectors in $\mathbb{R}^n$. Then $P := \mathrm{conv}(S)$ is just the $n$-cube $[0,1]^n$ and hence $E$ covers all edge-directions of $P$. A membership oracle for $S$ is easily and efficiently realizable and $x := 0 \in S$ is an initial point. Also, $|E|$ and $\langle E \rangle$ are polynomial in $n$, and $E$ is easily and efficiently computable.

Now, for $i = 1, \dots, d$ define $w_i \in \mathbb{Z}^n$ to be the $i$-th row of the matrix $W$, that is, $w_{i,j} := W_{i,j}$ for all $i, j$. Finally, let $c: \mathbb{R}^d \to \mathbb{R}$ be the squared $\ell_2$ norm given by $c(y) := \|y\|_2^2 := \sum_{i=1}^d y_i^2$, so that $x^TW^TWx = \|Wx\|_2^2 = c(w_1x, \dots, w_dx)$, and note that the comparison of $c(y)$ and $c(z)$ can be done for $y, z \in \mathbb{Z}^d$ in time polynomial in $\langle y, z \rangle$ using a constant number of arithmetic operations, providing a strongly polynomial time realization of a comparison oracle for $c$.

This translates the given quadratic programming problem into a convex combinatorial optimization problem over $S$, which can be solved in strongly polynomial time by applying the algorithm of Theorem 9 to $S$, $x = 0$, $w_1, \dots, w_d$, $E$, and $c$. □
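For $d = 2$ the reduction behind Corollary 6 can be tried directly. In this Python sketch, one functional $h$ is sampled inside each cell of the arrangement of lines $\{h : hW_{\cdot j} = 0\}$ (the normal fan of the zonotope generated by the columns of $W$) by a floating-point angular sweep, a simpler substitute for Lemma 2 that is adequate only for small integer data. The linear optimization oracle over the cube is then trivial: it sets $x_j = 1$ exactly when $g_j = hW_{\cdot j} > 0$. The function name is illustrative.

```python
import math


def max_quadratic_binary_d2(W):
    """Maximize x^T W^T W x = ||W x||^2 over x in {0,1}^n for a 2 x n integer W."""
    n = len(W[0])
    cols = [(W[0][j], W[1][j]) for j in range(n)]
    two_pi = 2 * math.pi
    angles = set()
    for a, b in cols:
        if (a, b) != (0, 0):
            t = math.atan2(a, -b) % two_pi  # angle of the normal (-b, a) of column j
            angles.add(t)
            angles.add((t + math.pi) % two_pi)
    crit = sorted(angles) or [0.0]
    best_x, best_val = [0] * n, 0
    for idx, t in enumerate(crit):
        nxt = crit[idx + 1] if idx + 1 < len(crit) else crit[0] + two_pi
        if nxt - t < 1e-12:
            continue  # numerically duplicate critical angles (parallel columns)
        mid = (t + nxt) / 2  # strictly inside one cell of the arrangement
        h = (math.cos(mid), math.sin(mid))
        # linear oracle over the cube for g_j = h . column_j
        x = [1 if h[0] * a + h[1] * b > 0 else 0 for a, b in cols]
        y0 = sum(W[0][j] * x[j] for j in range(n))
        y1 = sum(W[1][j] * x[j] for j in range(n))
        if y0 * y0 + y1 * y1 > best_val:
            best_x, best_val = x, y0 * y0 + y1 * y1
    return best_x, best_val
```

The sweep visits $O(n)$ cells and hence makes $O(n)$ linear-oracle calls, instead of the $2^n$ evaluations of exhaustive search.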


Matroids and Maximum Norm Spanning Trees
Optimization problems over matroids form a fundamental class of combinatorial optimization problems. Here we discuss matroid bases, but everything works for independent sets as well. Recall that a family $\mathcal{B}$ of subsets of $\{1, \dots, n\}$ is the family of bases of a matroid if all members of $\mathcal{B}$ have the same cardinality, called the rank of the matroid, and for every $B, B' \in \mathcal{B}$ and $i \in B \setminus B'$ there is a $j \in B'$ such that $B \setminus \{i\} \cup \{j\} \in \mathcal{B}$. Useful models include the graphic matroid of a graph $G$ with edge set $\{1, \dots, n\}$ and $\mathcal{B}$ the family of spanning forests of $G$, and the linear matroid of an $m \times n$ matrix $A$ with $\mathcal{B}$ the family of sets of indices of maximal linearly independent subsets of columns of $A$.

It is well known that linear combinatorial optimization over matroids can be solved by the fast greedy algorithm [22]. We now show that, as a consequence of Theorem 9, convex combinatorial optimization over a matroid presented by a membership oracle can be solved in strongly polynomial time as well (see also [34,47]). We state the result for bases, but the analogous statement for independent sets holds as well. We say that $S \subseteq \{0,1\}^n$ is the set of bases of a matroid if it is the set of indicators of the family $\mathcal{B}$ of bases of some matroid, in which case we call $\mathrm{conv}(S)$ the matroid base polytope.


Corollary 7 For every fixed $d$ there is a strongly polynomial time algorithm that, given a set $S \subseteq \{0,1\}^n$ of bases of a matroid presented by a membership oracle, $x \in S$, $w_1,\dots,w_d \in \mathbb{Z}^n$, and a convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $[n;\langle x, w_1,\dots,w_d\rangle]$, solves the convex matroid optimization problem
$$\max\{c(w_1z,\dots,w_dz) : z \in S\}.$$


Proof Let $E := \{\mathbf{1}_i - \mathbf{1}_j : 1 \le i < j \le n\}$ be the set of differences of pairs of unit vectors in $\mathbb{R}^n$. We claim that $E$ covers all edge-directions of the matroid base polytope $P := \mathrm{conv}(S)$. Consider any edge $e = [y, y']$ of $P$ with $y, y' \in S$ and let $B := \mathrm{supp}(y)$ and $B' := \mathrm{supp}(y')$ be the corresponding bases. Let $h \in \mathbb{R}^n$ be a linear functional uniquely maximized over $P$ at $e$. If $B \setminus B' = \{i\}$ is a singleton then $B' \setminus B = \{j\}$ is a singleton as well, in which case $y - y' = \mathbf{1}_i - \mathbf{1}_j$ and we are done. Suppose then, indirectly, that it is not, and pick an element $i$ in the symmetric difference $B \Delta B' := (B \setminus B') \cup (B' \setminus B)$ of minimum value $h_i$. Without loss of generality assume $i \in B \setminus B'$. Then there is a $j \in B' \setminus B$ such that $B'' := B \setminus \{i\} \cup \{j\}$ is also a basis. Let $y'' \in S$ be the indicator of $B''$. Now $|B \Delta B'| > 2$ implies that $B''$ is neither $B$ nor $B'$. By the choice of $i$ we have $hy'' = hy - h_i + h_j \ge hy$. So $y''$ is also a maximizer of $h$ over $P$ and hence $y'' \in e$. But no $\{0,1\}$-vector is a convex combination of others, a contradiction.

Now, $|E| = \binom{n}{2}$ and $E \subset \{-1,0,1\}^n$ imply that $|E|$ and $\langle E\rangle$ are polynomial in $n$. Moreover, $E$ can be easily computed in strongly polynomial time. Therefore, applying the algorithm of Theorem 9 to the given data and the set $E$, the convex discrete optimization problem over $S$ can be solved in strongly polynomial time. □


One important application of Corollary 7 is a polynomial time algorithm for computing the universal Gröbner basis of any system of polynomials having a finite
set of common zeros in fixed (but arbitrary) number of 
variables, as well as the construction of the state polyhe- 
dron of any member of the Hilbert scheme, see [5,51]. 
Other important applications are in the field of alge- 
braic statistics [52], in particular for optimal experimen- 
tal design. These applications are beyond our scope here 
and will be discussed elsewhere. 

Here is another concrete example of a convex ma- 
troid optimization application. 


Example 1 (MAXIMUM NORM SPANNING TREE). Fix any positive integer $d$. Let $\|\cdot\|_p: \mathbb{R}^d \to \mathbb{R}$ be the $l_p$ norm given by $\|x\|_p := \left(\sum_{i=1}^d |x_i|^p\right)^{1/p}$ for $1 \le p < \infty$ and $\|x\|_\infty := \max_{i=1}^d |x_i|$. Let $G$ be a connected graph with edge set $N := \{1,\dots,n\}$. For $j = 1,\dots,n$ let $u_j \in \mathbb{Z}^d$ be a weight vector representing the values of edge $j$ under some $d$ criteria. The weight of a subset $T \subseteq N$ is the sum $\sum_{j \in T} u_j$ representing the total values of $T$ under the $d$ criteria. The problem is to find a spanning tree $T$ of $G$ whose weight has maximum $l_p$ norm, that is, a spanning tree $T$ maximizing $\|\sum_{j \in T} u_j\|_p$.

Define $w_1,\dots,w_d \in \mathbb{Z}^n$ by $w_{i,j} := u_{j,i}$ for $i = 1,\dots,d$, $j = 1,\dots,n$. Let $S \subseteq \{0,1\}^n$ be the set of indicators of spanning trees of $G$. Then, in time polynomial in $n$, a membership oracle for $S$ is realizable, and an initial $x \in S$ is obtainable as the indicator of any greedily constructible spanning tree $T$. Finally, define the convex functional $c := \|\cdot\|_p$. Then for the most common values $p = 1, 2, \infty$, and in fact for any $p \in \mathbb{N}$, the comparison of $c(y)$ and $c(z)$ can be done for $y, z \in \mathbb{Z}^d$ in time polynomial in $\langle y, z, p\rangle$ by computing and comparing the integer valued $p$th powers $\|y\|_p^p$ and $\|z\|_p^p$ (for $p = \infty$ one compares $\max_i |y_i|$ and $\max_i |z_i|$ directly). Thus, by Corollary 7, this problem is solvable in time polynomial in $\langle u_1,\dots,u_n, p\rangle$.
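To make the objective concrete, here is a brute-force Python sketch on a hypothetical 4-vertex graph with $d = 2$. It is not the algorithm of Corollary 7: it simply enumerates all $(|V|-1)$-edge subsets, keeps the spanning trees, and ranks their weights by the integer key $\|y\|_p^p$ (or $\max_i |y_i|$ for $p = \infty$), which is exactly the comparison described above. The names `max_norm_spanning_tree` and `norm_key` are illustrative.

```python
from itertools import combinations

def is_spanning_tree(num_vertices, edges, subset):
    """True if the edges indexed by subset form a spanning tree (|V|-1 edges, no cycle)."""
    if len(subset) != num_vertices - 1:
        return False
    parent = list(range(num_vertices))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    for j in subset:
        a, b = find(edges[j][0]), find(edges[j][1])
        if a == b:
            return False  # edge j closes a cycle
        parent[a] = b
    return True

def norm_key(y, p):
    """Integer key ordering vectors by l_p norm: ||y||_p^p, or max |y_i| if p is None (infinity)."""
    if p is None:
        return max(abs(v) for v in y)
    return sum(abs(v) ** p for v in y)

def max_norm_spanning_tree(num_vertices, edges, u, p):
    """Exhaustive search over (|V|-1)-edge subsets; exponential, for illustration only."""
    d = len(u[0])
    best, best_key = None, None
    for subset in combinations(range(len(edges)), num_vertices - 1):
        if is_spanning_tree(num_vertices, edges, subset):
            weight = [sum(u[j][i] for j in subset) for i in range(d)]
            key = norm_key(weight, p)
            if best_key is None or key > best_key:
                best, best_key = subset, key
    return best, best_key

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
u = [(3, 0), (0, 3), (-2, 1), (1, -2), (2, 3)]
```

On this instance the tree with edges 1, 2, 4 (0-based), of weight $(0,7)$, wins for $p = 2$, $3$ and $\infty$; for $p = 1$ several trees tie at norm 7.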


Linear N-fold Integer Programming 


In this section we develop a theory of linear n-fold in- 
teger programming, which leads to the polynomial time 
solution of broad classes of linear integer programming 
problems in variable dimension. This will be extended 
to convex n-fold integer programming in Sect. “Convex 
Integer Programming”. 

In Sect. “Oriented Augmentation and Linear Opti- 
mization” we describe an adaptation of a result of [56] 
involving an oriented version of the augmentation or- 
acle of Sect. “From Membership to Linear Optimiza- 
tion”. In Sect. “Graver Bases and Linear Integer Pro- 
gramming” we discuss Graver bases and their applica- 
tion to linear integer programming. In Sect. “Graver 
Bases of N-fold Matrices” we show that Graver bases 
of n-fold matrices can be computed efficiently. In Sect. 
“Linear N-fold Integer Programming in Polynomial 
Time” we combine the preparatory statements from 
Sect. “Oriented Augmentation and Linear Optimiza- 
tion”, Sect. “Graver Bases and Linear Integer Program- 
ming”, and Sect. “Graver Bases of N-fold Matrices”, 
and prove the main result of this section, asserting that 
linear n-fold integer programming is polynomial time 
solvable. We conclude with some applications in Sect. 
“Some Applications”. 

Here and in Sect. “Convex Integer Programming” 
we concentrate on discrete optimization problems over 
a set S presented as the set of integer points satisfying an 
explicitly given system of linear inequalities. Without 
loss of generality we may and will assume that $S$ is given either in standard form $S := \{x \in \mathbb{N}^n : Ax = b\}$ where $A \in \mathbb{Z}^{m\times n}$ and $b \in \mathbb{Z}^m$, or in the form
$$S := \{x \in \mathbb{Z}^n : Ax = b,\ l \le x \le u\}$$
where $l, u \in \mathbb{Z}_\infty^n$ and $\mathbb{Z}_\infty := \mathbb{Z} \uplus \{\pm\infty\}$, so that some of the variables may be bounded below or above and some unbounded. Thus, $S$ is no longer presented by an oracle, but by the explicit data $A$, $b$ and possibly $l$, $u$. In this setup we refer to discrete optimization over $S$ also as integer programming over $S$. As usual, an algorithm solving the problem must either provide an $x \in S$ maximizing $wx$ over $S$, or assert that none exists (either because $S$ is empty or because the objective function is unbounded over the underlying polyhedron). We will sometimes assume that an initial point $x \in S$ is given, in which case $b$ will be computed as $b := Ax$ and not be part of the input.


Oriented Augmentation and Linear Optimization 


We have seen in Sect. “From Membership to Linear Optimization” that an augmentation oracle presentation of a finite set $S \subset \mathbb{Z}^n$ allows one to solve the linear discrete optimization problem over $S$. However, the running time of the algorithm of Lemma 5, which demonstrated this, was polynomial in the unary length of the radius $\rho(S)$ of the feasible set rather than in its binary length.

In this subsection we discuss a recent result of [56] and show that, when $S$ is presented by a suitable stronger oriented version of the augmentation oracle, the linear optimization problem can be solved by a much faster algorithm, whose running time is in fact polynomial in the binary length $\langle\rho(S)\rangle$. The key idea behind this algorithm is that it gives preference to augmentations along interior points of $\mathrm{conv}(S)$ staying far off its boundary. It is inspired by and extends the combinatorial interior point algorithm of [61].

For any vector $g \in \mathbb{R}^n$, let $g^+, g^- \in \mathbb{R}_+^n$ denote its positive and negative parts, defined by $g_j^+ := \max\{g_j, 0\}$ and $g_j^- := -\min\{g_j, 0\}$ for $j = 1,\dots,n$. Note that both $g^+, g^-$ are nonnegative, $\mathrm{supp}(g) = \mathrm{supp}(g^+) \uplus \mathrm{supp}(g^-)$, and $g = g^+ - g^-$.

An oriented augmentation oracle for a set $S \subset \mathbb{Z}^n$ is one that, queried on $x \in S$ and $w_+, w_- \in \mathbb{Z}^n$, either returns an augmenting vector $g \in \mathbb{Z}^n$, defined to be one satisfying $x + g \in S$ and $w_+g^+ - w_-g^- > 0$, or asserts that none exists.

Note that this oracle involves two linear functionals $w_+, w_- \in \mathbb{Z}^n$ rather than one ($w_+$ and $w_-$ are two distinct independent vectors and not the positive and negative parts of one vector). The conditions on an augmenting vector $g$ indicate that it is a feasible direction and has positive value under the nonlinear objective function determined by $w_+, w_-$. Note that this oracle is indeed stronger than the augmentation oracle of Sect. “From Membership to Linear Optimization”: to answer a query $x \in S$, $w \in \mathbb{Z}^n$ to the latter, set $w_+ := w_- := w$, thereby obtaining $w_+g^+ - w_-g^- = wg$ for all $g$, and query the former on $x, w_+, w_-$; if it replies with an augmenting vector $g$ then reply with the better point $\hat{x} := x + g$, whereas if it asserts that no $g$ exists then assert that no better point exists.
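The definitions of $g^\pm$ and of the two oracles can be spelled out on an explicitly listed finite set. In the Python sketch below, which assumes $S$ is small enough to enumerate (a genuine oracle would not have this luxury), `oriented_augmentation_oracle` searches $S$ for an augmenting vector, and `augmentation_oracle` answers a plain augmentation query by setting $w_+ := w_- := w$, as in the reduction just described. All names are hypothetical.

```python
def pos_neg(g):
    """Positive and negative parts, so that g = g_plus - g_minus with both nonnegative."""
    g_plus = [max(gj, 0) for gj in g]
    g_minus = [-min(gj, 0) for gj in g]
    return g_plus, g_minus

def oriented_value(w_plus, w_minus, g):
    """The oriented objective w_+ g^+ - w_- g^-."""
    g_plus, g_minus = pos_neg(g)
    return (sum(a * b for a, b in zip(w_plus, g_plus))
            - sum(a * b for a, b in zip(w_minus, g_minus)))

def oriented_augmentation_oracle(S, x, w_plus, w_minus):
    """Brute force over an explicitly listed finite S: some g with x + g in S and positive
    oriented value, or None to assert that no augmenting vector exists."""
    for z in S:
        g = tuple(zj - xj for zj, xj in zip(z, x))
        if oriented_value(w_plus, w_minus, g) > 0:
            return g
    return None

def augmentation_oracle(S, x, w):
    """Plain augmentation query answered through the oriented oracle with w_+ = w_- = w.
    Returns a better point x + g, or None if no better point exists."""
    g = oriented_augmentation_oracle(S, x, w, w)
    return None if g is None else tuple(xj + gj for xj, gj in zip(x, g))
```

With $w_+ = w_-$ the oriented value collapses to $wg$, which is why the plain oracle can be simulated by a single oriented query.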


The following lemma is an adaptation of the result of [56] concerning sets of the form $S := \{x \in \mathbb{Z}^n : Ax = b,\ 0 \le x \le u\}$ of nonnegative integer points satisfying equations and upper bounds. However, the pair $A, b$ is neither explicitly needed nor does it affect the running time of the algorithm underlying the lemma. It suffices that $S$ is of that form. Moreover, an arbitrary lower bound vector $l$ rather than $0$ can be included. So it suffices to assume that $S$ coincides with the intersection of its affine hull and the set of integer points in a box, that is, $S = \mathrm{aff}(S) \cap \{x \in \mathbb{Z}^n : l \le x \le u\}$ where $l, u \in \mathbb{Z}^n$. We now describe and prove the algorithm of [56] adjusted to any lower and upper bounds $l, u$.


Lemma 7 There is a polynomial time algorithm that, given vectors $l, u \in \mathbb{Z}^n$, a set $S \subset \mathbb{Z}^n$ satisfying $S = \mathrm{aff}(S) \cap \{z \in \mathbb{Z}^n : l \le z \le u\}$ and presented by an oriented augmentation oracle, $x \in S$, and $w \in \mathbb{Z}^n$, encoded as $[\langle l, u, x, w\rangle]$, provides an optimal solution $x^* \in S$ to the linear discrete optimization problem $\max\{wz : z \in S\}$.


Proof We start with some strengthening adjustments to the oriented augmentation oracle. Let $\rho := \max\{\|l\|_\infty, \|u\|_\infty\}$ be an upper bound on the radius of $S$. Then any augmenting vector $g$ obtained from the oriented augmentation oracle when queried on $y \in S$ and $w_+, w_- \in \mathbb{Z}^n$ can be made in polynomial time to be exhaustive, that is, to satisfy $y + 2g \notin S$ (which means that no longer augmenting step in direction $g$ can be taken). Indeed, using binary search, find the largest $r \in \{1,\dots,2\rho\}$ for which $l \le y + rg \le u$; then $S = \mathrm{aff}(S) \cap \{z \in \mathbb{Z}^n : l \le z \le u\}$ implies $y + rg \in S$ and hence we can replace $g := rg$. So from here on we will assume that if there is an augmenting vector then the oracle returns an exhaustive one. Second, let $\mathbb{R}_\infty := \mathbb{R} \uplus \{\infty\}$ and for any vector $v \in \mathbb{R}^n$ let $v^{-1} \in \mathbb{R}_\infty^n$ denote its entry-wise reciprocal defined by $v_j^{-1} := 1/v_j$ if $v_j \neq 0$ and $v_j^{-1} := \infty$ if $v_j = 0$. For any $y \in S$, the vectors $(y - l)^{-1}$ and $(u - y)^{-1}$ are the reciprocals of the “entry-wise distance” of $y$ from the given lower and upper bounds. The algorithm will query the oracle on triples $y, w_+, w_-$ with $w_+ := w - \mu(u - y)^{-1}$ and $w_- := w + \mu(y - l)^{-1}$, where $\mu$ is a suitable positive scalar and $w$ is the input linear functional. The fact that such $w_+, w_-$ may have infinite entries does not cause any problem: indeed, if $g$ is an augmenting vector then $y + g \in S$ implies that $g_j^+ = 0$ whenever $y_j = u_j$ and $g_j^- = 0$ whenever $l_j = y_j$, so each infinite entry in $w_+$ or $w_-$ occurring in the expression $w_+g^+ - w_-g^-$ is multiplied by $0$ and hence zeroed out.

The algorithm proceeds in phases. Each phase $i$ starts with a feasible point $y_{i-1} \in S$ and performs repeated augmentations using the oriented augmentation oracle, terminating with a new feasible point $y_i \in S$ when no further augmentations are possible. The queries to the oracle make use of a positive scalar parameter $\mu_i$ fixed throughout the phase. The first phase ($i = 1$) starts with the input point $y_0 := x$ and sets $\mu_1 := \rho\|w\|_\infty$. Each further phase $i \ge 2$ starts with the point $y_{i-1}$ obtained from the previous phase and sets the parameter value $\mu_i := \frac{1}{2}\mu_{i-1}$ to be half its value in the previous phase. The algorithm terminates at the end of the first phase $i$ for which $\mu_i < \frac{1}{2n}$, and outputs $x^* := y_i$. Thus, the number of phases is at most $2 + \log_2(1 + 2n\rho\|w\|_\infty)$ and hence polynomial in $\langle l, u, w\rangle$.

We now describe the $i$th phase, which determines $y_i$ from $y_{i-1}$. Set $\mu_i$ as above and $\hat{y} := y_{i-1}$. Iterate the following: query the strengthened oriented augmentation oracle on $\hat{y}$, $w_+ := w - \mu_i(u - \hat{y})^{-1}$ and $w_- := w + \mu_i(\hat{y} - l)^{-1}$; if the oracle returns an exhaustive augmenting vector $g$ then set $\hat{y} := \hat{y} + g$ and repeat, whereas if it asserts that there is no augmenting vector then set $y_i := \hat{y}$ and complete the phase. If $\mu_i \ge \frac{1}{2n}$ then proceed to the $(i+1)$th phase, else output $x^* := y_i$ and terminate the algorithm.

It remains to show that the output of the algorithm is indeed an optimal solution and that the number of iterations (and hence calls to the oracle) in each phase is polynomial in the input. For this we need the following facts, the easy proofs of which are omitted:

1. For every feasible $y \in S$ and direction $g$ with $y + g \in S$ also feasible, we have
$$(u - y)^{-1}g^+ + (y - l)^{-1}g^- \le n.$$
2. For every $y \in S$ and direction $g$ with $y + g \in S$ but $y + 2g \notin S$, we have
$$(u - y)^{-1}g^+ + (y - l)^{-1}g^- > \frac{1}{2}.$$
3. For every feasible $y \in S$, direction $g$ with $y + g \in S$ also feasible, and $\mu > 0$, setting $w_+ := w - \mu(u - y)^{-1}$ and $w_- := w + \mu(y - l)^{-1}$ we have
$$w_+g^+ - w_-g^- = wg - \mu\left((u - y)^{-1}g^+ + (y - l)^{-1}g^-\right).$$


Now, consider the last phase $i$, for which $\mu_i < \frac{1}{2n}$, let $x^* := y_i := \hat{y}$ be the output of the algorithm at the end of this phase, and let $\hat{x} \in S$ be any optimal solution. Now, the phase is completed when the oracle, queried on the triple $\hat{y}$, $w_+ = w - \mu_i(u - \hat{y})^{-1}$, and $w_- = w + \mu_i(\hat{y} - l)^{-1}$, asserts that there is no augmenting vector. In particular, setting $g := \hat{x} - \hat{y}$, we find $w_+g^+ - w_-g^- \le 0$ and hence, by facts 1 and 3 above,
$$w\hat{x} - wx^* = wg \le \mu_i\left((u - \hat{y})^{-1}g^+ + (\hat{y} - l)^{-1}g^-\right) \le \mu_i n < \frac{1}{2}.$$
Since $w\hat{x}$ and $wx^*$ are integer, this implies that in fact $w\hat{x} - wx^* \le 0$ and hence the output $x^*$ of the algorithm is indeed an optimal solution to the given optimization problem.

Next we bound the number of iterations in each phase $i$ starting from $y_{i-1} \in S$. Let again $\hat{x} \in S$ be any optimal solution. Consider any iteration in that phase, where the oracle is queried on $\hat{y}$, $w_+ = w - \mu_i(u - \hat{y})^{-1}$, and $w_- = w + \mu_i(\hat{y} - l)^{-1}$, and returns an exhaustive augmenting vector $g$. We will now show that
$$w(\hat{y} + g) - w\hat{y} \ge \frac{1}{4n}\left(w\hat{x} - wy_{i-1}\right), \qquad (1)$$
that is, the increment in the objective value from $\hat{y}$ to the augmented point $\hat{y} + g$ is at least $\frac{1}{4n}$ times the difference between the optimal objective value $w\hat{x}$ and the objective value $wy_{i-1}$ of the point $y_{i-1}$ at the beginning of phase $i$. This shows that at most $4n$ such increments (and hence iterations) can occur in the phase before it is completed.

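To establish (1), we show that $wg > \frac{1}{2}\mu_i$ and $w\hat{x} - wy_{i-1} \le 2n\mu_i$; together these give $wg > \frac{1}{4n}(w\hat{x} - wy_{i-1})$. For the first inequality, note that $g$ is an exhaustive augmenting vector and so $w_+g^+ - w_-g^- > 0$ and $\hat{y} + 2g \notin S$ and hence, by facts 2 and 3, $wg > \mu_i\left((u - \hat{y})^{-1}g^+ + (\hat{y} - l)^{-1}g^-\right) > \frac{1}{2}\mu_i$. We proceed with the second inequality. If $i = 1$ (first phase) then this indeed holds since each of the $n$ entries of $\hat{x} - y_0$ is at most $2\rho$ in absolute value, so $w\hat{x} - wy_0 \le 2n\rho\|w\|_\infty = 2n\mu_1$. If $i \ge 2$, let $w_+ := w - \mu_{i-1}(u - y_{i-1})^{-1}$ and $w_- := w + \mu_{i-1}(y_{i-1} - l)^{-1}$. The $(i-1)$th phase was completed when the oracle, queried on the triple $y_{i-1}$, $w_+$, and $w_-$, asserted that there is no augmenting vector. In particular, for $g := \hat{x} - y_{i-1}$, we find $w_+g^+ - w_-g^- \le 0$ and so, by facts 1 and 3,
$$w\hat{x} - wy_{i-1} = wg \le \mu_{i-1}\left((u - y_{i-1})^{-1}g^+ + (y_{i-1} - l)^{-1}g^-\right) \le \mu_{i-1}n = 2n\mu_i. \quad □$$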

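The phase structure of this proof can be traced on a toy instance. The Python sketch below is only an illustration under strong assumptions: $S$ is listed explicitly, and a brute-force search over $S$ plays the role of the oriented augmentation oracle, made exhaustive by extending each step as far as it stays in $S$. The value $w_+g^+ - w_-g^-$ is evaluated through fact 3, so the infinite entries of $w_\pm$ never have to be formed, and `fractions.Fraction` keeps the comparisons with $\mu_i$ exact. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def distance_term(y, g, l, u):
    """(u - y)^{-1} g^+ + (y - l)^{-1} g^-, summing only over nonzero parts of g,
    so the infinite reciprocals (u_j = y_j or y_j = l_j) are never formed."""
    total = Fraction(0)
    for yj, gj, lj, uj in zip(y, g, l, u):
        if gj > 0:
            total += Fraction(gj, uj - yj)
        elif gj < 0:
            total += Fraction(-gj, yj - lj)
    return total

def exhaustive_oracle(S, y, w, mu, l, u):
    """Brute-force oriented oracle for w_+ = w - mu (u-y)^{-1}, w_- = w + mu (y-l)^{-1}.
    By fact 3, w_+ g^+ - w_- g^- = wg - mu * distance_term(y, g, l, u).
    Returns an exhaustive augmenting vector r*g, or None."""
    members = set(S)
    for z in S:
        g = tuple(zj - yj for zj, yj in zip(z, y))
        if not any(g):
            continue
        wg = sum(wj * gj for wj, gj in zip(w, g))
        if wg - mu * distance_term(y, g, l, u) > 0:
            r = 1  # extend the step while it stays in S, so that y + 2(rg) is not in S
            while tuple(yj + (r + 1) * gj for yj, gj in zip(y, g)) in members:
                r += 1
            return tuple(r * gj for gj in g)
    return None

def lemma7_optimize(S, x, w, l, u):
    """Phases with mu halved each time; stop after the first phase with mu < 1/(2n)."""
    n = len(x)
    rho = max(abs(v) for v in list(l) + list(u))
    mu = Fraction(rho * max(abs(v) for v in w))  # mu_1 = rho * ||w||_inf
    y = tuple(x)
    while True:
        g = exhaustive_oracle(S, y, w, mu, l, u)
        while g is not None:  # augment until the phase is complete
            y = tuple(yj + gj for yj, gj in zip(y, g))
            g = exhaustive_oracle(S, y, w, mu, l, u)
        if mu < Fraction(1, 2 * n):
            return y
        mu /= 2

# Toy instance: integer points of the box [0,4]^3 on the plane z1 + 2 z2 + z3 = 4.
S = [z for z in product(range(5), repeat=3) if z[0] + 2 * z[1] + z[2] == 4]
x_star = lemma7_optimize(S, (0, 2, 0), (1, 3, 2), (0, 0, 0), (4, 4, 4))
```

With $w = (1,3,2)$ and start $x = (0,2,0)$ the sketch returns the unique optimum $(0,0,4)$; by the argument above, the final phase cannot stop at a non-optimal point.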

Graver Bases and Linear Integer Programming 


We now come to the definition of a fundamental object introduced by Graver in [28]. The Graver basis of an integer matrix $A$ is a canonical finite set $G(A)$ that can be defined as follows. Define a partial order $\sqsubseteq$ on $\mathbb{Z}^n$ which extends the coordinate-wise order $\le$ on $\mathbb{N}^n$ as follows: for two vectors $u, v \in \mathbb{Z}^n$ put $u \sqsubseteq v$ and say that $u$ is conformal to $v$ if $|u_i| \le |v_i|$ and $u_iv_i \ge 0$ for $i = 1,\dots,n$, that is, $u$ and $v$ lie in the same orthant of $\mathbb{R}^n$ and each component of $u$ is bounded by the corresponding component of $v$ in absolute value. It is not hard to see that $\sqsubseteq$ is a well partial ordering (this is basically Dickson's lemma) and hence every subset of $\mathbb{Z}^n$ has finitely many $\sqsubseteq$-minimal elements. Let $\mathcal{L}(A) := \{x \in \mathbb{Z}^n : Ax = 0\}$ be the lattice of linear integer dependencies on $A$. The Graver basis of $A$ is defined to be the set $G(A)$ of all $\sqsubseteq$-minimal vectors in $\mathcal{L}(A) \setminus \{0\}$.

Note that if $A$ is an $m \times n$ matrix then its Graver basis consists of vectors in $\mathbb{Z}^n$. We sometimes write $G(A)$ as a suitable $|G(A)| \times n$ matrix whose rows are the Graver basis elements. The Graver basis is centrally symmetric ($g \in G(A)$ implies $-g \in G(A)$); thus, when listing a Graver basis we will typically give one of each antipodal pair and prefix the set (or matrix) by $\pm$. Any element of the Graver basis is primitive (its entries are relatively prime integers). Every circuit of $A$ (nonzero primitive minimal support element of $\mathcal{L}(A)$) is in $G(A)$; in fact, if $A$ is totally unimodular then $G(A)$ coincides with the set of circuits (see Sect. “Convex Integer Programming over Totally Unimodular Systems” in the sequel for more details on this). However, in general $G(A)$ is much larger. For more details on Graver bases and their connection to Gröbner bases see Sturmfels [58], and for the currently fastest procedure for computing them see [35,36].


Here is a quick simple example; we will see more structured and complex examples later on. Consider the $1 \times 3$ matrix $A := (1, 2, 1)$. Then its Graver basis can be shown to be the set $G(A) = \pm\{(2, -1, 0), (0, -1, 2), (1, 0, -1), (1, -1, 1)\}$. The first three elements (and their antipodes) are the circuits of $A$; already in this small example non-circuits appear as well: the fourth element (and its antipode) is a primitive linear integer dependency whose support is not minimal.
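For matrices this small the Graver basis can be found by brute force directly from the definition: list the nonzero integer kernel vectors in a box and keep the $\sqsubseteq$-minimal ones. The Python sketch below (hypothetical names) does this; it is exponential in $n$ and is correct only under the assumption that the box radius `bound` is at least the largest absolute entry of any Graver element, which it does not check. It reproduces the four antipodal pairs stated above for $A = (1,2,1)$.

```python
from itertools import product

def conformal(u, v):
    """u is conformal to v: same orthant (u_i v_i >= 0) and |u_i| <= |v_i| for every i."""
    return all(ui * vi >= 0 and abs(ui) <= abs(vi) for ui, vi in zip(u, v))

def graver_bruteforce(A, bound):
    """Conformally minimal nonzero vectors of L(A) with all entries in [-bound, bound].

    Coincides with G(A) only if bound >= every |entry| of every Graver element (assumed).
    """
    n = len(A[0])
    kernel = [v for v in product(range(-bound, bound + 1), repeat=n)
              if any(v) and all(sum(a * vi for a, vi in zip(row, v)) == 0 for row in A)]
    # Any u conformal to a box vector v lies in the same box, so checking
    # minimality inside the box is exact for the vectors found there.
    return {v for v in kernel if not any(u != v and conformal(u, v) for u in kernel)}

G = graver_bruteforce([[1, 2, 1]], 2)
```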

We now show that when we do have access to the Graver basis, it can be used to solve linear integer programming. We will extend this in Sect. “Convex Integer Programming”, where we show that the Graver basis allows one to solve convex integer programming as well. In Sect. “Graver Bases of N-fold Matrices” we will show that there are important classes of matrices for which the Graver basis is indeed accessible.

First, we need a simple property of Graver bases. A finite sum $u := \sum_i v_i$ of vectors $v_i \in \mathbb{R}^n$ is conformal if each summand is conformal to the sum, that is, $v_i \sqsubseteq u$ for all $i$.


Lemma 8 Let $A$ be any integer matrix. Then any $h \in \mathcal{L}(A) \setminus \{0\}$ can be written as a conformal sum $h := \sum_i g_i$ of (not necessarily distinct) Graver basis elements $g_i \in G(A)$.


Proof By induction on the well partial order $\sqsubseteq$. Recall that $G(A)$ is the set of $\sqsubseteq$-minimal elements in $\mathcal{L}(A) \setminus \{0\}$. Consider any $h \in \mathcal{L}(A) \setminus \{0\}$. If it is $\sqsubseteq$-minimal then $h \in G(A)$ and we are done. Otherwise, there is an $h' \in G(A)$ such that $h' \sqsubset h$. Set $h'' := h - h'$. Then $h'' \in \mathcal{L}(A) \setminus \{0\}$ and $h'' \sqsubset h$, so by induction there is a conformal sum $h'' = \sum_i g_i$ with $g_i \in G(A)$ for all $i$. Now $h = h' + \sum_i g_i$ is the desired conformal sum of $h$. □
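The induction in this proof suggests a simple greedy procedure: repeatedly subtract from the remainder some Graver element conformal to it. Each subtracted element is then conformal to $h$ as well (conformality is transitive), and the $l_1$ norm of the remainder strictly decreases, so the loop terminates. The Python sketch below assumes the full Graver basis is supplied as a list, here the one of $A = (1,2,1)$ given above; the names are hypothetical.

```python
# Graver basis of A = (1, 2, 1), as listed in the text (both antipodes).
GRAVER = [(2, -1, 0), (0, -1, 2), (1, 0, -1), (1, -1, 1)]
GRAVER += [tuple(-v for v in g) for g in GRAVER]

def conformal(u, v):
    """u is conformal to v: u_i v_i >= 0 and |u_i| <= |v_i| for every i."""
    return all(ui * vi >= 0 and abs(ui) <= abs(vi) for ui, vi in zip(u, v))

def conformal_decomposition(h, graver):
    """Write h in L(A) as a conformal sum of Graver elements, following Lemma 8."""
    parts = []
    rest = tuple(h)
    while any(rest):
        g = next((g for g in graver if conformal(g, rest)), None)
        if g is None:
            raise ValueError("h is not in L(A), or the list is not the full Graver basis")
        parts.append(g)
        rest = tuple(r - gi for r, gi in zip(rest, g))  # rest - g is conformal to rest
    return parts

parts = conformal_decomposition((3, -2, 1), GRAVER)
```

For $h = (3,-2,1)$ this yields the conformal sum $(2,-1,0) + (1,-1,1)$.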


The next lemma shows the usefulness of Graver bases 
for oriented augmentation. 


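Lemma 9 Let $A$ be an $m \times n$ integer matrix with Graver basis $G(A)$ and let $l, u \in \mathbb{Z}_\infty^n$, $w_+, w_- \in \mathbb{Z}^n$, and $b \in \mathbb{Z}^m$. Suppose $x \in T := \{y \in \mathbb{Z}^n : Ay = b,\ l \le y \le u\}$. Then for every $g \in \mathbb{Z}^n$ which satisfies $x + g \in T$ and $w_+g^+ - w_-g^- > 0$ there exists an element $\hat{g} \in G(A)$ with $\hat{g} \sqsubseteq g$ which also satisfies $x + \hat{g} \in T$ and $w_+\hat{g}^+ - w_-\hat{g}^- > 0$.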

Proof Suppose $g \in \mathbb{Z}^n$ satisfies the requirements. Then $Ag = A(x + g) - Ax = b - b = 0$ since $x, x + g \in T$, and $g \neq 0$ since $w_+g^+ - w_-g^- > 0$. Thus, $g \in \mathcal{L}(A) \setminus \{0\}$ and hence, by Lemma 8, there is a conformal sum $g = \sum_i h_i$ with $h_i \in G(A)$ for all $i$. Now, $h_i \sqsubseteq g$ is equivalent to $h_i^+ \le g^+$ and $h_i^- \le g^-$, so the conformal sum $g = \sum_i h_i$ gives corresponding sums of the positive and negative parts $g^+ = \sum_i h_i^+$ and $g^- = \sum_i h_i^-$. Therefore we obtain
$$0 < w_+g^+ - w_-g^- = w_+\sum_i h_i^+ - w_-\sum_i h_i^- = \sum_i \left(w_+h_i^+ - w_-h_i^-\right),$$
which implies that there is some $h_i$ in this sum with $w_+h_i^+ - w_-h_i^- > 0$. Now, $h_i \in G(A)$ implies $A(x + h_i) = Ax = b$. Also, $l \le x,\ x + g \le u$ and $h_i \sqsubseteq g$ imply that $l \le x + h_i \le u$. So $x + h_i \in T$. Therefore the vector $\hat{g} := h_i$ satisfies the claim. □


We can now show that the Graver basis allows one to solve linear integer programming in polynomial time, provided an initial feasible point is available.


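Theorem 12 There is a polynomial time algorithm that, given $A \in \mathbb{Z}^{m\times n}$, its Graver basis $G(A)$, $l, u \in \mathbb{Z}_\infty^n$, and $x, w \in \mathbb{Z}^n$ with $l \le x \le u$, encoded as $[\langle A, G(A), l, u, x, w\rangle]$, solves the linear integer program $\max\{wz : z \in \mathbb{Z}^n,\ Az = b,\ l \le z \le u\}$ with $b := Ax$.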


Proof First, note that the objective function of the integer program is unbounded if and only if the objective function of its relaxation $\max\{wy : y \in \mathbb{R}^n,\ Ay = b,\ l \le y \le u\}$ is unbounded, which can be checked in polynomial time using linear programming. If it is unbounded then assert that there is no optimal solution and terminate the algorithm.

Assume then that the objective is bounded. Then, since the program is feasible, it has an optimal solution. Furthermore (as basically follows from Cramer's rule, see e.g. [13, Theorem 17.1]), it has an optimal $x^*$ satisfying $|x_j^*| \le \rho$ for all $j$, where $\rho$ is an easily computable integer upper bound whose binary length $\langle\rho\rangle$ is polynomially bounded in $\langle A, l, u, x\rangle$. For instance, $\rho := (n+1)(n+1)!\,r^{n+1}$ will do, with $r$ the maximum among $\max_i|\sum_j A_{i,j}x_j|$, $\max_{i,j}|A_{i,j}|$, $\max\{|l_j| : |l_j| < \infty\}$, and $\max\{|u_j| : |u_j| < \infty\}$.

Let $T := \{y \in \mathbb{Z}^n : Ay = b,\ l \le y \le u\}$ and $S := T \cap [-\rho, \rho]^n$. Then our linear integer programming problem reduces to linear discrete optimization over $S$. Now, an oriented augmentation oracle for $S$ can be simulated in polynomial time using the given Graver basis $G(A)$ as follows: given a query $y \in S$ and $w_+, w_- \in \mathbb{Z}^n$, search for $g \in G(A)$ which satisfies $w_+g^+ - w_-g^- > 0$ and $y + g \in S$; if there is such a $g$ then return it as an augmenting vector, whereas if there is no such $g$ then assert that no augmenting vector exists. Clearly, if this simulated oracle returns a vector $g$ then it is an augmenting vector. On the other hand, if there exists an augmenting vector $g$ then $y + g \in S \subseteq T$ and $w_+g^+ - w_-g^- > 0$ imply by Lemma 9 that there is also a $\hat{g} \in G(A)$ with $\hat{g} \sqsubseteq g$ such that $w_+\hat{g}^+ - w_-\hat{g}^- > 0$ and $y + \hat{g} \in T$. Since $y, y + g \in S$ and $\hat{g} \sqsubseteq g$, we find that $y + \hat{g} \in S$ as well. Therefore the Graver basis contains an augmenting vector and hence the simulated oracle will find and output one.

Define $\hat{l}, \hat{u} \in \mathbb{Z}^n$ by $\hat{l}_j := \max(l_j, -\rho)$ and $\hat{u}_j := \min(u_j, \rho)$, $j = 1,\dots,n$. Then it is easy to see that $S = \mathrm{aff}(S) \cap \{y \in \mathbb{Z}^n : \hat{l} \le y \le \hat{u}\}$. Now apply the algorithm of Lemma 7 to $\hat{l}$, $\hat{u}$, $S$, $x$, and $w$, using the above simulated oriented augmentation oracle for $S$, and obtain in polynomial time a vector $x^* \in S$ which is optimal to the linear discrete optimization problem over $S$ and hence to the given linear integer program. □
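The augmentation idea can be sketched in Python for the small matrix $A = (1,2,1)$, whose Graver basis was listed earlier. The sketch deliberately simplifies the proof: it uses the plain query $w_+ = w_- = w$ and takes any improving Graver step, instead of running the scaled phases of Lemma 7, so it reaches an optimum (by Lemma 9, every non-optimal feasible point admits an improving Graver step) but carries no polynomial running-time guarantee. It also assumes the objective is bounded; names and data are hypothetical.

```python
# Graver basis of A = (1, 2, 1), as listed in the text (both antipodes).
GRAVER = [(2, -1, 0), (0, -1, 2), (1, 0, -1), (1, -1, 1)]
GRAVER += [tuple(-v for v in g) for g in GRAVER]

def graver_augment(graver, x, w, l, u):
    """Greedy Graver augmentation from a feasible x.

    Simplification of Theorem 12: plain query w_+ = w_- = w and no step-size scaling,
    so no polynomial bound is claimed. Bounds may be +-math.inf; the objective is
    assumed to be bounded.
    """
    y = list(x)
    improved = True
    while improved:
        improved = False
        for g in graver:
            z = [yj + gj for yj, gj in zip(y, g)]  # A z = A y because A g = 0
            in_box = all(lj <= zj <= uj for lj, zj, uj in zip(l, z, u))
            if in_box and sum(wj * gj for wj, gj in zip(w, g)) > 0:
                y, improved = z, True
                break
    return tuple(y)

x_opt = graver_augment(GRAVER, (0, 2, 0), (1, 3, 2), (0, 0, 0), (4, 4, 4))
```

Starting from $(0,2,0)$ with $b = 4$, two steps along $(0,-1,2)$ reach the optimum $(0,0,4)$ of $\max\{z_1 + 3z_2 + 2z_3 : z_1 + 2z_2 + z_3 = 4,\ 0 \le z \le 4\}$.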


As a special case of Theorem 12 we recover the follow- 
ing result of [55] concerning linear integer program- 
ming in standard form when the Graver basis is avail- 
able. 


Theorem 13 There is a polynomial time algorithm that, given a matrix $A \in \mathbb{Z}^{m\times n}$, its Graver basis $G(A)$, $x \in \mathbb{N}^n$, and $w \in \mathbb{Z}^n$, encoded as $[\langle A, G(A), x, w\rangle]$, solves the linear integer programming problem $\max\{wz : z \in \mathbb{N}^n,\ Az = b\}$ where $b := Ax$.


Graver Bases of N-fold Matrices 


As mentioned above, the Graver basis $G(A)$ of an integer matrix $A$ contains all circuits of $A$ and typically many more elements. While the number of circuits is already typically exponential and can be as large as $\binom{n}{m+1}$, the number of Graver basis elements is usually even larger and depends also on the entries of $A$ and not only on its dimensions $m, n$. So unfortunately it is typically very hard to compute $G(A)$. However, we now show that for the important and useful broad class of $n$-fold matrices, the Graver basis is better behaved and can be computed in polynomial time. Recall the following definition from the introduction. Given an $(r+s) \times t$ matrix $A$, let $A_1$ be its $r \times t$ sub-matrix consisting of the first $r$ rows and let $A_2$ be its $s \times t$ sub-matrix consisting of the last $s$ rows. We refer to $A$ explicitly as an $(r+s) \times t$ matrix, since the definition below depends also on $r$ and $s$ and not only on the entries of $A$. The $n$-fold matrix of an $(r+s) \times t$ matrix $A$ is then defined to be the following $(r+ns) \times nt$ matrix,
$$A^{(n)} := (\mathbf{1}_n \otimes A_1) \oplus (I_n \otimes A_2) = \begin{pmatrix} A_1 & A_1 & A_1 & \cdots & A_1 \\ A_2 & 0 & 0 & \cdots & 0 \\ 0 & A_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & A_2 \end{pmatrix}.$$
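A direct construction of $A^{(n)}$ from the blocks $A_1$ and $A_2$ can be sketched in Python, with matrices stored as lists of rows; the function name is illustrative.

```python
def n_fold(A1, A2, n):
    """(r + n s) x (n t) n-fold matrix: the row block [A1 A1 ... A1] on top,
    followed by n copies of A2 placed along the block diagonal."""
    t = len(A1[0]) if A1 else len(A2[0])
    rows = [list(row) * n for row in A1]  # each row of A1 repeated n times
    for k in range(n):
        for row in A2:
            rows.append([0] * (k * t) + list(row) + [0] * ((n - k - 1) * t))
    return rows
```

For instance, with $A_1 = I_2$ and $A_2 = (1,1)$ it returns the $4 \times 4$ matrix $A^{(2)}$ of Example 2 below.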


We now discuss a recent result of [54], which originates in [4], and its extension in [38], on the stabilization of Graver bases of $n$-fold matrices. Consider vectors $x = (x^1,\dots,x^n)$ with $x^k \in \mathbb{Z}^t$ for $k = 1,\dots,n$. The type of $x$ is the number $|\{k : x^k \neq 0\}|$ of nonzero components $x^k \in \mathbb{Z}^t$ of $x$. The Graver complexity of an $(r+s) \times t$ matrix $A$, denoted $c(A)$, is defined to be the smallest $c \in \mathbb{N} \uplus \{\infty\}$ such that for all $n$, the Graver basis of $A^{(n)}$ consists of vectors of type at most $c$. We provide the proof of the following result of [38,54] stating that the Graver complexity is always finite.


Lemma 10 The Graver complexity $c(A)$ of any $(r+s) \times t$ integer matrix $A$ is finite.


Proof Call an element $x = (x^1,\dots,x^n)$ in the Graver basis of some $A^{(n)}$ pure if $x^i \in G(A_2)$ for all $i$. Note that the type of a pure $x \in G(A^{(n)})$ is $n$. First, we claim that if there is an element of type $m$ in some $G(A^{(l)})$ then for some $n \ge m$ there is a pure element in $G(A^{(n)})$, and so it will suffice to bound the type of pure elements. Suppose there is an element of type $m$ in some $G(A^{(l)})$. Then its restriction to its $m$ nonzero components is an element $x = (x^1,\dots,x^m)$ in $G(A^{(m)})$. Let $x^i = \sum_{j=1}^{k_i} g_{i,j}$ be a conformal decomposition of $x^i$ with $g_{i,j} \in G(A_2)$ for all $i, j$ (it exists by Lemma 8, since $A_2x^i = 0$), and let $n := k_1 + \cdots + k_m \ge m$. Then $g := (g_{1,1},\dots,g_{m,k_m})$ is in $G(A^{(n)})$, else there would be $\hat{g} \sqsubset g$ in $G(A^{(n)})$, in which case the nonzero $\hat{x}$ with $\hat{x}^i := \sum_{j=1}^{k_i}\hat{g}_{i,j}$ for all $i$ would satisfy $\hat{x} \sqsubset x$ and $\hat{x} \in \mathcal{L}(A^{(m)})$, contradicting $x \in G(A^{(m)})$. Thus $g$ is a pure element of type $n \ge m$, proving the claim.

We proceed to bound the type of pure elements. Let $G(A_2) = \{g_1,\dots,g_m\}$ be the Graver basis of $A_2$, and let $G_2$ be the $t \times m$ matrix whose columns are the $g_i$. Suppose $x = (x^1,\dots,x^n) \in G(A^{(n)})$ is pure for some $n$. Let $v \in \mathbb{N}^m$ be the vector with $v_i := |\{k : x^k = g_i\}|$ counting the number of $g_i$ components of $x$ for each $i$. Then $\sum_{i=1}^m v_i$ is equal to the type $n$ of $x$. Next, note that $A_1G_2v = A_1\left(\sum_{k=1}^n x^k\right) = 0$ and hence $v \in \mathcal{L}(A_1G_2)$. We claim that, moreover, $v \in G(A_1G_2)$. Suppose indirectly not. Then there is $\hat{v} \in G(A_1G_2)$ with $\hat{v} \sqsubset v$, and it is easy to obtain a nonzero $\hat{x} \sqsubset x$ from $x$ by zeroing out some components so that $\hat{v}_i = |\{k : \hat{x}^k = g_i\}|$ for all $i$. Then $A_1\left(\sum_{k=1}^n \hat{x}^k\right) = A_1G_2\hat{v} = 0$ and hence $\hat{x} \in \mathcal{L}(A^{(n)})$, contradicting $x \in G(A^{(n)})$.

So the type of any pure element, and hence the Graver complexity of $A$, is at most the largest value $\sum_i v_i$ of any nonnegative element $v$ of the Graver basis $G(A_1G_2)$. □
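The bound at the end of this proof can be evaluated by brute force for small matrices. The Python sketch below computes $G(A_2)$, forms $A_1G_2$ with the elements of $G(A_2)$ (both antipodes) as columns, computes $G(A_1G_2)$, and returns the largest coordinate sum of a nonnegative element. The box radii `bound2` and `bound12` are assumptions that must be at least the largest Graver entries involved, and the result is only the upper bound on $c(A)$ given by the proof. Function names are hypothetical.

```python
from itertools import product

def conformal(u, v):
    return all(ui * vi >= 0 and abs(ui) <= abs(vi) for ui, vi in zip(u, v))

def graver_bruteforce(A, bound):
    """Conformally minimal nonzero kernel vectors of A inside [-bound, bound]^n."""
    n = len(A[0])
    kernel = [v for v in product(range(-bound, bound + 1), repeat=n)
              if any(v) and all(sum(a * x for a, x in zip(row, v)) == 0 for row in A)]
    return {v for v in kernel if not any(u != v and conformal(u, v) for u in kernel)}

def complexity_bound(A1, A2, bound2=2, bound12=2):
    """Upper bound on c(A) from Lemma 10: max sum of a nonnegative element of G(A1 G2)."""
    G2 = sorted(graver_bruteforce(A2, bound2))  # the columns g_1, ..., g_m of G_2
    A1G2 = [[sum(a * gj for a, gj in zip(row, g)) for g in G2] for row in A1]
    G12 = graver_bruteforce(A1G2, bound12)
    return max((sum(v) for v in G12 if min(v) >= 0), default=0)
```

For $A_1 = I_2$ and $A_2 = (1,1)$ it returns 2, in line with Example 2 below.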


Using Lemma 10 we now show how to compute $G(A^{(n)})$ in polynomial time.


Theorem 14 For every fixed $(r+s) \times t$ integer matrix $A$ there is a strongly polynomial time algorithm that, given $n \in \mathbb{N}$, encoded as $[n; n]$, computes the Graver basis $G(A^{(n)})$ of the $n$-fold matrix $A^{(n)}$. In particular, the cardinality $|G(A^{(n)})|$ and binary length $\langle G(A^{(n)})\rangle$ of the Graver basis of the $n$-fold matrix are polynomially bounded in $n$.


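Proof Let $c := c(A)$ be the Graver complexity of $A$ and consider any $n \ge c$. We show that the Graver basis of $A^{(n)}$ is the union of $\binom{n}{c}$ suitably embedded copies of the Graver basis of $A^{(c)}$. For every $c$ indices $1 \le k_1 < \cdots < k_c \le n$ define a map $\phi_{k_1,\dots,k_c}$ from $\mathbb{Z}^{ct}$ to $\mathbb{Z}^{nt}$ sending $x = (x^1,\dots,x^c)$ to $y = (y^1,\dots,y^n)$ with $y^{k_i} := x^i$ for $i = 1,\dots,c$ and $y^k := 0$ for $k \notin \{k_1,\dots,k_c\}$. We claim that $G(A^{(n)})$ is the union of the images of $G(A^{(c)})$ under the $\binom{n}{c}$ maps $\phi_{k_1,\dots,k_c}$ for all $1 \le k_1 < \cdots < k_c \le n$, that is,
$$G(A^{(n)}) = \bigcup_{1 \le k_1 < \cdots < k_c \le n} \phi_{k_1,\dots,k_c}\left(G(A^{(c)})\right). \qquad (2)$$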


If $x = (x^1,\dots,x^c) \in G(A^{(c)})$ then $x$ is a $\sqsubseteq$-minimal nonzero element of $\mathcal{L}(A^{(c)})$, implying that $\phi_{k_1,\dots,k_c}(x)$ is a $\sqsubseteq$-minimal nonzero element of $\mathcal{L}(A^{(n)})$ and therefore we have $\phi_{k_1,\dots,k_c}(x) \in G(A^{(n)})$. So the right-hand side of (2) is contained in the left-hand side. Conversely, consider any $y \in G(A^{(n)})$. Then, by Lemma 10, the type of $y$ is at most $c$, so there are indices $1 \le k_1 < \cdots < k_c \le n$ such that all nonzero components of $y$ are among those of the reduced vector $x := (y^{k_1},\dots,y^{k_c})$ and therefore $y = \phi_{k_1,\dots,k_c}(x)$. Now, $y \in G(A^{(n)})$ implies that $y$ is a $\sqsubseteq$-minimal nonzero element of $\mathcal{L}(A^{(n)})$ and hence $x$ is a $\sqsubseteq$-minimal nonzero element of $\mathcal{L}(A^{(c)})$. Therefore $x \in G(A^{(c)})$ and $y \in \phi_{k_1,\dots,k_c}(G(A^{(c)}))$. So the left-hand side of (2) is contained in the right-hand side.

Since $A$ is fixed we have that $c = c(A)$ and $G(A^{(c)})$ are constant. Then (2) implies that $|G(A^{(n)})| \le \binom{n}{c}|G(A^{(c)})| = O(n^c)$. Moreover, every element of $G(A^{(n)})$ is an $nt$-dimensional vector $\phi_{k_1,\dots,k_c}(x)$ obtained by appending zero components to some $x \in G(A^{(c)})$ and hence has linear binary length $O(n)$. So the binary length of the entire Graver basis $G(A^{(n)})$ is $O(n^{c+1})$. Thus, the $\binom{n}{c} = O(n^c)$ images $\phi_{k_1,\dots,k_c}(G(A^{(c)}))$ and their union $G(A^{(n)})$ can be computed in strongly polynomial time, as claimed. □


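Example 2 Consider the $(2+1) \times 2$ matrix $A$ with $A_1 := I_2$, the $2 \times 2$ identity, and $A_2 := (1, 1)$. Then $G(A_2) = \pm(1, -1)$ and $G(A_1G_2) = \pm(1, 1)$, from which the Graver complexity of $A$ can be concluded to be $c(A) = 2$ (see the proof of Lemma 10). The 2-fold matrix of $A$ and its Graver basis, consisting of two antipodal vectors only, are
$$A^{(2)} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}, \qquad G(A^{(2)}) = \pm\begin{pmatrix} 1 & -1 & -1 & 1 \end{pmatrix}.$$
By Theorem 14, the Graver basis of the 4-fold matrix $A^{(4)}$ is computed to be the union of the images of the $6 = \binom{4}{2}$ maps $\phi_{k_1,k_2}: \mathbb{Z}^{2\cdot 2} \to \mathbb{Z}^{4\cdot 2}$ for $1 \le k_1 < k_2 \le 4$, getting
$$A^{(4)} = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}, \qquad
G(A^{(4)}) = \pm\begin{pmatrix}
1 & -1 & -1 & 1 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & -1 & 1 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & -1 & 1 \\
0 & 0 & 1 & -1 & -1 & 1 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & -1 & 1 \\
0 & 0 & 0 & 0 & 1 & -1 & -1 & 1
\end{pmatrix}.$$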
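Example 2 can be checked mechanically. The Python sketch below (hypothetical names; brute force in the box $[-1,1]^{nt}$, which is assumed, not proved, to contain the Graver bases involved) computes $G(A^{(2)})$ directly, builds $G(A^{(4)})$ as the union (2) of the six embedded copies, and compares it with a direct brute-force computation of $G(A^{(4)})$. The positions $k_i$ are 0-based in the code.

```python
from itertools import combinations, product

def n_fold(A1, A2, n):
    t = len(A1[0])
    rows = [list(row) * n for row in A1]
    for k in range(n):
        rows += [[0] * (k * t) + list(row) + [0] * ((n - k - 1) * t) for row in A2]
    return rows

def conformal(u, v):
    return all(ui * vi >= 0 and abs(ui) <= abs(vi) for ui, vi in zip(u, v))

def graver_bruteforce(A, bound):
    n = len(A[0])
    kernel = [v for v in product(range(-bound, bound + 1), repeat=n)
              if any(v) and all(sum(a * x for a, x in zip(row, v)) == 0 for row in A)]
    return {v for v in kernel if not any(u != v and conformal(u, v) for u in kernel)}

def embed(x, ks, n, t):
    """phi_{k_1..k_c}: brick i of x goes to brick position ks[i]; other bricks are zero."""
    y = [0] * (n * t)
    for i, k in enumerate(ks):
        y[k * t:(k + 1) * t] = x[i * t:(i + 1) * t]
    return tuple(y)

def graver_via_theorem14(A1, A2, c, n, bound):
    """Right-hand side of (2), starting from a brute-force G(A^(c))."""
    t = len(A1[0])
    graver_c = graver_bruteforce(n_fold(A1, A2, c), bound)
    return {embed(x, ks, n, t) for ks in combinations(range(n), c) for x in graver_c}

A1, A2 = [[1, 0], [0, 1]], [[1, 1]]
G2fold = graver_bruteforce(n_fold(A1, A2, 2), 1)
G4_embedded = graver_via_theorem14(A1, A2, 2, 4, 1)
G4_direct = graver_bruteforce(n_fold(A1, A2, 4), 1)
```

Both routes give the same 12 vectors, the six rows displayed above and their antipodes.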


Linear N-fold Integer Programming 
in Polynomial Time 


We now proceed to provide a polynomial time 
algorithm for linear integer programming over 
n-fold matrices. First, combining the results of 
Sect. “Graver Bases and Linear Integer Programming” 
and Sect. “Graver Bases of N-fold Matrices”, we get 
at once the following polynomial time algorithm for 
converting any feasible point to an optimal one. 


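Lemma 11 For every fixed $(r+s) \times t$ integer matrix $A$ there is a polynomial time algorithm that, given $n \in \mathbb{N}$, $l, u \in \mathbb{Z}_\infty^{nt}$, and $x, w \in \mathbb{Z}^{nt}$ satisfying $l \le x \le u$, encoded as $[\langle l, u, x, w\rangle]$, solves the linear $n$-fold integer programming problem with $b := A^{(n)}x$,
$$\max\{wz : z \in \mathbb{Z}^{nt},\ A^{(n)}z = b,\ l \le z \le u\}.$$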


Proof First, apply the polynomial time algorithm of Theorem 14 and compute the Graver basis $G(A^{(n)})$ of the $n$-fold matrix $A^{(n)}$. Then apply the polynomial time algorithm of Theorem 12 to the data $A^{(n)}$, $G(A^{(n)})$, $l$, $u$, $x$ and $w$. □


Next we show that an initial feasible point can also be 
found in polynomial time. 


Lemma 12 For every fixed $(r+s) \times t$ integer matrix $A$ there is a polynomial time algorithm that, given $n \in \mathbb{N}$, $l, u \in \mathbb{Z}_\infty^{nt}$, and $b \in \mathbb{Z}^{r+ns}$, encoded as $[\langle l, u, b\rangle]$, either finds an $x \in \mathbb{Z}^{nt}$ satisfying $l \le x \le u$ and $A^{(n)}x = b$ or asserts that none exists.


Proof If $l \not\le u$ then assert that there is no feasible point and terminate the algorithm. Assume then that $l \le u$ and determine some $x \in \mathbb{Z}^{nt}$ with $l \le x \le u$ and $\langle x\rangle \le \langle l, u\rangle$. Now, introduce $n(2r + 2s)$ auxiliary variables to the given $n$-fold integer program and denote by $\hat{x}$ the resulting vector of $n(t + 2r + 2s)$ variables. Suitably extend the lower and upper bound vectors to $\hat{l}, \hat{u}$ by setting $\hat{l}_j := 0$ and $\hat{u}_j := \infty$ for each auxiliary variable $\hat{x}_j$. Consider the auxiliary integer program of finding an integer vector $\hat{x}$ that minimizes the sum of auxiliary variables subject to the lower and upper bounds $\hat{l} \le \hat{x} \le \hat{u}$ and the following system of equations, with $I_r$ and $I_s$ the $r \times r$ and $s \times s$ identity matrices,
$$\begin{pmatrix}
A_1 & I_r & -I_r & 0 & 0 & A_1 & I_r & -I_r & 0 & 0 & \cdots & A_1 & I_r & -I_r & 0 & 0 \\
A_2 & 0 & 0 & I_s & -I_s & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & A_2 & 0 & 0 & I_s & -I_s & \cdots & 0 & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & & & \ddots & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & A_2 & 0 & 0 & I_s & -I_s
\end{pmatrix} \hat{x} = b.$$


This is again an n-fold integer program, with an (r+s)x 
(t + 2r + 2s) matrix A, where A, = (A,,1,,—I,, 0, 0) 
and A, = (A2,0,0,I;,—I,;). Since A is fixed, so is A. 
It is now easy to extend the vector x € Z” deter- 
mined above to a feasible point * of the auxiliary pro- 
gram. Indeed, put b:= b—-A™x € Zs: now, for 
i = 1,...,r-+ ns, simply choose an auxiliary variable 
%; appearing only in the ith equation, whose coefficient 
equals the sign sign(b;) of the corresponding entry of b, 
and set £; := |b;|. Define w € Z"'+?"+25) by setting 
w := 0 for each original variable and w := —1 for each 
auxiliary variable, so that maximizing wx is equivalent 
to minimizing the sum of auxiliary variables. Now solve 
the auxiliary linear integer program in polynomial time 
by applying the algorithm of Lemma 11 corresponding 
to A to the data n, i, u, X, and w. Since the auxiliary ob- 
jective wx is bounded above by zero, the algorithm will 
output an optimal solution x*. If the optimal objective 
value is negative, then the original n-fold program is in- 
feasible, whereas if the optimal value is zero, then the 
restriction of x* to the original variables is a feasible 
point x* of the original integer program. Oo 
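The construction of the starting point $\hat{x}$ in this proof is purely mechanical, as the following minimal Python sketch illustrates. It assumes the auxiliary columns are laid out per brick as $(x^k, \text{plus}_r, \text{minus}_r, \text{plus}_s, \text{minus}_s)$ and that the $r$ top equations use the auxiliary columns of the first brick; this layout and the names `nfold_product` and `phase_one_start` are our own hypothetical choices. The sketch builds $\hat{A}_1$, $\hat{A}_2$ and $\hat{x}$ and checks $\hat{A}^{(n)}\hat{x} = b$; it does not solve the auxiliary program, which is the job of Lemma 11.

```python
def nfold_product(A1, A2, bricks):
    """A^(n) x for x = (x^1, ..., x^n) given as a list of bricks."""
    top = [sum(a * v for xk in bricks for a, v in zip(row, xk)) for row in A1]
    bottom = [sum(a * v for a, v in zip(row, xk)) for xk in bricks for row in A2]
    return top + bottom

def unit(m, i):
    return [int(i == j) for j in range(m)]

def phase_one_start(A1, A2, b, bricks):
    """Return A1_hat = (A1, I_r, -I_r, 0, 0), A2_hat = (A2, 0, 0, I_s, -I_s)
    and an auxiliary point x_hat with A_hat^(n) x_hat = b."""
    r, s = len(A1), len(A2)
    A1_hat = [A1[i] + unit(r, i) + [-v for v in unit(r, i)] + [0] * (2 * s) for i in range(r)]
    A2_hat = [A2[i] + [0] * (2 * r) + unit(s, i) + [-v for v in unit(s, i)] for i in range(s)]
    b_hat = [bi - ci for bi, ci in zip(b, nfold_product(A1, A2, bricks))]
    x_hat = []
    for k, xk in enumerate(bricks):
        plus_r, minus_r, plus_s, minus_s = [0] * r, [0] * r, [0] * s, [0] * s
        if k == 0:  # the r top equations: use the I_r / -I_r columns of the first brick
            for i in range(r):
                (plus_r if b_hat[i] > 0 else minus_r)[i] = abs(b_hat[i])
        for i in range(s):  # equation i of block k: use the I_s / -I_s columns of brick k
            val = b_hat[r + k * s + i]
            (plus_s if val > 0 else minus_s)[i] = abs(val)
        x_hat.append(list(xk) + plus_r + minus_r + plus_s + minus_s)
    return A1_hat, A2_hat, x_hat

A1, A2, b = [[1, 0]], [[1, 1]], [1, 2, 4]
x = [[2, 0], [1, 0]]          # 0 <= x, but A^(2) x = [3, 2, 1] != b
A1_hat, A2_hat, x_hat = phase_one_start(A1, A2, b, x)
print(x_hat)                                         # [[2, 0, 0, 2, 0, 0], [1, 0, 0, 0, 3, 0]]
print(nfold_product(A1_hat, A2_hat, x_hat) == b)     # True
print(-sum(v for xk in x_hat for v in xk[2:]))       # w_hat x_hat = -5
```

In this toy run the starting objective is $-5$; the instance is nonetheless feasible (for example $x = (1, 1, 0, 4)$), so the optimization of Lemma 11 would drive the auxiliary sum down to zero.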


Combining Lemmas 11 and 12 we get at once the main 
result of this section. 


Theorem 15 For every fixed $(r + s) \times t$ integer matrix $A$ there is a polynomial time algorithm that, given $n$, lower and upper bounds $l, u \in \mathbb{Z}_\infty^{nt}$, $w \in \mathbb{Z}^{nt}$, and $b \in \mathbb{Z}^{r+ns}$, encoded as $\langle (l, u, w, b) \rangle$, solves the following linear n-fold integer programming problem,

$$\max\{wx : x \in \mathbb{Z}^{nt},\ A^{(n)} x = b,\ l \le x \le u\}.$$


Again, as a special case of Theorem 15 we recover the 
following result of [13] concerning linear integer pro- 
gramming in standard form over n-fold matrices. 


Theorem 16 For every fixed $(r + s) \times t$ integer matrix $A$ there is a polynomial time algorithm that, given $n$, linear functional $w \in \mathbb{Z}^{nt}$, and right-hand side $b \in \mathbb{Z}^{r+ns}$, encoded as $\langle (w, b) \rangle$, solves the following linear n-fold integer program in standard form,

$$\max\{wx : x \in \mathbb{N}^{nt},\ A^{(n)} x = b\}.$$


Some Applications 


Three-Way Line-Sum Transportation Problems 
Transportation problems form a very important class 
of discrete optimization problems studied extensively 
in the operations research and mathematical programming literature, see e.g. [6,42,43,53,60,62] and the references therein. We will discuss this class of problems
and its applications to secure statistical data disclo- 
sure in more detail in Sect. “Multiway Transportation 
Problems and Privacy in Statistical Databases”. 

It is well known that 2-way transportation problems 
are polynomial time solvable, since they can be encoded 
as linear integer programs over totally unimodular systems. However, already 3-way transportation problems
are much more complicated. Consider the following 
3-way transportation problem over $p \times q \times n$ tables with all line-sums fixed,

$$\max\Big\{wx : x \in \mathbb{N}^{p \times q \times n},\ \sum_i x_{i,j,k} = z_{j,k},\ \sum_j x_{i,j,k} = v_{i,k},\ \sum_k x_{i,j,k} = u_{i,j}\Big\}.$$


The data for the problem consist of given integer numbers (line-sums) $u_{i,j}$, $v_{i,k}$, $z_{j,k}$, for $i = 1, \dots, p$, $j = 1, \dots, q$, $k = 1, \dots, n$, and a linear functional given by a $p \times q \times n$ integer array $w$ representing the transportation profit per unit on each cell. The problem is to find a transportation, that is, a $p \times q \times n$ nonnegative integer table $x$ satisfying the line-sum constraints, which attains maximum profit $wx = \sum_{i=1}^p \sum_{j=1}^q \sum_{k=1}^n w_{i,j,k} x_{i,j,k}$.

When at least two of the table sides, say $p$ and $q$, are variable (part of the input), and even when the third side is fixed and as small as $n = 3$, this problem is already universal for integer programming in a very strong sense [14,16], and in particular is NP-hard [15];
this will be discussed in detail and proved in Sect. 
“Multiway Transportation Problems and Privacy in 
Statistical Databases”. We now show that in contrast, 
when two sides, say p, q, are fixed (but arbitrary), and 
one side n is variable, then the 3-way transportation 
problem over such long tables is an n-fold integer programming problem and therefore, as a consequence of Theorem 16, can be solved in polynomial time.


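Corollary 8 For every fixed $p$ and $q$ there is a polynomial time algorithm that, given $n$, integer profit array $w \in \mathbb{Z}^{p \times q \times n}$, and line-sums $u \in \mathbb{Z}^{p \times q}$, $v \in \mathbb{Z}^{p \times n}$ and $z \in \mathbb{Z}^{q \times n}$, encoded as $\langle (w, u, v, z) \rangle$, solves the integer 3-way line-sum transportation problem

$$\max\Big\{wx : x \in \mathbb{N}^{p \times q \times n},\ \sum_i x_{i,j,k} = z_{j,k},\ \sum_j x_{i,j,k} = v_{i,k},\ \sum_k x_{i,j,k} = u_{i,j}\Big\}.$$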


Proof Re-index $p \times q \times n$ arrays as $x = (x^1, \dots, x^n)$ with each component indexed as $x^k := (x^k_{i,j}) := (x_{1,1,k}, \dots, x_{p,q,k})$, suitably indexed as a $pq$ vector representing the $k$th layer of $x$. Put $r := t := pq$ and $s := p + q$, and let $A$ be the $(r + s) \times t$ matrix with $A_1 := I_{pq}$, the $pq \times pq$ identity, and with $A_2$ the $(p + q) \times pq$ matrix of equations of the usual 2-way transportation problem for $p \times q$ arrays. Re-arrange the given line-sums in a vector $b := (b^0, b^1, \dots, b^n) \in \mathbb{Z}^{r+ns}$ with $b^0 := (u_{i,j})$ and $b^k := ((v_{i,k}), (z_{j,k}))$ for $k = 1, \dots, n$.

This translates the given 3-way transportation problem into an n-fold integer programming problem in standard form,

$$\max\{wx : x \in \mathbb{N}^{nt},\ A^{(n)} x = b\},$$

where the equations $A_1(\sum_{k=1}^n x^k) = b^0$ represent the constraints $\sum_k x_{i,j,k} = u_{i,j}$ of all line-sums where summation over layers occurs, and the equations $A_2 x^k = b^k$ for $k = 1, \dots, n$ represent the constraints $\sum_i x_{i,j,k} = z_{j,k}$ and $\sum_j x_{i,j,k} = v_{i,k}$ of all line-sums where summations are within a single layer at a time. Using the algorithm of Theorem 16, this n-fold integer program, and hence the given 3-way transportation problem, can be solved in polynomial time. □


Example 3. We demonstrate the encoding of the $p \times q \times n$ transportation problem as an n-fold integer program as in the proof of Corollary 8 for $p = q = 3$ (smallest case where the problem is genuinely 3-dimensional). Here we put $r := t := 9$, $s := 6$, write

$$x^k = (x_{1,1,k}, x_{1,2,k}, x_{1,3,k}, x_{2,1,k}, x_{2,2,k}, x_{2,3,k}, x_{3,1,k}, x_{3,2,k}, x_{3,3,k}), \quad k = 1, \dots, n,$$

and let the $(9 + 6) \times 9$ matrix $A$ consist of $A_1 := I_9$, the $9 \times 9$ identity matrix, and

$$A_2 := \begin{pmatrix} 1&1&1&0&0&0&0&0&0 \\ 0&0&0&1&1&1&0&0&0 \\ 0&0&0&0&0&0&1&1&1 \\ 1&0&0&1&0&0&1&0&0 \\ 0&1&0&0&1&0&0&1&0 \\ 0&0&1&0&0&1&0&0&1 \end{pmatrix}.$$

Then the corresponding n-fold integer program encodes the $3 \times 3 \times n$ transportation problem as desired. Already for this case, of $3 \times 3 \times n$ tables, the only known polynomial time algorithm for the transportation problem is the one underlying Corollary 8.
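The encoding of Corollary 8 can be checked numerically on a random table. The Python sketch below builds $A_2$ for general $p, q$ with the row order of Example 3 (first the $p$ sums over $j$, then the $q$ sums over $i$), computes the line-sums of a random $p \times q \times n$ table, arranges them into $b = (b^0, b^1, \dots, b^n)$, and checks $A_1(\sum_k x^k) = b^0$ with $A_1 = I_{pq}$ and $A_2 x^k = b^k$. The helper names `transport_A2` and `line_sums` are hypothetical.

```python
import random

def transport_A2(p, q):
    """(p+q) x pq matrix of the 2-way transportation equations for a p x q layer,
    with layer entries indexed row-major as a = i*q + j."""
    rows = [[1 if a // q == i else 0 for a in range(p * q)] for i in range(p)]   # sums over j
    rows += [[1 if a % q == j else 0 for a in range(p * q)] for j in range(q)]   # sums over i
    return rows

def line_sums(x, p, q, n):
    """u_{i,j} (sum over k), v_{i,k} (sum over j), z_{j,k} (sum over i)."""
    u = [sum(x[i][j][k] for k in range(n)) for i in range(p) for j in range(q)]
    v = [[sum(x[i][j][k] for j in range(q)) for i in range(p)] for k in range(n)]
    z = [[sum(x[i][j][k] for i in range(p)) for j in range(q)] for k in range(n)]
    return u, v, z

p, q, n = 3, 3, 4
rng = random.Random(0)
x = [[[rng.randint(0, 5) for _ in range(n)] for _ in range(q)] for _ in range(p)]
u, v, z = line_sums(x, p, q, n)
b = u + [val for k in range(n) for val in v[k] + z[k]]          # b = (b^0, b^1, ..., b^n)
layers = [[x[i][j][k] for i in range(p) for j in range(q)] for k in range(n)]   # x^k
A2 = transport_A2(p, q)
top = [sum(layer[a] for layer in layers) for a in range(p * q)]   # A1 = I_pq
bottom = [sum(c * y for c, y in zip(row, layer)) for layer in layers for row in A2]
print(top + bottom == b)   # True
```

For $p = q = 3$, `transport_A2(3, 3)` reproduces the matrix $A_2$ of Example 3 row by row.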


Corollary 8 has a very broad generalization to mul- 
tiway transportation problems over long k-way tables 
of any dimension k; this will be discussed in detail in 
Sect. “Multiway Transportation Problems and Privacy 
in Statistical Databases”. 


Packing Problems and Cutting-Stock We consider 
the following rather general class of packing problems 
which concern maximum utility packing of many items 
of several types in various bins subject to weight con- 
straints. More precisely, the data is as follows. There 
are t types of items. Each item of type j has integer 
weight $v_j$. There are $n_j$ items of type $j$ to be packed. There are $n$ bins. The weight capacity of bin $k$ is an integer $u_k$. Finally, there is a utility matrix $w \in \mathbb{Z}^{t \times n}$ where $w_{j,k}$ is the utility of packing one item of type $j$ in bin $k$. The problem is to find a feasible packing of maximum total utility. By incrementing the number $t$ of types by 1 and suitably augmenting the data, we may assume that the last type $t$ represents “slack items” which occupy the unused capacity in each bin, where the weight of each slack item is 1, the utility of packing any slack item in any bin is 0, and the number of slack items is the total residual weight capacity $n_t := \sum_{k=1}^n u_k - \sum_{j=1}^{t-1} n_j v_j$. Let $x \in \mathbb{N}^{t \times n}$ be a variable matrix where $x_{j,k}$ represents the number of items of type $j$ to be packed in bin $k$. Then the packing problem becomes the following linear integer program,

$$\max\Big\{wx : x \in \mathbb{N}^{t \times n},\ \sum_j v_j x_{j,k} = u_k,\ \sum_k x_{j,k} = n_j\Big\}.$$

We now show that this is in fact an n-fold integer programming problem and therefore, as a consequence of Theorem 16, can be solved in polynomial time. While the number $t$ of types and the type weights $v_j$ are fixed, which is natural in many bin packing applications, the numbers $n_j$ of items of each type and the bin capacities $u_k$ may be very large.


Corollary 9 For every fixed number $t$ of types and integer type weights $v_1, \dots, v_t$, there is a polynomial time algorithm that, given $n$ bins, integer item numbers $n_1, \dots, n_t$, integer bin capacities $u_1, \dots, u_n$, and $t \times n$ integer utility matrix $w$, encoded as $\langle (n_1, \dots, n_t, u_1, \dots, u_n, w) \rangle$, solves the following integer bin packing problem,

$$\max\Big\{wx : x \in \mathbb{N}^{t \times n},\ \sum_j v_j x_{j,k} = u_k,\ \sum_k x_{j,k} = n_j\Big\}.$$


Proof Re-index the variable matrix as $x = (x^1, \dots, x^n)$ with $x^k := (x_{1,k}, \dots, x_{t,k})$, where $x_{j,k}$ represents the number of items of type $j$ to be packed in bin $k$ for all $j$ and $k$. Let $A$ be the $(t + 1) \times t$ matrix with $A_1 := I_t$, the $t \times t$ identity, and with $A_2 := (v_1, \dots, v_t)$ a single row. Re-arrange the given item numbers and bin capacities in a vector $b := (b^0, b^1, \dots, b^n) \in \mathbb{Z}^{t+n}$ with $b^0 := (n_1, \dots, n_t)$ and $b^k := u_k$ for all $k$. This translates the bin packing problem into an n-fold integer programming problem in standard form,

$$\max\{wx : x \in \mathbb{N}^{nt},\ A^{(n)} x = b\},$$

where the equations $A_1(\sum_{k=1}^n x^k) = b^0$ represent the constraints $\sum_k x_{j,k} = n_j$ assuring that all items of each type are packed, and the equations $A_2 x^k = b^k$ for $k = 1, \dots, n$ represent the constraints $\sum_j v_j x_{j,k} = u_k$ assuring that the weight capacity of each bin is not exceeded (in fact, the slack items make sure each bin is perfectly packed).

Using the algorithm of Theorem 16, this n-fold integer program, and hence the given integer bin packing problem, can be solved in polynomial time. □
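The bin packing encoding can be exercised on a small instance. The Python sketch below appends the slack type (weight 1, count $n_t = \sum_k u_k - \sum_j n_j v_j$) and checks that a given packing $x \in \mathbb{N}^{t \times n}$ satisfies the n-fold equations with $A_1 = I_t$ and $A_2 = (v_1, \dots, v_t)$. The names `with_slack_type` and `satisfies_nfold` and the instance data are illustrative assumptions, not from the source.

```python
def with_slack_type(weights, counts, caps):
    """Append the slack type: weight 1, count n_t = sum_k u_k - sum_j n_j v_j."""
    slack = sum(caps) - sum(nj * vj for nj, vj in zip(counts, weights))
    if slack < 0:
        raise ValueError("total item weight exceeds total bin capacity")
    return weights + [1], counts + [slack]

def satisfies_nfold(weights, counts, caps, x):
    """Check A^(n) x = b for the packing encoding: A1 = I_t, A2 = (v_1, ..., v_t)."""
    t, n = len(weights), len(caps)
    if any(v < 0 for row in x for v in row):
        return False                                                  # x must lie in N^{t x n}
    bricks = [[x[j][k] for j in range(t)] for k in range(n)]          # x^k = column k of x
    top = [sum(xk[j] for xk in bricks) for j in range(t)]             # A1 (sum_k x^k) = b^0
    bottom = [sum(v * c for v, c in zip(weights, xk)) for xk in bricks]   # A2 x^k = b^k
    return top == counts and bottom == caps

weights, counts = with_slack_type([3, 5], [2, 1], [7, 6])
x = [[0, 2],    # two items of weight 3 in bin 2
     [1, 0],    # one item of weight 5 in bin 1
     [2, 0]]    # two slack items fill bin 1
print(weights, counts, satisfies_nfold(weights, counts, [7, 6], x))   # [3, 5, 1] [2, 1, 2] True
```

Because every bin is filled exactly by real and slack items, the capacity constraints appear as equations, which is what puts the problem into the standard form of Theorem 16.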


Example 5 (cutting-stock problem). This is a classical manufacturing problem [27], where the usual setup is as follows: a manufacturer produces rolls of material (such as scotch-tape or band-aid) in one of $t$ different widths $v_1, \dots, v_t$. The rolls are cut out from standard rolls of common large width $u$. Given orders by customers for $n_j$ rolls of width $v_j$, the problem facing the manufacturer is to meet the orders using the smallest possible number of standard rolls. This can be cast as a bin packing problem as follows. Rolls of width $v_j$ become items of type $j$ to be packed. Standard rolls become identical bins, of capacity $u_k := u$ each, where the number of bins is set to be $n := \sum_{j=1}^t \lceil n_j / \lfloor u/v_j \rfloor \rceil$, which is sufficient to accommodate all orders. The utility of each roll of width $v_j$ is set to be its width negated, $w_{j,k} := -v_j$, regardless of the standard roll $k$ from which it is cut (paying for the width it takes). Introduce a new roll width $v_0 := 1$, where rolls of that width represent “slack rolls” which occupy the unused width of each standard roll, with utility $w_{0,k} := -1$ regardless of the standard roll $k$ from which it is cut (paying for the unused width it represents), with the number of slack rolls set to be the total residual width $n_0 := nu - \sum_{j=1}^t n_j v_j$. Then the cutting-stock problem becomes a bin packing problem and therefore, by Corollary 9, for every fixed $t$ and fixed roll widths $v_1, \dots, v_t$, it is solvable in time polynomial in $\sum_{j=1}^t \lceil n_j / \lfloor u/v_j \rfloor \rceil$ and $\langle n_1, \dots, n_t, u \rangle$.
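The number of bins and the number of slack rolls in Example 5 are simple arithmetic. A small Python sketch (`cutting_stock_bins` is a hypothetical name, and the instance is made up for illustration) computes both, using `-(-a // b)` for integer ceiling division.

```python
def cutting_stock_bins(widths, orders, u):
    """n := sum_j ceil(n_j / floor(u / v_j)) standard rolls, and
    n_0 := n*u - sum_j n_j v_j slack rolls of width 1."""
    per_roll = [u // v for v in widths]                         # floor(u / v_j)
    if min(per_roll) == 0:
        raise ValueError("each width v_j must satisfy v_j <= u")
    n = sum(-(-nj // f) for nj, f in zip(orders, per_roll))     # ceiling division
    n0 = n * u - sum(nj * v for nj, v in zip(orders, widths))
    return n, n0

print(cutting_stock_bins([3, 5], [10, 7], 20))   # (4, 15)
```

For widths 3 and 5, orders of 10 and 7 rolls, and $u = 20$, this gives $\lceil 10/6 \rceil + \lceil 7/4 \rceil = 4$ standard rolls and $80 - 65 = 15$ slack rolls. The value $n$ always suffices because cutting each width from its own standard rolls, $\lfloor u/v_j \rfloor$ pieces per roll, uses exactly $n$ rolls.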

One common approach to the cutting-stock problem uses so-called cutting patterns, which are feasible solutions of the knapsack problem $\{y \in \mathbb{N}^t : \sum_{j=1}^t v_j y_j \le u\}$. This is useful when the common width $u$ of the standard rolls is of the same order of magnitude as the demand roll widths $v_j$. However, when $u$ is much larger than the $v_j$, the number of cutting patterns becomes prohibitively large to handle. But then the values $\lfloor u/v_j \rfloor$ are large and hence $n := \sum_{j=1}^t \lceil n_j / \lfloor u/v_j \rfloor \rceil$ is small, in which case the solution through the algorithm of Corollary 9 becomes particularly appealing.
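To see why cutting patterns become unwieldy, one can count them directly. The following Python sketch (hypothetical `count_patterns`) uses the standard unbounded-knapsack counting recurrence: after processing widths $v_1, \dots, v_j$, `ways[s]` is the number of $y \in \mathbb{N}^j$ with $\sum v_i y_i = s$, and the number of patterns is the sum of `ways[s]` over $s \le u$.

```python
def count_patterns(widths, u):
    """Number of cutting patterns y in N^t with sum_j v_j y_j <= u."""
    ways = [1] + [0] * u          # ways[s]: vectors over the widths seen so far with total s
    for vj in widths:
        for s in range(vj, u + 1):
            ways[s] += ways[s - vj]   # add one more roll of width vj
    return sum(ways)

print(count_patterns([3, 5], 10))   # 7
for u in (10, 100, 1000):
    print(u, count_patterns([3, 5], u))
```

For fixed $t$ the count grows like a polynomial of degree $t$ in $u$, hence exponentially in the bit length of $u$, while the number of bins $n$ in Example 5 shrinks as $u$ grows.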
Convex Integer Programming


In this section we discuss convex integer programming. 
In particular, we extend the theory of Sect. “Linear 
N-fold Integer Programming” and show that convex 
n-fold integer programming is polynomial time solv- 
able as well. In Sect. “Convex Integer Programming 
over Totally Unimodular Systems” we discuss convex 
integer programming over totally unimodular matri- 
ces. In Sect. “Graver Bases and Convex Integer Pro- 
gramming” we show the applicability of Graver bases to 
convex integer programming. In Sect. “Convex N-fold 
Integer Programming in Polynomial Time” we com- 
bine Theorem 6, the results of Sect. “Linear N-fold In- 
teger Programming”, and the preparatory facts from 
Sect. “Graver Bases and Convex Integer Programming”, 
and prove the main result of this section, asserting 
that convex n-fold integer programming is polynomial 
time solvable. We conclude with some applications in 
Sect. “Some Applications”. 

As in Sect. “Linear N-fold Integer Programming”, 
the feasible set S is presented as the set of integer points 
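satisfying an explicitly given system of linear inequalities, given in one of the forms

$$S := \{x \in \mathbb{N}^n : Ax = b\} \quad \text{or} \quad S := \{x \in \mathbb{Z}^n : Ax = b,\ l \le x \le u\},$$

with matrix $A \in \mathbb{Z}^{m \times n}$, right-hand side $b \in \mathbb{Z}^m$, and lower and upper bounds $l, u \in \mathbb{Z}_\infty^n$.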

As demonstrated in Sect. “Limitations”, if the polyhedron $P := \{x \in \mathbb{R}^n : Ax = b,\ l \le x \le u\}$ is unbounded then the convex integer programming problem with an oracle presented convex functional is rather
hopeless. Therefore, an algorithm that solves the con- 
vex integer programming problem should either return 
an optimal solution, or assert that the program is in- 
feasible, or assert that the underlying polyhedron is un- 
bounded. 

Nonetheless, we do allow the lower and upper bounds $l, u$ to lie in $\mathbb{Z}_\infty^n$, rather than $\mathbb{Z}^n$, since often the polyhedron is bounded even though the variables are not bounded explicitly (for instance, if each
variable is bounded below only, and appears in some 
equation all of whose coefficients are positive). This re- 
sults in broader formulation flexibility. Furthermore, in 
the next subsections we prove auxiliary lemmas assert- 
ing that certain sets cover all edge-directions of rele- 


vant polyhedra, which do hold also in the unbounded 
case. So we now extend the notion of edge-directions, 
defined in Sect. “Edge-Directions and Zonotopes” for 
polytopes, to polyhedra. A direction of an edge (1-di- 
mensional face) e of a polyhedron P is any nonzero 
scalar multiple of y — x where x, y are any two distinct 
points in e. As before, a set covers all edge-directions of 
P if it contains a direction of each edge of P. 


Convex Integer Programming 
over Totally Unimodular Systems 


A matrix A is totally unimodular if the determinant of 
every square submatrix of A lies in {—1, 0, 1}. Such ma- 
trices arise naturally in network flows, ordinary (2-way) 
transportation problems, and many other situations. 
A fundamental result in integer programming [37] asserts that polyhedra defined by totally unimodular matrices are integer. More precisely, if $A$ is an $m \times n$ totally unimodular matrix, $l, u \in \mathbb{Z}_\infty^n$, and $b \in \mathbb{Z}^m$, then

$$P_I := \mathrm{conv}\{x \in \mathbb{Z}^n : Ax = b,\ l \le x \le u\} = \{x \in \mathbb{R}^n : Ax = b,\ l \le x \le u\} =: P,$$

that is, the underlying polyhedron $P$ coincides with its integer hull $P_I$. This has two consequences useful in facilitating the solution of the corresponding convex integer programming problem via the algorithm of Theorem 6. First, the corresponding linear integer programming problem can be solved by linear programming over $P$ in polynomial time. Second, a set covering all edge-directions of the implicitly given integer hull $P_I$, which is typically very hard to determine, is obtained here as a set covering all edge-directions of $P$, which is explicitly given and hence easier to determine.

We now describe a well known property of polyhedra of the above form. A circuit of a matrix $A \in \mathbb{Z}^{m \times n}$ is a nonzero primitive minimal support element of $\mathcal{L}(A)$. So a circuit is a nonzero $c \in \mathbb{Z}^n$ satisfying $Ac = 0$, whose entries are relatively prime integers, such that no nonzero $c'$ with $Ac' = 0$ has support strictly contained in the support of $c$.
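For small matrices the circuits can be listed by brute force, which helps make the definition concrete. The Python sketch below uses the criterion that a column set $S$ is the support of a circuit exactly when the kernel of the submatrix $A_S$ is one-dimensional and spanned by a vector with no zero entry; it then scales that vector to a primitive integer vector and records it with both signs. The names `kernel_basis` and `circuits` are hypothetical, arithmetic is exact via `fractions.Fraction`, and the enumeration over all supports is exponential in $n$, so it is an illustration only.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd

def kernel_basis(A, cols):
    """Basis of ker A[:, cols] by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(A[i][c]) for c in cols] for i in range(len(A))]
    pivots = []
    for c in range(len(cols)):
        r = len(pivots)
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue                       # no pivot: column c is free
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [a / rows[r][c] for a in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[r])]
        pivots.append(c)
    basis = []
    for f in (c for c in range(len(cols)) if c not in pivots):
        y = [Fraction(0)] * len(cols)
        y[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            y[pc] = -rows[i][f]
        basis.append(y)
    return basis

def circuits(A):
    """All circuits of A, each primitive integer vector listed with both signs."""
    n = len(A[0])
    found = []
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            K = kernel_basis(A, S)
            if len(K) != 1 or any(v == 0 for v in K[0]):
                continue                   # S is not a minimal dependent set
            den = 1
            for v in K[0]:
                den = den * v.denominator // gcd(den, v.denominator)   # lcm of denominators
            ints = [int(v * den) for v in K[0]]
            g = 0
            for v in ints:
                g = gcd(g, v)
            c = [0] * n
            for idx, v in zip(S, ints):
                c[idx] = v // g
            found += [c, [-v for v in c]]
    return found

print(len(circuits([[1, 1, 1]])))                                     # 6
print(circuits([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]))
# [[1, -1, -1, 1], [-1, 1, 1, -1]]
```

In the first example every pair of columns is a minimal dependent set, so all $2\binom{3}{2} = 6$ signed circuits appear. The second matrix is the totally unimodular system of a $2 \times 2$ transportation problem, whose only circuits are the two orientations of the 4-cycle.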


Lemma 13 For every $A \in \mathbb{Z}^{m \times n}$, $l, u \in \mathbb{Z}_\infty^n$, and $b \in \mathbb{Z}^m$, the set of circuits of $A$ covers all edge-directions of the polyhedron $P := \{x \in \mathbb{R}^n : Ax = b,\ l \le x \le u\}$.


Proof Consider any edge $e$ of $P$. Pick two distinct points $x, y \in e$ and set $g := y - x$. Then $Ag = 0$ and therefore, as can be easily proved by induction on $|\mathrm{supp}(g)|$, there is a finite decomposition $g = \sum_i \alpha_i c_i$ with $\alpha_i$ a positive real number and $c_i$ a circuit of $A$ such that $\alpha_i c_i \sqsubseteq g$ for all $i$, where $\sqsubseteq$ is the natural extension from $\mathbb{Z}^n$ to $\mathbb{R}^n$ of the partial order defined in Sect. “Graver Bases and Linear Integer Programming”. We claim that $x + \alpha_i c_i \in P$ for all $i$. Indeed, $c_i$ being a circuit implies $A(x + \alpha_i c_i) = Ax = b$; and $l \le x, x + g \le u$ and $\alpha_i c_i \sqsubseteq g$ imply $l \le x + \alpha_i c_i \le u$.

Now let $w \in \mathbb{R}^n$ be a linear functional uniquely maximized over $P$ at the edge $e$. Then $w \alpha_i c_i = w(x + \alpha_i c_i) - wx \le 0$ for all $i$. But $\sum_i (w \alpha_i c_i) = wg = wy - wx = 0$, implying that in fact $w \alpha_i c_i = 0$ and hence $x + \alpha_i c_i \in e$ for all $i$. This implies that each $c_i$ is a direction of $e$ (in fact, all $c_i$ are the same and $g$ is a multiple of some circuit). □


Combining Theorem 6 and Lemma 13 we obtain the 
following statement. 


Theorem 17 For every fixed $d$ there is a polynomial time algorithm that, given $m \times n$ totally unimodular matrix $A$, set $C \subseteq \mathbb{Z}^n$ containing all circuits of $A$, vectors $l, u \in \mathbb{Z}_\infty^n$, $b \in \mathbb{Z}^m$, and $w_1, \dots, w_d \in \mathbb{Z}^n$, and convex $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $\langle (A, C, l, u, b, w_1, \dots, w_d) \rangle$, solves the convex integer program

$$\max\{c(w_1 x, \dots, w_d x) : x \in \mathbb{Z}^n,\ Ax = b,\ l \le x \le u\}.$$


Proof First, check in polynomial time using linear programming whether the objective function of any of the following $2n$ linear programs is unbounded,

$$\max\{\pm y_i : y \in P\},\ i = 1, \dots, n, \qquad P := \{y \in \mathbb{R}^n : Ay = b,\ l \le y \le u\}.$$

If any is unbounded then terminate, asserting that $P$ is unbounded. Otherwise, let $\rho$ be the least integer upper bound on the absolute value of all optimal objective values. Then $P \subseteq [-\rho, \rho]^n$ and $S := \{y \in \mathbb{Z}^n : Ay = b,\ l \le y \le u\} \subseteq P$ is finite of radius $\rho(S) \le \rho$. In fact, since $A$ is totally unimodular, $P_I = P = \mathrm{conv}(S)$ and hence $\rho(S) = \rho$. Moreover, by Cramer's rule, $\langle \rho \rangle$ is polynomially bounded in $\langle A, l, u, b \rangle$.

Now, since $A$ is totally unimodular, using linear programming over $P_I = P$ we can simulate in polynomial time a linear discrete optimization oracle for $S$. By Lemma 13, the given set $C$, which contains all circuits of $A$, also covers all edge-directions of $\mathrm{conv}(S) = P_I = P$. Therefore we can apply the algorithm of Theorem 6 and solve the given convex integer programming problem in polynomial time. □


While the number of circuits of an $m \times n$ matrix $A$ can be as large as $2\binom{n}{m+1}$ and hence exponential in general, it is nonetheless relatively small in that it is bounded in terms of $m$ and $n$ only and is independent of the matrix $A$ itself. Furthermore, it may happen that the number of circuits is much smaller than the upper bound $2\binom{n}{m+1}$. Also, if in a class of matrices, $m$ grows slowly in terms of $n$, say $m = O(\log n)$, then this bound is subexponential. In such situations, the above theorem may provide a good strategy for solving convex integer programming over totally unimodular systems.


Graver Bases and Convex Integer Programming 


We now extend the statements of Sect. “Convex Integer 
Programming over Totally Unimodular Systems” about 
totally unimodular matrices to arbitrary integer matri- 
ces. The next lemma shows that the Graver basis of any 
integer matrix covers all edge-directions of the integer 
hulls of polyhedra defined by that matrix. 


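Lemma 14 For every $A \in \mathbb{Z}^{m \times n}$, $l, u \in \mathbb{Z}_\infty^n$, and $b \in \mathbb{Z}^m$, the Graver basis $\mathcal{G}(A)$ of $A$ covers all edge-directions of the polyhedron $P_I := \mathrm{conv}\{x \in \mathbb{Z}^n : Ax = b,\ l \le x \le u\}$.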


Proof Consider any edge $e$ of $P_I$ and pick two distinct points $x, y \in e \cap \mathbb{Z}^n$. Then $g := y - x$ is in $\mathcal{L}(A) \setminus \{0\}$. Therefore, by Lemma 8, there is a conformal sum $g = \sum_i h_i$ with $h_i \in \mathcal{G}(A)$ for all $i$. We claim that $x + h_i \in P_I$ for all $i$. Indeed, first note that $h_i \in \mathcal{G}(A) \subseteq \mathcal{L}(A)$ implies $Ah_i = 0$ and hence $A(x + h_i) = Ax = b$; and second note that $l \le x, x + g \le u$ and $h_i \sqsubseteq g$ imply that $l \le x + h_i \le u$.

Now let $w \in \mathbb{Z}^n$ be a linear functional uniquely maximized over $P_I$ at the edge $e$. Then $wh_i = w(x + h_i) - wx \le 0$ for all $i$. But $\sum_i (wh_i) = wg = wy - wx = 0$, implying that in fact $wh_i = 0$ and hence $x + h_i \in e$ for all $i$. Therefore each $h_i$ is a direction of $e$ (in fact, all $h_i$ are the same and $g$ is a multiple of some Graver basis element). □

Combining Theorems 6 and 12 and Lemma 14 we obtain the following statement.


Theorem 18 For every fixed $d$ there is a polynomial time algorithm that, given integer $m \times n$ matrix $A$, its Graver basis $\mathcal{G}(A)$, $l, u \in \mathbb{Z}_\infty^n$, $x \in \mathbb{Z}^n$ with $l \le x \le u$, $w_1, \dots, w_d \in \mathbb{Z}^n$, and convex $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $\langle (A, \mathcal{G}(A), l, u, x, w_1, \dots, w_d) \rangle$, solves the convex integer program with $b := Ax$,

$$\max\{c(w_1 z, \dots, w_d z) : z \in \mathbb{Z}^n,\ Az = b,\ l \le z \le u\}.$$


Proof First, check in polynomial time using linear programming whether the objective function of any of the following $2n$ linear programs is unbounded,

$$\max\{\pm y_i : y \in P\},\ i = 1, \dots, n, \qquad P := \{y \in \mathbb{R}^n : Ay = b,\ l \le y \le u\}.$$

If any is unbounded then terminate, asserting that $P$ is unbounded. Otherwise, let $\rho$ be the least integer upper bound on the absolute value of all optimal objective values. Then $P \subseteq [-\rho, \rho]^n$ and $S := \{y \in \mathbb{Z}^n : Ay = b,\ l \le y \le u\} \subseteq P$ is finite of radius $\rho(S) \le \rho$. Moreover, by Cramer's rule, $\langle \rho \rangle$ is polynomially bounded in $\langle A, l, u, x \rangle$.

Using the given Graver basis and applying the algorithm of Theorem 12 we can simulate in polynomial time a linear discrete optimization oracle for $S$. Furthermore, by Lemma 14, the given Graver basis covers all edge-directions of the integer hull $P_I := \mathrm{conv}\{y \in \mathbb{Z}^n : Ay = b,\ l \le y \le u\} = \mathrm{conv}(S)$. Therefore we can apply the algorithm of Theorem 6 and solve the given convex program in polynomial time. □
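The role of the Graver basis as a set of improving directions can be illustrated with the simplest augmentation scheme. The Python sketch below (hypothetical `graver_augment`) is not the polynomial time algorithm of Theorem 12, which chooses step lengths more carefully; it takes unit Graver steps while some step keeps $l \le x \le u$ and strictly increases a linear objective $wx$, assuming finite bounds so that the loop terminates. Because the Graver basis is a test set, the point where it stops is optimal for $wx$. It is run on $\mathcal{G}(A^{(2)}) = \pm(1, -1, -1, 1)$ from the 2-fold example earlier in this article.

```python
def graver_augment(x, graver, w, l, u):
    """Apply improving Graver steps g (w.g > 0 and l <= x + g <= u) until none
    exists; assumes finite bounds, so the feasible set is finite."""
    x = list(x)
    while True:
        for g in graver:
            y = [a + b for a, b in zip(x, g)]
            improving = sum(c * d for c, d in zip(w, g)) > 0
            if improving and all(lo <= v <= hi for lo, v, hi in zip(l, y, u)):
                x = y
                break
        else:
            return x    # no improving Graver step: x is optimal for wx

G = [[1, -1, -1, 1], [-1, 1, 1, -1]]
print(graver_augment([1, 2, 2, 1], G, w=[1, 0, 0, 1], l=[0] * 4, u=[3] * 4))   # [3, 0, 0, 3]
```

Starting from $x = (1, 2, 2, 1)$, which satisfies $A^{(2)} x = (3, 3, 3)$, the feasible points are $x + a(1, -1, -1, 1)$ with $-1 \le a \le 2$, and two steps reach the maximizer of $x_1 + x_4$.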


Convex N-fold Integer Programming 
in Polynomial Time 


We now extend the result of Theorem 15 and show 
that convex integer programming problems over n-fold 
systems can be solved in polynomial time as well. As 
explained in the beginning of this section, the algo- 
rithm either returns an optimal solution, or asserts that 
the program is infeasible, or asserts that the underlying 
polyhedron is unbounded. 


Theorem 19 For every fixed $d$ and fixed $(r + s) \times t$ integer matrix $A$ there is a polynomial time algorithm that, given $n$, lower and upper bounds $l, u \in \mathbb{Z}_\infty^{nt}$, $w_1, \dots, w_d \in \mathbb{Z}^{nt}$, $b \in \mathbb{Z}^{r+ns}$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $\langle (l, u, w_1, \dots, w_d, b) \rangle$, solves the convex n-fold integer programming problem

$$\max\{c(w_1 x, \dots, w_d x) : x \in \mathbb{Z}^{nt},\ A^{(n)} x = b,\ l \le x \le u\}.$$


Proof First, check in polynomial time using linear programming whether the objective function of any of the following $2nt$ linear programs is unbounded,

$$\max\{\pm y_i : y \in P\},\ i = 1, \dots, nt, \qquad P := \{y \in \mathbb{R}^{nt} : A^{(n)} y = b,\ l \le y \le u\}.$$

If any is unbounded then terminate, asserting that $P$ is unbounded. Otherwise, let $\rho$ be the least integer upper bound on the absolute value of all optimal objective values. Then $P \subseteq [-\rho, \rho]^{nt}$ and $S := \{y \in \mathbb{Z}^{nt} : A^{(n)} y = b,\ l \le y \le u\} \subseteq P$ is finite of radius $\rho(S) \le \rho$. Moreover, by Cramer's rule, $\langle \rho \rangle$ is polynomially bounded in $n$ and $\langle l, u, b \rangle$.

Using the algorithm of Theorem 15 we can simulate in polynomial time a linear discrete optimization oracle for $S$. Also, using the algorithm of Theorem 14 we can compute in polynomial time the Graver basis $\mathcal{G}(A^{(n)})$ which, by Lemma 14, covers all edge-directions of $P_I := \mathrm{conv}\{y \in \mathbb{Z}^{nt} : A^{(n)} y = b,\ l \le y \le u\} = \mathrm{conv}(S)$. Therefore we can apply the algorithm of Theorem 6 and solve the given convex n-fold integer programming problem in polynomial time. □


Again, as a special case of Theorem 19 we recover the 
following result of [12] concerning convex integer pro- 
gramming in standard form over n-fold matrices. 


Theorem 20 For every fixed $d$ and fixed $(r + s) \times t$ integer matrix $A$ there is a polynomial time algorithm that, given $n$, linear functionals $w_1, \dots, w_d \in \mathbb{Z}^{nt}$, right-hand side $b \in \mathbb{Z}^{r+ns}$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $\langle (w_1, \dots, w_d, b) \rangle$, solves the convex n-fold integer program in standard form

$$\max\{c(w_1 x, \dots, w_d x) : x \in \mathbb{N}^{nt},\ A^{(n)} x = b\}.$$


Some Applications


Transportation Problems and Packing Problems 
Theorems 19 and 20 generalize Theorems 15 and 16 by broadly extending the class of objective functions that can be maximized in polynomial time over n-fold systems. Therefore all applications discussed in Sect. “Some Applications” automatically extend accordingly.

First, we have the following analog of Corollary 8 for the convex integer transportation problem over long 3-way tables. This has a very broad further generalization to multiway transportation problems over long k-way tables of any dimension k, see Sect. “Multiway Transportation Problems and Privacy in Statistical Databases”.


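Corollary 10 For every fixed $d, p, q$ there is a polynomial time algorithm that, given $n$, arrays $w_1, \dots, w_d \in \mathbb{Z}^{p \times q \times n}$, line-sums $u \in \mathbb{Z}^{p \times q}$, $v \in \mathbb{Z}^{p \times n}$ and $z \in \mathbb{Z}^{q \times n}$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $\langle (w_1, \dots, w_d, u, v, z) \rangle$, solves the convex integer 3-way line-sum transportation problem

$$\max\Big\{c(w_1 x, \dots, w_d x) : x \in \mathbb{N}^{p \times q \times n},\ \sum_i x_{i,j,k} = z_{j,k},\ \sum_j x_{i,j,k} = v_{i,k},\ \sum_k x_{i,j,k} = u_{i,j}\Big\}.$$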


Second, we have the following analog of Corollary 9 for 
convex bin packing. 


Corollary 11 For every fixed $d$, number of types $t$, and type weights $v_1, \dots, v_t \in \mathbb{Z}$, there is a polynomial time algorithm that, given $n$ bins, item numbers $n_1, \dots, n_t \in \mathbb{Z}$, bin capacities $u_1, \dots, u_n \in \mathbb{Z}$, utility matrices $w_1, \dots, w_d \in \mathbb{Z}^{t \times n}$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $\langle (n_1, \dots, n_t, u_1, \dots, u_n, w_1, \dots, w_d) \rangle$, solves the convex integer bin packing problem,

$$\max\Big\{c(w_1 x, \dots, w_d x) : x \in \mathbb{N}^{t \times n},\ \sum_j v_j x_{j,k} = u_k,\ \sum_k x_{j,k} = n_j\Big\}.$$


Vector Partitioning and Clustering The vector par- 
tition problem concerns the partitioning of n items 
among p players to maximize social value subject to 
constraints on the number of items each player can re- 
ceive. More precisely, the data is as follows. With each 
item $i$ is associated a vector $v_i \in \mathbb{Z}^k$ representing its utility under $k$ criteria. The utility of player $h$ under ordered partition $\pi = (\pi_1, \dots, \pi_p)$ of the set of items $\{1, \dots, n\}$ is the sum $v_{\pi_h} := \sum_{i \in \pi_h} v_i$ of utility vectors of items assigned to $h$ under $\pi$. The social value of $\pi$ is the balancing $c(v_{\pi_1,1}, \dots, v_{\pi_1,k}, \dots, v_{\pi_p,1}, \dots, v_{\pi_p,k})$ of the player utilities, where $v_{\pi_h,j}$ denotes the $j$th component of $v_{\pi_h}$ and $c$ is a convex functional on $\mathbb{R}^{pk}$. In the constrained version, the partition must be of a given shape, i.e. the number $|\pi_h|$ of items that player $h$ gets is required to be a given number $\lambda_h$ (with $\sum_h \lambda_h = n$). In the unconstrained version, there is no
restriction on the number of items per player. 

Vector partition problems have applications in di- 
verse areas such as load balancing, circuit layout, rank- 
ing, cluster analysis, inventory, and reliability, see e. g. 
[7,9,25,39,50] and the references therein. Here is a typ- 
ical example. 


Example 6 (minimal variance clustering). This problem has numerous applications in the analysis of statistical data: given $n$ observed points $v_1, \dots, v_n$ in $k$-space, group them into $p$ clusters $\pi_1, \dots, \pi_p$ that minimize the sum of cluster variances given by

$$\sum_{h=1}^p \frac{1}{|\pi_h|} \sum_{i \in \pi_h} \Big\| v_i - \frac{1}{|\pi_h|} \sum_{j \in \pi_h} v_j \Big\|^2.$$

Consider instances where there are $n = pm$ points and the desired clustering is balanced, that is, the clusters should have equal size $m$. Suitable manipulation of the sum of variances expression above shows that the problem is equivalent to a constrained vector partition problem, where $\lambda_h = m$ for all $h$, and where the convex functional $c: \mathbb{R}^{pk} \to \mathbb{R}$ (to be maximized) is the Euclidean norm squared, given by

$$c(z) = \|z\|^2 = \sum_{h=1}^p \sum_{i=1}^k z_{h,i}^2.$$

If either the number of criteria k or the number of play- 
ers p is variable, the partition problem is intractable 
since it instantly captures NP-hard problems [39]. 
When both $k$ and $p$ are fixed, both the constrained and unconstrained versions of the vector partition problem are polynomial time solvable [39,50]. We now show that vector partition problems (either constrained or unconstrained) are in fact convex n-fold integer programming problems and therefore, as a consequence of Theorem 20, can be solved in polynomial time.


Corollary 12 For every fixed number $p$ of players and number $k$ of criteria, there is a polynomial time algorithm that, given $n$, item vectors $v_1, \dots, v_n \in \mathbb{Z}^k$, $\lambda_1, \dots, \lambda_p \in \mathbb{N}$, and convex functional $c: \mathbb{R}^{pk} \to \mathbb{R}$ presented by a comparison oracle, encoded as $\langle (v_1, \dots, v_n, \lambda_1, \dots, \lambda_p) \rangle$, solves the constrained and unconstrained partitioning problems.


Proof There is an obvious one-to-one correspondence between partitions and matrices $x \in \{0, 1\}^{p \times n}$ with all column-sums equal to one, where partition $\pi$ corresponds to the matrix $x$ with $x_{h,i} = 1$ if $i \in \pi_h$, and $x_{h,i} = 0$ otherwise. Let $d := pk$ and define $d$ matrices $w_{h,j} \in \mathbb{Z}^{p \times n}$ by setting $(w_{h,j})_{h,i} := v_{i,j}$ for all $h = 1, \dots, p$, $i = 1, \dots, n$ and $j = 1, \dots, k$, and setting all other entries to zero. Then for any partition $\pi$ and its corresponding matrix $x$ we have $v_{\pi_h,j} = w_{h,j} x$ for all $h = 1, \dots, p$ and $j = 1, \dots, k$. Therefore, the unconstrained vector partition problem is the convex integer program

$$\max\Big\{c(w_{1,1} x, \dots, w_{p,k} x) : x \in \mathbb{N}^{p \times n},\ \sum_h x_{h,i} = 1\Big\}.$$

Suitably arranging the variables in a vector, this becomes a convex n-fold integer program with a $(0 + 1) \times p$ defining matrix $A$, where $A_1$ is empty and $A_2 := (1, \dots, 1)$.

Similarly, the constrained vector partition problem is the convex integer program

$$\max\Big\{c(w_{1,1} x, \dots, w_{p,k} x) : x \in \mathbb{N}^{p \times n},\ \sum_h x_{h,i} = 1,\ \sum_i x_{h,i} = \lambda_h\Big\}.$$

This again is a convex n-fold integer program, now with a $(p + 1) \times p$ defining matrix $A$, where now $A_1 := I_p$ is the $p \times p$ identity matrix and $A_2 := (1, \dots, 1)$ as before.

Using the algorithm of Theorem 20, this convex n-fold integer program, and hence the given vector partition problem, can be solved in polynomial time. □
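The encoding in this proof can be exercised on a toy instance. The Python sketch below builds the $pk$ matrices $w_{h,j}$, enumerates all $0/1$ matrices $x$ with unit column sums, and maximizes $c(w_{1,1}x, \dots, w_{p,k}x)$ with $c$ taken, as an illustrative choice, to be the squared Euclidean norm, optionally restricted to a shape $(\lambda_1, \dots, \lambda_p)$. The enumeration is exponential in $n$, so this only checks the encoding and is not the polynomial time algorithm of Theorem 20; the helper names and the item vectors are hypothetical.

```python
from itertools import product

def weight_matrices(vecs, p):
    """The d = p*k matrices w_{h,j} in Z^{p x n}: row h holds the coordinates v_{i,j}."""
    n, k = len(vecs), len(vecs[0])
    W = {}
    for h in range(p):
        for j in range(k):
            M = [[0] * n for _ in range(p)]
            M[h] = [vecs[i][j] for i in range(n)]
            W[(h, j)] = M
    return W

def inner(M, x):
    """Entrywise inner product of two p x n matrices, i.e. w_{h,j} x."""
    return sum(a * b for row_m, row_x in zip(M, x) for a, b in zip(row_m, row_x))

def solve_by_enumeration(vecs, p, c, shape=None):
    n, k = len(vecs), len(vecs[0])
    W = weight_matrices(vecs, p)
    best = None
    for assign in product(range(p), repeat=n):        # assign[i] = player receiving item i
        x = [[int(assign[i] == h) for i in range(n)] for h in range(p)]   # column sums are 1
        if shape is not None and [sum(row) for row in x] != list(shape):
            continue                                  # constrained version: |pi_h| = lambda_h
        z = [inner(W[(h, j)], x) for h in range(p) for j in range(k)]
        if best is None or c(z) > best[0]:
            best = (c(z), assign)
    return best

def sq_norm(z):
    return sum(t * t for t in z)

vecs = [(1, 0), (0, 1), (2, 2), (3, 1)]
print(solve_by_enumeration(vecs, 2, sq_norm))                 # (52, (0, 0, 0, 0))
print(solve_by_enumeration(vecs, 2, sq_norm, shape=(2, 2)))   # (36, (0, 0, 1, 1))
```

Unconstrained, the squared norm rewards giving every item to one player; with shape $(2, 2)$ the best split gives items 0 and 1 to the first player and items 2 and 3 to the second.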


Multiway Transportation Problems and Privacy 
in Statistical Databases 


Transportation problems form a very important class 
of discrete optimization problems. The feasible points 
in a transportation problem are the multiway tables 
(“contingency tables” in statistics) such that the sums 
of entries over some of their lower dimensional sub- 
tables such as lines or planes (“margins” in statistics) 
are specified. Transportation problems and their cor- 
responding transportation polytopes have been used 
and studied extensively in the operations research and 
mathematical programming literature, as well as in the 
statistics literature in the context of secure statistical 
data disclosure and management by public agencies, 
see [4,6,11,18,19,42,43,53,60,62] and references therein. 

In this section we completely settle the algorithmic complexity of treating multiway tables and discuss the applications to transportation problems and secure statistical data disclosure, as follows. After introducing some terminology in Sect. “Tables and Margins”, we go on to describe, in Sect. “The Universality Theorem”, a universality result that shows that “short” 3-way $r\times c\times 3$ tables, with variable number $r$ of rows and variable number $c$ of columns but fixed small number 3 of layers (hence “short”), are universal in a very strong sense. In Sect. “The Complexity of the Multiway Transportation Problem” we discuss the general multiway transportation problem. Using the results of Sect. “The Universality Theorem” and the results on linear and convex n-fold integer programming from Sect. “Linear N-fold Integer Programming” and Sect. “Convex Integer Programming”, we show that the transportation problem is intractable for short 3-way $r\times c\times 3$ tables but polynomial time treatable for “long” $(k+1)$-way $m_1\times\cdots\times m_k\times n$ tables, with $k$ and the sides $m_1,\dots,m_k$ fixed (but arbitrary), and the number $n$ of layers variable (hence “long”). In Sect. “Privacy and Entry-Uniqueness” we turn to data privacy and security and consider the central problem of detecting entry uniqueness in tables with disclosed margins. We show that as a consequence of the results of Sect. “The Universality Theorem” and Sect. “The Complexity of the Multiway Transportation Problem”, and in analogy to the complexity of the transportation problem established there, the entry uniqueness problem is intractable for short 3-way $r\times c\times 3$ tables but polynomial time decidable for long $(k+1)$-way $m_1\times\cdots\times m_k\times n$ tables.


Tables and Margins 


We start with some terminology on tables, margins and transportation polytopes. A k-way table is an $m_1\times\cdots\times m_k$ array $x = (x_{i_1,\dots,i_k})$ of nonnegative integers. A k-way transportation polytope (or simply k-way polytope for brevity) is the set of all $m_1\times\cdots\times m_k$ nonnegative arrays $x = (x_{i_1,\dots,i_k})$ such that the sums of the entries over some of their lower dimensional sub-arrays (margins) are specified. More precisely, for any tuple $(i_1,\dots,i_k)$ with $i_j \in \{1,\dots,m_j\}\cup\{+\}$, the corresponding margin $x_{i_1,\dots,i_k}$ is the sum of entries of $x$ over all coordinates $j$ with $i_j = +$. The support of $(i_1,\dots,i_k)$ and of $x_{i_1,\dots,i_k}$ is the set $\mathrm{supp}(i_1,\dots,i_k) := \{j : i_j \ne +\}$ of non-summed coordinates. For instance, if $x$ is a $4\times 5\times 3\times 2$ array then it has 12 margins with support $F = \{1,3\}$ such as $x_{3,+,2,+} = \sum_{i_2=1}^{5}\sum_{i_4=1}^{2} x_{3,i_2,2,i_4}$. A collection of margins is hierarchical if, for some family $\mathcal{F}$ of subsets of $\{1,\dots,k\}$, it consists of all margins $u_{i_1,\dots,i_k}$ with support in $\mathcal{F}$. In particular, for any $0 \le h \le k$, the collection of all h-margins of k-tables is the hierarchical collection with $\mathcal{F}$ the family of all h-subsets of $\{1,\dots,k\}$. Given a hierarchical collection of margins $u_{i_1,\dots,i_k}$ supported on a family $\mathcal{F}$ of subsets of $\{1,\dots,k\}$, the corresponding k-way polytope is the set of nonnegative arrays with these margins,

$$T_{\mathcal{F}} := \Big\{x \in \mathbb{R}_+^{m_1\times\cdots\times m_k} : x_{i_1,\dots,i_k} = u_{i_1,\dots,i_k},\ \mathrm{supp}(i_1,\dots,i_k) \in \mathcal{F}\Big\}.$$


The integer points in this polytope are precisely the k- 
way tables with the given margins. 
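To make the notion of a margin concrete, here is a minimal Python sketch (an illustration, not code from the source) that computes all margins of a k-way table with a given support. It assumes the table is stored as a dictionary from 0-based index tuples to integers, so the text's support $\{1,3\}$ becomes the 0-based set {0, 2}; the helper name `margins` and the sample entries are hypothetical.

```python
from itertools import product


def margins(x, shape, support):
    """Return every margin of the k-way table x with the given support.

    x       -- dict mapping 0-based index tuples (i_1, ..., i_k) to integers
    shape   -- table sides (m_1, ..., m_k)
    support -- 0-based coordinates that are NOT summed over

    The result maps each tuple of values of the supported coordinates
    (in increasing coordinate order) to the sum of x over all remaining
    coordinates, i.e. over the coordinates carrying a "+" in the text.
    """
    support = sorted(support)
    result = {}
    for idx in product(*(range(m) for m in shape)):
        key = tuple(idx[j] for j in support)
        result[key] = result.get(key, 0) + x[idx]
    return result


# A 4 x 5 x 3 x 2 table with (arbitrary) entries x[i] = i_1 + i_2 + i_3 + i_4.
shape = (4, 5, 3, 2)
x = {idx: sum(idx) for idx in product(*(range(m) for m in shape))}
M = margins(x, shape, {0, 2})
print(len(M))       # 12
print(M[(2, 1)])    # 55, the margin x_{3,+,2,+} in the text's 1-based notation
```

A table lies in $T_{\mathcal F}$ exactly when, for every support in $\mathcal F$, this dictionary matches the prescribed margin values.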


The Universality Theorem 


We now describe the following universality result of [14,16] which shows that, quite remarkably, any rational polytope is a short 3-way $r\times c\times 3$ polytope with all line-sums specified. (In the terminology of Sect. “Tables and Margins” this is the $r\times c\times 3$ polytope $T_{\mathcal F}$ of all 2-margins fixed, supported on the family $\mathcal{F} = \{\{1,2\},\{1,3\},\{2,3\}\}$.) By saying that a polytope $P \subset \mathbb{R}^p$ is representable as a polytope $Q \subset \mathbb{R}^q$ we mean in the strong sense that there is an injection $\sigma: \{1,\dots,p\} \to \{1,\dots,q\}$ such that the coordinate-erasing projection

$$\pi: \mathbb{R}^q \to \mathbb{R}^p : x = (x_1,\dots,x_q) \mapsto \pi(x) = (x_{\sigma(1)},\dots,x_{\sigma(p)})$$

provides a bijection between $Q$ and $P$ and between the sets of integer points $Q\cap\mathbb{Z}^q$ and $P\cap\mathbb{Z}^p$. In particular, if $P$ is representable as $Q$ then $P$ and $Q$ are isomorphic in any reasonable sense: they are linearly equivalent and hence all linear programming related problems over the two are polynomial time equivalent; they are combinatorially equivalent and hence they have the same face numbers and facial structure; and they are integer equivalent and therefore all integer programming and integer counting related problems over the two are polynomial time equivalent as well.

We provide only an outline of the proof of the fol- 
lowing statement; complete details and more conse- 
quences of this theorem can be found in [14,16]. 


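Theorem 21 There is a polynomial time algorithm that, given $A \in \mathbb{Z}^{m\times n}$ and $b \in \mathbb{Z}^m$, encoded as $[(A,b)]$, produces $r, c$ and line-sums $u \in \mathbb{Z}^{r\times c}$, $v \in \mathbb{Z}^{r\times 3}$ and $z \in \mathbb{Z}^{c\times 3}$ such that the polytope $P := \{y \in \mathbb{R}^n_+ : Ay = b\}$ is representable as the 3-way polytope

$$T := \Big\{x \in \mathbb{R}_+^{r\times c\times 3} : \sum_i x_{i,j,k} = z_{j,k},\ \sum_j x_{i,j,k} = v_{i,k},\ \sum_k x_{i,j,k} = u_{i,j}\Big\}.$$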

Proof The construction proving the theorem con- 
sists of three polynomial time steps, each representing 
a polytope of a given format as a polytope of another 
given format. 

First, we show that any $P := \{y \ge 0 : Ay = b\}$ with $A, b$ integer can be represented in polynomial time as $Q := \{x \ge 0 : Cx = d\}$ with $C$ a matrix all of whose entries are in $\{-1,0,1,2\}$. This reduction of coefficients will enable the rest of the steps to run in polynomial time. For each variable $y_j$ let $k_j := \max\{\lfloor\log_2|a_{i,j}|\rfloor : i = 1,\dots,m,\ a_{i,j} \ne 0\}$, so that the binary representation of the absolute value of any entry $a_{i,j}$ of $A$ uses at most $k_j + 1$ bits. Introduce variables $x_{j,0},\dots,x_{j,k_j}$ and relate them by the equations $2x_{j,s} - x_{j,s+1} = 0$, $s = 0,\dots,k_j - 1$. The representing injection $\sigma$ is defined by $\sigma(j) := (j,0)$, embedding $y_j$ as $x_{j,0}$. Consider any term $a_{i,j}y_j$ of the original system. Using the binary expansion $|a_{i,j}| = \sum_{s=0}^{k_j} t_s 2^s$ with all $t_s \in \{0,1\}$, we rewrite this term as $\pm\sum_{s=0}^{k_j} t_s x_{j,s}$, the sign being that of $a_{i,j}$. It is not hard to verify that this represents $P$ as $Q$ with defining $\{-1,0,1,2\}$-matrix.
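The first step can be sketched in Python as follows. This is a minimal illustration under our own conventions (0-based indices, sparse rows stored as dictionaries, and the hypothetical helper names `reduce_coefficients` and `lift`). It builds the $\{-1,0,1,2\}$-system and checks that lifting a solution $y$ via $x_{j,s} = 2^s y_j$ satisfies it.

```python
def reduce_coefficients(A, b):
    """Rewrite A y = b as C x = d with every coefficient in {-1, 0, 1, 2}.

    Variables of the new system are keyed (j, s), s = 0..k_j, and are chained by
    2*x[(j, s)] - x[(j, s + 1)] = 0, which forces x[(j, s)] = 2**s * y_j.
    Rows are sparse dicts {variable: coefficient}; returns (rows, d, k).
    """
    m, n = len(A), len(A[0])
    # k_j = highest bit position of any nonzero |a_ij| in column j
    k = [max((abs(A[i][j]).bit_length() - 1 for i in range(m) if A[i][j]), default=0)
         for j in range(n)]
    rows, d = [], []
    for i in range(m):
        row = {}
        for j in range(n):
            a = A[i][j]
            for s in range(k[j] + 1):
                if (abs(a) >> s) & 1:              # bit t_s of |a_ij|
                    row[(j, s)] = 1 if a > 0 else -1
        rows.append(row)
        d.append(b[i])
    for j in range(n):                              # doubling chains
        for s in range(k[j]):
            rows.append({(j, s): 2, (j, s + 1): -1})
            d.append(0)
    return rows, d, k


def lift(y, k):
    """Embed y_j as x[(j, 0)] and fill in x[(j, s)] = 2**s * y_j."""
    return {(j, s): (2 ** s) * y[j] for j in range(len(y)) for s in range(k[j] + 1)}


A = [[5, -3], [2, 7]]
y = [1, 2]
b = [sum(a * v for a, v in zip(row, y)) for row in A]   # b = [-1, 16]
rows, d, k = reduce_coefficients(A, b)
x = lift(y, k)
print(all(sum(c * x[v] for v, c in row.items()) == di for row, di in zip(rows, d)))  # True
```

Because each coefficient becomes a sum of at most $k_j + 1$ unit terms, the new system grows only by a factor logarithmic in the largest entry of $A$.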

Second, we show that any $Q := \{y \ge 0 : Ay = b\}$ with $A, b$ integer can be represented as a face $F$ of a 3-way polytope with all plane-sums fixed, that is, a face of a 3-way polytope $T_{\mathcal F}$ of all 1-margins fixed, supported on the family $\mathcal{F} = \{\{1\},\{2\},\{3\}\}$.

Since $Q$ is a polytope and hence bounded, we can compute (using Cramer's rule) an integer upper bound $U$ on the value of any coordinate $y_j$ of any $y \in Q$. Note also that a face of a 3-way polytope $T_{\mathcal F}$ is the set of all $x = (x_{i,j,k})$ in $T_{\mathcal F}$ with some entries forced to zero; these entries are termed “forbidden”, and the other entries are termed “enabled”.

For each variable $y_j$, let $r_j$ be the larger of the sum of positive coefficients of $y_j$ and the sum of absolute values of negative coefficients of $y_j$ over all equations,

$$r_j := \max\Big(\sum_k\{|a_{k,j}| : a_{k,j} > 0\},\ \sum_k\{|a_{k,j}| : a_{k,j} < 0\}\Big).$$

Assume that $A$ is of size $m\times n$. Let $r := \sum_{j=1}^n r_j$, $R := \{1,\dots,r\}$, $h := m+1$ and $H := \{1,\dots,h\}$. We now describe how to construct vectors $u, v \in \mathbb{Z}^r$, $z \in \mathbb{Z}^h$, and a set $E \subset R\times R\times H$ of triples (the enabled, non-forbidden, entries) such that the polytope $Q$ is represented as the face $F$ of the corresponding 3-way polytope of $r\times r\times h$ arrays with plane-sums $u, v, z$ and only entries indexed by $E$ enabled,

$$F := \Big\{x \in \mathbb{R}_+^{r\times r\times h} : x_{i,j,k} = 0 \text{ for all } (i,j,k) \notin E,\ \text{and } \sum_{i,j} x_{i,j,k} = z_k,\ \sum_{i,k} x_{i,j,k} = v_j,\ \sum_{j,k} x_{i,j,k} = u_i\Big\}.$$

We also indicate the injection $\sigma: \{1,\dots,n\} \to R\times R\times H$ giving the desired embedding of coordinates $y_j$ as coordinates $x_{i,j,k}$, and the representation of $Q$ as $F$.

Roughly, each equation $k = 1,\dots,m$ is encoded in a “horizontal plane” $R\times R\times\{k\}$ (the last plane $R\times R\times\{h\}$ is included for consistency, with its entries being “slacks”); and each variable $y_j$, $j = 1,\dots,n$, is encoded in a “vertical box” $R_j\times R_j\times H$, where $R = \biguplus_{j=1}^n R_j$ is the natural partition of $R$ with $|R_j| = r_j$ for all $j = 1,\dots,n$, that is, with $R_j := \{1 + \sum_{l<j} r_l,\dots,\sum_{l\le j} r_l\}$.

Now, all “vertical” plane-sums are set to the same value $U$, that is, $u_j := v_j := U$ for $j = 1,\dots,r$. All entries not in the union $\biguplus_{j=1}^n R_j\times R_j\times H$ of the variable boxes will be forbidden. We now describe the enabled entries in the boxes; for simplicity we discuss the box $R_1\times R_1\times H$, the others being similar. We distinguish between the two cases $r_1 = 1$ and $r_1 \ge 2$. In the first case, $R_1 = \{1\}$; the box, which is just the single line $\{1\}\times\{1\}\times H$, will have exactly two enabled entries $(1,1,k^+)$, $(1,1,k^-)$ for suitable $k^+, k^-$ to be defined later. We set $\sigma(1) := (1,1,k^+)$, namely embed $y_1 = x_{1,1,k^+}$. We define the complement of the variable $y_1$ to be $\bar y_1 := U - y_1$ (and likewise for the other variables). The vertical sums $u, v$ then force $\bar y_1 = U - y_1 = U - x_{1,1,k^+} = x_{1,1,k^-}$, so the complement of $y_1$ is also embedded. Next, consider the case $r_1 \ge 2$. For each $s = 1,\dots,r_1$, the line $\{s\}\times\{s\}\times H$ (respectively, $\{s\}\times\{1 + (s \bmod r_1)\}\times H$) will contain one enabled entry $(s,s,k^+(s))$ (respectively, $(s, 1 + (s \bmod r_1), k^-(s))$). All other entries of $R_1\times R_1\times H$ will be forbidden. Again, we set $\sigma(1) := (1,1,k^+(1))$, namely embed $y_1 = x_{1,1,k^+(1)}$; it is then not hard to see that, again, the vertical sums $u, v$ force $x_{s,s,k^+(s)} = x_{1,1,k^+(1)} = y_1$ and $x_{s,1+(s \bmod r_1),k^-(s)} = U - x_{1,1,k^+(1)} = \bar y_1$ for each $s = 1,\dots,r_1$. Therefore, both $y_1$ and $\bar y_1$ are each embedded in $r_1$ distinct entries.

We now encode the equations by defining the horizontal plane-sums $z$ and the indices $k^+(s), k^-(s)$ above as follows. For $k = 1,\dots,m$, consider the $k$th equation $\sum_j a_{k,j}y_j = b_k$. Define the index sets $J^+ := \{j : a_{k,j} > 0\}$ and $J^- := \{j : a_{k,j} < 0\}$, and set $z_k := b_k + U\cdot\sum_{j\in J^-}|a_{k,j}|$. The last coordinate of $z$ is set for consistency with $u, v$ to be $z_h = z_{m+1} := r\cdot U - \sum_{k=1}^m z_k$. Now, with $\bar y_j := U - y_j$ the complement of variable $y_j$ as above, the $k$th equation can be rewritten as

$$\sum_{j\in J^+} a_{k,j}y_j + \sum_{j\in J^-}|a_{k,j}|\,\bar y_j = \sum_{j=1}^n a_{k,j}y_j + U\cdot\sum_{j\in J^-}|a_{k,j}| = b_k + U\cdot\sum_{j\in J^-}|a_{k,j}| = z_k.$$

To encode this equation, we simply “pull down” to the corresponding $k$th horizontal plane as many copies of each variable $y_j$ or $\bar y_j$ as needed, by suitably setting $k^+(s) := k$ or $k^-(s) := k$. By the choice of $r_j$ there are sufficiently many copies, possibly with a few redundant ones which are absorbed in the last horizontal plane by setting $k^+(s) := m+1$ or $k^-(s) := m+1$. This completes the encoding and provides the desired representation.
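The bookkeeping in the second step is easy to reproduce. The Python sketch below is our own illustration; the helper name `plane_sum_data` and the sample data are hypothetical. It computes $r_j$, $r$ and the horizontal plane-sums $z$, including the slack value $z_{m+1}$. It then checks the rewritten equation $\sum_{J^+} a_{k,j}y_j + \sum_{J^-}|a_{k,j}|\bar y_j = z_k$ for one feasible $y$ with all $y_j \le U$. It does not build the enabled set $E$.

```python
def plane_sum_data(A, b, U):
    """Return (r_list, z) for the second step, with z = (z_1, ..., z_m, z_{m+1})."""
    m, n = len(A), len(A[0])
    # r_j: larger of the positive-coefficient sum and the |negative|-coefficient sum
    r_list = [max(sum(A[k][j] for k in range(m) if A[k][j] > 0),
                  sum(-A[k][j] for k in range(m) if A[k][j] < 0))
              for j in range(n)]
    r = sum(r_list)
    # z_k = b_k + U * sum_{j in J^-} |a_kj|
    z = [b[k] + U * sum(-A[k][j] for j in range(n) if A[k][j] < 0) for k in range(m)]
    z.append(r * U - sum(z))          # slack plane: all plane-sums total r * U
    return r_list, z


A = [[2, -1, 0], [1, 1, -3]]
y = [1, 2, 1]
b = [sum(a * v for a, v in zip(row, y)) for row in A]   # b = [0, 0]
U = 5                                                   # any bound with y_j <= U
r_list, z = plane_sum_data(A, b, U)
for k, row in enumerate(A):
    # y_j counted for positive coefficients, complement U - y_j for negative ones
    lhs = sum(a * y[j] if a > 0 else -a * (U - y[j]) for j, a in enumerate(row) if a != 0)
    assert lhs == z[k]
print(r_list, z)   # [3, 1, 3] [5, 15, 15]
```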

Third, we show that any 3-way polytope with plane-sums fixed and entry bounds,

$$F := \Big\{y \in \mathbb{R}_+^{l\times m\times n} : \sum_{i,j} y_{i,j,k} = c_k,\ \sum_{i,k} y_{i,j,k} = b_j,\ \sum_{j,k} y_{i,j,k} = a_i,\ y_{i,j,k} \le e_{i,j,k}\Big\},$$

can be represented as a 3-way polytope with line-sums fixed (and no entry bounds),

$$T := \Big\{x \in \mathbb{R}_+^{r\times c\times 3} : \sum_I x_{I,J,K} = z_{J,K},\ \sum_J x_{I,J,K} = v_{I,K},\ \sum_K x_{I,J,K} = u_{I,J}\Big\}.$$

In particular, this implies that any face $F$ of a 3-way polytope with plane-sums fixed can be represented as a 3-way polytope $T$ with line-sums fixed: forbidden entries are encoded by setting a “forbidding” upper bound $e_{i,j,k} := 0$ on all forbidden entries $(i,j,k) \notin E$ and an “enabling” upper bound $e_{i,j,k} := U$ on all enabled entries $(i,j,k) \in E$. We describe the representation, but omit the proof that it is indeed valid; further details on this step can be found in [14,15,16]. We give explicit formulas for $u_{I,J}$, $v_{I,K}$, $z_{J,K}$ in terms of $a_i$, $b_j$, $c_k$ and $e_{i,j,k}$ as follows. Put $r := l\cdot m$ and $c := n + l + m$. The first index $I$ of each entry $x_{I,J,K}$ will be a pair $I = (i,j)$ in the $r$-set

$$\{(1,1),\dots,(1,m),(2,1),\dots,(2,m),\dots,(l,1),\dots,(l,m)\}.$$

The second index $J$ of each entry $x_{I,J,K}$ will be a pair $J = (s,t)$ in the $c$-set

$$\{(1,1),\dots,(1,n),(2,1),\dots,(2,l),(3,1),\dots,(3,m)\}.$$

The last index $K$ will simply range in the 3-set $\{1,2,3\}$. We represent $F$ as $T$ via the injection $\sigma$ given explicitly by $\sigma(i,j,k) := ((i,j),(1,k),1)$, embedding each variable $y_{i,j,k}$ as the entry $x_{(i,j),(1,k),1}$. Let $U$ now denote the smaller of the two values $\max\{a_1,\dots,a_l\}$ and $\max\{b_1,\dots,b_m\}$. The line-sums (2-margins) are set to be

$$u_{(i,j),(1,t)} = e_{i,j,t},\qquad u_{(i,j),(2,t)} = \begin{cases}U & \text{if } t = i,\\ 0 & \text{otherwise,}\end{cases}\qquad u_{(i,j),(3,t)} = \begin{cases}U & \text{if } t = j,\\ 0 & \text{otherwise;}\end{cases}$$

$$v_{(i,j),t} = \begin{cases}U & \text{if } t = 1,\\ e_{i,j,+} & \text{if } t = 2,\\ U & \text{if } t = 3;\end{cases}$$

$$z_{(i,j),1} = \begin{cases}c_j & \text{if } i = 1,\\ m\cdot U - a_j & \text{if } i = 2,\\ 0 & \text{if } i = 3;\end{cases}\qquad z_{(i,j),2} = \begin{cases}e_{+,+,j} - c_j & \text{if } i = 1,\\ 0 & \text{if } i = 2,\\ b_j & \text{if } i = 3;\end{cases}\qquad z_{(i,j),3} = \begin{cases}0 & \text{if } i = 1,\\ a_j & \text{if } i = 2,\\ l\cdot U - b_j & \text{if } i = 3.\end{cases}$$

Applying the first step to the given rational polytope $P$, applying the second step to the resulting $Q$, and applying the third step to the resulting $F$, we get in polynomial time a 3-way $r\times c\times 3$ polytope $T$ of all line-sums fixed representing $P$ as claimed. □
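The line-sum formulas of the third step can be transcribed directly. The Python sketch below is our illustration; it uses 0-based indices inside each block, and the helper names and sample data are hypothetical. It builds $u$, $v$, $z$ from plane-sums and entry bounds. It then checks the three necessary compatibility conditions: each pair of line-sum families must induce the same plane-sums. It does not verify the full representation, whose proof the text omits.

```python
def line_sums(a, b, c, e):
    """Line-sums (u, v, z) of the r x c x 3 polytope T from the third step.

    a, b, c -- plane-sums of F (lengths l, m, n); e[i][j][k] -- entry upper bounds.
    I = (i, j) and J = (s, t) use 0-based i, j, t; the block label s and K are 1, 2, 3.
    """
    l, m, n = len(a), len(b), len(c)
    U = min(max(a), max(b))
    I_set = [(i, j) for i in range(l) for j in range(m)]
    J_set = ([(1, t) for t in range(n)] + [(2, t) for t in range(l)]
             + [(3, t) for t in range(m)])
    u, v, z = {}, {}, {}
    for (i, j) in I_set:
        for (s, t) in J_set:
            if s == 1:
                u[(i, j), (s, t)] = e[i][j][t]
            elif s == 2:
                u[(i, j), (s, t)] = U if t == i else 0
            else:
                u[(i, j), (s, t)] = U if t == j else 0
        v[(i, j), 1], v[(i, j), 2], v[(i, j), 3] = U, sum(e[i][j]), U   # e_{i,j,+}
    for t in range(n):
        e_plus = sum(e[i][j][t] for (i, j) in I_set)                    # e_{+,+,t}
        z[(1, t), 1], z[(1, t), 2], z[(1, t), 3] = c[t], e_plus - c[t], 0
    for t in range(l):
        z[(2, t), 1], z[(2, t), 2], z[(2, t), 3] = m * U - a[t], 0, a[t]
    for t in range(m):
        z[(3, t), 1], z[(3, t), 2], z[(3, t), 3] = 0, b[t], l * U - b[t]
    return I_set, J_set, u, v, z


def compatible(I_set, J_set, u, v, z):
    """Check that u, v, z induce the same plane-sums (necessary for T to be nonempty)."""
    K_set = (1, 2, 3)
    return (all(sum(u[I, J] for J in J_set) == sum(v[I, K] for K in K_set) for I in I_set)
            and all(sum(u[I, J] for I in I_set) == sum(z[J, K] for K in K_set) for J in J_set)
            and all(sum(v[I, K] for I in I_set) == sum(z[J, K] for J in J_set) for K in K_set))


a, b, c = [3, 2], [1, 4], [2, 3]                 # consistent: each sums to 5
e = [[[3, 3], [3, 3]], [[3, 3], [3, 3]]]
print(compatible(*line_sums(a, b, c, e)))        # True
```

If the plane-sums of $F$ are inconsistent (for instance $\sum_k c_k \ne \sum_i a_i$), the $K = 1$ condition fails, so the constructed $T$ is empty.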
The Complexity
of the Multiway Transportation Problem 


We are now finally in a position to settle the complexity of the general multiway transportation problem. The data for the problem consists of: positive integers $k$ (table dimension) and $m_1,\dots,m_k$ (table sides); family $\mathcal F$ of subsets of $\{1,\dots,k\}$ (supporting the hierarchical collection of margins to be fixed); integer values $u_{i_1,\dots,i_k}$ for all margins supported on $\mathcal F$; and integer “profit” $m_1\times\cdots\times m_k$ array $w$. The transportation problem is to find an $m_1\times\cdots\times m_k$ table having the given margins and attaining maximum profit, or assert that none exists. Equivalently, it is the linear integer programming problem of maximizing the linear functional defined by $w$ over the transportation polytope $T_{\mathcal F}$,

$$\max\Big\{wx : x \in \mathbb{N}^{m_1\times\cdots\times m_k},\ x_{i_1,\dots,i_k} = u_{i_1,\dots,i_k},\ \mathrm{supp}(i_1,\dots,i_k) \in \mathcal F\Big\}.$$


The following result of [15] is an immediate consequence of Theorem 21. It asserts that if two sides of the table are variable part of the input then the transportation problem is intractable already for short 3-way tables with $\mathcal F = \{\{1,2\},\{1,3\},\{2,3\}\}$ supporting all 2-margins (line-sums). This result can be easily extended to k-way tables of any dimension $k \ge 3$ and $\mathcal F$ the collection of all h-subsets of $\{1,\dots,k\}$ for any $1 < h < k$ as long as two sides of the table are variable; we omit the proof of this extended result.


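Corollary 13 It is NP-complete to decide, given $r, c$, and line-sums $u \in \mathbb{Z}^{r\times c}$, $v \in \mathbb{Z}^{r\times 3}$, and $z \in \mathbb{Z}^{c\times 3}$, encoded as $[(u,v,z)]$, if the following set of tables is nonempty,

$$S := \Big\{x \in \mathbb{N}^{r\times c\times 3} : \sum_i x_{i,j,k} = z_{j,k},\ \sum_j x_{i,j,k} = v_{i,k},\ \sum_k x_{i,j,k} = u_{i,j}\Big\}.$$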


Proof The integer programming feasibility problem is to decide, given $A \in \mathbb{Z}^{m\times n}$ and $b \in \mathbb{Z}^m$, if $\{y \in \mathbb{N}^n : Ay = b\}$ is nonempty. Given such $A$ and $b$, the polynomial time algorithm of Theorem 21 produces $r, c$ and $u \in \mathbb{Z}^{r\times c}$, $v \in \mathbb{Z}^{r\times 3}$, and $z \in \mathbb{Z}^{c\times 3}$, such that $\{y \in \mathbb{N}^n : Ay = b\}$ is nonempty if and only if the set $S$ above is nonempty. This reduces integer programming feasibility to short 3-way line-sum transportation feasibility. Since the former is NP-complete (see e.g. [55]), so is the latter. □


We now show that, in contrast, when all sides but one are fixed (but arbitrary), and one side $n$ is variable, then the corresponding long k-way transportation problem for any hierarchical collection of margins is an n-fold integer programming problem and therefore, as a consequence of Theorem 16, can be solved in polynomial time. This extends Corollary 8 established in Sect. “Three-Way Line-Sum Transportation Problems” for 3-way line-sum transportation.


Corollary 14 For every fixed $k$, table sides $m_1,\dots,m_k$ and family $\mathcal F$ of subsets of $\{1,\dots,k+1\}$, there is a polynomial time algorithm that, given $n$, integer values $u = (u_{i_1,\dots,i_{k+1}})$ for all margins supported on $\mathcal F$, and integer $m_1\times\cdots\times m_k\times n$ array $w$, encoded as $[(u,w)]$, solves the linear integer multiway transportation problem

$$\max\Big\{wx : x \in \mathbb{N}^{m_1\times\cdots\times m_k\times n},\ x_{i_1,\dots,i_{k+1}} = u_{i_1,\dots,i_{k+1}},\ \mathrm{supp}(i_1,\dots,i_{k+1}) \in \mathcal F\Big\}.$$
Proof Re-index the arrays as $x = (x^1,\dots,x^n)$ with each $x^j = (x_{i_1,\dots,i_k,j})$ a suitably indexed $m_1m_2\cdots m_k$ vector representing the $j$th layer of $x$. Then the transportation problem can be encoded as an n-fold integer programming problem in standard form,

$$\max\{wx : x \in \mathbb{N}^{nt},\ A^{(n)}x = b\},$$

with an $(r+s)\times t$ defining matrix $A$ where $t := m_1m_2\cdots m_k$ and $r, s, A_1$ and $A_2$ are determined from $\mathcal F$, and with right-hand side $b := (b^0,b^1,\dots,b^n) \in \mathbb{Z}^{r+ns}$ determined from the margins $u = (u_{i_1,\dots,i_{k+1}})$, in such a way that the equations $A_1(\sum_{j=1}^n x^j) = b^0$ represent the constraints of all margins $x_{i_1,\dots,i_k,+}$ (where summation over layers occurs), whereas the equations $A_2x^j = b^j$ for $j = 1,\dots,n$ represent the constraints of all margins $x_{i_1,\dots,i_k,j}$ with $j \ne +$ (where summations are within a single layer at a time).

Using the algorithm of Theorem 16, this n-fold integer program, and hence the given multiway transportation problem, can be solved in polynomial time. □


The proof of Corollary 14 shows that the set of feasible points of any long k-way transportation problem, with all sides but one fixed and one side $n$ variable, for any hierarchical collection of margins, is the feasible set of an n-fold integer programming problem. Therefore, as a consequence of Theorem 20, we also have the following extension of Corollary 14 for the convex integer multiway transportation problem over long k-way tables.


Corollary 15 For every fixed $d$, $k$, table sides $m_1,\dots,m_k$, and family $\mathcal F$ of subsets of $\{1,\dots,k+1\}$, there is a polynomial time algorithm that, given $n$, integer values $u = (u_{i_1,\dots,i_{k+1}})$ for all margins supported on $\mathcal F$, integer $m_1\times\cdots\times m_k\times n$ arrays $w_1,\dots,w_d$, and convex functional $c: \mathbb{R}^d \to \mathbb{R}$ presented by a comparison oracle, encoded as $[(u,w_1,\dots,w_d)]$, solves the convex integer multiway transportation problem

$$\max\Big\{c(w_1x,\dots,w_dx) : x \in \mathbb{N}^{m_1\times\cdots\times m_k\times n},\ x_{i_1,\dots,i_{k+1}} = u_{i_1,\dots,i_{k+1}},\ \mathrm{supp}(i_1,\dots,i_{k+1}) \in \mathcal F\Big\}.$$


Privacy and Entry-Uniqueness 


A common practice in the disclosure of a multiway 
table containing sensitive data is to release some of 
the table margins rather than the table itself, see e.g. 
[11,18,19] and the references therein. Once the margins 
are released, the security of any specific entry of the ta- 
ble is related to the set of possible values that can occur 
in that entry in any table having the same margins as 
those of the source table in the data base. In particular, 
if this set consists of a unique value, that of the source 
table, then this entry can be exposed and privacy can be 
violated. This raises the following fundamental entry- 
uniqueness problem: given a consistent disclosed (hier- 
archical) collection of margin values, and a specific en- 
try index, is the value that can occur in that entry in 
any table having these margins unique? We now de- 
scribe the results of [48] that settle the complexity of 
this problem, and interpret the consequences for secure 
statistical data disclosure. 

First, we show that if two sides of the table are variable part of the input then the entry-uniqueness problem is intractable already for short 3-way tables with all 2-margins (line-sums) disclosed (corresponding to $\mathcal F = \{\{1,2\},\{1,3\},\{2,3\}\}$). This can be easily extended to k-way tables of any dimension $k \ge 3$ and $\mathcal F$ the collection of all h-subsets of $\{1,\dots,k\}$ for any $1 < h < k$ as long as two sides of the table are variable; we omit the proof of this extended result. While this result indicates that the disclosing agency may not be able to check for uniqueness in this situation, some consolation lies in the fact that an adversary will be computationally unable to identify and retrieve a unique entry either.


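Corollary 16 It is coNP-complete to decide, given $r, c$, and line-sums $u \in \mathbb{Z}^{r\times c}$, $v \in \mathbb{Z}^{r\times 3}$, $z \in \mathbb{Z}^{c\times 3}$, encoded as $[(u,v,z)]$, if the entry $x_{1,1,1}$ is the same in all tables in

$$\Big\{x \in \mathbb{N}^{r\times c\times 3} : \sum_i x_{i,j,k} = z_{j,k},\ \sum_j x_{i,j,k} = v_{i,k},\ \sum_k x_{i,j,k} = u_{i,j}\Big\}.$$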


Proof The subset-sum problem, well known to be NP-complete, is the following: given positive integers $a_0, a_1,\dots,a_m$, decide if there is an $I \subseteq \{1,\dots,m\}$ with $a_0 = \sum_{i\in I} a_i$. We reduce the complement of subset-sum to entry-uniqueness. Given $a_0, a_1,\dots,a_m$, consider the polytope in $2(m+1)$ variables $y_0, y_1,\dots,y_m, z_0, z_1,\dots,z_m$,

$$P := \Big\{(y,z) \in \mathbb{R}_+^{2(m+1)} : a_0y_0 - \sum_{i=1}^m a_iy_i = 0,\ y_i + z_i = 1,\ i = 0,1,\dots,m\Big\}.$$

First, note that it always has one integer point with $y_0 = 0$, given by $y_i = 0$ and $z_i = 1$ for all $i$. Second, note that it has an integer point with $y_0 \ne 0$ if and only if there is an $I \subseteq \{1,\dots,m\}$ with $a_0 = \sum_{i\in I} a_i$, given by $y_0 = 1$, $y_i = 1$ for $i \in I$, $y_i = 0$ for $i \in \{1,\dots,m\}\setminus I$, and $z_i = 1 - y_i$ for all $i$. Lifting $P$ to a suitable $r\times c\times 3$ line-sum polytope $T$ with the coordinate $y_0$ embedded in the entry $x_{1,1,1}$ using Theorem 21, we find that $T$ has a table with $x_{1,1,1} = 0$, and this value is unique among the tables in $T$ if and only if there is no solution to the subset-sum problem with $a_0, a_1,\dots,a_m$. □
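The two observations in this proof can be checked by brute force on small instances. The Python sketch below is our illustration; the helper names are hypothetical and the enumeration is exponential in $m$. It lists the integer points of $P$ and compares "$y_0 = 0$ at every integer point" with the absence of a subset-sum solution.

```python
from itertools import combinations, product


def integer_points(a):
    """Integer points (y, z) of P for data a = [a_0, a_1, ..., a_m]."""
    m = len(a) - 1
    points = []
    # y_i + z_i = 1 with y, z >= 0 forces y_i in {0, 1} and z_i = 1 - y_i
    for y in product((0, 1), repeat=m + 1):
        if a[0] * y[0] - sum(a[i] * y[i] for i in range(1, m + 1)) == 0:
            points.append((y, tuple(1 - yi for yi in y)))
    return points


def y0_unique(a):
    """True iff y_0 = 0 at every integer point of P (the all-zero y always qualifies)."""
    return all(y[0] == 0 for y, _ in integer_points(a))


def has_subset_sum(a):
    """Direct check: is a_0 the sum of some subset of a_1, ..., a_m?"""
    rest = a[1:]
    return any(sum(c) == a[0] for r in range(len(rest) + 1) for c in combinations(rest, r))


for a in ([5, 2, 3, 4], [10, 2, 3, 4], [7, 1, 2, 4], [1, 2, 3]):
    assert y0_unique(a) == (not has_subset_sum(a))
```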


Next we show that, in contrast, when all table sides but one are fixed (but arbitrary), and one side $n$ is variable, then, as a consequence of Corollary 14, the corresponding long k-way entry-uniqueness problem for any hierarchical collection of margins can be solved in polynomial time. In this situation, the algorithm of Corollary 17 below allows disclosing agencies to efficiently check possible collections of margins before disclosure: if an entry value is not unique then disclosure may be assumed secure, whereas if the value is unique then disclosure may be risky and fewer margins should be released. Note that this situation of long multiway tables, where one category is significantly richer than the others, that is, when each sample point can take many values in one category and only few values in the other categories, occurs often in practical applications, e.g., when one category is the individual's age and the other categories are binary (“yes-no”). In such situations, our polynomial time algorithm below allows disclosing agencies to check entry-uniqueness and make informed decisions on secure disclosure.


Corollary 17 For every fixed $k$, table sides $m_1,\dots,m_k$, and family $\mathcal F$ of subsets of $\{1,\dots,k+1\}$, there is a polynomial time algorithm that, given $n$, integer values $u = (u_{j_1,\dots,j_{k+1}})$ for all margins supported on $\mathcal F$, and entry index $(i_1,\dots,i_{k+1})$, encoded as $[n,(u)]$, decides if the entry $x_{i_1,\dots,i_{k+1}}$ is the same in all tables in the set

$$\Big\{x \in \mathbb{N}^{m_1\times\cdots\times m_k\times n} : x_{j_1,\dots,j_{k+1}} = u_{j_1,\dots,j_{k+1}},\ \mathrm{supp}(j_1,\dots,j_{k+1}) \in \mathcal F\Big\}.$$


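Proof By Corollary 14 we can solve in polynomial time both transportation problems

$$l := \min\Big\{x_{i_1,\dots,i_{k+1}} : x \in \mathbb{N}^{m_1\times\cdots\times m_k\times n} \cap T_{\mathcal F}\Big\},\qquad \bar u := \max\Big\{x_{i_1,\dots,i_{k+1}} : x \in \mathbb{N}^{m_1\times\cdots\times m_k\times n} \cap T_{\mathcal F}\Big\}$$

over the corresponding k-way transportation polytope

$$T_{\mathcal F} := \Big\{x \in \mathbb{R}_+^{m_1\times\cdots\times m_k\times n} : x_{j_1,\dots,j_{k+1}} = u_{j_1,\dots,j_{k+1}},\ \mathrm{supp}(j_1,\dots,j_{k+1}) \in \mathcal F\Big\}.$$

Clearly, entry $x_{i_1,\dots,i_{k+1}}$ has the same value in all tables with the given (disclosed) margins if and only if $l = \bar u$, completing the description of the algorithm and the proof. □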
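The min/max criterion in this proof can be illustrated by brute force in the simplest case of 2-way tables with fixed row and column sums. This Python sketch is exponential and only shows the test $l = \bar u$. It is not the polynomial time n-fold algorithm of Corollary 14, and the helper names are hypothetical.

```python
from itertools import product


def entry_range(row_sums, col_sums, cell):
    """Brute-force (l, u_bar): min and max of x[cell] over all nonnegative
    integer tables with the given row and column sums (2-way case only)."""
    r, c = len(row_sums), len(col_sums)
    cells = [(i, j) for i in range(r) for j in range(c)]
    bounds = [range(min(row_sums[i], col_sums[j]) + 1) for i, j in cells]
    values = []
    for vals in product(*bounds):
        x = dict(zip(cells, vals))
        if (all(sum(x[i, j] for j in range(c)) == row_sums[i] for i in range(r))
                and all(sum(x[i, j] for i in range(r)) == col_sums[j] for j in range(c))):
            values.append(x[cell])
    if not values:
        return None                      # no table has these margins
    return min(values), max(values)


def entry_is_unique(row_sums, col_sums, cell):
    """Entry-uniqueness test: the entry is determined iff l == u_bar."""
    rng = entry_range(row_sums, col_sums, cell)
    return rng is not None and rng[0] == rng[1]


print(entry_range((2, 1), (2, 1), (0, 0)))        # (1, 2): not unique
print(entry_is_unique((3, 0), (2, 1), (0, 0)))    # True: the margins expose x_{1,1} = 2
```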
Let $f: S \to \mathbb{R}$ be a lower semicontinuous function, where $S \subseteq \mathbb{R}^n$ is a nonempty convex subset. The convex envelope taken over $S$ is a function $f_S: S \to \mathbb{R}$ such that

• $f_S$ is a convex function defined over the set $S$;
• $f_S(x) \le f(x)$ for all $x \in S$;
• if $h$ is any other convex function such that $h(x) \le f(x)$ for all $x \in S$, then $h(x) \le f_S(x)$ for all $x \in S$.

In other words, $f_S$ is the pointwise supremum of all convex underestimators of $f$ over $S$, and is uniquely determined. The following are the most fundamental properties, shown in [3,6]. Suppose that the minimum of $f$ over $S$ exists. Then

$$\min\{f(x) : x \in S\} = \min\{f_S(x) : x \in S\}$$

and

$$\{x^* : f(x^*) \le f(x),\ \forall x \in S\} \subseteq \{x^* : f_S(x^*) \le f_S(x),\ \forall x \in S\}.$$

These properties indicate that an optimal solution of a nonconvex minimization problem could be obtained by minimizing the associated convex envelope. In general, however, finding the convex envelope is at least as difficult as solving the original problem.

Several practical results have been proposed for special classes of objective functions and constraints. Suppose that the function $f$ is concave and $S$ is a polytope with vertices $v^0,\dots,v^K$. Then the convex envelope $f_S$ over $S$ can be expressed as:

$$f_S(x) = \min \sum_{i=0}^{K} \alpha_i f(v^i) \quad \text{s.t.}\ \sum_{i=0}^{K} \alpha_i v^i = x,\ \sum_{i=0}^{K}\alpha_i = 1,\ \alpha_i \ge 0,\ i = 0,\dots,K.$$
In particular, if $S$ is an $n$-simplex with vertices $v^0,\dots,v^n$, then $f_S$ is the affine function

$$f_S(x) = a^{\mathsf T}x + b,$$

which is uniquely determined by solving the linear system

$$a^{\mathsf T}v^i + b = f(v^i), \quad i = 0,\dots,n.$$

The properties above have been used to solve concave minimization problems with linear constraints [4,6].
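For a concave $f$ on a simplex, the affine envelope can be computed exactly from the linear system above. The Python sketch below is our illustration; the helper names and the test function $f(x) = -(x_1^2 + x_2^2)$ on the unit triangle are assumptions. It solves for $(a, b)$ with exact rational arithmetic. The result matches $f$ at the vertices and underestimates it inside.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve the nonsingular square system M w = rhs by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def simplex_envelope(f, vertices):
    """(a, b) with a^T v^i + b = f(v^i) for the n + 1 vertices of an n-simplex."""
    M = [list(v) + [1] for v in vertices]      # unknowns (a_1, ..., a_n, b)
    w = solve(M, [f(v) for v in vertices])
    return w[:-1], w[-1]


def f(x):
    return -(x[0] ** 2 + x[1] ** 2)            # concave


verts = [(0, 0), (1, 0), (0, 1)]
a, b = simplex_envelope(f, verts)


def env(x):
    return sum(ai * xi for ai, xi in zip(a, x)) + b


print([int(ai) for ai in a], int(b))           # [-1, -1] 0
```

So on this triangle $f_S(x) = -x_1 - x_2$; for instance $f_S(1/3,1/3) = -2/3 \le f(1/3,1/3) = -2/9$.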

The following property shown in [1,5] is frequently used in the literature. For each $i = 1,\dots,p$, let $f^i: S_i \to \mathbb{R}$ be a continuous function, where $S_i \subseteq \mathbb{R}^{n_i}$, and let $n = n_1 + \cdots + n_p$. If

$$f(x) = \sum_{i=1}^p f^i(x^i)$$

and

$$S = S_1\times\cdots\times S_p,$$

where $x^i \in \mathbb{R}^{n_i}$, $i = 1,\dots,p$, and $x = (x^1,\dots,x^p) \in \mathbb{R}^n$, then the convex envelope $f_S(x)$ can be expressed as

$$f_S(x) = \sum_{i=1}^p f^i_{S_i}(x^i).$$
In particular, let $f(x) = \sum_{i=1}^n f_i(x_i)$ be a separable function, where $x = (x_1,\dots,x_n) \in \mathbb{R}^n$, and let $f_i(x_i)$ be concave for each $i = 1,\dots,n$. Then the convex envelope of $f(x)$ over the rectangle $R = \{x \in \mathbb{R}^n : a_i \le x_i \le b_i,\ i = 1,\dots,n\}$ is the affine function given by the sum of the linear functions below:

$$f_R(x) = \sum_{i=1}^n l_i(x_i),$$

where $l_i(x_i)$ meets $f_i(x_i)$ at both ends of the interval $a_i \le x_i \le b_i$ for each $i = 1,\dots,n$. B. Kalantari and J.B. Rosen [7] give an algorithm for the global minimization of a quadratic concave function over a polytope. They exploit convex envelopes of separable functions over rectangles to generate lower bounds in a branch and bound scheme.
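In the separable concave case the envelope is one secant per coordinate. Here is a minimal Python sketch of that construction (our illustration; the concave functions $f_1(t) = -t^2$ on $[-1,2]$ and $f_2(t) = \sqrt t$ on $[0,4]$ are arbitrary examples, and the helper names are hypothetical).

```python
import math
from itertools import product


def secant(fi, lo, hi):
    """The linear function l_i meeting fi at both ends of [lo, hi] (lo < hi)."""
    slope = (fi(hi) - fi(lo)) / (hi - lo)
    return lambda t: fi(lo) + slope * (t - lo)


def separable_envelope(fs, lows, highs):
    """Convex envelope of sum_i fs[i](x_i) over the box prod_i [lows[i], highs[i]],
    valid when every fs[i] is concave: the sum of the per-coordinate secants."""
    lines = [secant(fi, lo, hi) for fi, lo, hi in zip(fs, lows, highs)]
    return lambda x: sum(li(xi) for li, xi in zip(lines, x))


def f(x):
    return -x[0] * x[0] + math.sqrt(x[1])


fs = [lambda t: -t * t, math.sqrt]
lows, highs = [-1.0, 0.0], [2.0, 4.0]
env = separable_envelope(fs, lows, highs)

# Equality at the corners of the box, underestimation on a grid inside it.
corners = list(product(*zip(lows, highs)))
grid = [(-1 + 3 * p / 10, 4 * q / 10) for p in range(11) for q in range(11)]
print(max(abs(env(x) - f(x)) for x in corners) < 1e-12,
      all(env(x) <= f(x) + 1e-12 for x in grid))    # True True
```

Here the envelope works out to $f_R(x) = -x_1 - 2 + x_2/2$.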

Also, convex envelopes of bilinear functions over rectangles have been proposed in [2]. Consider the following rectangles:

$$\Omega_i := \{(x_i,y_i) \in \mathbb{R}^2 : l_i \le x_i \le L_i,\ m_i \le y_i \le M_i\},\quad i = 1,\dots,n,$$

and let

$$f^i(x_i,y_i) = x_iy_i,\quad i = 1,\dots,n,$$

be bilinear functions of two variables. It has been shown that for each $i = 1,\dots,n$, the convex envelope of $f^i(x_i,y_i)$ over $\Omega_i$ is expressed as:

$$f^i_{\Omega_i}(x_i,y_i) = \max\{m_ix_i + l_iy_i - l_im_i,\ M_ix_i + L_iy_i - L_iM_i\}.$$

Moreover, it can be verified that $f^i_{\Omega_i}(x_i,y_i)$ agrees with $f^i(x_i,y_i)$ at the four extreme points of $\Omega_i$. Thus, the convex envelope of the general bilinear function

$$f(x,y) = x^{\mathsf T}y = \sum_{i=1}^n f^i(x_i,y_i),$$

where $x^{\mathsf T} = (x_1,\dots,x_n)$ and $y^{\mathsf T} = (y_1,\dots,y_n)$, over $\Omega = \Omega_1\times\cdots\times\Omega_n$ can be expressed as

$$f_\Omega(x,y) = \sum_{i=1}^n f^i_{\Omega_i}(x_i,y_i).$$

Another characterization of convex envelopes of bilinear functions over a special type of polytope, which includes a rectangle as a special case, is derived in [8].
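The bilinear envelope formula is easy to check numerically. In this Python sketch (our illustration; the bounds $l = -1$, $L = 2$, $m = 0$, $M = 3$ and the helper names are arbitrary choices), exact rational arithmetic confirms two facts. The envelope agrees with $x_iy_i$ at the four extreme points of $\Omega_i$ and underestimates it on a grid. Summing over $i$ gives the envelope of $x^{\mathsf T}y$ over $\Omega$.

```python
from fractions import Fraction
from itertools import product


def bilinear_envelope(x, y, l, L, m, M):
    """Convex envelope of x*y over {l <= x <= L, m <= y <= M}."""
    return max(m * x + l * y - l * m, M * x + L * y - L * M)


def bilinear_sum_envelope(xs, ys, bounds):
    """Envelope of x^T y over the product of rectangles; bounds[i] = (l, L, m, M)."""
    return sum(bilinear_envelope(x, y, *bd) for x, y, bd in zip(xs, ys, bounds))


l, L, m, M = -1, 2, 0, 3
corners = list(product((l, L), (m, M)))
grid = [(Fraction(l) + Fraction(3 * p, 8), Fraction(m) + Fraction(3 * q, 8))
        for p in range(9) for q in range(9)]
print(all(bilinear_envelope(x, y, l, L, m, M) == x * y for x, y in corners),
      all(bilinear_envelope(x, y, l, L, m, M) <= x * y for x, y in grid))   # True True
```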


See also 


▶ αBB Algorithm

▶ Global Optimization in Generalized Geometric Programming

▶ MINLP: Global Optimization with αBB
Introduction


A twice continuously differentiable function in several 
variables, when considered on a compact convex set 
C, becomes convex if an appropriate convex quadratic 
is added to it, e.g. [2]. Equivalently, a twice continu- 
ously differentiable function is the difference of a con- 
vex function and a convex quadratic on C. This decom- 
position is valid also for smooth functions with Lip- 
schitz derivatives [8]. Here we recall three conditions 
that are both necessary and sufficient for the decom- 
position [9,10]. We also list several implications of the 
convexification in optimization and applied mathemat- 
ics [10,11]. A different notion of convexification is stud- 
ied in, e.g. [6]; see also [3,5]. 


Definitions 


Definition 1. ([7,10]) Given a continuous function $f: \mathbb{R}^n \to \mathbb{R}$ on a compact convex set $C$ of the Euclidean space $\mathbb{R}^n$, consider $\phi: \mathbb{R}^{n+1} \to \mathbb{R}$ defined by $\phi(x,\alpha) = f(x) - \tfrac12\alpha x^{\mathsf T}x$, where $x^{\mathsf T}$ is the transpose of $x$. If $\phi(x,\alpha)$ is convex on $C$ for some $\alpha = \alpha^*$, then $\phi(x,\alpha^*)$ is said to be a convexification of $f$ and $\alpha^*$ is its convexifier on $C$. Function $f$ is convexifiable if it has a convexification.


Observation. If $\alpha^*$ is a convexifier of $f$ on a compact convex set $C$, then so is every $\alpha \le \alpha^*$.


Illustration 1. Consider $f(t) = \cos t$ on, say, $-\pi \le t \le 2\pi$. This function is convexifiable; its convexifier is any $\alpha \le -1$. For, e.g., $\alpha^* = -2$, its convexification is $\phi(t,-2) = \cos t + t^2$. Note that $f(t)$ is the difference of the (strictly) convex $\phi(t,\alpha) = \cos t - \tfrac12\alpha t^2$ and the (strictly) convex quadratic $-\tfrac12\alpha t^2$ for every sufficiently small $\alpha$. The graphs of $f(t)$ and its convexification $\phi(t,-2)$ are depicted in Fig. 1.
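Illustration 1 can be checked numerically (our illustration; the grid and helper names are assumptions). On the interval $[-\pi, 2\pi]$ the second differences of $\phi(t,\alpha) = \cos t - \tfrac12\alpha t^2$ stay nonnegative for $\alpha = -1$ and $\alpha = -2$, but not for $\alpha = -1/2$. Nonnegative second differences on a grid are only evidence of convexity. A negative one does certify non-convexity.

```python
import math


def min_second_difference(phi, lo, hi, steps):
    """Smallest phi(t - h) - 2 phi(t) + phi(t + h) over a uniform grid on [lo, hi]."""
    h = (hi - lo) / steps
    grid = [lo + k * h for k in range(1, steps)]
    return min(phi(t - h) - 2 * phi(t) + phi(t + h) for t in grid)


def phi(alpha):
    """Candidate convexification phi(t, alpha) = cos t - alpha * t^2 / 2."""
    return lambda t: math.cos(t) - 0.5 * alpha * t * t


lo, hi, steps = -math.pi, 2 * math.pi, 300
print(min_second_difference(phi(-1.0), lo, hi, steps) >= -1e-12,
      min_second_difference(phi(-2.0), lo, hi, steps) >= -1e-12,
      min_second_difference(phi(-0.5), lo, hi, steps) < 0)     # True True True
```

The value $\alpha = -1$ is the borderline case, since $\phi''(t,-1) = 1 - \cos t$ vanishes at $t = 0$ and $t = 2\pi$.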


Characterizations of a Convexifiable Function 


One can characterize convexifiable functions using the fact that a continuous $f: \mathbb{R}^n \to \mathbb{R}$ is convex on $C$ if, and only if, $f$ is mid-point convex, i.e., $f((x+y)/2) \le \tfrac12(f(x) + f(y))$, $x, y \in C$, e.g. [4]. Denote the norm of $u \in \mathbb{R}^n$ by $\|u\| = (u^{\mathsf T}u)^{1/2}$. With a continuous $f: \mathbb{R}^n \to \mathbb{R}$ one can associate $W: \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$:


Definition 2. ([10]) Consider a continuous $f: \mathbb{R}^n \to \mathbb{R}$ on a compact convex set $C$ in $\mathbb{R}^n$. The mid-point acceleration function of $f$ on $C$ is the function

$$W(x,y) = \frac{4}{\|x - y\|^2}\Big[f(x) + f(y) - 2f\Big(\frac{x+y}{2}\Big)\Big],\quad x, y \in C,\ x \ne y.$$

Convexifiable Functions, Characterization of, Figure 1
Function f(t) = cos t and its convexification

Convexifiable Functions, Characterization of, Figure 2
Mid-point acceleration function of f(t) = cos t


Function W describes a mid-point “displacement of 
the displacement” (i.e., the “acceleration”) of f at x be- 
tween x and y along the direction y — x. The graph of W 
for the scalar function f(t) = cos t is depicted in Fig. 2. 
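The mid-point acceleration function is straightforward to evaluate. The Python sketch below (our illustration, for the scalar case; the sampling interval and grid are assumptions) samples $W$ for $f(t) = \cos t$ on $[-2\pi, 2\pi]$. Here $W(x,y) = 2\cos\frac{x+y}{2}\big(\cos\frac{y-x}{2} - 1\big)\big/\big(\frac{y-x}{2}\big)^2$. It is therefore bounded below by $-1$, in line with Theorem 1 and the convexifier $\alpha = -1$ of Illustration 1. Sampling can illustrate such a bound but cannot prove it.

```python
import math


def W(f, x, y):
    """Mid-point acceleration function of a scalar f at points x != y."""
    return 4.0 / (x - y) ** 2 * (f(x) + f(y) - 2.0 * f((x + y) / 2.0))


N = 60
pts = [-2 * math.pi + 4 * math.pi * k / N for k in range(N + 1)]
values = [W(math.cos, x, y) for x in pts for y in pts if x != y]
print(min(values) >= -1 - 1e-9, min(values) < -0.9)   # True True
```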


Using W one can characterize a convexifiable func- 
tion: 


Theorem 1. ([10]) Consider a continuous $f: \mathbb{R}^n \to \mathbb{R}$ on a compact convex set $C$ in $\mathbb{R}^n$. Function $f$ is convexifiable on $C$ if, and only if, its mid-point acceleration function $W$ is bounded from below on $C$.


For scalar functions one can also use a determinant: 


Theorem 2. ([9] Determinant Characterization of Scalar Convexifiable Function) A continuous scalar function $f: \mathbb{R} \to \mathbb{R}$ is convexifiable on a compact convex interval $I$ if, and only if, there exists a number $\alpha$ such that for every three points $s < t < \xi$ in $I$

$$\det\begin{pmatrix}1 & 1 & 1\\ s & t & \xi\\ f(s) & f(t) & f(\xi)\end{pmatrix} \ge \tfrac12\alpha(s-t)(t-\xi)(\xi-s).$$

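Illustration 2. Function $f(t) = -|t|^{3/2}$ on $C = [-1,1]$ is continuously differentiable but it is not convexifiable. Indeed, for $s = 0$, $W(0,t) = 2^{3/2}(1 - 2^{1/2})/t^{1/2} \to -\infty$ as $t \to 0$, $t > 0$. Also, using Theorem 2 at $s = -\varepsilon$, $t = 0$, $\xi = \varepsilon > 0$, we find that $\alpha$ would have to satisfy $\alpha \le -2/\sqrt{\varepsilon}$ for every $\varepsilon > 0$, so there is no such $\alpha$ (let $\varepsilon \to 0$). Function $g(t) = -|t|$ is not convexifiable around the origin.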
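Both arguments in Illustration 2 can be reproduced numerically. The Python sketch below is our illustration, and its helper names are hypothetical. It evaluates $W(0,t)$ from Definition 2 and the largest $\alpha$ allowed by the determinant inequality of Theorem 2 at $s = -\varepsilon$, $t = 0$, $\xi = \varepsilon$. Both tend to $-\infty$ as $t, \varepsilon \to 0$.

```python
import math


def f(t):
    return -abs(t) ** 1.5


def W0(t):
    """W(0, t) for f(t) = -|t|^(3/2), computed from the definition of W."""
    return 4.0 / t ** 2 * (f(0.0) + f(t) - 2.0 * f(t / 2.0))


def det3(s, t, xi, g):
    """det [[1, 1, 1], [s, t, xi], [g(s), g(t), g(xi)]], expanded along row 1."""
    return (t * g(xi) - xi * g(t)) - (s * g(xi) - xi * g(s)) + (s * g(t) - t * g(s))


def alpha_bound(eps):
    """Largest alpha with det >= alpha/2 (s - t)(t - xi)(xi - s) at (-eps, 0, eps)."""
    s, t, xi = -eps, 0.0, eps
    return det3(s, t, xi, f) / (0.5 * (s - t) * (t - xi) * (xi - s))


for t in (1e-2, 1e-4, 1e-6):
    # W(0, t), its closed form 2^{3/2}(1 - 2^{1/2}) / sqrt(t), and the bound -2 / sqrt(t)
    print(t, W0(t), 2 ** 1.5 * (1 - 2 ** 0.5) / math.sqrt(t), alpha_bound(t))
```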


Scalar convexifiable functions can be represented ex- 
plicitly on a compact interval I: 


Theorem 3. ([9] Explicit Representation of Scalar Convexifiable Functions) A continuous scalar function $f: I \to \mathbb{R}$ is convexifiable if, and only if, there exists a number $\alpha$ such that

$$f(t) = f(c) + \tfrac12\alpha(t^2 - c^2) + \int_c^t g(\tau,\alpha)\,d\tau.$$

Here $c, t \in I$, $c < t$, and $g = g(\cdot,\alpha): I \to \mathbb{R}$ is a nondecreasing right-continuous function.

An implication of this result is that every smooth 
function with a Lipschitz derivative, in particular every 
analytic function and every trajectory of an object gov- 
erned by Newton’s Second Law, is of this form. 

Two important classes of functions are convexifiable, and a convexifier $\alpha$ can be given explicitly. First, if $f$ is twice continuously differentiable, then the second derivative of $f$ at $x$ is represented by the Hessian matrix $H(x) = (\partial^2 f(x)/\partial x_i\partial x_j)$, $i,j = 1,\dots,n$. This is a symmetric matrix with real eigenvalues. Denote its smallest eigenvalue at $x$ by $\lambda(x)$ and its “globally” smallest eigenvalue over a compact convex set $C$ by

$$\lambda^* = \min_{x\in C}\lambda(x).$$

Corollary 1. ([7]) A twice continuously differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ is convexifiable on a compact convex set $C$ in $\mathbb{R}^n$ and $\alpha = \lambda^*$ is a convexifier.
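Corollary 1 can be illustrated on a hypothetical example of our choosing, $f(x_1,x_2) = x_1x_2 + \sin x_1$ on $C = [-\pi/2,\pi/2]^2$. Its Hessian $\begin{pmatrix}-\sin x_1 & 1\\ 1 & 0\end{pmatrix}$ has smallest eigenvalue $\lambda(x) = \big(-\sin x_1 - \sqrt{\sin^2 x_1 + 4}\big)/2$. The Python sketch estimates $\lambda^*$ on a grid. Because the grid contains the minimizer $x_1 = \pi/2$, the estimate equals $(-1-\sqrt5)/2$. The sketch then checks mid-point convexity of $\phi(x,\lambda^*)$ on sample pairs, a test that $f$ itself fails. Grid sampling is an assumption of the sketch, not a general method for computing $\lambda^*$.

```python
import math
from itertools import product


def f(x1, x2):
    return x1 * x2 + math.sin(x1)


def smallest_hessian_eigenvalue(x1, x2):
    """Smallest eigenvalue of the Hessian [[-sin x1, 1], [1, 0]] of f (x2 does not enter)."""
    s = math.sin(x1)
    return (-s - math.sqrt(s * s + 4.0)) / 2.0


N = 40
grid = [-math.pi / 2 + k * math.pi / N for k in range(N + 1)]
lam_star = min(smallest_hessian_eigenvalue(a, b) for a, b in product(grid, grid))


def phi(x1, x2, alpha=lam_star):
    """Convexification phi(x, alpha) = f(x) - alpha/2 * x^T x."""
    return f(x1, x2) - 0.5 * alpha * (x1 * x1 + x2 * x2)


def midpoint_gap(g, p, q):
    """(g(p) + g(q)) / 2 - g((p + q) / 2); nonnegative for all p, q when g is convex."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return 0.5 * (g(*p) + g(*q)) - g(*mid)


coarse = list(product(grid[::4], repeat=2))
print(lam_star, (-1 - math.sqrt(5)) / 2)
print(min(midpoint_gap(phi, p, q) for p in coarse for q in coarse) >= -1e-12,
      min(midpoint_gap(f, p, q) for p in coarse for q in coarse) < 0)   # True True
```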

Suppose that $f$ is a continuously differentiable (smooth) function with the derivative satisfying the Lipschitz property, i.e., $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$ for every $x, y \in C$ and some constant $L$. Here $\nabla f(u)$ is the (Fréchet) derivative of $f$ at $u$. We represent the derivative of $f$ at $x$ by a column $n$-tuple gradient $\nabla f(x) = (\partial f(x)/\partial x_i)$.


Corollary 2. ([8]) A continuously differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, with the derivative having the Lipschitz property with a constant $L$ on a compact convex set $C$ in $\mathbb{R}^n$, is convexifiable on $C$, and $\alpha = -L$ is a convexifier.


A Lipschitz function need not be convexifiable. For example, $f(t) = t^2 \sin(1/t)$ for $t \ne 0$ and $f(0) = 0$ is a Lipschitz function, and it is also differentiable (but not continuously differentiable). Its derivative is uniformly bounded, but the function is not convexifiable; see, e.g., [7,11].


Canonical Form of Smooth Programs 


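Every mathematical program

$$(\mathrm{NP})\qquad \min f(x), \quad f^i(x) \le 0,\ i \in P = \{1, \dots, m\}, \quad x \in C,$$

where the functions $f, f^i: \mathbb{R}^n \to \mathbb{R}$, $i \in P$, are continuous and convexifiable on a compact convex set $C$, can be reduced to a canonical form. First one considers some convexifications of these functions: $\phi(x, \alpha) = f(x) - \frac{1}{2}\alpha x^T x$ and $\phi^i(x, \alpha_i) = f^i(x) - \frac{1}{2}\alpha_i x^T x$, where $\alpha, \alpha_i$ are, respectively, arbitrary convexifiers of $f, f^i$, $i \in P$. Then one associates with (NP) the following program with partly linear convexifications

$$(\mathrm{LF};\theta,\varepsilon):\qquad \min_{(x,\theta)}\ \phi(x, \alpha) + \frac{1}{2}\alpha\, x^T \theta$$
$$\phi^i(x, \alpha_i) + \frac{1}{2}\alpha_i\, x^T \theta \le 0, \quad i \in P,$$
$$x \in C, \quad \|x - \theta\| \le \varepsilon.$$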


Here $\varepsilon \ge 0$ is a scalar parameter. This parameter was fixed at zero in [2]. For the sake of "numerical stability" it was extended to $\varepsilon > 0$ in [7]. If the norm is chosen to be uniform, i.e., $\|u\|_\infty = \max_{i=1,\dots,n} |u_i|$, then $(\mathrm{LF};\theta,\varepsilon)$ is a convex program in $x$ for every fixed $(\theta, \varepsilon)$ and linear in $(\theta, \varepsilon)$ for every $x$. Such programs are called partly linear-convex. The theory of optimality and stability for such programs and related models is well studied, e.g. [1,8].


Remark Since one can construct the program $(\mathrm{LF};\theta,\varepsilon)$ for every (NP) with convexifiable functions, we refer to $(\mathrm{LF};\theta,\varepsilon)$ as the parametric Liu-Floudas canonical form of (NP).


Let us relate an optimal solution of (NP) to optimal solutions $x^\circ(\varepsilon)$, $\theta^\circ(\varepsilon)$ of $(\mathrm{LF};\theta,\varepsilon)$.


Theorem 4. ([8,9]) Consider (NP) with a unique optimal solution, where all functions are assumed to be convexifiable, and its partly linear-convex program $(\mathrm{LF};\theta,\varepsilon)$. Then a feasible $x^*$ is an optimal solution of (NP) if, and only if, $x^* = \lim_{\varepsilon \to 0} x^\circ(\varepsilon)$ and $\theta^* = \lim_{\varepsilon \to 0} \theta^\circ(\varepsilon)$, with $x^* = \theta^*$. Moreover, the feasible set mapping of $(\mathrm{LF};\theta,\varepsilon)$ is lower semicontinuous at $\theta^*$ and $\varepsilon^* = 0$, relative to all feasible perturbations of $(\theta, \varepsilon)$.


Other Applications 


There are many other areas of applications of convexi- 
fiable functions: 

(i) Every convexifiable function $f$ is the difference of a convex function $\phi(x, \alpha)$ and a convex quadratic $-\frac{1}{2}\alpha x^T x$ for every sufficiently small $\alpha \le 0$ on a compact convex set. Hence the results for convex functions can be applied to $\phi(x, \alpha)$. With minor adjustments pertaining to the quadratic term, such results can be extended to convexifiable (generally nonconvex) functions. Here is an illustration of how this works for the mean value. The result is well known for convex functions (the case $\alpha = 0$).


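Theorem 5. ([9]) Consider a continuous scalar convexifiable function $f: \mathbb{R} \to \mathbb{R}$ on an open interval $(a, b)$ with a convexifier $\alpha$. Then

$$\frac{1}{d-c}\int_c^d f(t)\,dt \le \frac{1}{2}\big[f(c) + f(d)\big] - \frac{1}{12}\alpha\,(d-c)^2 \quad \text{for every } a < c < d < b.$$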
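The inequality of Theorem 5 is easy to probe numerically. The Python sketch below is our own illustration, not from [9]: it uses $f = \sin$, for which $\alpha = -1$ is a convexifier by Corollary 1 because $(\sin)'' = -\sin \ge -1$, and it approximates the integral mean with a composite Simpson rule. The helper names are hypothetical.

```python
import math

def mean_value(f, c, d, n=10_000):
    # Composite Simpson rule (n even) for (1/(d - c)) * integral_c^d f(t) dt
    h = (d - c) / n
    s = f(c) + f(d)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(c + k * h)
    return s * h / 3 / (d - c)

def theorem5_rhs(f, alpha, c, d):
    # Right-hand side of Theorem 5: (f(c) + f(d))/2 - alpha (d - c)^2 / 12
    return 0.5 * (f(c) + f(d)) - alpha * (d - c) ** 2 / 12

for c, d in [(0.0, math.pi), (-2.0, 1.0), (1.0, 1.5)]:
    print(c, d, mean_value(math.sin, c, d) <= theorem5_rhs(math.sin, -1.0, c, d))
```

For $f(t) = t^2$ with $\alpha = 2$ the function $\phi$ is identically zero and Theorem 5 holds with equality, which is a convenient sanity check.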


A composite version of this result follows. 


Theorem 6. ([9] Integral Mean-Value for Composite Convexifiable Function) Let $f: (a, b) \to \mathbb{R}$ be continuous and convexifiable with a convexifier $\alpha$, and let $g: [c, d] \to (a, b)$ be continuous. Then

$$f\Big(\frac{1}{d-c}\int_c^d g(t)\,dt\Big) \le \frac{1}{d-c}\int_c^d f(g(t))\,dt + \frac{1}{2}\alpha\,R(c, d; g),$$

where

$$R(c, d; g) = \frac{1}{(d-c)^2}\Big(\int_c^d g(t)\,dt\Big)^2 - \frac{1}{d-c}\int_c^d g(t)^2\,dt.$$
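A similar numerical check applies to Theorem 6. In this Python sketch, which is our own illustration with hypothetical helper names, `jensen_gap` returns the right-hand side minus the left-hand side, which should be nonnegative whenever $\alpha$ is a convexifier of $f$. The test data ($f = \sin$ with $\alpha = -1$, and $g(t) = 2t + \cos 3t$) are arbitrary choices. Note that $R(c, d; g) \le 0$ by the Cauchy-Schwarz inequality.

```python
import math

def avg(h, c, d, n=10_000):
    # Composite Simpson approximation of (1/(d - c)) * integral_c^d h(t) dt
    step = (d - c) / n
    s = h(c) + h(d)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * h(c + k * step)
    return s * step / 3 / (d - c)

def jensen_gap(f, g, alpha, c, d):
    """RHS minus LHS of Theorem 6; nonnegative when alpha is a convexifier of f."""
    mean_g = avg(g, c, d)
    R = mean_g ** 2 - avg(lambda t: g(t) ** 2, c, d)     # R(c, d; g) <= 0
    lhs = f(mean_g)
    rhs = avg(lambda t: f(g(t)), c, d) + 0.5 * alpha * R
    return rhs - lhs

print(jensen_gap(math.sin, lambda t: 2 * t + math.cos(3 * t), -1.0, 0.0, 2.0))
```

With $f(u) = u^2$ and $\alpha = 2$ the gap vanishes identically, again because $\phi \equiv 0$.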


Remarks If $f: (a, b) \to \mathbb{R}$ and $g: [c, d] \to (a, b)$ are continuous, then in Theorems 5 and 6 one can specify $\alpha = 0$ if $f$ is convex on $(a, b)$. Also $\alpha = \lambda^* = \min_t f''(t)$ if $f$ is twice continuously differentiable or analytic on $(a, b)$, and $\alpha = -L$, where $L$ is a Lipschitz constant of the derivative of $f$ on $(a, b)$, if $f$ is continuously differentiable with a Lipschitz derivative.


(ii) Convexification of differential equations: A solution $y(t)$ of an ordinary differential equation of second or higher order, over a compact interval, is continuously differentiable with a Lipschitz derivative. Such a $y$ is convexifiable. Using $y(t) = \phi(t, \alpha) + \frac{1}{2}\alpha t^2$, problems in differential equations can be "convexified", i.e., transformed to equivalent differential equations with convex solutions $\phi(t, \alpha)$. After back-substitution, the true solution $y(t)$ is recovered. In particular, the problems of theoretical mechanics based on Newton's Second Law can be "convexified", e.g. [9,11].

(iii) Convexification in linear algebra: Some of the basic eigenvalue inequalities for symmetric matrices follow from inequalities for convex functions, e.g. [4]. Using a convexification, one can extend these results to nonconvex functions. For example, after finding a convexifier of the product function $f(x) = x_1 x_2 \cdots x_n$, the following result is obtained:


Corollary 3. (Bounds for the Determinant of an Arbitrary Symmetric Matrix [11]) Let $A = (a_{ij})$ be an $n \times n$ real symmetric matrix where $n \ge 2$, and let $\rho$ be its spectral radius. Then

$$\prod_{i=1}^{n} a_{ii} - (n-1)\rho^{n-2} \sum_{1 \le i < j \le n} a_{ij}^2 \;\le\; \det A \;\le\; \prod_{i=1}^{n} a_{ii} + (n-1)\rho^{n-2} \sum_{1 \le i < j \le n} a_{ij}^2.$$

If the left-hand side in Corollary 3 is positive, then the matrix $A$ is nonsingular.
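The bounds of Corollary 3 are cheap to evaluate. The Python sketch below is our own illustration (the function name `det_bounds` is hypothetical); it computes both bounds with NumPy and compares them with the determinant of a random symmetric matrix.

```python
import numpy as np

def det_bounds(A):
    """Lower and upper bounds of Corollary 3 for a real symmetric matrix A, n >= 2."""
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvalsh(A)))     # spectral radius
    off = np.sum(np.triu(A, k=1) ** 2)              # sum_{i<j} a_ij^2
    prod_diag = np.prod(np.diag(A))
    slack = (n - 1) * rho ** (n - 2) * off
    return prod_diag - slack, prod_diag + slack

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2
lo, hi = det_bounds(A)
print(lo <= np.linalg.det(A) <= hi)   # True
```

For $n = 2$ the lower bound equals $a_{11}a_{22} - a_{12}^2 = \det A$, so the left inequality is tight.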

From the two-sided Jensen inequalities, obtained by convexification, new estimates follow for the absolute value of the inner product function in arbitrary inner product spaces. In some situations the new estimates are sharper than the Cauchy-Schwarz inequality. Many results for convexifiable functions, including the canonical form $(\mathrm{LF};\theta,\varepsilon)$, can be extended to nonsmooth Lipschitz functions. This can be done using the fact that every Lipschitz function, when considered on a compact convex set, is only a linear function away from the set of all coordinate-wise monotone functions [11].


Conclusions 


A necessary and sufficient condition for convexifiability 
on a compact convex set is given using the mid-point 
acceleration function. For scalar functions there are 
also characterizations given in terms of determinants 
and integrals. In particular, every smooth function with
a Lipschitz derivative is convexifiable. Such a function is
only a convex quadratic away from the set of all
convex functions. Using this “closeness”, many results for
convex functions can be extended to smooth (generally 
non-convex) functions with Lipschitz derivatives. On 
the other hand, mathematical programs with convexi- 
fiable functions can be reduced to the parametric Liu- 
Floudas partly linear-convex canonical form. 


References 


1. Floudas CA (2000) Deterministic Global Optimization. 
Kluwer, Dordrecht 

2. Liu WB, Floudas CA (1993) A remark on the GOP algorithm 
for global optimization. J Global Optim 3:519-521 

3. Pardalos PM (1988) Enumerative techniques in global op- 
timization. Oper Res Spektr 10:29-35 

4. Roberts AW, Varberg DE (1973) Convex Functions. Aca- 
demic Press, New York 

5. Vial JP (1983) Strong and weak convexity of sets and func- 
tions. Math Oper Res 8(2):231-259 


6. Wu ZY, Lee HWJ, Yang XM (2005) A class of convexification 
and concavification methods for non-monotone optimiza- 
tion problems. Optimization 54:605-625 

7. Zlobec S (2003) Estimating convexifiers in continuous op- 
timization. Math Commun 8:129-137 

8. Zlobec S (2005) On the Liu-Floudas convexification of 
smooth programs. J Glob Optim 32:401-407 

9. Zlobec S (2005) Convexifiable functions in integral calcu- 
lus. Glasnik Matematicki 40(60):241-247, Erratum (2006) 
Ibid 41(61):187-188 

10. Zlobec S (2006) Characterization of convexifiable func- 
tions. Optimization 55:251-262 

11. Zlobec S (2008) On two simple decompositions of contin- 
uous functions. Optimization 57(2):249-261 


Convex Max-Functions 


CLAUDIA SAGASTIZABAL 
IMPA, Jardim Botanico, Brazil 


MSC2000: 49K35, 49M27, 65K10, 90C25 


Article Outline 


Keywords 

Synonyms 

Examples of the Problem 

Continuity and Optimality Conditions 


Algorithms of Minimization 
Nonlinear Programming 
Nonsmooth Optimization 
Other Methods of Resolution 


See also 
References 


Keywords 


Minimax problem; Convex optimization; 
Max-function 


Synonyms 


Minimax 


Examples of the Problem 


In a classic nonlinear program (NLP) a smooth objec- 
tive function is minimized on a feasible set defined by 
finitely many smooth constraints. However, many op- 
timization problems have an objective function that is 
not smooth but is of the max type, that is, defined as

$$f(x) := \max\{f_\ell(x) : \ell \in \mathcal{L}\}, \qquad (1)$$

where the functions $f_\ell: \mathbb{R}^n \to \mathbb{R}$ are themselves smooth. When $\mathcal{L}$ is finite, the minimization of $f$ is called a finite minimax problem. In a general minimax problem, the indices $\ell$ can range over an infinite compact set $\mathcal{L}$, see [6]. Those $f_\ell$'s realizing the maximum in (1) are called active functions. The corresponding indices form the active index set, defined as $\mathcal{L}_{\mathrm{act}}(x) := \{\ell \in \mathcal{L} : f(x) = f_\ell(x)\}$.
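For a finite index set, evaluating (1) and its active index set takes a few lines. The Python sketch below is a minimal illustration; the tolerance `tol` is our own assumption, since exact ties are rare in floating-point arithmetic, and the example functions are hypothetical.

```python
import numpy as np

def max_function(fs, x, tol=1e-9):
    """Evaluate f(x) = max_l f_l(x) for a finite family fs, as in (1), and
    return the value with the active indices (those within tol of the max)."""
    values = np.array([f_l(x) for f_l in fs])
    fx = float(values.max())
    active = [l for l, v in enumerate(values) if v >= fx - tol]
    return fx, active

fs = [lambda x: x ** 2, lambda x: (x - 2) ** 2]
print(max_function(fs, 1.0))   # (1.0, [0, 1]): both functions are active
print(max_function(fs, 3.0))   # (9.0, [0])
```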

There are numerous examples of optimization 
problems dealing with the minimization of convex 
max-functions: 

• When processing empirical data $\{(t_\ell, p_\ell) : \ell = 1, \dots, m\}$, consider the problem of selecting the coefficients $x$ of a polynomial $p_x(t) = \sum_{j=1}^{n} x_j t^{j-1}$ that fits the data well. The quality of the approximation can be measured through the deviation $f_\ell(x) := |p_\ell - p_x(t_\ell)|$, defined for all $\ell$. Depending on the nature of the problem, it can be interesting to minimize either the sum of squared deviations or the maximum deviation. The first case is a least squares problem, while the second is known as Chebyshev best approximation, and is a particular instance of (1), with index set $\mathcal{L} = \{\ell : \ell = 1, \dots, m\}$.

• A more general case is finding the best approximation of a continuous function $g_0$ on a compact interval $\mathcal{L}$. Instead of $n$ powers $t^j$, $n$ linearly independent functions $g_j: \mathcal{L} \to \mathbb{R}$ are given. Finding the linear combination $\sum_j x_j g_j$ which best approximates $g_0$ in the max-norm amounts to solving (1), with $f_\ell(x) = |\sum_j x_j g_j(\ell) - g_0(\ell)|$. Because the problem has infinitely many constraints, it is also an example of semi-infinite programming.

• A basic problem in structural optimization, see [2], is to find the stiffest structure of a given volume that is able to carry loads varying on a given set. Optimization can be performed through the variation of sizing variables, like the thickness of bars in a truss; shape variables, as the splines defining the boundary of the body; or even the distribution and properties of the composite material used to make the structure itself. After discretization by finite elements, the design problem has an objective function as in (1). The $f_\ell$'s therein usually have the form $f_\ell(x) = \frac{1}{2}x^T A_\ell x - b^T x$, with $A_\ell$ denoting the stiffness matrix of the $\ell$th element of the grid.

• Some large-scale mixed-integer problems can be solved by decomposition techniques using Lagrangian relaxation. The idea is to coordinate, by means of a master program, the iterative resolution of problems of smaller dimension or complexity, called local problems. When applying price decomposition, the master is led to maximize a dual function, say $\theta(\lambda)$, on the space of multipliers $\lambda$, using some iterative method. In production planning problems, the iterate $\lambda^k$ sent to the local solvers can be interpreted as prices paid by the master. Let $\mathcal{L} = \prod_i \mathcal{L}^i$ denote the (decomposed) primal space and let $L(x, \lambda) = \sum_i L^i(x^i, \lambda)$ be the (decomposed) Lagrangian of the original problem. Each local unit $i$ decides its corresponding optimal level of production by solving $\min_{x^i \in \mathcal{L}^i} L^i(x^i, \lambda^k)$. Having those local optimal levels, the master adjusts prices by updating $\lambda^k$ in order to maximize the dual function $\theta$, defined as the pointwise minimum of the Lagrangian: $\theta(\lambda) = \min_{x \in \mathcal{L}} L(x, \lambda) = \sum_i \min_{x^i \in \mathcal{L}^i} L^i(x^i, \lambda)$. An equivalent problem for the master is to minimize $f(\lambda) := -\theta(\lambda)$. This last formulation has an objective $f$ that is a max-function as in (1), letting $f_\ell(\lambda) = -\sum_i L^i(x_\ell^i, \lambda)$ for each $x_\ell \in \mathcal{L}$.

• Solving a nonlinear program using exact penalties leads to the iterative minimization of penalized objectives which are max-functions, with the max-operation involving the constraints.

• In game theory, consider a zero-sum game with two players whose strategy is to optimize their individual choice against the worst possible selection by the other player. The first player can choose his action over $n$ possible moves, with probability distribution $x = (x_1, \dots, x_n)$. To every possible move $j = 1, \dots, m$ of the second player (and move $i$ of the first) corresponds a loss $a_{i,j}$ that player I pays to player II. Let $\ell$ be a continuous index counting elements in the set of probability distributions of the second player: $\mathcal{L} = \{z \in \mathbb{R}^m : \sum_j z_j = 1,\ z \ge 0\}$. Calling $A = [a_{i,j}]$ the $n \times m$ loss matrix, the average amount to be paid by player I is $f_\ell(x) = x^T A z_\ell$. It follows that the first player needs to solve a problem like (1). The mini-maximization of the bivariate function $F(x, z) = x^T A z$ is also called a saddle-point problem.
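In the game example the index set $\mathcal{L}$ is infinite, but $f_\ell(x) = x^T A z_\ell$ is linear in $z_\ell$, so its maximum over the simplex is attained at a vertex, i.e., at a pure strategy of the second player. The Python sketch below uses this observation; the matching-pennies matrix is illustrative data only.

```python
import numpy as np

def worst_case_loss(A, x):
    """f(x) = max over z in the simplex of x^T A z, for an n x m loss matrix A.

    A linear function of z attains its maximum over the simplex at a vertex,
    so the infinite index set can be replaced by the m pure strategies."""
    return float(np.max(A.T @ x))

A = np.array([[1.0, -1.0], [-1.0, 1.0]])         # matching pennies
print(worst_case_loss(A, np.array([0.5, 0.5])))  # 0.0
print(worst_case_loss(A, np.array([1.0, 0.0])))  # 1.0
```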
Continuity and Optimality Conditions


When taking the pointwise maximum in (1), some 
properties of the functions f¢ are transmitted to f. Such 
is the case of continuity and convexity, but not of dif- 
ferentiability. 

More precisely, $f$ is continuous when both the $f_\ell$'s and their gradients $\nabla f_\ell$ are continuous. When the underlying functions $f_\ell$ are convex, so is $f$.

Convexity implies that the max-function $f$ is differentiable almost everywhere. Nevertheless, at those points where more than one underlying function $f_\ell$ realizes the maximum, the gradient generally fails to exist. Typically, such is the case at $\bar x$, a minimizer of $f$. For instance, suppose that $\mathcal{L}_{\mathrm{act}}(\bar x) = \{1, 2\}$, i.e., there are two active functions at $\bar x$: $f(\bar x) = f_1(\bar x) = f_2(\bar x)$. Clearly, for $\nabla f(\bar x)$ to exist, the unlikely equality $\nabla f(\bar x) = \nabla f_1(\bar x) = \nabla f_2(\bar x)$ needs to hold. Moreover, the optimality condition for minimizing $f$ on $\mathbb{R}^n$ would require all the involved gradients to be null.

Rather than a single gradient, it is possible to define a whole set of subgradients by making convex combinations of $\nabla f_1(\bar x)$ and $\nabla f_2(\bar x)$. For an arbitrary convex max-function $f$, at any given $x$, the set of subgradients is the so-called subdifferential of $f$ at $x$. Its expression is given by the formula


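$$\partial f(x) = \Big\{ \sum_{\ell \in \mathcal{L}_{\mathrm{act}}(x)} \alpha_\ell \nabla f_\ell(x) : \alpha \in \Delta \Big\}, \qquad (2)$$

where $\Delta$ is the unit simplex

$$\Delta := \Big\{ \alpha \in \mathbb{R}^{|\mathcal{L}_{\mathrm{act}}(x)|} : \sum_{\ell \in \mathcal{L}_{\mathrm{act}}(x)} \alpha_\ell = 1,\ \alpha_\ell \ge 0 \Big\}. \qquad (3)$$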
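With exactly two active functions, (2) is a segment, and its element of smallest norm has a closed form. The Python sketch below is a minimal illustration with a hypothetical helper name. If this element is zero, then $0 \in \partial f(x)$ and $x$ minimizes $f$ on $\mathbb{R}^n$; otherwise its negative is a descent direction. The example uses $f(x) = \max\{x^2, (x-2)^2\}$ at $\bar x = 1$, where both functions are active.

```python
import numpy as np

def min_norm_subgradient(g1, g2):
    """Element of smallest norm in conv{g1, g2}, i.e. in the subdifferential (2)
    when exactly two functions are active.

    Minimizes ||t*g1 + (1 - t)*g2||^2 over t in [0, 1] in closed form."""
    diff = g1 - g2
    denom = diff @ diff
    t = 0.0 if denom == 0 else float(np.clip(-(g2 @ diff) / denom, 0.0, 1.0))
    return t * g1 + (1 - t) * g2

# grad f_1(1) = 2 and grad f_2(1) = -2, so 0 lies in the subdifferential at 1
print(min_norm_subgradient(np.array([2.0]), np.array([-2.0])))
```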


When $\mathcal{L}$ in (1) is an infinite compact set, (2) still holds, provided the application $\ell \mapsto f_\ell(x)$ is upper semicontinuous for each $x$, see [8, Chap. VI, §4.4].

Consider the constrained problem 


$$\min_{x \in \Omega} f(x), \qquad (4)$$

where $\Omega \subset \mathbb{R}^n$ is a closed convex set and $f$ is the function defined in (1). Assume the index set $\mathcal{L}$ is infinite and suppose (4) has a solution $\bar x$ such that $f(\bar x) = \max\{f_\ell(\bar x) : \ell \in \mathcal{L}\}$. Then it can be proved (see [6, Chap. VI, Thm. 3.3]) that (4) is equivalent to the finite minimax

$$\min_{x \in \Omega} \max_{i=1,\dots,r} \{f_{\ell_i}(x) : \ell_i \in \mathcal{L}\}, \qquad (5)$$

with $r \le n + 1$. The set $\{\ell_1, \dots, \ell_r\}$ is called an extremal basis of (4).

When $\Omega$ satisfies a constraint qualification condition of Slater type, see, for instance, [7, Chap. III], a necessary optimality condition (OC) characterizing a minimizer $\bar x$ of (4) is

$$0 \in \partial f(\bar x) + N_\Omega(\bar x), \qquad (6)$$

where $N_\Omega(\bar x)$ is the normal cone of convex analysis to $\Omega$ at $\bar x$. Because $f$ is convex, the optimality condition is also sufficient.

The optimality condition (6) can be further specified when $\Omega$ is represented by a set of convex inequalities:

$$\Omega := \{x \in \mathbb{R}^n : c_j(x) \le 0,\ j \in J\}. \qquad (7)$$

Observe that $\Omega$ may be defined by an infinite number of constraints $c_j$, assumed to be smooth and convex. Using (1) together with (7), the following characterization of $\bar x$ results:


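Lemma 1 There exist $r \le n + 1$ and $s \le n$ such that for the index sets $\mathcal{L}(\bar x) := \{\ell_1, \dots, \ell_r\} \subset \mathcal{L}$ and $\{j_1, \dots, j_s\} \subset J$ it holds

$$\sum_{i=1}^{r} \alpha_i \nabla f_{\ell_i}(\bar x) + \sum_{i=1}^{s} \mu_i \nabla c_{j_i}(\bar x) = 0, \qquad (8)$$

where the multipliers $\alpha$ and $\mu$ are positive and $\alpha$ is an element of the simplex $\Delta$ in $\mathbb{R}^{|\mathcal{L}(\bar x)|}$ from (3).

This characterization ensures the existence of an extremal basis of (4) near $\bar x$.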


Algorithms of Minimization 


Depending on the nature of the problem, several ap- 
proaches have been proposed to solve (4): 

• Reformulation as an NLP.

• Minimization of the nonsmooth max-function.

• Determination of a saddle point.

• Search of an extremal basis.

For example, the amount of available information can 
determine the method of resolution: if for any given x, 
all the active indices in (1) are known, then the full sub- 
differential (2) is available and a nonlinear program- 
ming technique can be applied. 
Nonlinear Programming


An important feature of this approach is that smooth 
NLP techniques have a superlinear rate of convergence. 
The essential idea is first to write (4) as an NLP with an 
additional variable: 

$$\min_{(x, r) \in \Omega \times \mathbb{R}} r \quad \text{s.t.} \quad r \ge f_\ell(x),\ \ell \in \mathcal{L}, \qquad (9)$$

and then solve the associated optimality conditions by using a Newton-like method, such as sequential quadratic programming (cf. also » Successive quadratic programming) or interior point schemes, see [4, Parts III–IV] and [3, §§4.3–4.4].


Nonsmooth Optimization 


Sometimes the explicit knowledge of all the active con- 
straints in (9) can be difficult, if not impossible, to ob- 
tain; such is the case for structural optimization prob- 
lems. 

On the other hand, it is often possible to obtain 
a single subgradient almost for free when computing 
$f(x)$. Indeed, suppose that just one active index $\ell$ in $\mathcal{L}_{\mathrm{act}}(x)$ is known: $f(x) = f_\ell(x)$. Then, because of (2), $\nabla f_\ell(x) \in \partial f(x)$.

Algorithms from nonsmooth optimization, such as 
bundle methods [8, Chaps. XIV-XV], are designed to 
minimize a general convex function, possibly nondif- 
ferentiable, with the information furnished by an ora- 
cle that gives f(x) and only one subgradient at x. Non- 
smooth optimization techniques, specialized to a max- 
function like f in (1), have been successfully used in [12] 
and [9] to solve general minimax problems. 
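The oracle described above is easy to emulate for a finite max-function. The Python sketch below pairs it with the plain subgradient method (normalized, diminishing steps $1/k$). This is a deliberately simpler substitute for the bundle methods of [8], shown only to illustrate the one-subgradient oracle; all names and the test problem $f(x) = \max\{x^2, (x-2)^2\}$ are hypothetical.

```python
import numpy as np

def oracle(x, fs, grads):
    """Return f(x) = max_l f_l(x) and the gradient of ONE active f_l,
    which is a subgradient of f at x by (2)."""
    values = [f_l(x) for f_l in fs]
    l = int(np.argmax(values))
    return values[l], grads[l](x)

def subgradient_method(x0, fs, grads, iters=2000):
    # Plain subgradient method with normalized, diminishing steps 1/k.
    # Much simpler (and slower) than bundle methods; keeps the best iterate.
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), np.inf
    for k in range(1, iters + 1):
        fx, g = oracle(x, fs, grads)
        if fx < best_f:
            best_x, best_f = x.copy(), fx
        norm = np.linalg.norm(g)
        if norm == 0:              # 0 is a subgradient: x is optimal
            break
        x = x - (1.0 / k) * g / norm
    return best_x, best_f

fs = [lambda x: x @ x, lambda x: (x - 2) @ (x - 2)]
grads = [lambda x: 2 * x, lambda x: 2 * (x - 2)]
x_best, f_best = subgradient_method([5.0], fs, grads)
print(x_best, f_best)   # close to the minimizer x = 1 with f = 1
```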

Although bundle methods are essentially first-order methods, in recent years some proposals have been made that aim at obtaining better than linear convergence. They consist of a combination of bundle, proximal and quasi-Newton techniques [5,10,11].


Other Methods of Resolution 


V.F. Demyanov and V.N. Malozemov treat (4) in an 
indirect way, by solving an infinite sequence of finite 
minimax problems. Keeping (5) in mind, the idea is 
to asymptotically identify an extremal basis by making 
successive approximations on a finite grid of the index
set $\mathcal{L}$.
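The effect of discretizing the index set can be seen on a small Chebyshev approximation objective. The Python sketch below evaluates $f(x) = \max_{t \in [0,1]} |x_0 + x_1 t - e^t|$ on nested uniform grids; it illustrates only the grid idea, not the Demyanov–Malozemov algorithm itself, and the point $x$ is arbitrary illustrative data. Because the grids are nested, the discretized values can only increase toward $f(x)$.

```python
import numpy as np

def discretized_max(x, num_points):
    """max over a uniform grid of [0, 1] of |x0 + x1*t - exp(t)|.

    Approximates the infinite-index max-function f(x) by a finite minimax term."""
    t = np.linspace(0.0, 1.0, num_points)
    return float(np.max(np.abs(x[0] + x[1] * t - np.exp(t))))

x = np.array([1.0, 1.7])
vals = [discretized_max(x, n) for n in (11, 101, 1001)]
print(vals)   # nondecreasing; the maximum lies near t = log(1.7)
```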


In game theory, rather than solving (4) by some "mini-maximization" procedure, it can be more convenient to find a saddle point, that is, an equilibrium point satisfying $\min_x \max_{z_\ell} F(x, z_\ell) = \max_{z_\ell} \min_x F(x, z_\ell)$, with $F(x, z_\ell) := f_\ell(x)$. The determination of saddle points of $F$ can take advantage of the extra structure of the problem. Some popular methods are those of Arrow–Hurwicz and Uzawa, see [1].
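As a rough illustration of the gradient-type saddle-point methods mentioned above, the Python sketch below runs a simultaneous gradient step, with descent in $x$ and ascent in $z$, in the spirit of the Arrow–Hurwicz iteration. The step sizes and the strongly convex-concave test function $F(x, z) = x^2/2 + xz - z^2/2$, whose saddle point is $(0, 0)$, are our own assumptions. For purely bilinear $F$, as in matrix games, this plain iteration may fail to converge.

```python
def gradient_descent_ascent(grad_x, grad_z, x0, z0, tau=0.1, sigma=0.1, iters=500):
    """Simultaneous step: x <- x - tau * dF/dx, z <- z + sigma * dF/dz."""
    x, z = float(x0), float(z0)
    for _ in range(iters):
        gx, gz = grad_x(x, z), grad_z(x, z)
        x, z = x - tau * gx, z + sigma * gz
    return x, z

# F(x, z) = x**2/2 + x*z - z**2/2: dF/dx = x + z, dF/dz = x - z
x_sp, z_sp = gradient_descent_ascent(lambda x, z: x + z, lambda x, z: x - z, 1.0, -2.0)
print(x_sp, z_sp)   # both close to 0
```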


See also 


> Lagrangian Multipliers Methods for Convex 
Programming 
W.I. Zangwill [9] first proposed the convex-simplex algorithm (CSA) for the following problem:

$$\min_{x \in S} f(x), \qquad (1)$$

where $f(x)$ is a pseudoconvex function on $\mathbb{R}^n$. The set $S$ is a nonempty polyhedron, i.e., $S = \{x \in \mathbb{R}^n : Ax = b,\ x \ge 0\}$, $A$ is an $m \times n$ matrix, and $b$ is a vector in $\mathbb{R}^m$. For simplicity, $S$ is also assumed to be bounded.

CSA belongs to a class of algorithms called feasible direction methods. Given an initial feasible solution, algorithms in this class solve problem (1) by iteratively generating an improving feasible direction that leads to another feasible solution with an improved objective value. The name ‘convex-simplex’ indicates that the algorithm generates improving feasible directions in a manner similar to the simplex algorithm for linear programs. When $f(x)$ is linear, the algorithm is identical to the simplex algorithm. In general, a vector $d$ is an improving feasible direction at a point $x$ feasible to problem (1) if the following (sufficient) conditions hold:
a) $\nabla f(x)^T d < 0$,

b) $Ad = 0$, and
c) $d_j \ge 0$ if $x_j = 0$.

It follows from the first-order Taylor series expansion of $f(x)$ that

$$f(x + \lambda d) = f(x) + \lambda \nabla f(x)^T d + \lambda \|d\|\, \alpha(x; \lambda d),$$

where $\lim_{\lambda \to 0} \alpha(x; \lambda d) = 0$. Via the above expansion, condition a) implies that $f(x + \lambda d) < f(x)$ for a sufficiently small $\lambda > 0$, i.e., $d$ leads to an improvement in the objective function. The remaining two conditions guarantee that $d$ can produce a point in $S$. In particular, condition b) yields the following:

$$A(x + \lambda d) = Ax + \lambda Ad = Ax = b.$$


This shows that $x + \lambda d$ is always feasible with respect to the equality constraint. Next, each component of $x + \lambda d$ can be written as

$$x_j + \lambda d_j = \begin{cases} x_j + \lambda d_j & \text{if } x_j > 0, \\ \lambda d_j & \text{if } x_j = 0. \end{cases}$$

When $\lambda$ is a sufficiently small positive number, $x_j + \lambda d_j$ remains nonnegative in the first case. For the second case, it follows directly from condition c) that $\lambda d_j \ge 0$ for all $\lambda > 0$. Thus, $x + \lambda d \in S$ when $\lambda$ is sufficiently small.

To describe how CSA generates an improving feasible direction, let $a_j$ denote the $j$th column of $A$. Also, assume that every $m$ columns of $A$ are linearly independent and every extreme point of $S$ has $m$ strictly positive components. Under these assumptions, every feasible solution has at least $m$ positive components and at most $(n - m)$ zero components. Given a feasible solution $x$, let $I(x)$ be the set of indices for the $m$ largest components of $x$. Then, $A$ can be partitioned into $[B, N]$, where $B = [a_j : j \in I(x)]$ and $N = [a_j : j \notin I(x)]$. Similarly, $x^T$ can be partitioned into $[x_B^T, x_N^T]$, where $x_B$, the basic component, corresponds to components of $x$ belonging to $I(x)$, and $x_N$, the nonbasic component, corresponds to components not in $I(x)$. By the above assumptions, $x_B > 0$ and $B$ is nonsingular.

Partitioning the direction vector, $d^T$, into its basic and nonbasic components, i.e., $[d_B^T, d_N^T]$, produces the following sequence of relationships:

$$Ad = 0, \qquad B d_B + N d_N = 0, \qquad d_B = -B^{-1} N d_N.$$

The last equality yields the following:

$$\nabla f(x)^T d = \nabla_B f(x)^T d_B + \nabla_N f(x)^T d_N = \big[\nabla_N f(x)^T - \nabla_B f(x)^T B^{-1} N\big] d_N = r_N^T d_N = \sum_{j \notin I(x)} r_j d_j,$$
where $r_N^T = \nabla_N f(x)^T - \nabla_B f(x)^T B^{-1} N$. In order for $d$ to be an improving direction, we need $\nabla f(x)^T d < 0$. There are several approaches to make this inner product negative. Each approach generates a different algorithm, and all of them can be viewed as extensions or variants of the reduced gradient algorithm first proposed by P. Wolfe [8].
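The identity $\nabla f(x)^T d = r_N^T d_N$ can be checked numerically. In the Python sketch below, which is our own illustration, random data stand in for $A$ and $\nabla f(x)$, and `reduced_gradient` is a hypothetical helper. It computes $r_N$ through a linear solve with $B^T$ instead of forming $B^{-1}$, and it builds $d_B = -B^{-1} N d_N$ from an arbitrary $d_N$.

```python
import numpy as np

def reduced_gradient(grad, A, basic):
    """r_N = grad_N - N^T (B^T)^{-1} grad_B for the column partition A = [B, N]."""
    nonbasic = [j for j in range(A.shape[1]) if j not in basic]
    B, N = A[:, basic], A[:, nonbasic]
    y = np.linalg.solve(B.T, grad[basic])       # y = (B^T)^{-1} grad_B
    return grad[nonbasic] - N.T @ y, nonbasic

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))
grad = rng.standard_normal(5)
basic = [0, 1]
r_N, nonbasic = reduced_gradient(grad, A, basic)

d_N = rng.standard_normal(3)
d = np.zeros(5)
d[nonbasic] = d_N
d[basic] = -np.linalg.solve(A[:, basic], A[:, nonbasic] @ d_N)   # d_B = -B^{-1} N d_N
print(grad @ d, r_N @ d_N)   # equal up to round-off
```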

Like the simplex algorithm, CSA allows only one nonbasic component of $d$ to be nonzero. In particular, let

$$j^- = \operatorname{argmax}_{j \notin I(x)} \{-r_j : r_j < 0\}$$

and

$$j^+ = \operatorname{argmax}_{j \notin I(x)} \{x_j r_j : x_j > 0,\ r_j > 0\}.$$

Then, CSA chooses $d_N$ as follows. If $-r_{j^-} \ge x_{j^+} r_{j^+}$, then $d_{j^-} = 1$ and $d_j = 0$ for the remaining nonbasic components. This makes $\nabla f(x)^T d = r_{j^-} < 0$. Otherwise, $d_{j^+} = -1$ instead, and $\nabla f(x)^T d = -r_{j^+} < 0$. Given $d_N$, the basic component can be computed using the relationship $d_B = -B^{-1} N d_N$.

When $r_N \ge 0$ and $r_N^T x_N = 0$, the indices $j^-$ and $j^+$ are undefined in the above construction. When this occurs, $x$ is globally optimal and $d_N$ is usually set to zero to indicate that there is no improving feasible direction. To demonstrate this, it is sufficient to show that there exist vectors $\mu^T = [\mu_B^T, \mu_N^T] \ge 0$ and $v$ (unrestricted) satisfying the following equations:

$$\nabla_B f(x) + B^T v - \mu_B = 0,$$
$$\nabla_N f(x) + N^T v - \mu_N = 0,$$
$$\mu_B^T x_B = 0,$$
$$\mu_N^T x_N = 0.$$

These equations are known as the Karush–Kuhn–Tucker conditions (see, e.g., [2]), and they are sufficient optimality conditions for problem (1). Letting $\mu_B = 0$, $\mu_N = \nabla_N f(x) - N^T (B^T)^{-1} \nabla_B f(x)$, and $v = -(B^T)^{-1} \nabla_B f(x)$ satisfies the first three conditions. Since $\mu_N = r_N$, the above assumptions concerning $r_N$ imply that $\mu_N \ge 0$ and $\mu_N^T x_N = 0$. Thus, the Karush–Kuhn–Tucker conditions hold and $x$ must be globally optimal.

When $d_N \ne 0$, a better feasible point can be obtained from a solution to the following problem, typically called the line search problem:

$$\min_{0 \le \lambda \le \lambda_{\max}} f(x + \lambda d),$$

where $\lambda_{\max} = \min_j \{-x_j / d_j : d_j < 0\}$. This prevents components of $x + \lambda d$ from becoming negative. (If $S$ is unbounded, then every component of $d$ may be nonnegative and $\lambda_{\max} = \infty$.) Algorithms such as bisection search, the golden section method, and inexact line search techniques (e.g., the Armijo rule, [1]) can efficiently solve the line search problem.
To summarize, CSA can be stated as follows: 


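0 | Select $x^1 \in S$ and set $k = 1$.

1 | Identify $I(x^k)$ and form the submatrices $B$ and $N$.
  Compute
  $r_N^T = \nabla_N f(x^k)^T - \nabla_B f(x^k)^T B^{-1} N$.

2 | IF $r_N \ge 0$ and $r_N^T x_N^k = 0$,
  THEN stop and $x^k$ is an optimal solution.
  ELSE, let
  $j^- = \operatorname{argmax}_{j \notin I(x^k)} \{-r_j : r_j < 0\}$,
  $j^+ = \operatorname{argmax}_{j \notin I(x^k)} \{x_j^k r_j : x_j^k > 0,\ r_j > 0\}$.
  Set $d_N^k = 0$.
  IF $-r_{j^-} \ge x_{j^+}^k r_{j^+}$,
  THEN set $d_{j^-}^k = 1$.
  ELSE, set $d_{j^+}^k = -1$ instead.
  Set $d_B^k = -B^{-1} N d_N^k$. Go to Step 3.

3 | Set $\lambda_{\max} = \min_j \{-x_j^k / d_j^k : d_j^k < 0\}$ and compute
  $\lambda^k = \operatorname{argmin}_{0 \le \lambda \le \lambda_{\max}} f(x^k + \lambda d^k)$.
  Then, set $x^{k+1} = x^k + \lambda^k d^k$ and $k = k + 1$.
  Return to Step 1.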


The convex-simplex algorithm (CSA) 
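The steps of CSA translate almost line by line into code. The Python sketch below is our own illustration, not Zangwill's implementation. It assumes a bounded $S$ and the nondegeneracy assumptions stated above, uses a golden-section search for Step 3, and replaces the exact tests $r_N \ge 0$ and $r_N^T x_N = 0$ by a small tolerance `tol`. The test problem $\min (x_1-1)^2 + (x_2-2)^2 + x_3^2$ subject to $x_1 + x_2 + x_3 = 4$, $x \ge 0$ is hypothetical; its solution is $(4/3, 7/3, 1/3)$.

```python
import numpy as np

def golden_section(phi, lo, hi, tol=1e-12):
    # Minimize a unimodal function phi on [lo, hi] by golden-section search
    g = (np.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def convex_simplex(f, grad, A, x, max_iter=1000, tol=1e-7):
    m, n = A.shape
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        # Step 1: basis = indices of the m largest components of x
        basic = list(np.argsort(-x)[:m])
        nonbasic = [j for j in range(n) if j not in basic]
        B, N = A[:, basic], A[:, nonbasic]
        g = grad(x)
        r = g[nonbasic] - N.T @ np.linalg.solve(B.T, g[basic])
        xN = x[nonbasic]
        # Step 2: optimality test, then move a single nonbasic component
        neg = [k for k in range(len(nonbasic)) if r[k] < -tol]
        pos = [k for k in range(len(nonbasic)) if xN[k] > 0 and r[k] > tol]
        if not neg and not pos:
            break                     # r_N >= 0 and r_N^T x_N = 0 (up to tol)
        k_minus = max(neg, key=lambda k: -r[k]) if neg else None
        k_plus = max(pos, key=lambda k: xN[k] * r[k]) if pos else None
        dN = np.zeros(len(nonbasic))
        if k_plus is None or (k_minus is not None
                              and -r[k_minus] >= xN[k_plus] * r[k_plus]):
            dN[k_minus] = 1.0
        else:
            dN[k_plus] = -1.0
        d = np.zeros(n)
        d[nonbasic] = dN
        d[basic] = -np.linalg.solve(B, N @ dN)
        # Step 3: line search on [0, lambda_max]; S bounded, so some d_j < 0
        neg_d = d < 0
        lam_max = np.min(-x[neg_d] / d[neg_d])
        lam = golden_section(lambda t: f(x + t * d), 0.0, lam_max)
        x = np.maximum(x + lam * d, 0.0)   # clip round-off below zero
    return x

A = np.array([[1.0, 1.0, 1.0]])
f = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2 + x[2] ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] - 2), 2 * x[2]])
print(convex_simplex(f, grad, A, [4.0, 0.0, 0.0]))   # close to [4/3, 7/3, 1/3]
```

As the text notes, moving only one nonbasic component per iteration makes progress slow; on this example the iterates approach the solution coordinate pair by coordinate pair.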


The convergence proof for CSA is the same as that of the reduced gradient algorithm and follows standard arguments in nonlinear programming. Although CSA behaves like the simplex algorithm, CSA converges slowly when compared to other algorithms. This is due in part to the restriction that only one nonbasic component of the improving feasible direction can be nonzero. To accelerate CSA, B.A. Murtagh and M.A. Saunders [5] used a second-order approximation for the objective function and allowed several nonbasic components to be nonzero. These nonzero nonbasic components are often referred to as superbasic variables.

For other developments, S. Nguyen [6] and R.V. 
Helgason and J.L. Kennington [4] specialized CSA to 
nonlinear network flow problems. D.P. Rutenberg [7] 
(see also [3]) demonstrated that special techniques for 
solving linear programs with generalized network and 
generalized upper bounding structure also extend to 
CSA. 


See also 


> Lemke Method 

> Linear Complementarity Problem 

> Linear Programming 

> Parametric Linear Programming: Cost Simplex 
Algorithm 

> Sequential Simplex Method 
Copositive optimization (or copositive programming, a term coined in [4]) is a special case of conic optimization that consists of extremizing a linear function over a (convex) cone subject to additional (inhomogeneous) linear (inequality or equality) constraints.

It is well known that the simplest class of hard problems in continuous optimization is that of quadratic optimization problems [20]: to extremize a (possibly indefinite) quadratic form $x^T Q x$ over a polyhedron $\{x \in \mathbb{R}^n_+ : Ax = b\}$. Note that a linear term in the objective function can be removed by an affine transformation of the polyhedron. The number of local, nonglobal solutions to this problem may be exponential in the number of variables and/or constraints.

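This class has a close connection to copositive optimization. The idea here is to linearize the quadratic form

$$x^T Q x = \operatorname{trace}(x^T Q x) = \operatorname{trace}(Q\, x x^T) = \langle Q, x x^T \rangle$$

by introducing the new symmetric matrix variable $X = x x^T$ and Frobenius duality $\langle X, Y \rangle = \operatorname{trace}(XY)$. If $Ax \in \mathbb{R}^m_+$ for all $x \in \mathbb{R}^n_+$ and $b \in \mathbb{R}^m_+$, then the linear constraints can be squared, to arrive in a similar way at constraints of the form $\langle A_i, X \rangle = b_i^2$, where $A_i = a_i a_i^T$ and $a_i^T$ is the $i$th row of $A$.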
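Both identities behind the lifting $X = x x^T$ can be verified directly. The short Python sketch below is our own illustration and uses random data; `linearization_residuals` is a hypothetical helper that returns the two differences, which should vanish up to round-off.

```python
import numpy as np

def frobenius(U, V):
    # Frobenius inner product <U, V> = trace(U^T V)
    return float(np.trace(U.T @ V))

def linearization_residuals(Q, x, a):
    """(x^T Q x - <Q, xx^T>, (a^T x)^2 - <a a^T, xx^T>); both should vanish."""
    X = np.outer(x, x)                       # rank-one lifting X = x x^T
    r1 = float(x @ Q @ x) - frobenius(Q, X)
    r2 = float(a @ x) ** 2 - frobenius(np.outer(a, a), X)
    return r1, r2

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
Q = (M + M.T) / 2                            # symmetric, possibly indefinite
x = rng.random(4)                            # a point in the nonnegative orthant
a = rng.random(4)                            # a row with a^T x >= 0 for x >= 0
print(linearization_residuals(Q, x, a))
```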

Now the set of all these $X$ generated by feasible $x$ is nonconvex since $\operatorname{rank}(x x^T) = 1$. The convex hull

$$\mathcal{K} = \operatorname{conv}\{x x^T : x \in \mathbb{R}^n_+\}$$

results in a convex matrix cone known as the cone of completely positive matrices since [14]; see [1]. Note that a similar construction dropping nonnegativity constraints leads to

$$\mathcal{P} = \operatorname{conv}\{x x^T : x \in \mathbb{R}^n\},$$

the cone of positive-semidefinite matrices, the basic set in semidefinite optimization (or semidefinite programming, SDP).

The first account of copositive optimization goes back to [4], who established a copositive representation of a subclass of particular interest, namely, standard quadratic optimization (StQP). Here the feasible polyhedron is the standard simplex $\Delta = \{x \in \mathbb{R}^n_+ : \sum_i x_i = 1\}$; this subclass is also NP-hard in the worst-case complexity sense but allows for a polynomial-time approximation scheme [3]. There can be up to $\approx 2^n/(1.25\sqrt{n})$ local nonglobal solutions. Now, with $J$ the $n \times n$ all-ones matrix, we have

$$\min\{x^T Q x : x \in \Delta\} = \min\{\langle Q, X \rangle : \langle J, X \rangle = 1,\ X \in \mathcal{K}\}.$$


Note that the right-hand problem is convex, so there are no more local, nonglobal solutions. In addition, the objective function is now linear, and there is just one linear equality constraint. The complexity has been completely pushed into the feasibility condition $X \in \mathcal{K}$, which also shows that there are indeed convex minimization problems that cannot be solved easily.

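The duality theory for conic optimization problems requires the dual cone $\mathcal{K}^*$ of $\mathcal{K}$ w.r.t. the Frobenius inner product $\langle \cdot, \cdot \rangle$, which is

$$\mathcal{K}^* = \{S \text{ symmetric } n \times n : \langle S, X \rangle \ge 0 \text{ for all } X \in \mathcal{K}\}.$$

Here it can easily be shown that $\mathcal{K}^*$ coincides with the cone of copositive matrices, which justifies the terminology:

$$\mathcal{K}^* = \{S \text{ symmetric } n \times n : x^T S x \ge 0 \text{ if } x \in \mathbb{R}^n_+\};$$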


i.e., a matrix $S$ is copositive [18] (most probably abbreviating "conditionally positive-semidefinite") if $S$ generates a quadratic form $x^T S x$ taking no negative values over the nonnegative orthant. The dual of the special program over $\mathcal{K}$ above is then

$$\max\{y : S = Q - yJ \in \mathcal{K}^*\},$$

a linear objective in just one variable $y$ with the innocent-looking feasibility constraint $S \in \mathcal{K}^*$. This shows that checking membership of $\mathcal{K}^*$ (and, similarly, of $\mathcal{K}$) is already NP-hard, and there are many approaches to algorithmic copositivity detection; for recent developments see, e.g., [7] and references therein.

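More generally, a typical primal-dual pair in copositive optimization (COP) is of the following form:

$$\inf\{\langle C, X \rangle : \langle A_i, X \rangle = b_i,\ i = 1{:}m,\ X \in \mathcal{K}\} \ge \sup\Big\{\sum_i b_i y_i : y \in \mathbb{R}^m,\ S = C - \sum_i y_i A_i \in \mathcal{K}^*\Big\}.$$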

The inequality above is just standard weak duality, but observe that we have to use inf and sup since, as in general conic optimization, there may be problems with the attainability of either or both problems above, and likewise there could be a (finite or infinite) positive duality gap without any further conditions such as strict feasibility (Slater's condition). For the above representation of standard quadratic optimization problems, this is not the case:

$$\min\{\langle Q, X \rangle : \langle J, X \rangle = 1,\ X \in \mathcal{K}\} = \max\{y : S = Q - yJ \in \mathcal{K}^*\}.$$
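For $n = 2$ the zero duality gap of the StQP representation can be observed numerically, because a symmetric $2 \times 2$ matrix $[[a, b], [b, c]]$ is copositive exactly when $a \ge 0$, $c \ge 0$ and $b + \sqrt{ac} \ge 0$. The Python sketch below is our own illustration with hypothetical helper names. It computes the dual value $\max\{y : Q - yJ \in \mathcal{K}^*\}$ by bisection, which is valid because $J$ is copositive, so feasibility is monotone in $y$. It then compares this value with a grid minimum of $x^T Q x$ over the simplex.

```python
import numpy as np

def is_copositive_2x2(S, tol=1e-12):
    # [[a, b], [b, c]] is copositive iff a >= 0, c >= 0 and b + sqrt(a*c) >= 0
    a, b, c = S[0, 0], S[0, 1], S[1, 1]
    if a < -tol or c < -tol:
        return False
    return b + np.sqrt(max(a, 0.0) * max(c, 0.0)) >= -tol

def stqp_dual_value(Q, iters=100):
    """max{y : Q - yJ copositive} for a symmetric 2x2 Q, by bisection."""
    J = np.ones((2, 2))
    lo, hi = Q.min(), min(Q[0, 0], Q[1, 1])   # lo is feasible; y <= min diagonal
    for _ in range(iters):
        mid = (lo + hi) / 2
        if is_copositive_2x2(Q - mid * J):
            lo = mid
        else:
            hi = mid
    return lo

def stqp_primal_value(Q, num=100_001):
    # min of x^T Q x over the standard simplex in R^2, on a fine grid
    t = np.linspace(0.0, 1.0, num)
    return float(np.min(Q[0, 0] * t**2 + 2 * Q[0, 1] * t * (1 - t) + Q[1, 1] * (1 - t)**2))

Q = np.array([[1.0, -2.0], [-2.0, 3.0]])
print(stqp_dual_value(Q), stqp_primal_value(Q))   # both equal -0.125
```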


But for a similar class arising in many applications, the multi-standard quadratic optimization problems [6], dual attainability is not guaranteed while the duality gap is zero, an intermediate form between weak and strong duality [25].

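Recently Burer [8] showed a more general result: any mixed-binary quadratic optimization problem

$$\min\Big\{\frac{1}{2}x^T Q x + c^T x : Ax = b,\ x \in \mathbb{R}^n_+,\ x_j \in \{0, 1\},\ j \in B\Big\}$$

can (under mild conditions) be represented as a COP:

$$\min\Big\{\frac{1}{2}\langle \hat Q, \hat X \rangle : \hat{\mathcal{A}}(\hat X) = \hat b,\ \hat X \in \mathcal{K}\Big\},$$

where $\hat X$ and $\hat Q$ are $(n+1) \times (n+1)$ matrices, and the size of $(\hat{\mathcal{A}}, \hat b)$ is polynomial in the size of $(A, b)$.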

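Denote by $\mathcal{N} = \{N \text{ symmetric } n\times n : N_{ij} \ge 0 \text{ for all } i, j = 1{:}n\}$ the cone of nonnegative matrices. Then evidently

$$\mathcal{K} \subseteq \mathcal{P} \cap \mathcal{N} \subseteq \mathcal{P} + \mathcal{N} \subseteq \mathcal{K}^*,$$

which also shows that $\mathcal{K}$ can never be self-dual (note $(\mathcal{P} \cap \mathcal{N})^* = \mathcal{P}^* + \mathcal{N}^* = \mathcal{P} + \mathcal{N}$), unlike $\mathcal{P} = \mathcal{P}^*$ and $\mathcal{N} = \mathcal{N}^*$. For $n \ge 5$, A. Horn noted that the leftmost and the rightmost inclusions above are strict [10,14]. Hence the middle sets $\mathcal{P} \cap \mathcal{N}$ and $\mathcal{P} + \mathcal{N}$ can only serve as tractable approximations of the intractable cones $\mathcal{K}$ and $\mathcal{K}^*$, respectively.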

Copositive approximation hierarchies [3,15,21,22] start with $\mathcal{K}^{(0)} = \mathcal{P} + \mathcal{N}$ and consist of an increasing sequence $\mathcal{K}^{(r)}$ of cones whose union contains $\operatorname{int}\mathcal{K}^*$. This interior is the cone of strictly copositive matrices, i.e., those that generate quadratic forms strictly positive over the standard simplex $\Delta$.

For instance, a higher-order approximation due to [21] squares the variables to get rid of sign constraints. We have $S \in \mathcal{K}^*$ if and only if $y^\top S y \ge 0$ for all y such that $y_j = x_j^2$ for some $x \in \mathbb{R}^n$. This is guaranteed if the n-variable polynomial of degree $2(r+2)$ in x,

$$p_S^{(r)}(x) = \Big(\sum_{i=1}^n x_i^2\Big)^r \sum_{j,k} S_{jk}\, x_j^2 x_k^2,$$

is nonnegative for all $x \in \mathbb{R}^n$. But this holds in particular if

(a) $p_S^{(r)}$ has no negative coefficients; or if

(b) $p_S^{(r)}$ is a sum of squares (s.o.s.):

$$p_S^{(r)}(x) = \sum_i [f_i(x)]^2, \quad f_i \text{ some polynomials}.$$


This gives the approximation cones

$$\mathcal{C}^{(r)} = \{S \text{ symmetric } n\times n : S \text{ satisfies (a)}\}$$

and

$$\mathcal{K}^{(r)} = \{S \text{ symmetric } n\times n : S \text{ satisfies (b)}\}.$$

The cones $\mathcal{C}^{(r)}$ can be described by linear constraints on the entries of S, leading to LP formulations. The cones $\mathcal{K}^{(r)}$ instead require linear matrix inequalities (LMIs), leading to SDP formulations. However, for large r both also become intractable, as they generate problems on matrices of order $O(n^{r+1} \times n^{r+1})$, see [3].

Copositive optimization has also been receiving increasing attention because many NP-hard combinatorial problems have a representation in this domain. We start with the historically first such representation, the maximum (weight) clique problem, which amounts to finding a largest (or heaviest) clique in an undirected graph G (with weights on the vertices). Using an StQP formulation going back to [19] and applying some regularization [2], the following copositive formulation was introduced in [4]:

$$\frac{1}{\omega(G)} = \min\{x^\top Q_G x : x \in \Delta\} = \min\{\langle Q_G, X\rangle : \langle J, X\rangle = 1,\ X \in \mathcal{K}\},$$

where $Q_G$ is a matrix derived from the adjacency matrix of G (and the weights). Taking the inverse $t = 1/y$ in the dual of the last problem above, we also arrive at the formulation of [9] (for the complementary graph):

$$\omega(G) = \min\{t : tQ_G - J \in \mathcal{K}^*\}.$$


Here $\omega(G)$ is the clique number of G, i.e., the size (weight) of a maximum (weight) clique in G. Replacing $\mathcal{K}^*$ with its zero-order approximation, we get a strengthening $\vartheta'(G)$ of the well-known Lovász bound $\vartheta(G)$ [16,17,26]:

$$\vartheta'(G) = \min\{t : tQ_G - J \in \mathcal{P} + \mathcal{N}\} \ge \omega(G).$$

Shrinking the feasible set further to $\mathcal{P}$, we finally arrive at the Lovász number $\vartheta(G)$, which, like $\vartheta'(G)$, can be computed in polynomial time:

$$\vartheta(G) = \min\{t : tQ_G - J \in \mathcal{P}\} \ge \vartheta'(G).$$

Strong duality yields, as above,

$$\frac{1}{\vartheta'(G)} = \min\{\langle Q_G, X\rangle : \langle J, X\rangle = 1,\ X \in \mathcal{P} \cap \mathcal{N}\},$$


and a recent improvement over $\vartheta'(G)$, adding a single valid linear cut motivated by the COP representation, is

$$\frac{1}{\vartheta^C(G)} = \min\{\langle Q_G, X\rangle : \langle J, X\rangle = 1,\ \langle C, X\rangle \ge 0,\ X \in \mathcal{P} \cap \mathcal{N}\} \ge \frac{1}{\vartheta'(G)},$$

where $C \in \mathcal{K}^*$ is arbitrary: indeed, for any $X \in \mathcal{K}$ we then have $\langle C, X\rangle \ge 0$. See [5] for appropriate choices of C and results, and [9,13,15,22] for higher-order approximation alternatives, with a particular emphasis on SDP-based bounds on the clique number. Similar copositive optimization approaches, among many others, were employed to obtain bounds on the (fractional) chromatic number of a graph [11,12], and for graph partitioning and quadratic assignment problems [23,24].
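The StQP characterization of the clique number can be checked on a small graph. The sketch below assumes the unweighted choice $Q_G = J - A_G$, which has ones on the diagonal and on non-edges and zeros on edges. For this matrix, $\min_{x\in\Delta} x^\top Q_G x = 1/\omega(G)$ follows from the Motzkin-Straus theorem. This is an illustrative assumption and need not coincide with the regularized matrix used in [4]. The sketch brute-forces $\omega(G)$ (so it is suitable only for tiny graphs) and evaluates the form at the barycenter of a maximum clique. The helper names are hypothetical.

```python
from itertools import combinations

def clique_number(n, edges):
    """Brute-force clique number of a graph on vertices 0..n-1."""
    adj = {frozenset(e) for e in edges}
    omega = 1 if n > 0 else 0
    for k in range(2, n + 1):
        if any(all(frozenset(p) in adj for p in combinations(S, 2))
               for S in combinations(range(n), k)):
            omega = k
        else:
            break  # no k-clique means no larger clique either
    return omega

def q_matrix(n, edges):
    """Assumed unweighted Q_G = J - A_G: 0 on edges, 1 elsewhere (incl. diagonal)."""
    adj = {frozenset(e) for e in edges}
    return [[0.0 if frozenset((i, j)) in adj else 1.0 for j in range(n)]
            for i in range(n)]

def quad_form(Q, x):
    """x^T Q x for a dense matrix given as nested lists."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Triangle {0, 1, 2} plus a pendant edge (2, 3): omega(G) = 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
Q = q_matrix(4, edges)
barycenter = [1 / 3, 1 / 3, 1 / 3, 0.0]  # uniform weights on a maximum clique
print(clique_number(4, edges), quad_form(Q, barycenter))  # 3, and 1/3 up to rounding
```

Any other point of the simplex, for instance the uniform vector on all four vertices (value 1/2), gives a larger value of the form.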
Introduction

Let us denote the set of n × n real symmetric matrices by

$$\mathcal{S}_n = \{X \in \mathbb{R}^{n\times n} : X = X^\top\}.$$


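We will be considering the following subsets of $\mathcal{S}_n$:
• The n × n symmetric positive semidefinite matrices
$$\mathcal{S}_n^+ = \{X \in \mathcal{S}_n : y^\top X y \ge 0\ \ \forall y \in \mathbb{R}^n\};$$
• The n × n symmetric copositive matrices
$$\mathcal{C}_n = \{X \in \mathcal{S}_n : y^\top X y \ge 0\ \ \forall y \in \mathbb{R}^n,\ y \ge 0\};$$
• The n × n symmetric completely positive matrices
$$\mathcal{C}_n^* = \Big\{X = \sum_{i=1}^k y_i y_i^\top : y_i \in \mathbb{R}^n,\ y_i \ge 0\ (i = 1,\dots,k)\Big\};$$
• The n × n symmetric nonnegative matrices
$$\mathcal{N}_n = \{X \in \mathcal{S}_n : X \ge 0\}.$$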


It is easy to see that all these sets are convex cones. That is, if X and Y belong to one of these sets, then so do $cX$ for any $c \ge 0$ and $\alpha X + (1-\alpha)Y$ for any $0 \le \alpha \le 1$.

We will denote the Euclidean inner product of $A \in \mathcal{S}_n$ and $B \in \mathcal{S}_n$ by

$$A \bullet B = \sum_{i=1}^n \sum_{j=1}^n a_{ij} b_{ij} = \operatorname{trace}(AB).$$


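For an arbitrary cone of matrices $\mathcal{K}$, we define its dual cone $\mathcal{K}^*$ as

$$\mathcal{K}^* = \{Y \in \mathcal{S}_n : X \bullet Y \ge 0\ \ \forall X \in \mathcal{K}\}.$$

The cone of positive semidefinite matrices $\mathcal{S}_n^+$ is dual to itself. It is also easy to see that the cone of copositive matrices $\mathcal{C}_n$ and the cone of completely positive matrices $\mathcal{C}_n^*$ are dual to each other (in general, $\mathcal{K}^{**} = \mathcal{K}$ for every closed convex cone $\mathcal{K}$).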
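The duality between $\mathcal{C}_n$ and $\mathcal{C}_n^*$ is easy to see numerically. For a completely positive $X = \sum_i y_i y_i^\top$ we have $M \bullet X = \sum_i y_i^\top M y_i$. This is nonnegative whenever M is copositive, and it can be negative otherwise. The following is a minimal pure-Python sketch; the helper names are hypothetical.

```python
def inner(A, B):
    """Euclidean inner product A • B = sum_ij a_ij * b_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def trace_of_product(A, B):
    """trace(AB); equals A • B when A and B are symmetric."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))

def completely_positive(vectors):
    """Build X = sum_i y_i y_i^T from nonnegative vectors y_i."""
    n = len(vectors[0])
    X = [[0.0] * n for _ in range(n)]
    for y in vectors:
        if any(v < 0 for v in y):
            raise ValueError("generators of a completely positive matrix must be >= 0")
        for i in range(n):
            for j in range(n):
                X[i][j] += y[i] * y[j]
    return X

M_cop = [[1.0, 2.0], [2.0, 1.0]]    # copositive but not positive semidefinite
M_not = [[1.0, -2.0], [-2.0, 1.0]]  # not copositive: (1, 1) M_not (1, 1)^T = -2
X = completely_positive([[1.0, 2.0], [3.0, 0.0]])
print(inner(M_cop, X), trace_of_product(M_cop, X))       # 22.0 22.0
print(inner(M_not, completely_positive([[1.0, 1.0]])))   # -2.0
```

The negative value in the last line shows that $M_{\text{not}}$ lies outside the dual of $\mathcal{C}_n^*$, i.e., outside $\mathcal{C}_n$.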

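For a given cone of n × n matrices $\mathcal{K}_n$ and its dual cone $\mathcal{K}_n^*$, we define a pair of conic linear programs called, correspondingly, primal and dual:

$$p^* = \inf\{C \bullet X : A_i \bullet X = b_i\ (i = 1,\dots,m),\ X \in \mathcal{K}_n\}, \qquad (1)$$

$$d^* = \sup_y\Big\{b^\top y : \sum_{i=1}^m y_i A_i + S = C,\ S \in \mathcal{K}_n^*\Big\}. \qquad (2)$$

When $\mathcal{K}_n = \mathcal{N}_n$, we refer to linear programming; when $\mathcal{K}_n = \mathcal{S}_n^+$, we refer to semidefinite programming; and when $\mathcal{K}_n = \mathcal{C}_n$, we refer to copositive programming.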


The conic duality theorem (see, e.g., [9]) establishes the duality relation between (1) and (2):

Theorem 1 (Conic Duality Theorem) If there exists an interior feasible solution $X^0 \in \operatorname{int}(\mathcal{K}_n)$ of (1) and a feasible solution of (2), then $p^* = d^*$ and the supremum in (2) is attained. Similarly, if there exist feasible $y^0$ and $S^0$ for (2), where $S^0 \in \operatorname{int}(\mathcal{K}_n^*)$, and a feasible solution for (1), then $p^* = d^*$ and the infimum in (1) is attained.

It is well known that optimization over the cones $\mathcal{S}_n^+$ and $\mathcal{N}_n$ can be done in polynomial time (in the sense of computing an ε-optimal solution). Copositive programming, however, is NP-hard, as we will see below.


Applications

Copositive matrices have a great variety of applications in mathematics and, especially, in optimization. They play an essential part in the characterization of local solutions of constrained optimization problems [5], including the linear complementarity problem. In [1,2,8] the authors used copositivity to improve convex relaxation bounds for quadratic programming problems. More generally, convex relaxations underlie many crucial results in robustness analysis. For example, copositive matrices have been used in the stability analysis of piecewise linear control systems (in the context of piecewise quadratic Lyapunov functions [4]).


Complexity of Copositive Programming

Copositive programming can easily be shown to be NP-hard by reduction from the maximum independent set problem. In [3], the authors established the following theorem:

Theorem 2 Let G(V, E) be a given graph with V = {1, ..., n}. Then the maximum independent set size of G is the optimum value of the following program:

$$\alpha(G) = \max\ O_n \bullet X \qquad (3)$$

subject to

$$x_{ij} = 0,\ (i,j) \in E; \qquad \operatorname{trace}(X) = 1; \qquad X \in \mathcal{C}_n^*, \qquad (4)$$

where $O_n$ is the all-one n × n matrix.


Proof. Extreme rays of $\mathcal{C}_n^*$ are rank-one matrices of the form $xx^\top$ for nonnegative $x \in \mathbb{R}^n$. Consider the convex cone

$$\mathcal{C}_G^* = \{X \in \mathcal{C}_n^* : x_{ij} = 0,\ (i,j) \in E\}.$$

Its extreme rays are of the form $xx^\top$, where the nonnegative $x \in \mathbb{R}^n$ supports an independent set (i.e., the set $\{i : x_i > 0\}$ is independent). Therefore, the extreme points of the set defined by (4) are given by the intersection of the extreme rays of $\mathcal{C}_G^*$ with the hyperplane $\operatorname{trace}(X) = 1$.

Since the optimum value of a linear function over a compact convex set is attained at an extreme point, there is an optimum solution of the form

$$X^* = x^* x^{*\top}, \quad x^* \in \mathbb{R}^n,\ x^* \ge 0,\ \|x^*\| = 1,$$

where $x^*$ supports an independent set. Since $O_n \bullet xx^\top = (\sum_i x_i)^2$ and $\operatorname{trace}(xx^\top) = \|x\|^2$, we can reformulate the program (3) as

$$\max\ \Big(\sum_{i=1}^n x_i\Big)^2 \quad \text{s.t.} \quad \|x\| = 1,\ x \ge 0,\ x_i x_j > 0 \Rightarrow (i,j) \notin E.$$

By the Cauchy-Schwarz inequality, $(\sum_i x_i)^2 \le k\|x\|^2 = k$ when x has k positive entries, with equality when these entries are equal. Hence the maximum is attained when x supports a maximum independent set and all positive $x_i$ are equal to $1/\sqrt{\alpha(G)}$. This gives program (3) the optimum value $\alpha(G)$. QED.
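Theorem 2 and its proof can be checked by brute force on a small graph. The sketch below uses hypothetical helper names and exhaustive search, so it is suitable only for tiny n. It finds a maximum independent set S and builds $X^* = x^* x^{*\top}$ with $x^*_i = 1/\sqrt{|S|}$ on S. It then verifies the constraints (4) and the objective value $O_n \bullet X^* = \alpha(G)$.

```python
import math
from itertools import combinations

def max_independent_set(n, edges):
    """Exhaustive search for a maximum independent set (tiny graphs only)."""
    E = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for S in combinations(range(n), k):
            if all(frozenset(p) not in E for p in combinations(S, 2)):
                return list(S)
    return []

def optimal_X(n, S):
    """X* = x x^T with x_i = 1/sqrt(|S|) on S and 0 elsewhere."""
    x = [1.0 / math.sqrt(len(S)) if i in S else 0.0 for i in range(n)]
    return [[x[i] * x[j] for j in range(n)] for i in range(n)]

n, edges = 5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # 5-cycle, alpha = 2
S = max_independent_set(n, edges)
X = optimal_X(n, S)
trace = sum(X[i][i] for i in range(n))      # constraint trace(X) = 1
objective = sum(sum(row) for row in X)      # O_n • X
print(S, round(trace, 12), round(objective, 12))  # [0, 2] 1.0 2.0
```

The edge constraints $x_{ij} = 0$, $(i,j) \in E$, hold automatically because x vanishes outside the independent set.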

Since every $X \in \mathcal{C}_n^*$ is nonnegative, we can reduce the set of constraints $x_{ij} = 0$, $(i,j) \in E$, in (4) to a single constraint $A \bullet X = 0$, where A denotes the adjacency matrix of G. Thus, the following copositive program is dual to (3), (4):

$$\alpha(G) = \min_{\lambda, y \in \mathbb{R}}\{\lambda : \lambda I + yA - O_n = Q,\ Q \in \mathcal{C}_n\}.$$

Therefore, the maximum independent set problem is reducible to copositive programming. See also [1,2] for the reduction of the standard quadratic optimization problem to copositive programming.

Furthermore, it can be shown that checking whether a given matrix is not copositive is NP-complete [5]. Hence, checking matrix copositivity is co-NP-complete.


Models

Approximating $\mathcal{C}_n$ with Linear Matrix Inequalities

In general, there is no polynomial-time verifiable certificate of copositivity unless co-NP = NP. In many cases, however, it is still possible to show by a short argument that a matrix is copositive. For instance, suppose the matrix M can be represented as the sum of a positive semidefinite matrix $S \in \mathcal{S}_n^+$ and a nonnegative matrix $N \in \mathcal{N}_n$. Then $M \in \mathcal{C}_n$, since $y^\top M y = y^\top S y + y^\top N y \ge 0$ for every $y \ge 0$. Hence, we can obtain a semidefinite approximation of a copositive program over M by introducing the linear matrix constraints:

$$M = S + N, \qquad N \ge 0, \qquad S \in \mathcal{S}_n^+.$$
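The decomposition gives a checkable certificate. Given a candidate split M = S + N, one verifies entrywise nonnegativity of N and positive semidefiniteness of S. The sketch below tests the latter heuristically by attempting a Cholesky factorization of $S + \tau I$ for a small tolerance τ. That tolerance is an assumption of this sketch; in practice the split itself would come from an SDP solver. Failure of one particular split proves nothing: the matrix M below is copositive, but only the first split certifies it.

```python
def is_psd(S, tol=1e-10):
    """Heuristic PSD test: try a Cholesky factorization of S + tol*I."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                s += tol
                if s <= 0.0:
                    return False  # a nonpositive pivot: S is (numerically) not PSD
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

def certifies_copositive(M, S, N, tol=1e-10):
    """True if M = S + N with S PSD and N entrywise nonnegative (a sufficient condition only)."""
    n = len(M)
    split_ok = all(abs(M[i][j] - S[i][j] - N[i][j]) <= tol for i in range(n) for j in range(n))
    n_ok = all(N[i][j] >= -tol for i in range(n) for j in range(n))
    return split_ok and n_ok and is_psd(S, tol)

M = [[1.0, 2.0], [2.0, 1.0]]
good = certifies_copositive(M, [[1.0, -1.0], [-1.0, 1.0]], [[0.0, 3.0], [3.0, 0.0]])
bad = certifies_copositive(M, M, [[0.0, 0.0], [0.0, 0.0]])  # S = M is not PSD
print(good, bad)  # True False
```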


Parrilo showed in [6] that, using sufficiently large systems of linear matrix inequalities, one can approximate the copositive cone $\mathcal{C}_n$ to any desired accuracy.

Obviously, copositivity of the matrix M is equivalent to (global) nonnegativity of the fourth-degree form:

$$P(x) = (x \circ x)^\top M (x \circ x) = \sum_{i=1}^n \sum_{j=1}^n M_{ij} x_i^2 x_j^2 \ge 0, \quad x \in \mathbb{R}^n, \qquad (5)$$

where "∘" denotes the componentwise (Hadamard) product. It is shown in [6] that the decomposition into positive semidefinite and nonnegative matrices mentioned above exists if and only if P(x) can be represented as a sum of squares. Higher-order sufficient conditions for copositivity proposed by Parrilo in [6] correspond to checking whether the polynomial

$$P^{(r)}(x) = \Big(\sum_{i=1}^n x_i^2\Big)^r P(x) \qquad (6)$$

has a sum-of-squares decomposition (or, as a weaker condition, whether $P^{(r)}(x)$ has only nonnegative coefficients). These conditions can be expressed via linear matrix inequalities over $n^{r+1} \times n^{r+1}$ symmetric matrices. In particular, for r = 1, Parrilo showed that the existence of a sum-of-squares decomposition of $P^{(1)}(x)$ is equivalent to feasibility of the following system (see also [3]):


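$$\begin{aligned}
&M - M^{(i)} \in \mathcal{S}_n^+, && i = 1,\dots,n,\\
&M^{(i)}_{ii} = 0, && i = 1,\dots,n,\\
&M^{(i)}_{jj} + 2M^{(j)}_{ij} = 0, && i \ne j,\\
&M^{(i)}_{jk} + M^{(j)}_{ik} + M^{(k)}_{ij} \ge 0, && i < j < k,
\end{aligned}$$

where $M^{(i)}$ $(i = 1,\dots,n)$ are symmetric matrices.

With sufficiently large r, the convergence to the copositivity constraint on M is guaranteed by the famous theorem of Pólya [7]: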




Theorem 3 (Pólya) Let f be a homogeneous polynomial which is positive on the simplex

$$\Delta = \Big\{x \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1,\ x \ge 0\Big\}.$$

Then, for a sufficiently large N, all the coefficients of the polynomial

$$\Big(\sum_{i=1}^n x_i\Big)^N f(x)$$

are positive.
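Pólya's theorem can be observed directly by repeated multiplication with $\sum_i x_i$. The sketch below stores polynomials as dictionaries from exponent tuples to integer coefficients, a representation chosen here for illustration. It returns the smallest N for which every monomial of the resulting degree has a positive coefficient. For $f = x_1^2 - x_1x_2 + x_2^2$, which is positive on Δ, it finds N = 3. For $(x_1 - x_2)^2$, which vanishes at $(\tfrac12, \tfrac12) \in \Delta$, no N works.

```python
def times_sum(poly, n):
    """Multiply a polynomial {exponent tuple: coeff} by (x_1 + ... + x_n)."""
    out = {}
    for exps, c in poly.items():
        for i in range(n):
            e = list(exps)
            e[i] += 1
            key = tuple(e)
            out[key] = out.get(key, 0) + c
    return out

def monomials(n, degree):
    """All exponent tuples of length n with the given total degree."""
    if n == 1:
        yield (degree,)
        return
    for k in range(degree + 1):
        for rest in monomials(n - 1, degree - k):
            yield (k,) + rest

def polya_exponent(poly, n, max_N=50):
    """Smallest N <= max_N such that (sum x_i)^N * poly has all coefficients > 0."""
    degree = sum(next(iter(poly)))  # poly is assumed homogeneous
    p = dict(poly)
    for N in range(max_N + 1):
        if all(p.get(m, 0) > 0 for m in monomials(n, degree + N)):
            return N
        p = times_sum(p, n)
    return None

f = {(2, 0): 1, (1, 1): -1, (0, 2): 1}  # x1^2 - x1 x2 + x2^2, positive on the simplex
g = {(2, 0): 1, (1, 1): -2, (0, 2): 1}  # (x1 - x2)^2, zero at (1/2, 1/2)
print(polya_exponent(f, 2), polya_exponent(g, 2, max_N=20))  # 3 None
```

For f, the product with N = 3 is $x_1^5 + 2x_1^4x_2 + x_1^3x_2^2 + x_1^2x_2^3 + 2x_1x_2^4 + x_2^5$. For N = 1 and N = 2, some coefficients are still zero.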
The notion of cost approximation (CA) was created in the thesis [39] to describe the construction of the subproblem of a class of iterative methods in mathematical programming. To explain the notion of CA, we consider the following conceptual problem (the full generality of the algorithm is explained in detail in [45] and in [37,38,40,42,43,44,46]):

$$\min\ T(x) := f(x) + u(x) \quad \text{s.t.} \quad x \in X, \qquad (1)$$

where:
• $X \subseteq \mathbb{R}^n$ is nonempty, closed and convex;
• $u: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is lower semicontinuous (l.s.c.), proper and convex;
• $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is continuously differentiable (for short, in $C^1$) on $\operatorname{dom} u \cap X$, where dom denotes "effective domain".

This problem is general enough to cover convex optimization ($f \equiv 0$), unconstrained optimization ($u \equiv 0$ and $X = \mathbb{R}^n$), and differentiable constrained optimization ($u \equiv 0$). We note that if $\operatorname{int}(\operatorname{dom} u) \cap X$ is nonempty, then any locally optimal solution $x^*$ satisfies the inclusion


$$-\nabla f(x^*) \in \partial u(x^*) + N_X(x^*), \qquad (2)$$

where $N_X$ is the normal cone operator for X and $\partial u$ is the subdifferential mapping for u. Equivalently, by the definitions of these two operators,

$$\nabla f(x^*)^\top(x - x^*) + u(x) - u(x^*) \ge 0, \quad x \in X.$$


The CA algorithm was devised to place a number of existing algorithms for (1) in a common framework, thereby facilitating comparisons, for example between their convergence properties. In short, the method works iteratively as follows. From (2), we seek a zero of the mapping $[\nabla f + \partial u + N_X]$. Given an iterate $x^t \in \operatorname{dom} u \cap X$, this mapping is approximated by a monotone mapping whose zero is easier to find. Such a zero, $y^t$, is then used in the search for a new iterate, $x^{t+1}$, having the property that the value of some merit function for (1) is reduced sufficiently. This can be done, for example, through a line search in T along the direction $d^t := y^t - x^t$.


Instances of the CA Algorithm

To obtain a monotone approximating mapping, we introduce a monotone mapping $\Phi^t: \operatorname{dom} u \cap X \to \mathbb{R}^n$, which replaces the (possibly nonmonotone) mapping $\nabla f$. We then subtract the error at $x^t$, $[\Phi^t - \nabla f](x^t)$, from $\Phi^t$. The resulting mapping is $[\Phi^t + \partial u + N_X] + [\nabla f - \Phi^t](x^t)$, and the CA subproblem becomes the inclusion

$$[\Phi^t + \partial u + N_X](y^t) + [\nabla f - \Phi^t](x^t) \ni 0^n. \qquad (3)$$


We immediately reach an interesting fixed-point characterization of the solutions to (2):

Theorem 1 (Fixed-point, [45]) The point $x^t$ solves (2) if and only if $y^t = x^t$ solves (3).

This result is a natural starting point for devising stopping criteria for an algorithm.

Assume now that $\Phi^t = \nabla\varphi^t$ for a convex function $\varphi^t$. We may then derive the inclusion equivalently as follows. At $x^t$, we replace f with the function $\varphi^t$ and subtract the linearization of the error at $x^t$. The subproblem objective function then becomes

$$T_{\varphi^t}(y) = \varphi^t(y) + u(y) + [\nabla f(x^t) - \nabla\varphi^t(x^t)]^\top y.$$

It is straightforward to establish that (3) constitutes the optimality conditions for the convex problem of minimizing $T_{\varphi^t}$ over X.


Linearization Methods

Our first example instances utilize Taylor expansions of f to construct the approximations.

Let $u \equiv 0$ and $X = \mathbb{R}^n$. Let $\Phi^t(y) := (1/\gamma_t)Q^t y$, where $\gamma_t > 0$ and $Q^t$ is a symmetric and positive definite mapping in $\mathbb{R}^{n\times n}$. The inclusion (3) reduces to

$$\nabla f(x^t) + \frac{1}{\gamma_t}Q^t(y^t - x^t) = 0^n,$$

that is, $y^t = x^t - \gamma_t(Q^t)^{-1}\nabla f(x^t)$. The direction $d^t := y^t - x^t = -\gamma_t(Q^t)^{-1}\nabla f(x^t)$ is the search direction of the class of deflected gradient methods. This class includes the steepest descent method ($Q^t := I^n$, the identity matrix) and quasi-Newton methods ($Q^t$ equals (an approximation of) $\nabla^2 f(x^t)$, if positive definite). (See further [5,35,47,50].)
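For a strictly convex quadratic $f(x) = \tfrac12 x^\top A x - b^\top x$, the two extreme members of this class are easy to compare. In the sketch below, the example data and helper names are assumptions for illustration. The choice $Q^t = A$ with $\gamma_t = 1$ (Newton) reaches the minimizer $A^{-1}b$ in one step. The choice $Q^t = I^n$ (steepest descent) with a fixed $\gamma_t$ only converges linearly.

```python
def deflected_gradient_step(x, grad, Q_inv, gamma):
    """One CA step with Phi^t(y) = (1/gamma) Q y: y = x - gamma * Q^{-1} grad f(x)."""
    g = grad(x)
    n = len(x)
    return [x[i] - gamma * sum(Q_inv[i][j] * g[j] for j in range(n)) for i in range(n)]

# f(x) = 0.5 x^T A x - b^T x with A = diag(2, 8), b = (2, 8); the minimizer is (1, 1).
A = [[2.0, 0.0], [0.0, 8.0]]
b = [2.0, 8.0]
grad = lambda x: [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]

# Newton: Q = A (inverse given explicitly), gamma = 1.
newton = deflected_gradient_step([5.0, -3.0], grad, [[0.5, 0.0], [0.0, 0.125]], 1.0)

# Steepest descent: Q = I, gamma = 0.2; each error component is multiplied by +-0.6.
x = [5.0, -3.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(60):
    x = deflected_gradient_step(x, grad, identity, 0.2)
print(newton, x)
```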

In the presence of constraints, this choice of $\Phi^t$ leads to $y^t = P_X^{Q^t}[x^t - \gamma_t(Q^t)^{-1}\nabla f(x^t)]$. Here $P_X^{Q^t}[\cdot]$ denotes the projection onto X with respect to the norm $\|z\|_{Q^t} := \sqrt{z^\top Q^t z}$. Among the algorithms in this class we find the gradient projection algorithm ($Q^t := I^n$) and Newton's method ($Q^t := \nabla^2 f(x^t)$, $\gamma_t := 1$). (See [5,19,27,50].)
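With $Q^t = I^n$ the projection is Euclidean, and for a box X it reduces to componentwise clipping. Below is a minimal sketch, assuming a box-constrained example chosen for illustration and the unit steplength $x^{t+1} = y^t$.

```python
def project_box(z, lo, hi):
    """Euclidean projection onto the box {lo <= x <= hi} (componentwise clipping)."""
    return [min(max(zi, l), h) for zi, l, h in zip(z, lo, hi)]

def gradient_projection(grad, x0, lo, hi, gamma, iters):
    """y^t = P_X[x^t - gamma * grad f(x^t)], taken as the next iterate."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = project_box([xi - gamma * gi for xi, gi in zip(x, g)], lo, hi)
    return x

# f(x) = (x1 - 2)^2 + (x2 + 1)^2 over X = [0, 1]^2; the minimizer is (1, 0).
grad = lambda x: [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)]
print(gradient_projection(grad, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0], 0.25, 20))  # [1.0, 0.0]
```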

A first-order Taylor expansion of f is obtained by choosing $\varphi^t(y) := 0$. This results in $T_{\varphi^t}(y) = \nabla f(x^t)^\top y$ (if $u \equiv 0$ is still assumed), which is the subproblem objective in the Frank-Wolfe algorithm ([5,17]; cf. also ▸ Frank-Wolfe algorithm).
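Over the standard simplex, the Frank-Wolfe subproblem $\min_{y\in X}\nabla f(x^t)^\top y$ is solved by a vertex $e_i$ with the smallest partial derivative. The sketch below uses the open-loop steplength $\ell_t = 2/(t+2)$. That steplength is a common choice, but it is an assumption here rather than part of the text above.

```python
def frank_wolfe_simplex(grad, x0, iters):
    """Frank-Wolfe on the standard simplex with steplength 2/(t + 2)."""
    x = list(x0)
    n = len(x)
    for t in range(iters):
        g = grad(x)
        i = min(range(n), key=lambda k: g[k])  # y^t = e_i minimizes g^T y over the simplex
        step = 2.0 / (t + 2.0)
        x = [(1.0 - step) * xk for xk in x]    # x + step * (e_i - x), first part
        x[i] += step
    return x

c = [0.2, 0.3, 0.5]  # target point inside the simplex (example data)
f = lambda x: sum((xi - ci) ** 2 for xi, ci in zip(x, c))
grad = lambda x: [2.0 * (xi - ci) for xi, ci in zip(x, c)]
x = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], 2000)
print(sum(x), f(x))  # the iterate stays in the simplex and f(x) is small
```

The subproblem needs only a minimum over n numbers, which is the main attraction of the linearization when X is a simplex or a polytope with an easy linear oracle.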

We next give a first example of a very useful fact. The result of the cost approximation (in the above examples, a linearization) leads to different approximations of the original problem, and ultimately to different algorithms, depending on which representation of the problem the cost approximation is applied to.

Consider the problem

$$\min\ f(x) \quad \text{s.t.} \quad g_i(x) = 0, \quad i = 1,\dots,\ell, \qquad (4)$$

where f and $g_i$, $i = 1,\dots,\ell$, are functions in $C^2$. We may associate this problem with its first-order optimality conditions, which in this special case are

$$\begin{pmatrix}\nabla_x L(x^*, \lambda^*)\\ \nabla_\lambda L(x^*, \lambda^*)\end{pmatrix} = \begin{pmatrix}0^n\\ 0^\ell\end{pmatrix}, \qquad (5)$$

where $\lambda \in \mathbb{R}^\ell$ is the vector of Lagrange multipliers for the constraints in (4), and $L(x, \lambda) := f(x) + \lambda^\top g(x)$ is the associated Lagrangian function. We consider using Newton's method for this system, and therefore introduce a (primal-dual) mapping $\Phi: \mathbb{R}^{2(n+\ell)} \to \mathbb{R}^{n+\ell}$ of the form


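$$\Phi((y, \mu), (x, \lambda)) = \begin{pmatrix}\nabla^2_{xx}L(x, \lambda) & \nabla g(x)^\top\\ -\nabla g(x) & 0\end{pmatrix}\begin{pmatrix}y\\ \mu\end{pmatrix},$$

where $\nabla g(x)$ denotes the $\ell \times n$ Jacobian of g. The resulting CA subproblem in $(y, \mu)$ can be written as the following linear system:

$$\nabla f(x) + \nabla^2_{xx}L(x, \lambda)(y - x) + \nabla g(x)^\top \mu = 0^n, \qquad \nabla g(x)(y - x) = -g(x).$$

This system constitutes the first-order optimality conditions for (e.g., [4, Sec. 10.4])

$$\min_y\ f(x) + \nabla f(x)^\top(y - x) + \tfrac{1}{2}(y - x)^\top\nabla^2_{xx}L(x, \lambda)(y - x) \quad \text{s.t.} \quad g(x) + \nabla g(x)(y - x) = 0^\ell,$$

where we have added some fixed terms in the objective function for clarity. This is the generic subproblem of sequential quadratic programming (SQP) methods for the solution of (4); see, for example, [5,16].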
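A single SQP subproblem is just the linear system above. The sketch below iterates it on a toy problem assumed for illustration, $\min x_1 + x_2$ s.t. $x_1^2 + x_2^2 - 2 = 0$, whose solution is $x^* = (-1,-1)$ with $\lambda^* = 1/2$. The new multiplier μ is used as the next λ. To obtain a symmetric KKT matrix, the second block row is written as $\nabla g(x)(y - x) = -g(x)$, which is the same equation as in the monotone form of Φ up to sign. The helper names are hypothetical.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A z = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def sqp_step(x, lam):
    """One SQP (Newton-KKT) step for min x1 + x2 s.t. g(x) = x1^2 + x2^2 - 2 = 0."""
    x1, x2 = x
    grad_f = [1.0, 1.0]
    g = x1 * x1 + x2 * x2 - 2.0
    Jg = [2.0 * x1, 2.0 * x2]      # gradient (1 x 2 Jacobian) of g
    h = 2.0 * lam                  # Hessian of the Lagrangian is h * I
    K = [[h, 0.0, Jg[0]],
         [0.0, h, Jg[1]],
         [Jg[0], Jg[1], 0.0]]
    rhs = [-grad_f[0], -grad_f[1], -g]
    dx1, dx2, mu = solve(K, rhs)
    return [x1 + dx1, x2 + dx2], mu

x, lam = [-1.2, -0.8], 0.6
for _ in range(10):
    x, lam = sqp_step(x, lam)
print(x, lam)  # close to [-1.0, -1.0] and 0.5
```

Without a merit function and line search this is only locally convergent, which is why the start is chosen near the solution.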


Regularization, Splitting and Proximal Point Methods

We now assume that $f := f_1 + f_2$, where $f_1$ is convex on $\operatorname{dom} u \cap X$, and rewrite the cost mapping as

$$[\nabla f + \partial u + N_X] = [\nabla\varphi^t + \nabla f_1 + \partial u + N_X] - [\nabla\varphi^t - \nabla f_2].$$

The CA subproblem is, as usual, derived by fixing the second term at $x^t$. The difference from the original setup is that we have performed an operator splitting in the mapping $\nabla f$ to keep an additive part from being approximated. Such a splitting can always be found by first choosing $f_1$ as a convex function and then defining $f_2 := f - f_1$. We can also derive this subproblem from the original derivation by simply redefining $\varphi^t := \varphi^t + f_1$. We now derive a few algorithms from the literature.

Consider choosing $\varphi^t(y) = \frac{1}{2\gamma_t}\|y - x^t\|^2$, $\gamma_t > 0$. If $f_2 \equiv 0$, then we obtain the subproblem objective $T_{\varphi^t}(y) = T(y) + \frac{1}{2\gamma_t}\|y - x^t\|^2$. This is the subproblem in the proximal point algorithm (e.g. [32,33,34,51,52]), the most classical algorithm among the regularization methods. More general choices of strictly convex functions $\varphi^t$ are of course possible. They lead, for example, to the class of regularization methods based on Bregman functions ([9,14,22]) and φ-divergence functions ([23,54]). If, on the other hand, $f_1 \equiv 0$ and also $u \equiv 0$, then we obtain the gradient projection algorithm.
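For a one-dimensional illustration, take $T(x) = |x|$ on $X = \mathbb{R}$, an example assumed here. The proximal point subproblem $\min_y |y| + \frac{1}{2\gamma}(y - x^t)^2$ then has the closed-form soft-thresholding solution, and the iterates reach the minimizer 0 after finitely many steps:

```python
def prox_abs(x, gamma):
    """argmin_y |y| + (y - x)^2 / (2*gamma), i.e. soft thresholding."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0

def proximal_point(x0, gamma, iters):
    """Proximal point iterates x^{t+1} = prox(x^t) for T(x) = |x|."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox_abs(xs[-1], gamma))
    return xs

print(proximal_point(5.0, 1.0, 6))  # [5.0, 4.0, 3.0, 2.0, 1.0, 0.0, 0.0]
```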

We can also construct algorithms between these two extremes, yielding a true operator splitting. If both $f_1$ and $f_2$ are nonzero, choosing $\varphi^t \equiv 0$ defines a partial linearization ([25]) of the original objective, in which only $f_2$ is linearized. Letting $x = (x_1^\top, x_2^\top)^\top$, the choice $\varphi^t(y) = \frac{1}{2\gamma_t}\|y_1 - x_1^t\|^2$ leads to the partial proximal point algorithm ([7,20]). Choosing $\varphi^t(y) = f(y_1, x_2^t)$ leads to a linearization of f in the variables $x_2$.

Several well-known methods can be derived either directly as CA algorithms, or as inexact proximal point algorithms. One example is the Levenberg-Marquardt algorithm ([5,49]), a Newton-like algorithm in which a scaled diagonal matrix is added to the Hessian to make the resulting matrix positive definite. It results from solving the proximal point subproblem with one iteration of a Newton algorithm. Further, the extragradient algorithm of [24] results from instead applying one iteration of the gradient projection algorithm to the proximal point subproblem.

Perhaps the best-known splitting algorithms, however, are the matrix splitting methods in quadratic programming (e.g., [28,29,35,36]). In a quadratic programming problem, we have

$$f(x) = \tfrac{1}{2}x^\top A x + q^\top x,$$

where $A \in \mathbb{R}^{n\times n}$. A splitting $(A_1, A_2)$ of this matrix is one for which $A = A_1 + A_2$. It is further termed regular if $A_1 - A_2$ is positive definite. Matrix splitting methods correspond to choosing

$$\varphi^t(x) = \tfrac{1}{2}x^\top A_1 x,$$

and result in the CA subproblem mapping $y \mapsto A_1 y + [A_2 x^t + q]$, which is clearly monotone whenever $A_1$ is chosen positive semidefinite.
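As an illustration, the Gauss-Seidel splitting takes $A_1$ as the lower triangle of A (including the diagonal) and $A_2$ as the strict upper triangle. Each CA subproblem $A_1 y + A_2 x^t + q = 0$ is then solved by forward substitution. The sketch below treats only the unconstrained case $X = \mathbb{R}^n$; with constraints, the subproblem would instead be a projection or linear complementarity problem. It takes $x^{t+1} = y^t$. For the example matrix, $A_1 - A_2$ has positive definite symmetric part, so the splitting is regular.

```python
def gauss_seidel_splitting(A, q, x0, iters):
    """Matrix splitting for min 0.5 x^T A x + q^T x on R^n:
    A1 = lower triangle of A (with diagonal), A2 = A - A1 (strict upper triangle).
    Each CA subproblem A1 y + A2 x + q = 0 is solved by forward substitution."""
    n = len(A)
    x = list(x0)
    for _ in range(iters):
        rhs = [-(sum(A[i][j] * x[j] for j in range(i + 1, n)) + q[i]) for i in range(n)]
        y = [0.0] * n
        for i in range(n):
            y[i] = (rhs[i] - sum(A[i][j] * y[j] for j in range(i))) / A[i][i]
        x = y  # unit steplength: x^{t+1} = y^t
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
q = [-1.0, -2.0]
print(gauss_seidel_splitting(A, q, [0.0, 0.0], 50))  # approximately [1/11, 7/11]
```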

Because proximal point and splitting methods have dual interpretations as augmented Lagrangian algorithms ([51]), a large class of multiplier methods is included among the CA algorithms. See [45, Chapt. 3.2-3.3] for more details.


Perturbation Methods

All the above algorithms assume that
i) the mappings $\partial u$ and $N_X$ are left intact; and
ii) the CA subproblem has the fixed-point property of Theorem 1.

We now relax these assumptions. This lets us derive subgradient algorithms as well as perturbed CA algorithms, which include both regularization algorithms and exterior penalty algorithms.

Let $[\Phi^t + N_X] + [\nabla f + \partial u - \Phi^t]$ represent the original mapping, with $\partial u$ moved to the second term. Let any element $\xi_u(x^t) \in \partial u(x^t)$ represent this point-to-set mapping at $x^t$. We then reach the subproblem mapping of the auxiliary problem principle of [12]. Further letting $\Phi^t(y) = (1/\gamma_t)[y - x^t]$ yields the subproblem of the classical subgradient optimization scheme ([48,53]). Assuming further that $f \equiv 0$, this gives $y^t = P_X[x^t - \gamma_t \xi_u(x^t)]$. (Typically, $\ell_t := 1$ is taken.)
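Below is a minimal sketch of the resulting projected subgradient scheme. It assumes $f \equiv 0$, a nonsmooth separable $u(x) = |x_1 - 1| + |x_2 + 2|$, a box X, and divergent-series step lengths $\gamma_t = 1/(t+1)$, all chosen for illustration. Over $X = [0,3]^2$ the minimizer is $(1, 0)$.

```python
def subgrad_abs(v):
    """A subgradient of |.| at v (0 is a valid choice at v = 0)."""
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def projected_subgradient(x0, lo, hi, iters):
    """y^t = P_X[x^t - gamma_t * xi^t] for u(x) = |x1 - 1| + |x2 + 2|, X a box."""
    x = list(x0)
    for t in range(iters):
        gamma = 1.0 / (t + 1)                      # divergent series of step lengths
        xi = [subgrad_abs(x[0] - 1.0), subgrad_abs(x[1] + 2.0)]
        x = [min(max(x[k] - gamma * xi[k], lo[k]), hi[k]) for k in range(2)]
    return x

x = projected_subgradient([3.0, 3.0], [0.0, 0.0], [3.0, 3.0], 10000)
print(x)  # x[0] oscillates ever closer to 1; x[1] is held at the bound 0
```

Unlike the CA algorithms with the fixed-point property, the iterates do not settle at a zero of the subproblem. Convergence comes only from the shrinking step lengths.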

Let again $[\Phi^t + \partial u + N_X] + [\nabla f - \Phi^t]$ represent the original problem mapping, but now let u be replaced by an epiconvergent sequence $\{u^t\}$ of l.s.c., proper and convex functions. Convex exterior penalty functions provide an example of such an epiconvergent sequence. In this way, we can construct CA algorithms that approximate the objective function and simultaneously replace some of the constraints of the problem with exterior penalties. See [3,13] for example methods of this type.

One important class of regularization methods takes as the subproblem mapping $[\Phi^t + \nabla f + \partial u + N_X]$, where $\Phi^t$ is usually taken to be strongly monotone (cf. (12)). This subproblem mapping evidently does not have the fixed-point property, as it is not identical to the original one at $x^t$ unless $\Phi^t(x^t) = 0^n$ holds. To ensure convergence, we must therefore force the sequence $\{\Phi^t\}$ of mappings to tend to zero. This is typically done by constructing the sequence as $\Phi^t := (1/\gamma_t)\Phi$ for a fixed mapping $\Phi$ and a sequence of $\gamma_t > 0$ with $\{\gamma_t\} \to \infty$. For this class of algorithms, F. Browder [10] has established convergence to a unique limit point $x^*$ which satisfies $-\Phi(x^*) \in N_{X^*}(x^*)$, where $X^*$ is the solution set of (2). This class of methods originates in the work of A.N. Tikhonov [55] on ill-posed problems, that is, problems with multiple solutions. The classical regularization mapping is the scaled identity mapping, $\Phi^t(y) := (1/\gamma_t)y$, which leads to least squares (least norm) solutions. See further [49,56].
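The least-norm limit can be seen on an underdetermined least-squares problem assumed for illustration. The function $f(x) = \tfrac12(x_1 + x_2 - 2)^2$ has the solution set $\{x_1 + x_2 = 2\}$, whose least-norm element is $(1, 1)$. With $\Phi^t(y) = (1/\gamma_t)y$, the regularized problem is $\min f(x) + \frac{1}{2\gamma_t}\|x\|^2$. Its normal equations $(A^\top A + \gamma_t^{-1}I)x = A^\top b$ are solved in closed form below (helper name hypothetical), and the solutions approach $(1,1)$ as $\gamma_t \to \infty$.

```python
def regularized_least_squares(A, b, gamma):
    """Solve (A^T A + (1/gamma) I) x = A^T b for a matrix A with two columns."""
    m = len(A)
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) + (1.0 / gamma if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    r = [sum(A[k][i] * b[k] for k in range(m)) for i in range(2)]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [(r[0] * G[1][1] - G[0][1] * r[1]) / det,   # Cramer's rule
            (G[0][0] * r[1] - G[1][0] * r[0]) / det]

A, b = [[1.0, 1.0]], [2.0]
for gamma in (1.0, 1e2, 1e6):
    print(gamma, regularized_least_squares(A, b, gamma))
```

Analytically each component equals $2\gamma_t/(1 + 2\gamma_t)$, i.e. 2/3 for $\gamma_t = 1$.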


Variational Inequality Problems

Consider the following extension of (2):

$$-F(x^*) \in \partial u(x^*) + N_X(x^*), \qquad (6)$$

where $F: X \to \mathbb{R}^n$ is a continuous mapping on X. When $F = \nabla f$, we have the situation in (2). The variational inequality problem (6) also has a direct interpretation as the necessary optimality conditions for an optimization problem when $F(x, y) = (\nabla_x \Pi(x, y)^\top, -\nabla_y \Pi(x, y)^\top)^\top$ for some saddle function $\Pi$ on some convex product set $X \times Y$ (cf. (5)). In other cases, however, a merit function (or objective function) for the problem (6) is not immediately available. We derive a class of suitable merit functions below.

Given a convex function $\varphi: \operatorname{dom} u \cap X \to \mathbb{R}$ in $C^1$ on $\operatorname{dom} u \cap X$, we introduce the function

$$\psi(x) := \sup_{y \in X} L(y, x), \quad x \in \operatorname{dom} u \cap X, \qquad (7)$$

where

$$L(y, x) = u(x) - u(y) + \varphi(x) - \varphi(y) + [F(x) - \nabla\varphi(x)]^\top(x - y). \qquad (8)$$

We introduce the optimization problem

$$\min_{x \in X}\ \psi(x). \qquad (9)$$


Theorem 2 (Gap function, [45]) For any $x \in X$, $\psi(x) \ge 0$ holds. Further, $\psi(x) = 0$ if and only if x solves (6). Hence, the solution set of (6) (if nonempty) is identical to that of the optimization problem (9), and the optimal value is zero.

The theorem shows that the CA subproblem defines an auxiliary function $\psi$ which measures the violation of (6). This function can be used (directly or indirectly) as a merit function in an algorithm.
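For $u \equiv 0$ and a box $X = [lo, hi]$, the primal gap function ($\varphi \equiv 0$ in (7)-(8)) can be evaluated in closed form. This is because the inner maximization of $F(x)^\top(x - y)$ over y separates by coordinate. The sketch below confirms that the gap vanishes exactly at the solution of (6), here the projection of c onto the box. Its names are hypothetical, and the affine mapping $F(x) = x - c$ is an assumption for illustration.

```python
def primal_gap_box(F, x, lo, hi):
    """psi_p(x) = max_{y in [lo, hi]} F(x)^T (x - y) for u = 0 (phi = 0 in (7)-(8))."""
    Fx = F(x)
    # maximize coordinatewise: choose y_i = lo_i if F_i > 0, otherwise y_i = hi_i
    return sum(Fi * (xi - (l if Fi > 0 else h)) for Fi, xi, l, h in zip(Fx, x, lo, hi))

c = [0.5, 2.0]
F = lambda x: [xi - ci for xi, ci in zip(x, c)]  # monotone: gradient of 0.5 * ||x - c||^2
lo, hi = [0.0, 0.0], [1.0, 1.0]
print(primal_gap_box(F, [0.5, 1.0], lo, hi))  # 0.0 at the solution (projection of c)
print(primal_gap_box(F, [0.0, 0.0], lo, hi))  # 2.5
```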

To illustrate a possible use of this result, consider the extension of Newton's method to the solution of (6). Let $x \in \operatorname{dom} u \cap X$, and consider the cost approximating mapping $y \mapsto \Phi(y, x) = \nabla F(x)^\top(y - x)$. The CA subproblem is then the linearized variational inequality problem of finding $y \in \operatorname{dom} u \cap X$ such that

$$[F(x) + \nabla F(x)^\top(y - x)]^\top(z - y) + u(z) - u(y) \ge 0, \quad \forall z \in X. \qquad (10)$$

Assuming that x is not a solution to (6), we want to use the direction $d := y - x$ in a line search based on a merit function. For this purpose we use the primal gap function ([2,62]), denoted $\psi_p$, which corresponds to the choice $\varphi \equiv 0$ in the definition of ψ. Let w be an arbitrary solution to its inner problem, that is, $\psi_p(x) = u(x) - u(w) + F(x)^\top(x - w)$. The steplength is chosen so that the value of $\psi_p$ decreases sufficiently. To show that this is possible, we use Danskin's theorem and the variational inequality (10) with z = w, where the maximum is taken over all w defining $\psi_p(x)$:

$$\psi_p'(x; d) = \max_w\big\{[F(x) + \nabla F(x)^\top(x - w)]^\top d + u'(x; d)\big\} \le -\psi_p(x) - d^\top\nabla F(x)^\top d.$$

This shows that d is a descent direction for the merit function $\psi_p$ at all points outside the solution set, whenever F is monotone and in $C^1$ on $\operatorname{dom} u \cap X$. (See also [30] for convergence rate results.) So, if Newton's method is supplied with a line search with respect to the primal gap function, it is globally convergent for the solution of variational inequality problems.

The merit function $\psi$ and the optimization problem (9) cover several examples previously considered for the solution of (6).

The primal gap function, like gap functions in general, is nonconvex and also, in general, nondifferentiable. To use methods from differentiable optimization, we let $\varphi$ be strictly convex, so that the solution $y$ to the inner problem (7) is unique. If, in addition, $\operatorname{dom} u \cap X$ is bounded and u is in $C^1$ on this set, then $\psi$ is in $C^1$ on $\operatorname{dom} u \cap X$. The known differentiable gap functions covered by this class of merit functions include those of [1,18,26,40] and [31,59,60,61].


Descent Properties

Optimization

Assume that $x^t$ is not a solution to (2). We are interested in the conditions under which $d^t := y^t - x^t$ is a descent direction for the merit function T, where $y^t$ is a possibly inexact solution to (3). If $\Phi^t = \nabla\varphi^t$, the requirement is that

$$T_{\varphi^t}(y^t) < T_{\varphi^t}(x^t), \qquad (11)$$

that is, any improvement in the value of the subproblem objective over that at the current iterate is enough to provide a descent direction. To establish this result, one uses the convexity of $\varphi^t$ and u and the formula for the directional derivative of T in the direction of $d^t$ (see [45, Prop. 2.14.b]). We further note that (11) can be satisfied if and only if $x^t$ is not a solution to (2). This result is in fact a special case of Theorem 1.

If $\Phi^t$ has stronger monotonicity properties, descent is also obtained when $\Phi^t$ is not necessarily a gradient mapping. If, further, $\Phi^t$ is Lipschitz continuous, then we can establish measures of the steepness of the search directions, extending the gradient relatedness conditions of unconstrained optimization. Let $\Phi^t$ be strongly monotone on $\operatorname{dom} u \cap X$, that is, for $x, y \in \operatorname{dom} u \cap X$,

$$[\Phi^t(x) - \Phi^t(y)]^\top(x - y) \ge m_{\Phi^t}\|x - y\|^2, \qquad (12)$$

for some $m_{\Phi^t} > 0$. This can be used to establish that $T'(x^t; d^t) \le -m_{\Phi^t}\|d^t\|^2$.

Now suppose $y^t$ is not an exact solution to (3), in the sense that it satisfies a perturbation of (3) in which the right-hand side $0^n$ is replaced by $r^t \ne 0^n$. Then $d^t := y^t - x^t$ is a descent direction for T at $x^t$ if $\|r^t\| < m_{\Phi^t}\|d^t\|$.
Variational Inequality Problems

The requirements for obtaining a descent direction in the problem (6) are necessarily much stronger than in the problem (2), because the merit functions for (6) take a much more complex form. (For example, the directional derivative of T at x in any direction d depends only on those quantities, while the directional derivative of ψ also depends on the argument y which defines its value at x.) Typically, monotonicity of the mapping F is required, as the above example of the Newton method shows. If a differentiable merit function is used, the requirements are slightly strengthened, as the following example result shows.


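Theorem 3 (Descent properties, [45,60]) Assume that X is bounded, u is finite on X, and F is monotone and in $C^1$ on X. Let $\varphi: X \times X \to \mathbb{R}$ be a continuously differentiable function on $X \times X$ of the form $\varphi(y, x)$, strictly convex in y for each $x \in X$. Let $\alpha > 0$ and $x \in X$, and let y be the unique maximizer in X of $L_\alpha(\cdot, x)$, so that

$$\psi_\alpha(x) := \max_{y \in X} L_\alpha(y, x),$$

where

$$L_\alpha(y, x) = u(x) - u(y) + \frac{1}{\alpha}[\varphi(x, x) - \varphi(y, x)] + \Big[F(x) - \frac{1}{\alpha}\nabla_y\varphi(x, x)\Big]^\top(x - y).$$

Then, with $d := y - x$, either d satisfies

$$\psi_\alpha'(x; d) \le -\gamma\,\psi_\alpha(x), \quad \gamma \in (0, 1),$$

or

$$\psi_\alpha(x) \le \frac{1}{\alpha(1 - \gamma)}\big[\varphi(x, x) - \varphi(y, x) + \nabla_y\varphi(y, x)^\top d\big].$$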
A descent algorithm is devised from this result as follows. For a given $x \in X$ and choice of $\alpha > 0$, the CA subproblem is solved with the scaled, cost approximating, continuous and iteration-dependent function $\varphi$. If the resulting direction does not have the descent property, the value of α is increased, and the CA subproblem is rescaled and solved again. Theorem 3 shows that a sufficient increase in the value of α will produce a descent direction unless x solves (6).

Steplength Rules

To establish convergence of the algorithm, the steplength taken in the direction of $d^t$ must be such that the value of the merit function decreases sufficiently. An exact line search obviously works, but we introduce simpler steplength rules that do not require a one-dimensional minimization.

The first is the Armijo rule. We assume temporarily that $u \equiv 0$. Let $\alpha, \beta \in (0, 1)$, and $\ell_t := \beta^{\bar{i}}$, where $\bar{i}$ is the smallest nonnegative integer i such that

$$f(x^t + \beta^i d^t) - f(x^t) \le \alpha\beta^i\,\nabla f(x^t)^\top d^t. \qquad (13)$$

By the descent property and Taylor's formula (see [45, Lemma 2.24.b]), there exists a finite integer such that (13) is satisfied for any search direction $d^t := y^t - x^t$ satisfying (11).
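Below is a minimal sketch of the Armijo rule (13) for $u \equiv 0$. The parameter values α = 0.1 and β = 0.5, the test function and the helper name are assumptions for illustration. The search direction is the steepest descent direction, which satisfies (11).

```python
def armijo_step(f, grad, x, d, alpha=0.1, beta=0.5, max_i=60):
    """Return beta**i for the smallest i >= 0 with
    f(x + beta**i d) - f(x) <= alpha * beta**i * grad f(x)^T d   (rule (13))."""
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(grad(x), d))
    step = 1.0
    for _ in range(max_i):
        trial = [xi + step * di for xi, di in zip(x, d)]
        if f(trial) - fx <= alpha * step * slope:
            return step
        step *= beta
    raise RuntimeError("no Armijo step found; d may not be a descent direction")

f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 20.0 * x[1]]
x = [1.0, 1.0]
d = [-g for g in grad(x)]          # steepest descent direction
print(armijo_step(f, grad, x, d))  # 0.0625
```

Here the trial steps 1, 1/2, 1/4 and 1/8 are rejected, and $\beta^4 = 0.0625$ is the first one giving sufficient decrease.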

When $u \ne 0$, however, the situation becomes quite different, since $T := f + u$ is nondifferentiable. Simply replacing $\nabla f(x^t)^\top d^t$ with $T'(x^t; d^t)$ does not work. We can, however, use an overestimate of the predicted decrease $T'(x^t; d^t)$. Let $\alpha, \beta \in (0, 1)$, and $\ell_t := \beta^{\bar{i}}$, where $\bar{i}$ is the smallest nonnegative integer i such that

$$T(x^t + \beta^i d^t) - T(x^t) \le \alpha\beta^i\,[\nabla\varphi^t(x^t) - \nabla\varphi^t(y^t)]^\top d^t,$$

where now $y^t$ must be an exact solution to (3), and $\varphi^t$ must further be strictly convex. We note that $T'(x^t; d^t) \le [\nabla\varphi^t(x^t) - \nabla\varphi^t(y^t)]^\top d^t$ indeed holds, with equality when $u \equiv 0$ and $X = \mathbb{R}^n$ (see [45, Remark 2.28]).
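To develop still simpler steplength rules, we further assume that $\nabla f$ is Lipschitz continuous, that is, for $x, y \in \operatorname{dom} u \cap X$,

$$\|\nabla f(x) - \nabla f(y)\| \le M_{\nabla f}\|x - y\|,$$

for some $M_{\nabla f} > 0$. The Lipschitz continuity assumption implies that for every $\ell \in [0, 1]$,

$$T(x^t + \ell d^t) - T(x^t) \le \ell\,[\nabla\varphi^t(x^t) - \nabla\varphi^t(y^t)]^\top d^t + \frac{M_{\nabla f}\,\ell^2}{2}\|d^t\|^2.$$

Adding a strong convexity assumption on $\varphi^t$ (with modulus $m_{\varphi^t}$) yields

$$T(x^t + \ell d^t) - T(x^t) \le \ell\Big(-m_{\varphi^t} + \frac{M_{\nabla f}\,\ell}{2}\Big)\|d^t\|^2.$$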
This inequality can be used to validate the relaxation step, which takes

$$ \ell_t \in \left(0, \frac{2 m_{\varphi^t}}{M_{\nabla f}}\right) \cap [0, 1], \qquad (14) $$

and the divergent series steplength rule,

$$ \{\ell_t\} \subset [0, 1], \qquad \ell_t \to 0, \qquad \sum_{t=0}^{\infty} \ell_t = \infty. \qquad (15) $$

In the case of (14), descent is guaranteed in each step, while in the case of (15), descent is guaranteed after a finite number of iterations.
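The relaxation rule (14) can be illustrated with the simplest strongly convex choice $\varphi^t(x) = \tfrac{m}{2}\|x\|^2$, together with $u = 0$ and $X = \mathbb{R}^n$. These choices are assumptions made for this sketch only. Under them the CA subproblem has the closed-form solution $y^t = x^t - \nabla f(x^t)/m$, so $d^t = -\nabla f(x^t)/m$. Rule (14) then requires $0 < \ell_t \le 1$ and $\ell_t < 2m/M_{\nabla f}$. The Python sketch below enforces that interval and iterates with a constant step. The function and parameter names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def ca_relaxation(grad: Callable[[Vector], Vector], x0: Vector,
                  m: float, lip: float, ell: float, iters: int) -> Vector:
    """CA iteration with phi(x) = (m/2)||x||^2, u = 0, X = R^n (assumed setting).

    m   : strong convexity modulus of phi (m_{phi^t} in the text)
    lip : Lipschitz constant of grad f (M_{grad f} in the text)
    ell : constant steplength, checked against the relaxation rule (14)
    """
    if not (0.0 < ell <= 1.0 and ell < 2.0 * m / lip):
        raise ValueError("ell violates (0, 2m/M) intersected with [0, 1]")
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        y = [xi - gi / m for xi, gi in zip(x, g)]  # solves m*y + grad f(x) - m*x = 0
        x = [xi + ell * (yi - xi) for xi, yi in zip(x, y)]  # x + ell * d
    return x


# f(x) = 0.5*(x0^2 + 4*x1^2) - x0 - 4*x1 has minimizer (1, 1) and M = 4.
grad = lambda x: [x[0] - 1.0, 4.0 * x[1] - 4.0]
print(ca_relaxation(grad, [0.0, 0.0], m=1.0, lip=4.0, ell=0.4, iters=200))
```

With $m = 1$ and $M_{\nabla f} = 4$, the rule admits only steps below $0.5$. A step such as $\ell = 0.6$ is rejected. That is exactly the case where the factor $-m + M_{\nabla f}\ell/2$ in the bound above becomes positive.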


Convergence Properties 


Convergence of the CA algorithm can be established under many combinations of

i) the properties of the original problem mappings; 

ii) the choice of forms and convexity properties of the 
cost approximating mappings; 

iii) the choice of accuracy in the computations of the 
CA subproblem solutions; 

iv) the choice of merit function; and 

v) the choice of steplength rule. 

A subset of the possible results is found in [45, Chapt. 5-9]. These results show that convergence relies on reaching a critical mass in the properties of the problem and algorithm, and that, once this critical mass is reached, there is a great deal of freedom in how it is distributed. For example, weaker monotonicity properties of the subproblem must be compensated both by stronger coercivity conditions on the merit function and by the use of more accurate subproblem solutions and steplength rules.


Decomposition CA Algorithms 


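Assume that $\operatorname{dom} u \cap X$ is a Cartesian product set, that is, for some finite index set $\mathcal{C}$ and positive integers $n_i$ with $\sum_{i \in \mathcal{C}} n_i = n$,

$$ X = \prod_{i \in \mathcal{C}} X_i, \quad X_i \subseteq \mathbb{R}^{n_i}; \qquad u(x) = \sum_{i \in \mathcal{C}} u_i(x_i), \quad u_i : \mathbb{R}^{n_i} \to \mathbb{R} \cup \{+\infty\}. $$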
Such problems arise in applications of equilibrium programming, for example in traffic ([41]) and Nash equilibrium problems ([21]); of course, box constrained and unconstrained problems fit into this framework as well.

The main advantage of this problem structure is that one can devise several decomposition versions of the CA algorithm, wherein components of the original problem are updated in parallel or sequentially, independently of each other. With the right computer environment at hand, this can mean a dramatic increase in computing efficiency. We will look at three computing models for decomposition CA algorithms and compare their convergence characteristics. In all three cases, decomposition is achieved by choosing the cost approximating mapping separable with respect to the partition of $\mathbb{R}^n$ defined by $\mathcal{C}$:


$$ \Phi(x)^{\mathsf T} = \left[\Phi_1(x_1)^{\mathsf T}, \ldots, \Phi_{|\mathcal{C}|}(x_{|\mathcal{C}|})^{\mathsf T}\right]. \qquad (16) $$


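The individual subproblems, given $x$, then are to find $y_i$, $i \in \mathcal{C}$, such that

$$ \Phi_i(y_i) + \partial u_i(y_i) + N_{X_i}(y_i) + F_i(x) - \Phi_i(x_i) \ni 0^{n_i}; $$

if $\Phi_i = \nabla\varphi_i$ for some convex function $\varphi_i : \operatorname{dom} u_i \cap X_i \to \mathbb{R}$ in $C^1$ on $\operatorname{dom} u_i \cap X_i$, then these are the optimality conditions for

$$ \min_{y_i} \; T_{\varphi_i}(y_i) := \varphi_i(y_i) + u_i(y_i) + \left[F_i(x) - \nabla\varphi_i(x_i)\right]^{\mathsf T} y_i. $$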


Sequential Decomposition 


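The sequential CA algorithm proceeds as follows. Given an iterate $x^t \in \operatorname{dom} u \cap X$ at iteration $t$, choose an index $i_t \in \mathcal{C}$ and a cost approximating mapping $\Phi^t_{i_t}$, and solve the problem of finding $y^t_{i_t} \in \mathbb{R}^{n_{i_t}}$ such that ($i = i_t$)

$$ \Phi^t_i(y^t_i) + \partial u_i(y^t_i) + N_{X_i}(y^t_i) + F_i(x^t) - \Phi^t_i(x^t_i) \ni 0^{n_i}. $$

Let $y^t_j := x^t_j$ for all $j \in \mathcal{C} \setminus \{i_t\}$ and $d^t := y^t - x^t$. The next iterate, $x^{t+1}$, is then defined by $x^{t+1} := x^t + \ell_t d^t$, that is,

$$ x^{t+1}_j := \begin{cases} x^t_j + \ell_t (y^t_j - x^t_j), & j = i_t, \\ x^t_j, & j \neq i_t, \end{cases} $$

for some value of $\ell_t$ such that $x^t_{i_t} + \ell_t (y^t_{i_t} - x^t_{i_t}) \in \operatorname{dom} u_{i_t} \cap X_{i_t}$ and the value of a merit function $\psi$ is reduced sufficiently.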
Assume that $F$ is the gradient of a function $f : \operatorname{dom} u \cap X \to \mathbb{R}$. Let the sequence $\{i_t\}$ be chosen according to the cyclic rule, that is, in iteration $t$,

$$ i_t := t \pmod{|\mathcal{C}|} + 1. $$

Choose the cost approximating mapping ($i = i_t$)

$$ y_i \mapsto \Phi^t_i(y_i) := \nabla_i f(x^t_{\neq i}, y_i), \qquad y_i \in \operatorname{dom} u_i \cap X_i, $$

where $x^t_{\neq i}$ collects the components of $x^t$ other than $x^t_i$.


Note that this mapping is monotone whenever $f$ is convex in $x_i$. Since $\Phi^t_i(x^t_i) = \nabla_i f(x^t)$, the CA subproblem is equivalent (under this convexity assumption) to finding

$$ y^t_i \in \arg\min_{y_i \in X_i} \left\{ f(x^t_{\neq i}, y_i) + u_i(y_i) \right\}. $$


An exact line search would produce $\ell_t := 1$, since $y^t_i$ minimizes $f(x^t_{\neq i}, \cdot) + u_i$ over $\operatorname{dom} u_i \cap X_i$ (the remaining components of $x$ kept fixed), and so $x^{t+1} := y^t$. The iteration described is that of the classic Gauss-Seidel algorithm ([35]) (also known as the relaxation algorithm, the coordinate descent method, and the method of successive displacements), originally proposed for the solution of unconstrained problems. The Gauss-Seidel algorithm is hence a special case of the sequential CA algorithm.
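The Python sketch below works under the following assumptions, chosen for illustration. The objective is the quadratic $f(x) = \tfrac12 x^{\mathsf T} Q x - b^{\mathsf T} x$ with $Q$ symmetric positive definite. In addition, $u = 0$, $X = \mathbb{R}^n$, and every component is a scalar ($n_i = 1$). In this setting the subproblem minimizer has the closed form $y^t_i = (b_i - \sum_{j \ne i} Q_{ij} x^t_j)/Q_{ii}$. The sketch runs the sequential CA algorithm with the cyclic rule and unit steps, which is the Gauss-Seidel method. The matrix and right-hand side are a hypothetical test problem.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def gauss_seidel(Q: Matrix, b: Vector, x0: Vector, sweeps: int) -> Vector:
    """Sequential CA for f = 0.5 x'Qx - b'x: cyclic rule, exact line search (l_t = 1)."""
    n = len(x0)
    x = list(x0)
    for t in range(sweeps * n):
        i = t % n  # cyclic rule i_t = t (mod |C|) + 1, written 0-based
        off = sum(Q[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - off) / Q[i][i]  # y_i minimizes f with the other components fixed
    return x


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
b = [1.0, 2.0, 3.0]
x = gauss_seidel(Q, b, [0.0, 0.0, 0.0], sweeps=50)
residual = [sum(Q[i][j] * x[j] for j in range(3)) - b[i] for i in range(3)]
print(max(abs(r) for r in residual))  # close to zero
```

Each inner step immediately overwrites `x[i]`. Later components in the same sweep therefore see the newest values, which is the "more recent information" advantage of the sequential scheme.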

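In order to compare the three decomposition approaches, we finally state the steplength requirement in the relaxation steplength rule (cf. (14)). The following interval is valid under two assumptions: for each $i \in \mathcal{C}$, $\nabla_i f$ is Lipschitz continuous on $\operatorname{dom} u_i \cap X_i$ (with modulus $M_{\nabla_i f}$), and each mapping $\Phi^t_i$ is strongly monotone (with modulus $m_{\Phi^t_i}$):

$$ \ell_t \in \left(0, \frac{2 m_{\Phi^t_i}}{M_{\nabla_i f}}\right) \cap [0, 1]. \qquad (17) $$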


Synchronized Parallel Decomposition 


The synchronized parallel CA algorithm is identical to 
the original scheme, where the CA subproblems are 
constructed to match the separability structure in the 
constraints. 

We presume the existence of a multiprocessor powerful enough to solve the $|\mathcal{C}|$ CA subproblems in parallel. (If fewer than $|\mathcal{C}|$ processors are available, then either some of the subproblems are solved in sequence or, if possible, the number of components is decreased; in either case, the convergence analysis will be the same, with the exception that the value of $|\mathcal{C}|$ may change.)

In the sequential decomposition CA algorithm, the steplengths are chosen individually for the different variable components, whereas the original CA algorithm uses a uniform steplength, $\ell_t$. If the relative scaling of the variable components is poor, in the sense that $F$ or $u$ changes disproportionately to unit changes in the different variables $x_i$, $i \in \mathcal{C}$, then this ill-conditioning may result in poor performance of the parallel algorithm. Being forced to use the same steplength in all the components can also be harmful when some variable components are close to their optimal values while others are far from optimal; in that case one might, for example, wish to use longer steps for the latter components. These two factors lead us to introduce the possibility of scaling the component directions in the synchronized parallel CA algorithm. We stress that such effects cannot in general be accommodated into the original algorithm through a scaling of the mappings $\Phi^t$. The scaling factors $s_{i,t}$ introduced are assumed to satisfy

$$ 0 < \underline{s} \le s_{i,t} \le 1, \qquad i \in \mathcal{C}. $$

Note that the upper bound of one is imposed without any loss of generality.

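Assume that $F$ is the gradient of a function $f : \operatorname{dom} u \cap X \to \mathbb{R}$. In the parallel algorithm, choose the cost approximating mapping of the form (16), where for each $i \in \mathcal{C}$,

$$ y_i \mapsto \Phi^t_i(y_i) := \nabla_i f(x^t_{\neq i}, y_i), \qquad y_i \in \operatorname{dom} u_i \cap X_i, $$

and $x^t_{\neq i}$ collects the components of $x^t$ other than $x^t_i$. This mapping is monotone on $\operatorname{dom} u \cap X$ whenever $f$ is convex in each component $x_i$. Since $\Phi^t_i(x^t_i) = \nabla_i f(x^t)$, $i \in \mathcal{C}$, it follows that the CA subproblem is equivalent (under the above convexity assumption on $f$) to finding

$$ y^t_i \in \arg\min_{y_i \in X_i} \left\{ f(x^t_{\neq i}, y_i) + u_i(y_i) \right\}, \qquad i \in \mathcal{C}. $$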


Choosing $\ell_t := 1$ and $s_{i,t} := 1$, $i \in \mathcal{C}$, yields $x^{t+1} := y^t$, and the resulting iteration is that of the Jacobi algorithm [8,35] (also known as the method of simultaneous displacements). The Jacobi algorithm, which was originally proposed for the solution of systems of equations, is therefore a parallel CA algorithm where the cost approximating mapping is the one given above and unit steps are taken.
The admissible step in component $i$ is $\ell_t s_{i,t} \in [0, 1]$, where

$$ \ell_t \in \left(0, \min_{i \in \mathcal{C}} \left\{ \frac{2 m_{\Phi^t_i}}{s_{i,t} M_{\nabla f}} \right\}\right). \qquad (18) $$

The maximal step is clearly smaller than in the sequential approach. Two factors contribute to this: the minimum operation, and the fact that $M_{\nabla_i f} \le M_{\nabla f}$. Both requirements are introduced here because the update is made over all variable components simultaneously. (An intuitive explanation is that the sequential algorithm utilizes more recent information when it constructs the subproblems.) One may therefore expect the sequential algorithm to converge to a solution with a given accuracy in fewer iterations, although the parallel algorithm may be more efficient in terms of solution time; the scaling introduced by $s_{i,t}$ may also improve the performance of the parallel algorithm to some extent.

Although the parallel version of the algorithm may speed up the practical convergence rate compared to the sequential one, the need for synchronization in carrying out the updating step will generally deteriorate performance, since faster processors must wait for slower ones. In the next section, we therefore introduce an asynchronous version of the parallel algorithm, in which processors do not wait to receive the latest information available.
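In the synchronized parallel variant, every component is computed from the same iterate $x^t$ and then moved by $\ell_t s_{i,t}$. The Python sketch below makes several assumptions:

- The objective is the quadratic $f(x) = \tfrac12 x^{\mathsf T} Q x - b^{\mathsf T} x$, with $u = 0$, $X = \mathbb{R}^n$ and scalar components.
- The modulus $m_{\Phi_i}$ is taken as $Q_{ii}$, which is the modulus of $y_i \mapsto \nabla_i f(x^t_{\neq i}, y_i)$ for a quadratic.
- The maximum absolute row sum of $Q$ stands in for $M_{\nabla f}$. For symmetric $Q$ this overestimates the largest eigenvalue, so the computed bound (18) is conservative.

Function names and data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def jacobi_scaled(Q: Matrix, b: Vector, x0: Vector, ell: float,
                  s: Vector, iters: int) -> Vector:
    """Synchronized parallel CA for f = 0.5 x'Qx - b'x with component scaling s."""
    n = len(x0)
    lip = max(sum(abs(q) for q in row) for row in Q)  # upper bound on M_{grad f}
    bound = min(2.0 * Q[i][i] / (s[i] * lip) for i in range(n))  # interval (18)
    if not (0.0 < ell < bound and all(ell * si <= 1.0 for si in s)):
        raise ValueError("steplength violates (18) or ell * s_i > 1")
    x = list(x0)
    for _ in range(iters):
        # every y_i uses the same iterate x^t, so they could run in parallel
        y = [(b[i] - sum(Q[i][j] * x[j] for j in range(n) if j != i)) / Q[i][i]
             for i in range(n)]
        x = [x[i] + ell * s[i] * (y[i] - x[i]) for i in range(n)]
    return x


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
b = [1.0, 2.0, 3.0]
x = jacobi_scaled(Q, b, [0.0, 0.0, 0.0], ell=0.9, s=[1.0, 0.8, 1.0], iters=200)
```

With $s = (1, 0.8, 1)$ the bound in (18) evaluates to $1.25$ for this $Q$, so $\ell_t = 0.9$ is admissible. With all $s_{i,t} = 1$ the bound becomes exactly $1$, and the plain Jacobi step $\ell_t = 1$ lies just outside the open interval.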


Asynchronous Parallel Decomposition 


In the algorithms considered in this Section, the syn- 
chronization step among the processors is removed. Be- 
cause the speed of computations and communications 
can vary among the processors, and communication 
delays can be substantial, processors will perform the 
calculations out of phase with each other. Thus, the ad- 
vantage of reduced synchronization is paid for by an 
increase in interprocessor communications, the use of 
outdated information, and a more difficult convergence 
detection (see [8]). (Certainly, the convergence analysis 
also becomes more complicated.) Recent numerical ex- 
periments indicate, however, that the introduction of 
such asynchronous computations can substantially en- 
hance the efficiency of parallel iterative methods (e. g., 
[6,11,15]). 


The model of partial asynchronism that we use is as follows. For each processor (or variable component) $i \in \mathcal{C}$, we introduce
a) initial conditions, $x_i(t) := x^0_i \in X_i$, for all $t \le 0$;
b) a set $T^i$ of times at which $x_i$ is updated; and
c) a variable $\tau^i_j(t)$ for each $j \in \mathcal{C}$ and $t \in T^i$, denoting the time at which the value of $x_j$ used by processor $i$ at time $t$ is generated by processor $j$, satisfying $0 \le \tau^i_j(t) \le t$ for all $j \in \mathcal{C}$ and $t \ge 0$.
We note that the sequential CA algorithm and the synchronized parallel CA algorithm can both be expressed as asynchronous algorithms. The cyclic sequential algorithm model is obtained from the choices $T^i := \bigcup_{k \ge 0} \{|\mathcal{C}| k + i - 1\}$ and $\tau^i_j(t) := t$. The synchronous parallel model is obtained by choosing $T^i := \{1, 2, \ldots\}$ and $\tau^i_j(t) := t$, for all $i$, $j$ and $t$.
The communication delay from processor $j$ to processor $i$ at time $t$ is $t - \tau^i_j(t)$. The convergence of the partially asynchronous parallel decomposition CA algorithm is based on the assumption that this delay is bounded from above: there exists a positive integer $P$ such that
i) for every $i \in \mathcal{C}$ and $t \ge 0$, at least one element of $\{t, \ldots, t + P - 1\}$ belongs to $T^i$;

ii) $0 \le t - \tau^i_j(t) \le P - 1$ holds for all $i, j \in \mathcal{C}$ and all $t \ge 0$; and

iii) $\tau^i_i(t) = t$ holds for all $i \in \mathcal{C}$ and all $t \ge 0$.


In short, parts i) and ii) of the assumption state 
that no processor waits for an arbitrarily long time to 
compute a subproblem solution or to receive a mes- 
sage from another processor. (Note that a synchronized 
model satisfies P = 1.) Part iii) of the assumption states 
that processor i always uses the most recent value of its 
own component x; of x, and is in [58] referred to as 
a computational nonredundancy condition. This condi- 
tion holds in general when no variable component is 
updated simultaneously by more than one processor, 
as, for example, in message passing systems. For further 
discussions on the assumptions, we refer the reader to 
[8,57]; we only remark that they are easily enforced in 
practical implementations. 

The iterate $x(t)$ is defined by the vector of $x_i(t)$, $i \in \mathcal{C}$. At a given time $t$, processor $i$ has knowledge of a possibly outdated version of $x(t)$; we let

$$ x^i(t)^{\mathsf T} := \left[x_1(\tau^i_1(t))^{\mathsf T}, \ldots, x_{|\mathcal{C}|}(\tau^i_{|\mathcal{C}|}(t))^{\mathsf T}\right] $$

denote this vector. (Note that iii) above implies the relation $x^i_i(t) = x_i(\tau^i_i(t)) = x_i(t)$.)

To describe the (partially) asynchronous parallel CA algorithm, processor $i$ updates $x_i(t)$ according to

$$ x_i(t + 1) := x_i(t) + \ell s_i \left(y_i(t) - x_i(t)\right), \qquad t \in T^i, $$

where $y_i(t)$ solves the CA subproblem defined at $x^i(t)$, and $s_i \in (0, 1]$ is a scaling parameter. (We define $d_i(t) := y_i(t) - x_i(t)$ to be zero at each $t \notin T^i$.)
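The partially asynchronous model can be imitated on a single machine. The idea is to keep the history of iterates and let each processor read delayed copies of the other components. The Python sketch below uses the following assumptions:

- The objective is the quadratic $f(x) = \tfrac12 x^{\mathsf T} Q x - b^{\mathsf T} x$, with $u = 0$, $X = \mathbb{R}^n$ and scalar components.
- The modulus $m_{\Phi_i}$ is taken as $Q_{ii}$.
- The maximum absolute row sum of $Q$ stands in for $M_{\nabla f}$.
- Processor $i$ updates when $(t + i) \bmod P = 0$, and the delay pattern is $(i + j + t) \bmod P$.

The update times and delay pattern are hypothetical choices that satisfy assumptions i) to iii). The steplength is checked against the bound (19) stated in the text.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def async_ca(Q: Matrix, b: Vector, x0: Vector, ell: float, s: Vector,
             P: int, steps: int) -> Vector:
    """Simulated partially asynchronous CA for f = 0.5 x'Qx - b'x."""
    n = len(x0)
    lip = max(sum(abs(q) for q in row) for row in Q)  # upper bound on M_{grad f}
    bound = 2.0 * min(Q[i][i] / s[i] for i in range(n)) / (lip * (1 + (n + 1) * P))
    if not (0.0 < ell < bound and all(ell * si <= 1.0 for si in s)):
        raise ValueError("steplength violates (19)")
    history = [list(x0)]  # history[t] = x(t); x(t) = x0 for t <= 0
    for t in range(steps):
        x = history[t]
        new_x = list(x)
        for i in range(n):
            if (t + i) % P != 0:  # t not in T^i, so d_i(t) = 0
                continue
            # outdated view x^i(t): delay (i + j + t) % P <= P - 1 (assumption ii),
            # own component always current (assumption iii)
            view = [x[j] if j == i else history[max(0, t - (i + j + t) % P)][j]
                    for j in range(n)]
            y_i = (b[i] - sum(Q[i][j] * view[j] for j in range(n) if j != i)) / Q[i][i]
            new_x[i] = x[i] + ell * s[i] * (y_i - x[i])
        history.append(new_x)
    return history[-1]


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
b = [1.0, 2.0, 3.0]
x = async_ca(Q, b, [0.0, 0.0, 0.0], ell=0.07, s=[1.0, 1.0, 1.0], P=3, steps=6000)
```

For $P = 3$ and this $Q$ the bound evaluates to $6/78 \approx 0.077$, hence the small step $0.07$. The bound is only sufficient. For a strictly diagonally dominant $Q$ such as this one, much longer steps would also converge, but (19) is what the general analysis guarantees.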

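The admissible steplength for $i \in \mathcal{C}$ is $\ell s_i \in [0, 1]$, where

$$ \ell \in \left(0, \frac{2 \min_{i \in \mathcal{C}} \{ m_{\Phi_i} / s_i \}}{M_{\nabla f} \left[1 + (|\mathcal{C}| + 1) P\right]}\right). \qquad (19) $$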


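If, further, for some $M > 0$ and every $i \in \mathcal{C}$, all vectors $x, y \in \operatorname{dom} u \cap X$ with $x_i = y_i$ satisfy

$$ \|\nabla_i f(x) - \nabla_i f(y)\| \le M \|x - y\|, \qquad (20) $$

then, in the above result, the steplength restrictions are adjusted to

$$ \ell \in \left(0, \frac{2 \min_{i \in \mathcal{C}} \{ m_{\Phi_i} / s_i \}}{M_{\nabla f} + (|\mathcal{C}| + 1) M P}\right). $$

(We interpret the property (20) as a quantitative measure of the coupling between the variables.)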

Most importantly, note that the upper bound on $\ell$ is (essentially) inversely proportional to the maximal allowed asynchronism $P$. This is very intuitive, since if processors take longer steps then they should exchange information more often. Conversely, the more outdated the information is, the less reliable it is, hence the shorter the step.

The relations among the steplengths in the three approaches (cf. (17), (18), and (19)) quantify an intuitive result. An increasing degree of parallelism and asynchronism lowers the quality of the step directions, because more outdated information is used; consequently, smaller steplengths must be used. More detailed discussions of this topic are found in [45, Sect. 8.7.2].
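The ordering can be made concrete by evaluating the upper ends of the intervals (17), (18) and (19) for given constants. The Python sketch below takes the following inputs: the moduli $m_{\Phi_i}$, the component Lipschitz constants $M_{\nabla_i f}$, the global constant $M_{\nabla f}$, the scalings $s_i$ and the delay bound $P$. For (17), which applies to whichever component is updated, it reports the smallest per-component bound so that the three numbers are comparable. The function name and the constants in the usage lines are hypothetical.

```python
from typing import Dict, List


def steplength_bounds(m: List[float], M_comp: List[float], M: float,
                      s: List[float], P: int) -> Dict[str, float]:
    """Upper ends of the steplength intervals (17), (18) and (19)."""
    n = len(m)  # |C|
    sequential = min(min(1.0, 2.0 * m[i] / M_comp[i]) for i in range(n))  # (17)
    parallel = min(2.0 * m[i] / (s[i] * M) for i in range(n))  # (18)
    asynchronous = (2.0 * min(m[i] / s[i] for i in range(n))
                    / (M * (1 + (n + 1) * P)))  # (19)
    return {"sequential": sequential, "parallel": parallel,
            "asynchronous": asynchronous}


# Constants for f = 0.5 x'Qx - b'x with Q = [[4, 1, 0], [1, 3, 1], [0, 1, 5]]:
# m_i = M_i = Q_ii, and M_{grad f} is bounded by the maximum row sum, 6.
for P in (1, 2, 3):
    print(P, steplength_bounds([4.0, 3.0, 5.0], [4.0, 3.0, 5.0], 6.0, [1.0] * 3, P))
```

For these constants, (17) and (18) both give $1$. Bound (19) gives $6/(6(1 + 4P))$, that is $0.2$, about $0.111$ and about $0.077$ for $P = 1, 2, 3$, so it shrinks roughly like $1/P$.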


See also 
Synonyms


Credit scoring; Credit granting; Financial risk manage- 
ment; Optimization 


Introduction/Background 


Financial risk management has evolved over the past two decades in terms of both its theory and its practices. Economic uncertainties, changes in the business environment and the introduction of new complex financial products (e. g., financial derivatives) led financial institutions and regulatory authorities to develop a new framework for financial risk management, focusing mainly on the capital adequacy of banks and credit institutions.

Banks and other financial institutions are exposed to many different forms of financial risks. Usually these are categorized as [14]:

• Market risk, which arises from changes in the prices of financial securities and currencies.

• Credit risk, originating from the inability of firms and individuals to meet their debt obligations to their creditors.

• Liquidity risk, which arises when a transaction cannot be conducted at the existing market prices or when early liquidation is required in order to meet payment obligations.

• Operational risk, which originates from human and technical errors or accidents.

• Legal risk, which is due to legislative restrictions on financial transactions.

Among these types of risk, credit risk is considered the primary financial risk in the banking system and exists in virtually all income-producing activities [7]. How a bank selects and manages its credit risk is critically important to its performance over time.

In this context, credit risk management defines the whole range of activities that are implemented in order to measure, monitor and minimize credit risk. Credit risk management has evolved dramatically over the last 20 years. Factors that have increased its importance include [2]:

(i) the worldwide increase in the number of bankruptcies;
(ii) the trend towards disintermediation by the highest quality and largest borrowers;
(iii) the increased competition among credit institutions;
(iv) the declining value of real assets and collateral in many markets; and
(v) the growth of new financial instruments with inherent default risk exposure, such as credit derivatives.

Early credit risk management was primarily based on empirical systems for evaluating the creditworthiness of a client. The most widely used system in this context has been CAMEL, which is based on the empirical combination of several factors related to capital, assets, management, earnings and liquidity.

It was soon realized, however, that such empirical systems cannot provide a solid and objective basis for credit risk management. This led to a growing body of studies by academics and practitioners on the development of new credit risk assessment systems. These efforts were also motivated by the changing regulatory framework that now requires banks to implement specific methodologies for managing and monitoring their credit portfolios [4].
Existing practices are based on sophisticated statistical and optimization methods, which are used to develop a complete framework for measuring and monitoring credit risk. Credit rating models are at the core of this framework and are used to assess the creditworthiness of firms and individuals. The following sections describe the functionality of credit rating systems and the types of optimization methods used in some popular techniques for developing rating systems.


Definitions 


As already noted, credit risk is defined as the likelihood 
that an obligor (firm or individual) will be unable or un- 
willing to fulfill debt obligations towards the creditors. 
In such a case, the creditors will suffer losses that have 
to be measured as accurately as possible. 

The expected loss $L_{it}$ over a period $t$ from granting credit to a given obligor $i$ can be measured as follows:

$$ L_{it} = \mathrm{PD}_{it} \cdot \mathrm{LGD}_i \cdot \mathrm{EAD}_i $$

where $\mathrm{PD}_{it}$ is the probability of default for the obligor $i$ in the time period $t$, $\mathrm{LGD}_i$ (loss given default) is the percentage of exposure the bank might lose in case the borrower defaults, and $\mathrm{EAD}_i$ (exposure at default) is the amount outstanding in case the borrower defaults. The time period $t$ is usually taken equal to one year.
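As a worked example of the expected-loss formula, the Python sketch below multiplies the three factors for each obligor of a small portfolio and sums the results. The two obligors and their PD, LGD and EAD values are invented purely for illustration. The `Exposure` class is a hypothetical container, not part of any standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exposure:
    pd: float   # probability of default over the period (e.g. one year)
    lgd: float  # fraction of the exposure lost if the obligor defaults
    ead: float  # amount outstanding at default


def expected_loss(e: Exposure) -> float:
    """L_it = PD_it * LGD_i * EAD_i for one obligor."""
    if not (0.0 <= e.pd <= 1.0 and 0.0 <= e.lgd <= 1.0):
        raise ValueError("PD and LGD must lie in [0, 1]")
    return e.pd * e.lgd * e.ead


# Hypothetical portfolio of two obligors.
portfolio: List[Exposure] = [Exposure(0.02, 0.45, 1_000_000.0),
                             Exposure(0.05, 0.60, 200_000.0)]
total = sum(expected_loss(e) for e in portfolio)
print(round(total, 2))  # 15000.0
```

The first obligor contributes $0.02 \times 0.45 \times 1\,000\,000 = 9000$ and the second $0.05 \times 0.60 \times 200\,000 = 6000$.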

In the new regulatory framework, default is considered to have occurred with regard to a particular obligor when one or more of the following events has taken place [4,11]:

• it is determined that the obligor is unlikely to pay its debt obligations in full;

• a credit loss event associated with any obligation of the obligor;

• the obligor is past due more than 90 days on any credit obligation; or

• the obligor has filed for bankruptcy or similar protection from creditors.

The aim of credit rating models is to assess the probability of default for an obligor, whereas other models are used to estimate LGD and EAD. Rating systems measure credit risk and differentiate individual credits and groups of credits by the risk they pose. This allows bank management and examiners to monitor changes and trends in risk levels, thus promoting safety and soundness in the credit granting process. Credit rating systems are also used for:

• credit approval and underwriting;
• loan pricing;
• relationship management and credit administration;
• allowance for loan and lease losses and capital adequacy; and
• credit portfolio management and reporting [7].


Formulation 


Generally, a credit rating model can be considered as a mapping function $f : \mathbb{R}^n \to G$ that estimates the probability of default of an obligor described by a vector $x \in \mathbb{R}^n$ of input features and maps the result to a set $G$ of risk categories. The feature vector $x$ represents all the relevant information that describes the obligor, including financial and nonfinancial data.

The development of a rating model is based on the 
process of Fig. 1. 

The process begins with the collection of appropriate data regarding known cases in default and nondefault cases. These data can be taken from the historical database of a bank, or from external resources. At this data selection stage, some preprocessing of the data is necessary. Its purposes are to transform the obtained data into useful features, to clean the data of possible outliers and to select the appropriate set of features for the analysis. These steps lead to the final data $\{x_i, y_i\}_{i=1}^m$. Here $x_i$ is the input feature vector for obligor $i$, $y_i$ is the known status of the obligor (e.g. $y_i = -1$ for cases in default and $y_i = 1$ for nondefault cases), and $m$ is the number of observations in the data set. These data, which are used for model development, are usually referred to as training data.

The second stage involves the optimization process, which refers to the identification of the model's parameters that best fit the training data. In the simplest case, the model can be expressed as a linear function of the form:

$$ f(x) = x\beta + \beta_0 $$

where $\beta \in \mathbb{R}^n$ is the vector with the coefficients of the selected features in the model and $\beta_0$ is a constant term. Nonlinear models are also applicable.

In the above linear case, the objective of the optimization process is to identify the optimal parameter vector $\alpha = (\beta, \beta_0)$ that best fits the training data. This can be expressed as an optimization problem of the following general form:

$$ \min_{\alpha \in S} \; \mathcal{L}(\alpha, X) \qquad (1) $$

where $S$ is a set of constraints that define the feasible (acceptable) values for the parameter vector $\alpha$, $X$ is the training data set and $\mathcal{L}$ is a loss function measuring the differences between the model's output and the given classification of the training observations.

The results of the model optimization process are validated using another sample of obligors with known status. This is referred to as the validation sample. Typically it consists of cases different from the ones of the training sample, taken from a later time period. The optimal model is applied to these new observations and its predictive ability is measured. If this is acceptable, then the model's outputs are used to define a set of risk rating classes (usually 10 classes are used). Each rating class is associated with a probability of default and includes borrowers with similar credit risk levels. The resulting rating system also needs to be validated in terms of its stability over time and the distribution of the borrowers in the rating groups. It must also be checked for consistency between the estimated probabilities of default in each group and the empirical ones, which are taken from the population of rated borrowers.


Methods/Applications 


The optimization problem (1) is expressed in differ- 
ent forms depending on the method used to develop 
the rating model. The characteristics of some popular 
methods are outlined below. 


Logistic Regression 


Logistic regression is the most widely used method in financial decision-making problems, with numerous applications in credit risk rating. Logistic regression assumes that the log of the probability odds is a linear function:

$$ \ln \frac{p}{1 - p} = \beta_0 + x\beta $$

where $p = \Pr(1 \mid x)$ is the probability that an obligor $x$ is a member of class 1, which is then expressed as

$$ p = \left[1 + e^{-\beta_0 - x\beta}\right]^{-1}. $$

The parameters of the model (constant term $\beta_0$ and coefficient vector $\beta$) are estimated to maximize the conditional likelihood of the classification given the training data. This is expressed as

$$ \max_{\beta_0, \beta} \; \prod_{i=1}^m \Pr(y_i \mid x_i) $$

which can be equivalently written as

$$ \max_{\beta_0, \beta} \; \sum_{i=1}^m \left[ \frac{y_i + 1}{2} \ln(p_i) + \frac{1 - y_i}{2} \ln(1 - p_i) \right] $$

where $y_i = 1$ if obligor $i$ is in the nondefault group and $y_i = -1$ otherwise.

Nonlinear optimization techniques such as the 
Newton algorithm are used to perform this optimiza- 
tion. 
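The Python sketch below applies Newton's method to the log-likelihood above. It uses a single feature, so the $2 \times 2$ Newton system can be solved in closed form by Cramer's rule. The data set is invented and deliberately non-separable, since otherwise the maximum-likelihood estimate does not exist. Labels follow the text's convention: $y_i = 1$ for nondefault and $y_i = -1$ for default. The function name and the fixed iteration count are assumptions. Production code would add a convergence test and a safeguarded step.

```python
import math
from typing import List, Tuple


def fit_logistic_newton(x: List[float], y: List[int],
                        iters: int = 25) -> Tuple[float, float]:
    """Maximize sum[(1+y)/2 ln p + (1-y)/2 ln(1-p)], p = 1/(1+exp(-b0-b1*x))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            t = (yi + 1) / 2  # 1 for nondefault, 0 for default
            g0 += t - p  # gradient of the log-likelihood
            g1 += (t - p) * xi
            w = p * (1.0 - p)  # minus the Hessian is sum w * [1, x][1, x]^T
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (-Hessian) * delta = gradient by Cramer's rule
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: one feature, classes overlap (non-separable).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [-1, -1, 1, -1, 1, -1, 1, 1]
b0, b1 = fit_logistic_newton(x, y)
print(b0, b1)
```

At the returned parameters the gradient of the log-likelihood vanishes to rounding accuracy. A positive `b1` here means a higher feature value raises the estimated probability of nondefault.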

Logistic regression has been widely applied in credit 
risk rating both by academics and by practitioners [1]. 
Its advantages are mainly related to its simplicity and 
transparency: it provides direct estimates of the proba- 
bilities of default as well as estimates for the significance 
of the predictor variables and it is computationally fea- 
sible even for large data sets. 
Neural Networks


Neural networks are a popular methodology for developing decision-making models in complex domains. A neural network is a network of parallel processing units (neurons) organized into layers. A typical structure of a neural network (Fig. 2) includes the following structural elements:

1. An input layer consisting of a set of nodes (processing units, or neurons), one for each input to the network.

2. An output layer consisting of one or more nodes, depending on the form of the desired output of the network. In classification problems, the number of nodes of the output layer is determined in accordance with the number of groups.

3. A series of intermediate layers referred to as hidden layers. The nodes of each hidden layer are fully connected with the nodes of the subsequent and the preceding layer.

Each connection between two nodes of the network is assigned a weight representing the strength of the connection. On the basis of the connections' weights, the input to each node is determined as the weighted sum of the outputs from all the incoming connections. Thus, the input $in_{ir}$ to node $i$ of the hidden layer $r$ is defined as follows:

$$ in_{ir} = \sum_{j=0}^{r-1} \sum_{k=1}^{n_j} w_{ir}^{jk} \, o_{kj} + \theta_{ir} $$

where $n_j$ is the number of nodes at layer $j$, $w_{ir}^{jk}$ is the weight of the connection between node $i$ at layer $r$ and node $k$ at layer $j$, $o_{kj}$ is the output of node $k$ at layer $j$, and $\theta_{ir}$ is a bias term.

Credit Rating and Optimization Methods, Figure 2: A typical architecture of a neural network (an input layer with inputs 1 to 3, a hidden layer, and an output layer with outputs 1 and 2)

The output of each node is specified through a transformation function. The most common form of this function is the logistic function:

$$ o_{ir} = \left[1 + \exp(-in_{ir})\right]^{-1} $$
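The two formulas above define a forward pass through the network. The Python sketch below evaluates it for a network in which a node at layer $r$ may receive connections from every earlier layer $j < r$, as in the formula for $in_{ir}$. The nested-list weight layout and the function names are assumptions of this sketch. Training, for example by backpropagation, is not shown.

```python
import math
from typing import List

Weights = List[List[List[float]]]  # W[i][j][k]: from node k of layer j to node i


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs: List[float], layers: List[Weights],
            biases: List[List[float]]) -> List[List[float]]:
    """Return the outputs o_{kj} of every layer; layer 0 is the input layer.

    layers[r-1] and biases[r-1] hold w_{ir}^{jk} and theta_{ir} for layer r >= 1.
    """
    outputs = [list(inputs)]
    for W, theta in zip(layers, biases):
        r = len(outputs)  # index of the layer being computed
        layer_out = []
        for i, bias in enumerate(theta):
            net = bias + sum(W[i][j][k] * outputs[j][k]
                             for j in range(r) for k in range(len(outputs[j])))
            layer_out.append(logistic(net))  # o_{ir} = 1 / (1 + exp(-in_{ir}))
        outputs.append(layer_out)
    return outputs


# Hypothetical weights: the hidden node sees both inputs; the output node
# sees the inputs (layer 0, zero weights here) and the hidden node (layer 1).
hidden = [[[math.log(3.0), 0.0]]]
output = [[[0.0, 0.0], [2.0]]]
outs = forward([1.0, 5.0], [hidden, output], [[0.0], [-1.5]])
print(outs[1][0], outs[2][0])  # logistic(ln 3) = 3/4, then logistic(0) = 1/2
```

In a network whose hidden layers are fully connected only to adjacent layers, as in Fig. 2, the weights $w_{ir}^{jk}$ for $j < r - 1$ are simply zero.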


The determination of the optimal neural net- 
work model requires the estimation of the connec- 
tion weights and the bias terms of the nodes. The 
most widely used network training methodology is the 
backpropagation approach [18]. Nonlinear optimiza- 
tion techniques are used for this purpose [10,13,16]. 

Neural networks have become increasingly popu- 
lar in recent years for the development of credit rating 
models [3]. Their main advantages include their abil- 
ity to model complex nonlinear relationships in credit 
data, but they have also been criticized for their lack of 
transparency, the difficulty of specifying a proper archi- 
tecture and the increased computational resources that 
are needed for large data sets. 


Support Vector Machines 


Support vector machines (SVMs) have become an increasingly popular nonparametric methodology for developing classification models. In a dichotomous classification setting, SVMs can be used to develop a linear decision function $f(x) = \operatorname{sgn}(x\beta + \beta_0)$.

The optimal decision function $f$ should maximize the margin induced in the separation of the classes [24], which is defined as $2/\|\beta\|$. Thus, the estimation of the optimal model is expressed as a quadratic programming problem of the following form:

$$ \begin{aligned} \min \quad & \tfrac{1}{2} \beta^{\mathsf T} \beta + C e^{\mathsf T} d \\ \text{subject to} \quad & Y(X\beta + e\beta_0) + d \ge e \\ & \beta \in \mathbb{R}^n, \; \beta_0 \in \mathbb{R}, \; d \ge 0 \end{aligned} \qquad (2) $$

The symbols in (2) are defined as follows:

• $X$ is an $m \times n$ matrix with the training data.
• $Y$ is an $m \times m$ diagonal matrix such that $Y_{ii} = y_i$ and $Y_{ij} = 0$ for all $i \ne j$.
• $d$ is an $m \times 1$ vector of nonnegative error (slack) variables, defined such that $d_i > 0$ if and only if $y_i(x_i\beta + \beta_0) < 1$.
• $e$ is an $m \times 1$ vector of ones.
• $C > 0$ is a user-defined constant representing the trade-off between the two conflicting objectives (maximization of the separating margin and minimization of the training errors).
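Problem (2) is usually handed to a QP or decomposition solver. The Python sketch below uses a simpler stand-in. It substitutes the optimal slacks $d_i = \max\{0, 1 - y_i(x_i\beta + \beta_0)\}$ and minimizes the resulting unconstrained convex function $\tfrac12\beta^{\mathsf T}\beta + C\sum_i d_i$ with a fixed-step subgradient method. This is a swapped-in technique, not one of the SVM training algorithms reviewed in [6]. With a fixed step it only reaches a neighborhood of the optimum. The toy data, step size and iteration count are assumptions.

```python
from typing import List, Tuple


def svm_subgradient(X: List[List[float]], y: List[int], C: float = 1.0,
                    step: float = 0.001, iters: int = 20000) -> Tuple[List[float], float]:
    """Fixed-step subgradient method for 0.5*b'b + C*sum(max(0, 1 - y_i(x_i b + b0)))."""
    n = len(X[0])
    beta, beta0 = [0.0] * n, 0.0
    for _ in range(iters):
        g, g0 = list(beta), 0.0  # gradient of 0.5*b'b
        for xi, yi in zip(X, y):
            if yi * (sum(a * b for a, b in zip(xi, beta)) + beta0) < 1.0:  # d_i > 0
                g = [gj - C * yi * xj for gj, xj in zip(g, xi)]
                g0 -= C * yi
        beta = [bj - step * gj for bj, gj in zip(beta, g)]
        beta0 -= step * g0
    return beta, beta0


def predict(x: List[float], beta: List[float], beta0: float) -> int:
    """f(x) = sgn(x beta + beta0), with ties sent to +1."""
    return 1 if sum(a * b for a, b in zip(x, beta)) + beta0 >= 0.0 else -1


# Hypothetical, linearly separable toy data (1 = nondefault, -1 = default).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
beta, beta0 = svm_subgradient(X, y)
print(beta, beta0, [predict(xi, beta, beta0) for xi in X])
```

A decreasing step size would let the iterates converge rather than hover near the optimum. A kernel version would replace the inner products by $K(x_i, x_j)$ and work with the dual.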

SVMs can also be used to develop nonlinear models. This is achieved by mapping the problem data to a higher-dimensional space $H$ (feature space) through a transformation of the form $x_i^{\mathsf T} x_j \mapsto \phi(x_i)^{\mathsf T} \phi(x_j)$. The mapping function $\phi$ is implicitly defined through a symmetric positive definite kernel function $K(x_i, x_j) = \phi(x_i)^{\mathsf T} \phi(x_j)$ [22]. The representation of the data using the kernel function enables the development of a linear model in the feature space $H$.

For large training sets, several computational procedures have been proposed to enable the fast training of SVM models. Most of these procedures are based on a decomposition scheme. The optimization problem (2) is decomposed into smaller subproblems, taking advantage of the sparse nature of SVM models, since only a small part of the data (the support vectors) contributes to the final form of the model. A review of the algorithms for training SVMs can be found in [6].

SVMs seem to be a promising methodology for developing credit rating models. Advances in the underlying optimization algorithms enable their application to large credit data sets, and they provide a unified framework for developing both linear and nonlinear models. Recent applications of SVMs in credit rating can be found in [9,12,21].


Multicriteria Value Models 
and Linear Programming Techniques 


The aforementioned classification methods assume that the groups are defined in a nominal way (i.e., the grouping provides a simple description of the cases). However, in credit risk modeling the groups are defined in an ordinal way, in the sense that an obligor classified in a low-risk group is preferred to an obligor classified in a high-risk group (in terms of its probability of default). Multicriteria methods are well suited to the study of ordinal classification problems [26].

A typical multicriteria method that is well suited for the development of credit rating models is the UTADIS method. The method leads to the development of a multiattribute additive value function:

$$ V(x) = \sum_{j=1}^{n} w_j v_j(x_j) $$

where $w_j$ is the weight of attribute $j$, and $v_j(x_j)$ is the corresponding marginal value function. Each marginal value function provides a monotone mapping of the performance of the obligors on the corresponding attribute to a scale between 0 (high risk) and 1 (low risk). According to [15], such an additive value function model is well suited for credit scoring and is widely used by banks in their internal rating systems.

Using a piecewise linear modeling approach, the estimation of the value function is performed based on a set of training data using linear programming techniques. For a two-class problem, the general form of the linear programming formulation is as follows [8]:

$$ \begin{aligned} \min \quad & d_1 + d_2 + \cdots + d_m \\ \text{subject to:} \quad & y_i\left[V(x_i) - \beta\right] + d_i \ge \delta, \qquad i = 1, \ldots, m \\ & w_1 + w_2 + \cdots + w_n = 1 \\ & w_j, d_i, \beta \ge 0 \end{aligned} $$

where $\beta$ is a value threshold that distinguishes the two classes, $\delta$ is a small positive user-defined constant and $d_i = \max\{0, \delta - y_i[V(x_i) - \beta]\}$ denotes the classification error for obligor $i$.
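The ingredients of this formulation can be evaluated directly. The Python sketch below builds piecewise-linear marginal value functions from breakpoints, computes $V(x)$, and returns the classification errors $d_i$ for given weights and threshold. It does not solve the linear program. In UTADIS the value function itself (its weights and breakpoint values) is what the LP estimates. All breakpoints, weights and data here are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, List

Marginal = Callable[[float], float]


def piecewise_linear(breaks: List[float], values: List[float]) -> Marginal:
    """Monotone marginal value function through the points (breaks[k], values[k])."""
    def v(z: float) -> float:
        if z <= breaks[0]:
            return values[0]
        if z >= breaks[-1]:
            return values[-1]
        k = bisect_right(breaks, z) - 1  # breaks[k] <= z < breaks[k + 1]
        frac = (z - breaks[k]) / (breaks[k + 1] - breaks[k])
        return values[k] + frac * (values[k + 1] - values[k])
    return v


def additive_value(x: List[float], w: List[float], v: List[Marginal]) -> float:
    """V(x) = sum_j w_j v_j(x_j)."""
    return sum(wj * vj(xj) for wj, vj, xj in zip(w, v, x))


def classification_errors(X: List[List[float]], y: List[int], w: List[float],
                          v: List[Marginal], beta: float, delta: float) -> List[float]:
    """d_i = max{0, delta - y_i [V(x_i) - beta]}; the LP minimizes their sum."""
    return [max(0.0, delta - yi * (additive_value(xi, w, v) - beta))
            for xi, yi in zip(X, y)]


v = [piecewise_linear([0.0, 1.0], [0.0, 1.0]),
     piecewise_linear([0.0, 0.5, 1.0], [0.0, 0.8, 1.0])]
w = [0.6, 0.4]  # sums to one, as the LP requires
X = [[0.5, 0.5], [0.5, 0.5], [0.1, 0.2]]
y = [1, -1, -1]  # 1 = nondefault, -1 = default
print(classification_errors(X, y, w, v, beta=0.5, delta=0.05))  # approximately [0.0, 0.17, 0.0]
```

The first two obligors share the value $V = 0.62$ and differ only in status. Only the defaulted one sits on the wrong side of $\beta = 0.5$, so only it incurs an error.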

Extensions of this framework and alternative lin- 
ear programming formulations with applications to 
credit risk rating have been presented by [5,17,19]. The 
main advantages of these methodologies involve their 
computational efficiency and the simplicity and trans- 
parency of the resulting models. 


Evolutionary Optimization 


Evolutionary algorithms (EA) are stochastic search and
optimization heuristics inspired by the theory of natural
evolution. In an EA, different possible solutions of
an optimization problem constitute the individuals of 
a population. The quality of each individual is assessed 
with a fitness (objective) function. Better solutions are 
assigned higher fitness values than worse performing 
solutions. The key idea of EAs is that the optimal so- 
lution can be found if an initial population is evolved 
using a set of stochastic genetic operators, similar to the 
“survival of the fittest” mechanism of natural evolution. 
The fitness values of the individuals in a population 
are used to define how they will be propagated to sub- 
sequent generations of populations. Most EAs include 
operators that select individuals for reproduction, pro- 
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duce new individuals based on those selected, and de- 
termine the composition of the population at the sub- 
sequent generation. 
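The three kinds of operators just listed can be illustrated with a deliberately small generational genetic algorithm. This is a sketch, not any of the cited credit-rating methods: the binary encoding, binary tournament selection, one-point crossover, bit-flip mutation, elitism and the toy fitness function used in the example are all illustrative assumptions.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=30, generations=60, seed=0):
    """Minimal generational GA (sketch) maximizing `fitness` over lists of bits."""
    rng = random.Random(seed)
    p_mut = 1.0 / n_bits                      # about one flipped bit per child
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)             # selection: binary tournament
        return a if fitness(a) >= fitness(b) else b

    history = []                              # best fitness of each generation
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite[:]]                 # elitism: the best survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)    # reproduction: one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children                        # composition of the next generation
    return max(pop, key=fitness), history
```

For example, `genetic_algorithm(sum, 20)` searches for the all-ones string. Because of elitism, the best fitness recorded per generation can never decrease.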

Well-known EAs and similar metaheuristic techniques include, among others, genetic algorithms, genetic programming, tabu search, simulated annealing, ant colony optimization and particle swarm optimization. EAs have been used to facilitate the development of credit rating systems, addressing important issues such as feature selection, rule extraction and neural network development. Some recent applications can be found in the works of Varetto [20], Salcedo-Sanz et al. [25] and Tsakonas et al. [23].


Conclusions 


Credit rating systems are at the core of the new regulatory framework for the supervision of financial institutions. Such systems support the credit granting process and enable the measurement and monitoring of credit risk exposure.

The increasing volume of credit data available for developing rating systems highlights the importance of implementing efficient optimization techniques for the construction of rating models. The existing optimization methods used in this field are mainly based on nonlinear optimization, linear programming and evolutionary algorithms.

Future research is expected to take advantage of the advances in computer science, algorithmic developments regarding new forms of decision models, the analysis of the combination of different models, the comparative investigation of the performance of the existing methods and the implementation into decision support systems that can be used by credit analysts in their daily practice.
Synonyms


Criss-cross 


Introduction 


From the early days of linear optimization (LO) (or linear programming), many people have been looking for a pivot algorithm that avoids the two-phase procedure needed in the simplex method when solving the general LO problem in standard primal form

$$\min\{c^T x : Ax = b,\ x \ge 0\}$$

and its dual

$$\max\{b^T y : A^T y \le c\}.$$


Such a method was assumed to rely on the intrinsic symmetry behind the primal and dual problems (i.e. it was hoped to be selfdual), and it should be able to start with any basic solution.

There were several attempts made to relax the feasibility requirement in the simplex method. It is important to mention Dantzig’s [7] parametric selfdual simplex algorithm. This algorithm can be interpreted as Lemke’s algorithm [22] for the corresponding linear complementarity problem (cf. » Linear complementarity problem) [23]. In the 1960s people realized that pivot sequences through possibly infeasible basic solutions might result in significantly shorter paths to the optimum. Moreover, a selfdual one-phase procedure was expected to make linear programming more easily accessible to a broader public. Probably these advantages stimulated the introduction of the criss-cross method by S. Zionts [39,40].


Zionts’ Criss-Cross Method


Assuming that the reader is familiar with both the primal and dual simplex methods, Zionts’ criss-cross method can easily be explained.
• It can be initialized by any basis, possibly one that is both primal and dual infeasible.
• If the basis is optimal, we are done.
• If the basis is not optimal, then there are some primal or dual infeasible variables. One might choose any of these. It is advised to alternate between choosing a primal and a dual infeasible variable, if possible.
• If the selected variable is dual infeasible, then it enters the basis and the leaving variable is chosen among the primal feasible variables in such a way that primal feasibility of the currently primal feasible variables is preserved. If no such basis exchange is possible, another infeasible variable is selected.
• If the selected variable is primal infeasible, then it leaves the basis and the entering variable is chosen among the dual feasible variables in such a way that dual feasibility of the currently dual feasible variables is preserved. If no such basis exchange is possible, another infeasible variable is selected.

If the current basis is infeasible, but none of the infeasible variables allows a pivot fulfilling the above requirements, then it can be proved that the LO problem has no optimal solution.

Once a primal or dual feasible solution is reached, Zionts’ method reduces to the primal or dual simplex method, respectively.

One attractive feature of Zionts’ criss-cross method is primal-dual symmetry (selfduality), and this alone distinguishes it from the simplex method. However, it is not clear how one can design a finite version (i.e. a finite pivot rule) of this method. Both lexicographic perturbation and minimal index resolution seem not to be sufficient to prove finiteness in the general case when the initial basis is both primal and dual infeasible. Nevertheless, although not finite, this is the first published criss-cross method in the literature.

The other thread that led to finite criss-cross methods was the intellectual effort to find finite variants of the simplex method other than the lexicographic rule [4,8]. These efforts were also stimulated by studying the combinatorial structures behind linear programming. From the early 1970s, finitely convergent algorithms were published in several branches of optimization theory. In particular, A.W. Tucker [32] introduced the consistent labeling technique in the Ford-Fulkerson maximal flow algorithm; other examples are pivot selection rules based on least-index ordering, such as the Bard-type scheme for the P-matrix linear complementarity problem (K.G. Murty, [24]) and the celebrated least-index rule in linear and oriented matroid programming (R.G. Bland, [2]). A thorough survey of pivot algorithms can be found in [29].


It is remarkable that almost at the same time, in dif- 
ferent parts of the world (China, Hungary, USA) es- 
sentially the same result was obtained independently 
by approaching the problem from quite different direc- 
tions. 

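Below we will refer to the standard simplex (basis) tableau. A tableau is called terminal if it gives a primal and dual optimal solution or evidence of primal or dual infeasibility (inconsistency) of the problem. Terminal tableaus have the following sign structure: in the optimal tableau all basic variables and all reduced costs are nonnegative; in the primal inconsistent tableau some basic variable is negative while all entries of its row are nonnegative; in the dual inconsistent tableau some reduced cost is negative while all entries of its column are nonpositive.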


Terminal tableaus (figure: sign structures of the optimal, primal inconsistent and dual inconsistent tableaus)


The pivot operations in all known pivot methods, including all variants of the primal and dual simplex method and Zionts’ criss-cross method, have the following properties. When a primal infeasible variable is selected to leave the basis, the entering variable is selected so that after the pivot both variables involved in the pivot will be primal feasible. Analogously, when a dual infeasible variable is selected to enter the basis, the leaving variable is selected in such a way that after the pivot both variables involved in the pivot will be dual feasible. Such pivots are called admissible. The sign structure of tableaus at an admissible pivot of ‘type I’ and ‘type II’ is demonstrated by the following figure.


Admissible pivot situations 


Observe that, while dual (primal) simplex pivots preserve dual (primal) feasibility of the basic solution, admissible pivots do not in general. Admissible pivots capture the greedy nature of pivot selection, i.e. they ‘repair’ the primal or dual infeasibility of the pivot variables.
The Least-Index Criss-Cross Method


The first finite criss-cross algorithm, which we call the least-index criss-cross method, was discovered independently by Y.Y. Chang [26], T. Terlaky [26,27,28] and Zh. Wang [34]; further, a strongly related general recursion was given by D. Jensen [18]. Chang presented the algorithm for positive semidefinite linear complementarity problems, Terlaky for linear optimization and oriented matroids, and with coauthors for QP, LCP and oriented matroid LCP [9,16,19], while Wang presented it primarily for the case of oriented matroids.

The least-index criss-cross method is perhaps the simplest finite pivoting method for LO problems. This criss-cross method is a purely combinatorial pivoting method: it uses admissible pivots and traverses through different (possibly both primal and dual infeasible) bases until the associated basic solution is optimal, or evidence of primal or dual infeasibility is found.

To ease understanding, a figure is included that shows the scheme of the least-index criss-cross method.


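Scheme of the least-index criss-cross method (figure: choose the first infeasible variable; if it is primal infeasible in row p, choose the first negative entry in row p; if it is dual infeasible in column q, choose the first positive entry in column q; stop when the tableau is optimal, primal infeasible or dual infeasible; otherwise pivot)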


Observe the simplicity of the algorithm:
• It can be initiated with any basis.
• No two phases are needed.
• No ratio test is used to preserve feasibility; only the signs of components in a basis tableau and a prefixed ordering of variables determine the pivot selection.

Several finiteness proofs for the least-index criss-cross method can be found in the literature. The proofs are quite elementary; they are based on the orthogonality of the primal and dual spaces [14,26,28,29,34], on recursive argumentation [11,18,33] or on lexicographically increasing lists [11,14].


0 | Let an ordering of the variables be fixed.
Let T(B) be an arbitrary basis tableau (it can be neither primal nor dual feasible);

1 | Let r be the minimal i such that either $x_i$ is primal infeasible or $x_i$ has a negative reduced cost.
IF there is no r, THEN stop; the first terminal tableau is obtained, thus T(B) is optimal.

2 | IF $x_r$ is primal infeasible, THEN let $p := r$; $q := \min\{l : t_{pl} < 0\}$.
IF there is no q, THEN stop; the second terminal tableau is obtained, thus the primal problem is infeasible.
Go to Step 3.
IF $x_r$ is dual infeasible, THEN let $q := r$; $p := \min\{l : t_{lq} > 0\}$.
IF there is no p, THEN stop; the third terminal tableau is obtained, thus the dual problem is infeasible.
Go to Step 3.

3 | Pivot on (p, q). Go to Step 1.


The least-index criss-cross rule
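The rule can be traced on small examples with the following Python sketch. It assumes the LP is given in the standard form $\min\{c^T x : Ax = b,\ x \ge 0\}$ together with the column indices of a nonsingular starting basis (which may be both primal and dual infeasible), takes the column order as the fixed variable ordering, and keeps a dense tableau in exact rational arithmetic (`fractions.Fraction`). The function name and the returned status strings are hypothetical; dense Gauss-Jordan updates are fine for illustration but not for large problems.

```python
from fractions import Fraction


def criss_cross(A, b, c, basis):
    """Least-index criss-cross method (sketch) for min c^T x s.t. Ax = b, x >= 0."""
    m, n = len(A), len(c)
    # Rows 0..m-1: [t_i1 ... t_in | bbar_i]; row m: [reduced costs | -objective].
    T = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    T.append([Fraction(v) for v in c] + [Fraction(0)])
    row_var = [None] * m                      # row_var[i]: basic variable of row i

    def pivot(p, q):
        """Gauss-Jordan pivot on entry (p, q); x_q becomes basic in row p."""
        piv = T[p][q]
        T[p] = [v / piv for v in T[p]]
        for i in range(m + 1):
            if i != p and T[i][q] != 0:
                f = T[i][q]
                T[i] = [vi - f * vp for vi, vp in zip(T[i], T[p])]
        row_var[p] = q

    for q in basis:                           # Step 0: canonical form for the start basis
        p = next((i for i in range(m) if row_var[i] is None and T[i][q] != 0), None)
        if p is None:
            raise ValueError("starting basis is singular")
        pivot(p, q)

    while True:
        pos = {v: i for i, v in enumerate(row_var)}   # basic variable -> row
        # Step 1: least index that is primal infeasible or has negative reduced cost.
        r = next((j for j in range(n)
                  if (j in pos and T[pos[j]][n] < 0) or (j not in pos and T[m][j] < 0)),
                 None)
        if r is None:
            x = [Fraction(0)] * n
            for i, v in enumerate(row_var):
                x[v] = T[i][n]
            return "optimal", x
        if r in pos:                          # Step 2: x_r primal infeasible, leaves
            p = pos[r]
            q = next((l for l in range(n) if l not in pos and T[p][l] < 0), None)
            if q is None:
                return "primal infeasible", None
        else:                                 # Step 2: x_r dual infeasible, enters
            q = r
            leave = next((l for l in sorted(pos) if T[pos[l]][q] > 0), None)
            if leave is None:
                return "dual infeasible", None
            p = pos[leave]
        pivot(p, q)                           # Step 3
```

For instance, $\min\{x_0 + 2x_1 : x_0 + x_1 \ge 2,\ x_0 \le 1\}$ written with slacks as `criss_cross([[-1, -1, 1, 0], [1, 0, 0, 1]], [-2, 1], [1, 2, 0, 0], [2, 3])` starts from the primal infeasible slack basis and returns `("optimal", x)` with x equal to (1, 1, 0, 0) after two admissible pivots.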


One of the most important consequences of the 
finiteness of the least-index criss-cross method is the 
strong duality theorem of linear optimization. This gives 
probably the simplest algorithmic proof of this funda- 
mental result: 


Theorem 1 (Strong duality theorem) Exactly one of the following two cases occurs.
• At least one of the primal problem and the dual problem is infeasible.
• Both problems have an optimal solution and the optimal objective values are equal.
Other Interpretations


The least-index criss-cross method can be interpreted 
as a recursive algorithm. This recursive interpretation 
and the finiteness proof based on it can be derived from 
the results in [2,3,18] and can be found in [33]. 


Recursive Interpretation 


When performing the least-index criss-cross method, at each pivot one can make a note of the larger of the two indices, $r = \max\{p, q\}$, that entered or left the basis. In this list, an index must be followed by another, larger one before the same index occurs anew.


The recursive interpretation becomes apparent when one observes that the size of the solved subproblem (the subproblem for which a terminal tableau is obtained) is monotonically increasing.

The third interpretation is based on the proof tech- 
nique developed by J. Edmonds and K. Fukuda [9] and 
adapted by Fukuda and T. Matsui [11] to the case of the 
least-index criss-cross method. 


Lexicographically Increasing List 
Let u be a binary vector with appropriate dimension, set initially to be the zero vector. In applying the algorithm, let $r = \max\{p, q\}$ be the larger of the two indices: p that left or q that entered the basis.

At each pivot update u as follows: let $u_r = 1$ and $u_i = 0$ for all $i < r$. The remaining components of u stay unchanged. Then at each step of the least-index criss-cross method the vector u strictly increases lexicographically (comparing components from the largest index downward); thus the method terminates in a finite number of steps.
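Both the recorded list of the recursive interpretation and the vector u can be checked mechanically. In this Python sketch (helper names hypothetical) u is stored as an integer whose bit i equals $u_i$, so lexicographic comparison from the largest index downward is ordinary integer comparison, and the update sets bit r while clearing all lower bits. The values then increase strictly exactly when the recorded indices obey the rule that an index recurs only after some larger index has appeared in between.

```python
def lex_values(records):
    """Track u as an integer whose bit i is u_i; return u after each pivot."""
    u, values = 0, []
    for r in records:
        u = ((u >> r) | 1) << r   # u_r := 1, u_i := 0 for i < r, u_i unchanged for i > r
        values.append(u)
    return values


def satisfies_record_rule(records):
    """True iff u strictly increases, i.e. whenever an index r is recorded again,
    some index larger than r was recorded since its previous occurrence."""
    values = lex_values(records)
    return all(a < b for a, b in zip([0] + values, values))
```

The sequence 0, 1, 0, 2, 0, 1, 0 passes the check and drives u through 1, 2, ..., 7, so this argument by itself only bounds the number of pivots by $2^n - 1$ for n variables.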


Other Finite Criss-Cross Methods 


Both the recursive interpretation and the hidden flex- 
ibility of pivot selection in the least-index criss-cross 
method make it possible to develop other finite vari- 
ants. Such finite criss-cross methods, which do not rely 
on a fixed minimal index ordering, were developed on 
the basis of the finite simplex rules presented by S. 
Zhang [37]. These finite criss-cross rules [38] are as fol- 
lows. 


First-in Last-out Rule (FILO) 


First, choose a primal or dual infeasible variable 
that has changed its basis-nonbasis status most 
recently. 


Then choose a variable in the selected row or col- 
umn so that the pivot entry fulfills the sign re- 
quirement of the admissible pivot selection and 
which has changed its basis/nonbasis status most 
recently. 


When more than one candidate has the same pivot age, ties can be broken arbitrarily (e.g., randomly).


This rule can easily be realized by assigning an ‘age’ vector u to the vector of the variables and using a pivot counter k. Initially we set k = 0 and u = 0. At each pivot we increase k by one and set the components of u corresponding to the two pivot variables equal to k. Then the pivot selections are made by choosing the variable with the highest possible $u_i$ value satisfying the sign requirements.
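The age bookkeeping described above fits in a few lines of Python. This is only the selection component, not a complete criss-cross code: the class and method names are hypothetical, and the caller is assumed to pass first the list of primal or dual infeasible variables and then the list of variables whose pivot entry has the admissible sign.

```python
import random


class FiloRule:
    """Age bookkeeping for the first-in last-out (FILO) rule (sketch)."""

    def __init__(self, n, seed=0):
        self.k = 0                  # pivot counter k
        self.u = [0] * n            # age vector u, one entry per variable
        self.rng = random.Random(seed)

    def choose(self, candidates):
        """Pick a candidate with the highest u_i; ties are broken randomly."""
        best = max(self.u[i] for i in candidates)
        return self.rng.choice([i for i in candidates if self.u[i] == best])

    def record_pivot(self, leaving, entering):
        """Increase k and stamp both pivot variables with the new value."""
        self.k += 1
        self.u[leaving] = self.u[entering] = self.k
```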


Most Often Selected Variable Rule 


First, choose a primal or dual infeasible variable 
that has changed its basis-nonbasis status most 
frequently. 


Then choose a variable in the selected row or col- 
umn so that the pivot entry fulfills the sign re- 
quirement of the admissible pivot selection and 
which has changed its basis/nonbasis status most 
frequently. 


When more than one candidate has the same pivot age, ties can be broken arbitrarily (e.g., randomly).


The most often selected rule can also be realized by assigning another ‘age’ vector u to the vector of the variables. Initially we set u = 0. At each pivot we increase the components of u corresponding to the two pivot variables by one. Then the pivot selections are made by choosing the variable with the highest possible $u_i$ value satisfying the sign requirement.
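The counting variant differs from the FILO bookkeeping only in how u is updated. Again this is a hypothetical helper for the selection step alone, under the same calling convention (infeasible variables first, then admissible pivot candidates).

```python
import random


class MostOftenRule:
    """Counter bookkeeping for the most often selected variable rule (sketch)."""

    def __init__(self, n, seed=0):
        self.u = [0] * n            # how often each variable changed basis status
        self.rng = random.Random(seed)

    def choose(self, candidates):
        """Pick a candidate with the highest u_i; ties are broken randomly."""
        best = max(self.u[i] for i in candidates)
        return self.rng.choice([i for i in candidates if self.u[i] == best])

    def record_pivot(self, leaving, entering):
        """Both pivot variables change their basis/nonbasis status once more."""
        self.u[leaving] += 1
        self.u[entering] += 1
```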


Exponential and Average Behavior 


The worst-case exponential behavior of the least-index criss-cross method was studied by C. Roos [25]. Roos’ exponential example is a variant of the cube of V. Klee and G.J. Minty [21]. In this example the starting solution is the origin, defined by a feasible basis, and the variables are ordered so that the least-index criss-cross method follows a simplex path, i.e. feasibility of the starting basis is preserved even though no ratio test is performed. Another exponential example was presented by Fukuda and Namiki [12] for linear complementarity problems.

Contrary to the clear result on the worst-case behav- 
ior, to date not much is known about the expected or 
average number of pivot steps required by finite criss- 
cross methods. 


Best-Case Analysis of Admissible Pivot Methods 


As discussed above, and as is the case for many simplex algorithms, the least-index criss-cross method is not a polynomial time algorithm. A question naturally arises: does there exist a polynomial criss-cross method? Unfortunately, no answer to this question is available at this moment. However, some weaker variants of this question can be answered positively. The problem is stated as follows: an arbitrary basis is given; what is the shortest admissible pivot path from this given basis to an optimal basis?

For nondegenerate problems, [10] shows the existence of such an admissible pivot sequence of length at most m. The nondegeneracy assumption is removed in [15]. This result solves a relaxation of the d-step conjecture.

Observe that we do not know of any such result for feasibility preserving, i.e. simplex, algorithms. In fact, the maximum length of feasibility-preserving pivot sequences between two feasible bases is not known to be bounded by a polynomial in the size of the given LO problem.


Generalizations 


Finite criss-cross methods were generalized to solve 
fractional linear optimization problems, to large classes 
of linear complementarity problems (LCPs; cf. » Lin- 
ear complementarity problem) and to oriented matroid 
programming problems (OMPs). 


Fractional Linear Optimization 


Fractional linear or, as it is frequently referred to, hyperbolic programming can be reformulated as a linear optimization problem. Thus it comes as no surprise that the least-index criss-cross method has been generalized to this class of optimization problems as well [17].
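One standard way to carry out this reformulation is the Charnes-Cooper transformation, sketched below; it is not necessarily the route taken in [17]. The constants $c_0$, $d_0$ are introduced here for illustration, and the sketch assumes the feasible set is nonempty and bounded and the denominator is positive on it (boundedness rules out $t = 0$, since then $Ay = 0$, $y \ge 0$, $d^T y = 1$ would be a recession direction).

```latex
% Linear-fractional program
\min_{x}\ \frac{c^T x + c_0}{d^T x + d_0}
\quad\text{s.t.}\quad Ax = b,\ x \ge 0,
\qquad d^T x + d_0 > 0 \text{ on the feasible set.}

% Substitute t = 1/(d^T x + d_0) > 0 and y = t x:
\min_{y,\,t}\ c^T y + c_0 t
\quad\text{s.t.}\quad Ay - bt = 0,\quad d^T y + d_0 t = 1,\quad y \ge 0,\ t \ge 0.

% An optimal (y^*, t^*) has t^* > 0 and yields the optimal x^* = y^* / t^*.
```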


Linear Complementarity Problems 


The largest solvable class of LCPs is the class of LCPs with a sufficient matrix [5,6]. The LCP least-index criss-cross method is a proper generalization of the LO criss-cross method. When the LCP arises from an LO problem, the LO criss-cross method is obtained.


Convex Quadratic Optimization 


Convex quadratic optimization problems give an LCP with a bisymmetric coefficient matrix. Because such a bisymmetric matrix is positive semidefinite and positive semidefinite matrices form a subclass of sufficient matrices, one obtains a finite criss-cross algorithm for convex quadratic optimization problems as well. Such criss-cross algorithms were published, e.g., in [20]. The least-index criss-cross method is extremely simple for the P-matrix LCP. Starting from an arbitrary complementary basis, the least-indexed infeasible variable leaves the basis and is replaced by its complementary pair. This algorithm was originally proposed in [24], and studied in [12]. The general case of sufficient LCPs was treated in [4,13,16].
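The P-matrix case can be illustrated with a brute-force Python sketch. It assumes the LCP is written as: find $w = Mz + q$ with $w, z \ge 0$ and $w^T z = 0$, starts from the complementary basis of all $w_i$, and, instead of updating a tableau, simply re-solves $M_{SS} z_S = -q_S$ for the current set S of basic $z$-variables (every principal submatrix of a P-matrix is nonsingular). The function names are hypothetical.

```python
from fractions import Fraction


def _solve(mat, rhs):
    """Solve mat @ y = rhs exactly (mat nonsingular) by Gauss-Jordan elimination."""
    n = len(rhs)
    a = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(mat, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[piv] = a[piv], a[c]
        for r in range(n):
            if r != c and a[r][c] != 0:
                m = a[r][c] / a[c][c]
                a[r] = [x - m * y for x, y in zip(a[r], a[c])]
    return [a[i][n] / a[i][i] for i in range(n)]


def murty_least_index(M, q, max_pivots=10_000):
    """Least-index principal pivoting for w = Mz + q, w, z >= 0, w^T z = 0 (sketch)."""
    n = len(q)
    S = set()                       # indices with z_i basic; w_i is basic otherwise
    for _ in range(max_pivots):
        idx = sorted(S)
        zS = _solve([[M[i][j] for j in idx] for i in idx], [-q[i] for i in idx])
        z = [Fraction(0)] * n
        for i, v in zip(idx, zS):
            z[i] = v
        w = [sum(Fraction(M[i][j]) * z[j] for j in range(n)) + q[i] for i in range(n)]
        basic = [z[i] if i in S else w[i] for i in range(n)]
        bad = [i for i in range(n) if basic[i] < 0]
        if not bad:
            return w, z             # complementary and nonnegative: an LCP solution
        S ^= {min(bad)}             # least-index infeasible variable leaves, its pair enters
    raise RuntimeError("pivot limit reached")
```

For M = [[2, 1], [1, 2]] (positive definite, hence a P-matrix) and q = (-1, -1), the method pivots twice and returns w = (0, 0), z = (1/3, 1/3).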


Oriented Matroids 


The intense research in the 1970s on oriented matroids and oriented matroid programming [2,9] gave new insight into pivot algorithms. It became clear that although the simplex method has rich combinatorial structures, some essential results, like the finiteness of Bland’s least-index simplex rule [2], do not hold in the oriented matroid context. Edmonds and Fukuda [9] showed that it might cycle in the oriented matroid case due to the possibility of nondegenerate cycling, which is impossible in the linear case.

The predecessors of finite criss-cross rules are: 
Bland’s recursive algorithm [2,3], the Edmonds- 
Fukuda algorithm [9], its variants and generaliza- 
tions [1,35,36,37]. All these are variants of the simplex 
method in the linear case, i. e. they preserve the feasibil- 
ity of the basis, but not in the oriented matroid case. In 
the case of oriented matroid programming only Todd’s finite lexicographic method [30,31] preserves feasibility of the basis and therefore yields a finite simplex algorithm for oriented matroids.


The least-index criss-cross method is a finite criss-cross method for oriented matroids [28,34]. A general recursive scheme of finite criss-cross methods is given in [18]. Finite criss-cross rules are also presented for oriented matroid quadratic programming and for oriented matroid linear complementarity problems [13,19].


See also 


> Least-Index Anticycling Rules
> Lexicographic Pivoting Rules
> Linear Programming
> Pivoting Algorithms for Linear Programming Generating Two Paths
> Principal Pivoting Methods for Linear Complementarity Problems
> Probabilistic Analysis of Simplex Algorithms
> Simplicial Pivoting Algorithms for Integer Programming
In solving global and combinatorial optimization problems, cuts are used as a device to discard portions of the feasible set where it is known that no optimal solution can be found. Specifically, given the optimization problem

$$\min\{f(x) : x \in D \subset \mathbb{R}^n\}, \tag{1}$$

if $x^0$ is an unfit solution and there exists a function $l(x)$ satisfying $l(x^0) > 0$, while $l(x) \le 0$ for every optimal solution x, then by adding the inequality $l(x) \le 0$ to the constraint set we exclude $x^0$ without excluding any optimal solution. The inequality $l(x) \le 0$ is called a valid cut, or briefly, a cut. Most often the function $l(x)$ is affine: the cut is then said to be linear, and the hyperplane $l(x) = 0$ is called a cutting plane. However, nonlinear cuts have proved to be useful, too, for a wide class of problems.
Cuts may be employed in different contexts: outer and inner approximation (conjunctive cuts), branch and bound (disjunctive cuts), or in combined form.


Outer Approximation 


Let $\Omega \subset \mathbb{R}^n$ be the set of optimal solutions of problem (1). Suppose there exists a family $\mathcal{P}$ of polytopes $P \supset \Omega$ such that for each $P \in \mathcal{P}$ a distinguished point $z(P) \in P$ (conceived of as some approximate solution) can be defined satisfying the following conditions:

A1) $z(P)$ always exists (unless $\Omega = \emptyset$) and can be computed by an efficient procedure;

A2) given any $P \in \mathcal{P}$ and the associated distinguished point $z = z(P)$, we can recognize when $z \in \Omega$, and if $z \notin \Omega$, we can construct an affine function $l(x)$ such that $P' = P \cap \{x : l(x) \le 0\} \in \mathcal{P}$ and $l(z) > 0$, while $l(x) \le 0$ for all $x \in \Omega$, i.e. $\Omega \subset P' \subset P \setminus \{z\}$.

Under these conditions, one can attempt to solve problem (1) by the following outer approximation method (OA method) [8]:


Prototype OA (outer approximation) procedure


0 | Start with an initial polytope $P_1 \in \mathcal{P}$. Set k = 1.
1 | Compute the distinguished point $z^k = z(P_k)$ (by A1)). If $z(P_k)$ does not exist, terminate: the problem is infeasible. If $z(P_k) \in \Omega$, terminate. Otherwise, continue.
2 | Using A2), construct an affine function $l_k(x)$ such that $P_{k+1} = P_k \cap \{x : l_k(x) \le 0\} \in \mathcal{P}$ and $l_k(x)$ strictly separates $z^k$ from $\Omega$, i.e. satisfies
$$l_k(z^k) > 0, \qquad l_k(x) \le 0 \quad \forall x \in \Omega. \tag{2}$$
Set $k \leftarrow k + 1$ and return to Step 1.
The algorithm is said to be convergent if it is either finite or generates an infinite sequence $\{z^k\}$ every cluster point of which is an optimal solution of problem (1).

Usually the distinguished point $z^k$ is defined as a vertex of the polytope $P_k$ satisfying some criterion (e.g., minimizing a given concave function). In these cases, the implementation of the above algorithm requires a procedure for computing, at iteration k, the vertex set $V_k$ of the current polytope $P_k$. At the beginning, $V_1$ is supposed to be known, while $P_{k+1}$ is obtained from $P_k$ simply by adding one more linear constraint $l_k(x) \le 0$. Using this information, $V_{k+1}$ can be derived from $V_k$ by an on-line vertex enumeration procedure [1].


Example 1 (Concave minimization.) Consider the problem (1) where f(x) is concave and D is a convex compact set with $\operatorname{int} D \ne \emptyset$.

Assume that D is defined by a convex inequality $g(x) \le 0$ and let $w \in \operatorname{int} D$ with $g(w) < 0$. Take $\mathcal{P}$ to be the collection of all polytopes containing D. For every $P \in \mathcal{P}$ define $z := z(P)$ to be a minimizer of f(x) over the vertex set V of P (hence, by concavity of f(x), a minimizer of f(x) over P). Clearly, if $z \in D$, it solves the problem. Otherwise, the line segment joining z to w meets the boundary of D at a unique point y, and the affine function $l(x) = \langle p, x - y\rangle + g(y)$ with $p \in \partial g(y)$ strictly separates D from z (indeed, $l(y) = g(y) = 0$ and $l(w) \le g(w) < 0$, hence $l(z) > 0$, while $l(x) \le g(x) - g(y) + g(y) = g(x) \le 0$ for all $x \in D$). Obviously $P' = P \cap \{x : l(x) \le 0\} \in \mathcal{P}$, so Assumptions A1) and A2) are fulfilled and the OA algorithm can be applied. The convergence of the algorithm is easy to establish.
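Example 1 can be run in the plane with the following Python sketch. Several choices are assumptions made for illustration only: the polytopes are convex polygons whose vertex lists are updated by clipping with each half-plane (a simple planar stand-in for the on-line vertex enumeration of [1]), the boundary point y is located by bisection on the segment from w to z, and the caller supplies a subgradient `grad_g`. The names `clip` and `oa_concave_min` are hypothetical.

```python
def clip(poly, l):
    """Return the convex polygon poly ∩ {x : l(x) <= 0}, vertices in order."""
    out = []
    for P, Q in zip(poly, poly[1:] + poly[:1]):
        lP, lQ = l(P), l(Q)
        if lP <= 0:
            out.append(P)
        if (lP <= 0) != (lQ <= 0):            # the edge crosses the cutting line
            t = lP / (lP - lQ)
            out.append((P[0] + t * (Q[0] - P[0]), P[1] + t * (Q[1] - P[1])))
    return out


def oa_concave_min(f, g, grad_g, w, poly, iters=40):
    """OA procedure for Example 1 in the plane (sketch).

    f: concave objective; D = {x : g(x) <= 0} with g convex; grad_g: a
    subgradient of g; w: point with g(w) < 0; poly: vertices of P_1 ⊇ D.
    Returns the lower bounds f(z^k) and the last distinguished point z."""
    bounds = []
    for _ in range(iters):
        z = min(poly, key=f)                  # z(P_k): best vertex of P_k
        bounds.append(f(z))
        if g(z) <= 0:                         # z is feasible, hence optimal
            break

        def seg(s, z=z):                      # point w + s (z - w) on the segment
            return (w[0] + s * (z[0] - w[0]), w[1] + s * (z[1] - w[1]))

        lo, hi = 0.0, 1.0                     # bisection for the boundary point y
        for _ in range(60):
            mid = (lo + hi) / 2
            if g(seg(mid)) <= 0:
                lo = mid
            else:
                hi = mid
        y = seg(lo)
        p, gy = grad_g(y), g(y)

        def cut(x, p=p, y=y, gy=gy):          # l(x) = <p, x - y> + g(y)
            return p[0] * (x[0] - y[0]) + p[1] * (x[1] - y[1]) + gy

        poly = clip(poly, cut)                # P_{k+1} = P_k ∩ {x : l(x) <= 0}
    return bounds, z
```

With D the unit disk ($g(x) = \|x\|^2 - 1$, $w = 0$), $f(x) = -\|x - a\|^2$ for a = (0.3, 0.1) and $P_1 = [-1, 1]^2$, the values $f(z^k)$ form a nondecreasing sequence of lower bounds that approaches the optimal value $-(1 + \|a\|)^2$.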


Example 2 (Reverse convex programming.) Consider the problem (1) where $f(x) = \langle c, x\rangle$, while $D = \{x \in \mathbb{R}^n : h(x) \le 0 \le g(x)\}$ with g(x), h(x) continuous convex functions. Assume that the problem is stable, i.e. that $D = \operatorname{cl}(\operatorname{int} D)$, so a feasible solution $\bar x \in D$ is optimal if and only if

$$\{x \in D : \langle c, x - \bar x\rangle \le 0\} \subset \{x : g(x) \le 0\}. \tag{3}$$

Also, for simplicity, assume a point w is available satisfying $\max\{h(w), g(w)\} < 0$ and $\langle c, w\rangle < \min\{\langle c, x\rangle : h(x) \le 0 \le g(x)\}$ (the latter assumption amounts to assuming that the constraint $g(x) \ge 0$ is essential).

Let $\Omega$ be the set of optimal solutions and $\mathcal{P}$ the collection of all polytopes containing $\Omega$. For every $P \in \mathcal{P}$ let $z = z(P)$ be a maximizer of g(x) over the vertex set V of the polyhedron $P \cap \{x : \langle c, x\rangle \le \gamma\}$, where $\gamma$ is the value of the objective function at the best feasible solution currently available (set $\gamma = +\infty$ if no feasible solution is known yet). By (3), if $g(z) \le 0$, then $\gamma$ is the optimal value (for $\gamma < +\infty$), or the problem is infeasible (for $\gamma = +\infty$). Otherwise, $g(z) > 0$, and we can construct an affine function l(x) strictly separating z from $\Omega$ as follows. Since $\max\{h(w), g(w)\} < 0$ while $\max\{h(z), g(z)\} > 0$, the line segment joining z and w meets the surface $\max\{h(x), g(x)\} = 0$ at a unique point y.

1) If $g(y) = 0$ (while $h(y) \le 0$), then y is a feasible solution, and since $y = \lambda w + (1 - \lambda) z$ for some $\lambda \in (0, 1)$, we must have $\langle c, y\rangle = \lambda\langle c, w\rangle + (1 - \lambda)\langle c, z\rangle < \gamma$, so the cut $l(x) = \langle c, x - y\rangle \le 0$ strictly separates z from $\Omega$.

2) If $h(y) = 0$, then the cut $l(x) = \langle p, x - y\rangle + h(y) \le 0$, where $p \in \partial h(y)$, strictly separates z from $\Omega$ (indeed, $l(x) \le h(x) - h(y) + h(y) = h(x) \le 0$ for all $x \in \Omega$, while $l(z) > 0$ because $l(w) < 0$ and $l(y) = 0$).

Thus assumptions A1), A2) are satisfied, and again the OA algorithm can be applied. The convergence of the OA algorithm for this problem is established by a more elaborate argument than for the concave minimization problem (see [3,8]).


Various variants of the OA method have been developed for a wide class of optimization problems, since any optimization problem described by means of differences of convex functions can be reduced to a reverse convex program of the above form [3]. However, a difficulty with this method when solving large-scale problems is that the size of the vertex set $V_k$ of $P_k$ may grow exponentially with k, creating serious storage problems and making the computation of $V_k$ almost impracticable.


Inner Approximation 


Consider the concave minimization problem under linear constraints, i.e. problem (1) when f(x) is a concave function and D is a polytope in $\mathbb{R}^n$.

Without loss of generality we may assume that 0 is a vertex of D. For any real number $\gamma < f(0)$, the set $C_\gamma = \{x \in \mathbb{R}^n : f(x) \ge \gamma\}$ is convex and $0 \in D \cap C_\gamma$. Of course, $D \subset C_\gamma$ if and only if $f(x) \ge \gamma$ for all $x \in D$.

The idea of the inner approximation method (IA method), also called the polyhedral annexation method (or PA method) [3], is to construct a sequence of expanding polytopes $P_1 \subset P_2 \subset \cdots$ together with a nonincreasing sequence of real numbers $\gamma_1 \ge \gamma_2 \ge \cdots$, such that $\gamma_k \in f(D)$, $P_k \subset C_{\gamma_k}$, k = 1, 2, ..., and eventually $D \subset P_h$ for some h: then $\gamma_h \le f(x)$ for all $x \in D$, i.e. $\gamma_h$ will be the optimal value.

For every set $P \subset \mathbb{R}^n$ let $P^\circ$ be the polar of P, i.e. $P^\circ = \{y \in \mathbb{R}^n : \langle y, x\rangle \le 1\ \forall x \in P\}$. As is well known, $P^\circ$ is a closed convex set containing 0 (in fact a polyhedron if P is a polyhedron), and $P \subset Q$ implies $P^\circ \supset Q^\circ$; moreover, if C is a closed convex set containing 0, then $(C^\circ)^\circ = C$. Therefore, setting $S_k = (P_k)^\circ$, the IA method amounts to constructing a sequence of nested polyhedra $S_1 \supset \cdots \supset S_h$ satisfying $S_k^\circ \subset C_{\gamma_k}$, k = 1, ..., h, and $S_h \subset D^\circ$. The key point in this scheme is: given $\gamma_k \in f(D)$ and a polyhedron $S_k$ such that $S_k^\circ \subset C_{\gamma_k}$, check whether $S_k \subset D^\circ$, and if there is a $y^k \in S_k \setminus D^\circ$, then construct a cut $l_k(y) \le 1$ to exclude $y^k$ and to form a smaller polyhedron $S_{k+1}$ such that $S_{k+1}^\circ \subset C_{\gamma_{k+1}}$ for some $\gamma_{k+1} \in f(D)$ satisfying $\gamma_{k+1} \le \gamma_k$.

To deal with this point, define $s(y) = \max\{\langle y, x\rangle : x \in D\}$. Since $y \in D^\circ$ whenever $s(y) \le 1$, we will have $S_k \subset D^\circ$ whenever

$$\max\{s(y) : y \in S_k\} \le 1. \tag{4}$$

But clearly the function s(y) is convex as the pointwise maximum of a family of linear functions. Therefore, denoting the vertex set and the extreme direction set of $S_k$ by $V_k$, $U_k$, respectively, we will have (4) (i.e. $S_k \subset D^\circ$) whenever

$$\max\{s(y) : y \in V_k\} \le 1, \qquad \max\{s(y) : y \in U_k\} \le 0. \tag{5}$$


Thus, checking the inclusion $S_k \subset D^\circ$ amounts to checking (5), a condition that fails to hold in either of the following cases:

$$s(y^k) > 1 \quad \text{for some } y^k \in V_k, \tag{6}$$

$$s(y^k) > 0 \quad \text{for some } y^k \in U_k. \tag{7}$$


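In each case, it can be verified that if $x^k$ maximizes $\langle y^k, x\rangle$ over D, and $\gamma_{k+1} = \min\{\gamma_k, f(x^k)\}$ while $\theta_k = \sup\{\theta : f(\theta x^k) \ge \gamma_{k+1}\}$, then $S_{k+1} = S_k \cap \{y : \langle x^k, y\rangle \le 1/\theta_k\}$ satisfies

$$P_{k+1} := S_{k+1}^\circ = \operatorname{conv}(P_k \cup \{\theta_k x^k\}) \subset C_{\gamma_{k+1}}.$$

In the case (6), $S_{k+1}$ no longer contains $y^k$, while in the case (7), $y^k$ is no longer an extreme direction of $S_{k+1}$. In this sense, the cut $\langle x^k, y\rangle \le 1/\theta_k$ excludes $y^k$. We can thus state the following algorithm.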


IA Algorithm (for concave minimization)


0 | By translating if necessary, make sure that 0 is a vertex of D. Let $\bar x^1$ be the best basic feasible solution available, $\gamma_1 = f(\bar x^1)$. Take a simplex $P_1 \subset C_{\gamma_1}$ and let $S_1 = P_1^\circ$, $V_1$ = vertex set of $S_1$, $U_1$ = extreme direction set of $S_1$. Set k = 1.
1 | Compute s(y) for every new $y \in (V_k \cup U_k) \setminus \{0\}$. If (5) holds, then terminate: $S_k \subset D^\circ$, so $\bar x^k$ is a global optimal solution. If (6) or (7) holds, then let
$$x^k \in \operatorname{argmax}\{\langle y^k, x\rangle : x \in D\}.$$
2 | Update the current best feasible solution $\bar x^{k+1}$ by comparing $x^k$ and $\bar x^k$. Set $\gamma_{k+1} = f(\bar x^{k+1})$. Compute $\theta_k = \max\{\theta \ge 1 : f(\theta x^k) \ge \gamma_{k+1}\}$ and let $S_{k+1} = S_k \cap \{y : \langle x^k, y\rangle \le 1/\theta_k\}$.
3 | From $V_k$ and $U_k$ derive the vertex set $V_{k+1}$ and the extreme direction set $U_{k+1}$ of $S_{k+1}$. Set $k \leftarrow k + 1$ and go to Step 1.


It can be shown that the IA algorithm is finite [3]. Though this algorithm can be interpreted as dual to the OA algorithm, its advantage over the OA method is that it can be started at any vertex of D, so that each time the set $V_k$ has reached a certain critical size, it can be stopped and ‘restarted’ at a new vertex of D, using the last obtained best value of f(x) as the initial $\gamma_1$. In that way the set $V_k$ can be kept within manageable size. Note that if D is contained in a cone M and $P_1 = \{x \in M : \langle v^1, x\rangle \le 1\} \subset C_{\gamma_1}$, then it can be shown that the second condition in (5) automatically holds, so case (7) never occurs and only (6) must be checked [6].


Concavity Cut 


The cuts mentioned above are used to separate an unfit solution from some convex set containing at least one optimal solution. They were first introduced in convex programming [2,4]. Another type of cut, originally devised for concave minimization [7], is the following.
Suppose that a feasible solution $\bar x$ with $f(\bar x) = \gamma$ is already known and we would like to check whether there exists a better feasible solution. One way to do that is to take a vertex $x^0$ of D with $f(x^0) > \gamma$ and to construct a cone M, as small as possible, vertexed at $x^0$, containing D and having exactly n edges. Since $x^0$ is interior to the convex set $C_\gamma = \{x : f(x) \ge \gamma\}$, each ith edge of M, for i = 1, ..., n, meets the boundary of $C_\gamma$ at a uniquely defined point $y^i$ (assuming that $C_\gamma$ is bounded). Through these n points $y^1, \dots, y^n$ (which are affinely independent) one can draw a unique hyperplane, of equation $\pi(x - x^0) = 1$, such that $\pi(y^i - x^0) = 1$ (i = 1, ..., n); hence $\pi = e^T U^{-1}$, where U is the matrix of columns $y^1 - x^0, \dots, y^n - x^0$ and e denotes a vector of n ones. Since the linear inequality


$$e^T U^{-1}(x - x^0) \ge 1 \tag{8}$$

excludes $x^0$ without excluding any feasible solution x better than $\bar x$, this inequality defines a valid cut. In particular, if it so happens that the whole polytope D is cut off, i.e. if

$$D \subset \{x : e^T U^{-1}(x - x^0) \le 1\}, \tag{9}$$

then $\bar x$ is a global optimal solution.
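The coefficients $\pi = e^T U^{-1}$ of the cut (8) can be computed as in the following Python sketch. It assumes that $f(x^0) > \gamma$, that the n edge directions of M are given explicitly, and that $C_\gamma$ is bounded along each edge; each boundary point $y^i$ is located by doubling and bisection on $f(x^0 + t\,d_i) \ge \gamma$, and $\pi$ is obtained by solving $U^T \pi = e$ with a small Gaussian elimination. The helper names are hypothetical.

```python
def _solve(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [[float(v) for v in row] + [float(r)] for row, r in zip(mat, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, n):
            m = a[r][c] / a[c][c]
            a[r] = [x - m * y for x, y in zip(a[r], a[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def concavity_cut(f, gamma, x0, edges, steps=200):
    """Coefficients pi of the gamma-valid concavity cut pi^T (x - x0) >= 1 (sketch)."""
    cols = []                                     # columns y^i - x0 of U
    for d in edges:
        def point(t, d=d):
            return [a + t * b for a, b in zip(x0, d)]
        lo, hi = 0.0, 1.0
        while f(point(hi)) >= gamma:              # bracket the boundary of C_gamma
            lo, hi = hi, 2 * hi
            if hi > 1e12:
                raise ValueError("C_gamma appears unbounded along an edge")
        for _ in range(steps):                    # bisection for y^i = x0 + t d
            mid = (lo + hi) / 2
            if f(point(mid)) >= gamma:
                lo = mid
            else:
                hi = mid
        cols.append([lo * b for b in d])
    # pi^T U = e^T  <=>  U^T pi = e, and row i of U^T is cols[i]
    return _solve(cols, [1.0] * len(cols))
```

For $f(x) = -\|x\|^2$, $\gamma = -1$, $x^0 = 0$ and edges (1, 0), (1, 1), the boundary points are (1, 0) and $(1/\sqrt2, 1/\sqrt2)$, giving $\pi = (1, \sqrt2 - 1)$.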

This cut is often referred to as a $\gamma$-valid concavity cut
for (f, D) at $x^0$ [3]. Its construction requires the availability
of a cone $M \supset D$ vertexed at $x^0$ and having exactly
n edges. In particular, if the vertex $x^0$ of D has exactly
n neighboring vertices, then M can be taken to be
the cone generated by the n halflines from $x^0$ through
each of these neighbors of $x^0$. Note, however, that the
definition of the concavity cut can be extended so that
its construction is possible even when the cone M has
more than n edges (as, e.g., when $x^0$ is a degenerate
vertex of D).
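The construction of the cut (8) can be sketched numerically. The Python fragment below is a minimal sketch under these assumptions:

- f is concave and $f(x^0) > \gamma$;
- $C_\gamma$ is bounded along each edge direction;
- the n edge directions of M are supplied as vectors.

The helper names `edge_extension`, `concavity_cut` and `cut_proves_optimality` are hypothetical. The boundary points $y^i$ are located by plain bisection, which is an illustrative choice and not a method prescribed in the text.

```python
import numpy as np


def edge_extension(f, x0, d, gamma, tol=1e-12):
    """Largest t >= 0 with f(x0 + t*d) >= gamma, so y^i = x0 + t*d.

    Assumes f concave, f(x0) > gamma and C_gamma bounded along d, so that
    {t >= 0 : f(x0 + t*d) >= gamma} is an interval [0, t_max].
    """
    hi = 1.0
    while f(x0 + hi * d) >= gamma:   # grow the bracket until we leave C_gamma
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:             # bisection on the boundary crossing
        mid = 0.5 * (lo + hi)
        if f(x0 + mid * d) >= gamma:
            lo = mid
        else:
            hi = mid
    return lo


def concavity_cut(f, x0, edges, gamma):
    """Return pi = e^T U^{-1}; the cut (8) reads pi @ (x - x0) >= 1."""
    x0 = np.asarray(x0, dtype=float)
    cols = []
    for d in edges:
        d = np.asarray(d, dtype=float)
        cols.append(edge_extension(f, x0, d, gamma) * d)   # column y^i - x0
    U = np.column_stack(cols)
    e = np.ones(U.shape[1])
    return np.linalg.solve(U.T, e)   # pi^T U = e^T  <=>  U^T pi = e


def cut_proves_optimality(pi, x0, vertices_of_D):
    """Condition (9): every vertex v of D satisfies pi @ (v - x0) <= 1."""
    x0 = np.asarray(x0, dtype=float)
    return all(pi @ (np.asarray(v, dtype=float) - x0) <= 1.0 for v in vertices_of_D)
```

Consider $f(x) = -\|x\|^2$, $x^0 = 0$, the coordinate directions as edges and $\gamma = -1$. Both boundary points lie at distance 1, so $\pi = (1, 1)$ and the cut is $x_1 + x_2 \ge 1$.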

Condition (9), sufficient for optimality, suggests 
a cutting method for solving the linearly constrained 
concave minimization problem by using concavity cuts 
to iteratively reduce the feasible polyhedron. Unfortu- 
nately, experience has shown that concavity cuts, when 
applied repeatedly, tend to become shallower and shal- 
lower. Though these cuts can be significantly strength- 
ened by exploiting additional structure of the problem 
(e. g., in concave quadratic minimization, bilinear pro- 
gramming [5] and also in low rank nonconvex prob- 
lems [6]), pure cutting methods are often outperformed 
by branch and cut methods where cutting is combined 
with successive partition of the space [8]. 

Concavity cuts have also been used in combinato- 
rial optimization (‘intersection cuts’, or in a slightly ex- 
tended form, ‘convexity cuts’). 


Nonlinear Cuts 


In many problems, nonlinear cuts arise in a quite natu- 
ral way. 

For example, consider the following problem of
monotonic optimization [10]:

$$ \max\{f(x) : g(x) \le 1,\ h(x) \ge 1,\ x \in \mathbb{R}^n_+\}, \qquad (10) $$

where f, g, h are continuous increasing functions on $\mathbb{R}^n_+$.
A function f(x) is said to be increasing on $\mathbb{R}^n_+$ if $0 \le x
\le x' \Rightarrow f(x) \le f(x')$. The notation $x \le x'$ means $x_i \le x'_i$
for all i, while $x < x'$ means $x_i < x'_i$ for all i. As argued in
[10], a very broad class of optimization problems can be
cast in the form (10). Define $G = \{x \in \mathbb{R}^n_+ : g(x) \le 1\}$ and $H =
\{x \in \mathbb{R}^n_+ : h(x) \ge 1\}$, so that the problem is to maximize
f(x) over the feasible set $G \cap H$. Clearly

$$ 0 \le x \le x' \in G \;\Rightarrow\; x \in G, \qquad (11) $$

$$ 0 \le x \le x' \notin H \;\Rightarrow\; x \notin H. \qquad (12) $$

Assume that $g(0) < 1$ and $0 \le a \le x \le b$ for all $x \in G \cap
H$ (so $0 \in \operatorname{int} G$ and $b \in H$). From (11) it follows that if $z
\in \mathbb{R}^n_+ \setminus G$ and $\pi(z)$ is the last point of G on the halfline
from 0 through z, then the cone $K_{\pi(z)} = \{x \in \mathbb{R}^n_+ : x >
\pi(z)\}$ separates z from G, i.e. $G \cap K_{\pi(z)} = \emptyset$, while $z \in
K_{\pi(z)}$.

A set of the form $P = \bigcup_{v \in V}\{x : 0 \le x \le v\}$, where V is
a finite subset of $\mathbb{R}^n_+$, is called a polyblock of vertex set V
[9]. A vertex v is said to be improper if $v \le v'$ for some $v'
\in V \setminus \{v\}$. Of course, improper vertices can be dropped
without changing P. Also, if $P \supset G \cap H$, then the polyblock
of vertex set $V' = V \cap H$ still contains $G \cap H$,
because $v \notin H$ implies that $[0, v] \cap H = \emptyset$. With these
properties in mind we can now describe the polyblock
approximation procedure for solving (10).

Start with the polyblock $P_1 = [0, b] \supset G \cap H$ and its
vertex set $V_1 = \{b\} \subset H$. At iteration k we have a polyblock
$P_k \supset G \cap H$ with vertex set $V_k \subset H$. Let $y^k \in \arg
\max\{f(x) : x \in V_k\}$. Clearly $y^k$ maximizes f(x) over $P_k$,
and $y^k \in H$, so if $y^k \in G$ then $y^k$ is an optimal solution.
If $y^k \notin G$, then the point $x^k = \pi(y^k)$ determines a cone
$K_{x^k}$ such that the set $P_{k+1} = P_k \setminus K_{x^k}$ excludes $y^k$ but still
contains $G \cap H$. It turns out that $P_{k+1}$ is a polyblock
whose vertex set $V_{k+1}$ is obtained from $V_k$ by adding n
points $v^{k,1}, \dots, v^{k,n}$ (which are the n vertices of the
hyperrectangle $[x^k, y^k]$ adjacent to $y^k$) and then dropping
all those which do not belong to H. With this polyblock
$P_{k+1}$, we pass to iteration k+1.

In that way we generate a nested sequence of polyblocks
$P_1 \supset P_2 \supset \cdots \supset G \cap H$. It can be proved that
either $y^k$ is an optimal solution at some iteration k or
$f(y^k) \searrow \max\{f(x) : x \in G \cap H\}$.
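Below is a compact sketch of the polyblock approximation procedure. It rests on these assumptions:

- f, g, h are increasing Python callables on lists;
- $g(0) < 1$;
- $b > 0$ with $b \in H$.

$\pi(z)$ is computed by bisection along the ray from 0.

As a simplification (an assumption, not the exact update described in the text), only $y^k$ is replaced by its n adjacent vertices in $[x^k, y^k]$. The removed points all lie in $K_{x^k}$, so the polyblock is still an outer approximation of $G \cap H$. A tolerance on the gap between $f(y^k)$ and the best feasible value found stands in for the limit statement. All function names are hypothetical.

```python
def ray_point(g, z, tol=1e-12):
    """pi(z): last point of G = {x : g(x) <= 1} on the segment from 0 to z.

    Assumes g increasing, g(0) < 1 and g(z) > 1, so bisection on the
    scalar lam in [0, 1] locates the boundary of G along the halfline.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g([mid * zi for zi in z]) <= 1.0:
            lo = mid
        else:
            hi = mid
    return [lo * zi for zi in z]


def dominated(v, vertices):
    """True if v <= w componentwise for some w in vertices (v is improper)."""
    return any(all(vi <= wi for vi, wi in zip(v, w)) for w in vertices)


def polyblock_max(f, g, h, b, tol=1e-6, max_iter=10_000):
    """Outer polyblock approximation for max{f : g <= 1, h >= 1, x >= 0}."""
    vertices = [list(b)]                  # V_1 = {b}, b assumed to lie in H
    best_x, best_val = None, float("-inf")
    for _ in range(max_iter):
        if not vertices:                  # polyblock emptied: G ∩ H is empty
            break
        y = max(vertices, key=f)          # y^k maximizes f over P_k
        if g(y) <= 1.0:                   # y^k in G (and in H): optimal
            return y, f(y)
        x = ray_point(g, y)               # x^k = pi(y^k), a point of G
        if h(x) >= 1.0 and f(x) > best_val:
            best_x, best_val = x, f(x)    # best feasible point found so far
        if f(y) - best_val <= tol:        # upper and lower bounds have met
            return best_x, best_val
        vertices.remove(y)
        for i in range(len(y)):
            if x[i] < y[i]:
                v = list(y)
                v[i] = x[i]               # vertex of [x^k, y^k] adjacent to y^k
                if h(v) >= 1.0 and not dominated(v, vertices):
                    vertices.append(v)
    return best_x, best_val
```

As an example, take the problem of maximizing $x_1 + 2x_2$ over the box $G = [0, 0.6] \times [0, 0.8]$ intersected with $H = \{x_1 + x_2 \ge 1\}$. Starting from $b = (1, 1)$, the sketch stops at approximately $(0.6, 0.8)$ with value 2.2.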

A similar method can be developed for solving the
problem

$$ \min\{f(x) : g(x) \le 1,\ h(x) \ge 1,\ x \in \mathbb{R}^n_+\} $$

by interchanging the roles of g, h and a, b. In contrast
with what happens in OA methods, the vertex set $V_k$ of
the polyblock $P_k$ in the polyblock approximation algorithm
is extremely easy to determine. Furthermore, this
method admits restarts, which provide a way to prevent
stalling and to overcome storage difficulties when solving
large scale problems [10].
A company that produces large rolls of paper, textile,
steel, etc., usually faces the problem of how to cut the
large rolls into smaller rolls, called finished rolls, in such
a way that the demands for all finished rolls are satisfied.
Any large roll is cut according to some cutting pattern,
and the problem is to find the cutting patterns to be
used and the number of large rolls to which each should be
applied. We assume, for the sake of simplicity, that each
large roll has width W, an integer multiple of some unit,
and that the finished roll widths are also specified by some
integers $w_1, \dots, w_m$. Let $a_{ij}$ designate the number of rolls
of width $w_i$ produced by the use of the jth pattern, i = 1,
..., m, j = 1, ..., n. Let further $b_i$ designate the demand
for roll i, i = 1, ..., m, and $c_j = 1$, j = 1, ..., n. If $A = (a_{ij})$,
$b = (b_1, \dots, b_m)^T$ and $c = (c_1, \dots, c_n)^T$, then the problem is:

$$ \min\ c^T x \quad \text{s.t.}\quad Ax = b,\quad x \ge 0. $$


Here $x_j$ is the number of times the jth cutting pattern is
used, and as such an all-integer solution would be
required. However, one is usually satisfied
with an optimal solution of the above problem without
the integrality restriction and, having that, a simple
round-off procedure provides us with the solution to
the problem.

In the above problem, however, the matrix A is
huge; therefore we do not, and in most cases cannot,
create it by enumerating the cutting patterns. P.C.
Gilmore and R.E. Gomory [3,4] resolved this difficulty
by an ingenious column generation technique, in which
column j is generated, in the course of the simplex
algorithm, only when it is needed. Assume
that B is the current basis and designate by $\pi$ the corresponding
dual vector, i.e., the solution of the linear
equation $\pi^T B = c_B^T$. Let $w = (w_1, \dots, w_m)^T$. Now, if $a = (a_1, \dots, a_m)^T \in \mathbb{Z}^m_+$
satisfies the inequality $w^T a \le W$, then, by definition,
a represents a cutting pattern, i.e. a column of the matrix
A. Since the cutting-stock problem is a minimization
problem, the basis B is optimal if $\pi^T a \le 1$ for every a that
satisfies $w^T a \le W$. We can check this by solving the integer
linear program:

$$ \max\ \pi^T a \quad \text{s.t.}\quad w^T a \le W,\quad a \in \mathbb{Z}^m_+. $$

If the optimum value is greater than 1, then the optimal
vector a may enter the basis; otherwise B is an optimal
basis and $x_B$ is an optimal solution to the problem. The
problem of finding the vector a is a knapsack problem, for
which efficient solution methods exist.
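The pricing step can be solved by dynamic programming over the integer capacities 0, ..., W. This is a standard method for the unbounded integer knapsack problem. It is one of several efficient methods, not necessarily the one used in [3,4]. The sketch assumes integer widths and an integer roll width W, and takes the dual vector $\pi$ as given. The function name `price_pattern` is hypothetical.

```python
def price_pattern(widths, W, duals):
    """Solve max sum(duals[i]*a[i]) s.t. sum(widths[i]*a[i]) <= W, a integer >= 0.

    best[c] is the best value using total width at most c; last[c] records
    which finished-roll width was added last (None: capacity c-1 is as good).
    Returns (optimal value, pattern a).
    """
    best = [0.0] * (W + 1)
    last = [None] * (W + 1)
    for c in range(1, W + 1):
        best[c], last[c] = best[c - 1], None          # leave one unit unused
        for i, (wi, pi) in enumerate(zip(widths, duals)):
            if wi <= c and best[c - wi] + pi > best[c]:
                best[c], last[c] = best[c - wi] + pi, i
    # Walk back through the table to recover the cutting pattern a.
    a = [0] * len(widths)
    c = W
    while c > 0:
        i = last[c]
        if i is None:
            c -= 1
        else:
            a[i] += 1
            c -= widths[i]
    return best[W], a
```

Take W = 10, widths (3, 5) and $\pi = (0.35, 0.5)$. The best pattern is then a = (3, 0), with value 1.05 > 1, so that column would enter the basis. With $\pi = (0.3, 0.5)$ the maximum is exactly 1, attained by a = (0, 2), and the basis passes the optimality test.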

In practice, however, frequently more complicated 
cutting-stock problems come up, due to special cus- 
tomer requirements depending on quality and other 
characteristics. In addition, we frequently need to in- 
clude set up costs, capacity constraints and costs due 
to delay in manufacturing. These lead to the develop- 
ment of special algorithms as described in [1,4,5,6,7]. 
Recently Cs.I. Fabian [2] formulated stochastic variants 
of the cutting-stock problem, for use in fiber manufac- 
turing. 


See also 


> Integer Programming 
It is often desirable to solve multivariable optimization
problems with a gradient-free algorithm. This may be
the case when gradient evaluations are difficult, or when
gradients of the underlying objective function do not
exist. One method that offers this feature is the cyclic
coordinate search method, together with its variants.
The minimization problem considered is:

$$ \min_{x \in \mathbb{R}^n} f(x). $$

The method in its basic form uses the coordinate axes
as the search directions. In particular, the search directions
are $d^{(1)}, \dots, d^{(n)}$, where $d^{(i)}$ is the vector of zeros
except for a 1 in the ith position. Therefore, along
each search direction $d^{(i)}$ only the corresponding variable $x_i$
is changed, with all remaining variables kept
constant at their previous values.

It is assumed here that the minimization is carried
out in order over all variables with indices 1, ..., n
at each iteration of the algorithm. However, there are
variants. The first of these is the Aitken double sweep
method, which first processes the variables in the order
mentioned above and then, in a second sweep, returns
in reverse order, that is n − 1, ..., 1. The second variant
is termed the Gauss-Southwell method [2], according
to which the component (variable) with the largest partial
derivative magnitude in the gradient vector is selected
for line searching. The latter requires the availability of
first derivatives of the objective function.

The algorithm of the cyclic coordinate method can 
be summarized as follows: 


1. Initialization

Select a tolerance $\varepsilon > 0$, to be used in the termination
criterion of the algorithm. Select an initial point $x^{(0)}$
and initialize by setting $z^{(1)} = x^{(0)}$. Set k = 0 and i = 1.


2. Main iteration

Let $\alpha^*$ (a scalar variable) be the optimal solution to the
line search problem of minimizing $f(z^{(i)} + \alpha d^{(i)})$ over $\alpha$.
Set $z^{(i+1)} = z^{(i)} + \alpha^* d^{(i)}$. If i < n, then increase i to
i + 1 and repeat step 2. Otherwise, if i = n, go to step 3.


3. Termination check

Set $x^{(k+1)} = z^{(n+1)}$. If the termination criterion is
satisfied, for example $\|x^{(k+1)} - x^{(k)}\| < \varepsilon$, then
stop. Else, set $z^{(1)} = x^{(k+1)}$, increase k to k + 1,
set i = 1 and repeat step 2.


The steps above outline the basic cyclic coordinate
method; the Aitken and Gauss-Southwell variants can
be easily included by modifying the main algorithm.
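Steps 1-3 translate directly into code. The sketch below makes assumptions that the text does not:

- The exact line search of step 2 is replaced by a golden-section search over a fixed interval $[-R, R]$.
- So f is assumed unimodal along each coordinate, with the minimizing step inside that interval.

The flag `double_sweep` adds the Aitken back sweep. All names are hypothetical.

```python
import math


def golden_section(phi, lo, hi, tol=1e-10):
    """Minimize a unimodal scalar function phi on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0        # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


def cyclic_coordinate_search(f, x0, eps=1e-7, step_range=10.0,
                             max_iter=1000, double_sweep=False):
    """Cyclic coordinate search following steps 1-3 (Aitken variant optional)."""
    n = len(x0)
    order = list(range(n))
    if double_sweep:
        order += list(range(n - 2, -1, -1))  # back sweep over variables n-1, ..., 1
    x = [float(v) for v in x0]               # x^(k)
    for _ in range(max_iter):
        z = list(x)                          # z^(1) = x^(k)
        for i in order:
            def phi(alpha):                  # f(z + alpha * d^(i))
                trial = list(z)
                trial[i] += alpha
                return f(trial)
            z[i] += golden_section(phi, -step_range, step_range)
        step = math.sqrt(sum((zi - xi) ** 2 for zi, xi in zip(z, x)))
        x = z                                # x^(k+1) = z^(n+1)
        if step < eps:                       # ||x^(k+1) - x^(k)|| < eps
            break
    return x
```

As a test case, take the convex quadratic $f(x) = (x_1 - 1)^2 + 2(x_2 + 2)^2 + 0.5\,x_1 x_2$, started at the origin. Because the coupling term is weak, the iterates approach the minimizer $(1.5484, -2.1935)$ within a few sweeps.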


In terms of convergence rate comparisons, D.G. Lu- 
enberger [3] remarks that such comparisons are not 
easy. However, an interesting analysis presented there 
indicates that roughly n − 1 coordinate searches can be
as effective as a single gradient search. Unless the variables
are practically uncoupled from one another,
coordinate search seems to require approximately n line 
searches to bring about the same effect as one step of 
steepest descent. 

It can generally be proved that the cyclic coordinate 
method, when applied to a differentiable function, will 
converge to a stationary point [1,3]. However, when 
differentiability is not present then the method can stall 
at a suboptimal point. Interestingly, there are ways to
overcome such difficulties, such as applying at every
pth iteration (p being a heuristic, user-specified number)
the search direction $x^{(k+1)} - x^{(k)}$. This is even applied
in practice for differentiable functions, as it is found to 
be helpful in accelerating convergence. These modifi- 
cations are referred to as acceleration steps or pattern 
searches. 


See also 
Data envelopment analysis (DEA) is a novel technique
based on linear programming for evaluating the relative
performance of similar units, referred to as decision
making units (DMUs). The system under evaluation
consists of n DMUs: each DMU consumes varying
amounts of $m_1$ different inputs (resources) to produce
$m_2$ different outputs (products). Specifically, the
jth DMU is characterized by the input vector $x^j \ge 0$ and
the output vector $y^j \ge 0$. The aim of DEA is to discern,
for each DMU, whether or not it is operating in an efficient
way, given its inputs and outputs, relative to all
remaining DMUs under consideration. The measure of
efficiency is the ratio of a weighted sum of the outputs
to a weighted sum of the inputs. For each DMU, the
weights are different and are obtained by solving a linear
programming problem with the objective of showing
the DMU in the best possible light.


The ability to deal directly with incommensurable
inputs and outputs, the possibility of each DMU
adopting a different set of weights and the focus on individual
observations, in contrast to averages, are among
the most appealing features of models based on DEA.

A process is defined as output-efficient if there is no
other process that, using the same or a smaller amount
of inputs, produces a higher level of outputs. A process
is defined as input-efficient if there is no other process
that produces the same or a higher level of outputs using
a smaller amount of inputs. For each orientation there
are four possible models:

1) the ‘constant returns’ model; 

2) the ‘variable returns’ model; 

3) the ‘increasing returns’ model; 

4) the ‘decreasing returns’ model. 

Each model is defined by a specific set of economic as- 
sumptions regarding the relation between inputs and 
outputs [10,11]. Associated with each of the four DEA 
models, independent of the orientation, there is a pro- 
duction possibility set, that is, the set of all possible in- 
puts and outputs for the entire system. This set consists 
of the n DMUs and of ‘virtual’ DMUs obtained as linear
combinations of the original data. The efficient frontier
is a subset of the boundary points of this production set. 
The objective of DEA is to determine if the DMU un- 
der evaluation lies on the efficient frontier and to assign 
a score based on the distance from this frontier [6]. 

The production set for the ‘constant returns’ model
is

$$ T_C = \Big\{(x, y) \ge 0 : x \ge \sum_{j=1}^n x^j \lambda_j,\ y \le \sum_{j=1}^n y^j \lambda_j,\ \lambda_j \ge 0,\ j = 1, \dots, n\Big\}, $$

while for the ‘variable returns’ model we have

$$ T_V = \Big\{(x, y) \ge 0 : x \ge \sum_{j=1}^n x^j \lambda_j,\ y \le \sum_{j=1}^n y^j \lambda_j,\ \lambda_j \ge 0,\ \sum_{j=1}^n \lambda_j = 1\Big\}. $$

The production sets for the ‘increasing’ (resp. ‘decreasing’)
returns models are similar to the set $T_V$ above,
with the equality constraint $\sum_{j=1}^n \lambda_j = 1$ replaced by the
inequality $\sum_{j=1}^n \lambda_j \le 1$ (resp. $\sum_{j=1}^n \lambda_j \ge 1$).
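The ‘constant returns, input oriented’ envelopment
LP for the reference DMU $j^*$ (the DMU under evaluation) is given next:

$$ \begin{aligned} \min_{\theta,\, \lambda \ge 0}\ & \theta \\ \text{s.t.}\ & \sum_{j=1}^n x^j \lambda_j - \theta x^{j^*} \le 0, \\ & \sum_{j=1}^n y^j \lambda_j \ge y^{j^*}. \end{aligned} \qquad (1) $$

For the ‘constant returns, output oriented’ case we
have, instead, the following LP problem [4]:

$$ \begin{aligned} \max_{\psi,\, \lambda \ge 0}\ & \psi \\ \text{s.t.}\ & \sum_{j=1}^n x^j \lambda_j \le x^{j^*}, \\ & \sum_{j=1}^n y^j \lambda_j - \psi y^{j^*} \ge 0. \end{aligned} \qquad (2) $$

In both cases the additional constraint

$$ \sum_{j=1}^n \lambda_j = 1, \qquad \sum_{j=1}^n \lambda_j \le 1, \qquad \sum_{j=1}^n \lambda_j \ge 1 $$

defines the LP for the variable, increasing and decreasing
returns DEA models, respectively.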

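The corresponding dual problem for the input-oriented
case is

$$ \begin{aligned} \max_{\pi,\sigma,\beta}\ & \sigma^T y^{j^*} + \beta \\ \text{s.t.}\ & -\pi^T x^j + \sigma^T y^j + \beta \le 0, \quad j = 1, \dots, n, \\ & \pi^T x^{j^*} \le 1, \\ & \pi \ge 0,\ \sigma \ge 0, \end{aligned} \qquad (3) $$

where $\beta = 0$, $\beta$ unrestricted, $\beta \le 0$ and $\beta \ge 0$ for
the constant, variable, increasing and decreasing returns
DEA models, respectively.

For the output-oriented case the dual is:

$$ \begin{aligned} \min_{\pi,\sigma,\beta}\ & \pi^T x^{j^*} + \beta \\ \text{s.t.}\ & \pi^T x^j - \sigma^T y^j + \beta \ge 0, \quad j = 1, \dots, n, \\ & \sigma^T y^{j^*} \ge 1, \\ & \pi \ge 0,\ \sigma \ge 0, \end{aligned} \qquad (4) $$

with $\beta = 0$, $\beta$ unrestricted, $\beta \ge 0$ and $\beta \le 0$ for the constant,
variable, increasing and decreasing returns DEA
models, respectively.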

For the ‘input-oriented, constant returns’ case, the
reference DMU $j^*$ is
• inefficient if
  - the optimal value of Problem (1) is different from 1, or
  - the optimal value of Problem (1) is equal to 1 but
    there exists an optimal solution with at least one
    slack variable strictly positive;
• efficient in the remaining cases.
Moreover, the efficient DMU $j^*$ can be
• extreme-efficient if Problem (1) has the unique solution
  $\lambda_{j^*} = 1$, $\lambda_j = 0$ for $j = 1, \dots, n$, $j \ne j^*$;
• nonextreme efficient when Problem (1) has alternative
  optimal solutions.
The efficiency for the other models is defined in a similar
manner.

The conditions $\theta \ge 0$ and $\psi \ge 0$ can be introduced
without loss of generality in (1) and (2), since only nonnegative
values for these variables are possible given our
assumption on the data. Since $\lambda_{j^*} = 1$, $\lambda_j = 0$ for $j \ne j^*$,
$\theta = 1$, and $\lambda_{j^*} = 1$, $\lambda_j = 0$ for $j \ne j^*$, $\psi = 1$ are always
feasible for (1) and (2), respectively, the optimal objective
function value lies in the interval (0, 1] for the input
orientation case and in $[1, \infty)$ for the output orientation
case.

The linear programs (1) and (3) above can be interpreted
in the following way. In the input-oriented
case, we compare the reference DMU $j^*$ with a ‘virtual’
DMU obtained as a linear combination of the original
DMUs. Each input and output of this virtual DMU is
a linear combination of the corresponding components
of the inputs and outputs of all the DMUs. The optimal
value is, in this case, always less than or equal to 1. If
the optimal value is strictly less than 1, then it is possible
to construct a virtual DMU that produces at least the
same amount of outputs as the reference DMU using an
amount of inputs that is strictly smaller than the amount
used by the $j^*$th DMU. When this happens we declare
the DMU $j^*$ inefficient. Instead, when the optimal value
is equal to 1 there are three possible cases:

• there exists an optimal solution with at least one
slack variable strictly positive;
• the optimal solution is unique;
• there exist multiple optimal solutions.

In the first case we declare the reference DMU inefficient.
In the last two cases the $j^*$th DMU is efficient
(extreme-efficient and nonextreme efficient, respectively).

For Problem (3), the optimal solutions $\pi^*$ and $\sigma^*$
represent the weights that are the most favorable for
the reference DMU. These are the weights that produce the
highest efficiency score under the hypothesis that, using
the same weights for the other DMUs, their efficiency
never exceeds 1.

Similar interpretations can be given for the output-oriented
case for Problems (2) and (4).
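Problem (1) and its variants are small LPs that any LP solver handles. The sketch below assumes SciPy's `scipy.optimize.linprog` with the HiGHS backend. The function name and the row-per-DMU layout of X and Y are illustrative choices. The `returns` argument adds the extra constraint on $\sum_j \lambda_j$ described after (2), using the text's naming of the ‘increasing’ and ‘decreasing’ models. The function returns $\theta^*$ and an optimal $\lambda$. It does not examine slacks, so it is not a complete efficiency classification.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_efficiency(X, Y, k, returns="constant"):
    """Solve the input-oriented envelopment LP (1) for reference DMU k.

    X: (n, m1) inputs, Y: (n, m2) outputs, one row per DMU.
    Returns (theta*, lambda*).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m1 = X.shape
    m2 = Y.shape[1]
    c = np.zeros(n + 1)
    c[0] = 1.0                                       # variables: [theta, lambda_1..lambda_n]
    A_in = np.hstack([-X[k][:, None], X.T])          # sum_j x^j lambda_j - theta x^k <= 0
    A_out = np.hstack([np.zeros((m2, 1)), -Y.T])     # -sum_j y^j lambda_j <= -y^k
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m1), -Y[k]])
    conv = np.concatenate([[0.0], np.ones(n)])       # coefficients of sum_j lambda_j
    A_eq, b_eq = None, None
    if returns == "variable":                        # sum lambda = 1
        A_eq, b_eq = conv[None, :], [1.0]
    elif returns == "increasing":                    # sum lambda <= 1
        A_ub, b_ub = np.vstack([A_ub, conv]), np.append(b_ub, 1.0)
    elif returns == "decreasing":                    # sum lambda >= 1
        A_ub, b_ub = np.vstack([A_ub, -conv]), np.append(b_ub, -1.0)
    elif returns != "constant":
        raise ValueError(f"unknown returns model: {returns}")
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun, res.x[1:]
```

As an example, take four DMUs with inputs (1, 4), (2, 2), (4, 1) and (3, 3), each producing one unit of output. The last DMU gets $\theta^* = 2/3$: radially contracting its inputs reaches the second DMU. The second DMU scores 1.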

Figure 1 represents the production possibility
set and the efficient frontier for the DMUs ‘A’ to ‘F’.
These DMUs are characterized by two different inputs
and a single output set to some fixed value.

All the DMUs are efficient except the DMU ‘F’. The
DMU ‘B’ is efficient but nonextreme. The virtual DMU
‘K’, obtained as a convex combination of the DMUs ‘C’
and ‘D’, is more efficient than the DMU ‘F’. The optimal
value $\theta^*$ of the linear programming problem (1) for
the DMU ‘F’ is exactly the ratio OK/OF of the lengths of the
segments OK and OF.


Data Envelopment Analysis, Figure 1
Two-input, single output DMUs (single fixed output)

Data Envelopment Analysis, Figure 2
Two-output, single input DMUs (single fixed input)


Figure 2 shows the case of DMUs characterized by
two distinct outputs and a single input set to a fixed
value. All the DMUs are efficient except the DMU ‘F’,
which is dominated by the virtual DMU ‘K’. The optimal
value $\psi^*$ of the linear programming problem (2) for
the DMU ‘F’ is the ratio OK/OF of the lengths of the segments
OK and OF.

The original ‘constant returns’ model was proposed
in [4]. In [2] the variable returns model was proposed
with the objective of discriminating between technical
efficiency and scale efficiency. The bibliography published
in [7] (part of [3]) contains more than 500 references
to articles published in the period 1978-1992, and
many more articles have appeared since.

In all the DEA models discussed above, all efficient
DMUs receive an equal score of 1. An important modification
proposed in [1] makes it possible to rank efficient units.
The main idea is to exclude the column being scored
from the DEA envelopment LP technology matrix. The
efficiency score then takes values in (0, +∞] in both
orientations. Reference [5] discusses in detail the issues
(infeasibility, relationship between modified and standard
formulation, degeneracy, interpretation of the optimal
solutions) related to these DEA models.
In [8] and [9] the properties of ‘unit invariance’ (independence
of the units in which inputs and outputs
are measured) and ‘translation invariance’ (independence
of an affine translation of the inputs and the outputs)
of a DEA efficiency measure are discussed. The
translation invariance property is particularly important
when data contain zero or negative values. Standard
DEA models are not, in general, both unit invariant and translation
invariant. In [8] a weighted additive DEA
model that satisfies these properties is proposed:

$$ \begin{aligned} \max_{\lambda,\, s^+,\, s^-}\ & \sum_{i=1}^{m_1} w_i^- s_i^- + \sum_{r=1}^{m_2} w_r^+ s_r^+ \\ \text{s.t.}\ & \sum_{j=1}^n x_i^j \lambda_j + s_i^- = x_i^{j^*}, \quad i = 1, \dots, m_1, \\ & \sum_{j=1}^n y_r^j \lambda_j - s_r^+ = y_r^{j^*}, \quad r = 1, \dots, m_2, \\ & \sum_{j=1}^n \lambda_j = 1, \\ & \lambda_j \ge 0,\ s_i^- \ge 0,\ s_r^+ \ge 0, \end{aligned} \qquad (5) $$

where $w_i^-$ and $w_r^+$ are the reciprocals of the sample standard deviations of
the input and output variables, respectively.
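The weighted additive model (5) can be set up the same way. Some design choices in this sketch:

- It again assumes SciPy's `linprog`.
- It maximizes the weighted slack sum. Under minimization, the choice $\lambda_{j^*} = 1$ with zero slacks would always be optimal.
- The weights are taken as reciprocals of the sample standard deviations (ddof = 1), which is what makes the objective unit invariant.
- Every input and output must vary across DMUs, so that no standard deviation is zero.

The function name is hypothetical. A score of 0 means that all optimal slacks vanish, i.e. the DMU lies on the efficient frontier.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_additive_score(X, Y, k):
    """Optimal value of model (5) for DMU k (0 means no positive slack exists).

    X: (n, m1) inputs, Y: (n, m2) outputs, one row per DMU. Weights are
    1 / sample standard deviation of each input and output column.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m1 = X.shape
    m2 = Y.shape[1]
    w_in = 1.0 / X.std(axis=0, ddof=1)
    w_out = 1.0 / Y.std(axis=0, ddof=1)
    # Variables: [lambda_1..lambda_n, s^-_1..s^-_m1, s^+_1..s^+_m2]
    c = -np.concatenate([np.zeros(n), w_in, w_out])   # linprog minimizes, so negate
    A_eq = np.zeros((m1 + m2 + 1, n + m1 + m2))
    A_eq[:m1, :n] = X.T
    A_eq[:m1, n:n + m1] = np.eye(m1)                 # sum_j x^j lambda_j + s^- = x^k
    A_eq[m1:m1 + m2, :n] = Y.T
    A_eq[m1:m1 + m2, n + m1:] = -np.eye(m2)          # sum_j y^j lambda_j - s^+ = y^k
    A_eq[-1, :n] = 1.0                               # sum_j lambda_j = 1
    b_eq = np.concatenate([X[k], Y[k], [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + m1 + m2), method="highs")
    return -res.fun
```

Rescaling an input (a change of units) or adding a constant to an output (a translation) leaves the score unchanged. This is easy to check numerically with the function above.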

Models based on data envelopment analysis have
been widely used to evaluate efficiency in both the
public and the private sector. [3, Part II] contains 15 applications
of DEA showing the ‘range, power, elegance
and insight obtainable via DEA analysis’. Banks, hospitals,
and universities are among the most challenging
sectors where models based on DEA have been able to
assess efficiency and determine the strengths and weaknesses
of the various units.


See also 


> Optimization and Decision Support Systems 
Introduction


Data mining has proven valuable in almost every as- 
pect of life involving large data sets. Data mining is 
made possible by the generation of masses of data from 
computer information systems. In engineering, satel- 
lites stream masses of data down to storage systems, 
yielding a mountain of data that needs some sort of data 
mining to enable humans to gain knowledge. Data min- 
ing has been applied in engineering applications such 
as quality [4], manufacturing and service [13], labor 
scheduling [17], and many other places. Medicine has 
been an extensive user of data mining, both in the tech- 
nical area [21] and in health policy [6]. Pardalos [16] 
provide recent research in this area. Governmental op- 
erations have received support from data mining, pri- 
marily in the form of fraud detection [9]. 

In business, data mining has been instrumental 
in customer relationship management [5,8], financial 
analysis [3,12], credit card management [1], health 
service debt management [22], banking [19], insur- 
ance [18], and many other areas of business involv- 
ing services. Kusiak [13] reviewed data mining appli- 
cations to include service applications of operations. 
Recent reports of data mining applications in web 
service and technology include Tseng and Lin [20] 
and Hou and Yang [11]. In addition to Tseng and 
Lin, Lee et al. [14] discuss issues involving mobile 
technology and data mining. Data mining support 
is required to make sense of the masses of business 
data generated by computer technology. Understand- 
ing this information-generation system and tools avail- 
able leading to analysis is fundamental for business in 
the 21st century. The major applications have been in 
customer segmentation (by banks and retail establish- 
ments wishing to focus on profitable customers) and in 
fraud and rare event detection (especially by insurance 
and government, as well as by banks for credit scor- 
ing). Data mining has been used by casinos in customer 
management, and by organizations evaluating person- 
nel. 

We will discuss data mining functions, data min- 
ing process, data systems often used in conjunction 
with data mining, and provide a quick review of soft- 
ware tools. Four prototypical applications are given to 
demonstrate data mining use in business. Ethical issues 
will also be discussed. 


Definitions 


There are a few basic functions that have been applied
in business. Bose and Mahapatra [2] provided an extensive
list of applications by area, technique, and problem
type.

• Classification uses a training data set to identify
classes or clusters, which are then used to categorize
data. Typical applications include categorizing
risk and return characteristics of investments, and
credit risk of loan applicants. The Adams [1] case,
for example, involved classification of loan applications
into groups of expected repayment and expected
problems.

• Prediction identifies key attributes from data to develop
a formula for predicting future cases, as
in regression models. The Sung et al. [19] case predicted
bankruptcy, while the Drew et al. [5] case and
the customer retention part of the Smith et al. [18]
case predicted churn.

• Association identifies rules that determine the
relationships among entities, such as in market
basket analysis, or the association of symptoms
with diseases. IF-THEN rules were shown in the
Sung et al. [19] case.

• Detection determines anomalies and irregularities,
which is valuable in fraud detection. This was used in claims
analysis by Smith et al. [18].

To provide analysis, data mining relies on some fun- 
damental analytic approaches. Regression and neural 
network approaches are alternative ways to identify the 
best fit in a given set of data. Regression tends to have 
advantages with linear data, while neural network mod- 
els do very well with irregular data. Software usually 
allows the user to apply variants of each, and lets the 
analyst select the model that fits best. Cluster analysis, 
discriminant analysis, and case-based reasoning seek to 
assign new cases to the closest cluster of past observa- 
tions. Rule induction is the basis of decision tree meth- 
ods of data mining. Genetic algorithms apply to special 
forms of data, and are often used to boost or improve 
the operation of other techniques. 

In order to conduct data mining analyses, a data
mining process is useful. The Cross-Industry Standard
Process for Data Mining (CRISP-DM) is widely used
by industry members [15]. This model consists of six
phases intended as a cyclical process:
• Business understanding: Business understanding
includes determining business objectives, assessing 
the current situation, establishing data mining goals, 
and developing a project plan. 

• Data understanding: Once business objectives and
the project plan are established, data understanding 
considers data requirements. This step can include 
initial data collection, data description, data explo- 
ration, and the verification of data quality. Data ex- 
ploration such as viewing summary statistics (which 
includes the visual display of categorical variables) 
can occur at the end of this phase. Models such 
as cluster analysis can also be applied during this 
phase, with the intent of identifying patterns in the 
data. 

• Data preparation: Once the data resources available
are identified, they need to be selected, cleaned, built 
into the form desired, and formatted. Data clean- 
ing and data transformation in preparation for data 
modeling needs to occur in this phase. Data explo- 
ration at a greater depth can be applied during this 
phase, and additional models utilized, again provid- 
ing the opportunity to see patterns based on busi- 
ness understanding. 

• Modeling: Data mining software tools such as vi-
sualization (plotting data and establishing relation-
ships) and cluster analysis (to identify which vari- 
ables go well together) are useful for initial analy- 
sis. Tools such as generalized rule induction can de- 
velop initial association rules. Once greater data un- 
derstanding is gained (often through pattern recog- 
nition triggered by viewing model output), more de- 
tailed models appropriate to the data type can be ap- 
plied. The division of data into training and test sets 
is also needed for modeling (sometimes even more 
sets are needed for model refinement). 

• Evaluation: Model results should be evaluated in
the context of the business objectives established 
in the first phase (business understanding). This 
will lead to the identification of other needs (often 
through pattern recognition), frequently reverting 
to prior phases of CRISP-DM. Gaining business un- 
derstanding is an iterative procedure in data mining, 
where the results of various visualization, statistical, 
and artificial intelligence tools show the user new re- 
lationships that provide a deeper understanding of 
organizational operations. 


• Deployment: Data mining can be used both to
verify previously held hypotheses, and for knowl- 
edge discovery (identification of unexpected and 
useful relationships). Through the knowledge dis- 
covered in the earlier phases of the CRISP-DM 
process, sound models can be obtained that may 
then be applied to business operations for many 
purposes, including prediction or identification of 
key situations. These models need to be monitored 
for changes in operating conditions, because what 
might be true today may not be true a year from 
now. If significant changes do occur, the model 
should be redone. It is also wise to record the results 
of data mining projects so documented evidence is 
available for future studies. 

This six-phase process is not a rigid, by-the-num- 
bers procedure. There is usually a great deal of back- 
tracking. Additionally, experienced analysts may not 
need to apply each phase for every study. But CRISP- 
DM provides a useful framework for data mining. 

There are many database systems that provide con- 
tent needed for data mining. Database software is avail- 
able to support individuals, allowing them to record 
information that they consider personally important. 
They can extract information provided by repetitive 
organizational reports, such as sales by region within 
their area of responsibility, and regularly add external 
data such as industry-wide sales, as well as keep records 
of detailed information such as sales representative ex- 
pense account expenditure. 

• Data warehousing is an orderly and accessible
repository of known facts and related data that is 
used as a basis for making better management de- 
cisions. Data warehouses provide ready access to 
information about a company’s business, products, 
and customers. This data can be from both internal 
and external sources. Data warehouses are used to 
store massive quantities of data in a manner that can 
be easily updated and allow quick retrieval of spe- 
cific types of data. Data warehouses often integrate 
information from a variety of sources. Data needs to 
be identified and obtained, cleaned, catalogued, and 
stored in a fashion that expedites organizational de- 
cision making. Three general data warehouse pro- 
cesses exist. (1) Warehouse generation is the pro- 
cess of designing the warehouse and loading data. 
(2) Data management is the process of storing the 
data. (3) Information analysis is the process of using
the data to support organizational decision making. 
Data marts are sometimes used to extract specific 
items of information for data mining analysis. Ter- 
minology in this field is dynamic, and definitions 
have evolved as new products have entered the mar- 
ket. Originally, many data marts were marketed as 
preliminary data warehouses. Currently, many data 
marts are used in conjunction with data warehouses 
rather than as competitive products. But also many 
data marts are being used independently in order to 
take advantage of lower-priced software and hard- 
ware. Data marts are usually used as repositories of 
data gathered to serve a particular set of users, pro- 
viding data extracted from data warehouses and/or 
other sources. Designing a data mart tends to begin 
with the analysis of user needs. The information that 
pertains to the issue at hand is relevant. This may 
involve a specific time-frame and specific products, 
people, and locations. Data marts are available for 
data miners to transform information to create new 
variables (such as ratios, or coded data suitable for 
a specific application). In addition, only that infor- 
mation expected to be pertinent to the specific data 
mining analysis is extracted. This vastly reduces the 
computer time required to process the data, as data 
marts are expected to contain small subsets of the 
data warehouse’s contents. Data marts are also ex- 
pected to have ample space available to generate ad- 
ditional data by transformation. 

Online analytical processing (OLAP) is a multi- 
dimensional spreadsheet approach to shared data 
storage designed to allow users to extract data and 
generate reports on the dimensions important to 
them. Data is segregated into different dimensions 
and organized in a hierarchical manner. Many vari- 
ants and extensions are generated by the OLAP ven- 
dor industry. A typical procedure is for OLAP prod- 
ucts to take data from relational databases and store 
them in multidimensional form, often called a hy- 
percube, to reflect the OLAP ability to access data on 
these multiple dimensions. Data can be analyzed lo- 
cally within this structure. One function of OLAP is 
standard report generation, including financial per- 
formance analysis on selected dimensions (such as 
by department, geographical region, product, salesperson,
time, or other dimensions desired by the analyst).
Planning and forecasting are supported through
spreadsheet analytic tools. Budgeting calculations can
also be included through spreadsheet tools. Usually,
pattern analysis tools are available.
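The multidimensional (hypercube) view described above can be illustrated with a minimal Python sketch. It stores a few hypothetical sales facts keyed by region, product, and quarter, and "rolls up" the cube by summing over every dimension the analyst does not keep. The data, the dimension names, and the `roll_up` function are invented for illustration. Real OLAP products use their own storage engines and query languages.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, sales)
FACTS = [
    ("North", "Widgets", "Q1", 120.0),
    ("North", "Gadgets", "Q1", 80.0),
    ("South", "Widgets", "Q1", 200.0),
    ("South", "Widgets", "Q2", 150.0),
    ("North", "Widgets", "Q2", 90.0),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum sales over every dimension not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for *coords, sales in facts:
        # The aggregation key contains only the coordinates that are kept.
        totals[tuple(coords[i] for i in positions)] += sales
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(FACTS, ["region"]))              # totals by region
    print(roll_up(FACTS, ["product", "quarter"]))  # product-by-quarter slice
```

Keeping no dimensions at all yields the grand total. Keeping all three returns the original cells.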

There are many statistical and analytic software 
tools marketed to provide data mining. Many good 
data mining software products are being used, including
the well-established (and expensive) Enterprise Miner
by SAS and Intelligent Miner by IBM, CLEMENTINE by
SPSS (somewhat more accessible to students), PolyAnalyst
by Megaputer, and many others in a growing and dynamic
industry. Microsoft, for instance, has recently improved
SQL Server 2005 substantially, making it a more usable
system focused on the database perspective.

These products use one or more of a number of an- 
alytic approaches, often as complementary tools that 
might involve initial cluster analysis to identify rela- 
tionships and visual analysis to try to understand why 
data clustered as it did, followed by various prediction 
models. The major categories of methods applied are 
regression, decision trees, neural networks, cluster
detection, and market basket analysis. The Web site
www.KDnuggets.com gives information on many products,
classified by function. In the category of overall data
mining suites, it lists 56 products in addition to 16
free or shareware products. Specialized software products
are listed by approach: multiple approaches (15 commercial
plus 3 free), decision trees (15 plus 10 free), rule-based
(7 plus 4 free), neural networks (12 plus 3 free),
Bayesian (13 plus 11 free), support vector machines (3
plus 8 free), cluster analysis (8 plus 10 free), and text
mining (50 plus 4 free), along with other software for
functions such as statistical analysis, visualization,
and Web usage analysis.


Example Applications 


There are many applications of data mining. Here we 
present four short examples in the business world. 


Customer Relationship Management (CRM) 


The idea of customer relationship management is to 
target customers for special treatment based on their 
anticipated future value to the firm. This requires esti- 
mation of where in the customer life-cycle each subject 
is, as well as lifetime customer value, based on expected 
tenure with the company, monthly transactions by that
customer, and the cost of providing service. Lifetime 
value of a customer is the discounted expected stream 
of cash flow generated by the customer. 
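The lifetime value just defined is a discounted sum, and can be computed directly. Below is a minimal Python sketch. It assumes two modelling choices that are not taken from the text: expected tenure is captured by a constant monthly retention probability, and the firm discounts at a fixed annual rate. All parameter names are hypothetical.

```python
def lifetime_value(monthly_revenue, monthly_service_cost, monthly_retention,
                   annual_discount_rate, horizon_months):
    """Discounted expected cash flow from one customer (hypothetical model).

    Each month's margin (revenue minus cost of service) is weighted by the
    probability that the customer is still active and discounted to today.
    """
    # Convert the annual discount rate to an equivalent monthly rate.
    monthly_rate = (1.0 + annual_discount_rate) ** (1.0 / 12.0) - 1.0
    margin = monthly_revenue - monthly_service_cost
    survival = 1.0
    ltv = 0.0
    for month in range(1, horizon_months + 1):
        survival *= monthly_retention  # P(customer still active this month)
        ltv += margin * survival / (1.0 + monthly_rate) ** month
    return ltv


if __name__ == "__main__":
    print(round(lifetime_value(50.0, 20.0, 0.95, 0.10, 36), 2))
```

With retention 1 and no discounting, the value reduces to margin times horizon. Lower retention or a higher discount rate reduces it.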

Many companies applying CRM score each individ- 
ual customer by their estimated lifetime value (LTV), 
stored in the firm’s customer database [5]. This concept 
has been widely used in catalog marketing, newspaper 
publishing, retailing, insurance, and credit cards. LTV 
has been the basis for many marketing programs offer- 
ing special treatment such as favorable pricing, better 
customer service, and equipment upgrades. 

While CRM is very promising, it has often been 
found to be less effective than hoped [10]. CRM systems 
can cost up to $70 million to develop, with additional 
expenses incurred during implementation. Many of the 
problems in CRM expectations have been blamed on 
over-zealous sales pitches. CRM systems offer many
opportunities to operate more efficiently. However, they
are not silver bullets, and their benefits are not unlimited. As with
any system, prior evaluation of benefits is very difficult, 
and investments in CRM systems need to be based on 
sound analysis and judgment. 


Credit Scoring 


Data mining can involve model building (extension of 
conventional statistical model building to very large 
data sets) and pattern recognition. Pattern recognition 
aims to identify groups of interesting observations,
where interesting means knowledge that is both
important and unexpected. Often experts are used to
assist in pattern recognition. Adams et al. [1] compared 
data mining used for model building and pattern recog- 
nition on the behavior of customers over a one-year 
period. The data set involved bank accounts at a large 
British credit card company observed monthly. These 
accounts were revolving loans with credit limits. Bor- 
rowers were required to repay at least some minimum 
amount each month. Account holders who paid in full
were charged no interest and were thus not attractive
to the lender.

We have seen that clustering and pattern search are 
typically the first activities in data analysis. Then
appropriate models are built. Credit scoring uses the
results of data mining models for two purposes.
Application scoring was applied in the Adams et al.
example to new cases, continuing an activity that had
been done manually for half a century in this
organization. Behavioral scoring monitors revolving
credit accounts with the intent of gaining early warning
of accounts facing difficulties.


Bankruptcy Prediction 


Corporate bankruptcy prediction is very important to 
management, stockholders, employees, customers, and 
other stakeholders. A number of data mining tech- 
niques have been applied to this problem, including 
multivariate discriminant analysis, logistic regression,
probit, genetic algorithms, neural networks, and deci- 
sion trees. 

Sung et al. [19] applied decision analysis and deci- 
sion tree models to a bankruptcy prediction case. De- 
cision tree models provide a series of IF-THEN rules 
to predict bankruptcy. Pruning (raising the proportion 
of accurate fit required to keep a specific IF-THEN re- 
lationship) significantly increased overall prediction ac- 
curacy in the crisis period, indicating that data collected 
in the crisis period was more influenced by noise than 
data from the period with normal conditions. Example 
rules obtained were as shown in Table 1, giving an idea 
of how decision tree rules work. 

For instance, in normal conditions, if the variable 
Productivity of capital (E6) was greater than 19.65, the 
model would predict firm survival with 86 percent con- 
fidence. Conversely, if Productivity of capital (E6) was 
less than or equal to 19.65, and if the Ratio of cash flow 
to total assets (C9) was less than or equal to 5.64, the 
model would predict bankruptcy with 84 percent confi- 
dence. These IF-THEN rules are stated in ways that are 
easy for management to see and use. Here the rules are 
quite simple, a desirable feature. With large data sets, it 
is common to generate hundreds of clauses in decision
tree rules, which increases accuracy but makes the rules
difficult to implement. The number of rules can be
controlled through pruning rates within the software. 
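The normal-period rules quoted above translate directly into code. Below is a minimal Python sketch. It assumes the two input ratios are already computed as in Sung et al. The function name is hypothetical. The confidence for the branch with E6 at most 19.65 and C9 above 5.64 is returned as `None`, because the text does not quote a value for it.

```python
def predict_normal_period(e6, c9):
    """Normal-period IF-THEN rules quoted in the text (Sung et al.).

    e6: productivity of capital (E6); c9: ratio of cash flow to total assets (C9).
    Returns (prediction, confidence); confidence is None where none is quoted.
    """
    if e6 > 19.65:
        return "nonbankrupt", 0.86
    if c9 <= 5.64:  # reached only when e6 <= 19.65
        return "bankrupt", 0.84
    return "nonbankrupt", None  # e6 <= 19.65 but c9 > 5.64


if __name__ == "__main__":
    print(predict_normal_period(25.0, 1.0))
    print(predict_normal_period(10.0, 3.0))
```

Each branch corresponds to one IF-THEN clause. This transparency is what makes such rules easy for management to use.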


Fraud Detection 


Data mining has successfully supported many aspects
of the insurance business, including fraud detection,
underwriting, insolvency prediction, and customer
segmentation. An insurance firm had a large data
warehouse system recording details on every transaction
and claim [18]. An aim of the analysis was to accurately
predict average claim costs and frequency, and to
examine the impact of pricing on profitability.

Data Mining, Table 1
Bankruptcy Prediction Rules

Condition | Rule | Prediction | Confidence level
Normal | E6 > 19.65 | Nonbankrupt | 0.86
Normal | C9 > 5.64 | Nonbankrupt |
Normal | C9 ≤ 5.64 & E6 ≤ 19.65 | Bankrupt | 0.84
Crisis | E6 > 20.61 | Nonbankrupt | 0.91
Crisis | C8 > 2.64 | Nonbankrupt | 0.85
Crisis | C3 > 87.23 | Nonbankrupt | 0.86
Crisis | C8 ≤ 2.64, E6 ≤ 20.61, & C3 ≤ 87.23 | Bankrupt |

Where C3 = Ratio of fixed assets to equity & long-term liabilities. C8 = Ratio of
cash flow to liabilities. C9 = Ratio of cash flow to total assets. E6 = Productivity
of capital. Based on Sung et al. [19]

In evaluating claims, data analysis for hidden trends 
and patterns is needed. In this case, recent growth in the 
number of policy holders led to lower profitability for 
the company. Understanding the relationships between 
cause and effect is fundamental to understanding what 
business decisions would be appropriate. 

Policy rates are based on statistical analysis assum- 
ing various distributions for claims and claim size. In 
this case, clustering was used to better model the per- 
formance of specific groups of insured. 

Profitability in insurance is often expressed by the 
cost ratio, or sum of claim costs divided by sum of pre- 
miums. Claim frequency ratio is the number of claims 
divided by the number of policy units of risk (possible 
claims). Profitability would be improved by lowering 
the frequency of claims, or the costs of claims relative
to premiums. 
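The two ratios can be computed per customer segment, for instance per cluster. In this minimal Python sketch, the record layout `(segment, premium, claim_cost, n_claims, units_of_risk)` and the function name are hypothetical.

```python
from collections import defaultdict


def segment_ratios(policies):
    """Cost ratio and claim frequency ratio per segment.

    policies: iterable of (segment, premium, claim_cost, n_claims, units_of_risk).
    cost ratio      = sum of claim costs / sum of premiums
    frequency ratio = number of claims / number of policy units of risk
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0, 0.0])  # premium, cost, claims, units
    for segment, premium, claim_cost, n_claims, units in policies:
        s = sums[segment]
        s[0] += premium
        s[1] += claim_cost
        s[2] += n_claims
        s[3] += units
    return {
        seg: (cost / premium, claims / units)
        for seg, (premium, cost, claims, units) in sums.items()
    }


if __name__ == "__main__":
    print(segment_ratios([("young", 500.0, 300.0, 1, 1.0),
                          ("other", 1000.0, 100.0, 0, 2.0)]))
```

A segment whose cost ratio exceeds 1 pays out more in claims than it brings in as premiums. Lowering either ratio improves profitability.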

Data was extracted from the data warehouse for 
policies for which premiums were paid in the first quar- 
ter over a three-year period. This meant that policies 
were followed over the period, augmented by new poli- 
cies, and diminished by terminations. Data on each 
policy holder was available as well as claim behavior 
over the preceding year. The key variables of cost ra- 
tio and claim frequency ratio were calculated for each 
observation. Sample sizes for each quarter were well 
above 100,000. 

Descriptive statistics found exceptional growth in
policies over the past two years for young people
(under 22) and for cars insured for over $40,000.
Clustering analysis led to the conclusion that predicting
the claim cost of each individual policy holder would be
pointless, as the vast majority of claims could not be
predicted. After experimentation, the study was based on
50 clusters. A basic k-means algorithm was used. This
identified several clusters as having abnormal cost
ratios or claim frequency ratios. By testing over a two-year gap,
stability for each group was determined. Table 2
compares data mining techniques.

Data Mining, Table 2
General Ability of Data Mining Techniques to Deal with Data Features

Data characteristic | Rule induction | Neural networks | Case-based reasoning | Genetic algorithms
Handle noisy data | Good | Very good | | Very good
Handle missing data | Good | Good | | Good
Process large data sets | Very good | Poor | | Good
Process different data types | Good | Transform to numerical | | Transformation needed
Predictive accuracy | High | Very high | | High
Explanation capability | Very good | Poor | | Good
Ease of integration | Good | Good | | Very good
Ease of operation | Easy | Difficult | | Difficult

Extracted from Bose and Mahapatra [2]
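The clustering step in the insurance study used a basic k-means algorithm. The following minimal Python sketch is not the study's implementation: the function names, the random initialization, and the stopping rule are assumptions. It shows Lloyd's iteration, which alternates between assigning each point to its nearest center and moving each center to the mean of its members.

```python
import random


def _sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Basic Lloyd-style k-means on a list of numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return centers, labels
```

In the insurance setting, each point would be a vector of policy-holder attributes and k would be 50, as in the study. Segment-level cost and frequency ratios would then be compared across the resulting clusters.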


Ethical Issues in Data Mining 


Data mining is a potentially useful tool, capable of do- 
ing a lot of good, not only for business but also for 
the medical field and for government. It does, however, 
bring with it some dangers. So, how can we best protect 
ourselves, especially in the area of business data min- 
ing? 

A number of options exist. Strict control of data 
usage through governmental regulation was proposed 
by Garfinkel [7]. A number of large database projects 
that made a great deal of practical sense have ultimately 
been stopped. Those involving government agencies 
were successfully stopped due to public exposure, the 
negative outcry leading to cancellation of the National 
Data Center and the Social Security Administration 
projects. A system with closely held information by 
credit bureaus in the 1960s was only stopped after gov- 
ernmental intervention, which included the passage of 
new laws. Times have changed, with business adopting 
a more responsive attitude toward consumers. Innova- 
tive data mining efforts by Lotus/Equifax and by Lexis- 
Nexis were quickly stopped by public pressure alone. 

Public pressure seems to be quite effective in pro- 
viding some control over potential data mining abuses. 
If that fails, litigation is available (although slow in ef- 
fect). It is necessary for us to realize what businesses 
can do with data. There will never be a perfect system 
to protect us, and we need to be vigilant. However, too 
much control can also be dangerous, inhibiting the abil- 
ity of business to provide better products and services 
through data mining. Garfinkel prefers more govern- 
mental intervention, while we would prefer less gov- 
ernmental intervention and more reliance on publicity 
and, if necessary, the legal system. 

Control would be best accomplished if it were nat- 
urally encouraged by systemic relationships. The first 
systemic means of control is publicity. Should those 
adopting questionable practices persist, litigation is 
a slow, costly, but ultimately effective means of system
correction. However, before taking drastic action,
a good rule is that if the system works, it is best not to fix 
it. The best measure that electronic retailers can take is 
to not do anything that will cause customers to suspect 
that their rights are being violated. 


Conclusions 


Data mining has evolved into a useful analytic tool in
all aspects of human study, including medicine,
engineering, and science. It is a necessary means to cope
with the masses of data that are produced in
contemporary society. Within business, data mining has been
especially useful in applications such as fraud detection,
loan analysis, and customer segmentation. Such
applications heavily impact the service industry. Data
mining provides a way to quickly gain new understanding
based upon large-scale data analysis.

This paper reviewed some of the applications of data
mining in services. It also briefly reviewed the data
mining process, some of the analytic tools available,
and some of the major software vendors of general data
mining products. Specific tools for particular
applications are appearing with astonishing rapidity.
As optimization techniques have become widely used in
engineering, economics, and other sciences, an increasing
number of nonconvex optimization problems are
encountered that can be described in terms of dc functions
(differences of convex functions). These problems are
called dc optimization problems, and the theory dealing
with these problems is referred to as dc programming,
or dc optimization ([3,4,5,6,13]; see also [1,8]).

Historically, the first dc optimization problem that 
was seriously studied is the concave minimization 
problem [11]. Subsequently, reverse convex program- 
ming and some other special dc optimization problems 
such as quadratic and, more generally, polynomial pro- 
gramming problems appeared before a unified theory 
was developed and the term dc optimization was intro- 
duced [12]. In fact, most global optimization problems 
of interest that have been studied so far can be identi- 
fied as dc optimization problems, despite the diversity 
of the approaches used. 


DC Structure in Optimization 


Let $\Omega$ be a convex set in $\mathbb{R}^n$. A function $f: \Omega \to \mathbb{R}$ is
said to be dc on $\Omega$ if it can be expressed as the difference
of two convex functions on $\Omega$: $f(x) = p(x) - q(x)$,
where $p(x), q(x): \Omega \to \mathbb{R}$ are convex. Denote the set
of dc functions on $\Omega$ by $DC(\Omega)$.


Proposition 1 $DC(\Omega)$ is a vector lattice with respect to
the two operations of pointwise maximum and pointwise
minimum.


In other words, if $f_i(x) \in DC(\Omega)$, $i = 1, \dots, m$, then:

1. $\sum_{i=1}^{m} \alpha_i f_i(x) \in DC(\Omega)$, for any real numbers $\alpha_i$;

2. $g(x) = \max\{f_1(x), \dots, f_m(x)\} \in DC(\Omega)$;

3. $h(x) = \min\{f_1(x), \dots, f_m(x)\} \in DC(\Omega)$.

From this property it follows in particular that if
$f \in DC(\Omega)$, then $|f| \in DC(\Omega)$, and if $g, h \in DC(\Omega)$,
then $gh \in DC(\Omega)$. But for the purpose of optimization
the most important consequence is that

$$g_i(x) \le 0 \ \ \forall i = 1, \dots, m \iff g(x) := \max\{g_1(x), \dots, g_m(x)\} \le 0,$$

$$g_i(x) \le 0 \ \text{for at least one } i = 1, \dots, m \iff \tilde g(x) := \min\{g_1(x), \dots, g_m(x)\} \le 0.$$


Therefore, any finite system of dc inequalities, whether 
conjunctive or disjunctive, can be rewritten as a single 
dc inequality. 
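Item 2 of Proposition 1, which underlies this single-inequality reformulation, can be checked directly. A standard explicit decomposition is shown below as an illustration, assuming each $g_i = p_i - q_i$ with $p_i, q_i$ convex on $\Omega$:

$$\max_{i}\,(p_i - q_i) = \max_{i}\Big(p_i + \sum_{j \ne i} q_j\Big) - \sum_{j=1}^{m} q_j,$$

$$\min_{i}\,(p_i - q_i) = \sum_{j=1}^{m} p_j - \max_{i}\Big(q_i + \sum_{j \ne i} p_j\Big).$$

In each line both terms are convex, since a finite sum of convex functions and a pointwise maximum of convex functions are convex. Hence the max-type and min-type constraint functions above are explicitly dc.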

By easy manipulations it is then possible to reduce 
any dc optimization problem to the following canonical 
form: 

$$\text{minimize } f(x) \quad \text{subject to } g(x) \le 0 \le h(x), \qquad \text{(CDC)}$$

where all functions $f, g, h$ are convex.
Thus dc functions allow a very compact description 
of a wide class of nonconvex optimization problems. 


Recognizing dc Functions 


To exploit the dc structure in optimization problems, 
it is essential to be able to recognize dc functions that 
are still in hidden form (i.e., not yet presented as dif- 
ferences of convex functions). The next proposition ad- 
dresses this question. 


Proposition 2 Every function $f \in C^2$ is dc on any
compact convex set $\Omega$.


It follows that any polynomial function is dc, and hence,
by the Weierstrass theorem, $DC(\Omega)$ is dense in the
Banach space $C(\Omega)$ of continuous functions on $\Omega$ with
the sup-norm topology. In other words, any continuous
function can be approximated as closely as desired by
a dc function.
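Proposition 2 rests on a simple construction. If the second derivatives of $f$ are bounded below by $-\rho$ on the compact set, then $f + \frac{\rho}{2}\|x\|^2$ is convex, and $f = (f + \frac{\rho}{2}\|x\|^2) - \frac{\rho}{2}\|x\|^2$. The Python sketch below checks this numerically in one dimension. The choice $f(x) = \sin x$ on $[-\pi, \pi]$ with $\rho = 1$ is only an illustration. Nonnegative second differences on a grid are evidence of convexity, not a proof.

```python
import math

# Illustrative C^2 function on the compact interval [-pi, pi]:
# f(x) = sin(x), with f''(x) = -sin(x) >= -1, so RHO = 1 is a valid bound.
RHO = 1.0


def f(x):
    return math.sin(x)


def p(x):
    """Convex part: p''(x) = -sin(x) + RHO >= 0."""
    return f(x) + 0.5 * RHO * x * x


def q(x):
    """Convex part that is subtracted: q''(x) = RHO > 0."""
    return 0.5 * RHO * x * x


def second_differences(func, a, b, n=400):
    """Central second differences of func on a uniform grid over [a, b]."""
    h = (b - a) / n
    return [
        (func(a + (i - 1) * h) - 2.0 * func(a + i * h) + func(a + (i + 1) * h)) / (h * h)
        for i in range(1, n)
    ]


if __name__ == "__main__":
    print(min(second_differences(f, -math.pi, math.pi)))  # negative: f is not convex
    print(min(second_differences(p, -math.pi, math.pi)))  # approximately nonnegative
```

The same idea works in $\mathbb{R}^n$, with $\rho$ bounding the most negative Hessian eigenvalue over $\Omega$.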

More surprisingly, any closed set $S$ in $\mathbb{R}^n$ can be
shown to be a dc set, i.e., a set that is the solution set
of a dc inequality. Namely, given any closed set $S \subset \mathbb{R}^n$
and any strictly convex function $h: \mathbb{R}^n \to \mathbb{R}$, there
exists a continuous convex function $g_S: \mathbb{R}^n \to \mathbb{R}$ such
that $S = \{x \in \mathbb{R}^n : g_S(x) - h(x) \le 0\}$ [10].

In many situations we not only need to recognize 
a dc function but also to know how to represent it ef- 
fectively as a difference of two convex functions. While 
several classes of functions have been recognized as dc 
functions [2], there are still few results about effective
dc representations of these functions. For composite
functions a useful result about dc representation is the 
following [13]. 


Proposition 3 Let $h(x) = u(x) - v(x)$, where $u, v:
\Omega \to \mathbb{R}_+$ are convex functions on a compact convex
set $\Omega \subset \mathbb{R}^n$ such that $0 \le h(x) \le a$ for all $x \in \Omega$. If
$q: [0, a] \to \mathbb{R}$ is a convex nondecreasing function such
that $q'_-(a) < \infty$ ($q'_-(a)$ being the left derivative of $q(t)$
at $a$), then $q(h(x))$ is a dc function on $\Omega$:

$$q(h(x)) = g(x) - K[a + u(x) + v(x)],$$

where $g(x) = q(h(x)) + K[a + u(x) + v(x)]$ is a convex
function and $K$ is any constant satisfying $K \ge q'_-(a)$.


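For example, by writing $x^\alpha = e^{h(x)}$ with $h(x) =
\sum_{i=1,\dots,n} \alpha_i \log x_i$ and applying the above proposition,
it is easy to see that $x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$, with $\alpha \in \mathbb{R}^n$, is
dc on any box $\Omega = [r, s] \subset \mathbb{R}^n_{++}$. Hence, any synomial
function $f(x) = \sum_\alpha c_\alpha x^\alpha$, with $c_\alpha \in \mathbb{R}$, $\alpha \in \mathbb{R}^n$,
is also dc on $\Omega$.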


Global Optimality Criterion 


A key question in the theoretical as well as computa- 
tional study of a global optimization problem is how to 
test a given feasible solution for global optimality. 

Consider a pair of problems in some sense mutually 
obverse: 


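$$\inf\{f(x) : x \in \Omega,\ h(x) \ge \alpha\}, \qquad (P_\alpha)$$

$$\sup\{h(x) : x \in \Omega,\ f(x) \le \gamma\}, \qquad (Q_\gamma)$$

where $\alpha, \gamma \in \mathbb{R}$, $\Omega$ is a closed set in $\mathbb{R}^n$, and
$f, h: \mathbb{R}^n \to \mathbb{R}$ are two arbitrary functions.
We say that problem $(P_\alpha)$ is regular if

$$\inf P_\alpha = \inf\{f(x) : x \in \Omega,\ h(x) > \alpha\}. \qquad (1)$$

Analogously, problem $(Q_\gamma)$ is regular if $\sup Q_\gamma =
\sup\{h(x) : x \in \Omega,\ f(x) < \gamma\}$.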


Proposition 4 Let $\bar x$ be a feasible solution of problem
$(P_\alpha)$. If $\bar x$ is optimal to problem $(P_\alpha)$ and if problem $(Q_\gamma)$
is regular for $\gamma = f(\bar x)$, then

$$\sup\{h(x) : x \in \Omega,\ f(x) \le \gamma\} = \alpha. \qquad (2)$$

Conversely, if (2) holds and if problem $(P_\alpha)$ is regular,
then $\bar x$ is optimal to $(P_\alpha)$.
Turning now to the canonical dc optimization problem
(CDC), let us set $\Omega = \{x : g(x) \le 0\}$ and without losing
generality assume that the reverse convex constraint
$h(x) \ge 0$ is essential, i.e.,

$$\inf\{f(x) : x \in \Omega\} < \inf\{f(x) : x \in \Omega,\ h(x) \ge 0\}. \qquad (3)$$

Since CDC is a problem $(P_\alpha)$ with $\alpha = 0$, if $\bar x$ is a
feasible solution to CDC, then condition (3) ensures the
regularity of the associated problem $(Q_\gamma)$ for $\gamma = f(\bar x)$.

Define

$$C = \{x : h(x) \le 0\}, \qquad D(\gamma) = \{x \in \Omega : f(x) \le \gamma\}, \qquad (4)$$

and for any set $E$ denote its polar by $E^*$. As specialized
for CDC, Proposition 4 yields:


Proposition 5 In order that a feasible solution $\bar x$ of
CDC may be a global minimizer, it is necessary that the
following equivalent conditions hold for $\gamma = f(\bar x)$:

$$D(\gamma) \subset C, \qquad (5)$$

$$0 = \max\{h(x) : x \in D(\gamma)\}, \qquad (6)$$

$$C^* \subset [D(\gamma)]^*. \qquad (7)$$

If the problem is regular, then any one of the above
conditions is also sufficient.


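An important special dc program is the following problem:

$$\text{minimize } g(x) - h(x) \quad \text{subject to } x \in \mathbb{R}^n, \qquad \text{(DC)}$$

where $g, h: \mathbb{R}^n \to \bar{\mathbb{R}}$ are convex functions ($\bar{\mathbb{R}}$ denotes
the set of extended real numbers). Writing this problem
as $\min\{g(x) - t : x \in D,\ h(x) \ge t\}$ with $D = \operatorname{dom} g \cap
\operatorname{dom} h$ and using (7), one can derive the following:

Proposition 6 Let $g, h: \mathbb{R}^n \to \bar{\mathbb{R}}$ be two convex functions
such that $h(x)$ is proper and lsc. Let $\bar x$ be a point
where $g(\bar x)$ and $h(\bar x)$ are finite. In order for $\bar x$ to be
a global minimizer of $g(x) - h(x)$ over $\mathbb{R}^n$, it is
necessary and sufficient that

$$\partial_\varepsilon h(\bar x) \subset \partial_\varepsilon g(\bar x) \quad \forall \varepsilon > 0, \qquad (8)$$

where $\partial_\varepsilon f(a) = \{p \in \mathbb{R}^n : \langle p, x - a\rangle - \varepsilon \le f(x) - f(a)\ \ \forall x \in \mathbb{R}^n\}$
is the $\varepsilon$-subdifferential of $f(x)$ at point $a$.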


Solution Methods 


Numerous solution methods have been proposed for 
different classes of dc optimization. Each of them pro- 
ceeds either by outer approximation (OA) of the feasi- 
ble set or by branch and bound (BB) or is of a hybrid 
type, combining OA with BB. Following are some typical
dc algorithms.


An OA Method for (CDC) 


Without losing generality, assume (3), i.e.,

$$\exists w \ \text{s.t.}\ g(w) \le 0,\quad f(w) < \min\{f(x) : x \in \Omega,\ h(x) \ge 0\}, \qquad (9)$$

where, as was defined above, $\Omega = \{x : g(x) \le 0\}$. In
most cases checking the regularity of a problem is not
easy, while regularity is needed for the sufficiency of
the optimality criteria in Proposition 5. Therefore the
method to be presented below only makes use of the
necessity part of this proposition and is independent of
any regularity assumption.

In practice, what we usually need is not an exact
solution but just an approximate solution of the problem.
Given tolerances $\varepsilon > 0$, $\eta > 0$, we are interested in
$\varepsilon$-approximate solutions, i.e., solutions $x \in \Omega$ satisfying
$h(x) \ge -\varepsilon$. An $\varepsilon$-approximate solution $x^*$ is then
said to be $\eta$-optimal if $f(x^*) - \eta \le \min\{f(x) : x \in
\Omega,\ h(x) \ge 0\}$.

With $\bar x$ now being a given $\varepsilon$-approximate solution
and $\gamma = f(\bar x) - \eta$, consider the subproblem

$$\max\{h(x) : x \in \Omega,\ f(x) \le \gamma\}. \qquad (Q_\gamma)$$

For simplicity assume that the set $D(\gamma) = \{x \in \Omega :
f(x) \le \gamma\}$ is bounded. Then $(Q_\gamma)$ is a convex
maximization problem over a compact convex set and can
be solved by an OA algorithm (see [13] or [3]) generating
a sequence $\{x^k, y^k\}$ such that

$$x^k \in \Omega,\quad f(x^k) \le \gamma,\quad h(x^k) \le \max(Q_\gamma) \le h(y^k) \qquad (10)$$

and, furthermore, $\|x^k - y^k\| \to 0$ as $k \to +\infty$. These
relations imply that we must either have $h(y^k) < 0$
for some $k$ (which implies that $\max(Q_\gamma) < 0$), or
else $h(x^k) \ge -\varepsilon$ for some $k$. In the former case,
there is no $x \in \Omega$ with $h(x) \ge 0$ and
$f(x) \le f(\bar x) - \eta$, i.e., $\bar x$ is $\eta$-optimal to CDC, and we
are done. In the latter case, $x^k$ is an $\varepsilon$-approximate
solution with $f(x^k) \le f(\bar x) - \eta$. Using then a local search
(or any inexpensive way available) one can improve
$x^k$ to $x' \in \Omega \cap \{x : h(x) \ge -\varepsilon\}$, and, after resetting
$\gamma \leftarrow f(x') - \eta$ in $(Q_\gamma)$, one can repeat the procedure
with the new $(Q_\gamma)$. And so on.

As is easily seen, the method consists essentially of
a number of consecutive cycles in each of which, say
the $l$th cycle, a convex maximization subproblem $(Q_\gamma)$
is solved with $\gamma = f(x^l) - \eta$ for some $\varepsilon$-approximate
solution $x^l$. This sequence of cycles can be organized
into a unified procedure. For this, it suffices to start
each new cycle from the result of the previous cycle:
after resetting $\gamma \leftarrow \gamma' := f(x') - \eta$ in $(Q_\gamma)$, we have
$D(\gamma') \subset D(\gamma)$, with a point $x' \notin D(\gamma')$, so the
algorithm can be continued by using a hyperplane separating
$x'$ from $D(\gamma')$ to form, with the current polytope
outer approximating $D(\gamma)$, a smaller polytope outer
approximating $D(\gamma')$. Since each cycle decreases the
objective function value by at least a quantity $\eta > 0$,
and the objective function is bounded from below, the
whole procedure must terminate after finitely many
cycles, yielding an $\varepsilon$-approximate solution that is $\eta$-optimal
to (CDC).
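The cycle structure just described can be sketched on a toy one-dimensional CDC instance. This is a minimal Python sketch under strong simplifying assumptions. The instance is hypothetical: $f(x) = (x - 0.5)^2$, $h(x) = x^2 - 1$, and $\Omega = [-2, 2] = \{x : x^2 - 4 \le 0\}$. The subproblem $(Q_\gamma)$ is solved by brute-force grid search instead of an OA algorithm, and the local improvement step is omitted. The sketch only illustrates the stopping test and the reset $\gamma \leftarrow f(x') - \eta$.

```python
# Hypothetical CDC instance: minimize f(x) = (x - 0.5)**2 over Omega = [-2, 2]
# subject to the reverse convex constraint h(x) = x**2 - 1 >= 0.
# The feasible set is [-2, -1] U [1, 2]; the global minimizer is x = 1.


def f(x):
    return (x - 0.5) ** 2


def h(x):
    return x * x - 1.0


# Assumption: a grid over Omega replaces the OA solver for (Q_gamma).
GRID = [-2.0 + 4.0 * i / 400 for i in range(401)]


def solve_q(gamma):
    """Approximate max{h(x): x in Omega, f(x) <= gamma} by grid search."""
    candidates = [x for x in GRID if f(x) <= gamma]
    if not candidates:
        return None, float("-inf")  # D(gamma) is empty
    best = max(candidates, key=h)
    return best, h(best)


def dc_cycles(x_bar, eps=1e-6, eta=1e-2, max_cycles=10_000):
    """Repeat cycles until max(Q_gamma) < 0, i.e. x_bar is eta-optimal."""
    for _ in range(max_cycles):
        gamma = f(x_bar) - eta
        x_new, value = solve_q(gamma)
        if value >= -eps:
            # x_new is eps-approximate and improves f by at least eta: next cycle.
            x_bar = x_new
            continue
        # max(Q_gamma) < 0: no x in Omega with h(x) >= 0 and f(x) <= f(x_bar) - eta.
        return x_bar
    return x_bar


if __name__ == "__main__":
    print(dc_cycles(2.0))
```

Starting from the feasible point $x = 2$, each cycle lowers $f$ by at least $\eta$. The loop should stop at or near the global minimizer $x = 1$ once $D(\gamma)$ no longer meets $\{h \ge 0\}$.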

It is also possible to use a BB algorithm for solving
the subproblem $(Q_\gamma)$ in each cycle. The method then
proceeds exactly as in the BB method for GDC to be
presented next.


A BB Method for General DC Optimization 


A general dc optimization problem can be formulated
as

$$\min\{f(x) : g_i(x) \ge 0,\ i = 1, \dots, m,\ x \in \Omega\}, \qquad \text{(GDC)}$$

where $\Omega$ is a compact convex subset of $\mathbb{R}^n$, and
$f, g_1, \dots, g_m$ are dc functions on $\Omega$. Although in
principle GDC can be reduced to the canonical form and
solved as a CDC problem, this may not be an efficient
method, as it does not take account of specific features
of GDC. For instance, if the feasible set of GDC is highly
nonconvex, computing a single feasible solution may be
as hard as solving the problem itself. Under these
conditions, a direct application of the OA or the BB
strategies to GDC is fraught with pitfalls. Without adequate
precautions, such approaches may lead to grossly
incorrect results or to an unstable solution that may change
drastically upon a small change of the data or the
tolerances [15,16].

A safer approach is to reduce GDC to a sequence
of problems with a convex feasible set in the following
way. By simple manipulations it is always possible to
arrange that the objective function $f(x)$ is convex. Let
$g(x) = \min_{i=1,\dots,m} g_i(x)$, and for every $\gamma \in \mathbb{R} \cup \{+\infty\}$
consider the subproblem

$$\max\{g(x) : x \in \Omega,\ f(x) \le \gamma\}. \qquad (R_\gamma)$$

Assuming the set $D(\gamma) := \{x \in \Omega : f(x) \le \gamma\}$ to be
bounded, we have in $(R_\gamma)$ a dc optimization problem over
a compact convex set. Using a BB procedure to solve
$(R_\gamma)$, we generate a nested sequence of partition sets
$M_k$ (boxes, e.g., using a rectangular subdivision),
together with a sequence $\alpha(M_k) \in \mathbb{R} \cup \{-\infty\}$ and points
$x^k \in \mathbb{R}^n$, $k = 1, 2, \dots$, such that

$$\operatorname{diam} M_k \to 0 \quad \text{as } k \to +\infty, \qquad (11)$$

$$\alpha(M_k) \searrow \max\{g(x) : x \in M_k \cap D(\gamma)\} \quad (k \to +\infty), \qquad (12)$$

$$\alpha(M_k) \ge \max(R_\gamma), \quad x^k \in M_k \cap D(\gamma), \qquad (13)$$

where $\max(P)$ denotes, as usual, the optimal value of
problem $P$. Condition (11) means that the subdivision
rule used must be exhaustive, while (12) indicates that
$\alpha(M_k)$ is an upper bound over the feasible solutions in
$M_k$, and (13) follows from the fact that $M_k$ is the
partition set with the largest upper bound among all
partition sets currently of interest.

As before, we say that $x$ is an $\varepsilon$-approximate solution
of GDC if $x \in \Omega$, $g(x) \ge -\varepsilon$, and that $x^*$ is $\eta$-optimal if
$f(x^*) - \eta \le \min\{f(x) : g(x) \ge 0,\ x \in \Omega\}$. From (11)–(13)
it follows that $\alpha(M_k) - g(x^k) \to 0$ as $k \to +\infty$,
and hence, for any given $\varepsilon > 0$, either $\alpha(M_k) < 0$ for
some $k$ or $g(x^k) \ge -\varepsilon$ for some $k$. In the former
case, $\max(R_\gamma) < 0$, hence $\min(\text{GDC}) > \gamma$; in the latter
case, $x^k$ is an $\varepsilon$-approximate solution of GDC with
$f(x^k) \le \gamma$. So, given any $\varepsilon$-approximate solution $\bar x$ with
$\gamma = f(\bar x) - \eta$, a finite number of iterations of this BB
procedure will determine either that there is no
feasible solution $x$ to GDC with $f(x) \le f(\bar x) - \eta$, i.e.,
$\bar x$ is $\eta$-optimal to GDC, or else that there exists an
$\varepsilon$-approximate solution $x'$ to GDC with $f(x') \le f(\bar x) - \eta$. In the
latter case, we can reset $\gamma \leftarrow f(x') - \eta$ and repeat the
procedure with the new $\gamma$, and so on. In this way the
whole solution process consists of a number of cycles,
each involving a finite BB procedure and giving a
decrease in the incumbent value of $f(x)$ by at least $\eta > 0$.
By starting each cycle right from the result of the
previous one, the sequence of cycles forms a unified
procedure. Since $\eta$ is a positive constant, the number of cycles
is finite and the procedure terminates with an
$\varepsilon$-approximate solution that is $\eta$-optimal to GDC.

The efficiency of such a BB procedure depends on 
two basic operations: branching and bounding. Usu- 
ally, branching is performed by means of an exhaus- 
tive subdivision rule, so as to satisfy condition (11). For 
rectangular partition, this condition can be achieved 
by the standard bisection rule: bisect the current box 
M into two equal subboxes by means of a hyper- 
plane perpendicular to a longest edge of M at its mid- 
point. However, it has been observed that the con- 
vergence guaranteed by an exhaustive subdivision rule 
is rather slow, especially in high dimensions. To
improve the situation, the idea is to use, instead of the
standard bisection, an adaptive subdivision rule
defined as follows. Let the upper bound $\alpha(M_k)$ in (12)
be obtained as $\alpha(M_k) = \max\{\Gamma_k(x) : x \in M_k \cap D(\gamma)\}$,
where $\Gamma_k(x)$ is some concave overestimator of $g(x)$
over $M_k$ that is tight at some point $y^k \in M_k$, i.e.,
satisfies $\Gamma_k(y^k) = g(y^k)$. If $x^k \in \operatorname{argmax}\{\Gamma_k(x) : x \in
M_k \cap D(\gamma)\}$, then the subdivision rule is to
bisect $M_k$ by means of the hyperplane $x_s = (x^k_s + y^k_s)/2$,
where $s \in \operatorname{argmax}_{i=1,\dots,n} |y^k_i - x^k_i|$. As has been proved
in [13], such an adaptive bisection rule ensures the
existence of an infinite subsequence $\{k_\nu\}$ such that
$y^{k_\nu} - x^{k_\nu} \to 0$ as $\nu \to +\infty$. The common limit $x^*$
of $x^{k_\nu}$ and $y^{k_\nu}$ then yields an optimal solution of
the problem $(R_\gamma)$. Computational experience has
confirmed that convergence achieved with an
adaptive subdivision rule is usually much faster than
with the standard bisection. For such an adaptive
subdivision to be possible, the constraint set $D(\gamma)$ of (12)
must be convex, so that for each partition set $M_k$ two
points $x^k \in M_k \cap D(\gamma)$ and $y^k \in M_k$ can be defined
such that $\alpha(M_k) - g(y^k) = o(\|x^k - y^k\|)$.
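The two branching rules can be written as box-splitting functions. The following Python sketch is minimal, with function names that are hypothetical. Boxes are given by lower and upper corner lists. In the adaptive rule, the points $x^k$ and $y^k$ are assumed to lie in the box, as in the text. How they are computed (maximizing the concave overestimator) is outside the sketch.

```python
def _split(lower, upper, s, cut):
    """Split the box [lower, upper] by the hyperplane x_s = cut."""
    left_upper = list(upper)
    left_upper[s] = cut
    right_lower = list(lower)
    right_lower[s] = cut
    return (list(lower), left_upper), (right_lower, list(upper))


def standard_bisection(lower, upper):
    """Exhaustive rule: halve the box at the midpoint of a longest edge."""
    s = max(range(len(lower)), key=lambda i: upper[i] - lower[i])
    return _split(lower, upper, s, 0.5 * (lower[s] + upper[s]))


def adaptive_bisection(lower, upper, x_k, y_k):
    """Adaptive rule: cut at x_s = (x_k[s] + y_k[s]) / 2, with s maximizing |y_k[i] - x_k[i]|."""
    s = max(range(len(lower)), key=lambda i: abs(y_k[i] - x_k[i]))
    return _split(lower, upper, s, 0.5 * (x_k[s] + y_k[s]))


if __name__ == "__main__":
    print(standard_bisection([0.0, 0.0], [4.0, 2.0]))
    print(adaptive_bisection([0.0, 0.0], [4.0, 4.0], [1.0, 1.0], [1.0, 3.0]))
```

The standard rule ignores where the bounding gap occurs. The adaptive rule cuts along the coordinate where $x^k$ and $y^k$ differ most, which is what drives the subsequence $y^{k_\nu} - x^{k_\nu} \to 0$.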


DCA: A Local Optimization Approach to (DC)


By rewriting DC as a canonical dc optimization problem

$$\min\{t - h(x) : x \in \mathbb{R}^n,\ t \in \mathbb{R},\ g(x) - t \le 0\},$$

we see that DC can be solved by the same method as
CDC. Since, however, for some large-scale problems we
are not so much interested in a global optimal solution
as in a sufficiently good feasible solution, a local
optimization approach to DC has been developed [9] that
seems to perform quite satisfactorily in a number of
applications. This method, referred to as DCA, is based on
the well-known Toland equality:

$$\inf_{x \in \operatorname{dom} g}\{g(x) - h(x)\} = \inf_{y \in \operatorname{dom} h^*}\{h^*(y) - g^*(y)\}, \qquad (14)$$


where $g, h: \mathbb{R}^n \to \bar{\mathbb{R}}$ are lower semicontinuous proper
convex functions, and the star denotes the conjugate,
e.g., $g^*(y) = \sup\{\langle x, y\rangle - g(x) : x \in \operatorname{dom} g\}$. Taking
account of this equality, DCA starts with $x^0 \in \operatorname{dom} g$
and, for $k = 0, 1, 2, \dots$, computes $y^k \in \partial h(x^k)$ and $x^{k+1} \in
\partial g^*(y^k)$. As has been proved in [9], the sequences
$\{x^k\}$, $\{y^k\}$ thus generated satisfy the following conditions:

1. The sequences $g(x^k) - h(x^k)$ and $h^*(y^k) - g^*(y^k)$
are decreasing.

2. Every accumulation point $x^*$ (resp. $y^*$) of the
sequence $\{x^k\}$ (resp. $\{y^k\}$) is a critical point of the
function $g(x) - h(x)$ (resp. $h^*(y) - g^*(y)$).

Though global optimality cannot be guaranteed by this
method, it has been observed that in many cases of
interest it yields a local minimizer that is also global.
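Below is a minimal Python sketch of the DCA iteration. It assumes the hypothetical one-dimensional instance $g(x) = x^4/4$, $h(x) = x^2$, so that $g - h$ has global minimizers $\pm\sqrt 2$ and a critical point at 0. Here $\partial h(x) = \{2x\}$. The condition $x^{k+1} \in \partial g^*(y^k)$ is equivalent to $y^k \in \partial g(x^{k+1})$, i.e. $x^{k+1}$ minimizes $g(x) - x y^k$, which gives $x^3 = y^k$. Both steps are therefore available in closed form.

```python
import math

# Hypothetical instance: minimize g(x) - h(x) with g(x) = x**4 / 4 and h(x) = x**2,
# both convex; the difference x**4/4 - x**2 has global minimizers +-sqrt(2).


def objective(x):
    return x ** 4 / 4 - x ** 2


def grad_h(x):
    """h is differentiable, so its subdifferential is {2x}."""
    return 2.0 * x


def argmin_g_minus_linear(y):
    """Return x minimizing g(x) - x*y, i.e. the solution of x**3 = y (real cube root)."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


def dca(x0, iters=100, tol=1e-12):
    """DCA iterates x^0, x^1, ...: y^k in dh(x^k), then x^{k+1} in dg*(y^k)."""
    xs = [x0]
    x = x0
    for _ in range(iters):
        y = grad_h(x)                      # y^k in dh(x^k)
        x_next = argmin_g_minus_linear(y)  # x^{k+1} in dg*(y^k)
        xs.append(x_next)
        if abs(x_next - x) < tol:
            break
        x = x_next
    return xs


if __name__ == "__main__":
    iterates = dca(1.0)
    print(iterates[-1], objective(iterates[-1]))
```

Starting from $x^0 = 1$, the iterates increase monotonically toward $\sqrt 2$ while $g - h$ decreases, in line with property 1. Starting from $x^0 = 0$, the method stays at the critical point 0. This illustrates that only criticality (property 2) is guaranteed.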


Applications and Extensions 


The above described dc methods are of a general- 
purpose type. For many special dc problems more 
efficient algorithms are needed to take full advantage 
of additional structures. Along this line, dc methods 
have been adapted to solve problems with separated 
nonconvexity, bilinear programming, multilevel pro- 
gramming, multiobjective programming, optimization 
problems over efficient sets, polynomial and signomial programming, fractional programming, continuous location problems, clustering and data mining problems, etc. [4]. In particular, quite efficient methods have been developed for a class of dc optimization problems important for applications, called multiplicative programming [4,5]. Also, techniques for bounding, branching,
and decomposition have been refined that have very 
much widened the range of applicability of dc meth- 
ods. Most recently, monotonic optimization, also called 
DM optimization, has emerged as a new promising 
field of research dealing with a class of optimization 
problems important for applications whose structure, 
though different from the dc structure, shares many 
common features with the latter. To be specific, let $\mathcal{C}$ be a family of real-valued functions on $\mathbb{R}^n$ such that (i) $g_1, g_2 \in \mathcal{C},\ \alpha_1, \alpha_2 \in \mathbb{R}_+ \Rightarrow \alpha_1 g_1 + \alpha_2 g_2 \in \mathcal{C}$; (ii) $g_1, g_2 \in \mathcal{C} \Rightarrow g(x) = \max\{g_1(x), g_2(x)\} \in \mathcal{C}$. Then the family $D(\mathcal{C}) = \mathcal{C} - \mathcal{C}$ is a vector lattice with respect to the two operations of pointwise maximum and pointwise minimum. When $\mathcal{C}$ is the set of convex functions, $D(\mathcal{C})$ is nothing but the vector lattice of dc functions. When $\mathcal{C}$ is the set of increasing functions on $\mathbb{R}^n$, i.e., the set of functions $f: \mathbb{R}^n \to \mathbb{R}$ such that $x' \ge x \Rightarrow f(x') \ge f(x)$, the vector lattice $D(\mathcal{C})$ consists of DM functions, i.e., functions representable as the difference of two increasing functions. For the theory, methods, and algorithms of DM optimization, we refer the reader to [7,14,18].
Decision Support Systems with Multiple Criteria

In practical real-world situations the time available for making decisions is often limited, while the cost of investigation increases with time. Therefore, it would be desirable to exploit the increasing processing power of modern computer technology to save significant amounts of time and cost in decision making problems. Computationally intensive but routine tasks, such as data management and calculations, can be performed with remarkable speed by a common personal computer, compared to the time a human would need for the same tasks. On the other hand, computers are unable to perform cognitive tasks, and their inference and reasoning capabilities are still very limited compared to those of the human brain. Thus, in decision making problems, computers can support decision makers by managing the data of the problem and by performing computationally intensive calculations based on a selected decision model, which could help in the analysis. The decision makers themselves, however, have to examine the results obtained from the models and arrive at the most appropriate decision.

This merging of human judgment and intuition with computer systems constitutes the underlying philosophy, methodological framework and basic goal of decision support systems [17]. The term “decision support system” (DSS) is well established. It describes any computer system that provides information on a specific decision problem, using analytical decision models and access to databases, in order to support a decision maker in making decisions effectively in complex and ill-structured problems where no straightforward algorithmic procedure can be employed [28].

The development of DSSs kept pace with the ad- 
vances in computer and information technologies, and 
since the 1970s numerous DSSs have been designed by 
academic researchers and practitioners for the exami- 
nation and analysis of several decision problems includ- 
ing finance and accounting, production management, 
marketing, transportation, human resources manage- 
ment, agriculture, education, etc. [17,19]. 

Apart from the specific decision problems that DSSs address, these systems are also characterized by the type of decision models and techniques that they incorporate (e.g., statistical analysis tools, mathematical programming and optimization techniques, multicriteria decision aid methods). Some of these methodologies (optimization, statistical analysis, etc.), which have already been implemented in several DSSs, are based on the classical monocriterion approach. However, real-world decision problems can hardly be handled by examining a single criterion, attribute or point of view that leads to the ‘optimum’ decision. In fact, such a monocriterion approach is merely an oversimplification of the actual nature of the problem at hand, and it can lead to unrealistic decisions.

On the other hand, a more realistic and flexible approach would be the simultaneous consideration of all pertinent factors that may affect a decision. However, this appealing approach raises an essential issue: how can several, often conflicting, factors be aggregated to make rational decisions? This issue is the focal point of interest for all multicriteria decision aid methods. Incorporating multicriteria decision aid methods in DSSs gives decision makers a highly efficient tool for studying complex real-world decision problems that involve multiple criteria of a conflicting nature. Therefore, the subsequent sections of this paper concentrate on this specific category of DSSs (multicriteria DSSs, MCDSSs).

The article is organized as follows. In section 2 some 
basic concepts, notions and principles of multicriteria 
decision aid are discussed. Section 3 presents the main 
features and characteristics of MCDSSs, along with a re- 
view of the research that has been conducted in this 
field, while some extensions of the classical MCDSSs 
framework in group decision making and intelligent 
decision support are also discussed. Finally, section 4 
concludes the paper and outlines some possible future 
research directions in the design, development and im- 
plementation of MCDSSs. 


Multicriteria Decision Aid 


Multicriteria decision aid (MCDA, the European School) or multicriteria decision making (MCDM, the American School) [49,64] is an advanced field of operations research. It is devoted to developing and implementing decision support methodologies for complex decision problems involving multiple criteria, goals or objectives of a conflicting nature. The foundations of MCDA can be traced back to several pioneering works: J. von Neumann and O. Morgenstern [43] and P.C. Fishburn [20] on utility theory, A. Charnes and W.W. Cooper [10] on goal programming, and B. Roy [47] on the concept of outranking relations and the foundations of the ELECTRE methods. These works have influenced the subsequent research in MCDA, which can be divided into two major groups: discrete and continuous MCDA. Discrete MCDA deals with decision problems involving a finite set of alternatives. The aim is to select the most appropriate one, to rank them from best to worst, or to classify them into predefined homogeneous classes. By contrast, in continuous MCDA problems the alternatives are not defined a priori. Instead, one seeks to construct an alternative that meets his/her goals or objectives (for instance, the construction of a portfolio of stocks).

There are different ways to address these two classes 
of problems in MCDA. Usually, a continuous MCDA 
problem is addressed through multi-objective or goal 
programming approaches. In the former case, the objectives of the decision maker are expressed as a set of linear or nonlinear functions which have to be ‘optimized’. In the latter case, the decision maker expresses his/her goals in the form of a reference or ideal point which should be approached as closely as possible. These two approaches extend the classical single-objective optimization framework through the simultaneous consideration of more than one objective or goal. Of course, in this new context it seems illusory
to speak of optimality, but instead the aim is initially 
to determine the set of efficient solutions (solutions 
which are not dominated by any other solution) and 
then to identify interactively a specific solution which 
is consistent with the preference structure of the deci- 
sion maker. The books [54,57] and [63] provide an ex- 
cellent and extensive discussion of both multi-objective 
and goal programming. 
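The notion of an efficient (non-dominated) solution can be made concrete for a finite set of candidate solutions. The following is a minimal sketch, assuming that every criterion is to be maximized. The names `dominates` and `efficient_set` and the sample data are hypothetical. The pairwise filter costs $O(n^2 m)$ for $n$ alternatives and $m$ criteria.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every criterion and strictly
    better on at least one (all criteria are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def efficient_set(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical alternatives evaluated on two criteria, both to be maximized.
alternatives = [(3, 5), (4, 4), (2, 6), (3, 3), (1, 7), (4, 2)]
print(efficient_set(alternatives))  # [0, 1, 2, 4]
```

Here (3, 3) is dominated by (3, 5), and (4, 2) by (4, 4). Choosing among the four remaining solutions is exactly where the decision maker's preferences enter.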

On the other hand, discrete MCDA problems are 
usually addressed through the multi-attribute utility 
theory (MAUT) [26], the outranking relations approach 
[48] or the preference disaggregation approach ([23,44]). 
These three approaches are mainly focused on the de- 
termination and modeling of the decision makers’ pref- 
erences, in order to develop a global preference model 
which can be used in decision making. Their differences 
concern mainly the form of the global preference model 
that is developed, as well as the procedure that is used 
to estimate the parameters of the model. The developed 


preference model in both MAUT and preference dis- 
aggregation is a utility or value function either additive 
or multiplicative, whereas the outranking relations ap- 
proach is based on pairwise comparisons of the form 
‘alternative a is at least as good as alternative b’. Con- 
cerning the procedure that is used to estimate the pa- 
rameters of the global preference model, both in MAUT 
and outranking relations there is a direct interrogation 
of the decision maker. More precisely, in MAUT the de- 
cision maker is asked to determine the trade-offs among 
the several attributes or criteria, while in outranking re- 
lations the decision maker has to determine several pa- 
rameters, such as the weights of the evaluation crite- 
ria, indifference, strict preference and veto thresholds 
for each criterion. By contrast, in preference disaggregation an ordinal regression procedure is used to estimate the global preference model. The decision maker is given a reference set of alternatives, consisting either of past decisions or of a small subset of the alternatives under consideration. He/she is asked to provide a ranking or a classification of these alternatives according to his/her decision policy (global preferences).
Then, using an ordinal regression procedure the global 
preference model is estimated so that the original rank- 
ing or classification (and consequently the global pref- 
erence system of the decision maker) can be reproduced 
as consistently as possible. 
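The additive form of the global preference model mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the following:

- each marginal value function $u_i$ is linear and scaled so that $u_i = 0$ at a chosen worst level and $u_i = 1$ at a chosen best level;
- all criteria are to be maximized;
- the weights $w_i$ are given.

In MAUT these quantities would be elicited from the decision maker, and in preference disaggregation they would be estimated by ordinal regression. The names `additive_value` and `rank` and the data are hypothetical.

```python
from typing import Dict, List, Sequence


def additive_value(scores: Sequence[float], weights: Sequence[float],
                   worst: Sequence[float], best: Sequence[float]) -> float:
    """Global value U(a) = sum_i w_i * u_i(g_i(a)), with each marginal value
    u_i assumed linear and scaled so that u_i(worst_i) = 0, u_i(best_i) = 1."""
    return sum(w * (g - lo) / (hi - lo)
               for g, w, lo, hi in zip(scores, weights, worst, best))


def rank(alternatives: Dict[str, List[float]], weights: Sequence[float],
         worst: Sequence[float], best: Sequence[float]) -> List[str]:
    """Alternatives ordered from the highest to the lowest global value."""
    return sorted(alternatives,
                  key=lambda a: additive_value(alternatives[a], weights, worst, best),
                  reverse=True)


# Hypothetical data: two criteria, both to be maximized.
alternatives = {"A": [70.0, 3.0], "B": [90.0, 1.0], "C": [60.0, 5.0]}
worst, best = [50.0, 0.0], [100.0, 5.0]
print(rank(alternatives, [0.6, 0.4], worst, best))  # ['B', 'C', 'A']
print(rank(alternatives, [0.3, 0.7], worst, best))  # ['C', 'A', 'B']
```

Shifting weight from the first criterion to the second reverses the ranking here. This kind of what-if check is what the weight-sensitivity facilities of MCDSSs are meant to support.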


Multicriteria Decision Support Systems 


From the above brief discussion of the basic concepts 
and approaches of MCDA, it is clear that in any case 
the decision maker and his/her preferences constitute 
the focal point of the methodological framework of 
MCDA. This special characteristic of MCDA implies 
that a comprehensive model of a decision situation can- 
not be developed, but instead the model should be de- 
veloped to meet the requirements of the decision maker 
[46]. Such a model can only be developed through an iterative and interactive process, continued until the decision maker’s preferences are consistently represented in the model. Interactivity and iterative operation are two of the key characteristics of DSSs. Consequently, a DSS incorporating MCDA methods could provide essential support in structuring the decision problem, analyzing the preferences of the decision maker, and supporting the model building process. The support provided by multicriteria DSSs (MCDSSs) is essential for the decision maker as well as for the decision analyst.

• Through the use of MCDSSs, the decision maker becomes familiar with sophisticated operations research techniques and is supported in structuring the decision problem from all possible points of view, attributes or criteria. He/she is also able to analyze the conflicts between these points of view and to consider the existing trade-offs. All these capabilities serve the learning process of decision makers in resolving complex decision problems in a realistic context, and they provide a solid scientific basis for arguing in favor of the decisions taken.

• From the decision analyst’s point of view, MCDSSs provide a supportive tool that is necessary throughout the decision making process. The decision analyst usually acts as an intermediary between the system and the decision maker. MCDSSs enable the analyst to highlight the essential features of the problem to the decision maker, to introduce the decision maker’s preferences into the system, and to develop the corresponding model. Furthermore, through sensitivity and robustness analyses the decision analyst can examine several scenarios, concerning both the significance of the evaluation criteria and changes in the decision environment.

The supportive role of MCDSSs in ill-structured, complex decision problems was the basic motivation for computer scientists, management scientists and operations researchers to develop such systems. Indeed, MCDSSs have been one of the major areas of DSS research since the 1970s [19], and significant progress has been made from both the theoretical and the practical/implementation viewpoints.

The first MCDSSs, developed in the 1970s, were mainly oriented towards the study of multiobjective mathematical programming problems ([16,61]). Mainly because of the limited computer technology of that period, these pioneering systems were primarily developed for academic purposes. They ran on mainframe computers, came with no documentation, and had no visual representation capabilities [31]. Today, after more than twenty years of research and advances in MCDA, DSSs, and computer science, most MCDSSs provide many advanced capabilities to decision makers, including among others [46]:

1) Enhanced data management capabilities including 
interactive addition, deletion or modification of cri- 
teria. 

2) Assessment and management of weights. 

3) User-friendly interfaces based on visual representa- 
tions of both alternatives and criteria to assist the 
interaction between the system and the decision 
maker. 

4) Sensitivity analysis (what-if analysis) to determine 
how the changes in the weights of the evaluation cri- 
teria can affect the actual decision. 

These capabilities are in accordance with the general characteristics of DSSs, that is:

- interactivity;
- flexibility and adaptability to changes in the decision environment;
- user-oriented design and development;
- the combination of database management with decision models.

Although these capabilities are common to most existing MCDSSs, MCDSSs can be distinguished according to the MCDA approaches that they employ:

• MCDSSs based on the multi-objective programming approach:
  - the TOMMIX system [2],
  - the TRIMAP system [11],
  - the VIG system ([29,32]),
  - the VIDMA system [30],
  - the DIDAS system [36],
  - the AIM system [37],
  - the ADBASE system [58], and
  - the STRANGE system [59].

• MCDSSs based on the MAUT approach:
  - the MACBETH system [5],
  - the VISA system [6], and
  - the EXPERT CHOICE system [21].

• MCDSSs based on the outranking relations approach:
  - the PROMCALC and GAIA systems [7],
  - the ELECCALC system [27],
  - the PRIAM system [34], and
  - the ELECTRE TRI system [62].

• MCDSSs based on the preference disaggregation approach:
  - the PREFCALC system [22],
  - the MINORA system [51],
  - the MIIDAS system [52], and
  - the PREFDIS system [66].

Most existing MCDSSs are designed for the study of general multicriteria decision problems. They provide advanced capabilities for modeling the decision makers’ preferences in order to choose an alternative or to rank or classify the alternatives. However, they do not consider the specific characteristics and nature of the decision to be taken in the particular problem under consideration.

Some significant decision problems have a unique nature. Apart from applying MCDA methodology, other types of analysis are needed to take into account the environment in which the decision is made. To address such problems, several authors have proposed domain-specific MCDSSs. Decision problems for which specific MCDSSs have been developed include:

- the assessment of corporate performance and viability (the BANKADVISER system [39], the FINCLAS system [65], the FINEVA system [68], and the system proposed in [53]);
- bank evaluation (the BANKS system [40]);
- bank asset liability management [33];
- financial planning [18];
- portfolio selection [67];
- new product design (the MARKEX system [42]);
- urban planning (the system proposed in [1]);
- strategic planning [9];
- computer system design [15].


Multicriteria Group Decision Support Systems 


A common characteristic of all the aforementioned MCDSSs is that they refer to decisions taken by individual decision makers. However, in many cases the actual decision is not the responsibility of an individual. Instead, a team of negotiating or cooperating participants must reach a consensus decision. For each individual decision maker, the decision process, and consequently the required decision support, remains the same. The process that leads the cooperating team or the negotiating parties to a consensus decision, however, is completely different from the individual decision making process. Therefore, the type of support needed also differs.

Group DSSs (GDSSs) aim to support such decision processes. Since the tools provided by MCDA can be extended to generalized group decision processes, several attempts have been made to design and develop such multicriteria systems. Examples of multicriteria GDSSs include the Co-oP system [8], the JUDGES system [12], the WINGDSS system [13], the MEDIATOR system [24], and the SCDAS system [35].


Intelligent Multicriteria Decision Support Systems 


Apart from extending the MCDSS framework to support group decision making, researchers have recently also extended MCDSSs by exploiting advances in artificial intelligence. Fields such as neural networks, expert systems, fuzzy sets, genetic algorithms, etc., provide promising features and new capabilities. These concern the representation of expert knowledge, the development of intelligent and more user-friendly interfaces, and reasoning and explanation abilities, as well as the handling of incomplete, uncertain and imprecise information.

These appealing new capabilities of artificial intelligence techniques can be incorporated into the existing MCDSS framework. They can provide expert advice on the problem under consideration, assistance in using the various modules of the system, explanations of the results of the MCDA models, and support in structuring the decision making process. They can also provide recommendations and further guidance on the future actions that the decision maker should take in order to implement his/her decisions successfully. The terms ‘intelligent multicriteria decision support systems’ and ‘knowledge-based multicriteria decision support systems’ have been used by several authors to describe MCDSSs which take advantage of artificial intelligence techniques in combination with MCDA methods.

Examples of intelligent MCDSSs include:

- the system proposed in [3] for multiobjective linear programming;
- the MARKEX system for new product design [42];
- the CREDEX system [45] and the CGX system [55] for credit granting problems;
- the MIIDAS system for estimating additive utility functions based on the preference disaggregation approach [52];
- the INVEX system for investment analysis [60], based on the PROMETHEE method;
- the FINEVA system [68] for the assessment of corporate performance and viability.

All these systems incorporate one or more expert system components in their structure. In the FINEVA, MARKEX, CREDEX, CGX and INVEX systems, these components derive estimations regarding the problem under consideration. In the MIIDAS and MARKEX systems, they support the use of the MCDA models incorporated in the system and, more generally, improve the communication between the user and the system. Furthermore, the INVEX system uses fuzzy sets to make an initial distinction between good and bad investment projects, which reduces the number of alternatives to be considered later in the multicriteria analysis module.

The ongoing research on the integration of artifi- 
cial intelligence with MCDA regarding the theoretical 
foundations of this integration and the related imple- 
mentation issues ([4,25]), the construction of fuzzy out- 
ranking relations ([14,41,50]), and the applications of 
neural networks in preference modeling and utility as- 
sessment ([38,56]) constitutes a significant basis for the 
design and development of intelligent MCDSSs imple- 
menting the theoretical findings of this research. 


Conclusions 


This article investigated the potential of MCDSSs in the decision making process. During the last two decades MCDSSs have consolidated their position within the operations research, information systems and management science communities. They are now regarded as an efficient tool for supporting the whole decision making process in complex ill-structured problems, from problem structuring to the implementation of the final decision.

The review which was presented in this paper re- 
veals that recent advances in MCDSSs include systems 
for general use to solve both discrete and continuous 
MCDA problems, systems designed to study some spe- 
cific real world decisions, as well as systems designed to 
support negotiation and group decision making. 

As computer science and technology progress rapidly, new areas of application of MCDSSs can be explored. These include their operation over the Internet to provide computer support for the cooperative work of dispersed and asynchronous decision units. The incorporation of artificial intelligence techniques into the existing framework of MCDSSs is another significant area of future research. As illustrated in this paper, researchers have already tried to combine these two approaches in integrated intelligent systems. Nevertheless, much work remains to be done to make the most of neural networks, fuzzy sets and expert systems in providing user-friendly support for decision problems involving multiple criteria.
Abstract


Stochastic multistage mean-variance optimization problems are among the most frequently used modeling tools for planning problems, especially financial ones. Decomposition algorithms are a powerful tool for solving problems in this class. The first aim of this article is to introduce multistage mean-variance models and to explain their applications and structure. The second aim is to discuss efficient solution methods for such problems based on decomposition algorithms.


Background 


Stochastic programming (SP) is becoming an increasingly popular tool for modeling decisions under uncertainty, because uncertain events can be modeled flexibly and real-world constraints can be imposed with relative ease. SP also makes the optimization process more robust. Consider the following standard “deterministic” quadratic program:

$$\begin{aligned} \min_x\ & \tfrac{1}{2}x'Hx + c'x \\ \text{s.t. } & Ax = b \\ & x^l \le x \le x^u. \end{aligned} \tag{1}$$

It is not always possible to know the exact values of the 
problem data of (1) given by H, A, c, and b. Instead, we 
may have some estimations in the form of data gath- 
ered either empirically or known to be approximated 
well by a probability distribution. The SP framework al- 
lows us to solve problems where the data of the problem 
are represented as functions of the randomness, yield- 
ing results that are more robust to deviations. 

The power and flexibility of SP do, however, come at a cost. Realistic models include many possible events distributed across several periods, and the end result is a large-scale optimization problem with hundreds of thousands of variables and constraints. Models of this scale cannot be handled by general-purpose optimization algorithms, so special-purpose algorithms attempt to exploit the specific structure of SP models. We examine two decomposition algorithms for which encouraging results have been reported in linear SP. The first is based on the regularized version of Benders decomposition developed by [21], and the second on an augmented-Lagrangian-based scheme developed by [4].

Others [9,24,27] formulated multistage SP as 
a problem in optimal control, where the current stage 
variables depend on the parent node variables, and used 
techniques from optimal control theory to solve the re- 
sulting problem. Another related method is the approx- 
imation algorithm by [11] where a sequence of scenario 
trees is generated whose solution produces lower and 
upper bounds on the solution of the true problem. De- 
composition algorithms are not, however, the only ap- 
proach to tackle the state explosion from which SPs suf- 
fer; approximation algorithms and stochastic methods 
are just two examples of other methods where research 
is very active [5]. In this study, we are concerned only 
with decomposition methods. 


Problem Statement 


We consider a quadratic multistage SP. In the linear case, SP was first proposed independently by [10] and [1]; for a more recent description see [7] and [13]. For two stages, the problem is:

$$\min_x\ \tfrac{1}{2}x'Hx + c'x + Q(x) \tag{2a}$$

$$\text{s.t. } Ax = b \tag{2b}$$

$$x^l \le x \le x^u. \tag{2c}$$


We use $'$ to denote the transpose of a vector or a matrix. The vectors $c$ and $x^{l,u}$ are known and lie in $\mathbb{R}^{n_1}$. Let $A$ and $H$ be known matrices in $\mathbb{R}^{m_1 \times n_1}$ and $\mathbb{R}^{n_1 \times n_1}$, respectively. These quantities represent the state of the world that is known. We assume that $H$ is positive semidefinite. The first two terms in the objective function (2a) model the goals of the decision maker that do not depend on uncertain events. $Q(x)$ represents the expected value of the second-stage objective function:


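$$Q(x) = E_\xi\left[Q(x, \xi(\omega))\right],$$

where

$$Q(x, \xi(\omega)) = \min_{y(\omega)}\ \tfrac{\alpha}{2}\, y(\omega)' H(\omega)\, y(\omega) - (1-\alpha)\, c(\omega)' y(\omega) \tag{3}$$

$$\text{s.t. } W(\omega)\, y(\omega) = h(\omega) - T(\omega)\, x \tag{4}$$

$$y(\omega)^l \le y(\omega) \le y(\omega)^u. \tag{5}$$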


Let $\Omega$ be the set of all random events, and let $\omega \in \Omega$ be a particular realization of an event. When $\omega$ is known, the random data are aggregated in the vector $\xi(\omega) = [c(\omega), H(\omega), W(\omega), h(\omega), T(\omega), y^{l,u}(\omega)]$, and we let $\Xi$ be the support of $\xi$. The uncertainty of the second stage is represented by the random data $H(\omega)$, $W(\omega)$, and $T(\omega)$, which are matrices in $\mathbb{R}^{n_2 \times n_2}$, $\mathbb{R}^{m_2 \times n_2}$, and $\mathbb{R}^{m_2 \times n_1}$, respectively. The vectors $c(\omega)$, $h(\omega)$, and $y^{l,u}(\omega)$ are random vectors in $\mathbb{R}^{n_2}$, $\mathbb{R}^{m_2}$, and $\mathbb{R}^{n_2}$, respectively. We assume that the number of possible realizations of $\omega$ is finite. Under this assumption, the notation $\xi(\omega)$ indicates that the data of the problem change for different realizations $\omega$. The dependence of $y$ on uncertainty is written as $y(\omega) \in \mathbb{R}^{n_2}$. The vector $y(\omega)$ is still the decision variable; the notation stresses that different realizations of $\omega$ require different decisions $y$. In the objective function (3), the quadratic term represents the risk of the decision measured by variance, while the linear term represents the expected outcome. The scalar $\alpha \in [0, 1]$ in (3) sets the trade-off between risk and expectation.
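The role of the trade-off parameter $\alpha$ can be seen in a one-period illustration. This is not the multistage model itself. The following minimal sketch makes these assumptions:

- $H$ is taken to be the scenario covariance matrix of two asset returns, and $c$ their mean;
- the portfolio is $(s, 1-s)$;
- the scalarized objective $\tfrac{\alpha}{2} w'Hw - (1-\alpha)c'w$ is minimized by grid search.

The scenario data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_cov(returns: Sequence[Sequence[float]],
             probs: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """Scenario-based mean vector and covariance matrix of asset returns."""
    n = len(returns[0])
    mu = [sum(p * r[i] for r, p in zip(returns, probs)) for i in range(n)]
    cov = [[sum(p * (r[i] - mu[i]) * (r[j] - mu[j]) for r, p in zip(returns, probs))
            for j in range(n)] for i in range(n)]
    return mu, cov


def objective(w: Sequence[float], mu: Sequence[float],
              cov: Sequence[Sequence[float]], alpha: float) -> float:
    """alpha/2 * w'Hw - (1 - alpha) * c'w with H = covariance and c = mean return."""
    n = len(w)
    risk = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    expected = sum(m * x for m, x in zip(mu, w))
    return 0.5 * alpha * risk - (1.0 - alpha) * expected


def best_share(returns, probs, alpha: float, steps: int = 100) -> float:
    """Grid search over the share s of asset 0 in the portfolio (s, 1 - s)."""
    mu, cov = mean_cov(returns, probs)
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda s: objective([s, 1.0 - s], mu, cov, alpha))


# Hypothetical returns of two assets in three equiprobable scenarios.
returns = [[0.10, 0.04], [0.02, 0.03], [0.06, 0.05]]
probs = [1 / 3, 1 / 3, 1 / 3]
print(best_share(returns, probs, 0.0), best_share(returns, probs, 1.0))
```

With $\alpha = 0$ only the expected return counts, so the higher-mean asset is chosen ($s = 1$). With $\alpha = 1$ only variance counts, so the lower-variance asset is chosen ($s = 0$).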

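Deriving the multistage problem from the two-stage formulation is just a matter of applying the ideas described above recursively until the required number of stages is reached. For the multistage problem with $T$ periods, the first-stage decision remains the same, but for $t = 2, \ldots, T$ we have

$$Q_t(x_{t-1}) = E_{\xi_t}\left[Q_t(x_{t-1}, \xi_t(\omega))\right], \tag{6}$$

where

$$\begin{aligned} Q_t(x_{t-1}, \xi_t(\omega)) = \min_{y_t(\omega)}\ & \tfrac{\alpha}{2}\, y_t(\omega)' H_t(\omega)\, y_t(\omega) - (1-\alpha)\, c_t(\omega)' y_t(\omega) + Q_{t+1}(y_t(\omega)) \\ \text{s.t. } & W_t(\omega)\, y_t(\omega) = h_t(\omega) - T_{t-1}(\omega)\, x_{t-1} \\ & y_t(\omega)^l \le y_t(\omega) \le y_t(\omega)^u. \end{aligned} \tag{7}$$

For the last time period $t = T$, the recourse function $Q_{T+1}$ is zero.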

Our principal concern is decomposition algorithms for (7). For more insight into the properties of stochastic quadratic problems, the reader is referred to [14] and [15]. Before we turn to decomposition algorithms, we introduce some terminology that will be used in the next section.

The dynamic programming model (7) is usually referred to as non-anticipative. This property means that decisions are based on the past and not on the future. This concept can be represented in two ways, namely by compact and by split-variable formulations [20].

The compact variable formulation can be mapped directly onto a tree structure known as the scenario tree; see Fig. 1a. The root of the tree represents the state of the world that is deterministic. Moving down the scenario tree, different events represent different realizations of $\omega$, and each level of the tree represents a different time period. A path from the root to a leaf node is known as a scenario. We use $v = (t, k)$ to denote the $k$th node in period $t$, $a(v)$ its ancestor node, and $d(v)$ its descendant nodes. Benders decomposition, introduced in the next section, assumes such a structure. It decomposes the large-scale problem into several subproblems, each representing a node in the tree.
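The node notation $v = (t, k)$, $a(v)$ and $d(v)$ maps naturally onto a small data structure. The following is a minimal sketch, assuming each node stores its probability conditional on its ancestor; the text does not specify how probabilities are stored. Scenarios are enumerated as root-to-leaf paths. The class `ScenarioTree` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]  # v = (t, k): the k-th node in period t


@dataclass
class ScenarioTree:
    parent: Dict[Node, Optional[Node]] = field(default_factory=dict)  # a(v)
    children: Dict[Node, List[Node]] = field(default_factory=dict)    # d(v)
    prob: Dict[Node, float] = field(default_factory=dict)             # P(v | a(v))

    def add(self, v: Node, parent: Optional[Node] = None, p: float = 1.0) -> None:
        """Add node v below parent (the parent must already be in the tree)."""
        self.parent[v] = parent
        self.children.setdefault(v, [])
        self.prob[v] = p
        if parent is not None:
            self.children[parent].append(v)

    def scenarios(self, root: Node = (1, 1)) -> List[Tuple[List[Node], float]]:
        """Root-to-leaf paths (scenarios) with their unconditional probabilities."""
        result: List[Tuple[List[Node], float]] = []
        stack = [([root], self.prob[root])]
        while stack:
            path, p = stack.pop()
            kids = self.children[path[-1]]
            if not kids:
                result.append((path, p))
            for c in reversed(kids):  # reversed so children are visited in order
                stack.append((path + [c], p * self.prob[c]))
        return result


# Hypothetical three-period tree with two branches per node.
tree = ScenarioTree()
tree.add((1, 1))
tree.add((2, 1), (1, 1), 0.5)
tree.add((2, 2), (1, 1), 0.5)
tree.add((3, 1), (2, 1), 0.4)
tree.add((3, 2), (2, 1), 0.6)
tree.add((3, 3), (2, 2), 0.5)
tree.add((3, 4), (2, 2), 0.5)
for path, p in tree.scenarios():
    print(path, p)
```

In a node-wise decomposition such as nested Benders, one subproblem would be attached to each key of `tree.parent`. A split-variable formulation would instead work with the list returned by `scenarios()`.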

In a split-variable formulation, new decision variables are introduced for each scenario in the set of possible scenarios. The large-scale problem is thus decomposed into $n$ subproblems, where $n$ is the number of scenarios. Conceptually, this approach relaxes the non-anticipativity constraints completely; see Fig. 1b. To enforce these constraints, new constraints are introduced that “rebuild” the links between subproblems, usually through some penalty function (see Fig. 1c).
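The links that a split-variable formulation must rebuild can be made concrete. The following is a minimal sketch, assuming scenarios are given as root-to-leaf lists of node labels and each scenario carries its own copy of the decisions for every period. Non-anticipativity requires that all scenarios passing through the same node in period $t$ share the period-$t$ decision. The hypothetical function below enforces this by replacing each copy with the probability-weighted average over its group. This averaging is the projection step used, for example, in progressive hedging, and it is not necessarily the penalty scheme of [4].

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def nonanticipative_projection(paths: Sequence[Sequence[str]],
                               probs: Sequence[float],
                               decisions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace the period-t decision of each scenario by the probability-weighted
    average over all scenarios that pass through the same period-t node."""
    groups: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    for s, path in enumerate(paths):
        for t, node in enumerate(path):
            groups[(t, node)].append(s)
    projected = [list(x) for x in decisions]
    for (t, _), members in groups.items():
        mass = sum(probs[s] for s in members)
        avg = sum(probs[s] * decisions[s][t] for s in members) / mass
        for s in members:
            projected[s][t] = avg
    return projected


# Hypothetical tree: four scenarios sharing the root and pairwise sharing period-2 nodes.
paths = [["root", "u", "uu"], ["root", "u", "ud"], ["root", "d", "du"], ["root", "d", "dd"]]
probs = [0.2, 0.3, 0.25, 0.25]
decisions = [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0], [2.0, 1.0, 7.0], [2.0, 3.0, 8.0]]
for row in nonanticipative_projection(paths, probs, decisions):
    print(row)
```

After projection, all scenarios share one first-period decision. Scenarios through node "u" share one second-period decision, and so do those through node "d". Leaf-period decisions are unchanged, because each leaf belongs to a single scenario.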


Methods 


The importance of decomposition algorithms in SP was 
recognized early on, as results in the theory of stochas- 
tic programs are closely linked with their solution al- 
gorithms. The two algorithms described in this section 
represent two very promising approaches in decompo- 
sition of SPs. 


Nested Benders Decomposition (NBD) 


Benders decomposition was first proposed in [2], and it 
has been applied to SP by [26]; it is usually referred to 
as the L-shaped method due to the structure of the con- 
straint matrix. The extension to the non-linear convex 
case was carried out in [12], and the extension to the general convex SP appears in [8]. The algorithm has also been widely studied for multistage problems in a parallel environment [6]. More recent studies appear in [18], and in [15] the quadratic case is also studied.
It can easily be seen that problem (2a)-(2c) is equivalent to:


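$$\min_{x,\theta}\ \tfrac{1}{2}x^\top Hx+c^\top x+e^\top\theta$$
$$\text{s.t.}\quad Ax=b,\qquad \theta_\omega\ge p_\omega\,Q(x,\xi(\omega))\ \ \text{for every }\omega,\qquad x^l\le x\le x^u \qquad (8)$$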

where $e$ is a vector of ones. The dimension of the latter vector is equal to the number of nodes in the next period. The expression $p_\omega Q(x,\xi(\omega))$ represents the value of the next-stage decision if event $\omega$ occurs (with probability $p_\omega$). The dimensions of the rest of the data are the same as in (2a). Even though it is possible to aggregate the $\theta$ vector into a single variable, computational studies [5,6] have shown that this reduction in the number of variables did not enhance performance, possibly because of the loss of information.

To represent the recourse function in (8), we construct an approximation using outer linearizations. This is achieved by computing cuts (cutting planes), of which there are two types: optimality cuts and feasibility cuts. Instead of solving the large-scale problem (8), we solve the relaxed version

$$\min_{x,\theta}\ \tfrac{1}{2}x^\top Hx+c^\top x+e^\top\theta$$
$$\text{s.t.}\quad Ax=b,$$
$$Dx\ge d \qquad (9a)$$
$$\theta\ge Gx+g \qquad (9b)$$
$$x^l\le x\le x^u$$

where (9a) and (9b) represent feasibility and optimality cuts, respectively. The aim of these constraints is to approximate the feasible region of (8). Feasibility cuts are constructed as follows. Assume that $t = T$. Then, for a fixed $\omega$, the $v$th problem in (7) takes the following form:

$$Q(x)=\min_{y}\ \tfrac{1}{2}y^\top Hy-(1-\alpha)c^\top y\quad\text{s.t.}\quad Wy=h-Tx_{a(v)},\qquad y^l\le y\le y^u \qquad (10)$$
Assume that this problem is infeasible because of the vector $x_{a(v)}$ generated in a subproblem of a previous stage. Consider the following problem:

$$P(y,x_{a(v)})=\min_{y,\,y^+,\,y^-}\ e^\top y^+ + e^\top y^-$$
$$\text{s.t.}\quad Wy+y^+-y^-=h-Tx_{a(v)} \qquad (11)$$
$$y^l\le y\le y^u \qquad (12)$$
$$y^+,\ y^-\ge 0$$


Then, since the original problem was infeasible because of $x_{a(v)}$, we must have $P(\cdot,x_{a(v)})>0$. Let $\lambda$ be the Lagrange multiplier of the constraint in (11). By duality we must also have $\lambda^\top(h-Tx_{a(v)})<0$. Set $D=\lambda^\top T$ and $d=\lambda^\top h$ to obtain (9a), a supporting hyperplane to $Q(x)$. To apply this result when $t\ne T$, note that the same procedure is applied recursively, taking into consideration the additional constraints from cuts of other subproblems.

For optimality cuts, one proceeds as follows. Again we start with the problem in (10), and we let $x_k$ be the solution vector of a subproblem in the previous stage. By the gradient inequality we must have ($\omega$ is dropped since it is clear from context):

$$Q(x)\ge Q(x_k)+\nabla Q(x_k)(x-x_k),$$
$$Q(x_k)=\tfrac{1}{2}y^\top Hy-(1-\alpha)c^\top y+\lambda^\top(Wy-h+Tx_k),$$
$$\nabla Q(x_k)=\lambda^\top T.$$

Thus

$$Q(x)\ge\lambda^\top Tx+\tfrac{1}{2}y^\top Hy-(1-\alpha)c^\top y+\lambda^\top(Wy-h).$$

Set

$$\theta=Q(x),\qquad G=\lambda^\top T,\qquad g=\tfrac{1}{2}y^\top Hy-(1-\alpha)c^\top y+\lambda^\top(Wy-h)$$


to obtain (9b). Since we require a lower support for the expected value, we then multiply $G$ and $g$ by the probability $p_\omega$ of the particular realization $\xi(\omega)$ for two-stage problems, and by the conditional probability for multistage problems. The application of optimality cuts when $t\ne T$ is again developed recursively, simply by taking into account the additional variables and constraints.
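The cut coefficients can be assembled directly from the subproblem solution $y$, its multiplier $\lambda$ and the data of (10). The following plain-Python sketch rests on a few assumptions: matrices are dense lists of rows, the bounds $y^l\le y\le y^u$ are inactive, $\lambda$ satisfies the stationarity condition $Hy-(1-\alpha)c+W^\top\lambda=0$ for the Lagrangian sign convention used above, and the helper name `optimality_cut` is hypothetical. It returns the probability-scaled pair $(G, g)$ of (9b).

```python
def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def optimality_cut(lam, T, y, H, c, W, h, alpha, prob=1.0):
    """Return (G, g) such that theta >= G x + g, scaled by the scenario probability."""
    # G = lambda' T: one coefficient per variable of the ancestor problem
    G = [sum(lam[r] * T[r][j] for r in range(len(lam))) for j in range(len(T[0]))]
    # g = 1/2 y'Hy - (1 - alpha) c'y + lambda'(Wy - h)
    resid = [wy - hi for wy, hi in zip(matvec(W, y), h)]
    g = 0.5 * dot(y, matvec(H, y)) - (1 - alpha) * dot(c, y) + dot(lam, resid)
    return [prob * Gj for Gj in G], prob * g
```

As a one-dimensional check, take $H=W=T=1$, $c=1$, $\alpha=0.5$, $h=2$ and $x_k=1$. The only feasible $y$ is 1, stationarity gives $\lambda=-0.5$, and the resulting cut $\theta\ge -0.5x+0.5$ is tight at $x_k$, where $Q(x_k)=0$.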

The algorithm proceeds by solving the relaxed prob- 
lem (7) to obtain a solution vector, known as the pro- 
posal vector. The latter is then used to solve the sub- 
problems in (10). If a subproblem is feasible then an 
optimality cut is appended to the constraint set of the 
ancestor problem (also called the master problem). Oth- 
erwise, only a feasibility cut is appended. 

In the linear case, the algorithmic framework developed above has some well-known drawbacks [5,21]. We expect issues similar to the following to manifest themselves in the quadratic case:

• The algorithm tends to be inefficient in early iterations because the cuts give a poor description of the original objective function. Moreover, even if a good warm start is used, the algorithm may deviate significantly from this point, so any efficiency gained from a good starting point is lost.

• The number of cuts in the master problems may increase substantially, adding a considerable computational burden to their solution.

For these reasons a regularized version of the algorithm was proposed in [21]; see also [22,23] for the multistage version. Ruszczyński's results, as well as a study performed in [28], indicate that the regularized version outperforms the original algorithm.

The basic idea is to add a quadratic term $\rho\,\|x-\hat{x}\|^2$ to the objective function, where $\hat{x}$ is chosen as the “best” current point (in a sense made precise below) and $\rho$ is a penalty parameter. For a high value of $\rho$
the algorithm is penalized from deviating from the cur- 
rent point. In [21], the convergence of the algorithm for 
p= 5 was established for the convex case. The regular- 
izing term stabilizes the behavior of the algorithm be- 
tween iterations, enables valid deletion schemes of the 
cuts, and avoids degenerate iterations that would oth- 
erwise be possible. 

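The original problem is now decomposed into three types of subproblems. The first type is for the root node. The following problem is solved at each iteration:

$$\min_{x,\theta}\ \tfrac{1}{2}\alpha\, x^\top Hx-(1-\alpha)c^\top x+e^\top\theta+\tfrac{\rho}{2}\|x-\hat{x}\|^2$$
$$\text{s.t.}\quad Ax=b,\qquad \theta\ge Gx+g,\qquad Dx\ge d,\qquad x^l\le x\le x^u.$$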

The second type is for non-terminal nodes. The following subproblem needs to be considered:

$$\min_{y_v,\theta_v}\ -(1-\alpha)c_v^\top y_v+\tfrac{\alpha}{2}y_v^\top H_v y_v+e^\top\theta_v+\tfrac{\rho}{2}\|y_v-\hat{y}_v\|^2$$
$$\text{s.t.}\quad W_v y_v=h_v-T_{a(v)}x_{a(v)},\qquad \theta_v\ge G_v y_v+g_v,\qquad D_v y_v\ge d_v,\qquad y_v^l\le y_v\le y_v^u \qquad (13)$$

The third type is for terminal nodes. This type of subproblem is identical to (13) without, of course, the cuts in the constraint set and the regularizing term in the objective function.

The way cuts are recursively defined and the way 
subproblems are nested in each other has led this to be 
referred to as nested Benders decomposition (NBD). The 
algorithm can now be stated as follows: 


Step 1: Set the iteration counter $i = 0$ and $t = k = 0$, and let $\hat{x}$ be a feasible point.

Step 2: Construct and solve subproblem $v = (t,k)$ to find the solution vector $x^i$.

Step 2.1: If the problem is infeasible and $t = 0$, then STOP: the problem is infeasible.

Step 2.2: If the problem is infeasible and $t > 0$, generate a feasibility cut (9a) and append it to the constraint set of $a(v)$.

Step 2.3: If the problem was optimal and $t > 0$, generate an optimality cut (9b) and append it to the constraint set of $a(v)$.


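Step 3: Compute

$$\hat{F}(x^i)=\tfrac{1}{2}x^{i\top}Hx^i+c^\top x^i+\sum_{j\in d(v)}\theta_j,$$
$$F(x^i)=\tfrac{1}{2}x^{i\top}Hx^i+c^\top x^i+\sum_{j\in d(v)}p_j\,Q_j(x^i).$$

If $\hat{F}=F$ and $t=k=0$, then STOP: $\hat{x}$ is optimal. Otherwise, go to Step 4.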
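Step 4: Update the regularizing term (here $\gamma\in(0,1)$ is a fixed parameter):

4.1 If a subproblem returned a feasibility cut, then set $\hat{x}^{i+1}=\hat{x}^i$.

4.2 If $F(x^i) > F(\hat{x}^i)$ or $F(x^i) > \gamma F(\hat{x}^i)+(1-\gamma)\hat{F}$, then set $\hat{x}^{i+1}=\hat{x}^i$ and increase $\rho$.

4.3 If $F(x^i) < F(\hat{x}^i)$ or $F(x^i) < \gamma F(\hat{x}^i)+(1-\gamma)\hat{F}$, then set $\hat{x}^{i+1}=x^i$ and decrease $\rho$.

4.4 If $F(x^i)=\hat{F}$, then set $\hat{x}^{i+1}=x^i$ and decrease $\rho$.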


Step 5: Set $i = i + 1$, find the next subproblem to solve (see below), and go to Step 2.


Augmented Lagrangian Decomposition (ALD) 


An alternative to the Benders decomposition algorithm described in the previous section is based on the augmented Lagrangian and the method of multipliers [3]. The fundamental difference between NBD and ALD is the way the two algorithms handle the non-anticipativity constraints. NBD handles these constraints with a master problem that generates proposals for the subproblems further down the event tree; the proposal vectors are influenced by “future” nodes through feasibility and optimality cuts. ALD takes a different approach: the non-anticipativity constraints are relaxed by expressing the large-scale problem in terms of smaller subproblems that are discouraged from violating the original constraints. The algorithm we use was developed in [4], so here we only sketch the main idea. ALD was developed for and applied to the stochastic quadratic programming setting in [25], with encouraging results. Algorithms similar to ALD have been developed for linear stochastic programs [16,17].


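The expectation in (6) for a given time period can also be written as

$$\min\ \sum_{i=1}^{m}p_i\Big(\tfrac{1}{2}y_i^\top H_i y_i-(1-\alpha)c_i^\top y_i\Big)$$
$$\text{s.t.}\quad \sum_{i=1}^{m}W_{ji}y_i=h_j-T_jx_{a(i)},\qquad j=1,\dots,r, \qquad (14)$$
$$y_i^l\le y_i\le y_i^u,\qquad i=1,\dots,m.$$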

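The problem in (14) is to be interpreted as follows. In the current time period there are $m$ scenarios, each having different realizations of $H$, $W$, $c$, etc. The $r$ constraints in (14) are linking constraints, tied together by the vector $x_{a(i)}$. In [4] the problem is decomposed by introducing a new variable $z$ as follows:

$$\min\ \sum_{i=1}^{m}p_i\Big(\tfrac{1}{2}y_i^\top H_i y_i-(1-\alpha)c_i^\top y_i\Big)$$
$$\text{s.t.}\quad W_{ji}y_i=z_{ji},\qquad j=1,\dots,r;\ i\in I(j), \qquad (15)$$
$$\sum_{i\in I(j)}z_{ji}=h_j-T_jx_{a(i)},\qquad j=1,\dots,r,$$
$$y_i^l\le y_i\le y_i^u,\qquad i=1,\dots,m,$$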


where $I(j)$ contains the indices of the subproblems that the $j$th constraint “crosses”, i.e., $I(j)=\{i \mid W_{ji}\ne 0\}$. “Crosses” means that a constraint contains data from more than one subproblem. Clearly (14) and (15) are equivalent, but the structure of (15) lends itself to a decomposition algorithm based on relaxing its constraints. In [4] the method of multipliers is used for the general problem $\min\{f(x)\mid Ax=b\}$. Let $L_c(x,\lambda)$ denote the associated augmented Lagrangian, defined by $L_c(x,\lambda)=f(x)+\lambda^\top(Ax-b)+\tfrac{c}{2}\|Ax-b\|^2$, where $\lambda$ is the vector of multipliers. The general algorithmic framework of the method of multipliers can be described as follows:


Step 1: Initialization. Set the iteration counter $k=0$ and choose $c(0)>0$. Set $x(0)$ and $\lambda(0)$ as the starting points for the decision variables and the Lagrange multipliers, respectively.

Step 2: Compute the next point $x(k+1)=\arg\min_x L_{c(k)}(x,\lambda(k))$.

Step 3: Update the Lagrange multiplier vector: $\lambda(k+1)=\lambda(k)+c(k)\big(Ax(k+1)-b\big)$.

Step 4: Update the penalty parameter $c(k)$ and set $k=k+1$. If the convergence criterion is not satisfied, go to Step 2.
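To make the four steps concrete, here is a minimal sketch on a problem small enough that Step 2 has a closed form: minimize $f(x)=\tfrac{1}{2}\|x\|^2$ subject to a single linear constraint $a^\top x=b$. The test problem, the function name and the default schedule ($c(0)=1$, multiplied by 1.05 per iteration) are illustrative assumptions, not the settings used in [4]. For these data the exact solution is $x^*=b\,a/\|a\|^2$ with multiplier $\lambda^*=-b/\|a\|^2$.

```python
def method_of_multipliers(a, b, c0=1.0, growth=1.05, tol=1e-10, max_iter=200):
    """Method of multipliers for: minimize 0.5*||x||^2 subject to a.x = b."""
    norm2 = sum(ai * ai for ai in a)
    lam, c = 0.0, c0                       # Step 1: lambda(0) = 0, c(0) = c0
    x = [0.0] * len(a)
    for _ in range(max_iter):
        # Step 2: exact minimiser of L_c(x, lam) = 0.5||x||^2 + lam(a.x - b) + (c/2)(a.x - b)^2.
        # Setting the gradient x + (lam + c(a.x - b)) a to zero gives x = -(lam + c(s - b)) a,
        # where s = a.x solves s = (c b - lam) ||a||^2 / (1 + c ||a||^2).
        s = (c * b - lam) * norm2 / (1.0 + c * norm2)
        x = [-(lam + c * (s - b)) * ai for ai in a]
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        lam += c * residual                 # Step 3: multiplier update
        c *= growth                         # Step 4: penalty update
        if abs(residual) < tol:             # convergence test on feasibility
            break
    return x, lam
```

With `method_of_multipliers([1.0, 2.0], 5.0)` the iterates approach $x^*=(1,2)$ and $\lambda^*=-1$. Because Step 2 is solved exactly here, each iterate satisfies $x(k+1)=-\lambda(k+1)\,a$, so the primal and dual sequences converge together.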
Applying this general algorithmic framework to (15), the problem is decomposed into $m$ subproblems, and the non-anticipativity constraints are enforced through the penalty term in the augmented Lagrangian.

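The computation for the solution of (15) involves keeping $z$ fixed in order to compute the next incumbent for $y$, and then keeping $y$ fixed in order to compute the next incumbent for $z$. Thus, at the $k$th iteration the following subproblems are solved:

$$y_i(k+1)=\arg\min_{y_i^l\le\xi_i\le y_i^u}\Big\{p_i\Big(\tfrac{1}{2}\xi_i^\top H_i\xi_i-(1-\alpha)c_i^\top\xi_i\Big)+\sum_{\{j\mid i\in I(j)\}}\lambda_{ji}(k)W_{ji}\xi_i+\frac{c(k)}{2}\sum_{\{j\mid i\in I(j)\}}\big(W_{ji}\xi_i-z_{ji}(k)\big)^2\Big\},\qquad \forall i=1,\dots,m,$$

$$z_j(k+1)=\arg\min_{\zeta_j}\Big\{-\sum_{i\in I(j)}\lambda_{ji}(k)\zeta_{ji}+\frac{c(k)}{2}\sum_{i\in I(j)}\big(W_{ji}y_i(k+1)-\zeta_{ji}\big)^2\Big\}\quad\text{s.t.}\quad \sum_{i\in I(j)}\zeta_{ji}=h_j-T_jx_{a(i)},\qquad \forall j=1,\dots,r,$$

followed by an update of the Lagrange multiplier vector

$$\lambda_{ji}(k+1)=\lambda_{ji}(k)+c(k)\big(W_{ji}y_i(k+1)-z_{ji}(k+1)\big).$$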
From a computational point of view the above iterative 
framework is inefficient because of the alternate min- 
imizations required, making this algorithm unsuitable 
for a parallel environment. In our implementation we 
used the more efficient iteration proposed in [4]: 

$$y_i(k+1)=\arg\min_{y_i^l\le\xi_i\le y_i^u}\Big\{p_i\Big(\tfrac{1}{2}\xi_i^\top H_i\xi_i-(1-\alpha)c_i^\top\xi_i\Big)+\sum_{\{j\mid i\in I(j)\}}\lambda_j(k)W_{ji}\xi_i+\frac{c(k)}{2}\sum_{\{j\mid i\in I(j)\}}\big(W_{ji}(\xi_i-y_i(k))+w_j(k)\big)^2\Big\} \qquad (16)$$

where $w_j(k)=\frac{1}{m_j}\big(\sum_{i\in I(j)}W_{ji}y_i(k)-h_j+T_jx_{a(i)}\big)$, $\lambda_j(k+1)=\lambda_j(k)+\frac{c(k)}{m_j}\big(\sum_{i\in I(j)}W_{ji}y_i(k+1)-h_j+T_jx_{a(i)}\big)$, and $m_j$ denotes the cardinality of $I(j)$. The derivation of this iteration is discussed in Bertsekas and Tsitsiklis ([4], p. 249). The expression in (16) forms the main iteration of the ALD algorithm. To complete the description of the algorithm, we need to specify how the penalty parameter $c(k)$ is updated and how convergence is tested.

The obvious convergence criteria for ALD are a test 
for feasibility and small changes in the objective func- 
tion. However, it is possible, due to a poor selection of 
updates for c(k), to reach a suboptimal solution. For 
this reason, it is vital to check the KKT conditions of 
the problem in addition to any other stopping criteria. 
If the KKT conditions are not satisfied while the change 
in the objective function is small (10~° in our imple- 
mentation), the update strategy for the penalty param- 
eter appears to have been inappropriate. We performed 
various experiments with different update strategies for this penalty parameter and found that the strategy that works best on most problems is to start with a small value (0.001) and multiply it by a small factor (1.05) at every iteration. Updating this parameter more aggressively caused the algorithm to terminate prematurely. Note that an arbitrary starting point
can be used to start the algorithm. If a feasible solution 
or the solution from a previous run is available it may 
be beneficial to start with a higher penalty term. 
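The reported schedule, $c(k)=0.001\cdot 1.05^{k}$, is deliberately conservative. Assuming the update is applied once per iteration, as stated above, a short calculation shows how many iterations pass before the penalty parameter reaches 1:

$$0.001\cdot 1.05^{k}\ge 1 \iff k\ge\frac{\ln 1000}{\ln 1.05}\approx\frac{6.908}{0.04879}\approx 141.6,$$

so $c(k)$ first exceeds 1 at $k=142$.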


Numerical Experiments 


The two algorithms were implemented and tested on a multistage financial planning problem. The detailed results can be found in [19]. Figure 2 summarizes the numerical performance (in terms of CPU time) as the number of scenarios increases. ALD and NBD stand for Augmented Lagrangian Decomposition and Nested Benders Decomposition, respectively. ONBD refers to Ordinary Nested Benders Decomposition, i.e., NBD without the regularizing term. Figure 2 shows that the regularized version of Benders decomposition is the most efficient of the algorithms considered in this article. This result is in line with similar studies performed in the linear setting. One possible explanation is that the NBD algorithm exploits the constraint structure of multistage stochastic programming problems more effectively. Note that the ALD algorithm can be applied to separable convex problems with a more general constraint structure, whereas NBD would need to be modified to apply to other types of separable problems. SP problems are one of the most frequently occurring classes of large-scale problems, so it is important to know whether cutting-plane algorithms or Lagrangian-based algorithms exploit this structure more effectively. The results of our experiments suggest that the NBD algorithm is substantially better. Furthermore, we found that the choice of penalty parameter often had a notable effect on the convergence times of both NBD and ALD, and finding an update scheme that works for all problems is difficult. In ALD the penalty parameter has two goals: forcing feasibility and keeping successive iterates close to each other. A ‘suboptimal’ penalty update scheme may therefore be more damaging than in NBD, which may give some insight into the difference in performance between the two algorithms. More detailed numerical experiments can be found in [19].

Figure 2. Solution times vs. number of scenarios
In many nonconvex optimization problems the set of variables is partitioned into two groups such that the problem becomes much easier to solve when the variables in one group are temporarily held fixed. To exploit this structure, a method that has proved to be efficient is to decompose the problem into a sequence of easier subproblems involving only the variables of the other group. The basic tool for this decomposition is the branch and bound (BB) concept.


BB Procedure for Decomposition 


Consider the nonconvex global optimization problem

$$\min\{F(x,y):\ G(x,y)\preceq_K 0,\ x\in X,\ y\in Y\}, \qquad (P)$$

where $X$ is a compact convex subset of $\mathbb{R}^n$, $Y$ is a closed convex subset of $\mathbb{R}^p$, $F: X\times Y\to\mathbb{R}$, $G: X\times Y\to\mathbb{R}^m$, $K$ is a closed convex cone in $\mathbb{R}^m$, and $\preceq_K$ is the partial ordering in $\mathbb{R}^m$ induced by the cone $K$, i.e., $y\preceq_K y' \Leftrightarrow y'-y\in K$.

Problems of this form abound in applications such 
as pooling and blending in oil refining, optimal design 
of water distribution, structural design, signal process- 
ing, robust stability analysis and design of chips. 

Suppose that, once $x\in X$ is fixed, problem (P) becomes an easier problem in $y\in Y$. Then, to take advantage of this property, one can solve (P) by a BB algorithm with branching performed in the $x$-space.

Specifically, at iteration $k$ of the BB procedure a collection $\mathcal{S}_k$ of partition sets in the $x$-space is considered, where for each partition set $M\in\mathcal{S}_k$ a number (lower bound) $\beta(M)\in\mathbb{R}\cup\{+\infty\}$ has been computed such that

$$\beta(M)\le\inf\{F(x,y):\ G(x,y)\preceq_K 0,\ x\in M\cap X,\ y\in Y\}. \qquad (1)$$
A partition set $M_k\in\arg\min\{\beta(M): M\in\mathcal{S}_k\}$ is then further subdivided according to an exhaustive subdivision rule (e.g., the standard bisection rule, if rectangular subdivision is used), while the best feasible solution available, $(x^*,y^*)$, is recorded. The new collection $\mathcal{S}_{k+1}$ is formed by removing every $M$ such that $\beta(M)\ge F(x^*,y^*)$. If $\mathcal{S}_{k+1}=\emptyset$, the procedure terminates, concluding that the problem is infeasible if no feasible solution is available, or else that the current best feasible solution is optimal. Otherwise, the next iteration is started.
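The procedure can be sketched for a one-dimensional $x$-space, with intervals as partition sets and bisection as the exhaustive subdivision rule. This is an illustrative sketch under explicit assumptions. `phi(x)` is a hypothetical black box that returns the (finite) optimal value of the easy subproblem in $y$ for fixed $x$. In place of the Lagrangian bounds discussed in this article, the sketch uses a Lipschitz bound $\beta([r,s])=\varphi(m)-L(s-r)/2$ at the midpoint $m$, which is valid only if $\varphi$ is $L$-Lipschitz on $X$. Taking the maximum with the parent's bound enforces monotonicity under refinement, as required by condition (a) stated below.

```python
import heapq


def bb_decompose(phi, lipschitz, lo, hi, tol=1e-6, max_iter=100_000):
    """Branch and bound over x in [lo, hi]; phi(x) solves the y-subproblem for fixed x.

    Assumes phi is finite and Lipschitz on [lo, hi], so that
    phi(m) - lipschitz * (s - r) / 2 is a valid lower bound on [r, s].
    """
    def evaluate(r, s, parent_bound):
        m = 0.5 * (r + s)
        val = phi(m)                        # feasible point (m, y*(m)) with value phi(m)
        # max with the parent's bound keeps M' in M  =>  beta(M') >= beta(M)
        bound = max(parent_bound, val - lipschitz * (s - r) / 2)
        return bound, m, val

    bound, best_x, best_val = evaluate(lo, hi, float("-inf"))
    heap = [(bound, lo, hi)]                # collection S_k, keyed by lower bound
    for _ in range(max_iter):
        if not heap:
            break                           # S_{k+1} empty: incumbent optimal within tol
        bound, r, s = heapq.heappop(heap)   # M_k in argmin{beta(M): M in S_k}
        if bound >= best_val - tol:
            break                           # every remaining set is fathomed
        mid = 0.5 * (r + s)                 # bisection
        for cr, cs in ((r, mid), (mid, s)):
            cb, cm, cval = evaluate(cr, cs, bound)
            if cval < best_val:
                best_x, best_val = cm, cval  # update the incumbent
            if cb < best_val - tol:
                heapq.heappush(heap, (cb, cr, cs))  # keep only sets that may improve
    return best_x, best_val


# Hypothetical example: F(x, y) = (x - 0.3)^2 + (y - x)^2, x in [0, 1], y in [0.5, 1].
# For fixed x the y-subproblem is solved in closed form by clipping x to [0.5, 1].
def phi_example(x):
    y = min(max(x, 0.5), 1.0)
    return (x - 0.3) ** 2 + (y - x) ** 2


x_best, value = bb_decompose(phi_example, lipschitz=2.0, lo=0.0, hi=1.0)
```

On this instance $|\varphi'|\le 1.6$ on $[0,1]$, so $L=2$ is a valid constant. The optimum is $x=0.4$ (with $y=0.5$) and value $0.02$, which the sketch approaches to within the tolerance.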

A key operation in this procedure is bounding: $M\mapsto\beta(M)$. It is assumed that this operation satisfies the following natural conditions:

(a) $M'\subset M \Rightarrow \beta(M')\ge\beta(M)$;
(b) $\beta(M)<+\infty \Rightarrow M\cap X\ne\emptyset$. (2)
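When the BB procedure is infinite, it generates a filter (an infinite nested sequence of partition sets) $M_{k_\nu}$, $\nu=1,2,\dots$, such that

$$\beta(M_{k_\nu})\le\min(P)\quad\forall\nu,\qquad M_{k_\nu}\cap X\ne\emptyset\quad\forall\nu,\qquad \bigcap_{\nu=1}^{+\infty}M_{k_\nu}=\{x^*\}. \qquad (3)$$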

The algorithm is said to be convergent if $x^*\in X$ and

$$\min(P)=\min\{F(x^*,y):\ G(x^*,y)\preceq_K 0,\ y\in Y\}, \qquad (4)$$

so that any optimal solution $y^*$ of this problem yields an optimal solution $(x^*,y^*)$ of (P).

The basic issue for this decomposition scheme is under which conditions the BB procedure described above is guaranteed to converge in the sense of (4).

First observe that, since $M_{k_{\nu+1}}\subset M_{k_\nu}$ and hence $\beta(M_{k_{\nu+1}})\ge\beta(M_{k_\nu})$, we have from (3)

$$\beta(M_{k_\nu})\nearrow\beta^*\le\min(P). \qquad (5)$$


Theorem 1 If $\beta(M_k)=+\infty$ for some $k$, then (P) is infeasible and the algorithm terminates. If $\beta(M_k)<+\infty$ $\forall k$, then there is an infinite subsequence $M_{k_\nu}$, $\nu=1,2,\dots$, satisfying (3) and such that $x^*\in X$. If in addition

$$\beta^*=\min\{F(x^*,y):\ G(x^*,y)\preceq_K 0,\ y\in Y\}, \qquad (6)$$

then the BB decomposition algorithm is convergent.


Condition (6) simply says that the lower bound must eventually be exact as $k\to+\infty$. Also note that, to ensure $x^*\in X$, the condition $x\in M\cap X$ in (1) is essential and cannot be omitted.


Convergence Achieved with Lagrangian Bounds 


In many important cases Lagrangian bounds can be used throughout the decomposition algorithm, so that for every partition set $M$:

$$\beta(M)=\sup_{\lambda\in K^*}\inf\{F(x,y)+\langle\lambda,G(x,y)\rangle:\ x\in M\cap X,\ y\in Y\}, \qquad (7)$$

where $K^*=\{\lambda\in\mathbb{R}^m:\ \langle\lambda,u\rangle\ge 0\ \forall u\in K\}$ is the dual cone of $K$.
For every $t>0$ define

$$v(t)=\sup_{\lambda\in K^*}\ \inf_{\|x-x^*\|\le t,\ x\in X,\ y\in Y}\{F(x,y)+\langle\lambda,G(x,y)\rangle\}, \qquad (8)$$

where, as throughout what follows, $x^*$ denotes the limit point of an exhaustive filter of partition sets generated by the BB algorithm, i.e., an infinite nested sequence $\{M_{k_\nu}\}$ such that $\bigcap_{\nu=1}^{+\infty}M_{k_\nu}=\{x^*\}$.

Theorem 2 Assume Lagrangian bounds are used throughout the BB decomposition algorithm, and:

(A1) $v(t)\nearrow v(0)$ as $t\searrow 0$.

(A2) $\sup_{\lambda\in K^*}\inf_{y\in Y}\{F(x^*,y)+\langle\lambda,G(x^*,y)\rangle\}=\min_{y\in Y}\sup_{\lambda\in K^*}\{F(x^*,y)+\langle\lambda,G(x^*,y)\rangle\}$.

Then the BB decomposition algorithm is convergent.

Condition A1 expresses the continuity of $v(t)$ at $t=0$. Condition A2 requires that the duality gap be zero for the subproblem $\min_{y\in Y}\{F(x^*,y):\ G(x^*,y)\preceq_K 0\}$.

Theorem 3 Assume that $F(x,y)$, $G_i(x,y)$, $i=1,\dots,m$, are lower semi-continuous and:

(i) There exists a compact set $Y^\circ\subset Y$ such that

$$(\forall x\in X)(\forall\lambda\in K^*)\quad Y^\circ\cap\arg\min_{y\in Y}\{F(x,y)+\langle\lambda,G(x,y)\rangle\}\ne\emptyset; \qquad (9)$$

(ii) $\sup_{\lambda\in K^*}\inf_{y\in Y}\{F(x^*,y)+\langle\lambda,G(x^*,y)\rangle\}=\min_{y\in Y}\sup_{\lambda\in K^*}\{F(x^*,y)+\langle\lambda,G(x^*,y)\rangle\}$.

Then the BB decomposition algorithm using Lagrangian bounds is convergent. Furthermore, the function $x\mapsto\varphi(x):=\min\{F(x,y):\ G(x,y)\preceq_K 0,\ y\in Y\}$ is lower semicontinuous at $x^*$ and satisfies

$$\lim_{x\in X,\ x\to x^*}\varphi(x)=\min(P). \qquad (10)$$


Remark 1 Condition (i) cannot be replaced by the following weaker one:

(*) There exists a compact set $Y^\circ\subset Y$ such that for each $\lambda\in\mathbb{R}^m_+$ and for each $x\in X$ the set of optimal solutions of the problem

$$\min\Big\{F(x,y)+\sum_{i=1}^{m}\lambda_i G_i(x,y):\ y\in Y\Big\} \qquad (11)$$

is either empty or has a nonempty intersection with $Y^\circ$.


Partly Convex Optimization Problems 


An important class of problems (P) is constituted by partly convex problems, i.e., problems (P) satisfying the following assumption:

(PCA) For every fixed $x\in X$ the function $y\mapsto F(x,y)$ is convex, while the mapping $y\mapsto G(x,y)$ is $K$-convex. The latter means that $G(x,\alpha y^1+(1-\alpha)y^2)\preceq_K\alpha G(x,y^1)+(1-\alpha)G(x,y^2)$ whenever $y^1,y^2\in\mathbb{R}^p$ and $0\le\alpha\le 1$.

Owing to (PCA), for every $\lambda\in K^*$ and fixed $x\in X$ the function $y\mapsto\langle\lambda,G(x,y)\rangle$ is convex, and the problem $\min\{F(x,y)+\langle\lambda,G(x,y)\rangle:\ y\in Y\}$ is a convex optimization problem. Specific decomposition methods for this class of problems were developed earlier in [3,4], and more recently in [1]. Within the present framework, the convergence conditions can be specialized as follows.

A function $f: Y\to\mathbb{R}$ is said to be coercive on $Y$ if $\lim_{y\in Y,\ \|y\|\to+\infty}f(y)=+\infty$. Clearly this is equivalent to saying that for any $\eta\in\mathbb{R}$ the set $\{y\in Y: f(y)\le\eta\}$ is bounded.


Theorem 4 Assume (PCA), with $F(x,y)$, $G_i(x,y)$, $i=1,\dots,m$, lower semi-continuous on $X\times Y$ and continuous in $x$ for fixed $y\in Y$. Assume further that:

(S) For some $\lambda^*\in K^*$ the function $y\mapsto F(x^*,y)+\langle\lambda^*,G(x^*,y)\rangle$ is coercive on $Y$.

Then the BB decomposition algorithm using Lagrangian bounds is convergent, and the function $\varphi(x):=\min\{F(x,y):\ G(x,y)\preceq_K 0,\ y\in Y\}$ is lower semicontinuous at $x^*$ and satisfies

$$\lim_{x\in X,\ x\to x^*}\varphi(x)=\min(P).$$

Remark 2 Condition A2, sometimes referred to as dual properness at $x^*$, means that the subproblem $\min\{F(x^*,y):\ G(x^*,y)\preceq_K 0,\ y\in Y\}$ has zero duality gap. When $Y$ is bounded, condition (S) obviously holds, so by Theorem 4 both conditions A1 and A2 follow from (PCA) and the lower semicontinuity of $F(x,y)$, $G_i(x,y)$, $i=1,\dots,m$. On the other hand, when $Y$ is unbounded, dual properness (i.e., condition A2), even coupled with continuity of the functions involved, is not sufficient to guarantee condition A1. These results suggest that the validity of several methods developed in the literature for problems of the form (P) should be re-examined.


Partly Linear Optimization 


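A subclass of the class of partly convex optimization problems is formed by partly linear optimization problems, which have the general formulation

$$\min\{\langle c(x),y\rangle+\langle c^\circ,x\rangle:\ A(x)y+B(x)\le b,\ r\le x\le s,\ y\ge 0\}, \qquad (\mathrm{GPL})$$

where $x\in\mathbb{R}^n$, $y\in\mathbb{R}^p$, $c:\mathbb{R}^n\to\mathbb{R}^p$, $c^\circ\in\mathbb{R}^n$, $A:\mathbb{R}^n\to\mathbb{R}^{m\times p}$, $B:\mathbb{R}^n\to\mathbb{R}^m$, $b\in\mathbb{R}^m$ and $r,s\in\mathbb{R}^n$.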

A special case of interest is the “pooling and blending problem” from the petrochemical industry, which can be stated as

$$\min\{c^\top y:\ A(x)y\le b,\ y\ge 0,\ x\in X\},$$

where $X$ is a box in $\mathbb{R}^n$ and $A(x)$ is an $m\times p$ matrix whose elements $a_{ij}(x)$ are continuous functions of $x$. Condition (S) in Theorem 4 now reads:

For some $\lambda^*\in\mathbb{R}^m_+$ we have $c^\top y+\langle\lambda^*,A(x^*)y-b\rangle\to+\infty$ as $y\ge 0$, $\|y\|\to+\infty$,

which clearly holds if and only if $A(x^*)^\top\lambda^*+c>0$.
For example, this condition is fulfilled by the partly linear problems considered in [1], and also by the bilinear matrix inequalities problem studied in [5].
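The coercivity requirement in (S) reduces to a componentwise sign check. This is because $c^\top y+\langle\lambda^*,A(x^*)y-b\rangle=(c+A(x^*)^\top\lambda^*)^\top y-\langle\lambda^*,b\rangle$ is linear in $y\ge 0$. The following minimal sketch shows the check; the function name and the example data are hypothetical.

```python
def condition_s_holds(A_xstar, lam_star, c):
    """Check (S) for the pooling problem: lambda* >= 0 and A(x*)^T lambda* + c > 0.

    A_xstar is A(x*) as an m x p list of rows, lam_star has m entries, c has p entries.
    """
    if any(lam_i < 0 for lam_i in lam_star):
        return False                      # lambda* must lie in K* = R^m_+
    m, p = len(A_xstar), len(c)
    reduced = [c[j] + sum(A_xstar[i][j] * lam_star[i] for i in range(m)) for j in range(p)]
    # c'y + <lambda*, A(x*)y - b> = reduced . y - <lambda*, b>, which tends to +infinity
    # along every direction y >= 0 exactly when every component of `reduced` is positive
    return all(r_j > 0 for r_j in reduced)
```

For instance, with $A(x^*)=\begin{pmatrix}1&-1\\0&2\end{pmatrix}$ and $c=0$, the choice $\lambda^*=(1,1)$ satisfies the condition. The choice $\lambda^*=(1,0)$ does not, because the Lagrangian decreases along $y=(0,t)$.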
Extensions


The above decomposition method can be extended to 
a number of important nonconvex global optimization 
problems. 


Partly Monotonic Optimization 


A function $f:\mathbb{R}^n\to\mathbb{R}$ is said to be increasing (decreasing, respectively) if $f(x)\le f(x')$ ($f(x)\ge f(x')$, respectively) whenever $x\le x'$ [7].


Theorem 5 In problem (P) assume that $X=[a,b]\subset\mathbb{R}^n$, that $(x,y)\mapsto F(x,y)$ and $G(x,y)$ are continuous, and that

(PMA) $F(x,y)$, $G_i(x,y)$, $i=1,\dots,m$, are increasing in $x\in[a,b]$ for every fixed $y\in Y$.

Assume further that the set $\{y\in Y:\ G(b,y)\preceq_K 0\}$ is contained in some box $Y^\circ$. Then, with lower bounds defined as

$$M=[r,s]\subset[a,b]\ \Rightarrow\ \beta(M)=\min\{F(r,y):\ G(s,y)\preceq_K 0,\ y\in Y\}, \qquad (12)$$

the BB decomposition algorithm is convergent.
If $F(x,y)$, $G_i(x,y)$, $i=1,\dots,m$, are monotonic in $y\in Y^\circ$ (or, more generally, dm functions in $y\in Y^\circ$ [7]), then the subproblems in (12) are standard monotonic (or dm) optimization problems and can be solved by currently available algorithms [7,10].


Remark 3 Theorem 5 still holds if $F(x,y)$, $G_i(x,y)$, $i=1,\dots,m$, are decreasing in $x\in[a,b]$ for fixed $y\in Y$ and we define

$$\beta(M)=\min\{F(s,y):\ G(r,y)\preceq_K 0\}.$$


Monotonic/Convex Optimization 


Theorem 6 In problem (P) assume that $X\subset[a,b]$ and that $F(x,y)$, $G_i(x,y)$, $i=1,\dots,m$, are continuous in $(x,y)$, increasing in $x\in[a,b]$ for fixed $y\in Y$, and convex (affine, respectively) in $y$ for fixed $x\in[a,b]$. Assume, in addition, that

(ST) For some $\lambda^*\in K^*$ the function $F(a,y)+\langle\lambda^*,G(a,y)\rangle$ is coercive on $Y$.

Then for every $M=[r,s]\subset[a,b]$, the Lagrangian bound problem

$$\sup_{\lambda\in K^*}\ \inf_{x\in M\cap X,\ y\in Y}\{F(x,y)+\langle\lambda,G(x,y)\rangle\} \qquad (13)$$

is a convex (linear, respectively) program, and the associated BB decomposition algorithm is convergent.
A frequently applied approach in the history of optimization is decomposition, by which a large problem is decomposed into smaller problems. The principle of decomposition goes back to the seminal paper by G.B. Dantzig and P. Wolfe [2].

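The basic model is a linear programming problem with two sets of constraints, stated as follows:

$$\max\ cx\quad\text{s.t.}\quad A_1x\le b_1,\quad A_2x\le b_2,\quad x\ge 0, \qquad (1)$$

where $c\in\mathbb{R}^n$, $A_1\in\mathbb{R}^{m\times n}$, $b_1\in\mathbb{R}^m$, $A_2\in\mathbb{R}^{l\times n}$ and $b_2\in\mathbb{R}^l$ are given constants and $x\in\mathbb{R}^n$ is a vector of variables.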

The fundamental idea is to solve (1) by interaction 
between two optimization problems, one of which is 
subject to the first set of constraints and the other sub- 
ject to the second set of constraints. Denote the second 
set by 


$$X=\{x\ge 0:\ A_2x\le b_2\}.$$


For simplicity we assume that $X$ is bounded and nonempty. Hence $X$ is a polytope. Let $x^i$ denote an extreme point of $X$ for $i\in P$, where $P$ is the index set of all extreme points. According to the Minkowski representation theorem (see [1]), the polytope $X$ can alternatively be represented as the convex hull of its extreme points, i.e.

$$X=\Big\{x=\sum_{i\in P}\lambda_ix^i:\ \sum_{i\in P}\lambda_i=1,\ \lambda_i\ge 0\ \text{for } i\in P\Big\}.$$

If $X$ is unbounded, extreme rays are introduced in the representation of $X$, leading to a straightforward extension of the subsequent considerations.

Hence (1) is equivalent to

$$\max\ \sum_{i\in P}(cx^i)\lambda_i\quad\text{s.t.}\quad \sum_{i\in P}(A_1x^i)\lambda_i\le b_1,\quad \sum_{i\in P}\lambda_i=1,\quad \lambda_i\ge 0\ \text{for } i\in P. \qquad (2)$$

Problem (2) has fewer rows than the original formulation (1). The variable $x$ has been replaced by the variables $\lambda_i$. However, since the number of extreme points is usually very large compared with the dimension $n$ of the problem, the number of $\lambda$-variables may also be very large, and enumerating and computing all extreme points requires a big effort. Fortunately, this is unnecessary. In fact, by Carathéodory's theorem, at most $n+1$ extreme points need to be considered; see for example [1]. The difficulty is to find the correct ones.

Problem (2) is called the full master problem, since all extreme points are introduced in the formulation. As already indicated, we shall consider formulations dealing with only a subset of extreme points. For this purpose let $\bar{P}$ denote a subset of the index set $P$. This leads to a tightening of (2), called the restricted master problem:

$$\max\ \sum_{i\in\bar{P}}(cx^i)\lambda_i\quad\text{s.t.}\quad \sum_{i\in\bar{P}}(A_1x^i)\lambda_i\le b_1,\quad \sum_{i\in\bar{P}}\lambda_i=1,\quad \lambda_i\ge 0\ \text{for } i\in\bar{P}. \qquad (3)$$


Assume here for simplicity that (3) is feasible. If not, additional techniques exist that may be applied to make the problem feasible. An optimal basic solution then exists, together with optimal dual variables, denoted by $y\in\mathbb{R}^m$ and $v\in\mathbb{R}$, corresponding to the $m+1$ rows of (3). By linear programming duality, the full master problem (2) has a dual linear program in the variables $(y,v)$ with constraints

$$yA_1x^i+v\ge cx^i\quad\text{for all } i\in P. \qquad (4)$$


Also by linear programming we know that an optimal solution has been found for the full master problem (2) if and only if the dual solution $(y,v)$ satisfies (4). This could, of course, be checked by examining all extreme points $x^i$. Fortunately, this is not necessary, and here lies the major idea behind the decomposition principle. Instead we consider the following linear programming problem, the so-called subproblem:

$$u=\max\ (c-yA_1)x\quad\text{s.t.}\quad A_2x\le b_2,\quad x\ge 0. \qquad (5)$$
Table 1

Step 1: Calculate an optimal dual solution $(y,v)$ of the restricted master problem (3).

Step 2: Determine an extreme point by solving the subproblem (5). If (6) is violated, expand the index set $\bar{P}$ by including the extreme point and go to Step 1.

Step 3: An optimal solution has been obtained from the solution of the last master problem as $x=\sum_{i\in\bar{P}}\lambda_ix^i$.


By assumption, the subproblem (5) has an optimal solution among the extreme points of $X$. Let $i^*\in P$ denote the index of an optimal extreme point. Observe that the objective function of (5) computes the maximal value $u$ of $cx^i-yA_1x^i$ over all extreme points $x^i$ of $X$. Hence, by (4), it remains to check whether

$$u\le v. \qquad (6)$$

If so, then all constraints of (4) are satisfied and we may stop. Otherwise, introduce the elements $(cx^{i^*},A_1x^{i^*})$ as a new column in the restricted master problem (3) and continue by solving it.
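The pricing step, subproblem (5) followed by test (6), can be sketched in a few lines. For illustration only, the sketch assumes that the extreme points of $X$ are available as an explicit list and finds the maximum by enumeration. In practice (5) is solved as a linear program precisely so that this enumeration is avoided. The function name and the data are hypothetical.

```python
def price_out(c, A1, y, v, extreme_points):
    """Solve the subproblem (5) over a list of extreme points and apply test (6).

    Returns None if u <= v (the current master solution is optimal); otherwise
    returns the new column (c x, A1 x) of the extreme point x that attains u.
    """
    def reduced_profit(x):
        # (c - y A1) x: profit net of the prices y charged for central resources
        cx = sum(cj * xj for cj, xj in zip(c, x))
        yA1x = sum(yi * sum(aij * xj for aij, xj in zip(row, x)) for yi, row in zip(y, A1))
        return cx - yA1x

    best = max(extreme_points, key=reduced_profit)
    u = reduced_profit(best)
    if u <= v:
        return None
    cx = sum(cj * xj for cj, xj in zip(c, best))
    A1x = [sum(aij * xj for aij, xj in zip(row, best)) for row in A1]
    return cx, A1x
```

For example, with $X=[0,1]^2$, $c=(3,2)$, $A_1=(1\ \ 1)$ and $v=0$, the price $y=2.5$ makes the vertex $(1,0)$ profitable ($u=0.5>v$), so the column $(3,(1))$ is returned. With $y=3.5$, no vertex beats $v$ and the function returns `None`.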

The above discussion can be summarized into the 
algorithm in Table 1. 

The number of extreme points in X is finite. Hence 
only a finite number of mutually different columns may 
be introduced in the restricted master problem. This 
implies that the algorithm must terminate in a finite 
number of steps. 

The decomposition principle is suited to solving large-scale problems. Moreover, it has a nice economic interpretation. Consider a central level and a sublevel of a decentralized organization. The central level operates on the first set of constraints $A_1x\le b_1$, and the sublevel on the second set of constraints $A_2x\le b_2$. The right-hand sides $b_1$, $b_2$ may be interpreted as resources for the central level and the sublevel, respectively. During the course of the algorithm, information is communicated from one level to the other. The central level solves the restricted master problem, and as a result marginal prices $y$ on central resources are communicated to the sublevel. The sublevel solves the subproblem, in which the objective function incorporates the costs of using the central resources. The sublevel then suggests activities $x^i$ to be incorporated at the central level. During the iterations no direct information about the coefficients in the constraints is communicated between the central level and the sublevel. Instead, price information is communicated from the central level to the sublevel, and the algorithm is the fundamental method among the so-called price-directive procedures.

In most applications the last set of constraints, $A_2x\le b_2$, has a so-called block-angular structure, in which the variables are grouped into independent blocks. This implies that the subproblem separates into multiple independent problems. In an economic context the block-angular structure reflects a division of the sublevel into multiple independent sublevels, each of which communicates directly with the central level.

A counterpart of the present Dantzig-Wolfe de- 
composition procedure exists in the form of Benders 
decomposition. In linear programming they are dual in 
the sense that application of Benders decomposition on 
the dual program of the original problem (1) is equiv- 
alent to the direct application of the present procedure 
on (1). 


See also 


> Generalized Benders Decomposition 

> MINLP: Generalized Cross Decomposition 

> MINLP: Logic-Based Methods 

> Simplicial Decomposition 

> Simplicial Decomposition Algorithms 

> Stochastic Linear Programming: Decomposition 
and Cutting Planes 

> Successive Quadratic Programming: Decomposition 
Methods 
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A large number of combinatorial optimization prob- 
lems can be viewed as potentially ‘easy’ problems to 
solve that are complicated by a set of side constraints. If 
the complicating constraints were removed, the result- 
ing problem would have constraints possessing a high 
degree of structure, for which many efficient algorithms 
exist. One of the most attractive methods to exploit 
this property is the Lagrangian relaxation technique, 
in which the complicating constraints are dualized and 
then removed from the constraint set. This class of 
methods, originally proposed by various authors for 
a variety of problems, and later generalized in [3], has 
proven highly successful in solving otherwise difficult 
combinatorial problems. For an excellent introduction 
to the approach and its applications, see [1,2]. 
Consider the following problem (P):

$$\max\; f^T x \tag{1}$$
such that
$$Ax \le b \tag{2}$$
$$Cx \le d \tag{3}$$
$$x \in X, \tag{4}$$

where $x$ is an $n$-vector, $b$ is an $m$-vector and $d$ is a $k$-vector, and $f$, $A$ and $C$ have conformable
dimensions. Some or all of the $x$ variables can be integers (in the pure integer case, $X \subseteq \mathbb{Z}^n$).
It is assumed that there is a finite and nonempty set of solutions to the constraints (2)-(4). Let (LP) represent
problem (P) with any integrality constraints in $X$ removed.

The following notation is used in the sequel. For any problem (·), OS(·) is its optimal set, and V(·) represents
its optimal value. For any set S, Co(S) represents the convex hull of the set.

The Lagrangian relaxation (LR_u) of (P) relative to the constraint set (2) and a conformable nonnegative vector $u$
is defined as

$$
\begin{aligned}
\max\ & f^T x + u(b - Ax)\\
\text{s.t.}\ & Cx \le d\\
& x \in X.
\end{aligned}
$$

The problem

$$(\mathrm{LR}):\quad \min_{u \ge 0} V(\mathrm{LR}_u)$$

is called the Lagrangian dual relative to (2). The constraints (2) are referred to as the 'dualized constraints',
and $u$ is the corresponding multiplier or dual vector. The dualized constraints should be chosen so that the
remaining set $Cx \le d$ possesses desirable structure. For example, (3) might only specify upper bounds on the
variables, or might be a single 'knapsack' constraint of the form $\sum_j a_j x_j \le d$.
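The definition of (LR_u) can be made concrete by brute force. The sketch below uses a tiny, made-up 0-1 instance (the data `f`, `A`, `b`, `C`, `d` are hypothetical), and explicit enumeration of $X = \{0,1\}^3$ stands in for the structured subproblem solver one would use in practice. It evaluates $V(P)$ and $V(\mathrm{LR}_u)$ for a few multipliers so the bounding behaviour can be inspected directly.

```python
from itertools import product

# Hypothetical toy instance of (P): maximize f^T x with x in X = {0,1}^3.
f = [5, 4, 3]
A = [[2, 3, 1]]   # complicating rows A x <= b (these are dualized)
b = [4]
C = [[1, 1, 1]]   # structured rows C x <= d (kept in the relaxation)
d = [2]

X = list(product((0, 1), repeat=len(f)))  # enumerate X explicitly


def dot(row, x):
    return sum(r * xi for r, xi in zip(row, x))


def satisfies(M, rhs, x):
    return all(dot(row, x) <= r for row, r in zip(M, rhs))


def value_P():
    """V(P): brute-force optimum subject to all of (2)-(4)."""
    return max(dot(f, x) for x in X if satisfies(A, b, x) and satisfies(C, d, x))


def value_LR(u):
    """V(LR_u): max f^T x + u(b - Ax) subject only to Cx <= d, x in X."""
    return max(
        dot(f, x) + sum(ui * (bi - dot(row, x)) for ui, bi, row in zip(u, b, A))
        for x in X
        if satisfies(C, d, x)
    )


if __name__ == "__main__":
    print("V(P) =", value_P())
    for u in ([0.0], [0.5], [1.0], [2.0]):
        print("u =", u, "V(LR_u) =", value_LR(u))
```

On this instance every multiplier tried gives a value of at least $V(P) = 8$, and $u = 0.5$ gives the smallest of them (8.5), so a duality gap remains even at the best multiplier.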

The first point to note about (LR_u) is that it always provides an upper bound for (P), i.e.

$$V(\mathrm{LR}_u) \ge V(P).$$

This can easily be seen from the fact that $u \ge 0$ and $Ax \le b$ for any solution $x$ which is optimal for (P):
such an $x$ is feasible for (LR_u), and the added term $u(b - Ax)$ is nonnegative. In practice, it is desirable to
have V(LR_u) as close to V(P) as possible. Moreover, there is already an LP relaxation of (P), obtained by dropping
the integrality requirement on $x$. How does V(LR_u) relate to V(LP)? To answer this question, consider the
following relaxation of (P), denoted by (P*):

$$
\begin{aligned}
\max\ & f^T x\\
\text{s.t.}\ & Ax \le b\\
& x \in \mathrm{Co}\{x \in X : Cx \le d\}.
\end{aligned}
$$

It can be shown ([3]) that

$$V(P) \le V(P^*) = V(\mathrm{LR}) \le V(\mathrm{LP}).$$
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The equality for the optimal values of problems (P*) 
and (LR) follows from the fact that they are duals of 
each other. Moreover, it can be shown that if the mul- 
tipliers for the constraints obtained from solving the 
LP relaxation were used, the resulting Lagrangian re- 
laxation provides a bound at least as tight as the bound 
from (LP). Also, if $\bar{x}$ is an optimal solution for (LR_u) with $A\bar{x} \le b$ and $u(A\bar{x} - b) = 0$,
then $\bar{x}$ is optimal for (P).
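The sufficient optimality condition above can be checked mechanically once a solution of (LR_u) is in hand. The sketch below is a hypothetical helper (the names `solve_LR` and `certifies_optimality` are not from the source). It solves (LR_u) by enumeration over $\{0,1\}^n$, which is an assumption made only to keep the example self-contained. It then tests feasibility for the dualized rows and the complementary slackness condition $u(A\bar{x} - b) = 0$.

```python
from itertools import product


def dot(row, x):
    return sum(r * xi for r, xi in zip(row, x))


def solve_LR(f, A, b, C, d, u):
    """Return (V(LR_u), x_bar) by enumerating x in {0,1}^n with Cx <= d."""
    best = None
    for x in product((0, 1), repeat=len(f)):
        if all(dot(row, x) <= di for row, di in zip(C, d)):
            val = dot(f, x) + sum(ui * (bi - dot(row, x)) for ui, bi, row in zip(u, b, A))
            if best is None or val > best[0]:  # ties keep the first point found
                best = (val, x)
    return best


def certifies_optimality(x_bar, u, A, b, tol=1e-9):
    """True if A x_bar <= b and u(A x_bar - b) = 0; x_bar must already solve (LR_u)."""
    Ax = [dot(row, x_bar) for row in A]
    feasible = all(ax <= bi + tol for ax, bi in zip(Ax, b))
    slackness = abs(sum(ui * (ax - bi) for ui, ax, bi in zip(u, Ax, b))) <= tol
    return feasible and slackness


if __name__ == "__main__":
    # Hypothetical instance where u = 0 already certifies optimality of x_bar = (1, 1).
    _, x_bar = solve_LR([3, 2], [[1, 1]], [2], [[1, 0]], [1], [0.0])
    print(x_bar, certifies_optimality(x_bar, [0.0], [[1, 1]], [2]))
```

When the test fails (for example because $A\bar{x} \le b$ is violated, or because the slack is nonzero while $u > 0$), nothing is concluded: the condition is sufficient, not necessary.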

When are the inequalities above strict? This can be answered through the following integrality property, again
due to [3]: the optimal value of (LR_u) is not changed by dropping the integrality condition on the $x$ variables.

If the integrality property (also referred to as the complementary slackness property) holds, then

$$V(P) \le V(P^*) = V(\mathrm{LR}) = V(\mathrm{LP}).$$

In this case, therefore, Lagrangian relaxation can do no better than the standard LP relaxation for (P). For a large
number of practical problems, however, this property does not hold. This fact allows (LR_u) to be used in place of
(LP) to provide upper bounds in a branch and bound algorithm.


Lagrangian decomposition 


A drawback of the Lagrangian relaxation (LR) de- 
scribed above is that only one of the possibly many spe- 
cial structured constraint sets embedded in the problem 
can be exploited. This results in the loss of structure of 
all the dualized constraints. One way to avoid this is to 
use Lagrangian decomposition ([4,5,6,7]). 

Introducing a new set of 'copy' constraints $y = x$, problem (P) is equivalent to

$$\max_{x,\,y}\; f^T x$$
such that
$$Ay \le b \tag{5}$$
$$Cx \le d \tag{6}$$
$$y = x \tag{7}$$
$$x \in X,\; y \in Y, \tag{8}$$

where $X \subseteq Y$. Dualizing the 'copy' constraints (7) results in

$$\max\; f^T x + v(y - x)$$
such that
$$Ay \le b \tag{9}$$
$$Cx \le d \tag{10}$$
$$x \in X,\; y \in Y, \tag{11}$$

which can be decomposed into the following problem (LD_v):

$$\max\{F_1(x) : Cx \le d,\ x \in X\} \;+\; \max\{F_2(y) : Ay \le b,\ y \in Y\},$$

where $F_1(x) = (f - v)^T x$ and $F_2(y) = v^T y$. The Lagrangian decomposition dual (LD) can then be defined to be

$$\min_{v}\; V(\mathrm{LD}_v).$$


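If $\bar{u}$ is an optimal solution to (LR), then, with $v = \bar{u}A$, it can be shown that

$$V(\mathrm{LD}_v) = V(\mathrm{LR}_{\bar{u}}) - \bar{u}(b - A\bar{y}),$$

where $\bar{y}$ is an optimal solution of the $y$-subproblem of (LD_v). Since $\bar{u} \ge 0$ and $A\bar{y} \le b$,
the subtracted term is nonnegative, and therefore

$$V(\mathrm{LD}) \le V(\mathrm{LR}).$$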


It is possible to define an integrality property ([5]) such 
that if either the x- or the y-problem has the property, 
then V(LD) will be equal to the stronger of the bounds 
obtained from the two Lagrangian relaxations corre- 
sponding to each set of constraints. 
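The splitting of (LD_v) into two independent subproblems can be sketched on a small made-up instance. Both subproblems are solved by enumeration, and $X = Y = \{0,1\}^3$ is assumed for simplicity; the data and function names are hypothetical. With a single dualized row and $v = \bar{u}A$, the sketch also checks the identity $V(\mathrm{LD}_v) = V(\mathrm{LR}_{\bar u}) - \bar u(b - A\bar y)$.

```python
from itertools import product

# Hypothetical toy data; X = Y = {0,1}^3 for simplicity (so X is a subset of Y).
f = [5, 4, 3]
A = [2, 3, 1]   # single row A x <= b
b = 4
C = [1, 1, 1]   # single row C x <= d
d = 2
BIN = list(product((0, 1), repeat=3))


def dot(r, x):
    return sum(ri * xi for ri, xi in zip(r, x))


def argmax(obj, feasible):
    """Best (value, point) over the binary points satisfying `feasible`."""
    best = None
    for z in BIN:
        if feasible(z) and (best is None or obj(z) > best[0]):
            best = (obj(z), z)
    return best


def value_LD(v):
    """V(LD_v) = max (f - v)^T x over Cx <= d  plus  max v^T y over Ay <= b."""
    fx, _ = argmax(lambda x: dot([fi - vi for fi, vi in zip(f, v)], x),
                   lambda x: dot(C, x) <= d)
    fy, y_bar = argmax(lambda y: dot(v, y), lambda y: dot(A, y) <= b)
    return fx + fy, y_bar


def value_LR(u):
    """V(LR_u) with the single row A x <= b dualized."""
    val, _ = argmax(lambda x: dot(f, x) + u * (b - dot(A, x)),
                    lambda x: dot(C, x) <= d)
    return val


u_bar = 0.5                     # minimizes V(LR_u) over u >= 0 on this instance
v = [u_bar * a for a in A]      # v = u_bar * A
ld, y_bar = value_LD(v)
print(ld, value_LR(u_bar), value_LR(u_bar) - u_bar * (b - dot(A, y_bar)))
```

Here the $y$-subproblem solution satisfies $A\bar{y} = b$, so the two bounds coincide. On instances where $A\bar{y} < b$, the same code would show (LD_v) strictly tighter than (LR).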

Lagrangian decomposition (LD) has several advantages over (LR). First, every constraint in the original problem
appears in one of the subproblems. It thus avoids having to choose between the various sets of structured
constraints. Secondly, as shown above, the bounds from (LD) can be tighter than those from (LR). Furthermore, the
bound can be tightened by adding surrogate constraints (for example, a surrogate constraint from $Ax \le b$ can be
added to the x-problem in (LD)). Thirdly, analogous to (LR), it can be shown that (LD) is really the
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dual of a primal problem involving the optimization of 
the original objective function over the intersection of 
the convex hulls of the two constraint sets. Finally, em- 
pirical results suggest that when using heuristics based 
on Lagrangian decomposition, any intermediate solu- 
tions found in the solution of (LD) lead to better solu- 
tions for problem (P) as compared to solutions found 
by Lagrangian relaxation. 


Aggregation Schemes 


The main drawback of the Lagrangian decomposition 
method is that a large number of multipliers (v) are in- 
troduced, one for each of the copy variables. The calcu- 
lation of v can be time consuming at each step. More- 
over, the convergence of the scheme can be slowed sig- 
nificantly by the larger number of directions (for the 
multipliers) to search. In order to avoid this, an alter- 
nate approach that has been suggested [8] is to aggre- 
gate some or all of the variables using a simplified linear 
function, and then to dualize the resulting copy con- 
straints. The purpose of the aggregation is to substan- 
tially reduce the number of dual variables, while still 
maintaining the constraint structure as in the standard 
decomposition. 

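Let $A = [A_1 \mid A_2]$ and $x = \begin{pmatrix} x^1 \\ x^2 \end{pmatrix}$ with $x^1 \in \mathbb{R}^{n_1}$ and
$x^2 \in \mathbb{R}^{n_2}$. Introduce the copy variables $y = \begin{pmatrix} y^1 \\ y^2 \end{pmatrix}$ and the
constraints

$$x^1 = y^1 \quad\text{and}\quad A_2 x^2 = g(y^2),$$

where $g(\cdot)$ is the aggregation function (for example, $g(y^2) = A_2 y^2$, or $g(y^2) = y^2$). Then, the problem
(P) can be written as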


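$$\max\; f^T x$$
such that
$$A_1 y^1 + g(y^2) \le b \tag{12}$$
$$Cx \le d \tag{13}$$
$$x^1 = y^1 \tag{14}$$
$$A_2 x^2 = g(y^2) \tag{15}$$
$$x \in X,\; y \in Y, \tag{16}$$

where $X \subseteq Y$. Dualizing the constraints (14) and (15) leads to

$$
\begin{aligned}
\max\ & f^T x + w^1(y^1 - x^1) + w^2\big(g(y^2) - A_2 x^2\big)\\
\text{s.t.}\ & A_1 y^1 + g(y^2) \le b\\
& Cx \le d\\
& x \in X,\; y \in Y,
\end{aligned}
$$

which can be decomposed into the problem (LDA_w), given by

$$\max\{F_1(x) : Cx \le d,\ x \in X\} \;+\; \max\{F_2(y) : A_1 y^1 + g(y^2) \le b,\ y \in Y\},$$

where $F_1(x) = f^T x - w^1 x^1 - w^2 A_2 x^2$ and $F_2(y) = w^1 y^1 + w^2 g(y^2)$. The corresponding dual problem
(LDA) is then defined by

$$\min_{w}\; V(\mathrm{LDA}_w).$$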


It can then be proved that for any optimal solution $\bar{u} \in OS(\mathrm{LR})$ of (LR), with $w^1 = \bar{u}A_1$
and $w^2 = \bar{u}$,

$$V(\mathrm{LDA}_w) \le V(\mathrm{LR}_{\bar{u}})$$

and therefore

$$V(\mathrm{LDA}) \le V(\mathrm{LR}).$$


This inequality is strict only if the second subprob- 
lem in (LDA,,) (i.e. the problem of maximizing F>(y)) 
does not satisfy the integrality (complementary slack- 
ness) property. Moreover, this inequality holds only for 
the Lagrangian relaxation with those constraints that 
have been aggregated. Any other Lagrangian relaxation 
defined for problem (P) by dualizing other sets of con- 
straints will not necessarily satisfy this inequality. 

Similarly, defining $v = (w^1, w^2 A_2)$, it can also be easily shown that

$$V(\mathrm{LD}_v) \le V(\mathrm{LDA}_w)$$

and therefore

$$V(\mathrm{LD}) \le V(\mathrm{LDA}).$$


In general, therefore, the bound obtained by aggregat- 
ing some or all of the variables is stronger than the 
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bound obtained from Lagrangian relaxation but weaker 
than the bounds from standard Lagrangian decomposition. However, while the standard (LD) introduces $n$ multipliers,
(LDA) has $n_1 + m$, which can be considerably fewer depending on the number of dualized constraints. Moreover, any
inherent problem structure is still maintained in (LDA). The aggregate formulation can thus be viewed as a
reasonable compromise between tightness of the bounds and speed of solution.

As in the case of standard decomposition, (LDA) can be defined by aggregating different subsets of variables.
Unfortunately, it is not always apparent a priori which is the best choice for obtaining the tightest bound, and
various alternatives may have to be tried in practice. One possible method of exploiting the potential inequalities
among these various bounds is as follows:

1) solve (LR) to obtain $\bar{u}$;

2) if the integrality property does not hold, set $w^1 = \bar{u}A_1$ and $w^2 = \bar{u}$, and solve (LDA_w); this
problem is guaranteed to give a bound at least as tight as (LR);

3) set $v = (w^1, w^2 A_2)$, and solve (LD_v). Note that if the aggregation function is of the form
$g(y^2) = A_2 y^2$, this step will not yield any improvement.


Practical Issues 


Because (LR), (LD) and (LDA) can all provide tighter 
bounds than (LP), any one of the relaxations can be 
used in place of (LP) to provide upper bounds in a clas- 
sical branch and bound algorithm to solve (P). Conse- 
quently, the choice of the relaxation scheme used, as 
well as the quality of the bounds obtained, is of consid- 
erable importance. These are discussed in some detail 
below. 


Choosing Among Alternate Relaxations 


Often, there are several choices for the constraints to be dualized. For example, consider the generalized
assignment problem

$$\min \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij}$$
such that
$$\sum_{i=1}^{m} x_{ij} = 1, \quad j = 1, \dots, n, \tag{17}$$
$$\sum_{j=1}^{n} a_{ij} x_{ij} \le b_i, \quad i = 1, \dots, m, \qquad x_{ij} \in \{0, 1\}\ \ \forall i, j. \tag{18}$$

By dualizing the first set of constraints (17), this problem reduces to $m$ knapsack problems. Conversely, dualizing
(18) results in a generalized upper bound (GUB) problem in 0-1 variables. Which of the two relaxations should be
used? There are two conflicting factors involved here, namely the tightness of the bounds and the ease of solution of
the problem. It would be wise to select a relaxation that yields a problem that is fairly easy to solve, but not so
easy that the bounds are very loose. In general, it is difficult to know this a priori. In some instances, however,
the test of the integrality property can be useful. For example, in the generalized assignment problem, the
integrality property holds for the second relaxation but not the first. By the result above, the second relaxation
can therefore do no better than the LP relaxation, which suggests that the first (knapsack) relaxation will yield the
tighter bound.

For the case of the Lagrangian decomposition, this 
issue is not important since all constraints are main- 
tained in one of the subproblems. However, if aggre- 
gation is being used, then again, it is in general hard to 
know a priori which variables to aggregate and which 
ones to copy. Often, the best solution is to try various 
alternatives and use the computational results to guide 
the choice. 


Choice of Multipliers 


It is clear that for (LR), the best choice for $u$ is an optimal solution to the problem

$$\min_{u \ge 0}\; V(\mathrm{LR}_u),$$

since this will yield the tightest bound V(LR). Similarly, the best dual vectors $v$ and $w$ for (LD) and (LDA) are
those from the optimal solutions to the respective dual problems. Unfortunately, these optimal values for the dual
variables cannot be determined a priori, and therefore an iterative procedure is the only viable approach to
improving the value of $u$, $v$ or $w$. Below, a couple of techniques for updating $u$ for (LR) are discussed, but the
methods are just as relevant for updating $v$ and $w$ for (LD) and (LDA).

In general, the function $V(\mathrm{LR}_u)$ is piecewise linear, convex and differentiable at all points except where

Decomposition Techniques for MILP: Lagrangian Relaxation 




the Lagrangian problem has multiple optimal solutions. This observation has led to the development of subgradient
techniques for determining the $u$ that minimizes $V(\mathrm{LR}_u)$. This method is similar to traditional gradient
methods, except that at the nondifferentiable points it arbitrarily picks one solution from the set of optimal
Lagrangian solutions (each of which yields a subgradient). Given an initial value $u^0$ (typically $u^0 = 0$), a
sequence $\{u^k\}$ is generated by the formula

$$u^{k+1} = \max\{u^k + t^k(Ax^k - b),\ 0\},$$

where $x^k$ is an optimal solution to $(\mathrm{LR}_{u^k})$, the maximum is taken componentwise, and $t^k$ is a scalar
stepsize, generally designed to be a decreasing sequence converging to zero. Because optimality cannot in general be
verified in this method, it is usually terminated upon reaching a specified number of iterations. Because of its
simplicity, the subgradient technique is generally the method of first choice when solving (LR).
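A minimal sketch of the projected subgradient update follows. It makes several assumptions: the toy instance is hypothetical, (LR_u) is solved by enumeration, and the step rule $t^k = t_0/(k+1)$ is only one common diminishing choice (Polyak-type rules based on a target value are also widely used). Because the iterates do not decrease the bound monotonically, the best bound seen is recorded.

```python
from itertools import product

# Hypothetical toy instance: maximize f^T x, x in {0,1}^3, with
# A x <= b dualized and C x <= d kept in the Lagrangian subproblem.
f = [5, 4, 3]
A = [[2, 3, 1]]
b = [4]
C = [[1, 1, 1]]
d = [2]


def dot(row, x):
    return sum(r * xi for r, xi in zip(row, x))


def solve_LR(u):
    """Return (V(LR_u), x^k) by enumeration; ties keep the first point found."""
    best = None
    for x in product((0, 1), repeat=len(f)):
        if all(dot(row, x) <= di for row, di in zip(C, d)):
            val = dot(f, x) + sum(ui * (bi - dot(row, x))
                                  for ui, bi, row in zip(u, b, A))
            if best is None or val > best[0]:
                best = (val, x)
    return best


def subgradient(n_iter=50, t0=1.0):
    """Projected subgradient method for min_{u >= 0} V(LR_u).

    Uses u^{k+1} = max{u^k + t^k (A x^k - b), 0} with t^k = t0 / (k + 1)
    and returns the smallest bound seen together with its multiplier.
    """
    u = [0.0] * len(b)
    best_val, best_u = float("inf"), list(u)
    for k in range(n_iter):
        val, x = solve_LR(u)
        if val < best_val:
            best_val, best_u = val, list(u)
        step = t0 / (k + 1)
        g = [dot(row, x) - bi for row, bi in zip(A, b)]  # A x^k - b
        u = [max(ui + step * gi, 0.0) for ui, gi in zip(u, g)]
    return best_val, best_u


print(subgradient())
```

On this instance the method reaches the bound 8.5 at $u = 0.5$, which is $V(\mathrm{LR})$ here, while $V(P) = 8$; the remaining gap is a genuine duality gap, not a failure of the method.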

An alternative way to update $u$ is to use dual descent algorithms, also referred to as multiplier adjustment
methods. In these methods, the sequence $\{u^k\}$ is generated by

$$u^{k+1} = u^k + t^k d^k,$$

where $d^k$ is a descent direction, determined from the directional derivative of $V(\mathrm{LR}_{u^k})$ using a
finite set of directions. Typically, the direction of steepest descent is chosen, and the stepsize $t^k$ is the one
that minimizes $V(\mathrm{LR}_{u^k + t d^k})$. Unlike subgradient optimization, this procedure guarantees monotonic
bound improvement. Moreover, it may only adjust a few multipliers at each iteration, resulting in improved
computational performance. However, for general problems, the set of directions to choose from can be very large,
resulting in very poor descent. It is therefore essential to tailor these methods to particular problems to exploit
their structure in determining the set of directions.
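The following is a generic sketch of a multiplier adjustment loop, not a problem-tailored method. It assumes the finite direction set $\{+e_i, -e_i\}$, estimates directional derivatives by one-sided difference quotients, and replaces the exact line search with a search over a fixed grid of step sizes. All of these are illustrative assumptions, as is the toy instance. Only strict improvements are accepted, which is what makes the bound sequence monotone.

```python
from itertools import product

# Hypothetical toy instance standing in for a structured problem.
f = [5, 4, 3]
A = [[2, 3, 1]]
b = [4]
C = [[1, 1, 1]]
d = [2]


def dot(row, x):
    return sum(r * xi for r, xi in zip(row, x))


def V_LR(u):
    """V(LR_u) by enumerating x in {0,1}^n with Cx <= d."""
    return max(dot(f, x) + sum(ui * (bi - dot(row, x)) for ui, bi, row in zip(u, b, A))
               for x in product((0, 1), repeat=len(f))
               if all(dot(row, x) <= di for row, di in zip(C, d)))


def dual_descent(eps=1e-6, steps=tuple(0.25 * j for j in range(1, 41))):
    """Multiplier adjustment over the finite direction set {+e_i, -e_i}.

    The steepest direction is chosen from one-sided difference quotients,
    and the step from a finite grid (standing in for an exact line search).
    Only strict improvements are accepted, so the bounds decrease monotonically.
    """
    m = len(b)
    u = [0.0] * m
    history = [V_LR(u)]
    directions = [[s if j == i else 0.0 for j in range(m)]
                  for i in range(m) for s in (1.0, -1.0)]
    while True:
        best_dir, best_slope = None, -1e-9
        for dvec in directions:
            trial = [ui + eps * di for ui, di in zip(u, dvec)]
            if min(trial) < 0:
                continue  # keep u >= 0
            slope = (V_LR(trial) - history[-1]) / eps
            if slope < best_slope:
                best_dir, best_slope = dvec, slope
        if best_dir is None:
            return u, history  # no descent direction in the finite set
        candidates = [[ui + t * di for ui, di in zip(u, best_dir)] for t in steps]
        candidates = [c for c in candidates if min(c) >= 0]
        new_u = min(candidates, key=V_LR)
        if V_LR(new_u) >= history[-1]:
            return u, history
        u = new_u
        history.append(V_LR(u))


print(dual_descent())
```

With a single multiplier the direction set is tiny. The text's caution applies when there are many multipliers: the number of candidate directions grows with the problem, which is why real implementations exploit problem structure.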


Applications 


Lagrangian relaxation and decomposition have been 
successfully applied to solve a large number of practical 
combinatorial problems. These include the generalized 
assignment problem, capacitated facility location prob- 
lem, the traveling salesman problem and instances of 
the general mixed integer programming problem. For 
each of these problems, the constraints contain well- 
understood structures such as knapsack, spanning tree 


and generalized upper bound constraints, thus facili- 
tating the dualization of the other complicating con- 
straints. For a number of problems, these techniques 
represent the best available solution method. Computa- 
tional results for these and other problems indicate that 
the bounds provided by (LR) and (LD) can be extremely 
sharp. These results have led to (LD) and (LDA) being 
considered among the best available solution methods 
for solving these problems. 
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Introduction 


The assumption of a fixed template for de novo peptide and protein design is highly questionable [41], since
proteins are commonly known to exhibit backbone flexibility, as illustrated by the superposition of NMR structures in
Fig. 1. De novo design templates were observed to allow residues that would not have been permissible had the
backbone been fixed [34]. The Mayo group claimed that their ORBIT protein design program was robust against 15%
change in the backbone. Nevertheless, they found in a later case study on T4 lysozyme that core repacking to
stabilize the fold was difficult to achieve without considering a flexible template [37]. The secondary structures of
α-helices and β-sheets actually display twisting and bending in the fold, and Emberly et al. [6,7] applied principal
component analysis of database protein structures to quantify the degree and modes of their flexibilities.

De Novo Protein Design Using Flexible Templates, Figure 1
Template flexibility as illustrated by the superposition of the 20 NMR structures of apo intestinal fatty acid
binding protein (Protein Data Bank code 1AEL)

In this chapter we classify the various methods of incorporating backbone flexibility into the design template into
three main types according to their treatment of the backbone and side-chain conformations. The first type involves
considering a set of multiple discrete templates and performing de novo design with discrete rotamers on each of the
templates under the fixed-backbone assumption. The second type considers a continuum template by means of algebraic
parameterization of the backbone and variation of the parameters to allow for backbone movement during sequence
selection. However, it still employs rotamer libraries to simplify the side-chain conformations. Through novel
sequence selection formulations [14] and pairwise contact potentials which are discretized over distance bins
[35,44,45], the third type considers a continuum design template in which the Cα-Cα distances and dihedral angles
assume
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continuous values between upper and lower bounds ob- 
served from the template structures [9], and confirms 
sequence specificity to the target fold based on these 
bounded continuous distances and angles via NMR 
structure refinement methods [16,17] rather than the 
discrete rotamer approach. For each category we will 
quote some examples of successes of de novo peptide 
and protein design. 


Flexible Template via Multiple Discrete 
Templates and Discrete Rotamers 


By incorporating protein backbone flexibility via dis- 
crete templates and discrete rotamers, de novo protein 
design frameworks either separate sequence selection 
and backbone movement explicitly or iterate between 
sequence space and structure space [3]. Notice that in 
both cases, the sequence search methods outlined in the 
previous section are all applicable, as fixed backbones 
and discrete rotamers are still assumed. 


Approaches Which Separate Sequence Selection 
and Backbone Movement 


These approaches consider an ensemble of fixed backbones, search for sequences for each of them assuming a fixed
template, and finally identify the best solutions from all the results. Successes using different kinds of search
algorithms include those described next.


Successes Using Dead-End Elimination By varying the supersecondary structure parameters, Ross et al. [46] and Su and
Mayo [48] generated several sets of perturbed backbones from the native structure and redesigned the core of the β1
domain of the streptococcal protein using the DEE algorithm under the fixed-template assumption for each backbone.
As confirmed by NMR experiments, six of the seven sequences tested folded into nativelike structures.


Successes Using the Self-Consistent Mean Field 
Method Kono and Saven [29] applied their self-con- 
sistent mean field based protein combinatorial library 
design strategy to a set of similar backbone structures 
to obtain new sequences that are robust to distance 
changes in the template for the immunoglobulin light 
chain-binding domain of protein L. 


Successes of Monte Carlo Methods/Genetic Algorithms The Pande group generated families of 100 fixed templates within
1 Å root-mean-square deviation (rmsd) from the initial backbone using a Monte Carlo method. With these fixed-template
ensembles, they performed de novo design, which was based on genetic algorithms, on their Genome@home distributed
grid system for 253 naturally occurring proteins. They obtained sequences that exhibited higher diversity than the
corresponding natural sequence alignments, as well as good agreement on the sequence entropies of the designed
sequences from the same template family [32,33].

In order to incorporate protein flexibility, Kraemer-Pecore et al. [30] executed a Monte Carlo simulation
to generate 30 fixed backbones that were within 0.3 Å
rmsd of the initial template. A genetic-algorithm-based 
sequence prediction algorithm [43] which combines fil- 
tering and sampling rotamers and energy minimization 
was then employed for sequence search on each tem- 
plate under the fixed backbone assumption. The work 
led to the identification of a sequence that folded into 
the WW domain. 

In designing protein conformational switches, Am- 
broggio and Kuhlman [1,2] also used the Monte Carlo 
based RosettaDesign to search for sequences for multi- 
ple fixed-template structures. 


Approaches Which Iterate Between Sequence Space 
and Structure Space 


There are two good examples which belong to this 
class. The first example is a genetic algorithm/Monte 
Carlo based framework used by Desjarlais and Han- 
del [5], in which a starting population of backbones 
is generated by small angle perturbations to the tem- 
plate, rotamers are randomly selected on each back- 
bone, and a genetic algorithm is subsequently used 
which exchanges not only rotamers but also backbone 
torsional information in recombination. The framework ends with a Monte Carlo stage which refines the backbone
structures. Using this novel approach, Desjarlais and Handel [5] designed three
new core variants of the protein 434 cro. They also 
compared results on 434 cro and T4 lysozyme with 
those obtained earlier using fixed-template models and 
found that they were similar, given that the fixed-template models scan over a much larger rotamer space.

The second one was proposed by Kuhlman et al. 
[31] and Saunders and Baker [31,47]. Their method 
starts with a set of initial backbones, searches by 
a Monte Carlo method for the sequence with the low- 
est energy for each of them, performs atomic-resolution 
structure prediction for the sequences to allow shifts in 
the structure space, and continues until the number of 
iterations hits a predetermined number. They successfully designed a new sequence for Top7, a 93-residue
α/β protein with a novel fold [31]. They also claimed
that the new method better captures sequence variation 
than approaches that separate sequence selection and 
backbone movement explicitly. 


Flexible Template via Continuum Template 
and Discrete Rotamers 


This method of constructing a continuum template via
backbone parameterization and performing sequence
search from rotamer libraries was proposed by Har- 
bury et al. [18,19,40]. On the basis of the algebraic pa- 
rameterization equations developed for coiled-coils by 
Crick [4], they allowed backbone movement by treat- 
ing the parameters as variables during sequence search 
for energy minimization, which was in turn done by 
the local optimization methods of steepest descent min- 
imization and adopted-basis Newton-Raphson mini- 
mization. 


Successes 


Harbury et al. [18,19,40] adopted this approach to design a family of α-helical bundle proteins with right-handed
superhelical twist. The crystal structure of the
designed sequences with the optimal specificity was ex- 
perimentally validated to match the design template. 


Flexible Template via Continuum Template 
and NMR Structure Refinement 


Considering discrete rotamers is certainly not the best approach to adopt in de novo design, as about 15% of
side-chain conformations are not represented by common rotamer libraries [8]. A recent two-stage de novo design
approach proposed by the Floudas group [10,11,26,27] considers a continuum design template without using discrete
rotamers for the possible side-chain conformations. The first stage selects a rank-ordered list of low-energy
sequences using novel quadratic assignment-like models [13,14] driven by pairwise residue contact potentials. The
group developed these potentials by solving a linear programming parameter estimation problem, requiring that the
native conformations for a large training set of 1250 proteins be ranked energetically more favorably than their
high-resolution decoys [35,44,45]. The forcefields developed were found to produce very good Z scores in recognizing
the native folds for a large test set of proteins [35,44,45]. Rather than being continuous, the dependence of the
contact potential on distance is discretized into bins. This design feature makes the energy objective function
insensitive to a limited degree of backbone movement. For example, in the high-resolution Cα-Cα forcefield [44], if
the pair of amino acids selected at two positions i and k, which are 3.5 Å apart in the template, are Arg and Glu,
respectively, their energy contribution to the objective function is -7.77 kcal/mol. Despite small distance
variations, this energy value is constant for all Arg-Glu interactions as long as the Cα positions of the two
residues are 3-4 Å (bin 1) apart.

To perform sequence selection based on a flexible template of multiple structures, Fung et al. [14] also developed
two novel formulations: a weighted model, which considers the distance between any two positions as the weighted
average of their distances in all structures, and a binary distance bin model, which decides which bin the distance
falls into during energy optimization. The latter approach is in a sense similar to the backbone parameterization
approach of Harbury et al. [18,19,40], in which there are distance variables associated with the backbone.
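The insensitivity of a binned potential to small backbone shifts, and the distance used by the weighted model, can be sketched as follows. Only the Arg-Glu value of -7.77 kcal/mol in bin 1 (3-4 Å) comes from the text. The table layout, the half-open bin convention, and the helper names are hypothetical assumptions, and the real forcefield has many more bins and residue pairs.

```python
# Hypothetical sketch of a distance-binned pairwise contact potential.
# Only the Arg-Glu, bin-1 (3-4 Å) value of -7.77 kcal/mol comes from the text;
# the table layout and the half-open [lower, upper) bin convention are assumptions.
BIN_EDGES = {1: (3.0, 4.0)}  # bin index -> (lower, upper) Cα-Cα distance in Å
CONTACT_ENERGY = {("ARG", "GLU", 1): -7.77}  # (residue at i, residue at k, bin) -> kcal/mol


def distance_bin(dist):
    """Return the bin containing `dist`, or None if no bin is defined for it."""
    for k, (lo, hi) in BIN_EDGES.items():
        if lo <= dist < hi:
            return k
    return None


def pair_energy(res_i, res_k, dist):
    """Energy contribution of a residue pair; None if absent from this partial table."""
    k = distance_bin(dist)
    return None if k is None else CONTACT_ENERGY.get((res_i, res_k, k))


def weighted_distance(distances, weights):
    """Weighted-model distance: weighted average of one pair's distances over all structures."""
    return sum(w * x for w, x in zip(weights, distances)) / sum(weights)


if __name__ == "__main__":
    # Small backbone shifts that stay inside bin 1 leave the energy unchanged.
    print(pair_energy("ARG", "GLU", 3.5), pair_energy("ARG", "GLU", 3.9))
    print(weighted_distance([3.2, 3.8], [0.5, 0.5]))
```

How the weights are chosen in the weighted model is not specified in the text, so equal weights are used purely for illustration.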

The second stage of the approach confirms fold 
specificity of the sequences generated in the first stage 
based on a full-atomistic forcefield. The group used to 
perform the task via ASTRO-FOLD [20,21,22,23,24,25, 
28,36], a protein structure prediction method via global 
optimization. Conformational ensembles are generated 
for each sequence under two sets of conditions. In the 
first circumstance, the structure is constrained to vary, 
with some imposed fluctuations, around the template 
structure. In the second condition, a free folding cal- 
culation is performed for which only a limited number 
of restraints (e.g., disulfide bridges), but not the un- 
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derlying template structure, are enforced. The relative 
fold specificity of the sequence, $f_{spec}$, can be found by summing the statistical weights for those conformers
from the free folding simulation that resemble the template structure (denoted as set temp), and dividing this sum by
the summation of statistical weights for all conformers from the free folding simulation (denoted as set total):

$$f_{spec} = \frac{\sum_{i \in \mathrm{temp}} \exp[-\beta E_i]}{\sum_{i \in \mathrm{total}} \exp[-\beta E_i]},$$

where $\exp[-\beta E_i]$ is the statistical weight for conformer $i$.
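A minimal sketch of the $f_{spec}$ ratio is given below. It evaluates both Boltzmann sums in log space, using a log-sum-exp, because $-\beta E_i$ can be large for full-atom energies. How a conformer is judged to "resemble the template" (for instance by an RMSD cutoff) is not specified in the text, so that classification is taken as an input. The function names are hypothetical.

```python
import math


def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t))) for a non-empty list of terms."""
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fold_specificity(energies, resembles_template, beta):
    """f_spec = sum_{i in temp} exp(-beta E_i) / sum_{i in total} exp(-beta E_i).

    energies           -- E_i for every conformer of the free folding run (set total)
    resembles_template -- parallel booleans marking the subset temp
    beta               -- 1/(k_B T) in units matching the energies
    """
    total = [-beta * e for e in energies]
    temp = [t for t, keep in zip(total, resembles_template) if keep]
    if not temp:
        return 0.0
    return math.exp(log_sum_exp(temp) - log_sum_exp(total))
```

Working in log space means that an energy of, say, -1000 in units where $\beta = 1$ does not overflow `math.exp`. The ratio is unchanged because the common maximum cancels.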

Note that in this nonrotamer approach, in both the template-constrained and the free folding calculations, all
continuous Cα-Cα distance and angle values between upper and lower bounds input by the user are considered in
sampling the conformers. True backbone flexibility [9] is thus retained.

More recently, the Floudas group developed an approximate fold validation method which is computationally less
expensive than ASTRO-FOLD. Through the CYANA 2.1 software for NMR structure refinement [16,17], an ensemble of
several hundred conformers is generated for both a new sequence from the first stage and the native sequence. The
energies of the conformers are then minimized using TINKER [42], and the fold specificity of the new sequence is
calculated using the formula

$$f_{spec} = \frac{\sum_{i \in \text{conformers for new sequence}} \exp[-\beta E_i]}{\sum_{i \in \text{conformers for native sequence}} \exp[-\beta E_i]},$$

based on the assumption that the fold specificity to the flexible template is unity for the native sequence.
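The relative formula can be sketched in the same way. It is a ratio of two Boltzmann sums, one over the new sequence's conformers and one over the native sequence's, so a value above 1 corresponds to a specificity higher than the native sequence's assumed value of unity. The text does not say how ensembles of different sizes are normalized, so the sketch applies the formula literally. The function names are hypothetical.

```python
import math


def log_sum_exp(terms):
    """Numerically stable log(sum(exp(t))) for a non-empty list of terms."""
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def relative_specificity(new_energies, native_energies, beta):
    """Sum of exp(-beta E_i) over new-sequence conformers divided by the same
    sum over native-sequence conformers (native specificity taken as 1)."""
    new_terms = [-beta * e for e in new_energies]
    native_terms = [-beta * e for e in native_energies]
    return math.exp(log_sum_exp(new_terms) - log_sum_exp(native_terms))
```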

Like the fold-validation method via ASTRO-FOLD, 
all continuous distance and dihedral angle values be- 
tween their upper and lower bounds, which are input 
into CYANA on the basis of observations about the 
template structures, are considered in generating the 
conformers. This distinguishes the method from the 
common rotamer approach in which only discrete side- 
chain conformations are allowed. 




Successes 


The novel two-stage de novo strategy was applied to (1) the design of new sequences for compstatin, a synthetic
13-residue cyclic peptide that binds to complement protein 3 (C3) and inhibits the activation of the complement system
(part of innate immunity) [10,26,27,38,39], (2) the design of a potential peptide-drug candidate derived from the
C-terminal sequence of the C3a fragment of C3 [15], and (3) the full sequence of human β-defensin-2, a 41-residue
cationic peptide in the immune system [12]. In the case of the compstatin redesign, sequences with 16-fold and
45-fold improvement in specificity over the native sequence were confirmed in experiments [26,27]. For the design of
the peptide drug from C3a, the best sequence identified corresponds to a 15-fold improvement [15].
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Introduction 


Computational protein design efforts were first initi- 
ated with the premise that the three-dimensional coor- 
dinates of the design template or backbone were fixed. 
This simplification was first proposed in [39], and was 
appealing because it greatly reduced the combinato- 
rial complexity of the search. Together with consider- 
ation of only a limited set of most frequently observed 
side-chain conformations called rotamers [29,40], the 
assumption enhanced the efficiency of the initial de 
novo design efforts, most of which focused on pro- 
tein cores [5,16,32,41,42], in exploring search spaces. 
The reason why protein cores were selected instead 
of the boundary or surface regions was based on the 
thesis that protein folding is primarily driven by hydrophobic collapse, and thus a good core tends to provide a
well-folded and stable structure for the de novo designed protein [10]. The scope of de novo design was extended to
intermediate and surface residues in subsequent years, which made the problem considerably more challenging. In this
chapter, we outline the different deterministic and stochastic methods that search for sequences specific to the
fixed rigid design template. It should be noted that they all discretize the side-chain conformational space into
rotamers for tractability of the search problem. After the introduction of each method, we also review examples of
successes.


Sequence Search Methods 


De novo design algorithms can be classified into two main categories, namely, deterministic and stochastic [8]. The
two main methods that fall into the deterministic category are dead-end elimination (DEE) and self-consistent mean
field (SCMF), whereas the two major stochastic frameworks are Monte Carlo and genetic algorithms. Some methods search
for low-energy sequences, whereas others assign a probability to each of the 20 amino acids at each design position
in a sequence in order to maximize the conformational entropy.


Deterministic Methods 


The Dead-End Elimination Criteria DEE is arguably the most popular rotamer search algorithm. It operates by systematically eliminating rotamers that cannot be part of the lowest-energy sequence. The energy function in DEE is written as the sum of individual (rotamer-template) terms and pairwise (rotamer-rotamer) terms:


$$E = \sum_{i=1}^{N} E(i_r) + \sum_{i=1}^{N-1} \sum_{j>i}^{N} E(i_r, j_s), \qquad (1)$$


where $E(i_r)$ is the rotamer-template energy for rotamer $i_r$ of amino acid i, $E(i_r, j_s)$ is the rotamer-rotamer energy of rotamer $i_r$ and rotamer $j_s$ of amino acids i and j, respectively, and N is the total number of positions. The original DEE pruning criterion rests on the following observation. Suppose that the energy of rotamer $i_a$, together with its pairwise energies with rotamers $j_b$, is higher than the corresponding energy of rotamer $i_c$ for every choice of rotamers $j_b$ in a certain rotamer set {B}. Then rotamer $i_a$ cannot be in the global energy minimum conformation and can be eliminated. This criterion was proposed in [9] and can be expressed in the following mathematical form:


$$E(i_a) + \sum_{j \ne i}^{N} E(i_a, j_b) > E(i_c) + \sum_{j \ne i}^{N} E(i_c, j_b) \quad \forall \{B\}. \qquad (2)$$


Rotamer $i_a$ can be pruned if the above inequality holds. Bounds implied by (1) can be used to obtain the following, computationally more tractable inequality [9]:


$$E(i_a) + \sum_{j \ne i}^{N} \min_{b} E(i_a, j_b) > E(i_c) + \sum_{j \ne i}^{N} \max_{b} E(i_c, j_b). \qquad (3)$$
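Criterion (3) lends itself to a short pruning loop. The following Python sketch is illustrative only and makes three assumptions of its own. Energies are stored as nested lists, with `self_e[i][r]` holding $E(i_r)$ and `pair_e[(i, j)][r][s]` holding $E(i_r, j_s)$ for i < j. The helper names `pair` and `dee_singles` are hypothetical. The test is re-applied over the surviving rotamers until nothing more can be pruned.

```python
def pair(pair_e, i, r, j, s):
    """Return E(i_r, j_s); pair_e stores each pair once, keyed with i < j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dee_singles(self_e, pair_e):
    """Prune rotamers with inequality (3); return surviving rotamer indices per position."""
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for a in sorted(alive[i]):
                # Lowest energy any conformation containing i_a could reach.
                best_case_a = self_e[i][a] + sum(
                    min(pair(pair_e, i, a, j, b) for b in alive[j])
                    for j in range(n) if j != i)
                for c in sorted(alive[i] - {a}):
                    # Highest energy any conformation containing i_c could reach.
                    worst_case_c = self_e[i][c] + sum(
                        max(pair(pair_e, i, c, j, b) for b in alive[j])
                        for j in range(n) if j != i)
                    if best_case_a > worst_case_c:  # inequality (3)
                        alive[i].discard(a)
                        changed = True
                        break
    return alive


self_e = [[0.0, 10.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.0, 1.0]]}
print(dee_singles(self_e, pair_e))  # [{0}, {0}]
```

A rotamer is removed only when it cannot appear in the global minimum. The surviving sets therefore always contain that minimum, and in small cases the remaining combinations can be enumerated directly.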
The above inequalities eliminate rotamers at a single position (singles). They can be extended to eliminate rotamer pairs at two distinct positions (doubles), rotamer triplets at three distinct positions (triples), or higher-order combinations [9,37]. In the case of doubles, the inequality becomes


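$$\varepsilon(i_a, j_b) + \sum_{k \ne i,j}^{N} \min_{c} \varepsilon(i_a, j_b, k_c) > \varepsilon(i_{a'}, j_{b'}) + \sum_{k \ne i,j}^{N} \max_{c} \varepsilon(i_{a'}, j_{b'}, k_c), \qquad (4)$$

where $\varepsilon$ is the total energy of rotamer pairs,

$$\varepsilon(i_a, j_b) = E(i_a) + E(j_b) + E(i_a, j_b), \qquad (5)$$

and the interaction of a pair with a third rotamer $k_c$ is

$$\varepsilon(i_a, j_b, k_c) = E(i_a, k_c) + E(j_b, k_c). \qquad (6)$$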


This criterion identifies a rotamer pair $i_a$ and $j_b$ that always contributes a higher energy than the rotamer pair $i_{a'}$ and $j_{b'}$ for all possible rotamer combinations. Goldstein [14] improved the original DEE criterion. Under his criterion, rotamer $i_a$ can be pruned if its energy contribution is always reduced by switching to an alternative rotamer $i_c$:


$$E(i_a) - E(i_c) + \sum_{j \ne i}^{N} \min_{b} \left[ E(i_a, j_b) - E(i_c, j_b) \right] > 0. \qquad (7)$$


This can be generalized to use a weighted average of C rotamers $i_c$ to eliminate $i_a$ [14]:

$$E(i_a) - \sum_{c=1}^{C} w_c E(i_c) + \sum_{j \ne i}^{N} \min_{b} \left[ E(i_a, j_b) - \sum_{c=1}^{C} w_c E(i_c, j_b) \right] > 0. \qquad (8)$$

Lasters et al. [25] proposed determining the most suitable weights $w_c$ by solving a linear programming problem.
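Goldstein's criterion (7) compares the candidate and the competitor term by term, so the test is often tighter than (3). The following Python sketch is a minimal illustration, not a reference implementation. It reuses the same hypothetical storage convention as the singles sketch: `self_e[i][r]` is $E(i_r)$, and `pair_e[(i, j)][r][s]` is $E(i_r, j_s)$ with i < j. The function names are illustrative.

```python
def pair(pair_e, i, r, j, s):
    """Return E(i_r, j_s); pair_e stores each pair once, keyed with i < j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def goldstein_gap(self_e, pair_e, alive, i, a, c):
    """Left-hand side of inequality (7) for candidate i_a and competitor i_c."""
    return self_e[i][a] - self_e[i][c] + sum(
        min(pair(pair_e, i, a, j, b) - pair(pair_e, i, c, j, b) for b in alive[j])
        for j in range(len(self_e)) if j != i)


def dee_goldstein(self_e, pair_e):
    """Repeatedly prune every i_a for which some surviving i_c makes (7) positive."""
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(self_e)):
            for a in sorted(alive[i]):
                if any(goldstein_gap(self_e, pair_e, alive, i, a, c) > 0
                       for c in alive[i] - {a}):
                    alive[i].discard(a)
                    changed = True
    return alive


self_e = [[0.0, 0.0], [0.0, 0.0]]
pair_e = {(0, 1): [[2.0, 5.0], [1.0, 4.0]]}
print(dee_goldstein(self_e, pair_e))  # [{1}, {0}]
```

Take rotamer 0 at position 0 in this toy case. On a first pass, bound (3) cannot remove it, because 2 > 4 is false. Criterion (7) removes it immediately, because min(2 − 1, 5 − 4) = 1 > 0.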

In addition to these criteria proposed by Goldstein [14], Pierce et al. [38] introduced split DEE. It splits the conformational space into partitions and thus eliminates dead-ending rotamers more efficiently. Here k is the splitting position and $k_v$ is a rotamer at that position:

$$E(i_a) - E(i_c) + \sum_{j \ne k \ne i}^{N} \left\{ \min_{b'} \left[ E(i_a, j_{b'}) - E(i_c, j_{b'}) \right] \right\} + \left[ E(i_a, k_v) - E(i_c, k_v) \right] > 0. \qquad (9)$$


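In general, n splitting positions can be assigned. This makes rotamer elimination more efficient but also more computationally expensive:

$$E(i_a) - E(i_c) + \sum_{j \ne k_1, \ldots, k_n \ne i}^{N} \left\{ \min_{b'} \left[ E(i_a, j_{b'}) - E(i_c, j_{b'}) \right] \right\} + \sum_{k = k_1, \ldots, k_n} \left[ E(i_a, k_v) - E(i_c, k_v) \right] > 0. \qquad (10)$$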


Looger and Hellinga [27] also introduced generalized DEE. It ranks the energies of rotamer clusters instead of those of individual rotamers, which increases the ability of the algorithm to deal with higher levels of combinatorial complexity. Further revisions and improvements of DEE were made by Wernisch et al. [47] and Gordon et al. [15].

Being deterministic in nature, the different forms of 
DEE reviewed above all yield the same globally optimal 
solution upon convergence. 


Successes Using Dead-End Elimination: The Mayo group ran the DEE algorithm on a fixed template in their program for optimization of rotamers by an iterative technique (ORBIT). They applied it to numerous de novo protein designs. Examples include:

- the full-sequence design of the ββα fold of a zinc finger domain [6];
- improvement of calmodulin binding affinity [45];
- full core design of the variable domains of the light and heavy chains of catalytic antibody 48G7 Fab;
- full core/boundary design, full surface design, and full-sequence design of the β1 domain of protein G [15];
- the redesign of the core of T4 lysozyme [32].

They also adjusted secondary structure parameters to build an "idealized backbone" and used it as a fixed template to design an α/β-barrel protein [33]. The Hellinga group applied DEE with a fixed backbone structure to introduce iron and oxygen binding sites into thioredoxin [2,3]. The group also designed receptor and sensor proteins with novel ligand-binding functions [28] and conferred novel enzymatic properties onto ribose-binding protein [11].


The Self-Consistent Mean-Field Method The SCMF optimization method is an iterative procedure. It predicts the elements of a conformational matrix P(i,a), the probability that design position i adopts the conformation of rotamer a. Note that P(i,a) sums to unity over all rotamers a for each position i. Koehl and Delarue [19] were among those who introduced such a method for protein design. They started the iteration with an initial guess for the conformational matrix that assigns equal probability to all rotamers:


$$P(i,a) = \frac{1}{A}, \qquad a = 1, 2, \ldots, A. \qquad (11)$$

Most importantly, they applied the mean-field potential E(i,a), which depends on the conformational matrix P(i,a):

$$E(i,a) = U(x_{ia}) + U(x_{ia}, x_0) + \sum_{j=1, j \ne i}^{N} \sum_{b=1}^{B} P(j,b)\, U(x_{ia}, x_{jb}), \qquad (12)$$


where $x_0$ corresponds to the coordinates of the atoms in the fixed template. The quantities $x_{ia}$ and $x_{jb}$ correspond to the coordinates of the atoms of position i in the conformation of rotamer a and those of position j in the conformation of rotamer b, respectively. The classical Lennard-Jones (12-6) potential can be used to describe the potential energy U [19]. The conformational matrix can then be updated using the mean-field potential and the Boltzmann law:


$$P_1(i,a) = \frac{e^{-E(i,a)/RT}}{\sum_{a'=1}^{A} e^{-E(i,a')/RT}}. \qquad (13)$$


The updated matrix $P_1(i,a)$ can then be used to recompute the mean-field potential. The cycle of computing the potential and updating the matrix is repeated until convergence is attained.
Koehl and Delarue [19] set the convergence criterion to $10^{-4}$ to define self-consistency. They also proposed introducing a memory of the previous step to minimize oscillations during convergence:


$$P(i,a) = \lambda P_1(i,a) + (1 - \lambda) P(i,a), \qquad (14)$$

with an optimal step size of $\lambda = 0.9$ [19].
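The SCMF cycle of equations (11) to (14) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation of [19]. Several choices are hypothetical and made here: `self_e[i][a]` bundles $U(x_{ia}) + U(x_{ia}, x_0)$, `pair_u(i, a, j, b)` returns $U(x_{ia}, x_{jb})$, the value `rt=0.6` (roughly RT in kcal/mol near room temperature) and the tolerance `tol` are placeholders, and convergence is judged by the largest change in any P(i,a).

```python
import math


def scmf(self_e, pair_u, rt=0.6, lam=0.9, tol=1e-4, max_iter=1000):
    """Iterate equations (11)-(14); return the conformational matrix P[i][a].

    self_e[i][a]       -- U(x_ia) + U(x_ia, x_0): internal plus template energy
    pair_u(i, a, j, b) -- U(x_ia, x_jb): rotamer-rotamer interaction energy
    """
    n = len(self_e)
    p = [[1.0 / len(row)] * len(row) for row in self_e]        # eq. (11)
    for _ in range(max_iter):
        # eq. (12): mean-field energy of every rotamer under the current P
        field = [[self_e[i][a] + sum(p[j][b] * pair_u(i, a, j, b)
                                     for j in range(n) if j != i
                                     for b in range(len(p[j])))
                  for a in range(len(self_e[i]))]
                 for i in range(n)]
        new_p = []
        for i in range(n):
            e_min = min(field[i])  # shift cancels in the ratio; avoids overflow
            w = [math.exp(-(e - e_min) / rt) for e in field[i]]
            z = sum(w)
            p1 = [x / z for x in w]                               # eq. (13)
            new_p.append([lam * x + (1.0 - lam) * y               # eq. (14)
                          for x, y in zip(p1, p[i])])
        delta = max(abs(x - y) for r1, r0 in zip(new_p, p) for x, y in zip(r1, r0))
        p = new_p
        if delta < tol:
            break
    return p


probs = scmf([[0.0, 2.0], [0.0, 2.0]], lambda i, a, j, b: 0.0)
print(probs)  # rotamer 0 dominates at both positions (about 0.966)
```

Each row of the returned matrix sums to one. As the text notes for SCMF in general, the fixed point reached is not guaranteed to be the global optimum.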

The Saven group [12,24,44,48] extended the SCMF 
theory and formulated de novo design as an opti- 
mization problem maximizing the sequence entropy 
subject to composition constraints and mean-field en- 
ergy constraints. In addition to the site probabilities, 
their method also predicts the number of sequences for 
a combinatorial library of arbitrary size for the fixed 
template as a function of energy. 

It should be emphasized that the SCMF method, although deterministic, does not guarantee convergence to the globally optimal solution [26].


Successes Using the Self-Consistent Mean-Field Method Koehl and Delarue [20] applied the SCMF approach to design protein loops. In their optimization procedure, they first selected from a database the loop fragment with the highest site probabilities. They then placed side chains from a rotamer library on the fixed loop backbone. Kono and Doi [23] used an energy minimization with an automata network, which bears some resemblance to the SCMF method. They applied it to design the cores of the globular proteins cytochrome b562, triosephosphate isomerase, and barnase. The SCMF method is also related to the design of combinatorial libraries of new sequences with good folding properties, which has been reviewed in several papers [17,34,35,43].


Stochastic Methods 


De novo design is NP-hard (nondeterministic polynomial-time hard) [13,36]. This means that, in the worst case, the time required to solve the problem grows nonpolynomially with the number of design positions. Once the problem complexity exceeds a certain level, deterministic methods may reach their limits. In such instances we may have to resort to stochastic methods, which search for locally optimal solutions without guaranteeing global optimality. Monte Carlo methods and genetic algorithms are the two most commonly used types of stochastic methods for de novo protein design.


Monte Carlo Methods Different variants of the Monte Carlo method have been applied to sequence design. In the classic Monte Carlo method, a mutation is made at a certain position in the sequence. The energies of the sequence in the fixed template are calculated before and after the mutation. This usually involves discrete rotamer libraries, which simplify the treatment of possible side-chain conformations. The mutated sequence is accepted if the energy decreases. If the energy increases, the Metropolis acceptance criterion [30] is used:

$$\beta = \frac{1}{k_B T}, \qquad P_{\text{accept}} = \min\left(1, e^{-\beta \Delta E}\right).$$

The sequence is then updated if $P_{\text{accept}}$ is larger than a random number uniformly distributed between 0 and 1.
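The Metropolis step can be sketched as follows. The Python code below is a toy illustration of the classic single-mutation scheme under stated assumptions, not RosettaDesign or any published program. The `energy` callable stands in for the energy of a sequence on the fixed template, with side-chain placement omitted. The two-letter alphabet, the target-matching energy, and the names `metropolis_accept` and `mc_design` are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, beta, rng):
    """Accept downhill moves; accept uphill ones if exp(-beta*dE) > U(0, 1)."""
    if delta_e <= 0:
        return True
    return math.exp(-beta * delta_e) > rng.random()


def mc_design(energy, seq, alphabet, n_steps, beta, seed=0):
    """Classic single-point-mutation Monte Carlo; returns the best sequence seen."""
    rng = random.Random(seed)
    seq = list(seq)
    e = energy(seq)
    best, best_e = list(seq), e
    for _ in range(n_steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(alphabet)          # attempted mutation
        new_e = energy(seq)
        if metropolis_accept(new_e - e, beta, rng):
            e = new_e
            if e < best_e:
                best, best_e = list(seq), e
        else:
            seq[pos] = old                        # rejected: undo the mutation
    return "".join(best), best_e


target = "HPPHHPHP"
toy_energy = lambda s: sum(x != y for x, y in zip(s, target))  # hypothetical
print(mc_design(toy_energy, "PPPPPPPP", "HP", 2000, beta=10.0))
```

Keeping track of the best sequence seen is a common convenience. The Metropolis chain itself is free to wander uphill.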

In the configurational bias Monte Carlo method, each step uses a local energy that excludes positions where no mutation has been attempted [49]. Cootes et al. [4] reported that this method was more efficient at finding good solutions than the conventional Monte Carlo method, especially for complex systems. Zou and Saven [49] also devised the mean-field biased Monte Carlo method. It biases the sequence search with predetermined site probabilities, which are in turn calculated using SCMF theory. They reported that their method converges to low-energy sequences faster than both the classic and the configurational bias Monte Carlo methods.


Successes of Monte Carlo Methods Koehl and Levitt [21,22] kept the amino acid composition fixed to impose sequence specificity, which significantly reduced the complexity. With the conventional Monte Carlo method, they designed new sequences for the fixed backbones of the β1 domain of protein G, λ repressor, and sperm whale myoglobin. The Baker group also used the classic Monte Carlo algorithm in their computational protein design program RosettaDesign. Applications of the program include the redesign of nine globular proteins on fixed templates: the src SH3 domain, λ repressor, U1A, protein L, tenascin, procarboxypeptidase, acylphosphatase, S6, and FKBP12 [7].


Genetic Algorithms Genetic algorithms are inspired by genetics and evolution. They generate a multitude of random amino acid sequences for a fixed template. Sequences with low energies are combined with other sequences to form hybrids, while sequences with high energies are eliminated. The process is iterated and terminates only when a converged solution is attained [46].
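The generate, hybridize, and eliminate loop just described can be sketched as a toy genetic algorithm. The Python code below is an illustration under stated assumptions, not the method of [46]. Several choices are hypothetical: the energy function, truncation selection (keeping the lower-energy half), one-point crossover, the mutation rate, and the use of a fixed number of generations in place of a convergence test.

```python
import random


def genetic_design(energy, length, alphabet, pop_size=30, generations=200,
                   mut_rate=0.05, seed=0):
    """Toy GA: keep the low-energy half, refill with mutated one-point hybrids."""
    rng = random.Random(seed)
    pop = [[rng.choice(alphabet) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=energy)                        # low energy = fitter
        survivors = pop[: pop_size // 2]            # high-energy half is eliminated
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]             # hybrid of two low-energy parents
            child = [rng.choice(alphabet) if rng.random() < mut_rate else aa
                     for aa in child]               # random point mutations
            children.append(child)
        pop = survivors + children
    best = min(pop, key=energy)
    return "".join(best), energy(best)


target = "HPHHPPHPHH"
toy_energy = lambda s: sum(x != y for x, y in zip(s, target))  # hypothetical
print(genetic_design(toy_energy, len(target), "HP"))
```

Because survivors are carried over unchanged, the best energy in the population never increases from one generation to the next.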


Successes of Genetic Algorithms With fixed backbones, Belda et al. [1] applied genetic algorithms to the design of ligands for prolyl oligopeptidase, p53, and DNA gyrase. In addition, Hohm et al. [18] and Miyazawa and Jernigan [31] used a cubic lattice and empirical contact potentials with evolutionary methods. They designed short, highly stable peptides that resemble the antibody epitopes of thrombin and blood coagulation factor VIII.
Introduction


Consider the following unconstrained minimization problem:

$$\text{minimize } f(x) \quad \text{subject to } x \in \mathbb{R}^n, \qquad (1)$$

where the objective function f is assumed to be Lipschitz continuous.

Nonsmooth unconstrained optimization problems appear in many applications, in particular in data mining. Over more than four decades, different methods have been developed to solve problem (1). Among them we mention the bundle method and its variations (see, for example, [11,12,13,14,17,20]), algorithms based on smoothing techniques [18], and the gradient sampling algorithm [8].

Most of these algorithms require at least one subgradient or approximating gradient at each iteration. However, in many practical problems computing even one subgradient is difficult. In such situations derivative-free methods seem to be a better choice, since they do not require the explicit computation of subgradients.

Among derivative-free methods, generalized pattern search methods are well suited for nonsmooth optimization [1,19]. However, their convergence is proved under quite restrictive differentiability assumptions. It was shown in [19] that if the objective function f is continuously differentiable in $\mathbb{R}^n$, then the lower limit of the norm of the gradient along the sequence of points generated by the generalized pattern search algorithm is zero. The paper [1] provides a convergence analysis under less restrictive differentiability assumptions. It shows that if f is strictly differentiable near the limit of any refining subsequence, then the gradient at that point is zero. However, this condition fails in many practically important problems, because their objective functions are not differentiable at local minimizers.

In the paper [15] a derivative-free algorithm for 
a linearly constrained finite minimax problem was 
proposed. The original problem was converted into 
a smooth one using a smoothing technique. This algo- 
rithm is globally convergent toward stationary points of 
the finite minimax problem. 

In this paper we describe a derivative-free method 
based on the notion of a discrete gradient for solving 
unconstrained nonsmooth optimization problems. Its
convergence is proved for a broad class of nonsmooth 
functions. 


Definitions 


We use the following notation: $\mathbb{R}^n$ is the n-dimensional space, where the scalar product is denoted by $\langle x, y \rangle$:

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i,$$

and $\|\cdot\|$ denotes the associated norm. The gradient of a function $f: \mathbb{R}^n \to \mathbb{R}^1$ is denoted by $\nabla f$. The closed δ-ball at $x \in \mathbb{R}^n$ is denoted by $S_\delta(x)$ (by $S_\delta$ if x = 0):

$$S_\delta(x) = \{ y \in \mathbb{R}^n : \|x - y\| \le \delta \}, \quad \delta > 0.$$


The Clarke Subdifferential 


Let f be a function defined on $\mathbb{R}^n$. The function f is called locally Lipschitz continuous if for any bounded subset $X \subset \mathbb{R}^n$ there exists an L > 0 such that

$$|f(x) - f(y)| \le L \|x - y\| \quad \forall x, y \in X.$$

We recall that a locally Lipschitz function f is differentiable almost everywhere. We can therefore define its Clarke subdifferential [9] by

$$\partial f(x) = \operatorname{co} \left\{ v \in \mathbb{R}^n : \exists \left(x^k \in D(f),\ x^k \to x,\ k \to +\infty\right) : v = \lim_{k \to +\infty} \nabla f(x^k) \right\},$$

where D(f) denotes the set where f is differentiable and co denotes the convex hull of a set. It is shown in [9] that the mapping $\partial f(x)$ is upper semicontinuous and bounded on bounded sets.

The generalized directional derivative of f at x in the direction g is defined as

$$f^0(x, g) = \limsup_{y \to x,\ \alpha \downarrow 0} \alpha^{-1} \left[ f(y + \alpha g) - f(y) \right].$$

If the function f is locally Lipschitz continuous, then the generalized directional derivative exists and

$$f^0(x, g) = \max \{ \langle v, g \rangle : v \in \partial f(x) \}.$$

The function f is called Clarke regular on $\mathbb{R}^n$ if it is differentiable with respect to any direction $g \in \mathbb{R}^n$ and $f'(x, g) = f^0(x, g)$ for all $x, g \in \mathbb{R}^n$. Here $f'(x, g)$ is the derivative of f at the point x with respect to the direction g:

$$f'(x, g) = \lim_{\alpha \downarrow 0} \alpha^{-1} \left[ f(x + \alpha g) - f(x) \right].$$

It is clear that the directional derivative $f'(x, g)$ of a Clarke regular function f is upper semicontinuous with respect to x for all $g \in \mathbb{R}^n$.

Let f be a locally Lipschitz continuous function defined on $\mathbb{R}^n$. For a point x to be a minimum point of f on $\mathbb{R}^n$, it is necessary that $0 \in \partial f(x)$.


Semismooth Functions 


The function $f: \mathbb{R}^n \to \mathbb{R}^1$ is called semismooth at $x \in \mathbb{R}^n$ if it is locally Lipschitz continuous at x and, for every $g \in \mathbb{R}^n$, the limit

$$\lim_{v \in \partial f(x + \alpha g'),\ g' \to g,\ \alpha \downarrow 0} \langle v, g \rangle$$

exists. The class of semismooth functions is fairly wide: it contains convex, concave, max-type, and min-type functions [16]. A semismooth function f is directionally differentiable and

$$f'(x, g) = \lim_{g' \to g,\ \alpha \downarrow 0,\ v \in \partial f(x + \alpha g')} \langle v, g \rangle.$$


Quasidifferentiable Functions 


A function f is called quasidifferentiable at a point x if it is locally Lipschitz continuous and directionally differentiable at this point, and there exist convex, compact sets $\underline{\partial} f(x)$ and $\overline{\partial} f(x)$ such that

$$f'(x, g) = \max_{u \in \underline{\partial} f(x)} \langle u, g \rangle + \min_{v \in \overline{\partial} f(x)} \langle v, g \rangle.$$

The set $\underline{\partial} f(x)$ is called a subdifferential and the set $\overline{\partial} f(x)$ a superdifferential. The pair of sets $[\underline{\partial} f(x), \overline{\partial} f(x)]$ is called a quasidifferential of the function f at the point x [10].


Methods 
Approximation of Subgradients 


We consider a locally Lipschitz continuous function f defined on $\mathbb{R}^n$ and assume that this function is quasidifferentiable. We also assume that both sets $\underline{\partial} f(x)$ and $\overline{\partial} f(x)$ are polytopes at any $x \in \mathbb{R}^n$. That is, at a point $x \in \mathbb{R}^n$ there exist sets

$$A = \{a^1, \ldots, a^m\},\ a^i \in \mathbb{R}^n,\ i = 1, \ldots, m,\ m \ge 1$$

and

$$B = \{b^1, \ldots, b^p\},\ b^j \in \mathbb{R}^n,\ j = 1, \ldots, p,\ p \ge 1$$

such that

$$\underline{\partial} f(x) = \operatorname{co} A, \qquad \overline{\partial} f(x) = \operatorname{co} B.$$

This assumption holds, for example, for functions represented as a maximum, minimum, or max-min of a finite number of smooth functions.

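We take a direction $g \in \mathbb{R}^n$ such that

$$g = (g_1, \ldots, g_n), \qquad |g_i| = 1,\ i = 1, \ldots, n,$$

and consider the sequence of n vectors $e^j = e^j(\alpha)$, $j = 1, \ldots, n$, with $\alpha \in (0, 1]$:

$$\begin{aligned} e^1 &= (\alpha g_1, 0, \ldots, 0), \\ e^2 &= (\alpha g_1, \alpha^2 g_2, 0, \ldots, 0), \\ &\ \ \vdots \\ e^n &= (\alpha g_1, \alpha^2 g_2, \ldots, \alpha^n g_n). \end{aligned}$$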


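We introduce the following sets:

$$R_0 = A, \qquad \overline{R}_0 = B,$$

$$R_j = \left\{ v \in R_{j-1} : v_j g_j = \max \{ w_j g_j : w \in R_{j-1} \} \right\},$$

$$\overline{R}_j = \left\{ v \in \overline{R}_{j-1} : v_j g_j = \min \{ w_j g_j : w \in \overline{R}_{j-1} \} \right\}, \qquad j = 1, \ldots, n.$$

It is clear that

$$R_j \ne \emptyset\ \ \forall j \in \{0, \ldots, n\}, \qquad R_j \subseteq R_{j-1}\ \ \forall j \in \{1, \ldots, n\}$$

and

$$\overline{R}_j \ne \emptyset\ \ \forall j \in \{0, \ldots, n\}, \qquad \overline{R}_j \subseteq \overline{R}_{j-1}\ \ \forall j \in \{1, \ldots, n\}.$$

Moreover,

$$v_r = w_r \quad \forall v, w \in R_j,\ r = 1, \ldots, j \qquad (2)$$

and

$$v_r = w_r \quad \forall v, w \in \overline{R}_j,\ r = 1, \ldots, j. \qquad (3)$$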


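Consider the following two sets:

$$R(x, e^j(\alpha)) = \left\{ v \in A : \langle v, e^j \rangle = \max_{u \in A} \langle u, e^j \rangle \right\},$$

$$\overline{R}(x, e^j(\alpha)) = \left\{ w \in B : \langle w, e^j \rangle = \min_{u \in B} \langle u, e^j \rangle \right\}.$$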
Proposition 1 Assume that the function f is quasidifferentiable and its subdifferential and superdifferential are polytopes at a point x. Then there exists $\alpha_0 > 0$ such that

$$R(x, e^j(\alpha)) \subset R_j, \qquad \overline{R}(x, e^j(\alpha)) \subset \overline{R}_j, \qquad j = 1, \ldots, n,$$

for all $\alpha \in (0, \alpha_0)$.


Corollary 1 Assume that the function f is quasidifferentiable and its subdifferential and superdifferential are polytopes at a point x. Then there exists $\alpha_0 > 0$ such that

$$f'(x, e^j(\alpha)) = f'(x, e^{j-1}(\alpha)) + v_j \alpha^j g_j + w_j \alpha^j g_j, \qquad \forall v \in R_j,\ w \in \overline{R}_j,\ j = 1, \ldots, n,$$

for all $\alpha \in (0, \alpha_0]$.


Proposition 2 Assume that the function f is quasidifferentiable and its subdifferential and superdifferential are polytopes at a point x. Then the sets $R_n$ and $\overline{R}_n$ are singletons.

Remark 1 In the next subsection we propose an algorithm to approximate subgradients. This algorithm finds a subgradient that can be represented as a sum of elements of the sets $R_n$ and $\overline{R}_n$.
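The sets $R_j$ select vertices lexicographically: first by the largest $v_1 g_1$, then among those by the largest $v_2 g_2$, and so on. Propositions 1 and 2 say that, for small α, this selection is exactly the point of co A exposed by $e^n(\alpha)$. The Python sketch below checks that on one small, hypothetical example. The vertex lists and the function names `lexicographic_face` and `face_for_small_alpha` are illustrative only.

```python
def lexicographic_face(points, g, maximize=True):
    """R_n from A (maximize=True) or the barred R_n from B (maximize=False)."""
    cand = [tuple(p) for p in points]
    for j, gj in enumerate(g):
        scores = [p[j] * gj for p in cand]
        target = max(scores) if maximize else min(scores)
        cand = [p for p, s in zip(cand, scores) if s == target]  # R_j from R_{j-1}
    return set(cand)


def face_for_small_alpha(points, g, alpha, maximize=True):
    """Points attaining the max (or min) of <v, e^n(alpha)> over the given set."""
    e = [alpha ** (j + 1) * gj for j, gj in enumerate(g)]
    scores = [sum(pi * ei for pi, ei in zip(p, e)) for p in points]
    target = max(scores) if maximize else min(scores)
    return {tuple(p) for p, s in zip(points, scores) if s == target}


A = [(1, 1), (1, -1), (-1, 1), (-1, -1), (0, 1)]
g = (1, -1)
print(lexicographic_face(A, g), face_for_small_alpha(A, g, 0.1))  # both {(1, -1)}
```

For a fixed small α such as 0.1 the two sets agree here. Proposition 1 only guarantees agreement below some problem-dependent threshold $\alpha_0$.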


Computation of Subgradients 


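Let $g \in \mathbb{R}^n$, $|g_i| = 1$, $i = 1, \ldots, n$, be a given vector and let $\lambda > 0$, $\alpha > 0$ be given numbers. We define the following points:

$$x^0 = x, \qquad x^j = x^0 + \lambda e^j(\alpha), \quad j = 1, \ldots, n.$$

It is clear that

$$x^j = x^{j-1} + (0, \ldots, 0, \lambda \alpha^j g_j, 0, \ldots, 0), \qquad j = 1, \ldots, n.$$

Let $v = v(\alpha, \lambda) \in \mathbb{R}^n$ be a vector with the following coordinates:

$$v_j = (\lambda \alpha^j g_j)^{-1} \left[ f(x^j) - f(x^{j-1}) \right], \qquad j = 1, \ldots, n. \qquad (4)$$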
For any fixed $g \in \mathbb{R}^n$, $|g_i| = 1$, $i = 1, \ldots, n$, and $\alpha > 0$ we introduce the following set:

$$V(g, \alpha) = \left\{ w \in \mathbb{R}^n : \exists (\lambda_k \to +0,\ k \to +\infty),\ w = \lim_{k \to +\infty} v(\alpha, \lambda_k) \right\}.$$

Proposition 3 Assume that f is a quasidifferentiable function and its subdifferential and superdifferential are polytopes at x. Then there exists $\alpha_0 > 0$ such that

$$V(g, \alpha) \subset \partial f(x)$$

for all $\alpha \in (0, \alpha_0]$.


Remark 2 It follows from Proposition 3 that subgradients of quasidifferentiable functions can be approximated as follows. Choose a vector $g \in \mathbb{R}^n$ such that $|g_i| = 1$, $i = 1, \ldots, n$, and sufficiently small $\alpha > 0$, $\lambda > 0$, and apply (4) to compute a vector $v(\alpha, \lambda)$. This vector is an approximation to a certain subgradient.
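Formula (4) needs only n + 1 function values along a staircase of coordinate steps. The following Python sketch of $v(\alpha, \lambda)$ is a minimal illustration. The name `approx_subgradient` and the test functions are hypothetical, and the caller must supply a g whose entries are ±1.

```python
def approx_subgradient(f, x, g, alpha, lam):
    """Vector v(alpha, lam) of equation (4), built one coordinate step at a time."""
    prev = list(x)                        # x^0 = x
    f_prev = f(prev)
    v = []
    for j, gj in enumerate(g, start=1):
        step = lam * alpha ** j * gj      # x^j = x^{j-1} + lam*alpha^j*g_j in coordinate j
        cur = list(prev)
        cur[j - 1] += step
        f_cur = f(cur)
        v.append((f_cur - f_prev) / step)
        prev, f_prev = cur, f_cur
    return v


l1 = lambda p: abs(p[0]) + abs(p[1])
print(approx_subgradient(l1, [0.0, 0.0], [-1, 1], 0.5, 1e-6))  # about [-1.0, 1.0]
```

At the kink of $|x_1| + |x_2|$ the result, approximately (−1, 1), is a vertex of $\partial f(0) = [-1, 1]^2$. This is consistent with Proposition 3. For a smooth function the same formula reduces to ordinary forward differences.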


Computation of Subdifferentials and Discrete Gradients


In the previous subsection we demonstrated an algo- 
rithm for the computation of subgradients. In this sub- 
section we consider an algorithm for the computation 
of subdifferentials. This algorithm is based on the no- 
tion of a discrete gradient. We start with the definition 
of the discrete gradient, which was introduced in [2] 
(for more details, see also [3,4]). 

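Let f be a locally Lipschitz continuous function defined on $\mathbb{R}^n$. Let

$$S_1 = \{ g \in \mathbb{R}^n : \|g\| = 1 \}, \qquad G = \{ e \in \mathbb{R}^n : e = (e_1, \ldots, e_n),\ |e_j| = 1,\ j = 1, \ldots, n \},$$

$$P = \left\{ z(\lambda) : z(\lambda) \in \mathbb{R}^1,\ z(\lambda) > 0,\ \lambda > 0,\ \lambda^{-1} z(\lambda) \to 0,\ \lambda \to 0 \right\}.$$

Here $S_1$ is the unit sphere, G is the set of vertices of the unit hypercube in $\mathbb{R}^n$, and P is the set of univariate positive infinitesimal functions.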

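We take any $g \in S_1$ and define i by $|g_i| = \max \{ |g_k|,\ k = 1, \ldots, n \}$. We also take any $e = (e_1, \ldots, e_n) \in G$ and a positive number $\alpha \in (0, 1]$. We define the sequence of n vectors $e^j(\alpha)$, $j = 1, \ldots, n$, as in Sect. "Approximation of Subgradients," with e in place of g. Then for given $x \in \mathbb{R}^n$ and $z \in P$ we define a sequence of n + 1 points as follows:

$$\begin{aligned} x^0 &= x + \lambda g, \\ x^1 &= x^0 + z(\lambda) e^1(\alpha), \\ x^2 &= x^0 + z(\lambda) e^2(\alpha), \\ &\ \ \vdots \\ x^n &= x^0 + z(\lambda) e^n(\alpha). \end{aligned}$$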


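Definition 1 The discrete gradient of the function f at the point $x \in \mathbb{R}^n$ is the vector $\Gamma^i(x, g, e, z, \lambda, \alpha) = (\Gamma^i_1, \ldots, \Gamma^i_n) \in \mathbb{R}^n$, $g \in S_1$, with the following coordinates:

$$\Gamma^i_j = \left[ z(\lambda) \alpha^j e_j \right]^{-1} \left[ f(x^j) - f(x^{j-1}) \right], \qquad j = 1, \ldots, n,\ j \ne i,$$

$$\Gamma^i_i = (\lambda g_i)^{-1} \left[ f(x + \lambda g) - f(x) - \lambda \sum_{j=1, j \ne i}^{n} \Gamma^i_j g_j \right].$$

It follows from the definition that

$$f(x + \lambda g) - f(x) = \lambda \langle \Gamma^i(x, g, e, z, \lambda, \alpha), g \rangle \qquad (5)$$

for all $g \in S_1$, $e \in G$, $z \in P$, $\lambda > 0$, $\alpha > 0$.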


Remark 3 The discrete gradient is defined with respect to a given direction $g \in S_1$. To compute $\Gamma^i(x, g, e, z, \lambda, \alpha)$, we first define the sequence of points $x^0, \ldots, x^n$ and compute the values of f at these points. Including the value at x itself, this amounts to n + 2 function values. The n − 1 coordinates $\Gamma^i_j$, $j \ne i$, are defined in the same way as the coordinates of the vector $v(\alpha, \lambda)$ from Sect. "Approximation of Subgradients." The ith coordinate is defined so as to satisfy equality (5), which can be considered a version of the mean value theorem.
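Definition 1 translates directly into code. The following Python sketch is a minimal illustration, not the authors' implementation. The function name is hypothetical, and the argument `z` is the number $z(\lambda)$ rather than the function itself. For the test function and inputs below, the computed vector satisfies equality (5) up to rounding.

```python
def discrete_gradient(f, x, g, e, z, lam, alpha):
    """Discrete gradient Gamma^i(x, g, e, z, lam, alpha) of Definition 1.

    g is a unit vector (g in S_1), e has entries +-1 (e in G), and z is the
    value z(lam) > 0 of some function in P.
    """
    n = len(x)
    i = max(range(n), key=lambda k: abs(g[k]))            # |g_i| = max_k |g_k|
    x0 = [xk + lam * gk for xk, gk in zip(x, g)]          # x^0 = x + lam*g
    pts = [x0]
    for j in range(1, n + 1):                             # x^j = x^0 + z*e^j(alpha)
        nxt = list(pts[-1])
        nxt[j - 1] += z * alpha ** j * e[j - 1]
        pts.append(nxt)
    gamma = [0.0] * n
    for j in range(1, n + 1):
        if j - 1 != i:
            gamma[j - 1] = (f(pts[j]) - f(pts[j - 1])) / (z * alpha ** j * e[j - 1])
    rest = sum(gamma[k] * g[k] for k in range(n) if k != i)
    gamma[i] = (f(x0) - f(x) - lam * rest) / (lam * g[i])  # enforces equality (5)
    return gamma


f = lambda p: abs(p[0]) + 2 * abs(p[1]) - abs(p[0] - p[1])   # hypothetical DC test function
x, g, e, lam = [0.3, -0.2], [0.6, 0.8], [1, -1], 0.1
gam = discrete_gradient(f, x, g, e, lam ** 2, lam, 0.5)
lhs = f([x[0] + lam * g[0], x[1] + lam * g[1]]) - f(x)
print(abs(lhs - lam * (gam[0] * g[0] + gam[1] * g[1])) < 1e-12)  # True: equality (5)
```

Choosing $z(\lambda) = \lambda^2$ satisfies $\lambda^{-1} z(\lambda) \to 0$ and is one convenient member of P. For smooth functions and small λ, the discrete gradient approaches the ordinary gradient.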


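Proposition 4 Let f be a locally Lipschitz continuous function defined on $\mathbb{R}^n$ and L > 0 its Lipschitz constant. Then for any $x \in \mathbb{R}^n$, $g \in S_1$, $e \in G$, $\lambda > 0$, $z \in P$, $\alpha > 0$,

$$\|\Gamma^i\| \le C(n) L, \qquad C(n) = \left( n^2 + 2n^{3/2} - 2n^{1/2} \right)^{1/2}.$$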
For a given $\alpha > 0$ we define the following set:

$$B(x, \alpha) = \left\{ v \in \mathbb{R}^n : \exists \left(g \in S_1,\ e \in G,\ z_k \in P,\ z_k \to +0,\ \lambda_k \to +0,\ k \to +\infty\right),\ v = \lim_{k \to +\infty} \Gamma^i(x, g, e, z_k, \lambda_k, \alpha) \right\}. \qquad (6)$$

Proposition 5 Assume that f is a semismooth, quasidifferentiable function and its subdifferential and superdifferential are polytopes at a point x. Then there exists $\alpha_0 > 0$ such that

$$\operatorname{co} B(x, \alpha) \subset \partial f(x)$$

for all $\alpha \in (0, \alpha_0]$.


Remark 4 Proposition 5 implies that discrete gradi- 
ents can be applied to approximate subdifferentials of 
a broad class of semismooth, quasidifferentiable func- 
tions. 


Remark 5 The discrete gradient contains three parameters: $\lambda > 0$, $z \in P$, and $\alpha > 0$. The function $z \in P$ is used to exploit the semismoothness of f, and it can be chosen sufficiently small. Suppose f is a semismooth quasidifferentiable function whose subdifferential and superdifferential are polytopes at any $x \in \mathbb{R}^n$. Then for any $\delta > 0$ there exists $\alpha_0 > 0$ such that the conclusion of Proposition 5 holds for all $\alpha \in (0, \alpha_0]$ and all $y \in S_\delta(x)$. The most important parameter is $\lambda > 0$. In the sequel we assume that $z \in P$ and $\alpha > 0$ are sufficiently small.


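Consider the following set:

$$D_0(x, \lambda) = \operatorname{cl} \operatorname{co} \left\{ v \in \mathbb{R}^n : \exists (g \in S_1,\ e \in G,\ z \in P) : v = \Gamma^i(x, g, e, z, \lambda, \alpha) \right\}.$$

Proposition 4 implies that the set $D_0(x, \lambda)$ is compact. It is also convex for any $x \in \mathbb{R}^n$.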


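Corollary 2 Let f be a quasidifferentiable semismooth function. Assume that in the equality

$$f(x + \lambda g) - f(x) = \lambda f'(x, g) + o(\lambda, g), \qquad g \in S_1,$$

$\lambda^{-1} o(\lambda, g) \to 0$ as $\lambda \to +0$ uniformly with respect to $g \in S_1$. Then for any $\varepsilon > 0$ there exists $\lambda_0 > 0$ such that

$$D_0(x, \lambda) \subset \partial f(x) + S_\varepsilon$$

for all $\lambda \in (0, \lambda_0)$.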


Corollary 2 shows that, for sufficiently small $\lambda > 0$, the set $D_0(x, \lambda)$ approximates the subdifferential $\partial f(x)$. However, this holds only at a single given point. Convergence results for a minimization algorithm based on discrete gradients need more. They require a relationship between $D_0(x, \lambda)$ and $\partial f(x)$ throughout some neighborhood of the point x. We will therefore consider functions satisfying the following assumption.


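Assumption 1 Let $x \in \mathbb{R}^n$ be a given point. For any $\varepsilon > 0$ there exist $\delta > 0$ and $\lambda_0 > 0$ such that

$$D_0(y, \lambda) \subset \partial f(x + S_\varepsilon) + S_\varepsilon \qquad (7)$$

for all $y \in S_\delta(x)$ and $\lambda \in (0, \lambda_0)$. Here

$$\partial f(x + S_\varepsilon) = \bigcup_{y \in S_\varepsilon(x)} \partial f(y), \qquad S_\varepsilon(x) = \{ y \in \mathbb{R}^n : \|x - y\| \le \varepsilon \}.$$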


A Necessary Condition for a Minimum 


Consider problem (1), where $f: \mathbb{R}^n \to \mathbb{R}^1$ is an arbitrary function.

Proposition 6 Let $x^* \in \mathbb{R}^n$ be a local minimizer of the function f. Then there exists $\lambda_0 > 0$ such that

$$0 \in D_0(x^*, \lambda)$$

for all $\lambda \in (0, \lambda_0)$.

Proposition 7 Let $0 \notin D_0(x, \lambda)$ for a given $\lambda > 0$, and let $v^0 \in \mathbb{R}^n$ be a solution to the following problem:

$$\text{minimize } \|v\|^2 \quad \text{subject to } v \in D_0(x, \lambda).$$

Then the direction $g^0 = -\|v^0\|^{-1} v^0$ is a descent direction.


Proposition 7 shows how the set $D_0(x, \lambda)$ can be used to compute descent directions. However, in many cases the whole set $D_0(x, \lambda)$ cannot be computed. In the next subsection we propose an algorithm that computes descent directions using only a few discrete gradients from $D_0(x, \lambda)$.


Computation of Descent Directions 


In this subsection we describe an algorithm for the 
computation of descent directions of the objective func- 
tion f of Problem (1). 
Let $z \in P$, $\lambda > 0$, $\alpha \in (0, 1]$, the number $c \in (0, 1)$, and a tolerance $\delta > 0$ be given.

Algorithm 1 An algorithm for the computation of the descent direction.

Step 1. Choose any $g^1 \in S_1$, $e \in G$. Compute $i = \operatorname{argmax} \{ |g^1_j|,\ j = 1, \ldots, n \}$ and a discrete gradient $v^1 = \Gamma^i(x, g^1, e, z, \lambda, \alpha)$. Set $D_1(x) = \{v^1\}$ and k = 1.

Step 2. Compute the vector $w^k$ with $\|w^k\|^2 = \min \{ \|w\|^2 : w \in D_k(x) \}$. If

$$\|w^k\| \le \delta, \qquad (8)$$

then stop. Otherwise go to Step 3.

Step 3. Compute the search direction $g^{k+1} = -\|w^k\|^{-1} w^k$.

Step 4. If

$$f(x + \lambda g^{k+1}) - f(x) \le -c \lambda \|w^k\|, \qquad (9)$$

then stop. Otherwise go to Step 5.

Step 5. Compute $i = \operatorname{argmax} \{ |g^{k+1}_j| : j = 1, \ldots, n \}$ and a discrete gradient

$$v^{k+1} = \Gamma^i(x, g^{k+1}, e, z, \lambda, \alpha).$$

Construct the set $D_{k+1}(x) = \operatorname{co} \{ D_k(x) \cup \{v^{k+1}\} \}$, set k = k + 1, and go to Step 2.


We now explain the steps of Algorithm 1. In Step 1 we compute the discrete gradient with respect to an initial direction $g^1 \in \mathbb{R}^n$. Step 2 computes the distance between the origin and the convex hull $D_k(x)$ of all discrete gradients computed so far. This subproblem is solved using the algorithm from [21]. If this distance is less than the tolerance $\delta > 0$, we accept the point x as an approximate stationary point. Otherwise Step 3 computes another search direction. Step 4 checks whether this direction is a descent direction. If it is, we stop, because a descent direction has been found. If not, Step 5 computes another discrete gradient with respect to this direction and updates the set $D_k(x)$. Each iteration k thus improves the approximation of the subdifferential of the function f.

The next proposition shows that Algorithm 1 terminates.


Proposition 8 Let f be a locally Lipschitz function defined on $\mathbb{R}^n$. Then, for $\delta \in (0, \bar{C})$, either condition (8) or condition (9) is satisfied after m computations of the discrete gradients, where

$$m \le 2 \left( \log_2(\delta / \bar{C}) / \log_2 r + 1 \right), \qquad r = 1 - \left[ (1 - c)(2\bar{C})^{-1} \delta \right]^2,$$

$\bar{C} = C(n) L$, and C(n) is the constant from Proposition 4.


Remark 6 Proposition 4 and equality (5) hold for any $\lambda > 0$ and any locally Lipschitz continuous function. Algorithm 1 can therefore compute descent directions for any $\lambda > 0$ and any locally Lipschitz continuous function in a finite number of iterations. Sufficiently small values of λ give an approximation to the subdifferential, and in this case Algorithm 1 computes local descent directions. Larger values of λ do not give such an approximation. In that case the descent directions computed by Algorithm 1 can be considered global descent directions.
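Algorithm 1 can be sketched end to end in Python, bundling the discrete gradient of Definition 1. This is an illustration under stated assumptions, not the authors' code. The source solves the nearest-point subproblem of Step 2 with the algorithm from [21]. This sketch swaps in the simpler Gilbert (Frank-Wolfe) iteration, which only approximates the nearest point. The names `min_norm_point`, `descent_direction`, and `max_k`, as well as the test values, are hypothetical.

```python
import math


def discrete_gradient(f, x, g, e, z, lam, alpha):
    """Gamma^i of Definition 1 with i = argmax_k |g_k|; z is the value z(lam)."""
    n = len(x)
    i = max(range(n), key=lambda k: abs(g[k]))
    x0 = [xk + lam * gk for xk, gk in zip(x, g)]
    pts = [x0]
    for j in range(1, n + 1):
        nxt = list(pts[-1])
        nxt[j - 1] += z * alpha ** j * e[j - 1]
        pts.append(nxt)
    gamma = [0.0] * n
    for j in range(1, n + 1):
        if j - 1 != i:
            gamma[j - 1] = (f(pts[j]) - f(pts[j - 1])) / (z * alpha ** j * e[j - 1])
    rest = sum(gamma[k] * g[k] for k in range(n) if k != i)
    gamma[i] = (f(x0) - f(x) - lam * rest) / (lam * g[i])
    return gamma


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_point(points, iters=10_000, tol=1e-12):
    """Approximate nearest point to the origin in co(points) (Gilbert/Frank-Wolfe)."""
    w = list(min(points, key=lambda p: dot(p, p)))
    for _ in range(iters):
        v = min(points, key=lambda p: dot(w, p))    # vertex minimizing <w, p>
        d = [a - b for a, b in zip(w, v)]
        gap = dot(w, d)                              # <w, w - v> >= 0
        if gap <= tol or dot(d, d) == 0.0:
            break
        t = min(1.0, gap / dot(d, d))                # exact line search on [w, v]
        w = [a - t * b for a, b in zip(w, d)]
    return w


def descent_direction(f, x, g1, e, z, lam, alpha, c, delta, max_k=100):
    """Algorithm 1: returns (g, w); g is None when (8) flags x as stationary."""
    D = [discrete_gradient(f, x, g1, e, z, lam, alpha)]         # Step 1
    for _ in range(max_k):
        w = min_norm_point(D)                                     # Step 2
        norm = math.sqrt(dot(w, w))
        if norm <= delta:                                         # condition (8)
            return None, w
        g = [-wi / norm for wi in w]                              # Step 3
        x_new = [xi + lam * gi for xi, gi in zip(x, g)]
        if f(x_new) - f(x) <= -c * lam * norm:                    # Step 4, (9)
            return g, w
        D.append(discrete_gradient(f, x, g, e, z, lam, alpha))    # Step 5
    raise RuntimeError("no termination within max_k discrete gradients")


l1 = lambda p: abs(p[0]) + abs(p[1])
print(descent_direction(l1, [1.0, 1.0], [1.0, 0.0], [1, 1], 0.01, 0.1, 0.5, 0.2, 1e-3)[0])
print(descent_direction(l1, [0.0, 0.0], [1.0, 0.0], [1, 1], 0.01, 0.1, 0.5, 0.2, 1e-3)[0])
```

At (1, 1) the first direction tried already satisfies (9). At the minimizer (0, 0), the two discrete gradients (1, 1) and (−1, −1) span a hull containing the origin, so the run stops with condition (8).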


The Discrete Gradient Method 


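In this section we describe the discrete gradient method. Let the following be given:

- sequences $\delta_k > 0$, $z_k \in P$, $\lambda_k > 0$ with $\delta_k \to +0$, $z_k \to +0$, $\lambda_k \to +0$ as $k \to +\infty$;
- a sufficiently small number $\alpha > 0$;
- numbers $c_1 \in (0, 1)$ and $c_2 \in (0, c_1]$.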


Algorithm 2 Discrete gradient method

Step 1. Choose any starting point $x^0 \in \mathbb{R}^n$ and set k = 0.

Step 2. Set s = 0 and $x^k_s = x^k$.

Step 3. Apply Algorithm 1 to compute the descent direction at $x = x^k_s$, with $\delta = \delta_k$, $z = z_k$, $\lambda = \lambda_k$, $c = c_1$. This algorithm terminates after a finite number of iterations l > 0. As a result we get the set $D_l(x^k_s)$ and an element $v^k_s$ such that

$$\|v^k_s\|^2 = \min \{ \|v\|^2 : v \in D_l(x^k_s) \}.$$

Furthermore, either $\|v^k_s\| \le \delta_k$, or the search direction $g^k_s = -\|v^k_s\|^{-1} v^k_s$ satisfies

$$f(x^k_s + \lambda_k g^k_s) - f(x^k_s) \le -c_1 \lambda_k \|v^k_s\|. \qquad (10)$$

Step 4. If

$$\|v^k_s\| \le \delta_k, \qquad (11)$$

then set $x^{k+1} = x^k_s$, k = k + 1, and go to Step 2. Otherwise go to Step 5.

Step 5. Construct the next iterate $x^k_{s+1} = x^k_s + \sigma_s g^k_s$, where $\sigma_s$ is defined as follows:

$$\sigma_s = \operatorname{argmax} \left\{ \sigma \ge 0 : f(x^k_s + \sigma g^k_s) - f(x^k_s) \le -c_2 \sigma \|v^k_s\| \right\}.$$

Step 6. Set s = s + 1 and go to Step 3.


For the point $x^0 \in \mathbb{R}^n$ we consider the set $M(x^0) = \{ x \in \mathbb{R}^n : f(x) \le f(x^0) \}$.

Proposition 9 Assume the following: the function f is semismooth and quasidifferentiable; its subdifferential and superdifferential are polytopes at any $x \in \mathbb{R}^n$; Assumption 1 is satisfied; and the set $M(x^0)$ is bounded for starting points $x^0 \in \mathbb{R}^n$. Then every accumulation point of $\{x^k\}$ belongs to the set $X^0 = \{ x \in \mathbb{R}^n : 0 \in \partial f(x) \}$.


Remark 7 Algorithm 1 can compute descent directions for any value of $\lambda > 0$. We therefore take $\lambda_0 \in (0, 1)$ and some $\beta \in (0, 1)$, and update $\lambda_k$, $k \ge 1$, as follows:

$$\lambda_k = \beta \lambda_{k-1}, \qquad k \ge 1.$$

Thus the discrete gradient method uses approximations to subgradients only at the final stage, and this is what guarantees convergence. Most iterations use no explicit approximation of subgradients, which is why it is a derivative-free method.


Remark 8 It follows from (10) and $c_2 \le c_1$ that always $\sigma_s \ge \lambda_k$. Therefore $\lambda_k > 0$ is a lower bound for $\sigma_s$. This leads to the following rule for the computation of $\sigma_s$. We define the sequence

$$\theta_m = m \lambda_k, \qquad m \ge 1,$$

and define $\sigma_s$ as the largest $\theta_m$ satisfying the inequality in Step 5.
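The step-size rule of Remark 8 can be sketched as a simple scan over the multiples $\theta_m = m\lambda_k$. The Python sketch below is illustrative and makes two assumptions of its own. First, $\theta_1 = \lambda_k$ is taken as accepted, which Remark 8 justifies through (10) and $c_2 \le c_1$. Second, as a practical simplification, the scan stops at the first m that fails the test and does not search for a larger acceptable multiple beyond it. The name `step_size` and the cap `m_max` are hypothetical.

```python
def step_size(f, x, g, lam, v_norm, c2, m_max=10_000):
    """sigma = largest theta_m = m*lam passing the Step 5 test (scan stops at first failure)."""
    fx = f(x)

    def accepted(theta):
        trial = [xi + theta * gi for xi, gi in zip(x, g)]
        return f(trial) - fx <= -c2 * theta * v_norm

    sigma, m = lam, 2          # theta_1 = lam is assumed acceptable (Remark 8)
    while m <= m_max and accepted(m * lam):
        sigma = m * lam
        m += 1
    return sigma


# f(x) = |x - 1| along g = +1 from x = 0: the test holds while theta <= 4/3.
print(step_size(lambda p: abs(p[0] - 1.0), [0.0], [1.0], 0.1, 1.0, 0.5))  # about 1.3
```

In the one-dimensional example the Step 5 inequality reads $|\theta - 1| - 1 \le -0.5\theta$, which holds exactly for $\theta \le 4/3$. The largest multiple of 0.1 in that range is 1.3.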


Applications 


In many applications, the objective and/or constraint functions are not regular. We will consider one of them, the cluster analysis problem, which is an important application area in data mining.

Clustering, also known as the unsupervised classification of patterns, deals with organizing a collection of patterns into clusters based on similarity. Clustering has many applications, for example in information retrieval and medicine.


In cluster analysis we assume that we are given a finite set C of points in the n-dimensional space $\mathbb{R}^n$, that is,

$$C = \{c^1, \ldots, c^m\}, \qquad \text{where } c^i \in \mathbb{R}^n,\ i = 1, \ldots, m.$$

We consider here partition clustering. That is, the points of the set C are distributed into a given number q of disjoint subsets $C^i$, $i = 1, \ldots, q$, with respect to predefined criteria such that:

(1) $C^i \ne \emptyset$, $i = 1, \ldots, q$;
(2) $C^i \cap C^j = \emptyset$, $i \ne j$, $i, j = 1, \ldots, q$;
(3) $C = \bigcup_{i=1}^{q} C^i$.

The sets $C^i$, $i = 1, \ldots, q$, are called clusters. The strict application of these rules is called hard clustering, in contrast to fuzzy clustering, where the clusters are allowed to overlap. We assume that no constraints are imposed on the clusters $C^i$, $i = 1, \ldots, q$; that is, we consider the hard unconstrained clustering problem.

We also assume that each cluster $C^i$, $i = 1, \dots, q$, can be identified by its center (or centroid). There are different formulations of clustering as an optimization problem. In [5,6,7] the cluster analysis problem is reduced to the following nonsmooth optimization problem:
$$\text{minimize } f(x^1, \dots, x^q) \quad \text{subject to } (x^1, \dots, x^q) \in \mathbb{R}^{n \times q}, \tag{12}$$
where
$$f(x^1, \dots, x^q) = \frac{1}{m} \sum_{i=1}^{m} \min_{s=1,\dots,q} \|x^s - c^i\|^2. \tag{13}$$
Here $\|\cdot\|$ is the Euclidean norm and $x^s \in \mathbb{R}^n$ stands for the $s$th cluster center. If $q > 1$, then the objective function (13) in problem (12) is nonconvex and nonsmooth. Moreover, $f$ is a nonregular function, and computing even one subgradient of this function is quite a difficult task. This function can be represented as the difference of two convex functions as follows:


$$f(x) = f_1(x) - f_2(x),$$
where
$$f_1(x) = \frac{1}{m} \sum_{i=1}^{m} \sum_{s=1}^{q} \|x^s - c^i\|^2, \qquad f_2(x) = \frac{1}{m} \sum_{i=1}^{m} \max_{s=1,\dots,q} \sum_{k=1, k \neq s}^{q} \|x^k - c^i\|^2.$$


It is clear that the function $f$ is quasidifferentiable and that its subdifferential and superdifferential are polytopes at any point.

Thus, the discrete gradient method can be applied to solve the clustering problem.
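To make (13) and its difference-of-convex representation concrete, here is a minimal NumPy sketch that evaluates $f$, $f_1$ and $f_2$ for given centers and data points. The function names and the array layout (centers as a $q \times n$ array, points as an $m \times n$ array) are illustrative assumptions; the sketch only confirms numerically that $f = f_1 - f_2$, it does not implement the discrete gradient method itself.

```python
import numpy as np


def squared_distances(centers, points):
    """Matrix D with D[i, s] = ||x^s - c^i||^2.

    centers: (q, n) array holding x^1, ..., x^q
    points:  (m, n) array holding c^1, ..., c^m
    """
    diff = points[:, None, :] - centers[None, :, :]   # shape (m, q, n)
    return np.sum(diff ** 2, axis=2)                  # shape (m, q)


def cluster_objective(centers, points):
    """Objective (13): mean over points of the squared distance to the nearest center."""
    return squared_distances(centers, points).min(axis=1).mean()


def dc_components(centers, points):
    """Convex parts f1, f2 of the decomposition f = f1 - f2."""
    d2 = squared_distances(centers, points)
    row_total = d2.sum(axis=1)                        # sum_s ||x^s - c^i||^2
    f1 = row_total.mean()
    # sum over k != s equals row_total - d2[:, s]; maximize over s
    f2 = (row_total[:, None] - d2).max(axis=1).mean()
    return f1, f2


rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))    # m = 200 points in R^2
centers = rng.normal(size=(3, 2))     # q = 3 candidate centers
f1, f2 = dc_components(centers, points)
print(cluster_objective(centers, points), f1 - f2)   # agree up to rounding
```

Because $\min_s a_s = \sum_s a_s - \max_s \sum_{k \neq s} a_k$ for any numbers $a_1, \dots, a_q$, the identity holds pointwise for every data point, which is what the check reflects.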


Conclusions 


We have discussed a derivative-free discrete gradient 
method for solving unconstrained nonsmooth opti- 
mization problems. This algorithm can be applied to 
a broad class of optimization problems including prob- 
lems with nonregular objective functions. It is globally 
convergent toward stationary points of semismooth, 
quasidifferentiable functions whose subdifferential and 
superdifferential are polytopes. 
Introduction


The optimal design of stochastic systems like queueing 
or inventory systems is a specific stochastic optimiza- 
tion problem. 

Let $Y_x(t)$ be an ergodic Markov process with discrete time $t = 1, 2, \dots$ and values in $\mathbb{R}^m$, depending on a control parameter $x \in \mathbb{R}^d$. Let $H(x, \cdot)$ be some cost function. The problem is to find the control $x$ which minimizes the expected costs of the system either under the transient or under the stationary regime:

1) Under the transient regime, the process is started at time 0 in a specific starting state $y_0$ and observed until time $T$. The optimality problem reads
$$\min F(x) = \sum_{t=1}^{T} E[H(x, Y_x(t))] \quad \text{s.t. } x \in S \subseteq \mathbb{R}^d. \tag{1}$$


2) Under the stationary regime it is assumed that $Y_x(\infty)$ is distributed according to the stationary distribution of the process, which is (by ergodicity) the asymptotic distribution of $Y_x(t)$ as $t$ tends to infinity. The optimality problem reads
$$\min F(x) = E[H(x, Y_x(\infty))] \quad \text{s.t. } x \in S \subseteq \mathbb{R}^d. \tag{2}$$


As an example, consider the problem of optimally determining the decision limits $(x_1, x_2)$ in an inventory policy (if the inventory at hand has fallen below $x_1$, order the amount needed to bring the inventory up to $x_2$). Assuming that the sales are of random size, the inventory system can be modeled as a Markov process depending on the control parameters $x_1$ and $x_2$.

The solution method for such an optimization problem is a version of the stochastic quasigradient method (cf. also Stochastic quasigradient methods). The solution is improved stepwise by moving it in the direction of an estimate of the negative gradient of the objective function.

The basic problem is therefore to find good estimates for the gradient $\nabla_x F(x)$. This problem is a generalization of the problem of finding derivatives of probability measures (cf. Derivatives of probability measures). The general notions of distributional derivatives (direct differentiability) and process derivatives (inverse differentiability) are applicable here.


Process Derivatives 


Suppose that the process $Y_x(\cdot)$ has a representation of the form
$$Y_x(t+1) = K_t(x, Y_x(t), \xi_t),$$
where $(\xi_t)$ is a sequence of random variables whose distribution does not depend on $x$. If the derivatives $\nabla_x K_t(x, y, \xi)$, $\nabla_y K_t(x, y, \xi)$ and $\nabla_y H(y)$ exist, we get by elementary calculus
$$\nabla_x H(Y_x(t)) = \nabla_y H(Y_x(t)) \cdot \sum_{i=0}^{t-1} \Big( \prod_{j=i+1}^{t-1} \nabla_y K_j(x, Y_x(j), \xi_j) \Big) \nabla_x K_i(x, Y_x(i), \xi_i), \tag{3}$$
where the order of multiplication in the product is here and in the following from left to right by descending index. Formula (3) may be computed recursively, as follows.

Define the $[m \times d]$ (random) matrices $N_t$ by
$$N_t = \sum_{i=0}^{t-1} \Big( \prod_{j=i+1}^{t-1} \nabla_y K_j(x, Y_x(j), \xi_j) \Big) \nabla_x K_i(x, Y_x(i), \xi_i).$$
This sequence follows the forward recursion
$$N_0 = 0, \qquad N_{t+1} = \nabla_x K_t(x, Y_x(t), \xi_t) + \nabla_y K_t(x, Y_x(t), \xi_t) \cdot N_t.$$
After having found $N_t$ by this recursion, one may calculate
$$\nabla_x H(Y_x(t)) = \nabla_y H(Y_x(t)) \cdot N_t.$$
This pointwise calculation carries over to the expectation under the standard assumptions of dominated convergence, yielding
$$\nabla_x E[H(Y_x(t))] = E[\nabla_y H(Y_x(t)) \cdot N_t].$$
Now, the estimate for the problem in transient regime is
$$\widehat{\nabla_x F}(x) = \sum_{t=1}^{T} \nabla_y H(Y_x(t)) \cdot N_t,$$
whereas for the stationary regime one uses
$$\widehat{\nabla_x F}(x) = \frac{1}{T - \tau} \sum_{t=\tau+1}^{T} \nabla_y H(Y_x(t)) \cdot N_t,$$
where $T$ is large and $\tau$ stands for the warm-up phase of the process, which is skipped for the estimation. Of course, the latter estimate is biased; its bias decreases with increasing $T$ and $\tau$.
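The forward recursion for $N_t$ can be sketched for a scalar toy model that is an assumption of this illustration, not an example from the text: $Y(t+1) = K(x, Y(t), \xi_t) = x\,Y(t) + \xi_t$ with standard normal $\xi_t$ and cost $H(y) = y^2$, so that $\nabla_x K = Y(t)$ and $\nabla_y K = x$. The sketch returns the per-step terms $\nabla_y H(Y(t)) \cdot N_t$, from which both the transient sum and the stationary average (with a warm-up of $\tau$ steps discarded) are formed; the transient value is compared with a central finite difference computed on the same noise sequence.

```python
import random


def pathwise_terms(x, y0, xi, dH):
    """Per-step terms dH(Y(t)) * N_t for the toy model Y(t+1) = x*Y(t) + xi_t.

    Uses the forward recursion N_0 = 0, N_{t+1} = dK/dx + dK/dy * N_t.
    Returns the list of terms for t = 1, ..., T (T = len(xi)).
    """
    y, n, terms = y0, 0.0, []
    for xi_t in xi:
        dK_dx = y            # derivative of x*y + xi w.r.t. x, at the current state
        dK_dy = x            # derivative of x*y + xi w.r.t. y
        n = dK_dx + dK_dy * n
        y = x * y + xi_t     # advance the process to Y(t+1)
        terms.append(dH(y) * n)
    return terms


def total_cost(x, y0, xi, H):
    """Transient objective sum_{t=1}^T H(Y(t)) along one sample path."""
    y, total = y0, 0.0
    for xi_t in xi:
        y = x * y + xi_t
        total += H(y)
    return total


rng = random.Random(0)
xi = [rng.gauss(0.0, 1.0) for _ in range(500)]
x, y0, tau = 0.7, 1.0, 100
terms = pathwise_terms(x, y0, xi, dH=lambda y: 2.0 * y)
transient = sum(terms)                              # transient-regime estimate
stationary = sum(terms[tau:]) / (len(xi) - tau)     # warm-up of tau steps discarded
h = 1e-6
fd = (total_cost(x + h, y0, xi, lambda y: y * y)
      - total_cost(x - h, y0, xi, lambda y: y * y)) / (2 * h)
print(transient, fd)   # agree up to finite-difference error
```

Holding the noise sequence fixed while perturbing $x$ is exactly the situation in which the pathwise derivative is the derivative of the simulated cost, which is why the two numbers coincide up to rounding.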


Distributional Derivatives 


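Suppose that the Markov transition has transition density $p_x(y_1 \mid y_0)$, i.e.
$$P(Y_x(t+1) \in A \mid Y_x(t) = y_0) = \int_A p_x(y_1 \mid y_0)\, dy_1,$$
and that the process starts in state $y_0$. The expectation of $H(Y_x(t))$ is
$$E[H(Y_x(t))] = \int \cdots \int H(y_t) \prod_{i=1}^{t} p_x(y_i \mid y_{i-1})\, dy_1 \cdots dy_t.$$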
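Introduce the score function
$$s_x(y_0, \dots, y_t) = \sum_{i=1}^{t} \frac{\nabla_x p_x(y_i \mid y_{i-1})}{p_x(y_i \mid y_{i-1})}.$$
By the product rule we get the formula
$$\nabla_x E[H(Y_x(t))] = \int \cdots \int H(y_t)\, s_x(y_0, \dots, y_t) \prod_{i=1}^{t} p_x(y_i \mid y_{i-1})\, dy_1 \cdots dy_t.$$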
An estimate for $\nabla_x E[H(Y_x(t))]$ is
$$H(Y_x(t)) \cdot W_x(t),$$
where $W_x(t) = s_x(Y_x(0), \dots, Y_x(t))$ is called the score function martingale. As before, the estimate for the problem in transient regime is
$$\widehat{\nabla_x F}(x) = \sum_{t=1}^{T} H(Y_x(t)) \cdot W_x(t),$$
whereas the estimate for the stationary regime is
$$\widehat{\nabla_x F}(x) = \frac{1}{T - \tau} \sum_{t=\tau+1}^{T} H(Y_x(t)) \cdot W_x(t).$$
It is asymptotically unbiased for $T, \tau \to \infty$ (see [2]).
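A minimal sketch of the score function estimate, for a toy model assumed only for illustration: the Gaussian autoregression $Y(t+1) = x\,Y(t) + \xi_t$ with standard normal $\xi_t$, whose transition density $p_x(y_1 \mid y_0)$ is the $N(x y_0, 1)$ density, so that $\nabla_x \log p_x(y_1 \mid y_0) = (y_1 - x y_0)\, y_0$. The martingale $W_x(t)$ is accumulated along the path; as a sanity check, its final value equals the $x$-derivative of the path log-likelihood.

```python
import math
import random


def simulate(x, y0, T, rng):
    """One path Y(0), ..., Y(T) of Y(t+1) = x*Y(t) + xi_t, xi_t ~ N(0, 1)."""
    ys = [y0]
    for _ in range(T):
        ys.append(x * ys[-1] + rng.gauss(0.0, 1.0))
    return ys


def score_martingale(x, ys):
    """W_x(t) = sum_{i<=t} d/dx log p_x(y_i | y_{i-1}) for t = 0, ..., T."""
    w, out = 0.0, [0.0]
    for y_prev, y_next in zip(ys, ys[1:]):
        w += (y_next - x * y_prev) * y_prev   # score of the N(x*y_prev, 1) density
        out.append(w)
    return out


def log_likelihood(x, ys):
    """Log of prod_i p_x(y_i | y_{i-1}) for the Gaussian transition density."""
    return sum(-0.5 * (b - x * a) ** 2 - 0.5 * math.log(2 * math.pi)
               for a, b in zip(ys, ys[1:]))


def score_estimates(x, ys, H, tau):
    """Transient and stationary score function estimates from one path."""
    W = score_martingale(x, ys)
    T = len(ys) - 1
    terms = [H(ys[t]) * W[t] for t in range(1, T + 1)]
    return sum(terms), sum(terms[tau:]) / (T - tau)


rng = random.Random(1)
x = 0.5
ys = simulate(x, 0.0, 200, rng)
transient, stationary = score_estimates(x, ys, lambda y: y * y, tau=50)
```

Unlike the pathwise recursion, this estimate never differentiates $H$, so it also applies to indicator-type costs; the price is typically a larger variance, which grows with the length of the path.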
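There is also a way of attacking the derivative of the stationary distribution directly: Let $P_x$ represent the transition matrix (transition operator) of the Markov process. The stationary distribution $\pi_x$ satisfies
$$\pi_x = \pi_x P_x$$
and therefore
$$\nabla_x \pi_x = [\nabla_x \pi_x] P_x + \pi_x [\nabla_x P_x],$$
i.e.
$$\nabla_x \pi_x = \pi_x [\nabla_x P_x] S_x, \tag{4}$$
with
$$S_x = \sum_{k=0}^{\infty} \big(P_x^k - \mathbf{1}\pi_x\big).$$
Here $\mathbf{1}\pi_x$ is the transition with all rows identical to $\pi_x$. The operator $S_x$ solves the Poisson equation
$$S_x (I - P_x) = (I - P_x) S_x = I - \mathbf{1}\pi_x,$$
where $I$ is the identity operator; multiplying $[\nabla_x \pi_x](I - P_x) = \pi_x [\nabla_x P_x]$ from the right by $S_x$ and using $[\nabla_x \pi_x]\mathbf{1} = 0$ yields (4). There is a method that uses equation (4) as the basis for estimating $\nabla_x E[H(Y_x(\infty))]$, see [1, Chapt. 3].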
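For a finite state space, formula (4) can be evaluated directly. The sketch below assumes a hypothetical two-state chain $P_x = \begin{pmatrix} 1-x & x \\ 0.3 & 0.7 \end{pmatrix}$, which is not from the text, and sums the series for $S_x$ in closed form via the standard identity $\sum_{k \ge 0} (P^k - \mathbf{1}\pi) = (I - P + \mathbf{1}\pi)^{-1} - \mathbf{1}\pi$, valid for ergodic aperiodic chains. For this chain $\pi_x = (0.3, x)/(0.3 + x)$, so the exact derivative at $x = 0.2$ is $(-1.2, 1.2)$.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 for an ergodic transition matrix P."""
    n = P.shape[0]
    A = (np.eye(n) - P).T
    A[-1, :] = 1.0                  # replace one balance equation by normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def stationary_gradient(P, dP):
    """Formula (4): d pi / dx = pi [dP/dx] S with S = sum_k (P^k - 1 pi)."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    Pi = np.outer(np.ones(n), pi)                  # all rows equal to pi
    S = np.linalg.inv(np.eye(n) - P + Pi) - Pi     # series summed in closed form
    return pi @ dP @ S


x = 0.2
P = np.array([[1 - x, x], [0.3, 0.7]])
dP = np.array([[-1.0, 1.0], [0.0, 0.0]])          # elementwise dP/dx
print(stationary_gradient(P, dP))                  # approximately [-1.2, 1.2]
```

In simulation-based settings $P_x$ is not available as a matrix, which is why estimation methods built on (4) are needed; the closed form above only serves to illustrate the identity.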


Regenerative Processes 


Recall that a set $A$ is a regenerative set of the ergodic Markov transition $P$ if

i) $u \mapsto P(u, B)$ is independent of $u \in A$, for all $B$; and

ii) $\pi(A) > 0$, where $\pi$ is the unique stationary probability measure pertaining to $P$.

Suppose that $A$ is a regenerative set for all transitions $P_x$. The sequence of regenerative stopping times of $Y_x(t)$ is
$$T^{(1)} = \min\{t \ge 1 : Y_x(t) \in A\}, \qquad T^{(k+1)} = \min\{t > T^{(k)} : Y_x(t) \in A\}.$$
These stopping times cut the process into independent pieces. For a process $Y_x$ started in $A$, the following fundamental equation relates the finite time behavior to the stationary, i.e. long run, behavior:


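$$E[H(Y_x(\infty))] = \frac{E\big[\sum_{t=1}^{T^{(1)}} H(Y_x(t))\big]}{E[T^{(1)}]}. \tag{5}$$
The score method for derivative estimation gives
$$\nabla_x E\Big[\sum_{t=1}^{T^{(1)}} H(Y_x(t))\Big] = E\Big[\sum_{t=1}^{T^{(1)}} H(Y_x(t))\, W_x(t)\Big]$$
and
$$\nabla_x E[T^{(1)}] = E\Big[\sum_{t=1}^{T^{(1)}} W_x(t)\Big]$$
and, by the quotient rule,
$$\nabla_x E[H(Y_x(\infty))] = \frac{E[T^{(1)}]\, \nabla_x E\big[\sum_{t=1}^{T^{(1)}} H(Y_x(t))\big]}{(E[T^{(1)}])^2} - \frac{E\big[\sum_{t=1}^{T^{(1)}} H(Y_x(t))\big]\, \nabla_x E[T^{(1)}]}{(E[T^{(1)}])^2}$$
(see [2]). For the estimation of $\nabla_x E[H(Y_x(\infty))]$, all expectations on the right-hand side have to be replaced by estimates.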
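The regenerative ratio estimator and its score-based quotient-rule derivative can be sketched on a toy example chosen only for illustration: the two-state chain with rows $(1-x, x)$ and $(0.3, 0.7)$, regenerative set $A = \{0\}$, and $H(y) = y$, so that $E[H(Y_x(\infty))] = x/(0.3+x)$ and its derivative is $0.3/(0.3+x)^2$. Each cycle starts in $A$ and the score restarts at zero in every cycle, since the cycles are independent; all four expectations in the quotient rule are replaced by sample means over cycles.

```python
import random


def regenerative_gradient(x, n_cycles, seed=0):
    """Quotient-rule score estimate of d/dx E[H(Y_x(inf))] from independent cycles."""
    rng = random.Random(seed)
    s_T = s_H = s_HW = s_W = 0.0
    for _ in range(n_cycles):
        y, w, t = 0, 0.0, 0                    # each cycle starts in A = {0}
        while True:
            if y == 0:                         # transition row (1 - x, x) depends on x
                nxt = 1 if rng.random() < x else 0
                w += 1.0 / x if nxt == 1 else -1.0 / (1.0 - x)   # d/dx log P_x(0, nxt)
            else:                              # row (0.3, 0.7) does not depend on x
                nxt = 0 if rng.random() < 0.3 else 1
            y, t = nxt, t + 1
            s_H += y                           # H(Y(t)) with H(y) = y
            s_HW += y * w                      # H(Y(t)) * W(t)
            s_W += w                           # W(t)
            if y == 0:                         # back in A: cycle length T = t
                s_T += t
                break
    m = float(n_cycles)
    ET, EH, EHW, EW = s_T / m, s_H / m, s_HW / m, s_W / m
    # quotient rule applied to E[sum H] / E[T], numerator derivatives via the score
    return (ET * EHW - EH * EW) / ET ** 2, EH / ET


x = 0.2
grad, value = regenerative_gradient(x, 200_000)
print(value, x / (0.3 + x))            # ratio estimate vs exact stationary mean 0.4
print(grad, 0.3 / (0.3 + x) ** 2)      # score estimate vs exact derivative 1.2
```

The ratio of means is biased for a finite number of cycles but consistent as the number of cycles grows; the same holds for the derivative estimate built from it.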


See also 


> Derivatives of Probability and Integral Functions: 
General Theory and Examples 

> Derivatives of Probability Measures 

> Discrete Stochastic Optimization 

> Optimization in Operation of Electric and Energy 
Power Systems 
Probability functions are commonly used for the analysis of models with uncertainties or variabilities in parameters. For instance, in risk and reliability analysis, performance functions characterizing the operation of systems are formulated as probabilities of successful or unsuccessful accomplishment of their missions (core damage probability of a nuclear power plant, probability of successful landing of an aircraft, probability of profitable transactions in a stock market, or percentiles of the risks in public risk assessments). Sensitivity analysis of such performance functions involves evaluating their derivatives with respect to the parameters. Also, the derivatives of the probability function can be used to solve stochastic optimization problems [1].

A probability function can be formally presented as an expectation of a discontinuous indicator function of a set, or as an integral over a domain, depending upon parameters. However, differentiability conditions of the probability function do not follow from similar conditions for the expectations of continuous (smooth or convex) functions.
The derivative of the probability function has many
equivalent representations. It can be represented as an 
integral over the surface, an integral over the volume, or 
a sum of integrals over the volume and over the surface. 
Also, it can be calculated using weak derivatives of the 
probability measures or conditional expectations. 

The first general result on the differentiability of 
the probability function was obtained by E. Raik [8]. 
He represented the gradient of the probability function 
with one constraint in the form of the surface integral. 
S. Uryasev [10] extended Raik’s formula for probability 
functions with many constraints. A.J. Kibzun and G.L. 
Tretyakov [3] extended it to the piecewise smooth con- 
straint and probability density function. Special cases of 
probability function with normal and gamma distribu- 
tions were investigated by A. Prékopa [6]. G.Ch. Pflug 
[5] represented the gradient of probability function in 
the form of an expectation using weak probability mea- 
sures. 

Uryasev [9] expressed the gradient of the probabil- 
ity function as a volume integral. Also, using a change 
of variables, K. Marti [4] derived the probability func- 
tion gradient in the form of the volume integral. 

A general analytical formula for the derivative of 
probability functions with many constraints was ob- 
tained by Uryasev [10]; it calculates the gradient as an 
integral over the surface, an integral over the volume, 
or the sum of integrals over the surface and the volume. 
Special cases of this formula correspond to the Raik formula [8], the Uryasev formula [9], and the change-of-variables approach [4].

The gradient of the quantile function was obtained 
in [2]. 


Notations and Definitions 


Let an integral over the volume
$$F(x) = \int_{f(x,y) \le 0} p(x,y)\, dy \tag{1}$$
be defined on the Euclidean space $\mathbb{R}^n$, where $f: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^k$ and $p: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ are some functions. The inequality $f(x,y) \le 0$ in the integral is a system of inequalities
$$f_i(x,y) \le 0, \quad i = 1, \dots, k.$$
Both the kernel function $p(x,y)$ and the function $f(x,y)$ defining the integration set depend upon the parameter $x$. For example, let
$$F(x) = P\{f(x, \zeta(\omega)) \le 0\} \tag{2}$$
be a probability function, where $\zeta(\omega)$ is a random vector in $\mathbb{R}^m$. The random vector $\zeta(\omega)$ is assumed to have a probability density $p(x,y)$ that depends on a parameter $x \in \mathbb{R}^n$. The probability function can be represented as an expectation of an indicator function, which equals one on the integration set and zero outside of it. For example, let
$$F(x) = E\big[I_{\{f(x,\zeta) \le 0\}}\, g(x,\zeta)\big] = \int_{f(x,y) \le 0} g(x,y)\, p(x,y)\, dy = \int_{f(x,y) \le 0} \tilde p(x,y)\, dy, \tag{3}$$
where $I$ is an indicator function, $\tilde p(x,y) = g(x,y)\, p(x,y)$, and the random vector $\zeta$ in $\mathbb{R}^m$ has a probability density $p(x,y)$ that depends on a parameter $x \in \mathbb{R}^n$.


Integral Over the Surface Formula 


The following formula calculates the gradient of an integral (1) over the set given by nonlinear inequalities as the sum of an integral over the volume plus an integral over the surface of the integration set. We call this the integral over the surface formula because, if the density $p(x,y)$ does not depend upon $x$, the gradient of the integral (1) equals an integral over the surface. This formula for the case of one inequality was obtained by Raik [8] and generalized for the case with many inequalities by Uryasev [10].
Let us denote by $\mu(x)$ the integration set
$$\mu(x) = \{y \in \mathbb{R}^m : f(x,y) \le 0\} = \{y \in \mathbb{R}^m : f_l(x,y) \le 0,\ 1 \le l \le k\}$$
and by $\partial\mu(x)$ the surface of this set $\mu(x)$. Also, let us denote by $\partial_i\mu(x)$ the part of the surface which corresponds to the function $f_i(x,y)$, i.e.,
$$\partial_i\mu(x) = \mu(x) \cap \{y \in \mathbb{R}^m : f_i(x,y) = 0\}.$$


If the constraint functions are differentiable and the following integral exists, then the gradient of integral (1) equals
$$\nabla_x F(x) = \int_{\mu(x)} \nabla_x p(x,y)\, dy - \sum_{i=1}^{k} \int_{\partial_i\mu(x)} \frac{p(x,y)}{\|\nabla_y f_i(x,y)\|}\, \nabla_x f_i(x,y)\, dS. \tag{4}$$
A potential disadvantage of this formula is that in the multidimensional case it is difficult to calculate the integral over the nonlinear surface. Most well-known numerical techniques, such as Monte-Carlo algorithms, are applicable to volume integrals. Nevertheless, this formula can be quite useful in various special cases, such as the linear case.


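Example 1 (Linear case: Integral over the surface formula [10].)

Let $A(\omega)$ be a random $l \times n$ matrix with the joint density $p(A)$. Suppose that $x \in \mathbb{R}^n$ and $x_j \neq 0$, $j = 1, \dots, n$. Let us define
$$F(x) = P\{A(\omega)x \le b,\ A(\omega) \ge 0\}, \qquad b = (b_1, \dots, b_l) \in \mathbb{R}^l,\ x \in \mathbb{R}^n, \tag{5}$$
i.e. $F(x)$ is the probability that the linear constraints $A(\omega)x \le b$, $A(\omega) \ge 0$ are satisfied. The constraint $A(\omega) \ge 0$ means that all elements $a_{ij}(\omega)$ of the matrix $A(\omega)$ are nonnegative. Let us denote by $A_i$ and $A^j$ the $i$th row and $j$th column of the matrix $A$,
$$A = \begin{pmatrix} A_1 \\ \vdots \\ A_l \end{pmatrix} = (A^1, \dots, A^n);$$
then
$$f(x,A) = \begin{pmatrix} f_1(x,A) \\ \vdots \\ f_k(x,A) \end{pmatrix} = \begin{pmatrix} A_1 x - b_1 \\ \vdots \\ A_l x - b_l \\ -A^1 \\ \vdots \\ -A^n \end{pmatrix}, \qquad k = l + l \times n.$$
The function $F(x)$ equals
$$F(x) = \int_{f(x,A) \le 0} p(A)\, dA. \tag{6}$$
We use formula (4) to calculate the gradient $\nabla_x F(x)$ as an integral over the surface. The function $p(A)$ does not depend upon $x$ and $\nabla_x p(A) = 0$. Formula (4) implies that $\nabla_x F(x)$ equals
$$-\sum_{i=1}^{k} \int_{\partial_i\mu(x)} \frac{p(A)}{\|\nabla_A f_i(x,A)\|}\, \nabla_x f_i(x,A)\, dS.$$
Since $\nabla_x f_i(x,A) = 0$ for $i = l+1, \dots, k$, $\nabla_x F(x)$ equals
$$-\sum_{i=1}^{l} \int_{\partial_i\mu(x)} \frac{p(A)}{\|\nabla_A f_i(x,A)\|}\, \nabla_x f_i(x,A)\, dS = -\sum_{i=1}^{l} \int_{\partial_i\mu(x)} \frac{p(A)}{\|x\|}\, A_i^{T}\, dS = -\sum_{i=1}^{l} \frac{1}{\|x\|} \int_{A \in \mu(x),\ A_i x = b_i} p(A)\, A_i^{T}\, dS.$$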
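A small numerical check of the surface formula is possible in the special case $l = 1$, $n = 2$ with independent Exp(1) entries. This special case is an assumption of the illustration, chosen because $F$ then has the closed form $F(x) = 1 - (x_1 e^{-b/x_1} - x_2 e^{-b/x_2})/(x_1 - x_2)$ for distinct positive $x_1, x_2$. The boundary piece $\{a \ge 0,\ a_1 x_1 + a_2 x_2 = b\}$ is a segment; parametrizing it by $a_1$ gives $dS = (\|x\|/x_2)\, da_1$, so the factor $\|x\|$ cancels and Simpson's rule evaluates the remaining one-dimensional integral.

```python
import math


def F_closed(x1, x2, b):
    """P{a1*x1 + a2*x2 <= b} for independent a1, a2 ~ Exp(1); x1 != x2, both > 0."""
    return 1.0 - (x1 * math.exp(-b / x1) - x2 * math.exp(-b / x2)) / (x1 - x2)


def grad_surface(x1, x2, b, n=2000):
    """-(1/||x||) * integral over {a >= 0, a.x = b} of p(a) * a^T dS (Simpson, n even)."""
    h = (b / x1) / n
    g1 = g2 = 0.0
    for j in range(n + 1):
        a1 = j * h
        a2 = max((b - a1 * x1) / x2, 0.0)     # point on the segment a.x = b
        w = 1 if j in (0, n) else (4 if j % 2 else 2)
        dens = math.exp(-a1 - a2)             # joint density p(a)
        g1 += w * dens * a1
        g2 += w * dens * a2
    scale = -(h / 3.0) / x2                   # dS = (||x|| / x2) da1, ||x|| cancels
    return scale * g1, scale * g2


x1, x2, b = 1.0, 2.0, 1.5
d = 1e-6
fd1 = (F_closed(x1 + d, x2, b) - F_closed(x1 - d, x2, b)) / (2 * d)
fd2 = (F_closed(x1, x2 + d, b) - F_closed(x1, x2 - d, b)) / (2 * d)
print(grad_surface(x1, x2, b), (fd1, fd2))   # surface formula vs finite differences
```

In higher dimensions the boundary pieces are no longer segments, which is exactly the difficulty with surface integrals noted above.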


Integral Over the Volume Formula 


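This section presents the gradient of the function (1) in the form of a volume integral. Let us introduce the following shorthand notations
$$f_{1l}(x,y) = \begin{pmatrix} f_1(x,y) \\ \vdots \\ f_l(x,y) \end{pmatrix}, \qquad f(x,y) = f_{1k}(x,y),$$
$$\nabla_y f(x,y) = \begin{pmatrix} \frac{\partial f_1(x,y)}{\partial y_1} & \cdots & \frac{\partial f_k(x,y)}{\partial y_1} \\ \vdots & & \vdots \\ \frac{\partial f_1(x,y)}{\partial y_m} & \cdots & \frac{\partial f_k(x,y)}{\partial y_m} \end{pmatrix}.$$
The divergence of the $n \times m$ matrix $H$ consisting of the elements $h_{ji}$ is denoted by
$$\operatorname{div}_y H = \Big( \sum_{i=1}^{m} \frac{\partial h_{1i}}{\partial y_i}, \dots, \sum_{i=1}^{m} \frac{\partial h_{ni}}{\partial y_i} \Big)^{T}.$$
Following [10], the derivative of the function (1) is represented as an integral over the volume
$$\nabla_x F(x) = \int_{\mu(x)} \nabla_x p(x,y)\, dy + \int_{\mu(x)} \operatorname{div}_y \big(p(x,y) H(x,y)\big)\, dy, \tag{7}$$
where a matrix function $H: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^{n \times m}$ satisfies the equation
$$H(x,y) \nabla_y f(x,y) + \nabla_x f(x,y) = 0. \tag{8}$$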


The last system of equations may have many solutions; therefore formula (7) provides a number of equivalent expressions for the gradient. The following section gives analytical solutions of this system of equations. In some cases, this system does not have any solution, and formula (7) is not valid. The following section deals with such cases and provides a general formula in which the system of equations needs to be solved only for some of the functions defining the integration set.


Example 2 (Linear case: Integral over the volume formula [10].)

With formula (7), the gradient of the probability function (5) with linear constraints considered in Example 1 can be represented as an integral over the volume. It can be shown that equation (8) does not have a solution in this case. Nevertheless, we can slightly modify the constraints such that the integration set is not changed and equation (8) has a solution. In the vector function $f(x,A)$ we multiply column $A^i$ by $x_i$ if $x_i$ is positive, or by $-x_i$ if $x_i$ is negative. Therefore, we have the following constraint function
$$f(x,A) = \begin{pmatrix} A_1 x - b_1 \\ \vdots \\ A_l x - b_l \\ -(\pm)\, x_1 A^1 \\ \vdots \\ -(\pm)\, x_n A^n \end{pmatrix}, \tag{9}$$
where $(\pm)$ means that we take the appropriate sign. It can be directly checked that the matrix $H^*(x,A)$,
$$H^*(x,A) = \big(h_1(x,A_1), \dots, h_l(x,A_l)\big), \qquad h_i(x,A_i) = -\begin{pmatrix} a_{i1}/x_1 & & 0 \\ & \ddots & \\ 0 & & a_{in}/x_n \end{pmatrix},$$
is a solution of system (8). As will be shown in the next section, this analytical solution follows from the fact that the change of variables $Y^i = x_i A^i$, $i = 1, \dots, n$, eliminates the variables $x_i$, $i = 1, \dots, n$, from the constraints (9).

Since $\nabla_x p(A) = 0$ and $\operatorname{div}_A\big(p(A) H^*(x,A)\big)$ equals
$$-\begin{pmatrix} x_1^{-1} \Big( l\, p(A) + \sum_{i=1}^{l} a_{i1} \frac{\partial p(A)}{\partial a_{i1}} \Big) \\ \vdots \\ x_n^{-1} \Big( l\, p(A) + \sum_{i=1}^{l} a_{in} \frac{\partial p(A)}{\partial a_{in}} \Big) \end{pmatrix},$$
formula (7) implies that $\partial F(x)/\partial x_j$ equals
$$-x_j^{-1} \int_{Ax \le b,\ A \ge 0} \Big( l\, p(A) + \sum_{i=1}^{l} a_{ij} \frac{\partial p(A)}{\partial a_{ij}} \Big)\, dA.$$
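For the same assumed special case as in the check of Example 1 ($l = 1$, $n = 2$, independent Exp(1) entries, so $\partial p/\partial a_{1j} = -p$ and the integrand becomes $p(A)(1 - a_{1j})$), the volume formula turns into an expectation that plain Monte Carlo can estimate: $\partial F/\partial x_j = -x_j^{-1} E[\mathbb{1}\{a \cdot x \le b\}(1 - a_j)]$. The sketch compares this estimate with finite differences of the closed-form $F$ and also verifies numerically that $H^*$ solves (8) for the modified constraints (9) with positive $x$. The helper names are illustrative.

```python
import math
import random

import numpy as np


def grad_volume_mc(x, b, n_samples=200_000, seed=0):
    """MC estimate of dF/dx_j = -(1/x_j) E[1{a.x <= b} (1 - a_j)], a_j iid Exp(1)."""
    rng = random.Random(seed)
    acc = [0.0, 0.0]
    for _ in range(n_samples):
        a = (rng.expovariate(1.0), rng.expovariate(1.0))
        if a[0] * x[0] + a[1] * x[1] <= b:          # indicator of the integration set
            for j in range(2):
                acc[j] += 1.0 - a[j]                 # (l*p + a_j dp/da_j) / p with l = 1
    return [-acc[j] / (n_samples * x[j]) for j in range(2)]


def F_closed(x1, x2, b):
    """P{a1*x1 + a2*x2 <= b} for independent Exp(1) entries, x1 != x2, both > 0."""
    return 1.0 - (x1 * math.exp(-b / x1) - x2 * math.exp(-b / x2)) / (x1 - x2)


def check_H_star(x, a):
    """Residual H* grad_A f + grad_x f for constraints (a.x - b, -x1*a1, -x2*a2)."""
    grad_A_f = np.array([[x[0], -x[0], 0.0],
                         [x[1], 0.0, -x[1]]])        # rows: a1, a2; columns: f1, f2, f3
    grad_x_f = np.array([[a[0], -a[0], 0.0],
                         [a[1], 0.0, -a[1]]])        # rows: x1, x2
    H_star = -np.diag([a[0] / x[0], a[1] / x[1]])
    return H_star @ grad_A_f + grad_x_f


x, b = (1.0, 2.0), 1.5
d = 1e-6
fd = [(F_closed(x[0] + d, x[1], b) - F_closed(x[0] - d, x[1], b)) / (2 * d),
      (F_closed(x[0], x[1] + d, b) - F_closed(x[0], x[1] - d, b)) / (2 * d)]
print(grad_volume_mc(x, b), fd)                      # MC estimate vs finite difference
print(np.abs(check_H_star(x, (0.3, 0.8))).max())     # zero up to rounding: H* solves (8)
```

Here no surface has to be parametrized at all; only samples of $A$ and an indicator are needed, which is the practical advantage of the volume representation.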


General Formula 


Further, we give a general formula [9,10] for the differentiation of integral (1). A gradient of the integral is represented as a sum of integrals taken over a volume and over a surface. This formula is useful when the system of equations (8) does not have a solution. We split the set of constraints $K := \{1, \dots, k\}$ into two subsets $K_1$ and $K_2$. Without loss of generality we suppose that
$$K_1 = \{1, \dots, l\}, \qquad K_2 = \{l+1, \dots, k\}.$$
The derivative of integral (1) can be represented as the sum of the volume and surface integrals
$$\nabla_x F(x) = \int_{\mu(x)} \nabla_x p(x,y)\, dy + \int_{\mu(x)} \operatorname{div}_y \big(p(x,y) H_l(x,y)\big)\, dy - \sum_{i=l+1}^{k} \int_{\partial_i\mu(x)} \frac{p(x,y)}{\|\nabla_y f_i(x,y)\|} \big[\nabla_x f_i(x,y) + H_l(x,y) \nabla_y f_i(x,y)\big]\, dS, \tag{10}$$
where the matrix $H_l: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^{n \times m}$ satisfies the equation
$$H_l(x,y) \nabla_y f_{1l}(x,y) + \nabla_x f_{1l}(x,y) = 0. \tag{11}$$
The last equation can have many solutions, and we can choose an arbitrary one that is differentiable with respect to the variable $y$.

The general formula contains as special cases the integral over the surface formula (4) and the integral over the volume formula (7). When the set $K_1$ is empty, the matrix $H$ is absent and the general formula reduces to the integral over the surface. Also, when the set $K_2$ is empty, we have the integral over the volume formula (7). Apart from these extreme cases, the general formula provides a number of intermediate expressions for the gradient in the form of the sum of an integral over the surface and an integral over the volume. Thus, we have a number of equivalent representations of the gradient corresponding to the various sets $K_1$ and $K_2$ and solutions of equation (11).

Equation (11) (and equation (8), which is a special case of equation (11)) can be solved explicitly. Usually, this equation has many solutions. The matrix
$$H_l(x,y) = -\nabla_x f_{1l}(x,y) \big(\nabla_y f_{1l}(x,y)^{T} \nabla_y f_{1l}(x,y)\big)^{-1} \nabla_y f_{1l}(x,y)^{T} \tag{12}$$
is a solution of equation (11). Also, in many cases there is another way to solve equation (11) using a change of variables. Suppose that there is a change of variables
$$y = y(x,z)$$
which eliminates the vector $x$ from the function $f(x,y)$ defining the integration set, i.e., the function $f(x, y(x,z))$ does not depend upon the variable $x$. Denote by $y^{-1}(x,y)$ the inverse function, defined by the equation
$$y^{-1}(x, y(x,z)) = z.$$
Let us show that the following matrix
$$H_l(x,y) = \nabla_x y(x,z)\big|_{z = y^{-1}(x,y)} \tag{13}$$
is a solution of (11). Indeed, the gradient of the function $f_{1l}(x, y(x,z))$ with respect to $x$ equals zero, therefore
$$0 = \nabla_x f_{1l}(x, y(x,z)) = \nabla_x y(x,z)\, \nabla_y f_{1l}(x,y)\big|_{y=y(x,z)} + \nabla_x f_{1l}(x,y)\big|_{y=y(x,z)},$$
and the function $\nabla_x y(x,z)\big|_{z=y^{-1}(x,y)}$ is a solution of (11).

Formula (7) with matrix (13) gives the derivative formulas which can be obtained with a change of variables in the integration set [4].
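Formula (12) is easy to check numerically. In the sketch below, random matrices stand in for $\nabla_x f_{1l}$ ($n \times l$) and $\nabla_y f_{1l}$ ($m \times l$), which is an assumption of the illustration; it also assumes $m \ge l$ and that $\nabla_y f_{1l}$ has full column rank, so that $\nabla_y f_{1l}^T \nabla_y f_{1l}$ is invertible.

```python
import numpy as np


def H_from_12(grad_x_f, grad_y_f):
    """Solution (12) of H grad_y f + grad_x f = 0.

    grad_x_f: (n, l) matrix; grad_y_f: (m, l) matrix of full column rank.
    """
    G = grad_y_f
    # -grad_x_f (G^T G)^{-1} G^T, computed with a linear solve instead of an inverse
    return -grad_x_f @ np.linalg.solve(G.T @ G, G.T)   # shape (n, m)


rng = np.random.default_rng(0)
n, m, l = 3, 5, 2
grad_x_f = rng.normal(size=(n, l))
grad_y_f = rng.normal(size=(m, l))
H = H_from_12(grad_x_f, grad_y_f)
print(np.abs(H @ grad_y_f + grad_x_f).max())   # residual of (11), at rounding level
```

Any matrix whose rows differ from (12) by vectors orthogonal to the columns of $\nabla_y f_{1l}$ also solves (11), which illustrates why the representation of the gradient is not unique.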


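Example 3 While investigating the operational strategies for inspected components (see [7]), the following integral was considered:
$$F(x) = \int_{b(y) \le x,\ y_i \ge 0,\ i = 1, \dots, m} p(y)\, dy, \tag{14}$$
where $x \in \mathbb{R}^1$, $y \in \mathbb{R}^m$, $p: \mathbb{R}^m \to \mathbb{R}^1$, $\alpha > 0$, $b(y) = \sum_{i=1}^{m} y_i^{\alpha}$. In this case
$$f(x,y) = \begin{pmatrix} b(y) - x \\ -y_1 \\ \vdots \\ -y_m \end{pmatrix},$$
and
$$F(x) = \int_{f(x,y) \le 0} p(y)\, dy = \int_{\mu(x)} p(y)\, dy.$$
Let us consider $l = 1$, i.e. $K_1 = \{1\}$ and $K_2 = \{2, \dots, m+1\}$. The gradient $\nabla_x F(x)$ equals
$$\int_{\mu(x)} \big[\nabla_x p(y) + \operatorname{div}_y \big(p(y) H_1(x,y)\big)\big]\, dy - \sum_{i=2}^{m+1} \int_{\partial_i\mu(x)} \frac{p(y)}{\|\nabla_y f_i(y)\|} \big[\nabla_x f_i(x,y) + H_1(x,y) \nabla_y f_i(x,y)\big]\, dS, \tag{15}$$
where the matrix $H_1(x,y)$ satisfies (11). In view of
$$\nabla_y f_1(x,y) = \begin{pmatrix} \alpha y_1^{\alpha-1} \\ \vdots \\ \alpha y_m^{\alpha-1} \end{pmatrix}, \qquad \nabla_x f_1(x,y) = -1,$$
a solution $H_1^*(x,y)$ of (11) equals
$$H_1^*(x,y) = h(y) = \frac{1}{m} \Big( \big(\alpha y_1^{\alpha-1}\big)^{-1}, \dots, \big(\alpha y_m^{\alpha-1}\big)^{-1} \Big) = \frac{1}{\alpha m} \big( y_1^{1-\alpha}, \dots, y_m^{1-\alpha} \big). \tag{16}$$
Let us denote
$$(\theta_i \mid y) = (y_1, \dots, y_{i-1}, \theta, y_{i+1}, \dots, y_m), \qquad \bar y^{i} = (y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_m),$$
$$b(\theta_i \mid y) = \theta^{\alpha} + \sum_{j=1, j \neq i}^{m} y_j^{\alpha}.$$
We denote by $\bar y^{i} \ge 0$ the set of inequalities
$$y_j \ge 0, \quad j = 1, \dots, i-1, i+1, \dots, m.$$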
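The sets $\partial_i\mu(x)$, $i = 2, \dots, m+1$, have a simple structure
$$\partial_i\mu(x) = \mu(x) \cap \{y \in \mathbb{R}^m : y_{i-1} = 0\} = \{y \in \mathbb{R}^m : y_{i-1} = 0,\ b(0_{i-1} \mid y) \le x,\ \bar y^{\,i-1} \ge 0\}.$$
For $i = 2, \dots, m+1$, we have
$$(\nabla_y f_i(y))_j = 0, \quad 1 \le j \le m,\ j \neq i-1, \tag{17}$$
$$(\nabla_y f_i(y))_{i-1} = -1, \qquad \|\nabla_y f_i(y)\| = 1. \tag{18}$$
The function $p(y)$ and the functions $f_i(y)$, $i = 2, \dots, m+1$, do not depend on $x$; consequently
$$\nabla_x p(y) = 0, \tag{19}$$
$$\nabla_x f_i(y) = 0, \quad i = 2, \dots, m+1. \tag{20}$$
Equations (15)-(20) imply
$$\nabla_x F(x) = \int_{\mu(x)} \operatorname{div}_y \big(p(y) h(y)\big)\, dy - \sum_{i=2}^{m+1} \int_{\partial_i\mu(x)} \frac{p(y)}{\|\nabla_y f_i(y)\|}\, h(y) \nabla_y f_i(y)\, dS$$
$$= \int_{\mu(x)} \operatorname{div}_y \big(p(y) h(y)\big)\, dy + \sum_{i=2}^{m+1} \int_{\partial_i\mu(x)} h_{i-1}(y)\, p(y)\, dS$$
$$= \int_{b(y) \le x,\ y_i \ge 0,\ i = 1, \dots, m} \operatorname{div}_y \big(p(y) h(y)\big)\, dy + \frac{1}{\alpha m} \sum_{i=1}^{m} \int_{b(0_i \mid y) \le x,\ \bar y^{i} \ge 0} \big(y_i^{1-\alpha} p(y)\big)\big|_{y_i = 0}\, d\bar y^{i}.$$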

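Since
$$\operatorname{div}_y \big(p(y) h(y)\big) = \sum_{i=1}^{m} \frac{\partial p(y)}{\partial y_i}\, h_i(y) + p(y) \operatorname{div}_y h(y) = \frac{1}{\alpha m} \sum_{i=1}^{m} \Big( y_i^{1-\alpha} \frac{\partial p(y)}{\partial y_i} + (1-\alpha)\, y_i^{-\alpha} p(y) \Big),$$
we finally obtain that the gradient $\nabla_x F(x)$ equals
$$\frac{1}{\alpha m} \int_{b(y) \le x,\ y_i \ge 0,\ i = 1, \dots, m} \sum_{i=1}^{m} \Big( y_i^{1-\alpha} \frac{\partial p(y)}{\partial y_i} + (1-\alpha)\, y_i^{-\alpha} p(y) \Big)\, dy + \frac{1}{\alpha m} \sum_{i=1}^{m} \int_{b(0_i \mid y) \le x,\ \bar y^{i} \ge 0} \big(y_i^{1-\alpha} p(y)\big)\big|_{y_i = 0}\, d\bar y^{i}.$$
The formula for $\nabla_x F(x)$ is valid for an arbitrary sufficiently smooth function $p(y)$.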


See also 


> Derivatives of Markov Processes and Their 
Simulation 

> Derivatives of Probability Measures 

> Discrete Stochastic Optimization 

> Optimization in Operation of Electric and Energy 
Power Systems 
For stochastic optimization problems of the form
$$\min F(x) = \int H(x,v)\, d\mu_x(v) \quad \text{s.t. } x \in S \subseteq \mathbb{R}^d, \tag{1}$$
where $H(x,v)$ is a cost function, $\mu_x$ a family of probability measures indexed by $x$ and $F(x)$ the objective value function (OVF), the necessary condition $\nabla_x F(x) = 0$ must be expressed in terms of the derivatives of $H(x, \cdot)$ and $\mu_x$ w.r.t. $x$. In particular, concepts of differentiability of probability measures are needed.


Direct Differentiability


Suppose that the family $(\mu_x)$ in (1) is dominated, i.e. there is a Borel measure $\nu$ such that the densities
$$g_x(v) = \frac{d\mu_x}{d\nu}(v)$$
exist for all $x$. Then the differentiability of the measures may be defined by the differentiability of the densities.


Definition 1 The family of densities $(g_x(v))$ is called strongly $L_1(\nu)$-differentiable if there is a vector of integrable functions $\nabla_x g_x = (g'_{x,1}, \dots, g'_{x,d})^T$ such that
$$\int \big| g_{x+h}(v) - g_x(v) - h^T \nabla_x g_x(v) \big|\, d\nu(v) = o(\|h\|) \quad \text{as } \|h\| \downarrow 0. \tag{2}$$
The family of densities $(g_x(v))$ is called weakly $L_1(\nu)$-differentiable if there is a vector of $L_1(\nu)$ functions $\nabla_x g_x = (g'_{x,1}, \dots, g'_{x,d})^T$ such that for every bounded measurable function $H$
$$\int \big[ g_{x+h}(v) - g_x(v) - h^T \nabla_x g_x(v) \big] H(v)\, d\nu(v) = o(\|h\|) \quad \text{as } \|h\| \downarrow 0. \tag{3}$$
Strong differentiability implies weak differentiability (the integral in (3) is bounded by $\sup_v |H(v)|$ times the integral in (2)), but not vice versa.

There is also a notion of differentiability for families $(\mu_x)$ which do not possess densities (see [3]).

If the densities $(g_x)$ are differentiable and $H(x,v)$ is boundedly differentiable in $x$ and bounded and continuous in $v$, then the gradient of $F(x) = \int H(x,v)\, g_x(v)\, d\nu(v)$ is
$$\int \nabla_x H(x,v)\, g_x(v)\, d\nu(v) + \int H(x,v)\, \nabla_x g_x(v)\, d\nu(v).$$


Inverse Differentiability 


The family $(\mu_x)$ is called process differentiable if there exists a family of random variables $V_x(\omega)$ (the process representation) defined on some probability space $(\Omega, \mathcal{A}, P)$, such that:

a) $V_x(\cdot)$ has distribution $\mu_x$ for all $x$; and

b) $x \mapsto V_x(\omega)$ is differentiable a.s.

As an example, let $\mu_x$ be exponential distributions with densities $g_x(v) = x \exp(-x v)$. Then $V_x(\omega) = (1/x)\, U$, for $U$ exponentially distributed with mean 1, is a process representation in the sense of a) and differentiable in the sense of b) with derivative $\nabla_x V_x(\omega) = -(1/x^2)\, U$.

Process differentiability does not imply and is not implied by weak differentiability. If $G_x(u) = \int_{-\infty}^{u} g_x(v)\, dv$ is the distribution function, then process differentiability is equivalent to the differentiability of $x \mapsto G_x^{-1}(u)$, whereas weak differentiability is connected to the differentiability of $x \mapsto G_x(u)$.

If $V_x(\cdot)$ is a process representation of $(\mu_x)$, then the objective function
$$F(x) = \int H(x,v)\, d\mu_x(v) = E[H(x, V_x)]$$
has derivative
$$\nabla_x F(x) = E\big[H_x(x, V_x) + H_v(x, V_x) \cdot \nabla_x V_x\big],$$
where $H_x(x,v) = \nabla_x H(x,v)$ and $H_v(x,v) = \nabla_v H(x,v)$.
Simulation of Derivatives


If the objective function $F$ in (1) is easily calculated, then the stochastic optimization problem reduces to a standard nonlinear deterministic optimization problem. This is, however, the exception. In the majority of applications, the objective function value has to be approximated either by a numerical integration technique or by a Monte-Carlo (MC) estimate. In the same manner, the gradient $\nabla_x F(x)$ may be approximated either by numerical integration or by Monte-Carlo simulation. We discuss here the construction of MC estimates for the gradient $\nabla_x F(x)$. For simplicity, we treat only the univariate case $x \in \mathbb{R}^1$.

We begin by recalling the Monte-Carlo (MC) method for estimating $F(x)$. If $(V_x^{(i)})$ is a sequence of independent identically distributed random variables with distribution function $G_x$, then the MC estimate
$$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} H(x, V_x^{(i)})$$
is an unbiased estimate of $F(x)$.


Process Derivatives 


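If the family $(\mu_x)$ has a differentiable process representation $(V_x)$, then
$$\widehat{\nabla_x F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \big[ H_x(x, V_x^{(i)}) + H_v(x, V_x^{(i)}) \cdot \nabla_x V_x^{(i)} \big] \tag{4}$$
is an MC estimate of $\nabla_x F(x)$. The method of using the process derivative (4) is also called perturbation analysis ([1,2]).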


Distributional Derivatives 


If the densities $g_x$ are differentiable, there are two possibilities to construct estimates. First, one may define the score function $s_x(v) = [\nabla_x g_x(v)]/g_x(v)$ and construct the score function estimate [4]
$$\widehat{\nabla_x F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \big[ H_x(x, V_x^{(i)}) + H(x, V_x^{(i)})\, s_x(V_x^{(i)}) \big],$$
which is unbiased.

Alternatively, one may write the function $\nabla_x g_x(v)$ in the form
$$\nabla_x g_x(v) = c_x \big[\dot g_x(v) - \ddot g_x(v)\big], \tag{5}$$
where $\dot g_x$ and $\ddot g_x$ are probability densities w.r.t. $\nu$, and $c_x$ is a nonnegative constant. One possibility is to set $\dot g_x$, resp. $\ddot g_x$, as the appropriately scaled positive, resp. negative, part of $\nabla_x g_x$, but other representations are possible as well. Let now $\dot V_x^{(i)}$, resp. $\ddot V_x^{(i)}$, be random variables with distributions $\dot g_x\, d\nu$, resp. $\ddot g_x\, d\nu$. The difference estimate is
$$\widehat{\nabla_x F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \Big[ H_x(x, V_x^{(i)}) + c_x \big( H(x, \dot V_x^{(i)}) - H(x, \ddot V_x^{(i)}) \big) \Big],$$
which is unbiased (see [3]).


Example 2 Assume again that $(\mu_x)$ are exponential distributions with rate $x$ (i.e., with expectation $1/x$). The probability $\mu_x$ has density
$$g_x(v) = x \exp(-x v).$$
Let $V_x$ be distributed according to $\mu_x$. For simplicity, assume that the cost function $H$ does not depend explicitly on $x$. We need estimates for $\nabla_x E[H(V_x)]$. The three methods are:

1) Score derivative: The score function is
$$\frac{\nabla_x g_x(v)}{g_x(v)} = \frac{1}{x} - v,$$
and the score function estimate is
$$\widehat{\nabla_x F}^{(1)} = H(V_x) \Big( \frac{1}{x} - V_x \Big).$$


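2) Difference derivative: There are several representations in the sense of (5). One could use the decomposition of $\nabla_x g_x(\cdot) = (1 - x v)\, e^{-x v}$ into its positive and negative part (Jordan-Hahn decomposition); each part has total mass $1/(e x)$, which gives the estimate
$$\widehat{\nabla_x F}^{(2)} = \frac{1}{e x} \big( H(\dot V_x) - H(\ddot V_x) \big),$$
where $\dot V_x$ has density
$$e x (1 - x v)\, e^{-x v}\, \mathbb{1}_{\{0 \le v \le 1/x\}}$$
and $\ddot V_x$ has density
$$e x (x v - 1)\, e^{-x v}\, \mathbb{1}_{\{v > 1/x\}},$$
and both are independent.
Another possibility is to use $\nabla_x g_x(v) = \frac{1}{x}\big[x e^{-x v} - x^2 v\, e^{-x v}\big]$, i.e. $c_x = 1/x$ with an exponential and a Gamma(2) density, and to set
$$\dot V_x = -\frac{1}{x} \log U_1, \qquad \ddot V_x = -\frac{1}{x} \big(\log U_1 + \log U_2\big),$$
where $U_1, U_2$ are independent Uniform$[0,1]$ variates. The final difference estimate is
$$\widehat{\nabla_x F}^{(2)} = \frac{1}{x} \big( H(\dot V_x) - H(\ddot V_x) \big).$$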


3) Process derivative: A process representation of $(\mu_x)$ is
$$V_x = -\frac{1}{x} \log(1 - U), \quad U \sim \text{Uniform}[0, 1].$$
A process derivative of $H(V_x)$ is
$$\widehat{\nabla_x F}^{(3)} = H'(V_x) \Big( -\frac{1}{x} V_x \Big).$$
Notice that in methods 1) and 2) the function $H$ need not be differentiable and may be an indicator function, as is required in some applications. In method 3), the function $H$ must be differentiable.
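The three estimators of Example 2 can be compared by simulation. The sketch takes $H(v) = v^2$ and $x = 1$ as illustrative assumptions, so that $E[H(V_x)] = 2/x^2$ and the exact derivative is $-4/x^3 = -4$. For the difference estimate it uses the second (exponential minus Gamma(2)) representation, because that needs only uniform variates; sampling from the Jordan-Hahn densities would require an extra step, for example rejection sampling, and is omitted here.

```python
import math
import random


def exp_derivative_estimates(x, n, seed=0):
    """Score, difference and process estimates of d/dx E[V_x^2], V_x ~ Exp(rate x)."""
    rng = random.Random(seed)
    H = lambda v: v * v
    dH = lambda v: 2.0 * v
    score = diff = proc = 0.0
    for _ in range(n):
        u1, u2 = 1.0 - rng.random(), 1.0 - rng.random()   # uniform on (0, 1]
        v = -math.log(u1) / x                              # V_x by inversion
        score += H(v) * (1.0 / x - v)                      # method 1)
        v_dot = -math.log(u1) / x                          # exponential, rate x
        v_ddot = -(math.log(u1) + math.log(u2)) / x        # Gamma(2, rate x)
        diff += (H(v_dot) - H(v_ddot)) / x                 # method 2), c_x = 1/x
        proc += dH(v) * (-v / x)                           # method 3)
    return score / n, diff / n, proc / n


x = 1.0
est = exp_derivative_estimates(x, 200_000)
print(est, -4.0 / x ** 3)   # all three scatter around the exact value -4
```

All three averages are unbiased; in this particular setup the score estimate has by far the largest variance, while the process and difference estimates are noticeably tighter.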


Whenever an MC estimate $\widehat{\nabla_x F}_n(x)$ has been defined, it can be used in a stochastic quasigradient method (SQG; cf. also Stochastic quasigradient methods) for optimization
$$X_{s+1} = \mathrm{pr}_S\big[X_s - \rho_s \widehat{\nabla_x F}_n(X_s)\big],$$
where $\mathrm{pr}_S$ is the projection on the set $S$ and $(\rho_s)$ are the step sizes. The important feature of such algorithms is the fact that they work with stochastic estimates. In particular, the sample size $n$ per step can be set to 1 and convergence still holds under regularity assumptions. To put it differently, the SQG allows one to approach a neighborhood of the solution quickly, even with very noisy estimates.
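A minimal sketch of the projected SQG iteration with sample size $n = 1$, under assumptions that are not from the text: $S = [0.5, 5]$, $\mu_x$ exponential with rate $x$, and cost $H(x, v) = v + x/4$, so that $F(x) = 1/x + x/4$ has its minimum at $x^* = 2$. The gradient estimate is the one-sample process derivative $H_x + H_v \nabla_x V_x = 1/4 - V_x/x$, and the step sizes $\rho_s = 10/(s + 10)$ satisfy the usual conditions $\sum_s \rho_s = \infty$, $\sum_s \rho_s^2 < \infty$.

```python
import math
import random


def sqg(x0, steps, seed=0, lo=0.5, hi=5.0):
    """Projected stochastic quasigradient iteration with one sample per step."""
    rng = random.Random(seed)
    x = x0
    for s in range(1, steps + 1):
        v = -math.log(1.0 - rng.random()) / x     # V_x ~ Exp(rate x), by inversion
        grad = 0.25 - v / x                       # H_x + H_v * dV_x/dx, dV_x/dx = -V_x/x
        rho = 10.0 / (s + 10.0)                   # step sizes rho_s
        x = min(max(x - rho * grad, lo), hi)      # projection pr_S onto [lo, hi]
    return x


print(sqg(x0=4.0, steps=100_000))   # close to the minimizer x* = 2
```

Each step uses a single noisy gradient sample, yet the decreasing step sizes average the noise out over the iterations.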


See also 


> Derivatives of Markov Processes and Their 
Simulation 


> Derivatives of Probability and Integral Functions: 
General Theory and Examples 

> Discrete Stochastic Optimization 

> Optimization in Operation of Electric and Energy 
Power Systems 
Keywords


Optimization; Computational fluid dynamics 


Synonyms 


Design Optimization in CFD 


Focus 


The article focuses on design optimization using computational fluid dynamics (CFD). Design implies the creation of an engineering prototype (e.g., a pump) or engineering process (e.g., particle separator). Optimization indicates the selection of a 'best' design. Computational fluid dynamics (CFD) represents a family of models of fluid motion implemented on a digital computer. In recent years, efforts have focused on merging elements of these three disciplines to improve design effectiveness and efficiency.


Framework 


Consider the design of a prototype or process with $n$ design variables $\{x_i : i = 1, \dots, n\}$ denoted by $x$. It is assumed that $n$ is finite, although infinite-dimensional design spaces also exist (e.g., the shape of a civilian transport aircraft). The domain of $x$ constitutes the design space. A scalar objective function $f(x)$ is assumed to be defined for some (or possibly all) points in the design space. This is the simplest design optimization problem. Oftentimes, however, the optimization cannot be easily cast into this form, and other methods (e.g., Pareto optimality) are employed. The purpose of the design optimization is to find the design point $x^*$ which minimizes $f$. Note that there is no loss of generality in assuming the objective is to minimize $f$, since the maximization of an objective function $F(x)$ is equivalent to the minimization of $f = -F$.

The design optimization is typically an iterative process involving two principal elements. The first element is the simulation, which evaluates the objective function by (in the case of computational fluid dynamics) a fluid flow code (flow solver). The second element is the search, which determines the direction for traversing the design space. The search engine is the optimizer, of which there are several different types as described later. The design optimization process is an iterative procedure involving repetitive simulation and search steps until a predefined convergence criterion is met. This is illustrated in Fig. 1.


Design Optimization in Computational Fluid Dynamics, Figure 1
Elements of design optimization (Search: gradient optimizer or stochastic optimizer; Simulate: generate grid, solve flowfield, compute objective function)


Levels of Simulation 


There are five levels of complexity for CFD simulation (Fig. 2). Empirical methods represent correlations of experimental data and possibly simple one-dimensional analytical models. An example is the NIDA code [15], employed for analysis of two-dimensional and axisymmetric inlets. The code is restricted to a limited family of geometries and flow conditions (e.g., no sideslip). Codes based on the linear potential equations (e.g., PANAIR [6]; see also [17]) and nonlinear potential equations (e.g., [8]; see also [7]) incorporate increased geometric flexibility while implementing a simplified model of the flow physics (i.e., it is assumed that the shock waves are weak and there is no significant flow separation). Codes employing the Euler equations (e.g., [22]) allow for strong shocks and vorticity, although they neglect viscous effects. Reynolds-averaged Navier-Stokes codes (RANS codes) (e.g., GASP [31]) employ a model for the effects of turbulence. The range of execution time between the lowest and highest levels is roughly three orders of magnitude; e.g., on a conventional workstation the NIDA code requires only a few seconds of execution time, while a two-dimensional RANS simulation would typically require a few hours.
Design Optimization in Computational Fluid Dynamics, Figure 2
Levels of CFD simulation (in order of increasingly accurate simulation: empirical correlations (< 1960); linear potential equation (1960s); nonlinear potential equation (1970s); Euler equation (1980s); Reynolds-averaged Navier-Stokes (1990s))


The Stages of Design 


There are typically three stages of design: conceptual, 
preliminary and detailed. As the names suggest, the de- 
sign specification becomes more precise at successive 
design stages. Thus, for example, a conceptual design 
of a civilian transport aircraft may consider a (discrete) 
design space with the possibility of two, three or four 
engines, while the preliminary design space assumes 
a fixed number of engines and considers the details 
of the engine (e. g., nacelle shapes). It is important to 
note that the CFD algorithms employed in each of these 
three stages are likely to be different. Typically, the con- 
ceptual design stage employs empirical formulae, while 
the preliminary design stage may also include simpli- 
fied CFD codes (e. g., linearized and nonlinear poten- 
tial methods, and Euler codes), and the detailed design 
stage may utilize full Reynolds-averaged Navier-Stokes 
methods. Additionally, experiment is oftentimes essen- 
tial to verify key features of the design. 


Emergence of Automated Design Optimization 
Using CFD 


Although the first numerical simulation of viscous fluid flow was published in 1933 by A. Thom [51], CFD as a discipline emerged with the development of digital mainframe computers in the 1960s. With the principal exception of the work on inverse design methods for airfoils (see, for example, the review [30] and [48]), CFD has mainly been employed in design analysis as a cost-effective replacement for some types of experiments. However, CFD can now be employed as part of an automated design optimization process. This opportunity has arisen for five reasons. First, the continued rapid improvements in computer performance (e.g., doubling of microprocessor performance every 18 to 24 months [3]) enable routine numerical simulations of increasing sophistication and complexity. Second, improvements in the accuracy, efficiency and robustness of CFD algorithms (see, for example, [18]) likewise contribute to the capability for simulation of more complex flows. Third, the development of more accurate turbulence models provides increased confidence in the quality of the flow simulations [16]. Fourth, the development of efficient and robust optimizers enables automated search of design spaces [33]. Finally, the development of sophisticated scripting languages (e.g., Perl [43]) provides effective control of pathological events which may occur in an automated design cycle using CFD (e.g., square root of a negative number, failure to converge within a predetermined number of iterations, etc.).


Problem Definition 


The general scalar nonlinear optimization problem (also known as the nonlinear programming problem) is [11,33,52]

$$ \text{minimize } f(x), \tag{1} $$

where f(x) is the scalar objective function and x is the vector of design variables. Typically there are limits on the allowable values of x:

$$ a \le x \le b, \tag{2} $$

and m additional linear and/or nonlinear constraints

$$ \begin{aligned} c_i(x) &= 0, \quad i = 1, \dots, m', \\ c_i(x) &\le 0, \quad i = m'+1, \dots, m. \end{aligned} \tag{3} $$

If f and the $c_i$ are linear functions, then the optimization problem is denoted the linear programming problem, while if f is quadratic and the $c_i$ are linear, then the optimization problem is denoted the quadratic programming problem.
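The following Python sketch, offered only as an illustration of (1)-(3), checks whether a candidate design satisfies the bounds (2) and the constraints (3), and forms a penalized objective of the kind later used to fold constraints into a genetic algorithm. The quadratic penalty form, the weight `mu` and the function names are assumptions made for this example, not taken from the text.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Constraint = Callable[[Vector], float]


def is_feasible(x: Vector, a: Vector, b: Vector,
                eq: Sequence[Constraint] = (), ineq: Sequence[Constraint] = (),
                tol: float = 1e-9) -> bool:
    """True if a <= x <= b (2), c_i(x) = 0 for eq and c_i(x) <= 0 for ineq (3), within tol."""
    if any(xi < ai - tol or xi > bi + tol for xi, ai, bi in zip(x, a, b)):
        return False
    if any(abs(c(x)) > tol for c in eq):
        return False
    return all(c(x) <= tol for c in ineq)


def penalized_objective(f: Callable[[Vector], float], x: Vector,
                        eq: Sequence[Constraint] = (), ineq: Sequence[Constraint] = (),
                        mu: float = 100.0) -> float:
    """f(x) plus an (assumed) quadratic penalty on constraint violation; equals f(x) if feasible."""
    violation = sum(c(x) ** 2 for c in eq) + sum(max(0.0, c(x)) ** 2 for c in ineq)
    return f(x) + mu * violation
```

Only violated inequality constraints contribute to the penalty, so a feasible design is scored by f alone.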

An example of a nonlinear optimization problem 
using CFD is the design of the shape of an inlet for 
a supersonic missile. The geometry model of an axisym- 
metric inlet [53] is shown in Fig. 3. 
Design Optimization in Computational Fluid Dynamics, Figure 3
Geometry of high speed inlet (labels: throat; "constant" cross-section; subsonic diffuser; uniform cross-section; cross section increases slightly)


The eight design variables are listed below.

1. initial cone angle
2. final cone angle
3. x-coordinate of throat
4. r-coordinate of throat
5. x-coordinate of end of 'constant' cross section
6. internal cowl lip angle
7. height at end of 'constant' cross section
8. height at beginning of 'constant' cross section


There are no general methods for guaranteeing that 
the global minimum of an arbitrary objective function 
f(x) can be found in a finite number of steps [4,11]. 
Typically, methods focus on determining a local min- 
imum with additional (often heuristic) techniques to 
avoid convergence to a local minimum which is not the 
global minimum. 

A point x* is a (strong) local minimum [11] if there is a region surrounding x* wherein the objective function is defined and $f(x) > f(x^*)$ for $x \ne x^*$. Provided f(x) is twice continuously differentiable (this is not always true; see, for example, [53]), necessary and sufficient conditions for the existence of a solution to (1) subject to (3) may be obtained [11]. In the one-dimensional case with no constraints, the sufficient conditions for a minimum at x* are

$$ g = 0 \quad \text{and} \quad H > 0 \quad \text{at } x = x^*, $$

where $g = df/dx$ and $H = d^2 f/dx^2$. For the multidimensional case with no constraints,

$$ \|g_i\| = 0 \quad \text{and} \quad H \text{ is positive definite at } x = x^*, \tag{4} $$

where $g_i = \partial f/\partial x_i$, $\|g_i\|$ is the norm of the vector $g_i$, and $H = H_{ij}$ is the Hessian matrix

$$ H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}. $$

The matrix H is positive definite if all of the eigenvalues of H are positive.
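Condition (4) can be checked numerically when only function values are available, as is typical with a flow solver. The sketch below is a minimal illustration under stated assumptions: derivatives are approximated by central differences (step sizes chosen arbitrarily), and positive definiteness is tested by attempting a Cholesky factorization, which for a symmetric matrix succeeds exactly when all eigenvalues are positive. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Func = Callable[[Sequence[float]], float]


def gradient_fd(f: Func, x: Sequence[float], h: float = 1e-6) -> List[float]:
    """Central-difference approximation of g_i = df/dx_i."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


def _shifted(f: Func, x: Sequence[float], i: int, j: int, si: int, sj: int, h: float) -> float:
    y = list(x)
    y[i] += si * h
    y[j] += sj * h
    return f(y)


def hessian_fd(f: Func, x: Sequence[float], h: float = 1e-4) -> List[List[float]]:
    """Central-difference approximation of H_ij = d^2 f / dx_i dx_j."""
    n = len(x)
    return [[(_shifted(f, x, i, j, 1, 1, h) - _shifted(f, x, i, j, 1, -1, h)
              - _shifted(f, x, i, j, -1, 1, h) + _shifted(f, x, i, j, -1, -1, h)) / (4.0 * h * h)
             for j in range(n)] for i in range(n)]


def is_positive_definite(H: List[List[float]]) -> bool:
    """Attempt a Cholesky factorization; it succeeds iff the symmetric H is positive definite."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def satisfies_condition_4(f: Func, x: Sequence[float], gtol: float = 1e-6) -> bool:
    """Check ||g|| = 0 (to within gtol) and H positive definite at x."""
    g = gradient_fd(f, x)
    return math.sqrt(sum(gi * gi for gi in g)) < gtol and is_positive_definite(hessian_fd(f, x))
```

A saddle point such as the origin for $f = x_1^2 - x_2^2$ has zero gradient but fails the Hessian test, which is why both parts of (4) are needed.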


Algorithms for Optimization 


The efficacy of an optimization algorithm depends strongly on the nature of the design space. In engineering problems, the design space can manifest pathological characteristics. The objective function f may possess multiple local optima [36] arising from physical and/or numerical causes. Examples of the latter include noise introduced into the objective function by grid refinement between successive flow simulations, and incomplete convergence of the flow simulator. Also, the objective function f and/or its gradient $g_i$ may exhibit near-discontinuities for physical reasons. For example, a small change in the design state x of a mixed-compression supersonic inlet operating at critical conditions can cause the terminal shock to be expelled, leading to a rapid decrease in total pressure recovery [44]. Moreover, the objective function f may not be evaluable at certain points. This may be due to constraints in the flow simulator, such as a limited range of applicability for empirical data tables.

A brief description of some different classes of general optimizers is presented. For brevity, these methods are described for the unconstrained optimization problem. See [33] for an overview of optimization algorithms and software packages, and [11] for a comprehensive discussion of the constrained optimization problem. A detailed mathematical exposition of optimization problems is presented in [19].


Gradient Optimizers 


If the objective function f can be approximated in the vicinity of a point $\bar{x}$ by a quadratic form, then

$$ f \approx \bar{f} + \bar{g}_i (x_i - \bar{x}_i) + \tfrac{1}{2} (x_i - \bar{x}_i) \bar{H}_{ij} (x_j - \bar{x}_j), \tag{5} $$

where $\bar{f}$, $\bar{g}_i$ and $\bar{H}_{ij}$ denote evaluation at $\bar{x}$, and the Einstein summation convention is implied. In the relatively simple method of steepest descent [40], the quadratic term in (5) is ignored, and a line minimization is performed along the direction of $-\bar{g}_i$, i.e., a sequence of values of the design variable $x^{(\nu)}$, $\nu = 1, \dots$, is formed according to

$$ x_i^{(\nu)} = \bar{x}_i + \delta x_i^{(\nu)}, \qquad \delta x_i^{(\nu)} = -\lambda^{(\nu)} \frac{\bar{g}_i}{\|\bar{g}\|}, $$

where $\lambda^{(\nu)}$, $\nu = 1, \dots$, is an increasing sequence of displacements. From (5), the estimated change in the objective function f is $-\lambda^{(\nu)} \|\bar{g}\|$. The objective function f is evaluated at each iteration ν, and the search is terminated when f begins to increase. At this location, the gradient $g_i$ is computed and the procedure is repeated. This method, albeit straightforward to implement, is inefficient for design spaces which are characterized by long, narrow 'valleys' [40].
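A minimal Python sketch of the steepest descent procedure described above follows. The doubling sequence of displacements (`lam0`, `2*lam0`, `4*lam0`, ...), the rule that halves `lam0` when even the first displacement increases f, and the stopping tolerances are assumptions; the text only requires an increasing sequence of displacements. The gradient is supplied analytically here to keep the example short.

```python
import math
from typing import Callable, List, Sequence

Func = Callable[[Sequence[float]], float]
Grad = Callable[[Sequence[float]], List[float]]


def steepest_descent(f: Func, grad: Grad, x0: Sequence[float], lam0: float = 0.1,
                     gtol: float = 1e-8, max_outer: int = 1000) -> List[float]:
    x = list(x0)
    for _ in range(max_outer):
        g = grad(x)
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        if gnorm < gtol:
            break
        d = [-gi / gnorm for gi in g]             # unit vector along -g
        f_best, x_best = f(x), x
        lam = lam0
        # March along d with increasing displacements until f begins to increase.
        while True:
            trial = [xi + lam * di for xi, di in zip(x, d)]
            f_trial = f(trial)
            if f_trial >= f_best:
                break
            f_best, x_best = f_trial, trial
            lam *= 2.0
        if x_best is x:
            lam0 *= 0.5                           # even the first step overshot: shrink it
            if lam0 < 1e-14:
                break
        else:
            x = x_best                            # new base point: recompute g and repeat
    return x
```

On an elongated quadratic bowl such as $(x_1 - 1)^2 + 10 (x_2 + 2)^2$ the iterates zigzag across the valley, which is the inefficiency noted above.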

The conjugate gradient methods [40] are more efficient than the method of steepest descent, since they perform a sequence of line minimizations along specific directions in the design space which are mutually conjugate, i.e., orthogonal with respect to the Hessian of the objective function. Consider a line minimization of f along a direction $u = \{u_i : i = 1, \dots, n\}$. At any point on the line, the gradient of f in the direction of u is $u_i g_i$ by definition. At the minimum point in the line search,

$$ u_i g_i = 0. $$

Consider a second line minimization of f along a direction v. From (5), and noting that $H_{ij}$ is symmetric, the change in $g_i$ along the direction v is $H_{ij} v_j$. Thus, the condition that the second line minimization also remain a minimization along the first direction u is

$$ u_i H_{ij} v_j = 0. $$

When this condition is satisfied, u and v are denoted conjugate pairs. Conjugate gradient methods (CGM) generate a sequence of directions u, v, … which are mutually conjugate. If f is exactly quadratic, then CGM reach the minimum in at most n line minimizations (in exact arithmetic).
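For an exactly quadratic objective $f = \tfrac{1}{2} x^T H x - b^T x$, the conjugate gradient idea can be sketched as below: linear conjugate gradient with exact line minimizations, written in plain Python for illustration (one standard variant, not the only CGM). The returned directions make it possible to confirm the conjugacy condition $u_i H_{ij} v_j = 0$ numerically.

```python
from typing import List, Sequence, Tuple


def matvec(H: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(H: Sequence[Sequence[float]], b: Sequence[float],
                       x0: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """Minimize 0.5 x^T H x - b^T x for symmetric positive definite H.

    Returns the final point and the search directions used; in exact arithmetic the
    directions are mutually H-conjugate and at most n steps are needed.
    """
    x = list(x0)
    g = [hx - bi for hx, bi in zip(matvec(H, x), b)]   # gradient g = H x - b
    d = [-gi for gi in g]                              # first direction: steepest descent
    directions: List[List[float]] = []
    for _ in range(len(b)):
        gg = dot(g, g)
        if gg < 1e-30:
            break
        Hd = matvec(H, d)
        alpha = gg / dot(d, Hd)                        # exact line minimization along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * hdi for gi, hdi in zip(g, Hd)]
        directions.append(d)
        beta = dot(g_new, g_new) / gg                  # makes the next direction H-conjugate
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, directions
```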

Sequential quadratic programming methods employ the Hessian H, which may be computed directly when economical or may be approximated from the sequence of gradients $g_i$ generated during the line search (the quasi-Hessian [33]). Given the gradient and Hessian, the location x* of the minimum value of f may be found from (5) by setting the gradient of the quadratic model to zero:

$$ H_{ij} (x_j^* - x_j) = -g_i. $$

For the general case where f is not precisely quadratic, a line minimization is typically performed in the direction $(x^* - x)$, and the process is repeated.
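The step defined by $H_{ij}(x_j^* - x_j) = -g_i$ amounts to solving one linear system. The sketch below assumes the Hessian is available exactly and omits the line minimization used when f is not quadratic; for an exactly quadratic f a single step reaches the minimum from any starting point. The small Gaussian-elimination helper is included only to keep the example self-contained.

```python
from typing import List, Sequence


def solve(A: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system A y = rhs."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][j] * y[j] for j in range(i + 1, n))) / M[i][i]
    return y


def newton_point(x: Sequence[float], g: Sequence[float],
                 H: Sequence[Sequence[float]]) -> List[float]:
    """Minimizer x* of the quadratic model (5): solve H_ij (x*_j - x_j) = -g_i."""
    step = solve(H, [-gi for gi in g])
    return [xi + si for xi, si in zip(x, step)]
```

For $f = x_1^2 + x_1 x_2 + 2x_2^2 - 3x_1$ (Hessian `[[2, 1], [1, 4]]`), one step from any point lands on $x^* = (12/7, -3/7)$.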

Variational sensitivity employs direct differentiation of the objective function f and of the governing fluid dynamic equations (in continuous or discrete form) to obtain the gradient $g_i$, and then performs the optimization with a gradient-based method. It is related to the theory of the control of systems governed by partial differential equations [29,39]. For example, the boundary shape (e.g., an airfoil surface) is viewed as the (theoretically infinite-dimensional) design space which controls the objective function f. Several different formulations have been developed, depending on the stage at which the numerical discretization is performed and on the use of direct or adjoint (costate) equations. Detailed descriptions are provided in [23] and [24]. Additional references include [2,5,20,21,37,38,50].

The following summary follows the presentation in [24], which employs the adjoint formulation. The objective function f is considered to be a function of the flowfield variables w and the physical shape S. The differential change in the objective function is therefore

$$ \delta f = \frac{\partial f}{\partial w} \delta w + \frac{\partial f}{\partial S} \delta S. \tag{6} $$

The discretized governing equations of the fluid motion are represented by the vector of equations

$$ R(w; S) = 0, $$

and therefore

$$ \delta R = \frac{\partial R}{\partial w} \delta w + \frac{\partial R}{\partial S} \delta S = 0, \tag{7} $$

where δR is a vector. Introducing a vector Lagrange multiplier ψ and combining (6) and (7),

$$ \delta f = \left( \frac{\partial f}{\partial w} - \psi^T \frac{\partial R}{\partial w} \right) \delta w + \left( \frac{\partial f}{\partial S} - \psi^T \frac{\partial R}{\partial S} \right) \delta S, $$

where T indicates vector transpose. If ψ is chosen to satisfy the adjoint (costate) equation

$$ \psi^T \frac{\partial R}{\partial w} = \frac{\partial f}{\partial w}, \tag{8} $$

then

$$ \delta f = G \, \delta S, \qquad G = \frac{\partial f}{\partial S} - \psi^T \frac{\partial R}{\partial S}. $$

This yields a straightforward method for optimization using, for example, the method of steepest descent. The increment in the shape is

$$ \delta S = -\lambda G^T, $$

where λ is a positive scalar, so that to first order $\delta f = -\lambda G G^T \le 0$. The variational sensitivity approach is particularly advantageous when the dimension n of the design space (which defines S) is large, since the gradient of f with respect to S is obtained from a single flowfield solution plus a single adjoint solution (8), whose cost is comparable to that of the flowfield solution regardless of n. Constraints can be implemented by projecting the gradient onto an allowable subspace in which the constraints are satisfied.
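To make the adjoint bookkeeping concrete, the sketch below replaces the flow equations with a hypothetical two-equation linear model $R(w; S) = K w - B S - c = 0$ and uses a simple quadratic objective; the matrices `K`, `B`, `c` and the weight `MU` are invented for illustration. It computes G from one "flow" solve and one adjoint solve (8), which can then be compared against finite differences of $f(w(S), S)$.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(A: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][j] * y[j] for j in range(i + 1, n))) / M[i][i]
    return y


def transpose(A: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*A)]


def matvec(A: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]


# Hypothetical stand-in for the discretized flow equations: R(w; S) = K w - B S - c.
K = [[4.0, 1.0], [2.0, 5.0]]
B = [[1.0, 0.5], [0.0, 2.0]]
c = [1.0, -1.0]
MU = 0.1


def flow_solve(S: Sequence[float]) -> List[float]:
    """Solve R(w; S) = 0 for the 'flowfield' w."""
    return solve(K, [bs + ci for bs, ci in zip(matvec(B, S), c)])


def objective(w: Sequence[float], S: Sequence[float]) -> float:
    return 0.5 * sum(wi * wi for wi in w) + 0.5 * MU * sum(si * si for si in S)


def adjoint_gradient(S: Sequence[float]) -> List[float]:
    """G = df/dS - psi^T dR/dS, with psi from the adjoint equation (8)."""
    w = flow_solve(S)                                # one flowfield solution
    psi = solve(transpose(K), w)                     # (dR/dw)^T psi = (df/dw)^T; dR/dw = K, df/dw = w^T
    dR_dS = [[-bij for bij in row] for row in B]     # dR/dS = -B
    psiT_dR_dS = matvec(transpose(dR_dS), psi)       # components of psi^T dR/dS
    return [MU * si - v for si, v in zip(S, psiT_dR_dS)]
```

A steepest descent update would then be `S_new = [s - lam * g for s, g in zip(S, adjoint_gradient(S))]` for some positive `lam`.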

Response surface methods employ an approximate representation of the objective function using smooth functions, which are typically quadratic polynomials [25]. For example, the objective function may be approximated by

$$ f \approx \tilde{f} = \alpha + \sum_{1 \le i \le n} \beta_i x_i + \sum_{1 \le i \le j \le n} \gamma_{ij} x_i x_j, $$

where α, $\beta_i$ and $\gamma_{ij}$ are coefficients which are determined by fitting $\tilde{f}$ to a discrete set of data using the method of least squares. The minimum of $\tilde{f}$ can then be found by any of the gradient optimizers, with optional recalibration of the coefficients of $\tilde{f}$ as needed. There are many different implementations of the response surface method (see, for example, [12,34] and [46]).
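A minimal sketch of the response surface idea for n = 2: the six coefficients α, β_i, γ_ij are fitted by least squares through the normal equations, and the stationary point of the fitted quadratic is found by solving its 2 × 2 gradient equations directly (a shortcut valid when the fitted Hessian is positive definite; the text allows any gradient optimizer here). The 3 × 3 sample grid and the test function are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def solve(A: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][j] * y[j] for j in range(i + 1, n))) / M[i][i]
    return y


def features(x1: float, x2: float) -> List[float]:
    # Basis for alpha, beta_1, beta_2, gamma_11, gamma_12, gamma_22.
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


def fit_response_surface(points: Sequence[Tuple[float, float]],
                         values: Sequence[float]) -> List[float]:
    """Least-squares fit via the normal equations (A^T A) coef = A^T f."""
    rows = [features(x1, x2) for x1, x2 in points]
    m = len(rows[0])
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(m)] for i in range(m)]
    Atf = [sum(row[i] * v for row, v in zip(rows, values)) for i in range(m)]
    return solve(AtA, Atf)


def surface_minimizer(coef: Sequence[float]) -> List[float]:
    """Stationary point of the fitted surface (a minimum if its Hessian is positive definite)."""
    _, b1, b2, g11, g12, g22 = coef
    return solve([[2.0 * g11, g12], [g12, 2.0 * g22]], [-b1, -b2])
```

In practice each sample value would come from a flow solution, which is why the number of samples per surface matters.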


Stochastic Optimizers 


Often the objective function is not well behaved in 
a portion or all of the design space as discussed above. 
In such situations, gradient methods can stop with- 
out achieving the global optimum (e. g., at an infeasible 
point, or a local minimum). Stochastic optimizers seek 
to avoid these problems by incorporating a measure of 
randomness in the optimization process, albeit often- 
times at a cost of a significant increase in the number of 
evaluations of the objective function f. 

Simulated annealing [26,27,32] mimics the process of crystallization of liquids or annealing of metals by minimizing a function E which is analogous to the energy of a thermodynamic system. Consider a current point (state) x in the design space and its associated 'energy' E. A candidate for the next state x* is selected by randomly perturbing, typically, one of the components $x_j$, $1 \le j \le n$, of x, and its energy E* is evaluated (typically, each component of x is perturbed in sequence). If E* < E, then x = x*, i.e., the next state is x*. If E* > E, then the probability of selecting x* as the next design state is

$$ p = \exp\left( -\frac{E^* - E}{kT} \right), $$

where k is the 'Boltzmann constant' (by analogy to statistical mechanics) and T is the 'temperature', which is successively reduced during the optimization according to an assumed schedule [27]. (Of course, only the value of the product kT is important.) The stochastic nature can be implemented by simply calling a random number generator to obtain a value r between zero and one; the state x* is then selected if r < p. Therefore, during the sequence of design states, the algorithm permits the selection of a design state with E* > E, but the probability of selecting such a state decreases with increasing E* − E. This feature tends to enable (but does not guarantee) the optimizer to 'jump out' of a local minimum.
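The acceptance rule and the component-by-component perturbation can be sketched as follows. The uniform perturbation of width `step`, the geometric cooling schedule (kT multiplied by `cooling` after every trial) and the iteration count are assumptions chosen for the example; the text leaves the schedule to [27]. The best state seen is tracked separately, since the current state may move uphill.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def acceptance_probability(E: float, E_star: float, kT: float) -> float:
    """Always accept downhill moves; accept uphill moves with p = exp(-(E* - E) / kT)."""
    if E_star <= E:
        return 1.0
    return math.exp(-(E_star - E) / kT)


def simulated_annealing(energy: Callable[[Sequence[float]], float], x0: Sequence[float],
                        step: float = 0.5, kT0: float = 1.0, cooling: float = 0.995,
                        n_iter: int = 2000, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    x = list(x0)
    E = energy(x)
    best_x, best_E = list(x), E
    kT = kT0
    for it in range(n_iter):
        j = it % len(x)                          # perturb one component at a time, in sequence
        x_star = list(x)
        x_star[j] += rng.uniform(-step, step)
        E_star = energy(x_star)
        if rng.random() < acceptance_probability(E, E_star, kT):   # select x* if r < p
            x, E = x_star, E_star
            if E < best_E:
                best_x, best_E = list(x), E
        kT *= cooling                            # assumed geometric cooling schedule
    return best_x, best_E
```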

Genetic algorithms (GAs) mimic the process of biological evolution by means of random changes (mutations) in a set of designs denoted the population [14]. At each step, the 'least fit' member(s) of the population (i.e., those designs with the highest value of f) are typically removed, and new members are generated by a recombination of some (or all) of the remaining members. There are numerous GA variants. In the approach of [41], an initial population P of designs is generated by randomly selecting points $x_i$, i = 1, …, p, satisfying (2). The two best designs (i.e., with the lowest values of f) are joined by a straight line in the design space. A random point x' is chosen on the line connecting the two best designs. A mutation is performed by randomly selecting a point $x_{p+1}$ within a specified distance of x'. This new point is added to the population. A member of the population is then removed according to a heuristic criterion, e.g., among the k members with the highest f, remove the member closest to $x_{p+1}$, thus maintaining a constant number of designs in the population. The removal of the closest member tends to prevent clustering of the population (i.e., maintains diversity). The process is repeated until convergence.
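The line-and-mutation scheme attributed above to [41] can be sketched as below. This is not the GADO code itself; the function name, the population size `p`, the number `k` of worst members considered for removal, the mutation box (a fraction `radius` of each variable's range) and the fixed step count in place of a convergence test are all assumptions for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def line_mutation_search(f: Callable[[Sequence[float]], float],
                         lower: Sequence[float], upper: Sequence[float],
                         p: int = 20, k: int = 5, radius: float = 0.1,
                         n_steps: int = 500, seed: int = 1) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    # Initial population: p random designs satisfying a <= x <= b.
    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(p)]
    vals = [f(x) for x in pop]
    for _ in range(n_steps):
        order = sorted(range(len(pop)), key=lambda i: vals[i])
        best1, best2 = pop[order[0]], pop[order[1]]           # the two best designs
        t = rng.random()
        x_line = [u + t * (v - u) for u, v in zip(best1, best2)]   # random point x' on the line
        # Mutation: random point within radius*(b - a) of x' in each coordinate, kept in bounds.
        x_new = [min(hi, max(lo, xi + rng.uniform(-radius, radius) * (hi - lo)))
                 for xi, lo, hi in zip(x_line, lower, upper)]
        pop.append(x_new)
        vals.append(f(x_new))
        # Among the k members with the highest f, remove the one closest to x_new.
        worst = sorted(range(len(pop)), key=lambda i: vals[i])[-k:]
        victim = min(worst, key=lambda i: math.dist(pop[i], x_new))
        del pop[victim]
        del vals[victim]
    best = min(range(len(pop)), key=lambda i: vals[i])
    return pop[best], vals[best]
```

Because removal is restricted to the k worst members, the best design found so far is never discarded, while removing the member nearest the newcomer keeps the population spread out.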


Examples 


Examples of the above algorithms for optimization using CFD are presented. All of the examples are single-discipline problems involving CFD only. It is emphasized that multidisciplinary optimization (MDO) involving computational fluid dynamics, structural dynamics, electromagnetics, materials and other disciplines is a very active and growing field, and many of the optimization algorithms described herein are also appropriate to MDO. A recent review is presented in [49].


Sequential Quadratic Programming 


V. Shukla et al. applied a sequential quadratic programming algorithm, CFSQP [28], to the optimal design of two hypersonic inlets (denoted P2 and P8) at Mach 7.4. The geometric model is shown in Fig. 4. The optimization criterion was the minimization of the strength of the shock wave reflected from the centerbody (lower) surface. This is the same criterion as originally posed in the design of the P2 and P8 inlets [13]. The NPARC flow solver [47] was employed for the P2 optimization, and the GASP flow solver [31] for the P8 optimization.
Design Optimization in Computational Fluid Dynamics, Figure 5
Static pressure contours for optimal P8 inlet (the original centerbody contour is shown by the dotted line)


The optimization criterion was met for both inlets. In Fig. 5, the static pressure contours for the optimized P8 inlet are shown. The strength of the reflected shock is negligible.


Variational Sensitivity 


A. Jameson et al. [24] applied the methodology of variational sensitivity (control theory) to the optimization of a three-dimensional wing section for a subsonic wide-body commercial transport. The design objective was to minimize the drag at a given lift coefficient $C_L = 0.55$ at Mach 0.83 while maintaining a fixed planform. A two-stage procedure was implemented. The first stage employed the Euler equations, while the second stage used the full Reynolds-averaged Navier-Stokes equations. In the second stage, the pressure distribution obtained from the Euler optimization is used as the target pressure distribution.

The initial starboard wing shape is shown in Fig. 6 as a sequence of sections in the spanwise direction. The initial pressure distribution on the upper surface, shown as the pressure coefficient $c_p$ plotted with negative values upward, is presented in Fig. 7. A moderately strong shock wave is evident, as indicated by the sharp drop in $-c_p$ at roughly the mid-chord line. After sixty design cycles of the first stage, the drag coefficient was reduced by 15 counts, from 0.0196 to 0.0181, and the shock wave was eliminated, as indicated by the $c_p$ distribution in Fig. 8. A subsequent second-stage optimization using the Reynolds-averaged Navier-Stokes equations yielded only slight modifications.

Design Optimization in Computational Fluid Dynamics, Figure 6
Initial shape of wing

Design Optimization in Computational Fluid Dynamics, Figure 7
Initial surface pressure distribution


Design Optimization in Computational Fluid Dynamics, Figure 8
Optimized surface pressure distribution

Response Surface

R. Narducci et al. [35] applied a response surface method to the optimal design of a two-dimensional transonic airfoil. The design objective was to maximize the lift coefficient $C_L$ at Mach 0.75 and zero degrees angle of attack, while satisfying the constraints that the drag coefficient $C_D < 0.01$ and the thickness ratio $0.075 < t < 0.15$, where t is the ratio of the maximum airfoil thickness to the airfoil chord. The airfoil surface was represented by a weighted sum of six different shapes, which included four known airfoils (a different set of basis functions was employed in [9] for airfoil optimization using a conjugate gradient method). The objective function f was represented by a quadratic polynomial. An inviscid flow solver was employed.

A successful optimization was achieved in five response surface cycles. The history of the convergence of $C_L$ and $C_D$ is shown in Fig. 9. A total of twenty-three flow solutions were required for each response surface.


Design Optimization in Computational Fluid Dynamics, Figure 9
Convergence history for transonic airfoil (horizontal axis: cycles)

Simulated Annealing

S. Aly et al. [1] applied a modified simulated annealing algorithm to the optimal design of an axisymmetric forebody in supersonic flow. The design objective was to minimize the pressure drag on the forebody of a vehicle at Mach 2.4 and zero angle of attack, subject to constraints on the allowable range of the body radius as a function of axial position. Two different variants of SA were employed and compared to the gradient optimizer NPSOL [10], which is based on a sequential quadratic programming algorithm. In each comparison, all optimizers started from the same initial design, which satisfied the constraints but was otherwise a clearly nonoptimal shape; optimizations were performed for two different initial shapes. The flow solver was a hybrid finite volume implicit Euler marching method [45].

The first method, denoted simulated annealing with iterative improvement (SAWI), employed SA for the initial phase of the optimization and then switched to a random search iterative improvement method when close to the optimum. This method achieved an 8% to 31% reduction in the pressure drag, compared to the optimal solution obtained by NPSOL alone, while requiring fewer flowfield simulations (which constitute the principal computational cost). The second method employed SA for the initial phase of the optimization, followed by NPSOL. This approach achieved a 31% to 39% reduction in the pressure drag, compared to the optimal solution obtained by NPSOL alone, while requiring comparable (or less) CPU time. The forebody shapes obtained using SA, SA with NPSOL and NPSOL alone are shown in Fig. 10.
Design Optimization in Computational Fluid Dynamics, Figure 10
Forebody shapes obtained using SA, SA with NPSOL and NPSOL. Copyright 1996 AIAA - Reprinted with permission


Genetic Algorithms 


G. Zha et al. applied a modified genetic algorithm (GADO [42]) to the optimal design of an axisymmetric supersonic mixed-compression inlet at Mach 4 and 60 kft altitude cruise conditions (see above). The geometric model included eight degrees of freedom (see above), and the optimization criterion was maximization of the inlet total pressure recovery coefficient. The constraints included the requirement for the inlet to start at Mach 2.6, plus additional constraints on the inlet geometry, including a minimum cowl thickness and leading edge angle. The constraints were incorporated into the GA using a penalty function. The flow solver was the empirical inlet analysis code NIDA [15]. This code is very efficient, requiring only a few seconds of CPU time on a workstation, but is limited to two-dimensional or axisymmetric geometries. Moreover, the design space generated by NIDA (i.e., the total pressure recovery coefficient as a function of the eight degrees of freedom) is nonsmooth, with numerous local minima and gaps attributable to the use of empirical data (Fig. 11).

The GA achieved a 32% improvement in total pressure recovery coefficient compared to a trial-and-error method [53]. A total of 50 hours on a DEC-2100 workstation was employed. A series of designs generated during the optimization were selected for evaluation by a full Reynolds-averaged Navier-Stokes code (GASP [31]). A close correlation was observed between the predictions of NIDA and GASP (Fig. 12).
Design Optimization in Computational Fluid Dynamics, Figure 11
Total pressure recovery coefficient versus axial location of throat

Design Optimization in Computational Fluid Dynamics, Figure 12
Total pressure recovery coefficient from NIDA and GASP for several different inlet designs


Conclusion 


Computational fluid dynamics has emerged as a vital 
tool in design optimization. The five levels of CFD anal- 
ysis are utilized in various optimization methodolo- 
gies. Complex design optimizations have become com- 
monplace. A significant effort is focused on multidis- 
ciplinary optimization involving fluid dynamics, solid 
mechanics, materials and other disciplines. 




See also 


> Bilevel Programming: Applications in Engineering
> Interval Analysis: Application to Chemical Engineering Design Problems
> Multidisciplinary Design Optimization
> Multilevel Methods for Optimal Design
> Optimal Design of Composite Structures
> Optimal Design in Nonlinear Optics
> Structural Optimization: History
Introduction/Background


Model predictive control (MPC) is very popular for its capacity to deal with multivariable, constrained, model-based control problems for a variety of complex linear or nonlinear processes [13]. MPC is based on the receding horizon philosophy, where an open-loop, constrained optimal control problem is solved online at each sampling time to obtain the optimal control actions. The optimal control problem is solved repetitively, each time a new measurement or estimate of the state is available, thus establishing an implicit feedback control method [14,15]. The main reasons for the popularity of MPC are its optimal performance, its capability to handle constraints and its inherent robustness due to feedback control properties.

Despite the widely acknowledged capabilities of MPC, there are two main shortcomings that have been a major concern for the industrial and academic communities. The first shortcoming is that MPC implementation is limited to slowly varying processes, due to the demanding computational effort of solving the optimal control problem online. The second is that, despite its inherent robustness due to the implicit feedback, MPC cannot guarantee the satisfaction of constraints and optimal performance in the presence of uncertainties and input disturbances, since it usually relies on nominal models (uncertainty-free models) for the prediction of future states and control actions [14,20,22].

The first shortcoming of MPC can be overcome by employing the so-called parametric MPC (pMPC) or multiparametric MPC (mp-MPC) [4,16,20]. Parametric MPC controllers are based on the well-known parametric optimization techniques [9,18] for solving the open-loop optimal control problem offline and obtaining the complete map of the optimal control actions as functions of the states. Thus, a feedback control law is obtained offline, and the online computational effort is reduced to simple function evaluations of the feedback control law. The inevitable presence of uncertainties and disturbances has been ignored by the pMPC community, and only recently has research started focusing on control problems with uncertainty [2,20]. In traditional MPC, the issue of robustness under uncertainty has been dealt with using various methods, such as robust model predictive control [3,8], model predictive tubes [6,12] and min-max MPC [21,22]. However, this is still an unexplored area for pMPC, apart from the recent work presented in [2,20].
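The offline/online split can be illustrated on the smallest possible case, assumed here purely for exposition: a scalar system $x_{t+1} = a x_t + b u_t$, horizon N = 1, cost $p x_{t+1}^2 + r u_t^2$ and input bound $|u_t| \le u_{max}$. Because the cost is a convex quadratic in the single variable $u_t$ (p ≥ 0, r > 0 assumed), the constrained optimum is the unconstrained minimizer $u = -Kx$ clipped to the bound, so the whole map of optimal control actions is a piecewise-affine function of x with three regions. The brute-force search is only there to check the map; all names are hypothetical.

```python
from typing import Callable, Tuple


def offline_explicit_law(a: float, b: float, p: float, r: float,
                         u_max: float) -> Tuple[float, Callable[[float], float]]:
    """Offline: solve min_u p*(a x + b u)^2 + r u^2 s.t. |u| <= u_max for every x at once.

    The unconstrained minimizer is u = -K x; clipping it gives the constrained optimum,
    a piecewise-affine law (lower saturation, affine region, upper saturation).
    """
    K = p * a * b / (p * b * b + r)

    def law(x: float) -> float:
        # Online: only a function evaluation, no optimization problem is solved.
        return min(u_max, max(-u_max, -K * x))

    return K, law


def online_reference(a: float, b: float, p: float, r: float, u_max: float,
                     x: float, n_grid: int = 20001) -> float:
    """Brute-force grid search over u, standing in for an online optimizer (checking only)."""
    best_u, best_J = 0.0, float("inf")
    for i in range(n_grid):
        u = -u_max + 2.0 * u_max * i / (n_grid - 1)
        J = p * (a * x + b * u) ** 2 + r * u * u
        if J < best_J:
            best_u, best_J = u, J
    return best_u
```

In closed loop, the receding-horizon controller simply applies `law(x)` at each sampling time and re-evaluates it at the next measured state.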

In this manuscript we discuss the challenges of robust parametric model predictive control (RpMPC), and we present a method for RpMPC for linear, discrete-time dynamic systems with exogenous disturbances (input uncertainty) and a method for RpMPC for systems with model uncertainty. In both cases the uncertainty is described by the realistic scenario where no uncertainty model (stochastic or deterministic) is known, but it is assumed that the uncertainty variables satisfy a set of inequalities.


Definitions 
Consider the following linear, discrete-time system:

$$ \begin{aligned} x_{t+1} &= A x_t + B u_t + W \theta_t, \\ y_t &= C x_t + D u_t + F \theta_t, \end{aligned} \tag{1} $$

where $x \in X \subseteq \mathbb{R}^n$, $u \in U \subseteq \mathbb{R}^m$, $y \in Y \subseteq \mathbb{R}^q$ and $\theta \in \Theta \subseteq \mathbb{R}^w$ are the state, input, output and disturbance (or uncertain) input vectors, respectively, and A, B, C, D, W and F are matrices of appropriate dimensions. The disturbance input θ is assumed to be bounded in the set $\Theta = \{\theta \in \mathbb{R}^w \mid \theta_i^L \le \theta_i \le \theta_i^U,\ i = 1, \dots, w\}$. This type of uncertainty is used to characterize a broad variety of input disturbances and modeling uncertainties, including nonlinearities or hidden dynamics [7,11]. In general, this type of uncertainty may result in infeasibilities and performance degradation.


Definition 1 The robust controller is defined as the 
controller that provides a single control sequence that 
steers the plant into the feasible operating region for 
a specific range of variations in the uncertain variables. 


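The general robust parametric MPC (RpMPC) problem is defined as [20]

$$ \phi(x^*) = \min_{u^N} \; x_{t+N|t}^T P x_{t+N|t} + \sum_{k=0}^{N-1} \left[ y_{t+k|t}^T Q y_{t+k|t} + u_{t+k}^T R u_{t+k} \right] \tag{2} $$

subject to

$$ x_{t+k+1|t} = A x_{t+k|t} + B u_{t+k} + W \theta_{t+k}, \quad k \ge 0 \tag{3} $$

$$ y_{t+k|t} = C x_{t+k|t} + D u_{t+k} + F \theta_{t+k}, \quad k \ge 0 \tag{4} $$

$$ g(x_{t+k|t}, u_{t+k}) = C_1 x_{t+k|t} + C_2 u_{t+k} + C_3 \le 0, \quad k = 0, 1, \dots, N-1 \tag{5} $$

$$ h(x_{t+N|t}) = D_1 x_{t+N|t} + D_2 \le 0 \tag{6} $$

$$ u_{t+k} = K x_{t+k|t}, \quad k \ge N \tag{7} $$

$$ x_{t|t} = x^* \tag{8} $$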

Design of Robust Model-Based Controllers via Parametric Programming 




where $g: X \times U \to \mathbb{R}^{n_g}$ and $h: X \to \mathbb{R}^{n_h}$ are the path and terminal constraints, respectively, x* is the initial state, $u^N = \{u_t, \dots, u_{t+N-1}\} \in U \times \cdots \times U = U^N$ are the predicted future inputs, and $\theta^N = \{\theta_t, \dots, \theta_{t+N-1}\} \in \Theta^N$ are the current and future values of the disturbance.


Formulation 


The design of a robust control scheme is obtained by solving a receding horizon constrained optimal control problem, where the objective is either the expected value of the output and input deviations over the entire uncertainty set or their nominal value. In order to ensure feasibility of (2)-(8) for every possible uncertainty scenario $\theta_{t+k} \in \Theta$, $k = 0, \dots, N-1$, the set of constraints of (2)-(8) is usually augmented with an extra set of feasibility constraints. The type of these constraints, as will be described later, determines whether the RpMPC is an open-loop or a closed-loop controller.


Open-Loop Robust Parametric Model Predictive 
Controller 


To define the set of extra feasibility constraints, the future state prediction

$$ x_{t+k|t} = A^k x^* + \sum_{j=0}^{k-1}\left( A^j B\, u_{t+k-1-j} + A^j W \theta_{t+k-1-j} \right) \tag{9} $$

is substituted into the inequality constraints (5)-(6), which then become


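$$ g_j(x^*, u^N, \theta^N) \le 0,\ j = 1,\dots,J \;\Leftrightarrow\; \sum_{i=1}^{n} \gamma^1_{i,j} x^*_i + \sum_{k=0}^{N-1}\sum_{i=1}^{m} \gamma^2_{i,k,j} u_{t+k,i} + \sum_{k=0}^{N-1}\sum_{i=1}^{w} \gamma^3_{i,k,j} \theta_{t+k,i} + \gamma^4_j \le 0 \tag{10} $$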


where $\gamma^1$, $\gamma^2$, $\gamma^3$ and $\gamma^4$ are coefficients that are explicit functions of the elements of the matrices $A, B, C, D, W, F, C_1, C_2, C_3, D_1, D_2, Q, R, P$. The set of feasibility constraints is defined as


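$$ \psi(x^*, u^N) \le 0 \;\Leftrightarrow\; \forall \theta^N \in \Theta^N \left( \forall j = 1,\dots,J \left[ g_j(x^*, u^N, \theta^N) \le 0 \right] \right), \quad u^N \in U^N,\ x^* \in X \tag{11} $$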


The constraint $\psi \le 0$ ensures that, given a particular state realization $x^*$, the single control sequence $u^N$ satisfies all the constraints for all possible bounded disturbance scenarios over the time horizon. However, this feasibility constraint represents an infinite set of constraints, since the inequalities are defined for every possible value of $\theta^N \in \Theta^N$. To overcome this problem, note that (11) is equivalent to


$$ \psi(x^*, u^N) = \max_{\theta^N \in \Theta^N}\ \max_{j=1,\dots,J}\ g_j(x^*, u^N, \theta^N) \le 0, \quad u^N \in U^N,\ x^* \in X \tag{12} $$
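To make (9) and (12) concrete, the following sketch evaluates the closed-form prediction (9) and the worst-case value $\psi$ of (12) by brute force. It is not the parametric method of this article. It assumes a hypothetical scalar system with a scalar, box-bounded disturbance and path constraints of the form $c_1 x + c_2 u + c_3 \le 0$. Because each $g_j$ is affine in $\theta^N$, the maximum in (12) is attained at a vertex of $\Theta^N$, so enumerating the $2^N$ vertices is enough. That count grows exponentially with $N$, which is one motivation for the parametric treatment that follows.

```python
from itertools import product


def predict(a, b, w, x0, u, theta):
    """Closed-form prediction (9) for a scalar system x+ = a x + b u + w theta.

    Returns [x_{t|t}, x_{t+1|t}, ..., x_{t+N|t}] with
    x_{t+k|t} = a^k x0 + sum_{j=0}^{k-1} a^j (b u_{t+k-1-j} + w theta_{t+k-1-j}).
    """
    xs = []
    for k in range(len(u) + 1):
        s = a ** k * x0
        for j in range(k):
            s += a ** j * (b * u[k - 1 - j] + w * theta[k - 1 - j])
        xs.append(s)
    return xs


def psi(a, b, w, x0, u, constraints, theta_lo, theta_hi):
    """Worst-case value (12): max over theta^N in the box and over j of g_j.

    Each path constraint is a triple (c1, c2, c3) meaning
    c1 * x_{t+k} + c2 * u_{t+k} + c3 <= 0, imposed for k = 0..N-1.
    Every g_j is affine in theta^N, so only the box vertices are checked.
    """
    N = len(u)
    worst = float("-inf")
    for theta in product((theta_lo, theta_hi), repeat=N):
        xs = predict(a, b, w, x0, u, theta)
        for c1, c2, c3 in constraints:
            for k in range(N):
                worst = max(worst, c1 * xs[k] + c2 * u[k] + c3)
    return worst


# Hypothetical data: constraint x <= 2 along the horizon, theta in [-0.1, 0.1].
value = psi(0.9, 0.5, 0.2, 0.0, [0.1, -0.2, 0.3], [(1.0, 0.0, -2.0)], -0.1, 0.1)
print(value)  # value <= 0 means u^N is robustly feasible for this x*
```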


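Adding (12) to (2)-(8) and minimizing the expectation of the objective function (2) over all uncertainty realizations $\theta_{t+k}$, one obtains the following robust model predictive control problem:

$$ \phi(x^*) = \min_{u^N}\ \mathbb{E}_{\theta^N \in \Theta^N}\left[ x_{t+N|t}^T P\, x_{t+N|t} + \sum_{k=0}^{N-1}\left( y_{t+k|t}^T Q\, y_{t+k|t} + u_{t+k}^T R\, u_{t+k} \right) \right] \tag{13} $$
$$ \text{s.t. (3)-(8) and (11)} \tag{14} $$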


Problem (13)-(14) is a bilevel program whose constraint is a maximization problem. As shown later, this inner problem can be solved parametrically and then replaced by a set of linear inequalities in $u^N$ and $x^*$. The solution to this problem corresponds to a robust control law as defined in Definition 1. Problem (13)-(14) is an open-loop robust control formulation: it obtains the optimal control actions $u^N$ for the worst-case realization of the uncertainty only, as expressed by inequality (12). It also ignores the information about past uncertainty values that future measurements contain, thus losing the benefit of the prediction property. Exploiting that information would allow the future control actions to be readjusted to compensate for past uncertainty realizations, yielding more “realistic” and less conservative values for the optimal control actions. This can be achieved with the following closed-loop formulation of problem (2)-(8).
Closed-Loop Robust Parametric
Model-Based Control


To acquire a closed-loop formulation of the general 
RpMPC problem, a dynamic programming approach 
is used to formulate the worst-case closed-loop MPC 
problem, which requires the solution of a number of 
embedded optimization problems that in the case of 
a quadratic objective are non-linear and non-differen- 
tiable. Feasibility analysis is used to directly address 
the problem and a set of constraints is again incorpo- 
rated in the optimization problem to preserve feasibility 
and performance for all uncertainty realizations. Future 
measurements of the state contain information about 
the past uncertainty values. This implies that the future 
control actions can be readjusted to compensate for the 
past disturbance realizations by deriving a closed-loop 
MPC problem as shown next. The main idea is to intro- 
duce constraints into the control optimization problem 
(2)-(8) that preserve feasibility and performance for all 
disturbance realizations. These constraints are given as 


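$$
\begin{aligned}
&\psi^{\ell}\big(x^*, [u_{t+k}]_{k=0,\dots,\ell}, [\theta_{t+k}]_{k=0,\dots,\ell-1}\big) \le 0 \;\Leftrightarrow\; \\
&\quad \forall \theta_{t+\ell} \in \Theta\ \exists u_{t+\ell+1} \in U\ \forall \theta_{t+\ell+1} \in \Theta\ \exists u_{t+\ell+2} \in U \cdots \forall \theta_{t+N-2} \in \Theta\ \exists u_{t+N-1} \in U\ \forall \theta_{t+N-1} \in \Theta \\
&\quad \Big\{ \forall j = 1,\dots,J \big[ g_j\big(x^*, [u_{t+k}]_{k=0,\dots,N-1}, [\theta_{t+k}]_{k=0,\dots,N-1}\big) \le 0 \big] \Big\}, \\
&\quad u_{t+k} \in U,\ k = 0,\dots,\ell,\quad x^* \in X,\quad \theta_{t+k} \in \Theta,\ k = 0,\dots,\ell-1,\quad \ell = 0,\dots,N-1.
\end{aligned} \tag{15}
$$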


The constraints of (15) are incorporated into (2)- 
(8) and give rise to a semi-infinite dimensional program 
that can be posed as a min-max bilevel optimization 
problem: 


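$$ \phi(x^*) = \min_{u^N}\ \mathbb{E}_{\theta^N \in \Theta^N}\left[ x_{t+N|t}^T P\, x_{t+N|t} + \sum_{k=0}^{N-1}\left( y_{t+k|t}^T Q\, y_{t+k|t} + u_{t+k}^T R\, u_{t+k} \right) \right] \tag{16} $$
$$ \text{s.t.}\quad \max_{\theta_{t+N-1}}\ \max_{j}\ g_j(x^*, u^N, \theta^N) \le 0 \tag{17} $$
$$ \max_{\theta_{t+1}} \min_{u_{t+2}} \cdots \max_{\theta_{t+N-2}} \min_{u_{t+N-1}} \max_{\theta_{t+N-1}} \max_{j}\ g_j(x^*, u^N, \theta^N) \le 0 \tag{18} $$
$$ \max_{\theta_t} \min_{u_{t+1}} \max_{\theta_{t+1}} \min_{u_{t+2}} \cdots \max_{\theta_{t+N-2}} \min_{u_{t+N-1}} \max_{\theta_{t+N-1}} \max_{j}\ g_j(x^*, u^N, \theta^N) \le 0 \tag{19} $$
$$ u^N \in U^N,\quad x^* \in X,\quad \theta^N \in \Theta^N \tag{20} $$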
The difference between the above formulation and formulation (13)-(14) is that at every time instant $t+k$ the future control actions $\{u_{t+k+1}, \dots, u_{t+N-1}\}$ are readjusted to offset the effect of the past uncertainty $\{\theta_t, \dots, \theta_{t+k}\}$ so that the constraints are satisfied. In contrast, in formulation (13)-(14) the control sequence has to ensure constraint satisfaction for all possible disturbance scenarios. The main issue in solving the above optimization problem is how to solve each of (17)-(19) parametrically and replace them with a set of inequalities in $u^N$ and $x^*$ suitable for formulating a multiparametric programming problem. This is shown in the following section.
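The benefit of readjusting future inputs can be seen on a deliberately tiny, hypothetical example. Take a scalar integrator $x_{t+1} = x_t + u_t + \theta_t$ over $N = 2$ steps, with inputs restricted to a grid and $\theta_t \in \{-0.5, 0.5\}$. The sketch compares the worst-case terminal deviation $|x_{t+2}|$ under two structures. The first is an open-loop min-max, where one sequence must serve all disturbances as in (13)-(14). The second is the nested max-min structure of (16)-(19). It is a brute-force enumeration for intuition only, not the parametric procedure described next.

```python
def grid(lo, hi, step):
    """Evenly spaced candidate inputs lo, lo + step, ..., hi."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


U = grid(-3.0, 3.0, 0.5)  # hypothetical admissible input grid
THETA = (-0.5, 0.5)       # vertices of the disturbance interval


def terminal_state(x0, u0, th0, u1, th1):
    """Two steps of the toy dynamics x_{t+1} = x_t + u_t + theta_t."""
    return x0 + u0 + th0 + u1 + th1


def open_loop(x0):
    """min over (u0, u1) chosen up front, max over (theta0, theta1)."""
    return min(
        max(abs(terminal_state(x0, u0, t0, u1, t1)) for t0 in THETA for t1 in THETA)
        for u0 in U
        for u1 in U
    )


def closed_loop(x0):
    """min_u0 max_theta0 min_u1 max_theta1: u1 may react to the realized theta0."""
    return min(
        max(
            min(max(abs(terminal_state(x0, u0, t0, u1, t1)) for t1 in THETA) for u1 in U)
            for t0 in THETA
        )
        for u0 in U
    )


print(open_loop(1.0), closed_loop(1.0))  # prints 1.0 0.5
```

The open-loop sequence has to absorb both disturbances. The closed-loop structure lets $u_{t+1}$ cancel $\theta_t$, which leaves only the last disturbance uncompensated. This is the loss of conservatism described above.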


Methods/Applications 


Parametric Solution of the Inner Maximization 
Problem of the Open-Loop Robust pMPC Problem 


An algorithm for solving the maximization problem of (12) parametrically, which forms the inner maximization problem of the open-loop RpMPC (13)-(14), comprises the following steps:

Step 1. Solve $G_j(x^*, u^N) = \max_{\theta^N}\{ g_j(x^*, u^N, \theta^N) \mid \theta^{N,L} \le \theta^N \le \theta^{N,U} \}$, $j = 1,\dots,J$, as a parametric program with respect to $\theta^N$, recasting the control elements and future states as parameters. The parametric solution can be obtained by following the method in [19], where the critical disturbance points for each constraint $g_j$ are identified as follows:

1. If $\gamma^3_{i,k,j} = \partial g_j / \partial \theta_{t+k,i} > 0$, then $\theta^{cr_j}_{t+k,i} = \theta^U_{t+k,i}$, $j = 1,\dots,J$, $k = 0,\dots,N-1$, $i = 1,\dots,w$;
2. If $\gamma^3_{i,k,j} = \partial g_j / \partial \theta_{t+k,i} < 0$, then $\theta^{cr_j}_{t+k,i} = \theta^L_{t+k,i}$, $j = 1,\dots,J$, $k = 0,\dots,N-1$, $i = 1,\dots,w$.

Substituting $\theta^{cr_j}_{t+k,i}$ into the constraints $g_j \le 0$, we obtain $G_j(x^*, u^N) = g_j(x^*, u^N, \theta^{N,cr_j})$, where $\theta^{N,cr_j}$ is the sequence of critical values of the uncertainty vector $\theta$ over the horizon $N$.
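The critical-point rule of Step 1 can be sketched as follows for a single constraint $g_j$. The sketch makes several assumptions. The coefficients $\gamma^3$ are given numerically, and the disturbance bounds are the same at every time step. The part of $g_j$ that does not depend on $\theta$ ($\gamma^1 x^* + \gamma^2 u^N + \gamma^4$) is folded into one number `rest`. All names are hypothetical. The brute-force vertex enumeration is included only to confirm that the sign rule gives the same maximum.

```python
from itertools import product


def critical_theta(gamma3, lo, hi):
    """Step 1 rule: choose the bound that maximizes an affine g_j in theta.

    gamma3[k][i] is the coefficient of theta_{t+k,i} in g_j, and lo[i], hi[i]
    are the bounds of component i. A positive coefficient selects the upper
    bound and a negative one the lower bound. For a zero coefficient either
    bound is optimal; this sketch returns the lower one.
    """
    return [[hi[i] if g > 0 else lo[i] for i, g in enumerate(gk)] for gk in gamma3]


def G_value(gamma3, theta_cr, rest):
    """G_j = g_j at the critical point; rest = gamma1.x* + gamma2.u^N + gamma4."""
    return rest + sum(g * t for gk, tk in zip(gamma3, theta_cr) for g, t in zip(gk, tk))


def brute_force_max(gamma3, lo, hi, rest):
    """Reference check: maximize g_j over every vertex of the disturbance box."""
    terms = [(g, lo[i], hi[i]) for gk in gamma3 for i, g in enumerate(gk)]
    best = float("-inf")
    for choice in product(*[(l, h) for _, l, h in terms]):
        best = max(best, rest + sum(g * c for (g, _, _), c in zip(terms, choice)))
    return best


# Hypothetical data: horizon N = 2, disturbance dimension w = 2.
gamma3 = [[0.5, -1.0], [2.0, 0.0]]
lo, hi = [-1.0, -2.0], [1.0, 3.0]
cr = critical_theta(gamma3, lo, hi)
print(cr, G_value(gamma3, cr, 0.25), brute_force_max(gamma3, lo, hi, 0.25))
```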
Step 2. Compare the parametric profiles $G_j(x^*, u^N)$ over the joint space of $u^N$ and $x^*$ and retain the upper bounds. A multiparametric linear program is formulated:

$$ \psi(x^*, u^N) = \max_j G_j \;\Leftrightarrow\; \psi(x^*, u^N) = \min_{\varepsilon}\{ \varepsilon \mid \varepsilon \ge G_j,\ j = 1,\dots,J \}, \quad u^N \in U^N,\ x^* \in X \tag{21} $$

which is equivalent to the comparison procedure of [1].


Step 3. Problem (21) is a multiparametric linear programming problem. Hence its solution consists of a set of piecewise linear expressions $\psi_i$ in terms of the parameters $u^N$ and $x^*$, together with a set of regions $\Psi_i$, $i = 1,\dots,N_{reg}$, where these expressions are valid. This statement was proven in [20], sect. 2.2, theorem 2.1, and in [10]. Note that no region $\Psi_s$ exists such that $\psi_s < \psi_i$ for all $\{x^*, u^N\} \in \Psi_s$ and all $i \ne s$, since $\psi$ is convex. Thus, inequality (11) can be replaced by the inequalities $\psi_i(x^*, u^N) \le 0$. In this way, problem (13)-(14) can be recast as a single-level stochastic program:

$$ \phi(x^*) = \min_{u^N \in U^N}\left\{ \Phi(x^*, u^N, \theta^N) \;\middle|\; g_j(x^*, u^N, \theta^{N,n}) \le 0,\ j = 1,\dots,J;\ \ \psi_i(x^*, u^N) \le 0,\ i = 1,\dots,N_{reg} \right\} \tag{22} $$

where $\Phi$ is the quadratic objective (13) after substituting (9). The superscript $n$ in $\theta^{N,n}$ denotes the nominal value of $\theta$, which is usually zero. An approximate solution to this stochastic problem can be obtained by discretizing the uncertainty space into a finite set of scenarios $\theta^{N,i}$, $i = 1,\dots,n_s$, with associated objective weights [20]. This leads to a multiperiod optimization problem in which each period corresponds to a particular uncertainty scenario. By treating the control variables $u^N$ as the optimization variables and the current state $x^*$ as parameters, (22) is recast as a multiparametric quadratic program.


Theorem 1 The solution of (22) is a piecewise linear control law $u_t(x^*) = A_c x^* + b_c$, valid in the polyhedral critical region $CR_c x^* + cr_c \le 0$, $c = 1,\dots,N_c$. This law guarantees that (5) and (6) are feasible for all $\theta_{t+k} \in \Theta$, $k = 0,\dots,N-1$.

The proof of the theorem follows directly from (21) and [20] and is omitted for brevity’s sake. The theorem shows that the solution to (22), and hence to (13)-(14), can be obtained as an explicit multiparametric solution [9].
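Online, an explicit law such as the one in Theorem 1 is evaluated in two steps. First, find the critical region that contains the measured state. Second, apply that region's affine law. The sketch below assumes the regions are stored as hypothetical tuples $(CR_c, cr_c, A_c, b_c)$ of plain lists and scanned sequentially. Faster search structures exist, but sequential search keeps the idea visible.

```python
def explicit_control(x, regions, tol=1e-9):
    """Evaluate u_t(x*) = A_c x* + b_c for the first region with CR_c x* + cr_c <= 0.

    regions: list of (CR, cr, A, b), where CR and A are lists of rows.
    Returns None when x* lies outside every stored critical region.
    """
    for CR, cr, A, b in regions:
        inside = all(
            sum(r * xi for r, xi in zip(row, x)) + c <= tol for row, c in zip(CR, cr)
        )
        if inside:
            return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]
    return None


# Hypothetical one-dimensional map on -5 <= x <= 5 with two critical regions.
regions = [
    ([[1.0], [-1.0]], [0.0, -5.0], [[-0.5]], [0.0]),  # -5 <= x <= 0: u = -0.5 x
    ([[-1.0], [1.0]], [0.0, -5.0], [[-2.0]], [0.0]),  #  0 <= x <= 5: u = -2 x
]
print(explicit_control([-2.0], regions), explicit_control([6.0], regions))
```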


Solution of the Closed-Loop RpMPC Problem 


In order to solve problem (16)-(20), the inner max-min-max problems in (17)-(19) have to be solved parametrically and replaced by simpler linear inequalities, so that the resulting problem is a simple multiparametric quadratic program. For simplicity, we only present an algorithm for the most difficult problem, (19); the same procedure can be applied to the remaining constraints. The algorithm consists of the following steps:


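Step 1. Solve

$$ G_j^{\theta_{t+N-1}}\big(x^*, u^N, [\theta_{t+k}]_{k=0,\dots,N-2}\big) = \max_{\theta_{t+N-1}}\left\{ g_j(x^*, u^N, \theta^N) \;\middle|\; \theta^L \le \theta_{t+N-1} \le \theta^U \right\}, \quad j = 1,\dots,J, \tag{23} $$

as a multiparametric optimization problem by recasting $x^*$ and $u^N$ as parameters and by following again the method of [19] or [20], sect. 2.2.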


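Step 2. Compare the parametric profiles $G_j^{\theta_{t+N-1}}(x^*, u^N, [\theta_{t+k}]_{k=0,\dots,N-2})$ over the joint space of $u^N$, $[\theta_{t+k}]_{k=0,\dots,N-2}$ and $x^*$ to retain the upper bounds. For this comparison, a multiparametric program is formulated and then solved by following the comparison procedure in [1]:

$$
\begin{aligned}
\psi^{\theta_{t+N-1}}\big(x^*, u^N, [\theta_{t+k}]_{k=0,\dots,N-2}\big) &= \max_j G_j^{\theta_{t+N-1}} \;\Leftrightarrow\; \\
\psi^{\theta_{t+N-1}}\big(x^*, u^N, [\theta_{t+k}]_{k=0,\dots,N-2}\big) &= \min_{\varepsilon}\left\{ \varepsilon \;\middle|\; G_j^{\theta_{t+N-1}} \le \varepsilon,\ j = 1,\dots,J \right\}.
\end{aligned} \tag{24}
$$

The solution of this optimization problem consists of two parts. The first is a set of linear expressions $\psi_i^{\theta_{t+N-1}}$, $i = 1,\dots,N_{reg}^{\theta_{t+N-1}}$, in terms of the parameters $x^*$, $u^N$, $[\theta_{t+k}]_{k=0,\dots,N-2}$. The second is a set of polyhedral regions $R_i^{\theta_{t+N-1}}$, $i = 1,\dots,N_{reg}^{\theta_{t+N-1}}$, where these expressions are valid.

Step 3. Set $\ell = N-1$.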
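Step 4. Solve the following multiparametric optimization problem over $u_{t+\ell}$:

$$
\begin{aligned}
&\psi^{u_{t+\ell}}\big(x^*, [u_{t+k}]_{k=0,\dots,\ell-1}, [\theta_{t+k}]_{k=0,\dots,\ell-1}\big) \\
&\quad = \min_{\varepsilon,\; u_{t+\ell} \in U}\left\{ \varepsilon \;\middle|\; \psi_i^{\theta_{t+\ell}}\big(x^*, [u_{t+k}]_{k=0,\dots,\ell}, [\theta_{t+k}]_{k=0,\dots,\ell-1}\big) \le \varepsilon,\ i = 1,\dots,N_{reg}^{\theta_{t+\ell}} \right\}.
\end{aligned} \tag{25}
$$

This problem can be solved parametrically by following the procedure in [20], appendix A, or [17], chap. 3, sect. 3.2. Its solution is a convex piecewise affine function $\psi^{u_{t+\ell}}$ of the parameters $x^*$, $[u_{t+k}]_{k=0,\dots,\ell-1}$ and $[\theta_{t+k}]_{k=0,\dots,\ell-1}$. This function is defined over a set of polyhedral regions $R_i^{u_{t+\ell}}$, $i = 1,\dots,N_{reg}^{u_{t+\ell}}$.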
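For intuition on Step 4, consider the special case of a scalar input $u_{t+\ell}$ with all other parameters held fixed. Each affine piece $\psi_i^{\theta_{t+\ell}}$ then reduces to $\alpha_i u + d_i$. Problem (25) becomes the minimization of a convex piecewise affine function over an interval, and its minimum lies at an endpoint or where two pieces cross. The sketch below enumerates those candidates. It uses hypothetical data and solves pointwise rather than parametrically, so it illustrates the structure of (25) rather than the procedure of [20].

```python
def min_max_affine(pieces, u_lo, u_hi):
    """Minimize f(u) = max_i (alpha_i * u + d_i) over the interval [u_lo, u_hi].

    f is convex and piecewise affine, so its minimum over an interval is
    attained at an endpoint or where two pieces cross. Returns (f_min, u_argmin).
    """
    def f(u):
        return max(alpha * u + d for alpha, d in pieces)

    candidates = [u_lo, u_hi]
    for i, (a1, d1) in enumerate(pieces):
        for a2, d2 in pieces[i + 1:]:
            if a1 != a2:
                u = (d2 - d1) / (a1 - a2)  # crossing point of the two lines
                if u_lo <= u <= u_hi:
                    candidates.append(u)
    best = min(candidates, key=f)
    return f(best), best


# Hypothetical pieces (alpha_i, d_i) for fixed x*, past inputs and disturbances.
print(min_max_affine([(1.0, -0.2), (-0.5, 0.1), (0.0, -0.3)], -2.0, 2.0))
```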

Step 5. Set $\ell = \ell - 1$ and solve the following maximization problem over $\theta_{t+\ell}$:

$$ \psi^{\theta_{t+\ell}}\big(x^*, [u_{t+k}]_{k=0,\dots,\ell}, [\theta_{t+k}]_{k=0,\dots,\ell-1}\big) = \max_{\theta_{t+\ell}}\left\{ \psi^{u_{t+\ell+1}}\big(x^*, [u_{t+k}]_{k=0,\dots,\ell}, [\theta_{t+k}]_{k=0,\dots,\ell}\big) \;\middle|\; \theta^L \le \theta_{t+\ell} \le \theta^U \right\}. \tag{26} $$

Since the function maximized on the right-hand side of this equality is convex and piecewise affine, its maximization with respect to $\theta_{t+\ell}$ reduces to the method of [19] followed by a comparison procedure as described in step 2.


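Step 6. If $\ell > 0$, go to step 4. Otherwise, terminate the procedure and store the affine functions $\psi_i^{\theta_t}(x^*, u_t)$, $i = 1,\dots,N_{reg}^{\theta_t}$.

Step 7. The inequalities $\psi_i^{\theta_t}(x^*, u_t) \le 0$ replace the max-min-max constraint (19). Similarly, the remaining max-min-max constraints are replaced by the sets of inequalities

$$ \psi_i^{\theta_{t+\ell}}\big(x^*, [u_{t+k}]_{k=0,\dots,\ell}, [\theta_{t+k}]_{k=0,\dots,\ell-1}\big) \le 0, \quad i = 1,\dots,N_{reg}^{\theta_{t+\ell}},\ \ \ell = 1,\dots,N-1. $$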


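Replacing the max-min-max constraints of (16)-(20) by the inequalities of step 7, we obtain the following stochastic multiparametric program:

$$
\begin{aligned}
\phi(x^*) = \min_{u^N}\ & \mathbb{E}_{\theta^N \in \Theta^N}\left\{ \Phi(x^*, u^N, \theta^N) \right\} \\
\text{s.t.}\ & g(x^*, u^N, \theta^{N,n}) \le 0 \\
& \psi_i^{\theta_t}(x^*, u_t) \le 0, \quad i = 1,\dots,N_{reg}^{\theta_t} \\
& \psi_i^{\theta_{t+1}}\big(x^*, [u_t^T, u_{t+1}^T]^T, \theta_t\big) \le 0, \quad i = 1,\dots,N_{reg}^{\theta_{t+1}} \\
& \qquad \vdots \\
& \psi_i^{\theta_{t+N-2}}\big(x^*, [u_{t+k}]_{k=0,\dots,N-2}, [\theta_{t+k}]_{k=0,\dots,N-3}\big) \le 0, \quad i = 1,\dots,N_{reg}^{\theta_{t+N-2}} \\
& \psi_i^{\theta_{t+N-1}}\big(x^*, [u_{t+k}]_{k=0,\dots,N-1}, [\theta_{t+k}]_{k=0,\dots,N-2}\big) \le 0, \quad i = 1,\dots,N_{reg}^{\theta_{t+N-1}}
\end{aligned} \tag{27}
$$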


where $\Phi$ is again the quadratic objective function in (16). The expectation of the value function is discretized over a set of discrete uncertainty scenarios. Treating the current state $x^*$ as the parameter and the control actions as optimization variables, the problem is then recast as a parametric quadratic program. The solution is a complete map of the control variables in terms of the current state. The results for the closed-loop RpMPC controller are summarized in the following theorem.


Theorem 2 The solution of (27) is obtained as a piecewise linear control law $u_t(x^*) = A_c x^* + b_c$ and a set of polyhedral regions $CR_c = \{x^* \in X \mid CR_c x^* + cr_c \le 0\}$ in the state space for which system (1) satisfies constraints (5)-(6) for all $\theta^N \in \Theta^N$.


Cases 

A special case of the RpMPC problem (2)-(8) arises when the system matrices in the first equation of (1) are uncertain, in that their entries are unknown but lie within specific bounds. For simplicity, we consider the case where $W = F = 0$ and the entries $a_{ij}$ and $b_{i\ell}$ of the matrices $A$ and $B$ are not known but satisfy

$$ a_{ij} = \bar a_{ij} + \delta a_{ij}, \qquad b_{i\ell} = \bar b_{i\ell} + \delta b_{i\ell}, $$
$$ \delta a_{ij} \in \mathcal{A}_{ij} = \{\delta a_{ij} \in \mathbb{R} \mid -\varepsilon|\bar a_{ij}| \le \delta a_{ij} \le \varepsilon|\bar a_{ij}|\}, \tag{28} $$
$$ \delta b_{i\ell} \in \mathcal{B}_{i\ell} = \{\delta b_{i\ell} \in \mathbb{R} \mid -\varepsilon|\bar b_{i\ell}| \le \delta b_{i\ell} \le \varepsilon|\bar b_{i\ell}|\}, \tag{29} $$

where $\bar a_{ij}$ and $\bar b_{i\ell}$ are the nominal values of the entries of $A$ and $B$, respectively. The terms $\delta a_{ij}$ and $\delta b_{i\ell}$ denote the uncertainty in the matrix entries, which is assumed to be bounded as in (28)-(29). The general RpMPC formulation (2)-(8) must be redefined to include this model uncertainty by adding the extra constraints

$$ a_{ij} = \bar a_{ij} + \delta a_{ij}, \quad b_{i\ell} = \bar b_{i\ell} + \delta b_{i\ell}, \quad \forall \delta a_{ij} \in \mathcal{A}_{ij},\ \forall \delta b_{i\ell} \in \mathcal{B}_{i\ell}. \tag{30} $$

The new formulation of the RpMPC, (2)-(8) and (30), gives rise to a semi-infinite dimensional problem with a rather high computational complexity.


Definition 2 A feasible solution uN for problem (2)- 
(8) and (30), for a given initial state x*, is called a robust 
or reliable solution. 


Obviously, a robust solution for a given $x^*$ is a control sequence $u^N$ (future prediction vector) for which constraints (5)-(6) are satisfied for all admissible values of the uncertainty. Since this MPC formulation is difficult to solve with the known parametric optimization methods, the problem must be reformulated in a multiparametric quadratic programming (mpQP) form. Our objective in this section is to obtain such a form by considering the worst-case values of the uncertainty, i.e. those values of the uncertain parameters for which the linear inequalities of (5)-(6) are critically satisfied. Usually, the objective function (2) is formulated to penalize the nominal system behavior, so one must substitute the nominal prediction $x_{t+k|t} = \bar A^k x^* + \sum_{j=0}^{k-1} \bar A^j \bar B u_{t+k-1-j}$ in (2). The objective function is then a quadratic function of $u^N$ and $x^*$. Finally, the uncertain evolution of the system, $x_{t+k|t} = A^k x^* + \sum_{j=0}^{k-1} A^j B u_{t+k-1-j}$, is substituted into the constraints (5)-(6) to obtain a set of linear inequalities. This gives the following formulation of the RpMPC:


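$$ \phi(x^*) = \min_{u^N}\ \tfrac{1}{2}(u^N)^T H u^N + (x^*)^T F u^N + \tfrac{1}{2}(x^*)^T Y x^* \tag{31} $$
$$ \text{s.t.}\quad C_{1i} A^k x^* + \sum_{j=0}^{k-1} C_{1i} A^j B\, u_{t+k-1-j} + C_{2i} u_{t+k} + C_{3i} \le 0, \quad k = 1,\dots,N-1,\ i = 1,\dots,m_g, \tag{32} $$
$$ D_{1i} A^N x^* + \sum_{j=0}^{N-1} D_{1i} A^j B\, u_{t+N-1-j} + D_{2i} \le 0, \quad i = 1,\dots,m_h, \tag{33} $$
$$ \forall \delta a_{ij} \in \mathcal{A}_{ij},\ \forall \delta b_{i\ell} \in \mathcal{B}_{i\ell},\quad i,j = 1,\dots,n,\ \ell = 1,\dots,m. \tag{34} $$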


It is evident that the new formulation of the RpMPC problem, (31)-(34), is also a semi-infinite dimensional problem. This formulation can be simplified further by noting that, for any uncertain matrices $A$ and $B$, the entries of the matrices $A^k$ and $A^k B$ for all $k \ge 0$ are given, respectively, by [17]


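$$ a^k_{i\ell} = \bar a^k_{i\ell} + \delta a^k_{i\ell}, \qquad -\varepsilon|\delta a^k_{i\ell,\min}| \le \delta a^k_{i\ell} \le \varepsilon|\delta a^k_{i\ell,\max}|, \tag{35} $$
$$ ab^k_{i\ell} = \overline{ab}^k_{i\ell} + \delta ab^k_{i\ell}, \qquad -\varepsilon|\delta ab^k_{i\ell,\min}| \le \delta ab^k_{i\ell} \le \varepsilon|\delta ab^k_{i\ell,\max}|. \tag{36} $$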


The analysis of (35)-(36) follows from [17], chap. 3,
and is omitted for brevity’s sake.
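In [17], the bounds in (35)-(36) are derived analytically. The sketch below substitutes a different technique: elementary interval arithmetic, which gives a valid but possibly conservative enclosure of every entry of $A^k$ starting from the element-wise bounds (28). Multiplying the resulting interval matrix by an interval version of $B$ in the same way would bound $A^k B$. Function names are hypothetical. For simplicity, the example treats every entry as uncertain, whereas Example 2 below perturbs only $a_{11}$ and $b_1$.

```python
def uncertain_matrix(nominal, eps):
    """Element-wise bounds (28): a_ij in [abar - eps|abar|, abar + eps|abar|]."""
    return [[(v - eps * abs(v), v + eps * abs(v)) for v in row] for row in nominal]


def interval_matmul(A, B):
    """Product of interval matrices whose entries are (lo, hi) tuples.

    Each scalar product [a_lo, a_hi] * [b_lo, b_hi] is bounded by the min and
    max of the four endpoint products; the bounds are then summed over k.
    """
    C = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            lo = hi = 0.0
            for k in range(len(B)):
                (a_lo, a_hi), (b_lo, b_hi) = A[i][k], B[k][j]
                prods = (a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi)
                lo += min(prods)
                hi += max(prods)
            row.append((lo, hi))
        C.append(row)
    return C


def interval_power(A, k):
    """Enclosure of A^k for k >= 1. Each factor is bounded independently, so
    the result also covers time-varying A and may be wider than needed."""
    P = A
    for _ in range(k - 1):
        P = interval_matmul(P, A)
    return P


A_nominal = [[0.7326, -0.0861], [0.1722, 0.0064]]  # nominal A of Example 2
print(interval_power(uncertain_matrix(A_nominal, 0.10), 3))
```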


Robust Counterpart (RC) Problem 


Using the basic properties of matrix multiplication and
(35)-(36), problem (31)-(34) is reformulated as


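$$ \phi(x^*) = \min_{u^N}\ \tfrac{1}{2}(u^N)^T H u^N + (x^*)^T F u^N + \tfrac{1}{2}(x^*)^T Y x^* \tag{37} $$
$$
\begin{aligned}
\text{s.t.}\quad & \sum_{j=0}^{k-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} C_{1iq}\, ab^j_{q\ell}\, u_{t+k-1-j,\ell} + C_{2i} u_{t+k} + \sum_{q=1}^{n}\sum_{\ell=1}^{n} C_{1iq}\, a^k_{q\ell}\, x^*_\ell + C_{3i} \le 0, \\
& k = 1,\dots,N-1,\ i = 1,\dots,m_g,
\end{aligned} \tag{38}
$$
$$ \sum_{j=0}^{N-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} D_{1iq}\, ab^j_{q\ell}\, u_{t+N-1-j,\ell} + \sum_{q=1}^{n}\sum_{\ell=1}^{n} D_{1iq}\, a^N_{q\ell}\, x^*_\ell + D_{2i} \le 0, \quad i = 1,\dots,m_h, \tag{39} $$
$$ \forall \delta a_{ij} \in \mathcal{A}_{ij},\ \forall \delta b_{i\ell} \in \mathcal{B}_{i\ell},\quad i,j = 1,\dots,n,\ \ell = 1,\dots,m. \tag{40} $$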
This is a robust multiparametric QP problem (robust mp-QP) with three features. The coefficients of the linear inequalities in the constraints are uncertain, the vector $u^N$ is the optimization variable, and the initial states $x^*$ are the parameters. A similar robust LP problem, in which the coefficients of the linear constraints are uncertain as in (35)-(36), was studied in [5]; however, no multiparametric programming problems were considered there.
In a similar fashion to the analysis in [5], we construct the robust counterpart of the robust mp-QP problem (37)-(40):


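$$ \phi(x^*) = \min_{u^N}\ \tfrac{1}{2}(u^N)^T H u^N + (x^*)^T F u^N + \tfrac{1}{2}(x^*)^T Y x^* \tag{41} $$
$$
\begin{aligned}
\text{s.t.}\quad & \sum_{j=0}^{k-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} C_{1iq}\,\overline{ab}^j_{q\ell}\,u_{t+k-1-j,\ell} + \varepsilon\sum_{j=0}^{k-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} \max\{|C_{1iq}||\delta ab^j_{q\ell,\min}|,\ |C_{1iq}||\delta ab^j_{q\ell,\max}|\}\,|u_{t+k-1-j,\ell}| + C_{2i} u_{t+k} \\
& + \sum_{q=1}^{n}\sum_{\ell=1}^{n} C_{1iq}\,\bar a^k_{q\ell}\,x^*_\ell + \varepsilon\sum_{q=1}^{n}\sum_{\ell=1}^{n} \max\{|C_{1iq}||\delta a^k_{q\ell,\min}|,\ |C_{1iq}||\delta a^k_{q\ell,\max}|\}\,|x^*_\ell| + C_{3i} \le 0, \\
& k = 1,\dots,N-1,\ i = 1,\dots,m_g,
\end{aligned} \tag{42}
$$
$$
\begin{aligned}
& \sum_{j=0}^{N-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} D_{1iq}\,\overline{ab}^j_{q\ell}\,u_{t+N-1-j,\ell} + \varepsilon\sum_{j=0}^{N-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} \max\{|D_{1iq}||\delta ab^j_{q\ell,\min}|,\ |D_{1iq}||\delta ab^j_{q\ell,\max}|\}\,|u_{t+N-1-j,\ell}| \\
& + \sum_{q=1}^{n}\sum_{\ell=1}^{n} D_{1iq}\,\bar a^N_{q\ell}\,x^*_\ell + \varepsilon\sum_{q=1}^{n}\sum_{\ell=1}^{n} \max\{|D_{1iq}||\delta a^N_{q\ell,\min}|,\ |D_{1iq}||\delta a^N_{q\ell,\max}|\}\,|x^*_\ell| + D_{2i} \le 0, \quad i = 1,\dots,m_h,
\end{aligned} \tag{43}
$$
$$ u^N \in U^N,\quad x^* \in X. \tag{44} $$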


In this way, the initial semi-infinite dimensional problem (37)-(40) becomes the above multiparametric non-linear program (mp-NLP); the non-linearity comes from the absolute values of $u$ and $x^*$. However, the parametric solution of this mp-NLP problem is still very difficult.
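The step from (37)-(40) to (42)-(43) rests on a worst-case bound for a linear expression whose coefficients vary in intervals. In the sketch below, each coefficient $\bar c_j$ is perturbed within $[-\varepsilon|m_j^{\min}|, \varepsilon|m_j^{\max}|]$, playing the role of the products $C_{1iq}\,\delta ab$ in (42). Function names are hypothetical. The sketch compares the exact worst case with the bound $\varepsilon \max\{|m^{\min}|, |m^{\max}|\}\,|v|$ used in (42)-(43). The bound equals the exact value when the deviation interval is symmetric and is conservative otherwise.

```python
def rc_lhs(cbar, m_min, m_max, v, c0, eps):
    """Left-hand side in the style of (42)-(43): the nominal part plus
    eps * max(|m_min_j|, |m_max_j|) * |v_j| for every uncertain coefficient."""
    nominal = sum(c * x for c, x in zip(cbar, v))
    robust = eps * sum(max(abs(lo), abs(hi)) * abs(x) for lo, hi, x in zip(m_min, m_max, v))
    return nominal + robust + c0


def exact_worst_lhs(cbar, m_min, m_max, v, c0, eps):
    """Exact max of sum_j (cbar_j + delta_j) v_j + c0 over
    delta_j in [-eps|m_min_j|, eps|m_max_j|]: take the upper deviation
    when v_j > 0 and the lower one otherwise."""
    total = c0
    for c, lo, hi, x in zip(cbar, m_min, m_max, v):
        delta = eps * abs(hi) if x > 0 else -eps * abs(lo)
        total += (c + delta) * x
    return total


# Hypothetical asymmetric deviations: the RC bound over-approximates the exact value.
args = ([1.0, -2.0], [1.0, 0.2], [0.5, 0.5], [2.0, -1.0], -5.0, 0.1)
print(rc_lhs(*args), exact_worst_lhs(*args))
```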


Interval Robust Counterpart Problem 


The interval robust counterpart (IRC) problem can 
then be formulated as follows: 


$$ \phi(x^*) = \min_{u^N,\, z,\, w}\ \tfrac{1}{2}(u^N)^T H u^N + (x^*)^T F u^N + \tfrac{1}{2}(x^*)^T Y x^* \tag{45} $$
$$
\begin{aligned}
\text{s.t.}\quad & \sum_{j=0}^{k-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} C_{1iq}\,\overline{ab}^j_{q\ell}\,u_{t+k-1-j,\ell} + \varepsilon\sum_{j=0}^{k-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} \max\{|C_{1iq}||\delta ab^j_{q\ell,\min}|,\ |C_{1iq}||\delta ab^j_{q\ell,\max}|\}\,z_{t+k-1-j,\ell} + C_{2i} u_{t+k} \\
& + \sum_{q=1}^{n}\sum_{\ell=1}^{n} C_{1iq}\,\bar a^k_{q\ell}\,x^*_\ell + \varepsilon\sum_{q=1}^{n}\sum_{\ell=1}^{n} \max\{|C_{1iq}||\delta a^k_{q\ell,\min}|,\ |C_{1iq}||\delta a^k_{q\ell,\max}|\}\,w_\ell + C_{3i} \le 0, \\
& k = 1,\dots,N-1,\ i = 1,\dots,m_g,
\end{aligned} \tag{46}
$$
$$
\begin{aligned}
& \sum_{j=0}^{N-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} D_{1iq}\,\overline{ab}^j_{q\ell}\,u_{t+N-1-j,\ell} + \varepsilon\sum_{j=0}^{N-1}\sum_{q=1}^{n}\sum_{\ell=1}^{m} \max\{|D_{1iq}||\delta ab^j_{q\ell,\min}|,\ |D_{1iq}||\delta ab^j_{q\ell,\max}|\}\,z_{t+N-1-j,\ell} \\
& + \sum_{q=1}^{n}\sum_{\ell=1}^{n} D_{1iq}\,\bar a^N_{q\ell}\,x^*_\ell + \varepsilon\sum_{q=1}^{n}\sum_{\ell=1}^{n} \max\{|D_{1iq}||\delta a^N_{q\ell,\min}|,\ |D_{1iq}||\delta a^N_{q\ell,\max}|\}\,w_\ell + D_{2i} \le 0, \quad i = 1,\dots,m_h,
\end{aligned} \tag{47}
$$
$$ -z_{t+k-1-j,\ell} \le u_{t+k-1-j,\ell} \le z_{t+k-1-j,\ell}, \tag{48} $$
$$ -w_\ell \le x^*_\ell \le w_\ell, \tag{49} $$
$$ u^N \in U^N,\quad x^* \in X, \tag{50} $$

where the non-linear inequalities (42)-(43) have been replaced by four new linear inequalities, (46)-(49). Two new variables, $z$ and $w$, have been introduced to replace the absolute values of $u_{t+k-1-j,\ell}$ and $x^*_\ell$, thus leading to the relaxed IRC problem.

Design of Robust Model-Based Controllers via Parametric Programming, Figure 1
Critical regions for the nominal parametric MPC and state trajectory
[Figure: 19 feasible region fragments, critical regions CR001 to CR019]

The IRC is an mp-QP problem with a quadratic index and linear inequalities. Its optimization variables are now the vectors $u_{t+k-1-j}$, $z_{t+k-1-j}$ and $w$, and its parameters are the states $x^*$. The IRC problem can be solved with the known parametric optimization methods [4,9,16], since the objective function is strictly convex by assumption. The optimal control inputs $u^N$ and the optimization variables $z$ and $w$, and hence the optimal control $u_t$, can then be obtained as explicit functions $u^N(x^*)$, $z(x^*)$ and $w(x^*)$ of the initial state $x^*$. In particular, the control input $u_t$ is obtained as the explicit, optimal control law [9] $u_t(x^*) = A_c x^* + b_c$. This law is valid in the polyhedral region $CR_c = \{x^* \in X \mid CR_c x^* + cr_c \le 0\}$, $c = 1,\dots,N_c$, where $N_c$ is the number of critical regions obtained from the parametric programming algorithm.

Design of Robust Model-Based Controllers via Parametric Programming, Figure 2
Magnification of Fig. 1 around the state trajectory at the second time instant


The general RpMPC problem that arises when the dynamic system (1) is subject to model uncertainty has now been transformed into the IRC problem, which can be solved as an mp-QP problem. A feasible solution for the IRC problem is also a feasible solution for the RC, and hence for the initial RpMPC problem (2)-(8) and (30). The reason is that $z \ge |u|$ and $w \ge |x^*|$ by (48)-(49), so the left-hand sides of (46)-(47) are upper bounds on those of (42)-(43). Hence:


Lemma 1 If $u^N$ is a feasible solution for the IRC problem, then it is also a feasible solution for the RC problem, and hence it is a robust solution for the initial RpMPC problem (2)-(8), (30).


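Example 2. Consider a two-dimensional, discrete-time linear system (1) where $W = F = 0$ and

$$ A = \begin{bmatrix} 0.7326 + \delta a & -0.0861 \\ 0.1722 & 0.0064 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.0609 + \delta b \\ 0.0064 \end{bmatrix}, \tag{51} $$

where the entries $a_{11}$ and $b_1$ of the $A$ and $B$ matrices are uncertain. The deviations $\delta a$ and $\delta b$ are bounded as in (28)-(29) with $\varepsilon = 10\%$, and the nominal values are $\bar a_{11} = 0.7326$ and $\bar b_1 = 0.0609$. The state and control constraints are $-3 \le [0\ \ 1.4142]\,x \le 3$ and $-2 \le u \le 2$, and the terminal constraint is

$$ \begin{bmatrix} 0.070251 & 1 \\ -0.070251 & -1 \\ 0.21863 & 1 \\ -0.21863 & -1 \end{bmatrix} x \le \begin{bmatrix} 0.02743 \\ 0.02743 \\ 0.022154 \\ 0.022154 \end{bmatrix}. \tag{52} $$

Moreover,

$$ Q = \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix}, \qquad R = 0.01, \qquad P = \begin{bmatrix} 1.8588 & 1.2899 \\ 1.2899 & 6.7864 \end{bmatrix}. \tag{53} $$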


Initially, the MPC problem (2)-(8) is formulated and solved only for the nominal values of $A$ and $B$, which requires solving a multiparametric quadratic programming problem as described in [4,16]. Then the IRC problem is formulated as in (45)-(50) and solved with the POP software [9]. The resulting regions for the two cases are shown in Figs. 1 and 3, respectively. Simulations of the state trajectories of the nominal and the uncertain system are also shown in Figs. 1 and 3, respectively. In these simulations, the uncertain parameters $\delta a$ and $\delta b$ were generated as a sequence of random numbers taking values on the upper or lower bounds of $\delta a$ and $\delta b$, i.e. a time-varying uncertainty. Fig. 1, together with Fig. 2 (the magnified area around the state trajectory at the second time instant), shows that the nominal solution to problem (2)-(8) cannot guarantee robustness in the presence of the uncertainty: the nominal system trajectory violates the constraints. On the other hand, the controller obtained with the method discussed here keeps the trajectory in the set of feasible initial states (given by the critical regions of the parametric solution) and drives it close to the origin. Note that the space of feasible initial states given by the critical regions of the parametric solution (Fig. 3) is smaller than in the nominal system's case (Fig. 1).


Conclusions 


In this chapter, two robust parametric MPC problems were analyzed. For the first problem, two methods for robust parametric MPC were discussed, an open-loop and a closed-loop method, for treating robustness issues arising from the presence of input disturbances/uncertainties. For the second problem, a robust parametric MPC procedure was discussed for the control of dynamic systems with uncertainty in the system matrices, using robust parametric optimization methods.
Introduction


Clustering is probably the most important unsupervised learning problem and involves finding coherent structures within a collection of unlabeled data. As such, it gives rise to data groupings in which the patterns are similar within each group and dissimilar between different groups. Clustering has been applied extensively in areas such as image processing and pattern recognition, and it also has rich applications in biology, market research, social network analysis, and geology. For instance, in marketing and finance, cluster analysis is used to segment and determine target markets, position new products, and identify clients in a banking database having a heavy real estate asset base. In libraries, clustering is used to aid in book ordering, and in insurance it helps to identify groups of motor insurance policy holders with high average claim costs. Given its broad utility, it is unsurprising that a substantial number of clustering methods and approaches have been proposed.

On the other hand, fewer solutions to systemati- 
cally evaluate the quality or validity of clusters have 
been presented [1]. Indeed, the prediction of the opti- 
mal number of groupings for any clustering algorithm 
remains a fundamental problem in unsupervised classi- 
fication. To address this issue, numerous cluster indices 
have been proposed to assess the quality and the results 
of cluster analysis. These criteria may then be used to 
compare the adequacy of clustering algorithms and dif- 
ferent dissimilarity measures, or to choose the optimal 


number of clusters. Some of these measures are intro- 
duced in the following section. 


Methods 
Dunn’s Validity Index 


This technique [2,5] is based on the idea of identifying the cluster sets that are compact and well separated. For any partition of clusters, where $c_i$ represents the $i$th cluster of such a partition, Dunn's validation index, $D$, can be calculated as

$$ D = \min_{1 \le i \le n}\ \min_{\substack{1 \le j \le n \\ j \ne i}} \left\{ \frac{d(c_i, c_j)}{\max_{1 \le k \le n} d'(c_k)} \right\} $$

Here, $d(c_i, c_j)$ is the distance between clusters $c_i$ and $c_j$ (intercluster distance), $d'(c_k)$ is the intracluster distance of cluster $c_k$, and $n$ is the number of clusters. The goal of this measure is to maximize the intercluster distances and minimize the intracluster distances. Therefore, the number of clusters that maximizes $D$ is taken as the optimal number of clusters to be used.
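Dunn's index leaves the choice of $d$ and $d'$ open. The minimal sketch below assumes single-linkage separation (the smallest point-to-point distance between two clusters) and the cluster diameter (the largest within-cluster distance). These are common choices, but they are assumptions here. Euclidean distance is computed with `math.dist` (Python 3.8+).

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn index for a partition given as a list of clusters (lists of points).

    d(c_i, c_j): smallest distance between a point of c_i and a point of c_j.
    d'(c_k): diameter of c_k, i.e. its largest within-cluster distance.
    """
    diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)), default=0.0
    )
    separation = min(
        dist(p, q) for ci, cj in combinations(clusters, 2) for p in ci for q in cj
    )
    return separation / diameter  # undefined if every cluster is a single point


print(dunn_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]]))
```

To choose the number of clusters, the same data would be clustered for several candidate values and the partition with the largest index kept.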


Davies-Bouldin Validity Index 


This index [4] is a function of the ratio of the sum of within-cluster scatter to between-cluster separation:

$$ DB = \frac{1}{n}\sum_{i=1}^{n} \max_{j \ne i}\left\{ \frac{S_n(Q_i) + S_n(Q_j)}{S(Q_i, Q_j)} \right\} $$
In this expression, $DB$ is the Davies-Bouldin index, $n$ is the number of clusters, $S_n$ is the average distance of all objects in a cluster to their cluster center, and $S(Q_i, Q_j)$ is the distance between cluster centers. Hence, the ratio is small if the clusters are compact and far from each other. Consequently, the Davies-Bouldin index will have a small value for a good clustering.
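The Davies-Bouldin formula translates directly into code once centers and distances are fixed. The sketch assumes cluster centers are centroids and distances are Euclidean; helper names are hypothetical.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def davies_bouldin(clusters):
    """DB = (1/n) sum_i max_{j != i} (S_n(Q_i) + S_n(Q_j)) / S(Q_i, Q_j).

    S_n: mean distance of a cluster's points to its centroid.
    S(Q_i, Q_j): distance between the two centroids.
    """
    centers = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, z) for p in c) / len(c) for c, z in zip(clusters, centers)]
    n = len(clusters)
    return sum(
        max((scatter[i] + scatter[j]) / dist(centers[i], centers[j]) for j in range(n) if j != i)
        for i in range(n)
    ) / n


print(davies_bouldin([[(0, 0), (0, 1)], [(5, 0), (5, 1)]]))
```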

The silhouette validation technique [22] calculates the silhouette width for each sample, the average silhouette width for each cluster, and the overall average silhouette width for the total data set. With this approach, each cluster can be represented by a so-called silhouette, which is based on the comparison of its tightness and separation. The average silhouette width can be used to evaluate clustering validity and to decide how good the selected number of clusters is. To construct the silhouettes $s(i)$, the following formula is used:

$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} $$


Here, $a(i)$ is the average dissimilarity of the $i$th object to all other objects in the same cluster. The quantity $b(i)$ is the minimum, over the other clusters, of the average dissimilarity of the $i$th object to all objects in that cluster.

It follows from the formula that $s(i)$ lies between $-1$ and $1$. A silhouette value close to 1 means the sample is “well clustered” and has been assigned to a very appropriate cluster. A value close to 0 means the sample lies about equally far from its own cluster and the nearest other cluster, so it could be assigned to either. A value close to $-1$ means the sample is “misclassified” and merely lies somewhere in between the clusters. The overall average silhouette width for the entire plot is simply the average of $s(i)$ over all objects in the whole dataset, and the largest overall average silhouette indicates the best clustering (number of clusters). Therefore, the number of clusters with the maximum overall average silhouette width is taken as the optimal number of clusters.
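The silhouette widths can be computed directly from these definitions. The minimal sketch below assumes Euclidean dissimilarity, at least two clusters, and the convention $s(i) = 0$ for objects in singleton clusters. Labels are given as a plain list, one per point.

```python
from math import dist


def silhouette_widths(points, labels):
    """s(i) = (b(i) - a(i)) / max{a(i), b(i)} for every object.

    a(i): mean distance to the other members of its own cluster.
    b(i): smallest mean distance to the members of any other cluster.
    """
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b))
    return widths


def average_silhouette(points, labels):
    """Overall average silhouette width; larger values indicate better clustering."""
    s = silhouette_widths(points, labels)
    return sum(s) / len(s)


pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(average_silhouette(pts, [0, 0, 1, 1]), average_silhouette(pts, [0, 1, 0, 1]))
```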


Measure of Krzanowski and Lai 
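This index is based on the decrease of the within-cluster sum of squares (WSS) [15] and is given by

$$ KL(k) = \left| \frac{DIFF(k)}{DIFF(k+1)} \right|, \quad \text{where} \quad DIFF(k) = (k-1)^{2/p}\, WSS(k-1) - k^{2/p}\, WSS(k), $$

and $p$ is the number of variables (features) describing each data point.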


Let $g$ be the ideal cluster number for a given dataset and $k$ a particular number of clusters. Then $WSS(k)$ is assumed to decrease rapidly for $k < g$ and only slightly for $k > g$. Thus, $KL(k)$ is expected to be maximized at the optimal number of clusters.
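Given WSS values for consecutive cluster numbers, the KL index follows directly from its definition. In the sketch below, the WSS values are made up for illustration and would normally come from repeated runs of a clustering algorithm; `p` is the number of variables.

```python
def kl_index(wss, p):
    """KL(k) = |DIFF(k) / DIFF(k+1)|, DIFF(k) = (k-1)^(2/p) WSS(k-1) - k^(2/p) WSS(k).

    wss maps k -> WSS(k). KL(k) is returned only where WSS(k-1), WSS(k) and
    WSS(k+1) are all available and DIFF(k+1) is nonzero.
    """
    def diff(k):
        return (k - 1) ** (2 / p) * wss[k - 1] - k ** (2 / p) * wss[k]

    return {
        k: abs(diff(k) / diff(k + 1))
        for k in wss
        if k - 1 in wss and k + 1 in wss and diff(k + 1) != 0
    }


wss = {1: 100.0, 2: 40.0, 3: 12.0, 4: 10.0, 5: 9.0}  # hypothetical WSS(k) values
kl = kl_index(wss, p=2)
best_k = max(kl, key=kl.get)  # cluster number with the largest KL(k)
print(kl, best_k)
```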


Measure of Calinski and Harabasz 
This method [3] assesses the quality of $k$ clusters via the index

$$ CH(k) = \frac{BSS(k)/(k-1)}{WSS(k)/(n-k)} $$


Here, $WSS(k)$ and $BSS(k)$ are the within-cluster and between-cluster sums of squares for a dataset of $n$ members. The measure seeks clusters that are well isolated from one another and coherent, while keeping the number of clusters as small as possible; the criterion is maximized at the optimal cluster number. Incidentally, a separate study comparing 28 validation criteria [18] found this measure to perform best.
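A direct implementation of $CH(k)$ for one labeled partition is sketched below. It assumes Euclidean geometry, centroids as cluster centers, and $1 < k < n$; helper names are hypothetical.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def calinski_harabasz(points, labels):
    """CH(k) = [BSS(k)/(k-1)] / [WSS(k)/(n-k)] for the partition given by labels."""
    n, groups = len(points), sorted(set(labels))
    k = len(groups)
    overall = centroid(points)
    wss = bss = 0.0
    for g in groups:
        members = [p for p, lab in zip(points, labels) if lab == g]
        c = centroid(members)
        wss += sum(dist(p, c) ** 2 for p in members)   # within-cluster scatter
        bss += len(members) * dist(c, overall) ** 2    # between-cluster scatter
    return (bss / (k - 1)) / (wss / (n - k))


print(calinski_harabasz([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1]))
```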

In addition, some other measures to determine the optimal number of clusters are (i) the C index [10], (ii) the Goodman-Kruskal index [8], (iii) the isolation index [19], (iv) the Jaccard index [11], and (v) the Rand index [20].


Applications 


As can be seen, while it is relatively easy to propose indices of cluster validity, it is difficult to incorporate these measures into clustering algorithms and to set suitable thresholds for key decision values [9,12]. Most clustering algorithms do not contain built-in screening functions to determine the optimal number of clusters. Consequently, for a given clustering algorithm, the most typical way to determine the optimal cluster number is to repeat the clustering many times, each with a different number of groupings. One then hopes to catch a maximum or minimum turning point for the cluster validity index in play.

Nonetheless, there have been attempts to incorporate measures of cluster validity into clustering algorithms. One such method [21] introduces a validity index:

$$ \text{validity} = \frac{\text{intra-cluster distance}}{\text{inter-cluster distance}} $$

Since it is desirable for the intracluster distance to be minimized and the intercluster distance to be maximized, this validity measure should be as small as possible. Using the K-means algorithm, Ray and Turi [21] proposed running the process for two up to a predetermined maximum number of clusters. At each stage, the cluster with the maximum variance is split into two and clustering is repeated with these updated centers, until the desired turning point for the validity measure is observed. Another approach [16] is based on simulated annealing, which was originally formulated to simulate a collection of atoms in equilibrium at a given temperature [14,17]. It assumes two given parameters: $D$, the cutoff cluster diameter, and $P$, a P-value statistic. It also uses $p(d)$, the distribution function of the Euclidean distances between the members of a dataset. Then, the upper boundary for the fraction of incorrect vector pairs is given by


f(D, K= v= f p(x)dx. 


On the other hand, it is possible to define a lower 
boundary for f(D,K) with a preassigned P-value cutoff. 
The clustering algorithm then sequentially increases the 
cluster number until the two indicators converge. 
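The intra/inter ratio used by Ray and Turi [21] can be sketched in a few lines. This is a minimal illustration under two assumptions: the intra-cluster term is the mean squared distance from each point to its assigned center, and the inter-cluster term is the smallest squared distance between two centers. The helper names are hypothetical.

```python
def _sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ray_turi_validity(points, labels, centers):
    """Intra/inter validity ratio (smaller is better).

    intra: mean squared distance of each point to its assigned center.
    inter: smallest squared distance between any two distinct centers.
    """
    if len(centers) < 2:
        raise ValueError("at least two clusters are needed")
    intra = sum(_sq_dist(p, centers[l]) for p, l in zip(points, labels)) / len(points)
    inter = min(
        _sq_dist(centers[j], centers[l])
        for j in range(len(centers))
        for l in range(j + 1, len(centers))
    )
    return intra / inter


pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = ray_turi_validity(pts, [0, 0, 1, 1], [(0, 1), (10, 1)])  # 1 / 100
bad = ray_turi_validity(pts, [0, 1, 0, 1], [(5, 0), (5, 2)])    # 25 / 4
```

The well-separated labeling gives a much smaller ratio than the interleaved one. This is the signal the K-means splitting procedure watches for as the cluster number grows.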


A Novel Clustering Approach 
with Optimal Cluster Determination 


See also the article on “Gene Clustering: A Novel 
Optimization-Based Approach”. 

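Recently, we proposed a novel clustering approach [23,24] that incorporates an efficient method for predicting the optimal cluster number. The clustering seeks to minimize the Euclidean distances between the data and the assigned cluster centers as

$$\min_{w,z}\ \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} w_{ij}\,(a_{ik} - z_{jk})^2,$$

where $a_{ik}$ is feature $k$ of data point $i$, $z_{jk}$ is coordinate $k$ of the center of cluster $j$, and $w_{ij} \in \{0,1\}$ indicates whether point $i$ is assigned to cluster $j$.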

To make the nonlinear problem tractable, we apply a variant of the Generalized Benders Decomposition algorithm [6,7], the global optimum search. The global optimum search decomposes the problem into a primal problem and a master problem. The primal problem solves for the continuous variables while the integer variables are fixed, and it provides an upper-bound solution. The master problem finds the integer variables and the associated Lagrange multipliers while the continuous variables are fixed, and it provides a lower-bound solution. The two sequences of bounds are updated iteratively until they converge at an optimal solution in a finite number of steps.
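In the primal problem the integer memberships are fixed. For the objective above, the continuous subproblem then has a closed form: each $z_{jk}$ is the mean of feature $k$ over the points assigned to cluster $j$. The sketch below uses hypothetical function names. It evaluates the objective and solves this fixed-$w$ step. It does not implement the master problem or the global optimum search itself.

```python
def clustering_objective(a, w, z):
    """Sum over i, j, k of w[i][j] * (a[i][k] - z[j][k]) ** 2."""
    return sum(
        w[i][j] * (a[i][k] - z[j][k]) ** 2
        for i in range(len(a))
        for j in range(len(z))
        for k in range(len(a[0]))
    )


def optimal_centers(a, w):
    """Minimize the objective over z with the binary memberships w held fixed.

    The problem separates by (j, k); each piece is a convex quadratic whose
    minimizer is the mean of the points assigned to cluster j.
    """
    centers = []
    for j in range(len(w[0])):
        n_j = sum(w[i][j] for i in range(len(a)))
        if n_j == 0:
            raise ValueError(f"cluster {j} is empty")
        centers.append([sum(w[i][j] * a[i][k] for i in range(len(a))) / n_j
                        for k in range(len(a[0]))])
    return centers


a = [[0.0], [2.0], [10.0]]
w = [[1, 0], [1, 0], [0, 1]]
z = optimal_centers(a, w)           # [[1.0], [10.0]]
cost = clustering_objective(a, w, z)  # 2.0
```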

In determining the optimal cluster number, we note 
that the optimal number of clusters occurs when the 
intercluster distance is maximized and the intracluster 
distance is minimized. We adapt the novel work of Jung 
et al. [13] in defining a clustering balance, which has 
been shown to have a minimum value when intraclus- 
ter similarity is maximized and intercluster similarity is 
minimized. This provides a measure of how good a given number of clusters is for a particular clustering algorithm. Given $n$ data points, each having $s$ features, $c$ clusters, and a binary decision variable $w_{ij}$ for cluster membership, we introduce the following:

$$\text{Global center:}\quad z^{o}_{k} = \frac{1}{n}\sum_{i=1}^{n} a_{ik}, \qquad \forall k,$$

$$\text{Intracluster error sum:}\quad \Lambda = \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} w_{ij}\,(a_{ik} - z_{jk})^2,$$

$$\text{Intercluster error sum:}\quad \Gamma = \sum_{j=1}^{c}\sum_{k=1}^{s} (z_{jk} - z^{o}_{k})^2.$$

Jung et al. [13] next proposed a clustering balance parameter, which is the $\alpha$-weighted sum of the two error sums:

$$\text{Clustering balance:}\quad \varepsilon = \alpha\Lambda + (1-\alpha)\Gamma.$$
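The quantities $z^o$, $\Lambda$, $\Gamma$ and $\varepsilon$ can be computed directly from a hard assignment. The sketch assumes each $z_j$ is the mean of its cluster's points, and its function names are hypothetical. The toy data also illustrate the extreme-case argument used when choosing $\alpha$:

- with one cluster, $\Gamma = 0$;
- with one cluster per point, $\Lambda = 0$;
- in both cases the remaining sum is the same total scatter.

```python
def error_sums(a, labels, c):
    """Return (Lambda, Gamma) for data a (n x s) and hard labels in range(c).

    Cluster centers z_j are taken as cluster means (an assumption of this sketch).
    """
    n, s = len(a), len(a[0])
    z_o = [sum(row[k] for row in a) / n for k in range(s)]  # global center
    z = []
    for j in range(c):
        members = [row for row, lab in zip(a, labels) if lab == j]
        if not members:
            raise ValueError(f"cluster {j} is empty")
        z.append([sum(m[k] for m in members) / len(members) for k in range(s)])
    lam = sum((row[k] - z[lab][k]) ** 2 for row, lab in zip(a, labels) for k in range(s))
    gam = sum((z[j][k] - z_o[k]) ** 2 for j in range(c) for k in range(s))
    return lam, gam


def clustering_balance(a, labels, c, alpha=0.5):
    """epsilon = alpha * Lambda + (1 - alpha) * Gamma."""
    lam, gam = error_sums(a, labels, c)
    return alpha * lam + (1 - alpha) * gam


a = [[0.0], [2.0], [10.0], [12.0]]
one_cluster = error_sums(a, [0, 0, 0, 0], 1)   # (104.0, 0.0)
per_point = error_sums(a, [0, 1, 2, 3], 4)     # (0.0, 104.0)
eps_two = clustering_balance(a, [0, 0, 1, 1], 2)  # 27.0
```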


We note here that the right value of $\alpha$ is 0.5. There are two ways to come to this conclusion.

First, the factor $\alpha$ should balance the contributions of the two error sums to the clustering balance. At the extreme cluster numbers, that is, the smallest and largest numbers possible, the sum of the intracluster and intercluster error sums should be balanced. In the minimal case, all the data points are placed into a single cluster; the intercluster error sum is then zero and the intracluster error sum can be calculated with ease. In the maximal case, each data point forms its own cluster; the intracluster error sum is then zero and the intercluster error sum can be easily found. The intracluster error sum in the minimal case and the intercluster error sum in the maximal case are equal, since both reduce to the total scatter $\sum_{i=1}^{n}\sum_{k=1}^{s}(a_{ik} - z^{o}_{k})^2$. This suggests that the most appropriate weighting factor is in fact 0.5.

The second approach uses a clustering gain parameter proposed by Jung et al. [13]. This gain is the difference between two changes measured from the initial stage: the decrease $\gamma_{jk}$ in the intercluster error sum, and the increase $\lambda_{jk}$ in the intracluster error sum. It is given by


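$$\gamma_{jk} = \sum_{i=1}^{n} w_{ij}\,(a_{ik} - z^{o}_{k})^2 - (z_{jk} - z^{o}_{k})^2, \qquad \forall j,k,$$

$$\lambda_{jk} = \sum_{i=1}^{n} w_{ij}\,(a_{ik} - z_{jk})^2, \qquad \forall j,k,$$

$$\text{Gain:}\quad \Delta_{jk} = \gamma_{jk} - \lambda_{jk} = \sum_{i=1}^{n} w_{ij}\,(a_{ik} - z^{o}_{k})^2 - (z_{jk} - z^{o}_{k})^2 - \sum_{i=1}^{n} w_{ij}\,(a_{ik} - z_{jk})^2, \qquad \forall j,k.$$

With the identities

$$\sum_{i=1}^{n} w_{ij}\,a_{ik} = n_j z_{jk}, \quad \forall j,k, \qquad \sum_{i=1}^{n} w_{ij} = n_j, \quad \forall j,$$

where $n_j$ denotes the number of data points in cluster $j$, the gain can be simplified to

$$\Delta_{jk} = (n_j - 1)\,(z_{jk} - z^{o}_{k})^2, \qquad \forall j,k.$$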


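Jung et al. [13] showed that the clustering gain has a maximum value at the optimal number of clusters. They also demonstrated that the sum of the clustering gain and balance parameters is a constant. As the following derivation shows, this is only possible if $\alpha = 0.5$.

With $\alpha = 0.5$, drop the common factor $\tfrac{1}{2}$, which does not change where $\varepsilon$ is minimized; the balance becomes $\varepsilon = \Lambda + \Gamma$. Write $\Delta = \sum_{j=1}^{c}\sum_{k=1}^{s}\Delta_{jk}$ for the total gain. The sum $\Omega$ of clustering balance and clustering gain is then

$$
\begin{aligned}
\Omega &= \varepsilon + \Delta = \Lambda + \Gamma + \Delta\\
&= \Big[\sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} w_{ij}(a_{ik}-z_{jk})^2\Big] + \Big[\sum_{j=1}^{c}\sum_{k=1}^{s}(z_{jk}-z^{o}_{k})^2\Big]\\
&\quad + \Big[\sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} w_{ij}(a_{ik}-z^{o}_{k})^2 - \sum_{j=1}^{c}\sum_{k=1}^{s}(z_{jk}-z^{o}_{k})^2 - \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} w_{ij}(a_{ik}-z_{jk})^2\Big]\\
&= \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} w_{ij}(a_{ik}-z^{o}_{k})^2\\
&= \sum_{i=1}^{n}\sum_{k=1}^{s}(a_{ik}-z^{o}_{k})^2.
\end{aligned}
$$

The last step uses the fact that each data point belongs to exactly one cluster ($\sum_{j} w_{ij} = 1$). The result is a constant for any given dataset.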
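Using the identities $\sum_i w_{ij}a_{ik} = n_j z_{jk}$ and $\sum_i w_{ij} = n_j$, the gain reduces to $\Delta_{jk} = (n_j - 1)(z_{jk} - z^o_k)^2$. The following sketch checks numerically that $\Lambda + \Gamma + \Delta$ equals the total scatter $\sum_i\sum_k (a_{ik}-z^o_k)^2$ for several labelings. As before, it assumes cluster means as centers, and the function name is hypothetical.

```python
def balance_and_gain(a, labels, c):
    """Return (Lambda, Gamma, Delta, total scatter) with cluster means as centers."""
    n, s = len(a), len(a[0])
    z_o = [sum(row[k] for row in a) / n for k in range(s)]
    z, n_j = [], []
    for j in range(c):
        members = [row for row, lab in zip(a, labels) if lab == j]
        if not members:
            raise ValueError(f"cluster {j} is empty")
        n_j.append(len(members))
        z.append([sum(m[k] for m in members) / len(members) for k in range(s)])
    lam = sum((row[k] - z[lab][k]) ** 2 for row, lab in zip(a, labels) for k in range(s))
    gam = sum((z[j][k] - z_o[k]) ** 2 for j in range(c) for k in range(s))
    # Simplified gain: Delta_jk = (n_j - 1) * (z_jk - z_o_k) ** 2
    delta = sum((n_j[j] - 1) * (z[j][k] - z_o[k]) ** 2 for j in range(c) for k in range(s))
    total = sum((row[k] - z_o[k]) ** 2 for row in a for k in range(s))
    return lam, gam, delta, total


a = [[0.0], [2.0], [10.0], [12.0]]
for labels, c in [([0, 0, 0, 0], 1), ([0, 0, 1, 1], 2), ([0, 0, 0, 1], 2), ([0, 1, 2, 3], 4)]:
    lam, gam, delta, total = balance_and_gain(a, labels, c)
    print(labels, lam + gam + delta, total)
```

On this data the two-cluster split $\{0, 2\}, \{10, 12\}$ has the largest gain among the labelings tried. That matches the claim that the gain peaks at the natural cluster number.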


Extension for Biological Coherence Refinement 


Today, the advent of DNA microarray technology has 
made possible the large-scale monitoring of genomic 
behavior. In working with gene expression data, it is 
often useful to utilize external validation in evaluating 
clusters of gene expression data. Besides assessing the 
biological meaning of a cluster through the functional 
annotations of its constituent genes using gene ontol- 
ogy resources, other indications of strong biological co- 
herence [25] are (i) the proportion of genes that reside 
in clusters with good P-value scores, (ii) cluster corre- 
lation, since closely related genes are expected to ex- 
hibit very similar patterns of expression, and (iii) cluster specificity, which is the proportion of genes within a cluster that are annotated with the same function. A novel
extension of the previously described work [25] allows 
not just for the determination of the optimal cluster 
number within the framework of a robust yet intuitive 
clustering method, but also for an iterative refinement 
of biological validation for the clusters. The algorithm 
is as follows. 


Gene Preclustering We precluster the original data 
by proximity studies to reduce the computational 
demands by (i) identifying genes with very similar 
responses and (ii) removing outliers deemed to be in- 
significant to the clustering process. To retain just enough discriminatory information, preclustering can be done in two ways: by reducing the expression vectors to a set of representative values $\{+, 0, -\}$, or by pregrouping genes that are close to one another by correlation or some other distance function.
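The $\{+, 0, -\}$ reduction can be sketched as follows. The symmetric cutoff `threshold` (for example, applied to log expression ratios) is an assumption made for illustration; the article does not specify how the three levels are chosen. Outlier removal is not shown, and the function names are hypothetical.

```python
from collections import defaultdict


def discretize(profile, threshold):
    """Map each expression value to '+', '0' or '-' relative to a symmetric cutoff."""
    return tuple("+" if v > threshold else "-" if v < -threshold else "0" for v in profile)


def precluster(profiles, threshold=1.0):
    """Group genes whose discretized expression patterns are identical."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[discretize(profile, threshold)].append(gene)
    return dict(groups)


profiles = {"g1": [2.0, -1.5, 0.1], "g2": [1.2, -3.0, 0.5], "g3": [-2.0, 0.0, 2.0]}
groups = precluster(profiles)  # g1 and g2 share the pattern ('+', '-', '0')
```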


Iterative Clustering We let the initial clusters be defined by the genes preclustered previously. We then compute the distance between each remaining gene and these initial clusters and, as a good initialization point, place each gene into the nearest cluster. On the basis of the proximity study, the possible memberships of each gene are restricted to a limited number of clusters.

In the primal problem of the global optimum search algorithm, we solve for $z_{jk}$. These values, together with the Lagrange multipliers, are used in the master problem to solve for $w_{ij}$. The primal problem gives an upper-bound solution and the master problem gives a lower bound. The optimal solution is obtained when both bounds converge.

Then, the worst-placed gene is removed and used as a seed for a new cluster. This gene has already been subjected to a membership search, so there is no reason for it to belong to any of the older clusters. The primal and master problems are iterated, and the number of clusters builds up gradually until the optimal number is attained.
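The seeding step can be sketched under one explicit assumption: the "worst-placed" gene is the one with the largest squared distance to its assigned center. The function name is hypothetical. The old centers are left unchanged because the next primal problem recomputes them.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def seed_new_cluster(data, labels, centers):
    """Move the gene farthest from its assigned center into a new singleton cluster.

    Returns the index of that gene, the updated labels, and the centers with the
    gene's own profile appended as the seed of the new cluster.
    """
    worst = max(range(len(data)), key=lambda i: squared_distance(data[i], centers[labels[i]]))
    new_labels = list(labels)
    new_labels[worst] = len(centers)
    new_centers = [list(c) for c in centers] + [list(data[worst])]
    return worst, new_labels, new_centers


worst, labels, centers = seed_new_cluster([[0.0], [1.0], [9.0]], [0, 0, 0], [[3.0]])
```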


Iterative Extension Indication of strong biological 
coherence is characterized by good P values based on 
gene ontology resources and the proportion of genes 
that reside in such clusters. As an extension, we would 
like to mine for the maximal amount of relevant in- 
formation from the gene expression data and sieve out 
the least relevant data. This is important because infor- 
mation such as biological function annotation drawn 
from the cluster content is often used in the further 
study of coregulated gene members, common reading 
frames, and gene regulatory networks. From the clus- 
tered genes, we impose a coherence floor, based on 
some or all of the possible performance factors such 
as functional annotation, cluster specificity, and corre- 
lation, to demarcate genes that have already been well 
clustered. We then iterate to offer the poorly placed 
genes an opportunity to either find relevant member- 
ship in one of the strongly coherent clusters, or regroup 
amongst themselves to form quality clusters. Through 
this process, a saturation point will be reached eventu- 
ally whereby the optimal number of clusters becomes 
constant as the proportion of genes distributed within 
clusters of high biological coherence levels off. Figure 1 
shows a schematic of the entire clustering algorithm. 
A classification problem is concerned with categorizing a data point (entity) into one of $G$ ($G \ge 2$) mutually exclusive groups, based upon $m$ specific measurable features of the entity, where $m$ is a positive integer. A classification rule is typically constructed from a sample of entities whose group classifications are known, or labeled (training, or supervised learning). The rule can then be used to classify new, unlabeled entities.

Many classification methods are based on distance measures. A common approach is to find a hyperplane that separates two groups ($G = G_1 \cup G_2$). The hyperplane can be represented in the form $A\omega = \gamma$, where $A$ denotes an $n \times m$ input data matrix, $n$ is the total number of input data points, and $m$ is the total number of data features/attributes. The classification rule is then defined by the weight vector $\omega$, which maps data points onto the hyperplane, and the scalar $\gamma$. Both are best selected by solving a mathematical programming model. The goal is to have entities of Group 1 ($G_1$) lie on one side of the hyperplane and entities of Group 2 ($G_2$) lie on the other side.

Support Vector Machines (SVM) are the most studied hyperplane construction method. The SVM concept is to construct a hyperplane that minimizes the upper bound on the out-of-sample error. The critical step of SVM is the kernel transformation, which maps the data points onto a high-dimensional space, followed by classifying the data points with a separating plane [9].

Subsequently, the hybrid linear programming discriminant model was proposed in [12,13,20]. The hybrid model does not depend on data transformation; its objective is to find a plane that minimizes violations and maximizes satisfactions of the classified groups. Glover [19] proposed a mixed integer programming (MIP) formulation for the hybrid model by adding binary variables for misclassified entities. Other MIP formulations that were subsequently developed include [1,15,16].

Recently, a new technique that uses multiple hyperplanes for classification has been proposed in [17]. This technique constructs a piecewise-linear model that gives convex separating planes. Subsequently, Better, Glover and Samorani [6] proposed multi-hyperplane formulations that generate multiple linear hyperplanes simultaneously, forming a binary decision tree.

In classification, the selection of features/attributes is also critical. Many mathematical programming methods have been proposed for selecting well-representative features/attributes. Bennett and Mangasarian [5,23] give a feature selection formulation in which the model not only separates entities into two groups but also tries to suppress nonsignificant features. In a more recent study, Chaovalitwongse et al. (2006) proposed Support Feature Machine (SFM) formulations, which can be used to find the set of features that gives the highest classification performance [10].

Bayesian decision methods have also been widely studied in classification. However, only a few studies incorporate the Bayesian model into mathematical programming approaches. Among those studies, Asparouhov and Danchev [4] formulated a MIP model with binary variables that is consistent with Bayesian decision theory. For multi-group classification, Anderson [2] developed a mathematical formulation that incorporates the population densities and prior probabilities of the training data. This model yields classification rules for multiple groups with a reject option, that is, a region for entities that are not assigned to any group [22].
Deterministic Optimization Models
Support Vector Machines


Support Vector Machines (SVM) aim at finding a hyperplane that separates the labeled input data into two groups, $G_1$ and $G_2$. The optimal plane can then be used for classifying new data points. The hyperplane can be expressed mathematically as $A\omega = \gamma$, where:

- $\omega \in \mathbb{R}^m$ is an $m$-dimensional vector of real numbers;
- $m$ is the total number of attributes/features used to represent a data entity;
- $\gamma \in \mathbb{R}$ is a scalar.

All elements of $G_1$ and $G_2$ are separated by this hyperplane, under the assumption that the sets $G_1$ and $G_2$ are separable. Define the margin as the minimum distance from the plane to the elements of a group, $G_1$ or $G_2$. The objective of SVM is to find a separating hyperplane with the largest margin.

The set $G_1$ can be represented by the matrix $A_i \in \mathbb{R}^{k\times m}$, $i \in G_1$, and the set $G_2$ by the matrix $A_j \in \mathbb{R}^{(n-k)\times m}$, $j \in G_2$, where $k$ is the number of data points (entities) in group $G_1$. The two open half-spaces defined by the hyperplane are $\{x : x^\top\omega > \gamma\}$ and $\{x : x^\top\omega < \gamma\}$. One contains the elements of $G_1$ and the other contains the elements of $G_2$. Therefore, a linear programming (LP) problem can be formulated to determine the optimal values of $\omega$ and $\gamma$.

To construct valid inequalities for linear programming, we rescale the variables $(\omega, \gamma)$ by dividing them by the positive value $\min_{i\in G_1,\, j\in G_2}\{A_i\omega - \gamma,\ -A_j\omega + \gamma\}$. Let $e$ denote a vector of ones; the resulting inequalities become

$$A_i\omega \ge e\gamma + e, \qquad A_j\omega \le e\gamma - e. \qquad (1)$$
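The rescaling that produces Eq. (1) can be checked on a small example. Take any $(\omega, \gamma)$ that strictly separates the groups, with $G_1$ on the side $A_i\omega > \gamma$. Dividing by the smallest gap yields a pair for which every gap is at least 1. A minimal sketch with hypothetical helper names:

```python
def _dot(row, w):
    """Inner product of a data row with the weight vector."""
    return sum(r * wi for r, wi in zip(row, w))


def rescale_separator(A1, A2, w, gamma):
    """Divide (w, gamma) by delta = min(A1 w - gamma, gamma - A2 w).

    If delta > 0, the rescaled pair satisfies A1 w >= gamma + 1 and
    A2 w <= gamma - 1 row by row, i.e. the inequalities of Eq. (1).
    """
    delta = min([_dot(r, w) - gamma for r in A1] + [gamma - _dot(r, w) for r in A2])
    if delta <= 0:
        raise ValueError("(w, gamma) does not strictly separate the two groups")
    return [wi / delta for wi in w], gamma / delta


def satisfies_eq1(A1, A2, w, gamma, tol=1e-9):
    """Check the rescaled inequalities of Eq. (1) up to a small tolerance."""
    return (all(_dot(r, w) >= gamma + 1 - tol for r in A1)
            and all(_dot(r, w) <= gamma - 1 + tol for r in A2))


A1 = [[2.0, 2.0], [3.0, 1.0]]   # Group 1 rows
A2 = [[0.0, 0.0], [0.5, 0.5]]   # Group 2 rows
w, g = rescale_separator(A1, A2, [0.5, 0.5], 1.0)  # ([1.0, 1.0], 2.0)
```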

The performance of the SVM relies heavily on the kernel transformation, that is, the mapping of the data to a high dimension. SVM can also incorporate a nonlinear mapping $\phi(\cdot)$. If the new dimension is sufficiently high, the data from the two classes can always be separated by a hyperplane [9,11]. Examples of SVM kernel functions include the linear, polynomial, radial basis function (RBF), and sigmoid kernels. Recently, Shimodaira et al. [24] proposed the Dynamic Time-Alignment Kernel for time series data.


Robust LP for SVM 


It is important to note that the above LP model assumes that $A_i$ and $A_j$ are perfectly separable, which is usually not the case in practice. In other words, the inequalities in Eq. (1) may have no solution when the data are not perfectly separable. Bennett and Mangasarian [5] proposed an improved formulation that minimizes the average misclassification, given by

$$
\begin{aligned}
\min_{\omega,\gamma,y,z}\ & \frac{e^\top y}{k} + \frac{e^\top z}{n-k}\\
\text{s.t. } & A_i\omega - e\gamma - e + y \ge 0,\\
& -A_j\omega + e\gamma - e + z \ge 0,\\
& y \ge 0,\ z \ge 0.
\end{aligned}
$$

It is easy to see that the variables $y$ and $z$ are, in fact, the vectors of violations of the inequalities in Eq. (1). Minimizing the objective function therefore leads to the minimum average violation.
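For a fixed $(\omega, \gamma)$, the smallest feasible violations are $y_i = \max(0, \gamma + 1 - A_i\omega)$ for $G_1$ and $z_j = \max(0, A_j\omega - \gamma + 1)$ for $G_2$, so the robust LP objective can be evaluated directly. The sketch makes two assumptions:

- the constraints are $A_i\omega - e\gamma - e + y \ge 0$ and $-A_j\omega + e\gamma - e + z \ge 0$;
- the averages divide by the two group sizes.

It only evaluates the objective. Solving the LP itself would require an LP solver, which is not shown.

```python
def _dot(row, w):
    """Inner product of a data row with the weight vector."""
    return sum(r * wi for r, wi in zip(row, w))


def rlp_objective(A1, A2, w, gamma):
    """Average violation of Eq. (1) for a fixed (w, gamma).

    y_i = max(0, gamma + 1 - A_i w) for Group 1 rows,
    z_j = max(0, A_j w - gamma + 1) for Group 2 rows.
    """
    y = [max(0.0, gamma + 1 - _dot(r, w)) for r in A1]
    z = [max(0.0, _dot(r, w) - gamma + 1) for r in A2]
    return sum(y) / len(A1) + sum(z) / len(A2)


overlap = rlp_objective([[2.0], [0.5]], [[0.0], [1.5]], [1.0], 1.0)  # 1.5
separable = rlp_objective([[3.0]], [[-1.0]], [1.0], 1.0)             # 0.0
```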


Feature Selection with SVM 


We note that an extension of the robust LP formulation can be used for feature selection [5,23]. A new term is added to the objective function of the robust LP model to suppress the components of $\omega$, in an attempt to eliminate all unnecessary features. The notation is as follows:

- $v$ denotes the absolute value of the weight vector $\omega$, enforced through the constraints $-v \le \omega \le v$;
- $\alpha > 0$ is a parameter controlling the steepness of the exponential;
- $\exp(\cdot)$ acts componentwise;
- $\lambda \in [0,1)$.

The mathematical program with a concave objective function and linear constraints for feature selection is given by

$$
\begin{aligned}
\min_{\omega,\gamma,y,z,v}\ & (1-\lambda)\left(\frac{e^\top y}{k} + \frac{e^\top z}{n-k}\right) + \lambda\, e^\top\big(e - \exp(-\alpha v)\big)\\
\text{s.t. } & A_i\omega - e\gamma - e + y \ge 0,\\
& -A_j\omega + e\gamma - e + z \ge 0,\\
& y \ge 0,\ z \ge 0,\\
& -v \le \omega \le v.
\end{aligned}
$$

When $\lambda = 0$, the model gives a plane that separates $A_i$ and $A_j$ without considering feature suppression. When $\lambda > 0$, the objective not only tries to separate $A_i$ and $A_j$ but also tries to eliminate as many components of $\omega$ as possible. Specifically, for each $v_l$ ($l = 1,\dots,m$), we minimize $1 - \exp(-\alpha v_l)$. This is an exponential smoothing of the step function that equals 0 when $v_l = 0$ and 1 when $v_l > 0$, and it enables the deletion of irrelevant components of $\omega$. There also exists a finitely terminating algorithm that solves this problem using successive linear programming [8].
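The feature-suppression term can be evaluated on its own to see why it acts as a smoothed count of nonzero weights. The sketch assumes the penalty $\sum_l (1 - \exp(-\alpha |\omega_l|))$. The default $\alpha = 5$ is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import math


def smoothed_support_size(w, alpha=5.0):
    """e^T (e - exp(-alpha * v)) with v = |w|.

    A smooth, concave surrogate for the number of nonzero components of w;
    larger alpha makes it a sharper approximation of the step function.
    """
    return sum(1.0 - math.exp(-alpha * abs(wi)) for wi in w)


def support_size(w):
    """Exact number of nonzero components (the step function being approximated)."""
    return sum(1 for wi in w if wi != 0)


w = [0.0, 2.0, -3.0]
approx, exact = smoothed_support_size(w), support_size(w)  # approx is just under 2
```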
Hybrid LP Discriminant Model


A hybrid LP discriminant model is proposed in [12,13,20]. This model is guaranteed to give the optimal solution regardless of the nature of the data, and its solution is invariant to transformations. The model was designed to overcome many shortcomings of contemporary linear discriminant formulations, which are reviewed and discussed in [21].

Recall that $m$ is the number of attributes. All data points in $G_1$ are represented by a $k \times m$ matrix $A_i$, $i \in G_1$, and all data points in $G_2$ by an $(n - k) \times m$ matrix $A_j$, $j \in G_2$. For simplicity of notation, membership in $G_1$ or $G_2$ is written $i \in G_1$ or $i \in G_2$, respectively. The model gives a hyperplane of the form $x^\top\omega = \gamma$: it seeks the optimal weight vector $\omega$ and scalar $\gamma$ such that data points of Group 1 lie on one side of the hyperplane and data points of Group 2 lie on the other side (i.e., $A_i\omega \le \gamma$, $i \in G_1$, and $A_i\omega \ge \gamma$, $i \in G_2$).

Let $y_i$ and $z_i$ represent external and internal deviation variables, referring to the point violations and satisfactions of the classification rule. More specifically, they are the magnitudes by which data points lie outside or inside their targeted half-spaces. The objective is to minimize violations and maximize satisfactions of the classified groups. Thus, in the objective function, the weights $h_i$ discourage external deviations and the weights $k_i$ encourage internal deviations. The condition $h_i \ge k_i$ for $i = 0$ and $i \in G$ must be satisfied; otherwise, increasing $y_i$ and $z_i$ together would make the objective unbounded. The hybrid model is given by

$$
\begin{aligned}
\min\ & h_0 y_0 + \sum_{i\in G} h_i y_i - k_0 z_0 - \sum_{i\in G} k_i z_i\\
\text{s.t. } & A_i\omega - y_0 - y_i + z_0 + z_i = \gamma, && i \in G_1\\
& A_i\omega + y_0 + y_i - z_0 - z_i = \gamma, && i \in G_2\\
& \sum_{i\in G_2} A_i\omega - \sum_{i\in G_1} A_i\omega = 1 && (2)\\
& y_0,\ z_0 \ge 0\\
& y_i,\ z_i \ge 0, && i \in G\\
& \omega,\ \gamma \text{ unrestricted.}
\end{aligned}
$$


We note that Eq. (2) is a normalization constraint, which is necessary for avoiding the trivial solution $\omega = 0$, $\gamma = 0$. Glover [18] identifies further normalization methods to overcome the problem of null weighting.
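To see how the deviation variables behave, the sketch below fixes $y_0 = z_0 = 0$, a simplifying assumption. It also assumes the constraints $A_i\omega - y_0 - y_i + z_0 + z_i = \gamma$ for $i \in G_1$ and $A_i\omega + y_0 + y_i - z_0 - z_i = \gamma$ for $i \in G_2$. With $h_i \ge k_i$, the cheapest way to satisfy each equality puts the positive part of the signed gap into $y_i$ (a violation) and the negative part into $z_i$ (a satisfaction). Function names are hypothetical.

```python
def _dot(row, w):
    """Inner product of a data row with the weight vector."""
    return sum(r * wi for r, wi in zip(row, w))


def hybrid_deviations(A1, A2, w, gamma):
    """Cheapest (y_i, z_i) for a fixed (w, gamma) when y0 = z0 = 0.

    The equalities give A_i w - gamma = y_i - z_i for G1 (target A_i w <= gamma)
    and gamma - A_i w = y_i - z_i for G2 (target A_i w >= gamma).
    """
    y, z = [], []
    for r in A1:
        d = _dot(r, w) - gamma
        y.append(max(0.0, d))
        z.append(max(0.0, -d))
    for r in A2:
        d = gamma - _dot(r, w)
        y.append(max(0.0, d))
        z.append(max(0.0, -d))
    return y, z


def hybrid_objective(y, z, h, k):
    """sum_i h_i y_i - sum_i k_i z_i (the y0, z0 terms are zero here)."""
    return sum(hi * yi for hi, yi in zip(h, y)) - sum(ki * zi for ki, zi in zip(k, z))


y, z = hybrid_deviations([[1.0], [3.0]], [[4.0], [1.5]], [1.0], 2.0)
value = hybrid_objective(y, z, [1.0] * 4, [0.5] * 4)
```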


MIP Discriminant Model 


There are several related mixed integer formulations in the literature [1,15,16]. In general, because of their computational requirements, these standard MIP formulations can only be applied to classification problems with a relatively small number of observations. Glover [19] proposed a compact mathematical program for the discriminant model, which is a variant of the above-mentioned hybrid LP model. The objective of this model is to minimize the number of misclassified entities. The MIP discriminant model is given by

$$
\begin{aligned}
\min\ & \sum_{i\in G} z_i\\
\text{s.t. } & A_i x - M z_i + f_i = b, && i \in G_1,\\
& A_i x + M z_i - f_i = b, && i \in G_2,\\
& f_i \ge 0, && i \in G,\\
& z_i \in \{0,1\}, && i \in G,\\
& x,\ b \text{ unrestricted},
\end{aligned}
$$

where $f_i$ are slack variables and $M$ is a large constant. $M$ is chosen so that, when $z_i = 1$, the constraint $A_i x \le b + M z_i$ is redundant for $i \in G_1$ and $A_i x \ge b - M z_i$ is redundant for $i \in G_2$. This model can incorporate the normalization constraint $\big(-n_2\sum_{i\in G_1} A_i + n_1\sum_{i\in G_2} A_i\big)x = 1$, where $n_1$ and $n_2$ are the numbers of entities in $G_1$ and $G_2$, respectively.
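For a fixed $(x, b)$, the optimal binary variables of the MIP discriminant model are easy to read off. With $M$ large enough, $z_i = 1$ exactly for the entities on the wrong side, so the objective is the number of misclassified entities. The sketch below, with hypothetical names, also evaluates the normalization left-hand side. Scaling $(x, b)$ by a positive constant does not change any classification, so a positive value can be normalized to 1 by division.

```python
def _dot(row, w):
    """Inner product of a data row with the coefficient vector."""
    return sum(r * wi for r, wi in zip(row, w))


def misclassified(A1, A2, x, b):
    """Smallest feasible z for a fixed (x, b).

    z_i = 1 exactly when entity i is strictly on the wrong side
    (A_i x > b for G1, A_i x < b for G2); points on the plane count as correct.
    """
    z1 = [1 if _dot(r, x) > b else 0 for r in A1]
    z2 = [1 if _dot(r, x) < b else 0 for r in A2]
    return z1 + z2


def glover_normalization(A1, A2, x):
    """Left-hand side (-n2 * sum_{G1} A_i + n1 * sum_{G2} A_i) x."""
    n1, n2 = len(A1), len(A2)
    return -n2 * sum(_dot(r, x) for r in A1) + n1 * sum(_dot(r, x) for r in A2)


A1, A2 = [[1.0], [3.0]], [[4.0], [2.0]]
z = misclassified(A1, A2, [1.0], 2.5)  # one error in each group
lhs = glover_normalization(A1, A2, [1.0])  # 4.0, so dividing (x, b) by 4 normalizes it
```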


Multi-hyperplane Classification 


Multi-hyperplane formulations, given by Better et 
al. [6], generate multiple linear hyperplanes simultane- 
ously with the consequence of forming a decision tree. 
The hyperplanes are generated from an extension of the 
Discriminant Model proposed by Glover [18]. Instead 
of using kernel transformation that projects data into 
a high dimensional space to improve the performance 
of SVM, the multi-hyperplane approach approximates 
a nonlinear separation by constructing multiple hyper- 
planes. Let d = 0 when we are at a root node of a bi- 
nary tree, where none of the classifications have been 
done. Let d = D when the tree has two leaf nodes cor- 
responding to the final separation step. In order to ex- 
plain the model, we define the following terms. 
• Successive Perfect Separation (SPS) is a procedure that forces all elements of Group 1 ($G_1$) and Group 2 ($G_2$) to lie on one side of the hyperplane at each node, for any depth $d \in \{0,\dots,D-1\}$. SPS is a special use of a variant based on a proposal of Glover [18].

• An SPS decision tree is the tree that results from iteratively applying the SPS procedure to the two-group classification. The root node ($d = 0$) contains all the entities in the data set, and at $d = D$ the two leaf nodes correspond to the final separation step.

For a given maximum depth $D$, an initial multi-hyperplane model considers each possible SPS tree type of depth $d$, for $d = 0,\dots,D-1$. A root node is viewed as a “problem” node, where all data points from both groups need to be separated. A leaf node, on the other hand, is considered to be a “decision” node, where data points are classified into the two groups.

Define slicing variables $sl_i$ for $i \in \{1,\dots,D-1\}$; a total of $D-1$ slicing variables is needed for a tree with maximum depth $D$. Specifically, at depth $d = 1$, $sl_1 = 0$ if the “left” node constitutes a leaf node while the “right” node constitutes a root (or problem) node. Without loss of generality, we consider $D = 3$ for the initial multi-hyperplane model. The mathematical model for multi-hyperplane SVM can be formally defined as follows.

Let $M$ and $\varepsilon$ denote large and small positive constants, respectively, and let $G$ denote the union of the entities in $G_1$ and $G_2$. Suppose there are $n$ entities in the training data set. Define two binary variables:

- $z_i^* = 0$ if object $i$ is correctly classified by the “tree”, and $z_i^* = 1$ otherwise;
- $z_{hi} = 0$ if object $i$ is correctly classified by “hyperplane $h$”, and $z_{hi} = 1$ otherwise.

Similar to the mixed integer programming model in [18], the multi-hyperplane SVM model includes traditional hyperplane constraints for each depth $d$ of the tree and a normalization constraint. The constant $\varepsilon$ is added to prevent data points from lying on a hyperplane. Tree-type constraints are included to identify the optimal tree structure for the data set, which becomes part of the optimal classification rule. Binary variables $y_i$ are used for tree types (0,1) and (1,0) to activate or deactivate either-or constraints. The SPS decision tree formulation for depth $D = 3$ is given by


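$$
\begin{aligned}
\min\ & \sum_{i\in G} z_i^*\\
\text{s.t. } & A_i x_d - M z_{di} + \beta_i = b_d - \varepsilon, && i\in G_1,\ d = 1,2,3 && (3)\\
& A_i x_d + M z_{di} - \beta_i = b_d + \varepsilon, && i\in G_2,\ d = 1,2,3 && (4)\\
& M(sl_1 + sl_2) + z_i^* \ge z_{1i} + z_{2i} + z_{3i} - 2, && i\in G_1 && (5)\\
& M(sl_1 + sl_2) + M z_i^* \ge z_{1i} + z_{2i} + z_{3i}, && i\in G_2 && (6)\\
& M(2 - sl_1 - sl_2) + M z_i^* \ge z_{1i} + z_{2i} + z_{3i}, && i\in G_1 && (7)\\
& M(2 - sl_1 - sl_2) + z_i^* \ge z_{1i} + z_{2i} + z_{3i} - 2, && i\in G_2 && (8)\\
& M(1 + sl_1 - sl_2) + z_i^* \ge z_{1i} - M y_i, && i\in G_1 && (9)\\
& M(1 + sl_1 - sl_2) + M z_i^* \ge z_{2i} + z_{3i} - M(1 - y_i), && i\in G_1 && (10)\\
& M(1 + sl_1 - sl_2) + z_i^* \ge z_{1i}, && i\in G_2 && (11)\\
& M(1 + sl_1 - sl_2) + z_i^* \ge z_{2i} + z_{3i} - 1, && i\in G_2 && (12)\\
& M(1 + sl_2 - sl_1) + z_i^* \ge z_{1i}, && i\in G_1 && (13)\\
& M(1 + sl_2 - sl_1) + z_i^* \ge z_{2i} + z_{3i} - 1, && i\in G_1 && (14)\\
& M(1 + sl_2 - sl_1) + z_i^* \ge z_{1i} - M y_i, && i\in G_2 && (15)\\
& M(1 + sl_2 - sl_1) + M z_i^* \ge z_{2i} + z_{3i} - M(1 - y_i), && i\in G_2 && (16)\\
& \sum_{j=1}^{m}\sum_{d=1}^{3} x_{dj} = 1 && && (17)\\
& z_i^* \in \{0,1\},\ z_{di} \in \{0,1\},\ y_i \in \{0,1\}, && i\in G,\ d = 1,2,3\\
& sl_k \in \{0,1\},\ k = 1,2;\quad x_d,\ b_d \text{ unrestricted},
\end{aligned}
$$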


where the constraints in Eqs. (3)-(4) are the hyper- 
plane constraints, Eqs. (5)-(6) are the constraints for 
tree type (0,0), Eqs. (7)-(8) are the constraints for tree 
type (1,1), Eqs. (9)-(12) are the constraints for tree type 
(0,1), Eqs. (13)-(16) are the constraints for tree type 
(1,0), and Eq. (17) is the normalization constraint. This small model with $D = 3$ performs well for small depths but has computational limitations. The reader should refer to [6] for further details of an improved and generalized model structure covering all types of SPS trees.


Support Feature Machines 


Support Feature Machines (SFM), proposed in [10], is a mathematical programming technique used to identify the set of features that gives the highest classification performance under the nearest neighbor rule. SFM can be formally defined as follows. Assume there are $n$ data points, each with $m$ features. We define two sets of decision variables:

- $x_j \in \{0,1\}$ ($j = 1,\dots,m$), indicating whether feature $j$ is selected by SFM;
- $y_i \in \{0,1\}$ ($i = 1,\dots,n$), indicating whether sample $i$ is correctly classified by SFM.

There are two versions of SFM, voting and averaging. Each version uses different weight matrices, which are provided by the user's classification rule.

The objective of voting SFM is to maximize the total number of correct classifications, as in Eq. (18). Two sets of constraints, Eqs. (19)-(20), ensure that the training samples are classified according to the voting nearest neighbor rule. The logical constraint in Eq. (21) ensures that at least one feature is used in the voting nearest neighbor rule. The mixed-integer program for voting SFM is given by:

$$
\begin{aligned}
\max\ & \sum_{i=1}^{n} y_i && (18)\\
\text{s.t. } & \sum_{j=1}^{m} a_{ij}x_j - \frac{1}{2}\sum_{j=1}^{m} x_j \le M y_i, && i = 1,\dots,n && (19)\\
& \frac{1}{2}\sum_{j=1}^{m} x_j - \sum_{j=1}^{m} a_{ij}x_j + \epsilon \le M(1 - y_i), && i = 1,\dots,n && (20)\\
& \sum_{j=1}^{m} x_j \ge 1 && && (21)\\
& x \in \{0,1\}^m,\ y \in \{0,1\}^n,
\end{aligned}
$$

where:

- $a_{ij} = 1$ if the nearest neighbor rule correctly classifies sample $i$ using feature $j$, and 0 otherwise;
- $n$ is the total number of training samples and $m$ is the total number of features;
- $M = m/2$;
- $\epsilon$ is a small positive number used to break ties in the voting ($0 < \epsilon < 1/2$).

The objective of averaging SFM is to maximize the total number of correct classifications, as in Eq. (22). Two sets of constraints, Eqs. (23)-(24), ensure that the training samples are classified according to the distance-averaging nearest neighbor rule. The logical constraint in Eq. (25) ensures that at least one feature is used in the distance-averaging nearest neighbor rule. The mixed-integer program for averaging SFM is given by:

$$
\begin{aligned}
\max\ & \sum_{i=1}^{n} y_i && (22)\\
\text{s.t. } & \sum_{j=1}^{m} d_{ij}x_j - \sum_{j=1}^{m} \bar d_{ij}x_j \le M_{1i}\, y_i, && i = 1,\dots,n && (23)\\
& \sum_{j=1}^{m} \bar d_{ij}x_j - \sum_{j=1}^{m} d_{ij}x_j \le M_{2i}(1 - y_i), && i = 1,\dots,n && (24)\\
& \sum_{j=1}^{m} x_j \ge 1 && && (25)\\
& x \in \{0,1\}^m,\ y \in \{0,1\}^n,
\end{aligned}
$$

where:

- $\bar d_{ij}$ is the average statistical distance between sample $i$ and all other samples from the same class at feature $j$ (intra-class distance);
- $d_{ij}$ is the average statistical distance between sample $i$ and all samples from a different class at feature $j$ (inter-class distance);
- $M_{1i} = \sum_{j=1}^{m} d_{ij}$ and $M_{2i} = \sum_{j=1}^{m} \bar d_{ij}$.
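For a handful of features, the SFM problems can be checked by brute force. The sketch evaluates the correct-classification indicators that the constraints are meant to enforce:

- voting: a strict majority of the selected features votes correctly;
- averaging: the selected intra-class distance sum is strictly smaller than the inter-class sum, with ties counted as incorrect (an assumption).

It then enumerates all nonempty feature subsets. Exhaustive enumeration is a stand-in for the MIP that is only usable for small $m$. `eps = 0.25` is an arbitrary value in $(0, 1/2)$, and all names are hypothetical.

```python
from itertools import product


def voting_correct(a, x, eps=0.25):
    """y_i = 1 iff sum_j a_ij x_j - (1/2) sum_j x_j >= eps (strict majority)."""
    k = sum(x)
    return [1 if sum(aij * xj for aij, xj in zip(row, x)) - k / 2 >= eps else 0
            for row in a]


def averaging_correct(d_intra, d_inter, x):
    """y_i = 1 iff the selected intra-class sum is below the selected inter-class sum."""
    return [1 if sum(di * xj for di, xj in zip(ri, x)) < sum(de * xj for de, xj in zip(re, x))
            else 0
            for ri, re in zip(d_intra, d_inter)]


def best_voting_subset(a, eps=0.25):
    """Enumerate all nonempty feature subsets (Eq. (21)) and keep the best score."""
    best = None
    for x in product((0, 1), repeat=len(a[0])):
        if sum(x) == 0:
            continue
        score = sum(voting_correct(a, x, eps))
        if best is None or score > best[0]:
            best = (score, x)
    return best


a = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]  # a[i][j] = 1 if feature j alone classifies sample i correctly
best = best_voting_subset(a)
```

In the second test case below, selecting both features produces a one-to-one tie for every sample. The tie is counted as incorrect, which is the role $\epsilon$ plays in Eq. (20).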
Probabilistic Optimization Models


The deterministic classification models in the previous 
section make a strong assumption that the data are sep- 
arable. In the case that the data may not be well sepa- 
rated, using the deterministic models may lead to a high 
misclassification rate. The classification models that in- 
corporate probabilities may be a better option for such 
noisy data. When the population densities and prior probabilities are known, there are probabilistic models that consider constrained rules with a reject option [2], as well as a Bayesian-based model [4].


Bayesian-Based Mathematical Program 


The Bayesian-based mathematical program, which is consistent with the Bayesian decision-theoretic approach, was proposed by Asparouhov and Danchev [4]. The model can be formally defined as follows.

Denote by $c \in \mathbb{R}$ a cut-off value, by $x \in \mathbb{B}^m$ a vector of $m$ binary values, and by $\omega \in \mathbb{R}^m$ the decision variable, an $m$-dimensional vector of real numbers. The classification rule assigns an entity $x$ to class 1 if $x^\top\omega \le c$, and to class 2 otherwise. Suppose we have a set of $n$ data points, of which $n_1$ are in $G_1$ and $n_2$ are in $G_2$ ($n = n_1 + n_2$).

Let $s$ be a nonempty multinomial cell, with $x_s$ the binary vector shared by its observations. Denote by $n_{is}$ the number of design-set observations from class $i$, $i = 1,2$, falling in cell $s$. There are $2^m$ multinomial cells. Each cell is unique, and all observations that belong to it have exactly the same values of the $m$ binary variables. Denote by $M$ a sufficiently large positive real number and by $\varepsilon$ a small positive number.

In addition to having a geometric interpretation, this formulation is inspired by the Bayesian decision-theoretic approach, and it incorporates the prior probabilities $\pi_i$ estimated by $n_i/n$. Experimental studies in [4] suggest that this Bayesian-based model can perform better than other contemporary linear discriminant models. The Bayesian-based classification formulation is given by

$$
\begin{aligned}
\min_{\omega,\,z,\,c}\ & \sum_{s}\Big(|n_{1s} - n_{2s}|\, z_s + \min(n_{1s}, n_{2s})\Big)\\
\text{s.t. } & x_s^\top\omega - M z_s \le c, && \text{if } n_{1s} > n_{2s},\\
& x_s^\top\omega + M z_s \ge c + \varepsilon, && \text{if } n_{1s} < n_{2s},\\
& && \text{for all cells } s \text{ with } n_{1s} + n_{2s} \ne 0,\\
& z_s \in \{0,1\},\ \omega \in \mathbb{R}^m,\ c \in \mathbb{R}.
\end{aligned}
$$
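Because all observations in a cell share the same binary vector, any rule misclassifies either all class 1 or all class 2 observations of that cell. With $z_s = 1$ marking a cell placed on its minority side, the term $|n_{1s} - n_{2s}| z_s + \min(n_{1s}, n_{2s})$ counts the errors in cell $s$. Consequently, $\sum_s \min(n_{1s}, n_{2s})$ is a lower bound on the objective. A minimal sketch with hypothetical names:

```python
def cell_counts(samples):
    """samples: list of (binary tuple, class label in {1, 2}) -> {cell: [n1s, n2s]}."""
    counts = {}
    for cell, label in samples:
        counts.setdefault(cell, [0, 0])[label - 1] += 1
    return counts


def misclassifications(counts, w, c):
    """Errors of the rule 'class 1 if x^T w <= c, else class 2', summed over cells."""
    total = 0
    for cell, (n1, n2) in counts.items():
        score = sum(xi * wi for xi, wi in zip(cell, w))
        total += n2 if score <= c else n1
    return total


def lower_bound(counts):
    """Sum of min(n1s, n2s): the errors if every cell went to its majority class."""
    return sum(min(n1, n2) for n1, n2 in counts.values())


samples = [((0, 0), 1)] * 3 + [((0, 0), 2)] + [((1, 0), 2)] * 2 + [((0, 1), 1), ((0, 1), 2)]
counts = cell_counts(samples)
errors = misclassifications(counts, (1.0, 0.0), 0.5)  # attains lower_bound(counts)
```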


Probabilistic Models for Classification 


An optimization model proposed by Anderson [2] incorporates the population densities, the prior probabilities of all groups, and misclassification probabilities. The method aims to find a partition $\{R_0, R_1, \dots, R_G\}$ of $\mathbb{R}^m$, where $m$ is the number of features and $R_0$ is the reject region; it therefore naturally handles multi-group classification. The objective is to maximize the probability of correct allocation subject to constraints on the misclassification probabilities.

The mathematical model can be formally defined as follows. Let $f_h$, $h = 1,\dots,G$, denote the group-conditional density functions. Let $\pi_g$ denote the prior probability that a randomly selected entity is from group $g$, $g = 1,\dots,G$, and let $\alpha_{hg}$, $h \ne g$, be constants between 0 and 1. The probabilistic classification model is then given by

$$
\begin{aligned}
\max_{R_0,\dots,R_G}\ & \sum_{g=1}^{G} \pi_g \int_{R_g} f_g(w)\,dw\\
\text{s.t. } & \int_{R_g} f_h(w)\,dw \le \alpha_{hg}, \qquad h,g = 1,\dots,G,\ h \ne g.
\end{aligned}
$$

The optimal rule that can be used as a classification method is given by

$$R_g = \Big\{ x \in \mathbb{R}^m : L_g(x) = \max_{h\in\{0,1,\dots,G\}} L_h(x) \Big\}, \qquad (26)$$

where $g = 0,\dots,G$, $L_0(x) = 0$, and

$$L_h(x) = \pi_h f_h(x) - \sum_{i=1,\, i\ne h}^{G} \lambda_{ih} f_i(x), \qquad h = 1,\dots,G.$$

In general, there exist nonnegative constants $\lambda_{ih}$, $i,h \in \{1,\dots,G\}$, $i \ne h$, such that this optimal rule holds. The procedure for deriving a discriminant rule consists of two stages:

1. Compute $\hat f_h$, the estimated density functions $f_h$, and $\hat\pi_h$, the estimated prior probabilities $\pi_h$, for $h = 1,\dots,G$. Many methods have been proposed for density estimation.
2. Estimate the optimal $\lambda_{ih}$'s given the estimates $\hat f_h$ and $\hat\pi_h$.

For estimating the $\lambda_{ih}$'s, there is a MIP approach proposed in [14] and an LP approach proposed in [22].
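Once densities, priors and the constants $\lambda_{ih}$ are available, rule (26) allocates a point to the index with the largest $L_h(x)$, with index 0 meaning rejection. The sketch below assumes one-dimensional Gaussian densities purely for illustration, and its function names are hypothetical. With all $\lambda_{ih} = 0$ it reduces to the Bayes rule $\arg\max_h \pi_h f_h(x)$. Large $\lambda_{ih}$ push ambiguous points into the reject region.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (illustrative choice of f_h)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def anderson_allocate(x, priors, densities, lam):
    """Return g in {0, 1, ..., G} maximizing L_g(x); 0 is the reject region.

    priors[h - 1] = pi_h, densities[h - 1] = f_h, lam[(i, h)] = lambda_ih (missing -> 0).
    """
    G = len(priors)
    f = [dens(x) for dens in densities]
    scores = [0.0]  # L_0(x) = 0
    for h in range(1, G + 1):
        penalty = sum(lam.get((i, h), 0.0) * f[i - 1] for i in range(1, G + 1) if i != h)
        scores.append(priors[h - 1] * f[h - 1] - penalty)
    return max(range(G + 1), key=lambda g: scores[g])


dens = [lambda t: normal_pdf(t, 0.0, 1.0), lambda t: normal_pdf(t, 3.0, 1.0)]
lam = {(1, 2): 2.0, (2, 1): 2.0}
labels = [anderson_allocate(t, [0.5, 0.5], dens, lam) for t in (0.0, 1.5, 3.0)]  # 1, reject, 2
```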

The MIP approach uses binary variables to record whether each entity was allocated to each region. For any candidate set of $\lambda_{ih}$'s, this approach measures the probabilities of correct classification and misclassification as the proportions of training samples that fall into each of the regions. The objective is to maximize a linear combination of variables representing correct allocation, and the proportions of misclassified training samples are incorporated in constraints on the misclassification probabilities.

The LP approach, on the other hand, does not use binary variables to record misclassified training points. It therefore provides no mechanism for modeling a priori bounds on misclassification probabilities. Instead, it provides a mechanism for estimating $\lambda_{ih}$'s that balances the minimization of misclassifications against the maximization of correct classifications. This can be demonstrated as follows. Redefine the function $L_h$, $h = 1,\dots,G$, as

$$L_h(x) = \pi_h\, p_h(x) - \sum_{i=1,\, i\ne h}^{G} \lambda_{ih}\, p_i(x), \qquad (27)$$

where $p_i(x) = f_i(x)/\sum_{t=1}^{G} f_t(x)$. This is analogous to the original definition of $L_h$ in Eq. (26). Since $\sum_{t} f_t(x) > 0$, multiplying every $L_h$ by $1/\sum_{t} f_t(x)$ leaves the region $R_g = \{x \in \mathbb{R}^k : L_h(x) \le L_g(x),\ h = 0,\dots,G\}$ unchanged. Note that this new definition of $L_h$ is an assumption.

In addition, we assume a training sample of $n$ data points whose group classifications are known. There are $n_g$ data points in group $g$, and $\sum_{g=1}^{G} n_g = n$. For notational convenience, let $\mathcal{G} = \{1,\dots,G\}$ and $N_g = \{1,\dots,n_g\}$. Each data point $x$ has $k$ attributes, denoted $x^{gj} \in \mathbb{R}^k$ for $g \in \mathcal{G}$ and $j \in N_g$.


MIP Formulation for Anderson’s Model 


Once the estimates $\hat{f}_h$'s and $\hat{\pi}_h$'s are given, the goal of the second stage is to find optimal $\lambda_{ih}$'s for Anderson's formula in Eq. (27). Gallagher et al. [14] proposed a MIP formulation for estimating the $\lambda_{ih}$'s, using the same notation as Anderson's formula in the last section. The model ensures that, among the $n_g$ training data points of group $g$, the proportion falling in region $R_h$ is at most a pre-specified percentage $\alpha_{hg}$ ($0 < \alpha_{hg} < 1$), for $h, g \in \mathcal{G}$ and $h \neq g$. The original formulation of the approach is a nonlinear MIP model given by


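$$\max_{L_{hgj},\, y_{gj},\, \lambda_{ih},\, u_{hgj}} \ \sum_{g \in \mathcal{G}} \sum_{j \in N_g} u_{ggj}$$

s.t.

$$L_{hgj} = \pi_h p_h(x^{gj}) - \sum_{i \in \mathcal{G} \setminus \{h\}} \lambda_{ih} p_i(x^{gj}) \quad \text{for } h, g \in \mathcal{G},\ j \in N_g \tag{28}$$

$$y_{gj} = \max\{0, L_{hgj} : h = 1, \dots, G\} \quad \text{for } g \in \mathcal{G},\ j \in N_g \tag{29}$$

$$y_{gj} - L_{ggj} \le M(1 - u_{ggj}) \quad \text{for } g \in \mathcal{G},\ j \in N_g \tag{30}$$

$$y_{gj} - L_{hgj} \ge \varepsilon(1 - u_{hgj}) \quad \text{for } h, g \in \mathcal{G},\ h \neq g,\ j \in N_g \tag{31}$$

$$\sum_{j \in N_g} u_{hgj} \le \lfloor \alpha_{hg} n_g \rfloor \quad \text{for } h, g \in \mathcal{G},\ h \neq g \tag{32}$$

$$-\infty < L_{hgj} < \infty \quad \text{for } h, g \in \mathcal{G},\ j \in N_g$$

$$y_{gj} \ge 0 \quad \text{for } g \in \mathcal{G},\ j \in N_g$$

$$\lambda_{ih} \ge 0 \quad \text{for } i, h \in \mathcal{G},\ i \neq h$$

$$u_{hgj} \in \{0, 1\} \quad \text{for } h, g \in \mathcal{G},\ j \in N_g$$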

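The above nonlinear mixed integer programming model can be transformed into an equivalent linear mixed integer model by replacing the constraint in Eq. (29) with the following constraints:

$$y_{gj} \ge L_{hgj} \quad \text{for } h, g \in \mathcal{G},\ j \in N_g$$

$$y_{gj} - L_{hgj} \le M(1 - v_{hgj}) \quad \text{for } h, g \in \mathcal{G},\ j \in N_g$$

$$\tilde{y}_{hgj} \le \pi_h p_h(x^{gj})\, v_{hgj} \quad \text{for } h, g \in \mathcal{G},\ j \in N_g$$

$$\sum_{h \in \mathcal{G}} v_{hgj} \le 1 \quad \text{for } g \in \mathcal{G},\ j \in N_g$$

$$\sum_{h \in \mathcal{G}} \tilde{y}_{hgj} = y_{gj} \quad \text{for } g \in \mathcal{G},\ j \in N_g$$

where $\tilde{y}_{hgj} \ge 0$ and $v_{hgj} \in \{0, 1\}$, for $h, g \in \mathcal{G}$, $j \in N_g$.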
The constraints in Eq. (28) define the decision variable $L_{hgj}$ as the value of $L_h$ at $x^{gj}$. The variable $y_{gj}$ in Eq. (29) is such that $x^{gj}$ lies in region $R_h$ if and only if $y_{gj} = L_{hgj}$. The binary variable $u_{hgj}$ indicates whether or not $x^{gj}$ lies in region $R_h$. Together with the objective function, the constraints in Eq. (30) ensure that $u_{ggj} = 1$ if and only if the $j$th entity from group $g$ is correctly allocated to group $g$. The constraints in Eqs. (31)–(32) ensure that at most $\lfloor \alpha_{hg} n_g \rfloor$ data points of group $g$ are allocated to group $h$, $h \neq g$.

By Eq. (31), the indicator condition $u_{hgj} = 0$, $h \neq g$, implies that $x^{gj} \notin R_h$, but the converse need not hold. As a result, the number of misclassifications may be overcounted. To force the converse to hold (that is, $u_{hgj} = 1$ if and only if $x^{gj} \in R_h$, $\forall h, g \in \mathcal{G}$), one can include the constraints $y_{gj} - L_{hgj} \le M(1 - u_{hgj})$ for $h, g \in \mathcal{G}$, $j \in N_g$. However, adding such constraints substantially increases solution times, and the actual amount of overcounting is minimal. $M$ and $\varepsilon$ are large and small positive constants, respectively.

This MIP formulation is very difficult to solve, especially since it involves $2Gn$ binary variables (the $u_{hgj}$ and the $v_{hgj}$). Reference [14] suggests a preprocessing strategy that aggregates variables and constraints, as well as special branching strategies for solving the MIP model. These strategies include branching on the smallest-indexed fractional-valued binary variable, branching on the most infeasible fractional-valued binary variable, pseudo reduced-cost branching schemes, and strong branching [3,7].


LP Formulation for Anderson’s Model 


To estimate the $\lambda_{ih}$'s for Anderson's formula (27) in the second stage, Lee et al. [22] proposed a linear programming (LP) model. The model minimizes a penalty function in order to allocate each training entity either to its correct group or to the reserved-judgment region. The notation of the MIP approach and of Anderson's formula is retained here. The method is given by


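$$\min_{L_{hgj},\, w_{gj},\, y_{gj},\, \lambda_{ih}} \ \sum_{g \in \mathcal{G}} \sum_{j \in N_g} \left(c_1 w_{gj} + c_2 y_{gj}\right)$$

s.t.

$$L_{hgj} = \pi_h p_h(x^{gj}) - \sum_{i \in \mathcal{G} \setminus \{h\}} \lambda_{ih} p_i(x^{gj}) \quad \text{for } h, g \in \mathcal{G},\ j \in N_g \tag{33}$$

$$L_{ggj} - L_{hgj} + w_{gj} \ge 0 \quad \text{for } h, g \in \mathcal{G},\ h \neq g,\ j \in N_g \tag{34}$$

$$L_{ggj} + w_{gj} \ge 0 \quad \text{for } g \in \mathcal{G},\ j \in N_g \tag{35}$$

$$-L_{hgj} + y_{gj} \ge 0 \quad \text{for } h, g \in \mathcal{G},\ j \in N_g \tag{36}$$

$$-\infty < L_{hgj} < \infty \quad \text{for } h, g \in \mathcal{G},\ j \in N_g$$

$$w_{gj} \ge 0 \quad \text{for } g \in \mathcal{G},\ j \in N_g$$

$$y_{gj} \ge 0 \quad \text{for } g \in \mathcal{G},\ j \in N_g$$

$$\lambda_{ih} \ge 0 \quad \text{for } i, h \in \mathcal{G},\ i \neq h.$$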


The constraints in Eq. (33) define the decision variable $L_{hgj}$ as the value of $L_h$ at $x^{gj}$.

If the optimal solution yields $w_{gj} = 0$ for some $(g, j)$ pair, the constraints in Eqs. (34)–(35) imply that $L_{ggj} = \max\{0, L_{hgj} : h \in \mathcal{G}\}$. Thus, $w_{gj} = 0$ means that the $j$th entity from group $g$ is correctly classified.

If $y_{gj} = 0$ for some $(g, j)$ pair, the constraints in Eq. (36) imply that $\max\{0, L_{hgj} : h \in \mathcal{G}\} = 0$. Hence, the $j$th entity from group $g$ is placed in the reserved-judgment region.

If both $w_{gj}$ and $y_{gj}$ are positive, the $j$th entity from group $g$ is misclassified. The optimization solver thus attempts either to classify training data points correctly ($w_{gj} = 0$) or to place them in the reserved-judgment region ($y_{gj} = 0$). The optimizer's emphasis can be adjusted by varying the weights $c_1$ and $c_2$.

It is possible for both $w_{gj}$ and $y_{gj}$ to be zero, and one must decide how to interpret such a situation. Recall the optimal rule in Eq. (26): if $x$ belongs to the reserved-judgment region ($h = 0$), then it gives the function value $L_0(x) = 0$.
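For a fixed choice of the $\lambda_{ih}$'s, and assuming $c_1, c_2 > 0$, constraints (34)–(36) determine the smallest feasible slacks in closed form:

$$w_{gj} = \max\{0, L_{hgj} : h \in \mathcal{G}\} - L_{ggj}, \qquad y_{gj} = \max\{0, L_{hgj} : h \in \mathcal{G}\}.$$

The sketch below computes these slacks, applies the interpretation given in the text, and evaluates the resulting penalty. The function names are hypothetical. The code only evaluates the LP objective for a given set of $\lambda_{ih}$'s. It does not solve the LP, which would require an LP solver.

```python
from typing import List, Tuple


def lp_slacks(L_point: List[float], g: int) -> Tuple[float, float]:
    """Smallest w_gj, y_gj satisfying (34)-(36) once lambda (hence every L_hgj) is fixed.

    L_point[h - 1] holds L_hgj for h = 1..G; g is the true group (1-based).
    """
    top = max(0.0, max(L_point))   # max{0, L_hgj : h in G}
    w = top - L_point[g - 1]       # w >= L_hgj - L_ggj (34), w >= -L_ggj (35), w >= 0
    y = top                        # y >= L_hgj (36), y >= 0
    return w, y


def lp_status(w: float, y: float, tol: float = 1e-12) -> str:
    """Interpretation of the pair (w_gj, y_gj) described in the text."""
    if w <= tol and y <= tol:
        return "ambiguous"         # both zero: the user must decide
    if w <= tol:
        return "correct"
    if y <= tol:
        return "reserved"
    return "misclassified"


def lp_penalty(points: List[Tuple[List[float], int]], c1: float, c2: float) -> float:
    """Objective value of the LP, sum of c1*w_gj + c2*y_gj, for a fixed lambda."""
    total = 0.0
    for L_point, g in points:
        w, y = lp_slacks(L_point, g)
        total += c1 * w + c2 * y
    return total
```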
Let $\mathbb{N}$ be the set of natural numbers. Let $N \in \mathbb{N}$, let $\mathbb{R}^N$ be the $N$-dimensional real Euclidean space, and let $\mathbb{C}^N$ be the $N$-dimensional complex Euclidean space. $\mathbb{R}$ and $\mathbb{C}$ are used in place of $\mathbb{R}^1$ and $\mathbb{C}^1$, respectively. Let $\mathbb{R}^+ = \{x \in \mathbb{R} : x \ge 0\}$. For $x \in \mathbb{R}$, $|x|$ denotes the absolute value of $x$. For $x, y \in \mathbb{R}^N$, $(x, y)$ denotes the scalar product in $\mathbb{R}^N$ and $\|x\|$ denotes the Euclidean norm in $\mathbb{R}^N$. Let $S^{N-1} = \{x \in \mathbb{R}^N : \|x\| = 1\}$.

Let $D \subseteq \mathbb{R}^N$ be a connected domain, and let $F: D \to \mathbb{R}$ be a given function. The following problem is considered:

$$\min_{x \in D} F(x). \tag{1}$$


When $D = \mathbb{R}^N$, problem (1) is called the global unconstrained optimization problem. When $D \subset \mathbb{R}^N$, it is called the global constrained optimization problem.

Without loss of generality one considers only the 
minimization problem, that is, problem (1), since the 
maximization problem can be easily reduced to a mini- 
mization problem. 

To solve problem (1) means to find a point $x^* \in D$ such that $F(x^*) \le F(x)$, $\forall x \in D$.

A large number of problems with great theoretical 
and practical interest can be formulated as global opti- 
mization problems, that is, as problem (1). 

In this article, global optimization problems are studied only from the point of view of numerical optimization, and in particular of numerical methods based on differential equations. Many other fruitful points of view are possible. Examples are the study of the set of global minimizers of $F$ on $D$ or, more generally, of the set of critical points of $F$ on $D$, depending on the hypotheses made on $F$ and $D$.

A method to solve problem (1) in the sense of numerical optimization is usually an iterative scheme. Starting from a given initial guess $x^0 \in D$, it computes a sequence $\{x^n \in D : n \in \mathbb{N}\}$ such that $x^n \to x^*$ when $n \to \infty$.

Problem (1) can be easily solved in some special cases, that is, when the function $F$ and the domain $D$ have special forms. Two important cases are the following:

• linear programming problem: $F$ is a linear function and $D$ is a convex polyhedron, i.e., $D \subset \mathbb{R}^N$ is defined implicitly by means of equalities and inequalities between linear functions;

• convex programming problem: $F$ is a convex function and $D \subset \mathbb{R}^N$ is a convex region.

One notes that the linear programming problem can 
be considered as a special case of the convex program- 
ming problem. For both cases effective methods to solve 
problem (1) are known, e.g., for the linear program- 
ming problem the simplex method, see [7], and for 
the convex programming problem the Newton method 
coupled with some strategy to treat the constraints that
define D, for example an active set strategy, see [9].

In general, problem (1) is a difficult one since the 
property of being a global minimizer is not a local prop- 
erty. That is, a global minimizer x* cannot be recog- 
nized from local properties of the function F at x*, such 
as the value of F and its derivatives at x*. Numerical al- 
gorithms to recognize global properties are unusual and 
in general computationally expensive. 

For example, let $D = \mathbb{R}$, $m, a \in \mathbb{R}$ and $\delta > 0$, and consider the following two functions:

$$F_1(x) = -\frac{1}{1 + x^2},$$

$$F_2(x) = -\frac{1}{1 + x^2} + \begin{cases} m \exp\!\left(\dfrac{(x-a)^2}{(x-a)^2 - \delta^2}\right), & x \in (a - \delta, a + \delta), \\ 0, & x \notin (a - \delta, a + \delta). \end{cases}$$

The function $F_1$ has its unique local minimizer at $x = 0$, and this is also the global minimizer, i.e., $x^* = 0$. Now let $m < -1$ and $0 < \delta < |a|$. Then the function $F_2$ has several critical points, including two local minimizers: one is $x = 0$ and the other is a point $x = x_2 \in (a - \delta, a + \delta)$. Moreover, the global minimizer of $F_2$ is $x = x^* = x_2$. Note that $F_1$ and $F_2$ are smooth functions and that they coincide for every $x \in \mathbb{R} \setminus (a - \delta, a + \delta)$, where $\delta > 0$ can be taken arbitrarily small.

Let $D = \mathbb{R}^N$, let $F: D \to \mathbb{R}$ be a continuously differentiable function, let $\nabla F$ be the gradient of $F$, and let $x \in D$ be such that $(\nabla F)(x) \neq 0$. Then the vector $-(\nabla F)(x)$ gives the direction of steepest descent for the function $F$ at the point $x$. One can consider the following system of differential equations:

$$\frac{dx}{dt}(t) = -(\nabla F)(x(t)), \quad t > 0, \tag{2}$$

$$x(0) = x^0. \tag{3}$$


Under some hypotheses on $F$, the solution of problem (2), (3) is a trajectory in $\mathbb{R}^N$ that starts from $x^0$ and ends in the critical point $x_c(x^0)$ of $F$ whose attraction region contains $x^0$.

A numerical integration scheme for (2), (3) yields a numerical optimization method. For example, the Euler integration scheme with variable stepsize applied to (2), (3) gives the so-called steepest descent algorithm. Let $t_0 = 0$, $0 < t_k < t_{k+1} < +\infty$, $k = 1, 2, \dots$, with $t_k \to +\infty$ when $k \to \infty$. Let $\xi^k \in \mathbb{R}^N$, $k \in \mathbb{N}$, be the approximation of $x(t_k)$ obtained with a numerical optimization method derived from (2), (3). If $\{\xi^k : k \in \mathbb{N}\}$ is a sufficiently good approximation of the solution $x(t)$, $t > 0$, of (2), (3), then $\lim_{k \to \infty} \xi^k = \lim_{t \to +\infty} x(t) = x_c(x^0)$. Thus the numerical optimization methods obtained from (2), (3) compute critical points that depend on the initial guess $x^0$. Hence these critical points usually are not global minimizers of $F$.
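The following sketch applies the steepest descent iteration to an instance of $F_2$. Here the iteration is explicit Euler with a constant step for (2), (3). The bump used, $m\exp\big((x-a)^2/((x-a)^2-\delta^2)\big)$ on $(a-\delta, a+\delta)$, and the values $m = -2$, $a = 3$, $\delta = 1$, $h = 0.1$ are assumptions made for illustration. Any smooth bump of depth $m < -1$ supported in $(a-\delta, a+\delta)$ shows the same effect.

```python
import math


def F2(x: float, m: float = -2.0, a: float = 3.0, delta: float = 1.0) -> float:
    """-1/(1 + x^2) plus an assumed smooth bump of depth m on (a - delta, a + delta)."""
    value = -1.0 / (1.0 + x * x)
    u = x - a
    if abs(u) < delta:
        value += m * math.exp(u * u / (u * u - delta * delta))
    return value


def dF2(x: float, m: float = -2.0, a: float = 3.0, delta: float = 1.0) -> float:
    """Derivative of F2."""
    grad = 2.0 * x / (1.0 + x * x) ** 2
    u = x - a
    if abs(u) < delta:
        d = u * u - delta * delta  # strictly negative inside the bump
        grad += m * math.exp(u * u / d) * (-2.0 * u * delta * delta / (d * d))
    return grad


def steepest_descent(x0: float, h: float = 0.1, steps: int = 2000) -> float:
    """Explicit Euler with constant step h for dx/dt = -F2'(x), x(0) = x0."""
    x = x0
    for _ in range(steps):
        x -= h * dF2(x)
    return x


x_local = steepest_descent(0.5)    # ends at the local minimizer 0
x_global = steepest_descent(3.2)   # ends at the minimizer inside the bump
```

Started at $x^0 = 0.5$, the iteration converges to the local minimizer $0$. Only a start inside the bump reaches the global minimizer near $a$.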

One can also consider numerical optimization methods based on differential equations other than (2), for example equations that take into account higher-order derivatives of $F$ or of $x(t)$. However, the minimizers computed with these methods depend only on local properties of the function $F$, so in general they will not be global minimizers of $F$. Hence methods based on ordinary differential equations are inadequate for problem (1).

This article describes how to use stochastic differential equations to avoid this difficulty. The idea is to destabilize the trajectories generated by problem (2), (3) with a stochastic perturbation, so that they can reach global minimizers. The perturbation must be appropriate. That is, the perturbed trajectories must be able to leave the attraction region of a local minimizer of $F$ and enter the attraction region of another minimizer of $F$, yielding the solution of problem (1) as $t \to +\infty$. This is achieved by adding a stochastic term, i.e., a Brownian motion, to the right-hand side of equation (2). When $D \subset \mathbb{R}^N$, this stochastic term must also take the domain $D$ into account, which is done by introducing the solution of the Skorokhod reflection problem.

The second section gives the mathematical background on stochastic differential equations needed to state the results of the third and fourth sections. The third section treats the unconstrained version of problem (1), i.e., $D = \mathbb{R}^N$. The fourth section treats the constrained version of problem (1), i.e., $D \subset \mathbb{R}^N$. Both sections give methods, convergence analysis and, when possible, a discussion of a relevant software library. The last section gives some information about new application areas of global optimization, such as graph theory and game theory.


Mathematical Background 


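Let $\Omega \subseteq \mathbb{R}$, let $\Sigma$ be a $\sigma$-field of subsets of $\Omega$, and let $P$ be a probability measure on $\Sigma$. The triple $(\Omega, \Sigma, P)$ is called a probability measure space; see [5] for a detailed introduction to probability theory. Let $\Omega' \subseteq \mathbb{R}$ and let $\mathcal{T}$ be a topology of subsets of $\Omega'$. Then $X: \Omega \to \Omega'$ is a random variable if $\{X \in A\} \in \Sigma$ for every $A \in \mathcal{T}$.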

The distribution function $G_X: \mathbb{R} \to [0, 1]$ of $X$ is defined by $G_X(x) = P\{X \le x\}$, $x \in \mathbb{R}$, and $g_X$ denotes its density. The expected value, or mean value, of $X$ is defined as follows:

$$m(X) = \int_{\mathbb{R}} x\, G_X(dx) = \int_{\mathbb{R}} x\, g_X(x)\, dx, \tag{4}$$

and the variance of $X$ is given by:

$$v(X) = m\big((X - m(X))^2\big). \tag{5}$$


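For example, a random variable $X$ has a discrete distribution, or is concentrated on $x_1, \dots, x_n$, when $g_X(x) = \sum_{i=1}^{n} p_i\, \delta(x - x_i)$. Here $p_i \ge 0$, $x_i \in \Omega'$, $i = 1, \dots, n$, $\sum_{i=1}^{n} p_i = 1$, and $\delta$ is the Dirac delta. Given $m \in \mathbb{R}$ and $v > 0$, a random variable has a normal distribution when

$$g_X(x) = \frac{1}{\sqrt{2\pi v}}\, e^{-\frac{(x - m)^2}{2v}}, \quad x \in \mathbb{R};$$

in this case $m(X) = m$ and $v(X) = v$.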

A stochastic process is a family of random variables depending on a parameter $t$, that is, $\{X(t): \Omega \to \Omega',\ t \ge 0\}$. A Brownian motion is a stochastic process $\{w(t): t \ge 0\}$ having the following properties:

• $P\{w(0) = 0\} = 1$;

• for every choice of $t_i$, $i = 1, \dots, k$, with $0 \le t_i < t_{i+1} < +\infty$, $i = 1, \dots, k-1$, the increments $w(t_{i+1}) - w(t_i)$, $i = 1, \dots, k-1$, are independent and normally distributed random variables with mean value equal to zero and variance equal to $t_{i+1} - t_i$.
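The two defining properties translate directly into a sampling procedure on a uniform grid, sketched below. The function name and the grid are assumptions for illustration. Since the components of an $N$-dimensional Brownian motion are independent, an $N$-dimensional path can be obtained by calling the function once per component.

```python
import math
import random
from typing import List


def brownian_path(n_steps: int, h: float, rng: random.Random) -> List[float]:
    """Values w(0), w(h), ..., w(n_steps * h) of a one-dimensional Brownian motion.

    Uses w(0) = 0 and independent increments w(t + h) - w(t) ~ Normal(0, h).
    """
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(h)))  # gauss() takes the std. deviation
    return w


path = brownian_path(10, 0.1, random.Random(0))  # samples on the grid 0, 0.1, ..., 1.0
```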

An $N$-dimensional Brownian motion is an $N$-dimensional process

$$\{w(t) = (w_1(t), \dots, w_N(t))^T : t \ge 0\}$$

whose components $\{w_i(t): \Omega \to \Omega',\ t \ge 0\}$, $i = 1, \dots, N$, are independent Brownian motions. Brownian motion is a good mathematical model for phenomena that are the superposition of a large number of chaotic, elementary, independent events. The most famous example is the motion of pollen grains immersed in a fluid: the grains move chaotically and perpetually because of collisions with the molecules of the fluid, see [15, p. 39].

Let $\Pi = \Omega' \times \dots \times \Omega' \subseteq \mathbb{R}^N$, where $\times$ denotes the Cartesian product of sets. Let $\mathcal{T}'$ be a topology of subsets of $\Pi$. Let $s$, $t$ be such that $0 \le s < t$, and let $x \in \Pi$, $A \in \mathcal{T}'$. Then the transition distribution function of an $N$-dimensional stochastic process $\{X(t): t \ge 0\}$ is defined as follows:

$$T(s, x, t, A) = P\{X(t) \in A \mid X(s) = x\}. \tag{6}$$

When $T$ can be written as

$$T(s, x, t, A) = \int_A p(s, x, t, y)\, dy \tag{7}$$

for every $0 \le s < t$, $x \in \Pi$, $A \in \mathcal{T}'$, the function $p$ is called the transition probability density of the process $\{X(t): t \ge 0\}$.

Finally, if there exists a density distribution function $\pi$ that depends only on $x \in \Pi$ such that

$$\pi(x) = \lim_{t \to +\infty} p(s, u, t, x), \tag{8}$$

then $\pi$ is called the steady-state distribution density of the process $\{X(t): t \ge 0\}$.

Consider the following stochastic differential equation:

$$dZ(t) = \alpha(Z(t), t)\, dt + \beta(Z(t), t)\, dw(t), \quad t > 0, \tag{9}$$

$$Z(0) = x^0, \tag{10}$$

where $w$ is the $N$-dimensional Brownian motion, $\alpha$ is the drift coefficient and $\beta$ is the diffusion coefficient; see [8, p. 98] or [8, p. 196] for a detailed discussion. Note that $dw$ cannot be considered a differential in the elementary sense and must be understood as a stochastic differential, see [8, p. 59]. Under regularity assumptions on $\alpha$ and $\beta$, there exists a unique solution $\{Z(t): t \ge 0\}$ of (9), (10), see [8, p. 98].
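A common way to approximate a solution of (9), (10) numerically is the Euler–Maruyama scheme. It replaces $dt$ by a step $h$ and $dw$ by a sampled Brownian increment with variance $h$. The sketch below covers the scalar case $N = 1$ for brevity. It illustrates the standard scheme and is not a method taken from [8].

```python
import math
import random
from typing import Callable, List


def euler_maruyama(alpha: Callable[[float, float], float],
                   beta: Callable[[float, float], float],
                   x0: float, h: float, n_steps: int,
                   rng: random.Random) -> List[float]:
    """Approximate the scalar version of (9), (10) on the grid t_k = k * h."""
    z, t = x0, 0.0
    path = [z]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))          # w(t + h) - w(t)
        z = z + alpha(z, t) * h + beta(z, t) * dw  # drift step plus diffusion step
        t += h
        path.append(z)
    return path


# With beta = 0 the scheme reduces to explicit Euler for dz/dt = alpha(z, t).
path = euler_maruyama(lambda z, t: -z, lambda z, t: 0.0, 1.0, 0.01, 100, random.Random(1))
```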

When $\alpha$ is minus the gradient of a potential function, equation (9) is called the Smoluchowski–Kramers equation. The Smoluchowski–Kramers equation is a singular limit of the Langevin equation, which expresses Newton's principle for a particle subject to a random force field, see [15, p. 40].

Let $\mathrm{div}_y$ be the divergence operator and $\Delta_y$ the Laplace operator with respect to the variables $y$, and let $L_{\beta,\alpha}(\cdot) = \mathrm{div}_y(\cdot\,\alpha) - \frac{1}{2}\Delta_y(\cdot\,\beta^2)$. Under regularity assumptions on $\alpha$ and $\beta$, the transition probability density $p(s, x, t, y)$, $0 \le s < t$, $y \in \mathbb{R}^N$, associated with the solution $\{Z(t): t \ge 0\}$ of problem (9), (10) exists and satisfies the Fokker–Planck equation (see [8, p. 149]). That is, given $x \in \mathbb{R}^N$ and $s \ge 0$, one has:

$$\frac{\partial p}{\partial t} + L_{\beta,\alpha}(p) = 0, \quad y \in \mathbb{R}^N,\ t > s, \tag{11}$$

$$\lim_{t \to s^+} p(s, x, t, y) = \delta(x - y), \quad y \in \mathbb{R}^N. \tag{12}$$

Constrained global optimization, that is, problem (1) with $D \subset \mathbb{R}^N$, requires a stochastic process that depends on the domain $D$. Let $\nu(x) \subseteq S^{N-1}$ be the set-valued function that gives the outward unit normals of the boundary $\partial D$ of $D$ at the point $x \in \partial D$. When $x$ is a regular point of $\partial D$, $\nu(x)$ is a singleton. Let $\eta: [0, T] \to \mathbb{R}^N$, possibly with $[0, T] = \mathbb{R}^+$, and let $|\eta|(t)$ be the total variation of $\eta$ on the interval $[0, t]$, where $t \le T$.

The Skorokhod problem is defined as follows. Let $\phi, \psi, \eta: [0, T] \to \mathbb{R}^N$. The triple $(\phi, \psi, \eta)$ satisfies the Skorokhod problem on $[0, T]$ with respect to $D$ if $|\eta|(T) < +\infty$, $\phi(0) = \psi(0)$, and the following relations hold for $t \in [0, T]$:

$$\phi(t) = \psi(t) + \eta(t), \tag{13}$$

$$\phi(t) \in D, \tag{14}$$

$$|\eta|(t) = \int_0^t \chi_{\partial D}(\phi(s))\, d|\eta|(s), \tag{15}$$

$$\eta(t) = -\int_0^t \gamma(s)\, d|\eta|(s), \tag{16}$$

where $\chi_S$ is the characteristic function of the set $S$. Here $\gamma(s) \in \nu(\phi(s))$ when $s \in [0, T]$ and $\phi(s) \in \partial D$, and $\gamma(s) = 0$ elsewhere.

View $\psi(t)$, $t \in [0, T]$, as the trajectory of a point $A \in \mathbb{R}^N$. At time zero $A$ is inside $D$, since $\psi(0) = \phi(0) \in D$. The trajectory of $A$ is reflected from the boundary of $D$, and the reflected trajectory can be viewed as $\phi(t)$, $t \in [0, T]$. That is, $\phi$ equals $\psi$ as long as $A \in D$; when $A$ leaves $D$, it is brought back onto $\partial D$ in the normal direction to $\partial D$. Thus the function $\eta$ gives the reflection rule of the function $\psi$ with respect to the boundary of $D$. In [16] it is proved that, under suitable assumptions on $D$ and $\psi$, the Skorokhod problem has a unique solution.
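For the simplest domain, $D = [0, +\infty) \subset \mathbb{R}$, the Skorokhod problem has the well-known explicit solution

$$\eta(t) = \max\Big\{0, \max_{0 \le s \le t} (-\psi(s))\Big\}, \qquad \phi = \psi + \eta.$$

The outward normal at $0$ is $-1$, so by (16) $\eta$ is nondecreasing and grows only while $\phi = 0$. The sketch below applies this formula to a path sampled on a grid. The choice of domain, the discrete setting and the function name are assumptions of the example.

```python
from typing import List, Sequence, Tuple


def skorokhod_halfline(psi: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Discrete Skorokhod map for D = [0, +inf): returns (phi, eta) with phi = psi + eta."""
    if psi[0] < 0:
        raise ValueError("psi(0) must lie in D")
    phi: List[float] = []
    eta: List[float] = []
    running = 0.0
    for value in psi:
        running = max(running, -value)  # max{0, max_{s <= t} (-psi(s))}
        eta.append(running)             # nondecreasing push in the inward direction
        phi.append(value + running)     # reflected path, always >= 0
    return phi, eta


phi, eta = skorokhod_halfline([1.0, 0.5, -0.5, -1.0, 0.0, 0.7])
```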

Consider the following stochastic differential equation with reflection term:

$$dZ(t) = \alpha(Z(t), t)\, dt + \beta(Z(t), t)\, dw(t) + d\eta(t), \quad t > 0, \tag{17}$$

$$Z(0) = x^0, \tag{18}$$

where

$$(Z,\ Z - \eta,\ \eta) \tag{19}$$

is the solution of the Skorokhod problem. Relations (14) and (19) imply that the solution of (17), (18) satisfies $Z(t) \in D$, $t \ge 0$. In [16] it is proved that, under some hypotheses, there exists a unique solution $\{Z(t): t \ge 0\}$ of (17), (18), (19) for every $x^0 \in D$.


Global Unconstrained Optimization 


Given problem (1) with $D = \mathbb{R}^N$, consider the following stochastic differential equation:

$$dZ(t) = -(\nabla F)(Z(t))\, dt + \sigma(t)\, dw(t), \quad t > 0, \tag{20}$$

$$Z(0) = x^0, \tag{21}$$

where $\{w(t): t \ge 0\}$ is the $N$-dimensional Brownian motion. The function $\sigma(t)$ is a suitable decreasing function that guarantees convergence of the stochastic process $\{Z(t): t \ge 0\}$ to a random variable with density concentrated on the global minimizers of $F$.

Under some assumptions on $F$, the transition probability density $p(0, x^0, t, x)$, $x^0, x \in \mathbb{R}^N$, $t > 0$, of the process $\{Z(t): t \ge 0\}$ exists and satisfies equations (11), (12). Moreover, when $\sigma = \varepsilon$, $\varepsilon > 0$, the steady-state distribution density $\pi_\varepsilon(x)$, $x \in \mathbb{R}^N$, satisfies

$$L_{\varepsilon,-\nabla F}(\pi_\varepsilon) = 0, \quad x \in \mathbb{R}^N, \tag{22}$$

and one has:

$$\pi_\varepsilon(x) = C_\varepsilon\, e^{-2F(x)/\varepsilon^2}, \quad x \in \mathbb{R}^N,\ \varepsilon > 0, \tag{23}$$


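where:

$$C_\varepsilon = \left(\int_{\mathbb{R}^N} e^{-2F(y)/\varepsilon^2}\, dy\right)^{-1}. \tag{24}$$

The integral in (24) is assumed to be finite for $\varepsilon > 0$. Moreover, one has:

$$p(0, x^0, t, x) = \pi_\varepsilon(x) + e^{-(F(x) - F(x^0))/\varepsilon^2} \sum_{n=1}^{\infty} e^{\lambda_n^\varepsilon t}\, \phi_n^\varepsilon(x^0)\, \phi_n^\varepsilon(x), \tag{25}$$

where $\phi_n^\varepsilon$ is the eigenfunction of $L_{\varepsilon,-\nabla F}$ corresponding to the eigenvalue $\lambda_n^\varepsilon$, $n = 1, 2, \dots$, and $0 = \lambda_0^\varepsilon > \lambda_1^\varepsilon > \lambda_2^\varepsilon > \cdots$. The eigenfunctions $\phi_n^\varepsilon$, $n = 1, 2, \dots$, are appropriately normalized, and $\pi_\varepsilon$ is the eigenfunction of $L_{\varepsilon,-\nabla F}$ corresponding to the eigenvalue $\lambda_0^\varepsilon = 0$.

Consider $N = 1$ and a smooth function $F$ with three extrema at $x^-, x^0, x^+ \in \mathbb{R}$ such that $x^- < x^0 < x^+$. Moreover, $F$ increases in $(x^-, x^0)$ and in $(x^+, +\infty)$, and decreases in $(-\infty, x^-)$ and in $(x^0, x^+)$. Let: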


(26) 


$c^-$ and $c^0$ are assumed to be nonzero. In [1] it is shown that if $F(x^-) < F(x^+)$, then $\pi_\varepsilon(x) \to \delta(x - x^-)$ as $\varepsilon \to 0$. If instead $F(x^-) = F(x^+)$, then $\pi_\varepsilon(x) \to \gamma\,\delta(x - x^-) + (1 - \gamma)\,\delta(x - x^+)$ as $\varepsilon \to 0$, where $\gamma = [1 + (c^-/c^+)^{1/2}]^{-1}$. The limits are taken in the sense of distributions. That is, [1] shows that the steady-state distribution density tends to Dirac deltas concentrated on the global minimizers of $F$ when $\varepsilon \to 0$.
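Formula (23) can be evaluated numerically in one dimension to observe this concentration. The sketch below uses a hypothetical tilted double well $F(x) = (x^2 - 1)^2 + 0.5x$, which has its global minimizer near $-1.06$ and a local one near $0.93$. It truncates $\mathbb{R}$ to $[-3, 3]$ and replaces the integral in (24) by a Riemann sum. The function reports the fraction of $\pi_\varepsilon$ lying on $x < 0$, which approaches 1 as $\varepsilon$ decreases.

```python
import math


def F(x: float) -> float:
    """Hypothetical tilted double well: global minimizer near -1.06, local one near 0.93."""
    return (x * x - 1.0) ** 2 + 0.5 * x


def left_mass(eps: float, lo: float = -3.0, hi: float = 3.0, n: int = 6001) -> float:
    """Fraction of pi_eps(x) = C_eps * exp(-2 F(x) / eps^2) on x < 0 (Riemann sum on [lo, hi])."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    fmin = min(F(x) for x in xs)  # shift for numerical safety; it cancels in C_eps
    weights = [math.exp(-2.0 * (F(x) - fmin) / eps ** 2) for x in xs]
    return sum(w for x, w in zip(xs, weights) if x < 0.0) / sum(weights)


for eps in (1.0, 0.5, 0.3):
    print(eps, left_mass(eps))
```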
In [12] it is shown that:

$$\lambda_1^\varepsilon \simeq -\frac{\sqrt{c^+\, |c^0|}}{2\pi}\, e^{-2\,\delta F/\varepsilon^2} \quad \text{as } \varepsilon \to 0, \tag{27}$$

where $\delta F = \max\{F(x^0) - F(x^-),\ F(x^0) - F(x^+)\}$. Formula (25) shows that $p$ converges to $\pi_\varepsilon$ when $t \to +\infty$, but by (27) the rate of convergence becomes slow when $\varepsilon$ is small. Now replace $\varepsilon$ with a slowly decreasing function $\sigma(t)$ such that $\sigma(t) \to 0$ when $t \to +\infty$. Elementary adiabatic perturbation theory then suggests that the condition

$$\int_0^{+\infty} e^{-2\,\delta F/\sigma(t)^2}\, dt = +\infty \tag{28}$$

guarantees that the solution $\{Z(t): t \ge 0\}$ of (20), (21) converges, as $t \to +\infty$, to a random variable concentrated on the global minimizers of $F$.

In [6] the following result is proved:


Theorem 1 (convergence theorem) Let $F: \mathbb{R}^N \to \mathbb{R}$ be a twice continuously differentiable function with the following properties:

$$\min_{x \in \mathbb{R}^N} F(x) = 0, \tag{29}$$

$$\lim_{\|x\| \to +\infty} F(x) = \lim_{\|x\| \to +\infty} \|(\nabla F)(x)\| = +\infty, \tag{30}$$

$$\liminf_{\|x\| \to +\infty} \left(\|(\nabla F)(x)\|^2 - (\Delta F)(x)\right) > -\infty. \tag{31}$$

Let $\sigma(t) = \sqrt{c/\log t}$ for $t \to +\infty$, where $c > c_F > 0$ and $c_F$ is a constant depending on the function $F$. Then the transition probability density $p$ of the process $\{Z(t): t \ge 0\}$, solution of (20), (21), converges weakly to a stationary distribution $\pi$, that is:

$$p(0, x, t, \cdot) \to \pi \quad \text{when } t \to +\infty. \tag{32}$$

Moreover, the distribution $\pi$ is the weak limit of $\pi_\varepsilon$, given by (23), as $\varepsilon \to 0$.
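Substituting the schedule of Theorem 1 into the condition $\int_0^{+\infty} e^{-2\delta F/\sigma(t)^2}\, dt = +\infty$ shows why a logarithmic decay of $\sigma^2$ appears. This is only a heuristic check, not the proof given in [6]. The lower limit $t_0 > 1$ below is used only so that $\log t > 0$.

```latex
\sigma(t)^2 = \frac{c}{\log t}
\;\Longrightarrow\;
e^{-2\,\delta F/\sigma(t)^2} = e^{-(2\,\delta F/c)\log t} = t^{-2\,\delta F/c},
\qquad
\int_{t_0}^{+\infty} t^{-2\,\delta F/c}\, dt = +\infty
\iff \frac{2\,\delta F}{c} \le 1
\iff c \ge 2\,\delta F .
```

In this one-dimensional heuristic, the threshold on $c$ is of the order of $2\,\delta F$, which suggests how the constant $c_F$ can depend on $F$. If $\sigma(t)^2$ instead decayed like a power $t^{-q}$ with $q > 0$, the integrand $e^{-2\delta F t^{q}}$ would be integrable and the condition would fail.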


Equations (20), (21) are obtained by perturbing the trajectories of the steepest descent equation for $F$ with Brownian motion, and $\sigma$ is a factor that controls the amplitude of this perturbation. Because $\sigma(t) \to 0$ when $t \to +\infty$, the perturbed trajectories can stabilize at the minimizers of $F$. Under the assumptions of the convergence theorem, $\pi$ is concentrated on the global minimizers of $F$. Hence the random variable $Z(t) = (Z_1(t), \dots, Z_N(t))^T$ 'converges' to $x^*$, solution of problem (1), as $t \to +\infty$. That is, when $x^*$ is the unique global minimizer of $F$, then for every $\rho > 0$ one has $P\{|Z_i(t) - x_i^*| < \rho\} \to 1$ when $t \to +\infty$, $i = 1, \dots, N$.

The stochastic differential equation (20) can be integrated numerically to obtain an algorithm for the solution of problem (1). Let $t_0 = 0$ and $t_k = \sum_{l=0}^{k-1} h_l$, where the $h_l > 0$, $l = 0, 1, \dots$, are such that $t_k \to +\infty$ when $k \to \infty$. Then the Euler method gives:

$$\xi^0 = x^0, \tag{33}$$

$$\xi^{k+1} = \xi^k - h_k\, (\nabla F)(\xi^k) + \sigma(t_k)\big(w(t_k + h_k) - w(t_k)\big), \tag{34}$$

where $k = 0, 1, \dots$ and $\xi^k \in \mathbb{R}^N$ is the approximation of $Z(t_k)$, $k = 1, 2, \dots$, see [2,3].
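The following is a minimal one-dimensional sketch of iteration (33), (34) with a constant step $h_k = h$. The test function is a hypothetical tilted double well with its global minimizer near $-1.06$ and a local one near $0.93$. The schedule $\sigma(t) = \sqrt{c/\log(t+2)}$ with $c = 2$ is also an assumption: the shift by 2 only keeps $\sigma$ finite at $t = 0$, and $c$ was not calibrated against the constant $c_F$ of Theorem 1.

```python
import math
import random
from typing import Optional


def F(x: float) -> float:
    """Hypothetical tilted double well: global minimizer near -1.06, local one near 0.93."""
    return (x * x - 1.0) ** 2 + 0.5 * x


def dF(x: float) -> float:
    return 4.0 * x * (x * x - 1.0) + 0.5


def sigma(t: float, c: float = 2.0) -> float:
    """Assumed schedule sqrt(c / log(t + 2)); the shift keeps sigma finite at t = 0."""
    return math.sqrt(c / math.log(t + 2.0))


def annealed_euler(x0: float, h: float = 0.01, n_steps: int = 20000,
                   rng: Optional[random.Random] = None) -> float:
    """Iteration (33), (34) for N = 1 with constant step h_k = h; returns the last iterate."""
    rng = rng or random.Random()
    xi, t = x0, 0.0                           # (33): xi^0 = x^0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))     # w(t_k + h) - w(t_k)
        xi = xi - h * dF(xi) + sigma(t) * dw  # (34)
        t += h
    return xi


finals = [annealed_euler(1.0, rng=random.Random(seed)) for seed in range(20)]
```

All runs start at $x^0 = 1$, in the basin of the local minimizer, and most of them end in the basin of the global one. As remarked in the text, individual runs differ because they use different realizations of the Brownian motion.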

In (34) due to the presence of the stochastic term, 
one can substitute the gradient of F with a kind of 
‘stochastic gradient’ of F in order to save computational 
work, see [2,3] for details. 

The sequence $\{\xi^k : k \in \mathbb{N}\}$ depends on the particular realization of the Brownian motion $\{w(t_k): k = 0, 1, \dots\}$. That is, if problem (20), (21) is solved several times by means of (33), (34), the solutions obtained are not necessarily the same. However, the convergence theorem states that 'all' the solutions $\{\xi^k : k \in \mathbb{N}\}$ obtained by (33), (34) tend to $x^*$ as $k \to +\infty$.

Hence, a numerical algorithm derived from (20), (21) via (33), (34) can compute $n_T$ independent realizations (i.e., trajectories) of the stochastic process $\{Z(t): t \ge 0\}$, solution of (20), (21). One possible strategy is the following. After an 'observation period', the trajectories are compared; one of them is discarded and no longer considered, and another one is branched. The new set of trajectories is then computed throughout the next observation period. The following stopping conditions are used:

• uniform stop: the final values of the function $F$ at the end of the various trajectories are numerically equal;

• maximum trial duration: a maximum number of observation periods has been reached.
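The observation-period strategy can be sketched as follows. The text leaves open how trajectories are compared and which ones are discarded or branched. In this sketch, as an assumption, trajectories are ranked by the current value of $F$, and the worst one is replaced by a copy of the best one. 'Numerically equal' is read as a spread of $F$ values below a tolerance `tol`. All names and parameter values are hypothetical.

```python
import math
import random
from typing import List


def F(x: float) -> float:
    """Hypothetical tilted double well (global minimizer near -1.06)."""
    return (x * x - 1.0) ** 2 + 0.5 * x


def dF(x: float) -> float:
    return 4.0 * x * (x * x - 1.0) + 0.5


def trajectory_search(x0: float, n_traj: int = 8, period_steps: int = 500,
                      max_periods: int = 40, h: float = 0.01, c: float = 2.0,
                      tol: float = 1e-3, seed: int = 0) -> float:
    """Observation-period strategy on n_traj trajectories of (33), (34), N = 1."""
    rng = random.Random(seed)
    xs: List[float] = [x0] * n_traj
    t = 0.0
    for _ in range(max_periods):                # stop: maximum trial duration
        for _ in range(period_steps):           # one observation period
            amp = math.sqrt(c / math.log(t + 2.0)) * math.sqrt(h)  # sigma(t_k) * sqrt(h)
            xs = [x - h * dF(x) + amp * rng.gauss(0.0, 1.0) for x in xs]
            t += h
        xs.sort(key=F)                          # compare trajectories by F
        xs[-1] = xs[0]                          # discard the worst, branch the best
        values = [F(x) for x in xs]
        if max(values) - min(values) < tol:     # stop: uniform stop
            break
    return min(xs, key=F)
```

Within each observation period the trajectories are independent, so in a parallel implementation the inner update could be distributed across processors.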

Algorithms based on the discretization of stochastic differential equations have a sound mathematical basis: for a wide class of functions $F$, convergence results such as the convergence theorem above are available. These algorithms usually converge slowly, as the form of the function $\sigma$ required in the convergence theorem shows. As a consequence, they have a high computational cost, and their use is usually restricted to low-dimensional problems. However, they can be parallelized with significant computational advantage. For example, in the algorithm described above, each trajectory can be computed independently of the others until the end of an observation period. The algorithms derived from (20), (21) are in some sense similar to the simulated annealing algorithm (cf. also ► Simulated annealing methods in protein folding) introduced in combinatorial optimization in [11].


Global Constrained Optimization 


Given problem (1) with $D \subset \mathbb{R}^N$, the following stochastic differential equation with reflection term is considered:

$$dZ(t) = -(\nabla F)(Z(t))\, dt + \sigma(t)\, dw(t) + d\eta(t), \quad t > 0, \tag{35}$$

$$Z(0) = x^0, \tag{36}$$

where $x^0 \in D$ and $\{w(t): t \ge 0\}$ is the $N$-dimensional Brownian motion. The function $\sigma(t)$ is a suitable decreasing function that guarantees convergence of the stochastic process $\{Z(t): t \ge 0\}$, when $t \to +\infty$, to a random variable with density concentrated on the global minimizers of $F$ on $D$. The function $\eta(t)$ ensures $Z(t) \in D$, $t \ge 0$; that is, $(Z, Z - \eta, \eta)$ is the solution of the Skorokhod problem on $\mathbb{R}^+$ with respect to $D$.

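Let $\mathrm{int}(D)$ be the set of interior points of $D$, and assume that $D$ is the closure of $\mathrm{int}(D)$. Let $p(0, x^0, t, x)$, $x^0, x \in \mathrm{int}(D)$, $t > 0$, be the transition probability density of the process $\{Z(t): t \ge 0\}$, solution of (35), (36), when $\sigma = \varepsilon$, $\varepsilon > 0$. Then $p$ satisfies the Fokker–Planck equation:

$$\frac{\partial p}{\partial t} + L_{\varepsilon,-\nabla F}(p) = 0, \quad x \in \mathrm{int}(D),\ t > 0, \tag{37}$$

$$\lim_{t \to 0^+} p(0, x^0, t, x) = \delta(x - x^0), \quad x \in \mathrm{int}(D), \tag{38}$$

$$\left(\frac{\varepsilon^2}{2} \nabla_x p + p\, \nabla F,\ n(x)\right) = 0, \quad x \in \partial D,\ t > 0, \tag{39}$$

where $L_{\varepsilon,-\nabla F}$ is the operator $L_{\beta,\alpha}$ of (11) with $\beta = \varepsilon$ and $\alpha = -\nabla F$, and $n(x) \in \nu(x)$ is the outward unit normal to $\partial D$ at $x \in \partial D$. Boundary condition (39) ensures that $P\{Z(t) \in D\} = 1$ for every $t > 0$. It follows from the requirement that $(Z, Z - \eta, \eta)$ is the solution of the Skorokhod problem.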
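The following properties of $F$ and $D$ are assumed:

• $F: D \to \mathbb{R}$ is twice continuously differentiable;

• $D \subset \mathbb{R}^N$ is a bounded convex domain such that there exists $p$ satisfying (37), (38), (39), and there exists the steady-state distribution density of the process solution of (35), (36);

• let $\pi_\varepsilon$ be the steady-state distribution density of the process solution of (35), (36) when $\sigma = \varepsilon$, $\varepsilon > 0$, that is:

$$\pi_\varepsilon(x) = C_\varepsilon\, e^{-2F(x)/\varepsilon^2}, \quad x \in D, \tag{40}$$

$$C_\varepsilon = \left(\int_D e^{-2F(y)/\varepsilon^2}\, dy\right)^{-1}, \tag{41}$$

and let $\pi$ be the weak limit of $\pi_\varepsilon$ as $\varepsilon \to 0$.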
By analogy with the unconstrained case, one can conjecture the following. Suppose $D \subset \mathbb{R}^N$ and $F: D \to \mathbb{R}$ satisfy the properties listed above, and let $\sigma(t) = \sqrt{c/\log t}$ for $t \to +\infty$, where $c > c_F > 0$ and $c_F$ is a constant depending on $F$. Then the transition probability density $p(0, x^0, t, x)$, $x^0, x \in D$, $t > 0$, of the process $\{Z(t): t \ge 0\}$, solution of (35), (36), converges to a steady-state distribution density $\pi$ when $t \to +\infty$, and $\pi$ is the weak limit of $\pi_\varepsilon$ when $\varepsilon \to 0$. That is, the process $\{Z(t): t \ge 0\}$ converges in law to a random variable concentrated at the points $x^* \in D$ that solve problem (1).

A numerical algorithm for problem (1) with $D \subset \mathbb{R}^N$ can be obtained by integrating problem (35), (36) numerically. This amounts to integrating problem (20), (21) numerically and 'adding' the constraints given by $D$. While a trajectory is in $D$, it is computed with formulas (33), (34). When a trajectory violates the constraints, it is brought back onto $\partial D$ by setting to zero its normal component with respect to the violated constraints. The stopping conditions are the same as those of the previous section.
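Consider the common special case of a box $D = \{x : l_i \le x_i \le u_i\}$. There, setting to zero the normal component with respect to a violated face amounts to clamping the offending coordinate to that face. One step of the constrained algorithm can then be sketched as follows. The box shape of $D$ and the function names are assumptions of the example.

```python
from typing import List, Sequence


def project_to_box(x: Sequence[float], lo: Sequence[float],
                   hi: Sequence[float]) -> List[float]:
    """Bring a point back onto the box: each violated coordinate is set to its face."""
    return [min(max(xi, l), u) for xi, l, u in zip(x, lo, hi)]


def constrained_step(x: Sequence[float], grad: Sequence[float], h: float,
                     sigma_t: float, dw: Sequence[float],
                     lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """One Euler step (34), followed by the boundary correction for a box-shaped D."""
    trial = [xi - h * gi + sigma_t * dwi for xi, gi, dwi in zip(x, grad, dw)]
    return project_to_box(trial, lo, hi)
```

For a general convex $D$, the clamping would be replaced by the corresponding correction with respect to each violated constraint.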

As in the unconstrained case, the algorithms based on stochastic differential equations for constrained problems converge slowly. However, they have a high degree of parallelism.


Miscellaneous Results 


This section presents two mathematical problems that are somewhat unusual as optimization problems.
Clique Problem


Let $I = \{1, \dots, N\} \subset \mathbb{N}$ be a finite set, let $I * I$ be the set of unordered pairs of elements of $I$, and let $E \subseteq I * I$. Then a graph $G$ is a pair $G = (I, E)$, where $I$ is the set of nodes of $G$ and $E$ is the set of edges of $G$; i.e., $\{i, j\} \in E$ means that $G$ has an edge joining nodes $i, j \in I$. A graph $G = (I, E)$ is said to be complete, or to be a clique, when $E = I * I$. A graph $G' = (I', E')$ is a subgraph of $G = (I, E)$ when $I' \subseteq I$ and $E' \subseteq E \cap (I' * I')$.

The maximum clique problem can be defined as follows: given $G = (I, E)$, find the largest subgraph $G'$ of $G$ which is complete. Let $k(G)$ be the number of nodes of the graph $G'$.

Several algorithms exist to obtain a numerical solution of the maximum clique problem; see, for example, [14], where a branch and bound algorithm is described.

Here the maximum clique problem is considered as a continuous optimization problem. The adjacency matrix $A$ of the graph $G = (I, E)$ is a square matrix whose order equals the number of nodes of $G$. Its generic entry $A_{i,j}$, at row $i$ and column $j$, equals 1 if $\{i, j\} \in E$ and 0 otherwise. In [13] it is shown that:

$$1 - \frac{1}{k(G)} = \max_{x \in S} x^T A x, \tag{42}$$

where

$$S = \left\{x = (x_1, \dots, x_N)^T \in \mathbb{R}^N : \sum_{i=1}^{N} x_i = 1,\ x_i \ge 0,\ i = 1, \dots, N\right\}.$$

Problem (42) can have many maximizers. However, there always exists a maximizer $x^* = (x_1^*, \dots, x_N^*)^T$ such that, for $i = 1, \dots, N$, $x_i^* = 1/k(G)$ if $i \in I'$ and $x_i^* = 0$ if $i \notin I'$, where $I'$ is the node set of a maximum clique $G'$. That is, the maximum clique problem is reduced to a continuous global optimization problem that can be treated with the algorithms described above. Several other problems in graph theory can also be reformulated as continuous optimization problems.
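For a small graph, (42) can be checked directly. The sketch below finds a maximum clique by brute force and evaluates $x^T A x$ at the vector that puts weight $1/k(G)$ on the clique's nodes. The brute-force search is exponential in the number of nodes and is meant only for illustration. The example graph and the helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def quadratic_form(A: List[List[int]], x: Sequence[float]) -> float:
    """x^T A x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def max_clique(A: List[List[int]]) -> Tuple[int, ...]:
    """Nodes of one maximum clique, by exhaustive search (small graphs only)."""
    n = len(A)
    for k in range(n, 0, -1):
        for nodes in combinations(range(n), k):
            if all(A[i][j] == 1 for i, j in combinations(nodes, 2)):
                return nodes
    return ()


# Hypothetical example: a triangle {0, 1, 2} plus the path 2 - 3 - 4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A = [[0] * 5 for _ in range(5)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

clique = max_clique(A)  # (0, 1, 2), so k(G) = 3
k = len(clique)
x_star = [1.0 / k if i in clique else 0.0 for i in range(5)]
print(quadratic_form(A, x_star), 1 - 1 / k)  # both equal 2/3 up to rounding
```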


Quasivariational Inequalities 


Let $X \subseteq \mathbb{R}^N$ be a nonempty set, let $\Omega(x) \subseteq X$, $x \in X$, be a set-valued function, and let $F: \mathbb{R}^N \to \mathbb{R}^N$. The quasivariational inequality problem is defined as follows: find a vector $x^* \in \Omega(x^*)$ such that

$$(F(x^*), y - x^*) \ge 0, \quad \forall y \in \Omega(x^*); \tag{43}$$

see [4] for a detailed introduction to quasivariational inequalities. This problem can be reduced to the search for a fixed point of a function defined implicitly by a variational inequality.

Quasivariational inequalities have many applications, for example the study of the generalized Nash equilibrium points of an $N$-player noncooperative game. See [10] for a detailed discussion of $N$-player noncooperative games.
Directional Derivatives


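Let $f$ be a function defined on some open set $X \subset \mathbb{R}^n$ and taking its values in $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$. The set $\operatorname{dom} f = \{x \in X : |f(x)| < +\infty\}$ is called the effective set (or domain) of the function $f$. Take $x \in \operatorname{dom} f$, $g \in \mathbb{R}^n$. Put

$$f_D^{\uparrow}(x, g) := \limsup_{\alpha \downarrow 0} \frac{1}{\alpha}\big[f(x + \alpha g) - f(x)\big], \qquad (1)$$
$$f_D^{\downarrow}(x, g) := \liminf_{\alpha \downarrow 0} \frac{1}{\alpha}\big[f(x + \alpha g) - f(x)\big]. \qquad (2)$$

Here $\alpha \downarrow 0$ means that $\alpha \to +0$.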

The quantity $f_D^{\uparrow}(x, g)$ (respectively, $f_D^{\downarrow}(x, g)$) is called the Dini upper (respectively, lower) derivative of the function $f$ at the point $x$ in the direction $g$.

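The limit

$$f_D'(x, g) = f'(x, g) = \lim_{\alpha \downarrow 0} \frac{1}{\alpha}\big[f(x + \alpha g) - f(x)\big] \qquad (3)$$

is called the Dini derivative of $f$ at the point $x$ in the direction $g$. If the limit in (3) exists, then $f_D^{\uparrow}(x, g) = f_D^{\downarrow}(x, g) = f_D'(x, g)$.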
The quantities

$$f_H^{\uparrow}(x, g) := \limsup_{[\alpha, g'] \to [+0, g]} \frac{1}{\alpha}\big[f(x + \alpha g') - f(x)\big] \qquad (4)$$

and

$$f_H^{\downarrow}(x, g) := \liminf_{[\alpha, g'] \to [+0, g]} \frac{1}{\alpha}\big[f(x + \alpha g') - f(x)\big] \qquad (5)$$

are called the Hadamard upper and lower derivatives, respectively, of the function $f$ at the point $x$ in the direction $g$.

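The limit

$$f_H'(x, g) = \lim_{[\alpha, g'] \to [+0, g]} \frac{1}{\alpha}\big[f(x + \alpha g') - f(x)\big] \qquad (6)$$

is called the Hadamard derivative of $f$ at $x$ in the direction $g$.

If the limit in (6) exists, then $f_H^{\uparrow}(x, g) = f_H^{\downarrow}(x, g) = f_H'(x, g)$.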
Note that the limits in (1), (2), (4) and (5) always
exist but are not necessarily finite.
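Since (1) and (2) are a limit superior and a limit inferior, a computer can only approximate them. As a rough heuristic (not a method from the source), the sketch below takes the max and min of the difference quotients over a dense, finite sample of small steps $\alpha$. It uses the hypothetical example $f(x) = x\sin(1/x)$, $f(0) = 0$, at $x = 0$ in the direction $g = 1$. The quotient there equals $\sin(1/\alpha)$, so $f_D^{\uparrow}(0, 1) = 1$ and $f_D^{\downarrow}(0, 1) = -1$, and the Dini derivative (3) does not exist.

```python
import math

def dini_estimates(f, x, g, alphas):
    """Heuristic estimates of the Dini upper and lower derivatives (1)-(2)
    for a function of one variable: max and min of the difference quotients
    (f(x + a g) - f(x)) / a over a finite sample of small steps a > 0."""
    fx = f(x)
    quotients = [(f(x + a * g) - fx) / a for a in alphas]
    return max(quotients), min(quotients)

def f(x):
    # f(x) = x sin(1/x) for x != 0, f(0) = 0
    return x * math.sin(1.0 / x) if x != 0 else 0.0

# steps a = 1/t with t sampled every 0.05 on [1000, 11000)
ALPHAS = [1.0 / (1000.0 + 0.05 * k) for k in range(200_000)]
upper, lower = dini_estimates(f, 0.0, 1.0, ALPHAS)
print(upper, lower)
```

A finite sample can only bracket the true values from inside. The dense grid in $t = 1/\alpha$ is what makes the estimates land close to $\pm 1$ here.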


Remark 1 In the one-dimensional case ($\mathbb{R}^n = \mathbb{R}$) the Hadamard directional derivatives coincide with the corresponding Dini directional derivatives:

$$f_H^{\uparrow}(x, g) = f_D^{\uparrow}(x, g), \qquad f_H^{\downarrow}(x, g) = f_D^{\downarrow}(x, g), \qquad f_H'(x, g) = f_D'(x, g).$$


If the limit in (3) exists and is finite, then the function $f$ is called differentiable (or Dini differentiable) at $x$ in the direction $g$. The function $f$ is called Dini directionally differentiable (Dini d.d.) at the point $x$ if it is Dini differentiable at $x$ for every $g \in \mathbb{R}^n$. Analogously, if the limit in (6) exists and is finite, the function $f$ is called Hadamard differentiable at $x$ in the direction $g$. The function $f$ is called Hadamard directionally differentiable (Hadamard d.d.) at the point $x$ if it is Hadamard differentiable at $x$ for every $g \in \mathbb{R}^n$.

If the limit in (6) exists and is finite, then the limit in (3) also exists and $f_H'(x, g) = f_D'(x, g)$. The converse is not necessarily true.
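By Remark 1, the converse can only fail for $n \ge 2$. The following hypothetical example on $\mathbb{R}^2$ is chosen purely for illustration. Let $f = 1$ on the parabola arc $\{x_2 = x_1^2,\ x_1 > 0\}$ and $f = 0$ elsewhere. At $x = 0_2$ in the direction $g = (1, 0)$, every point $\alpha g$ misses the arc, so the Dini derivative (3) exists and equals 0. Along the directions $g'_\alpha = (1, \alpha) \to g$, however, the points $\alpha g'_\alpha$ lie on the arc. The quotients in (4) then equal $1/\alpha \to +\infty$, so the limit (6) does not exist. The sketch just evaluates both families of quotients.

```python
def f(x1, x2):
    # hypothetical: 1 on the parabola arc {x2 = x1**2, x1 > 0}, 0 elsewhere
    return 1.0 if x1 > 0 and x2 == x1 * x1 else 0.0

ALPHAS = [10.0 ** (-k) for k in range(1, 8)]
f0 = f(0.0, 0.0)

# Dini quotients (3) along the fixed direction g = (1, 0)
dini = [(f(a * 1.0, a * 0.0) - f0) / a for a in ALPHAS]

# Hadamard-type quotients (4) along g'_a = (1, a), which tends to g as a -> 0
hadamard = [(f(a * 1.0, a * a) - f0) / a for a in ALPHAS]

print(dini)
print(hadamard)
```

The first list is identically zero, while the second grows like $1/\alpha$. The exact float comparison `x2 == x1 * x1` works here only because both sides are computed by the same multiplication.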

All these derivatives are positively homogeneous (of degree one) functions of the direction:

$$f_Q^{*}(x, \lambda g) = \lambda f_Q^{*}(x, g) \qquad \forall \lambda > 0. \qquad (7)$$

(Here $*$ is either $\uparrow$ or $\downarrow$, and $Q$ is either $D$ or $H$.)

A function $f$ defined on an open set $X$ is called Dini uniformly directionally differentiable at a point $x \in X$ if it is directionally differentiable at $x$ and for every $\varepsilon > 0$ there exists a real number $\alpha_0 > 0$ such that

$$\left| \frac{1}{\alpha}\big[f(x + \alpha g) - f(x) - \alpha f'(x, g)\big] \right| < \varepsilon \qquad \forall \alpha \in (0, \alpha_0),\ \forall g \in S,$$

where $S = \{g \in \mathbb{R}^n : \|g\| = 1\}$ is the unit sphere.


Proposition 2 (see [2, Thm. I.3.2]) A function $f$ is Hadamard d.d. at a point $x \in X$ if and only if it is Dini uniformly directionally differentiable at $x$ and its directional derivative $f'(x, g)$ is continuous as a function of direction.


Remark 3 If $f$ is locally Lipschitz and Dini directionally differentiable at $x \in X$, then it is Hadamard d.d. at $x$, too.


For Dini and Hadamard derivatives (see (3) and (6)) 
there exists a calculus: 


Proposition 4 Let functions $f_1$ and $f_2$ be Dini (Hadamard) directionally differentiable at a point $x \in X$. Then their sum, difference, product and quotient (if $f_2(x) \neq 0$) are also Dini (Hadamard) d.d. at this point and the following formulas hold:

$$(f_1 \pm f_2)_Q'(x, g) = f_{1Q}'(x, g) \pm f_{2Q}'(x, g), \qquad (8)$$
$$(f_1 f_2)_Q'(x, g) = f_1(x) f_{2Q}'(x, g) + f_2(x) f_{1Q}'(x, g), \qquad (9)$$
$$\Big(\frac{f_1}{f_2}\Big)_Q'(x, g) = \frac{1}{f_2^2(x)}\big[f_2(x) f_{1Q}'(x, g) - f_1(x) f_{2Q}'(x, g)\big]. \qquad (10)$$

Here $Q$ is either $D$ or $H$.


These formulas follow from the classical theorems of 
differential calculus. 
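As a quick numerical sanity check of (8)-(10) (a sketch, not part of the source), take the hypothetical nonsmooth factors $f_1(x) = |x|$ and $f_2(x) = 2 + \max(x, 0)$ on $\mathbb{R}$. Their Dini derivatives at $0$ are $|g|$ and $\max(g, 0)$. The sketch compares the derivatives predicted by the formulas with one-sided difference quotients at a small step.

```python
def quotient(f, x, g, a=1e-8):
    """One-sided difference quotient (f(x + a g) - f(x)) / a with a small step a."""
    return (f(x + a * g) - f(x)) / a

# hypothetical nonsmooth factors and their Dini derivatives at x = 0
f1 = abs

def f2(x):
    return 2.0 + max(x, 0.0)

def d1(g):
    return abs(g)         # f1'(0, g)

def d2(g):
    return max(g, 0.0)    # f2'(0, g)

x0 = 0.0
rows = []
for g in (-1.5, -1.0, 0.5, 2.0):
    sum_pred = d1(g) + d2(g)                                       # (8)
    prod_pred = f1(x0) * d2(g) + f2(x0) * d1(g)                    # (9)
    quot_pred = (f2(x0) * d1(g) - f1(x0) * d2(g)) / f2(x0) ** 2    # (10)
    sum_num = quotient(lambda x: f1(x) + f2(x), x0, g)
    prod_num = quotient(lambda x: f1(x) * f2(x), x0, g)
    quot_num = quotient(lambda x: f1(x) / f2(x), x0, g)
    rows.append((g, sum_pred - sum_num, prod_pred - prod_num, quot_pred - quot_num))
for row in rows:
    print(row)
```

The printed differences are of the order of the step size. They are not exactly zero because a finite step only approximates the limit in (3).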


Proposition 5 Let

$$\varphi(x) = \max_{i \in 1:N} f_i(x), \qquad (11)$$

where the functions $f_i$ are defined and continuous on an open set $X \subset \mathbb{R}^n$ and Dini (Hadamard) d.d. at a point $x \in X$ in a direction $g$. Then the function $\varphi$ is also Dini (Hadamard) d.d. at $x$ and

$$\varphi_Q'(x, g) = \max_{i \in R(x)} f_{iQ}'(x, g), \qquad (12)$$

where $R(x) = \{i \in 1:N : f_i(x) = \varphi(x)\}$ (see [2, Cor. I.3.2]).


If $\varphi$ is defined by

$$\varphi(x) = \max_{y \in Y} f(x, y),$$

where $Y$ is some set, then under some additional conditions a formula analogous to (12) also holds (see [2, Chap. I, Sec. 3]).
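For smooth $f_i$ one has $f_{iQ}'(x, g) = \langle f_i'(x), g \rangle$, so (12) reduces to a maximum of inner products over the active index set $R(x)$. The sketch below checks this on a hypothetical example in $\mathbb{R}^2$: $f_1 = x_1 + x_2^2$, $f_2 = x_2 - x_1^2$, $f_3 = x_1 - 1$ at $x = 0$, where $R(0) = \{1, 2\}$ and (12) gives $\varphi'(0, g) = \max\{g_1, g_2\}$. The function names are illustrative, and indices in the code start at 0.

```python
FS = [lambda x: x[0] + x[1] ** 2,      # f_1
      lambda x: x[1] - x[0] ** 2,      # f_2
      lambda x: x[0] - 1.0]            # f_3
GRADS = [lambda x: (1.0, 2.0 * x[1]),  # gradients of f_1, f_2, f_3
         lambda x: (-2.0 * x[0], 1.0),
         lambda x: (1.0, 0.0)]

def phi(x):
    return max(fi(x) for fi in FS)

def active_set(x, tol=1e-12):
    """R(x) = {i : f_i(x) = phi(x)}, up to a tolerance (0-based indices)."""
    m = phi(x)
    return [i for i, fi in enumerate(FS) if fi(x) >= m - tol]

def phi_derivative(x, g):
    """Formula (12) for smooth f_i: max over active i of <grad f_i(x), g>."""
    return max(sum(a * b for a, b in zip(GRADS[i](x), g)) for i in active_set(x))

x0 = (0.0, 0.0)
a = 1e-7
checks = []
for g in [(1.0, 0.0), (0.0, 1.0), (-1.0, -2.0), (0.6, -0.8)]:
    numeric = (phi((x0[0] + a * g[0], x0[1] + a * g[1])) - phi(x0)) / a
    checks.append((g, phi_derivative(x0, g), numeric))
print(active_set(x0), checks)
```

The inactive function $f_3$ plays no role in the derivative, even though it enters the definition of $\varphi$.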

A theorem on the differentiability of a composition 
can also be stated. 

Unfortunately, formulas similar to (8)-(10) and 
(12) are not valid for Dini (Hadamard) upper and lower 
derivatives. 

The Dini and Hadamard upper and lower direc- 
tional derivatives are widely used in nonsmooth anal- 
ysis and nondifferentiable optimization. For example, 
the following mean value theorem holds. 
Proposition 6 (see [2, Thm. I.3.1]) Let $f$ be defined and continuous on the interval $\{y : y = x + \alpha g,\ \alpha \in [0, \alpha_0]\}$, $\alpha_0 > 0$. Put

$$m = \inf_{\alpha \in [0, \alpha_0]} f_D^{\downarrow}(x + \alpha g, g), \qquad M = \sup_{\alpha \in [0, \alpha_0]} f_D^{\uparrow}(x + \alpha g, g).$$

Then [1]

$$m \alpha_0 \le f(x + \alpha_0 g) - f(x) \le M \alpha_0.$$


The following first order approximations may be con- 
structed via the Dini and Hadamard derivatives. 


Proposition 7 Let f be defined on an open set X C R", 
and Dini d.d. at a point x € X. Then 


f(x + A) = f(x) + fplx, A) + op(x, A). (13) 
If f is Hadamard d.d. at x, then 
f(x + A) = f(x) + f(x, A) + on(x, A). (14) 


Let f be defined on an open set X C R" and finite at x € 
X. Then 


flx +A) = f(x) + fhe, A)+Op(x, A), (15) 
f(x + A) = f(x) + f(x, A) + op(x, A), (16) 
flx + A) = f(x) + fl, A) + du(x, A), (17) 
f(x + A) = f(x) + f(x, A) + on(x, A), (18) 
where 
aA 
ROA. WA eit (19) 
a 
oy(x, aA) ||All>o 
—— (20) 
|All 
A 
ineup 2 VA SR (21) 
ao a 
aA 
liminf 2") 9 VAeR" (22) 
ayo a 
A’ 
imap (HO) 6 YAER, 03) 
(a, A’]>[+0,A] a 
A’ 
imint 289") 25 VaeR®, Ga) 
[a,A’][+0,A] a 


First Order Necessary and Sufficient Conditions 
for an Unconstrained Optimum 


Let a function f be defined on an open set X C R", 2 be 
a subset of X. A point x* € 2 is called a local minimum 
point (local minimizer) of the function f on the set Q2 if 
there exists 5 > 0 such that 
f(x) > f(x*), Wx € 20 B3(x*), 

where Bs (x*) = {x € R”: || x — x* || < 5}. If5 = +00, then 
the point x* is called a global minimum point (global 
minimizer) of f on 2. A point x* € 92 is called a strict 
local minimum point (strict local minimizer) of f on Q 
if there exists 5 > 0 such that 


f(x) > f(x"), Vere 2 01B3(x*), x Ax". 
Analogously one can define local, global and strict local 
maximum points (maximizers) of f on 22. 

It may happen that the set of local (global, strict lo- 
cal) minimizers (maximizers) is empty. 

If §2 = X then the problem of finding a minimum or 
a maximum of f on X is called an unconstrained opti- 


mization problem. 


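Proposition 8 Let a function $f$ be Dini (Hadamard) directionally differentiable on $X$. For a point $x^* \in \operatorname{dom} f$ to be a local or global minimizer of $f$ on $X$ it is necessary that

$$f_D'(x^*, g) \ge 0 \qquad \forall g \in \mathbb{R}^n \qquad (25)$$
$$\big(f_H'(x^*, g) \ge 0 \qquad \forall g \in \mathbb{R}^n\big). \qquad (26)$$

If $f$ is Hadamard d.d. at $x^*$ and

$$f_H'(x^*, g) > 0 \qquad \forall g \in \mathbb{R}^n,\ g \neq 0_n, \qquad (27)$$

then $x^*$ is a strict local minimizer of $f$. Here $0_n = (0, \dots, 0)$ is the zero element of $\mathbb{R}^n$.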


Proposition 9 Let $f$ be Dini (Hadamard) d.d. on $X$. For a point $x^{**} \in \operatorname{dom} f$ to be a local or global maximizer of $f$ on $X$ it is necessary that

$$f_D'(x^{**}, g) \le 0 \qquad \forall g \in \mathbb{R}^n \qquad (28)$$
$$\big(f_H'(x^{**}, g) \le 0 \qquad \forall g \in \mathbb{R}^n\big). \qquad (29)$$

If $f$ is Hadamard d.d. at $x^{**}$ and

$$f_H'(x^{**}, g) < 0 \qquad \forall g \in \mathbb{R}^n,\ g \neq 0_n, \qquad (30)$$

then $x^{**}$ is a strict local maximizer of $f$.


Note that (26) implies (25), and (29) implies (28). In the smooth case $f_H'(x, g) = \langle f'(x), g \rangle$ ($f'(x)$ being the gradient of $f$ at $x$). Since a linear function of $g$ cannot have the same strict sign at $g$ and at $-g$, the conditions (27) and (30) are impossible in this case. This means that the sufficient conditions (27) and (30) are essentially nonsmooth.


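Proposition 10 Let $f$ be defined on an open set $X \subset \mathbb{R}^n$. For a point $x^* \in \operatorname{dom} f$ (i.e., $|f(x^*)| < +\infty$) to be a local or global minimizer of $f$ on $X$ it is necessary that

$$f_D^{\downarrow}(x^*, g) \ge 0 \qquad \forall g \in \mathbb{R}^n, \qquad (31)$$
$$f_H^{\downarrow}(x^*, g) \ge 0 \qquad \forall g \in \mathbb{R}^n. \qquad (32)$$

If

$$f_H^{\downarrow}(x^*, g) > 0 \qquad \forall g \in \mathbb{R}^n,\ g \neq 0_n, \qquad (33)$$

then $x^*$ is a strict local minimizer of $f$.
Note that (32) implies (31), since always $f_H^{\downarrow}(x, g) \le f_D^{\downarrow}(x, g)$, but (31) does not necessarily imply (32).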


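Proposition 11 Let $f$ be defined on an open set $X \subset \mathbb{R}^n$. For a point $x^{**} \in \operatorname{dom} f$ to be a local or global maximizer of $f$ on $X$ it is necessary that

$$f_D^{\uparrow}(x^{**}, g) \le 0 \qquad \forall g \in \mathbb{R}^n \qquad (34)$$

and

$$f_H^{\uparrow}(x^{**}, g) \le 0 \qquad \forall g \in \mathbb{R}^n. \qquad (35)$$

If

$$f_H^{\uparrow}(x^{**}, g) < 0 \qquad \forall g \in \mathbb{R}^n,\ g \neq 0_n, \qquad (36)$$

then $x^{**}$ is a strict local maximizer of $f$.

The condition (35) implies (34) but (34) does not necessarily imply (35).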

Remark 12 Observe that the conditions for a minimum are different from the conditions for a maximum. A point $x^*$ satisfying the conditions (25) or (31) is called a Dini inf-stationary point of $f$, while a point $x^*$ satisfying (26) or (32) is called an Hadamard inf-stationary point. A point $x^{**}$ satisfying the conditions (28) or (34) is called a Dini sup-stationary point of $f$, while a point $x^{**}$ satisfying (29) or (35) is called an Hadamard sup-stationary point.


Remark 13 Note that the function f is not assumed to 
be continuous or even finite-valued. 


Let $x_0 \in \operatorname{dom} f$ and assume that the condition (31) does not hold, i.e., $x_0$ is not a Dini inf-stationary point. If $g_0 \in \mathbb{R}^n$, $\|g_0\| = 1$, and

$$f_D^{\downarrow}(x_0, g_0) = \inf_{\|g\| = 1} f_D^{\downarrow}(x_0, g),$$

then $g_0$ is called a Dini steepest descent direction of $f$ at $x_0$ ($\|g\|$ is the Euclidean norm).
If (32) does not hold and if $g_0 \in \mathbb{R}^n$, $\|g_0\| = 1$,

$$f_H^{\downarrow}(x_0, g_0) = \inf_{\|g\| = 1} f_H^{\downarrow}(x_0, g),$$

then $g_0$ is called an Hadamard steepest descent direction of $f$ at $x_0$.

Analogously, if $x_0$ is not a Dini sup-stationary point and if $g^0 \in \mathbb{R}^n$, $\|g^0\| = 1$,

$$f_D^{\uparrow}(x_0, g^0) = \sup_{\|g\| = 1} f_D^{\uparrow}(x_0, g),$$

then $g^0$ is called a Dini steepest ascent direction of $f$ at $x_0$.

If $x_0$ is not an Hadamard sup-stationary point of $f$ (i.e., (35) does not hold) and if $g^0 \in \mathbb{R}^n$, $\|g^0\| = 1$,

$$f_H^{\uparrow}(x_0, g^0) = \sup_{\|g\| = 1} f_H^{\uparrow}(x_0, g),$$

then $g^0$ is called an Hadamard steepest ascent direction of $f$ at $x_0$.

Of course, there may exist many steepest descent and/or steepest ascent directions of $f$ at $x_0$.

It may also happen that some direction is a direction of steepest descent and, at the same time, a direction of steepest ascent (which is impossible in the smooth case).


Example 14 Let $X = \mathbb{R}$,

$$f(x) = \begin{cases} |x| + \tfrac{1}{2} x \sin\frac{1}{x}, & x \neq 0, \\ 0, & x = 0. \end{cases}$$

Take $x_0 = 0$. It is clear that (see Fig. 1):

$$f_D^{\uparrow}(x_0, g) = |g| + \tfrac{1}{2}|g| = \tfrac{3}{2}|g|, \qquad f_D^{\downarrow}(x_0, g) = |g| - \tfrac{1}{2}|g| = \tfrac{1}{2}|g|.$$

As $X = \mathbb{R}$, the Hadamard derivatives coincide with the Dini ones (see Remark 1). Since

$$f_H^{\downarrow}(x_0, g) > 0 \qquad \forall g \neq 0,$$

we may conclude (see (33)) that $x_0$ is a strict local minimizer (in fact it is a global minimizer, but our theory does not allow us to claim this).

Note that $f_D^{\uparrow}$ and $f_D^{\downarrow}$ are positively homogeneous (see (7)); therefore it is sufficient to consider (in $\mathbb{R}$) only two directions: $g_1 = 1$ and $g_2 = -1$.
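A rough numerical check of Example 14 follows. It approximates the limsup and liminf by the max and min of the difference quotients over a dense finite sample of small steps, which is a heuristic, not a proof. In both directions $g = \pm 1$ the quotient equals $1 + \tfrac{1}{2}\sin(1/\alpha)$, so the estimates should approach $3/2$ and $1/2$. The helper name `dini_estimates` and the sampling grid are illustrative choices.

```python
import math

def dini_estimates(f, x, g, alphas):
    """Heuristic max/min of the difference quotients over sampled small steps."""
    fx = f(x)
    q = [(f(x + a * g) - fx) / a for a in alphas]
    return max(q), min(q)

def f(x):
    # Example 14: f(x) = |x| + (1/2) x sin(1/x) for x != 0, f(0) = 0
    return abs(x) + 0.5 * x * math.sin(1.0 / x) if x != 0 else 0.0

ALPHAS = [1.0 / (1000.0 + 0.05 * k) for k in range(200_000)]
results = {g: dini_estimates(f, 0.0, g, ALPHAS) for g in (1.0, -1.0)}
print(results)  # lower estimates stay positive, consistent with (33)
```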


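Example 15 Let $X = \mathbb{R}$, $x_0 = 0$,

$$f(x) = \begin{cases} x \sin\frac{1}{x}, & x > 0, \\ 0, & x \le 0. \end{cases}$$

It is clear (see Fig. 2) that

$$f_D^{\uparrow}(x_0, g) = \begin{cases} |g|, & g > 0, \\ 0, & g \le 0, \end{cases} \qquad f_D^{\downarrow}(x_0, g) = \begin{cases} -|g|, & g > 0, \\ 0, & g \le 0. \end{cases}$$

Since $f_D^{\downarrow}(x_0, +1) = -1 < 0$, the condition (31) does not hold, and since $f_D^{\uparrow}(x_0, +1) = 1 > 0$, the condition (34) does not hold. Therefore we conclude that $x_0$ is neither a local minimizer nor a local maximizer. Since

$$\max_{\|g\| = 1} f_D^{\uparrow}(x_0, g) = \max\{f_D^{\uparrow}(x_0, +1), f_D^{\uparrow}(x_0, -1)\} = \max\{1, 0\} = f_D^{\uparrow}(x_0, +1) = +1,$$

the direction $g_1 = +1$ is a steepest ascent direction. Since

$$\min_{\|g\| = 1} f_D^{\downarrow}(x_0, g) = \min\{f_D^{\downarrow}(x_0, +1), f_D^{\downarrow}(x_0, -1)\} = \min\{-1, 0\} = f_D^{\downarrow}(x_0, +1) = -1,$$

the direction $g_1 = +1$ is a steepest descent direction as well.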
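In $\mathbb{R}$ the unit sphere is just $\{+1, -1\}$, so steepest directions can be found by comparing two numbers. The sketch below estimates the Dini upper and lower derivatives of Example 15 heuristically, as max and min of difference quotients over sampled steps. It then picks the direction maximizing $f_D^{\uparrow}$ (ascent) and the one minimizing $f_D^{\downarrow}$ (descent). The helper names are illustrative.

```python
import math

def dini_estimates(f, x, g, alphas):
    """Heuristic max/min of the difference quotients over sampled small steps."""
    fx = f(x)
    q = [(f(x + a * g) - fx) / a for a in alphas]
    return max(q), min(q)

def f(x):
    # Example 15: f(x) = x sin(1/x) for x > 0, f(x) = 0 for x <= 0
    return x * math.sin(1.0 / x) if x > 0 else 0.0

ALPHAS = [1.0 / (1000.0 + 0.05 * k) for k in range(200_000)]
DIRECTIONS = (1.0, -1.0)  # the unit sphere of R
est = {g: dini_estimates(f, 0.0, g, ALPHAS) for g in DIRECTIONS}

ascent = max(DIRECTIONS, key=lambda g: est[g][0])   # maximizes the upper derivative
descent = min(DIRECTIONS, key=lambda g: est[g][1])  # minimizes the lower derivative
print(est, ascent, descent)
```

Both choices return $+1$, reproducing the observation that one direction can be a steepest ascent and a steepest descent direction at once.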


Conditions for a Constrained Optimum 


Let a function f be defined on an open set X C R", 2 be 
a subset of X. Let x € &2, |f(x)| < +00, g € R". The limit 


fig (x, g; 2) = lim sup f(x +.ag) — f(x) 


ao a 
x+ageR 


(37) 


is called the Dini conditional upper derivative of the 
function f at the point x in the direction g with re- 
spect to (2. If no sequence {a;} exists such that a, | 
0, x + akg € 22 for all k, then, by definition, we set 


holds, therefore we conclude that xo is neither a local pa (x, g; 2) = —o0. 
The limit

$$f_D^{\downarrow}(x, g; \Omega) = \liminf_{\substack{\alpha \downarrow 0 \\ x + \alpha g \in \Omega}} \frac{f(x + \alpha g) - f(x)}{\alpha} \qquad (38)$$

is called the Dini conditional lower derivative of the function $f$ at the point $x$ in the direction $g$ with respect to $\Omega$. If no sequence $\{\alpha_k\}$ exists such that $\alpha_k \downarrow 0$, $x + \alpha_k g \in \Omega$ for all $k$, then, by definition, we set $f_D^{\downarrow}(x, g; \Omega) = +\infty$.


The limit

$$f_H^{\uparrow}(x, g; \Omega) = \limsup_{\substack{[\alpha, g'] \to [+0, g] \\ x + \alpha g' \in \Omega}} \frac{f(x + \alpha g') - f(x)}{\alpha} \qquad (39)$$

is called the Hadamard conditional upper derivative of the function $f$ at the point $x$ in the direction $g$ with respect to $\Omega$. If no sequences $\{\alpha_k\}$, $\{g_k\}$ exist such that $[\alpha_k, g_k] \to [+0, g]$, $x + \alpha_k g_k \in \Omega$ for all $k$, then, by definition, we set $f_H^{\uparrow}(x, g; \Omega) = -\infty$.


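The limit

$$f_H^{\downarrow}(x, g; \Omega) = \liminf_{\substack{[\alpha, g'] \to [+0, g] \\ x + \alpha g' \in \Omega}} \frac{f(x + \alpha g') - f(x)}{\alpha} \qquad (40)$$

is called the Hadamard conditional lower derivative of $f$ at $x$ in the direction $g$ with respect to $\Omega$. If no sequences $\{\alpha_k\}$, $\{g_k\}$ exist such that $[\alpha_k, g_k] \to [+0, g]$, $x + \alpha_k g_k \in \Omega$ for all $k$, then, by definition, we set $f_H^{\downarrow}(x, g; \Omega) = +\infty$.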

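Proposition 16 (see [1]) For a point $x^* \in \Omega$ such that $|f(x^*)| < \infty$ to be a local or global minimizer of $f$ on $\Omega$ it is necessary that

$$f_D^{\downarrow}(x^*, g; \Omega) \ge 0 \qquad \forall g \in \mathbb{R}^n, \qquad (41)$$
$$f_H^{\downarrow}(x^*, g; \Omega) \ge 0 \qquad \forall g \in \mathbb{R}^n. \qquad (42)$$

Furthermore, if

$$f_H^{\downarrow}(x^*, g; \Omega) > 0 \qquad \forall g \in \mathbb{R}^n,\ g \neq 0_n, \qquad (43)$$

then $x^*$ is a strict local minimizer of $f$ on $\Omega$.

A point $x^* \in \Omega$ satisfying (41) ((42)) is called a Dini (Hadamard) inf-stationary point of $f$ on $\Omega$.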
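To see how the conventions in (37)-(40) work, consider the hypothetical problem of minimizing $f(x) = x$ over $\Omega = [0, +\infty) \subset \mathbb{R}$ at $x^* = 0$. In the direction $g = -1$ no step stays in $\Omega$, so $f_D^{\downarrow}(0, -1; \Omega) = +\infty$, while $f_D^{\downarrow}(0, +1; \Omega) = 1$. Hence (41) holds, although the unconstrained condition (31) fails because $f_D^{\downarrow}(0, -1) = -1$. The sketch approximates (38) over a finite set of steps. This is a heuristic in general, though here it is exact because the quotients are constant. The names are illustrative.

```python
import math

def conditional_dini_lower(f, x, g, in_omega, alphas):
    """Sampled version of (38): the smallest difference quotient over the steps a
    with x + a*g in Omega; +inf if no sampled step is feasible (the convention
    stated after (38))."""
    fx = f(x)
    q = [(f(x + a * g) - fx) / a for a in alphas if in_omega(x + a * g)]
    return min(q) if q else math.inf

def f(x):
    return x                      # hypothetical objective

def in_omega(y):
    return y >= 0.0               # hypothetical Omega = [0, +inf)

ALPHAS = [2.0 ** -k for k in range(1, 40)]
constrained = {g: conditional_dini_lower(f, 0.0, g, in_omega, ALPHAS) for g in (1.0, -1.0)}
unconstrained = conditional_dini_lower(f, 0.0, -1.0, lambda y: True, ALPHAS)
print(constrained, unconstrained)
```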


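Proposition 17 For a point $x^{**} \in \Omega$ such that $|f(x^{**})| < \infty$ to be a local or global maximizer of $f$ on $\Omega$ it is necessary that

$$f_D^{\uparrow}(x^{**}, g; \Omega) \le 0 \qquad \forall g \in \mathbb{R}^n, \qquad (44)$$
$$f_H^{\uparrow}(x^{**}, g; \Omega) \le 0 \qquad \forall g \in \mathbb{R}^n. \qquad (45)$$

If

$$f_H^{\uparrow}(x^{**}, g; \Omega) < 0 \qquad \forall g \in \mathbb{R}^n,\ g \neq 0_n, \qquad (46)$$

then $x^{**}$ is a strict local maximizer of $f$ on $\Omega$.

A point $x^{**} \in \Omega$ satisfying (44) ((45)) is called a Dini (Hadamard) sup-stationary point of $f$ on $\Omega$.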
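The condition (41) is equivalent to

$$f_D^{\downarrow}(x^*, g; \Omega) \ge 0 \qquad \forall g \in K(x^*, \Omega), \qquad (47)$$

where

$$K(x^*, \Omega) = \big\{g \in \mathbb{R}^n : \exists \{\alpha_k\} :\ \alpha_k \downarrow 0,\ x^* + \alpha_k g \in \Omega\ \ \forall k\big\}. \qquad (48)$$

Indeed, for $g \notin K(x^*, \Omega)$ the left-hand side of (41) equals $+\infty$ by definition. Analogously, the condition (44) is equivalent to

$$f_D^{\uparrow}(x^{**}, g; \Omega) \le 0 \qquad \forall g \in K(x^{**}, \Omega). \qquad (49)$$

The condition (42) is equivalent to

$$f_H^{\downarrow}(x^*, g; \Omega) \ge 0 \qquad \forall g \in \Gamma(x^*, \Omega), \qquad (50)$$

where

$$\Gamma(x^*, \Omega) = \big\{g \in \mathbb{R}^n : \exists \{[\alpha_k, g_k]\} :\ [\alpha_k, g_k] \to [+0, g],\ x^* + \alpha_k g_k \in \Omega\ \ \forall k\big\}. \qquad (51)$$

Analogously, the condition (45) is equivalent to

$$f_H^{\uparrow}(x^{**}, g; \Omega) \le 0 \qquad \forall g \in \Gamma(x^{**}, \Omega). \qquad (52)$$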


Note that the cones $K(x^*, \Omega)$ and $K(x^{**}, \Omega)$ are not necessarily closed, while the cones $\Gamma(x^*, \Omega)$ and $\Gamma(x^{**}, \Omega)$ are the Bouligand cones to $\Omega$ at $x^*$ and $x^{**}$, respectively, and are therefore always closed.

Now it is possible to define conditional steepest as- 
cent and descent directions. 


Remark 18 It is also possible (see [3, p. 156]) to define the Dini and Hadamard conditional directional derivatives as follows:

$$f_D'(x, g; \Omega) = \lim_{\substack{\alpha \downarrow 0 \\ x + \alpha g \in \Omega}} \frac{f(x + \alpha g) - f(x)}{\alpha}, \qquad (53)$$
$$f_H'(x, g; \Omega) = \lim_{\substack{[\alpha, g'] \to [+0, g] \\ x + \alpha g' \in \Omega}} \frac{f(x + \alpha g') - f(x)}{\alpha}. \qquad (54)$$

A function $f$ is called Dini (Hadamard) conditionally differentiable at $x$ in a direction $g$ if the limit in (53) ((54)) exists and is finite.


Remark 19 The conditional directional derivatives defined by (37)-(40) essentially depend on the set $\Omega$.


In some cases it is possible to ‘separate’ the function $f$ and the set $\Omega$ in the necessary conditions (47), (49), (50) and (52). For example, if $f$ is Lipschitz and directionally differentiable at $x$, then

$$f_D^{\uparrow}(x, g; \Omega) = f_D^{\downarrow}(x, g; \Omega) = f_H^{\uparrow}(x, g; \Omega) = f_H^{\downarrow}(x, g; \Omega) = f'(x, g) \qquad \forall g \in K(x, \Omega).$$

In this case the derivatives on the left-hand sides of (47), (49) and (50), (52) should be replaced by $f'(x^*, g)$ or $f'(x^{**}, g)$, respectively.

Note that if $g \in \Gamma(x, \Omega)$ but $g \notin K(x, \Omega)$, then $f_D^{\uparrow}(x, g; \Omega)$ and $f_D^{\downarrow}(x, g; \Omega)$ are not finite, by definition, while

$$f_H^{\uparrow}(x, g; \Omega) = f_H^{\downarrow}(x, g; \Omega) = f'(x, g).$$

Remark 20 The necessary optimality conditions for 
unconstrained and constrained optimization problems 
described above can be used to construct numeri- 
cal methods for finding corresponding (inf- or sup- 
stationary) points. 

For special classes of functions (e.g., convex, concave, max-type, minmax-type, quasidifferentiable functions), the derivative (3) has a more ‘constructive’ form, and therefore the conditions (25)-(36) and (41)-(46) also take more ‘constructive’ forms (see, e.g., [2]).


Remark 22 The limits in (4), (5), (6), (39) and (40) are taken as

$$[\alpha, g'] \to [+0, g]. \qquad (55)$$

Sometimes in the literature, instead of this relation, one can see two separate relations

$$\alpha \to +0, \qquad g' \to g. \qquad (56)$$

It was demonstrated in [4] that the limits resulting from (55) and (56) do not necessarily coincide. This warning should be taken into account.


See also 


> Global Optimization: Envelope Representation

> Nondifferentiable Optimization

> Nondifferentiable Optimization: Cutting Plane Methods

> Nondifferentiable Optimization: Minimax Problems

> Nondifferentiable Optimization: Newton Method

> Nondifferentiable Optimization: Parametric Programming

> Nondifferentiable Optimization: Relaxation Methods

> Nondifferentiable Optimization: Subgradient Optimization Methods


References 


1. Demyanov VF, Di Pillo G, Facchinei F (1997) Exact penalization via Dini and Hadamard conditional derivatives. Optim Methods Softw 9:19-36

2. Demyanov VF, Rubinov AM (1995) Constructive nonsmooth analysis. P Lang, Frankfurt am Main

3. Dieudonné J (1969) Foundations of modern analysis. Acad Press, New York

4. Giannessi F (1995) A common understanding or a common misunderstanding. Numer Funct Anal Optim 16(9-10):1359-1363


———— 
Directed Tree Networks 


THOMAS ERLEBACH¹, KLAUS JANSEN²,
CHRISTOS KAKLAMANIS³, GIUSEPPE PERSIANO⁴
¹ Department of Computer Science,
University of Leicester, Leicester, UK
² Institut für Informatik, Universität Kiel,
Kiel, Germany
³ RACTI, University of Patras, Patras, Greece
⁴ Dipartimento di Informatica ed Appl.,
Università di Salerno, Fisciano, Italy


MSC2000: 05C85 


Article Outline 


Keywords and Phrases 
Complexity Results 
Reduction from Edge Coloring 
Reduction from Arc Coloring 
Approximation Algorithms 
Reduction to Constrained Bipartite Edge Coloring 




Partition Into Matchings 
Coloring of Triplets 
Selection of Triplets 


Lower Bounds 
Lower Bound for Greedy Algorithms 
Lower Bounds for Optimal Colorings 


Randomized Algorithms 
Related Topics 
References 


Keywords and Phrases 


Optical networks; Path coloring; Bipartite edge 
coloring; Approximation algorithms 


Technological developments in the field of optical communication networks using wavelength-division multiplexing have triggered intensive research in an optimization problem concerning the assignment of colors to paths in a directed tree. Here, the term directed tree refers to the graph obtained from an undirected tree by replacing each undirected edge by two directed edges with opposite directions. This path coloring problem was first studied by M. Mihail, C. Kaklamanis and S. Rao [19]. An instance of it is given by a directed tree $T = (V, E)$ and a set $P = \{p_1, \dots, p_r\}$ of directed simple (i.e., not visiting any vertex twice) paths in $T$, where each path is specified by an ordered pair of vertices (start vertex and end vertex). The task is to assign colors to the given paths such that paths receive different colors if they share a directed edge. The goal is to minimize the number of different colors used. For given $T = (V, E)$ and $P$, let $L(e)$ denote the load on directed edge $e \in E$, i.e., the number of paths containing $e$. Obviously, the maximum load $L = \max_{e \in E} L(e)$ is a lower bound on the number of colors in an optimal coloring, because the paths through a single edge must all receive different colors. Consider Fig. 1 for an example of a tree with six vertices and paths from a to e, from f to e, from f to c, from d to b, and from a to b. A possible valid coloring is to assign these paths the colors 1, 2, 1, 2, and 3, respectively. The maximum load of the paths is 2, because 2 paths use the edge (d, c). It is not possible to color these paths with 2 colors, because the conflict graph of the paths (a graph with a vertex for each path and an edge between vertices if the corresponding paths share an edge) is a cycle of length 5, and an odd cycle cannot be colored with 2 colors. Hence, the coloring with three colors is an optimal coloring.
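The definitions of load and of a valid coloring translate directly into code. The tree of Fig. 1 is not reproduced here, so this sketch uses a small hypothetical tree and set of requests of its own. It builds each path as a sequence of directed edges, computes the loads $L(e)$ and the lower bound $L$, and checks whether a given coloring is valid. All names are illustrative.

```python
from collections import Counter
from itertools import combinations

def tree_path(adj, s, t):
    """Vertex sequence of the unique s-t path in a tree (DFS with parent links)."""
    parent, stack = {s: None}, [s]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
    path = [t]
    while path[-1] != s:
        path.append(parent[path[-1]])
    return path[::-1]

def directed_edges(path):
    return list(zip(path, path[1:]))

# hypothetical undirected tree (adjacency lists) and requests (start, end)
ADJ = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "f"],
       "d": ["c", "e"], "e": ["d"], "f": ["c"]}
REQUESTS = [("a", "e"), ("f", "e"), ("f", "b"), ("d", "a"), ("e", "f")]

PATHS = [directed_edges(tree_path(ADJ, s, t)) for s, t in REQUESTS]
load = Counter(e for p in PATHS for e in p)   # L(e) for every used directed edge
L = max(load.values())                        # lower bound on the number of colors

def is_valid(coloring):
    """True if paths sharing a directed edge always receive different colors."""
    return all(coloring[i] != coloring[j] or not set(PATHS[i]) & set(PATHS[j])
               for i, j in combinations(range(len(PATHS)), 2))

print(L, is_valid([0, 2, 1, 0, 1]), is_valid([0] * 5))
```

Paths that use the same tree edge in opposite directions, such as (c, d) and (d, c), do not conflict. This is exactly why each undirected edge is replaced by two directed ones.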


Directed Tree Networks, Figure 1 
Example path coloring instance 


The path coloring problem models the assignment 
of wavelengths to directed connection requests in all- 
optical networks with tree topology. In such networks 
data is transmitted in optical form via laser beams [13]. 
Two adjacent nodes of the network are connected by 
a pair of optical fiber links, one for each direction. 
When wavelength-division multiplexing is used, multi- 
ple signals can be transmitted over the same link if they 
use different wavelengths, and the nodes are capable of 
switching an incoming signal onto any outgoing link 
depending on the wavelength of the signal. However, 
the wavelength of a signal cannot be changed, and ev- 
ery connection uses the same wavelength on the whole 
transmitter-receiver path. If two signals using the same 
wavelength are transmitted over the same directed link, 
the data is lost due to interference. The number of avail- 
able wavelengths is called optical bandwidth, and it is 
a scarce resource. Therefore, one is interested in min- 
imizing the number of wavelengths necessary to route 
a given set of requests. This optimization problem cor- 
responds to the path coloring problem defined above: 
paths correspond to connection requests, and colors 
correspond to wavelengths. 


Complexity Results 


Whereas the path coloring problem can be solved in lin- 
ear time in chain networks as it is equivalent to interval 
graph coloring, it is NP-hard in directed tree networks. 
More precisely, it is NP-complete to decide whether 
a set of paths in a directed tree of arbitrary degree can 
be colored using at most 3 colors [8,9], and it is NP-complete to decide whether a set of paths in a directed
binary tree can be colored using at most k colors if k is 
part of the input [7,9,17]. The respective reductions will 
be outlined in the following. It should be remarked that 
the special case in which both the maximum degree of 
the tree as well as the maximum load of the paths are 
bounded by a constant can be solved optimally in poly- 
nomial time [7,9]. 


Reduction from Edge Coloring 


It is an NP-complete problem to decide whether the 
edges of a given 3-regular undirected graph G can be 
colored with three colors such that edges receive differ- 
ent colors if they are incident to the same vertex [14]. 
Let G be a 3-regular undirected graph with n vertices and m edges. A directed tree T with $n' = 10n + 1$ vertices and a set P of $t = 4m$ paths in T can be constructed in polynomial time such that the paths in T can be colored using three colors if and only if the edges of G can be colored with three colors. T consists of a root r, one child $c_v$ of the root for every vertex v of G, three children $c_v^1$, $c_v^2$ and $c_v^3$ of every $c_v$, and two children $c_v^{i,1}$ and $c_v^{i,2}$ of every $c_v^i$ (so $1 + n + 3n + 6n = 10n + 1$ vertices in total). For each edge $e = \{v, w\}$ of G, four paths in T are created. One path runs from $c_v^{i,1}$ to $c_w^{j,1}$ and one from $c_w^{j,2}$ to $c_v^{i,2}$; these are called real paths. The other two are copies of the path from $c_v^{i,1}$ to $c_v^{i,2}$, called blockers. Here i and j are chosen such that the subtree rooted at $c_v^i$ resp. $c_w^j$ is not used by any paths other than the paths created for this particular edge e. Figure 2 shows an example of a 3-regular graph G, the constructed tree T (two of the four subtrees are represented by dotted triangles), and the paths created for the edge between the black vertices of G.

If the paths in T are to be colored with three colors, the blockers ensure that the two real paths corresponding to e receive the same color. The two blockers conflict with each other and with both real paths, so they use two colors and leave only the third color for both real paths. Therefore, this color cannot be used by any other real path corresponding to an edge incident to v or w. If there exists a 3-coloring of the paths in T, a 3-coloring of the edges of G can be obtained by assigning each edge the color of its corresponding real paths. On the other hand, if there exists a 3-coloring of the edges of G, a 3-coloring of the paths in T can be obtained by assigning the real paths corresponding to edge e the same color as e and coloring the blockers with the remaining two colors. Hence, a solution to the path coloring problem in T would also solve the edge coloring problem in G.

Directed Tree Networks, Figure 2
Reduction from edge coloring

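Since it has just been proved NP-complete to decide whether paths in a directed tree can be colored with three colors, it follows that there cannot be an approximation algorithm for path coloring with absolute approximation ratio < 4/3 unless P = NP. Such an algorithm would use fewer than $(4/3) \cdot 3 = 4$, i.e., at most three, colors on every 3-colorable instance, and could therefore decide 3-colorability.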


Reduction from Arc Coloring 


The NP-complete arc coloring problem [12] is to decide, for a given set of m arcs $A_1, \dots, A_m$ on a cycle and a given integer k, whether the arcs can be colored with k colors such that arcs receive different colors if they intersect. Without loss of generality, assume that each arc $A_i$ is specified by a pair $(a_i, b_i)$ with $a_i \neq b_i$ and $1 \le a_i, b_i \le 2n$. The span of arc $A_i$ is $sp(A_i) = \{a_i + 1, a_i + 2, \dots, b_i\}$ if $a_i < b_i$ and $sp(A_i) = \{a_i + 1, \dots, 2n, 1, 2, \dots, b_i\}$ if $a_i > b_i$. Two arcs $A_i$ and $A_j$ intersect iff $sp(A_i) \cap sp(A_j) \neq \emptyset$. Note that one can view the arc coloring problem as a path coloring problem on a cycle.

If a number is contained in the span of more than k arcs, then the arcs surely cannot be colored with k colors, and the answer to this instance of arc coloring is no. Otherwise, one can assume that every number i, $1 \le i \le 2n$, is contained in the span of exactly k arcs. If this were not the case, one could simply add arcs of the form $(i, i + 1)$ until the condition holds, without changing the answer of the coloring problem.
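The span and intersection definitions, and the padding step just described, can be written down directly. In this sketch three things are assumptions for illustration: the instance, the function names, and the choice of padding arc $(p-1, p)$, with $(2n, 1)$ for $p = 1$. An arc $(p-1, p)$ has span $\{p\}$, so each added arc raises the load of exactly one point.

```python
def span(a, b, n2):
    """sp(A) for the arc A = (a, b) on the points 1..n2 (n2 = 2n), a != b."""
    if a < b:
        return set(range(a + 1, b + 1))
    return set(range(a + 1, n2 + 1)) | set(range(1, b + 1))

def intersect(arc1, arc2, n2):
    """Two arcs intersect iff their spans share a point."""
    return bool(span(*arc1, n2) & span(*arc2, n2))

def point_loads(arcs, n2):
    """Number of arc spans containing each point 1..n2."""
    load = {p: 0 for p in range(1, n2 + 1)}
    for a, b in arcs:
        for p in span(a, b, n2):
            load[p] += 1
    return load

def pad_to_load_k(arcs, k, n2):
    """None if some point lies in more than k spans (the answer is then 'no');
    otherwise the arcs plus padding arcs so every point lies in exactly k spans."""
    load = point_loads(arcs, n2)
    if max(load.values()) > k:
        return None
    padded = list(arcs)
    for p in range(1, n2 + 1):
        prev = p - 1 if p > 1 else n2      # arc (prev, p) has span {p}
        padded.extend([(prev, p)] * (k - load[p]))
    return padded

# hypothetical instance with 2n = 6 points
ARCS = [(1, 3), (2, 5), (5, 2)]
padded = pad_to_load_k(ARCS, 2, 6)
print(intersect((1, 3), (2, 5), 6), intersect((2, 5), (5, 2), 6))
print(point_loads(padded, 6))
```

Each padding arc has a single-point span, so it only conflicts with arcs covering that point. That is why padding does not change the answer.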

Now consider a chain of n vertices $v_1, v_2, \dots, v_n$. Imagine the chain drawn from left to right, with $v_1$ the start vertex at its left end. The directed edges from left to right followed by the directed edges from right to left make up a cycle of length 2n. The given circular arcs can be translated into directed paths on this cycle such that two paths share a directed edge iff the corresponding arcs intersect. These paths do not yet constitute a valid path coloring problem, however, because some of them are not simple. For example, an arc (1, 2n) would correspond to a path that runs from one end of the chain to the other and then turns around and comes back. Nevertheless, it is possible to obtain a valid instance of the path coloring problem by splitting paths that are not simple into two or three simple paths and by using blockers to make sure that the paths derived from one non-simple path must receive the same color in any valid k-coloring.

Directed Tree Networks, Figure 3
Reduction from arc coloring

For this purpose, extend the chain by adding k vertices on both ends, resulting in a chain of length n + 2k. Connect each of the newly added vertices to a distinct subtree consisting of a new vertex with two leaf children. The resulting network is a binary tree T. Suppose a path arrives at vertex $v_n$ coming from the left (i.e., from $v_{n-1}$) and "turns around" to revisit $v_{n-1}$. Divide such a path into two. The first piece comes from the left, passes through $v_n$ and ends at the left leaf of one of the subtrees added on the right side of the chain. The second piece starts at the right leaf of that subtree, passes through $v_n$ and continues left. In addition, add $k - 1$ blockers in that subtree, i.e., paths from the right leaf to the left leaf. Observe that there are no more than k paths containing $v_n$ as an inner vertex, and a different subtree can be chosen for each of these paths. A symmetric splitting procedure is applied to the paths that contain $v_1$ as an inner vertex, i.e., the paths that arrive at $v_1$ coming from the right (i.e., from $v_2$) and "turn around" to revisit $v_2$. This way, all non-simple paths are split into two or three simple paths, and a number of blockers are added.

The resulting set of paths in T can be colored with k 
colors if and only if the original arc coloring instance is 
a yes-instance. The blockers ensure that all paths corre- 
sponding to the same arc receive the same color in any 
k-coloring. Hence, a k-coloring of the paths can be used 
to obtain a k-coloring of the arcs by assigning each arc 
the color of its corresponding paths. Also, a k-coloring 


of the arcs can be turned into a k-coloring of the paths 
by assigning all paths corresponding to an arc the same 
color as the arc and by coloring the blockers with the 
remaining $k - 1$ colors. This shows that the decision
version of the path coloring problem is NP-complete 
already for binary trees. 


Approximation Algorithms 


Since the path coloring problem in directed tree net- 
works is NP-hard, one is interested in polynomial-time 
approximation algorithms with provable performance 
guarantee. All such approximation algorithms that have 
been developed so far belong to the class of greedy algo- 
rithms. A greedy algorithm picks a start vertex s in the 
tree T and assigns colors to the paths touching (starting 
at, ending at, or passing through) s first. Then it visits 
the remaining vertices of the tree in some order that en- 
sures that the current vertex is adjacent to a previously 
visited vertex; for example, a depth-first search can be 
used to obtain such an order. When the algorithm pro- 
cesses vertex v, it assigns colors to all paths touching v 
without changing the color of paths that have been col- 
ored at a previous vertex. Each such step is referred to 
as coloring extension. Furthermore, the only informa- 
tion about the paths touching the current vertex that 
the algorithm considers is which edges incident to the 
current vertex they use. To emphasize this latter prop- 
erty, greedy algorithms are sometimes referred to as lo- 
cal greedy algorithms. 
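The following Python sketch instantiates the greedy framework described above with the simplest possible coloring-extension rule, first fit. It is not the $\lceil 5L/3 \rceil$ algorithm cited below. The tree, the paths and the function name are hypothetical. For simplicity, the conflict test compares whole edge sets rather than only the edges incident to the current vertex, so this variant is not strictly "local" in the sense defined above.

```python
def greedy_path_coloring(adj, paths, start):
    """Greedy framework with a simple first-fit coloring extension.

    adj: adjacency lists of an undirected tree; paths: vertex lists of simple
    directed paths. Vertices are visited from `start` so that each one is
    adjacent to an earlier one; at each vertex the still-uncolored paths
    touching it get the smallest color not used by a conflicting colored path.
    """
    edge_sets = [set(zip(p, p[1:])) for p in paths]
    order, seen, stack = [], {start}, [start]
    while stack:                       # DFS-like traversal from the start vertex
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    color = {}
    for v in order:
        for i, p in enumerate(paths):
            if v not in p or i in color:
                continue
            used = {color[j] for j in color if edge_sets[i] & edge_sets[j]}
            c = 0
            while c in used:
                c += 1
            color[i] = c               # colors fixed earlier are never changed
    return [color[i] for i in range(len(paths))]

# hypothetical tree and paths (given directly as vertex sequences)
ADJ = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "f"],
       "d": ["c", "e"], "e": ["d"], "f": ["c"]}
PATHS = [["a", "b", "c", "d", "e"], ["f", "c", "d", "e"], ["f", "c", "b"],
         ["d", "c", "b", "a"], ["e", "d", "c", "f"]]
colors = greedy_path_coloring(ADJ, PATHS, "a")
print(colors)
```

On this instance the sketch uses three colors although the maximum load is 2. Nothing in first fit guarantees the $5L/3$ bound, which needs the more careful extension step discussed below.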

Whereas all greedy algorithms follow this general strategy, individual variants differ with respect to the solution to the coloring extension substep. The best known algorithm was presented by T. Erlebach, K. Jansen, C. Kaklamanis, and P. Persiano in [11,16] (see also [10]). It colors a set of paths with maximum load L in a directed tree network of arbitrary degree with at most $\lceil 5L/3 \rceil$ colors. In the next section this will be shown to be best possible in the class of greedy algorithms.

For the sake of clarity, assume that the load on all 
edges is exactly L and that L is divisible by 3. The al- 
gorithm maintains two invariants: (a) the number of 
colors used is at most 5L/3, and (b) for each pair of 
directed edges with opposite directions the number of 
colors used to color paths going through either of these 
edges is at most 4L/3. First, the algorithm picks a leaf s
of T as the start vertex and colors all paths starting or
ending at s using at most L colors. Therefore, the invari- 
ants are satisfied initially. It remains to show that they 
still hold after a coloring extension step if they were sat- 
isfied at the beginning of this step. 


Reduction to Constrained Bipartite Edge Coloring 


The coloring extension problem at a current vertex v is reduced to a constrained edge coloring problem in a bipartite graph G_v with left vertex set V_1 and right vertex set V_2. This reduction was introduced by M. Mihail, C. Kaklamanis and S. Rao in [19]. Let n_0, n_1, ..., n_k be the neighbors of v in T, and let n_0 be the unique neighbor that was processed before v. For every neighbor n_i of v the graph G_v contains four vertices: vertices w_i and z_i in V_1, and vertices x_i and y_i in V_2. Vertex w_i is said to be opposite x_i, and z_i is opposite y_i. A pair of opposite vertices is called a line of G_v. A line sees a color if that color appears on an edge incident to a vertex of that line. For every path touching v there is one edge in G_v: an edge (w_i, x_j) for each path coming from n_i, passing through v and going to n_j; an edge (w_i, y_i) for each path coming from n_i and ending at v; and an edge (z_j, x_j) for each path starting at v and going to n_j.

It is easy to see that coloring the paths touching v is equivalent to coloring the edges of G_v. Since the load on every directed edge is exactly L, the vertices w_i and x_i have degree L in G_v, while the other vertices may have smaller degree. If this is the case, the algorithm adds dummy edges (shown dashed in Fig. 4) in order to make the graph L-regular.
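The construction of G_v can be sketched in a few lines of Python. This is an illustrative sketch, not code from [19]. The encoding of a path as the pair (i, j) of neighbor indices, the vertex labels, and the helper names build_gv, degrees and pad_to_regular are our own. Pairing deficient vertices in list order is only one simple way to add the dummy edges, because the text does not say how they are chosen.

```python
from collections import Counter

def build_gv(paths_at_v):
    """Edges of the bipartite multigraph G_v for the paths touching v.

    A path is a pair (i, j): i is the index of the neighbor n_i it comes
    from (None if it starts at v), j the index of the neighbor n_j it goes
    to (None if it ends at v). Left vertices: ("w", i), ("z", i);
    right vertices: ("x", j), ("y", j).
    """
    edges = []
    for i, j in paths_at_v:
        if i is not None and j is not None:
            edges.append((("w", i), ("x", j)))  # passes through v
        elif i is not None:
            edges.append((("w", i), ("y", i)))  # ends at v
        elif j is not None:
            edges.append((("z", j), ("x", j)))  # starts at v
        else:
            raise ValueError("a path touching v must use an edge at v")
    return edges

def degrees(edges):
    deg = Counter()
    for left, right in edges:
        deg[left] += 1
        deg[right] += 1
    return deg

def pad_to_regular(edges, num_neighbors, L):
    """Add dummy edges until every vertex has degree L (parallel edges allowed)."""
    deg = degrees(edges)
    left = [(t, i) for i in range(num_neighbors) for t in ("w", "z")]
    right = [(t, i) for i in range(num_neighbors) for t in ("x", "y")]
    need_left = [u for u in left for _ in range(L - deg[u])]
    need_right = [u for u in right for _ in range(L - deg[u])]
    # Both sides miss the same number of edge endpoints, so zip pairs them all.
    assert len(need_left) == len(need_right)
    return edges + list(zip(need_left, need_right))

# Neighbors n_0, n_1 of v; load exactly L = 2 on each of the four directed edges.
paths = [(0, 1), (0, None), (None, 0), (1, 0), (None, 1), (1, None)]
gv = pad_to_regular(build_gv(paths), num_neighbors=2, L=2)
print(sorted(degrees(gv).items()))
```

In the example, w_0, x_0, w_1 and x_1 already have degree 2. The vertices y_0, y_1, z_0 and z_1 have degree 1 each, so two dummy edges are added.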

As the paths that contain the edges (n_0, v) or (v, n_0) have been colored at a previous vertex, the edges incident to w_0 and x_0 are already colored with at most 4L/3 colors by invariant (b). These edges are called color-forced edges. A color that appears on exactly one color-forced edge is a single color. A color that appears on two color-forced edges is a double color. Since there are at most 4L/3 colors on 2L color-forced edges, there must be at least 2L/3 double colors. The number of double colors equals the number of color-forced edges minus the number of distinct colors on them, which is at least 2L − 4L/3. Furthermore, one can assume that there are exactly 2L/3 double colors and 2L/3 single colors. If there are too many double colors, it is possible to split an appropriate number of double colors into two single colors for the duration of the current coloring extension step. In order to maintain invariant (a), the algorithm must color the uncolored edges of G_v using at most L/3 new colors (colors not used on the color-forced edges), since the color-forced edges already carry 4L/3 colors and 4L/3 + L/3 = 5L/3. Invariant (b) is satisfied by ensuring that no line of G_v sees more than 4L/3 colors.


Partition Into Matchings 


G_v is an L-regular bipartite multigraph. By König's edge-coloring theorem, its edges can therefore be partitioned efficiently into L perfect matchings. Each matching is classified according to the colors on its two color-forced edges:

- SS-matchings contain two single colors.
- ST-matchings contain one single color and one double color.
- PP-matchings contain the same (preserved) double color on both color-forced edges.
- TT-matchings contain two different double colors.

Next, the L matchings are grouped into chains and cycles. A chain of length ℓ ≥ 2 is a sequence of ℓ matchings M_1, ..., M_ℓ such that M_1 and M_ℓ are ST-matchings, M_2, ..., M_{ℓ−1} are TT-matchings, and two consecutive matchings share a double color. A cycle of length ℓ ≥ 2 is a sequence of ℓ TT-matchings such that consecutive matchings, as well as the first and the last matching, share a double color. Obviously, the set of L matchings is in this way entirely partitioned into SS-matchings, chains, cycles, and PP-matchings. In addition, if a chain or cycle contains parallel color-forced edges, then the algorithm exchanges these edges in the respective matchings. This divides the original chain or cycle into a shorter sequence of the same type and an extra cycle.
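The classification of matchings and their grouping into chains and cycles can be sketched as follows. The sketch assumes that each perfect matching is summarized by the pair of colors on its two color-forced edges (the one at w_0 and the one at x_0). The helper names are hypothetical, and the exchange of parallel color-forced edges described above is not implemented.

```python
from collections import defaultdict

def classify(matchings):
    """matchings[k] = (color on the color-forced edge at w_0, color at x_0)."""
    count = defaultdict(int)
    for a, b in matchings:
        count[a] += 1
        count[b] += 1
    double = {c for c, n in count.items() if n == 2}
    kinds = []
    for a, b in matchings:
        nd = (a in double) + (b in double)
        if nd == 0:
            kinds.append("SS")
        elif nd == 1:
            kinds.append("ST")
        else:
            kinds.append("PP" if a == b else "TT")
    return kinds, double

def group_matchings(matchings):
    kinds, double = classify(matchings)
    where = defaultdict(list)  # double color -> matchings carrying it
    for k, (a, b) in enumerate(matchings):
        for c in {a, b} & double:
            where[c].append(k)

    def partner(k, c):
        """The other matching that carries double color c."""
        m1, m2 = where[c]
        return m2 if m1 == k else m1

    def exit_color(k, c_in):
        """Double color of TT-matching k other than the one we entered by."""
        a, b = matchings[k]
        return b if a == c_in else a

    seen, chains, cycles = set(), [], []
    for k, kind in enumerate(kinds):  # chains run from one ST-matching to another
        if kind == "ST" and k not in seen:
            a, b = matchings[k]
            c = a if a in double else b
            chain, cur = [k], partner(k, c)
            while kinds[cur] == "TT":
                chain.append(cur)
                c = exit_color(cur, c)
                cur = partner(cur, c)
            chain.append(cur)  # the closing ST-matching
            seen.update(chain)
            chains.append(chain)
    for k, kind in enumerate(kinds):  # leftover TT-matchings close up into cycles
        if kind == "TT" and k not in seen:
            c = matchings[k][0]
            cycle, cur = [k], partner(k, c)
            while cur != k:
                cycle.append(cur)
                c = exit_color(cur, c)
                cur = partner(cur, c)
            seen.update(cycle)
            cycles.append(cycle)
    return {
        "SS": [k for k, t in enumerate(kinds) if t == "SS"],
        "PP": [k for k, t in enumerate(kinds) if t == "PP"],
        "chains": chains,
        "cycles": cycles,
    }

# L = 6: single colors s1..s4, double colors A..D (each once at w_0, once at x_0).
ex = [("s1", "s2"), ("s3", "A"), ("A", "B"), ("B", "s4"), ("C", "D"), ("D", "C")]
print(group_matchings(ex))
# {'SS': [0], 'PP': [], 'chains': [[1, 2, 3]], 'cycles': [[4, 5]]}
```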

Now the algorithm chooses triplets, i.e., groups of 
three matchings, and colors the uncolored edges of each 
triplet using at most one new color and at most four 
active colors. The active colors are selected among the 
colors on color-forced edges of that triplet, and a color 
is active in at most one triplet. The algorithm ensures 
that a line that sees the new color does not see one of the active colors of that triplet. This implies that no line of G_v sees more than 4L/3 colors altogether, as required to maintain invariant (b).


Coloring of Triplets 


The rules for choosing triplets ensure that each triplet 
contains two color-forced edges with single colors and 
four color-forced edges with double colors. Further- 
more, most triplets are chosen such that one double 
color appears twice, and this double color as well as 
the two single colors can be reused without considering 
conflicts outside the triplet. V. Kumar and E.J. Schwabe 
proved in [18] that such triplets can be colored as re- 
quired using three active colors and one new color. This 
coloring procedure can be sketched as follows. Partition 
the edges of the triplet into two parts: a matching on all vertices except w_0 and x_0, and a gadget. A gadget is a subgraph in which w_0 and x_0 have degree 3 while all other vertices have degree 2. It consists of a number of cycles of even length not containing w_0 or x_0, together with either three disjoint paths from w_0 to x_0, or one path from w_0 to x_0, one path from w_0 to w_0, and one path from x_0 to x_0. A careful
case analysis shows that the triplet can be colored by 
reusing the single colors and the double color to color 
the gadget and using a new color for the matching. If 
a partitioning into gadget and matching does not exist, 
the triplet contains a PP-matching and can be colored 
using the double color of the PP-matching for the un- 
colored edges of the PP-matching and a single color and 
a new color for the uncolored edges of the cycle cover 
consisting of the other two matchings. 

In the following, the terms even sequence and odd 
sequence refer to sequences of TT-matchings of even 
resp. odd length such that consecutive matchings share 
a double color. Note that an even sequence can be 
grouped into triplets by combining two consecutive 
matchings of the sequence with an SS-matching as long 
as SS-matchings are available and combining each re- 
maining TT-matching with a chain of length 2. There 
are always enough SS-matchings or chains of length 2 
because the ratio between color-forced edges with double colors and color-forced edges with single colors is 2:1 in G_v initially and remains the same after extracting triplets. Similarly, an odd sequence can be grouped
into triplets if there is at least one chain of length 2, 


which can be used to form a triplet with the first match- 
ing of the sequence, leaving an even sequence behind. 
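The grouping of an even sequence is pure bookkeeping and can be checked with a small sketch. Matchings are represented by hypothetical names whose first two letters give their type. The code only forms the triplets and counts their single-color and double-color color-forced edges. It does not color anything.

```python
# (single-color, double-color) color-forced edges contributed by each matching type
EDGES = {"SS": (2, 0), "ST": (1, 1), "TT": (0, 2), "PP": (0, 2)}

def kind(m):
    return m[:2]  # matchings are named like "TT3", "SS1", ...

def group_even_sequence(seq, ss_pool, chain2_pool):
    """Split an even sequence of TT-matchings into triplets.

    Consecutive pairs go with an SS-matching while any are left; each
    remaining TT-matching goes with a chain of length 2 (two ST-matchings).
    """
    assert len(seq) % 2 == 0
    triplets, i = [], 0
    while i < len(seq) and ss_pool:
        triplets.append((seq[i], seq[i + 1], ss_pool.pop()))
        i += 2
    for m in seq[i:]:
        triplets.append((m, *chain2_pool.pop()))  # IndexError if none is left
    return triplets

def edge_profile(triplet):
    singles = sum(EDGES[kind(m)][0] for m in triplet)
    doubles = sum(EDGES[kind(m)][1] for m in triplet)
    return singles, doubles

seq = ["TT1", "TT2", "TT3", "TT4"]
trips = group_even_sequence(seq, ["SS1"], [("ST1", "ST2"), ("ST3", "ST4")])
print(trips, [edge_profile(t) for t in trips])
```

Every triplet formed this way has two single-color and four double-color color-forced edges, which is the property the coloring rules rely on.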


Selection of Triplets 


Now the rules for selecting triplets are as follows. From 
chains of odd length, combine the first two matchings 
and the last matching to form a triplet. The remainder 
of the chain (if non-empty) is an even sequence and can 
be handled as described above. Cycles of even length are 
even sequences and can be handled the same way. As 
long as there is a chain of length 2 left, chains of even length ≥ 4 and odd cycles can be handled, too. Pairs of PP-matchings can be combined with an SS-matching, and single PP-matchings can be combined with chains of length 2. If there are two chains of even length ≥ 4,
combine the first two matchings of one chain with the 
last matching of the other and the last two matchings 
of the first chain with the first matching of the other, 
leaving two even sequences behind. So far, all triplets 
contained a double color twice and could be colored 
as outlined above. What remains is a number of cy- 
cles of odd length, at most one chain of even length, 
at most one PP-matching, and some SS-matchings. To 
deal with these, it is necessary to form some triplets 
that contain four distinct double colors. However, it 
is possible to ensure that the set of color-forced edges 
of G_v (inside and outside the triplet) colored with one
of these double colors does not contain parallel edges; 
T. Erlebach, K. Jansen, C. Kaklamanis and P. Persiano 
showed in [11] that such a triplet can be colored as re- 
quired using its single colors, two of its double colors, 
and one new color. 

In the end, the entire graph G_v has been partitioned into triplets. Each triplet has been colored using at most one new color, and such that a line that sees a new color in a triplet does not see one of the active colors of that triplet. Hence, invariants (a) and (b) hold at the end of the coloring extension step. Once the coloring extension step has been performed for all vertices of T, all paths have received one of ⌈5L/3⌉ colors. Since the number OPT of colors necessary in an optimal coloring is at least L, this implies that the algorithm uses at most ⌈5·OPT/3⌉ colors to color the paths. From the lower bound in the next section it will be clear that the algorithm (and any other greedy algorithm) is not better than 5·OPT/3 in the worst case.
Note that greedy algorithms are well-suited for
practical distributed implementation in optical net- 
works: one node of the network initiates the wavelength 
assignment by assigning wavelengths to all connections 
going through that node; then it transfers control to 
its neighbors who can extend the assignment inde- 
pendently and in parallel, transferring control to their 
neighbors in turn once they are done. 
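The control flow of a local greedy algorithm can be sketched in Python: start at a leaf, hand over to neighbors in breadth-first order, and consult only the edges incident to the current vertex. This is an illustration under assumptions, not the algorithm of [11,16]. The coloring extension step below is plain first-fit rather than the constrained bipartite edge coloring described above, so no ⌈5L/3⌉ guarantee is claimed for it. The function names and the adjacency-dict tree representation are hypothetical.

```python
from collections import defaultdict, deque

def tree_path(parent, depth, s, t):
    """Directed edges of the unique s -> t path in a rooted tree."""
    up, down = [], []
    while depth[s] > depth[t]:
        up.append((s, parent[s]))
        s = parent[s]
    while depth[t] > depth[s]:
        down.append((parent[t], t))
        t = parent[t]
    while s != t:  # climb together up to the lowest common ancestor
        up.append((s, parent[s]))
        s = parent[s]
        down.append((parent[t], t))
        t = parent[t]
    return up + down[::-1]

def local_greedy_coloring(adj, paths):
    """Color paths (s, t) with s != t in a tree given as an adjacency dict."""
    start = next(v for v in adj if len(adj[v]) == 1)  # a leaf as start vertex
    parent, depth, order = {start: None}, {start: 0}, []
    queue = deque([start])
    while queue:  # BFS: each vertex is processed after the neighbor before it
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in depth:
                parent[w], depth[w] = v, depth[v] + 1
                queue.append(w)
    edges = [tree_path(parent, depth, s, t) for s, t in paths]
    color = [None] * len(paths)
    used = defaultdict(set)  # directed edge -> colors already on it
    for v in order:
        for p, es in enumerate(edges):
            local = [e for e in es if v in e]  # edges of path p incident to v
            if color[p] is not None or not local:
                continue
            # Local rule: only colors on edges incident to v are consulted;
            # the extension step here is simply first-fit.
            forbidden = set().union(*(used[e] for e in local))
            c = 0
            while c in forbidden:
                c += 1
            color[p] = c
            for e in es:
                used[e].add(c)
    return color

line_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
line_paths = [(0, 3), (1, 3), (0, 2), (3, 0)]
print(local_greedy_coloring(line_adj, line_paths))  # [0, 2, 1, 0]
```

Looking only at edges incident to v suffices. Any already colored path that shares an edge with an uncolored path touching v must itself pass through v, so in a tree the two paths also share an edge at v.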

It should be mentioned that simpler variants of 
greedy algorithms are known that are restricted to binary trees and color a given set of paths with load L using ⌈5L/3⌉ colors. These algorithms do not make use
of the reduction to constrained bipartite edge color- 
ing [6,15]. 


Lower Bounds 


Two kinds of lower bounds have been investigated for 
path coloring in directed tree networks. First, one wants 
to determine the best worst-case performance guarantee 
achievable by any greedy algorithm. Second, it is inter- 
esting to know how many colors are required even in 
an optimal coloring for a given set of paths with load L 
in the worst case. 


Lower Bound for Greedy Algorithms 


For a given local greedy algorithm A and positive in- 
teger L, an adversary can construct an instance of path 
coloring in a directed binary tree network such that A 
uses at least ⌊5L/3⌋ colors while an optimal solution
uses only L colors [15]. The construction proceeds in- 
ductively. As A considers only the edges incident to 
a vertex v when it colors the paths touching v, the ad- 
versary can determine how these paths should continue 
and introduce new paths not touching v depending on 
the coloring A produces at vertex v. 

Assume that there are α_i L/2 paths going through each of the directed edges between vertex v and its parent, and that these paths have been colored with α_i L different colors. Initially, this assumption can be satisfied for α_0 = 1. To do so, introduce L paths in either direction on the link between the start vertex picked by algorithm A and one of its neighbors, and let appropriately chosen L/2 of these paths start resp. end at that neighbor. Denote the set of paths coming down from the parent by P_d and let them continue to (pass through) the left child v_1 of v. Denote the set of paths going up to the parent by P_u and let them pass through the right child v_2 of v. Introduce a set P_ℓ of (1 − α_i/2)L paths coming from v_2 and going left to v_1, and a set P_r of L paths coming from v_1 and going right to v_2. Each of the four directed edges between v and its children then carries load exactly L.

Algorithm A must use (1 − α_i/2)L new colors to color the paths in P_ℓ. Each of these paths shares the edge (v, v_1) with all paths of P_d and the edge (v_2, v) with all paths of P_u. No matter which colors A chooses for the paths in P_r, it will use at least (1 + α_i/4)L different colors on the connection between v and v_1 or on the connection between v and v_2. The best it can do with respect to minimizing the number of colors appearing between v and v_1 and between v and v_2 is the following:

- color (1 − α_i/2)L paths of P_r with colors used for P_ℓ,
- color α_i L/4 paths of P_r with colors used for P_d, and
- color α_i L/4 paths of P_r with colors used for P_u.

In that case, it uses (1 + α_i/4)L colors on each of the downward connections of v. Any other assignment uses more colors on one of the downward connections.

Suppose the algorithm uses at least (1 + α_i/4)L different colors for paths on, say, the connection between v and v_1. Then let (1 + α_i/4)L/2 of the downward paths and equally many of the upward paths extend to the left child of v_1, such that all of these paths use different colors, and let the remaining paths terminate or begin at v_1. Now the inductive assumption holds for the left child of v_1 with α_{i+1} = 1 + α_i/4. Hence, the number of colors on a pair of directed edges can be increased as long as α_i < 4/3, the fixed point of the map α ↦ 1 + α/4. When α_i = 4/3, 4L/3 colors are used for the paths touching v and its parent, and algorithm A must use L/3 new colors to color the paths in P_ℓ, using 5L/3 colors altogether.
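The recursion α_{i+1} = 1 + α_i/4 can be iterated exactly to see how quickly the adversary approaches the fixed point 4/3 and the 5L/3 bound. This numeric sketch ignores the integrality issues discussed next. Indexing the initial value as α_0 = 1 and the name adversary_alphas are our own conventions. At a vertex with parameter α_i, the algorithm is forced to use α_i L + (1 − α_i/2)L = (1 + α_i/2)L colors.

```python
from fractions import Fraction

def adversary_alphas(steps):
    """alpha_0 = 1 and alpha_{i+1} = 1 + alpha_i / 4, as exact fractions."""
    alphas = [Fraction(1)]
    for _ in range(steps):
        alphas.append(1 + alphas[-1] / 4)
    return alphas

for i, a in enumerate(adversary_alphas(5)):
    # colors forced per unit of load: alpha_i (parent edges) + (1 - alpha_i/2) new
    forced = 1 + a / 2
    print(i, a, float(forced), Fraction(4, 3) - a)
```

The gap 4/3 − α_i equals 1/(3·4^i), so it shrinks by a factor of 4 per level. After a few levels, the number of forced colors is within a fraction of a color of 5L/3.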
The previous calculations have assumed that all occurring terms like (1 + α_i/4)L/2 are integers. If one takes the possibility of non-integral values into account and carries out the respective calculations for all cases, one can show that, for every L, every greedy algorithm can be forced to use ⌊5L/3⌋ colors on a set of paths with maximum load L [15].

Furthermore, it is not difficult to show that the paths 
resulting from this worst-case construction for greedy 
algorithms can be colored optimally using only L colors. Hence, this also yields a lower bound of ⌊5·OPT/3⌋ colors for any greedy algorithm.


Lower Bounds for Optimal Colorings 


The instance of path coloring depicted in Fig. 1 consists of 5 paths in a binary tree with maximum load L = 2 such that even an optimal coloring requires 3 colors. Consider the instances of path coloring obtained from this instance by replacing each path by ℓ identical copies. Such an instance consists of 5ℓ paths with maximum load L = 2ℓ. An optimal coloring requires at least ⌈5ℓ/2⌉ = ⌈5L/4⌉ colors, because no more than two of the given paths can be assigned the same color. Furthermore, ⌈5ℓ/2⌉ colors are also sufficient to color these instances. For example, if ℓ is even, use:

- colors 1, ..., ℓ for paths from a to e,
- colors ℓ + 1, ..., 2ℓ for paths from f to e,
- colors 1, ..., ℓ/2 and 2ℓ + 1, ..., 5ℓ/2 for paths from f to c,
- colors ℓ/2 + 1, ..., 3ℓ/2 for paths from d to b, and
- colors 3ℓ/2 + 1, ..., 5ℓ/2 for paths from a to b.

Hence, for every even L there is a set of paths in a binary tree with load L such that an optimal coloring requires ⌈5L/4⌉ colors [4,18].
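The arithmetic of the replicated instance can be checked mechanically. Fig. 1 is not reproduced here, so the conflict pairs below are an assumption. We take the five groups of paths to form a 5-cycle of conflicts (a→e, f→e, f→c, d→b, a→b, back to a→e), which is the structure the coloring in the text is consistent with. The sketch verifies three things for that structure: the stated color ranges are a proper coloring, they use 5ℓ/2 colors, and at most two of the original paths are pairwise compatible. The last point gives the matching lower bound.

```python
from itertools import combinations

# Assumption: a 5-cycle of conflicts inferred from the coloring in the text,
# not read off Fig. 1. "ae" stands for the copies of the path from a to e, etc.
CONFLICTS = {("ae", "fe"), ("fe", "fc"), ("fc", "db"), ("db", "ab"), ("ab", "ae")}
LABELS = ["ae", "fe", "fc", "db", "ab"]

def conflict(p, q):
    return (p, q) in CONFLICTS or (q, p) in CONFLICTS

def replicated_coloring(l):
    """Color sets from the text for l copies of each path (l even)."""
    assert l % 2 == 0
    h = l // 2
    return {
        "ae": set(range(1, l + 1)),
        "fe": set(range(l + 1, 2 * l + 1)),
        "fc": set(range(1, h + 1)) | set(range(2 * l + 1, 5 * h + 1)),
        "db": set(range(h + 1, 3 * h + 1)),
        "ab": set(range(3 * h + 1, 5 * h + 1)),
    }

def colors_used(l):
    col = replicated_coloring(l)
    assert all(len(s) == l for s in col.values())  # copies get distinct colors
    for p, q in combinations(LABELS, 2):
        if conflict(p, q):
            assert not col[p] & col[q], (p, q)  # conflicting paths never share
    return len(set().union(*col.values()))

def max_compatible():
    """Largest set of original paths that could share one color."""
    return max(len(S) for k in range(1, 6) for S in combinations(LABELS, k)
               if not any(conflict(p, q) for p, q in combinations(S, 2)))

print(colors_used(4), max_compatible())  # 10 2
```

Since copies of the same path conflict, each color class contains at most two paths. The 5ℓ paths therefore need at least ⌈5ℓ/2⌉ colors, and the ranges above meet that bound.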

While the path coloring instance with L = 2 and 
OPT = 3 could be specified easily, K. Jansen used 
a more involved construction to obtain an instance with
L = 3 and OPT = 5 [15]. It makes use of three com- 
ponents as building blocks. Each component consists of 
a vertex v with its parent and two children and a spec- 
ification of the usage of edges incident to v by paths 
touching v. 

The root component ensures that at least 3 colors 
are used either on the left downward connection (extending below v_1) or on the right downward connection (extending below v_2). Each child of the root component is connected to a type A component, i.e., the child is identified with the parent vertex of a type A component and the corresponding paths are identified as well.

Type A components have the property that, if the 
paths touching v and its parent are colored with 3 colors, at least 4 colors must be used either for the paths touching v and v_1 or for those touching v and v_2. (If the
paths touching v and its parent are colored with 4 col- 
ors, the remaining paths of the type A component re- 
quire even 5 colors.) Hence, there is at least one child in 
one of the two type A components below the root com- 
ponent such that the paths touching this child and its 
parent are colored with four colors. 
The final component used is of type B. It has the property that, if the paths touching v and its parent are colored with 4 colors, at least 4 colors must be used either for the paths touching v and v_1 or for those touching v and v_2. For certain arrangements of colors on the
paths touching v and its parent, 5 colors are necessary. 
It is possible to arrange a number of type B components 
in a binary tree such that for any combination of four 
colors on paths entering the tree of type B components 
at its root, 5 colors are necessary to complete the color- 
ing. Hence, if one attaches a copy of this tree of type B 
components to each of the children of a type A compo- 
nent, it is ensured that at least one of the trees will be en- 
tered by paths with four colors and consequently 5 col- 
ors are necessary to color all paths. Since the load on 
every directed edge is at most 3, this gives a worst-case 
example for path coloring in binary trees with L = 3 
and OPT = 5. 


Randomized Algorithms 


In [1,2], V. Auletta, I. Caragiannis, C. Kaklamanis and 
G. Persiano presented a class of randomized algorithms 
for path coloring in directed tree networks. They gave 
a randomized algorithm that, with high probability, 
uses at most 7L/5 + o(L) colors for coloring any set
of paths of maximum load L on binary trees of height 
o(L"). The analysis of the algorithm uses tail inequali- 
ties for hypergeometrical probability distributions such 
as Azuma’s inequality. Moreover, they proved that no 


randomized greedy algorithm can achieve, with high 
probability, a performance ratio better than 3/2 for trees 
of height Ω(L) and better than 1.293 − o(1) for trees of
constant height. 

These results have been improved in [5] by I. Cara- 
giannis, A. Ferreira, C. Kaklamanis, S. Pérennes, and 
H. Rivano, who gave a randomized approximation al- 
gorithm for bounded-degree trees that has approxima- 
tion ratio 1.61 + o(1). The algorithm first computes 
in polynomial time an optimal solution for the frac- 
tional path coloring problem and then applies random- 
ized rounding to obtain an integral solution. 


Related Topics 


A number of further results related to the path color- 
ing problem in directed tree networks or in networks 
with different topology are known. The number of colors required for sets of paths that have a special form has been investigated, e.g., for one-to-all instances, all-to-all instances, permutations, and k-relations. A survey of many of these results can be found in [4]. The
undirected version of the path coloring problem has 
been studied by P. Raghavan and E. Upfal in [20]; 
here, the network is represented by an undirected graph 
and paths must receive different colors if they share 
an undirected edge. Approximation results for directed 
and undirected path coloring problems in ring net- 
works, mesh networks, and arbitrary networks (all of 
these are NP-hard no matter whether the paths are fixed 
or can be chosen by the algorithm [7]) have been de- 
rived. 

An on-line variant of path coloring was studied by 
Y. Bartal and S. Leonardi in [3]. Here, the algorithm is
given connection requests one by one and must deter- 
mine a path connecting the corresponding vertices and 
a color for this path without any knowledge of future 
requests. The worst-case ratio between the number of 
colors used by the on-line algorithm and that used by 
an optimal off-line algorithm with complete advance 
knowledge is the competitive ratio. In [3] on-line al- 
gorithms with competitive ratio O(log n) are presented 
for trees, trees of rings, and meshes with n vertices. 
For a black-box global optimization algorithm to be
truly global, some effort must be allocated to global 
search, that is, search done primarily to ensure that po- 
tentially good parts of the space are not overlooked. On 
the other hand, to be efficient, some effort must also 
be placed on local search near the current best solu- 
tion. Most algorithms either move progressively from 
global to local search (e.g., simulated annealing) or 
combine a fundamentally global method with a fun- 
damentally local method (e.g., multistart, tunneling). 
DIRECT introduces a new approach: in each iteration
several search points are computed using all possible 
weights on local versus global search (how this is done 
will be made clear shortly). This approach eliminates 
the need for ‘tuning parameters’ that set the balance be- 
tween local and global search, resulting in an algorithm 
that is robust and easy-to-use. 

DIRECT is especially valuable for engineering op- 
timization problems. In these problems, the objec- 
tive and constraint functions are often computed us- 
ing time-consuming computer simulations, so there is 
a need to be efficient in the use of function evaluations. 
The problems may contain both continuous and inte- 
ger variables, and the functions may be nonlinear, non- 
smooth, and multimodal. While many algorithms ad- 
dress these problem features individually, DIRECT is 
one of the few that addresses them collectively. How- 
ever, the versatility of DIRECT comes at a cost: the al- 
gorithm suffers from a curse of dimensionality that lim- 
its it to low-dimensional problems (say, no more than 
20 variables). 

The general problem solved by DIRECT can be 
written as follows: 


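$$
\begin{aligned}
\min\quad & f(x_1, \dots, x_n) \\
\text{s.t.}\quad & g_1(x_1, \dots, x_n) \le 0, \\
& \qquad \vdots \\
& g_m(x_1, \dots, x_n) \le 0, \\
& \ell_i \le x_i \le u_i, \quad i = 1, \dots, n, \\
& x_i \text{ integer}, \quad i \in J.
\end{aligned}
$$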


To prove convergence, we must assume that the ob- 
jective and constraint functions are continuous in the 
neighborhood of the optimum, but the functions can 
otherwise be nonlinear, nondifferentiable, nonconvex, 
and multimodal. While DIRECT does not explicitly 
handle equality constraints, problems with equalities 
can often be rewritten as problems with inequality con- 
straints (either by replacing the equality with an in- 
equality that becomes binding in the solution, or by us- 
ing the equalities to eliminate variables). The set J in the 
above problem is the set of variables that are restricted 
to integer values. DIRECT works best when the inte- 
ger variables describe an ordered quantity, such as the 
number of teeth on a gear. It is less effective when the 
integer variables are categorical. 
In what follows, we begin by describing how DIRECT works when there are no inequality and integer
constraints. This basic version corresponds, with minor 
differences, to the originally published algorithm [2]. 
After describing the basic version, we then introduce 
extensions to handle inequality and integer constraints 
(this article is the first publication to document these 
extensions). We conclude with a step-by-step descrip- 
tion of the algorithm. 

The bounds on the variables limit the search to an 
n-dimensional hyper-rectangle. DIRECT proceeds by 
partitioning this rectangle into smaller rectangles, each 
of which has a ‘sampled point’ at its center, that is, 
a point where the functions have been evaluated. An ex- 
ample of such a partition for n = 2 is shown in Fig. 1. 

We have drawn the rectangle as a square because 
later, whenever we measure distances or lengths, we will 
weight each dimension so that the original range (u_i − ℓ_i) has a weighted distance of one. Drawing the hyper-rectangle as a hyper-cube allows us to visualize relative
lengths as they will be used in the algorithm. 

Figure 2 shows the first three iterations of DIRECT 
on a hypothetical two-variable problem. At the start of 
each iteration, the space is partitioned into rectangles. 
DIRECT then selects one or more of these rectangles for 
further search using a technique described later. Finally, 
each selected rectangle is trisected along one of its long 
sides, after which the center points of the outer thirds 
are sampled. In this way, we sample two new points in 
the rectangle and maintain the property that every sam- 
pled point is at the center of a rectangle (this property 
would not be preserved if the rectangle were bisected). 
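The trisection step can be sketched as follows. The sketch works in the weighted coordinates in which the original box is the unit hyper-cube. The function names are hypothetical. Choosing the first longest side on ties is our assumption, since the text only says that one of the long sides is used.

```python
def longest_side(sides):
    """Index of a longest side; the first one wins ties (our assumption)."""
    return max(range(len(sides)), key=lambda i: sides[i])

def trisect(center, sides, dim):
    """Trisect the rectangle (center, sides) along dimension dim.

    Coordinates are weighted so the original box is the unit hyper-cube.
    Returns the middle third (which keeps the old sampled center) followed
    by the two outer thirds, whose centers are the new points to sample.
    """
    third = sides[dim] / 3
    new_sides = tuple(third if i == dim else s for i, s in enumerate(sides))
    left, right = list(center), list(center)
    left[dim] -= third
    right[dim] += third
    return [(tuple(center), new_sides),
            (tuple(left), new_sides),
            (tuple(right), new_sides)]

# Iteration 1: the whole (normalized) square is trisected along a long side.
for c, s in trisect((0.5, 0.5), (1.0, 1.0), longest_side((1.0, 1.0))):
    print(c, s)
```

Only the two outer centers require new function evaluations, which is why trisection, unlike bisection, keeps every sampled point at the center of a rectangle.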

At the beginning of iteration 1, there is only one 
rectangle (the entire space). The process of selecting 
rectangles is therefore trivial, and this rectangle is trisected as shown. At the start of iteration 2, the selection process is no longer trivial because there are three
rectangles. In the example, we select just one rectangle, 
which is then trisected and sampled. At the start of it- 
eration 3, there are 5 rectangles; in this example, two of 
them are selected and trisected. 

The key step in the algorithm is the selection of rect- 
angles, since this determines how search effort is allo- 
cated across the space. The trisection process and other 
details are less important, and we will defer discussion 
of them until later. 

To motivate how DIRECT selects rectangles, let
us begin by considering the extremes of pure global 
search and pure local search. A pure global search strat- 
egy would select one of the biggest rectangles in each 
iteration. If this were done, all the rectangles would become small at about the same rate. In fact, if we always trisected one of the biggest rectangles, then after 3^{kn} function evaluations every rectangle would be a cube with side length 3^{-k}, and the sampled points
would form a uniform grid. By looking everywhere, this 
pure global strategy avoids overlooking good parts of 
the space. 

A pure local strategy, on the other hand, would sam- 
ple the rectangle whose center point has the best objec- 
tive function value. This strategy is likely to find good 
solutions quickly, but it could overlook the rectangle
that contains the global optimum (this would happen if 
the rectangle containing the global optimum had a poor 
objective function value at the center). 

To select just one ‘best’ rectangle, we would have 
to introduce a tuning parameter that controlled the lo- 
cal/global balance. Unfortunately, the algorithm would 
then be extremely sensitive to this parameter, since the 
proper setting would depend on the (unknown) diffi- 
culty of the problem at hand. 

DIRECT avoids tuning parameters by rejecting the 
idea of selecting just one rectangle. Instead, several rect- 
angles are selected using all possible relative weightings 
of local versus global search. The idea of using all possi- 
ble weightings may seem impractical, but with the help 
of a simple diagram this idea can actually be made quite 
intuitive. For this diagram, we will need a way to measure the size of a rectangle. We will measure size using the distance between the center point and the vertices, as shown in Fig. 3.

With this measure of rectangle size, we can now 
turn our attention to Fig. 4 which shows how rectan- 
gles are selected. In the figure, each rectangle in the 
partition is represented by a dot. The horizontal coor- 
dinate of a dot is the size of the rectangle, measured 
by the center-vertex distance. The vertical coordinate 
is the function value at the midpoint of the rectangle. 
The dot labeled A represents the rectangle with the low- 
est function value, and so this would be the rectangle 
selected by a pure local strategy. Similarly, the dot la- 
beled B represents one of the biggest rectangles, and so 
it would be selected by a pure global strategy. DIRECT 
selects not only these two extremes but also all the rect- 
angles on the lower-right convex hull of the cloud of 
dots (the dots connected by the line). These rectan- 
gles represent ‘efficient trade-offs’ between local versus 
global search, in the sense that each of them is best for 
some relative weighting of midpoint function value and 
center-vertex distance. (We will explain the other lines 
in Fig. 4 shortly.)
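Selecting the dots on the lower-right convex hull of Fig. 4 is a small computational-geometry exercise. The sketch below illustrates it under our own assumptions and is not the published implementation. It takes one (d_r, f_r) pair per rectangle, keeps only the lowest f_r for each size, and builds the lower convex hull with Andrew's monotone chain. It then returns the part of the hull from the lowest function value (dot A) to the largest rectangle (dot B). Dots lying exactly on a hull edge are dropped, which is one of several reasonable conventions.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_right_hull(points):
    """Indices of the dots (d_r, f_r) on the lower-right convex hull."""
    best = {}  # for each size d keep only the rectangle with the lowest f
    for r, (d, f) in enumerate(points):
        if d not in best or f < points[best[d]][1]:
            best[d] = r
    hull = []
    for r in sorted(best.values(), key=lambda r: points[r]):  # by d, then f
        # Andrew's monotone chain, lower hull: pop non-left turns (and collinear dots).
        while len(hull) >= 2 and cross(points[hull[-2]], points[hull[-1]], points[r]) <= 0:
            hull.pop()
        hull.append(r)
    f_low = min(points[r][1] for r in hull)
    start = max(k for k, r in enumerate(hull) if points[r][1] == f_low)
    return hull[start:]  # from dot A (lowest f) to dot B (largest d)

# (center-vertex distance, midpoint value) for seven hypothetical rectangles
demo = [(1, 1), (1, 4), (2, 1.5), (3, 5), (4, 3), (2, 6), (0.5, 3)]
print(lower_right_hull(demo))  # [0, 2, 4]
```

The smallest rectangle (index 6) lies on the lower hull but to the left of dot A, so it is not part of the lower-right hull and is not selected.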
One might think that the idea illustrated in Fig. 4
would extend naturally to the constrained case; that is, 
we would simply select any rectangle that was best for 
some weighting of objective value, center-vertex dis- 
tance, and constraint values. Unfortunately, this does 
not work because it leads to excessive sampling in the 
infeasible region. However, as we explain next, there 
is an alternative way of thinking about the lower-right 
convex hull that does extend to the constrained case. 

For the sake of the exposition, let us suppose for the 
moment that we know the optimal function value f*. 
For the function to reach f* within rectangle r, it would have to undergo a rate of change of at least (f_r − f*)/d_r, where f_r is the function value at the midpoint of rectangle r and d_r is the center-vertex distance. This follows because the function value at the center is f_r, and the maximum distance over which the function can fall to f* is the center-vertex distance d_r. Intuitively, it seems ‘more reasonable’ to assume that the function will undergo a gradual change than to assume it will make a steep descent to f*. Therefore, if only we knew the value f*, a reasonable criterion for selecting a rectangle would be to choose the one that minimizes (f_r − f*)/d_r.

Figure 4 shows a graphical way to find the rectangle that minimizes (f_r − f*)/d_r. Along the vertical axis we show the current best function value, f_min, as well as the supposed global minimum f*. Now suppose we anchor a line at the point (0, f*) and slowly swing it upwards. When we first encounter a dot, the slope of the line will be precisely the ratio (f_r − f*)/d_r, where r is the index of the rectangle corresponding to the encountered dot. Moreover, since this is the first dot touched by the line, rectangle r must be the rectangle that minimizes (f_r − f*)/d_r.

Of course, in general we will not know the value of f*. But we do know that, whatever f* is, it satisfies f* ≤ f_min. So imagine that we repeat the line-sweep exercise in Fig. 4 for all values of f* ranging from f_min to −∞. How many rectangles could be selected? Well,
with a little thought, it should be clear that the set of 
dots that can be selected via these line sweeps is pre- 
cisely the lower-right convex hull of the dots. 

This alternative approach to deriving the lower- 
right convex hull suggests a small but important mod- 
ification to the selection rule. In particular, to prevent 
DIRECT from wasting function evaluations in pursuit 
of very small improvements, we will insist that the value of f* satisfy f* ≤ f_min − ε. That is, we are only interested in selecting rectangles where it is reasonable that we can find a ‘significantly better’ solution. A natural value of ε would be the desired accuracy of the solution. In our implementation, we have set ε = max(10^{-4}|f_min|, 10^{-8}).

As shown in Fig. 5, the implication of this modifica- 
tion is that some of the smaller rectangles on the lower- 
right convex hull may be skipped. In fact, the smallest 
rectangle that will be selected is the one chosen when 
f* = f_min − ε.

The version of DIRECT described so far corre- 
sponds closely to the originally published version [2]. 
The only difference is that, in the original version, a selected rectangle was trisected not just on a single long side, but rather on all long sides. This approach eliminated the need to arbitrarily select a single long side when there was more than one and, as a result, it added
an element of robustness to the algorithm. Experience 
has since shown, however, that the robustness benefit is 
small and that trisecting on a single long side (as here) 
accelerates convergence in higher dimensions. 

Let us now consider how the rectangle selection 
procedure can be extended to handle inequality con- 
straints. The key to handling constraints in DIRECT is 
to work with an auxiliary function that combines in- 
formation on the objective and constraint functions in 
a special manner. To express this auxiliary function, we 
will need some additional notation. Let g_{rj} denote the value of constraint j at the midpoint of rectangle r. In addition, let c_1, ..., c_m be positive weighting coefficients
for the inequality constraints (we will discuss how these 
coefficients are computed later). Finally, for the sake of 
the exposition, let us again suppose that we know the 
optimal function value f*. The auxiliary function, eval- 
uated at the center of rectangle r, is then as follows: 


$$\max(f_r - f^*, 0) + \sum_{j=1}^{m} c_j \max(g_{rj}, 0)$$


The first term of the auxiliary function exacts a penalty for any deviation of the function value $f_r$ above the global minimum value $f^*$. Note that, in a constrained problem, it is possible for $f_r$ to be less than $f^*$ by violating the constraints; due to the maximum operator, the auxiliary function gives no credit for values of $f_r$ below $f^*$. The second term in the auxiliary function is a sum of
weighted constraint violations. Clearly, the lowest pos- 
sible value of the auxiliary function is zero and occurs 
only at the global minimum. At any other point, the 
auxiliary function is positive either due to suboptimal- 
ity or infeasibility. 

This auxiliary function is not a penalty function in the standard sense. A standard penalty function would be a weighted sum of the objective function and constraint violations; it would not include the value $f^*$ since this value is generally unknown. Moreover, in the standard approach, it is critical that the penalty coefficients be sufficiently large to prevent the penalty function from being minimized in the infeasible region. This is not true for our auxiliary function: as long as $f^*$ is the optimal function value, the auxiliary function is minimized at the global optimum for any positive constraint coefficients.

For the global minimum to occur in rectangle $r$, the auxiliary function must fall to zero starting from its (positive) value at the center point. Moreover, the maximum distance over which this change can occur is the center-vertex distance $d_r$. Thus, to reach the global minimum in rectangle $r$, the auxiliary function must undergo a minimum rate of change, denoted $h_r(f^*)$, given by

$$h_r(f^*) = \frac{\max(f_r - f^*, 0) + \sum_{j=1}^{m} c_j \max(g_{rj}, 0)}{d_r}.$$


Since it is more reasonable to expect gradual changes than abrupt ones, a natural way to select a rectangle would be to select the rectangle that minimizes the rate of change $h_r(f^*)$. Of course, this is impractical because we generally will not know the value $f^*$. Nevertheless, it is possible to select the set of rectangles that minimize $h_r(f^*)$ for some $f^* \le f_{\min} - \epsilon$. This is how we select rectangles with constraints, assuming a feasible point has been found so that $f_{\min}$ is well-defined (we will show how this is implemented shortly). If no feasible point has been found, we simply select the rectangle that minimizes

$$\frac{\sum_{j=1}^{m} c_j \max(g_{rj}, 0)}{d_r}.$$


That is, we select the rectangle where the weighted con- 
straint violations can be brought to zero with the least 
rate of change. 
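When no feasible point is available, the rule reduces to a single minimization over the rectangles. The following is a minimal sketch. It assumes the constraint values $g_{rj}$ at each center, the distances $d_r$ and the weights $c_j$ are already known. The function name is hypothetical, and ties are resolved in favor of the lowest index, which is an assumption the text does not specify.

```python
def select_when_infeasible(g, d, c):
    """Index of the rectangle minimizing sum_j c[j] * max(g[r][j], 0) / d[r].

    g[r][j] is constraint j at the center of rectangle r (feasible when <= 0),
    d[r] the center-vertex distance and c[j] the positive weights.
    """
    def rate(r):
        violation = sum(cj * max(gj, 0.0) for cj, gj in zip(c, g[r]))
        return violation / d[r]

    # min() returns the first (lowest-index) rectangle among equal rates.
    return min(range(len(d)), key=rate)
```

A large rectangle with a moderate violation can win over a small one with a smaller violation. With violations 0.5, 0.2 and 0.3 and sizes 1.0, 0.1 and 1.0, the third rectangle (index 2) is chosen.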

To implement this selection rule, it is again helpful to draw a diagram. This new selection diagram is based on plotting the rate-of-change function $h_r(f^*)$ as a function of $f^*$. Figure 6 illustrates this function. For values of $f^* \ge f_r$, the first term in the numerator of $h_r(f^*)$ is zero, and so $h_r(f^*)$ is constant. As $f^*$ falls below $f_r$, however, $h_r(f^*)$ increases, because we now exact a penalty for $f_r$ being above the supposed global minimum $f^*$. The slope of the $h_r(f^*)$ function to the left of $f_r$ is $-1/d_r$.

Figure 7 superimposes, in one diagram, the rate-of-change functions for a hypothetical set of seven rectangles. For a particular value of $f^*$, we can visually find the rectangle that minimizes $h_r(f^*)$ by starting at the point $(f^*, 0)$ along the horizontal axis and moving vertically until we first encounter a curve. What we want, however, is the set of all rectangles that can be selected in this way using any $f^* \le f_{\min} - \epsilon$. This set can be found as follows (see Fig. 7). We start with $f^* = f_{\min} - \epsilon$ and move upwards until we first encounter a curve for some rectangle. We note this rectangle and follow its curve to the left until it intersects the curve for another rectangle (these intersections are circled in Fig. 7). When this happens, we note this other rectangle and follow its curve to the left. We continue in this way until we find a curve that is never intersected by another one. This procedure will identify all the $h_r(f^*)$ functions that participate in the lower envelope of the curves to the left of $f_{\min} - \epsilon$. The set of rectangles found in this way is the set selected by DIRECT.
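The lower envelope can also be found by brute force, which is useful as a cross-check on an implementation of the tracing procedure. The sketch below is not the tracing method itself. It relies on the fact that each $h_r(f^*)$ is piecewise linear with a single kink at $f^* = f_r$. The set of rectangles attaining the minimum can therefore change only at a kink or where two linear pieces cross. Evaluating all curves at those candidate points, plus $f_{\min} - \epsilon$, and collecting every rectangle that attains the minimum (ties included) gives the selected set.

Several details are assumptions: the function name, the argument `v[r]` for the precomputed weighted violation $\sum_j c_j \max(g_{rj}, 0)$, and the tie tolerance. The result is returned in index order rather than scan order. Because the cost grows roughly with the cube of the number of rectangles, this sketch is meant for checking rather than production use.

```python
def select_constrained(f, v, d, f_min, eps):
    """Indices of rectangles attaining min_r h_r(f*) for some f* <= f_min - eps.

    h_r(f*) = (max(f[r] - f*, 0) + v[r]) / d[r], where v[r] is the weighted
    constraint violation at the center of rectangle r (0 if feasible).
    """
    n = len(f)
    upper = f_min - eps

    def h(r, x):
        return (max(f[r] - x, 0.0) + v[r]) / d[r]

    # Each curve is built from two lines a + b*x: a flat piece and a sloped piece.
    lines = []
    for r in range(n):
        lines.append((v[r] / d[r], 0.0))
        lines.append(((f[r] + v[r]) / d[r], -1.0 / d[r]))

    # Candidate points: the upper limit, the kinks, and pairwise line crossings.
    candidates = {upper}
    candidates.update(x for x in f if x <= upper)
    for p in range(len(lines)):
        a1, b1 = lines[p]
        for q in range(p + 1, len(lines)):
            a2, b2 = lines[q]
            if b1 != b2:
                x = (a2 - a1) / (b1 - b2)
                if x <= upper:
                    candidates.add(x)

    # Between consecutive candidates the minimizing set cannot change.
    selected = set()
    for x in candidates:
        values = [h(r, x) for r in range(n)]
        best = min(values)
        tol = 1e-12 * max(1.0, abs(best))
        selected.update(r for r in range(n) if values[r] <= best + tol)
    return sorted(selected)
```

In the second test below, an infeasible rectangle with a small violation (index 0) shares the envelope with the best feasible rectangle (index 1). The large feasible rectangle (index 2) never reaches the envelope.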

Along the horizontal axis in Fig. 7, we identify ranges of $f^*$ values for which different rectangles have the lowest value of $h_r(f^*)$. As we scan from $f_{\min} - \epsilon$ to the left, the rectangles that participate in the lower envelope are 1, 2, 5, 2, and 7. This example illustrates that it is possible to encounter a curve more than once (here rectangle 2), and care must be taken not to double count such rectangles. It is also possible for some curves to coincide along the lower envelope, and so be ‘tied’ for the least rate of change (this does not happen in Fig. 7). In such cases, we select all the tied rectangles.

Tracing the lower envelope in Fig. 7 is not computationally intensive. To see this, note that each selected
rectangle corresponds to a curve on the lower enve- 
lope, and for each such curve the work we must do is 
to find the intersection with the next curve along the 
lower envelope. Finding this next intersection requires 
computing the intersection of the current curve with all 
the other curves. It follows that the work required for 
each selected rectangle (and hence for every two sam- 
pled points) is only on the order of the total number of 
rectangles in the partition. 

The tracing of the lower envelope can also be accel- 
erated by some pre-processing. In particular, it is possi- 
ble to quickly identify rectangles whose curves lie com- 
pletely above other curves. For example, in Fig. 7, curve 
3 lies above curve 1, and curve 4 lies above curve 2. 
These curves cannot possibly participate in the lower 
envelope, and so they can be deleted from considera- 
tion before the lower envelope is traced. 
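A little algebra makes this pre-processing test exact. On $f^* \le f_{\min} - \epsilon$, the difference $h_i(f^*) - h_r(f^*)$ is piecewise linear, with kinks only at $f_i$ and $f_r$. Far to the left, its slope in $f^*$ is $1/d_r - 1/d_i$. If $d_i \le d_r$, moving further left never shrinks the difference. Curve $i$ therefore lies strictly above curve $r$ on the whole range when $d_i \le d_r$ and the difference is positive at $f_{\min} - \epsilon$ and at whichever kinks fall inside the range. The following is a hedged sketch of that check; the helper name and argument layout are illustrative.

```python
def dominated(f, v, d, r, i, upper):
    """True if curve i lies strictly above curve r for every f* <= upper,
    so rectangle i cannot appear on the lower envelope.
    h_k(f*) = (max(f[k] - f*, 0) + v[k]) / d[k]."""
    def h(k, x):
        return (max(f[k] - x, 0.0) + v[k]) / d[k]

    # Left of both kinks, d/df* (h_i - h_r) = 1/d_r - 1/d_i. If d_i > d_r this
    # slope is positive, so the difference turns negative as f* -> -inf.
    if d[i] > d[r]:
        return False
    # Between kinks the difference is linear: checking the kinks in range and
    # the upper limit is enough.
    checks = [upper] + [x for x in (f[i], f[r]) if x <= upper]
    return all(h(i, x) > h(r, x) for x in checks)
```

Curves found to be dominated in this way can be dropped before the envelope is traced. A larger rectangle is never dropped, since its curve is the flattest far to the left.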

It remains to explain how the constraint coefficients $c_1, \ldots, c_m$ are computed, as well as a few other details
about trisection and the handling of integer variables. 
We will cover these details in turn, and then bring ev- 
erything together into a step-by-step description of the 
algorithm. 

To understand how we compute the constraint coefficients $c_j$, suppose for the moment that we knew the average rate of change of the objective function, denoted $a_0$, and the average rate of change of constraint $j$, denoted $a_j$. Furthermore, suppose that at the center of a rectangle we have $g_j > 0$. At the average rate of change of constraint $j$, we would have to move a distance equal to $g_j/a_j$ to get rid of the constraint violation. If during this motion the objective function got worse at its average rate of change, it would get worse by $a_0$ times the distance, or $a_0(g_j/a_j) = (a_0/a_j)\,g_j$. Thus we see that the ratio $a_0/a_j$ provides a way of converting units of constraint violation into potential increases in the objective function. For this reason, we will set $c_j = a_0/a_j$.
The average rates of change are estimated in a very straightforward manner. We maintain a variable $s_0$ for the sum of observed rates of change of the objective function. Similarly we maintain variables $s_1, \ldots, s_m$ for the sum of observed rates of change for each of the $m$ constraints. All of these variables are initialized to zero at the start of the algorithm and updated each time a rectangle is trisected. Let $x^{\mathrm{mid}}$ denote the midpoint of the parent rectangle and let $x^{\mathrm{left}}$ and $x^{\mathrm{right}}$ denote the midpoints of the left and right child rectangles after trisection. The variables are updated as follows:


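$$s_0 \leftarrow s_0 + \sum_{\text{child}=\text{left}}^{\text{right}} \frac{|f(x^{\text{child}}) - f(x^{\mathrm{mid}})|}{\|x^{\text{child}} - x^{\mathrm{mid}}\|},$$

$$s_j \leftarrow s_j + \sum_{\text{child}=\text{left}}^{\text{right}} \frac{|g_j(x^{\text{child}}) - g_j(x^{\mathrm{mid}})|}{\|x^{\text{child}} - x^{\mathrm{mid}}\|}, \quad j = 1, \ldots, m.$$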


Now the average rates of change are $a_0 = s_0/N$ and $a_j = s_j/N$, where $N$ is the number of rates of change accumulated into the sums. It follows that

$$\frac{a_0}{a_j} = \frac{s_0/N}{s_j/N} = \frac{s_0}{s_j}.$$

We may therefore compute $c_j$ using

$$c_j = \frac{s_0}{\max(s_j, 10^{-30})},$$


where we use the maximum operator in the denomina- 
tor to prevent division by zero. 
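The bookkeeping for $s_0, \ldots, s_m$ takes only a few lines. The class below is a sketch under two assumptions. First, coordinates are already scaled so that each variable's initial range has length one, so the Euclidean distance matches the weighted metric. Second, each sample is stored as a list $[f, g_1, \ldots, g_m]$. The divisor $N$ is never formed, since it cancels in $c_j$. The class name `RateSums` is hypothetical.

```python
import math


class RateSums:
    """Running sums s_0 (objective) and s_1..s_m (constraints) of observed rates of change."""

    def __init__(self, m):
        self.s = [0.0] * (m + 1)

    def update(self, x_mid, vals_mid, children):
        """vals_* are lists [f, g_1, ..., g_m]; children holds (x_child, vals_child)
        pairs for the left and right children of one trisection."""
        for x_child, vals_child in children:
            dist = math.dist(x_child, x_mid)   # coordinates pre-scaled to unit ranges
            for k in range(len(self.s)):
                self.s[k] += abs(vals_child[k] - vals_mid[k]) / dist

    def weights(self):
        """c_j = s_0 / max(s_j, 1e-30); N cancels, so averages are never formed."""
        return [self.s[0] / max(sj, 1e-30) for sj in self.s[1:]]
```

Consider one constraint, a parent at (0.5, 0.5), and children one third away on either side. If $f$ changes by 1 toward each child and $g_1$ by 0.5, then $s_0 = 6$, $s_1 = 3$ and $c_1 = 2$.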

So far we have said that we will always trisect a rectangle along one of its long sides. However, as shown in Fig. 2, several sides may be tied for longest, and so we need some way to break these ties. Our tie-breaking mechanism is as follows. We maintain counters $t_i$ ($i = 1, \ldots, n$) for how many times we have split along dimension $i$ over the course of the entire search. These counters are initialized to zero at the beginning of the algorithm, and counter $t_i$ is incremented each time a rectangle is trisected along dimension $i$. If we select a rectangle that has several sides tied for being longest, we break the tie in favor of the side with the lowest $t_i$ value. If several long sides are also tied for the lowest $t_i$ value, we break the tie arbitrarily in favor of the lowest-indexed dimension. This tie-breaking strategy has the effect of equalizing the number of times we split on the different dimensions.
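The tie-breaking rule, combined with the definition of a long side given in Step 4 of the step-by-step description, fits in a small function. This is a sketch with hypothetical argument names. `splits` holds the per-rectangle split counts, `t` holds the global counters $t_i$, and `positive_range` flags integer sides that still contain more than one value.

```python
def choose_split_dimension(splits, t, positive_range=None):
    """Pick the dimension along which to trisect a rectangle.

    Long sides are the eligible sides split the fewest times; ties are broken
    by the smallest global counter t[i], then by the lowest index.
    """
    n = len(splits)
    if positive_range is None:
        positive_range = [True] * n
    eligible = [i for i in range(n) if positive_range[i]]
    if not eligible:
        return None  # the rectangle is a single point (all-integer case)
    fewest = min(splits[i] for i in eligible)
    long_sides = [i for i in eligible if splits[i] == fewest]
    # Sorting key (t, index): lowest t first, then the lowest dimension index.
    return min(long_sides, key=lambda i: (t[i], i))
```

Using a tuple key keeps both tie-breaking levels in a single comparison, so no separate pass is needed for the index rule.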


Let us now turn to the calculation of the center-vertex distance. Recall that we measure distance using a weighted metric that assigns a length of one to the initial range of each variable $(u_i - \ell_i)$. Each time a rectangle is split, the length of that side is then reduced to one third of its previous value. Now consider a rectangle that has been trisected $T$ times. Let $j = \operatorname{mod}(T, n)$, so that we may write $T = kn + j$ where $k = (T - j)/n$. After the first $kn$ trisections, all of the $n$ sides will have been trisected $k$ times and will therefore have length $3^{-k}$. The remaining $j$ trisections will make $j$ of the sides have length $3^{-(k+1)}$, leaving $n - j$ sides with length $3^{-k}$. Since the distance from the center to a vertex is half the length of the rectangle's diagonal, it is given by

$$d = \frac{1}{2}\left[(n - j)\,3^{-2k} + j\,3^{-2(k+1)}\right]^{0.5} = \frac{3^{-k}}{2}\left(\frac{j}{9} + n - j\right)^{0.5}.$$
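The closed form can be checked numerically by building the side lengths explicitly and taking half the diagonal. The following is a minimal Python sketch with hypothetical function names. It assumes every initial side has length one in the weighted metric.

```python
import math


def center_vertex_distance(T, n):
    """Closed form: d = (3^-k / 2) * (j/9 + n - j)^0.5 with T = k*n + j."""
    k, j = divmod(T, n)
    return 0.5 * 3.0 ** (-k) * math.sqrt(j / 9.0 + n - j)


def half_diagonal(T, n):
    """Same quantity from explicit side lengths: j sides of 3^-(k+1), n-j of 3^-k."""
    k, j = divmod(T, n)
    sides = [3.0 ** (-(k + 1))] * j + [3.0 ** (-k)] * (n - j)
    return 0.5 * math.sqrt(sum(s * s for s in sides))
```

The distance depends only on $T$ and $n$, so an implementation can cache it per trisection count instead of storing it per rectangle.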

The handling of integer variables is amazingly simple, involving only minor changes to the trisection routine and to the way the midpoint of a rectangle is defined. For example, consider an integer variable with range [1, 8]. We could not define the midpoint to be 4.5 because this is not an integer. Instead, we will use the following procedure. Suppose the range of a rectangle along an integer dimension is $[a, b]$, with both $a$ and $b$ being integers. We will define the ‘midpoint’ as $\lfloor (a + b)/2 \rfloor$, that is, the floor of the arithmetic average (the floor of $z$, denoted $\lfloor z \rfloor$, is the greatest integer less than or equal to $z$).

To trisect along the integer dimension, we first compute $\Delta = \lfloor (b - a + 1)/3 \rfloor$. If $\Delta \ge 1$, then after trisection the left child will have the range $[a, a + \Delta - 1]$, the center child will have the range $[a + \Delta, b - \Delta]$, and the right child will have the range $[b - \Delta + 1, b]$. If $\Delta = 0$, then the integer side must contain exactly two integers (i.e., $b = a + 1$). In this case, the center child will have the range $[a, a]$, the right child will have the range $[b, b]$, and there will be no left child. This procedure maintains the property that the midpoint of the parent rectangle always becomes the midpoint of the center child. As an example, Fig. 8 shows how a rectangle would be trisected when there are two integer dimensions. In the figure, the circles represent possible integer combinations, and the filled circles represent the midpoints.
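The integer trisection rule translates directly into code. In the sketch below, which uses hypothetical function names, ranges are (low, high) tuples of integers. Python's floor division plays the role of $\lfloor \cdot \rfloor$, including for negative bounds.

```python
def midpoint(a, b):
    """Integer 'midpoint' of [a, b]: the floor of the arithmetic average."""
    return (a + b) // 2


def trisect_integer(a, b):
    """Split the integer range [a, b] (b > a) into (left, center, right)
    ranges; left is None when the range holds exactly two integers."""
    delta = (b - a + 1) // 3
    if delta >= 1:
        left = (a, a + delta - 1)
        center = (a + delta, b - delta)
        right = (b - delta + 1, b)
    else:  # b == a + 1
        left, center, right = None, (a, a), (b, b)
    return left, center, right
```

For the range [1, 8] this gives $\Delta = 2$ and the children [1, 2], [3, 6] and [7, 8]. Both the parent and the center child have midpoint 4.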

Integer variables introduce three other complica- 
tions. The first, which may be seen in Fig. 8, is that the 
sampled point may not be in the true geometric center of the rectangle. As a result, the center-vertex distance will not be unique but will vary from vertex to
vertex. We ignore this detail and simply use the for- 
mula given above for the continuous case, which only 
depends upon the number of times a rectangle has been 
trisected. 

The second complication concerns how we define a ‘long’ side. In the continuous case, the length of a side is directly related to the number of times it has been trisected along that dimension. Specifically, if a rectangle has been split $k$ times along some side, then the side length will be $3^{-k}$ (recall that we measure distance relative to the original range of each variable). In the continuous case, therefore, the set of long sides is the same as the set of sides that have been split upon the least. When there are integers, however, the side lengths will no longer be powers of 1/3. To keep things simple, we ignore this and continue to define a ‘long’ side as one that has been split upon the least. However, if an integer side has been split so many times that its side length is zero (i.e., the range contains a single integer), then this side will not be considered long.

The third and final complication is that, if all the 
variables are integer, then it is possible for a rectangle 
to be reduced to a single point. If this happens, the rect- 
angle would be fathomed; hence, it should be ignored 
in the rectangle selection process in all subsequent iter- 
ations. 


DIRECT stops when it reaches a user-defined limit 
on function evaluations. It would be preferable, of 
course, to stop when we have achieved some desired 
accuracy in the solution. However, for black-box prob- 
lems where we only assume continuity, better stopping 
rules are hard to develop. 

As for convergence, it is easy to show that, as $f^*$ moves to $-\infty$, DIRECT will select one of the largest rectangles. Because we always select one of the largest rectangles, and because we always subdivide on a long side, every rectangle will eventually become very small and the sampled points will be dense in the space. Since we also assume the functions are continuous in the neighborhood of the optimum, this ensures that we will get within any positive tolerance of the optimum after a sufficiently large number of iterations.

Although we have now described all the elements 
of DIRECT, our discussion has covered several pages, 
and so it will be helpful to bring everything together in 
a step-by-step description of the algorithm. 

1) Initialization.
Sample the center point of the entire space. If the center is feasible, set $x_{\min}$ equal to the center point and $f_{\min}$ equal to the objective function value at this point. Set $s_j = 0$ for $j = 0, \ldots, m$; $t_i = 0$ for $i = 1, \ldots, n$; and neval = 1 (function evaluation counter). Set maxeval equal to the limit on the number of function evaluations (stopping criterion).
2) Select rectangles.
Compute the $c_j$ values using the current values of $s_0$ and $s_j$, $j = 1, \ldots, m$. If a feasible point has not been found, select the rectangle that minimizes the rate of change required to bring the weighted constraint violations to zero. On the other hand, if a feasible point has been found, identify the set of rectangles that participate in the lower envelope of the $h_r(f^*)$ functions for some $f^* \le f_{\min} - \epsilon$. A good value for $\epsilon$ is $\epsilon = \max(10^{-4}|f_{\min}|, 10^{-8})$. Let $S$ be the set of selected rectangles.

3) Choose any rectangle $r \in S$.

4) Trisect and sample rectangle $r$.
Choose a splitting dimension by identifying the set of long sides of rectangle $r$ and then choosing the long side with the smallest $t_i$ value. If more than one side is tied for the lowest $t_i$ value, choose the one with the lowest dimension index. Let $i$ be the resulting splitting dimension. Note that a ‘long side’




is defined as a side that has been split upon the least and, if integer, has a positive range. Trisect rectangle $r$ along dimension $i$ and increment $t_i$ by one. Sample the midpoint of the left third, increment neval by one, and update $x_{\min}$ and $f_{\min}$. If neval = maxeval, go to Step 7. Otherwise, sample the midpoint of the right third, increment neval by one, and update $x_{\min}$ and $f_{\min}$ (note that one of the outer children might not exist when trisecting on an integer variable). Update the $s_j$, $j = 0, \ldots, m$. If all $n$ variables are integer, check whether a child rectangle has been reduced to a single point and, if so, delete it from further consideration. Go to Step 5.

5) Update S. 
Set $S = S \setminus \{r\}$. If $S$ is not empty, go to Step 3. Otherwise go to Step 6.

6) Iterate. 
Report the results of this iteration, and then go to 
Step 2. 

7) Terminate.
The search is complete. Report $x_{\min}$ and $f_{\min}$ and stop.
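To show how the steps fit together, here is a compact Python sketch of the unconstrained, continuous-variable special case. It omits the constraint weights, the infeasible-case rule and integer handling, so Step 2 reduces to the rate of change $(f_r - f^*)/d_r$. The selection test is a pairwise check rather than the envelope scan of Fig. 7. Selected rectangles are processed in index order, which can slightly change the splitting dimensions.

The center-vertex distance is computed from the total number of trisections. This is valid here because a side is only split when it has been split the fewest times. The names `direct_minimize` and `potentially_optimal` and the dictionary layout of a rectangle are assumptions made for this example. This is not the author's implementation.

```python
import math


def potentially_optimal(fs, ds, f_star_max):
    """Indices r for which some f* <= f_star_max minimizes (fs[r] - f*) / ds[r]."""
    chosen = []
    for r in range(len(fs)):
        lo, hi = -math.inf, f_star_max
        for i in range(len(fs)):
            if i == r:
                continue
            if ds[i] == ds[r]:
                if fs[r] > fs[i]:
                    lo = math.inf  # a same-size rectangle is strictly better
                    break
                continue
            bound = (ds[i] * fs[r] - ds[r] * fs[i]) / (ds[i] - ds[r])
            if ds[i] > ds[r]:
                lo = max(lo, bound)
            else:
                hi = min(hi, bound)
        if lo <= hi:
            chosen.append(r)
    return chosen


def direct_minimize(func, lower, upper, max_eval=200):
    """Unconstrained DIRECT sketch over the box [lower, upper]; returns (x_min, f_min)."""
    n = len(lower)

    def evaluate(u):  # u lives in the unit cube; map it back to the box
        x = [lo + ui * (hi - lo) for lo, hi, ui in zip(lower, upper, u)]
        return func(x), x

    def distance(rect):  # center-vertex distance from the total split count
        k, j = divmod(sum(rect["lev"]), n)
        return 0.5 * 3.0 ** (-k) * math.sqrt(j / 9.0 + n - j)

    # Step 1: sample the center of the whole space.
    f_min, x_min = evaluate([0.5] * n)
    rects = [{"c": [0.5] * n, "lev": [0] * n, "f": f_min}]
    t = [0] * n
    neval = 1
    while neval < max_eval:  # Step 6: iterate
        # Step 2: select rectangles with eps = max(1e-4 |f_min|, 1e-8).
        eps = max(1e-4 * abs(f_min), 1e-8)
        fs = [rect["f"] for rect in rects]
        ds = [distance(rect) for rect in rects]
        for idx in potentially_optimal(fs, ds, f_min - eps):  # Steps 3 and 5
            rect = rects[idx]
            # Step 4: long side with fewest splits, ties by t_i then by index.
            fewest = min(rect["lev"])
            i = min((k for k in range(n) if rect["lev"][k] == fewest),
                    key=lambda k: (t[k], k))
            t[i] += 1
            offset = 3.0 ** (-(rect["lev"][i] + 1))  # one third of the side
            rect["lev"][i] += 1  # the parent becomes the center child
            for sign in (-1.0, 1.0):  # left child, then right child
                c = list(rect["c"])
                c[i] += sign * offset
                f, x = evaluate(c)
                neval += 1
                rects.append({"c": c, "lev": list(rect["lev"]), "f": f})
                if f < f_min:
                    f_min, x_min = f, x
                if neval >= max_eval:  # Step 7
                    return x_min, f_min
    return x_min, f_min
```

For example, `direct_minimize(lambda x: (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2, [-1.0, -1.0], [1.0, 1.0])` should return a point near (0.3, -0.2) within the default budget of 200 evaluations.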

The results of DIRECT are slightly sensitive to the order in which the selected rectangles are trisected and sampled because this order affects the $t_i$ values and, hence, the choice of splitting dimensions for other selected rectangles. In our current implementation, we select the rectangles in Step 3 in the same order that they are found as we scan the lower envelope in Fig. 7 from $f^* = f_{\min} - \epsilon$ towards $f^* = -\infty$.

On the first iteration, all the $s_j$ will be zero in Step 2 and, hence, all the $c_j$ will be zero when computed using $c_j = s_0/\max(s_j, 10^{-30})$. Thus, in the beginning the constants $c_j$ will not be very meaningful. This is not important, however, because on the first iteration there is only one rectangle eligible for selection (the entire space), and so the selection process is trivial. As the iterations proceed, the $s_j$ will be based on more observations, leading to more meaningful $c_j$ constants and better rectangle selections.

When there are no inequality constraints, the above step-by-step procedure reduces to the basic version of DIRECT described earlier. To see this, note that, when there are no constraints, every point is feasible and so $f_r \ge f_{\min} > f^*$ for all rectangles $r$. This fact, combined with the lack of any constraint violations, means that the $h_r(f^*)$ function given earlier reduces to $(f_r - f^*)/d_r$, which is precisely the rate-of-change function we minimized in the unconstrained version. Thus, in the unconstrained case, tracing the lower envelope in Fig. 7 identifies the same rectangles as tracing the lower-right convex hull in Fig. 5.

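We will illustrate DIRECT on the following two-dimensional test function:

$$\min\ f(x_1, x_2) \quad \text{s.t.} \quad g(x_1, x_2) \le 0, \quad -1 \le x_1, x_2 \le +1,$$

where

$$f(x_1, x_2) = \left(4 - 2.1x_1^2 + \frac{x_1^4}{3}\right)x_1^2 + x_1 x_2 + \left(-4 + 4x_2^2\right)x_2^2,$$

$$g(x_1, x_2) = -\sin(4\pi x_1) + 2\sin^2(2\pi x_2).$$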


We call this problem the Gomez #3 problem since it was 
listed as the third test problem in an article by S. Gomez 
and A. Levy [1]. The global minimum of the Gomez #3 
problem occurs at the point $(0.109, -0.623)$, where the function value is $-0.9711$. The problem is difficult because the feasible region consists of many disconnected,
approximately circular parts, giving rise to many local 
optima (see Fig. 9). 
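For readers who want to reproduce the example, the Gomez #3 functions transcribe directly into Python; the function names are hypothetical. Evaluating them at the reported optimum (0.109, -0.623), rounded to three decimals, gives an objective value of about -0.971. The constraint value is about -0.005, slightly below zero, so the point is feasible and lies very near the constraint boundary.

```python
import math


def gomez3_objective(x1, x2):
    return (4 - 2.1 * x1 ** 2 + x1 ** 4 / 3) * x1 ** 2 + x1 * x2 + (-4 + 4 * x2 ** 2) * x2 ** 2


def gomez3_constraint(x1, x2):
    """Feasible when the returned value is <= 0."""
    return -math.sin(4 * math.pi * x1) + 2 * math.sin(2 * math.pi * x2) ** 2


print(gomez3_objective(0.109, -0.623), gomez3_constraint(0.109, -0.623))
```

The constraint is nearly active at the optimum, which is consistent with the feasible region being a union of small, separate patches.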

For this test function, DIRECT gets within 1% of 
the optimum after 89 function evaluations and within 
0.01% after 513 function evaluations. The first 89 sam- 
pled points are shown in Fig. 10. For comparison, the 
tunneling algorithm of Gomez and Levy [1] converged 
using an average of 1053 objective function evaluations 
and 1873 constraint evaluations (averaged over 20 ran- 
dom starting points). One reason DIRECT converges 
quickly is that it searches both globally and locally during each iteration; as a result, as soon as the global part
of the algorithm finds the basin of convergence of the 
optimum, the local part of the algorithm automatically 
exploits it. 

In this example, DIRECT quickly gets close to the 
optimum but takes longer to achieve a high degree 
of accuracy. This suggests that the best performance 
would be obtained by combining DIRECT with a good 
local optimizer. The simplest way to do this is to run 
DIRECT for a predetermined number of function eval- 
uations and then use the resulting solution as a start- 
ing point for a local optimization. While straightfor- 
ward, this approach is highly sensitive to the number 
of function evaluations used in the global phase with 
DIRECT. If one uses too few function evaluations, DI- 
RECT might not discover the basin of convergence of 
the global minimum. 

To ensure that the global optimum is eventually found, we must somehow return to the global phase after we have performed a local search. One way of doing this is as follows. We start the local optimizer the very first time a feasible point is found (or perhaps after a minimum initial phase of 50-100 evaluations). After the local optimizer finishes, we return to DIRECT. However, DIRECT does not proceed in the same way as it would have without the local optimizer. Instead, the search will be more global, because the local optimizer will have reduced the value of $f_{\min}$ (which affects rectangle selection). DIRECT will now be looking for a point that improves upon the local solution; in effect, it will be looking for the basin of convergence of a better minimum. If DIRECT finds such an improving point, then we run a local search from this point and again return to DIRECT. This process continues until we reach a predetermined limit on the total number of function evaluations (for both DIRECT and the local optimizer). Used in this way, DIRECT becomes an intelligent routine for selecting starting points for the local optimizer.

While DIRECT works well on the Gomez #3 prob- 
lem and on test functions reported in [2], the algo- 
rithm is not without its disadvantages. For example, DI- 
RECT’s use of a space-partitioning approach requires 
the user to have relatively tight lower and upper bounds 
on all the variables. DIRECT will perform miserably if one specifies wide bounds such as $[-10^{30}, +10^{30}]$. The space-partitioning approach also limits the algorithm to low-dimensional problems (say, fewer than 20 variables).
While integer variables are handled, they must be or- 
dered, such as the number of gear teeth, since only then 
can we expect the function value at a rectangle’s mid- 
point to be indicative of what the function is like in the 
rest of the rectangle. Another limitation is that equality 
constraints are not handled. Finally, the stopping crite- 
rion—a limit on function evaluations—is weak. 

The advantages of DIRECT, however, are consider- 
able. The algorithm can handle nonsmooth, nonlinear, 
multimodal, and even discontinuous functions (as long 
as the discontinuity is not close to the global optimum). 
The algorithm works well in the presence of noise, since 
a small amount of noise usually has little impact on 
the set of selected rectangles until late in the search. 
Computational overhead is low, and the algorithm can 
exploit parallel processing because it generates several 
new points per iteration. Based on the comparisons in 
[2], the algorithm also appears to be efficient in terms 
of the number of function evaluations required to get 
close to the global minimum. But the most important 
advantage of DIRECT stems from its unique approach 
to balancing local and global search—the simple idea 
of not sampling just one point per iteration, but rather 
sampling several points using all possible weightings of 
local versus global search. This approach leads to an algorithm with no tuning parameters, making the algorithm easy to use and robust.

Direct search optimization procedures are attractive because of the ease with which they can be used. Optimization procedures where auxiliary functions such as gradients are not calculated are desirable for problems where discontinuous functions are encountered, or where numerous constraints make the calculation and use of gradients very difficult in searching for the global optimum. The reliability of getting to the vicinity of the global optimum is an additional feature that makes the use of direct search optimization an attractive means of optimization.

The need for an efficient and easy to use optimiza- 
tion procedure was illustrated in [1], in attempting to 
obtain the best weighting factors in a Liapunov func- 
tion used for time suboptimal control of a linear gas ab- 
sorber. Although at that time the best optimization pro- 
cedure for that problem was the hill-climbing proce- 
dure due to H.H. Rosenbrock [35], the method encoun- 
tered difficulties in establishing the global optimum. In 
the 1970s a large number of direct search optimiza- 
tion procedures were introduced. One such method is 
due to R. Luus and T.H.I. Jaakola [29], which numerous authors in the literature have called the LJ optimization procedure. The method is based on using
a number of randomly chosen test points over some re- 
gion and contracting the region after every iteration, 
always starting the iteration with the best point found 
from the previous iteration as the center of the region. 
The ease of programming and the ease with which in- 
equality constraints can be handled make this direct 
search procedure attractive. 


Optimization Problem 


We consider the problem of minimizing the performance index or cost function

$$I = f(x_1, \ldots, x_n) \qquad (1)$$

subject to $p$ inequality constraints

$$g_j(x_1, \ldots, x_n) \le 0, \quad j = 1, \ldots, p, \qquad (2)$$

through the appropriate choice of $x_1, \ldots, x_n$. The direct search optimization procedure suggested in [29] involves only three steps:

• Choose a number of points in the $n$-dimensional space through the equation

$$x = x^* + D\,r, \qquad (3)$$

where $D$ is a diagonal matrix with diagonal elements chosen at random between $-1$ and $+1$, and $r$ is the region size vector.
• Check the feasibility of each point with respect to (2), and for each feasible point evaluate the performance index $I$ in (1).

• At the end of each iteration, $x^*$ is replaced by the best feasible value of $x$ obtained in the second step, and the region size vector $r$ is reduced in size by

$$r^{(j+1)} = \gamma\, r^{(j)}, \qquad (4)$$

where $\gamma$ is a contraction factor such as 0.95 and $j$ is the iteration index. This procedure is continued for a number of iterations and the results are examined.
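The three steps are short enough to write out in full. The following Python sketch of a single pass uses the standard-library `random` module. `perf_index` and `feasible` are placeholders for (1) and (2). The defaults (100 points per iteration, 200 iterations, $\gamma = 0.95$) are illustrative choices, not values prescribed in [29]. One further assumption: $x^*$ is only replaced when a sampled feasible point improves on it, so the center never moves to a worse point.

```python
import random


def lj_search(perf_index, feasible, x_star, r, n_points=100, n_iter=200,
              gamma=0.95, seed=0):
    """One pass of the LJ random search; returns (x_star, I(x_star))."""
    rng = random.Random(seed)
    x_star, r = list(x_star), list(r)
    best = perf_index(x_star) if feasible(x_star) else float("inf")
    for _ in range(n_iter):
        cand, cand_val = x_star, best
        for _ in range(n_points):
            # Eq. (3): x = x* + D r, with D diagonal, entries uniform in [-1, 1].
            x = [xs + rng.uniform(-1.0, 1.0) * ri for xs, ri in zip(x_star, r)]
            if feasible(x):                  # check constraints (2)
                value = perf_index(x)        # evaluate I from (1)
                if value < cand_val:
                    cand, cand_val = x, value
        x_star, best = cand, cand_val        # recenter on the best point
        r = [gamma * ri for ri in r]         # eq. (4): contract the region
    return x_star, best
```

Take the toy problem of minimizing $(x_1 - 1)^2 + (x_2 - 2)^2$ subject to $x_1 + x_2 \le 2$, whose optimum is (0.5, 1.5) with $I = 0.5$. A pass started from the origin with $r = (2, 2)$ should end close to that point.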

If adequate convergence is not obtained, then the procedure can be repeated by carrying out another pass, using the information obtained from the previous pass. This optimization procedure enabled several difficult optimization problems to be solved in the original paper [29], and provided a means to solve a wide variety of problems in optimal control, such as time-suboptimal control [9,8,7]; it also gave good approximations to optimal control by providing a means of obtaining the elements of the feedback gain matrix. The LJ optimization procedure was found very useful for stabilizing systems through shifting of poles [30] and testing stabilizability of linear systems [13]. Research was done to improve the likelihood of getting the global optimum for nonunimodal systems [37], but even without any modification, the reliability of the LJ procedure was found to be very good [38], even for the difficult bifunctional catalyst blend problem [26]. Consequently, the LJ optimization procedure could be used effectively for optimization of complex systems such as heat exchanger networks [17], a transformer design problem [36], the design of structural columns so as to minimize the amount of material [3], and problems dealing with metallurgical processes [34]. The simplicity of the method was illustrated by the computer program given in its entirety in reference [17].

When the variables are restricted to be integers, special procedures may be necessary [12], since we cannot simply search on integer values to get the global optimum. The scope of problems where the LJ optimization procedure has been successfully applied is quite wide. In parameter estimation, N. Kalogerakis and Luus [6] found that by LJ optimization reliable estimates could be obtained for parameters in very few iterations, so that these estimates could then be used as starting values for the quadratically convergent Gauss-Newton method, without having to worry about nonconvergence. In model reduction the LJ method has been found useful to match the reduced system's Nyquist plot to that of the original system [15], or used directly in the time domain [40]. The LJ optimization procedure has also been used successfully in model reduction in sampled-data systems [39] and is illustrated with several examples in [23].


Handling Equality Constraints 


Suppose that in addition to the inequality constraints in (2), we also have $m$ equality constraints

$$h_i(x_1, \ldots, x_n) = 0, \quad i = 1, \ldots, m, \qquad (5)$$

where these equality constraints are ‘difficult’ in the sense that they cannot be used to solve for some particular variable.

Although a two-pass method to deal with equality constraints [10] was effective to solve optimization problems involving recycle streams [11], the general problem of handling equality constraints with the LJ optimization procedure was not solved satisfactorily until it was shown [4] that penalty functions can be used very effectively in direct search optimization. The work was extended in [33], and now it appears that the use of a quadratic penalty function incorporating a shifting term is the best way of dealing with difficult equality constraints [19]. We consider the augmented performance index


$$J = I + \theta \sum_{i=1}^{m} (h_i - s_i)^2, \qquad (6)$$

where $\theta > 0$ is a penalty factor and a shifting term $s_i$ is introduced for each equality constraint. To solve the optimization problem, the LJ optimization procedure is used in a multipass fashion, where at the beginning of each pass consisting of a number of iterations, the region sizes are restored to a fraction of the sizes used at the beginning of the previous pass. The shifting terms $s_i$ are updated after every pass simply by adjusting the values at the beginning of pass $(q+1)$ based on the deviation of the left-hand side in (5) from zero; i.e.,


$$s_i^{(q+1)} = s_i^{(q)} - h_i, \quad i = 1, \ldots, m, \qquad (7)$$

where $q$ is the pass number. Upon optimization the product $-2\theta s_i$ gives the Lagrange multiplier associated with the $i$th equality constraint, yielding useful sensitivity information. Full details are given in [19].
This approach to dealing with equality constraints was used in [31] with iterative dynamic programming (IDP) to solve an optimal control problem where the final state was specified, and in [28] where the volume of the fed-batch reactor was specified at the final time.
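The sketch below applies eqs. (6) and (7) to a toy problem to show the shifting-term update at work: minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$. The solution is (0.5, 0.5), with Lagrange multiplier $-1$ for the Lagrangian $I + \lambda h$. Each pass runs a plain LJ search on $J$ and then updates $s_i \leftarrow s_i - h_i$. At the start of each pass, the region is restored to 0.85 of its previous starting size. The penalty factor $\theta = 1$, the pass and iteration counts, and the function names are illustrative assumptions, not the settings of [19].

```python
import random


def lj_pass(J, x_star, r, rng, n_points=50, n_iter=150, gamma=0.95):
    """Plain LJ iterations on J (no inequality constraints in this toy case)."""
    best = J(x_star)
    for _ in range(n_iter):
        cand, cand_val = x_star, best
        for _ in range(n_points):
            x = [xs + rng.uniform(-1.0, 1.0) * ri for xs, ri in zip(x_star, r)]
            value = J(x)
            if value < cand_val:
                cand, cand_val = x, value
        x_star, best = cand, cand_val
        r = [gamma * ri for ri in r]
    return x_star


def solve_with_shifts(I, h, x0, r0, theta=1.0, passes=15, restore=0.85, seed=0):
    """Multipass LJ on J = I + theta * sum_i (h_i - s_i)^2, eqs. (6)-(7).
    Returns the final x and the multiplier estimates -2 * theta * s_i."""
    rng = random.Random(seed)
    x, r = list(x0), list(r0)
    s = [0.0] * len(h(x))
    for _ in range(passes):
        def J(z, shifts=tuple(s)):          # eq. (6) with the current shifts
            return I(z) + theta * sum((hz - sz) ** 2 for hz, sz in zip(h(z), shifts))
        x = lj_pass(J, x, r, rng)
        s = [si - hi for si, hi in zip(s, h(x))]   # eq. (7)
        r = [restore * ri for ri in r]             # next pass starts smaller
    return x, [-2.0 * theta * si for si in s]
```

For this problem each exact pass would shrink the error in $s$ by a factor of $1/(1 + 2\theta)$. The shift therefore settles near $s = 1/(2\theta)$, where the constraint is satisfied without making $\theta$ large.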

Another approach to deal with equality constraints 
is to solve the algebraic equations at each iteration by 
grouping the equations [21]. For several optimization 
problems this approach yielded very rapid convergence 
to the global optimum [22]. 


Use of LJ Optimization Procedure 
for High-Dimensional Problems 


In [2] it was found that LJ optimization can be used 
quite effectively to solve optimal control problems, 
where the system is divided into a number of time 
stages. This approach was used in [27] to solve a very 
difficult optimal control problem involving the deter- 
mination of the optimum drug delivery schedule to 
minimize the tumor size at the end of 12 weeks. The 
problem was broken into 84 time stages, each con- 
sisting of a single day. In spite of the state constraints 
and discontinuous functions, this 84-dimensional opti- 
mization problem was solved successfully on a personal 
computer in reasonable computation time. Now that personal computers are much faster, such a problem is considerably easier to solve. To solve high-dimensional problems a multipass method was used for
LJ optimization where after a pass, the region would be 
restored to a value smaller than used at the beginning 
of the previous pass and the procedure was repeated. 
In the case of the cancer chemotherapy problem, the 
problem required a number of runs for successful so- 
lution [27]. 
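The multipass procedure just described can be sketched as follows. This is a minimal illustration, not the implementation used in [27]. The defaults for `n_points` and `n_iter` mirror the figures quoted later in this article (100 random points, 201 iterations per pass). The contraction factor `gamma`, applied to the region after every iteration, and the restoration factor `eta`, applied to the starting region of each new pass, are assumed values. The box-shaped sampling region and the function name `lj_multipass` are also assumptions of the sketch.

```python
import random

def lj_multipass(f, x0, r0, n_points=100, n_iter=201, n_passes=5,
                 gamma=0.95, eta=0.85, seed=0):
    """Minimize f by multipass Luus-Jaakola-style random search (sketch).

    f      -- objective taking a list of floats
    x0, r0 -- initial centre point and initial region-size vector
    gamma  -- assumed region contraction factor applied after each iteration
    eta    -- assumed factor making each pass start with a smaller region
    """
    rng = random.Random(seed)
    x_best = list(x0)
    f_best = f(x_best)
    r_start = list(r0)
    for _ in range(n_passes):
        r = list(r_start)
        for _ in range(n_iter):
            centre = list(x_best)            # best point so far is the new centre
            for _ in range(n_points):
                # Random point in the box centre +/- r/2.
                trial = [c + (rng.random() - 0.5) * ri
                         for c, ri in zip(centre, r)]
                f_trial = f(trial)
                if f_trial < f_best:
                    x_best, f_best = trial, f_trial
            r = [gamma * ri for ri in r]     # contract the region
        # Restore the region for the next pass, smaller than this pass's start.
        r_start = [eta * ri for ri in r_start]
    return x_best, f_best

# Example: a smooth test function with minimum 0 at (1, -1).
x_opt, f_opt = lj_multipass(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 1.0) ** 2,
                            [0.0, 0.0], [4.0, 4.0],
                            n_points=50, n_iter=50, n_passes=3)
```

Bounds on the variables, such as the control constraints in the problems above, could be enforced by clipping each trial point before it is evaluated.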


Determination of Region Size 


One outstanding problem with the LJ optimization procedure was how to choose the region-size vector r effectively at the beginning of the iterations, especially when a multipass procedure was used. This problem was recently solved in [20] by determining the initial region size from the extent of the variation of each variable during the previous pass. With reliable values for the region size at the beginning of each pass in a multipass run, the computational effort is decreased quite substantially. Consider, for example, the nonseparable optimization problem introduced in [32], in which the system is described by three difference equations:


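$$x_1(k+1) = \frac{x_1(k)}{1 + 0.01\,u_1(k)\bigl(3 + u_2(k)\bigr)},$$

$$x_2(k+1) = \frac{x_2(k) + u_1(k)\,x_1(k+1)}{1 + u_1(k)\bigl(1 + u_2(k)\bigr)},$$

$$x_3(k+1) = \frac{x_3(k)}{1 + 0.01\,u_2(k)\bigl(1 + u_3(k)\bigr)},$$

with the initial condition

$$x(0) = [2 \;\; 5 \;\; 7]^T.$$

The control variables are constrained by

$$0 \le u_1(k) \le 4, \qquad 0 \le u_2(k) \le 4, \qquad 0 \le u_3(k) \le 0.5.$$

The performance index to be minimized is

$$I = x_1^2(P) + x_2^2(P) + x_3^2(P) + \left[\left(\sum_{k=1}^{P}\bigl(x_1^2(k-1) + x_2^2(k-1) + 2u_3^2(k-1)\bigr)\right)\left(\sum_{k=1}^{P}\bigl(x_3^2(k-1) + 2u_1^2(k-1) + 2u_2^2(k-1)\bigr)\right)\right]^{1/2},$$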

where P is the number of stages. When P is taken as 100, we have a 300-variable optimization problem, because three control variables must be determined at each stage. Without a reliable way of determining the region sizes over which to take the control variables, the problem is very difficult. With the method suggested in [20], however, the problem was solved quite readily by the LJ optimization procedure, using 100 random points per iteration and 60 passes, each consisting of 201 iterations, to yield I = 258.3393. Although the computational requirements appear enormous, the actual computation time was less than 20 minutes on a Pentium-120 personal computer [20], which corresponds to less than one minute on a Pentium 4/2.4 GHz personal computer. This value of the performance index is very close to the value I = 258.3392 obtained by iterative dynamic programming [18]. For this problem IDP is much more efficient, in spite of the nonseparability of the problem, because in IDP the problem is solved as a 3-variable problem over 100 stages rather than as a 300-variable optimization problem. The LJ procedure is therefore useful for checking the optimal control policy obtained by some other method. Here, the control policies obtained by IDP and by the LJ optimization procedure are almost identical, with a sudden change at around stage 70 in the control variables $u_1$ and $u_2$. The LJ optimization procedure is thus ideally suited for checking results obtained by other methods, especially when the optimal control policy differs from what is expected, as is the case with this particular example.
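To make the example concrete, the performance index can be evaluated for any given control policy by simulating the three difference equations stage by stage. The sketch below assumes that the dynamics have denominators $1 + 0.01u_1(k)(3+u_2(k))$, $1 + u_1(k)(1+u_2(k))$ and $1 + 0.01u_2(k)(1+u_3(k))$, that $x_2(k+1)$ uses the already updated $x_1(k+1)$, and that the index adds the square root of the product of the two stage sums to the terminal term. The function name `performance_index` is hypothetical. The sketch only evaluates I for a given policy: it does not optimize, and it assumes the controls already satisfy their bounds.

```python
import math

def performance_index(controls, x0=(2.0, 5.0, 7.0)):
    """Evaluate I for P stage controls (u1, u2, u3), k = 0..P-1 (assumed model).

    Assumed dynamics:
      x1(k+1) = x1(k) / (1 + 0.01*u1*(3 + u2))
      x2(k+1) = (x2(k) + u1*x1(k+1)) / (1 + u1*(1 + u2))
      x3(k+1) = x3(k) / (1 + 0.01*u2*(1 + u3))
    """
    x1, x2, x3 = x0
    sum_a = 0.0   # sum over stages of x1^2 + x2^2 + 2*u3^2 (at k-1)
    sum_b = 0.0   # sum over stages of x3^2 + 2*u1^2 + 2*u2^2 (at k-1)
    for u1, u2, u3 in controls:
        sum_a += x1 ** 2 + x2 ** 2 + 2.0 * u3 ** 2
        sum_b += x3 ** 2 + 2.0 * u1 ** 2 + 2.0 * u2 ** 2
        x1 = x1 / (1.0 + 0.01 * u1 * (3.0 + u2))
        x2 = (x2 + u1 * x1) / (1.0 + u1 * (1.0 + u2))   # uses the new x1(k+1)
        x3 = x3 / (1.0 + 0.01 * u2 * (1.0 + u3))
    return x1 ** 2 + x2 ** 2 + x3 ** 2 + math.sqrt(sum_a * sum_b)

# With all controls zero the state stays at x(0), so for P = 100
# I = 78 + sqrt(2900 * 4900) = 78 + 700*sqrt(29), about 3847.6.
I_zero = performance_index([(0.0, 0.0, 0.0)] * 100)
```

A policy returned by LJ optimization or by IDP would be checked the same way, by passing its 100 control triples to `performance_index`.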

Recently it was shown that the convergence of the LJ 
optimization procedure in the vicinity of the optimum 
can be improved substantially by incorporating a sim- 
ple line search to choose the best center point for a sub- 
sequent pass [24]. For a typical model reduction prob- 
lem, to reach the global optimum the computation time 
was reduced by a factor of four when the line search was 
incorporated. Due to its simplicity, the LJ optimization procedure can be programmed very easily. Computational experience with numerous optimization problems has shown that the method is highly reliable in obtaining the global optimum, so the LJ optimization procedure provides a very good means of obtaining the optimum for very complex problems.


See also 


▸ Interval Analysis: Unconstrained and Constrained
Optimization 
Continuous optimization refers to optimization involving objective functions whose domain of definition is a continuum, as opposed to a set of discrete points in combinatorial (or discrete) optimization. Discontinuous optimization is the special case of continuous optimization in which the objective function, although defined over a continuum (let us suppose over $\mathbb{R}^n$), is not necessarily a continuous function.

We define the discontinuous optimization problem as:

$$\begin{aligned} \inf_{x}\ & \varphi(x) \\ \text{s.t. } & f_i(x) = 0, \quad i \in E, \\ & f_i(x) \ge 0, \quad i \in I, \end{aligned} \tag{1}$$

where the index sets E and I are finite and disjoint and $\varphi$ and $f_i$, $i \in E \cup I$, are a collection of (possibly discontinuous) piecewise differentiable functions that map $\mathbb{R}^n$ to $\mathbb{R}$. A piecewise differentiable function $f\colon \mathbb{R}^n \to \mathbb{R}$ is a function whose derivative is defined everywhere except over a subset of a finite number of sets, called ridges, of the form $\{x \in \mathbb{R}^n : r(x) = 0\}$, where r is a differentiable function, and these ridges partition the domain into subdomains over each of which f is differentiable. By abuse of language, we shall call r(x) a ridge of f.
Without loss of generality, we can restrict our attention to the unconstrained optimization problem $\inf_x f(x)$, where f is a (possibly discontinuous) piecewise differentiable function. Indeed, in order to solve problem (1), one can consider the unconstrained $\ell_1$ exact penalty function

$$f_\mu(x) = \mu\,\varphi(x) + \sum_{i \in E} |f_i(x)| - \sum_{i \in I} \min\bigl(0, f_i(x)\bigr)$$

for a succession of decreasing positive values of the penalty parameter $\mu$ ($f_\mu$ is clearly a piecewise differentiable function). Notice, however, that using the $\ell_1$ penalty function (and dealing with the decrease of a penalty parameter) is only one approach to handling the constrained problem and may not be the best way.
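As a small illustration, the $\ell_1$ exact penalty function can be assembled from callables for $\varphi$ and the constraint functions. The helper name `l1_penalty` and the example functions are hypothetical. Note that nothing in the construction requires $\varphi$ or the $f_i$ to be continuous.

```python
def l1_penalty(phi, eq_constraints, ineq_constraints, mu):
    """Return f_mu(x) = mu*phi(x) + sum_{i in E} |f_i(x)| - sum_{i in I} min(0, f_i(x))."""
    def f_mu(x):
        equality_part = sum(abs(fi(x)) for fi in eq_constraints)
        inequality_part = sum(min(0.0, fi(x)) for fi in ineq_constraints)
        return mu * phi(x) + equality_part - inequality_part
    return f_mu

def phi(x):
    # A discontinuous objective: jumps by 1 across the ridge x[0] = 0.
    return x[0] + (1.0 if x[0] > 0 else 0.0)

# One equality constraint x0 - 1 = 0 and one inequality constraint x1 >= 0.
f_mu = l1_penalty(phi, [lambda x: x[0] - 1.0], [lambda x: x[1]], mu=0.5)
print(f_mu([2.0, -3.0]))  # 0.5*3 + |1| - min(0, -3) = 5.5
```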

Given a (possibly discontinuous) piecewise differentiable function f defined over $\mathbb{R}^n$ and the finite set $\{r_i(x)\}_{i \in \mathcal{R}}$ of its ridges, we define a cell of f to be a nonempty set $C \subseteq \mathbb{R}^n$ such that for all $x, y \in C$ we have $\operatorname{sign}(r_i(x)) = \operatorname{sign}(r_i(y))$ for all $i \in \mathcal{R}$, where the function sign takes the value 1, -1 or 0 according to whether its argument is positive, negative or zero. Thus, f is differentiable over a cell.

Considering the optimization of functions which 
are nonsmooth and even discontinuous is motivated 
by applications in VLSI and floor-planning problems, 
plant layout, batch production, switching regression, 
discharge allocation for hydro-electric generating sta- 
tions, fixed-charge problems, for example (see [4, In- 
trod.] for references). Note that most of these prob- 
lems can alternatively be modeled within the con- 
text of mixed integer programming, a field straddling 
combinatorial optimization and continuous optimiza- 
tion. 

The inescapably nonconvex nature of discontinuous functions gives rise to several local optima in discontinuous optimization problems. We do not address here the difficult issue of global optimization. We are concerned with finding a local infimum of the above optimization problem. An algorithm looking for local optima can, however, be used as an adjunct to some heuristic or global optimization method for discontinuous optimization problems, but the inherent combinatorial nature of such an approach is often ultimately dominant. More importantly, it provides a framework allowing the optimizer to deal directly with the nonsmoothnesses and discontinuities involved, and thereby to improve solutions found by heuristic methods, when this is possible.

Leaving aside the heuristic methods (which many people facing practical discontinuous optimization problems rely upon in order to solve mixed integer programming formulations of discontinuous optimization problems), previous work on discontinuous optimization includes smoothing algorithms. The smoothing algorithms express discontinuities by means of a step function, and then approximate the step function by a function which is not only continuous but moreover smooth, so that the resulting problem can be solved by a gradient technique (cf. also ▸ Conjugate-gradient methods). Both I.I. Imo and D.J. Leech [7] and I. Zang [9] developed methods in which the objective function is replaced only in the neighborhood of the discontinuities. Two drawbacks of these methods are the potential numerical instability when this neighborhood is required to be small, and the cost of evaluating the smoothed functions. In many instances the discontinuities of the first derivative are exactly the regions of interest, and smoothing has the effect of making such regions less discernible.

Another approach, which deals explicitly with the discontinuities within the framework of continuous optimization, is the following active set method (introduced in [4]). Recall the following definitions relevant to active set methods: the null space of M, denoted by N(M), is defined by

$$N(M) = \{x \in \mathbb{R}^n : Mx = 0\}.$$

We say that a ridge r is active at x if r(x) = 0. Let $\mathcal{A}(x) \subseteq \mathcal{R}$ be the (finite) index set of the ridges that are active at the current point x, and let A(x) be the matrix of activities, having as columns the gradients of the ridges that are active at x. In the case of linear ridges, $r_i(x) := a_i^T x - b_i$, a direction $d \in N(A^T(x))$ is said to preserve each activity $i \in \mathcal{A}(x)$ since for each $i \in \mathcal{A}(x)$ we have $r_i(x + \alpha d) = r_i(x) = 0$.
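Activity-preserving directions can be obtained by orthogonal projection onto $N(A^T(x))$. The sketch below uses NumPy. It assumes, as in the nondegenerate case, that the columns of A (the active ridge gradients $a_i$) are linearly independent, so that $A^TA$ is invertible. The function name and the example ridge are hypothetical.

```python
import numpy as np

def null_space_projector(A):
    """Orthogonal projector onto N(A^T) = {d : A^T d = 0}.

    A has the active ridge gradients a_i as columns (assumed linearly independent).
    """
    n = A.shape[0]
    return np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)

# One active linear ridge r(x) = x1 + x2 - 1 in R^3, active at x = (1, 0, 0).
A = np.array([[1.0], [1.0], [0.0]])
P = null_space_projector(A)
x = np.array([1.0, 0.0, 0.0])
d = P @ np.array([1.0, 0.0, 0.0])    # project an arbitrary direction onto N(A^T)

def ridge(z):
    return z[0] + z[1] - 1.0

# Moving along d keeps the ridge active (up to rounding): ridge(x + t*d) stays 0.
```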

If $\mathcal{A}(\bar{x}) \neq \emptyset$, then $\nabla f(\bar{x})$ is not necessarily defined. This is because we cannot talk about the gradient of the function at $\bar{x}$: in general there is no vector $g \in \mathbb{R}^n$ such that $g^T d$ is the first-order change of f in direction d for every $d \in \mathbb{R}^n$. Thus, we cannot use, as in the smooth situation, the negative gradient direction as a descent direction. We term any $(n \times 1)$-vector $g_{\bar{x}}$ such that

$$f'(\bar{x}; d) = g_{\bar{x}}^T d \quad \text{for all } d \in N(A^T(\bar{x}))$$

a restricted gradient of f at $\bar{x}$, because it is the gradient of the restriction of f to the space $N(A^T(\bar{x}))$.

Let us first consider the continuous piecewise linear 
case. We assume that the ridges of f are given, and also 
we assume that the restriction of f to any cell is known. 
Hence, we are assuming that more information on the 
structure of the objective function is available than, for 
example, in a bundle method [8], which assumes that 
only one element of the subdifferential is known at any 
point. 

It is shown in [4] that, under some nondegeneracy assumptions (e.g. the gradients of the ridges which are active at $\bar{x}$ are linearly independent), any continuous piecewise linear function f can be decomposed in a neighborhood of $\bar{x}$ into a smooth function and a sum of continuous functions having a single ridge as follows:

$$f(x) = f(\bar{x}) + g_{\bar{x}}^T (x - \bar{x}) + \sum_{i \in \mathcal{A}(\bar{x})} v_i \min\bigl(0, a_i^T (x - \bar{x})\bigr),$$

for some scalars $\{v_i\}_{i \in \mathcal{A}(\bar{x})}$ and some vector $g_{\bar{x}} \in \mathbb{R}^n$. We term $g_{\bar{x}}$ the restricted gradient of f at $\bar{x}$. Note that if m ridges of f are active at $\bar{x}$, then there are $2^m$ cells in any small neighborhood of $\bar{x}$. The vector $g_{\bar{x}}$ and the m scalars $\{v_i\}_{i \in \mathcal{A}(\bar{x})}$, together with the m gradients of the activities, $\{a_i\}_{i \in \mathcal{A}(\bar{x})}$, thus completely characterize the behavior of f over the $2^m$ cells in the neighborhood of $\bar{x}$!
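The decomposition can be evaluated directly, which also makes it easy to check against a known function. In the plain-Python sketch below (the name `local_pl_model` is hypothetical), the function $|x_1| + 2|x_2|$ at $\bar{x} = 0$ is represented with $g_{\bar{x}} = (1, 2)^T$, $v = (-2, -4)$ and ridge gradients $a_1 = e_1$, $a_2 = e_2$. This works because $|t| = t - 2\min(0, t)$. One point is then sampled in each of the $2^2 = 4$ cells around $\bar{x}$.

```python
from itertools import product

def local_pl_model(f_bar, g, v, a, x_bar):
    """f(x) = f_bar + g^T (x - x_bar) + sum_i v_i * min(0, a_i^T (x - x_bar))."""
    def dot(p, q):
        return sum(pi * qi for pi, qi in zip(p, q))

    def f(x):
        dx = [xi - xbi for xi, xbi in zip(x, x_bar)]
        kinks = sum(vi * min(0.0, dot(ai, dx)) for vi, ai in zip(v, a))
        return f_bar + dot(g, dx) + kinks
    return f

f = local_pl_model(0.0, g=[1.0, 2.0], v=[-2.0, -4.0],
                   a=[[1.0, 0.0], [0.0, 1.0]], x_bar=[0.0, 0.0])

# m = 2 active ridges give 2**2 = 4 cells around x_bar; test one point in each.
for s1, s2 in product([1.0, -1.0], repeat=2):
    x = [0.3 * s1, 0.1 * s2]
    assert abs(f(x) - (abs(x[0]) + 2.0 * abs(x[1]))) < 1e-12
```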

With such a decomposition at any point of $\mathbb{R}^n$, an algorithm for finding a local minimum of a continuous piecewise linear function f is readily obtained, as long as we assume no degeneracy at any iterate and at any breakpoint encountered in the line search (we shall discuss the degenerate situation later):


1  Choose any $x^1 \in \mathbb{R}^n$ and set $k \leftarrow 1$.

REPEAT

2  Identify the activities, $\mathcal{A}(x^k)$, and compute $d^k = -P(g_{x^k})$, the projection of the restricted gradient onto the space orthogonal to the gradients of the activities.

   IF $d^k = 0$ ($x^k$ is a dead point; compute a single-dropping descent direction or establish optimality), THEN

3     Compute $\{u_i\}_{i \in \mathcal{A}(x^k)}$, the coefficients of $\{a_i\}_{i \in \mathcal{A}(x^k)}$ in the linear combination of $g_{x^k}$ in terms of the columns of $A(x^k)$.

4     IF $u_i < 0$ or $u_i > -v_i$ for some $i \in \mathcal{A}(x^k)$ (violated optimality condition), THEN

5        (Drop activity i.) Redefine $d^k = P_{-i}(a_i)$ if the violated inequality found corresponds to $u_i < 0$; otherwise $d^k = -P_{-i}(a_i)$ if it is $u_i > -v_i$, where $P_{-i}$ is the orthogonal projector onto the space orthogonal to the gradients of all the activities but activity i.

      ELSE stop: $x^k$ is a local minimum of f.
      ENDIF
   ENDIF

6  (Line search) Determine the step size $\alpha^k$ by solving $\min_{\alpha \ge 0} f(x^k + \alpha d^k)$. This line search can be done from $x^k$, moving from one breakpoint of f to the next in the direction $d^k$, until either we establish unboundedness of the objective function or the value of f starts increasing.

7  Update $x^{k+1} = x^k + \alpha^k d^k$; $k \leftarrow k + 1$.

END REPEAT


Continuous piecewise linear minimization algorithm
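Steps 3 to 5, the test performed at a dead point, can be sketched with NumPy as follows. The sketch works under the nondegeneracy assumption (linearly independent active gradients) and uses the optimality condition $0 \le u_i \le -v_i$ implied by step 4. A small tolerance `tol` has been added for floating-point comparisons. The function name and its return convention (`None` meaning that $x^k$ is a local minimum) are choices made for this sketch only.

```python
import numpy as np

def dead_point_direction(g, A, v, tol=1e-12):
    """Steps 3-5 of the continuous piecewise linear algorithm at a dead point.

    g : restricted gradient g_{x^k}, shape (n,)
    A : active ridge gradients a_i as columns, shape (n, m)
    v : scalars v_i of the local decomposition, shape (m,)
    Returns a single-dropping descent direction, or None if x^k is a local minimum.
    """
    n, m = A.shape
    u, *_ = np.linalg.lstsq(A, g, rcond=None)        # step 3: solve g = A u
    for i in range(m):
        if u[i] < -tol or u[i] > -v[i] + tol:        # step 4: violated condition
            A_rest = np.delete(A, i, axis=1)         # all activities but i
            if A_rest.shape[1] == 0:
                P_rest = np.eye(n)
            else:
                P_rest = np.eye(n) - A_rest @ np.linalg.solve(A_rest.T @ A_rest, A_rest.T)
            p = P_rest @ A[:, i]                     # P_{-i}(a_i)
            # Step 5: drop activity i positively if u_i < 0, else negatively.
            return p if u[i] < -tol else -p
    return None

# Example: |x1| + 2|x2| (g = (1, 2)) is at a local minimum at the origin,
# while |x1| + 2|x2| - 3*x1 (g = (-2, 2)) should drop ridge 1 positively.
A = np.eye(2)
v = np.array([-2.0, -4.0])
d_min = dead_point_direction(np.array([1.0, 2.0]), A, v)
d_drop = dead_point_direction(np.array([-2.0, 2.0]), A, v)
```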


Remark that in step 6, the directional derivative of the objective function in the direction $d^k$ can easily be updated from one breakpoint to the next in terms of the scalar $v_i$, where i is the index of the ridge crossed at breakpoint x.
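The breakpoint-by-breakpoint line search of step 6, with the slope update mentioned in this remark, might look like the following sketch. From the decomposition, the gradients on the two sides of ridge i differ by $-v_i a_i$ (positive side minus negative side). Crossing ridge i therefore increases the directional derivative along d by $-v_i\,|a_i^T d|$. The input format (a list of breakpoints with their $v_i$ and $a_i^T d$) and the function name are assumptions of this sketch.

```python
import math

def breakpoint_line_search(slope0, breakpoints):
    """Exact line search along a ray for a continuous piecewise linear f.

    slope0      -- directional derivative of f along d just after alpha = 0
    breakpoints -- (alpha_j, v_i, a_i^T d) for each ridge i crossed at alpha_j > 0
    Returns the step alpha^k, or math.inf if f is unbounded below along d.
    """
    if slope0 >= 0:
        return 0.0
    slope = slope0
    for alpha, v_i, a_dot_d in sorted(breakpoints):
        slope += -v_i * abs(a_dot_d)   # slope jump when crossing ridge i
        if slope >= 0:                 # f stops decreasing here
            return alpha
    return math.inf

# phi(alpha) = -alpha + |alpha - 1|: slope -2 before alpha = 1, slope 0 after.
print(breakpoint_line_search(-2.0, [(1.0, -2.0, 1.0)]))   # 1.0
print(breakpoint_line_search(-3.0, [(1.0, -2.0, 1.0)]))   # inf (unbounded below)
```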

Let us now consider the case where f is still piecewise linear but possibly discontinuous across some ridges. We term such ridges faults, and $\mathcal{F}(x)$ denotes the set of faults that are active at x.

Note first that a (local) minimum does not always exist in the discontinuous case. Consider for example the following univariate function, having x = 0 as a fault:

$$f(x) = \begin{cases} x + 1 & \text{if } x > 0, \\ x & \text{otherwise.} \end{cases}$$

Hence, we instead look for a local infimum. In order to find such a local infimum of a function f having some faults, we shall simply generalize the algorithm for the continuous problem by implicitly considering any discontinuity or jump across a fault i in f as the limiting case of a continuous situation. Since we are looking for a local infimum, without loss of generality we shall henceforth only consider functions f such that

$$f(x) = \liminf_{y \to x} f(y);$$

in other words, we consider the lower semicontinuous envelope of f.

The algorithm for the discontinuous case is essentially the same as in the continuous case, except that we consider dropping an active fault from a dead point, $\bar{x}$, only if we do so along a direction d such that

$$\lim_{\delta \downarrow 0} f(\bar{x} + \delta d) = f(\bar{x})$$

(i.e. for small $\delta > 0$, the value of f does not jump up from $\bar{x}$ to $\bar{x} + \delta d$). Thus, virtually only step 4 must be adapted from the continuous problem algorithm in order to solve the discontinuous case. To make the intuitive concept of directions jumping up or down more precise, we define the set of soaring directions from a point $\bar{x}$ to be:

$$S(\bar{x}) := \bigl\{ d \in \mathbb{R}^n : \exists\, \epsilon > 0,\ \bar{\delta} > 0 \text{ such that } f(\bar{x} + \delta d) - f(\bar{x}) \ge \epsilon \ \ \forall\, 0 < \delta < \bar{\delta} \bigr\}.$$

If we define, for a nondegenerate point $\bar{x}$,

$$S^+(\bar{x}) := \bigl\{ i \in \mathcal{A}(\bar{x}) : \text{if } d^{i+} \in N(A_{-i}^T) \text{ and } a_i^T d^{i+} > 0, \text{ then } d^{i+} \in S(\bar{x}) \bigr\},$$

$$S^-(\bar{x}) := \bigl\{ i \in \mathcal{A}(\bar{x}) : \text{if } d^{i-} \in N(A_{-i}^T) \text{ and } a_i^T d^{i-} < 0, \text{ then } d^{i-} \in S(\bar{x}) \bigr\},$$

where $A_{-i}$ denotes the matrix of activities without the column $a_i$, then the set of soaring single-dropping directions from $\bar{x}$ consists simply of the directions dropping an activity $i \in S^+(\bar{x})$ positively and the directions dropping an $i \in S^-(\bar{x})$ negatively (we say that activity i is dropped positively (negatively) if all current activities, except for the ith, are preserved and if, moreover, $a_i^T d$ is positive (negative)). A fault can now be defined more rigorously: a positive (negative) fault of f at a point x is a ridge $i \in \mathcal{R}$ such that for any neighborhood, B(x), of x, there exists a nondegenerate point $x' \in B(x)$ with $i \in S^+(x')$ (with $i \in S^-(x')$). The set of all positive (negative) faults at x is denoted by $\mathcal{F}^+(x)$ ($\mathcal{F}^-(x)$). The set of faults of f at a point x is denoted by

$$\mathcal{F}(x) := \mathcal{F}^+(x) \cup \mathcal{F}^-(x).$$

We modify the continuous problem algorithm in such a way that, at a nondegenerate dead point, $x^k$, we do not need to verify the optimality conditions corresponding to soaring single-dropping directions ($u_i \ge 0$, $i \in S^+(x^k)$, and $u_i \le -v_i$, $i \in S^-(x^k)$), so that we never consider such single-dropping directions in order to establish whether $x^k$ is optimal. This is reasonable since we are looking for a local minimum. The line-search step (step 6) is modified similarly: when we encounter a breakpoint x on a fault along a direction $d \in S(x)$ (jump up), we stop; while if d is such that $-d \in S(x)$ (jump down), we carry on to the next breakpoint and update the directional derivative along d accordingly.

Note that one has to be careful at a 'contact' point $x_c \in \mathbb{R}^n$ (defined below). At $x_c$, unlike at other points of a fault, we can drop activity i both positively and negatively.

The function $f\colon \mathbb{R}^2 \to \mathbb{R}$, given by

$$f(x) = \begin{cases} 2x_2 & \text{if } x_1 > 0 \text{ or } (x_1 = 0 \text{ and } x_2 < 0), \\ -x_2 & \text{otherwise,} \end{cases} \tag{2}$$

illustrates the situation well. Figure 1 shows the graph of f in a neighborhood of $x_c := (0, 0)^T$ (the dotted lines are simply lines that could be seen if the hatched surface were transparent). The point $x_c$ is a contact point with respect to the fault $x_1 = 0$.




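Formally, we define $x_c \in \mathbb{R}^n$ to be a contact point of f with respect to $i \in \mathcal{A}(x_c)$ when $i \in \mathcal{F}(x_c)$ is such that either

1) $i \in \mathcal{F}^+(x_c) \cap \mathcal{F}^-(x_c)$, or

2) there exist $\sigma^+, \sigma^- \in \{-1, 0, 1\}^{|\mathcal{R}|}$ such that $\sigma_i^+ = 1$, $\sigma_i^- = -1$ and

$$\lim_{\substack{x \to x_c \\ \sigma(x) = \sigma^+}} f(x) = \lim_{\substack{x \to x_c \\ \sigma(x) = \sigma^-}} f(x)$$

(continuity when crossing ridge i, which is a fault, at $x_c$), where $\sigma(x)$ is the vector whose kth component is $\operatorname{sign}(r_k(x))$.

Note that the fault $x_1 = 0$ and the point $x_c = (0, 0)^T$ satisfy both conditions 1) and 2) in the above definition of a contact point for the function f defined by (2). They satisfy, however, only condition 1) for the function $f\colon \mathbb{R}^2 \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} 1 & \text{if } x_1 \ge 0 \text{ and } x_2 \ge 0, \\ 2 & \text{if } x_1 \le 0 \text{ and } x_2 \le 0 \text{ and } x \ne (0,0)^T, \\ 3 & \text{if } x_1 > 0 \text{ and } x_2 < 0, \\ 4 & \text{otherwise,} \end{cases}$$

with faults $x_1 = 0$ and $x_2 = 0$. For the function $f\colon \mathbb{R}^2 \to \mathbb{R}$ given by

$$f(x) = \begin{cases} -x_2 & \text{if } x_1 > 0 \text{ and } x_2 < 0, \\ 0 & \text{otherwise,} \end{cases}$$

with fault $x_1 = 0$, only condition 2) is satisfied ($\mathcal{F}^-(x_c)$ is empty).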


An algorithm similar to the one introduced in the continuous case, but which does not consider soaring single-dropping directions, will encounter no difficulty with the discontinuity in f at any noncontact point (e.g. for (2), at any point other than $x_c$). Let us assume, without loss of generality, that at the kth iterate, $x^k$, we have $\mathcal{F}(x^k) = \mathcal{F}^-(x^k)$. Assuming moreover that all points encountered in the algorithm are noncontact points, the only step of the continuous algorithm which needs to be modified is:


4  IF $u_i < 0$ for some $i \in \mathcal{A}(x^k)$, or $u_i > -v_i$ for some $i \in \mathcal{A}(x^k) \setminus \mathcal{F}(x^k)$ (violated optimality condition), THEN


The paper [4] describes techniques (including perturbation) to cope with problems that occur in certain cases where the hypothesis of nondegeneracy is not satisfied at points encountered in the course of the algorithm. One cannot, however, extend this algorithm to deal with dead-point iterates (i.e. points not encountered as breakpoints along the line search) without carefully considering the combinatorial nature of the problem of degeneracy. Nevertheless, no difficulties were encountered in the computational experiments reported in [4], although serious problems can still arise at certain singular points (contact points and dead-point iterates, at which the objective function is not decomposable). Indeed, in the discontinuous case, there is no straightforward extension of this approach to the cases where the algorithm encounters a contact point. In the continuous case, the behaviors of f over two juxtaposed cells are linked. At contact points, however, there is coincidence of the values of restrictions of f to subdomains not otherwise linked to each other.

Let us now discuss the extension to the nonlinear case. An advantage of the active set approach for the continuous piecewise linear optimization problem over, for example, the simplex-format algorithm of R. Fourer [6], is that it generalizes not only to the discontinuous situation but also to the nonseparable and certain (decomposable) nonconvex cases. Above all, the active set approach is readily extendable to the nonlinear case by adapting conventional techniques for nonlinear programming, as was done above with the projected gradient method for the (possibly discontinuous)
piecewise linear case. The definition of decomposition must first be generalized so that it expresses the first-order behavior of a piecewise differentiable function in the neighborhood of a point. The piecewise linear algorithm described above used descent directions attempting to decrease the smooth part of the function while maintaining the value of its nonsmooth part (when preserving the current activities). A first-order algorithm for the nonlinear case could achieve these two objectives up to first-order changes, as in the approach of A.R. Conn and T. Pietrzykowski to nonlinear optimization via an $\ell_1$ exact penalty function [5]. In order to develop a second-order algorithm, assuming now that f is (possibly discontinuous) piecewise twice-differentiable (i.e. twice differentiable everywhere except over a finite number of ridges), one must first extend the definition of first-order decomposition to that of second-order decomposition. One could then consider extending the strategies used by T.F. Coleman and Conn [2] on the exact penalty function approach to nonlinear programming (although the $\ell_1$ exact penalty function involves only first-order types of nondifferentiabilities, namely ridges). The main idea is to attempt to find a direction which minimizes the change in f (up to second-order terms) subject to preserving the activities (up to second-order terms). Specifically, second-order conditions must be derived (these are the first-order conditions plus a condition on the 'definiteness' of the reduced Hessian of the twice-differentiable part of f in the second-order decomposition of f). An analog of the Newton step (or of a modification of the Newton method; cf. also ▸ Gauss-Newton method: Least squares, relation to Newton's method) using a nonorthogonal projection [3] is then taken (or a single-dropping direction is used). An algorithm following these lines would be expected to possess global convergence properties (regardless of starting point) and a fast (2-step superlinear) asymptotic convergence rate, as in [1].


See also 


▸ Nondifferentiable Optimization


Introduction


Many problems in stochastic optimization, for instance optimal stochastic structural design problems, stochastic control problems, problems of scenario analysis, etc., can be described [3,5] by mean value minimization problems of the type

$$\min F(x) \quad \text{s.t. } x \in D, \tag{1}$$

where the objective function $F = F_u$ is the mean value function, defined by

$$F(x) = E\,u\bigl(A(\omega)x - b(\omega)\bigr), \quad x \in \mathbb{R}^n. \tag{2}$$

Here, $(A(\omega), b(\omega))$ is a random $m \times (n+1)$ matrix, E denotes the expectation operator, D is a convex subset of $\mathbb{R}^n$, and $u\colon \mathbb{R}^m \to \mathbb{R}$ designates a convex loss function measuring the loss arising from the deviation $z = A(\omega)x - b(\omega)$ between the output $A(\omega)x$ of the stochastic linear system $x \mapsto A(\omega)x$ and the random target $b(\omega)$.
To solve (1), (2), the loss function u would have to be known exactly. In practice, however, there is mostly some uncertainty about the appropriate selection of u, for instance due to difficulties in assigning appropriate penalty costs to the deviation $z = A(\omega)x - b(\omega)$ between the output $A(\omega)x$ and the target $b(\omega)$. We suppose that $u \in \mathcal{U}$, where $\mathcal{U}$ is a given set of convex loss functions containing the true, but unknown loss function $u_0$. A possible way out in this situation of uncertainty about u is either to construct (feasible) descent directions h of F at an (iteration) point x which are valid for a large class $\mathcal{U}$ of loss functions, or to provide the decision maker with a certain set $E = E_{D,\mathcal{U}}$ ($\subseteq D$) of efficient points or solutions, which serve as substitutes for an optimal solution $x^*$ of (1), (2). Hence, this set $E_{D,\mathcal{U}}$, or at least its closure $\overline{E}_{D,\mathcal{U}}$, should contain an optimal solution $x^* = x_u^*$ of (1), (2) for each $u \in \mathcal{U}$. An important class $\mathcal{U} = C_m^J$ of loss functions u is the set of partially monotonous increasing convex loss functions on $\mathbb{R}^m$, defined as follows:


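Definition 1 Let J be a given subset of $\{1, \dots, m\}$. For $J = \emptyset$ we put $C_m^\emptyset = C_m$, where $C_m$ is the set of all convex functions u on $\mathbb{R}^m$. If $J \neq \emptyset$, then $C_m^J$ denotes the set of all convex functions $u\colon \mathbb{R}^m \to \mathbb{R}$ having the following property:

$$z_I \le w_I,\ z_{II} = w_{II} \implies u(z) \le u(w). \tag{3}$$

Here, $z_I \in \mathbb{R}^{|J|}$, $z_{II} \in \mathbb{R}^{m-|J|}$ is the partition of any $z \in \mathbb{R}^m$ into the subvectors $z_I = (z_j)_{j \in J}$, $z_{II} = (z_i)_{i \notin J}$. Moreover, $z_I \le w_I$ means that $z_j \le w_j$ for all $j \in J$.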

Remark 2 In many cases one has loss functions $u \in C_m^J$ with one of the following additional strict partial monotonicity properties:

$$z_I \le w_I,\ z_{II} = w_{II},\ z_i < w_i \text{ for some } i \in J \implies u(z) < u(w), \tag{4}$$

$$z_I < w_I,\ z_{II} = w_{II} \implies u(z) < u(w), \tag{5}$$

where $z_I < w_I$ means that $z_j < w_j$ for all $j \in J$.


For a given set $\mathcal{U}$ of convex loss functions u containing the true, but unknown loss function $u_0$, a first definition of efficient solutions can be given as follows:


Definition 3 A point $x \in D$ is called a nondominated, admissible or Pareto optimal solution of (1), (2) if there is no vector $\tilde{x} \in D$, $\tilde{x} \neq x$, such that

$$F_u(\tilde{x}) \le F_u(x) \quad \text{for all } u \in \mathcal{U}, \tag{6}$$

$$F_{\tilde{u}}(\tilde{x}) < F_{\tilde{u}}(x) \quad \text{for some } \tilde{u} \in \mathcal{U}, \tag{7}$$

where $F_u(x) := E\,u\bigl(A(\omega)x - b(\omega)\bigr)$. Let $E_{D,\mathcal{U}}$ denote the set of all nondominated solutions of (1), (2).


Discretely Distributed Stochastic Programs 


In the following we consider the construction of descent directions and efficient solutions for (1), (2) in the case that $(A(\omega), b(\omega))$ has a discrete distribution

$$P_{(A(\cdot), b(\cdot))} = \sum_{i=1}^{r} \alpha_i\, \varepsilon_{(A^i, b^i)}, \tag{8}$$

where $r > 1$ is an integer, $\alpha_i > 0$, $i = 1, \dots, r$, $\sum_{i=1}^r \alpha_i = 1$, and $\varepsilon_{(A^i, b^i)}$ denotes the one-point measure in the given $m \times (n+1)$ matrix $(A^i, b^i)$, $i = 1, \dots, r$.
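Under the discrete distribution (8), the expectation in (2) becomes a finite sum, $F(x) = \sum_{i=1}^r \alpha_i\, u(A^i x - b^i)$. A plain-Python sketch follows. The name `mean_value_objective`, the list-of-rows matrix format and the example data are assumptions.

```python
def mean_value_objective(x, scenarios, alpha, u):
    """F(x) = sum_i alpha_i * u(A^i x - b^i) for a discretely distributed (A, b).

    scenarios -- list of (A_i, b_i), A_i given as a list of rows, b_i as a list
    alpha     -- probabilities alpha_i > 0 summing to one
    u         -- convex loss function on R^m, taking a list of length m
    """
    total = 0.0
    for (A_i, b_i), a_i in zip(scenarios, alpha):
        z = [sum(a_rk * x_k for a_rk, x_k in zip(row, x)) - b_r
             for row, b_r in zip(A_i, b_i)]          # deviation z = A^i x - b^i
        total += a_i * u(z)
    return total

def quadratic(z):
    return sum(zk * zk for zk in z)

# Two equally likely realizations (A^1, b^1) = (1, 0) and (A^2, b^2) = (-1, 0).
scenarios = [([[1.0]], [0.0]), ([[-1.0]], [0.0])]
print(mean_value_objective([1.0], scenarios, [0.5, 0.5], quadratic))  # 1.0
```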


Example: Scenario Analysis 


Given a certain planning problem, in scenario analysis [1,2,6,7,8] the future evolution or development of the system under consideration is anticipated or explored by means of a (usually small) number r (e.g., r = 3, 4, 5, 6) of so-called scenarios $s^1, \dots, s^r$. Scenarios $s^i$, $i = 1, \dots, r$, are plausible alternative models of the future development given by 'extreme points' of a certain set of basic or key variables. An individual scenario or a certain mixture of the scenarios $s^1, \dots, s^r$ is then assumed to be revealed in the considered future time period. We assume now that the planning problem can be described mathematically by the optimization problem

$$\min c^T x \quad \text{s.t. } Tx = (\le)\, h,\ x \in D. \tag{9}$$

Here, D is a given convex subset of $\mathbb{R}^n$, and the data (c, T, h) are given by $(c, T, h) = (c^i, T^i, h^i)$ for scenario $s^i$, $i = 1, \dots, r$, where $c^i$ is an n-vector, $T^i$ an $m \times n$ matrix and $h^i$ an m-vector. Having described the scenarios $s^1, \dots, s^r$ by means of (9) and the data $(c^i, T^i, h^i)$, $i = 1, \dots, r$, and therefore facing the subproblems

$$\min c^{iT} x \quad \text{s.t. } T^i x = (\le)\, h^i,\ x \in D, \tag{10}$$

for $i = 1, \dots, r$, the decision maker then has to select an appropriate decision $x \in D$. Since one is in general unable to predict with certainty which scenario $s^i$ will occur, scenario analysts look for decisions $x^0$ which are 'robust' with respect to the different scenarios, or 'scenario-independent', cf. [6,7,8]. Obviously, this robustness concept is closely related to the idea of detecting 'similarities' within the family of optimal solutions $x^*(s^i)$, $i = 1, \dots, r$, of the individual subproblems (10)(i), $i = 1, \dots, r$. Let $\alpha_i > 0$, $i = 1, \dots, r$, with $\sum_{i=1}^r \alpha_i = 1$, be (subjective) probabilities for the occurrence of $s^1, \dots, s^r$, or weights reflecting the relative importance of $s^1, \dots, s^r$. Considering loss functions $u \in C_m^J$ for evaluating the violations $z^i = T^i x - h^i$ of the constraints $T^i x = h^i$, $T^i x \le h^i$, resp., in (10)(i), a class of robust or scenario-independent decisions is obviously given by the efficient solutions of

$$\min_{x \in D} \sum_{i=1}^{r} \alpha_i \bigl(c^{iT} x + u(T^i x - h^i)\bigr), \tag{11}$$

which is a discretely distributed stochastic optimization problem of the type (1), (2).
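To see concretely why (11) is of type (1), (2), one can stack each scenario into $A^i = \binom{c^{iT}}{T^i}$ and $b^i = \binom{0}{h^i}$. The loss is then $\tilde{u}(z_0, z) = z_0 + u(z)$, which is convex because it is the sum of a linear function and a convex one. The sketch below checks numerically that both forms give the same objective value. It is plain Python, and the function names, scenario data and loss are illustrative assumptions.

```python
def scenario_objective(x, data, alpha, u):
    """Objective of (11): sum_i alpha_i * (c^i . x + u(T^i x - h^i))."""
    total = 0.0
    for (c, T, h), a in zip(data, alpha):
        z = [sum(t_k * x_k for t_k, x_k in zip(row, x)) - h_r
             for row, h_r in zip(T, h)]
        total += a * (sum(c_k * x_k for c_k, x_k in zip(c, x)) + u(z))
    return total

def as_mean_value_form(data, u):
    """Rewrite (11) as (2): A^i = [c^i; T^i], b^i = [0; h^i], loss z -> z[0] + u(z[1:])."""
    scenarios = [([list(c)] + [list(row) for row in T], [0.0] + list(h))
                 for c, T, h in data]
    return scenarios, (lambda z: z[0] + u(z[1:]))

def expected_loss(x, scenarios, alpha, loss):
    """F(x) = sum_i alpha_i * loss(A^i x - b^i), as in (2) under (8)."""
    total = 0.0
    for (A, b), a in zip(scenarios, alpha):
        z = [sum(a_k * x_k for a_k, x_k in zip(row, x)) - b_r
             for row, b_r in zip(A, b)]
        total += a * loss(z)
    return total

data = [([1.0, 0.0], [[1.0, 1.0]], [1.0]),      # scenario s^1: (c^1, T^1, h^1)
        ([0.0, 1.0], [[1.0, -1.0]], [0.0])]     # scenario s^2: (c^2, T^2, h^2)
alpha = [0.3, 0.7]

def u(z):
    return max(0.0, z[0]) ** 2                  # penalizes only T x > h

scenarios, loss = as_mean_value_form(data, u)
x = [2.0, 1.0]
value_11 = scenario_objective(x, data, alpha, u)
value_2 = expected_loss(x, scenarios, alpha, loss)   # both are about 3.2
```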


A System of Linear Relations for the Construction 
of Descent Directions 


Fundamental for the computation of the set $E_{D,\mathcal{U}}$ of efficient solutions of (1), (2) is the following construction method for descent directions of the objective function F of (1), (2), cf. [3,4]. We suppose that the true, but unknown loss function u in (1) is, see Definition 1, an element of $C_m^J$ for some known index set $J \subseteq \{1, \dots, m\}$. We recall that for any vector $z \in \mathbb{R}^m$ the subvectors $z_I$, $z_{II}$ are defined by $z_I = (z_j)_{j \in J}$, $z_{II} = (z_i)_{i \notin J}$; see (3). Of course, if $J = \emptyset$, then $z = z_{II}$ and $z_I$ does not exist. For any $m \times (n+1)$ matrix (A, b), let $(A_I, b_I)$, $(A_{II}, b_{II})$, resp., denote the submatrices of (A, b) having the rows $(A_i, b_i)$ with $i \in J$, $i \in \{1, \dots, m\} \setminus J$, respectively.

Given an n-vector x (e.g., the tth iteration point of an algorithm for solving (1), (2)), we consider, in extension of [3, system (3.1)-(3.4b)], the following system of linear relations for the unknowns $(y, \Pi)$, where $y \in \mathbb{R}^n$ and $\Pi = (\pi_{ij})$ is an auxiliary $r \times r$ matrix:


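$$\sum_{j=1}^{r} \pi_{ij} = 1, \quad \pi_{ij} \ge 0, \quad i, j = 1, \dots, r, \tag{12}$$

$$\alpha_j = \sum_{i=1}^{r} \alpha_i \pi_{ij}, \quad j = 1, \dots, r, \tag{13}$$

$$A_I^j y - b_I^j \le \sum_{i=1}^{r} \frac{\alpha_i \pi_{ij}}{\alpha_j} \bigl(A_I^i x - b_I^i\bigr), \quad j = 1, \dots, r, \tag{14}$$

$$A_{II}^j y - b_{II}^j = \sum_{i=1}^{r} \frac{\alpha_i \pi_{ij}}{\alpha_j} \bigl(A_{II}^i x - b_{II}^i\bigr), \quad j = 1, \dots, r. \tag{15}$$

The transition probability measure

$$K^j := \sum_{i=1}^{r} \frac{\alpha_i \pi_{ij}}{\alpha_j}\, \varepsilon_{z^i}, \quad z^i = A^i x - b^i, \tag{16}$$

is not a one-point measure for at least one j, $1 \le j \le r$.

There exists at least one j, $1 \le j \le r$, such that $K^j$ is not a one-point measure and $\pi_{ij} > 0$ for all $i = 1, \dots, r$. (17)

At least one inequality in (14) holds with <. (18)

The constraint $x \in D$ in (1) can be handled by adding the condition

$$y \in D. \tag{19}$$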


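Remark 4

a) By $\varepsilon_z$ we denote the one-point measure in a point $z \in \mathbb{R}^m$.

b) According to (12), $\Pi$ is a stochastic matrix. System (12)-(15) always has the trivial solution $(y, \Pi) = (x, \mathrm{Id})$, where Id is the $r \times r$ identity matrix.

c) If $(y, \Pi)$ solves (12)-(15), then

$$\bar{A}_I y \le \bar{A}_I x, \qquad \bar{A}_{II} y = \bar{A}_{II} x, \tag{20}$$

where $\bar{A}_I = E A_I(\omega)$, $\bar{A}_{II} = E A_{II}(\omega)$.

d) If $P_{A_{II}(\cdot)x - b_{II}(\cdot)}$ denotes the probability distribution of the random $(m - |J|)$-vector $A_{II}(\omega)x - b_{II}(\omega)$, then (12), (13) and (15) mean that the distributions $P_{A_{II}(\cdot)y - b_{II}(\cdot)}$ and $P_{A_{II}(\cdot)x - b_{II}(\cdot)}$ corresponding to y, x, resp., are related by

$$P_{A_{II}(\cdot)x - b_{II}(\cdot)} = K P_{A_{II}(\cdot)y - b_{II}(\cdot)} = \int K(w, \cdot)\, P_{A_{II}(\cdot)y - b_{II}(\cdot)}(dw), \tag{21}$$

where $K(w, \cdot)$ is the Markov kernel defined by

$$K(w, \cdot) := \sum_{i=1}^{r} \frac{\alpha_i \pi_{ij}}{\alpha_j}\, \varepsilon_{z^i} \quad \text{if } w = w^j, \tag{22}$$

with $w^j = A_{II}^j y - b_{II}^j$, $z^i = A_{II}^i x - b_{II}^i$, $i, j = 1, \dots, r$. Since $\int z\, K(w, dz) = w$, the Markov kernel K is also called a dilatation.

e) If n-vectors x, y are related by (21), (22), then for every convex subset $B \subseteq \mathbb{R}^{m-|J|}$ we have that

$$P_{A_{II}(\cdot)x - b_{II}(\cdot)}(B) = 1 \implies P_{A_{II}(\cdot)y - b_{II}(\cdot)}(B) = 1;$$

hence, the distribution of $A_{II}(\cdot)y - b_{II}(\cdot)$ is concentrated on the convex hull of the support of $P_{A_{II}(\cdot)x - b_{II}(\cdot)}$.

f) If $J = \emptyset$, then (14) vanishes and (15) reads

$$A^j y - b^j = \sum_{i=1}^{r} \frac{\alpha_i \pi_{ij}}{\alpha_j} \bigl(A^i x - b^i\bigr), \quad j = 1, \dots, r. \tag{23}$$

In the special case

$$(A_I^j, b_I^j) = (A_I, b_I) \quad \text{for all } j = 1, \dots, r, \tag{24}$$

i.e., if $(A_I(\omega), b_I(\omega))$ is constant with probability one, then (14) is reduced, cf. (20), to

$$A_I y \le A_I x. \tag{25}$$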
The meaning of (12)-(15) and the additional condi- 
tions (16)-(18) for the basic mean value minimization 
problem (1), (2) with objective function F is summa- 
rized in the next result. 


Theorem 5 Let J be any fixed subset of $\{1, \dots, m\}$.

a) If $(y, \Pi)$ is a solution of (12)-(15), then $F(y) \le F(x)$ for every $u \in C_m^J$. For $J = \emptyset$ the converse also holds: if there is a vector y such that $F(y) \le F(x)$ for all $u \in C_m$ ($= C_m^\emptyset$), then there exists an $r \times r$ matrix $\Pi$ such that $(y, \Pi)$ satisfies (12), (13) and (23).

b) If $(y, \Pi)$ is a solution of (12)-(15) and (16), then $F(y) < F(x)$ for every $u \in C_m^J$ which is strictly convex on $\operatorname{conv}\{z^i : 1 \le i \le r\}$.

c) If $(y, \Pi)$ is a solution of (12)-(15) and (17), then $F(y) < F(x)$ for every $u \in C_m^J$ which is not affine-linear on $\operatorname{conv}\{z^i : 1 \le i \le r\}$.

d) If $(y, \Pi)$ fulfills (12)-(15) and (18), then $F(y) < F(x)$ for every $u \in C_m^J$ satisfying (4).

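Proof If $x$ and $(y, \Pi)$ are related by (12)-(15), then $F(y) \le \sum_{j=1}^{r} \alpha_j\, u\bigl(\sum_{i=1}^{r} \frac{\alpha_i \pi_{ij}}{\alpha_j} z^i\bigr)$ for every $u \in C_\cup^J$. If $x$, $(y, \Pi)$ are related by (12)-(15) and (18), then $F(y) < \sum_{j=1}^{r} \alpha_j\, u\bigl(\sum_{i=1}^{r} \frac{\alpha_i \pi_{ij}}{\alpha_j} z^i\bigr)$ for every $u \in C_\cup^J$ fulfilling (4). The rest can then be shown as in [3, Thm. 2.2]. □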


A simple but important consequence of the above theorem is stated next:


Corollary 6 For given $x \in \mathbb{R}^n$ or $x \in D$, let $(y, \Pi)$ be any solution of (12)-(15) such that $y \ne x$, $y \in D \setminus \{x\}$, respectively.

a) Then $h = y - x$ is a descent direction, a feasible descent direction, resp., of $F$ at $x$ for every $u \in C_\cup^J$ such that $F$ is not constant on the line segment $\overline{xy}$ joining $x$ and $y$.

b) If $(y, \Pi)$ also fulfills (16), (17), (18), resp., then $h = y - x$ is a (feasible) descent direction of $F$ at $x$ for every $u \in C_\cup^J$ which is strictly convex on $\operatorname{conv}\{z^i : 1 \le i \le r\}$, is not affine-linear on $\operatorname{conv}\{z^i : 1 \le i \le r\}$, fulfills (4), respectively.


Efficient Solutions of (1), (2) 


In the following we suppose that the unknown loss function $u$ is an element of $C_\cup^J$, where $J$ is a given subset of $\{1, \ldots, m\}$. For a given point $x \in D$, the descent direction-finding procedure described in the previous section can fail completely only if for each solution $(y, \Pi)$ of (12)-(15) with $y \in D$ we have that

$$A^j y = A^j x \quad \text{for each } j = 1, \ldots, r. \tag{26}$$

Indeed, in this case we either have $y = x$, or, since $A^j\bigl(x + t(y - x)\bigr) = A^j x$ for all $t$ and $j$, the objective function $F$ of (1), (2) is constant on the whole line through the points $x$, $y$ for arbitrary loss functions $u$. This observation suggests the following basic efficiency concept.


Definition 7 A point $x \in D$ is called a $(C_\cup^J)$-efficient point or a $(C_\cup^J)$-efficient solution of (1), (2) if and only if for each solution $(y, \Pi)$ of (12)-(15) with $y \in D$ we have that $A^j y = A^j x$ for each $j = 1, \ldots, r$, i.e., $A(\omega)x = A(\omega)y$ with probability 1. Let $E_{D,J}$ denote the set of all efficient points of (1), (2).


For deriving parametric representations of $E_{D,J}$, we need the following definitions and lemmas.

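For a given $n$-vector $x$ and $z^i = A^i x - b^i$, $i = 1, \ldots, r$, let $S = S_x \subset \{1, \ldots, r\}$ with $|S| = s$ be an index set such that $\{z^i : 1 \le i \le r\} = \{z^i : i \in S\}$, where $z^i \ne z^j$ for $i, j \in S$, $i \ne j$. Defining for $i \in S$, $j = 1, \ldots, r$, the quantities

$$\bar\alpha_i := \sum_{t:\, z^t = z^i} \alpha_t, \qquad \tau_{ij} := \frac{1}{\bar\alpha_i} \sum_{t:\, z^t = z^i} \alpha_t \pi_{tj}, \qquad \beta_{ij} := \frac{\bar\alpha_i \tau_{ij}}{\alpha_j}, \tag{27}$$

we find that relations (12)-(15) can also be represented by

$$\sum_{j=1}^{r} \tau_{ij} = 1, \quad \tau_{ij} \ge 0, \quad j = 1, \ldots, r,\ i \in S, \tag{28}$$

$$\alpha_j = \sum_{i \in S} \bar\alpha_i \tau_{ij}, \quad j = 1, \ldots, r, \tag{29}$$

$$A_I^j y - b_I^j \le \sum_{i \in S} \beta_{ij} z_I^i, \quad j = 1, \ldots, r, \tag{30}$$

$$A_{II}^j y - b_{II}^j = \sum_{i \in S} \beta_{ij} z_{II}^i, \quad j = 1, \ldots, r, \tag{31}$$

where $z_I^i$ and $z_{II}^i$ denote the subvectors of $z^i$ corresponding to the row blocks $A_I$ and $A_{II}$.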
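The aggregated quantities in (27) are straightforward to compute. The following Python sketch is illustrative only (scalar outcomes, 0-based indices, hypothetical function name `aggregate`): it merges equal outcomes $z^t$ into the index set $S$, forms $\bar\alpha_i$, $\tau_{ij}$ and $\beta_{ij}$, and also builds the matrix $T^0$ of (32). For $\Pi = \mathrm{Id}$ it reproduces $T(\Pi) = T^0$, and (28), (29) can be checked numerically.

```python
def aggregate(alpha, z, Pi):
    """Compute S, alpha_bar, tau, beta of (27) and T0 of (32) for scalar outcomes.

    alpha: scenario probabilities alpha_1..alpha_r
    z:     outcomes z^t = A^t x - b^t (equal values are merged)
    Pi:    row-stochastic r x r matrix
    """
    r = len(alpha)
    S = []                                   # first index of each distinct outcome
    for t in range(r):
        if all(z[t] != z[i] for i in S):
            S.append(t)
    same = {i: [t for t in range(r) if z[t] == z[i]] for i in S}
    alpha_bar = {i: sum(alpha[t] for t in same[i]) for i in S}
    tau = {i: [sum(alpha[t] * Pi[t][j] for t in same[i]) / alpha_bar[i] for j in range(r)]
           for i in S}
    beta = {i: [alpha_bar[i] * tau[i][j] / alpha[j] for j in range(r)] for i in S}
    T0 = {i: [alpha[j] / alpha_bar[i] if z[j] == z[i] else 0.0 for j in range(r)]
          for i in S}
    return S, alpha_bar, tau, beta, T0


alpha = [0.2, 0.3, 0.5]
z = [1.0, 1.0, 4.0]                          # z^1 = z^2, so S = [0, 2] (0-based)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
S, alpha_bar, tau, beta, T0 = aggregate(alpha, z, identity)
print(S, alpha_bar)                          # [0, 2] {0: 0.5, 2: 0.5}

for j in range(3):                           # condition (29)
    assert abs(sum(alpha_bar[i] * tau[i][j] for i in S) - alpha[j]) < 1e-12

stationary = [alpha[:] for _ in range(3)]    # every row equals alpha, so (13) holds
_, _, tau2, beta2, _ = aggregate(alpha, z, stationary)
```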


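For the next lemma we still need the $s \times r$ matrix $T^0 = (\tau_{ij}^0)$ defined by

$$\tau_{ij}^0 := \begin{cases} 0, & \text{if } z^i \ne z^j, \\ \alpha_j / \bar\alpha_i, & \text{if } z^i = z^j, \end{cases} \qquad i \in S,\ j = 1, \ldots, r. \tag{32}$$


Lemma 8 Let $(y, \Pi)$ be a solution of (12)-(15), and let $T = T(\Pi) = (\tau_{ij})$ be the $s \times r$ matrix having the elements $\tau_{ij}$ given by (27). If (26) holds, then $T(\Pi) = T^0$ and (14) holds with '='.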


Lemma 8 implies the following important property of efficient solutions:


Corollary 9 Let $x \in D$ be an efficient solution of (1), (2). If $(y, \Pi)$ is any solution of (21)-(22) with $y \in D$, then $T(\Pi) = T^0$ and (14) holds with '='.


For $J = \emptyset$ we obtain the set $E_D := E_{D,\emptyset}$ of all $C_\cup$-efficient solutions of (1), (2). This set is studied in [3]. An important relationship between $E_D$ and $E_{D,J}$ for any $J \subset \{1, \ldots, m\}$ is given next:


Lemma 10 $E_{D,J} \subset E_D$ for every $J \subset \{1, \ldots, m\}$.


Comparison of Definitions 7 and 3 


Comparing the efficient solutions according to Definition 7 and the nondominated solutions according to Definition 3, first for $J = \emptyset$, i.e., $U = C_\cup$, we find the following correspondence:


Theorem 11 $E_{D,\emptyset} = E_{D,C_\cup}$.


The next corollary follows immediately from the above theorem and Lemma 10.


Corollary 12 $E_{D,J} \subset E_{D,\emptyset} = E_{D,C_\cup}$ for $J \subset \{1, \ldots, m\}$.

Considering now $U = C_\cup^J$, we have this inclusion:

Theorem 13 $E_{D,J} \supset E_{D,C_\cup^J}$ for $J \subset \{1, \ldots, m\}$.


The following inclusion follows from Corollary 12 and Theorem 13.


Corollary 14 $E_{D,C_\cup^J} \subset E_{D,J} \subset E_{D,C_\cup}$ for $J \subset \{1, \ldots, m\}$.


A converse statement to Theorem 13 can be obtained for (24):

Theorem 15 If $(A_I(\omega), b_I(\omega)) = (\bar A_I, \bar b_I)$ with probability 1, then $E_{D,J} = E_{D,C_\cup^J}$ for each $J \subset \{1, \ldots, m\}$.


Further Characterization of $E_{D,J}$


The $C_\cup^J$-efficiency of a point $x \in D$ can also be described in the following way.


Theorem 16 A point $x \in D$ is $(C_\cup^J)$-efficient if and only if for every solution $(y, \Pi)$ of (12)-(15) we have that $A^j y = A^j x$ for all $j = 1, \ldots, r$, or $h = y - x$ is not a feasible direction for $D$ at $x$.


Necessary Optimality Conditions 
Without Using (Sub)Gradients 


If $x \in D$ is efficient, then, cf. Theorem 16, the descent direction-finding method described in the previous section fails at $x$. Since, in particular, no feasible descent direction can exist at an optimal solution $x^*$ of (1), (2), efficient points are candidates for optimal solutions:


Theorem 17 Suppose that for every $x \in D$ and every solution $(y, \Pi)$ of (12)-(15) with $y \in D$ the objective function $F$ of (1), (2) with a loss function $u \in C_\cup^J$ is constant on the line segment $\overline{xy}$ if and only if $A^j y = A^j x$ for every $j = 1, \ldots, r$. If $x^*$ is an optimal solution of (1), (2), then $x^* \in E_{D,J}$.


Remark 18 The assumption in Theorem 17 concerning $F$ is fulfilled, e.g., if $u \in C_\cup^J$ is strictly convex on the convex hull $\operatorname{conv}\{\overline{(A^j y - b^j)(A^j x - b^j)} : x, y \in D,\ 1 \le j \le r\}$ generated by the line segments $\overline{(A^j y - b^j)(A^j x - b^j)}$ joining $A^j y - b^j$ and $A^j x - b^j$.

If the assumption in Theorem 17 concerning $F$ does not hold, then $F$ may be constant on a certain line segment $\overline{xy}$ even though $A^j y \ne A^j x$ for at least one index $j$, $1 \le j \le r$. Hence, Theorem 17 cannot be applied directly in that case. However, the following modification of Theorem 17 then holds true.


Theorem 19 Let $u$ be an arbitrary loss function from $C_\cup^J$ for some $J \subset \{1, \ldots, m\}$. If $D$ is a compact convex subset of $\mathbb{R}^n$, then there exists at least one optimal solution $x^*$ of (1), (2) lying in the closure $\overline{E}_{D,J}$ of the set $E_{D,J}$ of efficient solutions of (1), (2).


Parametric Representation of $E_{D,J}$


Suppose that $u \in C_\cup^J$ for some index set $J \subset \{1, \ldots, m\}$. For solving the descent direction-generating relations (12)-(15) and (16)-(18), resp., we may use, see Theorem 5 and Corollary 6, the quadratic program, cf. [3,4],


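$$\begin{aligned}
\min\ & \eta^T(\bar A_I y - \bar A_I x) + \sum_{j=1}^{r} \sum_{i \in S} (\tau_{ij} - \tau_{ij}^0)^2 \\
\text{s.t.}\ & \sum_{j=1}^{r} \tau_{ij} = 1, \quad \tau_{ij} \ge 0, \quad j = 1, \ldots, r,\ i \in S, \\
& \alpha_j = \sum_{i \in S} \bar\alpha_i \tau_{ij}, \quad j = 1, \ldots, r, \\
& A_I^j y - b_I^j \le \sum_{i \in S} \beta_{ij} z_I^i, \quad j = 1, \ldots, r, \\
& A_{II}^j y - b_{II}^j = \sum_{i \in S} \beta_{ij} z_{II}^i, \quad j = 1, \ldots, r, \\
& y \in D,
\end{aligned} \tag{33}$$

where $\eta = (\eta_l)$ is a $|J|$-vector having fixed positive components $\eta_l$, $l \in J$. Efficient solutions of (1), (2) can be characterized as follows: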


Lemma 20 A vector $x \in D$ is an efficient solution of (1), (2) if and only if (33) has an optimal solution $(y^*, T^*)$ such that $A^j y^* = A^j x$ for all $j = 1, \ldots, r$.


Remark 21 According to Lemma 8 we then also have $T^* = T^0$, and (14) holds with '='.


We suppose now that the feasible domain $D$ of (1), (2) is given by

$$D = \{x \in \mathbb{R}^n : g_k(x) \le 0,\ k = 1, \ldots, K\}. \tag{34}$$

Here, $g_1, \ldots, g_K$ are differentiable, convex functions. Moreover, we suppose that (33) has a feasible solution $(y, T)$ such that for each $g_k$ that is not affine-linear

$$g_k(y) < 0. \tag{35}$$

No constraint qualifications are needed in the important special case $D = \{x \in \mathbb{R}^n : Gx \le g\}$, where $(G, g)$ is a given $K \times (n + 1)$ matrix.

By means of the Kuhn-Tucker conditions of (33), the following parametric representation of $E_{D,J}$ can be derived [3,4]:


Theorem 22 Let $D$ be given by (34), and assume that the constraint qualification (35) holds for every $x \in D$. An $n$-vector $x$ is an efficient solution of (1), (2) if and only if $x$ satisfies the linear relations


a 

| 

> 

| 
, ae 
£ | 

| 
8|>) 
—— 

N. 

II 

N 
os 
R|- 

| 
2 \- 
n"’” 


ife! # zi, 
where Aj, ..., A, are arbitrary real parameters, and the 


parameter m-vectors y1,..., yr and further parameter 
vectors p € R“, y € R" are selected such that 


(37) 


r K 
DIANA + YE oeVon(y) = 0, 


(38) 
j=l k=1 
yj = 9, j=l,....r, (39) 
$$g_k(x) \le 0, \quad k = 1, \ldots, K, \tag{40}$$

<0, = 0, = 0, 

gly) S Prgk(y) Pk iis 
| <a rere “oh 
ASA, FH Mecast (42) 


a as nty; 7 
and the vectors y; are defined by yj = erag jah 
agit 
A stochastic combinatorial optimization problem is of the form

$$\min\ F(x) = \int H(x, v)\, d\mu_x(v) \quad \text{s.t. } x \in S, \tag{1}$$

where $S = \{x_1, \ldots, x_N\}$ is a finite discrete feasible set. If the value of the objective function $F$ is easily obtainable, the problem is just a deterministic combinatorial optimization problem. In most applications, however, the value of $F$ has to be approximated by numerical integration or Monte-Carlo simulation.

Some problems of type (1) exhibit a special structure that can be exploited by solution methods, like the stochastic linear optimization problems, where $S$ consists of the integer points of a convex polyhedron and $H$ is piecewise linear (see ▶ Stochastic integer programming: Continuity, stability, rates of convergence). In this contribution, we discuss problems with an arbitrary and unstructured feasible set $S$.

An example is the stochastic single machine tardiness problem (SSMTP): the optimal sequence of $m$ jobs, which are processed on a single machine, has to be found. Each job has a random processing time, which is distributed according to the distribution function $G_i$, $i = 1, \ldots, m$ (independently of all others), and a fixed due date $d_i$. The feasible set $S$ is the set of all $m!$ permutations $\pi$ of $\{1, \ldots, m\}$. If $\pi$ is the solution found, we process job $\pi(1)$ first, job $\pi(2)$ second, and so on.


Let $c_i(u)$ be the cost for job $i$ being late $u$ time units ($c_i(u) = 0$ for $u \le 0$). The SSMTP is

$$\min \sum_{i=1}^{m} E\bigl[c_{\pi(i)}(V_{\pi(1)} + \cdots + V_{\pi(i)} - d_{\pi(i)})\bigr] \quad \text{s.t. } \pi \in S, \tag{2}$$

where the $V_i$ are random variables distributed independently according to $G_i$. The analytic calculation of the objective function (OF) in (2) involves multiple integrals (the convolution of up to $m$ distribution functions). A simple way of approximating the OF is by Monte-Carlo simulation. Let $V_i^{(1)}, \ldots, V_i^{(n)}$ be independent random (pseudorandom) variables, each with distribution $G_i$. The true expectation $F(\pi) = \sum_{i=1}^{m} E[c_{\pi(i)}(V_{\pi(1)} + \cdots + V_{\pi(i)} - d_{\pi(i)})]$ is approximated by the estimate

$$\hat F_n(\pi) = \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{m} c_{\pi(i)}\bigl(V_{\pi(1)}^{(j)} + \cdots + V_{\pi(i)}^{(j)} - d_{\pi(i)}\bigr).$$
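The estimator $\hat F_n(\pi)$ translates directly into code. The Python sketch below rests on assumptions of our own: the names `estimate_ssmtp`, `tardiness` and `best_by_estimate` are hypothetical, the lateness cost is taken to be plain tardiness $c_i(u) = \max(u, 0)$, the instance (exponential processing times) is invented, and the best permutation is found by enumerating all $m!$ orders, which is only sensible for very small $m$.

```python
import itertools
import random


def estimate_ssmtp(perm, sample_time, due, cost, n, rng):
    """Monte-Carlo estimate hat F_n(perm) of the SSMTP objective (2).

    perm:        processing order, perm[0] is processed first
    sample_time: sample_time[i](rng) draws a processing time of job i from G_i
    due, cost:   due dates d_i and lateness costs c_i (with c_i(u) = 0 for u <= 0)
    """
    total = 0.0
    for _ in range(n):                   # replication j = 1..n
        t = 0.0
        for job in perm:
            t += sample_time[job](rng)   # completion time V_pi(1) + ... + V_pi(i)
            total += cost[job](t - due[job])
    return total / n


def tardiness(u):
    return max(u, 0.0)                   # hypothetical cost: c_i(u) = u for u > 0, else 0


def best_by_estimate(m, sample_time, due, cost, n, seed=0):
    """Enumerate all m! permutations and return the one with the smallest estimate."""
    rng = random.Random(seed)
    return min(itertools.permutations(range(m)),
               key=lambda p: estimate_ssmtp(p, sample_time, due, cost, n, rng))


# Hypothetical instance: exponential processing times with means 2, 1, 3.
means, due = [2.0, 1.0, 3.0], [2.0, 2.0, 4.0]
sample_time = [lambda rng, mu=mu: rng.expovariate(1.0 / mu) for mu in means]
cost = [tardiness] * 3
best = best_by_estimate(3, sample_time, due, cost, n=2000)
print(best)

# Sanity check with deterministic processing times equal to the means.
fixed = [lambda rng, v=v: v for v in means]
exact = estimate_ssmtp((0, 1, 2), fixed, due, cost, n=1, rng=random.Random(0))
print(exact)                             # 3.0: tardiness 0 + 1 + 2
```

With deterministic processing times the estimate equals the exact objective, which gives a simple sanity check; with random times and a small $n$, the selected permutation may change from one seed to another.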

In principle, all exact (branch and bound) and heuristic (evolutionary algorithms, tabu search, ant systems, random search, simulated annealing, genetic algorithms) methods for combinatorial optimization may be applied to stochastic combinatorial optimization; the only difference is that the exact values $F(x)$ have to be replaced by stochastic estimates $\hat F_n(x)$ based on sample size $n$.

The main difficulty in stochastic combinatorial optimization is that even if $F(x_1) < F(x_2) - \delta$, it may happen with positive probability that $\hat F_n(x_1) > \hat F_n(x_2)$, that is, we may wrongly conclude that $x_2$ is better than $x_1$. The probability of this error decreases to zero as the sample size $n$ increases to infinity. In stochastic optimization, a compromise has to be found between the quality of the solution and the cost of very large samples.

If the random distribution $\mu_x$ in (1) does not depend on $x$, common random numbers may be used. To be more precise, let $V^{(1)}, \ldots, V^{(n)}$ be a sample from $\mu$ and let

$$\hat F_n(x_i) = \frac{1}{n} \sum_{j=1}^{n} H(x_i, V^{(j)}).$$

The estimates $\hat F_n$ are now correlated, and the probability that $\hat F_n(x_1) > \hat F_n(x_2)$ although $F(x_1) < F(x_2) - \delta$

752 


Discrete Stochastic Optimization 


is typically much smaller for such a choice than with samples taken independently for each $x_i$ (see [6]).
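The effect of common random numbers can be seen on a toy problem. In the hypothetical Python sketch below, $H(x, v) = (v - x)^2$ with $V \sim N(0, 1)$, so that $F(x) = 1 + x^2$ and $F(0) < F(0.3)$; the function `wrong_order_rate` (an illustrative name) counts how often the estimates rank the two points in the wrong order, once with a shared sample and once with independent samples.

```python
import random


def H(x, v):
    """Hypothetical integrand: F(x) = E[(V - x)^2] = 1 + x^2 for V ~ N(0, 1)."""
    return (v - x) ** 2


def wrong_order_rate(x1, x2, n, trials, common, seed=0):
    """Fraction of trials with hat F_n(x1) > hat F_n(x2), although F(x1) < F(x2)."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        v1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # common random numbers: reuse the same sample for both points
        v2 = v1 if common else [rng.gauss(0.0, 1.0) for _ in range(n)]
        f1 = sum(H(x1, v) for v in v1) / n
        f2 = sum(H(x2, v) for v in v2) / n
        wrong += f1 > f2
    return wrong / trials


crn = wrong_order_rate(0.0, 0.3, n=20, trials=2000, common=True)
ind = wrong_order_rate(0.0, 0.3, n=20, trials=2000, common=False)
print(crn, ind)
```

For this integrand the two error probabilities can be worked out by hand for $n = 20$: with common random numbers the wrong ranking occurs exactly when the sample mean of $V$ exceeds $0.15$ (probability about 0.25), whereas with independent samples it is about 0.42, so the simulated rates should land near these values.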

If the estimates $\hat F$ are difficult to obtain (e.g., they require real observations or expensive simulation), allocation rules decide which estimate or which set of estimates has to be taken next. These rules try to quickly exclude subsets of the feasible set which, with high statistical evidence, do not contain optimal solutions. The effort is then concentrated on the (shrinking) set of not yet excluded points. Allocation rules may be based on subset selection (see [3]) or ordinal optimization (see [5]). There is also a connection to experimental design, in particular to sequential experimental design: in experimental design one has to choose the next point(s) for sampling which, based on the information gathered so far, will give the best additional information needed to solve the underlying estimation or optimization problem (for experimental design literature see [1] and the references therein).
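As a rough illustration of an allocation rule, the Python sketch below uses a simple successive-elimination scheme with normal-approximation confidence intervals. This scheme is an assumption chosen for brevity, not the subset-selection or ordinal-optimization procedures of [3] and [5]; the names `successive_elimination` and `sample`, the width factor `z`, and the test problem are hypothetical.

```python
import math
import random
import statistics


def successive_elimination(sample, candidates, rounds, z=3.0, n0=5, seed=0):
    """Allocate observations only to points that are not yet excluded.

    sample(x, rng) returns one noisy observation of F(x) (smaller is better).
    A point is excluded once its lower confidence bound exceeds the smallest
    upper confidence bound among the remaining points.
    """
    rng = random.Random(seed)
    obs = {x: [sample(x, rng) for _ in range(n0)] for x in candidates}

    def bounds(x):
        mean = statistics.fmean(obs[x])
        half = z * statistics.stdev(obs[x]) / math.sqrt(len(obs[x]))
        return mean - half, mean + half

    alive = set(candidates)
    for _ in range(rounds):
        if len(alive) == 1:
            break
        for x in alive:
            obs[x].append(sample(x, rng))          # one more observation each
        best_upper = min(bounds(x)[1] for x in alive)
        alive = {x for x in alive if bounds(x)[0] <= best_upper}
    return alive, {x: len(obs[x]) for x in candidates}


# Hypothetical test problem: F(x) = 3x, observed with standard normal noise.
noisy = lambda x, rng: 3.0 * x + rng.gauss(0.0, 1.0)
alive, counts = successive_elimination(noisy, [0, 1, 2, 3], rounds=50)
print(alive, counts)
```

The point with the smallest upper bound always survives, so the candidate set never becomes empty; clearly inferior points drop out after a few rounds and stop consuming observations.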

For large sets $S$ which have graph-neighborhood or partition structures, 'stochastic' variants of neighborhood search or branch and bound methods may be used. In particular, stochastic simulated annealing and stochastic branch and bound have been studied in the literature.


Stochastic Simulated Annealing 


This is a variant of ordinary simulated annealing (cf. ▶ Simulated annealing): the Metropolis rule for the acceptance probability is calculated on the basis of the current stochastic estimates of the objective function, i.e., the new state $x_j$ is preferred to the current state $x_i$ with probability

$$\min\Bigl\{1, \exp\Bigl(-\frac{\hat F_n(x_j) - \hat F_n(x_i)}{k_B T}\Bigr)\Bigr\},$$

where $k_B$ is the Boltzmann constant and $T$ is the temperature. The estimates $\hat F_n$ are improved in each step by taking additional observations, i.e., by increasing the sample size $n$. For an analysis of this algorithm see [4].
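Below is a compact Python sketch of stochastic simulated annealing. It is an illustration, not the algorithm analysed in [4]: the cooling schedule $T_k = T_0/\log(k+2)$, the growth rule $n_k = n_0 + \lfloor k/10 \rfloor$, folding $k_B$ into $T$, and the test objective (number of inversions of a permutation, observed with Gaussian noise) are all assumptions. For simplicity both estimates are redrawn in every step rather than updated incrementally.

```python
import math
import random


def stochastic_annealing(x0, neighbor, estimate, iters, T0, n0=1, seed=0):
    """Simulated annealing driven by noisy estimates hat F_n (k_B absorbed into T).

    estimate(x, n, rng) returns an estimate of F(x) from n observations;
    n grows with the iteration count, so the estimates become more precise.
    """
    rng = random.Random(seed)
    x = x0
    for k in range(iters):
        n = n0 + k // 10                       # increasing sample size
        T = T0 / math.log(k + 2)               # logarithmic cooling schedule
        y = neighbor(x, rng)
        delta = estimate(y, n, rng) - estimate(x, n, rng)
        # Metropolis rule: accept with probability min(1, exp(-delta / T))
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            x = y
    return x


def inversions(p):
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))


def adjacent_swap(p, rng):
    i = rng.randrange(len(p) - 1)
    q = list(p)
    q[i], q[i + 1] = q[i + 1], q[i]
    return tuple(q)


def noisy_inversions(sd):
    # hypothetical objective: F(p) = number of inversions, observed with N(0, sd^2) noise
    def estimate(p, n, rng):
        return inversions(p) + sum(rng.gauss(0.0, sd) for _ in range(n)) / n
    return estimate


clean = stochastic_annealing((4, 3, 2, 1, 0), adjacent_swap, noisy_inversions(0.0),
                             iters=2000, T0=1e-9)
noisy = stochastic_annealing((4, 3, 2, 1, 0), adjacent_swap, noisy_inversions(1.0),
                             iters=2000, T0=0.5)
print(clean, noisy, inversions(noisy))
```

With zero noise and a tiny temperature the procedure reduces to a greedy descent over adjacent swaps, which always reaches the sorted permutation; with noise, the growing sample size is what lets the acceptance decisions become reliable over time.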


Stochastic Branch and Bound 


For the implementation of a stochastic branch and bound method (cf. also ▶ Integer programming: Branch and bound methods), an estimate of a lower bound function is needed. Recall that a function $\underline F$, defined on the subsets of $S$, is called a lower bound function if

$$\inf\{F(x) : x \in T\} \ge \underline F(T) \quad \text{for all } T \subseteq S.$$

In stochastic branch and bound an estimate $\hat{\underline F}$ of $\underline F$ can be found, for instance, by sampling $\hat F_n(x_i)$ for each $x_i$ in $T$ with $E(\hat F_n(x_i)) = F(x_i)$ and setting

$$\hat{\underline F}(T) = \inf\{\hat F_n(x_i) : x_i \in T\}.$$

Since the expectation of a minimum is at most the minimum of the expectations, $E(\hat{\underline F}(T)) \le \inf\{F(x) : x \in T\}$, so this estimate is a lower bound in expectation. The bound step of the branch and bound method is replaced by a statistical test of whether the lower bound estimate of a branch is significantly larger than the estimate of an intermediate solution. After each step, all estimates are improved by taking additional observations. For details see [2] and [7].
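The lower bound estimate and a bound step can be sketched in a few lines of Python. The statistical test is left unspecified in the text; the two-sample z-type comparison in `significantly_larger` is an assumption for illustration, and the subset $T$, the noise model and the sample values are hypothetical. The simulation confirms that the average of $\hat{\underline F}(T)$ lies below $\min_{x \in T} F(x)$.

```python
import math
import random
import statistics


def lower_bound_estimate(T, estimate, n, rng):
    """hat underline F(T) = min over x in T of hat F_n(x)."""
    return min(estimate(x, n, rng) for x in T)


def significantly_larger(a, b, z=2.0):
    """Hypothetical bound-step test: True if mean(a) exceeds mean(b) by more
    than z estimated standard errors of the difference."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return statistics.fmean(a) - statistics.fmean(b) > z * se


# Hypothetical subset T with true values F(x) = x, observed with N(0, 1) noise.
T = [1.0, 1.2, 1.5]
noisy_mean = lambda x, n, rng: x + sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n

rng = random.Random(0)
trials = 5000
avg = sum(lower_bound_estimate(T, noisy_mean, 1, rng) for _ in range(trials)) / trials
print(avg, min(T))          # the average lies below min F = 1.0

# Bound step: prune a branch whose lower-bound samples clearly exceed the incumbent's.
pruned = significantly_larger([5.0, 5.1, 4.9, 5.2], [1.0, 1.1, 0.9, 1.0])
print(pruned)               # True
```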

In all these algorithms, common random numbers 
may decrease the variance. 


See also 


▶ Derivatives of Markov Processes and Their Simulation

▶ Derivatives of Probability and Integral Functions: General Theory and Examples

▶ Derivatives of Probability Measures

▶ Optimization in Operation of Electric and Energy Power Systems
Abstract


In this chapter, we present classification models based 
on mathematical programming approaches. We first 
provide an overview of various mathematical pro- 
gramming approaches, including linear programming, 
mixed integer programming, nonlinear programming, 
and support vector machines. Next, we present our 
effort of novel optimization-based classification mod- 
els that are general purpose and suitable for develop- 


ing predictive rules for large heterogeneous biological 
and medical data sets. Our predictive model simultane- 
ously incorporates (1) the ability to classify any num- 
ber of distinct groups; (2) the ability to incorporate 
heterogeneous types of attributes as input; (3) a high- 
dimensional data transformation that eliminates noise 
and errors in biological data; (4) the ability to in- 
corporate constraints to limit the rate of misclassifi- 
cation, and a reserved-judgment region that provides 
a safeguard against overtraining (which tends to lead 
to high misclassification rates from the resulting pre- 
dictive rule); and (5) successive multistage classification 
capability to handle data points placed in the reserved- 
judgment region. To illustrate the power and flexibil- 
ity of the classification model and solution engine, and 
its multigroup prediction capability, application of the 
predictive model to a broad class of biological and med- 
ical problems is described. Applications include the dif- 
ferential diagnosis of the type of erythemato-squamous 
diseases; predicting presence/absence of heart disease; 
genomic analysis and prediction of aberrant CpG is- 
land methylation in human cancer; discriminant anal- 
ysis of motility and morphology data in human lung 
carcinoma; prediction of ultrasonic cell disruption for 
drug delivery; identification of tumor shape and volume 
in treatment of sarcoma; multistage discriminant analysis of biomarkers for prediction of early atherosclerosis; fingerprinting of native and angiogenic microvascular networks for early diagnosis of diabetes, aging, macular degeneration, and tumor metastasis; prediction
of protein localization sites; and pattern recognition of 
satellite images in classification of soil types. In all these 
applications, the predictive model yields correct classi- 
fication rates ranging from 80 to 100%. This provides 
motivation for pursuing its use as a medical diagnostic, 
monitoring and decision-making tool. 


Introduction 


Classification is a fundamental machine learning task 
whereby rules are developed for the allocation of in- 
dependent observations to groups. Classic examples of 
applications include medical diagnosis (the allocation of patients to disease classes on the basis of symptoms and laboratory tests) and credit screening (the acceptance or rejection of credit applications on the basis of applicant data). Data are collected concerning observations with known group membership. These training data are used to develop rules for the classification of future observations with unknown group membership.
In this introduction, we briefly describe some terminology related to classification and outline the organization of this chapter.


Pattern Recognition, Discriminant Analysis, 
and Statistical Pattern Classification 


Cognitive science is the science of learning, knowing, 
and reasoning. Pattern recognition is a broad field 
within cognitive science, which is concerned with the 
process of recognizing, identifying, and categorizing in- 
put information. These areas intersect with computer 
science, particularly in the closely related areas of arti- 
ficial intelligence, machine learning, and statistical pat- 
tern recognition. Artificial intelligence is associated with 
constructing machines and systems that reflect human 
abilities in cognition. Machine learning refers to how 
these machines and systems replicate the learning pro- 
cess, which is often achieved by seeking and discovering 
patterns in data, or statistical pattern recognition. 

Discriminant analysis is the process of discriminat- 
ing between categories or populations. Associated with 
discriminant analysis as a statistical tool are the tasks of 
determining the features that best discriminate between 
populations, and the process of classifying new objects 
on the basis of these features. The former is often called 
feature selection and the latter is referred to as statisti- 
cal pattern classification. This work will be largely con- 
cerned with the development of a viable statistical pat- 
tern classifier. 

As with many computationally intensive tasks, recent advances in computing power have led to a sharp increase in interest in, and application of, discriminant analysis techniques. The reader is referred to
Duda et al. [25] for an introduction to various tech- 
niques for pattern classification, and to Zopounidis and 
Doumpos [121] for examples of applications of pattern 
classification. 


Supervised Learning, Training, 
and Cross-Validation 


An entity or observation is essentially a data point as 
commonly understood in statistics. In the framework 
of statistical pattern classification, an entity is a set 


of quantitative measurements (or qualitative measure- 
ments expressed quantitatively) of attributes for a par- 
ticular object. As an example, in medical diagnosis an 
entity could be the various blood chemistry levels of 
a patient. With each entity is associated one or more 
groups (or populations, classes, categories) to which it 
belongs. Continuing with the medical diagnosis exam- 
ple, the groups could be the various classes of heart dis- 
ease. Statistical classification seeks to determine rules 
for associating entities with the groups to which they 
belong. Ideally, these associations align with the asso- 
ciations that human reasoning would produce on the 
basis of information gathered on objects and their ap- 
parent categories. 

Supervised learning is the process of developing 
classification rules based on entities for which the clas- 
sification is already known. Note that the process im- 
plies that the populations are already well defined. 
Unsupervised learning is the process of discovering pat- 
terns from unlabeled entities and thereby discover- 
ing and describing the underlying populations. Mod- 
els derived using supervised learning can be used for 
both functions of discriminant analysis - feature selec- 
tion and classification. The model that we consider is 
a method for supervised learning, so we assume that 
populations are previously defined. 

The set of entities with known classification that is 
used to develop classification rules is the training set. 
The training set may be partitioned so that some enti- 
ties are withheld during the model-development pro- 
cess, also known as the training of the model. The with- 
held entities form a test set that is used to determine 
the validity of the model, a process known as cross- 
validation. Entities from the test set are subjected to the 
rules of classification to measure the performance of the 
rules on entities with unknown group membership. 

Validation of classification models is often performed using m-fold cross-validation, where the data with known classification are partitioned into m folds (subsets) of approximately equal size. The classification model is trained m times, with a different fold withheld for testing during each run. The performance of the model is evaluated by the classification accuracy on the m test folds and can be represented using a classification matrix or confusion matrix.

The classification matrix is a square matrix with the number of rows and columns equal to the number of groups. The ijth entry of the classification matrix contains the number or proportion of test entities from group i that were classified by the model as belonging to group j. Therefore, the number or proportion of correctly classified entities is contained in the diagonal elements of the classification matrix, and the number or proportion of misclassified entities is in the off-diagonal entries.
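To make the m-fold procedure and the classification matrix concrete, here is a small Python sketch. The classifier is a hypothetical stand-in (one centroid per group on one-dimensional data), not the mixed integer programming model of this chapter, and the data are invented; fold k is withheld in run k, and every test prediction is tallied in the matrix `C`.

```python
from collections import defaultdict


def fit_centroids(train):
    """Hypothetical stand-in classifier: one centroid per group (1-D data)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, g in train:
        sums[g] += x
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def predict(centroids, x):
    return min(centroids, key=lambda g: abs(x - centroids[g]))


def cross_validate(data, groups, m):
    """m-fold cross-validation; returns the classification (confusion) matrix C,
    where C[i][j] counts test entities of group i classified into group j."""
    folds = [data[k::m] for k in range(m)]          # m folds of roughly equal size
    index = {g: k for k, g in enumerate(groups)}
    C = [[0] * len(groups) for _ in groups]
    for k in range(m):                               # run k withholds fold k
        train = [e for l, fold in enumerate(folds) if l != k for e in fold]
        model = fit_centroids(train)
        for x, g in folds[k]:
            C[index[g]][index[predict(model, x)]] += 1
    return C


data = [(i / 10, "A") for i in range(10)] + [(10 + i / 10, "B") for i in range(10)]
data.append((9.0, "A"))                              # an atypical member of group A
C = cross_validate(data, ["A", "B"], m=5)
accuracy = sum(C[i][i] for i in range(2)) / sum(map(sum, C))
print(C, accuracy)                                   # C == [[10, 1], [0, 10]]
```

The single off-diagonal entry records the atypical entity of group A that falls closer to the group B centroid, and the accuracy is the diagonal sum divided by the number of test entities.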


Bayesian Inference and Classification 


The popularity of Bayesian inference has risen drasti- 
cally over the past several decades, perhaps in part due 
to its suitability for statistical learning. The reader is 
referred to O'Hagan [92] for a thorough treatment of 
Bayesian inference. Bayesian inference is usually con- 
trasted against classical inference, though in practice 
they often imply the same methodology. 

The Bayesian method relies on a subjective view of probability, as opposed to the frequentist view upon which classical inference is based [92]. A subjective probability describes a degree of belief in a proposition held by the investigator based on some information. A frequency probability describes the limiting relative frequency of an event over an infinite number of trials.

In Bayesian statistics, inferences are based on the posterior distribution. The posterior distribution is proportional to the product of the prior probability and the likelihood function. The prior probability distribution represents the initial degree of belief in a proposition, often before empirical data are considered. The likelihood function describes the likelihood that the behavior is exhibited, given that the proposition is true. The posterior distribution describes the likelihood that the proposition is true, given the observed behavior.

Suppose we have a proposition or random variable $\theta$ about which we would like to make inferences, and data $x$. Application of Bayes's theorem gives

$$dF(\theta \mid x) = \frac{dF(\theta)\, dF(x \mid \theta)}{dF(x)}.$$

Here, $F$ denotes the (cumulative) distribution function. For ease of conceptualization, assume that $F$ is differentiable with density $f$; then the above equality can be rewritten as

$$f(\theta \mid x) = \frac{f(\theta)\, f(x \mid \theta)}{f(x)}.$$


For classification, a prior probability function $\pi(g)$ describes the likelihood that an entity is allocated to group $g$ regardless of its exhibited feature values $x$. A group density function $f(x \mid g)$ describes the likelihood that an entity exhibits certain measurable attribute values, given that it belongs to population $g$. The posterior distribution for a group, $P(g \mid x)$, is given by the product of the prior probability and group density function, normalized over the groups to obtain a unit probability over all groups. The observation $x$ is allocated to group $h$ if

$$h = \arg\max_{g \in G} P(g \mid x) = \arg\max_{g \in G} \frac{\pi(g)\, f(x \mid g)}{\sum_{j \in G} \pi(j)\, f(x \mid j)},$$

where $G$ denotes the set of groups.


Discriminant Functions 


Most classification methods can be described in terms of discriminant functions. A discriminant function takes as input an observation and returns information about the classification of the observation. For data from a set of groups $G$, an observation $x$ is assigned to group $h$ if $h = \arg\max_{g \in G} l_g(x)$, where the functions $l_g$ are the discriminant functions. Classification methods restrict the form of the discriminant functions, and training data are used to determine the values of the parameters that define the functions.

The optimal classifier in the Bayesian framework can be described in terms of discriminant functions. Let $\pi_g = \pi(g)$ be the prior probability that an observation is allocated to group $g$, and let $f_g(x) = f(x \mid g)$ be the likelihood that data $x$ are drawn from population $g$. If we wish to minimize the probability of misclassification given $x$, then the optimal allocation for an entity is to the group

$$h = \arg\max_{g \in G} P(g \mid x) = \arg\max_{g \in G} \frac{\pi_g f_g(x)}{\sum_{j \in G} \pi_j f_j(x)}.$$

Under the Bayesian framework,

$$P(g \mid x) = \frac{\pi_g f(x \mid g)}{f(x)} = \frac{\pi_g f(x \mid g)}{\sum_{j \in G} \pi_j f(x \mid j)}.$$

The discriminant functions can be $l_g(x) = P(g \mid x)$ for $g \in G$. The same classification rule is given by $l_g(x) = \pi_g f(x \mid g)$ and by $l_g(x) = \log f(x \mid g) + \log \pi_g$, because the denominator $f(x)$ does not depend on $g$ and the logarithm is strictly increasing. The problem then becomes finding the form of the prior functions and likelihood functions that match the data.
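The equivalence of the three discriminant functions can be checked numerically. The Python sketch below assumes a hypothetical two-group model with normal group densities (priors 0.3 and 0.7, means 0 and 2, unit variance; the names `MODEL`, `RULES` and `allocate` are illustrative) and allocates a few observations with each rule.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical two-group model: group -> (prior pi_g, mean, standard deviation).
MODEL = {"g1": (0.3, 0.0, 1.0), "g2": (0.7, 2.0, 1.0)}


def posterior(x):
    joint = {g: p * normal_pdf(x, mu, s) for g, (p, mu, s) in MODEL.items()}
    fx = sum(joint.values())                      # f(x) = sum_j pi_j f(x | j)
    return {g: v / fx for g, v in joint.items()}


# Three equivalent choices of the discriminant function l_g(x).
RULES = {
    "posterior": lambda g, x: posterior(x)[g],
    "joint": lambda g, x: MODEL[g][0] * normal_pdf(x, MODEL[g][1], MODEL[g][2]),
    "log": lambda g, x: (math.log(normal_pdf(x, MODEL[g][1], MODEL[g][2]))
                         + math.log(MODEL[g][0])),
}


def allocate(x, rule):
    """Assign x to the group h maximizing l_g(x) under the chosen rule."""
    return max(MODEL, key=lambda g: RULES[rule](g, x))


for x in [-1.0, 0.5, 1.0, 3.0]:
    print(x, {name: allocate(x, name) for name in RULES})
```

All three rules switch from g1 to g2 at the point where $\pi_1 f(x \mid 1) = \pi_2 f(x \mid 2)$, which for these parameters is $x = 1 - \tfrac12 \log(7/3) \approx 0.576$.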
If the data are multivariate normal with equal covariance matrices ($f(x \mid g) \sim N(\mu_g, \Sigma)$), then a linear discriminant function (LDF) is optimal:

$$\begin{aligned}
l_g(x) &= \log f(x \mid g) + \log \pi_g \\
&= -\tfrac12 (x - \mu_g)^T \Sigma^{-1} (x - \mu_g) - \tfrac12 \log|\Sigma| - \tfrac{d}{2} \log 2\pi + \log \pi_g \\
&= w_g^T x + w_{g0} + c(x),
\end{aligned}$$

where $d$ is the number of attributes, $w_g = \Sigma^{-1} \mu_g$, $w_{g0} = -\tfrac12 \mu_g^T \Sigma^{-1} \mu_g + \log \pi_g$, and $c(x) = -\tfrac12 x^T \Sigma^{-1} x - \tfrac12 \log|\Sigma| - \tfrac{d}{2} \log 2\pi$. Since $c(x)$ is the same for all $g$, it does not affect the maximizing group and need not be calculated. When there are two groups ($G = \{1, 2\}$) and the priors are equal ($\pi_1 = \pi_2$), the discriminant rule is equivalent to Fisher's linear discriminant rule [30]. Fisher's rule can also be derived, as it was by Fisher, by choosing $w$ so that $(w^T \mu_1 - w^T \mu_2)^2 / (w^T \Sigma w)$ is maximized.
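The following Python sketch checks the LDF algebra for $d = 2$ with hypothetical parameters (shared covariance $\Sigma$, equal priors); the helper names are illustrative. It computes $w_g = \Sigma^{-1}\mu_g$ and $w_{g0} = -\tfrac12\mu_g^T\Sigma^{-1}\mu_g + \log\pi_g$, verifies that the difference $l_1(x) - l_2(x)$ of the linear functions equals that of the full log-density discriminants (the terms that do not depend on $g$ cancel), and compares $w_1 - w_2$ with Fisher's direction $\Sigma^{-1}(\mu_1 - \mu_2)$, which maximizes $(w^T\mu_1 - w^T\mu_2)^2/(w^T\Sigma w)$.

```python
import math


def inv2(S):
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def ldf(mu, Sigma, prior):
    """Coefficients (w_g, w_g0) of the linear part of l_g(x)."""
    w = matvec(inv2(Sigma), mu)                    # w_g = Sigma^{-1} mu_g
    return w, -0.5 * dot(mu, w) + math.log(prior)   # w_g0


def log_density_plus_prior(x, mu, Sigma, prior):
    """Full l_g(x) = log f(x | g) + log pi_g for a bivariate normal (d = 2)."""
    (a, b), (c, d) = Sigma
    r = [x[0] - mu[0], x[1] - mu[1]]
    return (-0.5 * dot(r, matvec(inv2(Sigma), r)) - 0.5 * math.log(a * d - b * c)
            - math.log(2.0 * math.pi) + math.log(prior))


def fisher_ratio(w, mu1, mu2, Sigma):
    return (dot(w, mu1) - dot(w, mu2)) ** 2 / dot(w, matvec(Sigma, w))


# Hypothetical parameters: two groups, shared covariance, equal priors.
Sigma = [[2.0, 0.5], [0.5, 1.0]]
mu1, mu2 = [0.0, 0.0], [1.5, 1.0]
(w1, w10), (w2, w20) = ldf(mu1, Sigma, 0.5), ldf(mu2, Sigma, 0.5)

x = [0.7, -0.2]
linear = (dot(w1, x) + w10) - (dot(w2, x) + w20)
full = log_density_plus_prior(x, mu1, Sigma, 0.5) - log_density_plus_prior(x, mu2, Sigma, 0.5)
print(linear, full)                                # equal up to rounding

w_fisher = matvec(inv2(Sigma), [mu1[0] - mu2[0], mu1[1] - mu2[1]])
print([w1[0] - w2[0], w1[1] - w2[1]], w_fisher)    # same vector as Fisher's w
```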

These LDFs and quadratic discriminant functions 
(QDFs) are often applied to data sets that are not multi- 
variate normal or continuous (see pp. 234-235 in [98]) 
by using approximations for the means and covari- 
ances. Regardless, these models are parametric in that 
they incorporate assumptions about the distribution of 
the data. Fisher’s LDF is nonparametric because no as- 
sumptions are made about the underlying distribution 
of the data. Thus, for a special case, a parametric and 
a nonparametric model coincide to produce the same 
discriminant rule. The LDF derived above is also called the homoscedastic model, and the QDF, which arises when the group covariance matrices are not assumed equal, is called the heteroscedastic model. The exact form of discriminant functions in the Bayesian framework can be derived for other distributions [25].

Some classification methods are essentially methods for finding coefficients for LDFs. In other words, they seek coefficients $w_g$ and constants $w_{g0}$ such that $l_g(x) = w_g^T x + w_{g0}$, $g \in G$, is an optimal set of discriminant functions. The criteria for optimality are different for different methods. LDFs project the data onto
a linear subspace and then discriminate between enti- 
ties in that subspace. For example, Fisher’s LDF projects 
two-group data on an optimal line, and discriminates 
on that line. A good linear subspace may not exist 
for data with overlapping distributions between groups 
and therefore the data will not be classified accurately 
using these methods. The hyperplanes defined by the 


discriminant functions form boundaries between the 
group regions. A large portion of the literature concern- 
ing the use of mathematical programming models for 
classification describes methods for finding coefficients 
of LDFs [121]. 

Other classification methods seek to determine parameters to establish QDFs. The general form of a QDF is $l_g(x) = x^T W_g x + w_g^T x + w_{g0}$. The boundaries defining the group regions can assume any hyperquadric form, as can the Bayes decision rules for arbitrary multivariate normal distributions [25].

In this chapter, we survey the development and advances of classification models via mathematical programming techniques, and summarize our experience with classification models applied to prediction in biological and medical applications. The
rest of this chapter is organized as follows. Sec- 
tion “Mathematical Programming Approaches” first 
provides a detailed overview of the development and 
advances of mathematical programming based classi- 
fication models, including linear programming (LP), 
mixed integer programming (MIP), nonlinear pro- 
gramming, and support vector machine (SVM) ap- 
proaches. In Sect. “Mixed Integer Programming Based 
Multigroup Classification Models and Applications to 
Medicine and Biology”, we describe our effort in devel- 
oping optimization-based multigroup multistage dis- 
criminant analysis predictive models for classification. 
The use of the predictive models for various biological 
and medical problems is presented. Section “Progress 
and Challenges” provides several tables to summa- 
rize the progress of mathematical programming based 
classification models and their characteristics. This is 
followed by a brief description of other classification 
methods in Sect. “Other Methods”, and by a summary 
and concluding remarks in Sect. “Summary and Con- 
clusion”. 


Mathematical Programming Approaches 


Mathematical programming methods for statistical pat- 
tern classification emerged in the 1960s, gained pop- 
ularity in the 1980s, and have grown drastically since. 
Most of the mathematical programming approaches are 
nonparametric, which has been cited as an advantage 
when analyzing contaminated data sets over methods 
that require assumptions about the distribution of the 
data [107]. Most of the literature about mathematical programming methods is concerned either with using mathematical programming to determine the coefficients of LDFs or with support vector machines (SVMs).
The following notation will be used. The subscripts $i$, $j$, and $k$ are used for the observation, attribute, and group, respectively. Let $x_{ij}$ be the value of attribute $j$ of observation $i$. Let $m$ be the number of attributes, $K$ the number of groups, $G_k$ the set of data from group $k$, $M$ a big positive number, and $\varepsilon$ a small positive number. The abbreviation "urs" is used in reference to a variable to denote "unrestricted in sign."


Linear Programming Classification Models 


The use of linear programs to determine the coefficients 
of LDFs has been widely studied [31,46,50,74]. The 
methods determine the coefficients for different objec- 
tives, including minimizing the sum of the distances to 
the separating hyperplane, minimizing the maximum 
distance of an observation to the hyperplane, and min- 
imizing other measures of badness of fit or maximizing 
measures of goodness of fit. 


Two-Group Classification One of the earliest LP 
classification models was proposed by Mangasar- 
ian [74] to construct a hyperplane to separate two 
groups of data. Separation by a nonlinear surface us- 
ing LP was also proposed when the surface parameters 
appear linearly. Two sets of points may be inseparable 
by one hyperplane or surface through a single-step LP 
approach, but they can be strictly separated by more 
planes or surfaces via a multistep LP approach [75]. 
In [75] real problems with up to 117 data points, ten at- 
tributes, and three groups were solved. The three-group 
separation was achieved by separating group 1 from 
groups 2 and 3, and then group 2 from group 3. 

Studies of LP models for the discriminant problem 
in the early 1980s were carried out by Hand [47], Freed 
and Glover [31,32], and Bajgier and Hill [5]. Three LP 
models for the two-group classification problem, in- 
cluding minimizing the sum of deviations (MSD), min- 
imizing the maximum deviation (MMD), and minimiz- 
ing the sum of interior distances (MSID) were pro- 
posed. Freed and Glover [33] provided computational 
studies of these models where the test conditions in- 
volved normal and nonnormal populations. 


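MSD:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_i d_i\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j - d_i \le 0 && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j + d_i \ge 0 && \forall i \in G_2,\\
& w_j \ \text{urs} && \forall j,\\
& d_i \ge 0 && \forall i.
\end{aligned}
$$

MMD:

$$
\begin{aligned}
\text{Minimize}\quad & d\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j - d \le 0 && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j + d \ge 0 && \forall i \in G_2,\\
& w_j \ \text{urs} && \forall j,\\
& d \ge 0.
\end{aligned}
$$

MSID:

$$
\begin{aligned}
\text{Minimize}\quad & p\,d - \sum_i e_i\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j - d + e_i \le 0 && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j + d - e_i \ge 0 && \forall i \in G_2,\\
& w_j \ \text{urs} && \forall j,\\
& d \ge 0,\\
& e_i \ge 0 && \forall i,
\end{aligned}
$$

where p is a weight constant.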

The objective function of the MSD model is the $L_1$-norm distance, while the objective function of MMD is the $L_\infty$-norm distance. They are special cases of $L_p$-norm classification [50,108].

In some models the constant term of the hyperplane is a fixed number instead of a decision variable. The model of minimizing the sum of deviations with a constant cutoff score (MSD_0), shown below, is an example in which the cutoff score b takes the place of $w_0$ in the formulation. The same replacement could be used in other formulations.

MSD_0:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_i d_i\\
\text{subject to}\quad & \sum_j x_{ij} w_j - d_i \le b && \forall i \in G_1,\\
& \sum_j x_{ij} w_j + d_i \ge b && \forall i \in G_2,\\
& w_j \ \text{urs} && \forall j,\\
& d_i \ge 0 && \forall i.
\end{aligned}
$$

A gap can be introduced between the two regions determined by the separating hyperplane to prevent degenerate solutions. Taking MSD as an example, the separation constraints become

$$
\begin{aligned}
& w_0 + \sum_j x_{ij} w_j - d_i \le -\varepsilon && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j + d_i \ge \varepsilon && \forall i \in G_2.
\end{aligned}
$$

The small number ε can be normalized to 1.
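As a minimal sketch (not code from the cited papers), the MSD model with the gap normalized to ε = 1 can be handed to SciPy's `linprog`. The function name `solve_msd_gap`, the optional `weights` argument, and the variable ordering [w_0, w_1, ..., w_m, d_1, ..., d_N] are our own choices. Without the gap, w_0 = 0, w = 0, d = 0 would be feasible with objective 0, which is exactly the degenerate solution the gap excludes.

```python
import numpy as np
from scipy.optimize import linprog


def solve_msd_gap(X1, X2, eps=1.0, weights=None):
    """MSD with a gap (sketch):
        minimize   sum_i c_i d_i
        subject to w0 + x_i.w - d_i <= -eps   (i in G1)
                   w0 + x_i.w + d_i >=  eps   (i in G2)
                   w0, w urs;  d_i >= 0
    Unit costs c_i give MSD; other costs can be passed through `weights`."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, m = X1.shape
    n2 = X2.shape[0]
    n = n1 + n2
    costs = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # Variable vector: [w0, w1..wm, d_1..d_n]; only the deviations carry cost.
    c = np.concatenate([np.zeros(m + 1), costs])
    A = np.zeros((n, m + 1 + n))
    b = np.full(n, -eps)
    # Group 1 rows: w0 + x.w - d_i <= -eps
    A[:n1, 0] = 1.0
    A[:n1, 1:m + 1] = X1
    A[:n1, m + 1:m + 1 + n1] = -np.eye(n1)
    # Group 2 rows multiplied by -1 to obtain "<=" form: -w0 - x.w - d_i <= -eps
    A[n1:, 0] = -1.0
    A[n1:, 1:m + 1] = -X2
    A[n1:, m + 1 + n1:] = -np.eye(n2)
    # w0 and w_j are unrestricted in sign; deviations are nonnegative.
    bounds = [(None, None)] * (m + 1) + [(0, None)] * n
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[0], res.x[1:m + 1], res.x[m + 1:], res.fun
```

With `weights` set to 1/|G_1| for the group-1 deviations and 1/|G_2| for the group-2 deviations (and ε = 1), the same LP is the RLP model of Bennett and Mangasarian [9] discussed below.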

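Besides introducing a gap, another normalization approach is to include constraints such as $\sum_{j=0}^{m} w_j = 1$ or $\sum_{j=1}^{m} w_j = 1$ in the LP models to avoid unbounded or trivial solutions.

Glover et al. [45] gave the following hybrid model.

Hybrid model:

$$
\begin{aligned}
\text{Minimize}\quad & p\,d + \sum_i p_i d_i - q\,e - \sum_i q_i e_i\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j - d - d_i + e + e_i = 0 && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j + d + d_i - e - e_i = 0 && \forall i \in G_2,\\
& w_j \ \text{urs} && \forall j,\\
& d, e \ge 0,\\
& d_i, e_i \ge 0 && \forall i,
\end{aligned}
$$

where $p$, $p_i$, $q$, and $q_i$ are the costs for different deviations.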
Including different combinations of deviation terms in 
the objective function then leads to variant models. 
Joachimsthaler and Stam [50] reviewed and summarized LP formulations applied to two-group classification problems in discriminant analysis, including the MSD, MMD, MSID, MIP, and hybrid models. They summarized the performance of the LP methods together with traditional classification methods such as Fisher's LDF [30], Smith's QDF [106], and a logistic discriminant method. In their review, MSD sometimes, but not uniformly, improves classification accuracy compared with the traditional methods, whereas MMD is found to be inferior to MSD. Erenguc and Koehler [27] presented a unified survey of LP models and their experimental results, in which the LP models include several versions of the MSD, MMD, MSID, and hybrid models. Rubin [99] provided experimental results comparing these LP models with Fisher's LDF and Smith's QDF. He concluded that QDF performs best when the data follow normal distributions and that QDF could serve as the benchmark when seeking situations in which LP methods are advantageous. In summary, the above-mentioned review papers [27,50,99] describe previous work on LP classification models and their comparison with traditional methods. However, as stated in [107], it is difficult to make definitive statements about the conditions under which one LP model is superior to others.

Stam and Ungar [110] introduced the software
package RAGNU, a utility program used in conjunction with
the LINDO optimization software, for solving two-
group classification problems using LP-based methods. 
LP formulations such as MSD, MMD, MSID, hybrid 
models, and their variants are contained in the package. 

There are some difficulties in LP-based formulations, in that some models can result in unbounded, trivial, or unacceptable solutions [34,87]; possible remedies have been proposed. Koehler [51,52,53] and Xiao [114,115] characterized the conditions for unacceptable solutions in two-group LP discriminant models, including MSD, MMD, MSID, the hybrid model, and their variants. Glover [44] proposed the normalization constraint

$$
\sum_{j=1}^{m} \Big(-|G_2| \sum_{i \in G_1} x_{ij} + |G_1| \sum_{i \in G_2} x_{ij}\Big) w_j = 1,
$$

which is more effective and reliable. Rubin [100] examined separation failure for two-group models and suggested applying the models twice, reversing the group designations the second time. Xiao and Feng [116] proposed a regularization method to avoid multiple solutions in LP discriminant analysis by adding the term $\varepsilon \sum_{j=1}^{m} w_j$ to the objective functions.
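Bennett and Mangasarian [9] proposed the following model, called robust LP (RLP), which minimizes the averages of the deviations of the two groups:

$$
\begin{aligned}
\text{Minimize}\quad & \frac{1}{|G_1|} \sum_{i \in G_1} d_i + \frac{1}{|G_2|} \sum_{i \in G_2} d_i\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j - d_i \le -1 && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j + d_i \ge 1 && \forall i \in G_2,\\
& w_j \ \text{urs} && \forall j,\\
& d_i \ge 0 && \forall i.
\end{aligned}
$$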


It is shown that this model gives the null solution $w_1 = \cdots = w_m = 0$ if and only if

$$
\frac{1}{|G_1|} \sum_{i \in G_1} x_{ij} = \frac{1}{|G_2|} \sum_{i \in G_2} x_{ij} \quad \text{for all } j,
$$

that is, if and only if the two group means coincide, in which case the solution $w_1 = \cdots = w_m = 0$ is guaranteed not to be unique. As in most of Mangasarian's papers, the proposed classification methods were tested on data from several diseases.
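The null-solution condition can be checked directly from the data. The sketch below (hypothetical helper names) also evaluates the RLP objective for a fixed $(w_0, w)$, setting each $d_i$ to its smallest feasible value. Using $\max\{0, a\} \ge a$ inside each group average shows that this objective is at least $2 + (\bar{x}^{(1)} - \bar{x}^{(2)})^T w$, where $\bar{x}^{(k)}$ is the mean of group k; when the means coincide, the bound equals 2, the value attained by $w_0 = 0$, $w = 0$.

```python
import numpy as np


def rlp_objective(w0, w, X1, X2):
    """RLP objective for fixed (w0, w), with each d_i at its smallest feasible value:
    d_i = max(0, 1 + w0 + x_i.w) for G1 and d_i = max(0, 1 - w0 - x_i.w) for G2."""
    w = np.asarray(w, dtype=float)
    s1 = w0 + np.asarray(X1, dtype=float) @ w
    s2 = w0 + np.asarray(X2, dtype=float) @ w
    return float(np.maximum(0.0, 1.0 + s1).mean() + np.maximum(0.0, 1.0 - s2).mean())


def means_coincide(X1, X2, tol=1e-12):
    """True when the group mean vectors are equal, the case in which RLP
    yields the null solution w_1 = ... = w_m = 0."""
    m1 = np.asarray(X1, dtype=float).mean(axis=0)
    m2 = np.asarray(X2, dtype=float).mean(axis=0)
    return bool(np.allclose(m1, m2, rtol=0.0, atol=tol))
```

For example, the groups {(0, 0), (2, 2)} and {(2, 0), (0, 2)} both have mean (1, 1), so no choice of $(w_0, w)$ brings the objective below 2.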

Mangasarian et al. [86] described two applications 
of LP models in the field of breast cancer research, one 
in diagnosis and the other in prognosis. The first appli- 
cation is to discriminate benign from malignant breast 
lumps, while the second one is to predict when breast 
cancer is likely to recur. Both of them work successfully 
in clinical practice. The RLP model [9] together with 
the multisurface method tree algorithm [8] is used in 
the diagnostic system. 

Duarte Silva and Stam [104] included the second- 
order (i.e., quadratic and cross-product) terms of the 
attribute values in the LP-based models such as MSD 
and hybrid models and compared them with linear 
models, Fisher’s LDF, and Smith’s QDF. The results 
of the simulation experiments show that the methods which include second-order terms perform much better than first-order methods when the data substantially violate the multivariate normality assumption. Wanarat and Pavur [113] investigated the effect of including second-order terms in the MSD, MIP, and hybrid models when the sample size is small to moderate. Their simulation study shows that second-order terms may not always improve the performance of a first-order LP model, even with data configurations that are more appropriately classified by Smith's QDF. Another result of the simulation study is that inclusion of the cross-product terms may hurt the model's accuracy, while omission of these terms causes the model not to be invariant with respect to a nonsingular transformation of the data.

Pavur [94] studied the effect of the position of the 
contaminated normal data in the two-group classifi- 
cation problem. The methods for comparison in that 
study included MSD, minimizing the number of misclassifications (MM, described in the "Mixed Integer Programming Classification Models" section), Fisher's LDF, Smith's QDF, and nearest-neighbor models. The
nontraditional methods such as LP models have po- 
tential for outperforming the standard parametric pro- 
cedures when nonnormality is present, but this study 
shows that no one model is consistently superior in all 
cases. 

Asparoukhov and Stam [3] proposed LP and MIP 
models to solve the two-group classification problem 
where the attributes are binary. In this case the training 
data can be partitioned into multinomial cells, allow- 
ing for a substantial reduction in the number of vari- 
ables and constraints. The proposed models not only 
have the usual geometric interpretation, but also possess a strong probabilistic foundation. Let s be the index of the cells, $n_{1s}$ and $n_{2s}$ be the numbers of data points in cell s from groups 1 and 2, respectively, and $(b_{s1}, \ldots, b_{sm})$ be the binary digits representing cell s. The model shown below is the LP model of minimizing the sum of deviations for two-group classification with binary attributes.

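Cell conventional MSD:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{s:\, n_{1s} + n_{2s} > 0} \big(n_{1s} d_{1s} + n_{2s} d_{2s}\big)\\
\text{subject to}\quad & w_0 + \sum_j b_{sj} w_j - d_{1s} \le 0 && \forall s: n_{1s} > 0,\\
& w_0 + \sum_j b_{sj} w_j + d_{2s} \ge 0 && \forall s: n_{2s} > 0,\\
& w_j \ \text{urs} && \forall j,\\
& d_{1s}, d_{2s} \ge 0 && \forall s.
\end{aligned}
$$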


Binary attributes are common in medical diagnosis data. In this study three real disease-discrimination data sets were tested: developing postoperative pulmonary embolism or not, having a dissecting aneurysm or other diseases, and suffering from posttraumatic epilepsy or not. On these data sets the MIP model for binary attributes (BMIP), which will be described later, performs better than other LP models or traditional methods.
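The cell partition is simple to compute. Below is a minimal sketch, assuming each observation is given as a sequence of 0/1 attribute values; `binary_cells` is a hypothetical helper, not part of [3]. The cell model then needs one deviation variable $d_{1s}$ per cell with $n_{1s} > 0$ and one $d_{2s}$ per cell with $n_{2s} > 0$, and there are at most $2^m$ cells regardless of the sample size.

```python
from collections import Counter


def binary_cells(X1, X2):
    """Map each observed binary pattern (cell s) to the pair (n_1s, n_2s)."""
    c1 = Counter(tuple(int(v) for v in row) for row in X1)
    c2 = Counter(tuple(int(v) for v in row) for row in X2)
    # A Counter returns 0 for patterns that do not occur in a group.
    return {s: (c1[s], c2[s]) for s in sorted(set(c1) | set(c2))}
```

For instance, five observations falling into three patterns yield only three cells, and the deviations of observations sharing a cell are represented by a single weighted variable.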


Multigroup Classification Freed and Glover [32] ex- 
tended the LP classification models from two-group 
to multigroup problems. One formulation which uses 
a single discriminant function is given below: 


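$$
\begin{aligned}
\text{Minimize}\quad & \sum_{k=1}^{K-1} c_k \alpha_k\\
\text{subject to}\quad & \sum_j x_{ij} w_j \le U_k && \forall i \in G_k,\ \forall k,\\
& \sum_j x_{ij} w_j \ge L_k && \forall i \in G_k,\ \forall k,\\
& U_k + \varepsilon \le L_{k+1} + \alpha_k && \forall k = 1, \ldots, K-1,\\
& w_j \ \text{urs} && \forall j,\\
& U_k, L_k \ \text{urs} && \forall k,\\
& \alpha_k \ \text{urs} && \forall k = 1, \ldots, K-1,
\end{aligned}
$$

where the number ε could be normalized to 1, and $c_k$ is the misclassification cost. However, single-function classification is not as flexible and general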
as multiple-function classification. Another extension 
from the two-group case to the multigroup case in [32] 
is to solve two-group LP models for all pairs of groups 
and determine classification rules based on these solu- 
tions. However, in some cases the group assignment is 
not clear and the resulting classification scheme may be 
suboptimal [107]. 

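For the multigroup discrimination problem, Bennett and Mangasarian [10] defined piecewise-linear separability of data from K groups as follows: the data from K groups are piecewise-linear-separable if and only if there exist $(w_0^k, w_1^k, \ldots, w_m^k) \in \mathbb{R}^{m+1}$, $k = 1, \ldots, K$, such that

$$
w_0^h + \sum_j x_{ij} w_j^h \ge w_0^k + \sum_j x_{ij} w_j^k + 1 \qquad \forall i \in G_h,\ \forall h, k \ne h.
$$

The following LP will generate a piecewise-linear separation for the K groups if one exists; otherwise it will generate an error-minimizing separation:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_h \sum_{k \ne h} \frac{1}{|G_h|} \sum_{i \in G_h} d_i^{hk}\\
\text{subject to}\quad & d_i^{hk} \ge -\Big(w_0^h + \sum_j x_{ij} w_j^h\Big) + \Big(w_0^k + \sum_j x_{ij} w_j^k\Big) + 1 && \forall i \in G_h,\ \forall h, k \ne h,\\
& w_j^k \ \text{urs} && \forall j, k,\\
& d_i^{hk} \ge 0 && \forall i \in G_h,\ \forall h, k \ne h.
\end{aligned}
$$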

The method was tested on three data sets. It performs well on the two data sets that are totally (or almost totally) piecewise-linear separable. The classification result is poor on the third data set, which is inherently more difficult; however, combining the method with the multisurface method tree algorithm [8] improves performance.

Gochet et al. [46] introduced an LP model for the general multigroup classification problem. The method separates the data with several hyperplanes by sequentially solving LPs. The vectors $w^k$, $k = 1, \ldots, K$, are estimated for the classification decision rule, which classifies an observation i into group s, where

$$
s = \arg\max_k \Big\{ w_0^k + \sum_j x_{ij} w_j^k \Big\}.
$$
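A minimal sketch of this decision rule, together with a check of the piecewise-linear separability condition of Bennett and Mangasarian [10] stated above, which uses the same K functions $w_0^k + \sum_j x_{ij} w_j^k$. The helper names are hypothetical, groups are indexed from 0 in the code, and the weights in the example are chosen by hand for illustration rather than obtained from either LP.

```python
import numpy as np


def classify_argmax(x, W0, W):
    """Return s = argmax_k { w0^k + sum_j x_j w_j^k }; W0 has shape (K,), W shape (K, m)."""
    scores = np.asarray(W0, dtype=float) + np.asarray(W, dtype=float) @ np.asarray(x, dtype=float)
    return int(np.argmax(scores))


def separates_with_margin(W0, W, groups):
    """Check w0^h + x.w^h >= w0^k + x.w^k + 1 for every x in group h and every k != h."""
    W0 = np.asarray(W0, dtype=float)
    W = np.asarray(W, dtype=float)
    for h, Xh in enumerate(groups):
        S = np.asarray(Xh, dtype=float) @ W.T + W0   # S[i, k] = w0^k + x_i . w^k
        for k in range(len(W0)):
            if k != h and np.any(S[:, h] < S[:, k] + 1.0):
                return False
    return True


# Hand-chosen example: three groups on a line around 0, 5, and 10.
W0 = [0.0, -12.5, -50.0]
W = [[0.0], [5.0], [10.0]]
```

With these weights, the points 0, 5, and 10 are separated with margin 12.5, while a group-0 point at 4 would be assigned to group 1.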

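Suppose observation i is from group h. Denote the goodness of fit for observation i with respect to group k as

$$
G_{ik}(w^h, w^k) = \Big[\Big(w_0^h + \sum_j x_{ij} w_j^h\Big) - \Big(w_0^k + \sum_j x_{ij} w_j^k\Big)\Big]^+,
$$

where $[a]^+ = \max\{0, a\}$.

Likewise, denote the badness of fit for observation i with respect to group k as

$$
B_{ik}(w^h, w^k) = \Big[\Big(w_0^h + \sum_j x_{ij} w_j^h\Big) - \Big(w_0^k + \sum_j x_{ij} w_j^k\Big)\Big]^-,
$$

where $[a]^- = -\min\{0, a\}$.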


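The total goodness of fit and total badness of fit are then defined as

$$
\begin{aligned}
G(w) &= G(w^1, \ldots, w^K) = \sum_h \sum_{k \ne h} \sum_{i \in G_h} G_{ik}(w^h, w^k),\\
B(w) &= B(w^1, \ldots, w^K) = \sum_h \sum_{k \ne h} \sum_{i \in G_h} B_{ik}(w^h, w^k).
\end{aligned}
$$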
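For fixed weights, G(w) and B(w) can be evaluated directly, as in the sketch below (hypothetical helper; groups indexed from 0; weights hand-chosen). Because $[a]^+ - [a]^- = a$, the difference $G(w) - B(w)$ equals the sum of the score differences, which is linear in w; this is why the normalization $G(w) - B(w) = q$ can be imposed as a linear constraint.

```python
import numpy as np


def fit_measures(W0, W, groups):
    """Total goodness G(w) and badness B(w) of fit for fixed weights.
    W0 has shape (K,), W has shape (K, m); groups[h] lists the observations of group h."""
    W0 = np.asarray(W0, dtype=float)
    W = np.asarray(W, dtype=float)
    G = B = 0.0
    for h, Xh in enumerate(groups):
        S = np.asarray(Xh, dtype=float) @ W.T + W0   # S[i, k] = w0^k + x_i . w^k
        for k in range(len(W0)):
            if k == h:
                continue
            diff = S[:, h] - S[:, k]
            G += float(np.maximum(diff, 0.0).sum())    # [a]^+ = max{0, a}
            B += float(np.maximum(-diff, 0.0).sum())   # [a]^- = -min{0, a}
    return G, B


W0 = [0.0, -12.5, -50.0]
W = [[0.0], [5.0], [10.0]]
```

With the groups {0}, {5}, {10} every difference is positive, so B(w) = 0; reversing the group labels makes most differences negative and B(w) large.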
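The LP is to minimize the total badness of fit, subject to a normalization equation, in which q > 0:

$$
\begin{aligned}
\text{Minimize}\quad & B(w)\\
\text{subject to}\quad & G(w) - B(w) = q,\\
& w \ \text{urs}.
\end{aligned}
$$

Expanding G(w) and B(w) and substituting $G_{ik}(w^h, w^k)$ and $B_{ik}(w^h, w^k)$ by $\gamma_{ik}$ and $\beta_{ik}$, respectively, the LP becomes

$$
\begin{aligned}
\text{Minimize}\quad & \sum_h \sum_{k \ne h} \sum_{i \in G_h} \beta_{ik}\\
\text{subject to}\quad & \Big(w_0^h + \sum_j x_{ij} w_j^h\Big) - \Big(w_0^k + \sum_j x_{ij} w_j^k\Big) = \gamma_{ik} - \beta_{ik} && \forall i \in G_h,\ \forall h, k \ne h,\\
& \sum_h \sum_{k \ne h} \sum_{i \in G_h} (\gamma_{ik} - \beta_{ik}) = q,\\
& w_j^k \ \text{urs} && \forall j, k,\\
& \gamma_{ik}, \beta_{ik} \ge 0 && \forall i \in G_h,\ \forall h, k \ne h.
\end{aligned}
$$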


The classification results for two real data sets show 
that this model can compete with Fisher’s LDF and the 
nonparametric k-nearest-neighbor method. 

The LP-based models for classification problems 
highlighted above are all nonparametric models. In 
Sect. “Mixed Integer Programming Based Multigroup 
Classification Models and Applications to Medicine 
and Biology”, we describe LP-based and MIP-based 
classification models that utilize a parametric multi- 
group discriminant analysis approach [39,40,60,63]. 
These latter models have been employed successfully 
in various multigroup disease diagnosis and biolog- 
ical/medical prediction problems [16,28,29,56,57,59, 
60,64,65]. 


Mixed Integer Programming Classification Models 


While LP offers a polynomial-time computational 
guarantee, MIP allows more flexibility in (among other 
things) modeling misclassified observations and/or 
misclassification costs. 


Two-Group Classification In the two-group classi- 
fication problem, binary variables can be used in the 
formulation to track and minimize the exact number 
of misclassifications. Such an objective function is also
referred to as the $L_0$-norm criterion [107].


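MM:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_i z_i\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j \le M z_i && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j \ge -M z_i && \forall i \in G_2,\\
& w_j \ \text{urs} && \forall j,\\
& z_i \in \{0, 1\} && \forall i.
\end{aligned}
$$

The vector w is required to be a nonzero vector to prevent the trivial solution.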
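A small MM instance can be solved with SciPy's `milp` (SciPy 1.9 or later). This is a sketch under explicit assumptions that are not part of the MM model above: the trivial solution is excluded by the unit gap discussed for MSD (scores at most -1 for $G_1$ and at least 1 for $G_2$ unless $z_i = 1$), and $w_0$ and all $w_j$ are bounded in [-10, 10] so that the chosen value of M is large enough for the small example. The name `solve_mm` and these parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def solve_mm(X1, X2, big_m=1e3, w_bound=10.0):
    """Minimize the number of misclassified points (sketch with a unit gap):
        w0 + x_i.w - M z_i <= -1  (i in G1),  w0 + x_i.w + M z_i >= 1  (i in G2),
        -w_bound <= w0, w_j <= w_bound,  z_i in {0, 1}."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, m = X1.shape
    n2 = X2.shape[0]
    n = n1 + n2
    # Variable vector: [w0, w1..wm, z_1..z_n]; the objective counts the z_i.
    c = np.concatenate([np.zeros(m + 1), np.ones(n)])
    A = np.zeros((n, m + 1 + n))
    A[:n1, 0] = 1.0
    A[:n1, 1:m + 1] = X1
    A[:n1, m + 1:m + 1 + n1] = -big_m * np.eye(n1)
    A[n1:, 0] = 1.0
    A[n1:, 1:m + 1] = X2
    A[n1:, m + 1 + n1:] = big_m * np.eye(n2)
    lower = np.concatenate([np.full(n1, -np.inf), np.ones(n2)])
    upper = np.concatenate([-np.ones(n1), np.full(n2, np.inf)])
    # w0 and w_j are continuous; the z_i are integer variables bounded in [0, 1].
    integrality = np.concatenate([np.zeros(m + 1, dtype=int), np.ones(n, dtype=int)])
    bounds = Bounds(np.concatenate([np.full(m + 1, -w_bound), np.zeros(n)]),
                    np.concatenate([np.full(m + 1, w_bound), np.ones(n)]))
    res = milp(c, integrality=integrality, bounds=bounds,
               constraints=LinearConstraint(A, lower, upper))
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[0], res.x[1:m + 1], int(round(res.fun))
```

On the one-attribute data $G_1 = \{0, 1, 4\}$, $G_2 = \{2, 3\}$, no hyperplane separates the groups, and the minimum is one misclassification (the point 4).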

In the MIP formulation the objective function could include deviation terms, such as those in the hybrid models, as well as the number of misclassifications [5], or it could represent the expected cost of misclassification [1,6,101,105]. There are also several variants of the basic model.

Stam and Joachimsthaler [109] studied the classifi- 
cation performance of MM and compared it with that 
of MSD, Fisher’s LDF, and Smith’s QDF. In some cases 
the MM model performs better, but in some cases it 
does not. MIP formulations are included in the review
studies of Joachimsthaler and Stam [50] and Erenguc and
Koehler [27], and are contained in the software devel- 
oped by Stam and Ungar [110]. Computational experi- 
ments show that the MIP model performs better when 
the group overlap is higher [50,109], although it is still 
not easy to reach general conclusions [107]. 

Since the MM problem is NP-hard, exact algorithms and heuristics have been proposed to solve it efficiently. Koehler and Erenguc [54] developed a procedure to solve MM in which the condition of nonzero w is replaced by the requirement of at least one violation of the constraints $w_0 + \sum_j x_{ij} w_j \le 0$ for $i \in G_1$ or $w_0 + \sum_j x_{ij} w_j \ge 0$ for $i \in G_2$. Banks and Abad [6] solved the MIP of minimizing the expected cost of misclassification by an LP-based algorithm. Abad and Banks [1] developed three heuristic procedures for the problem of minimizing the expected cost of misclassification. They also included the interaction terms of the attributes in the data and applied the heuristics [7]. Duarte Silva and Stam [105] introduced a divide-and-conquer algorithm for the classification problem of minimizing the misclassification cost by solving MIP and LP subproblems. Rubin [101] solved the same problem using a decomposition approach and tested this procedure on several data sets, including two breast cancer data sets. Yanev and Balev [119] proposed exact and heuristic algorithms for solving MM, which are based on specific properties of the vertices of a polyhedral set closely connected with the model.

For the two-group classification problem where the attributes are binary, Asparoukhov and Stam [3] proposed LP and MIP models which partition the data into multinomial cells and result in fewer variables and constraints. Let s be the index of the cells, $n_{1s}$ and $n_{2s}$ be the numbers of data points in cell s from groups 1 and 2, respectively, and $(b_{s1}, \ldots, b_{sm})$ be the binary digits representing cell s. Below is the BMIP, which performs best on the three real data sets in [3]:

BMIP:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{s:\, n_{1s} + n_{2s} > 0} \big\{ |n_{1s} - n_{2s}|\, z_s + \min(n_{1s}, n_{2s}) \big\}\\
\text{subject to}\quad & w_0 + \sum_j b_{sj} w_j \le M z_s && \forall s: n_{1s} \ge n_{2s},\ n_{1s} > 0,\\
& w_0 + \sum_j b_{sj} w_j \ge -M z_s && \forall s: n_{1s} < n_{2s},\\
& w_j \ \text{urs} && \forall j,\\
& z_s \in \{0, 1\} && \forall s: n_{1s} + n_{2s} > 0.
\end{aligned}
$$
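Interpreting $z_s = 1$ as placing cell s on the side of its minority group (ties counted as a group-1 majority, matching the constraint sets above), the BMIP objective counts exactly the misclassified observations: a cell contributes $\min(n_{1s}, n_{2s})$ when $z_s = 0$ and $\max(n_{1s}, n_{2s}) = \min(n_{1s}, n_{2s}) + |n_{1s} - n_{2s}|$ when $z_s = 1$. A small sketch (hypothetical helper names) comparing the objective with a direct count:

```python
def bmip_objective(cells, z):
    """Sum over nonempty cells of |n1 - n2| * z_s + min(n1, n2); cells maps s -> (n1, n2)."""
    return sum(abs(n1 - n2) * z[s] + min(n1, n2)
               for s, (n1, n2) in cells.items() if n1 + n2 > 0)


def z_from_assignment(cells, assigned):
    """z_s = 1 when cell s is assigned to its minority group (ties: group 1 is the majority)."""
    return {s: int(assigned[s] != (1 if n1 >= n2 else 2))
            for s, (n1, n2) in cells.items()}


def count_misclassified(cells, assigned):
    """Direct count: a cell assigned to group 1 misclassifies its n2 points, and vice versa."""
    return sum(n2 if assigned[s] == 1 else n1 for s, (n1, n2) in cells.items())


cells = {(0, 0): (3, 1), (0, 1): (0, 2), (1, 1): (2, 5)}
assigned = {(0, 0): 1, (0, 1): 2, (1, 1): 1}
```

Here only cell (1, 1) is placed against its majority, so $z_{(1,1)} = 1$ and both counts give 1 + 0 + 5 = 6 misclassified observations.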


Pavur et al. [96] included different secondary goals in model MM and compared their misclassification rates. A new secondary goal was proposed, which maximizes the difference between the means of the discriminant scores of the two groups. In this model the term $-\delta$ is added to the minimization objective function as a secondary goal with a constant multiplier, while the constraint

$$
\sum_j \bar{x}_j^{(2)} w_j - \sum_j \bar{x}_j^{(1)} w_j \ge \delta
$$

is included, where $\bar{x}_j^{(k)} = \frac{1}{|G_k|} \sum_{i \in G_k} x_{ij}$ for all j and for k = 1, 2. The results of a simulation study show that an MIP model with the proposed secondary goal performs better than the other models studied.


Glen [42] proposed integer programming (IP) techniques for normalization in two-group discriminant analysis models. One technique is to add the constraint $\sum_{j=1}^{m} |w_j| = 1$. In the proposed model, $w_j$ for $j = 1, \ldots, m$ is represented by $w_j = w_j^+ - w_j^-$, where $w_j^+, w_j^- \ge 0$, and binary variables $\delta_j$ and $\gamma_j$ are defined such that $\delta_j = 1$ if and only if $w_j^+ > 0$, and $\gamma_j = 1$ if and only if $w_j^- > 0$. The IP normalization technique is applied to MSD and MMD, and the MSD version is presented below.


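MSD with IP normalization:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_i d_i\\
\text{subject to}\quad & w_0 + \sum_{j=1}^{m} x_{ij} (w_j^+ - w_j^-) - d_i \le 0 && \forall i \in G_1,\\
& w_0 + \sum_{j=1}^{m} x_{ij} (w_j^+ - w_j^-) + d_i \ge 0 && \forall i \in G_2,\\
& \sum_{j=1}^{m} (w_j^+ + w_j^-) = 1,\\
& w_j^+ - \varepsilon \delta_j \ge 0 && \forall j = 1, \ldots, m,\\
& w_j^+ - \delta_j \le 0 && \forall j = 1, \ldots, m,\\
& w_j^- - \varepsilon \gamma_j \ge 0 && \forall j = 1, \ldots, m,\\
& w_j^- - \gamma_j \le 0 && \forall j = 1, \ldots, m,\\
& \delta_j + \gamma_j \le 1 && \forall j = 1, \ldots, m,\\
& w_0 \ \text{urs},\\
& w_j^+, w_j^- \ge 0 && \forall j = 1, \ldots, m,\\
& d_i \ge 0 && \forall i,\\
& \delta_j, \gamma_j \in \{0, 1\} && \forall j = 1, \ldots, m.
\end{aligned}
$$

The constraints $\delta_j + \gamma_j \le 1$ prevent $w_j^+$ and $w_j^-$ from both being positive, so that $\sum_{j=1}^{m} (w_j^+ + w_j^-) = \sum_{j=1}^{m} |w_j| = 1$.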


The variable coefficients of the discriminant func- 
tion generated by the models are invariant under ori- 
gin shifts. The proposed models were validated using 
two data sets from [45,87]. The models were also ex- 
tended for attribute selection by adding the constraint
$\sum_{j=1}^{m} (\delta_j + \gamma_j) = p$, which allows only a constant number,
p, of attributes to be used for classification.
Glen [43] developed MIP models which determine
the thresholds for forming dichotomous variables as
well as the discriminant function coefficients, $w_j$. For
each continuous attribute to be formed as a dichoto- 
mous attribute, the model finds the threshold among 
possible thresholds while determining the separating 
hyperplane and optimizing the objective function such 
as minimizing the sum of deviations or minimizing the 
number of misclassifications. Computational results of 
a real data set and some simulated data sets show that 
the MSD model with dichotomous categorical variable 
formation can improve classification performance. The 
reason for the potential of this technique is that the LDF 
generated is a nonlinear function of the original vari- 
ables. 


Multigroup Classification Gehrlein [41] proposed 
MIP formulations of minimizing the total number 
of misclassifications in the multigroup classification 
problem. He gave both a single-function classification 
scheme and a multiple-function classification scheme, 
as follows. 

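General single-function classification (GSFC) - minimizing the number of misclassifications:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_i z_i\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j - M z_i \le U_k && \forall i \in G_k,\ \forall k,\\
& w_0 + \sum_j x_{ij} w_j + M z_i \ge L_k && \forall i \in G_k,\ \forall k,\\
& U_k - L_k \ge \delta' && \forall k,\\
& L_g - U_k + M y_{gk} \ge \delta && \forall g, k,\ g \ne k,\\
& L_k - U_g + M y_{kg} \ge \delta && \forall g, k,\ g \ne k,\\
& y_{gk} + y_{kg} = 1 && \forall g, k,\ g \ne k,\\
& w_j \ \text{urs} && \forall j,\\
& U_k, L_k \ \text{urs} && \forall k,\\
& z_i \in \{0, 1\} && \forall i,\\
& y_{gk} \in \{0, 1\} && \forall g, k,\ g \ne k,
\end{aligned}
$$

where $U_k$ and $L_k$ denote the upper and lower endpoints of the interval assigned to group k, and $y_{gk} = 1$ if the interval associated with group g precedes that of group k and $y_{gk} = 0$ otherwise. The constant $\delta'$ is the minimum width of the interval of a group, and the constant $\delta$ is the minimum gap between adjacent intervals.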
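Once the weights and interval endpoints are known, the single-function rule places an observation in the group whose interval contains its score. Below is a sketch with hypothetical helper names, including a check of the width requirement $U_k - L_k \ge \delta'$ and of a gap of at least $\delta$ between neighboring intervals (taken in increasing order). Scores outside every interval are left unassigned here (`None`); in GSFC such observations, like those falling in another group's interval, must have $z_i = 1$.

```python
def intervals_valid(intervals, min_width, min_gap):
    """intervals maps group -> (L_k, U_k). Check U_k - L_k >= delta' for every group
    and L_next - U_prev >= delta for consecutive intervals in increasing order."""
    ordered = sorted(intervals.values())
    if any(hi - lo < min_width for lo, hi in ordered):
        return False
    return all(nxt[0] - cur[1] >= min_gap for cur, nxt in zip(ordered, ordered[1:]))


def assign_by_interval(score, intervals):
    """Return the group whose interval [L_k, U_k] contains the score, or None."""
    for k, (lo, hi) in intervals.items():
        if lo <= score <= hi:
            return k
    return None


intervals = {"A": (0.0, 1.0), "B": (1.5, 3.0), "C": (-2.0, -1.0)}
```

In this example the ordering of the groups along the score axis is C, A, B, which corresponds to $y_{CA} = y_{AB} = y_{CB} = 1$.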
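General multiple-function classification (GMFC) - minimizing the number of misclassifications:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_i z_i\\
\text{subject to}\quad & \Big(w_0^h + \sum_j x_{ij} w_j^h\Big) - \Big(w_0^k + \sum_j x_{ij} w_j^k\Big) + M z_i \ge \varepsilon && \forall i \in G_h,\ \forall h, k \ne h,\\
& w_j^k \ \text{urs} && \forall j, k,\\
& z_i \in \{0, 1\} && \forall i.
\end{aligned}
$$

Both models work successfully on the iris data set provided by Fisher [30].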

Pavur [93] solved the multigroup classification 
problem by sequentially solving the GSFC in one di- 
mension each time. LDFs were generated by succes- 
sively solving the GSFC with the added constraints that 
all linear discriminants are uncorrelated with each other for the total data set. This procedure can be repeated for as many dimensions as are believed to be sufficient. According to the simulation results, this procedure substantially improves the GSFC model and sometimes outperforms GMFC, Fisher's LDF, or Smith's QDF.

To solve the three-group classification problem 
more efficiently, Loucopoulos and Pavur [71] made 
a slight modification to the GSFC and proposed the 
model MIP3G, which also minimizes the number of 
misclassifications. Compared with GSFC, MIP3G is 
also a single-function classification model, but it re- 
duces the possible group orderings from six to three 
in the formulation and thus becomes more efficient. 
Loucopoulos and Pavur [72] reported the results of a simulation experiment on the performance of GMFC, MIP3G, Fisher's LDF, and Smith's QDF for a three-group classification problem with small training samples. Second-order terms were also considered in the experiment. Simulation results show that GMFC and MIP3G can outperform the parametric procedures in some nonnormal data sets and that the inclusion of second-order terms can improve the performance of MIP3G in some data sets. Pavur and Loucopoulos [95] investigated the effect of the gap size in the
MIP3G model for the three-group classification prob- 
lem. A simulation study illustrates that for fairly separa- 
ble data, or data with small sample sizes, a non-zero-gap 
model can improve the performance. A possible reason 
for this result is that the zero-gap model may be over- 
fitting the data. 

Gallagher et al. [39,40,63] and Lee [59,60] proposed 
MIP models, both heuristic and exact, as a computa- 
tional approach to solving the constrained discriminant 
method described by Anderson [2]. These models are 
described in detail in Sect. “Mixed Integer Program- 
ming Based Multigroup Classification Models and Ap- 
plications to Medicine and Biology”. 


Nonlinear Programming Classification Models 


Nonlinear programming approaches are natural extensions of some of the LP-based models. Thus far, nonlinear programming approaches have been developed only for two-group classification.

Stam and Joachimsthaler [108] proposed a class of nonlinear programming methods to solve the two-group classification problem under the $L_p$-norm objective criterion. This is an extension of MSD and MMD, for which the objectives are the $L_1$-norm and $L_\infty$-norm, respectively.

Minimize the general $L_p$-norm distance:


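$$
\begin{aligned}
\text{Minimize}\quad & \Big(\sum_i d_i^p\Big)^{1/p}\\
\text{subject to}\quad & \sum_j x_{ij} w_j - d_i \le b && \forall i \in G_1,\\
& \sum_j x_{ij} w_j + d_i \ge b && \forall i \in G_2,\\
& w_j \ \text{urs} && \forall j,\\
& d_i \ge 0 && \forall i.
\end{aligned}
$$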


The simulation results show that, in addition to the $L_1$-norm and the $L_\infty$-norm, it is worth the effort to compute other $L_p$-norm objectives. Restricting the analysis to $1 \le p \le 3$, plus $p = \infty$, is recommended. This method was reviewed by Joachimsthaler and Stam [50] and Erenguc and Koehler [27].
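The family of objectives can be made concrete with a short sketch (hypothetical helper) that evaluates $(\sum_i d_i^p)^{1/p}$ for a vector of nonnegative deviations: p = 1 gives the MSD objective, p = ∞ the MMD objective, and a large finite p approaches the maximum deviation.

```python
def lp_norm(d, p):
    """(sum_i d_i^p)^(1/p) for p >= 1 and max_i d_i for p = infinity; d_i >= 0 assumed."""
    if p == float("inf"):
        return max(d)
    return sum(x ** p for x in d) ** (1.0 / p)
```

For the deviations (3, 4), the objective is 7 for p = 1, 5 for p = 2, and 4 for p = ∞.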


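Mangasarian et al. [85] proposed a nonconvex model for the two-group classification problem:

$$
\begin{aligned}
\text{Minimize}\quad & d^1 + d^2\\
\text{subject to}\quad & \sum_j x_{ij} w_j - d^1 \le 0 && \forall i \in G_1,\\
& \sum_j x_{ij} w_j + d^2 \ge 0 && \forall i \in G_2,\\
& \max_{j = 1, \ldots, m} |w_j| = 1,\\
& w_j \ \text{urs} && \forall j,\\
& d^1, d^2 \ \text{urs}.
\end{aligned}
$$

This model can be solved in polynomial time by solving 2m linear programs (the constraint $\max_j |w_j| = 1$ requires $w_j = 1$ or $w_j = -1$ for some j, giving one LP for each of the 2m choices), which generate a sequence of parallel planes, resulting in a piecewise-linear nonconvex discriminant function. The model works successfully in clinical practice for the diagnosis of breast cancer.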

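Further, Mangasarian [76] also formulated the problem of minimizing the number of misclassifications as a linear program with equilibrium constraints (LPEC) instead of the MIP model MM described previously:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{i \in G_1 \cup G_2} z_i\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j - d_i \le -1 && \forall i \in G_1,\\
& z_i \Big(w_0 + \sum_j x_{ij} w_j - d_i + 1\Big) = 0 && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j + d_i \ge 1 && \forall i \in G_2,\\
& z_i \Big(w_0 + \sum_j x_{ij} w_j + d_i - 1\Big) = 0 && \forall i \in G_2,\\
& d_i (1 - z_i) = 0 && \forall i \in G_1 \cup G_2,\\
& 0 \le z_i \le 1 && \forall i \in G_1 \cup G_2,\\
& d_i \ge 0 && \forall i \in G_1 \cup G_2,\\
& w_j \ \text{urs} && \forall j.
\end{aligned}
$$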


The general LPEC can be converted to an exact penalty problem with a quadratic objective and linear constraints. A stepless Frank-Wolfe-type algorithm is proposed for the penalty problem, terminating at a stationary point or a global solution. This method is called the parametric misclassification minimization (PMM) procedure, and numerical testing is included in [77].

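To illustrate the next model, we first define the step function $s: \mathbb{R} \to \{0, 1\}$ as

$$
s(u) = \begin{cases} 1 & \text{if } u > 0,\\ 0 & \text{if } u \le 0. \end{cases}
$$

The problem of minimizing the number of misclassifications is equivalent to

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{i \in G_1 \cup G_2} s(d_i)\\
\text{subject to}\quad & w_0 + \sum_j x_{ij} w_j - d_i \le -1 && \forall i \in G_1,\\
& w_0 + \sum_j x_{ij} w_j + d_i \ge 1 && \forall i \in G_2,\\
& d_i \ge 0 && \forall i \in G_1 \cup G_2,\\
& w_j \ \text{urs} && \forall j.
\end{aligned}
$$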

Mangasarian [77] proposed a simple concave approximation of the step function for nonnegative variables: $t(u, \alpha) = 1 - e^{-\alpha u}$, where $\alpha > 0$ and $u \ge 0$. Let $\alpha > 0$ and approximate $s(d_i)$ by $t(d_i, \alpha)$. The problem then reduces to minimizing a smooth concave function bounded below on a nonempty polyhedron, which has a minimum at a vertex of the feasible region. A finite successive linearization algorithm (SLA) was proposed, terminating at a stationary point or a global solution. Numerical tests of SLA were performed and compared with the PMM procedure described above. The results show that the much simpler SLA obtains a separation that is almost as good as that of PMM in considerably less computing time.
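A short sketch of $s(u)$ and its concave approximation $t(u, \alpha)$ (helper names hypothetical). For any fixed $u > 0$, $t(u, \alpha) \to 1$ as α grows, while $t(0, \alpha) = 0$, so for nonnegative deviations $\sum_i t(d_i, \alpha)$ approaches the misclassification count $\sum_i s(d_i)$.

```python
import math


def step(u):
    """s(u) = 1 if u > 0, else 0."""
    return 1.0 if u > 0 else 0.0


def concave_step(u, alpha):
    """t(u, alpha) = 1 - exp(-alpha * u), intended for u >= 0 and alpha > 0."""
    return 1.0 - math.exp(-alpha * u)
```

For the deviations (0, 0, 0.3, 2.0), the step function counts two misclassified points, and with α = 50 the smooth approximation differs from 2 by less than $10^{-6}$.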

Chen and Mangasarian [21] proposed an algorithm for a hybrid misclassification minimization problem, which is more computationally tractable than the NP-hard misclassification minimization problem. The basic idea of the hybrid approach is to obtain $w_0$ and $(w_1, \ldots, w_m)$ of the separating hyperplane iteratively:

1. For a fixed $w_0$, solve RLP [9] to determine $(w_1, \ldots, w_m)$.
2. For this $(w_1, \ldots, w_m)$, solve the one-dimensional misclassification minimization problem to determine $w_0$.

The hybrid method is compared with the RLP method and the PMM procedure. It performs better on the testing sets of tenfold cross-validation and is much faster than PMM.
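Step 2 is a one-dimensional problem: with $(w_1, \ldots, w_m)$ fixed, each observation has a score $\sum_j x_{ij} w_j$, and only the threshold $\theta = -w_0$ remains to be chosen. The sketch below solves it exactly by scanning the sorted scores; it is our own illustration, and [21] may use a different procedure. It assumes an observation is assigned to group 2 when its score exceeds θ and to group 1 otherwise, so only thresholds equal to an observed score, or below all of them, need to be examined.

```python
def best_threshold(s1, s2):
    """Return (theta, errors) minimizing
    #{i in G1: s_i > theta} + #{i in G2: s_i <= theta}; then w0 = -theta."""
    events = sorted([(v, 1) for v in s1] + [(v, 2) for v in s2])
    # Threshold below every score: all G1 points lie above it (misclassified),
    # all G2 points lie above it (correct).
    best_theta = events[0][0] - 1.0
    errors = best = len(s1)
    i = 0
    while i < len(events):
        v = events[i][0]
        # Raising theta to v moves every point with score v to the "group 1" side.
        while i < len(events) and events[i][0] == v:
            errors += -1 if events[i][1] == 1 else 1
            i += 1
        if errors < best:
            best, best_theta = errors, v
    return best_theta, best
```

For group-1 scores (0, 1, 4) and group-2 scores (2, 3), the best threshold is θ = 1 (so $w_0 = -1$) with one misclassification.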

Mangasarian [78] proposed the model of minimizing the sum of arbitrary-norm distances of misclassified points to the separating hyperplane. For a general norm $\|\cdot\|$ on $\mathbb{R}^m$, the dual norm $\|\cdot\|'$ on $\mathbb{R}^m$ is defined as $\|x\|' = \max_{\|y\| = 1} x^T y$. Define $[a]^+ = \max\{0, a\}$ and let $w = (w_1, \ldots, w_m)$. The formulation can then be written as

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{i \in G_1} \Big[w_0 + \sum_j x_{ij} w_j\Big]^+ + \sum_{i \in G_2} \Big[-w_0 - \sum_j x_{ij} w_j\Big]^+\\
\text{subject to}\quad & \|w\|' = 1,\\
& w_0, w \ \text{urs}.
\end{aligned}
$$

The problem is to minimize a convex function on a unit sphere. A decision problem related to this minimization problem is shown to be NP-complete, except for p = 1. For a general p-norm, the minimization problem can be transformed via an exact penalty formulation to minimizing the sum of a convex function and a bilinear function on a convex set.


Support Vector Machine 


A support vector machine (SVM) is a type of math- 
ematical programming approach [112]. It has been 
widely studied, and has become popular in many appli- 
cation fields in recent years. The introductory descrip- 
tion of SVMs given here is summarized from the tu- 
torial by Burges [20]. In order to maintain consistency 
with SVM studies in published literature, the notation 
used below is slightly different from the notation used 
to describe the mathematical programming methods in 
earlier sections. 

In the two-group separable case, the objective is to maximize the margin of a separating hyperplane, $2/\|w\|$, which is equivalent to minimizing $\|w\|^2$:

$$
\begin{aligned}
\text{Minimize}\quad & \tfrac{1}{2} w^T w\\
\text{subject to}\quad & x_i^T w + b \ge +1 && \text{for } y_i = +1,\\
& x_i^T w + b \le -1 && \text{for } y_i = -1,\\
& w, b \ \text{urs},
\end{aligned}
$$

where $x_i \in \mathbb{R}^m$ represents the values of the attributes of observation i and $y_i \in \{-1, 1\}$ represents the group of observation i.

This problem can be solved by solving its Wolfe dual problem:

$$
\begin{aligned}
\text{Maximize}\quad & \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j\\
\text{subject to}\quad & \sum_i \alpha_i y_i = 0,\\
& \alpha_i \ge 0 && \forall i.
\end{aligned}
$$

Here, $\alpha_i$ is the Lagrange multiplier for training point i, and the points with $\alpha_i > 0$ are called the support vectors (analogous to the support of a hyperplane, hence the name "support vector"). The primal solution w is given by $w = \sum_i \alpha_i y_i x_i$, and b can be computed by solving $y_i(w^T x_i + b) - 1 = 0$ for any i with $\alpha_i > 0$.
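Recovering the primal solution from the dual multipliers takes only a few lines. The sketch below (hypothetical helper names) uses a two-point example that can be solved by hand: with $x_1 = (1, 0)$, $y_1 = +1$ and $x_2 = (-1, 0)$, $y_2 = -1$, the equality constraint forces $\alpha_1 = \alpha_2 = a$, the dual objective reduces to $2a - 2a^2$, and its maximum is at $a = 1/2$. The sketch averages b over all support vectors, a common numerical safeguard; with exact arithmetic any single support vector gives the same value.

```python
import numpy as np


def primal_from_dual(alpha, X, y, tol=1e-8):
    """Recover w = sum_i alpha_i y_i x_i and b from a hard-margin dual solution."""
    alpha = np.asarray(alpha, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = (alpha * y) @ X
    sv = alpha > tol                      # support vectors: alpha_i > 0
    # y_i (w^T x_i + b) = 1 with y_i = +-1 gives b = y_i - w^T x_i; average over support vectors.
    b = float(np.mean(y[sv] - X[sv] @ w))
    return w, b


def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j."""
    alpha = np.asarray(alpha, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Q = np.outer(y, y) * (X @ X.T)
    return float(alpha.sum() - 0.5 * alpha @ Q @ alpha)
```

In the example, w = (1, 0) and b = 0, and the dual value 1/2 equals the primal value $\tfrac{1}{2} w^T w$.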

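For the nonseparable case, slack variables $\xi_i$ are introduced to handle the errors. Let C be the penalty for the errors. The problem becomes

$$
\begin{aligned}
\text{Minimize}\quad & \tfrac{1}{2} w^T w + C \Big(\sum_i \xi_i\Big)^k\\
\text{subject to}\quad & x_i^T w + b \ge +1 - \xi_i && \text{for } y_i = +1,\\
& x_i^T w + b \le -1 + \xi_i && \text{for } y_i = -1,\\
& w, b \ \text{urs},\\
& \xi_i \ge 0 && \forall i,
\end{aligned}
$$

where k is a positive integer. When k is chosen to be 1, neither the $\xi_i$ nor their Lagrange multipliers appear in the Wolfe dual problem:

$$
\begin{aligned}
\text{Maximize}\quad & \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j\\
\text{subject to}\quad & \sum_i \alpha_i y_i = 0,\\
& 0 \le \alpha_i \le C && \forall i.
\end{aligned}
$$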

The data points can be separated nonlinearly by mapping the data into some higher-dimensional space and applying linear SVM to the mapped data. Instead of knowing the mapping $\Phi$ explicitly, SVM needs only the dot products of two transformed data points, $\Phi(x_i) \cdot \Phi(x_j)$. The kernel function $K$ is introduced such that $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$. Replacing $x_i^T x_j$ by $K(x_i, x_j)$ in the above problem, the separation becomes nonlinear, while the problem to be solved remains a quadratic program. In testing a new data point $x$ after training, the sign of the function $f(x)$ is computed to determine the group of $x$:

$$
f(x) = \sum_{i=1}^{N_s} \alpha_i y_i\, \Phi(s_i) \cdot \Phi(x) + b = \sum_{i=1}^{N_s} \alpha_i y_i\, K(s_i, x) + b,
$$

where the $s_i$ are the support vectors and $N_s$ is the number of support vectors. Again, the explicit form of $\Phi(x)$ is avoided.
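The decision function $f(x)$ can be sketched as follows. Two kernels are shown: the plain linear kernel, and a Gaussian (RBF) kernel. The RBF kernel is a common choice but an illustrative assumption, not one prescribed by the text. The support vectors, multipliers, $b$ and the parameter `gamma` are hypothetical inputs; in practice they would come from solving the dual problem. Only $K$ is ever evaluated, never $\Phi$.

```python
import math


def linear_kernel(u, v):
    """K(u, v) = u^T v, i.e. no mapping at all."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||u - v||^2); gamma is a user choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(x, support, alphas, labels, b, kernel=linear_kernel):
    """f(x) = sum_i alpha_i y_i K(s_i, x) + b over the support vectors s_i."""
    return sum(a * yi * kernel(s, x)
               for s, a, yi in zip(support, alphas, labels)) + b


def classify(x, support, alphas, labels, b, kernel=linear_kernel):
    """Group +1 or -1 from the sign of f(x); f(x) = 0 is sent to +1 here."""
    return 1 if decision(x, support, alphas, labels, b, kernel) >= 0 else -1


S = [[1.0, 0.0], [-1.0, 0.0]]   # hypothetical support vectors
A = [0.5, 0.5]                  # their multipliers
Y = [1, -1]                     # their labels
print(decision([2.0, 0.0], S, A, Y, 0.0))                     # linear kernel
print(classify([-3.0, 1.0], S, A, Y, 0.0, kernel=rbf_kernel))
```

Swapping `kernel` changes the shape of the separating surface without touching the rest of the code. This is the practical benefit of working only with dot products.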

Mangasarian provided a general mathematical pro- 
gramming framework for SVM, called generalized 
SVM or GSVM [79,83]. Special cases can be derived 
from GSVM, including the standard SVM. 

Many SVM-type methods have been developed 
by Mangasarian and others to solve huge classifica- 
tion problems more efficiently. These methods in- 
clude successive overrelaxation for SVM [82], proximal 
SVM [36,38], smooth SVM [68], reduced SVM [67], La- 
grangian SVM [84], incremental SVMs [37], and other 
methods [13,81]. Mangasarian [80] summarized some 
of the developments. Examples of applications of SVM 
include breast cancer studies [69,70] and genome re- 
search [73]. 

Hsu and Lin [49] compared different methods for 
multigroup classification using SVMs. Three methods 
studied were based on several binary classifiers: one 
against one, one against all, and directed acyclic graph 
(DAG) SVM. The other two methods studied were methods with a decomposition implementation. The experimental results show that the one-against-one and DAG methods are more suitable for practical use than the other methods. Lee et al. [66] proposed a generic approach to multigroup problems with some theoretical properties, and the proposed method was successfully applied to microarray data for cancer classification and satellite radiance profiles for cloud classification.

Gallagher et al. [39,40,63] offered the first discrete 
SVM for multigroup classification with reserved judge- 
ment. The approach has been successfully applied to 
a diverse variety of biological and medical applications (see Sect. “Mixed Integer Programming Based
Multigroup Classification Models and Applications to 
Medicine and Biology”). 


Mixed Integer Programming Based Multigroup 
Classification Models and Applications 
to Medicine and Biology 


Commonly used methods for classification, such as 
LDFs, decision trees, mathematical programming ap- 
proaches, SVMs, and artificial neural networks, can 
be viewed as attempts at approximating a Bayes op- 
timal rule for classification; that is, a rule that maxi- 
mizes (minimizes) the total probability of correct clas- 
sification (misclassification). Even if a Bayes optimal 
rule is known, intergroup misclassification rates may be 
higher than desired. For example, in a population that 
is mostly healthy, a Bayes optimal rule for medical di- 
agnosis might misdiagnose sick patients as healthy in 
order to maximize total probability of correct diagno- 
sis. As a remedy, a constrained discriminant rule that 
limits the misclassification rate is appealing. 
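A small hypothetical example makes the last point concrete. In the sketch below, all numbers are invented for illustration. In this population, 95% of patients are healthy, and a diagnostic marker takes three levels. At every level, the prior times the density is larger for the healthy group, so the Bayes optimal rule labels every patient healthy. That rule reaches 95% total correct classification while missing every sick patient. A rule constrained to label at most 40% of sick patients as healthy gives up a little total accuracy (93.25%) but flags the high-marker patients.

```python
from itertools import product

# Hypothetical two-group screening example: a marker takes three levels.
PRIOR = {"healthy": 0.95, "sick": 0.05}
DENSITY = {  # P(marker level | group)
    "healthy": {"low": 0.70, "mid": 0.25, "high": 0.05},
    "sick":    {"low": 0.10, "mid": 0.30, "high": 0.60},
}
LEVELS = ["low", "mid", "high"]


def evaluate(rule):
    """rule maps each level to a group; returns (P(correct), P(sick labeled healthy))."""
    correct = sum(PRIOR[g] * DENSITY[g][x] for x in LEVELS for g in PRIOR if rule[x] == g)
    missed = sum(DENSITY["sick"][x] for x in LEVELS if rule[x] == "healthy")
    return correct, missed


# Bayes rule: at each level pick the group with the largest prior * density.
bayes = {x: max(PRIOR, key=lambda g: PRIOR[g] * DENSITY[g][x]) for x in LEVELS}

# Constrained rule: best total accuracy among rules missing at most 40% of sick patients.
ALPHA = 0.40
rules = [dict(zip(LEVELS, groups)) for groups in product(PRIOR, repeat=len(LEVELS))]
feasible = [r for r in rules if evaluate(r)[1] <= ALPHA + 1e-12]
constrained = max(feasible, key=lambda r: evaluate(r)[0])

print(bayes, evaluate(bayes))
print(constrained, evaluate(constrained))
```

Enumerating all rules is only possible here because the attribute space has three points. With continuous attributes, the search over allocation regions is what makes the constrained problem hard.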

Assuming that the group density functions and 
prior probabilities are known, Anderson [2] showed 
that an optimal rule for the problem of maximizing 
the probability of correct classification subject to con- 
straints on the misclassification probabilities must be 
of a specific form when discriminating among mul- 
tiple groups with a simplified model. The formulae 
in Anderson’s result depend on a set of parameters 
satisfying a complex relationship between the density 
functions, the prior probabilities, and the bounds on 
the misclassification probabilities. Establishing a viable 
mathematical model to describe Anderson’s result, and 
finding values for these parameters that yield an opti- 
mal rule are challenging tasks. The first computational 
models utilizing Anderson’s formulae were proposed 
in [39,40]. 


Discrete Support Vector Machine Predictive Models 


As part of the work carried out at Georgia Institute 
of Technology’s Center for Operations Research in 
Medicine, we have developed a general-purpose dis- 
criminant analysis modeling framework and computa- 
tional engine that are applicable to a wide variety of 
applications, including biological, biomedical, and lo- 


gistics problems. Utilizing the technology of large-scale 
discrete optimization and SVMs, we have developed 
novel classification models that simultaneously include 
the following features: (1) the ability to classify any 
number of distinct groups; (2) the ability to incorporate 
heterogeneous types of attributes as input; (3) a high- 
dimensional data transformation that eliminates noise 
and errors in biological data; (4) constraints to limit 
the rate of misclassification, and a reserved-judgment 
region that provides a safeguard against overtraining 
(which tends to lead to high misclassification rates from 
the resulting predictive rule); and (5) successive mul- 
tistage classification capability to handle data points 
placed in the reserved-judgment region. Studies involv- 
ing tumor volume identification, ultrasonic cell disrup- 
tion in drug delivery, lung tumor cell motility analysis, 
CpG island aberrant methylation in human cancer, pre- 
dicting early atherosclerosis using biomarkers, and fin- 
gerprinting native and angiogenic microvascular net- 
works using functional perfusion data indicate that our 
approach is adaptable and can produce effective and re- 
liable predictive rules for various biomedical and biobe- 
havior phenomena [16,28,29,56,57,59,60,64,65]. 

Based on the description in [39,40,59,60,63], we 
summarize below some of the classification models we 
have developed. 


Modeling of Reserved-Judgment Region for General Groups When the population densities and prior probabilities are known, the constrained rules with a reject option (reserved judgment), based on Anderson’s results, call for finding a partition $\{R_0, \ldots, R_G\}$ of $\mathbb{R}^k$ that maximizes the probability of correct allocation subject to constraints on the misclassification probabilities; i.e.,

$$
\text{Maximize} \quad \sum_{g=1}^{G} \pi_g \int_{R_g} f_g(w)\, dw \tag{1}
$$

$$
\text{subject to} \quad \int_{R_g} f_h(w)\, dw \le \alpha_{hg}, \quad h, g = 1, \ldots, G,\ h \ne g, \tag{2}
$$

where $f_h$, $h \in \{1, \ldots, G\}$, are the group conditional density functions, $\pi_g$ denotes the prior probability that a randomly selected entity is from group $g$, $g \in \{1, \ldots, G\}$, and $\alpha_{hg}$, $h \ne g$, are constants between 0 and 1. Under quite general assumptions, it was shown that there exist unique (up to a set of measure zero) nonnegative constants $\lambda_{ih}$, $i, h \in \{1, \ldots, G\}$, $i \ne h$, such that the optimal rule is given by

$$
R_g = \Big\{x \in \mathbb{R}^k : L_g(x) = \max_{h \in \{0, 1, \ldots, G\}} L_h(x)\Big\}, \quad g = 0, \ldots, G, \tag{3}
$$

where

$$
L_0(x) = 0, \tag{4}
$$

$$
L_h(x) = \pi_h f_h(x) - \sum_{i=1,\, i \ne h}^{G} \lambda_{ih} f_i(x), \quad h = 1, \ldots, G. \tag{5}
$$


For $G = 2$ the optimal solution can be modeled rather straightforwardly. However, finding optimal $\lambda_{ih}$’s for the general case, $G \ge 3$, is a difficult problem, with the difficulty increasing as $G$ increases. Our model offers an avenue for modeling and finding the optimal solution in the general case. It is the first such model to be computationally viable [39,40].

Before proceeding, we note that $R_g$ can be written as $R_g = \{x \in \mathbb{R}^k : L_g(x) \ge L_h(x) \text{ for all } h = 0, \ldots, G\}$. So, since $L_g(x) \ge L_h(x)$ if, and only if, $\big(1/\sum_{t=1}^{G} f_t(x)\big) L_g(x) \ge \big(1/\sum_{t=1}^{G} f_t(x)\big) L_h(x)$, the functions $L_h$, $h = 1, \ldots, G$, can be redefined as

$$
L_h(x) = \pi_h p_h(x) - \sum_{i=1,\, i \ne h}^{G} \lambda_{ih} p_i(x), \quad h = 1, \ldots, G, \tag{6}
$$

where $p_i(x) = f_i(x) / \sum_{t=1}^{G} f_t(x)$. We assume that $L_h$ is defined as in (6) in our model.
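A minimal Python sketch of the allocation rule (3)-(6) follows. Two conventions are our assumptions: the 0-based list layout (`lam[i][h]` stands for $\lambda_{i+1,h+1}$) and sending ties to the smallest index. Dividing every $L_h$ by the positive quantity $\sum_t f_t(x)$ cannot change which index attains the maximum. So the allocation is the same whether raw densities or the normalized $p_i(x)$ of (6) are used, and the `normalize` flag lets one check this.

```python
def L_values(dens, priors, lam, normalize=True):
    """Return [L_0(x), L_1(x), ..., L_G(x)] following (4)-(6).

    dens[h]   : density of group h+1 at x   (lists are 0-based)
    priors[h] : prior probability of group h+1
    lam[i][h] : lambda_{i+1,h+1} >= 0; diagonal entries are ignored
    normalize : use p_i(x) = f_i(x) / sum_t f_t(x) as in (6) instead of f_i(x)
    """
    G = len(dens)
    total = sum(dens)
    q = [d / total for d in dens] if normalize else list(dens)
    L = [0.0]  # L_0(x) = 0 corresponds to the reserved-judgment region R_0
    for h in range(G):
        penalty = sum(lam[i][h] * q[i] for i in range(G) if i != h)
        L.append(priors[h] * q[h] - penalty)
    return L


def allocate(dens, priors, lam, normalize=True):
    """Index g of the region R_g in (3) that contains x (0 = reserved judgment).

    Ties, which occur only on a set of measure zero, go to the smallest index.
    """
    L = L_values(dens, priors, lam, normalize)
    return max(range(len(L)), key=lambda g: L[g])


f_x = [0.2, 0.1]          # hypothetical densities f_1(x), f_2(x)
pri = [0.5, 0.5]          # hypothetical priors
print(allocate(f_x, pri, [[0, 0], [0, 0]]))   # -> 1
print(allocate(f_x, pri, [[0, 0], [2, 0]]))   # -> 2 (lambda_21 = 2 penalizes L_1)
print(allocate(f_x, pri, [[0, 2], [2, 0]]))   # -> 0 (reserved judgment)
```

Increasing the $\lambda_{ih}$ shrinks the regions $R_1, \ldots, R_G$ and enlarges $R_0$. This is how the multipliers trade correct allocations for fewer misclassifications.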


Mixed Integer Programming Formulations Assume that we are given a training sample of $N$ entities whose group classifications are known; say, $n_g$ entities are in group $g$, where $\sum_{g=1}^{G} n_g = N$. Let the $k$-dimensional vectors $x^{gj}$, $g = 1, \ldots, G$, $j = 1, \ldots, n_g$, contain the measurements on $k$ available characteristics of the entities. Our procedure for deriving a discriminant rule proceeds in two stages. The first stage is to use the training sample to compute estimates $\hat f_h$, either parametrically or nonparametrically, of the density functions $f_h$ [89] and estimates $\hat\pi_h$ of the prior probabilities $\pi_h$, $h = 1, \ldots, G$. The second stage is to determine the optimal $\lambda_{ih}$’s given these estimates. This stage requires being able to estimate the probabilities of correct classification and misclassification for any candidate set of $\lambda_{ih}$’s. One could, in theory, substitute the estimated densities and prior probabilities into (5), and directly use the resulting regions $R_g$ in the integral expressions given in (1) and (2). This would involve, even in simple cases such as normally distributed groups, the numerical evaluation of $k$-dimensional integrals at each step of a search for the optimal $\lambda_{ih}$’s. Therefore, we have designed an alternative approach. After substituting the $\hat f_h$’s and $\hat\pi_h$’s into (5), we simply calculate the proportion of training sample points which fall in each of the regions $R_1, \ldots, R_G$. The MIP models discussed
below attempt to maximize the proportion of training 
sample points correctly classified while satisfying con- 
straints on the proportions of training sample points 
misclassified. This approach has two advantages. First, 
it avoids having to evaluate the potentially difficult inte- 
grals in (1) and (2). Second, it is nonparametric in con- 
trolling the training sample misclassification probabil- 
ities. That is, even if the densities are poorly estimated 
(by assuming, for example, normal densities for non- 
normal data), the constraints are still satisfied for the 
training sample. Better estimates of the densities may 
allow a higher correct classification rate to be achieved, 
but the constraints will be satisfied even if poor esti- 
mates are used. Unlike most SVM models that mini- 
mize the sum of errors, our objective is driven by the 
number of correct classifications, and will not be biased 
by the distance of the entities from the supporting hy- 
perplane. 
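The proportion-counting idea can be sketched as follows. The code takes estimates $\hat\pi_h$, the values $\hat p_h(x^{gj})$ for each training entity, and a candidate set of $\lambda_{ih}$. It counts how many group-$g$ training entities fall in each region. It then checks bounds of the form $\lfloor \alpha_{hg} n_g \rfloor$ on the entities allocated to a wrong group, the form used in constraint (11). No integrals are evaluated. The data layout (0-based lists, `alpha[h][g]`), the names, and the data are assumptions for illustration. Choosing the $\lambda_{ih}$ themselves is the job of the MIP models.

```python
import math


def L_row(p, priors, lam):
    """[L_1, ..., L_G] for one entity from p_hat values as in (6); 0-based lists."""
    G = len(p)
    return [priors[h] * p[h] - sum(lam[i][h] * p[i] for i in range(G) if i != h)
            for h in range(G)]


def allocation_counts(samples, priors, lam):
    """counts[g][r]: number of group-(g+1) training entities falling in region R_r,
    where r = 0 is the reserved-judgment region and r = h+1 is group h+1."""
    G = len(priors)
    counts = [[0] * (G + 1) for _ in range(G)]
    for g, entities in enumerate(samples):
        for p in entities:
            L = [0.0] + L_row(p, priors, lam)
            r = max(range(G + 1), key=lambda t: L[t])
            counts[g][r] += 1
    return counts


def evaluate_rule(samples, priors, lam, alpha):
    """Number of correctly classified training entities, and whether each count of
    group-g entities in R_h (h != g) is at most floor(alpha[h][g] * n_g)."""
    counts = allocation_counts(samples, priors, lam)
    G = len(priors)
    correct = sum(counts[g][g + 1] for g in range(G))
    ok = all(counts[g][h + 1] <= math.floor(alpha[h][g] * len(samples[g]))
             for g in range(G) for h in range(G) if h != g)
    return correct, ok


# Hypothetical p_hat vectors: 3 training entities of group 1 and 2 of group 2.
samples = [[[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]],
           [[0.2, 0.8], [0.4, 0.6]]]
print(allocation_counts(samples, [0.5, 0.5], [[0, 0], [0, 0]]))
print(evaluate_rule(samples, [0.5, 0.5], [[0, 0], [0, 0]], [[0, 0.5], [0.4, 0]]))
```

Tightening the bound for group-1 entities in $R_2$ to $\alpha_{21} = 0.3$ makes the same rule infeasible, because $\lfloor 0.3 \cdot 3 \rfloor = 0$. The optimizer would then have to raise some $\lambda_{ih}$.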

A word of caution is in order. In traditional un- 
constrained discriminant analysis, the true probabil- 
ity of correct classification of a given discriminant rule 
tends to be smaller than the rate of correct classifi- 
cation for the training sample from which it was de- 
rived. One would expect to observe such an effect for 
the method described herein as well. In addition, one 
would expect to observe an analogous effect with re- 
gard to constraints on misclassification probabilities — 
the true probabilities are likely to be greater than any 
limits imposed on the proportions of training sample 
misclassifications. Hence, the $\alpha_{hg}$ parameters should be
carefully chosen for the application at hand.

Our first model is a nonlinear 0/1 MIP model with the nonlinearity appearing in the constraints. Model 1 maximizes the number of correct classifications of the given $N$ training entities. Similarly, the constraints on the misclassification probabilities are modeled by ensuring that the number of group $g$ training entities in region $R_h$ is less than or equal to a prespecified percentage, $\alpha_{hg}$ ($0 < \alpha_{hg} < 1$), of the total number, $n_g$, of group $g$ entities, $h, g \in \{1, \ldots, G\}$, $h \ne g$.

For notational convenience, let $\mathcal{G} = \{1, \ldots, G\}$ and $\mathcal{N}_g = \{1, \ldots, n_g\}$, for $g \in \mathcal{G}$. Also, analogous to the definition of $p_i$, define $\hat p_i$ by $\hat p_i(x) = \hat f_i(x) / \sum_{t=1}^{G} \hat f_t(x)$. In our model, we use binary indicator variables to denote the group classification of entities. Mathematically, let $u_{hgj}$ be a binary variable indicating whether or not $x^{gj}$ lies in region $R_h$; i.e., whether or not the $j$th entity from group $g$ is allocated to group $h$. Then model 1 can be written as follows.

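Discriminant analysis MIP (DAMIP):

$$
\text{Maximize} \quad \sum_{g \in \mathcal{G}} \sum_{j \in \mathcal{N}_g} u_{ggj}
$$

subject to

$$
L_{hgj} = \hat\pi_h \hat p_h(x^{gj}) - \sum_{i \in \mathcal{G} \setminus \{h\}} \lambda_{ih} \hat p_i(x^{gj}), \quad h, g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{7}
$$

$$
y_{gj} = \max\{0, L_{hgj} : h = 1, \ldots, G\}, \quad g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{8}
$$

$$
y_{gj} - L_{ggj} \le M(1 - u_{ggj}), \quad g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{9}
$$

$$
y_{gj} - L_{hgj} \ge \epsilon(1 - u_{hgj}), \quad h, g \in \mathcal{G},\ j \in \mathcal{N}_g,\ h \ne g, \tag{10}
$$

$$
\sum_{j \in \mathcal{N}_g} u_{hgj} \le \lfloor \alpha_{hg} n_g \rfloor, \quad h, g \in \mathcal{G},\ h \ne g, \tag{11}
$$

$$
-\infty < L_{hgj} < \infty, \quad y_{gj} \ge 0, \quad \lambda_{ih} \ge 0, \quad u_{hgj} \in \{0, 1\}.
$$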


Constraint (7) defines the variable $L_{hgj}$ as the value of the function $L_h$ evaluated at $x^{gj}$. Therefore, the continuous variable $y_{gj}$, defined in constraint (8), represents $\max\{L_h(x^{gj}) : h = 0, \ldots, G\}$; and consequently, $x^{gj}$ lies in region $R_h$ if, and only if, $y_{gj} = L_{hgj}$. The binary variable $u_{hgj}$ is used to indicate whether or not $x^{gj}$ lies in region $R_h$; i.e., whether or not the $j$th entity from group $g$ is allocated to group $h$. In particular, constraint (9), together with the objective, forces $u_{ggj}$ to be 1 if, and only if, the $j$th entity from group $g$ is correctly allocated to group $g$; and constraints (10) and (11) ensure that at most $\lfloor \alpha_{hg} n_g \rfloor$ (i.e., the greatest integer less than or equal to $\alpha_{hg} n_g$) group $g$ entities are allocated to group $h$, $h \ne g$. One caveat regarding the indicator variables $u_{hgj}$ is that although the condition $u_{hgj} = 0$, $h \ne g$, implies (by constraint (10)) that $x^{gj} \notin R_h$, the converse need not hold. As a consequence, the number of misclassifications may be overcounted. However, in our preliminary numerical study we found that the actual amount of overcounting is minimal. One could force the converse (thus, $u_{hgj} = 1$ if and only if $x^{gj} \in R_h$) by adding constraints $y_{gj} - L_{hgj} \le M(1 - u_{hgj})$, for example. Finally, we note that the parameters $M$ and $\epsilon$ are extraneous to the discriminant analysis problem itself, but are needed in the model to control the indicator variables $u_{hgj}$. The intention is for $M$ and $\epsilon$ to be, respectively, large and small positive constants.
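To see how the big-$M$ and $\epsilon$ constraints behave, the sketch below examines one training entity with fixed $L_{hgj}$ values. For each $u_{hgj}$, it lists which values constraints (9) and (10) allow, with $y_{gj}$ fixed by (8). The default values of `M` and `eps` are arbitrary illustrative choices, and the function name is hypothetical. In the first usage case, $u_{hgj} = 1$ remains allowed for $h \ne g$ even though $x^{gj} \notin R_h$. This is exactly the possible overcounting mentioned above. Conversely, an entity whose $L_{hgj}$ lies within $\epsilon$ of $y_{gj}$ is forced to count as allocated to $h$.

```python
def indicator_options(L, g, M=1e3, eps=1e-3):
    """Values of u_hgj allowed by constraints (9) and (10) for one entity.

    L[h] holds L_{h+1,g,j} (0-based); g is the entity's true group (0-based).
    Returns {h: set of feasible u values}; y_gj is fixed by constraint (8).
    """
    y = max([0.0] + list(L))                      # constraint (8)
    options = {}
    for h, Lh in enumerate(L):
        allowed = set()
        for u in (0, 1):
            if h == g:
                ok = y - Lh <= M * (1 - u)        # constraint (9)
            else:
                ok = y - Lh >= eps * (1 - u)      # constraint (10)
            if ok:
                allowed.add(u)
        options[h] = allowed
    return options


print(indicator_options([0.3, 0.1], g=0))     # {0: {0, 1}, 1: {0, 1}}
print(indicator_options([0.1, 0.3], g=0))     # {0: {0}, 1: {1}}
print(indicator_options([0.3, 0.2995], g=0))  # u for h = 1 is forced to 1
```

In the first case the objective pushes $u_{ggj}$ to 1. Whether $u_{hgj}$ for $h \ne g$ settles at 0 depends on the pressure from (11).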


Model Variations We explore different variations in 
the model to grasp the quality of the solution and the 
associated computational effort. 

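A first variation involves transforming model 1 to an equivalent linear mixed integer model. In particular, model 2 replaces the $N$ constraints defined in (8) with the following system of $3GN + 2N$ constraints:

$$
y_{gj} \ge L_{hgj}, \quad h, g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{12}
$$

$$
y_{gj} - L_{hgj} \le M(1 - v_{hgj}), \quad h, g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{13}
$$

$$
\tilde y_{hgj} \le \hat\pi_h \hat p_h(x^{gj})\, v_{hgj}, \quad h, g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{14}
$$

$$
\sum_{h \in \mathcal{G}} v_{hgj} \le 1, \quad g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{15}
$$

$$
\sum_{h \in \mathcal{G}} \tilde y_{hgj} \ge y_{gj}, \quad g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{16}
$$

where $\tilde y_{hgj} \ge 0$ and $v_{hgj} \in \{0, 1\}$, $h, g \in \mathcal{G}$, $j \in \mathcal{N}_g$. These constraints, together with the nonnegativity of $y_{gj}$, force $y_{gj} = \max\{0, L_{hgj} : h = 1, \ldots, G\}$.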
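The claim that (12)-(16) reproduce the maximum in (8) can be checked by brute force for a single entity. The sketch enumerates the binary vectors $(v_{1gj}, \ldots, v_{Ggj})$ allowed by (15). For each, it computes the interval of $y_{gj}$ left feasible by the other constraints. It uses the fact that (14) and (16) together allow exactly those $y_{gj} \le \sum_h \hat\pi_h \hat p_h(x^{gj}) v_{hgj}$. The sketch assumes $L_{hgj} \le \hat\pi_h \hat p_h(x^{gj})$, which holds because $\lambda_{ih} \ge 0$, and that $M$ is large enough. The numbers and the function name are hypothetical.

```python
from itertools import product


def feasible_y(L, caps, M=100.0):
    """Feasible intervals (lo, hi) of y_gj under (12)-(16) for one entity.

    L[h]    : L_{hgj} for h = 1..G (0-based list)
    caps[h] : pi_hat_h * p_hat_h(x^{gj}), the bound appearing in (14)
    """
    G = len(L)
    intervals = []
    for v in product((0, 1), repeat=G):
        if sum(v) > 1:                                       # (15)
            continue
        lo = max([0.0] + list(L))                            # (12) and y_gj >= 0
        hi = min(L[h] + M * (1 - v[h]) for h in range(G))    # (13)
        hi = min(hi, sum(c * vh for c, vh in zip(caps, v)))  # (14) with (16)
        if lo <= hi:
            intervals.append((lo, hi))
    return intervals


for L in ([0.2, -0.1, 0.05], [-0.1, -0.3, -0.2]):
    print(L, feasible_y(L, caps=[0.3, 0.4, 0.3]))
```

In both cases every feasible interval collapses to the single point $\max\{0, L_{hgj}\}$. When all $L_{hgj} < 0$, this point is reached with $v = 0$.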

The second variation involves transforming model 1 to a heuristic linear MIP model. This is done by replacing the nonlinear constraint (8) with $y_{gj} \ge L_{hgj}$, $h, g \in \mathcal{G}$, $j \in \mathcal{N}_g$, and including penalty terms in the objective function. In particular, model 3 has the objective

$$
\text{Maximize} \quad \sum_{g \in \mathcal{G}} \sum_{j \in \mathcal{N}_g} \beta\, u_{ggj} - \sum_{g \in \mathcal{G}} \sum_{j \in \mathcal{N}_g} \gamma\, y_{gj},
$$

where $\beta$ and $\gamma$ are positive constants. This model is heuristic in that there is nothing to force $y_{gj} = \max\{0, L_{hgj} : h = 1, \ldots, G\}$. However, in addition to trying to force as many $u_{ggj}$’s to 1 as possible, the objective in model 3 also tries to make the $y_{gj}$’s as small as possible, and the optimizer tends to drive $y_{gj}$ towards $\max\{0, L_{hgj} : h = 1, \ldots, G\}$. We remark that $\beta$ and $\gamma$ could be stratified by group (i.e., introduce possibly distinct $\beta_g$, $\gamma_g$, $g \in \mathcal{G}$) to model the relative importance of certain groups to be correctly classified.

A reasonable modification to models 1, 2, and 3 involves relaxing the constraints specified by (11). Rather than placing restrictions on the number of type $g$ training entities classified into group $h$, for all $h, g \in \mathcal{G}$, $h \ne g$, one could simply place an upper bound on the total number of misclassified training entities. In this case, the $G(G-1)$ constraints specified by (11) would be replaced by the single constraint

$$
\sum_{g \in \mathcal{G}} \sum_{h \in \mathcal{G} \setminus \{g\}} \sum_{j \in \mathcal{N}_g} u_{hgj} \le \lfloor \alpha N \rfloor, \tag{17}
$$

where $\alpha$ is a constant between 0 and 1. We will refer to models 1, 2, and 3 modified in this way as models 1T, 2T, and 3T, respectively. Of course, other modifications are also possible. For instance, one could place restrictions on the total number of type $g$ points misclassified for each $g \in \mathcal{G}$. Thus, in place of the constraint specified in (17), one would include the constraints $\sum_{h \in \mathcal{G} \setminus \{g\}} \sum_{j \in \mathcal{N}_g} u_{hgj} \le \lfloor \alpha_g N \rfloor$, $g \in \mathcal{G}$, where $0 < \alpha_g < 1$.

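We also explore a heuristic linear model of model 1. In particular, consider the linear program (DALP):

$$
\text{Minimize} \quad \sum_{g \in \mathcal{G}} \sum_{j \in \mathcal{N}_g} (c_1 w_{gj} + c_2 y_{gj}) \tag{18}
$$

subject to

$$
L_{hgj} = \hat\pi_h \hat p_h(x^{gj}) - \sum_{i \in \mathcal{G} \setminus \{h\}} \lambda_{ih} \hat p_i(x^{gj}), \quad h, g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{19}
$$

$$
L_{ggj} - L_{hgj} + w_{gj} \ge 0, \quad h, g \in \mathcal{G},\ h \ne g,\ j \in \mathcal{N}_g, \tag{20}
$$

$$
L_{ggj} + w_{gj} \ge 0, \quad g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{21}
$$

$$
-L_{hgj} + y_{gj} \ge 0, \quad h, g \in \mathcal{G},\ j \in \mathcal{N}_g, \tag{22}
$$

$$
-\infty < L_{hgj} < \infty, \quad w_{gj}, y_{gj}, \lambda_{ih} \ge 0.
$$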


Constraint (19) defines the variable $L_{hgj}$ as the value of the function $L_h$ evaluated at $x^{gj}$. As the optimization solver searches through the set of feasible solutions, the $\lambda_{ih}$ variables will vary, causing the $L_{hgj}$ variables to assume different values. Constraints (20), (21), and (22) link the objective-function variables with the $L_{hgj}$ variables in such a way that correct classification of training entities and allocation of training entities into the reserved-judgment region are captured by the objective-function variables. In particular, if the optimization solver drives $w_{gj}$ to zero for some $g, j$ pair, then constraints (20) and (21) imply that $L_{ggj} = \max\{0, L_{hgj} : h \in \mathcal{G}\}$. Hence, the $j$th entity from group $g$ is correctly classified. If, on the other hand, the optimal solution yields $y_{gj} = 0$ for some $g, j$ pair, then constraint (22) implies that $\max\{0, L_{hgj} : h \in \mathcal{G}\} = 0$. Thus, the $j$th entity from group $g$ is placed in the reserved-judgment region. (Of course, it is possible for both $w_{gj}$ and $y_{gj}$ to be zero. One should decide prior to solving the linear program how to interpret the classification in such cases.) If both $w_{gj}$ and $y_{gj}$ are positive, the $j$th entity from group $g$ is misclassified.
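DALP minimizes the penalty $c_1 w_{gj} + c_2 y_{gj}$ with positive weights. Hence, for any fixed set of $\lambda_{ih}$’s, the best choice of $w_{gj}$ and $y_{gj}$ is the smallest one allowed by (20)-(22). Writing $m_{gj} = \max\{0, L_{hgj} : h \in \mathcal{G}\}$, these smallest values are $w_{gj} = m_{gj} - L_{ggj}$ and $y_{gj} = m_{gj}$. The sketch below computes these values for one entity and applies the interpretation just described. The function name, the label strings, and the data are hypothetical.

```python
def dalp_status(L, g):
    """Smallest w_gj and y_gj allowed by (20)-(22) for fixed lambdas, plus a label.

    L[h] holds L_{h+1,g,j} for h = 0..G-1, and g is the entity's true group
    (0-based).  The label follows the interpretation given in the text.
    """
    best = max([0.0] + list(L))   # max{0, L_hgj : h in G}
    w = best - L[g]               # (20), (21) and w >= 0 give w >= best - L_ggj
    y = best                      # (22) and y >= 0 give y >= best
    if w == 0 and y == 0:
        label = "both zero: interpretation to be fixed in advance"
    elif w == 0:
        label = "correct"
    elif y == 0:
        label = "reserved judgment"
    else:
        label = "misclassified"
    return w, y, label


for L in ([0.3, 0.1], [-0.2, -0.1], [0.1, 0.4], [0.0, -0.1]):
    print(L, dalp_status(L, g=0))
```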

The optimal solution yields a set of $\lambda_{ih}$’s that best allocates the training entities (i.e., “best” in terms of minimizing the penalty objective function). The optimal $\lambda_{ih}$’s can then be used to define the functions $L_h$, $h \in \mathcal{G}$, which in turn can be used to classify a new entity with feature vector $x \in \mathbb{R}^k$ by simply computing the index at which $\max\{L_h(x) : h \in \{0, 1, \ldots, G\}\}$ is achieved.

Note that model DALP places no a priori bound on the number of misclassified training entities. However, since the objective is to minimize a weighted combination of the variables $w_{gj}$ and $y_{gj}$, the optimizer will attempt to drive these variables to zero. Thus, the optimizer is, in essence, attempting either to correctly classify training entities ($w_{gj} = 0$) or to place them in the reserved-judgment region ($y_{gj} = 0$). By varying the weights $c_1$ and $c_2$, one has a means of controlling the optimizer’s emphasis on correctly classifying training entities versus placing them in the reserved-judgment region. If $c_2/c_1 < 1$, the optimizer will tend to place a greater emphasis on driving the $w_{gj}$ variables to zero than on driving the $y_{gj}$ variables to zero (and conversely if $c_2/c_1 > 1$). Hence, when $c_2/c_1 < 1$, one should expect to get relatively more entities correctly classified, fewer placed in the reserved-judgment region, and more misclassified than when $c_2/c_1 > 1$. An extreme case is when $c_2 = 0$. In this case, there is no emphasis on driving $y_{gj}$ to zero (the reserved-judgment region is thus ignored), and the full emphasis of the optimizer is to drive $w_{gj}$ to zero.

Table 1 summarizes the number of constraints, the total number of variables, and the number of 0/1 variables in each of the discrete SVM models, and in the heuristic LP model (DALP). Clearly, even for moderately sized discriminant analysis problems, the MIP instances are relatively large. Also, note that model 2 is larger than model 3, in terms of both the number of constraints and the number of variables. However, it is important to keep in mind that the difficulty of solving an MIP problem cannot, in general, be predicted solely by its size; problem structure has a direct and substantial bearing on the effort required to find optimal solutions. The LP relaxations of these MIP models also pose computational challenges, as commercial LP solvers return (optimal) LP solutions that are infeasible, owing to the equality constraints and the use of a big $M$ and a small $\epsilon$ in the formulation.

Disease Diagnosis: Optimization-Based Methods, Table 1
Model size

Model   Type             Constraints           Total variables       0/1 variables
1       Nonlinear MIP    2GN + N + G(G-1)      2GN + N + G(G-1)      GN
2       Linear MIP       5GN + 2N + G(G-1)     4GN + N + G(G-1)      2GN
3       Linear MIP       3GN + G(G-1)          2GN + N + G(G-1)      GN
3T      Linear MIP       3GN + 1               2GN + N + G(G-1)      GN
DALP    Linear program   3GN                   GN + 2N + G(G-1)      0

It is interesting to note that the set of feasible solutions for model 2 is “tighter” than that for model 3. In particular, if $F_i$ denotes the set of feasible solutions of model $i$, then

$$
\bar F_2 = \{(L, \lambda, u, y) : \text{there exist } \tilde y, v \text{ such that } (L, \lambda, u, y, \tilde y, v) \in F_2\} \subseteq F_3. \tag{23}
$$

The novelties of the classification models devel- 
oped herein include the following: (1) they are suitable 
for discriminant analysis given any number of groups, 
(2) they accept heterogeneous types of attributes as in- 
put, (3) they use a parametric approach to reduce high- 
dimensional attribute spaces, and (4) they allow constraints on the number of misclassifications and utilize a reserved-judgment region to facilitate the reduction of misclassifications. The last of these points opens the possibility of performing multistage analysis.

Clearly, the advantage of an LP model over an MIP 
model is that the associated problem instances are com- 
putationally much easier to solve. However, the most 
important criterion in judging a method for obtaining 
discriminant rules is how the rules perform in correctly 
classifying new unseen entities. Once the rule has been 
developed, applying it to a new entity to determine its 
group is trivial. Extensive computational experiments 
have been performed to gauge the qualities of solutions 
of different models [17,19,40,59,60,63]. 


Validation of Model and Computational Effort We 
performed tenfold cross-validation, and designed sim- 
ulation and comparison studies on our models. The 
results reported in [40,63] demonstrate that our approach works well when applied to both simulated data and data sets from the machine learning database repository [91]. In particular, our methods compare favorably with, and at times are superior to, other mathematical programming methods, including the GSFC model by Gehrlein [41] and the LP model by Gochet et al. [46], as well as Fisher’s LDF, artificial neural networks, quadratic discriminant analysis, tree classification, and other SVMs, on real biological and medical data.
Classification Results for Real-World Biological
and Medical Applications 


The main objective in discriminant analysis is to de- 
rive rules that can be used to classify entities into 
groups. Computationally, the challenge lies in the ef- 
fort expended to develop such a rule. Once the rule 
has been developed, applying it to a new entity to de- 
termine its group is trivial. Feasible solutions obtained 
from our classification models correspond to predictive 
rules. Empirical results [40,63] indicate that the result- 
ing classification model instances are computationally 
very challenging, and even intractable by competitive 
commercial MIP solvers. However, the resulting pre- 
dictive rules prove to be very promising, offering cor- 
rect classification rates on new unknown data ranging 
from 80 to 100% for various types of biological/medical 
problems. Our results indicate that the general-purpose 
classification framework that we have designed has the 
potential to be a very powerful predictive method for 
clinical settings. 

The choice of MIP as the underlying modeling 
and optimization technology for our SVM classification 
model is guided by the desire to simultaneously incor- 
porate a variety of important and desirable properties of 
predictive models within a general framework. MIP it- 
self allows for incorporation of continuous and discrete 
variables, and linear and nonlinear constraints, provid- 
ing a flexible and powerful modeling environment. 

Our mathematical modeling and computational 
algorithm design shows great promise as the result- 
ing predictive rules are able to produce higher rates of 
correct classification for new biological data (with un- 
known group status) compared with existing classifica- 
tion methods. This is partly due to the transformation 
of raw data via the set of constraints in (7). While most 
mathematical programming approaches directly deter- 
mine the hyperplanes of separation using raw data, our 
approach transforms the raw data via a probabilistic 
model, before the determination of the supporting hy- 
perplanes. Further, the separation is driven by maxi- 
mizing the sum of binary variables (representing cor- 
rect classification or not of entities), instead of max- 
imizing the margins between groups, or minimizing 
a sum of errors (representing distances of entities from 
hyperplanes), as in other SVMs. The combination of 
these two strategies offers better classification capability. Noise in the transformed data is less pronounced than in the raw data, and the magnitudes of the errors do not skew the determination of the separating hyperplanes, as all entities have equal importance when correct classification is being counted.

To highlight the broad applicability of our ap- 
proach, below we briefly summarize the application 
of our predictive models and solution algorithms to 
ten different biological problems. Each of the projects 
was carried out in close partnership with experimen- 
tal biologists and/or clinicians. Applications to finance 
and other industry applications are described else- 
where [17,40,63]. 


Determining the Type of Erythemato-Squamous 
Disease The differential diagnosis of erythemato-squamous diseases is an important problem in dermatology [60]. These diseases all share the clinical features of erythema and scaling, with very few differences. The six groups are psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris. Usually a biopsy is necessary for the
diagnosis but unfortunately these diseases share many 
histopathological features as well. Another difficulty for 
the differential diagnosis is that a disease may show the 
features of another disease at the beginning stage and 
may have the characteristic features at the following 
stages [91]. 

The six groups consisted of 366 subjects (112, 61, 72, 
49, 52, and 20 respectively) with 34 clinical attributes. 
Patients were first evaluated clinically with 12 features. 
Afterwards, skin samples were taken for the evalua- 
tion of 22 histopathological features. The values of the 
histopathological features were determined by an anal- 
ysis of the samples under a microscope. The 34 at- 
tributes include (1) clinical attributes (erythema, scal- 
ing, definite borders, itching, koebner phenomenon, 
polygonal papules, follicular papules, oral mucosal in- 
volvement, knee and elbow involvement, scalp involve- 
ment, family history, age) and (2) histopathological 
attributes (melanin incontinence, eosinophils in the in- 
filtrate, polymorphonuclear leukocyte infiltrate, fibrosis 
of the papillary dermis, exocytosis, acanthosis, hyperk- 
eratosis, parakeratosis, clubbing of the rete ridges, elon- 
gation of the rete ridges, thinning of the suprapapillary 
epidermis, spongiform pustule, Munro microabscess, 
focal hypergranulosis, disappearance of the granular 
layer, vacuolization and damage of basal layer, spongiosis, sawtooth appearance of retes, follicular horn plug, perifollicular parakeratosis, inflammatory mononuclear infiltrate, band-like infiltrate).

Our multigroup classification model selected 27 dis- 
criminatory attributes, and successfully classified the 
patients into six groups, each with an unbiased correct 
classification of greater than 93% (with 100% correct 
rate for groups 1, 3, 5, and 6) with an average overall 
accuracy of 98%. Using 250 subjects to develop the rule, 
and testing the remaining 116 patients, we obtained 
a prediction accuracy of 91%. 


Predicting Presence/Absence of Heart Disease The 
four databases concerning heart disease diagnosis were 
collected by Dr. Andras Janosi of the Hungarian Insti- 
tute of Cardiology, Budapest; Dr. William Steinbrunn 
of University Hospital, Zurich; Dr. Matthias Pfisterer 
of University Hospital, Basel; and Dr. Robert Detrano 
of V.A. Medical Center, Long Beach, and Cleveland 
Clinic Foundation [60]. Each database contains the 
same 76 attributes. The “goal” field refers to the pres- 
ence of heart disease in the patient. The classification 
attempts to distinguish presence (values 1, 2, 3, 4, in- 
volving a total of 509 subjects) from absence (value 
0, involving 411 subjects) [91]. The attributes include 
demographics, physiocardiovascular conditions, tradi- 
tional risk factors, family history, personal lifestyle, and 
cardiovascular exercise measurements. This data set has 
posed some challenges to past analysis via various clas- 
sification approaches, resulting in less than 80% correct 
classification. Applying our classification model with- 
out reserved judgment, we obtained 79 and 85% correct 
classification for each group respectively. To determine 
the usefulness of multistage analysis, we applied two- 
stage classification. In the first stage, 14 attributes were 
selected as discriminatory. One hundred and thirty-five 
group absence subjects were placed into the reserved- 
judgment region, with 85% of the remaining being clas- 
sified as group absence correctly; while 286 group pres- 
ence subjects were placed into the reserved-judgment 
region, and 91% of the remaining were classified correctly into the group presence. In the second stage, 11 attributes were selected, with 100 and 229 subjects classified into group absence and presence, respectively. Combining the two stages, we obtained a correct classification of 82 and 85%, respectively, for diagnosis of absence or presence of heart disease. Figure 1 illustrates the two-stage classification.

Disease Diagnosis: Optimization-Based Methods, Figure 1
A tree diagram for two-stage classification and prediction of heart disease


Predicting Aberrant CpG Island Methylation in Hu- 
man Cancer More details of this work can be found 
in [28,29]. Epigenetic silencing associated with aberrant 
methylation of promoter-region CpG islands is one 
mechanism leading to loss of tumor suppressor func- 
tion in human cancer. Profiling of CpG island methy- 
lation indicates that some genes are more frequently 
methylated than others, and that each tumor type is as- 
sociated with a unique set of methylated genes. How- 
ever, little is known about why certain genes succumb 
to this aberrant event. To address this question, we 
used restriction landmark genome scanning (RLGS) to 
analyze the susceptibility of 1749 unselected CpG is- 
lands to de novo methylation driven by overexpression 
of DNMT1. We found that whereas the overall inci- 
dence of CpG island methylation was increased in cells 
overexpressing DNMT1, not all loci were equally af- 
fected. The majority of CpG islands (69.9%) were re- 
sistant to de novo methylation, regardless of DNMT1 
overexpression. In contrast, we identified a subset of 
methylation-prone CpG islands (3.8%) that were con- 
sistently hypermethylated in multiple DNMT1 overex- 
pressing clones. Methylation-prone and methylation- 
resistant CpG islands were not significantly different 
with respect to size, C+G content, CpG frequency, 
chromosomal location, gene association, or promoter
association. To discriminate methylation-prone
from methylation-resistant CpG islands, we developed 
a novel DNA pattern recognition model and algo- 
rithm [61], and coupled our predictive model described 
herein with the patterns found. We were able to de- 
rive a classification function based on the frequency 
of seven novel sequence patterns that was capable of 
discriminating methylation-prone from methylation-resistant CpG islands with 90% correctness upon cross-validation, and 85% accuracy when tested against blind CpG islands whose methylation status was unknown to us. The data indicate that CpG islands differ in their
intrinsic susceptibility to de novo methylation, and sug- 
gest that the propensity for a CpG island to become 
aberrantly methylated can be predicted on the basis of 
its sequence context. 

The significance of this research is twofold. First, 
the identification of sequence patterns/attributes that 
distinguish methylation-prone CpG islands will lead to 
a better understanding of the basic mechanisms under- 
lying aberrant CpG island methylation. Because genes 
that are silenced by methylation are otherwise struc- 
turally sound, the potential for reactivating these genes 
by blocking or reversing the methylation process repre- 
sents an exciting new molecular target for chemothera- 
peutic intervention. A better understanding of the fac- 
tors that contribute to aberrant methylation, includ- 
ing the identification of sequence elements that may 
act to target aberrant methylation, will be an impor- 
tant step in achieving this long-term goal. Secondly, 
the classification of the more than 29,000 known (but 
as yet unclassified) CpG islands in human chromo- 
somes will provide an important resource for the iden- 
tification of novel gene targets for further study as po- 
tential molecular markers that could impact on both 
cancer prevention and treatment. Extensive RLGS fin- 
gerprint information (and thus potential training sets 
of methylated CpG islands) already exists for a num- 
ber of human tumor types, including breast, brain, 
lung, leukemias, hepatocellular carcinomas, and primi- 
tive neuroectodermal tumor [23,24,35,102]. Thus, the 
methods and tools developed are directly applicable 
to CpG island methylation data derived from human 
tumors. Moreover, new microarray-based techniques capable of “profiling” more than 7000 CpG islands have been developed and applied to human breast cancers [15,117,118]. We are uniquely poised to take advantage of the tumor CpG island methylation profile information that will likely be generated using these techniques over the next several years. Thus, our general predictive modeling framework has the potential to lead to improved diagnosis, prognosis, and treatment planning for cancer patients.


Discriminant Analysis of Cell Motility and Morphology Data in Human Lung Carcinoma

Refer to [16] for more details of this work. This study focuses on
the differential effects of extracellular matrix proteins 
on the motility and morphology of human lung epider- 
moid carcinoma cells. The behavior of carcinoma cells 
is contrasted with that of normal L-132 cells, resulting in a method for the prediction of metastatic potential. Data collected from time-lapse videomicroscopy
were used to simultaneously produce quantitative mea- 
sures of motility and morphology. The data were subse- 
quently analyzed using our discriminant analysis model 
and algorithm to discover relationships between motil- 
ity, morphology, and substratum. Our discriminant 
analysis tools enabled the consideration of many more 
cell attributes than is customary in cell motility stud- 
ies. The observations correlate with behaviors seen in 
vivo and suggest specific roles for the extracellular ma- 
trix proteins and their integrin receptors in metasta- 
sis. Cell translocation in vitro has been associated with 
malignancy, as has an elongated phenotype [120] and 
a rounded phenotype [97]. Our study suggests that ex- 
tracellular matrix proteins contribute in different ways 
to the malignancy of cancer cells, and that multiple ma- 
lignant phenotypes exist. 


Ultrasound-Assisted Cell Disruption for Drug Delivery

Reference [57] discusses this in detail. Although
biological effects of ultrasound must be avoided for safe 
diagnostic applications, ultrasound’s ability to disrupt 
cell membranes has attracted interest as a method to fa- 
cilitate drug and gene delivery. This preliminary study 
seeks to develop rules for predicting the degree of cell 
membrane disruption based on specified ultrasound 
parameters and measured acoustic signals. Too much 
ultrasound destroys cells, while cell membranes will not 
open up for absorption of macromolecules when too lit- 
tle ultrasound is applied. The key is to increase cell per- 
meability to allow absorption of macromolecules, and 
to apply ultrasound transiently to disrupt viable cells so 
as to enable exogenous material to enter without cell
damage. Thus our task is to uncover a “predictive rule” 
of ultrasound-mediated disruption of red blood cells 
using acoustic spectra and measurements of cell permeability recorded in experiments.

Our predictive model and solver for generating pre- 
diction rules were applied to data obtained from a se- 
quence of experiments on bovine red blood cells. For 
each experiment, the attributes consisted of four ultra- 
sound parameters, acoustic measurements at 400 fre- 
quencies, and a measure of cell membrane disruption. 
To avoid overtraining, various feature combinations of 
the 404 predictor variables were selected when develop- 
ing the classification rule. The results indicate that the 
variable combination consisting of ultrasound expo- 
sure time and acoustic signals measured at the driving 
frequency and its higher harmonics yields the best rule, 
and our method compares favorably with classification 
tree and other ad hoc approaches, with a correct clas- 
sification rate of 80% upon cross-validation and 85% 
when classifying new unknown entities. Our methods 
used for deriving the prediction rules are broadly ap- 
plicable, and could be used to develop prediction rules 
in other scenarios involving different cell types or tis- 
sues. These rules and the methods used to derive them 
could be used for real-time feedback about ultrasound’s 
biological effects. For example, such a rule could assist clinicians
during a drug delivery process, or could be imported 
into an implantable device inside the body for auto- 
matic drug delivery and monitoring. 


Identification of Tumor Shape and Volume in Treatment of Sarcoma

Reference [56] includes the detailed analysis. This project involves the determination of tumor shape for adjuvant brachytherapy treatment of sarcoma, based on catheter images taken after
surgery. In this application, the entities are overlapping 
consecutive triplets of catheter markings, each of which 
is used for determining the shape of the tumor contour. 
The triplets are to be classified into one of two groups: 
group 1 (triplets for which the middle catheter mark- 
ing should be bypassed) and group 2 (triplets for which 
the middle marking should not be bypassed). To de- 
velop and validate a classification rule, we used clini- 
cal data collected from 15 soft-tissue sarcoma patients. 
Cumulatively, this comprised 620 triplets of catheter 
markings. By careful (and tedious) clinical analysis of
the geometry of these triplets, 65 were determined to
belong to group 1, the “bypass” group, and 555 were 
determined to belong to group 2, the “do-not-bypass” 
group. 

A set of measurements associated with each triplet 
was then determined. The choice of what attributes 
to measure to best distinguish triplets as belonging to 
group 1 or group 2 is nontrivial. The attributes involved 
the distance between each pair of markings, angles, 
and the curvature formed by the three triplet mark- 
ings. On the basis of the attributes selected, our pre- 
dictive model was used to develop a classification rule. 
The resulting rule provides 98% correct classification 
on cross-validation, and was capable of correctly deter- 
mining/predicting 95% of the shape of the tumor with 
new patients’ data. We remark that the current clinical 
procedure requires manual outline based on markers in 
films of the tumor volume. This study was the first to 
use automatic construction of tumor shape for sarcoma 
adjuvant brachytherapy [56,62]. 
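The text says only that the triplet attributes involved pairwise distances, angles, and curvature. The sketch below shows one way such attributes could be computed. It assumes each catheter marking is a 2-D point (x, y) on the film, takes "angle" to mean the angle at the middle marking, and uses the Menger curvature (the reciprocal of the radius of the circle through the three points) as one plausible reading of "curvature". The function name `triplet_attributes` is hypothetical.

```python
import math


def triplet_attributes(p1, p2, p3):
    """Hypothetical attribute vector for a triplet of 2-D catheter markings.

    Assumes the three markings are distinct points. Returns the three
    pairwise distances, the angle (radians) at the middle marking p2, and
    the Menger curvature of the three points (0 when they are collinear).
    """
    d12 = math.dist(p1, p2)
    d23 = math.dist(p2, p3)
    d13 = math.dist(p1, p3)

    # Angle at the middle marking, from the vectors p2->p1 and p2->p3.
    v1 = (p1[0] - p2[0], p1[1] - p2[1])
    v2 = (p3[0] - p2[0], p3[1] - p2[1])
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (d12 * d23)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp rounding error

    # Menger curvature = 4 * triangle area / product of the side lengths;
    # the cross product equals twice the signed triangle area.
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    curvature = 2.0 * abs(cross) / (d12 * d23 * d13)

    return {"d12": d12, "d23": d23, "d13": d13,
            "angle": angle, "curvature": curvature}
```

For three points on the unit circle, such as (1, 0), (0, 1), (-1, 0), the curvature is 1 and the angle at the middle marking is a right angle. For collinear markings, the curvature is 0 and the angle is π.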


Discriminant Analysis of Biomarkers for Prediction of Early Atherosclerosis

More detail on this work can be found in [65]. Oxidative stress is an important
etiologic factor in the pathogenesis of vascular disease. 
Oxidative stress results from an imbalance between in- 
jurious oxidant and protective antioxidant events, of 
which the former predominate [88,103]. This results 
in the modification of proteins and DNA, alteration in 
gene expression, promotion of inflammation, and de- 
terioration in endothelial function in the vessel wall, 
all processes that ultimately trigger or exacerbate the 
atherosclerotic process [22,111]. It was hypothesized 
that novel biomarkers of oxidative stress would pre- 
dict early atherosclerosis in a relatively healthy non- 
smoking population free from cardiovascular disease. 
One hundred and twenty-seven healthy nonsmokers without known clinical atherosclerosis had carotid intima media thickness (IMT) measured using ultrasound. Plasma oxidative stress was estimated by measuring plasma lipid hydroperoxides using the determination of reactive oxygen metabolites (d-ROMs) test.
Clinical measurements include traditional risk factors, 
including age, sex, low-density lipoprotein (LDL), high- 
density lipoprotein (HDL), triglycerides, cholesterol, 
body-mass index (BMI), hypertension, diabetes melli- 
tus, smoking history, family history of coronary artery 
disease, Framingham risk score, and high-sensitivity
C-reactive protein. 

For this prediction, the patients were first clustered into two groups (group 1, IMT > 0.68; group 2, IMT < 0.68). On the basis of this separator, 30 patients
belonged to group 1, and 97 belonged to group 2. 
Through each iteration, the classification method trains and learns from the input training set and returns the most discriminatory patterns among the 14 clinical measurements, ultimately resulting in the development of a prediction rule based on observed values of these discriminatory patterns among the patient data.
Using all 127 patients as a training set, the predictive 
model identified age, sex, BMI, HDL cholesterol, family history of coronary artery disease under 60, high-sensitivity C-reactive protein, and d-ROM as discriminatory attributes that together provide unbiased correct classification of 90 and 93%, respectively, for group 1 (IMT > 0.68) and group 2 (IMT < 0.68) patients. To
further test the power of the classification method for 
correctly predicting the IMT status of new/unseen pa- 
tients, we randomly selected a smaller patient training 
set of size 90. The predictive rule from this training set 
yielded 80 and 89% correct rates for predicting the re- 
maining 37 patients as group 1 and group 2 patients, 
respectively. The importance of d-ROM as a discrimi- 
natory predictor for IMT status was confirmed during 
the machine learning process. This biomarker was se- 
lected in every iteration as the “machine” learned and 
was trained to develop a predictive rule to correctly 
classify patients in the training set. We also performed 
predictive analysis using Framingham risk score and 
d-ROM; in this case the unbiased correct classifica- 
tion rates (for the 127 individuals) for groups 1 and 2 
were 77 and 84%, respectively. This is the first study 
to illustrate that this measure of oxidative stress can 
be effectively used along with traditional risk factors to 
generate a predictive rule that can potentially serve as 
an inexpensive clinical diagnostic tool for prediction of 
early atherosclerosis. 
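The results above are reported as per-group correct classification rates, for example 90 and 93% for groups 1 and 2. The sketch below shows that bookkeeping together with the IMT-based group labeling. The 0.68 cutoff comes from the text. The function names are hypothetical, and the text does not say where a patient with IMT exactly equal to 0.68 belongs, so the sketch assigns that case to group 2 as an assumption.

```python
from collections import Counter


def imt_group(imt, cutoff=0.68):
    """Group 1 if IMT exceeds the cutoff, otherwise group 2.

    The text defines group 1 as IMT > 0.68 and group 2 as IMT < 0.68;
    sending IMT == 0.68 to group 2 is an assumption made here.
    """
    return 1 if imt > cutoff else 2


def per_group_rates(true_groups, predicted_groups):
    """Fraction of each true group that was assigned to the correct group."""
    totals = Counter(true_groups)
    correct = Counter(t for t, p in zip(true_groups, predicted_groups) if t == p)
    return {g: correct[g] / totals[g] for g in totals}
```

Reporting the two rates separately, rather than a single overall accuracy, keeps the smaller group visible. In this study there were 30 group 1 patients against 97 in group 2.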


Fingerprinting Native and Angiogenic Microvascular Networks Through Pattern Recognition and Discriminant Analysis of Functional Perfusion Data

The analysis and findings are described in [64]. The
cardiovascular system provides oxygen and nutrients 
to the entire body. Pathological conditions that impair normal microvascular perfusion can result in tissue ischemia, with potentially serious clinical effects. Conversely, development of new vascular structures fuels
the progression of cancer, macular degeneration, and 
atherosclerosis. Fluorescence microangiography offers 
superb imaging of the functional perfusion of new 
and existent microvasculature, but quantitative anal- 
ysis of the complex capillary patterns is challenging. 
We developed an automated pattern-recognition al- 
gorithm to systematically analyze the microvascular 
networks, and then applied our classification model 
described herein to generate a predictive rule. The 
pattern-recognition algorithm identifies the complex 
vascular branching patterns, and the predictive rule 
demonstrates, respectively, 100 and 91% correct clas- 
sification for perturbed (diseased) and normal tissue 
perfusion. We confirmed that transplantation of nor- 
mal bone marrow to mice in which genetic deficiency 
resulted in impaired angiogenesis eliminated predicted 
differences and restored normal-tissue perfusion pat- 
terns (with 100% correctness). The pattern-recogni- 
tion and classification method offers an elegant solution 
for the automated fingerprinting of microvascular net- 
works that could contribute to better understanding of 
angiogenic mechanisms and be utilized to diagnose and 
monitor microvascular deficiencies. Such information 
would be valuable for early detection and monitoring of 
functional abnormalities before they produce obvious 
and lasting effects, which may include improper perfu- 
sion of tissue, or support of tumor development. 

The algorithm can be used to discriminate between 
the angiogenic response in a native healthy specimen 
compared with groups with impairment due to age or 
chemical or other genetic deficiency. Similarly, it can 
be applied to analyze angiogenic responses as a result 
of various treatments. This will serve two important 
goals. First, the identification of discriminatory pat- 
terns/attributes that distinguish angiogenesis status will 
lead to a better understanding of the basic mechanisms 
underlying this process. Because therapeutic control of 
angiogenesis could influence physiological and patho- 
logical processes such as wound and tissue repairing, 
cancer progression and metastasis, or macular degener- 
ation, the ability to understand it under different con- 
ditions will offer new insight into developing novel 
therapeutic interventions, monitoring and treatment, 
especially in aging and heart disease. Thus, our study and the results form the foundation of a valuable diagnostic tool for changes in the functionality of the
microvasculature and for discovery of drugs that al- 
ter the angiogenic response. The methods can be ap- 
plied to tumor diagnosis, monitoring, and prognosis. 
In particular, it will be possible to derive microangio- 
graphic fingerprints to acquire specific microvascular 
patterns associated with early stages of tumor develop- 
ment. Such “angioprinting” could become an extremely 
helpful early diagnostic modality, especially for easily 
accessible tumors such as skin cancer. 


Prediction of Protein Localization Sites

The protein localization database consists of eight groups with
a total of 336 instances (143, 77, 52, 35, 20, 5, 2, 
and 2, respectively) with seven attributes [91]. The 
eight groups are eight localization sites of protein, including cytoplasm (cp), inner membrane without signal sequence (im), periplasm (pp), inner membrane with uncleavable signal sequence (imU), outer membrane (om), outer membrane lipoprotein (omL), inner membrane lipoprotein (imL), and inner membrane with cleavable signal sequence (imS). However, the last four
groups were taken out of our classification experiment 
since the population sizes are too small to ensure signif- 
icance. 
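The group counts quoted above can be checked directly. The eight groups total 336 instances, and keeping the four most populous leaves the 307 instances used below. This is a minimal sketch: the dictionary restates the counts from the text, and the helper `most_populous` with its "keep the n largest groups" rule is our reading of how the last four groups were dropped.

```python
# Group sizes of the protein localization database, restated from the text.
GROUP_SIZES = {"cp": 143, "im": 77, "pp": 52, "imU": 35,
               "om": 20, "omL": 5, "imL": 2, "imS": 2}


def most_populous(sizes, n_keep):
    """Names of the n_keep largest groups, largest first (hypothetical helper)."""
    return sorted(sizes, key=sizes.get, reverse=True)[:n_keep]


kept = most_populous(GROUP_SIZES, 4)
total_instances = sum(GROUP_SIZES.values())
kept_instances = sum(GROUP_SIZES[g] for g in kept)
print(kept, total_instances, kept_instances)  # ['cp', 'im', 'pp', 'imU'] 336 307
```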

The seven attributes include McGeoch’s method 
for signal sequence recognition (mcg), von Heijne’s 
method for signal sequence recognition (gvh), von Hei- 
jne’s signal peptidase II consensus sequence score (lip), 
presence of charge on N-terminus of predicted lipopro- 
teins (chg), score of discriminant analysis of the amino 
acid content of outer membrane and periplasmic pro- 
teins (aac), score of the ALOM membrane spanning 
region prediction program (alm1), and score of the 
ALOM program after excluding putative cleavable sig- 
nal regions from the sequence (alm2). 

In the classification we use four groups, 307 in- 
stances, with seven attributes. Our classification model 
selected the discriminatory patterns mcg, gvh, alm1, 
and alm2 to form the predictive rule with an unbiased correct classification rate of 89%, compared with 81% for other classification models [48].


Pattern Recognition in Satellite Images for Determining Types of Soil

The satellite database consists of the multispectral values of pixels in 3 × 3 neighborhoods in a satellite image, and the classification associated with the central pixel in each neighborhood. The
aim is to predict this classification, given the multispec- 
tral values. In the sample database, the class of a pixel 
is coded as a number. There are six groups with 4435 
samples in the training data set and 2000 samples in the 
testing data set, and each sample entity has 36 attributes
describing the spectral bands of the image [91]. 

The original Landsat Multi-Spectral Scanner (MSS) 
image data for this database were generated from data 
purchased from NASA by the Australian Centre for 
Remote Sensing. The Landsat satellite data are one of 
the many sources of information available for a scene. 
The interpretation of a scene by integrating spatial data 
of diverse types and resolutions including multispec- 
tral and radar data, maps indicating topography, land 
use, etc. is expected to assume significant importance 
with the onset of an era characterized by integrative ap- 
proaches to remote sensing (for example, NASA’s Earth 
Observing System commencing this decade). 

One frame of Landsat MSS imagery consists of four 
digital images of the same scene in different spectral 
bands. Two of these are in the visible region (corre- 
sponding approximately to green and red regions of the 
visible spectrum) and two are in the (near) infrared. 
Each pixel is an 8-bit binary word, with 0 correspond- 
ing to black and 255 to white. The spatial resolution 
of a pixel is about 80 m × 80 m. Each image contains
2340 × 3380 such pixels.

The database is a (tiny) subarea of a scene, consisting of 82 × 100 pixels. Each line of data corresponds to a 3 × 3 square neighborhood of pixels completely contained within the 82 × 100 subarea. Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the nine pixels in the 3 × 3 neighborhood and a number indicating the classification label of
the central pixel. The number is a code for the following 
six groups: red soil, cotton crop, gray soil, damp gray 
soil, soil with vegetation stubble, and very damp gray 
soil. Running our classification model, we selected 17 
discriminatory attributes to form the classification rule, 
producing an unbiased prediction with 85% accuracy. 
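The 36 attributes per sample are the four spectral band values of each of the nine pixels in a 3 × 3 neighborhood. The sketch below shows how such feature vectors could be extracted from a four-band image. It assumes the image is a nested list `image[row][col]` holding four 8-bit values per pixel. It also assumes attributes are ordered pixel by pixel (left to right, top to bottom), with the four bands of each pixel kept together; that ordering is an assumption, not something stated here. The function names are hypothetical.

```python
def neighborhood_features(image, row, col):
    """Return the 36 attributes of the 3 x 3 neighborhood centered at (row, col).

    image[r][c] is assumed to be a sequence of four band values (0-255).
    Pixels are read left to right, top to bottom, and the four band values
    of each pixel stay together (an assumed ordering).
    """
    n_rows, n_cols = len(image), len(image[0])
    if not (1 <= row < n_rows - 1 and 1 <= col < n_cols - 1):
        raise ValueError("3 x 3 neighborhood must lie entirely inside the image")
    features = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            features.extend(image[r][c])
    return features


def all_neighborhoods(image):
    """Feature vectors for every 3 x 3 neighborhood completely inside the image."""
    return [neighborhood_features(image, r, c)
            for r in range(1, len(image) - 1)
            for c in range(1, len(image[0]) - 1)]
```

Only neighborhoods completely contained in the subarea are used. This is why the loops in `all_neighborhoods` skip the border rows and columns.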


Further Advances 


Brooks and Lee [17,18] devised other variations of 
the basic DAMIP model. They also showed that 
DAMIP is strongly universally consistent (in some sense) with very good rates of convergence from Vapnik and Chervonenkis theory. A polynomial-time algorithm for discriminating between two populations with the DAMIP model was developed, and DAMIP was shown to be NP-complete for a general number of groups. The proof demonstrating NP-completeness
employs results used in generating edges of the conflict 
graph [4,11,12,55]. Exploiting the necessary and suffi- 
cient conditions that identify edges in the conflict graph 
is the central contribution to the improvement in solu- 
tion performance over industry-standard software. The 
conflict graph is the basis for various valid inequalities, 
a branching scheme, and for conditions under which 
integer variables are fixed for all solutions. Additional 
solution methods are identified which include a heuris- 
tic for finding solutions at nodes in the branch-and- 
bound tree, upper bounds for model parameters, and 
necessary conditions for edges in the conflict hyper- 
graph [26,58]. Further, we have concluded that DAMIP 
is a computationally feasible, consistent, stable, robust, 
and accurate classifier. 


Progress and Challenges 


We summarize in Table 2 the mathematical program- 
ming techniques used in classification problems as re- 
viewed in this chapter. 

As noted by current research efforts, multigroup 
classification remains NP-complete and much work is
needed to design effective models as well as to derive 
novel and efficient computational algorithms to solve 
these multigroup instances. 


Other Methods 


While most classification methods can be described 
in terms of discriminant functions, some methods 
are not trained in the paradigm of determining co- 
efficients or parameters for functions of a predefined 
form. These methods include classification and regres- 
sion trees, nearest-neighbor methods, and neural net- 
works. 

Classification and regression trees [14] are nonpara- 
metric approaches to prediction. Classification trees 
seek to develop classification rules from successive binary partitions of observations based on attribute values. Regression trees also employ rules consisting of binary partitions, but are used to predict continuous responses.

The rules generated by classification trees are easily 
viewable by plotting them in a treelike structure, from 
which the name arises. A test entity may be classified 
using rules in a tree plot by first comparing the entity’s 
data with the root node of the tree. If the root node con- 
dition is satisfied by the data for a particular entity, the 
left branch is followed to another node; otherwise, the 
right branch is followed to another node. The data from 
the observation are compared with conditions at subse- 
quent nodes until a leaf node is reached. 
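The traversal just described can be sketched in a few lines. The sketch assumes each internal node tests a single attribute against a threshold (`x[attribute] <= threshold`). This is a common form of node condition but not the only one. The `Node` class and the `classify` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Internal node tests x[attribute] <= threshold; a leaf carries a label."""
    attribute: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None   # followed when the condition is satisfied
    right: Optional["Node"] = None  # followed otherwise
    label: Optional[str] = None     # set only on leaf nodes


def classify(tree: Node, x: Sequence[float]) -> str:
    """Start at the root and follow branches until a leaf node is reached."""
    node = tree
    while node.label is None:
        node = node.left if x[node.attribute] <= node.threshold else node.right
    return node.label
```

For example, a root that tests attribute 0 against 5.0 can send entities with small values straight to a leaf, while the remaining entities are split again on attribute 1.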

Nearest-neighbor methods begin by establishing 
a set of labeled prototype observations. The nearest- 
neighbor classification rule assigns test entities to 
groups according to the group membership of the near- 
est prototype. Different measures of distance may be 
used. The k-nearest-neighbor rule assigns entities to 
groups according to the group membership of the k 
nearest prototypes. 
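A minimal sketch of the k-nearest-neighbor rule, assuming prototypes are numeric tuples with Euclidean distance as the default measure. Any other distance function can be passed in. The text does not say how ties in the vote are broken. Here a tie goes to the tied group whose prototype is nearest, which is an assumption. The name `knn_classify` is hypothetical.

```python
import math
from collections import Counter


def knn_classify(prototypes, labels, x, k=1, distance=math.dist):
    """Assign x to the majority group among its k nearest labeled prototypes.

    Ties in the vote are broken in favor of the tied group whose prototype
    is closest to x.
    """
    order = sorted(range(len(prototypes)), key=lambda i: distance(prototypes[i], x))
    nearest = [labels[i] for i in order[:k]]
    counts = Counter(nearest)
    best = max(counts.values())
    # Scan in order of increasing distance; the first group with the top count wins.
    for label in nearest:
        if counts[label] == best:
            return label
```

With k = 1 this reduces to the nearest-neighbor rule. Larger k lets a cluster of slightly more distant prototypes outvote a single close one.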

Neural networks are classification models that can 
also be interpreted in terms of discriminant functions, 
though they are used in a way that does not require 
finding an analytic form for the functions [25]. Neural 
networks are trained by considering one observation at 
a time, modifying the classification procedure slightly 
with each iteration. 
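The simplest instance of this one-observation-at-a-time training is a single-neuron perceptron for two groups. The sketch below illustrates that training pattern only. It is not a method described in this article. The labels -1/+1, the learning rate, the epoch limit, and the function names are all assumptions.

```python
def train_perceptron(X, y, epochs=20, rate=1.0):
    """Online perceptron for two groups labeled -1 and +1.

    Observations are visited one at a time; the weights change only when the
    current observation is misclassified (or lies on the decision boundary).
    Training stops early after a full pass with no mistakes.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:
                # Small corrective step toward classifying this observation correctly.
                w = [wj + rate * yi * xj for wj, xj in zip(w, xi)]
                b += rate * yi
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def predict(w, b, x):
    """Assign +1 if the linear score is positive, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

On linearly separable data the perceptron stops making mistakes after finitely many updates. Multilayer networks apply the same per-observation idea with gradient-based updates.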


Summary and Conclusion 


In this chapter, we presented an overview of mathemat- 
ical programming based classification models, and an- 
alyzed their development and advances in recent years. 
Many mathematical programming methods are geared 
toward two-group analysis only, and their performance 
is often compared with Fisher’s LDF or Smith’s QDF. 
It has been noted that these methods can be used for 
multiple group analysis by finding G(G − 1)/2 discriminants for each pair of groups (“one against one”) or by
finding G discriminants for each group versus the re- 
maining data (“one against all”), but these approaches 
can lead to ambiguous classification rules [25]. 
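The sketch below shows how the "one against one" scheme can be ambiguous. It combines the G(G − 1)/2 pairwise rules by majority vote and reports a tie when no group wins outright. The function `pairwise_winner` is a hypothetical stand-in for any trained two-group discriminant. Combining by majority vote and returning `None` on a tie are illustrative conventions.

```python
from collections import Counter
from itertools import combinations


def one_against_one(groups, pairwise_winner, x):
    """Majority vote over all G(G - 1)/2 pairwise two-group rules.

    pairwise_winner(g, h, x) must return g or h. Returns the winning group,
    or None when the top vote count is shared (an ambiguous classification).
    """
    votes = Counter(pairwise_winner(g, h, x) for g, h in combinations(groups, 2))
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

With three groups whose pairwise winners form a cycle (A beats B, B beats C, C beats A), every group gets one vote. No single group can then be chosen.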
Mathematical programming methods developed for 
multiple group analysis have been described [10,32, 
39,40,41,46,59,60,63,93]. Multiple group formulations 
for SVMs have been proposed and tested [17,36,40,49,59,60,66], but are still considered computationally intensive [49]. The “one-against-one” and “one-against-all” methods with SVMs have been successfully applied [49,90].

Disease Diagnosis: Optimization-Based Methods, Table 2
Progress in mathematical programming-based classification models

Mathematical programming methods | References

Linear programming, two-group classification
Separate data by hyperplanes | [74,75]
Minimizing the sum of deviations, minimizing the maximum deviation, and minimizing the sum of interior distances | [5,31,32,33,47,99]
Hybrid model | [45,99]
Review | [27,50,107]
Software | [110]
Issues about normalization | [34,44,51,52,53,87,100,114,115,116]
Robust linear programming | [9,86]
Inclusion of second-order terms | [104,113]
Effect of the position of outliers | [94]
Binary attributes | [3]

Linear programming, multigroup classification
Single function classification | [32]
Multiple function classification | [10,46]
Classification with reserved-judgment region using linear programming | [39,40,60,63]

Mixed integer programming, two-group classification
Minimizing the number of misclassifications | [1,5,6,7,54,101,105,109,119]
Review | [27,50,107]
Software | [110]
Secondary goals | [96]

We also discussed a class of multigroup general- 
purpose predictive models that we have developed 
based on the technology of large-scale optimization and 
SVMs [17,19,39,40,59,60,63]. Our models seek to max- 
imize the correct classification rate while constraining 
the number of misclassifications in each group. The 
models incorporate the following features: (1) the ability to classify any number of distinct groups; (2) the incorporation of heterogeneous types of attributes as input; (3) a high-dimensional data transformation that eliminates noise and errors in biological data; (4) constraints on the misclassification in each group and a reserved-judgment region that provides a safeguard against overtraining (which tends to lead to high misclassification rates from the resulting predictive rule); and (5) successive multistage classification capability to handle data points placed in the reserved-judgment region. The performance and predictive power of the classification models are validated through a broad class of biological and medical applications.
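Features (4) and (5) together amount to a simple control flow: an entity left in the reserved-judgment region by one stage is passed to the next stage. The sketch below shows only that control flow. It does not reproduce the classification rule of the DAMIP model. `RESERVED`, `multistage_classify`, and the margin-based `margin_stage` are hypothetical illustrations.

```python
RESERVED = 0  # hypothetical label for the reserved-judgment region


def multistage_classify(stages, x):
    """Pass x through successive classifiers until one commits to a group.

    Each stage is a callable returning a group label or RESERVED. Entities
    left in the reserved-judgment region by every stage remain RESERVED.
    """
    for stage in stages:
        label = stage(x)
        if label != RESERVED:
            return label
    return RESERVED


def margin_stage(score, margin):
    """Hypothetical two-group stage: commit only when |score(x)| exceeds margin."""
    def stage(x):
        s = score(x)
        if s > margin:
            return 1
        if s < -margin:
            return 2
        return RESERVED
    return stage
```

A cautious first stage with a wide margin commits only on clear-cut entities. A later stage, perhaps using different attributes, then handles what remains, as in the two-stage heart disease classification.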

Classification models are critical to medical ad- 
vances as they can be used in genomic, cell, molec- 
ular, and system-level analyses to assist in early pre- 
diction, diagnosis and detection of disease, as well as 
for intervention and monitoring. As shown in the CpG 
island study for human cancer, such prediction and diagnosis opens up novel therapeutic sites for early intervention. The ultrasound application illustrates the use of classification in a novel drug delivery mechanism, assisting
clinicians during a drug delivery process, or in devising 
devices that can be implanted into the body for auto- 
mated drug delivery and monitoring. The lung cancer 
cell motility study offers an understanding of how can- 
cer cells behave in different protein media, thus assist- 
ing in the identification of potential gene therapy and 
target treatment. Prediction of the shape of a cancer 
tumor bed provides a personalized treatment design, 
replacing manual estimates by sophisticated computer 
predictive models. Prediction of early atherosclerosis 
through inexpensive biomarker measurements and tra- 
ditional risk factors can serve as a potential clinical di- 
agnostic tool for routine physical and health mainte- 
nance, alerting physicians and patients to the need for 
early intervention to prevent serious vascular disease. 
Fingerprinting of microvascular networks opens up the 
possibility for early diagnosis of perturbed systems in 
the body that may trigger disease (e.g., genetic deficiency, diabetes, aging, obesity, macular degeneration, tumor formation), identification of target sites for treatment, and monitoring of prognosis and treatment success. Determining the type of erythemato-squamous
disease and the presence/absence of heart disease helps 
clinicians to correctly diagnose and effectively treat pa- 
tients. Thus, classification models serve as a basis for 
predictive medicine where the desire is to diagnose 
early and provide personalized targeted intervention. This
has the potential to reduce healthcare costs, improve 
success of treatment, and improve quality of life of pa- 
tients. 
Disjunctive programming (DP) problems can be stated
in the form 

(DP) Minimize $\{f(x) : x \in X,\ x \in \bigcup_{h \in H} S_h\}$,
where $f: \mathbb{R}^n \to \mathbb{R}$ is a lower semicontinuous function, $X$ is a closed convex subset of the nonnegative orthant of $\mathbb{R}^n$, and $H$ is an index set for the collection of nonempty polyhedra

$S_h = \{x : A^h x \ge b^h,\ x \ge 0\}, \quad h \in H. \qquad (1)$


The name for this class of problems arises from the 
feature that the constraints in (1) include the disjunc- 
tion that at least one of the (linear) sets of constraints 
defining $S_h$, for $h \in H$, must be satisfied. Problems
including other logical conditions such as conjunc- 
tions, negations, and implications can be cast in the 
framework of this problem. Problem (DP) subsumes 
the classes of 0-1 mixed integer problems, the general- 
ized lattice point problem, the cardinality constrained 


linear program, the extreme point optimization prob- 
lem, the linear complementarity problem, among nu- 
merous others, and finds application in several re- 
lated problems such as orthogonal production schedul- 
ing, scheduling on identical machines, multistage as- 
signment, location-allocation problems, load balancing 
problems, the segregated storage problem, the fixed- 
charge problem, project/portfolio selection problems, 
goal programming problems, and many other game 
theory and decision theory problems (see [35] for a de- 
tailed discussion of such problems and applications). 
The theory and algorithms for disjunctive program- 
ming problems are mainly supported by the fundamen- 
tal disjunctive cut principle. The forward part of this re- 
sult due to E. Balas [4,5] states that for any nonnegative 
surrogate multiplier vectors $\lambda^h$, $h \in H$, the inequality

$\left(\sup_{h \in H} \lambda^h A^h\right) x \ge \inf_{h \in H} \lambda^h b^h \qquad (2)$

is valid for (or is implied by) the disjunction $x \in \bigcup_{h \in H} S_h$, where the $\sup\{\cdot\}$ and $\inf\{\cdot\}$ are taken componentwise in (2). More importantly, the converse part of this result due to R.G. Jeroslow [16] states that for any given valid inequality $\pi x \ge \pi_0$ for the disjunction $x \in \bigcup_{h \in H} S_h$, there exist nonnegative surrogate multipliers $\lambda^h$, $h \in H$, such that the disjunctive cut (2) implies this
given valid inequality, or uniformly dominates it, over 
the nonnegative orthant. This disjunctive cut principle 
also arises from the setting of convexity cuts and poly- 
hedral annexation methods as propounded by F. Glover 
[11,12], and it subsumes as well as can improve upon 
many types of classical cutting planes such as Gomory’s 
mixed integer cuts, intersection cuts, and reverse outer 
polar cuts for 0-1 programs (see [4,5,11,12,35]). H.P. 
Williams [39] provides some additional insights into 
disjunctive formulations. 
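The disjunctive cut (2) can be computed directly from a given choice of multipliers. Its validity follows in one line. For $x \in S_h$ with $x \ge 0$ and $\lambda^h \ge 0$, we have $\alpha x \ge (\lambda^h A^h) x \ge \lambda^h b^h \ge \beta$, where $\alpha = \sup_h \lambda^h A^h$ (componentwise) and $\beta = \inf_h \lambda^h b^h$. The sketch below assumes each $S_h$ is given as a dense list-of-rows matrix `A[h]` and vector `b[h]` in the form of (1), with a nonnegative multiplier vector `lam[h]` having one entry per row. The function name `disjunctive_cut` is hypothetical.

```python
def disjunctive_cut(A, b, lam):
    """Cut alpha x >= beta valid for x in the union of S_h = {x : A^h x >= b^h, x >= 0}.

    A[h] is a list of constraint rows, b[h] the right-hand side, and lam[h]
    a nonnegative multiplier vector with one entry per row of A[h].
    alpha_j = max_h (lam^h A^h)_j and beta = min_h lam^h b^h.
    """
    surrogates = []
    for Ah, bh, lh in zip(A, b, lam):
        n = len(Ah[0])
        # Surrogate constraint (lam^h A^h) x >= lam^h b^h implied by S_h.
        row = [sum(lh[i] * Ah[i][j] for i in range(len(Ah))) for j in range(n)]
        rhs = sum(lh[i] * bh[i] for i in range(len(bh)))
        surrogates.append((row, rhs))
    n = len(surrogates[0][0])
    alpha = [max(row[j] for row, _ in surrogates) for j in range(n)]
    beta = min(rhs for _, rhs in surrogates)
    return alpha, beta
```

For example, take $S_1 = \{x : x_1 \ge 1, x \ge 0\}$ and $S_2 = \{x : x_2 \ge 1, x \ge 0\}$ with unit multipliers. The cut is then $x_1 + x_2 \ge 1$. Changing the multiplier on $S_1$ to 2 gives the weaker valid cut $2x_1 + x_2 \ge 1$.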

The generation of particular types of “deep cuts” to
delete a given solution (say, the origin, without loss of 
generality) based on the criteria of maximizing the Eu- 
clidean distance or the rectilinear distance between the 
origin and the nonnegative region feasible to the cut- 
ting plane, or maximizing the surplus in the cut with 
respect to the origin subject to suitable normalization 
constraints has also been explored in [34,37]. The in- 
tent behind such cutting plane methods is to generate 
nondominated valid inequalities that are supports (and 
hopefully, facets) of the closure convex hull of solutions 
feasible to the disjunction. H.D. Sherali and C.M. Shetty
[35,37] discuss how different alternate formulations of 
the disjunctive statement can influence the strength of 
the cut derived therefrom, and demonstrate how a se- 
quence of augmented formulations can be used to se- 
quentially tighten a given valid inequality. This process 
turns out to be precisely the Glover polyhedral annexation scheme in [12]. In contrast with this sequence-dependent ‘lifting’ procedure, Sherali and Shetty [37] propose a ‘simultaneous lifting’ variant of this approach.
Other types of disjunctive cutting planes for special 
problems include the cuts of [4,5,10,11,12,20], and [32] 
for linear knapsack, multiple choice and combinatorial 
disjunctions, [29] for linear complementarity problems, 
and the facet cuts of [25] based on the convex hull of 
certain types of disjunctions. 

Balas [3] also provides an algebraic characterization 
for the closure convex hull of a union of polyhedra. This 
characterization is particularly useful in the study of the 
important class of facial disjunctive programs, that sub- 
sumes mixed integer 0-1 problems and linear comple- 
mentarity problems, for example. A facial disjunctive 
program (FDP) can be stated as follows. 

$$\text{(FDP)} \qquad \text{Minimize } \{cx : x \in X \cap Y\},$$

where X is a nonempty polytope in $\mathbb{R}^n$, and where Y is a conjunction of some $\bar{h}$ disjunctions given in the so-called conjunctive normal form (conjunction of disjunctions)

$$Y = \bigcap_{h \in H} \Big[ \bigvee_{i \in Q_h} \{x : a_i^h x \ge b_i^h\} \Big]. \qquad (3)$$


Here, $H = \{1, \dots, \bar{h}\}$, and for each $h \in H$ we have specified a disjunction that requires at least one of the inequalities $a_i^h x \ge b_i^h$, for $i \in Q_h$, to be satisfied. The terminology ‘facial’ conveys the feature that $X \cap \{x : a_i^h x \ge b_i^h\}$ defines a face of X for each $i \in Q_h$, $h \in H$. For example, in the context of 0-1 mixed integer problems, the set X represents the linear programming relaxation of the problem, and for each binary variable $x_h$, $h \in H$, the corresponding disjunction in (3) states that $x_h \le 0$ or $x_h \ge 1$ should hold true (where $0 \le x_h \le 1$ is included within X). Balas [3] shows that for facial disjunctive programs, the convex hull of feasible solutions can be constructed inductively by starting with $K_0 = X$ and then determining

$$K_h = \operatorname{conv}\Big[ \bigcup_{i \in Q_h} \big(K_{h-1} \cap \{x : a_i^h x \ge b_i^h\}\big) \Big] \quad \text{for } h = 1, \dots, \bar{h}, \qquad (4)$$

where $K_{\bar{h}}$ produces $\operatorname{conv}(X \cap Y)$. Based on this, a hierarchy of relaxations $K_0, \dots, K_{\bar{h}}$ is generated for (FDP) that spans the spectrum from the linear programming relaxation to the convex hull representation [6]. Each member in this hierarchy can also be viewed as being obtained by representing the feasible region of the original problem as the intersection of the union of certain polyhedra, and then taking a hull-relaxation of this representation. Here, for a set $D = \bigcap_j D_j$, where each $D_j$ is the union of certain polyhedra, the hull-relaxation of D [3] is defined as $h\text{-rel}(D) = \bigcap_j \operatorname{conv}(D_j) \supseteq \operatorname{conv}(D)$.

In the context of 0-1 mixed integer problems (MIP), Sherali and W.P. Adams [27,28] develop a reformulation-linearization technique (RLT) for generating a hierarchy of such relaxations, introducing the notion of multiplying constraints using factors composed of $x_h$ and $(1 - x_h)$, $h \in H$, to reformulate the problem, followed by a variable substitution to linearize the resulting problem. Approaches based on such constraint product and linearization strategies were used by these authors earlier in the context of several special applications [1,2,26]. Later, L. Lovász and A. Schrijver [17] independently used more general constraint factors to generate a similar hierarchy for 0-1 problems. The foregoing RLT construct can be specialized to derive $K_h$, defined by (4) for 0-1 MIPs, where in this case

$$K_h = \operatorname{conv}\big[(K_{h-1} \cap \{x : x_h \le 0\}) \cup (K_{h-1} \cap \{x : x_h \ge 1\})\big]$$

can be obtained by multiplying the (implicitly defined) constraints of $K_{h-1}$ by $x_h$ and $(1 - x_h)$ and then linearizing the resulting problem. This RLT approach is used in [8] in the ‘lift-and-project’ hierarchy of relaxations. However, the RLT process of [27,28] generates tighter relaxations at each level, which can be viewed as hull relaxations produced by the intersection of the convex hull of the union of certain specially constructed polyhedra. No direct realization of (4) can produce these relaxations. For a survey on RLT approaches and for further enhancements, see [29,30].
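The product-and-linearize step of RLT can be illustrated on a single linear constraint. The Python sketch below is a hypothetical helper, not the construction of [27,28] in full generality: it multiplies a constraint $\sum_i a_i x_i \ge b$ by the factors $x_j$ and $(1 - x_j)$, uses $x_j^2 = x_j$ for binary $x_j$, and replaces each product $x_i x_j$ ($i \ne j$) by a new variable written `w[i,j]`. Expressions are dictionaries mapping variable names to coefficients, the key `"const"` holds the constant term, and each returned expression is meant as "≥ 0".

```python
from collections import defaultdict
from typing import Dict, Tuple

Expr = Dict[str, float]  # variable name -> coefficient; "const" is the constant term


def w(i: int, k: int) -> str:
    """Name of the linearization variable that replaces the product x_i * x_k."""
    return f"w[{min(i, k)},{max(i, k)}]"


def rlt_factor(a: Dict[int, float], b: float, j: int) -> Tuple[Expr, Expr]:
    """Multiply sum_i a_i x_i - b >= 0 by x_j and by (1 - x_j), then linearize.

    Assumes x_j is binary, so that x_j * x_j = x_j.
    """
    # Factor x_j:  sum_i a_i x_i x_j - b x_j >= 0
    prod: Dict[str, float] = defaultdict(float)
    for i, ai in a.items():
        if i == j:
            prod[f"x{j}"] += ai  # x_j^2 = x_j
        else:
            prod[w(i, j)] += ai  # x_i x_j -> w[i,j]
    prod[f"x{j}"] -= b

    # Factor (1 - x_j):  (sum_i a_i x_i - b) minus the x_j-product above >= 0
    comp: Dict[str, float] = defaultdict(float)
    for i, ai in a.items():
        comp[f"x{i}"] += ai
    comp["const"] -= b
    for name, coef in prod.items():
        comp[name] -= coef

    def clean(e: Dict[str, float]) -> Expr:
        return {k: v for k, v in e.items() if v != 0}

    return clean(prod), clean(comp)


# 2 x1 + 3 x2 >= 4, multiplied by x1 and by (1 - x1)
by_x, by_one_minus_x = rlt_factor({1: 2.0, 2: 3.0}, 4.0, j=1)
print(by_x)  # {'x1': -2.0, 'w[1,2]': 3.0}
print(by_one_minus_x)
```

Adding the two linearized products recovers the original constraint, which is a quick sanity check; higher levels of the RLT hierarchy use products of several such factors.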
In the context of general facial disjunctive programs, Jeroslow [15] presented a cutting plane algorithm that generates suitable facetial inequalities at each stage of the procedure such that overall finite convergence is guaranteed via (4). This is accomplished by showing that in the worst case, the hierarchy $K_0, \dots, K_{\bar{h}}$ would be generated. The lift-and-project algorithm of [8] employs this cutting plane procedure
based on the foregoing hierarchy of relaxations. Balas 
[7] also addresses an enhanced procedure that consid- 
ers two variables at a time to define the disjunctions. 
The RLT process is used to construct partial convex 
hulls, and the resulting relaxations are embedded in 
a branch and cut algorithm. 

Furthermore, for general facial disjunctive pro- 
grams, Sherali and Shetty [36] present another finitely 
convergent cutting plane algorithm. At each step, this 
procedure searches for extreme faces of X relative to 
the cuts generated thus far (these are faces that do not 
contain any feasible points lying in a lower-dimensional 
face of X, see [18]), and based on the dimension of this 
extreme face and its feasibility to Y, either a disjunctive 
face cut or a disjunctive intersection cut is generated. 
This procedure was specialized for bilinear program- 
ming problems in [33] to derive a first nonenumerative 
finitely convergent algorithm for this class of problems. 

Other disjunctive cutting plane algorithms include 
the Sherali-Sen procedures [31] for solving the general 
class of extreme point mathematical programs, the Bap- 
tiste-LePape procedures [9], and the Pinto-Grossmann 
procedures [21] for solving certain scheduling prob- 
lems having disjunctive logic constraints. S. Sen and 
Sherali [24] also discuss issues related to designing con- 
vergent cutting plane algorithms, and present examples 
to show nonconvergence of certain iterative disjunc- 
tive cutting plane methods. Sensitivity and stability is- 
sues related to feasible and optimal sets of disjunctive 
programs have been addressed in [14]; [13] deals with 
the problem of solving algebraic systems of disjunctive 
equations. For other applications of disjunctive meth- 
ods to process systems engineering, and to logic pro- 
gramming, see [19,23,38]. 


See also 


> MINLP: Branch and Bound Global Optimization Algorithm

> MINLP: Branch and Bound Methods

> MINLP: Global Optimization with αBB

> MINLP: Logic-Based Methods

> Reformulation-linearization Methods for Global Optimization
Abstract


Protein force fields play an important role in protein 
structure prediction. Knowledge based force fields use 
the database information to derive the interaction en- 
ergy between different residues or atoms of a protein. 
These simplified force fields require less computational 
effort and are relatively easy to use. A Cα–Cα distance dependent high resolution force field has been developed using a set of high quality (low rmsd) decoys.
A linear programming based formulation was used in 
which non-native “decoy” conformers are forced to 
take a higher energy compared to the corresponding 
native structure. This force field was tested on an inde- 
pendent test set and was found to excel on all the met- 
rics that are widely used to measure the effectiveness of 
a force field. 
Introduction


Predicting the structure of a protein from its amino acid sequence is one of the biggest and yet most fundamental problems in computational structural biology. One of the main approaches to this problem rests on Anfinsen's hypothesis [1], which states that, for a given set of physiological conditions, the native structure of a protein corresponds to the global minimum of the Gibbs free energy. Thus, one needs a force field to calculate the energy of different conformers and pick the one with the lowest energy.

Physics-based force fields consider various types of 
interactions (for example, van der Waals interactions, 
hydrogen bonding, electrostatic interactions etc.) oc- 
curring at the atomic level of a protein to calculate the 
energy of a conformer. CHARMM [19], AMBER [5], 
ECEPP [20], ECEPP/3 [21] and GROMOS [24] are 
a few examples of the physics-based force fields. On the 
other hand, knowledge-based force fields use information from databases. Researchers have used the Boltzmann distribution [4,7,26], optimization based techniques [17,27] and many other approaches [6,12,13,14,15,16,18,23,25] to calculate the parameters of such force fields. A recent review on such potentials can be found in Floudas et al. [8].

This work presents a novel Cα–Cα distance dependent high resolution force field that has been generated using a linear optimization based framework [22]. The emphasis is on high resolution, which would enable us to differentiate between native and non-native structures that are very similar to each other (rmsd < 2 Å). The force field is called high resolution (HR) because it has been trained on a large set of high resolution decoys (small rmsd with respect to the native) and it is intended to effectively distinguish high resolution decoy structures from the native structure.

The basic framework used in this work is similar to the one developed by Loose et al. [17]. However, it has been improved and applied to a diverse and enhanced (both in terms of quality and quantity) set of high resolution decoys. The new proposed model has resulted in remarkable improvements over the LKF potential of Loose et al. [17]. These high resolution decoys were generated
using torsion angle dynamics in combination with re- 
stricted variations of the hydrophobic core within the 
native structure. This decoy set greatly improves the quality of training and testing. The force field developed in this paper was tested by comparing the energy of the native fold to the energies of decoy structures for proteins separate from those used to train the model. Other leading force fields were also tested on this high quality decoy set and the results were compared with the results of our high resolution potential. The comparison is presented in the Results and Discussion section.


Theory and Modeling 


In this model, each amino acid is represented by the location of its Cα atom on the amino acid backbone. The conformation of a protein is represented by a coordinate vector, X, which includes the location of the Cα atom of each amino acid. The native conformation is denoted as $X_n$, while the set $i = 1, \dots, N$ is used to denote the decoy conformations $X_i$. Non-native decoys are generated for each of $p = 1, \dots, P$ proteins, and the energy of the native fold of each protein is forced to be lower than those of the decoy conformations (Anfinsen's hypothesis). This constraint is shown in the following equation:


$$E(X_{p,i}) - E(X_{p,n}) \ge \varepsilon, \qquad p = 1, \dots, P, \quad i = 1, \dots, N \qquad (1)$$


Equation (1) requires the native conformer to always be lower in energy than each of its decoys. A small positive parameter ε is used to avoid the trivial solution in which all energies are set to zero. An additional constraint (Eq. 2) is used to produce a nontrivial solution by constraining the sum of the differences in energies between decoy and native folds to be greater than a positive constant Γ [28]. For the model presented in this paper, the values of ε and Γ were set to 0.01 and 1000, respectively.


$$\sum_{p=1}^{P} \sum_{i=1}^{N} \big[E(X_{p,i}) - E(X_{p,n})\big] \ge \Gamma \qquad (2)$$


Table 1
Distance dependent bin definition [17]

Bin   Cα Distance [Å]
1
2
3
4
5
6
7
8

The energy of each conformation is taken as the arithmetic sum of pairwise interactions corresponding to each amino acid combination at a particular “contact” distance. A contact exists when the Cα atoms of two amino acids are within 9 Å of each other. The energy of each interaction is a function of the Cα–Cα distance and the identity of the interacting amino acids. To formulate the model, the energy of an interaction between a pair of amino acids, IC, within a distance bin, ID, was defined as $\theta_{IC,ID}$. The eight distance bins defined for the formulation are shown in Table 1. The energy for any fold $X_{p,i}$ of decoy i, for a protein p, is given by Eq. (3).


$$E(X_{p,i}) = \sum_{IC} \sum_{ID} \theta_{IC,ID}\, N_{p,i,IC,ID} \qquad (3)$$

In this equation, $N_{p,i,IC,ID}$ is the number of interactions between an amino acid pair IC at a Cα–Cα distance bin ID. The set IC ranges from 1 to 210 to account for
the 210 unique combinations of the 20 naturally occur- 
ring amino acids. These bin definitions yield a total of 
1680 interaction parameters to be determined by this 
model. To determine these parameters, a linear pro- 
gramming formulation is used in which the energy of 
a native protein is compared with a large number of its 
decoys. The violations, in which a non-native fold has 
a lower energy than the natural conformation, are min- 
imized by optimizing with respect to these interaction 
parameters. 
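The parameter count and Eq. (3) can be checked with a short Python sketch. It assumes the 20 standard one-letter amino acid codes, maps each unordered residue pair to an index IC in 0..209 (a zero-based version of the 1..210 used in the text), and takes the contact counts $N_{p,i,IC,ID}$ as given; computing them from Cα coordinates would require the bin boundaries of Table 1, which are not reproduced here. The names `PAIR_INDEX`, `pair_index` and `energy` are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical one-letter codes
N_BINS = 8                            # distance bins ID = 0..7 (zero-based)

# Unordered pairs including identical residues: 20 * 21 / 2 = 210.
PAIR_INDEX: Dict[Tuple[str, ...], int] = {
    pair: ic for ic, pair in enumerate(combinations_with_replacement(AMINO_ACIDS, 2))
}


def pair_index(a: str, b: str) -> int:
    """Index IC of the unordered residue pair {a, b}."""
    return PAIR_INDEX[tuple(sorted((a, b)))]


def energy(counts: Dict[Tuple[int, int], int], theta: Dict[Tuple[int, int], float]) -> float:
    """Eq. (3): E = sum over (IC, ID) of theta[IC, ID] * N[IC, ID]."""
    return sum(n * theta.get(key, 0.0) for key, n in counts.items())


n_params = len(PAIR_INDEX) * N_BINS
print(len(PAIR_INDEX), n_params)  # 210 1680

# Toy conformer: two Ala-Leu contacts in bin 2 and one Leu-Leu contact in bin 5.
theta = {(pair_index("A", "L"), 2): -1.5, (pair_index("L", "L"), 5): 0.5}
counts = {(pair_index("L", "A"), 2): 2, (pair_index("L", "L"), 5): 1}
print(energy(counts, theta))  # -2.5
```

Because the pair index ignores residue order, a Leu-Ala contact and an Ala-Leu contact share the same parameter, which is what reduces the 400 ordered pairs to 210.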

Equation (1) can be rewritten in terms of $N_{p,i,IC,ID}$ as Eq. (4), where the nonnegative slack variables $S_p$ (Eq. 5) measure how far the energy differences between the decoys and the native conformation of protein p fall short of ε.


$$\sum_{IC} \sum_{ID} \big[N_{p,i,IC,ID} - N_{p,n,IC,ID}\big]\, \theta_{IC,ID} + S_p \ge \varepsilon, \qquad p = 1, \dots, P, \quad i = 1, \dots, N \qquad (4)$$


$$S_p \ge 0, \qquad p = 1, \dots, P \qquad (5)$$


$$\min_{\theta_{IC,ID},\, S_p} \; \sum_{p=1}^{P} S_p \qquad (6)$$


The objective function for this formulation is to minimize the sum of the slack variables $S_p$, written in the form of Eq. (6). The absolute magnitude of $\theta_{IC,ID}$ is meaningless, because multiplying all $\theta_{IC,ID}$ parameters by a common positive factor does not change the energy ranking of the conformations, and Eqs. (4) and (5) can still be satisfied. In this formulation, the $\theta_{IC,ID}$ values were bounded between −25 and 25.
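For a fixed parameter vector θ, constraints (4) and (5) determine the smallest feasible slacks directly: $S_p = \max\{0, \max_i [\varepsilon - \Delta E_{p,i}]\}$, where $\Delta E_{p,i} = E(X_{p,i}) - E(X_{p,n})$. The Python sketch below evaluates objective (6) and checks constraint (2) for a candidate θ in this way; it does not solve the linear program itself, which in the work described here was done with GAMS and CPLEX. The data layout and the names `energy` and `evaluate` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Counts = Dict[Tuple[int, int], int]   # (IC, ID) -> number of contacts N
Theta = Dict[Tuple[int, int], float]  # (IC, ID) -> interaction energy theta


def energy(counts: Counts, theta: Theta) -> float:
    """Eq. (3) for one conformer."""
    return sum(n * theta.get(key, 0.0) for key, n in counts.items())


def evaluate(
    proteins: List[Tuple[Counts, List[Counts]]],  # (native counts, list of decoy counts)
    theta: Theta,
    eps: float = 0.01,
    gamma: float = 1000.0,
) -> Tuple[List[float], float, bool]:
    """Smallest slacks S_p allowed by (4)-(5), objective (6), and whether (2) holds."""
    slacks: List[float] = []
    gap_sum = 0.0
    for native, decoys in proteins:
        e_native = energy(native, theta)
        gaps = [energy(d, theta) - e_native for d in decoys]  # E(X_p,i) - E(X_p,n)
        # (4) needs gap + S_p >= eps for every decoy i; (5) needs S_p >= 0.
        slacks.append(max(0.0, max((eps - g for g in gaps), default=0.0)))
        gap_sum += sum(gaps)
    return slacks, sum(slacks), gap_sum >= gamma


# Toy data: one protein, one contact type (eps enlarged only to keep numbers round).
proteins = [({(0, 0): 2}, [{(0, 0): 1}, {(0, 0): 3}])]
slacks, objective, gamma_ok = evaluate(proteins, {(0, 0): -1.0}, eps=0.5, gamma=0.0)
print(slacks, objective, gamma_ok)  # [1.5] 1.5 True
```

In this toy instance the native (two contacts) always lies between its decoys (one and three contacts) in energy, so no single θ avoids a positive slack; absorbing such unavoidable violations is exactly the role of $S_p$.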


Physical Constraints 


The above mentioned equations constitute the basic 
constraints needed to solve this model. However, this 
set does not guarantee a physically realistic solution. 
It is possible to come up with a set of parameters that 
can satisfy Eqs. (2)–(6) but would not reflect the
actual interaction occurring between amino acids in 
a real system. To prohibit these unrealistic cases, an- 
other set of constraints based on the physical properties 
of the amino acids was imposed. Statistical results pre- 
sented in Bahar and Jernigan [2] were also incorporated 
through the introduction of hydrophilic and hydropho- 
bic constraints. The details of these physical constraints 
are given elsewhere [22]. 


Database Selection and Decoy Generation 


The protein database selection is critical to force field 
training. This set should adequately represent the PDB 
set [3]. At the same time, it should not be too large, as 
the training becomes difficult with an increase in the 
size of the training set. Zhang and Skolnick [29] developed a set of 1,489 nonhomologous single domain proteins. High resolution decoys were generated for these
proteins and used for training and testing purposes. 
High quality decoy generation was based on the hy- 
pothesis that high-quality decoy structures should pre- 
serve information about the distances within the hy- 
drophobic core of the native structure of each pro- 
tein. For each of the proteins in the database, a num- 
ber of distance constraints are introduced based on 
the hydrophobic-hydrophobic distances within the na- 
tive structure. Using a set of proximity parameters, 
a large number of decoy structures are generated using 
DYANA [9]. The rmsd distribution of decoy structures 
can be found elsewhere [22]. 
Training and Test Set


Of the 1400 proteins used for decoy generation, 1250 
were randomly selected for training and the rest were 
used for testing purposes. For every protein in the 
set, 500-1600 decoys were generated depending on 
the fraction of secondary structure present in the native structure of the protein. These decoys were sorted based on their Cα rmsd to the native structure and then 500 decoys were randomly selected to represent the whole rmsd range. This creates a training set of 500 × 1250 = 625,000 decoys. However, because of
computer memory limitations, it is not possible to in- 
clude all of these decoys at the same time for training. 
An iterative scheme, “Rank and Drop”, was employed 
to overcome the memory problem while effectively us- 
ing all the high quality structures. In this scheme, a sub- 
set of decoys is used to generate a force field. This force 
field is then used to rank all the decoys and a set of most 
challenging decoys (based on their energy value) is se- 
lected for the next round of force field generation. This 
process of force field generation and decoy ranking is 
repeated until there is no improvement in the ranking 
of the decoys [22]. This force field model was solved 
using the GAMS modeling language coupled with the 
CPLEX linear programming package [11]. 
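The “Rank and Drop” loop can be outlined as follows. This is a hypothetical Python sketch, not the code of [22]: `train` stands for solving the linear program on a subset of decoys, `energy` for evaluating the resulting force field, and the subset size, the score (sum over proteins of the native's rank among all of its decoys) and the stopping rule (stop as soon as that score no longer improves) are assumptions that mirror the description above.

```python
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple

Conformer = Any   # whatever representation the energy function accepts
ForceField = Any  # whatever `train` returns


def native_rank(ff: ForceField, native: Conformer, decoys: List[Conformer],
                energy: Callable[[ForceField, Conformer], float]) -> int:
    """Rank of the native among its decoys (1 = lowest energy)."""
    e_native = energy(ff, native)
    return 1 + sum(energy(ff, d) < e_native for d in decoys)


def rank_and_drop(
    natives: Dict[Hashable, Conformer],
    decoys: Dict[Hashable, List[Conformer]],
    train: Callable[[Dict[Hashable, Conformer], Dict[Hashable, List[Conformer]]], ForceField],
    energy: Callable[[ForceField, Conformer], float],
    subset_size: int,
    max_rounds: int = 20,
) -> Tuple[Optional[ForceField], Optional[int]]:
    """Alternate training on a decoy subset and re-selecting the hardest decoys."""
    working = {p: ds[:subset_size] for p, ds in decoys.items()}
    best_ff, best_score = None, None
    for _ in range(max_rounds):
        ff = train(natives, working)
        score = sum(native_rank(ff, natives[p], decoys[p], energy) for p in natives)
        if best_score is not None and score >= best_score:
            break  # no improvement in the ranking: stop
        best_ff, best_score = ff, score
        # Most challenging decoys = lowest energy under the current force field.
        working = {p: sorted(ds, key=lambda d: energy(ff, d))[:subset_size]
                   for p, ds in decoys.items()}
    return best_ff, best_score


# Toy run: conformers are numbers and the "force field" is a fixed weight.
natives = {"p1": 0.0}
decoys = {"p1": [3.0, -1.0, 2.0, -2.0]}
ff, score = rank_and_drop(natives, decoys, train=lambda nat, work: 1.0,
                          energy=lambda f, d: f * d, subset_size=2)
print(ff, score)  # 1.0 3
```

Only the working subset is passed to `train`, which is the point of the scheme: memory use is governed by `subset_size`, while the ranking step still looks at every decoy.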

It is equally important to test a force field on a diffi- 
cult and rigorous testing set to confirm its effectiveness. 
The test set comprised 150 randomly selected proteins (41–200 amino acids in length). For each of the 150 test proteins, 500 high resolution decoys were generated using the same technique that was used to generate training decoys. The minimum Cα-based rmsds for these non-native structures were in the range of 0–2 Å. The HR force field was also tested on another set of medium resolution decoys [17]. This set has 200 decoys for 151 proteins. The minimum rmsd of the decoys of this set ranged from 3 to 16 Å. This set, along with
the high resolution decoy set, spans the practical range 
of possible protein structures that one might encounter 
during protein structure prediction. 


Results and Discussion 


A linear optimization problem was solved using information from 625,000 decoy structures and the values of all the energy parameters were obtained. The ability to distinguish between the native structure and native-like conformers is the most significant test for any force field. The HR force field was tested on 500 decoys of each of the 150 test proteins. In this testing, the relative position, or rank, of the native conformation among its decoys was calculated. An ideal force field should be able to assign rank 1 to the native structures of all the test proteins. Other force fields such as LKF [17], TE13 [27], and HL [10] were also tested on this set of high resolution decoys. All these force fields are fundamentally different from each other in their methods of energy estimation. Comparing the results obtained with these force fields aims to assess the fundamental utility of the HR force field. The comparison of the energy rankings obtained using different force fields is presented in Table 2. From this table it is evident that the HR force field is the most effective in identifying the native structures by rank. The HR force field correctly identified the native folds of 113 proteins out of a set of 150 proteins, which compares favorably to a maximum of 92 (out of 148) by the TE13 force field.

Table 2
Testing force fields on 150 proteins of the high resolution decoy set. The TE13 force field was only tested on 148 cases

FF-Name   Average Rank   No. of Firsts   Average rmsd [Å]
HR                       113 (75.33%)    0.451
LKF                      17 (11.33%)     1.721
TE13                     92 (62.16%)     0.813
HL                       70 (46.67%)

Another analysis was carried out to evaluate the discrimination ability of these potentials. In this evaluation, all the decoys of the test set were ranked using these potentials. For each test protein, the Cα rmsd of the rank 1 conformer was calculated with respect to the native structure of that protein. The Cα rmsd is zero whenever a force field selects the native structure as rank 1, and nonzero whenever a non-native conformer is assigned the top rank. The average of these rmsds therefore measures how far, on average, the top-ranked structures lie from the native structures. The average rmsd value obtained for each of the force fields is shown in Table 2. The average Cα rmsd value is lowest for the HR force field: 0.451 Å, compared with 1.721 Å for LKF and 0.813 Å for TE13. This means that the structures predicted by the HR force field have the least spatial deviation from their corresponding native structures.
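The two evaluation measures used above, the native's rank and the rmsd of the top-ranked conformer, can be computed as in the following Python sketch. It assumes that the energies and the Cα rmsd of every decoy to its native have already been computed (the rmsd calculation itself, which requires optimal superposition, is not shown), and it counts ties between the native and a decoy in the native's favor, which is an assumption. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def evaluate_force_field(
    tests: Sequence[Tuple[float, Sequence[float], Sequence[float]]]
) -> Tuple[int, float, float]:
    """Return (number of firsts, average native rank, average rmsd of the rank-1 structure).

    Each test item is (native_energy, decoy_energies, decoy_rmsds_to_native).
    """
    firsts = 0
    ranks: List[int] = []
    top_rmsds: List[float] = []
    for e_native, e_decoys, rmsds in tests:
        rank = 1 + sum(e < e_native for e in e_decoys)
        ranks.append(rank)
        if rank == 1:
            firsts += 1
            top_rmsds.append(0.0)  # the native itself is selected: rmsd is zero
        else:
            best = min(range(len(e_decoys)), key=lambda k: e_decoys[k])
            top_rmsds.append(rmsds[best])  # rmsd of the lowest-energy decoy
    n = len(tests)
    return firsts, sum(ranks) / n, sum(top_rmsds) / n


tests = [
    (-10.0, [-8.0, -9.5], [1.2, 0.8]),  # native ranked first
    (-5.0, [-6.0, -4.0], [1.5, 2.0]),   # a decoy 1.5 Å from the native wins
]
print(evaluate_force_field(tests))  # (1, 1.5, 0.75)
```

The percentage in the “No. of Firsts” column is simply firsts divided by the number of test proteins, e.g. 113/150 ≈ 75.33%.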

The HR force field was also tested on the test set 
published by Loose et al. [17] and was found to do bet- 
ter than other force fields. The comparison results for 
this test can be found elsewhere [22]. The effectiveness 
of the HR force field is further reinforced by its success on the medium resolution decoy test set. On the test set of 110 medium resolution decoys, it was capable of correctly identifying 78.2% of the native structures, significantly more than other force fields.


Conclusions 


The HR force field was developed using an optimization 
based linear programming formulation, in which the 
model is trained using a diverse set of high quality de- 
coys. Physically observed interactions between certain 
amino acids were written in the form of mathematical 
constraints and included in the formulation. 

The decoys were generated based on the premise that high quality decoy structures should preserve information about the distances within the hydrophobic core of the native structure of each protein. The set of interaction energy parameters obtained after solving the model was found to have very good discriminatory capacity. This force field performed well on a set of independent, non-homologous high resolution decoys. This force field can become a powerful tool for fold recognition and de novo protein design.
Introduction


Exact algorithms allow one to find optimal solutions to 
NP-hard combinatorial optimization (CO) problems. 
Many research papers report on solving large instances 
of some NP-hard problems (see, e.g., [25,27]). The running time of exact algorithms is often very high for large
instances (many hours or even days), and very large 
instances remain beyond the capabilities of exact algo- 
rithms. Even for instances of moderate size, if we wish 
to remain within seconds or minutes rather than hours 
or days of running time, only heuristics can be used. 
Certainly, with heuristics, we are not guaranteed to find an optimum, but good heuristics normally produce near-optimal solutions. This is enough in most applications
since very often the data and/or mathematical model 
are not exact anyway. 

Research on CO heuristics has produced a large va- 
riety of heuristics especially for well-known CO prob- 
lems. Thus, we need to choose the best ones among 
them. In most of the literature, heuristics are compared in computational experiments. While experimental analysis is of definite importance, it cannot
cover all possible families of instances of the CO prob- 
lem at hand and, in particular, it practically never cov- 
ers the hardest instances. 

Approximation Analysis [3] is a frequently used tool for theoretical evaluation of CO heuristics. Let H be a heuristic for a combinatorial minimization problem P and let $\mathcal{I}_n$ be the set of instances of P of size n. In approximation analysis, we use the performance ratio $r_H(n) = \max\{f(I)/f^*(I) : I \in \mathcal{I}_n\}$, where $f(I)$ ($f^*(I)$) is the value of the heuristic (optimal) solution of I. Unfortunately, for many CO problems, estimates for $r_H(n)$ are not constants and provide only a vague picture of the quality of heuristics. Moreover, even a constant performance ratio does not guarantee that the heuristic often outputs good-quality solutions; see, e.g., the discussion of the DMST heuristic below.
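As a small illustration of the definition of $r_H(n)$, the Python sketch below computes the largest ratio $f(I)/f^*(I)$ over a finite list of instances. Since $r_H(n)$ is a maximum over all instances of size n, a finite sample only yields a lower bound on it. The function name and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def empirical_performance_ratio(pairs: Sequence[Tuple[float, float]]) -> float:
    """max f(I)/f*(I) over the given (heuristic value, optimal value) pairs.

    Assumes a minimization problem with positive optimal values. Over a finite
    sample of instances this is only a lower bound on r_H(n).
    """
    if not pairs:
        raise ValueError("need at least one instance")
    if any(f_opt <= 0 for _, f_opt in pairs):
        raise ValueError("optimal values must be positive")
    return max(f / f_opt for f, f_opt in pairs)


print(empirical_performance_ratio([(12.0, 10.0), (30.0, 20.0), (7.0, 7.0)]))  # 1.5
```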

Domination Analysis (DA) (for surveys, see [22,24]) 
provides an alternative and a complement to approxi- 
mation analysis. In DA, we are interested in the domi- 
nation number or domination ratio of heuristics. Dom- 
ination number (ratio) of a heuristic H for a combina- 
torial optimization problem P is the maximum number 
(fraction) of all solutions that are not better than the 
solution found by H for any instance of P of size n. 
In many cases, DA is very useful. For example, we 
will see later that the greedy algorithm has domina- 
tion number 1 for many CO problems. In other words, 
the greedy algorithm, in the worst case, produces the 
unique worst possible solution. This is in line with recent computational experiments with the greedy algorithm, see, e.g., [25], where the authors came to the
conclusion that the greedy algorithm ‘might be said to 
self-destruct’ and that it should not be used even as 
‘a general-purpose starting tour generator’. 

The Asymmetric Traveling Salesman Problem (ATSP) is the problem of computing a minimum weight tour (Hamilton cycle) passing through every vertex in a weighted complete digraph $K_n^*$ on n vertices. The Symmetric TSP (STSP) is the same problem, but on a complete undirected graph. When a certain fact holds for both ATSP and STSP, we will simply speak of TSP. Sometimes the maximizing version of TSP is of interest; we denote it by Max TSP.

APX is the class of CO problems that admit poly- 
nomial time approximation algorithms with a constant 
performance ratio [3]. It is well known that while Max 


TSP belongs to APX, TSP does not. This is at odds 
with the simple fact that a ‘good’ approximation algo- 
rithm for Max TSP can be easily transformed into an 
algorithm for TSP. Thus, it seems that both Max TSP 
and TSP should be in the same class of CO problems. 
The above asymmetry was viewed as a drawback of the performance ratio already in the 1970s; see, e.g., [11,28,33]. Notice that from the DA point of view, Max TSP and TSP are equivalent problems.

Zemel [33] was the first to characterize measures of quality of approximate solutions (of binary integer programming problems) that satisfy a few basic and natural properties: the measure becomes smaller for better solutions, it equals 0 for optimal solutions, and it is the same for corresponding solutions of equivalent instances. While the performance ratio and even the relative error (see [3]) do not satisfy the last property, the parameter 1 − r, where r is the domination ratio, does satisfy all of the properties.

Local Search (LS) is one of the most successful ap- 
proaches in constructing heuristics for CO problems. 
Recently, several researchers investigated LS with Very 
Large Scale Neighborhoods (see, e.g., [1,12,24]). The 
hypothesis behind this approach is that the larger the neighborhood, the better the quality of the solutions expected to be found [1]. However, some computational experiments do not support this hypothesis; sometimes
an LS with small neighborhoods proves to be supe- 
rior to that with large neighborhoods. This means that 
some other parameters are responsible for the relative 
power of neighborhoods. Theoretical and experimen- 
tal results on TSP indicate that one such parameter 
may well be the domination number of the correspond- 
ing LS. 

In our view, it is advantageous to have bounds for 
both performance ratio and domination number (or, 
domination ratio) of a heuristic whenever it is possible. 
Roughly speaking, this will enable us to see a 2D rather than 1D picture. For example, consider the double minimum spanning tree heuristic (DMST) for the Metric STSP (i.e., STSP with triangle inequality). DMST starts by constructing a minimum weight spanning tree T in the complete graph of the STSP, doubles every edge in T, finds a closed Euler trail E in the ‘double’ T, and cancels any repetition of vertices in E to obtain a TSP tour H. It is well known and easy to prove that the weight of H is at most twice the weight of the optimal tour. Thus, the performance ratio for
DMST is bounded by 2. However, Punnen, Margot 
and Kabadi [29] proved that the domination number 
of DMST is 1. Interestingly, in practice DMST often 
performs much worse than the well-known 2-Opt LS 
heuristic. For 2-Opt LS we cannot give any constant ap- 
proximation guarantee, but the heuristic is of very large 
domination number [29]. 
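The DMST heuristic is short enough to sketch in Python. The version below is one common way to implement it, assumed here rather than taken from [29]: it uses Prim's algorithm for the minimum spanning tree and the standard fact that cancelling repeated vertices in a depth-first Euler trail of the doubled tree leaves the vertices in DFS preorder. Euclidean points are used so that the triangle inequality holds, and the brute-force check of the factor 2 is only feasible for tiny instances. All names and the sample points are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def tour_weight(tour: Sequence[int], d: Matrix) -> float:
    """Weight of the closed tour visiting the vertices in the given order."""
    n = len(tour)
    return sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))


def dmst_tour(d: Matrix) -> List[int]:
    """Double minimum spanning tree heuristic for the metric STSP."""
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    children: List[List[int]] = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):  # Prim's algorithm rooted at vertex 0
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v], parent[v] = d[u][v], u
    # Euler trail of the doubled tree with repeated vertices cancelled = DFS preorder.
    tour: List[int] = []
    stack = [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def optimal_weight(d: Matrix) -> float:
    """Brute force over all tours starting at vertex 0 (tiny n only)."""
    return min(tour_weight((0,) + p, d) for p in itertools.permutations(range(1, len(d))))


points: List[Point] = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 3.0), (3.0, 1.0)]
d = [[math.dist(p, q) for q in points] for p in points]
h = dmst_tour(d)
print(tour_weight(h, d) <= 2 * optimal_weight(d))  # True
```

The bound holds because the tour is no longer than the doubled tree (by the triangle inequality) and the tree is no heavier than an optimal tour with one edge removed; the domination result of [29] shows that this guarantee says nothing about how many tours DMST beats.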

The above example indicates that it makes sense to 
use DA to rank heuristics for the CO problem under 
consideration. If the domination number of a heuristic 
H is larger than the domination number of a heuristic H′ (for
all or ‘almost all’ sizes n), we may say that H is better 
than H’ in the worst case (from the DA point of view). 
Berend, Skiena and Twitto [10] used DA to rank some 
well-known heuristics for the Vertex Cover problem 
(and, thus, the Independent Set and Clique problems). 
The three problems and the heuristics will be defined in 
the corresponding subsection of the Cases section. Ben-Arieh et al. [7] studied three heuristics for the Generalized TSP: the vertices of the complete digraph are partitioned into subsets and the goal is to find a minimum
weight cycle containing exactly one vertex from each 
subset. In the computational experiment in [7] one of 
the heuristics was clearly inferior to the other two. The 
best two behaved very similarly. Nevertheless, the au- 
thors of [7] managed to ‘separate’ the two heuristics by 
showing that one of the heuristics was of much larger 
domination number. 

One might wonder whether a heuristic A, which is significantly better than another heuristic B from the DA point of view, is better than B in computational experiments. In particular, is the ATSP greedy algorithm, which is of domination number 1, worse in computational experiments than any ATSP heuristic of domination number at least (n − 2)!? Generally speaking, the answer to this natural question is negative. This
is because computational experiments and DA indicate 
different aspects of quality of heuristics. Nevertheless, 
it seems that many heuristics of very small domination 
number such as the ATSP greedy algorithm perform 
poorly also in computational experiments and, thus, 
cannot be recommended to be widely used in compu- 
tational practice. 

The rest of the entry is organized as follows. We give 
additional terminology and notation in the section Def- 
initions. In the section Methods, we describe two pow- 


erful methods in DA. In the section Cases, we consider 
DA results for some well-known CO problems. 


Definitions 


Let $P$ be a CO problem and let $H$ be a heuristic for $P$. The domination number $\mathrm{domn}(H,\mathcal{I})$ of $H$ for an instance $\mathcal{I}$ of $P$ is the number of solutions of $\mathcal{I}$ that are not better than the solution $s$ produced by $H$, including $s$ itself. For example, consider an instance $\mathcal{T}$ of the STSP on 5 vertices. Suppose that the weights of tours in $\mathcal{T}$ are 2, 5, 5, 6, 6, 9, 9, 11, 11, 12, 12, 15 (every instance of the STSP on 5 vertices has 12 tours) and suppose that the greedy algorithm computes the tour $T$ of weight 6. Then $\mathrm{domn}(\text{greedy},\mathcal{T}) = 9$. In general, if $\mathrm{domn}(H,\mathcal{I})$ equals the number of solutions in $\mathcal{I}$, then $H$ finds an optimal solution for $\mathcal{I}$. If $\mathrm{domn}(H,\mathcal{I}) = 1$, then the solution found by $H$ for $\mathcal{I}$ is the unique worst possible one.
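The worked example can be checked mechanically. The following minimal Python sketch assumes a minimization problem whose solutions are known only through their weights; `domination_number` is a hypothetical helper name, not taken from the source.

```python
def domination_number(solution_weights, found_weight):
    """domn(H, I) for a minimization problem: the number of solutions whose
    weight is not smaller than the weight of the solution found by H
    (the found solution itself is counted)."""
    return sum(1 for weight in solution_weights if weight >= found_weight)


# The 12 tour weights of the 5-vertex STSP instance in the example.
tour_weights = [2, 5, 5, 6, 6, 9, 9, 11, 11, 12, 12, 15]
print(domination_number(tour_weights, 6))  # 9
```

A returned weight of 15 gives domination number 1 (the unique worst tour), and a returned weight of 2 gives 12 (an optimal tour).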

The domination number $\mathrm{domn}(H,n)$ of $H$ is the minimum of $\mathrm{domn}(H,\mathcal{I})$ over all instances $\mathcal{I}$ of size $n$. Since the ATSP on $n$ vertices has $(n-1)!$ tours, an algorithm for the ATSP with domination number $(n-1)!$ is exact. The domination number of an exact algorithm for the STSP is $(n-1)!/2$. If an ATSP heuristic $A$ has domination number equal to 1, then there is an assignment of weights to the arcs of each complete digraph $K_n^*$, $n \ge 2$, such that $A$ finds the unique worst possible tour in $K_n^*$.

While studying the TSP we normally consider only feasible solutions (tours); for several other problems, however, some authors also take infeasible solutions into consideration [10]. One example is the Maximum Independent Set problem, where, given a graph $G$, the aim is to find an independent set in $G$ of maximum cardinality. Every non-empty set of vertices is considered to be a solution by Berend, Skiena and Twitto [10]. To avoid dealing with infeasible solutions (and, thus, reserving the term ‘solution’ only for feasible solutions) we also use the notion of the blackball number introduced in [10]. The blackball number $\mathrm{bbn}(H,\mathcal{I})$ of $H$ for an instance $\mathcal{I}$ of $P$ is the number of solutions of $\mathcal{I}$ that are better than the solution produced by $H$. The blackball number $\mathrm{bbn}(H,n)$ of $H$ is the maximum of $\mathrm{bbn}(H,\mathcal{I})$ over all instances $\mathcal{I}$ of size $n$.

When the number of solutions depends not only on the size of the instance of the CO problem at hand (for example, the number of independent sets of vertices in a graph $G$ on $n$ vertices depends on the structure of $G$), the domination ratio of an algorithm $A$ is of interest: the domination ratio of $A$, $\mathrm{domr}(A,n)$, is the minimum of $\mathrm{domn}(A,\mathcal{I})/\mathrm{sol}(\mathcal{I})$, where $\mathrm{sol}(\mathcal{I})$ is the number of solutions of $\mathcal{I}$, taken over all instances $\mathcal{I}$ of size $n$. Clearly, the domination ratio belongs to the interval $(0,1]$ and exact algorithms are of domination ratio 1.
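To make the blackball number and the domination ratio concrete, the sketch below assumes the Maximum Independent Set problem with only feasible, non-empty independent sets counted as solutions, and "better" meaning larger cardinality. It evaluates a single instance, the path 0-1-2-3; $\mathrm{bbn}(H,n)$ and $\mathrm{domr}(H,n)$ would further take the maximum (respectively minimum) over all instances of size $n$. All function names are hypothetical.

```python
from itertools import combinations


def independent_sets(n, edges):
    """All non-empty independent sets of the graph on vertices 0..n-1."""
    adjacent = {frozenset(e) for e in edges}
    sets = []
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if all(frozenset(pair) not in adjacent for pair in combinations(subset, 2)):
                sets.append(frozenset(subset))
    return sets


def blackball_number(solutions, found):
    """bbn(H, I) for MIS on one instance: solutions strictly larger than `found`."""
    return sum(1 for sol in solutions if len(sol) > len(found))


def domination_fraction(solutions, found):
    """domn(H, I) / sol(I) on one instance; domr(H, n) is the minimum over all instances."""
    not_better = sum(1 for sol in solutions if len(sol) <= len(found))
    return not_better / len(solutions)


path = [(0, 1), (1, 2), (2, 3)]          # the path 0-1-2-3
sols = independent_sets(4, path)          # {0}, {1}, {2}, {3}, {0,2}, {0,3}, {1,3}
print(len(sols), blackball_number(sols, frozenset({1})))  # 7 3
```

A heuristic returning $\{1\}$ on this instance is beaten by three larger independent sets and dominates $4/7$ of all solutions.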


Methods 


Currently, there are two powerful methods in DA. One is used to prove that the heuristic under consideration is of domination number 1. For this method to be useful, the heuristic has to be a greedy-type algorithm for a CO problem on independence systems. We describe the method and its applications in the subsection Greedy-Type Algorithms. The other method is used to prove that the heuristic under consideration is of very large domination number. For many problems this follows from the fact that the heuristic always finds a solution that is not worse than the average solution. This method is described in the subsection Better-Than-Average Heuristics.


Greedy-Type Algorithms 


The main practical message of this subsection is that one should be careful when using the classical greedy algorithm and its variations in combinatorial optimization (CO): there are many instances of CO problems for which such algorithms will produce the unique worst possible solution. Moreover, this is true for several well-known optimization problems, and the corresponding instances are not exotic, in a sense. Thus the paradigm of greedy optimization does not always provide any meaningful optimization at all.

An independence system is a pair consisting of a finite set $E$ and a family $\mathcal{F}$ of subsets (called independent sets) of $E$ such that (I1) and (I2) are satisfied.

(I1) The empty set is in $\mathcal{F}$;
(I2) If $X \in \mathcal{F}$ and $Y$ is a subset of $X$, then $Y \in \mathcal{F}$.

All maximal sets of $\mathcal{F}$ are called bases. An independence system is uniform if all its bases are of the same cardinality.

Many combinatorial optimization problems can be formulated as follows. We are given an independence system $(E,\mathcal{F})$, a set $W \subseteq \mathbb{Z}_+$, and a weight function $w$ that assigns a weight $w(e) \in W$ to every element of $E$ ($\mathbb{Z}_+$ is the set of non-negative integers). The weight $w(S)$ of $S \in \mathcal{F}$ is defined as the sum of the weights of the elements of $S$. It is required to find a base $B \in \mathcal{F}$ of minimum weight. We will consider only such problems and call them the $(E,\mathcal{F},W)$-optimization problems.

If $S \in \mathcal{F}$, then let $I(S) = \{x : S \cup \{x\} \in \mathcal{F}\} - S$. This means that $I(S)$ consists of those elements of $E - S$ which can be added to $S$ in order to have an independent set of size $|S| + 1$. Note that by (I2), $I(S) \neq \emptyset$ for every independent set $S$ which is not a base.

The greedy algorithm tries to construct a minimum weight base as follows: it starts from an empty set $X$, and at every step it takes the current set $X$ and adds to it a minimum weight element $e \in I(X)$; the algorithm stops when a base is built. We assume that the greedy algorithm may choose any element among equally weighted elements in $I(X)$. Thus, when we say that the greedy algorithm may construct a base $B$, we mean that $B$ is built provided the appropriate choices between elements of the same weight are made.
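A minimal sketch of the greedy algorithm for an $(E,\mathcal{F},W)$-optimization problem, assuming $\mathcal{F}$ is given by a membership oracle `is_independent` (a hypothetical interface) and that weight ties are broken by the order of `elements`. The usage example takes the forests of a small graph, a matroid on which this greedy algorithm is exact (it reduces to Kruskal's minimum spanning tree algorithm); the results in this subsection concern independence systems on which it can behave very differently.

```python
def greedy_base(elements, weight, is_independent):
    """Greedy algorithm for an (E, F, W)-optimization problem.

    elements: the ground set E, in the order used to break weight ties
    weight: function e -> w(e)
    is_independent: membership oracle for the family F
    """
    chosen = frozenset()
    while True:
        # I(X): elements outside X whose addition keeps X independent
        candidates = [e for e in elements
                      if e not in chosen and is_independent(chosen | {e})]
        if not candidates:  # I(X) is empty, so X is a base
            return chosen
        chosen = chosen | {min(candidates, key=weight)}


def is_forest(edge_set):
    """Independence oracle of the graphic matroid: no edge closes a cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edge_set:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v
    return True


weights = {(0, 1): 1, (1, 2): 2, (0, 2): 3, (2, 3): 4}
base = greedy_base(list(weights), weights.__getitem__, is_forest)
print(sorted(base), sum(weights[e] for e in base))  # [(0, 1), (1, 2), (2, 3)] 7
```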

An ordered partitioning of an ordered set $Z = \{z_1, z_2, \ldots, z_k\}$ is a collection of subsets $A_1, A_2, \ldots, A_q$ of $Z$ such that if $z_r \in A_i$ and $z_s \in A_j$, where $1 \le i < j \le q$, then $r < s$. Some of the sets $A_i$ may be empty and $\bigcup_{i=1}^{q} A_i = Z$.

The following theorem by Bang-Jensen, Gutin and Yeo [6] characterizes all uniform independence systems $(E,\mathcal{F})$ for which there is an assignment of weights to the elements of $E$ such that the greedy algorithm solving the $(E,\mathcal{F},\{1,2,\ldots,r\})$-optimization problem may construct the unique worst possible solution.


Theorem 1 Let $(E,\mathcal{F})$ be a uniform independence system and let $r \ge 2$ be a natural number. There exists a weight assignment $w: E \to \{1,2,\ldots,r\}$ such that the greedy algorithm may produce the unique worst possible base if and only if $\mathcal{F}$ contains some base $B$ with the property that for some ordering $x_1, \ldots, x_k$ of the elements of $B$ and some ordered partitioning $A_1, A_2, \ldots, A_r$ of $x_1, \ldots, x_k$ the following holds for every base $B' \neq B$ of $\mathcal{F}$:

$$\sum_{j=0}^{r-1} \left| I(A_{0,j}) \cap B' \right| < \sum_{j=1}^{r} j\,|A_j|, \tag{1}$$

where $A_{0,j} = A_0 \cup \cdots \cup A_j$ and $A_0 = \emptyset$.
The special case $r = 2$ has an ‘easier’ characterization
also proved in [6]. 


Theorem 2 Let $(E,\mathcal{F})$ be a uniform independence system. For every choice of distinct natural numbers $a, b$ there exists a weight function $w: E \to \{a,b\}$ such that the greedy algorithm may produce the unique worst base if and only if $\mathcal{F}$ contains a base $B = \{x_1, x_2, \ldots, x_k\}$ such that for some $1 \le i < k$ the following holds:

(a) If $B'$ is a base such that $\{x_1, \ldots, x_i\} \subseteq B'$, then $B' = B$.

(b) If $B'$ is a base such that $\{x_{i+1}, \ldots, x_k\} \subseteq B'$, then $B' = B$.


Using Theorem 1, the authors of [6] proved the follow- 
ing two corollaries. 


Corollary 3 Consider the STSP as an $(E,\mathcal{H},W)$-optimization problem. Let $n \ge 3$.

(a) If $n \ge 4$ and |W| < [+1, then the greedy algorithm never produces the unique worst possible base (i. e., tour).
(b) If $n \ge 3$, $r \ge n-1$ and $W = \{1,2,\ldots,r\}$, then there exists a weight function $w: E \to \{1,2,\ldots,r\}$ such that the greedy algorithm may produce the unique worst possible base (i. e., tour).


Corollary 4 Consider the ATSP as an $(E,\mathcal{H},W)$-optimization problem. Let $n \ge 3$.

(a) If |W| < (+1, then the greedy algorithm never produces the unique worst possible base (i. e., tour).
(b) For every r > [++] there exists a weight function $w: E(K_n^*) \to \{1,2,\ldots,r\}$ such that the greedy algorithm may produce the unique worst possible base (i. e., tour).


Let $\mathcal{F}$ be the family of those subsets $X$ of $E(K_{2n})$ which induce a bipartite graph with at most $n$ vertices in each partite set. Then $(E(K_{2n}),\mathcal{F})$ is a uniform independence system and the bases of $(E(K_{2n}),\mathcal{F})$ correspond to copies of the complete balanced bipartite graph $K_{n,n}$ in $K_{2n}$. The $(E(K_{2n}),\mathcal{F},\mathbb{Z}_+)$-optimization problem is called the Minimum Bisection Problem. Theorem 2 implies the following:


Corollary 5 [6] Let $n \ge 4$. The greedy algorithm for the $(E(K_{2n}),\mathcal{F},W)$-optimization problem may produce the unique worst solution even if $|W| = 2$.


For $W = \mathbb{Z}_+$, the following sufficient condition can often be used:

Theorem 6 [21] Let $(E,\mathcal{F})$ be an independence system which has a base $B' = \{x_1, x_2, \ldots, x_k\}$ such that the following holds for every base $B \in \mathcal{F}$, $B \neq B'$:

$$\sum_{j=0}^{k-1} \left| I(\{x_1, x_2, \ldots, x_j\}) \cap B \right| < \frac{k(k+1)}{2}.$$

Then the greedy algorithm for the $(E,\mathcal{F},\mathbb{Z}_+)$-optimization problem may produce the unique worst solution.


Gutin, Yeo and Zverovich [23] considered the well- 
known nearest neighbor (NN) TSP heuristic: the tour 
starts at any vertex x of the complete directed or undi- 
rected graph; we repeat the following loop until all ver- 
tices have been included in the tour: add to the tour 
a vertex (among vertices not yet in the tour) closest to 
the vertex last added to the tour. It was proved in [23] 
that the domination number of NN is 1 for any n > 3. 
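A minimal sketch of NN, assuming a dense weight matrix `w` (with `w[u][v]` the weight of the arc from `u` to `v`) and ties broken by the smaller vertex index; the function names are illustrative only.

```python
def nearest_neighbor(w, start=0):
    """Nearest neighbor (NN) tour for a complete (di)graph with weight matrix w."""
    n = len(w)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # closest vertex not yet in the tour; ties broken by the smaller index
        nxt = min(unvisited, key=lambda v: (w[last][v], v))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_weight(w, tour):
    """Weight of the closed tour, including the arc back to the start."""
    return sum(w[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


W = [[0, 1, 5, 9],
     [2, 0, 3, 8],
     [7, 4, 0, 1],
     [6, 5, 2, 0]]
tour = nearest_neighbor(W, 0)
print(tour, tour_weight(W, tour))  # [0, 1, 2, 3] 11
```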

Bendall and Margot [8] studied greedy-type algorithms for many CO problems. Greedy-type algorithms were introduced in [18]. They include NN and are defined as follows. A greedy-type algorithm $H$ is similar to the greedy algorithm: start with the partial solution $X = \emptyset$, and then repeatedly add to $X$ an element of minimum weight in $I_H(X)$ (ties are broken arbitrarily) until $X$ is a base of $\mathcal{F}$, where $I_H(X)$ is a subset of $I(X)$ that does not depend on the cost function $c$, but only on the independence system $(E,\mathcal{F})$ and the set $X$. Moreover, $I_H(X)$ is non-empty if $I(X) \neq \emptyset$, a condition that guarantees that $H$ always outputs a base. Bendall and Margot [8] obtained complicated sufficient conditions for an independence system $(E,\mathcal{F})$ that ensure that every greedy-type algorithm is of domination number 1 for the $(E,\mathcal{F},\mathbb{Z}_+)$-optimization problem.

The conditions imply that every greedy-type algo- 
rithm is of domination number 1 for the following clas- 
sical CO problems [8]: (1) The Minimum Bisection 
Problem; (2) The k-Clique Problem: find a set of k ver- 
tices in a complete graph so that the sum of the weights 
of the edges between them is minimum; (3) ATSP; (4) 
STSP; (5) The MinMax Matching Subgraph Problem: 
find a maximal (with respect to inclusion) matching so
that the sum of the weights of the edges in the matching 
is minimum; (6) The Assignment Problem: find a perfect 
matching in a weighted complete bipartite graph so that 
the sum of the weights of the edges in the matching is 
minimum. 


Better-Than-Average Heuristics 


The idea of this method is to show that a heuristic is 
of very large domination number if it always produces 
a solution that is not worse than the average solution. 
The first such result was proved by Rublineckii [31] for 
the STSP. 


Theorem 7 Every STSP heuristic that always produces a tour not worse than the average tour (of the instance) is of domination number at least $(n-2)!$ when $n$ is odd and $(n-2)!/2$ when $n$ is even.


The following similar theorem was proved by Sar- 
vanov [32] for n odd and Gutin and Yeo [20] for n even. 


Theorem 8 Every ATSP heuristic that always produces a tour not worse than the average tour (of the instance) is of domination number at least $(n-2)!$ for each $n \neq 6$.


The two theorems have been used to prove that a wide variety of TSP heuristics have domination number $\Omega((n-2)!)$. We discuss two families of such heuristics.

Consider an instance of the ATSP (STSP). Order the vertices $x_1, x_2, \ldots, x_n$ of $K_n^*$ ($K_n$) using some rule. The generic vertex insertion algorithm proceeds as follows. Start with the cycle $C_2 = x_1 x_2 x_1$. Construct the cycle $C_j$ from $C_{j-1}$ ($j = 3, 4, 5, \ldots, n$) by inserting the vertex $x_j$ into $C_{j-1}$ at the optimum place. This means that for each arc $e = xy$ which lies on the cycle $C_{j-1}$ we compute $w(x x_j) + w(x_j y) - w(xy)$, and insert $x_j$ into the arc $e = xy$ which attains the minimum such value. Here $w(uv)$ denotes the weight of an arc $uv$. E.M. Lifshitz (see [31]) was the first to prove that the generic vertex insertion algorithm always produces a tour not worse than the average tour. Thus, we have the following:

Corollary 9 The generic vertex insertion algorithm has domination number at least $(n-2)!$ ($n \neq 6$).
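A sketch of the generic vertex insertion algorithm for the ATSP, assuming the vertex order $x_1, \ldots, x_n$ is simply $0, 1, \ldots, n-1$ and ties are broken by the first minimizing arc. The usage example compares the result with the average tour weight, which equals the total arc weight divided by $n-1$ because every arc lies on $(n-2)!$ of the $(n-1)!$ tours.

```python
import random


def vertex_insertion(w):
    """Generic vertex insertion for the ATSP; vertices are inserted in the order 0..n-1."""
    n = len(w)
    cycle = [0, 1]                                   # C_2 = x1 x2 x1
    for j in range(2, n):
        best_k, best_cost = 0, None
        for k in range(len(cycle)):
            x, y = cycle[k], cycle[(k + 1) % len(cycle)]
            cost = w[x][j] + w[j][y] - w[x][y]       # cost of inserting j into arc xy
            if best_cost is None or cost < best_cost:
                best_k, best_cost = k, cost
        cycle.insert(best_k + 1, j)                  # place j between x and y
    return cycle


def tour_weight(w, tour):
    return sum(w[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


rng = random.Random(1)
n = 8
w = [[0 if i == j else rng.randint(1, 100) for j in range(n)] for i in range(n)]
tour = vertex_insertion(w)
average = sum(map(sum, w)) / (n - 1)                 # average weight of an ATSP tour
print(tour, tour_weight(w, tour), average)
```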


In TSP local search (LS) heuristics, a neighborhood $N(T)$ is assigned to every tour $T$; $N(T)$ is a set of tours in some sense close to $T$. The best improvement LS proceeds as follows. We start from a tour $T_0$. In the $i$th iteration ($i \ge 1$), we search in the neighborhood $N(T_{i-1})$ for the best tour $T_i$. If the weights of $T_{i-1}$ and $T_i$ do not coincide, we carry out the next iteration. Otherwise, we output $T_i$.

The $k$-Opt, $k \ge 2$, neighborhood of a tour $T$ consists of all tours that can be obtained by replacing a collection of $k$ edges (arcs) by a collection of $k$ edges (arcs). It is easy to see that one iteration of the best improvement $k$-Opt LS can be completed in time $O(n^k)$. Rublineckii [31] showed that every local optimum for the best improvement 2-Opt or 3-Opt LS for the STSP is of weight at most the average weight of a tour and, thus, by Theorem 7 is of domination number at least $(n-2)!/2$ when $n$ is even and $(n-2)!$ when $n$ is odd. Observe that this result is of restricted interest, since to reach a $k$-Opt local optimum one may need exponential time (cf. Section 3 in [26]). However, Punnen, Margot and Kabadi [29] managed to prove that, in polynomial time, the best improvement 2-Opt and 3-Opt LS heuristics for the STSP produce a tour of weight at most the average weight of a tour. Thus, we have the following:


Corollary 10 For the STSP the best improvement 2-Opt LS produces a tour which is not worse than at least $\Omega((n-2)!)$ other tours, in at most O(n? log n) iterations.


Corollary 10 is also valid for the best improvement 
3-Opt LS and some other LS heuristics for TSP, 
see [24,29]. In the next section, we will give further ex- 
amples of better-than-average heuristics for problems 
other than TSP. 
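A sketch of best improvement 2-Opt LS for the STSP, assuming a symmetric integer weight matrix `d` and that $N(T)$ contains $T$ itself, so that the search stops as soon as no 2-Opt move strictly decreases the weight. A 2-Opt move removes two non-adjacent edges $ab$, $ce$ and reconnects the tour with $ac$ and $be$ by reversing the segment between them. The usage example compares the local optimum with the average tour weight $2\sum_{u<v} d_{uv}/(n-1)$ (each edge lies on a fraction $2/(n-1)$ of all tours), in line with the result of Rublineckii [31] quoted above.

```python
import random


def tour_weight(d, tour):
    return sum(d[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def two_opt_best_improvement(d, tour):
    """Best improvement 2-Opt LS for the STSP (d symmetric, integer weights)."""
    tour = list(tour)
    n = len(tour)
    while True:
        best_delta, best_move = 0, None
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue          # these two edges share the vertex tour[0]
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                # replace edges ab and ce by ac and be
                delta = d[a][c] + d[b][e] - d[a][b] - d[c][e]
                if delta < best_delta:
                    best_delta, best_move = delta, (i, j)
        if best_move is None:         # no lighter tour in N(T): T is a local optimum
            return tour
        i, j = best_move
        tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])


rng = random.Random(2)
n = 9
d = [[0] * n for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        d[u][v] = d[v][u] = rng.randint(1, 100)
tour = two_opt_best_improvement(d, range(n))
total = sum(d[u][v] for u in range(n) for v in range(u + 1, n))
print(tour_weight(d, tour), 2 * total / (n - 1))
```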


Cases 
Traveling Salesman Problem 


In the previous sections, we discussed several TSP 
heuristics. However, there are many more TSP heuris- 
tics and, in this subsection, we consider some of them. 
In the next subsection, some general upper bounds are 
given on the domination number of TSP heuristics. 
Gutin, Yeo and Zverovich [23] considered the re- 
peated nearest neighbor (RNN) heuristic, which is the 
following variation of the NN heuristic: construct n 
tours by starting NN from each vertex of the complete 
(di)graph and choose the best tour among the n tours. 
The authors of [23] proved that for the ATSP, RNN always produces a tour which is not worse than at least $n/2 - 1$ other tours, but for some instance it finds a tour which is not worse than at most $n-2$ other tours, $n \ge 4$. They also showed that, for some instance of the STSP on $n \ge 4$ vertices, RNN produces a tour not worse than at
most 2”? tours. 
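A sketch of RNN, assuming a dense weight matrix and smallest-index tie-breaking; `repeated_nearest_neighbor` is a hypothetical name, and among equally light tours it keeps the one found first.

```python
def nearest_neighbor(w, start):
    n = len(w)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda v: (w[last][v], v))  # closest unvisited vertex
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_weight(w, tour):
    return sum(w[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def repeated_nearest_neighbor(w):
    """Run NN from every vertex and return the lightest of the n tours."""
    tours = [nearest_neighbor(w, start) for start in range(len(w))]
    return min(tours, key=lambda t: tour_weight(w, t))


W = [[0, 1, 5, 9],
     [2, 0, 3, 8],
     [7, 4, 0, 1],
     [6, 5, 2, 0]]
best = repeated_nearest_neighbor(W)
print(best, tour_weight(W, best))  # [0, 1, 2, 3] 11
```

On this matrix the four NN tours weigh 11, 13, 13 and 17, so RNN keeps the tour started at vertex 0.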

Another ATSP heuristic, max-regret-fc (fc ab- 
breviates First Coordinate), was first introduced by 
Ghosh et al. [13]. Extensive computational experiments 
in [13] demonstrated a clear superiority of max-regret- 
fc over the greedy algorithm and several other con- 
struction heuristics from [14]. Therefore, the result 
of Theorem 11 obtained by Gutin, Goldengorin and 
Huang [15] was somewhat unexpected. 

Let $K_n^*$ be a complete digraph with vertices $V = \{1, 2, \ldots, n\}$. The weight of an arc $(i,j)$ is denoted by $w_{ij}$. Let $Q$ be a collection of disjoint paths in $K_n^*$. An arc $a = (i,j)$ is a feasible addition to $Q$ if $Q + a$ is either a collection of disjoint paths or a tour in $K_n^*$. Consider the following two ATSP heuristics: max-regret-fc and max-regret.

The heuristic max-regret-fc proceeds as follows. Set $W = T = \emptyset$. While $V \neq W$ do the following: for each $i \in V \setminus W$, compute two lightest arcs $(i,j)$ and $(i,k)$ that are feasible additions to $T$, and compute the difference $\Delta_i = |w_{ij} - w_{ik}|$. For $i \in V \setminus W$ with maximum $\Delta_i$, choose the lightest arc $(i,j)$ which is a feasible addition to $T$, and add $(i,j)$ to $T$ and $i$ to $W$.

The heuristic max-regret proceeds as follows. Set $W^+ = W^- = T = \emptyset$. While $V \neq W^+$ do the following: for each $i \in V \setminus W^+$, compute two lightest arcs $(i,j)$ and $(i,k)$ that are feasible additions to $T$, and compute the difference $\Delta_i^+ = |w_{ij} - w_{ik}|$; for each $i \in V \setminus W^-$, compute two lightest arcs $(j,i)$ and $(k,i)$ that are feasible additions to $T$, and compute the difference $\Delta_i^- = |w_{ji} - w_{ki}|$. Compute $i' \in V \setminus W^+$ with maximum $\Delta_{i'}^+$ and $i'' \in V \setminus W^-$ with maximum $\Delta_{i''}^-$. If $\Delta_{i'}^+ \ge \Delta_{i''}^-$, choose the lightest arc $(i',j')$ which is a feasible addition to $T$, and add $(i',j')$ to $T$, $i'$ to $W^+$ and $j'$ to $W^-$. Otherwise, choose the lightest arc $(j'',i'')$ which is a feasible addition to $T$, and add $(j'',i'')$ to $T$, $i''$ to $W^-$ and $j''$ to $W^+$.

Notice that in max-regret-fc, if $|V \setminus W| = 1$ we set $\Delta_i = 0$. A similar remark applies to max-regret.
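The following Python sketch of max-regret-fc assumes 0-based vertex labels, a dense weight matrix `w`, ties broken by the smallest vertex index, and a regret of 0 whenever fewer than two feasible arcs exist. That last rule is a slight generalization of the remark above: it also covers the step with $|V \setminus W| = 2$, when each remaining tail has only one feasible head. The names `is_feasible_addition` and `max_regret_fc` are illustrative.

```python
def is_feasible_addition(succ, pred, i, j, n):
    """Can arc (i, j) be added to the disjoint paths stored in succ/pred?"""
    if i == j or i in succ or j in pred:
        return False
    # Follow the path that starts at j; if it ends at i, (i, j) closes a cycle.
    v = j
    while v in succ:
        v = succ[v]
    closes_cycle = (v == i)
    # A cycle is allowed only if it is the whole tour (n arcs in total).
    return not closes_cycle or len(succ) == n - 1


def max_regret_fc(w):
    """Return the tour as a successor map {i: j} for arcs (i, j)."""
    n = len(w)
    succ, pred = {}, {}
    remaining = list(range(n))                # V \ W
    while remaining:
        best = None                            # (regret, tail i, lightest head j)
        for i in remaining:
            options = sorted((w[i][j], j) for j in range(n)
                             if is_feasible_addition(succ, pred, i, j, n))
            regret = options[1][0] - options[0][0] if len(options) >= 2 else 0
            if best is None or regret > best[0]:
                best = (regret, i, options[0][1])
        _, i, j = best
        succ[i], pred[j] = j, i                # add (i, j) to T and i to W
        remaining.remove(i)
    return succ


w = [[0, 3, 9, 4],
     [5, 0, 2, 8],
     [6, 7, 0, 1],
     [2, 9, 4, 0]]
succ = max_regret_fc(w)
print(succ)
```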


Theorem 11 The domination number of both max- 
regret-fc and max-regret equals 1 for each n > 2. 


Upper Bounds for Domination Numbers 
of ATSP Heuristics 


It is realistic to assume that any ATSP algorithm spends at least one unit of time on every arc of $K_n^*$ that it considers. We use this assumption in this subsection.


Theorem 12 [17] Let $A$ be an ATSP heuristic of running time $t(n)$. Then the domination number of $A$ does not exceed $\max_{1 \le n' \le n} (t(n)/n')^{n'}$.


Corollary 13 [17] Let $A$ be an ATSP heuristic of complexity $t(n)$. Then the domination number of $A$ does not exceed $\max\{e^{t(n)/e}, (t(n)/n)^n\}$, where $e$ is the base of natural logarithms.


The next assertion follows directly from the proof of 
Corollary 13. 


Corollary 14 [17] Let $A$ be an ATSP heuristic of complexity $t(n)$. For $t(n) \ge en$, the domination number of $A$ does not exceed $(t(n)/n)^n$.
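The bounds of Theorem 12 and Corollaries 13 and 14 can be evaluated numerically. The sketch below works with natural logarithms to avoid overflow and treats the running time $t$ as a plain number of time units; it only evaluates the formulas and does not model any particular heuristic.

```python
import math


def log_theorem12_bound(t, n):
    """ln of max over 1 <= k <= n of (t / k) ** k, the bound of Theorem 12."""
    return max(k * math.log(t / k) for k in range(1, n + 1))


def log_corollary13_bound(t, n):
    """ln of max{e ** (t / e), (t / n) ** n}, the bound of Corollary 13."""
    return max(t / math.e, n * math.log(t / n))


n = 50
for t in (n, 10 * n, n * n):           # t(n) = n, 10n and n^2 time units
    print(t, round(log_theorem12_bound(t, n), 2), round(log_corollary13_bound(t, n), 2))
```

Since $x \mapsto x\ln(t/x)$ is maximized at $x = t/e$, every term of Theorem 12 is at most $e^{t/e}$; when $t \ge en$ the maximum over $1 \le n' \le n$ is attained at $n' = n$, which is Corollary 14.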


We finish this subsection with a result from [17] that 
improves (and somewhat clarifies) Theorem 20 in [29]. 


Theorem 15 Unless P = NP, there is no polynomial time ATSP algorithm of domination number at least $(n-1)! - \lfloor n - n^{\alpha} \rfloor !$ for any constant $\alpha < 1$.


Multidimensional Assignment Problem (MAP) 


In the case of $s$ dimensions, MAP is abbreviated by $s$-AP and defined as follows. Let $X_1 = X_2 = \cdots = X_s = \{1, 2, \ldots, n\}$. We will consider only vectors that belong to the Cartesian product $X = X_1 \times X_2 \times \cdots \times X_s$. Each vector $e$ is assigned a weight $w(e)$. For a vector $e$, $e_j$ denotes its $j$th coordinate, i.e., $e_j \in X_j$. A partial assignment is a collection $e^1, e^2, \ldots, e^t$ of $t \le n$ vectors such that $e^i_j \neq e^k_j$ for each $i \neq k$ and $j \in \{1, 2, \ldots, s\}$. An assignment is a partial assignment with $n$ vectors. The weight of a partial assignment $A = \{e^1, e^2, \ldots, e^t\}$ is $w(A) = \sum_{i=1}^{t} w(e^i)$. The objective is to find an assignment of minimum weight. Notice that $s$-AP has $(n!)^{s-1}$ solutions (assignments).

$s$-AP can be considered as the $(X,\mathcal{F},\mathbb{Z}_+)$-optimization problem ($\mathcal{F}$ consists of partial assignments, including the empty one). This allows us to define the greedy algorithm for $s$-AP and to conclude from Theorem 6 that the greedy algorithm is of domination number 1 (for every fixed $s \ge 3$).
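A minimal sketch of the greedy algorithm for $s$-AP, assuming 0-based coordinates and a weight dictionary `w` keyed by $s$-tuples (hypothetical names). Because the set $I(X)$ of compatible vectors only shrinks as $X$ grows, scanning all vectors once in nondecreasing order of weight and keeping every vector compatible with those already chosen is equivalent to repeatedly adding a minimum weight element of $I(X)$; here ties are broken lexicographically.

```python
import random
from itertools import product


def greedy_sap(w, n, s):
    used = [set() for _ in range(s)]          # values already taken in each coordinate
    assignment = []
    for e in sorted(product(range(n), repeat=s), key=lambda v: (w[v], v)):
        if all(e[j] not in used[j] for j in range(s)):   # e belongs to I(X)
            assignment.append(e)
            for j in range(s):
                used[j].add(e[j])
    return assignment


rng = random.Random(4)
n, s = 4, 3
w = {v: rng.randint(0, 50) for v in product(range(n), repeat=s)}
assignment = greedy_sap(w, n, s)
print(assignment, sum(w[v] for v in assignment))
```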
In the subsection Traveling Salesman Problem, we considered the max-regret-fc and max-regret heuristics. In fact, max-regret was first introduced for 3-AP by Balas and Saltzman [4]. (See [15] for a detailed description of the $s$-AP max-regret-fc and max-regret heuristics for each $s \ge 2$.) In computational experiments, Balas and Saltzman [4] compared the greedy algorithm with max-regret and concluded that max-regret is superior to the greedy algorithm with respect to the quality of solutions. However, after conducting wider computational experiments, Robertson [30] came to a different conclusion: the greedy algorithm and max-regret are of similar quality for 3-AP. Gutin, Goldengorin and Huang [15] share the conclusion of Robertson: both max-regret and max-regret-fc are of domination number 1 (similarly to the greedy algorithm) for $s$-AP for each $s \ge 3$. Moreover, there exists a family of $s$-AP instances for which all three heuristics will find the unique worst assignment [15] (for each $s \ge 3$).

Similarly to TSP, we may obtain MAP heuristics 
of factorial domination number if we consider better- 
than-average heuristics. This follows from the next the- 
orem: 


Theorem 16 [15] Let $H$ be a heuristic that for each instance of $s$-AP constructs an assignment of weight at most the average weight of an assignment. Then the domination number of $H$ is at least $((n-1)!)^{s-1}$.


Balas and Saltzman [4] introduced a 3-Opt heuristic for 3-AP which is similar to the 3-Opt TSP heuristic. The 3-Opt neighborhood of an assignment $A = \{e^1, e^2, \ldots, e^n\}$ is the set of all assignments that can be obtained from $A$ by replacing a triple of vectors with another triple of vectors. The 3-Opt is a local search heuristic that uses the 3-Opt neighborhood. It is proved in [15] that an assignment that is the best in its 3-Opt neighborhood is at least as good as the average assignment. This implies that 3-Opt is of domination number at least $((n-1)!)^2$. We cannot guarantee that 3-Opt local search will stop after a polynomial number of iterations. Moreover, 3-Opt applies only to 3-AP. Thus, the following heuristic, introduced and studied in [15], is of interest.
Recursive Opt Matching (ROM) proceeds as follows. Compute a new weight $w(i,j) = w(X_{ij})/n^{s-2}$, where $X_{ij}$ is the set of all vectors with last two coordinates equal to $i$ and $j$, respectively, and $w(X_{ij})$ is the total weight of these vectors. Solving the 2-AP with the new weights to optimality, find an optimal assignment $\{(i, \pi_s(i)) : i = 1, 2, \ldots, n\}$, where $\pi_s$ is a permutation on $X_s$. While $s \neq 1$, introduce the $(s-1)$-AP with weights given as follows: $w'(f) = w(f, \pi_s(i))$ for each vector $f \in X'$ with last coordinate equal to $i$, where $X' = X_1 \times X_2 \times \cdots \times X_{s-1}$, and apply ROM recursively. As a result we obtain permutations $\pi_s, \pi_{s-1}, \ldots, \pi_2$. The output is the assignment $\{(i, \pi_2(i), \pi_3(\pi_2(i)), \ldots, \pi_s(\pi_{s-1}(\cdots(\pi_2(i))\cdots))) : i = 1, 2, \ldots, n\}$.

Clearly, ROM is of running time $O(n^s)$ for every fixed $s \ge 3$. Using Theorem 16, it is proved in [15] that ROM is of domination number at least $((n-1)!)^{s-1}$.
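A small recursive sketch of ROM, assuming 0-based coordinates and a weight function `w` on $s$-tuples (hypothetical names). For brevity the 2-AP is solved by brute force over all permutations, which is only practical for tiny $n$; a polynomial 2-AP solver, such as the Hungarian method, is needed for the stated $O(n^s)$ running time. The recursion stops at $s = 2$, where the 2-AP with the given weights is solved directly.

```python
import random
from itertools import permutations, product


def solve_2ap(cost):
    """Exact 2-AP by brute force over all permutations (tiny n only)."""
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))


def rom(w, n, s):
    """Recursive Opt Matching sketch; w maps an s-tuple of 0-based values to its weight."""
    if s == 2:
        p = solve_2ap([[w((i, j)) for j in range(n)] for i in range(n)])
        return [(i, p[i]) for i in range(n)]
    heads = list(product(range(n), repeat=s - 2))
    # new weight of (i, j): average weight of the vectors whose last two coordinates are i, j
    avg = [[sum(w(h + (i, j)) for h in heads) / n ** (s - 2) for j in range(n)]
           for i in range(n)]
    pi_s = solve_2ap(avg)
    # (s-1)-AP: a vector f with last coordinate i gets the weight of (f, pi_s(i))
    sub = rom(lambda f: w(f + (pi_s[f[-1]],)), n, s - 1)
    return [f + (pi_s[f[-1]],) for f in sub]


rng = random.Random(3)
n, s = 3, 3
weights = {v: rng.randint(0, 20) for v in product(range(n), repeat=s)}
assignment = rom(weights.__getitem__, n, s)
average = sum(weights.values()) / n ** (s - 1)
print(assignment, sum(weights[v] for v in assignment), average)
```

The comparison value is the average assignment weight: every vector lies in $((n-1)!)^{s-1}$ of the $(n!)^{s-1}$ assignments, so the average equals the total weight of all vectors divided by $n^{s-1}$.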


Minimum Partition 
and Multiprocessor Scheduling Problems 


In this subsection, $N$ always denotes the set $\{1, 2, \ldots, n\}$ and each $i \in N$ is assigned a positive integral weight $\sigma(i)$. $A = (A_1, A_2, \ldots, A_p)$ is a $p$-partition of $N$ if each $A_i \subseteq N$, $A_i \cap A_j = \emptyset$ for each $i \neq j$, and the union of all sets in $A$ equals $N$. For a subset $A$ of $N$, $\sigma(A) = \sum_{i \in A} \sigma(i)$. The Minimum Multiprocessor Scheduling Problem (MMS) [3] can be stated as follows. We are given a triple $(N, \sigma, p)$, where $p$ is an integer, $p \ge 2$. We are required to find a $p$-partition $C$ of $N$ that minimizes $\sigma(A) = \max_{1 \le j \le p} \sigma(A_j)$ over all $p$-partitions $A = (A_1, A_2, \ldots, A_p)$ of $N$.

Clearly, if $p \ge n$, then MMS becomes trivial. Thus, in what follows, $p < n$. The size $s$ of MMS is $O(n + \sum_{i=1}^{n} \log \sigma(i))$. Consider the following heuristic $H$ for MMS. If $s \ge p^n$, then we simply solve the problem optimally. This takes $O(s^2)$ time, as there are at most $O(s)$ solutions, and each one can be evaluated and compared to the current best in $O(s)$ time. If $s < p^n$, then we sort the elements of the sequence $\sigma(1), \sigma(2), \ldots, \sigma(n)$. For simplicity of notation, assume that $\sigma(1) \ge \sigma(2) \ge \cdots \ge \sigma(n)$. Compute $r = \lfloor \log n / \log p \rfloor$ and solve MMS for $(\{1, 2, \ldots, r\}, \sigma, p)$ to optimality. Suppose we have obtained a $p$-partition $A$ of $\{1, 2, \ldots, r\}$. Now for $i$ from $r+1$ to $n$, add $i$ to the set $A_j$ of the current $p$-partition $A$ with the smallest $\sigma(A_j)$. The following result was proved by Gutin, Jensen and Yeo [16].


Theorem 17 The heuristic $H$ runs in time O(s? log s) and $\lim_{s \to \infty} \mathrm{domr}(H,s) = 1$.


The Minimum Partition Problem (MP) is MMS with 
p=2. Alon, Gutin and Krivelevich [2] proved Theo- 
rem 17 for MP with s replaced by n. 
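A sketch of the heuristic $H$ for MMS, assuming 0-based items and a concrete size measure $s = n + \sum_i \mathrm{bitlength}(\sigma(i))$, since the text fixes $s$ only up to a constant factor. Exhaustive search over all $p^r$ placements stands in for "solve to optimality"; the function names are hypothetical.

```python
from itertools import product


def load(part, sigma):
    return sum(sigma[i] for i in part)


def optimal_partition(items, sigma, p):
    """Brute force over all p ** len(items) ways to place the items."""
    best, best_makespan = None, None
    for labels in product(range(p), repeat=len(items)):
        parts = [[] for _ in range(p)]
        for item, label in zip(items, labels):
            parts[label].append(item)
        makespan = max(load(part, sigma) for part in parts)
        if best is None or makespan < best_makespan:
            best, best_makespan = parts, makespan
    return best


def heuristic_h(sigma, p):
    n = len(sigma)
    size = n + sum(x.bit_length() for x in sigma)      # assumed size measure s
    if size >= p ** n:
        return optimal_partition(list(range(n)), sigma, p)
    order = sorted(range(n), key=lambda i: -sigma[i])   # largest weights first
    r = 0
    while p ** (r + 1) <= n:                            # r = floor(log n / log p)
        r += 1
    parts = optimal_partition(order[:r], sigma, p)
    for i in order[r:]:                                 # then the least loaded set
        lightest = min(range(p), key=lambda j: load(parts[j], sigma))
        parts[lightest].append(i)
    return parts


sigma = [3, 3, 2, 2, 2]
parts = heuristic_h(sigma, 2)
print(parts, max(load(part, sigma) for part in parts))  # [[0, 2, 4], [1, 3]] 7
```

On this small instance $H$ returns makespan 7 whereas the optimum is 6 ($\{3,3\}$ versus $\{2,2,2\}$); Theorem 17 is an asymptotic statement about the domination ratio.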
Max Cut Problem


The Max Cut (MC) is the following problem: given 
a weighted graph G=(V,E), find a bipartition (a cut) 
(X,Y) of V such that the sum of weights of the edges 
with one end vertex in X and the other in Y, called the 
weight of the cut (X,Y), is maximum. For this problem, 
there are some better-than-average heuristics. The simplest is probably the following greedy-like heuristic $C$:
order the vertices arbitrarily and put each vertex in its 
turn either in X or in Y in order to maximize in each 
step the total weight of crossing edges. 
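A sketch of the greedy-like heuristic $C$, assuming vertices are processed in the order $0, 1, \ldots, n-1$ and ties go to $X$; `greedy_cut` is a hypothetical name. Each vertex is placed on the side that cuts at least half of the weight of its edges to previously placed vertices, so the final cut has weight at least half the total edge weight. This is exactly the average weight of a cut over all $2^n$ bipartitions, since each edge crosses a uniformly random bipartition with probability 1/2.

```python
def greedy_cut(n, weights):
    """weights: dict {(u, v): w} for an undirected graph on 0..n-1."""
    neighbours = {v: [] for v in range(n)}
    for (u, v), wt in weights.items():
        neighbours[u].append((v, wt))
        neighbours[v].append((u, wt))
    side = {}
    for v in range(n):                   # the arbitrary order: 0, 1, ..., n-1
        gain_x = sum(wt for u, wt in neighbours[v] if side.get(u) == "Y")
        gain_y = sum(wt for u, wt in neighbours[v] if side.get(u) == "X")
        side[v] = "X" if gain_x >= gain_y else "Y"
    cut = sum(wt for (u, v), wt in weights.items() if side[u] != side[v])
    return {v for v in range(n) if side[v] == "X"}, cut


k4 = {(u, v): 1 for u in range(4) for v in range(u + 1, 4)}
print(greedy_cut(4, k4))  # ({0, 2}, 4)
```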

Using an advanced probabilistic approach Alon, 
Gutin and Krivelevich [2] proved that the heuristic C 
is of domination ratio larger than 0.025. For the un- 
weighted MC (all weights are equal), a better quality 
algorithm can be designed as described in [2]. Its dom- 
ination ratio is at least 1/3 — o(1). 


Constraint Satisfaction Problems 


Let $r$ be a fixed positive integer, and let $F = \{f_1, f_2, \ldots, f_m\}$ be a collection of Boolean functions, each involving at most $r$ of the $n$ variables, and each having a positive weight $w(f_i)$. The Max-$r$-Constraint Satisfaction Problem (or Max-$r$-CSP, for short) is the problem of finding a truth assignment to the variables so as to maximize the total weight of the functions satisfied. Note that this includes, as a special case, the Max Cut problem. Another interesting special case is the Max-$r$-SAT problem, in which each of the Boolean functions $f_i$ is a clause of at most $r$ literals.

Alon, Gutin and Krivelevich [2] proved the follow- 
ing: 
Theorem 18 For each fixed integer r > 1 there ex- 
ists a linear time algorithm for the Max-r-CSP problem, 
whose domination ratio exceeds SIG: 


Vertex Cover, Independent Set 
and Clique Problems 


A clique in a graph $G$ is a set of vertices in $G$ such that every pair of vertices in the set is connected by an edge. The Maximum Clique Problem (MCI) is the problem of finding a clique of maximum cardinality in a graph. A vertex cover in a graph $G$ is a set $S$ of vertices in $G$ such that every edge is incident to a vertex in $S$. The Minimum Vertex Cover Problem (MVC) is the problem of finding a minimum cardinality vertex cover. An independent set in a graph is a set $S$ of vertices such that no edge joins two vertices in $S$. The Maximum Independent Set Problem (MIS) is the problem of finding a maximum cardinality independent set in a graph. It is easy to see that the number of cliques and independent sets in a graph depends on its structure, and not only on the number of vertices. The same holds for vertex covers.

Notice that if $C$ is a vertex cover of a graph $G$, then $V(G) \setminus C$ is an independent set in $G$; if $Q$ is a clique in $G$, then $Q$ is an independent set in the complement of $G$. These well-known facts imply that if one of the three problems admits a heuristic of domination ratio at least $r(n)$, then the other two problems also admit heuristics of domination ratio at least $r(n)$.

MCI, MIS and MVC are somewhat different from 
the previous problems we have considered. Firstly, the 
number of feasible solutions, for an input of size n, de- 
pends on the actual input, and not just its size. The sec- 
ond difference is that the three problems do not admit 
polynomial-time heuristics of domination ratio at least 
1/p(n) for any polynomial p(n) in n unless P=NP. This 
was proved by Gutin, Vainshtein and Yeo [19]. 

Because of the first difference, it is better to compare heuristics for these problems using the blackball number rather than the domination number. Since a heuristic for MVC can easily be transformed into a heuristic for the other two problems, we restrict ourselves to MVC heuristics.

The incremental deletion heuristic starts with an arbitrary permutation $\pi$ of the vertices of $G$ and an initial solution $S = V(G)$. We consider each vertex of $G$ in turn (according to $\pi$), deleting it from $S$ if the resulting subset remains a (feasible) solution. A seemingly better heuristic for MVC is obtained by ordering the vertices by degree (lower degrees first), and then applying the incremental deletion heuristic. We call it the increasing-degree deletion heuristic. The well-known maximal matching heuristic constructs a maximal matching $M$ and outputs both end-vertices of all edges in $M$ as a solution. Berend, Skiena and Twitto [10] proved that the incremental deletion heuristic is of blackball number $2^{n-1} - n$, the increasing-degree deletion heuristic is of blackball number larger than $(2-\epsilon)^n$ for each $\epsilon > 0$, and the maximal matching heuristic is of blackball number approximately $1.839^n$. Clearly, the maximal matching heuristic is the best among the three heuristics from the DA point of view.
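The three MVC heuristics can be sketched as follows, together with a brute-force count of the vertex covers that are strictly smaller than the returned one, i.e., the blackball number of a heuristic on that single instance. Vertex labels are 0-based, all names are hypothetical, and the brute force is exponential and meant for tiny graphs only.

```python
from itertools import combinations


def is_cover(cover, edges):
    return all(u in cover or v in cover for u, v in edges)


def incremental_deletion(n, edges, order):
    """Start from S = V(G); delete each vertex (in the given order) if S stays a cover."""
    cover = set(range(n))
    for v in order:
        if is_cover(cover - {v}, edges):
            cover.remove(v)
    return cover


def increasing_degree_deletion(n, edges):
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return incremental_deletion(n, edges, sorted(range(n), key=lambda v: degree[v]))


def maximal_matching_cover(edges):
    """Greedily build a maximal matching M and return all end-vertices of M."""
    matched = set()
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
    return matched


def blackball(n, edges, cover):
    """Number of vertex covers strictly smaller than `cover` (brute force)."""
    return sum(1 for k in range(len(cover))
               for subset in combinations(range(n), k)
               if is_cover(set(subset), edges))


path = [(0, 1), (1, 2), (2, 3)]
for cover in (incremental_deletion(4, path, [0, 1, 2, 3]),
              increasing_degree_deletion(4, path),
              maximal_matching_cover(path)):
    print(sorted(cover), blackball(4, path, cover))
```

On the path 0-1-2-3 both deletion heuristics return optimal covers, while the maximal matching heuristic returns all four vertices and is beaten by 7 smaller covers. This does not contradict the ranking above, which concerns the worst case over all graphs on $n$ vertices.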


Quadratic Assignment Problem 


The Quadratic Assignment Problem (QAP) can be formulated as follows. We are given two $n \times n$ matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ of integers. Our aim is to find a permutation $\pi$ of $\{1, 2, \ldots, n\}$ that minimizes the sum

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b_{\pi(i)\pi(j)}.$$

Gutin and Yeo [23] described a better-than-average heuristic for QAP and proved that the heuristic is of domination number at least $n!/\beta^n$ for each $\beta > 1$. Moreover, the domination number of the heuristic is at least $(n-2)!$ for every prime power $n$. These results were obtained using a group-theoretical approach.
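A better-than-average heuristic is measured against the average objective value over all $n!$ permutations. By linearity, and since $\pi(i)$ is uniform for $i = j$ while $(\pi(i), \pi(j))$ is uniform over ordered pairs of distinct values for $i \neq j$, this average equals $\frac{1}{n}\sum_i a_{ii}\sum_k b_{kk} + \frac{1}{n(n-1)}\sum_{i \neq j} a_{ij}\sum_{k \neq l} b_{kl}$. This closed form is a standard linearity-of-expectation argument rather than a formula quoted from [23]; the sketch below checks it against brute force with exact rational arithmetic, using made-up matrices.

```python
from fractions import Fraction
from itertools import permutations


def qap_cost(A, B, perm):
    n = len(A)
    return sum(A[i][j] * B[perm[i]][perm[j]] for i in range(n) for j in range(n))


def average_cost_brute_force(A, B):
    perms = list(permutations(range(len(A))))
    return Fraction(sum(qap_cost(A, B, p) for p in perms), len(perms))


def average_cost_closed_form(A, B):
    n = len(A)
    diag_a = sum(A[i][i] for i in range(n))
    diag_b = sum(B[i][i] for i in range(n))
    off_a = sum(map(sum, A)) - diag_a
    off_b = sum(map(sum, B)) - diag_b
    return Fraction(diag_a * diag_b, n) + Fraction(off_a * off_b, n * (n - 1))


A = [[0, 3, 1, 2], [3, 0, 4, 1], [1, 4, 0, 5], [2, 1, 5, 0]]
B = [[1, 2, 0, 3], [2, 2, 1, 1], [0, 1, 3, 2], [3, 1, 2, 0]]
print(average_cost_brute_force(A, B), average_cost_closed_form(A, B))
```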


See also 


> Traveling Salesman Problem 
Abstract


Duality gaps in optimization problems arise because of 
the nonconvexities involved in the objective function or 
constraints. The Lagrangian dual of a nonconvex opti- 
mization problem can also be viewed as a two-person 
zero-sum game. From this viewpoint, the occurrence of 
duality gaps originates from the order in which the two 
players select their strategies. Therefore, duality theory 
can be analyzed as a zero-sum game where the order of 
play generates an asymmetry. One can conjecture that
this asymmetry can be eliminated by allowing one of
the players to select strategies from a larger space than 
that of the finite-dimensional Euclidean space. Once 
the asymmetry is removed, then there is zero duality 
gap. The aim of this article is to review two methods by 
which this process can be carried out. The first is based 
on randomization of the primal problem. The second 
extends the space from which the dual variables can be 
selected. Duality gaps are important in mathematical programming, and some of the results reviewed here are more than 50 years old, but only recently have methods been discovered to take advantage of them. The theory is elegant and helps one appreciate the game-theoretic origins of the dual problem and the role of Lagrange multipliers.


Background 


We discuss how duality gaps arise, and how they can 
be eliminated in nonconvex optimization problems. 
A standard optimization problem is stated as follows: 


$$\min f(x) \quad \text{s.t.} \quad g(x) \le 0, \quad x \in X, \tag{1}$$

where $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$ are assumed to be smooth and nonconvex. The feasible region of (1) is denoted by $\mathcal{F}$, and it is assumed to be nonempty and compact. $X$ is some compact convex set.

In order to understand the origins of duality in mathematical programming, consider devising a strategy to determine whether a point, say $y$, is the globally optimal solution of (1). Such a strategy can be concocted as follows: if $y$ is a global solution of (1), then the following system of inequalities

$$f(x) < f(y), \quad g(x) \le 0, \quad x \in X \tag{2}$$

will not have a solution. We can reformulate (2) in a slightly more convenient framework. Indeed, suppose that there exist $m$ positive scalars $\lambda_i$, $i = 1, \ldots, m$, such that

$$L(x,\lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) < f(y) \tag{3}$$
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has no solution. Then (2) does not have a solution ei- 
ther. The left-hand side of (3) is called the Lagrangian 
function associated with (1). It is clear from the discus- 
sion above that the Lagrangian can be used to answer 
questions about the optimal solutions of (1). The use- 
fulness of the dual function emanates from the follow- 
ing duality observation: Let f* be the optimal objective 
function value of (1), and let L: R” — R be defined as 
follows: 


$$L(\lambda) = \inf_{x \in X} L(x,\lambda).$$

Then it is easy to prove that

$$\sup_{\lambda \ge 0} L(\lambda) \le f^*. \tag{4}$$

This result is known as the weak duality theorem, and
it is valid under quite general assumptions. The
strong duality theorem asserts that if $f$ and $g$ are convex,
$f^* > -\infty$, and the interior of $F$ is not empty, then

$$\sup_{\lambda \ge 0} L(\lambda) = f^*.$$


Proofs of the weak and strong duality theorems can be 
found in [1,10]. 
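To make the weak duality inequality (4) concrete, here is a minimal numerical sketch on a hypothetical one-dimensional instance that does not come from the article: $f(x) = -x^2$, $g(x) = x$, $X = [-1, 2]$. Both $f^*$ and $\sup_{\lambda \ge 0} L(\lambda)$ are approximated by brute force on grids (the grid sizes are arbitrary choices), and because $f$ is concave the sketch exhibits a strict duality gap.

```python
# Hypothetical nonconvex instance (not taken from the article):
#   minimize f(x) = -x**2  subject to  g(x) = x <= 0,  x in X = [-1, 2]


def f(x):
    return -x * x


def g(x):
    return x


# Uniform grids; both contain the points that matter here (x = -1, 2 and lambda = 1).
xs = [-1.0 + 3.0 * k / 300 for k in range(301)]   # grid on X = [-1, 2]
lams = [k / 100 for k in range(501)]              # grid on lambda in [0, 5]


def dual_function(lam):
    """Grid approximation of L(lambda) = inf_{x in X} f(x) + lambda * g(x)."""
    return min(f(x) + lam * g(x) for x in xs)


f_star = min(f(x) for x in xs if g(x) <= 0)       # primal optimal value f*
d_star = max(dual_function(lam) for lam in lams)  # sup_{lambda >= 0} L(lambda)

print(f"f* = {f_star:.4f}, sup L = {d_star:.4f}, gap = {f_star - d_star:.4f}")
```

Here $f^* = -1$ (attained at $x = -1$), while the dual bound is $-2$ (attained at $\lambda = 1$), so (4) holds strictly. No convexity is involved, which is why the strong duality theorem does not apply.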


Game Theory Interpretation 


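There is an interesting relationship between (1) and the
following optimization problem:

$$\sup_{\lambda \ge 0} \inf_{x \in X} \Big( f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) \Big). \tag{5}$$

We refer to (1) as the primal problem, while (5) is referred to as the Lagrangian dual. The $\lambda$'s that appear
in (5) are called the Lagrange multipliers (or dual variables).

It is interesting to note that (1) can equivalently be
restated as follows:

$$\inf_{x \in X} \sup_{\lambda \ge 0} \Big( f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) \Big). \tag{6}$$

Indeed, the inner supremum equals $f(x)$ when $g(x) \le 0$ and $+\infty$ otherwise, so the outer infimum is effectively taken over the feasible region of (1).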


The relationship between (6) (or (1)) and (5) can be analyzed as a two-person zero-sum game. In this game
player A chooses the $x$ variables, and player B chooses
the $\lambda$ variables. If player A chooses $x'$, and player B
chooses $\lambda'$, then player A pays $L(x',\lambda')$ to player B. Naturally, player A wishes to minimize this quantity, while
player B attempts to maximize it.

In game theory, equilibria play an important role.
An equilibrium, in the present context, means a point
from which no player will gain by a unilateral change
of strategy. For the game outlined above an equilibrium
point $(x^*, \lambda^*)$ must satisfy

$$L(x,\lambda^*) \ge L(x^*,\lambda^*) \ge L(x^*,\lambda) \qquad \forall x \in X,\ \forall \lambda \in \mathbb{R}^m_{+}. \tag{7}$$

A point satisfying (7) is also known
as a saddle point of $L$. To see that (7) is an equilibrium
point we argue as follows: given that player A wishes to
minimize the amount paid to player B, if player B chooses $\lambda^*$, then player A cannot do better than $x^*$ by selecting anything else. Similarly, if player A chooses $x^*$, then player B cannot do better than $\lambda^*$ by choosing anything else.

By the strong duality theorem, we know that the 
game has an equilibrium point under convexity as- 
sumptions. For the general case, insight can be obtained 
by interpreting (5) and (6) as two different games. 
A saddle point will exist if the optimal values of the two 
games are equal. 

Our next task is to interpret (5) and (6) as games. Consider the following situation: player A chooses
a strategy first, and then player B chooses a strategy. Thus, player B already knows the strategy that
player A has chosen, and as a result player B has an
advantage. Player A will argue as follows: “If I choose
$x$, then player B will respond with a $\lambda$ attaining $\sup_{\lambda \ge 0} L(x,\lambda)$; therefore
I had better choose the strategy that minimizes my
losses.” In other words, player A will choose the optimal
strategy given by solving (6).

Now consider the same game, but with the order of
play reversed, i.e., player B chooses first, and player A
second. Then, applying the rules of rational behavior
(as above), we see that player B will select the $\lambda$ that
solves (5).

Consequently, duality gaps originate from the order
in which the two players select their strategies. In the
next section we see how this asymmetry can be eliminated by allowing one of the players to select strategies
from a larger space than that of the finite-dimensional
Euclidean space. Once the asymmetry is eliminated,
there is zero duality gap.
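The order-of-play argument can be checked on a finite caricature of the game. The sketch below uses a hypothetical $2 \times 2$ payoff matrix ("matching pennies"), not anything from the article. It compares $\min\max$ with $\max\min$ over pure strategies, which are the analogues of (6) and (5), and then lets the first mover mix over its strategies. Mixing the columns enters the payoff linearly, much as $\lambda$ enters $L$. The functions `a_first_pure`, `b_first_pure`, `a_first_mixed` and `b_first_mixed` are illustrative names only.

```python
# Finite caricature of the Lagrangian game (hypothetical payoff, not from the article).
# payoff[i][j] is what player A (row chooser, minimizer) pays player B (column chooser).
payoff = [[1.0, -1.0],
          [-1.0, 1.0]]


def a_first_pure(M):
    """A commits to a row, B best-responds: min_i max_j M[i][j] (analogue of (6))."""
    return min(max(row) for row in M)


def b_first_pure(M):
    """B commits to a column, A best-responds: max_j min_i M[i][j] (analogue of (5))."""
    return max(min(row[j] for row in M) for j in range(len(M[0])))


def a_first_mixed(M, steps=1000):
    """A commits to the row mix (p, 1 - p) of a 2-row game; B best-responds with a column."""
    best = float("inf")
    for k in range(steps + 1):
        p = k / steps
        loss = max(p * M[0][j] + (1 - p) * M[1][j] for j in range(len(M[0])))
        best = min(best, loss)
    return best


def b_first_mixed(M, steps=1000):
    """B commits to the column mix (q, 1 - q) of a 2-column game; A best-responds with a row."""
    best = float("-inf")
    for k in range(steps + 1):
        q = k / steps
        gain = min(q * row[0] + (1 - q) * row[1] for row in M)
        best = max(best, gain)
    return best


print(a_first_pure(payoff), b_first_pure(payoff))    # order of play matters
print(a_first_mixed(payoff), b_first_mixed(payoff))  # with mixing the gap closes
```

With pure strategies the first mover is penalized: the two orders give 1 and -1. Once the first mover may randomize, both orders give the same value 0. The next section builds on this idea.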


Methods 


As argued above, the player that chooses first is dis- 
advantaged, since the other player can adjust. In this 
section we discuss two methods in which this asymme- 
try in the order of play can be eliminated. Both meth- 
ods were proposed early in the history of mathemat- 
ical programming. The first method proceeds by ran- 
domization (increasing the powers of player A). It is 
difficult to say who suggested this strategy first. Since 
the origins of the idea emanate from mixed strategies 
in game theory, one could argue that the idea was first 
suggested by Borel in the 1920s [14]. A modern proof 
can be found in [2]. The second method allows player 
B to select the dual variables from a larger space. This 
idea seems to have been suggested by Everett [3], and 
then by Gould [6]. A review can be found in [11]. Algo- 
rithms that attempt to reduce the duality gap appeared 
in [4,5,7,8,9,12]. 


Randomization 


Assume that player A chooses first; then the game can
be described by

$$P^* = \inf_{x \in X} \sup_{\lambda \ge 0} L(x,\lambda),$$

and in general

$$P^* \ge D^* = \sup_{\lambda \ge 0} \inf_{x \in X} L(x,\lambda).$$

Player A has a handicap, since player B will choose
a strategy knowing what player A will do. In order to
avoid a duality gap, we consider giving more
flexibility to player A. We thus allow player A to choose
strategies from $M(X)$, where $M(X)$ denotes the space
of probability measures on $\mathcal{B}$ (the σ-field generated
by $X$). Player A will therefore choose a strategy by solving


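$$P^* = \inf_{\mu \in M(X)} \int_X f(x)\, d\mu(x) \quad \text{s.t.} \quad \int_X g(x)\, d\mu(x) \le 0, \quad \int_X d\mu(x) = 1. \tag{8}$$

Equivalently:

$$P^* = \inf_{\mu \in M(X)} \sup_{\lambda \ge 0,\ \lambda_0} \Big( \int_X f(x)\, d\mu(x) + \sum_{i=1}^{m} \lambda_i \int_X g_i(x)\, d\mu(x) + \lambda_0 \Big( \int_X d\mu(x) - 1 \Big) \Big).$$

The dual of (8) is given by

$$D^* = \sup_{\lambda \ge 0,\ \lambda_0} \inf_{\mu \in M(X)} \Big( \int_X f(x)\, d\mu(x) + \sum_{i=1}^{m} \lambda_i \int_X g_i(x)\, d\mu(x) + \lambda_0 \Big( \int_X d\mu(x) - 1 \Big) \Big).$$

Then it can be shown that $P^* = D^*$. The proof is beyond the scope of this article; it can be found in [2].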
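The effect of randomization can be seen on the same hypothetical instance used for illustration ($f(x) = -x^2$, $g(x) = x$, $X = [-1, 2]$). This is not an example from the article. The sketch restricts $\mu$ to probability vectors on finitely many grid points, which turns (8) into a small linear program. Because that LP has only two constraints besides $p \ge 0$, some optimal solution is supported on at most two points, so the sketch simply enumerates single points and pairs. The grid is an assumption of the sketch.

```python
# Hypothetical instance reused for illustration: min -x**2 s.t. x <= 0, X = [-1, 2].
def f(x):
    return -x * x


def g(x):
    return x


xs = [-1.0 + 3.0 * k / 60 for k in range(61)]   # finite support points in X


def randomized_primal(points):
    """Solve min sum p_i f(x_i) s.t. sum p_i g(x_i) <= 0, sum p_i = 1, p >= 0.

    This LP has two constraints besides p >= 0, so some optimal solution is
    supported on a single feasible point or on a pair (x_i, x_j) with
    g(x_i) < 0 < g(x_j) and sum p g = 0; we simply enumerate those candidates.
    """
    best = min(f(x) for x in points if g(x) <= 0)          # Dirac measures
    for xi in points:
        for xj in points:
            gi, gj = g(xi), g(xj)
            if gi < 0 < gj:
                pi = gj / (gj - gi)                        # makes the mean of g zero
                best = min(best, pi * f(xi) + (1.0 - pi) * f(xj))
    return best


def lagrangian_dual(points, lams):
    """Grid version of sup_{lambda >= 0} inf_x f(x) + lambda * g(x)."""
    return max(min(f(x) + lam * g(x) for x in points) for lam in lams)


p_star = randomized_primal(xs)
d_star = lagrangian_dual(xs, [k / 100 for k in range(501)])
pure_star = min(f(x) for x in xs if g(x) <= 0)
print(pure_star, p_star, d_star)
```

Without randomization the primal value is -1. The optimal measure puts weight 2/3 on $x = -1$ and 1/3 on $x = 2$, which lowers the randomized primal value to -2. That matches the Lagrangian dual value, consistent with $P^* = D^*$.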


Functional Lagrange Multipliers 


We now consider the case where player B chooses first.
From the previous section, it follows that player B will
choose a strategy according to

$$D^* = \sup_{\lambda \ge 0} \inf_{x \in X} L(x,\lambda). \tag{9}$$

We have already pointed out that the following holds:

$$D^* \le \inf_{x \in X} \sup_{\lambda \ge 0} L(x,\lambda).$$

In order for the preceding inequality to hold as an equality without any convexity assumptions, we consider increasing the space of available strategies of B. This was
suggested in [3,6]. The exposition here is based on [11].
Let $H$ denote all the feasible right-hand sides for (1):

$$H = \{ b \in \mathbb{R}^m \mid \exists x \in X\colon g(x) \le b \}.$$

Let $D$ denote the following set of functions:

$$D = \{ z\colon \mathbb{R}^m \to \mathbb{R} \mid z(d_1) \le z(d_2) \text{ if } d_1 \le d_2,\ \forall d_1, d_2 \in H \}.$$


The following dual can be defined using the concepts
above:

$$D^* = \sup_{z \in D} z(0) \quad \text{s.t.} \quad z(g(x)) \le f(x)\ \ \forall x \in X. \tag{10}$$

The dual in (10) is different from the type of duals that
we have been discussing in this article. If, however, we
assume that $c + D \subseteq D$ for every constant $c$, then it was shown in [11] that
(10) is equivalent to the following:

$$D^* = \sup_{z \in D} \inf_{x \in X} \big( f(x) + z(g(x)) \big),$$

which has the same form as the Lagrangian dual problem (5), with the linear term $\sum_i \lambda_i g_i(x)$ replaced by a nondecreasing function of $g(x)$. A proof that the duality gap between (10)
and (1) is zero can be found in [11].


Conclusions 


We have discussed Lagrangian duality, and the exis- 
tence of duality gaps from a game-theoretic viewpoint. 
We have discussed two ways in which duality gaps can 
be eliminated. The first is randomization and the second is the use of functional Lagrange multipliers. Unfortunately, neither method is immediately
applicable to real-world problems. However, for certain
classes of problems the functional Lagrange multiplier 
approach can be useful. It was shown in [13] that if the 
original problem involves the optimization of polyno- 
mial functions, and if the Lagrange multipliers are al- 
lowed to be themselves polynomials then there will be 
no duality gap. Unlike the general case discussed in this 
article, polynomial Lagrange multipliers can be manip- 
ulated numerically. This approach can potentially help 
develop efficient algorithms for a large class of prob- 
lems. 
Consider the following optimal control problem with
first order ordinary or partial differential equations:

$$(P)\qquad \min J(x,u) = \int_{\Omega} r(t, x(t), u(t))\, dt$$

subject to functions $(x,u) \in W_p^{1,n}(\Omega) \times L_\infty^{r}(\Omega)$, fulfilling
• the state equations
$$x^i_{t_\alpha}(t) = g_{i\alpha}(t, x(t), u(t)) \quad \text{a.e. on } \Omega, \qquad i = 1,\dots,n;\ \alpha = 1,\dots,m;$$
• the control restrictions
$$u(t) \in U \subset \mathbb{R}^r \quad \text{a.e. on } \Omega;$$
• the state constraints
$$x(t) \in G(t) \subset \mathbb{R}^n \quad \text{on } \Omega;$$
• and the boundary conditions
$$x(s) = \varphi(s) \quad \text{on } \partial\Omega.$$


The data of problem (P) satisfy the following hypotheses:

H1) For $m = 1$ we have $1 < p < \infty$; for $m \ge 2$ we have $m < p < \infty$.

H2) The sets $\Omega$ and
$$X := \{ (t,\xi) \in \mathbb{R}^m \times \mathbb{R}^n : t \in \Omega,\ \xi \in G(t) \}$$
are strongly Lipschitz domains in the sense of C.B. Morrey and S. Hildebrandt [6]; the set $U$ is closed.

H3) The functions $r$, $r_\xi$, $g_{i\alpha}$, $(g_{i\alpha})_\xi$, $\varphi$ are continuous with respect to all arguments.

H4) The set of all admissible pairs $(x,u)$, denoted by $Z$, is nonempty.

The characterization of optimal solutions of special
variational problems by dual or complementary problems has been well known in physics for a long time,
e.g.

• in elasticity theory, the principle of the minimum of
potential energy (Dirichlet’s principle) and the principle of tension (Castigliano’s principle) are dual or
complementary to each other;

• in the theory of electrostatic fields, the principle of
the minimum of potential energy and the Thomson
(Lord Kelvin) principle are dual problems.


A first systematic approach to duality for special
problems in the calculus of variations was given by K.
Friedrichs ([4], 1928). In the 1950s and 1960s, this concept was extended by W. Fenchel [3], J.-J. Moreau, R.T.
Rockafellar [19,20] and I. Ekeland and R. Temam [2] to
larger classes of variational and control problems. Based
on the Legendre transformation (or Fenchel conjugation), it proved to be a suitable tool for handling convex problems.

Nonconvex problems (P) require an extended concept of duality. The construction of R. Klötzler, given
in 1977 [7], can be regarded as a further development
of Hamilton-Jacobi field theory.


Construction of a Dual Problem 


In a very general setting, a problem (D) of maximization of an (extended real-valued) functional $L$ over an
arbitrary set $S \ne \emptyset$ is said to be a dual problem to (P) if
the weak duality relation

$$\sup (D) \le \inf (P)$$

is satisfied.

The different notions of duality given in the introduction can be embedded into the following construction scheme:

1) The set of admissible pairs $(x,u) = z \in Z$ is represented by the intersection of two suitable nonempty sets $Z_0$ and $Z_1$.

2) For an (extended real-valued) functional $\Phi\colon Z_0 \times S_0 \to \bar{\mathbb{R}}$ the equivalence relation
$$\inf_{z \in Z} J(z) = \inf_{z \in Z_0} \sup_{S \in S_0} \Phi(z,S)$$
holds.

3) Assuming $L_0(S) := \inf_{z \in Z_0} \Phi(z,S)$, each problem
$$(D)\qquad \max L(S) \quad \text{s.t.} \quad S \in S_1 \subseteq S_0$$
is a (weak) dual problem to (P) if $L(S) \le L_0(S)$ for all $S \in S_1$.

The proof of the weak duality relation results from the
well-known inequality

$$\inf_{z \in Z_0} \sup_{S \in S_0} \Phi(z,S) \ge \sup_{S \in S_0} \inf_{z \in Z_0} \Phi(z,S).$$

Indeed, $\sup(D) = \sup_{S \in S_1} L(S) \le \sup_{S \in S_0} L_0(S) = \sup_{S \in S_0} \inf_{z \in Z_0} \Phi(z,S) \le \inf_{z \in Z_0} \sup_{S \in S_0} \Phi(z,S) = \inf(P)$.


Fenchel-Rockafellar Duality 


In accordance with [2], we transform (P) into a general
variational problem:

$$(V)\qquad \min \int_{\Omega} l(t, x(t), x_t(t))\, dt \quad \text{s.t.} \quad x \in X,$$

where $l\colon \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}^{nm} \to \bar{\mathbb{R}}$ is given by

$$l(t,\xi,w) = \begin{cases} \inf\{ r(t,\xi,v) : v \in U \text{ with } w = g(t,\xi,v) \}, & (t,\xi) \in X, \\ +\infty, & \text{else}, \end{cases}$$

and $X = \{ x \in W_p^{1,n}(\Omega) : x(s) = \varphi(s) \text{ on } \partial\Omega \}$.

Then (P) is called convex if (V) is convex in the sense
of [2, p. 113]. In this case both problems are equivalent
[15]. The Fenchel dual problem is obtained by the following settings in the above construction scheme:

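1) $Z_0 = \{ z = (x,u) \in W_p^{1,n}(\Omega) \times L_\infty^{r}(\Omega) : u(t) \in U \text{ a.e. on } \Omega,\ x(t) \in G(t) \text{ on } \Omega,\ x(s) = \varphi(s) \text{ on } \partial\Omega \}$,
$Z_1 = \{ z = (x,u) \in W_p^{1,n}(\Omega) \times L_\infty^{r}(\Omega) : x^i_{t_\alpha}(t) = g_{i\alpha}(t, x(t), u(t)) \text{ a.e. on } \Omega,\ i = 1,\dots,n;\ \alpha = 1,\dots,m \}$;

2) $S_0 = L_q^{nm}(\Omega)$ $(p^{-1} + q^{-1} = 1)$, and $\Phi$ is the classical Lagrange functional,
$$\Phi(z,S) = J(z) + \sum_{i,\alpha} \big\langle x^i_{t_\alpha} - g_{i\alpha}(\cdot, x, u),\, y^i_\alpha \big\rangle,$$
where $\langle \cdot,\cdot \rangle$ is the bilinear canonical pairing over $L_p(\Omega) \times L_q(\Omega)$ [15].

By use of the Hamiltonian of (P),
$$\mathcal{H}\colon \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}^{nm} \to \bar{\mathbb{R}}, \qquad \mathcal{H}(t,\xi,\eta) = \sup\{ H(t,\xi,v,\eta) : v \in U \},$$
with
$$H(t,\xi,v,\eta) = -r(t,\xi,v) + \sum_{i,\alpha} \eta^i_\alpha\, g_{i\alpha}(t,\xi,v),$$
it can be formulated as follows [15]:
$$(D_F)_q\qquad \max\ -\int_{\Omega} \sup_{\xi \in G(t)} \big[ \mathcal{H}(t,\xi,-y(t)) - \xi^{\top} y_0(t) \big]\, dt - \int_{\Omega} \Big( y_0(t)^{\top} \varphi(t) + \sum_{\alpha=1}^{m} y_\alpha(t)^{\top} \varphi_{t_\alpha}(t) \Big)\, dt$$
$$\text{s.t.}\quad (y_0, y) \in L_q^{n}(\Omega) \times L_q^{nm}(\Omega).$$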


Duality in the Sense of Klötzler


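The duality in the sense of Klötzler is realized by the following settings in the general construction scheme [14]:
1) $Z_0$ and $Z_1$ are chosen as before.

2) $S_0 = W_\infty^{1,m}(X)$, and $\Phi$ is an extended Lagrange functional,
$$\Phi(z,S) = J(z) + \sum_{i,\alpha} \big\langle x^i_{t_\alpha} - g_{i\alpha}(\cdot, x, u),\, S^\alpha_{\xi_i}(\cdot, x) \big\rangle,$$
where $\langle \cdot,\cdot \rangle$ is again the bilinear canonical pairing over $L_p(\Omega) \times L_q(\Omega)$.

By use of Gauss’ theorem, the dual problem reads as
follows [8]:

$$(D_K)_\infty\qquad \max \int_{\partial\Omega} S(s, \varphi(s))^{\top} \nu(s)\, do(s) \quad \text{s.t.} \quad S \in S_1,$$

where

$$S_1 = \Big\{ S \in S_0 : \sum_{\alpha=1}^{m} S^\alpha_{t_\alpha}(t,\xi) + \mathcal{H}(t,\xi,S_\xi(t,\xi)) \le 0 \ \text{a.e. on } X \Big\},$$

and $\nu(\cdot)$ is the exterior unit normal vector to $\partial\Omega$.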
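As an illustration of $(D_K)_\infty$, here is a small sketch on a hypothetical instance that is not taken from the article. Take $m = n = 1$, $\Omega = [0, T]$, $U = \mathbb{R}$, no state constraints, $r = u^2$, $g = u$, $x(0) = 0$ and $x(T) = 1$. Then $\mathcal{H}(t,\xi,\eta) = \eta^2/4$, and the boundary integral reduces to $S(T, 1) - S(0, 0)$. The sketch searches only the one-parameter linear family $S(t,\xi) = c\,\xi - c^2 t/4$. Every member satisfies the Hamilton-Jacobi inequality with equality and so lies in $S_1$. The best dual bound found is then compared with the cost of two admissible controls. Names such as `dual_value` and `primal_cost` are hypothetical.

```python
# Hypothetical instance (m = n = 1, Omega = [0, T], U = R, no state constraints):
#   minimize J = integral_0^T u(t)**2 dt  s.t.  x' = u,  x(0) = 0,  x(T) = 1.
T = 2.0


def hamiltonian(eta):
    """sup over v of H(t, xi, v, eta) = -v**2 + eta * v; attained at v = eta / 2."""
    return eta * eta / 4.0


def S(c, t, xi):
    """Linear trial function S(t, xi) = c * xi - c**2 * t / 4 (one-parameter family)."""
    return c * xi - c * c * t / 4.0


def hj_residual(c, t, xi, h=1e-5):
    """S_t + sup_v H(t, xi, v, S_xi) via central differences; must be <= 0 for S in S_1."""
    s_t = (S(c, t + h, xi) - S(c, t - h, xi)) / (2 * h)
    s_xi = (S(c, t, xi + h) - S(c, t, xi - h)) / (2 * h)
    return s_t + hamiltonian(s_xi)


def dual_value(c):
    """Boundary term of (D_K): exterior normal is -1 at t = 0 and +1 at t = T."""
    return S(c, T, 1.0) - S(c, 0.0, 0.0)


def primal_cost(u):
    """J for a piecewise-constant control given on a uniform grid over [0, T]."""
    dt = T / len(u)
    return sum(v * v for v in u) * dt


cs = [k / 100 for k in range(301)]
best_c = max(cs, key=dual_value)
print(best_c, dual_value(best_c))                    # best dual bound over the family
print(primal_cost([1.0 / T] * 100))                  # u* = 1/T, the straight line x = t/T
print(primal_cost([1.0] * 50 + [0.0] * 50))          # another admissible control
```

For $T = 2$ the best member of the family is $c = 1$, with dual value $1/T = 0.5$. This equals the cost of the straight-line control $u^* = 1/T$, so in this instance weak duality holds with equality. Any other admissible control, such as the bang-off control on the last line, costs more.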

In this way we can characterize minimizers of (P)
in terms of solutions of the Hamilton-Jacobi inequality
or of the Hamilton-Jacobi equation. Since classical solutions of the latter equation may fail to exist, on the
one hand, techniques were developed to construct generalized solutions of this equation (viscosity solutions
[11], generalized solutions involving the Clarke generalized gradient [1] or lower Dini derivatives [24]). On
the other hand, optimization techniques for parametric problems in finite-dimensional spaces are used to
minimize the defect in the Hamilton-Jacobi inequality and to obtain sufficient conditions for (local) optimality
[16,17,27,28,29].


Bidual Problems, Generalized Flows, Relaxed 
Controls 


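Duality allows one to associate the bidual problem with
a flow problem or a relaxed control problem. $(D_K)_\infty$, interpreted as an infinite linear programming problem
[13], has a dual problem again, which can be identified
as a generalized flow problem in the sense of L.C. Young
[26]. Assuming compactness of the control set $U$, one
obtains the bidual problem $(D_K)_\infty^{*} = (F)$:

$$(F)\qquad \min \int_D r(t,\xi,v)\, d\mu(t,\xi,v) \quad \text{s.t.} \quad \mu \in N_D,$$

with

$$N_D = \Big\{ \mu \in R_D : \int_{\partial\Omega} \psi(s,\varphi(s))^{\top} \nu(s)\, do(s) = \int_D \Big( \sum_{\alpha} \psi^\alpha_{t_\alpha}(t,\xi) + \sum_{i,\alpha} \psi^\alpha_{\xi_i}(t,\xi)\, g_{i\alpha}(t,\xi,v) \Big) d\mu(t,\xi,v)\ \ \forall \psi \in C^{1,m}(X) \Big\},$$

where $R_D$ is the set of all nonnegative Radon measures
on $D := X \times U$. $N_D$ contains special measures

$$d\mu(t,\xi,v) = d\delta_{x(t)}(\xi)\, d\mu_t(v)\, dt,$$

with $\mu_\Omega = \{ \mu_t : t \in \Omega \} \in M_U$, where $M_U$ is a regular family of
probability measures concentrated on $U$ [5], and $\delta_\xi$ are
Dirac measures concentrated on the point $\xi \in G(t)$.
Thus the relaxed control problem

$$(\bar{P})\qquad \min \int_\Omega \int_U r(t, x(t), v)\, d\mu_t(v)\, dt \quad \text{s.t.} \quad (x, \mu_\Omega) \in W_p^{1,n}(\Omega) \times M_U,$$

satisfying $x(t) \in G(t)$ and fulfilling the following variational equation for all $\psi \in C^{1,m}(X)$,

$$\int_{\partial\Omega} \psi(s,\varphi(s))^{\top} \nu(s)\, do(s) = \int_\Omega \Big[ \sum_{\alpha} \psi^\alpha_{t_\alpha}(t, x(t)) + \sum_{i,\alpha} \psi^\alpha_{\xi_i}(t, x(t)) \int_U g_{i\alpha}(t, x(t), v)\, d\mu_t(v) \Big] dt,$$

has the embedding (F), and

$$\sup (D_K)_\infty = \inf (F) \le \inf (\bar{P})$$

holds [13].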


Strong Duality Results 


The property of strong duality between (P) and (D) is
defined by the equation

$$\sup (D) = \inf (P),$$

and this common value is called the saddle value of $\Phi$.


Case A 


Control problems with single integrals and ordinary
differential equations, $\Omega = [0, T]$.
For convex problems (P),

$$\sup (D_F)_q = \min (P)$$

holds [2]. Moreover, in $(D_F)_q$ the optimal solution $(y_0^*, y^*)$ exists and fulfills

$$y^* \in W_q^{1,n}(\Omega) \quad \text{and} \quad \frac{d}{dt} y^*(t) = y_0^*(t) \ \ \text{a.e. on } (0, T)$$

[15]. Both dual problems, $(D_F)_q$ and $(D_K)_\infty$, coincide if
in $(D_K)_\infty$ a linear setting

$$S(t,\xi) = a(t) + \xi^{\top} y(t)$$

is chosen [14].
For nonconvex control problems with compact $U$ it
was shown by different techniques that (P) as well as
$(\bar{P})$ possess the same dual problem [12,22,23], and strong
duality

$$\sup (D_K)_\infty = \min (\bar{P})$$

holds. The variational equation appearing in $(\bar{P})$ is in
this case equivalent to the generalized state equations

$$\frac{d}{dt} x(t) = \int_U g(t, x(t), v)\, d\mu_t(v) \quad \text{a.e. on } (0, T)$$

with the boundary conditions

$$x(s) = \varphi(s) \quad \text{for } s = 0,\ s = T.$$

The question of existence of a solution of the dual problem $(D_K)_\infty$ was discussed in [21].


Case B 


Control problems with multiple integrals and first order partial differential equations.
As before, for convex problems

$$\sup (D_F)_q = \min (P)$$

holds. The equivalence of $(D_F)_q$ and $(D_K)_\infty$ is lost in
general. Results concerning strong duality between (D)
and (P) in the nonconvex case are largely missing.


Sufficient Optimality Conditions 


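First- and second-order sufficient optimality conditions
for global minimizers can be derived by means of duality. In the general concept, $(x^*, u^*) \in Z$ is a global
minimizer of (P) if

$$J(x^*, u^*) = \inf_{z \in Z_0} \sup_{S \in S_0} \Phi(z, S) = \max_{S \in S_0} \inf_{z \in Z_0} \Phi(z, S)$$

and there exists an $S^* \in S_1$ with

$$L_0(S^*) = \max_{S \in S_1} L(S) = \max_{S \in S_0} L_0(S).$$

Following the concept of Klötzler, these equations are
satisfied if and only if for $S^* \in W_\infty^{1,m}(X)$ the following conditions are fulfilled:

a) the Hamilton-Jacobi inequality
$$\Lambda(t,\xi) := \sum_{\alpha=1}^{m} S^{*\alpha}_{t_\alpha}(t,\xi) + \mathcal{H}(t, \xi, S^*_\xi(t,\xi)) \le 0 \quad \text{on } X;$$

b) the Hamilton-Jacobi equation
$$\sum_{\alpha=1}^{m} S^{*\alpha}_{t_\alpha}(t, x^*(t)) + \mathcal{H}(t, x^*(t), S^*_\xi(t, x^*(t))) = 0 \quad \text{on } \Omega;$$

c) the maximum condition
$$\mathcal{H}(t, x^*(t), S^*_\xi(t, x^*(t))) = H(t, x^*(t), u^*(t), S^*_\xi(t, x^*(t))) \quad \text{a.e. on } \Omega.$$

From conditions a) and b) it follows that $x^*(t)$ must
be a global maximizer of the parametric optimization
problem

$$(P)_t\qquad \max \Lambda(t,\xi) \quad \text{s.t.} \quad \xi \in G(t)$$

with parameter $t \in \Omega$, since $\Lambda(t,\cdot) \le 0$ on $G(t)$ and $\Lambda(t, x^*(t)) = 0$. For this last problem $(P)_t$, first- and second-order sufficient optimality conditions can
be derived with the quadratic setting

$$S^{*\alpha}(t,\xi) = a^{\alpha}(t) + y^{\alpha}(t)^{\top} (\xi - x^*(t)) + \tfrac{1}{2} (\xi - x^*(t))^{\top} Q^{\alpha}(t) (\xi - x^*(t))$$

in the dual problem $(D_K)_\infty$, where $y^\alpha \in W_\infty^{1,n}(\Omega)$ and
$Q^\alpha \in W_\infty^{1,n \times n}(\Omega)$ symmetric.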

The ideas mentioned above can be used for identifying strong local minimizers of (P), too. In this case $X$ is
to be replaced by

$$X_\varepsilon = X \cap \{ (t,\xi) \in \mathbb{R}^{m+n} : |\xi - x^*(t)| < \varepsilon,\ t \in \Omega \}$$

[16,17,27,28,29]. The second-order condition for $(P)_t$
yields a definiteness condition for a Riccati-type expression which generalizes the known theory of conjugate
points in the calculus of variations in one independent
variable.
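The link between the quadratic setting and conjugate points can be seen on a classical one-dimensional example. The example is ours, not the article's. Take the hypothetical problem $\min \int_0^T (u^2 - x^2)\,dt$ with $x' = u$, $x(0) = x(T) = 0$, and the candidate $x^* \equiv 0$, $u^* \equiv 0$. Then $\mathcal{H}(t,\xi,\eta) = \xi^2 + \eta^2/4$. The quadratic setting with $a = 0$, $y = 0$, $S = \tfrac12 Q(t)\xi^2$ gives $\Lambda(t,\xi) = \tfrac12 \xi^2 (Q' + 2 + Q^2/2)$, so a) holds if and only if $Q' \le -2 - Q^2/2$. Technical growth conditions on $S$ are set aside here. The equality case is a Riccati equation with solutions $Q(t) = -2\tan(t + \kappa)$, which stay finite on $[0, T]$ only when $T < \pi$, the first conjugate point of $x'' + x = 0$. By a standard comparison argument, solutions of the strict inequality blow up no later. The sketch integrates the Riccati equation with classical RK4 from a hypothetical grid of initial values.

```python
def riccati_rhs(q):
    """Equality case of Q' + 2 + Q**2 / 2 <= 0, i.e. Q' = -2 - Q**2 / 2."""
    return -2.0 - 0.5 * q * q


def stays_bounded(q0, T, steps=4000, bound=1e4):
    """Integrate Q from Q(0) = q0 over [0, T] with classical RK4; False if |Q| exceeds bound."""
    h = T / steps
    q = q0
    for _ in range(steps):
        k1 = riccati_rhs(q)
        k2 = riccati_rhs(q + 0.5 * h * k1)
        k3 = riccati_rhs(q + 0.5 * h * k2)
        k4 = riccati_rhs(q + h * k3)
        q += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        if abs(q) > bound:          # treat as blow-up (the exact solution has a pole)
            return False
    return True


def riccati_solution_found(T, q0_grid=tuple(float(v) for v in range(0, 61, 2))):
    """True if some initial value on the (hypothetical) grid gives a bounded solution on [0, T]."""
    return any(stays_bounded(q0, T) for q0 in q0_grid)


for T in (1.0, 3.0, 3.3):
    print(T, riccati_solution_found(T))
```

For horizons below $\pi$ a bounded solution exists, so the Hamilton-Jacobi inequality can be satisfied and $x^* \equiv 0$ is certified. Beyond $\pi$ every trial solution blows up. This matches the classical conjugate point at $\pi$. The finite grid of initial values is only a numerical probe, not a proof.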


Duality and Maximum Principle 
Case A 


Control problems with single integrals, $\Omega = [0, T]$. For
convex problems (P) it can be shown that the Pontryagin maximum principle is not only a necessary but also
a sufficient optimality condition. In this case the canonical variables in the maximum principle solve at the
same time the dual problem $(D_K)_\infty$ [15].


Case B 


Control problems with multiple integrals, $\Omega \subset \mathbb{R}^m$,
$m \ge 2$. For convex problems (P) or relaxed problems
$(\bar{P})$ a maximum principle was proved at the beginning
of the 1990s [10,18,25]. It turns out that the canonical variables in this principle are not necessarily functions but contents or measures from $L_\infty^*(\Omega)$ or $C^*(\Omega)$.
A corresponding duality theory with dual variables in
these measure spaces was developed by Klötzler [9], and
strong duality was shown. As before, in the convex case
the canonical variables of the maximum principle solve
the dual problem.
Synonyms
SDP duality


Basic Properties 


Consider the primal semidefinite program

$$(\mathrm{SDP})\qquad \mu^* := \min\ C \bullet X \quad \text{s.t.} \quad \mathcal{A} X = b,\ \ X \succeq 0,$$

where $C \bullet X = \operatorname{trace} CX$ denotes the inner product of the
symmetric matrices $C$, $X$; $X \succeq Y$ denotes that the symmetric matrix $X - Y$ is positive semidefinite; and $\mathcal{A}\colon \mathcal{S}^n \to \mathbb{R}^m$ is a linear operator on the space of symmetric
matrices, with adjoint $\mathcal{A}^*$. Equivalently, the linear constraint can be written using symmetric matrices $A_i$, $i = 1, \dots, m$, as

$$A_i \bullet X = b_i \quad \text{for all } i = 1, \dots, m,$$

while the adjoint operation on $y \in \mathbb{R}^m$ is

$$\mathcal{A}^* y = \sum_{i=1}^{m} y_i A_i.$$

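The Lagrangian function is

$$\mathcal{L}(X, y) := C \bullet X + y^{\top}(b - \mathcal{A}X).$$

The primal problem is equivalent to

$$\mu^* = \min_{X \succeq 0} \max_{y} \mathcal{L}(X, y) = \min_{X \succeq 0} \max_{y} \big[ C \bullet X + y^{\top}(b - \mathcal{A}X) \big].$$

The equivalence can be seen by using the hidden constraint $b - \mathcal{A}X = 0$ in the outer minimization problem: if this constraint does not hold, then the inner maximum value is $+\infty$.

By interchanging the maximum and minimum and
rewriting the order of terms in the Lagrangian, we get
the dual problem and weak duality:

$$\mu^* \ge \nu^* := \max_{y} \min_{X \succeq 0}\ b^{\top} y + (C - \mathcal{A}^* y) \bullet X.$$

Using the hidden constraint $C - \mathcal{A}^* y \succeq 0$ in the outer maximization problem (the inner minimum equals $b^{\top} y$ if $C - \mathcal{A}^* y \succeq 0$ and $-\infty$ otherwise), this becomes equivalent to

$$(\mathrm{D})\qquad \nu^* = \max\ b^{\top} y \quad \text{s.t.} \quad \mathcal{A}^* y \preceq C.$$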


The dual pair SDP and (D) look very much like
a dual pair of linear programs (denoted LP), where the
adjoint operator replaces the transpose and positive
semidefiniteness of matrix variables replaces nonnegativity of vector variables. In fact, duality theory for SDP
has a lot of similarities with that of LP: weak duality $\mu^* \ge \nu^*$ follows from the interchange of maximum and
minimum; from this we get that unboundedness of SDP
(respectively, (D)) implies infeasibility of (D) (respectively, SDP). Directly, for any primal feasible $X$ and dual feasible $y$, $C \bullet X - b^{\top} y = (C - \mathcal{A}^* y) \bullet X \ge 0$, because the trace inner product of two positive semidefinite matrices is nonnegative.

Weak duality illustrates one of the powerful uses of
the dual program, i.e. it provides lower bounds on the
optimal value of the primal program.
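The identity $C \bullet X - b^{\top} y = (C - \mathcal{A}^* y) \bullet X \ge 0$ behind weak duality can be checked on a hypothetical $2 \times 2$ SDP with a single constraint $\operatorname{trace} X = 1$, that is, $A_1 = I$ and $b_1 = 1$. The data are made up for illustration. The sketch evaluates the gap for one feasible pair. It then builds the optimal pair from the smallest eigenvalue of $C$ and its eigenvector, where the gap closes. Plain Python lists stand in for matrices, and `dot`, `is_psd` and `slack` are hypothetical helper names.

```python
import math

# Hypothetical 2x2 SDP with one constraint (not from the article):
#   min C . X  s.t.  A1 . X = b1 (i.e. trace X = 1),  X PSD
C = [[2.0, 1.0], [1.0, 3.0]]
A1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = 1.0


def dot(P, Q):
    """Trace inner product P . Q = trace(P Q) = sum_ij P_ij Q_ij for symmetric P, Q."""
    return sum(P[i][j] * Q[i][j] for i in range(2) for j in range(2))


def is_psd(M, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal entries and determinant are >= 0."""
    return M[0][0] >= -tol and M[1][1] >= -tol and M[0][0] * M[1][1] - M[0][1] ** 2 >= -tol


def slack(y):
    """Dual slack C - A*y, with A*y = y * A1."""
    return [[C[i][j] - y * A1[i][j] for j in range(2)] for i in range(2)]


# A feasible primal/dual pair: the gap equals (C - A*y) . X >= 0.
X = [[0.5, 0.0], [0.0, 0.5]]
y = 1.0
gap = dot(C, X) - b1 * y

# The optimal pair: y* = smallest eigenvalue of C, X* = v v^T / (v^T v) for its eigenvector.
y_opt = (5.0 - math.sqrt(5.0)) / 2.0
v = [1.0, y_opt - 2.0]                  # solves (C - y_opt I) v = 0
nv = v[0] ** 2 + v[1] ** 2
X_opt = [[v[i] * v[j] / nv for j in range(2)] for i in range(2)]
gap_opt = dot(C, X_opt) - b1 * y_opt
print(gap, gap_opt)
```

The feasible pair has gap 1.5, which equals $(C - \mathcal{A}^* y) \bullet X$. At the eigenvector pair the gap is zero up to rounding. This instance satisfies Slater's condition ($X = I/2 \succ 0$), so a zero gap is expected.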

Other formulations of SDP provide similar duals. In
fact, SDP is a special case of cone programming. Let
$K$, $L$ be two convex cones, i.e. $K$ (and $L$) satisfy $K + K \subseteq K$ (Minkowski sum) and $\alpha K \subseteq K$ for all $\alpha \ge 0$. Define the primal cone program as

$$(\mathrm{PC})\qquad \mu^* = \min\ \langle C, X \rangle \quad \text{s.t.} \quad \mathcal{A}X \succeq_K b,\ \ X \succeq_L 0,$$

where $X \succeq_L Y$ denotes $X - Y \in L$ (and similarly for $K$),
and $\langle \cdot,\cdot \rangle$ denotes the appropriate inner product. Then
the above min-max argument yields the dual cone program

$$(\mathrm{DC})\qquad \nu^* = \max\ \langle b, y \rangle \quad \text{s.t.} \quad \mathcal{A}^* y \preceq_{L^+} C,\ \ y \succeq_{K^+} 0,$$

where $\cdot^+$ denotes taking the polar cone.

It is an interesting exercise to see that this elegant 
dual formulation works for linear programs that have 
mixtures of inequality and equality constraints with 
mixtures of free and nonnegative variables. 


Strong Duality 


However, unlike linear programming, strong duality for
SDP needs a constraint qualification, e.g. strict primal
feasibility (called Slater’s condition):

$$\text{there exists } X \succ 0 \text{ with } \mathcal{A}X = b.$$

This constraint qualification implies that strong duality
holds, i.e. that $\mu^* = \nu^*$ and $\nu^*$ is attained. Similarly, strict dual feasibility,

$$\mathcal{A}^* y \prec C \ \text{ for some } y,$$

also implies that $\mu^* = \nu^*$, but with $\mu^*$ attained. If
Slater’s condition does not hold, then a duality gap $\mu^* > \nu^*$ can exist, and/or the dual (or primal) optimal value
may not be attained, see e.g. [10].


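Example 1 If the dual is

$$(\mathrm{D})\qquad \nu^* = \sup\ x_2 \quad \text{s.t.} \quad \begin{pmatrix} x_2 & 0 & 0 \\ 0 & x_1 & x_2 \\ 0 & x_2 & 0 \end{pmatrix} \preceq \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

then the primal is

$$(\mathrm{P})\qquad \mu^* = \inf\ U_{11} \quad \text{s.t.} \quad U_{22} = 0,\ \ U_{11} + 2U_{23} = 1,\ \ U \succeq 0,$$

and we have the duality gap

$$\nu^* = 0 < \mu^* = 1.$$

Indeed, in the dual the lower-right $2 \times 2$ block of the slack matrix, $\begin{pmatrix} -x_1 & -x_2 \\ -x_2 & 0 \end{pmatrix}$, must be positive semidefinite, which forces $x_2 = 0$; in the primal, $U_{22} = 0$ and $U \succeq 0$ force $U_{23} = 0$, hence $U_{11} = 1$.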
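Example 1 can be double-checked by brute force. The sketch below writes the dual slack matrix $C - \mathcal{A}^* x$ explicitly. It tests positive semidefiniteness through all principal minors: for a symmetric matrix, nonnegativity of all principal minors, not just the leading ones, is equivalent to positive semidefiniteness. It then scans a hypothetical grid of $(x_1, x_2)$. On the primal side it checks that a matrix which meets the linear constraints with $U_{11} < 1$ fails to be positive semidefinite. The grid and the matrix `U_cheaper` are illustrative choices, not part of the example.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def is_psd(M, tol=1e-12):
    """Symmetric M is PSD iff every principal minor (not only the leading ones) is >= 0."""
    n = len(M)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            if det([[M[i][j] for j in idx] for i in idx]) < -tol:
                return False
    return True


def dual_slack(x1, x2):
    """C - A*x for Example 1: C = E11, A*x = x1 * E22 + x2 * (E11 + E23 + E32)."""
    return [[1.0 - x2, 0.0, 0.0],
            [0.0, -x1, -x2],
            [0.0, -x2, 0.0]]


grid = [i / 10 for i in range(-20, 21)]
feasible = [(x1, x2) for x1 in grid for x2 in grid if is_psd(dual_slack(x1, x2))]
nu_star = max(x2 for _, x2 in feasible)

# Primal side: U22 = 0 and U PSD force U23 = 0, so U11 + 2 U23 = 1 gives U11 = 1.
U_feasible = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
U_cheaper = [[0.5, 0.0, 0.0], [0.0, 0.0, 0.25], [0.0, 0.25, 1.0]]  # U11 < 1 but not PSD
print(nu_star, is_psd(U_feasible), is_psd(U_cheaper))
```

Every dual-feasible grid point has $x_2 = 0$, so the dual value is 0. The cheaper primal candidate meets $U_{22} = 0$ and $U_{11} + 2U_{23} = 1$ but is not positive semidefinite, which is consistent with $\mu^* = 1$.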


If strong duality holds, we get the following primal-dual
characterization of optimality for the dual pair $X$, $y$,
with $X \succeq 0$:

• $\mathcal{A}X = b$ (primal feasibility);

• $\mathcal{A}^* y \preceq C$ (dual feasibility);

• $X(\mathcal{A}^* y - C) = 0$ (complementary slackness).

These optimality conditions provide the basis for:


i) the primal simplex method (maintain primal feasibility and complementary slackness while striving
for dual feasibility);

ii) the dual simplex method (maintain dual feasibility and complementary slackness while striving for
primal feasibility); and

iii) the interior point methods (maintain primal and
dual feasibility while striving for complementary
slackness).


Unlike the LP case, there are currently no efficient primal or dual simplex methods for SDP;
however, interior point methods have proven to be very
successful. Thus we see the importance of duality for
both theoretical and algorithmic purposes.


Duality for Semidefinite Programming 




Strict Complementarity 


Another example of the difference between LP and SDP
arises in the complementary slackness conditions. If an
optimal pair $X$, $y$ exists, then in the LP case there also exists an optimal pair that satisfies strict complementarity,
i.e.

$$X + (C - \mathcal{A}^* y) \succ 0,$$

where in the LP case this is a sum of nonnegative diagonal matrices, see [4,5]. However, in the SDP case, there
may not exist such a strictly complementary optimal pair,
though existence is generic, see [8].


Closing the Duality Gap 


Both strict complementarity and strict feasibility,
or Slater’s constraint qualification, are generic, see [8].
But there are classes of problems where strong duality 
fails, e. g. relaxations that arise from hard combinatorial 
problems, e. g. [13]. 

One can regularize semidefinite programs and guar- 
antee that Slater’s constraint qualification holds, e.g. 
[2,3,12]. This involves finding the minimal face of the 
semidefinite cone that contains the feasible set, i.e. 
the so-called minimal cone. A numerical procedure for 
regularization is presented in [3]. However, this pro- 
cess is not computationally tractable. An equivalent ap- 
proach is the extended Lagrange-Slater dual program 
of M. Ramana [9,10]. This provides a means of writ- 
ing down a regularized program that is of polynomial 
size. Thus strong duality can be attained theoretically 
using the above techniques and exploiting the structure of specific problems. However, lack of regularity (failure of Slater’s condition) is an indication of an ill-posed
problem. Thus, whether regularization
can be done computationally for general problems is
still an open question, see e.g. [7].


Extensions 


The SDPs considered above have all contained lin- 
ear objectives and constraints. There is no reason to 
restrict SDPs to this special case. Duality for general 
cone programs with possible nonlinear objectives and 
constraints is considered in [2,3,11]. Applications of
SDPs with quadratic objectives appear in, e.g., [1,6].


See also 


> Interior Point Methods for Semidefinite 
Programming 

> Semidefinite Programming and Determinant 
Maximization 

> Semidefinite Programming: Optimality Conditions 
and Stability 

> Semidefinite Programming and Structural 
Optimization 

> Semi-infinite Programming, Semidefinite 
Programming and Perfect Duality 

> Solving Large Scale and Sparse Semidefinite 
Programs 
It is known that in convex optimization, the Lagrangian
associated with a constrained problem is usually a saddle function, which leads to the classical saddle Lagrange duality (i.e. the monoduality) theory. In nonconvex optimization, a so-called superLagrangian was
introduced in [1], which leads to a nice biduality theory
in convex Hamiltonian systems and in the so-called d.c.
programming.


SuperLagrangian Duality 


Definition 1 Let $L(x, y^*)$ be an arbitrary given real-valued function on $X \times Y^*$.

A function $L\colon X \times Y^* \to \mathbb{R}$ is said to be a supercritical
function (or a $\partial^+$-function) on $X \times Y^*$ if it is concave in
each of its arguments.

A function $L\colon X \times Y^* \to \mathbb{R}$ is said to be a subcritical
function (or a $\partial^-$-function) on $X \times Y^*$ if $-L$ is a supercritical function on $X \times Y^*$.

A point $(\bar{x}, \bar{y}^*)$ is said to be a supercritical point (or
a $\partial^+$-critical point) of $L$ on $X \times Y^*$ if

$$L(x, \bar{y}^*) \le L(\bar{x}, \bar{y}^*) \ge L(\bar{x}, y^*) \tag{1}$$

holds for all $(x, y^*) \in X \times Y^*$.


A point $(\bar{x}, \bar{y}^*)$ is said to be a subcritical point (or
a $\partial^-$-critical point) of $L$ on $X \times Y^*$ if

$$L(x, \bar{y}^*) \ge L(\bar{x}, \bar{y}^*) \le L(\bar{x}, y^*) \tag{2}$$

holds for all $(x, y^*) \in X \times Y^*$.


Clearly, a point $(\bar{x}, \bar{y}^*)$ is a subcritical point of $L$ on $X \times Y^*$ if and only if it is a supercritical point of $-L$ on $X \times Y^*$. A supercritical function $L(x, y^*)$ is called a superLagrangian if it is a Lagrange form associated with a constrained optimization problem. $L(x, y^*)$ is called a subLagrangian if $-L(x, y^*)$ is a superLagrangian.
For example, the quadratic function

$$L(x, y) = axy - \tfrac{1}{2} b x^2 - \tfrac{1}{2} c y^2, \qquad b, c > 0,$$

is concave in each of $x$ and $y$, and hence is a supercritical
function on $\mathbb{R} \times \mathbb{R}$. But $L(x, y)$ is not necessarily concave in
the vector $(x, y)$, since the Hessian matrix of $L$,

$$D^2 L(x, y) = \begin{pmatrix} -b & a \\ a & -c \end{pmatrix},$$

is negative definite only when $bc > a^2$, not for arbitrary $a \in \mathbb{R}$
and $b, c > 0$. $L$ is a subcritical function if $b, c < 0$, but $L$
may not be convex in $(x, y)$ for the same reason.

Since $L$ is a subLagrangian if and only if $-L$ is a superLagrangian, here we only consider the duality theory
for the superLagrangian.
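The gap between "concave in each argument" and "jointly concave" in the quadratic example can be checked directly from its constant Hessian. This is a minimal sketch: the parameter values $a = 0.5$ and $a = 2$ with $b = c = 1$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def hessian(a, b, c):
    """Constant Hessian D^2 L of L(x, y) = a*x*y - b*x**2/2 - c*y**2/2."""
    return [[-b, a], [a, -c]]


def concave_in_each_argument(a, b, c):
    H = hessian(a, b, c)
    return H[0][0] <= 0 and H[1][1] <= 0          # L_xx <= 0 and L_yy <= 0


def negative_definite(a, b, c):
    H = hessian(a, b, c)
    # A symmetric 2x2 matrix is negative definite iff H_11 < 0 and det(H) > 0.
    return H[0][0] < 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0


for a in (0.5, 2.0):
    print(a, concave_in_each_argument(a, 1.0, 1.0), negative_definite(a, 1.0, 1.0))
```

Both parameter choices give a supercritical function. Only $a = 0.5$ satisfies $bc > a^2$ and is jointly concave; for $a = 2$ the Hessian is indefinite.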


Theorem 2 (Supercritical point) Let $L(x, y^*)$ be an arbitrary given function, partially Gâteaux differentiable
on an open subset $X_a \times Y_a^* \subset X \times Y^*$. If $(\bar{x}, \bar{y}^*) \in X_a \times Y_a^*$ is either a supercritical or a subcritical point of
$L$, then $(\bar{x}, \bar{y}^*)$ is a critical point of $L$ on $X_a \times Y_a^*$.


Any critical point of a Gâteaux differentiable superLagrangian is a supercritical point. However, if $(\bar{x}, \bar{y}^*)$ is
a supercritical point of $L$, it does not follow that $L$ is
a superLagrangian. In d.c. programming or variational analysis of convex Hamiltonian systems, the following statements are of important theoretical value.

S1) Under certain necessary and sufficient conditions
we have

$$\inf_{x \in X_a} \sup_{y^* \in Y_a^*} L(x, y^*) = \inf_{y^* \in Y_a^*} \sup_{x \in X_a} L(x, y^*). \tag{3}$$

A statement of this type is called a superminimax
theorem, and the pair $(\bar{x}, \bar{y}^*)$ is called a superminimax point of $L$ on $X_a \times Y_a^*$.


Duality Theory: Biduality in Nonconvex Optimization 




S2) Under certain conditions, a pair $(\bar{x}, \bar{y}^*) \in X_a \times Y_a^*$
exists such that

$$L(x, \bar{y}^*) \le L(\bar{x}, \bar{y}^*) \ge L(\bar{x}, y^*) \tag{4}$$

holds for all $(x, y^*) \in X_a \times Y_a^*$. A statement of this
type is called a supercritical point theorem.
Since the maxima of $L(x, y^*)$ can be taken in
either order on $X_a \times Y_a^*$, the equality

$$\sup_{x \in X_a} \sup_{y^* \in Y_a^*} L(x, y^*) = \sup_{y^* \in Y_a^*} \sup_{x \in X_a} L(x, y^*) \tag{5}$$

always holds. A pair $(\bar{x}, \bar{y}^*)$ which maximizes $L$ on $X_a \times Y_a^*$ is called a supermaximum point of $L$ on $X_a \times Y_a^*$.
For a given superLagrangian $L\colon X_a \times Y_a^* \to \mathbb{R}$, we let
$X_k \subset X_a$ and $Y_s^* \subset Y_a^*$ be such that

$$\sup_{y^* \in Y_a^*} L(x, y^*) < +\infty \ \ \forall x \in X_k, \qquad \sup_{x \in X_a} L(x, y^*) < +\infty \ \ \forall y^* \in Y_s^*.$$

Theorem 3 (superLagrangian duality) Let the Lagrangian $L\colon X_a \times Y_a^* \to \mathbb{R}$ be an arbitrary given function.
If there exists either a supermaximum point $(\bar{x}, \bar{y}^*) \in X_a \times Y_a^*$ such that

$$L(\bar{x}, \bar{y}^*) = \max_{x \in X_a} \max_{y^* \in Y_a^*} L(x, y^*) = \max_{y^* \in Y_a^*} \max_{x \in X_a} L(x, y^*), \tag{6}$$

or a superminimax point $(\bar{x}, \bar{y}^*) \in X_a \times Y_a^*$ such that

$$L(\bar{x}, \bar{y}^*) = \min_{x \in X_a} \max_{y^* \in Y_a^*} L(x, y^*) = \min_{y^* \in Y_a^*} \max_{x \in X_a} L(x, y^*), \tag{7}$$

then $(\bar{x}, \bar{y}^*)$ is a supercritical point of $L$ on $X_a \times Y_a^*$.
Conversely, if $L$ is partially Gâteaux differentiable on
an open subset $X_a \times Y_a^* \subset X \times Y^*$, and $(\bar{x}, \bar{y}^*)$ is a supercritical point of $L$ on $X_a \times Y_a^*$, then either the supermaximum theorem in the form

$$L(\bar{x}, \bar{y}^*) = \max_{x \in X_a} \max_{y^* \in Y_a^*} L(x, y^*) = \max_{y^* \in Y_a^*} \max_{x \in X_a} L(x, y^*) \tag{8}$$

holds, or the superminimax theorem in the form

$$L(\bar{x}, \bar{y}^*) = \min_{x \in X_a} \max_{y^* \in Y_a^*} L(x, y^*) = \min_{y^* \in Y_a^*} \max_{x \in X_a} L(x, y^*) \tag{9}$$

holds.


This superLagrangian duality theorem shows a very important fact in Hamiltonian systems, i.e. the critical
points of the Lagrangian $L$ either maximize or minimaximize $L$ on $X_a \times Y_a^*$ in either order.


Nonconvex Primal and Dual Problems 


Let $L\colon X_a \times Y_a^* \to \mathbb{R}$ be an arbitrary given supercritical
function. For any fixed $x \in X_a$, let

$$P(x) = \sup_{y^* \in Y_a^*} L(x, y^*). \tag{10}$$

Clearly, the function $P(x)$ need not be either convex or
concave. Let $X_k \subset X_a$ be the primal feasible set such that
$P\colon X_k \to \mathbb{R}$ is finite and Gâteaux differentiable. Then for
a nonconvex function $P$, two primal problems can be
proposed:

$$(P_{\inf})\colon\quad P(x) \to \min, \quad x \in X_k, \tag{11}$$

$$(P_{\sup})\colon\quad P(x) \to \max, \quad x \in X_k. \tag{12}$$

The problems $(P_{\inf})$ and $(P_{\sup})$ are realisable if the primal feasible set $X_k$ is not empty.
Dually, for any fixed $y^* \in Y_a^*$, let

$$P^d(y^*) = \sup_{x \in X_a} L(x, y^*) \tag{13}$$

with the dual feasible set $Y_s^* \subset Y_a^*$ such that $P^d\colon Y_s^* \to \mathbb{R}$
is finite and Gâteaux differentiable. The two nonconvex
dual problems are:

$$(P^d_{\inf})\colon\quad P^d(y^*) \to \min, \quad y^* \in Y_s^*, \tag{14}$$

$$(P^d_{\sup})\colon\quad P^d(y^*) \to \max, \quad y^* \in Y_s^*. \tag{15}$$

These two dual problems are realisable if the dual feasible set $Y_s^*$ is not empty.

Theorem 4 (Biduality theorem) Let $L\colon X_a \times Y_a^* \to \mathbb{R}$
be a given arbitrary function such that $P$ and $P^d$ are well
defined by (10) and (13) on the open subsets $X_k$ and $Y_s^*$,
respectively.

1) If $(\bar{x}, \bar{y}^*)$ is a supercritical point of $L$ on $X_k \times Y_s^*$, then
$DP(\bar{x}) = 0$, $DP^d(\bar{y}^*) = 0$, and
$$P(\bar{x}) = L(\bar{x}, \bar{y}^*) = P^d(\bar{y}^*). \tag{16}$$

2) If $(\bar{x}, \bar{y}^*)$ is a supercritical point of $L$ on $X_k \times Y_s^*$, then
$\bar{x}$ is a minimizer of $P$ on $X_k$ if and only if $\bar{y}^*$ is a minimizer of $P^d$ on $Y_s^*$, i.e. the double-min duality
$$P(\bar{x}) = \inf_{x \in X_k} P(x) \iff \inf_{y^* \in Y_s^*} P^d(y^*) = P^d(\bar{y}^*) \tag{17}$$
holds.

3) If $(\bar{x}, \bar{y}^*)$ is a supercritical point of $L$ on $X_k \times Y_s^*$, then
$\bar{x}$ is a maximizer of $P$ on $X_k$ if and only if $\bar{y}^*$ is a maximizer of $P^d$ on $Y_s^*$, i.e. the double-max duality
$$P(\bar{x}) = \sup_{x \in X_k} P(x) \iff \sup_{y^* \in Y_s^*} P^d(y^*) = P^d(\bar{y}^*) \tag{18}$$
holds.


D.C. Programming and Hamiltonian 


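In d.c. programming, the primal function $P: X_a \to \mathbb{R}$ can be written as

$$P(x) = W(Ax) - F(x),$$

where $A: X \to Y$ is a linear operator, and $W: Y_a \to \mathbb{R}$ and $F: X_a \to \mathbb{R}$ are two convex, Gateaux differentiable real-valued functions satisfying the Legendre duality relations

$$x^* = DF(x) \;\Leftrightarrow\; x = DF^*(x^*) \;\Leftrightarrow\; \langle x, x^* \rangle = F(x) + F^*(x^*)$$

on $X_a \times X^*_a$, and

$$y^* = DW(y) \;\Leftrightarrow\; y = DW^*(y^*) \;\Leftrightarrow\; \langle y; y^* \rangle = W(y) + W^*(y^*)$$

on $Y_a \times Y^*_a$, where $F^*: X^*_a \to \mathbb{R}$ and $W^*: Y^*_a \to \mathbb{R}$ are the Legendre conjugates of $F$ and $W$, respectively.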

In dynamical systems, if $A = d/dt$ is a differential operator, its adjoint with respect to the standard bilinear forms in $L^2$ is $A^* = -d/dt$. If $W$ denotes the kinetic energy and $F$ the potential energy, then the primal function $P(x) = W(Ax) - F(x)$ is the total action of the system. The primal feasible set $X_k \subset X_a$ defined by

$$X_k = \{x \in X_a : Ax \in Y_a\}$$

is called the kinetically admissible space. In general, $P: X_k \to \mathbb{R}$ is nonconvex, since it is a difference of two convex functions.

The Lagrangian form associated with the nonconvex primal problems is defined by

$$L(x, y^*) = \langle Ax; y^* \rangle - W^*(y^*) - F(x), \qquad (19)$$

which is Gateaux differentiable on $X_a \times Y^*_a$. The critical condition $DL(\bar x, \bar y^*) = 0$ leads to the Lagrange equations

$$A\bar x = DW^*(\bar y^*), \qquad A^* \bar y^* = DF(\bar x).$$

Clearly, $L: X_a \times Y^*_a \to \mathbb{R}$ is a supercritical function, and

$$P(x) = \sup_{y^* \in Y^*_a} L(x, y^*), \quad \forall x \in X_k.$$
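Dually, for any given dual feasible $y^* \in Y^*_k$,

$$P^d(y^*) = \sup_{x \in X_a} L(x, y^*) = F^*(A^* y^*) - W^*(y^*),$$

where the dual feasible set $Y^*_k \subset Y^*_a$ is defined by

$$Y^*_k = \{y^* \in Y^*_a : A^* y^* \in X^*_a\}.$$

The criticality conditions $DL(\bar x, \bar y^*) = 0$, $DP(\bar x) = 0$ and $DP^d(\bar y^*) = 0$ are equivalent to each other.

The Hamiltonian $H: X_a \times Y^*_a \to \mathbb{R}$ associated with the Lagrangian $L$ is defined by

$$H(x, y^*) = \langle Ax; y^* \rangle - L(x, y^*) = W^*(y^*) + F(x). \qquad (20)$$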


For d.c. programming, $H(x, y^*)$ is a convex function on $X_a \times Y^*_a$. The critical point $(\bar x, \bar y^*)$ of $L$ satisfies the Hamiltonian canonical form

$$A\bar x = D_{y^*} H(\bar x, \bar y^*), \qquad A^* \bar y^* = D_x H(\bar x, \bar y^*).$$

In particular, if $W(Ax) = \frac12 \langle Ax; CAx \rangle$ is a quadratic function, where $C: Y_a \to Y^*_a$ is a symmetric operator such that the composite operator $\Lambda = A^* C A = \Lambda^*$ is selfadjoint, then the total action can be written as

$$P(x) = \tfrac12 \langle x, \Lambda x \rangle - F(x).$$

Let $P^c(x) = -P^d(CAx)$; then the function $P^c: X_a \to \mathbb{R}$,

$$P^c(x) = \tfrac12 \langle x, \Lambda x \rangle - F^*(\Lambda x),$$

is the so-called Clarke dual action (see [1]).
Theorem 5 (Clarke duality theorem) Let $\Lambda: X_k \subset X_a \to X^*$ be a closed selfadjoint operator, and let $\operatorname{Ker}\Lambda = \{x \in X : \Lambda x = 0 \in X^*\}$ be the null space of $\Lambda$. If $\bar x \in X_k$ is a critical point of $P$, then any vector $x \in \operatorname{Ker}\Lambda + \bar x$ is a critical point of $P^c$.

Conversely, if there exists an $x_0 \in X_k$ such that $\Lambda x_0 \in X^*_a$, then for a given critical point $\bar x$ of $P^c$, any vector $x \in \operatorname{Ker}\Lambda + \bar x$ is a critical point of $P$.


Example 6 Let us consider a very simple one-dimensional optimization problem with a constraint:

$$F(x) = \tfrac12 k x^2 - f x \to \max \quad \text{s.t.} \quad a \le x \le b, \qquad (21)$$

where $k > 0$ and $f \in \mathbb{R}$ are given constants. We assume that $-\infty < a < 0 < b < \infty$. Since $F(x)$ is strictly convex on the closed set $[a, b]$, the maximum is attained only on the boundary, i.e.

$$\sup_{x \in [a, b]} F(x) = \max\{F(a), F(b)\} < \infty.$$
The classical Lagrange multiplier method cannot be used for this nonconvex problem. To set this problem within our framework, we need only set $X = \mathbb{R}$, $X_a = [a, b]$ and let $A = 1$, so that

$$y = Ax = x \in Y = \mathbb{R}.$$

Thus, the range of the mapping $A: X_a \to Y = \mathbb{R}$ is $Y_a = [a, b]$. Let

$$W(y) = \begin{cases} 0 & \text{if } y \in Y_a, \\ +\infty & \text{if } y \notin Y_a. \end{cases}$$

It is not difficult to check that $W: Y \to \mathbb{R} \cup \{+\infty\}$ is convex. On $Y_a$, $W$ is finite and differentiable. Thus, the primal feasible space can be defined by

$$X_k = \{x \in X_a : Ax = x \in Y_a\} = [a, b].$$


Clearly, on $X_k$, $P(x) = W(Ax) - F(x) = -F(x)$. The constrained maximization problem (21) is then equivalent to the standard nonconvex minimization problem

$$(\mathcal{P}_{\inf}): \quad P(x) \to \min, \quad \forall x \in X_k.$$

Since $F(x)$ is strictly convex and differentiable on $X_a = [a, b]$, and

$$x^* = DF(x) = kx - f \in X^*_a$$

is invertible, where

$$X^*_a = [ak - f,\; bk - f] \subset X^* = \mathbb{R},$$

the Legendre conjugate $F^*: X^*_a \to \mathbb{R}$ can easily be obtained as

$$F^*(x^*) = \max_{x \in X_a} \{x x^* - F(x)\} = \frac{1}{2k}(x^* + f)^2.$$


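By the Fenchel transformation, the conjugate of the nonsmooth function $W$ can be obtained as

$$W^*(y^*) = \sup_{y \in Y} \{y y^* - W(y)\} = \max_{y \in Y_a} y y^* = \begin{cases} b y^* & \text{if } y^* > 0, \\ 0 & \text{if } y^* = 0, \\ a y^* & \text{if } y^* < 0. \end{cases}$$

It is convex on $Y^*_a = Y^* = \mathbb{R}$ and differentiable everywhere except at $y^* = 0$, where its subdifferential is $[a, b]$.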

On $X_a \times Y^*_a = [a, b] \times \mathbb{R}$, the Lagrange form for this nonconvex programming problem is well defined by

$$L(x, y^*) = y^* A x - W^*(y^*) - F(x) = \begin{cases} x y^* - b y^* - \frac12 k x^2 + f x & \text{if } y^* \ge 0, \\ x y^* - a y^* - \frac12 k x^2 + f x & \text{if } y^* < 0. \end{cases}$$

Since both $W^*$ and $F$ are convex, $L(x, y^*)$ is a supercritical function. If $x \in X_k = [a, b]$, then

$$P(x) = \sup_{y^* \in Y^*_a} L(x, y^*).$$

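On the other hand, for any $y^*$ in the dual feasible space

$$Y^*_k = \{y^* \in Y^*_a = \mathbb{R} : A^* y^* = y^* \in X^*_a\} = [ak - f,\; bk - f],$$

the dual function is obtained by

$$P^d(y^*) = \sup_{x \in X_a} L(x, y^*) = \sup_{x \in X_a} \{x A^* y^* - F(x)\} - W^*(y^*) = F^*(A^* y^*) - W^*(y^*),$$

where

$$F^*(A^* y^*) = \sup_{x \in X_a} \{x y^* - F(x)\} = \sup_{x \in \mathbb{R}} \{x (y^* + f) - \tfrac12 k x^2\} = \frac{1}{2k}(y^* + f)^2.$$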
Duality Theory: Biduality in Nonconvex Optimization, Figure 1: Biduality in constrained nonconvex optimization


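Thus, the dual action $P^d$ is well defined on $Y^*_k$ by

$$P^d(y^*) = \begin{cases} \frac{1}{2k}(y^* + f)^2 - b y^* & \text{if } y^* > 0, \\ \frac{1}{2k} f^2 & \text{if } y^* = 0, \\ \frac{1}{2k}(y^* + f)^2 - a y^* & \text{if } y^* < 0. \end{cases}$$

This is a double-well function on $\mathbb{R}$ (see Fig. 1). The dual problem

$$(\mathcal{P}^d_{\inf}): \quad P^d(y^*) \to \min, \quad \forall y^* \in Y^*_k,$$

is a convex optimization problem on either

$$Y^{*+}_k = \{y^* \in Y^*_k : y^* > 0\} \quad \text{or} \quad Y^{*-}_k = \{y^* \in Y^*_k : y^* < 0\}.$$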

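For $n$-dimensional versions of this problem, the dual problem is much easier to solve than the primal problem. The criticality condition leads to

$$\bar y^* = \begin{cases} bk - f & \text{if } \bar y^* > 0, \\ 0 & \text{if } \bar y^* = 0, \\ ak - f & \text{if } \bar y^* < 0. \end{cases}$$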


It is easy to check that the following duality theorems hold:

$$\min_{x \in X_k} P(x) = \min_{y^* \in Y^*_k} P^d(y^*), \qquad \max_{x \in X_k} P(x) = \max_{y^* \in Y^*_k} P^d(y^*).$$

The graphs of $P(x)$ and $P^d(y^*)$ are shown in Fig. 1.
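As a numerical sanity check of these two identities, the following Python sketch evaluates $P$ on a fine grid of $X_k = [a, b]$ and $P^d$ on a fine grid of $Y^*_k = [ak - f, bk - f]$ and compares the extreme values. Since $W$ vanishes on $X_k$, the primal function is $P(x) = -F(x)$ there, and the sketch writes $W^*(y^*) = \max(a y^*, b y^*)$, which reproduces the three-case formula for $W^*$ because $a < 0 < b$. The parameter values $a = -1$, $b = 2$, $k = 1$, $f = 0.2$ are illustrative assumptions, and the helper names are hypothetical.

```python
def primal(x, k, f):
    """P(x) = -F(x) = -(k x^2 / 2 - f x) on X_k = [a, b]."""
    return -(0.5 * k * x * x - f * x)


def dual(ys, a, b, k, f):
    """P^d(y*) = (y* + f)^2 / (2k) - W*(y*), with W*(y*) = max(a y*, b y*)."""
    return (ys + f) ** 2 / (2.0 * k) - max(a * ys, b * ys)


def grid(lo, hi, n=20001):
    """n equally spaced points on [lo, hi], endpoints included."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def biduality_check(a, b, k, f):
    """Return (min P, min P^d, max P, max P^d) over grids of X_k and Y*_k."""
    p = [primal(x, k, f) for x in grid(a, b)]
    d = [dual(ys, a, b, k, f) for ys in grid(a * k - f, b * k - f)]
    return min(p), min(d), max(p), max(d)


min_p, min_d, max_p, max_d = biduality_check(a=-1.0, b=2.0, k=1.0, f=0.2)
print(min_p, min_d, max_p, max_d)
```

With these data both minima equal $-F(b) = -1.6$ (attained at $x = b$ and $y^* = bk - f$), and both maxima equal $f^2/(2k) = 0.02$ (attained at the interior point $x = f/k$ and at $y^* = 0$).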
The concept of duality is one of the most successful ideas in modern mathematics and science. Inner beauty in natural phenomena is bound up with duality, which has always been a rich source of inspiration in human knowledge through the centuries. Duality in mathematics, roughly speaking, is a fundamental concept that underlies many aspects of extremum principles in natural systems. Eigenvectors, geodesics, minimal surfaces, KKT conditions, harmonic maps, Hamiltonian canonical equations and equilibrium states of many field equations are all critical points of certain functions on some appropriate constraint sets or manifolds. This fascinating research subject has attracted considerable attention in recent years. A comprehensive study of duality theory in general nonconvex and nonsmooth systems is given in [1]. In global optimization problems, duality falls principally into three categories:

1) the classical saddle Lagrange duality (i.e. monoduality) in convex optimization;

2) the nice biduality in convex Hamilton systems or in d.c. programming (difference of convex functions); and

3) the interesting triduality in general nonconvex systems.


Saddle Lagrange Duality 


Let $(X, X^*)$ and $(Y, Y^*)$ be two pairs of real vector spaces, finite- or infinite-dimensional, and let $\langle \cdot, \cdot \rangle: X \times X^* \to \mathbb{R}$ and $\langle \cdot\,; \cdot \rangle: Y \times Y^* \to \mathbb{R}$ be certain bilinear forms which put the paired spaces $(X, X^*)$ and $(Y, Y^*)$ in duality, respectively. In classical convex optimization, a real-valued function $L: X \times Y^* \to \mathbb{R}$ is said to be a saddle function if it is convex in one variable and concave in the other.

A pair $(\bar x, \bar y^*)$ is called a right saddle point of $L$ on a subspace $X_a \times Y^*_a \subset X \times Y^*$ if

$$L(\bar x, y^*) \le L(\bar x, \bar y^*) \le L(x, \bar y^*)$$

holds for any $(x, y^*) \in X_a \times Y^*_a$.

A pair $(\bar x, \bar y^*)$ is called a left saddle point of $L$ on a subspace $X_a \times Y^*_a \subset X \times Y^*$ if it is a right saddle point of $-L$ on the subspace $X_a \times Y^*_a$.

A pair $(\bar x, \bar y^*) \in X_a \times Y^*_a$ is called a critical point of $L$ if $L$ is partially Gateaux differentiable at $(\bar x, \bar y^*)$ and

$$D_x L(\bar x, \bar y^*) = 0, \qquad D_{y^*} L(\bar x, \bar y^*) = 0,$$

where $D_x L: X_a \to X^*$ and $D_{y^*} L: Y^*_a \to Y$ denote, respectively, the partial Gateaux derivatives of $L$ with respect to $x$ and $y^*$.

Any critical point of a Gateaux differentiable saddle function is a saddle point. However, if $(\bar x, \bar y^*)$ is a saddle point of $L$, it does not follow that $L$ is a saddle function. In convex optimization problems, the following statements are of considerable theoretical importance.

S1) Under certain necessary and sufficient conditions, suppose that

$$\inf_{x \in X_a} \sup_{y^* \in Y^*_a} L(x, y^*) = \sup_{y^* \in Y^*_a} \inf_{x \in X_a} L(x, y^*). \qquad (1)$$

A statement of this type is called a saddle-minimax theorem and the pair $(\bar x, \bar y^*)$ is called a saddle-minimax point of $L$ on $X_a \times Y^*_a$.

S2) Under certain conditions, suppose that a pair $(\bar x, \bar y^*) \in X_a \times Y^*_a$ exists such that

$$L(\bar x, y^*) \le L(\bar x, \bar y^*) \le L(x, \bar y^*) \qquad (2)$$

holds for any $(x, y^*) \in X_a \times Y^*_a$. A statement of this type is called a right saddle-point theorem.
Let $X_k$ be a subset of $X_a$ such that $X_k$ contains all points $x \in X_a$ for which the supremum $\sup_{y^*} L(x, y^*)$ is finite, i.e.

$$\sup_{y^* \in Y^*_a} L(x, y^*) < +\infty, \quad \forall x \in X_k.$$

Dually, let $Y^*_k$ be a subset of $Y^*_a$ such that $Y^*_k$ contains all points $y^* \in Y^*_a$ for which the infimum $\inf_x L(x, y^*)$ is finite, i.e.

$$\inf_{x \in X_a} L(x, y^*) > -\infty, \quad \forall y^* \in Y^*_k.$$

The sets $X_k$ and $Y^*_k$ may be empty, or they may coincide with $X_a$ and $Y^*_a$, respectively. The connection between the minimax theorem and the saddle-point theorem is given by the following results.


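Theorem 1 (Saddle-minimax theorem) Let $L: X_a \times Y^*_a \to \mathbb{R}$ be a given arbitrary function. If there exists a saddle-minimax point $(\bar x, \bar y^*) \in X_a \times Y^*_a$ such that

$$L(\bar x, \bar y^*) = \min_{x \in X_a} \max_{y^* \in Y^*_a} L(x, y^*) = \max_{y^* \in Y^*_a} \min_{x \in X_a} L(x, y^*), \qquad (3)$$

then $(\bar x, \bar y^*)$ is a saddle point of $L$ on $X_a \times Y^*_a$.

Conversely, if $L(x, y^*)$ possesses a saddle point $(\bar x, \bar y^*)$ on $X_a \times Y^*_a$, then the saddle-minimax theorem in the form

$$L(\bar x, \bar y^*) = \min_{x \in X_a} \max_{y^* \in Y^*_a} L(x, y^*) = \max_{y^* \in Y^*_a} \min_{x \in X_a} L(x, y^*) \qquad (4)$$

holds.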
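The saddle-minimax theorem can be illustrated on a grid. The sketch below assumes the illustrative saddle function $L(x, y^*) = x^2 - y^{*2} + x y^*$ on $X_a = Y^*_a = [-1, 1]$ (not taken from the text), which is convex in $x$ and concave in $y^*$ and has the saddle point $(\bar x, \bar y^*) = (0, 0)$. It compares the grid min-max with the grid max-min and checks the right saddle-point inequalities; the function and variable names are hypothetical.

```python
def lagrangian(x, ys):
    """Illustrative saddle function: convex in x, concave in y*."""
    return x * x - ys * ys + x * ys


pts = [i / 100.0 for i in range(-100, 101)]  # grid on [-1, 1]; contains 0.0

min_max = min(max(lagrangian(x, ys) for ys in pts) for x in pts)
max_min = max(min(lagrangian(x, ys) for x in pts) for ys in pts)

# right saddle-point inequalities L(0, y*) <= L(0, 0) <= L(x, 0) at (xbar, ybar*) = (0, 0)
is_saddle = all(
    lagrangian(0.0, ys) <= lagrangian(0.0, 0.0) <= lagrangian(x, 0.0)
    for x in pts
    for ys in pts
)
print(min_max, max_min, is_saddle)
```

Both grid values equal $L(0, 0) = 0$, as predicted by (3) and (4).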


Let $L: X_a \times Y^*_a \to \mathbb{R}$ be a given arbitrary right saddle function. For any fixed $x \in X_a$, let

$$P(x) = \sup_{y^* \in Y^*_a} L(x, y^*). \qquad (5)$$

Let $X_k \subset X_a$ be the domain of $P$ such that $P: X_k \to \mathbb{R}$ is finite and Gateaux differentiable. Then the inf-problem

$$(\mathcal{P}_{\inf}): \quad P(x) \to \min, \quad \forall x \in X_k, \qquad (6)$$

is called the primal problem.
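Dually, for any fixed $y^* \in Y^*_a$, let

$$P^d(y^*) = \inf_{x \in X_a} L(x, y^*) \qquad (7)$$

with domain $Y^*_k \subset Y^*_a$, on which $P^d: Y^*_k \to \mathbb{R}$ is finite and Gateaux differentiable. Thus, the sup-problem

$$(\mathcal{P}^d_{\sup}): \quad P^d(y^*) \to \max, \quad \forall y^* \in Y^*_k, \qquad (8)$$

is referred to as the dual problem. The problems $(\mathcal{P}_{\inf})$ and $(\mathcal{P}^d_{\sup})$ are realisable if $X_k$ and $Y^*_k$ are not empty, i.e. there exists a pair $(\bar x, \bar y^*) \in X_k \times Y^*_k$ such that

$$P(\bar x) = \min_{x \in X_k} P(x) = \inf_{x \in X_k} P(x), \qquad P^d(\bar y^*) = \max_{y^* \in Y^*_k} P^d(y^*) = \sup_{y^* \in Y^*_k} P^d(y^*).$$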

Theorem 2 (Saddle duality theorem) Let $L: X_a \times Y^*_a \to \mathbb{R}$ be a given arbitrary function such that $P$ and $P^d$ are well defined by (5) and (7) on the open subsets $X_k$ and $Y^*_k$, respectively. If $(\bar x, \bar y^*)$ is a saddle point of $L$ on $X_k \times Y^*_k$, $P$ is Gateaux differentiable at $\bar x$, and $P^d$ is Gateaux differentiable at $\bar y^*$, then $DP(\bar x) = 0$, $DP^d(\bar y^*) = 0$, and

$$P(\bar x) = L(\bar x, \bar y^*) = P^d(\bar y^*). \qquad (9)$$


Theorem 3 (Weak duality theorem) The inequality

$$P(x) \ge P^d(y^*) \qquad (10)$$

holds for all $(x, y^*) \in X_k \times Y^*_k$.

Theorem 4 (Strong duality theorem) $(\bar x, \bar y^*)$ is a saddle point of $L$ on $X_k \times Y^*_k \subset X_a \times Y^*_a$ if and only if the equality

$$P(\bar x) = \inf_{x \in X_k} P(x) = \sup_{y^* \in Y^*_k} P^d(y^*) = P^d(\bar y^*) \qquad (11)$$

holds.


Fenchel-Rockafellar Duality 


Very often, the primal function $P: X_k \to \mathbb{R}$ can be written as

$$P(x) = W(Ax) - F(x),$$

where $A: X \to Y$ is a linear operator, and $W: Y_a \to \mathbb{R}$ and $F: X_a \to \mathbb{R}$ are Gateaux differentiable real-valued functions. The feasible set $X_k \subset X$ is then defined by

$$X_k = \{x \in X_a : Ax \in Y_a\}.$$

Clearly, $P: X_k \to \mathbb{R}$ is convex if $W$ is convex on $Y_a$ and $F$ is concave on $X_a$.

The conjugate function $W^*: Y^* \to \mathbb{R}$ of $W(y)$ is defined by the Fenchel transformation, i.e.

$$W^*(y^*) = \sup_{y \in Y} \{\langle y; y^* \rangle - W(y)\}, \qquad (12)$$

which is always l.s.c. and convex on $Y^*$. The following Fenchel-Young inequality

$$W(y) \ge \langle y; y^* \rangle - W^*(y^*) \qquad (13)$$

holds on $Y_a \times Y^*_a$. If $W$ is strictly convex and Gateaux differentiable on $Y_a \subset Y$, then the following Legendre duality relations

$$y^* = DW(y) \;\Leftrightarrow\; y = DW^*(y^*) \;\Leftrightarrow\; \langle y; y^* \rangle = W(y) + W^*(y^*)$$

hold on $Y_a \times Y^*_a$. In this case, we have $W(y) = W^{**}(y)$, the biconjugate of $W$, and the Fenchel transformation (12) is equivalent to the classical Legendre transformation

$$W^*(y^*) = \langle y(y^*); y^* \rangle - W(y(y^*)).$$
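The Fenchel transformation (12), the Fenchel-Young inequality (13) and the Legendre equality can be checked numerically for a simple strictly convex function. The sketch below assumes the illustrative quadratic $W(y) = \frac12 K y^2$ with $K = 2$ on $Y = \mathbb{R}$, whose conjugate is $W^*(y^*) = y^{*2}/(2K)$; the supremum in (12) is replaced by a maximum over a finite grid, so this is an approximation, and all names are hypothetical.

```python
K = 2.0  # curvature of the illustrative quadratic W(y) = K y^2 / 2 (assumption)


def W(y):
    return 0.5 * K * y * y


def W_star_exact(ys):
    """Legendre conjugate of the quadratic: W*(y*) = y*^2 / (2K)."""
    return ys * ys / (2.0 * K)


def W_star_numeric(ys, grid):
    """Discrete Fenchel transformation (12): max over grid of <y; y*> - W(y)."""
    return max(y * ys - W(y) for y in grid)


GRID = [i / 1000.0 for i in range(-5000, 5001)]  # y in [-5, 5]


def fenchel_young_gap(y, ys):
    """W(y) + W*(y*) - <y; y*>, nonnegative by (13); zero iff y* = DW(y) = K y."""
    return W(y) + W_star_exact(ys) - y * ys


errors = [abs(W_star_numeric(ys, GRID) - W_star_exact(ys))
          for ys in (-3.0, -0.5, 0.0, 1.0, 4.0)]
gaps = [fenchel_young_gap(y, ys) for y in (-1.0, 0.3, 2.0) for ys in (-2.0, 0.0, 1.5)]
legendre_gap = fenchel_young_gap(0.75, K * 0.75)  # Legendre pair: y* = K y
print(max(errors), min(gaps), legendre_gap)
```

The gap vanishes exactly on Legendre pairs $y^* = DW(y)$ and is positive otherwise, which is the content of the Legendre duality relations above.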


The Lagrangian form associated with $(\mathcal{P}_{\inf})$ is defined by

$$L(x, y^*) = \langle Ax; y^* \rangle - W^*(y^*) - F(x), \qquad (14)$$

which is Gateaux differentiable on $X_a \times Y^*_a$. The critical condition $DL(\bar x, \bar y^*) = 0$ leads to the Lagrange equations

$$A\bar x = DW^*(\bar y^*), \qquad A^* \bar y^* = DF(\bar x), \qquad (15)$$

where $A^*: Y^* \to X^*$ is the adjoint operator of $A$.


Duality Theory: Monoduality in Convex Optimization

Clearly, $L: X_a \times Y^*_a \to \mathbb{R}$ is a right saddle function if $F(x)$ is concave on $X_a$. For a convex function $W(y)$, we have

$$P(x) = \sup_{y^* \in Y^*_a} L(x, y^*), \quad \forall x \in X_k.$$
The Fenchel conjugate function of a concave function $F: X_a \to \mathbb{R}$ is defined by

$$F^*(x^*) = \inf_{x \in X_a} \{\langle x, x^* \rangle - F(x)\}. \qquad (16)$$

Thus, for any given dual admissible $y^* \in Y^*_k$ with

$$Y^*_k = \{y^* \in Y^*_a : A^* y^* \in X^*_a\},$$

the Fenchel-Rockafellar dual function $P^d: Y^*_k \to \mathbb{R}$ can be obtained as

$$P^d(y^*) = \inf_{x \in X_a} L(x, y^*) = F^*(A^* y^*) - W^*(y^*).$$

If $P$ is Gateaux differentiable on $X_k$, the critical condition $DP(\bar x) = 0$ leads to the Euler-Lagrange equation of the primal problem $(\mathcal{P}_{\inf})$:

$$A^* DW(A\bar x) - DF(\bar x) = 0. \qquad (17)$$

Similarly, the critical condition $DP^d(\bar y^*) = 0$ gives the dual Euler-Lagrange equation

$$A\, DF^*(A^* \bar y^*) - DW^*(\bar y^*) = 0. \qquad (18)$$


Clearly, the critical point theorem (9) holds if the Lagrange equations (15), the Euler-Lagrange equation (17) and its dual equation (18) are equivalent to each other.

For any given $F$ and $W$, the weak duality theorem (10) always holds on $X_k \times Y^*_k$. The difference $\inf P - \sup P^d$ is the so-called duality gap. For a convex primal problem, the duality gap is zero under suitable regularity conditions (constraint qualifications), and the strong Lagrange duality theorem (11) holds; this result is also referred to as the Fenchel-Rockafellar duality theory.
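A one-dimensional instance makes the zero duality gap concrete. Assume (for illustration only) $X = Y = \mathbb{R}$, $A = a$ (a scalar), $W(y) = \frac12 y^2$ and the concave function $F(x) = -\frac12 x^2 + f x$. Then $W^*(y^*) = \frac12 y^{*2}$ and, from (16), $F^*(x^*) = -\frac12 (x^* - f)^2$, so $P^d(y^*) = -\frac12 (a y^* - f)^2 - \frac12 y^{*2}$. The sketch below minimizes $P$ and maximizes $P^d$ with a ternary search; the closed-form common value is $-f^2/(2(a^2 + 1))$, attained at $\bar x = f/(a^2 + 1)$ and $\bar y^* = a f/(a^2 + 1)$, which also satisfy the Lagrange equations (15). All names are hypothetical.

```python
def ternary_search(func, lo, hi, maximize=False, iters=200):
    """Locate the extremum of a unimodal function on [lo, hi] by ternary search."""
    sign = -1.0 if maximize else 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sign * func(m1) < sign * func(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, func(x)


a, f = 2.0, 3.0  # illustrative data (assumption)


def P(x):
    """Primal P(x) = W(a x) - F(x), W(y) = y^2/2, F(x) = -x^2/2 + f x (P is convex)."""
    return 0.5 * (a * x) ** 2 - (-0.5 * x * x + f * x)


def P_dual(ys):
    """Dual P^d(y*) = F*(a y*) - W*(y*), F*(x*) = -(x* - f)^2/2, W*(y*) = y*^2/2."""
    return -0.5 * (a * ys - f) ** 2 - 0.5 * ys * ys


x_bar, p_min = ternary_search(P, -10.0, 10.0)
y_bar, d_max = ternary_search(P_dual, -10.0, 10.0, maximize=True)
print(x_bar, y_bar, p_min, d_max)  # expected about 0.6, 1.2, -0.9, -0.9
```

The primal minimum and dual maximum coincide, and $a\bar x = \bar y^* = DW^*(\bar y^*)$, in agreement with (11) and (15).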


Linear Programming and Central Path 


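Let us now demonstrate how the above scheme fits in with finite-dimensional linear programming. Let $X = X^* = \mathbb{R}^n$, $Y = Y^* = \mathbb{R}^m$, with the standard inner products $\langle x, x^* \rangle = x^T x^*$ in $\mathbb{R}^n$ and $\langle y; y^* \rangle = y^T y^*$ in $\mathbb{R}^m$. For fixed $x^* = c \in \mathbb{R}^n$ and $y = b \in \mathbb{R}^m$, the primal problem is a constrained linear optimization problem:

$$\min_{x \in \mathbb{R}^n} \langle c, x \rangle \quad \text{s.t.} \quad Ax = b, \; x \ge 0, \qquad (19)$$

where $A \in \mathbb{R}^{m \times n}$ is a matrix, and its adjoint is simply $A^* = A^T \in \mathbb{R}^{n \times m}$. To reformulate this linear constrained optimization problem in the model form $(\mathcal{P}_{\inf})$, we need to set $X_a = \{x \in \mathbb{R}^n : x \ge 0\}$, which is a convex cone in $\mathbb{R}^n$, and $Y_a = \{y \in \mathbb{R}^m : y = b\}$, which consists of the single point $b \in \mathbb{R}^m$, and let

$$F(x) = -\langle c, x \rangle, \quad \forall x \in X_a, \qquad W(y) = 0, \quad \forall y \in Y_a.$$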
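Thus on the primal feasible set

$$X_k = \{x \in \mathbb{R}^n : Ax = b, \; x \ge 0\}$$

we have $P(x) = W(Ax) - F(x) = \langle c, x \rangle$. The conjugate functions in this elementary case may be calculated at once as

$$W^*(y^*) = \sup_{y \in Y_a} \langle y; y^* \rangle = \langle b; y^* \rangle, \quad \forall y^* \in Y^*_a = \mathbb{R}^m,$$

$$F^*(x^*) = \inf_{x \in X_a} \langle x, x^* + c \rangle = 0, \quad \forall x^* \in X^*_a,$$

where $X^*_a = \{x^* \in \mathbb{R}^n : x^* + c \ge 0\}$ is the dual cone of $X_a$ shifted by $-c$. Thus, on the dual feasible space

$$Y^*_k = \{y^* \in \mathbb{R}^m : A^* y^* + c \ge 0\},$$

the problem dual to the linear programming problem (19) reads

$$\max_{y^* \in \mathbb{R}^m} P^d(y^*) = -\langle b; y^* \rangle, \quad \forall y^* \in Y^*_k. \qquad (20)$$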


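The Lagrangian $L: X_a \times Y^*_a \to \mathbb{R}$ associated with this constrained linear programming problem is

$$L(x, y^*) = \langle Ax; y^* \rangle - \langle b; y^* \rangle + \langle c, x \rangle = \langle x, A^* y^* + c \rangle - \langle b; y^* \rangle.$$

Because of the inequality constraints in $X_a$, the Lagrange multiplier $x^* = A^* y^* \in \mathbb{R}^n$ has to satisfy the following KKT optimality conditions:

$$\begin{aligned} Ax &= b, & x &\ge 0, \\ s &= c + A^* y^*, & s &\ge 0, \\ x_i s_i &= 0, & i &= 1, \dots, n, \end{aligned} \qquad (21)$$

where the vector $s \in \mathbb{R}^n$ is called the vector of dual slacks. The problem of finding $(x, y^*, s)$ satisfying (21) is also known as the mixed linear complementarity problem.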

By using the vector of dual slacks $s \in \mathbb{R}^n$, the dual problem can be rewritten as

$$\max_{y^* \in \mathbb{R}^m,\; s \in \mathbb{R}^n} -\langle b; y^* \rangle \quad \text{s.t.} \quad A^* y^* + c - s = 0, \; s \ge 0. \qquad (22)$$

We can see that the primal variable $x$ is the Lagrange multiplier for the constraint $A^* y^* + c \ge 0$ in the dual problem. Conversely, the dual variables $y^*$ and $s$ are, respectively, the Lagrange multipliers for the constraints $Ax = b$ and $x \ge 0$ in the primal problem. These choices are not accidents.


Theorem 5 The vector $x \in \mathbb{R}^n$ is a solution of (19) if and only if there exist Lagrange multipliers $(y^*, s) \in \mathbb{R}^m \times \mathbb{R}^n$ for which the KKT optimality conditions (21) hold for $(x, y^*, s)$. Dually, the vector $(y^*, s) \in \mathbb{R}^m \times \mathbb{R}^n$ is a solution of (22) if and only if there exists a Lagrange multiplier $x \in \mathbb{R}^n$ such that the KKT conditions (21) hold for $(x, y^*, s)$.


The vector $(x, y^*, s)$ is called a primal-dual solution of (19). The so-called primal-dual methods in mathematical programming find primal-dual solutions $(x, y^*, s)$ by applying variants of Newton's method to the three equations in (21) and modifying the search directions and steplengths so that the inequalities in (21) are satisfied at every iteration. If the inequalities are strictly satisfied, the methods are called primal-dual interior-point methods. In these methods, the so-called central path $\mathcal{C}$ plays a vital role. It is a parametric curve of strictly feasible points defined by

$$\mathcal{C} = \{(x_\tau, y^*_\tau, s_\tau) : \tau > 0\}, \qquad (23)$$

where each point $(x_\tau, y^*_\tau, s_\tau)$ solves the following system:

$$\begin{aligned} Ax &= b, & x &> 0, \\ A^* y^* + c &= s, & s &> 0, \\ x_i s_i &= \tau, & i &= 1, \dots, n. \end{aligned} \qquad (24)$$

This problem has a unique solution $(x_\tau, y^*_\tau, s_\tau)$ for each $\tau > 0$ if and only if the strictly feasible set

$$\mathcal{F}^0 = \{(x, y^*, s) : Ax = b, \; A^* y^* + c = s, \; x > 0, \; s > 0\}$$

is nonempty. A comprehensive study of primal-dual interior-point methods in mathematical programming has been given in [3] and [2].
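To make the path-following idea concrete, here is a minimal sketch of a basic infeasible-start primal-dual path-following method in plain Python. It is an illustration under stated assumptions, not a production solver and not the specific algorithms of [2] or [3]: it assumes $A$ has full row rank, keeps the sign convention of the text ($s = c + A^T y^*$, so the dual objective is $-\langle b; y^* \rangle$), uses a fixed centering parameter $\sigma$ so that each Newton step targets the central-path condition $x_i s_i = \sigma\mu$ of (24), solves the Newton system by dense Gaussian elimination, and applies a fraction-to-the-boundary step rule. The helper names and the small test LP are hypothetical.

```python
def solve_dense(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - acc) / aug[r][r]
    return z


def lp_path_following(A, b, c, sigma=0.1, tol=1e-9, max_iter=200):
    """Follow the central path (24) towards a solution of the KKT system (21)."""
    m, n = len(A), len(A[0])
    x, ys, s = [1.0] * n, [0.0] * m, [1.0] * n  # strictly positive start
    for _ in range(max_iter):
        rp = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        rd = [s[j] - c[j] - sum(A[i][j] * ys[i] for i in range(m)) for j in range(n)]
        mu = sum(x[j] * s[j] for j in range(n)) / n  # duality measure
        if mu < tol and max(abs(v) for v in rp + rd) < tol:
            break
        # Newton step on Ax = b, A^T y* + c - s = 0, x_i s_i = sigma * mu
        size = 2 * n + m
        M = [[0.0] * size for _ in range(size)]
        for i in range(m):
            for j in range(n):
                M[i][j] = A[i][j]          # A dx
                M[m + j][n + i] = A[i][j]  # A^T dy*
        for j in range(n):
            M[m + j][n + m + j] = -1.0      # -ds
            M[m + n + j][j] = s[j]          # S dx
            M[m + n + j][n + m + j] = x[j]  # X ds
        rhs = rp + rd + [sigma * mu - x[j] * s[j] for j in range(n)]
        d = solve_dense(M, rhs)
        dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
        alpha = 1.0  # fraction-to-the-boundary rule keeps x > 0 and s > 0
        for v, dv in zip(x + s, dx + ds):
            if dv < 0:
                alpha = min(alpha, -0.99 * v / dv)
        x = [x[j] + alpha * dx[j] for j in range(n)]
        ys = [ys[i] + alpha * dy[i] for i in range(m)]
        s = [s[j] + alpha * ds[j] for j in range(n)]
    return x, ys, s


# min -x1 - 2 x2  s.t.  x1 + x2 + x3 = 1, x >= 0  (illustrative data)
A, b, c = [[1.0, 1.0, 1.0]], [1.0], [-1.0, -2.0, 0.0]
x, ys, s = lp_path_following(A, b, c)
print([round(v, 6) for v in x], [round(v, 6) for v in ys], [round(v, 6) for v in s])
```

For this small LP the iterates should approach $x = (0, 1, 0)$, $y^* = 2$ and $s = (1, 0, 2)$, so that $\langle c, x \rangle = -\langle b; y^* \rangle = -2$ and the complementarity products $x_i s_i$ vanish, as required by (21).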
Consider the following general nonconvex extremum problem $(\mathcal{P})$:

$$P(x) = \Phi(x, \Lambda(x)) \to \text{extremum}, \quad \forall x \in X, \qquad (1)$$

where $X$ is a locally convex topological vector space (l.c.s.), $P: X \to \bar{\mathbb{R}} := \mathbb{R} \cup \{-\infty\} \cup \{+\infty\}$ is a nonconvex and nonsmooth extended function whose effective domain

$$X_k = \operatorname{dom} P = \{x \in X : |P(x)| < +\infty\}$$

is a nonempty convex subset of $X$; the operator $\Lambda: X \to Y$ is a continuous, generally nonlinear, mapping from $X$ to another l.c.s. $Y$, and $\Phi: X \times Y \to \bar{\mathbb{R}}$ is an associated extended function. Since the cost function $P(x)$ is usually nonconvex, the problem $(\mathcal{P})$ may possess many local extremum (either minimum or maximum) solutions. The goal of global optimization is to find all the local extrema of $P(x)$ over the feasible set $X_k$. Generally speaking, nonconvex, nonsmooth global optimization problems are very difficult to solve by traditional direct approaches and algorithms. The classical saddle Lagrange duality methods, as well as the well-known Fenchel-Rockafellar duality theory, can be used mainly for solving convex problems. For nonconvex problems, there is in general a so-called duality gap between the primal and the classical dual problems.

The canonical dual transformation method and the associated triduality theory were proposed originally in finite deformation theory [1]. The key idea of this method is to choose a suitable nonlinear operator $\Lambda: X \to Y$ such that $\Phi(x, y)$ is either convex or concave in each of its variables. This method can be used to solve many nonconvex, nonsmooth global optimization problems.


Canonical Dual Transformation 


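Let $(X, X^*)$ be a pair of real linear spaces, placed in duality by a bilinear form $\langle \cdot, \cdot \rangle: X \times X^* \to \mathbb{R}$. For a given extended real-valued function $P: X \to \bar{\mathbb{R}}$, the subdifferential of $P$ at $\bar x \in X$ is a convex subset $\partial^- P(\bar x) \subset X^*$ such that for each $x^* \in \partial^- P(\bar x)$, we have

$$\langle x^*, x - \bar x \rangle \le P(x) - P(\bar x), \quad \forall x \in X.$$

Dually, the superdifferential of $P$ at $\bar x \in X$ is a convex subset $\partial^+ P(\bar x) \subset X^*$ such that for each $x^* \in \partial^+ P(\bar x)$, we have

$$\langle x^*, x - \bar x \rangle \ge P(x) - P(\bar x), \quad \forall x \in X.$$

Clearly, we always have $\partial^+ P = -\partial^-(-P)$. In convex analysis, it is conventional to write $\partial^-$ simply as $\partial$. In nonconvex analysis, $\partial$ stands for either $\partial^-$ or $\partial^+$, i.e.

$$\partial = \{\partial^-, \partial^+\}.$$

If $P$ is convex (respectively, concave) and Gateaux differentiable at $\bar x \in X_a \subset X$, then

$$\partial^- P(\bar x) = \{DP(\bar x)\} \quad (\text{respectively, } \partial^+ P(\bar x) = \{DP(\bar x)\}),$$

so that in either case $\partial P(\bar x) = \{DP(\bar x)\}$, where $DP: X_a \to X^*$ denotes the Gateaux derivative of $P$ at $\bar x$.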


Definition 1 The set of functions $P: X \to \bar{\mathbb{R}}$ which are either convex or concave is denoted by $\Gamma(X)$. In particular, let $\check\Gamma(X)$ denote the subset of functions $P \in \Gamma(X)$ which are convex and $\hat\Gamma(X)$ the subset of $P \in \Gamma(X)$ which are concave.

The canonical function space $\Gamma_G(X_a)$ is the subset of functions $P \in \Gamma(X_a)$ which are Gateaux differentiable on $X_a \subset X$ and for which the duality mapping $DP: X_a \to X^*_a \subset X^*$ is invertible.

The extended canonical function space $\Gamma_0(X)$ is the subset of functions $P \in \Gamma(X)$ which are either convex and lower semicontinuous or concave and upper semicontinuous, and such that if $P$ takes the values $\pm\infty$, then $P$ is identically equal to $\pm\infty$.


By the Legendre-Fenchel transformation, the supconjugate function of an extended function $P: X \to \bar{\mathbb{R}}$ is defined by

$$P^\sharp(x^*) = \sup_{x \in X} \{\langle x, x^* \rangle - P(x)\}.$$

By the theory of convex analysis, $P^\sharp: X^* \to \bar{\mathbb{R}}$ is always convex and lower semicontinuous, i.e. $P^\sharp \in \Gamma_0(X^*)$. Dually, the subconjugate function of $P$, defined by

$$P^\flat(x^*) = \inf_{x \in X} \{\langle x, x^* \rangle - P(x)\},$$

is always concave and upper semicontinuous, i.e. $P^\flat \in \Gamma_0(X^*)$, and $P^\flat(x^*) = -(-P)^\sharp(-x^*)$. Both the super- and subconjugates are called Fenchel conjugate functions, and we write $P^* = \{P^\sharp, P^\flat\}$. Thus the extended Fenchel transformation can be written as

$$P^*(x^*) = \operatorname{ext}\{\langle x, x^* \rangle - P(x) : \forall x \in X\}, \qquad (2)$$

where ext stands for extremum. Clearly, if $P \in \Gamma_0(X)$, we have the Fenchel equivalence relations, namely,

$$x^* \in \partial P(x) \;\Leftrightarrow\; x \in \partial P^*(x^*) \;\Leftrightarrow\; P(x) + P^*(x^*) = \langle x, x^* \rangle. \qquad (3)$$

The pair $(x, x^*)$ is called a Fenchel duality pair on $X \times X^*$ if and only if equation (3) holds on $X \times X^*$.

The conjugate pair $(x, x^*)$ is said to be a Legendre duality pair on $X_a \times X^*_a \subset X \times X^*$ if and only if the equivalence relations

$$x^* = DP(x) \;\Leftrightarrow\; x = DP^*(x^*) \;\Leftrightarrow\; P(x) + P^*(x^*) = \langle x, x^* \rangle \qquad (4)$$

hold on $X_a \times X^*_a$.
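The relation between sub- and supconjugates can be checked on a grid. The sketch below takes the illustrative concave function $P(x) = -x^2 + x$ (an assumption, not from the text), for which the subconjugate has the closed form $P^\flat(x^*) = \inf_x \{x x^* + x^2 - x\} = -(x^* - 1)^2/4$, and compares it with $-(-P)^\sharp(-x^*)$. The exact sup and inf are replaced by a maximum and minimum over a finite grid, and all names are hypothetical.

```python
GRID = [i / 1000.0 for i in range(-10000, 10001)]  # x in [-10, 10]


def sup_conjugate(P, xs):
    """P#(x*) = sup_x {<x, x*> - P(x)}, evaluated over GRID."""
    return max(x * xs - P(x) for x in GRID)


def sub_conjugate(P, xs):
    """Pb(x*) = inf_x {<x, x*> - P(x)}, evaluated over GRID."""
    return min(x * xs - P(x) for x in GRID)


def concave_P(x):
    """Illustrative concave function P(x) = -x^2 + x (assumption)."""
    return -x * x + x


def neg_P(x):
    return -concave_P(x)


rows = []
for xs in (-2.0, 0.0, 1.0, 3.0):
    flat = sub_conjugate(concave_P, xs)
    neg_sharp = -sup_conjugate(neg_P, -xs)    # -(-P)#(-x*)
    closed_form = -((xs - 1.0) ** 2) / 4.0    # inf_x {x x* + x^2 - x}
    rows.append((xs, flat, neg_sharp, closed_form))
    print(xs, flat, neg_sharp, closed_form)
```

The concave subconjugate thus needs no separate machinery: it is obtained from the convex supconjugate of $-P$ by two sign changes.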
Let $(Y, Y^*)$ be another pair of locally convex topological real linear spaces paired in separating duality by a second bilinear form $\langle \cdot\,; \cdot \rangle: Y \times Y^* \to \mathbb{R}$. The so-called geometrical operator $\Lambda: X \to Y$ is a continuous, Gateaux differentiable operator such that for any given $x \in X_a \subset X$, there exists a $y \in Y_a \subset Y$ satisfying the geometrical equation

$$y = \Lambda(x).$$


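The directional derivative of $y$ at $\bar x$ in the direction $x \in X$ is then defined by

$$\delta y(\bar x; x) := \lim_{\theta \to 0^+} \frac{y(\bar x + \theta x) - y(\bar x)}{\theta} = \Lambda_t(\bar x) x,$$

where $\Lambda_t(\bar x) = D\Lambda(\bar x)$ denotes the Gateaux derivative of the operator $\Lambda$ at $\bar x$. For a given $y^* \in Y^*$, $G_{y^*}(x) = \langle \Lambda(x); y^* \rangle$ is a real-valued function of $x$ on $X$. Its Gateaux derivative at $\bar x \in X_a$ in the direction $x \in X$ is

$$\delta G_{y^*}(\bar x; x) = \langle \Lambda_t(\bar x) x; y^* \rangle = \langle x, \Lambda_t^*(\bar x) y^* \rangle,$$

where $\Lambda_t^*(\bar x): Y^* \to X^*$ is the adjoint operator of $\Lambda_t$ associated with the two bilinear forms.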

Let $\Phi: X \times Y \to \bar{\mathbb{R}}$ be an extended function such that $P(x) = \Phi(x, \Lambda(x))$. If $\Phi: X \times Y \to \bar{\mathbb{R}}$ is an extended canonical function, i.e. $\Phi \in \Gamma_0(X) \times \Gamma_0(Y)$, the duality relations between the paired spaces $(X, X^*)$ and $(Y, Y^*)$ can be written as

$$-x^* \in \partial_x \Phi(x, y), \qquad y^* \in \partial_y \Phi(x, y). \qquad (5)$$

On the product space $X_a \times Y_a \subset X \times Y$, if the canonical function $\Phi(x, y)$ is finite and Gateaux differentiable such that the feasible space $X_k$ can be written as

$$X_k = \{x \in X_a : \Lambda(x) \in Y_a\}, \qquad (6)$$

then on $X_k$, the critical condition $\delta P(\bar x; x) = \langle x, DP(\bar x) \rangle = 0$, $\forall x \in X_k$, leads to the Euler equation

$$D_x \Phi(\bar x, \Lambda(\bar x)) + \Lambda_t^*(\bar x)\, D_y \Phi(\bar x, \Lambda(\bar x)) = 0, \qquad (7)$$

where $D_x \Phi$ and $D_y \Phi$ denote the partial Gateaux derivatives of $\Phi$ with respect to $x$ and $y$, respectively.
Since $\Phi \in \Gamma_G(X_a) \times \Gamma_G(Y_a)$ is a canonical function, the Gateaux derivative $D\Phi: X_a \times Y_a \to X^*_a \times Y^*_a \subset X^* \times Y^*$ is a monotone mapping, and there exists a pair $(\bar x^*, \bar y^*) \in X^*_a \times Y^*_a$ such that

$$-\bar x^* = D_x \Phi(\bar x, \Lambda(\bar x)), \qquad \bar y^* = D_y \Phi(\bar x, \Lambda(\bar x)).$$

Thus, in terms of the canonical dual variables $x^*$ and $y^*$, the Euler equation (7) can be written as the so-called balance (or equilibrium) equation

$$\Lambda_t^*(\bar x)\, \bar y^* = \bar x^*, \qquad (8)$$

which depends linearly on the dual variable $y^*$.


Definition 2 Suppose that for a given problem $(\mathcal{P})$, the geometrical operator $\Lambda: X \to Y$ can be chosen in such a way that $P(x) = \Phi(x, \Lambda(x))$, $\Phi \in \Gamma_G(X_a) \times \Gamma_G(Y_a)$ and $X_k = \{x \in X_a : \Lambda(x) \in Y_a\}$. Then

1) the transformation $\{P; X_k\} \to \{\Phi; X_a \times Y_a\}$ is called the canonical transformation, and $\Phi: X_a \times Y_a \to \mathbb{R}$ is called the canonical function associated with $\Lambda$;

2) the problem $(\mathcal{P})$ is called geometrically nonlinear (respectively, geometrically linear) if $\Lambda: X \to Y$ is nonlinear (respectively, linear); it is called physically nonlinear (respectively, physically linear) if the duality mapping $D\Phi: X_a \times Y_a \to X^*_a \times Y^*_a$ is nonlinear (respectively, linear); it is called fully nonlinear if it is both geometrically and physically nonlinear.


The canonical transformation plays a fundamental role in the duality theory of global optimization. By this definition, the governing equation (7) for fully nonlinear problems can be written in the tricanonical forms, namely,

1) geometrical equation: $y = \Lambda(x)$;
2) physical relations: $(-x^*, y^*) \in \partial \Phi(x, y)$;
3) balance equation: $x^* = \Lambda_t^*(x)\, y^*$.

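Since $\Lambda: X \to Y$ is Gateaux differentiable, for any given $x \in X$ we have the operator decomposition

$$\Lambda(x) = \Lambda_t(x) x + \Lambda_c(x), \qquad (9)$$

where $\Lambda_c = \Lambda - \Lambda_t$ is the complementary operator of $\Lambda_t$. By this operator decomposition, the relation between the two bilinear forms reads

$$\langle \Lambda(x); y^* \rangle = \langle x, \Lambda_t^*(x) y^* \rangle - G(x, y^*),$$

where $G(x, y^*) = \langle -\Lambda_c(x); y^* \rangle$ is the so-called complementary gap function, introduced in [2]. This gap function plays an important role in canonical dual transformation methods. A framework for the fully nonlinear system is

$$\begin{array}{ccc} x \in X & \overset{\langle x, x^* \rangle}{\longleftrightarrow} & X^* \ni x^* \\ \Lambda = \Lambda_t + \Lambda_c \;\big\downarrow & & \big\uparrow\; \Lambda_t^* = (\Lambda - \Lambda_c)^* \\ y \in Y & \overset{\langle y; y^* \rangle}{\longleftrightarrow} & Y^* \ni y^* \end{array}$$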
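The decomposition identity can be verified directly for a concrete operator. Assume, for illustration, the quadratic geometrical operator $\Lambda(x) = \frac12 x^T C x - \mu$ on $X = \mathbb{R}^n$ with symmetric $C$, mapping into $Y = \mathbb{R}$. Then $\Lambda_t(x) h = x^T C h$, so $\Lambda_t^*(x) y^* = y^* C x$, $\Lambda_c(x) = \Lambda(x) - \Lambda_t(x) x = -\frac12 x^T C x - \mu$ and $G(x, y^*) = y^*(\frac12 x^T C x + \mu)$. The sketch checks $\langle \Lambda(x); y^* \rangle = \langle x, \Lambda_t^*(x) y^* \rangle - G(x, y^*)$ at one point; the data and function names are hypothetical.

```python
def quad(C, x, z):
    """Bilinear form x^T C z."""
    return sum(x[i] * C[i][j] * z[j] for i in range(len(x)) for j in range(len(z)))


def Lam(C, mu, x):
    """Geometrical operator Lambda(x) = x^T C x / 2 - mu (illustrative)."""
    return 0.5 * quad(C, x, x) - mu


def Lam_t_adj(C, x, ys):
    """Lambda_t^*(x) y* = y* C x for symmetric C."""
    return [ys * sum(C[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]


def Lam_c(C, mu, x):
    """Complementary operator Lambda_c(x) = Lambda(x) - Lambda_t(x) x."""
    return Lam(C, mu, x) - quad(C, x, x)


def gap(C, mu, x, ys):
    """Complementary gap function G(x, y*) = <-Lambda_c(x); y*>."""
    return -Lam_c(C, mu, x) * ys


C = [[2.0, 0.5], [0.5, 1.0]]
mu, x, ys = 1.5, [0.3, -1.2], 0.7
lhs = Lam(C, mu, x) * ys
rhs = sum(xi * gi for xi, gi in zip(x, Lam_t_adj(C, x, ys))) - gap(C, mu, x, ys)
print(lhs, rhs)
```

For this operator the gap function $G(x, y^*) = y^*(\frac12 x^T C x + \mu)$ is positive whenever $y^* > 0$ and $C$ is positive definite.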
Extensive illustrations of the canonical transformation and the tricanonical forms in mathematical physics and variational analysis can be found in [1].

Very often, the extended canonical function $\Phi$ can be written in the form

$$\Phi(x, y) = W(y) - F(x),$$

where $F \in \Gamma_0(X)$ and $W \in \Gamma_0(Y)$ are extended canonical functions. The duality relations (5) in this special case take the forms

$$x^* \in \partial F(x), \qquad y^* \in \partial W(y).$$

If $F \in \Gamma_G(X_a)$ and $W \in \Gamma_G(Y_a)$ are Gateaux differentiable, the Euler equation (7) reads

$$\Lambda_t^*(\bar x)\, DW(\Lambda(\bar x)) - DF(\bar x) = 0.$$

If $\Lambda: X \to Y$ is linear, and $W: Y \to \mathbb{R}$ is quadratic such that $DW = Cy$, where $C: Y \to Y^*$ is a linear operator, then the governing equations for linear systems can be written as

$$\Lambda^* C \Lambda x = x^*.$$

For conservative systems, the operator $\Lambda^* C \Lambda$ is usually symmetric. In static systems, $C$ is usually positive definite and the associated total potential $P$ is convex. However, in dynamical systems, $C$ is indefinite and $P$ is called the total action, which is usually a d.c. function in convex Hamilton systems.


Triality Theory 


We assume that for any given nonconvex extended function $P: X \to \bar{\mathbb{R}}$, there exist a general nonlinear operator $\Lambda: X \to Y$ and a canonical function $W \in \Gamma_0(Y)$ such that the canonical transformation can be written as

$$P(x) = W(\Lambda(x)) - \langle x, c \rangle, \qquad (10)$$

where $c \in X^*$ is a given source variable. Since $F(x) = \langle x, c \rangle$ is a linear function, the Hamiltonian $H(x, y^*) = W^*(y^*) + \langle x, c \rangle$ is a canonical function on $Z = X \times Y^*$, and the extended Lagrangian reads

$$L(x, y^*) = \langle \Lambda(x); y^* \rangle - W^*(y^*) - \langle x, c \rangle. \qquad (11)$$

For a fixed $y^* \in Y^*$, the convexity of $L(\cdot, y^*): X \to \bar{\mathbb{R}}$ depends on both $\Lambda(x)$ and $y^*$.


Let $Z_a = X_a \times Y^*_a \subset Z$ be the effective domain of $L$, and let $\mathcal{L}_c \subset Z_a$ be the critical point set of $L$, i.e.

$$\mathcal{L}_c = \{(\bar x, \bar y^*) \in X_a \times Y^*_a : DL(\bar x, \bar y^*) = 0\}.$$

For any given critical point $(\bar x, \bar y^*) \in \mathcal{L}_c$, we let $X_r \times Y^*_r$ be a neighborhood of it such that on $X_r \times Y^*_r$, the pair $(\bar x, \bar y^*)$ is the only critical point of $L$. The following result is of fundamental importance in global optimization.


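Theorem 3 (Triality theorem) Suppose that $W \in \check\Gamma(Y_a)$ is convex, $(\bar x, \bar y^*) \in \mathcal{L}_c$ is a critical point of $L$ and $X_r \times Y^*_r$ is a neighborhood of $(\bar x, \bar y^*)$.

If $\langle \Lambda(x); \bar y^* \rangle$ is convex on $X_r$, then

$$L(\bar x, \bar y^*) = \min_{x \in X_r} \max_{y^* \in Y^*_r} L(x, y^*) = \max_{y^* \in Y^*_r} \min_{x \in X_r} L(x, y^*). \qquad (12)$$

However, if $\langle \Lambda(x); \bar y^* \rangle$ is concave on $X_r$, then either

$$L(\bar x, \bar y^*) = \min_{x \in X_r} \max_{y^* \in Y^*_r} L(x, y^*) = \min_{y^* \in Y^*_r} \max_{x \in X_r} L(x, y^*), \qquad (13)$$

or

$$L(\bar x, \bar y^*) = \max_{x \in X_r} \max_{y^* \in Y^*_r} L(x, y^*) = \max_{y^* \in Y^*_r} \max_{x \in X_r} L(x, y^*). \qquad (14)$$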


Since $W \in \Gamma_G(Y_a)$ is a canonical function, we always have

$$P(x) = \operatorname{ext}\{L(x, y^*) : y^* \in Y^*_a\}, \quad \forall x \in X_k. \qquad (15)$$


On the other hand, for a given Gateaux differentiable geometrical mapping $\Lambda: X_a \to Y_a$, the criticality condition $D_x L(\bar x, \bar y^*) = 0$ leads to the equilibrium equation

$$\Lambda_t^*(\bar x)\, \bar y^* = c. \qquad (16)$$

If there exists a subspace $Y^*_k \subset Y^*_a$ such that for any $y^* \in Y^*_k$ and a given source variable $c \in X^*$, equation (16) can be solved for $\bar x = \bar x(y^*)$, then by the operator decomposition (9), the dual function $P^d: Y^*_k \to \mathbb{R}$ can be written explicitly in the form

$$P^d(y^*) = \operatorname{sta}\{L(x, y^*) : x \in X_a\} = -G^d(y^*) - W^*(y^*), \quad \forall y^* \in Y^*_k,$$
where $G^d: Y^*_k \to \mathbb{R}$ is the so-called pure complementary gap function, defined by

$$G^d(y^*) = G(\bar x(y^*), y^*) = -\langle \Lambda_c(\bar x(y^*)); y^* \rangle.$$

For any given critical point $(\bar x, \bar y^*) \in \mathcal{L}_c$, we have $G^d(\bar y^*) = \langle \bar x, c \rangle - \langle \Lambda(\bar x); \bar y^* \rangle$. Thus, the Legendre duality relations between the canonical functions $W$ and $W^*$ lead to

$$P(\bar x) - P^d(\bar y^*) = 0, \quad \forall (\bar x, \bar y^*) \in \mathcal{L}_c. \qquad (17)$$

This identity shows that there is no duality gap between the nonconvex function $P$ and its canonical dual function $P^d$. In fact, the duality gap that exists in classical duality theories is now recovered by the complementary gap function $G(x, y^*)$.


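Theorem 4 (Triduality theorem) Suppose that $W \in \check\Gamma(Y_a)$ is convex, $(\bar x, \bar y^*) \in \mathcal{L}_c$ is a critical point of $L$ and $X_r \times Y^*_r$ is a neighborhood of $(\bar x, \bar y^*)$. If $\langle \Lambda(x); \bar y^* \rangle$ is convex on $X_r$, then

$$P(\bar x) = \min_{x \in X_r} P(x) \;\Leftrightarrow\; P^d(\bar y^*) = \max_{y^* \in Y^*_r} P^d(y^*).$$

However, if $\langle \Lambda(x); \bar y^* \rangle$ is concave on $X_r$, then

$$P(\bar x) = \min_{x \in X_r} P(x) \;\Leftrightarrow\; P^d(\bar y^*) = \min_{y^* \in Y^*_r} P^d(y^*);$$

$$P(\bar x) = \max_{x \in X_r} P(x) \;\Leftrightarrow\; P^d(\bar y^*) = \max_{y^* \in Y^*_r} P^d(y^*).$$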
Example 5 We now illustrate the application of the interesting triduality theory to the following nonconvex optimization problem in $X = \mathbb{R}^n$:

$$P(x) = \frac{\alpha}{2}\Big(\frac12 |Ax|^2 - \mu\Big)^2 - x^T c \to \text{sta}, \quad \forall x \in \mathbb{R}^n,$$

where $\alpha, \mu > 0$ are given parameters, $c \in \mathbb{R}^n$ is a given vector, and $A: \mathbb{R}^n \to \mathbb{R}^m$ is a matrix. The Euler equation associated with this nonconvex stationary problem is a nonlinear algebraic equation in $\mathbb{R}^n$:

$$\alpha \Big(\frac12 |Ax|^2 - \mu\Big) C x = c,$$

where $C = A^T A = C^T \in \mathbb{R}^{n \times n}$. We are interested in finding all the critical points of $P$. To set this nonconvex problem in our framework, we let $X = \mathbb{R}^n = X^*$, and let $\Lambda: \mathbb{R}^n \to Y = \mathbb{R}$ be the quadratic operator

$$y = \Lambda(x) = \frac12 |Ax|^2 - \mu = \frac12 x^T C x - \mu.$$


Since $F(x) = \langle x, c \rangle = x^T c$ is a linear function on $\mathbb{R}^n$, the admissible space is $X_a = X = \mathbb{R}^n$. Since $x^* = DF(x) = c$, the range of the canonical mapping $DF: X \to X^* = \mathbb{R}^n$ reduces to the single point $c$, i.e.

$$X^*_a = \{x^* \in \mathbb{R}^n : x^* = c\}.$$

The feasible set for the primal problem is $X_k = \{x \in X_a : \Lambda(x) \in Y_a\} = \mathbb{R}^n$.

Since $x^T C x \ge 0$, $\forall x \in X_a = X = \mathbb{R}^n$, the range of the geometrical mapping $\Lambda: X_a \to \mathbb{R}$ is a closed convex set in $\mathbb{R}$:

$$Y_a = \{y \in \mathbb{R} : y \ge -\mu\} \subset Y = \mathbb{R}.$$

On the admissible subset $Y_a \subset Y = \mathbb{R}$, the canonical function $W(y) = \frac12 \alpha y^2$ is quadratic. The range of the constitutive mapping $DW: Y_a \to Y^* = \mathbb{R}$ is also a closed convex set in $\mathbb{R}$,

$$Y^*_a = \{y^* \in \mathbb{R} : y^* \ge -\alpha\mu\}.$$

On $Y^*_a$, the Legendre conjugate of $W$ is also strictly convex,

$$W^*(y^*) = \frac{1}{2\alpha} y^{*2}, \qquad (18)$$

and the Legendre duality relations hold on $Y_a \times Y^*_a$.
On X, x Y* = R” x R, the extended Lagrangian in 
this case reads 


TLy#? = x". 


1 1 
L(x, y*) = —y*x"Cx — py* — <a 
2 2 
It is easy to check that the dual function associated with $L$ is

$$P^d(y^*) = -\frac{1}{2y^*} c^T C^{+} c - \mu y^* - \frac{1}{2\alpha} y^{*2}.$$


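The dual Euler-Lagrange equation is an algebraic equation in $\mathbb{R}$:

$$\big(\mu + \alpha^{-1} y^*\big) y^{*2} = \frac{1}{2}\sigma^2, \qquad (19)$$

where $\sigma^2 = c^T C^{+} c$ is a constant. Since $C \in \mathbb{R}^{n \times n}$ is positive definite, $\sigma^2 > 0$ for $c \ne 0$, which forces $\mu + \alpha^{-1} y^* > 0$; hence this equation holds only on $Y_a^*$.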
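For completeness, the dual function and (19) follow by eliminating $x$ from the Lagrangian; the short derivation below assumes $C$ is invertible (so $C^{+} = C^{-1}$ and $C^{+} C C^{+} = C^{+}$) and $y^* \neq 0$:

$$\frac{\partial L}{\partial x} = y^* C x - c = 0 \;\Longrightarrow\; x = \frac{1}{y^*} C^{+} c,$$
$$P^d(y^*) = L\Big(\tfrac{1}{y^*} C^{+} c,\; y^*\Big) = \frac{\sigma^2}{2y^*} - \mu y^* - \frac{y^{*2}}{2\alpha} - \frac{\sigma^2}{y^*} = -\frac{\sigma^2}{2y^*} - \mu y^* - \frac{y^{*2}}{2\alpha},$$
$$\frac{dP^d}{dy^*} = \frac{\sigma^2}{2y^{*2}} - \mu - \frac{y^*}{\alpha} = 0 \;\Longleftrightarrow\; \Big(\mu + \frac{y^*}{\alpha}\Big) y^{*2} = \frac{\sigma^2}{2}.$$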

In algebraic geometry, the dual Euler-Lagrange equation (19) is the so-called singular algebraic curve in $(y^*, \sigma)$-space (see Fig. 1). For a given parameter $\mu$ and $c \in \mathbb{R}^n$, this dual equation has at most three real roots $y_k^* \in Y_a^*$, $k = 1, 2, 3$, which lead to the primal solutions

$$x_k = (y_k^*)^{-1} C^{+} c, \quad k = 1, 2, 3,$$

where $C^{+}$ stands for the generalized inverse of $C$. We know that each $(x_k, y_k^*)$ is a critical point of $L$ and

$$P(x_k) = L(x_k, y_k^*) = P^d(y_k^*), \quad k = 1, 2, 3.$$

Duality Theory: Triduality in Global Optimization, Figure 1
Singular algebraic curve
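The absence of a duality gap at each critical point can be checked numerically. The NumPy sketch below uses a small hypothetical instance (the matrix `A`, the vector `c`, and the values α = 1, μ = 2 are illustrative assumptions, not from the source), solves the cubic form of (19) with `numpy.roots`, recovers $x_k = C^{+}c / y_k^*$, and compares $P(x_k)$ with $P^d(y_k^*)$.

```python
import numpy as np

# Hypothetical instance (not from the source): A is 3 x 2, so C = A^T A is invertible.
A = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
c = np.array([1.0, 0.5])
alpha, mu = 1.0, 2.0

C = A.T @ A
C_pinv = np.linalg.pinv(C)            # generalized inverse C^+
sigma2 = c @ C_pinv @ c                # sigma^2 = c^T C^+ c

def P(x):
    """Primal objective P(x) = (alpha/2)(||Ax||^2/2 - mu)^2 - x^T c."""
    Ax = A @ x
    return 0.5 * alpha * (0.5 * Ax @ Ax - mu) ** 2 - x @ c

def grad_P(x):
    """Residual of the Euler equation alpha(||Ax||^2/2 - mu) C x - c."""
    return alpha * (0.5 * x @ C @ x - mu) * (C @ x) - c

def P_dual(ys):
    """Canonical dual P^d(y*) = -sigma^2/(2 y*) - mu y* - y*^2/(2 alpha)."""
    return -sigma2 / (2.0 * ys) - mu * ys - ys ** 2 / (2.0 * alpha)

# Dual Euler-Lagrange equation (19) written as y*^3/alpha + mu y*^2 - sigma^2/2 = 0.
roots = np.roots([1.0 / alpha, mu, 0.0, -0.5 * sigma2])
real_roots = np.sort(roots[np.abs(roots.imag) < 1e-10].real)[::-1]   # descending order

for ys in real_roots:
    x = C_pinv @ c / ys                # primal critical point x = C^+ c / y*
    print(f"y* = {ys:+.6f}  |Euler residual| = {np.linalg.norm(grad_P(x)):.1e}  "
          f"P(x) = {P(x):+.6f}  P^d(y*) = {P_dual(ys):+.6f}")
```

In this instance the critical point attached to the positive root has the smallest value of $P$, as the triduality theorem predicts for the convex case.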


In the case of $n = 1$, the cost function

$$P(x) = \frac{\alpha}{2}\Big(\frac{1}{2}x^2 - \mu\Big)^2 - cx$$

is a double-well function (see Fig. 2, solid line), which appears in many physical systems. Here $C = 1$ and $\sigma^2 = c^2$; taking $c > 0$, $\sigma = c$. The graph of the canonical dual function

$$P^d(y^*) = -\frac{c^2}{2y^*} - \mu y^* - \frac{1}{2\alpha} y^{*2}$$

has two branches (Fig. 2, dashed line). It is easy to prove (see [1]) that if $\mu > \mu_c = 1.5(\sigma/\alpha)^{2/3}$, the dual Euler-Lagrange equation (19) has three roots $y_1^* > 0 > y_3^* > y_2^*$, corresponding to three critical points of $P^d$ (see Fig. 2). Then $y_1^*$ is a global maximizer of $P^d$ and $x_1 = \sigma/y_1^*$ is a global minimizer of $P$; $P^d$ takes a local minimum value at $y_3^*$ and a local maximum value at $y_2^*$, and accordingly $x_2 = \sigma/y_2^*$ is a local maximizer of $P$, while $x_3 = \sigma/y_3^*$ is a local minimizer.
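The threshold $\mu_c$ can be read off the cubic $\varphi(y) = y^3/\alpha + \mu y^2$: it is increasing for $y > 0$ (one positive root), and on $y < 0$ it has a local maximum at $y = -2\alpha\mu/3$ with value $4\alpha^2\mu^3/27$, so (19) has three real roots exactly when $4\alpha^2\mu^3/27 > \sigma^2/2$, i.e. $\mu > 1.5(\sigma/\alpha)^{2/3}$. A minimal check with NumPy, using the illustrative values α = σ = 1 (an assumption for the example):

```python
import numpy as np

def n_real_dual_roots(alpha, mu, sigma, tol=1e-9):
    """Count the real roots of (mu + y/alpha) y^2 = sigma^2 / 2, i.e. equation (19)."""
    roots = np.roots([1.0 / alpha, mu, 0.0, -0.5 * sigma ** 2])
    return int(np.sum(np.abs(roots.imag) < tol))

alpha, sigma = 1.0, 1.0
mu_c = 1.5 * (sigma / alpha) ** (2.0 / 3.0)      # equals 1.5 for these values
for mu in (0.9 * mu_c, 1.1 * mu_c):
    print(f"mu = {mu:.3f}: {n_real_dual_roots(alpha, mu, sigma)} real root(s)")
```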

The Lagrangian associated with this double-well energy is

$$L(x, y^*) = \frac{1}{2}x^2 y^* - \Big(\mu y^* + \frac{1}{2\alpha} y^{*2}\Big) - cx.$$

It is a saddle function for $y^* > 0$. If $y^* < 0$, it is a super-critical point function (see Fig. 3).




Duality Theory: Triduality in Global Optimization, Figure 2
Graphs of $P(u)$ and its dual $P^d(y^*)$


Duality Theory: Triduality in Global Optimization, Figure 3 
Lagrangian for the double-well energy 


See also 


> Duality Theory: Biduality in Nonconvex 
Optimization 

> Duality Theory: Monoduality in Convex 
Optimization 

> History of Optimization 

> Von Neumann, John 
Introduction


We consider the application of Dykstra's algorithm for solving the following optimization problem

$$\min_{x \in \Omega} \|x^0 - x\|, \qquad (1)$$

where $x^0$ is a given point, $\Omega$ is a closed and convex set, and $\|z\|^2 = \langle z, z\rangle$ is the norm induced by a real inner product in the space. The solution $x^*$ is called the projection of $x^0$ onto $\Omega$ and is denoted by $P_\Omega(x^0)$. Dykstra's algorithm for solving (1) has been extensively studied since it fits in many different applications (see [5,21,22,23,27,28,29,32,24,42,42,45]).

For simplicity, we consider the case

$$\Omega = \bigcap_{i=1}^{p} \Omega_i, \qquad (2)$$

where the $\Omega_i$ are closed and convex sets in $\mathbb{R}^n$, for $i = 1, 2, \dots, p$, and $\Omega \ne \emptyset$. Moreover, we assume that for any $z \in \mathbb{R}^n$ the calculation of $P_\Omega(z)$ is not trivial, whereas, for each $\Omega_i$, $P_{\Omega_i}(z)$ is easy to obtain, as in the case of a box, an affine subspace, or a ball. For the infeasible case (i.e., when $\Omega = \emptyset$) the behavior of Dykstra's algorithm is treated in [2,6,37].
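For the simple sets just mentioned the projections have closed forms. The NumPy sketch below is illustrative (the function names are not from the source); half spaces are written as $\{x : \langle a, x\rangle \ge b\}$ and balls as closed balls.

```python
import numpy as np

def project_box(z, lo, hi):
    """Projection onto the box {x : lo <= x <= hi}: componentwise clipping."""
    return np.clip(z, lo, hi)

def project_halfspace(z, a, b):
    """Projection onto the half space {x : <a, x> >= b}."""
    gap = b - a @ z
    if gap <= 0:                      # z already satisfies the inequality
        return z.copy()
    return z + (gap / (a @ a)) * a    # move along the normal a onto the boundary

def project_hyperplane(z, a, b):
    """Projection onto the affine subspace (hyperplane) {x : <a, x> = b}."""
    return z + ((b - a @ z) / (a @ a)) * a

def project_ball(z, center, radius):
    """Projection onto the closed ball {x : ||x - center|| <= radius}."""
    d = z - center
    dist = np.linalg.norm(d)
    if dist <= radius:
        return z.copy()
    return center + (radius / dist) * d
```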

Dykstra’s alternating projection algorithm is a cyclic 
scheme for finding asymptotically the projection of 
a given point onto the intersection of a finite number 
of closed convex sets. Roughly speaking, it iterates by 
projecting in a clever way onto each of the convex sets 
individually. The algorithm was originally proposed by 
Dykstra [20] for closed and convex cones in the Euclidean space $\mathbb{R}^n$, and later extended by Boyle and Dykstra [7] for closed and convex sets in a Hilbert space.
It was rediscovered by Han [30] using duality theory, 
and the linear rate of convergence was established by 
Deutsch and Hundal [18] for the polyhedral case (see 
also [19,43,44]). 

Dykstra's algorithm belongs to the general family of alternating projection methods, which dates back to von Neumann [46], who treated the problem of finding the projection of a given point in a Hilbert space onto the intersection of two closed subspaces. Later, Cheney and Goldstein [15] extended the analysis of von Neumann's alternating projection scheme to the case of two closed and convex sets. In particular, they established convergence under mild assumptions. However, the limit point need not be the point of the intersection closest to the starting point. Therefore, the alternating projection method proposed by von Neumann is not suitable for problem (1). Fortunately, Dykstra [20] found a clever modification of von Neumann's scheme for which convergence to the solution point is guaranteed. For a complete discussion of alternating projection methods see Deutsch [17].
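A small hypothetical example (not from the source) shows the difference. Take the half planes $\Omega_1 = \{x_2 \le 0\}$ and $\Omega_2 = \{x_1 + x_2 \le 0\}$ and $x^0 = (2, 1)$. Plain alternating projections stop at $(1, -1)$, which lies in $\Omega$ but at squared distance 5 from $x^0$, whereas the closest point is $P_\Omega(x^0) = (0.5, -0.5)$, at squared distance 4.5. The sketch below compares the two schemes; `dykstra` is a minimal cyclic version of the recursion given in the next section.

```python
import numpy as np

def project_halfspace(z, a, b):
    """Projection onto the half space {x : <a, x> >= b}."""
    gap = b - a @ z
    return z.copy() if gap <= 0 else z + (gap / (a @ a)) * a

# x2 <= 0 is written as <(0, -1), x> >= 0, and x1 + x2 <= 0 as <(-1, -1), x> >= 0.
projections = [
    lambda z: project_halfspace(z, np.array([0.0, -1.0]), 0.0),
    lambda z: project_halfspace(z, np.array([-1.0, -1.0]), 0.0),
]

def von_neumann(x0, projections, cycles):
    """Plain cyclic alternating projections (no increments)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(cycles):
        for proj in projections:
            x = proj(x)
    return x

def dykstra(x0, projections, cycles):
    """Cyclic Dykstra scheme: subtract the stored increment before each projection."""
    x = np.asarray(x0, dtype=float)
    y = [np.zeros_like(x) for _ in projections]
    for _ in range(cycles):
        for i, proj in enumerate(projections):
            z = x - y[i]
            x = proj(z)
            y[i] = x - z          # new increment for set i
    return x

x0 = np.array([2.0, 1.0])
print(von_neumann(x0, projections, 50))   # feasible, but not the closest point
print(dykstra(x0, projections, 50))       # the projection of x0 onto Omega
```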

Dykstra's algorithm has been extended in several different ways. Gaffke and Mathar [24] proposed, via duality, a family of simultaneous Dykstra algorithms in Hilbert space. Later, Iusem and De Pierro [37] established the convergence of the simultaneous version, considering also the inconsistent case in the Euclidean space $\mathbb{R}^n$. Bauschke and Borwein [2] further analyzed Dykstra's algorithm for two sets, a case that appears frequently in applications, and in particular generalized the results in [37]. In [36] it was established that for linear inequality constraints the method of Dykstra reduces to the method proposed by Hildreth [33] in his pioneering work on dual alternating projections. See also [40] for further analysis and extensions.

Dykstra’s algorithm has also been generalized by 
Deutsch and Hundal [35] to an infinite family of sets, 
and also to allow a random ordering, instead of cyclic, 
of the projections onto the closed convex sets. More re- 
cently, it has also been generalized by Bregman et al. [9] 
to avoid the projection onto each one of the convex sets 
in every cycle. Instead, projections onto either a suitable half space or the intersection of two half spaces are used. Further results concerning the connection between Bregman distances and Dykstra's algorithm can
be found in [3,4,8,14]. For the advantages of project- 
ing cyclically onto suitable half spaces, see the previous 
work by Iusem and Svaiter [38,39]. 

A computational experiment comparing Dykstra’s 
algorithm and the Halpern-Lions-Wittmann-Bauschke 
algorithm [1] on linear best approximation test prob- 
lems can be found in [12]. 


Formulations 
Dykstra’s Algorithm 


Dykstra's algorithm solves (1), (2) by generating two sequences: the iterates $\{x_i^k\}$ and the increments $\{y_i^k\}$. These sequences are defined by the following recursive formulae:

$$x_0^k = x_p^{k-1},$$
$$x_i^k = P_{\Omega_i}\big(x_{i-1}^k - y_i^{k-1}\big), \quad i = 1, 2, \dots, p, \qquad (3)$$
$$y_i^k = x_i^k - \big(x_{i-1}^k - y_i^{k-1}\big), \quad i = 1, 2, \dots, p,$$

for $k = 1, 2, \dots$ with initial values $x_p^0 = x^0$ and $y_i^0 = 0$ for $i = 1, 2, \dots, p$.
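The recursion (3) can be transcribed directly, keeping every iterate so that the relations stated in the remarks below can be checked. This is a minimal NumPy sketch; the function `dykstra_history` and the test sets (the box $[0, 1]^2$ and the line $x_1 + x_2 = 1.5$, with $x^0 = (3, 0)$, whose projection is $(1, 0.5)$) are illustrative assumptions, not from the source.

```python
import numpy as np

def project_box(z, lo, hi):
    return np.clip(z, lo, hi)

def project_hyperplane(z, a, b):
    return z + ((b - a @ z) / (a @ a)) * a

def dykstra_history(x0, projections, cycles):
    """Run recursion (3) and keep every iterate and increment.

    xs[k][i] stores x_i^k (with xs[k][0] = x_0^k = x_p^{k-1}), and
    ys[k][i - 1] stores y_i^k; ys[0] holds the null initial increments.
    """
    p = len(projections)
    x0 = np.asarray(x0, dtype=float)
    xs = [[None] * (p + 1) for _ in range(cycles + 1)]
    xs[0][p] = x0                                   # x_p^0 = x^0
    ys = [[np.zeros_like(x0) for _ in range(p)]]    # y_i^0 = 0
    for k in range(1, cycles + 1):
        xs[k][0] = xs[k - 1][p]                     # x_0^k = x_p^{k-1}
        yk = []
        for i in range(1, p + 1):
            z = xs[k][i - 1] - ys[k - 1][i - 1]     # x_{i-1}^k - y_i^{k-1}
            xs[k][i] = projections[i - 1](z)        # x_i^k
            yk.append(xs[k][i] - z)                 # y_i^k
        ys.append(yk)
    return xs, ys

# Hypothetical test problem: Omega_1 = [0, 1]^2 (box), Omega_2 = {x : x1 + x2 = 1.5}.
projections = [
    lambda z: project_box(z, 0.0, 1.0),
    lambda z: project_hyperplane(z, np.array([1.0, 1.0]), 1.5),
]
xs, ys = dykstra_history([3.0, 0.0], projections, cycles=200)
print(xs[200][2])    # close to the projection x* = (1, 0.5) of x^0 = (3, 0)
```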


Remarks

1. For the sake of simplicity, the projecting control index $i(k)$ used in (3) is the most common one: $i(k) = k \bmod p + 1$, for all $k \ge 0$. However, more advanced control indices can also be used, as long as they satisfy some minimal theoretical requirements (see, e.g., [35]).

2. The increment $y_i^{k-1}$ associated with $\Omega_i$ in the previous cycle is always subtracted before projecting onto $\Omega_i$. Only one increment (the last one) for each $\Omega_i$ needs to be stored.

3. If $\Omega_i$ is a closed affine subspace, then the operator $P_{\Omega_i}$ is affine and it is not required, in the kth cycle, to subtract the increment $y_i^{k-1}$ before projecting onto $\Omega_i$. Thus, for affine subspaces, Dykstra's procedure reduces to the alternating projection method of von Neumann [46].

4. For $k = 1, 2, \dots$ and $i = 1, 2, \dots, p$, it is clear from (3) that the following relations hold:

$$x_p^k - x^0 = \sum_{i=1}^{p} y_i^k, \qquad (4)$$

$$x_{i-1}^k - x_i^k = y_i^{k-1} - y_i^k, \qquad (5)$$

where $x_0^k = x_p^{k-1}$, $x_p^0 = x^0$ and $y_i^0 = 0$, for all $i = 1, 2, \dots, p$.

For the sake of completeness we now present the key theorem associated with Dykstra's algorithm.


Theorem 1 (Boyle and Dykstra, 1986 [7]) Let $\Omega_1, \dots, \Omega_p$ be closed and convex sets of $\mathbb{R}^n$ such that $\Omega = \bigcap_{i=1}^{p} \Omega_i \ne \emptyset$. For any $i = 1, 2, \dots, p$ and any $x^0 \in \mathbb{R}^n$, the sequence $\{x_i^k\}$ generated by (3) converges to $x^* = P_\Omega(x^0)$ (i.e., $\|x_i^k - x^*\| \to 0$ as $k \to \infty$).


We now discuss the delicate issue of stopping Dykstra’s 
algorithm within a certain previously established toler- 
ance that indicates the distance of the current iterate to 
the unique solution. 


Difficulties with some Commonly Used Stopping 
Criteria 


In some applications it is possible to obtain a fairly natural stopping rule associated with the problem at hand. For example, when solving a linear system $Ax = b$ by alternating projection methods [10,25], the residual vector $r(x) = b - Ax$ is usually available and yields some interesting and robust stopping
rules. Another example appears in image reconstruc- 
tion for which a good and feasible image tells the user 
that it is time to stop the process [13,16]. Similar cir- 
cumstances are present in some other specific applica- 
tions (e.g. saddle point problems [31], and molecular 
biology [28,29]). 

However, in general, this is not the case, and we 
are left with the information produced only by the in- 
ternal computations, i.e., the sequence of iterates and 
perhaps the sequence of increments, and some inner 
products. For this general case, a popular stopping rule is to monitor the subsequence of projections onto one particular convex set, $\Omega_i$, and stop the process when the distance, in norm, between two consecutive projections is less than or equal to a previously established tolerance [26,27,32,41].

Dykstra's Algorithm and Robust Stopping Criteria, Figure 1
Feasible set $\Omega = \Omega_1 \cap \Omega_2$ in $\mathbb{R}^2$

Another commonly used criterion, which is claimed to improve the previous one (e.g., [7,22,28,45]), is to compute an average of all the projections in each cycle of projections and then stop the process when the distance, in norm, between two consecutive averages is less than or equal to a previously established tolerance.

Finally, we would like to mention another criterion, also designed to improve on either of the two criteria above: check one of the previously described rules during $N$ consecutive cycles, where $N$ is a fixed positive integer.

None of these stopping rules is reliable. In [6], Birgin and Raydan presented the following example to show that they can fail even for a two-dimensional problem (see Figs. 1 and 2).

Consider the closed and convex set $\Omega = \Omega_1 \cap \Omega_2$, where $\Omega_1 = \{x \in \mathbb{R}^2 \mid x_1 + x_2 \ge 10\}$ is a half space and $\Omega_2 = \{x \in \mathbb{R}^2 \mid 3 \le x_1 \le 10,\ 0 \le x_2 \le 4\}$ is a box. This closed and convex set in $\mathbb{R}^2$ is shown in Fig. 1.

Let $x^0 = (-49, 50)^T$ and let us use Dykstra's algorithm to find the closest point to $x^0$ in $\Omega$. In Fig. 2 we can see the first two cycles of this convergent process. Since $y_1^0 = y_2^0 = 0$ (null initial increments), for the first cycle we project $p_1 = x^0$ onto $\Omega_1$ to obtain $p_2 = x_1^1 = (-44.5, 54.5)^T$, and then we project $p_2$ onto $\Omega_2$ to obtain $p_3 = x_2^1 = (3, 4)^T$. For the second cycle, the increments are not null ($y_1^1 = (4.5, 4.5)^T$ and $y_2^1 = (47.5, -50.5)^T$), and we start from $p_3$. First we project $p_4 = p_3 - y_1^1$ onto $\Omega_1$ to obtain $p_5 = x_1^2$. Then we project $p_6 = p_5 - y_2^1$ onto $\Omega_2$ to obtain $p_3$ again. Hence $x_2^2 = x_2^1$. The increment associated with $\Omega_2$ is large enough to take the iterate back to the quadrant where the projection onto the box is again $p_3$. As discussed in [6], this phenomenon repeats over many consecutive cycles; for this starting point, $x_2^k = p_3$ for every $k = 1, \dots, 32$.

Moreover, by choosing $x^0$ far enough, this misleading event can be repeated for as many cycles as any previously established positive integer $N$. Eventually the size of the increments will be reduced and convergence to $x^*$ will be observed.

Dykstra's Algorithm and Robust Stopping Criteria, Figure 2
First two cycles of Dykstra's algorithm to find the projection of $x^0 = (-49, 50)^T$ onto $\Omega = \Omega_1 \cap \Omega_2$
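The stagnation can be reproduced with a few lines of NumPy. This sketch hard-codes the two sets of the example; the variable `naive_stop` models the first rule above (stop when two consecutive projections onto $\Omega_2$ coincide) and is an illustrative name, not from the source. The rule fires at the second cycle, while the iterates actually converge to $x^* = (6, 4)^T$.

```python
import numpy as np

def project_halfspace(z):
    """Omega_1 = {x : x1 + x2 >= 10}."""
    a = np.array([1.0, 1.0])
    gap = 10.0 - a @ z
    return z.copy() if gap <= 0 else z + (gap / (a @ a)) * a

def project_box(z):
    """Omega_2 = {x : 3 <= x1 <= 10, 0 <= x2 <= 4}."""
    return np.clip(z, [3.0, 0.0], [10.0, 4.0])

x = np.array([-49.0, 50.0])          # x^0
y = [np.zeros(2), np.zeros(2)]       # y_1^0 = y_2^0 = 0
box_iterates = []                    # box_iterates[k - 1] = x_2^k
for k in range(1, 201):
    for i, proj in enumerate((project_halfspace, project_box)):
        z = x - y[i]
        x = proj(z)
        y[i] = x - z
    box_iterates.append(x.copy())

# Naive rule: stop when consecutive projections onto Omega_2 coincide.
naive_stop = next(k for k in range(2, 201)
                  if np.linalg.norm(box_iterates[k - 1] - box_iterates[k - 2]) == 0.0)
print(naive_stop, box_iterates[naive_stop - 1], box_iterates[-1])
```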




Robust Stopping Criteria 


After a close inspection of the proof of Boyle and Dykstra's theorem, Birgin and Raydan [6] proposed some robust stopping criteria for Dykstra's algorithm. For that they first established the following result.

Theorem 2 Let $x^0$ be any element of $\mathbb{R}^n$. Consider the sequences $\{x_i^k\}$ and $\{y_i^k\}$ generated by (3) and define $c_k$ as

$$c_k = \sum_{m=1}^{k}\sum_{i=1}^{p}\|y_i^{m-1} - y_i^m\|^2 + 2\sum_{m=1}^{k-1}\sum_{i=1}^{p}\langle y_i^m, x_i^{m+1} - x_i^m\rangle. \qquad (6)$$

Then, in the kth cycle of Dykstra's algorithm,

$$\|x^0 - x^*\|^2 \ge c_k. \qquad (7)$$

Moreover, at the limit when $k$ goes to infinity, equality is attained in (7).


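Based on the previous theorem, let us now write $c_k$ as follows:

$$c_k = \sum_{m=1}^{k} c_m^{I} + c_k^{II}, \qquad (8)$$

where

$$c_k^{I} = \sum_{i=1}^{p}\|y_i^{k-1} - y_i^k\|^2 \qquad (9)$$

and

$$c_k^{II} = 2\sum_{m=1}^{k-1}\sum_{i=1}^{p}\langle y_i^m, x_i^{m+1} - x_i^m\rangle.$$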


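Both $c_k$ and $c_k^{II}$ are monotonically nondecreasing by definition (each inner product $\langle y_i^m, x_i^{m+1} - x_i^m\rangle$ is nonnegative by the variational characterization of the projection onto $\Omega_i$). Moreover, in [6] the following theorem is also established.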


Theorem 3 Consider the sequences $\{x_i^k\}$ and $\{y_i^k\}$ generated by (3), and $c_k$, $c_k^{I}$ and $c_k^{II}$ as defined in (6), (8) and (9), respectively. For any $k \in \mathbb{N}$, if $x_p^k \ne x^*$ then $c_{k+1}^{I} > 0$ and, hence, $c_k < \|x^0 - x^*\|^2$ and $c_k < c_{k+1}$.


The results established in Theorems 2 and 3 are combined in [6] to propose robust stopping criteria. Notice that $\{c_k\}$ and $\{c_k^{II}\}$ are monotonically nondecreasing and convergent, and also that $\{c_k^{I}\}$ converges to zero. Therefore we can stop the process when

$$c_k^{I} = \sum_{i=1}^{p}\|y_i^{k-1} - y_i^k\|^2 \le \varepsilon,$$

or, similarly, when

$$c_k - c_{k-1} = c_k^{I} + 2\sum_{i=1}^{p}\langle y_i^{k-1}, x_i^k - x_i^{k-1}\rangle \le \varepsilon, \qquad (10)$$

where $\varepsilon > 0$ is a sufficiently small tolerance. As $c_k$ may become large, computing $c_k - c_{k-1}$ from the accumulated values may give inaccurate results due to loss of accuracy in the floating point representation and, hence, cancellation. So, for the criterion in (10), it is advisable to test convergence with the second expression.
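A minimal sketch of the criterion $c_k^{I} \le \varepsilon$, applied to the example above, is shown below. It also accumulates $c_k$ through the increment in (10); the function name `dykstra_robust` and the default tolerance are illustrative assumptions. On this example $c_k^{I}$ stays equal to 9 during the stagnation phase, so the robust rule does not stop early, and $c_k$ approaches $\|x^0 - x^*\|^2 = 55^2 + 46^2 = 5141$.

```python
import numpy as np

def project_halfspace(z):
    """Omega_1 = {x : x1 + x2 >= 10}."""
    a = np.array([1.0, 1.0])
    gap = 10.0 - a @ z
    return z.copy() if gap <= 0 else z + (gap / (a @ a)) * a

def project_box(z):
    """Omega_2 = {x : 3 <= x1 <= 10, 0 <= x2 <= 4}."""
    return np.clip(z, [3.0, 0.0], [10.0, 4.0])

def dykstra_robust(x0, projections, eps=1e-12, max_cycles=10_000):
    """Dykstra's algorithm stopped when c_k^I <= eps; also accumulates c_k via (10)."""
    p = len(projections)
    x = np.asarray(x0, dtype=float)
    y = [np.zeros_like(x) for _ in range(p)]
    x_prev = [None] * p               # x_i^{k-1}: the extra storage needed for c_k
    c = 0.0                           # c_k
    history = []                      # (k, c_k^I, c_k)
    for k in range(1, max_cycles + 1):
        c_I, extra = 0.0, 0.0
        for i, proj in enumerate(projections):
            z = x - y[i]
            x_new = proj(z)
            v = x_new - x             # v = x_i^k - x_{i-1}^k = y_i^k - y_i^{k-1}, by (5)
            c_I += v @ v
            if x_prev[i] is not None: # y[i] still holds y_i^{k-1} at this point
                extra += 2.0 * (y[i] @ (x_new - x_prev[i]))
            y[i] = x_new - z
            x_prev[i] = x_new
            x = x_new
        c += c_I + extra              # c_k = c_{k-1} + c_k^I + 2 sum_i <y_i^{k-1}, x_i^k - x_i^{k-1}>
        history.append((k, c_I, c))
        if c_I <= eps:
            break
    return x, history

x_final, history = dykstra_robust([-49.0, 50.0], [project_halfspace, project_box])
print(len(history), x_final, history[-1][2])
```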

The computation of $c_k^{I}$ involves the squared norms $\|y_i^{k-1} - y_i^k\|^2$, for $i = 1, 2, \dots, p$. By (5), $y_i^k = y_i^{k-1} + v$, where $v = x_i^k - x_{i-1}^k$ is a temporary n-dimensional array needed in the computation of Dykstra's algorithm. So, the computational cost involved in the calculation of $c_k^{I}$ is just the cost of the extra inner product $\langle v, v\rangle$ at each iteration.

The computation of $c_k$ involves the calculation of $c_k^{I}$ plus an extra term. The computation of this extra term is also cheap and involves an inner product and the difference of two vectors per iteration. But, in contrast with the computation of $c_k^{I}$, which does not require additional storage, the computation of the extra term requires saving $p$ extra n-dimensional arrays (the same amount of memory required in Dykstra's algorithm to save the increments). So, the computation of $c_k$ requires some additional calculations and memory, and hence it is more expensive. However, it also has the advantage of revealing the optimal distance $\|x^0 - x^*\|^2$, which could be of interest in some applications.

We close this section with some comments concerning the behavior of the stopping criteria when the problem is infeasible. In this case ($\Omega = \emptyset$), there is no solution and we know from Theorem 3 that the sequences $\{\sum_{m=1}^{k} c_m^{I}\}$ and $\{c_k\}$ are monotonically increasing. Moreover, under some mild assumptions on the sets $\Omega_i$, the sequences $\{x_i^k\}$ converge for $1 \le i \le p$ and there exists a real constant $\delta > 0$ such that $\sum_{i=1}^{p}\|x_{i-1}^k - x_i^k\|^2 \ge \delta$ for all $k$. A discussion on this topic is presented in [2, Section 6], including a notion of distance between all the sets $\Omega_i$ (see also [37]). Now, using (5), we obtain

$$\sum_{i=1}^{p}\|x_{i-1}^k - x_i^k\|^2 = \sum_{i=1}^{p}\|y_i^{k-1} - y_i^k\|^2 = c_k^{I}.$$

Therefore, the sequence $\{c_k^{I}\}$ remains bounded away from zero, whereas $\{\sum_{m=1}^{k} c_m^{I}\}$ and $\{c_k\}$ tend to infinity. Consequently, none of the proposed stopping criteria will be satisfied for any $k$, as expected.
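A hypothetical infeasible instance (not from the source) illustrates this: with the disjoint half planes $\Omega_1 = \{x_1 \le 0\}$ and $\Omega_2 = \{x_1 \ge 1\}$, after the first cycle the iterates alternate between $(0, 0)$ and $(1, 0)$, so $c_k^{I} = 2$ for every $k \ge 2$ and the partial sums grow without bound.

```python
import numpy as np

# Hypothetical infeasible instance: Omega_1 = {x : x1 <= 0} and Omega_2 = {x : x1 >= 1}.
def project_left(z):
    return np.array([min(z[0], 0.0), z[1]])

def project_right(z):
    return np.array([max(z[0], 1.0), z[1]])

x = np.array([0.5, 0.0])
y = [np.zeros(2), np.zeros(2)]
c_I_values = []                 # c_k^I = sum_i ||x_{i-1}^k - x_i^k||^2, by (5)
partial_sums = []               # sum_{m <= k} c_m^I
for k in range(1, 51):
    c_I = 0.0
    for i, proj in enumerate((project_left, project_right)):
        z = x - y[i]
        x_new = proj(z)
        c_I += float((x_new - x) @ (x_new - x))
        y[i] = x_new - z
        x = x_new
    c_I_values.append(c_I)
    partial_sums.append(c_I + (partial_sums[-1] if partial_sums else 0.0))
print(c_I_values[:3], partial_sums[-1])
```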
Dynamic programming deals with optimal decision-making problems in the presence of uncertainty, for systems in which events occur sequentially. In general, the state transitions are described by stationary dynamic systems of the form

$$x_{k+1} = f(x_k, u_k, w_k), \quad k = 0, 1, \dots,$$

where, for each time instance (stage) $k$, the state $x_k$ of the system is an element of the space $S$, the control action $u_k$ that is to be implemented so as to achieve optimality belongs to a space $C$, and the uncertainty is modeled through a set of random disturbances $w_k$ that belong to a countable set $D$. Furthermore, it is assumed that the control $u_k$ is constrained to take values in a given nonempty set $U(x_k) \subset C$, which depends on the current state $x_k$. The random disturbances $w_k$, $k = 0, 1, \dots$, have identical statistics, and the probability distributions $P(\cdot \mid x_k, u_k)$ are defined on $D$. These may depend explicitly on $x_k$ and $u_k$ but not on prior disturbances. Given an initial state $x_0$, we seek a policy $\pi = \{\mu_0, \mu_1, \dots\}$ for which

$$\mu_k: S \to C, \quad \mu_k(x_k) \in U(x_k), \quad \forall x_k \in S,$$

that minimizes a cost function defined as:


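$$J_\pi(x_0) = \lim_{N \to \infty} E_{w_k,\, k = 0, 1, \dots}\Big\{\sum_{k=0}^{N-1} \alpha^k g\big(x_k, \mu_k(x_k), w_k\big)\Big\}.$$

The function $g(\cdot)$ is the cost per stage, $g: S \times C \times D \to \mathbb{R}$, and is assumed to be given. Finally, the parameter $\alpha$ is termed the discount factor and satisfies $0 < \alpha \le 1$. We denote by $\Pi$ the set of all admissible policies $\pi = \{\mu_0, \mu_1, \dots\}$, that is, the set of all sequences of such functions for which

$$\mu_k: S \to C, \quad \mu_k(x_k) \in U(x_k), \quad \forall x_k \in S.$$

The optimal cost function $J^*$ is then defined as:

$$J^*(x) = \min_{\pi \in \Pi} J_\pi(x), \quad x \in S.$$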


An admissible policy of the form $\pi = \{\mu, \mu, \dots\}$ is termed stationary, and its corresponding cost is denoted $J_\mu$. When studying problems of this kind, the assumption is made that either the discount factor satisfies $\alpha < 1$ (discounted problems [3]) or that there naturally exists a special cost-free absorbing state (stochastic shortest path problems [3]). In either of these two cases the total expected cost is finite, and the minimization, as previously stated, is well defined. In situations where either the discount factor is 1 or a terminal state does not exist, it is more meaningful to optimize an average expected cost:


$$J_\pi(x_0) = \lim_{N \to \infty} \frac{1}{N} E_{w_k,\, k = 0, 1, \dots}\Big\{\sum_{k=0}^{N-1} g\big(x_k, \mu_k(x_k), w_k\big)\Big\}.$$

These problems are known as average cost per stage problems, and although they bear various similarities with both discounted and stochastic shortest path problems, they have some distinct characteristics. One of the earliest studies was that of D. Blackwell [7], which made the connection between the optimal average return and the optimal discounted return for values of $\alpha \to 1$. Precisely these characteristics make the analysis of average cost per stage problems the target of intense research [1]. Connections are made, for developing the associated theory, with the associated discounted problem and also, more recently, with an associated stochastic shortest path problem [4].

Since the theory for analyzing average cost dynamic programming problems has been largely based on the associated theory for discounted and stochastic shortest path problems, most of the results and computational methods bear major similarities. As a prelude to what follows, it should be pointed out that for average cost per stage problems:

1) the optimal average cost per stage is independent of the initial state for most problems;

2) Bellman's equation takes a slightly modified form that includes a differential cost for each state;

3) there exist computational analogues of all methods developed for either discounted or stochastic shortest path problems.

The cost function of an average cost per stage problem is closely related to that of the associated $\alpha$-discounted problem for a given stationary policy, as follows:

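• For any stationary policy $\mu$ and for any $\alpha \in (0, 1)$ we have:

$$J_{\alpha,\mu} = (1 - \alpha)^{-1} J_\mu + h_\mu + O(|1 - \alpha|),$$

where

$$J_\mu = \lim_{N \to \infty} \frac{1}{N}\sum_{k=0}^{N-1} P_\mu^k g_\mu$$

is the average cost corresponding to policy $\mu$, for a process with transition probability matrix $P_\mu$ and costs $g_\mu$. The term $O(|1 - \alpha|)$ is an $\alpha$-dependent vector such that $\lim_{\alpha \to 1} O(|1 - \alpha|) = 0$, and the vector $h_\mu$ satisfies $J_\mu + h_\mu = g_\mu + P_\mu h_\mu$.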
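In the above, the matrix $P_\mu$ is the transition probability matrix for a given stationary policy $\mu$, given by

$$P_\mu = \begin{pmatrix} p_{11}(\mu(1)) & \cdots & p_{1n}(\mu(1)) \\ \vdots & \ddots & \vdots \\ p_{n1}(\mu(n)) & \cdots & p_{nn}(\mu(n)) \end{pmatrix},$$

and $g_\mu$ is the associated cost vector

$$g_\mu = \begin{pmatrix} g(1, \mu(1)) \\ \vdots \\ g(n, \mu(n)) \end{pmatrix}.$$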
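The expansion can be checked numerically for a fixed stationary policy. The sketch below uses a hypothetical two-state chain (the numbers in `P_mu` and `g_mu` are illustrative). It assumes, as in the standard Laurent expansion, that the $h_\mu$ appearing there is the solution of $J_\mu + h_\mu = g_\mu + P_\mu h_\mu$ whose average under the stationary distribution $\pi$ is zero. For this chain $\pi = (2/7, 5/7)$, $J_\mu = 12/7$ and $h_\mu = (-50/49, 20/49)$.

```python
import numpy as np

# Hypothetical two-state chain under a fixed stationary policy mu (illustrative numbers).
P_mu = np.array([[0.5, 0.5],
                 [0.2, 0.8]])
g_mu = np.array([1.0, 2.0])
n = len(g_mu)

# Stationary distribution: pi P = pi, sum(pi) = 1 (the chain is unichain).
M = np.vstack([P_mu.T - np.eye(n), np.ones(n)])
pi = np.linalg.lstsq(M, np.r_[np.zeros(n), 1.0], rcond=None)[0]
J_avg = pi @ g_mu                        # average cost, the same for every initial state

# Differential cost: J_avg + h = g + P h, normalized so that pi @ h = 0.
K = np.vstack([np.eye(n) - P_mu, pi])
h = np.linalg.lstsq(K, np.r_[g_mu - J_avg, 0.0], rcond=None)[0]

for alpha in (0.9, 0.99, 0.999):
    J_alpha = np.linalg.solve(np.eye(n) - alpha * P_mu, g_mu)   # discounted cost J_{alpha, mu}
    print(alpha, (1 - alpha) * J_alpha, J_alpha - J_avg / (1 - alpha) - h)
```

Both printed quantities shrink as $\alpha \to 1$: $(1-\alpha)J_{\alpha,\mu}$ approaches $J_\mu$, and the remainder after subtracting $(1-\alpha)^{-1}J_\mu + h_\mu$ behaves like $O(|1-\alpha|)$.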


The vector $h_\mu$ is termed the differential cost vector: the difference $h(i) - h(j)$ represents the difference in N-stage expected optimal cost, as N grows, due to starting at state $i$ rather than at state $j$. The key optimality results irrespective of initial states are based on ideas first formulated in [6]. An important element of this analysis is that of a unichain policy. Given a stationary Markov chain [10], a subset of states that communicate (i.e., for any two states $i$ and $j$ in the subset there exist $k_1$ and $k_2$ for which the $k_1$-step and $k_2$-step transition probabilities $p_{ij}^{k_1}$ and $p_{ji}^{k_2}$ are positive) and that cannot be left once entered is termed a recurrent class of states. States that do not belong to a recurrent class are termed transient. A stationary policy whose associated Markov chain has a single recurrent class and a possibly empty set of transient states is called unichain. In view of the above, Bellman's equation characterizing an optimal policy [9] for the average cost per stage problem takes the following form:

• Assume that any of the following conditions holds:

1) Every policy that is optimal within the class of stationary policies is unichain.

2) For every two states $i$ and $j$, there exists a stationary policy $\pi$ (depending on $i$ and $j$) such that, for some $k$,

$$P(x_k = j \mid x_0 = i, \pi) > 0.$$

3) There exist a state $t$, a constant $L > 0$, and $\bar{\alpha} \in (0, 1)$ such that

$$|J_\alpha(i) - J_\alpha(t)| \le L, \quad i = 1, \dots, n, \quad \alpha \in (\bar{\alpha}, 1),$$

where $J_\alpha$ is the $\alpha$-discounted optimal cost vector.
Then:

1) The optimal average cost per stage has the same value, $\lambda$, for all initial states, and it satisfies

$$\lambda = \lim_{\alpha \to 1} (1 - \alpha) J_\alpha(i), \quad i = 1, \dots, n.$$

2) For any state $t$, the vector of differential costs $h$ given by

$$h(i) = \lim_{\alpha \to 1} \big(J_\alpha(i) - J_\alpha(t)\big), \quad i = 1, \dots, n,$$

together with $\lambda$, satisfies Bellman's equation:

$$\lambda + h(i) = \min_{u \in U(i)}\Big[g(i, u) + \sum_{j=1}^{n} p_{ij}(u) h(j)\Big], \quad i = 1, \dots, n,$$

and $\lambda$ is the optimal average cost per stage for all states $i$, i.e.,

$$\lambda = J^*(i) = \min_{\pi \in \Pi} J_\pi(i), \quad i = 1, \dots, n.$$


The above result is also discussed in [2], where the minimization of an expected cost without discounting is considered. All classical methods for computing optimal policies and costs in dynamic programming have their counterparts for addressing average cost per stage problems. Certain alterations are nevertheless necessary. Let us consider first the value iteration method, exhaustively analyzed in [11,12]. This method is based on the premise that the iterates of the basic dynamic programming algorithm satisfy

$$\lim_{k \to \infty} \frac{1}{k} T^k J = J^*.$$


Two issues arise with average cost per stage problems. First, some elements of the sequence $T^k J$ may diverge to $+\infty$ or $-\infty$, making the numerical calculation troublesome. Furthermore, since the differential cost has been found to be important, it is appropriate to develop methods that also compute $h$ in parallel. [14] developed the fundamentals on which a relative value iteration of the form

$$h^{k+1}(i) = (Th^k)(i) - (Th^k)(t), \quad i = 1, \dots, n,$$


for some fixed state $t$, converges to a vector $h$ such that $(Th)(t)$ is equal to the optimal average cost per stage for all initial states, and $h$ is the associated differential cost vector. [3] discusses various technical details required for proving convergence. Tight bounds that could improve the computational behavior of the value iteration method were proposed by [8], which modified the approach set forth in [14] to prove that upper and lower bounds on the optimal gain could be readily obtained. These are given according to:

$$\underline{c}_k \le \underline{c}_{k+1} \le \lambda \le \bar{c}_{k+1} \le \bar{c}_k,$$

where $\lambda$ is the optimal average cost per stage for all initial states and

$$\underline{c}_k = \min_i\big[(Th^k)(i) - h^k(i)\big], \qquad \bar{c}_k = \max_i\big[(Th^k)(i) - h^k(i)\big].$$
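Relative value iteration and the bounds $\underline{c}_k$, $\bar{c}_k$ fit in a few lines of NumPy. The sketch below uses a hypothetical two-state, two-action problem (costs and transition probabilities are illustrative, with all transition probabilities positive so every policy is unichain and aperiodic). Enumerating its four stationary policies gives average costs 2.5, 12/7, 19/6 and 8/3, so $\lambda = 12/7$, attained by action 0 in state 0 and action 1 in state 1.

```python
import numpy as np

# Hypothetical data: cost[i, u] = g(i, u); trans[i, u, j] = p_ij(u).
cost = np.array([[1.0, 3.0],
                 [4.0, 2.0]])
trans = np.array([[[0.5, 0.5], [0.9, 0.1]],
                  [[0.5, 0.5], [0.2, 0.8]]])

def bellman(h):
    """Return (T h)(i) = min_u [g(i, u) + sum_j p_ij(u) h(j)] and the minimizing actions."""
    q = cost + trans @ h                 # q[i, u]
    return q.min(axis=1), q.argmin(axis=1)

def relative_value_iteration(t=0, tol=1e-12, max_iter=10_000):
    """Relative value iteration h^{k+1} = T h^k - (T h^k)(t), recording the bounds."""
    h = np.zeros(cost.shape[0])
    bounds = []                          # (lower bound, upper bound) for each iterate h^k
    for _ in range(max_iter):
        Th, policy = bellman(h)
        lower, upper = np.min(Th - h), np.max(Th - h)
        bounds.append((lower, upper))
        h = Th - Th[t]
        if upper - lower < tol:
            break
    return Th[t], h, policy, bounds

lam, h, policy, bounds = relative_value_iteration()
print(lam, h, policy)    # lam close to 12/7, h close to (0, 10/7)
```

The gap between the two bounds gives a natural stopping test, and both bounds bracket $\lambda$ at every iteration, not only at the limit.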


Recently, [4], by exploiting the connection between the average cost and the stochastic shortest path problem, developed a new value iteration method that makes use of a weighted sup-norm contraction arising in the stochastic shortest path problem. One of the key advantages of this approach is that it admits a Gauss-Seidel implementation, and thus it is amenable to a distributed implementation. Policy iteration methods can also be developed. Policy iteration algorithms generate sequences of stationary policies, each with improved cost over the preceding one. These methods comprise two basic steps: a policy evaluation step and a policy improvement step. During the first step, for a given stationary policy $\mu^k$, we obtain the corresponding average and differential costs via the solution of the following system of equations, whose solution provides the kth iterates $\lambda^k$ and $h^k$:

$$\lambda^k + h^k(i) = g\big(i, \mu^k(i)\big) + \sum_{j=1}^{n} p_{ij}\big(\mu^k(i)\big) h^k(j), \quad i = 1, \dots, n.$$

This system has $n$ equations in $n + 1$ unknowns; for a unichain policy its solution becomes unique once one component of $h^k$ is fixed, e.g., $h^k(t) = 0$ for some state $t$.
The policy improvement step consists of finding a policy $\mu^{k+1}$ such that, for every state $i$,

$$g\big(i, \mu^{k+1}(i)\big) + \sum_{j=1}^{n} p_{ij}\big(\mu^{k+1}(i)\big) h^k(j) = \min_{u \in U(i)}\Big[g(i, u) + \sum_{j=1}^{n} p_{ij}(u) h^k(j)\Big].$$
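The two steps can be sketched as follows, on the same hypothetical two-state, two-action data used for illustration above (the names `evaluate` and `policy_iteration` are illustrative). Evaluation solves the linear system with the normalization $h(t) = 0$; improvement keeps the current action on ties to avoid cycling.

```python
import numpy as np

# Hypothetical data: cost[i, u] = g(i, u); trans[i, u, j] = p_ij(u).
cost = np.array([[1.0, 3.0],
                 [4.0, 2.0]])
trans = np.array([[[0.5, 0.5], [0.9, 0.1]],
                  [[0.5, 0.5], [0.2, 0.8]]])

def evaluate(policy, t=0):
    """Policy evaluation: lambda + h(i) = g(i, mu(i)) + sum_j p_ij(mu(i)) h(j), with h(t) = 0."""
    n = len(policy)
    rows = np.arange(n)
    P = trans[rows, policy]              # P[i, j] = p_ij(mu(i))
    g = cost[rows, policy]               # g[i] = g(i, mu(i))
    A = np.zeros((n + 1, n + 1))         # unknowns ordered as (lambda, h(0), ..., h(n-1))
    A[:n, 0] = 1.0
    A[:n, 1:] = np.eye(n) - P
    A[n, 1 + t] = 1.0                    # normalization h(t) = 0
    sol = np.linalg.solve(A, np.r_[g, 0.0])
    return sol[0], sol[1:]

def policy_iteration(policy, tie_tol=1e-12):
    policy = np.asarray(policy)
    rows = np.arange(len(policy))
    while True:
        lam, h = evaluate(policy)
        q = cost + trans @ h                       # q[i, u] = g(i, u) + sum_j p_ij(u) h(j)
        best = q.argmin(axis=1)
        # Keep the current action when it is already optimal up to tie_tol (avoids cycling on ties).
        keep = q[rows, policy] <= q.min(axis=1) + tie_tol
        new_policy = np.where(keep, policy, best)
        if np.array_equal(new_policy, policy):
            return policy, lam, h
        policy = new_policy

print(policy_iteration([1, 0]))   # starts from a policy with average cost 19/6
```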


In [7], the scope of policy iteration is expanded so as to address problems in which the optimal average cost per stage is not the same for every initial state. It can also be shown [3] that the optimal pair $(\lambda^*, h^*)$ corresponds to an optimal solution of the following linear program:

$$\begin{aligned} \max\ & \lambda \\ \text{s.t.}\ & \lambda + h(i) \le g(i, u) + \sum_{j=1}^{n} p_{ij}(u) h(j), \quad i = 1, \dots, n,\ u \in U(i). \end{aligned}$$


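In [9] the dual problem of the above-mentioned formulation is considered, whose optimal value equals the optimal value of the primal problem. The form of the dual is:

$$\begin{aligned} \min\ & \sum_{i=1}^{n}\sum_{u \in U(i)} q(i, u) g(i, u) \\ \text{s.t.}\ & \sum_{u \in U(j)} q(j, u) = \sum_{i=1}^{n}\sum_{u \in U(i)} q(i, u) p_{ij}(u), \quad j = 1, \dots, n, \\ & \sum_{i=1}^{n}\sum_{u \in U(i)} q(i, u) = 1, \\ & q(i, u) \ge 0, \quad i = 1, \dots, n,\ u \in U(i). \end{aligned}$$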


Simulation-based methods are presented in [3,5] that 
use the basic concepts of Monte-Carlo simulation as 
well as ideas of reinforcement learning, [13]. 
Many of the data analysis tasks that arise in the field of classification and clustering can be given some type of combinatorial characterization that involves the identification of object groupings, partitions, or sequences. These combinatorial structures are generally defined by certain properties of optimality for some loss (or merit) criterion based on data given in the form of an $n \times n$ symmetric proximity matrix $P$ between distinct pairs of objects from a set $S = \{O_1, \dots, O_n\}$. The focus of this entry will be solely on this context and on the use of a general optimization strategy referred to as the General Dynamic Programming Paradigm (GDPP), which allows
the construction of recursive procedures to solve a range 
of combinatorial optimization tasks encountered in the 
field of classification and clustering. The GDPP will be 
presented in a general form below with later sections 
indicating how it can be operationalized for a number 


of specific problem types. For a more extensive presen- 
tation of the topics introduced in this entry and for gen- 
eralizations to proximity matrices that may not be sym- 
metric or that are defined between objects from (two) 
distinct sets, the monograph [12] should be consulted. 
This latter source also provides numerical illustrations 
for the topics introduced here plus instructions on how 
to obtain a collection of programs (available on the 
World Wide Web) to carry out the various optimiza- 
tion tasks presented in this entry and [12]. For a recent 
and comprehensive review of cluster analysis and the 
use of mathematical programming techniques in gen- 
eral, see [7]. 


The GDPP 


To present the GDPP, a collection of $K$ sets of entities is first defined, $\Omega_1, \dots, \Omega_K$, where it is possible by some operation to transform entities in $\Omega_{k-1}$ to certain entities in $\Omega_k$ for $2 \le k \le K$. Each such transformation can be assigned a merit (or cost) value based only on the entity in $\Omega_{k-1}$ and the transformed entity in $\Omega_k$. An entity in $\Omega_k$ is denoted by $A_k$, and $F(A_k)$ is the optimal value that can be assigned to $A_k$ based on the sum of the merit (or cost) increments necessary to transform an entity in $\Omega_1$, step by step, to $A_k \in \Omega_k$. If $A_{k-1} \in \Omega_{k-1}$ can be transformed into $A_k \in \Omega_k$, the merit (or cost) of that single transition will be denoted by $M(A_{k-1}, A_k)$ (or $C(A_{k-1}, A_k)$), where the latter does not depend on how $A_{k-1}$ may have been arrived at starting from an entity in $\Omega_1$. Given these conditions, and assuming the values $F(A_1)$ for $A_1 \in \Omega_1$ are available to initialize the recursive system, $F(A_k)$ may be constructed for $k = 2, \dots, K$ (when merit is to be maximized) as

$$F(A_k) = \max\big[F(A_{k-1}) + M(A_{k-1}, A_k)\big],$$

where $A_k \in \Omega_k$, $A_{k-1} \in \Omega_{k-1}$, and the maximum is taken over all $A_{k-1}$ that can be transformed into $A_k$. Or, if cost is to be minimized,

$$F(A_k) = \min\big[F(A_{k-1}) + C(A_{k-1}, A_k)\big].$$

In addition, both max/min and min/max forms could be considered:

$$F(A_k) = \max\big[\min\big(F(A_{k-1}), M(A_{k-1}, A_k)\big)\big],$$
$$F(A_k) = \min\big[\max\big(F(A_{k-1}), C(A_{k-1}, A_k)\big)\big].$$

The leading maximization or minimization is over all $A_{k-1} \in \Omega_{k-1}$ that can be transformed into $A_k$. In all instances, an optimal solution is identified by some value of $F(A_K)$ for a specific $A_K \in \Omega_K$, and the actual optimal solution is obtained by working backwards through the recursion to see how $F(A_K)$ was constructed.


Partitioning 


The most direct characterization of the partitioning task can be stated as follows: given $S = \{O_1, \dots, O_n\}$ and $P = \{p_{ij}\}$, find a collection of $M$ mutually exclusive and exhaustive subsets (or clusters) of $S$, say $S_1, \dots, S_M$, such that, for some measure of heterogeneity $H(\cdot)$ that attaches a value to each possible subset of $S$, either the sum $\sum_{m=1}^{M} H(S_m)$ or, alternatively, $\max[H(S_1), \dots, H(S_M)]$ is minimized. This stipulation assumes that heterogeneity has a cost interpretation and that smaller values of the heterogeneity indices represent the 'better' subsets (or clusters). If $H(S_m)$ for some $S_m \subseteq S$ depends only on those proximities from $P$ that are within $S_m$ and/or between $S_m$ and $S - S_m$, an application of the GDPP is possible. Define $K$ to be $M$, and let each of the sets $\Omega_1, \dots, \Omega_K$ contain all of the $2^n - 1$ nonempty subsets of the $n$ object subscripts; $F(A_k)$ is the optimal value for a partitioning into $k$ classes of the object subscripts present in $A_k$. A transformation of an entity $A_{k-1} \in \Omega_{k-1}$ to $A_k \in \Omega_k$ is possible if $A_{k-1} \subset A_k$, with cost $C(A_{k-1}, A_k) = H(A_k - A_{k-1})$. Thus, beginning with the heterogeneity indices $H(A_1)$ for every subset $A_1 \subseteq S$, the recursion can be carried out, with the optimal solution represented by $F(A_K)$ when $A_K = S$.
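The subset recursion can be sketched with bitmasks standing in for subsets of object subscripts. In this minimal illustration the heterogeneity $H$ is assumed to be the measure attributed to [14] below (within-subset proximity sum over unordered pairs divided by the subset size); the function names and the six one-dimensional points are hypothetical. The $3^n$ enumeration of subset pairs is what limits exact use to small $n$.

```python
from itertools import combinations

def heterogeneity(members, prox):
    """Assumed H(S_m): sum of within-subset proximities over unordered pairs / |S_m|."""
    return sum(prox[i][j] for i, j in combinations(members, 2)) / len(members)

def optimal_partition(prox, M):
    """GDPP for min sum_m H(S_m): F(A_k) = min over A_{k-1} in A_k of F(A_{k-1}) + H(A_k - A_{k-1})."""
    n = len(prox)
    full = (1 << n) - 1                                # bitmask of S
    def members(A):
        return [i for i in range(n) if A >> i & 1]
    H = {A: heterogeneity(members(A), prox) for A in range(1, full + 1)}
    F = dict(H)                                        # F(A_1) = H(A_1)
    back = [None]                                      # back[k - 1][A_k] = best A_{k-1}
    for k in range(2, M + 1):
        F_k, back_k = {}, {}
        for A in range(1, full + 1):
            B = (A - 1) & A                            # walk over nonempty proper subsets of A
            while B:
                if B in F:
                    value = F[B] + H[A ^ B]
                    if value < F_k.get(A, float("inf")):
                        F_k[A], back_k[A] = value, B
                B = (B - 1) & A
        F = F_k
        back.append(back_k)
    clusters, A = [], full                             # work backwards from A_K = S
    for k in range(M, 1, -1):
        B = back[k - 1][A]
        clusters.append(members(A ^ B))
        A = B
    clusters.append(members(A))
    return F[full], clusters[::-1]

pts = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]            # hypothetical 1-D profiles
prox = [[(a - b) ** 2 for b in pts] for a in pts]   # squared Euclidean proximities
print(optimal_partition(prox, 2))
```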

The first discussion of this general type of recur- 
sive solution for the partitioning task was in [14] but 
limited to one specific measure of subset heterogeneity 
defined by the sum of proximities within a subset di- 
vided by twice the number of objects in the subset. If 
the original proximities in P happened to be squared 
Euclidean distances between numerically given vectors 
(or profiles) for the n objects over some set of vari- 
ables, then this subset heterogeneity measure is equiv- 
alent to the sum of squared Euclidean distances be- 
tween each profile and the mean profile for the subset 
(this quantity is usually called the sum of squared error or the k-means criterion; e.g., see [16], p. 52). A major advantage of the GDPP formulation is that a variety of heterogeneity measures can be considered under


a common rubric, with the sole requirement that the 
measure chosen be dependent only on the proximities 
within a subset and/or between the subset and its com- 
plement. For example, in [12] some twelve different al- 
ternatives are illustrated using a program implemen- 
tation that can effectively deal with object set sizes in 
their lower 20’s with the type of computational equip- 
ment and storage capacity now commonly available. As 
noted in a later section, it is also possible to extend the 
GDPP heuristically to allow for much larger object set 
sizes, although an absolute guarantee of globally opti- 
mality for the identified structures is sacrificed. 
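The equivalence claimed above can be checked numerically. The sketch below (hypothetical function names) assumes that "the sum of proximities within a subset" is taken over all ordered pairs $(i, j)$, so each unordered pair is counted twice; under that reading, dividing by twice the subset size reproduces the sum of squared error around the mean profile.

```python
def gdpp_heterogeneity(profiles):
    """Squared Euclidean distances summed over all ordered pairs (i, j),
    divided by twice the number of profiles in the subset."""
    total = 0.0
    for a in profiles:
        for b in profiles:
            total += sum((u - v) ** 2 for u, v in zip(a, b))
    return total / (2 * len(profiles))


def sum_squared_error(profiles):
    """Sum of squared distances from each profile to the subset mean profile."""
    m, dims = len(profiles), len(profiles[0])
    mean = [sum(p[d] for p in profiles) / m for d in range(dims)]
    return sum((p[d] - mean[d]) ** 2 for p in profiles for d in range(dims))


subset = [[0, 0], [2, 0], [1, 3]]
print(gdpp_heterogeneity(subset), sum_squared_error(subset))
```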


Admissibility Restrictions on Partitions 


A specific restriction discussed at some length in the literature (see [8, Chapt. 5], [16, pp. 61–64], [6]) that would permit the construction of optimal partitions (subject to the restriction) for very large object sets is when there is an a priori assumed object ordering along a continuum that can be taken without loss of generality as $O_1 \prec \cdots \prec O_n$, and the only admissible clusters are those for which the objects in the cluster form a consecutive sequence or segment. Thus, an optimal partition will consist of $M$ clusters, each of which defines a consecutive segment along the given object ordering. To tailor the GDPP to a consecutive-ordering admissibility criterion, each of the sets $\Omega_1, \ldots, \Omega_M$ is now defined by the $n$ subsets of $S$ that contain the objects $\{O_1, \ldots, O_i\}$ for $1 \le i \le n$; $F(A_k)$ is the optimal value for a partitioning of $A_k$ into $k$ classes; a transformation of an entity $A_{k-1} \in \Omega_{k-1}$ to $A_k \in \Omega_k$ is possible if $A_{k-1} \subset A_k$; and the cost of the transition is $H(A_k - A_{k-1})$, where $A_k - A_{k-1}$ must contain a consecutive sequence of objects. Again, $F(A_M)$ for $A_M = S$ identifies an optimal solution.
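A minimal sketch of the order-constrained recursion follows, assuming the objects are already listed in the given order and using the sum of squared deviations as a hypothetical default for $H$ (the names `optimal_segments` and `sse` are illustrative). For convenience it starts from an empty prefix with value 0, which makes the first step equal to $F(A_1) = H(A_1)$. With prefixes as the only states, the recursion needs $O(Mn^2)$ evaluations of $H$ instead of a search over all subsets.

```python
def sse(values):
    """Sum of squared deviations from the segment mean (one possible H)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def optimal_segments(values, M, H=sse):
    """Order-constrained GDPP: A_k is a prefix {O_1, ..., O_i}; the
    transition cost is H of the newly added consecutive segment."""
    n = len(values)
    INF = float("inf")
    F = [[INF] * (n + 1) for _ in range(M + 1)]
    cut = [[0] * (n + 1) for _ in range(M + 1)]
    F[0][0] = 0.0                         # empty prefix
    for k in range(1, M + 1):
        for i in range(k, n + 1):         # A_k = first i objects
            for j in range(k - 1, i):     # A_{k-1} = first j objects
                val = F[k - 1][j] + H(values[j:i])
                if val < F[k][i]:
                    F[k][i], cut[k][i] = val, j
    segments, i = [], n
    for k in range(M, 0, -1):             # trace the cuts back from A_M = S
        j = cut[k][i]
        segments.append(values[j:i])
        i = j
    return F[M][n], segments[::-1]


print(optimal_segments([1, 2, 3, 10, 11, 12, 30, 31], 3))
```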

The selection of some prespecified ordering that constrains admissible clusters in a partition obviously does not necessarily lead to the same optimal partitions as the unconstrained problem, even though the identical subset heterogeneity measure and optimization criterion are being used. There are, however, several special instances where the original proximity matrix $P$ is appropriately defined and/or patterned so that the imposition of a particular order constraint does invariably lead to partitions that would also be optimal even when no such order constraint was imposed. One such result dates back to W.D. Fisher [6], who showed that when proximities are squared differences between the values on some (unidimensional) variable, and the order constraint is derived from the ordering of the objects on this variable, then the selection of the sum of squared error as the subset heterogeneity measure, and minimizing this sum as an optimization criterion, leads to partitions that are not only optimal under the order constraint but also optimal when unconstrained; i.e., an unconstrained optimal partition will include only those subsets defined by objects consecutive in the given order. (The subset heterogeneity measure in this unidimensional case reduces to the sum of squared deviations of the univariate values for the objects from their mean value within the subset.) A more general result appears in [3], where the special case is discussed when a proximity matrix $P$ can be row- and column-reordered to display an anti-Robinson form (a matrix pattern first introduced in [15]). As stated more formally below, a matrix has an anti-Robinson form if the entries within each row and column of $P$ never decrease when moving away from a main diagonal entry in any direction. For certain subset heterogeneity measures and optimization criteria, imposing the order constraint that displays the anti-Robinson pattern in the row- and column-reordered proximity matrix leads to partitions that are also optimal when unconstrained.
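Fisher's result can be probed on a small example by brute force. The sketch below (hypothetical names; exhaustive enumeration, so only for small $n$) compares the best sum of squared error over all partitions into $m$ classes with the best over partitions into segments of the sorted values. By the result above, the two minima should coincide.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations from the mean (the unidimensional H)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def set_partitions(items, m):
    """Yield every partition of `items` into exactly m nonempty blocks."""
    if not items:
        if m == 0:
            yield []
        return
    if m == 0:
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest, m - 1):   # `first` alone in a new block
        yield [[first]] + part
    for part in set_partitions(rest, m):       # `first` joins an existing block
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def best_unconstrained(values, m):
    """Exhaustive search over all partitions into m classes."""
    return min(sum(sse([values[i] for i in block]) for block in part)
               for part in set_partitions(list(range(len(values))), m))


def best_consecutive(values, m):
    """Exhaustive search over partitions into segments of the sorted order."""
    ordered = sorted(values)
    n = len(ordered)
    best = float("inf")
    for cuts in combinations(range(1, n), m - 1):
        bounds = (0,) + cuts + (n,)
        best = min(best, sum(sse(ordered[a:b]) for a, b in zip(bounds, bounds[1:])))
    return best
```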

The choice of an ordering that can be imposed to constrain the search domain for optimal partitions could be directly tied to a task, discussed later, of finding an (optimal) sequencing of the objects along a continuum. Somewhat more generally, one possible data analysis strategy for seeking partitions as close to optimal as possible would be to construct an object ordering through an initial optimization process, possibly one based on another analysis method, that could then constrain the domain of search for an optimal partition. Obviously, if one were successful in generating an appropriate object ordering, partitions that would be optimal when constrained would also be optimal without the constraint. The key here is to have some mechanism for identifying an appropriate order that gives this possible equivalence (between an optimal constrained partition and one that is optimal without constraint) a chance to succeed. As one example of how such a process might be developed for constructing partitions based on an empirically generated ordering for the objects, a three-stage process is proposed in [1] and [2]. First, the objects to be partitioned are embedded in a Euclidean representation with a specific multidimensional scaling strategy. Second, by heuristic methods, a path among the $n$ objects in the Euclidean representation is identified (hopefully, with close to minimal length) and used to define a prior ordering for the objects and to constrain the subsets present in a partition. Finally, a recursive strategy of the same general form just described is carried out to obtain a partitioning of $S$.


Hierarchical Clustering 


The problem of hierarchical clustering will be characterized by the search for an optimal collection of partitions of $S$, which are denoted generically as $\mathcal{P}_1, \ldots, \mathcal{P}_n$. Here, $\mathcal{P}_1$ is the (trivial) partition where all $n$ objects from $S$ are placed into $n$ separate classes, $\mathcal{P}_n$ is the (also trivial) partition where a single subset contains all $n$ objects, and $\mathcal{P}_k$ is obtained from $\mathcal{P}_{k-1}$ by uniting some pair of classes present in $\mathcal{P}_{k-1}$. As an optimization criterion, the sum of transition costs between successive partitions in a hierarchy is minimized, irrespective of how the costs might be defined. Specifically, suppose $T(\mathcal{P}_{k-1}, \mathcal{P}_k)$ denotes some measure of transition cost between two partitions $\mathcal{P}_{k-1}$ and $\mathcal{P}_k$, where $\mathcal{P}_k$ is constructed from $\mathcal{P}_{k-1}$ by uniting two classes in the latter partition. An optimal partition hierarchy $\mathcal{P}_1, \ldots, \mathcal{P}_n$ will be one for which the sum of the transition costs, $\sum_{k=2}^{n} T(\mathcal{P}_{k-1}, \mathcal{P}_k)$, is minimized. To apply the GDPP, first define $n$ sets $\Omega_1, \ldots, \Omega_n$, where $\Omega_k$ contains all partitions of the $n$ objects in $S$ into $n - k + 1$ classes. The value $F(A_k)$ for $A_k \in \Omega_k$ is the optimal sum of transition costs up to the partition $A_k$; a transformation of an entity $A_{k-1} \in \Omega_{k-1}$ to $A_k \in \Omega_k$ is possible if $A_k$ is obtainable from $A_{k-1}$ by uniting two classes in $A_{k-1}$, and has cost $C(A_{k-1}, A_k) = T(A_{k-1}, A_k)$. Beginning with an assumed value for $F(A_1)$ of 0 for the single entity $A_1 \in \Omega_1$ (which is the partition of $S$ into subsets each containing a single object), and constructing $F(A_k)$ recursively for $k = 2, \ldots, n$, an optimal solution is identified by $F(A_n)$ for the single entity $A_n \in \Omega_n$, defined by the partition containing all $n$ objects in a single class.
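For small $n$ the hierarchical recursion can be run literally, with each entity of $\Omega_k$ stored as a frozenset of classes. In this sketch the transition cost $T$ is an assumption made only for illustration (the mean proximity between the two classes being united, via the hypothetical helper `average_link`); any $T$ computed from the pair of partitions could be substituted. Because $\Omega_k$ holds all partitions into $n - k + 1$ classes, the state space grows like the Bell numbers, so this is practical only for a handful of objects.

```python
from itertools import combinations


def average_link(P, a, b):
    """Hypothetical transition cost T: mean proximity between united classes."""
    return sum(P[i][j] for i in a for j in b) / (len(a) * len(b))


def optimal_hierarchy(P, T=average_link):
    """GDPP over partitions: each step unites two classes of A_{k-1}."""
    n = len(P)
    start = frozenset(frozenset([i]) for i in range(n))  # single entity of Omega_1
    F, back, layer = {start: 0.0}, {start: None}, [start]
    for _ in range(n - 1):                # move from Omega_k to Omega_{k+1}
        reached = set()
        for part in layer:
            for a, b in combinations(part, 2):
                new = (part - {a, b}) | {a | b}
                cand = F[part] + T(P, a, b)
                if cand < F.get(new, float("inf")):
                    F[new], back[new] = cand, (part, a, b)
                reached.add(new)
        layer = list(reached)
    final = frozenset([frozenset(range(n))])
    merges, node = [], final              # recover the sequence of unions
    while back[node] is not None:
        part, a, b = back[node]
        merges.append((sorted(a), sorted(b)))
        node = part
    return F[final], merges[::-1]
```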

A concept routinely encountered in discussions of hierarchical clustering is that of an ultrametric, which can be characterized by any nonnegative $n \times n$ symmetric dissimilarity matrix for distinct pairs of the objects in $S$, denoted generically as $U = \{u_{ij}\}$, where $u_{ij} = 0$ if and only if $i = j$, and the entries in $U$ satisfy the ultrametric inequality $u_{ij} \le \max\{u_{ik}, u_{jk}\}$ for $1 \le i, j, k \le n$. Any ultrametric identifies a specific partition hierarchy $\mathcal{P}_1, \mathcal{P}_2, \ldots$, where those object pairs defined between subsets united in $\mathcal{P}_{k-1}$ to form $\mathcal{P}_k$ all have a common ultrametric value; moreover, this latter value is not smaller than those for object pairs defined within these same subsets. One approach to the development of hierarchical clustering methods is to fit an ultrametric directly to $P$ by minimizing a loss criterion defined by an $L_p$-norm between $\{p_{ij}\}$ and a (to be identified) ultrametric matrix $\{u_{ij}\}$. To be specific, for a given partition hierarchy $\mathcal{P}_1, \ldots, \mathcal{P}_n$, let $C_u^{(k-1)}$ and $C_v^{(k-1)}$ denote the two classes united in $\mathcal{P}_{k-1}$ to form $\mathcal{P}_k$, and specify $b_{k-1}$ to be some appropriate aggregate (or 'average') value of the proximities for object pairs between $C_u^{(k-1)}$ and $C_v^{(k-1)}$. The loss functions used to index the adequacy of a given partition hierarchy in producing an ultrametric fitted to $P$ are, for the $L_1$-norm,

$$\sum_{k=2}^{n} \sum_{O_i \in C_u^{(k-1)}} \sum_{O_j \in C_v^{(k-1)}} \left|p_{ij} - b_{k-1}\right|,$$

where $b_{k-1}$ is the median proximity between $C_u^{(k-1)}$ and $C_v^{(k-1)}$; for the $L_2$-norm,

$$\sum_{k=2}^{n} \sum_{O_i \in C_u^{(k-1)}} \sum_{O_j \in C_v^{(k-1)}} \left(p_{ij} - b_{k-1}\right)^2,$$

where $b_{k-1}$ is the mean proximity between $C_u^{(k-1)}$ and $C_v^{(k-1)}$; and for the $L_\infty$-norm,

$$\max_{2 \le k \le n} \; \max_{O_i \in C_u^{(k-1)},\; O_j \in C_v^{(k-1)}} \left|p_{ij} - b_{k-1}\right|,$$

where $b_{k-1}$ is the average of the minimum and maximum proximities between $C_u^{(k-1)}$ and $C_v^{(k-1)}$. For all three $L_p$-norms, an optimal ultrametric will be one for which the order constraint on the between-subset aggregate values holds, $b_1 \le \cdots \le b_{n-1}$, and the norm is minimized. For such an optimal solution, $b_1, \ldots, b_{n-1}$ define the distinct entries in an (optimal) fitted ultrametric. To implement a dynamic programming approach for locating an optimal ultrametric, $C(A_{k-1}, A_k)$ is the incremental cost of transforming $A_{k-1}$ to $A_k$, characterized by the appropriate $L_p$-norm when that pair of subsets in $A_{k-1}$ is united to form $A_k$. As developed in detail in [11], an explicit admissibility criterion must also be imposed for defining a permissible transition from $A_{k-1}$ to $A_k$ that ensures a nondecreasing sequence of between-subset aggregate values.
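The following sketch fits an ultrametric under the $L_2$-norm for a given sequence of unions (hypothetical function names; the merge sequence is supplied, not optimized). It uses the mean between-class proximity as $b_{k-1}$, reports the $L_2$ loss, checks the order constraint $b_1 \le \cdots \le b_{n-1}$, and offers a check of the ultrametric inequality on the result. Using the median or the midrange of the between-class proximities instead (with the loss changed accordingly) would correspond to the $L_1$ and $L_\infty$ cases.

```python
def is_ultrametric(U, tol=1e-12):
    """Check symmetry, u_ij = 0 iff i = j, and u_ij <= max(u_ik, u_jk)."""
    n = len(U)
    for i in range(n):
        if abs(U[i][i]) > tol:
            return False
        for j in range(n):
            if i != j and U[i][j] <= tol:
                return False
            if abs(U[i][j] - U[j][i]) > tol:
                return False
            for k in range(n):
                if U[i][j] > max(U[i][k], U[j][k]) + tol:
                    return False
    return True


def fit_ultrametric_l2(P, merges):
    """merges: list of (class_a, class_b) united at each step, in order."""
    n = len(P)
    U = [[0.0] * n for _ in range(n)]
    levels, loss = [], 0.0
    for a, b in merges:
        pairs = [P[i][j] for i in a for j in b]
        level = sum(pairs) / len(pairs)   # mean proximity: the L2 aggregate b_{k-1}
        levels.append(level)
        loss += sum((p - level) ** 2 for p in pairs)
        for i in a:
            for j in b:
                U[i][j] = U[j][i] = level
    ordered = all(s <= t for s, t in zip(levels, levels[1:]))
    return U, levels, loss, ordered
```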


Constrained Hierarchical Clustering 


Analogously to the admissibility conditions for partitions, one constraint that might be imposed on each partition in $\Omega_k$ is for the constituent subsets to contain objects consecutive in some given ordering (which could be taken as $O_1 \prec \cdots \prec O_n$ without loss of any generality). Thus, $\Omega_k$ will be redefined to contain those partitions that include $n - k + 1$ classes, and where each class is a segment in the given object ordering.
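Under the consecutive-segment restriction, each partition can be stored as a tuple of index ranges, and only adjacent segments may be united. The sketch below (hypothetical names) reuses a mean-proximity transition cost purely as an illustrative assumption. Since a partition of $n$ ordered objects into $n - k + 1$ segments is fixed by its cut points, $\Omega_k$ now has only $\binom{n-1}{k-1}$ members instead of a Bell-number count.

```python
def average_link(P, a, b):
    """Hypothetical transition cost: mean proximity between united classes."""
    return sum(P[i][j] for i in a for j in b) / (len(a) * len(b))


def constrained_hierarchy(P, T=average_link):
    """GDPP over partitions whose classes are segments of 0 < 1 < ... < n-1."""
    n = len(P)
    start = tuple((i, i + 1) for i in range(n))   # segments as half-open ranges
    F, back, layer = {start: 0.0}, {start: None}, [start]
    for _ in range(n - 1):
        reached = set()
        for part in layer:
            for t in range(len(part) - 1):        # only adjacent segments unite
                a, b = part[t], part[t + 1]
                new = part[:t] + ((a[0], b[1]),) + part[t + 2:]
                cand = F[part] + T(P, range(*a), range(*b))
                if cand < F.get(new, float("inf")):
                    F[new], back[new] = cand, (part, a, b)
                reached.add(new)
        layer = list(reached)
    final = ((0, n),)
    merges, node = [], final
    while back[node] is not None:
        part, a, b = back[node]
        merges.append((a, b))
        node = part
    return F[final], merges[::-1]
```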


Optimal Sequencing of an Object Set 


A combinatorial optimization task closely related to both partitioning and hierarchical clustering is the search for an optimal sequencing of the object set $S$ based on the proximity matrix $P$. A best reordering is sought for the rows and columns of $P$ that will optimize, over all possible row/column reorderings, some specified measure of patterning for the entries of the reordered matrix. Irrespective of the particular measure chosen, the GDPP is specialized as follows: a collection of sets $\Omega_1, \ldots, \Omega_n$ is defined, where $\Omega_k$ includes all the subsets that have $k$ members from the integer set $\{1, \ldots, n\}$. The value $F(A_k)$ is the optimal contribution to the total measure of matrix patterning for the objects in $A_k$ when they occupy the first $k$ positions in the (re)ordering. A transformation is now possible between $A_{k-1} \in \Omega_{k-1}$ and $A_k \in \Omega_k$ if $A_{k-1} \subset A_k$ (i.e., $A_{k-1}$ and $A_k$ differ by one integer). The contribution to the total measure of patterning generated by placing the single integer in $A_k - A_{k-1}$ at the $k$th order position is $M(A_{k-1}, A_k)$. As always, the validity of the recursive process will require the incremental merit index, $M(A_{k-1}, A_k)$, to depend only on the unordered sets $A_{k-1}$ and $A_k$, and the complement $S - A_k$, and specifically not on how $A_{k-1}$ may have been obtained beginning with $\Omega_1$. Assuming $F(A_1)$ for all $A_1 \in \Omega_1$ are available, the recursive process can be carried out from $\Omega_1$ to $\Omega_n$, with $F(A_n)$ for the single set $A_n = \{1, \ldots, n\} \in \Omega_n$ defining the optimal value for the specified measure of matrix patterning.
Two general classes of measures of matrix patterning are mentioned here. The first is a row and column gradient index motivated by the ideal of reordering a symmetric proximity matrix to have an anti-Robinson form (the same structure noted briefly in the clustering context, when an optimal order-constrained partition might also be optimal when unconstrained). Specifically, suppose $\rho(\cdot)$ is some permutation of the first $n$ integers that reorders both the rows and columns of $P$ (i.e., $P_\rho = \{p_{\rho(i)\rho(j)}\}$). As noted earlier, the reordered matrix $P_\rho$ is said to have an anti-Robinson form if the entries within the rows and within the columns of $P_\rho$, moving away from the main diagonal in any direction, never decrease; or formally, two gradient conditions must be satisfied:

• within rows: $p_{\rho(i)\rho(k)} \le p_{\rho(i)\rho(j)}$ for $1 \le i < k < j \le n$;
• within columns: $p_{\rho(k)\rho(j)} \le p_{\rho(i)\rho(j)}$ for $1 \le i < k < j \le n$.

It might be noted that whenever $P$ is an ultrametric, or if $P$ has an exact Euclidean representation in a single dimension (i.e., $P = \{|x_i - x_j|\}$ for some collection of coordinate values $x_1, \ldots, x_n$), then $P$ can be row/column reordered to display a perfect anti-Robinson pattern. Thus, the notion of an anti-Robinson form can be interpreted as generalizing either a perfect discrete classificatory structure induced by a partition hierarchy (through an ultrametric) or the pattern expected in $P$ if there exists an exact unidimensional Euclidean representation for the objects in $S$. In any case, if a matrix can be row/column reordered to display an anti-Robinson form, then the objects are orderable along a continuum so that the degree of separation between objects in the ordering is reflected perfectly by the dissimilarity information in $P$; i.e., for the object ordering $O_{\rho(i)} \prec O_{\rho(k)} \prec O_{\rho(j)}$ (for $i < k < j$), $p_{\rho(i)\rho(k)} \le p_{\rho(i)\rho(j)}$ and $p_{\rho(k)\rho(j)} \le p_{\rho(i)\rho(j)}$.
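The two gradient conditions translate directly into a check over all triples $i < k < j$. The function below is a hypothetical helper, not taken from the cited sources. It can be applied, for instance, to $P = \{|x_i - x_j|\}$ with the objects sorted by $x_i$, or to a small ultrametric in a suitable order, both of which should pass.

```python
def is_anti_robinson(P, order):
    """Check both gradient conditions for P reordered by the permutation `order`."""
    n = len(order)
    R = [[P[order[i]][order[j]] for j in range(n)] for i in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            for j in range(k + 1, n):
                if R[i][k] > R[i][j]:   # within rows: p_{r(i)r(k)} <= p_{r(i)r(j)}
                    return False
                if R[k][j] > R[i][j]:   # within columns: p_{r(k)r(j)} <= p_{r(i)r(j)}
                    return False
    return True


x = [4, 0, 7, 1]
P = [[abs(a - b) for b in x] for a in x]
print(is_anti_robinson(P, sorted(range(4), key=lambda i: x[i])))
```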

A natural (merit) measure of how well a reordered proximity matrix $P_\rho$ satisfies these two gradient conditions would rely on an aggregate index of the violations/nonviolations over all distinct object triples, as given by the expression

$$\sum_{i<k<j} f\left(p_{\rho(i)\rho(j)}, p_{\rho(i)\rho(k)}\right) + \sum_{i<k<j} f\left(p_{\rho(i)\rho(j)}, p_{\rho(k)\rho(j)}\right), \qquad (1)$$

where $f(\cdot, \cdot)$ is some function indicating how a violation/nonviolation of a particular gradient condition for an object triple within a row or within a column (and defined above the main diagonal of $P_\rho$) is to be counted in the total measure of merit. The one option concentrated on here will be $f(z, y) = \operatorname{sign}(z - y)$, i.e., $+1$ if $z > y$, $0$ if $z = y$, and $-1$ if $z < y$; thus, the (raw) number of satisfactions minus the number of dissatisfactions of the gradient conditions within rows above the main diagonal of $P_\rho$ is given by the first term in (1), and the (raw) number of satisfactions minus dissatisfactions of the gradient conditions within columns above the main diagonal of $P_\rho$ is given by the second term. To carry out the GDPP based on the measure in (1), an explicit form must be given for the incremental contribution, $M(A_{k-1}, A_k)$, to the total merit measure of patterning generated by placing the single integer $A_k - A_{k-1}$ at the $k$th order position. For any ordering $\rho(\cdot)$ of the rows and columns of $P$, the merit increment for placing an integer, say, $k'$ ($= \rho(k)$) (i.e., $\{k'\} = A_k - A_{k-1}$) at the $k$th order position can be defined as $I_{\text{row}}(\rho(k)) + I_{\text{col}}(\rho(k))$, so that (1) equals $\sum_{k=1}^{n} I_{\text{row}}(\rho(k)) + \sum_{k=1}^{n} I_{\text{col}}(\rho(k))$, where

$$I_{\text{row}}(\rho(k)) = \sum_{i' \in A_{k-1}} \sum_{j' \in S - A_k} f\left(p_{i'j'}, p_{i'k'}\right),$$

$$I_{\text{col}}(\rho(k)) = \sum_{i' \in A_{k-1}} \sum_{j' \in S - A_k} f\left(p_{i'j'}, p_{k'j'}\right),$$

and $A_{k-1} = \{\rho(1), \ldots, \rho(k-1)\}$, $S - A_k = \{\rho(k+1), \ldots, \rho(n)\}$. Thus, letting $F(A_1) = 0$ for all $A_1 \in \Omega_1$, and using the specification for $f(\cdot, \cdot)$ suggested above, the recursion can be carried out to identify an optimal row/column reordering of the given proximity matrix $P$ that maximizes this gradient measure over all row/column reorderings of $P$.
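A minimal sketch of the GDPP for the gradient measure (1) follows, with $f(z, y) = \operatorname{sign}(z - y)$, subsets stored as bitmasks over 0-based subscripts, and $F(A_1) = 0$ (function names are hypothetical). The increment for placing object $k'$ uses only the set already placed ($A_{k-1}$) and the set still unplaced ($S - A_k$), which is exactly the dependence the recursion requires. The cost is roughly $O(2^n n^3)$, so $n$ should stay small.

```python
def sign(z):
    return (z > 0) - (z < 0)


def gradient_increment(P, before, k, after):
    """I_row + I_col for placing object k between `before` and `after`."""
    row = sum(sign(P[i][j] - P[i][k]) for i in before for j in after)
    col = sum(sign(P[i][j] - P[k][j]) for i in before for j in after)
    return row + col


def optimal_gradient_order(P):
    """Maximize measure (1) over all orderings by the subset recursion."""
    n = len(P)
    full = (1 << n) - 1
    F, back = {0: 0}, {}
    for A in range(1, full + 1):          # every A_{k-1} is numerically smaller
        for k in range(n):
            if not A >> k & 1:
                continue
            prev = A & ~(1 << k)          # A_{k-1} = A_k - {k}
            before = [i for i in range(n) if prev >> i & 1]
            after = [j for j in range(n) if not A >> j & 1]
            val = F[prev] + gradient_increment(P, before, k, after)
            if A not in F or val > F[A]:
                F[A], back[A] = val, k
    order, A = [], full                   # recover the ordering backward
    while A:
        order.append(back[A])
        A &= ~(1 << back[A])
    return F[full], order[::-1]
```

For an exact unidimensional matrix with distinct coordinates, every triple satisfies both conditions strictly in the sorted order, so the optimum equals $2\binom{n}{3}$.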

A second class of measures of matrix patterning can be derived indirectly from the auxiliary problem of attempting to fit a given proximity matrix $P$ by some type of unidimensional scaling representation (i.e., a seriation). Suppose the search is for a set of $n$ ordered coordinate values, $x_1 \le \cdots \le x_n$ (such that $\sum_k x_k = 0$), and a permutation $\rho(\cdot)$ to minimize the least squares criterion

$$\sum_{i<j} \left(p_{\rho(i)\rho(j)} - |x_j - x_i|\right)^2.$$

After some algebraic reduction (see [5]), this latter least squares criterion can be rewritten as

$$\sum_{i<j} p_{ij}^2 + n \sum_{k} \left(x_k - \frac{1}{n} G(\rho(k))\right)^2 - \frac{1}{n} \sum_{k} \left[G(\rho(k))\right]^2,$$

where

$$G(\rho(k)) = \sum_{i=1}^{k-1} p_{\rho(k)\rho(i)} - \sum_{i=k+1}^{n} p_{\rho(k)\rho(i)}.$$

If the measure

$$\sum_{k=1}^{n} \left[G(\rho(k))\right]^2 \qquad (2)$$

is maximized over all row/column reorderings of $P$, and the optimal permutation is denoted by $\rho^*(\cdot)$, then $G(\rho^*(1)) \le \cdots \le G(\rho^*(n))$, and the optimal coordinates can be retrieved as $x_k = (1/n)\, G(\rho^*(k))$, for $1 \le k \le n$. To execute the GDPP recursion using (2), the merit increment for placing the integer, say $k'$ ($= \rho(k)$) (i.e., $\{k'\} = A_k - A_{k-1}$), in the $k$th order position can be written as $[G(\rho(k))]^2$, where

$$G(\rho(k)) = \sum_{i' \in A_{k-1}} p_{k'i'} - \sum_{j' \in S - A_k} p_{k'j'},$$

with $A_{k-1} = \{\rho(1), \ldots, \rho(k-1)\}$, $S - A_k = \{\rho(k+1), \ldots, \rho(n)\}$, and $F(A_1)$ for $A_1 = \{k'\} \in \Omega_1$ defined by

$$F(A_1) = \left(\sum_{j' \in S - \{k'\}} p_{k'j'}\right)^2.$$

Optimal Sequencing Based on the Construction 
of Optimal Paths 


To tailor the GDPP (and for the moment emphasizing the minimization of the sum of adjacent object proximities in constructing a path among the objects in $S$), a collection of sets $\Omega_1, \ldots, \Omega_n$ is defined so that each entity in $\Omega_k$, $1 \le k \le n$, is now an ordered pair $(A_k, j_k)$. Here, $A_k$ is a $k$-element subset of the $n$ subscripts on the objects in $S$, and $j_k$ is one subscript in $A_k$ (to be interpreted as the subscript for the last-placed object in a sequencing of the objects contained within $A_k$). The function value $F((A_k, j_k))$ is the optimal contribution to the total measure of matrix patterning for the objects in $A_k$ when they are placed in the first $k$ positions in the (re)ordering, and the object with subscript $j_k$ occupies the $k$th. A transformation is possible between $(A_{k-1}, j_{k-1}) \in \Omega_{k-1}$ and $(A_k, j_k) \in \Omega_k$ if $A_{k-1} \subset A_k$ and $A_k - A_{k-1} = \{j_k\}$ (i.e., $A_{k-1}$ and $A_k$ differ by the one integer $j_k$). The cost increment $C((A_{k-1}, j_{k-1}), (A_k, j_k))$ is simply $p_{j_{k-1} j_k}$, for the contribution to the total measure of patterning generated by placing the object with the single integer subscript in $A_k - A_{k-1}$ at the $k$th order position (i.e., the proximity between the adjacently placed objects with subscripts $j_{k-1}$ and $j_k$). The type of GDPP recursion used for the construction of optimal linear paths can be modified easily for the construction of optimal circular paths: choose object $O_1$ as an (arbitrary) origin and force the construction of the optimal linear paths to include $O_1$ as the initial object by defining $F((A_1, j_1)) = 0$ for $j_1 = 1$ and $A_1 = \{1\}$, and otherwise by a very large positive or negative value (depending on whether the task is a minimization or a maximization, respectively). The function values $F((A_n, j_n))$ for all $j_n$, $1 \le j_n \le n$, for $(A_n, j_n) \in \Omega_n$ and $A_n = \{1, \ldots, n\}$ can then be used to obtain the optimal circular paths depending on the chosen optimization criteria as follows:

• minimum path length: $\min_{j_n} \left[F((A_n, j_n)) + p_{j_n 1}\right]$;
• maximum path length: $\max_{j_n} \left[F((A_n, j_n)) + p_{j_n 1}\right]$;
• minimax path length: $\min_{j_n} \left[\max\left(F((A_n, j_n)), p_{j_n 1}\right)\right]$;
• maximin path length: $\max_{j_n} \left[\min\left(F((A_n, j_n)), p_{j_n 1}\right)\right]$.
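The recursion over pairs $(A_k, j_k)$ is the familiar subset dynamic program for Hamiltonian paths. The sketch below (hypothetical names, minimization only) builds $F((A_k, j_k))$ for all starting objects, or for object 0 alone when a circular path is wanted; omitting the other starting entities plays the role of assigning them a very large value. The other three criteria would change only the combination step and the closing formula.

```python
import math


def path_table(P, start=None):
    """F[(A, j)]: minimum sum of adjacent proximities over orderings of the
    objects in bitmask A that end with object j."""
    n = len(P)
    starts = range(n) if start is None else [start]
    F = {(1 << j, j): 0.0 for j in starts}
    for A in range(1, 1 << n):            # supersets are numerically larger
        for j in range(n):
            if (A, j) not in F:
                continue
            for nxt in range(n):
                if A >> nxt & 1:
                    continue
                key = (A | 1 << nxt, nxt)
                cand = F[(A, j)] + P[j][nxt]   # cost increment p_{j_{k-1} j_k}
                if cand < F.get(key, math.inf):
                    F[key] = cand
    return F


def min_linear_path(P):
    full = (1 << len(P)) - 1
    F = path_table(P)
    return min(F[(full, j)] for j in range(len(P)))


def min_circular_path(P):
    """Force object 0 to be first, then close the path back to it."""
    full = (1 << len(P)) - 1
    F = path_table(P, start=0)
    return min(F[(full, j)] + P[j][0] for j in range(len(P)) if (full, j) in F)
```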


For the first discussions in the literature on construct- 
ing optimal paths through DP, see [4,9]; for applica- 
tions to a variety of data analysis tasks, see [13]. 


Optimal Ordered Partitions 


The task of constructing an ordered partition of an object set $S = \{O_1, \ldots, O_n\}$ into $M$ ordered classes, $S_1 \prec \cdots \prec S_M$, using some (merit) measure of matrix patterning and a proximity matrix $P$, can be approached through the GDPP recursive process applied to the partitioning task, but with appropriate variation in defining the merit increments. Explicitly, the sets $\Omega_1, \ldots, \Omega_M$ will each contain all $2^n - 1$ nonempty subsets of the $n$ object subscripts; $F(A_k)$ for $A_k \in \Omega_k$ is the optimal value for placing $k$ classes in the first $k$ positions, and the subset $A_k$ is the union of these $k$ classes. A transformation from $A_{k-1} \in \Omega_{k-1}$ to $A_k \in \Omega_k$ is possible if $A_{k-1} \subset A_k$; the merit increment $M(A_{k-1}, A_k)$ is based on placing the class $A_k - A_{k-1}$ at the $k$th position (which will depend on $A_{k-1}$, $A_k$, and $S - A_k$). Beginning with $F(A_1)$ for all $A_1 \in \Omega_1$ (i.e., the merit of placing the class $A_1$ at the first position), the recursion proceeds from $\Omega_1$ to $\Omega_M$, with $F(A_M)$ for $A_M = S \in \Omega_M$ defining the optimal merit value for an ordered partition into $M$ classes.

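To generalize the gradient measure given in (1), the merit increment for placing the class $A_k - A_{k-1}$ at the $k$th order position is $I_{\text{row}}(A_k - A_{k-1}) + I_{\text{col}}(A_k - A_{k-1})$, where

$$I_{\text{row}}(A_k - A_{k-1}) = \sum_{i' \in A_{k-1}} \sum_{k' \in A_k - A_{k-1}} \sum_{j' \in S - A_k} f\left(p_{i'j'}, p_{i'k'}\right),$$

and

$$I_{\text{col}}(A_k - A_{k-1}) = \sum_{i' \in A_{k-1}} \sum_{k' \in A_k - A_{k-1}} \sum_{j' \in S - A_k} f\left(p_{i'j'}, p_{k'j'}\right).$$

To initialize the recursion, let $F(A_1) = 0$ for all $A_1 \in \Omega_1$.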

A merit measure based on a coordinate representation for each of the $M$ ordered classes, $S_1 \prec \cdots \prec S_M$, that generalizes (2) can also be developed directly. Here, $M$ coordinates, $x_1 \le \cdots \le x_M$, are to be identified so that the residual sum-of-squares

$$\sum_{k<k'} \; \sum_{i_k \in S_k,\; j_{k'} \in S_{k'}} \left(p_{i_k j_{k'}} - |x_{k'} - x_k|\right)^2$$

is minimized (the notation $p_{i_k j_{k'}}$ indicates those proximities in $P$ defined between objects with subscripts $i_k \in S_k$ and $j_{k'} \in S_{k'}$). A direct extension of the argument that led to the optimal coordinate representation for single objects would require the maximization of

$$\sum_{k=1}^{M} \left(\frac{1}{n_k}\right) \left(G(A_k - A_{k-1})\right)^2, \qquad (3)$$

where

$$G(A_k - A_{k-1}) = \sum_{k' \in A_k - A_{k-1}} \left(\sum_{i' \in A_{k-1}} p_{k'i'} - \sum_{j' \in S - A_k} p_{k'j'}\right),$$

and $n_k$ denotes the number of objects in $A_k - A_{k-1}$. The merit increment for placing the subset $A_k - A_{k-1}$ at the $k$th order position would be $(1/n_k)\left(G(A_k - A_{k-1})\right)^2$, with the recursion initialized by

$$F(A_1) = \left(\frac{1}{n_1}\right) \left(\sum_{k' \in A_1} \sum_{j' \in S - A_1} p_{k'j'}\right)^2$$

for all $A_1 \in \Omega_1$. If an optimal ordered partition that maximizes (3) is denoted by $S_1^* \prec \cdots \prec S_M^*$, the optimal coordinates for each of the $M$ classes can be given as

$$x_k^* = \left(\frac{1}{n}\right) \left(\frac{G(S_k^*)}{n_k}\right),$$

where $x_1^* \le \cdots \le x_M^*$ and $\sum_k n_k x_k^* = 0$. A more complete discussion of constructing optimal ordered partitions appears in [10].
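For the coordinate-based merit (3), a sketch of the ordered-partition recursion follows. It assumes bitmask subsets over 0-based subscripts and uses hypothetical names; it maximizes (3) through the increments $(1/n_k)(G(A_k - A_{k-1}))^2$ and then recovers $x_k^* = G(S_k^*)/(n\, n_k)$. Because every submask of every subset is visited for each $k$, the work grows roughly like $M \cdot 3^n$ evaluations of $G$.

```python
def members(mask, n):
    return [i for i in range(n) if mask >> i & 1]


def size(mask):
    return bin(mask).count("1")


def G(P, prev, A):
    """G(A_k - A_{k-1}) with prev = A_{k-1} and A = A_k given as bitmasks."""
    n = len(P)
    before = members(prev, n)
    after = members(((1 << n) - 1) & ~A, n)
    return sum(sum(P[k][i] for i in before) - sum(P[k][j] for j in after)
               for k in members(A & ~prev, n))


def optimal_ordered_partition(P, M):
    n = len(P)
    full = (1 << n) - 1
    F = [dict() for _ in range(M + 1)]
    back = [dict() for _ in range(M + 1)]
    for A in range(1, full + 1):          # F(A_1) = (1/n_1) * G(A_1)^2
        F[1][A] = G(P, 0, A) ** 2 / size(A)
    for k in range(2, M + 1):
        for A in range(1, full + 1):
            B = (A - 1) & A
            while B:                      # proper nonempty submasks B = A_{k-1}
                if B in F[k - 1]:
                    val = F[k - 1][B] + G(P, B, A) ** 2 / size(A & ~B)
                    if val > F[k].get(A, float("-inf")):
                        F[k][A], back[k][A] = val, B
                B = (B - 1) & A
    classes, A = [], full                 # trace back from A_M = S
    for k in range(M, 1, -1):
        B = back[k][A]
        classes.append(A & ~B)
        A = B
    classes.append(A)
    classes.reverse()
    coords, prev = [], 0                  # x_k* = G(S_k*) / (n * n_k)
    for c in classes:
        coords.append(G(P, prev, prev | c) / (n * size(c)))
        prev |= c
    return F[M][full], [members(c, n) for c in classes], coords
```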


Heuristic Applications of the GDPP 


When faced with the task of finding a single optimal partition for a (large) object set $S$, if one had knowledge that for an optimal $M$-class partition the classes could be allocated to two (or more) groups, then the aggregate collections of the objects within these latter groups could be separately and optimally partitioned, and an optimal $M$-class partition for the complete object set identified directly. Or, if it were known that certain elemental subsets of the objects in $S$ had to appear within the classes of an optimal $M$-class partition, one could begin with these elemental subsets as the objects to be analyzed, and an optimal $M$-class partition could again be retrieved. The obvious difficulty is to identify either the larger aggregate groups that might be dealt with separately, or an appropriate collection of elemental subsets, in a size and number that might be handled by the recursive optimization strategy. For the latter task of identifying elemental subsets, one possible approach would be to begin with a partition of $S$ into several classes (possibly obtained through another heuristic process), where each class contained a number of objects that could be optimally analyzed. Based on these separate subset analyses, a (tentative) collection of elemental subsets would be identified. These could then be used to obtain a subdivision of $S$, and again within each group of this subdivision, the objects could be optimally partitioned to generate a possibly better collection of elemental subsets. This process could be continued until no change occurred in the particular elemental subsets identified. As an alternative, one could start with some collection of tentative elemental subsets obtained through another (heuristic) optimization strategy and try, if possible, to improve upon these through the same type of procedure. This latter approach is illustrated in [12]. Similarly, the tasks of constructing a (hopefully optimal) partition hierarchy or object order for a (large) set could be approached through the identification of a collection of elemental subsets, which would then be operated on as the basic entities for the generation of a partition hierarchy or an object sequence.
Even though dynamic programming [1] was originally developed for systems with discrete types of decisions, it can be applied to continuous problems as well. In this article the application of dynamic programming to the solution of continuous-time optimal control problems is discussed.


Problem Formulation 


Consider the following continuous-time dynamical system:

$$\dot z(t) = f(z(t), u(t)), \qquad z(0) = z_0, \qquad 0 \le t \le T, \qquad (1)$$

where $z(t) \in \mathbb{R}^n$ is the state vector at time $t$ with time derivative given by $\dot z(t)$, $u(t) \in U \subset \mathbb{R}^m$ is the control vector at time $t$, $U$ is the set of control constraints, and $T$ is the terminal time. The function $f(z(t), u(t))$ is continuously differentiable with respect to $z$ and continuous with respect to $u$. The set of admissible control trajectories is given by the piecewise constant functions $\{u(t) : u(t) \in U, \ \forall t \in [0, T]\}$. It is assumed that for any admissible control trajectory a state trajectory $z(t)$ exists and is unique. For a full treatment of existence and uniqueness, see [4].

The objective is to determine a control trajectory and the corresponding state trajectory that minimize a cost function of the form

$$h(z^u(T)) + \int_0^T g(z^u(t), u(t))\, dt, \qquad (2)$$

where $z^u(\cdot)$ denotes the state trajectory generated by the control $u(\cdot)$, and the functions $g$ and $h$ are continuously differentiable with respect to both $z$ and $u$.


Example 


As a simple example, consider the problem of moving a unit mass from an initial point to a given final point. The position of the mass along a line is given by the state $z_1(t)$ and its velocity by $z_2(t)$. The control $u(t)$ is the force applied to the mass, and it is bounded: $u(t) \in [-1, 1]$. This system is described by

$$\dot z_1(t) = z_2(t), \qquad \dot z_2(t) = u(t), \qquad z(0) = [z_1(0), z_2(0)], \qquad t \in [0, T], \qquad u(t) \in [-1, 1].$$

The objective is to move this mass as near to the final state point $[\bar z_1, \bar z_2]$ as possible. This can be formulated as the minimization of the squared error at the final time point,

$$\sum_{i=1}^{2} \left(z_i(T) - \bar z_i\right)^2.$$

Converting this cost function into the form given by (2) results in

$$h(z(T)) = \sum_{i=1}^{2} \left(z_i(T) - \bar z_i\right)^2, \qquad g(z^u(t), u(t)) = 0, \quad \forall t \in [0, T].$$


Hamilton-Jacobi-Bellman Equation 


The time horizon is divided into $N$ equally spaced intervals with $\delta = T/N$. This converts the problem into the discrete-time domain, and the dynamic programming approach can be applied. Once the approach is applied, the result is converted back into the continuous-time domain by taking the limit as $\delta \to 0$. The result is the following partial differential equation:

$$0 = \min_{u \in U} \left[ g(z, u) + \nabla_t J^*(t, z) + \nabla_z J^*(t, z)' f(z, u) \right], \quad \forall t, z, \qquad J^*(T, z) = h(z), \quad \forall z, \qquad (3)$$

where $J^*(t, z)$ is the optimal cost-to-go function. This equation is called the Hamilton–Jacobi–Bellman equation. It is also referred to as the continuous-time analog of the dynamic programming equation.
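To see the discretization behind the Hamilton–Jacobi–Bellman equation at work, the sketch below runs the discrete-time recursion $J_k(z) = \min_u [\delta\, g(z, u) + J_{k+1}(z + \delta f(z, u))]$ backward from $J_N = h$. It assumes a hypothetical scalar problem, simpler than the unit-mass example: $\dot z = u$ with $u \in [-1, 1]$ (searched only over $\{-1, 0, 1\}$), $g = 0$ and $h(z) = z^2$, whose exact cost-to-go at $t = 0$ is $(\max(|z| - T, 0))^2$. Choosing a state grid of spacing $\delta$ keeps each Euler step on the grid; the function name `backward_dp` is illustrative.

```python
def backward_dp(T, N, z_max, h):
    """Discrete-time DP for the hypothetical system z' = u, |u| <= 1, g = 0.

    With delta = T/N and a state grid of spacing delta, the Euler step
    z + delta*u stays on the grid for u in {-1, 0, 1}.
    """
    delta = T / N
    m = round(z_max / delta)
    grid = [i * delta for i in range(-m, m + 1)]
    J = [h(z) for z in grid]                  # J*(T, z) = h(z)
    last = len(grid) - 1
    for _ in range(N):                        # one backward step per interval
        # stage cost delta * g is zero here, so only the cost-to-go remains
        J = [min(J[min(max(i + u, 0), last)]  # clamp at the grid boundary
                 for u in (-1, 0, 1))
             for i in range(len(grid))]
    return grid, J


grid, J0 = backward_dp(1.0, 10, 3.0, lambda z: z * z)
```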


Pontryagin Minimum Principle 


It is possible to derive the Pontryagin minimum principle using the Hamilton–Jacobi–Bellman equation given above. Using the system given by (1), the classic principle results:

$$\dot p(t) = -\nabla_z H(z^*(t), u^*(t), p(t)),$$

$$p(T) = \nabla h(z^*(T)),$$

$$H(z^*(t), u^*(t), p(t)) = g(z^*(t), u^*(t)) + p'(t)\, f(z^*(t), u^*(t)),$$

$$u^*(t) = \arg\min_{u \in U} H(z^*(t), u, p(t)),$$

where $z^*(t)$ and $u^*(t)$ are the optimal state and control trajectories, respectively.
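The minimum principle turns the control problem into a two-point boundary value problem, which can be solved by shooting. The sketch assumes a hypothetical scalar problem chosen so that the answer is known: $\dot z = u$, $g(z, u) = u^2/2$, $h(z) = z^2/2$, $U = \mathbb{R}$. Then $H = u^2/2 + p u$, so $u^* = -p$; the costate equation gives $\dot p = 0$; and $p(T) = z(T)$ yields $p = z_0/(1 + T)$. The code (hypothetical names `shoot` and `solve_pmp`) integrates with explicit Euler steps and bisects on $p(0)$.

```python
def shoot(p0, z0, T, N):
    """Integrate state and costate forward (explicit Euler) with u = argmin_u H."""
    dt = T / N
    z, p = z0, p0
    for _ in range(N):
        u = -p          # minimizes H = u**2 / 2 + p * u
        z += dt * u     # state equation: z' = f(z, u) = u
        # costate equation: p' = -dH/dz = 0 here, so p stays constant
    return z, p


def solve_pmp(z0, T, N=1000, lo=-10.0, hi=10.0, iters=60):
    """Bisection on p(0) so that the terminal condition p(T) = h'(z(T)) = z(T) holds."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        zT, pT = shoot(mid, z0, T, N)
        if pT - zT > 0:   # residual p(0) * (1 + T) - z0 increases with p(0)
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


print(solve_pmp(2.0, 1.0))   # close to z0 / (1 + T) = 1.0
```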

A more detailed description of these two results is given in the following sections. For dynamic programming and optimal control problems, see [2], as well as the classic optimal control text [3].
Dynamic programming addresses models of decision-making systems of an inherently sequential character. The problem of interest is defined as follows. We consider a discrete-time dynamic system:

$$x_{k+1} = f(x_k, u_k, \omega_k), \qquad k = 0, 1, \ldots$$

The state transitions, $f$, that define the evolution of the system from time $k$ to time $k + 1$ depend on the current state of the system, $x_k$, on external disturbances, $\omega_k$, which are considered to be random variables, and finally on a set of control, or policy, actions, $u_k$. The state of the system, $x_k$, $k = 0, 1, \ldots$, is an element of a space $S$, the control variables, $u_k$, $k = 0, 1, \ldots$, belong to a space $C$, and the random external disturbance belongs to a countable space $D$. The control variables are such that $u_k \in U(x_k) \subset C$, $k = 0, 1, \ldots$, and depend on the current state $x_k$. The random disturbances, $\omega_k$, $k = 0, 1, \ldots$, have identical, known distributions which depend on the current state and control, $P(\omega_k \mid x_k, u_k)$. Note that $\omega_k$ does not depend on previous values of the disturbances, but may depend explicitly on the values of $x_k$ and $u_k$. Given an initial state $x_0$, the problem is to find a control law $\pi = \{\mu_0, \mu_1, \ldots\}$, belonging to the set of admissible policies, $\Pi$, which is the set of all sequences of functions $\pi = \{\mu_0, \mu_1, \ldots\}$ with

$$\mu_k : S \to C, \qquad \mu_k(x_k) \in U(x_k), \qquad \forall x_k \in S, \; k = 0, 1, \ldots,$$

that minimizes the cost functional

$$J_\pi(x_0) = \lim_{N \to \infty} E\left\{ \sum_{k=0}^{N-1} \alpha^k\, g(x_k, \mu_k(x_k), \omega_k) \right\}.$$

The optimal cost function $J^*$ is thus defined as

$$J^*(x) = \min_{\pi \in \Pi} J_\pi(x), \qquad x \in S.$$


The cost $J_\pi(x_0)$, for any $x_0 \in S$ and a given policy $\pi$, represents the limit of the expected finite-horizon costs, and these are well defined. The discounted problems with bounded cost per stage are such that the following assumption holds:

Assumption 1
1) For all $(x, u, \omega) \in S \times C \times D$, the functions defining the cost per stage $g$ are uniformly bounded: $|g(x, u, \omega)| \le M$;
2) $M \in \mathbb{R}$, and $0 < \alpha < 1$.

This type of problem was first addressed through the pioneering work of D. Blackwell [6]. The scalar $\alpha$ is the discount factor, and the range of its admissible values implies that future costs matter less than costs incurred at the present time, particularly when the cost per stage has a monetary interpretation. Mathematically, the presence of the discount factor guarantees the finiteness of the cost functional (indeed $|J_\pi(x)| \le M/(1 - \alpha)$) provided that the per-stage costs are bounded uniformly. Furthermore, although the assumption of an infinite number of stages may never be satisfied in practice, it constitutes a reasonable approximation for problems involving a large number of stages. A rather typical example of a discounted infinite-horizon dynamic problem is the so-called asset selling problem, where the reward for selling a particular asset at a given time $k$ diminishes as time progresses.

For any function J: S — R we define the operator 


(T() as: 
(TI)(x) = min Efg(x,u.0) + all fle, u.o))}. 


This is in essence the function obtained when apply- 

ing the standard dynamic programming mapping to J. 

Note that (TJ) represents essentially the optimal cost for 

a one-stage problem that has stage cost g and terminal 

cost aJ. For this operator, it can be shown, [4], that: 

• For any bounded function $J$, the optimal cost function satisfies:

$$J^*(x) = \lim_{k \to \infty} (T^k J)(x), \quad \forall x \in S.$$

In other words, the dynamic programming algorithm converges to the optimal cost function. The above result relies on Assumption 1. It should be noted that the operator $(TJ)$ can be shown to be:

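1) monotonic:

$$J(x) \le J'(x),\ \forall x \in S \;\Rightarrow\; (T^k J)(x) \le (T^k J')(x),\ \forall x \in S,\ k = 1, 2, \dots$$

for any functions $J: S \to \mathbb{R}$ and $J': S \to \mathbb{R}$, and
2) contractive:

$$\max_{x \in S} \big|(T^k J)(x) - (T^k J')(x)\big| \le \alpha^k \max_{x \in S} \big|J(x) - J'(x)\big|, \quad k = 1, 2, \dots$$

for any bounded functions $J: S \to \mathbb{R}$ and $J': S \to \mathbb{R}$.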
Both of these properties are important not only for showing the theoretical convergence of the dynamic programming algorithm but also for constructing numerical solution schemes. Furthermore, the optimal cost function $J^*$ can be shown to satisfy Bellman's equation, i.e., $\forall x \in S$:

$$J^*(x) = \min_{u \in U(x)} E_\omega\big\{ g(x, u, \omega) + \alpha J^*\big(f(x, u, \omega)\big) \big\},$$

in other words: $J^* = TJ^*$. This proposition essentially defines the necessary and sufficient condition for the optimality of a stationary policy $\mu$, i.e., $\mu(x)$ is optimal if and only if it attains the minimum in Bellman's equation for every $x \in S$.
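As a minimal numerical sketch of the operator $T$ and its properties, the Python code below assumes a hypothetical problem with three states and two controls; the arrays `P` (with `P[u, i, j]` playing the role of $p_{ij}(u)$), `g` (with `g[i, u]` the expected stage cost) and the value of `alpha` are invented for illustration. It iterates $T$ from $J = 0$ to approximate $J^*$ and then checks Bellman's equation and the contraction inequality for $k = 1$.

```python
import numpy as np

# Hypothetical finite problem (not from the text): 3 states, 2 controls.
# P[u, i, j] = p_ij(u); every row sums to one.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],  # control u = 0
    [[0.3, 0.4, 0.3], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]],  # control u = 1
])
g = np.array([[2.0, 1.0], [0.5, 3.0], [4.0, 1.5]])  # g[i, u], expected stage cost
alpha = 0.9                                         # discount factor, 0 < alpha < 1


def T(J):
    """Bellman operator: (TJ)(i) = min_u [g(i,u) + alpha * sum_j p_ij(u) J(j)]."""
    Q = g + alpha * np.einsum("uij,j->iu", P, J)
    return Q.min(axis=1)


def value_iteration(J, tol=1e-10, max_iter=10_000):
    """Compute T J, T^2 J, ... until successive iterates differ by less than tol."""
    for _ in range(max_iter):
        J_next = T(J)
        if np.max(np.abs(J_next - J)) < tol:
            return J_next
        J = J_next
    raise RuntimeError("value iteration did not converge")


J_star = value_iteration(np.zeros(3))
rng = np.random.default_rng(0)
J1, J2 = rng.normal(size=3), rng.normal(size=3)
print(np.max(np.abs(T(J_star) - J_star)))                                 # Bellman residual
print(np.max(np.abs(T(J1) - T(J2))) <= alpha * np.max(np.abs(J1 - J2)))  # contraction check
```

The stopping rule only controls the distance between successive iterates; by the contraction property the residual $\max_x |(TJ)(x) - J(x)|$ of the returned iterate is then smaller still.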

For the case where the state, control and disturbance spaces are finite, i.e., each set has a definite number of elements which can be found in principle, [9], several approaches exist for numerically solving the discounted problem with bounded cost per stage. It should be pointed out that under these conditions the problem is equivalent to a finite-state Markov chain. The first, value iteration, is based on the successive computation of $TJ, T^2J, \dots$, since we know that $\lim_{k \to \infty} (T^k J) = J^*$. Recall that $(TJ)$ is defined as the minimum, with respect to the controls, of the expected value over all possible disturbances. Therefore, asymptotically we approach the optimal cost as well as the optimal policy. Tight upper and lower bounds on the iterates can be derived, [3,4], which substantially improve the convergence rate of the successive approximations. More specifically, it can be shown that for every vector $J$, state $i$, and iteration $k$:

$$(T^k J)(i) + \underline{c}_k \le J^*(i) \le (T^k J)(i) + \bar{c}_k,$$

where:

$$\underline{c}_k = \frac{\alpha}{1-\alpha} \min_{i=1,\dots,n} \big[(T^k J)(i) - (T^{k-1} J)(i)\big],$$

$$\bar{c}_k = \frac{\alpha}{1-\alpha} \max_{i=1,\dots,n} \big[(T^k J)(i) - (T^{k-1} J)(i)\big].$$
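The bounds above can be computed alongside the plain value iteration at negligible extra cost. The sketch below uses hypothetical three-state, two-control data (the arrays `P`, `g` and the value of `alpha` are illustrative assumptions, not from the text) and returns, after $k \ge 1$ iterations, the iterate $T^kJ$ together with the lower and upper estimates $T^kJ + \underline{c}_k$ and $T^kJ + \bar{c}_k$ of $J^*$.

```python
import numpy as np

# Hypothetical data (illustrative only): P[u, i, j] = p_ij(u), g[i, u] = g(i, u).
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],
    [[0.3, 0.4, 0.3], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]],
])
g = np.array([[2.0, 1.0], [0.5, 3.0], [4.0, 1.5]])
alpha = 0.9


def T(J):
    """Bellman operator for the finite problem above."""
    return (g + alpha * np.einsum("uij,j->iu", P, J)).min(axis=1)


def value_iteration_with_bounds(J, k):
    """Return T^k J and the bracket [T^k J + c_lo, T^k J + c_hi] containing J* (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    for _ in range(k):
        TJ = T(J)
        diff = TJ - J                                # T^k J - T^(k-1) J
        c_lo = alpha / (1 - alpha) * diff.min()
        c_hi = alpha / (1 - alpha) * diff.max()
        J = TJ
    return J, J + c_lo, J + c_hi


Jk, lower, upper = value_iteration_with_bounds(np.zeros(3), k=10)
print(Jk, lower, upper)
```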
In fact, these error bounds can be used to further prove the finite convergence of the value iteration after $k \le \bar{k}$ steps, $\bar{k} \in \mathbb{N}$. It can also be observed that instead of performing the value iteration simultaneously for all states, one can perform the iteration in a Gauss-Seidel fashion, [10]. The contractive characteristics of the resulting operator $(FJ)$ make it possible to develop similar schemes. Instead of iterating on the operator $(TJ)$, we define a new sequence based on the operator $(FJ)$:

$$(FJ)(1) = \min_{u \in U(1)} \Big[ g(1, u) + \alpha \sum_{j=1}^{n} p_{1j}(u) J(j) \Big],$$

$$(FJ)(i) = \min_{u \in U(i)} \Big[ g(i, u) + \alpha \sum_{j=1}^{i-1} p_{ij}(u) (FJ)(j) + \alpha \sum_{j=i}^{n} p_{ij}(u) J(j) \Big], \quad i = 2, \dots, n.$$


In fact, when the error bounds are not used, a very interesting property can be shown, [4]:
• If $J$ satisfies:

$$J(i) \le (TJ)(i) \le J^*(i), \quad i = 1, 2, \dots, n,$$

then:

$$(T^k J)(i) \le (F^k J)(i) \le J^*(i), \quad i = 1, \dots, n, \ k = 1, 2, \dots$$

In other words, the Gauss-Seidel iteration converges faster than the ordinary, i.e., Jacobi, value iteration. An excellent treatment of the comparisons between Gauss-Seidel and Jacobi iterations and their parallel implementation can be found in [14]. Although the value iteration can be shown to be convergent even when the state and control spaces are infinite, the actual implementation can only proceed via approximations. In other words, instead of actually computing $TJ$ we can only compute a function $J'$ such that $\max_{x \in S} |J'(x) - (TJ)(x)| \le \epsilon$. Such approximate methods are called for not only when the spaces are infinite but also when they contain so many states that exact computation is impractical. Any function $J'$ that satisfies the above criterion can in principle be used. Details regarding discretization approaches and computational techniques for addressing infinite state spaces can be found in [2,11].
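The Jacobi/Gauss-Seidel comparison can be observed directly. In the sketch below (hypothetical data again; `P`, `g` and `alpha` are illustrative, with all stage costs nonnegative), the Gauss-Seidel sweep `F` updates states in order and reuses the values already computed in the current sweep. Starting from $J = 0$, which satisfies $J \le TJ \le J^*$ because the costs are nonnegative, the Gauss-Seidel iterates stay componentwise between the Jacobi iterates and $J^*$, as the property above states.

```python
import numpy as np

# Hypothetical data (illustrative only): P[u, i, j] = p_ij(u), g[i, u] >= 0.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],
    [[0.3, 0.4, 0.3], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]],
])
g = np.array([[2.0, 1.0], [0.5, 3.0], [4.0, 1.5]])
alpha = 0.9


def T(J):
    """Jacobi (ordinary) value iteration step: every state uses the old J."""
    return (g + alpha * np.einsum("uij,j->iu", P, J)).min(axis=1)


def F(J):
    """Gauss-Seidel step: state i already uses the new values (FJ)(j) for j < i."""
    J = J.copy()
    for i in range(len(J)):
        # P[:, i, :] has shape (controls, states); J[:i] already holds new values
        J[i] = np.min(g[i] + alpha * P[:, i, :] @ J)
    return J


J_jacobi = np.zeros(3)   # J = 0 satisfies J <= TJ <= J* since all g(i, u) >= 0
J_gs = np.zeros(3)
for k in range(1, 11):
    J_jacobi, J_gs = T(J_jacobi), F(J_gs)
    print(k, J_jacobi, J_gs)   # componentwise J_jacobi <= J_gs <= J*
```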

The value iteration, thus far presented, is based on successive evaluations of the cost functions. Early on, [1], it was suggested that an alternative approach is to iterate on policies so as to generate a sequence of stationary policies, each with a cost improved over that of the preceding one. This method is known as policy iteration. The method proceeds in three steps:
1) Initialize a stationary control policy $\mu^0$.
2) Given a stationary policy $\mu^k$, evaluate the cost function $J_{\mu^k}$ by solving:

$$(I - \alpha P_{\mu^k}) J_{\mu^k} = g_{\mu^k}.$$

3) Obtain a new policy $\mu^{k+1}$ such that it satisfies: $T_{\mu^{k+1}} J_{\mu^k} = T J_{\mu^k}$.
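In the above, the matrix $P_\mu$ is the transition probability matrix for a given stationary policy $\mu$, given by:

$$P_\mu = \begin{pmatrix} p_{11}(\mu(1)) & \cdots & p_{1n}(\mu(1)) \\ \vdots & \ddots & \vdots \\ p_{n1}(\mu(n)) & \cdots & p_{nn}(\mu(n)) \end{pmatrix},$$

and $g_\mu$ the associated cost vector:

$$g_\mu = \begin{pmatrix} g(1, \mu(1)) \\ \vdots \\ g(n, \mu(n)) \end{pmatrix}.$$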


Termination is detected once $J_{\mu^k} = TJ_{\mu^k}$, i.e., once a fixed point of the operator $T$ has been identified. Notice that because the policy space is assumed to be finite, the algorithm terminates in a finite number of steps. As with value iteration, infinite state and control spaces pose problems when implementing policy iteration. Specifically, the policy evaluation and policy improvement steps can only be performed via approximations.
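A compact sketch of the three steps follows, again on hypothetical data (`P`, `g`, `alpha` are illustrative assumptions). Policy evaluation solves the linear system $(I - \alpha P_\mu) J_\mu = g_\mu$ exactly with `numpy.linalg.solve`; the loop stops as soon as the current policy already attains the minimum in $T J_\mu$ (up to a small tolerance), i.e., when $J_\mu = TJ_\mu$.

```python
import numpy as np

# Hypothetical data (illustrative only): P[u, i, j] = p_ij(u), g[i, u] = g(i, u).
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],
    [[0.3, 0.4, 0.3], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]],
])
g = np.array([[2.0, 1.0], [0.5, 3.0], [4.0, 1.5]])
alpha = 0.9


def policy_iteration(mu):
    """Policy iteration; mu[i] is the control applied at state i."""
    n = len(mu)
    states = np.arange(n)
    while True:
        P_mu = P[mu, states, :]                  # row i is p_i.(mu(i))
        g_mu = g[states, mu]                     # g_mu(i) = g(i, mu(i))
        J_mu = np.linalg.solve(np.eye(n) - alpha * P_mu, g_mu)   # policy evaluation
        Q = g + alpha * np.einsum("uij,j->iu", P, J_mu)          # Q[i, u]
        if np.all(Q[states, mu] <= Q.min(axis=1) + 1e-12):       # J_mu = T J_mu: stop
            return mu, J_mu
        mu = Q.argmin(axis=1)                                    # policy improvement


mu_opt, J_opt = policy_iteration(np.zeros(3, dtype=int))
print(mu_opt, J_opt)
```

Keeping the current control whenever it is already optimal (the tolerance test) prevents the loop from cycling between policies that tie.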

In [5] an adaptive aggregation method is proposed to address the issue of occasional slow convergence. The fundamental premise is to lump states of the original problem so as to generate a problem of smaller dimension. In other words, the state space $S$ is partitioned into disjoint subsets as $S = S_1 \cup \dots \cup S_m$. Given such a partitioning, one can further define the transition probabilities for the aggregate states as:

$$r_{ij} = \sum_{s \in S_i} q_{is} \sum_{t \in S_j} p_{st}(\mu(s)),$$

which is the probability that the next state will belong to $S_j$ given that the current state belongs to $S_i$. The $q_{is}$ are the elements of an $m \times n$ matrix $Q$, such that $q_{is} \neq 0$ only if $s \in S_i$.
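To make the aggregate transition probabilities concrete, the sketch below assumes a hypothetical four-state chain under a fixed policy $\mu$, the partition $S_1 = \{1, 2\}$, $S_2 = \{3, 4\}$ (0-based indices in the code), and uniform weights $q_{is}$ over the states of each $S_i$; none of these values come from [5]. With a 0/1 membership matrix $W$ ($W_{tj} = 1$ if $t \in S_j$), the double sum defining $r_{ij}$ is simply the matrix product $Q P_\mu W$.

```python
import numpy as np

# Hypothetical chain under a fixed policy mu: P_mu[s, t] = p_st(mu(s)).
P_mu = np.array([[0.5, 0.3, 0.1, 0.1],
                 [0.2, 0.6, 0.1, 0.1],
                 [0.1, 0.1, 0.4, 0.4],
                 [0.0, 0.2, 0.3, 0.5]])
groups = [[0, 1], [2, 3]]                 # S_1 = {0, 1}, S_2 = {2, 3}

# q_is: assumed uniform weights over the states of S_i (zero outside S_i).
Q = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])

W = np.zeros((P_mu.shape[0], len(groups)))
for j, S_j in enumerate(groups):
    W[S_j, j] = 1.0                       # W[t, j] = 1 if t belongs to S_j

R = Q @ P_mu @ W                          # r_ij = sum_{s in S_i} q_is sum_{t in S_j} p_st
print(R)
```

Because each row of $Q$ and of $P_\mu$ sums to one, each row of the aggregate matrix $R$ does as well.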

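Finally, [7] noticed that, since any $J$ satisfying $J \le TJ$ also satisfies $J \le J^* = TJ^*$, the optimal cost, and from it the optimal policy, can be derived from the solution of the following linear programming problem:

$$\max \sum_{i \in S} \lambda_i$$

$$\text{s.t.} \quad \lambda_i \le g(i, u) + \alpha \sum_{j=1}^{n} p_{ij}(u) \lambda_j, \quad i = 1, \dots, n, \ u \in U(i).$$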


In the above formulation $p_{ij}(u)$ denote the transition probabilities: $p_{ij}(u) = P(x_{k+1} = j \mid x_k = i, u_k = u)$, $i, j \in S$, $u \in U(i)$. These can either be given or derived from the discrete dynamic system $x_{k+1} = f(x_k, u_k, \omega_k)$ and the known probability distribution $P(\cdot \mid x, u)$ of the input disturbance $\omega_k$. Linear programming formulations can also be used to derive cost and policy evaluation approximations. One possibility is to approximate $J^*$ by a set of known basis functions as $J'(x, r) = \sum_{k=1}^{m} r_k w_k(x)$. The vector $r$ is an $m$-dimensional vector of parameters to be determined, and for each state $x$ we have chosen a set of known scalars $w_k(x)$. The vector $r$ can be determined as the solution of:

$$\max \sum_{x \in S'} J'(x, r)$$

$$\text{s.t.} \quad J'(x, r) \le g(x, u) + \alpha \sum_{y \in S} p_{xy}(u) J'(y, r), \quad x \in S' \subset S, \ u \in \bar{U}(x) \subset U(x).$$


Furthermore, the cost function $J_\mu$ for a given policy $\mu$ can be approximated via linear programming formulations by identifying a vector $r$ so as to minimize $\max_{x \in S} |J'(x, r) - J_\mu(x)|$. This can be shown, [4], to be equivalent to solving:

$$\min z$$

$$\text{s.t.} \quad \Big| J'(x, r) - g(x, \mu(x)) - \alpha \sum_{y \in S} p_{xy}(\mu(x)) J'(y, r) \Big| \le z, \quad x \in S' \subset S.$$


Extensions of the general ideas are discussed in [12], where work on including constraints in the general formulation of the discounted dynamic programming problem is presented. Furthermore, [8] expanded the scope of these models to address dynamic programming optimization problems involving multiple criteria by identifying the set of non-inferior, Pareto optimal, solutions.
Dynamic programming deals with situations where optimal decisions are sought in systems operating in stages. Events occur in a specific order, such that the decision at time $k+1$ depends on the state of the system at time $k$. In general the key variables of the basic formulation are as follows:
• $k$ represents discrete time;
• $x_k$ represents the state of the system at time $k$;
• $\mu_k(x_k)$ represents the control, or decision, variable to be selected at time $k$;
• $\omega_k$ represents a random disturbance occurring at time $k$;
• $N$ represents the time horizon.

Given the aforementioned variables, the basic dynamic programming formulation requires the following ingredients:
• a discrete-time dynamic system:

$$x_{k+1} = f_k(x_k, \mu_k, \omega_k);$$

• an additive cost function of the form:

$$g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k, \omega_k),$$

where $g_k$ corresponds to the cost incurred at time $k$.

One therefore wishes to identify the control policy $\pi = \{\mu_0, \dots, \mu_{N-1}\}$ which minimizes the expected cost:

$$J^*(x_0) = \min_\pi J_\pi(x_0) = \min_\pi E\Big\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k\big(x_k, \mu_k(x_k), \omega_k\big) \Big\}.$$

The expectation operator is needed because, in the presence of the random parameters $\omega_k$, the cost function itself becomes a random variable. As a further complication, one might minimize the expected cost not only for a given initial state of the system, $x_0$, but also with respect to all possible initial states.

Infinite horizon problems are further characterized by the fact that the number of stages $N$ is infinite. In such a case, the cost functional over an infinite number of stages for a given control policy $\pi = \{\mu_0, \mu_1, \dots\}$ and initial state $x_0$ is given by:

$$J_\pi(x_0) = \lim_{N \to \infty} E\Big\{ \sum_{k=0}^{N-1} \alpha^k g_k\big(x_k, \mu_k(x_k), \omega_k\big) \Big\}.$$

The factor $\alpha$ is termed the discount factor and is a positive scalar $0 < \alpha \le 1$; a value $\alpha < 1$ simply implies that future costs matter less than similar costs incurred at the present time. Infinite horizon problems are by definition the limit of the corresponding $N$-stage problem, as $N \to \infty$. Three points are pivotal in the analysis of infinite horizon dynamic programming problems:

• The optimal cost for the infinite horizon is the limit of the corresponding $N$-stage optimal cost, i.e., $J^* = \lim_{N \to \infty} J_N$.
• The optimal costs satisfy Bellman's equation, i.e.,

$$J^*(x) = \min_{u \in U(x)} E_\omega\big\{ g(x, u, \omega) + \alpha J^*\big(f(x, u, \omega)\big) \big\}.$$

• If $\mu(x)$ attains the minimum in Bellman's equation for every $x$, then the stationary policy $\pi = \{\mu, \mu, \dots\}$ should be optimal.


The assumption of an infinite number of stages may not be satisfied in practice but is a very important one in terms of analyzing the asymptotic behavior of systems involving a finite but large number of stages. Depending on the nature of the cost per stage and the discount factor, the following categories of infinite horizon dynamic programming problems can be identified, [1]:

• stochastic shortest path problems: this problem is actually a generalization of the deterministic shortest path problem in the sense that at each state we select not a successor but rather a probability distribution $p_{ij}(\mu)$ over successors. Obviously, if $p_{ij}(\mu) = 1$ for a unique state $j$, then we recover the deterministic shortest path problem. One key feature of the stochastic shortest path problem is that the termination state $t$ is a cost-free termination state: once the system reaches that state it never leaves it. In other words, $p_{tt}(\mu) = 1$ and $g(t, \mu) = 0$ for all policies $\mu$. In effect, the horizon is finite but its actual length is random. Furthermore, there exists at least one policy for which the destination state will be reached inevitably. A key assumption required for guaranteeing eventual termination states that there exists an integer $m$ such that for every initial state and policy, there is a positive probability that the termination state will be reached after no more than $m$ stages.

• discounted problems with bounded cost per stage: this type of infinite horizon dynamic programming encompasses problems for which:

$$|g(x, \mu, \omega)| \le M, \quad \forall (x, \mu, \omega) \in S \times C \times D,$$

i.e., there exists a finite scalar $M$ that bounds the per stage cost. Furthermore, the discount factor is such that $0 < \alpha < 1$. Both of these conditions are important in order to show that:

$$\lim_{N \to \infty} E\Big\{ \sum_{k=N}^{\infty} \alpha^k g\big(x_k, \mu_k(x_k), \omega_k\big) \Big\} = 0,$$

since this tail is bounded in absolute value by $\alpha^N M / (1 - \alpha)$. Boundedness and discounting result in successive approximation mappings which are contraction mappings, [2], thus proving the convergence of such schemes to the optimal solution of discounted infinite horizon dynamic programming problems with bounded costs.

• undiscounted problems: this type of infinite horizon problem covers situations in which the discount factor $\alpha = 1$, which greatly complicates the analysis. The key distinction is that the lack of a discount factor may result in infinite costs even when the per stage cost is bounded.

• average cost per stage problems: in cases where neither discounting nor a cost-free termination state exists, it is often meaningful to optimize the average per stage cost starting from state $i$:

$$J(i) = \lim_{N \to \infty} \frac{1}{N} E\Big\{ \sum_{k=0}^{N-1} g\big(x_k, \mu_k(x_k), \omega_k\big) \,\Big|\, x_0 = i \Big\}.$$

In essence, what is assumed is that for most problems of interest the average and the optimal per stage cost are independent of the initial state. As a result, costs incurred in the early stages do not matter, since their contributions vanish, i.e., for any fixed $K$,

$$\lim_{N \to \infty} \frac{1}{N} E\Big\{ \sum_{k=0}^{K} g\big(x_k, \mu_k(x_k), \omega_k\big) \Big\} = 0.$$


For discrete state and control spaces, it is helpful to consider the associated finite-state Markov chain. Let the state space $S$ consist of $n$ states, denoted by $1, \dots, n$:

$$S = \{1, \dots, n\}.$$

The transition probabilities from state $i$ to state $j$ are:

$$p_{ij}(u) = P(x_{k+1} = j \mid x_k = i, u_k = u), \quad i, j \in S, \ u \in U(i).$$

The dynamics of the state transitions $x_{k+1} = f(x_k, u_k, \omega_k)$ can actually be used to compute the transition probabilities. Given the above, the per stage expected cost can be expressed as $g(i, u) = \sum_{j=1}^{n} p_{ij}(u)\, g'(i, u, j)$. With these definitions, a very important mapping can now be defined:

$$(TJ)(i) = \min_{u \in U(i)} \Big[ g(i, u) + \alpha \sum_{j=1}^{n} p_{ij}(u) J(j) \Big], \quad i = 1, 2, \dots, n,$$

and also

$$(T_\mu J)(i) = g(i, \mu(i)) + \alpha \sum_{j=1}^{n} p_{ij}(\mu(i)) J(j), \quad i = 1, 2, \dots, n.$$

The latter operator can be written in vector form as:

$$T_\mu J = g_\mu + \alpha P_\mu J.$$

Therefore, a stationary policy $\mu$ has a corresponding cost $J_\mu$, which is the solution of the equation:

$$(I - \alpha P_\mu) J_\mu = g_\mu.$$


Computationally, two major families of approaches exist for determining the optimal additive costs and the optimal policies. The first one, value iteration, is based on the idea of successive approximations. It can be shown, under conditions depending on the specific type of infinite horizon problem, that:

$$\lim_{k \to \infty} (T^k J)(i) = J^*(i), \quad i = 1, \dots, n.$$

This property essentially implies that the successive application of the mapping $T$ will in the limit provide the optimal cost.

On the other hand, policy iteration operates on the policy space and tries to identify a sequence of stationary policies converging to the optimal one. In all cases, the following three basic steps define the iteration:

• Initialization: guess an initial stationary policy, $\mu^0$.

• Policy evaluation: given a stationary policy $\mu^k$, compute the corresponding cost function $J_{\mu^k}$ from the system:

$$(I - \alpha P_{\mu^k}) J_{\mu^k} = g_{\mu^k}.$$

• Policy improvement: obtain a new stationary policy $\mu^{k+1}$ satisfying:

$$T_{\mu^{k+1}} J_{\mu^k} = T J_{\mu^k}.$$
Consider the problem of ordering a quantity of a certain item at each of $N$ periods so as to meet some stochastic demand. In mathematical terms the problem is defined as follows:

• $x_k$, the stock of a particular commodity available at the beginning of the $k$th period.

• $u_k$, the stock to be ordered and immediately delivered at the beginning of the $k$th period.

• $\omega_k$, the demand during the $k$th period, whose probability distribution is assumed to be known.

The demands are assumed to be independent random variables for each time period $k$. A simple stock balance at the beginning of each time period provides the discrete-time evolution equation:

$$x_{k+1} = x_k + u_k - \omega_k.$$

In other words, the stock at the beginning of period $k+1$ equals the stock at period $k$ plus the ordered stock minus the demand at period $k$. The form of this stock balance equation is very important and lies at the heart of the analysis of such problems. The one just presented is, as will be seen, one of the two major assumptions regarding the stock balance equations. Given the above definitions, the cost incurred at period $k$ has two components:

• a cost $r(x_k)$ representing a penalty either for positive stock (storage) or for negative stock (shortage, i.e., unfilled demand);

• a purchasing cost $c u_k$, where $c$ is the per unit cost of the ordered stock.
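The stock balance and the two cost components can be traced by hand for a short, fixed order sequence. The sketch below assumes, for illustration only, the penalty $r(x) = 3\max(0, -x) + \max(0, x)$ (a shortage cost of 3 and a holding cost of 1 per unit), a unit cost $c = 1$, and no terminal cost; negative stock levels represent backlogged demand.

```python
def simulate_backlog(x0, orders, demands, c, r):
    """Stock balance x_{k+1} = x_k + u_k - w_k with backlogging (x_k < 0 allowed).

    Accumulates the period cost r(x_k) + c * u_k; no terminal cost is included.
    """
    x, total, path = x0, 0.0, [x0]
    for u, w in zip(orders, demands):
        total += r(x) + c * u
        x = x + u - w
        path.append(x)
    return path, total


def r(x):
    """Hypothetical penalty: 3 per unit short, 1 per unit held."""
    return 3 * max(0, -x) + 1 * max(0, x)


path, cost = simulate_backlog(x0=2, orders=[0, 3, 0], demands=[1, 4, 2], c=1.0, r=r)
print(path, cost)   # [2, 1, 0, -2] 6.0
```

The final state $-2$ records two units of unfilled demand carried forward; under the lost sales assumption discussed below it would instead be truncated to zero.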

The problem just described is known as the inventory control problem, one of the most important problems in the area of operations research. The preceding formulation illustrates its main characteristics:

• a discrete-time system that defines the system evolution in time, of the form:

$$x_{k+1} = f_k(x_k, u_k, \omega_k);$$

• a set of independent random disturbances, representing commodity demands;

• a set of control constraints that depend on the state of the system at time $k$, $x_k$, that is $u_k \in U(x_k)$;

• a period of $N$ time intervals over which the operating cost has an additive form:

$$E\Big\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, \omega_k) \Big\};$$

and, finally,

• we wish to select the control actions at every time interval $k$ optimally, so as to minimize the cost of operating the inventory system over all possible control policies.

Clearly, the above definition formulates the inventory control problem as a dynamic programming problem in which we try to minimize an expected additive cost function.
Stochastic inventory problems were first considered in [6,10], where an abstract stochastic inventory model that allowed for possible constraints on the inventory after ordering was considered. In the literature of stochastic inventory models, there are two different assumptions about the excess demand left unfilled from existing inventories: the backlog assumption and the lost sales assumption. These affect the form of the stock balance equation. The backlog assumption is historically more popular in the literature because of early inventory studies of spare parts management problems. This assumption essentially states that unfilled demand is accumulated and satisfied at later times. The lost sales assumption states that unfilled demand is lost, which is the situation arising in retail establishments. Under either assumption, an important issue has been to establish the optimality of the $(s, S)$-type policy, [7]. It defines a very simple replenishment rule:

$$\mu_k(x_k) = \begin{cases} S_k - x_k, & x_k < s_k, \\ 0, & \text{otherwise}. \end{cases}$$
The above rule is referred to as the $(s, S)$ policy: if the inventory level at the beginning of a period is less than the reorder point $s$, a sufficient quantity is ordered to raise the inventory to the level $S$ upon replenishment; otherwise nothing is ordered. The key concept of K-convexity, [1], was instrumental in proving the optimality of $(s, S)$ policies. A real-valued function $g$ is K-convex, where $K \ge 0$, if:

$$K + g(z + y) \ge g(y) + z \left( \frac{g(y) - g(y - b)}{b} \right), \quad \forall z \ge 0,\ b > 0,\ y.$$
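The definition can be checked numerically on a finite grid, which gives a quick (non-rigorous) sanity test. The hypothetical helper `is_k_convex` below tests the inequality for integer $y$, integer $z \ge 0$ and integer $b \ge 1$ in a bounded range; it confirms that $y^2$ is 0-convex and that a downward step of height 5 is 5-convex but not convex, illustrating that K-convexity is weaker than convexity.

```python
def is_k_convex(g, K, ys=range(-10, 11), zs=range(0, 11), bs=range(1, 11), tol=1e-12):
    """Check K + g(z + y) >= g(y) + z * (g(y) - g(y - b)) / b on a finite integer grid."""
    for y in ys:
        for z in zs:
            for b in bs:
                if K + g(z + y) < g(y) + z * (g(y) - g(y - b)) / b - tol:
                    return False
    return True


def square(y):
    return y * y


def step(y):
    return 5.0 if y < 0 else 0.0   # drops by 5 at y = 0, so it is not convex


print(is_k_convex(square, K=0), is_k_convex(step, K=5), is_k_convex(step, K=0))
# True True False
```

A grid check can only refute K-convexity; a `True` result is evidence, not a proof.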
The parameter $K$ is the fixed cost associated with a positive inventory order:

$$C(u) = \begin{cases} K + cu, & u > 0, \\ 0, & u = 0. \end{cases}$$


The concept of K-convexity is one of the most important tools for the analysis of inventory control problems. It generalizes the concept of convexity and is instrumental in proving the optimality of policies in inventory control problems. Regarding K-convexity, [2], the following hold true:

1) A real-valued convex function $g$ is also 0-convex and hence also K-convex for all $K \ge 0$.

2) If $g_1(y)$ and $g_2(y)$ are K-convex and L-convex ($K \ge 0$, $L \ge 0$), respectively, then $\alpha g_1(y) + \beta g_2(y)$ is $(\alpha K + \beta L)$-convex for all $\alpha > 0$ and $\beta > 0$.

3) If $g(y)$ is K-convex and $\omega$ is a random variable, then $E_\omega\{g(y - \omega)\}$ is also K-convex, provided $E_\omega\{|g(y - \omega)|\} < \infty$ for all $y$.

4) If $g$ is a continuous K-convex function and $g(y) \to \infty$ as $|y| \to \infty$, then there exist scalars $s$ and $S$ with $s \le S$ such that:

a) $g(S) \le g(y)$, for all scalars $y$;

b) $g(S) + K = g(s) < g(y)$, for all $y < s$;

c) $g(y)$ is a decreasing function on $(-\infty, s)$;

d) $g(y) \le g(z) + K$, for all $y, z$ with $s \le y \le z$.

Define further a holding/shortage cost as:

$$r(x) = p \max(0, -x) + h \max(0, x),$$

and the function $H$ as:

$$H(y) = p\, E\{\max(0, \omega_k - y)\} + h\, E\{\max(0, y - \omega_k)\}.$$


Application of the dynamic programming algorithm with zero final cost gives:

$$J_k(x_k) = \min\Big[ G_k(x_k),\ \min_{u_k > 0} \big\{ K + G_k(x_k + u_k) \big\} \Big] - c x_k,$$

with $G_k(y)$ defined as:

$$G_k(y) = c y + H(y) + E\{J_{k+1}(y - \omega_k)\}$$

and $y_k = x_k + u_k$. Because of the K-convexity of $G_k$, it can actually be shown, [7], that the $(s, S)$ policy is optimal. See [11] for the optimality of the $(s, S)$ policy in the case of lost sales. For the case where the unfilled demand is not backlogged but rather lost, the system dynamic equation is defined as:

$$x_{k+1} = \max(0, x_k + u_k - \omega_k).$$


Additional K-convexity results and the optimality of the $(s, S)$ policy for the case of lost sales are presented and analyzed in [4]. In most real-life situations, finite storage capacity imposes an upper bound on the inventory that can be kept. The recent analysis of [3] considers the multiproduct inventory model with stochastic demands and warehousing constraints. This is a fairly general model in that it does not allow for surplus disposal, $u_k \ge 0$, and imposes constraints on the stored stock, $x_k + u_k \in \Gamma$.
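For the backlogged problem, the recursion for $J_k$ and $G_k$ can be carried out numerically, and the $(s, S)$ structure then shows up in the computed decisions. The sketch below is illustrative only: it assumes integer stock levels, a demand taking the values $0, \dots, 4$ with invented probabilities, hypothetical costs $K = 5$, $c = 1$, $p = 4$, $h = 1$, a horizon $N = 4$, and post-order levels capped at $U = 40$. In every period the computed decisions should either order nothing or raise the stock to a single level $S_k$, with ordering exactly for $x_k$ below a threshold $s_k$.

```python
import numpy as np


def inventory_dp(N, K, c, p, h, demand_probs, L=-10, U=40):
    """Backward DP for the backlogged inventory problem with fixed ordering cost K.

    Integer stock levels; demand w takes values 0..len(demand_probs)-1.
    Post-order levels are capped at U (an assumption of this sketch).
    Returns, per period k, the grid of x_k values and the chosen post-order level y_k.
    """
    w = np.arange(len(demand_probs))
    probs = np.asarray(demand_probs, dtype=float)
    wmax = int(w[-1])
    next_lo = L - N * wmax
    J_next = np.zeros(U - next_lo + 1)            # J_N = 0 on the grid [next_lo, U]
    policy = [None] * N
    for k in reversed(range(N)):
        lo = L - k * wmax                          # J_k is needed on [lo, U]
        ys = np.arange(lo, U + 1)
        # H(y) = p E[max(0, w - y)] + h E[max(0, y - w)]
        H = np.array([probs @ (p * np.maximum(0, w - y) + h * np.maximum(0, y - w)) for y in ys])
        EJ = np.array([probs @ J_next[y - w - next_lo] for y in ys])
        G = c * ys + H + EJ                        # G_k(y)
        J = np.empty(len(ys))
        target = ys.copy()
        for idx in range(len(ys)):
            best = G[idx]                          # option: do not order
            if idx + 1 < len(ys):
                j = idx + 1 + int(np.argmin(G[idx + 1:]))
                if K + G[j] < best - 1e-12:        # option: order up to ys[j]
                    best, target[idx] = K + G[j], ys[j]
            J[idx] = best - c * ys[idx]
        policy[k] = (ys, target)
        J_next, next_lo = J, lo
    return policy


policy = inventory_dp(N=4, K=5.0, c=1.0, p=4.0, h=1.0,
                      demand_probs=[0.1, 0.2, 0.3, 0.25, 0.15])
for k, (xs, y) in enumerate(policy):
    order = y > xs
    print(f"period {k}: s = {xs[order].max() + 1}, S = {y[order][0]}")
```

The grid for $J_k$ is widened by the maximum demand at each earlier step so that $J_{k+1}(y - \omega)$ is always available without clipping.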

Similar ideas pertain to the analysis of inventory control problems over an infinite horizon. Infinite horizon problems need not correspond to physically realistic situations, but they nevertheless provide the vehicle for a thorough analysis of the asymptotic response of the inventory system. A discounted version of the backlogged problem can be stated as:

$$J_\pi(x_0) = \lim_{N \to \infty} E\Big\{ \sum_{k=0}^{N-1} \alpha^k \Big( c\,\mu_k(x_k) + H\big(x_k + \mu_k(x_k) - \omega_k\big) \Big) \Big\},$$

where:

$$H(y) = p \max(0, -y) + h \max(0, y).$$

The case of $\alpha < 1$, i.e., a discounted infinite horizon problem, has also been analyzed, [8], and the existence of an optimal state-dependent $(s, S)$-type policy for problems with discounted costs was rigorously established.

The classical papers [5,10] were also devoted to stochastic inventory problems with the criterion of long-run average cost. In other words, one is interested in minimizing an average expected cost over an infinite horizon:

$$J_\pi(x_0) = \lim_{N \to \infty} \frac{1}{N} E\Big\{ \sum_{k=0}^{N-1} \Big( c\,\mu_k(x_k) + H\big(x_k + \mu_k(x_k) - \omega_k\big) \Big) \Big\}.$$

This analysis also concludes that $(s, S)$-type policies are optimal for the long-run average cost problem as well. The $(s, S)$-type optimal policies are very important, and they have been shown to be optimal for a wide variety of inventory problems, including systems with continuous demands and discrete order sizes, [9], i.e., cases where the orders $u_k$ are assumed to be nonnegative integers, as well as cases with special structure in the form of periodicity of various components of the formulation such as demands, prices, and costs, [2].

Undoubtedly, one of the most appealing features of inventory theory has been the fact that $(s, S)$ policies are optimal for the class of dynamic inventory problems with random demands. However, real-life inventory problems impose constraints that make the assumptions underlying the analysis appear too restrictive. The nature of demand, for instance, is an important factor in determining optimal policies. Classical models have assumed demand in each period to be a random variable independent of demands in other periods and of environmental factors at other times. However, fluctuating economic conditions and uncertain market conditions can have a major effect. Furthermore, various constraints observed in real life limit the nature of ordering decisions and inventory levels. The recent work of [8] addresses such issues so as to incorporate cyclic or seasonal demand, as well as constraints on the ordering periods, storage, and service levels. Even so, it is shown that $(s, S)$ policies remain optimal for these generalized models.
Discrete-Time Optimal Control


The cost function in the standard $k$-stage discrete-time optimal control problem is defined by

$$J = l_{k+1}(x_{k+1}) + \sum_{i=1}^{k} l_i(u_i, x_i) \qquad (1)$$

and the recursion

$$x_1 \ \text{given}, \qquad x_{i+1} = f_i(u_i, x_i), \quad i = 1, \dots, k. \qquad (2)$$

In these equations, $u_i$ is a control vector in $\mathbb{R}^m$ and $x_i$ is a state vector in $\mathbb{R}^n$. For each vector-valued $k$-tuple $u = (u_1, \dots, u_k)$ in the direct sum $\mathbb{R}^{km} = \bigoplus_{i=1}^{k} \mathbb{R}^m$, there is a unique state vector-valued $(k+1)$-tuple $x(u) = (x_1(u), \dots, x_{k+1}(u)) \in \mathbb{R}^{(k+1)n}$ satisfying (2), and a corresponding unique value $J(u)$ in (1). For present purposes, the state transition functions $f_i$, the terminal loss function $l_{k+1}$ and the stage-wise loss functions $l_i$, $i = 1, \dots, k$, are assumed to be twice continuously differentiable. The functions $x(\cdot)$ and $J(\cdot)$ are then also twice continuously differentiable, and Newton's method is formally applicable to the problem of minimizing $J(\cdot)$ over $\mathbb{R}^{km}$. Moreover, for fixed $m$ and $n$ the $km \times km$ linear system associated with the Newton iteration map for (1), (2) can be solved efficiently with dynamic programming recursions in $O(k)$ floating point operations as $k$ increases without bound. In contrast, it requires $O(k^3)$ floating point operations to assemble and solve the Newtonian linear system for a general cost function $J$ on $\mathbb{R}^{km}$ by standard Gaussian elimination methods.

The following discussion conforms to [4]. See [7] 
and [8] for an alternative development with connec- 
tions to differential dynamic programming, and for a re- 
lated but nonequivalent treatment of discrete-time op- 
timal control problems based on the Riccati transfor- 
mation. For analogous constructions in the setting of 
continuous-time optimal control problems, see [4] and 
the original papers [5] and [6]. For extensions to New- 
tonian projection methods and input-constrained opti- 
mal control problems, see [2] and [3]. 


Newton’s Method 


If $J$ is any continuously differentiable real-valued function on $\mathbb{R}^N$ with global or local minimizing vectors $\bar{u}$, then all such vectors must satisfy the first order necessary condition,

$$\nabla J(u) = 0, \qquad (3)$$

where

$$\nabla J(u) = \left( \frac{\partial J}{\partial u_1}(u), \dots, \frac{\partial J}{\partial u_N}(u) \right).$$

A solution of (3) is called a stationary point.

Condition (3) comprises $N$ scalar equations in $N$ scalar unknowns $u_j$. If $J$ is a quadratic function, then (3) is a linear system which can be treated with standard elimination algorithms or other techniques capable of exploiting whatever structure may exist in the coefficient matrix for (3). On the other hand, if $J$ is a nonquadratic nonlinear function, then (3) is a nonlinear system and iterative methods are generally needed to generate successive approximations to a solution of (3). One such method is Newton's recursive linearization scheme,

$$u \leftarrow u + v, \qquad \nabla J(u) + \nabla^2 J(u)\, v = 0. \qquad (4)$$


When $J$ is twice continuously differentiable, the vector-valued map $\nabla J(\cdot): \mathbb{R}^N \to \mathbb{R}^N$ is continuously differentiable and its first differential at $u$ is the Hessian operator $\nabla^2 J(u): \mathbb{R}^N \to \mathbb{R}^N$ defined by

$$\big(\nabla^2 J(u)\, w\big)_i = \sum_{j=1}^{N} \frac{\partial^2 J}{\partial u_i \partial u_j}(u)\, w_j$$

for $i = 1, \dots, N$. In such cases, (4) is formally applicable to the nonlinear system (3). Furthermore, if $\nabla^2 J(\bar{u})$ is invertible at a solution $\bar{u}$ of (3), then in some neighborhood $\mathcal{N}$ of $\bar{u}$, $\nabla^2 J(u)$ is also invertible and for each starting point $u^0 \in \mathcal{N}$, the iteration (4) generates a sequence of vectors $u^0, u^1, \dots$, which remain in $\mathcal{N}$ and converge rapidly to $\bar{u}$. More precisely, either $u^i = \bar{u}$ eventually, or the errors $\|u^i - \bar{u}\| = \langle u^i - \bar{u}, u^i - \bar{u} \rangle^{1/2}$ satisfy the superlinear convergence condition,

$$\lim_{i \to \infty} \frac{\|u^{i+1} - \bar{u}\|}{\|u^i - \bar{u}\|} = 0. \qquad (5)$$
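The rapid convergence in (5) is easy to observe numerically. The sketch below applies iteration (4) to a hypothetical smooth, strictly convex test function $J(u) = e^{u_1 + u_2} + u_1^2 + u_2^2$ (not taken from the text), whose unique stationary point has $u_1 = u_2 \approx -0.2836$, and prints the successive error ratios $\|u^{i+1} - \bar{u}\| / \|u^i - \bar{u}\|$, using the final iterate as a stand-in for $\bar{u}$.

```python
import numpy as np


def newton(grad, hess, u0, tol=1e-12, max_iter=50):
    """Newton's iteration (4): solve hess(u) v = -grad(u), then set u <- u + v."""
    u = np.asarray(u0, dtype=float)
    iterates = [u.copy()]
    for _ in range(max_iter):
        g = grad(u)
        if np.linalg.norm(g) < tol:            # stationary point reached, see (3)
            break
        v = np.linalg.solve(hess(u), -g)
        u = u + v
        iterates.append(u.copy())
    return u, iterates


# Hypothetical test function: J(u) = exp(u1 + u2) + u1^2 + u2^2
def grad_J(u):
    e = np.exp(u[0] + u[1])
    return np.array([e + 2 * u[0], e + 2 * u[1]])


def hess_J(u):
    e = np.exp(u[0] + u[1])
    return np.array([[e + 2, e], [e, e + 2]])


u_bar, its = newton(grad_J, hess_J, [0.0, 0.0])
errors = [np.linalg.norm(x - u_bar) for x in its]
ratios = [e1 / e0 for e0, e1 in zip(errors, errors[1:]) if e0 > 0]
print(u_bar, ratios)   # the ratios shrink toward 0, as in (5)
```

The Hessian of this test function is positive definite everywhere, so the stationary point found is a regular local (here global) minimizer in the sense of (7).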


A solution of (3) at which $\nabla^2 J(u)$ is invertible is said to be a regular stationary point. Note that solutions of (3) can be local maximizers or saddle points of $J$, and that regular points of this kind can also attract the Newton iterates. Hence for minimization problems, a simple steepest descent iteration is often employed at the outset to seek out likely starting points $u^0$ for (4) near some regular local minimizer of $J$.

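If $J$ is twice continuously differentiable, then every global or local minimizer $\bar{u}$ must also satisfy the second order necessary condition,

$$\forall v \in \mathbb{R}^N, \quad \langle v, \nabla^2 J(\bar{u})\, v \rangle \ge 0, \qquad (6)$$

where $\langle \cdot, \cdot \rangle$ is the standard Euclidean inner product,

$$\langle v, w \rangle = \sum_{i=1}^{N} v_i w_i.$$

In $\mathbb{R}^N$, a stationary point $\bar{u}$ is therefore a regular local minimizer if and only if $\nabla^2 J(\bar{u})$ is positive definite, i.e.,

$$\forall v \in \mathbb{R}^N, \quad v \ne 0 \;\Rightarrow\; \langle v, \nabla^2 J(\bar{u})\, v \rangle > 0. \qquad (7)$$

The gap between (6) and (7) is not large, hence regular local minimizers are commonly encountered in $\mathbb{R}^N$.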

By continuity, property (7) extends to $\nabla^2 J(u)$ in some neighborhood of $\bar{u}$. At each fixed $u$ in this neighborhood, the linear system in Newton's iteration (4) is equivalent to a corresponding unconstrained accessory minimum problem

$$v \in \arg\min_{v \in \mathbb{R}^N} \varphi(v) \qquad (8)$$

with a strictly convex quadratic cost function

$$\varphi(v) = \langle \nabla J(u), v \rangle + \tfrac{1}{2} \langle v, \nabla^2 J(u)\, v \rangle \qquad (9)$$

and a unique global minimizer $v$. This equivalence is computationally significant for unconstrained minimization problems in general and discrete-time optimal control problems in particular.


The Accessory Minimum Problem 


If $J$ is the cost function of a discrete-time optimal control problem, then it can be shown that the accessory minimum problem (8)-(9) is also a discrete-time control problem with quadratic loss functions and linear state transition functions. Near a regular minimizer for $J$, this linear-quadratic (LQ) problem has a strictly convex cost function that can be minimized with dynamic programming recursions. The required control-theoretic construction for $\varphi$ and the related dynamic programming algorithms are outlined below.

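In the following development, the symbol $u$ may denote a vector in $\mathbb{R}^m$ or a vector-valued $k$-tuple in $\mathbb{R}^{km}$. Similarly, $x$ may indicate a vector in $\mathbb{R}^n$ or a vector-valued $(k+1)$-tuple in $\mathbb{R}^{(k+1)n}$, and the bracket $\langle \cdot, \cdot \rangle$ may denote the Euclidean inner product in any of the spaces $\mathbb{R}^m$, $\mathbb{R}^n$, $\mathbb{R}^{km}$, or $\mathbb{R}^{(k+1)n}$. In each case, the correct interpretation is always clear from the context. Now suppose that $J(\cdot)$ is defined by (1)-(2) on $\mathbb{R}^{km}$, and fix $u \in \mathbb{R}^{km}$. Then for all $v \in \mathbb{R}^{km}$ the chain rule gives,

$$\langle \nabla J(u), v \rangle = \frac{d}{ds} J(u + sv)\Big|_{s=0} = \sum_{i=1}^{k+1} \langle \nabla_x l_i, y_i \rangle + \sum_{i=1}^{k} \langle \nabla_u l_i, v_i \rangle \qquad (10)$$

and

$$\langle v, \nabla^2 J(u)\, v \rangle = \frac{d^2}{ds^2} J(u + sv)\Big|_{s=0} = \sum_{i=1}^{k+1} \Big( \langle \nabla_x l_i, z_i \rangle + \langle y_i, \nabla_{xx} l_i\, y_i \rangle \Big) + 2 \sum_{i=1}^{k} \langle y_i, \nabla_{xu} l_i\, v_i \rangle + \sum_{i=1}^{k} \langle v_i, \nabla_{uu} l_i\, v_i \rangle, \qquad (11)$$

where

$$y_i = \frac{d}{ds} x_i(u + sv)\Big|_{s=0}, \qquad z_i = \frac{d^2}{ds^2} x_i(u + sv)\Big|_{s=0}, \qquad (12)$$

and where all partial gradients and Hessians of $l_{k+1}$ and $l_i$, $i = 1, \dots, k$, are evaluated at $x_{k+1}(u) \in \mathbb{R}^n$ and $(u_i, x_i(u)) \in \mathbb{R}^m \oplus \mathbb{R}^n$, respectively.

Equations (2) and (12) and the chain rule also establish that $y_i$ and $z_i$ are recursively generated by the equations of variation,

$$y_1 = 0, \qquad y_{i+1} = A_i y_i + B_i v_i, \quad i = 1, \dots, k, \qquad (13)$$

and

$$z_1 = 0, \qquad z_{i+1} = A_i z_i + (C_i y_i)\, y_i + 2 (D_i y_i)\, v_i + (E_i v_i)\, v_i \qquad (14)$$

for $i = 1, \dots, k$, with linear differential maps,

$$A_i = \frac{\partial f_i}{\partial x}, \qquad B_i = \frac{\partial f_i}{\partial u}$$

and

$$C_i = \frac{\partial^2 f_i}{\partial x\, \partial x}, \qquad D_i = \frac{\partial^2 f_i}{\partial x\, \partial u}, \qquad E_i = \frac{\partial^2 f_i}{\partial u\, \partial u},$$

evaluated at $(u_i, x_i(u))$. Useful control-theoretic representations for $\nabla J(u)$ and $\varphi$ can now be constructed by removing $y_i$ from formula (10) and $z_i$ from formula (11) with the aid of an adjoint recursion for (13) and (14).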
Equations (13) and (14) are special instances of

$$w_1 = 0,\qquad w_{i+1} = A_i w_i + \xi_i,\quad i = 1,\dots,k, \tag{15}$$

with $w = (w_1,\dots,w_{k+1})\in\mathbb{R}^{(k+1)n}$ and $\xi = (\xi_1,\dots,\xi_k)\in\mathbb{R}^{kn}$. For each $\xi$, there is a unique $w = \Phi\xi$ satisfying (15), and the resulting correspondence defines a linear map $\Phi\colon\mathbb{R}^{kn}\to\mathbb{R}^{(k+1)n}$. The map $\Phi$ has an associated adjoint linear map $\Phi^*\colon\mathbb{R}^{(k+1)n}\to\mathbb{R}^{kn}$ which, in principle, is uniquely determined by the requirement

$$\langle \Phi^*\eta, \xi\rangle = \langle \eta, \Phi\xi\rangle, \tag{16}$$

imposed for all $\xi\in\mathbb{R}^{kn}$ and $\eta\in\mathbb{R}^{(k+1)n}$. The matrix representor for $\Phi^*$ in the standard basis for $\mathbb{R}^{(k+1)n}$ is obtained by transposing the analogous matrix representor for $\Phi$; however, the adjoint map can also be computed directly with recursions derived from (15), without prior construction of $\Phi$. More precisely, for each $\eta\in\mathbb{R}^{(k+1)n}$,

$$(\Phi^*\eta)_i = \psi_{i+1} \tag{17}$$

for $i = 1,\dots,k$, where $\psi$ is the unique solution of the adjoint recursion

$$\psi_{k+1} = \eta_{k+1},\qquad \psi_i = A_i^*\psi_{i+1} + \eta_i,\quad i = k,\dots,1. \tag{18}$$
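To see this, note that if $w$ and $\psi$ are solutions of (15) and (18), respectively, then, since $w_1 = 0$,

$$\begin{aligned}
\langle \eta_{k+1}, w_{k+1}\rangle &= \langle \psi_{k+1}, w_{k+1}\rangle - \langle \psi_1, w_1\rangle = \sum_{i=1}^{k}\Big(\langle \psi_{i+1}, w_{i+1}\rangle - \langle \psi_i, w_i\rangle\Big)\\
&= \sum_{i=1}^{k}\langle \psi_{i+1}, A_i w_i + \xi_i\rangle - \sum_{i=1}^{k}\langle A_i^*\psi_{i+1} + \eta_i, w_i\rangle\\
&= \sum_{i=1}^{k}\langle \psi_{i+1}, \xi_i\rangle - \sum_{i=1}^{k}\langle \eta_i, w_i\rangle.
\end{aligned}$$

Hence for all $\xi\in\mathbb{R}^{kn}$ and $\eta\in\mathbb{R}^{(k+1)n}$, condition (16) gives

$$\langle \Phi^*\eta, \xi\rangle = \langle \eta, \Phi\xi\rangle = \sum_{i=1}^{k+1}\langle \eta_i, w_i\rangle = \sum_{i=1}^{k}\langle \psi_{i+1}, \xi_i\rangle,$$

and this establishes (17).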
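As a numerical sanity check of (15)-(18), the following sketch applies $\Phi$ with the forward recursion (15) and $\Phi^*$ with the backward recursion (18), and then compares the two sides of (16). It is written in Python with NumPy. The function names, the dimensions and the random data are hypothetical choices made only for illustration. Arrays are 0-based, so `A[i]` stands for $A_{i+1}$ and `w[i]` for $w_{i+1}$. The adjoint $A_i^*$ is taken to be the matrix transpose.

```python
import numpy as np

def apply_phi(A, xi):
    """w = Phi xi via (15): w_1 = 0, w_{i+1} = A_i w_i + xi_i, i = 1..k.
    A has shape (k, n, n), xi has shape (k, n); returns w with shape (k+1, n)."""
    k, n = xi.shape
    w = np.zeros((k + 1, n))
    for i in range(k):                  # 0-based: w[i] holds w_{i+1}
        w[i + 1] = A[i] @ w[i] + xi[i]
    return w

def apply_phi_adjoint(A, eta):
    """Phi* eta via (17)-(18): psi_{k+1} = eta_{k+1}, psi_i = A_i^T psi_{i+1} + eta_i,
    and (Phi* eta)_i = psi_{i+1}. eta has shape (k+1, n); returns shape (k, n)."""
    k = eta.shape[0] - 1
    psi = np.zeros_like(eta)
    psi[k] = eta[k]
    for i in range(k - 1, -1, -1):      # backward sweep
        psi[i] = A[i].T @ psi[i + 1] + eta[i]
    return psi[1:]                      # drop psi_1, keep psi_2 .. psi_{k+1}

rng = np.random.default_rng(0)
k, n = 5, 3
A = rng.standard_normal((k, n, n))
xi = rng.standard_normal((k, n))
eta = rng.standard_normal((k + 1, n))
lhs = np.sum(apply_phi_adjoint(A, eta) * xi)    # <Phi* eta, xi>
rhs = np.sum(eta * apply_phi(A, xi))            # <eta, Phi xi>
print(lhs, rhs)
```

The two printed numbers agree to rounding error, and neither $\Phi$ nor its matrix is ever formed explicitly.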

With the preceding formulas, it is now possible to write $\phi$ as a sum of linear and quadratic terms in the variables $v_1,\dots,v_k$ and $y_1,\dots,y_{k+1}$, with coefficients derived from the partial gradients and Hessians of Hamiltonian functions,

$$H_i(u_i, x_i, \psi_{i+1}) = l_i(u_i, x_i) + \langle \psi_{i+1}, f_i(u_i, x_i)\rangle. \tag{19}$$

Fix $u$ and $v$ in $\mathbb{R}^{km}$, let $\eta_i(u) = \nabla_x l_i$ for $i = 1,\dots,k+1$, and let $\psi(u)\in\mathbb{R}^{(k+1)n}$ be the corresponding solution of the adjoint recursion

$$\psi_{k+1} = \nabla_x l_{k+1},\qquad \psi_i = A_i^*\psi_{i+1} + \nabla_x l_i,\quad i = k,\dots,1. \tag{20}$$

In addition, let $y$ and $z$ be the unique solutions of (13) and (14), respectively. Then with reference to (15)-(17) and (20),


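$$\sum_{i=1}^{k+1}\langle \nabla_x l_i, y_i\rangle = \sum_{i=1}^{k}\langle B_i^*\psi_{i+1}, v_i\rangle$$

and

$$\sum_{i=1}^{k+1}\langle \nabla_x l_i, z_i\rangle = \sum_{i=1}^{k}\langle \psi_{i+1}, (C_i y_i)\, y_i\rangle + 2\sum_{i=1}^{k}\langle \psi_{i+1}, (D_i y_i)\, v_i\rangle + \sum_{i=1}^{k}\langle \psi_{i+1}, (E_i v_i)\, v_i\rangle.$$

When these expressions are substituted into (9)-(11), it follows from (19) that $\phi$ is prescribed by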


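$$\phi(v) = q_{k+1}(y_{k+1}) + \sum_{i=1}^{k} q_i(v_i, y_i) \tag{21}$$

and the recursions

$$y_1 = 0,\qquad y_{i+1} = A_i y_i + B_i v_i,\quad i = 1,\dots,k, \tag{22}$$

where $A_i$ and $B_i$ are the differential maps

$$A_i = \frac{\partial f_i}{\partial x},\qquad B_i = \frac{\partial f_i}{\partial u}$$

as before, and the loss functions $q_i$ are given by

$$q_{k+1}(y) = \tfrac12\langle y, Q_{k+1}\, y\rangle$$

and

$$q_i(v, y) = \langle r_i, v\rangle + \tfrac12\langle y, Q_i\, y\rangle + \langle y, R_i\, v\rangle + \tfrac12\langle v, S_i\, v\rangle$$

for $i = 1,\dots,k$, with

$$Q_{k+1} = \nabla^2_{xx} l_{k+1},\qquad r_i = \nabla_u H_i,$$

and

$$Q_i = \nabla^2_{xx} H_i,\qquad R_i = \nabla^2_{xu} H_i,\qquad S_i = \nabla^2_{uu} H_i$$

for $i = 1,\dots,k$. Moreover, the cost gradient $\nabla J(u)$ is separately recoverable from

$$\nabla J(u) = (r_1,\dots,r_k) = (\nabla_u H_1,\dots,\nabla_u H_k). \tag{23}$$

In these equations, the Hessian of $l_{k+1}$ is evaluated at $x_{k+1}(u)$ and the Hamiltonian gradients and Hessians are evaluated at $(u_i, x_i(u), \psi_{i+1}(u))$.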
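Formula (23) can be checked in the same way. The sketch below uses a hypothetical problem with two states and one control. It is not taken from the text:

* $f_i$ is an explicit Euler step of a forced pendulum.
* The stage losses are $l_i = \tfrac12\,\Delta t\,(u_i^2 + |x_i|^2)$.
* The terminal loss is $l_{k+1} = \tfrac12|x_{k+1}|^2$.

The sketch first solves the forward recursion (2), then the adjoint recursion (20), and forms $r_i = \nabla_u H_i = \nabla_u l_i + B_i^*\psi_{i+1}$. It compares the resulting gradient with central finite differences of $J$.

```python
import numpy as np

DT = 0.1   # hypothetical step length of the example dynamics

def f(u, x):
    """Hypothetical f_i(u_i, x_i): explicit Euler step of a forced pendulum (same for every i)."""
    return x + DT * np.array([x[1], -np.sin(x[0]) + u])

def f_x(u, x):                                   # A_i = df_i/dx
    return np.eye(2) + DT * np.array([[0.0, 1.0], [-np.cos(x[0]), 0.0]])

def f_u(u, x):                                   # B_i = df_i/du (m = 1, so a vector)
    return DT * np.array([0.0, 1.0])

def cost(u, x1):
    """J(u) = l_{k+1}(x_{k+1}) + sum_i l_i(u_i, x_i)."""
    x, J = x1, 0.0
    for ui in u:
        J += 0.5 * DT * (ui ** 2 + x @ x)        # l_i evaluated at (u_i, x_i)
        x = f(ui, x)
    return J + 0.5 * x @ x                       # l_{k+1}(x_{k+1})

def gradient(u, x1):
    """Forward recursion (2), adjoint recursion (20), then r_i = grad_u H_i as in (23)."""
    xs = [x1]
    for ui in u:
        xs.append(f(ui, xs[-1]))
    psi = xs[-1].copy()                          # psi_{k+1} = grad_x l_{k+1}
    g = np.zeros(len(u))
    for i in range(len(u) - 1, -1, -1):          # psi holds the adjoint of the following stage
        g[i] = DT * u[i] + f_u(u[i], xs[i]) @ psi        # grad_u l + B^T psi_next
        psi = f_x(u[i], xs[i]).T @ psi + DT * xs[i]      # A^T psi_next + grad_x l
    return g

u = np.linspace(-1.0, 1.0, 20)                   # arbitrary controls, k = 20
x1 = np.array([1.0, 0.0])
g = gradient(u, x1)
eps = 1e-6
fd = np.array([(cost(u + eps * e, x1) - cost(u - eps * e, x1)) / (2 * eps)
               for e in np.eye(len(u))])
print(np.max(np.abs(g - fd)))
```

One forward and one backward sweep deliver all $k$ gradient components. The finite-difference check, by contrast, needs $2k$ evaluations of $J$.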
Dynamic Programming Recursions


Recall that $\nabla^2 J(u)$ is positive definite in some neighborhood of a regular local minimizer for $J$. When $\nabla^2 J(u)$ is positive definite, the quadratic accessory minimum problem with cost function (21)-(22) can be solved by dynamic programming techniques, which rest on a simple embedding scheme and a few elementary theorems stated below without proof. For a fuller discussion of dynamic programming, see [1].

For $j = 1,\dots,k$ and $y\in\mathbb{R}^n$, define the family of cost functions $\phi_j(\cdot\,; y)\colon\mathbb{R}^{(k-j+1)m}\to\mathbb{R}$ by

$$\phi_j(v_j,\dots,v_k; y) = q_{k+1}(y_{k+1}) + \sum_{i=j}^{k} q_i(v_i, y_i) \tag{24}$$

and the recursions

$$y_j = y,\qquad y_{i+1} = A_i y_i + B_i v_i,\quad i = j,\dots,k, \tag{25}$$

where $q_i$, $A_i$ and $B_i$ are $u$-dependent entities defined as before. Evidently, the cost function $\phi$ in (21)-(22) is recovered from the equation

$$\phi(v) = \phi_1(v_1,\dots,v_k; 0). \tag{26}$$

Moreover, the cost functions $\phi_j$ are recursively generated by

$$\phi_k(v_k; y) = q_k(v_k, y) + q_{k+1}(A_k y + B_k v_k)$$

and

$$\phi_j(v_j,\dots,v_k; y) = q_j(v_j, y) + \phi_{j+1}(v_{j+1},\dots,v_k; A_j y + B_j v_j)$$

for $j = k-1,\dots,1$. It is likewise readily seen that

$$\phi_j(v_j,\dots,v_k; 0) = \phi(0,\dots,0,v_j,\dots,v_k),\qquad \nabla^2_v\phi_j(v_j,\dots,v_k; y) = \nabla^2_v\phi_j(v_j,\dots,v_k; 0),$$

and

$$\nabla^2\phi(v) = \nabla^2 J(u)$$

for $j = 1,\dots,k$, $v\in\mathbb{R}^{km}$ and $y\in\mathbb{R}^n$. Note also that since $\phi(\cdot)$ and $\phi_j(\cdot\,; y)$ are quadratic functions, their corresponding $v$-Hessians are independent of $v$ as well as $y$. These facts and the basic principles of dynamic programming yield the following theorems.


Theorem 1 The following statements are equivalent:

1) The quadratic function $\phi(\cdot)$ has a unique global minimizer $\bar v\in\mathbb{R}^{km}$.

2) $\nabla^2 J(u)$ is positive definite.

3) For all $j = 1,\dots,k$, $\nabla^2_v\phi_j$ is positive definite.

4) For all $j = 1,\dots,k$ and all $y\in\mathbb{R}^n$, the quadratic function $\phi_j(\cdot\,; y)$ has a unique global minimizer $(\bar v_j,\dots,\bar v_k)\in\mathbb{R}^{(k-j+1)m}$.


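Theorem 2 The following statements are equivalent:
1) For all $j = 1,\dots,k$ and $y\in\mathbb{R}^n$,

$$\phi_j^0(y) = \inf_{v_j,\dots,v_k}\phi_j(v_j,\dots,v_k; y) > -\infty. \tag{27}$$

2) The real-valued functions $\phi_1^0(\cdot),\dots,\phi_k^0(\cdot)$ satisfy the backward functional recursion

$$\phi_j^0(y) = \inf_{v_j}\Big[q_j(v_j, y) + \phi_{j+1}^0(A_j y + B_j v_j)\Big] \tag{28}$$

for $j = k,\dots,1$, with $\phi_{k+1}^0(y) = q_{k+1}(y)$.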


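Theorem 3 Let $r_j$, $A_j$, $B_j$, $Q_j$, $R_j$, and $S_j$ be the vectors and linear maps appearing in the representations (21)-(22) and (24)-(25) for the functions $\phi(\cdot)$ and $\phi_j(\cdot\,; y)$. Then the following statements are equivalent for all $\bar v = (\bar v_1,\dots,\bar v_k)\in\mathbb{R}^{km}$:

1) $\bar v$ is the unique global minimizer for $\phi$ in $\mathbb{R}^{km}$, i.e.,

$$\operatorname*{arg\,min}_{v\in\mathbb{R}^{km}}\phi(v) = \{\bar v\}.$$

2) The vector $\bar v\in\mathbb{R}^{km}$ is generated by the forward recursions

$$\bar y_1 = 0,\qquad \bar y_{j+1} = A_j\bar y_j + B_j\bar v_j, \tag{29}$$

$$\{\bar v_j\} = \operatorname*{arg\,min}_{v\in\mathbb{R}^m}\Big[q_j(v, \bar y_j) + \phi_{j+1}^0(A_j\bar y_j + B_j v)\Big] = \{\gamma_j + \Gamma_j\bar y_j\}, \tag{30}$$

for $j = 1,\dots,k$, where

$$q_j(v, y) = \langle r_j, v\rangle + \tfrac12\langle y, Q_j\, y\rangle + \langle y, R_j\, v\rangle + \tfrac12\langle v, S_j\, v\rangle,$$

$$\phi_j^0(y) = \alpha_j + \langle \beta_j, y\rangle + \tfrac12\langle y, \Theta_j\, y\rangle,$$

$S_j + B_j^*\Theta_{j+1}B_j$ is positive definite,

$$\gamma_j = -\big(S_j + B_j^*\Theta_{j+1}B_j\big)^{-1}\big(r_j + B_j^*\beta_{j+1}\big), \tag{31}$$

$$\Gamma_j = -\big(S_j + B_j^*\Theta_{j+1}B_j\big)^{-1}\big(R_j^* + B_j^*\Theta_{j+1}A_j\big), \tag{32}$$

and the linear maps $\Theta_j$, vectors $\beta_j$, and numbers $\alpha_j$ satisfy the backward recursions

$$\Theta_{k+1} = Q_{k+1},\qquad \Theta_j = Q_j + A_j^*\Theta_{j+1}A_j + \big(R_j^* + B_j^*\Theta_{j+1}A_j\big)^*\Gamma_j, \tag{33}$$

$$\beta_{k+1} = 0,\qquad \beta_j = (A_j + B_j\Gamma_j)^*\beta_{j+1} + \Gamma_j^*\, r_j, \tag{34}$$

and

$$\alpha_{k+1} = 0,\qquad \alpha_j = \alpha_{j+1} - \tfrac12\big\langle \gamma_j, \big(S_j + B_j^*\Theta_{j+1}B_j\big)\gamma_j\big\rangle \tag{35}$$

for $j = k,\dots,1$.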
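The recursions (29)-(35) translate almost line by line into code. The sketch below is a minimal dense NumPy version under three assumptions:

* The LQ data $A_j, B_j, Q_j, R_j, S_j, r_j, Q_{k+1}$ are supplied as arrays. Here they are a hypothetical random test case, built so that each stage loss $q_j$ is jointly strictly convex.
* The adjoint $^*$ is the matrix transpose.
* Array index `j` corresponds to stage $j+1$. The comments use the text's stage notation, and inside the backward loop `Theta` and `beta` hold the values for the following stage.

The check relies on two consequences of Theorems 2 and 3. First, the minimum value of $\phi$ equals $\phi_1^0(0) = \alpha_1$. Second, random perturbations of $\bar v$ increase $\phi$.

```python
import numpy as np

def lq_newton_step(A, B, Q, R, S, r, Qf):
    """Backward recursions (31)-(35), then forward recursions (29)-(30).
    Returns vbar (shape (k, m)) and alpha_1 = phi_1^0(0), the minimum value of phi."""
    k = len(A)
    Theta, beta, alpha = Qf, np.zeros(Qf.shape[0]), 0.0   # Theta_{k+1}, beta_{k+1}, alpha_{k+1}
    gam, Gam = [None] * k, [None] * k
    for j in range(k - 1, -1, -1):
        M = S[j] + B[j].T @ Theta @ B[j]          # S_j + B_j^* Theta_{j+1} B_j
        np.linalg.cholesky(M)                     # raises LinAlgError if M is not positive definite
        K = R[j].T + B[j].T @ Theta @ A[j]        # R_j^* + B_j^* Theta_{j+1} A_j
        gam[j] = -np.linalg.solve(M, r[j] + B[j].T @ beta)           # (31)
        Gam[j] = -np.linalg.solve(M, K)                              # (32)
        alpha = alpha - 0.5 * gam[j] @ M @ gam[j]                    # (35)
        beta = (A[j] + B[j] @ Gam[j]).T @ beta + Gam[j].T @ r[j]     # (34)
        Theta = Q[j] + A[j].T @ Theta @ A[j] + K.T @ Gam[j]          # (33)
    y, v = np.zeros(Qf.shape[0]), []
    for j in range(k):                            # forward recursions (29)-(30)
        v.append(gam[j] + Gam[j] @ y)
        y = A[j] @ y + B[j] @ v[-1]
    return np.array(v), alpha

def phi(v, A, B, Q, R, S, r, Qf):
    """Accessory cost (21)-(22) evaluated directly, for checking."""
    y, total = np.zeros(Qf.shape[0]), 0.0
    for j in range(len(A)):
        total += r[j] @ v[j] + 0.5 * y @ Q[j] @ y + y @ R[j] @ v[j] + 0.5 * v[j] @ S[j] @ v[j]
        y = A[j] @ y + B[j] @ v[j]
    return total + 0.5 * y @ Qf @ y

rng = np.random.default_rng(1)
k, n, m = 6, 3, 2
A = 0.5 * rng.standard_normal((k, n, n))
B = rng.standard_normal((k, n, m))
Q, R, S = [], [], []
for _ in range(k):
    G = rng.standard_normal((n + m, n + m))
    W = G.T @ G + np.eye(n + m)                   # jointly positive definite stage loss
    Q.append(W[:n, :n]); R.append(W[:n, n:]); S.append(W[n:, n:])
r = rng.standard_normal((k, m))
Qf = np.eye(n)
vbar, alpha1 = lq_newton_step(A, B, Q, R, S, r, Qf)
print(phi(vbar, A, B, Q, R, S, r, Qf), alpha1)
```

Each backward step factors only an $m\times m$ matrix, so the work grows linearly in $k$. The `cholesky` call is used only as a positive-definiteness test, which matches the hypothesis of Theorem 3.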


These theorems support the following efficient scheme for computing the Newton increment $\bar v$ in (4).


1 Given $u\in\mathbb{R}^{km}$, solve the forward recursion (2) for $x(u)$, and construct the corresponding linear maps $A_j$ and $B_j$, and vectors $\nabla_x l_j$.

2 Solve the backward adjoint recursion (20) for $\psi(u)$ and construct the corresponding vectors $r_j$.

3 Construct the linear maps $Q_j$, $R_j$ and $S_j$, solve the backward dynamic programming recursions (33) and (34) for $\Theta_j$ and $\beta_j$, and compute $\gamma_j$ and $\Gamma_j$ in (31) and (32).

4 Solve the forward recursions (29)-(30) for $\bar y$ and $\bar v$.


Algorithm 


Stages 1 and 2 in the foregoing algorithm are always well-posed, and yield the cost gradient $\nabla J(u)$ (see (23)). The calculation for the Newton increment $\bar v$ is well-posed if and only if stage 3 produces invertible linear maps $S_j + B_j^*\Theta_{j+1}B_j$ for $j = k,\dots,1$. The calculation for $\bar v$ is well-posed and stage 3 concludes with $k$ positive definite linear maps $S_j + B_j^*\Theta_{j+1}B_j$ if and only if $\nabla^2 J(u)$ is positive definite. If a positive semidefinite, indefinite or singular linear map $S_j + B_j^*\Theta_{j+1}B_j$ is encountered at some point in stage 3, it follows that $\nabla^2 J(u)$ is not positive definite, and the accessory minimum problem may have no global minimizers or stationary points, or infinitely many such points. In such cases, it may be advantageous or even necessary to abort stage 3 and abandon Newton’s method temporarily in favor of a descent iteration that employs the negative gradient $-\nabla J(u)$ computed in stages 1 and 2, or some other descent direction. Alternative quasi-Newtonian descent directions can be obtained by replacing $S_j$ in stage 3 with $S_j + \lambda_j I$, where $\lambda_j I$ is a positive shift added where necessary to maintain positive definiteness of $S_j + \lambda_j I + B_j^*\Theta_{j+1}B_j$. This variant of stage 3 is automatically well-posed and produces the unique global minimizer $\bar v$ of the perturbed cost function

$$\phi(v) + \tfrac12\langle v, \Lambda(u)\, v\rangle,$$

where

$$\Lambda(u)v = (\lambda_1 v_1,\dots,\lambda_k v_k).$$

By construction, $\nabla^2 J(u) + \Lambda(u)$ is positive definite and

$$\bar v = -\big(\nabla^2 J(u) + \Lambda(u)\big)^{-1}\nabla J(u).$$


Hence $\bar v$ is a descent vector for $J$. On the other hand, the simple steepest descent direction $-\nabla J(u)$ may be more cost-efficient, particularly when $u$ is far from a local minimizer for $J$.
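The text does not say how the shifts $\lambda_j$ should be chosen. One simple possibility is to try increasing values until a Cholesky factorization of the shifted pivot succeeds. The helper below is a hypothetical heuristic of this kind, and its parameters `lam0`, `growth` and `max_tries` are arbitrary. Since $S_j + \lambda_j I + B_j^*\Theta_{j+1}B_j = M_j + \lambda_j I$ with $M_j = S_j + B_j^*\Theta_{j+1}B_j$, the helper works on $M_j$ directly. The returned matrix would then replace $S_j + B_j^*\Theta_{j+1}B_j$ wherever that pivot appears in (31), (32) and (35).

```python
import numpy as np

def shifted_pivot(M, lam0=1e-3, growth=10.0, max_tries=30):
    """Return (M + lam*I, lam) for the first lam in 0, lam0, lam0*growth, ... for which
    M + lam*I admits a Cholesky factorization (i.e. is positive definite).
    Hypothetical heuristic for the shift lambda_j*I of the quasi-Newton variant."""
    I = np.eye(M.shape[0])
    lam = 0.0
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(M + lam * I)      # succeeds only for positive definite input
            return M + lam * I, lam
        except np.linalg.LinAlgError:
            lam = lam0 if lam == 0.0 else lam * growth
    raise ValueError("no suitable shift found")

M_shift, lam = shifted_pivot(np.array([[0.5, 1.0], [1.0, 0.5]]))   # eigenvalues 1.5 and -0.5
print(lam, np.linalg.eigvalsh(M_shift))
```

When no shift is needed the helper returns $\lambda_j = 0$, so stage 3 is left unchanged and the exact Newton step is recovered.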

If the work required to compute the differentials for $f_j$ and $l_j$ in each time step $j$ is uniformly bounded in $j$, with $m$ and $n$ fixed, then the number of arithmetic operations required to execute the foregoing algorithm (or its shifted variants) is directly proportional to $k$. This compares very favorably with the standard $O(k^3)$ estimate for general Newtonian calculations in $\mathbb{R}^{km}$.

Finally, references [9] and [10] revise the basic serial 
dynamic programming algorithm for parallel computa- 
tion, and thereby achieve significant reductions in the 
time needed to calculate each Newton iteration. 


A very powerful method for optimization of a system
that can be separated into stages is dynamic program- 
ming developed by R. Bellman [1]. The main concept 
of this technique lies in the principle of optimality which 
can be stated as follows: 


An optimal policy has the property that whatever 
the initial state and the initial decision are, the 
remaining decisions must constitute an optimal 
policy with regard to the state resulting from the 
first decision. 


Many engineering systems are in the form of individual stages, or can be broken into stages, so the idea of breaking up a complex problem into simpler subproblems, whose optimization could then be carried out systematically, was received enthusiastically. Numerous applications of the principle
of optimality in dynamic programming were given in 
[2], and there was a great deal of interest in applying 
dynamic programming to optimal control problems. 
In the 1960s many books and numerous papers were 
written to explore the use of dynamic programming as 
a means of optimization for optimal control problems. 
Since an optimal control problem, involving optimiza- 
tion over a trajectory, can be broken into a sequence 
of time stages, it appeared that dynamic programming 
would be ideally suited for such problems. 

Although dynamic programming could be successfully applied to some simple optimal control problems, one of the greatest difficulties in using it was the interpolation problem encountered when the trajectory from a grid point did not reach exactly the grid point at the next stage [12]. This interpolation difficulty, coupled with the dimensionality restriction and the requirement of a very large number of grid points, limited the use of dynamic programming to very simple optimal control problems. The limitations imposed by the ‘curse of dimensionality’ and the ‘menace of the expanding grid’ kept dynamic programming from being used for practical optimal control problems until R. Luus [14] suggested effective means of overcoming both the interpolation and the dimensionality problems.


Optimal Control Problem 


We consider the continuous dynamic system described by the vector differential equation

$$\frac{dx}{dt} = f(x, u) \tag{1}$$

with the initial state $x(0)$ given, where $x$ is an $(n\times 1)$ state vector and $u$ is an $(m\times 1)$ control vector bounded by

$$\alpha_j \le u_j \le \beta_j,\qquad j = 1, 2,\dots,m. \tag{2}$$


The performance index associated with this system is a scalar function of the state at the given final time $t_f$; i.e.,

$$I = \Phi(x(t_f)). \tag{3}$$


We may also have state constraints, but for simplicity we shall leave these for later. The optimal control problem is to find the control $u$ in the time interval $0\le t < t_f$ so that the performance index in (3) is either minimized or maximized. To set up the problem in staged form, we may approximate the optimal control problem by requiring a piecewise constant control policy, instead of a continuous control policy, over $P$ stages, each of length $L$, so that

$$L = \frac{t_f}{P}, \tag{4}$$

and we can consider the system at the grid points set up at these $P$ stages. We may also use a piecewise linear approximation, and the stages do not necessarily have to be of equal length.


Iterative Dynamic Programming 


M. DeTremblay and Luus [10] suggested that instead 
of interpolation, an approximation can be used when 
the trajectory from a grid point does not reach a grid 
point at the next stage. They suggested that the con- 
trol policy that was found to be optimal for the clos- 
est grid point be used to continue the integration to the 
next stage. If a large number of grid points are taken at each stage, then a reasonable approximation may be obtained, but the resulting control policy can still be quite far from the optimum. Therefore, this simplification by itself gives only a crude approximation.

However, by making a small change to the procedure, the accuracy with which the optimum is obtained can be improved substantially. The change is to apply the procedure repeatedly in an iterative fashion [16]. After every iteration, the best value is used as the center point, and the regions of allowable values for the control and for the grid points are reduced in size. The idea of region reduction in optimization was successfully used by Luus and T.H.I. Jaakola [37] in direct search optimization. As was shown by Luus [16], dynamic programming can be used in this fashion to give a sufficiently accurate optimal control policy. Although easy to program, the method was not computationally attractive until the idea of using accessible states as grid points was introduced [14]. With this latter change, the method was recognized as a feasible approach to solving optimal control problems.

A further advantage of generating the state grid points is that the dimensionality of the state vector then does not matter. A nonlinear system described by 7 differential equations and having 4 control variables was solved quite easily [15]. The method was also used for systems of difference equations, which is actually easier, since no discretization is necessary [40]. In essence, the ‘curse of dimensionality’ was eliminated, and the new computational procedure became known as iterative dynamic programming (IDP).


Early Applications of IDP 


Iterative dynamic programming provided a very conve- 
nient way of investigating the effect of the choice of the 
final time in optimal control problems [18]. However, 
by generating the grid points, it was no longer possi- 
ble to guarantee a global optimum. This was illustrated 
by Luus and M. Galli [36]. Even the use of a very large 
number of grid points does not guarantee getting the 
global optimum. In fact, the number of grid points can 
be quite small in many cases and the global optimum is 
still obtained with good accuracy [4]. 

A very challenging problem is the bifunctional 
catalyst problem, where it is necessary to determine 
the blend of the catalyst along a tubular reactor to 


maximize the yield of a desired component [35]. By us- 
ing successive quadratic programming (SQP) and start- 
ing from 100 randomly chosen starting points, 26 lo- 
cal optima were located, but the global optimum was 
not obtained. With IDP, however, the global optimum 
was readily obtained with the use of a single grid point 
[34]. To avoid the numerous local optima, all that was 
required for this system was to take a sufficiently large 
initial region size for the control. 

Although the optimal control of fed-batch reactors 
was very difficult to obtain by methods based on Pon- 
tryagin’s maximum principle, iterative dynamic pro- 
gramming provided a reliable means of obtaining the 
global optimum, and the results were even marginally 
better than had been previously reported [20,22]. The 
additional advantage of IDP is that the computations 
are straightforward and the algorithm can be easily pro- 
grammed to run on a personal computer. 


Choice of Candidates for Control 


In the early work with IDP, the test values for the control variables were spread uniformly over the allowable region. This was easy to program and easy to visualize. For each control variable we could have a minimum of 3 values, namely $-r$, 0, and $r$, where $r$ is the region size. For $m$ control variables we must then examine $3^m$ candidates at each grid point. This is fine if $m$ is less than 4, but if $m$ is large, this number becomes excessively large.

An alternative method for choosing candidates for 
control was suggested by V. Tassone and Luus [47], but 
a better approach as shown by B. Bojkov and Luus [3] 
was to choose such candidates at random inside the al- 
lowable range. This meant that in theory there was no 
upper limit on m. Conceptually m could be greater than 
100. In fact, IDP was used successfully on a system with 
130 differential equations and 130 control variables [21] 
and later with 250 differential equations with 250 con- 
trol variables [26]. 
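The difference in the number of candidates is easy to see in code. The sketch below uses hypothetical helper names. The first helper generates the $3^m$ combinations $u^*_i + \{-r_i, 0, r_i\}$ of the early approach. The second generates the alternative of $R$ random candidates $u^* + Dr$, where $D$ is a diagonal matrix of random numbers in $[-1, 1]$. This is the form later written as (6).

```python
import itertools
import numpy as np

def grid_candidates(u_star, r):
    """All 3**m combinations u*_i + {-r_i, 0, r_i}, i = 1..m (uniformly spaced test values)."""
    steps = itertools.product((-1.0, 0.0, 1.0), repeat=len(u_star))
    return np.array([u_star + np.array(s) * r for s in steps])

def random_candidates(u_star, r, R, rng):
    """R candidates u* + D r, with D = diag(random numbers uniform in [-1, 1])."""
    return u_star + rng.uniform(-1.0, 1.0, size=(R, len(u_star))) * r

rng = np.random.default_rng(0)
print(grid_candidates(np.zeros(3), np.ones(3)).shape)                  # -> (27, 3)
print(random_candidates(np.zeros(100), np.ones(100), 25, rng).shape)   # -> (25, 100)
```

The grid count is fixed at $3^m$, whereas with random candidates $R$ is a free parameter. That independence from $m$ is what removes the practical upper limit on the number of control variables.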


Piecewise Linear Continuous Control 


In the early work with IDP, the given time interval was 
divided into P time stages of equal length and at each 
time-stage we would have constant control. In many 
cases the optimal control policy is quite smooth, and 
therefore it may be beneficial to approximate the control policy by linear sections. This, indeed, gives a better result with a smaller number of time stages, as was shown by Luus [23], and allowed an optimal control policy for very high-dimensional systems to be determined accurately [15,26]. For a piecewise linear control we calculate the control policy in the time interval $(t_k, t_{k+1})$ by the expression

$$u(t) = u(k) + \frac{u(k+1) - u(k)}{L}\,(t - t_k), \tag{5}$$

where $u(k)$ is the value of $u$ at the time $t_k$ and $u(k+1)$ is the value of $u$ at time $t_{k+1}$.
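Formula (5) is ordinary linear interpolation between nodal control values. The following is a minimal sketch for equal stages of length $L = t_f/P$. In it, `u_nodes` is a hypothetical name for the $P+1$ values $u(0),\dots,u(P)$ taken at $t_0 = 0,\dots,t_P = t_f$. Vector controls of shape $(P+1, m)$ work as well.

```python
import numpy as np

def piecewise_linear_control(t, tf, u_nodes):
    """Evaluate (5): u(t) = u(k) + (u(k+1) - u(k)) (t - t_k) / L for t in [t_k, t_{k+1}]."""
    u_nodes = np.asarray(u_nodes, dtype=float)
    P = len(u_nodes) - 1
    L = tf / P
    k = min(int(t // L), P - 1)          # stage containing t; t = tf belongs to the last stage
    t_k = k * L
    return u_nodes[k] + (u_nodes[k + 1] - u_nodes[k]) * (t - t_k) / L

print(piecewise_linear_control(1.5, 2.0, [0.0, 2.0, 1.0]))   # halfway between u(1) = 2 and u(2) = 1
```

Only the $P+1$ nodal values are optimized. The control is continuous across stage boundaries, unlike the piecewise constant policy.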


Algorithm for IDP 


To illustrate the underlying logic in IDP, an algorithm is given to solve the optimal control problem as outlined in (1)-(4), where it is required to minimize the performance index in (3) with the use of piecewise constant control over $P$ stages, each of the same length:

1) Divide the time interval $[0, t_f]$ into $P$ time stages, each of length $L$.

2) Choose the number of test values for $u$, denoted by $R$, an initial control policy and the initial region size $r_{in}$; also choose the region contraction factor $\gamma$ used after every iteration and the number of grid points $N$.

3) Choose the total number of iterations to be used in every pass and set the iteration number index to $j = 1$.

4) Set the region size vector $r^j = r_{in}$.

5) By using the best control policy (the initial control policy for the first iteration) as reference, integrate (1) from $t = 0$ to $t_f$ $N$ times with different values for control inside the allowable region to generate $N$ $x$-trajectories, and store the values of $x$ at the beginning of each time stage as grid points, so that $x(k-1)$ corresponds to the value of $x$ at the beginning of stage $k$.

6) Starting at stage $P$, corresponding to time $t_f - L$, for each of the $N$ stored values for $x(P-1)$ from step 5 (grid points), integrate (1) from $t_f - L$ to $t_f$ with each of the $R$ allowable values for the control vector calculated from

$$u(P-1) = u(P-1)^* + D\, r^j, \tag{6}$$

where $u(P-1)^*$ is the best value obtained in the previous iteration and $D$ is a diagonal matrix of different random numbers between $-1$ and $1$. Out of the $R$ values for the performance index, choose the control values that give the minimum value, and store these values as $u(P-1)$. We now have the best control for each of these $N$ grid points.

7) Step back to stage $P-1$, corresponding to time $t_f - 2L$, and for each of the $N$ grid points do the following calculations. Choose $R$ values for $u(P-2)$ as in the previous step, and, taking $x(P-2)$ as the initial state, integrate (1) over one stage length. Continue integration over the last time stage by using the stored value of $u(P-1)$ from step 6, choosing the control policy corresponding to the grid point that is closest to the value of the state vector that has been reached. Compare the $R$ values of the performance index and store the $u(P-2)$ that gives the minimum value for the performance index.

8) Continue the procedure until stage 1, corresponding to the initial time $t = 0$ and the given initial state, is reached. This stage has only a single grid point, since the initial state is specified. As before, integrate (1), compare the $R$ values of the performance index, and store the control $u(0)$ that gives the minimum performance index. Store also the corresponding $x$-trajectory.

9) Reduce the region for allowable control

$$r^{j+1} = \gamma\, r^j, \tag{7}$$

where $j$ is the iteration number index. Use the best control policy from step 8 as the midpoint for the allowable values for the control, denoted by the superscript *.

10) Increment the iteration index $j$ by 1, go to step 5, continue the procedure for the specified number of iterations, and interpret the results.
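The ten steps can be condensed into a short program. The sketch below illustrates them under several assumptions that are not part of the text:

* The test system is hypothetical, with a scalar control: $dx_1/dt = u$, $dx_2/dt = x_1^2 + u^2$, $x(0) = (1, 0)$, $t_f = 1$, and $I = x_2(t_f)$ to be minimized.
* Integration within each stage uses classical Runge-Kutta (RK4).
* The "closest grid point" is measured by Euclidean distance.
* The control bounds (2) are not enforced.
* As a design choice, the unperturbed best value is always included among the $R$ candidates, and the first trajectory in step 5 is the unperturbed best policy.

With these choices the returned performance index cannot increase from one iteration to the next. For reference, the continuous-time optimum of this test problem is $\tanh 1 \approx 0.7616$, which no piecewise constant policy can beat.

```python
import numpy as np

def rhs(x, u):
    # Hypothetical test system (not from the text): x1' = u, x2' = x1**2 + u**2,
    # so the integral cost is written in the form (3) as I = x2(tf).
    return np.array([u, x[0] ** 2 + u ** 2])

def stage(x, u, L, substeps=2):
    """Integrate (1) over one stage of length L with constant control u (classical RK4)."""
    h = L / substeps
    for _ in range(substeps):
        k1 = rhs(x, u)
        k2 = rhs(x + 0.5 * h * k1, u)
        k3 = rhs(x + 0.5 * h * k2, u)
        k4 = rhs(x + h * k3, u)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

def finish(k, x, u_store, grids, L):
    """From state x at the start of stage index k, apply at every remaining stage the stored
    control of the closest grid point; return I = x2(tf) and the controls actually used."""
    used = []
    for j in range(k, len(grids)):
        n = int(np.argmin(np.linalg.norm(grids[j] - x, axis=1)))
        used.append(u_store[j, n])
        x = stage(x, u_store[j, n], L)
    return x[1], used

def idp(x0, tf=1.0, P=10, N=5, R=10, r_in=1.0, gamma=0.85, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    L = tf / P                                    # step 1, (4)
    u_best, r = np.zeros(P), r_in                 # steps 2-4 (zero initial policy assumed)
    history = []
    for _ in range(iters):
        # Step 5: N trajectories around the best policy; stage-start states become grid points.
        grids = np.empty((P, N, len(x0)))
        for n in range(N):
            u_try = u_best if n == 0 else u_best + r * rng.uniform(-1.0, 1.0, P)
            x = x0
            for k in range(P):
                grids[k, n] = x
                x = stage(x, u_try[k], L)
        # Steps 6-8: backward over stages; stage 1 (index 0) has the single grid point x0.
        u_store = np.zeros((P, N))
        for k in range(P - 1, -1, -1):
            for n in range(N if k > 0 else 1):
                # R candidates from (6); the first one is the unperturbed best value.
                cands = u_best[k] + r * np.concatenate(([0.0], rng.uniform(-1.0, 1.0, R - 1)))
                costs = [finish(k + 1, stage(grids[k, n], u, L), u_store, grids, L)[0]
                         for u in cands]
                u_store[k, n] = cands[int(np.argmin(costs))]
        # The new best policy is the sequence of controls applied from the initial state.
        I, used = finish(0, x0, u_store, grids, L)
        u_best = np.array(used)
        history.append(I)
        r = gamma * r                             # step 9, (7)
    return u_best, history

u_opt, hist = idp(np.array([1.0, 0.0]))
print(hist[0], hist[-1])
```

The program starts from $u \equiv 0$, for which $I = 1$. The printed history then decreases toward the piecewise constant optimum. Only forward integrations of (1) are used, which is the property of IDP stressed in the following sections.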

The application of this algorithm is illustrated with several examples in [33], where a computer program for IDP in FORTRAN is also given.


Time-Delay Systems 


The great advantage of IDP over Pontryagin’s maxi- 
mum principle is that no auxiliary variables have to be 
calculated and no derivatives are required. The state 
equation is integrated forward and there is no need 
to integrate any equations backward. Therefore, the 
method is applicable to more complex systems, such as 
time-delay systems. The initial attempt to apply IDP to
time-delay systems was made by S.A. Dadebo and Luus 
[8]. By using piecewise linear continuous control very 
good results for a difficult nonlinear time-delay CSTR 
system were obtained by Luus et al. [43]. The method is 
further illustrated in [32]. 


State Constraints 


Control constraints actually simplify the problem by decreasing the range over which the admissible values of control are to be taken. Research on how to handle state constraints is still continuing, but very useful results have already been obtained. As was shown in [39] and [17], the use of penalty functions appears to be the best way to deal with state constraints. The best type of penalty function has not yet been firmly established. Dadebo and K.B. McAuley [9] suggested the use of an absolute-value penalty function for state equality constraints. However, the recent work of Luus [27] and of Luus and C. Storey [41] shows that a quadratic penalty function with shifting terms also works very well. The advantage of using the quadratic penalty function with shifting terms is that, at the optimum, the shifting terms yield useful sensitivity information with respect to the constraints. State inequality constraints can be handled by introducing auxiliary variables defined through differential equations. These variables increase in value whenever the constraint is violated, and their values at the final time are included as penalty functions in the augmented performance index [46]. Differential equations are preferable to the difference equations used by Luus [17], because they also prevent violation of the constraint inside a time stage. The auxiliary variables are incorporated into the augmented performance index through a penalty function with a sufficiently large penalty function factor. They thus prevent a violation of the state constraint anywhere in the time interval.
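The following is a minimal sketch of the auxiliary-variable device. It assumes constraints of the form $h(x)\ge 0$ and an auxiliary rate $dx_{n+1}/dt = \sum_i[\min(0, h_i(x))]^2$. This is one common choice, assumed here because the text does not fix the form. The sketch also uses the following hypothetical ingredients:

* a one-state system $dx_1/dt = u$ with $x_1(0) = 0$;
* the constraint $x_1 \le 0.5$;
* the objective of maximizing $x_1(t_f)$, written as minimizing $\Phi = -x_1(t_f)$;
* a penalty factor $\theta$.

The auxiliary variable stays at zero along a feasible trajectory and grows only while the constraint is violated. Its final value, multiplied by $\theta$, is added to the performance index. Explicit Euler integration is used only to keep the example short.

```python
import numpy as np

def augmented_rhs(z, u, f, h):
    """z = (x, x_aux). Returns (f(x, u), sum_i min(0, h_i(x))**2)."""
    x = z[:-1]
    viol = np.minimum(0.0, h(x))          # nonzero only where h_i(x) < 0 (constraint violated)
    return np.append(f(x, u), np.sum(viol ** 2))

def penalized_index(u, theta=100.0, tf=1.0, steps=10_000):
    """Augmented index Phi + theta * x_aux(tf) for a constant control u (hypothetical example)."""
    f = lambda x, v: np.array([v])                # hypothetical system x1' = u
    h = lambda x: np.array([0.5 - x[0]])          # hypothetical constraint x1 <= 0.5
    z = np.zeros(2)                               # (x1, x_aux) at t = 0
    dt = tf / steps
    for _ in range(steps):
        z = z + dt * augmented_rhs(z, u, f, h)    # explicit Euler step
    Phi = -z[0]                                   # minimize -x1(tf), i.e. maximize x1(tf)
    return Phi + theta * z[-1], z[-1]

print(penalized_index(0.4))   # feasible: auxiliary variable stays at 0
print(penalized_index(1.0))   # violates x1 <= 0.5 for t > 0.5: positive penalty
```

For $u = 1$ the auxiliary variable approaches $\int_{0.5}^{1}(t - 0.5)^2\,dt = 1/24$ as the step shrinks. A large $\theta$ therefore makes the violating control less attractive than the feasible one, even though it gives a larger $x_1(t_f)$.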


Singular Control Problems 


When Pontryagin’s maximum principle is used, com- 
putational difficulties arise if the Hamiltonian is not an 
explicit function of the control for a portion of the tra- 
jectory. Such problems do not arise when IDP is used, 
and therefore this area was investigated by using IDP. 
The early work [19] showed that IDP can be used without much difficulty for such problems, and Luus [29] was able to obtain solutions to singular control problems that had eluded many investigators. For these problems the main difficulty is the very low sensitivity of the performance index to the control.


Sensitivity of Control Policy 


Especially for batch reactors, it is found that the cause of computational difficulties lies in the sensitivity of the control policy with respect to the yield that is to be maximized [24]. Although we are not concerned with more than four-figure accuracy in the yield, we would nevertheless like to know what the optimal control policy is. The very low sensitivity was brought out by Luus [25], where, in the optimal control of a fed-batch reactor, it was shown that the optimal control policy is relatively smooth.


Use of Variable Stage-Lengths 


Bojkov and Luus [5,6] suggested the use of flexible 
stage-lengths in IDP for time optimal control problems 
where the time of switching is very important. For general types of optimal control problems, the use of variable stage-lengths enabled the optimum to be obtained more accurately, and in some instances the local optima encountered with the use of stages of fixed length could be avoided [7]. The problem of applying this idea
to problems where the final time was specified was over- 
come by the use of shifting terms in a quadratic penalty 
function [27]. The use of flexible stage-lengths provides 
a means of obtaining accurate switching times and allowed some optimal control problems that had remained unsolved for several decades to be readily solved
[29]. The use of variable stage lengths and a quadratic 
penalty function with shifting terms enables time opti- 
mal control problems to be solved directly [31], so that 
a difficult boundary value problem is avoided. Further 
illustration of the usefulness of variable stage lengths is 
given in [30] and [33]. 


Nonseparable Problems 


Problems where the performance index is a function of all the control variables and states, and where separation into stages as required for dynamic programming is not possible, constitute an interesting class of problems. D. Li and Y.Y. Haimes [13] suggested
a method of tackling such problems, and Luus and Tas- 
sone [42] considered the application of IDP to nonsepa- 
rable problems. Luus [28] showed that even for complex 
nonseparable problems a large number of grid points 
is not necessary, and the optimum can be obtained 
quite readily. If the number of stages is large, then IDP 
has a great advantage over direct search optimization 
where optimization is carried out simultaneously over 
all the stages. The best means of applying IDP to nonseparable problems are still to be determined. The strategy of using the values for some of the variables from the previous iteration appears to work very well, however.


Future Directions 


Iterative dynamic programming has been developed 
into a useful optimization procedure. As has been 
shown in [11], IDP has certain advantages over other 
optimization procedures for the optimization of a fed- 
batch reactor. The reliability of getting the global optimum is very high. Now that personal computers have become very powerful, the method can be easily
used on very complex optimal control problems. When 
G. Marroquin and W.L. Luyben [44] suggested operat- 
ing a batch reactor at its best isothermal temperature 
as the set point, the computational power of the exist- 
ing computers was relatively low and the cost of com- 
putation was very high. It appeared then that optimal 
control could not be used for realistic systems. Now, 
however, we can, in effect, have a feedback control if 
the measurements of the pertinent state variables can 
be done sufficiently fast, by solving the optimal control 
problem many times during the time of operation of 
the batch reactor. If the trend in the enhancement of 
computer speed continues, we can use realistic mod- 
els and carry out optimization ‘on-line’, so that optimal 
control calculations can be carried out during the op- 
eration and the required changes in the control can be 
immediately implemented. Optimal control will then be used not only for investigating design possibilities, but will constitute an important part of the actual operation of the process.

The viability of using IDP for on-line optimal con- 
trol has been illustrated for reactor control by Luus and 
O.N. Okongwu [38]. 


Since derivatives are not required in the use of IDP, 
the method is applicable to more general types of op- 
timal control problems. Also, since no auxiliary vari- 
ables are necessary, except to handle state inequality 
constraints, the method is easier to use than varia- 
tional methods based on Pontryagin’s maximum prin- 
ciple. As convergence properties of IDP are studied in 
greater detail, further improvements will inevitably be 
introduced, to make IDP even more useful. Luus [30] 
showed that variable stage lengths can be incorporated 
into optimal control problems where state inequality 
constraints are also present, by combining the approach 
of Bojkov and Luus [7] along with that of W. Mekara- 
piruk and Luus [46]. Although the best choice for the 
penalty function to be used in IDP has not yet been 
established, good progress has been made in this field 
[45] and further research in this area is continuing. Fur- 
thermore, since no derivatives are required for IDP, the 
method should have important applications where non- 
differentiable functions are encountered. 
The shortest path problem is considered to be one of the classical and most important combinatorial optimization problems. Given a directed graph and a length $a_{ij}$ for each arc $(i, j)$, the problem is to find a path of minimum length that leads from any node $i$ to a node $t$, called the destination node. So, for each node $i$, we need to optimally identify a successor node $u(i)$ so as to reach the destination at the minimum sum of arc lengths over all paths that start at $i$ and terminate at $t$. Of particular relevance in the area of distributed computation is the problem of data routing within a computer communication network. In such a case, the cost associated with a particular link $(i, j)$ is related to an average delay. The stochastic shortest path problem is a generalization whereby for each node $i$ we must select a probability distribution over all possible successor nodes $j$ out of a given set of probability distributions $p_{ij}(u)$, parameterized by a control $u\in U(i)$. Clearly, the path traversed and its length are random variables, but the optimal path should lead to the destination with probability 1 and have the minimum expected length. Furthermore, if the probability distributions are such that they assign a probability of 1 to a single successor, we then recover the deterministic shortest path problem. Clearly, sequential decisions have to be made optimally so as to determine the sequence of controls that would produce, for any current state (i.e., node), a successor state (i.e., node), so as to minimize the expected length for reaching the terminal state. If we were to assume that a particular policy $\pi$, i.e., set of control actions, has been selected, the total expected cost starting from an initial state $i$ using this policy would be:


$$J_\pi(i) = \lim_{N \to \infty} E\left\{ \sum_{k=0}^{N-1} g\big(i_k, \mu_k(i_k), i_{k+1}\big) \,\middle|\, i_0 = i \right\}.$$

The optimal cost-to-go starting from state $i$ is denoted
by

$$J^*(i) = \min_{\pi} J_\pi(i).$$


This problem is defined as a special case of the total cost
infinite horizon problem. The description of the problem
does not itself require an infinite horizon. The infinite horizon
is needed because the length of the path is random and
not known in advance. The following are the key characteristics
of the stochastic shortest path problem within the infinite
horizon dynamic programming framework:

1) There is no discounting, $\alpha = 1$.

2) The state space is $S = \{1, \dots, n, t\}$.

3) The transition probabilities are:
$$p_{ij}(u) = P(x_{k+1} = j \mid x_k = i,\ u_k = u), \quad i, j \in S,\ u \in U(i).$$

4) The state $t$ is absorbing, that is,
$$p_{tt}(u) = 1, \quad \forall u \in U(t);$$
the state $t$, the termination state, is special in the
sense that reaching it is inevitable.

5) The control set $U(i)$ is finite.

6) The destination is cost-free, i.e.,
$$g(t, u, t) = 0, \quad \forall u \in U(t).$$


If we denote by $g(i, u, j)$ the cost of moving from $i$ to $j$
using control $u$, then the expected cost per stage is
defined as:

$$g(i, u) = \sum_{j \in S} p_{ij}(u)\, g(i, u, j).$$


The concept of the absorbing state implies that this state 
will either be reached inevitably, or there is an incen- 
tive to reach it with the minimum expected cost. The 
stochastic shortest path problem was first formulated 
in [3] while addressing a fundamental problem in con- 
trol theory, namely finding the input that would take 
a given system to a specified terminal state at minimum 
cost. A fundamental assumption regarding the types of 
stochastic shortest path problems that can be analyzed 
states that: 


Assumption 1 There exists at least one proper policy. 


A proper policy $\mu$ is a stationary policy which, when
used, results in a positive probability that the destination
state will be reached after at most $n$ stages, regardless
of the initial state. That is:

$$\rho_\mu = \max_{i = 1, \dots, n} P\{x_n \neq t \mid x_0 = i, \mu\} < 1.$$

A stationary policy is a policy of the form $\pi = \{\mu, \mu, \dots\}$.

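For analysis purposes, the following operator is defined
for any vector $J = (J(1), \dots, J(n))$:

$$(TJ)(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J(j) \right], \quad i = 1, \dots, n.$$

This operator is obtained by applying one iteration of the basic
dynamic programming algorithm to the cost function $J$.
The expectation is written explicitly in terms of the state
transition probabilities $p_{ij}(u)$. For a stationary policy $\mu$,
$(T_\mu J)(i) = g(i, \mu(i)) + \sum_{j=1}^{n} p_{ij}(\mu(i)) J(j)$ denotes
the corresponding operator with the control fixed to $\mu(i)$.
It can be shown [1] that, if all stationary policies are
proper, $T$ is a contraction mapping with respect to a weighted
sup-norm. In other words, there exist positive constants
$v_1, \dots, v_n$ and some $\gamma$ with $0 < \gamma < 1$ such that for all
vectors $J_1$ and $J_2$:

$$\max_{i=1,\dots,n} \frac{1}{v_i} \big|(TJ_1)(i) - (TJ_2)(i)\big| \le \gamma \max_{i=1,\dots,n} \frac{1}{v_i} \big|J_1(i) - J_2(i)\big|.$$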


Furthermore, the operator $T$ is monotone. That is, for
any vectors $J$ and $J'$ such that $J(i) \le J'(i)$, $i = 1, \dots, n$, and
for any stationary policy $\mu$, we have:

$$(T^k J)(i) \le (T^k J')(i), \qquad (T_\mu^k J)(i) \le (T_\mu^k J')(i), \qquad i = 1, \dots, n,\ k = 1, 2, \dots$$


The main results of the theoretical analysis of stochastic
shortest path problems are analogous to those for
discounted problems:

i) The optimal cost vector is a solution to Bellman's
equation: $J^* = TJ^*$.

ii) For every vector $J$:
$$\lim_{k \to \infty} (T^k J)(i) = J^*(i), \quad i = 1, \dots, n.$$

iii) A stationary policy $\mu$ is optimal if and only if $T_\mu J^* = TJ^*$.

iv) For every proper policy $\mu$ and every vector $J$:
$$\lim_{k \to \infty} T_\mu^k J = J_\mu, \qquad J_\mu = T_\mu J_\mu.$$
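The following Python sketch illustrates value iteration on a tiny stochastic shortest path problem. Value iteration is the repeated application of $T$ described in property ii). The two-state model, its action names (`safe`, `risky`, `go`) and its costs are hypothetical data chosen only for illustration. The destination $t$ is encoded as `None` with $J(t) = 0$. Every stationary policy of this toy model is proper, so the iterates should converge to the same limit from any starting vector.

```python
# Minimal value-iteration sketch for a stochastic shortest path problem.
# The model below is hypothetical illustration data, not taken from the text.
TERMINAL = None  # encodes the destination t, whose cost-to-go is J(t) = 0

# transitions[i][u] -> list of (next_state, probability, arc_cost) triples
transitions = {
    0: {"safe": [(1, 1.0, 1.0)],
        "risky": [(TERMINAL, 0.5, 4.0), (0, 0.5, 1.0)]},
    1: {"go": [(TERMINAL, 0.9, 1.0), (0, 0.1, 1.0)]},
}


def q_value(J, i, u):
    """Expected cost of applying u in state i and then incurring J at the successor."""
    return sum(p * (g + (0.0 if j is TERMINAL else J[j]))
               for j, p, g in transitions[i][u])


def bellman(J):
    """One application of the DP operator T."""
    return {i: min(q_value(J, i, u) for u in transitions[i]) for i in transitions}


def value_iteration(J0, tol=1e-12, max_iter=10_000):
    """Generate T J0, T^2 J0, ... until successive iterates differ by less than tol."""
    J = dict(J0)
    for _ in range(max_iter):
        J_next = bellman(J)
        if max(abs(J_next[i] - J[i]) for i in J) < tol:
            return J_next
        J = J_next
    return J


J_star = value_iteration({0: 0.0, 1: 0.0})
print(J_star)  # approximately {0: 2.2222, 1: 1.2222}, i.e. 20/9 and 11/9
```

Starting instead from a large vector such as `{0: 100.0, 1: 100.0}` gives the same limit. This is what the contraction property leads one to expect.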


A thorough analysis of the computational complexity 
of stochastic shortest path problems has been presented 
in [5]. 

Stochastic shortest path problems can be addressed computationally
with the general methods developed for discounted problems.
These are value iteration and policy iteration, as well as
approximation schemes along the lines presented in [1].
A detailed account can also be found in [6]. Value iteration
is a principal method for calculating the optimal cost $J^*$. It
generates the sequence $T^k J$ starting from some $J$. Issues
related to the Gauss-Seidel implementation of value iteration
are discussed in [2]. Although in principle an infinite
number of iterations is required, under certain conditions
finite convergence can be achieved. An alternative is to
perform policy iteration. Starting with a proper policy $\mu^0$,
a sequence of policies converging to the optimal one is
constructed. According to property iv), for any given
proper policy $\mu^k$, the cost vector can be evaluated as the
solution of a system of linear equations:


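$$J_{\mu^k}(i) = \sum_{j \in S} p_{ij}\big(\mu^k(i)\big) \Big( g\big(i, \mu^k(i), j\big) + J_{\mu^k}(j) \Big), \quad i = 1, \dots, n,$$

with $J_{\mu^k}(t) = 0$. An improved policy can now be determined as:

$$\mu^{k+1}(i) = \arg\min_{u \in U(i)} \sum_{j \in S} p_{ij}(u) \big( g(i, u, j) + J_{\mu^k}(j) \big), \quad i = 1, \dots, n.$$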
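Policy iteration alternates the linear-system evaluation and the greedy improvement step given above. The Python sketch below applies it to a hypothetical two-state model; the model, the action names and the small Gaussian-elimination helper `solve` are illustrative assumptions, not part of the original text. `evaluate` assumes the policy is proper, so that the linear system is nonsingular.

```python
# Minimal policy-iteration sketch on a hypothetical SSP model (illustration only).
TERMINAL = None  # destination t, with J(t) = 0

transitions = {
    0: {"safe": [(1, 1.0, 1.0)],
        "risky": [(TERMINAL, 0.5, 4.0), (0, 0.5, 1.0)]},
    1: {"go": [(TERMINAL, 0.9, 1.0), (0, 0.1, 1.0)]},
}


def q_value(J, i, u):
    """Sum over successors of p_ij(u) * (g(i, u, j) + J(j)), with J(t) = 0."""
    return sum(p * (g + (0.0 if j is TERMINAL else J[j]))
               for j, p, g in transitions[i][u])


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def evaluate(policy):
    """Policy evaluation: solve J(i) - sum_j p_ij J(j) = sum_j p_ij g(i, mu(i), j)."""
    states = sorted(transitions)
    pos = {s: k for k, s in enumerate(states)}
    n = len(states)
    A = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    b = [0.0] * n
    for i in states:
        for j, p, g in transitions[i][policy[i]]:
            b[pos[i]] += p * g
            if j is not TERMINAL:
                A[pos[i]][pos[j]] -= p
    x = solve(A, b)
    return {s: x[pos[s]] for s in states}


def policy_iteration(policy):
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    while True:
        J = evaluate(policy)
        improved = {}
        for i in transitions:
            best = min(transitions[i], key=lambda u: q_value(J, i, u))
            # keep the current action unless another one is strictly better (avoids cycling on ties)
            keep = q_value(J, i, policy[i]) <= q_value(J, i, best) + 1e-12
            improved[i] = policy[i] if keep else best
        if improved == policy:
            return policy, J
        policy = improved


print(policy_iteration({0: "risky", 1: "go"}))
```

Starting from the proper but suboptimal policy that uses `risky` in state 0, one improvement step switches to `safe`, and the next evaluation confirms that no further change occurs.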


These approaches assume that mathematical models for
the cost structure and the transition probabilities of the
system exist. In many cases, however, such information
is not available, and methods based on simulation have
been developed. The required information can be derived by
simulating, for given control and state spaces, the system's
response so as to obtain the associated transition
costs $g(i, u, j)$. The ideas of Monte-Carlo simulation can
thus be used for policy evaluation.
A straightforward way of computing the cost vector $J_\mu$
for a given policy $\mu$ is to generate many
sample trajectories starting at $i$ and average the corresponding
costs. This gives an estimate of $J_\mu(i)$. An
alternative is to perform a large (in the limit, infinite)
number of simulation runs from various initial states up to
the destination state. Any time state $i$ is encountered,
the corresponding cost of reaching state $t$ is recorded:


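$$c(i, m) = g(i, i_1) + g(i_1, i_2) + \cdots + g(i_N, t),$$

where $c(i, m)$ is the cost of the $m$th sample run from $i$ to $t$.
By averaging over the simulations we obtain:

$$J_\mu(i) = \lim_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M} c(i, m).$$

The iterative implementation of the update process results in:

$$J_\mu(i_k) := J_\mu(i_k) + \gamma \big( g(i_k, i_{k+1}) + g(i_{k+1}, i_{k+2}) + \cdots + g(i_N, t) - J_\mu(i_k) \big), \quad k = 1, \dots, N,$$

where $\gamma$ is a step size. With $\gamma = 1/m$ at the $m$th update of
$J_\mu(i_k)$, this reproduces the running average above.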
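The Python sketch below illustrates the incremental Monte-Carlo update with step size $1/m$. The fixed policy is encoded as a hypothetical two-state Markov chain named `chain`, and `step` and `mc_evaluate` are illustrative helper names. Each visit to a state contributes one sample cost-to-go, so this is an every-visit variant. The exact values for this chain are $20/9$ and $11/9$.

```python
import random

# Minimal Monte-Carlo policy-evaluation sketch for a fixed (hypothetical) proper policy.
TERMINAL = None  # destination t

# Under the fixed policy, chain[i] lists (next_state, probability, arc_cost) triples.
chain = {0: [(1, 1.0, 1.0)],
         1: [(TERMINAL, 0.9, 1.0), (0, 0.1, 1.0)]}


def step(i, rng):
    """Sample a successor of i and the cost of the traversed arc."""
    r, acc = rng.random(), 0.0
    for j, p, g in chain[i]:
        acc += p
        if r < acc:
            return j, g
    j, _, g = chain[i][-1]  # guard against floating-point round-off in acc
    return j, g


def mc_evaluate(num_runs, seed=0):
    """Every-visit Monte-Carlo estimate of J_mu using the incremental update rule."""
    rng = random.Random(seed)
    J = {i: 0.0 for i in chain}
    visits = {i: 0 for i in chain}
    for _ in range(num_runs):
        i = rng.choice(list(chain))
        states, costs = [], []
        while i is not TERMINAL:          # simulate one run up to the destination t
            j, g = step(i, rng)
            states.append(i)
            costs.append(g)
            i = j
        for k, s in enumerate(states):
            c = sum(costs[k:])            # sample cost-to-go c(s, m) from this visit to t
            visits[s] += 1
            J[s] += (c - J[s]) / visits[s]  # step size 1/m reproduces the sample mean
    return J


print(mc_evaluate(20_000))  # estimates should lie near 20/9 and 11/9
```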


Policy evaluation by simulation, as just described, can be
combined with policy improvement so as to approach
optimality. The concept of temporal differences [7] was
proposed recently as an alternative way to implement
policy iteration [1,2]. This concept originated in the field
of reinforcement learning. Its key premise is to correct
prior predictions whenever a temporal difference is observed,
essentially looking back in time and adjusting previous
predictions. The temporal difference is defined as the quantity:

$$d_k = g(i_k, i_{k+1}) + J_\mu(i_{k+1}) - J_\mu(i_k), \quad k = 1, \dots, N.$$

The temporal difference represents the difference between
two quantities. The first is the current estimate $J_\mu(i_k)$ of the
expected cost-to-go to the termination state. The second is the
predicted cost-to-go to the termination state, $g(i_k, i_{k+1}) + J_\mu(i_{k+1})$.
The key idea of Monte-Carlo simulation using temporal differences
is to update the individual cost-to-go estimates as soon as
the cost-to-go of the successor has been estimated. In
other words, $J_\mu(i_1)$ is updated as soon as $g(i_1, i_2)$ and $i_2$
are generated during the simulation run. Then both
$J_\mu(i_1)$ and $J_\mu(i_2)$ are updated immediately after $g(i_2, i_3)$ and $i_3$
are generated, and so on.
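A minimal Python sketch of the online scheme just described follows. Each new temporal difference $d_k$ is applied to every state visited so far in the current run, which is the variant often called TD(1). Assume $J_\mu(t) = 0$ and that the estimates are held fixed during a run. Then the increments received by $J_\mu(i_k)$ add up to $d_k + \cdots + d_N = g(i_k, i_{k+1}) + \cdots + g(i_N, t) - J_\mu(i_k)$, which is the Monte-Carlo update given earlier. The chain, step sizes and function names are hypothetical illustration choices.

```python
import random

# Minimal online temporal-difference (TD(1)-style) evaluation sketch; hypothetical data.
TERMINAL = None
chain = {0: [(1, 1.0, 1.0)],
         1: [(TERMINAL, 0.9, 1.0), (0, 0.1, 1.0)]}


def step(i, rng):
    """Sample a successor of i and the cost of the traversed arc."""
    r, acc = rng.random(), 0.0
    for j, p, g in chain[i]:
        acc += p
        if r < acc:
            return j, g
    j, _, g = chain[i][-1]
    return j, g


def td_evaluate(num_runs, seed=0):
    """Each temporal difference d_k corrects the estimates of all earlier states of the run."""
    rng = random.Random(seed)
    J = {i: 0.0 for i in chain}
    visits = {i: 0 for i in chain}
    for _ in range(num_runs):
        i = rng.choice(list(chain))
        visited = []                      # (state, step size) for i_1, ..., i_k of this run
        while i is not TERMINAL:
            visits[i] += 1
            visited.append((i, 1.0 / visits[i]))
            j, g = step(i, rng)
            d = g + (0.0 if j is TERMINAL else J[j]) - J[i]  # temporal difference d_k
            for s, gamma in visited:      # look back and correct previous predictions
                J[s] += gamma * d
            i = j
    return J


print(td_evaluate(20_000))  # estimates should lie near 20/9 and 11/9
```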

For cases where the number of states becomes
prohibitively large, approximation schemes can be
used to derive accurate estimates of either the
optimal cost $J^*$ or the optimal policy $\mu^*$. To approximate
the optimal cost $J^*$, we need to generate, for
a given state $i$, an approximation $\tilde J(i, r)$ of $J^*(i)$. Here
$r$ is a parameter vector to be determined by
some type of least squares minimization. Once the approximate
cost is known, it can be used to generate suboptimal
policies as:

$$\tilde\mu(i) = \arg\min_{u \in U(i)} \sum_{j \in S} p_{ij}(u) \big( g(i, u, j) + \tilde J(j, r) \big),$$

with $\tilde J(t, r) = 0$. The type of approximation is not unique, but usually
the approximations are of the form:

$$\tilde J(i, r) = \sum_{k=1}^{m} r_k w_k(i).$$

In essence, the approximation is a linear combination of
a set of basis functions $w_k$.

More recently (1994), an approximation scheme referred to as
feature-based aggregation was presented in [8]. The
idea is to develop an approximation by exploiting
the fact that several states may share some common
characteristics (features). For a stochastic shortest path
problem with $n$ states, one can identify $m$ disjoint subsets
$S_k$, $k = 1, \dots, m$, such that:

$$S = S_1 \cup \cdots \cup S_m.$$

The basis functions $w_k(i)$ can therefore be defined as:

$$w_k(i) = \begin{cases} 1 & \text{if } i \in S_k, \\ 0 & \text{if } i \notin S_k. \end{cases}$$

The approximate cost can thus be defined as:

$$\tilde J(i, r) = \sum_{k=1}^{m} r_k w_k(i).$$

The optimal vector $r$ can be determined as the solution
of the aggregate stochastic shortest path problem. In this
problem, the aggregate transition probabilities $q_{ki}$ express
the probability of moving from any state in $S_k$ to state $i$.
The vector $r$ solves the corresponding Bellman equation
of the aggregate problem:

$$r_k = \sum_{i=1}^{n} q_{ki} \min_{u \in U(i)} \sum_{j=1}^{n} p_{ij}(u) \left( g(i, u, j) + \sum_{s=1}^{m} r_s w_s(j) \right), \quad k = 1, \dots, m.$$

For the aggregate problem, the simulation ideas previously
developed can be used to obtain the optimal
vector $r$ and therefore the required approximation.

Approximation and simulation schemes can also be
combined to provide alternatives to exact
policy iteration [1]. For a given stationary policy $\mu$,
a number $M$ of simulations can be performed
to obtain the sample costs $c(i, m)$. Subsequently, a least
squares problem can be solved to provide an approximation
$\tilde J_\mu(i, r)$ of the costs. The coefficients $r$ are derived
by solving the following optimization problem:

$$\min_r \sum_{i \in S} \sum_{m=1}^{M} \big( \tilde J_\mu(i, r) - c(i, m) \big)^2.$$

Once the cost function has been determined, an improved
policy $\bar\mu(i)$ is identified as:

$$\bar\mu(i) = \arg\min_{u \in U(i)} \sum_{j \in S} p_{ij}(u) \big( g(i, u, j) + \tilde J_\mu(j, r) \big).$$
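The Python sketch below performs one approximate policy-iteration step. It uses the hard-aggregation basis functions $w_k$ of feature-based aggregation as the linear architecture. With such one-hot features the least squares problem separates by group, and the minimizing $r_k$ is simply the mean sample cost over the states in $S_k$. The four-state model, the `walk`/`jump` actions, the grouping and the function names are all hypothetical, and the toy dynamics are deterministic, so the samples $c(i, m)$ do not vary.

```python
# Approximate policy-iteration step with hard-aggregation features (hypothetical example).
TERMINAL = None
STATES, ACTIONS = [0, 1, 2, 3], ["walk", "jump"]
groups = {0: 0, 1: 0, 2: 1, 3: 1}  # hypothetical aggregation: S_1 = {0, 1}, S_2 = {2, 3}


def transitions(i, u):
    """(next_state, probability, cost) triples: 'walk' moves i -> i+1 at cost 1, 'jump' ends at cost 3.5."""
    if u == "walk":
        return [(i + 1 if i < 3 else TERMINAL, 1.0, 1.0)]
    return [(TERMINAL, 1.0, 3.5)]


def rollout_cost(i, policy):
    """Sample cost c(i, m) of one run from i to t (the toy dynamics are deterministic)."""
    total = 0.0
    while i is not TERMINAL:
        j, _, g = transitions(i, policy[i])[0]
        total, i = total + g, j
    return total


def fit_aggregated(samples):
    """Least squares with one-hot features w_k: the minimizing r_k is the mean sample cost over S_k."""
    sums, counts = {}, {}
    for i, costs in samples.items():
        k = groups[i]
        sums[k] = sums.get(k, 0.0) + sum(costs)
        counts[k] = counts.get(k, 0) + len(costs)
    return {k: sums[k] / counts[k] for k in sums}


def J_tilde(i, r):
    """Approximate cost J~(i, r) = sum_k r_k w_k(i), with J~(t, r) = 0."""
    return 0.0 if i is TERMINAL else r[groups[i]]


def improve(r):
    """Greedy policy with respect to the approximate cost J~(., r)."""
    return {i: min(ACTIONS, key=lambda u: sum(p * (g + J_tilde(j, r))
                                               for j, p, g in transitions(i, u)))
            for i in STATES}


walk = {i: "walk" for i in STATES}
M = 5
samples = {i: [rollout_cost(i, walk) for _ in range(M)] for i in STATES}
r = fit_aggregated(samples)
print(r, improve(r))  # {0: 3.5, 1: 1.5} {0: 'jump', 1: 'walk', 2: 'walk', 3: 'walk'}
```

In this toy case the improved policy jumps directly from state 0. That is indeed cheaper (3.5) than walking (4).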


In essence, the method iterates between a policy evaluation
step and a policy improvement step. Simulation techniques
provide sample costs for a given state $i$ and policy $\mu$.
Approximation techniques provide a representation of these
costs. These are subsequently used to estimate improved
policies, and the iterations continue. The concept of Q-learning
[9] was recently proposed as an alternative way of implementing
reinforcement learning in the solution of dynamic programming
problems [4].
Total cost infinite horizon problems deal with optimal
decision making under uncertainty for systems in which
events occur sequentially. In general, the state transitions
are described by a stationary dynamic system of the form:

$$x_{k+1} = f(x_k, u_k, \omega_k), \quad k = 0, 1, \dots,$$

where, at each time instance (stage) $k$, the state $x_k$ of the
system is an element of the space $S$. The control action $u_k$
to be implemented so as to achieve optimality belongs to a
space $C$. The uncertainty is modeled through random
disturbances $\omega_k$ that belong to a countable set $D$.
Furthermore, it is assumed that the control $u_k$ is constrained
to take values in a given nonempty set $U(x_k) \subset C$,
which depends on the current state $x_k$. The random
disturbances $\omega_k$, $k = 0, 1, \dots$, have identical statistics, and
their probability distributions $P(\cdot \mid x_k, u_k)$ are defined on $D$.
These may depend explicitly on $x_k$ and $u_k$ but not on prior
disturbances. Given an initial state $x_0$, we seek a policy
$\pi = \{\mu_0, \mu_1, \dots\}$ with

$$\mu_k: S \to C, \qquad \mu_k(x_k) \in U(x_k), \quad \forall x_k \in S,$$

that minimizes a cost function defined as:

$$J_\pi(x_0) = \lim_{N \to \infty} E\left\{ \sum_{k=0}^{N-1} \alpha^k g\big(x_k, \mu_k(x_k), \omega_k\big) \right\}.$$

The function $g: S \times C \times D \to \mathbb{R}$ is the cost per stage and is
assumed to be given. Finally, the parameter $\alpha$ is termed the
discount factor, and it holds that $0 < \alpha < 1$. We denote by $\Pi$ the set of
all admissible policies $\pi = \{\mu_0, \mu_1, \dots\}$, that is, the set of all
sequences of functions such that:

$$\mu_k: S \to C, \qquad \mu_k(x) \in U(x), \quad \forall x \in S.$$

The optimal cost function $J^*$ is then defined as:

$$J^*(x) = \min_{\pi \in \Pi} J_\pi(x), \quad x \in S.$$

An admissible policy of the form $\pi = \{\mu, \mu, \dots\}$ is
termed stationary, and its corresponding cost is denoted by $J_\mu$.

However, it is often the case that the discount factor
$\alpha$ is not less than one, or that the cost per stage is
not bounded from above or below. If that is the case,
then for some initial states $x_0$ the cost $J_\pi(x_0)$ may
well become infinite.

D. Blackwell [4] was among the first to analyze the
case in which the discount factor $\alpha$ equals 1. His approach
was based on studying the behavior of the problem
as the discount factor approaches 1. Based on the ideas
introduced in [7], undiscounted problems are analyzed
under either of the following assumptions:

• Positivity assumption:
$$0 \le g(x, u, \omega), \quad \forall (x, u, \omega) \in S \times C \times D;$$

• Negativity assumption:
$$g(x, u, \omega) \le 0, \quad \forall (x, u, \omega) \in S \times C \times D.$$


Even when the cost per stage is bounded from one side
(above or below), the total cost may be unbounded for some
initial states. Therefore, $+\infty$ (respectively $-\infty$) is admitted as
a possible value of the cost $J_\pi$ under the positivity (negativity)
assumption. Defining the following two mappings greatly simplifies
the analysis. For any function $J$ defined on $S$ that takes
values in $[0, +\infty]$ under the positivity assumption, or
in $[-\infty, 0]$ under the negativity assumption,
the mapping $T$ is defined as:

$$(TJ)(x) = \min_{u \in U(x)} E\big\{ g(x, u, \omega) + J\big(f(x, u, \omega)\big) \big\}.$$

Furthermore, for any admissible stationary policy $\mu$, the
mapping $T_\mu$ is defined as:

$$(T_\mu J)(x) = E\big\{ g\big(x, \mu(x), \omega\big) + J\big(f(x, \mu(x), \omega)\big) \big\}.$$


Under either the positivity or the negativity assumption,
it can be shown [1] that Bellman's equation is satisfied.
That is, the optimal cost function $J^*$ satisfies:

$$J^*(x) = \min_{u \in U(x)} E\big\{ g(x, u, \omega) + J^*\big(f(x, u, \omega)\big) \big\}, \quad x \in S.$$

In terms of the mapping $T$, this optimality condition reads
$J^* = TJ^*$. Similarly, for any stationary policy $\mu$ it holds
that $J_\mu = T_\mu J_\mu$.

It is to be noted, though, that for undiscounted
problems ($\alpha = 1$) the function $J^*$ need not be the
unique solution of Bellman's equation. In other words,
the mapping $T$ need not have a unique fixed point.
Nevertheless, the optimal cost function $J^*$ is the smallest
fixed point under the positivity assumption, and the largest
fixed point under the negativity assumption:

• Under the positivity assumption, if $J': S \to [0, +\infty]$
satisfies $J' = TJ'$, then $J^* \le J'$.

• Under the negativity assumption, if $J': S \to [-\infty, 0]$
satisfies $J' = TJ'$, then $J' \le J^*$.
It should be pointed out that in the analysis of undiscounted
problems the concept of monotonicity plays
a key role. The following monotone convergence theorem
summarizes the key properties [3]:

Theorem Let $P = (p_1, p_2, \dots)$ be a probability distribution
over $S = \{1, 2, \dots\}$. Let $\{h_N\}$ be a sequence of extended
real-valued functions on $S$ such that for all $i \in S$
and $N = 1, 2, \dots$:
$$0 \le h_N(i) \le h_{N+1}(i).$$
Let $h: S \to [0, \infty]$ be the limit function:
$$h(i) = \lim_{N \to \infty} h_N(i).$$
Then:
$$\lim_{N \to \infty} \sum_{i=1}^{\infty} p_i h_N(i) = \sum_{i=1}^{\infty} p_i h(i).$$
As examples of undiscounted problems, let us consider
two classes of problems that can be cast as undiscounted
dynamic programming problems: the optimal stopping
problem and the optimal gambling strategy problem.
The former describes a situation in which two actions
are available at each state $x$ of the state space.
One may decide to stop, by selecting control $u^1$, and
pay a terminal cost $t(x)$. Alternatively, one may select control $u^2$,
pay a cost $c(x)$, and continue the process, with the new state
given by:

$$x_{k+1} = f(x_k, \omega_k).$$

For completeness, one also defines a termination
state $s$ that is entered once the stopping decision is
made. In other words,

$$x_{k+1} = s \quad \text{if } u_k = u^1 \text{ or } x_k = s.$$

If it is further assumed that both the termination and
the continuation costs are positive (or negative), then
the problem satisfies the positivity (negativity) assumption.
The mapping $T$ defined earlier now takes the form:

$$(TJ)(x) = \min\Big\{ t(x),\ c(x) + E\big\{ J\big(f(x, \omega)\big) \big\} \Big\}, \quad \forall x \in S.$$

The objective is to find the optimal stopping policy
that minimizes the total expected cost over an infinite
number of stages. As regards the external disturbances
$\omega$, it is assumed that they have the same
probability distribution for all time instances and depend
only on the current state $x_k$.
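The Python sketch below runs value iteration with the mapping $T$ just given, starting from $J_0 = 0$, on a small optimal stopping problem. The random walk standing in for $f(x, \omega)$, the stopping costs `stop_cost` and the continuation costs `cont_cost` are hypothetical. All costs are nonnegative, so the positivity assumption holds, and the iterates should be nondecreasing. The state space is finite, so the limit should be $J^*$.

```python
def stopping_value_iteration(t, c, p_up=0.5, tol=1e-12, max_iter=100_000):
    """Value iteration J_{k+1} = T J_k for a finite optimal stopping problem.

    States are 0..n-1. Continuing from x moves to x+1 with probability p_up and to x-1
    otherwise, clamped at the ends: a hypothetical random walk standing in for f(x, w).
    Returns the final iterate and whether the iterates were nondecreasing.
    """
    n = len(t)
    J = [0.0] * n                      # J_0 = 0
    monotone = True
    for _ in range(max_iter):
        J_next = []
        for x in range(n):
            up, down = min(x + 1, n - 1), max(x - 1, 0)
            cont = c[x] + p_up * J[up] + (1 - p_up) * J[down]   # c(x) + E{J(f(x, w))}
            J_next.append(min(t[x], cont))                       # stop vs. continue
        monotone = monotone and all(a <= b for a, b in zip(J, J_next))
        if max(abs(a - b) for a, b in zip(J, J_next)) < tol:
            return J_next, monotone
        J = J_next
    return J, monotone


stop_cost = [4.0, 4.0, 0.5, 4.0, 4.0]    # hypothetical terminal costs t(x)
cont_cost = [0.5] * 5                    # hypothetical continuation costs c(x)
J, monotone = stopping_value_iteration(stop_cost, cont_cost)
stop_set = [x for x in range(5) if stop_cost[x] <= J[x] + 1e-9]
print([round(v, 6) for v in J], monotone, stop_set)
# [3.5, 2.5, 0.5, 2.5, 3.5] True [2]
```

Here stopping is optimal only in the cheap state 2. In every other state it pays to continue and drift toward state 2.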

The gambling problem is treated in detail in the early work [5],
which was also one of the first works on undiscounted problems.
In this problem a player may stake at any time $k$ any amount $u_k \ge 0$
that does not exceed his/her current fortune $x_k$. The stake is
won with probability $p$ and lost with probability $1 - p$.
The discrete-time state evolution is described by:

$$x_{k+1} = x_k + \omega_k u_k, \quad k = 1, 2, \dots$$

The disturbance $\omega_k$ is $1$ with probability $p$ and $-1$
with probability $1 - p$. Gambling continues until a
given target fortune is reached or the entire initial capital
is lost. The problem is to determine the optimal gambling
strategy that maximizes the probability of reaching the
target fortune. A gambling strategy is a rule that
specifies what the stake should be at time $k$. It can be
shown that, when $p < 1/2$, the bold strategy is an optimal
policy. With the target fortune normalized to 1, the bold
strategy is defined as:

$$\mu^*(x) = \begin{cases} x & \text{if } 0 < x \le 1/2, \\ 1 - x & \text{if } 1/2 < x < 1, \end{cases}$$

that is, the player stakes the entire fortune or just enough to
reach the target, whichever is smaller. As suggested by the
theory of undiscounted problems, the bold strategy is only one
optimal strategy, and others can also be derived [5].
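The Python sketch below compares the bold strategy with the optimum in a discretized version of the problem. Fortunes and stakes are restricted to integers $0, \dots, N$ with target $N$. That restriction, along with `N`, `p` and the function names, is a simplifying assumption made for illustration only. The optimum is computed by value iteration on the probability of reaching $N$. For $N = 4$ and $p = 0.4$ the two coincide.

```python
def win_probabilities(N, p, stake=None, tol=1e-13, max_iter=100_000):
    """Probability V(x) of reaching fortune N before ruin, for integer fortunes 0..N.

    If stake is None, V is computed by value iteration over all integer stakes
    1..min(x, N - x); otherwise stake(x) fixes the policy being evaluated.
    """
    V = [0.0] * N + [1.0]                   # V(0) = 0 (ruin), V(N) = 1 (target reached)
    for _ in range(max_iter):
        V_next = V[:]
        for x in range(1, N):
            options = range(1, min(x, N - x) + 1) if stake is None else [stake(x)]
            V_next[x] = max(p * V[x + u] + (1 - p) * V[x - u] for u in options)
        if max(abs(a - b) for a, b in zip(V, V_next)) < tol:
            return V_next
        V = V_next
    return V


def bold(N):
    """Bold play: stake the whole fortune or just enough to reach N, whichever is smaller."""
    return lambda x: min(x, N - x)


N, p = 4, 0.4
optimal = win_probabilities(N, p)
bold_play = win_probabilities(N, p, stake=bold(N))
print([round(v, 4) for v in optimal], [round(v, 4) for v in bold_play])
# [0.0, 0.16, 0.4, 0.64, 1.0] [0.0, 0.16, 0.4, 0.64, 1.0]
```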

From a computational standpoint it is important to
know whether the method of successive approximations,
i.e., value iteration, converges to the optimal cost function.
In other words, it is important to know whether
the basic dynamic programming algorithm converges.
Starting from $J_0(x) = 0$ for all $x \in S$, under either of the two
basic assumptions we have:

• Positivity assumption:
$$J_0 \le TJ_0 \le \cdots \le T^k J_0 \le \cdots$$

• Negativity assumption:
$$J_0 \ge TJ_0 \ge \cdots \ge T^k J_0 \ge \cdots$$

In either case the limit

$$J_\infty(x) = \lim_{k \to \infty} (T^k J_0)(x), \quad x \in S,$$

exists. In other words, the sequence generated by successive
approximation, i.e., by successively applying the mapping
$T$, converges, and the limit is well defined, including
the values $+\infty$ and $-\infty$. For the value iteration
method to be valid, though, we also need
$J_\infty = J^*$. For this to hold under the positivity
assumption, an additional condition needs to be
satisfied [2]:

• Let the positivity assumption be satisfied and assume
that the sets
$$U_k(x, \lambda) = \Big\{ u \in U(x) : E\big\{ g(x, u, \omega) + (T^k J_0)\big(f(x, u, \omega)\big) \big\} \le \lambda \Big\}$$
are compact subsets of a Euclidean space for every
$x \in S$, $\lambda \in \mathbb{R}$, and for all $k$ greater than some integer $\bar k$.
Then $J_\infty = J^*$.




It should be noted that if $U(x)$ is finite for every $x \in S$,
the above condition is satisfied. A detailed account
of the value iteration method for undiscounted Markov
decision problems can also be found in [6].

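It can also be shown [3] that computational methods
based on mathematical programming can be devised
when the state, control, and disturbance spaces are finite.
With states $i = 1, \dots, n$, under the negativity assumption
the optimal cost vector $(J^*(1), \dots, J^*(n))$ solves the following
linear programming problem in the variables $\lambda_1, \dots, \lambda_n$:

$$\max \sum_{i=1}^{n} \lambda_i \quad \text{s.t.} \quad \lambda_i \le g(i, u) + \sum_{j=1}^{n} p_{ij}(u) \lambda_j, \quad i = 1, \dots, n,\ u \in U(i).$$

Under the positivity assumption, the corresponding
program takes the form:

$$\min \sum_{i=1}^{n} \lambda_i \quad \text{s.t.} \quad \lambda_i \ge \min_{u \in U(i)} \Big[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) \lambda_j \Big], \quad i = 1, \dots, n.$$

Unfortunately, this two-level optimization problem is
neither linear nor convex, and therefore its solution is
highly nontrivial.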
Congested urban transportation networks represent
complex systems in which travelers interact so as to
determine unilaterally their cost-minimizing routes of
travel between their points of origin and their destinations.
The governing concept here is that of ‘user-optimization’,
which dates to J.G. Wardrop [19] (and
was so termed by S.C. Dafermos and F.T. Sparrow [4]),
and states that, in equilibrium, all used paths connecting
an origin/destination pair of nodes will have travel
costs that are equal and minimal.

The complexity of user-optimized transportation 
networks, sometimes also referred to as the ‘traffic as- 
signment’ problem, has stimulated much research in 
the past several decades, both from methodological per- 
spectives, as well as in terms of practical application. 
Notable developments include: the proof by M.J. Beck- 
mann, C.B. McGuire, and C.B. Winsten [1] that, under 
certain symmetry assumptions on the link travel cost 
functions (and travel disutility functions), the traffic 
network equilibrium solution also satisfies the Kuhn- 
Tucker conditions of an appropriately constructed op- 
timization problem, and the identification by Dafermos 
[2] that the traffic network equilibrium conditions, as 
formulated by M.J. Smith [17] (without any imposi- 
tion of a symmetry assumption), satisfy a variational 
inequality problem. Books that discuss methodologi- 
cal approaches to static traffic equilibrium problems in- 
clude [9,14,16] (see also [6]). 

The study of dynamic travel route choice models on 
general transportation networks, where time is explic- 
itly incorporated into the framework, was initiated by 
D.K. Merchant and G.L. Nemhauser [8], who focused 
on dynamic system-optimal networks with the char- 
acteristic of many origins and a single destination. In 
system-optimal networks, in contrast to user-optimal 
networks, one seeks to determine the path flow and link 
load patterns that minimize the total cost in the net- 
work, rather than the individual path travel costs. 

M.J. Smith [18], in turn, proposed a dynamic traf- 
fic user-optimized model with fixed demands. H. Mah- 
massani [7] also proposed dynamic traffic models and 
investigated them experimentally. The recent book [15] 
provides an overview of the history of dynamic traffic 
network models and discusses distinct approaches for 
their analysis and computation. 

Here we present a dynamic traffic model with elastic 
demands proposed by P. Dupuis and A. Nagurney [5], 
who, along with D. Zhang and Nagurney [22], estab- 
lished the foundations for a new methodology, that of 
‘projected dynamical systems’ theory. The notable feature
of a projected dynamical system is that its set of stationary
points coincides with the set of solutions of the
corresponding variational inequality problem. Thus,
there exists a fundamental linkage between the static
world of finite-dimensional variational inequality problems
and the dynamic world exhibited by a new class of
dynamical systems.

The dynamic adjustment process that is presented 
here models the travelers’ day-to-day dynamic behav- 
ior of making trip decisions and route choices associ- 
ated with a travel disutility perspective. Subsequently, 
some of the stability results of this travel-route choice 
adjustment process obtained by Zhang and Nagurney 
[21] are reviewed, which address whether and how the 
travelers’ dynamic behavior in attempting to avoid con- 
gestion leads to a traffic equilibrium pattern. Finally, we 
recall the discrete-time algorithms devised for the com- 
putation of traffic network equilibria with elastic de- 
mands and with known travel disutility functions. The 
convergence of these discrete-time algorithms was es- 
tablished by Zhang and Nagurney [10,13]. Additional 
dynamic traffic network models, as well as qualitative 
and numerical results, using this methodology can be 
found in [11,12], and [22]. For alternative dynamic traf- 
fic network models and approaches, see [15], and the 
references therein. 


A Dynamic Traffic Network Model 


The model that we present is due to Dupuis and Nagur- 
ney [5]. It is a dynamic counterpart to the static traffic 
network equilibrium model with elastic travel demands 
developed by Dafermos [3]. 

We consider a network $[N, L]$ consisting of the set of nodes
$N$ and the set of directed links $L$. Let $a$ denote a link of the
network connecting a pair of nodes, and let $p$ denote
a path (assumed to be acyclic) consisting of a sequence
of links connecting an origin/destination (O/D) pair $w$.
$P_w$ denotes the set of paths connecting the O/D pair $w$,
with $n_{P_w}$ paths. We let $W$ denote the set of O/D pairs
and $P$ the set of paths in the network, with $n_P$ paths in total.

Let $x_p$ represent the flow on path $p$ and let $f_a$ denote
the load on link $a$. The following conservation of flow
equation must hold for each link $a$:

$$f_a = \sum_{p \in P} x_p \delta_{ap},$$

where $\delta_{ap} = 1$ if link $a$ is contained in path $p$, and 0
otherwise. The expression states that the load on a link
$a$ is equal to the sum of all the path flows on paths that
contain the link $a$.

Moreover, if we let $d_w$ denote the demand associated
with an O/D pair $w$, then we must have, for
each O/D pair $w$,

$$d_w = \sum_{p \in P_w} x_p,$$

where $x_p \ge 0$ for all $p$. That is, the sum of all the path
flows on paths connecting the O/D pair $w$ must be equal
to the demand $d_w$. Let $x$ denote the column vector of
path flows with dimension $n_P$.

Let $c_a$ denote the user cost associated with traversing
link $a$, and let $C_p$ denote the user cost associated with
traversing path $p$. Then

$$C_p = \sum_{a \in L} c_a \delta_{ap}.$$

In other words, the cost of a path is equal to the sum of
the costs on the links comprising that path. We group
the link costs into the column vector $c$ with $n_L$ components,
and the path costs into the column vector $C$
with $n_P$ components. We also assume that we are given
a travel disutility function $\lambda_w$ for each O/D pair $w$. We
group the travel disutilities into the column vector $\lambda$
with $J$ components.

We assume that, in general, the cost associated with
a link may depend upon the entire link load pattern,
that is,

$$c_a = c_a(f),$$

and that the travel disutility associated with an O/D pair
may depend upon the entire demand pattern, that is,

$$\lambda_w = \lambda_w(d),$$

where $f$ is the $n_L$-dimensional column vector of link
loads and $d$ is the $J$-dimensional column vector of travel
demands.
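To make the bookkeeping concrete, the Python sketch below evaluates the conservation of flow equation, the demand equation and the path cost formula on a tiny network. The network is hypothetical: one O/D pair with two paths that share a final link. Each path is stored as its list of links, which encodes $\delta_{ap}$. The separable link cost $c_a(f) = f_a + 1$ is an assumed example; the text allows each $c_a$ to depend on the whole load vector $f$.

```python
# Hypothetical network: one O/D pair w = (1, 3), parallel links a and b from node 1 to 2,
# link c from node 2 to 3, and two paths p1 = (a, c) and p2 = (b, c).
links = ["a", "b", "c"]
paths = {"p1": ["a", "c"], "p2": ["b", "c"]}   # links on each path, i.e. delta_ap = 1
od_paths = {"w": ["p1", "p2"]}                  # P_w for each O/D pair


def link_loads(x):
    """Conservation of flow: f_a = sum_p x_p delta_ap."""
    f = {a: 0.0 for a in links}
    for p, flow in x.items():
        for a in paths[p]:
            f[a] += flow
    return f


def demands(x):
    """Demand constraints: d_w = sum of x_p over p in P_w."""
    return {w: sum(x[p] for p in ps) for w, ps in od_paths.items()}


def path_costs(c):
    """Path costs: C_p = sum_a c_a delta_ap."""
    return {p: sum(c[a] for a in ls) for p, ls in paths.items()}


def link_costs(f):
    """Hypothetical separable link costs c_a(f) = f_a + 1 (the text allows general c_a(f))."""
    return {a: f[a] + 1.0 for a in links}


x = {"p1": 1.0, "p2": 2.0}
C = path_costs(link_costs(link_loads(x)))
print(link_loads(x), demands(x), C)
# {'a': 1.0, 'b': 2.0, 'c': 3.0} {'w': 3.0} {'p1': 6.0, 'p2': 7.0}
```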

We now, for completeness, recall the traffic network 
equilibrium conditions. 


Definition 1 (traffic network equilibrium, [1,3]) A
vector $x^* \in \mathbb{R}^{n_P}_+$, which induces a vector $d^*$ through the
demand equations, is a traffic network equilibrium if for
each path $p \in P_w$ and every O/D pair $w$:

$$C_p(x^*) \begin{cases} = \lambda_w(d^*), & \text{if } x_p^* > 0, \\ \ge \lambda_w(d^*), & \text{if } x_p^* = 0. \end{cases}$$

In equilibrium, only those paths connecting an O/D
pair that have minimal user costs are used, and their
costs are equal to the travel disutility associated with
traveling between the O/D pair.

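The equilibrium conditions have been formulated
as a variational inequality problem by Dafermos (cf.
[2,3]). In particular, we have:

Theorem 2 [3] $(x^*, d^*) \in K^1$ is a traffic network equilibrium
pattern, that is, it satisfies the equilibrium conditions,
if and only if it satisfies the variational inequality
problem (path flow formulation):

$$\langle C(x^*)^T, x - x^* \rangle - \langle \lambda(d^*)^T, d - d^* \rangle \ge 0, \quad \forall (x, d) \in K^1,$$

where $K^1 = \{(x, d) : x \ge 0, \text{ and the demand constraints hold}\}$.

Note that, in view of the demand constraints, one may
define $\hat\lambda(x) \equiv \lambda(d)$. One may then rewrite the
variational inequality in the path flow variables $x$ only.
That is, we seek to determine $x^* \in \mathbb{R}^{n_P}_+$ such that

$$\big\langle (C(x^*) - \hat\lambda(x^*))^T, x - x^* \big\rangle \ge 0, \quad \forall x \in \mathbb{R}^{n_P}_+,$$

where $\hat\lambda(x)$ is the $(n_{P_{w_1}} + \cdots + n_{P_{w_J}})$-dimensional column
vector with components

$$\big(\lambda_{w_1}(x), \dots, \lambda_{w_1}(x), \dots, \lambda_{w_J}(x), \dots, \lambda_{w_J}(x)\big).$$

Here each $\lambda_w(x)$ is repeated once for every path in $P_w$, and $J$ is
the number of O/D pairs. If we now let
$F(x) \equiv C(x) - \hat\lambda(x)$ and $K \equiv \{x : x \in \mathbb{R}^{n_P}_+\}$, then,
clearly, this inequality can be placed into the standard variational
inequality form: determine $x^* \in K$ such that
$\langle F(x^*)^T, x - x^* \rangle \ge 0$ for all $x \in K$.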


The Trip-Route Choice Adjustment Process 


The dynamical system first presented in [5], whose stationary
points correspond to solutions of the latter variational
inequality problem, is given by:

$$\dot x = \Pi_K\big(x, \hat\lambda(x) - C(x)\big), \quad x(0) = x_0 \in K.$$

Here the feasible set $K$ is assumed to be a convex polyhedron
(as is the case here). Given $x \in K$ and $v \in \mathbb{R}^{n_P}$, we define
the projection of the vector $v$ at $x$ (with respect to $K$) by

$$\Pi_K(x, v) = \lim_{\delta \to 0} \frac{P_K(x + \delta v) - x}{\delta},$$

where $P_K$ is defined as:

$$P_K(x) = \arg\min_{z \in K} \|x - z\|,$$

and $\|\cdot\|$ denotes the Euclidean norm.

This dynamical system is a projected dynamical system
(cf. [10,22]). Its right-hand side, which is
a projection operator, is discontinuous.
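For the feasible set of this model, $K = \mathbb{R}^{n_P}_+$, both projections have simple closed forms. $P_K$ clips negative components to zero. $\Pi_K(x, v)$ leaves $v_p$ unchanged where $x_p > 0$ and replaces it with $\max(v_p, 0)$ where $x_p = 0$. The Python sketch below implements both and checks the closed form against the difference quotient in the definition. The function names and the finite `delta` are illustrative choices. The last two calls in the usage show the discontinuity: a tiny positive flow and a zero flow receive different velocities.

```python
def project_orthant(y):
    """P_K(y) for K = nonnegative orthant: the closest point is max(y, 0) componentwise."""
    return [max(v, 0.0) for v in y]


def pi_K(x, v):
    """Pi_K(x, v) for K = nonnegative orthant and x in K: blocks only outward motion at x_p = 0."""
    return [vp if xp > 0 else max(vp, 0.0) for xp, vp in zip(x, v)]


def pi_K_difference_quotient(x, v, delta=1e-8):
    """Finite-delta version of the defining limit (P_K(x + delta v) - x) / delta."""
    moved = project_orthant([xp + delta * vp for xp, vp in zip(x, v)])
    return [(m - xp) / delta for m, xp in zip(moved, x)]


x, v = [0.0, 1.0], [-2.0, -3.0]
print(pi_K(x, v), pi_K_difference_quotient(x, v))  # [0.0, -3.0] and approximately the same
print(pi_K([1e-12], [-1.0]), pi_K([0.0], [-1.0]))  # [-1.0] [0.0]
```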

The adjustment process interpretation of the dy- 
namical system, as discussed in [5], is as follows: Users 
of a transportation network choose, at the greatest rate, 
those paths whose differences between the travel disu- 
tilities (demand prices) and path costs are maximal; 
in other words, those paths whose costs are minimal 
relative to the travel disutilities. If the travel cost on 
a path exceeds the travel disutility associated with the 
O/D pair, then the flow on that path will decrease; if 
the travel disutility exceeds the cost on a path, then the 
flow on that path will increase. If the difference between 
the travel disutility and the path cost drives the path 
flow to be negative, then the projection operator guar- 
antees that the path flow will be zero. The process con- 
tinues until there is no change in path flows, that is, 
until all used paths have path costs equal to the travel 
disutilities, whereas unused paths will have costs which 
exceed the disutilities. Specifically, the travelers adjust 
their route choices until an equilibrium is reached. 

The following example, given in a certain discrete- 
time realization, shows how the dynamic mechanism 
of the above trip-route choice adjustment would real- 
locate the traffic flow among the paths and would react 
to changes in the travel disutilities. 


Example 3 Consider a simple transportation network
consisting of two nodes, with a single O/D pair $w$, and
two links $a$ and $b$ representing the two disjoint paths
connecting the O/D pair. Suppose that the link costs
are:

$$c_a(f_a) = f_a + 2, \qquad c_b(f_b) = 2 f_b,$$

and the travel disutility function is given by:

$$\lambda_w(d_w) = -d_w + 5.$$

Note that here a path consists of a single link and,
hence, we can use $x$ and $f$ interchangeably. Suppose
that, at time $t = 0$, the flow on link $a$ is 0.7 and the flow
on link $b$ is 1.5. Hence the demand is 2.2, and the travel
disutility is 2.8, that is,

$$x_a(0) = 0.7, \quad x_b(0) = 1.5, \quad d_w(0) = 2.2, \quad \lambda_w(0) = 2.8,$$

which yields travel costs $c_a(0) = 2.7$ and $c_b(0) = 3.0$.


According to the above trip-route choice adjustment
process, the flow changing rates at time $t = 0$ are:

$$\dot x_a(0) = \lambda_w(0) - c_a(0) = 0.1, \qquad \dot x_b(0) = \lambda_w(0) - c_b(0) = -0.2.$$

If a time increment of 0.5 is used, then at the next
moment $t = 0.5$ the flows on link $a$ and link $b$ are:

$$x_a(0.5) = x_a(0) + 0.5\,\dot x_a(0) = 0.7 + 0.5 \times 0.1 = 0.75,$$
$$x_b(0.5) = x_b(0) + 0.5\,\dot x_b(0) = 1.5 - 0.5 \times 0.2 = 1.4,$$

which yields travel costs $c_a(0.5) = 2.75$ and $c_b(0.5) = 2.8$,
a travel demand $d_w(0.5) = 2.15$, and a travel disutility
$\lambda_w(0.5) = 2.85$. Now the flow changing rates are
given by:

$$\dot x_a(0.5) = \lambda_w(0.5) - c_a(0.5) = 2.85 - 2.75 = 0.1,$$
$$\dot x_b(0.5) = \lambda_w(0.5) - c_b(0.5) = 2.85 - 2.8 = 0.05.$$

The flows on link $a$ and link $b$ at time $t = 1.0$ would,
hence, be:

$$x_a(1.0) = x_a(0.5) + 0.5\,\dot x_a(0.5) = 0.75 + 0.5 \times 0.1 = 0.80,$$
$$x_b(1.0) = x_b(0.5) + 0.5\,\dot x_b(0.5) = 1.4 + 0.5 \times 0.05 = 1.425,$$
which yields travel costs: cq(1.0) = 2.80 and c,(1.0) = 


2.85, a travel demand d,,(1.0) = 2.225, and a travel disu- 
tility A,,(1.0) = 2.775. Now, the flow changing rates are 
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given by: 


Xq(1.0) = Aw(1.0) — cq(1.0) 

= 2.775 — 2.800 = 0.025, 
xp(1.0) = Aw(1.0) — cp(1.0) 

= 2.775 — 2.850 = —0.075. 


The flows on link a and link b at time t = 1.5 would be:
$$x_a(1.5) = x_a(1.0) + 0.5\,\dot x_a(1.0) = 0.8 - 0.5 \times 0.025 = 0.7875,$$
$$x_b(1.5) = x_b(1.0) + 0.5\,\dot x_b(1.0) = 1.425 - 0.5 \times 0.075 = 1.3875,$$
which yields travel costs $c_a(1.5) = 2.7875$ and $c_b(1.5) = 2.775$, a travel demand $d_w(1.5) = 2.175$, and a travel disutility $\lambda_w(1.5) = 2.825$.

In this example, hence, as time elapses, the trip-route choice adjustment process adjusts the flow volumes on the two links so that the difference between the travel costs of link a and link b is reduced, from 0.3, to 0.05, and, finally, to 0.0125; and the largest absolute difference between the disutility and the travel costs on the used links is also reduced, from 0.2, to 0.1, to 0.075, and to 0.05. In fact, the traffic equilibrium with $x_a^* = 0.8$ and $x_b^* = 1.4$, which induces the demand $d_w^* = 2.2$, is almost attained in only 1.5 time units.
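The hand computation of Example 3 can be reproduced with a few lines of code. The sketch below is a minimal explicit-Euler realization of $\dot x_p = \lambda_w(d_w) - c_p$ with time increment 0.5; the function names (`cost_a`, `cost_b`, `disutility`, `simulate`) are hypothetical, and no projection onto nonnegative flows is applied because both flows stay positive in this example.

```python
# Example 3: two parallel links a and b serving a single O/D pair w.
def cost_a(f_a):
    return f_a + 2.0          # c_a(f_a) = f_a + 2


def cost_b(f_b):
    return 2.0 * f_b          # c_b(f_b) = 2 f_b


def disutility(d_w):
    return -d_w + 5.0         # lambda_w(d_w) = -d_w + 5


def simulate(x_a, x_b, dt=0.5, steps=3):
    """Explicit Euler realization of dx_p/dt = lambda_w(d_w) - c_p(x_p).

    Returns rows (t, x_a, x_b, c_a, c_b, d_w, lambda_w) for t = 0, dt, ..., steps*dt.
    """
    history = []
    t = 0.0
    for _ in range(steps + 1):
        d_w = x_a + x_b                      # each path is one link, so x = f
        lam = disutility(d_w)
        c_a, c_b = cost_a(x_a), cost_b(x_b)
        history.append((t, x_a, x_b, c_a, c_b, d_w, lam))
        rate_a, rate_b = lam - c_a, lam - c_b   # flow changing rates
        x_a, x_b = x_a + dt * rate_a, x_b + dt * rate_b
        t += dt
    return history


for row in simulate(0.7, 1.5):
    print("t=%.1f x_a=%.4f x_b=%.4f c_a=%.4f c_b=%.4f d=%.4f lambda=%.4f" % row)
```

The printed rows should match the values derived above for t = 0, 0.5, 1.0, and 1.5.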


Stability Analysis 


We now present the stability results of the trip-route choice adjustment process. The results described herein are due to Zhang and Nagurney [21]. The questions that motivate transportation planners and analysts to study the stability of a transportation system include, for example: Will any initial flow pattern be driven to an equilibrium by the adjustment process? And will a flow pattern near an equilibrium always stay close to it? These concerns of system stability are important in traffic assignment and, indeed, form a critical basis for the very concept of an equilibrium flow pattern.

For the specific application of transportation net- 
work problems, the following definitions of stability of 
the transportation system and the local stability of an 
equilibrium are adapted from the general stability con- 
cepts of projected dynamical systems (cf. [22]). 


Definition 4 (stability at an equilibrium) An equilib- 
rium flow pattern x* is stable if it is a global mono- 
tone attractor for the corresponding route choice ad- 
justment process. 


Definition 5 (asymptotical stability at an equilibrium) 
An equilibrium flow pattern x* is asymptotically stable 
if it is a strictly global monotone attractor for the corre- 
sponding route choice adjustment process. 


Definition 6 (stability of the system) A route choice 
adjustment process is stable if all its equilibrium flow 
patterns are stable. 


Definition 7 (asymptotical stability of the system) 
A route choice adjustment process is asymptotically 
stable if all its equilibrium flow patterns are asymptoti- 
cally stable. 


We now present the stability results in [21] for the trip- 
route choice adjustment process. 


Theorem 8 ([21]) Suppose that the link cost functions c are monotone increasing in the link load pattern f and that the travel disutility functions λ are monotone decreasing in the travel demand d. Then the trip-route choice adjustment process is stable.


Theorem 9 ([21]) Assume that there exists some equilibrium path flow pattern. Suppose that the link cost functions c and negative disutility functions −λ are strictly monotone in the link load f and the travel demand d, respectively. Then the trip-route choice adjustment process is asymptotically stable.


The first theorem states that, provided that monotonic- 
ity of the link cost functions and the travel disutility 
functions holds true, then any flow pattern near an 
equilibrium will stay close to it forever. Under the strict 
monotonicity assumption, on the other hand, the sec- 
ond theorem can be interpreted as saying that any ini- 
tial flow pattern will eventually be driven to an equilib- 
rium by the route choice adjustment process. 


Discrete Time Algorithms 


The Euler method and the Heun method were em- 
ployed in [13] and [10] for the computation of solu- 
tions to dynamic elastic demand traffic network prob- 
lems with known travel disutility functions, and their 
convergence was also established therein. We refer the 
reader to these references for numerical results, including traffic network examples that are solved on a massively parallel computer architecture.

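In particular, at iteration τ, the Euler method computes
$$x^{\tau+1} = P_K\big(x^{\tau} - a_\tau F(x^{\tau})\big),$$
whereas, according to the Heun method, at iteration τ one computes
$$x^{\tau+1} = P_K\Big(x^{\tau} - a_\tau \frac{1}{2}\big[F(x^{\tau}) + F\big(P_K(x^{\tau} - a_\tau F(x^{\tau}))\big)\big]\Big).$$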


In the case that the sequence $\{a_\tau\}$ in the Euler method is fixed, say, $a_\tau = \rho$ for all iterations τ, the Euler method collapses to a projection method (cf. [2,6,9], and [14]).

In the context of the dynamic traffic network prob- 
lem with known travel disutility functions, the projec- 
tion operation in the above discrete-time algorithms 
can be evaluated explicitly and in closed form. Indeed, 
each iteration τ of the Euler method takes the form: for each path $p \in P$ in the transportation network, compute the path flow $x_p^{\tau+1}$ according to:
$$x_p^{\tau+1} = \max\{0,\; x_p^{\tau} + a_\tau(\lambda_w(d^{\tau}) - C_p(x^{\tau}))\}.$$
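A minimal sketch of this closed-form Euler iteration is given below, applied to the two-link network of Example 3. Several choices are assumptions made for illustration: the step sizes $a_\tau = 1/(\tau+1)$ (which tend to zero and have a divergent sum), the dictionary layout of paths per O/D pair, and the simplification that each disutility depends only on its own demand $d_w$. The helper names are hypothetical.

```python
def euler_traffic(x0, paths_of, path_cost, disutility, steps, step_size):
    """Closed-form Euler iteration
        x_p <- max(0, x_p + a_tau * (lambda_w(d_w) - C_p(x)))   for all paths p.

    paths_of maps each O/D pair w to the list of its paths; path_cost(p, x)
    returns C_p(x); disutility(w, d_w) returns lambda_w(d_w).
    """
    x = dict(x0)
    for tau in range(steps):
        a = step_size(tau)
        # current travel demands d_w = sum of the path flows of O/D pair w
        d = {w: sum(x[p] for p in ps) for w, ps in paths_of.items()}
        # each path update uses only the current iterate, so all of them
        # could be computed simultaneously (in parallel)
        x = {p: max(0.0, x[p] + a * (disutility(w, d[w]) - path_cost(p, x)))
             for w, ps in paths_of.items() for p in ps}
    return x


# Example 3: one O/D pair w served by two single-link paths a and b.
paths_of = {"w": ["a", "b"]}


def path_cost(p, x):
    return x["a"] + 2.0 if p == "a" else 2.0 * x["b"]


def disutility(w, d_w):
    return -d_w + 5.0


x = euler_traffic({"a": 0.7, "b": 1.5}, paths_of, path_cost, disutility,
                  steps=2000, step_size=lambda tau: 1.0 / (tau + 1))
print(x)   # close to the equilibrium x_a* = 0.8, x_b* = 1.4
```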


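Each iteration of the Heun method, in turn, consists of two steps. First, at iteration τ one computes the approximate path flows:
$$\bar x_p = \max\{0,\; x_p^{\tau} + a_\tau(\lambda_w(d^{\tau}) - C_p(x^{\tau}))\}, \qquad \forall p \in P,$$
and updates the approximate travel demands:
$$\bar d_w = \sum_{p \in P_w} \bar x_p, \qquad \forall w \in W.$$
Let
$$\bar x = \{\bar x_p, p \in P\} \quad \text{and} \quad \bar d = \{\bar d_w, w \in W\}.$$
Then, for each path $p \in P$ in the transportation network one computes the updated path flows $x_p^{\tau+1}$ according to:
$$x_p^{\tau+1} = \max\Big\{0,\; x_p^{\tau} + \frac{a_\tau}{2}\big[\lambda_w(d^{\tau}) - C_p(x^{\tau}) + \lambda_w(\bar d) - C_p(\bar x)\big]\Big\}, \qquad \forall p \in P,$$
and updates the travel demands $d_w^{\tau+1}$ according to:
$$d_w^{\tau+1} = \sum_{p \in P_w} x_p^{\tau+1}, \qquad \forall w \in W.$$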
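The two-step Heun iteration can be sketched in the same style. As before, this is only an illustration under assumptions: the helper names are hypothetical, the step sizes $a_\tau = 1/(\tau+1)$ are one admissible choice, each disutility is assumed to depend only on its own demand, and the demands are recomputed from the path flows inside the helper `direction` rather than stored separately.

```python
def heun_traffic(x0, paths_of, path_cost, disutility, steps, step_size):
    """Closed-form Heun iteration: an Euler predictor followed by a
    corrector that averages the two directions lambda_w - C_p."""
    x = dict(x0)

    def direction(flows):
        # travel demands implied by the flows, then lambda_w(d_w) - C_p(flows)
        d = {w: sum(flows[p] for p in ps) for w, ps in paths_of.items()}
        return {p: disutility(w, d[w]) - path_cost(p, flows)
                for w, ps in paths_of.items() for p in ps}

    for tau in range(steps):
        a = step_size(tau)
        g = direction(x)
        # step 1: approximate path flows (projected Euler predictor)
        x_bar = {p: max(0.0, x[p] + a * g[p]) for p in x}
        g_bar = direction(x_bar)
        # step 2: corrected path flows using the averaged direction
        x = {p: max(0.0, x[p] + 0.5 * a * (g[p] + g_bar[p])) for p in x}
    return x


paths_of = {"w": ["a", "b"]}


def path_cost(p, x):
    return x["a"] + 2.0 if p == "a" else 2.0 * x["b"]


def disutility(w, d_w):
    return -d_w + 5.0


x = heun_traffic({"a": 0.7, "b": 1.5}, paths_of, path_cost, disutility,
                 steps=2000, step_size=lambda tau: 1.0 / (tau + 1))
print(x)   # close to the equilibrium x_a* = 0.8, x_b* = 1.4
```

Each Heun iteration costs two evaluations of the path costs and disutilities instead of one, but, like the Euler iteration, every path update is an independent closed-form expression.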


It is worth noting that both the Euler method and 
the Heun method at each iteration yield subproblems in 
the path flow variables, each of which can be solved not 
only in closed form, but also, simultaneously. Hence, 
these algorithms in the context of this model can be in- 
terpreted as massively parallel algorithms and can be 
implemented on massively parallel architectures. Indeed, this has been done by Nagurney and Zhang [13] (see also [11] for the case where the demand functions are given, rather than the travel disutility functions).

In order to establish the convergence of the Euler 
method and the Heun method, one regularizes the link 
cost structures. 


Definition 10 (regular cost function) The link cost function c is called regular if, for every link $a \in L$,
$$c_a(f) \to \infty \quad \text{as } f_a \to \infty$$
holds uniformly true for all link flow patterns.


We note that the above regularity condition on the link 
cost functions is natural from a practical point of view 
and it does not impose any substantial restrictions. In 
reality, any link has an upper bound in the form of a capacity. Therefore, letting $f_a \to \infty$ is an artificial device under which one can reasonably deduce that $c_a(f) \to \infty$, due to the congestion effect. Consequently, any practical link cost structure can be theoretically extended to a regular link cost structure to allow for an infinite load.

The theorem below shows that both the Euler 
method and the Heun method converge to the traffic 
network equilibrium under reasonable assumptions. 
Theorem 11 ([10,13]) Suppose that the link cost function c is regular and strictly monotone increasing, and that the travel disutility function λ is strictly monotone decreasing. Let $\{a_\tau\}$ be a sequence of positive real numbers that satisfies
$$\lim_{\tau \to \infty} a_\tau = 0$$
and
$$\sum_{\tau=0}^{\infty} a_\tau = \infty.$$
Then both the Euler method and the Heun method produce sequences $\{x^{\tau}\}$ that converge to some traffic network equilibrium path flow pattern.
Introduction


A considerable amount of effort has been spent on 
studying and developing efficient solution procedures 
for the economic lot-sizing problem. This problem was solved in 1958, but there is still continuing interest in it, mainly because of its practical applications.
For example, economic lot-sizing is the core problem in 


aggregate production planning in MRP systems (Nah- 
mias [25]). For an extensive review, see Aggarwal and 
Park [1], Bahl et al. [2], Belvaux and Wolsey [6,7,8], 
Nemhauser and Wolsey [26], and Wolsey [36]. 

The economic lot-sizing problem can be defined as 
follows. Given the demand, the unit production cost, 
the unit inventory holding cost for a commodity, the 
production capacities, and the setup costs for each time 
period over a finite, discrete-time horizon, find a pro- 
duction schedule that would satisfy demand at mini- 
mum cost. 

This model assumes a fixed and a variable com- 
ponent of production costs. The fixed cost consists of 
manpower and materials to start up the machines. To 
reduce the fixed cost per unit, large lot sizes are desired. 
On the other hand, for every unit produced there are 
associated production and inventory holding costs, and 
the total variable cost (production plus inventory) in- 
creases with the number of units produced. Solving the 
lot-sizing problem means finding a production sched- 
ule that would satisfy demand at every period and min- 
imize the total of fixed and variable costs. 

The work by Harris [18] in 1913 has been cited as 
the first study of the economic lot-sizing problem that 
assumes deterministic demands. This model, known as 
the Economic Order Quantity (EOQ) model, proposes 
a production schedule to satisfy the demand for a single 
commodity with a constant demand rate. Production 
takes place continuously over time, and the model does 
not incorporate capacity limits. 

A major limitation of the above model is that the de- 
mand is continuous over time and has a constant rate. 
Manne [23] and Wagner and Whitin [35] studied the 
lot-sizing problem with a finite time horizon consisting 
of a number of discrete periods, each with its own de- 
terministic and independent demand. 
Formulation


Let T be the length of the planning horizon, and let $p_t$, $h_t$, $s_t$, $b_t$ denote the unit production cost, the unit inventory holding cost, the setup cost, and the demand in period t (1 ≤ t ≤ T), respectively. The following are the decision variables for this problem:

$q_t$: amount of production in period t

$I_t$: inventory level at the end of period t

$y_t = 1$ if production occurs in period t, and $y_t = 0$ otherwise.

A mixed-integer programming formulation of the classical economic lot-sizing problem is
$$\text{minimize} \quad \sum_{t=1}^{T} \big(p_t q_t + s_t y_t + h_t I_t\big)$$
subject to (C-ELS)
$$q_t + I_{t-1} = b_t + I_t, \qquad 1 \le t \le T, \quad (1)$$
$$q_t \le b_{tT}\, y_t, \qquad 1 \le t \le T, \quad (2)$$
$$I_T = 0, \quad (3)$$
$$y_t \in \{0,1\}, \qquad 1 \le t \le T, \quad (4)$$
$$q_t, I_t \ge 0, \qquad 1 \le t \le T, \quad (5)$$
where $b_{tT}$ is the total demand in periods t, …, T.


Below, some of the approaches that have been used 
to solve the economic lot-sizing problem are summa- 
rized. 


Dynamic Programming Algorithm 


In 1958, Wagner and Whitin [35] developed a dy- 
namic programming algorithm to solve the economic 
lot-sizing problem. This algorithm runs in O(T²) time.

The special structure of the optimal solutions to the 
economic lot-sizing problem contributed to develop- 
ing an efficient dynamic programming algorithm. For 
this purpose, it is useful to view the problem as a fixed 
charge network flow problem. Since the objective function is concave and the arcs do not have capacity restrictions, this network flow problem has an optimal solution that is a tree. This property of an optimal solution implies that:


Theorem 1 There is an optimal solution to the economic lot-sizing problem that satisfies the following:

(i.) $q_t I_{t-1} = 0$ for $1 \le t \le T$,

(ii.) If $q_t > 0$, then $q_t = \sum_{\tau=t}^{t'} b_\tau$ for some $t'$ with $t \le t' \le T$.


Property (i.) shows that production takes place only when the inventory carried forward is zero. This property is known as the Zero-Inventory Ordering (ZIO) property.
Property (ii.) shows that if production takes place in pe- 
riod t, the amount produced will be exactly equal to the 
total demand for a number of periods (t to t’). Prop- 
erty (ii.) is used to develop the dynamic programming 
algorithm for the economic lot-sizing problem. 

Let v(t) be the minimum cost of a solution for periods 1, …, t. If τ ≤ t is the last period in which production occurs, then $q_\tau = b_{\tau t}$ and $I_{\tau-1} = 0$. This implies that the problem can be divided into two smaller subproblems, and the least-cost solution $v(\tau-1)$ is optimal for the first subproblem (periods 1, …, τ − 1). This leads to the following recursive function:
$$v(t) = \min_{1 \le \tau \le t} \Big\{ v(\tau-1) + s_\tau + p_\tau b_{\tau t} + \sum_{k=\tau}^{t-1} h_k b_{k+1,t} \Big\},$$
with v(0) = 0.


Calculating v(t) for t = 1, …, T leads to the optimal value v(T) of the economic lot-sizing problem.
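The recursion can be turned into a short O(T²) program. The sketch below is one possible implementation, not the authors' code: for fixed t it scans τ downward and updates the demand $b_{\tau t}$ and the holding term incrementally, stores the minimizing τ, and then backtracks to recover the lot sizes. The function name, the 0-based list layout, and the small test instances are hypothetical.

```python
def wagner_whitin(p, h, s, b):
    """O(T^2) dynamic program
        v(t) = min_tau { v(tau-1) + s_tau + p_tau b_{tau t}
                         + sum_{k=tau}^{t-1} h_k b_{k+1,t} },  v(0) = 0.
    Lists are 0-based: p[t-1] is p_t, and so on.
    Returns (v(T), production quantities q_1..q_T)."""
    T = len(b)
    v = [0.0] * (T + 1)      # v[0] = 0
    last = [0] * (T + 1)     # minimizing tau (last setup period) for each t
    for t in range(1, T + 1):
        best, arg = float("inf"), 0
        demand = 0.0         # b_{tau t}
        holding = 0.0        # sum_{k=tau}^{t-1} h_k b_{k+1,t}
        for tau in range(t, 0, -1):
            # demand of periods tau+1..t is carried through period tau
            holding += h[tau - 1] * demand
            demand += b[tau - 1]
            cost = v[tau - 1] + s[tau - 1] + p[tau - 1] * demand + holding
            if cost < best:
                best, arg = cost, tau
        v[t], last[t] = best, arg
    # backtrack: the lot produced in period tau covers periods tau..t
    q = [0.0] * T
    t = T
    while t > 0:
        tau = last[t]
        q[tau - 1] = float(sum(b[tau - 1:t]))
        t = tau - 1
    return v[T], q


cost, q = wagner_whitin(p=[1, 1, 1], h=[1, 1, 1], s=[100, 100, 100], b=[10, 20, 30])
print(cost, q)   # 240.0 [60.0, 0.0, 0.0]
```

With cheap holding, one setup in period 1 is optimal; raising the holding costs makes producing in every period the cheaper schedule, and the plan returned always has the ZIO structure of Theorem 1.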


Shortest Path Algorithm 


The economic lot-sizing problem has also been solved as a shortest-path problem in an acyclic network (Eppen and Martin [13], Martin [24], Zangwill [38]). The acyclic network G is built in the following way: the total number of nodes in G is T + 1, one for each time period along with a dummy node. The arc (t, t′) ∈ G (1 ≤ t < t′ ≤ T + 1) represents the choice of producing in period t to satisfy the demand in periods t, …, t′ − 1. Thus, the cost of arc (t, t′) is calculated using the following cost function:
$$c_{tt'} = s_t + p_t b_{t,t'-1} + \sum_{\tau=t}^{t'-2} h_\tau b_{\tau+1,t'-1}.$$
The shortest path from node 1 to node T + 1 provides the set of production intervals of minimum cost and therefore solves the economic lot-sizing problem. A shortest path in a directed acyclic network can be found in O(m) time, where m is the number of arcs. The total number of arcs in G is of order T²; therefore, this shortest-path algorithm solves the economic lot-sizing problem in O(T²) time.
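The shortest-path view can be sketched as follows. Because nodes 1, …, T + 1 are already in topological order, a single forward pass relaxing every arc (t, t′) gives the shortest distances; the arc costs $c_{tt'}$ are accumulated incrementally as t′ grows, so the pass touches each of the O(T²) arcs once. The function name, list layout, and test data are hypothetical.

```python
def lot_sizing_shortest_path(p, h, s, b):
    """Shortest path from node 1 to node T+1 in the acyclic lot-sizing network.
    Arc (t, e+1) means: produce in period t for periods t..e.
    Lists are 0-based (p[t-1] is p_t). Returns (cost, q_1..q_T)."""
    T = len(b)
    dist = [float("inf")] * (T + 2)    # dist[k] for nodes k = 1..T+1
    pred = [0] * (T + 2)
    dist[1] = 0.0
    for t in range(1, T + 1):          # nodes in topological order
        demand = holding = hsum = 0.0
        for e in range(t, T + 1):
            demand += b[e - 1]                 # b_{t,e}
            holding += b[e - 1] * hsum         # b_e is held in periods t..e-1
            hsum += h[e - 1]                   # sum_{tau=t}^{e} h_tau
            arc_cost = s[t - 1] + p[t - 1] * demand + holding   # c_{t,e+1}
            if dist[t] + arc_cost < dist[e + 1]:
                dist[e + 1] = dist[t] + arc_cost
                pred[e + 1] = t
    # follow the predecessor labels back from node T+1 to read off the lots
    q = [0.0] * T
    node = T + 1
    while node > 1:
        t = pred[node]
        q[t - 1] = float(sum(b[t - 1:node - 1]))
        node = t
    return dist[T + 1], q


cost, q = lot_sizing_shortest_path(p=[1, 1, 1], h=[1, 1, 1], s=[100, 100, 100], b=[10, 20, 30])
print(cost, q)   # 240.0 [60.0, 0.0, 0.0]
```

On the same data this returns the same optimal cost as the dynamic programming recursion, which reflects the fact that the two formulations are equivalent.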


Primal-Dual Algorithm 


Almost 30 years after Wagner and Whitin developed their dynamic programming algorithm, Wagelmans et al. [34], Aggarwal and Park [1], and Federgruen and Tzur [15] showed that the running time of the dynamic programming algorithm could be reduced to O(T log T) in the general case and to O(T) when the costs have a special structure. This special structure ($h_t + p_t \ge p_{t+1}$) is also referred to as the absence of speculative motives. They developed a primal-dual algorithm to get these results.

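Below, we present a reformulation of the economic lot-sizing problem. In this formulation, $q_{\tau t}$ ($t = \tau, \dots, T$) is defined as the amount produced in period τ to satisfy demand in period t (Krarup and Bilde [20], van Hoesel [31]). The reformulation is
$$\text{minimize} \quad \sum_{\tau=1}^{T} \sum_{t=\tau}^{T} \hat c_{\tau t}\, q_{\tau t} + \sum_{\tau=1}^{T} s_\tau y_\tau$$
subject to (Ex-ELS)
$$\sum_{\tau=1}^{t} q_{\tau t} = b_t, \qquad 1 \le t \le T, \quad (7)$$
$$q_{\tau t} - b_t y_\tau \le 0, \qquad 1 \le \tau \le t \le T, \quad (8)$$
$$q_{\tau t} \ge 0, \qquad 1 \le \tau \le t \le T, \quad (9)$$
$$y_\tau \in \{0,1\}, \qquad 1 \le \tau \le T, \quad (10)$$
where $\hat c_{\tau t} = p_\tau + \sum_{k=\tau}^{t-1} h_k$.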

Krarup and Bilde show that the linear programming relaxation of (Ex-ELS) (obtained by relaxing the constraints y ∈ {0,1} to 0 ≤ y ≤ 1) always has an integral optimal solution. The
corresponding dual problem has a special structure that 
enables developing a primal-dual-based algorithm. The 
following is the formulation of the dual problem: 


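$$\text{maximize} \quad \sum_{t=1}^{T} b_t v_t$$
subject to (D-ELS)
$$\sum_{t=\tau}^{T} b_t \max(0, v_t - \hat c_{\tau t}) \le s_\tau, \qquad 1 \le \tau \le T. \quad (11)$$

The dual variables have the following property: $v_t \le v_{t+1}$ for $1 \le t \le T - 1$. This property of the dual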
variables is used to show that the dual-ascent algorithm 
gives the optimal solution to the economic lot-sizing 
problem. Note that formulation (Ex-ELS) of the eco- 
nomic lot-sizing problem is a special case of the facility 
location problem. The primal-dual algorithm, in prin- 
ciple, is similar to the primal-dual scheme proposed 
by Erlenkotter [14] for the facility location problem. 
The primal-dual algorithm for the economic lot-sizing 
problem developed by Wagelmans et al. [34] runs in 
O(T log T). 


Cutting Plane Algorithm 


Many researchers studying lot-sizing problems also fo- 
cused on determining a partial polyhedral description 
of the set of the feasible solutions and applying branch- 
and-cut methods (Pochet et al. [28], Leung et al. [21], 
Barany et al. [5]). The main motivation for studying the 
polyhedral structure of the single item lot-sizing prob- 
lem is to use the results from these studies to develop 
efficient algorithms for problems such as the multi- 
commodity economic lot-sizing problem. However, the 
branch-and-cut approach has not yet resulted in com- 
petitive algorithms for the single-item lot-sizing prob- 
lem itself. The reason is that generating a single cut 
could be as time-consuming as solving the whole prob- 
lem. 

Barany et al. [4,5] provide a set of valid inequalities for the single-commodity lot-sizing problem and show that these inequalities define facets of the convex hull of the feasible region. Furthermore, they show that the inequalities fully describe the convex hull of the feasible region. Their separation algorithm runs in O(T²) time.

Pereira and Wolsey [27] study a family of unbounded polyhedra arising in an uncapacitated lot-sizing problem with Wagner-Whitin type costs ($h_t + p_t - p_{t+1} \ge 0$). They characterize the bounded faces of maximal dimension completely and show that they are integral. For a problem with T periods they derive an O(T³) algorithm to express any point within the polyhedron as a convex combination of the extreme points and the extreme rays of the polyhedron. They observe that for a given objective function, the face of optimal solutions can be found in O(T²) time.
Cases
Capacitated Economic Lot-Sizing Problem 


The capacitated lot-sizing problem is NP-hard even for many special cases (Florian et al. [16] and Bitran and Yanasse [9]). In 1971, Florian and Klein presented a remarkable result. They developed an O(T⁴) algorithm for solving the capacitated lot-sizing problem with equal capacities in all periods. This result uses a dynamic programming approach combined with some important properties of optimal solutions to these problems. Recently, van Hoesel and Wagelmans [32] showed that this algorithm can be improved to O(T³) if backlogging is not allowed and the holding cost functions are linear.

Several solution approaches have been proposed 
for NP-hard special cases of the capacitated lot-siz- 
ing problem. These methods are typically based on 
branch-and-bound (see, for instance, Baker et al. [3] and Erengüç and Aksoy [12]), dynamic programming (Kirca [19] and Chen and Lee [10]) or a combination of the two (Chung and Lin [11] and Lotfi and Yoon [22]).

Shaw and Wagelmans [29] considered the capaci- 
tated lot-sizing problem with piecewise linear produc- 
tion costs and general holding costs. They showed that 
this is an NP-hard problem and presented an algorithm 
that runs in pseudopolynomial time. 


Multi-commodity Economic Lot-Sizing Problem 


The multicommodity version of the problem has at- 
tracted much attention. Manne [23] uses the ZIO prop- 
erty to develop a column generation approach to solve 
this problem. Barany et al. [5] solve the multicommod- 
ity capacitated lot-sizing problem without setup times 
optimally using a cutting-plane procedure followed by 
branch-and-bound. Moreover, Manne introduces up- 
per bounds on the production capacities. 

Other extensions to the classic economic lot-sizing 
problem consider setup times, backorders, and other 
factors. Zangwill [37] extends Wagner and Whitin’s 
model by allowing backlogging and introducing general 
concave cost functions. Veinott [33] studies an unca- 
pacitated model with convex cost structures. Trigeiro 
et al. [30] show that a capacitated lot-sizing problem 
with setup times is much harder to solve than a capac- 
itated lot-sizing problem without setup times. It is easy 
to check if the capacitated lot-sizing problem without 


setup times has a feasible solution or not. This can be 
done by computing cumulative demand and cumula- 
tive capacity. When setup times are considered, the fea- 
sibility problem is NP-complete. Also, the bin-packing 
problem is a special case of capacitated lot-sizing prob- 
lem with setup times (Garey and Johnson [17, p. 226]). 


Selfadjoint eigenvalue problems for ordinary differen- 
tial equations are very important in the sciences and 
in engineering. The characterization of eigenvalues by 
a minimum-maximum principle for the Rayleigh quo- 
tient forms the basis for the famous Rayleigh-Ritz 
method. This method allows for an efficient computation of nonincreasing upper eigenvalue bounds. N.J. Lehmann and H.J. Maehly [6,7,8] independently developed complementary characterizations that can be used
to compute lower bounds. These methods are based 
on extremal principles for the Temple quotient. In gen- 
eral, however, an application of the Lehmann-Maehly 
method requires that certain quantities can be deter- 
mined explicitly. This may be difficult or even impos- 
sible when dealing with partial differential equations. 
Of great importance is therefore a generalization, the 
Goerisch method [3,4,5], which may be used to overcome these problems. Nevertheless, the original Lehmann-Maehly method can easily be applied to a large class of ordinary differential equations; in [10] it is shown that the method can be interpreted as a special application of the Rayleigh-Ritz method.


Inclusion Method 


Let $(H, (\cdot|\cdot))$ be an infinite-dimensional Hilbert space with the inner product $(\cdot|\cdot)$ and the norm $\|\cdot\|$. Suppose that V is a dense subspace of H and that one has an inner product $[\cdot|\cdot]$ in V such that $(V, [\cdot|\cdot])$ is a Hilbert space (the norm in V is denoted by $\|\cdot\|_V$). The embedding $V \hookrightarrow H$ is assumed to be compact.

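One can consider the right-definite eigenvalue problem
$$\text{Find } \lambda \in \mathbb{R} \text{ and } \varphi \in V,\ \varphi \ne 0, \text{ s.t. } [\varphi|v] = \lambda(\varphi|v) \text{ for all } v \in V. \quad (1)$$
Problem (1) has a countable spectrum of eigenvalues, and the eigenvalues can be ordered by magnitude:
$$\lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots, \qquad \lambda_j \to \infty \ (j \to \infty).$$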
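The Rayleigh-Ritz procedure for calculating upper bounds is a discretization of the Poincaré principle (cf. [9, Chapt. 22])
$$\lambda_j = \min_{\substack{E \subset V \\ \dim E = j}} \ \max_{\substack{u \in E \\ u \ne 0}} \frac{[u|u]}{(u|u)}, \qquad j \in \mathbb{N}. \quad (2)$$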


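If the linearly independent trial functions
$$u_1, \dots, u_n \in V, \qquad n \in \mathbb{N},$$
are chosen, one can reduce (2) to the n-dimensional subspace $V_n$ (the span of the chosen functions $\{u_1, \dots, u_n\}$) and obtains the values
$$\lambda_j^{[n]} := \min_{\substack{E \subset V_n \\ \dim E = j}} \ \max_{\substack{u \in E \\ u \ne 0}} \frac{[u|u]}{(u|u)},$$
which are upper bounds for the eigenvalues $\lambda_j$:
$$\lambda_j \le \lambda_j^{[n]}, \qquad j = 1, \dots, n.$$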


$\lambda_j^{[n]}$ is called a Rayleigh-Ritz bound for $\lambda_j$. Now one forms the real n × n matrices
$$A_0 := \big((u_i|u_k)\big)_{i,k=1,\dots,n}, \qquad A_1 := \big([u_i|u_k]\big)_{i,k=1,\dots,n}; \quad (3)$$
the Rayleigh-Ritz bounds are the eigenvalues of the matrix eigenvalue problem
$$A_1 x = \lambda^{[n]} A_0 x, \qquad (\lambda^{[n]}, x) \in \mathbb{R} \times \mathbb{R}^n. \quad (4)$$
The Rayleigh-Ritz bounds are monotonically nonincreasing in $n \in \mathbb{N}$.

The Lehmann-Goerisch procedure (see [3,4,5,6,7]) 
for calculating lower bounds can be understood as 
the discretization of a variational principle for characterizing the eigenvalues as well. This principle and a proof of the method are due to S. Zimmermann and U. Mertins [10].

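Let $\rho \in \mathbb{R}$ be a spectral parameter such that for an $N \in \mathbb{N}$ the inequality
$$\lambda_N < \rho < \lambda_{N+1} \quad (5)$$
holds true. One expresses the first N eigenvalues in the form
$$\lambda_{N+1-i} = \rho + \frac{1}{\sigma_i}, \qquad i = 1, \dots, N$$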


(assuming $\sigma_i < 0$). For $u \in V$, $w_u \in H$ denotes the uniquely determined solution of the equation
$$[u|v] = (w_u|v) \quad \text{for all } v \in V;$$
the $\sigma_i$ are then characterized by
$$\sigma_i = \inf_{\substack{E \subset V \\ \dim E = i}} \ \max_{\substack{u \in E \\ u \ne 0}} \frac{[u|u] - \rho(u|u)}{(w_u|w_u) - 2\rho[u|u] + \rho^2(u|u)}, \qquad i = 1, \dots, N. \quad (6)$$
A negative upper bound for $\sigma_i$ results in a lower bound for $\lambda_{N+1-i}$. In order to discretize (6), one determines $w_1, \dots, w_n \in H$ such that


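$$[u_i|v] = (w_i|v) \quad \text{for all } v \in V, \quad (7)$$
then one defines the matrix
$$A_2 := \big((w_i|w_k)\big)_{i,k=1,\dots,n}, \quad (8)$$
and solves the matrix eigenvalue problem
$$(A_1 - \rho A_0)\, x = \tau\, (A_2 - 2\rho A_1 + \rho^2 A_0)\, x, \qquad (\tau, x) \in \mathbb{R} \times \mathbb{R}^n. \quad (9)$$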
If for $n \in \mathbb{N}$ the condition $\lambda_N^{[n]} < \rho$ is fulfilled, then (9) has exactly N negative eigenvalues $\tau_1 \le \cdots \le \tau_N < 0 < \tau_{N+1} \le \cdots \le \tau_n$. These $\tau_i$ are upper bounds for the $\sigma_i$ ($\sigma_i \le \tau_i$, $i = 1, \dots, N$). One obtains the lower bounds
$$\Lambda_j^{[n]} := \rho + \frac{1}{\tau_{N+1-j}} \le \lambda_j, \qquad j = 1, \dots, N. \quad (10)$$
This discretization (9), (10) is the Lehmann-Goerisch procedure. $\Lambda_j^{[n]}$ is called a Lehmann-Goerisch bound for $\lambda_j$.


Numerical Example 


The numerical example is the well-known Mathieu equation. This equation has been considered by several authors; bounds for eigenvalues of the Mathieu equation can be found in [1,9] and [3]. The eigenvalue problem reads as follows:
$$-\varphi''(x) + s\cos^2(x)\,\varphi(x) = \lambda\varphi(x), \quad x \in \Big[0, \frac{\pi}{2}\Big], \qquad \varphi'(0) = \varphi'\Big(\frac{\pi}{2}\Big) = 0,$$
where $s \in \mathbb{R}$, $s > 0$, is a parameter.
In order to treat this problem, the required quantities can be defined as follows:
$$I := (0, \pi/2), \qquad H := L_2(I), \qquad V := H^1(I).$$
The inner products $(\cdot,\cdot)$ and $[\cdot,\cdot]$ are given by
$$(f, g) := \int_0^{\pi/2} f(x)\, g(x)\,dx \quad \text{for all } f, g \in H,$$
$$[f, g] := \int_0^{\pi/2} \big(f'(x)\, g'(x) + s\cos^2(x)\, f(x)\, g(x)\big)\,dx \quad \text{for all } f, g \in V.$$
With this definition the inner product $[\cdot,\cdot]$ and the usual $H^1$ inner product are equivalent; the embedding $(V, [\cdot,\cdot]) \hookrightarrow (H, (\cdot,\cdot))$ is compact.

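Now the eigenvalue problem
$$\text{Find } \lambda \in \mathbb{R} \text{ and } \varphi \in V,\ \varphi \ne 0, \text{ s.t. } [\varphi|v] = \lambda(\varphi|v) \text{ for all } v \in V,$$
is equivalent to the Mathieu equation. The trial functions $v_k \in V$ are defined by
$$v_1(x) := 1, \qquad v_k(x) := \cos(2(k-1)x), \quad x \in I,\ k = 2, \dots, n. \quad (11)$$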


With these trial functions the Rayleigh-Ritz upper bounds $\lambda_k^{[n]}$ (cf. (3), (4)) can be computed. For n = 5 one obtains

k | $\lambda_k^{[5]}$
1 | 2.28404873592
2 | 8.4560567005
3 | 19.606719005
4 | 39.5439779
5 | 67.609198

The quality of these upper bounds can be increased by increasing n.
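The Rayleigh-Ritz step (3), (4) for the trial functions (11) can be sketched numerically. Since the value of s used for the table is not stated in this section, the sketch uses a hypothetical s = 1.0 and does not attempt to reproduce the numbers above. The integrals are approximated by Gauss-Legendre quadrature on (0, π/2) (an assumption; the integrands are trigonometric polynomials, so 200 nodes are far more than enough), and $A_1 x = \lambda A_0 x$ is reduced to a standard symmetric problem via the Cholesky factor of $A_0$. The function names are hypothetical.

```python
import numpy as np


def mathieu_matrices(s, n, m=200):
    """A0 = ((v_i|v_k)) and A1 = ([v_i|v_k]) for v_k(x) = cos(2(k-1)x) on (0, pi/2)."""
    t, wt = np.polynomial.legendre.leggauss(m)
    x = (t + 1.0) * np.pi / 4.0            # map nodes from [-1, 1] to [0, pi/2]
    wt = wt * np.pi / 4.0
    k = np.arange(n)[:, None]              # k - 1 = 0, ..., n - 1
    V = np.cos(2.0 * k * x)                # v_k at the nodes, shape (n, m)
    dV = -2.0 * k * np.sin(2.0 * k * x)    # v_k' at the nodes
    q = s * np.cos(x) ** 2                 # potential s cos^2(x)
    A0 = (V * wt) @ V.T
    A1 = (dV * wt) @ dV.T + (V * q * wt) @ V.T
    return A0, A1


def generalized_eigvalsh(A, B):
    """Eigenvalues of A x = mu B x (A symmetric, B positive definite), ascending."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ A @ Linv.T)


def rayleigh_ritz_bounds(s, n):
    A0, A1 = mathieu_matrices(s, n)
    return generalized_eigvalsh(A1, A0)


print(rayleigh_ritz_bounds(0.0, 5))   # s = 0: approximately 0, 4, 16, 36, 64
print(rayleigh_ritz_bounds(1.0, 3))   # hypothetical s = 1, n = 3
print(rayleigh_ritz_bounds(1.0, 5))   # n = 5: first three bounds do not increase
```

For s = 0 the trial functions are exact eigenfunctions, so the bounds coincide with $4(i-1)^2$; for s > 0, enlarging the trial space can only lower the bounds, in line with their monotonicity in n.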

An application of the Lehmann-Goerisch procedure requires a spectral parameter ρ which is a rough eigenvalue bound (cf. (5)). For this aim the Mathieu equation is considered for s = 0. This is a second-order problem with constant coefficients and can be solved in closed form. Its eigenvalues are $\lambda_i^{(0)} = 4(i-1)^2$, $i \in \mathbb{N}$. From the comparison theorem (see [3]) one can see that the $\lambda_i^{(0)}$ are lower bounds for the eigenvalues of the Mathieu equation with s > 0; this can be used to verify the right-hand side inequality of (5), while the left-hand side inequality can be examined by means of the Rayleigh-Ritz bounds. For N = 3 one obtains
$$\lambda_3 \le \lambda_3^{[5]} \approx 19.607 < \rho := \lambda_4^{(0)} = 36 < \lambda_4.$$
If s is increased dramatically, it may be impossible to satisfy (5) in this way. If this happens, one can link the eigenvalue problem under consideration and the comparison problem by a homotopy method (cf. [3]).

The next task is the determination of $w_i \in H$ such that (7) holds true. In general this is a problem, but for differential equations where the right-hand side is the identity, one can proceed as follows: the operator on the left-hand side of the differential equation is denoted by M; then the trial functions $v_i$ are chosen from $D(M)$ (that means sufficiently smooth) such that all essential and natural boundary conditions are satisfied. Now $w_i := M v_i$ fulfills (7). For the Mathieu equation one can define
$$(Mf)(x) := -f''(x) + s\cos^2(x)\, f(x)$$
and
$$D(M) := \{ f \in H^2(I) : f'(0) = f'(\pi/2) = 0 \};$$
now it is easy to see that the $v_i$ from (11) fulfill $v_i \in D(M)$ and $w_i := M v_i$ can be used in (7), (8).

From the eigenvalues of the matrix eigenvalue problem (9) one obtains the following bounds:

j | upper bound $\lambda_j^{[5]}$ | Lehmann-Goerisch lower bound
1 | 2.28404873592 | 2.28404873561
2 | 8.4560567005 | 8.4560566942
3 | 19.6067191 | 19.6067171

For an example with a system of ordinary differen- 
tial equations see [2]. 
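The complete two-sided enclosure for the Mathieu example can be sketched as follows. As in the text, $w_k = M v_k$, which for $v_k(x) = \cos(2(k-1)x)$ equals $(4(k-1)^2 + s\cos^2 x)\,v_k$; the matrices (3), (8) are then assembled and (4), (9), (10) are solved. The parameter value s = 1.0, the quadrature rule, and all function names are assumptions for illustration; ρ = 36 and N = 3 follow the choice discussed above, and the caller remains responsible for the right-hand inequality ρ < λ_{N+1} of (5), which the code cannot check.

```python
import numpy as np


def mathieu_matrices(s, n, m=200):
    """A0 = ((v_i|v_k)), A1 = ([v_i|v_k]), A2 = ((w_i|w_k)) with w_k = M v_k."""
    t, wt = np.polynomial.legendre.leggauss(m)
    x = (t + 1.0) * np.pi / 4.0             # Gauss-Legendre nodes on [0, pi/2]
    wt = wt * np.pi / 4.0
    k = np.arange(n)[:, None]
    V = np.cos(2.0 * k * x)                 # v_k(x) = cos(2(k-1)x)
    dV = -2.0 * k * np.sin(2.0 * k * x)
    q = s * np.cos(x) ** 2
    W = (4.0 * k ** 2 + q) * V              # w_k = -v_k'' + s cos^2(x) v_k
    A0 = (V * wt) @ V.T
    A1 = (dV * wt) @ dV.T + (V * q * wt) @ V.T
    A2 = (W * wt) @ W.T
    return A0, A1, A2


def generalized_eigvalsh(A, B):
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ A @ Linv.T)


def mathieu_bounds(s, n, rho, N):
    """Rayleigh-Ritz upper bounds (4) and Lehmann-Goerisch lower bounds (10)."""
    A0, A1, A2 = mathieu_matrices(s, n)
    upper = generalized_eigvalsh(A1, A0)
    tau = generalized_eigvalsh(A1 - rho * A0, A2 - 2.0 * rho * A1 + rho ** 2 * A0)
    neg = tau[tau < 0]                      # tau_1 <= ... <= tau_N < 0
    if upper[N - 1] >= rho or neg.size != N:
        raise ValueError("condition lambda_N^[n] < rho is violated")
    lower = np.array([rho + 1.0 / neg[N - j] for j in range(1, N + 1)])
    return upper[:N], lower


up, low = mathieu_bounds(s=1.0, n=5, rho=36.0, N=3)
for j in range(3):
    print(j + 1, low[j], "<= lambda_j <=", up[j])
```

The design keeps the Rayleigh-Ritz and Lehmann-Goerisch steps on the same matrices, so the only extra work for the lower bounds is assembling $A_2$ and solving one more generalized eigenvalue problem.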


See also 


> αBB Algorithm

> Hemivariational Inequalities: Eigenvalue Problems 

> Interval Analysis: Eigenvalue Bounds of Interval 
Matrices 

> Semidefinite Programming and Determinant 
Maximization 
Abstract


In this article, we give an overview of the Ellipsoid 
Method. We start with a historic introduction and pro- 
vide a basic algorithm in Sect. “Method”. Techniques 
to avoid two important assumptions required by this 
algorithm are considered in Sect. “Polynomially Run- 
ning Time: Avoiding The Assumptions”. After the dis- 
cussion of some implementation aspects, we are able 
to show the polynomial running time of the Ellipsoid 
Method. The second section is closed with some mod- 
ifications in order to speed up the running time of the 
ellipsoid algorithm. In Sect. “Applications”, we discuss 
some theoretical implications of the Ellipsoid Method 
to linear programming and combinatorial optimiza- 
tion. 
Introduction


In 1979, the Russian mathematician Leonid G. 
Khachiyan published his famous paper with the ti- 
tle “A Polynomial Algorithm in Linear Programming”, 
[11]. He was able to show that linear programs (LPs) 
can be solved efficiently; more precisely that LP be- 
longs to the class of polynomially solvable problems. 
Khachiyan’s approach was based on ideas similar to 
the Ellipsoid Method arising from convex optimiza- 
tion. These methods were developed by David Yudin 
and Arkadi Nemirovski, [24,25,26], and independently 
by Naum Shor, [20], preceded by other methods as for 
instance the Relaxation Method, Subgradient Method 
or the Method of Central Sections, [2]. Khachiyan’s ef- 
fort was to modify existing methods enabling him to 
prove the polynomial running time of his proposed algorithm. For this work, he was awarded the Fulkerson Prize of the American Mathematical Society and the Mathematical Programming Society, [16,21].

Khachiyan’s four-page note did not contain proofs 
and was published in the journal Soviet Mathematics 
Doklady in February 1979 in Russian. At that time he was 27 years old and quite unknown. So it is not surprising that it was not until the Montreal Mathematical Programming Symposium in August 1979 that Khachiyan's breakthrough was discovered by the mathematical world, and a real flood of publications followed in the next months, [23]. In the same year, The
New York Times made it front-page news with the ti- 
tle “A Soviet Discovery Rocks World of Mathematics”. 
In October 1979, the Guardian titled “Soviet Answer to 
Traveling Salesmen” claiming that the Traveling Sales- 
man problem has been solved - based on a fatal misin- 
terpretation of a previous article. For an amusing out- 
line of the interpretation of Khachiyan’s work in the 
world press, refer to [15]. 


Method 


The Ellipsoid Method is designed to solve decision 
problems rather than optimization problems. There- 
fore, we first consider the decision problem of finding 
a feasible point to a system of linear inequalities 


$$A^T x \le b \quad (1)$$

where A is an n × m matrix and b is an m-dimensional vector. From now on, we assume all data to be integral and n to be greater than or equal to 2. The goal is to find a vector $x \in \mathbb{R}^n$ satisfying (1) or to prove that no such x exists. We see in Sect. "Linear Programming" that this problem is equivalent to a linear programming optimization problem of the form
$$\min\ c^T x \quad \text{s.t.} \quad A^T x \le b,\ x \ge 0,$$


in the sense that any algorithm solving one of the two 
problems in polynomial time can be modified to solve 
the other problem in polynomial time. 


The Basic Ellipsoid Algorithm 


Roughly speaking, the basic idea of the Ellipsoid Method is to start with an initial ellipsoid containing the solution set of (1). In each step, the center of the ellipsoid is a candidate for a feasible point of the problem. After checking whether this point satisfies all linear inequalities, one has either found a feasible point, and the algorithm terminates, or found a violated inequality. This inequality is used to construct a new ellipsoid of smaller volume and with a different center. The procedure is repeated until either a feasible point is found or a maximum number of iterations is reached. In the latter case, this implies that the inequality system has no feasible point.

Let us now consider the representation of ellipsoids. It is well known that the n-dimensional ellipsoid with center $x^0$ and semi-axes $g_j$ along the coordinate axes is defined as the set of vectors satisfying the inequality

$\sum_{j=1}^{n} \frac{(x_j - x_j^0)^2}{g_j^2} \le 1.$ (2)

More generally, we can formulate an ellipsoid algebraically as the set

$E := \{x \in \mathbb{R}^n \mid (x - x^0)^T B^{-1} (x - x^0) \le 1\},$ (3)

with a symmetric, positive definite, real-valued n × n matrix B. This can be seen with the following argument. As matrix B is symmetric and real-valued, it can be diagonalized with an orthogonal matrix Q, giving $D = Q^{-1} B Q$, or equivalently, $B = Q D Q^{-1}$ with $Q^{-1} = Q^T$. The entries of D are the eigenvalues of matrix B, which are positive and real-valued; they are the squared semi-axes $g_j^2$, so the diagonal entries of $D^{-1}$ are the reciprocal squared values $1/g_j^2$ appearing in (2). Inserting the relationship for B into (3) yields $(x - x^0)^T Q D^{-1} Q^{-1} (x - x^0) \le 1$, which is equivalent to

$\left((x - x^0)^T Q\right) D^{-1} \left((x - x^0)^T Q\right)^T \le 1.$

Hence, we can interpret matrix Q as a coordinate transformation to the canonical case, where the semi-axes of the ellipsoid are aligned with the coordinate axes. Note that when matrix B is a multiple of the unit matrix, $B = r^2 \cdot I$, the ellipsoid becomes a sphere with radius r > 0 and center $x^0$. In the following, we abbreviate this sphere by $S(x^0, r)$.
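For n = 2 the relation between B and the semi-axes can be checked by hand. The following Python sketch (the function names are our own, not from the literature) uses the closed-form eigenvalues of a symmetric 2 × 2 matrix to compute the semi-axes $g_j = \sqrt{\lambda_j(B)}$ and evaluates the quadratic form $(x - x^0)^T B^{-1} (x - x^0)$; a point belongs to the (solid) ellipsoid when this value is at most 1.

```python
import math


def semi_axes_2d(B):
    """Semi-axes (g_1, g_2), g_1 >= g_2, of the ellipsoid defined by a
    symmetric positive definite 2x2 matrix B: g_j = sqrt(eigenvalue_j(B))."""
    (p, q), (_, r) = B
    mean = (p + r) / 2
    spread = math.hypot((p - r) / 2, q)   # eigenvalues are mean +/- spread
    return math.sqrt(mean + spread), math.sqrt(mean - spread)


def quadratic_form(x, x0, B):
    """Value of (x - x0)^T B^{-1} (x - x0) for a 2x2 matrix B."""
    (p, q), (_, r) = B
    det = p * r - q * q
    d0, d1 = x[0] - x0[0], x[1] - x0[1]
    # B^{-1} = [[r, -q], [-q, p]] / det
    return (r * d0 * d0 - 2 * q * d0 * d1 + p * d1 * d1) / det


# B = Q diag(4, 1) Q^T with Q a rotation by 45 degrees: semi-axes 2 and 1.
B = [[2.5, 1.5], [1.5, 2.5]]
print(semi_axes_2d(B))                                          # (2.0, 1.0)
print(quadratic_form((math.sqrt(2), math.sqrt(2)), (0, 0), B))  # approximately 1
```

The point $(\sqrt{2}, \sqrt{2})$ lies at distance 2 along the eigenvector $(1, 1)/\sqrt{2}$, i.e. on the boundary at the end of the longer semi-axis.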

We start with a somewhat basic version of the ellipsoid algorithm. This method requires two important assumptions on the polyhedron

$P := \{x \in \mathbb{R}^n \mid A^T x \le b\}.$

We assume that
1. the polyhedron P is bounded, and
2. P is either empty or full-dimensional.

In Sect. “Polynomially Running Time: Avoiding the Assumptions”, we will see how this algorithm can be modified so that it no longer needs these assumptions. This will allow us to conclude that a system of linear inequalities can be solved in polynomial running time with the Ellipsoid Method.

Let us now discuss some consequences of these two assumptions. The first assumption allows us to construct a sphere $S(c_0, R)$ with center $c_0$ and radius R containing P completely: $P \subseteq S(c_0, R)$. The sphere $S(c_0, R)$ can be constructed, for instance, in the following two ways. If we know bounds on all variables, e.g. $L_i \le x_i \le U_i$, a geometric argument shows that with

$R := \sqrt{\sum_{i=1}^{n} \max\{|L_i|, |U_i|\}^2},$

the sphere $S(0, R)$ contains the polytope P completely. In general, when such bounds are not given explicitly, one can use the integrality of the data and prove, see for instance [6], that the sphere with center 0 and radius

$R = \sqrt{n}\, 2^{\langle A \rangle + \langle b \rangle - n^2}$ (4)

contains P completely, where $\langle \cdot \rangle$ denotes the encoding length of some integral data. For an integer number $b_i$, we define

$\langle b_i \rangle := 1 + \lceil \log_2(|b_i| + 1) \rceil,$

which is the number of bits needed to encode integer $b_i$ in binary form: one bit for the sign and $\lceil \log_2(|b_i| + 1) \rceil$ bits to encode $|b_i|$. With this, the encoding length of a vector b is the sum of the encoding lengths of its components. Similarly, for a matrix A with columns $a_i$, the encoding length is given by $\langle A \rangle := \sum_{i=1}^{m} \langle a_i \rangle$.
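Both constructions are easy to compute. The Python sketch below (function names are our own) returns the radius of a sphere around the origin that contains a given box, and the encoding length $\langle \cdot \rangle$ of an integer, vector or matrix given as nested lists. It relies on the fact that, for integers, `abs(b).bit_length()` equals $\lceil \log_2(|b| + 1) \rceil$ exactly, which avoids floating-point logarithms.

```python
import math


def box_radius(lower, upper):
    """Radius R such that S(0, R) contains the box lower <= x <= upper:
    the distance from the origin to the farthest corner of the box."""
    return math.sqrt(sum(max(abs(lo), abs(hi)) ** 2
                         for lo, hi in zip(lower, upper)))


def encoding_length(data):
    """Encoding length <.> of an integer, a vector or a matrix (nested lists).

    For an integer b: 1 sign bit plus ceil(log2(|b| + 1)) bits, computed as
    abs(b).bit_length(). For vectors and matrices, entry lengths are summed.
    """
    if isinstance(data, int):
        return 1 + abs(data).bit_length()
    return sum(encoding_length(item) for item in data)


print(box_radius([-1, -2], [3, 1]))        # sqrt(3**2 + 2**2) = sqrt(13)
print(encoding_length([[1, 0], [2, -3]]))  # (2 + 1) + (3 + 3) = 9
```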

The second assumption implies that if P is nonempty, its volume is strictly positive, meaning that there is an n-dimensional sphere of radius r > 0 which is contained in P. More precisely, it is possible to show that

$\mathrm{vol}(P) > 2^{-(n+1)\langle C \rangle + n^3}$ (5)

in the case that P is not empty, see [5]. This will help us to bound the number of iterations of the basic ellipsoid algorithm. Geometrically, the positive volume of polytope P means that the solution set of system (1) must not have measure zero; for instance, it must not be contained in a hyperplane.

Figure 1 illustrates the effect of the two assumptions in the case that P is nonempty. As this is a two-dimensional example, the second assumption implies that polytope P is not just a line segment but instead contains a sphere of positive radius r.

Now, we are ready to discuss the main steps of the basic ellipsoid algorithm, given as Algorithm 1. In the first six steps, the algorithm is initialized; the first ellipsoid is defined in step three. The meaning of the parameters for the ellipsoid, and especially of the number k*, will be discussed later in this section. For now, let us also ignore step seven. Then, for each iteration k, step eight checks whether the center $x^k$ satisfies the linear inequality system (1). This can be done, for instance, by checking each of the m inequalities explicitly. If all inequalities are satisfied, $x^k$ is a feasible point and the algorithm terminates. Otherwise, there is an inequality $a_i^T x \le b_i$ which is violated by $x^k$ (step nine). In the next two steps, a new ellipsoid $E_{k+1}$ is constructed. This ellipsoid has the following properties. It contains the half-ellipsoid

$H := E_k \cap \{x \in \mathbb{R}^n \mid a_i^T x \le a_i^T x^k\},$ (6)

which ensures that the new ellipsoid $E_{k+1}$ also contains polytope P completely, assuming that the initial ellipsoid $S(0, R)$ contained P. Furthermore, the new ellipsoid has the smallest volume of all ellipsoids containing the half-ellipsoid (6), see [2]. The key to the proof of polynomial running time of the basic ellipsoid Algorithm 1 is another property, the so-called Ellipsoid Property. It provides the following bound on the ratio of the volumes of consecutive ellipsoids:


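$\frac{\mathrm{vol}(E_{k+1})}{\mathrm{vol}(E_k)} = \left(\frac{n}{n+1}\right)^{\frac{n+1}{2}} \left(\frac{n}{n-1}\right)^{\frac{n-1}{2}} < \exp\left(-\frac{1}{2n}\right),$ (7)

for a proof see, for instance, [5,17]. As $\exp(-1/(2n)) < 1$ for all natural numbers n, the new ellipsoid has a strictly smaller volume than the previous one. We also notice that $\exp(-1/(2n))$ is a strictly increasing function in n, which has the consequence that the bound on the ratio of the volumes of the ellipsoids approaches one when the dimension of the problem increases, so the guaranteed reduction per iteration becomes weaker.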
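The Ellipsoid Property can be checked numerically for small dimensions. This Python sketch (the function name is our own) evaluates the exact volume ratio on the left-hand side of (7) and prints it next to the bound $\exp(-1/(2n))$; it illustrates, but of course does not prove, the inequality.

```python
import math


def volume_ratio(n):
    """vol(E_{k+1}) / vol(E_k) for the central-cut update in dimension n >= 2."""
    return (n / (n + 1)) ** ((n + 1) / 2) * (n / (n - 1)) ** ((n - 1) / 2)


for n in (2, 3, 10, 100):
    ratio, bound = volume_ratio(n), math.exp(-1 / (2 * n))
    print(f"n={n:3d}  ratio={ratio:.6f}  exp(-1/(2n))={bound:.6f}")
```

Both columns approach one as n grows, which is exactly the weakening of the guaranteed volume reduction described above.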

Algorithm 1
Input: Matrix A, vector b; sphere $S(c_0, R)$ containing P
Output: feasible $\bar{x}$ or proof that $P = \emptyset$
// Initialize
1: $k := 0$
2: $k^* := 2n(2n + 1)\langle C \rangle + 2n^2(\log(R) - n^2 + 1)$ // Max number of iterations
3: $x^0 := c_0$, $B_0 := R^2 \cdot I$ // Initial ellipsoid
4: $\tau := \frac{1}{n + 1}$ // Parameter for ellipsoid: step
5: $\sigma := \frac{2}{n + 1}$ // Parameter for ellipsoid: dilation
6: $\delta := \frac{n^2}{n^2 - 1}$ // Parameter for ellipsoid: expansion
// Check if polytope P is empty
7: if $k = k^*$ return $P = \emptyset$
// Check feasibility
8: if $A^T x^k \le b$ return $\bar{x} := x^k$ // $\bar{x}$ is a feasible point
9: else let $a_i^T x^k > b_i$
// Construct new ellipsoid
10: $B_{k+1} := \delta \left( B_k - \sigma \frac{(B_k a_i)(B_k a_i)^T}{a_i^T B_k a_i} \right)$
11: $x^{k+1} := x^k - \tau \frac{B_k a_i}{\sqrt{a_i^T B_k a_i}}$
// Loop
12: $k := k + 1$ and goto step 7

Now, let us discuss the situation when P is empty. In this case, we want Algorithm 1 to terminate in step seven. To do so, we derive an upper bound k* on the number of iterations needed to find a center $x^k$ satisfying the given system of linear inequalities in the case that P is not empty. Clearly, if the algorithm needs more than k* iterations, the polytope P must be empty. Therefore, let us assume again that $P \neq \emptyset$. In this case, (5) provides a lower bound for the volume of P, and an upper bound for its volume is given, for instance, by (4). In addition, by the construction of the ellipsoids, we know that each $E_{k+1}$ contains P completely. Together with the Ellipsoid Property (7), we get the relation


$\mathrm{vol}(E_{k^*}) < \exp\left(-\frac{k^*}{2n}\right) \mathrm{vol}(E_0) < 2^{-\frac{k^*}{2n} + n + n\log(R)} \le 2^{-(n+1)\langle C \rangle + n^3} < \mathrm{vol}(P).$


Choosing k* such that this chain of inequalities holds yields the maximum number of iterations

$k^* := 2n(2n + 1)\langle C \rangle + 2n^2(\log(R) - n^2 + 1).$ (8)
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Basic ellipsoid algorithm. a P6 4 O.bP = O 


A geometric interpretation is that the volume of the ellipsoid in the k*-th iteration would be too small to contain the polytope P. Obviously, this implies that P has to be empty. With this, we have shown that the presented basic ellipsoid algorithm works correctly.
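The step from the iteration bound (8) to the volume bound (5) is pure exponent arithmetic. Reading log as $\log_2$ (an assumption; the base is not stated above) and using $\exp(-t) < 2^{-t}$ for $t > 0$ together with $\mathrm{vol}(E_0) \le (2R)^n = 2^{n + n\log R}$, the exponent in the chain evaluates for $k^*$ from (8) as follows; the final inequality only uses $\langle C \rangle \ge 0$.

```latex
\begin{aligned}
-\frac{k^*}{2n} + n + n\log R
  &= -(2n+1)\langle C\rangle - n\left(\log R - n^2 + 1\right) + n + n\log R \\
  &= -(2n+1)\langle C\rangle + n^3 \\
  &\le -(n+1)\langle C\rangle + n^3 .
\end{aligned}
```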

One iteration of the basic ellipsoid algorithm, for the case of a nonempty polytope P, is illustrated in Fig. 2a. We see that P is contained completely in $E_k$. The dashed line shows the hyperplane $a_i^T x = b_i$ corresponding to one of the inequalities violated by $x^k$. Geometrically, this hyperplane is moved in parallel until it contains the center $x^k$. Note that the new ellipsoid $E_{k+1}$ contains the half-ellipsoid (6). In this case, the new center $x^{k+1}$ is again not contained in polytope P, and at least one more step is required. The case that polytope P is empty is illustrated in Fig. 2b, which is essentially the same as Fig. 2a.

In the case that $B_k$ is a multiple of the identity, the ellipsoid is an n-dimensional sphere. By the initialization of the basic ellipsoid algorithm in step three, this is the case for the first iteration, when k = 0. This gives us an interpretation of the values of δ and σ. The new ellipsoid $E_{k+1}$ is shrunk by the factor $\sqrt{\delta(1 - \sigma)} = n/(n+1)$ in the direction of vector $a_i$ and expanded in all orthogonal directions by the factor $\sqrt{\delta} = n/\sqrt{n^2 - 1}$. Hence, in the next step we no longer get a sphere, even if we start with one. The third parameter, the step τ, intuitively measures how far we move from point $x^k$ in the direction of vector $-B_k a_i$, multiplied by the factor $1/\sqrt{a_i^T B_k a_i}$. For more details, we refer to the survey of Bland, Goldfarb and Todd, [2].
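The steps of Algorithm 1 translate almost line by line into code. The following Python sketch is only an illustration under stated assumptions: the function name, the row-wise input format (`rows[i]` is the vector $a_i$, i.e. the i-th row of $A^T$) and the `max_iter` cap are our own choices, it requires n ≥ 2 as assumed in the text, and it uses floating-point rather than exact arithmetic. Passing k* from (8) as `max_iter` mirrors step seven.

```python
import math


def ellipsoid_feasible(rows, b, center, R, max_iter):
    """Central-cut ellipsoid method for the system a_i^T x <= b_i (n >= 2).

    rows[i] is a_i; the sphere S(center, R) must contain P. Returns a feasible
    point, or None after max_iter iterations (the role of k* in step seven)
    or if the update breaks down numerically.
    """
    n = len(center)
    x = [float(c) for c in center]                                   # x^0 := c_0
    B = [[R * R if i == j else 0.0 for j in range(n)] for i in range(n)]  # B_0 := R^2 I
    tau, sigma = 1.0 / (n + 1), 2.0 / (n + 1)                         # step, dilation
    delta = n * n / (n * n - 1.0)                                     # expansion
    for _ in range(max_iter):
        # Steps 8 and 9: look for an inequality violated by the current center.
        a = next((row for row, bi in zip(rows, b)
                  if sum(r * xj for r, xj in zip(row, x)) > bi), None)
        if a is None:
            return x                                                  # x^k is feasible
        Ba = [sum(B[i][j] * a[j] for j in range(n)) for i in range(n)]
        aBa = sum(a[i] * Ba[i] for i in range(n))
        if not aBa > 0.0:            # B_k lost positive definiteness (rounding)
            return None
        root = math.sqrt(aBa)
        # Step 11: move the center away from the violated constraint.
        x = [x[i] - tau * Ba[i] / root for i in range(n)]
        # Step 10: shrink along B_k a_i, expand in the orthogonal directions.
        B = [[delta * (B[i][j] - sigma * Ba[i] * (Ba[j] / aBa)) for j in range(n)]
             for i in range(n)]
    return None


box = [[1, 0], [-1, 0], [0, 1], [0, -1]]     # 1 <= x_1 <= 2, 1 <= x_2 <= 2
print(ellipsoid_feasible(box, [2, -1, 2, -1], center=[0, 0], R=3, max_iter=100))
empty = [[1, 0], [-1, 0]]                     # x_1 <= 0 and x_1 >= 1
print(ellipsoid_feasible(empty, [0, -1], center=[0, 0], R=3, max_iter=100))  # None
```

Both updates use $B_k a_i$ computed from the old matrix, as in steps ten and eleven. The guard on $a_i^T B_k a_i$ anticipates the finite-precision problem discussed in the next subsection.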

We are now able to conclude that the basic ellipsoid Algorithm 1 has polynomial running time for the case that the polyhedron P is bounded and either empty or full-dimensional. For an algorithm to have polynomial running time in the deterministic Turing-machine model, the number of elementary steps needed to solve an arbitrary instance of the problem must be bounded by a polynomial in the encoding length of the input data. Therefore, let $L := \langle A, b \rangle$ be the encoding length of the input data. Each step of Algorithm 1 can be carried out in a number of elementary steps that is polynomial in L. The maximum number of iterations is given by k*, which is also a polynomial in L; consider equations (8) and (4). Hence, we conclude that the ellipsoid Algorithm 1 has polynomial running time. During the reasoning above, we implicitly required one more important assumption, namely exact arithmetic. Normally, this is of more interest in numerics than in theoretical running time analysis. However, it turns out that for the Ellipsoid Method this is crucial, as the Turing-machine model also requires finite precision. Let us postpone this topic to the end of the next subsection and first discuss methods to avoid the two major assumptions on polyhedron P.
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Polynomially Running Time: 
Avoiding the Assumptions 


In order to prove that the Ellipsoid Method can solve the system of linear inequalities (1) in polynomial time, one has to generalize the basic ellipsoid Algorithm 1 so that it no longer needs the assumptions that 1. polyhedron P is bounded, 2. P is either empty or full-dimensional, and 3. exact arithmetic is available.

In 1980, Khachiyan published a paper, [12], discussing all the details and proofs about the Ellipsoid Method which were omitted in his paper from 1979. In Lemma 1, he showed that

$P \cap S(0, 2^L) \neq \emptyset$ (9)

in the case that $P \neq \emptyset$. With this, one can use the value $2^L$ for R in (4). However, we cannot just adapt the algorithm above to the new situation. Instead of a lower bound for vol(P), as given in (5), we would need a lower bound for $\mathrm{vol}(P \cap S(0, 2^L))$. To achieve this, we follow a trick introduced by Khachiyan and consider the perturbed system of linear inequalities

$2^L a_i^T x < 2^L b_i + 1, \quad i = 1, \dots, m.$ (10)

Let us denote the corresponding solution set by P′, which is in general a polyhedron. Khachiyan was able to prove a one-to-one correspondence between the original system (1) and the perturbed one (10). This means that P is empty if and only if P′ is empty, that a feasible point for (1) can be constructed from one of (10) in polynomial time, and that the formulation of the perturbed system is polynomial in L. Furthermore, the new inequality system has the additional property that if $P' \neq \emptyset$, it is full-dimensional. Hence, it is possible to find a (nonempty) sphere included in P′. It can be shown that

$S(x, 2^{-2L}) \subseteq P' \cap S(0, 2^L),$

where x is any feasible point of the original system (1), and hence $x \in P$. With this argument at hand, it is possible to derive an upper bound for the number of iterations of the Ellipsoid Method when solving the perturbed system (10). It can be shown that a feasible point can be found in at most 6n(n + 1)L iterations, [2].


With the perturbation of the original system and property (9), we no longer require that P is bounded. As a byproduct, polyhedron P need not be full-dimensional in the case that it is nonempty, as system (10) is full-dimensional independent of whether P is or not, assuming that $P \neq \emptyset$. As a consequence, the basic ellipsoid algorithm can be generalized to apply to any polyhedron P, and the two major assumptions are no longer necessary.

During all of the reasoning, we assumed exact arithmetic, meaning that no rounding errors are allowed during the computation. This implies that all data have to be stored in a mathematically exact way. As we use the Turing-machine model for the running time analysis, however, we require that all computations be done in finite precision. Let us now take a closer look at why this is crucial for ellipsoid Algorithm 1.

The representation of the ellipsoid with the matrix $B_k$ in (3) leads to convenient update formulas for the new ellipsoid, parameterized by $B_{k+1}$ and $x^{k+1}$. However, to obtain the new center $x^{k+1}$, one has to divide by the factor $\sqrt{a_i^T B_k a_i}$. If we work with finite precision, rounding errors are the consequence, and it is likely that matrix $B_k$ is no longer positive definite. This may cause $a_i^T B_k a_i$ to become zero or negative, implying that the ellipsoid algorithm fails.

Hence, to implement the ellipsoid method, one has to use some modifications to make it numerically stable. One basic idea is to use the factorization

$B_k = L_k D_k L_k^T$

for the positive definite matrix $B_k$, with $L_k$ being a lower triangular matrix with unit diagonal and $D_k$ a diagonal matrix with positive diagonal entries. Obtaining such a factorization is quite expensive, as it is of order $n^3$. But there are update formulae for the case of the ellipsoid algorithm which have only quadratic complexity. As early as 1975, numerically stable algorithms were developed for this type of factorization, ensuring that $D_k$ remains positive definite, see [7]. We skip the technical details here and refer instead to Goldfarb and Todd, [9].
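To make the factorization concrete, here is a plain Python sketch (the function name is hypothetical) that computes $L_k$ and the diagonal of $D_k$ from scratch in $O(n^3)$ operations. It does not reproduce the quadratic-cost, numerically stable update formulae of [7,9]; it only shows where a loss of positive definiteness becomes visible, namely as a nonpositive pivot.

```python
def ldl_factor(B):
    """Factor a symmetric positive definite matrix as B = L D L^T.

    Returns (L, d): L unit lower triangular, d the diagonal entries of D.
    Plain O(n^3) factorization, not the O(n^2) update used in practice.
    """
    n = len(B)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Pivot d_j = B_jj - sum_k L_jk^2 d_k; it must stay positive.
        d[j] = B[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] <= 0.0:
            raise ValueError("matrix is not positive definite")
        for i in range(j + 1, n):
            L[i][j] = (B[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d


L, d = ldl_factor([[4.0, 2.0], [2.0, 3.0]])
print(L, d)   # [[1.0, 0.0], [0.5, 1.0]] [4.0, 2.0]
```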

With a method at hand that carries out the ellipsoid algorithm in finite precision in a numerically stable way, the proof of its polynomial running time is complete.
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Modifications 


In this subsection, we briefly discuss some straightforward modifications of the presented ellipsoid method in order to improve its convergence rate. Let us consider Fig. 2 once more. From this figure, it is intuitively clear that an ellipsoid containing the set $\{x \in E_k \mid a_i^T x \le b_i\}$ has a smaller volume than the one containing the complete half-ellipsoid (6). In the survey paper of Bland et al., it is shown that the smallest such ellipsoid, arising from the so-called deep cut $a_i^T x \le b_i$, can be obtained by choosing the following parameters for Algorithm 1:

$\tau := \frac{1 + n\alpha}{n + 1}, \quad \sigma := \frac{2 + 2n\alpha}{(n + 1)(1 + \alpha)}, \quad \delta := \frac{n^2}{n^2 - 1}(1 - \alpha^2),$

with

$\alpha := \frac{a_i^T x^k - b_i}{\sqrt{a_i^T B_k a_i}}.$

The parameter α gives an additional advantage of this deep cut, as it makes it possible to detect infeasibility or redundant constraints, [19].
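The deep-cut parameters are cheap to evaluate at each iteration. The Python sketch below (function name and return convention are our own) computes α and the three parameters; for α = 0 it reproduces the central-cut values 1/(n+1), 2/(n+1) and n²/(n²−1) of Algorithm 1. The early exit for α ≥ 1 is our reading of the infeasibility test mentioned above: for α > 1 the half-space misses $E_k$ entirely, so P, which lies in $E_k$, must be empty; see [19] for the precise statements.

```python
import math


def deep_cut_parameters(a, x, b_i, B):
    """Deep-cut parameters (alpha, tau, sigma, delta) for a^T x <= b_i at x^k.

    a is the violated row a_i, x the center x^k, B the matrix B_k (n >= 2).
    Returns None for tau, sigma and delta when alpha >= 1.
    """
    n = len(x)
    Ba = [sum(B[r][c] * a[c] for c in range(n)) for r in range(n)]
    a_B_a = sum(ai * bai for ai, bai in zip(a, Ba))
    alpha = (sum(ai * xi for ai, xi in zip(a, x)) - b_i) / math.sqrt(a_B_a)
    if alpha >= 1.0:
        # alpha > 1: the half-space misses E_k, so P (inside E_k) is empty;
        # alpha = 1 leaves only a tangent point, no room for a full-dimensional P.
        return alpha, None, None, None
    tau = (1 + n * alpha) / (n + 1)
    sigma = (2 + 2 * n * alpha) / ((n + 1) * (1 + alpha))
    delta = n * n / (n * n - 1.0) * (1 - alpha * alpha)
    return alpha, tau, sigma, delta


I2 = [[1.0, 0.0], [0.0, 1.0]]
print(deep_cut_parameters([1, 0], [0, 0], 0, I2))     # alpha = 0: central cut
print(deep_cut_parameters([1, 0], [0, 0], -0.5, I2))  # alpha = 0.5: deeper cut
```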

Another idea is to use a whole system of violated inequalities as a cut instead of only one. Such cuts are called surrogate cuts and were discussed by Goldfarb and Todd. An iterative procedure to generate these cuts was described by Krol and Mirman, [14].

Consider now the case that the inequality system (1) contains two parallel constraints, which means that they differ only in the right-hand side. Then it is possible to generate a new ellipsoid containing the information of both inequalities. These cuts are called parallel cuts. Update formulas for $B_k$ and $x^k$ were discovered independently by several authors. For more details, we refer to [2,19].

However, none of the modifications found so far reduces the worst-case running time significantly; in particular, none of them removes the dependence on L. This implies that the running time depends not only on the size of the problem but also on the magnitude of the data.


To conclude this section, we point out that the Ellipsoid Method can also be generalized to use convex structures other than ellipsoids. Methods working, for instance, only with spheres or triangles are possible. The only crucial point is that one has to make sure that their polynomial running time can be proven. Furthermore, the underlying polytope can be generalized to any convex set for which the separation problem can be solved in polynomial time, see Sect. “Separation and Optimization”.


Applications 


In 1981, Grötschel, Lovász and Schrijver used the Ellipsoid Method to solve many open problems in combinatorial optimization. They developed polynomial algorithms, for instance, for the vertex packing problem in perfect graphs, and could show that computing the weighted fractional chromatic number is NP-hard, [5]. Their proofs were mainly based on the relation between separation and optimization, which could be established with the help of the Ellipsoid Method. We discuss this topic in Sect. “Separation and Optimization” and give one application to the maximum stable set problem. For all other interesting results, we refer to [5]. But first, we consider another important application of the Ellipsoid Method. We examine two approaches showing the equivalence of solving a system of linear inequalities and finding an optimal solution to an LP. This will prove that LP is polynomially solvable.


Linear Programming 


As we have seen in the last section, the Ellipsoid Method solves the problem of finding a feasible point of a system of linear inequalities. This problem is closely related to the problem of solving the linear program

$\max_x c^T x \quad \text{s.t.} \quad A^T x \le b,\ x \ge 0.$ (11)

Again, we assume that all data are integral. In the following, we briefly discuss two methods of how the optimization problem (11) can be solved in polynomial time via the Ellipsoid Method. This will show that LP is in the class of polynomially solvable problems.
From duality theory, it is well known that solving the linear optimization problem (11) is equivalent to finding a feasible point of the following system of linear inequalities:

$\begin{aligned} A^T x &\le b, \\ -x &\le 0, \\ -A y &\le -c, \\ -y &\le 0, \\ -c^T x + b^T y &\le 0. \end{aligned}$ (12)

The first two inequalities ensure primal feasibility of x; the third and fourth come from the dual problem of (11) and ensure dual feasibility of y. The last inequality results from the Strong Duality Theorem; since weak duality gives $c^T x \le b^T y$ for every feasible pair, this inequality always has zero slack. The equivalence of the two problems means in this case that the vector x of each solution pair (x, y) of (12) is an optimal solution of problem (11), and y is an optimal solution of the dual problem of (11). In addition, for each optimal solution of the optimization problem there exists a vector y such that this pair is feasible for problem (12).
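To make the bookkeeping of (12) concrete, the following Python sketch (helper names are hypothetical) stacks the five blocks of (12) into a single system $G z \le h$ in the variables $z = (x, y)$, which is the form an ellipsoid run on (12) would receive, and checks candidate pairs on a tiny instance.

```python
def primal_dual_system(A, b, c):
    """Stack system (12) as G z <= h with z = (x, y).

    A is an n x m matrix given as a list of n rows, b has length m, c length n.
    """
    n, m = len(A), len(A[0])
    G, h = [], []
    for i in range(m):                              # A^T x <= b
        G.append([A[j][i] for j in range(n)] + [0] * m)
        h.append(b[i])
    for j in range(n):                              # -x <= 0
        G.append([-1 if k == j else 0 for k in range(n)] + [0] * m)
        h.append(0)
    for j in range(n):                              # -A y <= -c
        G.append([0] * n + [-A[j][i] for i in range(m)])
        h.append(-c[j])
    for i in range(m):                              # -y <= 0
        G.append([0] * n + [-1 if k == i else 0 for k in range(m)])
        h.append(0)
    G.append([-cj for cj in c] + list(b))           # -c^T x + b^T y <= 0
    h.append(0)
    return G, h


def satisfies(G, h, z, tol=1e-9):
    """True if z satisfies every row of G z <= h up to tol."""
    return all(sum(g * zk for g, zk in zip(row, z)) <= hk + tol
               for row, hk in zip(G, h))


# max x s.t. x <= 2, x <= 3, x >= 0 (n = 1, m = 2); optimum x = 2, y = (1, 0).
G, h = primal_dual_system([[1, 1]], [2, 3], [1])
print(satisfies(G, h, [2, 1, 0]))   # True: optimal primal-dual pair
print(satisfies(G, h, [1, 1, 0]))   # False: x = 1 is feasible but not optimal
```

The combined system has n + m variables and 2n + 2m + 1 rows, which makes the growth in dimension mentioned below explicit.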

From the equivalence of the two problems (11) and (12), we immediately conclude that the linear programming problem can be solved in polynomial time, as the input data of (12) are polynomially bounded in the length of the input data of (11). This argument was used by Gács and Lovász in their account of Khachiyan’s work, see [4]. The advantage of this method is that the primal and dual optimization problems are solved simultaneously. However, note that with this method, one has no idea whether the optimization problem is infeasible or unbounded in the case when the Ellipsoid Method proves that problem (12) is infeasible. Another disadvantage is that the dimension of the problem increases from n to n + m.

Next, we discuss the so-called bisection method, which is also known as binary search or sliding objective hyperplane method. Starting with an upper and a lower bound on the optimal value, the basic idea is to reduce the difference between the bounds until it is zero or small enough. Solving the system $A^T x \le b$, $x \ge 0$ with the Ellipsoid Method gives us either a vector x providing the lower bound $l := c^T x$ for problem (11) or, in the case that the polyhedron is empty, the information that the optimization problem is infeasible. An upper bound can be obtained, for instance, by finding a feasible vector of the dual problem $A y \ge c$, $y \ge 0$. If the Ellipsoid Method proves that the polyhedron of the dual problem is empty, we can use duality theory (as we already know that problem (11) is not infeasible) to conclude that the optimization problem (11) is unbounded. Otherwise, we obtain a vector y yielding the upper bound $u := b^T y$ of problem (11), according to the Weak Duality Theorem. Once bounds are obtained, one can iteratively use the Ellipsoid Method to solve the modified problem $A^T x \le b$, $x \ge 0$ with the additional constraint

$-c^T x \le -\frac{u + l}{2},$

which is a constraint on the objective function value of the optimization problem. If the new problem is infeasible, one can update the upper bound to $\frac{u + l}{2}$, and in the case that the ellipsoid algorithm computes a vector $\bar{x}$, the lower bound can be increased to $c^T \bar{x}$, which is greater than or equal to $\frac{u + l}{2}$. In doing so, one at least bisects the gap in each step. However, this method does not immediately provide a dual solution. Note that only one inequality is added during the process, keeping the problem size small. More details, and especially the polynomial running time of this method, are discussed by Padberg and Rao [1].
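The bound-update logic of the bisection method can be sketched independently of the feasibility solver. In the following Python sketch, `find_point_with_value` is a hypothetical oracle standing in for one Ellipsoid Method run on $A^T x \le b$, $x \ge 0$, $-c^T x \le -t$; it returns a feasible point or `None`. The toy oracle below is only for illustration and is not how a real solver behaves.

```python
def bisection_bounds(find_point_with_value, c, lower, upper, tol=1e-6):
    """Shrink [lower, upper] around the optimal value of max c^T x by bisection.

    find_point_with_value(t) is a hypothetical oracle: it returns some feasible
    x with c^T x >= t, or None if no such x exists.
    """
    while upper - lower > tol:
        t = (upper + lower) / 2          # add the constraint -c^T x <= -t
        x = find_point_with_value(t)
        if x is None:
            upper = t                    # no point reaches t: t is an upper bound
        else:
            lower = sum(ci * xi for ci, xi in zip(c, x))   # c^T x >= t
    return lower, upper


# Toy instance: max 2 x s.t. x <= 3, x >= 0 (optimal value 6).
def toy_oracle(t):
    return [t / 2] if t <= 6 else None


print(bisection_bounds(toy_oracle, c=[2], lower=0.0, upper=10.0))
```

Each iteration either lowers the upper bound to the midpoint or raises the lower bound to at least the midpoint, so the gap is at least halved per oracle call.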


Separation and Optimization 


An interesting property of the ellipsoid algorithm is that it does not require an explicit list of all inequalities. In fact, it is enough to have a routine which solves the so-called Separation Problem for a convex body K:

Given $z \in \mathbb{R}^n$, either conclude that $z \in K$ or give a vector $\pi \in \mathbb{R}^n$ such that the inequality $\pi^T x < \pi^T z$ holds for all $x \in K$.

In the latter case, we say that vector π separates z from K. In the following, we restrict the discussion to the case when K is a polytope meeting the two assumptions of Sect. “The Basic Ellipsoid Algorithm”. To be consistent with the notation, we write P for K. Since steps eight and nine of the basic ellipsoid Algorithm 1 only need one inequality violated by the current center, which is exactly what a separation routine delivers, it follows that if one can solve the separation problem for polytope P in time polynomial in L and n, then the corresponding optimization problem

$\max_x c^T x \quad \text{s.t.} \quad x \in P$

can also be solved in polynomial time in L and n; L is again the encoding length of the input data, see Sect. “The Basic Ellipsoid Algorithm”. The converse statement was proven by Grötschel et al., yielding the following equivalence of separation and optimization:


The separation problem and the optimization 
problem over the same family of polytopes are 
polynomially equivalent. 


Consider now an example to see how powerful the concept of separation is. Let a graph G = (V, E) with node set V and edge set E be given. A stable set S of graph G is defined as a subset of V with the property that any two nodes of S are not adjacent, which means that no edge between them exists in E. Finding a maximum one is the maximum stable set problem. This is a well-known optimization problem and proven to be NP-hard, see [3]. It can be modeled, for instance, as the integer program

$\max_x c^T x \quad \text{s.t.} \quad x_i + x_j \le 1 \ \ \forall \{i, j\} \in E, \quad x \in \{0, 1\}^{|V|}$ (13)

with incidence vector x, meaning that $x_i = 1$ if node i is in a maximum stable set, and $x_i = 0$ otherwise. Constraints (13) are called edge inequalities. Relaxing the binary constraints for x gives the so-called trivial inequalities

$0 \le x \le 1$ (14)


yielding a linear program. However, this relaxation is very weak; consider, for instance, a complete graph, where every stable set contains at most one node, while $x = (\frac{1}{2}, \dots, \frac{1}{2})$ satisfies (13) and (14) with objective value |V|/2 when all weights are equal to one. To improve it, one can consider the odd-cycle inequalities

$\sum_{i \in C} x_i \le \frac{|C| - 1}{2}$ (15)

for each odd cycle C in G. Note that there are in general exponentially many such inequalities in the size of graph G. Obviously, every stable set satisfies them, and hence they are valid inequalities for the stable set polytope. The polytope satisfying the trivial, edge and odd-cycle inequalities

$P := \{x \in \mathbb{R}^{|V|} \mid x \text{ satisfies the edge inequalities of (13), (14) and (15)}\}$ (16)

is called the cycle-constraint stable set polytope. Notice that this polytope contains the stable set polytope, in general strictly. It can be shown that the separation problem for polytope (16) can be solved in polynomial time. One idea is based on the construction of an auxiliary graph H with twice the number of nodes. Solving a sequence of n shortest path problems on H then solves the separation problem with a total running time of order $|V| \cdot |E| \cdot \log(|V|)$. With the equivalence of optimization and separation, the stable set problem over the cycle-constraint stable set polytope can be solved in polynomial time. This is quite a remarkable conclusion, as the number of odd-cycle inequalities may be exponential. However, note that this does not imply that the solution will be integral, and hence we cannot conclude that the stable set problem can be solved in polynomial time. But we can conclude that the stable set problem for t-perfect graphs can be solved in polynomial time, where a graph is called t-perfect if the stable set polytope is equal to the cycle-constraint stable set polytope (16). For more details about this topic see, for instance, [8,18].
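The auxiliary-graph idea can be sketched as follows. We assume, as is usual for this separation routine but not spelled out above, that x already satisfies the trivial and edge inequalities, so all edge weights $w_{ij} = 1 - x_i - x_j$ are nonnegative. Since $\sum_{\{i,j\} \in C} w_{ij} = |C| - 2\sum_{i \in C} x_i$, inequality (15) holds for C exactly when the weight of C is at least 1. The Python sketch below (names hypothetical) runs Dijkstra's algorithm from Python's `heapq` on the doubled graph H, one run per node, which matches the $|V| \cdot |E| \cdot \log(|V|)$ bound; extracting the violating cycle itself is omitted.

```python
import heapq


def odd_cycle_separation(nodes, edges, x, tol=1e-9):
    """Search for an odd-cycle inequality (15) violated by x.

    Node (v, p) of the auxiliary graph H records the parity p of the number of
    edges used so far, so a shortest (s, 0) -> (s, 1) path is a lightest odd
    closed walk through s; such a walk contains an odd cycle of no larger
    weight (weights are nonnegative). Returns (s, weight) for the lightest
    odd closed walk if weight < 1 (a violated inequality), otherwise None.
    """
    adj = {v: [] for v in nodes}
    for i, j in edges:
        w = 1.0 - x[i] - x[j]                  # edge weight, >= 0 by assumption
        adj[i].append((j, w))
        adj[j].append((i, w))
    best = None
    for s in nodes:                            # one Dijkstra run per node
        dist = {(s, 0): 0.0}
        heap = [(0.0, s, 0)]
        while heap:
            d, v, p = heapq.heappop(heap)
            if d > dist[(v, p)]:
                continue                       # outdated heap entry
            for u, w in adj[v]:
                key = (u, 1 - p)               # each edge of H flips the parity
                if d + w < dist.get(key, float("inf")):
                    dist[key] = d + w
                    heapq.heappush(heap, (d + w, u, 1 - p))
        weight = dist.get((s, 1), float("inf"))
        if weight < 1.0 - tol and (best is None or weight < best[1]):
            best = (s, weight)
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
print(odd_cycle_separation([0, 1, 2], triangle, [0.5, 0.5, 0.5]))  # (0, 0.0): violated
print(odd_cycle_separation([0, 1, 2], triangle, [1/3, 1/3, 1/3]))  # None
```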


Conclusion 


In 1980, the Ellipsoid Method seemed to be a promising algorithm for solving problems in practice [23]. However, even though many modifications of the basic ellipsoid algorithm have been made, its worst-case running time still remains a function of n, m and especially L. This raises two main questions. First, is it possible to modify the ellipsoid algorithm to have a running time which is independent of the magnitude of the data and instead depends only on n and m, or is there at least any other algorithm with this property solving LPs? (This concept is known as strongly polynomial running time.) The answer to this question is still not known, and it remains an open problem. In 1984, Karmarkar introduced another polynomial-time algorithm for LP, which was the start of the Interior Point Methods, [10]. But his ideas, too, could not settle this question. For more details about this topic see [17,22]. The second question that comes to mind is how the algorithm performs on practical problems. Unfortunately, it turns out that the ellipsoid algorithm tends to have a running time close to its worst-case bound and is inefficient compared to other methods. The Simplex Method, developed by George B. Dantzig in 1947, was proven by Klee and Minty to have exponential running time in the worst case, [13]. In contrast, its practical performance is much better, and it normally requires only a linear number of iterations in the number of constraints.

Until now, the Ellipsoid Method has not played a role in solving linear programming problems in practice. However, the property that the inequalities themselves need not be explicitly known distinguishes the Ellipsoid Method from others, for instance from the Simplex Method and the Interior Point Methods. This makes it a theoretically powerful tool, for instance for attacking various combinatorial optimization problems, as was impressively shown by Grötschel, Lovász and Schrijver.


See also 


> Linear Programming 

> Linear Programming: Interior Point Methods 

> Linear Programming: Karmarkar Projective 
Algorithm 

> Linear Programming: Klee-Minty Examples 

> Volume Computation for Polytopes: Strategies and 
Performances 
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Introduction 


Natural disasters (such as fires, hurricanes, tornadoes, 
flash floods, tsunamis, earthquakes, etc.) or man-made 
disasters (such as nuclear power plant explosions, 
chemical plant explosions, hazmat releases, dirty bomb 
threats, etc.) affect millions of people every year. Evacuation is an emergency management strategy used to ensure a population’s safety in these types of situations.
Emergency evacuation is defined as the relocation of 
a threatened population to a safer area due to an im- 
mediate or predictable life-threatening danger [25]. 
Prior to 1979, the models developed for evacuat- 
ing people and vehicles from dangerous locations were 
mainly qualitative. However, the accident at the Three 
Mile Island nuclear power plant near Middletown, 
Pennsylvania, in 1979 provided a major motive for 


quantifying emergency response plans [36]. Since then, 
a number of optimization- and simulation-based mod- 
els have been developed to identify evacuation strate- 
gies for communities (urban and rural areas), build- 
ings and industrial plants, and residential areas. The 
most recent and challenging developments in this area 
are real-time traffic management and agent-based mod- 
els. 


Applications 
Community Evacuation 


Southworth [35] models a community evacuation plan 
using a five-step procedure that involves trip genera- 
tion, trip departure time, trip destination, trip route se- 
lection, and evacuation plan set-up and analysis. The 
factors that affect any of these steps are the distribu- 
tion of the population that is at risk, human behav- 
ior, transportation infrastructure, road capacity, vehicle 
utilization, accessibility of warning technologies, time 
available before the occurrence of the hazard, evacuees’ 
route and destination selection, promptness in clean- 
ing and preparing to operate the affected highways and 
roads, traffic management actions and the availability 
of non-evacuation-based protective actions, such as sheltering in place [6,7,10,35,36]. The next sections of this article provide a summary of the research related to the above-mentioned steps.


Community Evacuation: Trip Generation 


The trip generation step determines the number of vehicles loaded onto the traffic network during the evacuation. The number of vehicles loaded onto the network depends on the population of the evacuation zone (which is space and time dependent), the number of vehicles per household and the vehicle utilization rate. The population
of an evacuation area consists of the permanent resi- 
dents, the transients (tourists and daily workers), and 
the residents of special facilities such as students, prison- 
ers, patients, customers in shopping malls, and mem- 
bers at recreational facilities [35,36]. For a given resi- 
dential area with a population size equal to N, South- 
worth [35] estimates the daytime population using the 
following equation: 


$$D = H + W + P + S,$$

where $W$, $P$, and $S$ denote the numbers of workers, students,
and residents of special facilities, respectively. $H$ denotes
the number of people who stay at home during the day and is
estimated by

$$H = [N - (W + P)](1 - s),$$


where s is the probability that a non-working adult (or 
a child) is not being engaged in shopping, recreational 
or social activities. 
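The two formulas above can be evaluated directly. The following Python sketch is illustrative only: the function name `daytime_population`, the input check that workers plus students do not exceed $N$, and the sample figures are hypothetical, and $s$ is used exactly as defined in the text.

```python
def daytime_population(N, W, P, S, s):
    """Estimate the daytime population D = H + W + P + S of a residential area.

    N: residential population; W: workers; P: students;
    S: residents of special facilities; s: probability as defined in the text.
    Returns the pair (H, D).
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a probability in [0, 1]")
    if W + P > N:
        # Assumption: workers and students are counted within N.
        raise ValueError("W + P cannot exceed the population N")
    # People who stay at home during the day: H = [N - (W + P)](1 - s)
    H = (N - (W + P)) * (1.0 - s)
    return H, H + W + P + S


# Hypothetical zone: 1000 residents, 300 workers, 200 students,
# 50 residents of special facilities, s = 0.4
H, D = daytime_population(1000, 300, 200, 50, 0.4)
print(H, D)
```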

Based on Southworth [35], the vehicle utilization 
rate depends on the time of the evacuation, the house- 
hold size, the average number of commuters per vehi- 
cle and the average number of workers and licensed 
drivers per household. Estimating the vehicle utilization
rate is challenging, which partly explains why post-evacuation
surveys report significantly different utilization rates.
For example, Baker [1] estimates the vehicle utilization
rate at 52%, whereas Lindell and Perry [23] estimate it at 75%.


Community Evacuation: Trip Departure Time 


Trip departure time is the time that elapses between the
release of the evacuation warning and the moment an evacuee
departs. It consists of the time required to receive the
official evacuation warning, the time required to leave the
current location, the time required to travel home, and the
time required to prepare to leave home. A summary of the
approaches used to calculate trip departure time follows.

In 1984, Jamei [18] introduced the mobilization 
curve to estimate the percent of evacuees that enter the 
traffic network in specific time intervals. The mobiliza- 
tion curve is represented using the following equation: 


$$P_t = \frac{1}{1 + \exp[-z(t - h)]},$$

where $P_t$ is the cumulative percentage of traffic volume
loaded onto the network by time $t$, $z$ is the response rate
of the public to the disaster and is known as the slope
of the mobilization curve, and $h$ is the "half loading
time", i.e., the time by which half of the evacuating traffic
has been loaded ($P_h = 1/2$). The loading time depends on the incident and
its relative severity. Radwan et al. [29] and Hobeika and 
Kim [17] have incorporated the mobilization curve in 
mass evacuation computer programs (MASSVAC 3.0 
and MASSVAC 4.0) to determine the loading rate of 
evacuees. This approach relies on the planner’s judg- 
ment in calibrating the model parameters z and h. 
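To make the role of $z$ and $h$ concrete, the Python sketch below evaluates the logistic mobilization curve and converts it into the number of vehicles loaded in each time interval. The function names and the calibration ($z = 0.1$ per minute, $h = 60$ minutes, 5000 vehicles) are hypothetical. Because the logistic curve is not exactly zero at $t = 0$, the loads over a finite horizon sum to slightly less than the total.

```python
import math


def mobilization_fraction(t, z, h):
    """Cumulative fraction P_t of evacuating traffic loaded by time t.

    Logistic mobilization curve P_t = 1 / (1 + exp[-z (t - h)]),
    with z the public response rate (slope) and h the half loading time.
    """
    return 1.0 / (1.0 + math.exp(-z * (t - h)))


def loading_per_interval(total_vehicles, z, h, interval, horizon):
    """Vehicles entering the network in each interval [t, t + interval)."""
    loads = []
    t = 0.0
    while t < horizon:
        share = mobilization_fraction(t + interval, z, h) - mobilization_fraction(t, z, h)
        loads.append(total_vehicles * share)
        t += interval
    return loads


# Hypothetical calibration: z = 0.1 per minute, h = 60 minutes
print(mobilization_fraction(60, z=0.1, h=60))
print([round(v) for v in loading_per_interval(5000, 0.1, 60, interval=15, horizon=120)])
```

The curve is symmetric about $h$, so the intervals just before and just after the half loading time receive the same load.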


In 2000, Urbanik [36] developed a probability dis- 
tribution of the trip departure time. He defined the 
probability distribution of an activity time based on 
the percentage of the population that completed the 
activity within a given time span. For simplicity, he
assumed that the times of the trip departure sub-activities
were independent. He then derived the probability
distribution of trip departure time as the joint probability
distribution of the sub-activities involved.
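If, in addition to independence, the sub-activities are assumed to be performed one after another (our assumption for this illustration), the trip departure time is the sum of the sub-activity times, and its distribution is the convolution of theirs. The Python sketch below works with hypothetical discrete distributions over 10-minute intervals; the function names are illustrative only.

```python
def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent discrete times.

    pmf[k] is the probability that an activity takes k time intervals.
    """
    result = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, pa in enumerate(pmf_a):
        for j, pb in enumerate(pmf_b):
            result[i + j] += pa * pb
    return result


def departure_time_distribution(sub_activity_pmfs):
    """Combine independent, sequential sub-activity times into one distribution."""
    total = [1.0]  # zero intervals with probability 1
    for pmf in sub_activity_pmfs:
        total = convolve(total, pmf)
    return total


def cumulative(pmf):
    """Fraction of the population that has departed by the end of each interval."""
    out, acc = [], 0.0
    for p in pmf:
        acc += p
        out.append(acc)
    return out


# Hypothetical sub-activity distributions (10-minute intervals)
receive_warning = [0.5, 0.5]
travel_home = [0.25, 0.5, 0.25]
prepare_to_leave = [0.0, 0.5, 0.5]
pmf = departure_time_distribution([receive_warning, travel_home, prepare_to_leave])
print(cumulative(pmf))
```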

None of the above-mentioned approaches considers the
impact of human behavior on trip departure time. The
work by Murray et al. [27,28] addresses the tendency of
households to gather and then evacuate as a single unit.
They argue that this type of behavior increases the
departure time and, as a consequence, the evacuation time.
Their evacuation model is based on a network flow
formulation of the problem. In this network, the nodes
represent residential and other possible meeting locations,
and the arcs represent the shortest paths between nodes.
Two integer linear programming formulations of the problem
are given that incorporate a realistic representation of human
behavior in emergency situations. The first formulation
determines the household meeting location while
minimizing the maximum travel time of family mem- 
bers. The second formulation determines the route as- 
signment along with the non-drivers’ pickup sched- 
ule by minimizing the linear trade-off between waiting 
time and travel time. For a more extensive review of 
the impact of human behavior in trip departure time, 
see [20,22,33,40]. 
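The objective of the first formulation, choosing the meeting location that minimizes the maximum travel time of the family members, can be illustrated by brute-force enumeration. This is not the integer programming model of Murray et al.; it is a hypothetical sketch for a single household with precomputed shortest-path travel times, and all names and times below are invented for illustration.

```python
def best_meeting_location(travel_time):
    """Pick the meeting location that minimizes the maximum member travel time.

    travel_time[member][location] is the shortest-path travel time from the
    member's current position to a candidate meeting location.
    Returns (location, worst-case travel time).
    """
    members = list(travel_time)
    # Only locations reachable by every member are candidates.
    locations = set.intersection(*(set(travel_time[m]) for m in members))
    best = None
    for loc in sorted(locations):
        worst = max(travel_time[m][loc] for m in members)
        if best is None or worst < best[1]:
            best = (loc, worst)
    return best


# Hypothetical household, travel times in minutes
times = {
    "parent_at_work": {"home": 25, "school": 15, "office": 0},
    "child_at_school": {"home": 10, "school": 0, "office": 15},
    "parent_at_home": {"home": 0, "school": 10, "office": 25},
}
print(best_meeting_location(times))
```

Here meeting at home would keep one parent travelling for 25 minutes, while the school bounds everyone's trip at 15 minutes.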


Community Evacuation: Trip Destination Selection 


In emergency situations, the most straightforward ap- 
proach that evacuees follow in choosing a destination 
is the shortest evacuation plan (SEP) [38]. Under SEP,
evacuees head for the closest exit that leads them away
from the danger area. In 1996, Yamada [38] presented
an emergency evacuation plan for a city using
two network flow optimization models. In these mod- 
els, the residential areas (RA) and places of refuge (PR) 
are the nodes of the network, and the roads between 
them are the arcs. Yamada assumes that the roads are 
bi-directional with the same travel time in both direc- 
tions and that the evacuees traverse roads on foot at 
a uniform speed. He introduces a dummy node $v^+$, which
is connected to each RA node. The new network is denoted
by $G^+(V^+, E^+)$, where $V^+ = V \cup \{v^+\}$ and
$E^+ = E \cup \{(v^+, d) \mid d \in D\}$. Here $V$ is the original
set of nodes (RA and PR nodes), $E$ is the original set of arcs,
and $D$ is the set of RA nodes only.

The first model does not consider node and arc ca- 
pacities. The model minimizes the individual and total 
travel distance. Yamada applied Dijkstra's algorithm
with $v^+$ as the source node and the PR nodes as the demand
nodes. This algorithm runs in $O(|V^+|^2)$ time. The optimal solution
for this network flow problem is a forest of trees with 
exactly one PR in each tree. The solution determines 
the best possible destination for each RA. 
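The outcome of the first model, each RA assigned to its nearest PR, can be sketched with a heap-based Dijkstra search. As an assumption for this illustration, the dummy source is attached to every PR node by a zero-length arc, which is equivalent to starting the search from all PR nodes at once; because the roads are assumed bidirectional with equal travel times, the resulting shortest-path forest (one tree per PR) gives each RA node its closest place of refuge. The graph and names are hypothetical, and with a binary heap the running time is $O(|E| \log |V|)$ rather than the $O(|V^+|^2)$ of the array-based version.

```python
import heapq


def nearest_refuge(edges, refuges):
    """Assign every node to its closest place of refuge (PR).

    edges: list of (u, v, length) for bidirectional roads.
    refuges: iterable of PR nodes.
    Returns (dist, refuge_of): distance to, and identity of, the nearest PR.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    # Zero-length arcs from a dummy source to every PR are modelled by
    # seeding the heap with all PR nodes at distance 0.
    dist, refuge_of = {}, {}
    heap = [(0.0, pr, pr) for pr in refuges]
    heapq.heapify(heap)
    while heap:
        d, node, pr = heapq.heappop(heap)
        if node in dist:  # already settled at a shorter or equal distance
            continue
        dist[node], refuge_of[node] = d, pr
        for nxt, w in adj.get(node, []):
            if nxt not in dist:
                heapq.heappush(heap, (d + w, nxt, pr))
    return dist, refuge_of


# Hypothetical network: RA nodes a, b, c and PR nodes p1, p2
roads = [("a", "p1", 2), ("a", "b", 1), ("b", "p2", 5), ("c", "p2", 1), ("b", "c", 1)]
dist, refuge_of = nearest_refuge(roads, ["p1", "p2"])
print({ra: (refuge_of[ra], dist[ra]) for ra in ("a", "b", "c")})
```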

The second model considers capacity constraints on 
PR nodes. Yamada uses a minimum cost network flow 
formulation to model this problem. He modified the
original graph by adding a dummy source node $v_s$,
a dummy sink node $v^*$, and a set of arcs connecting
$v_s$ to the RA nodes and the PR nodes to $v^*$. The new network is
denoted by $G^*(V^*, E^*)$, where $V^* = V \cup \{v_s, v^*\}$ and
$E^* = E \cup \{(v_s, r) \mid r \in R\} \cup \{(d, v^*) \mid d \in D\}$.

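The problem is formulated as follows:

$$\begin{aligned}
\text{minimize} \quad & \sum_{(u,v) \in E} k(u,v)\, x(u,v) \\
\text{subject to} \quad & \sum_{u \in V^*} x(u,v) = \sum_{w \in V^*} x(v,w) \quad \forall v \in V, \\
& \sum_{d \in D} x(d, v^*) = P, \\
& 0 \le x(u,v) \le c(u,v) \quad \forall (u,v) \in E^*,
\end{aligned}$$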


where $k(u, v)$, $c(u, v)$, and $x(u, v)$ are the cost coefficient,
the capacity, and the number of evacuees traversing arc
$(u, v)$, respectively; $P$ is the size of the population, and $R$
is the set of RA nodes. Because of the capacity constraints,
the solution may not be a forest of trees; as a result, the
evacuees of an RA node may be assigned to multiple PR
nodes that are not necessarily the closest ones.
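The capacitated model is a standard minimum cost flow problem. The Python sketch below solves a small hypothetical instance with successive shortest augmenting paths (Bellman-Ford on the residual graph), a generic textbook method that is not necessarily the algorithm Yamada used. Node 0 plays the role of $v_s$ and node 5 of $v^*$; arcs out of $v_s$ carry each RA's population, arcs into $v^*$ carry each PR's capacity, and roads get a large capacity so that they are effectively uncapacitated.

```python
def min_cost_flow(n, arcs, source, sink, demand):
    """Send `demand` units from source to sink at minimum total cost.

    n: number of nodes labelled 0..n-1; arcs: list of (u, v, capacity, cost)
    with nonnegative costs. Returns (total_cost, flow on each input arc).
    """
    # Residual graph entries: [to, residual capacity, cost, index of reverse arc]
    graph = [[] for _ in range(n)]
    handles = []
    for u, v, cap, cost in arcs:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
        handles.append((u, len(graph[u]) - 1, cap))

    total_cost, sent = 0, 0
    while sent < demand:
        # Bellman-Ford handles the negative costs of reverse residual arcs.
        dist = [float("inf")] * n
        prev = [None] * n  # (predecessor node, arc position) on the shortest path
        dist[source] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for pos, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, pos)
                        updated = True
            if not updated:
                break
        if dist[sink] == float("inf"):
            raise ValueError("capacities cannot accommodate the whole population")
        # Bottleneck residual capacity along the path
        push, v = demand - sent, sink
        while v != source:
            u, pos = prev[v]
            push = min(push, graph[u][pos][1])
            v = u
        # Augment along the path and update reverse arcs
        v = sink
        while v != source:
            u, pos = prev[v]
            arc = graph[u][pos]
            arc[1] -= push
            graph[v][arc[3]][1] += push
            v = u
        sent += push
        total_cost += push * dist[sink]
    flows = [cap - graph[u][pos][1] for u, pos, cap in handles]
    return total_cost, flows


# Hypothetical instance: 0 = v_s, 1-2 = RA nodes, 3-4 = PR nodes, 5 = v*
BIG = 10**6
arcs = [
    (0, 1, 100, 0), (0, 2, 100, 0),  # populations of RA1 and RA2
    (1, 3, BIG, 1), (1, 4, BIG, 3),  # roads from RA1 (cost = distance)
    (2, 3, BIG, 2), (2, 4, BIG, 3),  # roads from RA2
    (3, 5, 120, 0), (4, 5, 200, 0),  # refuge capacities
]
cost, flows = min_cost_flow(6, arcs, source=0, sink=5, demand=200)
print(cost, flows)
```

In this instance PR1 is the closest refuge for both RAs but can hold only 120 people, so 80 evacuees of RA2 are routed to PR2, which illustrates how the evacuees of one RA node can be split across several refuges that are not all the closest.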

The SEP minimizes the total travel distance by routing
evacuees to the closest exit. This approach causes
congestion at certain exits, which in turn increases the
total evacuation time. Cova and Johnson [10] overcame
this difficulty by developing an optimal lane-based
evacuation routing plan. They formulated the


problem as an integer extension of the minimum cost 
network flow problem. The objective, again, is to min- 
imize the total travel distance. However, the model 
generates routing plans that trade total vehicle travel 
distance against merging conflicts while preventing 
traffic-crossing conflicts at intersections. They use a mi- 
croscopic traffic simulation to compare the relative ef- 
ficiency of the plans. The model is then used to identify 
evacuation routing plans for Salt Lake City, Utah. 

The selection of a specific destination limits the 
route choices of evacuees and increases congestion of 
the roads that lead to safety. To avoid congestion, 
Hobeika et al. [17] have developed a model that routes 
the evacuees to the outside boundary of the risk area 
and lets them seek a safe place afterwards. They have 
extended the traffic network by adding dummy links 
that connect the final destinations to the network at the 
boundary areas. The dummy links have infinite capac- 
ity and short travel time. The objective is to minimize 
the total evacuation time. 

Similarly to the trip departure step, human behavior 
significantly affects the destination choice of evacuees 
in emergency situations. Evacuees may change their 
intended destination if they notice considerable traf- 
fic backed up ahead of them [35]. In situations when 
the household members are scattered throughout the 
evacuation area, the individuals’ tendency to meet be- 
fore evacuating affects the destination selection choice. 
Murray and Mahmassani [27] point out that depend- 
ing on the current location of family members, evacuees 
may decide to meet in a place that is close to the danger 
rather than far from it. 


Community Evacuation: Trip Route Selection 


The trip route selection, also known as trip route as- 
signment, identifies the movement of evacuees during 
the evacuation process. Numerous optimization, sim- 
ulation and combinatorial optimization-simulation ap- 
proaches have been used to model route selection pro- 
cedures in the last four decades. The most common 
objectives of these models are to minimize the total 
travel time, to minimize the total evacuation time, or 
to maximize the flow of evacuees from the risk area to 
safety [3,10,27,28,38]. The travel time depends on the 
speed of a vehicle on a highway segment. The average 
speed is a non-increasing function of the traffic volume [15].
The following equation, referred to as the BPR (Bureau of
Public Roads) equation, expresses the relationship between
speed and volume:

$$SP_{it} = \frac{SP_i}{1 + \alpha \left( V_{it} / C_i \right)^{\beta}},$$

where $SP_i$ is the speed limit on segment $i$, $SP_{it}$ is the
average speed of a vehicle on segment $i$ at time $t$, $C_i$ is the
capacity of segment $i$, $V_{it}$ is the vehicle flow entering
segment $i$ at time $t$, and $\alpha$ and $\beta$ are constants.
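The BPR relation is straightforward to evaluate. In the Python sketch below, the default constants $\alpha = 0.15$ and $\beta = 4$ are the values commonly associated with the original BPR curve; they are an assumption here, since the text leaves $\alpha$ and $\beta$ unspecified. The function names and the sample speeds and volumes (km/h, vehicles per hour) are hypothetical.

```python
def bpr_speed(speed_limit, flow, capacity, alpha=0.15, beta=4.0):
    """Average speed SP_it on a segment from the BPR relation.

    SP_it = SP_i / (1 + alpha * (V_it / C_i) ** beta)
    alpha = 0.15 and beta = 4 are commonly quoted defaults (an assumption);
    calibrated values should be used in practice.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return speed_limit / (1.0 + alpha * (flow / capacity) ** beta)


def segment_travel_time(length, speed_limit, flow, capacity, **params):
    """Time to traverse a segment of the given length at the BPR speed."""
    return length / bpr_speed(speed_limit, flow, capacity, **params)


# Hypothetical segment: speed limit 100 km/h, capacity 2000 vehicles/h
for v in (0, 1000, 2000, 3000):
    print(v, round(bpr_speed(100.0, v, 2000.0), 1))
```

The speed stays near the limit at low volumes and drops sharply once the flow approaches or exceeds capacity, which is the non-increasing behavior described above.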

Evacuation time depends not only on the traffic 
density, but also on traffic delays at intersections. The 
limited capacity, merging conflicts and crossing con- 
flicts at intersections create unavoidable bottlenecks. 
The lane-based routing approach, presented by Cova and
Johnson [10] in 2003, speeds up the evacuation process
by increasing intersection capacity and alleviating
conflicts. They formulate the problem as a minimum
cost network flow problem that minimizes the total
travel distance while simultaneously accounting for
intersection conflicts, lane changing, and left-hand turns. Murray
et al. [27,28] have formulated the evacuation routing 
as a vehicle routing problem (VRP). Unlike the clas- 
sic VRP, they assume that vehicles have different ca- 
pacities and are not located in a single depot but scat- 
tered throughout the network. In addition, the objec- 
tive function minimizes not only the total travel time, 
but also the waiting time of evacuees at the meeting lo- 
cations. 

Besides optimization approaches, simulation based 
approaches have also been used to model evacuation 
routing. The route selection logic employed in simulation
models may be simple, static, or dynamic [35]. In
a simple routing approach, the drivers either select the 
least congested route based on their myopic perception 
or follow some pre-determined set of routes. This ap- 
proach has been used in microscopic simulation mod- 
els such as CLEAR and NETSIM to simulate evacuation 
routing in small urban and rural areas [26,30]. 

The static route assignment models assume that 
traffic conditions remain unchanged during the sim- 
ulation period. A mesoscopic simulation package, 
DYNEV, developed by KLD Associates Inc., uses such 
models to create evacuation routing plans for large ur- 
ban areas [21]. 

Considering the dynamic nature of emergency 
evacuation, the dynamic traffic route assignment mod- 


els are superior to simple and static approaches [16,17, 
31,32,36]. To route the evacuees to safety, the dynamic 
routing approach does not follow a pre-determined set 
of turning movements at intersections; instead, turning
movements are a function of dynamic traffic flow
and evacuee behavioral considerations. These behav- 
ioral considerations address a driver’s prior knowledge 
of the best direction leading to safety and her/his my- 
opic perception of traffic conditions. 

In combinatorial optimization-simulation approaches, an
optimal route assignment model is integrated with a traffic
simulation model. MASSVAC [17,29] and Dynasmart-P [4]
are examples of macroscopic simulation packages that rely
on combinatorial approaches. The objective of the
optimization model in MASSVAC is to minimize the number
of casualties. This model generates an optimal set of
routes along with an optimal evacuation schedule.
Dynasmart-P is a dynamic traffic network analysis
and evaluation tool that determines a time-dependent
assignment of vehicles to different network paths.
Thus, the assignment of a driver to a path is based
not only on the length of the path but also on the
evacuation time. The objective is to minimize the
travel time for each individual traveler. A set of outflow
constraints limits the total number of vehicles
leaving a link at an intersection approach. Additionally,
a set of inflow constraints limits the maximum
number of vehicles allowed to enter a link from all
approaches [19].


Building Evacuation 


The issues discussed above are applicable in many
emergency scenarios. However, some cases differ in ways
that demand special consideration.
Evacuating a building due to a disaster, such as the 
threat of smoke, fire, earthquake, bomb or toxic gas 
leak, requires a different approach. Lindell and Pra- 
ter [24] have identified major differences between 
building and community evacuation. First, the social 
units within a building are not as clear as residential 
units within a community. Second, employers can apply
more control strategies than public agencies can.
Finally, the departure time for building evacuation is 
shorter because of limited required preparation activ- 
ities. 
A number of studies have been devoted to describing
the behavior of building inhabitants in emergency
situations. For an extensive review, see Bryan [1].
However, fewer studies have been conducted to model 
evacuation procedures. One of the earliest attempts by 
Chalmet et al. [3] uses a capacitated network flow prob- 
lem as the basis for modeling a building. Workplaces, 
halls, stairwells, and elevators represent the nodes of the 
network, and movement paths between them represent 
the arcs. The static capacity of a node is the maximum
number of individuals who are allowed to be in the
corresponding building component simultaneously, and the
dynamic capacity of an arc is an upper bound on the number
of individuals who can traverse the corresponding pathway
in each time interval.

Chalmet et al. have used three different optimiza- 
tion models to solve the problem: a dynamic model, 
a graphical model, and an intermediate model. The dynamic
model is a multiobjective optimization model
that represents the evacuation as it evolves over time. 
In contrast, the graphical and intermediate models are 
not time dependent; they treat time as a parameter. 
They are simpler than the dynamic model; however, 
they provide almost the same insight about the build- 
ing evacuation process. 

The objectives of the dynamic model are to mini- 
mize the average number of time periods spent by each 
individual to evacuate the building, to maximize the to- 
tal number of people saved, and to minimize the total 
evacuation time. The dynamic model is formulated as 
a minimum cost flow problem and efficiently solved us- 
ing the GNET Algorithm [2]. 

The graphical approach to model building evacu- 
ation was originally presented by Francis [13,14]. The 
model assigns people to evacuation routes with the ob- 
jective of minimizing the evacuation time. Two implicit 
assumptions of this model are that all evacuees have 
a uniform accessibility to the exit routes and that route 
clearance time depends on the number of people using 
the route. Let $k$ be the number of individuals in
a building that has $n$ exit routes. The formulation is

$$\begin{aligned}
\text{minimize} \quad & \max\{ t_j(x_j) \mid 1 \le j \le n \} \\
\text{subject to} \quad & x_1 + x_2 + \cdots + x_n = k, \\
& x_1, x_2, \ldots, x_n \ge 0,
\end{aligned}$$

where $t_j(x_j)$ is the time required to clear route $j$ if the
total number of evacuees on this route is $x_j$. Each $t_j$ is
continuous and strictly increasing in $x_j$, with $t_j(0) = 0$.
Under these assumptions, the minimum evacuation time is
attained when all routes are cleared at the same time: otherwise,
shifting evacuees from the routes with the largest clearance
time to a route that clears earlier would reduce the maximum.
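The equal-clearance property suggests a simple numerical procedure: search for the common clearance time $T$ by bisection and, for each trial $T$, load every route with as many evacuees as it can clear within $T$. The Python sketch below is our illustration, not Francis's algorithm; the function names and the two linear clearance functions (routes that discharge 2 and 3 evacuees per time unit) are hypothetical and only need to satisfy the stated assumptions.

```python
def invert_increasing(t, target, hi, tol=1e-9):
    """Largest x in [0, hi] with t(x) <= target, for continuous increasing t."""
    if t(hi) <= target:
        return hi
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t(mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


def equalize_clearance(times, k, tol=1e-9):
    """Split k evacuees over routes so that all routes clear at the same time.

    times: clearance-time functions t_j, continuous, strictly increasing,
    with t_j(0) = 0. Returns (T, allocation).
    """
    def assigned(T):
        # Evacuees each route can clear within time T (at most k per route)
        return [invert_increasing(t, T, float(k), tol) for t in times]

    # At T = max_j t_j(k), any single route could take all k evacuees.
    lo, hi = 0.0, max(t(float(k)) for t in times)
    while hi - lo > tol:
        T = 0.5 * (lo + hi)
        if sum(assigned(T)) >= k:
            hi = T
        else:
            lo = T
    return hi, assigned(hi)


# Hypothetical routes discharging 2 and 3 evacuees per time unit
routes = [lambda x: x / 2.0, lambda x: x / 3.0]
T, x = equalize_clearance(routes, k=10)
print(round(T, 6), [round(v, 6) for v in x])
```

For these linear routes, the exact answer is $T = 10/(2 + 3) = 2$, with 4 and 6 evacuees on the two routes, and both routes clear at the same time.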

The intermediate model uses the same network structure
as the dynamic model. However, it is superior to the
dynamic model in terms of input data requirements and
computational time. As in the dynamic model, the arcs
are capacitated, but there is no traverse time on arcs.
For a given subset $A$ of arcs, called critical arcs, there is
no capacity constraint; instead, the function $t_{ij}(x_{ij})$
estimates the time it takes to traverse arc $(i, j) \in A$ when
the flow of evacuees on this arc is $x_{ij}$. Clearly,
$t_{ij}(0) = 0$. The objective is to minimize the building
evacuation time, which is expressed as minimizing the
traverse time on the critical arcs. A heuristic bisection
search algorithm and an exact minimax algorithm were
used to solve the problem.


Small-Area Evacuation 


The standard approach for developing an evacuation
plan for regions, buildings, ships, etc., starts with
determining the evacuation zone around a known hazard
and then exploring some important factors that affect
the evacuation plan (e.g., population distribution,
road capacities, and human behavior). To delimit the
evacuation zone, a boundary is established around the
affected area. In nuclear power plant evacuations, the
boundary of the evacuation area is defined as a circle
with a radius of X miles around the plant, where X depends
on the type of plant and the type of accident. In building
evacuations, such a boundary is defined by the shell of the
building. However, for some emergency situations, such as
urban firestorms or toxic spills on highways, the spatial
impact of the hazard is unknown. In these situations, the
evacuation zone and its boundary cannot be defined in
advance. This is usually the case for small urban and rural
areas that could be subject to different hazardous events
with uncertain spatial effects. Thus, the focus has been on
general planning and mock drills rather than on developing
neighborhood-specific evacuation plans. Cova and Church [8]
were the first to analyze the potential for evacuation
difficulty at the neighborhood scale.

Little is known about small-area evacuation, as it
is nearly impossible to measure accurately during an
emergency. However, there has been interest in identifying
areas that might be difficult to evacuate safely in an
emergency. Church and Cova [6] introduce a network-based
model to search for small contiguous areas, or neighborhoods,
within an urban or rural area that may face difficulties in
a sudden evacuation scenario. Their model classifies a
neighborhood based on its degree of evacuation difficulty,
which is measured by the evacuation risk factor, defined as
the number of vehicles per exit road. Church and Cova
formulate the problem as a nonlinear network partitioning
problem. The objective is to identify a critical neighborhood
(critical cluster) that has the highest evacuation risk
factor. In their network, the nodes represent the households
and the arcs represent the road segments connecting them.
The demand of each node is estimated by multiplying the
number of people per household by the average number of
vehicles per person. The problem is transformed into an
integer linear program whose objective is to identify an
evacuation area that has a risk factor greater than a
specified minimum threshold. This problem is solved for
each node of the network. As a result, each node is
labeled by the risk factor of its corresponding critical
neighborhood. Finally, the critical risk value of each node
is the highest value among the critical clusters that this
node has been part of. An exact and a heuristic approach
are proposed to solve the integer linear programming
problem. Since the exact approach is time consuming, the
heuristic approach is used to find a contiguous critical
area around a given node. The heuristic follows a
region-growing scheme. The base node is selected
arbitrarily from the network. The area around the base
node is then expanded iteratively by selecting a node at
random from a list of candidate nodes, located within a
specified distance of the base node, that most improve the
objective function.
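The risk factor and the region-growing idea can be sketched as follows. This is a hypothetical, deterministic simplification of the heuristic, not Church and Cova's code: candidates are restricted to nodes adjacent to the current cluster (rather than nodes within a specified distance), the best candidate is always taken instead of a random choice among the best, and node demands (in vehicles) are given directly. Exit roads are taken to be the road segments with exactly one endpoint inside the cluster.

```python
def risk_factor(cluster, demand, edges):
    """Vehicles per exit road: total demand in the cluster divided by the
    number of road segments with exactly one endpoint inside the cluster."""
    exits = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    if exits == 0:
        return float("inf")  # no exit road (e.g., the whole network)
    return sum(demand[n] for n in cluster) / exits


def grow_critical_cluster(base, demand, edges, max_size=None):
    """Greedy region growing from a base node.

    At each step, add the adjacent node that yields the largest risk factor,
    and stop when no addition improves it. (Deterministic simplification of
    the randomized heuristic described in the text.)
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    cluster = {base}
    best = risk_factor(cluster, demand, edges)
    while max_size is None or len(cluster) < max_size:
        candidates = set().union(*(neighbors[n] for n in cluster)) - cluster
        scored = [(risk_factor(cluster | {c}, demand, edges), c) for c in candidates]
        scored = [(r, c) for r, c in scored if r != float("inf")]
        if not scored:
            break
        r, c = max(scored)
        if r <= best:
            break
        cluster.add(c)
        best = r
    return cluster, best


# Hypothetical cul-de-sac: households 1-3 reach the rest of the network via node 4
roads = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6)]
vehicles = {1: 5, 2: 5, 3: 5, 4: 1, 5: 1, 6: 1}
print(grow_critical_cluster(1, vehicles, roads))
```

In this example, the three households beyond node 4 share a single exit road, so they form the critical cluster; adding node 4 would open a second exit and lower the risk factor.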

Cova and Church [8] have applied a similar
methodology to generate an evacuation vulnerability 
map which classifies a local area based on the evacua- 
tion difficulty. 


Real-Time Traffic Management 


Real-time traffic management for emergency evacua- 
tion dynamically controls the traffic flow to achieve 
certain system objectives such as maximum utilization 
of the transportation system and minimum fatalities and
property losses. This approach considers the evolution 
of the traffic flow in a traffic network to generate a real- 
time feedback traffic management system by using in- 
vehicle and on-route surveillance systems [5,25]. Briefly 
stated, the current condition of dynamic traffic flow 
is monitored by surveillance systems and a reference 
model that generates the desired traffic status and the 
“safest evacuation strategy” is developed to satisfy the 
designated objectives. The objectives are defined based 
on the nature of the hazard and the involvement of the 
emergency authorities. Possible objectives are minimiz- 
ing the total travel time, minimizing the network clear- 
ance time or minimizing the number of casualties. The 
generated real-time control strategies include routing 
assignments, split rates at intersections, or traffic con- 
trol advisories that are passed on to evacuees cyclically. 
In practice, however, not all evacuees necessarily follow
the control strategies in emergency situations. Therefore,
these strategies are modified based on the differences
between the current traffic status and the desired
traffic status defined by the reference model. The "monitor,
control, and modify" framework is repeated frequently
in a closed feedback loop to decrease discrepancies
between the original plan and the current traffic status
[25]. Some evacuation models include aspects of human
behavior to provide more realistic control strategies that
alleviate these deviations. The evacuation route choice
model developed by Chiu et al. [5]
is an example. This model replicates the route-selection 
procedure of evacuees when they are provided with safe 
evacuation routes. The probability that an evacuee will 
select a particular route depends on his familiarity with 
the route, the degree of overlap between the routes and 
his preference of using freeways. 


Agent-Based Modeling 


Agent-based modeling is also known as individual- 
oriented modeling. This is an increasingly powerful 
modeling technique to simulate individual interactions 
in dynamic routing situations such as emergency evac- 
uations. Agent-based modeling treats the individual 
vehicles as intelligent decision-making entities [11].
A model of agents and a model of their environ- 
ment are two basic components of agent-based mod- 
eling [39]. The behavior of an agent and its interac- 
tion with other agents is modeled by a set of rules such 
as accelerating, decelerating, and lane-changing rules. 
The traffic environment is modeled using a traffic net- 
work topology, road category, traffic lights, and traffic 
signs [12,37]. 

In emergency evacuations, the agent-based sim- 
ulation captures the collective behavior of agents, 
which greatly affects the evacuation plan. As a re- 
sult, more realistic strategies are developed by includ- 
ing the individual behavior of evacuees and their in- 
teractions in panic situations. In 1993 Sinuany-Stern 
and Stern [34] used agent-based simulation for spon- 
taneous urban evacuation. They examined the sensi- 
tivity of network clearance time to several traffic fac- 
tors (such as interaction with pedestrians, intersec- 
tion traversing time, and car ownership), and route 
choice mechanisms (such as shortest path selection 
or myopic-based selection). Cova and Johnson [9] assessed
the spatial effect of a proposed second access
road on household evacuation time using an agent- 
based microsimulation model. Church and Sexton [7] 
used Paramics, an agent-based microsimulation soft- 
ware, to simulate evacuation scenarios in a small neigh- 
borhood. They estimated the impact of different evac- 
uation scenarios, such as opening an alternative exit, 
invoking traffic control plans, and changing the num- 
ber of vehicles leaving a household, on evacuation 
time. 


Conclusions 


Emergency evacuation is a management strategy to en- 
sure population safety in emergency situations. Com- 
munities, buildings, and residential areas are prone to 
disasters; thus, detailed evacuation planning is necessary.
Evacuation planning models consist of a five-step
procedure that involves trip generation, trip departure 
time, trip destination, trip route selection, and evacua- 
tion plan set-up and analysis. We have presented here 
a summary of some noteworthy research on each of the 
above mentioned steps of the planning process. This re- 
view also focuses on the special features of community, 
building, and small-area evacuation planning. In addition,


real-time traffic management and agent-based models 
are discussed. The real-time traffic management models 
consider the dynamic nature of traffic flow and gener- 
ate a real-time feedback traffic management system in 
emergency situations. The agent-based models provide
realistic emergency evacuation strategies by consider- 
ing the individual behavior of evacuees (agents) and 
their interactions. 
This section introduces the interior point approach to
solving entropy optimization problems with linear con- 
straints. In particular, we consider the following prob- 
lem: 

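Program EL:

$$\begin{aligned}
\min \quad & f(x) = c^T x + \sum_{j=1}^{n} d_j x_j \ln x_j \\
\text{s.t.} \quad & Ax = b,\; x \ge 0,
\end{aligned} \tag{1}$$

where $c \in \mathbb{R}^n$, $d \in \mathbb{R}^n$, $d > 0$, $b \in \mathbb{R}^m$, $A$ is an $(m \times n)$-matrix,
$0$ is an $n$-dimensional zero vector, and $0 \ln 0 = 0$. When
$c = 0$ and $d_j = 1$, $j = 1, \ldots, n$, Program EL becomes a pure
entropy optimization problem.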

Denote the feasible region of Program EL by
$F_p = \{x \in \mathbb{R}^n : Ax = b,\ x \ge 0\}$ and the (relative) interior of $F_p$
by $F_p^0 = \{x \in \mathbb{R}^n : Ax = b,\ x > 0\}$. An $n$-vector $x$ is called
an interior solution of Program EL if $x \in F_p^0$. With these
definitions, we have the following result, which is readily verified:


Lemma 1 If $F_p$ is nonempty, then Program EL
has a unique optimal solution. Moreover, if $F_p$ has
a nonempty interior, then the unique optimal solution
is strictly positive.


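All interior point methods, including those to be discussed
in this section, require the fundamental assumption
that $F_p$ has a nonempty interior, i.e., $F_p^0 \neq \emptyset$. A Lagrangian
dual can be derived in the following manner.
For all $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, and $z \in \mathbb{R}^n_+ = \{x : x \in \mathbb{R}^n,\ x \ge 0\}$,
define the following Lagrangian function:

$$L(x, y, z) = \sum_{j=1}^{n} c_j x_j + \sum_{j=1}^{n} d_j e(x_j) + \sum_{i=1}^{m} \Big( b_i - \sum_{j=1}^{n} a_{ij} x_j \Big) y_i - \sum_{j=1}^{n} z_j x_j, \tag{2}$$

where $e(x) = x \ln x$ for $x \ge 0$ (with $0 \ln 0 = 0$) and $e(x) = +\infty$
for $x < 0$. The function $e$ is a proper convex function with the set
$\{x : x \in \mathbb{R},\ x \ge 0\}$ being its effective domain [6]. The concept of proper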


convex function has often been used to simplify convex 
analysis. For details about the theory of using Lagrange 
multipliers for solving constrained optimization prob- 
lems defined in terms of proper convex functions, the 
reader is referred to [6, Chap. 28]. 

Rearranging terms in (2) results in

$$L(x, y, z) = \sum_{j=1}^{n} c_j x_j + \sum_{j=1}^{n} d_j e(x_j) + \sum_{i=1}^{m} b_i y_i - \sum_{j=1}^{n} \Big( \sum_{i=1}^{m} a_{ij} y_i + z_j \Big) x_j.$$


Considering the fact that $d_j > 0$ and the shape of the
entropic function $x \ln x$, we know that, for any given $y \in \mathbb{R}^m$
and $z \in \mathbb{R}^n_+$, $L(x, y, z)$ achieves its unique minimum
at some $x^* > 0$. Also, its first derivative at $x^*$ vanishes. This
implies

$$d_j \ln x_j^* - \sum_{i=1}^{m} a_{ij} y_i + c_j + d_j = z_j \ge 0, \quad j = 1, \ldots, n. \tag{3}$$


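Multiplying both sides of (3) by $x_j^*$ and summing over $j$
produces

$$\sum_{j=1}^{n} c_j x_j^* + \sum_{j=1}^{n} d_j x_j^* \ln x_j^* - \sum_{j=1}^{n} \Big( \sum_{i=1}^{m} a_{ij} y_i + z_j \Big) x_j^* = -\sum_{j=1}^{n} d_j x_j^*.$$

Consequently, for any $y \in \mathbb{R}^m$ and $z \in \mathbb{R}^n_+$,

$$L(x^*, y, z) = \sum_{i=1}^{m} b_i y_i - \sum_{j=1}^{n} d_j x_j^*,$$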


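where $x^*$ satisfies (3). Therefore, a Lagrangian dual of
Program EL becomes

$$\begin{aligned}
\max \quad & L(y, z) = \sum_{i=1}^{m} b_i y_i - \sum_{j=1}^{n} d_j x_j^* \\
\text{s.t.} \quad & d_j \ln x_j^* - \sum_{i=1}^{m} a_{ij} y_i + c_j + d_j = z_j \ge 0, \quad j = 1, \ldots, n.
\end{aligned}$$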
This dual is equivalent to

Program DEL:

$$\begin{aligned}
\max \quad & L(x, y) = \sum_{i=1}^{m} b_i y_i - \sum_{j=1}^{n} d_j x_j \\
\text{s.t.} \quad & d_j \ln x_j + c_j + d_j - \sum_{i=1}^{m} a_{ij} y_i \ge 0, \quad j = 1, \ldots, n.
\end{aligned} \tag{4}$$


Note that $x$ is strictly positive because $\ln 0$ is not well
defined. However, if we define $\ln 0 = -\infty$, the domain
of $x$ in Program DEL can be replaced by $\{x : x \in \mathbb{R}^n,\ x \ge 0\}$.
Denote the excess vector $\nabla f(x) - A^T y$ by $s$. The $j$th
component of $s$ is simply $d_j \ln x_j + c_j + d_j - \sum_{i=1}^{m} a_{ij} y_i$,
which is the left-hand side of (4). Denote the feasible
region of Program DEL by $F_d = \{(x, y) : \nabla f(x) - A^T y \ge 0\}$,
and assume that $F_d$ has a nonempty interior.

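We now derive the Karush-Kuhn-Tucker conditions
for Program DEL. First, define, for all $u \ge 0$, the
following Lagrangian:

$$L'(x, y, u) = \sum_{i=1}^{m} b_i y_i - \sum_{j=1}^{n} d_j x_j + \sum_{j=1}^{n} u_j \Big( d_j \ln x_j + c_j + d_j - \sum_{i=1}^{m} a_{ij} y_i \Big).$$

Setting the partial derivatives with respect to $y_i$ and $x_j$
to zero gives

$$\begin{aligned}
& b_i - \sum_{j=1}^{n} a_{ij} u_j = 0, \quad i = 1, \ldots, m, \\
& -d_j + \frac{u_j d_j}{x_j} = 0, \quad j = 1, \ldots, n.
\end{aligned} \tag{5}$$

Since $d_j > 0$, the second set of equations in (5) is equivalent
to $u_j = x_j$. Therefore, the KKT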

conditions for Program DEL become 

a) There exists $x \in \mathbb{R}^n$ such that $Ax = b$ and $x \ge 0$. This
can be viewed as the ‘primal feasibility condition’.

b) There exists $y \in \mathbb{R}^m$ such that, together with $x$,
$d_j \ln x_j + c_j + d_j - \sum_{i=1}^{m} a_{ij} y_i \ge 0$ for $j = 1, \ldots, n$, or
$\nabla f(x) - A^T y \ge 0$. Similarly, this can be viewed as the ‘dual
feasibility condition’.

c) For all $j = 1, \ldots, n$, $x_j \big( d_j \ln x_j + c_j + d_j - \sum_{i=1}^{m} a_{ij} y_i \big) = 0$.
This can be viewed as the ‘complementary slackness condition’.
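Conditions a) to c) can be checked numerically for a candidate pair $(x, y)$. The NumPy sketch below uses a hypothetical function name and assumes $x > 0$ so that $\ln x$ is defined. As a test case, it uses the pure entropy problem with a single constraint, minimize $\sum_j x_j \ln x_j + c^T x$ subject to $\sum_j x_j = 1$. Setting $z = 0$ and $d_j = 1$ in (3) gives its solution, $x_j = e^{-c_j} / \sum_k e^{-c_k}$ with $y = 1 - \ln \sum_k e^{-c_k}$.

```python
import numpy as np


def kkt_residuals(A, b, c, d, x, y):
    """Measure how far (x, y) is from satisfying KKT conditions a)-c).

    Requires x > 0 so that ln x is defined. Returns
    (primal infeasibility, dual infeasibility, complementarity violation).
    """
    s = d * np.log(x) + c + d - A.T @ y    # excess vector s = grad f(x) - A^T y
    primal = float(np.linalg.norm(A @ x - b))  # condition a): Ax = b
    dual = max(0.0, -float(s.min()))           # condition b): s >= 0
    comp = float(np.max(np.abs(x * s)))        # condition c): x_j s_j = 0 for all j
    return primal, dual, comp


# Single-constraint entropy problem: min sum x_j ln x_j + c^T x, sum x_j = 1
A = np.ones((1, 3))
b = np.array([1.0])
c = np.array([0.0, 1.0, 2.0])
d = np.ones(3)
x_opt = np.exp(-c) / np.exp(-c).sum()
y_opt = np.array([1.0 - np.log(np.exp(-c).sum())])
print(kkt_residuals(A, b, c, d, x_opt, y_opt))
print(kkt_residuals(A, b, c, d, np.full(3, 1.0 / 3.0), y_opt))
```

The uniform vector is primal feasible, but it violates dual feasibility and complementary slackness, so it is not optimal when $c \neq 0$.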


Note that, by (5), the Lagrange multipliers associ- 
ated with the constraints of Program DEL at its opti- 
mal solution happen to coincide with the x-component 
of the optimal solution of Program DEL. This, together 
with the fact that the dual of Program DEL is Program 
EL, implies that the optimal solution of Program DEL
contains the optimal solution of Program EL. 

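Also note that an alternative dual program can be
defined by considering the following Lagrangian:

$$L''(x, y) = \sum_{j=1}^{n} c_j x_j + \sum_{j=1}^{n} d_j x_j \ln x_j - \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij} x_j - b_i \Big) y_i,$$

for $x > 0$ and $y \in \mathbb{R}^m$. In this expression, no Lagrange
multipliers are defined for the constraints $x \ge 0$. Minimizing
$L''$ over $x > 0$ gives $x_j = \exp\big[ (\sum_{i=1}^{m} a_{ij} y_i - c_j)/d_j - 1 \big]$, which
leads to the following dual program:

$$\max_{y \in \mathbb{R}^m} \; \sum_{i=1}^{m} b_i y_i - \sum_{j=1}^{n} d_j \exp\Big[ \frac{\sum_{i=1}^{m} a_{ij} y_i - c_j}{d_j} - 1 \Big].$$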

Since this dual program is unconstrained, any solution 
algorithm can be viewed as an interior point algorithm. 
For details about this approach and companion efficient 
solution algorithms, see [2]. 
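Because this dual is smooth, concave, and unconstrained, a generic method such as Newton's method applies directly. The NumPy sketch below is our illustration, not the algorithm of [2]. It maximizes the dual with Newton steps and Armijo backtracking, then recovers $x$ from $y$ through the formula above. It assumes that $F_p$ has a nonempty interior (otherwise the dual is unbounded) and that $A$ has full row rank (so that the Newton system is nonsingular). The function name is hypothetical. The gradient of the dual is $b - Ax$, and its negative Hessian is $A\,\mathrm{diag}(x/d)\,A^T$.

```python
import numpy as np


def solve_entropy_dual(A, b, c, d, tol=1e-10, max_iter=100):
    """Maximize g(y) = b^T y - sum_j d_j exp((a_j^T y - c_j) / d_j - 1)
    by Newton's method with backtracking, and recover x from y.
    a_j is column j of A; assumes F_p has a nonempty interior and
    A has full row rank.
    """
    y = np.zeros(A.shape[0])

    def primal_x(y):
        return np.exp((A.T @ y - c) / d - 1.0)

    def dual_value(y):
        return b @ y - d @ primal_x(y)

    for _ in range(max_iter):
        x = primal_x(y)
        grad = b - A @ x                  # gradient of g = primal residual
        if np.linalg.norm(grad) < tol:
            break
        neg_hess = (A * (x / d)) @ A.T    # -Hessian = A diag(x/d) A^T
        step = np.linalg.solve(neg_hess, grad)
        t = 1.0
        # Armijo backtracking: require a sufficient increase of the dual value
        while dual_value(y + t * step) < dual_value(y) + 1e-4 * t * (grad @ step):
            t *= 0.5
            if t < 1e-12:
                break
        y = y + t * step
    return primal_x(y), y


# Pure entropy problem with c != 0: the solution is x_j proportional to exp(-c_j)
A = np.ones((1, 3))
b = np.array([1.0])
c = np.array([0.0, 1.0, 2.0])
d = np.ones(3)
x, y = solve_entropy_dual(A, b, c, d)
print(x, np.exp(-c) / np.exp(-c).sum())
```

Every iterate $x = \exp[\cdot] > 0$ is strictly positive, which is why any algorithm for this unconstrained dual can be viewed as an interior point algorithm.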

In the rest of this section, we focus on the devel- 
opment of a primal-dual interior point algorithm [5]. 
Note that, to obtain the algorithm, Program DEL, 
rather than the unconstrained dual program, was used 
in [5]. The primal-dual interior point algorithm starts
with an initial primal feasible solution $x^0$ and an initial
dual feasible solution $y^0$. As the algorithm iterates, it
maintains primal and dual feasibility and reduces the
complementary slackness. In other words, the algorithm
moves from a pair of interior solutions $(x^k, y^k)$, with
$Ax^k = b$, $x^k > 0$, and $s^k = \nabla f(x^k) - A^T y^k > 0$, to a new
interior solution pair $(x^{k+1}, y^{k+1})$ such that the complementary
slackness is reduced from $\delta_k = (x^k)^T s^k$ to
$\delta_{k+1} = (x^{k+1})^T s^{k+1}$. The algorithm terminates when
$\delta_k < \epsilon$ for some given $\epsilon > 0$ (or when the difference between
$f(x^k)$ and the optimum is sufficiently small).
To describe the algorithm, we use the boldface
upper-case letters $\mathbf{X}$, $\mathbf{S}$, and $\mathbf{W}$ to denote the diagonal
matrices formed by the components of vectors $x$, $s$, and
$w$, respectively. We also denote the vector of all ones of
appropriate dimension by $e$, the $l_2$ norm by $\|\cdot\|$, and
the vector whose components are $\ln x_j$, $j = 1, \ldots, n$,
by $\ln x$.

Rather than dealing with the complementary slackness $\delta_k$ directly, the following primal-dual potential function [8]

$$\psi(x, s) = \rho \ln(x^T s) - \sum_{j=1}^{n} \ln(x_j s_j),$$

where $\rho > n + \sqrt{n}$, can be used as a surrogate measure [5].

Given the initial solution pair, the potential of the associated complementary slackness can be calculated. Given the inaccuracy tolerance $\epsilon$, a target potential can be calculated, and therefore so can the amount of potential reduction required. Under proper conditions, the primal-dual interior point algorithm reduces the potential by a constant amount in each iteration.

Note that two different pairs (x, s) that have the same complementary slackness may have different potentials. Therefore, to ensure that the target potential is sufficiently small, we need to find the minimum potential among all pairs (x, s) such that $x^T s = \epsilon$, or a lower bound on this minimum potential.

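Rewrite the potential function as

$$\psi(x, s) = (\rho - n)\ln(x^T s) - \sum_{j=1}^{n} \ln\Big( \frac{x_j s_j}{x^T s} \Big).$$

Applying the geometric-arithmetic mean inequality results in

$$\prod_{j=1}^{n} \frac{x_j s_j}{x^T s} \le \Big( \frac{1}{n} \sum_{j=1}^{n} \frac{x_j s_j}{x^T s} \Big)^{n} = \Big( \frac{1}{n} \Big)^{n}.$$

Taking the natural logarithm leads to

$$\sum_{j=1}^{n} \ln\Big( \frac{x_j s_j}{x^T s} \Big) \le -n \ln n.$$

Consequently,

$$\psi(x, s) \ge (\rho - n)\ln(x^T s) + n \ln n.$$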
Therefore, the target potential should be $(\rho - n)\ln\epsilon + n\ln n$. Given the potential associated with the initial solution, the exact amount of potential reduction required is $\psi(x^0, s^0) - (\rho - n)\ln\epsilon - n\ln n$. Note that, for a given inaccuracy tolerance $\epsilon$, the target potential is indeed the minimum of all the potentials associated with all pairs (x, s) such that $x^T s = \epsilon$. This follows because the geometric-arithmetic mean inequality is tight: it holds with equality when all products $x_j s_j$ equal $\epsilon / n$.
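The bound can be checked numerically. In the following Python sketch (the function names are hypothetical), a balanced pair with every $x_j s_j = \epsilon/n$ attains the target potential $(\rho - n)\ln\epsilon + n\ln n$, while an unbalanced pair with the same $x^T s = \epsilon$ has a strictly larger potential. The value $\rho = n + \sqrt{n} + 0.5$ is an arbitrary choice in the admissible range.

```python
import numpy as np


def potential(x, s, rho):
    """Primal-dual potential psi(x, s) = rho ln(x^T s) - sum_j ln(x_j s_j), x, s > 0."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    return rho * np.log(x @ s) - np.sum(np.log(x * s))


def target_potential(n, rho, eps):
    """Minimum of psi over all positive pairs (x, s) with x^T s = eps."""
    return (rho - n) * np.log(eps) + n * np.log(n)


def required_reduction(x0, s0, rho, eps):
    """Potential reduction after which x^T s <= eps is guaranteed (since rho > n)."""
    return potential(x0, s0, rho) - target_potential(len(x0), rho, eps)


n, eps = 4, 1e-3
rho = n + np.sqrt(n) + 0.5
x = np.ones(n)
balanced = np.full(n, eps / n)                    # all products x_j s_j equal
skewed = eps * np.array([0.7, 0.1, 0.1, 0.1])     # same x^T s, unequal products
print(potential(x, balanced, rho), potential(x, skewed, rho), target_potential(n, rho, eps))
```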

Given how much the potential must be reduced, if an algorithm reduces the potential by at least a constant amount in each iteration, then the number of iterations it needs is $O(\psi(x^0, s^0) - (\rho - n)\ln\epsilon - n\ln n)$.

Assume that, in iteration k, we have a primal-dual feasible solution pair $(x^k, y^k)$ and the slack vector $s^k = \nabla f(x^k) - A^T y^k > 0$. Ideally, one would like to find $(x^{k+1}, y^{k+1})$ such that the KKT conditions are met, i.e.,

$$A x^{k+1} = b, \qquad x^{k+1} \ge 0,$$

$$\nabla f(x^{k+1}) - A^T y^{k+1} \ge 0,$$

$$X^{k+1}\big( \nabla f(x^{k+1}) - A^T y^{k+1} \big) = 0.$$


Define $\Delta x = x^{k+1} - x^k$ and $\Delta y = y^{k+1} - y^k$.


With these definitions, the conditions stated above become

$$A(x^k + \Delta x) = b, \qquad x^k + \Delta x \ge 0,$$

$$\nabla f(x^k + \Delta x) - A^T(y^k + \Delta y) \ge 0,$$

$$(X^k + \Delta X)\big[ \nabla f(x^k + \Delta x) - A^T(y^k + \Delta y) \big] = 0. \tag{6}$$


Note that the quantity in brackets in (6) is simply $s^{k+1} = s^k + \Delta s$, where

$$\Delta s = \nabla f(x^k + \Delta x) - \nabla f(x^k) - A^T \Delta y.$$

Therefore, we have

$$(X^k + \Delta X)(s^k + \Delta s) = 0,$$

or

$$X^k s^k + X^k \Delta s + \Delta X s^k + \Delta X \Delta s = 0,$$

or, since $\Delta X s^k = S^k \Delta x$,

$$X^k \Delta s + S^k \Delta x = -\Delta X \Delta s - X^k s^k.$$


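Solving the equations

$$A(x^k + \Delta x) = b, \qquad X^k \Delta s + S^k \Delta x = -\Delta X \Delta s - X^k s^k,$$

subject to the condition

$$\nabla f(x^k + \Delta x) - A^T(y^k + \Delta y) \ge 0$$

is in general difficult, because $\Delta s$ depends nonlinearly on $\Delta x$ and the system contains the quadratic term $\Delta X \Delta s$.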

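Given $0 < x^k \in \mathcal{F}_p$, $s^k = \nabla f(x^k) - A^T y^k > 0$, and $\delta_k = (x^k)^T s^k$, the algorithm proposed in [4] solves the following system of nonlinear equations for $\Delta x$ and $\Delta y$:

$$X^k \Delta s + S^k \Delta x = \theta p^k, \tag{7}$$

$$A \Delta x = 0, \tag{8}$$

where $\theta > 0$ is a constant to be specified later,

$$p^k = \frac{\delta_k}{\rho} e - X^k s^k,$$

and $n + \sqrt{n} < \rho < 2n$.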
By choosing

$$\theta = \frac{\beta \sqrt{\min_j (x_j^k s_j^k)}}{\big\| (X^k S^k)^{-0.5} p^k \big\|}$$

for some $0 < \beta < 1$ yet to be determined, we obtain

$$\psi(x^{k+1}, s^{k+1}) \le \psi(x^k, s^k) - \gamma$$


for a constant $\gamma > 0$. Let $C > 0$ be a real number. Choose $\beta$ such that

$$0 < \beta < 1, \tag{9}$$


$$\beta(1 + C\beta) < \tfrac{1}{2}, \qquad 1 - C\beta > 0. \tag{10}$$


It can be shown [5] that, to reduce the potential by a constant amount in each iteration, it suffices to solve a linear approximation of equations (7) and (8) to the accuracy stated below.

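Suppose that $n + \sqrt{n} < \rho < 2n$ and that $\Delta x$ and $\Delta y$ satisfy

$$A \Delta x = 0, \qquad X^k \Delta s + S^k \Delta x = \theta p^k + r^k, \qquad \|r^k\| \le C\beta \min_j (x_j^k s_j^k); \tag{11}$$

then

$$\psi(x^k, s^k) - \psi(x^{k+1}, s^{k+1}) \ge \gamma,$$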


where $\gamma = \tfrac{\sqrt{3}}{2}\,\beta(1 - C\beta) - \beta^2(1 + C\beta)$.
Condition (11) can be achieved by solving the following set of linear equations:

$$X^k \big( \nabla^2 f(x^k) \Delta x - A^T \Delta y \big) + S^k \Delta x = \theta p^k, \tag{12}$$

$$A \Delta x = 0. \tag{13}$$

Note that the vector $\nabla^2 f(x^k) \Delta x$ replaces $\nabla f(x^k + \Delta x) - \nabla f(x^k)$ in (7) and serves as a simple linear approximation of it. Equations (12) and (13) are key to the 'potential-reduction' primal-dual interior point algorithm.
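For concreteness, the linear system (12)-(13) can be assembled and solved directly. The Python sketch below assumes the entropy objective $f(x) = \sum_j c_j x_j + \sum_j x_j \ln x_j$, so that $\nabla f(x) = c + \ln x + e$ and $\nabla^2 f(x) = X^{-1}$, and it computes θ by the rule stated above with an illustrative β = 0.3. The function name and the default β are assumptions, and a dense solve is used only because the example is tiny.

```python
import numpy as np


def potential_reduction_direction(A, c, x, y, rho, beta=0.3):
    """Solve (12)-(13) for (dx, dy) at an interior pair (x, y).

    Assumes f(x) = c^T x + sum_j x_j ln x_j, so grad f = c + ln x + 1
    and the Hessian of f is diag(1 / x).
    """
    A = np.asarray(A, dtype=float)
    c, x, y = (np.asarray(v, dtype=float) for v in (c, x, y))
    m, n = A.shape
    s = c + np.log(x) + 1.0 - A.T @ y            # s^k = grad f(x^k) - A^T y^k
    delta = x @ s                                 # complementary slackness delta_k
    p = delta / rho - x * s                       # p^k = (delta_k / rho) e - X^k s^k
    theta = beta * np.sqrt(np.min(x * s)) / np.linalg.norm(p / np.sqrt(x * s))
    hess = np.diag(1.0 / x)
    # Block form of (12)-(13): [[X H + S, -X A^T], [A, 0]] [dx; dy] = [theta p; 0]
    K = np.block([[np.diag(x) @ hess + np.diag(s), -np.diag(x) @ A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([theta * p, np.zeros(m)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:], theta
```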

Given an initial interior point solution, an interior 
point algorithm can be stated as follows. 


Initialization:
  Given an initial primal interior point solution $x^0$ and an initial dual solution $y^0$ such that $Ax^0 = b$, $x^0 > 0$, and $s^0 = \nabla f(x^0) - A^T y^0 > 0$, calculate $\delta_0 = (x^0)^T s^0$;
  set $k \leftarrow 0$.
Iteration:
  IF $\delta_k < \epsilon$, THEN STOP
  ELSE
    solve (12), (13) for $\Delta x$ and $\Delta y$;
    set
      $x^{k+1} = x^k + \Delta x$,
      $y^{k+1} = y^k + \Delta y$,
      $s^{k+1} = \nabla f(x^{k+1}) - A^T y^{k+1}$,
      $\delta_{k+1} = (x^{k+1})^T s^{k+1}$;
    reset $k \leftarrow k + 1$ for the next iteration.
  END IF
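A compact Python sketch of the loop above follows. It again assumes the entropy objective $f(x) = \sum_j c_j x_j + \sum_j x_j \ln x_j$, takes full steps with θ chosen by the rule stated earlier, and fixes the illustrative values β = 0.3 and ρ at the midpoint of $(n + \sqrt{n}, 2n)$. These choices, the function name, and the requirement that the caller supply a strictly feasible starting pair are assumptions; the sketch does not implement the initialization procedure of [1].

```python
import numpy as np


def entropy_potential_reduction(A, b, c, x0, y0, eps=1e-6, beta=0.3, rho=None,
                                max_iter=1000):
    """Full-step potential-reduction loop, assuming f(x) = c^T x + sum_j x_j ln x_j.

    (x0, y0) must satisfy A x0 = b, x0 > 0 and s0 = grad f(x0) - A^T y0 > 0.
    Returns x, y, the final complementary slackness delta and the iteration count.
    """
    A = np.asarray(A, dtype=float)
    b, c = np.asarray(b, dtype=float), np.asarray(c, dtype=float)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    m, n = A.shape
    if rho is None:
        rho = n + np.sqrt(n) + 0.5 * (n - np.sqrt(n))   # midpoint of (n + sqrt(n), 2n)

    def grad_f(z):
        return c + np.log(z) + 1.0

    s = grad_f(x) - A.T @ y
    if not (np.allclose(A @ x, b) and np.all(x > 0) and np.all(s > 0)):
        raise ValueError("(x0, y0) is not a strictly feasible primal-dual pair")
    delta = x @ s
    k = 0
    while delta >= eps and k < max_iter:
        p = delta / rho - x * s
        theta = beta * np.sqrt(np.min(x * s)) / np.linalg.norm(p / np.sqrt(x * s))
        # (12)-(13) with H = diag(1/x): the block X H + S equals I + S.
        K = np.block([[np.diag(1.0 + s), -x[:, None] * A.T],
                      [A, np.zeros((m, m))]])
        sol = np.linalg.solve(K, np.concatenate([theta * p, np.zeros(m)]))
        x, y = x + sol[:n], y + sol[n:]
        s = grad_f(x) - A.T @ y
        delta = x @ s
        k += 1
    return x, y, delta, k
```

In the small problem A = (1, 1, 1), b = 1, $c = (0, \ln 2, \ln 4)$, the optimum is proportional to $e^{-c_j}$, that is (4/7, 2/7, 1/7), which gives a simple check of the sketch.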


With a standard procedure for obtaining an initial solution [1], the following theorem of polynomial-time convergence was shown in [5].

Theorem 2 Suppose that $\epsilon > 0$ and $2n > \rho > n + \sqrt{n}$. Then, in the kth iteration, $x^k > 0$, $s^k > 0$, and $x^k$ and $y^k$ are feasible for Programs EL and DEL. Moreover, the interior point algorithm terminates in at most $O(\psi(x^0, s^0) - (\rho - n)\ln\epsilon - n\ln n)$ iterations.


It was also suggested that, in a practical implementation, the stepsize $\eta$ can be set by a line search, $\eta = \arg\min_{\eta \ge 0} \psi(x^k + \eta\Delta x, s^k + \eta\Delta s)$. With this stepsize, one can set $x^{k+1} = x^k + \eta\Delta x$ and $y^{k+1} = y^k + \eta\Delta y$.

The search direction is a combination of a descent direction and a centering direction. To enable local quadratic convergence, a computable criterion was developed under which a pure Newton method for solving $\nabla f(x) - A^T y = 0$, $Ax = b$ (by solving the linear system $\nabla^2 f(x^k)\Delta x - A^T \Delta y = -s^k$ and $A\Delta x = 0$) can be applied for the rest of the search process. Note that when $x^k$ is close to the optimal solution, $x^k$ is strictly positive, and therefore, by complementary slackness, $\nabla f(x^k) - A^T y^k$ should be close to 0. Implementation of the primal-dual interior point algorithms proposed in [5] is discussed in [3].
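The pure Newton phase can be sketched the same way. The Python function below (hypothetical name, same assumed entropy objective) repeatedly solves $\nabla^2 f(x^k)\Delta x - A^T\Delta y = -s^k$, $A\Delta x = 0$ with full steps. It does not implement the computable switching criterion mentioned above; it simply raises an error if a step leaves the positive orthant, which signals that the starting point was not close enough to the optimum.

```python
import numpy as np


def entropy_newton_phase(A, c, x, y, tol=1e-10, max_iter=50):
    """Pure Newton steps on grad f(x) - A^T y = 0, A x = b from a nearby interior pair.

    Assumes f(x) = c^T x + sum_j x_j ln x_j and that A x = b already holds;
    each step solves H dx - A^T dy = -s, A dx = 0 with H = diag(1/x).
    """
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    x, y = np.array(x, dtype=float), np.array(y, dtype=float)
    m, n = A.shape
    for _ in range(max_iter):
        s = c + np.log(x) + 1.0 - A.T @ y      # residual of grad f(x) - A^T y = 0
        if np.linalg.norm(s) < tol:
            break
        K = np.block([[np.diag(1.0 / x), -A.T],
                      [A, np.zeros((m, m))]])
        sol = np.linalg.solve(K, np.concatenate([-s, np.zeros(m)]))
        x, y = x + sol[:n], y + sol[n:]
        if np.any(x <= 0):
            raise ValueError("Newton step left x > 0; start closer to the optimum")
    return x, y
```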

In addition to the 'potential-reduction' interior point method described above, the 'path following' interior point method, which follows an ideal interior trajectory to reach an optimal solution, was proposed in [7,9]. The convergence of the path-following interior point method has been established. However, to the best of our knowledge, whether it converges in polynomial time remains an open issue.
Introduction


Entropy optimization has been applied to problems in 
various fields of interest from thermodynamics to fi- 
nancial planning. In this context ‘entropy’ refers to the 
amount of uncertainty in a system, rather than the 
amount of disorder. A detailed definition of entropy 
can be found in [4]. 

One area of application, which has not received 
much attention in recent years, is that of parame- 
ter estimation. The estimation of parameters in semi- 
empirical mathematical models is a process which is 
important in many disciplines in the sciences and en- 
gineering. This article will focus on a few different areas 
of the parameter estimation problem which have been 
approached from an entropy perspective. Jaynes’ maxi- 
mum entropy principle allows for the estimation of pa- 
rameters in a statistical distribution function by specifi- 
cation of the characteristic moments. This method can 
also be used to derive the principle of maximum likelihood, one of the most widely used parameter estimation approaches. Entropy principles have also been used to derive theoretical 'best estimators' for recursive parameter estimation schemes. These results can then be used to gauge the performance of various nonoptimal approaches. A final application involves the development of a measure that allows not only the estimation of model parameters but also the simultaneous choice of the best mathematical form of the model.


Entropy Measures 


In order to optimize entropy, one must possess some quantitative measure of the entropy of a given distribution. One such measure was developed by C.E. Shannon [8]. Shannon arrived at the function by postulating a set of properties that the measure should have, and then deriving a form that possesses those properties. For a probability distribution $p = (p_1, \dots, p_n)$, the function takes the form

$$S = -\sum_{i=1}^{n} p_i \ln p_i. \tag{1}$$


Shannon also proved that this function was unique for 
the postulated set of properties. Other researchers have 
postulated different sets of properties, but arrived at the 
same result [4]. 
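Formula (1) translates directly into code. The Python sketch below uses the natural logarithm and the usual convention $0 \ln 0 = 0$; the function name is illustrative.

```python
import math


def shannon_entropy(p):
    """S = -sum_i p_i ln p_i from (1), using the convention 0 ln 0 = 0."""
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


print(shannon_entropy([0.5, 0.5]), shannon_entropy([0.0001, 0.9999]))
```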

Another measure of entropy, in this case the cross-entropy or distance between two distributions, was presented by S. Kullback and R.A. Leibler [5]. For two given distributions, $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$, the function takes the form

$$I = \sum_{i=1}^{n} p_i \ln\frac{p_i}{q_i}. \tag{2}$$

It is assumed that when $q_i = 0$, the associated $p_i$ is also zero, and $0 \ln(0/0) = 0$. This function is referred to as the Kullback-Leibler measure of cross-entropy.
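A direct implementation of (2), including the convention for zero probabilities, is short. In this Python sketch (illustrative function name), a term with $p_i = 0$ contributes nothing, and a term with $p_i > 0$ but $q_i = 0$ is rejected because it violates the assumption stated above.

```python
import math


def kl_divergence(p, q):
    """I = sum_i p_i ln(p_i / q_i) from (2), with 0 ln(0/q_i) = 0 (including q_i = 0)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue                        # zero-probability terms contribute nothing
        if qi == 0.0:
            raise ValueError("p_i > 0 where q_i = 0 violates the assumption in the text")
        total += pi * math.log(pi / qi)
    return total
```

Note that the measure is not symmetric: in general $I(p, q) \ne I(q, p)$, which is why it is a 'distance' only in a loose sense.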


Jaynes’ Maximum Entropy 
for Continuous Distributions 


Since most distributions encountered in practice are continuous in nature, Jaynes' principle of maximum entropy (MaxEnt) must first be extended to continuous distributions. This extension is straightforward and results in:

$$\begin{aligned} \max \; & -\int_a^b f(x) \ln f(x)\,dx \\ \text{s.t.} \; & \int_a^b f(x)\,dx = 1, \\ & \int_a^b f(x)\, g_r(x)\,dx = a_r, \quad r = 1, \dots, m, \end{aligned} \tag{3}$$


where f(x) is a continuous probability density function on [a, b]. The Lagrange function takes the form

$$L = -\int_a^b f(x) \ln f(x)\,dx - (\lambda_0 - 1)\Big[ \int_a^b f(x)\,dx - 1 \Big] - \sum_{r=1}^{m} \lambda_r \Big[ \int_a^b f(x)\, g_r(x)\,dx - a_r \Big]. \tag{4}$$

Using the Euler-Lagrange equation, the following expression results:

$$f(x) = \exp\big[ -\lambda_0 - \lambda_1 g_1(x) - \cdots - \lambda_m g_m(x) \big]. \tag{5}$$


A detailed discussion can be found in [4]. 
MaxEnt Estimation Method

The estimation of parameters in a statistical distribution using MaxEnt follows these steps:

1) Specify m characterizing functions, $g_1(x), \dots, g_m(x)$.
2) Use MaxEnt to find f(x), which is given by (5).
3) Find estimates of the values of the moments from the observed data set $x = \{x_1, \dots, x_n\}$ through the relationship

$$\hat a_r = \frac{1}{n} \big[ g_r(x_1) + \cdots + g_r(x_n) \big], \quad r = 1, \dots, m. \tag{6}$$

4) Determine estimates of the Lagrange multipliers, $\hat\lambda_0, \dots, \hat\lambda_m$, from

$$\frac{\int_a^b g_r(x)\, e^{-\hat\lambda_1 g_1(x) - \cdots - \hat\lambda_m g_m(x)}\,dx}{\int_a^b e^{-\hat\lambda_1 g_1(x) - \cdots - \hat\lambda_m g_m(x)}\,dx} = \hat a_r, \quad r = 1, \dots, m, \tag{7}$$

and

$$\hat\lambda_0 = \ln \int_a^b e^{-\hat\lambda_1 g_1(x) - \cdots - \hat\lambda_m g_m(x)}\,dx. \tag{8}$$

5) The estimated function then takes the form

$$\hat f(x) = \exp\big[ -\hat\lambda_0 - \hat\lambda_1 g_1(x) - \cdots - \hat\lambda_m g_m(x) \big]. \tag{9}$$
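Steps 3 to 5 can be carried out numerically. The Python sketch below is one possible approach, not the procedure of [4]: it approximates the integrals in (7) and (8) with the trapezoidal rule on a uniform grid over [a, b], and it solves (7) by Newton's method applied to the convex function $\ln \int_a^b e^{-\sum_r \lambda_r g_r(x)}\,dx + \sum_r \lambda_r \hat a_r$, whose stationarity condition is exactly (7). The grid size, tolerance and function name are illustrative assumptions.

```python
import numpy as np


def maxent_estimate(g_funcs, data, a, b, n_grid=20001, tol=1e-10, max_iter=100):
    """Steps 3-5 of the MaxEnt estimation method on a finite interval [a, b].

    g_funcs: characterizing functions g_1..g_m (vectorized callables).
    Returns (lambda_0, [lambda_1..lambda_m]) so that
    f(x) = exp(-lambda_0 - sum_r lambda_r g_r(x)).
    """
    x = np.linspace(a, b, n_grid)
    h = x[1] - x[0]
    G = np.array([g(x) for g in g_funcs])                     # m x n_grid
    data = np.asarray(data, dtype=float)
    a_hat = np.array([np.mean(g(data)) for g in g_funcs])     # moment estimates (6)

    def integrate(y):
        # Composite trapezoidal rule along the last axis of the uniform grid.
        return h * (y.sum(axis=-1) - 0.5 * (y[..., 0] + y[..., -1]))

    lam = np.zeros(len(g_funcs))
    for _ in range(max_iter):
        w = np.exp(-lam @ G)                                  # unnormalized density
        z = integrate(w)
        mean_g = integrate(G * w) / z                         # left-hand side of (7)
        resid = mean_g - a_hat
        if np.max(np.abs(resid)) < tol:
            break
        cov = integrate(G[:, None, :] * G[None, :, :] * w) / z - np.outer(mean_g, mean_g)
        lam = lam + np.linalg.solve(cov, resid)               # Newton step
    lam0 = np.log(integrate(np.exp(-lam @ G)))                # equation (8)
    return lam0, lam
```

With $g_1(x) = x$ on [0, 1] and a sample mean of $1/2 - 1/(e^2 - 1)$, the sketch returns $\hat\lambda_1 \approx 2$, i.e., a truncated exponential density.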


Maximum Likelihood From MaxEnt 


The principle of maximum likelihood has been widely used to estimate the parameters of both statistical distributions and semi-empirical models. Maximum likelihood assumes that information exists about a random variable in the form of an observation, $x_1, \dots, x_n$, and a density function, $f(x; \theta_1, \dots, \theta_m)$, unlike in MaxEnt where the forms of the characterizing moments are known. The approach seeks to maximize the likelihood that the given observations will occur given a set of parameters. If each observation is independent, then this 'likelihood' is defined as

$$L(X; \theta) = \prod_{i=1}^{n} f(x_i \mid \theta). \tag{10}$$

The log likelihood function is most often used:

$$\ln L(X; \theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta). \tag{11}$$

The function ln L is maximized to determine the optimal parameter estimates $\hat\theta$.

The same objective can also be derived using the concept of MaxEnt, even though the former predates the latter. The parameters need to be chosen such that the entropy which remains after the observed values are known is as large as possible. This implies that the entropy of the observation itself has to be a minimum. The entropy is given by

$$-\int_a^b f(x, \theta) \ln f(x, \theta)\,dx = -\int_a^b \ln f(x, \theta)\,dF. \tag{12}$$

The knowledge which is given by the observation (with the observations ordered so that $x_1 \le \cdots \le x_n$) is

$$F(x, \theta) = \begin{cases} 0, & x < x_1, \\ 1/n, & x_1 \le x < x_2, \\ k/n, & x_k \le x < x_{k+1}, \\ 1, & x_n \le x, \end{cases}$$

where $F(x, \theta)$ is the cumulative distribution function. Thus the entropy of the sample is written as

$$-\frac{1}{n} \big[ \ln f(x_1, \theta) + \cdots + \ln f(x_n, \theta) \big], \tag{13}$$

which is equal to

$$-\frac{1}{n} \ln \prod_{i=1}^{n} f(x_i, \theta) = -\frac{1}{n} \ln L(X; \theta), \tag{14}$$

where ln L is the log likelihood (11). Therefore, to minimize the entropy of the sample, the likelihood function must be maximized.
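The equivalence can be checked on a concrete family. The Python sketch below assumes a normal density (an illustrative choice, not one made in the text), evaluates the sample entropy (13), and confirms that it is smallest at the maximum likelihood estimates, namely the sample mean and the root-mean-square deviation about it.

```python
import math


def sample_entropy(data, mu, sigma):
    """Sample entropy (13) for a normal density: -(1/n) sum_i ln f(x_i; mu, sigma)."""
    n = len(data)
    log_f = [-0.5 * math.log(2.0 * math.pi * sigma ** 2) - (xi - mu) ** 2 / (2.0 * sigma ** 2)
             for xi in data]
    return -sum(log_f) / n


data = [1.0, 2.0, 4.0, 7.0]
mu_hat = sum(data) / len(data)                                          # MLE of the mean
sigma_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in data) / len(data))  # MLE of sigma
best = sample_entropy(data, mu_hat, sigma_hat)
```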


Recursive Parameter Estimation 


The determination of the parameters of a dynamical 
system on-line is a key step in the implementation of 
a wide range of control schemes. The estimation pro- 
cedure is conducted in a recursive fashion in which 
the estimates from the previous time step are com- 
bined with the current state observations to calculate 
a new set of parameter estimates. The analysis of the 
estimation process is typically based on a mean square error criterion. This method requires some assumptions about the error to be made and the form of the data processor to be restricted. H.L. Weidemann and E.B. Stear [9] presented an approach based on entropy concepts that has various benefits over the mean square error method:

• The form of the optimal data processor is neither constrained nor required to be known.

• Errors are not restricted to have a normal probability distribution.

• None of the operators in the system are required to be linear.

Before continuing with the analysis, various measures need to be defined. The entropy of a K-dimensional random vector X with joint probability density function $p_X(x_1, \dots, x_K)$ is defined as

$$H(X) = -\int_{-\infty}^{\infty} p_X(X) \ln p_X(X)\,dX. \tag{15}$$

If $R_X$ is the covariance matrix of the vector X, then the following holds:

$$H(X) \le \frac{1}{2} \ln\big\{ (2\pi e)^K \det[R_X] \big\}. \tag{16}$$

When X is a Gaussian random vector, (16) holds as an equality. Another quantity which will be used in the analysis is the mutual information between X and Y:

$$I(X; Y) = \iint p_{XY}(X, Y) \ln \frac{p_{XY}(X, Y)}{p_X(X)\, p_Y(Y)}\,dX\,dY. \tag{17}$$
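For jointly Gaussian vectors, both (16) and (17) have closed forms, which makes them convenient reference values when assessing an estimator. The Python sketch below (function names are hypothetical) evaluates the Gaussian entropy and obtains the mutual information through the standard identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$, which follows from (17).

```python
import numpy as np


def gaussian_entropy(cov):
    """H = (1/2) ln((2 pi e)^K det R): the right-hand side of (16), attained by a Gaussian."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    k = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet)


def gaussian_mutual_information(cov, k_x):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for jointly Gaussian (X, Y); X is the first k_x coordinates."""
    cov = np.asarray(cov, dtype=float)
    return (gaussian_entropy(cov[:k_x, :k_x]) + gaussian_entropy(cov[k_x:, k_x:])
            - gaussian_entropy(cov))
```

For two unit-variance components with correlation r, this gives $I = -\tfrac{1}{2}\ln(1 - r^2)$, and it is zero when the covariance matrix is diagonal.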


Entropy Optimization: Parameter Estimation, Figure 1
Typical parameter estimator (dynamical system followed by data processor F)


The object is to estimate a vector Θ of unknown parameters with joint probability density function $p_\Theta(\theta_1, \dots, \theta_m)$. The output of the dynamical model as a function of these parameters is expressed as $y_i(\theta_1, \dots, \theta_m; k)$. These outputs are then measured by a sensor to produce $\{z_i\}$. These measurements are then used by the data processor F to produce an r-dimensional vector V, which is an estimate of D(Θ). The estimation error is given by

$$X = D(\Theta) - V = D(\Theta) - F(Z) = U - V. \tag{18}$$


Also, under certain conditions, the transform D(Θ) will possess the property that, for any given random vector ξ,

$$I(\xi; \Theta) = I(\xi; D(\Theta)), \tag{19}$$

or, in words, that D(Θ) preserves energy. When this does not hold, $I(\xi; \Theta) > I(\xi; D(\Theta))$.

The problem now is to determine the function F that will produce an optimal estimator. The theoretically best function results in a minimum of the error entropy, denoted $H_0$. The only constraint on the approach is that the mutual information, $I(\Theta; Z)$, must be known. With that, the following can be stated:

• The minimum entropy of the error vector is given by

$$H_0 = H(U) - I(U; Z). \tag{20}$$

• Minimizing the mutual information, $I(X; Z)$, is equivalent to minimizing the entropy of the error vector. This is achieved by choosing F(Z) such that Z and X are independent.

• Whether or not D(Θ) preserves energy, the reduction in the processed parameter entropy, H(U), is bounded above by $I(\Theta; Z)$, that is,

$$H(D(\Theta)) - H(X) \le I(\Theta; Z), \tag{21}$$

and the equality holds when D(Θ) preserves energy and the optimal processor, F, is used.

These three statements now make it possible to determine the best possible performance an estimator can achieve for a given system. The proofs of these statements and a simple example can be found in [9]. The extension of the theorems to the continuous-time case is given in [6], and to the similar problem of state estimation in [7].


Parameter Estimation and Model Selection 


For most problems of any physical significance, the form of the model equations is not known with absolute certainty. Herein lies the problem of not only estimating unknown parameters, but also determining the best-fitting model. Given a set of N independent observations, $x_1, \dots, x_N$, of a random variable from an unknown true distribution g(x), the objective is to estimate this true distribution by choosing a member of a family of distributions given by $f(x \mid \Theta)$, where Θ is a vector of parameters. In order to accomplish this, the distance between the two distributions needs to be minimized. The quantity $S(g; g)$, the negative of the entropy of the true distribution, is given by

$$S(g; g) = \int g(x) \ln g(x)\,dx, \tag{22}$$

while a measure of the cross-entropy is given by

$$S(g; f(x \mid \Theta)) = \int g(x) \ln f(x \mid \Theta)\,dx. \tag{23}$$


The Kullback-Leibler (K-L) measure is defined as

$$I = S(g; g) - S(g; f(x \mid \Theta)) = \int g(x) \ln \frac{g(x)}{f(x \mid \Theta)}\,dx. \tag{24}$$

Therefore, the solution involves the minimization of the K-L measure [3].

Take the example of a family of possible distributions, each one having a different number, k, of unknown parameters, $\Theta_k$. These are denoted by $f(x \mid \Theta_k)$. The resulting form of the measure used to choose the correct distribution is referred to as Akaike's information criterion (AIC) [1]:

$$\mathrm{AIC}(k) = -2 \ln L(\hat\Theta_k) + 2k, \tag{25}$$

where $\ln L(\hat\Theta_k)$ is the value of the log likelihood function with optimally determined parameters $\hat\Theta_k$. It is proven in [3] that this result is obtained by the minimization of the K-L measure given by (24).
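As a small worked example of (25), the Python sketch below compares two hypothetical candidate models for the same data: a normal model with known mean 0 and estimated variance (k = 1) and a normal model with both mean and variance estimated (k = 2). The data and the choice of models are illustrative assumptions; the maximized log likelihood of a normal model is $-\tfrac{n}{2}\big(\ln(2\pi\hat\sigma^2) + 1\big)$.

```python
import math


def normal_max_log_likelihood(data, mu=None):
    """Maximized ln L for a normal model; mu=None means the mean is also estimated."""
    n = len(data)
    m = sum(data) / n if mu is None else mu
    var = sum((xi - m) ** 2 for xi in data) / n      # MLE of sigma^2 given the mean
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def aic(log_likelihood, k):
    """Akaike's information criterion (25): -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


data = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = {
    "N(0, sigma^2), k = 1": aic(normal_max_log_likelihood(data, mu=0.0), 1),
    "N(mu, sigma^2), k = 2": aic(normal_max_log_likelihood(data), 2),
}
best_model = min(scores, key=scores.get)   # smallest AIC is preferred
```

Here the extra parameter costs 2 in (25) but improves the fit enough that the two-parameter model has the smaller AIC.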

A secondary problem in the area of model selection is the sequential design of experiments. The concept of entropy has been applied to this problem in [2]. A total
entropy criterion is developed which includes the un- 
certainty in the model selected as well as the uncertainty 
in the parameter values in each model. The use of this 
measure leads to a choice of an experiment for which 
the outcome is the most uncertain. 


See also 


> Entropy Optimization: Interior Point Methods 

> Entropy Optimization: Shannon Measure of 
Entropy and its Properties 

> Jaynes’ Maximum Entropy Principle 

> Maximum Entropy Principle: Image Reconstruction 


References 


1. Akaike H (1974) A new look at the statistical model identifi- 
cation. IEEE Trans Autom Control 19(6):716-723 

2. Borth DM (1975) A total entropy criterion for the dual prob- 
lem of model discrimination and parameter estimation. 
J Royal Statist Soc B 37:77-87 

3. Bozdogan H (1987) Model selection and Akaike’s informa- 
tion criterion (AIC): The general theory and its analytical ex- 
tensions. Psychometrika 52(3):345-370 

4. Kapur JN, Kesavan HK (1992) Entropy optimization princi- 
ples and applications. Acad. Press, New York 

5. Kullback S, Leibler RA (1951) On information and sufficiency. 
Ann Math Statist 22:79-86 

6. Minamide N (1982) An extension of the entropy theorem for 
parameter estimation. Inform and Control 53:81-90 

7. Minamide N, Nikiforuk PN (1993) Conditional entropy the- 
orem for recursive parameter estimation and its applica- 
tion to state estimation problems. Internat J Syst Sci 24(1): 
53-63 

8. Shannon CE (1948) A mathematical theory of communica- 
tion. Bell System Techn J 27:379-423, 623-659 

9. Weidemann HL, Stear EB (1969) Entropy analysis of parame- 
ter estimation. Inform and Control 14:493-506 


Entropy Optimization: 


Shannon Measure of Entropy 
and its Properties 


SHU-CHERNG FANG1, JACOB H.-S. TSAO2
1 North Carolina State University, Raleigh, USA
2 San Jose State University, San Jose, USA


MSC2000: 94A17, 90C25 


Article Outline 


Keywords 
See also 
References 


Keywords 


Entropy; Cross-entropy; Maximum entropy principle; 
Minimum cross-entropy principle 


The word entropy originated in the literature on thermodynamics around 1865 in Germany and was coined by R. Clausius [4] to represent a measure of the amount of energy in a thermodynamic system as a function of the temperature of the system and the heat that enters the system. Clausius wanted a word similar to the German word energie (i.e., energy) and found it in the Greek word τροπή, which means transformation [1]. The word entropy had belonged to the domain of physics until 1948 when C.E. Shannon, while developing his theory of communication at Bell Laboratories, used the term to represent a measure of information after a suggestion made by J. von Neumann. Shannon wanted a word to describe his newly found measure of uncertainty and sought von Neumann's advice. Von Neumann's reasoning to Shannon [25] was: "No one really understands entropy. Therefore, if you know what you mean by it and you use it when you are in an argument, you will win every time."

Whatever the reason for the name is, the concept of 
Shannon’s entropy has penetrated a wide range of dis- 
ciplines, including statistical mechanics [12], thermo- 
dynamics [12], statistical inference [24], business and 
finance [5], nonlinear spectral analysis [21], image re- 
construction [3], transportation and regional planning 
[26], queueing theory [10], information theory [9,20], 
statistics [17], econometrics [8], and linear and nonlin- 
ear programming [6,7]. 

The concept of entropy is closely tied to the concept 
of uncertainty embedded in a probability distribution. In 
fact, entropy can be defined as a measure of probabilis- 
tic uncertainty. For example, suppose the probability 
distribution for the outcome of a coin-toss experiment 
is (0.0001, 0.9999), with 0.0001 being the probability of 
having a tail. One is likely to notice that there is much 
more ‘certainty’ than ‘uncertainty’ about the outcome 
of this experiment and hence about the probability distribution. In fact, one is almost certain that the outcome will be a head. If, on the other hand, the probability distribution governing that same experiment were (0.5, 0.5), one would realize that there is much less 'certainty' and much more 'uncertainty,' when compared to the previous distribution. Generalizing this observation to the case of n possible outcomes, we conclude
that the uniform distribution has the highest uncer- 
tainty out of all possible probability distributions. This 
implies that, if one had to choose a probability distribu- 
tion for a chance experiment without any prior knowl- 
edge about that distribution, it would seem reasonable 
to pick the uniform distribution. This is because one 
would have no reason to choose any other and because 


that distribution maximizes the ‘uncertainty’ of the out- 
come. This is called Laplace’s principle of insufficient 
reasoning [15]. Note that we are able to justify this principle without resorting to a rigorous definition of 'uncertainty.' However, this principle is inadequate when one has some prior knowledge about the distribution.

Suppose, for example, that one knows some particular moments of the distribution, e.g., the expected value. In this case, a mathematical definition of 'uncertainty' is crucial. This is the case where Shannon's measure of uncertainty, or Shannon's entropy, plays an indispensable role [20].

To define entropy, Shannon proposed some axioms 
that he thought any measure of uncertainty should sat- 
isfy and deduced a unique function, up to a multiplica- 
tive constant, that satisfies them. It turned out that this 
function actually possesses many more desirable prop- 
erties. In later years, many researchers modified and re- 
placed some of his axioms in an attempt to simplify the 
reasoning. However, they all deduced that same func- 
tion. 

We first focus on finite-dimensional entropy, i.e., 
Shannon’s entropy defined on discrete probability dis- 
tributions that have a finite number of outcomes (or 
states). Let $p = (p_1, \dots, p_n)^T$ be a probability distribution associated with n possible outcomes, denoted by $X = (x_1, \dots, x_n)^T$, of an experiment. Denote its entropy by $S_n(p)$. Among those defining axioms, J.N. Kapur and H.K. Kesavan stated the following [15]:

1) $S_n(p)$ should depend on all the $p_j$'s, $j = 1, \dots, n$.

2) $S_n(p)$ should be a continuous function of $p_j$, $j = 1, \dots, n$.

3) $S_n(p)$ should be permutationally symmetric. In other words, if the $p_j$'s are merely permuted, then $S_n(p)$ should remain the same.

4) $S_n(1/n, \dots, 1/n)$ should be a monotonically increasing function of n.

5) $S_n(p_1, \dots, p_n) = S_{n-1}(p_1 + p_2, p_3, \dots, p_n) + (p_1 + p_2)\, S_2\Big( \frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2} \Big)$.

Properties 1, 2 and 3 are obvious. Property 4 states that the maximum uncertainty of a probability distribution should increase as the number of possible outcomes increases. Property 5 is the least obvious, but it states that the uncertainty of a probability distribution is the sum of the uncertainty of the probability distribution that combines two of the outcomes and the uncertainty of the probability distribution consisting of only those two outcomes, weighted by the combined probability of the two outcomes.
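Property 5 can be verified numerically for Shannon's measure, together with property 4 for uniform distributions. In the Python sketch below, the helper names are illustrative, and p is any distribution with $p_1 + p_2 > 0$.

```python
import math


def entropy(p):
    """Shannon entropy -sum_j p_j ln p_j with 0 ln 0 = 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)


def grouped_entropy(p):
    """Right-hand side of property 5: merge outcomes 1 and 2, then add their weighted split."""
    w = p[0] + p[1]
    return entropy([w] + list(p[2:])) + w * entropy([p[0] / w, p[1] / w])


p = [0.1, 0.2, 0.3, 0.4]
print(entropy(p), grouped_entropy(p))
```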

It turns out that the unique family of functions that satisfy the defining axioms has the form $S_n(p) = -k \sum_{j=1}^{n} p_j \ln p_j$, where k is a positive constant, ln represents the natural logarithmic function, and $0 \ln 0 = 0$ [15]. Shannon chose $-\sum_{j=1}^{n} p_j \ln p_j$ to represent his concept of entropy [20]. Among its many other desirable properties, we state the following:

6) Shannon's measure is nonnegative and concave in $p_1, \dots, p_n$.

7) The measure does not change with the inclusion of 
a zero-probability outcome. 

8) The entropy of a probability distribution repre- 
senting a completely certain outcome is 0, and the 
entropy of any probability distribution represent- 
ing uncertain outcomes is positive. 

9) Given any fixed number of outcomes, the maxi- 
mum possible entropy is that of the uniform dis- 
tribution. 

10) The entropy of the joint distribution of two inde- 
pendent distributions is the sum of the individual 
entropies. 

11) The entropy of the joint distribution of two depen- 
dent distributions is no greater than the sum of the 
two individual entropies. 

Property 6 is desirable because it is much easier to maximize a concave function than a nonconcave one. Properties 7 and 8 are appealing because a zero-probability outcome contributes nothing to uncertainty, and neither does a completely certain outcome. Property 9 was discussed earlier. Properties 10 and 11 state that joining two distributions leaves the total entropy unchanged (the joint entropy equals the sum of the individual entropies) if they are independent, and may actually reduce it (the joint entropy may fall below that sum) if they are dependent.
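Properties 10 and 11 can likewise be checked on small joint probability tables. The Python sketch below (hypothetical helper names) compares the joint entropy with the sum of the marginal entropies for an independent pair and for a dependent pair.

```python
import math


def entropy(p):
    """Shannon entropy -sum_j p_j ln p_j with 0 ln 0 = 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)


def joint_and_marginal_entropies(joint):
    """Return (H(X, Y), H(X) + H(Y)) for a table joint[i][j] = P(X = i, Y = j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_joint = entropy([pij for row in joint for pij in row])
    return h_joint, entropy(px) + entropy(py)


independent = [[a * b for b in (0.2, 0.5, 0.3)] for a in (0.3, 0.7)]   # P(X, Y) = P(X) P(Y)
dependent = [[0.4, 0.1], [0.1, 0.4]]
```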

Shannon's entropy was originally defined for a probability distribution over a finite sample space, i.e., a finite number of possible outcomes, and can be interpreted as a measure of uncertainty of the probability distribution. It has subsequently been defined for general discrete and continuous random vectors. It has been rigorously proved that Shannon's entropy is the unique measure of uncertainty (up to a multiplicative constant) of a finite probability distribution that satisfies a set of axioms considered necessary for any reasonable measure of uncertainty [16,19,20]. The concept of entropy, when extended to probability distributions defined on a countably infinite sample space, takes the form $-\sum_{j=1}^{\infty} p_j \ln p_j$. It can still be viewed as a measure of uncertainty, but such an interpretation does not enjoy the same degree of mathematical rigor as its finite-sample-space counterpart. When the concept is extended to continuous probability distributions, it is defined to be $-\int p(x) \ln p(x)\,dx$. However, it can no longer be interpreted as a measure of uncertainty at all [9,11]. Rather, it can only be viewed as a measure of relative uncertainty [15].

Note that, with Shannon's entropy as the measure of uncertainty, in the absence of any prior information about the underlying probability distribution, the best course of action suggested by the principle of insufficient reasoning is to choose the uniform distribution because it possesses maximum uncertainty. Given the knowledge of some moments of the underlying distribution, the same reasoning leads to the following principle:

• Out of all possible distributions that are consistent with the moment constraints, choose the one that has maximum entropy.

This principle was proposed by E.T. Jaynes ([5, Chapter 2]) and has been known as the principle of maximum entropy or Jaynes' maximum entropy principle. It has often been abbreviated as MaxEnt in the literature.

Let X be a random variable with n possible outcomes $\{x_1, \dots, x_n\}$ and $p = (p_1, \dots, p_n)^T$ be a vector consisting of corresponding probabilities. Suppose that $g_1(X), \dots, g_m(X)$ are m functions of X with known expected values $a_1, \dots, a_m$, respectively. The principle of maximum entropy leads to the following mathematical optimization problem:

$$\begin{aligned} \max \; & H_n(p) = -\sum_{j=1}^{n} p_j \ln p_j \\ \text{s.t.} \; & \sum_{j=1}^{n} p_j\, g_i(x_j) = a_i, \quad i = 1, \dots, m, \\ & \sum_{j=1}^{n} p_j = 1, \\ & p_j \ge 0, \quad j = 1, \dots, n. \end{aligned}$$

This is a convex programming problem with linear constraints. The nonnegativity constraints are not binding for the optimal solution $p^*$ because each $p_j^*$ can be expressed as an exponential function in terms of the Lagrange multipliers associated with the equality constraints. Note that, in the absence of the moment constraints, the solution to the problem is the uniform probability distribution, whose entropy is $\ln n$. As such, the maximum entropy principle can be viewed as an extension of Laplace's principle of insufficient reasoning. The distribution selected under the maximum entropy principle has also been interpreted as the 'most probable' one, in the sense that the maximum entropy distribution coincides with the frequency distribution that can be realized in the greatest number of ways [3]. An explanation of this linkage in the context of the well-known application of entropy maximization in transportation planning can be found in [7].
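With a single moment constraint, the exponential form of $p_j^*$ reduces the problem to finding one Lagrange multiplier. The Python sketch below assumes $g_1(X) = X$ (a prescribed mean) and outcomes whose range strictly contains that mean; it finds the multiplier by bisection, using the fact that the mean of $p_j \propto e^{-\lambda x_j}$ decreases as λ increases. The die example with mean 4.5 and the bracketing interval are illustrative choices.

```python
import math


def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximize -sum p_j ln p_j s.t. sum p_j = 1 and sum p_j x_j = target_mean.

    The maximizer has the form p_j proportional to exp(-lam * x_j); lam is found
    by bisection on [lo, hi], which is assumed to bracket the multiplier.
    """
    if not min(values) < target_mean < max(values):
        raise ValueError("target mean must lie strictly inside the range of values")

    def dist(lam):
        # Shifting by target_mean only rescales the weights, which keeps exp() tame.
        w = [math.exp(-lam * (v - target_mean)) for v in values]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(pj * v for pj, v in zip(dist(lam), values))

    # The mean decreases monotonically in lam (its derivative is minus the variance).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return dist(lam), lam


faces = [1, 2, 3, 4, 5, 6]
p, lam = maxent_with_mean(faces, 4.5)
```

When the prescribed mean is 3.5, the mean of the uniform distribution, the multiplier is zero and the sketch returns the uniform distribution, in line with Laplace's principle.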

Recall that the above discussion was originally motivated by the task of choosing a probability distribution among those that are consistent with some given moments. Now, in addition to the moment constraints, suppose that we have an a priori probability distribution $p^0$ that we think our probability distribution p should be close to. In fact, in the absence of the moment constraints, we would like to choose $p^0$ for p because it is trivially the closest to $p^0$. However, in the presence of moment constraints that $p^0$ does not satisfy, we need a precise definition of 'closeness' or 'deviation'. In other words, we need to define some sort of deviation or, more precisely, 'directed divergence' [15] on the space of discrete probability distributions from which the distribution is chosen. Note that we deliberately avoid calling this measure a 'distance'. This is because a distance measure should be symmetric and should satisfy the triangle inequality, but these two properties are not important in this context. In fact, we can be content with a 'one-way (asymmetric) deviation measure', $D(p, p^0)$, from p to $p^0$. If a one-way deviation measure from p to $p^0$ is not satisfactory, one can consider using a symmetric measure defined as the sum of $D(p, p^0)$ and $D(p^0, p)$. Desirable properties of this 'directed divergence' measure include the following:

1) $D(p, p^0)$ should be nonnegative for all p and $p^0$.

2) $D(p, p^0) = 0$ if and only if $p = p^0$.

3) $D(p, p^0)$ should be a convex function of $p_1, \dots, p_n$.

4) When $D(p, p^0)$ is minimized subject to moment constraints but without the explicit presence of the nonnegativity constraints, the resulting $p_j$'s should be nonnegative.


Property 1 is desirable for any such measure of deviation. If property 2 were not satisfied, then it would be possible to choose a vector p that has a zero directed divergence from p^0, i.e., one that is as 'close' to p^0 as p^0 itself, but differs from p^0. Property 3 makes minimizing the measure much simpler, and property 4 spares us from explicitly considering n nonnegativity constraints. Fortunately, there are many measures that satisfy these properties. We may even be able to find one that satisfies the triangular inequality. But simplicity of the measure is also desirable. The simplest and most important of these measures is the Kullback-Leibler measure ([5, Chapt. 4]), defined as

$$D(p, p^0) = \sum_{j=1}^{n} p_j \ln\frac{p_j}{p_j^0},$$

with the convention that, whenever p_j^0 is 0, p_j is set to 0 and 0 ln(0/0) is defined to be 0. This measure is also known as the cross-entropy, relative entropy, directed divergence or expected weight of evidence of p with respect to p^0. A. Hobson [1] provided an axiomatic characterization of cross-entropy. He interpreted D(p, p^0) as the 'information in p relative to p^0', and showed that the only function I(p, p^0) satisfying the following five properties has the form k Σ_{j=1}^n p_j ln(p_j/p_j^0), where k is a positive constant:

5) I(p, p^0) is a continuous function of p and p^0.

6) I(p, p^0) is permutationally symmetric, i.e., the measure does not change if the pairs (p_j, p_j^0) are permuted among themselves.

7) I(p, p) = 0.

8) For any pair of integers n and n_0 such that n_0 > n > 0, I(1/n, ..., 1/n, 0, ..., 0; 1/n_0, ..., 1/n_0) is an increasing function of n_0 and a decreasing function of n, where I(1/n, ..., 1/n, 0, ..., 0; 1/n_0, ..., 1/n_0) denotes the information obtained when the number of equally likely possibilities is reduced from n_0 to n.

9)

$$I(p_1,\dots,p_n;\,p_1^0,\dots,p_n^0) = I(q_1,q_2;\,q_1^0,q_2^0) + q_1\, I\!\left(\frac{p_1}{q_1},\dots,\frac{p_r}{q_1};\,\frac{p_1^0}{q_1^0},\dots,\frac{p_r^0}{q_1^0}\right) + q_2\, I\!\left(\frac{p_{r+1}}{q_2},\dots,\frac{p_n}{q_2};\,\frac{p_{r+1}^0}{q_2^0},\dots,\frac{p_n^0}{q_2^0}\right),$$

where 1 < r < n, q_1 = p_1 + ... + p_r, q_2 = p_{r+1} + ... + p_n, q_1^0 = p_1^0 + ... + p_r^0, and q_2^0 = p_{r+1}^0 + ... + p_n^0.

Property 8 says, for example, that the information obtained upon reducing the number of equally likely sides on a die from 6 to 3 is greater than the information obtained upon reducing the number from 6 to 4. Property 9 says that one may give information about the outcome associated with the random event either by specifying the probabilities p_1, ..., p_n directly, or by specifying the probabilities q_1 and q_2 first and then specifying the conditional probabilities p_j/q_1 and p_j/q_2.
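As a quick numerical sanity check, offered as a sketch rather than as part of the original treatment, the Kullback-Leibler measure can be computed directly and the grouping property 9 verified (with k = 1) for a split after the first r outcomes. The names `kl_divergence` and `grouping_rhs` are illustrative, and the code assumes the group sums q_1, q_2, q_1^0, q_2^0 are all positive.

```python
import math

def kl_divergence(p, p0):
    """Kullback-Leibler measure D(p, p0) = sum_j p_j ln(p_j / p0_j).

    Terms with p_j = 0 contribute 0; p_j > 0 where p0_j = 0 is rejected,
    matching the convention that p_j is set to 0 whenever p0_j is 0.
    """
    total = 0.0
    for pj, qj in zip(p, p0):
        if pj == 0.0:
            continue
        if qj == 0.0:
            raise ValueError("p_j > 0 where p0_j = 0: divergence is infinite")
        total += pj * math.log(pj / qj)
    return total

def grouping_rhs(p, p0, r):
    """Right-hand side of property 9 for the split {1..r} | {r+1..n}.

    Assumes the group sums q1, q2, q1^0, q2^0 are all positive.
    """
    q1, q2 = sum(p[:r]), sum(p[r:])
    q10, q20 = sum(p0[:r]), sum(p0[r:])
    head = kl_divergence([x / q1 for x in p[:r]], [x / q10 for x in p0[:r]])
    tail = kl_divergence([x / q2 for x in p[r:]], [x / q20 for x in p0[r:]])
    return kl_divergence([q1, q2], [q10, q20]) + q1 * head + q2 * tail

p = [0.1, 0.2, 0.3, 0.4]
p0 = [0.25, 0.25, 0.25, 0.25]
print(kl_divergence(p, p0))                                # positive, since p != p0
print(abs(kl_divergence(p, p0) - grouping_rhs(p, p0, 2)))  # rounding-level difference
```

Any split point r gives the same total, which is exactly the "two ways of specifying the outcome" reading of property 9.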

In addition to the nine properties discussed above, we state the following desirable properties for cross-entropy:

10) D(p, p^0) is convex in both p and p^0.

11) D(p, p^0) is not symmetric in general.

12) If p and q are independent and r and s are also independent, then D(p * q, r * s) = D(p, r) + D(q, s), where p * q denotes the joint (product) distribution of the independent pair p and q, and similarly for r * s.

13) In general, the triangular inequality does not hold. But if distribution p minimizes D(p, p^0) subject to some moment constraints and q is any other distribution that satisfies those same constraints, then D(q, p^0) = D(q, p) + D(p, p^0). Thus, in this special case, the triangular inequality holds, but as an equality.

Kullback and Leibler's cross-entropy was also originally defined for probability distributions with a finite sample space and can be interpreted as a measure of deviation of one probability distribution from another. It has subsequently been extended to distributions defined on countably infinite and continuous sample spaces. The corresponding forms become Σ_{j=1}^∞ p_j ln(p_j/p_j^0) and ∫ p(x) ln(p(x)/p^0(x)) dx, respectively. It has also been derived rigorously as the unique measure of deviation of one probability distribution from another that satisfies a set of axioms considered necessary for any reasonable measure of deviation, for both finite probability distributions [11] and continuous distributions [14]. Cross-entropy for probability distributions with a countably infinite sample space can be viewed, and has been used, as a measure of deviation, although the justification is not as strong as for its finite-sample-space and continuous counterparts.

With cross-entropy interpreted as a measure of 'deviation', the Kullback-Leibler principle of minimum cross-entropy, or MinxEnt, can be stated as follows [15]:


Out of all possible distributions that are consis- 
tent with the moment constraints, choose the 
one that minimizes the cross-entropy with re- 
spect to the given a priori distribution. 


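Mathematically, we consider the following optimization problem:

$$\begin{aligned}
\min\ & H_2(p) = \sum_{j=1}^{n} p_j \ln\frac{p_j}{p_j^0}\\
\text{s.t.}\ & \sum_{j=1}^{n} p_j\, g_i(x_j) = a_i, \quad i = 1,\dots,m,\\
& \sum_{j=1}^{n} p_j = 1,\\
& p_j \ge 0, \quad j = 1,\dots,n.
\end{aligned}$$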


Note that the nonnegativity constraints are not bind- 
ing, for the same reason as in the MaxEnt problem. For 
a detailed discussion of the properties of MinxEnt, the 
reader is referred to [23]. 
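For a single moment constraint, a minimal sketch of solving MinxEnt is shown below. It relies on the standard Lagrangian result (not derived in this article) that the minimizer has the exponential form p_j ∝ p_j^0 exp(λ g(x_j)), which is strictly positive whenever p^0 is, and it finds the multiplier λ by bisection. The final lines check the equality case of property 13 for another feasible distribution q. The die example, the bracket [-50, 50], and all function names are hypothetical choices for illustration.

```python
import math

def kl_divergence(p, p0):
    """D(p, p0) = sum_j p_j ln(p_j / p0_j), with 0 ln(0 / .) taken as 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, p0) if pj > 0)

def tilt(p0, g, lam):
    """Exponentially tilted distribution: p_j proportional to p0_j * exp(lam * g_j)."""
    w = [q * math.exp(lam * gj) for q, gj in zip(p0, g)]
    z = sum(w)
    return [wj / z for wj in w]

def minxent_one_moment(p0, g, a, lo=-50.0, hi=50.0, iters=200):
    """Minimize D(p, p0) s.t. sum_j p_j g_j = a and sum_j p_j = 1.

    Bisection on the multiplier lam: the tilted mean is nondecreasing in lam.
    Assumes min(g) < a < max(g) and that the root lies in [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(pj * gj for pj, gj in zip(tilt(p0, g, mid), g))
        if mean < a:
            lo = mid
        else:
            hi = mid
    return tilt(p0, g, 0.5 * (lo + hi))

# Hypothetical example: a fair die as the a priori distribution, mean constrained to 4.5.
faces = [1, 2, 3, 4, 5, 6]
prior = [1 / 6] * 6
p = minxent_one_moment(prior, faces, 4.5)

# Another distribution satisfying the same constraints (mean 4.5).
q = [0.0, 0.0, 0.0, 0.5, 0.5, 0.0]
lhs = kl_divergence(q, prior)
rhs = kl_divergence(q, p) + kl_divergence(p, prior)
print(abs(lhs - rhs) < 1e-9)   # property 13: equality holds
```

Because every p_j comes out positive, the nonnegativity constraints are indeed never active in this sketch.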

Note that, if there is no a priori information, then one may use the uniform distribution, denoted by u, as the a priori distribution. In this case,

$$D(p, p^0) = D(p, u) = \sum_{j=1}^{n} p_j \ln\frac{p_j}{1/n} = \ln n + \sum_{j=1}^{n} p_j \ln p_j.$$

Since minimizing Σ_{j=1}^n p_j ln p_j is equivalent to maximizing -Σ_{j=1}^n p_j ln p_j, minimizing the cross-entropy with respect to the uniform distribution is equivalent to maximizing entropy and, therefore, MaxEnt is a special case of MinxEnt. These two principles can now be combined into a general principle:


Out of all probability distributions satisfying the 
given moment constraints, choose the distribu- 
tion that minimizes the cross-entropy with re- 
spect to the given a priori distribution and, in the 
absence of it, choose the distribution that mini- 
mizes the cross-entropy with respect to the uni- 
form distribution. 
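The identity D(p, u) = ln n - H(p), with H(p) = -Σ_j p_j ln p_j the entropy, is what ties MaxEnt to MinxEnt. The short sketch below (illustrative function names) checks it numerically for one distribution.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_j p_j ln p_j, with 0 ln 0 taken as 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)

def cross_entropy_to_uniform(p):
    """D(p, u) for the uniform prior u_j = 1/n on n = len(p) outcomes."""
    n = len(p)
    return sum(pj * math.log(pj * n) for pj in p if pj > 0)

p = [0.5, 0.25, 0.125, 0.125]
print(cross_entropy_to_uniform(p), math.log(len(p)) - entropy(p))  # equal up to rounding
```

Since ln n is a constant, lowering D(p, u) and raising H(p) are the same objective.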


Both the MaxEnt and MinxEnt principles for select- 
ing finite-sample-space probability distributions and 
the MinxEnt principle for selecting continuous prob- 
ability distributions can be axiomatically derived [22]. 
Under four consistency axioms, it was shown that the 
two principles are uniquely correct methods for induc- 
tive inference when new information is given in the 
form of expected values. Many well-known and widely 
used distributions, including the normal, gamma and 
geometric distributions, can actually be derived as so- 
lutions to some MaxEnt or MinxEnt problems [15]. 

The maximum entropy principle has also been 
shown to be a dual principle of the maximum likelihood 
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principle for the exponential family of probability distri- 
butions in the sense that a dual problem to the linearly 
constrained entropy maximization problem is equiva- 
lent to the problem of maximizing a likelihood function 
with respect to the parameters of an exponential family 
[2]. This principle has also been shown to be related to 
the Bayesian parameter estimation problem [7]. Duality 
theory and major mathematical algorithms for solving 
finite-dimensional MaxEnt or MinxEnt problems can 
be found in [7] and the references therein. 


See also 
An equality-constrained nonlinear programming problem may be posed in the form

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0, \qquad (1)$$

where f is a real-valued nonlinear function and c is an m-vector of real-valued nonlinear functions with ith component c_i(x), i = 1, ..., m. Normally, the term equality-constrained nonlinear programming problem refers to a problem of the form (1) where f and c are sufficiently smooth, at least continuously differentiable. This will be assumed throughout this discussion, with the gradient of f(x) denoted by g(x) and the m × n Jacobian of c(x) denoted by J(x).

Of fundamental importance for equality-constrained optimization problems are the first order necessary optimality conditions. These conditions are often referred to as the KKT necessary optimality conditions, or more briefly, the KKT conditions. The KKT conditions state that if x* is a local minimizer of (1) that satisfies a certain constraint qualification, then there exists an m-dimensional vector λ* such that

$$g(x^*) - J(x^*)^T \lambda^* = 0, \qquad c(x^*) = 0.$$

The vector λ* is usually referred to as the vector of Lagrange multipliers. For equality-constrained problems, the KKT conditions are attributed to J.L. Lagrange, and are hence 'classical'. The acronym KKT arises from the more general results on inequality-constrained problems provided by W. Karush [3], H.W. Kuhn and A.W. Tucker [4,5].

For an equality-constrained problem, the KKT conditions state that x* must be feasible, i.e., c(x*) = 0, and that the gradient must have zero projection onto the null space of the constraint gradients, i.e., there exists a λ* such that g(x*) = J(x*)^T λ*. In the case of linear equality constraints, i.e., c(x) = Ax - b for some (m × n)-matrix A and m-vector b, it follows that if x* is feasible, then x* + p is feasible if and only if Ap = 0. Hence, in this situation, if x* is a local minimizer, it must hold that g(x*)^T p = 0 for all p such that Ap = 0 (both p and -p are feasible directions). But this is equivalent to the existence of a λ* such that g(x*) = A^T λ*. Consequently, in the case of linear constraints, the KKT conditions are necessary for x* to be a local minimizer of problem (1). Constraint qualifications essentially ensure that the linearization of c at x* provided by J(x*) adequately describes c in a neighborhood of x*. A frequently used constraint qualification is that J(x*) has rank m, i.e., that the gradients of the constraints are linearly independent at x*. The related Fritz John necessary optimality conditions are valid without any constraint qualification.
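With linear equality constraints and a quadratic objective, the KKT conditions become a single linear system. The sketch below assumes the hypothetical test problem f(x) = (1/2) x^T H x + c^T x, so that g(x) = Hx + c, and uses NumPy to solve [[H, -A^T], [A, 0]] [x; λ] = [-c; b]; it assumes this KKT matrix is nonsingular.

```python
import numpy as np

def solve_eq_qp_kkt(H, c, A, b):
    """Solve the KKT system of min 0.5 x^T H x + c^T x  s.t.  A x = b.

    With g(x) = H x + c, the conditions g(x) - A^T lam = 0 and A x = b
    form the linear system [[H, -A^T], [A, 0]] [x; lam] = [-c; b].
    """
    n, m = H.shape[0], A.shape[0]
    kkt = np.block([[H, -A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]

H = np.array([[2.0, 0.0], [0.0, 2.0]])   # f(x) = x1^2 + x2^2
c = np.zeros(2)
A = np.array([[1.0, 1.0]])               # constraint x1 + x2 = 1
b = np.array([1.0])
x, lam = solve_eq_qp_kkt(H, c, A, b)
print(x, lam)   # expected: x approximately (0.5, 0.5), lam approximately 1.0
```

Since this f is convex and the constraint is affine, the computed KKT point is also a global minimizer, in line with the sufficiency remark in the next paragraph.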

The KKT conditions are of fundamental importance not only from a theoretical point of view, but also because algorithms for solving equality-constrained nonlinear programming problems are often based on finding a solution to the KKT conditions. In general, the KKT conditions are not sufficient for x* to be a local minimizer, and second order optimality conditions need to be considered. However, if c is affine and f is a convex function on the feasible region, then the KKT conditions are sufficient for x* to be a global minimizer. Detailed discussions on optimality conditions can be found in textbooks on nonlinear programming, e.g., [1,2,6].

As a simple example, consider the two-dimensional problem where f(x) = x_1 and c(x) = (x_1^2 + x_2^2 - 1)/2. Stationarity gives 1 - λx_1 = 0 and λx_2 = 0, so x_2 = 0 and, by feasibility, x_1 = ±1. Hence the KKT conditions have two solutions: x^(1) = (1, 0)^T together with λ^(1) = 1, and x^(2) = (-1, 0)^T together with λ^(2) = -1. However, only x^(2) is a local minimizer (and in fact also a global minimizer).
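Since the KKT conditions alone do not separate the two KKT points (1, 0)^T and (-1, 0)^T, a second-order check is needed. The sketch below applies the standard second-order sufficient condition, which this article does not state: at a KKT point, if the Hessian of the Lagrangian L(x, λ) = f(x) - λ^T c(x), restricted to the null space of J(x), is positive definite, the point is a strict local minimizer (negative definite indicates a strict local maximizer). For this example the Hessian of L is -λI. Function names are illustrative.

```python
import numpy as np

# Example problem: f(x) = x1, c(x) = (x1^2 + x2^2 - 1) / 2.
def grad_f(x):
    return np.array([1.0, 0.0])

def jac_c(x):
    return np.array([[x[0], x[1]]])        # 1 x 2 Jacobian of c

def hess_lagrangian(x, lam):
    # Hessian of f is 0 and Hessian of c is I, so Hessian of L is -lam * I.
    return -lam * np.eye(2)

def classify_kkt_point(x, lam):
    """Return 'min', 'max' or 'inconclusive' from the reduced Hessian Z^T W Z."""
    J = jac_c(x)
    assert np.allclose(grad_f(x) - J.T @ np.array([lam]), 0.0)  # stationarity
    assert np.isclose(0.5 * (x @ x - 1.0), 0.0)                  # feasibility
    _, _, vt = np.linalg.svd(J)
    Z = vt[J.shape[0]:].T                  # orthonormal basis of the null space of J
    eig = np.linalg.eigvalsh(Z.T @ hess_lagrangian(x, lam) @ Z)
    if np.all(eig > 0):
        return "min"
    if np.all(eig < 0):
        return "max"
    return "inconclusive"

print(classify_kkt_point(np.array([1.0, 0.0]), 1.0))    # max
print(classify_kkt_point(np.array([-1.0, 0.0]), -1.0))  # min
```

The point (1, 0)^T satisfies the KKT conditions but is in fact the maximizer of x_1 on the unit circle.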
Many complex systems in which agents compete for
scarce resources on a network, be it a physical one, as 
in the case of congested urban transportation systems, 
or an abstract one, as in the case of certain economic 
and financial problems, can be formulated and stud- 
ied as network equilibrium problems. Applications of 
network equilibrium problems are common in many 


disciplines, in particular, in operations research and 
management science and in economics and engineer- 
ing (cf. [10,17]). 

Network equilibrium problems, as opposed to network optimization problems, involve competition among the agents or users of the network system. Moreover, network equilibrium problems are governed by an underlying behavioral principle concerning the behavior of the agents, as well as by the equilibrium conditions. For example, in congested urban transportation systems in which users seek to determine their cost-minimizing routes of travel, the equilibrium conditions, due to J.G. Wardrop [23] (see also [2] and [8]), state that, in equilibrium, all used paths connecting an origin/destination pair will have minimal and equal user travel costs. On the other hand, in the case of spatial price equilibrium patterns, one seeks to determine the commodity production, trade, and consumption pattern satisfying the equilibrium condition, due to S. Enke [9] and P.A. Samuelson [20], which states that there will be trade between a pair of spatially separated supply and demand markets provided that the supply price of the commodity at the supply market plus the unit cost of transportation associated with shipping the commodity is equal to the demand price of the commodity at the demand market; if the supply price plus the transportation cost exceeds the demand price, then there will be no trade between this pair of supply and demand markets.

M.J. Beckmann, C.B. McGuire, and C.B. Winsten 
[2] initiated the systematic study of network equilib- 
rium problems in the general setting of traffic networks 
and demonstrated that the equilibrium flow pattern sat- 
isfying the traffic network equilibrium conditions (see 
also [23]), under certain symmetry assumptions on the 
underlying functions, could be reformulated as the so- 
lution to an optimization problem. Samuelson [20], fol- 
lowing [9], had made a similar connection but in the 
more specialized context of spatial price equilibrium 
problems on networks that were bipartite. 

M.J. Smith [22] later proposed an alternative formulation of the traffic network equilibrium conditions, which S.C. Dafermos [3] then identified as a finite-dimensional variational inequality problem. This connection allowed for the relaxation of the symmetry assumption and, consequently, for the construction of more realistic models (cf. [17,21], and the references therein).
Other network equilibrium applications whose study and understanding have benefited from this methodology (cf. [10,14,17,19]) include: spatial price equilibrium problems (see, e.g., [11,15]), oligopolistic market equilibrium problems ([7,12,13]), migration equilibrium problems (cf. [16,18]), and general economic equilibrium problems (cf. [5]).

Here we present two examples of network equilibrium problems for illustrative purposes. The first is a multimodal/multiclass transportation network equilibrium problem in which the network is a physical one, whereas the second is a multiclass migration equilibrium problem which is isomorphic to a specially structured multiclass traffic network equilibrium problem.

Additional background, models and applications, 
qualitative results, as well as computational procedures 
and references can be found in [17] and [10]. 


A Multimodal Traffic Network Equilibrium Model 


We now present a multimodal traffic network equilib- 
rium model (cf. [3,4,6]). The model is a fixed demand 
model in that the demands associated with traveling be- 
tween the origin/destination pairs are assumed known. 
See [17] for additional background, as well as elastic 
demand traffic network equilibrium models and other 
network equilibrium problems. 

Consider a general network G = [N, A], where N denotes the set of nodes and A the set of directed links. Let a, b, c, ... denote the links, and p, q, ... the paths. Assume that there are J origin/destination (O/D) pairs, with a typical O/D pair denoted by w, and n modes of transportation on the network, with typical modes denoted by i, j, ....

The flow on a link a generated by mode i is denoted by f_a^i, and the user cost associated with traveling by mode i on link a is denoted by c_a^i. Group the link flows into a column vector f ∈ R^{nL}, where L is the number of links in the network. Group the link costs into a row vector c ∈ R^{nL}. Assume that the user cost on a link and a particular mode may, in general, depend upon the flows of every mode on every link in the network, that is,

$$c = c(f),$$

where c is a known smooth function.


The travel demand of users of mode i traveling between O/D pair w is denoted by d_w^i, and the travel disutility associated with traveling between this O/D pair using the mode is denoted by λ_w^i. Group the demands into a vector d ∈ R^{nJ}.

The flow on path p due to mode i is denoted by x_p^i. Group the path flows into a column vector x ∈ R^{nQ}, where Q denotes the number of paths in the network.

The conservation of flow equations are as follows. The demand for a mode and O/D pair must be equal to the sum of the flows of the mode on the paths joining the O/D pair, that is,

$$d_w^i = \sum_{p \in P_w} x_p^i, \quad \forall i,\ \forall w,$$

where P_w denotes the set of paths connecting w.

A nonnegative path flow vector x which satisfies the demand constraint is termed feasible. Moreover, we must have that

$$f_a^i = \sum_{p} x_p^i\, \delta_{ap}, \quad \forall i,\ \forall a,$$

where δ_ap equals 1 if link a is contained in path p and 0 otherwise; that is, for each mode, the link load associated with a mode is equal to the sum of the path flows of that mode on paths that utilize that link.

A user traveling on path p using mode i incurs a user (or personal) travel cost C_p^i satisfying

$$C_p^i = \sum_{a} c_a^i\, \delta_{ap};$$

in other words, the cost on a path p due to mode i is equal to the sum of the link costs of links comprising that path and using that mode.

The traffic network equilibrium conditions are given below.

Definition 1 (multimodal traffic network equilibrium) ([2,3,4]) A link load pattern f* satisfying the feasibility conditions is an equilibrium pattern if, once established, no user has any incentive to alter his travel arrangements. This state is characterized by the following equilibrium conditions, which must hold for every mode i, every O/D pair w, and every path p ∈ P_w:

$$C_p^{i*} \begin{cases} = \lambda_w^i, & \text{if } x_p^{i*} > 0,\\ \ge \lambda_w^i, & \text{if } x_p^{i*} = 0, \end{cases}$$

where λ_w^i is the equilibrium travel disutility associated with the O/D pair w and mode i.
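To make Definition 1 concrete, consider the simplest hypothetical instance: one mode, one O/D pair with fixed demand d, and two parallel single-link paths, so that path costs equal link costs and each depends only on its own flow (a separable special case of c = c(f)). The sketch below finds the equilibrium split by bisection and then checks the conditions of Definition 1, taking λ_w as the smallest path cost. The cost functions and numbers are made up for illustration.

```python
def two_path_equilibrium(c1, c2, d, tol=1e-12, max_iter=200):
    """Split a fixed demand d between two parallel paths (one mode, one O/D pair).

    Assumes c1 and c2 are continuous and nondecreasing in their own path flow,
    so phi(x) = c1(x) - c2(d - x) is nondecreasing on [0, d].
    """
    def phi(x):
        return c1(x) - c2(d - x)

    if phi(0.0) >= 0.0:      # path 1 is not cheaper even when empty
        return 0.0, d
    if phi(d) <= 0.0:        # path 2 is not cheaper even when empty
        return d, 0.0
    lo, hi = 0.0, d
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x1 = 0.5 * (lo + hi)
    return x1, d - x1

def check_equilibrium(flows, costs, tol=1e-6):
    """Definition 1 for one mode and one O/D pair: used paths cost lambda, unused >= lambda."""
    lam = min(costs)
    return all(abs(c - lam) <= tol if x > tol else c >= lam - tol
               for x, c in zip(flows, costs))

c1 = lambda f: 10.0 + f          # hypothetical link cost functions
c2 = lambda f: 15.0 + 0.5 * f
x1, x2 = two_path_equilibrium(c1, c2, 20.0)
print(x1, x2, check_equilibrium([x1, x2], [c1(x1), c2(x2)]))  # about 10, 10, True
```

With these costs both paths are used and each costs 20 at equilibrium; if one path were much more expensive, the whole demand would go to the other and the unused path's cost would exceed λ_w.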


We now define the feasible set K as

$$K \equiv \{ f : \exists\, x \ge 0 \text{ such that the demand constraints and the link load constraints hold} \}.$$

One can verify (see [3]) that the variational inequality governing the equilibrium conditions for this model is given in the following theorem.

Theorem 2 (variational inequality formulation) A vector f* ∈ K is an equilibrium pattern if and only if it satisfies the variational inequality problem

$$\langle c(f^*), f - f^* \rangle \ge 0, \quad \forall f \in K.$$

Note that this variational inequality is in link loads. One can also derive a variational inequality problem in path flows (see also [1,4,17]). Existence of an equilibrium f* follows from the standard theory of variational inequalities (cf. [14]) solely from the assumption that c is continuous, since the feasible set K is compact.

In the special case where the symmetry condition

$$\frac{\partial c_a^i}{\partial f_b^j} = \frac{\partial c_b^j}{\partial f_a^i}, \quad \forall i, j,\ \forall a, b,$$

holds, the variational inequality problem can be reformulated as an optimization problem. This symmetry assumption, however, is not expected to hold in most applications. Consequently, the variational inequality problem, which is the more general problem formulation, is needed. Intuitively, the symmetry condition says that the flow on link b due to mode j should affect the cost of mode i on link a in the same manner that the flow of mode i on link a affects the cost on link b and mode j. In the case of a single mode problem, the symmetry condition would imply that the cost on link a is affected by the flow on link b in the same manner as the cost on link b is affected by the flow on link a.
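Whether a given cost map satisfies the symmetry condition can be probed numerically. The sketch below flattens all (link, mode) flows into one vector, estimates the Jacobian of c by central differences, and compares it with its transpose. The linear cost maps are hypothetical examples. A check at a single point f can only refute symmetry; establishing it requires the condition to hold for every f.

```python
import numpy as np

def numerical_jacobian(cost, f, step=1e-6):
    """Central-difference Jacobian of a cost map R^N -> R^N at the flow vector f."""
    N = f.size
    jac = np.zeros((N, N))
    for k in range(N):
        e = np.zeros(N)
        e[k] = step
        jac[:, k] = (cost(f + e) - cost(f - e)) / (2.0 * step)
    return jac

def is_symmetric_cost(cost, f, tol=1e-5):
    """Test d c_a^i / d f_b^j == d c_b^j / d f_a^i at f (flows flattened to one vector)."""
    jac = numerical_jacobian(cost, f)
    return np.allclose(jac, jac.T, atol=tol)

G_sym = np.array([[2.0, 1.0], [1.0, 3.0]])    # interactions act the same both ways
G_asym = np.array([[2.0, 1.0], [0.5, 3.0]])   # flow 2 affects cost 1 more than vice versa
b = np.array([1.0, 1.0])
f = np.array([0.3, 0.7])
print(is_symmetric_cost(lambda x: G_sym @ x + b, f))   # True
print(is_symmetric_cost(lambda x: G_asym @ x + b, f))  # False
```

For linear costs c(f) = Gf + b the test reduces to asking whether G is a symmetric matrix.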


A Migration Network Equilibrium Model 


Human migration is a topic that has been studied not only by economists, but also by demographers, sociologists, and geographers. Here a model of human migration is described which is shown to have a simple, abstract network structure in which the links correspond to locations and the flows on the links to populations of a particular class at the particular location. Hence, the model is isomorphic to the traffic network equilibrium problem just described on a network with special structure. For additional details, see [16,17,18].

Assume a closed economy in which there are n locations, typically denoted by i, and J classes, typically denoted by k. Assume further that the attractiveness of any location i as perceived by class k is represented by a utility u_i^k. Let p̄^k denote the fixed and known population of class k in the economy, and let p_i^k denote the population of class k at location i. Group the utilities into a row vector u ∈ R^{Jn} and the populations into a column vector p ∈ R^{Jn}. Assume no births and no deaths in the economy.

The conservation of flow equation for each class k is given by

$$\bar{p}^k = \sum_{i=1}^{n} p_i^k,$$

where p_i^k ≥ 0, k = 1, ..., J; i = 1, ..., n. Let

$$K \equiv \{ p : p \ge 0 \text{ and } p \text{ satisfies the conservation of flow equation} \}.$$

The conservation of flow equation expresses that the population of each class k must be conserved in the economy.


Definition 3 (migration equilibrium) Assume that the migrants are rational and that migration will continue until no individual of any class has any incentive to move, since a unilateral decision will no longer yield an increase in utility. Mathematically, a multiclass population vector p* ∈ K is said to be in equilibrium if, for each class k, k = 1, ..., J:

$$u_i^k(p^*) \begin{cases} = \lambda^k, & \text{if } p_i^{k*} > 0,\\ \le \lambda^k, & \text{if } p_i^{k*} = 0, \end{cases}$$

where λ^k is the common (maximal) utility attained by class k.

The equilibrium conditions express that, for a given class k, only those locations i with maximal utility will have a positive population volume of the class. Moreover, the utilities for a given class are equilibrated across the locations.
Equilibrium Networks

[Equilibrium Networks, Figure 1: Network equilibrium formulation of a multiclass migration equilibrium model]


We now discuss the utility functions. Assume that, 
in general, the utility associated with a particular lo- 
cation as perceived by a particular class, may depend 
upon the population associated with every class and ev- 
ery location, that is, assume that 


u = u(p). 


Note that in allowing the utility to depend upon the 
populations of the classes, we are using populations as 
a proxy for amenities associated with a particular loca- 
tion. Such a utility function can also model the nega- 
tive externalities associated with overpopulation, such 
as congestion, increased crime, competition for scarce 
resources, etc. 

As illustrated in [17], the above migration model is equivalent to a network equilibrium model with a single origin/destination pair and fixed demands. Indeed, one can make the identification as follows. Construct a network consisting of two nodes, an origin node 0 and a destination node 1, and n links connecting the origin node to the destination node. Associate with each link i the J costs -u_i^1, ..., -u_i^J, and the link flows p_i^1, ..., p_i^J. This model is, hence, equivalent to a multimodal traffic network equilibrium model with fixed demand for each mode, consisting of a single origin/destination pair and n paths connecting the O/D pair. Note that one can make J copies of the network, in which case each kth copy will correspond to class k, with the cost functions on the links defined accordingly. This identification enables us to immediately write down the following:


Theorem 4 (variational inequality formulation) A population pattern p* ∈ K is in equilibrium if and only if it satisfies the variational inequality problem:

$$\langle -u(p^*), p - p^* \rangle \ge 0, \quad \forall p \in K.$$

Existence of an equilibrium then follows from the standard theory of variational inequalities, since the feasible set K is compact, assuming that the utility functions are continuous. Uniqueness of the equilibrium population pattern also follows from the standard theory provided that the function -u is strictly monotone. In the context of applications, this monotonicity condition implies that the utility associated with a given class and location is expected to be a decreasing function of the population of that class at that location.
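For a single class (J = 1), K is the set of nonnegative population vectors summing to the fixed total, and the variational inequality of Theorem 4 can be approached with a basic projection method, p ← P_K(p + α u(p)). This sketch rests on assumptions not made in the article: hypothetical utilities u_i(p) = a_i - b_i p_i with b_i > 0 (so -u is strictly monotone) and a step size α ≤ 1/max_i b_i, which makes the iteration contract. The projection onto K uses the standard sort-based simplex projection.

```python
import numpy as np

def project_simplex(v, total):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = total}, total > 0.

    Standard sort-based method: find the shift theta so that
    max(v - theta, 0) sums to total.
    """
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - total
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def migration_equilibrium(utility, population, n, alpha, iters=10_000, tol=1e-12):
    """Projection method p <- P_K(p + alpha * u(p)) for <-u(p*), p - p*> >= 0 on K.

    Single class; K = {p >= 0, sum(p) = population}. Convergence assumes -u is
    strongly monotone and Lipschitz and alpha is small enough.
    """
    p = np.full(n, population / n)
    for _ in range(iters):
        p_new = project_simplex(p + alpha * utility(p), population)
        if np.linalg.norm(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Hypothetical utilities u_i(p) = a_i - b_i * p_i, decreasing in the local population.
a = np.array([10.0, 8.0, 6.0])
b = np.array([1.0, 1.0, 2.0])
p_star = migration_equilibrium(lambda p: a - b * p, population=4.0, n=3, alpha=0.5)
print(p_star, a - b * p_star)   # populations close to (3, 1, 0), utilities (7, 7, 6)
```

In this example location 3 ends up empty: its utility 6 at zero population is below the common utility 7 of the two occupied locations, as Definition 3 requires.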
Keywords
Complementarity theory is a new domain of applied mathematics strongly related to Linear Analysis, Nonlinear Analysis, Topology, Variational Inequality Theory, Ordered Topological Vector Spaces, Numerical Analysis, etc. The main goal of this theory is the study of complementarity problems. It is well known that complementarity problems encompass a variety of practical problems arising in optimization, structural mechanics, elasticity, economics, etc. [8]. The relation between the general nonlinear complementarity problem and the fixed point problem appears to be remarkable. The main aim of this article is the study of this relation.




Equivalence Between Nonlinear Complementarity Problem and Fixed Point Problem 


Preliminaries 


Let E, E* be a pair of real locally convex spaces. The space E* can be the topological dual of E. Let ⟨·,·⟩ be a bilinear form on E × E* satisfying the separation axioms:

s1) ⟨x_0, y⟩ = 0 for all y ∈ E* implies x_0 = 0;
s2) ⟨x, y_0⟩ = 0 for all x ∈ E implies y_0 = 0.


The triplet (E, E*, ⟨·,·⟩) is called a dual system or a duality (denoted by ⟨E, E*⟩). In practical problems, the space E can be a Banach space, E* its topological dual, and ⟨x, y⟩ = y(x) for all x ∈ E and y ∈ E*. When E is a Hilbert space (H, ⟨·,·⟩) or the Euclidean space (R^n, ⟨·,·⟩), H* (respectively, (R^n)*) is isomorphic to H (respectively, to R^n). Let ⟨E, E*⟩ be a dual system of locally convex spaces. Denote by K a pointed convex cone in E, i.e., a subset of E satisfying the following properties:

1) K + K ⊆ K;

2) λK ⊆ K for all λ ∈ R_+ (the set of nonnegative real numbers); and

3) K ∩ (-K) = {0}.

The closed convex cone

$$K^* = \{ y \in E^* : \langle x, y \rangle \ge 0 \text{ for all } x \in K \}$$

is called the dual of K. The polar of K is K° = -K*. Given the pointed convex cone K ⊂ E, we denote by ≤ the ordering defined on E by K, i.e., x ≤ y if and only if y - x ∈ K. In some situations, E is a vector lattice with respect to this ordering, i.e., for every pair x, y ∈ E there exist inf(x, y) (denoted by x ∧ y) and sup(x, y) (denoted by x ∨ y). We say that the bilinear form ⟨·,·⟩ is K-local if ⟨x, y⟩ = 0 whenever x, y ∈ K and x ∧ y = 0.

Let (H, ⟨·,·⟩) be a Hilbert space and K ⊂ H a closed pointed convex cone. It is known that the projection operator onto K, denoted by P_K, is well defined [20] and, for every x ∈ H, P_K(x) is the unique element of K satisfying ||x - P_K(x)|| = min_{y∈K} ||x - y||.


Theorem 1 For every x ∈ H, P_K(x) is characterized by the following properties:

1) ⟨P_K(x) - x, y⟩ ≥ 0 for all y ∈ K;

2) ⟨P_K(x) - x, P_K(x)⟩ = 0.

Proof A proof of this theorem is in [20]. □


Also very useful is the following classical theorem of Moreau:

Theorem 2 If K ⊂ H is a closed convex cone and x, y, z ∈ H, then the following statements are equivalent:

i) z = x + y, x ∈ K, y ∈ K° and ⟨x, y⟩ = 0;

ii) x = P_K(z) and y = P_{K°}(z).

Proof For the proof the reader is referred to [16]. □
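In R^n with K = R^n_+ (which is self-dual, so K* = K and K° = R^n_-), both projections act componentwise, and Theorems 1 and 2 can be checked directly. The sketch below uses NumPy with random test vectors; the function names are illustrative.

```python
import numpy as np

def proj_orthant(z):
    """Projection onto K = R^n_+ (componentwise positive part)."""
    return np.maximum(z, 0.0)

def proj_polar_orthant(z):
    """Projection onto the polar cone K° = -K* = R^n_- (componentwise negative part)."""
    return np.minimum(z, 0.0)

rng = np.random.default_rng(0)
z = rng.normal(size=5)
x, y = proj_orthant(z), proj_polar_orthant(z)

# Theorem 2 (Moreau): z = x + y with x in K, y in K°, and <x, y> = 0.
assert np.allclose(x + y, z) and np.all(x >= 0) and np.all(y <= 0)
assert np.isclose(x @ y, 0.0)

# Theorem 1: <P_K(z) - z, w> >= 0 for every w in K, and <P_K(z) - z, P_K(z)> = 0.
w = np.abs(rng.normal(size=5))
assert (x - z) @ w >= 0 and np.isclose((x - z) @ x, 0.0)
```

Here P_K(z) - z is the negative part of z, which is why it pairs nonnegatively with every element of K and orthogonally with P_K(z).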


We say that the closed pointed convex cone K ⊂ H is an isotone projection cone if and only if, for every x, y ∈ H such that y - x ∈ K, we have P_K(y) - P_K(x) ∈ K. This remarkable class of cones has been studied in several papers (see, for example, [13]). We say that a closed pointed convex cone K ⊂ H is a Galerkin cone if there exists a family of convex subcones {K_n}_{n∈N} of K such that:

1) K_n is a locally compact cone, for every n ∈ N;

2) if n ≤ m, then K_n ⊆ K_m;

3) K = ∪_{n∈N} K_n.

We denote a Galerkin cone by K(K_n)_{n∈N}. For more information about the application of Galerkin cones in complementarity theory, we refer to the papers [7,8,10,11,12,13] and [14].


Nonlinear Complementarity Problem 


Let ⟨E, E*⟩ be a dual system of locally convex spaces and K ⊂ E a pointed convex cone. Given the mapping f: K → E*, the nonlinear complementarity problem associated with f and K is:

$$\text{NLCP}(f, K):\quad \text{find } x_0 \in K \text{ such that } f(x_0) \in K^* \text{ and } \langle x_0, f(x_0) \rangle = 0.$$

Given two mappings f: K → E* and g: K → E, the implicit complementarity problem is:

$$\text{ICP}(f, g, K):\quad \text{find } x_0 \in K \text{ such that } g(x_0) \in K,\ f(x_0) \in K^* \text{ and } \langle g(x_0), f(x_0) \rangle = 0.$$

The problem NLCP(f, K) is important in optimization, economics, mechanics, engineering, game theory, etc. [8]. The problem ICP(f, g, K) was defined in relation to the study of some problems in stochastic optimal control [8]. The problems NLCP(f, K) and ICP(f, g, K) can be solvable or unsolvable.
Solvability by Fixed Point Theorems

Given a topological space X and a mapping f: X → X, the fixed point problem is to know under what conditions there exists a point x_* ∈ X such that f(x_*) = x_*. This problem is studied in Fixed Point Theory, which is a very popular domain in Nonlinear Analysis. In particular, Fixed Point Theory has been used by several authors in the study of the solvability of the problem NLCP(f, K). The results obtained in this sense are based on some equivalences between NLCP(f, K) and the fixed point problem. Let (H, ⟨·,·⟩) be a Hilbert space, K ⊂ H a pointed closed convex cone and f: K → H a mapping.


Theorem 3 The element x_* ∈ K is a solution of the problem NLCP(f, K) if and only if x_* is a fixed point in K of the mapping T(x) = P_K(x - f(x)).

Proof Suppose that x_* ∈ K is a solution of the problem NLCP(f, K). Take x = x_* - f(x_*), so that x_* - x = f(x_*). Then property 1) of Theorem 1, with x_* in place of P_K(x), reads f(x_*) ∈ K*, and property 2) reads ⟨f(x_*), x_*⟩ = 0; both hold, so x_* = P_K(x_* - f(x_*)).

Conversely, if x_* ∈ K and x_* = P_K(x_* - f(x_*)), then, since P_K(x_* - f(x_*)) satisfies properties 1) and 2) of Theorem 1, we deduce that x_* is a solution of the problem NLCP(f, K). □
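Theorem 3 suggests the simplest iterative scheme, x_{k+1} = P_K(x_k - α f(x_k)). The sketch below takes K = R^n_+ and a linear map f(x) = Mx + q, i.e., a linear complementarity problem. Since NLCP(f, K) and NLCP(αf, K) have the same solutions for any α > 0, fixed points of this scaled map still solve NLCP(f, K). Convergence is not guaranteed in general; the sketch assumes M symmetric positive definite and 0 < α < 2/λ_max(M), under which the map is a contraction. The data M, q and α are hypothetical.

```python
import numpy as np

def solve_lcp_projection(M, q, alpha, iters=10_000, tol=1e-12):
    """Iterate x <- P_K(x - alpha * f(x)) with K = R^n_+ and f(x) = M x + q.

    For alpha > 0 the fixed points are exactly the solutions of NLCP(f, K)
    (Theorem 3 applied to alpha * f). Convergence is assumed here via M
    symmetric positive definite and 0 < alpha < 2 / lambda_max(M).
    """
    x = np.zeros_like(q)
    for _ in range(iters):
        x_new = np.maximum(x - alpha * (M @ x + q), 0.0)   # P_K is max(., 0)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

M = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
q = np.array([-1.0, 2.0])
x = solve_lcp_projection(M, q, alpha=0.3)
fx = M @ x + q
print(x, fx, x @ fx)   # x near (0.5, 0), f(x) near (0, 2.5), x . f(x) near 0
```

At the computed point x lies in K, f(x) lies in K* = K, and their inner product vanishes, which is exactly NLCP(f, K).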


Theorem 4 The problem NLCP(f, K) has a solution if and only if the mapping Φ(x) = P_K(x) - f(P_K(x)), defined for every x ∈ H, has a fixed point in H. Moreover, if x_0 is a fixed point of Φ, then x_* = P_K(x_0) is a solution of the problem NLCP(f, K).

Proof Suppose that x_0 is a fixed point of the mapping Φ, i.e.,

$$x_0 = P_K(x_0) - f(P_K(x_0)).$$

If we denote x_* = P_K(x_0), we have that x_* ∈ K and x_0 = x_* - f(x_*), or x_* - x_0 = f(x_*). Applying Theorem 1, we can show that f(x_*) ∈ K* and ⟨x_*, f(x_*)⟩ = 0, i.e., x_* is a solution of the problem NLCP(f, K).

Conversely, if x_* ∈ K is a solution of the problem NLCP(f, K), then, denoting x_0 = x_* - f(x_*) and applying Theorem 2, we deduce that P_K(x_0) = x_* and, finally,

$$\Phi(x_0) = P_K(x_0) - f(P_K(x_0)) = x_* - f(x_*) = x_0,$$

i.e., x_0 is a fixed point of Φ. □


The mapping Φ defined in Theorem 4 was applied in complementarity theory in 1988 [7], while the mapping W(x) = x - Φ(x) was used in 1992 [19]. The mapping W is known as the normal map. By Theorem 3, NLCP(f, K) is transformed into a fixed point problem for the mapping T with respect to the cone K while, by Theorem 4, NLCP(f, K) is transformed into a fixed point problem with respect to the whole space H. Several existence results for the problem NLCP(f, K) have been obtained by several authors using fixed point theory and the mappings T and Φ [3,6,7,8,10,13]. The fixed point problems associated with the mappings T and Φ have also been used in several iterative methods for solving the problem NLCP(f, K) numerically (see, e.g., [1,8,13,17,18]).
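Theorem 4 and the normal map can be illustrated on a small linear complementarity problem with K = R^2_+ and f(x) = Mx + q (hypothetical data). For its solution x_* = (0.5, 0), the point x_0 = x_* - f(x_*) should be a fixed point of Φ, equivalently a zero of W, with P_K(x_0) = x_*. The sketch below checks exactly this; it illustrates the correspondence rather than a solution method.

```python
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 2.0])

def f(x):
    return M @ x + q

def proj(x):
    """P_K for K = R^2_+."""
    return np.maximum(x, 0.0)

def phi(x):
    """Phi(x) = P_K(x) - f(P_K(x)), defined on the whole space."""
    return proj(x) - f(proj(x))

def normal_map(x):
    """W(x) = x - Phi(x) = f(P_K(x)) + x - P_K(x)."""
    return x - phi(x)

x_star = np.array([0.5, 0.0])     # solves NLCP(f, K): f(x_star) = (0, 2.5)
x0 = x_star - f(x_star)           # = (0.5, -2.5)
print(proj(x0), phi(x0), normal_map(x0))   # x_star, x0, and the zero vector
```

Note that x_0 lies outside K; the fixed point problem for Φ lives in the whole space, and projecting it back recovers the solution.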

In [15] and also in [2] it is shown that the problem NLCP(f, K) is equivalent to the following variational inequality:

$$\text{VI}(f, K):\quad \text{find } x \in K \text{ such that } \langle f(x), y - x \rangle \ge 0 \text{ for all } y \in K.$$


Because fixed point theory is systematically applied to the study of variational inequalities, this provides another way to use fixed point theory in the study of the problem NLCP(f, K). Relevant in this sense are the results obtained in [5,7,8,12] and in many other papers dedicated to the study of variational inequalities. In the study of some economic problems, we are interested in finding a solution of the problem NLCP(f, K) which is also the least element of the feasible set

$$F = \{ x \in K : f(x) \in K^* \}.$$

This particular problem can also be studied by fixed point theory [5,8]. If the cone K is an isotone projection cone in a Hilbert space H and the mapping f: H → H satisfies some properties with respect to the ordering defined by K, then the mappings T and Φ are monotone increasing or the difference of two monotone increasing mappings. In this case, we can apply some fixed point theorems based on the ordering to study the problem NLCP(f, K). Several results in this sense are presented in [13].
The Nonlinear Complementarity Problem
as a Mathematical Tool in Fixed Point Theory 


Fixed point theorems on cones have attracted the attention of many mathematicians, and their applications are very important. We now show how the problem NLCP(f, K) can be used to obtain new fixed point theorems on cones.

Let H be a Hilbert space, $K \subset H$ a closed pointed convex cone and $h: K \to K$ a mapping. The fixed point problem associated with h and K is:


$$\mathrm{FP}(h, K): \quad \text{find } x_* \in K \text{ such that } h(x_*) = x_*.$$

Consider the mapping $f: K \to H$ defined by $f(x) = x - h(x)$ for all $x \in K$.


Theorem 5 The problems NLCP(f, K) and FP(h, K) are equivalent.


Proof Suppose that $x_*$ is a solution of the problem FP(h, K). In this case we have $h(x_*) = x_*$, which implies that $f(x_*) = 0$. It is evident that $x_*$ is a solution of the problem NLCP(f, K). Conversely, if $x_*$ is a solution of the problem NLCP(f, K), then $x_*$ is a solution of the problem VI(f, K), i.e., $x_* \in K$ and $\langle f(x_*), y - x_* \rangle \ge 0$ for all $y \in K$. But $f(x_*) = x_* - h(x_*)$ and $h(x_*) \in K$ (by hypothesis), so we may take $y = h(x_*)$. This gives $\langle x_* - h(x_*), h(x_*) - x_* \rangle \ge 0$, which means that

$$0 \le \langle x_* - h(x_*),\, x_* - h(x_*) \rangle \le 0,$$

which implies that $h(x_*) = x_*$. □


We note that Theorem 5 was applied to obtain new fixed point theorems [7,10,11]. We cite only the following two.


Theorem 6 Let $(H, \langle \cdot, \cdot \rangle)$ be a Hilbert space ordered by a Galerkin cone $K(K_n)_{n \in \mathbb{N}}$. Let $T: K \to K$ be a mapping satisfying the following assumptions:

1) $T(0) \ne 0$;

2) T is a (ws)-compact operator;

3) T is $\varphi$-asymptotically bounded, with $\lim_{t \to +\infty} \varphi(t) = +\infty$.

Then T has a fixed point $x_* \in K \setminus \{0\}$. Moreover, $x_*$ is the limit of a sequence $\{x_m\}_{m \in \mathbb{N}}$ where, for every $m \in \mathbb{N}$, $x_m$ is a solution of the problem NLCP(T, $K_m$).


Proof The terminology and the proof are given in [7]. □


Recently, a new proof for this theorem was proposed 
in [14]. 


Theorem 7 Let $(H, \langle \cdot, \cdot \rangle)$ be a Hilbert space ordered by a Galerkin cone $K(K_n)_{n \in \mathbb{N}} \subset H$. Suppose we are given two continuous operators $S, T: K \to H$ such that S is bounded, T is compact and $(S + T)(K) \subset K$. If the following assumptions are satisfied:

1) $I - S$ satisfies condition $(S)_+$;

2) $I - S - T$ satisfies condition (GM),

then $S + T$ has a fixed point in K.


Proof The terminology and the proof are given in [11]. □


We note that Theorem 7 has several interesting corol- 
laries. In [10] the reader can find other fixed point the- 
orems for set-valued operators. 


Conclusions 


This interesting two-way relation between the nonlinear complementarity problem and fixed point theory can be exploited to obtain new results both in complementarity theory and in fixed point theory.
One of the most crucial steps in many multicriteria decision making (MCDM) methods is the accurate estimation of the pertinent data [18]. Very often these data
cannot be known in terms of absolute values. For in- 
stance, what is the worth of the ith alternative in terms 
of a political impact criterion? Although information 
about questions like the previous one is vital in mak- 
ing the correct decision, it is very difficult, if not im- 
possible, to quantify it correctly. Therefore, many de- 
cision making methods attempt to determine the rela- 
tive importance, or weight, of the alternatives in terms 
of each criterion involved in a given decision making 
problem. 

Consider the case of having a single decision criterion and a set of n alternatives, denoted as $A_i$ (for i = 1, ..., n). The decision maker wants to determine the relative performance of these alternatives in terms of a single criterion. An approach based on pairwise comparisons, proposed by T.L. Saaty [11,12], has long attracted the interest of many researchers, both because of its easy applicability and because of its interesting mathematical properties. Pairwise comparisons are used to determine the relative importance of each alternative in terms of each criterion.

In that approach the decision maker has to express 
his/her opinion about the value of one single pairwise 
comparison at a time. Usually, the decision maker has 
to choose his/her answer among 10-17 discrete choices. 
Each choice is a linguistic phrase. Some examples of 
such linguistic phrases when two concepts, A and B, are considered might be: ‘A is more important than B’, ‘A is of the same importance as B’, ‘A is a little more important than B’, and so on. When one focuses directly on the data elicitation issue one may use linguistic statements such as ‘How much more does alternative A belong to the set S than alternative B?’

The main problem with the pairwise comparisons 
is how to quantify the linguistic choices selected by the 
decision maker during the evaluation of the pairwise 
comparisons. All the methods which use the pairwise comparisons approach eventually translate the qualitative answers of a decision maker into numbers.

Pairwise comparisons are quantified by using a scale. Such a scale is simply a one-to-one mapping between the set of discrete linguistic choices available to the decision maker and a discrete set of numbers which represent the importance, or weight, of these linguistic choices. There are two major approaches to developing such scales. The first approach is based on the linear scale proposed by Saaty [12] as part of the analytic hierarchy process (AHP). The second approach was proposed by F. Lootsma [8,9,10] and determines exponential scales. Both approaches start from some psychological theories and develop the numbers to be used based on these theories. For an extensive study of the scale issue, see [18] and [19].

In this article we examine three problems related to 
the use of pairwise comparisons for data elicitation in 
MCDM. The first problem is how to combine the n(n − 1)/2 comparisons needed to compare n entities (alternatives or criteria) under a given goal and extract their relative preferences. This subject was extensively studied in [21] and is briefly discussed in the second section. The second problem is how to estimate missing comparisons. The third problem is how to
select the order for eliciting the comparisons and deter- 
mine whether all comparisons are needed. These prob- 
lems are examined in detail in the following sections. 


Extraction of Relative Priorities 
from Complete Pairwise Matrices 


Let $A_1, \dots, A_n$ be n alternatives (or criteria or, in general, concepts) to be compared. We are interested in evaluating the relative preference values of these concepts. Saaty [11,12,14] proposed to use a matrix A of rational numbers taken from the set {1/9, 1/8, 1/7, ..., 1, ..., 9}. Each entry of the matrix A represents a pairwise judgment. Specifically, the entry $a_{ij}$ denotes the number that estimates the relative preference of element $A_i$ when it is compared with element $A_j$. Obviously, $a_{ij} = 1/a_{ji}$ and $a_{ii} = 1$. That is, the matrix is reciprocal.


The Eigenvalue Approach 


Let us first examine the case in which it is possible to have perfect values $a_{ij}$. In this case $a_{ij} = W_i/W_j$ ($W_s$ denotes the actual value of element s) and the reciprocal matrix A is consistent. That is:

$$a_{ij} = a_{ik} \times a_{kj} \quad \text{for } i, j, k = 1, \dots, n, \qquad (1)$$

where n is the number of elements in the comparison set. It can be proved [12] that the matrix A has rank 1, with n as its only nonzero eigenvalue. Thus, we have:


$$Ax = nx, \qquad (2)$$


where x is an eigenvector. From the fact that $a_{ij} = W_i/W_j$, the following are obtained:

$$\sum_{j=1}^{n} a_{ij} W_j = \sum_{j=1}^{n} W_i = n W_i, \quad i = 1, \dots, n, \qquad (3)$$

or

$$AW = nW. \qquad (4)$$


Equation (4) states that n is an eigenvalue of A with W being a corresponding eigenvector. The same equation also states that in the perfectly consistent case (i.e., when $a_{ij} = a_{ik} \times a_{kj}$ for all possible triplets), the vector W, with the relative preferences of the elements $A_1, \dots, A_n$, is the principal right eigenvector (after normalization) of A.
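Relations (1) and (4) can be checked numerically. This short Python sketch builds the consistent reciprocal matrix $a_{ij} = W_i/W_j$ from a weight vector (the weights are arbitrary illustrative values, not taken from the text). It then tests condition (1) for every triplet and verifies that $AW = nW$.

```python
def consistent_matrix(W):
    """Perfectly consistent reciprocal matrix with a_ij = W_i / W_j."""
    return [[wi / wj for wj in W] for wi in W]

def is_consistent(A, tol=1e-9):
    """Check relation (1): a_ij = a_ik * a_kj for all i, j, k."""
    n = len(A)
    return all(abs(A[i][j] - A[i][k] * A[k][j]) <= tol * max(1.0, abs(A[i][j]))
               for i in range(n) for j in range(n) for k in range(n))

def mat_vec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

W = [0.1, 0.2, 0.3, 0.4]        # illustrative weights that sum to 1
A = consistent_matrix(W)
n = len(W)
AW = mat_vec(A, W)
# Relation (4): for a consistent matrix every entry of AW equals n * W_i.
print(is_consistent(A), all(abs(x - n * w) < 1e-12 for x, w in zip(AW, W)))
```

Perturbing a single pair $a_{12}$, $a_{21}$ (while keeping reciprocity) makes `is_consistent` return False. This is exactly the nonconsistent situation treated next.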
In the nonconsistent case (which is the most common) the pairwise comparisons are not perfect, that is, the entry $a_{ij}$ might deviate from the real ratio $W_i/W_j$ (i.e., from the ratio of the real relative preference values $W_i$ and $W_j$). In this case, expression (1) does not hold for all possible combinations. The new matrix A can then be considered as a perturbation of the consistent case. When the entries $a_{ij}$ change slightly, the eigenvalues change in a similar fashion [12]. Moreover, the maximum eigenvalue is close to n (actually greater than n), while the remaining eigenvalues are close to zero. Thus, in order to find the relative preferences in the nonconsistent case, one should find an eigenvector that corresponds to the maximum eigenvalue $\lambda_{\max}$. That is, one should find the principal right eigenvector W that satisfies

$$AW = \lambda_{\max} W, \quad \text{where } \lambda_{\max} \approx n.$$


Saaty estimates the principal right eigenvector W by multiplying the entries in each row of A together and taking the nth root (n being the number of elements in the comparison set). Since we want values that add up to 1, we normalize the resulting vector by the sum of its entries. If instead we want the element with the highest value to have a relative preference value equal to 1, we divide the vector by its highest entry.

Under the assumption of total consistency, if the 
judgments are gamma distributed (something that 
Saaty claims to be the case), the principal right eigen- 
vector of the resultant reciprocal matrix A is Dirichlet 
distributed. If the assumption of total consistency is re- 
laxed, then L.G. Vargas [23] proved that the hypothesis 
that the principal right eigenvector follows a Dirichlet 
distribution is accepted if the consistency ratio is 10% 
or less. 

The consistency ratio (CR) is obtained by first estimating $\lambda_{\max}$. Saaty estimates $\lambda_{\max}$ by adding the columns of matrix A and then multiplying the resulting vector of column sums by the vector W. He then uses what he calls the consistency index (CI) of the matrix A, defined as follows:

$$\mathrm{CI} = \frac{\lambda_{\max} - n}{n - 1}.$$


Then, the consistency ratio CR is obtained by dividing the CI by the random consistency index (RCI) given in Table 1. Each RCI is the average consistency index of a sample of 500 randomly generated reciprocal matrices with entries from the set {1/9, 1/8, 1/7, ..., 1, ..., 9}. The judgments are considered acceptably consistent if the CR is 10% or less. If the previous approach yields a CR greater than 10%, then a reexamination of the pairwise judgments is recommended until a CR less than or equal to 10% is achieved.


Estimating Data for Multicriteria Decision Making Problems: Optimization Techniques, Table 1
RCI values for sets of different order n [12]

n      1     2     3      4      5      6      7      8      9
RCI    0     0     0.58   0.90   1.12   1.24   1.32   1.41   1.45
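The procedure just described can be sketched in Python as follows: row geometric means for W, column sums times W for $\lambda_{\max}$, then CI and CR with the RCI values of Table 1. The function names are our own. The test matrix is the four-element judgment matrix of Example 1, with its fourth row completed by reciprocity ($a_{ji} = 1/a_{ij}$).

```python
import math

# Random consistency index (RCI) for n = 1..9, taken from Table 1.
RCI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
       6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def geometric_mean_weights(A):
    """Normalized nth roots of the row products (approximates W)."""
    n = len(A)
    g = [math.prod(row) ** (1.0 / n) for row in A]
    total = sum(g)
    return [v / total for v in g]

def consistency_ratio(A):
    """Return (W, lambda_max, CI, CR) following the steps in the text.

    Assumes 3 <= n <= 9, so that the RCI is tabulated and positive."""
    n = len(A)
    W = geometric_mean_weights(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    lam = sum(c * w for c, w in zip(col_sums, W))   # column sums times W
    ci = (lam - n) / (n - 1)
    cr = ci / RCI[n]
    return W, lam, ci, cr

# Judgment matrix of Example 1 (fourth row completed by reciprocity).
A = [[1, 2, 1/5, 1/9],
     [1/2, 1, 1/8, 1/9],
     [5, 8, 1, 1/4],
     [9, 9, 4, 1]]
W, lam, ci, cr = consistency_ratio(A)
print([round(w, 3) for w in W], round(lam, 3), round(ci, 3), round(cr, 4))
```

With this matrix the sketch gives $\lambda_{\max} \approx 4.226$ and CR $\approx 0.084 < 0.10$, in line with the values quoted in Example 1. For a perfectly consistent matrix it returns $\lambda_{\max} = n$ and CR = 0, up to rounding.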


Optimization Approaches 
A.T.W. Chu, R.E. Kalaba and K. Spingarn [2] argued that, given the data $a_{ij}$, the values $W_i$ to be estimated should have the property

$$a_{ij} \approx \frac{W_i}{W_j}. \qquad (5)$$

This is reasonable since $a_{ij}$ is meant to be an estimate of the ratio $W_i/W_j$. Then, in order to get the estimates for the $W_i$ given the data $a_{ij}$, they proposed the following constrained optimization problem:


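$$\min\ S = \sum_{i=1}^{n} \sum_{j=1}^{n} (a_{ij} W_j - W_i)^2 \quad \text{s.t.} \quad \sum_{i=1}^{n} W_i = 1, \quad W_i > 0 \ \text{for } i = 1, \dots, n. \qquad (6)$$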


They also provide an alternative expression $S_1$ that is more difficult to solve numerically. That is,

$$S_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( a_{ij} - \frac{W_i}{W_j} \right)^2. \qquad (7)$$
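Problem (6) is a quadratic program with one equality constraint. Writing $v_{ij} = a_{ij} e_j - e_i$ gives $S = W^T Q W$ with $Q = \sum_{i,j} v_{ij} v_{ij}^T$. The minimal NumPy sketch below solves the Lagrange (KKT) conditions $2QW + \mu e = 0$, $e^T W = 1$. This is our own illustration of formulation (6), not necessarily the solution procedure used in [2]. The positivity constraints $W_i > 0$ are assumed to be inactive, so they are not enforced and should be checked on the result.

```python
import numpy as np

def chu_weights(A):
    """Minimize S = sum_ij (a_ij W_j - W_i)^2 subject to sum_i W_i = 1.

    Positivity (W_i > 0) is not imposed here; check it on the output."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    Q = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            v = np.zeros(n)
            v[j] += A[i, j]          # coefficient of W_j
            v[i] -= 1.0              # coefficient of W_i (v = 0 when i == j)
            Q += np.outer(v, v)      # accumulate so that S = W^T Q W
    # KKT system: [2Q  e; e^T  0] [W; mu] = [0; 1]
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = 2.0 * Q
    K[:n, n] = 1.0
    K[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    return np.linalg.solve(K, rhs)[:n]

A = [[1, 2, 1/5, 1/9],
     [1/2, 1, 1/8, 1/9],
     [5, 8, 1, 1/4],
     [9, 9, 4, 1]]
W = chu_weights(A)
print(W, W.sum(), bool((W > 0).all()))
```

For a perfectly consistent matrix the minimum S = 0 is attained at the true weights, and the sketch returns them.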


In [3] a variation of the above least squares formulation is proposed. For the case of only one decision maker it recommends the following models:

$$\log a_{ij} = \log W_i - \log W_j + \psi_1(W_i, W_j)\,\varepsilon_{ij}, \qquad (8)$$

$$a_{ij} = \frac{W_i}{W_j} + \psi_2(W_i, W_j)\,\varepsilon_{ij}, \qquad (9)$$
j 
where $W_i$ and $W_j$ are the true (and hence unknown) relative preferences, and $\psi_1(X, Z)$ and $\psi_2(X, Z)$ are given positive functions (where X, Z > 0). The random errors $\varepsilon_{ij}$ are assumed independent with zero mean and unit variance. Using these two assumptions one is able to calculate the variance of each individual estimated relative preference. However, this approach does not give a way of selecting the appropriate positive functions. In the second example, presented later, a sample problem which originates in [11] and was later used in [3] is solved for different functions $\psi_1$, $\psi_2$ using this method.


Considering the Human Rationality Factor 


According to the human rationality assumption [21] the 
decision maker is a rational person. Rational persons 
are defined here as individuals who try to minimize 
their regret [15], to minimize losses, or to maximize 
profit [24]. In the relative preference evaluation prob- 
lem, minimization of regret, losses, or maximization of 
profit could be interpreted as the effort of the decision 
maker to minimize the errors involved in the pairwise 
comparisons. 

As stated in previous paragraphs, in the inconsistent case the entry $a_{ij}$ of the matrix A is an estimate of the real ratio $W_i/W_j$. Since it is an estimate, the following is true:

$$a_{ij} = \frac{W_i}{W_j}\, d_{ij}, \quad i, j = 1, \dots, n. \qquad (10)$$


In the above relation $d_{ij}$ denotes the deviation of $a_{ij}$ from being an accurate judgment. Obviously, if $d_{ij} = 1$, then $a_{ij}$ was perfectly estimated. From the previous formulation we conclude that the errors involved in these pairwise comparisons are given by:

$$\varepsilon_{ij} = d_{ij} - 1.00,$$


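or, after using (10) above:

$$\varepsilon_{ij} = a_{ij} \frac{W_j}{W_i} - 1.00. \qquad (11)$$

When a comparison set contains n elements, Saaty's method requires the estimation of the following n(n − 1)/2 pairwise comparisons:

$$\begin{array}{cccc}
\dfrac{W_1}{W_2}, & \dfrac{W_1}{W_3}, & \dots, & \dfrac{W_1}{W_n}, \\[1ex]
& \dfrac{W_2}{W_3}, & \dots, & \dfrac{W_2}{W_n}, \\[1ex]
& & \ddots & \vdots \\[1ex]
& & & \dfrac{W_{n-1}}{W_n}.
\end{array} \qquad (12)$$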


The corresponding n(n − 1)/2 errors are (after using relations (11) and (12)):

$$\varepsilon_{ij} = a_{ij} \frac{W_j}{W_i} - 1.00, \quad i, j = 1, \dots, n, \ j > i. \qquad (13)$$


Since the $W_i$ are relative preferences that add up to 1, the following relation should also be satisfied:

$$\sum_{i=1}^{n} W_i = 1.00. \qquad (14)$$

Moreover, since the $W_i$ represent relative preferences, we also have:

$$W_i > 0, \quad i = 1, \dots, n. \qquad (15)$$


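Relations (13) and (14), when the data are consistent (i.e., all the errors are equal to zero), can be written as follows:

$$BW = b. \qquad (16)$$

Setting $\varepsilon_{ij} = 0$ in (13) and multiplying by $W_i$ gives $-W_i + a_{ij} W_j = 0$, so each comparison $a_{ij}$ (j > i) contributes one row of B, and relation (14) contributes the last row. The vector b has zero entries everywhere except the last one, which is equal to 1, and the matrix B has the following form (blank entries represent zeros):

$$B = \begin{pmatrix}
-1 & a_{1,2} & & & \\
-1 & & a_{1,3} & & \\
\vdots & & & \ddots & \\
-1 & & & & a_{1,n} \\
 & -1 & a_{2,3} & & \\
 & \vdots & & \ddots & \\
 & -1 & & & a_{2,n} \\
 & & & \vdots & \\
 & & & -1 & a_{n-1,n} \\
1 & 1 & 1 & \cdots & 1
\end{pmatrix}$$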
The error minimization issue is interpreted in many cases (regression analysis, the linear least squares problem) as the minimization of the sum of squares of the residual vector $r = b - BW$ [16]. In terms of formulation (16) this means that, in a real-life situation (i.e., when errors are no longer zero), the real intention of the decision maker is to minimize the expression

$$f(W) = \| b - BW \|^2, \qquad (17)$$

which is a typical linear least squares problem.

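If we use the notation described previously, then the quantity (6) which is minimized in [2] becomes:

$$S = \sum_{i=1}^{n} \sum_{j=1}^{n} (a_{ij} W_j - W_i)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} (\varepsilon_{ij} W_i)^2,$$

and the alternative expression (7) becomes:

$$S_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( a_{ij} - \frac{W_i}{W_j} \right)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \varepsilon_{ij} \frac{W_i}{W_j} \right)^2.$$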


Clearly, both expressions are too complicated to re- 
flect, in a reasonable way, the intentions of the decision 
maker. 

The models proposed in [3] are closer to the one developed under the human rationality assumption. The only difference is that instead of the relations

$$\log a_{ij} = \log W_i - \log W_j + \psi_1(W_i, W_j)\,\varepsilon_{ij}$$

and

$$a_{ij} = \frac{W_i}{W_j} + \psi_2(W_i, W_j)\,\varepsilon_{ij},$$

the following simpler expression is used:

$$a_{ij} = \frac{W_i}{W_j}\, d_{ij}, \qquad (18)$$

or

$$a_{ij} = \frac{W_i}{W_j} \times (\varepsilon_{ij} + 1.00).$$

However, as the second example illustrates, the performance of the method of [3] depends greatly on the selection of the $\psi_1(X, Z)$ or $\psi_2(X, Z)$ functions. Now, however, these functions are further modified by (17).


Example 1 
Let us assume that the following is the matrix with the pairwise comparisons for a set of four elements:

$$A = \begin{pmatrix} 1 & 2/1 & 1/5 & 1/9 \\ 1/2 & 1 & 1/8 & 1/9 \\ 5/1 & 8/1 & 1 & 1/4 \\ 9/1 & 9/1 & 4/1 & 1 \end{pmatrix}$$


Using the methods presented in previous sections we can see that

$$\lambda_{\max} = 4.226, \qquad \mathrm{CI} = \frac{4.226 - 4}{4 - 1} = 0.075, \qquad \mathrm{CR} = \frac{\mathrm{CI}}{0.90} = 0.0837 < 0.10.$$


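The formulation (16) that corresponds to this example is as follows:

$$\begin{pmatrix} -1 & 2/1 & 0 & 0 \\ -1 & 0 & 1/5 & 0 \\ -1 & 0 & 0 & 1/9 \\ 0 & -1 & 1/8 & 0 \\ 0 & -1 & 0 & 1/9 \\ 0 & 0 & -1 & 1/4 \\ 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

The vector V that solves the above least squares problem is calculated to be:

V = (0.065841, 0.039398, 0.186926, 0.704808).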


Hence, the sum of squares of the residual vector components is 0.003030. The average squared residual for this problem is 0.003030/(4(4 − 1)/2 + 1) = 0.000433; that is, the average residual is $\sqrt{0.000433} = 0.020806$.


Estimating Data for Multicriteria Decision Making Problems: Optimization Techniques, Table 2
Data for the second example
(The entries of this table are not legible in the source.)


Estimating Data for Multicriteria Decision Making Problems: Optimization Techniques, Table 3
Comparison of the relative preferences for the data in Table 2

                                                Elements in set
Method used                                     (1)    (2)    (3)    (4)    (5)    (6)    (7)    Ave. residual
[label illegible] method                        0.429  0.231  0.021  0.053  0.053  0.119  0.095  0.134
[label illegible] eigenvector                   0.427  0.230  0.021  0.052  0.052  0.123  0.094  0.135
Chu's method                                    0.487  0.175  0.030  0.059  0.059  0.104  0.085  0.097
[label illegible] with ψ = 1                    0.422  0.232  0.021  0.052  0.052  0.127  0.094  0.138
[label illegible] with ψ = 1                    0.386  0.287  0.042  0.061  0.061  0.088  0.075  0.161
Federov Model 2 with ψ2 = |W_i − W_j|           0.383  0.262  0.032  0.059  0.059  [?]    0.083  0.15
Federov Model 2 with ψ2 = W_i/W_j               0.047  0.229  0.021  0.051  0.051  0.120  0.081  0.130
Least squares method under the HR assumption    0.408  0.147  0.037  0.054  0.054  0.080  0.066  0.082
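The HR least squares computation of Example 1 can be reproduced with a minimal NumPy sketch. It assembles B and b exactly as in formulation (16): one row $-W_i + a_{ij} W_j = 0$ per comparison with j > i, followed by the normalization row. It then minimizes (17) with `numpy.linalg.lstsq`. The function names are our own.

```python
import numpy as np

def hr_system(A):
    """Assemble B and b of formulation (16) for a complete judgment matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    rows, rhs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            r = np.zeros(n)
            r[i] = -1.0          # -W_i
            r[j] = A[i, j]       # + a_ij W_j  (zero error in (13))
            rows.append(r)
            rhs.append(0.0)
    rows.append(np.ones(n))      # relation (14): the W_i add up to 1
    rhs.append(1.0)
    return np.array(rows), np.array(rhs)

def hr_weights(A):
    """Minimize (17), ||b - BW||^2; return W and the residual sum of squares."""
    B, b = hr_system(A)
    W, *_ = np.linalg.lstsq(B, b, rcond=None)
    r = b - B @ W
    return W, float(r @ r)

A = [[1, 2, 1/5, 1/9],
     [1/2, 1, 1/8, 1/9],
     [5, 8, 1, 1/4],
     [9, 9, 4, 1]]
V, ss = hr_weights(A)
rows = len(A) * (len(A) - 1) // 2 + 1     # number of rows of B
print(np.round(V, 6), round(ss, 6), round(float(np.sqrt(ss / rows)), 6))
```

With the Example 1 matrix this reproduces, up to rounding, the vector V, the residual sum of squares of about 0.00303 and the average residual of about 0.0208 given above.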


Example 2 The second example uses the same data used originally in [11], and later in [2] and [3]. These data are presented in Table 2.

Table 3 presents a summary of the results (as found in the corresponding references) when the methods described in the subsections above are used. The power method for deriving the eigenvector was applied as presented in [7]. The last row of Table 3 contains the results obtained by using the least squares method under the human rationality assumption (HR).

As shown in the last column of Table 3, the methods differ considerably as far as the mean residual is concerned. The results also illustrate how critical the role of the functions $\psi_1(X, Z)$ and $\psi_2(X, Z)$ is in the method of [3]. The mean residual obtained by using the least squares method under the human rationality assumption is the smallest one, by 16% (0.082 versus 0.097 for Chu's method, the next smallest).


Matrices with Missing Comparisons 


For one to evaluate n concepts, normally all the required n(n − 1)/2 pairwise comparisons are needed.


However, for large numbers of concepts to be compared, the decision maker may become bored, tired and inattentive while assigning values to the comparisons as time goes on, which may easily lead to erroneous judgments. Moreover, the time needed to elicit all the comparisons for a judgment matrix may be unaffordable. Also the decision maker may not be sure
about the values of some comparisons and thus may not 
want to make a direct evaluation of them. In cases like 
the previous ones, the decision maker may wish to stop 
the process and then try to derive the relative prefer- 
ences from an incomplete pairwise comparison (judg- 
ment) matrix. 

Given an incomplete pairwise comparison matrix, 
there are two central and closely interrelated problems. 
The first problem is how to estimate the missing com- 
parisons. The second problem is which comparison to 
evaluate next. In other words, if the decision maker 
wishes to estimate a few extra comparisons (from the 
remaining undetermined ones) how should the next 
comparison be selected? Should it be selected randomly 
or according to some rule (to be determined)? Next, 
we study the first of these two closely related prob- 
lems. 
Estimating Missing Comparisons
Using Connecting Paths 


Suppose that $X_{i,j}$ is a missing comparison to be estimated. Assume also that there are two known comparisons $a_{i,k}$ and $a_{k,j}$ for some index k. In the perfectly consistent case the following relationship should be true:

$$X_{i,j} = a_{i,k} \times a_{k,j}.$$

In the more general inconsistent case, the value $X_{i,j}$ can be approximated by the product $a_{i,k} \times a_{k,j}$. In [5] and [6] the pair $a_{i,k}$ and $a_{k,j}$ is called an elementary connecting path connecting the missing comparison $X_{i,j}$. Obviously, given a missing comparison, more than one such connecting path may exist (i.e., if there is more than one index k which satisfies the above relationship). Moreover, it is also possible to have connecting paths composed of more than two known comparisons (i.e., paths of size larger than 2). The general structure of a connecting path of size r, denoted $CP_r$, has the following form:

$$CP_r:\ X_{i,j} = a_{i,k_1} \times a_{k_1,k_2} \times \cdots \times a_{k_r,j}, \quad \text{for } i, j, k_1, \dots, k_r = 1, \dots, n,\ 1 \le r \le n - 2.$$

According to P.T. Harker [5,6] the value of the missing comparison $X_{i,j}$ should be equal to the geometric mean of all connecting paths related to this missing comparison. That is, the following should be true:

$$X_{i,j} = \left( CP^{(1)} \times CP^{(2)} \times \cdots \times CP^{(q)} \right)^{1/q},$$

where $CP^{(s)}$ denotes the product of the known comparisons along the sth connecting path. In the previous expression it is assumed that there are q such connecting paths. For this reason, this method is known as the geometric mean method for estimating missing comparisons.
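A direct, deliberately brute-force Python sketch of the geometric mean method follows. It enumerates every simple connecting path from i to j through known comparisons, with at least one intermediate index, and returns the geometric mean of the path products. Representing missing entries as `None` is an assumption of this sketch. Its running time grows with the number of paths, which is the drawback discussed in the text.

```python
import math

def connecting_paths(A, i, j):
    """Products along all simple connecting paths i -> k1 -> ... -> j.

    A is an n x n list with None for missing comparisons; the direct entry
    (i, j) is never used, only paths with at least one intermediate index."""
    n = len(A)
    products = []

    def extend(node, visited, prod):
        for k in range(n):
            if k in visited or A[node][k] is None:
                continue
            if k == j:
                if len(visited) > 1:            # at least one intermediate
                    products.append(prod * A[node][k])
                continue
            extend(k, visited | {k}, prod * A[node][k])

    extend(i, {i}, 1.0)
    return products

def geometric_mean_estimate(A, i, j):
    """Geometric mean of all connecting path values for the missing X_ij."""
    paths = connecting_paths(A, i, j)
    if not paths:
        raise ValueError("no connecting path for this missing comparison")
    return math.exp(sum(math.log(p) for p in paths) / len(paths))

A0 = [[1, 2, None],
      [1/2, 1, 2],
      [None, 1/2, 1]]
print(geometric_mean_estimate(A0, 0, 2))   # single path 1 -> 2 -> 3: 2 * 2
```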

A method alternative to the geometric mean method is to express the missing comparisons in terms of the arithmetic averages of all related connecting paths and some error terms. In this way, one can also introduce error terms on consistency relations which are defined on pairs of missing comparisons (for more details, see [1]). A natural objective, then, could be to minimize the sum of the absolute values of all these error terms (which can be of any sign). This leads to the formulation of a linear programming (LP) problem. A similar approach is presented in [17] (in which the path problem does not occur).

However, there is a serious drawback with any method which attempts to use connecting paths. The number of connecting paths may be astronomically large, rendering any such method computationally intractable. For instance, for a comparison matrix of dimension six, the number of possible connecting paths to be considered might be equal to 64, while for dimension ten the number of paths may become equal to 109,600. As a result, some alternative approaches have been developed. The revised geometric mean (RGM) method and a least squares formulation are two such methods and are discussed next.
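The figures quoted above agree with a simple count, under the assumption that $X_{i,j}$ is the only missing comparison. A connecting path is then an ordered choice of r distinct intermediate indices from the remaining n − 2 indices. The number of paths is therefore $\sum_{r=1}^{n-2} (n-2)!/(n-2-r)!$. A short Python check:

```python
from math import perm

def count_connecting_paths(n):
    """Connecting paths for one missing comparison in an otherwise complete
    n x n judgment matrix: ordered selections of r intermediate indices."""
    m = n - 2                                   # indices other than i and j
    return sum(perm(m, r) for r in range(1, m + 1))

for n in (4, 6, 10):
    print(n, count_connecting_paths(n))
# 4 4
# 6 64
# 10 109600
```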


Revised Geometric Mean Method (RGM) 


An alternative approach to the use of connecting paths is to convert the incomplete judgment matrix into a transformed matrix and then determine its principal right eigenvector. This was proposed by Harker [4] and is best illustrated by means of an example.

Suppose that the following is an incomplete judgment matrix of order 3 (taken from [4]):

$$A_0 = \begin{pmatrix} 1 & 2 & - \\ 1/2 & 1 & 2 \\ - & 1/2 & 1 \end{pmatrix}$$


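One can replace the missing elements (denoted by −) by the corresponding ratios of weights. Therefore, the previous matrix becomes:

$$A_1 = \begin{pmatrix} 1 & 2 & w_1/w_3 \\ 1/2 & 1 & 2 \\ w_3/w_1 & 1/2 & 1 \end{pmatrix}$$

That is, the missing comparison $X_{1,3}$ was replaced by the ratio $w_1/w_3$ (and similarly for the reciprocal entry $X_{3,1}$). Next observe that the product $A_1 W$ is equal to:

$$A_1 W = \begin{pmatrix} 1 & 2 & w_1/w_3 \\ 1/2 & 1 & 2 \\ w_3/w_1 & 1/2 & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} 2w_1 + 2w_2 \\ w_1/2 + w_2 + 2w_3 \\ w_2/2 + 2w_3 \end{pmatrix}$$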
The same result can also be obtained if one considers the matrix C, given as follows:

$$C = \begin{pmatrix} 2 & 2 & 0 \\ 1/2 & 1 & 2 \\ 0 & 1/2 & 2 \end{pmatrix},$$

that is, matrix C satisfies the relationship

$$A_1 W = C W.$$

Therefore, the desired relative preferences (i.e., the entries of vector W) can be determined as the principal right eigenvector of the new matrix C. This is true because:

$$A_1 W = C W = \lambda W.$$


In general, the entries of matrix C can be determined from the entries of an incomplete judgment matrix $A_0$ as follows (where $c_{i,j}$ and $a_{i,j}$ are the elements of the matrices C and $A_0$, respectively):

$$c_{i,i} = 1 + m_i,$$

and for $i \ne j$:

$$c_{i,j} = \begin{cases} a_{i,j} & \text{if } a_{i,j} \text{ is a positive number}, \\ 0 & \text{otherwise}, \end{cases}$$

where $m_i$ is the number of unanswered questions in the ith row of the incomplete comparison matrix.

Next, the elements of the W vector can be deter- 
mined by using one of the methods presented in the 
second section. 
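A minimal Python sketch of the RGM construction follows. It builds C from an incomplete matrix using the rule above, with missing entries given as `None` (an assumption of this sketch). It then computes the principal right eigenvector by the power method, one of the methods mentioned in the second section. For the order-3 example, C has eigenvalues 3, 2 and 0, and the iteration returns W = (4/7, 2/7, 1/7). This implies the estimate $X_{1,3} = w_1/w_3 = 4$.

```python
def rgm_matrix(A0):
    """Harker's transformed matrix C from an incomplete judgment matrix A0."""
    n = len(A0)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m_i = sum(1 for j in range(n) if j != i and A0[i][j] is None)
        C[i][i] = 1.0 + m_i                      # c_ii = 1 + m_i
        for j in range(n):
            if j != i and A0[i][j] is not None:
                C[i][j] = float(A0[i][j])        # known entries are copied
    return C

def principal_eigenvector(M, iters=1000, tol=1e-13):
    """Power method; returns the eigenvector normalized to sum to 1."""
    n = len(M)
    w = [1.0 / n] * n
    for _ in range(iters):
        y = [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(y)
        y = [v / s for v in y]
        if max(abs(a - b) for a, b in zip(y, w)) < tol:
            return y
        w = y
    return w

A0 = [[1, 2, None],
      [1/2, 1, 2],
      [None, 1/2, 1]]
C = rgm_matrix(A0)
W = principal_eigenvector(C)
print(C, W, W[0] / W[2])
```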


Least Squares Formulation 


This formulation is a natural extension of the formulation discussed earlier in the section on the HR factor. The only difference is that in relations (12) one should only consider known comparisons. As a result, the matrix B (as defined earlier) should not have rows corresponding to missing comparisons. Finally, observe that in order to solve the least squares problem given as (16), one has to calculate the vector W as follows:

$$W = (B^T B)^{-1} B^T b,$$

where $B^T$ stands for the transpose of B.
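A minimal NumPy sketch of this variant follows. Missing comparisons (given as `None`, an assumption of this sketch) simply contribute no row to B. The sketch assumes the known comparisons link all elements, so that B has full column rank. It solves the normal equations $(B^T B) W = B^T b$, which is equivalent to the formula above but avoids forming the inverse explicitly. For the order-3 example of the previous subsection the system is square and gives W = (4/7, 2/7, 1/7), the same result as the RGM method.

```python
import numpy as np

def hr_weights_incomplete(A):
    """Least squares weights when some comparisons are missing (None).

    Rows of B are built only for known comparisons a_ij with j > i, plus
    the normalization row; W then solves (B^T B) W = B^T b."""
    n = len(A)
    rows, rhs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] is None:
                continue                       # missing comparison: no row
            r = np.zeros(n)
            r[i], r[j] = -1.0, float(A[i][j])  # -W_i + a_ij W_j
            rows.append(r)
            rhs.append(0.0)
    rows.append(np.ones(n))                    # the W_i add up to 1
    rhs.append(1.0)
    B, b = np.array(rows), np.array(rhs)
    return np.linalg.solve(B.T @ B, B.T @ b)

A0 = [[1, 2, None],
      [1/2, 1, 2],
      [None, 1/2, 1]]
W = hr_weights_incomplete(A0)
print(W, W[0] / W[2])
```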

In [1] the revised geometric mean method and the previous least squares method were tested on random problems. First, a complete judgment matrix was determined. These matrices, in general, were slightly inconsistent. They were derived according to the procedures used in [20,22] and [19]. Then, some comparisons were randomly removed and set as missing. Next, the two methods were applied to the incomplete judgment matrix and the missing comparisons were estimated. The estimated matrix was used to derive a ranking of the compared entities. This ranking was compared with the ranking derived when the original complete judgment matrix is used. In these computational experiments the two estimation methods for missing comparisons were found to perform almost equally well, although their performance varied with the order of the matrix and the percentage of missing comparisons. More details on these issues can be found in [1].


Determining the Comparison to Elicit Next 


Suppose that the decision maker has determined some 
of the n(n — 1)/2 comparisons when a set of n en- 
tities is considered for extracting relative preferences. 
Next assume that the decision maker wishes to proceed 
with only a few additional comparisons and not deter- 
mine the entire judgment matrix. The question we examine at this point is which additional comparisons should be elicited. More specifically, the question we consider is best stated as follows: Given an incomplete judgment matrix, and the option to elicit just some additional comparisons, which comparison should be elicited next?

One obvious approach is to select the next comparison randomly among the missing ones. This problem was examined by Harker in [5] and [6]. Harker focused his attention on how to determine which comparison, among the missing ones, is the most critical one. He defined the most critical one to be the comparison which would have the largest impact (when the appropriate derivatives are considered) on the vector W.

He observed that the largest absolute gradient (i.e., the largest partial derivative) means that a unit change of the specific missing comparison brings about the biggest change in the vector W. Therefore, he asserted that the missing comparison related to the largest absolute gradient should be the most critical one and, therefore, the one to evaluate next. The following formula, which calculates the largest absolute gradient, can then be used to choose the most critical comparison index (i, j):


$$(i, j) = \arg\max_{(k,l) \in Q} \left\| \frac{\partial x(A)}{\partial x_{kl}} \right\|_\infty,$$

where Q is the set of missing comparisons and $\|\cdot\|_\infty$ is the Tchebyshev norm. The most critical comparison index (i, j) is determined by the maximum norm of the vector $\partial x(A)/\partial x_{kl}$ over all missing comparisons.

The previous approach is intuitively plausible but computationally nontrivial. Moreover, its effectiveness had not been addressed until recently. In [1] Harker's derivatives approach was tested against a method which randomly selects the next comparison to elicit. The test problems were generated similarly to the ones described at the end of the previous section, and the two methods were tested in a similar manner as before. To our surprise, the two methods performed similarly. Therefore, the obvious conclusion is that one does not have to implement the more complex derivatives method; it is sufficient to select the next comparison randomly. Of course, the more comparisons are elicited, the better the accuracy of the final results. Since the order of comparisons does not seem to have an impact, the best strategy is to select as the next comparison the one which is easiest for the decision maker to elicit.


Conclusions 


Deriving the data for MCDM problems is an activity which requires trade-offs. Thus, it should not come as a surprise that optimization can be used at various stages of this crucial phase in solving many MCDM problems. The previous analysis of some key problems indicates that optimization becomes more critical as the size of the decision problem increases.


Finally, it should be stated here that an in-depth analysis of many key issues in multicriteria decision making theory and practice is provided in [18].
Abstract


Planning and design of evacuation networks is both 
a complex and critically important optimization prob- 
lem for a number of emergency situations. One partic- 
ularly critical class of examples concerns the emergency 
evacuation of chemical plants, high-rise buildings, and 
naval vessels due to fire, explosion or other emergen- 
cies. The problem is compounded because the solution must take into account the fact that human occupants may panic during the evacuation. There must therefore be a well-defined set of evacuation routes, both to minimize the sense of panic and to create safe, effective routes for evacuation. The problem is a highly transient, stochastic, nonlinear, combinatorial optimization problem. We focus on evacuation networks where congestion is a significant problem.
Keywords


Congestion


Introduction 


Evacuation is one of the most perilous, pernicious, and persistent problems faced by humanity. Hurricanes, fires, earthquakes, explosions, and other natural and man-made disasters happen almost daily throughout the world. How to safely evacuate a collection of occupants from an affected region or facility is the fundamental problem faced in evacuation.


Purpose of Chapter 


The purpose of this chapter is both to introduce the reader to the problem of evacuation in its various manifestations and to suggest some alternative approaches to optimizing this process. That life-threatening evacuations
happen as often as they do is somewhat surprising. That 
people often do not know how to safely evacuate in time 
of need is a sad reality. That people must help people 
plan for evacuation is one of the most important activi- 
ties of a research scientist. 


Outline of Chapter 


In this chapter we first introduce the problem in Sect. “Modelling Fundamentals”, where we also describe our fundamental 3-step modelling methodology. In Sect. “Mathematical Models”, we survey a number of different static and dynamic approaches to this problem and present the general approach that has guided our research on it. Finally, in Sect. “Algorithms”, we discuss algorithmic approaches that capture the congested flow of occupants in the network and attempt to define the safest evacuation routes by trading off the different objective performance measures in the network.


Modelling Fundamentals 


The process of an evacuation is captured in the simplified flow chart of Fig. 1. There are essentially five phases that underlie the evacuation process. The first and foremost is a warning bell or siren signaling the occupant population to leave. Unfortunately, one must then react to the warning and recognize the problem at hand, so there is often a great deal of uncertainty associated with this second phase. Thirdly, after the warning is taken seriously, the occupants must decide to evacuate. The first three phases are highly uncertain and transient. Once the occupants decide to evacuate, the general evacuation process gets underway, and this is where the evacuation plans should be followed. Finally, there is a verification phase, where one must account for all the occupants to ensure their safe arrival at the destination.

Evacuation Networks, Figure 1
Processes for an evacuation

Evacuation Networks, Figure 2
Evacuation plan for a hospital complex

As a constructive framework for this chapter on evacuation networks, we establish that the modelling of evacuation problems has three fundamental steps:

Step 1.0 Representation: How should a region (e.g., Fig. 2) or facility be represented or modelled?

Step 2.0 Analysis: Given the model, how should one analyze the evacuation of the occupants, i.e., as a deterministic or a stochastic evacuation process? Which performance measures are crucial to measuring the performance of the evacuation?

Step 3.0 Synthesis: How should one synthesize the results of the analysis step so as to best evacuate the occupants in light of the performance measures?


Representation Stage 


Figure 2 depicts a large hospital campus with many inter-connected buildings, many different levels, and a complex array of circulation passages, and illustrates that the evacuation problem is a difficult one to represent. However, one can begin to model the evacuation process accurately through a network as depicted in Fig. 3. By definition, an evacuation network (graph) $G^{\ell}(V, E)$ consists of a finite set $V = \{v_1, v_2, \ldots, v_N\}$ of $N$ nodes (vertices), together with a finite set $E$ of arcs $e_k = (v_i, v_j)$ defined on nodal pairs $(i, j)$, and an index $\ell$ indicating the level at which the network is defined. The levels correspond to the degree of aggregation inherent in modelling large complex networks. $V$ can further be partitioned into three sets of nodes:


$V_1$ := the occupant source nodes during the evacuation;

$V_2$ := the intermediate nodes during the evacuation;

$V_3$ := the sink or destination nodes of the occupants.


The set of arcs represents the different streets, passageways, or routes from $V_1$ to $V_3$. Associated with each node $v_i \in V$ and each arc $(v_i, v_j) \in E$ are variables and parameters that represent node and arc processing times, node and arc capacities, arrival times to the network, distances, and occupant population sizes at the source nodes.

Figure 3 illustrates the example evacuation network 
with the key congested routes in the evacuation plan- 
ning problem embedded in the network model. 

The Representation Step is often defined in terms of the size and composition of the customer population (infinite, finite, or mixed) and of how the facility under study should be decomposed by $V$, $E$, and $\ell$.

[Figure residue; recovered labels: Minimize Congestion, Minimize Evacuation Time, Minimize Travel Time]

The crucial link between the Representation and Analysis Steps is the complexity (i.e., the number of nodes and arcs) of $G^{\ell}$, which governs the number of equations used in the mathematical model in the Analysis Step. The Representation Step presents an interesting and challenging problem because of the many possible ways of representing regions, facilities, ships, vehicles, and building components.


Analysis Step 


The Analysis Step is the point at which the methodology and mathematical models underlying the flow processes and the algorithmic structure for computing the performance characteristics of $G(V, E)$ come together. Mathematically, we have a network $G(V, E)$, with a finite set of nodes $V$ and edges (arcs) $E$, over which multiple classes of customers (occupants) flow from source(s) to sink(s) while a vector of objective functions $\mathbf{f} = \{f_1(x), f_2(x), \ldots, f_p(x)\}$ is simultaneously extremized subject to a set of constraints on the occupants flowing through the network. Figure 4 captures many of the recognized criteria appropriate in analyzing a network evacuation problem. In our studies, we have often used Minimum Total Evacuation Time and Minimum Total Distance Travelled to capture the evacuation problem. Total Distance Travelled is a suitable surrogate objective for route complexity, since shorter evacuation paths also tend to be less complex and, hopefully, minimizing this measure will abate the occupants' sense of panic. Other objectives might be appropriate given the particular context or decision situation.


Synthesis Step 


Given the performance characteristics determined during the Analysis Step, we can begin to optimize the network topology itself, as well as the routing and resource allocation problems within it:


Topological Network Design (TND): Determination 
of the number, type, and subset of nodes and arcs 
as well as the particular node and arc topology to be 
used for the evacuation. 


Morphological diagram of EEP approaches
[Figure residue; recovered labels: Stochastic Networks (Transient Networks, Steady State Network Models); Deterministic Flow Networks (Static Networks, Dynamic Networks); Simulation Networks; Optimization Problems (TND, RND, and CND Problems)]


Routing Network Design (RND): Determination of 
the routing scheme in both steady-state and real 
time. 

Capacitated Network Design (CND): Determination of the network resources: number of highway lanes, corridor lengths, widths, areas, landing shapes, reception center capacity, configuration, etc.


Mathematical Models 


There are many possible mathematical modelling approaches once our network is constructed, and Fig. 5 represents the range of approaches many research scientists have followed. References are provided for further details. The boldface text along the morphological tree represents the approach suggested in this chapter, which we have applied in many different contexts.

Many mathematical models have appeared in the literature for generating and evaluating evacuation paths for an occupant population [3,9,13,24,31]. Besides the models for estimating flows, newer works are just becoming available for the optimization of evacuation networks, i.e., the TND, RND, and CND problems, and these are illustrated in the third branch of the morphological tree.


[Figure residue; recovered approaches and references: Simulation Approaches [23,12,27,6]; Analytical Transient Approaches; Mean Value Analysis (MVA) [18,25,21,8]; MVA with Finite Waiting Room [5]; Transshipment Models [3]; Dynamic Network Flows [9]; Dynamic Programming [14]; Deterministic Models [2]; Equi-Area Partitioning Model [30]; Multi-Story Assignment Problem [11]; Stochastic Flow Routing [28]; Mixed Integer Programming Models [24]; Steady-State Stairwell Width Optimization [1,7]; Transient Stairwell Width Optimization [29]]


Set Partitioning Model


The model presented below is a variation of one appearing in [13]. It was one of the first to account for the critical features of the stochastic evacuation problem. Another class of models that one might use to formulate the problem is the class of multi-commodity network flow models. Unfortunately, these models do not control the Bernoulli splitting of the occupant population along the different evacuation paths, which is problematic, since splitting the different source populations will engender confusion and create a potential sense of panic among the evacuating occupants. The integer set partitioning model presented below has the desired property of controlling the splitting of flows. The multi-objective model of our routing problem is:


Minimize $\{f_1(x), f_2(x)\}$

where:

(Evacuation Time): $f_1(x) = \sum_i \sum_j \sum_k T_{ijk}\, \lambda_{ijk}\, x_{ijk}$

(Distance Travelled): $f_2(x) = \sum_i \sum_j \sum_k d_{ijk}\, x_{ijk}$

subject to:

(V2 Arcs): $\sum_i \sum_j \sum_k a^{\ell}_{ijk}\, \lambda_{ijk}\, x_{ijk} \le \mu_{\ell} \quad \forall \ell$ (1)

(V3 Sinks): $\sum_i \sum_j \sum_{k \in q} p_{ijk}\, x_{ijk} \le C_q \quad \forall q$ (2)

(Occupant Classes): $\sum_k x_{ijk} = 1 \quad \forall i, j$ (3)

(Routes): $x_{ijk} \in \{0, 1\} \quad \forall i, j, k$ (4)

and where:

$x_{ijk}$ := 1 if the $i$th occupant class from the $j$th source is assigned the $k$th route alternative, and 0 otherwise.

$\lambda_{ijk}$ := the arrival rate of the $ij$th occupant population into the $k$th routing alternative.

$a^{\ell}_{ijk}$ := a data coefficient that equals 1 if the $\ell$th arc is included in the $ijk$th route assignment and equals 0 otherwise.

$\mu_{\ell}$ := maximum allowable traffic service rate along arc $\ell$.

$C_q$ := capacity of sink (destination) node $q$; $k \in q$ ranges over the route alternatives that terminate at sink $q$.

$p_{ijk}$ := occupant population of source $ij$ on the $k$th route alternative.

$T_{ijk}$ := expected evacuation (sojourn) time of the $ijk$th occupant class. These values must be calculated from the particular stochastic model used in the evacuation study; see the discussion below.

$d_{ijk}$ := average distance travelled for the $ijk$th occupant class.


Since our model has two objectives, it makes sense to speak of the Noninferior (NI) set of route alternatives, because the tradeoffs between $f_1$ and $f_2$ naturally underlie the optimal set of solutions we seek. Because of the complexity of solving this model directly, we propose an alternative approach that systematically generates feasible routing alternatives to a relaxed version of our mathematical model while measuring the critical objectives of evacuation time and distance travelled; this approach is demonstrated in the next two sections of the chapter.
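A brute-force sketch can make the noninferior set concrete. The Python below is a minimal illustration under stated assumptions: the two-class instance, its route data, and all function names are hypothetical, $T_{ijk}$ is supplied as a fixed number instead of being computed from a queueing model, and only constraints (1) and (2) are checked explicitly, because choosing exactly one route per class already enforces (3) and (4). Enumeration grows as the product of the route counts, so it is only usable on toy instances.

```python
from itertools import product

# HYPOTHETICAL toy instance. Keys are occupant classes (i, j); each list holds the
# route alternatives k as (T_ijk [s], lambda_ijk [occ/s], d_ijk [m], p_ijk,
# arcs used by the route, sink q reached by the route).
ROUTES = {
    ("class1", "source1"): [(120.0, 0.5, 80.0, 40, {"a", "b"}, "q1"),
                            (90.0, 0.5, 110.0, 40, {"c"}, "q2")],
    ("class1", "source2"): [(100.0, 0.4, 60.0, 30, {"b"}, "q1"),
                            (140.0, 0.4, 65.0, 30, {"d"}, "q2")],
}
MU = {"a": 1.0, "b": 0.8, "c": 1.0, "d": 1.0}  # mu_l: max service rate on arc l
CAP = {"q1": 60, "q2": 80}                      # C_q: capacity of sink q


def objectives(choice):
    """f1 = sum T*lambda*x and f2 = sum d*x for a choice mapping (i, j) -> k."""
    f1 = sum(ROUTES[c][k][0] * ROUTES[c][k][1] for c, k in choice.items())
    f2 = sum(ROUTES[c][k][2] for c, k in choice.items())
    return f1, f2


def feasible(choice):
    """Check arc constraints (1) and sink constraints (2)."""
    arc_load, sink_load = {}, {}
    for c, k in choice.items():
        _, lam, _, pop, arcs, sink = ROUTES[c][k]
        for arc in arcs:
            arc_load[arc] = arc_load.get(arc, 0.0) + lam
        sink_load[sink] = sink_load.get(sink, 0) + pop
    return (all(load <= MU[a] for a, load in arc_load.items())
            and all(load <= CAP[q] for q, load in sink_load.items()))


def noninferior_set():
    """Enumerate every one-route-per-class assignment; keep the nondominated ones."""
    classes = list(ROUTES)
    points = []
    for ks in product(*(range(len(ROUTES[c])) for c in classes)):
        choice = dict(zip(classes, ks))  # x_ijk = 1 exactly for the chosen k
        if feasible(choice):
            points.append((objectives(choice), choice))
    return [(f, ch) for f, ch in points
            if not any(g[0] <= f[0] and g[1] <= f[1] and g != f for g, _ in points)]


for (f1, f2), choice in noninferior_set():
    print(f"f1={f1:.1f}  f2={f2:.1f}  routes={choice}")
```

In this toy instance one of the three feasible assignments is dominated, leaving two that trade $f_1$ against $f_2$; the planner, not the model, picks among them.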


Congestion Models 


The real crux of the evacuation problem is to capture the congestion that naturally occurs when occupants choose the shortest routes to evacuate. Some deterministic measures of congestion are possible, yet stochastic ones are the most accurate, because queueing is a complex, nonlinear phenomenon.


Erlang Loss/Delay Networks 


Fundamentally, each node $S_i$ in the circulation network is an M/G/c/c queue, i.e., there is no waiting room, and the capacity $C$ depends on the square footage area of the circulation segment or on the maximum number of vehicles that can occupy a highway segment [33]. For the sake of argument, let us focus on pedestrian evacuation; later on we will show how our model extends to vehicular congestion. Each occupant in the circulation system consumes approximately 0.2 m² of floorspace, i.e., at most 1/0.2 = 5 occupants per m², and, therefore, the capacity of a circulation system element is:


$C = 5LW$


where $L$ (length) and $W$ (width) are given in meters. Each circulation segment is a representative “building block” for modelling pedestrian movements through the facility. Corridor segments, intersections, landings, stairwells, ramps, and so on represent a network of interconnected M/G/c/c queues; see Fig. 6. The circulation blocks are separated wherever the flow changes direction or level, or where merging and splitting decisions occur. Further, the cardinality of $S$ depends on the configuration and complexity of movement patterns within the facility. Flows through the nodes of $S$, the circulation system of a building, are largely state dependent, in that a customer receives service in the circulation node $S_i$ and this service rate decays with increasing amounts of customer traffic.

Evacuation Networks, Figure 6
Three-dimensional network models

Figure 7 shows a family of curves representing a variety of empirical studies (curves a-f in Fig. 7) that document the decay of the customer service rate as a function of population density in a corridor. Empirical models are also available showing distributions for stairs and other circulation elements with bi- and multi-directional pedestrian flows [10,26]. Finally, there is a set of classical linear and exponential curves relating vehicle speed and vehicle density, captured in Fig. 8. We have utilized this type of vehicular speed/density relation to develop state dependent models for vehicular traffic analysis [12]. In general, the service rate $\mu$ is a function of the velocity $v_i$, which is constant for each individual in the corridor. Thus, it takes

$t_i = L / v_i$ (seconds)

for each person to traverse the corridor, where $i$ is the number of occupants in the circulation system when an individual enters.

Because of the complexity of dynamically updating the service rate as a function of the number of customers within a corridor segment, it becomes extremely difficult to utilize digital simulation models in the design of circulation systems within buildings. Our computational experience in digital simulation of access and egress networks underscores this defect in simulation models. We must, therefore, look to analytical models to aid the network design process if state dependent models are to be effectively utilized. Also, since we are examining the pedestrian/vehicular network as a design problem rather than as a control problem, it makes most sense to look at steady-state measures rather than transient ones.

Evacuation Networks, Figure 8
Empirical distributions of vehicular traffic flows

We have recently developed a generalized M/G/c/c Erlang loss queueing model for service rate decay that can model any service rate distribution (linear, exponential, ...) [4,5,22]. It is a special case of an Erlang loss model. Kelly [14] has treated M/G/c/c state dependent models in his book, but only ones with a linear, increasing function of the number of customers in the queue, whereas we treat the queue with a nonlinear, decreasing service rate; see Fig. 3.

Our M/G/c/c state dependent model dynamically models the flow rate of pedestrians within a corridor as a function of the population within the corridor. Suppose that $G$ is a continuous distribution having density $g$ and failure rate $\mu(t) = g(t) / [1 - G(t)]$. Loosely speaking, $\mu(t)$ is the instantaneous probability intensity that a service $t$ units old will end. The service rate depends on the number of customers in the system: given that there are $n$ people in the system, each server processes work at rate $f(n)$. In other words, if there is an arrival, the service rate changes to $f(n+1)$, and if there is a departure, it changes to $f(n-1)$.

In particular, the probability distribution of the number of occupants in the corridor is given by:

$$P(n \text{ in system}) = \frac{[\lambda E(S)]^{n}}{n!\, f(n) f(n-1) \cdots f(2) f(1)}\, P_0, \qquad n = 1, 2, \ldots, C,$$

where

$$P_0^{-1} = 1 + \sum_{i=1}^{C} \frac{[\lambda E(S)]^{i}}{i!\, f(i) f(i-1) \cdots f(2) f(1)}, \qquad E(S) = \frac{L}{1.5}, \qquad f(n) = \frac{V_n}{V_1},$$

and $E(S)$ is the mean service time of a lone occupant flowing through a corridor of length $L$ with walking speed 1.5 m/s (see Fig. 3). The term $V_n$ is defined as the average walking speed when $n$ people are in the corridor. For the M/G/c/c state dependent model, we have also shown that the departure process (including customers completing service and those that are lost) is a Poisson process with rate $\lambda$ [4,5].
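The distribution above can be evaluated numerically. The following minimal Python sketch assumes a linear speed-density relation $V_n = (A/C)(C + 1 - n)$ with $A = 1.5$ m/s, so that $V_1 = A$ and $V_C = A/C$; this specific linear form is our assumption standing in for "a linear curve" from the text, and an exponential curve could be swapped in. It uses $C = 5LW$, builds each term from the previous one by the factor $\lambda E(S)/(n f(n))$, and reports the blocking probability $P_C$, the admitted throughput $\lambda(1 - P_C)$, and the mean time in the corridor via Little's law. All function names are hypothetical.

```python
WALK_SPEED = 1.5  # m/s, lone-occupant walking speed V_1 used in E(S) = L / 1.5


def corridor_capacity(length_m: float, width_m: float) -> int:
    """C = 5LW: roughly 0.2 m^2 of floor space per occupant."""
    return int(5 * length_m * width_m + 1e-9)  # epsilon guards float round-off


def linear_speed(n: int, capacity: int, a: float = WALK_SPEED) -> float:
    """ASSUMED linear model V_n = (A / C) * (C + 1 - n): V_1 = A and V_C = A / C."""
    return a / capacity * (capacity + 1 - n)


def state_probabilities(arrival_rate: float, length_m: float, width_m: float) -> list[float]:
    """Return [P_0, P_1, ..., P_C] for the state-dependent M/G/C/C corridor."""
    c = corridor_capacity(length_m, width_m)
    v1 = linear_speed(1, c)
    rho = arrival_rate * length_m / v1      # lambda * E(S), with E(S) = L / V_1
    weights = [1.0]                         # unnormalized; the n = 0 weight is 1
    for n in range(1, c + 1):
        f_n = linear_speed(n, c) / v1       # f(n) = V_n / V_1
        # term_n = term_{n-1} * lambda*E(S) / (n * f(n)), i.e. the product formula
        weights.append(weights[-1] * rho / (n * f_n))
    total = sum(weights)                    # equals 1 / P_0
    return [w / total for w in weights]


def performance(arrival_rate: float, length_m: float, width_m: float):
    """Blocking probability, throughput, mean occupancy, and mean time in corridor."""
    p = state_probabilities(arrival_rate, length_m, width_m)
    blocking = p[-1]                              # P_C: arrivals lost when full
    throughput = arrival_rate * (1.0 - blocking)  # admitted arrival rate
    mean_number = sum(n * pn for n, pn in enumerate(p))
    mean_time = mean_number / throughput          # Little's law: W = E[N] / throughput
    return blocking, throughput, mean_number, mean_time


b, th, en, w = performance(arrival_rate=2.0, length_m=10.0, width_m=2.0)
print(f"P_C={b:.4f}  throughput={th:.3f}/s  E[N]={en:.1f}  W={w:.1f} s")
```

For a nearly empty corridor the mean time approaches $E(S) = L/1.5$, which is a quick sanity check on any speed-density curve plugged in.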


Algorithms 


The difficulty in our evacuation planning problem is that we do not know a priori which paths are NI without assessing the congestion in $G(V, E)$. We must iteratively generate candidate paths, assess the congestion in $G(V, E)$, and then iterate again until the tradeoff between distance travelled and evacuation time is acceptable to the planner. This iterative process leads to the algorithm described below. For product form networks, where the time delays in the Expected Savings calculation for re-routing among the alternative Noninferior paths can be computed exactly, the algorithm is guaranteed to find a Noninferior path for re-routing the occupant classes. For non-product form networks, which are the typical case, we can only approximate these time delays; therefore, the algorithm can only guarantee an approximate Noninferior solution. Considering the complexity of the underlying stochastic integer programming problem, this is a reasonable and practical strategy.


K-shortest Paths 


The algorithm that implements the design methodology can be incorporated into any appropriate discrete-event simulation model [6] or analytical model [8] to estimate $f_1$ and $f_2$ and to carry out the evacuation planning/routing analysis. To summarize and focus the efforts in this chapter, an algorithmic description of Steps 1.0, 2.0, and 3.0 and their substeps is presented.
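The candidate routes that Step 3 re-routes among have to come from somewhere, and the text does not say which k-shortest-path procedure is used. As a stand-in, the sketch below uses plain best-first enumeration of simple paths (not Yen's algorithm): with nonnegative arc lengths it returns paths in nondecreasing length, but its worst-case cost is exponential, so it is only meant for small networks. The graph and all names are hypothetical.

```python
import heapq


def k_shortest_simple_paths(graph, source, target, k):
    """Return up to k simple source-target paths in nondecreasing length.

    graph: dict mapping node -> list of (neighbor, nonnegative arc length).
    Best-first enumeration of partial paths; exponential in the worst case.
    """
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue                    # do not extend a path past the exit
        for nxt, length in graph.get(node, []):
            if nxt not in path:         # keep paths simple (no repeated nodes)
                heapq.heappush(heap, (cost + length, path + [nxt]))
    return found


# Hypothetical circulation network; arc lengths in meters.
corridors = {
    "room": [("hall", 10.0), ("stairA", 25.0)],
    "hall": [("stairA", 10.0), ("stairB", 12.0)],
    "stairA": [("exit", 30.0)],
    "stairB": [("exit", 35.0)],
}
for cost, path in k_shortest_simple_paths(corridors, "room", "exit", k=3):
    print(cost, " -> ".join(path))
```

Each returned path would then be evaluated for congestion in Step 2.0 before the distance/time tradeoff of Step 3 is applied.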


Step 1.0: Representation Step. Represent the underlying facility or region as a network $G(V, E)$, where $V$ is a finite set of nodes and $E$ is a finite set of arcs or nodal pairs.

Step 2.0: Analysis Step. Analyze $G(V, E)$ as a queueing network with either a transient or a steady-state model, and compute the total evacuation time of the occupant population, along with the total distance travelled to evacuate, given a set of evacuation paths.


Step 3.0 Synthesis Step 


Step 3.1: Analyze the queueing output from the 
evacuation model and compute the set of 
Noninferior evacuation paths which simultane- 
ously minimize time and distance travelled in 
G(V, E) for each occupant population. 


3.1.1 If the set of NI paths is uniquely optimal, i.e.,


$E_{ijk} = q_{ijk} - [(d'_{ijk} / \bar{v}) + q'_{ijk}] < 0 \quad \forall i, j, k,$


then go to Step 3.2, where:


$E_{ijk}$ := the net increase or decrease in the average egress time per person caused by re-routing occupants to the $(k+1)$th Noninferior route.

$q_{ijk}$ := the sum of the average queue times per person on the original route.

$d'_{ijk}$ := the increased distance travelled on the $(k+1)$th Noninferior route (e.g., if the $k$th Noninferior route is 100 feet and the $(k+1)$th Noninferior route is 120 feet, $d'_{ijk}$ is equal to 20 feet, i.e., 120 minus 100).

$\bar{v}$ := the average travel speed over $d'_{ijk}$.

$q'_{ijk}$ := the sum of the expected queue times per person on the $(k+1)$th Noninferior route.

Otherwise:


3.1.2 If significant queueing (congestion) exists on one or more routes, go to Step 3.3.


Step 3.2: STOP! The NI shortest time/distance 
routes are optimal and identical and total evac- 
uation time, distance and congestion are mini- 
mized. 

Step 3.3: Determine the total number of occupants 
who pass through the queueing area(s) and 
trace them back to their origins. 

Step 3.4: Select the total number of occupants to 
be re-routed from each source node. The total 
number of occupants re-routed is correlated to 
both the size of the queues and the number of 
occupants on each route. In selecting the popu- 
lation, the analyst should strive to achieve uni- 
formity of occupants and queues on each egress 
route. 

Step 3.5: Re-route the population to the $k$th route of the NI set of paths, where $k$ is selected by employing the following formula:

$E_{ijk} = q_{ijk} - [(d'_{ijk} / \bar{v}) + q'_{ijk}] \quad \forall i, j, k$

Step 3.6: Select the largest positive $E^{*}$ for each set of populations to be re-routed, where:


$E^{*} = \max\{E_{ij1}, E_{ij2}, \ldots, E_{ijk}\} \quad \forall\, ij \text{ sources}$


for all possible savings, and then re-run the computer evacuation planning model with the new set of routes by returning to Step 2.0 of the general algorithm. If all $E$s are negative, stop! The current set of NI shortest routes used on the previous iteration is selected.
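The expected-savings test in Steps 3.5 and 3.6 can be sketched for one source population as follows. This minimal Python version is illustrative only: the queue times, extra distances, and speed are hypothetical inputs that in practice would come from the queueing model of Step 2.0, distances are in meters rather than the feet of the text's example, and a zero saving is treated like a negative one (no re-routing).

```python
def expected_saving(q_orig, extra_distance, speed, q_alt):
    """E_ijk = q_ijk - [(d'_ijk / v) + q'_ijk], in seconds per person.

    Positive values mean that moving to the alternative route saves time.
    """
    return q_orig - (extra_distance / speed + q_alt)


def best_reroute(q_orig, alternatives, speed):
    """Pick the alternative with the largest positive saving E* (Steps 3.5-3.6).

    alternatives: list of (k, extra_distance, q_alt) for one ij population.
    Returns (k, E*), or None when no alternative gives a positive saving,
    in which case the current NI routes are kept.
    """
    if not alternatives:
        return None
    best_e, best_k = max(
        (expected_saving(q_orig, d, speed, q), k) for k, d, q in alternatives
    )
    return (best_k, best_e) if best_e > 0 else None


# Hypothetical numbers: 60 s of queueing per person on the current route,
# travel speed 1.5 m/s, and two longer NI alternatives (k = 2 and k = 3).
print(best_reroute(60.0, [(2, 12.0, 15.0), (3, 30.0, 5.0)], speed=1.5))  # (2, 37.0)
```

After re-routing, the model is re-run from Step 2.0, because moving occupants changes the queue times on both the old and the new routes.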


Other Algorithms 


Besides the k-shortest path approach, one might utilize a turn-penalty algorithm to guide the process of determining the evacuation paths. This is probably very appropriate in vehicular evacuation schemes. Another approach that seems quite viable would be to define a set of arc-disjoint paths, since this would tend to completely separate the occupant congestion along the paths. We have not experimented with these approaches to define the evacuation routes, but their use might be quite appropriate in the future.


Summary and Conclusion 


We have given some insights into the performance modelling and optimization problems associated with evacuation networks. As this application area matures and more research is devoted to it, more theoretical and algorithmic issues, and more progress, will emerge.
Most NP-hard combinatorial optimization problems cannot be solved to optimality in practice. Therefore, heuristic techniques have to be used to obtain solutions of high quality. There exist different approaches to designing a heuristic algorithm, such as tabu search and genetic algorithms. The latter solution method belongs to a wider class of algorithms, called evolutionary algorithms, that handle a set of several solutions. Within this class, the best known algorithms applied to combinatorial optimization problems are genetic algorithms (cf. Genetic algorithms) and ant systems. For a general presentation, see [22,72] for genetic algorithms and [12,23] for ant systems.

This article reviews the evolutionary algorithms used in combinatorial optimization up to 1998. For a number of combinatorial problems, the main papers that present an evolutionary algorithm for that problem are referenced, and some short remarks are given. While it is difficult to provide a precise definition of an evolutionary algorithm, the term will be used here as a synonym of population-based algorithm: an algorithm that evolves several solutions, in particular by exchanging some kind of information between them. Algorithms that iteratively modify a single solution in order to obtain a good one (like tabu search, or genetic algorithms with a ‘population’ of size 1) will not be considered evolutionary algorithms.
The Traveling Salesman Problem


The traveling salesman problem (or TSP) is probably the problem to which the largest number of evolutionary algorithms have been applied. It consists of determining a shortest tour visiting each of the given cities exactly once. A very complete survey of local search approaches to this problem has been provided by D.S. Johnson and L.A. McGeoch [51], while J.-Y. Potvin [70] compared several genetic algorithms for the TSP. In
[51], the authors recommend different solving tech- 
niques depending on the quality of the solution desired 
and the time available. Genetic algorithms or ant sys- 
tems are a good choice if enough running time is al- 
lowed and good solutions are needed. With similar run- 
ning times, the iterated Lin-Kernighan algorithm (or 
ILK) yields better results but is more complex to imple- 
ment. In ILK, a single solution instead of a population 
of individuals is considered and this method will there- 
fore not be referred to as an evolutionary algorithm. If 
there is no restriction on the running time, the best re- 
sults can be obtained by genetic algorithms based on 
ILK. 

An important breakthrough in the field of evolutionary algorithms for the TSP was the paper [67] by H. Mühlenbein, M. Gorges-Schleuter and O. Krämer. In their algorithm, implemented on a parallel machine, a solution was allowed to mate only with certain other solutions, and some optimization technique was applied to the offspring. Indeed, the use of a local search algorithm to improve the created offspring is a necessary condition for an evolutionary algorithm to be efficient.
Moreover, they designed a crossover specific to the TSP, called MPX (maximum preservative crossover). It consists of copying a segment of a certain length from a first parent into the offspring and adding cities consecutively from the second parent according to some rules. This crossover is very suitable for the TSP, as shown in [66]. Further research studied the impact of the different elements on the results and improved the quality of the solutions obtained [7,44,89]. Several other crossovers, most of them using two parents,
have been suggested by various authors. In particular, 
B. Freisleben and P. Merz proposed [37,38] the dis- 
tance preserving crossover (or DPX): An offspring is 
created by keeping the edges that are found in both 
parents, and greedily reconnecting the different pieces without using the edges contained in only one parent. They obtained a very efficient algorithm that won both the ATSP (asymmetric TSP) and the TSP competitions at the First International Contest in Evolutionary Optimization [6]. They further improved their
algorithm, in terms of speed and quality of solutions, 
in [39]. Their use of an edge-preserving crossover and 
of a hill-climbing algorithm illustrates important el- 
ements necessary to obtain an efficient genetic algo- 
rithm for TSP. These elements have been put forward 
in different comparisons between various genetic al- 
gorithms for TSP [70,78], together with the neces- 
sity to split the population into several subpopulations 
for solving large instances (more than a few hundred 
cities). 
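The DPX idea described above (keep the edges shared by both parents, then reconnect the pieces greedily while avoiding edges that appear in only one parent) can be sketched as follows. This is a simplified reading, not Freisleben and Merz's implementation: it assumes Euclidean city coordinates, reconnects fragments by nearest endpoint, falls back to a parent edge only when every option is one, and omits the local search that the text identifies as essential. Function names are hypothetical.

```python
import math


def tour_edges(tour):
    """Undirected edge set of a closed tour (list of city labels)."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}


def dpx_crossover(parent1, parent2, coords):
    """Simplified DPX-style crossover: keep shared edges, reconnect greedily."""
    e1, e2 = tour_edges(parent1), tour_edges(parent2)
    common = e1 & e2
    forbidden = (e1 | e2) - common  # edges found in only one parent

    # Cut parent1 at every edge it does not share with parent2.
    fragments, current = [], [parent1[0]]
    for a, b in zip(parent1, parent1[1:]):
        if frozenset((a, b)) in common:
            current.append(b)
        else:
            fragments.append(current)
            current = [b]
    fragments.append(current)
    # If the closing edge (last -> first) is shared, glue the last fragment onto the first.
    if len(fragments) > 1 and frozenset((parent1[-1], parent1[0])) in common:
        fragments[0] = fragments.pop() + fragments[0]

    # Greedy reconnection from the current tour end to the nearest fragment end,
    # preferring connecting edges that appear in neither parent.
    tour = fragments.pop(0)
    while fragments:
        end = tour[-1]
        candidates = []
        for idx, frag in enumerate(fragments):
            for flip in (False, True):
                head = frag[-1] if flip else frag[0]
                edge = frozenset((end, head))
                dist = math.dist(coords[end], coords[head])
                candidates.append((edge in forbidden, dist, idx, flip))
        _, _, idx, flip = min(candidates)  # False sorts first: allowed edges win
        frag = fragments.pop(idx)
        tour.extend(reversed(frag) if flip else frag)
    return tour


pts = {i: (i % 4, i // 4) for i in range(12)}  # hypothetical 4 x 3 grid of cities
child = dpx_crossover(list(range(12)), [0, 2, 4, 6, 8, 10, 1, 3, 5, 7, 9, 11], pts)
print(child)
```

By construction every edge common to both parents survives in the child, which is the edge-preservation property the paragraph above emphasizes.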

The first presentation of ant colony optimization (ACO) [12] used the TSP as illustration, and this problem remains the most frequently used application problem in work on ant colony optimization. The initial ACO system, named ant system, has been extended
to what is called ant colony system (ACS). A descrip- 
tion of this algorithm can be found in [23] by M. Dorigo 
and L.M. Gambardella. In the same paper, local search 
has been added to ACS and the resulting algorithm 
has been applied to ATSP and TSP. The results re- 
ported are better in [39] for TSP, but are better in [23] 
for ATSP. Another proposed extension of ant system, 
called MAX-MIN ant system [79], consists in introduc- 
ing explicit maximum and minimum values for the trail 
factors on the arcs. Good results are obtained with such 
an algorithm when local search is added. 
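A minimal sketch of the MAX-MIN idea of explicit trail limits is given below. It assumes a symmetric TSP, a fixed evaporation rate `rho`, a deposit of $1/L$ made only along the best tour, and fixed limits `tau_min` and `tau_max`; the published method sets these limits more carefully, so here they are plain parameters, and the function name and defaults are hypothetical.

```python
def mmas_update(tau, best_tour, best_length, rho=0.02, tau_min=0.01, tau_max=1.0):
    """One pheromone update in the spirit of MAX-MIN ant system.

    tau: dict mapping frozenset({u, v}) -> trail value on that (undirected) arc.
    Only the best tour deposits; every trail is then clamped to [tau_min, tau_max].
    """
    n = len(best_tour)
    best_edges = {frozenset((best_tour[i], best_tour[(i + 1) % n])) for i in range(n)}
    for edge in tau:
        value = (1.0 - rho) * tau[edge]                # evaporation
        if edge in best_edges:
            value += 1.0 / best_length                 # deposit by the best tour
        tau[edge] = min(tau_max, max(tau_min, value))  # explicit trail limits
    return tau


# Hypothetical 4-city instance with all trails starting at tau_max.
tau = {frozenset((u, v)): 1.0 for u in range(4) for v in range(u + 1, 4)}
for _ in range(300):
    mmas_update(tau, best_tour=[0, 1, 2, 3], best_length=4.0)
print(sorted((tuple(sorted(e)), round(t, 3)) for e, t in tau.items()))
```

The lower limit keeps every arc selectable and the upper limit stops a few arcs from dominating, which is the usual rationale for bounding the trails.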


The Vehicle Routing Problem 


The most studied extension of the vehicle routing prob- 
lem (VRP) is the one with time windows (VRPTW). 
In order to solve this problem, a two-phase heuristic, 
called GIDEON, has been proposed in [84]. The first 
phase uses a genetic algorithm to cluster the customers, 
and the solutions obtained are improved by local optimization techniques in the second phase. This procedure was first improved in [83] and then extended in [85]. In this last paper, S.R. Thangiah, I.H. Osman and T. Sun present several metaheuristics, all having a first phase similar to the one in GIDEON. These algorithms have been compared to several other heuristics and showed very good results on test problems taken from the literature. Further improvements are still needed for solving problems with large time windows. For such problems, a heuristic based on simulated annealing and a population-based algorithm called GENEROUS [71] are shown to be a little more efficient. The
latter is not a standard genetic algorithm since it does 
not represent solutions by chromosomes, but it never- 
theless handles several solutions and uses a recombina- 
tion operator. An adaptive memory procedure, in con- 
junction with tabu search, has also been applied to this 
problem [75]. 

Improvements of the GIDEON approach with lo- 
cal post-optimization procedures have also been used 
for the VRP with time deadlines. A comparison done in 
[86,87] with two other heuristics shows that the cluster- 
first route-second algorithm with a genetic algorithm in 
the first phase performs well for problems in which the 
customers are distributed uniformly and/or with short 
time deadlines. 


The Quadratic Assignment Problem 


The quadratic assignment problem (or QAP) allows the 
modelling of many practical problems in location
science, but can be solved optimally only for very small 
instances. Therefore different heuristics have been pro- 
posed for this problem. Several of them are compared in 
[13,81]. For real-world problems (irregular and struc- 
tured), the genetic hybrid by C. Fleurent and J.A. Fer- 
land in [33] appears to be one of the most efficient al- 
gorithms [81]. Based on a standard genetic algorithm 
with solutions encoded as permutations [82], this ge- 
netic hybrid applies a robust tabu search on the off- 
springs and was able to find several new best solutions 
on some benchmark problems. 

The ant colony optimization approach has also been 
considered, first in [64]. This ant system algorithm, 
hybridized with a local search, has been improved in 
[62,63] and provides very good results. A different ACO 
approach, where at each iteration the solutions are 
modified instead of newly constructed, has been pro- 
posed in [40]. This algorithm, also hybridized with a lo- 
cal search procedure, yields better results on real-world 
problems than the genetic hybrid of [33], but is not 
competitive on random problems. A further promis- 
ing method, based on scatter search, has been presented 
in [19]. 


The Satisfiability Problem (SAT) 


The problem of finding a truth assignment for the
variables that makes a propositional formula true is
probably the best known, and historically the first,
NP-complete problem. Yet only a few evolutionary
algorithms for SAT can be found in the literature. After
a straightforward approach in [52], a rather different
solution representation was proposed in [45]. The
drawback of this method, despite its adapted operators,
is that it considerably increases the size of the
individuals compared to the coding ‘one gene for one
variable’. This last coding was used in [35], together
with a SAT-adapted crossover (the objective function
being simply the number of satisfied clauses). However,
the evolutionary algorithm thus obtained was not able to
compete with a tabu search (also presented in [35]). The
tabu search-genetic hybrid (where some iterations of tabu
search are used for mutation) is computationally
expensive, but it is able to solve large instances that
tabu search alone cannot solve. For smaller instances,
the hybridization is not useful.
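To make the ‘one gene for one variable’ coding concrete, the following Python sketch evaluates an individual (a 0/1 list with one entry per variable) by the number of satisfied clauses, which is the objective mentioned above. The signed-integer clause format and the function name are assumptions chosen for illustration. This is not the code of [35].

```python
from typing import List, Sequence

# A clause is a list of nonzero integers: v stands for "variable v is true",
# -v for "variable v is false" (a DIMACS-like convention assumed here).
Clause = List[int]


def satisfied_clauses(individual: Sequence[int], clauses: Sequence[Clause]) -> int:
    """Fitness of a bit-string individual: the number of satisfied clauses.

    individual[v - 1] holds the truth value (0 or 1) of variable v.
    """
    count = 0
    for clause in clauses:
        # A clause is satisfied as soon as one of its literals is true.
        if any((lit > 0) == (individual[abs(lit) - 1] == 1) for lit in clause):
            count += 1
    return count


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
print(satisfied_clauses([1, 0, 0], formula))  # 2: the clause (x2 or x3) is violated
print(satisfied_clauses([1, 1, 0], formula))  # 3: all clauses are satisfied
```

Because the genome length equals the number of variables, this coding keeps individuals small. That size advantage is what the alternative representation of [45] gives up.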

Another heuristic approach to SAT consists of assigning
weights to the clauses and minimizing the sum of the
weights of the unsatisfied clauses. These weights are
adapted during the run depending on the ‘difficulty’ of
each clause. This mechanism has been used in
evolutionary algorithms in [25] and [90], but in both
cases the best results turned out to be obtained with
a ‘population’ of size 1. Such an algorithm is therefore
no longer regarded as an evolutionary algorithm.
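The clause-weighting mechanism can be sketched as follows. The cost to be minimized is the total weight of the unsatisfied clauses. After each step, the weights of the clauses that are still unsatisfied are increased. The additive increment of 1 is an assumed, illustrative rule, and the schemes in [25] and [90] may adapt the weights differently.

```python
from typing import List, Sequence

Clause = List[int]  # signed literals, DIMACS-like (an assumption)


def is_satisfied(assignment: Sequence[int], clause: Clause) -> bool:
    """True if at least one literal of the clause is true."""
    return any((lit > 0) == (assignment[abs(lit) - 1] == 1) for lit in clause)


def weighted_cost(assignment: Sequence[int], clauses: Sequence[Clause],
                  weights: Sequence[float]) -> float:
    """Sum of the weights of the unsatisfied clauses (to be minimized)."""
    return sum(w for c, w in zip(clauses, weights) if not is_satisfied(assignment, c))


def update_weights(assignment: Sequence[int], clauses: Sequence[Clause],
                   weights: Sequence[float], increment: float = 1.0) -> List[float]:
    """Make the still-unsatisfied ('difficult') clauses more expensive."""
    return [w + increment if not is_satisfied(assignment, c) else w
            for c, w in zip(clauses, weights)]


clauses = [[1, -2], [2, 3], [-1, -3]]
weights = [1.0, 1.0, 1.0]
x = [1, 0, 0]
print(weighted_cost(x, clauses, weights))      # 1.0
weights = update_weights(x, clauses, weights)  # [1.0, 2.0, 1.0]
print(weighted_cost(x, clauses, weights))      # 2.0
```

Repeatedly violated clauses become heavier, so a search driven by this cost is pushed toward satisfying them. With a population of size 1, as noted above, this reduces to a weighted local search.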


The Set Covering and Set Partitioning Problems 


The set covering problem (SCP) is a zero-one integer
programming problem whose constraints are all of the
type $\sum_j a_{ij} x_j \ge 1$ with zero-one coefficients
$a_{ij}$. It is a well-known problem that has also been
used to study penalty functions in genetic algorithms
[3,74]. Different genetic algorithm approaches have been
proposed in the literature (see for example [50,60,61]),
and a very efficient one was presented by J.E. Beasley
and P.C. Chu in [5]. This algorithm uses a binary
representation of the solutions and a repair operator
that preserves the feasibility of the individuals and
improves the solutions. Moreover, it introduces
a variable mutation rate. Results on standard test
problems with up to 1000 constraints and 10,000
variables show the efficiency of this algorithm, which
was able to improve the best-known result on some of the
larger instances. The same paper shows no significant
difference between various crossovers.
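A repair operator of the kind used in [5] can be sketched as below. Uncovered rows are first covered greedily, and then redundant columns are dropped. The concrete rules (cost per newly covered row when adding, most expensive column first when dropping) and all names are illustrative assumptions, not a transcription of Beasley and Chu's operator. The instance is assumed to be feasible, meaning every row is covered by some column.

```python
from typing import List, Sequence, Set


def repair_cover(solution: Set[int], cover: Sequence[Set[int]],
                 costs: Sequence[float], n_rows: int) -> Set[int]:
    """Make a 0-1 SCP solution feasible, then remove redundant columns.

    cover[j] is the set of rows covered by column j; solution is the set of
    selected columns. Returns a new set of columns covering every row.
    """
    sol = set(solution)
    covered: List[int] = [0] * n_rows  # number of selected columns covering each row
    for j in sol:
        for r in cover[j]:
            covered[r] += 1
    # Phase 1: cover each uncovered row with the column of lowest cost per
    # newly covered row (the row itself is uncovered, so the ratio is finite).
    for i in range(n_rows):
        if covered[i] == 0:
            candidates = [j for j in range(len(cover)) if i in cover[j]]
            best = min(candidates,
                       key=lambda j: costs[j] / sum(1 for r in cover[j] if covered[r] == 0))
            sol.add(best)
            for r in cover[best]:
                covered[r] += 1
    # Phase 2: drop columns whose rows are all covered twice, most expensive first.
    for j in sorted(sol, key=lambda j: -costs[j]):
        if all(covered[r] > 1 for r in cover[j]):
            sol.remove(j)
            for r in cover[j]:
                covered[r] -= 1
    return sol


cover = [{0, 1}, {1, 2}, {0, 1, 2}, {2}]
costs = [1.0, 1.0, 3.0, 1.0]
print(repair_cover(set(), cover, costs, 3))      # {0, 1}
print(repair_cover({0, 1, 2}, cover, costs, 3))  # {0, 1}: column 2 was redundant
```

Because the repaired individual is always feasible, the genetic algorithm never has to tune a penalty term for uncovered rows.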

The set partitioning problem (SPP) is also a zero-one
integer programming problem. It differs from the SCP in
that the constraints are equalities instead of
inequalities. Relatively few heuristics have been
developed for this problem. D. Levine investigated
sequential and parallel genetic algorithms for the SPP
[59]. His best algorithm was a genetic algorithm in an
island model, hybridized with a local search heuristic.
However, this algorithm remained less efficient, both in
solution quality and in running time, than the branch
and cut approach of [49]. Some of the difficulties met by
his algorithm were due to the penalty term for infeasible
solutions in the fitness function. To overcome them,
other authors split the single fitness measure into two
distinct parts (the objective function and a measure of
‘infeasibility’) [10]. By adapting the parent selection
method to this modification and also using an
improvement operator, they obtained a better genetic
algorithm. For the problems they considered, however,
it is still unable to compete with a commercial mixed
integer solver.


The Knapsack Problem 


The multidimensional (zero-one) knapsack problem is
equivalent to the zero-one integer programming problem
with nonnegative coefficients. Only a few papers have
tried to solve this problem with evolutionary
algorithms. The first such algorithms did not give
high-quality results and were not competitive with other
heuristics [56,88], but the quality has since improved.
The genetic algorithms presented in [11,48], both
working only with feasible solutions, are able to obtain
optimal solutions on standard test problems (instances
with at most 105 variables and 30 constraints). In [11],
Chu and Beasley proposed some larger test problems (up
to 500 variables and 30 constraints) without known
optimal solutions, and used them for a comparison with
other heuristics. Their genetic algorithm uses a ‘repair’
operator specific to this problem to ensure good
feasible offspring. It obtained high-quality results but
also needed more computation time (on the same machine,
about one hour for the genetic algorithm against a few
seconds for the other heuristics).
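The role of a ‘repair’ operator for the multidimensional knapsack problem can be illustrated with a DROP/ADD sketch. Items are removed until every capacity constraint holds, and then items are added back as long as they fit. Ranking items by profit divided by their capacity-normalized weight is an assumed simplification. Chu and Beasley [11] use their own utility ratios, which are not reproduced here. All weights are assumed to be positive.

```python
from typing import List, Sequence


def repair_mkp(x: Sequence[int], profits: Sequence[float],
               weights: Sequence[Sequence[float]], capacities: Sequence[float]) -> List[int]:
    """DROP/ADD repair for the 0-1 multidimensional knapsack.

    weights[i][j] is the consumption of resource i by item j.
    """
    m, n = len(capacities), len(profits)
    # Illustrative utility: profit per unit of capacity-normalized weight.
    ratio = [profits[j] / sum(weights[i][j] / capacities[i] for i in range(m))
             for j in range(n)]
    order = sorted(range(n), key=lambda j: ratio[j])  # least useful first
    x = list(x)
    load = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(m)]

    def fits(j: int) -> bool:
        return all(load[i] + weights[i][j] <= capacities[i] for i in range(m))

    # DROP phase: remove the least useful items until all constraints hold.
    for j in order:
        if all(load[i] <= capacities[i] for i in range(m)):
            break
        if x[j]:
            x[j] = 0
            for i in range(m):
                load[i] -= weights[i][j]
    # ADD phase: insert the most useful items that still fit.
    for j in reversed(order):
        if not x[j] and fits(j):
            x[j] = 1
            for i in range(m):
                load[i] += weights[i][j]
    return x


profits = [10, 6, 4]
weights = [[5, 4, 3], [4, 3, 1]]
capacities = [8, 6]
print(repair_mkp([1, 1, 1], profits, weights, capacities))  # [1, 0, 1]
```

The DROP phase restores feasibility and the ADD phase uses any leftover capacity. Every offspring therefore stays feasible, in line with algorithms that "work only with feasible solutions".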


The Bin Packing Problem 


The standard one-dimensional bin packing problem
consists of packing items of given sizes into as few
bins of given capacity as possible. Many evolutionary
algorithms proposed for this problem (genetic algorithms
and evolution strategies, see for example [16,57,77])
performed worse than a simple heuristic such as first
fit decreasing. E. Falkenauer and A. Delchambre then
suggested in [30] a genetic algorithm designed for
grouping problems: the grouping genetic algorithm (GGA).
In this algorithm, solutions are represented by
chromosomes with two parts. The item part encodes the
bin of each item, and the group part, of variable
length, encodes the bin identifiers used. The crossover,
mutation and inversion operators have been adapted to
this encoding. Instead of simply using the number of
bins, the authors designed a fitness function that also
takes into account how full each bin is. With this
approach, they obtained very satisfactory results. The
arguments presented for this new encoding are discussed
by C. Reeves in [73]. In the same paper, a hybrid
genetic algorithm is presented, where solutions are
represented by permutations and decoded using heuristics
such as first fit and best fit. The results obtained are
roughly similar to those in [30]. A problem size
reduction heuristic, similar to the reduction process
used in [16], has also been introduced in this genetic
algorithm. According to Falkenauer [29], this reduction
violates the search strategy of the genetic algorithm,
and he therefore prefers the GGA's crossover, which has
the same goal of propagating promising bins. In the same
paper, the GGA is improved by the introduction of local
optimization inspired by the dominance criterion of
[65]. The new algorithm is compared with an efficient
branch and bound algorithm and gives excellent results.
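For reference, the first fit decreasing heuristic mentioned above can be sketched as follows, together with a fill-based fitness in the spirit of the GGA. The fitness used here is the average over bins of (fill/capacity)^k with k = 2. This is the form usually associated with Falkenauer's work, but the exact formula and exponent should be treated as assumptions rather than a transcription of [30].

```python
from typing import List, Sequence


def first_fit_decreasing(sizes: Sequence[float], capacity: float) -> List[List[float]]:
    """Pack items largest first, each into the first bin with enough room."""
    bins: List[List[float]] = []
    loads: List[float] = []
    for s in sorted(sizes, reverse=True):
        for b, load in enumerate(loads):
            if load + s <= capacity:
                bins[b].append(s)
                loads[b] += s
                break
        else:  # no existing bin can take the item: open a new one
            bins.append([s])
            loads.append(s)
    return bins


def fill_fitness(bins: Sequence[Sequence[float]], capacity: float, k: float = 2.0) -> float:
    """Average of (fill / capacity) ** k over the bins; full bins score highest."""
    return sum((sum(b) / capacity) ** k for b in bins) / len(bins)


packing = first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)
print(packing)                    # [[8, 2], [4, 4, 1, 1]]
print(fill_fitness(packing, 10))  # 1.0: both bins are completely full
```

Unlike a plain bin count, this fitness separates packings that use the same number of bins. For example, [[5, 5]] scores 1.0 while [[5], [5]] scores 0.25, which rewards solutions that concentrate items in well-filled bins.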

Extensions of the standard bin packing problem, 
like the two-dimensional bin packing problem, have 
also been considered with evolutionary algorithms 
[15,69,77]. An overview of these variations is presented 
in [43]. 
Graph Coloring


The graph coloring problem is a well-known problem in
graph theory. It consists of determining the smallest
number of colors needed to color the vertices of a graph
so that no two adjacent vertices have the same color.
L. Davis was the first author to propose an evolutionary
algorithm for this problem [22]. In fact, he considered
a graph with weights on the vertices and an integer k,
and designed a hybrid genetic algorithm for finding
a partial k-coloring whose colored vertices have maximum
total weight. In this algorithm, individuals are
represented as permutations of the vertices of the
graph. This order-based encoding is not very efficient,
as shown by Fleurent and Ferland in [34]. In that paper,
they also present hybrid genetic algorithms that use
string-based encodings of the solutions to find
a coloring in k colors with as few conflicting edges
(edges with both ends of the same color) as possible.
They consider different crossovers, including
a graph-adapted one, and hybridize the genetic algorithm
with a simple local search or with tabu search
(a modified version of [46]). The results on random
graphs $G_{n,0.5}$ improve the previous best results.
For graphs with up to 300 vertices, their tabu
search-genetic hybrid and their tabu search give similar
results, but the latter needs much less time. For larger
graphs (500 or 1000 vertices), the running time becomes
prohibitive, and both the evolutionary algorithm and the
tabu search must be used within a different approach
(determining large stable sets and coloring the residual
graph). The tests on 450-vertex Leighton graphs (with
known chromatic numbers) showed that the tabu
search-genetic hybrid outperforms the tabu search on
about half of the instances, while the opposite is true
for the remaining instances. The hybrid algorithm was
able to find an optimal solution for two instances (out
of twelve) that could not be solved by tabu search alone.
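With a string-based encoding, an individual simply lists one color per vertex, and the quantity minimized in [34] is the number of conflicting edges. The following is a minimal sketch of that measure. It also includes a one-vertex recoloring move of the kind a simple local search could use. The move and all names are illustrative assumptions.

```python
from typing import Sequence, Tuple

Edge = Tuple[int, int]


def conflicting_edges(coloring: Sequence[int], edges: Sequence[Edge]) -> int:
    """Edges whose two endpoints share a color; 0 means a proper k-coloring."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def best_color(vertex: int, coloring: Sequence[int], edges: Sequence[Edge], k: int) -> int:
    """Color in 0..k-1 that minimizes the conflicts of `vertex` with its neighbours."""
    clashes = [0] * k
    for u, v in edges:
        if u == vertex:
            clashes[coloring[v]] += 1
        elif v == vertex:
            clashes[coloring[u]] += 1
    return min(range(k), key=clashes.__getitem__)


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
coloring = [0, 1, 0, 1]
print(conflicting_edges(coloring, edges))  # 1: edge (0, 2)
coloring[2] = best_color(2, coloring, edges, k=3)
print(coloring, conflicting_edges(coloring, edges))  # [0, 1, 2, 1] 0
```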

Another evolutionary algorithm has been proposed in
[18], with a graph-adapted crossover that takes into
account how ‘close’ a vertex is to conflicting edges.
The improvement algorithm applied to the offspring is
a steepest descent method, instead of a tabu search as
in [34]. Despite this less sophisticated method, their
algorithm gives results similar to those obtained by the
hybrid algorithm in [34]. Moreover, the latter gives
worse results when the tabu search is replaced by
a simple descent method.

Concerning ant colony optimization, a first approach to
graph coloring has been proposed in [17], but the
results obtained still need improvement.


Other Graph Problems 
Maximum Clique 


The problem of determining a maximum clique (complete
subgraph) in a graph is equivalent to the problem of
determining a minimum vertex cover or a maximum stable
set in the complementary graph. A first genetic
algorithm, hybridized with a tabu search, was proposed
by Fleurent and Ferland in [35], but they show that
their tabu search alone gives similar results in
a shorter time. In these algorithms, a solution is a set
of vertices of given size, and the objective function
measures how many edges are missing for the set to be
a clique. Improving an algorithm of [2], E. Balas and
W. Niehaus [4] proposed a genetic algorithm (without an
improvement algorithm applied to the offspring) for both
the maximum cardinality and the maximum weight clique
problems, in which an individual is a clique. In this
algorithm, the recombination operation (‘crossover’) is
designed specifically for this problem and taken from
another heuristic. The results obtained on the DIMACS
benchmark graphs are very good, and similar in solution
quality to those obtained in [35]. A different fitness
function has been suggested in [8] and included in
a hybrid genetic algorithm using a local optimization
step. The fitness value associated with a set of
vertices is a weighted combination of the size of the
set and the number of edges missing to form a clique,
and the weights are modified during the run according to
a simple rule. Despite the introduction of
a preprocessing step that determines the order of the
vertices on the chromosome, this algorithm is less
efficient (possibly because of its use of 2-point
crossover).
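The penalized fitness described above combines the set size with the number of missing edges, and can be sketched as follows. The linear form (size minus penalty times missing edges) and the fixed penalty value are assumptions. In [8] the weights are changed during the run by a rule not given here.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def missing_edges(vertices: Iterable[int], edges: Set[FrozenSet[int]]) -> int:
    """Number of vertex pairs of the set that are not adjacent (0 for a clique)."""
    return sum(1 for u, v in combinations(sorted(vertices), 2)
               if frozenset((u, v)) not in edges)


def penalized_fitness(vertices: Set[int], edges: Set[FrozenSet[int]], penalty: float) -> float:
    """Set size minus a penalty per missing edge; to be maximized."""
    return len(vertices) - penalty * missing_edges(vertices, edges)


# 4-cycle 0-1-2-3-0 plus the chord {0, 2}
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]}
print(missing_edges({0, 1, 2}, edges))              # 0: a clique
print(missing_edges({0, 1, 2, 3}, edges))           # 1: the pair {1, 3}
print(penalized_fitness({0, 1, 2, 3}, edges, 2.0))  # 2.0
```

The penalty weight decides whether a larger non-clique can outscore a smaller clique. Here {0, 1, 2, 3} scores 2.0 against 3 for the clique {0, 1, 2} with penalty 2.0, but 3.5 with penalty 0.5. This sensitivity is one reason for adapting the weights during the run.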


Graph Partitioning 


Evolutionary algorithms are rather seldom used to tackle
the k-way graph partitioning problem (partitioning
a (weighted) graph into k equal-sized parts), even
though the graph bisection problem (the case k = 2) is
sometimes used to illustrate various ingredients of
genetic algorithms ([9,54]). For the general k-way graph
partitioning problem, different problem-oriented
operators are introduced and studied in a parallel
genetic algorithm in [58]. In this algorithm, the
population is composed only of feasible solutions.
Another approach has been proposed in [76], where the
population is split into two halves: one containing only
feasible solutions and the other only infeasible ones.
This algorithm uses the same encoding scheme and
crossover operator as [58], but has not been applied to
similar instances of the problem. In general, genetic
algorithms give good results on partitioning problems,
but at a very high computational cost.


Miscellaneous 
Sequencing and Scheduling 


The best-known sequencing and scheduling problems are
the flow-shop, job-shop and open shop problems. The
first paper applying an evolutionary algorithm to such
a problem is [21]. Later, several other genetic
algorithms were proposed (see for example [36,80]). One
of the first efficient evolutionary algorithms for
job-shop problems was presented in [68] and improved in
[20,91]. Comparisons with other heuristics on benchmark
problems show that sophisticated genetic algorithms
(with problem-adapted crossovers and hybridization)
yield the best results for flow-shop and job-shop
problems [1,24,42]. Open shop problems have attracted
less attention from researchers in the evolutionary
algorithms field, but a genetic algorithm has been
proposed in [31,32]. An ant colony approach to job-shop
problems has also been tested in [14], but it gave worse
results than known genetic algorithms.


Steiner Trees 


Only very few works deal with Steiner trees and
evolutionary algorithms, and they consider different
variants of the problem. The first paper [47] proposes
a genetic algorithm with local optimization for
determining minimum Steiner trees in the Euclidean
plane. A solution is represented by the coordinates of
the Steiner points. A comparison with simulated
annealing and the Rayward-Smith-Clare algorithm shows no
significant differences. The rectilinear Steiner problem
has been addressed in [53] with a specific coding and an
adapted crossover. The minimal Steiner tree problem in
graphs has attracted a little more interest. A standard
genetic algorithm (with bit strings as chromosomes) that
gave good results on the sparse graphs tested was
proposed in [55]. Later, H. Esbensen and P. Mazumder
[28] designed a genetic algorithm in which the encoding
method is based on the distance network heuristic.
Improvements have been made in [26] and [27], which also
compare different algorithms. However, this genetic
algorithm is not competitive with an efficient tabu
search such as the one presented in [41].


Conclusion 


In this article, references have been given to some of
the evolutionary approaches proposed up to 1998 for
various combinatorial problems. A general remark on
these solution methods is that evolutionary algorithms
in general, and genetic algorithms in particular, are
not efficient for such problems if implemented too
naively. To obtain an algorithm with good performance,
the basic method must be adjusted. Moreover, knowledge
about the problem under consideration is very often
needed as well, in order to design adapted operators.

Another remark concerns their competitiveness compared
to other heuristic methods. Evolutionary algorithms can
quite easily be adapted to (almost) any problem, but
their running time is often quite high. Local search
algorithms, such as tabu search or simulated annealing,
can also be adapted to different combinatorial problems
quite easily. If designed intelligently, they are very
often able to obtain better results than evolutionary
algorithms, and they are usually faster. For some
problems, specifically designed heuristics can exploit
theoretical results about the problem, which allows
them to obtain good results. In general, evolutionary
algorithms are not competitive with (extended) local
search or specific algorithms on small to medium size
instances of combinatorial problems.

But this does not mean that population-based algorithms
are not useful. In fact, the different approaches have
various advantages and disadvantages, and the efficient
algorithms developed in the future will probably combine
them. Such algorithms are usually called ‘hybrid
algorithms’. They have already been proposed, for
example, for the traveling salesman problem [39] and the
quadratic assignment problem [33], demonstrating their
potential.


See also

> Fractional Combinatorial Optimization

> Multi-objective Combinatorial Optimization 

> Neural Networks for Combinatorial Optimization 
> Replicator Dynamics in Combinatorial 


Extended Cutting Plane Algorithm


Introduction


The α-ECP (Extended Cutting Plane) algorithm [13,14] is
an algorithm for solving quasi-convex MINLP (Mixed
Integer Nonlinear Programming) problems. The algorithm
approximates the feasible region with linear
approximations and solves a sequence of MILP problems
based on these approximations. There are several other
similar methods, for instance the Generalized Benders
Decomposition method [7], the Outer Approximation method
[4], the LP/NLP Based Branch-and-Bound method [8], the
Linear Outer Approximation method [5] and the Sequential
Cutting Plane method [10]. A good overview of MINLP
algorithms and applications is given in [6]. Most of the
other methods iteratively solve both NLP and MILP
problems, while the α-ECP method only solves MILP
problems. The size of the MILP problems grows in each
iteration, so efficient algorithms of this type require
efficient MILP solvers.

Most of the MINLP methods can only ensure 
global convergence for convex MINLP problems. Dif- 
ferent heuristic procedures for some of the above algo- 
rithms have been introduced for the non-convex case, 
e.g. [11]. Although these methods perform quite well 
in different applications, convergence towards the opti- 
mal solution cannot generally be ensured by these algo- 
rithms for non-convex problems. 

There are also some MINLP global optimization methods
[1,2,9,12]. In these algorithms the problem functions
must be separable in the continuous and the discrete
variables, and the discrete variables may only appear
linearly. The α-ECP method can solve quasi-convex
problems in which the discrete variables also appear in
nonlinear functions.

The α-ECP method has also been further extended to
global optimization problems through the use of
transformation techniques for problems containing
signomial terms, see [15].


Formulation 


The α-ECP algorithm can be used to solve problems of
the form

$$\min\ c^{T}z \quad \text{s.t.} \quad g(z) \le 0, \quad Az \le a, \quad Bz = b, \quad z \in X \times Y, \qquad (1)$$

where c is a vector of constants, z = (x, y) consists of
a vector x of continuous variables in $\mathbb{R}^n$ and
a vector y of integer variables in $\mathbb{Z}^m$, and
$g\colon \mathbb{R}^n \times \mathbb{Z}^m \to \mathbb{R}^p$
is a vector of continuously differentiable quasi-convex
functions defined on the set $X \times Y$, with nonzero
gradients in the infeasible region of (1). The feasible
region of (1) is assumed to be nonempty. Furthermore,
X is a compact convex set $X \subset \mathbb{R}^n$ and
Y is a finite discrete set $Y \subset \mathbb{Z}^m$.

The matrices A and B and vectors a and b are used 
to define the linear constraints of the problem and are 
of suitable dimensions. 

The α-ECP method guarantees globally optimal solutions
for MINLP problems with a linear objective function and
differentiable quasi-convex constraints. The linear
objective function is not very restrictive, since most
optimization problems with a nonlinear objective f(z)
can be rewritten as a problem involving an additional
variable u and an additional constraint

$$f(z) - u \le 0. \qquad (2)$$

The new problem is then to minimize u subject to the
original constraints and the additional constraint (2).
Note, however, that this is not, in general, possible
for quasi-convex objectives, since f(z) − u is not
necessarily quasi-convex when f(z) is quasi-convex. An
extension that rigorously handles quasi-convex objective
functions is given in [13].
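A small numerical check shows why the reformulation (2) can fail for quasi-convex objectives. The function f(z) = sqrt(|z|) is an assumed example chosen only for this purpose; differentiability is not at issue here. It is quasi-convex on the real line, because its sublevel sets are intervals. Yet the set of points (z, u) satisfying f(z) − u ≤ 0 is not convex, so f(z) − u is not quasi-convex.

```python
import math


def f(z: float) -> float:
    """sqrt(|z|): quasi-convex on the real line, but not convex."""
    return math.sqrt(abs(z))


def epigraph_constraint(z: float, u: float) -> float:
    """Left-hand side of (2), f(z) - u; the constraint holds when it is <= 0."""
    return f(z) - u


p = (0.0, 0.0)   # satisfies (2): f(0) - 0 = 0
q = (1.0, 1.0)   # satisfies (2): f(1) - 1 = 0
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)  # (0.5, 0.5)

# The midpoint violates (2) because sqrt(0.5) - 0.5 > 0. The 0-sublevel set of
# f(z) - u is therefore not convex, so f(z) - u cannot be quasi-convex.
print(epigraph_constraint(*p) <= 0, epigraph_constraint(*q) <= 0,
      epigraph_constraint(*mid) <= 0)  # True True False
```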


Methods 


The algorithm solves problem (1) by approximating the
most violated nonlinear function with the linear function

$$l(z) = g_i(z^k) + \alpha\, \nabla g_i(z^k)^{T}(z - z^k) \qquad (3)$$

at the current iterate $z^k$, where
$i = \arg\max_i \{g_i(z^k)\}$. To simplify notation, let
$g_k = g_i(z^k)$. Furthermore, if the linearization added
to the MILP problem is the jth linearization, let
$g_j = g_i(z^k)$, $g_j(z) = g_i(z)$,
$\nabla g_j = \nabla g_i(z^k)$ and $z^j = z^k$, where i is
defined as above. The α values change from iteration to
iteration. To reference the value of the jth constant in
iteration k, the α constants are therefore replaced with
$\alpha_j^{(k)}$. Thus the linearization (3) is redefined
so that in iteration k the jth linear approximation is

$$l_j^{(k)}(z) = g_j + \alpha_j^{(k)} (\nabla g_j)^{T}(z - z^j),$$

and the algorithm adds the linear constraint

$$l_j^{(k)}(z) \le 0 \qquad (4)$$


to the MILP problem. The α constants initially have the
value 1. In each iteration they are either left unchanged
or multiplied by a factor greater than one. The algorithm
thus iteratively adds more and more constraints to an
MILP problem that originally consists only of the linear
constraints $Az \le a$ and $Bz = b$ from (1). In
iteration k it solves the MILP problem

$$\min\ c^{T}z \quad \text{s.t.} \quad l_j^{(k)}(z) \le 0,\ j = 1,\dots,L_k, \quad Az \le a, \quad Bz = b, \quad z \in X \times Y, \qquad (5)$$


where $L_k$ is the number of linearizations in iteration
k. The solution to this MILP problem is the new
iteration point. Using this point, either a new
linearization is added to the MILP problem or one or
several of the α constants are updated. The procedure is
then repeated until a feasible point of (1) is found.
A point is considered feasible if

$$g_i(z) \le \varepsilon_g, \quad i = 1,\dots,p, \qquad (6)$$

for some prespecified tolerance $\varepsilon_g$. Note that
the constraints $Az \le a$ and $Bz = b$ are automatically
satisfied, since the current iteration point is the
solution to (5).
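The two basic ingredients introduced so far are the cut (4), built from the most violated constraint, and the feasibility test (6). Both can be sketched in Python as follows. The `Cut` class and the function names are hypothetical. Constraints are passed as plain Python functions together with their gradients, and α_j starts at 1 as stated above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]
Func = Callable[[Vector], float]
Grad = Callable[[Vector], Sequence[float]]


@dataclass
class Cut:
    """Data of the jth linearization: g_j, grad g_j, z^j and alpha_j^(k)."""
    g: float
    grad: List[float]
    point: List[float]
    alpha: float = 1.0

    def value(self, z: Vector) -> float:
        """l_j^(k)(z) = g_j + alpha_j^(k) * grad_j^T (z - z^j); (4) requires <= 0."""
        return self.g + self.alpha * sum(d * (zi - pi)
                                         for d, zi, pi in zip(self.grad, z, self.point))


def make_cut(gs: Sequence[Func], grads: Sequence[Grad], z: Vector) -> Cut:
    """Linearize the most violated constraint, i = argmax_i g_i(z), at z as in (3)."""
    values = [g(z) for g in gs]
    i = max(range(len(values)), key=values.__getitem__)
    return Cut(values[i], list(grads[i](z)), list(z))


def is_feasible(gs: Sequence[Func], z: Vector, eps_g: float) -> bool:
    """Feasibility test (6): g_i(z) <= eps_g for every i."""
    return all(g(z) <= eps_g for g in gs)


def g_circle(z: Vector) -> float:
    return z[0] ** 2 + z[1] ** 2 - 4


def g_bound(z: Vector) -> float:
    return z[0] - 10


gs = [g_circle, g_bound]
grads = [lambda z: (2 * z[0], 2 * z[1]), lambda z: (1.0, 0.0)]
cut = make_cut(gs, grads, [3.0, 0.0])  # g_circle(3, 0) = 5 is the largest violation
print(cut.value([3.0, 0.0]), cut.value([2.0, 0.0]))  # 5.0 -1.0
print(is_feasible(gs, [2.0, 0.0], 1e-6))  # True
```

The cut is positive at the point where it was built, so (4) always excludes that infeasible point. The convergence section below relies on exactly this property.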
The idea of finding a feasible and optimal point by
solving a sequence of MILP problems is the same as in
the classical Kelley cutting plane method for NLP
problems. However, Kelley considered only the continuous
case, using LP subproblems. Furthermore, Kelley's
cutting plane algorithm assumes that the linearizations
are always valid underestimators of the corresponding
nonlinear functions. This is true if the functions are
convex, since for convex functions it holds that

$$g_i(z^k) + \nabla g_i(z^k)^{T}(z - z^k) \le g_i(z) \qquad (7)$$

for all $z, z^k \in X \times Y$. Thus
$l_j^{(k)}(z) \le 0$ whenever $g_j(z) \le 0$, even when
$\alpha_j^{(k)} = 1$.


Unfortunately, (7) does not generally hold for
quasi-convex functions. The linear approximations may
then fail to be valid underestimators of the
corresponding nonlinear function, and the constraint
$l_j^{(k)}(z) \le 0$ may cut away parts of the feasible
region. The α constants have been introduced to avoid
this problem. Sufficiently large α values ensure that
$l_j^{(k)}(z) \le 0$ whenever $g_j(z) \le 0$ holds. The
linearizations will then be valid outer approximations
of the feasible region of (1).
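The effect can be seen on a one-dimensional example. The function g(z) = 0.5 − exp(−z²), chosen here only for illustration, is smooth and quasi-convex but not convex. Its feasible region is |z| ≤ sqrt(ln 2) ≈ 0.83. Linearizing it at the infeasible point z^j = 2 with α = 1 cuts off feasible points. Doubling α until no feasible grid point is cut off gives a valid outer approximation. The brute-force grid check is only an illustration and is not one of the update rules of the α-ECP method.

```python
import math


def g(z: float) -> float:
    """0.5 - exp(-z**2): smooth and quasi-convex, but not convex."""
    return 0.5 - math.exp(-z * z)


def dg(z: float) -> float:
    """Derivative of g."""
    return 2.0 * z * math.exp(-z * z)


def linearization(z: float, zj: float, alpha: float) -> float:
    """l(z) = g(z^j) + alpha * g'(z^j) * (z - z^j), as in (3)."""
    return g(zj) + alpha * dg(zj) * (z - zj)


zj = 2.0                                       # infeasible point: g(2) > 0
feasible = [i / 100 for i in range(-83, 84)]   # grid inside |z| <= sqrt(ln 2)
assert all(g(z) <= 0 for z in feasible)

# With alpha = 1 the cut l(z) <= 0 removes feasible points such as z = 0.
print(any(linearization(z, zj, 1.0) > 0 for z in feasible))  # True

alpha = 1.0
while any(linearization(z, zj, alpha) > 0 for z in feasible):
    alpha *= 2.0                               # enlarge alpha until no feasible point is cut
print(alpha)                                   # 8.0
```

For a convex g, inequality (7) would make α = 1 sufficient. Here the flat tail of g gives a tangent that is far too shallow, and α has to grow roughly in proportion to g(z^j)/g'(z^j).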

Generally it is not known how large the α constants
should be. Instead, an updating strategy is used. The
α values are checked in each iteration and updated if
they turn out to be too small. The updated value is
obtained by multiplying the current value by a constant
greater than one. When the current MILP solution is
a feasible solution of (1) and all α constants are large
enough, the optimal solution to (1) has been found and
the algorithm terminates.


Calculating Sufficiently Large α-values


Since it is not known beforehand how large the α values
should be, it is shown below how to obtain values large
enough to ensure ε-optimality. As previously mentioned,
parts of the feasible region may be cut out when
linearizing the quasi-convex functions if the value of
the α constant is not increased.

Suppose a sufficiently large α value can be found so
that the linearization is a global underestimator of the
corresponding nonlinear function in the entire feasible
region. The linearization then satisfies

$$g_j + \alpha_j^{(k)} (\nabla g_j)^{T}(z - z^j) \le g_j(z), \quad \forall z \in \{z \in X \times Y : g(z) \le 0\}. \qquad (8)$$


A weaker condition is that inequality (8) is satisfied
only at the iteration points generated so far. If this
condition is satisfied, the linearization is called
a local underestimator. Thus, the linearization is
a local underestimator if it satisfies the following
inequality in iteration k:

$$g_j + \alpha_j^{(k)} (\nabla g_j)^{T}(z^l - z^j) \le g_j(z^l), \quad l = 1,\dots,k. \qquad (9)$$

This inequality is easy to check in each iteration. If
some constant $\alpha_j^{(k)}$ does not satisfy (9), it
is updated by multiplying it by β.


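The update formula is thus

$$\alpha_j^{(k+1)} = \begin{cases} \beta\,\alpha_j^{(k)}, & \text{if } g_j + \alpha_j^{(k)} (\nabla g_j)^{T}(z^k - z^j) > g_j(z^k), \\ \alpha_j^{(k)}, & \text{otherwise.} \end{cases} \qquad (10)$$

The β constant is a prespecified constant (β > 1).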
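Rule (10) can be sketched as follows. After the MILP solution z^k is obtained, every stored linearization is compared at z^k with the constraint it approximates. Where the linearization overestimates the constraint, its α is multiplied by β. The data layout (a `Cut` record that stores the linearized constraint as `func`) and the value β = 2 are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class Cut:
    g: float                         # g_j, the constraint value at z^j
    grad: List[float]                # gradient of g_j at z^j
    point: List[float]               # z^j
    func: Callable[[Vector], float]  # g_j(z), the constraint that was linearized
    alpha: float = 1.0               # alpha_j^(k)

    def value(self, z: Vector) -> float:
        """l_j^(k)(z) = g_j + alpha_j^(k) * grad_j^T (z - z^j)."""
        return self.g + self.alpha * sum(d * (a - b) for d, a, b in zip(self.grad, z, self.point))


def update_local(cuts: List[Cut], z_k: Vector, beta: float = 2.0) -> bool:
    """Rule (10): multiply alpha_j by beta wherever l_j^(k)(z^k) > g_j(z^k).

    Returns True if any alpha was increased.
    """
    changed = False
    for cut in cuts:
        if cut.value(z_k) > cut.func(z_k):
            cut.alpha *= beta
            changed = True
    return changed


def qc(z: Vector) -> float:
    """Quasi-convex, non-convex constraint 0.5 - exp(-z^2)."""
    return 0.5 - math.exp(-z[0] ** 2)


def cv(z: Vector) -> float:
    """Convex constraint z^2 - 1."""
    return z[0] ** 2 - 1.0


cut_qc = Cut(qc([2.0]), [4.0 * math.exp(-4.0)], [2.0], qc)
print(update_local([cut_qc], [0.0]), cut_qc.alpha)  # True 2.0
cut_cv = Cut(cv([2.0]), [4.0], [2.0], cv)           # (7) holds, so no update
print(update_local([cut_cv], [0.0]), cut_cv.alpha)  # False 1.0
```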
The concept of local underestimators is now extended to
feasible underestimators. A linearization is called
a feasible underestimator if it outer-approximates the
entire feasible region. Thus, for such linearizations,
it holds that

$$g_j + \alpha_j^{(k)} (\nabla g_j)^{T}(z - z^j) \le 0, \quad \forall z \in \{z \in X \times Y : g(z) \le 0\}. \qquad (11)$$


This is a much stricter requirement, since a local
underestimator need only underestimate the nonlinear
function at a finite set of infeasible points. However,
condition (11) is weaker than the condition (8) for
global underestimators. A feasible underestimator does
not necessarily have to underestimate the corresponding
nonlinear function at every point of the feasible
region. It is only required that $l_j^{(k)}(z) \le 0$ in
this region. In practice, a feasible underestimator
needs to underestimate the entire boundary or, more
precisely, the convex hull of the feasible region.

To obtain a feasible underestimator, a new constant
$h_j^{(k)}$ is introduced. As previously with the α
constants, the constant is used in the jth linearization
and k denotes its value in the kth iteration. The
constant is defined as

$$h_j^{(k)} = \frac{g_j}{\alpha_j^{(k)}}. \qquad (12)$$


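Since (11) can be divided by $\alpha_j^{(k)}$, the
inequality becomes

$$h_j^{(k)} + (\nabla g_j)^{T}(z - z^j) \le 0, \qquad (13)$$

and moreover, because $\alpha_j^{(k)} \ge 1$ and
$g_j > 0$ at the infeasible point where the jth
linearization was made, it holds that

$$h_j^{(k)} \le g_j. \qquad (14)$$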


The level sets of quasi-convex functions are convex. For
a differentiable quasi-convex function this means that
$g_j(z) \le g_j$ implies $(\nabla g_j)^{T}(z - z^j) \le 0$.
Hence, if the constant parameter $h_j^{(k)}$ is replaced
with zero, the linearization (13) is always an outer
approximation of the feasible region. In fact, the
linearization then approximates the even larger region

$$\{z \in X \times Y : g_j(z) \le g_j\},$$

which contains the feasible region. Thus, if $h_j^{(k)}$
is sufficiently small, (13) is an approximation of the
feasible region. In practice the h constants should
satisfy

$$h_j^{(k)} \le \varepsilon_h, \quad j = 1,\dots,L_k.$$

This is the same as requiring that

$$\alpha_j^{(k)} \ge \frac{g_j}{\varepsilon_h}, \quad j = 1,\dots,L_k, \qquad (15)$$


which can easily be seen from (12). Inequality (15)
shows an important connection between sufficiently
large α values and the value $g_j$ of the nonlinear
function at the linearization point: the larger $g_j$
is, the larger $\alpha_j^{(k)}$ has to be. One could use
the same updating scheme (10) as for obtaining a local
underestimator, but to speed up the process a new
updating factor γ > 1 (with γ > β) is introduced. This
constant is used to update the α values if the
corresponding linearizations are not feasible
underestimators.

Whenever the algorithm finds a feasible point, it checks
that all linearizations are feasible underestimators,
i.e. that (15) holds. If some constant $\alpha_j^{(k)}$
violates this inequality, its value is updated by
multiplying it by γ. Thus, the α constants are updated
according to

$$\alpha_j^{(k+1)} = \begin{cases} \gamma\,\alpha_j^{(k)}, & \text{if } \alpha_j^{(k)} < g_j/\varepsilon_h, \\ \alpha_j^{(k)}, & \text{otherwise.} \end{cases} \qquad (16)$$
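Check (15) and update (16) need only g_j and α_j for each linearization. The following sketch uses the illustrative values ε_h = 0.1 and γ = 4, which are assumptions. It also prints the constants h_j = g_j/α_j from (12).

```python
from typing import List, Sequence, Tuple


def update_feasible(alphas: Sequence[float], gs: Sequence[float], eps_h: float,
                    gamma: float) -> Tuple[List[float], bool]:
    """Rule (16): alpha_j <- gamma * alpha_j wherever (15), alpha_j >= g_j / eps_h, fails.

    gs holds g_j > 0, the constraint value at the point where cut j was made.
    Returns the new alphas and whether any of them changed.
    """
    new_alphas = []
    changed = False
    for alpha, g in zip(alphas, gs):
        if alpha < g / eps_h:  # equivalently h_j = g_j / alpha_j > eps_h
            new_alphas.append(gamma * alpha)
            changed = True
        else:
            new_alphas.append(alpha)
    return new_alphas, changed


gs = [0.5, 0.001, 0.3]
alphas, changed = update_feasible([1.0, 1.0, 4.0], gs, eps_h=0.1, gamma=4.0)
print(alphas, changed)                      # [4.0, 1.0, 4.0] True
print([g / a for g, a in zip(gs, alphas)])  # the constants h_j of (12)
```

After one update, h_1 = 0.125 still exceeds ε_h. The check therefore fails again at the next feasible point and α_1 is raised once more, to 16. A cut made where g_j ≤ ε_h, such as the second one, never needs an update, in line with the remark below (19).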


In fact, it would be sufficient to require that the
linear underestimators do not cut away the optimal point
$z^*$, i.e. that $l_j^{(k)}(z^*) \le 0$. The algorithm
would then terminate in considerably fewer iterations.
However, since the optimal solution $z^*$ is not known,
this requirement is very difficult to check. The same
difficulty would arise if the algorithm were based on
global underestimators of the type (8). As shown below,
global convergence of the algorithm towards the optimal
solution can instead be guaranteed by using local and
feasible underestimators. That is why these two concepts
have been introduced.


Handling Infeasible MILP Problems 


The linearizations may cut out enough of the feasible
region of (1) to make the corresponding MILP problem
infeasible. There would then be no new iteration point,
and the algorithm could not continue. The remedy is to
update all α values and solve the MILP problem again. If
there is still no feasible point, this process is
repeated until a feasible point is obtained. Large
enough α values that make the MILP problem feasible
exist, since the nonlinear problem (1) was assumed to be
feasible. Thus, if the MILP problem is infeasible, the
α update is

$$\alpha_j^{(k+1)} = \beta\,\alpha_j^{(k)}, \quad j = 1,\dots,L_k. \qquad (17)$$


To illustrate the algorithm, a flowsheet of the algo- 
rithm is given below. 


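A) Set k = k + 1 and solve (5).
B) If (5) has no feasible solution, update the α:s
according to (17) and return to A).
C) Otherwise, call the solution $z^k$ and calculate
$g_k = \max_i \{g_i(z^k)\}$.
D) If the linearizations are not local underestimators,
Eq. (9), update the α:s according to (10) and return
to A).
E) If $z^k$ is not a feasible solution to (1), Eq. (6),
add a linearization according to (4), set
$L_{k+1} = L_k + 1$ and return to A).
F) If the linearizations are not feasible
underestimators, Eq. (15), update the α:s according to
(16) and return to A).
G) Otherwise, the point $z^k$ is optimal in (1).

Extended Cutting Plane Algorithm, Figure 1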
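Putting the pieces together, the flowsheet can be turned into a small runnable Python sketch. Several simplifications are assumptions made here and are not part of the method as described:

- the problem is pure-integer with a small finite Y;
- the MILP (5) is "solved" by enumerating Y instead of calling an MILP solver;
- the order of the checks follows one reading of the flowsheet in Figure 1;
- β = 2, γ = 4, ε_g = 10⁻⁶ and ε_h = 10⁻³ are illustrative values.

The test problem minimizes −z₁ − z₂ over {0,…,6}² subject to one quasi-convex, non-convex constraint whose feasible integer points are {1,2,3}².

```python
import itertools
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, ...]
Func = Callable[[Point], float]
Grad = Callable[[Point], Sequence[float]]


@dataclass
class Cut:
    """One linearization (4): g_j, its gradient, z^j, the constraint index and alpha_j."""
    g: float
    grad: List[float]
    point: Point
    index: int
    alpha: float = 1.0

    def value(self, z: Point) -> float:
        """l_j(z) = g_j + alpha_j * grad_j^T (z - z^j)."""
        return self.g + self.alpha * sum(d * (a - b) for d, a, b in zip(self.grad, z, self.point))


def alpha_ecp(c: Sequence[float], gs: Sequence[Func], grads: Sequence[Grad],
              Y: Sequence[Point], eps_g: float = 1e-6, eps_h: float = 1e-3,
              beta: float = 2.0, gamma: float = 4.0, max_iter: int = 10_000) -> Point:
    """Toy alpha-ECP loop for a pure-integer problem; (5) is solved by enumerating Y."""
    cuts: List[Cut] = []
    for _ in range(max_iter):
        # Solve (5): the cheapest point of Y that satisfies every cut (4).
        candidates = [z for z in Y if all(cut.value(z) <= 0 for cut in cuts)]
        if not candidates:
            if not cuts:
                raise ValueError("Y is empty")
            for cut in cuts:                  # infeasible MILP: rule (17)
                cut.alpha *= beta
            continue
        z = min(candidates, key=lambda p: sum(ci * pi for ci, pi in zip(c, p)))
        values = [g(z) for g in gs]
        changed = False
        for cut in cuts:                      # local underestimators: rule (10)
            if cut.value(z) > gs[cut.index](z):
                cut.alpha *= beta
                changed = True
        if changed:
            continue
        i = max(range(len(gs)), key=values.__getitem__)  # g_k = values[i]
        if values[i] > eps_g:                 # (6) violated: add a cut at z^k
            cuts.append(Cut(values[i], list(grads[i](z)), z, i))
            continue
        changed = False
        for cut in cuts:                      # feasible underestimators: rule (16)
            if cut.alpha < cut.g / eps_h:
                cut.alpha *= gamma
                changed = True
        if not changed:
            return z                          # (15) holds for all cuts: accept z^k
    raise RuntimeError("iteration limit reached")


def g1(z: Point) -> float:
    """0.5 - exp(-r^2 / 4), r = distance to (2, 2): quasi-convex, not convex."""
    r2 = (z[0] - 2) ** 2 + (z[1] - 2) ** 2
    return 0.5 - math.exp(-r2 / 4.0)


def grad_g1(z: Point) -> Tuple[float, float]:
    e = math.exp(-((z[0] - 2) ** 2 + (z[1] - 2) ** 2) / 4.0)
    return (e * (z[0] - 2) / 2.0, e * (z[1] - 2) / 2.0)


Y = list(itertools.product(range(7), repeat=2))
print(alpha_ecp((-1.0, -1.0), [g1], [grad_g1], Y))  # (3, 3)
```

The returned point coincides with the optimum found by enumerating the feasible set directly. Here ε_h is small enough that no cut satisfying (15) can exclude that optimum. A much larger ε_h could stop the loop at a suboptimal feasible point, which matches the ε-optimality caveat in the text.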


Convergence 


The convergence properties of the algorithm are now
studied. It is proven below that the algorithm converges
towards the optimal solution of the quasi-convex problem
(1). Three properties are needed to prove convergence.
First, the algorithm never returns to the same point if
that point is infeasible. Second, the generated points
converge to a feasible solution. Finally, this feasible
solution is the globally optimal solution of the
original quasi-convex problem (1).


Cycling 

First it is shown that the algorithm never returns to
the same point if it is infeasible, i.e., that cycling
is not possible. Note that neither compactness of X nor
quasi-convexity of the constraint functions is needed to
prove this theorem.


Theorem 1 If, in iteration k, the current point $z^k$ is
not feasible, then all new points generated by the
algorithm will be different from $z^k$.


Proof If $z^k$ is infeasible, then $g_k > 0$ and
a linearization is added to the MILP problem. If this
linearization was the jth one added, then all new points
$z^l$ generated by the algorithm satisfy

$$g_j + \alpha_j^{(l)} (\nabla g_j)^{T}(z^l - z^j) \le 0, \quad l > k. \qquad (18)$$

Setting $z^l = z^k$ (= $z^j$) in (18) would give
$g_j \le 0$, which is false. Hence $z^k$ does not satisfy
(18), and all new points are different from $z^k$. □


It immediately follows that all previous points
generated by the algorithm are different from $z^k$ as
well.


Corollary 1 If the current point $z^k$ is infeasible,
then $z^k$ is different from all previous points.


Proof If there were a $z^j$, j < k, such that
$z^j = z^k$, then $z^j$ would be an infeasible point
that is generated again in iteration k, contradicting
Theorem 1. □


Convergence to a Feasible Point 


Convergence to a feasible point for discrete problems is 
directly ensured by the above cycling theorem. By as- 
sumption, there are only a finite number of points in Y, 
and there is at least one feasible point. Consequently, if 
the algorithm does not find any of the feasible points in 
finite time, it would have to repeat an infeasible point 
after generating at most |Y| iteration points, which is 
not possible under the cycling theorem. 

Convergence in the mixed integer case can be proven by
using the fact that the points $x^k$ are taken from the
compact set X and that the set Y is finite. This
implies that any infinite sequence of points
$\{z^k = (x^k, y^k), k \in K\}$ taken from the set
$X \times Y$ has a subsequence with a limit point. The
following theorem shows that any limit point is
feasible, which is a property required for convergence.
Note that quasi-convexity of the nonlinear functions is
not required to prove convergence of the algorithm. It
is only required to ensure a globally optimal solution.

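The algorithm keeps $\alpha_j^{(k)}$ of the order of $g_j/\varepsilon_h$, but for simplicity assume that $\alpha_j^{(k)} = g_j/\varepsilon_h$ for those j where $g_j \ge \varepsilon_h$. Then the constant $h_j^{(k)} = g_j/\alpha_j^{(k)}$ satisfies

$$\min(\varepsilon_h, g_j) \le h_j^{(k)} \le g_j. \qquad (19)$$

This follows directly from (14) and the fact that (15) is already satisfied for $\alpha_j^{(k)} = 1$ if $g_j < \varepsilon_h$.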

Below it is proven that any limit point is a feasible 
point. 


Theorem 2 Suppose that the α-ECP algorithm generates an infinite sequence of points $\{z^k, k \in K\}$. Then the limit point of any convergent subsequence $\hat K \subseteq K$ is feasible.


Proof Assume there is a convergent subsequence $\{z^k, k \in \hat K\}$ whose limit point is not feasible. Then $\lim_{k \in \hat K} g_k = \varepsilon > 0$, and by (19) one can find a constant M such that

$$h_j^{(k)} \ge \min\left(\varepsilon_h, \frac{\varepsilon}{2}\right), \qquad \forall j > L_M,\ \forall k > M.$$

Since subsequent points $z^k$ are solutions of a linear program containing the linearization (13), it holds for all k that

$$0 \ge h_j^{(k)} + (\nabla g_j)^T (z^k - z^j) \ge h_j^{(k)} - \|\nabla g_j\| \cdot \|z^k - z^j\|$$

when $j = 1, \dots, L_k$. Define G as the maximal norm of the gradient of g(z) on $X \times Y$, that is,

$$G = \max\{\|\nabla g_i(z)\| : z \in X \times Y,\ i = 1, \dots, p\}.$$

Then

$$\|z^k - z^j\| \ge \frac{h_j^{(k)}}{\|\nabla g_j\|} \ge \frac{\min(\varepsilon_h, \varepsilon/2)}{G} > 0$$

when $k > M$ and $j > L_M$. This implies that the sequence is not a Cauchy sequence and thus not convergent, which contradicts the assumption that $\{z^k, k \in \hat K\}$ is convergent. □
Convergence to the Optimal Solution


Finally, convergence of the algorithm to the global optimal solution of (1) is shown.

First note that the algorithm will terminate in finite time at a point where all underestimators are ε-feasible underestimators, i.e., where (15) is satisfied. This follows from the convergence theorem: since any convergent subsequence has a feasible limit point, the entire sequence of points also converges to a feasible point. Thus there is a tail of the sequence, say $\{z^j, j = M, \dots\}$, where the initial α values of the corresponding linearizations directly satisfy (15); this holds for any M such that $g_j < \varepsilon_h$ for all $j > M$. These α values remain constant in subsequent iterations. On the other hand, after a feasible point has been reached ($g_j \le \varepsilon_g$), the old constants $\alpha_j$, $j = 1, \dots, M$, can only be updated a finite number of times before they are large enough to satisfy (15). Therefore the algorithm eventually reaches a feasible point where all linearizations are ε-feasible underestimators, and it terminates. It remains to show that this point is also the optimal solution.

To prove that the obtained solution is optimal, one needs to assume that all linear constraints are feasible underestimators according to (11). This is in general true if $\varepsilon_h = 0$. However, the actual algorithm only requires (15) to hold up to the positive tolerance $\varepsilon_h$. Thus, the solution obtained by the algorithm can only be guaranteed to be ε-optimal.


Theorem 3 Assume that the α-ECP algorithm converges to a feasible solution $z^\infty$ and that all linearizations are feasible underestimators according to (11). Then $z^\infty$ is an optimal point of (1), and $Z(z^\infty)$, where $Z(z) = c^T z$, is the optimal value of (1).


Proof Denote the feasible region of (1) by Ω, the feasible region of the MILP problem that was solved to obtain $z^\infty$ by $\Omega^\infty$, and an optimal point of (1) by $z^*$. By (11) it holds that $\Omega \subseteq \Omega^\infty$, and thus

$$Z(z^\infty) \le Z(z^*). \qquad (20)$$

On the other hand, $z^\infty$ is feasible in (1), and thus

$$Z(z^*) \le Z(z^\infty). \qquad (21)$$

From (20) and (21) one gets $Z(z^*) = Z(z^\infty)$; thus $Z(z^\infty)$ is the optimal value of (1) and $z^\infty$ is an optimal point of (1). □


Cases 
The algorithm is demonstrated on a quasi-convex integer problem. In this example, as well as in other test runs, a suitable choice of β and γ turned out to be β = 1.3 and γ = 10. The ε-tolerances in the example are $\varepsilon_g = \varepsilon_h = 0.001$.

Consider the problem

$$\min\ 3y_1 + 2y_2 \quad \text{s.t.}\quad 3.5 - y_1 y_2 \le 0,\quad y_1, y_2 \in \{1, 2, 3, 4, 5\}. \qquad (22)$$

The optimal solution of this problem is y = (2, 2), as can be seen from Fig. 2.
The steps executed by the α-ECP algorithm are:


Iteration 1. Solve the problem

$$\min\ 3y_1 + 2y_2 \quad \text{s.t.}\quad y_1, y_2 \in \{1, \dots, 5\}.$$


The solution is $y^1 = (1, 1)$. The linearization at this point,

$$2.5 + \alpha_1 (-1, -1) \begin{pmatrix} y_1 - 1 \\ y_2 - 1 \end{pmatrix} \le 0,$$

is added to the MILP problem according to (4). Set $\alpha_1 = 1$. The linearization is shown in Fig. 2. As can be seen from the figure, the linearization cuts away the optimal solution of the problem.

Extended Cutting Plane Algorithm, Figure 2
Feasible region of (22)

Iteration 2. The solution of the new MILP problem is $y^2 = (1, 4)$. This point is a feasible solution of the INLP problem. The linearization satisfies the requirements of a local underestimator but is not a feasible underestimator. Observe that without the concept of feasible underestimators the algorithm would stop here at a nonoptimal point. However, to ensure that the linear function is a feasible underestimator, the α constant is updated according to (16), giving $\alpha_1 = 10$. Since $\max_i \{g_i(z^k)\} < 0$, no additional linearization is added.

Iteration 3. The solution of the new MILP problem is $y^3 = (1, 2)$. A new linearization at this point is added to the MILP problem ($\alpha_2 = 1$):

$$1.5 + \alpha_2 (-2, -1) \begin{pmatrix} y_1 - 1 \\ y_2 - 2 \end{pmatrix} \le 0.$$


Iteration 4. The MILP solution is $y^4 = (2, 2)$, which is feasible; however, neither linearization is a feasible underestimator, so the α values are updated using (16). The new values are $\alpha_1 = 100$ and $\alpha_2 = 10$.

Iteration 5. The solution of the modified MILP problem is $y^5 = (2, 1)$. Since it is infeasible, a new linearization

$$1.5 + \alpha_3 (-1, -2) \begin{pmatrix} y_1 - 2 \\ y_2 - 1 \end{pmatrix} \le 0$$

is added, where $\alpha_3 = 1$.
Iteration 6. The MILP solution is $y^6 = (1, 3)$, which is also infeasible. A new linearization

$$0.5 + \alpha_4 (-3, -1) \begin{pmatrix} y_1 - 1 \\ y_2 - 3 \end{pmatrix} \le 0$$

is added ($\alpha_4 = 1$).

Iteration 7. The MILP solution is again the feasible solution $y^7 = (2, 2)$. The linearizations are not feasible underestimators, and thus the α values are updated. The new α values are $\alpha_1 = 1000$, $\alpha_2 = 100$ and $\alpha_3 = \alpha_4 = 10$.

Iterations 8-10. The solutions of the new MILP problems are still $y^8 = y^9 = y^{10} = (2, 2)$, but the α values are not large enough to guarantee that the linearizations are feasible underestimators. Therefore the α constants are updated.


Iteration 11. The solution is $y^{11} = (2, 2)$, and all linearizations are feasible underestimators. The algorithm terminates with $y^* = (2, 2)$.


The algorithm thus returns the global solution $y^* = (2, 2)$ of (22) with the optimal value Z(2, 2) = 10. The final MILP problem solved in iteration 11 is

$$\begin{aligned}
\min\ & 3y_1 + 2y_2,\\
\text{s.t.}\ & 2.5 + 10{,}000(2 - y_1 - y_2) \le 0,\\
& 1.5 + 10{,}000(4 - 2y_1 - y_2) \le 0,\\
& 1.5 + 10{,}000(4 - y_1 - 2y_2) \le 0,\\
& 0.5 + 1000(6 - 3y_1 - y_2) \le 0,\\
& y_1, y_2 \in \{1, \dots, 5\}.
\end{aligned}$$
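The iterations above can be replayed with a short script. This is a sketch under stated assumptions, not the authors' implementation: the MILP subproblems are solved by enumerating the 5 × 5 grid instead of calling an MILP solver, the ε-feasible underestimator test (15) is assumed to be $g_j/\alpha_j \le \varepsilon_h$, and the update (16) is assumed to be $\alpha_j \leftarrow \gamma\alpha_j$ with γ = 10. These forms are inferred from the numbers in the example; β is not used.

```python
import itertools

# Example (22): min 3*y1 + 2*y2  s.t.  g(y) = 3.5 - y1*y2 <= 0,  y1, y2 in {1, ..., 5}
GRID = list(itertools.product(range(1, 6), repeat=2))
GAMMA = 10             # alpha update factor (gamma = 10 in the text)
EPS_G = EPS_H = 1e-3   # tolerances eps_g and eps_h


def cost(y):
    return 3 * y[0] + 2 * y[1]


def g(y):
    return 3.5 - y[0] * y[1]


def grad_g(y):
    return (-y[1], -y[0])


def solve_milp(cuts, alphas):
    """Stand-in for an MILP solver: enumerate the grid and return the cheapest point
    satisfying every linearization g_j + alpha_j * grad_j^T (y - y_j) <= 0."""
    def satisfies_all(y):
        return all(
            gj + a * (dg[0] * (y[0] - yj[0]) + dg[1] * (y[1] - yj[1])) <= 0
            for (gj, dg, yj), a in zip(cuts, alphas)
        )
    return min((y for y in GRID if satisfies_all(y)), key=cost)


def alpha_ecp(max_iter=50):
    cuts, alphas, history = [], [], []
    for _ in range(max_iter):
        y = solve_milp(cuts, alphas)
        history.append(y)
        if g(y) > EPS_G:
            # Infeasible point: add a new linearization with alpha = 1.
            cuts.append((g(y), grad_g(y), y))
            alphas.append(1)
            continue
        # Feasible point: assumed form of test (15) is g_j / alpha_j <= eps_h.
        bad = [j for j, (gj, _, _) in enumerate(cuts) if gj / alphas[j] > EPS_H]
        if not bad:
            return y, history, alphas
        for j in bad:
            alphas[j] *= GAMMA  # assumed form of update (16)
    raise RuntimeError("no convergence within max_iter")


y_star, history, alphas = alpha_ecp()
print(y_star, cost(y_star), len(history), alphas)
```

Under these assumptions the run visits the same eleven points as the example, and the final multipliers (10000, 10000, 10000, 1000) coincide with the coefficients of the final MILP above.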
Conclusions 


The above algorithm has several advantages when com- 
pared to other similar algorithms for solving MINLP 
problems. At each iteration, the procedure only solves 
MILP subproblems and is thus a competitive alterna- 
tive to algorithms where only NLP problems or both 
NLP and MILP problems are solved in each itera- 
tion. 

One consequence is that since only MILP problems 
are solved in each iteration, the nonlinear constraints 
need not be calculated at relaxed values of the integer 
variables. It can be very difficult to evaluate the constraints at a relaxed point if, for instance, binary variables represent the existence of units in a process and the constraints are evaluated by simulating the process with or without those units. It may then be impossible to evaluate the constraints at all when the integer variables are relaxed.

The α-ECP algorithm also solves MINLP problems that have general integer variables, not only binary variables. Moreover, no integer cuts are needed to ensure convergence, which is not the case for all outer approximation MINLP methods. In addition, the proposed algorithm ensures global convergence for quasi-convex MINLP problems.

Cutting plane methods are claimed to converge slowly. This is generally not the case if the convergence rate is measured by the number of nonlinear function evaluations. Numerical experience with the algorithm indicates that in many cases the number of function evaluations is even orders of magnitude lower than for competing algorithms that solve both MILP and NLP subproblems. This is a significant advantage if evaluating the constraints is the most time-consuming part of the problem.

Very good results of the algorithm on a set of difficult block optimization problems are reported in [3]. The algorithm therefore also appears to work very well on problems whose complexity is dominated by the integer part.
Introduction


The roots of contemporary optimization are found in 
the works of some of the greatest minds in mathematics: Lagrange, Weierstrass, Carathéodory and von
Neumann to name a few. However, it was the work of 
George Dantzig in the late 1940s that catalyzed much 
of the research comprising the core of optimization, 
mathematical programming and operations research. 
Dantzig developed the simplex algorithm and proved 
the discipline’s fundamental theorem, and in doing so 
he became known as the father of linear programming. 
The simplex algorithm is arguably one of the most im- 
portant discoveries of the 20th century. Applications 
and subsequent theoretical developments have flour- 
ished ever since. 

The Fundamental Theorem of Linear Programming states that every well-posed linear program has a basic optimal solution. Related theory shows that this result can be interpreted algebraically (as stated), geometrically (replace basic with vertex), or in terms of convex analysis (replace basic with extreme point). This multifaceted perspective is a byproduct of linearity, since the supporting polyhedral analysis shows that all three constructs are one and the same. A natural question, however, is whether the algebraic interpretation extends beyond the confines of linearity. The answer is yes, and the goal of this article is to explain how this broader perspective is achieved.


Definitions 


We consider the following multiple objective extension of the standard form linear program:

$$(\mathrm{MOP})\qquad \min\{F(x) : Ax = b\},$$

where each component function $F_i$ of $F : \mathbb{R}^n \to \mathbb{R}^p$ has the form $F_i(x) = \sum_{j=1}^{n} f_{(i,j)}(x_j)$. Notice that each $F_i$ is separable in the sense that $f_{(i,j)}$ depends only on the component $x_j$. The codomain of F is ordered lexicographically so that the minimization problem is well defined. Unlike a standard form linear program, (MOP) does not have inequality constraints. However, the lexicographic ordering allows us to model inequalities through the objective function. Let $D : \mathbb{R}^n \to \mathbb{R}$ be defined by

$$D(x) = \sum_{j=1}^{n} \max\{-x_j, 0\},$$

so that

$$\arg\min\{c^T x : Ax = b,\ x \ge 0\} = \arg\min\left\{\begin{pmatrix} D(x) \\ c^T x \end{pmatrix} : Ax = b\right\} \qquad (1)$$

and, for the linear program

$$(\mathrm{LP})\qquad \min\{c^T x : Ax = b,\ x \ge 0\},$$

$$\min\left\{\begin{pmatrix} D(x) \\ c^T x \end{pmatrix} : Ax = b\right\} = \begin{pmatrix} 0 \\ \min\{c^T x : Ax = b,\ x \ge 0\} \end{pmatrix}, \qquad (2)$$

provided that (LP) is well posed. Hence (MOP) is an extension of (LP). We make the tacit assumption that A has full row rank.
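A small numerical check of (1) and (2), using a hypothetical instance (one equality constraint, n = 2) and a finite sample of the line $\{x : Ax = b\}$ in place of exact minimization. Python's built-in tuple comparison is lexicographic, which matches the ordering assumed for the codomain of F.

```python
from fractions import Fraction

# Hypothetical toy instance: A = [1, 1], b = 1, c = (1, 2).
# The affine set {x : Ax = b} is the line x(t) = (t, 1 - t).
c = (1, 2)


def D(x):
    """D(x) = sum_j max(-x_j, 0); it is zero exactly when x >= 0."""
    return sum(max(-xj, 0) for xj in x)


def F(x):
    """Two-component objective (D(x), c^T x); Python compares tuples lexicographically."""
    return (D(x), sum(cj * xj for cj, xj in zip(c, x)))


# Sample the line Ax = b at t = -2, -7/4, ..., 3 (a finite stand-in for the continuum).
points = [(t, 1 - t) for t in (Fraction(k, 4) for k in range(-8, 13))]

lex_best = min(points, key=F)                   # lexicographic minimum of (D, c^T x)
lin_best = min(points, key=lambda x: F(x)[1])   # c^T x alone, ignoring x >= 0
print(lex_best, F(lex_best), lin_best)
```

The lexicographic minimizer is the LP optimum x = (1, 0) with objective vector (0, 1), as (1) and (2) predict. Minimizing $c^T x$ alone over the same sample drifts to the largest sampled t, since without the D component nothing enforces $x \ge 0$.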

For $x \in \mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x \ge 0\}$, we define $B(x) = \{i : x_i > 0\}$ and $N(x) = \{i : x_i = 0\}$. The argument is omitted when it is clear from context. A set subscript on a vector (matrix) indicates the subvector (submatrix) whose components (columns) are indexed by the set. For example,

$$Ax = [A_B \mid A_N] \begin{pmatrix} x_B \\ x_N \end{pmatrix} = A_B x_B + A_N x_N = A_B x_B,$$

since $x_N = 0$.


A basic feasible solution is an $x \in \mathbb{R}^n_+$ satisfying $Ax = b$ such that the columns of $A_B$ are linearly independent. A basic feasible solution within the optimal set of (LP) is called a basic optimal solution, and the Fundamental Theorem states that such a solution exists as long as (LP) is well posed, i.e., neither infeasible nor unbounded.
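For small instances the Fundamental Theorem can be illustrated by brute force: enumerate every choice of m columns, solve for the corresponding basic solution, keep those with $x \ge 0$, and pick the cheapest. The sketch below does this with exact rational arithmetic. The instance is hypothetical, and the enumeration is exponential in n, so it is an illustration rather than a practical method.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, rhs):
    """Gauss-Jordan elimination over exact fractions; returns None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def basic_feasible_solutions(A, b):
    """Yield every x >= 0 with Ax = b whose support lies in m linearly independent columns."""
    m, n = len(A), len(A[0])
    for basis in combinations(range(n), m):
        x_basis = solve([[A[i][j] for j in basis] for i in range(m)], b)
        if x_basis is None or any(v < 0 for v in x_basis):
            continue  # dependent columns, or the basic solution violates x >= 0
        x = [Fraction(0)] * n
        for j, v in zip(basis, x_basis):
            x[j] = v
        yield x


# Hypothetical instance: max x1 + 2*x2 s.t. x1 + x2 <= 4, x1 - x2 <= 2, x >= 0,
# in standard form with slack variables x3, x4 and c = (-1, -2, 0, 0).
A = [[1, 1, 1, 0], [1, -1, 0, 1]]
b = [4, 2]
c = [-1, -2, 0, 0]

bfs = list(basic_feasible_solutions(A, b))
best = min(bfs, key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))
print(len(bfs), best)
```

Here four of the six column pairs give basic feasible solutions, and the cheapest, x = (0, 4, 0, 6), is a basic optimal solution, as the theorem guarantees for a well-posed (LP).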

Observe that the definition of a basic feasible solution is independent of the objective function; after all, it is only concerned with feasibility. However, the (MOP) framework blurs the division between the constraints and the objective, and the extension of the Fundamental Theorem relies on a broader definition of B and N that includes information from the objective. The idea is to replace the separation of zero and nonzero values expressed by B and N with the distinction of whether or not the objective is monotonic. Notice that each term of D has a natural change in monotonicity when its argument is 0. It is precisely this observation that permits the broader perspective.

We say that a function f from $\mathbb{R}$ into $\mathbb{R}$ is locally strongly monotonic at x if there is a neighborhood of x over which f is strictly increasing, strictly decreasing or constant. The objective F is not a single variable mapping, which makes the concept of monotonicity opaque. However, the terms of each component function are single variable mappings, and we define

$$H_j = \{x_j : f_{(i,j)} \text{ is not locally strongly monotonic at } x_j \text{ for some } i \in \{1, 2, \dots, p\}\}.$$

Each $H_j$ is the collection of values on the jth axis at which the monotonicity of at least one of the objective's components changes. In the case of D, we have $H_j = \{0\}$ for each j. The jth component of x is cornered if $x_j \in H_j$, and the analogs of B and N are

$$\mathcal{B}(x) = \{j : x_j \text{ is not cornered}\} \quad\text{and}\quad \mathcal{N}(x) = \{j : x_j \text{ is cornered}\}.$$

As before, we drop the argument when it is clear. The corresponding analog of a basic feasible solution is a corner solution, which is any $x \in \mathbb{R}^n$ satisfying $Ax = b$ such that the columns of $A_{\mathcal{B}}$ are linearly independent.

We are not guaranteed that (MOP) has a corner solution, and to rule out this undesirable possibility we assume that $H_j$ is nonempty for each j. Then, from the full rank assumption on A, the columns of A can be rearranged so that $A = [A' \mid A'']$, where $A'$ is invertible. Partitioning x accordingly and cornering each component of $x''$, we obtain the corner solution $\left(((A')^{-1}(b - A''x''))^T, (x'')^T\right)^T$.

The monotonicity discussion above permits an extended definition of a basic optimal solution, but it is not enough to extend the Fundamental Theorem. For this we require F to have an additional monotonicity property. We say that a real-valued function f is strongly linearly monotonic over Ω if it is strongly monotonic (which again means strictly increasing, strictly decreasing or constant) on each line segment within Ω. The monotonicity property we impose on F to extend the Fundamental Theorem is that each component function $F_i$ be strongly linearly monotonic on the closure of

$$\Omega(x) = \{x + \alpha q : \alpha \in \mathbb{R},\ Aq = 0,\ \mathcal{N}(x + \alpha q) = \mathcal{N}(x),\ x_{\mathcal{N}(x)} = (x + \alpha q)_{\mathcal{N}(x)}\}$$

for each non-corner optimal x. This assumption guarantees that the component functions, and in turn F, are well behaved as we move away from a non-corner optimal solution within the affine plane $\{x : Ax = b\}$.


Methods 


The following result extends the Fundamental Theo- 
rem of Linear Programming. 


Extension of the Fundamental Theorem 


Under the assumptions of the previous section, if $\min\{F(x) : Ax = b\}$ has a solution, then it has a corner optimal solution.

The fact that this result includes the original Fun- 
damental Theorem follows directly from (1) and (2). 


Although the monotonicity properties required by the result are technical, they are not overly restrictive. Indeed, no assumption of continuity or differentiability is needed, and the extension permits functions that are standard counterexamples to other analytical results. To highlight this fact, we consider an example with A = [0, 0, 1] and b = [0], which makes the feasible region $\mathbb{R}^2 \times \{0\}$. We let g be the standard Cantor function on [0, 1], which is continuous, locally constant outside the Cantor set, and differentiable almost everywhere with $g'(x) = 0$. This function fails to be differentiable on the Cantor set, denoted by $C = \{\sum_{i=1}^{\infty} a_i/3^i : a_i \in \{0, 2\}\}$. If $f_{(i,j)}$ is the Cantor function, then $H_j$ contains each of these points. We additionally let h be the Dirichlet function on [1, 2] defined by

$$h(x) = \begin{cases} 0, & x \in [1,2] \cap \mathbb{Q},\\ 1, & x \in [1,2] \setminus \mathbb{Q}. \end{cases}$$

The Dirichlet function is discontinuous over its entire 
domain. These two functions have played a critical role 
in the development of analysis since they highlight er- 
rors in previous mathematical convention. The point of 
discussing them here is to show that the extension of 
the Fundamental Theorem does not suffer from similar 
hindrances. 

We include D as a component function of F, which guarantees that each $H_j$ contains 0. The first two component functions of F are built from the terms

$$f_{(1,1)}(x_1) = \begin{cases} 1 - x_1, & x_1 < 1,\\ h(x_1), & 1 \le x_1 \le 2,\\ \sin\left(x_1 - \frac{4+\pi}{2}\right) + 1, & x_1 > 2, \end{cases}$$

and

$$f_{(2,2)}(x_2) = \begin{cases} \arctan(1 - x_2), & x_2 \le 1,\\ 1 - 2g(x_2 - 1), & 1 < x_2 \le 4/3,\\ 2g(x_2 - 1) - 1, & 4/3 < x_2 \le 2,\\ \ln(x_2 - 1), & x_2 > 2. \end{cases}$$

Since every feasible element has $x_3 = 0$ and $0 \in H_3$, its third component is always cornered. Notice that $f_{(2,3)}$ only needs to be defined over the singleton {0}, and we set $f_{(2,3)}(0) = 0$. Each of the functions just defined is nonnegative over its domain, and since $F_1(1,1,0) = 0$ and $F_2(1,1,0) = 0$, the minimum values of $F_1$ and $F_2$ over $\{x : Ax = b\} = \{x \in \mathbb{R}^3 : x_3 = 0\}$ are simultaneously zero.
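The claims about $f_{(2,2)}$ can be checked at rational points. The sketch below evaluates the Cantor function g from the ternary expansion of its argument (stop at the first digit 1, and read digits 0 and 2 as binary digits 0 and 1), which is a standard characterization; the truncation depth is an arbitrary choice, and the Dirichlet piece $f_{(1,1)}$ is omitted because it cannot be evaluated meaningfully in floating point.

```python
import math
from fractions import Fraction


def cantor(x, depth=60):
    """Cantor function g(x) for rational x in [0, 1], read off the ternary expansion."""
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    for k in range(1, depth + 1):
        x *= 3
        digit = int(x)            # next ternary digit (x >= 0, so int() is floor)
        x -= digit
        if digit == 1:            # x lies in a removed middle third, where g is constant
            return value + Fraction(1, 2 ** k)
        value += Fraction(digit // 2, 2 ** k)  # ternary 2 -> binary 1, ternary 0 -> 0
    return value                  # truncated after `depth` digits (exact if expansion ended)


def f22(x2):
    """The term f_(2,2) of the example."""
    if x2 <= 1:
        return math.atan(1 - x2)
    if x2 <= Fraction(4, 3):
        return 1 - 2 * cantor(x2 - 1)
    if x2 <= 2:
        return 2 * cantor(x2 - 1) - 1
    return math.log(x2 - 1)


for x2 in (Fraction(0), Fraction(1), Fraction(4, 3), Fraction(3, 2), Fraction(5, 3), Fraction(3)):
    print(x2, f22(x2))
```

The zeros at $x_2 = 4/3$ and $x_2 = 5/3$ lie in $C + 1 \subseteq H_2$, whereas $x_2 = 3/2$ lies in a removed middle third, so (1, 3/2, 0) attains the minimum without being a corner.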


Extremum Problems with Probability Functions: Kernel Type Solution Methods 




From the definition of F we have

$$H_1 = \{0\} \cup [1,2] \cup \{2 + k\pi : k = 0, 1, 2, \dots\},\qquad H_2 = \{0\} \cup (C + 1),\qquad H_3 = \{0\}.$$

Since $x_3$ is cornered for every feasible element, we have $\mathcal{B}(x) \subseteq \{1, 2\}$, and hence $A_{\mathcal{B}(x)}$ is a submatrix of [0, 0]. No nonempty subcollection of these columns is linearly independent, and we conclude that every corner satisfies $\mathcal{B}(x) = \emptyset$. This means the collection of corners is $H_1 \times H_2 \times H_3$. Since (1, 1, 0) is in this set and F(1, 1, 0) is the zero vector, a corner optimal solution exists. To see that there are also non-corner optimal solutions, notice that F(1, 3/2, 0) is also the zero vector, but (1, 3/2, 0) is not a corner.


Conclusions 


The Fundamental Theorem of Linear Programming was one of the most important results of the 20th century, and we have discussed how the algebraic insights of this result extend beyond the assumption of linearity. The extended presentation leads to simplex-type procedures for many problems, in which the basic idea is to move from corner to corner until optimality is achieved. Unfortunately, the number of corners can be uncountable, as our example demonstrates, so the finite convergence of linear programming is lost.
Two types of stochastic programs are widely known: two-stage and chance constrained problems. The latter were introduced into stochastic programming by A. Charnes and W.W. Cooper in the 1950s [1] and are formally described by a nonlinear probability function v(x, t) of the form

$$v(x,t) = P\{\xi : f(x,\xi) \le t\}. \qquad (1)$$

Here $f(x,\xi)$ is a real-valued function defined on $\mathbb{R}^n \times \mathbb{R}^s$, t is a fixed level of reliability, $\xi = \xi(\omega)$ is a random parameter and P denotes probability. Note that for fixed x the function v(x, t), as a function of t, is the distribution function of the random variable $f(x,\xi)$.

Various examples of extremum problems with the probability function v(x, t) can be found in [3, Chap. 1], where, among others, the so-called 'stock exchange' paradox is analyzed. To overcome a paradoxical situation caused by an unsuccessful choice of the objective (the expected return), the strategy that maximizes the expected growth of return (the Kelly strategy) was applied in [2]. In [3] it was demonstrated that a risky (i.e., probabilistic) strategy is better than the Kelly one.

For the approximate maximization of v(x, t) over the constraint set $X \subset \mathbb{R}^n$ we should apply some (quasi-)gradient type method. This in turn requires the representation of v(x, t) as an integral, which we can obtain via the Heaviside zero-one function χ(·):

$$\chi(t - f(x,\xi)) = \begin{cases} 1, & \text{if } f(x,\xi) \le t,\\ 0, & \text{if } f(x,\xi) > t. \end{cases}$$

Then

$$v(x,t) = \int \chi(t - f(x,\xi))\, \sigma(d\xi), \qquad (2)$$

where σ(·) is the distribution function of the random vector ξ and the integral in (2) is understood in the Lebesgue-Stieltjes sense.
The integral representation (2) of the probability function v(x, t) clearly shows the difficulties that arise in the approximate maximization of its value: the integrand χ(·) itself is a discontinuous zero-one function, and the integral (2) over χ(·) is, in general, not convex. Only in some cases, e.g., if the function $f(x,\xi)$ is jointly convex and continuous in $(x,\xi)$ and σ(·) as a measure is quasiconcave, is the function v(x, t) quasiconcave in x, see [12].

In this survey we first solve iteratively, using stochastic analogues of the linearization and gradient projection methods, the following probability maximization problem:

$$\max_{x \in X} v(x,t) = \max_{x \in X} P\{\xi : f(x,\xi) \le t\}, \qquad (3)$$

where the constraint set X is assumed to be simple, i.e., on X we can effectively solve auxiliary problems of maximizing linear or quadratic functions. Second, we exploit the introduced technique for the minimization of a smooth function subject to probabilistic equality-inequality type constraints, using a stochastic analogue of the modified Lagrange method.

Gradient type methods require differentiability of the cost function. Many papers have been devoted to differentiability conditions for v(x, t) in x, starting from [13], where $v'_x(x,t)$ was represented via a surface integral. The gradient of v(x, t) via a volume integral was presented in [16]; see also the survey paper [4]. All these formulas are quite inconvenient to handle, especially in numerical methods. In the following we assume differentiability of v(x, t) in x and in (x, t), i.e., that $v'_x(x,t)$ and $v''_{xt}(x,t)$ exist.

Define the solution set X* for problem (3) as follows:

$$X^* = \{x^* \in X : \langle v'_x(x^*,t), x - x^* \rangle \le 0,\ \forall x \in X\}, \qquad (4)$$

or

$$X^* = \{x^* \in X : x^* = \pi[x^* + \rho\, v'_x(x^*,t)],\ \forall \rho > 0\}, \qquad (5)$$

where π[y] denotes the projection of a vector y onto the set X. The linearization and gradient projection methods can then be interpreted as iterative ways of testing conditions (4) and (5), respectively.

Following [17, Chap. IV], a method for solving a problem is said to be convergent if the limit points of the sequence $\{x_n\}$ generated by the algorithm belong to the solution set X*.


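Denote n independent realizations $\xi_1, \dots, \xi_n$ of a random vector ξ by $\xi^n$, i.e., $\xi^n = (\xi_1, \dots, \xi_n)$. Then, following [14] and [10], the smoothed approximation of v(x, t) is

$$v_n(x,t,\xi^n) = \frac{1}{n h_n} \sum_{i=1}^{n} \int_{-\infty}^{t} K\!\left(\frac{\tau - f(x,\xi_i)}{h_n}\right) d\tau, \qquad (6)$$

where the sequence $\{h_n\}$ is connected with the sequence $N = \{1, 2, \dots\}$ by

$$\lim_{n \to \infty} h_n = 0, \qquad \lim_{n \to \infty} n h_n = \infty, \qquad n \in N, \qquad (7)$$

and the continuous kernel function K(y) satisfies the conditions [14]:

$$\int_{-\infty}^{\infty} K(y)\, dy = 1, \qquad \sup_{-\infty < y < \infty} |K(y)| < \infty, \qquad (8)$$

$$\int_{-\infty}^{\infty} y K(y)\, dy = 0, \qquad \int_{-\infty}^{\infty} |y K(y)|\, dy < \infty. \qquad (9)$$

The gradient of the smoothed approximate probability function $v_n(x,t,\xi^n)$ from (6) is

$$v'_{n,x}(x,t,\xi^n) = -\frac{1}{n h_n} \sum_{i=1}^{n} f'_x(x,\xi_i)\, K\!\left(\frac{t - f(x,\xi_i)}{h_n}\right). \qquad (10)$$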


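Even though the estimates (6) and (10) are biased, i.e., $E v_n(x,t,\xi^n) \ne v(x,t)$, we still have

$$E v'_{n,x}(x,t,\xi^n) = v'_x(x,t) - h_n \int_{-\infty}^{\infty} y K(y)\, v''_{xt}(x, t - \theta h_n y)\, dy,$$

where $0 < \theta < 1$, see [15], and consequently

$$\lim_{n \to \infty} \sup_{x \in X} \left| E v'_{n,x}(x,t,\xi^n) - v'_x(x,t) \right| = 0.$$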
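To see the estimates (6) and (10) at work, the sketch below compares them with the plain empirical frequency of the event $f(x,\xi) \le t$ on a case with a closed-form answer. All choices are hypothetical: a Gaussian kernel (which satisfies (8) and (9)), $h_n = n^{-1/5}$ (which satisfies (7)), and $f(x,\xi) = x + \xi$ with ξ standard normal, so that $v(x,t) = \Phi(t - x)$ and $v'_x(x,t) = -\varphi(t - x)$. For a Gaussian kernel, substituting $u = (\tau - f)/h_n$ turns (6) into an average of $\Phi((t - f(x,\xi_i))/h_n)$.

```python
import math
import random


def kernel(y):
    """Gaussian kernel K(y); it satisfies conditions (8) and (9)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def kernel_cdf(u):
    """Integral of K from -infinity to u (the standard normal distribution function)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def f(x, s):
    return x + s          # hypothetical f(x, xi)


def df_dx(x, s):
    return 1.0            # its derivative in x


def estimates(x, t, sample, h):
    """Empirical frequency, smoothed estimate (6) and gradient estimate (10)."""
    n = len(sample)
    raw = sum(1 for s in sample if f(x, s) <= t) / n
    smooth = sum(kernel_cdf((t - f(x, s)) / h) for s in sample) / n
    grad = -sum(df_dx(x, s) * kernel((t - f(x, s)) / h) for s in sample) / (n * h)
    return raw, smooth, grad


rng = random.Random(0)
n = 20000
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # n independent realizations of xi
h = n ** -0.2                                      # h_n = n^(-1/5) satisfies (7)

raw, smooth, grad = estimates(0.0, 1.0, sample, h)
exact_v = kernel_cdf(1.0)      # v(0, 1) = Phi(1)
exact_grad = -kernel(1.0)      # v'_x(0, 1) = -phi(1)
print(f"v: raw {raw:.4f}, smoothed {smooth:.4f}, exact {exact_v:.4f}")
print(f"v'_x: estimate {grad:.4f}, exact {exact_grad:.4f}")
```

Unlike the empirical frequency, the smoothed estimate is differentiable in x, which is what makes the gradient estimate (10) available to the methods that follow.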

For the approximate solution of (3) consider the stochastic analogue of the linearization method:

$$x_{n+1} = x_n + \gamma_n (\bar x_n - x_n), \qquad (11)$$

where $\bar x_n$ is a solution of the linear problem

$$\max_{x \in X} \langle v'_{n,x}(x_n,t,\xi^n), x \rangle = \langle v'_{n,x}(x_n,t,\xi^n), \bar x_n \rangle,$$

and $x_0 \in X$.

Let us explain the stochastic nature of the sequence $\{x_n\}$ generated by algorithm (11). For each n the random vector $x_n$ is defined on the sigma-algebra $\mathcal{F}_{n-1}$ generated by the random vectors $\xi^1, \dots, \xi^{n-1}$. The union $\bigcup_{i=1}^{\infty} \mathcal{F}_i$ of this sequence of sub-sigma-algebras generates the sigma-algebra $\mathcal{F}$ of the initial probability space $(\Omega, \mathcal{F}, P)$ on which the random vector ξ was defined. Note that in each iteration step we should generate new (independent) realizations of the random vector ξ.

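Assume that the function $f(x,\xi)$ is differentiable in x and that, for all $t \in \mathbb{R}^1$ and all $x \in X$, its gradient is bounded by a σ-integrable function $\mathcal{K}(\xi)$:

$$\|f'_x(x,\xi)\| \le \mathcal{K}(\xi), \qquad \int \mathcal{K}(\xi)\, \sigma(d\xi) < \infty. \qquad (12)$$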


Let the sequence $\{\gamma_n\}$ of step lengths satisfy the conditions


(13) 


Then the following convergence theorem holds [5] 


Theorem 1 Let the function $f(x,\xi)$, differentiable in x, satisfy conditions (12), let the continuous smoothing kernel K(y) satisfy conditions (8) and (9), let the sequence $\{\gamma_n\}$ of step lengths satisfy conditions (13), and let the solution set X* be finite. Then all limit points of the sequence $\{x_n\}$ generated by algorithm (11) belong almost surely to the solution set X*.


Remark 2 The proof of the theorem relies on the stochastic analogue of [17, Thm. A], see [9, Chap. II, Thm. 8], and was given in [5].


Remark 3 The statements of the theorem are also valid for the stochastic analogue of the gradient projection method, see [5]:

$$x_{n+1} = \pi[x_n + \gamma_n v'_{n,x}(x_n,t,\xi^n)], \qquad x_0 \in X. \qquad (14)$$
As described earlier, algorithms (11) and (14) need n independent realizations of the random vector ξ in the nth iteration step. In [11] it was shown that, in the asymptotic sense, statistical estimation type methods such as algorithms (11) and (14) have no advantage over random search methods, but require more computational effort.
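A sketch of the gradient projection method (14) on a one-dimensional problem whose answer is known. Everything here is hypothetical: $f(x,\xi) = (x - \xi)^2$ with ξ ~ N(1, 1), t = 1 and X = [-2, 4], so that $v(x,1) = P\{|\xi - x| \le 1\}$ is maximized at x* = 1 by symmetry; the kernel is Gaussian, $h_n = n^{-1/5}$, and the step lengths $\gamma_n = 4/(n+10)$ are an assumed choice with divergent sum and summable squares. As in the text, iteration n draws n fresh realizations of ξ.

```python
import math
import random


def kernel(y):
    """Gaussian smoothing kernel K(y)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def f(x, s):
    return (x - s) ** 2   # hypothetical f(x, xi)


def df_dx(x, s):
    return 2.0 * (x - s)


def grad_estimate(x, t, sample, h):
    """Gradient estimate (10) from the given independent realizations of xi."""
    return -sum(df_dx(x, s) * kernel((t - f(x, s)) / h) for s in sample) / (len(sample) * h)


def project(y, lo, hi):
    """Projection pi[y] onto the interval X = [lo, hi]."""
    return min(max(y, lo), hi)


def gradient_projection(x0, t, lo, hi, iterations, rng):
    """Stochastic gradient projection (14), maximizing v(x, t)."""
    x = x0
    for n in range(1, iterations + 1):
        sample = [rng.gauss(1.0, 1.0) for _ in range(n)]  # n fresh realizations of xi
        h = n ** -0.2                                       # bandwidth h_n
        step = 4.0 / (n + 10)                               # assumed step length gamma_n
        x = project(x + step * grad_estimate(x, t, sample, h), lo, hi)
    return x


x_final = gradient_projection(x0=0.0, t=1.0, lo=-2.0, hi=4.0, iterations=300, rng=random.Random(1))
print(f"x after 300 iterations: {x_final:.3f} (maximizer x* = 1)")
```

The total number of realizations drawn grows quadratically with the number of iterations, which is the extra effort referred to above; the random search method (16) uses a single realization per step instead.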

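As an example of the last statement consider the free maximum problem

$$\max_{x \in \mathbb{R}^n} v(x,t) = \max_{x \in \mathbb{R}^n} P\{\xi : f(x,\xi) \le t\}. \qquad (15)$$

Let $\xi_n$ be the nth realization of the random variable ξ, and consider the algorithm

$$x_{n+1} = x_n - \frac{\gamma_n}{h_n} f'_x(x_n,\xi_n)\, K\!\left(\frac{t - f(x_n,\xi_n)}{h_n}\right), \qquad x_0 \in \mathbb{R}^n. \qquad (16)$$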


Assume, in addition to assumptions (7)-(9), (12) and (13) on $\{h_n\}$, $\{\gamma_n\}$, K(y) and $f(x,\xi)$, that


ie) eo) oO 9 


yy oe > Valin 208: i < oo. (17) 
n=1 n=1 n=1 " 
Then, if $\int |f'_x(x,\xi)|\, \sigma(d\xi)$ is bounded for bounded x, the limit points of the sequence $\{x_n\}$ belong almost surely to the set X* of stationary points,

$$X^* = \{x^* : v'_x(x^*,t) = 0\},$$

see [7].


Remark 4 Although algorithms (11) and (14) require more computational effort than the random search method (16), the latter is very unstable and converges only in the 'in probability' sense.


Consider the following nonlinear programming problem with a smooth cost function f(x) and a probabilistic inequality constraint with a fixed level of reliability α, 0 < α < 1, i.e.,

$$\min_{x \in \mathbb{R}^n} \{f(x) : v(x,t) \ge \alpha\} \qquad (18)$$

(for the sake of simplicity we consider only the case of one inequality constraint).

Define the solution set X* for problem (18) as follows [8]:


$$X^* = \{x^* : x^* \in F \cap G\}, \qquad (19)$$
where 
r= ce [ROY + et Oat] = ob, Qo) 
with
t= arg min [Fi(x*) + v(x", Dal’, (21) 
>0 
and 
$$G = \{x^* : v(x^*,t) \ge \alpha\}, \qquad (22)$$

where λ* is the optimal Lagrange multiplier of the Lagrangian.

Replacing v(x, t) and $v'_x(x,t)$ by their estimates (6) and (10), we have to regularize the estimated analogue of (21), since the approximated subproblem (21) could be ill-posed.

Denote

$$\psi_n(x,t,\xi^n) = \min\{0,\ v_n(x,t,\xi^n) - \alpha\}.$$


Then the stochastic analogue of the modified Lagrange method reads

$$x_{n+1} = x_n - \gamma_n \left[f'(x_n) + \lambda_n(\xi^n)\, v'_{n,x}(x_n,t,\xi^n) + M\, v'_{n,x}(x_n,t,\xi^n)\, \psi_n(x_n,t,\xi^n)\right], \qquad (23)$$


where $\lambda_n(\xi^n)$ is a solution of the regularized auxiliary quadratic programming subproblem


min [| {(n) + Magn f EAL? + ty IAP] 


with $a_n > 0$, $a_n \to 0$ as $n \to \infty$, and $M > 0$. The following convergence theorem is valid, see [6]:


Theorem 5 Let the conditions of the previous theorem be satisfied, let the cost function f(x) be continuously differentiable, and let

$$\sum_{n=1}^{\infty} a_n \gamma_n < \infty.$$

Then the limit points of the sequence generated by algorithm (23) belong almost surely to the solution set X* defined by (19).
Facilities layout (FL) is concerned with the placement, relative to one another, of the facilities of some physical system. This area differs from planar multifacility location (MFL; cf. also » Multifacility and Restricted Location Problems) in that in FL the facilities are all assumed to have a significant physical area and are to be placed within a finite total area representing their physical system. In MFL the facilities are assumed to be dimensionless points.

The aim of FL is to produce a scale plan (in some 
scenarios called a block plan) of the physical system to 
be designed. The plan depicts the facilities of the system 
(each one having its given area and shape) laid out rel- 
ative to each other. An example of a simple block plan 
is shown in Fig. 1. 

The identification of effective plans depends upon interfacility relationships, which may be quantitative (e.g., transportation costs) or qualitative (e.g., utility scores, called REL chart scores, based on facility adjacency). Each FL problem involves optimizing one or more objective functions based on the given interfacility relationships.

FL is an important application area of optimization. This is partly because increased global competition in manufacturing has spurred renewed efforts to reduce production costs. Efficient physical layout design of manufacturing plants is critical in the quest to achieve and maintain competitive productivity. Indeed, up to 70% of the operating costs of a manufacturing system are related to materials handling and layout. This is because improved layout design often brings about reductions in materials handling, transportation, congestion, and work-in-process.

Facilities Layout Problems, Figure 1
A block plan with 11 facilities, including the exterior region, indicated as facility 1

There are applications of FL techniques in areas other than manufacturing plant design. Examples include the design of office blocks and other commercial buildings, hospitals and other public services, university campuses, government agencies, and sports complexes. As will become evident in the following discussion, most FL models are NP-hard in the strong sense (cf. also » Complexity Theory; » Complexity Classes in Optimization). This has reinforced the search for effective heuristics for them.

One of the earliest and best-known FL heuristics is CRAFT (computerized relative allocation of facilities technique) [1]. It requires an initial block plan as input, which it attempts to improve by exchanging the positions of two or three facilities at a time. In contrast to this improvement procedure, many other early FL heuristics are construction procedures, which build up the final block plan iteratively by placing facilities sequentially. The serial decision process requires, at each step:

i) a selection of which facility is to be placed next in the block plan being constructed, and

ii) a decision as to where this facility is going to be placed.

Early construction procedures include: 

LAP [23] and ALDEP [32]. 

One of the major FL optimization models is based 
on the quadratic assignment problem (QAP; cf. also 
> Quadratic Assignment Problem). For overviews on 
this subject see [4,5,29]. Formulations of various FL 
problems based on the QAP involve minimizing the 
total transportation cost between all pairs of facilities. 
This total cost comprises a sum of components calcu- 
lated according to the distance and the amount of work 
flow between each pair of facilities. The constraints 
of the QAP model are based on the assumption that 
the block plan is tessellated into a grid of unit squares 
(called locations) and that no two facilities are to be
assigned the same location. Many of these models assume
that all the facilities are of equal area. However, when
facilities have unequal areas or irregular shapes, addi- 
tional constraints must be added. The facilities are par- 
titioned into a number of subfacilities of unit area. The 
problem then is to locate the subfacilities so that all the 
subfacilities of each facility are assigned adjacent loca- 
tions in an appropriate configuration. As the QAP is 
NP-hard, most FL applications of it are concerned with 
heuristics. A QAP model of a common FL problem is: 


$$\min \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{r=1}^{n} q_{ijkr}\, x_{ij}\, x_{kr}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1,\dots,n,$$

$$\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1,\dots,n,$$

$$x_{ij} \in \{0,1\}, \quad i,j = 1,\dots,n,$$

where
$n$ = the number of subfacilities,
$c_{ij}$ = the cost per unit time period of assigning subfacility $i$ to location $j$. (These costs are usually one-time relocation costs which are converted to an annual equivalent.)
$d_{jr}$ = the cost per movement or interaction over the distance from location $j$ to location $r$,
$f_{ik}$ = the number of moves per time period in the workflow from subfacility $i$ to subfacility $k$,
$S_i$ = the set of locations to which subfacility $i$ may be feasibly assigned,

$$q_{ijkr} = \begin{cases} f_{ik}\, d_{jr} & \text{if } i \neq k \text{ or } j \neq r, \\ f_{ii}\, d_{jj} + c_{ij} & \text{if } i = k \text{ and } j = r, \end{cases}$$

$$x_{ij} = \begin{cases} 1 & \text{if subfacility } i \text{ is assigned to location } j, \\ 0 & \text{otherwise.} \end{cases}$$


If there are more locations than subfacilities, a number
of dummy facilities can be introduced with zero $c_{ij}$
and $f_{ik}$ values. The $f_{ik}$ values are set to relatively high
levels if subfacilities $i$ and $k$ belong to the same facility,
thereby ensuring their adjacency. The $c_{ij}$ values are set
to relatively high values when $j \notin S_i$.
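To make the QAP objective concrete, the following Python sketch evaluates the cost of an assignment and finds an optimum by brute-force enumeration. An assignment is represented as a permutation `perm` with `perm[i] = j` whenever $x_{ij} = 1$; the function names and the tiny three-location instance are illustrative assumptions. Enumerating all $n!$ permutations is practical only for very small $n$, which is why FL applications of the QAP rely on heuristics.

```python
from itertools import permutations


def qap_cost(perm, f, d, c):
    """Cost of the assignment x_ij = 1 iff perm[i] == j.

    Expands sum_{i,j,k,r} q_ijkr x_ij x_kr: the flow term f_ik * d_jr is added
    for every pair (i, k), including i == k (giving f_ii * d_jj), and the
    diagonal case i == k, j == r adds the assignment cost c_ij once.
    """
    n = len(perm)
    total = 0
    for i in range(n):
        total += c[i][perm[i]]
        for k in range(n):
            total += f[i][k] * d[perm[i]][perm[k]]
    return total


def solve_qap_bruteforce(f, d, c):
    """Enumerate all n! assignments (hypothetical helper, tiny n only)."""
    n = len(f)
    best = min(permutations(range(n)), key=lambda p: qap_cost(p, f, d, c))
    return best, qap_cost(best, f, d, c)


if __name__ == "__main__":
    # Three locations on a line, d_jr = |j - r|; heavy flow between
    # subfacilities 0 and 1, light flow between 1 and 2, no assignment costs.
    d = [[abs(j - r) for r in range(3)] for j in range(3)]
    f = [[0, 10, 0], [10, 0, 1], [0, 1, 0]]
    c = [[0] * 3 for _ in range(3)]
    print(solve_qap_bruteforce(f, d, c))  # ((0, 1, 2), 22)
```

In the example, subfacilities 0 and 1 end up in adjacent locations, as the large $f_{01}$ suggests.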

A second major FL optimization model is based on 
graph theory (GT) and involves maximizing the sum 
of the REL chart scores corresponding to the pairs of
facilities that are adjacent in the block plan. The for- 
mulations can accommodate specifications that the re- 
gion exterior to the block plan is one of the facilities, 
and that the facilities are of unequal areas and vari- 
ous shapes. GT models represent facilities and the pos- 
sible adjacency of pairs of facilities in the block plan 
by the vertices and edges of a graph, respectively. The 
REL chart scores are used to weight the edges of the 
graph. The objective is to identify the planar subgraph 
(termed an adjacency graph) of this graph with the 
largest total weight in terms of its REL chart scores. 
The optimal adjacency graph specifies which pairs of 
facilities are to be placed adjacent to each other in the 
block plan. As this model was shown to be NP-hard in 
[13], most research concentrates on heuristics. How-
ever, some GT algorithms for FL problems guarantee-
ing optimality do exist. The algorithm in [12] involves
a series of tests for determining whether a proposed ad-
jacency graph being constructed is planar or not. In [6]
an integer programming formulation based on the GT
approach is discussed. It employs a Lagrangian relax-
ation procedure (cf. also » Integer Programming: La-
grangian Relaxation) for the derivation of bounds to be
used in a branch and bound algorithm (cf. also » Inte-
ger Programming: Branch and Bound Methods). Ap-
proaches to enforce connectivity of subgraphs corre-
sponding to facilities are taken from k-cardinality tree
models ([14] and [7]), which can also incorporate for-
bidden areas [10,11].

Early GT heuristics first identify the adjacency 
graph and then attempt to construct a block plan cor- 
responding to the information provided by the graph. 
Examples include the heuristics of [3,9,24] and [27]. 
The comparisons in [28] show that the results of [27] 
are invariably so close to optimality that the quest for 
heuristics which find good quality adjacency graphs can 
now be considered essentially solved. More recent GT 
heuristics build up the adjacency graph and its corre- 
sponding block plan simultaneously, such as the heuris- 
tics of [37]. 
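The adjacency-graph step of the GT approach can be illustrated with a short sketch. The Python code below grows a maximal planar graph greedily. It seeds with the four facilities whose complete graph $K_4$ has the largest total REL chart score, then repeatedly inserts the remaining facility into the triangular face that adds the largest score. Each insertion splits one triangular face into three, so the graph stays planar and ends with $3n-6$ edges. This face-insertion rule is a simplified illustrative assumption, not a reproduction of any of the heuristics cited above; REL scores are assumed to be given as a symmetric numeric matrix `w`.

```python
from itertools import combinations


def greedy_adjacency_graph(w):
    """Grow a maximal planar adjacency graph by greedy face insertion.

    w is a symmetric matrix of REL chart scores (n >= 4). Returns the sorted
    edge list and its total score. Illustrative rule only (an assumption).
    """
    n = len(w)
    if n < 4:
        raise ValueError("need at least four facilities")

    def k4_score(quad):
        return sum(w[a][b] for a, b in combinations(quad, 2))

    # Seed: the four facilities whose complete graph K4 scores highest.
    seed = max(combinations(range(n), 4), key=k4_score)
    edges = {frozenset(e) for e in combinations(seed, 2)}
    # A planar drawing of K4 has four triangular faces (every 3-subset).
    faces = list(combinations(seed, 3))
    remaining = [v for v in range(n) if v not in seed]

    while remaining:
        # Choose the (facility, face) pair that adds the largest score.
        v, face = max(((x, fc) for x in remaining for fc in faces),
                      key=lambda vf: sum(w[vf[0]][u] for u in vf[1]))
        a, b, c = face
        faces.remove(face)
        faces += [(a, b, v), (b, c, v), (a, c, v)]  # one face becomes three
        edges |= {frozenset((v, a)), frozenset((v, b)), frozenset((v, c))}
        remaining.remove(v)

    edge_list = sorted(tuple(sorted(e)) for e in edges)
    return edge_list, sum(w[a][b] for a, b in edge_list)
```

The resulting edge list plays the role of the adjacency graph; constructing a block plan that realizes it is a separate step.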

It has been observed that many of the previously 
mentioned techniques are not computationally feasible 
for some of the large scale numerical instances of FL 
problems encountered in industry and often identify lo- 
cal optima which are clearly far from globally optimal. 
This has given rise to many investigations into whether
the more recently developed random search procedures
(such as simulated annealing (SA; cf. also » Simulated
Annealing Methods in Protein Folding) and genetic al-
gorithms (GA; cf. also » Genetic Algorithms)) could
be used to devise useful FL heuristics. There is a fun-
damental difference between SA and GA: GA neces-
sarily works with a set (population) of possible solutions
to the problem at hand, while SA considers only one
possible solution at a time. Because GA explores the set
of all feasible solutions by combining the characteristics
of various single feasible solutions, it sometimes covers
a larger portion of the solution space than SA within
the same computational time. Thus, it appears to be the
more successful of the two for FL problems.
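As an illustration of the single-solution search that characterizes SA, here is a minimal Python sketch of simulated annealing for the flow part of the QAP model above (the $c_{ij}$ term is omitted for brevity). Moves are pairwise exchanges of locations, in the spirit of CRAFT. The geometric cooling schedule $t \leftarrow \text{cooling} \cdot t$ and all parameter values (`t0`, `cooling`, `iters_per_temp`, `t_min`) are assumptions chosen for illustration, not settings from the cited studies.

```python
import math
import random


def flow_cost(p, f, d):
    """Flow part of the QAP objective: sum_{i,k} f_ik * d_{p(i), p(k)}."""
    n = len(p)
    return sum(f[i][k] * d[p[i]][p[k]] for i in range(n) for k in range(n))


def anneal_qap(f, d, t0=10.0, cooling=0.95, iters_per_temp=50, t_min=1e-3, seed=0):
    """Simulated annealing with pairwise-exchange moves (assumed parameters)."""
    rng = random.Random(seed)
    n = len(f)
    p = list(range(n))
    rng.shuffle(p)                       # p[i] = location of subfacility i
    cost = flow_cost(p, f, d)
    best, best_cost = p[:], cost
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            i, k = rng.sample(range(n), 2)
            p[i], p[k] = p[k], p[i]      # exchange the locations of i and k
            new_cost = flow_cost(p, f, d)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost          # accept: always if not worse
                if cost < best_cost:
                    best, best_cost = p[:], cost
            else:
                p[i], p[k] = p[k], p[i]  # reject: undo the exchange
        t *= cooling                     # geometric cooling (assumption)
    return best, best_cost
```

Only one current solution `p` is kept at any time; uphill moves are accepted with probability $e^{-\Delta/t}$, which shrinks as the temperature falls.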

SA can be applied to FL problems in a variety of 
ways. There exist SA improvement heuristics for FL 
problems with:

i) multiple floors (based on the improvement approach) [26];

ii) multiple objectives based on both transportation costs and REL chart scores [33].

For further information see [16] and [22]. However, 
it appears that the logarithmic cooling schedule of SA 
causes its FL heuristics to perform relatively slowly. For 
this reason it seems that GA heuristics are more effec- 
tive for FL problems. For instance the GA approach to 
solve the QAP, devised in [35], can be applied to QAP 
models of FL problems, such as the one given earlier. 
However, this GA heuristic has only a single solution 
giving rise to a mutant, which means that parallelism is 
lost to a certain extent. 

To overcome this deficiency, it is possible to design 
more effective GA heuristics for FL problems by adopt- 
ing a small mutation rate and a large crossover rate. 
A heuristic with efficient crossover operators and a low
mutation rate has been devised in [34]. Further heuris-
tic attempts to tackle the QAP include tabu search (see
e.g. [2]) and the reverse elimination method [36].
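The combination of a large crossover rate and a small mutation rate can be seen in the following Python sketch of a genetic algorithm for the same flow-cost QAP objective. Solutions are permutations; crossover is a simple order-based operator (copy a random slice from one parent, then fill the remaining positions in the other parent's order), and mutation is a single random exchange. Truncation selection with elitism and the default parameter values are illustrative assumptions, not the operators of [34] or [35].

```python
import random


def flow_cost(p, f, d):
    """Flow part of the QAP objective: sum_{i,k} f_ik * d_{p(i), p(k)}."""
    n = len(p)
    return sum(f[i][k] * d[p[i]][p[k]] for i in range(n) for k in range(n))


def order_crossover(a, b, rng):
    """Keep a random slice of parent a; fill other positions in b's order."""
    n = len(a)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    child = [None] * n
    child[lo:hi] = a[lo:hi]
    kept = set(a[lo:hi])
    rest = iter(g for g in b if g not in kept)
    return [g if g is not None else next(rest) for g in child]


def genetic_qap(f, d, pop_size=20, generations=100, p_cross=0.9, p_mut=0.05, seed=0):
    """Illustrative GA: high crossover rate, low mutation rate (assumed values)."""
    rng = random.Random(seed)
    n = len(f)

    def cost(p):
        return flow_cost(p, f, d)

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 2]  # truncation selection; elites survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = order_crossover(a, b, rng) if rng.random() < p_cross else a[:]
            if rng.random() < p_mut:  # mutation: exchange two locations
                i, k = rng.sample(range(n), 2)
                child[i], child[k] = child[k], child[i]
            children.append(child)
        pop = elite + children
    best = min(pop, key=cost)
    return best, cost(best)
```

Here a whole population evolves in parallel and most new solutions come from crossover, while mutation only occasionally perturbs a child.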

The approaches to FL described so far have been 
classical in the sense that they have nearly all embraced 
single objective functions. In contrast, there have been 
developments in FL models with multiple criteria. Ex- 
amples include: a multifactor plant layout methodology 
devised in [15], a layout planning system with multiple 
criteria and a variable domain representation in [18], 
an expert system using priorities for solving multiple 
criteria facilities layout problems in [25], and a multi-
attribute decision theoretic approach for layout design
in [31].

There are numerous computer programs in exis-
tence which implement FL heuristics. Three early ones
from the 1960s, CRAFT, CORELAP, and ALDEP, have
already been discussed. In the 1970s two improvement-
style heuristics, both based on CRAFT, appeared to be
among the best of those proposed then. FRAT (facili- 
ties relative allocation technique) [21] assumes that all 
the facilities have equal areas. TSP (terminal sampling 
procedure) [17] carries out the interchange of the place- 
ment, in the block plan, of pairs of facilities on a se- 
lective basis. The program has the ability to use im- 
proved block plans as input and to fix the placement of 
certain facilities. Three of the large number of FL pro-
grams written in the 1980s will be mentioned. SPACECRAFT
[20] is an extension of CRAFT to multifloor FL
problems. See [17] for a perturbation scheme, and [18] 
for a new FL system which accommodates a variety of 
types of spaces, including solid, circulation, and empty. 
A multicriteria objective function involves transporta- 
tion cost, REL chart scores, the percentage of unused 
area, and block plan structure. 

The 1990s saw a different type of FL program 
emerging: the decision support system (DSS). One such 
example, called Layout Manager [8], is a user-friendly
menu-driven DSS which provides for the choice be- 
tween a number of optimality criteria including, among 
others, transportation cost and REL chart scores. The 
system is written in Pascal, within the Microsoft Win- 
dows environment. 
A typical assumption in facility location models is that
the cost customers face in patronizing facilities is in-
dependent of the actions of other customers (with the
possible exception of capacity restrictions). For exam-
ple, many classical facility location models assume that
customers patronize the facility (or are served by the fa-
cility) that minimizes the cost of travel between the fa-
cility and the customer (e. g., see [12,13]). Other facility
location models incorporate marketing considerations,
and assume that customers patronize the facility that is
‘most attractive’ to them, where attractiveness depends
not only on travel cost, but also on attributes of each fa-
cility such as size, goods offered, and number of servers
(e.g., see [15]).

However, in many situations, the cost customers 
face in patronizing a facility is a function of the actions 
of other customers. For example, waiting time for ser- 
vice may be longer in a store that is patronized by many 
customers than in a store with fewer customers. An am- 
bulance that serves a large, heavily-populated area is 
likely to incur longer delays in providing service than an 
ambulance serving a smaller, less-populated area. These 
are examples of negative externalities associated with 
the market share of the facility. Conversely, in some 
cases the externalities could be positive: for example, 
a crowded nightclub is likely to be more popular than 
one that attracts fewer patrons. 

If facilities provide essential services (e. g., gasoline, 
drivers’ licenses), customer demand may be constant, 
regardless of the costs customers face in obtaining ser- 
vices. However, for facilities that provide nonessential 
services (e.g., fast-food restaurants, retail stores), cus- 
tomer demand might be a function of the total cost of 
receiving service. 

This chapter discusses models for the location of 
facilities that incorporate not only travel cost but also 
negative externalities associated with the market share 
of the facility. Various problem formulations are dis- 
cussed, and selected references are provided. A more 
comprehensive discussion is given in [10]. The case of 
positive externalities is not discussed because, for such 
problems, degenerate solutions tend to occur (e. g., the 
optimal solution may be to locate all facilities at the 
same point, with any point in the region being opti- 
mal). 

One can consider two different situations regard- 
ing the allocation of customer demands to facilities. 
In a user-optimizing environment, customers patron- 
ize the facility that minimizes their total cost, in this 
case travel cost plus externality cost. Such a situation 
occurs, for example, in customers’ selection of grocery 
stores and bank branches. In a system-optimizing envi- 
ronment, customers are assigned to facilities by a cen- 
tral agent. An example is the assignment of voters to 
polling places. 

In the system-optimizing environment, allocation
of customer demands can be considered as part of the
location optimization problem, similar to many mod-
els of facility location that do not incorporate exter-
nalities. In the user-optimizing environment, however,
models of facility location have at their core a customer- 
choice equilibrium problem: equilibrium occurs when 
each customer frequents the facility that minimizes his 
total travel cost plus externality cost. For purely neg- 
ative externalities, the equilibrium utilization of facili- 
ties (total demand satisfied by each facility) is unique, 
although the equilibrium user-choice pattern (alloca- 
tion of individual customer demands to facilities) may 
not be unique ([8,18]). This result holds whether de- 
mands are inelastic or elastic with respect to total cus- 
tomer cost. Determination of the user-choice equilib- 
rium can be written as a nonlinear complementarity 
problem (analogous to [1]), and also as a network flow 
problem [21] which can be solved using network opti- 
mization techniques (e. g., [20]). 
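To illustrate what a user-choice equilibrium looks like computationally, the Python sketch below approximates equilibrium facility utilizations with the method of successive averages. This is a standard technique for traffic-type equilibria that we substitute here for the complementarity and network flow formulations cited above. In each iteration every customer node shifts a diminishing fraction of its inelastic demand toward the facility with the lowest travel cost plus externality cost at the current utilizations. The function name, the externality function `delay(j, u)` (assumed increasing in the utilization `u`, i.e. a purely negative externality) and the iteration count are assumptions.

```python
def equilibrium_loads(demand, travel, delay, iters=5000):
    """Approximate user-choice equilibrium by the method of successive averages.

    demand[i]    : inelastic demand of customer node i
    travel[i][j] : travel cost from customer node i to facility j
    delay(j, u)  : externality cost at facility j under utilization u (assumed
                   increasing in u)
    Returns (loads, x) where loads[j] is the total demand served by facility j.
    """
    m, n = len(demand), len(travel[0])

    def loads_of(x):
        return [sum(x[i][j] for i in range(m)) for j in range(n)]

    # Start with every customer at its nearest facility (no congestion yet).
    x = [[0.0] * n for _ in range(m)]
    for i in range(m):
        nearest = min(range(n), key=lambda j: travel[i][j])
        x[i][nearest] = float(demand[i])

    for k in range(1, iters + 1):
        loads = loads_of(x)
        step = 1.0 / (k + 1)  # diminishing step sizes
        for i in range(m):
            # All-or-nothing target: node i's currently cheapest facility.
            best = min(range(n), key=lambda j: travel[i][j] + delay(j, loads[j]))
            for j in range(n):
                target = demand[i] if j == best else 0.0
                x[i][j] += step * (target - x[i][j])
    return loads_of(x), x
```

For a single customer node with demand 10, travel costs 0 and 2 and `delay(j, u) = u`, equilibrium requires $0 + u_1 = 2 + u_2$ with $u_1 + u_2 = 10$, i.e. utilizations 6 and 4, which the iteration approaches.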

This article discusses models for the location of fa- 
cilities in both types of customer choice environments. 
A distinction is made between facilities with mobile 
servers (e. g., ambulances) that travel to fixed customers 
and return to their home location between calls and fa- 
cilities that house fixed servers (e. g., postal clerks). 


Location of Mobile Servers 


Some of the first location models to incorporate ex- 
ternalities were developed in the context of emergency 
service vehicle location. In such models, the servers 
(the emergency service vehicles) travel to customers, 
and the externality cost is the servers’ queueing de- 
lay. A system-optimizing environment is assumed: cus- 
tomers are assigned to service regions of the servers. 
Models for determining the home location of such mo- 
bile servers have considered a variety of location ob- 
jectives, including minimization of mean response time 
to customers (travel time plus queue delay), minimiza- 
tion of the maximum response time to any customer, 
equalization of server workloads, and other objectives. 
Examples of such models can be found in [3,4,5,7,11], 
and [19]. 


Location of Fixed Service Facilities 


Most other facility location models that incorporate ex- 
ternality costs have assumed fixed service facilities. 
System-Optimizing Environment


In the system-optimizing environment, since cus- 
tomers are assigned centrally to facilities, it is natu- 
ral to think only of noncompeting facilities. For the 
case of fixed customer demands, a natural objective 
in locating facilities (and allocating customers to fa- 
cilities) is to minimize total customer cost. This prob- 
lem is a generalized p-median problem. Such a model 
might be appropriate for the location of certain pub- 
lic facilities such as voters’ polling places. O. Berman 
and R.C. Larson [2] presented a p-median problem 
that includes queueing-like congestion of the facilities. 
In the system-optimizing environment with elastic de- 
mands, a natural objective is to locate facilities to max- 
imize the total demand served by facilities (i.e., max- 
imum facility utilization). Such a model might be rel- 
evant for the location of fast-food franchises or clinics 
for preventive childcare (e. g., inoculations). This facil- 
ity location problem is a generalized p-median prob- 
lem with an embedded demand equilibrium [18]. For 
the case of discrete customer demands on a network, 
S. Kumar [18] proved a nodal optimality theorem and 
showed that the problem can be formulated as a nonlin- 
ear integer convex programming problem and solved 
using branch and bound (see also [10]). 


User-Optimizing Environment 


In the user-optimizing environment with fixed service 
facilities, one can distinguish between noncompeting 
and competing facilities. For the case of noncompet- 
ing facilities in the user-optimizing environment with 
inelastic demand, a natural location objective is to min- 
imize total customer cost. This framework might be ap- 
propriate for the location of public facilities such as So- 
cial Security Offices. Assuming discrete customer de- 
mands, the problem can be written as a mixed integer 
bilevel program [10] (given a set of fixed facility loca- 
tions, one can then determine the user-choice equilib- 
rium utilization of facilities). M.L. Brandeau and S.S. 
Chiu [8] considered the case of two such facilities on 
a tree network with nodal demands. They character- 
ized the optimal facility locations, and presented an al- 
gorithm for finding those locations. 

A typical location objective for the case of com-
peting facilities (whether or not externalities are con-
sidered) is maximization of market share. When ex-
ternalities are not considered, problems of competi-
tive facility location involve a locational equilibrium;
when negative externality costs and user-optimizing
customer choice are considered, such problems also in-
volve a customer-choice equilibrium. E. Kohlberg [17]
considered the location of competing identical facilities 
on a line with uniformly distributed, inelastic demands 
where customers select a facility based on the sum of 
travel time plus waiting time for service. For the case 
of two facilities, the optimal locations occur at the mid- 
point of the line, and for the case of more than two fa- 
cilities, Kohlberg [17] showed that a locational equilib- 
rium does not occur. R.M. Braid [6] analyzed the lo- 
cational equilibrium for two congested public facilities 
located by competing governmental jurisdictions in an 
inelastic-demand environment. Brandeau and Chiu [9] 
analyzed the case of two competing facilities on a tree 
network with inelastic demands and a general negative 
externality function. Such a model might be appropri- 
ate for the location of similar competing grocery stores. 
They assumed a Stackelberg game (with a leader and 
a follower). They characterized the optimal locations of 
the leader and the follower, and presented an algorithm 
for finding those locations. 

Kumar [18] considered the location decision of 
a profit-maximizing firm that locates one facility in a re- 
gion where a number of competitors are already located 
and in which customer demand is elastic. An example 
application is the location of competing retail outlets. 
The problem is a bilevel programming problem which 
can be heuristically solved using a gradient projection 
ascent approach (e. g., [14]). 


Resource Allocation with Externalities 


If facilities are already located, changing facility loca- 
tions may be expensive. An alternative is to allocate re- 
sources to change the characteristics of the facility (e. g., 
through training or technological improvements). The 
question is how to balance the cost of change with the 
associated benefits (e.g., increased market share, low- 
ered total customer cost). Resource allocation problems 
of this type are discussed in [10] and [16]. 


See also 


» Combinatorial Optimization Algorithms in
Resource Allocation Problems 
Facility location problems deal with the question of
where to locate certain facilities, so that they can sat- 
isfy some kind of demand of a certain set of customers, 
and so that the total cost is minimized. If the facili- 
ties are factories or warehouses and the goods will be 
shipped from the facilities to the customers, one can as- 
sume that the shipments will be made so as to minimize 
the transportation costs. (See also » Facility Location
with Staircase Costs and » Stochastic Transportation 
and Location Problems.) However, if the facilities are 
hospitals or supermarkets, transportation will consist of 
customers traveling by their own means to/from the fa- 
cility, and in such a case, it is not certain that each cus- 
tomer will behave exactly so as to minimize the trans- 
portation costs. 

So in public facility location problems where the 
clients are free to make their own choice of facility, 
one should probably expect different results than those 
minimizing the transportation costs. When modeling such
situations, the objective cannot be only to minimize
the total transportation and facility costs. The effect
of spatial interaction has been used to improve loca- 
tion models of this type. Simple plant location problems 
with spatial interaction between the travelers have been 
treated in [3,4,6,19,20,22,23], modeled as a nonlinear, 
mixed integer programming problem. 

In [15] a different model is derived, in a similar way
as in [14], that does not use the approximation
yielding entropy terms. The model is called the ‘exact’
formulation of the simple plant location problem with
spatial interaction, because it derives the gravity model
in the classical way, without any approximation.

Assuming integer requirements on the transported
amounts enables an exact linearization of the nonlinear
costs. This yields a linear, pure zero-one model, at the
price of a significantly increased number of variables.
Fortunately, the model has a special structure that can be
exploited by several different solution methods.


Model 


We now describe a public facility location model, with
$m$ possible locations for supply points (plants) and $n$ demand points (client zones). The fixed cost for opening
plant $i$ is $f_i$. At demand point $j$ the demand (the number of clients in zone $j$) is $w_j$. Trips will be made between
the demand points and the opened plants so that the demand is satisfied. The transportation cost for one trip
between plant $i$ and demand point $j$ (i.e. the cost for a client at zone $j$ to get service at plant $i$) is $c_{ij}$.
The following variables are introduced.

$$z_i = \begin{cases} 1 & \text{if a plant at location } i \text{ is opened,} \\ 0 & \text{if not,} \end{cases}$$

$x_{ij}$ = the number of trips between plant $i$ and demand point $j$ (i.e. the number of clients in zone $j$ getting service at plant $i$).


The total cost for transportation and opening plants
is

$$v_1 = \min \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}\, x_{ij} + \sum_{i=1}^{m} f_i\, z_i .$$


As for the spatial interaction, one may note that
several microstates (obtained by identifying every single
client’s trip) may yield the same macrostate (the $x$-solution).
The macrostate given by the largest number
of microstates is the most probable solution, according
to [29]. Since the number of microstates yielding $x$ is
$\big(\sum_{i,j} x_{ij}\big)! / \prod_{i,j} x_{ij}!$ and the total number of trips $\sum_j w_j$ is fixed,
maximizing it is equivalent to minimizing $\sum_{i,j} \ln(x_{ij}!)$. This
yields another objective function for finding the most likely $x$-solution:

$$v_2 = \min \sum_{i=1}^{m}\sum_{j=1}^{n} \ln(x_{ij}!) .$$


A suitable objective function is now obtained by
combining these two parts, $v_3 = v_2 + \gamma v_1$, where
the weight $\gamma$ reflects the sensitivity of the system to the
costs. For large values of $\gamma$, it is very important to minimize
the costs, while for smaller values of $\gamma$, the costs
are not very important.




Facility Location Problems with Spatial Interaction 


The best value of the parameter $\gamma$, being the weight
of how much the clients take the costs into account,
must be found by calibrating the model against a real-life
situation. Considering a certain situation, one can
assume that $\gamma$ is fixed and given.

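The following model (SPLPS) is obtained:

$$v^* = \min \sum_{i=1}^{m}\sum_{j=1}^{n} \ln(x_{ij}!) + \gamma \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}\, x_{ij} + \gamma \sum_{i=1}^{m} f_i\, z_i$$

such that

$$\sum_{i=1}^{m} x_{ij} = w_j, \quad \forall j, \tag{1}$$

$$x_{ij} - w_j z_i \le 0, \quad \forall i,j, \tag{2}$$

$$x_{ij} \ge 0, \text{ integer}, \quad \forall i,j, \tag{3}$$

$$z_i \in \{0,1\}, \quad \forall i. \tag{4}$$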


SPLPS is a pure integer problem, with a nonlinear 
objective function that actually is defined only in the 
integer points. 

If $\gamma$ is so large that the logarithmic part is negligible,
we get pure cost minimization. The model is then identical
to the simple plant location problem, SPLP, and
can be efficiently solved by, for example, a dual ascent
method [5].

In previous work, a continuous relaxation of $x$
together with Stirling’s approximation, $\ln(x_{ij}!) \approx x_{ij}\ln(x_{ij}) - x_{ij}$,
has been used, yielding a nonlinear, mixed
integer programming problem.

Now we linearize the cost function for each variable
$x_{ij}$ in the interval $0 \le x_{ij} \le w_j$, with breakpoints
at each integer point. This does not introduce any
error (as Stirling’s approximation would). The number
of variables then depends on the values of the
demands.

We get $c_{ijk} = \ln(k!) - \ln((k-1)!) + \gamma c_{ij} = \ln(k) + \gamma c_{ij}$.
Note that $c_{ijk} > c_{ij,k-1}$ [18] (since $\ln k$ is strictly increasing in $k$),
which indicates convexity of the resulting cost functions.
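The exactness of this piecewise-linear representation is easy to check numerically. In the Python sketch below (function names are hypothetical), `segment_costs` lists the slopes $c_{ijk} = \ln k + \gamma c_{ij}$ for $k = 1,\dots,w_j$; filling the first $x$ unit segments reproduces $\ln(x!) + \gamma c_{ij} x$ exactly, whereas Stirling's approximation $x \ln x - x$ does not. The standard-library function `math.lgamma(x + 1)` is used to compute $\ln(x!)$.

```python
import math


def segment_costs(c_ij, w_j, gamma):
    """Slopes c_ijk = ln(k) + gamma * c_ij of the unit segments k = 1..w_j."""
    return [math.log(k) + gamma * c_ij for k in range(1, w_j + 1)]


def exact_cost(x, c_ij, gamma):
    """SPLPS cost of x trips: ln(x!) + gamma * c_ij * x."""
    return math.lgamma(x + 1) + gamma * c_ij * x


def stirling_cost(x, c_ij, gamma):
    """Same cost with Stirling's approximation ln(x!) ~ x ln x - x."""
    entropy = x * math.log(x) - x if x > 0 else 0.0
    return entropy + gamma * c_ij * x


if __name__ == "__main__":
    c_ij, w_j, gamma = 2.0, 6, 0.5  # illustrative values
    seg = segment_costs(c_ij, w_j, gamma)
    assert all(a < b for a, b in zip(seg, seg[1:]))  # increasing slopes: convex
    for x in range(w_j + 1):
        linear = sum(seg[:x])  # x_ijk = 1 for k <= x, 0 otherwise
        print(x, round(linear, 6), round(exact_cost(x, c_ij, gamma), 6),
              round(stirling_cost(x, c_ij, gamma), 6))
```

Because the slopes increase with $k$, a cost-minimizing solution always fills the segments of a pair $(i, j)$ in order, so the binary variables $x_{ijk}$ never need extra ordering constraints.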

Then we do the substitution $x_{ij} = \sum_{k=1}^{w_j} x_{ijk}$, where $x_{ijk}$
is the amount of $x_{ij}$ that falls in the interval $(k-1, k)$.
The following model (SPLPE) is obtained: 


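$$v^* = \min \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{w_j} c_{ijk}\, x_{ijk} + \gamma \sum_{i=1}^{m} f_i\, z_i$$

such that

$$\sum_{i=1}^{m}\sum_{k=1}^{w_j} x_{ijk} = w_j, \quad \forall j, \tag{5}$$

$$x_{ijk} - z_i \le 0, \quad \forall i,j,k, \tag{6}$$

$$x_{ijk} \in \{0,1\}, \quad \forall i,j,k, \tag{7}$$

$$z_i \in \{0,1\}, \quad \forall i. \tag{8}$$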


This is a large linear integer programming problem
with $m\big(1 + \sum_{j=1}^{n} w_j\big)$ binary variables. The fact that it is
a pure 0-1 problem is favorable when it comes to solution
methods. The coefficients in the constraints (6) are
all reduced to one (compared with the coefficients $w_j$ in constraints (2)),
so the formulation is probably quite strong.


Solution Methods 


It is in principle possible to solve SPLPE with a standard 
integer programming code, but the size of the model 
prohibits this for all instances but very small ones. As 
the model is fairly new, one cannot find many solution 
methods proposed in the literature. 

A dual ascent procedure for this problem has been 
developed, see [17]. Another method, based on the 
same dual, is Lagrangian relaxation and subgradient 
optimization, investigated in [18]. Solution methods 
based on primal and dual decomposition techniques 
can also be used, see [16], where one conclusion is that 
Benders decomposition seems to be an efficient solu- 
tion method. In [13], the dual ascent approach is in- 
serted in a branch and bound framework, and applied 
to a somewhat more general problem. 

We will briefly describe these methods below. 


The Dual Ascent and Adjustment Method 


A dual ascent procedure can be used to, in principle, 
solve the LP-dual of the LP relaxation of SPLPE, by in- 
creasing the dual variables in small steps, in such a way 
that an ascent of the dual function is obtained in each 
step. Furthermore, a dual adjustment procedure can be 
used to temporarily decrease dual variables that block 
further improvement. 

Let $\alpha_j$ denote the dual variables corresponding to
constraint set (5) of the LP relaxation of SPLPE, $\beta_{ijk}$ the
dual variables corresponding to constraint set (6), and
$\delta_i$ the dual variables corresponding to the constraints
$z_i \le 1$. The LP-dual will be as follows:


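$$v^* = \max \sum_{j=1}^{n} w_j\, \alpha_j - \sum_{i=1}^{m} \delta_i$$

such that

$$\alpha_j - \beta_{ijk} \le c_{ijk}, \quad \forall i,j,k, \tag{9}$$

$$\sum_{j=1}^{n}\sum_{k=1}^{w_j} \beta_{ijk} - \delta_i \le \gamma f_i, \quad \forall i, \tag{10}$$

$$\beta_{ijk} \ge 0, \quad \forall i,j,k, \tag{11}$$

$$\delta_i \ge 0, \quad \forall i. \tag{12}$$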


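The basic steps are to make moves in the dual variables
$\alpha$. For fixed $\alpha = \bar\alpha$, the LP-dual is trivially solvable,
yielding

$$\beta_{ijk} = \max(0, \bar\alpha_j - c_{ijk}), \quad \forall i,j,k,$$

and

$$\delta_i = \max\Big(0, \sum_{j=1}^{n}\sum_{k=1}^{w_j} \beta_{ijk} - \gamma f_i\Big), \quad \forall i.$$

Now let $k_{ij}$ be such that

$$\bar\alpha_j \ge c_{ijk} \ \ \forall k \le k_{ij}, \qquad \bar\alpha_j < c_{ijk} \ \ \forall k > k_{ij},$$

and

$$q_{ij} = \begin{cases} 1 & \text{if } \bar\alpha_j = c_{ijk_{ij}}, \\ 0 & \text{if not.} \end{cases}$$

Also, let

$$s_i = \sum_{j=1}^{n}\sum_{k=1}^{w_j} \max(0, \bar\alpha_j - c_{ijk})$$

and define $I^{>} = \{i : s_i > \gamma f_i\}$, $I^{=} = \{i : s_i = \gamma f_i\}$, $I^{<} = \{i : s_i < \gamma f_i\}$
and $I^{\ge} = I^{>} \cup I^{=}$.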
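The complementary slackness conditions are

$$x_{ijk} = z_i \quad \forall k \le k_{ij} - q_{ij}, \ \forall i,j,$$

$$x_{ijk} = 0 \quad \forall k > k_{ij}, \ \forall i,j,$$

$$z_i = 1 \quad \forall i \in I^{>},$$

$$z_i = 0 \quad \forall i \in I^{<}.$$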


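Now define

$$w_j^l(\bar\alpha) = \sum_{i \in I^{>}} (k_{ij} - q_{ij})$$

and

$$w_j^u(\bar\alpha) = \sum_{i \in I^{\ge}} k_{ij}.$$

Then it can be shown [17] that

$$w_j^l(\bar\alpha) \le \sum_{i=1}^{m}\sum_{k=1}^{w_j} x_{ijk} \le w_j^u(\bar\alpha).$$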


This means that $w_j^l$ and $w_j^u$ are lower and upper bounds
on the left-hand sides of constraints (5). In order to
obtain feasibility (optimality in the dual), the intervals
between these bounds should contain the right-hand
sides $w_j$. The following is proved in [17]: if $w_j^l(\bar\alpha) \le
w_j \le w_j^u(\bar\alpha)$ for all $j$, then $\bar\alpha$ is an optimal dual solution
of the LP relaxation of SPLPE.
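For a fixed $\bar\alpha$ all of these quantities can be computed directly from the formulas above. The Python sketch below (a hypothetical helper, not code from [17]) returns the optimal $\delta_i$, the dual objective value, the bounds $w_j^l(\bar\alpha)$ and $w_j^u(\bar\alpha)$, and the result of the optimality test, with $c_{ijk} = \ln k + \gamma c_{ij}$ and $f_i$ the fixed cost of plant $i$. A small tolerance `tol` replaces the exact equality tests that define $q_{ij}$ and $I^{=}$, since the arithmetic is floating point.

```python
import math


def dual_quantities(alpha, c, f, w, gamma, tol=1e-9):
    """Closed-form LP-dual quantities of SPLPE for fixed multipliers alpha.

    alpha[j]: dual variable of demand constraint j
    c[i][j]:  cost of one trip between plant i and demand point j
    f[i]:     fixed cost of plant i;  w[j]: integer demand of point j
    """
    m, n = len(f), len(w)

    def cost(i, j, k):  # c_ijk = ln k + gamma * c_ij, increasing in k
        return math.log(k) + gamma * c[i][j]

    s = [sum(max(0.0, alpha[j] - cost(i, j, k))
             for j in range(n) for k in range(1, w[j] + 1))
         for i in range(m)]
    delta = [max(0.0, s[i] - gamma * f[i]) for i in range(m)]
    dual_value = sum(w[j] * alpha[j] for j in range(n)) - sum(delta)

    # k_ij = number of segments with c_ijk <= alpha_j; q_ij flags equality at k_ij.
    kk = [[sum(1 for k in range(1, w[j] + 1) if cost(i, j, k) <= alpha[j] + tol)
           for j in range(n)] for i in range(m)]
    q = [[1 if kk[i][j] > 0 and abs(alpha[j] - cost(i, j, kk[i][j])) <= tol else 0
          for j in range(n)] for i in range(m)]

    greater = [i for i in range(m) if s[i] > gamma * f[i] + tol]      # I^>
    equal = [i for i in range(m) if abs(s[i] - gamma * f[i]) <= tol]  # I^=
    w_lo = [sum(kk[i][j] - q[i][j] for i in greater) for j in range(n)]
    w_hi = [sum(kk[i][j] for i in greater + equal) for j in range(n)]
    optimal = all(w_lo[j] <= w[j] <= w_hi[j] for j in range(n))
    return delta, dual_value, w_lo, w_hi, optimal
```

In the one-plant, one-point instance with $w_1 = 2$, $c_{11} = f_1 = 0$ and $\gamma = 1$, any $\bar\alpha_1 \ge \ln 2$ passes the test and gives dual value $\ln 2$, the cost of sending both clients to the single plant, while $\bar\alpha_1 = 0.5$ gives $w_1^u = 1 < 2$, so $\alpha_1$ must be increased.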

The dual ascent method is now to increase $\alpha_j$ in
small steps, so that $w_j^l(\bar\alpha)$ and $w_j^u(\bar\alpha)$ increase. The increase
of a certain $\alpha_j$ is bounded by the closest breakpoint,
induced by the dual constraints of either set (9)
(corresponding to enabling or forcing the increase of
yet another $x_{ijk}$) or set (10) (corresponding to enabling or
forcing the increase of yet another $z_i$).

The bounds $w_j^l$ and $w_j^u$ will approach $w_j$ from below,
and $w_j^l$ will not be allowed to exceed $w_j$. The increase
of $\alpha$ is repeated, in each step for the $j$ which yields
the largest distance between $w_j^u$ and $w_j$, until the optimum
is found or improvement is blocked (i.e. a further increase
of any $\alpha_j$ would result in $w_j^l > w_j$). In the latter case
we use an adjustment procedure, which decreases some
$\alpha_j$ in order to allow the increase of other $\alpha_j$’s. Then
the ascent phase above is repeated. More details can be
found in [17].


Dual Ascent and Branch-And-Bound 


The dual ascent and adjustment procedure only solves 
the LP relaxation of the problem, so to find the ex- 
act integer optimum, the procedure must be used 
within a branch and bound framework. The subprob- 
lem in each node of the branch and bound tree is then 
solved with the dual ascent procedure, in the sense that 
lower bounds on the optimal objective function value 
and sometimes feasible primal solutions are obtained.
Branches are cut off when the lower bound exceeds the 
best upper bound known. 

One can note that if all the z-variables are fixed in 
SPLPE, then the problem is trivially solvable, and the 
$x$-variables will attain integer values even if the constraints
$x_{ijk} \in \{0,1\}$ are replaced by $0 \le x_{ijk} \le 1$, so
it may be regarded as an LP-problem. Furthermore it 
is proved, in [13], that the dual ascent procedure accu- 
rately solves the problem when all z-variables are fixed, 
within a finite number of steps. 
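The observation that SPLPE becomes easy once $z$ is fixed can be made concrete. For each demand point $j$ it suffices to fill the $w_j$ cheapest unit segments $c_{ijk}$ among the open plants; because the segment costs increase in $k$, this choice is optimal. The Python sketch below (hypothetical function names; $f_i$ denotes fixed costs) applies this rule and, for tiny instances only, enumerates every $z$-vector, which corresponds to the worst case of branching over the $z$-variables. It illustrates the fixed-$z$ subproblem and is not the dual ascent procedure itself.

```python
import heapq
import math
from itertools import product


def solve_fixed_z(z, c, f, w, gamma):
    """Optimal SPLPE value for a fixed 0/1 vector z, or None if no plant is open.

    c[i][j]: cost of one trip between plant i and demand point j;
    f[i]: fixed cost of plant i; w[j]: integer demand of point j.
    """
    open_plants = [i for i, zi in enumerate(z) if zi]
    if not open_plants:
        return None
    total = gamma * sum(f[i] for i in open_plants)
    for j, w_j in enumerate(w):
        # Unit segments (i, k) of open plants, cost c_ijk = ln k + gamma * c_ij.
        slots = [math.log(k) + gamma * c[i][j]
                 for i in open_plants for k in range(1, w_j + 1)]
        # Costs increase in k, so the w_j cheapest slots can always be taken
        # as a prefix for each plant: the choice is feasible and optimal.
        total += sum(heapq.nsmallest(w_j, slots))
    return total


def solve_by_enumeration(c, f, w, gamma):
    """Enumerate every z vector (worst case of branching on z); tiny m only."""
    best = None
    for z in product((0, 1), repeat=len(f)):
        value = solve_fixed_z(z, c, f, w, gamma)
        if value is not None and (best is None or value < best[0]):
            best = (value, z)
    return best


if __name__ == "__main__":
    # Two identical plants with fixed cost 1, one zone with two clients:
    # opening one plant (1 + ln 2) beats opening both (2).
    print(solve_by_enumeration([[0.0], [0.0]], [1.0, 1.0], [2], 1.0))
```

With zero fixed costs the same instance opens both plants and sends one client to each, since the entropy term $\ln(x_{ij}!)$ favours spreading trips.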

Therefore, it is natural to do the branching over the
$z$-variables. Fixed $z$-variables are handled as follows in
the dual ascent phase. Let $I_0 = \{i : z_i \text{ is fixed to } 0\}$ and
$I_1 = \{i : z_i \text{ is fixed to } 1\}$. For all $i \in I_0 \cup I_1$, the dual
variables $\delta_i$ are removed, and the corresponding dual
constraints (10) are removed. For all $i \in I_1$, the corresponding
primal constraints (6) are redundant,
so we can assume that $\beta_{ijk} = 0$, $\forall i \in I_1$, $\forall j,k$. Also,
$x_{ijk} = 0$, $\forall i \in I_0$, $\forall j,k$.

All elements of $I_0 \cup I_1$ must be removed from $I^{=}$,
$I^{>}$, $I^{<}$ and $I^{\ge}$. It is not necessary to calculate $s_i$ for
$i \in I_0 \cup I_1$. After these changes, the bounds $w_j^l$ and $w_j^u$ are
calculated as above.

Some supporting hyperplanes, and breakpoints, are 
removed from the dual function, as a result of the fixa- 
tions, so in the dual ascent procedure, fewer steps often 
need to be taken. (Sometimes the increase of some $\alpha_j$
is limited by the breakpoint where a facility is opened. 
This will not occur if the facility is fixed open or closed.) 

In the worst case, the branch and bound method 
will enumerate all z-solutions. Thus we have the fol- 
lowing result: The dual ascent method within a branch 
and bound framework will find the exact optimum of 
SPLPE within a finite number of steps. 

In practice branching is done when the dual ascent 
and adjustment procedure stops, which does not necessarily
mean that the LP-optimum is found. In many cases
unnecessary branching is done, and we must expect the 
branch and bound tree to be larger than it would be for 
an LP-based branch and bound method. 

Branching is done over any $z_i$, $i \in I^{=}$, since any value
between 0 and 1 is optimal for such a $z_i$, i.e. the complementary
slackness conditions allow for nonintegral
values of $z_i$.

The original dual ascent method starts from zero
(no facilities opened and nothing sent). However, for
very small values of $\gamma$ in SPLPS many facilities will be
opened, while for very large values of $\gamma$ the $z$-solution
obtained for the ordinary uncapacitated facility location
problem, SPLP, for example by the dual ascent
method DUALOC [5], might be optimal or close to
optimal in SPLPS. In such cases one can use these solutions
as starting solutions.

The choice of which dual variable $\alpha_j$ to increase
first in the dual ascent procedure could be made cyclically
in $j$, but it seems better to choose the $j$ which exhibits
the maximal residual, i.e. the largest gap between
$w_j^u$ and $w_j$.


Lagrangian Relaxation 
with Subgradient Optimization 


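Lagrangian relaxation is a well-known and often used approach for the approximate solution of integer and mixed integer problems, see for example [8] and [7]. The Lagrangian relaxation of SPLPE is obtained by relaxing the demand constraints, using multipliers $\alpha$. We obtain the following Lagrangian dual:

$$(\mathrm{LD})\qquad v_L = \max_{\alpha}\ g(\alpha),$$

where, for fixed multipliers $\alpha = \bar\alpha$, the Lagrangian relaxation takes the following form:

$$
(\mathrm{DS})\qquad
\begin{aligned}
g(\bar\alpha) = \min\ & \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{w_j} c_{ijk}x_{ijk} + \sum_{i=1}^{m} f_i z_i + \sum_{j=1}^{n}\bar\alpha_j\Big(w_j - \sum_{i=1}^{m}\sum_{k=1}^{w_j} x_{ijk}\Big)\\
\text{s.t.}\ & x_{ijk} - z_i \le 0, \quad \forall i,j,k,\\
& x_{ijk}\in\{0,1\}, \quad \forall i,j,k,\\
& z_i\in\{0,1\}, \quad \forall i.
\end{aligned}
$$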


(DS) separates into m problems, one for each facility, each containing one binary variable and a number of continuous variables, and is trivially solvable. It has the 'integrality property', i.e. its optimal solution takes integral values even if the integrality constraints are removed. This property implies that the optimal value of LD is the same as that of the LP relaxation.
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In order to solve the Lagrangian dual, we use 
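the well-known technique of subgradient optimization, see [25,26] and [9]. We use a subgradient, $\xi$, of the dual function $g(\alpha)$ to find a search direction for updating the multipliers $\alpha$:

$$\xi_j = w_j - \sum_{i=1}^{m}\sum_{k=1}^{w_j}\bar x_{ijk}, \quad \forall j,$$

where $\bar x$ denotes the optimal solution of (DS) at $\bar\alpha$. Letting $\alpha^{(l)}$ denote the multiplier values in iteration l, we obtain the multipliers in the next iteration by setting

$$\alpha^{(l+1)} = \alpha^{(l)} + t^{(l)}\xi^{(l)},$$

where $t^{(l)}$ and $\xi^{(l)}$ are the stepsize and the search direction. Several ways of choosing the stepsize $t^{(l)}$ have been suggested. Here we use the one suggested by [26]:

$$t^{(l)} = \lambda_l\,\frac{\bar v - g(\alpha^{(l)})}{\|\xi^{(l)}\|^2},$$

where $\bar v$ is an upper bound on $v^*$ (and hence on the optimal value of (LD)), and $\lambda_l$ should be assigned a value in the interval $(\varepsilon_1, 2 - \varepsilon_1)$, where $\varepsilon_1 > 0$, in order to ensure convergence.

Termination of the subgradient search procedure occurs when $\|\xi^{(l)}\| < \varepsilon$, $t^{(l)} < \varepsilon$, $l > M$, $\bar v - g(\alpha^{(l)}) < \varepsilon$ or $\bar v - \underline{v} < 1$, where $\underline{v}$ is the best lower bound found. The last criterion indicates optimality, since all feasible solutions are integral, i.e. $v^*$ is integral.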
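The subgradient scheme above can be sketched in Python. This is a minimal sketch, not the implementation of [26] or [9]: it assumes that the relaxed demand constraints read $\sum_i\sum_k x_{ijk} = w_j$ with multipliers $\alpha_j$, stores $c_{ijk}$ as a nested list c[i][j][k] (k = 0, ..., w_j − 1), and uses a fixed λ = 1; all function names are hypothetical. For fixed α, (DS) is solved facility by facility: facility i is opened exactly when $f_i$ plus the sum of its negative reduced costs $c_{ijk} - \alpha_j$ is negative.

```python
import math


def lagrangian_value(alpha, c, f, w):
    """Evaluate g(alpha) and a subgradient xi by solving (DS) facility by facility."""
    m, n = len(f), len(w)
    g = sum(alpha[j] * w[j] for j in range(n))  # constant term sum_j alpha_j w_j
    served = [0] * n  # sum_i sum_k x_ijk for each customer j
    for i in range(m):
        # Value of opening facility i: f_i plus all negative reduced costs c_ijk - alpha_j.
        gain = f[i] + sum(min(0.0, c[i][j][k] - alpha[j])
                          for j in range(n) for k in range(w[j]))
        if gain < 0:  # z_i = 1 pays off; then x_ijk = 1 exactly where c_ijk < alpha_j
            g += gain
            for j in range(n):
                served[j] += sum(1 for k in range(w[j]) if c[i][j][k] < alpha[j])
    xi = [w[j] - served[j] for j in range(n)]  # xi_j = w_j - sum_i sum_k x_ijk
    return g, xi


def subgradient(c, f, w, v_upper, lam=1.0, eps=1e-6, max_iter=200):
    """Subgradient ascent on (LD) with the step t = lam * (v_upper - g) / ||xi||^2."""
    alpha = [0.0] * len(w)
    best = -math.inf
    for _ in range(max_iter):
        g, xi = lagrangian_value(alpha, c, f, w)
        best = max(best, g)
        norm2 = sum(x * x for x in xi)
        # Stop on a zero subgradient, or when the bounds differ by less than 1
        # (optimality if v* is integral and v_upper is a feasible value).
        if norm2 < eps ** 2 or v_upper - best < 1:
            break
        t = lam * (v_upper - g) / norm2
        if t < eps:
            break
        alpha = [a + t * x for a, x in zip(alpha, xi)]
    return best, alpha
```

On a toy instance with two facilities (f = [3, 3]) and two customers of unit demand (c = [[[1], [4]], [[4], [1]]]), every integral solution costs 8, and a single step from α = 0 already reaches g(α) = 8, closing the gap.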


Benders Decomposition 


We have noted that if all z-variables were fixed, the solution would not change if the constraints $x_{ijk} \in \{0,1\}$ were replaced by $0 \le x_{ijk} \le 1$. Therefore one might regard SPLPE as a mixed integer programming problem. This opens up the possibility of solving the problem with Benders decomposition, [1]. Below we give
a short description of how the method can be applied 
to SPLPE, as done in [16]. 

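In the Benders subproblem, (PS), we fix z to $\bar z$, which makes the subproblem separable into several trivial knapsack problems:

$$(\mathrm{PS})\qquad h(\bar z) = \sum_{j=1}^{n} h_j(\bar z) + \sum_{i=1}^{m} f_i\bar z_i,$$

where, $\forall j$,

$$
\begin{aligned}
h_j(\bar z) = \min\ & \sum_{i=1}^{m}\sum_{k=1}^{w_j} c_{ijk}x_{ijk}\\
\text{s.t.}\ & \sum_{i=1}^{m}\sum_{k=1}^{w_j} x_{ijk} = w_j,\\
& 0 \le x_{ijk} \le \bar z_i, \quad \forall i,k.
\end{aligned}
$$

(PS) is feasible if and only if $\sum_i \bar z_i \ge 1$. The dual solution $(\alpha, \beta)$ is also easy to calculate.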
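Solving (PS) and extracting a Benders cut can be sketched as follows, assuming each $h_j(\bar z)$ is the continuous knapsack $\min \sum_i\sum_k c_{ijk}x_{ijk}$ subject to $\sum_i\sum_k x_{ijk} = w_j$ and $0 \le x_{ijk} \le \bar z_i$, with $w_j \ge 1$. The greedy fill and the dual choice $\alpha_j$ = cost of the marginal unit, $\beta_{ijk} = \max(0, \alpha_j - c_{ijk})$, are one standard way to obtain an optimal dual solution; they are assumptions of this sketch rather than the procedure of [16], and the function name is hypothetical.

```python
def solve_ps(zbar, c, f, w, tol=1e-12):
    """Solve (PS) for fixed zbar; return h(zbar), alpha and Benders cut data.

    cuts[j] = (w_j * alpha_j, [sum_k beta_ijk for each i]) encodes the cut
    q_j >= w_j alpha_j - sum_i (sum_k beta_ijk) z_i.
    """
    m, n = len(f), len(w)
    value = sum(f[i] * zbar[i] for i in range(m))
    alpha, cuts = [], []
    for j in range(n):
        # Continuous knapsack: fill w_j units with the cheapest c_ijk, each capped by zbar_i.
        units = sorted((c[i][j][k], i) for i in range(m) for k in range(w[j]) if zbar[i] > 0)
        remaining, a_j = w[j], None
        for cost, i in units:
            take = min(zbar[i], remaining)
            value += take * cost
            remaining -= take
            if remaining <= tol:
                a_j = cost  # price of the last (marginal) unit used
                break
        if a_j is None:
            raise ValueError("(PS) infeasible: need sum(zbar) >= 1")
        # Dual feasible choice beta_ijk = max(0, alpha_j - c_ijk) for every i, open or not.
        coef = [sum(max(0, a_j - c[i][j][k]) for k in range(w[j])) for i in range(m)]
        alpha.append(a_j)
        cuts.append((w[j] * a_j, coef))
    return value, alpha, cuts
```

Because the dual objective $w_j\alpha_j - \sum_i \bar z_i \sum_k \beta_{ijk}$ evaluated at this choice equals the greedy cost, the cut is tight at $\bar z$, also for fractional $\bar z$ as used in the LP-relaxed master problem.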
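The Benders master problem is given below, where $(\alpha^{(l)}, \beta^{(l)})$ are the dual solutions of (PS) obtained so far.

$$
(\mathrm{PM})\qquad
\begin{aligned}
v_{PM} = \min\ & \sum_{j=1}^{n} q_j + \sum_{i=1}^{m} f_i z_i\\
\text{s.t.}\ & q_j \ge w_j\alpha_j^{(l)} - \sum_{i=1}^{m}\sum_{k=1}^{w_j}\beta_{ijk}^{(l)} z_i, \quad \forall l, j,\\
& \sum_{i=1}^{m} z_i \ge 1,\\
& z_i\in\{0,1\}, \quad \forall i.
\end{aligned}
$$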


The Benders decomposition method iterates between the master problem, (PM), and the subproblem, (PS). (PM) yields a lower bound on $v^*$ and a $\bar z$ to be used in (PS). (PS) yields an upper bound on $v^*$ (for integral $\bar z$) and a new dual solution, $(\bar\alpha, \bar\beta)$, which is used to form a new cut for the master problem. The method has exact finite convergence.

The proportion of z-variables is much smaller in 
SPLPE than in SPLP, which is promising for the 
Benders decomposition approach. However, as shown 
computationally in [16], (PM) is often very difficult to solve. A suggested modification, [24], is to use the LP relaxation of (PM), replacing $z_i \in \{0,1\}$ with $0 \le z_i \le 1$, in initial iterations (for example until the LP bounds are within 1% of each other). A good set of Benders cuts is thus generated before the integer master problem is solved. This is possible since any dual feasible solution of (PS) yields a valid Benders cut, and $\bar z$ only appears in the dual objective function.

If $\bar z$ is not integral, (PS) might not yield integral x-solutions, but it is still easily solvable. The bounds obtained from the master problem and the subproblem are then not valid for the integer problem, but only for the LP relaxation of SPLPE.
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Mean Value Cross Decomposition 


An alternative way of solving the LP relaxation of the problem is to use the method of mean value cross decomposition, [10,11,12]. This method is a modification of the subproblem phase of ordinary cross decomposition, [28], but also a generalization of the Kornai-Liptak method, [21], and a generalization of the Brown-Robinson methods for polyhedral games, [2,27].

The method uses the Lagrangian relaxation and the 
subproblem of Benders decomposition, both described 
in previous sections, but no master problems. The input 
to one of the problems consists of the mean value of 
all the previous solutions of the other subproblem. The 
method has asymptotic convergence. 


Comparisons and the Role of y

The parameter y reflects the relation between the trans- 
portation costs and the effects of the spatial interaction 
in the objective function, and its value should be chosen 
specifically for each real life situation. 

For very small values of y, the optimal solution is $z_i = 1$, $\forall i$, while for large values of y the optimal z-solution can be obtained by DUALOC. In these cases the problem is completely solved by simply solving the primal subproblem, (PS), once. In computational tests in [16,17,18] and [13], this occurs when y is smaller than 0.0001 or larger than 0.1, while for y = 0.01 the differences from the solutions mentioned above are the largest.

The conclusions of the computational tests in [16, 
17,18] and [13] are the following. Ordinary Benders de- 
composition seems to be more efficient than direct so- 
lution with a general integer programming code. How- 
ever, direct solution with a standard IP-code can only 
solve small problems, due to memory requirements, 
and the ordinary Benders decomposition method also 
fails for many of the problems. The integer master 
problem is simply too hard. 

The modified Benders decomposition method 
(starting with the LP relaxation of the master problem) 
eliminates the weaknesses of the Benders approach, and 
is a very efficient method. 

The approximate methods mean value cross decomposition and Lagrangian relaxation with dual subgradient optimization are much quicker than ordinary Benders, but not better than modified Benders decomposition. For some large problems, these methods give large gaps between the upper and lower bounds.

The dual ascent method is also quite quick, but 
leaves gaps between the upper and lower bounds of 
varying size. In [18] it is noted that the dual ascent 
method and the Lagrangian method complement each 
other in an interesting way. 

The best methods seem to be the modified Benders
decomposition method and the dual ascent method 
with branch and bound. These methods are capable of 
solving quite large problems (up to almost 3,000,000 
variables) optimally. 

Finally we wish to point out that none of these methods explicitly stores the whole x-matrix, and that this is what enables the solution of large problems.


Conclusion 


We have described the simple plant location problem 
with spatial interaction, applied an exact linearization 
to the problem, and described a couple of solution 
methods for the resulting large integer programming 
problem. Although the model has a large number of 
variables, the methods are able to solve it quite effi- 
ciently. The problem is very well suited for the ap- 
proaches of Benders decomposition, Lagrangian relax- 
ation, and dual ascent. These methods actually manage 
to solve the problem without storing all of the variables, 
and especially the dual ascent method uses relatively 
small amounts of computer memory. 

We conclude that the model is solvable and useful 
in practice. 
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Location of facilities, plants, or other units for produc- 
tion or distribution, is an important problem in many 
different situations. The same type of problem can oc- 
cur when one is installing equipment in for example 
telecommunication networks, or when installing ma- 
chines in a factory. 

The common circumstances in these situations are 
the following. A number of units, ‘facilities’, produc- 
ing a certain service, may be located at certain possible 
points. The service commodity is then to be sent from 
the facilities to certain ‘customer’ points, which have 
a certain demand for the service. The main complication is that the costs for production of the service are not linear; instead there is a fixed cost for placing a facility at a certain location. In addition there may be linear
costs for producing and shipping the commodity to the 
customers. 

In the literature, see for example [3,4,6,8,13] and [1], 
one can find the traditional capacitated plant location 
model, where the total cost for satisfying demand con- 
sists of two parts, namely linear transportation costs 
and fixed costs for opening/installing the facilities. In 
this model there is one fixed cost for each facility. 
(Other variants can be found in » Stochastic trans- 
portation and location problems and > Facility location 
problems with spatial interaction.) 

However, in practice, there is often a need for con- 
sidering several different possible sizes of each facility. 
This leads to a facility location problem with staircase 
shaped costs. This approach will not only allow differ- 
ent sizes, but also different production costs at different 
levels of production at a facility. 

For example, in telecommunications there are almost always several different sizes for the fibers, cables, switches, controllers and other connections that must be dimensioned when installing a new network. In such
problems staircase shaped costs will occur at several dif- 
ferent levels, both for the activities at nodes as well as 
along links. One situation where the specific location 
model discussed here is quite appropriate is the instal- 
lation of video servers for a video-on-demand service 
on a telecommunication network. 


Mathematical Model 


We define a staircase cost function as a finite piecewise linear nondecreasing function with a finite set of discontinuities, each corresponding to a certain size of a facility. Let m be the number of possible location sites, n the number of customers and $q_i$ the number of possible sizes at location site i. Furthermore, $D_j$ is the demand of customer j, $p_{ik}$ is the unit cost of production at a facility at location site i and size k, $S_{ik}$ is the capacity of a facility of size k at location site i, $f_{ik}$ is the fixed cost for a facility of size k at location site i, and $c_{ij}$ is the cost for sending one unit from location site i to customer j.

The following variables are used: $t_{ik}$ is the production within level k at facility i (where level k of the staircase corresponds to an operating facility of size k), $x_{ij}$ is the amount shipped from location i to customer j, and $y_{ik}$ is set to 1 if the facility at site i is of size k or larger and 0 otherwise.

The capacities and costs for increasing the size of a facility are

$$\Delta S_{ik} = S_{ik} - S_{i,k-1}, \qquad \Delta f_{ik} = f_{ik} - f_{i,k-1} - (p_{ik} - p_{i,k-1})S_{i,k-1},$$

where $S_{i0} = 0$ and $f_{i0} = 0$, see Fig. 1. Note that $0 \le t_{ik} \le \Delta S_{ik}$, $\forall i,k$, and if the total production at facility i requires more than size k, then $t_{ik} = \Delta S_{ik}$.
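The relation between total production and the level variables can be illustrated with a small Python sketch (function names and numbers are hypothetical). It fills the levels of one site in order, so that $t_{ik} = \Delta S_{ik}$ for every level below the last one used, sets $y_{ik} = 1$ exactly for levels with $t_{ik} > 0$ (which keeps constraints (3) and (4) satisfied), and evaluates the cost term $\sum_k (p_{ik}t_{ik} + \Delta f_{ik}y_{ik})$. The increments $\Delta f_{ik}$ are taken directly as input data rather than computed from $f_{ik}$.

```python
def split_production(total, S):
    """Split total production at one site into level amounts t_k, filled in order.

    S holds the cumulative capacities S_1, ..., S_q (with S_0 = 0 implied).
    """
    dS = [S[0]] + [S[k] - S[k - 1] for k in range(1, len(S))]  # Delta S_k
    t, remaining = [], total
    for cap in dS:
        t_k = min(cap, max(remaining, 0))  # a level is full before the next one is used
        t.append(t_k)
        remaining -= t_k
    if remaining > 1e-9:
        raise ValueError("production exceeds the largest size S_q")
    y = [1 if t_k > 0 else 0 for t_k in t]  # y_k = 1 iff level k is used
    return dS, t, y


def staircase_cost(t, y, p, df):
    """Production cost sum_k (p_k t_k + Delta f_k y_k) at one site."""
    return sum(p_k * t_k + df_k * y_k for p_k, t_k, y_k, df_k in zip(p, t, y, df))
```

For example, with cumulative sizes S = [10, 25, 45] and a production of 18, the levels become t = [10, 8, 0] and y = [1, 1, 0].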


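$$
\begin{aligned}
v^* = \min\ & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} + \sum_{i=1}^{m}\sum_{k=1}^{q_i}\big(p_{ik}t_{ik} + \Delta f_{ik}y_{ik}\big)\\
\text{such that}\ & \sum_{i=1}^{m} x_{ij} = D_j, \quad \forall j, && (1)\\
& \sum_{j=1}^{n} x_{ij} = \sum_{k=1}^{q_i} t_{ik}, \quad \forall i, && (2)\\
& t_{ik} \le \Delta S_{ik}\,y_{ik}, \quad \forall i,k, && (3)\\
& t_{i,k-1} \ge \Delta S_{i,k-1}\,y_{ik}, \quad \forall i,\ k > 1, && (4)\\
& x_{ij} \ge 0,\ t_{ik} \ge 0, \quad \forall i,j,k, && (5)\\
& y_{ik} \in \{0,1\}, \quad \forall i,k. && (6)
\end{aligned}
$$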


It is natural to assume that $p_{ik} \ge 0$, $\Delta f_{ik} \ge 0$, $\Delta S_{ik} > 0$ for all i, k and $D_j > 0$ for all j. Constraints (1) ensure that the demand of each customer is met, while (2) ensure that, for each location, the amount shipped is also produced. Constraint sets (3) and (4) ensure that the level of production corresponds to the correct level on the staircase cost function for each facility. Note that it follows from constraints (3) and (4) that $y_{i,k+1} \le y_{ik}$.

This is a linear mixed integer programming problem with $mn + \sum_{i=1}^{m} q_i$ continuous variables and $\sum_{i=1}^{m} q_i$ integer variables. The proportion of integer
variables is higher than in the ordinary facility lo- 
cation problem. Because of this, and because of the 
structure of the problem, solving the problem with 
a general code for mixed integer programming prob- 
lems is probably not very efficient for large (real life) 
instances. 

One aspect of the structure of the problem is that if y is kept fixed (i.e. the sizes of the facilities are given), the remaining problem is simply a standard network flow problem, and hence x and t will attain integer values (provided the demands $D_j$ and capacities $\Delta S_{ik}$ are integral).

Another important aspect of the structure of the 
problem is the potential separability. There are several
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possibilities of making the model separable by relaxing 
different sets of constraints. 

It is also possible to use a problem formulation with 
f and S instead of A f and A S. This yields constraints 
of SOS1-type (one must ensure that only one of the possible sizes is used at a facility), and a somewhat smaller problem (fewer constraints). The LP relaxation is quicker to solve and its optimal objective function value is the same as that of the model above (i.e. the duality gaps of the two formulations are the same). However, when the models are solved with general mixed integer codes, the alternative model seems to produce larger branch and bound trees. Concerning the methods discussed below, the two models in most cases behave identically.


Solution Methods 


Methods for models with staircase cost functions, or for models capable of modeling such functions, can be found in, for example, [2,11,14,15] and [12]. Below we describe some possibilities.

If the exact solution is to be found (and verified), 
the only reasonable way seems to be to resort to branch 
and bound, in some sense. This matter in general is ex- 
tensively discussed in the literature, and although there 
might be some considerations for the staircase cost case 
that differ from the single fixed cost case, when it comes 
to branching and search strategies, we will not dwell on 
it here. 

Assuming a standard branch and bound frame- 
work, the main question is how to solve the subprob- 
lems, i.e. how to get the bounds, especially the lower 
bounds. This will be discussed more below. 

However, an alternative is to move the branch and 
bound procedure into a Benders master problem, i.e. 
use a Benders decomposition framework in order to 
obtain the exact solution. This will also be briefly de- 
scribed below. 

We will start with procedures for obtaining upper 
and lower bounds on the optimal objective function 
value. The upper bounds correspond to feasible solu- 
tions obtained, while the lower bounds are used to get 
estimates of the quality of the feasible solutions. If all 
cost coefficients are integral, we note that a lower bound 
that is within one unit from the upper bound indicates 
that the upper bound is optimal. 
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Primal Heuristics 


There is a well-known ADD heuristic for the capacitated plant location problem, [13], which can be modified to suit the staircase cost facility location problem, see [12]. This heuristic can be improved by combining it with certain priority rules, [6].

If for each plant it is decided up to which level of production it can be used (i.e. the y-variables are fixed), the resulting problem is an unbalanced transportation problem. Let $L_i$ denote the level (size) of plant i, and initiate the heuristic by setting $L_i = 0$, $\forall i$. Let $I = \{i : L_i < q_i\}$. The ADD heuristic consists of the repeated use of the following step: increase the size (set $L_i = L_i + 1$) of the location site $i \in I$ that provides the largest reduction of the total cost. Terminate the procedure when no further reduction is possible.

In order to avoid ADD increasing the level of production in the order of 'decreasing' capacity until a feasible solution is found, we apply a generalization of one of the priority rules discussed in [6]. These priority rules provide a better phase-1 solution than the ADD heuristic itself. Two examples of priority rules, PR1 and PR2, for choosing the location site $i \in I$ where the size is to be increased ($L_i = L_i + 1$), are given below. (They correspond to P1 and P3 in the notation of [6].)

PR1) Choose site $i \in I$ in the order of decreasing quotients $\Delta S_{i,L_i+1}/\Delta f_{i,L_i+1}$, until the location sites are able to serve the entire demand.

PR2) Choose site $i \in I$ in the order of increasing values of

$$\frac{1}{\lfloor n/3\rfloor}\sum_{j=1}^{\lfloor n/3\rfloor}\hat c_{ij} + \frac{\Delta f_{i,L_i+1}}{\Delta S_{i,L_i+1}},$$

until the location sites are able to serve the entire demand. ($\hat c_i$ is $c_i$ sorted in increasing order.)
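The phase-1 use of PR1 and PR2 can be sketched in Python: sites are raised one level at a time, in priority order, until the installed capacity covers the total demand. This is a minimal sketch, not the code of [6] or [12]: it omits the cost evaluation of the full ADD heuristic (which requires solving transportation problems), assumes the PR2 expression as displayed above, and uses max(1, ⌊n/3⌋) to avoid division by zero when n < 3 (an assumption). The function name and data layout dS[i][k], df[i][k] (increments of level k+1 at site i) and c[i] (costs $c_{ij}$ over all j) are hypothetical.

```python
def priority_phase1(dS, df, c, total_demand, rule="PR1"):
    """Raise facility levels by rule PR1 or PR2 until capacity covers total demand.

    Returns the levels L_i (number of levels opened at each site).
    """
    m, n = len(dS), len(c[0])
    r = max(1, n // 3)  # floor(n/3) cheapest customers; at least 1 (assumption for n < 3)
    L, capacity = [0] * m, 0
    while capacity < total_demand:
        cand = [i for i in range(m) if L[i] < len(dS[i])]  # the set I
        if not cand:
            raise ValueError("total capacity cannot cover the demand")
        if rule == "PR1":
            # Largest capacity increment per unit of fixed cost first.
            def key(i):
                d = df[i][L[i]]
                return dS[i][L[i]] / d if d > 0 else float("inf")
            i = max(cand, key=key)
        else:
            # PR2: mean of the r cheapest c_ij plus fixed cost per unit of capacity.
            def key(i):
                return sum(sorted(c[i])[:r]) / r + df[i][L[i]] / dS[i][L[i]]
            i = min(cand, key=key)
        capacity += dS[i][L[i]]
        L[i] += 1
    return L
```

PR1 looks only at the capacity bought per unit of fixed cost, while PR2 also rewards sites that are close to many customers, so the two rules can select different sites on the same data.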
In [13] the ADD heuristic is outperformed by the heuristic DROP, but [6] shows that ADD with priority rules produces solutions with objective function values as good as those of DROP, in less computational time.


Linearization 


A widely used way of obtaining a lower bound is direct LP relaxation. The integer requirements (6) are replaced with the constraints $0 \le y_{ik} \le 1$, $\forall i,k$. We also include the redundant constraints $t_{ik} \le \Delta S_{ik}$, $\forall i,k$, and possibly $y_{ik} \le y_{i,k-1}$ for all i, $k > 1$. The optimal objective function value of the LP relaxation is denoted by $v_{LP}$, and $v_{LP} \le v^*$. The duality gap, the difference between $v^*$ and $v_{LP}$, is in most cases larger than zero. The LP problem is large, but sparse, and can be solved with a standard LP code.


Convex Piecewise Linearization 


Since the binary variables $y_{ik}$ are only included to give the correct cost for the production, they can be eliminated if we use an approximation of the costs. If the staircase cost function is underestimated with a piecewise linear and convex function, we get a problem that is much easier to solve and gives a lower bound on $v^*$, denoted by $v_{CPL}$, see [14] and [11]. For explicit expressions of how to construct the convex piecewise linearization, see [11].
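One natural convex piecewise linear underestimator, assumed here (the explicit construction in [11] may differ in its details), is the lower convex envelope of the staircase cost function of a site. The sketch below builds the graph points of the staircase function from the level data $\Delta S_k$, $p_k$, $\Delta f_k$ of one site and computes their lower convex hull with Andrew's monotone chain; all names are hypothetical. The slopes of the hull segments would presumably serve as unit costs on the parallel arcs of the network flow problem.

```python
def staircase_breakpoints(dS, p, df):
    """Graph points of one site's staircase cost: (0, 0), then for each level
    the point just after the jump Delta f_k and the point at the end of the level."""
    pts, t, cost = [(0, 0)], 0, 0
    for dS_k, p_k, df_k in zip(dS, p, df):
        cost += df_k                      # jump when level k is opened
        pts.append((t, cost))
        t, cost = t + dS_k, cost + p_k * dS_k
        pts.append((t, cost))
    return pts


def lower_convex_envelope(points):
    """Lower convex hull (Andrew's monotone chain) = largest convex underestimator."""
    hull = []
    for pt in sorted(points):
        while len(hull) >= 2:
            (x0, y0), (x1, y1) = hull[-2], hull[-1]
            # Pop hull[-1] unless hull[-2] -> hull[-1] -> pt turns counterclockwise.
            if (x1 - x0) * (pt[1] - y0) - (y1 - y0) * (pt[0] - x0) <= 0:
                hull.pop()
            else:
                break
        hull.append(pt)
    return hull


def evaluate(hull, t):
    """Value of the convex piecewise linear envelope at production level t."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= t <= x1:
            return y0 + (y1 - y0) * (t - x0) / (x1 - x0)
    raise ValueError("t outside [0, S_q]")
```

For two levels with ΔS = [10, 10], p = [1, 1] and Δf = [5, 20], the envelope has the vertices (0, 0), (10, 15) and (20, 45); at t = 5 it gives 7.5, below the true staircase cost of 10.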

The resulting problem is a linear minimal cost net- 
work flow problem with parallel arcs, which is quite eas- 
ily solvable by a standard network code. The x- and t- 
part of the solution is feasible in the original problem, 
so we can generate an upper bound by evaluating this 
solution in the correct cost function, which is done by 
finding the correct values of y. 

In [10] it is proved that the convex piecewise linearization and the LP relaxation are equivalent, in the sense that $v_{CPL} = v_{LP}$ and an x-solution that is optimal in one of the problems is also optimal in the other. Utilizing the network structure, we thus get a quicker way of solving the LP relaxation.


Benders Decomposition 


In [11] a Benders decomposition approach is used, and 
combined with the convex piecewise linearization de- 
scribed above. 

The Benders subproblem is simply obtained by fixing the integer variables, i.e. fixing the sizes of the facilities. The resulting problem is a minimal cost network
flow problem, similar to a transportation problem, but 
with certain intervals (given by the facility sizes) for the 
supplies. 

However, the Benders master problem obtained 
by a standard application of the Benders approach, is 
much too hard to solve. The number of integer variables 
is much larger than in an ordinary location problem with the same numbers of facilities and customers. One way around this is to combine the Benders approach
with the convex piecewise linearization. 

An improved piecewise linearization is obtained by 
branching at certain production levels. A staircase cost 
function is divided into two parts by the branching, and 
a binary variable is introduced, indicating which of the parts is to be used. In each of the two parts, convex piecewise linearization is used. In this manner, one
could design a branch and bound method for solving 
the problem, similar to what is described in [14]. 

Considering the model after a number of branch- 
ings, we have an approximation (a relaxation) of the 
original problem, with a much smaller number of in- 
teger variables. On this problem we then apply Benders 
decomposition. 

In principle one could let each subproblem in the 
branch and bound method be solved exactly with 
Benders decomposition, thereby obtaining basically 
a branch and bound method, which employs Benders 
decomposition to solve the branch and bound subprob- 
lems. This is however very inefficient. 

The other extreme is standard application of Ben- 
ders decomposition to the original problem, in which 
case the Benders approach employs branch and bound to solve the Benders master problems. This is also quite inefficient in practice.

A more efficient method is to combine the two 
approaches, Benders decomposition and branch and 
bound on a more equal level. This can be done the fol- 
lowing way. 

1) Solve the initial convex piecewise linearization (with 
a network code). 

2) Do one or more branchings, where the error of the 
approximation at the obtained solution is largest. 

3) Solve the obtained problem with Benders decompo- 
sition (to a certain accuracy). 

4) Repeat 2) and 3), until optimality. 

There are two very important comments on the above algorithm.

A) When one returns to step 3) after having done 
branchings, one can recalculate and reuse all the 
Benders cuts obtained before the branchings. (This 
is described in detail in [11].) 

B) The stopping criterion for the Benders method, i.e. the required accuracy in step 3), is a very important control parameter. One should require a low accuracy in initial iterations and then gradually, as the method approaches the optimal solution, require higher and higher accuracies.
The effect of combining comments A) and B) is that one should only do a few Benders iterations in each main iteration, since the number of Benders cuts will automatically increase, as the old cuts are recalculated and kept.

The main conclusion of the computational tests 
done in [11] is that only a small part of all the integer 
variables (on average 4%) need to be included by the improving piecewise linearization technique when solving a problem to reasonable accuracy. In other words,
only a small subset of the possible sizes need to be in- 
vestigated. 


Lagrangian Relaxation 
and Subgradient Optimization 


Now we will describe a Lagrangian heuristic, found 
in [12], in more detail. Lagrangian relaxation and sub- 
gradient optimization are used to obtain a near-optimal 
dual solution, and serve as a basis for an efficient primal
heuristic. Based on the solution of the Lagrangian re- 
laxation one can construct a transportation problem 
which yields primal feasible solutions, and can be used 
during the subgradient process. 

An important aspect of the Lagrangian approach is 
that a method yielding good feasible primal solutions 
can be based on dual techniques. 

Lagrangian relaxation, [7], in combination with 
subgradient optimization, [9] is a commonly used tech- 
nique for generating lower bounds on the optimal ob- 
jective function value of mixed integer programming 
problems. Here we apply Lagrangian relaxation to con- 
straint set (2), and denote the Lagrangian multipliers by 
$u_i$. For fixed values of u, the subproblem separates into
several smaller problems: 


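$$
\begin{aligned}
\theta_{1j}(u) = \min\ & \sum_{i=1}^{m}(c_{ij} + u_i)\,x_{ij}\\
\text{s.t.}\ & \sum_{i=1}^{m} x_{ij} = D_j,\\
& x_{ij} \ge 0, \quad \forall i,
\end{aligned}
$$

$$
\begin{aligned}
\theta_{2i}(u_i) = \min\ & \sum_{k=1}^{q_i}\big((p_{ik} - u_i)\,t_{ik} + \Delta f_{ik}\,y_{ik}\big)\\
\text{s.t.}\ & t_{ik} \le \Delta S_{ik}\,y_{ik}, \quad \forall k,\\
& t_{i,k-1} \ge \Delta S_{i,k-1}\,y_{ik}, \quad \forall k > 1,\\
& t_{ik} \ge 0, \quad \forall k,\\
& y_{ik} \in \{0,1\}, \quad \forall k.
\end{aligned}
$$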


The first set of subproblems consists of n continuous knapsack problems, which are trivially solvable. The second set of subproblems consists of m one-dimensional staircase cost problems. The solution can be found by calculating the minimizer $k_i$ for each i, as follows:

$$\theta_{2i}(u_i) = \min_{k_i = 0,\dots,q_i}\ \sum_{k=1}^{k_i}\big((p_{ik} - u_i)\,\Delta S_{ik} + \Delta f_{ik}\big).$$

The resulting solution is

$$
y_{ik} = \begin{cases} 1, & \forall k \le k_i,\\ 0, & \forall k > k_i, \end{cases}
\qquad
t_{ik} = \begin{cases} \Delta S_{ik}, & \forall k \le k_i,\\ 0, & \forall k > k_i. \end{cases}
$$
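Both families of subproblems can be evaluated directly. The following sketch (hypothetical names; nested lists c[i][j], D[j], p[i][k], dS[i][k], df[i][k]) computes θ(u), a solution (x, t) and the subgradient $\xi_i = \sum_j x_{ij} - \sum_k t_{ik}$: each $\theta_{1j}$ ships all of $D_j$ from the site with the smallest $c_{ij} + u_i$, and each $\theta_{2i}$ takes the best number $k_i$ of fully used levels, exactly as in the closed form above.

```python
def theta(u, c, D, p, dS, df):
    """Evaluate the Lagrangian dual function theta(u) and a subgradient.

    Returns (theta(u), x, t, xi) with xi_i = sum_j x_ij - sum_k t_ik.
    """
    m, n = len(u), len(D)
    x = [[0.0] * n for _ in range(m)]
    value = 0.0
    for j in range(n):
        # theta_1j: continuous knapsack, ship all of D_j from the cheapest c_ij + u_i.
        i_best = min(range(m), key=lambda i: c[i][j] + u[i])
        x[i_best][j] = D[j]
        value += (c[i_best][j] + u[i_best]) * D[j]
    t = []
    for i in range(m):
        # theta_2i: best number k_i in 0..q_i of fully used levels.
        acc, best, k_best = 0.0, 0.0, 0
        for k in range(len(dS[i])):
            acc += (p[i][k] - u[i]) * dS[i][k] + df[i][k]
            if acc < best:
                best, k_best = acc, k + 1
        t.append([dS[i][k] if k < k_best else 0 for k in range(len(dS[i]))])
        value += best
    xi = [sum(x[i]) - sum(t[i]) for i in range(m)]
    return value, x, t, xi
```

With one site (c = 2, u = 3, D = 10) and two levels (p = [1, 1], ΔS = [5, 10], Δf = [4, 3]), both levels are opened, θ(u) = 50 − 23 = 27, and the subgradient −5 signals that production exceeds shipments, so $u_1$ should decrease.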


Note that the subproblem has the integrality property, [7], so $\max_u \theta(u) = v_{LP}$. The Lagrangian dual,

$$\max_u\ \theta(u) = \sum_{j=1}^{n}\theta_{1j}(u) + \sum_{i=1}^{m}\theta_{2i}(u_i),$$

can be solved by standard subgradient optimization, [9], in order to get the best lower bound. One can use enhancements such as modified directions, [5], $d^r = \xi^r + a\,d^{r-1}$, where $\xi^r$ is the subgradient generated in iteration r and $d^r$ is the direction used in iteration r.

A steplength shortening is obtained by setting $\lambda = \lambda/2$ when there has not been any improvement of v for $N_1$ consecutive iterations. When there has not been any improvement of v for $N_2$ iterations, the subgradient optimization procedure is terminated. The subgradient is given by $\xi_i^r = \sum_{j=1}^{n} x_{ij}^r - \sum_{k=1}^{q_i} t_{ik}^r$ for all i, where $x_{ij}^r$ and $t_{ik}^r$ are the optimal solutions to the subproblems. Reasonable choices for the parameters are $N_1 = 6$, $N_2 = 25$ and $a = 0.7$.
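The direction and steplength rules above can be packaged in a small controller. This is a sketch under assumptions: the step formula $\lambda(\bar v - \theta(u^r))/\|d^r\|^2$ and the initial λ = 1 are not given in the text and are hypothetical, as is the class name. The rule $d^r = \xi^r + a\,d^{r-1}$, the halving of λ after $N_1$ non-improving iterations and the stop after $N_2$ follow the description, with both counters taken as consecutive iterations.

```python
class SubgradientControl:
    """Modified directions d^r = xi^r + a d^(r-1), step halving and stopping rule.

    The step formula lam * (upper - value) / ||d||^2 is an assumption of this sketch.
    """

    def __init__(self, n, a=0.7, n1=6, n2=25, lam=1.0):
        self.a, self.n1, self.n2, self.lam = a, n1, n2, lam
        self.d = [0.0] * n            # previous direction d^(r-1)
        self.best = float("-inf")     # best lower bound found so far
        self.since_improve = 0        # iterations without improvement (for N2)
        self.since_halving = 0        # iterations since last improvement or halving (for N1)

    def direction(self, xi):
        """Return the modified direction d^r = xi^r + a d^(r-1)."""
        self.d = [g + self.a * dp for g, dp in zip(xi, self.d)]
        return self.d

    def record(self, value):
        """Register theta(u^r); return False when the procedure should stop."""
        if value > self.best:
            self.best = value
            self.since_improve = self.since_halving = 0
        else:
            self.since_improve += 1
            self.since_halving += 1
            if self.since_halving >= self.n1:
                self.lam /= 2         # steplength shortening after N1 idle iterations
                self.since_halving = 0
        return self.since_improve < self.n2

    def step(self, value, upper):
        """Hypothetical Polyak-type step length along the current direction."""
        norm2 = sum(di * di for di in self.d)
        return self.lam * (upper - value) / norm2 if norm2 > 0 else 0.0
```

In a full loop one would evaluate θ(u^r) and ξ^r, call direction(ξ^r), set $u^{r+1} = u^r + \text{step}\cdot d^r$, and stop as soon as record returns False.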

One can use a heuristic based on the solution of the Lagrangian relaxation to try to get a feasible solution.

The obtained values of $y_{ik}^r$ are used to calculate the supply at each location, and a transportation problem is solved. The solution to the transportation problem is feasible in the original problem if constraint sets (3) and (4) are satisfied, which can easily be achieved. The values of the flow variables $x_{ij}$ are taken directly from the solution to the transportation problem. The total production $t_i$ is then calculated as $t_i = \sum_j x_{ij}$. One can then easily find $t_{ik}$ as the part of $t_i$ that lies within level k, and the $y_{ik}$ solution is simply $y_{ik} = 1$ if $t_{ik} > 0$ and 0 if not. Finally, all unnecessary production capacity at each location i is removed.

The complete Lagrangian heuristic, LH, also in- 
cludes the following. The convex piecewise lineariza- 
tion, CPL, is solved with an efficient network code. The 
Lagrangian multipliers are initiated with a convex com- 
bination of the appropriate node prices obtained by 
solving CPL and $\min_j c_{ij}$, with the largest weight on the
former. The primal procedure to generate feasible so- 
lutions is used every third iteration in the subgradient 
procedure. 

Note that CPL yields $v_{CPL} = v_{LP}$, so the subgradient procedure cannot improve the lower bound, which
is quite unusual in methods of this kind. The motiva- 
tion behind using the subgradient procedure is not to 
get lower bounds, but to get primal solutions (upper 
bounds). 


Computational Results 


In [12] the heuristic procedures are tested by solving 
a number of randomly generated test problems, with up 
to 50 locations, 100 destinations and 20 sizes of each 
location (yielding 6000 continuous variables and 1000 
integer variables). The conclusions of the tests are the 
following. 

A standard mixed integer programming code (in 
this case LAMPS) needs extremely long solution times 
for finding the exact optimum. The ADD heuristics produce solutions with relative errors in the range of 1%-20% (on average 11%), but also require quite long solution times (although not as long as the MIP code).

The convex piecewise linearization CPL, combined 
with exact integer evaluation of the solutions obtained, 
yields solutions that all are better than those obtained 
by the ADD heuristics, with relative errors between 
0.8% and 10% (on average 4%), in a much shorter time (on average 1000 times quicker than the ADD heuristics). So CPL dominates the ADD heuristics completely,
both with respect to solution time and solution quality. 

The Lagrangian heuristic, LH, produces solutions 
with relative errors between 0.4% and 3.2% (on average 1.5%), with solution times on average 20 times shorter
than the ADD heuristics, but of course significantly 
longer than CPL. 

Comparison to other tests is difficult, since other 
computers and codes are used. The Benders approach 
in [11] seems to be slower than the Lagrangian ap- 
proach. However, on modern computers and with 
modern MIP-codes, its performance may well improve. 


Conclusion 


The capacitated facility location problem with staircase 
costs has many important applications. Computational 
results indicate that it is possible to find near-optimal 
solutions to such problems of reasonable size in a rea- 
sonable time, i. e. that this better model can be used in- 
stead of, for example, the ordinary capacitated facility 
location problem in appropriate situations. 
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Farkas’ lemma is the most well-known theorem of the 
alternative or transposition theorem (cf. » Linear optimization: Theorems of the alternative). Given an $m \times n$ matrix A and a vector b (of dimension m), it states that exactly one of the two sets

$$S := \{y : y^\top A \ge 0,\ y^\top b < 0\}$$

and

$$T := \{x : Ax = b,\ x \ge 0\}$$

is empty. This result has
a long history and it has had a tremendous impact on 
the development of the duality theory of linear and 
nonlinear optimization. 
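A one-dimensional example (m = n = 1, chosen here for illustration) shows both alternatives of the lemma with $A = (1)$:

$$
\begin{aligned}
b = -1:&\quad T = \{x : x = -1,\ x \ge 0\} = \emptyset, && S = \{y : y \ge 0,\ -y < 0\} \ni 1,\\
b = 1:&\quad T = \{1\} \neq \emptyset, && S = \{y : y \ge 0,\ y < 0\} = \emptyset.
\end{aligned}
$$

In both cases exactly one of the two sets is empty.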

J. Farkas (1847-1930) was professor of Theoretical 
Physics at the Univ. of Kolozsvar in Hungary. His inter- 
est in the subject is explained in the first two sentences 
of his paper [5]: 


The natural and systematic treatment of analytic 
mechanics has to have as its background the in- 
equality principle of virtual displacements first 
formulated by Fourier and later by Gauss. The 
possibility of such a treatment requires, however, 
some knowledge of homogeneous linear inequal- 
ities that may be said to have been entirely miss- 
ing up to now. 


J.B.J. Fourier [7] seems to have been the first who es- 
tablished that a mechanical system has a stable equilib- 
rium state if and only if some homogeneous system of 
inequalities, like in the definition of the above set S, has 
no solution. This observation became known as the me- 
chanical principle of Fourier. By Farkas’ lemma this hap- 
pens if and only if the set T is nonempty. 

It is almost obvious that if the set T is not empty, then the set S will be empty and we have equilibrium. This follows easily by noting that the sets S and T cannot both be nonempty: if $y \in S$ and $x \in T$, then the contradiction

$$0 > y^\top b = y^\top(Ax) = (y^\top A)x \ge 0$$

follows, because $y^\top A \ge 0$ and $x \ge 0$. This shows that the condition 'T is not empty' is certainly a sufficient condition for equilibrium. The hard part is to prove that it is also a necessary condition for equilibrium.
The proof has a long history. First, the condition was stated without proof, for special cases by A. Cournot in 1827 and for the general case by M. Ostrogradsky in 1834. Farkas published his condition first in 1894 and
1895, but the proof contains a gap. A second attempt, 
in 1896, is also incomplete. The first complete proof was 
published in Hungarian, in 1898 [3], and in German in 
1899 [4]. This proof is included in Farkas’ best known 
paper [5]. For more details and references, see the his- 
torical overviews [9] and [10]. 

Nowadays (1998) many different proofs of Farkas’ lemma are known. For quite recent proofs, see, e.g., [1,2,8]. An interesting derivation has been given by A.W. Tucker [11], based on a result that will be referred to as Tucker’s theorem. (See Tucker homogeneous systems of linear relations.) The theorem states that for any skew-symmetric matrix K (i.e. $K = -K^T$) there exists a vector x such that

$$x \ge 0, \qquad Kx \ge 0, \qquad x + Kx > 0.$$


Tucker’s theorem implies the existence of nonnegative vectors $z_1$, $z_2$ and $x$ and a nonnegative scalar $t$ such that

$$Ax - tb \ge 0, \tag{1}$$
$$-Ax + tb \ge 0, \tag{2}$$
$$-A^T z_1 + A^T z_2 \ge 0, \qquad b^T z_1 - b^T z_2 \ge 0, \tag{3}$$

and

$$\begin{aligned} z_1 + Ax - tb &> 0,\\ z_2 - Ax + tb &> 0,\\ x - A^T z_1 + A^T z_2 &> 0,\\ t + b^T z_1 - b^T z_2 &> 0. \end{aligned} \tag{4}$$


If $t = 0$, then, putting $y = z_2 - z_1$, (3) and (4) yield a vector in the set S: $A^T y \ge 0$ by (3), and $b^T y < 0$ by the last inequality in (4). If $t > 0$, since the above inequalities are all homogeneous, one may take $t = 1$, and then (1) and (2) give $Ax = b$ with $x \ge 0$, i.e. a vector in the set T. This shows that at least one of the two sets S and T is nonempty, proving the hard part of Farkas’ lemma.
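The inequalities (1) through (4) can be traced back to a single skew-symmetric matrix. The text does not write it out, so the following is our own reconstruction, assuming A is an $m \times n$ matrix, $b, z_1, z_2 \in \mathbb{R}^m$, $x \in \mathbb{R}^n$ and $t \in \mathbb{R}$. Apply Tucker’s theorem to the vector $v = (z_1, z_2, x, t)$ and

$$K = \begin{pmatrix} 0 & 0 & A & -b \\ 0 & 0 & -A & b \\ -A^T & A^T & 0 & 0 \\ b^T & -b^T & 0 & 0 \end{pmatrix}, \qquad Kv = \begin{pmatrix} Ax - tb \\ -Ax + tb \\ -A^T z_1 + A^T z_2 \\ b^T z_1 - b^T z_2 \end{pmatrix}.$$

One checks block by block that $K^T = -K$; then $v \ge 0$ gives the nonnegativity of $z_1, z_2, x, t$, the condition $Kv \ge 0$ is exactly (1), (2) and (3), and $v + Kv > 0$ is exactly (4).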

It is worth mentioning a result of C.G. Broyden [1], who showed that Tucker’s theorem, and hence also Farkas’ lemma, follows from a simple property of orthogonal matrices. The result states that for any orthogonal matrix Q (so $QQ^T = Q^T Q = I$) there exist a unique sign matrix D and a positive vector x such that $Qx = Dx$; a sign matrix is a diagonal matrix whose diagonal elements are equal to either plus one or minus one.

The key observation here is that if K is a skew-symmetric matrix, then

$$Q = (I + K)^{-1}(I - K)$$

is an orthogonal matrix, where I denotes the identity matrix; Q is known as the Cayley transform of K [6]. The proof of this fact is straightforward. First, for every vector x the scalar $x^T K x$ equals its own transpose $x^T K^T x = -x^T K x$, so $x^T K x = 0$ and one has

$$x^T (I + K) x = x^T x.$$

Hence $(I + K)x = 0$ forces $x = 0$, whence $I + K$ is an invertible matrix; the same argument applies to $I - K$. Furthermore, using $K^T = -K$, one may write

$$Q^T Q = (I + K)(I - K)^{-1}(I + K)^{-1}(I - K) = (I + K)(I - K^2)^{-1}(I - K).$$

Multiplying both sides from the left with $(I - K)$ one gets

$$(I - K)\,Q^T Q = (I - K^2)(I - K^2)^{-1}(I - K) = I - K,$$

and multiplying both sides from the left with $(I - K)^{-1}$ one finds $Q^T Q = I$, showing that Q is indeed orthogonal.
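The orthogonality argument can be checked numerically. The sketch below is our own illustration (helper names are hypothetical); it uses exact rational arithmetic from Python’s standard `fractions` module, forms the Cayley transform of an arbitrarily chosen $3 \times 3$ skew-symmetric K by Gauss-Jordan inversion, and confirms $Q^T Q = I$ with no rounding error.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def inverse(M):
    """Gauss-Jordan elimination in exact rational arithmetic; M must be invertible."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + identity(n)[i] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]          # normalize the pivot row
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                   # right half is M^{-1}


def cayley(K):
    """Cayley transform Q = (I + K)^{-1} (I - K)."""
    n = len(K)
    plus = [[Fraction(int(i == j)) + K[i][j] for j in range(n)] for i in range(n)]
    minus = [[Fraction(int(i == j)) - K[i][j] for j in range(n)] for i in range(n)]
    return matmul(inverse(plus), minus)


# An arbitrarily chosen skew-symmetric matrix (K^T = -K).
K = [[0, 2, -1],
     [-2, 0, 3],
     [1, -3, 0]]
Q = cayley(K)
print(matmul(transpose(Q), Q) == identity(3))   # True: Q is orthogonal
```

Exact fractions are a deliberate choice here: with floating point the product would only be close to I, which would obscure the point that the identity holds exactly.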


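Therefore, by Broyden’s theorem, there exist a sign matrix D and a positive vector z such that

$$(I + K)^{-1}(I - K)\,z = Dz.$$

This can be rewritten as

$$(I - K)\,z = (I + K)\,Dz,$$

whence

$$z - Kz = Dz + KDz,$$

or

$$z - Dz = K(z + Dz).$$

Defining $x = z + Dz$, one has $x \ge 0$, $Kx = z - Dz \ge 0$ (each entry of $z \pm Dz$ is either 0 or $2z_i$, because the diagonal entries of D are $\pm 1$) and $x + Kx = 2z > 0$, proving Tucker’s theorem.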
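A small worked example of our own (not taken from the source) traces the whole construction for a $2 \times 2$ skew-symmetric matrix:

$$K = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad Q = (I + K)^{-1}(I - K) = \tfrac12 \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

With $z = (1, 1)^T$ and $D = \operatorname{diag}(-1, 1)$ one checks $Qz = (-1, 1)^T = Dz$. Then $x = z + Dz = (0, 2)^T \ge 0$, $Kx = (2, 0)^T = z - Dz \ge 0$ and $x + Kx = (2, 2)^T = 2z > 0$, exactly as Tucker’s theorem requires.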
The key to identifying optimal solutions of constrained
nonlinear optimization problems is the Lagrange mul- 
tiplier conditions. One of the main approaches to estab- 
lishing such multiplier conditions for inequality con- 
strained problems is based on the dual solvability char- 
acterizations of systems involving inequalities. J. Farkas 
[7] initially established such a dual characterization for 
linear inequalities which was used in [23] to derive nec- 
essary conditions for optimality for nonlinear program- 
ming problems. This dual characterization is popularly 
known as Farkas’ lemma, which states that given any vectors $a_1, \dots, a_m$ and $c$ in $\mathbb{R}^n$, the linear inequality $c^T x \ge 0$ is a consequence of the linear system $a_i^T x \ge 0$, $i = 1, \dots, m$, if and only if there exist multipliers $\lambda_i \ge 0$ such that $c = \sum_{i=1}^m \lambda_i a_i$. This result can also be expressed as a so-called alternative theorem: exactly one of the following alternatives is true:

i) $\exists x \in \mathbb{R}^n$: $a_i^T x \ge 0$ ($i = 1, \dots, m$), $c^T x < 0$;

ii) $\exists \lambda_i \ge 0$: $c = \sum_{i=1}^m \lambda_i a_i$.
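To make the two alternatives concrete, here is a minimal sketch (function names are hypothetical) that checks a proposed certificate for either alternative in exact integer arithmetic. It only verifies certificates; finding one in general requires solving a linear program, which is not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def certifies_i(a, c, x):
    """Alternative i): a_i^T x >= 0 for every i, and c^T x < 0."""
    return all(dot(ai, x) >= 0 for ai in a) and dot(c, x) < 0


def certifies_ii(a, c, lam):
    """Alternative ii): lam_i >= 0 for every i, and c = sum_i lam_i a_i."""
    if len(lam) != len(a) or any(l < 0 for l in lam):
        return False
    combo = [sum(l * ai[k] for l, ai in zip(lam, a)) for k in range(len(c))]
    return combo == list(c)


a = [(1, 0), (1, 1)]                       # a_1, a_2 in R^2
print(certifies_ii(a, (3, 1), (2, 1)))     # True:  c = 2 a_1 + 1 a_2
print(certifies_i(a, (0, 1), (1, -1)))     # True:  a^T x = (1, 0) >= 0, c^T x = -1
print(certifies_ii(a, (0, 1), (-1, 1)))    # False: (0, 1) = -a_1 + a_2 needs lam_1 < 0
```

By the lemma, for fixed $a_i$ and c at most one of the two checks can succeed, whatever certificate is supplied.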

This lemma is the key result underpinning linear programming duality and has played a central role in the development of nonlinear optimization theory. A large variety of proofs of the lemma can be found in the literature (see [5,25,26]). The proof [3,5] that relies on the separation theorems has led to various extensions. These extensions cover a wide range of systems, including systems involving infinite-dimensional
linear inequalities, convex inequalities and matrix in- 
equalities. Applications range from classical nonlinear 
programming to modern areas of optimization such 
as nonsmooth optimization and semidefinite program- 
ming. Let us now describe certain main generalizations 
of Farkas’ lemma and their applications to problems in 
various areas of optimization. 


Infinite-Dimensional Optimization 


The Farkas lemma for a finite system of linear inequal- 
ities has been generalized to systems involving arbi- 
trary convex cones and continuous linear mappings be- 
tween spaces of arbitrary dimensions. In this case the 
lemma holds under a crucial closure condition. In sym- 
bolic terms, the main version of such extension to arbi- 
trary dual pairs of vector spaces states that the following 
equivalence holds [6]: 


$$[A(x) \in S \Rightarrow c(x) \ge 0] \iff c \in A^T(S^*), \tag{1}$$

provided the cone $A^T(S^*)$ is closed in some appropriate topology. Here A is a continuous linear mapping between two Banach spaces, $A^T$ denotes its adjoint, and S is a closed convex cone having the dual cone $S^*$ [5]. The closure condition holds when S is a polyhedral cone in some finite-dimensional space. For simple examples of nonpolyhedral convex cones in finite dimensions where the closure condition does not hold, see [1,5]. However, the following asymptotic version of Farkas’ lemma holds without a closure condition:

$$[A(x) \in S \Rightarrow c(x) \ge 0] \iff c \in \operatorname{cl}(A^T(S^*)), \tag{2}$$

where $\operatorname{cl}(A^T(S^*))$ is the closure of $A^T(S^*)$ in the appropriate topology. These extensions resulted in the
development of asymptotic and nonasymptotic first 
order necessary optimality conditions for infinite-
dimensional smooth constrained optimization prob- 
lems involving convex cones and duality theory for 
infinite-dimensional linear programming problems 
(see e. g. [12]). Smooth optimization refers to the opti- 
mization of a differentiable function. A nonasymptotic 
form of an extension of Farkas’ lemma that is differ- 
ent from the one in (1) is given in [24] without the 
usual closure condition. For related results see [4]. An 
approach to the study of semi-infinite programming, 
which is based on generalized Farkas’ lemma for infi- 
nite linear inequalities is given in [12]. 


Nonsmooth Optimization 


The success of linear programming duality and the 
practical nature of the Lagrange multiplier conditions 
for smooth optimization have led to extensions of 
Farkas’ lemma to systems involving nonlinear functions. Convex analysis made it possible to obtain extensions in terms of subdifferentials, replacing the linear systems by sublinear (convex and positively homogeneous) systems [8,31]. A simple form of such an extension states that the following statements are equivalent:

$$-g(x) \in S \Rightarrow f(x) \ge 0, \tag{3}$$
$$0 \in \operatorname{cl}\Big(\partial f(0) + \bigcup_{\lambda \in S^*} \partial(\lambda g)(0)\Big), \tag{4}$$

where the real-valued function f is sublinear and lower semicontinuous, the vector function g is sublinear with respect to the cone S, and $\lambda g$ is lower semicontinuous for each $\lambda \in S^*$. When f is continuous, statement (4) collapses to the condition

$$0 \in \partial f(0) + \operatorname{cl}\Big[\bigcup_{\lambda \in S^*} \partial(\lambda g)(0)\Big]. \tag{5}$$

This extension was used to obtain optimality conditions for convex optimization problems and quasidifferentiable problems in the sense of B.N. Pshenichnyi [27]. A review of results of Farkas type for systems involving sublinear functions is given in [13,14].

Difference of sublinear (DSL) functions, which arise frequently in nonsmooth optimization, provide useful approximations for many classes of nonconvex nonsmooth functions. This has led to the investigation of results of Farkas type for systems involving DSL functions.

A mapping $g: X \to Y$ is said to be difference sublinear (DSL) (with respect to S) if, for each $v \in S^*$, there are (weak*) compact convex sets, here denoted $\partial(vg)(0)$ and $\bar\partial(vg)(0)$, such that, for each $x \in X$,

$$vg(x) = \max_{w \in \partial(vg)(0)} w(x) - \max_{u \in \bar\partial(vg)(0)} u(x),$$

where X and Y are Banach spaces. If $Y = \mathbb{R}$ and $S = \mathbb{R}_+$, then this definition coincides with the usual notion of a difference sublinear real-valued function. Thus a mapping g is DSL if and only if $vg$ is a DSL function for each $v \in S^*$. The sets $\partial(vg)(0)$ and $\bar\partial(vg)(0)$ are the subdifferential and superdifferential of $vg$, respectively. For a DSL mapping $g: X \to Y$ we shall often require a selection from the class of sets $\{\bar\partial(vg)(0) : v \in S^*\}$. This is a set, denoted $(w_v)$, in which we select a single element $w_v \in \bar\partial(vg)(0)$ for each $v \in S^*$. An extension of the Farkas lemma for DSL systems states that the following statements are equivalent [10,20]:

i) $-g(x) \in S \Rightarrow f(x) \ge 0$;

ii) for each selection $(w_v)$ with $w_v \in \bar\partial(vg)(0)$, $v \in S^*$,

$$\bar\partial f(0) \subseteq \partial f(0) + B,$$

where $B = \operatorname{cl\,cone\,co}\big[\bigcup_{v \in S^*} (\partial(vg)(0) - w_v)\big]$. A unified approach to generalizing the Farkas lemma for sublinear systems, which uses multivalued functions and convex processes, is given in [2,17,18].


Global Nonlinear Optimization 


Given that the optimality of a constrained global optimization problem can be viewed as the solvability of appropriate inequality systems, it is easy to see that an extension of Farkas’ lemma again provides a mechanism for characterizing global optimality of a range of nonlinear optimization problems. Here ε-subdifferential analysis made it possible to obtain a new version of the Farkas lemma, replacing the linear inequality $c(x) \ge 0$ by a reverse convex inequality $h(x) \le 0$, where h is a convex function with $h(0) = 0$. This extension for systems involving DSL functions states that the following conditions are equivalent:

i) $-g(x) \in S \Rightarrow h(x) \le 0$;

ii) for each selection $(w_v)$ with $w_v \in \bar\partial(vg)(0)$, $v \in S^*$, and for each $\varepsilon > 0$,

$$\partial_\varepsilon h(0) \subseteq \operatorname{cl\,cone\,co}\Big[\bigcup_{v \in S^*} \big(\partial(vg)(0) - w_v\big)\Big].$$

Such an extension has led to the development of conditions which characterize optimal solutions of various classes of global optimization problems, such as convex maximization problems and fractional programming problems (see [19,20]).

However, simple examples show that the asymp- 
totic forms of the above results of Farkas type do not 
hold if we replace the DSL (or sublinear) system by 
a convex system. Ch.-W. Ha [15] established a version of the Farkas lemma for convex systems in terms of epigraphs of conjugate functions. A simple form of such a result [29] states that the following statements are equivalent:

i) $(\forall i \in I)\; g_i(x) \le 0 \Rightarrow h(x) \le 0$;

ii) $\operatorname{epi} h^* \subseteq \operatorname{cl\,cone\,co}\big[\bigcup_{i \in I} \operatorname{epi} g_i^*\big]$,

provided the system

$$g_i(x) \le 0, \quad i \in I,$$

has a solution. Here h and, for each $i \in I$, $g_i$ are continuous convex functions, I is an arbitrary index set, and $h^*$ and $g_i^*$ are the conjugate functions of h and $g_i$, respectively. This result has also been employed to study infinite-dimensional nonsmooth nonconvex problems [30]. A basic general form of the Farkas lemma for convex systems, with application to multiobjective convex optimization problems, is given in [11].
Extensions to systems involving the difference of con- 
vex functions are given in [21,29]. A more general re- 
sult involving H-convex functions [29] with application 
to global nonlinear optimization is given in [29]. 


Nonconvex Optimization 


The convexity requirement of the functions involved in 
the extended Farkas lemma above can be relaxed to obtain a form of Farkas’ lemma for convex-like systems. Let $F: X \times Y \to \mathbb{R}$ and let $f: X \to \mathbb{R}$, where X and Y are arbitrary nonempty sets. The pair (f, F) is convex-like on X if

$$(\exists \alpha \in (0,1))(\forall x_1, x_2 \in X)(\exists x_3 \in X):\quad f(x_3) \le \alpha f(x_1) + (1-\alpha) f(x_2)$$

and

$$(\forall y \in Y):\quad F(x_3, y) \le \alpha F(x_1, y) + (1-\alpha) F(x_2, y).$$

If the pair (f, F) is convex-like on X, if there is $x_0 \in X$ with $(\forall y \in Y)\; F(x_0, y) < 0$, and if a regularity condition holds, then the following statements are equivalent [21]:

$$(\forall y \in Y)\; F(x, y) \le 0 \Rightarrow f(x) \ge 0;$$

$$(\forall \theta < 0)(\exists \lambda \in \Lambda)(\forall x \in X):\quad f(x) + \sum_{y \in Y} \lambda_y F(x, y) \ge \theta,$$

where Λ is the dual cone of the convex cone of all nonnegative functions on Y. An asymptotic version of the
above result holds if the regularity hypothesis is not 
fulfilled. This extension has been applied to develop 
Lagrange multiplier type results for minimax prob- 
lems and constrained optimization problems involving 
convex-like functions. For related results see [16]. 


Semidefinite Programming 


A useful corollary of the Farkas lemma, which is often 
used to characterize the feasibility problem for linear in- 
equalities, states that exactly one of the following alter- 
natives is true: 

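i) $\exists x \in \mathbb{R}^n$: $a_i^T x \le b_i$, $i = 1, \dots, m$;

ii) $\exists \lambda_i \ge 0$: $\sum_{i=1}^m \lambda_i a_i = 0$, $\sum_{i=1}^m b_i \lambda_i = -1$.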
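A certificate for alternative ii) has a direct reading: multiplying the i-th inequality by $\lambda_i \ge 0$ and adding gives $0^T x \le -1$, which no x satisfies. The sketch below (hypothetical function name, exact integer data) verifies such a certificate for a tiny one-variable system.

```python
def is_infeasibility_certificate(A, b, lam):
    """True if lam >= 0, sum_i lam_i a_i = 0 and sum_i b_i lam_i = -1 (alternative ii)."""
    if any(l < 0 for l in lam):
        return False
    n = len(A[0])
    combo = [sum(l * row[k] for l, row in zip(lam, A)) for k in range(n)]
    return all(v == 0 for v in combo) and sum(l * bi for l, bi in zip(lam, b)) == -1


# The system x <= 1, -x <= -2 (i.e. x >= 2) in R^1 is clearly infeasible.
A = [[1], [-1]]
b = [1, -2]
print(is_infeasibility_certificate(A, b, [1, 1]))   # True: adding the rows gives 0 <= -1
```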

This form of the Farkas lemma has also attracted various extensions to nonlinear systems, including sublinear and DSL systems [20], with a view to characterizing the feasibility of such systems. The feasibility problem, which has been of great interest in semidefinite programming, is the problem of determining whether there exists an $x \in \mathbb{R}^m$ such that $Q(x) \succeq 0$, for real symmetric matrices $Q_i$, $i = 0, \dots, m$, where $\succeq$ denotes the partial order defined by $A \succeq B$ if and only if $A - B$ is positive semidefinite, and $Q(x) = Q_0 - \sum_{i=1}^m x_i Q_i$. However, simple examples show that a direct analog of the alternative does not hold for the semidefinite inequality system $Q(x) \succeq 0$ without additional hypotheses on Q. Modified dual conditions that characterize solvability of the system $Q(x) \succeq 0$ are given in [28].
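A small example of our own (not taken from [28]) shows how the direct analog fails. Writing $\langle A, B \rangle = \operatorname{tr}(AB)$, the analog of alternative ii) would be a matrix $Y \succeq 0$ with $\langle Q_i, Y \rangle = 0$ for $i = 1, \dots, m$ and $\langle Q_0, Y \rangle = -1$; such a Y certifies infeasibility, because $Q(x) \succeq 0$ would give $0 \le \langle Q(x), Y \rangle = \langle Q_0, Y \rangle = -1$. Now take $m = 1$ and

$$Q_0 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad Q_1 = \begin{pmatrix} -1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad Q(x) = \begin{pmatrix} x & 1 \\ 1 & 0 \end{pmatrix}.$$

Since $\det Q(x) = -1 < 0$ for every x, the system $Q(x) \succeq 0$ has no solution. But $\langle Q_1, Y \rangle = -Y_{11} = 0$ together with $Y \succeq 0$ forces $Y_{12} = 0$, so $\langle Q_0, Y \rangle = 2Y_{12} = 0 \ne -1$: neither alternative holds.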
Feasible sequential quadratic programming (FSQP) refers to a class of sequential quadratic programming (SQP) methods that have the additional property that all iterates they construct satisfy the inequality constraints. Thus, for the problem

$$\begin{aligned} \min\ & f(x) \\ \text{s.t.}\ & g_j(x) \le 0, \quad j = 1, \dots, m_i, \\ & h_j(x) = 0, \quad j = 1, \dots, m_e, \end{aligned} \tag{1}$$

where f, the $g_j$’s, and the $h_j$’s are smooth, FSQP methods generate a sequence $\{x_k\}$ such that $g_j(x_k) \le 0$ for all j and all k.

From the application’s point of view, enforcing fea- 
sibility of the iterates with respect to inequality con- 
straints is often an important attribute. First, it may be 
the case that the objective function is simply not defined 
when certain constraints are violated, for example, with 
problems involving dynamical systems, in which stabil- 
ity is needed in order for, say, certain steady state er- 
rors to be well defined. Second, it may be crucial that 
a (suboptimal) solution satisfying certain ‘hard’ con- 
straints be available after a prescribed amount of time 
has elapsed, too short to allow convergence to the optimal solution. This is the case, for instance, in certain
real-time control applications. A third situation where 
feasibility of successive iterates is desirable is in the con- 
text of trade-off exploration for design problems. In- 
deed, trade-offs between ‘soft’ design specifications can- 
not be meaningfully explored unless ‘hard’ specifica- 
tions are satisfied. From the point of view of numerical 
algorithms, while maintaining feasibility of successive 
iterates obviously requires special attention, it also has 
important beneficial side effects. Namely, 

i) the objective function can be forced to decrease at each iteration, and thus can serve as the merit function in the line search, thereby avoiding the complex issue of choosing an appropriate surrogate merit function; and

ii) as pointed out below, in the context of SQP type 
methods, the quadratic programs successively con- 
structed all have a nonempty feasible set, which is 
not the case in general for ‘infeasible’ methods. 

Methods that generate feasible iterates have regained much popularity in recent years with the in-depth investigation of barrier-based interior point methods, successively in the context of linear, convex-quadratic, general convex, and nonconvex problems, the class of
problems of interest here. Contributions to the latter 
can be found in the classical book [4] as well as, e. g., 
in [11] (see [14] for a ‘modern’ presentation) and [3], 
and in many recent reports. In those methods, each 
search direction is typically obtained via the solution of 
a linear system of equations. FSQP algorithms, on the 
other hand, being of the SQP type, involve the solution 
of quadratic programs as subproblems. While they are 
often impractical for problems with large numbers of 
variables, SQP-type algorithms are particularly suited 
to various classes of engineering applications where the 
number of variables is not too large but evaluations 
of objective/constraint functions and of their gradients 
are highly time consuming. Indeed, because these al- 
gorithms use quadratic programs as successive models, 
progress between (expensive) function evaluations is 
typically significantly better than with algorithms mak- 
ing use of mere linear systems of equations as models. 

FSQP algorithms are of the feasible direction type 
in that, while they allow iterates to lie on constraint 
boundaries, small enough displacements along the 
search directions they generate always yield feasible 
points. Indeed, whenever the current iterate lies on or 
near a nonlinear constraint boundary, the search di- 
rection tends to point toward the interior of the feasi- 
ble set. In that respect FSQP algorithms are analogous 
to interior point methods. Early feasible direction al- 
gorithms (see, e.g., [12,16]) were first order methods, 
i. e., only first derivatives were used and no attempt was 
made to accumulate and use second order information. 
As a consequence, such algorithms converged linearly 
at best. E. Polak proposed several extensions to these 
algorithms which take second order information into 
account when computing the search direction (see [12, 
Sect. 4.4]). Some of the search directions proposed by 
Polak can be viewed as modified SQP directions but the 
fast local convergence rate usually associated with SQP 
is not preserved. In [1], a feasible SQP algorithm is pro- 
posed with emphasis on avoiding costly line searches 
by making it likely that a full step along the constructed 
direction is acceptable as a next iterate, even early on 
in the optimization process. The price paid however is 
again the loss of fast local convergence. 

In this article, we focus on feasible SQP methods 
which, under appropriate assumptions, preserve the 
fast rate of convergence of standard SQP methods. Such 
methods have been considered early on by J.N. Herskovits and L.A.V. Carvalho [5] and in [2,6,8,9,10], and
recently also by L. Qi and Z. Wei [13]. 


Main Ideas 


For simplicity, consider the case where only inequality constraints are present, i.e., $m_e = 0$. Suppose that the current estimate $x_k$ for the solution of (1) is feasible, i.e., $g_j(x_k) \le 0$ for all j. The basic SQP direction $d_k^0$ is obtained by solving the quadratic programming problem

$$\begin{aligned} \min_{d^0}\ & \tfrac12 \langle d^0, H_k d^0 \rangle + \langle \nabla f(x_k), d^0 \rangle \\ \text{s.t.}\ & g_j(x_k) + \langle \nabla g_j(x_k), d^0 \rangle \le 0 \quad \forall j, \end{aligned} \tag{2}$$

where $H_k$ is the Hessian of the Lagrangian, or an estimate thereof. While, in general, QP (2) may be inconsistent, feasibility of $x_k$, which we seek to enforce, guarantees that it admits a feasible point: in particular, $d^0 = 0$ is always feasible. Assume that $H_k$ is symmetric positive definite. Then QP (2) has a unique solution $d_k^0$. It is a simple exercise to show that, in addition, $d_k^0$ has the interesting property of being a first order descent direction for f at $x_k$, i.e. $\langle \nabla f(x_k), d_k^0 \rangle < 0$, unless $d_k^0 = 0$: since $d^0 = 0$ is feasible, the optimal value of (2) is at most 0, so $\langle \nabla f(x_k), d_k^0 \rangle \le -\tfrac12 \langle d_k^0, H_k d_k^0 \rangle$.
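For intuition, the sketch below solves QP (2) in the simplest possible setting, assuming a single inequality constraint and $H_k = I$ (both assumptions are ours, and the function name is hypothetical; practical FSQP codes solve general QPs). The KKT conditions then give the solution in closed form: if $-\nabla f(x_k)$ satisfies the linearized constraint it is $d_k^0$; otherwise the constraint is active and $d_k^0 = -\nabla f(x_k) - \mu \nabla g(x_k)$ with $\mu = (g(x_k) - \langle \nabla g(x_k), \nabla f(x_k) \rangle)/\|\nabla g(x_k)\|^2 > 0$. The example reproduces the situation discussed in the next paragraph: a constraint active at $x_k$ that stays active in the QP yields a direction tangent to the boundary.

```python
def qp_direction_one_constraint(grad_f, g_val, grad_g):
    """Solve  min 0.5*<d, d> + <grad_f, d>  s.t.  g_val + <grad_g, d> <= 0.

    This is QP (2) with H_k = I and one constraint. Assumes x_k is feasible
    (g_val <= 0), so the QP is consistent. Returns (d0, mu), mu = QP multiplier.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    d = [-gf for gf in grad_f]                  # unconstrained minimizer
    if g_val + dot(grad_g, d) <= 0:
        return d, 0.0                           # linearized constraint inactive
    # Constraint active; grad_g != 0 here, otherwise the test above would pass.
    mu = (g_val - dot(grad_g, grad_f)) / dot(grad_g, grad_g)
    return [-gf - mu * gg for gf, gg in zip(grad_f, grad_g)], mu


# Constraint active at x_k (g = 0) and active in the QP.
d0, mu = qp_direction_one_constraint(grad_f=[1.0, -1.0], g_val=0.0, grad_g=[0.0, 1.0])
print(d0, mu)   # [-1.0, 0.0] 1.0  ->  <grad_g, d0> = 0: tangent to the boundary
```

Note that $\langle \nabla f(x_k), d_k^0 \rangle = -1 < 0$ here, so the direction is a descent direction, yet it gives no room to move into the interior.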

Suppose now that some constraint, say $g_{j_0}$, is active at $x_k$, i.e., $g_{j_0}(x_k) = 0$. Then, if the $j_0$th constraint is also active in QP (2), $\langle \nabla g_{j_0}(x_k), d_k^0 \rangle = 0$, so that $d_k^0$ is tangent to the feasible set. Quite possibly, as a result, $g_{j_0}(x_k + t d_k^0)$ may be positive for small $t > 0$, making it difficult, or impossible, to locate a next feasible iterate in direction $d_k^0$. Thus $d_k^0$ is not an appropriate search direction for FSQP. However, any tilting of $d_k^0$ towards the interior of the feasible set, however small, makes it a feasible direction. The challenge in FSQP type methods is to tilt $d_k^0$ enough that a sizable step can be made within the feasible set, but little enough that the fast local convergence properties of sequential quadratic programming are preserved.

With appropriate tilting of the basic SQP search direction, and an appropriate line search along the resulting direction $d_k$ (yielding a next iterate $x_k + t_k d_k$ for some $t_k \in (0, 1]$), a globally convergent feasible SQP algorithm can be constructed. However, the result would be unsatisfactory if the algorithm thus obtained did not exhibit a fast local convergence rate, a property that is generally expected from SQP-type methods. For such a rate (in particular, a superlinear rate) to take place, it is critical that a full step of one eventually be taken, i.e. that, when $x_k$ is close to the solution, $t_k$ be equal to one. Here a difficulty already arises in the context of classical (nonfeasible) SQP methods, where it may happen that the line search rule prevents the full step from being taken. This possible conflict between global convergence and fast local convergence is known as the Maratos effect. In the context of FSQP methods, this difficulty is compounded by the fact that, in order to be acceptable, the next iterate must be feasible in addition to satisfying an appropriate descent criterion. This imposes further demands on the Maratos-effect avoidance scheme. Two schemes have been proposed in the literature: second order correction with arc search, and nonmonotone line search.


Algorithms 


Following is a simple example of an FSQP algorithm, taken from [10].

Parameters: $\alpha \in (0, 1/2)$, $\beta \in (0, 1)$.

Data: $x_0 \in X$, $H_0 = H_0^T \succ 0$.

Step 0. Initialization: Set $k = 0$.

Step 1. Computation of a search arc.
  Compute $d_k^0$. If $d_k^0 = 0$, stop.
  Compute $d_k^1$ and $\rho_k$, and set $d_k = (1 - \rho_k)\, d_k^0 + \rho_k\, d_k^1$.
  Compute correction $\tilde d_k$.

Step 2. Arc search.
  Compute $t_k$, the first number t in the sequence $\{1, \beta, \beta^2, \dots\}$ satisfying
  $f(x_k + t d_k + t^2 \tilde d_k) \le f(x_k) + \alpha t \langle \nabla f(x_k), d_k \rangle$,
  $g_j(x_k + t d_k + t^2 \tilde d_k) \le 0$, $j = 1, \dots, m_i$.

Step 3. Updates.
  Compute $H_{k+1} = H_{k+1}^T \succ 0$.
  Set $x_{k+1} = x_k + t_k d_k + t_k^2 \tilde d_k$.
  Set $k = k + 1$.
  Go back to Step 1.

Algorithm: Simple FSQP

Here, $d_k$ is a feasible direction and a direction of first order descent for f, $\rho_k \in (0, 1]$ goes to zero fast enough (like $\|d_k^0\|^2$) when $d_k^0$ goes to zero, and $\tilde d_k$ is a correction that aims at ensuring that the full step of one will be accepted when $x_k$ is close enough to a solution; computation of $\tilde d_k$ involves constraint values at $x_k + d_k$. Under
standard assumptions this algorithm is known to gen- 
erate sequences whose limit points are Karush-Kuhn- 
Tucker points. Under strengthened assumptions, in- 
cluding the assumption that H; is updated in such a way 
that it approximates well, in a certain sense, the Hessian 
of the Lagrangian as a solution is approached, conver- 
gence can be shown to be Q-superlinear or 2-step su- 
perlinear. See [10] for details. A refined version of the 
algorithm of [10] is implemented in the CFSQP/FFSQP 
software (see [15]). Refinements include the capability 
to handle equality constraints [6], minimax and con- 
strained minimax problems and to efficiently handle 
problems with large numbers of inequality constraints 
and minimax problems with large numbers of objective functions [8]. Also note that an FSQP method with a drastically reduced amount of work per iteration has recently been proposed [7].
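Step 2 of the Simple FSQP algorithm is easy to state in code. The sketch below is our own illustration, not the CFSQP/FFSQP implementation: the name `arc_search`, the default values of α and β, and the `max_backtracks` safeguard are assumptions, and $d_k$, $\tilde d_k$ and the slope $\langle \nabla f(x_k), d_k \rangle$ are taken as given. Feasibility is tested before the decrease condition, since f may be undefined outside the feasible set.

```python
def arc_search(f, constraints, x, d, d_corr, slope, alpha=0.1, beta=0.5, max_backtracks=60):
    """Return the first t in {1, beta, beta**2, ...} such that the point
    x + t*d + t**2*d_corr satisfies every g_j(.) <= 0 and the Armijo-type
    decrease f(.) <= f(x) + alpha*t*slope, where slope = <grad f(x), d>.
    """
    fx = f(x)
    t = 1.0
    for _ in range(max_backtracks):
        trial = [xi + t * di + t * t * ci for xi, di, ci in zip(x, d, d_corr)]
        # `and` short-circuits: f is only evaluated at feasible trial points.
        if all(g(trial) <= 0 for g in constraints) and f(trial) <= fx + alpha * t * slope:
            return t, trial
        t *= beta
    raise RuntimeError("no acceptable step; d may not be a feasible descent direction")


# Hypothetical test problem: minimize f(x) = -x0 - x1 over the unit disk.
def f(x):
    return -x[0] - x[1]


def g(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0


t_k, x_next = arc_search(f, [g], x=[0.0, 0.0], d=[1.0, 1.0], d_corr=[0.0, 0.0], slope=-2.0)
print(t_k, x_next)   # 0.5 [0.5, 0.5]: the full step t = 1 would leave the disk
```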


Applications 


Applications abound where FSQP-type algorithms are 
of special interest. In particular, as stressed above, such 
algorithms are particularly appropriate for problems 
where the number of variables is not too large but function evaluations are expensive, and feasibility of iterates is desirable (or imperative). Furthermore, problems with a large number of inequality constraints (or minimax problems with large numbers of objective functions), such as finely discretized semi-infinite optimization problems, can be handled effectively, making FSQP especially well-suited for problems involving, e.g., time or frequency responses of dynamical systems. Pointers to a large number of applications can be found on the web, at the URL listed above. Application areas include all branches of engineering, medicine, physics, astronomy, economics and finance, to mention but a few.
In recent years (1990) feedback set problems have been
the subject of growing interest. They have found ap- 
plications in many fields, including deadlock preven- 
tion [90], program verification [79], and Bayesian in- 
ference [2]. Therefore, it is natural that in the past 
few years there have been intensive efforts on exact 
and approximation algorithms for these kinds of prob- 
lems. Exact algorithms have been proposed for solving 
the problems restricted to special classes of graphs as 
well as several approximation algorithms with provable 
bounds for the cases that are not known to be polyno- 
mially solvable. The most general feedback set problem 
consists in finding a minimum-weight (or minimum 
cardinality) set of vertices (arcs) that meets all cycles 
in a collection C of cycles in a graph (G, w), where w 
is a nonnegative function defined on the set of vertices 
V(G) (on the set of edges E(G)). This kind of problem is 
also known as the hitting cycle problem, since one must 
hit every cycle in C. It generalizes a number of prob- 
lems, including the minimum feedback vertex (arc) set 
problem in both directed and undirected graphs, the 
subset minimum feedback vertex (arc) set problem and 
the graph bipartization problem, in which one must re- 
move a minimum-weight set of vertices so that the re- 
maining graph is bipartite. In fact, if C is the set of all 
cycles in G, then the hitting cycle problem is equivalent 
to the problem of finding the minimum feedback vertex 
(arc) set in a graph. If we are given a set of special ver- 
tices and C is the set of all cycles of an undirected graph 
G that contains some special vertex, then we have the 
subset feedback vertex (arc) set problem and, finally, if 
C contains all odd cycles of G, then we have the graph 
bipartization problem. All these problems are also spe- 
cial cases of vertex (arc) deletion problems, where one 
seeks a minimum-weight (or minimum cardinality) set 
of vertices (arcs) whose deletion gives a graph satis- 
fying a given property. There are different versions of 
feedback set problems, depending on whether the graph 
is directed or undirected and/or the vertices (arcs) are 
weighted or unweighted. See [30] for a complete sur- 
vey, and [91] for a general NP-hardness proof for al- 
most all vertex and arc deletion problems restricted to 
planar graphs. These results apply to the planar bipartization problem and to the planar (directed, undirected, or subset) feedback vertex set problems, already proved to be NP-hard [33,46]. Furthermore, the feedback vertex set problem is NP-complete for planar graphs with no indegree or outdegree exceeding three [46], for general graphs with no indegree or outdegree exceeding two [46], and for edge-directed graphs [46].

The scope of this article is to give a complete state-of-the-art survey of exact and approximation algorithms and to analyze a new practical heuristic method called GRASP for solving both feedback vertex and feedback arc set problems.


Notation and Graph Representation 


Throughout this paper, we use the following notation 
and definitions. 

A graph $G = (V, E)$ consists of a finite set of vertices $V(G)$ and a set of arcs $E(G) \subseteq V(G) \times V(G)$.

An arc (or edge) $e = (v_1, v_2)$ of a directed graph (digraph) $G = (V, E)$ is an incoming arc to $v_2$ and an outgoing arc from $v_1$, and it is incident to both $v_1$ and $v_2$. If G is undirected, then e is said to be only incident to $v_1$ and $v_2$.

For each vertex $i \in V(G)$, let in(i) and out(i) denote the set of incoming and outgoing edges of i, respectively. They are defined only in case of a digraph G. If G is undirected, we will take into account only the degree $\Delta_G(i)$ of i, defined as the number of edges that are incident to i in G.

$\Delta(G)$ denotes the maximum degree among all vertices of a graph G and it is called the graph degree.

A vertex $v \in V(G)$ is called an endpoint if it has degree one, a linkpoint if it has degree two, while a vertex having degree higher than two is called a branchpoint.
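These definitions support the simplest reduction used before any feedback vertex set search: an endpoint (or an isolated vertex) cannot lie on a cycle, so it can be deleted without changing $\mu(G, w)$, and deleting it may turn a neighbour into a new endpoint. The sketch below (undirected graphs given as edge lists; all names are hypothetical) classifies vertices with the terminology above and prunes endpoints repeatedly. It is a standard observation offered for illustration, not a claim about the specific contraction operations analyzed in [56].

```python
def degrees(vertices, edges):
    """Degree of every vertex of an undirected (multi)graph given as an edge list."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1          # a self-loop (v, v) counts twice, as usual
    return deg


def vertex_kind(d):
    """Classification used in the text (degree 0 is not named there)."""
    if d == 1:
        return "endpoint"
    if d == 2:
        return "linkpoint"
    return "branchpoint" if d > 2 else "isolated"


def prune_endpoints(vertices, edges):
    """Repeatedly delete vertices of degree <= 1; none of them lies on a cycle."""
    vertices, edges = set(vertices), list(edges)
    while True:
        deg = degrees(vertices, edges)
        low = {v for v in vertices if deg[v] <= 1}
        if not low:
            return vertices, edges
        vertices -= low
        edges = [(u, v) for u, v in edges if u not in low and v not in low]


# A triangle 0-1-2 with a pendant path 2-3-4 attached.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print({v: vertex_kind(d) for v, d in degrees(range(5), edges).items()})
print(prune_endpoints(range(5), edges))   # ({0, 1, 2}, [(0, 1), (1, 2), (2, 0)])
```

For digraphs the analogous reduction deletes vertices with indegree or outdegree zero.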

A path P in G connecting vertex u to vertex v is a sequence of arcs $e_1, \dots, e_r$ in $E(G)$ such that $e_i = (v_i, v_{i+1})$, $i = 1, \dots, r$, with $v_1 = u$ and $v_{r+1} = v$. A cycle C in G is a path $C = (v_1, \dots, v_r)$ with $v_1 = v_r$.

A subgraph $G' = (V', E')$ of $G = (V, E)$ induced by $V'$ is a graph such that $E' = E \cap (V' \times V')$. A graph G is said to be a singleton if $|V(G)| = 1$. Any graph G can be partitioned into isolated connected components $G_1, \dots, G_k$, and the partition is unique. Similarly, every feedback vertex set $V'$ of G can be partitioned into feedback vertex sets $F_1, \dots, F_k$ such that $F_i$ is a feedback vertex set of $G_i$. Therefore, following the additive property and denoting by $\mu(G, w)$ the weight of a minimum feedback vertex (arc) set for $(G, w)$, we have:

$$\mu(G, w) = \sum_{i=1}^{k} \mu(G_i, w).$$


The Feedback Vertex Set Problem 


Formally, the feedback vertex set problem can be described as follows. Let $G = (V, E)$ be a graph and let $w: V(G) \to \mathbb{R}^+$ be a weight function defined on the vertices of G. A feedback vertex set of G is a subset of vertices $V' \subseteq V(G)$ such that each cycle in G contains at least one vertex in $V'$. In other words, a feedback vertex set $V'$ is a set of vertices of G such that removing $V'$ from G, along with all the edges incident to $V'$, results in a forest. The weight of a feedback vertex set is the sum of the weights of its vertices, and a minimum feedback vertex set of a weighted graph $(G, w)$ is a feedback vertex set of G of minimum weight. The weight of a minimum feedback vertex set will be denoted by $\mu(G, w)$. The minimum weighted feedback vertex set problem (MWFVS) is to find a minimum feedback vertex set of a given weighted graph $(G, w)$. The special case of identical weights is called the unweighted feedback vertex set problem (UFVS).
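For undirected graphs the "removal leaves a forest" form of the definition can be checked directly. The sketch below (hypothetical helper name; vertices numbered 0 to n-1; undirected graphs only, since a digraph would need an acyclicity test such as a topological sort instead) deletes $V'$ and uses a union-find structure to detect whether any remaining edge closes a cycle.

```python
def is_feedback_vertex_set(n, edges, fvs):
    """Undirected graph on vertices 0..n-1: True if deleting `fvs` leaves a forest."""
    removed = set(fvs)
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if u in removed or v in removed:
            continue                        # edge incident to a deleted vertex
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                    # u and v already connected: a cycle
        parent[ru] = rv
    return True


# A triangle 0-1-2 with a pendant edge 2-3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(is_feedback_vertex_set(4, edges, []))    # False: the triangle survives
print(is_feedback_vertex_set(4, edges, [0]))   # True
print(is_feedback_vertex_set(4, edges, [3]))   # False: vertex 3 lies on no cycle
```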


Mathematical Model 
of the Feedback Vertex Set Problem 


As a covering-type problem, the feedback vertex set problem admits an integer zero-one programming formulation. Given a feedback vertex set $V'$ for a graph $(G, w)$, $G = (V, E)$, and a set of weights $w = \{w(v)\}_{v \in V(G)}$, let $x = \{x_v\}_{v \in V(G)}$ be a binary vector such that $x_v = 1$ if $v \in V'$, and $x_v = 0$ otherwise. Let $\mathcal{C}$ be the set of cycles in $(G, w)$. The problem of finding the minimum feedback vertex set of G can be formulated as an integer programming problem as follows:

$$\begin{aligned} \min\ & \sum_{v \in V(G)} w(v)\, x_v \\ \text{s.t.}\ & \sum_{v \in V(\Gamma)} x_v \ge 1, \quad \forall \Gamma \in \mathcal{C}, \\ & 0 \le x_v \le 1 \text{ integer}, \quad v \in V(G). \end{aligned}$$
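For very small graphs this 0-1 program can be solved by brute force. The sketch below is our own illustration (exponential in $|V(G)|$, undirected graphs only, names hypothetical): it enumerates every vector x, i.e. every vertex subset, and replaces the explicit cycle constraints by the equivalent test that the remaining graph is a forest, so the cycle set $\mathcal{C}$ never has to be listed.

```python
from itertools import combinations


def leaves_forest(n, edges, removed):
    """True if deleting `removed` from the undirected graph on 0..n-1 leaves a forest."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if u in removed or v in removed:
            continue
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def min_weight_fvs(n, edges, w):
    """Exhaustive search over all 0-1 vectors x (x_v = 1 iff v is in the subset)."""
    best_set, best_weight = None, float("inf")
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            weight = sum(w[v] for v in subset)
            if weight < best_weight and leaves_forest(n, edges, set(subset)):
                best_set, best_weight = set(subset), weight
    return best_set, best_weight


# Two triangles sharing vertex 2; in the weighted case vertex 2 is expensive.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
print(min_weight_fvs(5, edges, [1, 1, 3, 1, 1]))   # ({0, 3}, 2)
print(min_weight_fvs(5, edges, [1, 1, 1, 1, 1]))   # ({2}, 1)
```

The example shows why the weighted and unweighted problems differ: with unit weights the shared vertex alone is optimal, while raising its weight makes one vertex from each triangle cheaper.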


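If one denotes by $\mathcal{C}_v$ the set of cycles passing through vertex $v \in V(G)$, then the dual of the corresponding linear programming relaxation is a packing problem:

$$\begin{aligned} \max\ & \sum_{\Gamma \in \mathcal{C}} y_\Gamma \\ \text{s.t.}\ & \sum_{\Gamma \in \mathcal{C}_v} y_\Gamma \le w(v), \quad \forall v \in V(G), \\ & y_\Gamma \ge 0, \quad \forall \Gamma \in \mathcal{C}. \end{aligned}$$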


Polynomially Solvable Cases 


Given the NP-completeness of the feedback vertex set 
problem, a recent line of research has focused on iden- 
tifying the largest class of specially structured graphs 
on which such problems remain polynomially solvable. 
A pioneering work is due to A. Shamir [79], who pro- 
posed a linear time algorithm to find a feedback vertex 
set for a reducible flow graph. C. Wang, E. Lloyd, and 
M. Soffa [90] developed an $O(|E(G)| \cdot |V(G)|^2)$ algorithm for finding a feedback vertex set in the class of graphs known as cyclically reducible graphs, which is shown to be unrelated to the class of quasireducible graphs. Although the exact algorithm proposed by G.W. Smith and R.B. Walford [83] has exponential running time in general, it returns an optimal solution in polynomial time for certain types of graphs. A variant of the algorithm, called the Smith-Walford-one algorithm, selects only candidate sets F of size one and runs in $O(|E(G)| \cdot |V(G)|^2)$ time. The class of graphs for which it finds a feedback vertex set is called Smith-Walford one-reducible. In the study of feedback vertex set problems, a set of operations called contraction operations has had significant impact. They contract the graph $G = (V, E)$,
while preserving all the important properties relevant 
to the minimum feedback vertex set. See [56] for a de- 
tailed analysis of these reduction procedures which are 
important for the following two reasons. First, a class of 
graphs of increasing size is computed, where the feed- 
back vertex set of each graph can be found exactly. 
Second, most proposed heuristics and approximation 
algorithms use the reduction schemes in order to re- 
duce the size of the problem. Another line of research 
on polynomially solvable cases focuses on other spe- 
cial classes, including chordal and interval graphs, per- 
mutation graphs, convex bipartite graphs, cocomparabil- 
ity graphs and on meshes and toroidal meshes, butter- 
flies, and toroidal butterflies. The feedback vertex set on 
chordal and interval graphs can be viewed as a special 
instance of the generalized clique cover problem, which 


is solved in polynomial time on chordal graphs [20,93] 
and interval graphs [65]. For permutation graphs, an 
algorithm due to A. Brandstaédt and D. Kratsch [8] 
was improved by Brandstadt [7] to run in O(|V(G)|°) 
time. More recently (1994), Y.D. Liang [58] presented 
an O(|V(G)|-|E(G)|) algorithm for permutation graphs 
that can be easily extended to trapezoid graphs while 
keeping the same time complexity. On interval graphs, 
C.L. Lu and C.Y. Tang [61] developed a linear-time 
algorithm to solve the minimum weighted feedback 
vertex set problem using dynamic programming. S.R. 
Coorg and C.P. Rangan [19] present an O(|V(G)|*) 
time and O(|V(G)|*) space exact algorithm for cocom- 
parability graphs, which are a superclass of permuta- 
tion graphs. More recently, Liang and M.S. Chang [13] 
developed a polynomial time algorithm, that by ex- 
ploring the structural properties of a cocomparability 
graph uses dynamic programming to get a minimum 
feedback vertex set in O(|V(G)*| |E(G)]) time. A re- 
cent (1998) line of research [63] on polynomially solv- 
able cases focuses on special undirected graphs having 
bounded degree and that are widely used as connection 
networks, namely mesh, butterfly and k-dimensional 
cube connected cycle (CCC,). 
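The contraction operations mentioned above can be illustrated on directed graphs with five reduction rules that are widely used in the literature. The exact rule set analysed in [56] may differ, so the rules below are an assumption of this sketch, valid for the unweighted problem:

* A vertex with no in-arcs (IN0) or no out-arcs (OUT0) lies on no cycle and is deleted.
* A vertex with a self-loop (LOOP) belongs to every feedback vertex set.
* A vertex with a single predecessor (IN1) or a single successor (OUT1) can be merged into that neighbour without changing the optimum.

The function name `reduce_digraph` is hypothetical.

```python
def reduce_digraph(arcs):
    """Apply LOOP, IN0/OUT0 and IN1/OUT1 until no rule fires (unweighted case).

    Returns (forced, kernel): `forced` are vertices taken into the FVS by LOOP,
    `kernel` lists the surviving arcs.  A minimum FVS of the input is `forced`
    together with a minimum FVS of the kernel.
    """
    succ, pred = {}, {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
        pred.setdefault(v, set()).add(u)
        pred.setdefault(u, set())
    forced = set()

    def delete(v):
        for w in succ.pop(v):
            pred[w].discard(v)
        for w in pred.pop(v):
            succ[w].discard(v)

    changed = True
    while changed:
        changed = False
        for v in list(succ):
            if v not in succ:
                continue                          # already deleted in this pass
            if v in succ[v]:                      # LOOP: v must be in the FVS
                forced.add(v)
            elif not pred[v] or not succ[v]:      # IN0 / OUT0: v lies on no cycle
                pass
            elif len(pred[v]) == 1:               # IN1: merge v into its only predecessor u
                (u,) = pred[v]
                for w in succ[v]:
                    succ[u].add(w)
                    pred[w].add(u)
            elif len(succ[v]) == 1:               # OUT1: merge v into its only successor u
                (u,) = succ[v]
                for w in pred[v]:
                    succ[w].add(u)
                    pred[u].add(w)
            else:
                continue                          # no rule applies to v
            delete(v)
            changed = True
    kernel = [(u, w) for u in succ for w in succ[u]]
    return forced, kernel
```

A directed triangle reduces completely to one forced vertex. The complete symmetric digraph on three vertices is left untouched, because every vertex has in- and out-degree two.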


Approximation Algorithms and Provable Bounds 
on Undirected Graphs 


A $2\log_2|V(G)|$-approximation algorithm for the unweighted minimum feedback vertex set problem on undirected graphs is contained in a lemma due to P. Erdős and L. Pósa [25]. This result was improved in [66] to obtain a performance ratio of $O(\sqrt{\log|V(G)|})$. R. Bar-Yehuda, D. Geiger, J. Naor, and R.M. Roth [2] gave an approximation algorithm for the unweighted undirected case with ratio at most 4. They also gave two approximation algorithms for the weighted undirected case, with ratios $4\log_2|V(G)|$ and $2\Delta^2(G)$ respectively, where $\Delta(G)$ is the maximum degree of G. To speed up the algorithm, they show how to preprocess the input graph by applying the corresponding undirected versions of the Levy-Lowe reduction transformations. For the feedback vertex set problem in general undirected graphs, two slightly different 2-approximation algorithms are described in [3] and [1]. These algorithms improve the approximation algorithms of [2]. They can also find a loop cutset which, under specific conditions, is guaranteed in the worst case to contain less than four times the number of variables contained in a minimum loop cutset. Subsequently, A. Becker and Geiger [4] applied the same reduction procedure from the loop cutset problem to the minimum weighted feedback vertex set problem of [2]. Their result, however, is independent of any condition and is guaranteed in the worst case to contain less than twice the number of variables contained in a minimum loop cutset. They [4] propose two greedy approximation algorithms for finding the minimum feedback vertex set V' in a vertex-weighted undirected graph (G, w). One of them has performance ratio bounded by the constant 2 and complexity $O(m + n\log n)$, where $m = |E(G)|$ and $n = |V(G)|$.

In [17], F.A. Chudak, M.X. Goemans, D. Hochbaum, and D.P. Williamson showed how the algorithms due to Becker and Geiger [3] and V. Bafna, P. Berman, and T. Fujito [1] can be explained in terms of the primal-dual method for approximation algorithms, which has been used to obtain approximation algorithms for network design problems. The primal-dual method starts with an integer programming formulation of the problem under consideration. It then simultaneously builds a feasible integral solution and a feasible solution to the dual of the linear programming relaxation. If it can be shown that the values of these two solutions are within a factor of α, then an α-approximation algorithm is found. The integrality gap of an integer program is the worst-case ratio between the optimum value of the integer program and the optimum value of its linear relaxation. The integral solution costs at most α times a feasible dual value, and that value is at most the optimum of the relaxation. Therefore, by applying the primal-dual method it is possible to prove that the integrality gap of the integer program under consideration is bounded by α. In fact, Chudak et al., after giving a new integer programming formulation of the feedback vertex set problem, proved that its integrality gap is at most 2. They also proved some key inequalities needed to establish the correctness of their new integer programming formulation.
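The primal-dual (local-ratio) idea can be sketched for a simple undirected graph in the spirit of the Becker-Geiger scheme as it is commonly described:

1. Prune vertices of degree at most one.
2. Lower every remaining weight in proportion to its current degree d(v) until some weights reach zero.
3. Put the zero-weight vertices into the solution, and repeat from step 1 until the graph is empty.
4. Delete redundant vertices in reverse order.

The function names, the exact rational arithmetic and the tie handling are assumptions. This is not a verified reproduction of the published algorithms, and the factor-2 guarantee is not re-proved here.

```python
from fractions import Fraction

def is_forest(vertices, edges, removed):
    """True if the graph minus `removed` has no cycle (union-find)."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]        # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if u in removed or v in removed:
            continue
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                         # this edge closes a cycle
        parent[ru] = rv
    return True

def local_ratio_fvs(weights, edges):
    w = {v: Fraction(x) for v, x in weights.items()}
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive, chosen = set(weights), []

    def remove(v):
        alive.discard(v)
        for u in adj[v]:
            adj[u].discard(v)
        adj[v].clear()

    while True:
        # prune vertices of degree <= 1: they lie on no cycle
        stack = [v for v in alive if len(adj[v]) <= 1]
        while stack:
            v = stack.pop()
            if v in alive:
                nbrs = list(adj[v])
                remove(v)
                stack.extend(u for u in nbrs if len(adj[u]) <= 1)
        if not alive:
            break
        # lower all weights in proportion to degree until some reach zero
        gamma = min(w[v] / len(adj[v]) for v in alive)
        for v in alive:
            w[v] -= gamma * len(adj[v])
        zero = [v for v in alive if w[v] == 0]
        for v in zero:
            chosen.append(v)
            remove(v)

    # reverse delete: drop chosen vertices that are not needed
    fvs = set(chosen)
    for v in reversed(chosen):
        if is_forest(weights, edges, fvs - {v}):
            fvs.discard(v)
    return fvs
```

The reverse-delete pass leaves a minimal feedback vertex set, which is the property the primal-dual analysis relies on.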


Theorem 1 Let V' denote any feedback vertex set of a graph G = (V, E), $E \neq \varnothing$. Let τ denote the cardinality of the smallest feedback vertex set for G, and let $d_G(v)$ denote the degree of v in G. Let E(S) denote the subset of edges that have both endpoints in $S \subseteq V(G)$, and set $b(S) = |E(S)| - |S| + 1$. Then


$$
\sum_{v \in V'} \bigl[d_G(v) - 1\bigr] \ge b(V(G)), \tag{1}
$$


$$
\sum_{v \in V'} d_G(v) \ge b(V(G)) + \tau. \tag{2}
$$


Suppose every vertex in G has degree at least two, and $V'_M$ is any minimal feedback vertex set (i.e. for every $v \in V'_M$, $V'_M \setminus \{v\}$ is not a feedback vertex set). Then


$$
\sum_{v \in V'_M} d_G(v) \le 2\bigl(b(V(G)) + \tau\bigr) - 2. \tag{3}
$$
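The inequalities of Theorem 1 can be probed by brute force on tiny graphs. This sketch assumes simple undirected graphs; the helper names are hypothetical and the enumeration is exponential. It lists every feedback vertex set and computes τ and $b(V(G)) = |E| - |V| + 1$. It then checks (1) and (2) for all feedback vertex sets, and (3) for the minimal ones when every degree is at least two.

```python
from itertools import combinations

def is_forest(vertices, edges, removed):
    """True if the graph minus `removed` has no cycle (union-find)."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        if u in removed or v in removed:
            continue
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def check_theorem1(vertices, edges):
    """Return (b(V(G)), tau, (1) holds, (2) holds, (3) holds) by exhaustive search."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    b = len(edges) - len(vertices) + 1
    fvs = [set(c) for k in range(len(vertices) + 1)
           for c in combinations(vertices, k) if is_forest(vertices, edges, set(c))]
    tau = min(len(F) for F in fvs)
    minimal = [F for F in fvs if all(not is_forest(vertices, edges, F - {v}) for v in F)]
    ok1 = all(sum(deg[v] - 1 for v in F) >= b for F in fvs)
    ok2 = all(sum(deg[v] for v in F) >= b + tau for F in fvs)
    ok3 = (min(deg.values()) < 2 or            # (3) only claimed when all degrees >= 2
           all(sum(deg[v] for v in F) <= 2 * (b + tau) - 2 for F in minimal))
    return b, tau, ok1, ok2, ok3
```

For the complete graph $K_4$ one gets b = 3 and τ = 2, and (3) holds with 6 ≤ 8.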


G. Even, Naor, B. Schieber, and L. Zosin [28] showed that the integrality gap of the integer program given by the standard cycle formulation of the feedback vertex set problem is $\Omega(\log n)$. The new integer programming formulation given in [17] is as follows:


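$$
\begin{aligned}
\min\ & \sum_{v \in V(G)} w(v)\, x_v \\
\text{s.t.}\ & \sum_{v \in S} \bigl(d_S(v) - 1\bigr)\, x_v \ge b(S), \quad S \subseteq V(G) : E(S) \neq \varnothing, \\
& x_v \in \{0, 1\}, \quad v \in V(G),
\end{aligned}
$$

where $d_S(v)$ denotes the degree of v in the subgraph induced by S.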


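The linear programming relaxation is:


$$
\begin{aligned}
\min\ & \sum_{v \in V(G)} w(v)\, x_v \\
\text{s.t.}\ & \sum_{v \in S} \bigl(d_S(v) - 1\bigr)\, x_v \ge b(S), \quad S \subseteq V(G) : E(S) \neq \varnothing, \\
& x_v \ge 0, \quad v \in V(G),
\end{aligned}
$$

and its dual is:

$$
\begin{aligned}
\max\ & \sum_{S} b(S)\, y_S \\
\text{s.t.}\ & \sum_{S : v \in S} \bigl(d_S(v) - 1\bigr)\, y_S \le w(v), \quad v \in V(G), \\
& y_S \ge 0, \quad S \subseteq V(G) : E(S) \neq \varnothing.
\end{aligned}
$$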


For the subset feedback vertex set problem, the authors of [28] showed that it can be approximated in polynomial time by a factor of $\min\{2\Delta(G),\ 8\log(|V'|+1),\ O(\log\tau^*)\}$, where $\tau^*$ denotes the value of the optimal fractional solution. In [28] the authors also proposed a technique called bootstrapping, which improves the $O(\log|V'|)$ factor to $O(\log(\tau^*/B))$, where B denotes the minimum weight of a vertex. The bootstrapping technique iteratively uses a graph partition algorithm. The output of each iteration is itself a subset feedback vertex set and is used as part of the input of the next iteration. After $O(\log|V'|)$ iterations the algorithm outputs a subset feedback vertex set of weight at most $O(\tau^*\log\tau^*)$. Even, Naor and Zosin [26] improved this result by proposing an 8-approximation algorithm. The main tool they used to develop and analyse this algorithm is a new version of multicommodity flow, called relaxed multicommodity flow. It is a hybrid of multicommodity flow and multiterminal flow with additional constraints, called intercommodity constraints. For each arc, the authors consider the maximum, over all commodities, of the flow shipped along it. They require that, for each vertex $v \in V(G)$, the sum of these maximum flows over its incident arcs be bounded by four times the capacity of v. In a relaxed multicommodity flow, the vertices for which the intercommodity constraints are tight play an important role for the connectivity of the graph; they are called intersatured vertices. The main result of [26] is a theorem bounding, by the sum of the demands, the weight of the vertices that must be intersatured in order to satisfy a given demand vector.


Approximation Algorithms and Provable Bounds 
on Directed Graphs 


In general, problems on undirected graphs are relatively easier to handle than problems on directed graphs, since more graph theory can be utilized. Not surprisingly, the approximation results obtained so far for the undirected version are stronger than those for the directed version. In fact, none of the algorithms referred to in the previous subsections apply to the feedback vertex set problem in directed graphs. In contrast with the undirected version, no analytical results are known for the directed case. A very recent direction of research on approximation algorithms for the directed version focuses on the complete equivalence among all feedback set (and/or feedback subset) problems. It also covers their equivalence with the directed minimum capacity multicut problem in circular networks. The procedures that reduce any feedback set problem to any other, or any of them to the directed minimum capacity multicut problem and vice versa, are described exhaustively and formalized in [27]. There they are used to obtain an approximation algorithm for the subset feedback arc set problem of a weighted directed graph G = (V, E). In that problem, the interesting cycles to be hit are those passing through a set of special vertices $X \subseteq V(G)$, where $|X| = k$. The weight of the feedback arc set found by their approximation algorithm is $O(\tau^*\log^2|X|)$, where $\tau^*$ is the weight of an optimal fractional feedback set. Moreover, their approach can be used to solve any other feedback set problem, as well as the directed minimum capacity multicut problem. Even et al. [27] also proposed an algorithm for approximating the minimum weighted subset feedback vertex set problem in the weighted directed case, with a result that holds for any other feedback set problem as well. This approach is an algorithmic adaptation of a theoretical result due to P.D. Seymour [78], who proved that the integrality gap of the unweighted feedback vertex set problem is at most $O(\log\tau^*\log\log\tau^*)$, with $\tau^*$ defined as above. Even et al. observe that all existence arguments in the proof of Seymour's statement can be made constructive. Thus, with some additional operations, an algorithm for the unweighted feedback vertex set problem with approximation factor $O(\log\tau^*\log\log\tau^*)$ can be obtained. Further modifications of the algorithm lead to a polynomial time approximation algorithm applicable to the weighted problem. In $O(|E(G)| \cdot |V(G)|^2)$ time the algorithm finds a feedback vertex set having weight


$$
O\bigl(\min\{\tau^*\log\tau^*\log\log\tau^*,\ \tau^*\log|V(G)|\log\log|V(G)|\}\bigr).
$$


All the observations contained in [27] improve the $O(\log^2|V(G)|)$-approximation algorithm for this case [54]. For directed planar graphs, H. Stamm [86] presented two approximation algorithms. The first runs in $O(|V(G)|\log|V(G)|)$ time and has performance guarantee bounded by the maximum degree of the graph. The second runs in $O(|V(G)|^2)$ time and has performance guarantee no more than the number of cyclic faces in the planar embedding of the graph minus 1. M. Cai, X. Deng, and W. Zang [10] obtained a 2.5-approximation algorithm for the minimum feedback vertex set problem on tournaments. This improves the previously known algorithm by E. Speckenmeyer [85], which has performance guarantee 3. Let H be the triangle-vertex incidence matrix of a tournament T and let e be the all-one vector. In [10], necessary and sufficient conditions are established for the linear system $\{x : Hx \ge e,\ x \ge 0\}$ to be totally dual integral (TDI).
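A tournament that contains a cycle always contains a directed triangle. Hence a vertex set is a feedback vertex set of a tournament exactly when its 0-1 indicator vector x satisfies Hx ≥ e. The sketch below builds the rows of H and finds a smallest such x by exhaustive search. It assumes the tournament is given as a Python set of arcs on vertices 0, ..., n-1, and the function names are hypothetical.

```python
from itertools import combinations

def triangle_incidence(n, arcs):
    """Rows of H: one 0-1 row per directed triangle of the tournament on 0..n-1."""
    rows = []
    for a, b, c in combinations(range(n), 3):
        # {a, b, c} spans a directed 3-cycle in one of its two cyclic orientations
        if {(a, b), (b, c), (c, a)} <= arcs or {(a, c), (c, b), (b, a)} <= arcs:
            rows.append([1 if v in (a, b, c) else 0 for v in range(n)])
    return rows

def min_fvs_tournament(n, arcs):
    """Smallest vertex set whose indicator x satisfies H x >= e."""
    H = triangle_incidence(n, arcs)
    for k in range(n + 1):                        # try smaller sets first
        for S in combinations(range(n), k):
            x = [1 if v in S else 0 for v in range(n)]
            if all(sum(h * xv for h, xv in zip(row, x)) >= 1 for row in H):
                return set(S)
```

A transitive tournament has no directed triangle, so H is empty and the empty set is returned.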


Definition 2 A rational linear system $\{x : Ax \ge b,\ x \ge 0\}$ is called totally dual integral if the optimization problem $\max\{y^T b : y^T A \le c^T,\ y \ge 0\}$ has an integral optimum solution y for every integral vector c for which the maximum is finite.


It has been shown that any rational polyhedron P has a TDI representation $P = \{x : Ax \le b\}$ with A integral. Moreover, if P is full-dimensional, there is a unique minimal TDI system $P = \{x : Ax \le b\}$ with A integral, and b is integral if and only if P is integral. In [11] the authors extended this approach to the feedback vertex set problem and the cycle packing problem in bipartite tournaments. A bipartite tournament is an orientation of a complete bipartite graph. For these problems they found strongly polynomial time algorithms, which are a consequence of a min-max relation on packing and covering directed cycles.


Exact Algorithms 


In contrast to the numerous approximation schemes that have been studied, relatively few exact algorithms for the feedback vertex set problem have been proposed. To our knowledge, the first algorithm to find an exact minimum cardinality FVS is due to Smith and Walford [83], who proposed a particular graph partition technique. Their algorithm solves the problem in an arbitrary directed graph in exponential running time, but it returns an optimal solution in polynomial time for certain types of graphs. Later, exact algorithms of enumerative nature often used the graph reduction procedures to speed up the process. One study, [16], essentially used direct enumeration plus reduction and reported satisfactory computational results for a set of partial scan design test problems. T. Orenstein, Z. Kohavi, and I. Pomeranz [67] proposed a somewhat more involved exact enumerative procedure based on graph reduction and efficient graph partitioning methods. Their algorithm was designed to identify a minimum feedback vertex set in a digital circuit and is efficient on random graphs. On cliques or graphs that are ‘almost’ cliques, however, it exhibits exponential behavior, since the reduction and partition techniques cannot be applied.


Somewhat surprisingly, exact algorithms for the feedback vertex set problem based on mathematical programming formulations are quite few. Recently (1996), M. Funke and G. Reinelt [32] considered a special variant of feedback problems, namely the problem of finding a maximum weight node-induced acyclic subdigraph. They discussed valid and facet-defining inequalities for the associated polytope and developed a polyhedral exact algorithm. They presented computational results obtained by applying a branch and cut algorithm.


The Feedback Arc Set Problem 


Given a graph G = (V, E) and a nonnegative weight function $w : E(G) \to \mathbb{R}_+$ defined on the arcs of G, the feedback arc set problem consists of finding a minimum-weight subset of arcs $E' \subseteq E(G)$ that meets every cycle in a given collection $\mathcal{C}$ of cycles in (G, w). As in the vertex case, this leads to the minimum weighted feedback arc set problem (MWFAS) in both directed and undirected graphs, the minimum weighted graph bipartization problem via arc removals, and so on.


Mathematical Model 
of the Feedback Arc Set Problem 


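Given an arc weighted graph (G, w), G = (V, E), and the set $\mathcal{C}$ of all cycles in G, the minimum weighted feedback arc set problem can be formulated as the following integer programming problem:


$$
\begin{aligned}
\min\ & \sum_{e \in E(G)} w(e)\, x_e \\
\text{s.t.}\ & \sum_{e \in \Gamma} x_e \ge 1, \quad \forall\, \Gamma \in \mathcal{C}, \\
& x_e \in \{0, 1\}, \quad \forall\, e \in E(G).
\end{aligned}
$$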


In its relaxation, the constraints $x_e \in \{0, 1\}$, $\forall e \in E(G)$, are replaced by $x_e \ge 0$, $\forall e \in E(G)$, giving a fractional feedback arc set. As with the feedback vertex set problem, the feedback arc set problem is a covering problem, and its (linear programming) dual is called a packing problem. In the case of the feedback arc set problem, this means assigning a dual variable to every interesting cycle to be hit in the given graph. For each arc, the sum of the variables of the interesting cycles passing through that arc must be at most the weight of the arc itself.
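One simple way to see that arc problems can be rephrased as vertex problems is the following construction. It is standard, but not necessarily the exact procedure of [27]:

1. Subdivide every arc e = (u, v) by a new vertex carrying weight w(e).
2. Give the original vertices weight +∞.

Cycles of the two graphs then correspond one to one. A minimum-weight feedback vertex set of the subdivided graph therefore uses subdivision vertices only, and names a minimum-weight feedback arc set. The exhaustive solver is included only to check the construction on a toy instance; all names are hypothetical.

```python
import math
from itertools import combinations

def is_acyclic(nodes, arcs):
    """Kahn's algorithm: True if the digraph (nodes, arcs) has no directed cycle."""
    indeg = {v: 0 for v in nodes}
    succ = {v: [] for v in nodes}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == len(nodes)

def fas_to_fvs(nodes, arc_weights):
    """Subdivide each arc (u, v) by a vertex ('e', u, v) of weight w(u, v)."""
    weights = {v: math.inf for v in nodes}        # original vertices: never worth deleting
    arcs = []
    for (u, v), w in arc_weights.items():
        mid = ("e", u, v)
        weights[mid] = w
        arcs += [(u, mid), (mid, v)]
    return weights, arcs

def min_weight_fvs(weights, arcs):
    """Exhaustive minimum-weight feedback vertex set (tiny instances only)."""
    nodes = list(weights)
    best, best_w = None, math.inf
    for k in range(len(nodes) + 1):
        for S in combinations(nodes, k):
            S = set(S)
            w = sum(weights[v] for v in S)
            if w >= best_w:
                continue
            rest = [v for v in nodes if v not in S]
            kept = [(a, b) for a, b in arcs if a not in S and b not in S]
            if is_acyclic(rest, kept):
                best, best_w = S, w
    return best, best_w
```

On a weighted triangle, the optimal vertex set of the subdivided graph is the subdivision vertex of the lightest arc.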


State of the Art of Feedback Arc Set Problems


Feedback arc set problems tend to be easier than their 
vertex counterparts, especially for planar graphs. In the 
directed case feedback vertex and feedback arc set prob- 
lems are each reducible to one another. Even, Naor, 
Schieber, and Sudan [27] showed how to perform re- 
ductions among feedback set problems and feedback 
subset problems and vice versa, preserving feasible so- 
lutions and their costs. In all reductions, there is a one- 
to-one correspondence between feasible solutions and 
their corresponding costs. Therefore, an approximate 
solution to one problem can be translated to an ap- 
proximate solution of the other problem reducible to 
this problem. Because most of the reduction procedures 
can be performed in linear time, these problems can be 
viewed as different representations of the same prob- 
lem. Hence, as feedback vertex sets are reduced into 
feedback arc sets with the same weight and vice versa, 
all of these problems are equally hard to approximate. 
In the literature of feedback set problems most of the 
proposed algorithms are designed to solve the prob- 
lem in vertex-weighted graphs. One of the pioneering 
papers on feedback arc set problems is [76], where it 
is proved that finding a minimum feedback arc set in an arc-weighted reducible flow graph is as difficult as finding a minimum cut in a flow network. The proposed algorithm has complexity $O(mn^2\log(n^2/m))$, where $m = |E(G)|$ and $n = |V(G)|$. The algorithm was adapted to solve the problem in the vertex-weighted case. Shamir's linear time algorithm [79], used for the unit-weighted case, cannot be applied to the arc-weighted problem, because the reductions between arc and vertex set problems do not preserve the reducibility property. Given a directed graph G = (V, E), a dijoin $E' \subseteq E(G)$ is a set of arcs such that the graph $G' = (V, B)$, $B = E \cup \{(v, u) : (u, v) \in E'\}$, is strongly connected. Given nonnegative weights $w_e$, $e \in E(G)$, the minimum-weight dijoin problem is to find the dijoin of minimum weight. The feedback arc set problem in planar digraphs is reducible to the problem of finding a minimum-weight dijoin in the dual graph, which is solvable in polynomial time [39]. Stamm [86] proposed a simple 2-approximation algorithm for the minimum weight dijoin problem by superposing two arborescences. It is interesting to observe that, when translated to the dual graph, all these problems become problems of hitting certain cutsets of the dual graph. Such problems can be approximated within a ratio of 2 by the primal-dual method. Goemans and Williamson [37] proposed a primal-dual algorithm that finds a 9/4-approximate solution to feedback set problems in planar graphs. The first approximation algorithm for the feedback arc set problem was given in [54]. Its approximation factor is $O(\log^2 n)$ in the unweighted case, where n is the number of vertices of the input graph. This bound was obtained using an $O(\log n)$ approximation algorithm for a directed separator that splits the graph into two approximately equally-sized components, $S$ and $\bar S$. This separator can be found by approximating special cuts called quotient cuts. This result was improved by Seymour [78], who gave an $O(\log n\log\log n)$-approximation algorithm. It solves the linear relaxation of the feedback arc set mathematical model and then interprets the optimal fractional solution $x^*$ as a length function defined on the arcs. Recursively, it uses this length function to delete from the graph G all arcs between $S$ and $\bar S$. Note that the linear program can be solved in polynomial time using the ellipsoid method or an interior point algorithm. Hence, the quality of the bound in this approach depends on the way the graph is partitioned. Seymour [78] proved the following lemma:


Lemma 3 For a given strongly connected digraph G = (V, E), suppose there exists a feasible solution x to the feedback arc set problem. Let φ be the value of the optimal fractional solution $x^*$, and define $\delta^+(S) = \{(u, v) \in E(G) : u \in S,\ v \in \bar S\}$ and $\delta^-(S) = \{(v, u) \in E(G) : u \in S,\ v \in \bar S\}$. Then there exist a partition $(S, \bar S)$ and some ε, 0 < ε < 1, such that the following conditions hold:


$$
\sum_{e \in E(S)} w(e)\, x(e) \le \varepsilon\varphi, \tag{4}
$$

$$
\sum_{e \in E(\bar S)} w(e)\, x(e) \le (1 - \varepsilon)\varphi, \tag{5}
$$

and either

$$
\sum_{e \in \delta^+(S)} w(e) \le 20\,\varepsilon\varphi\,\log\frac{1}{\varepsilon}\,\log\log\varphi \tag{6}
$$

or

$$
\sum_{e \in \delta^-(S)} w(e) \le 20\,(1 - \varepsilon)\varphi\,\log\frac{1}{1 - \varepsilon}\,\log\log\varphi. \tag{7}
$$


Furthermore, the partition $(S, \bar S)$ can be found in polynomial time.


This lemma admits a constructive proof [27]. The algorithm in this proof finds a feedback arc set of weight $O(\tau^*\log^2|X|)$, where X is a special set of vertices defining the cycles to be hit and $\tau^*$ is the weight of an optimal fractional feedback set. The idea is to reduce the problem to the directed minimum capacity multicut problem in circular networks, and to adapt the undirected sphere growing technique described in [35] to directed circular networks. The graph is then decomposed as follows. An optimal fractional solution to the directed feedback set problem induces a distance metric on the set of arcs E(G) (or on the set of vertices). The approximation algorithm arbitrarily picks a vertex $v \in X$ and solves the shortest path tree problem rooted at v with respect to the metric induced by the fractional solution. The procedure that finds the shortest path tree defines layers with respect to the source v. Each layer is a directed cut that partitions the graph into two parts. The next step of the approximation algorithm is to choose a directed cut and add it to the feedback set constructed so far. The algorithm continues recursively in each part and ends when the graph contains no interesting cycles. The key to the algorithm is the criterion used to select the directed cut that partitions the graph: Even et al. relate the weight of the cut to the cost of the fractional solution. More recently (1996), Even, Naor, Schieber, and Zosin [28] showed that, for any weight function defined on the arcs, the subset feedback arc set problem can be approximated in polynomial time by a factor of two. The approximation algorithm consists of successive computations of minimum cuts. Its approximation factor is estimated by considering the capacities of minimum cuts as flow paths. When new minimum cuts are computed, previous flow paths are updated according to the decomposition of the graph induced by an optimal solution.
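The layering step can be sketched as follows, assuming the fractional solution is given as a nonnegative length $x^*(e)$ on every arc:

1. Dijkstra's algorithm from the chosen root v yields the distances dist(v, u).
2. Each distinct distance r defines a ball $S_r = \{u : \mathrm{dist}(v, u) \le r\}$.
3. The arcs leaving $S_r$ form a directed cut.

Deleting that cut destroys every cycle that meets both $S_r$ and its complement, since such a cycle must leave $S_r$ along one of those arcs. The names are hypothetical. The rule Even et al. use to choose one cut from this list, which relates its weight to the fractional cost inside the ball, is not reproduced here, nor is the recursion.

```python
import heapq

def dijkstra(succ, lengths, root):
    """Shortest-path distances from root under nonnegative arc lengths."""
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                              # stale heap entry
        for v in succ.get(u, ()):
            nd = d + lengths[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def layer_cuts(arcs, lengths, weights, root):
    """For each distinct distance r: (r, ball S_r, weight of the arcs leaving S_r)."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, []).append(v)
    dist = dijkstra(succ, lengths, root)
    cuts = []
    for r in sorted(set(dist.values())):
        ball = {u for u, d in dist.items() if d <= r}
        out = [(u, v) for u, v in arcs if u in ball and v not in ball]
        cuts.append((r, ball, sum(weights[e] for e in out)))
    return cuts
```

Take a unit-weight triangle with the fractional solution $x^*(e) = 1/3$ on every arc. The layers are {0}, {0, 1} and {0, 1, 2}, and the first two cuts each have weight 1.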


A GRASP for Feedback Set Problems 


Although approximation algorithms guarantee a solution of a certain quality, in many practical real-world cases heuristic methods can find better solutions in a reasonable amount of CPU time. Metaheuristics such as genetic algorithms, simulated annealing, greedy randomized adaptive search procedures (GRASP), Lagrangian relaxation, and others have performed successfully on a wide range of combinatorial optimization problems. Interestingly, however, feedback vertex set problems seem to be an exception. For this family of problems relatively few practical heuristics have been developed. Furthermore, most of the heuristics that seem to be quite successful computationally are greedy type heuristics or generalized greedy type heuristics (e.g. GRASP). Almost all the efficient heuristics developed so far employ the solution-preserving reduction rules studied in [56]. In practice, these rules greatly reduce the size of the graph. They do so not only at the beginning of the algorithm, but also dynamically during the execution of node deletion type heuristics. A recent line of research on heuristic approaches is due to P.M. Pardalos, T. Qian, and M.G.C. Resende [70]. They propose three variants of the greedy randomized adaptive search procedure (GRASP) metaheuristic for finding approximate solutions of large instances of the feedback vertex set problem in a digraph. GRASP is a multistart method characterized by two phases: a construction phase and a local search phase, also known as a local improvement phase. During the construction phase a feasible solution is iteratively constructed. One element at a time is randomly chosen from a restricted candidate list (RCL), whose elements are sorted according to some greedy criterion. The chosen element is added to the feedback vertex set under construction and removed from the graph together with all its incident arcs. In general, the computed solution may not be locally optimal with respect to the adopted neighborhood definition, so the local search phase tries to improve it. These two phases are iterated and the best solution found is kept as an approximation of the optimal solution. To improve the efficiency of the method, Pardalos et al. incorporated the directed versions of the solution-preserving graph reduction techniques into each iteration of their algorithm. These techniques can also be used to check whether a digraph is acyclic, since they return an empty reduced graph in that case. The authors employed the following three greedy functions to select the node with the maximum G(i) value:

• $G_a(i) = \mathrm{in}(i) + \mathrm{out}(i)$;

• $G_b(i) = \mathrm{in}(i) \times \mathrm{out}(i)$;

• $G_c(i) = \max\{\mathrm{in}(i), \mathrm{out}(i)\}$.
Here in(i) and out(i) are the in- and out-degree of node i in the current graph. Greedy function $G_a$ assigns equal weight to in- and out-degrees. $G_b$ favors balance between in- and out-degrees. $G_c$ only considers the larger of the two degrees. As demonstrated in [70], $G_b$ produced the best computational results. GRASP was tested on two randomly generated problem sets. It found the optimal solutions to all problems in the first set, for which the optimal values are known (computed in [32]). Furthermore, this GRASP dominates the pure greedy heuristics on all test instances, with comparable running time. In [31], Fortran subroutines are given for finding approximate solutions of both the directed feedback vertex set problem and the directed feedback arc set problem using GRASP. The subroutines for the feedback vertex set problem correspond to the pseudocode algorithm proposed in [70]. The subroutines for the feedback arc set problem use a linear-time procedure proposed in [27]. It reduces the given feedback arc set instance to an equivalent feedback vertex set instance, which is then solved.
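The following is a compact sketch of the GRASP loop for the unweighted directed problem. It assumes `succ` maps every vertex, including sinks, to a list of its successors. The construction uses the greedy function $G_b(i) = \mathrm{in}(i) \cdot \mathrm{out}(i)$. Its RCL contains every vertex whose greedy value lies within a fraction α of the range below the maximum. The local search simply drops vertices that are not needed. The graph-reduction step that Pardalos et al. interleave with the construction is omitted. The parameter values, the RCL rule and the local search are assumptions of this sketch, not the authors' exact procedure or the Fortran subroutines of [31].

```python
import random

def is_acyclic(succ, removed):
    """Kahn's algorithm on the digraph `succ` with the vertices in `removed` deleted."""
    nodes = [v for v in succ if v not in removed]
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for v in succ[u]:
            if v not in removed:
                indeg[v] += 1
    stack = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in succ[u]:
            if v not in removed:
                indeg[v] -= 1
                if indeg[v] == 0:
                    stack.append(v)
    return seen == len(nodes)

def grasp_fvs(succ, iterations=20, alpha=0.2, seed=0):
    rng = random.Random(seed)
    best = set(succ)
    for _ in range(iterations):
        # construction phase: add RCL vertices until the remaining graph is acyclic
        sol = set()
        while not is_acyclic(succ, sol):
            alive = [v for v in succ if v not in sol]
            indeg = dict.fromkeys(alive, 0)
            outdeg = dict.fromkeys(alive, 0)
            for u in alive:
                for v in succ[u]:
                    if v not in sol:
                        outdeg[u] += 1
                        indeg[v] += 1
            g = {v: indeg[v] * outdeg[v] for v in alive}        # greedy function G_b
            gmax, gmin = max(g.values()), min(g.values())
            rcl = [v for v in alive if g[v] >= gmax - alpha * (gmax - gmin)]
            sol.add(rng.choice(rcl))
        # local search phase: drop vertices whose removal keeps the graph acyclic
        for v in sorted(sol):
            if is_acyclic(succ, sol - {v}):
                sol.discard(v)
        if len(sol) < len(best):
            best = sol
    return best
```

On two directed triangles sharing vertex 0, $G_b$ ranks vertex 0 far above the others, so every construction picks it first.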


Future Research 


As has been pointed out in [38], fast construction heuristics combined with local improvement techniques tailored to special applications have been the ‘workhorse’ of combinatorial optimization in practice. The design of efficient construction heuristics and local search procedures will be key to effective computational procedures for feedback set problems. New approaches that should lead to higher quality solutions are therefore being considered. Among them are new variants of the classical GRASP approach, called reactive GRASP techniques. The first idea along this line is due to M. Prais and C.C. Ribeiro [74]. They applied reactive GRASP to a matrix decomposition problem arising in traffic scheduling in satellite-switched time-division multiple access (SS/TDMA) systems. In reactive GRASP, the restricted candidate list parameter α is not fixed, but self-adjusted according to the quality of the solutions previously found during the search. In more detail, the parameter α is randomly chosen from a set of m predetermined acceptable values $A = \{\alpha_1, \ldots, \alpha_m\}$. Associated with the choice of $\alpha_i$ there is a probability $p_i$, initially corresponding to a uniform distribution. During the search phase, some information is collected in order to change the discrete set of probabilities $\{p_i\}_{i=1,\ldots,m}$. Several possible strategies can be explored for this update operation. One of them, proposed by Prais and Ribeiro, is an absolute qualification rule based on the average value of the solutions obtained with each value $\alpha = \alpha_i$. Once the updating criterion for the probabilities $\{p_i\}_{i=1,\ldots,m}$ has been chosen, different values of α can be used at different iterations. Therefore, different restricted candidate lists can be built and eventually different solutions can be constructed, which would never be built with a single, fixed value of α.
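The update step can be sketched for a minimization problem using the form of the absolute qualification rule usually quoted from Prais and Ribeiro, stated here as an assumption. Let $z^*$ be the best value found so far and $A_i$ the average value of the solutions built with $\alpha_i$. The rule sets $q_i = (z^*/A_i)^{\delta}$ and $p_i = q_i / \sum_j q_j$, where δ ≥ 1 is an amplification parameter. The function names and the default δ are hypothetical.

```python
import random

def update_probabilities(best_value, averages, delta=10.0):
    """Assumed absolute qualification rule (minimization, positive averages):
    q_i = (z* / A_i) ** delta and p_i = q_i / sum_j q_j."""
    q = [(best_value / a) ** delta for a in averages]
    total = sum(q)
    return [qi / total for qi in q]

def pick_alpha(alphas, probs, rng):
    """Draw the RCL parameter alpha_i with probability p_i."""
    return rng.choices(alphas, weights=probs, k=1)[0]
```

With $z^* = 10$, averages (10, 20) and δ = 1, the probabilities become (2/3, 1/3). Larger δ concentrates the search more sharply on the values of α that have produced better averages.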

T.A. Feo and Resende discussed in [29] the effect the parameter α can have on the quality of the solution. At least judging from the results obtained by Prais and Ribeiro, α seems to have a marked impact on the outcome of a GRASP procedure.


Conclusions 


Despite the large body of work on approximation algorithms, computational studies of feedback set problems seem to be still in their embryonic stage. No modern metaheuristic other than the GRASP procedure recently (1996) developed in [70] has ever been applied to the feedback vertex set problem. The size of the general problem that can be handled is still quite limited. This area of computational research seems to have the greatest potential for progress and impact in the coming years. It should also be emphasized that detecting cycles is a relatively expensive operation. For this reason, local search for the feedback vertex set problem appears to be even more difficult than for other notorious combinatorial problems such as the traveling salesperson or set covering problems. Therefore, the design of efficient local search procedures and fast construction heuristics will be key to effective computational procedures for feedback set problems.


See also 


> Generalized Assignment Problem 

> Graph Coloring 

> Graph Planarization 

> Greedy Randomized Adaptive Search Procedures 
> Quadratic Assignment Problem 

> Quadratic Semi-assignment Problem 
Let S be a nonempty closed and convex set in a real Hilbert space $\mathcal{H}$ with norm $\|\cdot\|$. A sequence $(x_n)_{n \ge 0}$ of points in $\mathcal{H}$ is said to be Fejér monotone with respect to S (or simply S-Fejérian) if


$$
(\forall x \in S)(\forall n \in \mathbb{N}) \quad \|x_{n+1} - x\| \le \|x_n - x\|. \tag{1}
$$


In words, each point in the sequence is no farther from any point in S than its predecessor. Given $x_0 \in \mathcal{H}$, a typical example of an S-Fejérian sequence is the one generated by the algorithm


$$
(\forall n \in \mathbb{N}) \quad x_{n+1} = T x_n,
$$

where $T : \mathcal{H} \to \mathcal{H}$ is a nonexpansive operator, i.e.,

$$
(\forall (x, y) \in \mathcal{H}^2) \quad \|Tx - Ty\| \le \|x - y\|, \tag{2}
$$

with nonempty fixed point set S. Indeed, for every $x \in S$, $\|x_{n+1} - x\| = \|Tx_n - Tx\| \le \|x_n - x\|$. Under suitable assumptions, the sequence of successive approximations $(x_n)_{n \ge 0}$ converges to a point in S [20].
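Here is a minimal numerical illustration in $\mathcal{H} = \mathbb{R}^2$. It takes $T = P_D \circ P_\Pi$, the composition of the projection $P_\Pi$ onto the half-plane $\Pi = \{x : x_1 \ge 1/2\}$ with the projection $P_D$ onto the closed unit disk D. Both projections are nonexpansive, hence so is T, and every point of $D \cap \Pi$ is fixed by T. The orbit of any $x_0$ is therefore Fejér monotone with respect to such points, and the sketch checks this numerically. The choice of sets and the helper names are assumptions made for illustration.

```python
import math

def proj_halfplane(p, a=0.5):
    """Projection onto the half-plane {x : x[0] >= a}."""
    return (max(p[0], a), p[1])

def proj_disk(p, r=1.0):
    """Projection onto the closed disk of radius r centred at the origin."""
    n = math.hypot(p[0], p[1])
    return p if n <= r else (p[0] * r / n, p[1] * r / n)

def T(p):
    """Composition of two projections, hence nonexpansive."""
    return proj_disk(proj_halfplane(p))

def orbit(x0, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(T(xs[-1]))
    return xs

def is_fejer(xs, z, tol=1e-12):
    """Check ||x_{n+1} - z|| <= ||x_n - z|| along the orbit, up to rounding."""
    d = [math.dist(x, z) for x in xs]
    return all(d[k + 1] <= d[k] + tol for k in range(len(d) - 1))
```

Starting from (-3, 4), the orbit approaches the corner point (1/2, √3/2) of $D \cap \Pi$. Its distance to, for example, (0.6, 0) never increases.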

In convex optimization, one frequently encounters 
algorithms whose orbits $(x_n)_{n \ge 0}$ are Fejér monotone
with respect to the solution set. In order to simplify and 
standardize the convergence proofs of this broad class 
of algorithms, it is important to investigate the notion 
of Fejér monotonicity and to bring out some general 
convergence principles. These are precisely the objec- 
tives of the present article. 


Notation and Assumptions 


Throughout, the sequence $(x_n)_{n\ge0}$ is Fejér monotone with respect to a nonempty closed and convex set $S$ in a real Hilbert space $\mathcal{H}$ with scalar product $\langle\cdot\mid\cdot\rangle$, norm $\|\cdot\|$, and distance $d$. For every $n\in\mathbb{N}$, $p_n$ denotes the projection of $x_n$ onto $S$, i.e., the unique point $p_n\in S$ such that $\|x_n-p_n\|=d(x_n,S)$. Recall that $p_n$ is characterized by the variational inequality

$$\forall x\in S:\quad\langle x-p_n\mid x_n-p_n\rangle\le0. \qquad (3)$$

The expressions $x_n\rightharpoonup x$ and $x_n\to x$ denote respectively the weak and strong convergence of $(x_n)_{n\ge0}$ to $x$. $\mathfrak{W}$ and $\mathfrak{S}$ denote respectively the sets of weak and strong cluster points of $(x_n)_{n\ge0}$. Finally, $\mathrm{Id}$ denotes the identity operator on $\mathcal{H}$.


Basic Convergence Properties 


By way of preamble, some immediate consequences of 
(1) are stated below. 


Proposition 1 The following assertions hold. 

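i) $(x_n)_{n\ge0}$ is bounded.

ii) $\forall x\in S$: $(\|x_n-x\|)_{n\ge0}$ converges.

iii) $(d(x_n,S))_{n\ge0}$ is nonincreasing.

iv) For every $x\in S$: $x_n\to x$ if and only if $\varliminf\|x_n-x\|=0$. Hence $(x_n)_{n\ge0}$ converges strongly to a point in $S$ if and only if $S\cap\mathfrak{S}\neq\varnothing$.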


Weak Convergence 


In general, Fejér monotone sequences do not converge, even weakly. For instance, consider the $\{0\}$-Fejérian sequence $((-1)^nx_0)_{n\ge0}$ with $x_0\neq0$. By virtue of Proposition 1i), $\mathfrak{W}\neq\varnothing$, and a necessary condition for $(x_n)_{n\ge0}$ to converge weakly to a point in $S$ is $\mathfrak{W}\subset S$. A remarkable consequence of Fejér monotonicity is that this condition is also sufficient. To see this, take $y_1$ and $y_2$ in $\mathfrak{W}$, say $x_{k_n}\rightharpoonup y_1$ and $x_{l_n}\rightharpoonup y_2$, and take $x\in S$. By Proposition 1ii),

$$\lim\|x_{k_n}-x\|^2=\lim\|x_{l_n}-x\|^2.$$

Therefore, by expanding,

$$\lim\|x_{k_n}\|^2-\lim\|x_{l_n}\|^2=2\langle x\mid y_1-y_2\rangle.$$

It follows that

$$S\subset\{x\in\mathcal{H}:\ \langle x\mid y_1-y_2\rangle=\alpha\}, \qquad (4)$$

where $\alpha=(\lim\|x_{k_n}\|^2-\lim\|x_{l_n}\|^2)/2$. Thus, $(y_1,y_2)\in S^2\Rightarrow\alpha=\langle y_1\mid y_1-y_2\rangle=\langle y_2\mid y_1-y_2\rangle\Rightarrow y_1=y_2$. Consequently, the bounded sequence $(x_n)_{n\ge0}$ cannot have more than one weak cluster point in $S$. If $\mathfrak{W}\subset S$, it therefore has exactly one weak cluster point and converges weakly to it. This fundamental property will be recorded as:


Proposition 2 $(x_n)_{n\ge0}$ converges weakly to a point in $S$ if and only if $\mathfrak{W}\subset S$.


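Two additional properties are worth mentioning in connection with weak convergence.

• Let $\overline{\mathrm{aff}}\,S$ be the closed affine hull of $S$. If $y_1\neq y_2$, then (4) asserts that $S$ is contained in a closed affine hyperplane. If $\overline{\mathrm{aff}}\,S=\mathcal{H}$, then $\mathfrak{W}$ reduces to a singleton and $(x_n)_{n\ge0}$ therefore converges weakly.

• Suppose that $x_n\rightharpoonup\bar{x}\in S$ and let $x\in\mathcal{H}$. Consider the identities

$$\forall n\in\mathbb{N}:\quad\|x_n-x\|^2=\|x_n-\bar{x}\|^2+2\langle x_n-\bar{x}\mid\bar{x}-x\rangle+\|\bar{x}-x\|^2.$$

Together with the fact that $(\|x_n-\bar{x}\|)_{n\ge0}$ converges (Proposition 1ii)), they imply

$$\lim\|x_n-x\|^2=\lim\|x_n-\bar{x}\|^2+\|\bar{x}-x\|^2.$$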


Strong Convergence 


As evidenced by the classical counterexample of [13], $x_n\rightharpoonup x\in S\not\Rightarrow x_n\to x$. Accordingly, strong convergence conditions for Fejér monotone sequences must be identified.
First, consider the projected sequence $(p_n)_{n\ge0}$. It follows from (1) and (3) that, for every $(m,n)\in\mathbb{N}^2$,

$$\begin{aligned}
\|p_n-p_{n+m}\|^2
&=\|p_n-x_{n+m}\|^2+2\langle p_n-x_{n+m}\mid x_{n+m}-p_{n+m}\rangle+\|x_{n+m}-p_{n+m}\|^2\\
&\le d(x_n,S)^2-d(x_{n+m},S)^2+2\langle p_n-p_{n+m}\mid x_{n+m}-p_{n+m}\rangle\\
&\le d(x_n,S)^2-d(x_{n+m},S)^2.
\end{aligned}$$

Consequently, since $(d(x_n,S))_{n\ge0}$ converges by Proposition 1iii), $(p_n)_{n\ge0}$ is a Cauchy sequence. This establishes:


Proposition 3 $(p_n)_{n\ge0}$ converges strongly.


This result, which is of interest in its own right, also leads to a simple criterion for the strong convergence of $(x_n)_{n\ge0}$ to a point in $S$. Indeed, suppose that $\varliminf d(x_n,S)=0$. Then, thanks to Proposition 1iii), $d(x_n,S)\to0$, i.e., $x_n-p_n\to0$. On the other hand, by Proposition 3, $p_n\to x$ with $x\in S$ since $S$ is closed. Hence $x_n\to x$. One thus obtains:


Proposition 4 $(x_n)_{n\ge0}$ converges strongly to a point in $S$ if and only if $\varliminf d(x_n,S)=0$.


Going back to (4), assume now that $(y_1,y_2)\in\mathfrak{S}^2$. Then $\alpha=(\|y_1\|^2-\|y_2\|^2)/2$ and (4) therefore becomes

$$S\subset\Big\{x\in\mathcal{H}:\ \Big\langle x-\frac{y_1+y_2}{2}\ \Big|\ y_1-y_2\Big\rangle=0\Big\}=\{x\in\mathcal{H}:\ \|x-y_1\|=\|x-y_2\|\}. \qquad (5)$$

In words, if $(x_n)_{n\ge0}$ possesses two distinct strong cluster points $y_1$ and $y_2$, then $S$ is contained in the closed affine hyperplane whose elements are equidistant from $y_1$ and $y_2$. If $\overline{\mathrm{aff}}\,S=\mathcal{H}$, it results from (5) that $(x_n)_{n\ge0}$ possesses at most one strong cluster point. This happens in particular when the interior of $S$ is nonempty (Slater condition). In this case, however, a sharper result holds, namely $(x_n)_{n\ge0}$ converges strongly [22].


Linear Convergence 


Proposition 1iii) asserts that $(d(x_n,S))_{n\ge0}$ is nonincreasing. Assume now that it decreases at a linear rate, say

$$\exists\kappa\in\,]0,1[\ \ \forall n\in\mathbb{N}:\quad d(x_{n+1},S)\le\kappa\,d(x_n,S). \qquad (6)$$

Then, in view of Proposition 4, $x_n\to\bar{x}\in S$. On the other hand, for every $(m,n)\in\mathbb{N}^2$, (1) yields

$$\|x_n-x_{n+m}\|\le\|x_n-p_n\|+\|x_{n+m}-p_n\|\le2d(x_n,S).$$

Letting $m\to+\infty$ gives $\|x_n-\bar{x}\|\le2d(x_n,S)$, and one reaches the following conclusion.


Proposition 5 Suppose that (6) holds. Then $(x_n)_{n\ge0}$ converges linearly to a point $\bar{x}\in S$: $\forall n\in\mathbb{N}$: $\|x_n-\bar{x}\|\le2\kappa^nd(x_0,S)$.


Geometric Construction 


In order to make the above theoretical convergence results more readily applicable in concrete problems, it will henceforth be assumed that $(x_n)_{n\ge0}$ has been generated by the following algorithm.


Take $x_0\in\mathcal{H}$ and set $n=0$.

1 | Generate a closed affine half-space $H_n$ such that $S\subset H_n$.

2 | Compute the projection $P_nx_n$ of $x_n$ onto $H_n$ and take $\lambda_n\in[0,2]$.

3 | Set $x_{n+1}=x_n+\lambda_n(P_nx_n-x_n)$.

4 | Set $n=n+1$ and go to step 1.


Fejér Monotonicity in Convex Optimization, Algorithm 1
General Fejérian scheme


The relaxation parameter $\lambda_n$ determines the position of the update $x_{n+1}$ on the closed segment between the current iterate $x_n$ and its reflection $r_n=2P_nx_n-x_n$ with respect to $H_n$ (see Fig. 1). In some problems, it is possible to significantly accelerate the progression of the iterates towards a solution by proper choice of the relaxation sequence $(\lambda_n)_{n\ge0}$ [5].
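One pass of Algorithm 1 needs only the projection onto a half-space. The minimal sketch below assumes $\mathcal{H}=\mathbb{R}^d$ and a half-space written as $\{y:\ \langle a\mid y\rangle\le\beta\}$ with $a\neq0$. The function names `project_halfspace` and `relaxed_step` are hypothetical. With $\lambda_n=1$ the step returns the projection $P_nx_n$, and with $\lambda_n=2$ it returns the reflection $r_n$.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project_halfspace(x, a, beta):
    """Projection of x onto {y : <a|y> <= beta}, assuming a != 0."""
    excess = dot(a, x) - beta
    if excess <= 0.0:
        return list(x)                  # x already lies in the half-space
    t = excess / dot(a, a)
    return [xi - t * ai for xi, ai in zip(x, a)]

def relaxed_step(x, a, beta, lam):
    """Step 3 of Algorithm 1: x + lam * (P x - x), with lam in [0, 2]."""
    if not 0.0 <= lam <= 2.0:
        raise ValueError("relaxation parameter must lie in [0, 2]")
    p = project_halfspace(x, a, beta)
    return [xi + lam * (pi - xi) for xi, pi in zip(x, p)]

# Half-space H_n = {y : y_0 <= 1} and current iterate x_n = (2, 0).
print(relaxed_step([2.0, 0.0], [1.0, 0.0], 1.0, 1.0))  # [1.0, 0.0]  (projection)
print(relaxed_step([2.0, 0.0], [1.0, 0.0], 1.0, 2.0))  # [0.0, 0.0]  (reflection)
```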

Hereafter, two properties of the relaxation sequence will be considered, namely

$$\sum_{n\ge0}\lambda_n(2-\lambda_n)=+\infty \qquad (7)$$

and

$$(\lambda_n)_{n\ge0}\ \text{lies in}\ [\varepsilon,2-\varepsilon],\ \text{where}\ \varepsilon\in\,]0,1[. \qquad (8)$$
Fejér Monotonicity in Convex Optimization, Figure 1
A Fejérian iteration


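Now fix $x\in S$. Then, for every $n\in\mathbb{N}$,

$$\begin{aligned}
\|x_{n+1}-x\|^2
&=\|x_n-x\|^2+\lambda_n^2\|P_nx_n-x_n\|^2+2\lambda_n\langle x_n-x\mid P_nx_n-x_n\rangle\\
&\le\|x_n-x\|^2-\lambda_n(2-\lambda_n)\,d(x_n,H_n)^2,
\end{aligned}\qquad (9)$$

where the inequality uses $\langle x-P_nx_n\mid x_n-P_nx_n\rangle\le0$, which holds because $x\in S\subset H_n$. Consequently, $(x_n)_{n\ge0}$ is $S$-Fejérian and

$$\sum_{n\ge0}\lambda_n(2-\lambda_n)\,d(x_n,H_n)^2<+\infty. \qquad (10)$$

Furthermore, if $(\lambda_n)_{n\ge0}$ lies in $[0,2-\varepsilon]$ for some $\varepsilon\in\,]0,1[$, then the series $\sum_{n\ge0}\|x_{n+1}-x_n\|^2$ and $\sum_{n\ge0}\langle x-x_n\mid x_{n+1}-x_n\rangle$ converge [6,15].

In view of (10), the next two convergence results are immediate consequences of Propositions 2 and 4, respectively.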


Proposition 6 $(x_n)_{n\ge0}$ converges weakly to a point in $S$ if one of the conditions below is fulfilled.

i) $(10)\Rightarrow\mathfrak{W}\subset S$.

ii) (7) is in force and $\varliminf d(x_n,H_n)=0\Rightarrow\mathfrak{W}\subset S$.

iii) (8) is in force and $\sum_{n\ge0}d(x_n,H_n)^2<+\infty\Rightarrow\mathfrak{W}\subset S$.


Proposition 7 $(x_n)_{n\ge0}$ converges strongly to a point in $S$ if one of the conditions below is fulfilled.

i) $(10)\Rightarrow\varliminf d(x_n,S)=0$.

ii) (7) is in force and $\varliminf d(x_n,H_n)=0\Rightarrow\varliminf d(x_n,S)=0$.

iii) (8) is in force and $\sum_{n\ge0}d(x_n,H_n)^2<+\infty\Rightarrow\varliminf d(x_n,S)=0$.


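To investigate linear convergence, assume that

$$\exists\eta\in\,]0,1[\ \ \forall n\in\mathbb{N}:\quad d(x_n,H_n)\ge\eta\,d(x_n,S) \qquad (11)$$

and that (8) holds, so that $\lambda_n(2-\lambda_n)\ge\varepsilon(2-\varepsilon)\ge\varepsilon^2$. Then taking $x=p_n$ in (9) supplies

$$\begin{aligned}
d(x_{n+1},S)^2&\le\|x_{n+1}-p_n\|^2\\
&\le d(x_n,S)^2-\varepsilon^2d(x_n,H_n)^2\\
&\le(1-\varepsilon^2\eta^2)\,d(x_n,S)^2.
\end{aligned}$$

Whence, Proposition 5 yields:


Proposition 8 Suppose that (8) and (11) hold. Then $(x_n)_{n\ge0}$ converges linearly to a point $\bar{x}\in S$: $\forall n\in\mathbb{N}$: $\|x_n-\bar{x}\|\le2\kappa^nd(x_0,S)$ with $\kappa=(1-\varepsilon^2\eta^2)^{1/2}$.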


Applications 


Several convex optimization methods are now pre- 
sented. They are shown to be Fejér monotone and their 
convergence is established on the basis of the general 
results stated above. For brevity, only weak convergence 
is considered; however, strong and linear convergence 
results can be derived in a like manner under suitable 
assumptions. In each problem, the solution set S is as- 
sumed to be nonempty. 


Fixed Points of Nonlinear Operators 


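For every $n\in\mathbb{N}$, let $T_n\colon\mathcal{H}\to\mathcal{H}$ be a firmly nonexpansive operator, i.e.,

$$\forall(x,y)\in\mathcal{H}^2:\quad\langle T_nx-T_ny\mid x-y\rangle\ge\|T_nx-T_ny\|^2, \qquad (12)$$

and let $\operatorname{Fix}T_n=\{x\in\mathcal{H}:\ T_nx=x\}$ be its fixed point set. The problem under consideration is to find a common fixed point of the family $(T_n)_{n\ge0}$, i.e.,

$$\text{Find}\ \bar{x}\in\mathcal{H}\ \ \text{s.t.}\ \ \forall n\in\mathbb{N}:\ T_n\bar{x}=\bar{x}. \qquad (13)$$

Let $S=\bigcap_{n\ge0}\operatorname{Fix}T_n$ and

$$H_n=\{x\in\mathcal{H}:\ \langle x-T_nx_n\mid x_n-T_nx_n\rangle\le0\}.$$

It then follows from (12) that $S\subset\operatorname{Fix}T_n\subset H_n$. Thus, Algorithm 1 takes the following form.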
Take $x_0\in\mathcal{H}$ and set $n=0$.

1 | Take $\lambda_n\in[0,2]$.

2 | Set $x_{n+1}=x_n+\lambda_n(T_nx_n-x_n)$.

3 | Set $n=n+1$ and go to step 1.


Fejér Monotonicity in Convex Optimization, Algorithm 2
Common fixed point


Noting that $d(x_n,H_n)=\|(\mathrm{Id}-T_n)x_n\|$, several convergence results can be derived by direct application of Propositions 6–8. In particular, in the case of a single nonexpansive operator $T$ (see (2)), the algorithm below is pertinent.


Take $x_0\in\mathcal{H}$ and set $n=0$.

1 | Take $\lambda_n\in[0,1]$.

2 | Set $x_{n+1}=x_n+\lambda_n(Tx_n-x_n)$.

3 | Set $n=n+1$ and go to step 1.


Fejér Monotonicity in Convex Optimization, Algorithm 3
Fixed point


Proposition 9 If $\sum_{n\ge0}\lambda_n(1-\lambda_n)=+\infty$, any sequence generated by Algorithm 3 converges weakly to a fixed point of $T$.

Indeed, the assignments $T_n\leftarrow(\mathrm{Id}+T)/2$ and $\lambda_n\leftarrow2\lambda_n$ in Algorithm 2 yield Algorithm 3, since $T_n$ is firmly nonexpansive [3,5] and $\operatorname{Fix}T_n=\operatorname{Fix}T$. Next, observe that $(d(x_n,H_n))_{n\ge0}=(\|(\mathrm{Id}-T)x_n\|/2)_{n\ge0}$ is nonincreasing by (2). Hence, $\varliminf d(x_n,H_n)=0\Rightarrow(\mathrm{Id}-T)x_n\to0$. It then results from the demiclosedness of $\mathrm{Id}-T$ [20] that $x_{k_n}\rightharpoonup x\Rightarrow(\mathrm{Id}-T)x=0$. Thus, Proposition 9 follows from Proposition 6ii).
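Algorithm 3 is commonly known as the Krasnosel'skiĭ–Mann iteration. The sketch below assumes $\mathcal{H}=\mathbb{R}^2$; the function names are hypothetical. It uses a rotation by $\pi/2$, which is nonexpansive with $\operatorname{Fix}T=\{0\}$ but is not firmly nonexpansive. With $\lambda_n\equiv1$ the condition of Proposition 9 fails and the orbit cycles. With $\lambda_n\equiv1/2$ we have $\sum\lambda_n(1-\lambda_n)=+\infty$, and the iterates approach the fixed point.

```python
import math

def rot90(x):
    """Rotation by pi/2: nonexpansive (an isometry) with Fix = {(0, 0)}."""
    return (-x[1], x[0])

def km_iterate(T, x0, lambdas):
    """Algorithm 3: x_{n+1} = x_n + lam_n (T x_n - x_n), lam_n in [0, 1]."""
    x = x0
    for lam in lambdas:
        tx = T(x)
        x = (x[0] + lam * (tx[0] - x[0]), x[1] + lam * (tx[1] - x[1]))
    return x

x0 = (1.0, 0.0)
# lam_n = 1 (plain Picard iteration): after four steps the orbit is back at x0.
print(km_iterate(rot90, x0, [1.0] * 4))                          # (1.0, 0.0)
# lam_n = 1/2: sum lam_n (1 - lam_n) diverges and the iterates tend to (0, 0).
print(math.hypot(*km_iterate(rot90, x0, [0.5] * 200)) < 1e-12)  # True
```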


Zeros of Monotone Maps 


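In connection with set-valued maps $A,B\colon\mathcal{H}\rightrightarrows\mathcal{H}$, a few definitions and facts need to be recalled [2,27]. First, $A$ is characterized by its graph $\operatorname{gr}A=\{(x,u)\in\mathcal{H}^2:\ u\in Ax\}$. The inverse $A^{-1}$ of $A$ has graph $\{(u,x)\in\mathcal{H}^2:\ (x,u)\in\operatorname{gr}A\}$, and the linear combination $A+\gamma B$ ($\gamma\in\mathbb{R}$) has graph

$$\{(x,u+\gamma v):\ (x,u)\in\operatorname{gr}A,\ (x,v)\in\operatorname{gr}B\}.$$

$A$ is monotone if

$$\forall(x,u)\in\operatorname{gr}A\ \ \forall(y,v)\in\operatorname{gr}A:\quad\langle x-y\mid u-v\rangle\ge0.$$

If $A$ is monotone and there exists no monotone map $B\neq A$ such that $\operatorname{gr}A\subset\operatorname{gr}B$, then $A$ is maximal monotone. In this case

• $\operatorname{gr}A$ is weakly-strongly closed: for every sequence $((y_n,v_n))_{n\ge0}$ in $\operatorname{gr}A$,

$$\left.\begin{array}{l}y_n\rightharpoonup y\\ v_n\to v\end{array}\right\}\ \Rightarrow\ (y,v)\in\operatorname{gr}A. \qquad (14)$$

• For every $\gamma\in\,]0,+\infty[$, the resolvent of $A$, $J_\gamma^A=(\mathrm{Id}+\gamma A)^{-1}$, is a single-valued firmly nonexpansive operator defined on $\mathcal{H}$ [17,23].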

Of broad interest is the problem of finding a zero of a maximal monotone map $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ [23], i.e.,

$$\text{Find}\ \bar{x}\in\mathcal{H}\ \ \text{s.t.}\ \ 0\in A\bar{x}. \qquad (15)$$

For every $\gamma\in\,]0,+\infty[$, the solution set $S=A^{-1}0$ can be written as $S=\{x\in\mathcal{H}:\ x\in x+\gamma Ax\}=\operatorname{Fix}J_\gamma^A$. Thus, given $(\gamma_n)_{n\ge0}$ in $]0,+\infty[$, the equilibrium problem (15) can be cast in the form of the common fixed point problem (13) with $(T_n)_{n\ge0}=(J_{\gamma_n}^A)_{n\ge0}$. Algorithm 2 is then known as the (relaxed) proximal point algorithm [17,23].


Take $x_0\in\mathcal{H}$ and set $n=0$.

1 | Take $\gamma_n\in\,]0,+\infty[$ and $\lambda_n\in[0,2]$.

2 | Set $x_{n+1}=x_n+\lambda_n(J_{\gamma_n}^Ax_n-x_n)$.

3 | Set $n=n+1$ and go to step 1.


Fejér Monotonicity in Convex Optimization, Algorithm 4
Proximal point


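Proposition 10 Suppose that

$$\left\{\begin{array}{l}(\gamma_n)_{n\ge0}\ \text{is in}\ [\varepsilon,+\infty[\\ (\lambda_n)_{n\ge0}\ \text{is in}\ [\varepsilon,2-\varepsilon]\end{array}\right.\quad\text{where}\ \varepsilon\in\,]0,1[. \qquad (16)$$

Then any sequence generated by Algorithm 4 converges weakly to a zero of $A$.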


This result is a consequence of Proposition 6iii). Indeed, for every $n\in\mathbb{N}$, define $y_n=x_n+(x_{n+1}-x_n)/\lambda_n$ and $v_n=(x_n-x_{n+1})/(\gamma_n\lambda_n)$. Since $y_n=J_{\gamma_n}^Ax_n$, we have $v_n\in Ay_n$. Now suppose $d(x_n,H_n)\to0$. Then, thanks to (16), $x_{n+1}-x_n\to0$ and, in turn, $v_n\to0$ and $y_n-x_n\to0$. Hence, $x_{k_n}\rightharpoonup x\Rightarrow y_{k_n}\rightharpoonup x\Rightarrow0\in Ax$ by (14).

Weak convergence can also be achieved under variants of (16), e.g., $\sum_{n\ge0}\gamma_n^2=+\infty$ and $\forall n\in\mathbb{N}$: $\lambda_n=1$ [2]. Such results can be deduced from Proposition 6 as well.
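As a concrete instance of Algorithm 4, take $\mathcal{H}=\mathbb{R}$ and $A=\partial|\cdot|$; this choice is an assumption made only for illustration. The resolvent $J_\gamma^A$ is then the soft-thresholding map $x\mapsto\operatorname{sign}(x)\max\{|x|-\gamma,0\}$, and the unique zero of $A$ is $0$. The sketch uses the constant parameters $\gamma_n\equiv1$ and $\lambda_n\equiv1.5$, which satisfy (16) with $\varepsilon=1/2$. The function names are hypothetical.

```python
import math

def soft_threshold(x, gamma):
    """Resolvent J_gamma^A of A = subdifferential of |.| on R."""
    return math.copysign(max(abs(x) - gamma, 0.0), x)

def proximal_point(x0, gammas, lambdas):
    """Relaxed proximal point iteration x_{n+1} = x_n + lam_n (J_{gamma_n} x_n - x_n)."""
    x = x0
    for gamma, lam in zip(gammas, lambdas):
        x = x + lam * (soft_threshold(x, gamma) - x)
    return x

n_iter = 100
x_final = proximal_point(10.0, [1.0] * n_iter, [1.5] * n_iter)
print(abs(x_final) < 1e-12)  # True: the iterates approach the zero of A
```

Over-relaxation ($\lambda_n>1$) makes the iterates overshoot $0$ and alternate in sign. Their distance to the solution still never increases, as Fejér monotonicity requires.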


Zeros of the Sum of Two Monotone Maps 


Take two maximal monotone maps $A,B\colon\mathcal{H}\rightrightarrows\mathcal{H}$. An extension of (15) that captures a wide body of optimization and applied mathematics problems is [27]

$$\text{Find}\ \bar{x}\in\mathcal{H}\ \ \text{s.t.}\ \ 0\in A\bar{x}+B\bar{x}. \qquad (17)$$

When $A+B$ is maximal monotone, one can approach this problem via Algorithm 4. Naturally, for this approach to be numerically viable, the resolvents of $A+B$ should be relatively easy to compute. A more widely applicable alternative is to devise an operator splitting algorithm, in which the operators $A$ and $B$ are employed in separate steps [16]. Two Fejérian splitting algorithms are described below.

First, suppose that $B$ is (single-valued and) cocoercive in the sense that $B^{-1}-\alpha\,\mathrm{Id}$ is monotone for some $\alpha\in\,]0,+\infty[$, i.e.,

$$\forall(x,y)\in\mathcal{H}^2:\quad\langle Bx-By\mid x-y\rangle\ge\alpha\|Bx-By\|^2. \qquad (18)$$

Given $\gamma\in\,]0,2\alpha]$, it follows from (18) that $\mathrm{Id}-\gamma B$ is nonexpansive. Moreover, the solution set $S=(A+B)^{-1}0$ can be written as $S=\{x\in\mathcal{H}:\ x-\gamma Bx\in x+\gamma Ax\}=\operatorname{Fix}T$, where $T=J_\gamma^A\circ(\mathrm{Id}-\gamma B)$ is nonexpansive as the composition of two nonexpansive operators. Algorithm 3 can then be implemented by alternating a forward step involving $B$ with a backward (proximal) step involving $A$.


Take $\gamma\in\,]0,2\alpha]$, $x_0\in\mathcal{H}$, and set $n=0$.

1 | Set $x_{n+1/2}=x_n-\gamma Bx_n$ and take $\lambda_n\in[0,1]$.

2 | Set $x_{n+1}=x_n+\lambda_n(J_\gamma^Ax_{n+1/2}-x_n)$.

3 | Set $n=n+1$ and go to step 1.


Fejér Monotonicity in Convex Optimization, Algorithm 5
Forward-backward method


As a corollary of Proposition 9 we obtain:


Proposition 11 If $\sum_{n\ge0}\lambda_n(1-\lambda_n)=+\infty$, any sequence generated by Algorithm 5 converges weakly to a zero of $A+B$.
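Here is a minimal sketch of Algorithm 5. It assumes $\mathcal{H}=\mathbb{R}^d$, $A=\partial(\mu\|\cdot\|_1)$ and $B=\nabla\big(\tfrac12\|\cdot-c\|^2\big)=\mathrm{Id}-c$; these choices are made only for illustration. With them, $B$ satisfies (18) with $\alpha=1$, and the backward step $J_\gamma^A$ is componentwise soft-thresholding at level $\gamma\mu$. The zero of $A+B$ is the soft-thresholded vector $c$. The helper names are hypothetical.

```python
import math

def soft(v, t):
    """Componentwise soft-thresholding: resolvent of the subdifferential of t*||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def forward_backward(c, mu, gamma, lam, x0, n_iter):
    """Algorithm 5 with B x = x - c (alpha = 1) and A = subdifferential of mu*||.||_1."""
    if not (0.0 < gamma <= 2.0) or not (0.0 <= lam <= 1.0):
        raise ValueError("need gamma in ]0, 2*alpha] = ]0, 2] and lam in [0, 1]")
    x = list(x0)
    for _ in range(n_iter):
        x_half = [xi - gamma * (xi - ci) for xi, ci in zip(x, c)]  # forward step (B)
        y = soft(x_half, gamma * mu)                                # backward step (A)
        x = [xi + lam * (yi - xi) for xi, yi in zip(x, y)]
    return x

c, mu = [3.0, 0.5], 1.0
x = forward_backward(c, mu, gamma=0.5, lam=0.9, x0=[0.0, 0.0], n_iter=300)
print(all(abs(xi - si) < 1e-9 for xi, si in zip(x, soft(c, mu))))  # True
```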


The second algorithm is centered around the operator $T=J_\gamma^A\circ(2J_\gamma^B-\mathrm{Id})+\mathrm{Id}-J_\gamma^B$, where $\gamma\in\,]0,+\infty[$. This operator possesses two nice properties: it is firmly nonexpansive, and $y\in\operatorname{Fix}T\Rightarrow J_\gamma^By\in(A+B)^{-1}0$ [16]. Whence, by putting $T_n\equiv T$ in Algorithm 2, one obtains the Douglas–Rachford method [8,16].


Take $\gamma\in\,]0,+\infty[$, $x_0\in\mathcal{H}$, and set $n=0$.

1 | Set $x_{n+1/2}=J_\gamma^Bx_n$ and take $\lambda_n\in[0,2]$.

2 | Set $x_{n+1}=x_n+\lambda_n\big(J_\gamma^A(2x_{n+1/2}-x_n)-x_{n+1/2}\big)$.

3 | Set $n=n+1$ and go to step 1.


Fejér Monotonicity in Convex Optimization, Algorithm 6
Douglas–Rachford method


As in Algorithm 5, $B$ is activated at step 1 and $A$ at step 2. Convergence is established as in Proposition 9.

Proposition 12 If $\sum_{n\ge0}\lambda_n(2-\lambda_n)=+\infty$, any sequence generated by Algorithm 6 converges weakly, and the image of the weak limit under $J_\gamma^B$ is a zero of $A+B$.
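Suppose $A=N_{Q_A}$ and $B=N_{Q_B}$ are normal cones of closed convex sets. Both resolvents then reduce to projections, and Algorithm 6 becomes a method for finding a point in $Q_A\cap Q_B$. The sketch assumes, purely for illustration, $Q_B=\{x\in\mathbb{R}^2:\ x_0+x_1=1\}$ and $Q_A=[0,1]^2$. It returns the point $J_\gamma^Bx_n$, which is the quantity Proposition 12 says converges to a solution. The function names are hypothetical.

```python
def proj_box(x, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^d, i.e. the resolvent of its normal cone."""
    return [min(max(xi, lo), hi) for xi in x]

def proj_line(x):
    """Projection onto the line {x in R^2 : x_0 + x_1 = 1}."""
    shift = (x[0] + x[1] - 1.0) / 2.0
    return [x[0] - shift, x[1] - shift]

def douglas_rachford(J_A, J_B, x0, lam=1.0, n_iter=200):
    """Algorithm 6 with lam_n = lam; returns the point J_B x_n."""
    x = list(x0)
    for _ in range(n_iter):
        x_half = J_B(x)                                      # step 1: uses B
        r = J_A([2.0 * h - xk for h, xk in zip(x_half, x)])  # step 2: uses A
        x = [xk + lam * (rk - h) for xk, rk, h in zip(x, r, x_half)]
    return J_B(x)

p = douglas_rachford(proj_box, proj_line, [4.0, 5.0])
print(p)  # [0.5, 0.5]
```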


Variational Inequalities 


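Let $B\colon\mathcal{H}\to\mathcal{H}$ be a single-valued maximal monotone operator, let $\varphi\colon\mathcal{H}\to\,]-\infty,+\infty]$ be a proper, lower-semicontinuous, convex function, and let $\partial\varphi\colon\mathcal{H}\rightrightarrows\mathcal{H}$ be its subdifferential, i.e.,

$$\partial\varphi\colon x\mapsto\bigcap_{y\in\mathcal{H}}\{u\in\mathcal{H}:\ \langle y-x\mid u\rangle+\varphi(x)\le\varphi(y)\}.$$

Then $\partial\varphi$ is maximal monotone [2] and, upon taking $A=\partial\varphi$ in (17), one arrives at the variational inequality problem

$$\text{Find}\ \bar{x}\in\mathcal{H}\ \ \text{s.t.}\ \ \forall x\in\mathcal{H}:\ \langle\bar{x}-x\mid B\bar{x}\rangle+\varphi(\bar{x})\le\varphi(x). \qquad (19)$$

In this context, the resolvent $J_\gamma^{\partial\varphi}$ reduces to Moreau's prox mapping [18]

$$\operatorname{prox}_\gamma^\varphi\colon x\mapsto\operatorname*{arg\,min}_{y\in\mathcal{H}}\ \varphi(y)+\frac{1}{2\gamma}\|y-x\|^2.$$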
As a special instance of (17), the variational inequality problem (19) can be solved via the forward-backward method (Algorithm 5), and Proposition 11 then yields:


Proposition 13 Suppose that (18) is in force. Take $\gamma\in\,]0,2\alpha]$ and $x_0\in\mathcal{H}$, and let

$$\forall n\in\mathbb{N}:\quad x_{n+1}=x_n+\lambda_n\big(\operatorname{prox}_\gamma^\varphi(x_n-\gamma Bx_n)-x_n\big), \qquad (20)$$

where $(\lambda_n)_{n\ge0}$ is in $[0,1]$ and $\sum_{n\ge0}\lambda_n(1-\lambda_n)=+\infty$. Then $(x_n)_{n\ge0}$ converges weakly to a solution of (19).


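A noteworthy situation is when $\varphi=\iota_Q$, where $\iota_Q$ is the indicator function of a nonempty closed convex set $Q$, i.e.,

$$\iota_Q\colon x\mapsto\begin{cases}0&\text{if}\ x\in Q,\\+\infty&\text{if}\ x\notin Q.\end{cases} \qquad (21)$$

It follows that $\partial\iota_Q=N_Q$, where $N_Q$ is the normal cone to $Q$, i.e.,

$$N_Qx=\bigcap_{y\in Q}\{u\in\mathcal{H}:\ \langle y-x\mid u\rangle\le0\}$$

if $x\in Q$, and $N_Qx=\varnothing$ otherwise. In addition, (19) reads

$$\text{Find}\ \bar{x}\in Q\ \ \text{s.t.}\ \ \forall x\in Q:\ \langle\bar{x}-x\mid B\bar{x}\rangle\le0, \qquad (22)$$

and $\operatorname{prox}_\gamma^{\iota_Q}=P_Q$ is the projector onto $Q$.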


Differentiable Optimization 


A standard convex programming problem is to minimize a proper, lower-semicontinuous, convex function $f\colon\mathcal{H}\to\,]-\infty,+\infty]$ over a nonempty closed convex set $Q\subset\mathcal{H}$, i.e.,

$$\text{Find}\ \bar{x}\in\operatorname*{arg\,min}_{x\in Q}f(x). \qquad (23)$$

In view of (21), (23) is equivalent to finding a global minimizer of $\iota_Q+f$. By Fermat's rule, this amounts to finding a zero of $\partial(\iota_Q+f)$. If $0$ lies in the interior of $Q-\{x\in\mathcal{H}:\ f(x)<+\infty\}$, then $\partial(\iota_Q+f)=\partial\iota_Q+\partial f$ [2], and (23) is therefore of the form (17) with $A=N_Q$ and $B=\partial f$. This occurs in particular when $f$ is finite and continuous at a point in $Q$.

Now suppose that $f$ is differentiable. Then $\partial f=\{\nabla f\}$ is single-valued, and (23) can further be reduced to (22) with $B=\nabla f$. The forward-backward scheme (20) then becomes the projected gradient algorithm

$$\forall n\in\mathbb{N}:\quad x_{n+1}=x_n+\lambda_n\big(P_Q(x_n-\gamma\nabla f(x_n))-x_n\big).$$

Proposition 13 provides conditions for weak convergence to a minimizer of $f$ over $Q$.
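When $f$ is convex with an $L$-Lipschitz gradient, $\nabla f$ satisfies (18) with $\alpha=1/L$ (the Baillon–Haddad theorem), so any $\gamma\in\,]0,2/L]$ is admissible in Proposition 13. The sketch below assumes, for illustration only, $f(x)=\tfrac12(x_0-2)^2+2(x_1+1)^2$ (so $L=4$) and $Q=[0,1]^2$. Since this $f$ is separable, its minimizer over $Q$ is $(1,0)$. The function names are hypothetical.

```python
def grad_f(x):
    """Gradient of f(x) = 0.5*(x0 - 2)**2 + 2*(x1 + 1)**2; Lipschitz constant L = 4."""
    return [x[0] - 2.0, 4.0 * (x[1] + 1.0)]

def proj_Q(x):
    """Projection onto the box Q = [0, 1]^2."""
    return [min(max(xi, 0.0), 1.0) for xi in x]

def projected_gradient(x0, gamma, lam, n_iter):
    """x_{n+1} = x_n + lam * (P_Q(x_n - gamma * grad f(x_n)) - x_n)."""
    x = list(x0)
    for _ in range(n_iter):
        g = grad_f(x)
        y = proj_Q([xi - gamma * gi for xi, gi in zip(x, g)])
        x = [xi + lam * (yi - xi) for xi, yi in zip(x, y)]
    return x

# gamma = 0.4 <= 2/L = 0.5 and lam = 0.8, so sum lam (1 - lam) diverges.
x = projected_gradient([0.5, 0.5], gamma=0.4, lam=0.8, n_iter=100)
print([round(xi, 9) for xi in x])  # [1.0, 0.0]
```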


Convex Feasibility Problems 


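Given a family $(S_i)_{i\in I}$ of intersecting nonempty closed and convex subsets of $\mathcal{H}$, the convex feasibility problem reads [3,5,6,15]

$$\text{Find}\ \bar{x}\in S=\bigcap_{i\in I}S_i. \qquad (24)$$

At iteration $n$, select a nonempty finite index set $I_n\subset I$. For every $i\in I_n$, let $p_{i,n}$ be an approximate projection of $x_n$ onto $S_i$, i.e., the projection of $x_n$ onto a closed affine half-space $H_{i,n}$ containing $S_i$. Then

$$H_{i,n}=\{x\in\mathcal{H}:\ \langle x-p_{i,n}\mid x_n-p_{i,n}\rangle\le0\}.$$

Let

$$H_n=\Big\{x\in\mathcal{H}:\ \sum_{i\in I_n}w_{i,n}\langle x-p_{i,n}\mid x_n-p_{i,n}\rangle\le0\Big\},$$

where the weights $(w_{i,n})_{i\in I_n}$ are in $]0,1]$ and satisfy $\sum_{i\in I_n}w_{i,n}=1$. Then $S\subset\bigcap_{i\in I_n}S_i\subset\bigcap_{i\in I_n}H_{i,n}\subset H_n$, and $P_nx_n=x_n+L_n(x_{n+1/2}-x_n)$, where $x_{n+1/2}=\sum_{i\in I_n}w_{i,n}p_{i,n}$ and

$$L_n=\begin{cases}\dfrac{\sum_{i\in I_n}w_{i,n}\|p_{i,n}-x_n\|^2}{\|x_{n+1/2}-x_n\|^2}&\text{if}\ x_{n+1/2}\neq x_n,\\[2ex]1&\text{otherwise.}\end{cases} \qquad (25)$$

Algorithm 1 then turns into Algorithm 7.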
Take $x_0\in\mathcal{H}$ and set $n=0$.

1 | Take a nonempty finite set $I_n\subset I$.

2 | Compute approximate projections $(p_{i,n})_{i\in I_n}$ of $x_n$ onto $(S_i)_{i\in I_n}$.

3 | Take $(w_{i,n})_{i\in I_n}$ in $]0,1]$ such that $\sum_{i\in I_n}w_{i,n}=1$.

4 | Set $x_{n+1/2}=\sum_{i\in I_n}w_{i,n}p_{i,n}$ and compute $L_n$ as in (25).

5 | Take $\lambda_n\in[0,2L_n]$.

6 | Set $x_{n+1}=x_n+\lambda_n(x_{n+1/2}-x_n)$.

7 | Set $n=n+1$ and go to step 1.


Fejér Monotonicity in Convex Optimization, Algorithm 7
Convex feasibility


Weak convergence to a point in $S$ follows from Proposition 6 under various assumptions on the control sequence $(I_n)_{n\ge0}$ and the approximate projections $((p_{i,n})_{i\in I_n})_{n\ge0}$ [5,6,15].
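Here is a minimal sketch of Algorithm 7. It assumes $\mathcal{H}=\mathbb{R}^2$ and half-space constraints $S_i=\{y:\ \langle a_i\mid y\rangle\le\beta_i\}$, so the projections $p_{i,n}$ are exact. It also takes $I_n=I$, equal weights and $\lambda_n=L_n$, i.e., $x_{n+1}=P_nx_n$. The extrapolation factor $L_n$ from (25) can be much larger than $1$: in the toy instance below it equals $3$ at the first iteration, and that single step already lands in $S$. The names and the instance are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def proj_halfspace(x, a, beta):
    """Exact projection onto {y : <a|y> <= beta}, assuming a != 0."""
    excess = dot(a, x) - beta
    if excess <= 0.0:
        return list(x)
    t = excess / dot(a, a)
    return [xi - t * ai for xi, ai in zip(x, a)]

def extrapolated_projections(x0, halfspaces, n_iter=50):
    """Algorithm 7 with I_n = I, equal weights and lam_n = L_n (so x_{n+1} = P_n x_n)."""
    x = list(x0)
    w = 1.0 / len(halfspaces)
    L_history = []
    for _ in range(n_iter):
        ps = [proj_halfspace(x, a, beta) for a, beta in halfspaces]
        x_half = [sum(w * p[k] for p in ps) for k in range(len(x))]
        step = [hk - xk for hk, xk in zip(x_half, x)]
        den = dot(step, step)
        diffs = [[pk - xk for pk, xk in zip(p, x)] for p in ps]
        num = sum(w * dot(d, d) for d in diffs)
        L = num / den if den > 0.0 else 1.0   # extrapolation factor (25)
        L_history.append(L)
        x = [xk + L * sk for xk, sk in zip(x, step)]
    return x, L_history

# S = intersection of {x0 >= 1}, {x1 >= 1}, {x0 + x1 <= 3}, each as <a|y> <= beta.
halfspaces = [((-1.0, 0.0), -1.0), ((0.0, -1.0), -1.0), ((1.0, 1.0), 3.0)]
x, L_hist = extrapolated_projections([-5.0, -5.0], halfspaces)
print(round(L_hist[0], 6), [round(v, 6) for v in x])  # 3.0 [1.0, 1.0]
```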


Nondifferentiable Optimization 


Suppose that $f$ is subdifferentiable in (23), i.e.,

$$\forall x\in\mathcal{H}:\quad\partial f(x)\neq\varnothing,$$

and that its minimum value $\bar{f}$ over $Q$ is known. Then (23) can be viewed as a special case of (24) with two sets, namely $S_1=Q$ and $S_2=\{x\in\mathcal{H}:\ f(x)\le\bar{f}\}$. Now take

$$H_{2,n}=\{x\in\mathcal{H}:\ \langle x-x_n\mid u_n\rangle\le\bar{f}-f(x_n)\},$$

where $u_n\in\partial f(x_n)$. Then $S_2\subset H_{2,n}$, and

$$p_{2,n}=\begin{cases}x_n+\dfrac{\bar{f}-f(x_n)}{\|u_n\|^2}\,u_n&\text{if}\ x_n\notin S_2,\\[2ex]x_n&\text{otherwise}\end{cases}$$

is called a subgradient projection of $x_n$ onto $S_2$ [3,5]. If Algorithm 7 is implemented by alternating a relaxed subgradient projection onto $S_2$ with an exact projection onto $S_1$, i.e.,

$$\forall n\in\mathbb{N}:\quad x_{n+1}=P_Q\big(x_n+\lambda_n(p_{2,n}-x_n)\big),$$

one obtains the subgradient projection method of [21]. Assume that $\partial f$ is uniformly bounded on bounded sets and that $(\lambda_n)_{n\ge0}$ is in $[0,2]$ and satisfies (8). Weak convergence to a solution of (23) then follows from Proposition 6iii) [3,5].
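The subgradient projection method can be sketched on a toy instance. Everything in this example is an assumption made for illustration: $\mathcal{H}=\mathbb{R}^2$, $f(x)=|x_0-1|+2|x_1-2|$, $Q=\{x:\ x_0+x_1\le1\}$ and $\lambda_n\equiv1$. For this instance $\bar{f}=2$, attained at $(-1,2)$; this value was worked out by hand and is supplied to the method as the known minimum. The function names are hypothetical.

```python
def f(x):
    return abs(x[0] - 1.0) + 2.0 * abs(x[1] - 2.0)

def _sign(t):
    return (t > 0.0) - (t < 0.0)

def subgrad_f(x):
    """One subgradient of f (uses 0 as the subgradient of |t| at t = 0)."""
    return [float(_sign(x[0] - 1.0)), 2.0 * _sign(x[1] - 2.0)]

def proj_Q(x):
    """Projection onto Q = {x : x0 + x1 <= 1}."""
    excess = x[0] + x[1] - 1.0
    if excess <= 0.0:
        return list(x)
    return [x[0] - excess / 2.0, x[1] - excess / 2.0]

def subgradient_projection(x0, f_bar, lam=1.0, n_iter=400):
    """x_{n+1} = P_Q(x_n + lam (p_{2,n} - x_n)), p_{2,n} the subgradient projection."""
    x = list(x0)
    for _ in range(n_iter):
        gap = f(x) - f_bar
        if gap <= 0.0:            # x_n already lies in S_2, so p_{2,n} = x_n
            p = x
        else:
            u = subgrad_f(x)
            t = gap / (u[0] ** 2 + u[1] ** 2)
            p = [xi - t * ui for xi, ui in zip(x, u)]
        x = proj_Q([xi + lam * (pi - xi) for xi, pi in zip(x, p)])
    return x

x = subgradient_projection([0.0, 0.0], f_bar=2.0)
print([round(v, 6) for v in x])  # [-1.0, 2.0]
```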


Inconsistent Convex Feasibility Problems 


When $\bigcap_{i\in I}S_i=\varnothing$ and $I$ is finite, (24) can be replaced by the minimization problem

$$\text{Find}\ \bar{x}\in\operatorname*{arg\,min}_{x\in\mathcal{H}}\ \frac12\sum_{i\in I}w_i\,d(x,S_i)^2, \qquad (26)$$

where $(w_i)_{i\in I}$ is in $]0,1]$ and $\sum_{i\in I}w_i=1$. Let $(P_i)_{i\in I}$ be the projectors onto $(S_i)_{i\in I}$, let $T=\sum_{i\in I}w_iP_i$, and let $S$ be the solution set of (26). Then $T$ is firmly nonexpansive and $S=\operatorname{Fix}T$ [5]. By reiterating a previous argument, one obtains:


Proposition 14 Take $x_0\in\mathcal{H}$ and $(\lambda_n)_{n\ge0}$ in $[0,2]$ such that $\sum_{n\ge0}\lambda_n(2-\lambda_n)=+\infty$, and let

$$\forall n\in\mathbb{N}:\quad x_{n+1}=x_n+\lambda_n\Big(\sum_{i\in I}w_iP_ix_n-x_n\Big).$$

Then $(x_n)_{n\ge0}$ converges weakly to a solution of (26).
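The iteration of Proposition 14 can be tried on a small inconsistent instance. The sketch assumes $\mathcal{H}=\mathbb{R}^2$, two disjoint unit discs centred at $(0,0)$ and $(4,0)$, equal weights and $\lambda_n\equiv1$; all of these are illustrative choices. By symmetry, the solution of (26) here is the midpoint $(2,0)$ between the discs. The function names are hypothetical.

```python
import math

def proj_disc(x, center, radius):
    """Projection onto the closed disc B(center; radius)."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    r = math.hypot(dx, dy)
    if r <= radius:
        return (x[0], x[1])
    s = radius / r
    return (center[0] + s * dx, center[1] + s * dy)

def averaged_projections(x0, sets, weights, lam=1.0, n_iter=500):
    """x_{n+1} = x_n + lam (sum_i w_i P_i x_n - x_n), as in Proposition 14."""
    x = x0
    for _ in range(n_iter):
        tx = [0.0, 0.0]
        for (c, rad), w in zip(sets, weights):
            p = proj_disc(x, c, rad)
            tx[0] += w * p[0]
            tx[1] += w * p[1]
        x = (x[0] + lam * (tx[0] - x[0]), x[1] + lam * (tx[1] - x[1]))
    return x

# Two disjoint unit discs: the feasibility problem (24) has no solution.
sets = [((0.0, 0.0), 1.0), ((4.0, 0.0), 1.0)]
x = averaged_projections((0.0, 5.0), sets, [0.5, 0.5])
print(abs(x[0] - 2.0) < 1e-6 and abs(x[1]) < 1e-6)  # True
```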


Historical Notes and Comments 


In 1922, L. Fejér considered the following problem [12]: given a closed subset $S\subset\mathbb{R}^N$ and a point $y\notin S$, can one find a point $x\in\mathbb{R}^N$ such that

$$\forall\bar{x}\in S:\quad\|x-\bar{x}\|<\|y-\bar{x}\|\,?$$


Inspired by this work, T.S. Motzkin and I.J. Schoenberg 
adopted in their 1954 paper [19] the term Fejér mono- 
tone to describe sequences satisfying (1). In this paper 
(see also [1]), an algorithm was developed to solve systems of linear inequalities in $\mathbb{R}^N$ by successive projections onto the half-spaces defining the polyhedral solution set $S$. The concept of Fejér monotonicity was
shown to be an adequate tool to study convergence of 
this algorithm. Basic facts such as (5) and (9) can al- 
ready be found in [19] and [1], respectively. 

In the 1960s, I.I. Eremin extended the use of Fejér monotonicity to more general convex problems in
Hilbert spaces. A summary of his publications cover- 
ing the period 1961-1967 is given in [9]. By the end 
of the 1960s, most results on Fejér monotonicity in 
Hilbert spaces were essentially known and one can find 
them scattered in the Soviet literature in the context of 
specific convex programming problems. Thus, (4) ap- 
pears in [10], Proposition 2 in [4], Propositions 4 and 5 
in [14], and Proposition 8 in [14] and [21]. It should be 
noted that Proposition 2 has been implicitly rediscovered many times and that it seems to originate in [24].

Recently, Fejér monotonicity has played a central role in several convex optimization papers [3,6,15,25,26]. It has also proven to be a valuable tool in more applied disciplines such as biology, economics, and engineering [5,11]. Some extensions of the notion of Fejér monotonicity are discussed in [7].


See also 


> Generalized Monotone Multivalued Maps 

> Generalized Monotone Single Valued Maps 

> Generalized Monotonicity: Applications to 
Variational Inequalities and Equilibrium Problems 
The financial decisions of an organization (i.e. firm, bank, insurance company, etc.) are usually considered in the context of optimization. For a firm over the long term, two types of decisions arise: decisions related to the optimal allocation of funds, and decisions related to the optimal financial structure. In the short term, one considers decisions related to the management of working capital, which concern the optimization of stocks, cash, accounts receivable and short-term debts. Financial theory analyzes these decisions (short and long term), but always from an optimization perspective (for example, the theory of cost of capital, portfolio theory, options theory, etc.). This perspective has led some researchers to propose operations research techniques to solve financial decision problems. The classical modeling of decision problems in operations research consists in formulating an optimization (maximization or minimization) problem under specific constraints. In fact, it is a best-choice problem.

However, these financial problems have recently been examined from a more comprehensive and more realistic perspective which overcomes the restrictive framework of optimization [80,84]. For example, in capital budgeting decision making, K. Bhaskar and P. McNamee [6] pose the following questions:

a) In assessing investment proposals, do the decision 
makers have a single objective or multiple objec- 
tives? 

b) If decision makers do have multiple objectives, what are they and what is the priority structure of the objectives?

In another similar study, Bhaskar [5] notes that microeconomic theory has largely adopted a single objective function, namely utility maximization for the consumer unit and profit maximization for firms. To challenge the single objective function principle for firms, Bhaskar [5] raises three categories of criticism:

a) there exist alternatives to the profit maximization approach which are based on equally simple hypotheses and which can better explain reality;

b) profit maximization, or any other equally simple hypothesis, is too naive to explain the complex process of decision making;

c) real-world firms do not have suitable information to enable them to maximize their profits.

Furthermore, several other theories of the firm have been postulated which propose objectives different from that of traditional microeconomic theory. One can cite the revenue maximizing model [3], the manager's utility model [74], the satisficing model [64] and the behavioral models [13].

On the basis of the above remarks, it is possible to distinguish three main reasons which have motivated a change of view in the modeling of financial problems:

1) By formulating the problem in terms of seeking the optimum, financial decision makers (i.e. financial analysts, portfolio managers, investors, etc.) confine themselves to a very narrow problem formulation, often irrelevant to the real decision problem.

2) Financial decisions are made by people (i.e. financial managers), not by models. Decision makers are increasingly involved in the decision making process and, in order to solve problems, it becomes necessary to take into consideration their preferences, their experience and their knowledge.

3) For financial decision problems such as the choice of investment projects, portfolio selection, the evaluation of business failure risk, etc., it seems illusory to speak of optimality, since multiple criteria must be taken into consideration.

In this article, our basic aim is to examine the contribution of multicriteria analysis to the study and solution of some financial decision problems. Section 2 presents the basic principles of multicriteria analysis. Section 3 presents the multicriteria character of some financial problems and some real-world applications of multicriteria analysis in financial management. Finally, Section 4 discusses the advantages that have resulted from applying multicriteria analysis in financial management.


Basic Principles of Multicriteria Analysis 


Multicriteria analysis, often called multiple criteria decision making (MCDM) by the American School and multicriteria decision aid (MCDA) by the European School, is a set of methods which allow the aggregation of several evaluation criteria in order to choose one or more alternatives (i.e. investment projects, financial assets with variable revenue, financial assets with fixed revenue, dynamic firms, etc.). It also deals with the study of the activity of decision aid for a well-identified decision maker (i.e. individual, firm, organization, etc.).
The development of multicriteria decision aid (the term used henceforth in this text) began in 1971. Its principal objective is to provide the decision maker with tools that enable him to make progress in solving a decision problem (for example, the selection of investment projects for a firm) where several, often conflicting, criteria must be taken into consideration.


Methods 


Specialists in the field distinguish several categories of methods in MCDA. The boundaries between these categories are, of course, rather fuzzy. B. Roy [58] proposes the following three categories of methods:

1) unique synthesis criterion approach, disregarding any incomparability;

2) outranking synthesis approach, accepting incomparability; and

3) interactive local judgement approach with trial-and-error iterations.

In this article, the classification proposed in [53] is adopted. It distinguishes four categories:

1) multi-objective mathematical programming;

2) multi-attribute utility theory;

3) outranking relations approach; and

4) preference disaggregation approach.

Multi-objective mathematical programming is characterized by the fact that an action (or alternative) $a$ is represented by a vector of real variables $(x_1,\dots,x_l)$. The set $\mathcal{A}$ of feasible solutions is defined by a set of linear constraints: $\mathcal{A}=\{x\in\mathbb{R}^l:\ Ax\le b,\ x\ge0\}$, with $A$ a matrix of dimensions $m\times l$ and $b$ an $m\times1$ vector. The chosen vector must perform satisfactorily with respect to several numerical criteria, $m$ in number and denoted $C^1,\dots,C^m$, which are linear functions of $x$. Three different methods can be distinguished within this approach:

1) the efficient solutions procedure;

2) goal programming;

3) compromise programming.

Surveys of the studies carried out on this category of methods can be found in [69,72] and [77].

Multi-attribute utility theory (MAUT) is an extension of classical utility theory. It seeks to represent the preferences of a decision maker by means of a utility function aggregating several evaluation criteria: $u(g)=u(g_1,\dots,g_n)$. In other words, the problem is to choose the action $a^*$ which maximizes the utility function of the decision maker: $u[g(a^*)]=\max_{a\in A}u[g(a)]$.

The criteria (attributes) can be certain or probabilistic (each $g_i(a)$ is associated with a probability distribution). In general, a multicriteria utility function can be decomposed into real functions $u_1,\dots,u_n$ under assumptions concerning the independence of criteria. Different assumptions yield different utility function models. From a theoretical point of view, the most studied form of utility function is the additive form:

$$u(g_1,\dots,g_n)=u_1(g_1)+\dots+u_n(g_n),$$

where $u_1,\dots,u_n$ are the marginal utilities defined on the scales of the criteria. For the study of the condition of independence in utility between criteria (substitution rate), one can refer to [34]. The latter and [77] present syntheses of work on the construction of multicriteria utility functions, both under certainty and under uncertainty.
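The additive model can be illustrated with a small sketch in which everything is hypothetical. Three investment projects are evaluated on two criteria: expected return in %, and a risk score where lower is better. Each marginal utility $u_i$ is taken as a weight times a linear rescaling of an assumed criterion range onto $[0,1]$. In practice the marginal utilities need not be linear and would be elicited from the decision maker.

```python
def linear_marginal(lo, hi, weight, increasing=True):
    """Hypothetical marginal utility: weight * linear rescaling of [lo, hi] onto [0, 1]."""
    def u(g):
        t = min(max((g - lo) / (hi - lo), 0.0), 1.0)
        return weight * (t if increasing else 1.0 - t)
    return u

# Hypothetical criteria: expected return in % (0..20) and risk score (0..10, lower is better).
marginals = [
    linear_marginal(0.0, 20.0, weight=0.6, increasing=True),
    linear_marginal(0.0, 10.0, weight=0.4, increasing=False),
]

def additive_utility(g):
    """u(g_1, ..., g_n) = u_1(g_1) + ... + u_n(g_n)."""
    return sum(u_i(g_i) for u_i, g_i in zip(marginals, g))

projects = {"A": (12.0, 6.0), "B": (8.0, 2.0), "C": (15.0, 9.0)}
scores = {name: additive_utility(g) for name, g in projects.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # B 0.56
```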

The outranking relations approach was developed in Europe with the elaboration of the ELECTRE methods (ELimination Et Choix Traduisant la REalité). The concept of outranking in the ELECTRE methods is due to Roy, the founder of these methods. The outranking relation allows one to conclude that an action $a\in A$ (a discrete set) outranks an action $b\in A$ if there are enough arguments to confirm that $a$ is at least as good as $b$, while there is no essential reason to refute this statement. In the ELECTRE methods, the aggregation of criteria requires defining the threshold notions of preference and indifference, concordance and discordance. Specifically, $a$ outranks $b$ if there is a sufficient majority of criteria for which $a$ is better classified than $b$ (concordance) and if the unfavorable deviations on the remaining criteria (discordance) are not too high. This modeling can thus reveal cases of incomparability when the multicriteria evaluations of two actions differ markedly. A detailed presentation of all outranking methods can be found in [61,63] and [72].
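A much-simplified concordance/discordance test in the spirit of ELECTRE I can be sketched as follows. It is not a full ELECTRE implementation: it ignores the preference and indifference thresholds mentioned above. The weights, evaluations and cut-off levels `c_hat` and `d_hat` are hypothetical. The concordance index is the share of total weight on criteria where $a$ is at least as good as $b$. The discordance index is the largest deviation in favour of $b$, divided by the range of the criterion scale. The example shows incomparability: neither $a$ nor $c$ outranks the other.

```python
def concordance(a, b, weights):
    """Share of the total weight on criteria where a is at least as good as b."""
    total = sum(weights)
    return sum(w for ga, gb, w in zip(a, b, weights) if ga >= gb) / total

def discordance(a, b, scale_ranges):
    """Largest normalised deviation in favour of b (0 if there is none)."""
    return max(max(gb - ga, 0.0) / r for ga, gb, r in zip(a, b, scale_ranges))

def outranks(a, b, weights, scale_ranges, c_hat=0.7, d_hat=0.3):
    """a S b iff concordance is high enough and discordance is low enough."""
    return (concordance(a, b, weights) >= c_hat
            and discordance(a, b, scale_ranges) <= d_hat)

# Hypothetical evaluations on three criteria scored 0..10 (larger is better).
weights = [4, 3, 3]
ranges = [10.0, 10.0, 10.0]
a, b, c = (8.0, 7.0, 6.0), (6.0, 6.0, 5.0), (9.0, 9.0, 1.0)

print(outranks(a, b, weights, ranges))                                  # True
print(outranks(a, c, weights, ranges), outranks(c, a, weights, ranges))  # False False
```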

The approach of the disaggregation of preferences is often used in MCDA as a means of modeling the preferences of a decision maker or a group of decision makers. This approach uses regression methods, whose introduction into MCDA followed the development of social judgement theory. Multiple regression can, in general, detect, identify or 'capture' the judgement policy of a decision maker (i.e. disaggregate the preferences). This policy, particularly if it relates to a certain number of past decisions, may express a global preference. The multiple regression approach is quite close to MAUT; the two differ in how the marginal utilities $u_i(g_i)$ and the weights $p_i$ are obtained. For example, for the additive utility function


$$u(g) = \sum_{i} p_i\, u_i(g_i),$$


the marginal utilities $u_i(g_i)$ and the weights $p_i$ are obtained by direct interrogation of the decision maker (aggregation methods) in the MAUT approach, and by indirect interrogation of the decision maker (disaggregation methods) in the multiple regression approach. The principal drawback that prevents a closer convergence of the two approaches is the linearity of the models proposed by multiple linear regression. A rather exhaustive bibliography of preference disaggregation methods can be found in [32] and [53].
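The regression route can be sketched as follows. Assuming the marginal utilities $u_i(g_i)$ are already known (here invented numbers on a 0 to 1 scale), the weights $p_i$ are inferred from the decision maker's holistic scores of past decisions by ordinary least squares. This is a hypothetical illustration of the linear-regression idea only; it is not UTA, which works from rankings through linear programming, and the function names are illustrative.

```python
import numpy as np


def additive_utility(marginals: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Global utility u(g) = sum_i p_i * u_i(g_i) for each row (action) of marginal utilities."""
    return marginals @ weights


def estimate_weights(marginals: np.ndarray, holistic_scores: np.ndarray) -> np.ndarray:
    """Infer the weights p from past holistic scores by ordinary least squares (no intercept)."""
    weights, *_ = np.linalg.lstsq(marginals, holistic_scores, rcond=None)
    return weights


if __name__ == "__main__":
    # Hypothetical marginal utilities u_i(g_i) of four past decisions on two criteria.
    past = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.9]])
    # Hypothetical holistic scores the decision maker gave to those decisions.
    scores = np.array([0.70, 0.30, 0.50, 0.41])
    p = estimate_weights(past, scores)
    print(np.round(p, 3))  # approximately [0.7, 0.3]
    print(additive_utility(np.array([[0.6, 0.8]]), p))  # approximately [0.66]
```

Because the model is linear in the $p_i$, it inherits the limitation pointed out above: nonlinear value judgements cannot be recovered this way.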


Decision Aid Activity 


Concerning the activity of decision aid, Roy [58,60] proposes a methodology for the systematic intervention of multicriteria analysis in the decision process. In brief, this methodology comprises four levels:
I) Object of the decision and spirit of recommendation or participation.
II) Analyzing consequences and developing criteria.
III) Modeling comprehensive preferences and operationally aggregating performances.
IV) Investigating and developing the recommendation.
It is important to emphasize that these four levels do not necessarily follow one another in the order given above. Decision aid is not necessarily a sequential process; the decision maker and the analyst may interact throughout. This general methodology has contributed to the development of several multicriteria methods that have been applied successfully to real cases. The best known among them are the ELECTRE methods developed by Roy and his collaborators.


Multicriteria Character of Financial Problems 
and Some Real-World Applications 


Operational research techniques were the first to be used in solving certain financial problems. I. Ekeland [19] wonders


why finance, rather curiously, has remained so long away from the techniques of operational research (i.e. optimization techniques), except for those concerning portfolio selection models.


According to the same author, the Capital Asset Pricing Model (CAPM) is a static optimization model based on the principle that the best (i.e. optimal) portfolio is the one that maximizes the expected return for a given level of risk over the period considered. For R.W. Ashford et al. [2], the techniques of operational research can be applied to working capital management as well as to the evaluation of investment projects. Among the techniques used for the management of working capital, one could mention:

• inventory control for the management of stocks;

• dynamic programming, linear programming, stochastic programming and visual and interactive simulation techniques for the management of cash;

• Markov processes and discriminant analysis for the management of accounts receivable;

• dynamic programming, linear programming and stochastic programming for the management of short-term debts (current liabilities).

Among the techniques used in the evaluation of investment projects, one could mention simulation methods [23] and those of mathematical statistics [24], which take the risk factor into consideration. Simulation methods and linear programming (e.g. the LONGER program [51]) are also used in financial planning (i.e. the elaboration of investment and financing plans). Under these circumstances, the solution of financial problems is easy to obtain. It rests on the assumptions that the problem is well posed and well formulated with respect to the reality involved, and that a single evaluation criterion applies (i.e. the monocriteria paradigm). In reality, however, the modeling of financial problems follows a different kind of logic. Their solution should then take into consideration the following elements (i.e. the multicriteria paradigm, cf. [59]):

• multiple criteria;

• conflicts between the criteria;

• a complex, subjective and ill-structured evaluation process;

• the introduction of financial decision makers into the evaluation process.

MCDA has already contributed significantly to solving several financial problems such as venture capital investment, business failure risk, credit granting, bond rating, country risk, political risk, evaluation of the performance and viability of organizations, choice of investments, financial planning and portfolio management.

The multicriteria character of these financial problems can be easily demonstrated. Here we limit the analysis to the choice of investment projects and to portfolio management. The international literature provides important case studies for the remaining financial problems [36,80,84].


Investment Decision 


The choice of investment projects is an important decision for every firm, public or private, large or small. Given its duration, its amount and its irreversible character, an investment decision is regarded as a major, strategic one. The investment decision process should therefore be modeled appropriately. If one considers that, in principle, the investment decision process consists of four main stages (perception, formulation, evaluation and choice), financial theory intervenes only in the stages of evaluation and choice [8]. With its empirical financial criteria (e.g. the payback method, the accounting rate of return) and its more sophisticated ones based on discounting techniques (e.g. the net present value, the internal rate of return, the profitability index, the discounted payback method), financial theory proposes either a ranking from best to worst when many investment projects are in competition, or an acceptance or rejection decision when there is only one investment project. Even if the tools of financial theory are improved to take time, inflation and risk into account (e.g. analytical methods, simulation methods, game theory, CAPM), problems remain in the evaluation and selection of investment projects. Among the most important, one could mention the reduction of the notion of investment to a time series of monetary flows (inflows and outflows), the choice of the discount rate, and the conflicts between financial criteria (e.g. net present value versus internal rate of return). According to financial theory, the discount rate (sometimes called the rate of return) plays the role of an acceptance or rejection rate (a cut-off rate) for an investment project when the internal rate of return criterion is used. Thus the investment decision of a firm depends on one variable only: the discount rate. As for the conflicts between criteria, one often observes that criteria which are all supposed to express the profitability of projects can lead to divergent rankings (for example, the net present value and the profitability index, or the net present value and the internal rate of return). Consequently, the financial approach to investment decisions seems limited and unrealistic: limited because it remains confined to the stages of evaluation and choice, and unrealistic because it is based only on financial criteria.
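The conflict between net present value and internal rate of return, and the pivotal role of the discount rate, can be checked numerically. The minimal Python sketch below uses two hypothetical projects with the same outlay; the cash flows and the 10% and 18% discount rates are invented for illustration, and the IRR is found by simple bisection, assuming a conventional cash-flow profile with a single sign change.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[t] is received at the end of period t (t = 0 is today)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: Sequence[float], lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Internal rate of return by bisection; assumes NPV changes sign exactly once on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


if __name__ == "__main__":
    project_a = [-100.0, 120.0]       # hypothetical: pays back quickly
    project_b = [-100.0, 0.0, 140.0]  # hypothetical: pays more, but later
    for rate in (0.10, 0.18):
        print(rate, round(npv(rate, project_a), 2), round(npv(rate, project_b), 2))
    print(round(irr(project_a), 4), round(irr(project_b), 4))  # 0.2 0.1832
```

At 10% the NPV criterion prefers project B (about 15.70 against 9.09), while the IRR criterion (20% against about 18.3%) prefers A. The two NPV profiles cross where 120(1 + r) = 140, i.e. at r = 1/6 ≈ 16.7%, so at 18% the NPV ranking flips as well: the ranking hinges on the single number chosen as discount rate.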
MCDA, on the other hand, contributes to the investment decision process in a highly original way. First, it intervenes in the whole investment process, from the stages of perception and formulation to the stages of evaluation and choice. In the stages of perception and formulation, MCDA contributes to the identification of possible actions (i.e. investment opportunities) and to the definition of a set of potential actions (i.e. possible variants, each variant constituting an investment project in competition with the others). This set of projects can be global or fragmented, stable or evolutionary. It is then necessary to choose a reference problematic well adapted to the investment decision problem (choice, sorting or ranking).

• Choice problematic P.α: help in choosing the best investment project or in developing a selection procedure for investment projects.

• Sorting problematic P.β: help in sorting investment projects according to norms or in building an assignment procedure for investment projects.

• Ranking problematic P.γ: help in ranking the investment projects in decreasing order of preference or in building an ordering procedure for investment projects.

Concerning the stages of evaluation and choice, MCDA offers a much more realistic methodological framework than financial theory, by introducing both quantitative and qualitative criteria into the study of investment projects. Criteria such as the urgency of the project, the coherence of the project's objectives with the general policy of the firm [21], and social and environmental aspects should be taken into consideration in an investment decision. MCDA therefore helps to identify the best investment projects according to the chosen problematic, to resolve conflicts between criteria satisfactorily, to establish the relative importance of the criteria in the decision-making process and to make explicit the investors' preferences and system of values. Many authors have already used MCDA methods in the evaluation of investment projects (non-exhaustive list): ELECTRE II and ORESTE methods [14]; MAUT methods [21]; multi-objective mathematical programming [5,35,41]; the Analytic Hierarchy Process (AHP) [38]; the PROMETHEE method [55,81].

Finally, to examine whether firms actually apply multiple criteria in their investment decisions, we present the results of the empirical study of Bhaskar and McNamee [6]. Studying large United Kingdom companies, the two authors showed that most companies (96%) appear to have more than one objective when an investment is being appraised. The most common number of objectives was eight. Concerning priorities among objectives, most companies (77%) had profitability as the primary objective. The next most important objective was the company's growth. Other criteria, less important than these two but still playing a role in investment decisions, are risk, liquidity, the environment, the age of assets, flexibility, the depth of skills, etc. These empirical results answer the questions raised by the two authors at the outset of their study.


Portfolio Management 


In the field of portfolio management, one can cite the pioneering work of H.M. Markowitz [46] who, by developing the mean-variance (M-V) optimization model, founded the classical approach to portfolio management. According to [19], the portfolio choice problem in the M-V model is a multicriteria one, because the investor simultaneously tries to maximize return and minimize risk; but once an acceptable level of risk is fixed, the problem reduces to maximizing return, a classical monocriteria problem. Following this bicriteria view of portfolio choice, and even more so the monocriteria view (e.g. the market model, CAPM), multifactor models were developed, in which there are several types of risk and not only market risk [57]. The portfolio selection problem thus becomes multidimensional. The need for multidimensional methods (e.g. statistics and econometrics) in stock selection has been pointed out by specialist researchers in finance [33]. The multidimensional nature of risk in portfolio management has also been demonstrated by specialist researchers in multicriteria analysis; see [76,77] and [10] on the 'Prospect Ranking Vector (PRV)' method. Today an arsenal of multidimensional and multicriteria methods, such as factor analysis, goal programming, AHP, ELECTRE, MINORA, ADELAIS, etc., has already been applied in the field of portfolio management [9,25,27,28,29,37,40,47,48,62,82,86].
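The reduction of the bicriteria M-V problem to a monocriteria one, once an acceptable risk level is fixed, is essentially an epsilon-constraint scalarization. The Python sketch below illustrates it for two assets with hypothetical expected returns, volatilities and correlation, assuming no short sales, and uses a crude grid search over the portfolio weight in place of the quadratic programming one would normally use.

```python
from typing import Optional, Sequence, Tuple


def portfolio_stats(w: float, mu: Sequence[float], sigma: Sequence[float], rho: float) -> Tuple[float, float]:
    """Mean and variance of a portfolio with weight w in asset 1 and 1 - w in asset 2."""
    mean = w * mu[0] + (1 - w) * mu[1]
    cov = rho * sigma[0] * sigma[1]
    var = (w * sigma[0]) ** 2 + ((1 - w) * sigma[1]) ** 2 + 2 * w * (1 - w) * cov
    return mean, var


def max_return_given_risk(mu, sigma, rho, max_std: float, steps: int = 1000) -> Optional[Tuple[float, float]]:
    """Epsilon-constraint: maximize expected return subject to std. deviation <= max_std, w in [0, 1]."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        mean, var = portfolio_stats(w, mu, sigma, rho)
        if var <= max_std ** 2 and (best is None or mean > best[1]):
            best = (w, mean)
    return best  # None if no portfolio meets the risk limit


if __name__ == "__main__":
    mu, sigma, rho = (0.08, 0.14), (0.10, 0.25), 0.2  # hypothetical asset data
    for cap in (0.05, 0.15, 0.30):
        print(cap, max_return_given_risk(mu, sigma, rho, cap))
```

With these numbers a 15% volatility cap gives a weight of about 0.466 in the safer asset and an expected return of about 11.2%; a 5% cap is infeasible, since the minimum-variance portfolio has a volatility of about 9.8%; and a 30% cap lets the investor hold only the riskier asset. Each choice of cap yields one point of the efficient frontier, which is why fixing the risk level turns the bicriteria problem into a monocriteria one.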

The multicriteria nature of the portfolio selection problem is well presented in [37]. The authors study the problem of international portfolio selection. According to them, the classical optimization model of portfolio selection used in a national context is even more likely to be suboptimal in a situation of international diversification. In fact, in an international context, the M-V model is not always a suitable method because it does not incorporate all the criteria that portfolio managers and investors use in their stock investment decisions. For such decisions, the authors propose new criteria such as the monthly return over the last five years, the standard deviation of that return over the last five years, the total cost of transactions, the country risk (or political risk), the direct coverage available for foreign currencies and the exchange risk. The multicriteria methodology used (ELECTRE IS, ELECTRE III) has the advantage of offering the portfolio manager a large set of investment opportunities, and also gives him the flexibility to choose the relative importance of the different criteria during the portfolio selection process. Finally, the authors argue that using an optimization model under constraints changes the nature of the portfolio selection problem, because in a decision problem a constraint does not play the same role as a criterion. As an indication of this new direction of research in portfolio management, one may mention the special issue of the Canadian journal L'Actualité Economique dedicated to the contribution of multicriteria analysis to the study of financial markets [36].


Some Real-World Applications 


In this section two applications of MCDA are briefly presented: the first concerns the evaluation of venture capital investment, the second the evaluation of business failure risk.


Venture Capital Investment 


Venture capital is today an important source of financing for small and medium-sized firms. It also plays a significant role in developing entrepreneurial spirit. The crucial problem in venture capital investment is the choice of evaluation criteria and their aggregation into a global operational model, which will serve as a basis for the rational and automatic selection of viable firms. The earlier evaluation models (e.g. descriptive and statistical) cannot explain investment decisions in venture capital, since these rely much more on subjective and qualitative elements than on objective and quantitative ones [83]. Moreover, the complexity of evaluating venture capital investments is evident in the project evaluation procedures of two French venture capital firms [80].


Study Context 


The data sample from two French venture capital firms, IDI and SIPAREX, was used as the object of the MCDA application. Although these two firms use project evaluation procedures, their problem remains the absence of a model capable of supporting their venture capital investment decisions. In fact, the variables used in the evaluation procedure are both financial variables (e.g. profitability ratios, solvency ratios, liquidity ratios) and qualitative variables (e.g. market trend, information security, quality of management, market niche/position). Although both venture capital firms have techniques for treating the financial variables, there is no explicit model for elaborating and modeling the qualitative variables. It is therefore at this stage of the analysis that the evaluation problem becomes complex. The complexity of the venture capital investment evaluation problem is also underlined in other studies (see [18,26,54,71,83], among others). The role of the venture capitalist goes beyond that of a simple contributor of funds to the firm.


Multicriteria Method and Results 


The multicriteria system MINORA (Multicriteria INteractive Ordinal Regression Analysis) was proposed to the two venture capital firms for the evaluation of firms. It belongs to the fourth category of MCDA methods, the preference disaggregation approach. The MINORA system is based both on the iterative use of an ordinal regression method and on an appropriate man-machine dialogue. Its aim is to construct multicriteria decision models that are as consistent as possible with the judgement policy of a decision maker. The decision maker (here the venture capitalist) expresses his judgement policy by ranking some firms he knows well on the basis of previous decisions. The system then uses the ordinal regression method UTA (UTilités Additives) [31] to estimate optimally the additive utility function(s), on multiple criteria, that is (are) as consistent as possible with the decision maker's ranking. The utility function model is estimated iteratively and interactively. It allows, first, the aggregation of all the criteria (financial and/or qualitative) by giving their relative importance and, second, the automatic and global evaluation of each firm. With the help of the decision makers of the two venture capital firms, two evaluation models were elaborated (one for each firm). This paper presents only the global model of IDI.
• The evaluation model for IDI

IDI evaluates firms for financing according to twelve criteria. The utility function model was estimated at the fourth stage of interaction and proved perfectly consistent with IDI's objectives. The equation of the global model is the following:


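$$\begin{aligned} u(g) ={}& 0.008\,u_1(g_1) + 0.072\,u_2(g_2) + 0.006\,u_3(g_3) + 0.197\,u_4(g_4) \\ &+ 0.105\,u_5(g_5) + 0.232\,u_6(g_6) + 0.009\,u_7(g_7) + 0.094\,u_8(g_8) \\ &+ 0.047\,u_9(g_9) + 0.071\,u_{10}(g_{10}) + 0.097\,u_{11}(g_{11}) + 0.062\,u_{12}(g_{12}), \end{aligned}$$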


where the evaluation criteria are the following:

g1) the sensitivity of sales to the inflation rate;
g2) the sensitivity of value added to sales variations;
g3) the sensitivity of labor productivity (value added per capita) to wage cost increases (wage per capita);
g4) the supplier credit in days;
g5) the available net income;
g6) the quality of management;
g7) the research and development effort;
g8) the extent of diversification;
g9) the market trend;
g10) the market niche/position;
g11) the cash-out method (opportunities for exit);
g12) the world market share.


The model described above is the one best adapted to expressing the venture capitalist's preferences, knowledge and experience concerning the quality of the firms and their final evaluation. A detailed presentation of the multicriteria method and of the results of its application in the two venture capital firms IDI and SIPAREX can be found in [80].
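Evaluating a firm with the IDI model amounts to a weighted sum of its marginal utilities. The sketch below assumes, as is usual for normalized additive models and consistent with the weights summing to 1, that each marginal utility $u_i(g_i)$ lies between 0 and 1. The piecewise-linear form of the marginal utility (the form used by UTA), its breakpoints and the firm's values in the example are hypothetical, since the estimated IDI marginal utilities are not reproduced in this article.

```python
from typing import Sequence

# Weights p_1, ..., p_12 of the IDI global model (they sum to 1).
IDI_WEIGHTS = [0.008, 0.072, 0.006, 0.197, 0.105, 0.232,
               0.009, 0.094, 0.047, 0.071, 0.097, 0.062]


def piecewise_linear(x: float, breakpoints: Sequence[float], utilities: Sequence[float]) -> float:
    """Piecewise-linear marginal utility through (breakpoints[k], utilities[k]); breakpoints increasing."""
    if x <= breakpoints[0]:
        return utilities[0]
    for k in range(1, len(breakpoints)):
        if x <= breakpoints[k]:
            x0, x1 = breakpoints[k - 1], breakpoints[k]
            u0, u1 = utilities[k - 1], utilities[k]
            return u0 + (u1 - u0) * (x - x0) / (x1 - x0)
    return utilities[-1]


def global_utility(marginals: Sequence[float], weights: Sequence[float] = IDI_WEIGHTS) -> float:
    """u(g) = sum_i p_i u_i(g_i), with each marginal utility assumed normalized to [0, 1]."""
    if len(marginals) != len(weights):
        raise ValueError("one marginal utility per criterion is required")
    if any(not 0.0 <= m <= 1.0 for m in marginals):
        raise ValueError("marginal utilities must lie in [0, 1]")
    return sum(p * m for p, m in zip(weights, marginals))


if __name__ == "__main__":
    # Hypothetical marginal utility for g4 (supplier credit in days).
    u4 = piecewise_linear(45.0, [0.0, 30.0, 60.0, 90.0], [0.0, 0.4, 0.8, 1.0])
    firm = [0.5, 0.6, 0.5, u4, 0.7, 0.9, 0.3, 0.5, 0.6, 0.8, 0.4, 0.2]  # hypothetical firm
    print(round(global_utility(firm), 3))
```

In the IDI model, g6 (quality of management, 0.232) and g4 (supplier credit, 0.197) together carry 0.429 of the total weight, so a firm's global score is driven mostly by these two criteria.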


The Business Failure Risk 


According to a general definition, failure is the situation in which a firm cannot pay its lenders, preferred stock shareholders, suppliers, etc., or a bill is overdrawn, or the firm is bankrupt according to law. Today there is a complete arsenal of methods for evaluating business failure risk [16]. Since the late 1980s, methods closer to a qualitative definition of business failure have been developed. These are multicriteria methods, which present undeniable advantages for the evaluation of business failure risk [84].


Study Context 


The study concerns the evaluation of the failure risk of firms financed by a Greek industrial development bank. This bank finances the development of Greek firms with equity and long-term credit, and contributes to the renovation of industrial and commercial firms at the national and regional levels. As in the previous case of venture capital investment, there is no model able to help the bank's credit managers in financing firms.


Multicriteria Method and Results 


The ELECTRE TRI method, which is particularly suitable for multicriteria sorting problems, was proposed for the evaluation of business failure risk. It belongs to the third category of MCDA methods, the outranking relations approach [61,75]. Starting from a finite set of actions (here, firms) evaluated on quantitative and/or qualitative criteria and from a set of previously defined categories (delimited by reference actions or reference profiles), ELECTRE TRI proposes two different assignment procedures which classify every action into one of these categories. ELECTRE TRI thus consists in establishing an outranking relation between the actions to be assigned and the reference profiles. Any differences between the two assignment procedures, the pessimistic and the optimistic one, arise from situations of incomparability between an action and one or several reference profiles [17,75].
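The difference between the pessimistic and the optimistic assignment procedures can be seen in a deliberately simplified Python sketch. It assumes crisp outranking based on a weighted concordance level `lam` alone (no indifference, preference or veto thresholds, no credibility index), all criteria maximized, and hypothetical profiles and firms; categories are numbered from C1 (worst) upwards, with `profiles[0]` the boundary between C1 and C2.

```python
from typing import Sequence

Vector = Sequence[float]


def concordance(x: Vector, y: Vector, weights: Vector) -> float:
    """Weighted share of criteria on which x is at least as good as y (all criteria maximized)."""
    return sum(w for gx, gy, w in zip(x, y, weights) if gx >= gy) / sum(weights)


def outranks(x: Vector, y: Vector, weights: Vector, lam: float) -> bool:
    """Crisp outranking x S y: concordance reaches the cutting level lam (no veto)."""
    return concordance(x, y, weights) >= lam


def pessimistic(a: Vector, profiles: Sequence[Vector], weights: Vector, lam: float) -> int:
    """Compare a with profiles from best to worst; the first b_h with a S b_h puts a in C_{h+1}."""
    for h in range(len(profiles), 0, -1):
        if outranks(a, profiles[h - 1], weights, lam):
            return h + 1
    return 1


def optimistic(a: Vector, profiles: Sequence[Vector], weights: Vector, lam: float) -> int:
    """Compare a with profiles from worst to best; the first b_h strictly preferred to a puts a in C_h."""
    for h in range(1, len(profiles) + 1):
        b = profiles[h - 1]
        if outranks(b, a, weights, lam) and not outranks(a, b, weights, lam):
            return h
    return len(profiles) + 1


if __name__ == "__main__":
    weights, lam = [0.25, 0.25, 0.25, 0.25], 0.75
    profiles = [[3, 3, 3, 3], [6, 6, 6, 6]]  # b_1 separates C1/C2, b_2 separates C2/C3
    for firm in ([7, 7, 7, 2], [4, 5, 4, 4], [9, 9, 1, 1]):
        print(firm, pessimistic(firm, profiles, weights, lam), optimistic(firm, profiles, weights, lam))
```

The first two hypothetical firms receive the same category under both rules (C3 and C2). The third is strong on two criteria and weak on two, so it is incomparable with both profiles: the pessimistic rule places it in C1 and the optimistic rule in C3, the kind of divergence that incomparability produces.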

For the evaluation of business failure risk, the credit managers of the Greek bank determined three risk categories:

C1) the failed firms (9 in number);
C2) the risky firms: an uncertain category of firms to be studied further (10 in number);
C3) the healthy firms (20 in number).


These 39 firms were evaluated on seven criteria: five financial ratios and two strategic criteria. The criteria are the following:

x1) Earnings before interest and taxes/Total assets;
x2) Net income/Stockholders' equity;
x3) Total debts/Total assets;
x4) Financial expenses/Sales;
x5) Administrative and general expenses/Sales;
x6) Managers' work experience;
x7) Market niche/position.


From the reference profiles and the discrimination thresholds (the preference model established by the bank's credit managers), ELECTRE TRI achieved good classification rates, of the order of 82.05% and 89.74% for the optimistic and the pessimistic procedures respectively. The pessimistic procedure gave better results and proved better suited to the problem of evaluating business failure risk (it produced no serious misclassifications of the type C1 → C3 or C3 → C1). For a detailed presentation of the multicriteria method and the results, see [84].

Concerning other financial problems which present a multicriteria character and to which an MCDA method has been applied, a (non-exhaustive) list of published works can be given:
• Acquisitions of firms: [68].
• Bankruptcy risk: [1,17,67,78,79].
• Country risk: [7,11,12,49,52,70].
• Evaluation of the performance of organizations:
  – Insurance: [45].
  – Banks: [43,44,85].
  – Firms: [4,15,30,39,42,66,87,88].
• Financial planning: [20,22,73].
• Venture capital: [50,56,65].


Concluding Remarks 


This article has shown the contribution of MCDA to the solution of some financial decision problems (e.g. venture capital, business failure risk, investment choice, portfolio management). In the past, all these problems were approached with financial theory within a very narrow framework, that of optimization. Some researchers took advantage of the optimization character of these problems to propose operational research techniques (i.e. classical or monocriteria modeling) for their solution. The use of MCDA methods provides many advantages in financial management, among which one could mention the following:

• the possibility of structuring complex evaluation problems;

• the introduction of both quantitative (e.g. financial ratios) and qualitative criteria into the evaluation process;

• transparency in the evaluation, allowing sound argumentation of financial decisions;

• the introduction of sophisticated scientific methods into the field of financial management.

In conclusion, MCDA methods seem to have a promising future because they offer a highly methodological and realistic framework for decision problems.


See also 


> Bi-objective Assignment Problem 

> Competitive Ratio for Portfolio Management 

> Decision Support Systems with Multiple Criteria 

> Estimating Data for Multicriteria Decision Making 
Problems: Optimization Techniques 

> Financial Optimization 

> Fuzzy Multi-objective Linear Programming 

> Multicriteria Sorting Methods 

> Multi-objective Combinatorial Optimization 

> Multi-objective Integer Linear Programming 

> Multi-objective Optimization and Decision Support 
Systems 

> Multi-objective Optimization: Interaction of Design 
and Control 

> Multi-objective Optimization: Interactive Methods 
for Preference Value Functions 

> Multi-objective Optimization: Lagrange Duality 

> Multi-objective Optimization: Pareto Optimal 
Solutions, Properties 




> Multiple Objective Programming Support 

> Outranking Methods 

> Portfolio Selection and Multicriteria Analysis 
> Preference Disaggregation 

> Preference Disaggregation Approach: Basic Features, Examples From Financial Decision Making


> Preference Modeling 
> Robust Optimization 
> Semi-infinite Programming and Applications in Finance


References 


1. Andenmatten A (1995) Evaluation du risque de défaillance des émetteurs d'obligations: Une approche par l'aide multicritère à la décision. Press. Polytechniques Univ. Romandes
2. Ashford RW, Berry RH, Dyson RG (1988) Operational research and financial management. Europ J Oper Res 36:143-152
3. Baumol WJ (1959) Economic theory and operations analysis. Prentice-Hall, Englewood Cliffs, NJ
4. Bergeron M, Martel JM, Twarabimenye P (1996) The evaluation of corporate loan applications based on the MCDA. J Euro-Asian Management 2(2):16-46
5. Bhaskar K (1979) A multiple objective approach to capital budgeting. Accounting and Business Res Winter, pp 25-46
6. Bhaskar K, McNamee P (1983) Multiple objectives in accounting and finance. J Business Finance and Accounting 10(4):595-621
7. Clei J (1994) La méthodologie d'analyse de la Coface. Banque Stratégie 109:5-6
8. Colasse B (1993) Gestion financière de l'entreprise. Press. Univ. France, Paris
9. Colson G, De Bruyn Ch (1989) An integrated multiobjective portfolio management system. Math Comput Modelling 12(10-11):1359-1381
10. Colson G, Zeleny M (1979) Uncertain prospects ranking and portfolio analysis under the condition of partial information. Math Syst in Economics 44
11. Cook WD, Hebner KJ (1993) A multicriteria approach to country risk evaluation: With an example employing Japanese data. Internat Rev Economics and Finance 2(4):327-438
12. Cosset JC, Siskos Y, Zopounidis C (1995) Evaluating country risk: A decision support approach. Global Finance J 3(1):79-95
13. Cyert RM, March JG (1963) A behavioral theory of the firm. Prentice-Hall, Englewood Cliffs, NJ
14. Danila N (1980) Méthodologie d'aide à la décision dans le cas d'investissements fort dépendants. Thèse de Doctorat de 3e Cycle, UER Sci des Organisations, Univ Paris-Dauphine
15. Diakoulaki D, Mavrotas G, Papagyannakis L (1992) A multicriteria approach for evaluating the performance of industrial firms. OMEGA Internat J Management Sci 20(4):467-474
16. Dimitras AI, Zanakis SH, Zopounidis C (1996) A survey of business failures with an emphasis on prediction methods and industrial applications. Europ J Oper Res 90:487-513
17. Dimitras AI, Zopounidis C, Hurson CH (1995) A multicriteria decision aid method for the assessment of business failure risk. Found Computing and Decision Sci 20(2):99-112
18. Dixon R (1991) Venture capitalists and the appraisal of investments. OMEGA Internat J Management Sci 19(5):333-344
19. Ekeland I (1993) Finance: Un nouveau domaine des mathématiques appliquées. Revue Franc de Gestion, pp 44-48
20. Eom HB, Lee SM, Snyder ChA, Ford FN (1987/8) A multiple criteria decision support system for global financial planning. J Management Information Systems 4(3):94-113
21. Evrard Y, Zisswiller R (1982) Une analyse des décisions d'investissement fondée sur les modèles de choix multi-attributs. Finance 3(1):51-68
22. Goedhart MH, Spronk J (1995) Financial planning with fractional goals. Europ J Oper Res 82:111-124
23. Hertz DB (1964) Risk analysis in capital investment. Harvard Business Rev 42:95-106
24. Hillier FS (1963) The derivation of probabilistic information for the evaluation of risky investments. Managem Sci 9:443-457
25. Ho PC, Paulson AS (1980) Portfolio selection via factor analysis. J Portfolio Management, pp 27-30
26. Hoban JP (1976) Characteristics of venture capital investments. Unpublished Doctoral Diss, Univ Utah
27. Hui TK, Kwan E (1994) Internat. portfolio diversification: A factor analysis approach. OMEGA Internat J Management Sci 22(3):263-267
28. Hurson CH, Zopounidis C (1995) On the use of multicriteria decision aid methods to portfolio selection. J Euro-Asian Management 1(2):69-94
29. Hurson CH, Zopounidis C (1997) Gestion de portefeuille et analyse multicritère. Economica, Paris
30. Jablonsky J (1993) Multicriteria evaluation of clients in financial houses. Central Europ J Oper Res and Economics 3(2):257-264
31. Jacquet-Lagrèze E, Siskos J (1982) Assessing a set of additive utility functions for multicriteria decision making: The UTA method. Europ J Oper Res 10:151-164
32. Jacquet-Lagrèze E, Siskos J (1983) Méthodes de décision multicritère. Ed. Hommes et Techniques
33. Jacquillat B (1972) Les modèles d'évaluation et de sélection des valeurs mobilières: Panorama des recherches américaines. Analyse Financière 11(4):68-88
34. Keeney RL, Raiffa H (1976) Decisions with multiple objectives: Preferences and value tradeoffs. Wiley, New York




35. Khorramshahgol R, Okoruwa AA (1994) A goal programming approach to investment decisions: A case study of fund allocation among different shopping malls. Europ J Oper Res 73:17-22
36. Khoury NT, Martel JM (1993) Nouvelles orientations dans l'étude des marchés financiers: Asymétrie d'information et analyse multicritère. L'Actualité Economique, Revue d'Analyse Economique 69(1):5-7
37. Khoury NT, Martel JM, Veilleux M (1993) Méthode multicritère de sélection de portefeuilles indiciels internationaux. L'Actualité Economique, Revue d'Analyse Economique 69(1):171-190
38. Kivijarvi H, Tuominen M (1992) A decision support system for semistructured strategic decisions: A multi-tool method for evaluating intangible investments. Revue des Systèmes de Décision 1:353-376
39. Lee H, Kwak W, Han I (1995) Developing a business performance evaluation system: An analytic hierarchical model. The Engineering Economist 30(4):343-357
40. Lee SM, Chesser DL (1980) Goal programming for portfolio selection. J Portfolio Management, pp 22-26
41. Lin WT (1978) Multiple objective budgeting models: A simulation. Accounting Rev LIII(1):61-76
42. Mareschal B, Brans JP (1993) BANKADVISER: Un système interactif multicritère pour l'évaluation financière des entreprises à l'aide des méthodes PROMETHEE. L'Actualité Economique, Revue d'Analyse Economique 69(1):191-205
43. Mareschal B, Mertens D (1990) Evaluation financière par la méthode multicritère GAIA: Application au secteur bancaire belge. Revue de la Banque 6:317-329
44. Mareschal B, Mertens D (1992) BANKS: A multicriteria, PROMETHEE-based, decision support system for the evaluation of the international banking sector. Revue des Systèmes de Décision 1(2-3):175-189
45. Mareschal B, Mertens D (1993) Evaluation multicritère par la méthode multicritère GAIA: Application au secteur de l'assurance en Belgique. L'Actualité Economique, Revue d'Analyse Economique 69(1):206-228
46. Markowitz H (1952) Portfolio selection. J Finance, pp 77-91
47. Martel JM, Khoury NT, Bergeron M (1988) An application of a multicriteria approach to portfolio comparisons. J Oper Res Soc 39(7):617-628
48. Martel JM, Khoury NT, M'zali B (1991) Comparaison performance-taille des fonds mutuels par une analyse multicritère. L'Actualité Economique, Revue d'Analyse Economique 67(3):306-324
49. Mondt K, Despontin M (1986) Evaluation of country risk using multicriteria analysis. Paper presented at the EURO VIII Conf
50. Muzyka D, Birley S, Leleux B (1996) Trade-offs in the investment decisions of Europ. venture capitalists. J Business Venturing 11:273-287
51. Myers SC, Pogue GA (1974) A programming approach to corporate financial management. J Finance 29:579-599


52. Oral M, Kettani O, Cosset JC, Daouas M (1992) An estimation model for country risk rating. Internat J Forecasting 8:583-593
53. Pardalos PM, Siskos Y, Zopounidis C (1995) Advances in multicriteria analysis. Kluwer, Dordrecht
54. Poindexter JB (1976) The efficiency of financial markets: The venture capital case. Unpublished Doctoral Diss New York Univ
55. Ribarovic Z, Mladineo N (1987) Application of multicriterional analysis to the ranking and evaluation of the investment programmes in the ready mixed concrete industry. Engin Costs and Production Economics 12:367-374
56. Riquelme H, Rickards T (1992) Hybrid conjoint analysis: An estimation probe in new venture decisions. J Business Venturing 7:505-518
57. Ross SA (1976) The arbitrage theory of capital asset pricing. J Econom Theory, pp 343-362
58. Roy B (1985) Méthodologie multicritère d'aide à la décision. Economica, Paris
59. Roy B (1988) Des critères multiples en recherche opérationnelle: Pourquoi? In: Rand GK (ed) Operational Research '87. Elsevier, Amsterdam, pp 829-842
60. Roy B (1996) Multicriteria methodology for decision aiding. Kluwer, Dordrecht
61. Roy B, Bouyssou D (1993) Aide multicritère à la décision: Méthodes et cas. Economica, Paris
62. Saaty TL, Rogers PC, Pell R (1980) Portfolio selection through hierarchies. J Portfolio Management Spring:16-21
63. Scharlig A (1996) Pratiquer ELECTRE et PROMETHEE. Press. Polytechniques Univ. Romandes
64. Simon HA (1957) Models of man. Wiley, New York
65. Siskos J, Zopounidis C (1987) The evaluation criteria of the venture capital investment activity: An interactive assessment. Europ J Oper Res 31:304-313
66. Siskos Y, Zopounidis C, Pouliezos A (1994) An integrated DSS for financing firms by an industrial development bank in Greece. Decision Support Systems 12:151-168
67. Slowinski R, Zopounidis C (1995) Application of the rough set approach to evaluation of bankruptcy risk. Internat J Intelligent Systems in Accounting, Finance, and Management 4:27-41
68. Slowinski R, Zopounidis C, Dimitras AI (1997) Prediction of company acquisition in Greece by means of the rough set approach. Europ J Oper Res 100:1-15
69. Spronk J (1981) Interactive multiple goal programming application to financial planning. Martinus Nijhoff, Boston
70. Tang JCS, Espinal CG (1989) A model to assess country risk. OMEGA Internat J Management Sci 17(4):363-367
71. Tyebjee TT, Bruno AV (1984) A model of venture capitalist investment activity. Managem Sci 30(9):1051-1066
72. Vincke Ph (1992) Multicriteria decision-aid. Wiley, New York
73. Vinso JD (1982) Financial planning for the multinational corporation with multiple goals. J Internat Business Stud, pp 43-58




74. Williamson OE (1964) The economics of discretionary behavior: Managerial objectives in a theory of the firm. Prentice-Hall, Englewood Cliffs, NJ
75. Yu W (1992) Aide multicritère à la décision dans le cadre de la problématique du tri: Concepts, méthodes et applications. Thèse de Doctorat, Univ Paris-Dauphine
76. Zeleny M (1977) Multidimensional measure of risk: The prospect ranking vector. In: Zionts S (ed) Multiple Criteria Problem Solving. Springer, Berlin, pp 529-548
77. Zeleny M (1982) Multiple criteria decision making. McGraw-Hill, New York
78. Zollinger M (1982) L’analyse multicritère et le risque de crédit aux entreprises. Revue Franc de Gestion Jan.-Fév.:56-66
79. Zopounidis C (1987) A multicriteria decision-making methodology for the evaluation of the risk of failure and an application. Found Control Eng 12(1):45-67
80. Zopounidis C (1990) La gestion du capital-risque. Economica, Paris
81. Zopounidis C (1993) A multicriteria methodology for the evaluation and ranking of investment projects. Bull Greek Banks Assoc 10(39-40):22-28. (In Greek)
82. Zopounidis C (1993) On the use of the MINORA multicriteria decision aiding system to portfolio selection and management. J Inform Sci and Techn 2(2):150-156
83. Zopounidis C (1994) Venture capital modelling: Evaluation criteria for the appraisal investments. The Financier: ACMT 1(2):54-64
84. Zopounidis C (1995) Evaluation du risque de défaillance de l’entreprise: Méthodes et cas d’application. Economica, Paris
85. Zopounidis C, Despotis DK, Stavropoulou E (1995) Multiattribute evaluation of Greek banking performance. Applied Stochastic Models and Data Analysis 11(1):97-107
86. Zopounidis C, Godefroid M, Hurson Ch (1995) Designing a multicriteria decision support system for portfolio selection and management. In: Janssen J, Skiadas CH, Zopounidis C (eds) Advances in Stochastic Modelling and Data Analysis. Kluwer, Dordrecht, pp 261-292
87. Zopounidis C, Matsatsinis NF, Doumpos M (1996) Developing a multicriteria knowledge-based decision support system for the assessment of corporate performance and viability: The FINEVA system. Fuzzy Economic Rev 1(2):35-53
88. Zopounidis C, Pouliezos A, Yannacopoulos D (1992) Designing a DSS for the assessment of company performance and viability. Comput Sci in Economics and Management 5:41-56


Financial Equilibrium 


ANNA NAGURNEY 
University Massachusetts, Amherst, USA 


MSC2000: 91B50 


Article Outline 


Keywords 

A Multi-Sector, Multi-Instrument Financial 
Equilibrium Model 

Optimality Conditions 

Economic System Conditions 

See also 

References 


Keywords 


Efficient frontier; Risk-free asset; Market portfolio; 
Option; Perfect competition; Portfolio optimization; 
Variational inequality formulation 


Finance is concerned with the study of capital flows over space and time. The theory of financial economics draws on many different theories, among which the theories of finance and economics, mathematical programming, and utility theory are credited with the biggest contributions.

The current state of modern financial economic theory is based upon the fundamental contributions of economists in the 1950s. Here we review some of the major developments. For a more complete historical account, see [32].

The first major breakthrough was by K. Arrow and G. Debreu, who, in a series of publications (cf. [1,2,4,12,13]), introduced an important extension to the existing economic theory. Their contributions brought competitive equilibrium theory to a new level and allowed for the development of modern economic and finance theory. Specifically, Arrow and Debreu applied the techniques of convexity and fixed point theory to a model built on the neoclassical economic foundations of market clearing, uncertainty, and individual rationality, and then derived new fundamental economic properties from these models (e.g., [3,14]).

F. Modigliani and M. Miller [28], in turn, showed that the capital structure of a firm, that is, the financial framework of the firm, usually measured by the debt-to-equity ratio, does not affect the value of the firm. In their work, for the first time, the idea of financial arbitrage was used, by arguing that any investor can use riskless arbitrage to offset the financial structure of a firm. Their work serves as the basis for most of the research on capital structure.

Another theoretical breakthrough came in 1952 from H.M. Markowitz, the founder of modern portfolio theory. Markowitz [25] proposed that one of the principal objectives of investors, in addition to maximizing the returns of their portfolios, is to diversify away as much risk as possible. He argued that investors choose assets in such a way that the risk of their portfolio matches their risk preferences. He suggested that individuals who cannot bear risk will invest in assets with low risk, whereas people more comfortable with risk will accept investments of higher risk. His work suggested that the trade-off between risk and return is distinct for each investor; however, the portfolios preferred by all investors lie on a common curve, usually called the ‘frontier of efficient portfolios’. Along this curve lie all the diversified portfolios that have the highest return for a given risk, or the lowest risk for a given return. Markowitz’s model was based on mean-variance portfolio selection, where the average and the variability of portfolio returns were determined in terms of the mean and covariance of the corresponding investments.
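To make the mean-variance idea concrete, here is a minimal Python sketch that computes a portfolio's expected return and variance from the asset means and their covariance matrix. The function names and the two-asset numbers are hypothetical illustrations, not data from the text.

```python
import math


def portfolio_mean(weights, means):
    """Expected portfolio return: sum_i w_i * mu_i."""
    return sum(w * m for w, m in zip(weights, means))


def portfolio_variance(weights, cov):
    """Portfolio variance: sum_i sum_k w_i * cov[i][k] * w_k."""
    n = len(weights)
    return sum(weights[i] * cov[i][k] * weights[k]
               for i in range(n) for k in range(n))


# Hypothetical two-asset example (illustrative numbers only).
weights = [0.5, 0.5]
means = [0.08, 0.12]
cov = [[0.04, 0.006],
       [0.006, 0.09]]  # standard deviations 0.2 and 0.3, small covariance

mu_p = portfolio_mean(weights, means)
sd_p = math.sqrt(portfolio_variance(weights, cov))
print(f"mean={mu_p:.4f}  sd={sd_p:.4f}")
```

Here the portfolio standard deviation (about 0.188) is below the weighted average of the individual standard deviations (0.25), which is the diversification effect described above.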

Many versions and extensions of Markowitz’s 
model have appeared in the literature (cf. [19], and 
the references therein). The first important simplifi- 
cation of Markowitz’s model was suggested by W.F. 
Sharpe [35], through a model known as the diagonal 
model, in which ‘the individual covariances between all 
securities are assumed to be zero’. According to this 
model, the variance-covariance matrix has zeros in all 
positions other than the diagonal. 

The most significant extension of the models by 
Markowitz [25] and Sharpe [35], was the Capital Asset 
Pricing Model (CAPM), which was based on the work 
of Sharpe [36], J. Lintner [24], and J. Mossin [29]. In 
this model the concept of a risk-free asset and market 
portfolio were introduced. A risk-free asset is an asset 
with a positive expected rate of return and a zero stan- 
dard deviation. A market portfolio, on the other hand, 
is a portfolio on the efficient frontier of the Markowitz 
model which is considered to be desirable by all investors. The CAPM assumes that all investors will select a portfolio that is a linear combination of the risk-free asset and the market portfolio, and, hence, the equilibrium prices of all assets can be expressed as a linear combination of the risk-free price and the price of the market portfolio. Since some of the assumptions governing the CAPM were not realistic (such as the absence of transaction costs), the model was extended and improved several times in the years that followed. It is, nevertheless, one of the major breakthroughs in modern economic and finance theory and forms the basis for most financial models.
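Because the risk-free asset has zero standard deviation, any mix of it with the market portfolio moves expected return and standard deviation along a straight line. The minimal sketch below uses hypothetical inputs. Allowing a fraction a > 1 in the market portfolio assumes investors can borrow at the risk-free rate, a standard CAPM assumption that the text does not state explicitly.

```python
def mix_with_risk_free(a, rf, mean_m, sd_m):
    """Portfolio with fraction a in the market portfolio and 1 - a in the
    risk-free asset (a > 1 means borrowing at the risk-free rate)."""
    mean = (1.0 - a) * rf + a * mean_m
    sd = abs(a) * sd_m  # the risk-free asset contributes no variance
    return mean, sd


rf, mean_m, sd_m = 0.02, 0.08, 0.20  # hypothetical inputs
for a in (0.25, 0.5, 1.0, 1.5):
    mean, sd = mix_with_risk_free(a, rf, mean_m, sd_m)
    # The reward per unit of risk is identical for every mix: (mean_m - rf) / sd_m.
    print(f"a={a:4.2f}  mean={mean:.3f}  sd={sd:.3f}  slope={(mean - rf) / sd:.3f}")
```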

Most of the major extensions of the CAPM occurred in the 1970s, when a series of papers either relaxed some of its assumptions or derived empirical results by applying it to a series of problems. Among the most significant contributions of that time were the extension to a multiperiod economy by R.C. Merton [27] and the consumption CAPM by D.T. Breeden [6] (which, however, failed empirically owing to the difficulty of observing and computing consumption).

The dissatisfaction with the empirical tests of the 
CAPM led to more advanced models, such as the Ar- 
bitrage Pricing Theory (APT) by S.A. Ross [34]. The 
APT’s main contribution was the inclusion of multiple 
risk factors and the generalization of the CAPM, which 
was considered to be a special case of APT with only 
a single risk factor. In particular, Ross assumed that the 
rate of return of every security can be expressed as a lin- 
ear combination of some ‘basic’ risk factors. 
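A minimal sketch of the linear pricing relation usually associated with the APT: an asset's expected return is the risk-free rate plus its factor loadings times the factor risk premia. With a single factor equal to the market excess return, this reduces to the CAPM form $E[R_i] = r_f + \beta_i (E[R_m] - r_f)$. The factors, loadings, and premia below are hypothetical.

```python
def linear_factor_return(risk_free, loadings, premia):
    """Expected return under a linear factor model:
    E[R_i] = r_f + sum_k b_ik * lambda_k,
    where b_ik are the asset's factor loadings and lambda_k the factor risk premia."""
    if len(loadings) != len(premia):
        raise ValueError("need one risk premium per factor loading")
    return risk_free + sum(b * lam for b, lam in zip(loadings, premia))


# Single factor (the market excess return): the CAPM special case.
capm_like = linear_factor_return(0.03, [1.2], [0.05])
# Two hypothetical factors, e.g. a market factor and an interest-rate factor.
two_factor = linear_factor_return(0.03, [0.8, 0.5], [0.05, 0.02])
print(capm_like, two_factor)
```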

Another major development in modern financial economic theory was the derivation of an accurate option pricing model by F. Black and M. Scholes [5], which revolutionized the pricing of financial instruments and the entire financial industry. Note that an option is, in general, the right, but not the obligation, to trade an asset for a preagreed amount of capital. If the right is not exercised by the end of a predetermined period of time, the option expires and the holder loses the money paid for holding that right. A major part of the subsequent literature focused on different approaches to, simplifications of, and variations of the Black-Scholes Model (BSM). A significant simplification of the BSM was proposed by J.C. Cox, Ross, and M. Rubinstein [11] (see [27]).
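The Cox-Ross-Rubinstein model replaces the continuous price dynamics of Black-Scholes with a recombining binomial tree. Its price converges to the Black-Scholes price as the number of steps grows. Below is a minimal sketch for a European call on a non-dividend-paying asset; the contract parameters are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def black_scholes_call(S0, K, r, sigma, T):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def crr_call(S0, K, r, sigma, T, steps):
    """Cox-Ross-Rubinstein binomial price of the same European call."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))    # up factor
    d = 1.0 / u                            # down factor (recombining tree)
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral probability of an up move
    disc = math.exp(-r * dt)
    # Payoff at expiry after j up-moves; an out-of-the-money call expires worthless.
    values = [max(S0 * u ** j * d ** (steps - j) - K, 0.0) for j in range(steps + 1)]
    # Step backwards, discounting the risk-neutral expected value at each node.
    for n in range(steps, 0, -1):
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j]) for j in range(n)]
    return values[0]


bs = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
crr = crr_call(100.0, 100.0, 0.05, 0.2, 1.0, steps=500)
print(f"Black-Scholes: {bs:.4f}   CRR (500 steps): {crr:.4f}")
```

With a few hundred steps, the binomial price agrees with the closed form to within a few cents. The tree needs only discrete up and down moves, which is what makes it a simplification.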

Furthermore, the mean-variance portfolio analysis 
that was introduced and mathematically formulated by 
Markowitz [25,26] and later simplified by the diago- 
nal model of Sharpe [35], was further extended by G.A. 
Pogue [33] and J.C. Francis [18], with the introduction 
of variance-covariance matrices for both assets and liabilities, applied to the asset-liability management of banks.

Most of the aforementioned models and theo- 
ries were subsequently extended and improved. The 
APT of Ross was refined by G. Chamberlain [7] and 
G. Connor [8], and the Black-Scholes model was 
further explored and significantly generalized (see, 
e.g., [10,15,17]). 

The majority of the literature in financial economics 
has been based on the assumption that investors can- 
not affect the prices at which they buy or sell. Each in- 
vestor is considered to be an isolated case, who tries 
to maximize his utility function, subject to the prices 
that the market provides him. All the participants in the 
economy, be they buyers or sellers, have as a goal the 
maximization of their profits and the minimization of 
their losses. The prices are derived through the market 
where investors constantly buy and sell commodities. 
The analysis of market equilibrium tries to determine 
the prices at which different products will be bought 
and sold, and also the amount of each product that each 
participant in the economy will hold in an equilibrium 
state. 

Market equilibrium analysis has its roots in the 
last half of the nineteenth century. The work of H. 
Gossen [21], W. Jevons [23], and L. Walras [39] initi- 
ated the analysis of equilibrium theory. Subsequently, 
in the 1930s the study of market equilibrium became 
more formal and solid. The work of A. Wald [37,38] 
and J.R. Hicks [22] provided, for the first time, proofs of 
different qualitative properties of the equilibrium, along 
with a detailed study of the conditions under which 
an equilibrium could be modeled and derived. Further- 
more, the work of Arrow [1] and G. Debreu [12] started 
a new era in equilibrium analysis by bringing uncer- 
tainty into equilibrium theory, which led to the current 
status of market equilibrium theory. 

The basic assumption that governs most of the ex- 
isting models that address the theory of market equi- 
librium is that of perfect competition. Perfect competi- 
tion prohibits any participant in the economy (buyer 
or seller) from having control over the prices of differ- 
ent products or over the actions of other participants. 
The price of a product is considered to be a variable, the 
value of which is determined by the combined actions 
of all the buyers and sellers. Buyers are, hence, ‘price takers’, in that they modify their holdings of a product according to the price, ignoring the effects that their behavior may have on that price. Moreover, perfect
competition assumes that all participants in the econ- 
omy have perfect information about the products avail- 
able, the current price, and the bids of a specific prod- 
uct. Furthermore, the number of the participants in the 
economy is assumed to be large enough so that the mar- 
ket activity regarding a specific product will be small 
compared to the transactions in the overall market. 

For definiteness, we present a financial equilib- 
rium model due to A. Nagurney [30] (see, also, [32], 
and the references therein). The model relaxes the 
CAPM assumptions of homogeneous expectations 
(cf. [24,29,36]), without imposing restrictions as to the 
nature of different sectors (e. g., [20]). 

The mathematical framework that is utilized to 
develop the multi-sector, multi-instrument financial 
equilibrium model is finite-dimensional variational in- 
equality theory. The methodology of finite-dimensional 
variational inequalities was first suggested for the mod- 
eling, analysis, and computation of multi-sector, multi- 
instrument financial equilibrium problems by Nagur- 
ney, J. Dong, and M. Hughes [31] and was further 
explored by Nagurney [30]. For complete references, 
qualitative results, as well as a plethora of financial 
equilibrium models and computational approaches, 
see [32]. 


A Multi-Sector, Multi-Instrument Financial 
Equilibrium Model 


Consider a single country economy with multiple in- 
struments and with multiple sectors. We let i denote 
a typical instrument, with the total number of instru- 
ments available in the economy, denoted by I. We let j 
denote a typical sector in the economy, with the num- 
ber of sectors denoted by J. 

Let $r_i$ denote the (nonnegative) price of instrument $i$, and group the prices of all the instruments into the column vector $r \in R^I_+$. Denote the volume of instrument $i$ that sector $j$ holds as an asset by $X_i^j$, and group the (nonnegative) assets in the portfolio of sector $j$ into the column vector $X^j \in R^I_+$. Further, group the assets of all sectors in the economy into the column vector $X \in R^{JI}_+$. Similarly, denote the volume of instrument $i$ that sector $j$ holds as a liability by $Y_i^j$, and group the (nonnegative) liabilities in the portfolio of sector $j$ into the column vector $Y^j \in R^I_+$. Finally, group the liabilities of all sectors in the economy into the column vector $Y \in R^{JI}_+$.

Assume that the total volume of each balance sheet 
side of each sector is exogenous. Recall that a balance 
sheet is a financial report that demonstrates the status of 
a company’s assets, liabilities, and the owner’s equity at 
a specific point of time. The left-hand side of a balance 
sheet contains the assets that a sector holds at a partic- 
ular point of time, whereas the right-hand side accom- 
modates the liabilities and owner’s equity held by that 
sector at the same point of time. According to account- 
ing principles, the sum of all assets is equal to the sum 
of all the liabilities and the owner’s equity. Moreover, 
we assume that the sectors under consideration act in 
a perfectly competitive environment. 

Since each sector’s expectations are formed by refer- 
ence to current market activity, a sector’s expected util- 
ity maximization can be written in terms of optimizing 
the current portfolio. Sectors may trade, issue, or liqui- 
date holdings in order to optimize their portfolio com- 
positions. 

We assume that each sector $j$ seeks to maximize its utility function, which we denote by $U^j(X^j, Y^j, r)$. We also assume that the utility function of every sector is concave, continuous, and twice continuously differentiable. Furthermore, the accounts of each sector must balance. We denote the total financial volume held by sector $j$ by $S^j$. Therefore, the optimization problem that each sector $j$ faces is given by:

$$
\begin{aligned}
\max\quad & U^j(X^j, Y^j, r)\\
\text{s.t.}\quad & \sum_{i=1}^{I} X_i^j = S^j,\\
& \sum_{i=1}^{I} Y_i^j = S^j,\\
& X_i^j \ge 0,\quad Y_i^j \ge 0,\quad i = 1,\dots,I,
\end{aligned}
$$

where the price vector $r$ is an exogenous vector in the optimization problem of every sector $j$, $j = 1,\dots,J$.
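We now discuss the feasible set of the sectors. For each sector $j$, $j = 1,\dots,J$, we let

$$\mathcal{X}^j \equiv \Big\{ X^j \in R^I_+ : \sum_{i=1}^{I} X_i^j = S^j \Big\}$$

denote the constraint set of its assets. Similarly, we let

$$\mathcal{Y}^j \equiv \Big\{ Y^j \in R^I_+ : \sum_{i=1}^{I} Y_i^j = S^j \Big\}$$

denote the constraint set for its liabilities. Then the feasible set for sector $j$ is the Cartesian product $\kappa^j \equiv \mathcal{X}^j \times \mathcal{Y}^j$.

Let $\mathcal{X}$ denote the feasible set for the assets of all the sectors, where

$$\mathcal{X} \equiv \mathcal{X}^1 \times \dots \times \mathcal{X}^j \times \dots \times \mathcal{X}^J.$$

Similarly, for the liabilities, let $\mathcal{Y}$ denote the feasible set of the liabilities of all the sectors, that is,

$$\mathcal{Y} \equiv \mathcal{Y}^1 \times \dots \times \mathcal{Y}^j \times \dots \times \mathcal{Y}^J.$$

Also, define $\kappa \equiv \mathcal{X} \times \mathcal{Y}$.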
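The feasible sets above are easy to test computationally. Each sector's asset and liability vectors must be nonnegative and sum to $S^j$. For the full feasible set $K = \kappa \times R^I_+$ used in the equilibrium definition, prices must also be nonnegative. A minimal sketch; the function names and the data are hypothetical.

```python
def on_scaled_simplex(v, total, tol=1e-9):
    """True if v >= 0 componentwise and sum(v) == total (up to tol)."""
    return all(x >= -tol for x in v) and abs(sum(v) - total) <= tol


def in_K(X, Y, r, S, tol=1e-9):
    """Membership test for K = kappa x R^I_+.

    X[j], Y[j]: asset and liability vectors of sector j (length I each);
    S[j]: total financial volume of sector j; r: instrument prices.
    """
    sectors_ok = all(on_scaled_simplex(X[j], S[j], tol) and on_scaled_simplex(Y[j], S[j], tol)
                     for j in range(len(S)))
    return sectors_ok and all(ri >= -tol for ri in r)


# Hypothetical economy with J = 2 sectors and I = 3 instruments.
S = [10.0, 4.0]
X = [[5.0, 3.0, 2.0], [1.0, 1.0, 2.0]]
Y = [[4.0, 4.0, 2.0], [0.0, 3.0, 1.0]]
r = [0.5, 0.0, 1.2]
n = 2 * len(S) * len(r) + len(r)  # dimension 2JI + I of the vector (X, Y, r)
print(in_K(X, Y, r, S), n)
```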

We now present the optimality conditions for a sec- 
tor’s utility maximization problem, given above. We 
then give the economic conditions determining the in- 
strument prices (in equilibrium). 


Optimality Conditions 


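Since $U^j$ is concave and the constraints are linear, the following conditions are both necessary and sufficient for an optimal portfolio for sector $j$. The vector of assets and liabilities, $(X^{j*}, Y^{j*}) \in \kappa^j$, must satisfy the following system of equalities and inequalities: for each instrument $i$, $i = 1,\dots,I$, the following Kuhn-Tucker conditions must hold at an equilibrium price vector $r^*$:

$$
\begin{aligned}
&-\frac{\partial U^j(X^{j*}, Y^{j*}, r^*)}{\partial X_i^j} - \mu_j^1 \ge 0, \qquad
-\frac{\partial U^j(X^{j*}, Y^{j*}, r^*)}{\partial Y_i^j} - \mu_j^2 \ge 0,\\
&X_i^{j*}\left[-\frac{\partial U^j(X^{j*}, Y^{j*}, r^*)}{\partial X_i^j} - \mu_j^1\right] = 0, \qquad
Y_i^{j*}\left[-\frac{\partial U^j(X^{j*}, Y^{j*}, r^*)}{\partial Y_i^j} - \mu_j^2\right] = 0,
\end{aligned}
$$

where $\mu_j^1$ and $\mu_j^2$ are the Lagrange multipliers associated with the asset and liability constraints, respectively. A similar set of equalities and inequalities holds for every other sector in the single-country economy.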
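These Kuhn-Tucker conditions say two things about a sector's holdings. Every instrument held in positive volume must have the same marginal utility, equal to $-\mu_j^1$, and no instrument that is not held may have a higher one. Below is a minimal sketch of a checker for the asset side; the liability side is identical with $\mu_j^2$. The function name and the numbers are hypothetical.

```python
def kkt_holds(holdings, marginal_utility, tol=1e-9):
    """Check the Kuhn-Tucker conditions for maximizing a concave U over
    {x >= 0, sum(x) = S}, given marginal_utility[i] = dU/dx_i at x.

    Conditions: -dU/dx_i - mu >= 0 for all i, and x_i * (-dU/dx_i - mu) = 0.
    If any x_i > 0, the only admissible multiplier is mu = min_i(-dU/dx_i).
    """
    mu = min(-g for g in marginal_utility)
    for x, g in zip(holdings, marginal_utility):
        slack = -g - mu  # nonnegative by the choice of mu
        if slack < -tol or abs(x * slack) > tol:
            return False, mu
    return True, mu


# Hypothetical data: instruments 1 and 2 are held and share the highest
# marginal utility; instrument 3 is not held and has a lower one.
print(kkt_holds([0.6, 0.4, 0.0], [1.0, 1.0, 0.5]))
# Instrument 2 is held but has a lower marginal utility than instrument 1,
# so volume should shift from instrument 2 to instrument 1.
print(kkt_holds([0.6, 0.4, 0.0], [1.0, 0.8, 0.5]))
```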
Economic System Conditions


Moreover, the economic system conditions that ensure market clearance at a positive instrument price (and a possible excess supply of the instrument at a zero price) are: for each instrument $i$, $i = 1,\dots,I$, we must have that

$$
\sum_{j=1}^{J}\left(X_i^{j*} - Y_i^{j*}\right)
\begin{cases}
= 0 & \text{if } r_i^* > 0,\\
\ge 0 & \text{if } r_i^* = 0.
\end{cases}
$$

This system of equalities and inequalities states two things. If the price of a financial instrument is positive, then the market must clear for that instrument. If the price is zero, then either there is an excess supply of that instrument in the economy or the market clears.
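The economic system conditions can be verified instrument by instrument. The minimal sketch below simply mirrors the two cases above and rejects negative prices. The function name and holdings are hypothetical.

```python
def market_conditions_hold(X, Y, r, tol=1e-9):
    """Check the economic system conditions for every instrument i:
    sum_j (X[j][i] - Y[j][i]) == 0 if r[i] > 0, and >= 0 if r[i] == 0.
    Negative prices are infeasible."""
    for i, price in enumerate(r):
        excess = sum(X[j][i] - Y[j][i] for j in range(len(X)))
        if price < -tol:
            return False
        if price > tol and abs(excess) > tol:
            return False   # a positive price requires the market to clear
        if price <= tol and excess < -tol:
            return False   # at a zero price the aggregate must still be >= 0
    return True


# Hypothetical two-sector, two-instrument holdings.
X = [[0.7, 0.3], [0.4, 0.6]]
Y = [[0.5, 0.5], [0.6, 0.4]]
print(market_conditions_hold(X, Y, [0.2, 0.0]))                        # both markets clear
print(market_conditions_hold([[0.8, 0.2], [0.4, 0.6]], Y, [0.2, 0.0]))  # instrument 1 does not clear
```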

Let $K$ be the feasible set for all the asset and liability holdings of all the sectors and all the prices of all the instruments, where $K \equiv \kappa \times R^I_+$.

Combining the above, we present the following def- 
inition of equilibrium. 


Definition 1 (financial equilibrium) A vector $(X^*, Y^*, r^*) \in K$ is an equilibrium of the single-country, multi-sector, multi-instrument financial model if and only if it satisfies the system of equalities and inequalities above, for all sectors $j$, $j = 1,\dots,J$, and for all instruments $i$, $i = 1,\dots,I$, simultaneously.


The necessary and sufficient conditions for optimal 
portfolios, along with the economic conditions for the 
instrument prices, are now utilized in obtaining the 
variational inequality formulation of the financial equi- 
librium conditions. 


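Theorem 2 (variational inequality formulation) A vector of assets and liabilities of the sectors, and instrument prices, $(X^*, Y^*, r^*) \in K$, is a financial equilibrium if and only if it satisfies the variational inequality problem:

$$
\begin{aligned}
&\sum_{j=1}^{J}\sum_{i=1}^{I}\left[-\frac{\partial U^j(X^{j*}, Y^{j*}, r^*)}{\partial X_i^j}\right]\times\left[X_i^j - X_i^{j*}\right]
+\sum_{j=1}^{J}\sum_{i=1}^{I}\left[-\frac{\partial U^j(X^{j*}, Y^{j*}, r^*)}{\partial Y_i^j}\right]\times\left[Y_i^j - Y_i^{j*}\right]\\
&\quad+\sum_{i=1}^{I}\sum_{j=1}^{J}\left[X_i^{j*} - Y_i^{j*}\right]\times\left[r_i - r_i^*\right] \ge 0,
\qquad \forall (X, Y, r) \in K.
\end{aligned}
$$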


We now put the variational inequality into standard form. We first define the $J$-dimensional column vector $U$ with components $\{U^1, \dots, U^J\}$ and let $\nabla_X U$ denote the $JI$-dimensional vector with components $\{\nabla_{X^1} U^1, \dots, \nabla_{X^J} U^J\}$, with $\nabla_{X^j} U^j$ denoting the gradient of $U^j$ with respect to the vector $X^j$. The expression $\nabla_Y U$ is defined accordingly. We let $n = 2JI + I$. We define the $n$-dimensional column vector $x \equiv (X, Y, r) \in K$ and the $n$-dimensional column vector $F(x)$ with components:

$$
F(x) \equiv \begin{pmatrix} F_X(x)\\ F_Y(x)\\ F_r(x) \end{pmatrix}
= \begin{pmatrix} -\nabla_X U(X, Y, r)\\ -\nabla_Y U(X, Y, r)\\ \sum_{j=1}^{J}\left(X^j - Y^j\right) \end{pmatrix}_{n\times 1}.
$$

Consequently, the variational inequality may be rewritten as:

• Determine $x^* \in K$ satisfying:

$$\langle F(x^*)^T, x - x^* \rangle \ge 0, \qquad \forall x \in K.$$
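Variational inequalities in this standard form can be solved numerically by projection-type methods. The sketch below applies the extragradient method of Korpelevich (also known as the modified projection method) to a small hypothetical instance. Several choices here are assumptions made for illustration and are not part of the model above: the quadratic utility $U^j = -\sum_i a_i^j (X_i^j)^2 - \sum_i b_i^j (Y_i^j)^2 + \sum_i r_i (X_i^j - Y_i^j)$, the coefficients, the starting point, and the step size. With these utilities $F$ is monotone and Lipschitz continuous, and the extragradient method is known to converge for such $F$ when the step size is small enough.

```python
def project_simplex(v, total):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = total} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - total) / k
        if uk - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def F(X, Y, r, a, b):
    """F(x) = (-grad_X U, -grad_Y U, sum_j (X^j - Y^j)) for the assumed utilities."""
    J, I = len(X), len(r)
    FX = [[2 * a[j][i] * X[j][i] - r[i] for i in range(I)] for j in range(J)]
    FY = [[2 * b[j][i] * Y[j][i] + r[i] for i in range(I)] for j in range(J)]
    Fr = [sum(X[j][i] - Y[j][i] for j in range(J)) for i in range(I)]
    return FX, FY, Fr


def projected_step(X, Y, r, G, alpha, S):
    """Return P_K(x - alpha * G), with K = (product of scaled simplices) x R^I_+."""
    GX, GY, Gr = G
    Xn = [project_simplex([x - alpha * g for x, g in zip(X[j], GX[j])], S[j]) for j in range(len(S))]
    Yn = [project_simplex([y - alpha * g for y, g in zip(Y[j], GY[j])], S[j]) for j in range(len(S))]
    rn = [max(ri - alpha * g, 0.0) for ri, g in zip(r, Gr)]
    return Xn, Yn, rn


def solve_equilibrium(a, b, S, alpha=0.1, iters=20000):
    """Extragradient iterations: a predictor step, then a corrector step that
    re-evaluates F at the predicted point."""
    J, I = len(a), len(a[0])
    X = [[S[j] / I] * I for j in range(J)]  # start from equal holdings
    Y = [[S[j] / I] * I for j in range(J)]
    r = [0.0] * I
    for _ in range(iters):
        Xb, Yb, rb = projected_step(X, Y, r, F(X, Y, r, a, b), alpha, S)  # predictor
        X, Y, r = projected_step(X, Y, r, F(Xb, Yb, rb, a, b), alpha, S)  # corrector
    return X, Y, r


# Hypothetical instance: J = 2 sectors, I = 2 instruments, S^1 = S^2 = 1.
a = [[1.0, 2.0], [2.0, 1.0]]   # asset-side curvature a_i^j
b = [[1.0, 1.0], [1.0, 2.0]]   # liability-side curvature b_i^j
X, Y, r = solve_equilibrium(a, b, S=[1.0, 1.0])
print("X =", X, "\nY =", Y, "\nr =", r)
```

For this particular instance, the equilibrium conditions can also be solved by hand. Both markets clear, all holdings are interior, and sector 1 holds $X^{1*} = (38/54, 16/54)$. Only the price difference is pinned down, $r_1^* - r_2^* = 2/9$; the common price level is not unique, because shifting both prices equally leaves every sector's optimal portfolio unchanged. The iteration count is generous rather than tuned.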

Other financial equilibrium models, including models with hedging instruments such as futures and options, as well as international financial equilibrium models, can be found in [32] and the references therein.


See also 


> Equilibrium Networks 

> Generalized Monotonicity: Applications to 
Variational Inequalities and Equilibrium Problems 

> Oligopolistic Market Equilibrium 

> Spatial Price Equilibrium 

> Traffic Network Equilibrium 

> Walrasian Price Equilibrium 
There is a great need for an integrative approach to financial analysis and planning. The globalization of financial markets and the introduction of complex products such as exotic derivatives have increased volatility and risks. Strides in computers and information technology have all but eliminated the delay between the occurrence of an event and its impact on the markets, both within the home country and internationally. The domain of financial planning provides a rich source of applications for optimization models and related tools.
Such tools as simulation, estimation, stochastic pro- 
cesses, decision support, and artificial intelligence have 
become indispensable in several domains of financial 
operations [36]. With the continued growth of com- 
plex financial instruments and an increased acceptance 
of operations research tools by practitioners, optimiza- 
tion models are positioned to play a significant role in 
financial planning. There is a wealth of literature avail- 
able regarding the role of optimization models in finan- 
cial planning. See [12,16,23,32,35,37,38]. 

The primary purpose of this article is to present 
an overview of an integrative optimization-based fi- 
nancial planning model. In financial applications, the 
planner must provide recommendations from among 
a large number of alternatives in which there is
considerable uncertainty. The financial planner must 
therefore model the decisional environment as well 
as the stochastic elements in a dynamic fashion. The 
model presented here encompasses several popular ap- 
proaches to the problem of investment strategies, in- 
cluding stochastic programs and dynamic stochastic 
control [4]. The financial planning model results in 
large stochastic optimization problems and efficient al- 
gorithms are now available for solving these nonlinear 
programs. A brief review of the various algorithms is 
also presented. 


Single-Period Models 


The most widely used methods for portfolio selection 
are based on the mean/variance approach [20]. Mean- 
variance optimization is a mathematical tool that creates a portfolio of assets with the maximum expected return for a given level of risk or with the minimum
risk for a given expected return. Over the years, a num- 
ber of researchers have extended and refined the orig- 
inal model to include transactions costs, trading size 
and turnover constraints and other practical require- 
ments [30]. Several researchers have provided efficient 
procedures for estimating the variance/covariance ma- 
trix of returns required by the model, based on factor, 
index or scenario analysis [10]. 
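Of the three estimation routes mentioned, the scenario-based one is the simplest to sketch. Given a discrete set of return scenarios and their probabilities, the mean vector and the probability-weighted covariance matrix needed by the mean-variance model follow directly. The function and the data are hypothetical.

```python
def scenario_moments(returns, probs):
    """Mean vector and covariance matrix of asset returns from a discrete
    scenario set: returns[s][i] is asset i's return in scenario s and
    probs[s] the probability of scenario s (probabilities sum to one)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to one")
    n = len(returns[0])
    mean = [sum(p * rs[i] for rs, p in zip(returns, probs)) for i in range(n)]
    cov = [[sum(p * (rs[i] - mean[i]) * (rs[k] - mean[k]) for rs, p in zip(returns, probs))
            for k in range(n)]
           for i in range(n)]
    return mean, cov


# Hypothetical two-scenario, two-asset example.
mean, cov = scenario_moments([[0.10, 0.02], [-0.05, 0.04]], [0.5, 0.5])
print(mean, cov)
```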

While mean-variance analysis provides a powerful 
framework for asset allocation, it suffers from several 
limitations. The Markowitz model treats expected re- 
turns, standard deviations, and correlations as popula- 
tion parameters. These population parameters are not 
available, and therefore statistical estimates are used. 
The estimation errors thus introduced can distort the 
optimization results and could result in major errors in 
asset allocation. 

Single-period models cannot capture long-term in- 
vestment goals. They do not have the ability to con- 
sider opportunity costs that should influence decisions 
on strategic placement of funds; investment opportu- 
nities with maturities exceeding a single period can- 
not be included; neither can the impact of anticipated 
exogenous supply/demand for funds be properly as- 
sessed [21]. Single-period models tend to produce high 
portfolio turnovers and opportunistic asset trades. They 
cannot accurately account for the effect of transaction 
costs. Purchases of asset categories with high transac- 
tion costs are disfavored unless they promise high im- 
mediate returns. Multiperiod models, properly formu- 
lated, can overcome many of these limitations. This is 
the focus of our discussion in the next section. 


Multiperiod Models 


We address financial planning over long horizons 
via multistage stochastic programming. The stochas- 
tic program brings together all major financial-related 
decisions in a single and consistent structure. It inte- 
grates investment strategies (also known as asset allo- 
cation strategies), liability decisions (e. g., borrowings) 
and savings strategies (or re-investment decisions) in 
a comprehensive fashion. As such, the system forms 
the basis for assessing and managing risks for large in- 
stitutional organizations, including banks, savings and 
loans, insurance companies, pension plans, and government entities. Several noteworthy applications of
stochastic programming for financial planning include 
the Russell-Yasuda investment system for insurance 
companies [6], the Towers Perrin investment system 
for pension plans [22], the integrated simulation and 
optimization system for the Metropolitan Life Insur- 
ance Company [35], and the integrated product man- 
agement system [12]. In each case, asset investment de- 
cisions are combined with liability choices in order to 
maximize the investor’s wealth over time. 

We describe a generalized network model for multiperiod investment planning [23]. While some real-world issues are difficult to accommodate within the network context and must be handled as general linear constraints, the network provides a visual reference for the financial planning system. We divide the entire planning horizon $T$ into two discrete time intervals $T_1$ and $T_2$, where $T_1 = \{0, \dots, \tau\}$ and $T_2 = \{\tau + 1, \dots, T\}$. The former corresponds to periods in which investment decisions are made. Period $\tau$ defines the date of the planning horizon; we focus on the investor's position at the beginning of period $\tau$. Decisions occur at the beginning of each time stage. Much flexibility exists: an active trader might see his time interval as short as minutes, whereas a pension plan advisor will be more concerned with much longer planning periods, such as the dates between annual Board of Directors meetings. The steps may also vary over time, with short intervals at the beginning of the planning period and longer intervals towards the end. $T_2$ handles the horizon at time $\tau$ by calculating economic and other factors beyond period $\tau$ up to period $T$. The investor cannot render any active decisions after the end of period $\tau$.

Asset investment categories are defined by the set $A = \{1, \dots, I\}$, with category 1 representing cash. The remaining categories can include broad investment groupings such as stocks, bonds, and real estate. The categories should track well-defined market segments. Ideally, the co-movements between pairs of asset returns would be relatively low, so that diversification can be achieved across the asset categories.

In our approach, uncertainty is represented by 
a number of distinct realizations. Each complete real- 
ization of all uncertain parameters gives rise to a sce- 
nario; we denote by S the discrete set of all scenarios. 
Several scenarios may reveal identical values for the uncertain quantities up to a certain period, i.e., they share common information history up to that period
(see Fig. 1). Scenarios that share common information 
up to a specific period must yield the same decisions 
up to that period. We will address the representation of 
the information structure through a condition known 
as nonanticipativity. 
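The information structure behind nonanticipativity can be made concrete with a small sketch. It assumes, for illustration only, that each scenario is stored as a tuple of per-period outcomes; scenarios whose first $t$ outcomes coincide share the same history and must therefore receive the same decisions at time $t$. The function name `nonanticipativity_groups` and the tuple encoding are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def nonanticipativity_groups(paths: Sequence[Tuple[str, ...]], t: int) -> List[List[int]]:
    """Group scenario indices whose realized outcomes coincide in the first t periods.

    Scenarios in the same group share an information history up to time t, so the
    nonanticipativity condition forces their time-t decisions to be equal.
    """
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for s, path in enumerate(paths):
        groups[tuple(path[:t])].append(s)  # key = observed history prefix
    return list(groups.values())


# A binary tree with two periods of "up"/"down" outcomes (four scenarios).
paths = [("u", "u"), ("u", "d"), ("d", "u"), ("d", "d")]
print(nonanticipativity_groups(paths, 0))  # [[0, 1, 2, 3]]  nothing observed yet
print(nonanticipativity_groups(paths, 1))  # [[0, 1], [2, 3]]
print(nonanticipativity_groups(paths, 2))  # [[0], [1], [2], [3]]
```

Each group with more than one member yields equality constraints of the form (7) linking its scenarios' decisions.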

We assume that the portfolio is rebalanced at the beginning of each period. Alternatively, we could simply make no transactions except to reinvest any dividends and interest (a buy-and-hold strategy). For convenience, we also assume that the cash flows are reinvested in the generating asset category and that all borrowing is done on a single-period basis.

For each $i \in A$, $t \in T_1$, and $s \in S$, we define the following parameters and decision variables.


Parameters 


• $r_{i,t}^s = 1 + \rho_{i,t}^s$, where $\rho_{i,t}^s$ is the percent return for asset $i$, time period $t$, under scenario $s$ (projected by the stochastic modeling subsystem). $\pi_s$ is the probability that scenario $s$ occurs, with $\sum_{s=1}^{S} \pi_s = 1$.

• $w_0$ is the wealth at the beginning of time period 0.

• $\sigma_{i,t}$ are the transaction costs incurred in rebalancing asset $i$ at the beginning of time period $t$ (symmetric transaction costs are assumed, i.e., the cost of selling equals the cost of buying).

• $\beta_t^s$ is the borrowing rate in period $t$ under scenario $s$.


Decision Variables


• $x_{i,t}^s$ is the amount of money in asset category $i$, in time period $t$, under scenario $s$, after rebalancing.

• $v_{i,t}^s$ is the amount of money in asset category $i$ at the beginning of time period $t$, under scenario $s$, before rebalancing.

• $w_t^s$ is the wealth at the beginning of time period $t$, under scenario $s$.

• $p_{i,t}^s$ is the amount of asset $i$ purchased for rebalancing in period $t$, under scenario $s$.

• $d_{i,t}^s$ is the amount of asset $i$ sold for rebalancing in period $t$, under scenario $s$.

• $b_t^s$ is the amount borrowed in period $t$, under scenario $s$.

With these definitions in place, we can present the 
deterministic equivalent of the stochastic asset alloca- 
tion problem. 


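Model SP

$$\max Z = \sum_{s=1}^{S} \pi_s f(w_\tau^s) \tag{1}$$

such that

$$\sum_{i \in A} x_{i,0}^s = w_0, \quad \forall s \in S, \tag{2}$$

$$\sum_{i \in A} x_{i,\tau}^s = w_\tau^s, \quad \forall s \in S, \tag{3}$$

$$v_{i,t}^s = r_{i,t-1}^s \, x_{i,t-1}^s, \quad \forall s \in S,\ t = 1, \dots, \tau,\ i \in A, \tag{4}$$

$$x_{i,t}^s = v_{i,t}^s + p_{i,t}^s (1 - \sigma_{i,t}) - d_{i,t}^s, \quad \forall s \in S,\ i \neq 1,\ t = 1, \dots, \tau, \tag{5}$$

$$x_{1,t}^s = v_{1,t}^s + \sum_{i \neq 1} d_{i,t}^s (1 - \sigma_{i,t}) - \sum_{i \neq 1} p_{i,t}^s - b_{t-1}^s (1 + \beta_{t-1}^s) + b_t^s, \quad \forall s \in S,\ t = 1, \dots, \tau, \tag{6}$$

$$x_{i,t}^s = x_{i,t}^{s'} \quad \text{for all scenarios } s, s' \text{ with identical past up to time } t. \tag{7}$$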


The generalized network model is presented in Fig. 2. The nonlinear objective function (1) can take several different forms. If the classical mean-variance function is employed, then (1) becomes

$$\max Z = \eta \, \mathrm{Mean}(w_\tau) - (1 - \eta) \, \mathrm{Var}(w_\tau),$$

where $\mathrm{Mean}(w_\tau)$ is the average total wealth and $\mathrm{Var}(w_\tau)$ is the variance of the total wealth across the scenarios at the end of period $\tau$. Parameter $\eta$ indicates the relative importance of variance as compared with the expected value. This objective leads to an efficient frontier of wealth at period $\tau$. An alternative to mean-variance is the von Neumann-Morgenstern expected utility of wealth at period $\tau$. Here, the objective becomes

$$\max Z = \sum_{s=1}^{S} \pi_s \, \mathrm{Utility}(w_\tau^s),$$

where $\mathrm{Utility}(w)$ is the von Neumann-Morgenstern utility function [15]. The two objective functions are equivalent under certain conditions on the distribution of returns and the shape of the utility function [17].
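Both objectives can be evaluated directly from the terminal wealth $w_\tau^s$ of each scenario. The sketch below assumes that Mean and Var are probability-weighted with the scenario probabilities $\pi_s$, and it uses logarithmic utility purely as an illustrative default; the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def mean_variance_objective(wealth: Sequence[float], probs: Sequence[float], eta: float) -> float:
    """eta * Mean(w_tau) - (1 - eta) * Var(w_tau), using probability-weighted moments."""
    mean = sum(p * w for p, w in zip(probs, wealth))
    var = sum(p * (w - mean) ** 2 for p, w in zip(probs, wealth))
    return eta * mean - (1.0 - eta) * var


def expected_utility_objective(wealth: Sequence[float], probs: Sequence[float],
                               utility: Callable[[float], float] = math.log) -> float:
    """sum_s pi_s * Utility(w_tau^s); log utility is only an illustrative choice."""
    return sum(p * utility(w) for p, w in zip(probs, wealth))


wealth = [110.0, 90.0]   # terminal wealth in two equally likely scenarios
probs = [0.5, 0.5]       # pi_s
print(mean_variance_objective(wealth, probs, eta=0.5))  # 0.0 (mean 100, variance 100)
print(expected_utility_objective(wealth, probs))        # slightly below log(100)
```

The expected log utility falls below $\log 100$, the utility of the mean wealth, which is how a concave utility function expresses risk aversion.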

Constraint (2) guarantees that the total initial investment equals the initial wealth. Constraint (3) represents the total wealth at the beginning of period $\tau$. This constraint can be modified to include assets, liabilities, and investment goals; the modified result is called the surplus wealth [21]. Most investors make allocation decisions without reference to liabilities or investment goals. J.M. Mulvey applies the notion of surplus wealth to the mean-variance and the expected utility models to address liabilities in the context of asset allocation strategies. Constraint (4) depicts the wealth $v_{i,t}^s$ accumulated in asset $i$ at the beginning of period $t$ before rebalancing. The flow balance constraint for all assets except cash, for all periods, is given by constraint (5). This constraint guarantees that the amount invested in period $t$ equals the net wealth for asset $i$. Constraint (6) represents the flow balance constraint for cash. Nonanticipativity constraints are represented by (7). These constraints ensure that scenarios with the same past will have identical decisions up to that period. While these constraints are numerous, solution algorithms take advantage of their simple structure.
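To make the flow balance constraints concrete, the following sketch applies (4), (5), and (6) to a single scenario over one period, given chosen purchases, sales, and borrowing. As assumptions for illustration, index 0 plays the role of cash (category 1 in the text), the cash entries of `buys` and `sells` are ignored, and all quantities are in the same currency units. The function name `rebalance_step` is hypothetical.

```python
from typing import List, Sequence, Tuple


def rebalance_step(x_prev: Sequence[float], r_prev: Sequence[float],
                   buys: Sequence[float], sells: Sequence[float], sigma: Sequence[float],
                   borrow_prev: float = 0.0, beta_prev: float = 0.0,
                   borrow: float = 0.0) -> Tuple[List[float], List[float]]:
    """Apply constraints (4)-(6) for one scenario and one period.

    Index 0 is cash. Returns (v, x): holdings before and after rebalancing.
    """
    n = len(x_prev)
    v = [r * x for r, x in zip(r_prev, x_prev)]          # (4): last period's holdings grow
    x = [0.0] * n
    for i in range(1, n):                                 # (5): non-cash categories
        x[i] = v[i] + buys[i] * (1.0 - sigma[i]) - sells[i]
    x[0] = (v[0]                                          # (6): cash balance
            + sum(sells[i] * (1.0 - sigma[i]) for i in range(1, n))  # sale proceeds net of costs
            - sum(buys[i] for i in range(1, n))                      # cash spent on purchases
            - borrow_prev * (1.0 + beta_prev)                        # repay last period's loan
            + borrow)                                                 # new single-period loan
    return v, x


v, x = rebalance_step(x_prev=[100.0, 200.0], r_prev=[1.01, 1.10],
                      buys=[0.0, 0.0], sells=[0.0, 20.0], sigma=[0.0, 0.01])
print(v)  # approximately [101.0, 220.0]
print(x)  # approximately [120.8, 200.0]
```

Selling 20 units of the second category moves 19.8 into cash after the 1% transaction cost, so total wealth drops by exactly the cost paid.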

Model SP is a split-variable formulation of the stochastic asset allocation problem. This formulation has proven successful for solving the model using techniques such as the progressive hedging algorithm of [27] and the diagonal quadratic approximation of [25]. The split-variable formulation has also been found beneficial for direct solvers that use the interior point method [19]. By substituting constraint (7) back into constraints (2) to (6), we obtain a compact formulation of the stochastic allocation problem. Constraints for this formulation exhibit a dual block diagonal structure. This formulation may be better for some direct solvers [19].

Financial Optimization, Figure 2
Generalized network model for each scenario $s \in S$


Scenario Generation 


Scenario analysis offers an effective and easily understood tool for addressing the stochastic elements in
a multistage financial model. We define a scenario as 
a single deterministic realization of all uncertainties 
over the planning horizon. Ideally, the process con- 
structs scenarios that represent the universe of possi- 
ble outcomes. This objective differs from generation of 
a single scenario, say for forecasting and trading strate- 
gies. We are interested in constructing a ‘representative’ 
set of scenarios that are both optimistic and pessimistic 
within a risk analysis framework. Such an effort was un- 
dertaken by Towers Perrin (one of the largest actuarial 
firms in the world) using a system called CAP:Link [22]. 
The system entails a cascading of a set of submod- 
els, starting with the interest rate component. Towers 
Perrin employs a version of the Brennan-Schwartz [5] 
two-factor interest rate model. The other submodels are 
driven by the interest rates and other economic factors. 
Towers Perrin has implemented the system in over 14 
countries in Europe, Asia, and North America. 

Scenario generation requires the estimation of the input parameters for the modeling of the economic factors. The ability to choose the ‘correct’ or ‘best’ set of parameters is essential if such models are to have practical value. A variety of techniques are available for estimating the economic factors required for projected returns and liabilities. See [24] and [1] for a discussion of some of these techniques. See also [8,9] for a treatment of the robustness of scenario generation.
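The mechanics of sampling scenarios can be illustrated with a deliberately simple generator. This is not the Brennan-Schwartz two-factor model used in CAP:Link. It is a one-factor, mean-reverting short-rate recursion, chosen only for brevity, and every parameter value below is hypothetical. Each sampled path is one scenario with equal probability.

```python
import math
import random
from typing import List


def generate_rate_scenarios(r0: float, kappa: float, theta: float, vol: float,
                            n_periods: int, n_scenarios: int, dt: float = 1.0,
                            seed: int = 0) -> List[List[float]]:
    """Sample short-rate paths r_0, ..., r_T from a discretized mean-reverting model.

    r_{t+1} = r_t + kappa * (theta - r_t) * dt + vol * sqrt(dt) * eps, eps ~ N(0, 1).
    Each path is one scenario; equal probabilities 1 / n_scenarios are implied.
    """
    rng = random.Random(seed)  # seeded so the scenario set is reproducible
    paths = []
    for _ in range(n_scenarios):
        r = r0
        path = [r]
        for _ in range(n_periods):
            r = r + kappa * (theta - r) * dt + vol * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            path.append(r)
        paths.append(path)
    return paths


scenarios = generate_rate_scenarios(r0=0.05, kappa=0.3, theta=0.06, vol=0.01,
                                    n_periods=5, n_scenarios=1000)
print(len(scenarios), len(scenarios[0]))  # 1000 6
```

In a cascading system of the kind described above, such rate paths would then drive the submodels for the other economic factors and asset returns.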


Solution Techniques 


In this section we review a number of algorithms avail- 
able to solve the asset allocation models. We focus on 
solutions to multistage stochastic programs possessing 
discrete-time decisions with a modest number of scenarios (typically fewer than 1,000 to 3,000) and nonlinear objective functions addressing risk aversion. The
model’s size depends on the number of decision vari- 
ables and the form of the nonanticipativity rules. If 
Model SP is selected, the model becomes a convex pro- 
gram whose size hinges on the number of scenarios that 
are placed in S. 


Direct Solvers 


The simplest approach when the objective is linear is 
to use an efficient linear programming solver. Although 
simpler to handle, LP does not represent risk aversion 
well. See [19] for a solution of the multistage asset allocation problem with a linear objective function using OB1 and MINOS. OB1 is a primal-dual interior point algorithm for solving linear programs [18]. MINOS is a nonlinear programming code that can also solve LP [28]. On a compact formulation, MINOS outperformed OB1 on several test problems. The split formulation, however, significantly reduced the time required by OB1, which then yielded the fastest solution times.

When the objective is nonlinear, a general purpose 
nonlinear programming code can be used for solution. 
However, nonlinear interior point methods have advantages over these codes. For example, in mean-variance applications, the covariance matrix can be factored to convert the mean-variance function into a separable function, at the cost of a modest increase in the number of constraints. R.J. Vanderbei and T.J. Carpenter [34] show that nonlinear interior point methods can take advantage of the separable structure despite the increase in the number of constraints. A similar transformation is possible with expected utility objectives, as discussed in [3].
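One common way to obtain such separability when returns come from scenarios is sketched below. This is an assumption for illustration, and [34] may factor the covariance differently. The idea is that $x^T \Sigma x = \sum_s \pi_s y_s^2$ with $y_s = (r_s - \bar r)^T x$. Each $y_s$ becomes a new variable tied to $x$ by one linear constraint, and the variance term becomes a sum of squares of single variables. The code checks that both forms give the same number; all function names and data are hypothetical.

```python
from typing import List, Sequence


def scenario_means(returns: Sequence[Sequence[float]], probs: Sequence[float]) -> List[float]:
    """Probability-weighted mean return of each asset across scenarios."""
    n = len(returns[0])
    return [sum(p * r[j] for p, r in zip(probs, returns)) for j in range(n)]


def variance_dense(x: Sequence[float], returns: Sequence[Sequence[float]],
                   probs: Sequence[float]) -> float:
    """Non-separable form x^T Sigma x, with the covariance matrix Sigma built explicitly."""
    m = scenario_means(returns, probs)
    n = len(x)
    sigma = [[sum(p * (r[i] - m[i]) * (r[j] - m[j]) for p, r in zip(probs, returns))
              for j in range(n)] for i in range(n)]
    return sum(x[i] * sigma[i][j] * x[j] for i in range(n) for j in range(n))


def variance_separable(x: Sequence[float], returns: Sequence[Sequence[float]],
                       probs: Sequence[float]) -> float:
    """Same variance via auxiliary quantities y_s = (r_s - mean)^T x.

    In a model, each y_s would be a new variable linked to x by one extra linear
    constraint; the variance term sum_s pi_s * y_s**2 is then separable.
    """
    m = scenario_means(returns, probs)
    y = [sum((r[j] - m[j]) * x[j] for j in range(len(x))) for r in returns]
    return sum(p * ys * ys for p, ys in zip(probs, y))


returns = [[0.10, 0.02], [-0.05, 0.03], [0.04, 0.01]]  # hypothetical scenario returns
probs = [0.3, 0.3, 0.4]
x = [0.6, 0.4]                                         # portfolio weights
print(variance_dense(x, returns, probs))
print(variance_separable(x, returns, probs))           # same value up to rounding
```

The trade-off matches the text: one extra constraint per scenario in exchange for a diagonal Hessian in the objective.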

Primal-dual interior point algorithms can be spe- 
cialized to solve nonlinear stochastic optimization 
problems. See [7] for an extension of a primal-dual in- 
terior point procedure for linear programs to the case 
of convex separable quadratic objectives. The extension 
is tested on the asset allocation problems of [26] and 
compared to MINOS. The primal-dual interior point 
method compared favorably with MINOS, especially 
for the larger test problems. In the direct solution of 
nonlinear programs via interior point methods, the primary computational step is the factorization of the normal equations matrix $ADA^T$, where $A$ is the coefficient matrix and $D$ is a diagonal matrix [18]. This factorization is typically done by means of the Cholesky ($LL^T$) method.
A major difficulty when applying these algorithms to 
stochastic optimization problems has to do with the 
sparsity structure of A. Several efficient approaches are 
now (2000) available to address the sparsity issue, the 
most recent being the tree dissection method [2]. 
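The core linear algebra step can be sketched in a few lines. The sketch forms $M = ADA^T$, factors it as $LL^T$, and solves $M y = b$ by forward and backward substitution. It is a dense, toy-sized illustration with hypothetical helper names. It ignores sparsity entirely, whereas the sparsity structure of $A$ is exactly what the specialized orderings discussed above are designed to exploit.

```python
import math
from typing import List, Sequence


def normal_matrix(A: Sequence[Sequence[float]], d: Sequence[float]) -> List[List[float]]:
    """Form M = A D A^T for a diagonal D = diag(d)."""
    m, n = len(A), len(A[0])
    return [[sum(A[i][k] * d[k] * A[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]


def cholesky(M: Sequence[Sequence[float]]) -> List[List[float]]:
    """Dense Cholesky factor L (lower triangular) with M = L L^T; M must be SPD."""
    m = len(M)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L


def cholesky_solve(L: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve L L^T y = b: forward substitution with L, then backward with L^T."""
    m = len(L)
    z = [0.0] * m
    for i in range(m):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    y = [0.0] * m
    for i in reversed(range(m)):
        y[i] = (z[i] - sum(L[k][i] * y[k] for k in range(i + 1, m))) / L[i][i]
    return y


A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]  # toy constraint matrix
d = [1.0, 2.0, 3.0]                      # diagonal of D (changes at every interior point iteration)
M = normal_matrix(A, d)                   # [[3.0, 2.0], [2.0, 5.0]]
print(cholesky_solve(cholesky(M), [5.0, 7.0]))  # approximately [1.0, 1.0]
```

Because $D$ changes at each iteration while the sparsity pattern of $ADA^T$ does not, production solvers typically compute a fill-reducing ordering once and reuse it.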

Ideas of using parallel computing for stochastic pro- 
grams have been around for quite some time [29]; [14]. 
More recently (1993), E. Rothberg [31] developed an 
extremely efficient method for carrying out sparse ma- 
trix factorization in a parallel environment. Rothberg’s 
factorization coupled with tree dissection concepts pro- 
vides some very encouraging results for stochastic pro- 
grams. Initial evidence indicates that parallel direct 
solvers will be able to handle stochastic programs with
over 10,000 scenarios within several minutes of runtime 
in a parallel environment. 


Decomposition Algorithms 


Considerable progress has been made in the design 
of efficient decomposition algorithms for solving mul- 
tistage stochastic programs. A number of decompo- 
sition algorithms are based on the augmented La- 
grangian function, such as the progressive hedging al- 
gorithm (PHA) and the diagonal quadratic approxi- 
mation (DQA). PHA applies to the variable split form 
of the multistage stochastic program. The nonantici- 
pativity constraints are placed in the objective func- 
tion as penalty and multiplier terms, and are progres- 
sively enforced by an iterative procedure. Mulvey and 
H. Vladimirou [27] compare the performance of the 
progressive hedging algorithm to alternative solution 
strategies on a set of linear and nonlinear portfolio 
management problems. The general purpose optimizer 
MINOS [28] solved these test problems in their compact
form. This is the most efficient program formulation 
for MINOS because it results in the smallest constraint 
matrix, i.e. the size of the basis is minimized. The lin- 
ear problems were also solved using the primal-dual 
interior code (OB1) of [18]. For nonlinear test cases, 
they employ an extension of the primal-dual interior 
point method to convex, separable optimization pro- 
grams [7]. The staircase formulation obtained by partial 
variable splitting is employed in these tests. On linear
problems the progressive hedging algorithm was faster 
than MINOS. It was also faster than OB1 when the 
compact form was used. Interior point outperformed 
PHA for staircase structures. On nonlinear problems, 
PHA maintains its superiority over MINOS, particu- 
larly on large test problems. The progressive hedging 
algorithm also fares well against interior point algo- 
rithm on nonlinear problems, outperforming it in sev- 
eral cases. 
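The structure of the progressive hedging algorithm can be shown on a toy problem: minimize $\sum_s \pi_s \tfrac12 q_s (x - a_s)^2$ over a single first-stage decision $x$. This problem is our own illustration, not one of the test problems of [27], and the function name is hypothetical. Each scenario keeps its own copy $x_s$. The scenario subproblems, with multiplier $w_s$ and proximal term $\tfrac{\rho}{2}(x_s - \bar x)^2$, have closed-form solutions, and the multipliers progressively enforce $x_s = \bar x$.

```python
from typing import Sequence


def progressive_hedging(q: Sequence[float], a: Sequence[float], probs: Sequence[float],
                        rho: float = 1.0, tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Progressive hedging for min_x sum_s pi_s * 0.5 * q_s * (x - a_s)**2.

    Variable splitting gives one copy x_s per scenario; nonanticipativity x_s = x_bar
    is enforced through multipliers w_s and the proximal penalty rho.
    """
    S = len(q)
    x = list(a)                                     # each scenario first solved on its own
    x_bar = sum(p * xs for p, xs in zip(probs, x))  # implementable (averaged) decision
    w = [rho * (xs - x_bar) for xs in x]
    for _ in range(max_iter):
        x_bar_old = x_bar
        # Scenario subproblem: argmin 0.5*q_s*(x - a_s)^2 + w_s*x + 0.5*rho*(x - x_bar)^2
        x = [(q[s] * a[s] - w[s] + rho * x_bar) / (q[s] + rho) for s in range(S)]
        x_bar = sum(p * xs for p, xs in zip(probs, x))
        w = [w[s] + rho * (x[s] - x_bar) for s in range(S)]  # multiplier update
        if max(abs(xs - x_bar) for xs in x) < tol and abs(x_bar - x_bar_old) < tol:
            break
    return x_bar


q, a, probs = [1.0, 2.0, 4.0], [1.0, 3.0, -2.0], [0.5, 0.3, 0.2]
print(progressive_hedging(q, a, probs))  # approximately 0.368421 (= 0.7 / 1.9)
```

The exact optimum here is $\sum_s \pi_s q_s a_s / \sum_s \pi_s q_s = 0.7/1.9$. The scenario subproblems are independent within each iteration, which is what makes the method attractive for parallel computation.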

DQA forms an augmented Lagrangian function by 
dualizing nonanticipativity constraints. The DQA al- 
gorithm approximates the Lagrangian at the current 
iterate by a quadratic and separable term [25]. The 
outer loop revises the dual variables by the method of multipliers, whereas the inner loop solves subproblems consisting of separable quadratic or convex terms. DQA is a flexible scheme that can be implemented in many ways, in particular in a parallel distributed environment. Mulvey and A. Ruszczynski [25] compare the performance of DQA with highly specialized methods for linear two-stage problems. The most successful methods found so far are based on Benders decomposition, suggested for stochastic programming in [33]. MSLiP [11] is a recent (1990) implementation of this idea, which allows for solving linear
multistage problems in a nested formulation. Mulvey 
and Ruszczynski [25] show that the specialized de- 
composition techniques MSLiP and DQA outperform 
MINOS. 
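The two-loop structure of DQA can be sketched on a two-scenario toy problem: minimize $\sum_{s=1}^{2} \pi_s \tfrac12 q_s (x_s - a_s)^2$ subject to the nonanticipativity constraint $x_1 - x_2 = 0$. In the augmented Lagrangian, the cross term of $\tfrac{\rho}{2}(x_1 - x_2)^2$ couples the scenarios. The inner loop replaces it for each scenario by a term in its own variable, with the other variable fixed at the current approximation $\tilde x$, and then moves $\tilde x$ a fraction `step` toward the new solution. The outer loop updates the multiplier $\lambda$. The step size, the tolerances, and the function name are illustrative assumptions, not the settings of [25].

```python
from typing import Sequence, Tuple


def dqa_two_scenarios(q: Sequence[float], a: Sequence[float], probs: Sequence[float],
                      rho: float = 1.0, step: float = 0.5, inner_tol: float = 1e-12,
                      outer_tol: float = 1e-10, max_outer: int = 1000,
                      max_inner: int = 10_000) -> Tuple[float, float]:
    """DQA sketch for min pi_1*0.5*q_1*(x_1-a_1)^2 + pi_2*0.5*q_2*(x_2-a_2)^2 s.t. x_1 = x_2."""
    c1, c2 = probs[0] * q[0], probs[1] * q[1]
    lam = 0.0
    xt1, xt2 = a[0], a[1]                    # current approximation x_tilde
    x1, x2 = xt1, xt2
    for _ in range(max_outer):
        for _ in range(max_inner):
            # Scenario 1: argmin c1/2 (x1-a1)^2 + lam*x1 + rho/2 (x1 - xt2)^2
            x1 = (c1 * a[0] - lam + rho * xt2) / (c1 + rho)
            # Scenario 2: argmin c2/2 (x2-a2)^2 - lam*x2 + rho/2 (xt1 - x2)^2
            x2 = (c2 * a[1] + lam + rho * xt1) / (c2 + rho)
            done = max(abs(x1 - xt1), abs(x2 - xt2)) < inner_tol
            xt1 += step * (x1 - xt1)         # move the approximation point
            xt2 += step * (x2 - xt2)
            if done:
                break
        lam += rho * (x1 - x2)               # method-of-multipliers update
        if abs(x1 - x2) < outer_tol:
            break
    return x1, x2


print(dqa_two_scenarios([1.0, 2.0], [1.0, 3.0], [0.5, 0.5]))  # both approximately 2.3333
```

At a fixed point of the inner loop, $\tilde x = x$ and the separable subproblems reproduce the exact augmented Lagrangian minimizer. The two scenario updates within an inner iteration are independent, which is the source of the parallelism mentioned above.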


Conclusions and Future Directions 


The proposed multistage financial planning model pro- 
vides a general framework for integrating all asset and 
liability decisions for a large financial entity - such as an 
insurance company, bank, or pension plan - as well as 
for individual investors. This comprehensive approach 
measures the risk and rewards of alternative invest- 
ment strategies. Without an integrative asset-liability 
model, investors are unable to properly measure risks 
to their wealth. The usual asset-only approach inadequately evaluates the impact of investments on wealth and on the achievement of investment goals. The main lesson is that
investment models must be tailored to individual cir- 
cumstances. The multistage stochastic program pro- 
vides an ideal vehicle for developing a financial plan 
that fits the investor’s needs. 

Future research should continue along several di- 
mensions. First, we must increase the size of solvable 
stochastic programs so that additional scenarios can 
be handled in a practical fashion. There is no fun- 
damental reason why we cannot address 10,000 to 
100,000 scenarios using parallel and distributed com- 
puters. Certainly, the raw computing power will be 
available. Whether direct solvers or decomposition algorithms are best is a matter for future research. There are a number of algorithmic items to explore. One is to take further advantage of the structure of the multistage stochastic program within a parallel interior-point algorithm. For instance, we can conduct the Cholesky factorization using modern sparse
matrix calculations on parallel or distributed comput- 
ers. Rothberg’s approach [31] seems to be a potential 
winner. Solving the stochastic program as quickly as 
possible will increase the chances that individual in- 
vestors and institutions will apply the models. In the 
case of decomposition methods, the sparse matrix calculations are key for techniques such as DQA which use an interior-point algorithm for solving subproblems. Any substantial progress on this issue leads to immediate gains in the decomposition algorithm. Also, the restarting issue for interior-point algorithms remains open.

Another computational issue involves generating 
scenarios. In particular, out-of-sample testing will be 
critical in order to compute valid bounds on the model 
recommendations. When it applies, dynamic stochas- 
tic control can be useful. The control system assists in 
the selection of the scenarios — for instance, by generat- 
ing importance estimates for adding (or deleting) sce- 
narios as they affect the solution to the control prob- 
lem. These scenarios should be linked to the stochas- 
tic program. Of course, embedding a stochastic pro- 
gram within a simulation system such as carried out 
in [35] to evaluate the precision of the recommenda- 
tions is possible. The approach requires large computa- 
tional resources and may be impractical. Linking sim- 
ulation and optimization models, however, will be in- 
creasingly important, as multistage stochastic programs 
become more widespread in practice. 

A third issue deals with the automatic calibration 
of scenario generation systems using a nonlinear pro- 
gram. For example, the two-factor interest rate model 
possesses seven parameters, including the correlation 
coefficient for the Wiener terms. Setting these parameters requires considerable effort. There are several competing objectives: minimizing deviations of the summary statistics from historical values; meeting expectations regarding future asset returns such as stocks and bonds; and avoiding trends that are clearly unrealistic. The estimation approaches developed in [24] and [1] address some of these issues, but more work is needed to fully understand both the modeling and the computational issues of automatic calibration of scenarios.


See also 


> Competitive Ratio for Portfolio Management 

> Financial Applications of Multicriteria Analysis 

> Portfolio Selection and Multicriteria Analysis 

> Robust Optimization 

> Semi-infinite Programming and Applications in 
Finance 
First we briefly review some facts about 2-valued discrete functions (two-atom Boolean algebras). Then we proceed with various n-valued extensions and generalizations which are not necessarily always Boolean. The most general class of systems to be discussed are PI-algebras. The logic connectives of these algebras are partial nonassociative noncommutative general algebraic groupoids [13,30].

Associative connectives, such as the various families of t-norms and t-conorms [10] which are widely used in fuzzy logics, are special instances of PI-algebra groupoid connectives. References to some applications of PI-logic algebras conclude this entry.

Although the primitives of PI-algebras are only partially defined, they are functionally complete. Hence one can represent any finite discrete function by PI-normal forms.


Definition 1 A finite discrete function of $k$ arguments $f(x_1, \dots, x_k)$ is a mapping from the $k$-fold Cartesian product of a set $A$ to $A$ itself. In symbols: $f\colon A \times \dots \times A \to A$, where $A$ is a finite set containing $n$ elements, $A = \{a_0, \dots, a_{n-1}\}$.
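A short counting step, added here for illustration, shows how quickly the number of such functions grows. Each of the $n^k$ argument tuples can be sent independently to any of the $n$ values of $A$, so

$$\left|\{\, f\colon A^k \to A \,\}\right| = n^{\,n^k}.$$

For $n = 2$ and $k = 2$ this gives $2^4 = 16$ two-argument Boolean functions. For $n = 3$ and $k = 2$ it already gives $3^9 = 19683$.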


For typographic convenience, we shall map the elements of $A$ into a finite subset of the nonnegative integers by the assignment $a_i = i$, namely $A = \{0, \dots, n-1\}$. This does not imply that the ordering of natural numbers is always relevant to our algebraic considerations. These numbers should just be considered as more convenient labels than, say, $a_i$ for the elements of a finite set $A$.


Boolean 2-Valued Logic Algebras 


A Boolean 2-valued function is a discrete function that takes its values from the two-element set {0, 1}. We can form 16 different two-argument functions on the set {0, 1}. The ten nontrivial ones, that is, those that depend on both arguments, are shown below.


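Finite Complete Systems of Many-valued Logic Algebras, Table 1
Two-argument connectives of the 2-valued logic

| x | y | ∧ | ∨ | → | ← | ≡ | ⊕ | ¬x ∧ y | x ∧ ¬y | ↓ (NOR) | ↑ (NAND) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |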
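The count of sixteen functions and ten nontrivial connectives can be checked mechanically. The sketch below enumerates every possible output column over the four input rows of Table 1 and keeps those that change when either argument is flipped. The names `ROWS` and `depends_on` are illustrative.

```python
from itertools import product
from typing import Sequence

ROWS = list(product((0, 1), repeat=2))  # (x, y) in the row order of Table 1


def depends_on(column: Sequence[int], arg: int) -> bool:
    """True if flipping argument `arg` (0 for x, 1 for y) changes the output for some row."""
    table = dict(zip(ROWS, column))
    for row in ROWS:
        flipped = list(row)
        flipped[arg] = 1 - flipped[arg]
        if table[row] != table[tuple(flipped)]:
            return True
    return False


columns = list(product((0, 1), repeat=4))  # every output column: 2**4 = 16 functions
nontrivial = [c for c in columns if depends_on(c, 0) and depends_on(c, 1)]
print(len(columns), len(nontrivial))        # 16 10
print((0, 0, 0, 1) in nontrivial, (0, 0, 1, 1) in nontrivial)  # True False (AND vs. projection x)
```

The six excluded columns are the two constants and the four one-argument functions $x$, $y$, $\neg x$, and $\neg y$.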
Some of these operations, also called logic algebra connectives, play an important role in logic. Hence they are given special names to stress their meaning and significance.

When ‘1’ is interpreted as ‘True’ and ‘0’ as ‘False’, then ∧ represents logical AND, ∨ (nonexclusive) logical OR, and ⊕ exclusive logical OR (i.e., XOR). The connective ≡ is logical equivalence, which captures the equivalence of two propositions, and → is an implication operator, which captures the validity (truth value) of the conditional ‘If _ then _’.

The connectives, together with some inference rules (e.g., modus ponens; see Checklist paradigm semantics for fuzzy logics), make up a system of classical propositional logic.

Let us recall that any Boolean function can be expressed in conjunctive normal form (CNF) or disjunctive normal form (DNF). Let us introduce the notation $x^\sigma = (x \wedge \sigma) \vee (\bar{x} \wedge \bar{\sigma})$, where $\bar{x}$ denotes the negation of $x$, and $\sigma$ is a parameter equal either to 0 or 1. Then it is obvious that

$$x^\sigma = \begin{cases} x & \text{when } \sigma = 1, \\ \bar{x} & \text{when } \sigma = 0. \end{cases}$$

$x^\sigma$ is called a literal.


Theorem 2 (Normal forms theorem) Every Boolean function $f(x_1, \dots, x_n)$ can be represented by its canonical (full) CNF and DNF normal forms.

i) The disjunctive normal form:

$$f(x_1, \dots, x_n) = \bigvee_{f(\sigma_1, \dots, \sigma_n) = 1} x_1^{\sigma_1} \wedge \dots \wedge x_n^{\sigma_n}.$$

ii) The conjunctive normal form:

$$f(x_1, \dots, x_n) = \bigwedge_{f(\sigma_1, \dots, \sigma_n) = 0} x_1^{\bar{\sigma}_1} \vee \dots \vee x_n^{\bar{\sigma}_n}.$$

A clause in a DNF consists either of a literal or of a conjunction of literals. In a CNF, on the other hand, a clause consists either of a literal or of a disjunction of literals.

Because we can express any Boolean function by formulas formed by means of the sets of connectives CNF-Cset $(= \{\wedge, \vee, \neg\})$ and DNF-Cset $(= \{\vee, \wedge, \neg\})$, we call these sets complete sets of connectives.
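Theorem 2 is constructive: both canonical forms can be read off the truth table. The sketch below builds them for an arbitrary Boolean function, using the literal $x^\sigma$ exactly as defined above, and checks that both forms reproduce the function. Clauses are represented, as an illustrative choice, as lists of (variable index, $\sigma$) pairs.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Clause = List[Tuple[int, int]]  # list of (variable index, sigma) pairs, i.e. literals x_i^sigma


def literal(x: int, sigma: int) -> int:
    """x^sigma = (x AND sigma) OR (NOT x AND NOT sigma): x if sigma = 1, NOT x if sigma = 0."""
    return (x & sigma) | ((1 - x) & (1 - sigma))


def canonical_dnf(f: Callable[..., int], n: int) -> List[Clause]:
    """One conjunction x_1^s1 AND ... AND x_n^sn for every sigma with f(sigma) = 1."""
    return [list(enumerate(s)) for s in product((0, 1), repeat=n) if f(*s) == 1]


def canonical_cnf(f: Callable[..., int], n: int) -> List[Clause]:
    """One disjunction x_1^(not s1) OR ... OR x_n^(not sn) for every sigma with f(sigma) = 0."""
    return [[(i, 1 - si) for i, si in enumerate(s)]
            for s in product((0, 1), repeat=n) if f(*s) == 0]


def eval_dnf(clauses: List[Clause], x: Sequence[int]) -> int:
    return int(any(all(literal(x[i], s) for i, s in c) for c in clauses))


def eval_cnf(clauses: List[Clause], x: Sequence[int]) -> int:
    return int(all(any(literal(x[i], s) for i, s in c) for c in clauses))


def xor3(a: int, b: int, c: int) -> int:
    return a ^ b ^ c


dnf, cnf = canonical_dnf(xor3, 3), canonical_cnf(xor3, 3)
assert all(eval_dnf(dnf, x) == eval_cnf(cnf, x) == xor3(*x) for x in product((0, 1), repeat=3))
print(len(dnf), len(cnf))  # 4 4
```

Each DNF clause is true on exactly one row where $f = 1$, and each CNF clause is false on exactly one row where $f = 0$. An empty DNF is the constant 0 and an empty CNF the constant 1, so constant functions are covered as well.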


A Repertory of Complete Many-Valued 
Logic Normal Forms 


Important structural relationships that provide the al- 
gebraic backbone of various logics are contained in 
their normal forms. It is possible to generalize from 
two-valued normal forms to many-valued normal forms in various ways. We shall discuss here one such generalization, namely the partial functionally complete Pinkava algebras (PI-algebras), which offer some interesting insights and also have significant practical value. The
systems were discovered in 1971 by V. Pinkava [24,25] 
as a significant generalization of the systems used by 
Pinkava in his previous work [21]. 


Many-Valued Families 
of the Pinkava Logic Algebras 


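Definition 3 ([30,35]) The Pinkava n-valued family of logical calculi $\mathrm{PI} = \{\otimes, \odot, \sqcup, {}', \psi\}$ consists of the partially defined connectives operating on the value set $\{0, \dots, n-1\}$:

$$v_1 \otimes v_2 = \begin{cases} 0 & \text{if } v_i = 0, \\ 1 & \text{if } v_i \neq 0 \,\&\, v_j = 1, \\ \text{undefined} & \text{otherwise}, \end{cases}$$

$$v_1 \odot v_2 = \begin{cases} 0 & \text{if } v_i = 0, \\ v_i & \text{if } v_i \neq 0 \,\&\, v_j = 1, \\ \text{undefined} & \text{otherwise}, \end{cases}$$

$$v_1 \sqcup v_2 = v_2 \sqcup v_1 = \begin{cases} v_j & \text{if } v_i = 0, \\ \text{undefined} & \text{otherwise}, \end{cases}$$

$$v' = v + 1 \pmod{n},$$

$$\psi_k(v) = \begin{cases} 1 & \text{if } v = k, \\ 0 & \text{if } v \neq k, \end{cases}$$

where $i, j \in \{1, 2\}$.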


Note that the carrier and the characteristic functions can 
also be generated in the Pinkava logic calculi by the con- 
nectives. For example, the characteristic function W can 
be generated by o, [13]. 


Theorem 4 (Complete normal forms) Any n-valued logic system that is obtained by a completion of the partially defined Pinkava connectives defined above is functionally complete, and any n-valued logic function $F$ can be expressed in the following canonical normal form:

$$F(v_1, \dots, v_k) = \bigsqcup_{j} \Big[ c_j \odot \Big( \bigotimes_{i=1}^{k} \psi_{a_{ji}}(v_i) \Big) \Big],$$

where $j$ ranges over the argument tuples $(a_{j1}, \dots, a_{jk})$ and $c_j = F(a_{j1}, \dots, a_{jk})$.
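The reason a completion of the partial connectives does not matter can be seen by evaluating the normal form directly. The sketch below encodes one reading of the partial connectives of Definition 3. That reading is our reconstruction of a garbled source, so it is an assumption, and the Python names are hypothetical. Each connective raises an error outside its domain. Evaluating the normal form never triggers that error: for a given input exactly one characteristic product equals 1, so all but one term of the outer $\sqcup$ are 0.

```python
from itertools import product
from typing import Callable


class Undefined(ValueError):
    """Raised when a partial connective is applied outside its domain."""


def otimes(a: int, b: int) -> int:
    """0 if an operand is 0; 1 if one operand is 1 and the other is nonzero."""
    if a == 0 or b == 0:
        return 0
    if a == 1 or b == 1:
        return 1
    raise Undefined((a, b))


def odot(a: int, b: int) -> int:
    """0 if an operand is 0; the other operand if one operand is 1."""
    if a == 0 or b == 0:
        return 0
    if b == 1:
        return a
    if a == 1:
        return b
    raise Undefined((a, b))


def sqcup(a: int, b: int) -> int:
    """Partial disjunction: defined only when at least one operand is 0."""
    if a == 0:
        return b
    if b == 0:
        return a
    raise Undefined((a, b))


def psi(k: int, v: int) -> int:
    """Characteristic function psi_k."""
    return 1 if v == k else 0


def normal_form(F: Callable[..., int], n: int, k: int) -> Callable[..., int]:
    """Build v -> sqcup_j [ F(a_j) odot ( otimes_i psi_{a_ji}(v_i) ) ] over all tuples a_j."""
    rows = list(product(range(n), repeat=k))

    def G(*v: int) -> int:
        result = 0
        for a in rows:
            match = 1
            for ai, vi in zip(a, v):
                match = otimes(match, psi(ai, vi))  # 1 only on the row a == v
            result = sqcup(result, odot(F(*a), match))
        return result

    return G


G = normal_form(lambda a, b: (a + b) % 3, n=3, k=2)
print(all(G(a, b) == (a + b) % 3 for a in range(3) for b in range(3)))  # True
```

Whichever values a completion assigns to the undefined cases, they are never reached here, which is consistent with the claim of Theorem 4 that every completion is functionally complete.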


General Pi-Algebras 


Families of Pi-Algebras 
and their Functionally Complete Normal Forms 


The Pinkava logic calculi can be further general- 
ized [11,13,14]. These generalizations are called Pi- 
algebras. The connectives involved are partial nonasso- 
ciative noncommutative general (algebraic) groupoids 
in their most general form. Associative connectives, such as the t-norms and t-conorms [10,44], are special instances of them (see also Boolean and fuzzy relations).


Definition 5 (Families of PI-algebras) Let PI be an algebra with carrier $P$ such that $\mathrm{PI} = (P, \odot, \oplus, \oslash, \varphi)$, where [13]:

1) $(P, \odot)$ is an arbitrary groupoid with zero $z_\odot$, without divisors of zero, and with the almost absorbing element $a_\odot$ such that $a_\odot \odot p = p \odot a_\odot = a_\odot$ for every $p \in P$, $p \neq z_\odot$.

2) $(P, \oplus)$ is an arbitrary groupoid with unit $e_\oplus$.

3) $(P, \oslash)$ is an arbitrary groupoid with a right zero $z_{r\oslash}$ and a right unit $e_{r\oslash}$.

4) $\varphi$ is a discrete cyclic shift function $\varphi\colon P \to P$ satisfying the following conditions: given a discrete cyclic order of $P$, for every $p \in P$ it holds that $p \prec \varphi(p)$, and $\varphi^0(p) = p$, $\varphi^{k+1}(p) = \varphi(\varphi^k(p))$.


In the above definition $a \prec b$ means that $a$ is a direct predecessor of $b$.


Definition 6 Let $p_1, p_2 \in P$, and let $\varphi$ be a cyclic shift function. Then the advance $\delta$ from $p_1$ to $p_2$ with respect to $\varphi$ is the least ordinal such that $\varphi^\delta(p_1) = p_2$. We write $\delta = \delta(p_1, p_2)$. The advance $\delta^*$ denotes the inverse of $\delta$.
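For a finite carrier the advance can be computed by simply iterating the shift. The sketch assumes the carrier $\{0, \dots, n-1\}$ with the Pinkava shift $\varphi(p) = p + 1 \pmod n$. It also reads the inverse advance $\delta^*$ as the number of further shifts that return to the starting point; that reading is an assumption. The function names are hypothetical.

```python
def shift(p: int, n: int) -> int:
    """Cyclic shift phi(p) = p + 1 (mod n) on the carrier {0, ..., n - 1}."""
    return (p + 1) % n


def advance(p1: int, p2: int, n: int) -> int:
    """Least delta >= 0 with phi^delta(p1) = p2, found by iterating the shift."""
    if not (0 <= p1 < n and 0 <= p2 < n):
        raise ValueError("p1 and p2 must lie in the carrier {0, ..., n - 1}")
    p, delta = p1, 0
    while p != p2:
        p, delta = shift(p, n), delta + 1
    return delta


def inverse_advance(delta: int, n: int) -> int:
    """Advance undoing delta, so that phi^(delta*)(phi^delta(p)) = p."""
    return (n - delta) % n


print(advance(2, 1, 5))                      # 4  (2 -> 3 -> 4 -> 0 -> 1)
print(inverse_advance(advance(2, 1, 5), 5))  # 1
```

On this carrier the loop always terminates within $n - 1$ steps, because repeated shifting visits every element of the cycle.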


Theorem 7 (Canonical normal forms) Let the advances $\delta_1, \delta_2$ be defined by the formulas $\delta_1 := \delta(a_\odot, e_{r\oslash})$, $\delta_2 := \delta(z_\odot, e_\oplus)$. Then any function $f$ on the carrier $P$ in a PI-algebra can be expressed in its canonical normal form:


fv. Mog 45 .) 


= © 2% [a (0 (Ost? oF [Wa,(v,)])\. 
feo 


The argument scope of the outer connective of the normal form is $f(v_1, \dots, v_k) \neq e_\oplus$. This means that the values $e_\oplus$ are omitted.


Theorem 8 (Functional completeness) Any PI-algebra is functionally complete if, given the advance $\delta_1 = \delta(a_\odot, e_{r\oslash})$, it also holds that $\delta_1 = \delta(z_\odot, z_{r\oslash})$.


Theorem 9 If the right zero $z_{r\oslash}$ is also the zero and the right unit $e_{r\oslash}$ is also the unit of the groupoid $(P, \oslash)$, then the following normal form is also functionally complete:


F (V1, Va, - +4) 
= © o* {oF (60 (OL, 6% %,()])} 
fFeq 


The Taxonomy of the PI-Algebras 
of Many-Valued Logics 


The main theoretical question that the PI-algebras answer is: “Which features of two-valued Boolean logic structures disappear, and which are preserved and carried over into the extensions and generalizations to many-valued logics?” The PI-logic algebras are partial systems that put under one roof a wide variety of families of functionally complete many-valued logical systems. Thus they offer a useful framework in which various generalizations and extensions can be carried out. They also provide a sound base for a useful classification of many-valued logics by their various properties. This approach, based on PI-normal forms, usefully complements another way of classifying the many-valued logic connectives by groups of logic transformations (see Checklist paradigm semantics for fuzzy logics).


Special Subfamilies of n-Valued PI-Systems 


Because the Pinkava connectives are only partially defined, it is possible, by imposing further restrictions on these connectives, to define subclasses of the functionally complete Pinkava systems. For example, consider the restrictions

a) (v7 Oy)" =, © v9; 

b) Of" Oy) = Oy, 

They make the subclass {o, 1, <>} functionally complete. Imposing some other restrictions, we can obtain other subclasses of functionally complete connectives. For example, the Post system, the Aizenberg-Rabinovich system, the Zhegalkin algebra, and lattice-type many-valued logic systems can be so obtained. For further details see e.g. [13,38]. A partial taxonomy of various subclasses of the Pinkava systems can be found in [13, Fig. 8.1] or in [11, p. 279, Fig. 1]. Several important subclasses of the Pinkava systems are presented in [26]. For the subclasses that are generalizations of the Sheffer function, see [32]. The systems particularly suitable for minimization appear in [33].


PI-Algebras and a New Variety 
of 2-Valued Normal Forms 


It is illuminating to look at well-known two-valued special instances of logic connectives and classify them in terms of PI-algebra connective types. This reveals that there are other canonical normal forms in addition to DNF and CNF. For instance, the partial connective $\sqcup$ offers two distinct completions: either Boolean (inclusive) OR $\vee$ or exclusive OR (nonequivalence) $\oplus$. Although the connectives $\otimes$ and $\odot$ are identical in the two-valued case, both forming Boolean $\wedge$, each extends to a distinct partial connective for $n > 2$. This is because each of these connectives plays a different role in the normal form, serving a different purpose.

In order to explore more fully the richness of Pi- 
algebras, one has to look at their taxonomy in the gen- 
eral many-valued case. For a more detailed taxonomy 
see [13, Fig. 8.1.2] or [11]. 

Two highlights emerge from this approach: 

1) Even in the simple two-valued case, the nor- 
mal forms of generalized Pi-algebras subsume not 
only the conjunctive and disjunctive normal form 
but also the implication, equivalence, exclusive- 
or and other normal forms in one unifying pat- 
tern. 

2) Two distinct general n-valued connectives may ‘col- 
lapse’ into a single connective when one sets n = 
2. Viewed the other way, a two-valued connective 
may ‘bifurcate’ into two distinct types of connec- 
tives when more than two values are used. This bi- 
furcation of structures and concepts is an interesting 
phenomenon that accompanies fuzzification of two- 
valued structures. 


Theoretical and Practical Importance 
of PI-algebras


The Requirement of Functional Completeness 


Functional completeness is of primary interest to a scientist or an engineer engaged in practical applications of many-valued logic. In such applications it is often desirable to have the means for generating all possible finite discrete functions by means of a complete set of many-valued logic connectives.

For example, it is desirable to have a set of logic gates 
that can generate any combinatorial switching circuit. In 
pattern recognizers implemented by many-valued logic 
networks, the set of basic ‘cognitive elements’ has to be 
complete; otherwise some patterns may be misclassified. Completeness is necessary in order to have the means for representing all the possible discrete functions over a finite set of elements.

Similarly, in biological, psychological, and medical models based on abstract classification of patterns
by logic nets the choice of an incomplete set of connec- 
tives as the representational base of the model might 
yield a bias towards assumptions that are not contained 
in the experimental data. For example, in ethological 
models of instincts the representation using an incom- 
plete set of connectives would represent the a priori 
assumption that certain forms of instincts do not ex- 
ist. Yet the data might contain the evidence for these, 
but this evidence is not representable and will be dis- 
carded by an unfortunate choice of the incomplete set 
of connectives. In models of neuro-psychological disor- 
ders this might cause a priori exclusion of some impair- 
ments of the substratum structures, diminishing the 
predictive usefulness of such models. 

The complexity of the normal forms as well as the 
complexity of the minimized many-valued logic (MVL) 
expressions depends on the character of the discrete 
function (i.e. the data) to be represented, the choice 
of an appropriate many-valued logic system of connec- 
tives, and the number of the discrete values of the value 
set A. Hence, only by the choice of an appropriate MVL 
system may we achieve an optimal representation in 
each specific application domain. 

The choice of a suitable system is usually an itera- 
tive process, which requires a comparative evaluation 
of several systems, performed in order to optimize the 
choice. In order to assess whether a chosen system is 
functionally complete, a set of conditions sufficient to
determine the completeness is required. Alternatively, 
a set of rules has to be given that would make it pos- 
sible to generate complete systems of required addi- 
tional properties directly. The constructive conditions 
for completeness given above for the normal forms of 
PI-algebras are such rules. 

Other conditions for completeness of the same or greater generality are less suitable for this purpose, because the number of conditions necessary to test for completeness increases rapidly with the number of values n of a many-valued logic system. E. Post was the first to give general conditions for the completeness of 2-valued logics (n = 2). These were later generalized by S.W. Jablonskij (S.V. Yablonskii) [9], J. Slupecki [45], A. Salomaa [43] and others. The most general conditions known at present are those given by I. Rosenberg [41,42], which generalize the Post conditions to the n-valued case for any finite n.

In all these later cases (unlike for the PI-algebras initiated by Pinkava), the number of conditions increases astronomically with increasing n. For n = 2 (Post) there are 5 conditions that the logic system has to satisfy. For n = 3 (Jablonskij) there are 18 conditions. For a seven-valued MVL system (n = 7), there are 7,848,984 conditions to be tested. The general formula for any finite n ≥ 2 is given by Rosenberg in [42]. This formula shows that, for large n, the number of conditions is prohibitive, so the criterion is of no practical value. On the other hand, PI-algebras can generate an infinite number of finite functionally complete systems of connectives for any finite n. This is so because the Pinkava complete sets of connectives are only partially specified, and filling in the 'blanks' with any values does not invalidate their completeness.
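For n = 2, Post's five conditions can be stated concretely: a set of Boolean connectives is functionally complete if and only if, for each of the five classes of 0-preserving, 1-preserving, self-dual, monotone and affine (linear) functions, the set contains a connective lying outside that class. The following Python sketch tests this by brute force over truth tables. The (arity, function) representation of a connective and all function names are illustrative assumptions; the sketch covers only the two-valued case and says nothing about PI-algebras.

```python
from itertools import product

# A connective is represented (by assumption) as a pair (arity, f),
# where f maps `arity` arguments from {0, 1} to a value in {0, 1}.

def preserves_zero(k, f):
    return f(*([0] * k)) == 0

def preserves_one(k, f):
    return f(*([1] * k)) == 1

def is_self_dual(k, f):
    # f(not x1, ..., not xk) == not f(x1, ..., xk) for every input
    return all(f(*(1 - v for v in x)) == 1 - f(*x)
               for x in product((0, 1), repeat=k))

def is_monotone(k, f):
    # Raising any single argument from 0 to 1 must never lower the value.
    for x in product((0, 1), repeat=k):
        for i in range(k):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(*x) > f(*y):
                    return False
    return True

def is_affine(k, f):
    # f is affine iff f(x) = c0 XOR c1*x1 XOR ... XOR ck*xk for all x.
    c0 = f(*([0] * k))
    coeffs = [f(*[1 if j == i else 0 for j in range(k)]) ^ c0 for i in range(k)]
    for x in product((0, 1), repeat=k):
        value = c0
        for c, v in zip(coeffs, x):
            value ^= c & v
        if value != f(*x):
            return False
    return True

POST_CLASSES = (preserves_zero, preserves_one, is_self_dual, is_monotone, is_affine)

def is_functionally_complete(connectives):
    """Post's criterion: for every class, some connective lies outside it."""
    return all(any(not member(k, f) for k, f in connectives)
               for member in POST_CLASSES)

NAND = (2, lambda a, b: 1 - (a & b))
AND = (2, lambda a, b: a & b)
OR = (2, lambda a, b: a | b)
NOT = (1, lambda a: 1 - a)

print(is_functionally_complete([NAND]))     # True
print(is_functionally_complete([AND, OR]))  # False: both preserve 0
```

In the same way, {AND, NOT} passes the test, while {NOT} alone fails because NOT is both self-dual and affine.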


Satisfiability Problem in Computational 
and Descriptive Complexity Theory 


Central Importance of the Satisfiability 
Problem of Boolean Formulas 
in Complexity Theory 


The main goal in the complexity theory of algorithms is to distinguish problems that can be solved efficiently from those that cannot. A computational solution to a problem is considered practically feasible if the problem belongs to the complexity class P, i.e., if it can be solved by a deterministic algorithm in time bounded by a polynomial function of the size of the input data.

The central problem of complexity theory in com- 
puter science and a major problem of contemporary 
logic and mathematics is whether the class P is equal 
to the class NP. Problems solvable by nondeterministic 
algorithms in polynomial time belong to the class NP. 

Every problem in the class NP is computationally tractable, i.e., of polynomial complexity, if and only if P = NP.

A successful proof of the conjecture that P ≠ NP would, on the other hand, show that some problems in NP (in particular, all NP-complete problems) are not computationally tractable, i.e., cannot be solved in polynomial time.

The class NP contains many practical problems that 
can be characterised by the following property: There 
is no known way to compute a solution in polynomial 
time, but there is a known way to check in polynomial 
time whether a potential (e.g. guessed) solution is an 
actual solution. 

The satisfiability problem [19,20] that concerns 
Boolean formulas [5] is closely related to the question 
about computational complexity of many other com- 
putational problems [6,7]. 

We say that a Boolean formula is satisfiable if there exists at least one assignment of values to its variables that makes it true. Deciding this question with the answer 'yes' or 'no' is called the satisfiability problem (SAT) [2]. If the Boolean formula of our concern is written in conjunctive normal form (CNF), we have the SAT-CNF problem. SAT-k-CNF is obtained by restricting SAT-CNF to Boolean formulas in k-CNF, where a k-CNF formula is composed of clauses, each of which contains at most k literals [2].
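The asymmetry described above, cheap checking of a proposed solution versus (as far as is known) expensive search, can be made concrete for SAT-CNF. In this minimal Python sketch, checking an assignment takes time linear in the size of the formula, while the naive search tries up to 2^n assignments. The signed-integer clause encoding follows the common DIMACS convention and is an assumption, not part of the text; the function names are illustrative.

```python
from itertools import product

# Clause encoding (assumed, DIMACS-style): a clause is a list of non-zero
# integers; k stands for the variable x_k and -k for its negation.

def check_assignment(cnf, assignment):
    """Polynomial-time check: does the assignment (dict var -> bool) satisfy every clause?"""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf)

def brute_force_sat(cnf):
    """Exhaustive search over all 2**n assignments; returns a model or None."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for values in product((False, True), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if check_assignment(cnf, assignment):
            return assignment
    return None

# (x1 or x2) and (not x1) and (not x2 or x3)
print(brute_force_sat([[1, 2], [-1], [-2, 3]]))  # {1: False, 2: True, 3: True}
print(brute_force_sat([[1], [-1]]))              # None
```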

It follows from Cook’s theorem [3] that the ques- 
tion whether or not P = NP is equivalent to asking 
whether there is a polynomial time deterministic algo- 
rithm (PTDA) recognizing the set of satisfiable Boolean 
propositional formulas (the SAT problem), or equiva- 
lently, a PTDA recognizing the set of propositional tau- 
tologies TAUT [19]. This demonstrates the central im- 
portance of the SAT problem for computational com- 
plexity theory. 

In 1971 S.A. Cook [3] proved that every problem $X \in \mathrm{NP}$ is polynomially Turing reducible [1,2] to the question about the complexity of TAUT-DNF, i.e., the set of propositional tautologies coded in DNF. In symbols: $X \le^{p}_{T} \text{TAUT-DNF}$.
This is related to one of the open (as of 1999) fundamental problems of logic [19]: Is there a propositional proof system P in which every tautology has a polynomial-size proof? At present it is known only [4] that there exists a propositional proof system in which every tautology has a polynomial-size proof if and only if NP = co-NP, i.e., the class NP is closed under complementation.

The relation $\le^{p}_{T}$ is a pre-order (see Boolean and fuzzy relations); hence it provides the means for comparing the relative computational difficulty of problems [1]. Because it is a pre-order rather than a partial order, distinct problems may be mutually reducible, so the relation partitions problems into equivalence classes (see Boolean and fuzzy relations).

The statement 'The SAT problem is NP-complete' is referred to as the Cook-Levin theorem in the literature. Using the reducibility relation $\le^{p}_{T}$ together with this theorem yields a useful technique for proving the NP-completeness of other problems. We say that a problem X is NP-complete [1,2] if
• $X \in \mathrm{NP}$; and
• $Y \le^{p}_{T} X$ for every problem $Y \in \mathrm{NP}$.

There is a great number of computational problems in graph and set theory whose NP-completeness can be proven by reducing the SAT problem, directly or indirectly, to each of them; examples are dominating set, vertex cover, clique, 3-SAT and 3-colorability [20]. SAT can be reduced to the clique problem, which in turn is reducible to the vertex cover problem. The vertex cover problem is reducible to the dominating set problem. Similarly, there is another chain of reductions: SAT to 3-SAT to 3-colorability. These reduction chains form part of the semilattice generated by the reducibility relation.
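The first link in the chain from SAT to clique can be sketched with the standard construction (not spelled out in the text): one vertex per literal occurrence, with edges between literals from different clauses that are not complementary. The formula is satisfiable if and only if the graph has a clique whose size equals the number of clauses. Building the graph takes polynomial time; the brute-force clique test below is exponential and only serves to check small cases. The signed-integer clause encoding and the function names are illustrative assumptions.

```python
from itertools import combinations

def cnf_to_clique(cnf):
    """Build the graph of the SAT -> clique reduction.

    Vertices are (clause index, literal) pairs; two vertices are adjacent iff
    they come from different clauses and their literals are not complementary.
    Returns (vertices, edges, k) with k = number of clauses.
    """
    vertices = [(c, lit) for c, clause in enumerate(cnf) for lit in clause]
    edges = {frozenset((u, v)) for u, v in combinations(vertices, 2)
             if u[0] != v[0] and u[1] != -v[1]}
    return vertices, edges, len(cnf)

def has_clique(vertices, edges, k):
    """Exponential brute-force test for a clique of size k (small inputs only)."""
    return any(all(frozenset((u, v)) in edges for u, v in combinations(group, 2))
               for group in combinations(vertices, k))

print(has_clique(*cnf_to_clique([[1, 2], [-1, 2], [-2, 3]])))  # True
print(has_clique(*cnf_to_clique([[1, 2], [-1], [-2]])))        # False
```

A clique of size k must take exactly one vertex per clause (vertices of the same clause are never adjacent), and its literals are pairwise consistent, so setting them true satisfies the formula; conversely, one true literal per clause in a satisfying assignment forms such a clique.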


Open Problems in the Complexity Theory 
of PI-algebras (1999) 


Computational complexity of PI-logic algebras and 
normal forms is an uncharted territory. There is an in- 
finite number of ways in which the partially specified 
but functionally complete PI-logic normal forms can be 
made fully specified, and a large variety of algebraic re- 
strictions that can be placed upon them to generate par- 
ticular fully defined systems. 

Despite their partial nature, the PI-normal forms have a well-defined length even before their algebraic properties are completely specified. Hence one may expect that they will play some role in placing an upper bound on the descriptive complexity [8] of propositional systems. This may be a promising direction of future research. It should also be noted that the generalized PI-normal forms allow for the description of transformations from lattice-based connectives to ring-based connectives. Indeed, both are special instances of PI-logic algebras (see Theorem 7 and [13]). This might help to build a bridge between methods for the analysis of algebraic propositional systems [40] and notions of descriptive complexity [8].


Applications of PI-Algebras


In addition to their theoretical significance, the functionally complete PI-systems have found a number of practical applications in various fields: in medicine, clinical behavioral sciences and neurology [12,22,23,34,36]; in data analysis and classification [37,39]; analysis of logical paradoxes [29,38]; automata theory and systems science [17,31,38]; design of MVL-switching circuits [11]; dynamic computer protection, also applicable to distributed and parallel systems [13,15,16]; logic and theorem proving [18,35]; optimization of discrete functions [27,33]; and fuzzy logics [28].


See also 


> Alternative Set Theory 

> Boolean and Fuzzy Relations 

> Checklist Paradigm Semantics for Fuzzy Logics 
> Inference of Monotone Boolean Functions 

> Optimization in Boolean Classification Problems 
> Optimization in Classifying Text Documents 
A constraint qualification (CQ) is a condition imposed on the analytical description of a given set, which may be the constraint set of an optimization problem. CQs are essential for establishing optimality conditions, but they also play a crucial role in duality theory and perturbation analysis for optimization problems, and in the study of error bounds and stability for algebraic systems such as systems of equations and/or inequalities. The term first order constraint qualification is used if a CQ is formulated in terms of first order derivatives or generalized derivatives of the data functions defining the (constraint) set, or if it is related to optimality or stability conditions involving first order terms of the original data. Roughly speaking, first order constraint qualifications establish a link between the geometry of the given set and certain kinds of first order approximations of the analytical data.

A canonical form of constraints for which constraint qualifications have been studied is, for example, the constraint system of a mathematical programming problem, i.e.,

$$g_i(x) \le 0,\quad i \in I = \{1,\dots,m\}, \qquad g_j(x) = 0,\quad j \in J = \{m+1,\dots,r\}, \tag{1}$$

where $g_i\colon \mathbb{R}^n \to \mathbb{R}$ $(i = 1,\dots,r)$ are given functions, possibly restricted to some subset $X \subset \mathbb{R}^n$.
Another canonical form is given by abstract constraints,

$$G(x) \in C, \tag{2}$$

where G maps a Banach space X into a Banach space Y, and C is a nonempty closed convex cone in Y. Many of the results reported below hold similarly (with some technical modifications) under the weaker assumption that $C \subset Y$ is an arbitrary closed convex set [4,5,8,40,42]. The inclusion (2) can also represent constraints of abstract optimal control problems, semi-infinite programs, semidefinite optimization problems, and others, see, e.g., [5]. Obviously, (1) is a special case of (2): put $X = \mathbb{R}^n$, $Y = \mathbb{R}^r$, $C = \{y : y_i \le 0,\ i \in I;\ y_j = 0,\ j \in J\}$ and $G = (g_1,\dots,g_r)$.

The notion 'constraint qualification' was introduced by H.W. Kuhn and A.W. Tucker [22] in developing the theory of nonlinear programming. However, under the name regularity conditions, description-dependent assumptions were already known in classical variational and extremum problems. To illustrate the meaning of first order CQs, let us consider a simple example:


Example 1

$$\min\ \tfrac{1}{2}x^2 + y \quad \text{s.t.}\quad x - y = 0.$$
The classical Lagrange conditions $x + u = 0$, $1 - u = 0$, $x - y = 0$ are necessary (and, in this example, sufficient) for the optimality of the point $(x, y) = (-1, -1)$ with associated multiplier $u = 1$. On the other hand, if the constraint is equivalently written as

$$\tfrac{1}{2}(x - y)^2 = 0,$$

then the corresponding Lagrange conditions become $x + u(x - y) = 0$, $1 - u(x - y) = 0$, $(x - y)^2/2 = 0$, which are contradictory: the third condition forces $x = y$, and then the second reads $1 = 0$. In the first description of the feasible set, the linearization adequately represents the possibilities for variation near $(x, y)$; in the second description, it does not.
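To see why the second description fails, compare the gradients of the two constraint functions of Example 1, $h_1(x,y) = x - y$ and $h_2(x,y) = \tfrac12 (x-y)^2$:

$$
Dh_1(x,y) = (1,\,-1), \qquad Dh_2(x,y) = (x - y)\,(1,\,-1) = (0,\,0) \quad \text{whenever } x = y.
$$

At the minimizer $(-1,-1)$, the linearized constraint $\langle Dh_1, d\rangle = 0$ admits exactly the directions $d = s(1,1)$, $s \in \mathbb{R}$, i.e., the directions along the feasible line $x = y$, whereas $\langle Dh_2, d\rangle = 0$ admits every $d \in \mathbb{R}^2$. Since $Dh_2$ vanishes there, a Lagrange multiplier would have to satisfy $Df(-1,-1) = (-1,\,1) = 0$, which is impossible.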


Optimality Conditions 


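First order necessary optimality conditions in dual form require certain CQs to hold. Consider the optimization problem

$$(P)\qquad \min\ f(x) \quad \text{s.t.}\quad x \in M,$$

where M is the solution set of (1). First suppose that $X = \mathbb{R}^n$, $f\colon \mathbb{R}^n \to \mathbb{R}$, and the $g_i$ (for all i) are continuously differentiable. For $\bar x \in M$ define $I_{\bar x} := \{i \in I : g_i(\bar x) = 0\}$; write $h \in T_M(\bar x)$ (tangent cone) if $h = \lim_{k\to\infty} \lambda_k (x^k - \bar x)$, where $\lambda_k > 0$, $x^k \in M$ for all k, and $x^k \to \bar x$; and write $h \in K_M(\bar x)$ (linearization cone) if $\langle h, Dg_i(\bar x)\rangle \le 0$ for $i \in I_{\bar x}$ and $\langle h, Dg_j(\bar x)\rangle = 0$ for $j \in J$.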
Then the Karush-Kuhn-Tucker conditions (KKT),

$$\exists u \in \mathbb{R}^r:\quad Df(\bar x) + \sum_{i \in I \cup J} u_i\, Dg_i(\bar x) = 0, \qquad \bar x \in M, \qquad u_i \ge 0,\ \ u_i\, g_i(\bar x) = 0,\ \ i \in I,$$

are necessary for $\bar x$ being a local minimizer of (P), provided that, for example, one of the following CQs is satisfied (see, e.g., [2]):

• Abadie CQ: $T_M(\bar x) = K_M(\bar x)$.

• Kuhn-Tucker CQ: For every $h \in K_M(\bar x)$ there is a continuously differentiable function $\varphi\colon [0,\delta) \to M$, $\delta > 0$, such that $\varphi(0) = \bar x$ and $\varphi'(0) = h$.

• Mangasarian-Fromovitz CQ (MFCQ, [28]): $Dg_j(\bar x)$, $j \in J$, are linearly independent, and for some h there holds $\langle h, Dg_i(\bar x)\rangle < 0$, $i \in I_{\bar x}$, and $\langle h, Dg_j(\bar x)\rangle = 0$, $j \in J$.

• Linear Independence CQ (LICQ): $Dg_i(\bar x)$, $i \in I_{\bar x} \cup J$, are linearly independent.

There holds (see, e.g., [2]): LICQ ⇒ MFCQ ⇒ Kuhn-Tucker CQ ⇒ Abadie CQ; the converse implications are not true in general. For further CQs in this respect, see [2,38]. If no inequalities appear (i.e., $I = \emptyset$), the above CQs are classical for optimality conditions in Euler-Lagrange form. Note that Abadie's CQ is automatically satisfied at each point of M if the $g_i$ are affine-linear for all indices $i \in I \cup J$.
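The LICQ test and the stationarity part of the KKT conditions are easy to check numerically at a given point. The following NumPy sketch (the helper names `licq_holds` and `stationarity_multipliers` are hypothetical) applies them to the two descriptions of the feasible set in Example 1 at $(x, y) = (-1, -1)$, where $Df = (x, 1) = (-1, 1)$. It checks only the stationarity equation; sign conditions on inequality multipliers would have to be tested separately.

```python
import numpy as np

def licq_holds(active_gradients, tol=1e-10):
    """LICQ: the gradients of the active constraints (one per row) are linearly independent."""
    G = np.atleast_2d(np.asarray(active_gradients, dtype=float))
    return bool(np.linalg.matrix_rank(G, tol=tol) == G.shape[0])

def stationarity_multipliers(grad_f, active_gradients):
    """Least-squares solution u of Df + sum_i u_i Dg_i = 0 and its residual norm.

    Only the stationarity equation is checked; sign conditions u_i >= 0 for
    active inequalities would have to be tested separately.
    """
    G = np.atleast_2d(np.asarray(active_gradients, dtype=float))
    grad = np.asarray(grad_f, dtype=float)
    u, *_ = np.linalg.lstsq(G.T, -grad, rcond=None)
    return u, float(np.linalg.norm(grad + G.T @ u))

# Example 1 at (x, y) = (-1, -1): Df = (x, 1) = (-1, 1).
grad_f = [-1.0, 1.0]
dh1 = [[1.0, -1.0]]  # D(x - y)
dh2 = [[0.0, 0.0]]   # D(0.5 * (x - y)**2) = (x - y) * (1, -1), zero on x = y
print(licq_holds(dh1), stationarity_multipliers(grad_f, dh1))
print(licq_holds(dh2), stationarity_multipliers(grad_f, dh2))
```

For the first description LICQ holds and the least-squares system is solved exactly by $u = 1$; for the second, LICQ fails and the residual stays at $\|(-1, 1)\| = \sqrt{2}$, so no multiplier exists.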

Now suppose that (P) is a convex program, i.e., the $g_i$, $i \in I$, are convex (but not necessarily differentiable) functions and the $g_j$ are affine-linear functions with gradients $a_j$, $j \in J$. Then the following CQs are often used for optimality conditions of Karush-Kuhn-Tucker type (in subdifferential terms) and for saddle-point conditions ([16,36,41]):

• Basic CQ at $\bar x \in M$: Each normal direction h, i.e., each h with $\langle h, x - \bar x\rangle \le 0$ for all $x \in M$, has a representation $h = \sum_{i \in I_{\bar x}} \mu_i d_i + \sum_{j \in J} \lambda_j a_j$ for some $\lambda_j \in \mathbb{R}$, $\mu_i \ge 0$ and $d_i \in \partial g_i(\bar x)$ $(i \in I_{\bar x})$, where $\partial g_i(\bar x)$ denotes the Moreau-Rockafellar subdifferential of $g_i$ at $\bar x$.

• Weak Slater CQ: $\exists x^0 \in M$ such that $g_i(x^0) < 0$, $i \in I_N$, where $I_N := \{i \in I : g_i \text{ is not affine-linear}\}$.

• Strong Slater CQ: $\exists x^0 \in M$ such that $g_i(x^0) < 0$, $i \in I$, and the $a_j$, $j \in J$, are linearly independent.
The latter naming of CQs is taken from [16]. If no equations appear, the strong Slater CQ becomes the well-known classical Slater CQ [2,36,41]. There holds: weak Slater CQ ⇒ basic CQ; and for a given $\bar x \in M$, the basic CQ is equivalent to a nonsmooth form of the Abadie CQ [16]. If the $g_i$, $i \in I$, are differentiable, then the strong Slater CQ is equivalent to the MFCQ being satisfied at any $\bar x \in M$ [33,34]. There are certain forms of first order optimality conditions which do not require a CQ, see, e.g., [2,3,32,38].
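A standard one-dimensional example (not taken from the source) shows what can happen when no CQ holds. For the convex program

$$
\min_{x \in \mathbb{R}}\ x \quad \text{s.t.}\quad g_1(x) = x^2 \le 0,
$$

the feasible set is $M = \{0\}$, so no point satisfies $g_1(x^0) < 0$ and the strong Slater CQ fails. At the minimizer $\bar x = 0$ the KKT stationarity condition would read

$$
Df(0) + u_1\, Dg_1(0) = 1 + u_1 \cdot 0 = 1 \neq 0 \qquad \text{for every } u_1 \ge 0,
$$

so no multiplier exists. The basic CQ fails as well: every $h \in \mathbb{R}$ is a normal direction to $M = \{0\}$, but $\partial g_1(0) = \{0\}$, so only $h = 0$ can be represented.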

Next, consider

$$(\tilde P)\qquad \min\ f(x) \quad \text{s.t.}\quad x \in \tilde M,$$

where $\tilde M$ is the solution set of (2), and f is defined on the Banach space X. Let f and G be continuously differentiable.
Denote by $Y^*$ the dual space of Y. Then the conditions

$$\exists u \in Y^*:\quad Df(\bar x) + DG(\bar x)^{*} u = 0,\qquad \langle u, y\rangle \le 0\ \ \forall y \in C,\qquad G(\bar x) \in C,\qquad \langle u, G(\bar x)\rangle = 0,$$

are necessary for $\bar x$ being a local minimizer of $(\tilde P)$, provided that, for example, the following CQ is satisfied (see, e.g., [5,34,42]):

• Robinson CQ: $0 \in \operatorname{int}\{G(\bar x) + DG(\bar x)X - C\}$, where 'int' denotes the topological interior.

Because the core of a convex set includes its interior, $0 \in \operatorname{core}\{G(\bar x) + DG(\bar x)X - C\}$ is a consequence of the Robinson CQ. In fact, the latter condition is also sufficient for Robinson's CQ to hold, and both CQs are also equivalent to $\mathbb{R}_+[G(\bar x) + DG(\bar x)X - C] = Y$; for details one may consult [5,33,42]. If $(\tilde P)$ is specialized to (P), then the Robinson CQ and MFCQ are equivalent [34]. Under convexity assumptions on f and G in $(\tilde P)$, an extension of the strong Slater CQ plays a crucial role for first order optimality characterizations [37] (see also [40]): $0 \in \operatorname{int}(G(X) - C)$, which becomes $G(x) \in \operatorname{int} C$ for some $x \in X$ if $\operatorname{int} C \neq \emptyset$. In the case of differentiable data, the latter CQ is equivalent to the Robinson CQ [33,40].

For many other classes of optimization problems, first order CQs in connection with optimality conditions have been intensively studied. Among them we refer to CQs in composite optimization [38], optimal control problems [7,17,31], nonsmooth (nonconvex) programs [7,38], mathematical programs with equilibrium constraints [27], semidefinite programs [39], and semi-infinite programs [5,15,31,32]. Certain first order CQs, in particular Robinson's CQ and the MFCQ, play an important role in the theory of second order optimality conditions (and second order stability analysis); see Second order constraint qualifications and, e.g., [3,4,8,39,40].


Duality 


If (P) is a convex program, then first order CQs are closely related to the existence of optimal solutions of the Lagrange dual problem (D) associated with (P) and to properties of the perturbation function

$$v(u) := \inf\{ f(x) : g_i(x) \le u_i,\ i \in I,\ \ g_j(x) = u_j,\ j \in J \},$$

such as continuity or subdifferentiability [10,36,37]. An important CQ is

• Calmness: $v(0)$ is finite and the Moreau-Rockafellar subdifferential $\partial v(0)$ of $v(\cdot)$ is nonempty.

Under calmness, the dual problem (D) is solv- 
able and v(0) coincides with the optimal value of 
(D) [10,36,37]. The strong Slater CQ implies calmness. 
If v(0) is finite, then the following three conditions are 
mutually equivalent: 

i) For (1) the strong Slater CQ holds; 

ii) $v(\cdot)$ is continuous at 0;

iii) the set of solutions of the dual problem (D) is 
nonempty and bounded. 


For more details see, e.g., [1,33,36]; for generalizations to convex problems $(\tilde P)$ with abstract constraints of the type (2) see, e.g., [33,37,40].
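For the convex program $\min x$ s.t. $g_1(x) = x^2 \le 0$ (a standard illustration, not from the source), for which the strong Slater CQ fails, the perturbation function and the Lagrange dual can be computed explicitly:

$$
v(u) = \inf\{x : x^2 \le u\} = \begin{cases} -\sqrt{u}, & u \ge 0,\\ +\infty, & u < 0,\end{cases}
$$

so $v(0) = 0$ is finite, but $v$ is not continuous at 0. A subgradient $s \in \partial v(0)$ would have to satisfy $-\sqrt{u} \ge s\,u$ for all $u > 0$, i.e., $s \le -1/\sqrt{u}$ for all $u > 0$, which is impossible; hence $\partial v(0) = \emptyset$ and calmness fails. Correspondingly, the dual function

$$
q(\lambda) = \inf_{x \in \mathbb{R}} \big(x + \lambda x^2\big) = -\frac{1}{4\lambda}\quad (\lambda > 0), \qquad q(0) = -\infty,
$$

has supremum $0 = v(0)$ over $\lambda \ge 0$, but this supremum is not attained, so the dual problem (D) has no solution.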

Now suppose that (P) has continuously differentiable data f, $g_i$, and that $\bar x$ is a stationary solution of (P), i.e., $\bar x$ satisfies the KKT conditions together with some multiplier u. Then, obviously, LICQ implies that the multiplier u associated with $\bar x$ is unique. It is shown in [25] that a strengthened form of MFCQ, the so-called strict MFCQ, is necessary and sufficient for the uniqueness of the Lagrange multiplier. Another basic result is the following: MFCQ holds at $\bar x$ if and only if the set of all multipliers associated with $\bar x$ is bounded (Gauvin's theorem [13]).

Extensions of Gauvin's theorem to the general problem $(\tilde P)$ with smooth data can be found, e.g., in [8,34,42]. For recent surveys of several aspects of CQs and duality, see [5,40].

The above relations are also important theoretical 
tools for establishing solution techniques which use La- 
grangians or dual schemes (see, e.g., [16,29,38]), for 
convergence analysis of path following methods (see, 
e.g., [14]), for regularity properties guaranteeing finite 
termination of algorithms [6,11], and for several stabil- 
ity subjects (see the next section). 


Stability 


If the data couples (f, g) of (P) or (f, G) of $(\tilde P)$, respectively, are embedded in a family F of data, where a 'distance' between two elements of F is available, then the question arises how changes of the data in F affect the existence of solutions (local or global optimizers, stationary solutions, critical points), and whether 'small' data perturbations lead to 'small' changes of the



First Order Constraint Qualifications 


optimal value and solution set, or not. A ‘good’ stabil- 
ity behavior is often sensitive to the description of the 
constraint set and needs a CQ. 

For example, consider a parametric smooth program with finite-dimensional variables x and canonical perturbations (t, a, b) in a finite-dimensional space, namely,

$$P(t,a,b):\quad \min_x\ f(t,x) - \langle a, x\rangle \quad \text{s.t.}\quad g_i(t,x) \le b_i,\ i \in I, \qquad g_j(t,x) = b_j,\ j \in J, \tag{3}$$

where I, J are as above and f, $g_i$ are twice continuously differentiable with respect to (t, x). Given an initial parameter triple $(\bar t, 0, 0)$ and a KKT point $\bar z = (\bar x, \bar u)$ of the initial problem, strong stability of $\bar z$ (i.e., the existence of a locally unique and Lipschitzian solution $z(t, a, b)$ of the perturbed KKT system near $\bar z$) necessarily requires LICQ to hold at $\bar x$ [23], while LICQ together with some strong second order optimality condition characterizes strong stability [9,21,23,35].

MFCQ and the strong Slater CQ are very important for obtaining other stability properties such as strong stability, pseudoregularity, upper Lipschitz (or Hölder) continuity, or upper semicontinuity of the optimal and/or stationary solution maps under perturbations (see also the next section). LICQ and MFCQ, respectively, play an essential role for the existence, representations and estimates of directional derivatives (studied in different forms: standard one-sided directional derivative, semiderivative, Dini type, Hadamard type, and others) of the optimal value function. For an introduction to these interrelations, see [1,5,9,12,19,24,38], while [5,40] also survey extensions to the class $(\tilde P)$ under the Robinson CQ.

In the study of structural (or global) stability of feasible sets and nonlinear finite/semi-infinite programs, including one-parametric deformations, MFCQ and its extensions turn out to be fundamental; in particular, they characterize the global stability of compact feasible sets. For surveys see [14,18].

If one is interested in directional stability of optimal values or optimal solutions under data perturbations, another type of CQ often comes into play: directional regularity conditions, which are imposed on the constraint set of the initial problem $P(\bar t, 0, 0)$. A typical example is the

• Gollan CQ at a feasible $\bar x$ in direction d: $D_x g_j(\bar t, \bar x)$, $j \in J$, are linearly independent, and for some h there holds $\langle (h, d), Dg_i(\bar t, \bar x)\rangle < 0$, $i \in I_{\bar x}$, and $\langle (h, d), Dg_j(\bar t, \bar x)\rangle = 0$, $j \in J$.

For this CQ and a natural extension to abstract constraints in Banach spaces, see [4,5,40], where directional differentiability and second order expansions of the optimal value function, as well as Lipschitz/Hölder stability and first order expansions of optimal solutions under directional regularity conditions, are studied.


Metric Regularity 


Metric regularity of a parametric constraint system refers to a certain local error bound for the distance of some point x to the solution set in terms of the residuum of the data at x, and is closely related to first order CQs. Consider, for example, system (1) with $X = \mathbb{R}^n$ under right-hand side perturbations b,

$$g_i(x) \le b_i,\ i \in I, \qquad g_j(x) = b_j,\ j \in J, \tag{4}$$

where I and J are as above. Denote by S(b) the solution set mapping of this system. If all $g_i$, $i \in I \cup J$, are continuously differentiable, and if $\bar x \in S(0)$, then MFCQ at $\bar x$ implies that (4) is metrically regular at $(\bar x, 0)$, i.e., there exist a neighborhood U of $(\bar x, 0)$ and a constant $L = L(\bar x)$ such that for $(x, b) \in U$,

$$\operatorname{dist}(x, S(b)) \le L \left\| \Big( \big( (g_i(x) - b_i)_+ \big)_{i \in I},\ \big( g_j(x) - b_j \big)_{j \in J} \Big) \right\|, \tag{5}$$

where $c_+ := \max\{c, 0\}$ for $c \in \mathbb{R}$, and $\|\cdot\|$ is any norm in $\mathbb{R}^r$. This was shown in [34]; the converse assertion is also true [8,34]. In the Banach space context of the system (2) with right-hand side perturbations, the same equivalence holds when MFCQ is replaced by the Robinson CQ, see again [8,34].

If the $g_i$, $i \in I$, are convex (not necessarily differentiable), and the $g_j$, $j \in J$, are affine-linear, then the strong Slater CQ implies that (4) is metrically regular at each $(\bar x, 0)$, $\bar x \in S(0)$, see [33]. The converse direction is also true [8]. In fact, in both [8] and [33], the authors prove these results for convex inclusions in the Banach space setting (2) using a suitable generalized form of the Slater CQ. Moreover, under the strong Slater CQ, the estimate (5) even holds for all x in X and all b near 0, where the number $L = L(\bar x)$ is bounded on bounded sets [33].

Note that under mild assumptions on the 
parameter-dependence, the same CQs imply metric 
regularity under more general perturbations (like in 
(3), for example), for details see again [8,24,33,34]. 


Error Bounds 


The role of (first order) CQs in deriving local and global error bounds will now be discussed for a system of convex inequalities, i.e., suppose in (1) that $J = \emptyset$ and the $g_i$, $i \in I$, are convex (not necessarily differentiable) functions. Denote the solution set again by M.

Given $\bar x \in M$, the system (1) is called metrically regular* at $\bar x$ (or simply metrically regular at $\bar x$ [26,30]; the asterisk is used to avoid confusion with the above notion for parametric systems) if there exist a neighborhood U of $\bar x$ and a constant $L = L(\bar x)$ such that for $x \in U$,

$$\operatorname{dist}(x, M) \le L \max\{ g_i(x)_+ : i \in I \}. \tag{6}$$

It was shown in [26] that, for differentiable functions $g_i$, $i \in I$, the Abadie CQ holds at $\bar x \in M$ if and only if (1) is metrically regular* at $\bar x$. For the nondifferentiable case, a similar proof shows that the basic CQ is equivalent to metric regularity*.

If M is bounded and the strong Slater CQ holds, then (6) is satisfied for all $x \in \mathbb{R}^n$ with some uniform constant L [33]. This property is called a global error bound, or an 'error bound in Hoffman's sense' [20,26,30]. If M is unbounded, then additional asymptotic CQs are required to guarantee the existence of a global error bound. For a survey of asymptotic CQs and their interrelations, see [20].
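The difference can be seen numerically in a one-dimensional sketch (the examples and the helper name `observed_error_constant` are illustrative assumptions, not from the source). For $g(x) = x^2 - 1 \le 0$ the strong Slater CQ holds and $M = [-1, 1]$ is bounded; for $|x| > 1$ one has $\operatorname{dist}(x, M)/g(x) = 1/(|x| + 1) \le 1/2$. For $g(x) = x^2 \le 0$ the Slater CQ fails, $M = \{0\}$, and $\operatorname{dist}(x, M)/g(x) = 1/|x|$ is unbounded near 0, so no bound of the form (6) holds at 0. A finite sample can only suggest, not prove, such behavior.

```python
def observed_error_constant(g, dist_to_M, points):
    """Largest ratio dist(x, M) / g(x)_+ over sample points x with g(x) > 0.

    A bound dist(x, M) <= L * g(x)_+ can only hold if this ratio stays <= L.
    """
    return max(dist_to_M(x) / g(x) for x in points if g(x) > 0)

# Sample points on both sides of 0, with magnitudes 1e-6, 1e-5, ..., 1e3.
points = [s * 10.0 ** e for s in (-1.0, 1.0) for e in range(-6, 4)]

# Strong Slater CQ holds (g(0) = -1 < 0): g(x) = x^2 - 1, M = [-1, 1].
L_slater = observed_error_constant(lambda x: x * x - 1.0,
                                   lambda x: max(abs(x) - 1.0, 0.0), points)

# Strong Slater CQ fails: g(x) = x^2, M = {0}; the ratio equals 1 / |x|.
L_no_slater = observed_error_constant(lambda x: x * x, abs, points)

print(L_slater)     # about 0.09 on this sample (theoretical supremum 1/2)
print(L_no_slater)  # about 1e6, growing without bound as x -> 0
```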

CQs like Abadie's CQ, MFCQ and the (strong) Slater CQ are also essential in deriving local and global error bounds for approximate solutions of convex and nonconvex mathematical programs and other variational problems. For some classes of programs, these questions are closely related to the existence of so-called weak sharp minima, introduced in [6,11]. For a general survey of error bounds in the sense just discussed see [30]; for the special case of quadratic convex programs see [26].

In contrast to other applications of CQs, the relations between CQs and error bounds have not yet been completely clarified and require considerable further research.


See also 


> Equality-constrained Nonlinear Programming: KKT 
Necessary Optimality Conditions 

> Inequality-constrained Nonlinear Optimization 

> Kuhn-Tucker Optimality Conditions 

> Lagrangian Duality: Basics 

> Rosen’s Method, Global Convergence, and Powell’s 
Conjecture 

> Saddle Point Theory and Optimality Conditions 

> Second Order Constraint Qualifications 

> Second Order Optimality Conditions for Nonlinear 
Optimization 
Introduction


The general flow-shop problem [4,60,68], denoted as n/m/$C_{\max}$ in the literature, involves n jobs, each requiring operations on m machines in the same machine sequence. The processing time of each operation is $p_{ij}$, where $i \in \{1,2,\dots,n\}$ denotes a job and $j \in \{1,2,\dots,m\}$ a machine. The problem is to determine the sequence of these n jobs that produces the smallest makespan, assuming no preemption of operations. In the simplest case, all jobs are available and ready to start at time zero, and the setup times on machines are assumed to be sequence-independent and included in $p_{ij}$. In more realistic situations, however, jobs are released at different times, thus requiring dynamic scheduling, and the setup times are sequence-dependent.

The makespan problem for flow shops has been by far the most studied in the literature. (The makespan $C_{\max}$ is equivalent to the completion time of the last job to leave the system.) This is partly because:

• Makespan is a simple and useful criterion for heavily loaded shops when long-term utilization should be maximized.

• Makespan is the only objective function simple enough to have some analytic results available for multimachine problems and to make some branch-and-bound methods practical for medium-sized problems.

The minimization of the makespan objective is to a cer- 
tain extent equivalent to the maximization of the uti- 
lization of the machines. The models, however, tend to 
be of such complexity that makespan results are already 
relatively hard to obtain. Even harder to analyze are the 
flow time and the due-date-related objectives. 


Variations 


There are a number of variations for the flow shop 
scheduling problem [60,68]. Some of them are pre- 
sented in the following. 

The permutation flow shop problem (PFSP). A constraint that may appear in the flow-shop environment is that the queues in front of each machine operate according to the first in, first out discipline, which implies that the order (or permutation) in which the jobs go through the first machine is maintained throughout the system. This problem can be formulated as follows. Each of n jobs from the job set $J = \{1,2,\dots,n\}$, for $n > 1$, has to be processed on m machines $1,2,\dots,m$ in the order given by the indexing of the machines. Thus, job j, $j \in J$, consists of a sequence of m operations, each of them corresponding to the processing of job j on machine i during an uninterrupted processing time $p_{ij} \ge 0$. (It is assumed that a zero processing time on a machine corresponds to a job performed by that machine in an infinitesimal time.) Machine i, $i = 1,2,\dots,m$, can execute at most one job at a time, and it is assumed that each machine processes the jobs in the same order. We represent the job processing order by the permutation $\pi = (\pi(1),\dots,\pi(n))$ on the set J, and we let $\Pi$ denote the set of all permutations on J. We wish to find the optimal processing order $\pi^* \in \Pi$ of jobs minimizing the maximum completion time $C_{\max}(\pi)$ (makespan) [65,68].
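The makespan of a given permutation can be evaluated with the usual completion-time recurrence $C(\pi(k), j) = \max\{C(\pi(k-1), j),\ C(\pi(k), j-1)\} + p_{\pi(k) j}$, with $C = 0$ for missing predecessors. The Python sketch below assumes all jobs are available at time zero and stores `p[i][j]` as the processing time of job i on machine j (job first, as in the Introduction); the function name and the small instance are illustrative.

```python
from itertools import permutations

def makespan(p, order):
    """C_max of a permutation flow shop.

    p[i][j] is the processing time of job i on machine j; every job visits
    machines 0, 1, ..., m-1 in this order, and all machines use `order`.
    """
    m = len(p[0])
    finish = [0] * m  # finish[j]: completion time of the latest job on machine j
    for job in order:
        for j in range(m):
            ready = finish[j - 1] if j > 0 else 0  # this job is done on machine j-1
            finish[j] = max(finish[j], ready) + p[job][j]  # and machine j is free
    return finish[-1]

# Usage: 3 jobs x 2 machines, optimum found by full enumeration.
P = [[3, 2], [1, 4], [2, 1]]
best = min(permutations(range(len(P))), key=lambda order: makespan(P, order))
print(best, makespan(P, best))  # (1, 0, 2) 8
```

Full enumeration of all n! permutations is only viable for tiny instances, which is why the exact and heuristic methods discussed below are needed.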

The flow shop scheduling problem with limited 
machine availability. In such a problem, n jobs have to 
be scheduled on m machines under the makespan cri- 
terion and under the assumption that the machines are 
not available during the whole planning horizon [6]. 

No-wait or no-idle flow shop scheduling problems with deteriorating jobs. Deterioration of a job means that its processing time is a function of its execution start time. A simple linear deterioration function is assumed, and certain dominance relationships between machines are satisfied. The no-wait requirement is another phenomenon which may occur in flow shops; it implies that the starting time of a job at the first machine may have to be delayed to ensure that the job can go through the flow shop without having to wait for any machine. The 'no-idle' constraint means that each machine, once it commences its work, has to process all operations assigned to it without any interruption. In [102] it is shown that polynomial algorithms still exist for minimizing the makespan or the weighted sum of completion times, although these problems are more complicated than the classical ones. In [101] the general, no-wait and no-idle flow shop scheduling problem with decreasing linear deterioration under dominant machines is considered.

A hybrid flow shop consists of a series of produc- 
tion stages, each of which has several machines operat- 
ing in parallel. Some stages may have only one machine, 
but at least one stage must have multiple machines. The 
flow of jobs through the shop is unidirectional. Each job is processed by one machine in each stage, and it must go through one or more stages. Machines in each stage




Flow Shop Scheduling Problem 


can be identical, uniform or unrelated. An extended 
survey of the problem is presented in [57]. 

In [45] the stochastic flow shop problem is pre- 
sented and analyzed. 


Exact Algorithms 
for the Flow Shop Scheduling Problem 


Branch and bound is a general method for solving many 
types of combinatorial problems. The basic idea of 
branching is to conceptualize the problem as a decision 
tree. Each decision point, called a node, corresponds 
to a partially completed solution, and from it grows 
one new branch for each possible decision. These
in turn become new nodes for branching again and so 
on. Leaf nodes, which cannot branch any further, rep- 
resent complete solutions or dead ends. A number of 
branch-and-bound procedures have been proposed for 
the solution of the flow shop scheduling problem and 
its variations [16,19,36,83,95,106,107]. Dynamic pro- 
gramming approaches for the solution of the flow shop 
scheduling problem have been proposed in [92,105]. 
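To make the decision-tree idea concrete, here is a minimal Python sketch of depth-first branch and bound for the permutation flow shop with the makespan criterion. It is not one of the cited procedures; the input format `p[j][k]` (processing time of job j on machine k), the simple machine-based lower bound and the tiny instance are assumptions chosen for illustration. Each node is a partial job sequence, each branch appends one unscheduled job, and a node is pruned when its lower bound cannot beat the best complete schedule found so far.

```python
def branch_and_bound(p):
    """Exact permutation flow shop (makespan) by depth-first branch and bound.

    p[j][k] is the processing time of job j on machine k (assumed input format).
    """
    n, m = len(p), len(p[0])
    best_value, best_seq = float("inf"), None

    def completion_times(seq):
        # c[k] = completion time of the last scheduled job on machine k
        c = [0] * m
        for j in seq:
            for k in range(m):
                ready = c[k - 1] if k > 0 else 0  # job j has finished on machine k-1
                c[k] = max(c[k], ready) + p[j][k]
        return c

    def lower_bound(seq, remaining):
        # Each machine must still process all remaining jobs, and the job it
        # processes last still needs the machines after it.
        c = completion_times(seq)
        if not remaining:
            return c[-1]
        bound = 0
        for k in range(m):
            load = sum(p[j][k] for j in remaining)
            tail = min(sum(p[j][k + 1:]) for j in remaining)
            bound = max(bound, c[k] + load + tail)
        return bound

    def branch(seq, remaining):
        nonlocal best_value, best_seq
        if not remaining:  # leaf node: a complete schedule
            value = completion_times(seq)[-1]
            if value < best_value:
                best_value, best_seq = value, seq
            return
        for j in sorted(remaining):  # one branch per possible next job
            child, rest = seq + [j], remaining - {j}
            if lower_bound(child, rest) < best_value:  # prune nodes that cannot improve
                branch(child, rest)

    branch([], frozenset(range(n)))
    return best_seq, best_value


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical 3-job, 2-machine instance
print(branch_and_bound(p))    # ([1, 0, 2], 8)
```

The bound is deliberately weak so the code stays short; practical branch-and-bound codes spend most of their effort on tighter bounds and dominance rules.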


Heuristic Algorithms 
for the Flow Shop Scheduling Problem 


Over the last few decades, pure flow shop scheduling 
problems have been studied extensively. Since the flow
shop minimization problem is NP-hard [87], a num- 
ber of heuristic and metaheuristic algorithms have 
been proposed for the solution of the problem. High- 
performance heuristics have been proposed to mini- 
mize the makespan [15,21,61] or the maximum tar- 
diness [96]. Some additional characteristics have been 
studied for the makespan criterion: non-sequence- 
dependent setup and removal times [69,91], minimum 
time lags [91], and more recently job-precedence con- 
straints [14]. Studies on hybrid flow shop scheduling 
problems are relatively recent. The main results deal 
with the makespan criterion, and are often limited to 
two stages; nevertheless, some work has been done on 
lateness criteria [38,43]. A number of heuristic algorithms 
were proposed in [13]. A worst-case analysis of
heuristics is presented in [85]. 

In [6] a heuristic approach is proposed that approximately 
solves the problem by scheduling the jobs two by two, 
following an input sequence, with a polynomial 
algorithm. This algorithm is an extension of the 
geometric approach developed for the 
two-job shop scheduling problem. An algorithm that
constructs heuristics that use a lower bound to find 
a feasible solution for the general m-stage flow shop 
scheduling problem with multiple operations and time 
lags is described in [75]. A greedy algorithm for the so- 
lution of the permutation flow shop model with vari- 
able processing times is presented in [28]. A two- 
stage heuristic algorithm for the flow-shop problem 
with multiple processors is presented in [90]. A bilevel 
programming heuristic is presented in [48]. A simple 
heuristic is presented in [55]. 

A two-phase heuristic is presented in [89]. In the 
first phase, an initial job sequence is generated using 
one of the available well-known and efficient heuris- 
tics, while in the second phase the sequence generated 
is improved in terms of the makespan using a pair ex- 
change mechanism with directionality constraint. The 
n-job two-machine flow shop scheduling problem is 
studied in [99] with the criterion of minimizing the sum 
of job completion times. The scheduling problem is 
first formulated mathematically. Three heuristic methods 
are then devised to find near-optimal schedules.
Three lower bound generation schemata are designed 
to compute three different lower bounds, of which the 
tightest one is used. To further reduce the search space, 
some dominance properties are proved. Then a branch- 
and-bound algorithm is developed to obtain an optimal 
schedule. In [100] the flow shop scheduling problem, 
with the criterion of minimizing the sum of job com- 
pletion times is addressed. Two heuristic approaches 
are proposed to deal with this problem. The first ap- 
proach focuses on reducing machine idle times and the 
second one places efforts on reducing both the machine 
idle times and the job queue times. 
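The two-phase idea of [89] (build a sequence, then improve it by pairwise exchanges) can be sketched in Python as follows. This is an illustrative reconstruction, not the published algorithm: the phase-1 rule (jobs by decreasing total processing time) is a stand-in assumption for "one of the available well-known heuristics", the directionality constraint of [89] is omitted, and `p[j][k]` is an assumed input format.

```python
def makespan(seq, p):
    """Makespan of permutation seq; p[j][k] = processing time of job j on machine k."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            ready = c[k - 1] if k > 0 else 0
            c[k] = max(c[k], ready) + p[j][k]
    return c[-1]


def initial_sequence(p):
    # Phase 1 stand-in (assumption): jobs by decreasing total processing time
    return sorted(range(len(p)), key=lambda j: -sum(p[j]))


def pair_exchange(seq, p):
    # Phase 2: keep swapping pairs of jobs while a swap reduces the makespan
    best, best_value = list(seq), makespan(seq, p)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:]
                cand[i], cand[j] = cand[j], cand[i]
                value = makespan(cand, p)
                if value < best_value:
                    best, best_value, improved = cand, value, True
    return best, best_value


p = [[3, 2], [1, 4], [2, 1]]    # hypothetical instance
start = initial_sequence(p)     # [0, 1, 2], makespan 10
print(pair_exchange(start, p))  # ([1, 0, 2], 8)
```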

Complete reviews of the heuristic and metaheuris- 
tic algorithms for the solution of the flow-shop prob- 
lem and some of its variations are presented in [11,20, 
39,60,68,78,97]. 


Metaheuristic Algorithms 
for the Flow Shop Scheduling Problem 


Several metaheuristic algorithms have been proposed 
for the solution of the flow shop scheduling problem. 
In the following, these algorithms are described one 
by one:
Simulated annealing [1,3,50] plays a special role 
within local search for two reasons. First, it appears
to be quite successful when applied to a broad range 
of practical problems. Second, some threshold- 
accepting algorithms such as simulated annealing 
have a stochastic component, which facilitates a the- 
oretical analysis of their asymptotic convergence. 
Simulated annealing [2] is a stochastic algorithm 
that allows random uphill jumps in a controlled 
fashion in order to provide possible escapes from 
poor local optima. The probability of accepting an 
increase in the objective function value is gradually 
lowered until no more transformations are possible. 
Simulated annealing owes its name to an analogy with 
the annealing process in condensed-matter
physics, where a solid is heated to a maximum 
temperature at which all particles of the solid ran- 
domly arrange themselves in the liquid phase, fol- 
lowed by cooling through careful and slow reduc- 
tion of the temperature until the liquid is frozen 
with the particles arranged in a highly structured lat- 
tice and minimal system energy. This ground state is 
reachable only if the maximum temperature is suf- 
ficiently high and the cooling sufficiently slow. Oth- 
erwise a metastable state is reached. The metastable 
state is also reached with a process known as 
quenching, in which the temperature is instanta- 
neously lowered. Its predecessor is the so-called 
Metropolis filter. Simulated annealing algorithms 
for the flow shop scheduling problem are presented 
in [27,35,40,46,47,52,58,62,63,82,88,94,98,103].
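A minimal Python sketch of simulated annealing for the permutation flow shop (makespan) may help fix the ideas of controlled uphill moves and gradual cooling. The swap neighborhood, the geometric cooling schedule and all parameter values are illustrative assumptions, not those of the cited papers.

```python
import math
import random


def makespan(seq, p):
    """Makespan of permutation seq; p[j][k] = processing time of job j on machine k."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            ready = c[k - 1] if k > 0 else 0
            c[k] = max(c[k], ready) + p[j][k]
    return c[-1]


def simulated_annealing(p, t0=10.0, alpha=0.95, moves_per_temp=50, t_min=0.01, seed=0):
    rng = random.Random(seed)
    current = list(range(len(p)))
    rng.shuffle(current)
    current_value = makespan(current, p)
    best, best_value = current[:], current_value
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            i, j = rng.sample(range(len(current)), 2)  # swap move between two positions
            cand = current[:]
            cand[i], cand[j] = cand[j], cand[i]
            delta = makespan(cand, p) - current_value
            # Metropolis acceptance: downhill always, uphill with probability exp(-delta / t)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_value = cand, current_value + delta
                if current_value < best_value:
                    best, best_value = current[:], current_value
        t *= alpha  # geometric cooling: uphill moves become less and less likely
    return best, best_value


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical 3-job, 2-machine instance
print(simulated_annealing(p, seed=1))
```

Quenching corresponds to setting `t0` close to `t_min`, in which case the search degenerates into plain descent.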

Tabu search (TS) was introduced by Glover [30,31] 
as a general iterative metaheuristic for solving com- 
binatorial optimization problems. Computational 
experience has shown that TS is a well-established 
approximation technique, which can compete with 
almost all known techniques and which, by its flexi- 
bility, can beat many classic procedures. It is a form 
of local neighborhood search. Each solution S has an 
associated set of neighbors N(S). A solution S' ∈ N(S) 
can be reached from S by an operation called a move.
TS can be viewed as an iterative technique which ex- 
plores a set of problem solutions, by repeatedly mak- 
ing moves from one solution S to another solution 
S’ located in the neighborhood N(S) of S [32]. TS 
moves from a solution to its best admissible neighbor, 
even if this causes the objective function to 
deteriorate. To avoid cycling, solutions that have been
recently explored are declared forbidden or tabu for 
a number of iterations. The tabu status of a so- 
lution is overridden when certain criteria (aspira- 
tion criteria) are satisfied. Sometimes, intensification 
and diversification strategies are used to improve the 
search. In the first case, the search is accentuated 
in the promising regions of the feasible domain. In 
the second case, an attempt is made to consider so- 
lutions in a broad area of the search space. TS al- 
gorithms for the flow shop scheduling problem are 
presented in [5,7,8,12,26,27,37,40,46,63,66,86,104]. 
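The tabu search loop just described (best admissible neighbor, tabu list, aspiration criterion) can be sketched in Python as below. Using swap moves, storing the swapped job pair as the tabu attribute and choosing a fixed tenure are illustrative assumptions; the cited algorithms use their own neighborhoods and memory structures.

```python
def makespan(seq, p):
    """Makespan of permutation seq; p[j][k] = processing time of job j on machine k."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            ready = c[k - 1] if k > 0 else 0
            c[k] = max(c[k], ready) + p[j][k]
    return c[-1]


def tabu_search(p, start, iterations=50, tenure=3):
    current = list(start)
    best, best_value = current[:], makespan(current, p)
    tabu_until = {}  # job pair -> last iteration in which swapping this pair is tabu
    n = len(current)
    for it in range(iterations):
        admissible = []
        for i in range(n - 1):
            for j in range(i + 1, n):
                cand = current[:]
                cand[i], cand[j] = cand[j], cand[i]
                value = makespan(cand, p)
                pair = tuple(sorted((current[i], current[j])))
                is_tabu = tabu_until.get(pair, -1) >= it
                # aspiration criterion: a tabu move is allowed if it beats the best solution
                if not is_tabu or value < best_value:
                    admissible.append((value, cand, pair))
        if not admissible:
            break
        # move to the best admissible neighbor, even if it is worse than the current one
        value, cand, pair = min(admissible, key=lambda entry: entry[0])
        current = cand
        tabu_until[pair] = it + tenure  # forbid swapping this pair again for `tenure` iterations
        if value < best_value:
            best, best_value = cand[:], value
    return best, best_value


p = [[3, 2], [1, 4], [2, 1]]     # hypothetical instance
print(tabu_search(p, [2, 0, 1]))  # ([1, 0, 2], 8)
```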
Genetic algorithms (GAs) are search procedures 
based on the mechanics of natural selection and nat- 
ural genetics. The first GA was developed by John 
H. Holland [42] in the 1960s to allow computers 
to evolve solutions to difficult search and combi- 
natorial problems, such as function optimization 
and machine learning. GAs offer a particularly at- 
tractive approach for problems like the flow shop 
scheduling problem since they are generally quite 
effective for rapid global search of large, nonlinear 
and poorly understood spaces. Moreover, GAs are often 
effective in solving large-scale problems.
GAs [34] mimic the evolution process in nature. 
GAs are based on an imitation of the biological pro- 
cess in which new and better populations among dif- 
ferent species are developed during evolution. Thus, 
unlike most standard heuristics, GAs use informa- 
tion about a population of solutions, called individ- 
uals, when they search for better solutions. A GA is 
a stochastic iterative procedure that maintains the 
population size constant in each iteration, called 
a generation. The basic operation is the mating of 
two solutions in order to form a new solution. To 
form a new population, a binary operator, called 
crossover, and a unary operator, called mutation, 
are applied [72,73]. Crossover takes two individu- 
als, called parents, and produces two new individ- 
uals, called offspring, by swapping parts of the par- 
ents. GAs for the flow shop scheduling problem are 
presented in [5,9,10,17,18,53,59,64,79,80,81,82]. 
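For a permutation problem, naively swapping parts of two parents can duplicate or lose jobs, so flow shop GAs usually rely on permutation-preserving operators. The Python sketch below uses order crossover (OX), swap mutation, binary tournament selection and (μ+λ) survivor selection; these choices and all parameter values are assumptions for illustration, not the operators of any particular cited paper.

```python
import random


def makespan(seq, p):
    """Makespan of permutation seq; p[j][k] = processing time of job j on machine k."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            ready = c[k - 1] if k > 0 else 0
            c[k] = max(c[k], ready) + p[j][k]
    return c[-1]


def order_crossover(parent1, parent2, rng):
    """OX: copy a slice of parent1, fill the other positions in parent2's cyclic order."""
    n = len(parent1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = parent1[a:b + 1]
    kept = set(parent1[a:b + 1])
    fill = [parent2[(b + 1 + i) % n] for i in range(n)]  # parent2 read from position b+1, wrapping
    fill = [job for job in fill if job not in kept]
    for i, job in enumerate(fill):
        child[(b + 1 + i) % n] = job  # positions after b, wrapping round to before a
    return child


def swap_mutation(seq, rng):
    i, j = rng.sample(range(len(seq)), 2)
    child = seq[:]
    child[i], child[j] = child[j], child[i]
    return child


def genetic_algorithm(p, pop_size=20, generations=50, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    n = len(p)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(population, 2)
        return a if makespan(a, p) <= makespan(b, p) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            for child in (order_crossover(p1, p2, rng), order_crossover(p2, p1, rng)):
                if rng.random() < p_mut:
                    child = swap_mutation(child, rng)
                offspring.append(child)
        # (mu + lambda) selection keeps the population size constant across generations
        population = sorted(population + offspring, key=lambda s: makespan(s, p))[:pop_size]
    best = population[0]
    return best, makespan(best, p)


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical instance
print(genetic_algorithm(p))
```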
Scatter search [33] may be viewed as an evolution- 
ary (population-based) algorithm that constructs 
solutions by combining others. It derives its foun- 
dations from strategies originally proposed for com- 
bining decision rules and constraints in the context 
of integer programming. The goal of this method is
to enable the implementation of solution procedures 
that can derive new solutions from combined ele- 
ments in order to yield better solutions than those 
procedures that base their combinations only on 
a set of original elements. Scatter search algorithms 
for the flow shop scheduling problem are presented 
in [44,65]. 

Variable neighborhood search is a metaheuris- 
tic for solving combinatorial optimization prob- 
lems whose basic idea is systematic change of the 
neighborhood within a local search [41]. Variable 
neighborhood search algorithms for the flow shop 
scheduling problem are presented in [47,63]. 
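The systematic change of neighborhood can be illustrated with its deterministic variant, variable neighborhood descent (VND), sketched in Python below. Full VNS adds a random "shaking" step that is omitted here; the two neighborhoods (pairwise swap, then job insertion) and the instance are assumptions for illustration.

```python
def makespan(seq, p):
    """Makespan of permutation seq; p[j][k] = processing time of job j on machine k."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            ready = c[k - 1] if k > 0 else 0
            c[k] = max(c[k], ready) + p[j][k]
    return c[-1]


def swap_neighbors(seq):
    for i in range(len(seq) - 1):
        for j in range(i + 1, len(seq)):
            s = seq[:]
            s[i], s[j] = s[j], s[i]
            yield s


def insertion_neighbors(seq):
    for i in range(len(seq)):
        for j in range(len(seq)):
            if i != j:
                s = seq[:]
                s.insert(j, s.pop(i))  # move the job at position i to position j
                yield s


def variable_neighborhood_descent(p, start, neighborhoods=(swap_neighbors, insertion_neighbors)):
    current, current_value = list(start), makespan(start, p)
    k = 0
    while k < len(neighborhoods):
        neighbor = min(neighborhoods[k](current), key=lambda s: makespan(s, p), default=None)
        if neighbor is not None and makespan(neighbor, p) < current_value:
            current, current_value = neighbor, makespan(neighbor, p)
            k = 0  # improvement: restart from the first neighborhood
        else:
            k += 1  # no improvement: switch to the next neighborhood
    return current, current_value


p = [[3, 2], [1, 4], [2, 1]]                       # hypothetical instance
print(variable_neighborhood_descent(p, [2, 0, 1]))  # ([1, 0, 2], 8)
```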

The use of artificial neural networks to find good 
solutions to combinatorial optimization problems 
has recently attracted some attention. A neural net- 
work consists of a network [67] of elementary nodes 
(neurons) that are linked through weighted con- 
nections. The nodes represent computational units, 
which are capable of performing a simple compu- 
tation, consisting of a summation of the weighted 
inputs, followed by the addition of a constant called 
the threshold or bias, and the application of a non- 
linear response (activation) function. The result of 
the computation of a unit constitutes its output. This 
output is used as an input for the nodes to which it 
is linked through an outgoing connection. The overall 
task of the network is to achieve a certain network 
configuration, for instance, a required input-output 
relation, by means of the collective computation of 
the nodes. This process is often called 
self-organization. A neural network algorithm for the
flow shop scheduling problem is presented in [86]. 
An improvement heuristic based on an adaptive 
learning approach is proposed and applied to 
the general flow-shop problem. The approach uses 
a single-pass or a constructive heuristic and tries 
to find improvements iteratively by perturbing the 
data using a weight factor, allowing a nondeter- 
ministic local neighborhood search. The weights are 
modified using strategies similar to neural-networks 
training, i.e., weights are reinforced if the solution 
improves [4]. 
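The computation performed by a single node (weighted sum of the inputs, plus the threshold or bias, passed through a nonlinear activation) and the passing of outputs along outgoing connections can be written in a few lines of Python. The logistic activation and the weights below are hypothetical choices for illustration only.

```python
import math


def neuron(inputs, weights, bias):
    """One node: weighted sum of inputs plus bias (threshold), then a nonlinear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # logistic activation (an assumed choice)


def layer(inputs, weight_rows, biases):
    """Outputs of a layer of nodes that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output node. The hidden
# outputs are passed along outgoing connections as inputs to the output node.
hidden = layer([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.5])
output = neuron(hidden, [1.0, 1.0], -1.0)
print(hidden, output)
```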

Artificial immune system (AIS) is an intelligent 
problem-solving technique that has been used in 
scheduling problems for about 10 years. AISs are 
computational systems inspired by theoretical immunology 
and by observed immune functions, principles and 
mechanisms. Nature, and biological systems in particular, 
have always fascinated human experts owing to their 
complexity, flexibility and sophistication. Just as the 
nervous system inspired artificial neural networks, 
the immune system motivated the emergence of AIS. 
An AIS can be defined as an abstract or metaphorical 
computational system using ideas gleaned from the 
theories and components of immunology [22,23]. AIS
algorithms for the flow shop scheduling problem are 
presented in [25,51]. 

Particle swarm optimization (PSO) is a popu- 
lation-based swarm intelligence algorithm. It was 
originally proposed by Kennedy and Eberhart [49] 
as a simulation of the social behavior of organisms 
such as bird flocks and fish schools. PSO models the 
movements of the individuals in the swarm and offers 
a flexible, well-balanced mechanism for adapting 
between global and local exploration. PSO algorithms for
the flow shop scheduling problem are presented 
in [7,8,54,56,93]. 
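PSO moves particles in a continuous space, so applying it to sequencing requires a decoding step. The Python sketch below uses the standard inertia-weight velocity update together with random-key decoding (jobs ordered by increasing position value); this encoding and all parameter values are assumptions for illustration rather than the method of a specific cited paper.

```python
import random


def makespan(seq, p):
    """Makespan of permutation seq; p[j][k] = processing time of job j on machine k."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            ready = c[k - 1] if k > 0 else 0
            c[k] = max(c[k], ready) + p[j][k]
    return c[-1]


def decode(position):
    """Random-key decoding (smallest position value first)."""
    return sorted(range(len(position)), key=lambda j: position[j])


def pso_flow_shop(p, swarm_size=10, iterations=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    n = len(p)
    x = [[rng.uniform(0.0, 1.0) for _ in range(n)] for _ in range(swarm_size)]
    v = [[0.0] * n for _ in range(swarm_size)]
    pbest = [xi[:] for xi in x]
    pbest_value = [makespan(decode(xi), p) for xi in x]
    g = min(range(swarm_size), key=lambda i: pbest_value[i])
    gbest, gbest_value = pbest[g][:], pbest_value[g]
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]                            # inertia
                           + c1 * r1 * (pbest[i][d] - x[i][d])    # pull toward the particle's best
                           + c2 * r2 * (gbest[d] - x[i][d]))      # pull toward the swarm's best
                x[i][d] += v[i][d]
            value = makespan(decode(x[i]), p)
            if value < pbest_value[i]:
                pbest[i], pbest_value[i] = x[i][:], value
                if value < gbest_value:
                    gbest, gbest_value = x[i][:], value
    return decode(gbest), gbest_value


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical instance
print(pso_flow_shop(p))
```

The inertia weight `w` trades global exploration (large values) against local refinement (small values).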

The ant colony optimization (ACO) metaheuris- 
tic is a relatively new technique for solving combi- 
natorial optimization problems. Based strongly on 
the ant system metaheuristic developed by Dorigo, 
Maniezzo and Colorni [24], ant colony optimiza- 
tion is derived from the foraging behavior of real 
ants in nature. The main idea of ACO is to model 
the problem as the search for a minimum cost path 
in a graph. Artificial ants walk through this graph, 
looking for good paths. Each ant has a rather sim- 
ple behavior so that it will typically only find rather 
poor-quality paths on its own. Better paths are 
found as the emergent result of the global cooper- 
ation among ants in the colony. An ACO algorithm 
consists of a number of cycles (iterations) of solution
construction. During each iteration a number of 
ants (which is a parameter) construct complete so- 
lutions using heuristic information and the collected 
experiences of previous groups of ants. These col- 
lected experiences are represented by a digital ana- 
logue of trail pheromone which is deposited on the 
constituent elements of a solution. Small quantities 
are deposited during the construction phase, while
larger amounts are deposited at the end of each iter- 
ation in proportion to solution quality. Pheromone 
can be deposited on the components and/or the con- 
nections used in a solution depending on the prob- 
lem. ACO algorithms for the flow shop scheduling 
problem are presented in [29,70,71,84]. 
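A stripped-down Python sketch of the ACO cycle for the permutation flow shop follows: ants build sequences position by position, choosing jobs with probability proportional to the pheromone on (position, job) pairs; pheromone then evaporates, and the best sequence found so far receives a deposit proportional to its quality. Heuristic information and the small deposits made during construction are omitted, and all parameter values are assumptions.

```python
import random


def makespan(seq, p):
    """Makespan of permutation seq; p[j][k] = processing time of job j on machine k."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            ready = c[k - 1] if k > 0 else 0
            c[k] = max(c[k], ready) + p[j][k]
    return c[-1]


def aco_flow_shop(p, ants=10, iterations=20, rho=0.1, seed=0):
    rng = random.Random(seed)
    n = len(p)
    tau = [[1.0] * n for _ in range(n)]  # tau[pos][job]: pheromone for placing job at pos
    best_seq, best_value = None, float("inf")
    for _ in range(iterations):
        solutions = []
        for _ in range(ants):
            remaining, seq = list(range(n)), []
            for pos in range(n):
                # choose the next job with probability proportional to its pheromone
                job = rng.choices(remaining, weights=[tau[pos][j] for j in remaining])[0]
                remaining.remove(job)
                seq.append(job)
            solutions.append((makespan(seq, p), seq))
        iter_value, iter_seq = min(solutions)
        if iter_value < best_value:
            best_value, best_seq = iter_value, iter_seq
        for row in tau:  # evaporation
            for j in range(n):
                row[j] *= 1.0 - rho
        for pos, job in enumerate(best_seq):  # deposit in proportion to solution quality
            tau[pos][job] += 1.0 / best_value
    return best_seq, best_value


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical instance
print(aco_flow_shop(p))
```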

Greedy randomized adaptive search procedure 
(GRASP) [74] is an iterative two-phase search 
method which has gained considerable popularity 
in combinatorial optimization. Each iteration con- 
sists of two phases, a construction phase and a local 
search procedure. In the construction phase, a ran- 
domized greedy function is used to build up an ini- 
tial solution. This randomized technique provides 
a feasible solution within each iteration. This solution 
is then subjected to improvement attempts in 
the local search phase. The final result is simply the
best solution found over all iterations. GRASP al- 
gorithms for the flow shop scheduling problem are 
presented in [76,77]. 
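The two GRASP phases can be sketched in Python for the permutation flow shop. In the construction phase, jobs are appended one at a time, each drawn at random from a restricted candidate list (RCL) of jobs whose partial makespan is within a fraction `alpha` of the best; the local search then applies first-improvement job insertion moves. The greedy function, the RCL rule and the neighborhood are illustrative assumptions.

```python
import random


def makespan(seq, p):
    """Makespan of (partial) sequence seq; p[j][k] = time of job j on machine k."""
    c = [0] * len(p[0])
    for j in seq:
        for k in range(len(c)):
            ready = c[k - 1] if k > 0 else 0
            c[k] = max(c[k], ready) + p[j][k]
    return c[-1]


def grasp(p, iterations=10, alpha=0.5, seed=0):
    rng = random.Random(seed)
    n = len(p)
    best_seq, best_value = None, float("inf")
    for _ in range(iterations):
        # Construction phase: randomized greedy appending of jobs
        seq, remaining = [], set(range(n))
        while remaining:
            cost = {j: makespan(seq + [j], p) for j in remaining}  # greedy function
            lo, hi = min(cost.values()), max(cost.values())
            rcl = sorted(j for j in remaining if cost[j] <= lo + alpha * (hi - lo))
            job = rng.choice(rcl)
            seq.append(job)
            remaining.remove(job)
        # Local search phase: first-improvement job insertion moves
        value = makespan(seq, p)
        improved = True
        while improved:
            improved = False
            for i in range(n):
                for pos in range(n):
                    if pos == i:
                        continue
                    cand = seq[:]
                    cand.insert(pos, cand.pop(i))
                    cand_value = makespan(cand, p)
                    if cand_value < value:
                        seq, value, improved = cand, cand_value, True
                        break
                if improved:
                    break
        if value < best_value:  # keep the best solution over all iterations
            best_seq, best_value = seq, value
    return best_seq, best_value


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical instance
print(grasp(p))
```

Setting `alpha = 0` gives a purely greedy construction; `alpha = 1` gives a purely random one.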
This article consists of three parts: first, a historical 
introduction to the topic, next an overview of the most
frequent forecasting methods and finally a short de- 
scription of modern computer-aided techniques as they 
are used nowadays (2000) for instance for forecasts on 
financial markets. 


Introduction 


Prediction ideas and information about uncertain fu- 
ture events in general are as old as humanity. Scientific 
forecasts are based on predetermined patterns, regularities 
or conformities with a (natural) law. Their theoretical 
basis is made up of two components: the parameters of 
the model and the conditions of the system.

Predictions of future events are called forecasts and 
are concerned with the question of what the world ‘will’ 
look like [6]. Any organization must be able to make 
forecasts concerning its work, which aim to reduce 
the uncertainty of the environment. Business firms 
in particular require forecasts for a large
number of events and conditions in all phases of their 
operation and forecasts are indispensable for planning 
and strategy in everyday life. 
For centuries, the nature of forecasting was the domain
of philosophers, who studied problems of inductive in- 
ference analytically in order to obtain instruments for 
qualitative and quantitative forecasting. 

In asking whether the future can be predicted or 
whether it is arbitrary and random, people first noticed 
certain regularities in the behavior of nature. These 
regularities were most obvious in the fields
of physics and astronomy in which it was possible to 
make conditional forecasts. They were given their firm 
mathematical basis by I. Newton, in the seventeenth 
century. We still use his theory of gravity which is able 
to predict the motion of (celestial) bodies. 

At the turn of the twentieth century, psychology
began to use experimental methods to investigate learn- 
ing in humans and other organisms. In doing so, psy- 
chologists acquired knowledge about which behavior 
could be forecasted and how to reduce uncertainty as 
much as possible. During the twentieth century, the 
topic of forecasting in general became increasingly im- 
portant, especially after quantitative methods had been 
developed. Various forecasting methods were given pri- 
ority in economics and even more recently the com- 
puter has provided research tools, engendering the field 
of machine learning. Systematic research on trade cycles 
and on crisis management produced the first economic 
forecasts, at about the same time as the early psychological 
investigations.

One of the first aims of economics was to become 
a science which could make forecasts with the help of 
induction. The true measure of the value of economists 
is often seen as the accuracy of the forecasts they 
make [14]. 

J.H. Holland, K. Holyoak, R. Nisbett and P. Thagard 
in 1986 [18] gave an excellent overview of the various 
insights of researchers in psychology, philosophy and 
artificial intelligence. Also borrowing from several other 
disciplines such as engineering, statistics, biology and 
game theory, including experimental economics [20] 
they systematically developed principles providing co- 
herence of a diverse set of findings on the nature of in- 
ductive processes for prospective events in the future. 


Forecasting Methods and Models 


Because methodological approaches vary, forecasting 
methods can be classified in several ways. A fundamental 
division yields two groups of forecasting methods:

i) qualitative methods and 

ii) quantitative methods. 

Another main distinction concerns how similar situations 
are generalized; the generalization can be 
i) data based (usually given in the form of a time series, 
a chronological sequence of observed data with respect 
to a certain variable), expecting that history repeats 
itself in a certain way; and 
ii) theory based, where we assume that external factors 
determine events.
It is natural to start with qualitative forecasting meth- 
ods predicting future events with a certain subjective 
probability: on the one hand we tend to make forecasts 
for similar events on the basis of a certain generaliza- 
tion, on the other hand we try to predict new events 
for those situations where little or no historical data 
are available and for events where we expect changes 
within the data patterns. Generalizations, in a logical 
and methodological sense, rest on the assumption that 
events will have properties analogous to those of the 
past and will tend in the direction of objective processes.

Here we recommend a well-known classification 
by S. Makridakis and S.C. Wheelwright [21]. Our
aim is to discuss a selected subset of these frequently 
used methods. 


Expert Systems 


In questions about future events, a systematic discus- 
sion of a group of five to twelve experts (expert systems) 
usually yields forecasts with a better hit rate than 
individual predictions. This approach belongs to the class 
of judgmental forecasts. Using this method, credibility is one of
the most desirable features of a forecast [10]. 

The Delphi method, developed by members of the 
RAND Corporation in the 1960s [11], avoids face-to- 
face effects by using a procedure based on a ques- 
tionnaire technique. Delphi therefore guarantees three 
basic characteristics: anonymity, interaction with con- 
trolled feedback, and statistical group responses. 
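The "statistical group response" fed back between Delphi rounds is often a median and an interquartile range. A minimal Python sketch of one round's summary follows; the summary statistics, the rule of flagging estimates outside the interquartile range and the example estimates are assumptions for illustration, not part of a prescribed Delphi protocol.

```python
import statistics


def delphi_round_summary(estimates):
    """Statistical group response for one Delphi round.

    Returns the median, the quartiles and the indices of panelists whose estimate
    lies outside the interquartile range; such panelists might be asked, anonymously,
    to justify or revise their estimate in the next round.
    """
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    outliers = [i for i, e in enumerate(estimates) if e < q1 or e > q3]
    return {"median": statistics.median(estimates), "q1": q1, "q3": q3, "outliers": outliers}


# Hypothetical first-round estimates of next year's growth rate (%) by seven experts
round1 = [1.5, 2.0, 2.2, 2.5, 2.5, 3.0, 4.5]
print(delphi_round_summary(round1))
```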


Subjective Curve Fitting 


A frequently used method is subjective curve fitting and 
extrapolation, which is used in economics, for example, 
for forecasts on the development of products with 
certain life cycles or seasonal fluctuations. Experimental 
findings show that several gestalt-oriented rules 
dominate the formation of expectations [4,5]. Subjective 
curve fitting differs to some extent between different 
subjects; frequently we are not only interested in
individual expectations of a single forecasting subject 
but also interested in the so-called average opinion of 
a whole group as a good predictor for future events. For 
example it is a well known hypothesis that the average 
expected rate of inflation has an essential influence on 
various economic variables. 


Technological Comparison 


A sensible method is technological comparison [2] and 
[15]. Obviously, we should enlarge, compare or com- 
bine these qualitative methods with quantitative ones 
(combined forecasts), knowing that even models which 
best fit the available data are not necessarily the most 
accurate ones in predicting beyond this data [23]. 


Expectations and Decision 


The simplest way of modeling expectations of future 
events, which is used frequently in economic theory, 
is to assume that conditions prevailing today will be 
maintained in all subsequent periods. In
cases where no causal explanation from other variables 
seems to be appropriate we could simply use extrapola- 
tion methods with the given data base to enable at least 
a short term forecast. These methods are used successfully 
i) for seasonally adjusted data; and 
ii) for cases where a continuation of the historical trend 
is to be expected.
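The two simplest extrapolations mentioned above can be written down directly. The Python sketch below shows the "conditions persist" (naive) forecast and a drift forecast that continues the average historical change per period; the drift rule is one simple choice of trend extrapolation, assumed here for illustration, and the data are hypothetical.

```python
def naive_forecast(series, horizon):
    """Conditions prevailing today persist: every future value equals the last observation."""
    return [series[-1]] * horizon


def drift_forecast(series, horizon):
    """Continue the historical trend: extrapolate the average change per period."""
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * h for h in range(1, horizon + 1)]


sales = [100, 104, 109, 112, 116]  # hypothetical seasonally adjusted data
print(naive_forecast(sales, 3))    # [116, 116, 116]
print(drift_forecast(sales, 3))    # [120.0, 124.0, 128.0]
```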


Statistical Procedures 


The next step is to use statistical procedures which are 
in some sense learning and in another sense adaptive 
methods. Quantitative forecasting methods, theory and 
data based, using knowledge from mathematical statis- 
tics started to be successful in the early 1960s beginning 
with ideas of R.G. Brown on smoothing methods. 

In particular, exponential smoothing is frequently 
used for producing short term forecasts [8,9]. Brown 
suggested estimating the average of a time series and 
using it as an extrapolation for the forecast. With each 
new observation, the mean square error (MSE) can be 
revised by applying exponential smoothing to the square 
of the error in the most recent forecast. Several techniques have been proposed
using exponential smoothing but it is evident that all the 
history of a process cannot be described by one and the 
same simple model. 
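A minimal Python sketch of simple exponential smoothing, together with the exponentially smoothed squared error described above, is given below. Initializing with the first observation and using the same smoothing constant for the average and for the MSE are simplifying assumptions.

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing plus an exponentially smoothed squared error.

    Returns (forecast for the next period, smoothed MSE).
    """
    forecast = series[0]  # initialize the smoothed average with the first observation
    mse = 0.0
    for y in series[1:]:
        error = y - forecast                           # error of the most recent forecast
        mse = alpha * error ** 2 + (1 - alpha) * mse   # revise the MSE estimate
        forecast = alpha * y + (1 - alpha) * forecast  # revise the smoothed average
    return forecast, mse


demand = [120, 132, 125, 141, 138]  # hypothetical observations
print(exponential_smoothing(demand, alpha=0.3))
```

A larger `alpha` reacts faster to recent observations; a smaller one smooths out more noise.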


Moving Average Model 


In 1970 G.E. Box and G.M. Jenkins introduced more 
sophisticated forecasting models which were the first to 
take into account the nature of the data and the manner 
of the stochastic process to be forecasted. They asked 
not only the question of what to forecast and what data 
to collect, but also what data to analyze and in what 
context to embed the forecast. Their moving average 
models [7] have been applied successfully. In [7] they 
popularized an approach that combines the moving average 
and the autoregressive approaches. The classes
of autoregressive (integrated) moving averages (ARMA 
and ARIMA) processes have been successfully intro- 
duced by them and their models are some of the most 
frequently used tools for stochastic analysis. 
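To show how the autoregressive and moving average parts combine in a forecast, here is a Python sketch of one-step-ahead forecasting for an ARMA(1,1) process whose parameters are assumed to be known. Estimating the parameters and identifying the model order, which are central to the Box-Jenkins approach, are not shown, and setting the first innovation to zero is a simplifying assumption.

```python
def arma11_one_step_forecasts(y, c, phi, theta):
    """One-step-ahead forecasts for y_t = c + phi*y_{t-1} + e_t + theta*e_{t-1}.

    Parameters are assumed known. Entry t of the result forecasts y[t + 1];
    the last entry forecasts the first unobserved value.
    """
    eps = 0.0  # innovation at the first observation is unknown; set it to zero
    forecast = c + phi * y[0] + theta * eps
    forecasts = [forecast]
    for obs in y[1:]:
        eps = obs - forecast                    # innovation = observation minus its forecast
        forecast = c + phi * obs + theta * eps  # AR part + MA part
        forecasts.append(forecast)
    return forecasts


series = [1.0, 2.0, 1.5, 1.8]  # hypothetical observations
print(arma11_one_step_forecasts(series, c=0.2, phi=0.5, theta=0.4))
```

With `theta = 0` the recursion reduces to a pure AR(1) forecast, and with `phi = 0` to a pure MA(1) forecast.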

Several ways to model multiple time series are de- 
scribed in [16]. Further ARIMA models are given in 
[23,24]; [13] gives an excellent comparison of these 
models using performance measures.

When the statistical methods are extended with plausible 
associations and connections, econometric methods 
should be considered if causal relationships and changes 
in causal variables are expected and can be estimated.


Regression Analysis 


Usually it is sensible to work out an a priori 
relationship between the given data sets, so that 
statistical methods of regression analysis can be used. In 
its simplest form, the classical linear regression model is 
used to determine an equation relating two sets of data: 
i) the set of observations of the explanatory or 
independent variables (the predictors); and 
ii) the set of the associated responses, the observations 
of the dependent variables. 
This task often seems easy at first sight, but once 
all details are taken into account it becomes demanding.
Obviously there are some future values which can easily 
be forecasted, e.g. the fuel consumption for a certain 
period as a function of velocity.

As our example from the field of finance will 
demonstrate, there are, however, enough reasons to as- 
sume more complicated situations caused by complex 
systems and/or error terms. For example, demand as 
the variable of interest can be seen as a function of the 
price which takes the role of the predictor. 
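For the demand-price example, the classical linear regression model with a single predictor reduces to ordinary least squares for an intercept and a slope. A minimal Python sketch follows; the data are hypothetical and chosen so that the fit is exact.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


price = [1.0, 2.0, 3.0, 4.0]    # hypothetical predictor values
demand = [10.0, 8.0, 6.0, 4.0]  # hypothetical responses
intercept, slope = fit_line(price, demand)
print(intercept, slope)         # 12.0 -2.0
print(intercept + slope * 5.0)  # forecast of demand at price 5.0: 2.0
```

Real data would leave residuals, and the error-term issues discussed next (autocorrelation, heteroskedasticity, stochastic regressors) determine whether this simple estimator is still adequate.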


Econometric Models 


Obviously, the standard model using the single regres- 
sion equation has been extended in various ways. For 
example, 

i) disturbances are allowed to be autocorrelated and to 
have different variances (heteroskedastic); 

ii) stochastic regressors arise when regressors are 
measured with errors or in dynamic situations (with 
dependence on lagged values).

As proposed already at the foundation of the Econometric 
Society in Cleveland in 1930, we also use nowadays (2000) 
econometric models which are not only data based, but 
also theory oriented.

In econometrics, the single equation regression 
model is enlarged and complex systems of simultaneous 
equation models are used, including several equations 
and also several dependent variables. These models 
are implemented as applied econometrics software 
and form, for instance, the basis for national budget
calculations, usually containing several hundred equa- 
tions. 


Modern Computer Aided Techniques 


To predict future movements of financial markets, 
technical analysts use time series and apply the statistical 
and econometric methods described above. We extend 
these methods by new techniques which are able 
to recognize certain relationships from examples by 
generalization with the help of new computer technologies. 
The methods used in this application are a combination 
of artificial neural networks, genetic algorithms 
(see Genetic algorithms) and fuzzy logic. Obviously
we are not able to go into details, but we try to give 
a short characterization for our application. 


Neural Networks 


These are inspired by the functionality of nerve cells 
in the brain. Like humans, they can learn to recognize 
patterns by repeated exposure to many different ex- 
amples. They can be used to detect salient characteris- 
tics whether they are handwritten characters, profitable 
loans or good trading decisions. Neural networks can learn 
to recognize regularities even in data that are inexact 
or incomplete. A neural network finds such a relationship 
by means of a learning cycle in which a large number of 
samples are presented repeatedly to the network. Neural 
networks cannot guarantee an optimal solution to
a problem. However, properly configured and trained 
neural networks can often make consistently good clas- 
sifications, generalizations or decisions in a statistical 
sense. Neural networks are widely used to identify pat- 
terns in the data of financial markets. 
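The learning cycle can be illustrated with the smallest possible network, a single logistic unit trained by gradient descent while the samples are presented repeatedly. The toy data (two binary indicators and a hypothetical "buy" signal), the learning rate and the number of passes are assumptions; practical networks for financial data have hidden layers and far more inputs.

```python
import math
import random


def predict(weights, bias, inputs):
    """Logistic unit: weighted sum plus bias, squashed into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):  # learning cycle: the samples are presented repeatedly
        for inputs, target in samples:
            error = predict(weights, bias, inputs) - target  # gradient of the log-loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias


# Hypothetical toy data: two binary indicators -> "buy" signal (1) if either is set
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(samples)
print([round(predict(weights, bias, x), 2) for x, _ in samples])
```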


Fuzzy Logic 


This is a strategy that is not based on a mathematical de- 
scription of a special system or market but is intended 
to model the behavior of a human investor. The ex- 
pert’s knowledge is specified in terms of linguistic rules 
in which linguistic expressions are associated with fuzzy 
sets. Fuzzy set methods tend to overcome the vagueness 
of causality. They can be used to explain financial mar- 
kets’ developments using fundamental rules as shown 
in Fig. 1. 

Fuzzy logic is a superset of conventional (Boolean) 
logic that has been extended to handle the concept of 
partial truth, that is, truth values between ‘completely true’ 
and ‘completely false’ [25,26]. In other words, a fuzzy 
system is a collection of ‘membership functions’ and 
rules that are used to reason about data. In our example 
the ranges of the interest rates of Germany and the 
USA give forecasts for the exchange rate between these 
currencies. Fuzzy logic enables us to model and predict
market developments on the basis of the experience of 
financial experts. 
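The two rules of Fig. 1 can be evaluated with a few lines of Python. The triangular membership functions, the crisp values attached to "stable" and "increasing", the use of `min` for AND, and the weighted-average (zero-order Sugeno style) defuzzification are all assumptions made for illustration; the rule base in the figure may use other sets and another defuzzification method.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


LOW = (-1.0, 0.0, 2.5)         # hypothetical "low inflation" set (% per year)
MEDIUM = (1.0, 2.5, 4.0)       # hypothetical "medium inflation" set
STABLE, INCREASING = 0.0, 1.0  # hypothetical crisp values of the output terms


def usd_euro_trend(infl_germany, infl_us):
    # IF inflation German = low AND inflation US = low THEN US/EURO = stable
    r1 = min(triangular(infl_germany, *LOW), triangular(infl_us, *LOW))
    # IF inflation German = medium AND inflation US = low THEN US/EURO = increasing
    r2 = min(triangular(infl_germany, *MEDIUM), triangular(infl_us, *LOW))
    total = r1 + r2
    if total == 0:
        return None  # no rule fires
    return (r1 * STABLE + r2 * INCREASING) / total  # weighted-average defuzzification


print(usd_euro_trend(1.5, 0.5))  # partly "stable", partly "increasing"
```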


Genetic Algorithms 


A genetic algorithm can be used to optimize a wide 
range of functions. Genetic algorithms are search procedures 
based on the mechanisms of natural selection, mutation 
and recombination. A population consisting of chromosomes 
(i.e., solutions for a function) is created randomly. In the 
next step each chromosome is evaluated and given a 
certain fitness value, which represents the feasibility and 
the optimality of the solution. Depending on their 
fitness values, a certain percentage of the population is 
selected and deleted. The surviving individuals are 
recombined and mutated. After the population has been 
evaluated and the forecast adjustment based on past 
data has been decided, the selection process starts again.

Forecasting, Figure 1 
Fuzzy logic rules to predict the US Dollar/Euro fixing (rule evaluation followed by defuzzification), e.g.: 
IF inflation German = low AND inflation US = low THEN US/EURO = stable 
IF inflation German = medium AND inflation US = low THEN US/EURO = increasing 

Forecasting, Figure 2 
Genetic algorithm used for optimizing a rule base to forecast the US Dollar/Euro fixing


The genetic algorithm is used to optimize the certainty 
of each rule in the fuzzy logic rule base. As shown 
in Fig. 2, the fitness function of the genetic algorithm 
consists of a fuzzy logic rule base and several mathematical 
objects to calculate the forecast error. The forecast
error is used to evaluate the individuals. All rules are ap- 
plied to historic financial data and their forecast error is 
summed up over the whole horizon. 

Rules which are only partly true get lower certainty 
values until their certainty corresponds to their actual 
influence. Currency markets tend to follow certain 
regularities, detected by expert knowledge or, for example, 
by the theory of purchasing power parity.
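The procedure described above (random population, fitness given by the forecast error summed over the historical horizon, deletion of the worst fraction, recombination and mutation of the survivors) can be sketched in Python for rule certainties. The two-rule output values, the weighted-average forecast, the historical records and all GA parameters are hypothetical; the records are constructed so that the "increasing" rule is always right and should therefore end up with the higher certainty.

```python
import random

RULE_OUTPUTS = [0.0, 1.0]  # hypothetical: rule 1 -> "stable" (0), rule 2 -> "increasing" (+1)


def forecast(certainties, strengths):
    """Weighted average of rule outputs, weighted by certainty times firing strength."""
    num = sum(c * s * o for c, s, o in zip(certainties, strengths, RULE_OUTPUTS))
    den = sum(c * s for c, s in zip(certainties, strengths))
    return num / den if den > 1e-12 else 0.0


def total_error(certainties, history):
    """Fitness: absolute forecast error summed over the whole horizon (lower is better)."""
    return sum(abs(forecast(certainties, s) - actual) for s, actual in history)


def optimize_certainties(history, pop_size=20, generations=40, cull=0.5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in RULE_OUTPUTS] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: total_error(c, history))
        survivors = population[: int(pop_size * (1 - cull))]  # delete the worst fraction
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform recombination
            # Gaussian mutation, clipped so that certainties stay in [0, 1]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in child]
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda c: total_error(c, history))
    return best, total_error(best, history)


# Hypothetical history: (firing strengths of the two rules, realized trend)
history = [([0.5, 0.5], 1.0), ([0.8, 0.2], 1.0), ([0.3, 0.7], 1.0)]
print(optimize_certainties(history))
```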

A similar procedure is applied to neural networks 
in order to optimize the topology of the neural network 
itself as well as a data mining approach to identify the 
input parameters. The genetic algorithm allows us to 
discard useless time series. When forecasting financial 
markets, an appropriate and adaptive selection of input 
parameters is necessary.

In our case the inputs are economic data, which are used 
to obtain forecasts of future developments of financial 
markets.

One goal of system theory is to integrate the ideas of 
several disciplines into a successful instrument for 
analyzing forecasting processes, including learning and 
discovery directed toward optimality. This process of 
searching for the best value that can be realized or 
attained is based on the actions of subjects, which 
cannot be forecasted with total certainty.

Finally, we can summarize this as follows, using the different stages we took into account:

• historical comparison based on repeatedly observed similar events and on statistical data, e.g. business fluctuations, Harvard's barometers, chart extrapolations;

• time series analysis, based on procedures of mathematical statistics;

• econometric forecasting models, including stepwise regression models, as well as vector autoregressive models (VAR);

• modern techniques, mainly computer aided, and software which is available as adaptive models, learning models, artificial neural networks (ANN), fuzzy set models and evolutionary algorithms.


See also 


> Continuous Global Optimization: Applications 
Consider a system of m linear inequalities in n real variables

$$Ax \le b, \qquad (1)$$

where $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$ is the vector of unknowns and A, b are a given real matrix and vector. Let $X = \{x \in \mathbb{R}^n : Ax \le b\}$ be the solution set of the system, and let $X^{[k]}$ denote the projection of X onto the linear space spanned by the last $n - k$ coordinates:

$$X^{[k]} = \{(x_{k+1}, \dots, x_n) \in \mathbb{R}^{n-k} : \exists (x_1, \dots, x_k) \in \mathbb{R}^k \text{ s.t. } (x_1, \dots, x_n) \in X\}.$$


The Fourier-Motzkin method [3,4,5,8,10,12,14,15] successively eliminates variables $x_1, \dots, x_{n-1}$ from (1) and computes matrices $A^{[k]}$ and vectors $b^{[k]}$ such that

$$X^{[k]} = \{x^{[k]} \in \mathbb{R}^{n-k} : A^{[k]} x^{[k]} \le b^{[k]}\}, \qquad k = 1, \dots, n-1,$$

where $x^{[k]} = (x_{k+1}, \dots, x_n)^T$.


In order to eliminate variable $x_1$, we first multiply each of the m inequalities of (1) by an appropriate positive scalar to make each entry in the first column of A equal to ±1 or 0. We can thus assume without loss of generality that the original system of inequalities has the form

$$\begin{aligned} +1 \cdot x_1 + \alpha_i(x^{[1]}) &\le 0, & i &\in M_+,\\ -1 \cdot x_1 + \alpha_i(x^{[1]}) &\le 0, & i &\in M_-,\\ 0 \cdot x_1 + \alpha_i(x^{[1]}) &\le 0, & i &\in M_0, \end{aligned}$$

where $\alpha_i(x^{[1]}) = a_{i2} x_2 + \cdots + a_{in} x_n + \beta_i$ are given affine forms of $x^{[1]} = (x_2, \dots, x_n)^T \in \mathbb{R}^{n-1}$ and $M_+$, $M_-$, $M_0$ are disjoint sets of (indices of) inequalities partitioning the entire set of inequalities in (1):

$$M_+ \cup M_- \cup M_0 = \{1, \dots, m\}.$$


It is easy to see that for each fixed $x^{[1]}$, the inequalities with indices $i \in M_+ \cup M_-$ can be satisfied by some real $x_1$ if and only if each upper bound $-\alpha_i(x^{[1]})$, $i \in M_+$, on $x_1$ is at least as large as each lower bound $\alpha_j(x^{[1]})$, $j \in M_-$, on the same variable, i.e., $-\alpha_i(x^{[1]}) \ge \alpha_j(x^{[1]})$ for all $i \in M_+$ and $j \in M_-$. Combining these $|M_+||M_-|$ inequalities with the remaining $|M_0|$ inequalities of (1) that do not depend on $x_1$, we arrive at the system of $|M_+||M_-| + |M_0|$ linear inequalities

$$\begin{aligned} \alpha_i(x^{[1]}) + \alpha_j(x^{[1]}) &\le 0, & (i, j) &\in M_+ \times M_-,\\ \alpha_i(x^{[1]}) &\le 0, & i &\in M_0, \end{aligned}$$

whose solution set is $X^{[1]}$. The above system can be written as $A^{[1]} x^{[1]} \le b^{[1]}$ with an appropriate matrix $A^{[1]}$ and vector $b^{[1]}$. This gives $X^{[1]} = \{x^{[1]} \in \mathbb{R}^{n-1} : A^{[1]} x^{[1]} \le b^{[1]}\}$. Eliminating variable $x_2$ from $A^{[1]} x^{[1]} \le b^{[1]}$ we obtain a similar description $X^{[2]} = \{x^{[2]} \in \mathbb{R}^{n-2} : A^{[2]} x^{[2]} \le b^{[2]}\}$ for the second projection, and so on. After $n - 1$ steps of the above procedure we have $n - 1$ matrices $A^{[k]}$ and vectors $b^{[k]}$ such that $X^{[k]} = \{x^{[k]} \in \mathbb{R}^{n-k} : A^{[k]} x^{[k]} \le b^{[k]}\}$, $k = 1, \dots, n-1$.


Solution of Systems of Linear Inequalities 
and Linear Programming Problems 


If the solution set $X = \{x \in \mathbb{R}^n : Ax \le b\}$ is nonempty, then so are all the projections $X^{[k]} \subseteq \mathbb{R}^{n-k}$, $k = 1, \dots, n-1$, and vice versa. In particular, if $Ax \le b$ is feasible, then

$$X^{[n-1]} = \{x^{[n-1]} \in \mathbb{R} : A^{[n-1]} x^{[n-1]} \le b^{[n-1]}\}$$

is a nonempty interval on the scalar variable $x^{[n-1]} = x_n$. Given $A^{[n-1]}$ and $b^{[n-1]}$, we can easily find a point $\hat{x}_n \in X^{[n-1]}$. Then, substituting $x_n = \hat{x}_n$ into $A^{[n-2]} x^{[n-2]} \le b^{[n-2]}$, we obtain a new feasible system of linear inequalities whose solution set is the interval $\{x_{n-1} \in \mathbb{R} : (x_{n-1}, \hat{x}_n) \in X^{[n-2]}\}$. Solving this one-variable system yields a point $\hat{x}^{[n-2]} = (\hat{x}_{n-1}, \hat{x}_n) \in X^{[n-2]}$, which can be substituted into $A^{[n-3]} x^{[n-3]} \le b^{[n-3]}$, etc. By repeating such backward substitutions, the Fourier-Motzkin method can compute a solution $(\hat{x}_1, \dots, \hat{x}_n)$ to any feasible system of linear inequalities $Ax \le b$. 'Historically, it is the "pre-linear programming" method to solve linear inequalities' [14].
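Elimination followed by backward substitution can be sketched in Python with exact rational arithmetic (`fractions.Fraction`) so that no rounding enters the comparisons. This is a minimal sketch under our own assumptions: the helper names are hypothetical, the midpoint of each interval is an arbitrary choice of sample point, and no redundant inequalities are removed, so the systems can grow quickly.

```python
import math
from fractions import Fraction


def eliminate_first(A, b):
    """One Fourier-Motzkin step: remove x_1 from A x <= b."""
    pos, neg, zero = [], [], []
    for row, rhs in zip(A, b):
        c = Fraction(row[0])
        if c == 0:
            zero.append(([Fraction(v) for v in row[1:]], Fraction(rhs)))
        else:
            s = 1 / abs(c)  # positive scaling makes the x_1 coefficient +1 or -1
            (pos if c > 0 else neg).append(([v * s for v in row[1:]], rhs * s))
    A1, b1 = [], []
    for rp, bp in pos:          # every (upper bound, lower bound) pair on x_1 ...
        for rn, bn in neg:      # ... yields one inequality without x_1
            A1.append([u + v for u, v in zip(rp, rn)])
            b1.append(bp + bn)
    for r0, b0 in zero:         # inequalities not involving x_1 are kept
        A1.append(r0)
        b1.append(b0)
    return A1, b1


def interval(coeffs, rhs):
    """Solution set of c_i * t <= r_i in one variable t as (lo, hi), or None."""
    lo, hi = -math.inf, math.inf
    for c, r in zip(coeffs, rhs):
        if c > 0:
            hi = min(hi, r / c)
        elif c < 0:
            lo = max(lo, r / c)
        elif r < 0:             # 0 * t <= r with r < 0 cannot hold
            return None
    return None if lo > hi else (lo, hi)


def pick(lo, hi):
    """Some point of [lo, hi]; either end may be infinite."""
    if lo > -math.inf and hi < math.inf:
        return (lo + hi) / 2
    if lo > -math.inf:
        return lo
    return hi if hi < math.inf else Fraction(0)


def fourier_motzkin_solve(A, b):
    """Return a point of {x : A x <= b} or None if the system is infeasible."""
    n = len(A[0])
    systems = [(A, b)]                   # systems[k] describes X^[k]
    for _ in range(n - 1):
        systems.append(eliminate_first(*systems[-1]))
    tail = []                            # values already fixed for x_{k+2}, ..., x_n
    for k in range(n - 1, -1, -1):       # backward substitution
        Ak, bk = systems[k]
        coeffs = [Fraction(row[0]) for row in Ak]
        rhs = [Fraction(r) - sum(Fraction(a) * v for a, v in zip(row[1:], tail))
               for row, r in zip(Ak, bk)]
        iv = interval(coeffs, rhs)
        if iv is None:
            return None
        tail.insert(0, pick(*iv))
    return tail


# x1 + x2 <= 4,  -x1 + x2 <= 0,  -x2 <= 0
print(fourier_motzkin_solve([[1, 1], [-1, 1], [0, -1]], [4, 0, 0]))
# -> [Fraction(2, 1), Fraction(1, 1)]
```

Here the projection $X^{[1]}$ is the interval $0 \le x_2 \le 2$, its midpoint $x_2 = 1$ is chosen, and substitution then leaves $1 \le x_1 \le 3$.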

Now suppose that the input system is infeasible, i.e. $X = \{x \in \mathbb{R}^n : Ax \le b\} = \emptyset$. As was pointed out in [10], the Fourier-Motzkin method can then find nonnegative real multipliers $p_1, \dots, p_m$ such that

$$pA = 0, \qquad pb = -1, \qquad p = (p_1, \dots, p_m) \ge 0. \qquad (2)$$

To see this, observe that each inequality in $A^{[1]} x^{[1]} \le b^{[1]}$ is a positive combination of at most two inequalities of the original system. Since a nonnegative combination of nonnegative combinations of some inequalities is a nonnegative combination of the same inequalities, we conclude that each inequality in each system $A^{[k]} x^{[k]} \le b^{[k]}$, $k = 1, \dots, n-1$, is a nonnegative combination of the input inequalities. Considering that $A^{[n-1]} x^{[n-1]} \le b^{[n-1]}$ is an infeasible system of linear inequalities in one variable, it is easily seen to contain one or two inequalities whose positive combination yields the infeasible inequality $0 \cdot x_n \le -1$. This is equivalent to (2). In particular, the Fourier-Motzkin method provides a simple algorithmic proof of the Farkas lemma (cf. » Farkas lemma; » Farkas lemma: Generalizations): (1) is feasible if and only if (2) is infeasible.
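The multiplier bookkeeping behind (2) can be sketched by letting every derived inequality carry the vector p of nonnegative multipliers that produces it from the input rows. In this hypothetical helper all n variables are eliminated (including $x_n$), so each final row reads $0 \le r$; a row with $r < 0$, rescaled by $1/(-r)$, gives p with $pA = 0$ and $pb = -1$. No redundancy removal is attempted.

```python
from fractions import Fraction


def farkas_certificate(A, b):
    """Return p >= 0 with pA = 0 and pb = -1 if A x <= b is infeasible, else None."""
    m, n = len(A), len(A[0])
    # each row: (coefficients, rhs, multipliers p) with coefficients = pA, rhs = pb
    rows = [([Fraction(v) for v in A[i]], Fraction(b[i]),
             [Fraction(int(i == j)) for j in range(m)]) for i in range(m)]
    for col in range(n):
        pos, neg, new = [], [], []
        for a, r, p in rows:
            c = a[col]
            if c == 0:
                new.append((a, r, p))          # does not involve this variable
            else:
                s = 1 / abs(c)                 # positive scaling keeps p >= 0
                (pos if c > 0 else neg).append(
                    ([v * s for v in a], r * s, [q * s for q in p]))
        for ap, rp, pp in pos:                 # pairwise sums cancel the variable
            for an, rn, pn in neg:
                new.append(([u + v for u, v in zip(ap, an)], rp + rn,
                            [u + v for u, v in zip(pp, pn)]))
        rows = new
    for _, r, p in rows:                       # every row now reads 0 <= r
        if r < 0:
            return [q / -r for q in p]
    return None


# x1 + x2 <= 1,  -x1 <= 0,  -x2 <= 0,  -x1 - x2 <= -3  (infeasible)
A = [[1, 1], [-1, 0], [0, -1], [-1, -1]]
b = [1, 0, 0, -3]
p = farkas_certificate(A, b)
```

For this system the sketch combines the first and last inequalities with weight 1/2 each, which adds up to $0 \le -1$.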

The Fourier-Motzkin method can also be used to solve the general linear programming problem

$$\xi^* = \max\{c^T x : Ax \le b,\ x \in \mathbb{R}^n\}. \qquad (3)$$

For instance, we can eliminate the n variables $x = (x_1, \dots, x_n)$ from $Ax \le b$, $x_{n+1} - c^T x \le 0$ to determine the interval $X^{[n]} = \{x_{n+1} \in \mathbb{R} : x_{n+1} \le \xi^*\}$. Then, letting $x_{n+1} = \xi^*$ and solving the resulting system yields an optimal solution.

It should be mentioned that there are far more efficient linear programming algorithms. Note, however, that (3) calls for projecting $X = \{x \in \mathbb{R}^n : Ax \le b\}$ onto a one-dimensional subspace. After an appropriate linear transformation, the Fourier-Motzkin method can project X onto any given subspace of $\mathbb{R}^n$.


Complexity of the Fourier-Motzkin Method 


Let $m_k$ denote the number of inequalities in the kth system $A^{[k]} x^{[k]} \le b^{[k]}$ generated by the Fourier-Motzkin method, and let $m_0 = m$. Since $m_k = |M_+||M_-| + |M_0| \le m_{k-1}^2$, where $M_+$, $M_-$, $M_0$ now partition the inequalities of the $(k-1)$st system, the number of inequalities is at most squared at each step of the method. This implies that $m_k$ is bounded by a doubly exponential function in k, namely $m_k \le m^{2^k}$. The following example shows that, with sufficiently many variables, the kth step of the method can indeed produce

$$m_k = m^{2^{k(1-o(1))}}$$

inequalities.


Example 1 [14] Let $n = 2^k + k + 2$ and consider a system of linear inequalities $Ax \le b$ which contains as left-hand sides all $m = 8\binom{n}{3}$ linear forms $\pm x_{i_1} \pm x_{i_2} \pm x_{i_3}$ for all $1 \le i_1 < i_2 < i_3 \le n$. By induction on $j = 1, \dots, k$ it is easy to show that after eliminating the first j variables, the resulting system includes among its left-hand sides all the forms $\pm x_{i_1} \pm \cdots \pm x_{i_s}$ with $k+1 \le i_1 < \cdots < i_s \le n$ and $s = 2^j + 2$. In particular, for $j = k$ we have at least

$$2^{2^k + 2} = m^{2^{k(1-o(1))}}$$

inequalities in $A^{[k]} x^{[k]} \le b^{[k]}$.


Let us now return to the first step of the algorithm, where we replace $Ax \le b$ by the $|M_+||M_-| + |M_0|$ new inequalities $A^{[1]} x^{[1]} \le b^{[1]}$. As was pointed out already by J.B.J. Fourier, 'it nearly always happens that a rather large number of these new inequalities are redundant' and 'their removal greatly simplifies the problem' [8]. If the redundant inequalities are systematically removed at each step of the algorithm, the number $m_k$ of inequalities generated by the kth step of the Fourier-Motzkin method is bounded by an exponential function in k. Assume without loss of generality that $X = \{x \in \mathbb{R}^n : Ax \le b\}$ is full-dimensional; then each projection $X^{[k]}$ is also full-dimensional and $m_k$ is the number of facets of $X^{[k]}$. Therefore $m_k$ is bounded by the total number of i-faces of X for $i \ge n - k - 1$. Hence

$$m_k \le \sum_{j=0}^{k+1} \binom{m}{j} \sim \frac{m^{k+1}}{(k+1)!} \qquad \text{for } m \to \infty.$$

(This rough estimate can be improved by using the upper bound theorem [11]; in particular, $m_k$ cannot grow faster than $m^{\lfloor n/2 \rfloor}$.) In the example below, $X^{[k]}$ has

$$m_k \ge \frac{m^{k+1}}{(k+1)^{k+1}}$$

facets.


Example 2. Let $s \ge 2$ be a natural number. Consider the system of $m = (k+1)s$ linear inequalities

$$\begin{aligned} y_{ij} &\ge x_i, & i &= 1, \dots, k,\ j = 1, \dots, s,\\ x_1 + \cdots + x_k &\ge z_l, & l &= 1, \dots, s, \end{aligned}$$

where $x_i$, $y_{ij}$ and $z_l$ are real variables. The elimination of $x_1, \dots, x_k$ results in $s^{k+1} = (m/(k+1))^{k+1}$ inequalities

$$y_{1f(1)} + \cdots + y_{kf(k)} \ge z_l, \qquad l = 1, \dots, s,$$

where f ranges over the set of all $s^k$ mappings from $\{1, \dots, k\}$ to $\{1, \dots, s\}$. None of the inequalities above is redundant. For instance,

$$y_{11} + \cdots + y_{k1} \ge z_1$$

is violated by $y_{11} = \cdots = y_{k1} = 0$, $z_1 = 1$ and $z_2 = \cdots = z_s = 0$, whereas all the other inequalities can be satisfied by giving the remaining variables $y_{ij}$ a high value.


Since detecting the redundancy of an inequality can be done via linear programming (or by maintaining a list of vertices and extreme directions of $X^{[k]}$ with the double description method [4,13]; see also [9,15] and references therein), the Fourier-Motzkin method with redundancy removal runs in exponential space and time. It is natural to ask whether, given $X = \{x \in \mathbb{R}^n : Ax \le b\}$ and a number $k \in \{1, \dots, n-1\}$, an irredundant description of $X^{[k]} = \{x^{[k]} \in \mathbb{R}^{n-k} : A^{[k]} x^{[k]} \le b^{[k]}\}$ can be computed in output-polynomial time, i.e. by an algorithm that runs in time polynomial in the total input and output size. This question is open even in the bit model of computation for rational A and b, when redundant inequalities can be detected in polynomial time. A related problem is the generation of all vertices of $X = \{x \in \mathbb{R}^n : Ax \le b\}$. The vertex generation problem (or its dual, the convex hull problem) can also be solved by the double description method, see e.g. [1], but the question as to whether there is an output-polynomial vertex generation algorithm remains open.


Finally, we mention that the Fourier-Motzkin method can be modified to a quantifier-elimination method for arbitrary semilinear sets

$$X^{[k]} = \{(x_{k+1}, \dots, x_n) \in \mathbb{R}^{n-k} : (Q_1 x_1 \in \mathbb{R}) \cdots (Q_k x_k \in \mathbb{R})\ F(x_1, \dots, x_n) \text{ true}\}, \qquad (4)$$

where $Q_1, \dots, Q_k \in \{\exists, \forall\}$ are existential and/or universal quantifiers and $F(x_1, \dots, x_n)$ is a given Boolean function of m threshold predicates

$$F_i(x) = \begin{cases} \text{true} & \text{if } a_i^T x \le b_i,\\ \text{false} & \text{otherwise}, \end{cases}$$

with given coefficients $a_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$, $i = 1, \dots, m$.

In particular, if $Q_1, \dots, Q_k$ are all existential quantifiers and $F = F_1 \wedge \cdots \wedge F_m$, we obtain the previously considered problem of projecting the polyhedral set $X = \{x \in \mathbb{R}^n : a_i^T x \le b_i,\ i = 1, \dots, m\}$ onto the space spanned by the last $n - k$ coordinates. In general, (4) can be transformed into an equivalent quantifier-free representation $X^{[k]} = \{(x_{k+1}, \dots, x_n) : G(x_{k+1}, \dots, x_n) \text{ true}\}$, where G is some Boolean formula whose atoms are new threshold predicates of $(x_{k+1}, \dots, x_n) \in \mathbb{R}^{n-k}$. This can be done, for instance, as follows [6,7]. To eliminate the rightmost quantifier $Q_k x_k \in \mathbb{R}$, write each threshold inequality involving $x_k$ in the form $x_k \le \alpha_i(x^{(k)})$ or $x_k \ge \alpha_i(x^{(k)})$, where the $\alpha_i$ are given affine forms of the remaining variables $x^{(k)} = (x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n)$. Replace the infinite range $x_k \in \mathbb{R}$ by the finite set S of sample points $x_k = (\alpha_i(x^{(k)}) + \alpha_j(x^{(k)}))/2$ and $x_k = \pm\infty$. Now it is easy to see that the expression $(\exists x_k \in \mathbb{R})\, F(x_1, \dots, x_n)$ is equivalent to the quantifier-free disjunction $\bigvee_{x_k \in S} F(x_1, \dots, x_n)$ and that $(\forall x_k \in \mathbb{R})\, F(x_1, \dots, x_n)$ can be replaced by the equivalent conjunction $\bigwedge_{x_k \in S} F(x_1, \dots, x_n)$. Quantifiers $Q_{k-1} x_{k-1}, \dots, Q_1 x_1$ can be eliminated in the same way. For a discussion of faster algorithms that eliminate blocks of consecutive identically quantified variables see [2].
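The sample-point idea can be checked numerically once the remaining variables $x^{(k)}$ are fixed, so that each $\alpha_i(x^{(k)})$ is just a number (called a cut below). This is only an illustration under that assumption: the real method keeps the $\alpha_i$ as affine forms and builds the formula G, which this sketch does not do. F is any Python predicate built from comparisons `t <= cut` and `t >= cut`; the function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def sample_points(cuts):
    """Finite set S: all midpoints (c_i + c_j) / 2, including i = j, plus -inf and +inf."""
    pts = {(ci + cj) / 2 for ci, cj in combinations_with_replacement(cuts, 2)}
    return pts | {-math.inf, math.inf}


def exists(F, cuts):
    """(exists t in R) F(t), decided by the disjunction over S."""
    return any(F(t) for t in sample_points(cuts))


def forall(F, cuts):
    """(for all t in R) F(t), decided by the conjunction over S."""
    return all(F(t) for t in sample_points(cuts))


cuts = [1, 3]
print(exists(lambda t: t <= 1 and t >= 3, cuts))   # False: empty set
print(forall(lambda t: t <= 1 or t >= 3, cuts))    # False: t = 2 is a counterexample
```

The midpoints with $i = j$ hit the cuts themselves, the other midpoints land inside the open intervals between cuts, and $\pm\infty$ stand for the two unbounded intervals; F is constant on each of these pieces.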


See also 


> Linear Programming 
A fractional combinatorial optimization problem (FCOP) is a combinatorial optimization problem with an objective function which is a ratio of two (nontrivial) functions. Instances of a FCOP can be expressed in the general form:

$$\max \frac{f(x)}{g(x)}, \quad \text{for } x \in X, \qquad (1)$$

where $X \subseteq \{0,1\}^p$ is a set of (vectors representing) certain combinatorial structures, and f and g are real-valued functions defined on X. Numbers $f(x)$, $g(x)$, and $f(x)/g(x)$ are usually called the cost, the weight, and the mean-weight cost of structure x. A minimization FCOP can be converted into an equivalent maximization problem by replacing the cost function f with $-f$. The FCOPs which appear in
the literature on combinatorial optimization include: 
the minimum ratio spanning-tree problem [2,13,14]; 
the maximum profit-to-time ratio cycle problem and 
the equivalent minimum cost-to-time ratio cycle prob- 
lem [1,3,6,11,12,13,14]; the minimum mean cycle prob- 
lem [1,10,11]; the maximum mean-weight cut prob- 
lem [16]; the maximum mean cut problem [5,9]; and 
the fractional 0-1 knapsack problem [7,8]. 

Consider, as an example, the minimum cost-to-time ratio cycle problem (MRCP). An instance of this problem consists of a directed graph $G = (V, E)$, where $E = \{e_1, \dots, e_m\}$ is the set of edges, and numbers $c_i$ and $t_i$ associated with each edge $e_i$, for $i = 1, \dots, m$. The objective is to find a simple cycle Γ in G which minimizes the ratio of $\sum\{c_i : e_i \in \Gamma\}$ to $\sum\{t_i : e_i \in \Gamma\}$. To express this instance of the MRCP in the form (1), let $X \subseteq \{0,1\}^m$ be the set of the characteristic vectors of the simple cycles in G, and for $x = (x_1, \dots, x_m) \in \{0,1\}^m$, let $f(x) = -(c_1 x_1 + \cdots + c_m x_m)$ and $g(x) = t_1 x_1 + \cdots + t_m x_m$. The MRCP models the following tramp steamer problem [1,12]: V is the set of ports which can be visited by our cargo ship; $E \subseteq V \times V$ is the set of possible direct port-to-port trips; numbers $c_i$ and $t_i$ are the cost and the transit time of trip $e_i \in E$, respectively; and the objective is to find a closed tour for the ship which minimizes the daily cost (or, equivalently, maximizes the daily profit).

A FCOP in which the denominator of the objective function is $g(x_1, \dots, x_p) = x_1 + \cdots + x_p$ is commonly called a uniform fractional combinatorial optimization problem. The minimum mean cycle problem (which is a special case of the MRCP with all numbers $t_i$ equal to 1) is a uniform FCOP. A FCOP such that $f(x_1, \dots, x_p) = a_1 x_1 + \cdots + a_p x_p$ and $g(x_1, \dots, x_p) = b_1 x_1 + \cdots + b_p x_p$ is called a linear fractional combinatorial optimization problem. All FCOPs mentioned above are linear.

Generic methods for FCO usually follow the parametric approach to fractional optimization. Let $\delta \in \mathbb{R}$ be a parameter. The problem

$$\max\ f(x) - \delta \cdot g(x), \quad \text{for } x \in X, \qquad (2)$$

is called the parametric problem corresponding to the fractional problem (1). Let $h(\delta)$ denote the optimum objective value of problem (2). From now on assume that $f(x) > 0$ for some $x \in X$, and $g(x) > 0$ for all $x \in X$. Function h is continuous, convex, piecewise linear and strictly decreasing on $(-\infty, +\infty)$. It has exactly one root $\delta^*$, and this root is the optimum objective value of problem (1). The main generic methods for FCO are the binary search method, the Newton method, and Megiddo's parametric search. They all can be viewed as methods for finding the root of function h.


The Binary Search Method (BSM) 


This method maintains an interval $[\alpha, \beta]$ containing the root $\delta^*$ of function h, and halves this interval in each iteration by checking the sign of $h((\alpha + \beta)/2)$. Thus to apply the BSM, one needs an algorithm $A_0$ which for a given $\delta \in \mathbb{R}$ calculates the sign of the optimum objective value of problem (2). For a linear FCOP such that all numbers $|a_i|$ and $|b_i|$ are integers not greater than U (an integral linear FCOP), the BSM finds an optimum solution in $O(\log(pU))$ iterations. This follows from the fact that if the numbers $f(x')/g(x')$ and $f(x'')/g(x'')$ are not equal, they must differ by at least $1/(pU)^2$ (both denominators are positive integers not greater than pU, so the difference is a nonzero multiple of $1/(g(x')g(x''))$). Hence, as soon as the length of the search interval $[\alpha, \beta]$ becomes less than $1/(pU)^2$, it contains only one value $f(x)/g(x)$, which must be equal to $\delta^*$.
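A toy sketch of the BSM for an integral linear FCOP, assuming X is small enough to be listed explicitly so that the parametric problem (2) can be solved by brute force (this stands in for the algorithm $A_0$; the function names are hypothetical). The search starts from $[-pU, pU]$, which contains every ratio because $1 \le g(x) \le pU$, and stops once the interval is shorter than $1/(pU)^2$.

```python
from fractions import Fraction
from itertools import product


def parametric(delta, X, a, b):
    """h(delta) = max over x in X of f(x) - delta * g(x), and a maximizer (brute force)."""
    best_val, best_x = None, None
    for x in X:
        f = sum(ai * xi for ai, xi in zip(a, x))
        g = sum(bi * xi for bi, xi in zip(b, x))
        val = f - delta * g
        if best_val is None or val > best_val:
            best_val, best_x = val, x
    return best_val, best_x


def binary_search_fcop(X, a, b):
    p = len(a)
    U = max(abs(v) for v in list(a) + list(b))
    lo, hi = Fraction(-p * U), Fraction(p * U)  # contains every ratio f/g
    eps = Fraction(1, (p * U) ** 2)
    while hi - lo >= eps:
        mid = (lo + hi) / 2
        if parametric(mid, X, a, b)[0] >= 0:    # h(mid) >= 0 means mid <= delta*
            lo = mid
        else:                                   # h(mid) < 0 means mid > delta*
            hi = mid
    # a maximizer at lo has ratio >= lo and <= delta* <= hi, so its ratio is delta*
    _, x = parametric(lo, X, a, b)
    ratio = Fraction(sum(ai * xi for ai, xi in zip(a, x)),
                     sum(bi * xi for bi, xi in zip(b, x)))
    return x, ratio


X = [x for x in product((0, 1), repeat=3) if any(x)]  # nonzero vectors, so g(x) > 0
a, b = (3, 1, 4), (2, 1, 3)
x_opt, ratio = binary_search_fcop(X, a, b)
```

For these data the best ratio is $3/2$, attained by $x = (1, 0, 0)$; with $pU = 12$ the loop runs about a dozen times.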


The Newton Method (NM) 


This generic method for fractional optimization, also called the Newton-Raphson method or the Dinkelbach method [4], is an application of the classical Newton method to the problem of finding the root $\delta^*$ of function h. The NM computes an increasing sequence $\delta_1, \delta_2, \dots$ of lower bounds on $\delta^*$. During iteration i, a solution $x^{(i)}$ of problem (2) for $\delta = \delta_i$ is computed, and $\delta_{i+1}$ is set to $f(x^{(i)})/g(x^{(i)})$ (see Fig. 1). The NM finds an optimum solution of a FCOP in a finite number of iterations, because function h consists of a finite number ($\le |X|$) of linear segments. The NM solves a uniform FCOP in at most $p + 1$ iterations, because function h corresponding to such a problem consists of at most p linear segments (since function g yields at most p different values). Other bounds on the number of iterations of the NM for FCO can be derived from the fact that for each iteration i, except the last one,

$$\frac{h(\delta_{i+1})}{h(\delta_i)} + \frac{g(x^{(i+1)})}{g(x^{(i)})} \le 1, \qquad (3)$$

which implies that the sequence $(h(\delta_i) \cdot g(x^{(i)}))$ decreases to 0 at a geometric rate (both ratios in (3) are nonnegative and sum to at most 1, so their product is at most 1/4). Using this fact one can show that the NM solves an integral linear FCOP in $O(\log(pU))$ iterations, and any linear FCOP in $O(p^2 \log^2 p)$ iterations [16,17]. The NM gives the asymptotically fastest known algorithm for the maximum mean-weight cut problem [16,17]. Its running time is $O(nm^2 \log n)$ for a graph with n nodes and m edges.

Fractional Combinatorial Optimization, Figure 1: The Newton method for FCO (graph of $h(\delta)$ with the line $f(x^{(i)}) - \delta\, g(x^{(i)})$)
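The Newton (Dinkelbach) iteration can be sketched on the same kind of toy instance, again assuming an explicitly listed X so that the parametric problem (2) is solved by brute force; `newton_fcop` is a hypothetical name. The starting value is any attained ratio, which is automatically a lower bound on $\delta^*$.

```python
from fractions import Fraction
from itertools import product


def newton_fcop(X, a, b):
    """Newton (Dinkelbach) method for max f(x)/g(x) over an explicitly listed X,
    with linear f(x) = a.x and g(x) = b.x, assuming g(x) > 0 on X."""
    def f(x):
        return sum(ai * xi for ai, xi in zip(a, x))

    def g(x):
        return sum(bi * xi for bi, xi in zip(b, x))

    delta = Fraction(f(X[0]), g(X[0]))   # any attained ratio is a lower bound
    iterations = 0
    while True:
        iterations += 1
        # solve the parametric problem (2) for delta = delta_i (brute force here)
        x = max(X, key=lambda y: f(y) - delta * g(y))
        if f(x) - delta * g(x) == 0:     # h(delta_i) = 0, so delta_i = delta*
            return x, delta, iterations
        delta = Fraction(f(x), g(x))     # delta_{i+1} = f(x^(i)) / g(x^(i))


X = [x for x in product((0, 1), repeat=3) if any(x)]
a, b = (3, 1, 4), (2, 1, 3)
x_opt, delta_star, its = newton_fcop(X, a, b)
```

On this instance the sequence of lower bounds is $4/3, 3/2$, and the second parametric problem already has optimum value 0, so two iterations suffice.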


Megiddo’s parametric search (MPS) 


Let $A_1$ be an algorithm like $A_0$ above but with the following additional property: the value of each arithmetic expression computed on each possible execution path of algorithm $A_1$ is a linear function of parameter δ. Such an algorithm $A_1$ is called a linear algorithm for a parametric problem. MPS [13,14] solves a FCOP by following the computation of algorithm $A_1$ for $\delta = \delta^*$. All values calculated during this computation are linear functions of (unknown) $\delta^*$ and are stored as such. Thus each comparison amounts to calculating the sign of the value of an expression $s - t\delta^*$, where s and t are known numbers, and can be resolved by running algorithm $A_0$ for $\delta = s/t$ (recall that $s/t < \delta^* \iff h(s/t) > 0$). If the running times of both $A_1$ and $A_0$ are at most T, then the overall running time of MPS is $O(T^2)$. If algorithm $A_0$ runs in time $T_0$ and algorithm $A_1$ is parallel and runs in time $T_1$ on P processors, then MPS can be implemented in such a way that the overall (sequential) running time is $O(T_1 P + T_0 T_1 \log P)$: at the kth (parallel) step of the computation of $A_1$ for $\delta = \delta^*$, the required signs of $P_k$ ($\le P$) expressions $s_{k,j} - t_{k,j}\delta^*$, $j = 1, \dots, P_k$, can be found by at most $\log P_k + 1$ executions of algorithm $A_0$. The first execution is for δ equal to the median of the numbers $s_{k,j}/t_{k,j}$, and its result gives the signs of half of the expressions. MPS gives, for example, the asymptotically fastest known algorithms for the minimum ratio spanning-tree problem and the minimum cost-to-time ratio cycle problem [14]. Their running times are $O(m \log^2 n \log\log n)$ and $O(n^3 \log n \log\log n)$, respectively, for a graph with n nodes and m edges. An extension of MPS to cases when only approximate algorithms $A_0$ are practical is proposed in [7] and applied there to the fractional 0-1 knapsack problem.

For some FCOPs, there are specialized algorithms which do not follow any of the above three generic methods. The most prominent examples are the $O(mn)$ [10] and $O(m\sqrt{n}\log(nU))$ [15] algorithms for the minimum mean cycle problem (the latter one is for the integral case). A detailed treatment of methods for FCO can be found in [17].
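A well-known $O(mn)$ approach to the minimum mean cycle problem is Karp's characterization $\mu^* = \min_v \max_{0 \le k \le n-1} (D_n(v) - D_k(v))/(n-k)$, where $D_k(v)$ is the minimum weight of a walk with exactly k edges ending at v. The sketch below implements that formula; the text here does not say whether it coincides with the algorithm of [10], so treat it as an illustration. Starting every walk "anywhere" (all $D_0(v) = 0$) plays the role of a virtual source connected to all nodes.

```python
from fractions import Fraction


def karp_min_mean_cycle(n, edges):
    """Minimum mean cycle weight of a digraph on nodes 0..n-1.
    edges: list of (u, v, w). Returns None if the graph has no cycle."""
    # D[k][v]: minimum weight of a walk with exactly k edges ending at v (None = no walk)
    D = [[0] * n] + [[None] * n for _ in range(n)]
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] is not None:
                cand = D[k - 1][u] + w
                if D[k][v] is None or cand < D[k][v]:
                    D[k][v] = cand
    best = None
    for v in range(n):
        if D[n][v] is None:
            continue
        worst = max(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] is not None)
        if best is None or worst < best:
            best = worst
    return best


# cycles 0-1-0 (mean 1) and 1-2-1 (mean 2)
mu = karp_min_mean_cycle(3, [(0, 1, 1), (1, 0, 1), (1, 2, 0), (2, 1, 4)])
```

The table D has $n + 1$ rows and each row is filled by one pass over the m edges, which gives the $O(mn)$ running time.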
In various applications of nonlinear programming a ratio of two functions is to be maximized or minimized.
In other applications the objective function involves 
more than one ratio of functions. Ratio optimization 
problems are commonly called fractional programs. 

One of the earliest fractional programs (though not 
called so) is an equilibrium model for an expanding 
economy introduced by J. von Neumann [50] in 1937. 
The model determines the growth rate as the maximum 
of the smallest of several output-input ratios. At a time 
when linear programming hardly existed, the author 
already proposed a duality theory for this nonconcave 
program. 

However, apart from a few isolated papers like von Neumann's, a systematic study of fractional programming began much later. In 1962 A. Charnes and W.W. Cooper published their classical paper [19] in which they show that a linear fractional program can be reduced to a linear program with the help of a nonlinear variable transformation.

The study of fractional programs with only one ra- 
tio has largely dominated the literature in this field un- 
til about 1980. Many of the results then known are 
presented in the first monograph on fractional programming by S. Schaible [57] (1978). Since then two other monographs solely devoted to fractional programming have appeared, authored by B.D. Craven [23] and I.M. Stancu-Minasian [68]. Each of these includes a chapter on multi-ratio fractional programs.

Fractional programs have often been studied in 
the broader context of generalized convex program- 
ming [4]. Ratios of convex and concave functions as 
well as composites of such ratios are not convex in gen- 
eral, even in the case of linear ratios. But often they 
are generalized convex in some sense. From the be- 
ginning, fractional programming has benefited from 
advances in generalized convexity, and vice versa; see 
B. Martos [45]. This is demonstrated by the fact that 
the proceedings of each of the five international sym- 
posia on generalized convexity contain contributions 
on fractional programming; see [16,27,41,63,66]. Frac- 
tional programming overlaps also with global optimiza- 
tion. Several types of ratio optimization problems have 
local, nonglobal optima. For an extensive survey of frac- 
tional programming, see [60]. 

The survey [60] also contains the largest bibliogra- 
phy on fractional programming so far (1999). It has al- 
most twelve-hundred entries. For a separate, rich bibli- 
ography see [68]. 

Clearly, fractional programming is a dynamic, 
growing area of research. It has been encouraging to 
observe that over the years research on theory and solution methods has increasingly focused on those ratios which are of particular interest in applications.
Since these are spread over a wide range of fields, sur- 
veys on fractional programming applications have been 
much needed. In the single-ratio case, a first detailed 
survey appeared in [57] and became a basis for [58,62] 
and for surveys by others. A more recent, detailed sur- 
vey of single-ratio fractional programming applications 
is found in [68]. For the multi-ratio case, the surveys 
in [60,61,62] may be consulted. As various classes of 
fractional programs are presented below, the relevance 
of each class will be indicated. 


Classification 


Let f, g, $h_k$ ($k = 1, \dots, m$) denote real-valued functions which are defined on a set C in the n-dimensional Euclidean space $\mathbb{R}^n$. Consider

$$q(x) = \frac{f(x)}{g(x)} \qquad (1)$$

over the set

$$S = \{x \in C : h_k(x) \le 0,\ k = 1, \dots, m\}, \qquad (2)$$

assuming that $g(x) > 0$ on C. The nonlinear program

$$(P) \qquad \sup\{q(x) : x \in S\} \qquad (3)$$

is called a (single-ratio) fractional program. In some applications more than one ratio appears in the objective function. Examples discussed in this article are

$$\sup\Big\{\min_{1 \le i \le p} q_i(x) : x \in S\Big\} \qquad (4)$$

and

$$\sup\Big\{\sum_{i=1}^{p} q_i(x) : x \in S\Big\}, \qquad (5)$$

where $q_i(x)$ equals the ratio of the numerator $f_i(x)$ and the denominator $g_i(x)$, with $g_i(x) > 0$ on C. Problem (4) is sometimes referred to as a generalized fractional program [62], while (5) is called a sum-of-ratios fractional program. Both problems (4) and (5) are related to the multi-objective fractional program

$$\max\{(q_1(x), \dots, q_p(x)) : x \in S\}. \qquad (6)$$


So far, the functions in the numerator and denominator were not specified. If f, g and $h_k$ are affine functions (linear plus a constant) and C is the nonnegative orthant of $\mathbb{R}^n$, then (P) is called a linear fractional program. It is of the following form:

$$\sup\Big\{\frac{c^T x + \alpha}{d^T x + \beta} : Ax \le b,\ x \ge 0\Big\}, \qquad (7)$$

where $c, d \in \mathbb{R}^n$, $\alpha, \beta \in \mathbb{R}$, the superscript T denotes the transpose, A is an $m \times n$ matrix and $b \in \mathbb{R}^m$.
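For the linear fractional program (7), the Charnes-Cooper transformation mentioned above can be written out as follows. This is a sketch under the assumptions that $d^T x + \beta > 0$ on the feasible region and that this region is nonempty and bounded. Substituting $t = 1/(d^T x + \beta)$ and $y = t x$ turns (7) into the linear program

```latex
\max\ \bigl\{\, c^T y + \alpha t \;:\; A y - b t \le 0,\ \ d^T y + \beta t = 1,\ \ y \ge 0,\ \ t \ge 0 \,\bigr\}
```

Under the boundedness assumption every feasible point of this linear program has $t > 0$, and an optimal solution $(y^*, t^*)$ gives the optimal solution $x^* = y^*/t^*$ of (7).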

Generalizing the linear fractional program, we call (P) a quadratic fractional program if C is the nonnegative orthant, f, g are quadratic and the $h_k$ are affine.

Problem (P) is said to be a concave fractional program if the numerator f is concave on C and g, $h_k$ are convex on C, where C is a convex set. In addition, it is assumed that f is nonnegative on S if g is not affine. Note that the objective function of a concave fractional program (3) is generally not a concave function. Instead, it is composed of a concave and a convex function. Even under these restrictive concavity/convexity assumptions, fractional programs are generally nonconcave programs.

The focus in fractional programming is the objec- 
tive function and its ratio-structure. The feasible region 
is generally assumed to be convex or a convex polyhe- 
dron. 


Single-ratio Fractional Programs 


Consider the problem 


$$(P) \qquad \sup\{q(x) : x \in S\}, \qquad (8)$$


where q(x) equals the ratio of the numerator f(x) and 
the denominator g(x) with g(x) > 0 on C. 


Applications 


Fractional programs arise in management decision 
making as well as outside of it. They also occur some- 
times indirectly in modeling where initially no ratio is 
involved. The purpose of the following overview is to 
demonstrate the diversity of problems which can be cast 
in the form of a single-ratio fractional program. A more
comprehensive coverage which also includes the refer- 
ences for the models below is given in [60]. For other 
surveys of applications of (8) see [23,57,58,62,68]. 


Economic Applications 


The efficiency of a system is sometimes characterized 
by a ratio of technical and/or economical terms. Maxi- 
mizing the efficiency then leads to a fractional program. 
Some applications are given below. 


Maximization of Productivity 


P.C. Gilmore and R.E. Gomory [35] discuss a stock cut- 
ting problem in the paper industry for which under the 
given circumstances it is more appropriate to minimize 
the ratio of wasted and used amount of raw material 
rather than just minimizing the amount of wasted ma- 
terial. This stock cutting problem is formulated as a lin- 
ear fractional program. In a case study, J.A. Hoskins 
and R. Blom [38] use fractional programming to opti- 
mize the allocation of warehouse personnel. The objec- 
tive is to minimize the ratio of labor cost to the volume 
entering and leaving the warehouse. 


Maximization of Return on Investment 


In some resource allocation problems the ratio 
profit/capital or profit/revenue is to be maximized. 
A related objective is return per cost maximization. Re- 
source allocation problems with this objective are dis- 
cussed in more detail in [47]. In these models the term 
‘cost’ may either be related to actual expenditure or may 
stand, for example, for the amount of pollution or the 
probability of disaster in nuclear energy production. 
Depending on the nature of the functions describing re- 
turn, profit, cost or capital, different types of fractional 
programs are encountered. For example, if the price per 
unit depends linearly on the output and cost and capi- 
tal are affine functions, then maximization of the return 
on investment gives rise to a concave quadratic frac- 
tional program (assuming linear constraints). In loca- 
tion analysis maximizing the profitability index (rate of 
return) is in certain situations preferred to maximizing 
the net present value, according to [5] and [8] and the 
cited references. 


Maximization of Return/Risk 


Some portfolio selection problems give rise to a con- 
cave nonquadratic fractional program of the form (11) 
below which expresses the maximization of the ratio of 
expected return and risk. For related concave and non- 
concave fractional programs arising in financial plan- 
ning see [60]. Markov decision processes may also lead 
to the maximization of the ratio of mean and standard 
deviation. 


Minimization of Cost/Time 


In several routing problems a cycle in a network is to 
be determined which minimizes the cost-to-time ra- 
tio or maximizes the profit-to-time ratio. Also the av- 
erage cost objective used within the theory of stochas- 
tic regenerative processes [3] leads to the minimization 
of cost per unit time. A particular example occurring 
within this framework is the determination of the op- 
timal ordering policy of classical periodic and continu- 
ous review single item inventory models, e. g., [31]. An- 
other example of this framework are maintenance and 
replacement models. Here the ratio of the expected cost 
for inspection, maintenance and replacement and the 
expected time between two inspections is to be mini- 
mized, e. g., [6,30]. 


Maximization of Output/Input


Charnes, Cooper and E. Rhodes [22] use a linear frac- 
tional program as a model to evaluate the efficiency 
of decision making units (data envelopment analysis 
(DEA)). Given a collection of decision making units, 
the efficiency of each unit is obtained from the maximization of a ratio of weighted outputs and inputs subject to the condition that similar ratios for every decision making unit are less than or equal to unity. The optimal weights then yield the efficiency of each member relative to that of the others. For an extensive treatment of DEA see [21].

In the management literature there has been an in- 
creasing interest in optimizing relative terms such as 
relative profit. No longer are these terms merely used 
to monitor past economic behavior. Instead the opti- 
mization of rates is getting more attention in decision 
making processes for future projects; e. g., [5,37]. 


Noneconomic applications 


In information theory the capacity of a communica- 
tion channel can be defined as the maximal transmis- 
sion rate over all probabilities. This is a concave non- 
quadratic fractional program. The eigenvalue problem 
in numerical mathematics can be reduced to the maxi- 
mization of the Rayleigh quotient, and hence gives rise 
to a quadratic fractional program which is generally 
not concave. An example of a fractional program in 
physics is given by J.E. Falk [29]. He maximizes the 
signal-to-noise ratio of a spectral filter which is a con- 
cave quadratic fractional program. 
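For a symmetric matrix A, the maximum of the Rayleigh quotient $q(x) = x^T A x / x^T x$ equals the largest eigenvalue. A simple way to see this connection numerically is power iteration, sketched below in plain Python. The sketch assumes A is symmetric positive definite (so the dominant eigenvalue is also the largest) and that the start vector is not orthogonal to the leading eigenvector; it is an illustration, not a general fractional programming solver.

```python
def mat_vec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rayleigh_quotient(A, x):
    """q(x) = x^T A x / x^T x, a quadratic fractional objective."""
    Ax = mat_vec(A, x)
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)


def power_iteration(A, x, iters=100):
    """Repeatedly apply A and normalize; x tends to the leading eigenvector."""
    for _ in range(iters):
        y = mat_vec(A, x)
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x, rayleigh_quotient(A, x)


A = [[2.0, 1.0], [1.0, 2.0]]           # eigenvalues 3 and 1
x_star, q_max = power_iteration(A, [1.0, 0.0])
```

Here the iterates approach $(1, 1)/\sqrt{2}$, where the quotient attains its maximum 3, while for instance $x = (1, -1)$ gives only $q(x) = 1$.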


Indirect Applications 


There are a number of management science problems 
that indirectly give rise to a concave fractional program. 
A concave quadratic fractional program arises in loca- 
tion theory as the dual of a Euclidean multifacility min- 
imax problem. In large scale mathematical program- 
ming, decomposition methods reduce the given linear 
program to a sequence of smaller problems. In some 
of these methods the subproblems are linear fractional 
programs. The ratio originates in the minimum-ratio 
rule of the simplex method. 

Fractional programs are also met indirectly in 
stochastic programming, as first shown in [20] and [13]. 


This will be illustrated by two models below [57,68]. 
First, consider the stochastic mathematical program: 


$$\max\{a^T x : x \in S\}, \qquad (9)$$


where the coefficient vector a is a random vector with 
a multivariate normal distribution and S is a (determin- 
istic) convex feasible region. It is assumed that the de- 
cision maker replaces (9) by a decision problem 


max {P{a'x >k}: x € s\ : (10) 


i.e., he wants to maximize the probability that the ran- 
dom variable aTx attains at least a prescribed level k. 
Then (9) reduces to 


e'x—k 
Vx Vx 


where e is the mean vector of the random vector a and 


max 7 xeES, 


(11) 


V its variance-covariance matrix. Hence the maximum 
probability model of the concave program (9) gives rise 
to a concave fractional program. If in (9) the linear ob- 
jective function is replaced by other types of nonlin- 
ear functions, then the maxi- mum probability model 
leads to various other concave fractional programs as 
demonstrated in [57,68]. 
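A small numerical sketch of the step from (10) to (11): the probability in (10) is an increasing function of the ratio in (11), so both rank candidate points identically. The data below (the mean vector `e`, covariance matrix `V`, level `k` and the two points compared) are hypothetical, and `V` is assumed positive definite so that the square root is well defined.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quad_form(x, V):
    """x^T V x for a list-based vector and matrix."""
    n = len(x)
    return sum(x[i] * V[i][j] * x[j] for i in range(n) for j in range(n))


def ratio_11(x, e, V, k):
    """Objective of (11): (e^T x - k) / sqrt(x^T V x)."""
    mean = sum(ei * xi for ei, xi in zip(e, x))
    return (mean - k) / math.sqrt(quad_form(x, V))


def prob_10(x, e, V, k):
    """Objective of (10): P(a^T x >= k) for a ~ N(e, V)."""
    mean = sum(ei * xi for ei, xi in zip(e, x))
    sd = math.sqrt(quad_form(x, V))
    # a^T x ~ N(mean, sd^2), so P(a^T x >= k) = 1 - Phi((k - mean) / sd)
    return 1.0 - normal_cdf((k - mean) / sd)


e = [1.0, 2.0]                   # hypothetical mean vector
V = [[1.0, 0.0], [0.0, 4.0]]     # hypothetical covariance matrix
k = 1.0                          # prescribed level
for x in ([1.0, 1.0], [2.0, 0.0]):
    print(x, ratio_11(x, e, V, k), prob_10(x, e, V, k))
```

The point with the larger ratio also has the larger probability, which is why the maximum probability model can be optimized through the ratio alone.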
Consider a second stochastic program

$$\max\{f_0(x) + \theta f_1(x) : x \in S\}, \tag{12}$$

where $f_0$, $f_1$ are concave functions on the convex feasible region $S$, $f_1 > 0$, and $\theta$ is a random variable with a continuous cumulative distribution function. Then the maximum probability model for (12), i.e., maximizing $P(f_0(x) + \theta f_1(x) \ge k)$ over $S$ for a prescribed level $k$, gives rise to the fractional program

$$\max\left\{\frac{f_0(x) - k}{f_1(x)} : x \in S\right\}, \tag{13}$$

since, because $f_1 > 0$, the event in question is $\theta \ge (k - f_0(x))/f_1(x)$, whose probability does not increase as this ratio grows. For a linear program (12) the deterministic equivalent (13) becomes a linear fractional program. If $f_0$ is concave and $f_1$ linear, then (13) is still a concave fractional program. However, if $f_1$ is also a (nonlinear) concave function, then (13) is no longer a concave fractional program. Obviously a quadratic program (12) reduces to a quadratic fractional program. For more details on (12) and (13) see [57,68].

Stochastic programs (9) and (12) are met in a wide 
variety of planning problems. Whenever the maximum 
probability model is used as a deterministic equivalent,
such decision problems lead to a fractional program of 
one type or another. Hence, fractional programs are en- 
countered indirectly in many different applications of 
mathematical programming, although initially the ob- 
jective function is not a ratio. 

With the recent advent of various interior point methods for linear programming, fractional programming has been given more attention as well.
For instance, K.M. Anstreicher [2] showed that Kar- 
markar’s projective algorithm is fundamentally an algo- 
rithm for linear fractional programming on a simplex. 

M. Gaudioso and M.F. Monaco [34] use quadratic 
fractional programs as subproblems in an algorithm for 
convex nondifferentiable programs. These arise as du- 
als of search direction subproblems. 


Theoretical and Algorithmic Results 


Most of the algorithms known so far solve linear, or 
more generally, concave fractional programs (8). At 
least five different strategies are found in the literature 
and are reviewed below. 


Solving Problem (P) Directly 


Concave (linear) fractional programs share some important properties with concave (linear) programs, due to the generalized concavity (and, in the linear case, also the generalized convexity) of the objective function $q(x) = f(x)/g(x)$ [4,45]:

1) a local maximum is a global maximum; 

2) a maximum is unique if either the numerator is 
strictly concave or the denominator is strictly con- 
vex; 

3) a solution of the Karush-Kuhn-Tucker optimality conditions is a maximum, assuming $f$, $g$, $h_j$ are differentiable on the open set $C$;

4) a maximum is attained at an extreme point of the 
convex polyhedron S of a linear fractional program 
(provided an optimal solution exists). 

Because of properties 1) and 3), it is possible to solve concave fractional programs by several of the standard concave programming algorithms. Indeed, it was shown that certain concave programming methods can be applied to programs with a quasiconcave objective function [45]; for example, the Frank-Wolfe linearization method [23,45]. M. Boncompte and J.E. Martinez-Legaz [14] proposed a cutting plane method for concave fractional programs, based on the upper subdifferentiability of the objective function. If (P) is a linear fractional program, then property 4) can be used to calculate a maximum $\bar{x}$ by determining a finite sequence of extreme points $x_i$ of $S$ with increasing values $q(x_i)$ that terminates at $\bar{x}$. Thus a simple simplex-like procedure can solve linear fractional programs [45].


Solving an Equivalent Problem (Peq)


Some of the concave programming algorithms are not suitable for generalized concave programs [45]. Thus the choice of concave programming algorithms to solve concave fractional programs directly is limited. However, it can be shown that every concave fractional program is transformable into a concave program: the variable transformation

$$y = \frac{x}{g(x)}, \qquad t = \frac{1}{g(x)} \tag{14}$$

reduces (P) to

$$(P_{\mathrm{eq}}) \qquad \sup\left\{ t\, f\!\left(\frac{y}{t}\right) : (y, t) \in \tilde{S} \right\}, \tag{15}$$

with the region $\tilde{S}$ represented by the relations

$$t\, h_j\!\left(\frac{y}{t}\right) \le 0 \ (j = 1, \dots, m), \qquad t\, g\!\left(\frac{y}{t}\right) \le 1, \qquad \frac{y}{t} \in C, \qquad t > 0, \tag{16}$$

and this is a concave program [55]. If $(\bar{y}, \bar{t})$ is an optimal solution of (Peq), then $\bar{x} = \bar{y}/\bar{t}$ is an optimal solution of (P). Such a transformation was originally suggested by Charnes and Cooper [19], who showed that with the help of (14) a linear fractional program can be reduced to a linear program. Because of the transformability into a concave program, concave fractional programs can indirectly be solved by any concave programming method, applying the algorithm to the equivalent program (Peq). Hence through transformation (14) one gains access to all concave programming algorithms.

To solve (Peq) rather than (P) may be particularly appropriate when the numerator $f$ and the denominator $g$ have a certain algebraic structure. For example, the maximum probability model (11) or certain portfolio selection models have an affine numerator and a denominator which is the square root of a convex quadratic form. In this case (Peq) reduces to a concave quadratic program, and hence (P) can directly be solved by one of the standard quadratic programming techniques [59]. In the special case of a linear fractional program (7), transformation (14) yields the linear program

$$\sup\{c^{T}y + \alpha t : (y, t) \in \tilde{S}\} \tag{17}$$

with the feasible region $\tilde{S}$ represented by the relations

$$Ay - bt \le 0, \qquad d^{T}y + \beta t = 1, \qquad y \ge 0, \qquad t \ge 0.$$

Hence (7) can be solved by the simplex method [19]. For a comparison with other linear fractional programming methods see [60].
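The Charnes-Cooper transformation (14) in the linear case can be checked numerically. This sketch assumes that the linear fractional program (7) has the form $\max\{(c^{T}x + \alpha)/(d^{T}x + \beta) : Ax \le b,\ x \ge 0\}$ with $d^{T}x + \beta > 0$ on the feasible region, matching the symbols of (17); the function names and the data are illustrative only. It maps a feasible $x$ to $(y, t)$, confirms that the objective of (17) equals the fractional objective, and recovers $x = y/t$.

```python
from fractions import Fraction


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def charnes_cooper(x, d, beta):
    """Map x to (y, t) via (14): t = 1/(d^T x + beta), y = t x."""
    t = Fraction(1) / (dot(d, x) + beta)   # requires d^T x + beta > 0
    return [t * xi for xi in x], t


def recover(y, t):
    """Inverse map x = y / t, valid for t > 0."""
    return [yi / t for yi in y]


def lfp_value(x, c, alpha, d, beta):
    """Objective of the linear fractional program."""
    return Fraction(dot(c, x) + alpha) / (dot(d, x) + beta)


def lp_value(y, t, c, alpha):
    """Objective of the linear program (17)."""
    return dot(c, y) + alpha * t


# Hypothetical data: max (3x1 + x2 + 1)/(x1 + x2 + 2) s.t. x1 + x2 <= 4, x >= 0
c, alpha, d, beta = [3, 1], 1, [1, 1], 2
A, b = [[1, 1]], [4]
x = [Fraction(1), Fraction(2)]
y, t = charnes_cooper(x, d, beta)
print(lfp_value(x, c, alpha, d, beta), lp_value(y, t, c, alpha))   # 6/5 6/5
```

Because $t > 0$, the constraint $Ax \le b$ is equivalent to $Ay - bt \le 0$, and the normalization $d^{T}y + \beta t = 1$ holds by construction; an LP solver applied to (17) therefore searches exactly the transformed feasible region.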


Solving a Dual Problem (D) 


One of the disadvantages of solving (P) directly is that 
duality concepts of concave programming cannot be 
used since basic duality relations are no longer valid for 
these nonconcave programs. However, transformation 
(14) enables us to gain access to concave programming 
duality. Thus a dual fractional program can be defined 
as one of the classical duals of the equivalent concave 
program (Peq) [55]. For instance, the Lagrangian dual of (Peq) gives rise to the dual fractional program

$$(D) \qquad \inf_{u \ge 0}\ \sup_{x \in C} \frac{f(x) - u^{T}h(x)}{g(x)}, \tag{18}$$

where $h = (h_1, \dots, h_m)^{T}$. As in concave programming, several duality relationships can be established between (P) and (D) [55].

Various duals have been suggested in different ap- 
proaches [57,59]. However, not much effort has been 
devoted to algorithmically using duality. In [56] the 
dual is used to calculate bounds in an iterative proce- 
dure for concave fractional programs. Much remains to 
be done to take full advantage of fractional program- 
ming duality in algorithms. 

For the dual (D) to be a computationally attractive 
alternative to (P) or (Peq), the fractional program (P) 
should have a certain amount of algebraic structure in 
$f$, $g$ and $h_j$. Otherwise it may well be easier to solve (P)
rather than a dual of (P). If (P) is a concave quadratic 
fractional program with an affine denominator, then 
the dual can be written as a linear program with one 
additional concave quadratic constraint [59,65]. 

One advantage of a dual method is that, in addition to an optimal solution of (P), the sensitivity of the maximal value of $q(x)$ with regard to right-hand side changes can also be calculated. The dual variables $u_j$ in an optimal solution turn out to be proportional to the marginal values of $q(x)$ at $\bar{x}$ [57,58,59]. Sensitivity and parametric analysis for fractional programming has been extensively discussed; see [18,23,57,58] and the cited references.


Solving a Parametric Problem (P_q)


There is a rich class of algorithms based on the following parametric problem associated with (P):

$$(P_q) \qquad \max\{f(x) - q\, g(x) : x \in S\}, \tag{19}$$

where $q \in \mathbb{R}$ is a parameter. The program (P_q) is sometimes numerically more tractable than the program (P). For example, (P_q) is a parametric quadratic (linear) program if (P) is a quadratic (linear) fractional program, and (P_q) is a parametric concave program if (P) is a concave fractional program. M. Sniedovich [67] analyzed the relationship between (P_q) and classical optimization techniques applied to (P).

In the following it is assumed that $S$ is compact, that $f$, $g$ are continuous on $S$, and that $g > 0$ on $S$. Let $F(q)$ denote the optimal value of the objective function of (P_q). $F(q)$ is a strictly decreasing, convex function which has a unique zero $q = \bar{q}$. An optimal solution $\bar{x}$ of $(P_{\bar{q}})$ is also an optimal solution of (P), with $\bar{q} = f(\bar{x})/g(\bar{x})$. Thus solving (P) is equivalent to finding the unique root of the nonlinear equation $F(q) = 0$. With the properties of $F(q)$ in view, T. Ibaraki [39] applied various classical search techniques to calculate the zero $q = \bar{q}$. These interval-type algorithms generate a succession of intervals of decreasing length containing $\bar{q}$. Computational results are reported in [39,62]. The application of Newton's method is commonly referred to as the algorithm of W. Dinkelbach, who first proposed such a procedure [28]; its equivalence to Newton's method was recognized later by Ibaraki. A very efficient version of Dinkelbach's method was suggested in [51], improving an earlier variant in [56].
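The parametric approach can be illustrated on a feasible set $S$ given as a finite list of points, so that each parametric subproblem (19) is solved exactly by enumeration; in practice the subproblem would be a linear, quadratic or concave program handled by a suitable solver. The sketch below, with hypothetical helper names, shows $F(q)$, an interval (bisection) search for its zero in the spirit of the interval-type methods, and Dinkelbach's Newton-type update $q \leftarrow f(x)/g(x)$. Exact rational arithmetic avoids tolerance issues in the test $F(q) = 0$.

```python
from fractions import Fraction


def F(q, points, f, g):
    """Optimal value of the parametric problem (19) over a finite set of points."""
    return max(f(x) - q * g(x) for x in points)


def bisection(points, f, g, lo, hi, tol=Fraction(1, 10**6)):
    """Interval search for the zero of F; assumes F(lo) >= 0 > F(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid, points, f, g) >= 0:
            lo = mid             # zero lies at or to the right of mid
        else:
            hi = mid             # zero lies to the left of mid
    return lo, hi


def dinkelbach(points, f, g, max_iter=100):
    """Newton-type iteration on F(q) = 0 (Dinkelbach's method)."""
    x = points[0]
    q = Fraction(f(x)) / g(x)
    for _ in range(max_iter):
        x = max(points, key=lambda p: f(p) - q * g(p))   # solve (19) for this q
        if f(x) - q * g(x) == 0:                          # F(q) = 0: q is optimal
            return x, q
        q = Fraction(f(x)) / g(x)                         # strictly larger ratio
    raise RuntimeError("no convergence within max_iter")


# Hypothetical finite S: each point p carries its values f(p) = p[0], g(p) = p[1] > 0
points = [(1, 2), (3, 4), (5, 11), (2, 1)]


def f(p):
    return p[0]


def g(p):
    return p[1]


print(dinkelbach(points, f, g))   # ((2, 1), Fraction(2, 1))
print(bisection(points, f, g, Fraction(0), Fraction(10)))
```

Each Dinkelbach step strictly increases $q$ until $F(q) = 0$, whereas the interval search only uses the sign of $F$ and shrinks $[lo, hi]$ by half per step.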


Interior Point Algorithms 


In addition to the four more classical strategies above, new techniques of the interior point type have emerged recently. The first such method, developed for linear fractional programs, is due to Anstreicher [2]. In 1994, R.W. Freund and F. Jarre [32] proposed a method for concave fractional programs; polynomial convergence is established and some numerical results are reported.

Most of the computational work in single-ratio frac- 
tional programming compares and tests algorithms that 
use the parametric program (P_q). Much more work is needed to compare the four approaches above computationally with each other and with the very recent polynomial-time interior point methods. Also new
methods need to be developed for certain nonconcave 
fractional programs arising in applications; e. g., [59]. 

This section on single-ratio fractional programming 
concludes with a brief discussion of integer fractional 
programming. In some of the economic applications 
above the variables are restricted to be integers, if in- 
divisible goods are involved. Also, a number of combi- 
natorial fractional programs with 0-1 variables are of 
interest; for instance fractional location problems [5]. 


Integer Fractional Programming 


This is an important, but somewhat neglected field 
within fractional programming. In [5] A.I. Barros gives 
an overview of some of the advances. Here the para- 
metric procedure by N. Megiddo [46] stands out par- 
ticularly. T. Radzik [53] provided a detailed survey of 
the advances in combinatorial fractional programming. 
The survey includes many of his own complexity re- 
sults on Dinkelbach’s and Megiddo’s parametric proce- 
dures. Among others, Radzik shows that Dinkelbach’s 
algorithm solves a combinatorial linear fractional pro- 
gram in a strongly polynomial number of iterations, re- 
gardless of the constraint structure. Some of the results 
in [53] are specialized to cases such as the problem of 
profit-to-time cycles and maximum mean-weight cuts. 
The same survey also identifies open problems in combinatorial fractional programming.

Leaving the single-ratio case now, we address below the three multi-ratio fractional programs in (4), (5) and (6). Among these, the best researched so far is the following.


Maximization of the Smallest of Several Ratios 


Consider the problem

$$\sup\left\{\min_{1 \le i \le p} q_i(x) : x \in S\right\}, \tag{20}$$

where $q_i(x) = f_i(x)/g_i(x)$ and

$$S = \{x \in C : h_k(x) \le 0,\ k = 1, \dots, m\}.$$

It is assumed that $C \subseteq \mathbb{R}^n$ is nonempty and convex, and that the $h_k$ are real-valued convex functions on $C$. Before analyzing (20), some applications of this model are outlined.


Applications 


In mathematical economics, problem (20) may arise when the growth rate of an expanding economy is determined [50]:

$$\text{growth rate} = \max_{x} \min_{1 \le i \le p} \frac{\text{output}_i(x)}{\text{input}_i(x)}, \tag{21}$$

where $x$ denotes a feasible production plan of the economy. In management science, simultaneous maximization of rates such as those discussed earlier can lead to (20). This is so if either, in a worst-case approach, the model

$$\min_{1 \le i \le p} \frac{f_i(x)}{g_i(x)} \to \sup \tag{22}$$

is used or, with the help of prescribed ratio goals $r_i$, the model

$$\max_{1 \le i \le p} \left| \frac{f_i(x)}{g_i(x)} - r_i \right| \to \inf \tag{23}$$

is employed. In both cases essentially a max-min fractional program (20) is to be solved. For (23) this follows by writing $|q_i(x) - r_i| = \max\{(f_i(x) - r_i g_i(x))/g_i(x),\ (r_i g_i(x) - f_i(x))/g_i(x)\}$, so that (23) minimizes the largest of $2p$ ratios, which is equivalent to maximizing the smallest of their negatives. Examples of the second approach are found in financial planning with different financial ratios or in the allocation of funds under equity considerations. Furthermore, (20) was recently used in location analysis; see [5] for details. A third area of application of model (20) is numerical mathematics. Suppose the values $F_i$ of a function $F(t)$ are given at finitely many points $t_i$ of an interval, and an approximating ratio of two polynomials $N(t, x_1)$ and $D(t, x_2)$ with coefficient vectors $x_1$, $x_2$ is sought. If the best approximation is defined in the sense of the $L_\infty$-norm, then the following problem is to be solved:

$$\max_{i} \left| \frac{N(t_i, x_1)}{D(t_i, x_2)} - F_i \right| \to \inf \tag{24}$$

with variables $x_1$, $x_2$. Like (23), this problem can be reduced to a max-min fractional program (20).


Theoretical and Algorithmic Results 


Several authors, including von Neumann [50], have introduced dual programs for problem (20), employing different approaches; see [60]. In most duality approaches the following assumptions are made: $C$ is nonempty, convex and compact; $-f_i$, $g_i$, $h_k$ are lower semicontinuous; $-f_i$, $g_i$, $h_k$ are convex; the $g_i$ are positive on $C$; the $f_i$ are nonnegative on $S$ if at least one $g_i$ is not affine; and the feasible region $S$ is nonempty. Let $F = (f_1, \dots, f_p)^{T}$, $G = (g_1, \dots, g_p)^{T}$ and $h = (h_1, \dots, h_m)^{T}$. The following dual is derived in [40] with the help of the Farkas lemma (cf. » Farkas lemma; » Farkas lemma: Generalizations):

$$\inf_{u \ge 0,\ v \ge 0,\ v \ne 0}\ \sup_{x \in C} \frac{v^{T}F(x) - u^{T}h(x)}{v^{T}G(x)}. \tag{25}$$

Under the assumptions above the optimal values in (20) and (25) coincide, and duality relations much like those in concave and linear programming hold [40]; see also [25].

The primal max-min program (20) is associated 
with a dual min-max fractional program (25). Such 
a symmetry is not obvious in single-ratio fractional 
programming duality theory; see (18). Symmetry be- 
tween the primal and dual exists also in the following 
sense: in both problems a local optimum is a global 
optimum. This follows from the fact that the primal 
objective function is semistrictly quasiconcave and the 
dual objective function is semistrictly quasiconvex [4]. 
The dual objective function usually involves infinitely many ratios, in contrast to the primal one. However, this asymmetry disappears in the case of a linear problem (20), where $f_i$, $g_i$, $h_k$ are affine and $C$ is the (unbounded) nonnegative orthant of $\mathbb{R}^n$. Then only finitely many ratios need to be considered in the dual objective function. In the linear case it can further be shown that, in addition to the usual complementary slackness between variables in one problem and constraints in the other one, complementary slackness also exists between certain variables in one problem and ratios in the other one [25]. Hence in the linear case of (20) there exist complete symmetry as well as a close relationship between the primal and the dual fractional program.

Regarding solution methods for (20), an extension 
of Dinkelbach’s algorithm to (20) was introduced by 
J.P. Crouzeix, J.A. Ferland and Schaible in [26]. It 
proved to have attractive convergence properties and 
became the starting point for the design of similar 
methods surveyed in [24]. Several of these interval-type 
methods have been compared and tested. M. Gugat [36] 
proposed a fast interval-type algorithm for (20) which 
always converges superlinearly. Boncompte and Martinez-Legaz [14] used a cutting plane approach, originally suggested in [52] for a more general class of
quasiconcave problems, employing upper subdifferen- 
tiability of the objective function in (20). A computa- 
tional comparison with the Dinkelbach-type method 
in [26] is carried out too. A different cutting plane 
method incorporating the ideas of [52] and [67], again 
for a more general class of problems than generalized 
fractional programs, is discussed in [7]. In case of prob- 
lem (20) the method in [7] reduces to the Dinkelbach- 
type method in [26]. Thus the latter can also be viewed 
as a cutting plane method. 
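A basic Dinkelbach-type iteration for (20) replaces $F(q)$ by $\max_x \min_i (f_i(x) - q\, g_i(x))$ and updates $q$ to the smallest ratio at the new point. The sketch below is a simplified illustration over a finite, hypothetical feasible set (so the subproblem is solved by enumeration); it is not the exact algorithm of [26], which also covers continuous $S$ and includes refinements.

```python
from fractions import Fraction


def maxmin_dinkelbach(points, fs, gs, max_iter=100):
    """Dinkelbach-type iteration for sup_x min_i f_i(x)/g_i(x) over a finite set (g_i > 0)."""
    def phi(x):                  # objective of (20)
        return min(Fraction(f(x)) / g(x) for f, g in zip(fs, gs))

    def psi(x, q):               # objective of the parametric subproblem
        return min(f(x) - q * g(x) for f, g in zip(fs, gs))

    x = points[0]
    q = phi(x)
    for _ in range(max_iter):
        x = max(points, key=lambda p: psi(p, q))   # solve the subproblem
        if psi(x, q) == 0:       # parametric optimum is 0: q is optimal
            return x, q
        q = phi(x)               # every ratio at x exceeds the old q
    raise RuntimeError("no convergence within max_iter")


# Hypothetical data on S = {0, 1, 2, 3, 4}: ratios (x + 1)/1 and (12 - 2x)/(x + 1)
fs = [lambda x: x + 1, lambda x: 12 - 2 * x]
gs = [lambda x: 1, lambda x: x + 1]
print(maxmin_dinkelbach([0, 1, 2, 3, 4], fs, gs))   # (2, Fraction(8, 3))
```

At the current $q$ the current point gives a subproblem value of 0, so the subproblem optimum is never negative; if it is positive, all ratios at the new point exceed $q$, and if it is 0, no feasible point can have all ratios above $q$.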

Most of the algorithms above solve the primal problem (20). In the linear case the Dinkelbach-type algorithm in [26] can also be applied to the dual (25) because of the symmetry between the primal and the dual. Recently a ‘dual’ algorithm for (20) was proposed in the nonlinear case [10].
It can be viewed as an extension of the Dinkelbach- 
type algorithm in [26] applied to the dual of a general- 
ized linear fractional program. In [9] a new dual of (20) 
was proposed as well as an efficient method to solve it. 
Less restrictive assumptions ensure superlinear conver- 
gence of this new ‘dual’ algorithm. An extensive com- 
putational comparison of the Dinkelbach-type method 
in [26] with the two dual methods was performed by 
Barros, J.B.G. Frenk, Schaible and S. Zhang; see [5,9,10]. 
The test problems involve quadratic ratios. 

Freund and Jarre [33] proposed an interior point 
method for solving (20) which extends their method 
in [32] for single-ratio problems. Furthermore, A.S. Ne- 
mirovsky and Yu.E. Nesterov [48,49] developed several 
interior point algorithms for (20) which converge in 
polynomial time. The studies above contain thorough 
complexity analyses. Summarizing, one can say that 
max-min fractional programs have been researched 
quite extensively. 
Maximizing a Sum of Ratios


Consider the multi-ratio fractional program

$$\max\left\{\sum_{i=1}^{p} q_i(x) : x \in S\right\}, \tag{26}$$

where $q_i(x) = f_i(x)/g_i(x)$, $g_i(x) > 0$.


Applications 


Model (26) arises naturally in decision making when 
several rates are to be optimized simultaneously and 
a compromise is sought which optimizes a weighted 
sum of these rates. In light of the applications in the 
single-ratio case, numerators and denominators may 
represent profit, cost, capital, risk or time, for exam- 
ple. Model (26) also includes the case where some ratios 
are not proper quotients, i.e., $g_i(x) = 1$. This describes
situations where a compromise is sought between ab- 
solute and relative terms like profit and return on in- 
vestment (profit/capital) or return and return/risk. Ad- 
ditional applications of (26) are surveyed in [61]. To 
mention a few, Y. Almogy and O. Levin [1] analyze 
a multistage stochastic shipping problem and show that 
a deterministic equivalent of this stochastic problem 
leads to (26). M.R. Rao [54] discusses various models in 
cluster analysis. The problem of optimal partitioning of 
a given set of entities into a number of mutually exclu- 
sive and exhaustive groups (clusters) gives rise to var- 
ious mathematical programming problems, depending 
on which optimality criterion is used. If the objective is 
to minimize the sum of the average squared distances 
within groups, then a minimum of a sum of ratios is to
be determined. H. Konno and M. Inori [42] formulated 
a bond portfolio optimization problem in the form (26). 


Theoretical and Algorithmic Results 


As seen earlier, the case of ratios of concave and convex functions is of particular interest in applications. Fortunately, it lends itself to a relatively easy analysis of models (8) and (20): a local maximum is a global one, duality relations can be established and several efficient solution techniques are available. Unfortunately, for the sum-of-ratios problem none of this holds any longer, even if in (26) all ratios $f_i(x)/g_i(x)$ are quotients of concave and convex functions. In particular, a local maximum is usually not a global one, even in the case of linear ratios.
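A one-dimensional illustration with hypothetical data: on $S = [0, 1]$, the sum of the two linear ratios $1/(x + 0.1)$ and $x/(1.1 - x)$ (both denominators positive on $S$) is convex, so on a grid its local maxima sit at both endpoints, yet only $x = 1$ is a global maximum.

```python
def objective(x):
    """Sum of two linear ratios; both denominators are positive on [0, 1]."""
    return 1.0 / (x + 0.1) + x / (1.1 - x)


grid = [i / 100 for i in range(101)]
values = [objective(x) for x in grid]

# Grid points that are no worse than their neighbours (discrete local maxima)
local_maxima = [
    grid[i]
    for i in range(len(grid))
    if (i == 0 or values[i] >= values[i - 1])
    and (i == len(grid) - 1 or values[i] >= values[i + 1])
]
print(local_maxima)                  # [0.0, 1.0]
print(max(grid, key=objective))      # 1.0
```

The value at $x = 0$ is 10, while at $x = 1$ it is about 10.91, so a local search started near $x = 0$ would stop at a non-global maximum.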


More often the objective function is not quasiconcave. 
I.A. Bykadorov [15] studied certain generalized concav- 
ity properties of sums of linear ratios and, more gener- 
ally, of sums of ratios of polynomials. Only some lim- 
ited theoretical results are known for the sum of con- 
cave ratios; see [23] and the surveys [60,61]. In the case 
of linear ratios, C.H. Scott and T.R. Jefferson [64] pro- 
posed a duality concept for (26) using geometric pro- 
gramming duality. 

Given the small theoretical basis, it is not surpris- 
ing that algorithmic advances have been rather limited 
too. Several strategies have been proposed and are sur- 
veyed in [61]. The best tested method can be found 
in [43]. Separating numerators and denominators with 
help of additional variables, problem (26) is embedded 
into a higher-dimensional space with a concave objec- 
tive function. A global minimum is then found through 
approximation techniques. Computational experience 
with the related multiplicative program in [43] shows 
that the method works quite well for up to about four 
terms. However, for more terms in the sum it quickly loses its efficiency. Much work is still necessary to develop
efficient algorithms for (26), even in the case of linear 
ratios. 


Multi-objective Fractional Programming 


The problem of simultaneously maximizing several ra- 
tios leads to a multi-objective fractional program 


$$\max\{(q_1(x), \dots, q_p(x)) : x \in S\}, \tag{27}$$

where $q_i(x) = f_i(x)/g_i(x)$, $g_i(x) > 0$. Such a model arises
when in contrast to the previous two models (20) and 
(26) a unifying objective is not considered. Instead, the 
decision-maker is to be provided with some or all ef- 
ficient (Pareto optimal) alternatives. These are feasible 
solutions such that no ratio can be further increased 
without decreasing at least one of the other ratios. Ap- 
plications, for instance in financial planning or pro- 
duction planning, can easily be envisioned in light of 
the applications of fractional programming described 
earlier; see also [44,60,68] and references therein. In 
case of concave ratios problem (27) can be seen as 
a special case of a semistrictly quasiconcave multi- 
objective programming problem; e. g., [17] and articles 
in [16,27,41,63,66]. Duality for multi-objective frac- 
tional programs has been studied by several authors, 
usually for concave or linear ratios; see [11,68]. For
such problems also equivalences to multi-objective pro- 
grams without ratios have been established [68]. These 
are formed with help of the numerator and denomina- 
tor functions. 

Another topic, important from a theoretical and al- 
gorithmic point of view, is the question whether the set 
E of efficient (Pareto optimal) solutions is connected. 
Only partial answers were available until very recently [60]. Meanwhile, connectedness has been shown for continuous concave fractions over a compact convex feasible region. This is a consequence of a more general result in [12] for semistrictly quasiconcave objective functions. Several solution methods for the calculation of (weakly, properly) efficient solutions have been proposed for linear and concave ratios; see [44,68] and
cited references. It is noted that the calculation of E 
may simplify the solution of the difficult sum-of-ratios 
problem [60] since an optimal solution of (26) is an 
efficient solution. Such an approach seems to be particularly promising in the case of two ratios. In summary, good progress has been made in the analysis of concave multi-objective fractional programs. However, more work is needed.
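For a finite feasible set, the efficient set $E$ of (27) can be computed by pairwise dominance checks, and the remark that an optimal solution of (26) is efficient can be observed directly. The points and ratios below are hypothetical.

```python
from fractions import Fraction


def dominates(u, v):
    """u dominates v (maximization): no worse everywhere, strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def efficient_points(points, ratios):
    """Efficient (Pareto optimal) points of (27) for a finite feasible set."""
    values = {p: tuple(r(p) for r in ratios) for p in points}
    return [p for p in points
            if not any(dominates(values[q], values[p]) for q in points)]


# Hypothetical finite S and ratios q_1, q_2
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
ratios = [lambda p: Fraction(p[0] + 1, p[1] + 1),
          lambda p: Fraction(p[1] + 2, p[0] + 1)]
E = efficient_points(points, ratios)
best_sum = max(points, key=lambda p: sum(r(p) for r in ratios))
print(E, best_sum)   # [(0, 0), (1, 0), (0, 1)] (0, 1)
```

The pairwise check is quadratic in the number of points, so it only serves as an illustration; for continuous feasible regions the cited methods are needed.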


Conclusion 


Many interesting problems inside and outside management decision making give rise to the optimization of one or several ratios. Much effort has been devoted to the analysis of such nonconcave programs. However, the theoretical basis is still not broad enough, especially for sum-of-ratios problems and, to a lesser extent, for multi-objective fractional programs. The computational experience with fractional programs is also
quite limited. Major progress has been made for con- 
cave single-ratio and max-min fractional programs. But 
much more work is necessary for the other fractional 
programs of interest in applications. 


See also 


> Bilevel Fractional Programming 

> Fractional Combinatorial Optimization 

> Quadratic Fractional Programming: Dinkelbach Method
Introduction


One of the classes of 0-1 optimization problems is the maximization (or minimization) of a sum of ratios of linear 0-1 functions:

$$\max_{x \in \{0,1\}^n} f(x) = \sum_{j=1}^{m} \frac{a_{j0} + a_j^{T}x}{b_{j0} + b_j^{T}x} \tag{1}$$

$$\text{s.t.} \quad Dx \le c, \tag{2}$$

where $a_j \in \mathbb{R}^n$, $b_j \in \mathbb{R}^n$, $a_{j0} \in \mathbb{R}$, $b_{j0} \in \mathbb{R}$, $D \in \mathbb{R}^{k \times n}$ and $c \in \mathbb{R}^k$. Problem (1)-(2) is referred to as the fractional 0-1 programming problem [21], or the hyperbolic 0-1 programming problem [1,20].

Note that if for some $j$ and some $x$ in the feasible region (2) the term $b_{j0} + b_j^{T}x$ is equal to zero, then problem (1)-(2) may not have a finite optimum. Therefore, it is usually assumed that

$$b_{j0} + b_j^{T}x \ne 0 \quad \text{for all } x \in \{0,1\}^n \text{ and } j = 1, \dots, m. \tag{3}$$

Furthermore, we can sometimes make a stricter assumption and require that all denominators in (1) are positive, i.e.,

$$b_{j0} + b_j^{T}x > 0 \quad \text{for all } x \in \{0,1\}^n \text{ and } j = 1, \dots, m. \tag{4}$$


A special simplified class of (1)-(2) is the so-called single-ratio fractional (hyperbolic) 0-1 programming problem:

$$\max_{x \in \{0,1\}^n} f(x) = \frac{a_0 + \sum_{i=1}^{n} a_i x_i}{b_0 + \sum_{i=1}^{n} b_i x_i}. \tag{5}$$

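Problem (1) can be generalized if instead of linear 0-1 functions we consider 0-1 polynomials:

$$\max_{x \in \{0,1\}^n} f(x) = \sum_{j=1}^{m} \frac{a_{j0} + \sum_{S \in \mathcal{A}_j} a_{jS} \prod_{i \in S} x_i}{b_{j0} + \sum_{S \in \mathcal{B}_j} b_{jS} \prod_{i \in S} x_i}, \tag{6}$$

where $\mathcal{A}_j$ and $\mathcal{B}_j$ are families of subsets of $\{1, 2, \dots, n\}$.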

In the general case, problems of type (1), (5) and (6) can be considered subject to various 0-1 linear and nonlinear constraints. A specific class of fractional 0-1 programming problems, where fractional terms appear not in the objective function but in the set of constraints, is discussed in [2]:

$$\max_{x \in \{0,1\}^n} g(x) = \sum_{i=1}^{n} w_i x_i \tag{7}$$

$$\text{s.t.} \quad \sum_{j=1}^{m_s} \frac{\alpha_{j0}^{s} + \sum_{i=1}^{n} \alpha_{ji}^{s} x_i}{\beta_{j0}^{s} + \sum_{i=1}^{n} \beta_{ji}^{s} x_i} \ge p_s, \quad s = 1, \dots, K, \tag{8}$$

where $K$ is the number of fractional constraints.

Finally, we should note that, in contrast to (1)-(2), (6) and (7)-(8), problem (5) has received most of the attention in the literature. Detailed surveys on single-ratio fractional combinatorial optimization can be found in [14,19].


Applications 


Applications of constrained and unconstrained ver- 
sions of problems (1)-(2), (5), (6), (7)-(8) arise in 
scheduling [16], query optimization in data bases and 
information retrieval [7], service systems design and fa- 
cility location [3,20], graph theory [11], data mining [2] 
and other areas [19]. 

Consider, for example, a problem discussed in [3]. We have a set of customers' regions with Poisson demand rates $a_i$ ($i = 1, \dots, n$). These regions can be assigned to a service facility with an exponential service rate $b$. If we define a 0-1 variable $x_i$ for each region $i$ such that $x_i = 1$ if region $i$ is serviced by the service facility (and $x_i = 0$ otherwise), then the service facility can be described as an M/M/1 queue with arrival rate $\lambda = \sum_{i=1}^{n} a_i x_i$ and service rate $b$. If we assume steady-state conditions ($\lambda < b$), then the average waiting time for each customer is equal to

$$\frac{1}{b - \lambda} = \frac{1}{b - \sum_{i=1}^{n} a_i x_i}, \tag{9}$$

and the total average waiting time is given by

$$\frac{\sum_{i=1}^{n} a_i x_i}{b - \sum_{i=1}^{n} a_i x_i}. \tag{10}$$

Next suppose that customer region $i$ contributes profit $p_i$ and the penalty for delay per unit time per customer is $t$. Then in order to maximize the profit we need to solve the following nonlinear knapsack problem:

$$\max_{x \in \{0,1\}^n} \sum_{i=1}^{n} p_i x_i - t\, \frac{\sum_{i=1}^{n} a_i x_i}{b - \sum_{i=1}^{n} a_i x_i} \tag{11}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} a_i x_i < b. \tag{12}$$
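For small $n$, problem (11)-(12) can be solved by enumeration, which makes the trade-off between profit and the delay penalty concrete. This brute-force sketch uses hypothetical data and a hypothetical function name; it is exponential in $n$ and is not the cutting-plane method of [3].

```python
from fractions import Fraction
from itertools import product


def best_assignment(p, a, b, t):
    """Brute-force (11)-(12) for small n: maximize profit minus delay penalty."""
    best = None
    for x in product((0, 1), repeat=len(p)):
        lam = sum(ai * xi for ai, xi in zip(a, x))   # arrival rate
        if lam >= b:                                 # violates steady state (12)
            continue
        value = sum(pi * xi for pi, xi in zip(p, x)) - t * Fraction(lam, b - lam)
        if best is None or value > best[0]:
            best = (value, x)
    return best


# Hypothetical data: profits p_i, demand rates a_i, service rate b, penalty t
print(best_assignment([5, 4, 3], [2, 1, 1], 4, 2))   # (Fraction(5, 1), (0, 1, 1))
```

Serving the most profitable region alone is not optimal here: its large demand rate pushes the queue toward saturation, and the ratio term in (11) penalizes this heavily.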


Another interesting application of fractional 0-1 programming can be found in graph theory [11]. Let $G = (V, E)$ be an undirected graph. The density $d(G)$ of $G$ is defined as the maximum ratio of the number of edges $e_H$ to the number of nodes $n_H$ over all subgraphs $H \subseteq G$, i.e.,

$$d(G) = \max_{H \subseteq G} \frac{e_H}{n_H}, \tag{13}$$

where $e_H$ and $n_H$ are the numbers of edges and nodes in the subgraph $H$. Since for a fixed node set the ratio is largest for the induced subgraph, the problem of finding $d(G)$ can be formulated as the following fractional 0-1 programming problem:

$$d(G) = \max_{x \in \{0,1\}^n,\ x \ne 0} \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} x_i x_j}{\sum_{i=1}^{n} x_i}, \tag{14}$$

where $a_{ij}$ are the elements of the adjacency matrix of $G$ and $n$ is the number of nodes in $G$. A similar formulation can also be given for the arboricity of $G$, which is defined as the minimum number of edge-disjoint forests into which $G$ can be decomposed [11].
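Formulation (14) can be checked on a small graph by enumerating nonempty node sets, each corresponding to a feasible $x \ne 0$, and counting the edges of the induced subgraph. The graph ($K_4$ with one pendant node) is a hypothetical example, and this exponential enumeration is only a check, not an efficient algorithm.

```python
from fractions import Fraction
from itertools import combinations


def density(n, edges):
    """d(G) by enumerating nonempty node sets (feasible x != 0 in (14))."""
    adj = {frozenset(e) for e in edges}
    best = (Fraction(0), ())
    for size in range(1, n + 1):
        for nodes in combinations(range(n), size):
            # edges of the subgraph induced by `nodes`: numerator of (14)
            e_h = sum(1 for i, j in combinations(nodes, 2) if frozenset((i, j)) in adj)
            if Fraction(e_h, size) > best[0]:
                best = (Fraction(e_h, size), nodes)
    return best


# Hypothetical graph: K4 on nodes 0-3 plus a pendant node 4 attached to node 0
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
print(density(5, edges))   # (Fraction(3, 2), (0, 1, 2, 3))
```

The whole graph has ratio $7/5$, so the densest subgraph is the $K_4$ with ratio $3/2$.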


Complexity Issues 


Constrained problems (1) and (5), where we optimize a single- or multiple-ratio fractional 0-1 function subject to linear 0-1 constraints, as well as problem (7)-(8), are obviously NP-hard, since general linear 0-1 programming is their special case if we set $b_{ji} = 0$ and $b_{j0} = 1$ for $j = 1, \dots, m$ and $i = 1, \dots, n$.

An unconstrained single-ratio fractional 0-1 programming problem (5) can be solved in polynomial time if condition (4) holds; see [7]. If the denominator can take both negative and positive values, i.e., only (3) holds, single-ratio problem (5) is known to be NP-hard [7]. In other words, the sign of the denominator is “the borderline between polynomial and NP-hard classes” [7]. Another simple proof of this fact is given in [1]. Recall the classical SUBSET SUM problem: given a set of positive integers $S = \{s_1, \dots, s_n\}$ and a positive integer $K$, does there exist a vector $x \in \{0,1\}^n$ such that

$$\sum_{i=1}^{n} s_i x_i = K\,? \tag{15}$$
i=1 
This problem is known to be NP-complete [4]. With the instance of the SUBSET SUM problem we associate the following unconstrained single-ratio fractional 0-1 programming problem:

$$\max_{x \in \{0,1\}^n} \frac{1}{1 - 2\left(\sum_{i=1}^{n} s_i x_i - K\right)}. \tag{16}$$

It is easy to observe that (3) holds, since the denominator is an odd integer. Moreover, the objective equals 1 if $\sum_{i=1}^{n} s_i x_i = K$, is at most $1/3$ if $\sum_{i=1}^{n} s_i x_i < K$, and is negative if $\sum_{i=1}^{n} s_i x_i > K$. Hence the optimal value of (16) is equal to 1 if and only if the SUBSET SUM instance has a solution, which implies the necessary result. Furthermore, it can easily be shown that finding an approximate solution of (5) within any positive multiple of the (negative) optimal value is NP-hard [7].
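The reduction (16) can be checked on small SUBSET SUM instances by enumeration: the optimal value is 1 exactly when some subset sums to $K$. The instances and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import product


def reduction_value(s, K):
    """Optimal value of (16) by enumeration: max 1 / (1 - 2 (sum s_i x_i - K))."""
    best = None
    for x in product((0, 1), repeat=len(s)):
        denom = 1 - 2 * (sum(si * xi for si, xi in zip(s, x)) - K)   # odd, never 0
        value = Fraction(1, denom)
        if best is None or value > best:
            best = value
    return best


print(reduction_value([3, 5, 7], 12), reduction_value([3, 5, 7], 11))   # 1 1/3
```

For $K = 12$ the subset $\{5, 7\}$ gives value 1; for $K = 11$ no subset works and the best value, $1/3$, comes from the sum 10.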

For multiple-ratio problem (1) with (4) satisfied, the number of ratios ($m = 1$ or $m \ge 2$) determines the complexity of the problem. For $m = 1$ we have the classical polynomially solvable single-ratio case, while for $m = 2$, that is, the 2-ratio case, the problem becomes NP-hard (see [18] or [13]).

Some other aspects of the complexity of uncon- 
strained single- and multiple-ratio fractional 0-1 pro- 
gramming problems (1) and (5), including complexity 
of uniqueness, approximability and local search, are ad- 
dressed in [12,13]. 


Mixed Integer Reformulation 


Li [9] and Wu [21] suggested a straightforward linearization technique for (1) based on a simple well-known idea: a polynomial mixed 0-1 term $z = xy$, where $x$ is a 0-1 variable and $y$ is a continuous variable taking any positive value, can be represented by the following linear inequalities: (i) $y - z \le K - Kx$; (ii) $z \le y$; (iii) $z \le Kx$; (iv) $z \ge 0$, where $K$ is an upper bound on $y$.
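The effect of the four inequalities can be verified directly: for fixed $x \in \{0,1\}$ and $0 \le y \le K$ they confine $z$ to an interval, and that interval collapses to the single point $z = xy$. A small Python sketch (the function name `z_bounds` is hypothetical):

```python
def z_bounds(x, y, K):
    """Interval of z allowed by the four linear inequalities for z = x*y.

    x is 0 or 1 and y is continuous with 0 <= y <= K (K an upper bound on y).
    Returns (lo, hi); the linearization is exact when lo == hi == x*y.
    """
    lo = max(0.0, y - K + K * x)  # from y - z <= K - Kx and z >= 0
    hi = min(y, K * x)            # from z <= y and z <= Kx
    return lo, hi
```

For $x = 0$ the bounds reduce to $0 \le z \le 0$; for $x = 1$ they reduce to $y \le z \le y$, which is why no product term is needed.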

Assume that (4) is satisfied. Define a new variable $y_j$ for each ratio in (1), that is,

$$y_j = \frac{1}{b_{j0} + \sum_{i=1}^{n} b_{ji} x_i}, \qquad j = 1, \dots, m.$$

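Then the fractional 0-1 programming problem (1) can be equivalently expressed as:

$$\max_{y,\; x \in \{0,1\}^n} \; \sum_{j=1}^{m} a_{j0} y_j + \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ji} u_{ji} \qquad (17)$$

$$\begin{aligned}
\text{s.t.}\quad & Dx \le c,\\
& b_{j0} y_j + \sum_{i=1}^{n} b_{ji} u_{ji} = 1, && j = 1, \dots, m,\\
& y_j - K_j (1 - x_i) \le u_{ji} \le K_j x_i, && i = 1, \dots, n,\ j = 1, \dots, m,\\
& 0 \le u_{ji} \le y_j, && i = 1, \dots, n,\ j = 1, \dots, m,
\end{aligned} \qquad (18)$$

where a new variable $u_{ji}$ is introduced for each nonlinear term $y_j x_i$, and $K_j$ is an upper bound on $y_j$.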
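As a consistency check of (17)-(18), fixing a 0-1 vector $x$ determines $y_j$ and $u_{ji} = y_j x_i$, and the linear objective (17) then coincides with the sum of ratios. The sketch assumes that (1) is the sum of ratios $(a_{j0} + \sum_i a_{ji} x_i)/(b_{j0} + \sum_i b_{ji} x_i)$, the form implied by (17), with integer data laid out as `a0[j]` $= a_{j0}$, `A[j][i]` $= a_{ji}$, `b0[j]` $= b_{j0}$, `B[j][i]` $= b_{ji}$ and positive denominators (condition (4)). The constraint $Dx \le c$ is not modeled, and the function names are hypothetical.

```python
from fractions import Fraction


def ratio_sum(a0, A, b0, B, x):
    """Sum of ratios (a_j0 + sum_i a_ji x_i) / (b_j0 + sum_i b_ji x_i)."""
    n = len(x)
    return sum(Fraction(a0[j] + sum(A[j][i] * x[i] for i in range(n)),
                        b0[j] + sum(B[j][i] * x[i] for i in range(n)))
               for j in range(len(a0)))


def linearized_point(a0, A, b0, B, x):
    """Values of y_j, u_ji and of objective (17) induced by a fixed 0-1 x."""
    n, m = len(x), len(a0)
    y = [Fraction(1, b0[j] + sum(B[j][i] * x[i] for i in range(n)))
         for j in range(m)]
    u = [[y[j] * x[i] for i in range(n)] for j in range(m)]  # u_ji = y_j x_i
    obj = sum(a0[j] * y[j] + sum(A[j][i] * u[j][i] for i in range(n))
              for j in range(m))
    return obj, y, u
```

With `a0 = [1, 2]`, `A = [[3, -1], [0, 2]]`, `b0 = [2, 1]`, `B = [[1, 1], [1, 3]]` and $x = (1, 0)$, both functions give $4/3 + 1 = 7/3$, and each equality constraint $b_{j0} y_j + \sum_i b_{ji} u_{ji} = 1$ holds exactly.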

Additional, though similar in spirit to (18), linear 
mixed 0-1 reformulations as well as other related issues 
are carefully discussed in [20]. 


Solution Techniques 


Most research efforts have focused on solving various classes of the single-ratio problem (5). Among the solution techniques developed, we mention branch-and-bound [15], cutting plane [5], enumeration [6] and approximation algorithms [8]. However, the most popular methods for solving single-ratio fractional 0-1 programming (and general fractional combinatorial) problems are based on the parametric approach [10,11,14].
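A widely used instance of the parametric approach for a single ratio is Dinkelbach's iteration. It is named here as our choice of example and is not claimed to be the specific algorithm of [10,11,14]. For a parameter $\lambda$ one solves $F(\lambda) = \max_x \{(a_0 + \sum_i a_i x_i) - \lambda (b_0 + \sum_i b_i x_i)\}$; when the denominator is positive (condition (4)), $\lambda$ equals the optimal ratio exactly when $F(\lambda) = 0$. The toy sketch below solves each parametric subproblem by enumeration, whereas in practice it would be a 0-1 problem handled by a dedicated method. The function name and the coefficient names `a0, a, b0, b` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def dinkelbach_01(a0, a, b0, b, max_iter=100):
    """Maximize (a0 + a.x) / (b0 + b.x) over x in {0,1}^n.

    Assumes the denominator is positive for every x (condition (4)).
    The parametric subproblem is solved by enumeration: toy sizes only.
    """
    def num(x):
        return a0 + sum(ai * xi for ai, xi in zip(a, x))

    def den(x):
        return b0 + sum(bi * xi for bi, xi in zip(b, x))

    points = list(product((0, 1), repeat=len(a)))
    x = points[0]
    lam = Fraction(num(x), den(x))          # ratio of the starting point
    for _ in range(max_iter):
        # F(lam) = max_x num(x) - lam * den(x); always >= 0 at the current x
        x = max(points, key=lambda p: num(p) - lam * den(p))
        if num(x) - lam * den(x) == 0:      # F(lam) = 0: lam is optimal
            return lam, x
        lam = Fraction(num(x), den(x))      # strictly larger ratio
    return lam, x
```

For the ratio $(1 + 3x_1 - 2x_2 + 4x_3)/(2 + x_1 + x_2 + 3x_3)$ the iteration starts at $\lambda = 1/2$ (from $x = 0$), jumps to $4/3$ and stops there, which matches the best ratio found by brute force.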

For some classes of multiple-ratio fractional 0-1 programming problems, specialized algorithms have been developed [2,3,16,17,20]. More recent examples include a highly efficient cutting-plane algorithm for solving problem (11)-(12) [3] and a heuristic for solving special classes of fractionally constrained problems of type (7)-(8) [2]. Reported computational experiments involved test instances with up to 10,000 variables.

Unfortunately, the fractional programming problem becomes substantially more difficult if additional ratios are introduced in the objective function. The general multiple-ratio problem (1)-(2) can be solved with standard branch-and-bound methods after reformulation as a linear mixed 0-1 programming problem via the techniques discussed in [9,20,21]. An improved branch-and-bound algorithm based on node tightening is developed in [20].


References 


1. Boros E, Hammer P (2002) Pseudo-Boolean Optimization. Discret Appl Math 123(1-3):155-225

2. Busygin S, Prokopyev OA, Pardalos PM (2005) Feature Selection for Consistent Biclustering via Fractional 0-1 Programming. J Comb Optim 10(1):7-21

3. Elhedhli S (2005) Exact solution of a class of nonlinear knapsack problems. Oper Res Lett 33:615-624

4. Garey MR, Johnson DS (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness. WH Freeman, San Francisco

5. Granot D, Granot F (1976) On solving fractional (0-1) programs by implicit enumeration. INFOR 14:241-249

6. Granot D, Granot F (1977) On integer and mixed integer fractional programming problems. Ann Discret Math 1:221-231

7. Hansen P, Poggi de Aragao M, Ribeiro CC (1991) Hyperbolic 0-1 programming and query optimization in information retrieval. Math Program 52:256-263

8. Hashizume S, Fukushima M, Katoh N, Ibaraki T (1987) Approximation algorithms for combinatorial fractional programming problems. Math Program 37:255-267

9. Li H (1994) A global approach for general 0-1 fractional programming. Eur J Oper Res 73:590-596

10. Megiddo N (1979) Combinatorial optimization with rational objective functions. Math Oper Res 4:414-424

11. Picard J-C, Queyranne M (1982) A network flow solution to some nonlinear 0-1 programming problems, with applications to graph theory. Networks 12:141-159

12. Prokopyev OA, Huang Hx, Pardalos PM (2005) On complexity of unconstrained hyperbolic 0-1 programming problems. Oper Res Lett 33:312-318

13. Prokopyev OA, Meneses C, Oliveira CAS, Pardalos PM (2005) On Multiple-ratio Hyperbolic 0-1 Programming Problems. Pac J Optim 1/2:327-345

14. Radzik T (1998) Fractional Combinatorial Optimization. In: Du D-Z, Pardalos PM (eds) Handbook of Combinatorial Optimization, vol 1. Kluwer, Dordrecht, pp 429-478

15. Robillard P (1971) (0,1) Hyperbolic programming problems. Nav Res Logist Q 18:47-57

16. Saipe S (1975) Solving a (0,1) hyperbolic program by branch and bound. Nav Res Logist Q 22:497-515

17. Skiscim CC, Palocsay SW (2001) Minimum Spanning Trees with Sums of Ratios. J Glob Optim 19:103-120

18. Skiscim CC, Palocsay SW (2004) The Complexity of Minimum Ratio Spanning Tree Problems. J Glob Optim 30:335-346

19. Stancu-Minasian IM (1987) Fractional Programming. Kluwer, Dordrecht

20. Tawarmalani M, Ahmed S, Sahinidis N (2002) Global Optimization of 0-1 Hyperbolic Programs. J Glob Optim 24:385-416

21. Wu T-H (1997) A note on a global approach for general 0-1 fractional programming. Eur J Oper Res 101:1997


Frank-Wolfe Algorithm 


SIRIPHONG LAWPHONGPANICH 
Naval Postgraduate School, Monterey, USA 


MSC2000: 90C30 


Article Outline 


Keywords 
See also 
References 


Keywords 


Away direction; Bisection search; Bounded; Column 
generation; Convergence rate; Convex function; 
Convex hull; Convexity; Dantzig-Wolfe 
decomposition; Differentiable function; Direction 
finding problem; Extreme point; Feasible direction; 
Feasible direction methods; First order Taylor series 
expansion; Geometrically; Globally optimal; Golden 
section method; Heuristics; Inexact line search; Line
search algorithms; Line search problem; Linear
program; Matrix; Network; Nonempty; Nonlinear 
programming; Parallel tangents; PARTAN; 
Polyhedron; Positive definite matrix; Pseudoconvex; 
Quadratic programming; Regularized direction finding 
problem; Regularized Frank-Wolfe decomposition; 
Simplex algorithm; Simplicial decomposition; Stepsize; 
Stopping criterion; Strongly convex 


In 1956, M. Frank and P. Wolfe [5] published an article proposing an algorithm for solving quadratic programming problems. In the same article, they extended their algorithm to the following problem:

$$\min_{x \in S} f(x), \qquad (1)$$

where $f(x)$ is a convex and continuously differentiable function on $\mathbb{R}^n$. The set $S$ is a nonempty and bounded polyhedron of the form $S = \{x \in \mathbb{R}^n : Ax \le b,\ x \ge 0\}$, where $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$. The algorithm belongs to the class of feasible direction methods for nonlinear programming problems. Starting from a feasible solution, algorithms in this class solve (1) by iteratively generating a feasible direction that leads to another feasible solution with an improved objective function value. The Frank-Wolfe (FW) algorithm for (1) can be stated as follows:


0 | Select $x^1 \in S$ and set $k = 1$.

1 | Let $y^k = \arg\min_{y \in S} \nabla f(x^k)^T y$.
  IF $\nabla f(x^k)^T (y^k - x^k) = 0$
  THEN stop; $x^k$ is an optimal solution
  ELSE go to Step 2.

2 | Let $\lambda^k = \arg\min_{0 \le \lambda \le 1} f(x^k + \lambda (y^k - x^k))$.
  Set $x^{k+1} = x^k + \lambda^k (y^k - x^k)$ and $k = k + 1$;
  go to Step 1.


The Frank-Wolfe algorithm


The problem in Step 1 is generally referred to as the direction finding problem, since the vector $d^k = y^k - x^k$ is a feasible direction at $x^k$. Since $\nabla f(x^k)$ is a constant vector with respect to $y$, the direction finding problem is a linear program and can be solved using the simplex algorithm. Doing so implies that $d^k$ always points toward an extreme point, since $y^k$ is then always an extreme point of $S$. When $x^k$ satisfies the stopping criterion, it must be globally optimal because the following holds for all $x \in S$:

$$f(x) \ge f(x^k) + \nabla f(x^k)^T (x - x^k) \ge f(x^k) + \nabla f(x^k)^T (y^k - x^k) = f(x^k).$$

The two inequalities and the final equality follow from the convexity of $f(x)$, the fact that $y^k$ solves the direction finding problem, and the stopping criterion, respectively.
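To make the two steps concrete, here is a minimal Python/NumPy sketch under assumptions of our own choosing: $f(x) = \frac{1}{2}\|x - w\|^2$ (convex and continuously differentiable) and $S = \{x \in \mathbb{R}^n : \sum_i x_i \le 1,\ x \ge 0\}$, i.e., $A$ is a single row of ones and $b = 1$. Because the extreme points of this $S$ are the origin and the unit vectors $e_i$, the direction finding LP can be solved by inspecting those vertices instead of calling a simplex code, and the line search in Step 2 has a closed form. The helper names `fw_vertex` and `frank_wolfe` are hypothetical.

```python
import numpy as np


def fw_vertex(g):
    """Solve the direction finding LP min_{y in S} g^T y for
    S = {y >= 0, sum(y) <= 1}, whose extreme points are 0 and the e_i."""
    y = np.zeros(g.size)
    i = int(np.argmin(g))
    if g[i] < 0.0:          # e_i is better than the origin (value 0)
        y[i] = 1.0
    return y


def frank_wolfe(w, x0, max_iter=1000, tol=1e-10):
    """FW algorithm for f(x) = 0.5 * ||x - w||^2 over S (illustrative)."""
    x = np.asarray(x0, dtype=float)        # Step 0: x^1 in S
    for _ in range(max_iter):
        g = x - w                          # gradient of f at x^k
        d = fw_vertex(g) - x               # Step 1: d^k = y^k - x^k
        gap = -(g @ d)                     # -grad f(x^k)^T (y^k - x^k) >= 0
        if gap <= tol:                     # stopping criterion (with tolerance)
            break
        # Step 2: f(x + lam*d) = f(x) - lam*gap + 0.5*lam^2*||d||^2,
        # so the exact minimizer over [0, 1] has a closed form.
        lam = min(1.0, gap / (d @ d))
        x = x + lam * d
    return x
```

For example, `frank_wolfe(np.array([1.0, 1.0]), np.zeros(2))` moves from the origin to the vertex $e_1$, then to $(0.5, 0.5)$, where the stopping criterion holds exactly.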

When $x^k$ does not satisfy the stopping criterion, $\nabla f(x^k)^T (y^k - x^k) < 0$ (this quantity is never positive, because $x^k$ itself is feasible in the direction finding problem) and the algorithm proceeds to Step 2. In this step, $\lambda^k$ is a solution to a line search problem, which has only one decision variable and can be solved by a number of algorithms such as bisection search, the golden section method, or an inexact line search technique using, e.g., Armijo's rule [1]. It is important to note that the new solution, $x^{k+1}$, has a better objective value. To demonstrate, consider the first-order Taylor series expansion of $f(x)$ around the point $x^k$, i.e.,


$$f(x^k + \lambda (y^k - x^k)) = f(x^k) + \lambda \nabla f(x^k)^T (y^k - x^k) + \lambda \|y^k - x^k\|\, \alpha(x^k; \lambda (y^k - x^k)),$$

where $\lim_{\lambda \to 0} \alpha(x^k; \lambda (y^k - x^k)) = 0$. Since $\nabla f(x^k)^T (y^k - x^k) < 0$, the above expansion implies that there exists a sufficiently small $\bar\lambda \in (0, 1)$ such that $f(x^k + \bar\lambda (y^k - x^k)) < f(x^k)$. Using the fact that $\lambda^k$ solves the line search problem, the following must hold:

$$f(x^{k+1}) = f(x^k + \lambda^k (y^k - x^k)) \le f(x^k + \bar\lambda (y^k - x^k)) < f(x^k).$$

Thus, $x^{k+1}$ has a better objective value.

Using standard techniques in nonlinear programming, it can be shown that the sequence of FW iterates, $x^k$, converges to an optimal solution. This also holds under the weaker assumption that $f(x)$ is pseudoconvex. In [14], W.B. Powell and Y. Sheffi eliminate the line search problem in Step 2 and show that the FW algorithm still converges to an optimal solution as long as $\lambda^k$ satisfies the following conditions:

$$\sum_{k=1}^{\infty} \lambda^k = \infty \quad \text{and} \quad \lim_{k \to \infty} \lambda^k = 0.$$

For example, one suitable choice is $\lambda^k = 1/k$.
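The Powell-Sheffi rule can be tried on an illustrative problem of our own choosing, $f(x) = \frac{1}{2}\|x - w\|^2$ over $S = \{x \ge 0, \sum_i x_i \le 1\}$, where the direction finding LP reduces to comparing the vertices $0$ and $e_i$. Step 2 simply takes $\lambda^k = 1/k$ with no line search, so $x^{k+1}$ becomes the running average of the vertices $y^1, \dots, y^k$. The helper names are hypothetical.

```python
import numpy as np


def fw_vertex(g):
    """Direction finding LP over S = {y >= 0, sum(y) <= 1} (vertices 0, e_i)."""
    y = np.zeros(g.size)
    i = int(np.argmin(g))
    if g[i] < 0.0:
        y[i] = 1.0
    return y


def frank_wolfe_open_loop(w, x0, max_iter=1000, tol=1e-10):
    """FW for f(x) = 0.5*||x - w||^2 with predetermined steps lambda^k = 1/k."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        g = x - w
        d = fw_vertex(g) - x
        if -(g @ d) <= tol:          # same stopping criterion as Step 1
            break
        x = x + (1.0 / k) * d        # no line search: lambda^k = 1/k
    return x
```

The steps satisfy both conditions above ($\sum_k 1/k$ diverges and $1/k \to 0$), but progress per iteration is typically slower than with an exact line search.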

The main advantage of the FW algorithm lies in its simplicity. It is easy to understand and to implement on a computer. Computer programs for the simplex and the line search algorithms already exist and are generally available. When the constraint matrix $A$ has a network structure (see, e.g., [7,11], and [2]), more efficient network algorithms can be used to solve the direction finding problem, and the overall computational time can be reduced. In addition, the FW algorithm does not require much computer storage or memory. However, this feature may be less important now that computer memory is abundant and inexpensive.

The main disadvantage of the FW algorithm is its slow convergence rate (see Fig. 1). During the early iterations, the algorithm tends to decrease the objective function rather dramatically. However, the FW iterates tend to zigzag as they slowly approach an optimal solution. In [17], Wolfe shows that the sequence $x^k$ converges geometrically to the optimal solution if the latter is in the relative interior of $S$ and $f(x)$ is strongly convex. On the other hand, if the optimal solution is on the boundary of $S$, the convergence may be slower.

In practice, there are several modifications that can accelerate the convergence of the FW algorithm. The first modification is due to Wolfe [17]. It involves generating in Step 1 an additional feasible direction, $d_a^k = z^k - x^k$, where $z^k = \arg\max_{z \in S} \nabla f(x^k)^T z$. The direction $d_a^k$ is generally referred to as the 'away' direction, since it is constructed from the worst extreme point with respect to minimizing the objective function. Between the original and the away directions, only one is selected for the line search problem in Step 2. The away direction generally leads to faster convergence in practical applications (see, e.g., [3]), and J. Guélat and P. Marcotte [8] showed that the resulting algorithm converges geometrically to an optimal solution under appropriate assumptions. The second modification is based on the parallel tangents (PARTAN) method introduced in [15]. During the $k$th iteration, the PARTAN direction, $p^k$, is defined to be $x^k - x^{k-2}$ when $k \ge 3$. When the FW algorithm zigzags, $p^k$ intuitively points toward an optimal solution (see Fig. 2).

Frank-Wolfe Algorithm, Figure 1
The problem is: $\min \{\|w - x\|^2 : x \in S\}$, where $S$ is the convex hull of $E^1$, $E^2$ and $E^3$. The Frank-Wolfe algorithm generates feasible directions that point toward either $E^1$ or $E^3$. It dramatically reduces the objective function during the first two iterations and zigzags toward the optimal solution, $x^*$, afterward

When integrated together, the PARTAN variant (see [4] and [10]) of the FW algorithm alternates between the original and the PARTAN directions when performing line searches. More formally, the original Step 2 of the FW algorithm is replaced with the following steps:

2 | Let $\lambda^k = \arg\min_{0 \le \lambda \le 1} f(x^k + \lambda (y^k - x^k))$.
  Set $z^k = x^k + \lambda^k (y^k - x^k)$;
  go to Step 3.

3 | (PARTAN step)
  IF $k = 1$
  THEN set $x^{k+1} = z^k$
  ELSE let
    $\alpha^k = \arg\min_{0 \le \alpha \le \alpha_m} f(x^{k-1} + \alpha (z^k - x^{k-1}))$,
    where $\alpha_m$ is the maximal stepsize in the direction $(z^k - x^{k-1})$,
    and set $x^{k+1} = x^{k-1} + \alpha^k (z^k - x^{k-1})$;
  set $k = k + 1$;
  return to Step 1.

Frank-Wolfe Algorithm, Figure 2
The PARTAN direction, $p^k = x^k - x^{k-2}$, points toward an optimal solution
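The PARTAN variant can be sketched on an illustrative problem of our own choosing, $f(x) = \frac{1}{2}\|x - w\|^2$ over $S = \{x \ge 0, \sum_i x_i \le 1\}$. Both line searches are exact because $f$ is quadratic, and `max_step` computes the maximal stepsize $\alpha_m$ that keeps $x^{k-1} + \alpha (z^k - x^{k-1})$ in $S$. Since $\alpha = 1$ reproduces $z^k$, the PARTAN step never does worse than the plain FW step. All function names are hypothetical.

```python
import numpy as np


def fw_vertex(g):
    """Direction finding LP over S = {y >= 0, sum(y) <= 1} (vertices 0, e_i)."""
    y = np.zeros(g.size)
    i = int(np.argmin(g))
    if g[i] < 0.0:
        y[i] = 1.0
    return y


def max_step(x, d):
    """Largest alpha >= 0 such that x + alpha*d stays in S."""
    alpha = np.inf
    for xi, di in zip(x, d):
        if di < 0.0:
            alpha = min(alpha, -xi / di)                 # keep x_i >= 0
    if d.sum() > 0.0:
        alpha = min(alpha, (1.0 - x.sum()) / d.sum())    # keep sum(x) <= 1
    return alpha


def fw_partan(w, x0, max_iter=1000, tol=1e-10):
    """PARTAN variant of FW for f(x) = 0.5*||x - w||^2 over S."""
    x_prev, x = None, np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        g = x - w
        d = fw_vertex(g) - x                    # Step 1: FW direction
        gap = -(g @ d)
        if gap <= tol:
            break
        z = x + min(1.0, gap / (d @ d)) * d     # Step 2: exact FW line search
        p = z - x_prev if k > 1 else None       # direction z^k - x^{k-1}
        if p is None or p @ p == 0.0:           # Step 3, k = 1 (or p = 0)
            x_next = z
        else:
            a_m = max_step(x_prev, p)           # alpha_m >= 1 since z^k is in S
            a = min(a_m, max(0.0, -((x_prev - w) @ p) / (p @ p)))
            x_next = x_prev + a * p
        x_prev, x = x, x_next
    return x
```

With $w = (1, 1)$ the iterates are $e_1$, then $(0.5, 0.5)$, where the stopping criterion holds.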


Finally, the last modification for accelerating the FW algorithm involves using some or all of the extreme points generated during the current and prior iterations. Instead of performing a line search in Step 2, a typical modification (see, e.g., [6,9,16] and [12]) requires either a heuristic, an approximate, or an exact solution to the following problem:

$$\min \left\{ f(x) : x \in \operatorname{conv}\{y^1, \dots, y^k\} \right\}. \qquad (2)$$

The feasible region of (2) is the convex hull of $\{y^1, \dots, y^k\}$, each of which is an extreme point of $S$. Thus, (2) is an approximation to (1), and this approximation improves as more extreme points are added to (2). Since the number of extreme points of $S$ is finite, an optimal solution to (2) should also solve (1) after a finite number of iterations. When (2) is solved exactly (or nearly so), the resulting algorithm is generally known as the simplicial decomposition or column generation technique and is related to Dantzig-Wolfe decomposition.

In the above three modifications, the direction finding problems are linear programs with the same structure. In 1994, A. Migdalas [13] introduced an extension called the regularized Frank-Wolfe algorithm, in which the direction finding problem has a nonlinear term in the objective function to control the distance between $y^k$ and $x^k$. For example, one version of the regularized direction finding problem is:

$$y^k = \arg\min_{y \in S} \; \nabla f(x^k)^T y + \frac{1}{2} (y - x^k)^T D^k (y - x^k),$$

where $D^k$ is a positive definite matrix.


See also 


> Rosen’s Method, Global Convergence, and Powell’s 
Conjecture 
The ever-growing number of wireless communications systems deployed around the globe has made the optimal assignment of a limited radio frequency spectrum a problem of primary importance. At issue are planning models for permanent spectrum allocation, licensing, regulation [20] and network design, including aeronautical mobile, land mobile, maritime mobile, broadcast, land fixed (point-to-point) and satellite services. Further at issue are on-line algorithms for dynamically assigning frequencies to users within an established network. In particular, land cellular mobile systems have been well studied (I. Katzela and M. Naghshineh [9] reference nearly 100 works in cellular dynamic channel assignment).

Frequency assignment problems are typically modeled in graph-theoretical terms. That is, a graph $G(V, E)$ is considered with vertices $V(G) = \{v_1, \dots, v_n\}$ and edges $E(G)$. Each vertex in $V(G)$ represents a transmitter, and two vertices $v_i, v_j$ are adjacent (have an edge between them) if the corresponding transmitters are not permitted to share the same frequency. The frequency continuum is partitioned into channels (frequencies) of equal width, numbered with consecutive integer values. A frequency assignment is then a mapping $f$ from the vertices of the graph to the positive integers $\mathbb{Z}_+$ such that no two adjacent vertices receive the same value:

$$f: V \to \mathbb{Z}_+ \quad \text{s.t.} \quad (v_i, v_j) \in E(G) \Rightarrow f(v_i) \ne f(v_j).$$


This formulation, where adjacent vertices cannot share the same frequency, is termed co-channel constrained and was shown by B.H. Metzger [12] to be equivalent to the well-studied graph coloring problem. Typically, the objective is to find an assignment of frequencies (colors) to the transmitters (vertices) that minimizes the number of frequencies (colors) used. The minimum number $\chi(G)$ for which a $\chi(G)$-coloring exists for $G$ is called the chromatic number. Since graph K-colorability for arbitrary K is known to be an NP-complete problem [6], co-channel constrained frequency assignment is also NP-complete.

Consider the restriction that two adjacent vertices may not receive frequencies that are the same or differ by exactly $k$. This FAP is said to be adjacent channel constrained and, when $k = 0$, is simply the co-channel problem. Adjacent channel constraints model harmonic interference (signals that are integer multiples of the fundamental or carrier frequency). In general, a set $T$ may be defined which contains zero and a subset of the positive integers, such that no two adjacent vertices may receive assignments whose absolute difference is contained in $T$:

$$f: V \to \mathbb{Z}_+ \quad \text{s.t.} \quad (v_i, v_j) \in E(G) \Rightarrow |f(v_i) - f(v_j)| \notin T.$$


This FAP formulation was introduced by W.K. Hale [7] and is termed T-coloring. When $T = \{0\}$, the co-channel constrained FAP, or graph coloring problem, results. M.B. Cozzens and F.S. Roberts [4] define the number of distinct colors used in a T-coloring as the order, and the total bandwidth used (maximum color minus minimum color) as the span. Hence, for any T-coloring, two optimality criteria exist: minimum order, denoted by $\chi_T(G)$, and minimum span, denoted by $sp_T(G)$. For the co-channel constrained FAP the two criteria are equivalent, since $sp_T(G) = \chi_T(G) - 1$; in general, however, this is not true. Cozzens and Roberts show that for any graph and any $T$ the minimum order is equal to the chromatic number: $\chi_T(G) = \chi(G)$. Hence, T-coloring research has primarily focused on characterizing the minimum span under numerous assumptions about the structure of $G$ and the value of $T$ [2,4,5,11,13,16], and [17].

In many situations, the potential for interference between transmitters may occur on several different levels, where each level is defined by a separate set of edges on the common set of vertices. The $k$th edge set is denoted by the graph $G_k$, $k = 0, \dots, K$. The family of graphs thus defined, which share an identical vertex set, is sometimes referred to, in unison, as a multigraph and denoted by $G(V, G_0, \dots, G_K)$. Since each level represents a unique interference mechanism, a family of T-sets must also be defined as $T(0), \dots, T(K)$. Interference occurs on the $k$th level when any two vertices adjacent in the $k$th edge set receive frequencies that differ by a value in $T(k)$. In graph coloring nomenclature, the multilevel FAP is denoted by

$$f: V \to \mathbb{Z}_+, \qquad (x, y) \in E(G_k) \Rightarrow |f(x) - f(y)| \notin T(k), \quad \forall x, y \in V,\ x \ne y,\ k = 0, \dots, K,$$

where the family of graphs is nested such that $G_0 \supseteq \dots \supseteq G_K$ and the T-sets are reverse nested, as $0 \in T(0) \subseteq \dots \subseteq T(K)$. Cozzens and D.I. Wang develop bounds on the minimum span for general multigraph T-colorings in [5]. Excellent reviews on T-coloring and frequency assignment for single graphs and multigraphs may be found in [14] and [15].

Since even the simplest FAP has been shown to be NP-complete, it is generally hopeless to pursue exact solution methods. Approximate heuristic techniques have been the focus of most research, and most of these techniques fall under the scope of sequential heuristics. There are three fundamental approaches to sequentially coloring the vertices of a graph:

• (frequency exhaustive) Given an ordering of the vertices, attempt to color each vertex, sequentially, with the smallest feasible color. This approach is also called greedy coloring.

• (requirement exhaustive) Given an ordering of the vertices, attempt to assign the first color to each vertex, sequentially. When all vertices have been considered, attempt to assign the second color to the unassigned vertices, then the third and so on, following the same vertex ordering.

• (uniform) Given an ordering of the vertices, attempt to color each vertex, sequentially, with the color that has been least used.
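A minimal sketch of the frequency exhaustive (greedy) approach, written directly for T-coloring so that the co-channel case is simply $T = \{0\}$. The graph representation (a dict mapping each vertex to its set of neighbours) and the function names are our own; order and span are computed as defined by Cozzens and Roberts.

```python
def greedy_t_coloring(order, adj, T):
    """Frequency exhaustive (greedy) T-coloring.

    order: list of vertices; adj: dict vertex -> set of neighbours;
    T: set of forbidden absolute differences (must contain 0).
    Each vertex receives the smallest positive integer c such that
    |c - f(u)| is not in T for every already colored neighbour u.
    """
    f = {}
    for v in order:
        c = 1
        while any(abs(c - f[u]) in T for u in adj[v] if u in f):
            c += 1
        f[v] = c
    return f


def order_and_span(f):
    """Order (number of distinct colors) and span (max - min color)."""
    colors = set(f.values())
    return len(colors), max(colors) - min(colors)
```

On a triangle with $T = \{0, 1\}$ (no equal or adjacent channels) the greedy rule assigns channels 1, 3 and 5, giving order 3 and span 4; with $T = \{0\}$ it assigns 1, 2 and 3.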

The efficiency of each approach depends strongly on the order in which the vertices are placed. There are many rules by which the vertices of a graph can be ordered. In a smallest-last ordering, the vertex of smallest degree in $V$ is denoted $v_1$. This vertex is then deleted from the graph, the vertex $v_2$ of smallest degree in the remaining graph is found and deleted, and so on until all vertices have been deleted. The smallest-last vertex order is then $\{v_n, v_{n-1}, \dots, v_1\}$. The largest-first vertex order sorts the vertices of the graph according to their degree in $G$, from largest to smallest. D. Brélaz [3] introduced a vertex ordering specified by the saturation degree of the vertices, from highest saturation degree to lowest. The saturation degree of a vertex is defined to be the number of different colors that exist on its adjacent vertices. The vertex with the highest saturation degree is 'most denied', since it has the fewest colors to choose from. J.A. Zoellner and C.L. Beall [21] compared the three sequential approaches with several different vertex ordering rules and found that, all else being equal, frequency exhaustive methods typically yield smaller spans. Hale [8] expanded upon these results by defining a generalized structure for all sequential coloring algorithms, which consists of three fundamental steps:

1) order the vertices;

2) select the next vertex to color;

3) select the color.
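The smallest-last and largest-first orderings can be sketched as follows, using the same dict-of-neighbour-sets representation with comparable (e.g., integer) vertex labels. Ties are broken by the smaller label, a choice the text leaves open, and the function names are hypothetical.

```python
def smallest_last_order(adj):
    """Smallest-last vertex ordering.

    Repeatedly delete a vertex of minimum degree in the remaining graph
    (ties broken by the smaller label). The deletion order is v_1, v_2, ...,
    and the returned ordering is its reverse, v_n, ..., v_1.
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    deleted = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        deleted.append(v)
        for u in remaining.pop(v):      # neighbours of v lose one degree
            remaining[u].discard(v)
    return deleted[::-1]


def largest_first_order(adj):
    """Vertices sorted by degree in G, largest first (ties: smaller label)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))
```

For a star with centre 0 and leaves 1, 2, 3, the smallest-last order is `[3, 0, 2, 1]`, while the largest-first order is `[0, 1, 2, 3]`.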

Hale's procedure is general. It covers all types of vertex orderings in step 1 and allows for each of the three sequential techniques in step 3. Step 2 is added to allow the coloring sequence to adapt during the process. Hale introduced new sequential techniques for step 2 that are adaptive variants of the saturation degree. A very good review of frequency assignment heuristics can be found in [10].
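The following sketch combines saturation-degree vertex selection (an adaptive step 2) with the smallest feasible color, in the spirit of Brélaz's ordering. It covers only the co-channel case $T = \{0\}$; the tie-breaking rule (larger degree, then smaller label) and the function names are our assumptions.

```python
def saturation(v, adj, color):
    """Number of distinct colors already present on the neighbours of v."""
    return len({color[u] for u in adj[v] if u in color})


def dsatur_coloring(adj):
    """Saturation-degree greedy coloring (co-channel case, T = {0}).

    At each step the uncolored vertex with the highest saturation degree is
    chosen (ties: larger degree, then smaller label) and receives the
    smallest color not used by any of its neighbours.
    """
    color = {}
    uncolored = set(adj)
    while uncolored:
        v = max(uncolored,
                key=lambda u: (saturation(u, adj, color), len(adj[u]), -u))
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
        uncolored.remove(v)
    return color
```

On a 5-cycle this sketch uses three colors and on a 4-cycle two, which equals the chromatic number in both cases.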

Approximate solutions may also be obtained by using more traditional polyhedral methods on relaxations of the integer program (IP) formulation of the FAP. A. Wisse [19] developed a minimum order IP formulation for the FAP which relies on a list coloring model; that is, the frequencies which may be assigned are restricted to a finite list (set), designated by $F$, of cardinality $m$. Furthermore, $I$ is designated as the index set for all transmitters (vertices) and $n$ the cardinality of $I$. Define two binary decision variables as

$$x_{if} = \begin{cases} 1 & \text{if transmitter } i \text{ is assigned frequency } f,\\ 0 & \text{else,} \end{cases} \qquad y_f = \begin{cases} 1 & \text{if frequency } f \text{ is used at least once},\\ 0 & \text{else.} \end{cases}$$

The IP which results is

$$\begin{aligned}
\min\quad & z = \sum_{f \in F} y_f\\
\text{s.t.}\quad & \sum_{f \in F} x_{if} = 1, && \forall i \in I,\\
& \sum_{i \in I} x_{if} \le n\, y_f, && \forall f \in F,\\
& \sum_{g \in F:\, |f - g| \in T} x_{jg} \le 1 - x_{if}, && \forall f \in F,\ \forall (v_i, v_j) \in E(G),\\
& x_{if} \in \{0, 1\}, && \forall i \in I,\ \forall f \in F,\\
& y_f \in \{0, 1\}, && \forall f \in F.
\end{aligned}$$


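A minimum span FAP IP formulation may be obtained by deleting the variables $y_f$ and adding $p$, defined to be the maximum frequency assigned:

$$\begin{aligned}
\min\quad & p\\
\text{s.t.}\quad & \sum_{f \in F} x_{if} = 1, && \forall i \in I,\\
& \sum_{g \in F:\, |f - g| \in T} x_{jg} \le 1 - x_{if}, && \forall f \in F,\ \forall (v_i, v_j) \in E(G),\\
& \sum_{f \in F} f\, x_{if} \le p, && \forall i \in I,\\
& x_{if} \in \{0, 1\}, && \forall i \in I,\ \forall f \in F.
\end{aligned}$$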
Of course, a solution to either IP formulation would be exact; however, efficient methods for finding solutions to these formulations do not yet exist for problems of large dimension. Linear relaxations of these formulations, where $0 \le x_{if} \le 1$, have been successfully developed and yield fairly good solutions for some moderate-size problems [1]. A potential reduction method [18] has also been developed that utilizes the transformation $x_{if} := 2x_{if} - 1$, that is, $x_{if} \in \{-1, +1\}$. As a result, any feasible solution to the transformed IPs must satisfy $x^T x = mn$, where $x$ is the vector of the $x_{if}$ in $\mathbb{Z}^{mn}$. The polyhedron $P$ formed from the linear relaxation of $x$ and $y$ (or $p$) in the constraints of the IPs is then incorporated into the problem by minimizing a logarithmic potential function over the polyhedron as

$$\min_{x \in P} \; \log \left( mn - x^T x \right)^{1/2} - \frac{1}{N} \sum_{k=1}^{N} w_k \log s_k,$$

where $N$ is the number of constraints, the $w_k$ are nonnegative real-valued weights, and $s_k$ is the slack of constraint $k$. A sequence of iterative solutions is obtained in a three-step process which begins with a nonoptimal feasible solution $x^0$. An interior point method is applied to a quadratic approximation of the potential function within an ellipsoid centered on the current feasible point $x^i$. This yields a descent direction $\Delta x$. The potential function is then minimized within the ellipsoid along the line $x^i + \mu \Delta x$, which yields the next iterate $x^{i+1}$. The iterate is then rounded to an integer value. The algorithm stops when the rounded solution is feasible for the original problem. This algorithm was tested and found to suffer from slow convergence. As a result, an alternative quadratic assignment formulation was developed, which proved to be much faster.
Define a new binary-valued decision variable

$$q_{ifjg} = \begin{cases} 1 & \text{if } x_{if} = 1,\ x_{jg} = 1,\ (v_i, v_j) \in E(G) \text{ and } |f - g| \in T,\\ 0 & \text{else.} \end{cases}$$

Then the assignment $x$ has no interference if

$$\sum_{i=1}^{n} \sum_{f=1}^{m} \sum_{j=1}^{n} \sum_{g=1}^{m} x_{if}\, x_{jg}\, q_{ifjg} = 0,$$

which is equivalent to

$$\tfrac{1}{2}\, x^T Q x = 0,$$

where $Q$ is an $mn \times mn$ matrix containing the $q_{ifjg}$. Thus the new potential function, with the added quadratic term, is minimized over the polyhedron as

$$\min_{x \in P} \; \frac{1}{2} x^T Q x - \frac{1}{N} \sum_{k=1}^{N} w_k \log s_k.$$

Interior point solutions of this potential function converged much more quickly than those of the first formulation.
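The interference count $\tfrac{1}{2} x^T Q x$ is easy to evaluate for a given assignment. The sketch below works with 0-1 variables (not the $\pm 1$ transform used by the potential reduction method) and places $x_{if}$ at position $i \cdot m + a$, where $f$ is the $a$-th frequency in the list $F$. It sets the structural part of $q_{ifjg}$ to 1 for every edge $(v_i, v_j)$, in both orientations, and every pair with $|f - g| \in T$; the conditions $x_{if} = x_{jg} = 1$ are supplied by the product $x_{if} x_{jg}$, and the factor $\tfrac{1}{2}$ cancels the double count. Function names are hypothetical.

```python
import numpy as np


def build_Q(n, freqs, edges, T):
    """Q[(i,f),(j,g)] = 1 if (v_i, v_j) is an edge and |f - g| is in T.

    Variable x_{if} sits at position i*m + a, where f = freqs[a].
    Both orientations of each edge are filled, so Q is symmetric.
    """
    m = len(freqs)
    Q = np.zeros((n * m, n * m))
    for i, j in edges:
        for a, f in enumerate(freqs):
            for b, g in enumerate(freqs):
                if abs(f - g) in T:
                    Q[i * m + a, j * m + b] = 1.0
                    Q[j * m + b, i * m + a] = 1.0
    return Q


def assignment_vector(assign, freqs):
    """0-1 vector x for an assignment list: transmitter i gets assign[i]."""
    m = len(freqs)
    x = np.zeros(len(assign) * m)
    for i, f in enumerate(assign):
        x[i * m + freqs.index(f)] = 1.0
    return x
```

On a triangle with $F = \{1, 2, 3\}$ and $T = \{0\}$, the assignment $(1, 2, 3)$ gives $\tfrac{1}{2} x^T Q x = 0$, while $(1, 1, 2)$ gives 1, i.e., one interfering edge.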


See also 


> Assignment and Matching 

> Assignment Methods in Clustering 

> Bi-objective Assignment Problem 

> Communication Network Assignment Problem 

> Graph Coloring 

> Maximum Constraint Satisfaction: Relaxations and 
Upper Bounds 

> Maximum Partition Matching 

> Quadratic Assignment Problem 
Some of the main indicators of progress in the mathematical sciences have been the appearances of new types of numbers. One of the more recent cases is that of the complex numbers. Much of modern science cannot be imagined without their use.

Their introduction into mathematics was first motivated by the wish to solve the equation

$$P(z) = 0, \qquad (1)$$

where $P(z)$ is a polynomial.

If one considers only real numbers, even simple equations like $P(z) = z^2 + 1 = 0$ have no solutions. In the field of complex numbers, however, (1) always has at least one solution if $P$ is a nonconstant polynomial with complex coefficients. This fact is known as the fundamental theorem of algebra. It was first proved rigorously by C.F. Gauss in 1799. Since then a large number of proofs have been found. In this article I give some examples of the main types of proofs: analytic, topological and algebraic.


Analytic Proofs 


Possibly the simplest proof is based on Liouville's theorem [3]: every bounded entire function is constant.

Assume now that the nonconstant polynomial $P(z)$ has no zero. Then $f(z) = 1/P(z)$ is an entire function. Since $|P(z)| \to \infty$ as $|z| \to \infty$, $f$ is bounded outside some disk, and, being continuous, it is also bounded on that disk; hence $f$ is bounded and thus constant by Liouville's theorem. But then $P(z)$ is also constant, a contradiction.

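Another, still simple, proof is based on the argument principle: the number of zeros (counted with multiplicity) of a holomorphic function $f$ inside a simple closed curve $\gamma$ on which $f$ has no zeros can be expressed by the integral

$$\frac{1}{2\pi i} \oint_{\gamma} \frac{f'(z)}{f(z)}\, dz.$$

Let $P(z)$ be a polynomial of degree $n \ge 1$. Choosing for $\gamma$ the circle around the origin with radius $R > 0$, on which $P$ has no zeros, we obtain for the number $N$ of zeros of $P(z)$ inside $\gamma$:

$$N = \frac{1}{2\pi i} \oint_{|z| = R} \frac{P'(z)}{P(z)}\, dz.$$

Since

$$\frac{P'(z)}{P(z)} = \frac{n}{z} + O\!\left(\frac{1}{|z|^2}\right), \qquad |z| \to \infty,$$

the integral of the first term equals $n$, while that of the second is $O(1/R)$. As $N$ is an integer, we obtain $N = n$ for $R > R_0$. Thus $P(z)$ has $n$ zeros (counted with multiplicity).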
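The argument principle can also be checked numerically. On $z = Re^{i\theta}$ we have $dz = iz\, d\theta$, so $N = \frac{1}{2\pi} \int_0^{2\pi} z\, \frac{P'(z)}{P(z)}\, d\theta$, the mean of $z P'(z)/P(z)$ over the circle. The NumPy sketch below approximates that mean with the trapezoidal rule, which is very accurate for smooth periodic integrands. It assumes $P$ has no zero on the circle; the function name and the sample count are our own choices.

```python
import numpy as np


def count_zeros(coeffs, R, samples=4096):
    """Approximate the number of zeros of P inside the circle |z| = R.

    coeffs: coefficients of P, highest degree first (numpy.poly1d order).
    Returns a complex number close to an integer.
    """
    p = np.poly1d(coeffs)
    dp = p.deriv()
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    z = R * np.exp(1j * theta)
    # (1/2πi) ∮ P'/P dz = mean over theta of z P'(z)/P(z) (trapezoidal rule)
    return np.mean(z * dp(z) / p(z))
```

For $P(z) = z^3 - 2z + 5$ this gives approximately 3 for $R = 10$, in line with $N = n$ for large $R$, and approximately 2 for $R = 2$, because the real zero near $-2.09$ lies outside that circle.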


Topological Proofs 


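Closely related to the second analytic proof presented above is the proof by the concept of homotopy [2].

If $X$ and $Y$ are two topological spaces, then two continuous maps $\varphi_0, \varphi_1: X \to Y$ are called homotopic if there exists a continuous map

$$\varphi: X \times [0, 1] \to Y$$

such that

$$\varphi(x, 0) = \varphi_0(x), \qquad \varphi(x, 1) = \varphi_1(x).$$

We choose $X = \{z \in \mathbb{C} : |z| = 1\}$, $Y = \mathbb{C} \setminus \{0\}$. Let

$$P(z) = z^n + a_{n-1} z^{n-1} + \dots + a_0 = z^n + Q(z), \qquad n \ge 1,\ a_0 \ne 0$$

(if $a_0 = 0$, then $z = 0$ is already a zero). For sufficiently large $R$ the two maps

$$\varphi_0: X \to Y,\quad z \mapsto (Rz)^n, \qquad \varphi_1: X \to Y,\quad z \mapsto P(Rz),$$

are homotopic. A homotopy is in fact given by $\varphi(z, t) = (Rz)^n + t\, Q(Rz)$, $z \in X$, $t \in [0, 1]$; it takes values in $Y$ because $|Q(Rz)| < R^n = |(Rz)^n|$ once $R$ is large enough. If there is no zero of $P(z)$ inside the circle $|z| = R$, then $(z, t) \mapsto P(tRz)$ is a homotopy in $Y$ between the constant map

$$\varphi_c: X \to Y, \quad z \mapsto a_0,$$

and $\varphi_1$, so $\varphi_0$ would also be homotopic to $\varphi_c$. This can be shown to be false by topological means: $z \mapsto (Rz)^n$ winds $n \ne 0$ times around the origin, whereas a constant map has winding number 0.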


Algebraic Proofs 


Since there is no purely algebraic system of axioms for the field of complex numbers, there cannot be a purely algebraic proof. However, there is a proof which uses the intermediate value theorem as its only result from analysis [1]; we reproduce it here.

A statement equivalent to the fundamental theorem is that $\mathbb{C}$ is the algebraic closure of $\mathbb{R}$. We start by showing that every nonconstant polynomial $P(z)$ with real coefficients has a complex zero. We proceed by induction.


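Let $n$ be the degree of $P(z)$, which we may assume to be monic, and write $n = 2^t u$ with $u$ odd and $t \ge 0$. The induction is on $t$.

i) If $t = 0$, i.e., $n$ is odd, the claim is an immediate consequence of the intermediate value theorem.

ii) Let $t > 0$, and assume the claim has been proven for $t - 1$.
We select a splitting field $S$ for $P(z)$ over $\mathbb{C}$. Then we have a decomposition

$$P(z) = (z - \alpha_1) \cdots (z - \alpha_n) \quad \text{in } S[z].$$

For an arbitrary real number $c$ we form the expressions $b_{ij}(c) = \alpha_i \alpha_j + c(\alpha_i + \alpha_j)$ and the polynomial $Q(z) = \prod_{1 \le i < j \le n} (z - b_{ij}(c))$. The coefficients of $Q(z)$ are symmetric polynomials in $\alpha_1, \dots, \alpha_n$ over $\mathbb{R}$, hence polynomials in the real coefficients of $P(z)$, and thus real. The degree of $Q(z)$ is $n(n-1)/2 = 2^{t-1} u (2^t u - 1) = 2^{t-1} v$ for an odd number $v$. By the induction hypothesis $Q(z)$ has at least one zero in $\mathbb{C}$. Thus $b_{ij}(c)$ is in $\mathbb{C}$ for a pair of subscripts $(i, j)$ that may depend on $c$. If this construction is carried out for all natural numbers $c$ with $1 \le c \le 1 + n(n-1)/2$, then, since there are only $n(n-1)/2$ pairs, one finds $c \ne c'$ belonging to the same pair of subscripts, i.e., there is a pair $(i, j)$ with $b_{ij}(c) \in \mathbb{C}$ and $b_{ij}(c') \in \mathbb{C}$. If one solves the system of equations

$$b_{ij}(c) = \alpha_i \alpha_j + c(\alpha_i + \alpha_j), \qquad b_{ij}(c') = \alpha_i \alpha_j + c'(\alpha_i + \alpha_j),$$

one obtains $a = \alpha_i + \alpha_j \in \mathbb{C}$ and $b = \alpha_i \alpha_j \in \mathbb{C}$, hence $\alpha_i = a/2 \pm \sqrt{a^2 - 4b}/2 \in \mathbb{C}$. Thus $P(z)$ has a complex zero.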

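Let now $P(z) \in \mathbb{C}[z]$ be irreducible and $\tau$ a zero of $P(z)$ in a splitting field of $P(z)$ over $\mathbb{C}$. Then $P(z)$ is, up to a constant factor, the irreducible polynomial of $\tau$ over $\mathbb{C}$. Since $\tau$ is algebraic over $\mathbb{C}$ and $\mathbb{C}$ is algebraic over $\mathbb{R}$, $\tau$ is algebraic over $\mathbb{R}$. We denote the irreducible polynomial of $\tau$ over $\mathbb{R}$ by $U(z)$. Then $P(z)$ divides $U(z)$ in $\mathbb{C}[z]$.

By the first part, $U(z)$ has at least one zero in $\mathbb{C}$. Since $\mathbb{C}$ is normal over $\mathbb{R}$, $U(z)$ splits into linear factors in $\mathbb{C}[z]$. As the irreducible factor $P(z)$ divides $U(z)$, $P(z)$ is linear and $\tau \in \mathbb{C}$.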


See also 


> Grobner Bases for Polynomial Equations 
Fuzzy multi-objective linear programming (FMOLP) extends the linear programming (LP) model in two important aspects:

• multiple objective functions representing different points of view (criteria) used for evaluation of feasible solutions,

• uncertainty inherent to information used in the modeling and solving stage.

A general model of the FMOLP problem can be presented as the following system:

$$[\tilde c_1 x, \dots, \tilde c_k x] \rightarrow \widetilde{\min} \qquad (1)$$

such that

$$\tilde a_i x \;\tilde{\le}\; \tilde b_i, \quad i = 1, \dots, m, \qquad (2)$$

$$x \ge 0, \qquad (3)$$

where $\tilde c_l = [\tilde c_{l1}, \dots, \tilde c_{ln}]$ ($l = 1, \dots, k$), $x = [x_1, \dots, x_n]^T$, $\tilde a_i = [\tilde a_{i1}, \dots, \tilde a_{in}]$ ($i = 1, \dots, m$). The coefficients with the tilde (wave) sign are, in general, fuzzy numbers, i.e., convex continuous fuzzy subsets of the real line. The tilde over min and over the relation $\le$ 'fuzzifies' their meaning. Conditions (2) and (3) define a set of feasible solutions (decisions) $X$. Additional information completing (1) is a set of fuzzy aspiration levels on particular objectives, thought of as goals, denoted by $\tilde g_1, \dots, \tilde g_k$.

There are three important special cases of the above problem that gave birth to the following classes of problems:

• flexible programming;

• multi-objective linear programming (MOLP) with fuzzy coefficients;

• flexible MOLP with fuzzy coefficients.

In flexible programming, coefficients are crisp but there is a fuzzified relation $\tilde\le$ between objective functions and goals, and between left- and right-hand sides of the constraints. This means that the goals and constraints are fuzzy ('soft') and the key question is the degree of satisfaction. In MOLP with fuzzy coefficients all the coefficients are, in general, fuzzy numbers and the key question is a representation of the relation $\tilde\le$ between fuzzy left- and right-hand sides of the constraints. Flexible MOLP with fuzzy coefficients concerns the most general form (1)-(3) and combines the two key questions of the previous problems.

The first two classes of FMOLP problems use different semantics of fuzzy sets, while the third class combines the two semantics. In flexible programming, fuzzy sets are used to express preferences concerning satisfaction of flexible constraints and/or attainment of goals. This semantics is especially important for exploiting information in decision making. The gradedness introduced by fuzzy sets refines the simple binary distinction made by ordinary constraints. It also refines the crisp specification of goals and 'all-or-nothing' decisions. Constraint satisfaction algorithms, optimization techniques and multicriteria decision analysis typically involve flexible requirements which can be represented by fuzzy relations.

In MOLP with fuzzy coefficients, the semantics of fuzzy sets is related to the representation of incomplete or vague states of information in the form of possibility distributions. This view of fuzzy sets enables representation of imprecise or uncertain information in mathematical models of decision problems considered in operations research. In models formulated in terms of mathematical programming, the imprecision and uncertainty of information (data) is taken into account through the use of fuzzy numbers or fuzzy intervals instead of crisp coefficients. It involves fuzzy arithmetic and other mathematical operations on fuzzy numbers that are defined with respect to Zadeh's well-known extension principle.

In flexible MOLP with fuzzy coefficients, the uncer- 
tainty and the preference semantics are encountered to- 
gether. This is typical for decision analysis and opera- 
tions research where, in order to deal with both uncer- 
tain data and flexible requirements, one can use a fuzzy 
set representation. 

Below, we make a tutorial characterization of 
the three classes of problems and solution meth- 
ods. For more detailed surveys see, e.g., [16,18,20,27, 
30,32,36,37]. 


Flexible Programming 


Flexible programming was considered for the first time in [41] with respect to single-objective linear programming. It is based on the general Bellman-Zadeh principle [2], which defines the concept of fuzzy decision as an intersection of fuzzy goals and fuzzy constraints. A fuzzy goal corresponding to objective $c_l x$ is defined as a fuzzy set in $X$; its membership function $\mu_l: X \to [0,1]$ characterizes the decision maker's aspiration of making $c_l x$ 'essentially smaller or equal to $g_l$'. A fuzzy constraint corresponding to $a_i x \,\tilde\le\, b_i$ is also defined as a fuzzy set in $X$; its membership function $\mu_i: X \to [0,1]$ characterizes the degree of satisfaction of the $i$th constraint.

In order to define the membership function $\mu_i(x)$ for the $i$th fuzzy constraint, one has to know the tolerance margin $d_i > 0$ for the right-hand side $b_i$ ($i = 1,\ldots,m$):

$$\mu_i(x) = \begin{cases} 1 & \text{for } a_i x \le b_i,\\ \text{strictly decreasing from 1 to 0} & \text{for } b_i < a_i x \le b_i + d_i,\\ 0 & \text{for } a_i x > b_i + d_i. \end{cases} \qquad (4)$$
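A common concrete choice for the decreasing part of (4) is a straight line from 1 at $b_i$ to 0 at $b_i + d_i$. The Python sketch below assumes that linear form; the function name and the sample numbers are illustrative only.

```python
def constraint_membership(lhs: float, b: float, d: float) -> float:
    """Degree to which a_i x = lhs satisfies the fuzzy constraint a_i x <~ b_i.

    Assumes the linear variant of (4): 1 up to b, falling linearly to 0 at b + d.
    """
    if d <= 0:
        raise ValueError("tolerance margin d must be positive")
    if lhs <= b:
        return 1.0  # crisp constraint satisfied
    if lhs >= b + d:
        return 0.0  # violated beyond the tolerance margin
    return 1.0 - (lhs - b) / d


# Hypothetical constraint with b_i = 10 and tolerance d_i = 4.
print(constraint_membership(9.0, 10.0, 4.0))   # 1.0
print(constraint_membership(11.0, 10.0, 4.0))  # 0.75
print(constraint_membership(15.0, 10.0, 4.0))  # 0.0
```

The same function also serves for a fuzzy goal of type (6) by passing $c_l x$, $g_l$ and $d_l$.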


Specifying a membership level $\alpha \in [0,1]$, in [41] the set of feasible solutions of each fuzzy constraint has been restricted to the crisp set

$$X_\alpha^i = \{x : \mu_i(x) \ge \alpha\}, \quad i = 1,\ldots,m.$$

Then, the set of feasible solutions of a flexible programming problem is $X_\alpha = \bigcap_{i=1}^{m} X_\alpha^i$. The single objective function is replaced by the fuzzy goal

$$\mu_G(x) = \frac{\min_{y \in X} c\,y}{c\,x}.$$

To get an optimal solution one has to determine the optimal pair $(\alpha^*, x^*)$ such that

$$\min\{\alpha^*, \mu_G(x^*)\} = \sup_{\alpha \in [0,1]} \min\Big\{\alpha,\ \max_{x \in X_\alpha} \mu_G(x)\Big\}. \qquad (5)$$

If the optimal $\alpha^*$ were determined a priori, problem (5) could be reduced to a crisp mathematical programming problem whose objective is to find $x^*$ that maximizes $\mu_G(x)$ on the set $X_{\alpha^*}$. In the general case an iterative algorithm is necessary: beginning with any $\alpha_1 \in [0,1]$, the values $\alpha_t$ and $\max_{x \in X_{\alpha_t}} \mu_G(x)$ converge to the optimum step by step.

H.-J. Zimmermann [46] has proposed a more integrative approach to flexible programming, allowing consideration of multiple goals and constraints on a common ground. An aspiration level $g_l$ and a tolerance margin $d_l > 0$ have to be assumed for the $l$th goal ($l = 1,\ldots,k$) when assessing the membership function $\mu_l(x)$ as:

$$\mu_l(x) = \begin{cases} 1 & \text{for } c_l x \le g_l,\\ \text{strictly decreasing from 1 to 0} & \text{for } g_l < c_l x \le g_l + d_l,\\ 0 & \text{for } c_l x > g_l + d_l. \end{cases} \qquad (6)$$


According to the Bellman-Zadeh principle, the set of fuzzy decisions is characterized by an aggregation of the component membership functions. If a conjunctive minimum operator were used for the aggregation, the membership function would be:

$$\mu_D(x) = \min\{\mu_l(x),\ \mu_i(x) : l = 1,\ldots,k;\ i = 1,\ldots,m\}. \qquad (7)$$

Then, the problem of finding the best decision (solution) boils down to the following optimization problem:

$$\mu_D(x) \rightarrow \max \quad \text{s.t. } x \ge 0. \qquad (8)$$


The value of the aggregated function $\mu_D(x)$ can be interpreted as the overall degree of satisfaction of the decision maker with $k$ fuzzy goals and $m$ fuzzy constraints.
In case of the minimum operator (7), problem (8) becomes:

$$\begin{aligned} & v \rightarrow \max\\ \text{s.t. } & v \le \mu_l(x), && l = 1,\ldots,k,\\ & v \le \mu_i(x), && i = 1,\ldots,m,\\ & x \ge 0. \end{aligned} \qquad (9)$$


In [46,47], Zimmermann applied linear membership functions (4), (6) in problem (9), thus obtaining an ordinary LP problem. He also proposed to use the product operator instead of the minimum; then, however, (8) becomes nonlinear even if linear membership functions are used. A comprehensive review of various propositions for modeling the function $\mu_D(x)$ can be found in [39,48].
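With linear memberships, problem (9) is an ordinary LP and would normally be handed to an LP solver. To keep the sketch dependency-free, the Python code below instead assumes a single decision variable $x$ and bisects on $v$: the set $\{x \ge 0 : \text{every membership} \ge v\}$ can only shrink as $v$ grows, so the largest $v$ with a nonempty set solves (9). Each row `(coef, target, tol)` is a hypothetical linear membership for $coef \cdot x \,\tilde\le\, target$ with tolerance `tol`, covering both fuzzy goals (6) and fuzzy constraints (4); all names and data are illustrative.

```python
def feasible_interval(rows, v):
    """Return (lo, hi) bounding {x >= 0 : every membership >= v}, or None if empty.

    Each row (coef, target, tol) is a linear membership for coef * x <~ target:
    for 0 < v <= 1, membership >= v  <=>  coef * x <= target + (1 - v) * tol.
    """
    lo, hi = 0.0, float("inf")
    for coef, target, tol in rows:
        rhs = target + (1.0 - v) * tol
        if coef > 0:
            hi = min(hi, rhs / coef)
        elif coef < 0:
            lo = max(lo, rhs / coef)  # dividing by a negative coef flips the inequality
        elif rhs < 0:
            return None  # 0 <= rhs fails for every x
    return (lo, hi) if lo <= hi else None


def max_min_decision(rows, eps=1e-9):
    """Approximate (v*, x*) of problem (9) for one decision variable by bisection on v."""
    if feasible_interval(rows, 0.0) is None:
        return 0.0, 0.0  # no positive satisfaction level is attainable
    lo_v, hi_v = 0.0, 1.0
    if feasible_interval(rows, hi_v) is not None:
        lo_v = hi_v  # every goal and constraint can be fully satisfied
    while hi_v - lo_v > eps:
        mid = 0.5 * (lo_v + hi_v)
        if feasible_interval(rows, mid) is not None:
            lo_v = mid
        else:
            hi_v = mid
    return lo_v, feasible_interval(rows, lo_v)[0]


rows = [
    (-3.0, -12.0, 6.0),  # goal: profit 3x of about 12 or more, written as -3x <~ -12
    (2.0, 6.0, 4.0),     # constraint: resource use 2x <~ 6, tolerance 4
]
print(max_min_decision(rows))  # (0.75, 3.5)
```

In this toy instance the goal membership rises and the constraint membership falls with $x$, and the max-min compromise sits where the two lines cross.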

Knowing the membership functions $\mu_l(x)$ ($l = 1,\ldots,k$) for fuzzy goals, one can define a Pareto optimal solution in the space of membership values, called an M-Pareto optimal solution [32]. Some other refinements of Zimmermann's approach have been proposed in [1,11].


Definition 1 A solution $x^*$ is said to be M-Pareto optimal if and only if there does not exist another $x \in X$ such that $\mu_l(x) \ge \mu_l(x^*)$, $l = 1,\ldots,k$, with strict inequality holding for at least one $l$.
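The dominance test in Definition 1 can be sketched in Python as follows. The sketch assumes a finite list of candidate solutions, each already summarized by its vector $(\mu_1(x), \ldots, \mu_k(x))$; screening such a list only approximates M-Pareto optimality over the whole feasible set $X$. The function names are hypothetical.

```python
from typing import Sequence


def m_dominates(mu_x: Sequence[float], mu_star: Sequence[float]) -> bool:
    """True if membership vector mu_x dominates mu_star in the sense of Definition 1."""
    if len(mu_x) != len(mu_star):
        raise ValueError("membership vectors must have the same length k")
    at_least_as_good = all(a >= b for a, b in zip(mu_x, mu_star))
    strictly_better = any(a > b for a, b in zip(mu_x, mu_star))
    return at_least_as_good and strictly_better


def m_pareto_filter(candidates):
    """Keep the candidates whose membership vectors no other candidate dominates."""
    return [c for c in candidates if not any(m_dominates(o, c) for o in candidates)]


cands = [(0.8, 0.4), (0.6, 0.6), (0.5, 0.5), (0.8, 0.3)]
print(m_pareto_filter(cands))  # [(0.8, 0.4), (0.6, 0.6)]
```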


The concept of M-Pareto optimal solutions was at the origin of several interactive methods proposed for flexible programming (see [30,32]). In these methods, the decision maker determines membership functions for fuzzy goals and then specifies reference levels for the membership functions, denoted by $\bar\mu_l$ ($l = 1,\ldots,k$). Assuming some minimum levels for membership functions of fuzzy constraints, denoted by $t_i$ ($i = 1,\ldots,m$), one gets the following optimization problem:

$$\begin{aligned} & \max_{l = 1,\ldots,k}\{\bar\mu_l - \mu_l(x)\} \rightarrow \min\\ \text{s.t. } & \mu_i(x) \ge t_i, && i = 1,\ldots,m,\\ & x \ge 0, \end{aligned}$$

which is equivalent to

$$\begin{aligned} & v \rightarrow \min\\ \text{s.t. } & v \ge \bar\mu_l - \mu_l(x), && l = 1,\ldots,k,\\ & \mu_i(x) \ge t_i, && i = 1,\ldots,m,\\ & x \ge 0. \end{aligned} \qquad (10)$$

Again, problem (10) becomes an ordinary LP problem when all membership functions are linear. This approach is interactive in the sense that the reference levels can be changed from one iteration to another, as well as the membership functions of fuzzy goals.


MOLP with Fuzzy Coefficients 


All fuzzy coefficients of the FMOLP problem are given in a convenient form of L-R fuzzy numbers [13]. An L-R (flat) fuzzy number $\tilde a = (a^L, a^R, \alpha^L, \alpha^R)_{LR}$ is defined by the membership function:

$$\mu_{\tilde a}(r) = \begin{cases} L\big((a^L - r)/\alpha^L\big) & \text{for } r < a^L,\\ 1 & \text{for } a^L \le r \le a^R,\\ R\big((r - a^R)/\alpha^R\big) & \text{for } r > a^R, \end{cases}$$

where $L$ and $R$ are symmetric bell-shaped reference functions which are strictly decreasing in $[0,1]$ and such that $L(0) = R(0) = 1$, $L(1) = R(1) = 0$; $[a^L, a^R]$ is an interval of the most possible values, and $\alpha^L$ and $\alpha^R$ are nonnegative left and right 'spreads' of $\tilde a$, respectively.

Experience indicates that an expert can describe the precise form of a fuzzy number only rarely. Therefore, as a practical way of getting suitable membership functions of fuzzy coefficients, H. Rommelfanger [26] has proposed that the expert begins with the specification of some prominent membership levels $\alpha$ and associates them with special meanings. After that, the expert is expected to specify values which belong to the selected membership levels:


$\alpha = 1$: $\mu_{\tilde a}(r) = 1$ means that value $r$ certainly belongs to the set of possible values;

$\alpha = \lambda$: $\mu_{\tilde a}(r) \ge \lambda$ means that the expert estimates that value $r$ with $\mu_{\tilde a}(r) \ge \lambda$ has a good chance of belonging to the set of possible values;

$\alpha = \varepsilon$: $\mu_{\tilde a}(r) < \varepsilon$ means that value $r$ with $\mu_{\tilde a}(r) < \varepsilon$ has only a very small chance of belonging to the set of possible values, i.e. the expert is willing to neglect the corresponding values of $r$ with $\mu_{\tilde a}(r) < \varepsilon$.

For example, it is reasonable to assume that $\lambda = 0.6$, $\varepsilon = 0.1$.
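To make the L-R definition and the prominent levels concrete, the Python sketch below assumes linear reference functions $L(u) = R(u) = \max(0, 1 - u)$, so that $L^{-1}(\alpha) = R^{-1}(\alpha) = 1 - \alpha$. It evaluates $\mu_{\tilde a}(r)$ and returns the interval of values whose membership is at least a given level. The class name `FlatLR` and the sample coefficient are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatLR:
    """Flat L-R fuzzy number (aL, aR, alphaL, alphaR) with linear reference functions.

    Assumption for this sketch: L(u) = R(u) = max(0, 1 - u), so both sides are linear.
    """

    aL: float
    aR: float
    alphaL: float
    alphaR: float

    def membership(self, r: float) -> float:
        if self.aL <= r <= self.aR:
            return 1.0  # interval of the most possible values
        if r < self.aL:
            if self.alphaL == 0:
                return 0.0  # no left spread: crisp left boundary
            return max(0.0, 1.0 - (self.aL - r) / self.alphaL)
        if self.alphaR == 0:
            return 0.0  # no right spread: crisp right boundary
        return max(0.0, 1.0 - (r - self.aR) / self.alphaR)

    def level_interval(self, level: float) -> tuple:
        """Values r with membership(r) >= level, for 0 < level <= 1."""
        if not 0.0 < level <= 1.0:
            raise ValueError("level must lie in (0, 1]")
        width = 1.0 - level  # L^{-1}(level) = R^{-1}(level) for the linear shape
        return (self.aL - self.alphaL * width, self.aR + self.alphaR * width)


# Hypothetical coefficient "about 10 to 12" read at levels 1, lambda = 0.6, epsilon = 0.1.
a = FlatLR(10.0, 12.0, 4.0, 2.0)
for level in (1.0, 0.6, 0.1):
    print(level, a.level_interval(level))
```

Values outside the $\varepsilon$-level interval are those the expert is willing to neglect.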

For the sake of clarity, let us assume that the reference functions of all fuzzy coefficients are of two kinds only: $L$ and $R$. It should be specified, moreover, that all arithmetic operations on fuzzy numbers taking place in (1), (2) are extended operations in the sense of Zadeh's extension principle [45]:

$$\mu_{\tilde a * \tilde b}(r) = \sup_{r = y * z} T\big(\mu_{\tilde a}(y), \mu_{\tilde b}(z)\big), \quad r \in \mathbb{R}, \qquad (11)$$

where $*$ is a real operation $*: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ and $T: [0,1] \times [0,1] \to [0,1]$ is any given t-norm.
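The extension principle (11) can be illustrated on fuzzy sets with finite support, where the supremum over all pairs $(y, z)$ with $r = y * z$ becomes a maximum. The Python sketch below is such a discretized illustration (real fuzzy numbers are continuous); the helper name `extend` and the sample sets are hypothetical, and the t-norm defaults to the minimum.

```python
import operator
from itertools import product


def extend(op, mu_a, mu_b, tnorm=min):
    """Discrete version of Zadeh's extension principle (11).

    mu_a, mu_b: dicts {support point: membership degree} with finite support.
    Each result value r = op(y, z) gets the largest tnorm(mu_a[y], mu_b[z]) over all
    pairs (y, z) that produce r; with finitely many pairs the sup is a max.
    """
    result = {}
    for (y, m_y), (z, m_z) in product(mu_a.items(), mu_b.items()):
        r = op(y, z)
        result[r] = max(result.get(r, 0.0), tnorm(m_y, m_z))
    return result


about_2 = {1: 0.5, 2: 1.0, 3: 0.5}
about_3 = {2: 0.5, 3: 1.0, 4: 0.5}
print(extend(operator.add, about_2, about_3))
# {3: 0.5, 4: 0.5, 5: 1.0, 6: 0.5, 7: 0.5}
```

Passing `tnorm=lambda s, t: s * t` would use the product t-norm instead of the minimum.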

For any $x \ge 0$, the left-hand side of the $i$th constraint and the value of the $l$th objective function can be summarized as the following fuzzy numbers:

$$\tilde a_i x = (a_i^L x,\ a_i^R x,\ \alpha_i^L x,\ \alpha_i^R x)_{LR}, \quad i = 1,\ldots,m,$$

$$\tilde c_l x = (c_l^L x,\ c_l^R x,\ \gamma_l^L x,\ \gamma_l^R x)_{LR}, \quad l = 1,\ldots,k.$$

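In the literature the min t-norm is generally applied. Then,

$$a_i^L x = \sum_{j=1}^{n} a_{ij}^L x_j, \qquad c_l^L x = \sum_{j=1}^{n} c_{lj}^L x_j, \qquad (12)$$

$$a_i^R x = \sum_{j=1}^{n} a_{ij}^R x_j, \qquad c_l^R x = \sum_{j=1}^{n} c_{lj}^R x_j, \qquad (13)$$

$$\alpha_i^L x = \sum_{j=1}^{n} \alpha_{ij}^L x_j, \qquad \gamma_l^L x = \sum_{j=1}^{n} \gamma_{lj}^L x_j, \qquad (14)$$

$$\alpha_i^R x = \sum_{j=1}^{n} \alpha_{ij}^R x_j, \qquad \gamma_l^R x = \sum_{j=1}^{n} \gamma_{lj}^R x_j. \qquad (15)$$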
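Formulas (12)-(15) say that, under the min t-norm and for $x \ge 0$, each of the four L-R parameters of $\tilde a_i x$ is a plain weighted sum of the coefficients' parameters. A minimal Python sketch (function name and data hypothetical):

```python
def lr_linear_combination(coeffs, x):
    """L-R parameters of a~ x = sum_j a~_j x_j under the min t-norm, eqs. (12)-(15).

    coeffs: list of flat L-R coefficients (aL_j, aR_j, alphaL_j, alphaR_j).
    x: nonnegative decision values; each parameter is a weighted sum.
    """
    if len(coeffs) != len(x):
        raise ValueError("one coefficient per variable is required")
    if any(xj < 0 for xj in x):
        raise ValueError("(12)-(15) assume x >= 0")
    return tuple(sum(c[p] * xj for c, xj in zip(coeffs, x)) for p in range(4))


coeffs = [(2, 3, 1, 1), (1, 1, 0.5, 0.5)]
print(lr_linear_combination(coeffs, [2, 4]))  # (8, 10, 4.0, 4.0)
```

The spreads (last two entries) grow linearly with the decision values, which motivates the less cumulative aggregation discussed next.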


Obviously, the spreads of these fuzzy numbers grow as the number and values of the variables increase. The simple addition of the spreads of fuzzy coefficients corresponds to the assumption that their uncertainty comes from independent sources. This is not realistic in many practical situations. For getting a more realistic extended addition of the left-hand sides of fuzzy constraints and of fuzzy objectives, Rommelfanger and T. Keresztfalvi [29] recommend the use of Yager's parameterized t-norm:

$$T_p(t_1, \ldots, t_s) = \max\Big\{0,\ 1 - \Big[\sum_{r=1}^{s} (1 - t_r)^p\Big]^{1/p}\Big\}, \quad t_1, \ldots, t_s \in [0,1],\ p > 0. \qquad (16)$$

Then, $a_i^L x$, $a_i^R x$, $c_l^L x$, $c_l^R x$ are calculated according to (12) and (13); however, the spreads $\alpha_i^L x$, $\alpha_i^R x$, $\gamma_l^L x$, $\gamma_l^R x$ are calculated according to a new, less cumulative formula, e.g.

$$\alpha_i^L x = \Big[\sum_{j=1}^{n} (\alpha_{ij}^L x_j)^q\Big]^{1/q}$$

and analogously for the other spreads, where $q = p/(p-1) \ge 1$.
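The less cumulative spread formula is a $q$-norm of the individual contributions $\alpha_{ij} x_j$. The Python sketch below computes it and contrasts it with plain addition. It restricts $p$ to $p \ge 1$ (so that $q \ge 1$) and uses two limits that follow from the formulas themselves: $p \to \infty$ gives $q \to 1$, i.e. the simple sums (14)-(15), and $q \to \infty$ gives the largest single contribution. Function names and data are hypothetical.

```python
import math


def q_from_p(p: float) -> float:
    """Exponent q = p / (p - 1) for Yager's parameter p >= 1 (limits included)."""
    if p < 1:
        raise ValueError("this sketch only handles p >= 1, so that q >= 1")
    if math.isinf(p):
        return 1.0  # p -> inf: Yager's t-norm tends to min, spreads simply add
    return math.inf if p == 1 else p / (p - 1.0)


def aggregated_spread(spreads, x, q):
    """Spread of sum_j a~_j x_j computed as [sum_j (s_j x_j)^q]^(1/q)."""
    terms = [s * xj for s, xj in zip(spreads, x)]
    if math.isinf(q):
        return max(terms, default=0.0)  # limit of the q-norm for q -> inf
    if q < 1:
        raise ValueError("q must be >= 1")
    return sum(t ** q for t in terms) ** (1.0 / q)


spreads, x = [1.0, 1.0, 1.0], [3.0, 4.0, 0.0]
print(aggregated_spread(spreads, x, q_from_p(math.inf)))  # 7.0
print(aggregated_spread(spreads, x, q_from_p(2)))         # 5.0
print(aggregated_spread(spreads, x, math.inf))            # 4.0
```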

Coming back to the MOLP problem with fuzzy coefficients, we have to answer the question of how to interpret the relation between the fuzzy left- and right-hand sides of the constraints. If constraints (2) were transformed to equality constraints (by addition of slack variables on the left), then the equality relation could be interpreted in terms of weak inclusion of fuzzy sets [12,21]:

$$\tilde a_i x \subseteq \tilde b_i, \quad i = 1,\ldots,m. \qquad (17)$$

It says that the region of possible values of the left-hand side should be contained in the tolerance region of the right-hand side. The LP problem with constraints (17) is called a robust programming problem.

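Each constraint (17) is then reduced to four deterministic constraints:

$$a_i^L x \ge b_i^L, \qquad a_i^R x \le b_i^R,$$

$$a_i^L x - \alpha_i^L x \ge b_i^L - \beta_i^L, \qquad a_i^R x + \alpha_i^R x \le b_i^R + \beta_i^R, \qquad (18)$$

for $i = 1,\ldots,m$, where $\tilde b_i = (b_i^L, b_i^R, \beta_i^L, \beta_i^R)_{LR}$, $i = 1,\ldots,m$.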
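The four deterministic inequalities above require that both the interval of most possible values and the support of $\tilde a_i x$ lie inside those of $\tilde b_i$. A Python check for one constraint might look as follows; it assumes both numbers are flat L-R numbers given as 4-tuples and sharing the same reference functions, and the function name and data are hypothetical.

```python
def robust_constraint_holds(lhs, rhs, tol=1e-12):
    """Check the four deterministic conditions replacing a~_i x within b~_i.

    lhs = (aL x, aR x, alphaL x, alphaR x) and rhs = (bL, bR, betaL, betaR).
    """
    aL, aR, alL, alR = lhs
    bL, bR, beL, beR = rhs
    return (
        aL >= bL - tol                   # most possible values: left end inside
        and aR <= bR + tol               # most possible values: right end inside
        and aL - alL >= bL - beL - tol   # support: left end inside
        and aR + alR <= bR + beR + tol   # support: right end inside
    )


b_i = (10.0, 12.0, 3.0, 3.0)
print(robust_constraint_holds((10.5, 11.5, 2.0, 2.0), b_i))  # True
print(robust_constraint_holds((10.5, 11.5, 4.0, 2.0), b_i))  # False
```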

In order to transform fuzzy objectives into deterministic equivalents, one can consider a 'middle' value of $\tilde c_l x$ at some level $\xi \in [0,1]$, $l = 1,\ldots,k$. The 'middle' can be understood [8] as a weighted combination of the most possible values $c_l^L x$ and $c_l^R x$ and of the smallest and the greatest (extreme) values at possibility level $\xi$. Thus, the objectives (1) become:

$$[z_1(x), \ldots, z_k(x)] \rightarrow \min, \qquad (19)$$

where $z_l(x) = w_1 c_l^L x - w_2 \gamma_l^L x\, L^{-1}(\xi) + w_3 c_l^R x + w_4 \gamma_l^R x\, R^{-1}(\xi)$, $l = 1,\ldots,k$; $w_1, w_2, w_3, w_4$ are nonnegative weights, e.g. $w_1 = w_3 = 0.3$, $w_2 = w_4 = 0.2$. The deterministic objectives (19) are linear even if the reference functions $L$ and $R$ are nonlinear.
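The deterministic objective $z_l(x)$ can be evaluated directly once $\xi$ is fixed, since $L^{-1}(\xi)$ and $R^{-1}(\xi)$ are then constants, which is why (19) stays linear in $x$. The Python sketch below implements $z_l(x)$ term by term as stated above, assuming linear reference functions so that $L^{-1}(\xi) = R^{-1}(\xi) = 1 - \xi$; the function name and inputs are hypothetical.

```python
def middle_value(cL_x, cR_x, gL_x, gR_x, xi, w=(0.3, 0.2, 0.3, 0.2)):
    """z_l(x) = w1 cL x - w2 gammaL x L^-1(xi) + w3 cR x + w4 gammaR x R^-1(xi).

    cL_x, cR_x: most possible values; gL_x, gR_x: spreads gamma_l^L x, gamma_l^R x.
    Assumes linear L and R on [0, 1], so both inverses equal 1 - xi.
    """
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    w1, w2, w3, w4 = w
    inv = 1.0 - xi
    return w1 * cL_x - w2 * gL_x * inv + w3 * cR_x + w4 * gR_x * inv


print(middle_value(20.0, 24.0, 4.0, 6.0, xi=0.5))  # approximately 13.4
```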

There exist approaches proposing a substitution of each objective by several deterministic objectives corresponding to extreme values of several $\xi$-level sets [9,28].

Finally, let us mention a comparison technique of 
fuzzy numbers, which is based on the compensation 
of area determined by the membership functions of 
two fuzzy numbers being compared. This technique, 
which has been characterized in [17] and [5], and then 
in [31] and [15], can be used directly to transform 
the comparison of fuzzy left- and right-hand side of 
the constraints, and of the fuzzy objectives and fuzzy 
goals into nonparametric deterministic equivalents. Al- 
though this technique seems intuitive, it has a convinc- 
ing theoretical foundation. 

Indeed, the semantics of fuzzy numbers considered in the MOLP problem with fuzzy coefficients is related to the representation of incomplete or vague states of information in the form of possibility distributions. This view of fuzzy numbers is concordant with the Dempster interpretation of fuzzy numbers as imprecise probability distributions [10]. In this perspective, the comparison of two fuzzy numbers can be substituted by the comparison of their mean values, defined consistently with the well-known definition of expectation in probability theory. The idea exploited in [14] relies on the mathematical fact that, with respect to a fuzzy number, the possibility measure corresponds to an upper probability distribution, while the necessity measure corresponds to a lower probability distribution of the corresponding random variable. Then it is reasonable to define the mean value of a fuzzy number as a closed interval whose bounds are the expectations of the upper and lower probability distributions. The comparison of two fuzzy numbers boils down to the comparison of the arithmetic means of these bounds, which is computationally equivalent to the above-mentioned technique based on area compensation, as shown in [15].
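For flat L-R numbers with linear reference functions (trapezoidal numbers), the bounds of the mean-value interval have the closed form $a^L - \alpha^L/2$ and $a^R + \alpha^R/2$, because the area under each linear reference function on $[0,1]$ is $1/2$. The Python sketch below uses this closed form and compares two numbers by the arithmetic means of these bounds; the restriction to linear shapes, the function names and the data are assumptions of the sketch.

```python
def expected_interval(aL, aR, alphaL, alphaR):
    """Mean value [E_*, E^*] of a flat L-R number, assuming linear L and R.

    For L(u) = R(u) = 1 - u the area under each reference function on [0, 1] is 1/2,
    hence E_* = aL - alphaL / 2 and E^* = aR + alphaR / 2.
    """
    return aL - alphaL / 2.0, aR + alphaR / 2.0


def mean_midpoint(number):
    """Arithmetic mean of the two bounds of the mean-value interval."""
    lower, upper = expected_interval(*number)
    return 0.5 * (lower + upper)


def fuzzy_leq(a, b):
    """Rank a <= b by comparing the midpoints of their mean-value intervals."""
    return mean_midpoint(a) <= mean_midpoint(b)


a = (10.0, 12.0, 4.0, 0.0)  # hypothetical left-skewed number
b = (9.0, 11.0, 0.0, 4.0)   # hypothetical right-skewed number
print(expected_interval(*a), expected_interval(*b))  # (8.0, 12.0) (9.0, 13.0)
print(fuzzy_leq(a, b))  # True
```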

As a consequence of applying all these comparison techniques, the MOLP problem with fuzzy coefficients is transformed into an associated deterministic MOLP problem, such as (19), (18), (3) above, which should preferably be solved by one of the existing interactive procedures (see, e.g., [43]).


Flexible MOLP with Fuzzy Coefficients 


This problem combines the two semantics of fuzzy sets considered separately in flexible programming and in MOLP with fuzzy coefficients. This means that, in addition to fuzzy coefficients in the objective functions and on both sides of the constraints, the degree of satisfaction of fuzzy constraints and fuzzy goals is considered in fuzzy set terms.

A crucial question which has to be answered while 
solving a flexible MOLP problem with fuzzy coefficients 
is how to express the minimal conditions on the satis- 
faction of fuzzy constraints in deterministic terms. 

In most existing approaches, the minimal conditions on the satisfaction of fuzzy constraints (2) are expressed by one or two deterministic linear constraints which substitute the original fuzzy constraints. To give an idea of these crisp surrogates, let us present them in common terms, from the most pessimistic to the most optimistic attitude. We assume the following form of the fuzzy left- and right-hand side of the $i$th constraint:

$$\tilde a_i x = (a_i^L x,\ a_i^R x,\ \alpha_i^L x,\ \alpha_i^R x)_{LR}, \qquad \tilde b_i = (b_i, 0, \beta_i)_{LR}.$$

a) (see [3,40])
$$a_i^R x + \alpha_i^R x\, R^{-1}(\rho) \le b_i, \quad \rho \in [0,1];$$

b) (see [22,25,44])
$$a_i^R x \le b_i, \qquad a_i^R x + \alpha_i^R x\, R^{-1}(\varepsilon) \le b_i + \beta_i R^{-1}(\varepsilon), \quad \varepsilon \in [0,1];$$

c) (see [4])
$$a_i^R x + \alpha_i^R x\, R^{-1}(\sigma) \le b_i + \beta_i R^{-1}(\sigma), \quad \sigma \in [0,1];$$

d) (see [8,34,35])
$$a_i^L x - b_i \le \alpha_i^L x\, L^{-1}(\tau) + \beta_i R^{-1}(\tau), \quad \tau \in [0,1] \ \text{(optimistic)},$$
$$a_i^R x + \alpha_i^R x\, R^{-1}(\eta) \le b_i + \beta_i R^{-1}(\eta), \quad \eta \in [0,1] \ \text{(pessimistic)};$$
e) (see [23])


aéx <b; + 5Bi; 
6+¢e€[0,1],5>0, ¢>0, 
atx + (l—e—Sd)akx <b) + (1—©)Bi 
f) (see [33,19])
$$a_i^L x - \alpha_i^L x\, L^{-1}(\alpha) \le b_i + \beta_i R^{-1}(\alpha), \quad \alpha \in [0,1].$$

In all these approaches, the parameters $\alpha$, $\delta$, $\varepsilon$, $\eta$, $\tau$, $\rho$, $\sigma$ can be used by the decision maker to control the degree of satisfaction of fuzzy constraints in an interactive way.

Figure 1 shows results of conditions a)-f) applied 
on a common fuzzy constraint. Although it is the case 
in Fig. 1, the reference functions L and R need not be 
linear in the above conditions. 
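As an illustration of how such surrogates are checked for a given $x$, the Python sketch below tests the two inequalities of condition d) with $\tilde b_i = (b_i, 0, \beta_i)_{LR}$: the optimistic one, $a_i^L x - b_i \le \alpha_i^L x\,L^{-1}(\tau) + \beta_i R^{-1}(\tau)$, and the pessimistic one, $a_i^R x + \alpha_i^R x\,R^{-1}(\eta) \le b_i + \beta_i R^{-1}(\eta)$. It assumes linear reference functions, so $L^{-1}(t) = R^{-1}(t) = 1 - t$; function names and data are hypothetical.

```python
def inverse_linear_reference(t):
    """Inverse of the linear reference function L(u) = R(u) = 1 - u on [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("level must lie in [0, 1]")
    return 1.0 - t


def conditions_d(lhs, b, beta, tau, eta):
    """Check the optimistic and pessimistic inequalities of condition d).

    lhs = (aL x, aR x, alphaL x, alphaR x) for a~_i x; the right-hand side is
    b~_i = (b, 0, beta)_LR. Returns (optimistic_ok, pessimistic_ok).
    """
    aL, aR, alphaL, alphaR = lhs
    inv_tau = inverse_linear_reference(tau)
    inv_eta = inverse_linear_reference(eta)
    optimistic = aL - b <= alphaL * inv_tau + beta * inv_tau
    pessimistic = aR + alphaR * inv_eta <= b + beta * inv_eta
    return optimistic, pessimistic


print(conditions_d((9.0, 10.0, 2.0, 2.0), b=10.0, beta=2.0, tau=0.5, eta=0.5))  # (True, True)
print(conditions_d((9.0, 10.5, 2.0, 2.0), b=10.0, beta=2.0, tau=0.5, eta=0.5))  # (True, False)
```

Raising $\eta$ tightens the pessimistic inequality, while raising $\tau$ tightens the optimistic one, which is how the decision maker steers the safety of a solution.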

Another interpretation of fuzzy constraints has been given in [24]. The $i$th fuzzy constraint is replaced by the pessimistic condition proposed in [34] and by a new objective:

$$a_i^R x + \alpha_i^R x \le b_i + \beta_i, \qquad (20)$$

$$\mu_i(x) \rightarrow \max, \qquad (21)$$

where the membership function $\mu_i(x)$ is defined according to (4). A more detailed discussion of the interpretation of fuzzy constraints can be found in [30].

If fuzzy goals are specified as L-R fuzzy numbers $\tilde g_l = (g_l, 0, \nu_l)_{LR}$ ($l = 1,\ldots,k$), then the satisfying conditions

$$\tilde c_l x \,\tilde\le\, \tilde g_l, \quad l = 1,\ldots,k, \qquad (22)$$

can be treated as additional fuzzy constraints. In accordance with the chosen interpretation of the fuzzy inequality relation, (22) can be substituted by one or two crisp inequalities listed above or by (20) and (21). Another proposal has been made by R. Słowiński in [34,35]; the degree of satisfaction of fuzzy goals is represented there by the levels of intersection of left reference functions of $\tilde c_l x$ with right reference functions of $\tilde g_l$ ($l = 1,\ldots,k$):

(23)


These crisp objectives substitute the fuzzy ones. In the case of linear reference functions $L$, functions (23) become linear fractional:

$$\frac{c_l^L x - g_l}{\gamma_l^L x + \nu_l} \rightarrow \min, \quad l = 1,\ldots,k. \qquad (24)$$
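To see why the ratio in (24) measures goal satisfaction, intersect the linear left side of $\tilde c_l x$, $1 - (c_l^L x - r)/\gamma_l^L x$, with the linear right side of $\tilde g_l = (g_l, 0, \nu_l)_{LR}$, $1 - (r - g_l)/\nu_l$. Solving for $r$ gives the intersection height $1 - (c_l^L x - g_l)/(\gamma_l^L x + \nu_l)$, so minimizing the fraction maximizes that height. The Python sketch below computes the height under these linear-shape assumptions; the function name and numbers are hypothetical.

```python
def goal_satisfaction_level(cL_x, gammaL_x, g, nu):
    """Height at which the left side of c~_l x meets the right side of g~_l.

    Assumes linear reference functions, c~_l x with lower modal value cL_x and left
    spread gammaL_x > 0, and the fuzzy goal g~_l = (g, 0, nu)_LR with nu > 0.
    """
    if cL_x <= g:
        return 1.0  # the modal values already meet the aspiration level
    ratio = (cL_x - g) / (gammaL_x + nu)  # the linear-fractional objective (24)
    return max(0.0, 1.0 - ratio)          # no overlap at all gives level 0


print(goal_satisfaction_level(14.0, 4.0, 12.0, 4.0))  # 0.75
```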
The crisp objectives (24) and the optimistic and pes- 
simistic conditions d) on the satisfaction of fuzzy con- 
straints have been used in the FLIP method presented 
in [8,34,35,39]. They constitute an associate deter- 
ministic multi-objective linear-fractional programming 
(MOLFP) problem. In FLIP, the MOLFP problem is 
solved using an interactive sampling procedure. In each 
calculation step of this procedure, a sample of nondom- 
inated points (Pareto optimal solutions) of the MOLFP 
problem is generated and then shown to the decision 
maker who is asked to select the one that fits best 
his/her preferences. If the selected point is not the final 
compromise, it becomes a central point of a nondomi- 
nated region that is sampled in the next calculation step. 
In this way, the sampled part of the nondominated set is 
successively reduced (focusing phenomenon) until the 
most satisfactory efficient point (compromise solution) 
is reached. An important advantage of the method pre- 
sented above is that the only optimization procedure 
to be used is a linear programming one. Moreover, it 
has a simple scheme and allows retractions to the points 
abandoned in previous iterations. 

The interaction with the decision maker takes place 
at two levels: first when fixing the safety parameters and 
then in the course of the guided generation and evalu- 
ation of the nondominated points of the MOLFP prob- 
lem. 

Note that the fuzzy goals $\tilde g_l$ ($l = 1,\ldots,k$) do not influence the set of nondominated points of the MOLFP problem; they play the role of a visual reference rather than that of preferential information influencing the set of generated proposals for the compromise solution.

An important feature of any software implementing a fuzzy multi-objective programming method is the presentation of candidate solutions in the interactive process. In the FLIP software, the Pareto optimal solutions of the MOLFP problem are shown not only numerically but also graphically, in terms of mutual positions of fuzzy numbers corresponding to original objectives and aspiration levels on the one hand, and to left- and right-hand sides of original constraints on the other hand [6]. In this way, the decision maker gets quite a complete idea of the quality of each proposed solution.

Fuzzy Multi-objective Linear Programming, Figure 1
Results of conditions a)-f) applied on a common fuzzy constraint

The quality is evaluated taking into account the following characteristics:

• scores of fuzzy objectives in relation to the goals;

• dispersion of values of the fuzzy objectives due to uncertainty;

• safety of the solution or, using a complementary term, the risk of violation of the constraints.

So, the definition of the best compromise involves not only the scores on particular objectives but also the safety of the corresponding solution. This is possible thanks to visual interaction, which requires a graphical display of objectives and constraints for any analyzed solution. The comparison of fuzzy left- and right-hand sides of the constraints, as well as the evaluation of dispersion of the values of objectives, is practically infeasible on the basis of numerals only. The graphical presentation of proposed solutions is not only a 'user-friendly' interface but the best way for a complete characterization of these solutions.

There exists an implementation of FLIP in Visual Basic in the MS-Excel environment; it allows a user to define all safety parameters and the parameter $p$ of Yager's t-norm (16) for the aggregation of fuzzy objectives and of fuzzy left-hand sides of fuzzy constraints. The candidates for the best compromise solution are displayed there both numerically and graphically.


Conclusions 


Fuzzy multi-objective linear programming methods 
have often been proposed in view of specific applica- 
tions (see, e. g., [6,18,30,34,39,44]). This means that the 
many proposals described in this article are based on 
different assumptions that are verified in different prac- 
tical situations. The choice of a procedure for an ac- 
tual decision problem should take into account these 
assumptions. In any case, the interactive process should enable the best use of the decision maker's knowledge of the problem. Fuzzy multi-objective linear programming can also be seen as a tool for an interactive robustness analysis of MOLP problems. It gives an insight into the sensitivity of proposed solutions to changes of particular coefficients within some intervals and to changes of preferences as to degrees of satisfaction of the constraints.
Synonyms


Indices 


i = orders
j = product-stock tanks
s = products
k = components
l = component tanks
n = event points


Sets

I = orders
I_j = orders which can be performed in product-stock tank j
I_s = orders which order product s
J = product-stock tanks
J_i = product-stock tanks which are suitable for performing order i
J_s = product-stock tanks which can store product s
N = event points within the time horizon
S = products
S_j = products which can be stored in product-stock tank j
K = components
K_l = components which can be stored in component-stock tank l
L = component-stock tanks
L_k = component-stock tanks which can store component k

Parameters

Vmax(j) = maximum capacity of product-stock tank j
Vmin(j) = minimum amount of product stored in tank j if tank j is utilized
Vinitial(j,s) = amount of product s stored in tank j initially
Vin(l,k) = amount of component k stored in component tank l initially
Vcomp(l) = maximum capacity of component tank l
Recipe(s,k) = proportion of component k in product s
I(i) = lifting rate of order i
Bflow = flow rate of product being produced and transferred to product-stock tanks
Prod_srt(i) = time by which order i can start
Prod_end(i) = time by which order i is due
U1 = lower bound on the amount of product lifted
U2 = upper bound on the amount of product lifted
U3 = upper bound of a small-sized order
U4 = upper bound of a medium-sized order
U5 = lower bound of a large-sized order
… = minimum flow rate of component tanks
… = maximum flow rate of component tanks
H = time horizon

Variables

uv(i,j,n) = binary variables that assign the beginning of order i in tank j at event point n
y(s,j,n) = binary variables that assign product s being stored in tank j at event point n
sv(s,j,n) = binary variables that assign product s being produced and transferred to tank j at event point n
xv(s,n) = 0-1 continuous variables that assign product s being produced at event point n
yv(k,l,n) = binary variables that assign component k being extracted from component-stock tank l at event point n
Ts(i,j,n) = starting time of order i in tank j at event point n
Te(i,j,n) = finishing time of order i in tank j while it starts at event point n
lift(i,j,n) = amount of product being lifted for order i from tank j at event point n
Pst(s,j,n) = amount of product s in tank j at event point n before new product is transferred from the blender
Tbs(s,j,n) = starting time of product s being produced and transferred to product-stock tank j at event point n
Tbf(s,j,n) = finishing time of product s being produced and transferred to product-stock tank j at event point n
Blnd(s,j,n) = amount of product s being transferred from blender to tank j at event point n
comp(k,l,n) = amount of component k being transferred to the blender at event point n
bc(k,l,n) = amount of component k in component tank l at event point n
cracking(k,l,n) = amount of component k being transferred from separation units to component tank l at event point n


Introduction 


Gasoline blending is a crucial step in refinery operation, as gasoline can yield 60-70% of a refinery's profit. The process involves mixing various stocks, which are intermediate products from the refinery, along with some additives, such as antioxidants and corrosion inhibitors, to produce blends with certain qualities [1]. In the past few decades, a substantial amount of work has been dedicated to process operations [3,4,7,8,9]. A variety of support systems have been developed to address planning and scheduling of blending operations. StarBlend [13], for example, which was developed by Texaco, uses a multiperiod blending model written in GAMS that facilitates the incorporation of future requirements into current blending decisions. Glismann and Gruhn [5,6] proposed a mixed-integer linear programming (MILP) model, based on a resource-task network representation, for the short-term scheduling of blending processes. The recipe optimization problem is then formulated as a nonlinear program and the results are returned to the scheduling problem, so that an overall optimization can be achieved. A fuzzy linear formulation was applied to the blending facilities by Djukanovic et al. [2] in order to address the uncertainty of input information within fuel scheduling optimization. Singh et al. [14] addressed the problem of blending optimization for in-line blending in the case of stochastic disturbances in feedstock qualities. They presented a real-time optimization method that can provide significantly improved profitability.

Gasoline Blending and Distribution Scheduling: An MILP Model, Figure 1
Graphic overview of the gasoline blending and distribution system (figure labels: Problem 1, Problem 2, Problem 3; marine vessels; crude-oil storage tanks; charging tanks; crude distillation units; other production units; component stock tanks; blend header; finished product tanks; lifting/shipping points)

The objective of this work is to propose a new math- 
ematical model that addresses the simultaneous opti- 
mization of the short-term scheduling problem of gaso- 
line blending and distribution as described in the fol- 
lowing section. 


Definition 

The overall oil-refinery system is decomposed into
three parts as depicted in Fig. 1. The first part (prob- 
lem 1, Fig. 1) involves the crude-oil unloading, mix- 
ing and inventory control (Jia et al. [10]), the second 
part (problem 2, Fig. 1) consists of the production unit 
scheduling, which includes both fractionation and re- 
action processes, and the third part (problem 3, Fig. 1), 
which is addressed in this work, depicts the finished 
product blending and shipping end of the refinery. 
The gasoline blending system consists of four pieces 
of equipment all linked together through various pip- 


ing segments, flow meters and valves. They are com- 
ponent-stock tanks, blend header, product-stock tanks 
and lifting ports. Components from the component- 
stock tanks are fed to the blend header according to the 
recipes. Thus, different products can be produced and 
then stored in their suitable product-stock tanks. The 
final step is to lift those products during the specified 
time periods in order to satisfy all the orders. The ob- 
jective is to determine the following variables: (1) start- 
ing and finishing time of orders taking place in each 
product-stock tank; (2) the amount and type of product 
being lifted for each order from tanks; (3) starting and 
finishing times of the product being transferred from 
the blender to the tanks; (4) the amount and type of 
component being transferred from component tanks to 
the blender, so as to process all the orders in specific 
time periods. 

The scheduling problem as described above is mod- 
eled in the next section following a continuous-time 
representation. It gives rise to an MILP formulation 
that can be efficiently solved using commercially avail- 
able solvers. 
Formulation


It is assumed that perfect mixing is achieved at the 
blend header and that the changeover time between dif- 
ferent products in the storage tanks is negligible. 


Material Balance Constraints 

for Product-Stock Tank j 

Constraint (1a) expresses that the amount of product s in tank j at event point n+1 (Pst(s,j,n+1)) is equal to that at event point n adjusted by any amounts transferred from the blender (Blnd(s,j,n)) or lifted at event point n ($\sum_{i \in I_s} \mathrm{lift}(i,j,n)$). Constraint (1b) states that the amount of product s being lifted from tank j at the last event point N should not exceed the amount of product s available in tank j (stored plus newly blended).

$$\mathrm{Pst}(s,j,n+1) = \mathrm{Pst}(s,j,n) + \mathrm{Blnd}(s,j,n) - \sum_{i \in I_s} \mathrm{lift}(i,j,n), \quad \forall s \in S,\ j \in J_s,\ n \in N,\ n \ne N \qquad (1a)$$

$$\mathrm{Pst}(s,j,n) + \mathrm{Blnd}(s,j,n) \ge \sum_{i \in I_s} \mathrm{lift}(i,j,n), \quad \forall s \in S,\ j \in J_s,\ n = N \qquad (1b)$$
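For a fixed product s and tank j, constraints (1a) and (1b) amount to a simple inventory recursion over the event points. The Python sketch below rolls that recursion forward for a given candidate schedule and reports whether (1b) holds at the last event point; the function name and the volumes are hypothetical.

```python
def inventory_profile(initial, blended, lifted):
    """Inventory Pst(s, j, n) of one product s in one tank j over event points n = 1..N.

    initial: Pst(s, j, 1); blended[n-1]: Blnd(s, j, n); lifted[n-1]: the total lift
    sum_{i in I_s} lift(i, j, n). Applies balance (1a) for n < N and reports
    whether the last-event-point condition (1b) holds.
    """
    if len(blended) != len(lifted) or not blended:
        raise ValueError("blended and lifted need one entry per event point")
    pst = [initial]
    for n in range(len(blended) - 1):  # (1a) for n = 1, ..., N-1
        pst.append(pst[n] + blended[n] - lifted[n])
    last_ok = pst[-1] + blended[-1] >= lifted[-1]  # (1b) at n = N
    return pst, last_ok


# Hypothetical single tank, three event points (volumes in Mbbl).
profile, ok = inventory_profile(30.0, [20.0, 0.0, 10.0], [15.0, 25.0, 20.0])
print(profile, ok)  # [30.0, 35.0, 10.0] True
```

The recursion alone does not keep intermediate inventories within bounds; in the model that role is played by the capacity constraint (2).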


Capacity Constraints 


Constraint (2) imposes a volume capacity limitation of product s in tank j at event point n.

$$\mathrm{Vmin}(j)\, y(s,j,n) \le \mathrm{Pst}(s,j,n) + \mathrm{Blnd}(s,j,n) \le \mathrm{Vmax}(j)\, y(s,j,n), \quad \forall s \in S,\ j \in J_s,\ n \in N \qquad (2)$$


Allocation Constraints 


According to constraint (3a), uv(i,j,n) is equal to 1 if the amount of product being lifted from tank j for order i at event point n is not zero, that is, lift(i,j,n) ≠ 0; uv(i,j,n) equals 0 otherwise. U1 and U2 correspond to lower and upper bounds on the amount of product lifted, respectively, and are chosen according to the smallest order and the maximum capacities of the tanks.


$$U1 \cdot \mathrm{uv}(i,j,n) \le \mathrm{lift}(i,j,n) \le U2 \cdot \mathrm{uv}(i,j,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \qquad (3a)$$


To avoid task splitting, constraints (3b)-(3d) state that order i should be processed only once if it is a small order and at most twice if it is a medium-sized order; otherwise, it can be processed at most three times. For different problems, U3 and U4 are chosen accordingly to define small and medium-sized orders. Constraint (3e) expresses that for large orders, defined as orders whose total amount is greater than or equal to U5, every split of the order must be at least 25 Mbbl.


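$$\sum_{n \in N} \sum_{j \in J_i} \mathrm{uv}(i,j,n) = 1, \quad \forall i \in I \text{ with } \sum_{s \in S} \mathrm{Prod\_ord}(i,s) < U3 \qquad (3b)$$

$$\sum_{n \in N} \sum_{j \in J_i} \mathrm{uv}(i,j,n) \le 2, \quad \forall i \in I \text{ with } \sum_{s \in S} \mathrm{Prod\_ord}(i,s) < U4 \qquad (3c)$$

$$\sum_{n \in N} \sum_{j \in J_i} \mathrm{uv}(i,j,n) \le 3, \quad \forall i \in I \qquad (3d)$$

$$25 \cdot \mathrm{uv}(i,j,n) \le \mathrm{lift}(i,j,n), \quad \forall i \in I \text{ with } \sum_{s \in S} \mathrm{Prod\_ord}(i,s) \ge U5,\ j \in J_i,\ n \in N \qquad (3e)$$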
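The effect of (3b)-(3e) on a single order can be summarized as a lookup on its total ordered amount. This is a minimal sketch, assuming the thresholds are passed in directly; the function name `lift_limits` and the threshold values in the example are hypothetical, and the lower bound U1 from (3a) is not modeled.

```python
def lift_limits(order_total, u3, u4, u5):
    """Return (max_lifts, min_lift_size) for one order under (3b)-(3e).

    order_total is sum_s Prod_ord(i, s); u3, u4, u5 are the thresholds U3, U4, U5.
    """
    if order_total < u3:      # small order, (3b): exactly one lift
        max_lifts = 1
    elif order_total < u4:    # medium-sized order, (3c): at most two lifts
        max_lifts = 2
    else:                     # (3d) holds for every order: at most three lifts
        max_lifts = 3
    # (3e): for large orders every nonzero lift must be at least 25 Mbbl
    min_lift_size = 25.0 if order_total >= u5 else 0.0
    return max_lifts, min_lift_size


# Hypothetical thresholds U3 = 20, U4 = 50, U5 = 60 (Mbbl).
print(lift_limits(10.0, 20.0, 50.0, 60.0))  # (1, 0.0)
print(lift_limits(80.0, 20.0, 50.0, 60.0))  # (3, 25.0)
```

Testing the tighter conditions first mirrors the way (3b) and (3c) override the general bound (3d) for smaller orders.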


Constraint (4) forces sv(s,j,n) to be equal to 1 when 
Blnd(s,j,n) is not zero; otherwise sv(s,j,n) equals 0. 


$$\mathrm{Vmin}(j)\, \mathrm{sv}(s,j,n) \le \mathrm{Blnd}(s,j,n) \le \mathrm{Vmax}(j)\, \mathrm{sv}(s,j,n), \quad \forall s \in S,\ j \in J_s,\ n \in N \qquad (4)$$


Demand Constraints 


Constraints (5a) and (5b) state that order i can be lifted at most once from any given tank during the time horizon under consideration, and that the total amount of product lifted from all the product-stock tanks must equal the amount ordered ($\sum_{s} \mathrm{Prod\_ord}(i,s)$).


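$$\sum_{n \in N} \mathrm{uv}(i,j,n) \le 1, \quad \forall i \in I,\ j \in J_i \qquad (5a)$$

$$\sum_{n \in N} \sum_{j \in J_i} \mathrm{lift}(i,j,n) = \sum_{s \in S} \mathrm{Prod\_ord}(i,s), \quad \forall i \in I \qquad (5b)$$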
Sequence Constraints


Constraints (6a)-(6c) state that order i starting in tank j at event point n+1 should start after the finishing time of the same order processed in the same tank that started at event point n. Constraints (6d) and (6e) express that order i should start and finish within the time window given by the order requirements (Prod_srt(i) and Prod_end(i)). These constraints are relaxed if uv(i,j,n) is zero, which means that order i is not executed in tank j at event point n.


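$$\mathrm{Ts}(i,j,n+1) \ge \mathrm{Te}(i,j,n) - H\,(1 - \mathrm{uv}(i,j,n)), \quad \forall i \in I,\ j \in J_i,\ n \in N,\ n \neq N \qquad (6a)$$

$$\mathrm{Ts}(i,j,n+1) \ge \mathrm{Ts}(i,j,n), \quad \forall i \in I,\ j \in J_i,\ n \in N,\ n \neq N \qquad (6b)$$

$$\mathrm{Te}(i,j,n+1) \ge \mathrm{Te}(i,j,n), \quad \forall i \in I,\ j \in J_i,\ n \in N,\ n \neq N \qquad (6c)$$

$$\mathrm{Ts}(i,j,n) \ge \mathrm{Prod\_srt}(i)\, \mathrm{uv}(i,j,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \qquad (6d)$$

$$\mathrm{Te}(i,j,n) \le \mathrm{Prod\_end}(i) + H\,(1 - \mathrm{uv}(i,j,n)), \quad \forall i \in I,\ j \in J_i,\ n \in N \qquad (6e)$$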
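The big-M pattern in (6a) and (6e) is easiest to see on a concrete timeline: the horizon H switches a constraint off whenever uv(i,j,n) = 0. The checker below is a minimal sketch for one order in one tank; the function name `sequence_ok`, the 0-based list indexing and the sample time window are assumptions made for illustration.

```python
def sequence_ok(ts, te, uv, h, prod_srt, prod_end):
    """Check constraints (6a)-(6e) for one order i in one tank j.

    ts[n], te[n]: start and finish times Ts(i, j, n), Te(i, j, n) per event point.
    uv[n]:        binary uv(i, j, n); h is the time horizon H (the big-M value).
    prod_srt, prod_end: earliest start Prod_srt(i) and due date Prod_end(i).
    """
    last = len(ts) - 1
    for n in range(len(ts)):
        if ts[n] < prod_srt * uv[n]:                      # (6d)
            return False
        if te[n] > prod_end + h * (1 - uv[n]):            # (6e)
            return False
        if n < last:
            if ts[n + 1] < te[n] - h * (1 - uv[n]):       # (6a)
                return False
            if ts[n + 1] < ts[n] or te[n + 1] < te[n]:    # (6b), (6c)
                return False
    return True


# Hypothetical window [24, 48] h in a 192 h horizon; lifted at the first event point only.
print(sequence_ok([24, 30], [30, 30], [1, 0], 192, 24, 48))  # True
print(sequence_ok([20, 30], [30, 30], [1, 0], 192, 24, 48))  # False: starts before 24 h
```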
Duration Constraints 


If order i is processed in tank j at event point n, that 
is, uv(i, j,n) = 1, then both ends of constraint (7a) are 
equal, so the duration is given by lift(i, j, n)/I(i), where 
I(i) is the lifting rate of order i. If uv(i, j,n) = 0, then 
the duration is zero according to constraint (7b). 


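$$\frac{\mathrm{lift}(i,j,n) - \sum_{s \in S} \mathrm{Prod\_ord}(i,s)\,(1 - \mathrm{uv}(i,j,n))}{I(i)} \le \mathrm{Te}(i,j,n) - \mathrm{Ts}(i,j,n) \le \frac{\mathrm{lift}(i,j,n)}{I(i)}, \quad \forall i \in I,\ j \in J_i,\ n \in N \qquad (7a)$$

$$\mathrm{Te}(i,j,n) - \mathrm{Ts}(i,j,n) \le \frac{\sum_{s \in S} \mathrm{Prod\_ord}(i,s)\, \mathrm{uv}(i,j,n)}{I(i)}, \quad \forall i \in I,\ j \in J_i,\ n \in N \qquad (7b)$$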
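Evaluating both sides of (7a) and (7b) for the two values of uv(i,j,n) confirms the reasoning above: with uv = 1 the bounds coincide at lift/I(i), and with uv = 0 the duration cannot be positive. This is a minimal sketch; the function name `duration_bounds` and the numbers (including a lifting rate of 5 Mbbl/hr) are hypothetical.

```python
def duration_bounds(lift, order_total, rate, uv):
    """Lower and upper bounds on Te(i, j, n) - Ts(i, j, n) implied by (7a)-(7b).

    lift = lift(i, j, n), order_total = sum_s Prod_ord(i, s),
    rate = lifting rate I(i), uv = uv(i, j, n) in {0, 1}.
    """
    lower = (lift - order_total * (1 - uv)) / rate        # left side of (7a)
    upper = min(lift / rate, order_total * uv / rate)     # right sides of (7a) and (7b)
    return lower, upper


print(duration_bounds(20.0, 50.0, 5.0, 1))  # (4.0, 4.0): duration fixed at lift / I(i)
print(duration_bounds(0.0, 50.0, 5.0, 0))   # (-10.0, 0.0): duration cannot be positive
```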


Blending Stage Consideration 

Considering the blending stage requires the incorporation of the additional constraints described below.


Material Balance Constraints for the Blender 


To avoid the introduction of bilinear terms in the mass-balance equations and to keep the model linear, the idea of component mixing used by Quesada and Grossmann [12] is applied together with the assumption of constant production recipes. On this basis, constraint (8) expresses that the amount of component k required to produce product s at event point n ($\sum_{s} \mathrm{Recipe}(s,k) \sum_{j \in J_s} \mathrm{Blnd}(s,j,n)$) must equal the total amount of component k transferred from all the component tanks at that event point ($\sum_{l \in L_k} \mathrm{comp}(k,l,n)$).


$$\sum_{s \in S} \Big( \mathrm{Recipe}(s,k) \sum_{j \in J_s} \mathrm{Blnd}(s,j,n) \Big) = \sum_{l \in L_k} \mathrm{comp}(k,l,n), \quad \forall k \in K,\ n \in N \qquad (8)$$
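Because the recipes are constant, the left-hand side of (8) is a plain weighted sum of the blended amounts, which is what keeps the constraint linear. The sketch below computes it for one event point; the function name `component_requirements`, the dictionary layout and the product and component names are hypothetical.

```python
def component_requirements(recipe, blended):
    """Left-hand side of constraint (8) at one event point n.

    recipe[s][k]: Recipe(s, k), fraction of component k in product s.
    blended[s]:   sum over j in J_s of Blnd(s, j, n).
    Returns, for each component k, the amount the component tanks must supply,
    i.e. the required value of sum over l of comp(k, l, n).
    """
    required = {}
    for s, amount in blended.items():
        for k, fraction in recipe[s].items():
            required[k] = required.get(k, 0.0) + fraction * amount
    return required


recipe = {"P1": {"A": 0.25, "B": 0.75}}               # hypothetical product and components
print(component_requirements(recipe, {"P1": 40.0}))   # {'A': 10.0, 'B': 30.0}
```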


Material Balance Constraints for Component Tank l


The amount of component k in tank l at event point n+1 (bc(k,l,n+1)) is equal to that at event point n (bc(k,l,n)) adjusted by any amounts transferred from separation units (cracking(k,l,n)) or delivered to the blender at event point n (comp(k,l,n)). This relation is expressed by constraint (9a). Constraint (9b) imposes upper and lower bounds on the flow of component k transferred from tank l to the blender; when the binary yv(k,l,n) is zero, no transfer takes place.


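$$\mathrm{bc}(k,l,n+1) = \mathrm{bc}(k,l,n) + \mathrm{cracking}(k,l,n) - \mathrm{comp}(k,l,n), \quad \forall k \in K,\ l \in L_k,\ n \in N \qquad (9a)$$

$$\mathrm{flowmin} \cdot \mathrm{yv}(k,l,n) \le \mathrm{comp}(k,l,n) \le \mathrm{flowmax} \cdot \mathrm{yv}(k,l,n), \quad \forall k \in K,\ l \in L_k,\ n \in N \qquad (9b)$$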
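Constraint (9b) makes each transfer semicontinuous: either nothing is sent (yv = 0) or the amount lies between flowmin and flowmax (yv = 1). The sketch below propagates (9a) for one component tank and rejects transfers that violate (9b); the function name `component_tank_levels` and all numbers are hypothetical.

```python
def component_tank_levels(bc_first, cracking, comp, flowmin, flowmax):
    """Propagate bc(k, l, n) for one component k in one tank l using (9a).

    Each transfer comp(k, l, n) must be zero (yv = 0) or lie in
    [flowmin, flowmax] (yv = 1), as required by (9b).
    """
    levels = [bc_first]
    for n, amount in enumerate(comp):
        if amount != 0.0 and not (flowmin <= amount <= flowmax):
            raise ValueError(f"comp at event point {n} violates (9b)")
        levels.append(levels[-1] + cracking[n] - amount)
    return levels


print(component_tank_levels(10.0, [2.0, 0.0], [5.0, 0.0], 1.0, 8.0))  # [10.0, 7.0, 7.0]
```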


Allocation Constraints for Product-Stock Tank j 


Constraint (10) states that product s cannot be trans- 
ferred to product-stock tank j and distributed at the 
same event point n. 


$$\sum_{s \in S_j} \mathrm{sv}(s,j,n) + \mathrm{uv}(i,j,n) \le 1, \quad \forall j \in J,\ i \in I_j,\ n \in N \qquad (10)$$
Gasoline Blending and Distribution Scheduling: An MILP Model, Table 1
Distribution data for an example with ten orders
[Table: for orders o1-o10, the time by which each order can start (hr), its due date (hr) and its lifting rate; time horizon: 192 hr; for each product-stock tank, the products that can be stored, the initial product and amount (Mbbl), the maximum capacity (Mbbl) and the minimum capacity (Mbbl)]


Allocation Constraints for Blender 


According to constraint (11a), xv(s,n) equals 1 if prod- 
uct s is produced and transferred to at least one tank 
at event point n, whereas xv(s,n) equals 0 if product s 
is not transferred to any of the tanks at event point n. 
Constraint (11b) expresses that only one product can 
be produced in the blender at the same event point n. 


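$$\mathrm{sv}(s,j,n) \le \mathrm{xv}(s,n) \le \sum_{j' \in J_s} \mathrm{sv}(s,j',n), \quad \forall s \in S,\ j \in J_s,\ n \in N \qquad (11a)$$

$$\sum_{s \in S} \mathrm{xv}(s,n) \le 1, \quad \forall n \in N \qquad (11b)$$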


Sequence Constraints 


Similar to constraints (6a)-(6c), constraints (12a)-(12c) state that product s should start being transferred to tank j at event point n+1 after the finishing time of the same product transferred to the same tank that started at event point n, whereas constraints (12d) and (12e) require all the transfers to happen within the time horizon H.


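$$\mathrm{Tbs}(s,j,n+1) \ge \mathrm{Tbe}(s,j,n) - H\,(1 - \mathrm{sv}(s,j,n)), \quad \forall s \in S,\ j \in J_s,\ n \in N,\ n \neq N \qquad (12a)$$

$$\mathrm{Tbs}(s,j,n+1) \ge \mathrm{Tbs}(s,j,n), \quad \forall s \in S,\ j \in J_s,\ n \in N,\ n \neq N \qquad (12b)$$

$$\mathrm{Tbe}(s,j,n+1) \ge \mathrm{Tbe}(s,j,n), \quad \forall s \in S,\ j \in J_s,\ n \in N,\ n \neq N \qquad (12c)$$

$$\mathrm{Tbs}(s,j,n) \le H, \quad \forall s \in S,\ j \in J_s,\ n \in N \qquad (12d)$$

$$\mathrm{Tbe}(s,j,n) \le H, \quad \forall s \in S,\ j \in J_s,\ n \in N \qquad (12e)$$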


If the blender provides product s for more than one 
product-stock tank at event point n, then the starting 
and finishing times for all the tanks should be the same. 


$$\mathrm{Tbs}(s,j,n) + H\,(1 - \mathrm{sv}(s,j,n)) \ge \mathrm{Tbs}(s,j',n) - H\,(1 - \mathrm{sv}(s,j',n)), \quad \forall s \in S,\ j, j' \in J_s,\ j \neq j',\ n \in N \qquad (13a)$$
Gasoline Blending and Distribution Scheduling: An MILP Model, Table 2
Blending data for an example with ten orders
[Table: for each component, the tanks in which it can be stored, the recipe of products, the amount of component (Mbbl) and the tank in which it is initially stored, and the blending rate (Mbbl/hr)]


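$$\mathrm{Tbs}(s,j,n) - H\,(1 - \mathrm{sv}(s,j,n)) \le \mathrm{Tbs}(s,j',n) + H\,(1 - \mathrm{sv}(s,j',n)), \quad \forall s \in S,\ j, j' \in J_s,\ j \neq j',\ n \in N \qquad (13b)$$

$$\mathrm{Tbe}(s,j,n) + H\,(1 - \mathrm{sv}(s,j,n)) \ge \mathrm{Tbe}(s,j',n) - H\,(1 - \mathrm{sv}(s,j',n)), \quad \forall s \in S,\ j, j' \in J_s,\ j \neq j',\ n \in N \qquad (13c)$$

$$\mathrm{Tbe}(s,j,n) - H\,(1 - \mathrm{sv}(s,j,n)) \le \mathrm{Tbe}(s,j',n) + H\,(1 - \mathrm{sv}(s,j',n)), \quad \forall s \in S,\ j, j' \in J_s,\ j \neq j',\ n \in N \qquad (13d)$$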
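Each pair of constraints in (13a)-(13d) encodes "the two times are equal if both tanks are served, otherwise unconstrained", with the horizon H acting as the big-M. A minimal sketch of that pairwise test, with the function name `big_m_same_time` and the sample times chosen for illustration only:

```python
def big_m_same_time(t_a, t_b, sv_a, sv_b, h):
    """Pairwise check in the style of (13a)-(13b), or (13c)-(13d) for finishing times.

    t_a, t_b:   start (or finish) times of product s for tanks j and j' at event point n.
    sv_a, sv_b: binaries sv(s, j, n) and sv(s, j', n); h is the horizon H used as big-M.
    """
    relax_a = h * (1 - sv_a)
    relax_b = h * (1 - sv_b)
    lower_ok = t_a + relax_a >= t_b - relax_b    # (13a) / (13c)
    upper_ok = t_a - relax_a <= t_b + relax_b    # (13b) / (13d)
    return lower_ok and upper_ok


print(big_m_same_time(10.0, 10.0, 1, 1, 192.0))   # True: both tanks served at the same time
print(big_m_same_time(10.0, 12.0, 1, 1, 192.0))   # False: the times must coincide
print(big_m_same_time(10.0, 150.0, 1, 0, 192.0))  # True: tank j' not served, constraint relaxed
```

The relaxation only works because (12d) and (12e) keep every time within [0, H], so a shift of H is always large enough.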


Constraints (14a) and (14b) express that product 
transfer and distribution should be performed consec- 
utively in the same product-stock tank j. 


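$$\mathrm{Ts}(i,j,n+1) \ge \mathrm{Tbe}(s,j,n) - H\,(1 - \mathrm{sv}(s,j,n)), \quad \forall j \in J,\ i \in I_j,\ s \in S_j,\ n \in N,\ n \neq N \qquad (14a)$$

$$\mathrm{Tbs}(s,j,n+1) \ge \mathrm{Te}(i,j,n) - H\,(1 - \mathrm{uv}(i,j,n)), \quad \forall j \in J,\ i \in I_j,\ s \in S_j,\ n \in N,\ n \neq N \qquad (14b)$$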


According to constraint (15), two different products s and s' being transferred to the same or different product-stock tanks have to be transferred consecutively, in accordance with the allocation constraint for the blender.


$$\mathrm{Tbs}(s,j,n+1) \ge \mathrm{Tbe}(s',j',n) - H\,(1 - \mathrm{sv}(s',j',n)), \quad \forall s, s' \in S,\ s \neq s',\ j \in J_s,\ j' \in J_{s'},\ n \in N,\ n \neq N \qquad (15)$$


Duration Constraints 


The minimum run length of 6 h is imposed on the blender by constraint (16a):

$$\sum_{j \in J_s} \mathrm{Blnd}(s,j,n) \ge 6 \cdot \mathrm{Bflow}, \quad \forall s \in S,\ n \in N \qquad (16a)$$


Constraint (16b) defines the duration of product s being transferred to the tanks at event point n as the difference between the finishing time (Tbe(s,j,n)) and the starting time (Tbs(s,j,n)), if the transfer takes place in tank j. Constraint (16c) expresses that the duration of transferring product s from the blender to the tanks corresponds to the amount of product s being transferred divided by the flow rate Bflow. The artificial variable arti(s,n) allows a feasible solution to be found in case a larger flow rate is required.


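$$\big(\mathrm{Tbe}(s,j,n) - \mathrm{Tbs}(s,j,n)\big) - H\,(1 - \mathrm{sv}(s,j,n)) \le \mathrm{duration}(s,n) \le \big(\mathrm{Tbe}(s,j,n) - \mathrm{Tbs}(s,j,n)\big) + H\,(1 - \mathrm{sv}(s,j,n)), \quad \forall s \in S,\ j \in J_s,\ n \in N \qquad (16b)$$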
Gasoline Blending and Distribution Scheduling: An MILP Model, Figure 2
Gantt chart for the example with ten orders
Gasoline Blending and Distribution Scheduling: An MILP Model, Figure 3
Gantt chart for the example with 16 orders
Gasoline Blending and Distribution Scheduling: An MILP Model, Figure 4
Gantt chart for the example with 23 orders
Gasoline Blending and Distribution Scheduling: An MILP Model, Figure 5
Gantt chart for the example with 30 orders
Gasoline Blending and Distribution Scheduling: An MILP Model, Figure 6
Gantt chart for the example with 37 orders
Gasoline Blending and Distribution Scheduling: An MILP Model, Figure 7
Gantt chart for the example with 45 orders




Gasoline Blending and Distribution Scheduling: An MILP Model 


Gasoline Blending and Distribution Scheduling: An MILP Model, Table 3
Computational results for the blending and distribution system
[Table: for each instance, the number of orders, continuous variables, 0-1 variables and constraints; and, for the first integer solution, the second integer solution and the optimal solution, the objective value, nodes, iterations and CPU time (s)]


$$\mathrm{duration}(s,n) = \frac{\sum_{j \in J_s} \mathrm{Blnd}(s,j,n)}{\mathrm{Bflow}} - \mathrm{arti}(s,n), \quad \forall s \in S,\ n \in N \qquad (16c)$$


Objective Function 


The objective of the scheduling problem is to minimize the sum of the artificial variables in the duration constraints on the blender, so as to determine a feasible solution with a flow rate as close to Bflow as possible. The formulation is, however, general enough to accommodate other objective functions targeting the optimization of production. In most realistic cases [11], the objective of this stage of refinery operation is to satisfy all the orders without any delays.


$$\min\ \mathrm{objective} = \sum_{s \in S} \sum_{n \in N} \mathrm{arti}(s,n) \qquad (17)$$
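Solving (16c) for the artificial variable shows how the objective (17) measures the extra blender speed that a schedule requires. The sketch assumes the blend amount and the duration are already known for each blended (s, n) pair; the function names, the dictionary keys and the 60 Mbbl / 10 Mbbl/hr figures are hypothetical.

```python
def artificial_from_duration(blend_total, bflow, duration):
    """Solve (16c) for arti(s, n): duration(s, n) = blend_total / Bflow - arti(s, n)."""
    return blend_total / bflow - duration


def schedule_objective(blend_totals, durations, bflow):
    """Objective (17): sum of arti(s, n) over the blended (s, n) pairs."""
    return sum(artificial_from_duration(blend_totals[key], bflow, durations[key])
               for key in blend_totals)


# Hypothetical run: 60 Mbbl at a nominal Bflow of 10 Mbbl/hr, which also meets (16a).
print(artificial_from_duration(60.0, 10.0, 6.0))  # 0.0: the nominal rate suffices
print(artificial_from_duration(60.0, 10.0, 4.0))  # 2.0: a faster rate is needed
```

A zero objective therefore means every blend fits the nominal flow rate, which is how the results in Table 3 are interpreted.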


Case 


The case study considered here is based on realistic data 
provided by Honeywell Hi-Spec Solutions. The distri- 
bution problem consists of 45 orders of four different 
products that are stored in 11 product-stock tanks. The 
incorporation of the blending stage adds the consid- 
eration of nine components and 20 component tanks. 
Smaller-scale instances of the problem, involving 10, 16, 23, 30, and 37 orders, are constructed to test the proposed formulation. The detailed data for the case of ten orders are presented in Tables 1 and 2. GAMS/CPLEX 7.0 was used for the solution of the resulting MILP formulation. The computational characteristics of the models are tabulated in Table 3.




The optimal solution with zero integrality gap, as well as the first and second integer solutions, are shown. Note that since the objective is the sum of the artificial variables used to relax the flow-rate constraints, a nonzero objective value indicates that at least one of these constraints has been violated, at the cost of the objective function. For the case study examined, however, as shown in Table 3, even the full-scale problem involving 45 orders converged to a feasible solution, requiring 4280 nodes and approximately 5 h of CPU time, which is a reasonable time for the integrated scheduling of blending and distribution over a time horizon of 8 days. The resulting Gantt charts of the six cases examined are shown in Figs. 2-7. Compared with the Gantt charts commonly used for scheduling, the difference here is that the number below each line corresponds to the order number, whereas the number above the line corresponds to the amount of product lifted from that particular tank. Note that different orders can be performed in the same tank at the same time, as shown in Figs. 3-7.


Conclusions 


In this work, a continuous-time formulation was pre- 
sented for the short-term scheduling of a gasoline 
blending and distribution system. It was shown that the 
resulting model can be solved efficiently even for real- 
istic large-scale problems. The main advantage of the 
proposed approach is the full utilization of the time 
continuity. This results in smaller models in terms of 
variables and constraints since only the real events have 
to be modeled. 
C.F. Gauss (1777-1855) worked in a wide variety of
fields in both mathematics and physics including num- 
ber theory, group theory, analysis, differential geom- 
etry, geodesy, magnetism, astronomy, and optics. His 
work has had an immense influence in many areas. 

In 1788, with the help of L. Büttner and R. Bartels, Gauss began his education at the Gymnasium, where he learned High German and Latin. After receiving a stipend from the Duke of Brunswick-Wolfenbüttel, Gauss entered the Brunswick Collegium Carolinum in 1792. At the academy, Gauss independently discovered Bode’s law, the binomial theorem and the arithmetic-geometric mean, as well as the law of quadratic reciprocity and the prime number theorem [1,4].

Gauss left Göttingen in 1798 without a diploma, but by this time he had made one of his most important discoveries: the construction of a regular 17-gon by ruler and compasses [2,3]. This was the most significant advance in this field since the time of Greek mathematics and was published in his famous work ‘Disquisitiones Arithmeticae’ [1, Sect. VI].

On July 16, 1799, in his absence, he was awarded the degree of Doctor of Philosophy by the university in Helmstedt. His dissertation gave a proof of the fundamental theorem of algebra (FTA) [2,3]. The fundamental theorem of algebra states that


Theorem 1 Every polynomial equation of degree n has n roots in the complex numbers, counted with multiplicity.


Gauss is usually credited with the first proof of the FTA. He was undoubtedly the first to spot the fundamental flaw in earlier proofs, namely that they assumed the existence of roots and then tried to deduce properties of them. His proof of 1799 is topological in nature and has some rather serious gaps; it does not meet present-day standards of rigor. He published the book ‘Disquisitiones Arithmeticae’ in the summer of 1801. It had seven sections, all but the last section, referred to above, being devoted to number theory.

In 1814, the Swiss accountant J.R. Argand published a proof of the FTA which may be the simplest of all the proofs. His proof is based on an idea of d’Alembert from 1746, which Argand simplifies using a general theorem on the existence of a minimum of a continuous function.

Two years after Argand’s proof appeared, Gauss published in 1816 a second proof of the FTA. Gauss uses Euler’s approach, but instead of operating with roots which may not exist, he operates with indeterminates. This proof is complete and correct. A third proof by Gauss, also from 1816, is, like the first, topological in nature. In 1831 Gauss introduced the term ‘complex number’.

In 1849 Gauss produced the first proof that a polynomial equation of degree n with complex coefficients has n complex roots. The proof is similar to the first proof given by Gauss. However, it adds little, since it is straightforward to deduce the result for complex coefficients from the result about polynomials with real coefficients.

It is worth noting that despite Gauss’s insistence that one could not assume the existence of roots which were then to be proved real, he did believe, as did everyone at that time, that there existed a whole hierarchy of imaginary quantities of which complex numbers were the simplest. Gauss called them a shadow of shadows.

The different proofs of the FTA are Gauss’s most 
important contributions as a rigorist, that is to say, 
as a representative of logical strictness in method of 
proof [1]. Since this theorem has great significance in 
both algebra and function theory, it influenced many 
other related areas, including mathematical optimiza- 
tion. 

Gauss used infinite sequences and series in his 
daily work, not only in mathematics but in astronomy, 
geodesy, and physics. As an eleven-year-old, Gauss was 
already studying Newton’s binomial theorem, which 
includes the infinite geometric series as a special case. 
He investigated the conditions under which an infinite 
binomial series has a logical meaning. He also thought 
about the theoretical formulation of the notion of limiting value [3]. In an unfinished article written around 1800, ‘Fundamental concepts in the principles of series’, he formulated the notion of the limit of a sequence in a fashion far ahead of the times.

Gauss introduced there the notions of upper bound 
and least upper bound G; he also introduced the no- 
tions of lower bound and greatest lower bound g. Fur- 
thermore he introduced the ‘final upper bound’ H and 
the ‘final lower bound’ h. If H = h, then their common 
value was called the absolute limit (limiting value) of the 
sequence. His definitions nearly agree with the present- 
day definitions of upper bound G, lower bound g, limit 
superior H, limit inferior h, and the condition H = h for 
the existence of the limiting value [3,4]. 
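In present-day notation, the quantities described above correspond to the following definitions for a real sequence (a_n). This is a restatement in modern terms, not a transcription of Gauss’s manuscript.

```latex
\begin{aligned}
G &= \sup_{n \ge 1} a_n, \qquad g = \inf_{n \ge 1} a_n,\\
H &= \limsup_{n\to\infty} a_n = \lim_{n\to\infty} \sup_{k \ge n} a_k, \qquad
h = \liminf_{n\to\infty} a_n = \lim_{n\to\infty} \inf_{k \ge n} a_k,\\
&g \le h \le H \le G, \qquad
\lim_{n\to\infty} a_n \text{ exists and is finite} \iff H = h \in \mathbb{R}.
\end{aligned}
```

For example, the sequence a_n = (-1)^n/n has G = 1/2, g = -1 and H = h = 0, so its limiting value is 0.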

Gauss’s great interest in astronomy, and his later 
interest in geodesy, compelled him to seek a ratio- 
nal method for determining the magnitude of obser- 
vational errors. In turn, the theory of observational er- 
rors forced him to deal with the modes of thought and 
concepts of the calculus of probabilities. This work had 
great significance in the development of numerous ar- 
eas in both the calculus of probabilities and mathe- 
matical statistics. Furthermore this theory forced re- 
searchers to make clear the conditions under which the 
law of the normal distribution is applicable. This law is 
often called Gauss’s distribution law. 

In 1823 Gauss published his great work ‘Theoria combinationis observationum erroribus minimis obnoxiae’ (‘A theory for the combination of observations, which is connected with least possible error’). It is a systematic and generalized presentation of his earlier theory of observational errors. Here he develops the method of least squares [3,4] with mathematical rigor as, in general, the most suitable way of combining observations, independent of any hypothetical law concerning the probability of error.

The term ‘determinant’ was first introduced by Gauss in ‘Disquisitiones Arithmeticae’ (1801) while discussing quadratic forms [3]. He used the term because the determinant determines the properties of the quadratic form. However, the concept is not the same as that of our determinant. In the same work Gauss lays out the coefficients of his quadratic forms in rectangular arrays. He describes matrix multiplication (which he thinks of as composition, so he has not yet reached the concept of matrix algebra) and the inverse of a matrix in the particular context of the arrays of coefficients of quadratic forms.
Gaussian elimination, which first appeared in the text ‘Nine Chapters of the Mathematical Art’ written in 200 BC, was used by Gauss in his work on the orbit of the asteroid Pallas. Using observations of Pallas taken between 1803 and 1809, Gauss obtained a system of six linear equations in six unknowns. Gauss gave a systematic method for solving such equations which is precisely Gaussian elimination on the coefficient matrix [1].
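For reference, the procedure can be written as the standard modern algorithm: eliminate below each pivot, then back-substitute. The sketch below is a textbook version, not a reconstruction of Gauss’s own computation; the function name `gaussian_elimination` is illustrative, and the row swapping (partial pivoting) is a later numerical safeguard.

```python
def gaussian_elimination(a, b):
    """Solve a x = b by forward elimination and back substitution.

    a is a list of n rows (each a list of n numbers), b a list of n numbers.
    Partial pivoting (swapping in the row with the largest pivot) is a modern
    numerical safeguard added here.
    """
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


x = gaussian_elimination([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print(x)  # approximately [2.0, 3.0, -1.0]
```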

Gauss’s career was marked by distinct periods during which he immersed himself first in astronomy, then in geodesy, and then in physics. Yet he regarded himself first and last as ‘entirely a mathematician’. Moreover, Gauss was an outstanding example of the few creative thinkers
who were equally at home in both pure mathematics 
and applied mathematics. Gauss was always trying to 
find new applications of mathematics. He kept many 
little notebooks in which he wrote down ideas and sug- 
gestions as they occurred to him. Always alert to pos- 
sibilities of applying mathematical theories to practical 
problems, he foresaw the use of mathematics not only 
in science and technology, but also in such fields as eco- 
nomics, statistics, finance, and so on. 

During his long and active career, Gauss published a considerable number of books and articles in journals. But upon his death in 1855, many unpublished articles, notes, and manuscripts were found in his desk. When his complete ‘Collected Works’ were finally published, it had taken a group of German scientists nearly seventy years to edit his writings. Even today the name of Gauss recurs throughout mathematics and related areas. We have the Gaussian equations in spherical trigonometry; the hypergeometric series is also called the Gaussian series; the normal probability curve is known as the Gaussian curve; a Gaussian period is a period of congruent roots in the division of the circle; addition and subtraction logarithms are also known as Gaussian logarithms; in higher geometry we speak of Gauss’s theorem and Gauss curvature; certain formulas for approximations are known as Gaussian approximation methods.

To appreciate the genius of a man like Gauss we 
must also see him in perspective, through the eyes of 
his colleagues, his students, his friends, and in terms 
of posterity’s verdict. No other mathematician of the 
nineteenth century ever received as much acclaim and 
recognition as that given to Gauss. 
Gauss-Newton Method: Least Squares, Relation to Newton’s Method


Least squares optimization appears most often in parameter estimation problems involving nonlinear models. In this problem the objective is to minimize the squared distance between observed values and the values fitted by a model with adjustable parameters. For a single-equation model the formulation becomes

$$\min_{\theta} S(\theta) = \sum_{\mu=1}^{n} \left[ y_\mu - f(\theta, x_\mu) \right]^2 \qquad (1)$$

where θ are the adjustable model parameters, y_μ is the observed value of the dependent variable (assumed to contain error) at the μth data point, x_μ are the observed values of the independent variables (assumed error free) at the μth data point, and n is the total number of data points observed.

This is a very common and well-studied problem, and as a result many different solution methods exist. Two of the earliest, Newton’s method and the Gauss-Newton approach, are discussed here, and the relationship between the two is presented.


Newton’s Method 


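Newton’s method is derived from a second-order Taylor series expansion of the objective function around the current ‘guess’ of the solution θ_i:

$$Q_i(\theta) = S(\theta_i) + q_i^T (\theta - \theta_i) + \frac{1}{2} (\theta - \theta_i)^T H_i (\theta - \theta_i) \qquad (2)$$

with

$$q_i = \left. \frac{\partial S}{\partial \theta} \right|_{\theta_i} = -2 \sum_{\mu=1}^{n} e_\mu \frac{\partial f_\mu}{\partial \theta} \qquad (3)$$

$$H_i = \left. \frac{\partial^2 S}{\partial \theta \, \partial \theta^T} \right|_{\theta_i} = -2 \sum_{\mu=1}^{n} e_\mu \frac{\partial^2 f_\mu}{\partial \theta \, \partial \theta^T} + 2 \sum_{\mu=1}^{n} \frac{\partial f_\mu}{\partial \theta} \frac{\partial f_\mu}{\partial \theta^T} \qquad (4)$$

where e_μ = y_μ − f_μ and f_μ = f(x_μ; θ). In order to find a stationary point of (2), the first-order derivatives are equated to zero:

$$\frac{\partial Q_i}{\partial \theta} = q_i + H_i (\theta - \theta_i) = 0. \qquad (5)$$

If H_i is nonsingular, then the solution of (5) for θ can be written as

$$\theta = \theta_i - H_i^{-1} q_i. \qquad (6)$$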


The method is implemented in an iterative fashion, where the value of θ from (6) is used as the next ‘guess’ of the solution. The iterations continue until a convergence criterion is met. In theory this criterion should be based on the first-order derivatives being equal to zero, but for practical and numerical reasons it is most often based on the change in the parameter values, for example

$$\frac{|\theta_{i+1} - \theta_i|}{|\theta_i| + \varepsilon_1} < \varepsilon_2, \qquad (7)$$

where ε_1 and ε_2 are arbitrary small constants.
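The iteration (6) with stopping rule (7) can be sketched for the simplest case of a single parameter, where H_i is a scalar and both terms of (4) are kept. The model f(θ, x) = exp(θx), the synthetic noise-free data and the function name `newton_fit` are assumptions made for illustration; the sketch has no safeguard against a negative H_i far from the solution.

```python
import math


def newton_fit(xs, ys, theta, eps1=1e-8, eps2=1e-10, max_iter=50):
    """Newton's method (6) with stopping rule (7) for the hypothetical
    one-parameter model f(theta, x) = exp(theta * x)."""
    for _ in range(max_iter):
        q = 0.0  # first derivative of S, eq. (3)
        h = 0.0  # second derivative of S, eq. (4), both terms kept
        for x, y in zip(xs, ys):
            f = math.exp(theta * x)
            e = y - f
            df = x * f        # df/dtheta
            d2f = x * x * f   # d2f/dtheta2
            q += -2.0 * e * df
            h += -2.0 * e * d2f + 2.0 * df * df
        if h == 0.0:
            raise ZeroDivisionError("H is singular; the Newton step is undefined")
        new_theta = theta - q / h                                 # eq. (6)
        if abs(new_theta - theta) / (abs(theta) + eps1) < eps2:  # eq. (7)
            return new_theta
        theta = new_theta
    return theta


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 * x) for x in xs]  # synthetic, noise-free data
print(round(newton_fit(xs, ys, 0.3), 6))  # 0.5
```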


Properties of Newton’s Method 


Newton’s method has the following properties [11]: 

• Converges in one iteration if S(θ) is quadratic, as is the case when the model f(θ, x) is linear in the parameters.
• Requires that both the first and second derivatives of S(θ) be computed.
• Requires the inversion of the Hessian matrix of S(θ) at each iteration (an O(n^3) operation).
• The iteration is undefined when H is singular.
• H is required to be positive definite for the step to reduce the value of the objective function.
• Outside the neighborhood of the minimum, convergence is not guaranteed.

Many of these properties, especially the requirement of second derivatives, make this method impractical for most physically significant problems.


Gauss-Newton Method


The method developed by C.F. Gauss [7] attempts to overcome some of the drawbacks of the original Newton approach. A closer look at (4) shows that for small errors e_μ the first term in the equation is approximately zero:

$$-2 \sum_{\mu=1}^{n} e_\mu \frac{\partial^2 f_\mu}{\partial \theta \, \partial \theta^T} \approx 0 \quad \text{for } e_\mu \ll 1. \qquad (8)$$
Therefore the Hessian matrix H_i can be approximated as

$$H_i \approx H_i^* = 2 \sum_{\mu=1}^{n} \frac{\partial f_\mu}{\partial \theta} \frac{\partial f_\mu}{\partial \theta^T}. \qquad (9)$$

A step in the solution method then takes the form

$$\theta_{i+1} = \theta_i - H_i^{*-1} q_i. \qquad (10)$$


This method can be viewed as linearizing the nonlinear model and then solving the resulting linear regression to determine the starting point for the next iteration [4].
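A minimal sketch of the Gauss-Newton step (9)-(10) for a two-parameter model may make the linearization concrete. It assumes the hypothetical model f = a·exp(bx) and synthetic noise-free data, solves the 2×2 system H* v = −q by Cramer's rule instead of forming an inverse, and takes full steps (steplength 1), so it carries no guarantee of convergence from poor starting points.

```python
import math


def gauss_newton(xs, ys, a, b, tol=1e-12, max_iter=50):
    """Gauss-Newton iteration (10) for the hypothetical model f = a * exp(b * x).

    The approximate Hessian H* of (9) and the gradient q of (3) are accumulated
    over the data points, and a full step (steplength 1) is taken each iteration.
    """
    for _ in range(max_iter):
        h11 = h12 = h22 = q1 = q2 = 0.0
        for x, y in zip(xs, ys):
            g = math.exp(b * x)
            e = y - a * g            # residual e_mu
            j1, j2 = g, a * x * g    # df/da and df/db
            h11 += 2.0 * j1 * j1
            h12 += 2.0 * j1 * j2
            h22 += 2.0 * j2 * j2
            q1 += -2.0 * e * j1
            q2 += -2.0 * e * j2
        det = h11 * h22 - h12 * h12
        if det == 0.0:
            raise ValueError("H* is singular: the Jacobian is rank deficient")
        # Solve H* v = -q by Cramer's rule (2 x 2 case only).
        v1 = (-q1 * h22 + q2 * h12) / det
        v2 = (-q2 * h11 + q1 * h12) / det
        a, b = a + v1, b + v2
        if abs(v1) + abs(v2) < tol:
            break
    return a, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # synthetic, noise-free data
a_hat, b_hat = gauss_newton(xs, ys, 1.5, 0.4)
print(round(a_hat, 6), round(b_hat, 6))  # 2.0 0.5
```

Only first derivatives of the model appear, which is the practical advantage over the Newton sketch with its second-derivative term.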

The Gauss-Newton method has the following properties [12]:

• Only first derivatives of the model (and hence of S(θ)) need to be computed at each iteration.
• The approximate Hessian matrix H* is positive semidefinite by construction, and positive definite whenever the Jacobian of the model has full column rank; because of this structure, it is also much easier to invert.
• The approximation becomes exact if the errors e_μ tend to zero at the minimum.
• Outside the neighborhood of the minimum, convergence is not guaranteed.

These properties offer improvements over the Newton method, especially in the computational effort required.


Comparisons Between Newton and Gauss-Newton Method


Various comparisons have been made between these two methods:

1) If the model fits the data well (i.e., all e_μ are small at the solution), then the Gauss-Newton method often requires no more iterations than the Newton method [1].

2) If the model does not fit the data well (i.e., some e_μ do not tend to zero at the solution), then the Newton method will require fewer iterations than the Gauss-Newton method, but the computation times will be similar [6].

Both of these methods are similar in that they fall under the category of gradient-based approaches. In general, a gradient method is an iterative method in which the step at each iteration is defined as

$$\theta_{i+1} = \theta_i - p_i R_i q_i, \qquad (11)$$

where q_i is the gradient defined in (3), p_i is the steplength, and R_i is a matrix which should be positive definite. In the Newton method R_i is the inverse Hessian H_i^{-1}, while the Gauss-Newton method uses the approximation H_i^{*-1}. As mentioned earlier, the inverse Hessian is not always positive definite, while the approximation is, except when the Jacobian matrix of the model, ∂f_μ/∂θ, is rank deficient. In the basic implementations of both methods, the steplength p_i is taken as 1.


Variable Steplength 


One obvious extension of the method is to select a steplength other than one. At each iteration, the search direction given by the Gauss-Newton step is downhill because the approximate Hessian is positive definite. However, the step does not necessarily reduce the objective function S, since it may overshoot the minimum along that direction. Therefore a steplength p should be chosen such that at least

$$S(\theta_{i+1}) < S(\theta_i). \qquad (12)$$


One such method can be found in [3]. First define the function W_i(p) as

$$W_i(p) = S(\theta_i - p R_i q_i). \qquad (13)$$

By definition, W_i(0) = S(θ_i). An initial value p^0 is chosen and W_i(p^0) is calculated. If W_i(p^0) is greater than W_i(0), then this value of p is clearly not acceptable. Even if the value of p is acceptable, the following process may still offer an improvement. The function W_i(p) can be approximated by a quadratic function which matches W_i at p = 0 and p = p^0, and matches its slope at p = 0. The function takes the form


$$W_i(p) \approx a + bp + cp^2 \qquad (14)$$

with the coefficients defined as

$$a = W_i(0) = S(\theta_i), \qquad b = \left. \frac{dW_i}{dp} \right|_{p=0} = -q_i^T R_i q_i, \qquad c = \frac{W_i(p^0) - a - b\,p^0}{(p^0)^2}.$$

The object is to minimize this approximation over p. A stationary point occurs at

$$p^* = -\frac{b}{2c}, \qquad (15)$$

which is a minimum when c > 0. This calculation can be used in an iterative fashion until an acceptable value of p is found which reduces the objective function. Reference [3] contains a detailed implementation of this iterative calculation.
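One pass of the quadratic fit (14)-(15) takes only a few lines. The sketch below assumes W_i(0), W_i(p^0) and the slope b are already available; the function name `quadratic_steplength` is hypothetical, and the fallback for c ≤ 0 (keeping the trial step) is an assumption, not the procedure of [3]. The example call uses the numbers from the worked example that follows.

```python
def quadratic_steplength(w0, w_trial, slope0, p_trial):
    """Minimizer of the parabola (14) through W(0) and W(p_trial) with slope W'(0).

    w0 = W_i(0) = S(theta_i), w_trial = W_i(p_trial), slope0 = b = dW_i/dp at p = 0.
    """
    a = w0
    b = slope0
    c = (w_trial - a - b * p_trial) / p_trial ** 2
    if c <= 0.0:
        # Hypothetical fallback: the parabola has no minimum, so keep the trial step.
        return p_trial
    return -b / (2.0 * c)   # stationary point (15)


p_star = quadratic_steplength(1.090441, 0.9133969, -2.081916, 1.0)
print(round(p_star, 7))  # 0.5464714
```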


Gauss-Newton Example 


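This example of the Gauss-Newton approach with a variable steplength is taken from [3]. It consists of a two-parameter, single-equation model of the form

$$y = \exp\left[ -\theta_1 x_1 \exp\left( -\frac{\theta_2}{x_2} \right) \right]$$

The parameters θ represent the Arrhenius constants for a first-order irreversible reaction

$$A \xrightarrow{k} B \qquad (16)$$

with x_1 representing the reaction time, x_2 the reaction temperature, and y the fraction of A remaining. The objective is to minimize the least squares function

$$\min S(\theta) = \sum_{\mu=1}^{15} \left[ y_\mu - f_\mu(\theta) \right]^2 \qquad (17)$$

The data for the example can be found in the table below. The gradients q of the objective function take the form

$$q_1 = 2 \sum_{\mu=1}^{15} e_\mu f_\mu x_{\mu 1} \exp\left( -\frac{\theta_2}{x_{\mu 2}} \right) \qquad (18)$$

$$q_2 = -2 \sum_{\mu=1}^{15} e_\mu f_\mu \frac{\theta_1 x_{\mu 1}}{x_{\mu 2}} \exp\left( -\frac{\theta_2}{x_{\mu 2}} \right) \qquad (19)$$

and the approximate Hessian matrix is given by

$$H^*_{lk} = 2 \sum_{\mu=1}^{15} \frac{\partial f_\mu}{\partial \theta_l} \frac{\partial f_\mu}{\partial \theta_k}, \quad l, k = 1, 2 \qquad (20)$$

where

$$\frac{\partial f_\mu}{\partial \theta_1} = -f_\mu x_{\mu 1} \exp\left( -\frac{\theta_2}{x_{\mu 2}} \right) \qquad (21)$$

$$\frac{\partial f_\mu}{\partial \theta_2} = f_\mu \frac{\theta_1 x_{\mu 1}}{x_{\mu 2}} \exp\left( -\frac{\theta_2}{x_{\mu 2}} \right) \qquad (22)$$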
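Expressions (18)-(22) are easy to get wrong by a sign or a factor of x_μ2, so it can help to evaluate them and compare the model derivatives against central finite differences. This is a minimal sketch: the function names are hypothetical, and the check uses a single hypothetical data point (x_1 = 0.1 hr, x_2 = 300 K) at the initial guess θ = (750, 1200) rather than the 15-point data set.

```python
import math


def model(t1, t2, x1, x2):
    """Fraction of A remaining: y = exp(-t1 * x1 * exp(-t2 / x2))."""
    return math.exp(-t1 * x1 * math.exp(-t2 / x2))


def model_partials(t1, t2, x1, x2):
    """Analytic derivatives (21) and (22) with respect to theta_1 and theta_2."""
    f = model(t1, t2, x1, x2)
    g = math.exp(-t2 / x2)
    return -f * x1 * g, f * t1 * x1 * g / x2


def gradient_and_hessian(t1, t2, data):
    """Gradient q of (18)-(19) and approximate Hessian H* of (20).

    data is a list of (x1, x2, y) triples.
    """
    q = [0.0, 0.0]
    hess = [[0.0, 0.0], [0.0, 0.0]]
    for x1, x2, y in data:
        e = y - model(t1, t2, x1, x2)
        d = model_partials(t1, t2, x1, x2)
        for l in range(2):
            q[l] += -2.0 * e * d[l]               # q_l = -2 sum(e df/dtheta_l)
            for k in range(2):
                hess[l][k] += 2.0 * d[l] * d[k]   # H*_lk = 2 sum(df/dl df/dk)
    return q, hess


# Central-difference check of (21)-(22) at theta = (750, 1200).
step = 1e-3
d1, d2 = model_partials(750.0, 1200.0, 0.1, 300.0)
fd1 = (model(750.0 + step, 1200.0, 0.1, 300.0) - model(750.0 - step, 1200.0, 0.1, 300.0)) / (2 * step)
fd2 = (model(750.0, 1200.0 + step, 0.1, 300.0) - model(750.0, 1200.0 - step, 0.1, 300.0)) / (2 * step)
print(math.isclose(d1, fd1, rel_tol=1e-6), math.isclose(d2, fd2, rel_tol=1e-6))  # True True
```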


μ     x_1 (hr)   x_2 (K)   y
1     0.10       100       0.980
2     0.20       100       …
3     0.30       100       0.955
4     0.40       100       0.979
5     0.50       100       0.993
6     0.05       200       …
7     0.10       200       …
8     0.15       200       0.455
9     0.20       200       0.255
10    0.25       200       0.167
11    0.02       300       0.566
12    0.04       300       0.317
13    0.06       300       0.034
14    0.08       300       0.016
15    0.10       300       0.066

Using the initial guess θ_1 = (750, 1200)^T, the value of the objective function, the gradients, and the approximate Hessian were calculated:

$$S(\theta_1) = 1.090441, \qquad H_1 = \begin{pmatrix} 0.2689478 & -0.7730614 \\ -0.7730614 & 2.310325 \end{pmatrix} \times 10^{-5}$$

The search direction v_1 is calculated from −H_1^{-1} q_1. This is generally accomplished by solving the linear system:

$$H_1 v_1 = -q_1 \qquad (23)$$


= Hrvy => qi. (23) 


Many different numerical techniques exist for the solution of (23); see [5] or [13] for examples. The calculation results in:

$$v_1 = \begin{pmatrix} -644.9785 \\ -512.9099 \end{pmatrix}$$

Initially using a stepsize ρ⁰ = 1, the parameter values become:

$$\theta^0 = \begin{pmatrix} 105.0215 \\ 687.0901 \end{pmatrix}$$
An objective value of S(θ⁰) = 0.9133969 results, which is less than S(θ_1). Even though this is an acceptable value, a different stepsize may still give a better result. The approximation (14) is therefore fitted using the following values:

$$W_1(0) = 1.090441, \qquad W_1(1) = 0.9133969, \qquad \left.\frac{dW_1}{d\rho}\right|_{\rho=0} = -2.081916.$$


The parabola has a minimum, given by (15), at a steplength ρ* = 0.5464714. The resulting values of the parameters using this steplength are:

$$\theta' = \begin{pmatrix} 397.5376 \\ 919.7092 \end{pmatrix}$$

An objective value of S(θ') = 0.3345645 results, which is a large improvement over S(θ⁰). This parameter set θ' is accepted as θ_2, and the iterations continue. The results of the iterations can be found in the table below.


i    S(θ_i)        θ_{i,1}     θ_{i,2}
1    1.090441      750         1200
2    0.3345645     397.5376    919.7092
3    0.05765885    646.0847    938.5288
4    0.04038005    810.6260    965.7625
5    0.03980731    818.3628    962.1228
6    0.03980599    813.4583    960.9063
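The first iteration can be checked numerically from the printed values. The gradient q_1 is not legible in the source, so this sketch assumes that (23) holds exactly and recovers q_1 = −H_1 v_1; the slope q_1ᵀv_1 (which equals b = −q_1ᵀH_1⁻¹q_1 in (14)) should then match dW_1/dρ, and θ_1 + ρ*v_1 should match θ_2 in the table.

```python
import numpy as np

theta_1 = np.array([750.0, 1200.0])
H_1 = np.array([[0.2689478, -0.7730614],
                [-0.7730614, 2.310325]]) * 1e-5
v_1 = np.array([-644.9785, -512.9099])
rho_star = 0.5464714

q_1 = -H_1 @ v_1                      # assumption: (23), H_1 v_1 = -q_1, holds exactly
slope = float(q_1 @ v_1)              # dW_1/drho at rho = 0
theta_2 = theta_1 + rho_star * v_1    # parameters after the parabolic step

print(q_1, slope, theta_2)
```

The slope comes out at about −2.08192 and θ_2 at about (397.5377, 919.7094), matching the reported −2.081916 and (397.5376, 919.7092) to within rounding of the printed data.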


The values of the parameters and the objective function at the sixth iteration are accepted as the solution to the problem. The final values of the gradient and the approximate Hessian are:

$$q^* = \begin{pmatrix} -0.218524 \\ 0.631308 \end{pmatrix} \times 10^{\cdots}, \qquad H^* = \begin{pmatrix} 0.271890 & -0.957336 \\ -0.957336 & 3.50371 \end{pmatrix} \times 10^{\cdots}$$


The above calculation benefited from the fact that the initial guess for the parameter values was relatively close to the solution. Take now the same example, but use the following parameter values as the starting point of the calculation:

$$\theta_1 = \begin{pmatrix} 100 \\ 2000 \end{pmatrix}$$


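This is obviously a 'worse' starting point than in the previous calculation. Using these parameter values, the following results are obtained:

$$S(\theta_1) = 5.299502, \qquad q_1 = \begin{pmatrix} -0.0007098080 \\ 0.0002442936 \end{pmatrix}, \qquad H_1 = \begin{pmatrix} 0.7036033 & -0.2354773 \\ -0.2354773 & 0.07896382 \end{pmatrix} \times 10^{-7},$$

$$v_1 = \begin{pmatrix} -134608.0 \\ -432361.0 \end{pmatrix}, \qquad \theta^0 = \begin{pmatrix} -134508.0 \\ -430361.0 \end{pmatrix}.$$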
Using the value of θ⁰ calculated with ρ⁰ = 1, it is not possible to evaluate the objective function, since the resulting exponentials are too large. The value of ρ was repeatedly halved until a reasonable value of the objective function was obtained. The value ρ⁰ = 2^{−8} = 0.00390625 resulted in:

$$\theta^0 = \begin{pmatrix} \cdots \\ 311.0039 \end{pmatrix}$$

An objective value of S(θ⁰) = 0.3366272 × 10^{…} results, which is not acceptable because it does not decrease the objective. The stepsize needs to be adjusted such that the objective function decreases. This is accomplished in the same way as outlined previously. The parabolic approximation reaches a minimum at ρ* ≈ 5 × 10^{…}. This is too small to be practical, so a value of ρ¹ = ρ⁰/4 is used instead. This results in S(θ¹) = 5.471375, which again is not acceptable since it is larger than S(θ_1). The value of ρ is iterated on until an acceptable value is determined. Finally, after three more iterations, ρ* = 0.0000619701, which produces:

$$\theta^* = \begin{pmatrix} 91.65955 \\ 1973.211 \end{pmatrix}$$
An objective value of S(θ*) = 5.299135 results, which is just less than the original value of 5.299502 but, given the criterion in (12), is acceptable. θ* is accepted as θ_2 and the iterations continue.

The solution, in this case, is obtained after 25 iterations. This illustrates the major drawback of the Gauss-Newton method: without a 'good' initial guess, convergence to the solution is slow at best and is not guaranteed. In fact, without a variable stepsize, the algorithm would have blown up after just one iteration.
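The complete procedure, Gauss-Newton directions from (23) combined with a variable steplength, can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code of [3]: it halves ρ until S can be evaluated and decreases (treating overflow as an infinite objective), then tries the parabolic minimizer (15) and keeps whichever of the two steps is better. The iteration limits and tolerance are arbitrary choices.

```python
import numpy as np


def residuals_and_jacobian(theta, x1, x2, y):
    """Residuals e = y - f and model Jacobian df/dtheta for model (16)."""
    t1, t2 = theta
    g = np.exp(-t2 / x2)
    f = np.exp(-t1 * x1 * g)
    jac = np.column_stack([-f * g * x1, t1 * f * x1 * g / x2])  # (21), (22)
    return y - f, jac


def objective(theta, x1, x2, y):
    """S(theta) from (17); overflow or NaN is reported as +inf (an unusable step)."""
    with np.errstate(over="ignore", invalid="ignore"):
        e, _ = residuals_and_jacobian(theta, x1, x2, y)
        s = float(e @ e)
    return s if np.isfinite(s) else np.inf


def gauss_newton(theta, x1, x2, y, max_iter=100, tol=1e-10, max_halvings=60):
    theta = np.asarray(theta, dtype=float)
    s = objective(theta, x1, x2, y)
    for _ in range(max_iter):
        e, jac = residuals_and_jacobian(theta, x1, x2, y)
        q = -2.0 * jac.T @ e                  # gradient of S
        h = 2.0 * jac.T @ jac                 # approximate Hessian (20)
        v = np.linalg.solve(h, -q)            # search direction from (23)
        slope = float(q @ v)                  # dW/drho at rho = 0 (negative)

        # Halve rho until S can be evaluated and decreases.
        rho, s_trial = 1.0, objective(theta + v, x1, x2, y)
        halvings = 0
        while s_trial >= s and halvings < max_halvings:
            rho *= 0.5
            s_trial = objective(theta + rho * v, x1, x2, y)
            halvings += 1
        if s_trial >= s:
            break                             # no decrease found: stop here

        # Try the minimizer (15) of the parabola (14) through W(0), W'(0), W(rho).
        c = (s_trial - s - slope * rho) / rho ** 2
        if c > 0.0:
            rho_p = -slope / (2.0 * c)
            s_p = objective(theta + rho_p * v, x1, x2, y)
            if s_p < s_trial:
                rho, s_trial = rho_p, s_p

        step = rho * v
        theta, s = theta + step, s_trial
        if np.linalg.norm(step) <= tol * (1.0 + np.linalg.norm(theta)):
            break
    return theta, s
```

Because the table above is incomplete, a sanity check of this sketch is best done on synthetic, noise-free observations generated from θ = (813.4583, 960.9063) at design points patterned on the example; starting from (750, 1200), the iteration should then recover those parameters.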


Modifications and Applications 


A very large number of variations on the basic Gauss-Newton algorithm exist. For the most part, these variations include methods to determine the stepsize and approaches that improve the accuracy of the approximate Hessian matrix. For examples of different variations see [10] or [8]. Others have carried out comparisons and numerical experiments with popular variations to test their applicability to a wide range of problems [2,15]. The algorithm has also been applied to what is referred to as weighted least squares (WLS), in which each term in the objective function receives a different coefficient:

$$\min_{\theta} S(\theta) = \sum_{\mu=1}^{n} w_\mu \left[y_\mu - f(\theta, x_\mu)\right]^2 \qquad (24)$$

where w_μ is the weighting for the μth data point; see [14] and [9] for examples.
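For the weighted problem (24), the Gauss-Newton quantities simply carry the weights: q = −2JᵀWe and H = 2JᵀWJ with W = diag(w_μ). A minimal sketch of one such step follows; the function name is hypothetical. For a model that is linear in θ, a single step from any starting point lands on the weighted least squares solution, which makes an easy check.

```python
import numpy as np


def weighted_gn_step(theta, jac, resid, w):
    """One Gauss-Newton step for the weighted objective (24).

    jac   -- model Jacobian df/dtheta at theta, shape (n, p)
    resid -- residuals y - f(theta), shape (n,)
    w     -- nonnegative weights w_mu, shape (n,)
    """
    q = -2.0 * jac.T @ (w * resid)            # gradient of S(theta)
    h = 2.0 * jac.T @ (w[:, None] * jac)      # weighted approximate Hessian
    return theta + np.linalg.solve(h, -q)     # theta + v with H v = -q


# Straight-line model f = theta_0 + theta_1 x (linear in theta, so one step suffices)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.2, 6.8])
w = np.array([1.0, 4.0, 1.0, 0.5])
A = np.column_stack([np.ones_like(x), x])
theta0 = np.zeros(2)
theta1 = weighted_gn_step(theta0, A, y - A @ theta0, w)
print(theta1)
```

The result coincides with an ordinary least squares fit of the √w-scaled system, the usual way WLS is reduced to an unweighted problem.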


See also 


> ABS Algorithms for Linear Equations and Linear 
Least Squares 

> ABS Algorithms for Optimization 

> Gauss, Carl Friedrich 

> Generalized Total Least Squares 

> Least Squares Orthogonal Polynomials 

> Least Squares Problems 

> Nonlinear Least Squares: Newton-type Methods 

> Nonlinear Least Squares Problems 

> Nonlinear Least Squares: Trust Region Methods 
Introduction


The aim of cluster analysis is to establish a set of clusters such that the data points in a cluster are more similar to one another than they are to those in other clusters. The clustering problem is old: it can be traced back to Aristotle and was already studied quite extensively by 18th-century naturalists such as Buffon, Cuvier, and Linné [19]. Since then, clustering has been used in many disciplines, such as market research, social network analysis, and geology, reflecting its broad appeal and utility as a key step in exploratory data analysis [26]. In market research, for instance, cluster analysis is widely used when working with multivariate data from surveys and test panels. Market researchers use cluster analysis methods to segment and determine target markets and to position new products. Cluster analysis is also used in market-based approaches to establishing the value of a business enterprise; Johnson [28] addresses the potential role and utility of cluster analysis in transfer pricing practices. Given the importance of clustering, a substantial number of books, such as [11,20,27,39], as well as review papers, such as [58], have been published on this subject.
In biology, clustering provides insights into tran- 
scriptional networks, physiological responses, gene 
identification, genome organization, and protein struc- 
ture. Genome-wide measurement of mRNA expression 
levels is an efficient way of gathering comprehensive 
information on genetic functions and transcriptional 
networks. However, extracting useful information from 
the resulting data sets first involves organizing genes 
by their pattern and/or intensity of expression in or- 
der to define those that are co-regulated. Such in- 
formation provides a basis for extracting regulatory 
motifs for transcription factors driving the diverse ex- 
pression patterns, allowing assembly of predictive tran- 


scriptional networks [2]. This information also pro- 
vides insights into the functions of unknown genes, 
since functionally related genes are often co-regulated 
[55]. Furthermore, clustered array data provides iden- 
tification of distinct categories of otherwise indistin- 
guishable cell types, which can have profound implica- 
tions in processes such as disease progression [50]. In 
sequence analysis, clustering is used to group homol- 
ogous sequences into gene families. Examining char- 
acteristic DNA fragments helps in the identification of 
gene structures and reading frames. In protein struc- 
ture prediction, clustering the ensemble of low energy 
conformers is used to identify the top suggested protein 
structures. 

Two common similarity metrics are correlation and Euclidean distance. The latter is often popular, since it is intuitive, can be described by a familiar distance function, and satisfies the triangle inequality. Clustering methods that employ asymmetric distance measures [33,41] are probably more difficult to comprehend intuitively, even though they may be highly suited to their intended applications. The earliest work on clustering emphasized visual interpretation for ease of study, resulting in methods that use dendrograms and color maps [5]. Other examples of clustering algorithms include: (a) Single-Link and Complete-Link Hierarchical Clustering [27,49], (b) the K-Means Algorithm and its family of variants, such as K-Medians [21,34,37,60,61], (c) Reformulation Linearization-based Clustering [1,46], (d) Fuzzy Clustering [3,9,44,47], (e) the Quality Threshold Clustering algorithm (QTClust) [23], (f) Graph-Theoretic Clustering [17,57,59], (g) Mixture-Resolving Clustering Methods [7,26], (h) Mode-Seeking Algorithms [26], (i) Artificial Neural Networks for Clustering [4,31], such as the Self-Organizing Map (SOM) [32] and a variant that combines the SOM with hierarchical clustering, the Self-Organizing Tree Algorithm (SOTA) [22], (j) Information-Based Clustering [8,48,54], and (k) Stochastic Approaches [30,36,38]. Some of these methods, such as K-Means and Information-Based Clustering, are optimization-based approaches, in which the clustering is represented as an unknown parameter vector of a cost function. The process then seeks the best clustering by minimizing this cost function. Other classes of clustering methods, such as competitive learning, may not have a straightforward cost function. For instance, in the SOM, cluster centers are arbitrarily chosen initially, after which random data points are selected and placed into the nearest cluster, whose center is updated after each selection. Clustering ceases when the cluster centers become stationary.
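The competitive-learning update described above can be sketched as a winner-take-all loop. This simplified sketch omits the SOM's neighborhood function and map topology, and the learning-rate schedule, epoch limit and stopping tolerance are assumptions; it only illustrates how the nearest center is pulled toward each randomly selected point until the centers stop moving.

```python
import numpy as np


def competitive_learning(data, centers, lr0=0.5, max_epochs=200, tol=1e-4, seed=0):
    """Winner-take-all updates: the nearest center moves toward each sampled point."""
    rng = np.random.default_rng(seed)
    centers = np.array(centers, dtype=float)
    step = 0
    for _ in range(max_epochs):
        before = centers.copy()
        for idx in rng.permutation(len(data)):          # points in random order
            step += 1
            lr = lr0 / (1.0 + 0.01 * step)              # decaying learning rate
            x = data[idx]
            winner = int(np.argmin(((centers - x) ** 2).sum(axis=1)))
            centers[winner] += lr * (x - centers[winner])
        if np.max(np.abs(centers - before)) < tol:      # centers (nearly) stationary
            break
    return centers
```

Unlike K-Means, no global cost function is evaluated anywhere in this loop; the centers are driven only by the sequence of local updates.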

Recently, Tan et al. [51,52] presented a novel optimization-based Mixed-Integer Nonlinear Programming (MINLP) clustering algorithm, the Global Optimal Search with Enhanced Positioning (EP_GOS_Clust), which is robust yet intuitive. This algorithm is significant in that it is able to progressively identify and weed out outlier data points. In addition, it involves a pre-clustering process that is rigorous and has a clearly defined decision criterion. This is notable because the results of many clustering methods based on function optimization often vary with the random initialization or starting heuristics. EP_GOS_Clust also contains a convenient method to predict the optimal number of clusters. The algorithm has been compared with several approaches commonly used to cluster biological microarray data, namely the K-family methods, QTClust, SOM, and SOTA. By comparing the intra-cluster and inter-cluster error sums, as well as the strength of biological coherence based on Gene Ontology resources and expression pattern correlation, EP_GOS_Clust is shown to compare favorably against the other methods. The following sections describe this clustering approach in more detail.


Formulations 
Notation and Pre-Clustering 


The value of feature (or dimension) k, k = 1,…,s, for gene i, i = 1,…,n, is denoted a_ik. Each gene is to be assigned to exactly one (hard clustering) of c possible clusters, each with center z_jk, j = 1,…,c. The binary variable w_ij indicates whether gene i falls within cluster j (w_ij = 1 if yes; w_ij = 0 if no).

Pre-clustering the data is important to reduce the computational resources required to solve the hard clustering problem, by (i) identifying genes with similar experimental responses, and (ii) removing outliers deemed not to be significant to the clustering process. A straightforward pre-clustering approach that provides just the adequate amount of discriminatory characteristics for the genes to be pre-clustered properly is

to reduce the quantities represented in the k-dimen- 
sional expression vectors into a set of representative 
variables {+,0, —}. The (+) variable represents an in- 
crease in expression level compared to the previous 
time point, the (—) variable represents a decrease in 
expression level from the previous time point, and the 
(0) variable represents an expression level that does not 
vary significantly across the time points. The expression 
data can also be pre-clustered by creating a rank-or- 
dered list of gene proximities based on Euclidean dis- 
tance or correlation. Genes that demonstrate an ob- 
vious level of proximity, such as a separation of only 
at most 1% of the maximum inter-gene distances, are 
then grouped together. The pre-clusters are the prox- 
imity genes that form a complete clique, that is, there is 
a link between every gene within the same pre-cluster. 
With this choice, a maximal clique search can be per- 
formed by using various levels of pre-clustering crite- 
ria. Clearly, when the criterion is overly lenient, a large 
number of pre-clusters are formed, but most of the 
genes will belong to multiple pre-clusters, and the num- 
ber of maximal cliques formed is small. On the other 
hand, an unnecessarily strict cut-off results in a small 
number of pre-clusters, thus not accurately reflecting 
the extent of relatedness between the data. In pre-clus- 
tering over a range of cut-off values, we can then select 
the appropriate criterion as the point where the maxi- 
mum number of complete cliques is formed [53]. 
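The symbolic variant of pre-clustering can be illustrated as follows: each gene's profile is reduced to a string over {+, 0, −} by comparing consecutive time points, and genes with identical strings are grouped. The threshold that separates '0' from '+' or '−' is an assumed parameter here, the helper names are hypothetical, and the clique-based variant is not shown.

```python
import numpy as np
from collections import defaultdict


def symbolic_profile(row, threshold):
    """Encode successive changes as '+', '-' or '0' (|change| <= threshold)."""
    diffs = np.diff(row)
    return "".join("+" if d > threshold else "-" if d < -threshold else "0" for d in diffs)


def precluster_by_profile(expr, threshold):
    """Group genes (rows of expr) that share the same symbolic profile."""
    groups = defaultdict(list)
    for gene, row in enumerate(expr):
        groups[symbolic_profile(row, threshold)].append(gene)
    return dict(groups)


expr = np.array([[1.0, 2.0, 2.05, 1.0],
                 [0.5, 1.6, 1.6, 0.2],
                 [3.0, 2.0, 1.0, 0.0]])
print(precluster_by_profile(expr, threshold=0.1))
```

Here the first two genes share the profile "+0-" even though their magnitudes differ, which is exactly the coarse discrimination the pre-clustering step is meant to provide.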


Hard Clustering by Global Optimization The global optimization approach seeks to minimize the Euclidean distances between the data points and the centers of their assigned clusters as:

$$\begin{aligned}
\min_{w_{ij},\,z_{jk}} \quad & \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} w_{ij}\,(a_{ik} - z_{jk})^2 \\
\text{s.t.} \quad & \sum_{j=1}^{c} w_{ij} = 1, \quad \forall i = 1,\dots,n \\
& w_{ij} \text{ binary}, \quad z_{jk} \text{ continuous}
\end{aligned}$$

(Problem 1)
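For a fixed assignment, the objective of Problem 1 is a convex quadratic in z_jk and is minimized by the cluster means; this is the first-order condition used later in Problem 2. The sketch below represents w_ij implicitly by a label vector (w_ij = 1 exactly when `labels[i] == j`), which enforces Σ_j w_ij = 1 automatically; the helper names are hypothetical.

```python
import numpy as np


def problem1_objective(a, labels, z):
    """Sum over genes of the squared Euclidean distance to the assigned center."""
    return float(((a - z[labels]) ** 2).sum())


def centroids(a, labels, c):
    """Cluster means: the optimal centers for a fixed assignment (no empty clusters assumed)."""
    return np.array([a[labels == j].mean(axis=0) for j in range(c)])
```

Moving any center away from its cluster mean increases the objective by n_j times the squared displacement, which is why only the assignment variables make Problem 1 hard.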


There are two sets of variables in the problem, w_ij and z_jk. While the bounds of w_ij are clearly 0 and 1, those of z_jk are obtained by observing the range of the a_ik values:

$$z_{jk}^{L} = \min_{i}\{a_{ik}\}, \qquad z_{jk}^{U} = \max_{i}\{a_{ik}\}, \qquad \forall k = 1,\dots,s.$$
The pre-clustering work suggests that some of the genes need only be restricted to some number of known clusters, since it can be determined (for instance by distance and correlation metrics) that certain genes are exceedingly dissimilar from some of the pre-clusters and thus have virtually zero probability of being clustered there. This restriction can be described by introducing an additional binary parameter suit_ij. A data point deemed to belong uniquely to just one cluster has suit_ij = 1 for only one value of j and zero for the others, whereas a data point restricted to a few clusters has suit_ij = 1 for only those clusters. This reduces the computational demands of the problem. The introduction of the suit_ij parameters also obviates the need for constraints that prevent the redundant re-indexing of clusters. Together with the necessary first-order optimality condition (i.e., the vector distance sum of all genes within a cluster to the cluster center should be zero), the formulation becomes:


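$$\begin{aligned}
\min_{w_{ij},\,z_{jk}} \quad & \sum_{i=1}^{n}\sum_{k=1}^{s} a_{ik}^2 - \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} \mathrm{suit}_{ij}\, a_{ik} w_{ij} z_{jk} \\
\text{s.t.} \quad & z_{jk}\sum_{i=1}^{n} \mathrm{suit}_{ij} w_{ij} - \sum_{i=1}^{n} \mathrm{suit}_{ij}\, a_{ik} w_{ij} = 0, \quad \forall j, \forall k \\
& \sum_{j=1}^{c} \mathrm{suit}_{ij} w_{ij} = 1, \quad \forall i \\
& 1 \le \sum_{i=1}^{n} \mathrm{suit}_{ij} w_{ij} \le n - c + 1, \quad \forall j \\
& w_{ij} \in \{0, 1\}, \quad \forall i, \forall j \\
& z_{jk}^{L} \le z_{jk} \le z_{jk}^{U}, \quad \forall j, \forall k
\end{aligned}$$

(Problem 2)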
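The objective of Problem 2 follows from that of Problem 1: expanding Σ w_ij (a_ik − z_jk)² and using the first constraint, which gives Σ_i w_ij z_jk² = Σ_i w_ij a_ik z_jk, leaves Σ_i Σ_k a_ik² − Σ_i Σ_j Σ_k a_ik w_ij z_jk. A quick numerical check of this identity, assuming every suit_ij = 1, is sketched below with hypothetical helper names.

```python
import numpy as np


def problem2_objective(a, w, z):
    """Objective of Problem 2 with every suit_ij = 1: sum a^2 - sum a_ik w_ij z_jk."""
    return float((a ** 2).sum() - np.einsum("ik,ij,jk->", a, w, z))


def problem1_objective(a, w, z):
    """Objective of Problem 1 written with the binary matrix w."""
    sq = ((a[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)  # (i, j) squared distances
    return float((w * sq).sum())


rng = np.random.default_rng(0)
a = rng.normal(size=(12, 3))
w = np.eye(4)[np.arange(12) % 4]              # each gene in exactly one of 4 clusters
z = (w.T @ a) / w.sum(axis=0)[:, None]        # satisfies z_jk sum_i w_ij = sum_i a_ik w_ij
print(problem1_objective(a, w, z), problem2_objective(a, w, z))
```

The two printed values agree; they differ in general once z violates the first set of constraints, which is why those constraints must be kept in Problem 2.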


The first set of constraints are the necessary optimality conditions, the second demands that each gene belong to exactly one cluster, and the third states that there are at least one and no more than (n − c + 1) data points in a cluster. Note also that the Σ_{i=1}^n Σ_{k=1}^s a_ik² term in the objective function of Problem 2 is a constant and can be dropped, though for the sake of completeness we retain the term throughout the subsequent formulations. Problems 1 and 2 are Mixed-Integer Nonlinear Programming (MINLP) problems with bilinear terms in the objective function and the first set of constraints. To handle the nonlinearities formed by the product of the variables w_ij and z_jk, new variables y_ijk along with additional constraints [12] are defined as follows:

$$y_{ijk} = w_{ij} z_{jk} \qquad (1)$$

$$z_{jk} - z_{jk}^{U}(1 - w_{ij}) \le y_{ijk} \le z_{jk} - z_{jk}^{L}(1 - w_{ij}) \qquad (2)$$

$$z_{jk}^{L} w_{ij} \le y_{ijk} \le z_{jk}^{U} w_{ij}, \quad \forall i, \forall j, \forall k \qquad (3)$$


The introduction of y_ijk and the additional constraints reduces the formulation to an equivalent Mixed-Integer Linear Programming (MILP) problem, but results in an inordinately large number of variables. Thus, there is a need for new approaches to address large datasets.
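Constraints (2) and (3) are an exact reformulation of (1) whenever w_ij is binary and z_jk lies within its bounds: for w_ij = 1 they pin y_ijk to z_jk, and for w_ij = 0 they pin it to zero. The small helper below, a hypothetical illustration, returns the interval that (2)-(3) leave for y_ijk; for a fractional w_ij, as in an LP relaxation, that interval widens.

```python
def y_interval(w, z, z_lo, z_hi):
    """Range of y_ijk allowed by constraints (2) and (3) for given w_ij and z_jk."""
    lower = max(z - z_hi * (1 - w), z_lo * w)
    upper = min(z - z_lo * (1 - w), z_hi * w)
    return lower, upper


z_lo, z_hi = -1.5, 2.0
for w in (0, 1):
    for z in (-1.5, 0.3, 2.0):
        print(w, z, y_interval(w, z, z_lo, z_hi))    # lower and upper both equal w * z
print(y_interval(0.5, 0.3, z_lo, z_hi))             # fractional w: a whole interval
```

This gap for fractional w_ij is what makes the relaxation of the MILP weaker than the binary problem itself.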


The GOS Algorithm for Clustering The introduction of the bilinear variables y_ijk results in a large number of variables to be considered. In a problem with over 2000 data points, each having 24 features, to be placed into over 380 clusters, the number of variables exceeds 18 million. Without the y_ijk variables, however, the problem remains nonlinear, and mixed-integer nonlinear programming (MINLP) problems are considered extremely difficult. Theoretical advances and prominent algorithms for solving MINLP problems are addressed in [12,13,15].
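The variable count quoted above follows from introducing one y_ijk per gene, cluster and feature, that is, n·c·s variables; with the rounded problem sizes:

$$n\,c\,s = 2000 \times 380 \times 24 = 18\,240\,000,$$

which is already above 18 million.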
The MINLP clustering formulation described in Problem 2 can be solved by a variant of the Generalized Benders Decomposition (GBD) algorithm [14], denoted as the Global Optimum Search (GOS). The primal problem results from fixing the binary variables to a particular 0-1 combination. Here, w_ij is fixed and z_jk is solved from the resultant linear programming (LP) problem. In addition, the solution also includes the relevant Lagrange multipliers. The master problem is essentially the problem projected onto the w-space (i.e., that of the binary variables). To expedite the solution of this projection, the dual representation of the master is used. This dual representation is in terms of the supporting Lagrange functions of the projected problem. It is assumed that the optimal solution of the primal problem as well as its Lagrange multipliers can be used for the determination of the support function. In the master problem, the z_jk solution from the accompanying primal is taken and the master is solved for the w_ij variables.

The two sequences of upper and lower bounds are 
then iteratively updated until they converge in a finite 
number of iterations. With each successive iteration, 
a new support function is added to the list of constraints 
for the master problem. Thus in a sense, the support 
functions for the master problem build up with each 
iteration, forming a progressively tighter envelope and 
gradually pushing up the lower bound solution until it 
converges with the upper bound solution. 

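With fixed starting values for w_ij, the primal problem becomes:

$$\begin{aligned}
\min_{z_{jk}} \quad & \sum_{i=1}^{n}\sum_{k=1}^{s} a_{ik}^2 - \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} a_{ik} w_{ij} z_{jk} \\
\text{s.t.} \quad & z_{jk}\sum_{i=1}^{n} w_{ij} - \sum_{i=1}^{n} a_{ik} w_{ij} = 0, \quad \forall j, \forall k \\
& z_{jk}^{L} \le z_{jk} \le z_{jk}^{U}, \quad \forall j, \forall k
\end{aligned}$$

(Problem 3.1)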


The primal problem is a Linear Programming (LP) problem. All the other constraints drop out, since they do not involve z_jk, which are the variables to be solved in the primal problem. Besides z_jk, the Lagrange multipliers λ_jk of the equality constraints above are obtained. The objective function value is the upper bound solution. These are passed to the master problem, which becomes:


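$$\begin{aligned}
\min_{w_{ij},\,\mu_B} \quad & \mu_B \\
\text{s.t.} \quad & \mu_B \ge \sum_{i=1}^{n}\sum_{k=1}^{s} a_{ik}^2 - \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} a_{ik} w_{ij} z_{jk}^{m} + \sum_{j=1}^{c}\sum_{k=1}^{s} \lambda_{jk}^{m}\left(z_{jk}^{m}\sum_{i=1}^{n} w_{ij} - \sum_{i=1}^{n} a_{ik} w_{ij}\right), \quad m = 1,\dots,M \\
& \sum_{j=1}^{c} w_{ij} = 1, \quad \forall i \\
& 1 \le \sum_{i=1}^{n} w_{ij} \le n - c + 1, \quad \forall j \\
& w_{ij} \in \{0, 1\}, \quad \forall i, \forall j
\end{aligned}$$

(Problem 3.2)

Here m = 1,…,M indexes the primal solutions obtained so far, each contributing one support function.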


The master problem solves for w_ij and μ_B and yields a lower bound solution (i.e., its objective function value). The master problem is a Mixed-Integer Linear Programming (MILP) problem. The w_ij solutions are cycled back into the primal problem and the process is repeated until the solution converges. Thus, there is no longer a need for the variables y_ijk, which substantially reduces the number of variables to be solved. Also, after every solution of the master problem, where a solution set ŵ_ij is generated, an integer cut is added for subsequent iterations to prevent redundantly considering that particular solution set again. The cut is expressed as:

$$\sum_{(i,j):\,\hat{w}_{ij}=1} w_{ij} \; - \sum_{(i,j):\,\hat{w}_{ij}=0} w_{ij} \le n - 1 \qquad (4)$$

Because each gene belongs to exactly one cluster, ŵ has exactly n entries equal to 1, so the cut excludes ŵ itself and no other assignment.
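The integer cut (4) is easy to generate from the previous master solution ŵ: coefficient +1 where ŵ_ij = 1, −1 where ŵ_ij = 0, and right-hand side n − 1. A minimal sketch with a hypothetical helper name:

```python
import numpy as np


def integer_cut(w_hat):
    """Coefficients (coef, rhs) of cut (4): sum(coef * w) <= rhs excludes w_hat only."""
    w_hat = np.asarray(w_hat)
    coef = np.where(w_hat == 1, 1, -1)
    rhs = int(w_hat.sum()) - 1      # equals n - 1 when every gene is in one cluster
    return coef, rhs


w_hat = np.eye(2, dtype=int)[[0, 1, 1]]     # 3 genes, 2 clusters
coef, rhs = integer_cut(w_hat)
print((coef * w_hat).sum(), rhs)            # prints 3 2: the previous solution violates the cut
```

Any other valid assignment that moves k genes to different clusters gives a left-hand side of n − 2k, so it always satisfies the cut.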


Determining the Optimal Number of Clusters Most clustering algorithms do not contain screening functions to determine the optimal number of clusters. Yet this is important for evaluating the results of cluster analysis in a quantitative and objective fashion. On the other hand, while it is relatively easy to propose indices of cluster validity, it is difficult to incorporate these measures into clustering algorithms and to set thresholds for the key decision values [18,27]. Some of the indices used to compute cluster validity include Dunn's validity index [10], the Davies-Bouldin validity index [6], the Silhouette validation technique [43], the C index [24], the Goodman-Kruskal index [16], the Isolation index [39], the Jaccard index [25], and the Rand index [42]. We note that the optimal number of clusters occurs when the inter-cluster distance is maximized and the intra-cluster distance is minimized. We adapt the concept of a clustering balance [29], which has been shown to have a minimum value when intra-cluster similarity is maximized and inter-cluster similarity is minimized. This provides a measure of how close to optimal a given number of clusters is for a particular clustering algorithm. We introduce the following:


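$$\text{Global center:} \quad z_k^{0} = \frac{1}{n}\sum_{i=1}^{n} a_{ik}, \quad \forall k \qquad (5)$$

$$\text{Intra-cluster error sum:} \quad \Lambda = \sum_{i=1}^{n}\sum_{j=1}^{c}\sum_{k=1}^{s} w_{ij}\,(a_{ik} - z_{jk})^2 \qquad (6)$$

$$\text{Inter-cluster error sum:} \quad \Gamma = \sum_{j=1}^{c}\sum_{k=1}^{s} (z_{jk} - z_k^{0})^2 \qquad (7)$$

Jung et al. [29] proposed a clustering balance parameter, which is the α-weighted sum of the two error sums:

$$\text{Clustering balance:} \quad \varepsilon = \alpha\Lambda + (1 - \alpha)\Gamma \qquad (8)$$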


We note here that the appropriate α-ratio is 0.5. There are two ways to reach this conclusion. First, the factor α should balance the contributions of the two error sums to the clustering balance. At the extreme cluster numbers, that is, the smallest and largest numbers possible, the intra-cluster and inter-cluster error sums should be balanced. In the minimal case, all the data points are placed into a single cluster; the inter-cluster error sum is then zero and the intra-cluster error sum can be calculated with ease. In the maximal case, each data point forms its own cluster; the intra-cluster error sum is then zero and the inter-cluster error sum can be easily found. The intra-cluster error sum in the minimal case and the inter-cluster error sum in the maximal case are equal, since both reduce to the sum of squared distances of all data points from the global center, suggesting that the most appropriate weighting factor is in fact 0.5. The second approach uses a clustering gain parameter proposed by Jung et al. [29], which is given by:

$$\Delta = \sum_{j=1}^{c} (n_j - 1)\sum_{k=1}^{s} (z_k^{0} - z_{jk})^2 \qquad (9)$$

where n_j = Σ_i w_ij is the number of data points in cluster j. Jung et al. [29] showed the clustering gain to have a maximum value at the optimal number of clusters, and demonstrated that the sum of the clustering gain and balance parameters is a constant. This is possible only if the α-ratio is 0.5 [51]. These derivations suggest that for any clustering algorithm, including one based on the GOS algorithm, the optimal number of clusters can be deduced by repeating the clustering process over a suitably large range of cluster numbers and watching for the turning points of the clustering gain or clustering balance.
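Equations (5)-(9) can be computed directly from a label vector, as in the sketch below (names hypothetical). Because the total sum of squares T = Σ_i Σ_k (a_ik − z_k⁰)² splits as T = Λ + Σ_j n_j Σ_k (z_jk − z_k⁰)², it follows that Λ + Γ + Δ = T, i.e. 2ε + Δ = T when α = 0.5; this is the sense in which balance and gain sum to a constant.

```python
import numpy as np


def cluster_scores(a, labels, alpha=0.5):
    """Intra-cluster error sum (6), inter-cluster error sum (7), balance (8), gain (9)."""
    z0 = a.mean(axis=0)                                   # global center (5)
    ids = np.unique(labels)
    z = np.array([a[labels == j].mean(axis=0) for j in ids])
    counts = np.array([(labels == j).sum() for j in ids])
    intra = float(((a - z[np.searchsorted(ids, labels)]) ** 2).sum())
    inter = float(((z - z0) ** 2).sum())
    balance = alpha * intra + (1.0 - alpha) * inter
    gain = float(((counts - 1) * ((z - z0) ** 2).sum(axis=1)).sum())
    return intra, inter, balance, gain
```

Scanning a range of cluster numbers and picking the minimum of the balance (or maximum of the gain) then follows the turning-point rule described above.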


Proposed Algorithm 


The GOS formulation appears to be a suitable cluster- 
ing algorithm. But for it to be effective, the formulation 
must be provided with a good initialization point. Also, 
we want to expeditiously incorporate the approach to 
predict the optimal number of clusters into a cluster- 
ing algorithm. With these considerations in mind, we 
propose the following GOS clustering algorithm with 
enhanced data point positioning (EP_GOS_Clust). 


Gene Pre-Clustering We pre-cluster the original 
data by proximity studies to reduce the computational 
demands by (i) identifying genes with very similar re- 
sponses, and (ii) removing outliers deemed to be in- 
significant to the clustering process. To provide just 
adequate discriminatory characteristics, pre-clustering 
can be done by reducing the expression vectors into 
a set of representative variables or by pre-grouping 
genes that are close to one another by correlation or 
some other distance function. 


Iterative Clustering We let the initial clusters be defined by the genes pre-clustered previously, compute the distance between each of the remaining genes and these initial clusters, and, as a good initialization point, place each of these genes into its nearest cluster. For each gene, we allow its suitability in a limited number of clusters based on the proximity study. In the primal problem of the GOS algorithm, we solve for z_jk. These, together with the Lagrange multipliers, are used in the master problem to solve for w_ij. The primal gives an upper bound solution and the master a lower bound. The optimal solution is obtained when both bounds converge. Then, the worst-placed gene is removed and used as a seed for a new cluster. This gene has already been subjected to a membership search, so there is no reason for it to belong to any one of the older clusters. The iterative steps are repeated and the clusters build up gradually until the optimal number is attained. Figure 1 shows a schematic of EP_GOS_Clust.
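The outer loop of EP_GOS_Clust (converge, then promote the worst-placed gene to seed a new cluster) can be sketched as below. This is a simplified stand-in: nearest-center assignment with re-centering replaces the GOS primal/master iterations, the suitability restrictions are ignored, "worst-placed" is taken to mean the largest squared distance to its own center, and the number of new clusters is fixed by the caller instead of being chosen from the clustering balance. All names are hypothetical.

```python
import numpy as np


def refine(a, centers, max_iter=100):
    """Nearest-center assignment and re-centering (stand-in for the GOS primal/master loop)."""
    for _ in range(max_iter):
        labels = ((a[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([a[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def grow_clusters(a, seeds, n_new):
    """Converge, then promote the worst-placed point to seed a new cluster, n_new times."""
    centers = a[list(seeds)].astype(float)
    new_seeds = []
    labels, centers = refine(a, centers)
    for _ in range(n_new):
        dist2 = ((a - centers[labels]) ** 2).sum(axis=1)   # distance to own center
        worst = int(np.argmax(dist2))                       # worst-placed point
        new_seeds.append(worst)
        centers = np.vstack([centers, a[worst]])            # it seeds a new cluster
        labels, centers = refine(a, centers)
    return labels, centers, new_seeds
```

On data with an isolated point, the first promoted seed is that point, which then ends up alone in its cluster; this is the outlier-isolating behavior the text attributes to the method.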
Gene Clustering: A Novel Decomposition-Based Clustering Approach, Figure 1
Schematic of EP_GOS_Clust algorithm 


Gene Clustering: A Novel Decomposition-Based Clustering Approach, Table 1 
Comparison of cluster correlation. The shaded row contains the results for EP_GOS_Clust and the top three performers in 
each column are marked with an asterisk 


Columns: clustering method; optimal cluster number; correlation coefficient (average, maximum, minimum, standard deviation). Methods compared include EP_GOS_Clust, KMedians, KCityBlk, KMeans, GOS I, KAvePair, SOTA and SOM.

Inter-cluster error sum


Case Study 
Experimental Data 


As a case study, we use experimental microarray data derived from a study of the role of the Ras/protein kinase A (PKA) pathway in glucose signaling in yeast [56]. These experiments analyzed mRNA levels in samples extracted from cells at various times following stimulation by glucose or following activation of either Ras2 or Gpa2, which are small GTPases involved in the metabolic and transcriptional response of yeast cells to glucose [45]. These experiments were performed in wild-type cells and in cells defective in PKA activity. Clustering these microarray data has proven to be a critical step in using the data to develop a predictive model of a topological map of the signaling network surrounding the Ras/PKA pathway [35].

Gene Clustering: A Novel Decomposition-Based Clustering Approach, Figure 4
Error sum difference: difference between the intra-cluster and inter-cluster error sums versus the number of clusters, for EP_GOS_Clust, KMedians, KAvePair, KCityBlk, KCorr, KMeans, MINLP, GOS I, QTClust, SOTA and SOM

Optimal cluster number

Gene Clustering: A Novel Decomposition-Based Clustering Approach, Table 2
Gene Ontology comparison. The table compares the −log10(P) values of the clusters, which reflect the level of annotative richness, as well as the proportion of yeast genes that fall into biologically significant clusters. The latter is important in 'presenting' the maximal amount of relevant genetic information for follow-up work in areas such as motif recognition and regulatory network studies
Columns: clustering method; −log10(P) comparison (average, standard deviation); genes in clusters with −log10(P) values > 3; genes in clusters with −log10(P) values > 4.

Levels of RNA for each of the 6237 yeast genes in each of the RNA samples from the above experiments were measured using Affymetrix microarray chips and analyzed by the Affymetrix software. We used the Affymetrix MicroArray Suite 5.0, which analyzes the consensus of intensities of hybridization of an RNA to the collection of perfect-match probes for a gene on the array, relative to the intensities of hybridization to single-mismatch probes, to determine whether a signal for a specific RNA in a sample was reliable (P, or present), unreliably low (A, or absent), or ambiguous (M). Before clustering the array data, we filtered out unreliable data. In particular, we retained all genes for which all the time points were present (4105 genes), all the genes for which more than 50% of the time points were present, and all the genes for which the present/absent calls exhibited a biologically relevant pattern (e.g. PAAA for the four time points in the experiment, suggesting repression of gene expression over the course of the experiment). In all, we retained 5652 genes.
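The filtering rule above can be written as a small predicate on a gene's sequence of present/absent calls. Only PAAA is named as a biologically relevant pattern in the text, so the default pattern set in this sketch contains just that string; a real analysis would supply its own list. The function name is hypothetical.

```python
def keep_gene(calls, relevant_patterns=frozenset({"PAAA"})):
    """Apply the filtering rule to one gene's calls ('P', 'A' or 'M' per time point)."""
    n_present = calls.count("P")
    if n_present == len(calls):
        return True                       # all time points present
    if n_present > 0.5 * len(calls):
        return True                       # more than half of the time points present
    return calls in relevant_patterns     # e.g. PAAA: repression over the experiment


for calls in ("PPPP", "PPAP", "PPAA", "PAAA", "AAMA"):
    print(calls, keep_gene(calls))
```

Note that a gene with exactly half of its calls present (PPAA) is dropped, since the rule requires more than 50%.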


Description of Comparative Study 


The clustering algorithms to be compared are (a) K-Means, (b) K-Medians, (c) K-Corr, where the Pearson correlation coefficient is the distance metric, (d) K-CityBlock, where the distance metric is the city block or 'Manhattan' distance, akin to the north-south plus east-west walking distance in a place like New York's Manhattan district, (e) K-AvePair, where the cluster metric is the average pair-wise distance between members in each cluster, (f) QTClust, (g) SOM, (h) SOTA, (i) GOS I, where genes with up to 7 different feature points are pre-clustered, initial clusters are defined by uniquely placed genes, and each gene is placed into its nearest cluster as the initialization point, and (j) EP_GOS_Clust, for which genes are pre-clustered if they have 2 or fewer different feature points and can be uniquely clustered. Since the K-family approaches are sensitive to the initialization point, we run each 25 times and use only the best result.
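The restart strategy used for the K-family methods can be sketched with a plain Lloyd K-means. The text does not say how the best of the 25 runs was chosen; this sketch assumes the run with the smallest intra-cluster error sum (6) is kept, and seeding each run with randomly chosen data points is likewise an assumption. Function names are hypothetical.

```python
import numpy as np


def kmeans(a, c, rng, max_iter=100):
    """Plain Lloyd iterations from c randomly chosen data points."""
    centers = a[rng.choice(len(a), size=c, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = ((a[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([a[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(c)])
        if np.allclose(new, centers):
            break
        centers = new
    # Recompute labels for the final centers so that labels and centers agree.
    labels = ((a[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
    intra = float(((a - centers[labels]) ** 2).sum())
    return labels, centers, intra


def best_of_restarts(a, c, n_runs=25, seed=0):
    """Run K-means n_runs times and keep the run with the smallest intra-cluster error sum."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(a, c, rng) for _ in range(n_runs)]
    return min(runs, key=lambda r: r[2]), [r[2] for r in runs]
```

Keeping the best of many restarts reduces, but does not remove, the dependence on initialization that the EP_GOS_Clust pre-clustering step is designed to avoid.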


Results and Discussion 


A good clustering procedure should minimize the intra-cluster error sum and maximize the inter-cluster error sum. We also look at the difference between the error sums, which is somewhat indicative of the efficacy of a particular clustering algorithm, since methods using the intra-cluster error sum as the cost function would probably outperform methods using the inter-cluster error sum as a performance indicator. From Figs. 2, 3 and 4, we can see that EP_GOS_Clust compares very favorably with the other clustering algorithms. Also, as seen from Fig. 5, EP_GOS_Clust predicts the lowest optimal number of clusters. Together with the quality of the error sum comparisons, we infer the superior 'economy' of EP_GOS_Clust in producing tighter data groupings with a lower number of clusters; this matters because tight groupings can always be achieved by using a large number of clusters, even with an inferior clustering algorithm.

EP_GOS_Clust is also capable of uncovering strongly correlated clusters with high levels of biological coherence. Tables 1 and 2 show that it performs consistently well when compared against the significance of cluster biological coherence uncovered by the other clustering methods. We find our clusters to exhibit good correlation and a high level of functional coherence strength across all cluster sizes, which indicates that EP_GOS_Clust shows good consistency and a lack of size bias. Also, EP_GOS_Clust compares very well with other clustering methods in producing highly correlated clusters, even against methods such as K-Corr, which already uses correlation explicitly as a clustering metric, and the correlation-hunting SOM. In addition, EP_GOS_Clust conveniently isolates errant data points and refines the existing groupings as the clustering progresses.
Introduction


Some methods are reviewed to solve the resulting com- 
plementarity problem and two novel algorithms are de- 
scribed. The use of complementarity problems provides 
more flexibility to solve optimization problems, as well 
as a number of other advantages [10]. 

The existence of a general solution procedure for 
the linear complementarity problem (LCP) permits the 
incorporation of this algorithm recursively in an opti- 
mization algorithm and so avoids the use of active set 
strategies to handle inequality constraints and the use 
of second-order information on the objective function. 
This is often beneficial in the presence of nonconvex 
functions [10]. 

There exist many traditional approaches to solve the LCP. An algorithm was formulated early for the solution of LCPs [5,6]. Later, it was shown that if an LCP has a solution, then there exists a linear program which, for a suitable objective function, will have an optimal solution that is also a solution to the LCP [2]. This was further generalized [7,8,9] so that for certain classes of LCPs the problem could be specified and solved as a linear program. A characterization of the LCP was formulated [11] showing the equivalence of its solution to a solution of an appropriate parametric linear program with one scalar parameter.

A number of interior point algorithms to solve LCPs have been presented, such as an interior point potential reduction algorithm [4] for P-matrices, positive semidefinite matrices and skew-symmetric matrices, and an interior point algorithm that uses affine scaling to solve nonconvex (indefinite or negative definite) quadratic programming problems [14]. A fully polynomial-time approximation algorithm for computing a solution of the LCP with row-sufficient matrices can also be formulated [15]. This algorithm is a fully polynomial-time approximation scheme for finding an ε-approximate stationary point of the general LCP.

Here we shall briefly describe some particular meth- 
ods and indicate two extensions of these algorithms 
which apply to more general matrices. 


Definitions 


This section gives some definitions [3] that are used in the following sections.


Definition 1 Given M, an n × n matrix, and q, an n-dimensional vector, let N be the index set of the variables, i.e., N = {1, 2, …, n}. The linear complementarity problem LCP(q, M) is then to find x such that

$$Mx + q \ge 0, \tag{1}$$

$$x \ge 0, \tag{2}$$

$$x^T (Mx + q) = 0. \tag{3}$$
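Conditions (1)-(3) translate directly into a solution check. The sketch below, with the hypothetical name `is_lcp_solution`, verifies a candidate x in floating point up to a tolerance; it does not solve the LCP.

```python
from typing import Sequence


def is_lcp_solution(M: Sequence[Sequence[float]], q: Sequence[float],
                    x: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if x satisfies (1)-(3) of LCP(q, M) up to `tol`."""
    n = len(q)
    w = [sum(M[i][j] * x[j] for j in range(n)) + q[i] for i in range(n)]  # Mx + q
    feasible = all(wi >= -tol for wi in w) and all(xi >= -tol for xi in x)  # (1), (2)
    complementary = abs(sum(xi * wi for xi, wi in zip(x, w))) <= tol  # (3)
    return feasible and complementary
```

For example, with M = [[2, 1], [1, 2]] and q = (−5, −6), the point x = (4/3, 7/3) gives Mx + q = 0 and passes, whereas x = 0 violates (1).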


Definition 2 A matrix $M \in \mathbb{R}^{n\times n}$ is said to be a P-matrix ($P_0$-matrix) if all its principal minors are positive (nonnegative). The class of such matrices is denoted P ($P_0$).


Definition 3 A square matrix is called a Z-matrix if its off-diagonal entries are all nonpositive. A Z-matrix which is also a P-matrix ($P_0$-matrix) is called a K-matrix ($K_0$-matrix).
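Definitions 2 and 3 can be checked directly for small matrices by enumerating principal minors. The sketch below is practical only for small n, since there are 2^n − 1 principal minors (recognizing P-matrices is known to be co-NP-complete in general); all helper names are hypothetical and the determinant uses plain Gaussian elimination in floating point.

```python
from itertools import combinations
from typing import Iterator, List, Sequence

Matrix = Sequence[Sequence[float]]


def det(M: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    A: List[List[float]] = [list(row) for row in M]
    n = len(A)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def principal_minors(M: Matrix) -> Iterator[float]:
    """Yield all 2**n - 1 principal minors of M."""
    n = len(M)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            yield det([[M[i][j] for j in idx] for i in idx])


def is_p_matrix(M: Matrix, tol: float = 1e-12) -> bool:
    return all(m > tol for m in principal_minors(M))


def is_p0_matrix(M: Matrix, tol: float = 1e-12) -> bool:
    return all(m >= -tol for m in principal_minors(M))


def is_z_matrix(M: Matrix) -> bool:
    n = len(M)
    return all(M[i][j] <= 0 for i in range(n) for j in range(n) if i != j)


def is_k_matrix(M: Matrix) -> bool:
    return is_z_matrix(M) and is_p_matrix(M)
```

For instance, [[2, −1], [−1, 2]] is a K-matrix, while the skew-symmetric [[0, 1], [−1, 0]] is a $P_0$-matrix but not a P-matrix.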


Definition 4 A matrix $M \in \mathbb{R}^{n\times n}$ is said to be column-sufficient if it satisfies the implication

$$[\,z_i (Mz)_i \le 0 \ \text{for all } i\,] \;\Rightarrow\; [\,z_i (Mz)_i = 0 \ \text{for all } i\,]. \tag{4}$$

The matrix M is called row-sufficient if its transpose is column-sufficient. If M is both column-sufficient and row-sufficient, then it is called sufficient.


Definition 5 A square matrix M is skew-symmetric if its transpose is also its negative:

$$M^T = -M. \tag{5}$$

Definition 6 If M is a positive definite matrix, then there exists a vector z such that

$$Mz > 0, \quad z > 0. \tag{6}$$

Definition 7 If M is a positive semidefinite matrix, then there exists a vector z such that

$$Mz \ge 0, \quad z \ge 0. \tag{7}$$


Definition 8 A potential function is

$$P(x, z) = n\log(c^T x) - \sum_{j=1}^{n}\log x_j, \qquad x \in \operatorname{int}(Y_a), \tag{8}$$

where $\operatorname{int}(Y_a)$ indicates the interior of the set $Y_a$, which is the set of all feasible solutions of the dual.


Formulation 


The aim of this section is to describe two modern im- 
plementations with interior point methods. In the first 
subsection an interior reduction algorithm to solve the 
LCP is presented, with particular matrix classes, [4], 
while in the following subsection an interior point po- 
tential algorithm to solve the general LCP is presented. 


An Interior Point Reduction Algorithm 
to Solve the LCP 


There exist many interior point algorithms to solve 
LCPs. A particularly interesting approach is an interior 
point potential reduction algorithm for the LCP [4]. 
The complementarity problem is viewed as a minimiza- 
tion problem, where the objective function is the prod- 
uct of the solution vector x and the slack vector of the 
inequalities y. 

The objective of the algorithm is to find an ε-complementary solution in time bounded by a polynomial in the input size. The algorithm is formulated to solve an LCP(q, M) that has a solution, such as when the matrix M is a P-matrix. It is then extended to matrices M that are only positive semidefinite and to skew-symmetric matrices.

Consider an LCP, that is, given a rational matrix $M \in \mathbb{R}^{n\times n}$ and a rational vector $q \in \mathbb{R}^n$, find vectors $x, y \in \mathbb{R}^n$ such that

$$y = Mx + q, \tag{9}$$

$$x, y \ge 0, \tag{10}$$

$$x^T y = 0, \tag{11}$$

which can be regarded as the quadratic programming problem

$$\text{Minimize } x^T y \tag{12}$$

$$\text{subject to } y = Mx + q, \tag{13}$$

$$x, y \ge 0. \tag{14}$$

Given the problem (12)-(14), the aim is to find a point with $x^T y \le \varepsilon$ for a given $\varepsilon > 0$.

The algorithm proceeds by iteratively reducing the potential function

$$f(x, y) = \rho\ln(x^T y) - \sum_{j}\ln(x_j y_j). \tag{15}$$
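A quick numerical check of (15) also shows why a choice ρ > n is natural. By the arithmetic-geometric mean inequality, $\sum_j \ln(x_j y_j) \le n\ln(x^T y/n)$, so $f(x, y) \ge (\rho - n)\ln(x^T y) + n\ln n$, and with ρ > n, driving f toward −∞ forces $x^T y$ toward 0. The function names below are illustrative only.

```python
import math
from typing import Sequence


def potential(x: Sequence[float], y: Sequence[float], rho: float) -> float:
    """f(x, y) = rho*ln(x^T y) - sum_j ln(x_j*y_j), Eq. (15); needs x, y > 0."""
    xty = sum(a * b for a, b in zip(x, y))
    return rho * math.log(xty) - sum(math.log(a * b) for a, b in zip(x, y))


def am_gm_lower_bound(x: Sequence[float], y: Sequence[float], rho: float) -> float:
    """(rho - n)*ln(x^T y) + n*ln(n), a lower bound on f by AM-GM."""
    n = len(x)
    xty = sum(a * b for a, b in zip(x, y))
    return (rho - n) * math.log(xty) + n * math.log(n)
```

The bound is attained exactly when all products $x_j y_j$ coincide, e.g. x = (1, 1), y = (2, 2).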


Apply a linear scaling transformation to make the coor- 
dinates of the current point all equal to 1 and then take 
a gradient step in the transformed space using the gra- 
dient of the transformed potential function. The step 
size can be determined either by the algorithm or by 
line search to minimize the value of the potential func- 
tion. Finally transform the solution point back to the 
original space. 

Consider the potential function (15) under scaling of x and y. Given any feasible interior point $(x^0, y^0)$, let X and Y be the diagonal matrices whose diagonal elements are the entries of $x^0$ and $y^0$, respectively.

Define a linear transformation of the space by

$$\bar x = X^{-1} x, \qquad \bar y = Y^{-1} y, \tag{16}$$

and let $W = XY$, i.e., $w_j = x_j^0 y_j^0$ with $w = (w_1, w_2, \ldots, w_n)$, $\bar M = Y^{-1} M X$ and $\bar q = Y^{-1} q$. Consider the transformed problem:

$$\text{Minimize } \bar x^T W \bar y \tag{17}$$

$$\text{subject to } \bar y = \bar M \bar x + \bar q, \tag{18}$$

$$\bar x, \bar y \ge 0. \tag{19}$$


Feasible solutions of the original problem are mapped into feasible solutions of the transformed problem:

$$\bar y = Y^{-1}(Mx + q) = \bar M \bar x + \bar q. \tag{20}$$
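The scaling step (16)-(20) is cheap to form explicitly: $\bar M = Y^{-1} M X$ has entries $m_{ij} x_j^0 / y_i^0$. The sketch below (hypothetical helper `scale_lcp`) builds $\bar M$, $\bar q = Y^{-1} q$ and w, and can be used to confirm that the current point maps to (e, e) and that $x^T y = e^T W e$ is preserved.

```python
from typing import List, Sequence, Tuple


def scale_lcp(M: Sequence[Sequence[float]], q: Sequence[float],
              x0: Sequence[float], y0: Sequence[float]
              ) -> Tuple[List[List[float]], List[float], List[float]]:
    """Return (Mbar, qbar, w): Mbar = Y^-1 M X, qbar = Y^-1 q, w_j = x0_j*y0_j."""
    n = len(q)
    Mbar = [[M[i][j] * x0[j] / y0[i] for j in range(n)] for i in range(n)]
    qbar = [q[i] / y0[i] for i in range(n)]
    w = [x0[j] * y0[j] for j in range(n)]
    return Mbar, qbar, w
```

With $y^0 = Mx^0 + q$, the image of $x^0$ is $\bar x = e$ and $\bar M e + \bar q = Y^{-1}(Mx^0 + q) = e$, which is the point (e, e) used next.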
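Assume that the current point is indeed $(e, e)$, the image of $(x^0, y^0)$, and that the potential function has the form

$$\bar f(\bar x, \bar y) = \rho\ln(\bar x^T W \bar y) - \sum_{j=1}^{n}\ln(w_j \bar x_j \bar y_j). \tag{21}$$

The gradient of $\bar f$ is given by

$$\nabla_{\bar x}\bar f(\bar x, \bar y) = \frac{\rho}{\bar x^T W \bar y}\, W\bar y - \bar X^{-1} e, \tag{22}$$

$$\nabla_{\bar y}\bar f(\bar x, \bar y) = \frac{\rho}{\bar x^T W \bar y}\, W\bar x - \bar Y^{-1} e, \tag{23}$$

where $\bar X$ and $\bar Y$ are the diagonal matrices of $\bar x$ and $\bar y$. Denote by g the gradient vector evaluated at the current point (e, e); at this point both blocks coincide, $g = \frac{\rho}{e^T w}\, w - e$.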

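Denote by $(\Delta x, \Delta y)$ the projection of $\nabla\bar f(e, e)$ on the linear space Ω defined by $\Delta y = \bar M \Delta x$. Thus we define the following problem:

$$\text{Minimize } \|\Delta x - g\|^2 + \|\Delta y - g\|^2 \tag{24}$$

$$\text{subject to } \Delta y = \bar M \Delta x. \tag{25}$$

It follows [4] that

$$\Delta x = (I + \bar M^T \bar M)^{-1}(I + \bar M^T)\, g, \tag{26}$$

$$\Delta y = \bar M (I + \bar M^T \bar M)^{-1}(I + \bar M^T)\, g. \tag{27}$$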
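Substituting the constraint (25) into (24) leaves an unconstrained least-squares problem in Δx whose normal equations are $(I + \bar M^T \bar M)\Delta x = (I + \bar M^T) g$, which is where (26)-(27) come from. A minimal sketch, assuming both gradient blocks at (e, e) equal $g = \rho w / (e^T w) - e$ as follows from (22)-(23); the helper names are hypothetical and a small dense Gaussian elimination stands in for a proper linear-algebra library.

```python
from typing import List, Sequence, Tuple


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    T = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for k in range(c, n + 1):
                T[r][k] -= f * T[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (T[r][n] - sum(T[r][k] * z[k] for k in range(r + 1, n))) / T[r][r]
    return z


def projected_direction(Mbar: Sequence[Sequence[float]], w: Sequence[float],
                        rho: float) -> Tuple[List[float], List[float], List[float]]:
    """Return (g, dx, dy) following Eqs. (26)-(27) at the point (e, e)."""
    n = len(w)
    g = [rho * wj / sum(w) - 1.0 for wj in w]
    # Normal-equation matrix I + Mbar^T Mbar (symmetric positive definite).
    A = [[(1.0 if i == j else 0.0)
          + sum(Mbar[k][i] * Mbar[k][j] for k in range(n))
          for j in range(n)] for i in range(n)]
    rhs = [g[i] + sum(Mbar[k][i] * g[k] for k in range(n)) for i in range(n)]  # (I + Mbar^T) g
    dx = solve(A, rhs)
    dy = [sum(Mbar[i][j] * dx[j] for j in range(n)) for i in range(n)]
    return g, dx, dy
```

The next iterate in the scaled space would then be $e - t\Delta x$, $e - t\Delta y$ for a step length t > 0 that keeps both vectors positive.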


It is possible to determine the reduction Δf in the value of $\bar f$ in moving from $\bar x = \bar y = e$ to a point of the form $\hat x = e - t\Delta x$, $\hat y = e - t\Delta y$, where t > 0. The step length t is chosen so as to achieve a reduction of at least $n^{-k}$ for some k > 0 at every iteration. Since this has been shown to be possible [4], the result follows if the matrix is positive definite, positive semidefinite or skew-symmetric.


An Interior Point Potential Algorithm 
to Solve General LCPs 


In this subsection a “condition-based” iteration complexity is formulated for the solution of various LCPs. The condition parameter characterizes the degree of difficulty of the problem when a potential reduction algorithm is used and, of course, depends on the data of the problem (M, q). Consider the primal-dual potential function of an LCP as stated in Eqs. (9)-(11): for any interior feasible point $(x, y) \in F$ and ρ > 0, it may be written as

$$W(x, y) = W_{n+\rho}(x, y) = (n + \rho)\ln(x^T y) - \sum_{j=1}^{n}\ln(x_j y_j). \tag{28}$$


Suppose the iterations start from an interior feasible point $(x^0, y^0)$ with $W(x^0, y^0) = W^0$. A sequence of interior feasible points $\{x^k, y^k\}$ (k = 0, 1, …) can then be generated, terminating at a point such that $(x^k)^T y^k \le \varepsilon$. Such a point is found when

$$W(x^k, y^k) \le \rho\ln(\varepsilon) + n\ln(n), \tag{29}$$

since by the arithmetic-geometric mean inequality

$$n\ln\big((x^k)^T y^k\big) - \sum_{j=1}^{n}\ln(x_j^k y_j^k) - n\ln(n) \ge 0.$$
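The arithmetic-geometric mean step behind (29) can be written out as a short derivation, valid for any x, y > 0:

```latex
\begin{aligned}
\sum_{j=1}^{n}\ln(x_j y_j) &\le n\ln\!\Big(\frac{x^T y}{n}\Big)
  && \text{(AM-GM)}\\
W(x,y) = (n+\rho)\ln(x^T y) - \sum_{j=1}^{n}\ln(x_j y_j)
  &\ge \rho\ln(x^T y) + n\ln n, \\
W(x^k,y^k) \le \rho\ln\varepsilon + n\ln n
  &\;\Longrightarrow\; \rho\ln\big((x^k)^T y^k\big) \le \rho\ln\varepsilon
  \;\Longrightarrow\; (x^k)^T y^k \le \varepsilon .
\end{aligned}
```

The same lower bound gives $\rho\ln(x^T y) \le W(x, y) \le W^0$ along the iterates, which is the inequality behind the boundedness argument that follows.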

The fact that $W(x, y) \le W^0$ implies that $\ln(x^T y) \le W^0/\rho$; therefore the boundedness of $\{(x, y) \in F \mid \ln(x^T y) \le W^0/\rho\}$ guarantees the boundedness of $\{(x, y) \in \operatorname{int}(F) \mid W(x, y) \le W^0\}$, where int(·) denotes the relative interior of its argument.

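To obtain a reduction in the potential function, the scaled gradient projection method may be used. The gradient vectors of the potential function with respect to x and y are

$$\nabla_x W = \frac{n+\rho}{x^T y}\, y - X^{-1} e, \tag{30}$$

$$\nabla_y W = \frac{n+\rho}{x^T y}\, x - Y^{-1} e, \tag{31}$$

where X and Y are the diagonal matrices of x and y.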


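At the kth iteration the following linear program, subject to an ellipsoid constraint, is solved:

$$\text{Minimize } z = \nabla_x W(x^k, y^k)^T d_x + \nabla_y W(x^k, y^k)^T d_y \tag{32}$$

$$\text{subject to } d_y = M d_x, \tag{33}$$

$$\|(X^k)^{-1} d_x\|^2 + \|(Y^k)^{-1} d_y\|^2 \le \alpha^2 < 1. \tag{34}$$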


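Denote by $(d_x^k, d_y^k)$ the minimal solution of (32)-(34) and let

$$p^k = \begin{pmatrix} p_x^k \\ p_y^k \end{pmatrix} = \begin{pmatrix} \dfrac{n+\rho}{(x^k)^T y^k}\, X^k (y^k + M^T \pi^k) - e \\[3mm] \dfrac{n+\rho}{(x^k)^T y^k}\, Y^k (x^k - \pi^k) - e \end{pmatrix}, \tag{35}$$

where

$$\pi^k = \big((Y^k)^2 + M (X^k)^2 M^T\big)^{-1}(Y^k - M X^k)\left(X^k y^k - \frac{(x^k)^T y^k}{n+\rho}\, e\right); \tag{36}$$

then there results

$$\begin{pmatrix} (X^k)^{-1} d_x^k \\ (Y^k)^{-1} d_y^k \end{pmatrix} = -\frac{\alpha}{\|p^k\|}\, p^k. \tag{37}$$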
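By the concavity of the log function and certain elementary results it can be shown [17] that

$$W(x^k + d_x^k, y^k + d_y^k) - W(x^k, y^k) \le -\alpha\|p^k\| + \frac{\alpha^2}{2}\left(n + \rho + \frac{1}{1-\alpha}\right). \tag{38}$$

Letting

$$\alpha = \min\left\{\frac{\|p^k\|}{n+\rho+2},\ \frac{1}{n+\rho+2}\right\} \le \frac{1}{2} \tag{39}$$

results in

$$W(x^k + d_x^k, y^k + d_y^k) - W(x^k, y^k) \le -\min\left\{\frac{\|p^k\|^2}{2(n+\rho+2)},\ \frac{1}{2(n+\rho+2)}\right\}. \tag{40}$$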
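The passage from (38) and (39) to (40) is a two-case computation. Writing $\Delta W = W(x^k + d_x^k, y^k + d_y^k) - W(x^k, y^k)$ and taking the bound (38) as $\Delta W \le -\alpha\|p^k\| + \tfrac{\alpha^2}{2}\big(n + \rho + \tfrac{1}{1-\alpha}\big)$:

```latex
\begin{aligned}
&\alpha \le \tfrac12 \;\Rightarrow\; \tfrac{1}{1-\alpha} \le 2
  \;\Rightarrow\; \Delta W \le -\alpha\|p^k\| + \tfrac{\alpha^2}{2}(n+\rho+2),\\
&\|p^k\| \le 1,\ \alpha = \tfrac{\|p^k\|}{n+\rho+2}:\quad
  \Delta W \le -\tfrac{\|p^k\|^2}{n+\rho+2} + \tfrac{\|p^k\|^2}{2(n+\rho+2)}
  = -\tfrac{\|p^k\|^2}{2(n+\rho+2)},\\
&\|p^k\| > 1,\ \alpha = \tfrac{1}{n+\rho+2}:\quad
  \Delta W \le -\tfrac{\|p^k\|}{n+\rho+2} + \tfrac{1}{2(n+\rho+2)}
  < -\tfrac{1}{2(n+\rho+2)}.
\end{aligned}
```

Both cases give $\Delta W \le -\min\{\|p^k\|^2, 1\}/(2(n+\rho+2))$, so the guaranteed decrease is governed by $\|p^k\|$.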


The quantity $\|p^k\|$, with $p^k$ given by (35), can be regarded as a measure of the potential reduction at the kth iteration. For any x, y let

$$g(x, y) = \frac{n+\rho}{x^T y}\, X y - e, \tag{41}$$

$$H(x, y) = 2I - (X M^T - Y)(Y^2 + M X^2 M^T)^{-1}(M X - Y), \tag{42}$$

which is a positive semidefinite matrix. Thus

$$\|p^k\|^2 = g(x^k, y^k)^T H(x^k, y^k)\, g(x^k, y^k), \tag{43}$$

which may also be written as $\|g(x, y)\|_H^2 = g(x, y)^T H(x, y)\, g(x, y)$.
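Formula (43) suggests a direct way to evaluate $\|g(x, y)\|_H^2$ at a given interior point without forming H: with $u = (MX - Y)g$, one has $\|g\|_H^2 = 2g^T g - u^T (Y^2 + M X^2 M^T)^{-1} u$, because $XM^T - Y = (MX - Y)^T$. A minimal dense sketch; the helper names are hypothetical.

```python
from typing import List, Sequence


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    T = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for k in range(c, n + 1):
                T[r][k] -= f * T[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (T[r][n] - sum(T[r][k] * z[k] for k in range(r + 1, n))) / T[r][r]
    return z


def h_norm_sq(M: Sequence[Sequence[float]], x: Sequence[float],
              y: Sequence[float], rho: float) -> float:
    """||g(x, y)||_H^2 with g and H from Eqs. (41)-(42); requires x, y > 0."""
    n = len(x)
    xty = sum(a * b for a, b in zip(x, y))
    g = [(n + rho) / xty * x[j] * y[j] - 1.0 for j in range(n)]
    # u = (M X - Y) g
    u = [sum(M[i][j] * x[j] * g[j] for j in range(n)) - y[i] * g[i] for i in range(n)]
    # S = Y^2 + M X^2 M^T, symmetric positive definite when y > 0
    S = [[(y[i] ** 2 if i == j else 0.0)
          + sum(M[i][k] * x[k] ** 2 * M[j][k] for k in range(n))
          for j in range(n)] for i in range(n)]
    z = solve(S, u)
    return 2.0 * sum(gj * gj for gj in g) - sum(ui * zi for ui, zi in zip(u, z))
```

Since H is positive semidefinite and the subtracted term is nonnegative, the result always lies in $[0, 2g^T g]$; the infimum of this quantity over the region in (44) is the condition number.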
Define a condition number for the LCP(q, M) as

$$\gamma(M, q, \varepsilon) = \inf\left\{\|g(x, y)\|_H^2 \;\middle|\; x^T y \ge \varepsilon,\ W(x, y) \le W^0,\ (x, y) \in \operatorname{int}(F)\right\}. \tag{44}$$


The condition number γ(M, q, ε) represents the degree of difficulty for the potential reduction algorithm in solving the LCP(q, M): the larger the condition number, the more easily the problem can be solved. The condition number for LCPs thus provides a criterion to subdivide given instances of LCP(q, M) into classes, so that those that can be solved in polynomial time may be identified.


Corollary 1 An instance of an LCP(q, M) is solvable in polynomial time if γ(M, q, ε) > 0 and 1/γ(M, q, ε) is bounded above by a polynomial in ln(1/ε) and n.


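This corollary is slightly different from Corollary 1 in [16]. Further, the following definitions are important:

$$\Sigma^{+}(M, q) = \left\{\pi \;\middle|\; x^T y - q^T \pi \le 0,\ x - \pi \ge 0,\ y + M^T \pi \ge 0 \ \text{for some } (x, y) \in \operatorname{int}(F)\right\}. \tag{45}$$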


Definition 9 Let G be the set of LCP(q, M) such that the following conditions are satisfied:

$$G = \{(M, q) \mid \operatorname{int}(F) \ne \emptyset,\ \Sigma^{+}(M, q) = \emptyset\}. \tag{46}$$


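Lemma 1 Let $\Sigma^{+}(M, q)$ be empty for an LCP(q, M). Then for $\rho \ge n + \sqrt{2n}$, $\gamma(M, q, \varepsilon) \ge 1$.

Lemma 2 Let $\{\pi \mid x^T y - q^T \pi \ge 0,\ x - \pi \ge 0,\ y + M^T \pi \ge 0 \ \text{for some } (x, y) \in \operatorname{int}(F)\}$ be empty for an LCP(q, M). Then for $0 < \rho \le n - \sqrt{2n}$, $\gamma(M, q, \varepsilon) \ge 1$.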


With these properties it can be shown that for many classes of matrices γ(M, q, ε) > 0, or that the conditions indicated in the lemmas are satisfied, so that the LCP is solvable in polynomial time.

Further, the potential reduction algorithm will 
solve, under general conditions, the LCP(q,M) when M 
is a P-matrix and when M is a row-sufficient matrix. 
Thus, 


Theorem 1 Let $W(x^0, y^0) \le O(n\ln(n))$ and let M be a P-matrix. Then the potential reduction algorithm terminates at $x^T y \le \varepsilon$ in $O\big(n^2\max\{|\lambda|/\theta(n), 1\}\ln(1/\varepsilon)\big)$ iterations, and each iteration uses at most $O(n^3)$ arithmetic operations.

The bound indicates that the algorithm is a polynomial-time algorithm if $|\lambda|/\theta(n)$ is bounded above by a polynomial in ln(1/ε) and n.


Theorem 2 Let ρ > 0 be fixed. If M is a row-sufficient matrix and $\{(x, y) \in F \mid W(x, y) \le W^0\}$ is bounded, then γ(M, q, ε) > 0.


Since for the LCP(q, M) defined by this class of matrices the condition number is bounded away from zero, the potential reduction algorithm will solve this class of problems.


Methods and Applications 


Depending on the algorithm proposed, any penalty function algorithm or any linear programming algorithm will, given the conditions imposed on the problem, ensure that a polynomial-time solution is achieved.

Computationally, the most efficient method is often the Newton method with a penalty or barrier parameter. However, the actual method of solution is left to the interested reader, who can refer to the original contributions, since too many problem-dependent factors are involved.


Models 


The aim of this section is to treat the methods described 
in “Formulation” under some more general conditions. 


An Interior Point Newton Method 
for the General LCP 


This algorithm finds a Karush-Kuhn-Tucker point 
for a nonmonotone LCP with a primal interior point 
method using Newton’s method with a convex barrier 
function, under some mild assumptions. 


Consider a bounded LCP:

$$Mu + q - v = 0, \tag{47}$$

$$u, v \ge 0, \tag{48}$$

$$u^T v = 0, \tag{49}$$

and suppose that the LCP solution set $S = \{u, v \mid Mu + q - v = 0,\ u, v \ge 0,\ u^T v = 0\}$ is bounded above by a vector $(m_1^T, m_2^T)^T \in \mathbb{R}^{2n}$. Define two positive diagonal matrices


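$$D_1 \ge \operatorname{Diag}(2m_1), \tag{50}$$

$$D_2 \ge \operatorname{Diag}(2m_2), \tag{51}$$

to obtain, with $u = D_1 x$, the following LCP:

$$y = D_2^{-1} v = D_2^{-1}(Mu + q) = (D_2^{-1} M D_1)\, x + D_2^{-1} q, \tag{52}$$

$$\tfrac{1}{2} e \ge x, y \ge 0, \tag{53}$$

$$x^T y = 0, \tag{54}$$

which, without loss of generality, will be written as

$$Mx + q - y = 0, \tag{55}$$

$$x, y \ge 0, \tag{56}$$

$$x^T y = 0. \tag{57}$$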
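The rescaling (50)-(52) can be sketched as follows, assuming $m_1, m_2 > 0$ componentwise and taking the equality choice $D_1 = \operatorname{Diag}(2m_1)$, $D_2 = \operatorname{Diag}(2m_2)$ (the text only requires "≥"). The helper name is hypothetical. Any solution (u, v) bounded by $(m_1, m_2)$ then maps to $x = D_1^{-1}u$, $y = D_2^{-1}v$ with $0 \le x, y \le \tfrac12 e$.

```python
from typing import List, Sequence, Tuple


def rescale_bounded_lcp(M: Sequence[Sequence[float]], q: Sequence[float],
                        m1: Sequence[float], m2: Sequence[float]
                        ) -> Tuple[List[List[float]], List[float], List[float], List[float]]:
    """Scale as in (50)-(52) with D1 = Diag(2*m1), D2 = Diag(2*m2).

    Returns (Mbar, qbar, d1, d2), where Mbar = D2^-1 M D1 and qbar = D2^-1 q;
    a solution (u, v) of (47)-(49) maps to x = D1^-1 u, y = D2^-1 v."""
    n = len(q)
    d1 = [2.0 * mi for mi in m1]
    d2 = [2.0 * mi for mi in m2]
    Mbar = [[M[i][j] * d1[j] / d2[i] for j in range(n)] for i in range(n)]
    qbar = [q[i] / d2[i] for i in range(n)]
    return Mbar, qbar, d1, d2
```

For M = [[2, 1], [1, 2]], q = (−5, −6), the solution u = (4/3, 7/3), v = 0 and bounds $m_1 = (2, 3)$, $m_2 = (1, 1)$ give x = (1/3, 7/18), which satisfies (55) with y = 0.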


Assume, as is usual with interior point methods, that an approximate interior point solution exists, with variables $0 < x_i, y_i < \varepsilon$, $\varepsilon < n^2$, for all i = 1, 2, …, n, and consider the following barrier function for the optimization problem associated with the LCP (55)-(57):

$$\text{Minimize } W(x, y, \mu) = x^T y - \mu\sum_{i=1}^{n}\ln(x_i y_i) \tag{58}$$

$$\text{subject to } Mx - y + q = 0, \tag{59}$$

$$x, y \le \tfrac{1}{2} e, \tag{60}$$

$$x, y \ge 0, \tag{61}$$

where $e \in \mathbb{R}^n$ is the vector of unit elements and β > 0 is an arbitrarily small parameter.

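To convert the optimization problem (58)-(61) into a convex programming problem, consider μ as a barrier parameter, which is successively reduced, and shift the arguments of the logarithms by β, as in (65); the gradient of the resulting function $x^T y - \mu\sum_i \ln(x_i y_i + \beta)$ is then:

$$(\nabla_x W(x, y))_i = \frac{(x_i y_i + \beta - \mu)\, y_i}{x_i y_i + \beta}, \tag{62}$$

$$(\nabla_y W(x, y))_i = \frac{(x_i y_i + \beta - \mu)\, x_i}{x_i y_i + \beta}. \tag{63}$$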


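It is easy to show that if the barrier parameter $\mu^k$ at any iteration k satisfies the following inequalities for all i = 1, …, n,

$$\mu^k > x_i y_i + \beta, \qquad \mu^k(\beta - x_i y_i) < (x_i y_i + \beta)^2, \tag{64}$$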

then the Hessian matrix of the function (58) is positive definite under the conditions imposed. Thus the optimization problem (58)-(61) is a convex programming problem, and it may be solved by one of the methods above, which is also suitable for a further generalization [1].
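Because $x^T y - \mu\sum_i \ln(x_i y_i + \beta)$ separates into pairs $(x_i, y_i)$, its Hessian is block diagonal with 2 × 2 blocks, and positive definiteness can be checked pair by pair. For x, y > 0 the 2 × 2 determinant condition reduces to $\mu > x_i y_i + \beta$ together with $\mu(\beta - x_i y_i) < (x_i y_i + \beta)^2$. The sketch below (hypothetical helper names) compares Sylvester's criterion on the computed Hessian with these two inequalities.

```python
from typing import Tuple


def pair_hessian(x: float, y: float, mu: float, beta: float) -> Tuple[float, float, float]:
    """Entries (h_xx, h_xy, h_yy) of the Hessian of x*y - mu*ln(x*y + beta)."""
    s = x * y + beta
    h_xx = mu * y * y / s ** 2
    h_yy = mu * x * x / s ** 2
    h_xy = 1.0 - mu * beta / s ** 2
    return h_xx, h_xy, h_yy


def is_pd_2x2(h_xx: float, h_xy: float, h_yy: float) -> bool:
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return h_xx > 0.0 and h_xx * h_yy - h_xy * h_xy > 0.0


def pair_condition(x: float, y: float, mu: float, beta: float) -> bool:
    """mu > x*y + beta and mu*(beta - x*y) < (x*y + beta)**2, for x, y > 0."""
    s = x * y + beta
    return mu > s and mu * (beta - x * y) < s ** 2
```

When $\beta \le x_i y_i$ the second inequality holds automatically, so only the lower bound on μ matters; when β is larger than $x_i y_i$, μ is also bounded above.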
Here it will be solved as a convex quadratic programming problem [12]. Rewrite the optimization problem (58)-(61) as:


$$\text{Minimize } W(x, y, \mu) = x^T y - \mu\sum_{i=1}^{n}\ln(x_i y_i + \beta) \tag{65}$$

$$\text{subject to } \begin{pmatrix} M & -I \\ I & 0 \\ 0 & I \\ -I & 0 \\ 0 & -I \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + b \ge 0, \tag{66}$$

where $b^T = (q^T, 0, 0, \tfrac{1}{2}e^T, \tfrac{1}{2}e^T)$.

Denote the constraint matrix by A, of dimension 5n × 2n, and let $z^T = (x^T, y^T) \in \mathbb{R}^{2n}$.

The algorithm considered is a primal method with a log barrier function. It follows a central path and takes small steps [12], and it can be shown that an exact global minimum can be simply derived from an approximate global minimum [12].

Let Π denote the feasible region of Eq. (66) and denote the interior of this feasible region by int(Π), i.e., $Az + b > 0$, obtained by relaxing the equality constraints, as is usual in interior point algorithms.

Make the following assumptions:

rank(A) = 2n,

Π is compact,

int(Π) ≠ ∅.

Define the potential function

$$h(z, \mu) = W(x, y, \mu) - \mu\sum_{i=1}^{5n}\ln(a_i^T z + b_i), \tag{67}$$

where $a_i^T$ is the ith row of A.
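Assembling A and b from (66) and evaluating the barrier function (67) takes only a few lines. This is a minimal sketch assuming the barrier weight in (67) is μ and using the relaxed constraints $Az + b \ge 0$; the helper names are hypothetical, and the function returns +∞ outside int(Π) so that a line search can reject such points.

```python
import math
from typing import List, Sequence, Tuple


def constraint_data(M: Sequence[Sequence[float]], q: Sequence[float]
                    ) -> Tuple[List[List[float]], List[float]]:
    """Rows of A (5n x 2n) and b from (66), so that A z + b >= 0 with z = (x, y)."""
    n = len(q)

    def unit(i: int, sign: float = 1.0) -> List[float]:
        return [sign if j == i else 0.0 for j in range(n)]

    zeros = [0.0] * n
    A: List[List[float]] = []
    b: List[float] = []
    for i in range(n):  # M x - y + q >= 0 (relaxed equality)
        A.append(list(M[i]) + unit(i, -1.0))
        b.append(q[i])
    for i in range(n):  # x >= 0
        A.append(unit(i) + zeros)
        b.append(0.0)
    for i in range(n):  # y >= 0
        A.append(zeros + unit(i))
        b.append(0.0)
    for i in range(n):  # x <= e/2
        A.append(unit(i, -1.0) + zeros)
        b.append(0.5)
    for i in range(n):  # y <= e/2
        A.append(zeros + unit(i, -1.0))
        b.append(0.5)
    return A, b


def h(z: Sequence[float], mu: float, beta: float,
      A: Sequence[Sequence[float]], b: Sequence[float]) -> float:
    """Barrier function (67); returns +inf outside int(Pi)."""
    n = len(z) // 2
    x, y = z[:n], z[n:]
    slacks = [sum(a * zk for a, zk in zip(row, z)) + bi for row, bi in zip(A, b)]
    if min(slacks) <= 0.0:
        return math.inf
    W = (sum(xi * yi for xi, yi in zip(x, y))
         - mu * sum(math.log(xi * yi + beta) for xi, yi in zip(x, y)))
    return W - mu * sum(math.log(s) for s in slacks)
```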


The following lemmas are straightforward adaptations of the original results.


Lemma 3 For any fixed choice of μ > 0 that meets condition (64), the function (67) is strictly convex on int(Π).

Lemma 4 For any fixed choice of μ > 0 that meets condition (64), the function (67) has a unique minimum.

Let z(μ) be the minimum of h(z, μ) for a fixed μ. As μ → 0 there must be an accumulation point, by compactness. This point must be an approximate global minimum.

Lemma 5 Let $\bar z$ be an accumulation point of z(μ). As μ → 0, $\bar z$ is an approximate global minimum for problem (65)-(66).


Generalization of an Interior Point Reduction 
Algorithm to Solve General LCPs 


The condition number for LCPs provides a criterion 
to subdivide given instances of LCP(q,M) into classes. 
These results will now be extended. 

Consider an LCP(q, M), Eqs. (9)-(11), with a nonsingular coefficient matrix M for which, moreover, (I − M) is nonsingular and the solution set of LCP(q, M) is bounded from above. This LCP can be written as:

$$Mu + q - v = 0, \tag{68}$$

$$u, v \ge 0, \tag{69}$$

$$u^T v = 0, \tag{70}$$

where $u, v, q \in \mathbb{R}^n$. Suppose that the LCP solution set $S = \{u, v \mid Mu + q - v = 0,\ u, v \ge 0,\ u^T v = 0\}$ is bounded above by a vector $(m_1^T, m_2^T)^T \in \mathbb{R}^{2n}$.

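Apply the transformation defined by Eqs. (50) and (51), so that there results

$$y = D_2^{-1} v = D_2^{-1}(Mu + q) = (D_2^{-1} M D_1)\, x + D_2^{-1} q, \tag{71}$$

$$\tfrac{1}{2} e \ge x, y \ge 0, \tag{72}$$

$$x^T y = 0, \tag{73}$$

which will be written as

$$Mx + q - y = 0, \tag{74}$$

$$x, y \ge 0, \tag{75}$$

$$x^T y = 0. \tag{76}$$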


For the potential reduction algorithm to solve general 
LCPs, it is required that x > 0 and y > 0. 


Lemma 6 For a nonsingular M the matrices $\bar M = D_2^{-1} M D_1$, $(I - \bar M)$, $(I - XY\bar M)$ and $(-Y + \bar M X)$ are all nonsingular.

Corollary 2 Under the conditions of Lemma 3, (Y + MX) is nonsingular.


The following additional lemma is also required.

Lemma 7 For all LCP(q, M) with nonsingular matrices M and (I − M), transformed to the form given by Eqs. (71)-(73) so that for any feasible solution $(x, y) \in \operatorname{int}(F)$ we have 0 < X < I, 0 < Y < I, there results

$$g(x, y) = \frac{n+\rho}{x^T y}\, X y - e \ne 0.$$


Theorem 3 For all LCP(q, M) with nonsingular matrices M and (I − M), transformed to the form given by Eqs. (71)-(73) so that for any feasible solution $(x, y) \in \operatorname{int}(F)$ there results 0 < X < I, 0 < Y < I, the condition number of the LCP satisfies γ(M, q, ε) > 0 for some ρ > 0.

For notational simplicity, assume without loss of generality that the transformed matrix $\bar M$ is denoted by M. Then γ(M, q, ε) = 0 if $\|g(x, y)\|_H^2 = 0$. Assume that $\|g(x, y)\|_H^2 = 0$ and expand it in terms of its factors:

$$2 g(x, y)^T g(x, y) - g(x, y)^T\big[(X M^T - Y)(Y^2 + M X^2 M^T)^{-1}(M X - Y)\big]\, g(x, y) = 0. \tag{77}$$
It is easy to show that this will never happen under the 
conditions of the theorem. Hence, for any matrix that 
satisfies the assumed conditions the condition number 
is strictly positive and so a solution to the LCP may be 
obtained straightforwardly by this method. This pro- 
vides a partial characterization and extension of the ma- 
trix class G defined in [16]. 


Cases 


Algorithms should be tested extensively for their computational efficiency on a wide series of cases, so that suitable comparisons can be made.

One hundred and forty random instances of LCPs were solved for four different sizes (30, 50, 100, 250), with three types of matrices: positive semidefinite, negative semidefinite and indefinite. Table 1 gives the number of problems solved for each type of matrix with the parametric LCP algorithm [11] and with an interior point algorithm with the Newton method.

The instances with positive (semi)definite matrices are in fact easy to solve. The instances with negative (semi)definite and indefinite matrices are considered hard to solve, but neither algorithm has trouble with these classes, except that the first seems to be more haphazard, rather than being subject to numerical difficulties.

Both routines seem to be only slightly affected by the type of matrix, but the interior point algorithm with the Newton method is more efficient, as confirmed in Table 2, where the average time for solving the instances is given in seconds.

Generalizations of Interior Point Methods for the Linear Complementarity Problem, Table 1

Results for 140 linear complementarity problems (LCPs) of different matrix classes and sizes

Type    PSD             NSD             INDF
Size    PLCP    IPNM    PLCP    IPNM    PLCP    IPNM
30      12
50
100
250

PSD positive semidefinite matrix, NSD negative semidefinite matrix, INDF indefinite matrix, PLCP parametric LCP algorithm, IPNM interior point algorithm with the Newton method.

Generalizations of Interior Point Methods for the Linear Complementarity Problem, Table 2

Timing results for 140 LCPs of different matrix classes and sizes (seconds)

Type    PSD             NSD             INDF
Size    PLCP    IPNM    PLCP    IPNM    PLCP    IPNM
30      0.05    –       –       0.06    0.07    0.07
50      –       0.18    0.38    0.32    0.33    0.32
100     –       1.42    7.00    3.37    –       2.78
250     100.37  22.56   121.51  95.12   11.99   7.45


Conclusions 


Interior point methods to solve the LCP are now well 
established and allow polynomial solutions to be ob- 
tained for such problems with suitable matrix classes. 
Moreover these routines can be used as a subroutine in 
general iterative optimization problems. 

Evidently research is being actively conducted to 
generalize the applicable matrix classes for which solu- 
tions can be obtained in polynomial time and space. 
See also


> Complementarity Algorithms in Pattern 
Recognition 

> Mathematical Programming Methods in Supply 
Chain Management 

> Simultaneous Estimation and Optimization of 
Nonlinear Problems 
Introduction


The generalized assignment problem (GAP) seeks the 
minimum cost assignment of n tasks to m agents such 
that each task is assigned to precisely one agent subject 
to capacity restrictions on the agents. 

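The formulation of the problem is:

$$\min \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij} x_{ij} \tag{1}$$

$$\text{subject to } \sum_{j=1}^{n} a_{ij} x_{ij} \le b_i, \quad i = 1, \ldots, m, \tag{2}$$

$$\sum_{i=1}^{m} x_{ij} = 1, \quad j = 1, \ldots, n, \tag{3}$$

$$x_{ij} \in \{0, 1\}, \quad i = 1, \ldots, m,\ j = 1, \ldots, n, \tag{4}$$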


where $c_{ij}$ is the cost of assigning task j to agent i, $a_{ij}$ is the capacity used when task j is assigned to agent i, and $b_i$ is the available capacity of agent i. Binary variable $x_{ij}$ equals 1 if task j is assigned to agent i, and 0 otherwise. Constraints (3) are usually referred to as the semi-assignment constraints.
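For very small instances, model (1)-(4) can be solved by enumerating all $m^n$ task-to-agent maps, which is a convenient check for heuristics even though it is exponential (consistent with the NP-hardness noted below). The function name is hypothetical.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def solve_gap_brute_force(c: Sequence[Sequence[float]],
                          a: Sequence[Sequence[float]],
                          b: Sequence[float]
                          ) -> Tuple[Optional[float], Optional[Tuple[int, ...]]]:
    """Exhaustively solve a tiny GAP (1)-(4); c[i][j], a[i][j] for agent i, task j.

    Returns (optimal cost, assignment) with assignment[j] = agent of task j,
    or (None, None) if no assignment satisfies the capacities (2)."""
    m, n = len(c), len(c[0])
    best_cost, best_assign = None, None
    # Choosing exactly one agent per task enforces (3) and (4) by construction.
    for assign in product(range(m), repeat=n):
        load = [0.0] * m
        for j, i in enumerate(assign):
            load[i] += a[i][j]
        if any(load[i] > b[i] for i in range(m)):  # capacity constraints (2)
            continue
        cost = sum(c[i][j] for j, i in enumerate(assign))
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign
```

With c = [[1, 4, 3], [2, 1, 5]], a = [[2, 2, 2], [3, 3, 3]] and b = (4, 5), agent 0 can take at most two tasks and agent 1 at most one; the optimum sends tasks 0 and 2 to agent 0 and task 1 to agent 1, at cost 5.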

The formulation above was first studied by Srinivasan and Thompson [80] to solve a transportation problem. The term generalized assignment problem for this setting was introduced by Ross and Soland [74]. This model is a generalization of a model previously proposed by DeMaio and Roveda [17], where the capacity absorption is agent independent (i.e., $a_{ij} = a_j$, ∀i).

The classical assignment problem, which provides a one-to-one pairing of agents and tasks, can be solved in polynomial time [47]. However, in GAP an agent may be assigned multiple tasks while each task is performed exactly once, and the problem is NP-hard [28]. Even the GAP with agent-independent requirements is an NP-hard problem [23,53].

The GAP has a wide spectrum of application areas 
ranging from scheduling (see [19,84]) and computer 
networking (see [5]) to lot sizing (see [31]) and facility 
location (see [7,30,74,75]). Nowakovski et al. [64] study 
the ROSAT space telescope scheduling where the prob- 
lem is formulated as a GAP and heuristic methods are 
proposed. The multiperiod single-source problem (MPSSP)
is reformulated as a GAP by Freling et al. [25]. Janak 
et al. [38] reformulate the NSF panel-assignment prob- 
lem as a multiresource preference-constrained GAP. 
Other applications of GAP include lump sum capi- 
tal rationing, loading in flexible manufacturing sys- 
tems (see [45]), p-median location (see [7,75]), max- 
imal covering location (see [42]), cell formation in 
group technology (see [79]), refueling nuclear reac- 
tors (see [31]), R & D planning (see [92]), and routing 
(see [22]). A summary of applications and assignment 
model components can be found in [76]. 


Extensions 
Multiple-Resource Generalized Assignment Problem 


Proposed by Gavish and Pirkul [29], the multi-resource generalized assignment problem (MRGAP) is a special case of the multi-resource weighted assignment model previously studied by Ross and Zoltners [76]. In MRGAP a set of tasks has to be assigned to a set of agents in a way that permits assignment of multiple tasks to an agent subject to a set of resource constraints. This problem differs from the GAP in that an agent consumes a variety of resources in performing the tasks assigned to it. Although most problems can be modeled as a GAP, multiple resource constraints are frequently required in the effective modeling of real-life problems. MRGAP may be encountered in large models dealing with processor and database location in distributed computer systems, the trucking industry, telecommunication network design, cargo loading on ships, warehouse design and workload planning in job shops.

Gavish and Pirkul [29] introduce and compare var- 
ious Lagrangian relaxations of the problem and suggest 
heuristic solution procedures. They design an exact al- 
gorithm by incorporating one of these heuristics along 
with a branch-and-bound procedure. 

Mazzola and Wilcox [58] modify the Gavish and Pirkul heuristic and develop a hybrid heuristic for MRGAP. Their algorithm defines a three-phase heuristic which first constructs a feasible solution and then systematically tries to improve the solution. As an enhanced
version of MRGAP, Janak et al. [38] study the NSF 
panel-assignment problem. In this setting, each task 
(i. e., proposal) has a specific number of agents (i. e., re- 
viewers) assigned to it and each agent has a lower and 
upper bound on the number of tasks that can be done. 
The objective is to optimize the sum of a set of prefer- 
ence criteria for each agent on each task while ensuring 
that each agent is assigned to approximately the same 
number of tasks. 


Multilevel Generalized Assignment Problem 


The Multilevel Generalized Assignment Problem (MGAP) was first introduced by Glover et al. [31] to provide a model for the allocation of tasks in a manufacturing environment. MGAP differs from the classical GAP in that agents can perform tasks at different efficiency levels, implying both different costs and different resource requirements. Each task must be assigned to one and only one agent at one level, and each agent has a limited amount of a single resource. Important manufacturing problems, such as lot sizing, can be formulated as MGAP.

Laguna et al. [46] use a neighborhood structure 
for defining moves based on ejection chains and de- 
velop a Tabu Search (TS) algorithm for this problem. 
French and Wilson [26] develop two heuristic solu- 
tion methods for MGAP from the solution methods 
for GAP. Procedures for deriving an upper bound on the solution of the problem are also described. Ceselli and Righini [11] present a branch-and-price algorithm based on decomposition of the MGAP into
a master problem and a pricing sub-problem, where 
the former is a set-partitioning problem and the latter 
is a multiple-choice knapsack problem. This algorithm 
is the first exact method proposed in the literature for 
the MGAP. To provide a flexible assignment tool to 
the decision maker, Hajri-Gabouj [37] develops a fuzzy 
genetic multi-objective optimization algorithm to solve 
a nonlinear MGAP. 


Dynamic Generalized Assignment Problem 


In the GAP model, the sequence in which an agent performs the tasks is not considered. This sequence is essential when each task is performed to meet a demand and earliness or tardiness incurs additional cost. The dynamic generalized assignment problem (DGAP) has been suggested to track customer demand while assigning tasks to agents. Kogan et al. [44] were the first to add the impact of time to the GAP model, assuming that each task has a due date. They formulate the continuous-time optimal control model of the problem and derive analytical properties of the optimal behavior of such a dynamic system. Based on those properties, an efficient time-decomposition procedure is developed.

Kogan et al. [43] extend the DGAP to cope with a stochastic environment and multiple agent-task relationships. They prove that this stochastic, continuous-time generalized assignment problem is strongly NP-hard and reduce the model to a number of classical deterministic assignment problems stated at discrete time points. A pseudo-polynomial time combinatorial algorithm is developed to approximate the solution. Well-known applications of such a generalization are found in the stochastic environment of flow shop scheduling of parallel workstations and flexible manufacturing cells, as well as in dynamic inventory management.


Bottleneck Generalized Assignment Problem 


The bottleneck generalized assignment problem (BGAP) is the min-max version of the well-known (min-sum) generalized assignment problem. In the BGAP, the maximum penalty incurred by assigning each task to an agent is minimized. Min-sum objective functions are commonly used in private sector applications, while min-max objective functions can be applied in the public sector. BGAP has several important applications in scheduling and allocation problems. Mazzola and Neebe [57] propose two min-max formulations for the GAP: the Task BGAP and the Agent BGAP. Martello and Toth [56] present an exact branch-and-bound algorithm as well as approximate algorithms for BGAP. They introduce relaxations and produce, as subproblems, min-max versions of the multiple-choice knapsack problem, which can be solved in polynomial time.
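The difference between the min-sum and min-max objectives already shows up on a two-task instance. The sketch below (hypothetical helper, exhaustive search, tiny instances only) reads the min-max objective as minimizing the largest single assignment cost $c_{ij}$, which is one reading of "the maximum penalty incurred by assigning each task"; the Task and Agent BGAP formulations of [57] may be defined differently.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Sequence, Tuple

Result = Optional[Tuple[float, Tuple[int, ...]]]


def best_assignment(c: Sequence[Sequence[float]], a: Sequence[Sequence[float]],
                    b: Sequence[float],
                    aggregate: Callable[[Iterable[float]], float]) -> Result:
    """Exhaustive search over task-to-agent maps; `aggregate` is sum or max."""
    m, n = len(c), len(c[0])
    best: Result = None
    for assign in product(range(m), repeat=n):  # assign[j] = agent of task j
        load = [0.0] * m
        for j, i in enumerate(assign):
            load[i] += a[i][j]
        if any(load[i] > b[i] for i in range(m)):  # capacity constraints
            continue
        value = aggregate(c[i][j] for j, i in enumerate(assign))
        if best is None or value < best[0]:
            best = (value, assign)
    return best
```

With c = [[1, 7], [6, 10]] and each agent able to take only one task, the min-sum optimum is the map (0, 1) with total 11 but worst cost 10, while the min-max optimum is (1, 0) with worst cost 7 and total 13.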


Generalized Assignment Problem 
with Special Ordered Set 


GAP is further generalized to include cases where items may be shared by a pair of adjacent knapsacks. This problem is called the generalized assignment problem with special ordered sets of type 2 (GAPS2). In other words, GAPS2 is the problem of allocating tasks to time periods, where each task must be assigned to a time period or shared between two consecutive time periods. Farias et al. [15] introduce this problem, which can also be applied to production scheduling. They study the polyhedral structure of the convex hull of the feasible space, develop three families of facet-defining valid inequalities, and show that these inequalities cut off all infeasible vertices of the LP relaxation. A branch-and-cut procedure is described in which the facet-defining valid inequalities are used as cuts. Wilson [86] modifies and extends a heuristic algorithm developed previously for the GAP to solve GAPS2. He argues that any feasible solution to GAP is a feasible solution to GAPS2; hence a heuristic algorithm for GAP can also be used as a heuristic algorithm for GAPS2. A solution produced by a GAP heuristic will be close to GAPS2 optimality if it is close to the LP relaxation bound of GAP. The heuristic uses a series of moves starting from an infeasible, but in some sense optimal, solution and then attempts to restore feasibility with minimal degradation of the objective function value. An existing upper bound for GAP is also generalized to be used for GAPS2.

French and Wilson [27] develop an LP-based heuristic procedure to solve GAPS2. They modify a heuristic for GAP to be used for GAPS2 and show that, while Wilson's [86] heuristic is straightforward for large instances of the problem and Farias et al. [15] solve smaller instances by an exact method, their heuristic solves fairly large instances rapidly and with a consistently high degree of solution quality.


Stochastic Generalized Assignment Problem 


In GAP, stochasticity may arise because the actual amount of resource needed to process the tasks by the different agents may not be known in advance, or because the presence or absence of individual tasks may be uncertain. In such cases, there is a set of potential tasks, each of which may or may not need to be processed. Dyer and Frieze [20] analyze the generalized assignment problem under the assumption that all coefficients are drawn uniformly and independently from the interval [0, 1]. Romeijn and Piersma [72] analyze a probabilistic version of GAP as the number of tasks goes to infinity while the number of machines remains fixed. Their model differs from that of Dyer and Frieze [20] in that it does not make the additional assumptions that the cost and resource requirement parameters are independent of each other and among machines. They first derive a tight condition on the probabilistic model of the parameters under which the corresponding instances of the GAP are feasible with probability one. Next, under an additional sufficient condition, the optimal solution value of the GAP is characterized through a limiting value. It is shown that the optimal solution value, normalized by dividing by the number of tasks, converges with probability one to this limiting value. Toktas et al. [82] consider the situation of uncertain capacities and derive two alternative approaches that utilize deterministic solution strategies while addressing capacity uncertainty. Albareda-Sambola et al. [1] assume that only a random subset of the tasks will actually require processing. Tasks are interpreted as customers that may or may not require a service. They construct a convex approximation of the objective function and present three versions of an exact algorithm to solve this problem, based on branch-and-bound techniques, optimality cuts, and a special purpose lower bound. An assignment of tasks can be modified once the actual demands are known. Different penalties are paid for reassigning tasks and for leaving unprocessed tasks with positive demand.


Bi-Objective Generalized Assignment Problem 


Zhang and Ong [91] consider the GAP from a multi-objective point of view and propose an LP-based heuristic to solve the bi-objective generalized assignment problem (BiGAP). In BiGAP, each assignment has two attributes that are to be considered. For example, in production planning, these attributes may be the cost and the time incurred by assigning jobs to machines.


Generalized Multi-Assignment Problem 


Proposed by Park et al. [66], the generalized multi-assignment problem (GMAP) consists of tasks that may be required to be duplicated at several agents. In other words, each task j is assigned to r_j agents instead of one. Park et al. [66] develop a Lagrangian dual ascent algorithm for the GMAP that is combined with subgradient search and used as a lower bounding scheme for the branch-and-bound procedure.


Methods 


Determining whether an instance of a GAP has a feasible solution is an NP-complete problem. Hence, unless P = NP, GAP admits no polynomial-time approximation algorithm with a fixed worst-case performance ratio. Nevertheless, there are numerous approximation algorithms for GAP in the literature, which actually address a different setting where the available agent capacities are not fixed and the weighted sum of cost and available agent capacities is minimized. For some of these algorithms, a feasible solution is required as an input. For details, see [14,24,65,78]. Excluding this setting, the solution approaches proposed in the literature for GAP are either exact algorithms or heuristics. For expository surveys on the algorithms, see [10,54,60].


Exact Algorithms 


In the literature, the optimal solution to the GAP is obtained by an implicit enumeration procedure, using either a branch-and-bound or a branch-and-price scheme. A branch-and-bound method consists of an upper bounding procedure, a lower bounding procedure, a branching strategy, and a search strategy. Good bounding procedures are known to be crucial in a branch-and-bound method. Branch-and-price proceeds similarly to branch-and-bound but obtains the bounds by solving the LP relaxations of the subproblems by column generation. For more details on the valid inequalities and facets for the GAP that are used in the solution procedures, see [16,32,33,40,55,67].
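The four components of a branch-and-bound method can be seen together in the following deliberately small sketch. It is not any of the published algorithms cited in this section; it assumes a min-cost GAP stored in hypothetical Python lists `cost[i][j]` and `weight[i][j]` (task j on agent i) with capacities `cap[i]`. The incumbent plays the role of the upper bound, the lower bound lets every unassigned task take its cheapest agent that could still hold it on its own (a capacity relaxation), branching fixes the agent of the next task, and the search is depth-first.

```python
import math


def gap_branch_and_bound(cost, weight, cap):
    """Depth-first branch-and-bound for a small min-cost GAP (illustrative).

    cost[i][j], weight[i][j]: cost and resource use of task j on agent i.
    cap[i]: capacity of agent i.
    Returns (best_cost, assignment) with assignment[j] = agent of task j,
    or (math.inf, None) if no feasible assignment exists.
    """
    m, n = len(cap), len(cost[0])
    best = [math.inf, None]          # incumbent, i.e. the upper bound
    assign = [-1] * n
    residual = list(cap)

    def lower_bound(j0, partial):
        # Capacity relaxation: each remaining task takes its cheapest agent
        # that could still hold it alone (interactions are ignored).
        lb = partial
        for j in range(j0, n):
            options = [cost[i][j] for i in range(m) if weight[i][j] <= residual[i]]
            if not options:
                return math.inf      # task j fits nowhere: node is infeasible
            lb += min(options)
        return lb

    def branch(j, partial):
        if j == n:                   # complete feasible assignment
            if partial < best[0]:
                best[0], best[1] = partial, assign.copy()
            return
        if lower_bound(j, partial) >= best[0]:
            return                   # prune: cannot beat the incumbent
        # Branching strategy: fix the agent of task j, cheapest agents first.
        for i in sorted(range(m), key=lambda a: cost[a][j]):
            if weight[i][j] <= residual[i]:
                residual[i] -= weight[i][j]
                assign[j] = i
                branch(j + 1, partial + cost[i][j])
                residual[i] += weight[i][j]   # undo before the next branch
                assign[j] = -1

    branch(0, 0)
    return best[0], best[1]


cost = [[4, 2, 5], [3, 6, 1]]
weight = [[2, 3, 2], [3, 1, 2]]
cap = [4, 4]
print(gap_branch_and_bound(cost, weight, cap))  # (11, [0, 1, 1])
```

On this three-task instance the search returns cost 11 with tasks 0, 1, 2 on agents 0, 1, 1. A practical code would rely on much stronger bounds, such as the Lagrangian bounds described in this section.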

The first branch-and-bound algorithm for the GAP is proposed by Ross and Soland [74]. Considering a minimization problem, they obtain the lower bounds by relaxing the capacity constraints. Martello and Toth [53] propose removing the semi-assignment constraints, whereupon the problem decomposes into a series of knapsack problems. Due to the quality of the bounds obtained, this algorithm is frequently used in the literature for benchmarking purposes. Chalmet and Gelders [12] introduce the Lagrangian relaxation of the semi-assignment constraints. Fisher et al. [23] use this technique, with multipliers set by a heuristic adjustment method, to obtain the lower bounds in the branch-and-bound procedure. The tighter bounds resulting from this method significantly reduce the solution time. Guignard and Rosenwein [34] design a branch-and-bound algorithm with an enhanced Lagrangian dual ascent procedure that solves a Lagrangian dual at each enumeration node and adds a surrogate constraint to the Lagrangian relaxed model. This algorithm effectively solves generalized assignment problems with up to 500 variables. Drexl [19] presents a hybrid branch-and-bound/dynamic programming algorithm where the upper bounds are obtained via an efficient Monte Carlo type heuristic. Numerous lower bounds are proposed and their benchmark results are presented. Nauss [62] proposes a branch-and-bound algorithm where linear programming cuts, Lagrangian relaxation, and subgradient optimization are used to derive good lower bounds, while feasible-solution generators together with the heuristic proposed by Ronen [73] are used to derive good upper bounds. Nauss [63] uses similar branch-and-bound techniques to solve the elastic generalized assignment problem (EGAP) as well.
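Relaxing the semi-assignment constraints makes the problem separate by agent. The sketch below assumes a min-cost GAP with integer weights, held in hypothetical lists `cost`, `weight`, `cap`. It dualizes $\sum_i x_{ij} = 1$ with free multipliers $u_j$, so that $L(u) = \sum_j u_j + \sum_i \min \{ \sum_j (c_{ij} - u_j) x_{ij} \}$, where each agent's inner minimum is a 0-1 knapsack problem solved by dynamic programming. The multipliers are updated by a plain subgradient step with a diminishing step size. This is a textbook substitute for the heuristic multiplier adjustment of Fisher et al. [23], not their method, and all function names are hypothetical.

```python
import math


def knapsack_max(values, weights, capacity):
    """0-1 knapsack by dynamic programming (integer weights).
    Returns (best_value, chosen) with chosen[k] in {0, 1}."""
    n = len(values)
    dp = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        v, w = values[k - 1], weights[k - 1]
        for c in range(capacity + 1):
            dp[k][c] = dp[k - 1][c]
            if v > 0 and w <= c and dp[k - 1][c - w] + v > dp[k][c]:
                dp[k][c] = dp[k - 1][c - w] + v
    chosen, c = [0] * n, capacity
    for k in range(n, 0, -1):          # backtrack through the table
        if dp[k][c] != dp[k - 1][c]:
            chosen[k - 1] = 1
            c -= weights[k - 1]
    return dp[n][capacity], chosen


def lagrangian_value(cost, weight, cap, u):
    """L(u) = sum_j u_j + sum_i min over agent i's knapsack of sum_j (c_ij - u_j) x_ij."""
    m, n = len(cap), len(u)
    value, x = sum(u), []
    for i in range(m):
        # Minimizing sum (c_ij - u_j) x_ij equals maximizing sum (u_j - c_ij) x_ij.
        best, chosen = knapsack_max([u[j] - cost[i][j] for j in range(n)],
                                    weight[i], cap[i])
        value -= best
        x.append(chosen)
    return value, x


def subgradient_bound(cost, weight, cap, iters=200, step0=1.0):
    """Best Lagrangian lower bound found by plain subgradient ascent."""
    m, n = len(cap), len(cost[0])
    u = [min(cost[i][j] for i in range(m)) for j in range(n)]
    best = -math.inf
    for k in range(iters):
        value, x = lagrangian_value(cost, weight, cap, u)
        best = max(best, value)
        # Subgradient of L at u: s_j = 1 - sum_i x_ij.
        s = [1 - sum(x[i][j] for i in range(m)) for j in range(n)]
        if all(sj == 0 for sj in s):
            break   # x is feasible for the GAP, hence optimal
        step = step0 / (k + 1)
        u = [u[j] + step * s[j] for j in range(n)]
    return best


cost = [[4, 2, 5], [3, 6, 1]]
weight = [[2, 3, 2], [3, 1, 2]]
cap = [4, 4]
print(round(subgradient_bound(cost, weight, cap), 3))  # a lower bound on the optimum 11
```

The starting multipliers $u_j = \min_i c_{ij}$ reproduce the simple bound $\sum_j \min_i c_{ij}$, and every later value is also a valid lower bound by weak duality, so keeping the best one is safe.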

The first branch-and-price algorithm for the generalized assignment problem is proposed by Savelsbergh [77]. A combination of the algorithms proposed by Martello and Toth [53] and Jörnsten and Näsberg [39] is used to calculate the upper bound, and the pricing problem is proved to be a knapsack problem.


Barnhart et al. [6] reformulate the GAP by applying Dantzig-Wolfe decomposition to obtain a tighter LP relaxation. In order to solve the LP relaxation of the reformulated problem, pricing is done by solving a series of knapsack problems. Pigatti et al. [67] propose a branch-and-cut-and-price algorithm with a stabilization mechanism to speed up the pricing convergence. Ceselli and Righini [11] present a branch-and-price algorithm for the multilevel generalized assignment problem that is based on decomposition and on a pricing subproblem that is a multiple-choice knapsack problem.


Heuristics 


Large instances of the GAP are computationally intractable due to the NP-hardness of the problem. This calls for heuristic approaches, whose benefits are twofold: they can be used as stand-alone algorithms to obtain good solutions within reasonable time, and they can be used to obtain the upper bounds in exact solution methods such as the branch-and-bound procedure. Although the variety among the heuristics is high, they mostly fall into one of two categories: greedy heuristics and meta-heuristics.

Klastorin [41] proposes a two-phase heuristic algorithm for solving the GAP. In phase one, the algorithm employs a modified subgradient algorithm to search for the optimal dual solution; in phase two, a branch-and-bound approach is used to search the neighborhood of the solution obtained in phase one.

Cattrysse et al. [9] use column generation tech- 
niques to obtain upper and lower bounds. In their 
method, a column represents a feasible assignment of 
a subset of tasks to a single agent. The master problem is 
formulated as a set partitioning problem. New columns 
are added to the master problem by solving a knapsack 
problem for each agent. LP relaxation of the set parti- 
tioning problem is solved by a dual ascent procedure. 

Martello and Toth [54] present a greedy heuristic that assigns the jobs to machines based on a desirability factor. This factor is defined as the difference between the largest and second largest weight factors. The algorithm iteratively considers, among the unassigned jobs, the one having the highest desirability factor (or regret factor) and assigns it to its maximum-profit agent. This iterative process establishes an initial solution, which is then improved in the next step of the algorithm by simple interchange arguments. This heuristic can also be used in a problem size reduction procedure by fixing variables to one or to zero.
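The regret rule can be sketched for a max-profit GAP held in hypothetical lists `profit[i][j]`, `weight[i][j]` and `cap[i]`. As an assumption of this sketch, the desirability of an agent is simply its profit, restricted to agents that still have room for the task. The improvement step moves each task to a more profitable agent with spare capacity, which is a simplification of the interchange step and not necessarily the authors' exact procedure.

```python
import math


def regret_greedy(profit, weight, cap):
    """Regret-based greedy for a max-profit GAP (illustrative sketch).

    profit[i][j], weight[i][j]: profit and resource use of task j on agent i.
    Returns (total_profit, assignment) or None if the greedy gets stuck.
    """
    m, n = len(cap), len(profit[0])
    residual, assign = list(cap), [-1] * n
    unassigned = set(range(n))
    while unassigned:
        pick = None                     # (regret, task, best agent)
        for j in sorted(unassigned):
            # Desirability = profit; only agents with enough room count.
            options = sorted(((profit[i][j], i) for i in range(m)
                              if weight[i][j] <= residual[i]), reverse=True)
            if not options:
                return None             # task j can no longer be placed
            regret = options[0][0] - options[1][0] if len(options) > 1 else math.inf
            if pick is None or regret > pick[0]:
                pick = (regret, j, options[0][1])
        _, j, i = pick                  # largest regret goes first
        assign[j] = i
        residual[i] -= weight[i][j]
        unassigned.remove(j)
    # Improvement: move a task to a more profitable agent that still has room.
    for j in range(n):
        a = assign[j]
        better = [i for i in range(m)
                  if profit[i][j] > profit[a][j] and weight[i][j] <= residual[i]]
        if better:
            b = max(better, key=lambda i: profit[i][j])
            residual[a] += weight[a][j]
            residual[b] -= weight[b][j]
            assign[j] = b
    return sum(profit[assign[j]][j] for j in range(n)), assign


profit = [[6, 9, 4], [5, 3, 8]]
weight = [[3, 4, 2], [2, 2, 3]]
cap = [6, 5]
print(regret_greedy(profit, weight, cap))  # (22, [1, 0, 1])
```

Task 1 is placed first because its regret (9 - 3) is largest. After that, task 0 fits only on agent 1, so its regret becomes infinite and it is placed next. On this small instance the result happens to be optimal, which greedy rules do not guarantee in general.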

Relaxation heuristics are developed by Lorena and Narciso [49] for the maximization version of GAP. Feasible solutions are obtained by a subgradient search in a Lagrangian or surrogate relaxation. Six different heuristics are derived by varying the relaxation, the step size in the subgradient search, and the method used to obtain the feasible solution. In a Lagrangian heuristic for GAP, Haddadi [35] introduces substitution variables into the model, defined as the products of the original variables and their corresponding constraint coefficients. The constraints defining these new variables are then dualized in the Lagrangian relaxation of the problem, and the resulting relaxation decomposes into two subproblems: a knapsack problem and a transportation problem. Narciso and Lorena [61] use relaxation multipliers with efficient constructive heuristics to find good feasible solutions.

A breadth-first branch-and-bound algorithm is de- 
scribed by Haddadi and Ouzia [36] in which a standard 
subgradient approach is used in each node of the de- 
cision tree to solve the Lagrangian dual and to obtain 
an upper bound. The main contribution in this study is 
a new heuristic that is applied to exploit the solution of 
the relaxed problem by solving a GAP of smaller size. 

Romeijn and Romero Morales [70] study the optimal value function from a probabilistic point of view and develop a class of greedy algorithms. These greedy algorithms use a family of weight functions designed to measure the desirability of assigning each job to a machine. The authors derive conditions under which their algorithm is asymptotically optimal in a probabilistic sense.

Meta-heuristics are widely used to solve GAP in the literature. They are either adapted on their own to the GAP or used in combination with other heuristics and meta-heuristics.

Variable depth search heuristic (VDSH) is a generalization of local search in which the size of the neighborhood changes adaptively to traverse a larger search space. VDSH is a two-phase algorithm. In the first phase, an initial solution is constructed and a lower bound is obtained. In the second phase, a nested iterative refinement process is applied to improve the quality of the solution. VDSH is introduced by Amini and Racer [2] to solve the GAP. In their method, the improvement phase consists of a two-level nested loop. The major iteration creates an action set corresponding to each neighborhood structure alternative. Possible neighborhood structures for GAP are: reassign (shift) a task from one agent to another, swap the assignments of two tasks, and permute the assignment of a subset of the tasks. Then, a subsequence of operations that achieves the highest saving is obtained by performing some minor iterations. A new solution is established based on that subsequence, and another major iteration starts.
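The shift and swap neighborhoods can be illustrated by a plain first-improvement descent. The sketch assumes a min-cost GAP and a feasible starting assignment. It is deliberately simpler than VDSH: it applies any improving shift or swap as soon as one is found and stops at a local optimum, without the action sets, permutation moves or nested minor iterations of Amini and Racer [2]. The function name and data layout are hypothetical.

```python
def shift_swap_descent(cost, weight, cap, assign):
    """First-improvement descent over shift and swap moves for a min-cost GAP.

    assign[j] is the agent of task j in a feasible starting solution.
    Returns (total_cost, assignment) at a local optimum.
    """
    m, n = len(cap), len(assign)
    assign = list(assign)
    load = [0] * m
    for j, i in enumerate(assign):
        load[i] += weight[i][j]
    improved = True
    while improved:
        improved = False
        # Shift: reassign task j from agent a to a cheaper agent b with room.
        for j in range(n):
            a = assign[j]
            for b in range(m):
                if cost[b][j] < cost[a][j] and load[b] + weight[b][j] <= cap[b]:
                    load[a] -= weight[a][j]
                    load[b] += weight[b][j]
                    assign[j] = b
                    improved = True
                    break
        # Swap: exchange the agents of tasks j and k if this saves cost.
        for j in range(n):
            for k in range(j + 1, n):
                a, b = assign[j], assign[k]
                if a == b:
                    continue
                new_a = load[a] - weight[a][j] + weight[a][k]
                new_b = load[b] - weight[b][k] + weight[b][j]
                delta = cost[b][j] + cost[a][k] - cost[a][j] - cost[b][k]
                if delta < 0 and new_a <= cap[a] and new_b <= cap[b]:
                    load[a], load[b] = new_a, new_b
                    assign[j], assign[k] = b, a
                    improved = True
    return sum(cost[assign[j]][j] for j in range(n)), assign


cost = [[4, 2, 5], [3, 6, 1]]
weight = [[2, 3, 2], [3, 1, 2]]
cap = [4, 4]
print(shift_swap_descent(cost, weight, cap, [0, 1, 0]))  # (11, [0, 1, 1])
```

Starting from cost 15, one shift (task 0 to agent 1) and one swap (tasks 0 and 2) reach cost 11. Every accepted move strictly lowers the cost, so the loop terminates.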

Amini and Racer [3] develop a hybrid heuristic (HH) around two well-known heuristics: VDSH (see [2,69]) and Heuristic GAP (HGAP) (see [54]). Previous studies show that HGAP dominates VDSH in terms of solution time, while VDSH obtains solutions of better quality within reasonable time. A computational comparison with the leading alternative heuristic approaches is conducted. Another hybrid approach is by Lourenço and Serra [52], where a MAX-MIN Ant System (MMAS) (see [81]) is combined with GRASP for the GAP.

Yagiura et al. [90] propose a variable depth search (VDS) method for GAP. Their method alternates between shift and swap moves to explore the solution space. A key feature of their method is that infeasible solutions are allowed to be considered. However, in some problem instances the feasible space is small or consists of many small separate regions, and the efficiency of the algorithm suffers. In another study, Yagiura et al. [89] improve VDS by incorporating branching search processes to construct the neighborhoods. They show that appropriate choices of branching strategies can improve the performance of VDS. Lin et al. [48] make further observations on the VDSH method through a series of computational experiments. They consider six greedy strategies for generating the initial feasible solution and design several simplified strategies for the improvement phase of the method.

Osman [68] develops a hybrid heuristic which com- 
bines simulated annealing and tabu search. This algo- 
rithm takes advantage of the non-monotonic oscillation 
strategy of tabu search as well as the simulated anneal- 
ing philosophy. 

Yagiura et al. [87] propose a tabu search algorithm for GAP that utilizes an ejection chain approach. An ejection chain is an embedded neighborhood construction that compounds simple moves to create more complex and powerful moves. The chain considered in their study is a sequence of shift moves in which every two successive moves share a common agent. Searching the infeasible region is allowed, at a penalty proportional to the degree of infeasibility. An adaptive adjustment mechanism is incorporated to determine appropriate values of the penalty parameters and so control their influence on the search. Yagiura et al. [88] improve their previous method by adding path relinking, a mechanism for generating new solutions by combining two or more reference solutions. The main difference between this method and the previous one is the way it generates starting solutions for ejection chains. It is shown that this simple change improves the performance of the algorithm drastically.

Asahiro et al. [4] develop two parallel heuristic algorithms based on the ejection chain local search (EC) presented by Yagiura et al. [87]. One is a simple parallelization called multi-start parallel EC (MPEC), and the other is cooperative parallel EC (CPEC). In MPEC, each search process explores the search space independently, while in CPEC the search processes share partial information to cooperate with each other. They show that their proposed algorithms outperform the EC of Yagiura et al. [87].

Diaz and Fernandez [18] devise a flexible tabu search algorithm for GAP. The flexibility comes from allowing the search to explore the infeasible region and from adaptively modifying the objective function. The objective function is modified through dynamic adjustment of the weight of the penalty incurred for violating feasibility. The main difference between this method and the tabu search methods of Yagiura et al. [87,88] in exploring the infeasible region is that, in this method, no solution is qualitatively preferred to others in terms of its structure.
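The shared idea of letting the search cross the infeasible region under an adaptively weighted penalty can be sketched with shift moves only. This is not the algorithm of Diaz and Fernandez [18] or of Yagiura et al. [87,88]. The following are all assumptions of this sketch: the penalized objective (cost plus `alpha` times the total capacity excess), the multiplicative update of `alpha` (raised by `factor` while the current solution is infeasible and lowered while it is feasible), the tabu tenure, and the absence of an aspiration criterion. All names are hypothetical.

```python
import math


def capacity_excess(load, cap):
    """Total capacity violation: sum_i max(0, load_i - cap_i)."""
    return sum(max(0, l - c) for l, c in zip(load, cap))


def penalized_shift_tabu(cost, weight, cap, assign, iters=100, tenure=3,
                         alpha=1.0, factor=1.5):
    """Tabu search over shift moves that may visit infeasible solutions.

    Returns the best feasible (cost, assignment) seen, or (math.inf, None).
    """
    m, n = len(cap), len(assign)
    assign = list(assign)
    load = [0] * m
    for j, i in enumerate(assign):
        load[i] += weight[i][j]
    tabu = {}                        # (task, agent) -> last iteration it is tabu
    best = (math.inf, None)

    def record():
        nonlocal best
        total = sum(cost[assign[j]][j] for j in range(n))
        if capacity_excess(load, cap) == 0 and total < best[0]:
            best = (total, list(assign))
        return total

    for it in range(iters):
        total = record()
        # Adaptive penalty weight (hypothetical multiplicative rule).
        alpha = alpha * factor if capacity_excess(load, cap) > 0 else alpha / factor
        move, move_val = None, math.inf
        for j in range(n):
            a = assign[j]
            for b in range(m):
                if b == a or tabu.get((j, b), -1) >= it:
                    continue
                # Penalized objective after moving task j from a to b.
                load[a] -= weight[a][j]
                load[b] += weight[b][j]
                val = total - cost[a][j] + cost[b][j] + alpha * capacity_excess(load, cap)
                load[a] += weight[a][j]
                load[b] -= weight[b][j]
                if val < move_val:
                    move, move_val = (j, a, b), val
        if move is None:
            break                    # every move is tabu
        j, a, b = move               # take the best non-tabu move, even if worse
        load[a] -= weight[a][j]
        load[b] += weight[b][j]
        assign[j] = b
        tabu[(j, a)] = it + tenure   # forbid moving j back to a for a while
    record()
    return best


cost = [[4, 2, 5], [3, 6, 1]]
weight = [[2, 3, 2], [3, 1, 2]]
cap = [4, 4]
print(penalized_shift_tabu(cost, weight, cap, [0, 1, 0]))
```

Only feasible solutions are ever recorded as the incumbent, so the penalty affects which moves are explored but never what is returned.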

Chu and Beasley [13] develop a genetic algorithm for GAP that incorporates a fitness-unfitness pair evaluation function as a representation scheme. This algorithm uses a heuristic to improve the cost and feasibility of solutions. Feltl and Raidl [21] add new features to this algorithm, including two alternative initialization heuristics, a modified selection and replacement scheme for handling infeasible solutions more appropriately, and a heuristic mutation operator.
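A minimal sketch of a fitness-unfitness pair follows. It assumes an encoding in which gene j holds the agent of task j, fitness is the total cost, and unfitness is the total capacity excess, so unfitness is zero exactly for feasible chromosomes. The replacement rule shown is a hypothetical simplification for illustration, not the exact scheme of Chu and Beasley [13] or of Feltl and Raidl [21].

```python
def fitness_unfitness(cost, weight, cap, chromosome):
    """(fitness, unfitness) of a GAP chromosome; chromosome[j] = agent of task j.

    fitness: total assignment cost (to be minimized).
    unfitness: total capacity excess, zero exactly for feasible chromosomes.
    """
    load = [0] * len(cap)
    for j, i in enumerate(chromosome):
        load[i] += weight[i][j]
    fitness = sum(cost[i][j] for j, i in enumerate(chromosome))
    unfitness = sum(max(0, load[i] - cap[i]) for i in range(len(cap)))
    return fitness, unfitness


def member_to_replace(pairs):
    """Hypothetical rule: the member with the largest unfitness,
    ties broken by the largest (worst) fitness."""
    return max(range(len(pairs)), key=lambda k: (pairs[k][1], pairs[k][0]))


cost = [[4, 2, 5], [3, 6, 1]]
weight = [[2, 3, 2], [3, 1, 2]]
cap = [4, 4]
population = [[0, 0, 0], [0, 1, 1], [1, 1, 1]]
pairs = [fitness_unfitness(cost, weight, cap, p) for p in population]
print(pairs)                     # [(11, 3), (11, 0), (10, 2)]
print(member_to_replace(pairs))  # 0
```

Keeping the two measures separate, instead of folding them into one penalized value, means no penalty weight has to be tuned. In this population the cheapest chromosome, [1, 1, 1], is infeasible, and the feasible one, [0, 1, 1], is identified by its zero unfitness.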


Wilson [85] proposes another genetic algorithm for GAP, which operates in a dual sense. Instead of genetically improving a set of feasible solutions as in a regular GA, this algorithm tries to genetically restore feasibility to a set of near-optimal ones. The method starts with potentially optimal but infeasible solutions and then improves feasibility while keeping optimality. Once a feasible solution is obtained, the algorithm uses local search procedures to improve it.

Lorena et al. [50] propose a constructive genetic algorithm (CGA) for GAP. In CGA, unlike classical GA, problems are modeled as bi-objective optimization problems, which consider the evaluation of two fitness functions. The evolution process is conducted to attain the two objectives, conserving schemata that survive an adaptive threshold test. The CGA algorithm has some new features compared to GA, including population formation by schemata, recombination among schemata, a dynamic population, mutation in structure, and the possibility of using heuristics in schemata and/or structure representation.

Lourenço and Serra [51] present two metaheuristic algorithms for GAP. One is a MAX-MIN ant system combined with local search and tabu search heuristics. The other is a greedy randomized adaptive search procedure (GRASP) studied with several neighborhoods. Both algorithms consist of three main steps: (i) constructing a solution by either a greedy randomized or an ant system approach, (ii) improving these initial solutions by applying local search and tabu search, and (iii) updating the parameters. These three steps are repeated until a stopping criterion is met.
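Steps (i) and (ii) of the GRASP variant can be sketched as follows for a min-cost GAP. Everything here is an assumption for illustration:

- The restricted candidate list keeps the feasible agents whose cost is within a fraction `beta` of the cost range above the cheapest one.
- The improvement phase is a plain shift-move descent, not the tabu search and neighborhoods studied by Lourenço and Serra [51].
- Step (iii) is omitted, and `beta` is kept fixed.

The function name and its parameters are hypothetical.

```python
import math
import random


def grasp_gap(cost, weight, cap, iters=50, beta=0.3, seed=0):
    """GRASP sketch for a min-cost GAP: randomized greedy construction
    followed by shift-move descent, repeated; returns the best feasible
    (cost, assignment) found, or (math.inf, None)."""
    rng = random.Random(seed)
    m, n = len(cap), len(cost[0])
    best = (math.inf, None)
    for _ in range(iters):
        # (i) Greedy randomized construction with a restricted candidate list.
        residual, assign = list(cap), [-1] * n
        for j in rng.sample(range(n), n):           # random task order
            cand = [i for i in range(m) if weight[i][j] <= residual[i]]
            if not cand:
                break                                # construction failed
            lo = min(cost[i][j] for i in cand)
            hi = max(cost[i][j] for i in cand)
            rcl = [i for i in cand if cost[i][j] <= lo + beta * (hi - lo)]
            i = rng.choice(rcl)
            assign[j] = i
            residual[i] -= weight[i][j]
        if -1 in assign:
            continue                                 # discard failed construction
        # (ii) Local improvement: shift tasks to cheaper agents with room.
        improved = True
        while improved:
            improved = False
            for j in range(n):
                a = assign[j]
                for b in range(m):
                    if cost[b][j] < cost[a][j] and weight[b][j] <= residual[b]:
                        residual[a] += weight[a][j]
                        residual[b] -= weight[b][j]
                        assign[j] = b
                        improved = True
                        break
        total = sum(cost[assign[j]][j] for j in range(n))
        if total < best[0]:
            best = (total, assign)
    return best


cost = [[4, 2, 5], [3, 6, 1]]
weight = [[2, 3, 2], [3, 1, 2]]
cap = [4, 4]
print(grasp_gap(cost, weight, cap))
```

With `beta = 0` the construction is purely greedy, and with `beta = 1` any feasible agent may be chosen. Repeating the randomized construction is what lets the method escape a single poor greedy start.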

Monfared and Etemadi [59] use a neural network based approach for solving the GAP. They investigate four different methods to structure the energy function of the neural network: exterior penalty function, augmented Lagrangian, dual Lagrangian, and interior penalty function. They show that the augmented Lagrangian can produce superior results with respect to feasibility and integrality while maintaining feasibility and stability measures.

Problem generators and benchmark instances play an important role in developing and comparing new methods. Romeijn and Romero Morales [71] propose a new stochastic model for the GAP, which can be used to analyze the random generators in the literature. They compare the random generators of Ross and Soland [74], Martello and Toth [53], Trick [83], Chalmet and Gelders [12], and Racer and Amini [69], and conclude that these random generators are not adequate because they tend to generate easier problem instances as the number of machines increases. Cario et al. [8] compare GAP instances generated under two correlation-induction strategies. Using two exact and four heuristic algorithms from the literature, they show how solutions are affected by the correlation between costs and resource requirements.


Conclusions 


This review presents the applications, extensions, and solution methods for the generalized assignment problem. As the GAP receives more attention, larger sets of classical benchmark instances and more comparative results on solution approaches are likely to appear.

The generalized Benders decomposition, GBD, [7] is a powerful theoretical and algorithmic approach for addressing mixed integer nonlinear optimization problems, as well as problems that require exploitation of their inherent mathematical structure via decomposition principles. A comprehensive analysis of the generalized Benders decomposition approach, along with a variety of other approaches for mixed integer nonlinear optimization problems and their applications, is presented in [3].


Formulation 
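[7] generalized the approach proposed by [1] for exploiting the structure of mathematical programming problems stated as:

$$
\begin{aligned}
\min_{x,\,y}\ & f(x,y) \\
\text{s.t. } & h(x,y) = 0 \\
& g(x,y) \le 0 \\
& x \in X \subseteq \mathbb{R}^n \\
& y \in Y = \{0,1\}^q,
\end{aligned}
$$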
under the following conditions:

C1) X is a nonempty, convex set and the functions

$$
f: \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}, \qquad g: \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}^p
$$

are convex for each fixed $y \in Y = \{0,1\}^q$, while the functions $h: \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}^m$ are linear for each fixed $y \in Y = \{0,1\}^q$.

C2) The set

$$
Z_y = \left\{ z \in \mathbb{R}^p : h(x,y) = 0,\ g(x,y) \le z \ \text{for some } x \in X \right\}
$$

is closed for each fixed $y \in Y$.

C3) For each fixed $y \in Y \cap V$, where

$$
V = \left\{ y : h(x,y) = 0,\ g(x,y) \le 0 \ \text{for some } x \in X \right\},
$$

one of the following two conditions holds:

i) the resulting problem has a finite solution and has an optimal multiplier vector for the equalities and inequalities;

ii) the resulting problem is unbounded, that is, its objective function value goes to $-\infty$.

It should be noted that the above stated formulation is, in fact, a subclass of the problems to which the GBD of [7] can be applied. This is due to the specification $y \in \{0,1\}^q$, while [7] investigated the more general case of $Y \subseteq \mathbb{R}^q$ and defined the vector of y variables as ‘complicating’ variables in the sense that if we fix y, then:

a) the problem may be decomposed into a number 
of independent problems, each involving a different 
subvector of x; or 

b) the problem takes a well known special structure for 
which efficient algorithms are available; or 

c) the problem becomes convex in x even though it is 
nonconvex in the joint x-y domain, that is, it creates 
special structure. 

Case a) may lead to parallel computation of the independent subproblems. Case b) allows the use of special-purpose algorithms (e.g., generalized network algorithms), while case c) invokes special structure from the convexity point of view that can be useful for the decomposition of nonconvex optimization problems (e.g., [4]).

In the sequel, we concentrate on $Y = \{0,1\}^q$ due to our interest in mixed integer nonlinear programming (MINLP) models (cf. also » Mixed integer nonlinear programming). Note also that the analysis includes the equality constraints $h(x,y) = 0$, which are not treated explicitly in [7].

Condition C2) is not stringent, and it is satisfied if one of the following holds (in addition to C1) and C3)):

i) X is bounded and closed, and h(x,y), g(x,y) are continuous on X for each fixed $y \in Y$;

ii) there exists a point $z_y$ such that the set

$$
\left\{ x \in X : h(x,y) = 0,\ g(x,y) \le z_y \right\}
$$

is bounded and nonempty.

Note, though, that mere continuity of h(x,y), g(x,y) on X for each fixed $y \in Y$ does not imply that condition C2) is satisfied. For instance, if $X = [1, \infty)$ and $h(x,y) = x + y$, $g(x,y) = -1/x$, then $Z_y = (-\infty, 0)$, which is not closed since $g(x,y) \to 0$ as $x \to \infty$.

Note that the set V represents the values of y for which the resulting problem is feasible with respect to x. In other words, V denotes the values of y for which there exists a feasible $x \in X$ for $h(x,y) = 0$, $g(x,y) \le 0$. Then the intersection of Y and V, $Y \cap V$, represents the projection of the feasible region of the original problem onto the y-space.

Condition C3) is satisfied if a first order constraint qualification holds for the resulting problem after fixing $y \in Y \cap V$.

The basic idea in generalized Benders decomposition, GBD, is the generation, at each iteration, of an upper bound and a lower bound on the sought solution of the MINLP model. The upper bound results from the primal problem, while the lower bound results from the master problem. The primal problem corresponds to the original problem with fixed y-variables (i.e., it is in the x-space only), and its solution provides information about the upper bound and the Lagrange multipliers associated with the equality and inequality constraints. The master problem is derived via nonlinear duality theory and makes use of the Lagrange multipliers obtained in the primal problem. Its solution provides information about the lower bound, as well as the next set of fixed y-variables to be used subsequently in the primal problem. As the iterations proceed, it is shown that the sequence of updated upper bounds is nonincreasing, the sequence of lower bounds is nondecreasing, and the sequences converge in a finite number of iterations.
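The bounding mechanism just described can be traced on a toy problem that is not taken from [7]: $\min (x - t)^2 + d^T y$ subject to $b^T y - x \le 0$, with $x \in \mathbb{R}$ and $y \in \{0,1\}^q$. In this problem:

- f is convex and g is linear in x, and every primal problem is feasible, so only the feasible-primal Lagrange function is needed.
- The primal solution is available in closed form: $x = \max(t, b^T y)$ with multiplier $\mu = 2(x - t)$.
- The minimum of the Lagrange function over x is also available in closed form.

The master step anticipates the master problem derived in the theoretical development. It minimizes over y the pointwise maximum of the Lagrangian supports collected so far, and here it is solved by enumerating all y, which is only sensible for tiny q. The names `gbd_toy`, `t`, `b` and `d` are hypothetical.

```python
import itertools


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gbd_toy(t, b, d, y0, tol=1e-9, max_iter=100):
    """GBD on a hypothetical toy MINLP (not from the source):

        min_{x, y} (x - t)**2 + d.y   s.t.  b.y - x <= 0,  x real,  y in {0,1}^q.

    Every primal is feasible here, so only optimality cuts arise.
    Returns (upper_bound, best_y, history of (lower, upper) bounds).
    """
    q = len(b)
    multipliers = []                       # mu^k from the primal problems
    upper, best_y, y = float("inf"), None, tuple(y0)
    history = []

    def support(yy, mu):
        # min over x of L(x, yy, mu) = (x - t)^2 + d.yy + mu*(b.yy - x),
        # attained at x = t + mu/2.
        return dot(d, yy) + mu * (dot(b, yy) - t) - mu * mu / 4.0

    for _ in range(max_iter):
        # Primal problem P(y^k): convex in x, solved in closed form.
        x = max(t, dot(b, y))
        value = (x - t) ** 2 + dot(d, y)
        multipliers.append(2.0 * (x - t))  # KKT: 2(x - t) - mu = 0, mu >= 0
        if value < upper:                  # upper bounds are nonincreasing
            upper, best_y = value, y
        # Relaxed master: min over y of the max of the supports collected so
        # far, solved by enumeration here (a MILP solver in practice).
        lower, y = min((max(support(yy, mu) for mu in multipliers), yy)
                       for yy in itertools.product((0, 1), repeat=q))
        history.append((lower, upper))
        if lower >= upper - tol:           # bounds have met: stop
            break
    return upper, best_y, history


ub, y_best, hist = gbd_toy(t=1.0, b=(2, 1, 3), d=(-3, -1, -2), y0=(0, 0, 0))
print(ub, y_best)  # -2.0 (1, 0, 0)
```

For these data the (lower, upper) pairs go from (-6, 0) to (-2, -2) in four iterations. The final upper bound -2 at y = (1, 0, 0) equals the minimum of $v(y)$ over all eight binary vectors. Each support is exact at the y that generated it, so the master cannot return an already visited y without closing the gap.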


Theoretical Development 


This section presents the theoretical development of the generalized Benders decomposition, GBD. The primal problem is analyzed first, for the feasible and infeasible cases. Subsequently, the theoretical analysis for the derivation of the master problem is presented.


The Primal Problem 


The primal problem results from fixing the y variables to a particular 0-1 combination, which we denote as $y^k$, where k stands for the iteration counter. The formulation of the primal problem $P(y^k)$ at iteration k is:

$$
\begin{aligned}
\min_{x}\ & f(x, y^k) \\
\text{s.t. } & h(x, y^k) = 0 \\
& g(x, y^k) \le 0 \\
& x \in X \subseteq \mathbb{R}^n.
\end{aligned}
\tag{$P(y^k)$}
$$


Note that due to conditions C1) and C3i), the solution of the primal problem $P(y^k)$ is its global solution. We will distinguish the two cases ‘feasible primal’ and ‘infeasible primal’, and describe the analysis for each case separately.

• Feasible primal.
If the primal problem at iteration k is feasible, then its solution provides information on $x^k$ and $f(x^k, y^k)$, which is the upper bound, and on the optimal multiplier vectors $\lambda^k, \mu^k$ for the equality and inequality constraints. Subsequently, using this information we can formulate the Lagrange function as

$$
L(x, y, \lambda^k, \mu^k) = f(x,y) + \lambda^{kT} h(x,y) + \mu^{kT} g(x,y).
$$


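• Infeasible primal.
If the primal is detected by the NLP solver to be infeasible, then we consider its constraints

$$
h(x, y^k) = 0, \qquad g(x, y^k) \le 0, \qquad x \in X \subseteq \mathbb{R}^n,
$$

where the set X, for instance, consists of lower and upper bounds on the x variables. To identify a feasible point we can minimize an $l_1$ or $l_\infty$ measure of the constraint violations. An $l_1$-minimization problem can be formulated as:

$$
\begin{aligned}
\min_{x \in X,\ \alpha}\ & \sum_{i=1}^{p} \alpha_i \\
\text{s.t. } & h(x, y^k) = 0 \\
& g_i(x, y^k) \le \alpha_i, \quad i = 1, \dots, p, \\
& \alpha_i \ge 0, \quad i = 1, \dots, p.
\end{aligned}
$$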


Note that if $\sum_{i=1}^{p} \alpha_i = 0$, then a feasible point has been determined.

Also note that by defining

$$
\alpha^+ = \max(0, \alpha)
$$

and

$$
g_i^+(x, y^k) = \max\left(0,\ g_i(x, y^k)\right),
$$

the $l_1$-minimization problem is stated as:

$$
\min_{x \in X} \sum_{i=1}^{p} g_i^+(x, y^k) \quad \text{s.t. } h(x, y^k) = 0.
$$


An $l_\infty$-minimization problem can be stated similarly as:

$$
\min_{x \in X}\ \max_{i = 1, \dots, p}\ g_i^+(x, y^k) \quad \text{s.t. } h(x, y^k) = 0.
$$


Alternative feasibility minimization approaches aim at keeping feasibility in any constraint residual once it has been established. An $l_1$-minimization in these approaches takes the form:

$$
\begin{aligned}
\min_{x \in X}\ & \sum_{i \in I'} g_i^+(x, y^k) \\
\text{s.t. } & h(x, y^k) = 0 \\
& g_i(x, y^k) \le 0, \quad \forall\, i \in I,
\end{aligned}
$$

where I is the set of feasible constraints and I' is the set of infeasible constraints. Other methods seek feasibility of the constraints one at a time while maintaining feasibility for the inequalities indexed by $i \in I$. This feasibility problem is formulated as:

$$
\begin{aligned}
\min_{x \in X}\ & \sum_{i \in I'} w_i\, g_i^+(x, y^k) \\
\text{s.t. } & h(x, y^k) = 0 \\
& g_i(x, y^k) \le 0, \quad i \in I,
\end{aligned}
$$

and it is solved at any one time.
To include all the mentioned possibilities, [2] formulated a general feasibility problem (FP) defined as:

$$
\begin{aligned}
\min_{x \in X}\ & \sum_{i \in I'} w_i\, g_i^+(x, y^k) \\
\text{s.t. } & h(x, y^k) = 0 \\
& g_i(x, y^k) \le 0, \quad i \in I.
\end{aligned}
\tag{FP}
$$

The weights $w_i$ are nonnegative and not all are zero. Note that with $w_i = 1$, $i \in I'$, we obtain the $l_1$-minimization. Also, in the $l_\infty$-minimization, there exist nonnegative weights at the solution such that

$$
\sum_{i \in I'} w_i = 1
$$

and $w_i = 0$ if $g_i(x, y^k)$ does not attain the maximum value.
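The quantities that enter these feasibility problems can be evaluated at a fixed point x. The hypothetical helper below takes the values $g_i(x, y^k)$, splits the indices into I and I', and returns the $l_1$ and $l_\infty$ measures together with the (FP) objective for given weights. It only evaluates them; minimizing over x would require an NLP solver and is not attempted. The second call illustrates the remark on $l_\infty$ weights: putting unit weight on the maximal violation reproduces the $l_\infty$ value.

```python
def violation_measures(g_values, weights=None):
    """Feasibility measures at one point x (illustrative).

    g_values[i] = g_i(x, y^k). Returns the index sets I (satisfied) and
    I' (violated), the l1 and l_inf measures, and the (FP) objective
    sum_{i in I'} w_i g_i^+ (w_i = 1 by default, which gives the l1 measure).
    """
    g_plus = [max(0.0, gi) for gi in g_values]
    I = [i for i, gi in enumerate(g_values) if gi <= 0]
    I_prime = [i for i, gi in enumerate(g_values) if gi > 0]
    if weights is None:
        weights = {i: 1.0 for i in I_prime}
    return {
        "I": I,
        "I_prime": I_prime,
        "l1": sum(g_plus),
        "linf": max(g_plus, default=0.0),
        "fp": sum(weights.get(i, 0.0) * g_plus[i] for i in I_prime),
    }


m = violation_measures([-1.0, 0.5, 2.0])
print(m["I"], m["I_prime"], m["l1"], m["linf"], m["fp"])  # [0] [1, 2] 2.5 2.0 2.5
# Unit weight on the maximal violation reproduces the l_inf value:
print(violation_measures([-1.0, 0.5, 2.0], weights={2: 1.0})["fp"])  # 2.0
```

A positive (FP) value at a solution of the feasibility problem is what signals an infeasible primal. At an arbitrary point, as here, it only measures the current violation.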

Note that infeasibility in the primal problem is detected when a solution of (FP) is obtained whose objective value is greater than zero.

The solution of the feasibility problem (FP) provides information on the Lagrange multipliers for the equality and inequality constraints, which are denoted as $\bar\lambda^k, \bar\mu^k$, respectively. Then, the Lagrange function resulting from an infeasible primal problem at iteration k can be defined as:

$$
\bar{L}(x, y, \bar\lambda^k, \bar\mu^k) = \bar\lambda^{kT} h(x,y) + \bar\mu^{kT} g(x,y).
$$


It should be noted that two different types of La- 
grange functions are defined depending on whether 
the primal problem is feasible or infeasible. Also, the 
upper bound is obtained only from the feasible pri- 
mal problem. 


The Master Problem 


The derivation of the master problem in the GBD makes use of nonlinear duality theory, and is characterized by the following three key ideas:

i) projection onto the y-space;

ii) dual representation of V; and

iii) dual representation of the projection of the original problem on the y-space.

In the sequel, the theoretical analysis involved in these three key ideas is presented.


Projection Onto the y-Space 


The original problem can be written as:

$$
\begin{aligned}
\min_{y}\ \inf_{x}\ & f(x,y) \\
\text{s.t. } & h(x,y) = 0 \\
& g(x,y) \le 0 \\
& x \in X \\
& y \in Y = \{0,1\}^q,
\end{aligned}
\tag{1}
$$

where the min operator has been written separately for y and x. Note that it is an infimum with respect to x, since for given y the inner problem may be unbounded. Let us define v(y) as:

$$
\begin{aligned}
v(y) = \inf_{x}\ & f(x,y) \\
\text{s.t. } & h(x,y) = 0 \\
& g(x,y) \le 0 \\
& x \in X.
\end{aligned}
\tag{2}
$$

Note that v(y) is parametric in the y variables and therefore, by its definition, corresponds to the optimal value of the original problem for fixed y (i.e., the primal problem $P(y^k)$ for $y = y^k$).

Let us also define the set V as:

$$
V = \left\{ y : h(x,y) = 0,\ g(x,y) \le 0 \ \text{for some } x \in X \right\}.
\tag{3}
$$

Then, problem (1) can be written as:

$$
\min_{y}\ v(y) \quad \text{s.t. } y \in Y \cap V,
\tag{4}
$$

where v(y) and V are defined by (2) and (3), respectively.

Problem (4) is the projection of the original problem onto the y-space. Note also that in (4) $y \in Y \cap V$, since the projection needs to satisfy the feasibility considerations.

Having defined the projection problem onto the y-space, we can now state the theoretical result of [7].


Theorem 1 (Projection) 

i) If (x*, y*) is optimal in the original problem, then y* 
is optimal in (4). 

ii) If the original problem is infeasible or has an unbounded solution, then the same is true for (4), and vice versa.


Note that the difficulty in the original problem is due to 
the fact that v(y) and V are known only implicitly via 
(2) and (3). 
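To make the projection (1)-(4) concrete, the following Python sketch uses a small hypothetical instance that is not taken from the article: $f(x,y) = (x-3)^2 + c^T y$, a single inequality $g(x,y) = x - (a_0 + a^T y) \le 0$, no equality constraints, $X = \mathbb{R}$ and $y \in \{0,1\}^2$, with made-up data `A0`, `A`, `C`. For this instance the inner problem (2) happens to have a closed-form solution, so v(y) is explicit; minimizing it over Y reproduces the joint optimum, as Theorem 1 states. In general v(y) is only available implicitly, which is exactly the difficulty noted above.

```python
from itertools import product

# Hypothetical toy instance (not from the article), no equality constraints h:
#   min (x - 3)^2 + c.y   s.t.  g(x, y) = x - (a0 + a.y) <= 0,  x in R,  y in {0,1}^2
A0, A, C = 0.5, (1.0, 2.0), (1.0, 1.5)
Y = list(product((0, 1), repeat=2))


def capacity(y):
    """Right-hand side a0 + a.y of the single inequality for a given y."""
    return A0 + sum(ai * yi for ai, yi in zip(A, y))


def cost(y):
    return sum(ci * yi for ci, yi in zip(C, y))


def v(y):
    """Projected objective v(y) of (2); the inner problem in x is solved in closed form."""
    x_opt = min(3.0, capacity(y))  # unconstrained minimizer x = 3, clipped by x <= a0 + a.y
    return (x_opt - 3.0) ** 2 + cost(y)


def joint_grid_min(step=0.5):
    """Reference value: brute-force minimum over (x, y) with x on a grid in [-10, 10]."""
    best = float("inf")
    for y in Y:
        for i in range(-20, 21):
            x = i * step
            if x - capacity(y) <= 0:
                best = min(best, (x - 3.0) ** 2 + cost(y))
    return best


# Projected problem (4): minimize v(y) over Y (here every y is feasible, so Y is a subset of V).
projected = {y: v(y) for y in Y}
y_star = min(projected, key=projected.get)
print(y_star, projected[y_star], joint_grid_min())  # (0, 1) 1.75 1.75
```

The grid comparison only works here because the optimal x values happen to lie on the grid; it is a sanity check for this toy, not a general method.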


To overcome the aforementioned difficulty we have 
to introduce the dual representation of V and v(y). 


Dual of V 


The dual representation of V will be invoked in terms of the intersection of a collection of regions that contain it, and it is described in the following theorem, due to [7].

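Theorem 2 (Dual of V) Assuming conditions C1) and C2), a point $y \in Y$ belongs also to the set V if and only if it satisfies the (infinite) system:

$$0 \ge \inf_{x \in X} \bar L(x,y,\bar\lambda,\bar\mu), \quad \forall\, (\bar\lambda,\bar\mu) \in \Lambda, \qquad (5)$$

where

$$\Lambda = \Big\{ \bar\lambda \in \mathbb{R}^m,\ \bar\mu \in \mathbb{R}^p : \bar\mu \ge 0,\ \sum_{i=1}^{p} \bar\mu_i = 1 \Big\}.$$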
i=1 


Note that (5) is an infinite system because it has to be satisfied for all $(\bar\lambda,\bar\mu) \in \Lambda$. The dual representation of the set V needs to be invoked so as to generate a collection of regions that contain it (i.e., system (5)), and system (5) corresponds to the set of constraints that have to be incorporated for the case of infeasible primal problems.

Note that if the primal is infeasible and we make use of the $l_1$-minimization of the type:

$$\min_{x,\ \alpha}\ \sum_{i \in I} \alpha_i \quad \text{s.t.}\quad h(x,y^k) = 0,\quad g_i(x,y^k) \le \alpha_i,\ i \in I,\quad x \in X, \qquad (6)$$

then the set $\Lambda$ results from a straightforward application of the KKT gradient conditions to problem (6) with respect to $\alpha_i$.

Having introduced the dual representation of the set 
V, which corresponds to infeasible primal problems, we 
can now invoke the dual representation of v(y). 


Dual Representation of v(y)


The dual representation of v(y) will be in terms of the pointwise supremum of a collection of functions that support it, and it is described in the following theorem, due to [7].
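Theorem 3 (Dual of v(y))

$$v(y) = \Big[\ \inf_{x}\ f(x,y)\ \ \text{s.t.}\ \ h(x,y) = 0,\ g(x,y) \le 0,\ x \in X \ \Big] = \sup_{\lambda,\ \mu \ge 0}\ \inf_{x \in X} L(x,y,\lambda,\mu), \quad \forall\, y \in Y \cap V, \qquad (7)$$

where

$$L(x,y,\lambda,\mu) = f(x,y) + \lambda^T h(x,y) + \mu^T g(x,y).$$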


The equality of v(y) and its dual holds because the strong duality theorem is satisfied under conditions C1), C2) and C3).

Substituting (7) for v(y) and (5) for $y \in Y \cap V$ into problem (4) (which is equivalent to (1)), we obtain:


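$$\min_{y \in Y}\ \sup_{\lambda,\ \mu \ge 0}\ \inf_{x \in X} L(x,y,\lambda,\mu) \quad \text{s.t.}\quad 0 \ge \inf_{x \in X} \bar L(x,y,\bar\lambda,\bar\mu), \quad \forall\, (\bar\lambda,\bar\mu) \in \Lambda.$$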


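Using the definition of supremum as the lowest upper bound and introducing a scalar $\mu_B$, we obtain:

$$\begin{aligned} \min_{y \in Y,\ \mu_B}\ & \mu_B \\ \text{s.t.}\ \ & \mu_B \ge \inf_{x \in X} L(x,y,\lambda,\mu), \quad \forall\, \lambda,\ \forall\, \mu \ge 0, \qquad (\mathrm{M})\\ & 0 \ge \inf_{x \in X} \bar L(x,y,\bar\lambda,\bar\mu), \quad \forall\, (\bar\lambda,\bar\mu) \in \Lambda, \end{aligned}$$

where

$$L(x,y,\lambda,\mu) = f(x,y) + \lambda^T h(x,y) + \mu^T g(x,y), \qquad \bar L(x,y,\bar\lambda,\bar\mu) = \bar\lambda^T h(x,y) + \bar\mu^T g(x,y),$$

which is called the master problem.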

If we assume that the optimum solution of v(y) in (2) is bounded for all $y \in Y \cap V$, then we can replace the infimum with a minimum. Subsequently, the master problem will be as follows:


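$$\begin{aligned} \min_{y \in Y,\ \mu_B}\ & \mu_B \\ \text{s.t.}\ \ & \mu_B \ge \min_{x \in X} L(x,y,\lambda,\mu), \quad \forall\, \lambda,\ \forall\, \mu \ge 0,\\ & 0 \ge \min_{x \in X} \bar L(x,y,\bar\lambda,\bar\mu), \quad \forall\, (\bar\lambda,\bar\mu) \in \Lambda, \end{aligned}$$

where $L(x,y,\lambda,\mu)$ and $\bar L(x,y,\bar\lambda,\bar\mu)$ are defined as before.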

Note that the master problem involves an infinite number of constraints, and hence we need to consider a relaxation of the master (e.g., by dropping a number of constraints) whose solution represents a lower bound on the original problem. Note also that the master problem features an outer optimization problem with respect to $y \in Y$ and inner optimization problems with respect to x, which are in fact parametric in y. It is this outer-inner nature that makes the solution of even a relaxed master problem difficult.

The inner minimization problems 


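$$\min_{x \in X} L(x,y,\lambda,\mu), \quad \forall\, \lambda,\ \forall\, \mu \ge 0, \qquad \min_{x \in X} \bar L(x,y,\bar\lambda,\bar\mu), \quad \forall\, (\bar\lambda,\bar\mu) \in \Lambda,$$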


are functions of y and can be interpreted as support functions of v(y). ($\xi(y)$ is a support function of v(y) at the point $y_0$ if and only if $\xi(y_0) = v(y_0)$ and $\xi(y) \le v(y)$ for all $y \ne y_0$.) If the support functions are linear in y, then the master problem approximates v(y) by tangent hyperplanes, and we can conclude that v(y) is convex in y. Note that v(y) can be convex in y even though the original problem is nonconvex in the joint x-y space (see [5]).
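The support-function property can be checked numerically on a small hypothetical instance (not from the article, no equality constraints): $f(x,y) = (x-3)^2 + c^T y$ and $g(x,y) = x - (a_0 + a^T y) \le 0$ with $X = \mathbb{R}$. The inner minimization of $L(x,y,\mu)$ over x is attained at $x = 3 - \mu/2$, which gives $\xi(y;\mu) = c^T y + \mu(3 - a_0 - a^T y) - \mu^2/4$. The sketch below, with illustrative helper names, confirms that for the multiplier of each primal problem $P(y_0)$ the function touches v at $y_0$ and underestimates it at every other point of Y.

```python
from itertools import product

# Hypothetical toy instance (not from the article); no equality constraints h:
#   f(x, y) = (x - 3)^2 + c.y,   g(x, y) = x - (a0 + a.y) <= 0,   X = R,   y in {0,1}^2
A0, A, C = 0.5, (1.0, 2.0), (1.0, 1.5)
Y = list(product((0, 1), repeat=2))


def capacity(y):
    return A0 + sum(ai * yi for ai, yi in zip(A, y))


def cost(y):
    return sum(ci * yi for ci, yi in zip(C, y))


def primal(y):
    """Solve P(y) in closed form; return (v(y), optimal multiplier mu of g <= 0)."""
    b = capacity(y)
    if b >= 3.0:                      # constraint inactive: x = 3, mu = 0
        return cost(y), 0.0
    # constraint active: x = b, and stationarity 2(x - 3) + mu = 0 gives mu = 2(3 - b)
    return (3.0 - b) ** 2 + cost(y), 2.0 * (3.0 - b)


def xi(y, mu):
    """xi(y; mu) = min_x L(x, y, mu); the inner minimizer is x = 3 - mu / 2."""
    return cost(y) + mu * (3.0 - capacity(y)) - mu ** 2 / 4.0


# Each primal multiplier yields a function that touches v at y0 and underestimates it elsewhere.
for y0 in Y:
    v0, mu0 = primal(y0)
    assert xi(y0, mu0) == v0                                   # xi(y0) = v(y0)
    assert all(xi(y, mu0) <= primal(y)[0] + 1e-12 for y in Y)  # xi(y) <= v(y)
print("support property verified at all", len(Y), "points")
```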

In the sequel, we will define the aforementioned 
minimization problems in terms of the notion of sup- 
port functions, that is: 


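$$\xi(y;\lambda,\mu) = \min_{x \in X} L(x,y,\lambda,\mu), \quad \forall\, \lambda,\ \forall\, \mu \ge 0,$$

$$\bar\xi(y;\bar\lambda,\bar\mu) = \min_{x \in X} \bar L(x,y,\bar\lambda,\bar\mu), \quad \forall\, (\bar\lambda,\bar\mu) \in \Lambda.$$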
Algorithmic Development


In the previous Section we discussed the primal and master problems for the GBD. The primal problem is a (linear or) nonlinear programming (NLP) problem that can be solved via available local NLP solvers (e.g., MINOS 5.3). The master problem, however, consists of outer and inner optimization problems, and approaches towards attaining its solution are discussed in the following.


How to Solve the Master Problem 


The master problem has as constraints the two inner optimization problems (i.e., for the cases of feasible and infeasible primal problems), which, however, need to be considered for all $\lambda$ and all $\mu \ge 0$ (i.e., feasible primal) and all $(\bar\lambda,\bar\mu) \in \Lambda$ (i.e., infeasible primal). This implies that the master problem has a very large number of constraints.

The most natural approach for solving the master 
problem is relaxation [7]. The basic idea in the relax- 
ation approach consists of the following: 

i) ignore all but a few of the constraints that correspond to the inner optimization problems (e.g., consider the inner optimization problems for specific or fixed multipliers $(\lambda^1,\mu^1)$ or $(\bar\lambda^1,\bar\mu^1)$);

ii) solve the relaxed master problem and check 
whether the resulting solution satisfies all of the ig- 
nored constraints. If not, then generate and add to 
the relaxed master problem one or more of the vio- 
lated constraints and solve the new relaxed master 
problem again; 

iii) continue until a relaxed master problem satisfies all of the ignored constraints, which implies that an optimal solution of the master problem has been obtained, or until a termination criterion indicates that a solution of acceptable accuracy has been found.


General Algorithmic Statement of GBD 


Assuming that the problem has a finite optimal value, 
[7] stated the general algorithm for GBD listed below. 

Note that a feasible initial primal is assumed in Step 1. However, this does not restrict the GBD, since it is possible to start with an infeasible primal problem. In this case, after detecting that the primal is infeasible, Step 3b is applied, in which a support function $\bar\xi$ is employed.


Note that Step 1 could be altered, that is instead of 
solving the primal problem we could solve a continuous 
relaxation of the original problem in which the y vari- 
ables are treated as continuous bounded by zero and 
one: 


$$\min_{x,\ y}\ f(x,y) \quad \text{s.t.}\quad h(x,y) = 0,\quad g(x,y) \le 0,\quad x \in X,\quad 0 \le y \le 1. \qquad (8)$$


If the solution of (8) is integral, then we terminate. If there exist fractional values of the y variables, then these can be rounded to the closest integer values and subsequently used as the starting vector $y^1$, with the possibility of the resulting primal problem being feasible or infeasible.
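The rounding step can be sketched in a few lines. The function name, the tolerance `tol` and the tie-breaking rule (a component equal to 0.5 is rounded up) are assumptions of this sketch; the text only says that fractional values are rounded to the closest integers.

```python
import math


def round_to_binary(y_relaxed, tol=1e-9):
    """Round a solution of the continuous relaxation (8), y in [0, 1]^q, to a 0-1 vector.

    Returns (y_int, is_integral); is_integral is True when every component is already
    within tol of 0 or 1, which is the case in which the text says GBD can terminate.
    """
    is_integral = all(min(v, 1.0 - v) <= tol for v in y_relaxed)
    # Round half up: Python's built-in round() uses round-half-to-even, so round(0.5) == 0.
    y_int = tuple(int(math.floor(v + 0.5)) for v in y_relaxed)
    return y_int, is_integral


print(round_to_binary([0.2, 0.5, 0.93]))  # ((0, 1, 1), False)
print(round_to_binary([0.0, 1.0]))        # ((0, 1), True)
```

The rounded vector is only a starting point $y^1$; as noted above, the corresponding primal may still turn out to be infeasible.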

Note also that in Step 1, Step 3a and Step 3b a rather important assumption is made, namely that we can find the support functions $\xi$ and $\bar\xi$ for the given values of the multiplier vectors $(\lambda,\mu)$ and $(\bar\lambda,\bar\mu)$. The determination of these support functions cannot be achieved in general, since they are parametric functions of y and result from the solution of the inner optimization problems.

Their determination in the general case requires a global optimization approach such as the one proposed by [5,6]. There exist, however, a number of special cases for which the support functions can be obtained explicitly as functions of the y variables. We will discuss these special cases in the next Section. If, however, it is not possible to obtain explicit expressions of the support functions in terms of the y variables, then assumptions need to be introduced for their calculation. These assumptions, as well as the resulting variants of GBD, will be discussed in the next Section. The point to note here is that the validity of lower bounds with these variants of GBD will be limited by the imposed assumptions.

Note that the relaxed master problem (see Step 2) in the first iteration will have as a constraint one support function that corresponds to the feasible primal and will be of the form:

$$\min_{y \in Y,\ \mu_B}\ \mu_B \quad \text{s.t.}\quad \mu_B \ge \xi(y;\lambda^1,\mu^1). \qquad (9)$$
1 | Let an initial point $y^1 \in Y \cap V$ (i.e., by fixing $y = y^1$, we have a feasible primal). Solve the resulting primal problem $P(y^1)$ and obtain an optimal primal solution $x^1$ and optimal multiplier vectors $\lambda^1, \mu^1$. Assume that you can find, somehow, the support function $\xi(y;\lambda^1,\mu^1)$ for the obtained multipliers $\lambda^1, \mu^1$. Set the counters k = 1 for feasible and l = 1 for infeasible, and the current upper bound $UBD = v(y^1)$. Select the convergence tolerance $\varepsilon > 0$.

2 | Solve the relaxed master problem:

$$\begin{aligned} \min_{y \in Y,\ \mu_B}\ & \mu_B \\ \text{s.t.}\ \ & \mu_B \ge \xi(y;\lambda^k,\mu^k), \quad k = 1,2,\dots,K, \qquad (\mathrm{RM})\\ & 0 \ge \bar\xi(y;\bar\lambda^l,\bar\mu^l), \quad l = 1,2,\dots,\Lambda. \end{aligned}$$

Let $(\hat y, \hat\mu_B)$ be an optimal solution of the above relaxed master problem. $\hat\mu_B$ is a lower bound on the original problem, that is, the current lower bound is $LBD = \hat\mu_B$. If $UBD - LBD \le \varepsilon$, then terminate.

3 | Solve the primal problem for $y = \hat y$, that is, the problem $P(\hat y)$. Then we distinguish two cases, feasible and infeasible primal:

3a | Feasible primal $P(\hat y)$. The primal has $v(\hat y)$ finite, with an optimal solution $\hat x$ and optimal multiplier vectors $\hat\lambda, \hat\mu$. Update the upper bound $UBD = \min\{UBD, v(\hat y)\}$. If $UBD - LBD \le \varepsilon$, then terminate. Otherwise, set $k = k + 1$, $\lambda^k = \hat\lambda$ and $\mu^k = \hat\mu$. Return to Step 2, assuming we can somehow determine the support function $\xi(y;\lambda^k,\mu^k)$.

3b | Infeasible primal $P(\hat y)$. The primal does not have a feasible solution for $y = \hat y$. Solve a feasibility problem (e.g., the $l_1$-minimization) to determine the multiplier vectors $\hat{\bar\lambda}, \hat{\bar\mu}$ of the feasibility problem. Set $l = l + 1$, $\bar\lambda^l = \hat{\bar\lambda}$ and $\bar\mu^l = \hat{\bar\mu}$. Return to Step 2, assuming we can somehow determine the support function $\bar\xi(y;\bar\lambda^l,\bar\mu^l)$.
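The steps above can be traced on a small hypothetical instance (not from the article): $\min\ (x-3)^2 + c^T y$ s.t. $x \le a_0 + a^T y$, $x \in \mathbb{R}$, $y \in \{0,1\}^2$, with made-up data. Every primal of this instance is feasible, so Step 3b is omitted, and both $P(y)$ and the support function $\xi(y;\mu)$ have closed forms. As a further simplification, the relaxed master (RM) is solved by brute-force enumeration of Y rather than by a 0-1 solver, which is only sensible for a handful of binary variables. This is a minimal sketch under those assumptions, not a general GBD implementation.

```python
from itertools import product

# Hypothetical toy instance (not from the article), chosen so that every P(y) is feasible
# (Y is a subset of V, so Step 3b never occurs) and all quantities have closed forms:
#   min (x - 3)^2 + c.y   s.t.  x - (a0 + a.y) <= 0,  x in R,  y in {0,1}^2
A0, A, C = 0.5, (1.0, 2.0), (1.0, 1.5)
Y = list(product((0, 1), repeat=len(C)))


def capacity(y):
    return A0 + sum(ai * yi for ai, yi in zip(A, y))


def cost(y):
    return sum(ci * yi for ci, yi in zip(C, y))


def primal(y):
    """Solve P(y) in closed form; return (v(y), optimal multiplier mu)."""
    b = capacity(y)
    if b >= 3.0:
        return cost(y), 0.0
    return (3.0 - b) ** 2 + cost(y), 2.0 * (3.0 - b)


def xi(y, mu):
    """Support function xi(y; mu) = min_x L(x, y, mu), explicit for this instance."""
    return cost(y) + mu * (3.0 - capacity(y)) - mu ** 2 / 4.0


def relaxed_master(cuts):
    """Step 2 by enumerating Y: min over y of max_k xi(y; mu^k). Returns (mu_B, y)."""
    return min(((max(xi(y, mu) for mu in cuts), y) for y in Y), key=lambda t: t[0])


def gbd(y1, eps=1e-9, max_iter=50):
    ubd, mu = primal(y1)                      # Step 1 (y1 must give a feasible primal)
    best_y, cuts, lbd_history = y1, [mu], []
    for _ in range(max_iter):
        lbd, y_hat = relaxed_master(cuts)     # Step 2: LBD from the relaxed master
        lbd_history.append(lbd)
        if ubd - lbd <= eps:
            break
        v_hat, mu_hat = primal(y_hat)         # Step 3a: feasible primal at y_hat
        if v_hat < ubd:
            ubd, best_y = v_hat, y_hat        # UBD = min{UBD, v(y_hat)}
        if ubd - lbd <= eps:
            break
        cuts.append(mu_hat)                   # k = k + 1, new support function
    return best_y, ubd, lbd_history


print(gbd((0, 0)))  # ((0, 1), 1.75, [-6.25, 1.5, 1.75])
```

Starting from $y^1 = (0,0)$, the recorded lower bounds are nondecreasing and meet the upper bound after three master solves.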


In the second iteration, if the primal is feasible and $(\lambda^2,\mu^2)$ are its optimal multiplier vectors, then the relaxed master problem will feature two constraints and will be of the form:

$$\min_{y \in Y,\ \mu_B}\ \mu_B \quad \text{s.t.}\quad \mu_B \ge \xi(y;\lambda^1,\mu^1),\quad \mu_B \ge \xi(y;\lambda^2,\mu^2). \qquad (10)$$


Note that in this case the relaxed master problem (10) will have an optimal value that is greater than or equal to that of (9), because of the additional constraint. Therefore, the sequence of lower bounds created from the solutions of the relaxed master problems is nondecreasing. A similar argument holds true in the case of an infeasible primal in the second iteration.

Note that since the upper bounds are produced by fixing the y variables to different 0-1 combinations, there is no reason for the upper bounds to satisfy any monotonicity property. If, however, we consider the updated upper bounds (i.e., $UBD = \min_k v(y^k)$), then the sequence of updated upper bounds is monotonically nonincreasing, since by their definition we always keep the best (least) upper bound.

The termination criterion for GBD is based on the difference between the updated upper bound and the current lower bound. If this difference is less than or equal to a prespecified tolerance $\varepsilon > 0$, then we terminate. Note, though, that if we introduce into the relaxed master integer cuts that exclude the previously found 0-1 combinations, then the termination criterion can also be met by finding an infeasible master problem (i.e., there is no 0-1 combination that makes it feasible).


Finite Convergence of GBD 


[7] proved the following finite convergence result for the GBD algorithm:

Theorem 4 (Finite convergence) If C1), C2) and C3) hold and Y is a discrete set, then the GBD algorithm terminates in a finite number of iterations for any given $\varepsilon > 0$ and even for $\varepsilon = 0$.


Variants of GBD 


In the previous Section we discussed the general algorithmic statement of GBD and pointed out a key assumption made with respect to the calculation of the support functions $\xi(y;\lambda,\mu)$ and $\bar\xi(y;\bar\lambda,\bar\mu)$ from the feasible and infeasible primal problems, respectively. In this section, we will discuss a number of variants of GBD that result from addressing the calculation of the aforementioned support functions either rigorously for special cases or by making assumptions that may not provide valid lower bounds in the general case.


Variant 1 of GBD: V1-GBD 


This variant of GBD is based on the following assumption, which was denoted by [7] as Property (P):

Theorem 5 (Property (P)) For every $\lambda$ and $\mu \ge 0$, the infimum of $L(x,y,\lambda,\mu)$ with respect to $x \in X$ (i.e., the support function $\xi(y;\lambda,\mu)$) can be taken independently of y, so that the support function $\xi(y;\lambda,\mu)$ can be obtained explicitly with little or no more effort than is required to evaluate it at a single value of y. Similarly, the support function $\bar\xi(y;\bar\lambda,\bar\mu)$, $(\bar\lambda,\bar\mu) \in \Lambda$, can be obtained explicitly.


[7] identified the following two important classes of problems where Property (P) holds:
• Class 1: f, h, g are linearly separable in x and y.
• Class 2: Variable factor programming.
In class-1 problems, we have

$$f(x,y) = f_1(x) + f_2(y), \qquad h(x,y) = h_1(x) + h_2(y), \qquad g(x,y) = g_1(x) + g_2(y).$$


In class-2 problems, we have

$$f(x,y) = -\sum_{i} f_i(x^i)\, y_i, \qquad g(x,y) = \sum_{i} x^i y_i - c.$$
In [8] problems, we have 
fy) =o Ye fille): + YO giv), 
Ra 
g(x,y); =— dx Wyi — Lk). 


In the sequel, we will discuss the v1-GBD for class-1 problems, since this class by itself defines an interesting mathematical structure for which other algorithms (e.g., outer approximation) have been developed.


V1-GBD Under Separability 


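Under the separability assumption, the support functions $\xi(y;\lambda^k,\mu^k)$ and $\bar\xi(y;\bar\lambda^l,\bar\mu^l)$ can be obtained as explicit functions of y, since:

$$\begin{aligned} \xi(y;\lambda^k,\mu^k) &= \min_{x \in X} L(x,y,\lambda^k,\mu^k)\\ &= \min_{x \in X} \{ f(x,y) + \lambda^{kT} h(x,y) + \mu^{kT} g(x,y) \}\\ &= \min_{x \in X} \{ f_1(x) + f_2(y) + \lambda^{kT}[h_1(x) + h_2(y)] + \mu^{kT}[g_1(x) + g_2(y)] \}\\ &= f_2(y) + \lambda^{kT} h_2(y) + \mu^{kT} g_2(y) + \min_{x \in X} [ f_1(x) + \lambda^{kT} h_1(x) + \mu^{kT} g_1(x) ]. \end{aligned}$$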


Note that due to separability we end up with an ex- 
plicit function of y and a problem only in x that can be 
solved independently. 
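A minimal sketch of this decomposition on a hypothetical class-1 instance (not from the article, no equality constraints): $f_1(x) = (x-3)^2$, $g_1(x) = x$, $f_2(y) = c^T y$, $g_2(y) = -(a_0 + a^T y)$. The x-only problem is solved once per multiplier, here by a ternary search that assumes a convex one-dimensional function on an arbitrary interval $[-50, 50]$; the result is then reused for every y, so $\xi(\cdot;\mu)$ becomes an explicit function of y.

```python
from itertools import product

# Hypothetical class-1 (separable) toy instance, not from the article; no equality constraints:
#   f(x, y) = f1(x) + f2(y),  g(x, y) = g1(x) + g2(y)
A0, A, C = 0.5, (1.0, 2.0), (1.0, 1.5)


def f1(x):
    return (x - 3.0) ** 2


def g1(x):
    return x


def f2(y):
    return sum(ci * yi for ci, yi in zip(C, y))


def g2(y):
    return -(A0 + sum(ai * yi for ai, yi in zip(A, y)))


def minimize_1d(phi, lo=-50.0, hi=50.0, iters=200):
    """Ternary search: minimum value of a convex function of one variable on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if phi(m1) <= phi(m2):
            hi = m2
        else:
            lo = m1
    return phi(0.5 * (lo + hi))


def support_function(mu):
    """Solve the x-only problem once for this mu, then return xi(.; mu) as a function of y."""
    x_part = minimize_1d(lambda x: f1(x) + mu * g1(x))  # does not depend on y
    return lambda y: f2(y) + mu * g2(y) + x_part


xi = support_function(1.0)
for y in product((0, 1), repeat=2):
    print(y, round(xi(y), 6))
```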


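Similarly, the support function $\bar\xi(y;\bar\lambda^l,\bar\mu^l)$ is

$$\begin{aligned} \bar\xi(y;\bar\lambda^l,\bar\mu^l) &= \min_{x \in X} \bar L(x,y,\bar\lambda^l,\bar\mu^l)\\ &= \min_{x \in X} \{ \bar\lambda^{lT} h(x,y) + \bar\mu^{lT} g(x,y) \}\\ &= \min_{x \in X} \{ \bar\lambda^{lT}[h_1(x) + h_2(y)] + \bar\mu^{lT}[g_1(x) + g_2(y)] \}\\ &= \bar\lambda^{lT} h_2(y) + \bar\mu^{lT} g_2(y) + \min_{x \in X} [ \bar\lambda^{lT} h_1(x) + \bar\mu^{lT} g_1(x) ]. \end{aligned}$$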


Note that to solve the independent problems in x, we need to know the multiplier vectors $(\lambda^k,\mu^k)$ and $(\bar\lambda^l,\bar\mu^l)$ from the feasible and infeasible primal problems, respectively.

Under the separability assumption, the primal problem for fixed $y = y^k$ takes the form

$$\min_{x \in X}\ f_1(x) + f_2(y^k) \quad \text{s.t.}\quad h_1(x) = -h_2(y^k),\quad g_1(x) \le -g_2(y^k).$$


Now, we can state the algorithmic procedure for the v1- 
GBD under the separability assumption. 

1 | Let an initial point $y^1 \in Y \cap V$. Solve the primal $P(y^1)$ and obtain an optimal solution $x^1$ and multiplier vectors $\lambda^1, \mu^1$. Set the counters k = 1, l = 1, and $UBD = v(y^1)$. Select the convergence tolerance $\varepsilon > 0$.

2 | Solve the relaxed master problem

$$\begin{aligned} \min_{y \in Y,\ \mu_B}\ & \mu_B \\ \text{s.t.}\ \ & \mu_B \ge f_2(y) + \lambda^{kT} h_2(y) + \mu^{kT} g_2(y) + L_1^k, \quad k = 1,\dots,K,\\ & 0 \ge \bar\lambda^{lT} h_2(y) + \bar\mu^{lT} g_2(y) + \bar L_1^l, \quad l = 1,\dots,\Lambda, \end{aligned}$$

where

$$L_1^k = \min_{x \in X} \{ f_1(x) + \lambda^{kT} h_1(x) + \mu^{kT} g_1(x) \}, \qquad \bar L_1^l = \min_{x \in X} \{ \bar\lambda^{lT} h_1(x) + \bar\mu^{lT} g_1(x) \}$$

are solutions of the above stated independent problems.

Let $(\hat y, \hat\mu_B)$ be an optimal solution. $\hat\mu_B$ is a lower bound, that is, $LBD = \hat\mu_B$. If $UBD - LBD \le \varepsilon$, then terminate.

3 | As in GBD.

Algorithm for v1-GBD

Note that if, in addition to the separability of x and y, we assume that y participates linearly (i.e., the conditions for the outer approximation algorithm), then we have


$$f_2(y) = c^T y, \qquad h_2(y) = A y, \qquad g_2(y) = B y,$$


in which case the relaxed master problem of Step 2 of v1-GBD will be a linear 0-1 programming problem with an additional scalar $\mu_B$, which can be solved with available solvers (e.g., CPLEX, ZOOM, SCICONIC).

If the y variables participate separably but in a non- 
linear way, then the relaxed master problem is of 0-1 
nonlinear programming type. 

Note that, due to the strong duality theorem, we do not need to solve the problems for $L_1^k$ and $\bar L_1^l$, since their optimum solutions are identical to those of the corresponding feasible and infeasible primal problems with respect to x, respectively.
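For a hypothetical separable instance (not from the article) with $f_1(x) = (x-3)^2$, $g_1(x) = x$ and $g_2(y) = -(a_0 + a^T y)$, the remark can be checked directly: the primal solution $x^k$ also minimizes $f_1(x) + \mu^k g_1(x)$, so $L_1^k$ is obtained by evaluating at $x^k$ rather than by a separate minimization. The helper names below are illustrative.

```python
from itertools import product

# Hypothetical separable toy instance (not from the article):
#   f1(x) = (x - 3)^2, g1(x) = x, g2(y) = -(a0 + a.y); P(y^k) is solved in closed form.
A0, A = 0.5, (1.0, 2.0)


def primal_x_and_mu(y):
    """Optimal x^k and multiplier mu^k of P(y^k): min f1(x) + f2(y^k) s.t. g1(x) <= -g2(y^k)."""
    b = A0 + sum(ai * yi for ai, yi in zip(A, y))
    return (3.0, 0.0) if b >= 3.0 else (b, 2.0 * (3.0 - b))


def x_part_from_primal(y):
    """L_1^k read off the primal solution, f1(x^k) + mu^k g1(x^k): no extra minimization."""
    x, mu = primal_x_and_mu(y)
    return (x - 3.0) ** 2 + mu * x


def x_part_by_minimization(mu):
    """Reference value of L_1^k: min over x of (x - 3)^2 + mu x, attained at x = 3 - mu / 2."""
    return 3.0 * mu - mu ** 2 / 4.0


for y in product((0, 1), repeat=2):
    _, mu = primal_x_and_mu(y)
    print(y, x_part_from_primal(y), x_part_by_minimization(mu))
```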


Variant 2 of GBD: V2-GBD 


This variant of GBD is based on the assumption that we can use the optimal solution $x^k$ of the primal problem $P(y^k)$, along with the multiplier vectors, for the determination of the support function $\xi(y;\lambda^k,\mu^k)$.

Similarly, we assume that we can use the optimal solution of the feasibility problem (if the primal is infeasible) for the determination of the support function $\bar\xi(y;\bar\lambda^l,\bar\mu^l)$.

The aforementioned assumption fixes the x vector to the optimal value obtained from its corresponding primal problem, and therefore eliminates the inner optimization problems that define the support functions. It should be noted that fixing x to the solution of the corresponding primal problem may not necessarily produce valid support functions, in the sense that no theoretical guarantee of obtaining valid lower bounds can be claimed in general.

The v2-GBD algorithm can be stated as follows: 


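1 | Let an initial point $y^1 \in Y \cap V$. Solve the primal problem $P(y^1)$ and obtain an optimal solution $x^1$ and multiplier vectors $\lambda^1, \mu^1$. Set the counters k = 1, l = 1, and $UBD = v(y^1)$. Select the convergence tolerance $\varepsilon > 0$.

2 | Solve the relaxed master problem:

$$\begin{aligned} \min_{y \in Y,\ \mu_B}\ & \mu_B \\ \text{s.t.}\ \ & \mu_B \ge L(x^k,y,\lambda^k,\mu^k), \quad k = 1,\dots,K,\\ & 0 \ge \bar L(x^l,y,\bar\lambda^l,\bar\mu^l), \quad l = 1,\dots,\Lambda, \end{aligned}$$

where

$$L(x^k,y,\lambda^k,\mu^k) = f(x^k,y) + \lambda^{kT} h(x^k,y) + \mu^{kT} g(x^k,y), \qquad \bar L(x^l,y,\bar\lambda^l,\bar\mu^l) = \bar\lambda^{lT} h(x^l,y) + \bar\mu^{lT} g(x^l,y)$$

are the Lagrange functions evaluated at the optimal solution $x^k$ of the primal problem (respectively, $x^l$ of the feasibility problem).

Let $(\hat y, \hat\mu_B)$ be an optimal solution. $\hat\mu_B$ is a lower bound, that is, $LBD = \hat\mu_B$. If $UBD - LBD \le \varepsilon$, then terminate.

3 | As in GBD.

Algorithm for v2-GBD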




Note that since $y \in Y = \{0,1\}^q$, the master problem is a 0-1 programming problem with one scalar variable $\mu_B$. If the y variables participate linearly, then it is a 0-1 linear problem, which can be solved with standard branch and bound algorithms. In such a case, we can introduce integer cuts of the form:

$$\sum_{i \in B} y_i - \sum_{i \in NB} y_i \le |B| - 1,$$

where $B = \{i : y_i = 1\}$, $NB = \{i : y_i = 0\}$, and $|B|$ is the cardinality of B, which eliminate the already found 0-1 combinations. If we employ such a scheme, then an alternative termination criterion is that of having an infeasible relaxed master problem. This, of course, implies that all 0-1 combinations have been considered.
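A minimal sketch of the integer cut above, with hypothetical helper names: it builds the cut coefficients for a given 0-1 combination and checks, by enumerating $\{0,1\}^3$, that the cut removes exactly that combination and no other point.

```python
from itertools import product


def integer_cut(y_found):
    """Coefficients (w, rhs) of sum_{i in B} y_i - sum_{i in NB} y_i <= |B| - 1."""
    w = [1 if yi == 1 else -1 for yi in y_found]  # +1 on B = {i: y_i = 1}, -1 on NB
    rhs = sum(y_found) - 1                        # |B| - 1
    return w, rhs


def satisfies(cut, y):
    w, rhs = cut
    return sum(wi * yi for wi, yi in zip(w, y)) <= rhs


# The cut removes exactly the combination it was built from and no other 0-1 point.
y_found = (1, 0, 1)
cut = integer_cut(y_found)
excluded = [y for y in product((0, 1), repeat=3) if not satisfies(cut, y)]
print(cut, excluded)  # ([1, -1, 1], 1) [(1, 0, 1)]
```

The left-hand side reaches $|B|$ only when every index in B is 1 and every index in NB is 0, which is why all other combinations remain feasible.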

It is of considerable interest to identify the conditions which, if satisfied, make the assumption in v2-GBD a valid one. The assumption, in a somewhat different restated form, is that:

$$\xi(y;\lambda^k,\mu^k) = \min_{x \in X} L(x,y,\lambda^k,\mu^k) \ge L(x^k,y,\lambda^k,\mu^k), \quad k = 1,\dots,K,$$

$$\bar\xi(y;\bar\lambda^l,\bar\mu^l) = \min_{x \in X} \bar L(x,y,\bar\lambda^l,\bar\mu^l) \ge \bar L(x^l,y,\bar\lambda^l,\bar\mu^l), \quad l = 1,\dots,\Lambda,$$

that is, we assume that the Lagrange functions evaluated at the solution of the corresponding primal are valid underestimators of the inner optimization problems with respect to $x \in X$.

Due to condition C1), the Lagrange functions $L(x,y,\lambda^k,\mu^k)$ and $\bar L(x,y,\bar\lambda^l,\bar\mu^l)$ are convex in x for each fixed y, since they are linear combinations of convex functions in x.

$L(x^k,y,\lambda^k,\mu^k)$ and $\bar L(x^l,y,\bar\lambda^l,\bar\mu^l)$ represent local linearizations around the points $x^k$ and $x^l$ of the support functions $\xi(y;\lambda^k,\mu^k)$ and $\bar\xi(y;\bar\lambda^l,\bar\mu^l)$, respectively. Therefore, the aforementioned assumption is valid if the projected problem v(y) is convex in y. If, however, the projected problem v(y) is nonconvex, then the assumption does not hold, and the algorithm may terminate at a local (not global) solution or even at a nonstationary point.

Note that in the above analysis we did not assume that $Y = \{0,1\}^q$, and hence the argument is applicable even when the y variables are continuous.


It is also very interesting to examine the validity of the assumption made in v2-GBD under the conditions of separability of x and y and linearity in y (i.e., OA conditions). In this case we have:


$$f(x,y) = c^T y + f_1(x), \qquad h(x,y) = A y + h_1(x), \qquad g(x,y) = B y + g_1(x).$$


Then, the support function for the feasible primal becomes

$$\xi(y;\lambda^k,\mu^k) = c^T y + \lambda^{kT}(A y) + \mu^{kT}(B y) + \min_{x \in X}\, [ f_1(x) + \lambda^{kT} h_1(x) + \mu^{kT} g_1(x) ],$$

which is linear in y and hence convex in y. Note also that since we fix $x = x^k$, the $\min_{x \in X}$ is in fact an evaluation at $x^k$. The case of $\bar\xi(y;\bar\lambda^l,\bar\mu^l)$ can be analyzed similarly.

Therefore, the assumption in v2-GBD holds true if separability and linearity hold, which also covers the case of linear 0-1 y variables. In this way, under conditions C1), C2), C3), the v2-GBD determines the global solution of problems that are separable in x and y and linear in y.
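The argument can be checked on a hypothetical instance in the OA form above (not from the article, no equality constraints): $f_1(x) = (x-3)^2$, $g_1(x) = x - a_0$, $B = -a$, i.e. the single constraint $x \le a_0 + a^T y$. Because the x-part does not depend on y, the primal solution $x^k$ is also the exact inner minimizer of $L(\cdot, y, \mu^k)$, so the v2-GBD cut built at $x^k$ coincides with the exact support function for every y. The helper names are illustrative only.

```python
from itertools import product

# Hypothetical instance in the OA form above (not from the article), no equality constraints:
#   f(x, y) = c.y + f1(x),  g(x, y) = B.y + g1(x)  with  f1(x) = (x - 3)^2,
#   g1(x) = x - a0 and B = -a, i.e. the single constraint x <= a0 + a.y.
A0, A, C = 0.5, (1.0, 2.0), (1.0, 1.5)
Y = list(product((0, 1), repeat=2))


def f1(x):
    return (x - 3.0) ** 2


def g1(x):
    return x - A0


def c_y(y):
    return sum(ci * yi for ci, yi in zip(C, y))


def B_y(y):
    return -sum(ai * yi for ai, yi in zip(A, y))


def primal(y):
    """Closed-form optimal x^k and multiplier mu^k of P(y)."""
    b = A0 + sum(ai * yi for ai, yi in zip(A, y))
    return (3.0, 0.0) if b >= 3.0 else (b, 2.0 * (3.0 - b))


def v2_cut(xk, mu, y):
    """v2-GBD cut: Lagrange function with x fixed at the primal solution x^k."""
    return c_y(y) + mu * B_y(y) + f1(xk) + mu * g1(xk)


def exact_support(mu, y):
    """xi(y; mu) with the inner minimization over x done exactly (minimizer x = 3 - mu/2)."""
    x = 3.0 - mu / 2.0
    return c_y(y) + mu * B_y(y) + f1(x) + mu * g1(x)


for yk in Y:
    xk, mu = primal(yk)
    assert all(v2_cut(xk, mu, y) == exact_support(mu, y) for y in Y)
print("v2-GBD cuts coincide with the exact support functions on Y")
```

With a nonseparable f or g the inner minimizer would in general move with y, and the two functions would no longer agree.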


Variant 3 of GBD: V3-GBD 


This variant was proposed in [4], denoted as global optimum search (GOS), and was applied to continuous as well as 0-1 sets Y. It uses the same assumption as v2-GBD but in addition assumes that:

i) f(x, y), g(x, y) are convex functions in y for every 
fixed x; and 

ii) h(x, y) are linear functions in y for every x. 

This additional assumption was made so as to create special structure not only in the primal but also in the relaxed master problem. The type of special structure in the relaxed master problem has to do with its convexity characteristics.

The basic idea in GOS is to select the x and y variables in such a way that the primal and the relaxed master problem of the v2-GBD satisfy the appropriate convexity requirements and hence attain their respective global solutions.

We will discuss v3-GBD first under the separability 
of x and y and then for the general case. 
V3-GBD Under Separability


Under the separability assumption we have: 


$$f(x,y) = f_1(x) + f_2(y), \qquad h(x,y) = h_1(x) + h_2(y), \qquad g(x,y) = g_1(x) + g_2(y).$$


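The additional assumption that makes v3-GBD different from v2-GBD implies that
i) $f_2(y)$, $g_2(y)$ are convex in y; and
ii) $h_2(y)$ are linear in y.
Then, the relaxed master problem will be:

$$\begin{aligned} \min_{y \in Y,\ \mu_B}\ & \mu_B \\ \text{s.t.}\ \ & \mu_B \ge f_2(y) + \lambda^{kT} h_2(y) + \mu^{kT} g_2(y) + [ f_1(x^k) + \lambda^{kT} h_1(x^k) + \mu^{kT} g_1(x^k) ], \quad k = 1,\dots,K,\\ & 0 \ge \bar\lambda^{lT} h_2(y) + \bar\mu^{lT} g_2(y) + [ \bar\lambda^{lT} h_1(x^l) + \bar\mu^{lT} g_1(x^l) ], \quad l = 1,\dots,\Lambda. \end{aligned}$$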


Note that the additional assumption makes the relaxed master problem convex in y if y represents continuous variables. If $y \in Y = \{0,1\}^q$ and the y variables participate linearly (i.e., $f_2$, $g_2$ are linear in y), then the relaxed master is convex. Therefore, this case represents an improvement over v2-GBD, and application of v3-GBD will result in valid support functions, which implies that the global optimum of the original problem will be obtained.


V3-GBD Without Separability 


The global optimum search (GOS) aimed at exploiting and invoking special structure for nonconvex, nonseparable problems

$$\min_{x,\ y}\ f(x,y) \quad \text{s.t.}\quad h(x,y) = 0,\quad g(x,y) \le 0,\quad x \in X \subseteq \mathbb{R}^n,\quad y \in Y \subseteq \mathbb{R}^q,$$

under the conditions C1), C2), C3) and the additional condition:

i) f(x, y), g(x, y) are convex functions in y for every fixed x;

ii) h(x, y) are linear functions in y for every x.

Hence both the primal and the relaxed master problems attain their respective global solutions.

Note that since x and y are not separable, GOS cannot provide theoretically valid support functions in the general case, but only if v(y) is convex (see the section on v2-GBD).

The global optimization approach (GOP) of [5,6] overcomes this fundamental difficulty and guarantees $\varepsilon$-global optimality for several classes of nonconvex problems.


GBD in Continuous and Discrete-Continuous 
Optimization 


We mentioned in the Section Formulation that the original problem represents a subclass of the problems to which generalized Benders decomposition (GBD) can be applied. This is because we considered the set Y to consist of 0-1 variables, while [7] proposed an analysis for Y being a continuous, discrete or continuous-discrete set.

The main objective in this section is to present the modifications needed to carry out the analysis for continuous Y and discrete-continuous Y sets.

The analysis presented for the primal problem remains the same. The analysis of the master problem, however, changes only in the dual representation of the projection of the original problem (i.e., v(y)) on the y-space. In fact, Theorem 3 is satisfied if, in addition to the two conditions mentioned in C3), we have that:

iii) for each fixed y, v(y) is finite, h(x, y), g(x, y) and f(x, y) are continuous on X, X is closed, and the $\varepsilon$-optimal solution set of the primal problem P(y) is nonempty and bounded for some $\varepsilon > 0$.

Hence, Theorem 3 has as assumptions C1) and C3), where C3) now consists of i), ii) and iii). The algorithmic procedure remains the same, while the theorem for finite convergence becomes one of finite $\varepsilon$-convergence and requires additional conditions, which are described in the following theorem:


Theorem 6 (Finite $\varepsilon$-convergence) Let

i) Y be a nonempty subset of V;

ii) X be a nonempty convex set;

iii) f, g be convex on X for each fixed $y \in Y$;

iv) h be linear on X for each fixed $y \in Y$;

v) f, g, h be continuous on $X \times Y$;

vi) the set of optimal multiplier vectors for the primal problem be nonempty for all $y \in Y$, and uniformly bounded in some neighborhood of each such point.

Then, for any given $\varepsilon > 0$, the GBD terminates in a finite number of iterations.


Assumption i) (i.e., $Y \subseteq V$) eliminates the possibility of Step 3b, and there are many applications in which $Y \subseteq V$ holds (e.g., variable factor programming). If, however, $Y \not\subseteq V$, then we may need to execute Step 3b infinitely many successive times. In such a case, to preserve finite $\varepsilon$-convergence, we can modify the procedure so as to finitely truncate any excessively long sequence of successive executions of Step 3b and return to Step 3a with $\hat y$ equal to the extrapolated limit point, which is assumed to belong to $Y \cap V$. If we do not make the assumption $Y \subseteq V$, then the key property to seek is that V has a representation in terms of a finite collection of constraints, because if this is the case, then Step 3b can occur at most a finite number of times. Note that if, in addition to C1), X represents bounds on the x variables or is given by linear constraints, and h, g satisfy the separability condition, then V can be represented in terms of a finite collection of constraints.

Assumption vi) requires that for all y € Y there ex- 
ist optimal multiplier vectors and that these multiplier 
vectors do not go to infinity, that is they are uniformly 
bounded in some neighborhood of each such point. [7] 
provided the following condition to check the uniform 
boundedness: 

If X is a nonempty, compact, convex set and there exists a point $x \in X$ such that

$$h(x,y) = 0, \qquad g(x,y) < 0,$$

then the set of optimal multiplier vectors is uniformly bounded in some open neighborhood of y.
> MINLP: Branch and Bound Global Optimization
> MINLP: Branch and Bound Methods

> MINLP: Design and Scheduling of Batch 
Processes 

> MINLP: Generalized Cross Decomposition 

> MINLP: Global Optimization with aBB 
> MINLP: Logic-based Methods

> MINLP: Outer Approximation Algorithm 

> MINLP: Reactive Distillation Column Synthesis 

> Mixed Integer Linear Programming: Mass and Heat 
> Mixed Integer Nonlinear Programming

> Simplicial Decomposition 

> Simplicial Decomposition Algorithms 

> Stochastic Linear Programming: Decomposition 
and Cutting Planes 

> Successive Quadratic Programming: Decomposition 
In the context of economics and optimization, generalized concavity is nowadays recognized to play a fundamental role; it has been widely studied starting from the pioneering work of K. Arrow and A.C. Enthoven [1].

The study of generalized concavity of vector valued functions is less developed than in the scalar case; nevertheless, some classes with related properties have been suggested in order to obtain sufficient optimality conditions and the connectedness of the set of all efficient points.

Since there are different ways of generalizing the definitions of generalized concave functions given in the scalar case to the multi-objective case, we introduce the following classes of generalized concave vector valued functions, referring to the bibliography for further details.

Let S be a convex subset of the n-dimensional space $\mathbb{R}^n$ and let F be a vector function from S to $\mathbb{R}^s$. Assume that $\mathbb{R}^s$ is partially ordered by the convex closed cone U with vertex at the origin, $0 \in U$, and with nonempty interior (i.e., $\text{int}\,U \ne \emptyset$). Set $U^0 = U \setminus \{0\}$.


Definition 1 The function F is said to be U-concave if: 


$$F(x_1 + \lambda(x_2 - x_1)) \in F(x_1) + \lambda\,(F(x_2) - F(x_1)) + U,$$

for all $\lambda \in (0,1)$ and all $x_1, x_2 \in S$.


Definition 2 The function F is said to be U- 
quasiconcave if: 


$$x_1, x_2 \in S,\quad F(x_2) \in F(x_1) + U$$

imply

$$F(x_1 + \lambda(x_2 - x_1)) \in F(x_1) + U$$

for all $\lambda \in (0,1)$.


Definition 3 The function F is said to be $U^0$-quasiconcave if:

$$x_1, x_2 \in S,\quad F(x_2) \in F(x_1) + U^0$$

imply

$$F(x_1 + \lambda(x_2 - x_1)) \in F(x_1) + U^0$$

for all $\lambda \in (0,1)$.


Definition 4 The function F is said to be int U-quasiconcave if:

$$x_1, x_2 \in S,\quad F(x_2) \in F(x_1) + \text{int}\,U$$

imply

$$F(x_1 + \lambda(x_2 - x_1)) \in F(x_1) + \text{int}\,U$$

for all $\lambda \in (0,1)$.


In [12], D.T. Luc suggests another class of quasiconcave functions, which is less general than the one given in Definition 2 but plays an important role in establishing the connectedness of the set of all efficient points.


Definition 5 The function F is said to be Luc U-quasiconcave if:

$$y \in \mathbb{R}^s,\quad x_1, x_2 \in S,\quad F(x_1), F(x_2) \in y + U$$

imply

$$F(x_1 + \lambda(x_2 - x_1)) \in y + U$$

for all $\lambda \in (0,1)$.
In the scalar case, that is, when s = 1 and $U = \mathbb{R}_+$, Definition 1 reduces to the ordinary definition of concavity, Definitions 2 and 5 to quasiconcavity, and Definitions 3 and 4 to semistrict quasiconcavity.

Inclusion relationships among the previous classes 
of functions are given in the following Theorem: 


Theorem 6 

i) if F is U-concave, then F is Luc U-quasiconcave; 

ii) if F is Luc U-quasiconcave, then F is U- 
quasiconcave; 

iii) if F is U-concave and U is a pointed cone, then F is 
intU-quasiconcave; 

iv) if F is U-concave and U is a pointed cone, then F is $U^0$-quasiconcave.


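Proof
i) Assume that $F(x_1), F(x_2) \in y + U$; it follows that $(1-\lambda)F(x_1) \in (1-\lambda)y + U$ and $\lambda F(x_2) \in \lambda y + U$, so that $(1-\lambda)F(x_1) + \lambda F(x_2) \in (1-\lambda)y + \lambda y + U = y + U$. Since F is U-concave, $F(x_1 + \lambda(x_2 - x_1)) \in (1-\lambda)F(x_1) + \lambda F(x_2) + U \subseteq y + U + U = y + U$.

ii) It is sufficient to choose $y = F(x_1)$.

iii) Assume that $F(x_2) \in F(x_1) + \text{int}\,U$, that is, $F(x_2) - F(x_1) \in \text{int}\,U$. Since F is U-concave, we have $F(x_1 + \lambda(x_2 - x_1)) \in F(x_1) + \lambda(F(x_2) - F(x_1)) + U$. The thesis follows taking into account that for a pointed cone the property $\text{int}\,U + U = \text{int}\,U$ holds.

iv) The proof is similar to the one given in iii). □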


Remark 7 When U is the Paretian cone U = R%,, com- 

ponentwise generalized concavity implies generalized 

concavity. For instance: 

e if any component of F is quasiconcave then F is U- 
quasiconcave; 

e if any component of F is strongly quasiconcave then 

F is either intU-quasiconcave or U°-quasiconcave; 
e if any component of F is upper semicontinuous 

and semistrictly quasiconcave then F is either intU- 

quasiconcave or U°-quasiconcave. 
It can be proven that F is R‘,-concave (Luc R§,- 
quasiconcave) if and only if all its components are con- 
cave (quasiconcave); such a property does not hold for 
the other given classes of generalized concave func- 
tions, so that the inclusion relationships stated in i) and 
ii) of Theorem 6 are strict. 

In the particular case of a continuous bicriteria function ($s = 2$, $U = \mathbb{R}^2_+$), the class of Luc U-quasiconcave functions collapses to the class of U-quasiconcave functions [8].


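Remark 8 The following examples point out that the classes of $\operatorname{int} U$-quasiconcave and $U^0$-quasiconcave functions are not comparable.

Consider the function $F: \mathbb{R} \to \mathbb{R}^3$, $F(x) = (x, x^2 - x, -x^2 + x)$, and the Paretian cone $U = \mathbb{R}^3_+$. F is $\operatorname{int} U$-quasiconcave since there do not exist $x, y \in \mathbb{R}$ such that $F(y) - F(x) \in \operatorname{int} U$ (the second and third components of $F(y) - F(x)$ are opposite numbers); on the other hand, F is not $U^0$-quasiconcave since $F(1) = (1, 0, 0) \in F(0) + \mathbb{R}^3_+ \setminus \{0\}$, but $F(1/2) = (1/2, -1/4, 1/4) \notin F(0) + \mathbb{R}^3_+ \setminus \{0\}$.

Consider now the function $F: \mathbb{R} \to \mathbb{R}^2$, $F(x) = (x, f(x))$ with $f(x) = 0$ if $x \le 1$, $f(x) = x - 1$ if $x > 1$, and the Paretian cone $U = \mathbb{R}^2_+$. It is easy to verify that F is $U^0$-quasiconcave, but F is not $\operatorname{int} U$-quasiconcave since $F(2) = (2, 1) \in F(0) + \operatorname{int} \mathbb{R}^2_+$ and $F(1) = (1, 0) \notin F(0) + \operatorname{int} \mathbb{R}^2_+$.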
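The two counterexamples can be checked numerically. The following Python sketch is only an illustration, not part of the original argument; the helper names (`in_u0`, `in_int_u`, `F1`, `F2`) are hypothetical, and U is taken to be the Paretian cone, so that membership in $U^0$ and in $\operatorname{int} U$ reduces to componentwise sign tests.

```python
from typing import Tuple

Vec = Tuple[float, ...]


def in_u0(v: Vec) -> bool:
    """v in U^0 = R^s_+ minus {0}: nonnegative components, not all zero."""
    return all(c >= 0 for c in v) and any(c != 0 for c in v)


def in_int_u(v: Vec) -> bool:
    """v in int R^s_+: every component strictly positive."""
    return all(c > 0 for c in v)


def diff(a: Vec, b: Vec) -> Vec:
    """Componentwise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def F1(x: float) -> Vec:
    """First example: F(x) = (x, x^2 - x, -x^2 + x)."""
    return (x, x * x - x, -x * x + x)


def f(x: float) -> float:
    """Scalar component of the second example."""
    return 0.0 if x <= 1 else x - 1.0


def F2(x: float) -> Vec:
    """Second example: F(x) = (x, f(x))."""
    return (x, f(x))


# F1 is not U^0-quasiconcave: F1(1) lies in F1(0) + U^0, but F1(1/2) does not.
print(in_u0(diff(F1(1), F1(0))), in_u0(diff(F1(0.5), F1(0))))        # True False
# F2 is not int U-quasiconcave: F2(2) lies in F2(0) + int U, but F2(1) does not.
print(in_int_u(diff(F2(2), F2(0))), in_int_u(diff(F2(1), F2(0))))    # True False
```

A finite check of this kind can only exhibit the counterexamples; the positive claims ($F_1$ is $\operatorname{int} U$-quasiconcave, $F_2$ is $U^0$-quasiconcave) still rest on the arguments given in the text.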


Remark 9 In the scalar case an upper semicontinuous and semistrictly quasiconcave function is also quasiconcave; this property is lost for a vector valued function, as is shown in the following example, so that the two classes are not comparable: consider the function $F: \mathbb{R} \to \mathbb{R}^2$ defined as $F(x) = (x \sin(1/x), -x \sin(1/x))$ if $x \neq 0$, $F(x) = 0$ if $x = 0$. F is continuous and $U^0$-quasiconcave, but it is not U-quasiconcave at $x = 0$.


Remark 10 As is known, in the scalar case there exists a characterization of differentiable quasiconcave functions; unfortunately, such a characterization cannot be extended to the vector case (for further developments see [7]).


Consider a differentiable vector valued function F. As in the quasiconcave case, there are different ways to extend the concept of pseudoconcavity introduced by O.L. Mangasarian [14]. With the aim of stating some sufficient optimality conditions, we introduce the following two classes of functions, where $J_F(x)$ denotes the Jacobian matrix of F evaluated at x.


Definition 11 F is said to be U-weakly pseudoconcave if:

$x_1, x_2 \in S,\ F(x_2) \in F(x_1) + U^0$
imply
$J_F(x_1)\,d \in U^0, \quad d = \dfrac{x_2 - x_1}{\|x_2 - x_1\|}.$


Definition 12 F is said to be U-pseudoconcave if:

$x_1, x_2 \in S,\ F(x_2) \in F(x_1) + U^0$
imply
$J_F(x_1)\,d \in \operatorname{int} U, \quad d = \dfrac{x_2 - x_1}{\|x_2 - x_1\|}.$


When $s = 1$ and $U = \mathbb{R}_+$, Definitions 11 and 12 reduce to the ordinary definition of a pseudoconcave function.

Obviously, a function which is U-pseudoconcave is U-weakly pseudoconcave too; a linear function is U-concave and U-weakly pseudoconcave with respect to every cone U with vertex at the origin $0 \in U$, but it is not U-pseudoconcave. As a consequence, the class of U-pseudoconcave functions is properly contained in the class of U-weakly pseudoconcave functions.


Remark 13 When U is the Paretian cone $U = \mathbb{R}^s_+$, we have:

• if every component of F is pseudoconcave, then F is $\mathbb{R}^s_+$-weakly pseudoconcave;
• if every component of F is strictly pseudoconcave, then F is both $\mathbb{R}^s_+$-weakly pseudoconcave and $\mathbb{R}^s_+$-pseudoconcave.


Efficiency 


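Consider the following vector optimization problem:

$$(P)\qquad U\text{-}\max F(x), \quad x \in S \subseteq X,$$

where X is an open set of $\mathbb{R}^n$, $F: X \to \mathbb{R}^s$, and $U \subset \mathbb{R}^s$ is a nontrivial cone with vertex at the origin $0 \in U$, $\operatorname{int} U \neq \emptyset$.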

A point $x_0 \in S$ is said to be:

• weakly efficient if $F(x) \notin F(x_0) + \operatorname{int} U$ for all $x \in S$;
• efficient if $F(x) \notin F(x_0) + U^0$ for all $x \in S$;
• strictly efficient if $F(x) \notin F(x_0) + U$ for all $x \in S$, $x \neq x_0$.

If the previous conditions are verified in $I \cap S$, where I is a suitable neighborhood of $x_0$, then $x_0$ is said to be a local weakly efficient point, a local efficient point and a local strictly efficient point, respectively.
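For a finite feasible set and the Paretian cone $U = \mathbb{R}^s_+$, the three notions can be tested by direct comparison of objective vectors. The Python sketch below assumes exactly that setting (the function names are hypothetical): each list entry stands for a distinct feasible point, possibly with the same objective vector as another point, which matters for strict efficiency because its definition only excludes $x = x_0$ itself.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def in_pareto_cone(d: Vector) -> bool:
    """d in U = R^s_+."""
    return all(c >= 0 for c in d)


def in_u0(d: Vector) -> bool:
    """d in U^0 = U minus {0}."""
    return in_pareto_cone(d) and any(c != 0 for c in d)


def in_int_u(d: Vector) -> bool:
    """d in int U."""
    return all(c > 0 for c in d)


def _undominated(values: List[Vector], test: Callable[[Vector], bool],
                 skip_self: bool) -> List[int]:
    """Indices i such that no values[j] - values[i] passes `test`."""
    result = []
    for i, v in enumerate(values):
        dominated = any(
            test([wc - vc for wc, vc in zip(w, v)])
            for j, w in enumerate(values)
            if not (skip_self and j == i)
        )
        if not dominated:
            result.append(i)
    return result


def weakly_efficient(values: List[Vector]) -> List[int]:
    # The zero difference is never in int U, so comparing with itself is harmless.
    return _undominated(values, in_int_u, skip_self=False)


def efficient(values: List[Vector]) -> List[int]:
    # The zero difference is never in U^0 either.
    return _undominated(values, in_u0, skip_self=False)


def strictly_efficient(values: List[Vector]) -> List[int]:
    # 0 belongs to U, so the point itself (x = x_0) must be skipped.
    return _undominated(values, in_pareto_cone, skip_self=True)


values = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1), (2, 2)]
print(weakly_efficient(values))    # [0, 1, 2, 4, 5]
print(efficient(values))           # [0, 1, 2, 5]
print(strictly_efficient(values))  # [0, 2]
```

In the example, the point with value (2, 1) is weakly efficient but not efficient, and the two points sharing the value (2, 2) are efficient but not strictly efficient, illustrating that the implications stated below are in general strict.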

In the scalar case ($s = 1$, $U = \mathbb{R}_+$), the definitions of a (local) weakly efficient point and of a (local) efficient point reduce to the ordinary definition of a (local) maximum point, while a (local) strictly efficient point reduces to the ordinary definition of a (local) strict maximum point. Obviously, (local) strict efficiency implies (local) efficiency, and (local) efficiency implies (local) weak efficiency.


The concept of efficiency was originally introduced by V. Pareto in the early 1900s, when he used the positive orthant $\mathbb{R}^s_+$ to generate the order; therefore, when $U = \mathbb{R}^s_+$, efficient points are often called Pareto points.

As in the scalar case, vector generalized concavity plays an important role in investigating relationships between local and global optima. Following [14], the assumption of convexity of the feasible region can be weakened by requiring that S be star-shaped at the point $x_0$.

A set $S \subseteq X$ is said to be star-shaped at $x_0 \in S$ if for every $x \in S$ we have:

$$[x, x_0] = \{tx + (1 - t)x_0 : t \in [0, 1]\} \subseteq S.$$


Since optimality results involve a feasible point, from now on we will consider generalized concavity at a point $x_0$; this means that all the given definitions hold with $x_1 = x_0$. The following theorem shows that, under suitable assumptions of generalized concavity, local efficiency implies global efficiency.


Theorem 14 Let us consider problem (P) where S is a star-shaped set at $x_0$.

i) if $x_0$ is a local weakly efficient point and F is $\operatorname{int} U$-quasiconcave at $x_0$, then $x_0$ is a weakly efficient point for (P);

ii) if $x_0$ is a local efficient point and F is $U^0$-quasiconcave at $x_0$, then $x_0$ is an efficient point for (P);

iii) if $x_0$ is a strict local efficient point and F is U-quasiconcave at $x_0$, then $x_0$ is a strictly efficient point for (P);

iv) if $x_0$ is a local efficient point and F is U-pseudoconcave at $x_0$, then $x_0$ is an efficient point for (P).


Proof i) Assume that there exists $x^* \in S$ such that $F(x^*) \in F(x_0) + \operatorname{int} U$. Since F is $\operatorname{int} U$-quasiconcave at $x_0$, we have $F(x_0 + \lambda(x^* - x_0)) \in F(x_0) + \operatorname{int} U$ for all $\lambda \in (0, 1)$; choosing λ small enough, this contradicts the local weak efficiency of $x_0$.

ii), iii) follow with similar arguments.

iv) Assume that there exists $x^* \in S$ such that $F(x^*) \in F(x_0) + U^0$. Since F is U-pseudoconcave at $x_0$, we have $J_F(x_0)d \in \operatorname{int} U$, $d = (x^* - x_0)/\|x^* - x_0\|$, that is,

$$\lim_{t \to 0^+} \frac{F(x_0 + td) - F(x_0)}{t} \in \operatorname{int} U,$$

and this implies the existence of a suitable $\varepsilon > 0$ such that $F(x_0 + td) - F(x_0) \in \operatorname{int} U$ for all $t \in (0, \varepsilon)$.

Set $t = \lambda\|x^* - x_0\|$; we have $F(x_0 + \lambda(x^* - x_0)) \in F(x_0) + \operatorname{int} U$ for all $\lambda \in (0, \varepsilon/\|x^* - x_0\|)$, and this contradicts the local efficiency of $x_0$. □


Corollary 15 Let us consider problem (P) where S is locally star-shaped at $x_0$.

i) If U is a pointed cone and F is U-concave at $x_0$, then a local efficient point $x_0$ is an efficient point too.

ii) If F is linear, then a local efficient point $x_0$ is an efficient point too.


Optimality Conditions 


Now we point out the role played by vector generalized concavity in stating sufficient optimality conditions. To this end, consider the necessary optimality conditions stated in the following theorem:


Theorem 16 Let us consider problem (P) where F is differentiable at $x_0$.
i) If $x_0$ is a local interior efficient point for (P), then

$$\exists\, \alpha \in U^* \setminus \{0\}: \ \alpha^T J_F(x_0) = 0, \tag{1}$$

where $U^*$ denotes the positive polar cone of U.
ii) If $x_0$ is a local efficient point for (P), then

$$J_F(x_0)\,v \notin \operatorname{int} U, \quad \forall v \in T(S, x_0),\ v \neq 0. \tag{2}$$

Here, $T(S, x_0)$ is the Bouligand tangent cone, defined as:

$$T(S, x_0) = \left\{ v : \exists \{\alpha_n\} \subset \mathbb{R},\ \{x_n\} \subset S,\ \alpha_n \to +\infty,\ x_n \to x_0,\ \alpha_n(x_n - x_0) \to v \right\}.$$


The following theorem points out the different roles played by weak pseudoconcavity and pseudoconcavity:

Theorem 17 Let us consider problem (P) where S is a star-shaped set and F is differentiable at $x_0$.

i) if (1) holds and F is U-pseudoconcave at $x_0$, then $x_0$ is an efficient point for (P);

ii) if (1) holds with $\alpha \in \operatorname{int} U^*$ and F is U-weakly pseudoconcave at $x_0$, then $x_0$ is an efficient point for (P);

iii) if (2) holds and F is U-pseudoconcave at $x_0$, then $x_0$ is an efficient point for (P);

iv) if $J_F(x_0)\,v \notin U^0$ for all $v \in T(S, x_0)$ and F is U-weakly pseudoconcave at $x_0$, then $x_0$ is an efficient point for (P).


Proof i) Assume that there exists $x^* \in S$ such that $F(x^*) \in F(x_0) + U^0$. Since F is U-pseudoconcave at $x_0$, we have $J_F(x_0)d \in \operatorname{int} U$, $d = (x^* - x_0)/\|x^* - x_0\|$, so that $\alpha^T(J_F(x_0)d) > 0$, and this contradicts (1).

ii) Assume that there exists $x^* \in S$ such that $F(x^*) \in F(x_0) + U^0$. Since F is U-weakly pseudoconcave at $x_0$, we have $J_F(x_0)d \in U^0$, $d = (x^* - x_0)/\|x^* - x_0\|$, so that $\alpha^T(J_F(x_0)d) > 0$ because $\alpha \in \operatorname{int} U^*$, and this contradicts (1).

iii), iv) follow immediately. □


When F is a linear vector valued function, Theorem 17 ii) can be specialized by means of the following theorem:


Theorem 18 Consider problem (P) where F is linear and U is a pointed cone.

An interior point $x_0$ is an efficient point for (P) if and only if there is $\alpha \in \operatorname{int} U^*$ such that $\alpha^T J_F(x_0) = 0$.


F. John Generalized Conditions 


Now we stress the role of vector generalized concavity in stating the sufficiency of the F. John conditions.

To this end, consider the vector problem (P) in the following form:

$$(P)\qquad U\text{-}\max F(x), \quad x \in S = \{x \in X : G(x) \in V\},$$

where $X \subseteq \mathbb{R}^n$ is an open set, $F: X \to \mathbb{R}^s$ and $G: X \to \mathbb{R}^m$ are differentiable functions, and $U \subset \mathbb{R}^s$, $V \subset \mathbb{R}^m$ are closed, pointed, convex cones with vertices at the origin and nonempty interiors.

Denote by $U^*$, $V^*$ the positive polar cones of U and V, respectively, and let $x_0$ be a feasible point such that $G(x_0) = 0$.

The following F. John necessary optimality conditions hold:


Theorem 19 If $x_0$ is a local efficient point for (P), then

$$\exists\, (\alpha_F, \alpha_G) \neq 0,\ \alpha_F \in U^*,\ \alpha_G \in V^*: \quad \alpha_F^T J_F(x_0) + \alpha_G^T J_G(x_0) = 0. \tag{3}$$
The following theorem points out the role of gener- 
alized concavity in stating sufficient optimality condi- 
tions: 


Theorem 20 Let us consider the vector optimization problem (P) where S is a star-shaped set at $x_0$ and F, G are differentiable at $x_0$.

i) if F is U-weakly pseudoconcave at $x_0$, G is V-quasiconcave at $x_0$, and (3) holds with $\alpha_F \in \operatorname{int} U^*$, then $x_0$ is an efficient point for (P).

ii) if F is U-pseudoconcave at $x_0$, G is V-quasiconcave at $x_0$, and (3) holds with $\alpha_F \in U^* \setminus \{0\}$, then $x_0$ is an efficient point for (P).


Proof i) Suppose that there exists $x^* \in S$ such that $F(x^*) \in F(x_0) + U^0$. Since F is U-weakly pseudoconcave at $x_0$ and G is V-quasiconcave at $x_0$, we have, respectively, $J_F(x_0)(x^* - x_0) \in U^0$ and $J_G(x_0)(x^* - x_0) \in V$, and thus $\alpha_F^T J_F(x_0)(x^* - x_0) > 0$ and $\alpha_G^T J_G(x_0)(x^* - x_0) \ge 0$, since $\alpha_F \in \operatorname{int} U^*$ and $\alpha_G \in V^*$. Consequently $\alpha_F^T J_F(x_0)(x^* - x_0) + \alpha_G^T J_G(x_0)(x^* - x_0) > 0$, and this contradicts (3).

ii) The proof is similar to the one given in i). □


Connectedness of the Efficient Points Sets 


A vector maximization problem normally has a continuum of optimal alternatives, and it may be necessary to select one or several of these which are best with respect to some additional auxiliary criterion. Connectedness is therefore a desirable property, since it makes it possible to move continuously from one efficient point to any other along optimal alternatives only. Consider problem (P) where $F = (f_1, \ldots, f_s)$ is a continuous function and U is the Paretian cone; denote by $S(a)$ the upper level set associated with the point $a \in \mathbb{R}^s$, that is, $S(a) = \{x \in S : F(x) \in a + U\}$. The following fundamental result was given by A.R. Warburton [16].


Theorem 21

i) if $f_1, \ldots, f_s$ are quasiconcave functions on the closed convex set S and $S(a)$ is compact for each $a \in f_1(S) \times \cdots \times f_s(S)$, then the set of all weakly Pareto points is nonempty and connected;

ii) if $f_1, \ldots, f_s$ are strongly quasiconcave functions on the closed convex set S and $S(a)$ is compact for each $a \in f_1(S) \times \cdots \times f_s(S)$, then the set of all Pareto points is nonempty and connected.

Obviously, the compactness of the sets $S(a)$ is verified when S is a compact set; in this last case, for bicriteria and three-criteria problems, Theorem 21 ii) holds under the weaker assumption of semistrict quasiconcavity instead of strong quasiconcavity [9,15].

In [12], Luc extends Theorem 21 i) to a pointed closed convex cone, requiring that F be U-continuous.


F is said to be U-continuous at $x \in S$ if for any neighborhood H of $F(x)$ there exists a neighborhood I of x such that $F(y) \in H - U$ for all $y \in I \cap S$.


Theorem 22 Assume that F is a U-continuous Luc U-quasiconcave function on S and that the set of all weakly efficient points of $S(a)$ is compact for each $a \in \mathbb{R}^s$. Then the set of all weakly efficient points is nonempty and connected.


See also 


> Invexity and its Applications 
> Isotonic Regression Problems 
Introduction


Generalized Disjunctive Programming (GDP) [13] is an extension of disjunctive programming [1,2] that provides an alternative way of modeling mixed-integer linear programming (MILP) and mixed-integer nonlinear programming (MINLP) problems. The general formulation of a (GDP) is as follows:


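$$
\begin{aligned}
\min\ & Z = \sum_{k \in K} c_k + f(x) \\
\text{s.t.}\ & r(x) \le 0 \\
& \bigvee_{j \in J_k} \begin{bmatrix} Y_{jk} \\ g_{jk}(x) \le 0 \\ c_k = \gamma_{jk} \end{bmatrix}, \quad k \in K \\
& \Omega(Y) = \text{True} \\
& x \in \mathbb{R}^n,\ c \in \mathbb{R}^m,\ Y \in \{\text{true}, \text{false}\}^m
\end{aligned}
\tag{GDP}
$$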


where $Y_{jk}$ are the Boolean variables that decide whether a given term j in a disjunction $k \in K$ is true or false, and x are the continuous variables. The objective function involves the term $f(x)$ for the continuous variables and the charges $c_k$ that depend on the discrete choices in each disjunction $k \in K$. The constraints $r(x) \le 0$ must hold regardless of the discrete choices, and $g_{jk}(x) \le 0$ are conditional constraints that must hold when $Y_{jk}$ is true in the j-th term of the k-th disjunction. The cost variables $c_k$ correspond to the fixed charges, and their value equals $\gamma_{jk}$ if the Boolean variable $Y_{jk}$ is true. $\Omega(Y) = \text{True}$ are logical relations for the Boolean variables expressed as propositional logic. An important particular case is the one where the functions $f(x)$, $r(x)$ and $g_{jk}(x)$ are all linear. For the nonlinear case, it is assumed for the derivation of the basic methods that the functions are convex, although in practical applications these often correspond to nonconvex functions.


Mixed-Integer Programming Reformulations 


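Problem (GDP) can be reformulated as the following “big-M” MINLP problem:

$$
\begin{aligned}
\min\ & Z = \sum_{k \in K} \sum_{j \in J_k} \gamma_{jk} y_{jk} + f(x) \\
\text{s.t.}\ & r(x) \le 0 \\
& g_{jk}(x) \le M_{jk}(1 - y_{jk}), \quad j \in J_k,\ k \in K \\
& \sum_{j \in J_k} y_{jk} = 1, \quad k \in K \\
& Ay \le a \\
& 0 \le x \le x^U,\ y_{jk} \in \{0, 1\}, \quad j \in J_k,\ k \in K
\end{aligned}
\tag{BM}
$$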
where the Boolean variables are replaced by binary variables $y_{jk}$, and the disjunctions are replaced by “big-M” constraints which involve a parameter $M_{jk}$ and the binary variables $y_{jk}$. The propositional logic statements $\Omega(Y) = \text{True}$ are replaced by the linear constraints $Ay \le a$, as described by Williams [19]. Here we assume that x is a nonnegative variable with finite upper bound $x^U$. An important issue in model (BM) is how to specify a valid value for the big-M parameter $M_{jk}$. If the value is too small, then feasible points may be cut off. If $M_{jk}$ is too large, then the continuous relaxation might be too loose, yielding weak lower bounds. Therefore, the smallest valid value of $M_{jk}$ is the desired selection. For linear constraints, one can use the upper and lower bounds of the variable x to calculate the maximum value of each constraint, which can then be used to calculate a valid value of $M_{jk}$. For nonlinear constraints, one can in principle maximize each constraint over the feasible region, which is a nontrivial calculation. It is also important to note that if the binary variables $y_{jk}$ are specified as continuous, $0 \le y_{jk} \le 1$, and the functions $f(x)$, $r(x)$ and $g_{jk}(x)$ are assumed to be convex, the relaxation of problem (BM) reduces to a convex NLP problem that provides a valid lower bound to the solution of problem (GDP).
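For a linear conditional constraint $g_{jk}(x) = a^T x - b \le 0$ and box bounds on x, the maximum of the constraint over the box, and hence the smallest valid $M_{jk}$, is available in closed form, because each term $a_i x_i$ attains its maximum at one of the two bounds. A minimal Python sketch under that assumption; the function name and the example data are hypothetical, and the sketch allows general bounds although the text assumes a lower bound of 0:

```python
from typing import Sequence


def big_m_linear(a: Sequence[float], b: float,
                 lower: Sequence[float], upper: Sequence[float]) -> float:
    """Smallest M with a^T x - b <= M for every x with lower <= x <= upper.

    Each term a_i * x_i is maximized at one of its bounds, so the maximum
    of the linear function over the box is the sum of these termwise maxima.
    M is clipped at 0: if the constraint already holds on the whole box,
    it needs no relaxation when y_jk = 0.
    """
    worst = sum(max(ai * lo, ai * up) for ai, lo, up in zip(a, lower, upper)) - b
    return max(0.0, worst)


# Hypothetical constraint 2 x1 - x2 - 3 <= 0 on the box 0 <= x <= (5, 4).
M = big_m_linear([2.0, -1.0], 3.0, [0.0, 0.0], [5.0, 4.0])
print(M)  # 7.0, attained at the corner x = (5, 0)
```

For nonlinear $g_{jk}$ no such closed form exists in general, which is why the text describes the computation of a valid $M_{jk}$ as a nontrivial optimization problem.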

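The MINLP hull reformulation of problem (GDP) is based on the following proposition by Lee and Grossmann [11]:


Proposition 1 The convex hull of each disjunction $k \in K$ in problem (GDP),

$$
\bigvee_{j \in J_k} \begin{bmatrix} Y_{jk} \\ g_{jk}(x) \le 0 \\ c = \gamma_{jk} \end{bmatrix}, \qquad 0 \le x \le x^U,\ c \ge 0, \tag{$D_k$}
$$

where $g_{jk}(x) \le 0$ are convex inequalities, is a convex set and is given by

$$
\begin{aligned}
& x = \sum_{j \in J_k} v^{jk}, \quad c = \sum_{j \in J_k} \lambda_{jk} \gamma_{jk} \\
& 0 \le v^{jk} \le \lambda_{jk} x^U, \quad j \in J_k \\
& \sum_{j \in J_k} \lambda_{jk} = 1, \quad 0 \le \lambda_{jk} \le 1, \quad j \in J_k \\
& \lambda_{jk}\, g_{jk}(v^{jk}/\lambda_{jk}) \le 0, \quad j \in J_k \\
& x, c, v^{jk} \ge 0, \quad j \in J_k
\end{aligned}
\tag{$CH_k$}
$$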


The proof is based on an extension of the work by Stubbs and Mehrotra [16]. In ($CH_k$), $v^{jk}$ are disaggregated variables that are assigned to each term of the disjunction $k \in K$, and $\lambda_{jk}$ can be regarded as weight factors that determine the feasibility of the disjunctive term. Note that when $\lambda_{jk}$ is 1, the j-th term in the k-th disjunction is enforced and the other terms are ignored. The constraints $\lambda_{jk}\, g_{jk}(v^{jk}/\lambda_{jk}) \le 0$ are convex if $g_{jk}(x)$ is convex, as discussed in Hiriart-Urruty and Lemaréchal [8]. Formal proofs can be found in [15] and [16]. Note that the convex hull ($CH_k$) reduces to the result by Balas [2] if the constraints are linear. Based on the convex hull relaxation ($CH_k$), Lee and Grossmann [11] proposed the following MINLP hull reformulation of (GDP):


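$$
\begin{aligned}
\min\ & Z = \sum_{k \in K} \sum_{j \in J_k} \gamma_{jk} y_{jk} + f(x) \\
\text{s.t.}\ & r(x) \le 0 \\
& x = \sum_{j \in J_k} v^{jk}, \quad \sum_{j \in J_k} y_{jk} = 1, \quad k \in K \\
& 0 \le v^{jk} \le y_{jk} x^U, \quad j \in J_k,\ k \in K \\
& y_{jk}\, g_{jk}(v^{jk}/y_{jk}) \le 0, \quad j \in J_k,\ k \in K \\
& Ay \le a \\
& 0 \le x,\ v^{jk} \le x^U,\ y_{jk} \in \{0, 1\}, \quad j \in J_k,\ k \in K
\end{aligned}
\tag{HR}
$$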


The relaxation of problem (HR), where $0 \le y_{jk} \le 1$, reduces to a convex NLP problem that yields a valid lower bound to the optimal solution of problem (GDP). This relaxation, which can also be regarded as a generalization of the disjunctive problem studied by Ceria and Soares [4], can be interpreted as one where the convex hulls of each of the disjunctions are intersected.

The following proposition holds for problems (HR) and (BM), as proved by Grossmann and Lee [7].


Proposition 2 Let $Z^R_{HR}$ be the optimal value of problem (HR) where the binary variables are relaxed as $0 \le y_{jk} \le 1$, and let $Z^R_{BM}$ be the optimal value of problem (BM) where the binary variables are relaxed as $0 \le y_{jk} \le 1$. Then $Z^R_{HR} \ge Z^R_{BM}$.


Hence, problem (HR) has the useful property that the lower bound of its relaxation is greater than or equal to the lower bound predicted from the relaxation of problem (BM). In some problems this translates into significantly tighter formulations [13,14]. The trade-off, however, is that in the reformulation (HR) the number of constraints and variables is larger than in the reformulation (BM).

It is also important to point out that, for the computer implementation of the constraint $y_{jk}\, g_{jk}(v^{jk}/y_{jk}) \le 0$ in problem (HR), an approximation is required for the nonlinear functions $g_{jk}(x)$ in order to avoid division by zero when $y_{jk} = 0$. Furman et al. [5] have proposed the following approximation, which has the interesting feature that it is exact for $y_{jk} = 0$ and $y_{jk} = 1$:

$$\big((1 - \varepsilon) y_{jk} + \varepsilon\big)\, g_{jk}\!\left(\frac{v^{jk}}{(1 - \varepsilon) y_{jk} + \varepsilon}\right) - \varepsilon\, g_{jk}(0)\,(1 - y_{jk}) \le 0.$$

Furthermore, it can be shown that this inequality is convex for any value of ε. Note also that this expression reduces to the original one as $\varepsilon \to 0$.
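The exactness claims can be checked on a scalar example. The Python sketch below is an illustration only: it uses a scalar disaggregated variable v instead of the vector $v^{jk}$, a hypothetical convex function $g(v) = v^2 - 1$, and hypothetical function names. At $y_{jk} = 0$ the bound $0 \le v^{jk} \le y_{jk} x^U$ forces $v = 0$, which is the case evaluated below.

```python
from typing import Callable

Func = Callable[[float], float]


def perspective(g: Func, v: float, y: float) -> float:
    """Exact term y * g(v / y) from (HR); undefined at y = 0."""
    return y * g(v / y)


def furman_perspective(g: Func, v: float, y: float, eps: float = 1e-4) -> float:
    """((1 - eps) y + eps) * g(v / ((1 - eps) y + eps)) - eps * g(0) * (1 - y)."""
    s = (1.0 - eps) * y + eps  # strictly positive for 0 <= y <= 1 and eps > 0
    return s * g(v / s) - eps * g(0.0) * (1.0 - y)


def g(v: float) -> float:
    """Hypothetical convex constraint function."""
    return v * v - 1.0


print(furman_perspective(g, 0.0, 0.0))          # 0.0: exact at y = 0 (with v = 0)
print(furman_perspective(g, 0.5, 1.0), g(0.5))  # equal up to rounding: exact at y = 1
print(furman_perspective(g, 0.3, 0.5, eps=1e-8), perspective(g, 0.3, 0.5))  # close for small eps
```

At $y = 0$ the term collapses to $\varepsilon g(0) - \varepsilon g(0) = 0$, so the inequality is trivially satisfied, exactly as the original constraint would be once the disjunct is switched off.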


Solution Algorithms for GDP 


The most direct way of solving problem (GDP) is by reformulating it as an MINLP (or an MILP in the linear case). In both cases, the big-M and hull reformulations are the two extreme choices. The latter generally yields
tighter relaxations, but involves solving a larger prob- 
lem. For the linear case LP-based branch and cut meth- 
ods can be used [10], including special cutting plane 
techniques [14]. For the nonlinear case, MINLP meth- 
ods such as branch and bound, outer-approximation, 
Generalized Benders, extended cutting plane or hybrid 
methods can be used [6]. 

Logic-based methods for solving linear (GDP) problems include the branch and bound method by Beaumont [3], which branches on the constraints of the disjunctions. Raman and Grossmann [13] developed a branch and bound method which solves GDP problems in hybrid form, by exploiting the tight relaxation of the disjunctions and the tightness of the well-behaved mixed-integer constraints. For the nonlinear case, a disjunctive branch and bound method based on the hull relaxation has been proposed by Lee and Grossmann [11], coupled with logic inference techniques [9]. Also, for the special case of two-term disjunctions in (GDP), which typically arise in process network problems, Türkay and Grossmann [17] have proposed outer-approximation and Generalized Benders Decomposition algorithms. Some of these algorithms have been implemented in LOGMIP, a computer code based on GAMS [18]. Finally, for the nonconvex case, a disjunctive branch and bound method coupled with a spatial branch and bound search has been reported in [12].
Problem


Consider two matrices $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{k \times m}$, each row being a point in one of two classes in the feature space. The generalized eigenvalue proximal support vector machine (GEPSVM) consists of finding two hyperplanes, each one being closer to one set of points and farther from the other set. Let $x^T w - \gamma = 0$ be a hyperplane in $\mathbb{R}^m$. In order to satisfy the previous condition for all points in A, the hyperplane can be obtained by solving the following optimization problem:

$$\min_{w, \gamma \neq 0} \frac{\|Aw - e\gamma\|^2}{\|Bw - e\gamma\|^2}. \tag{1}$$

The hyperplane for B can be obtained by minimizing the inverse of the objective function in (1). Now, let

$$G = [A\ \ {-e}]^T [A\ \ {-e}], \qquad H = [B\ \ {-e}]^T [B\ \ {-e}], \tag{2}$$

and

$$z = [w^T\ \ \gamma]^T. \tag{3}$$

Then (1) becomes

$$\min_{z \in \mathbb{R}^{m+1}} \frac{z^T G z}{z^T H z}. \tag{4}$$


The expression in (4) is the Rayleigh quotient of the generalized eigenvalue problem $Gz = \lambda Hz$. When H is positive definite, the stationary points are obtained at and only at the eigenvectors of (4), where the value of the objective function is given by the eigenvalues. The Rayleigh quotient is bounded, and it ranges over the interval determined by the minimum and maximum eigenvalues [4]. H is positive definite under the assumption that the columns of $[B\ \ {-e}]$ are linearly independent. The reciprocal of the objective function in (4) has the same eigenvectors and reciprocal eigenvalues. Let $z_{\min} = [w_1^T\ \ \gamma_1]^T$ and $z_{\max} = [w_2^T\ \ \gamma_2]^T$ be the eigenvectors related to the eigenvalues of smallest and largest modulus, respectively. Then $x^T w_1 - \gamma_1 = 0$ is the closest hyperplane to the set of points in A and the furthest from those in B, and $x^T w_2 - \gamma_2 = 0$ is the closest hyperplane to the set of points in B and the furthest from those in A. GEPSVM finds application in many supervised learning problems [3]. For example, a bank prefers to classify customer loan requests as “good” or “bad” depending on the customers' ability to pay back the loan. The Internal Revenue Service tries to discover tax evaders starting from the characteristics of known evaders. As another example, a built-in system in a car could detect whether a walking pedestrian is going to cross the street. More applications can be found in biology and medicine: tissues that are prone to cancer can be detected with high accuracy, and new DNA sequences or proteins can be traced back to their origins. Given its amino acid sequence, finding out how a protein folds provides important information about its expression level. An unlabeled point x is associated with the class $y_i$ related to the closest hyperplane $P_i$. Therefore, a point x is classified using its distance from the corresponding hyperplane:

$$\text{class}(x) = y_{i^*}, \quad i^* = \arg\min_{i = 1, 2} \{\operatorname{dist}(x, P_i)\}, \tag{5}$$

where

$$\operatorname{dist}(x, P_i) = \frac{|x^T w_i - \gamma_i|}{\|w_i\|}. \tag{6}$$
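A minimal NumPy sketch of the linear GEPSVM, assuming that H is nonsingular (columns of $[B\ \ {-e}]$ linearly independent) so that $Gz = \lambda Hz$ can be reduced to the standard eigenproblem of $H^{-1}G$. This reduction is a convenience for illustration rather than a recommended solver, and the function names and data are hypothetical. The sketch builds G and H as in (2), takes the eigenvectors of the smallest and largest eigenvalues as $z_{\min}$ and $z_{\max}$, and classifies with (5) and (6).

```python
import numpy as np


def gepsvm_fit(A: np.ndarray, B: np.ndarray):
    """Return ((w1, gamma1), (w2, gamma2)) for the planes x^T w - gamma = 0."""
    GA = np.hstack([A, -np.ones((A.shape[0], 1))])   # [A  -e]
    GB = np.hstack([B, -np.ones((B.shape[0], 1))])   # [B  -e]
    G, H = GA.T @ GA, GB.T @ GB                      # matrices (2)
    # Eigenpairs of G z = lambda H z via H^{-1} G (H assumed nonsingular);
    # the eigenvalues are real because G is symmetric and H is positive definite.
    vals, vecs = np.linalg.eig(np.linalg.solve(H, G))
    vals, vecs = vals.real, vecs.real
    z_min = vecs[:, np.argmin(vals)]   # plane closest to A, farthest from B
    z_max = vecs[:, np.argmax(vals)]   # plane closest to B, farthest from A
    return (z_min[:-1], z_min[-1]), (z_max[:-1], z_max[-1])


def gepsvm_predict(planes, x: np.ndarray) -> int:
    """Index (0 for class A, 1 for class B) of the nearest plane, as in (5)-(6)."""
    dists = [abs(x @ w - gamma) / np.linalg.norm(w) for w, gamma in planes]
    return int(np.argmin(dists))


# Hypothetical data: class A scattered around y = x, class B around y = x + 3.
A = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]])
B = np.array([[0.0, 3.1], [1.0, 3.9], [2.0, 5.1], [3.0, 5.9]])
planes = gepsvm_fit(A, B)
print(gepsvm_predict(planes, np.array([1.5, 1.5])), gepsvm_predict(planes, np.array([1.5, 4.5])))
```

Because the eigenvalues here are nonnegative, "smallest and largest modulus" coincides with the smallest and largest value. When H is singular this reduction fails, which motivates the regularization techniques discussed below.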
Kernel Formulation


To obtain greater separability between classes, a nonlinear embedding of the data into a higher-dimensional space is required. This nonlinear mapping can be done implicitly by kernel functions, which represent the inner product of the elements in a nonlinear space. Kernel functions can be described as follows:

$$K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle, \tag{7}$$

where $\phi(x)$ is the embedding function.

Using kernels, we can express the problem in terms of inner products between elements, and therefore the computationally expensive calculation of the features in the embedded space is avoided. Some commonly used kernel functions are

Linear: $K(x_i, x_j) = x_i^T x_j$

Polynomial: $K(x_i, x_j) = (x_i^T x_j + 1)^d$

Gaussian: $K(x_i, x_j) = \exp\left(-\dfrac{\|x_i - x_j\|^2}{\sigma}\right)$.

Using the kernel function, each element of the kernel matrix is

$$K(A, B)_{i,j} = K(A_i, B_j), \tag{8}$$

where $A_i$ and $B_j$ denote the i-th row of A and the j-th row of B.


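Then problem (1) becomes

$$\min_{u, \gamma \neq 0} \frac{\|K(A, C)u - e\gamma\|^2}{\|K(B, C)u - e\gamma\|^2}. \tag{9}$$

A point x is classified using its distance from the corresponding hyperplane in the feature space:

$$\text{class}(x) = y_{i^*}, \quad i^* = \arg\min_{i = 1, 2} \{\operatorname{dist}(x, P_i)\}, \tag{10}$$

where

$$\operatorname{dist}(x, P_i) = \frac{|K(x, C)u_i - \gamma_i|}{\|u_i\|}. \tag{11}$$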


The associated eigenvalue problem has matrices of or- 
der n + k + 1 and rank at most m. This means a regu- 
larization technique is needed since the problem can be 
singular. 
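The kernels listed above, the kernel matrix (8) and the distance (11) can be sketched in NumPy as follows. This is an illustration under stated assumptions: the defaults for the degree `d` and the width `sigma` are arbitrary, the function names are hypothetical, and the matrix C is simply passed in by the caller, since its construction is not fixed by the text above.

```python
import numpy as np


def linear_kernel(xi: np.ndarray, xj: np.ndarray) -> float:
    return float(xi @ xj)


def polynomial_kernel(xi: np.ndarray, xj: np.ndarray, d: int = 2) -> float:
    return float((xi @ xj + 1.0) ** d)


def gaussian_kernel(xi: np.ndarray, xj: np.ndarray, sigma: float = 1.0) -> float:
    diff = xi - xj
    return float(np.exp(-(diff @ diff) / sigma))


def kernel_matrix(A: np.ndarray, B: np.ndarray, kernel) -> np.ndarray:
    """K(A, B)[i, j] = kernel(A_i, B_j), with A_i and B_j rows, as in (8)."""
    return np.array([[kernel(a, b) for b in B] for a in A])


def kernel_distance(x: np.ndarray, C: np.ndarray, u: np.ndarray,
                    gamma: float, kernel) -> float:
    """Distance (11) of x from the surface K(x, C) u - gamma = 0."""
    kx = kernel_matrix(x[None, :], C, kernel)[0]   # row vector K(x, C)
    return float(abs(kx @ u - gamma) / np.linalg.norm(u))


A = np.array([[1.0, 0.0], [0.0, 1.0]])
print(kernel_matrix(A, A, linear_kernel))       # equals A @ A.T
print(kernel_matrix(A, A, polynomial_kernel))   # (x_i^T x_j + 1)^2
print(kernel_matrix(A, A, gaussian_kernel))     # ones on the diagonal
```

With the linear kernel, $K(A, B) = AB^T$, so (9) reduces to a problem of the same form as (1) in the variables $(u, \gamma)$; the other kernels change only how the entries of the matrices are computed.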


Algorithm 


Let G and H be as defined in (2). Note that even if A and B are full rank, matrices G and H are always rank-deficient. The reason is that G and H are matrices of order $m + 1$, and their rank can be at most m. The added complexity due to the singularity of the matrices means that special care must be given to the solution of the generalized eigenvalue problem. Indeed, if the null spaces of G and H have a nontrivial intersection, i.e., $\operatorname{Ker}(G) \cap \operatorname{Ker}(H) \neq \{0\}$, then the problem is ill posed and a regularization technique is needed to solve the eigenvalue problem. Mangasarian et al. [2] propose to use Tikhonov regularization applied to a twofold problem:

$$\min_{w, \gamma \neq 0} \frac{\|Aw - e\gamma\|^2 + \delta\|z\|^2}{\|Bw - e\gamma\|^2} \tag{12}$$

and

$$\min_{w, \gamma \neq 0} \frac{\|Bw - e\gamma\|^2 + \delta\|z\|^2}{\|Aw - e\gamma\|^2}, \tag{13}$$

where δ is the regularization parameter and the new problems are still convex. The minimum eigenvalues-eigenvectors of these problems are approximations of the minimum and maximum eigenvalues-eigenvectors of (4). The solutions $(w_i, \gamma_i)$, $i = 1, 2$, to (12) and (13) represent the two hyperplanes approximating the two classes of training points. The same regularization technique can be applied to the nonlinear formulation.
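A minimal NumPy sketch of the regularized problems (12) and (13). Since $z^T(G + \delta I)z = \|Aw - e\gamma\|^2 + \delta\|z\|^2$, minimizing (12) amounts to finding the eigenvector of $Hz = \mu(G + \delta I)z$ with the largest μ, where $\mu = 1/\lambda$; this form inverts only the positive definite matrix $G + \delta I$ and never the possibly singular H. This reformulation, the default δ and the function names are assumptions made for illustration. The example data lie exactly on two parallel lines, so G and H are both singular.

```python
import numpy as np


def regularized_planes(A: np.ndarray, B: np.ndarray, delta: float = 1e-3):
    """Planes x^T w - gamma = 0 from the Tikhonov-regularized problems (12), (13)."""
    GA = np.hstack([A, -np.ones((A.shape[0], 1))])   # [A  -e]
    GB = np.hstack([B, -np.ones((B.shape[0], 1))])   # [B  -e]
    G, H = GA.T @ GA, GB.T @ GB
    I = np.eye(G.shape[0])

    def min_ratio_vector(num: np.ndarray, den: np.ndarray) -> np.ndarray:
        # Minimizer of (z^T num z + delta ||z||^2) / (z^T den z): eigenvector of
        # den z = mu (num + delta I) z with the largest mu (mu = 1 / lambda).
        vals, vecs = np.linalg.eig(np.linalg.solve(num + delta * I, den))
        return vecs[:, np.argmax(vals.real)].real

    z1 = min_ratio_vector(G, H)   # problem (12): close to A, far from B
    z2 = min_ratio_vector(H, G)   # problem (13): close to B, far from A
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])


# Hypothetical data on the lines y = x and y = x + 3 (G and H are singular).
A = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
B = np.array([[0.0, 3.0], [1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
(w1, g1), (w2, g2) = regularized_planes(A, B)
print(np.abs(A @ w1 - g1).max() / np.linalg.norm(w1))   # small: plane 1 fits A
print(np.abs(B @ w2 - g2).max() / np.linalg.norm(w2))   # small: plane 2 fits B
```

The perturbation introduced by δ shifts the planes only slightly, consistent with the statement that the regularized eigenvectors approximate those of (4).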


Another Algorithm 


It is possible to solve the problem without regularization. In practice, if $\beta G - \alpha H$ is nonsingular for every α and β, it is possible to transform the problem into another problem that is nonsingular and that has the same eigenvectors as the initial one. We start with the following theorem, whose proof can be found in [5], p. 288.


Theorem 1 Consider the generalized eigenvalue problem $Gx = \lambda Hx$ and the transformed problem $G^* x = \lambda^* H^* x$ defined by

$$G^* = \tau_1 G - \delta_1 H, \qquad H^* = \tau_2 H - \delta_2 G, \tag{14}$$

for each choice of scalars $\tau_1$, $\tau_2$, $\delta_1$ and $\delta_2$ such that the $2 \times 2$ matrix

$$\Omega = \begin{pmatrix} \tau_2 & \delta_1 \\ \delta_2 & \tau_1 \end{pmatrix} \tag{15}$$

is nonsingular. Then the problem $G^* x = \lambda^* H^* x$ has the same eigenvectors as the problem $Gx = \lambda Hx$. An associated eigenvalue $\lambda^*$ of the transformed problem is related to an eigenvalue λ of the original problem by

$$\lambda = \frac{\tau_2 \lambda^* + \delta_1}{\tau_1 + \delta_2 \lambda^*}.$$
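The eigenvalue relation in Theorem 1 can be checked numerically. The NumPy sketch below is an illustration under assumptions of our own choosing: random symmetric positive definite G and H (so that both pencils have real eigenvalues and $H$, $H^*$ are invertible), the parameters $\tau_1 = \tau_2 = 1$, $\delta_1 = -0.1$, $\delta_2 = -0.2$ (for which $\tau_1\tau_2 - \delta_1\delta_2 = 0.98 \neq 0$, so Ω is nonsingular), and hypothetical function names.

```python
import numpy as np


def transformed_pencil(G, H, tau1, tau2, delta1, delta2):
    """G* = tau1 G - delta1 H and H* = tau2 H - delta2 G, as in (14)."""
    return tau1 * G - delta1 * H, tau2 * H - delta2 * G


def original_eigenvalue(lam_star, tau1, tau2, delta1, delta2):
    """lambda = (tau2 lambda* + delta1) / (tau1 + delta2 lambda*)."""
    return (tau2 * lam_star + delta1) / (tau1 + delta2 * lam_star)


def pencil_eigenvalues(G, H):
    """Sorted eigenvalues of G x = lambda H x, assuming H is nonsingular."""
    return np.sort(np.linalg.eigvals(np.linalg.solve(H, G)).real)


rng = np.random.default_rng(0)
X, Y = rng.standard_normal((5, 3)), rng.standard_normal((5, 3))
G, H = X.T @ X + np.eye(3), Y.T @ Y + np.eye(3)     # symmetric positive definite
params = (1.0, 1.0, -0.1, -0.2)                     # tau1, tau2, delta1, delta2
G_star, H_star = transformed_pencil(G, H, *params)
lam_star = pencil_eigenvalues(G_star, H_star)
recovered = np.sort(original_eigenvalue(lam_star, *params))
print(np.allclose(recovered, pencil_eigenvalues(G, H)))   # True
```

Both sides are sorted before comparison because the map from $\lambda^*$ to λ need not preserve the ordering of the eigenvalues for every choice of parameters.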


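In the linear case, Theorem 1 can be applied. By setting $\tau_1 = \tau_2 = 1$ and $\delta_1 = -\hat\delta_1$, $\delta_2 = -\hat\delta_2$, the regularized problem becomes

$$\min_{w, \gamma \neq 0} \frac{\|Aw - e\gamma\|^2 + \hat\delta_1\|Bw - e\gamma\|^2}{\|Bw - e\gamma\|^2 + \hat\delta_2\|Aw - e\gamma\|^2}. \tag{16}$$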


If $\hat\delta_1$ and $\hat\delta_2$ are nonnegative, Ω is nondegenerate. The spectrum is now shifted and inverted so that the minimum eigenvalue of the original problem becomes the maximum of the regularized one, and the maximum becomes the minimum eigenvalue. Choosing the eigenvectors related to the new minimum and maximum eigenvalues, we obtain the same solution as the original problem.

This regularization works for the linear case if we suppose that in each class of the training set there is a number of linearly independent rows that is at least equal to the number of features. This is often the case and, if the number of points in the training set is much greater than the number of features, Ker(G) and Ker(H) both have dimension 1. In this case, the probability of a nontrivial intersection is zero.

In the nonlinear case the situation is different. Guarracino et al. [1] propose to generate the two proximal surfaces

$$K(x, C)u_1 - \gamma_1 = 0, \qquad K(x, C)u_2 - \gamma_2 = 0 \tag{17}$$

by solving the following problem:

$$\min_{u, \gamma \neq 0} \frac{\|K(A, C)u - e\gamma\|^2 + \delta\|K_B u - e\gamma\|^2}{\|K(B, C)u - e\gamma\|^2 + \delta\|K_A u - e\gamma\|^2}, \tag{18}$$

where $K_A$ and $K_B$ are diagonal matrices with the diagonal entries from the matrices $K(A, C)$ and $K(B, C)$. The perturbation theory of eigenvalue problems [6] provides an estimate of the distance between the original and the regularized eigenvectors. If we call z an eigenvector of the initial problem and $z(\delta)$ the corresponding one in the regularized problem, then $|z - z(\delta)| = O(\delta)$, which means that their closeness is of the order of δ.
Introduction


Generalized geometric programming (GGP) problems with continuous and discrete variables occur quite frequently in various fields such as civil and material engineering design, chemical engineering, location-allocation, inventory control, production planning, and scheduling. These applications are extensively surveyed in Floudas and Pardalos [11] and Floudas [9]. Biegler and Grossmann [3] provided a retrospective on optimization techniques that have been applied in process systems engineering. They indicated that design and synthesis problems have been dominated by nonlinear programming (NLP) and mixed-integer nonlinear programming (MINLP) models. Although MINLP programs appear in many chemical engineering problems, they are often nonconvex, and no direct optimization method is available to guarantee global optimality [21]. With the increasing reliance on mathematical programming based approaches in chemical engineering problems, the need for finding the global optimum is paramount.

The developed methods for GGP problems with continuous and discrete variables can be divided into two approaches.

(i) Stochastic methods: The stochastic methods involve random elements in their search and rely on a statistical argument to prove their convergence. For instance, Salcedo et al. [23] proposed an improved random search algorithm for solving nonlinear optimization problems. Cardoso et al. [5] solved nonconvex nonlinear integer programming problems with simulated annealing. Yiu et al. [30] developed a hybrid descent approach based on a simulated annealing algorithm and a gradient-based method to solve multidimensional nonconvex continuous optimization problems. Hussain and Al-Sultan [15] proposed a hybrid algorithm for nonconvex function minimization by utilizing the genetic technique to generate search directions. The stochastic methods mentioned above cannot guarantee finding the global optimum; therefore, the quality of the solution is not ensured. Moreover, the probability of finding the global solution decreases when the problem size increases.
(ii) Deterministic methods: Mathematical methods that generate convex underestimators for twice differentiable constrained nonconvex optimization problems are of primary importance in deterministic global optimization [9]. The αBB global optimization algorithm [1,2,9] is a powerful approach for constructing such convex underestimators for nonconvex functions [10]. In general surveys of optimization techniques [3,13,14], many deterministic methods for convex MINLP problems have been reviewed. The methods include Branch and Bound (BB) [17,24], Generalized Benders Decomposition (GBD) [12], Outer-Approximation (OA) [6,7,22], the Extended Cutting Plane Method (ECP) [28], and Generalized Disjunctive Programming (GDP) [16]. One possible approach to circumvent the nonconvex objective function or the nonconvex constraints in MINLP models is transformation. Floudas [8,9], Floudas and Pardalos [11] and Maranas and Floudas [20] proposed exponential transformation methods to treat GGP problems with continuous and discrete variables. The core concept of their methods is to convert the problem into a new problem where both the constraints and the objective are decomposed into the difference of two convex functions. By utilizing exponential variable transformations, each signomial term $z = c \prod_i x_i^{\alpha_i}$, where c and the variables $x_i$ are strictly positive, can be transformed into an exponential term $z' = e^{\sum_i \alpha_i y_i + \ln c}$ with $y_i = \ln x_i$. However, the exponential transformation technique can only be applied to strictly positive variables and is thus unable to deal with nonconvex GGP problems with continuous and discrete free variables.
Although positive variables are adopted frequently to 
represent engineering and scientific systems, it is also 
common to introduce free variables to model the sys- 
tem behavior, such as stresses, temperatures, electrical 
currents, velocities and accelerations, etc. In general, 
the values accepted by the machines are under a discrete 
space. For instance, a controller can only increase tem- 
perature from a fixed initial point to a set of fixed points 
at a fixed interval. Consequently, deriving a global op- 
timum for the GGP problem with continuous and dis- 
crete free variables is essential for real applications. Li 
and Tsai [18] proposed a technique for treating free 
continuous variables in GGP problems. Porn et al. [21] 
introduced different convexification strategies for 
MINLP problems with both polynomial and nega- 
tive binomial terms in the constraints. They suggested 
a simple translation, x + t = e*, to treat a free variable 
x. However, inserting the transformed result into the 
original signomial term will bring additional signomial 
terms and therefore increasing the computation bur- 


xt xP , where x; and x2 are 
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den. This study proposes a method for solving a GGP 
problem with continuous and discrete free variables to 
obtain a globally optimal solution. The GGP problem is first transformed into another one containing only positive variables. Then the transformed problem is reformulated as a convex mixed-integer program. A global optimum of the GGP problem with continuous and discrete free variables can finally be found within a prescribed error tolerance. Furthermore, this study develops several
convexification strategies for signomial terms so that 
the efficiency of the optimization approach can be en- 
hanced. The right choice of transformation for convex- 
ification of nonconvex signomial terms might signifi- 
cantly decrease the solution time [4]. By employing the 
proposed rules, certain classes of signomial terms can 
be determined as convex terms and do not require any 
transformation. Moreover, some nonconvex signomial 
terms with specific features can be transformed into 
convex terms in accordance with the proposed rules by 
replacing some variables, thereby making the resulting 
problem a computationally efficient model. 


Formulation 


The mathematical formulation of a GGP problem with 
continuous and discrete free variables is expressed as 
follows: 


GGP:

Minimize $f(X, Y)$

subject to $g_t(X, Y) \le 0$, $t = 1, \dots, T$,
$X = (x_1, \dots, x_p, x_{p+1}, \dots, x_n)$, $\underline{x}_i \le x_i \le \overline{x}_i$,
$Y = (y_1, \dots, y_q, y_{q+1}, \dots, y_m)$, $\underline{y}_j \le y_j \le \overline{y}_j$,

where $x_i \in \mathbb{R}^{+}$ for $1 \le i \le p$, $x_i$ are bounded free variables for $p + 1 \le i \le n$, $y_j$ are positive integer/discrete variables for $1 \le j \le q$, $y_j$ are bounded free variables for $q + 1 \le j \le m$, $f(X, Y)$ and $g_t(X, Y)$ are mixed-integer signomial functions, $\underline{x}_i$ and $\overline{x}_i$ are lower and upper bounds of the continuous variable $x_i$, and $\underline{y}_j$ and $\overline{y}_j$ are lower and upper bounds of the integer/discrete variable $y_j$, respectively.


Methods 


Treating Free Variables. Li and Tsai [18] proposed a technique for treating free continuous variables in GGP problems. By combining the Li and Tsai method with the approach for free discrete variables described below, a GGP problem with continuous and discrete free variables can be equivalently transformed into a mixed-integer GGP program with positive variables. The following shows how to convert free discrete variables into non-negative discrete variables.


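Let $y_j = y_j^{+} - y_j^{-}$, $y_j^{+}, y_j^{-} \ge 0$, for $j = q + 1, \dots, m$.

A nonlinear term $y_j^{\beta_j}$ is then expressed as

$y_j^{\beta_j} = (y_j^{+})^{\beta_j} + (-y_j^{-})^{\beta_j}$, $\beta_j$ an integer, for $j = q + 1, \dots, m$,

which is valid because at most one of $y_j^{+}$ and $y_j^{-}$ is nonzero. If $y_j^{+} > 0$ and $y_j^{-} = 0$, then $y_j$ is positive; if $y_j^{-} > 0$ and $y_j^{+} = 0$, then $y_j$ is negative. To prevent $y_j^{+}$ and $y_j^{-}$ from taking positive values simultaneously, we have the following remark.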


Remark 1 A free discrete variable $y_j$ can be expressed as $y_j = y_j^{+} - y_j^{-}$, $y_j^{+}, y_j^{-} \ge 0$, and $y_j^{+}$ and $y_j^{-}$ will not be positive concurrently under the following inequalities:

(i) $0 \le y_j^{+} \le M\delta_j$,
(ii) $0 \le y_j^{-} \le M(1 - \delta_j)$,

where $M$ is a sufficiently large positive number and $\delta_j \in \{0, 1\}$.
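The following sketch enumerates all integer splits $y = y^{+} - y^{-}$ permitted by big-M bounds of the form $0 \le y^{+} \le M\delta$, $0 \le y^{-} \le M(1-\delta)$ (an assumed form of the Remark 1 inequalities) and checks that the two parts are never positive together and that a positive integer power splits additively. The function name and data are hypothetical.

```python
from itertools import product

def split_free_discrete(y_range, M, beta):
    """Enumerate integer splits y = y_plus - y_minus allowed by big-M bounds.

    Assumed form of the Remark 1 inequalities (hypothetical reconstruction):
        0 <= y_plus  <= M * delta
        0 <= y_minus <= M * (1 - delta),   delta in {0, 1}
    """
    rows = []
    for y_plus, y_minus, delta in product(range(M + 1), range(M + 1), (0, 1)):
        if y_plus > M * delta or y_minus > M * (1 - delta):
            continue                      # violates a big-M bound
        y = y_plus - y_minus
        if y not in y_range:
            continue                      # outside the variable's bounds
        both_positive = y_plus > 0 and y_minus > 0
        # With at most one part nonzero, a positive integer power splits additively.
        additive = y ** beta == y_plus ** beta + (-y_minus) ** beta
        rows.append((y, y_plus, y_minus, delta, both_positive, additive))
    return rows

rows = split_free_discrete(range(-5, 3), M=5, beta=3)
```

Every value in $[-5, 2]$ appears once, except $y = 0$, which is reachable with either value of $\delta$.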

By this change of variables, the GGP problem with free variables can be equivalently solved as another problem having only non-negative variables. The next step is to deal with discrete variables that may take the value zero; consider the following propositions:


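Proposition 1 [21] For positive discrete variables $y_j \in \{d_{j1}, d_{j2}, \dots, d_{jm_j}\}$, where $d_{j,i+1} > d_{ji} > 0$ for $i = 1, 2, \dots, m_j - 1$, a product term $y_1^{\alpha_1}y_2^{\alpha_2}\cdots y_m^{\alpha_m}$, where $\alpha_1, \alpha_2, \dots, \alpha_m$ are real constants, can be transformed into a function $e^{\alpha_1 z_1 + \cdots + \alpha_m z_m}$, where $z_j = \ln d_{j1} + \sum_{i=1}^{m_j-1} u_{ji}(\ln d_{j,i+1} - \ln d_{j1})$, $\sum_{i=1}^{m_j-1} u_{ji} \le 1$ for $u_{ji} \in \{0, 1\}$.

Proof Let $y_j = e^{z_j}$, i.e., $z_j = \ln y_j$, and express $y_j$ as

$y_j = d_{j1} + \sum_{i=1}^{m_j-1} u_{ji}(d_{j,i+1} - d_{j1})$, $\sum_{i=1}^{m_j-1} u_{ji} \le 1$,

where $u_{ji} \in \{0, 1\}$. We then have $y_1^{\alpha_1}y_2^{\alpha_2}\cdots y_m^{\alpha_m} = e^{\alpha_1 z_1 + \cdots + \alpha_m z_m}$ and

$z_j = \ln d_{j1} + \sum_{i=1}^{m_j-1} u_{ji}(\ln d_{j,i+1} - \ln d_{j1})$, $\sum_{i=1}^{m_j-1} u_{ji} \le 1$, for $u_{ji} \in \{0, 1\}$. □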
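A small numerical check of the Proposition 1 encoding may help: for every admissible choice of the binaries $u_{ji}$, the product $\prod_j y_j^{\alpha_j}$ should equal $e^{\sum_j \alpha_j z_j}$. The levels and exponents below are illustrative only, and the helper names are hypothetical.

```python
import math
from itertools import product

def encode(levels, u):
    """Proposition 1 encoding of one positive discrete variable.

    levels: increasing positive values d_1 < ... < d_m
    u:      0/1 tuple of length m - 1 with at most one entry equal to 1
    """
    d1 = levels[0]
    y = d1 + sum(ui * (d - d1) for ui, d in zip(u, levels[1:]))
    z = math.log(d1) + sum(ui * (math.log(d) - math.log(d1))
                           for ui, d in zip(u, levels[1:]))
    return y, z

def choices(m):
    """All 0/1 vectors of length m - 1 whose entries sum to at most 1."""
    yield (0,) * (m - 1)
    for k in range(m - 1):
        yield tuple(int(i == k) for i in range(m - 1))

levels = [[0.5, 1.0, 2.0], [1.0, 3.0]]   # illustrative data only
alphas = [1.5, -0.7]

checked = []
for us in product(*(choices(len(v)) for v in levels)):
    pairs = [encode(v, u) for v, u in zip(levels, us)]
    lhs = math.prod(y ** a for (y, _), a in zip(pairs, alphas))       # y1^a1 * y2^a2
    rhs = math.exp(sum(a * z for (_, z), a in zip(pairs, alphas)))   # e^(a1 z1 + a2 z2)
    checked.append((tuple(y for y, _ in pairs), lhs, rhs))
```

Each of the $3 \times 2 = 6$ binary choices reproduces one level combination, and the two sides agree up to rounding.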


Because some variables $y_j$ may take the value zero, in which case $\ln y_j$ is undefined, Proposition 1 must be modified as in the following proposition:


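Proposition 2 For positive discrete variables $y_j \in \{d_{j1}, d_{j2}, \dots, d_{jm_j}\}$, where $d_{j,i+1} > d_{ji} > 0$ for $i = 1, 2, \dots, m_j - 1$, $1 \le j \le q$, and non-negative discrete variables $y_j \in \{0, d_{j1}, d_{j2}, \dots, d_{jm_j}\}$, where $d_{j,i+1} > d_{ji} > 0$ for $i = 1, 2, \dots, m_j - 1$, $q + 1 \le j \le m$, a product term $s = y_1^{\alpha_1}y_2^{\alpha_2}\cdots y_q^{\alpha_q}y_{q+1}^{\alpha_{q+1}}\cdots y_m^{\alpha_m}$ can be expressed as

(i) $0 \le s \le \bar{s}\sum_{i=1}^{m_j} u_{ji}$, for $q + 1 \le j \le m$,

(ii) $\bar{s}\left(\sum_{j=q+1}^{m}\sum_{i=1}^{m_j} u_{ji} - (m - q)\right) + e^{\alpha_1 z_1 + \cdots + \alpha_m z_m} \le s \le \bar{s}\left((m - q) - \sum_{j=q+1}^{m}\sum_{i=1}^{m_j} u_{ji}\right) + L\left(e^{\alpha_1 z_1 + \cdots + \alpha_m z_m}\right)$,

where $y_j = d_{j1} + \sum_{i=1}^{m_j-1} u_{ji}(d_{j,i+1} - d_{j1})$, $z_j = \ln d_{j1} + \sum_{i=1}^{m_j-1} u_{ji}(\ln d_{j,i+1} - \ln d_{j1})$, $\sum_{i=1}^{m_j-1} u_{ji} \le 1$, $u_{ji} \in \{0, 1\}$, for $1 \le j \le q$, and $y_j = \sum_{i=1}^{m_j} u_{ji}d_{ji}$, $z_j = \sum_{i=1}^{m_j} u_{ji}\ln d_{ji}$, $\sum_{i=1}^{m_j} u_{ji} \le 1$, $u_{ji} \in \{0, 1\}$, for $q + 1 \le j \le m$; $L(e^{\alpha_1 z_1 + \cdots + \alpha_m z_m})$ is a piecewise linearized expression of $e^{\alpha_1 z_1 + \cdots + \alpha_m z_m}$, and $\bar{s}$ is the upper bound of $s$.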


Proof If $y_j = 0$ for some $j$ ($q + 1 \le j \le m$), then $\sum_{i=1}^{m_j} u_{ji} = 0$ and $s = 0$ by (i).

If $y_j > 0$ for all $j = q + 1, \dots, m$, then $\sum_{i=1}^{m_j} u_{ji} = 1$ for $j = q + 1, \dots, m$. Therefore $\sum_{j=q+1}^{m}\sum_{i=1}^{m_j} u_{ji} - (m - q) = 0$ if no variable in the signomial term is zero, and by (ii) this implies $s = e^{\alpha_1 z_1 + \cdots + \alpha_m z_m}$, up to the error of the piecewise linearization $L$. □
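To see how constraints (i) and (ii) of Proposition 2 behave, the sketch below computes, for one positive variable $y_1$ and one non-negative variable $y_2$ ($m - q = 1$), the interval of $s$ they allow for each binary choice. As a simplifying assumption, the exact exponential stands in for its piecewise linearization $L(\cdot)$; the data are illustrative.

```python
import math

pos_levels = [1.0, 2.0]      # y1 (j <= q): strictly positive levels
nonneg_levels = [1.0, 3.0]   # y2 (j > q): may also be 0
a1, a2 = 1.0, 2.0            # exponents of s = y1**a1 * y2**a2
s_bar = 20.0                 # an upper bound of s over these levels (max is 18)
n_zero_capable = 1           # m - q

rows = []
for u1 in (0, 1):            # Proposition 1 style encoding for y1 (m_1 = 2)
    y1 = pos_levels[0] + u1 * (pos_levels[1] - pos_levels[0])
    z1 = math.log(pos_levels[0]) + u1 * (math.log(pos_levels[1]) - math.log(pos_levels[0]))
    for u2 in ((0, 0), (1, 0), (0, 1)):   # at most one level of y2 selected
        y2 = sum(u * d for u, d in zip(u2, nonneg_levels))
        z2 = sum(u * math.log(d) for u, d in zip(u2, nonneg_levels))
        e_term = math.exp(a1 * z1 + a2 * z2)
        k = sum(u2)
        # (i): 0 <= s <= s_bar * sum_i u_2i
        lo, hi = 0.0, s_bar * k
        # (ii), with the exact exponential standing in for L(.)
        lo = max(lo, s_bar * (k - n_zero_capable) + e_term)
        hi = min(hi, s_bar * (n_zero_capable - k) + e_term)
        rows.append((y1, y2, lo, hi))
```

When $y_2 = 0$ the interval collapses to $\{0\}$; otherwise it collapses to the product $y_1^{\alpha_1}y_2^{\alpha_2}$, as the proof argues.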


Remark 2 For a non-negative discrete variable $y \in \{d_1, d_2, \dots, d_m\}$, $0 \le d_1 < d_2 < \cdots < d_m$, the power term $y^{\alpha}$, where $\alpha$ is a real constant, can be represented as

$y^{\alpha} = d_1^{\alpha} + \sum_{i=1}^{m-1} u_i(d_{i+1}^{\alpha} - d_1^{\alpha})$, where $\sum_{i=1}^{m-1} u_i \le 1$, $u_i \in \{0, 1\}$.
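Remark 2 turns a power of a discrete variable into an expression that is linear in the binaries. A minimal sketch (hypothetical function name, illustrative levels including zero, $\alpha > 0$ so that $0^{\alpha}$ is defined):

```python
def power_via_binaries(levels, alpha, u):
    """Remark 2: y**alpha as a linear expression in binaries u (sum(u) <= 1).

    levels: increasing non-negative values d_1 < ... < d_m (d_1 may be 0
    when alpha > 0).
    """
    base = levels[0] ** alpha
    return base + sum(ui * (d ** alpha - base) for ui, d in zip(u, levels[1:]))

levels = [0.0, 1.0, 2.0, 5.0]   # illustrative levels, including zero
alpha = 2.5
table = []
for k in range(len(levels)):
    u = [int(i == k - 1) for i in range(len(levels) - 1)]   # k = 0 -> all zeros
    table.append((levels[k], power_via_binaries(levels, alpha, u)))
```

Each row pairs a level $d_k$ with the value of the linear expression, which equals $d_k^{2.5}$.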


According to the above discussion, free discrete variables in GGP can be converted into positive discrete variables. In addition, the method of Li and Tsai [18] can deal with the free continuous variables. Consequently, a GGP program with continuous and discrete free variables can be transformed into a GGP program with only positive variables. To obtain a global optimum of the transformed GGP program, it must be converted into a convex mixed-integer problem, which can then be solved to a globally optimal solution by conventional convex mixed-integer techniques.
Convexification Strategies. Convexification strategies for signomial terms are important techniques for global optimization problems. Sun et al. [25] proposed a convexification method for a class of global optimization problems with monotone functions under some restrictive conditions. Wu et al. [29] developed a more general convexification and concavification transformation for solving a general global optimization problem with certain monotone properties. With different convexification approaches, an MINLP problem can be reformulated into a convex mixed-integer program that can be solved to obtain an approximate global optimum. Björk et al. [4] proposed a global optimization technique based on convexifying signomial terms. They showed that the right choice of transformation for convexifying nonconvex signomial terms has a clear impact on the efficiency of the optimization approach. Tsai et al. [26] also suggested convexification techniques for signomial terms with three variables. This study presents generalized convexification techniques and rules to transform a nonconvex GGP program with continuous and discrete variables into a convex mixed-integer program. Consider the following propositions:
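Lemma 1 For a twice-differentiable function $f(X) = c\prod_{i=1}^{n} x_i^{\alpha_i}$, $X = (x_1, x_2, \dots, x_n)$, $x_i > 0$, $c, \alpha_i \in \mathbb{R}$ for all $i$, let $H_i(X)$ be the $i$th leading principal submatrix of the Hessian matrix $H(X)$ of $f(X)$, and let $I_i = \{1, \dots, i\}$. The determinant of $H_i(X)$ can be expressed as

$\det H_i(X) = (-c)^{i}\left(\prod_{j \in I_i}\alpha_j\right)\left(\prod_{j=1}^{n} x_j^{i\alpha_j}\right)\left(\prod_{j \in I_i} x_j^{-2}\right)\left(1 - \sum_{j \in I_i}\alpha_j\right)$.

Remark 3 If $c > 0$, $x_i > 0$ and $\alpha_i < 0$ (for all $i$), then $\det H_i(X) > 0$.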




Remark 4 If $c < 0$, $x_i > 0$, $\alpha_i > 0$ (for all $i$), and $1 - \sum_{i=1}^{n}\alpha_i \ge 0$, then $\det H_i(X) \ge 0$.


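Proposition 3 A twice-differentiable function $f(X) = c\prod_{i=1}^{n} x_i^{\alpha_i}$ is convex for $c > 0$, $x_i > 0$, $\alpha_i < 0$, $i = 1, 2, \dots, n$.

Proof By Lemma 1 and Remark 3,

$\det H_i(X) = (-c)^{i}\left(\prod_{j \in I_i}\alpha_j\right)\left(\prod_{j=1}^{n} x_j^{i\alpha_j}\right)\left(\prod_{j \in I_i} x_j^{-2}\right)\left(1 - \sum_{j \in I_i}\alpha_j\right) > 0$

for $i = 1, 2, \dots, n$, when $c, x_i > 0$ and $\alpha_i < 0$, $i = 1, 2, \dots, n$. Since all leading principal minors of $H(X)$ are positive, $H(X)$ is positive definite and $f(X)$ is convex. □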


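Proposition 4 A twice-differentiable function $f(X) = c\prod_{i=1}^{n} x_i^{\alpha_i}$ is convex for $c < 0$, $x_i, \alpha_i > 0$ ($i = 1, 2, \dots, n$), and $1 - \sum_{i=1}^{n}\alpha_i \ge 0$.

Proof By Lemma 1 and Remark 4,

$\det H_i(X) = (-c)^{i}\left(\prod_{j \in I_i}\alpha_j\right)\left(\prod_{j=1}^{n} x_j^{i\alpha_j}\right)\left(\prod_{j \in I_i} x_j^{-2}\right)\left(1 - \sum_{j \in I_i}\alpha_j\right) \ge 0$

for $i = 1, 2, \dots, n$, when $c < 0$, $x_i, \alpha_i > 0$, and $1 - \sum_{i=1}^{n}\alpha_i \ge 0$. Moreover, for $i < n$ the factor $1 - \sum_{j \in I_i}\alpha_j$ exceeds $1 - \sum_{j=1}^{n}\alpha_j \ge 0$, so the leading principal minors of orders $1, \dots, n-1$ are strictly positive and only $\det H_n(X)$ can vanish. Hence $H(X)$ is positive semidefinite and $f(X)$ is convex. □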
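The closed form in Lemma 1 and the sign claims of Propositions 3 and 4 can be checked numerically. The sketch below builds the analytic Hessian of $c\prod_i x_i^{\alpha_i}$, computes its leading principal minors by Gaussian elimination, and compares them with the Lemma 1 formula; the sample points and exponents are illustrative assumptions, and a numerical check of this kind is an illustration, not a proof.

```python
import math

def hessian(c, alphas, x):
    """Analytic Hessian of f(x) = c * prod(x_i ** alpha_i), x_i > 0."""
    f = c * math.prod(xi ** ai for xi, ai in zip(x, alphas))
    n = len(x)
    return [[f * (alphas[i] * alphas[j] - (alphas[i] if i == j else 0.0)) / (x[i] * x[j])
             for j in range(n)] for i in range(n)]

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, result = len(a), 1.0
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[p][col] == 0.0:
            return 0.0
        if p != col:
            a[col], a[p] = a[p], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result

def leading_minors(c, alphas, x):
    h = hessian(c, alphas, x)
    return [det([row[:i] for row in h[:i]]) for i in range(1, len(x) + 1)]

def lemma1_minor(c, alphas, x, i):
    """Closed form of det H_i(X) from Lemma 1, with I_i = {1, ..., i}."""
    a = alphas[:i]
    return ((-c) ** i * math.prod(a)
            * math.prod(xj ** (i * aj) for xj, aj in zip(x, alphas))
            * math.prod(xj ** -2 for xj in x[:i])
            * (1 - sum(a)))

x = [1.5, 0.7, 2.0]
prop3 = leading_minors(2.0, [-1.0, -0.5, -2.0], x)    # c > 0, all alpha_i < 0
closed = [lemma1_minor(2.0, [-1.0, -0.5, -2.0], x, i) for i in (1, 2, 3)]
prop4 = leading_minors(-1.0, [0.2, 0.3, 0.4], x)      # c < 0, alpha_i > 0, sum <= 1
```

For both sign patterns all leading minors come out positive, and the eliminated determinants match the closed form to rounding.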


For a given signomial term $s$, if $s$ can be converted into a set of convex terms satisfying Propositions 3 and 4, then the whole solution process is more computationally efficient; in this case, $s$ does not require the exponential transformation. For instance, a term $s = x_1^{\alpha_1}x_2^{\alpha_2}x_3^{\alpha_3}$ with $x_1, x_2, x_3 > 0$ and all exponents negative is a convex term requiring no transformation by Proposition 3, and a term $s = -x_1^{\alpha_1}x_2^{\alpha_2}$ with $x_1, x_2 > 0$, positive exponents and $\alpha_1 + \alpha_2 \le 1$ is also a convex term by Proposition 4.


Remark 5 A product term $z = uf(x)$ is equivalent to the following inequalities, which are linear in $z$ and $u$:

(i) $M(u - 1) + f(x) \le z \le M(1 - u) + f(x)$,
(ii) $-Mu \le z \le Mu$,

where $u \in \{0, 1\}$, $z$ is a variable unrestricted in sign, and $M \ge \max |f(x)|$ is a large constant.
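The big-M construction in Remark 5 can be checked by computing the interval of $z$ that inequalities (i) and (ii) leave open for each $u$. A minimal sketch with a hypothetical helper and illustrative values of $f(x)$; the last line shows what goes wrong when $M$ is smaller than $\max|f(x)|$.

```python
def remark5_interval(u, fx, M):
    """Range of z allowed by Remark 5 for binary u and a given value f(x)."""
    lo = max(M * (u - 1) + fx, -M * u)
    hi = min(M * (1 - u) + fx, M * u)
    return lo, hi

M = 6.0   # must be at least max |f(x)|; here f(x) ranges over [-6, 6]
results = [(u, fx, *remark5_interval(u, fx, M))
           for u in (0, 1) for fx in (-6.0, -2.5, 0.0, 3.0, 6.0)]

# With M = 2 < |f(x)| = 6 the constraints become infeasible for u = 0.
too_small = remark5_interval(0, -6.0, 2.0)
```

With a valid $M$, every interval collapses to the single value $u f(x)$; with $M$ too small, the lower end exceeds the upper end.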


Remark 6 The product term $u_1u_2\cdots u_m$, where $u_i \in \{0, 1\}$ for $i = 1, 2, \dots, m$, can be replaced by a variable $u$ satisfying

(i) $0 \le u \le u_i$, for $i = 1, 2, \dots, m$,
(ii) $u \ge \sum_{i=1}^{m} u_i - m + 1$.
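A quick exhaustive check of Remark 6: treating $u$ as a continuous variable, the bounds it imposes collapse to the single value $u_1u_2\cdots u_m$ for every binary assignment, so $u$ need not be declared binary. The helper name is hypothetical.

```python
import math
from itertools import product

def remark6_interval(us):
    """Bounds on u implied by Remark 6 for a 0/1 tuple us (u treated as continuous)."""
    m = len(us)
    lo = max(0, sum(us) - m + 1)
    hi = min(us)
    return lo, hi

m = 3
rows = [(us, remark6_interval(us), math.prod(us)) for us in product((0, 1), repeat=m)]
```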


Following the above discussion, we take a signomial term with three variables as an example to describe the convexification strategy. The strategy can also be extended to convexify a signomial term containing n variables.

Consider a signomial term $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma}$ composed of three positive variables. The term $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma}$ can be convexified by the following rules:


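Rule 1 If $c > 0$ and $\alpha, \beta, \gamma < 0$, then $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma}$ is already a convex term by Proposition 3.

Rule 2 If $c > 0$, $\alpha, \beta < 0$, and $\gamma > 0$, then let $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma} = cx_1^{\alpha}x_2^{\beta}z_1^{-\gamma}$, where $z_1 = x_3^{-1}$. The term $cx_1^{\alpha}x_2^{\beta}z_1^{-\gamma}$ is convex by Rule 1.

Rule 3 If $c > 0$, $\alpha < 0$, and $\beta, \gamma > 0$, then let $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma} = cx_1^{\alpha}z_1^{-\beta}z_2^{-\gamma}$, where $z_1 = x_2^{-1}$, $z_2 = x_3^{-1}$. The term $cx_1^{\alpha}z_1^{-\beta}z_2^{-\gamma}$ is convex by Rule 1.

Rule 4 If $c > 0$ and $\alpha, \beta, \gamma > 0$, then let $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma} = ce^{\alpha\ln x_1 + \beta\ln x_2 + \gamma\ln x_3}$.

Rule 5 If $c < 0$, $\alpha, \beta, \gamma > 0$, and $\alpha + \beta + \gamma \le 1$, then $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma}$ is already a convex term by Proposition 4.

Rule 6 If $c < 0$, $\alpha, \beta > 0$, and $\alpha + \beta < 1$, then let $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma} = cx_1^{\alpha}x_2^{\beta}z_1^{1-\alpha-\beta}$, where $z_1 = x_3^{\gamma/(1-\alpha-\beta)}$. The term $cx_1^{\alpha}x_2^{\beta}z_1^{1-\alpha-\beta}$ is convex by Rule 5.

Rule 7 If $c < 0$ and $0 < \alpha < 1$, then let $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma} = cx_1^{\alpha}z_1^{(1-\alpha)/2}z_2^{(1-\alpha)/2}$, where $z_1 = x_2^{2\beta/(1-\alpha)}$ and $z_2 = x_3^{2\gamma/(1-\alpha)}$. The term $cx_1^{\alpha}z_1^{(1-\alpha)/2}z_2^{(1-\alpha)/2}$ is convex by Rule 5.

Rule 8 If $c < 0$ and "$\alpha, \beta, \gamma < 0$ or $\alpha, \beta, \gamma > 1$", then let $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma} = cz_1^{1/3}z_2^{1/3}z_3^{1/3}$, where $z_1 = x_1^{3\alpha}$, $z_2 = x_2^{3\beta}$, and $z_3 = x_3^{3\gamma}$. The term $cz_1^{1/3}z_2^{1/3}z_3^{1/3}$ is convex by Rule 5.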
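Rule 9 If $\alpha, \beta > 0$, $x_1 \in \mathbb{Z}$ with values $d_1 < d_2 < \cdots < d_{m_1}$, $x_2 \ge 1$ and $\alpha + \beta > 1$, then let $cx_1^{\alpha}x_2^{\beta} = c\left[d_1^{\alpha} + \sum_{i=1}^{m_1-1} u_{1i}(d_{i+1}^{\alpha} - d_1^{\alpha})\right]x_2^{\beta}$, where $\sum_{i=1}^{m_1-1} u_{1i} \le 1$ and $u_{1i} \in \{0, 1\}$ for $i \in \{1, 2, \dots, m_1 - 1\}$. By Remark 5, each product term $u_{1i}x_2^{\beta}$ can be transformed into linear inequalities.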
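For a three-variable term $cx_1^{\alpha}x_2^{\beta}x_3^{\gamma}$, the choice among Rules 1 to 8 depends only on the sign of $c$ and on the exponents. The sketch below encodes those conditions, checked in the stated variable order (the caller would need to reorder variables or treat zero exponents separately), together with a numerical check that the Rule 7 substitution reproduces the original term. Function names and test values are hypothetical.

```python
def select_rule(c, alpha, beta, gamma):
    """Return the first of Rules 1-8 whose sign conditions match the term
    c * x1**alpha * x2**beta * x3**gamma (x1, x2, x3 > 0), else None.
    Conditions are checked exactly in the stated variable order; reordering
    the variables (and zero exponents) is left to the caller."""
    exps = (alpha, beta, gamma)
    if c > 0:
        if all(e < 0 for e in exps):
            return 1
        if alpha < 0 and beta < 0 and gamma > 0:
            return 2
        if alpha < 0 and beta > 0 and gamma > 0:
            return 3
        if all(e > 0 for e in exps):
            return 4
    elif c < 0:
        if all(e > 0 for e in exps) and sum(exps) <= 1:
            return 5
        if alpha > 0 and beta > 0 and alpha + beta < 1:
            return 6
        if 0 < alpha < 1:
            return 7
        if all(e < 0 for e in exps) or all(e > 1 for e in exps):
            return 8
    return None

def rule7_value(c, alpha, beta, gamma, x1, x2, x3):
    """Evaluate the Rule 7 form c*x1^alpha*z1^((1-alpha)/2)*z2^((1-alpha)/2)."""
    z1 = x2 ** (2 * beta / (1 - alpha))
    z2 = x3 ** (2 * gamma / (1 - alpha))
    half = (1 - alpha) / 2
    return c * x1 ** alpha * z1 ** half * z2 ** half
```

For example, `select_rule(-1, 0.2, 0.3, 2)` selects Rule 6, since $\alpha + \beta < 1$ while the Rule 5 condition on $\alpha + \beta + \gamma$ fails.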


By applying the proposed rules, we can determine that certain classes of signomial terms are convex and do not require any transformation. Besides, we can transform a nonconvex signomial term into a convex term in accordance with the proposed rules by replacing some variables, thereby decreasing the number of concave functions that need to be estimated and making the resulting problem a computationally efficient model.

In order to be a valid transformation in the global optimization procedure, the transformation should be selected such that the signomial terms are not only convexified but also underestimated [4,21,27]. If the transformations are appropriately selected, the corresponding approximate signomial term will underestimate the original convexified signomial term when piecewise linear approximations are applied to the inverse transformation functions. We now verify that the proposed rules satisfy the underestimating condition:

In Rule 2, let $\hat{z}_1$ be the approximate transformation variable obtained from the piecewise linear approximation of $z_1 = x_3^{-1}$. The inverse transformation $z_1 = x_3^{-1}$ ($x_3 > 0$) is convex, so $z_1$ will be overestimated ($\hat{z}_1 \ge z_1$). When the approximate variable is inserted into the signomial term, the underestimating property $cx_1^{\alpha}x_2^{\beta}\hat{z}_1^{-\gamma} \le cx_1^{\alpha}x_2^{\beta}z_1^{-\gamma}$ is fulfilled, since $c > 0$ and $z_1$ has a negative power in the convexified term. Similarly, Rules 3 and 4 meet the underestimating condition.

In Rule 6, let $\hat{z}_1$ be the approximate transformation variable obtained from the piecewise linear approximation of $z_1 = x_3^{\gamma/(1-\alpha-\beta)}$. The inverse transformation $z_1 = x_3^{\gamma/(1-\alpha-\beta)}$ ($x_3 > 0$, and $\gamma/(1-\alpha-\beta) \ge 1$ or $\gamma/(1-\alpha-\beta) \le 0$) is convex, so $z_1$ will be overestimated ($\hat{z}_1 \ge z_1$). When the approximate variable is inserted into the signomial term, the underestimating property $cx_1^{\alpha}x_2^{\beta}\hat{z}_1^{1-\alpha-\beta} \le cx_1^{\alpha}x_2^{\beta}z_1^{1-\alpha-\beta}$ is fulfilled, since $c < 0$ and $z_1$ has a positive power in the convexified term.

Similarly, Rules 7 and 8 satisfy the underestimating property.


From the above discussion, we observe that the proposed rules not only convexify but also underestimate the signomial terms. Consequently, when these transformations are used in the global optimization of GGP problems, the feasible region of the convexified problem overestimates the feasible region of the original nonconvex problem.
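The underestimation argument for Rule 2 can be illustrated numerically: interpolate the convex inverse transformation $z_1 = x_3^{-1}$ piecewise linearly between breakpoints, and compare the approximate term $cx_1^{\alpha}x_2^{\beta}\hat{z}_1^{-\gamma}$ with the exact one. The breakpoints, exponents and fixed $x_1, x_2$ below are hypothetical.

```python
import bisect

def secant_interp(breaks, fvals, x):
    """Piecewise linear interpolation of tabulated values at x."""
    k = min(max(bisect.bisect_right(breaks, x) - 1, 0), len(breaks) - 2)
    x0, x1 = breaks[k], breaks[k + 1]
    t = (x - x0) / (x1 - x0)
    return (1 - t) * fvals[k] + t * fvals[k + 1]

breaks = [1.0, 2.0, 3.5, 5.0]                 # hypothetical breakpoints for x3
inv = [1.0 / b for b in breaks]               # z1 = x3**-1 at the breakpoints
c, alpha, beta, gamma = 2.0, -1.0, -0.5, 1.5  # Rule 2 signs: c > 0, alpha, beta < 0, gamma > 0
x1, x2 = 1.2, 0.9

gaps = []
for i in range(41):
    x3 = 1.0 + 4.0 * i / 40
    z_hat = secant_interp(breaks, inv, x3)    # overestimates the convex 1/x3
    z = 1.0 / x3
    approx = c * x1**alpha * x2**beta * z_hat ** (-gamma)
    exact = c * x1**alpha * x2**beta * z ** (-gamma)   # equals c*x1^a*x2^b*x3^g
    gaps.append((z_hat - z, exact - approx))
```

Both gaps are non-negative at every sample: $\hat{z}_1 \ge z_1$, and the approximate term never exceeds the exact one.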


Case Studies 


Casel Minimize x}x3°x3 + x3°x3 +x? 


subject to

$3x_1 + 2x_2 - x_3 \le 7$,
$-5 \le x_1 \le 2$, $0 \le x_2 \le 4$, $-5 \le x_3 \le -1$,
$x_1, x_2 \in \mathbb{Z}$, $x_3 \in \mathbb{R}$.

This problem is a nonconvex GGP program with continuous and discrete free variables. Current exponential transformation methods [8,9,11,20] developed for solving mixed-integer GGP problems cannot be adopted to treat this kind of problem. With the proposed method, we first apply a straightforward substitution to the free variables so that the GGP problem contains only non-negative variables. Following Li and Tsai [18], let the free continuous variable be $x_3 = -x_3'$, $x_3' > 0$. The free discrete variable can be transformed by Remark 1 as $x_1 = x_1^{+} - x_1^{-}$, $x_1^{+}, x_1^{-} \ge 0$. The original problem becomes as follows:


Minimize 9 — (x}')?x3°(xz)° + (x) )?x3°(x5 P— 
x3°x3 + (aT P — Oe)? 

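subject to $3x_1^{+} - 3x_1^{-} + 2x_2 + x_3' \le 7$,

$0 \le x_1^{+} \le 2$, $0 \le x_1^{-} \le 5$, $0 \le x_2 \le 4$,
$1 \le x_3' \le 5$,
$x_1^{+}, x_1^{-}, x_2 \in \mathbb{Z}$, $x_3' \in \mathbb{R}$.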


Then, we use the proposed convexification rules to 
transform all signomial terms into convex terms as fol- 
lows: 


(i) zy = (xp x3? (5 YP and 22 = (x7 )?x3°(25)° 
are transformed by Rule 4 and Proposition 2. 

(ii) — x}? x3 is convexified as — x3°x; = —(uay+ 
B59) + 3°53 + 5 ugs)xy = —Z3 — 2524 


— 39°72, — 45 z by Rule 9. 
(iii) (Can and — (ie are treated directly as


(x; y= ai + Ps —(x} y= Uy, 


— Puy — Puy, — uj, — 5°uj, by Remark 2. 


Subsequently, the transformed program is presented as 
a convex mixed-integer program below: 


Minimize 


557, ~ 4552, reokegs 


—2Z + 2 — 23 —27?°z4 —3 
subject to 

3x) — 3x7 +2x.+x; <7, 
—x, <x, <50,-x,, 


5(0, -1) + x} <x, <x}, 


7 = ui or 2iK5, x = i ‘In2, 
Xp = Uy + 2uy) + 3uj3 + 4Uy4 + 5uy5, 


y, =ujy-In2+u4,,-In3+u,,-:In44+u,,-In5, 


Wy tub <Luy tu tug tua t wy <1, 
X2 = Ug1 + 2U22 + 323 + 4ur4, 

Y2 = Ug? -In2 + up3-In3 + U4 -1n4, 

U2, + U22 + U23 + Ur <1, 

yy = Lnx;), 

0<z <2z(ut + uf), 

0 < 2% < 2(ua + U2 + u23 + U4), 

Z(uyy + uyy + Ua1 + U2 + 23 + U4 — 2) 

$ et HSnt3T <2 

zy <2(2— (uy, + ug + ua + ure + 23 + ur) 


4 Lest #1592435), 


05 2% SAuy + uy + U43 + uy + U5), 


0 < 22 S Z(u2, + U22 + U23 + U4), 


uy, tuy tu tug tuys + ua + un + 23 
+ tng — 2) + e721 FEM TIIG < Zp, 


ZS 2(2— (uy + yy + Uy3 + Uyy + Uys + YI 
+ tag + ua3 + U4) + Le?! 419924995 ), 
5(uy, 1) +x, S23 <3, OX< 23 < 5p), 
S(t l)+%3 S Za x5, O08 24 < 5Sttrn, 
5(uz31) +x, S25 < x3, OX 25 < 5up3, 
S(uy4l) + x3 < 2% <3, OX % < Sur, 
zz = uj, +2us, 


2g = Uy, + 2, + 3h, + Puy, + ee 


(0,0,0,1,0,0,0) < (x, x], x2, YT. Vi Ya V5) 
< (2,5,4,5,1n2,1n5,1In4,1n5), 


where ujj,uj;,4;;,1 € {0,1}, and Z = 125,000. 

Solving the original problem with LINGO [19], without any variable transformation or convexification, yields a local optimum $(x_1, x_2, x_3) = (-5, 0, -5)$ with objective value $-3125$. In contrast, solving the above transformed convex mixed-integer program within a tolerance of 0.001 yields the globally optimal solution $(x_1, x_2, x_3) = (-2, 4, -3.266)$ with objective value $-4491.16$.

Case 2 Minimize $x_1^{0.5}x_2 + 3\ln x_1$ subject to

$-x_1 + x_2 \le 5$,
$x_1^{0.5}y - x_2 \le 6$,
$x_1 \in \{0.1, 0.5, 0.7, 1.2\}$, $-6 \le x_2 \le 4$, $y \in \{0, 1\}$.

This problem contains a discrete variable, a free continuous variable and a binary variable, and therefore cannot be treated by the exponential-based methods. The nonlinear terms $x_1^{0.5}x_2$, $3\ln x_1$ and $x_1^{0.5}y$ are nonconvex functions. By Remarks 2, 5 and 6, the problem can be equivalently transformed into the following linear mixed-integer programming problem:

Minimize
$0.1^{0.5}x_2 + (0.5^{0.5} - 0.1^{0.5})s_1 + (0.7^{0.5} - 0.1^{0.5})s_2 + (1.2^{0.5} - 0.1^{0.5})s_3 + 3\left[\ln 0.1 + (\ln 0.5 - \ln 0.1)u_1 + (\ln 0.7 - \ln 0.1)u_2 + (\ln 1.2 - \ln 0.1)u_3\right]$

subject to
$-x_1 + x_2 \le 5$, $x_1 = 0.1 + (0.5 - 0.1)u_1 + (0.7 - 0.1)u_2 + (1.2 - 0.1)u_3$,
$u_1 + u_2 + u_3 \le 1$,
$0.1^{0.5}y + (0.5^{0.5} - 0.1^{0.5})z_1 + (0.7^{0.5} - 0.1^{0.5})z_2 + (1.2^{0.5} - 0.1^{0.5})z_3 - x_2 \le 6$,
$0 \le z_i \le u_i$, $z_i \le y$, $z_i \ge u_i + y - 1$, $i = 1, 2, 3$,
$-6u_i \le s_i \le 6u_i$, $6(u_i - 1) + x_2 \le s_i \le 6(1 - u_i) + x_2$, $i = 1, 2, 3$,
$s_1, s_2, s_3$ are variables unrestricted in sign,
$u_1, u_2, u_3, y \in \{0, 1\}$, $-6 \le x_2 \le 4$.

The transformed program can be solved to locate the globally optimal solution $(x_1, x_2, y) = (0.1, -6, 0)$, with objective value $-8.805$.
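Because Case 2 is tiny, its reported optimum can be cross-checked by enumeration. The sketch below assumes the problem data as: minimize $x_1^{0.5}x_2 + 3\ln x_1$ subject to $-x_1 + x_2 \le 5$, $x_1^{0.5}y - x_2 \le 6$, $x_1 \in \{0.1, 0.5, 0.7, 1.2\}$, $-6 \le x_2 \le 4$, $y \in \{0, 1\}$. For fixed $(x_1, y)$ the objective is linear in $x_2$, so it suffices to test the endpoints of the feasible interval of $x_2$.

```python
import math

X1_LEVELS = [0.1, 0.5, 0.7, 1.2]
X2_LO, X2_HI = -6.0, 4.0
BETA = 0.5  # assumed exponent of x1 in the constraint x1**BETA * y - x2 <= 6

def objective(x1, x2):
    return math.sqrt(x1) * x2 + 3 * math.log(x1)

best = None
for x1 in X1_LEVELS:
    for y in (0, 1):
        # Feasible interval for x2 given x1 and y:
        lo = max(X2_LO, x1 ** BETA * y - 6)    # from x1^BETA*y - x2 <= 6
        hi = min(X2_HI, 5 + x1)               # from -x1 + x2 <= 5
        if lo > hi:
            continue
        # The objective is linear in x2, so an endpoint is optimal.
        for x2 in (lo, hi):
            val = objective(x1, x2)
            if best is None or val < best[0]:
                best = (val, x1, x2, y)
```

The enumeration returns $(x_1, x_2, y) = (0.1, -6, 0)$ with objective about $-8.8051$, consistent with the value reported above.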
Conclusions


This paper proposes a generalized method for finding globally optimal solutions of GGP problems with continuous and discrete free variables. The techniques for dealing with free variables change variables and convert the logical relationships among the variables in a product term into a set of linear inequalities, which can be merged conveniently into the GGP models. Compared with current GGP methods, the proposed method is capable of dealing with the free variables of a GGP problem and is guaranteed to converge to a global optimum. In addition, several computationally efficient convexification rules for signomial terms are presented to enhance the efficiency of the optimization approach.
The study of multivalued generalized monotone operators is a recent (as of 1999) subject. The first to introduce such a notion seem to have been L.G. Mitjuschin and W.M. Polterovich [16], who defined multivalued quasimonotone operators in demand theory. The same concept was also defined by A. Hassouni [10] and D.T. Luc [14]. Later, Luc [15] and J.P. Penot and P.H. Quang [20] proceeded to define new kinds of generalized monotonicity for multivalued operators. A large part of this effort has been devoted to the definition of appropriate concepts so that generalized convex nonsmooth functions are characterized by the generalized monotonicity of their subdifferentials [18].

As it stands today, the theory is not at the stage of development of the corresponding theory for single valued operators (see ▶ Generalized monotone single valued maps). More concepts have to be introduced and probably some of the already existing ones have to be modified so that a nice correspondence such as the one exhibited in the first theorem of ▶ Generalized monotone single valued maps can be established, without imposing any additional assumptions. This concerns both generalized monotonicity of multivalued operators and generalized convexity of nonsmooth functions, as some notions of generalized convexity involve subdifferentials.

This article presents various definitions of gen- 
eralized monotonicity for multivalued operators and 
generalized convexity for nonsmooth functions. Also, 
various characterizations of generalized convexity of 
a function through the corresponding generalized 
monotonicity of the subdifferential are surveyed. Some 
characterizations have a ‘mixed’ form, i. e., they involve 
both the function and its subdifferential. 

The next section contains the definition of the subdifferential for lower semicontinuous functions, along with the necessary notation. Then the less well-known correspondence between the convexity of a function and the monotonicity of its subdifferential is presented. In the main part of the article, this correspondence is extended to cover the various cases of generalized convexity and generalized monotonicity.


The Subdifferential 


There is a host of nonequivalent subdifferentials for 
nonconvex functions. The interested reader may find 
a thorough exposition of the various concepts in [19]. 
The most common, the Clarke-Rockafellar subdiffer- 
ential, is the one that will be used here, although many 
of the results hold also for a large number of other 
subdifferentials; see for instance [1,18,19]. Generalized 
monotonicity of bifunctions is used in [13] to char- 
acterize generalized convex functions through various 
generalized derivatives. 

In this article, $X$ denotes a Banach space, $X^*$ its dual, and $f: X \to \mathbb{R} \cup \{+\infty\}$ a lower semicontinuous (lsc) function with nonempty domain $\mathrm{dom}(f) = \{x \in X: f(x) \ne +\infty\}$. The function $f$ is called radially continuous if its restriction to line segments is continuous. The value of a functional $x^* \in X^*$ at a point $x \in X$ will be denoted by $\langle x^*, x\rangle$. Given $x, y \in X$, $(x, y)$ is the open line segment $\{tx + (1 - t)y: t \in (0, 1)\}$. The line segments $[x, y]$, $[x, y)$ and $(x, y]$ are defined analogously.
The Clarke-Rockafellar generalized derivative of $f$ at $x_0 \in \mathrm{dom}(f)$ in the direction $d \in X$ is given by

$f^{\uparrow}(x_0; d) = \sup_{\varepsilon > 0}\ \limsup_{x \to_f x_0,\ t \searrow 0}\ \inf_{d' \in B_{\varepsilon}(d)} \dfrac{f(x + td') - f(x)}{t}$.

Here, $t \searrow 0$ denotes that $t > 0$ and $t \to 0$, and $x \to_f x_0$ means that both $x \to x_0$ and $f(x) \to f(x_0)$.

The (Clarke-Rockafellar) subdifferential of $f$ at $x_0 \in \mathrm{dom}(f)$ is defined by

$\partial f(x_0) = \{x^* \in X^*: \langle x^*, d\rangle \le f^{\uparrow}(x_0; d),\ \forall d \in X\}$,

while for $x_0 \in X \setminus \mathrm{dom}(f)$, $\partial f(x_0) = \emptyset$.

Even for $x_0 \in \mathrm{dom}(f)$, the subdifferential $\partial f(x_0)$ may be empty. Whenever the function $f$ is locally Lipschitz, one has $\partial f(x_0) \ne \emptyset$ for all $x_0 \in \mathrm{dom}(f)$. In this case $f^{\uparrow}$ coincides with the Clarke generalized derivative:

$f^{\circ}(x_0; d) = \limsup_{x \to x_0,\ t \searrow 0} \dfrac{f(x + td) - f(x)}{t}$.

In case $f$ is convex, $\partial f$ coincides with the classical Fenchel-Moreau subdifferential

$\partial f(x_0) = \{x^* \in X^*: \langle x^*, d\rangle \le f(x_0 + d) - f(x_0),\ \forall d \in X\}$.


The Monotone Case
Let $T: X \to 2^{X^*}$ be a multivalued operator with domain $D(T) = \{x \in X: T(x) \ne \emptyset\}$.

The operator T is called:
• monotone, if for all $x, y \in X$ and $x^* \in T(x)$, $y^* \in T(y)$ one has
$\langle y^* - x^*, y - x\rangle \ge 0$; (1)
• strictly monotone, if for all $x \ne y$ the above inequality is strict.


It is well known that the subdifferential of a convex function is a monotone operator. However, the fact that convex functions are characterized by the monotonicity of their Clarke-Rockafellar subdifferentials is a relatively recent result. In addition, there exists a ‘mixed’ characterization of convexity, involving both the function and its subdifferential:

Theorem 1 Let $f$ be lsc. The following are equivalent:
i) The function $f$ is convex.
ii) For all $x, y \in \mathrm{dom}(f)$ and all $x^* \in \partial f(x)$ one has:
$\langle x^*, y - x\rangle \le f(y) - f(x)$. (2)
iii) The subdifferential $\partial f$ is a monotone operator.

The implication i)⇒ii) follows from the equality of the Clarke-Rockafellar and the Fenchel-Moreau subdifferentials for convex functions. The implication ii)⇒iii) is shown in every textbook on monotone operators. Finally, the implication iii)⇒i) is shown in [4].
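As a numerical illustration of Theorem 1 on the real line (not a proof), the sketch below samples elements of the subdifferential of the convex function $|x|$, whose subdifferential at $0$ is $[-1, 1]$, and counts violations of relations (1) and (2). For contrast it repeats the test with the nonconvex $-|x|$, whose Clarke subdifferential at $0$ is also $[-1, 1]$. Sample points, seed and helper names are arbitrary choices.

```python
import random

def subdiff_abs(x):
    """Subdifferential of the convex function f(x) = |x| on the real line."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)        # the whole interval [-1, 1] at the kink

def subdiff_neg_abs(x):
    """Clarke subdifferential of the nonconvex function f(x) = -|x|."""
    lo, hi = subdiff_abs(x)
    return (-hi, -lo)

def count_violations(subdiff, f, points, rng, draws=20):
    """Count sampled violations of relation (1) (monotonicity) and (2)."""
    bad_mono = bad_mixed = 0
    for x in points:
        for y in points:
            for _ in range(draws):
                xs = rng.uniform(*subdiff(x))
                ys = rng.uniform(*subdiff(y))
                if (ys - xs) * (y - x) < 0:
                    bad_mono += 1
                if xs * (y - x) > f(y) - f(x) + 1e-12:
                    bad_mixed += 1
    return bad_mono, bad_mixed

rng = random.Random(0)
pts = [-2.0, -0.5, 0.0, 0.3, 1.7]
convex_result = count_violations(subdiff_abs, abs, pts, rng)
nonconvex_result = count_violations(subdiff_neg_abs, lambda t: -abs(t), pts, rng)
```

For $|x|$ no violation of (1) or (2) occurs, while for $-|x|$ both fail, in line with the equivalences of Theorem 1.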

An analogous theorem holds for strictly convex functions (see ▶ Generalized monotone single valued maps for definitions of the various kinds of convexity and generalized convexity):

Theorem 2 Let $f$ be lsc. Consider the following assertions:
i) The function $f$ is strictly convex.
ii) For all distinct $x, y \in \mathrm{dom}(f)$ and $x^* \in \partial f(x)$, one has
$\langle x^*, y - x\rangle < f(y) - f(x)$.
iii) The subdifferential $\partial f$ is a strictly monotone operator.

Then i)⇒ii)⇒iii). If, in addition, $\partial f(x) \ne \emptyset$ for all $x \in \mathrm{dom}(f)$, then iii)⇒i).

For the proof, see [8].


The Quasimonotone Case 


The concepts of quasimonotone, semistrictly quasimonotone and strictly quasimonotone maps are direct generalizations of the corresponding concepts for single valued maps (see ▶ Generalized monotone single valued maps and [9,12]). A multivalued operator $T: X \to 2^{X^*}$ is called:
• quasimonotone [14], if for all $x, y \in X$ and all $x^* \in T(x)$, $y^* \in T(y)$, the following implication holds:
$\langle x^*, y - x\rangle > 0 \Rightarrow \langle y^*, y - x\rangle \ge 0$;
• semistrictly quasimonotone [5], if it is quasimonotone and for any distinct $x, y \in D(T)$ one has the implication:
$\exists x^* \in T(x): \langle x^*, y - x\rangle > 0 \Rightarrow \exists z \in \left(\tfrac{x+y}{2}, y\right),\ \exists z^* \in T(z): \langle z^*, y - x\rangle > 0$; (3)
• strictly quasimonotone [5], if it is quasimonotone and for any distinct $x, y \in D(T)$, there exist $z \in (x, y)$ and $z^* \in T(z)$ such that $\langle z^*, y - x\rangle \ne 0$.

It can be shown [5] that relation (3) is equivalent to the following: if $\langle x^*, y - x\rangle > 0$ for some $x^* \in T(x)$, then the set of all $z \in (x, y)$ for which there exists $z^* \in T(z)$ such that $\langle z^*, y - x\rangle > 0$ is dense in $[x, y]$.

In the single valued case, whenever T is a gradient, its quasimonotonicity, semistrict quasimonotonicity and strict quasimonotonicity are equivalent to quasiconvexity, semistrict quasiconvexity and strict quasiconvexity of the underlying function, respectively (see ▶ Generalized monotone single valued maps for the corresponding definitions, and [3] for properties of such functions). Analogous results hold for multivalued operators which are subdifferentials. The next theorem gives two equivalent characterizations of quasiconvexity: one ‘mixed’, and one through the quasimonotonicity of the subdifferential.


Theorem 3 Let $f$ be lsc. The following are equivalent:
i) The function $f$ is quasiconvex.
ii) For all $x, y \in \mathrm{dom}(f)$, the following implication holds:
$\exists x^* \in \partial f(x): \langle x^*, y - x\rangle > 0 \Rightarrow \forall z \in [x, y]: f(z) \le f(y)$. (4)
iii) The operator $\partial f$ is quasimonotone.

The equivalence i)⇔iii) is shown in [14, Thm. 3.2], while the equivalence i)⇔ii) is shown in [1, Thm. 2.1]. In [1] it is also shown that, in case $f$ is radially continuous, implication (4) is equivalent to the following implication:

$\exists x^* \in \partial f(x): \langle x^*, y - x\rangle > 0 \Rightarrow f(x) \le f(y)$.
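A simple one-dimensional illustration of Theorem 3: $f(x) = x^3$ is quasiconvex on $\mathbb{R}$ (it is nondecreasing, so its sublevel sets are intervals) but not convex, and since it is continuously differentiable its subdifferential is $\{f'(x)\}$. Sampling a grid, its derivative should satisfy the quasimonotone implication while failing monotonicity. The grid is an arbitrary choice.

```python
def grad(x):
    """Derivative of the quasiconvex (but not convex) function f(x) = x**3."""
    return 3.0 * x * x

grid = [i / 4 for i in range(-12, 13)]   # sample points in [-3, 3]

# Quasimonotone: <T(x), y - x> > 0  implies  <T(y), y - x> >= 0.
quasimonotone = all(grad(y) * (y - x) >= 0
                    for x in grid for y in grid if grad(x) * (y - x) > 0)
# Monotone: <T(y) - T(x), y - x> >= 0 for all pairs.
monotone = all((grad(y) - grad(x)) * (y - x) >= 0 for x in grid for y in grid)
```

On this grid `quasimonotone` is true and `monotone` is false, e.g. for $x = -1$, $y = 0$.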


A ‘mixed’ characterization exists also for semistrictly quasiconvex functions [5], but a continuity assumption stronger than lower semicontinuity is needed:

Theorem 4 Let $f$ be lsc. If $f$ is semistrictly quasiconvex, then for all $x, y \in \mathrm{dom}(f)$ one has:
$\exists x^* \in \partial f(x): \langle x^*, y - x\rangle > 0 \Rightarrow \forall z \in [x, y): f(z) < f(y)$. (5)
The converse also holds if in addition $f$ is radially continuous.


Radial continuity is an often used, seemingly weak continuity assumption; in fact, it is not as weak as it seems. Since $X$ is a Banach space, it can be shown that an lsc quasiconvex function which is radially continuous is also continuous [8].

Characterization of strict or semistrict quasiconvex- 
ity via the generalized monotonicity of the subdifferen- 
tial requires an even stronger continuity assumption: 


Theorem 5 A locally Lipschitz function f is strictly 
(respectively semistrictly) quasiconvex, if and only if its 
subdifferential is strictly (respectively semistrictly) quasi- 
monotone. 


For the proof, see [5]. 


The Pseudomonotone Case 


The definition of pseudomonotonicity for multivalued operators was given by J.C. Yao [21] and generalizes the corresponding definition for single valued operators (see ▶ Generalized monotone single valued maps and [11]). An operator $T: X \to 2^{X^*}$ is called pseudomonotone if for all $x, y \in X$ one has:
$\exists x^* \in T(x): \langle x^*, y - x\rangle \ge 0 \Rightarrow \forall y^* \in T(y): \langle y^*, y - x\rangle \ge 0$.

Equivalently, an operator T is pseudomonotone if and only if the following implication holds:
$\exists x^* \in T(x): \langle x^*, y - x\rangle > 0 \Rightarrow \forall y^* \in T(y): \langle y^*, y - x\rangle > 0$. (6)

Obviously, a pseudomonotone operator T is quasimonotone. If in addition the domain D(T) is convex, then relation (6) implies that T is also semistrictly quasimonotone. Also, it is clear that a monotone operator is pseudomonotone.

An operator $T: X \to 2^{X^*}$ is called strictly pseudomonotone [21] if for all distinct $x, y \in X$ one has:
$\exists x^* \in T(x): \langle x^*, y - x\rangle \ge 0 \Rightarrow \forall y^* \in T(y): \langle y^*, y - x\rangle > 0$.

It is clear that a strictly pseudomonotone operator is pseudomonotone, and that a strictly monotone operator is strictly pseudomonotone. Finally, it can easily be shown [8] that a strictly pseudomonotone operator with convex domain is strictly quasimonotone.

In summary, between the various concepts of generalized monotonicity, the following implications hold (some of which assume convexity of the domain):

                      qm
                      ⇑
  m    ⇒   pm    ⇒  sstr.qm
  ⇑         ⇑          ⇑
str.m  ⇒  str.pm  ⇒  str.qm

Here, ‘str.’ and ‘sstr.’ stand for ‘strictly’ and ‘semistrictly’, respectively, and ‘m’, ‘pm’ and ‘qm’ for ‘monotone’, ‘pseudomonotone’ and ‘quasimonotone’, respectively. These implications are exactly the same as those holding for single valued operators (see ▶ Generalized monotone single valued maps).
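The implications in the diagram above are not reversible in general. A one-dimensional example: $T(x) = 1 + 3x^2$, the derivative of the pseudoconvex function $x + x^3$, is strictly pseudomonotone (hence pseudomonotone and quasimonotone) but not monotone. The grid check below illustrates this; it samples points rather than proving anything.

```python
def T(x):
    """Derivative of f(x) = x + x**3, a pseudoconvex function on the real line."""
    return 1.0 + 3.0 * x * x

grid = [i / 4 for i in range(-12, 13)]
pairs = [(x, y) for x in grid for y in grid]

pseudomonotone = all(T(y) * (y - x) >= 0 for x, y in pairs if T(x) * (y - x) >= 0)
strictly_pm = all(T(y) * (y - x) > 0 for x, y in pairs if x != y and T(x) * (y - x) >= 0)
quasimonotone = all(T(y) * (y - x) >= 0 for x, y in pairs if T(x) * (y - x) > 0)
monotone = all((T(y) - T(x)) * (y - x) >= 0 for x, y in pairs)
```

Since $T > 0$ everywhere, the premise $\langle T(x), y - x\rangle \ge 0$ just says $y \ge x$, which forces $\langle T(y), y - x\rangle \ge 0$; monotonicity fails because $T$ decreases on $x < 0$.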

In contrast to quasiconvex functions and their vari- 
ants, pseudoconvex functions have to be redefined in 
the nonsmooth case. The reason is that the usual defini- 
tion of pseudoconvexity makes explicit reference to the 
derivative of the function (however, there exists a defi- 
nition which does not mention the derivative explicitly 
[17]; see also [3] for details). 

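A function $f$ is called pseudoconvex if for all $x, y \in \operatorname{dom}(f)$ the following implication holds:

$$\exists x^* \in \partial f(x): \langle x^*, y-x \rangle \ge 0 \;\Longrightarrow\; \forall z \in [x, y[\,: f(z) \le f(y). \qquad (7)$$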


Note that the above definition expresses a ‘mixed’ property in the spirit of relation (4); actually, (7) is stronger than (4), and hence any pseudoconvex function is quasiconvex. In particular, a pseudoconvex function $f$ has a convex domain. If in addition $f$ is radially continuous, then it is semistrictly quasiconvex [8].


The definition of pseudoconvexity given here differs slightly from the definition introduced in [20]. There, a function $f$ is called pseudoconvex if it satisfies the implication

$$\exists x^* \in \partial f(x): \langle x^*, y-x \rangle \ge 0 \;\Longrightarrow\; f(x) \le f(y). \qquad (8)$$


A pseudoconvex function (as defined by relation (7)) 
obviously satisfies (8). The converse is not always true; 
however, if f is radially continuous, or if its domain is 
convex, then (8) implies that f is quasiconvex (see [20] 
and [6], respectively). It follows immediately that f sat- 
isfies (7), i.e., it is pseudoconvex. 

The following theorem connects pseudoconvexity 
of a function to pseudomonotonicity of its subdifferen- 
tial (see [8] and [20] for the proof of the first and the 
second assertion, respectively): 


Theorem 6 If $f$ is pseudoconvex, then $\partial f$ is pseudomonotone. Conversely, if $\partial f$ is pseudomonotone and $f$ is radially continuous, then $f$ is pseudoconvex.
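A minimal numerical sketch of Theorem 6 on the real line, assuming subdifferentials that are compact intervals $[lo, hi]$ (the Clarke subdifferential is used for the nonconvex example). The helpers `subdiff_abs`, `subdiff_capped_abs` and `find_violation` are hypothetical illustrations, not part of the cited works. The convex function $|x|$ has a pseudomonotone subdifferential, while $\min(|x|, 1)$ is quasiconvex but not pseudoconvex, so its subdifferential should be quasimonotone without being pseudomonotone. A grid search can only exhibit counterexamples; finding none is evidence, not a proof.

```python
from itertools import product

# Each map returns the interval [lo, hi] of subgradients at x as (lo, hi).
# Since <x*, y - x> is linear in x*, "exists" / "for all" over an interval
# can be decided at its two endpoints.

def subdiff_abs(x):
    """Subdifferential of f(x) = |x| (convex, hence pseudoconvex)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def subdiff_capped_abs(x):
    """Clarke subdifferential of f(x) = min(|x|, 1) (quasiconvex, not pseudoconvex)."""
    if abs(x) > 1:
        return (0.0, 0.0)
    if x == 1:
        return (0.0, 1.0)
    if x == -1:
        return (-1.0, 0.0)
    return subdiff_abs(x)

def find_violation(T, points, strict_premise):
    """Return the first grid pair (x, y) violating the defining implication, or None.

    strict_premise=False tests pseudomonotonicity:
        exists x* in T(x): x*(y - x) >= 0  =>  for all y* in T(y): y*(y - x) >= 0.
    strict_premise=True tests quasimonotonicity (premise with > 0 instead).
    """
    for x, y in product(points, repeat=2):
        d = y - x
        lo_x, hi_x = T(x)
        best = max(lo_x * d, hi_x * d)
        premise = best > 0 if strict_premise else best >= 0
        lo_y, hi_y = T(y)
        conclusion = min(lo_y * d, hi_y * d) >= 0
        if premise and not conclusion:
            return (x, y)
    return None

grid = [k / 4 for k in range(-12, 13)]  # exact binary fractions in [-3, 3]
print(find_violation(subdiff_abs, grid, strict_premise=False))         # None
print(find_violation(subdiff_capped_abs, grid, strict_premise=True))   # None
print(find_violation(subdiff_capped_abs, grid, strict_premise=False))  # (-3.0, -1.0)
```

The reported pair reflects the failure of pseudoconvexity: $0 \in \partial f(-3)$, yet $-3$ is not a global minimizer of $\min(|x|, 1)$.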


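A function $f$ is called strictly pseudoconvex [8] if for all $x, y \in \operatorname{dom}(f)$ one has:

$$\exists x^* \in \partial f(x): \langle x^*, y-x \rangle \ge 0 \;\Longrightarrow\; \forall z \in [x, y[\,: f(z) < f(y). \qquad (9)$$

For radially continuous functions, relation (9) is equivalent to requiring, for all distinct $x, y \in \operatorname{dom}(f)$,

$$\exists x^* \in \partial f(x): \langle x^*, y-x \rangle \ge 0 \;\Longrightarrow\; f(x) < f(y). \qquad (10)$$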


Indeed, if relation (10) holds, then $f$ is pseudoconvex, hence it is semistrictly quasiconvex. Consequently, if $\langle x^*, y-x \rangle \ge 0$ for some $x^* \in \partial f(x)$, then $f(x) < f(y)$ implies that $f(z) < f(y)$ for all $z \in [x, y[$, i.e., (9) holds.

We have the following connection to strict pseu- 
domonotonicity: 


Theorem 7 If $f$ is strictly pseudoconvex, then $\partial f$ is strictly pseudomonotone. Conversely, if $\partial f$ is strictly pseudomonotone and its values are nonempty on $\operatorname{dom}(f)$, then $f$ is strictly pseudoconvex.


For the proof of the first assertion, see [20]; the second 
assertion is shown in [8]. 
As a corollary of the last theorem, it can be shown [8] that a locally Lipschitz, strictly pseudoconvex function $f$ is strictly quasiconvex. Hence, between the various kinds of generalized convexity, the following implications hold (some implications need extra continuity assumptions):

$$\begin{array}{ccccc} & & & & \text{qcx} \\ & & & & \Uparrow \\ \text{cx} & \Rightarrow & \text{pcx} & \Rightarrow & \text{sstr.qcx} \\ \Uparrow & & \Uparrow & & \Uparrow \\ \text{str.cx} & \Rightarrow & \text{str.pcx} & \Rightarrow & \text{str.qcx} \end{array}$$

Here, ‘cx’, ‘pcx’ and ‘qcx’ stand for ‘convex’, ‘pseudoconvex’ and ‘quasiconvex’, respectively. Thus, the same implications hold as those for differentiable functions (see the corresponding diagram in ▸ Generalized monotone single valued maps). In addition, each type of generalized convex function is characterized by the corresponding generalized monotonicity of the subdifferential, exactly as in the case of differentiable functions (the first theorem in ▸ Generalized monotone single valued maps).


See also 


▸ Fejér Monotonicity in Convex Optimization

▸ Generalized Monotone Single Valued Maps

▸ Generalized Monotonicity: Applications to Variational Inequalities and Equilibrium Problems

▸ Set-valued Optimization
Generalized Monotone Single Valued Maps


In the analysis and solution of complementarity problems and variational inequalities, it is commonly assumed that the defining map is monotone. This is not surprising, since in the special case of an underlying optimization problem convexity is usually assumed, and convexity of the objective function corresponds to monotonicity of its gradient.

For several decades much effort has been devoted to generalizing convexity in various ways, often with nonconvex optimization in mind [1]. On the other hand, a systematic study of generalizations of monotonicity has emerged only recently. Since the article [14] in 1990, about two hundred publications have appeared. They deal either with concepts and characterizations of generalized monotonicity or with uses in variational inequalities and related models [23].

In this survey characterizations of generalized 
monotonicity for different subclasses of maps are pre- 
sented. The need for such criteria is obvious, given that 
the defining inequalities are often hard to verify. 

The article is organized as follows. The next section provides a brief review of some basic generalized monotonicity concepts and their relationships. This is followed by criteria for generalized monotonicity in the case of differentiable, affine and nondifferentiable (locally Lipschitz) maps in the subsequent sections.

This article on concepts and characterizations of generalized monotone maps in the single valued case is complemented by one on multivalued maps. In a third article in this volume the use of generalized monotonicity in variational inequalities and more general models is surveyed. For a more detailed survey of applications see [11].


Seven Kinds of (Generalized) Monotonicity 

Seven basic kinds of convex/generalized convex functions are [1]:

• convex (cx), strictly convex (str.cx);
• pseudoconvex (pcx), strictly pseudoconvex (str.pcx);
• quasiconvex (qcx), semistrictly quasiconvex (sstr.qcx) and strictly quasiconvex (str.qcx).

Strongly convex and strongly pseudoconvex functions [1] are not considered here.

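These functions are related to each other as follows:

$$\begin{array}{ccccc} & & & & \text{qcx} \\ & & & & \Uparrow \\ \text{cx} & \Rightarrow & \text{pcx} & \Rightarrow & \text{sstr.qcx} \\ \Uparrow & & \Uparrow & & \Uparrow \\ \text{str.cx} & \Rightarrow & \text{str.pcx} & \Rightarrow & \text{str.qcx} \end{array}$$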


For the sake of completeness, the related definitions are presented below. Consider $f: C \to \mathbb{R}$ where $C \subseteq \mathbb{R}^n$ is convex.

• $f$ is convex (cx) if for all $x, y \in C$ and $t \in (0, 1)$,
$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y); \qquad (1)$$
• $f$ is strictly convex (str.cx) if (1) is a strict inequality for $x \ne y$;
• $f$ is quasiconvex (qcx) if for all $x, y \in C$ such that $f(x) \le f(y)$ and all $t \in (0, 1)$,
$$f(tx + (1-t)y) \le f(y); \qquad (2)$$
• $f$ is strictly quasiconvex (str.qcx) if (2) is a strict inequality for $x \ne y$;
• $f$ is semistrictly quasiconvex (sstr.qcx) if for all $x, y \in C$ such that $f(x) < f(y)$ the inequality (2) is strict.

For the remaining two types of generalized convex functions one assumes differentiability of $f$ on the open convex set $C \subseteq \mathbb{R}^n$, although more general definitions are available [1]:

• $f$ is pseudoconvex (pcx) if for all $x, y \in C$
$$(y-x)^T \nabla f(x) \ge 0 \;\Longrightarrow\; f(y) \ge f(x); \qquad (3)$$
• $f$ is strictly pseudoconvex (str.pcx) if for all $x, y \in C$, $x \ne y$, the second inequality in (3) is strict.
Different kinds of generalized convexity preserve different properties of convex functions. E.g., the characteristic property of a pseudoconvex function is that every stationary point is a global minimum. Furthermore, for a semistrictly quasiconvex function every local minimum is a global minimum, and for a quasiconvex function the lower level sets are convex. The qualifier ‘strictly’ indicates that a global minimum is unique. In contrast to convex functions, inflection points are admissible for all types of generalized convex functions.
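A small sketch of the last remark, using the hypothetical example $f(x) = x + x^3$ on $C = \mathbb{R}$: it has an inflection point at $0$ and is not convex, yet (as an increasing function with $f' > 0$) it should satisfy the pseudoconvexity implication (3). The grid check below only searches for counterexamples; it is an illustration, not a proof.

```python
def f(x):
    """Hypothetical example f(x) = x + x**3 with an inflection point at 0."""
    return x + x ** 3

def df(x):
    return 1 + 3 * x ** 2

grid = [k / 8 for k in range(-16, 17)]  # exact binary fractions in [-2, 2]

# Definition (3): (y - x) f'(x) >= 0  implies  f(y) >= f(x)
pcx_violations = [(x, y) for x in grid for y in grid
                  if (y - x) * df(x) >= 0 and f(y) < f(x)]

# Definition (1) with t = 1/2 fails for x = -2, y = 0
not_convex = f(-1.0) > 0.5 * f(-2.0) + 0.5 * f(0.0)

print(pcx_violations)  # []
print(not_convex)      # True
```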

Note that in [1] the terminology of quasiconvex and 
pseudoconvex functions was harmonized, resulting in 
renaming former ‘strongly quasiconvex’ functions as 
strictly quasiconvex and ‘strictly quasiconvex’ functions 
as semistrictly quasiconvex. 

It is well known that a differentiable convex function is characterized by a monotone gradient. Correspondingly, a strictly convex function is characterized by a strictly monotone gradient. Accordingly, generalized monotonicity concepts have been introduced in such a way that in the case of a gradient map $F = \nabla f$ generalized monotonicity of $F$ corresponds to some kind of generalized convexity of the underlying function $f$. The definitions of (generalized) monotone maps are listed below.

Consider $F: C \to \mathbb{R}^n$ where $C \subseteq \mathbb{R}^n$.

• $F$ is monotone (m) on $C$ if for all $x, y \in C$
$$(y-x)^T (F(y) - F(x)) \ge 0; \qquad (4)$$
• $F$ is strictly monotone (str.m) on $C$ if for all $x, y \in C$, $x \ne y$,
$$(y-x)^T (F(y) - F(x)) > 0; \qquad (5)$$
• $F$ is pseudomonotone (pm) on $C$ if for all $x, y \in C$,
$$(y-x)^T F(x) \ge 0 \;\Longrightarrow\; (y-x)^T F(y) \ge 0, \qquad (6)$$
which is equivalent to
$$(y-x)^T F(x) > 0 \;\Longrightarrow\; (y-x)^T F(y) > 0;$$
• $F$ is strictly pseudomonotone (str.pm) on $C$ if for all $x, y \in C$, $x \ne y$,
$$(y-x)^T F(x) \ge 0 \;\Longrightarrow\; (y-x)^T F(y) > 0; \qquad (7)$$
• $F$ is quasimonotone (qm) if for all $x, y \in C$,
$$(y-x)^T F(x) > 0 \;\Longrightarrow\; (y-x)^T F(y) \ge 0; \qquad (8)$$
• $F$ is strictly quasimonotone (str.qm) on $C$ if $F$ is quasimonotone on $C$ and for all $x, y \in C$, $x \ne y$, there exists $z = tx + (1-t)y$, $t \in (0, 1)$, such that
$$(y-x)^T F(z) \ne 0; \qquad (9)$$
• $F$ is semistrictly quasimonotone (sstr.qm) on $C$ if $F$ is quasimonotone on $C$ and for $x, y \in C$, $x \ne y$,
$$(y-x)^T F(x) > 0 \;\Longrightarrow\; (y-x)^T F(z) > 0 \qquad (10)$$
for some $z = tx + (1-t)y$, $t \in (0, 1/2)$.
If F is continuous, quasimonotonicity does not have 
to be required explicitly for strictly/semistrictly quasi- 
monotone maps since it is implied by (9), (10), respec- 
tively. In terms of references for the concepts above, see 
[13] for pseudomonotone maps, [14] for quasimono- 
tone and strictly pseudomonotone maps and [9] for 
strictly quasimonotone and semistrictly quasimono- 
tone maps. 
The following diagram was derived in [9,13,14] for 
general maps which are not necessarily gradient maps: 


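$$\begin{array}{ccccc} & & & & \text{qm} \\ & & & & \Uparrow \\ \text{m} & \Rightarrow & \text{pm} & \Rightarrow & \text{sstr.qm} \\ \Uparrow & & \Uparrow & & \Uparrow \\ \text{str.m} & \Rightarrow & \text{str.pm} & \Rightarrow & \text{str.qm} \end{array}$$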


Now consider the special case of a gradient map $F = \nabla f$, where $f$ is differentiable on the open convex set $C \subseteq \mathbb{R}^n$. In analogy to monotone maps it can be shown [9, 13, 14]:


Theorem 1 The map F = Vf is quasimonotone (re- 
spectively, semistrictly quasimonotone, strictly quasi- 
monotone, pseudomonotone, strictly pseudomonotone) 
if and only if the function f is quasiconvex (respectively, 
semistrictly quasiconvex, strictly quasiconvex, pseudo- 
convex, strictly pseudoconvex). 
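As a hedged one-dimensional illustration of Theorem 1, take the hypothetical pseudoconvex function $f(x) = x + x^3$ on $C = \mathbb{R}$ and its gradient $F = f' = 1 + 3x^2$. The sketch samples the pseudomonotonicity implication (6) and the monotonicity inequality (4) on a grid. Monotonicity fails because $(y-x)(F(y)-F(x)) = 3(y-x)^2(x+y)$ is negative whenever $x + y < 0$, in line with $f$ being pseudoconvex but not convex.

```python
def F(x):
    """Gradient of the hypothetical f(x) = x + x**3 on C = R."""
    return 1 + 3 * x ** 2

grid = [k / 8 for k in range(-16, 17)]  # exact binary fractions in [-2, 2]
pairs = [(x, y) for x in grid for y in grid]

# (6): (y - x) F(x) >= 0  implies  (y - x) F(y) >= 0
pm_ok = all((y - x) * F(y) >= 0 for x, y in pairs if (y - x) * F(x) >= 0)

# (4): (y - x)(F(y) - F(x)) >= 0 for all pairs
monotone_ok = all((y - x) * (F(y) - F(x)) >= 0 for x, y in pairs)

print(pm_ok, monotone_ok)  # True False
```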


Note that in the case of semistrictly quasiconvex func- 
tions Theorem 1 provides the first successful character- 
ization in terms of the gradient. Before, the existence of 
such a characterization was doubted [17]. 

There are several studies where similar results are obtained for nondifferentiable functions in which the gradient is replaced by the subdifferential (see, e.g., ▸ Generalized monotone multivalued maps).
Given the geometric properties of generalized convex functions mentioned above [1], it is not difficult to derive the geometric properties describing generalized monotonicity of gradient maps; see, e.g., [2,3,15].

New generalized monotone maps can be constructed from existing ones. As an example from [20], consider $z = Ax + b$, where $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$. Let $D \subseteq \mathbb{R}^m$ and $C = \{x \in \mathbb{R}^n : Ax + b \in D\}$. Then the map $F(x) = A^T G(Ax + b)$ is quasimonotone (pseudomonotone) on $C$ if $G$ is quasimonotone (pseudomonotone) on $D$. Moreover, $F$ is strictly pseudomonotone on $C$ if $G$ is strictly pseudomonotone on $D$ and $A$ has full rank.


The Differentiable Case 


In this section it is assumed that $F: C \to \mathbb{R}^n$ is differentiable and $C \subseteq \mathbb{R}^n$ is an open convex set. Let $J_F(x)$ be the Jacobian of $F$. First order characterizations of generalized monotone maps have been established in [15]. In the case of gradient maps they extend classical second order characterizations of generalized convex functions.
Let $x \in C$, $v \in \mathbb{R}^n$, $v \ne 0$ and consider the following conditions:
A) $v^T F(x) = 0$ implies $v^T J_F(x) v \ge 0$;
A+) $v^T F(x) = 0$ implies $v^T J_F(x) v > 0$;
B) $v^T F(x) = v^T J_F(x) v = 0$ and $v^T F(x + \bar t v) > 0$ for some $\bar t < 0$ imply that there exists $\hat t > 0$ such that $x + tv \in C$ and $v^T F(x + tv) \ge 0$ for all $0 \le t \le \hat t$;
C) $v^T F(x) = v^T J_F(x) v = 0$ implies that there exists $\hat t > 0$ such that $x + tv \in C$ and $v^T F(x + tv) \ge 0$ for all $0 \le t \le \hat t$.


The following can be shown: 


Theorem 2 Let $F: C \to \mathbb{R}^n$ be differentiable on the open convex set $C \subseteq \mathbb{R}^n$.

i) $F$ is quasimonotone if and only if A) and B) hold for all $x \in C$ and $v \in \mathbb{R}^n$;

ii) $F$ is pseudomonotone if and only if A) and C) hold for all $x \in C$ and $v \in \mathbb{R}^n$;

iii) $F$ is strictly pseudomonotone if A+) holds for all $x \in C$ and $v \in \mathbb{R}^n$.
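A minimal sketch of how condition A) can detect a failure of quasimonotonicity, assuming the hypothetical gradient map $F = \nabla f$ of $f(x_1, x_2) = x_1 + x_1^3 + x_2$, i.e. $F(x) = (1 + 3x_1^2, 1)$ with Jacobian $\mathrm{diag}(6x_1, 0)$. Since $F$ never vanishes and $n = 2$, the admissible directions $v$ are the multiples of $(-F_2, F_1)$, and the sign of $v^T J_F(x) v$ does not depend on the multiple. A) fails at $x = (-1, 0)$, so Theorem 2 i) predicts that $F$ is not quasimonotone; the sketch then exhibits an explicit violation of (8).

```python
def F(x):
    """Gradient of the hypothetical f(x1, x2) = x1 + x1**3 + x2; it never vanishes."""
    return (1 + 3 * x[0] ** 2, 1.0)

def jacobian(x):
    return ((6 * x[0], 0.0), (0.0, 0.0))

def condition_A(x):
    """Condition A) at x for n = 2: v^T F(x) = 0 implies v^T J_F(x) v >= 0.

    F(x) != 0, so the admissible v are multiples of (-F2, F1); v^T J v scales
    with the square of the multiple, so one representative v suffices.
    """
    f1, f2 = F(x)
    v = (-f2, f1)
    J = jacobian(x)
    Jv = (J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1])
    return v[0] * Jv[0] + v[1] * Jv[1] >= 0

print(condition_A((1.0, 0.0)), condition_A((-1.0, 0.0)))  # True False

# A direct counterexample to (8): (y - x)^T F(x) > 0 but (y - x)^T F(y) < 0
x, y = (-1.5, 0.0), (-0.5, -5.0)
d = (y[0] - x[0], y[1] - x[1])
lhs = d[0] * F(x)[0] + d[1] * F(x)[1]  # 2.75
rhs = d[0] * F(y)[0] + d[1] * F(y)[1]  # -3.25
print(lhs, rhs)
```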


More recently, it was shown in [4] that for continuously differentiable maps the hypothesis $v^T F(x) = 0$ in B) and C) can be replaced by the more restrictive hypothesis $F(x) = 0$ (which makes B) and C) weaker conditions), and i) and ii) are still true. An immediate consequence of this stronger characterization is that for a nonvanishing map on an open convex set there is no difference between quasimonotonicity and pseudomonotonicity: both are characterized by condition A). However, this is no longer true on closed convex sets (see [10, Example 3.1]).


The Affine Case 


In this section we focus on the special case of affine maps. Let $F(x) = Mx + q$ where $M$ is an $n \times n$ matrix and $q \in \mathbb{R}^n$. Consider $F$ on an open convex set $C \subseteq \mathbb{R}^n$. For general differentiable maps we have $F = \nabla f$ if and only if $J_F(x)$ is symmetric for all $x$. Hence for an affine map $F(x) = Mx + q$ we have $F = \nabla f$ if and only if $M$ is symmetric. In this case $f(x) = \tfrac{1}{2} x^T M x + q^T x$. Therefore first order characterizations of generalized monotone affine maps correspond to second order characterizations of generalized convex quadratic functions.

For affine maps conditions B) and C) are always sat- 
isfied. Hence, specializing Theorem 2 we have 


Theorem 3 The map $F(x) = Mx + q$ is quasimonotone on an open convex set $C \subseteq \mathbb{R}^n$ if and only if $F$ is pseudomonotone on $C$, if and only if for all $x \in C$ and $v \in \mathbb{R}^n$

$$v^T (Mx + q) = 0 \;\Longrightarrow\; v^T M v \ge 0.$$


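As a result, quasimonotonicity in a neighborhood of a point $x$ such that $Mx + q = 0$ implies monotonicity on $\mathbb{R}^n$: at such a point the hypothesis $v^T(Mx + q) = 0$ holds for every $v$, so $M$ must be positive semidefinite.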
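A hedged sketch of Theorem 3 for one hypothetical affine map: $M = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}$, $q = 0$, so $F = \nabla f$ with $f(x) = -x_1 x_2$, considered on $C = \operatorname{int} \mathbb{R}^2_+$. For $n = 2$ and $F(x) \ne 0$ the directions with $v^T(Mx + q) = 0$ are the multiples of $(-F_2, F_1)$, and here $v^T M v = 2x_1x_2 > 0$. The sketch checks this criterion at random points, samples the defining implication (6) directly, and shows that monotonicity (4) fails because $M$ is indefinite. Random sampling is only evidence, not a proof.

```python
import random

M = ((0.0, -1.0), (-1.0, 0.0))  # hypothetical example
q = (0.0, 0.0)

def F(x):
    """Affine map F(x) = Mx + q; here the gradient of f(x) = -x1 * x2."""
    return (M[0][0] * x[0] + M[0][1] * x[1] + q[0],
            M[1][0] * x[0] + M[1][1] * x[1] + q[1])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def quad_M(v):
    """v^T M v."""
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

def theorem3_condition(x):
    """v^T (Mx + q) = 0 implies v^T M v >= 0, assuming F(x) != 0 and n = 2."""
    f1, f2 = F(x)
    return quad_M((-f2, f1)) >= 0

rng = random.Random(0)
sample = [(rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)) for _ in range(200)]

criterion_ok = all(theorem3_condition(x) for x in sample)

# Direct sampling of the pseudomonotonicity implication (6)
pm_ok = all(dot((y[0] - x[0], y[1] - x[1]), F(y)) >= 0
            for x in sample for y in sample
            if dot((y[0] - x[0], y[1] - x[1]), F(x)) >= 0)

# Monotonicity (4) fails on C
x, y = (1.0, 1.0), (2.0, 2.0)
gap = dot((y[0] - x[0], y[1] - x[1]), (F(y)[0] - F(x)[0], F(y)[1] - F(x)[1]))

print(criterion_ok, pm_ok, gap)  # True True -2.0
```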

As mentioned earlier, one can construct new generalized monotone maps with the help of a given one as follows. Given the linear map $G(z) = Mz$, if $G$ is quasimonotone (pseudomonotone) on the nonnegative orthant $\mathbb{R}^m_+$, then the map $F(x) = (A^T M A)x$ is quasimonotone (pseudomonotone) on $\mathbb{R}^n_+$ for any nonnegative $m \times n$ matrix $A$.

Recently a matrix-theoretic characterization of gen- 
eralized monotone affine maps was obtained [6]. The 
departure point for its derivation is Theorem 3. The fol- 
lowing notation is needed to describe the results. 

For the affine map $F(x) = Mx + q$ one considers

$$B = \tfrac{1}{2}(M + M^T), \qquad P = \tfrac{1}{2} M^T B^+ M,$$

where $B^+$ is the Moore-Penrose pseudo-inverse of $B$; $n_+$, $n_-$ and $n_0$ are the numbers of positive, negative and zero eigenvalues of $B$, respectively;

$$r = \dim(\ker M),$$
$$f(x) = (Mx + q)^T B^+ (Mx + q),$$
$$S = \{x \in \mathbb{R}^n : f(x) \le 0\},$$
$$T = \{x \in \mathbb{R}^n : x^T P x \le 0\},$$

and $C \subseteq \mathbb{R}^n$ is convex with $\operatorname{int} C \ne \emptyset$.
One has [6]: 


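Theorem 4 $F$ is quasimonotone on $C$ (and pseudomonotone on $\operatorname{int} C$) if and only if one of the following conditions holds:

i) $n_- = 0$, i.e., $B$ is positive semidefinite and $F$ is monotone on $\mathbb{R}^n$;

ii1) $n_- = 1$, $r = n_0 + 1$, $-q \notin M(\operatorname{int} C)$, $q \in B(\mathbb{R}^n) \supseteq M(\mathbb{R}^n)$, $P$ is positive semidefinite, $S$ is a closed convex set and $C \subseteq S$;

ii2) $n_- = 1$, $r = n_0$, $-q \notin M(\operatorname{int} C)$, $q \in B(\mathbb{R}^n) = M(\mathbb{R}^n)$, $T = T_+ \cup (-T_+)$ where $T_+$ is a closed convex cone with $\operatorname{int} T_+ \ne \emptyset$, and for $\bar x$ such that $M\bar x = q$ either $C \subseteq -\bar x + T_+$ or $C \subseteq -\bar x - T_+$.

Hence the maximal domain of quasimonotonicity is:
• $\mathbb{R}^n$ in case i);
• $S$ in case ii1), and
• $-\bar x + T_+$ or $-\bar x - T_+$ in case ii2).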

From Theorem 4 a characterization of quasimonotone (pseudomonotone) affine maps on convex cones can be derived, and further specialized to the nonnegative orthant [6].

It should be noted that in the special case $M^T = M$, case ii1) does not occur and Theorem 4 reduces to classical characterizations of generalized convex quadratic functions [7,18,19,21,22]; see also [1]. Case ii1) does not occur either if $M$ is nonsingular. Hence it arises only if $M$ is both nonsymmetric and singular.
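The quantities entering Theorem 4 can be computed directly. The following is a minimal pure-Python sketch restricted to $2 \times 2$ matrices (an assumption made to keep it self-contained; `theorem4_data` and the other helpers are hypothetical names). It returns $B$, $P$, the inertia $(n_+, n_-, n_0)$ of $B$ and $r = \dim \ker M$. For the symmetric nonsingular $M = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}$ it gives $n_- = 1$, $r = n_0$, the setting of case ii2); with $T = \{x : x^T P x \le 0\}$ as read above, $T = \{x : x_1 x_2 \ge 0\}$, i.e. $T_+ = \mathbb{R}^2_+$. For the nonsymmetric singular $M = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ it gives $r = n_0 + 1$, the only situation in which case ii1) can arise.

```python
import math

TOL = 1e-12

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def pinv_sym(B):
    """Moore-Penrose pseudo-inverse of a symmetric 2x2 matrix B."""
    d = det(B)
    if abs(d) > TOL:                  # nonsingular: ordinary inverse
        return [[B[1][1] / d, -B[0][1] / d], [-B[1][0] / d, B[0][0] / d]]
    tr = B[0][0] + B[1][1]
    if abs(tr) > TOL:                 # rank one: B = tr*u*u^T, so B^+ = B / tr**2
        return [[B[i][j] / tr ** 2 for j in range(2)] for i in range(2)]
    return [[0.0, 0.0], [0.0, 0.0]]   # B = 0

def inertia(B):
    """(n_+, n_-, n_0) for a symmetric 2x2 matrix B."""
    a, b, c = B[0][0], B[0][1], B[1][1]
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigs = (mid + rad, mid - rad)
    return (sum(e > TOL for e in eigs), sum(e < -TOL for e in eigs),
            sum(abs(e) <= TOL for e in eigs))

def rank(M):
    if abs(det(M)) > TOL:
        return 2
    return 1 if any(abs(M[i][j]) > TOL for i in range(2) for j in range(2)) else 0

def theorem4_data(M):
    """B, P, (n_+, n_-, n_0) and r = dim ker M for a 2x2 matrix M."""
    Mt = transpose(M)
    B = [[(M[i][j] + Mt[i][j]) / 2 for j in range(2)] for i in range(2)]
    MtBpM = matmul(matmul(Mt, pinv_sym(B)), M)
    P = [[MtBpM[i][j] / 2 for j in range(2)] for i in range(2)]
    return B, P, inertia(B), 2 - rank(M)

# Symmetric, nonsingular: n_- = 1, r = n_0 (setting of case ii2)); P equals M/2
_, P1, inert1, r1 = theorem4_data([[0.0, -1.0], [-1.0, 0.0]])
print(inert1, r1, P1)

# Nonsymmetric, singular: n_- = 1, r = n_0 + 1 (setting of case ii1)); P = 0
_, P2, inert2, r2 = theorem4_data([[0.0, 1.0], [0.0, 0.0]])
print(inert2, r2, P2)
```

The remaining conditions of Theorem 4 (the position of $q$, convexity of $S$, the cone $T_+$ and the location of $C$) are not checked by this sketch.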

Theorem 4 characterizes pseudomonotone affine 
maps on open convex sets. However in applications, 
e.g. in complementarity problems and variational in- 
equalities, pseudomonotonicity on closed and convex 
sets is needed. Such characterizations have very recently 
been derived in [5] with an approach different from the 
one in [6]. It involves an extension of Martos’ concept 
of positive subdefinite matrices [18] to the nonsymmet- 
ric case. Among others, [5] generalizes previous results 
on pseudomonotone matrices for linear complemen- 
tarity problems, e. g. [8]. 


The Nondifferentiable Case 


Finally, characterizations of certain nondifferentiable generalized monotone maps [16] are presented in this section.

Let $F: C \to \mathbb{R}^n$ be locally Lipschitz where $C \subseteq \mathbb{R}^n$ is open and convex. The criteria below make use of the generalized Jacobian in the sense of Clarke. Given $x \in C$, let $L(x)$ be the set of all limits of $DF(x_i)$ where $x_i \to x$, $F$ is differentiable at $x_i \in C$ and $DF(x_i)$ is the Jacobian. Define $\partial F(x)$ to be the convex hull of $L(x)$. Finally, for $x \in C$ and $v \in \mathbb{R}^n$ set

$$D^+F(x; v) = \sup\{v^T A v : A \in \partial F(x)\},$$
$$D_-F(x; v) = \inf\{v^T A v : A \in \partial F(x)\}.$$


In generalization of Theorem 2i) one has: 


Theorem 5 The locally Lipschitz map $F$ is quasimonotone on $C$ if and only if for all $x \in C$, $v \in \mathbb{R}^n$:

A′) $v^T F(x) = 0$ implies $D^+F(x; v) \ge 0$, and

B′) $v^T F(x) = 0$, $0 \in \{v^T A v : A \in \partial F(x)\}$ and $v^T F(x + \bar t v) > 0$ for some $\bar t < 0$ imply that there exists $\hat t > 0$ such that $v^T F(x + tv) \ge 0$ for all $t \in [0, \hat t]$.


In light of [4], a stronger sufficient condition can be ob- 
tained which however is no longer necessary [16], in 
contrast to the differentiable case. 


Theorem 6 The map $F$ is quasimonotone on $C$ if for all $x \in C$, $v \in \mathbb{R}^n$, $v \ne 0$:

A″) $v^T F(x) = 0$ implies $D_-F(x; v) \ge 0$, and

B″) $F(x) = 0$, $D_-F(x; v) = 0$ and $v^T F(x + \bar t v) > 0$ for some $\bar t < 0$ imply that there exists $\hat t > 0$ such that $v^T F(x + tv) \ge 0$ for all $t \in [0, \hat t]$.
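A hedged sketch of the quantities $D^+F(x; v)$ and $D_-F(x; v)$ for the hypothetical locally Lipschitz map $F(x) = (|x_1|, 1)$ on $\mathbb{R}^2$. Its generalized Jacobian at $x$ is the convex hull of finitely many limiting Jacobians, and because $v^T A v$ is linear in $A$, the supremum and infimum over the hull are attained at those generators. At $x = (0, 0)$ one gets $D^+ = 1 \ge 0$ (A′) holds) but $D_- = -1$ (A″) fails), showing that the sufficient condition of Theorem 6 is stronger. At $x = (-1, 0)$, A′) fails, so by Theorem 5 this $F$ is not quasimonotone.

```python
def quad(A, v):
    """v^T A v for a 2x2 matrix A."""
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def F(x):
    """Hypothetical locally Lipschitz map F(x) = (|x1|, 1)."""
    return (abs(x[0]), 1.0)

def limiting_jacobians(x):
    """Generators of the Clarke generalized Jacobian of F at x."""
    plus = ((1.0, 0.0), (0.0, 0.0))
    minus = ((-1.0, 0.0), (0.0, 0.0))
    if x[0] > 0:
        return [plus]
    if x[0] < 0:
        return [minus]
    return [plus, minus]  # kink along x1 = 0

def d_bounds(x):
    """(D_-F(x; v), D^+F(x; v)) for a representative v != 0 with v^T F(x) = 0 (n = 2)."""
    f1, f2 = F(x)
    v = (-f2, f1)
    values = [quad(A, v) for A in limiting_jacobians(x)]
    return min(values), max(values)

print(d_bounds((0.0, 0.0)))   # (-1.0, 1.0): A') holds at x, A'') fails
print(d_bounds((-1.0, 0.0)))  # (-1.0, -1.0): A') fails
```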

In analogy to the differentiable case (see Theorem 2), corresponding characterizations can be obtained for pseudomonotone maps, replacing B′) and B″) by a stronger condition. Furthermore, criteria for strict pseudomonotonicity are derived in [16].

Very recently, generalized monotonicity criteria for 
locally Lipschitz maps have been extended to the class 
of general continuous maps [12]. In this study Clarke’s 
generalized Jacobian is replaced by an ‘approximate Ja- 
cobian’. 


Conclusion 


In this survey we have presented various characterizations of generalized monotone maps. Details are shown mainly for quasimonotone and pseudomonotone maps. In retrospect, it becomes clear how the main characterization in the differentiable case (Theorem 2) specializes in the affine case (Theorems 3, 4) and how it can be extended to the nondifferentiable case (Theorem 5).


See also 


▸ Fejér Monotonicity in Convex Optimization

▸ Generalized Monotone Multivalued Maps

▸ Generalized Monotonicity: Applications to Variational Inequalities and Equilibrium Problems

▸ Set-valued Optimization
This article on generalized monotone maps focuses on some of their uses in variational inequalities and equilibrium problems. Definitions and properties of various types of generalized monotone maps are found in ▸ Generalized monotone single valued maps and ▸ Generalized monotone multivalued maps. These articles form the background of the present survey.

Variational inequalities appear in various forms and arise in a wide range of problems in the natural and social sciences; see, e.g., [22]. The simplest variational inequality problem (VIP) is the following: given a nonempty closed convex subset $K$ of $\mathbb{R}^n$ and a map $F: K \to \mathbb{R}^n$, find an element $x_0 \in K$ such that

$$F(x_0)^T (x - x_0) \ge 0 \quad \text{for all } x \in K. \qquad (1)$$


The prime example of a variational inequality stems from a minimization problem. Given a differentiable function $f: K \to \mathbb{R}$, if $x_0 \in K$ minimizes $f$, then $x_0$ is a solution of the VIP (1) with $F = \nabla f$.
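A small worked check of this correspondence, assuming a box $K = [0, 1]^2$ and the hypothetical objective $f(x) = (x_1 - 2)^2 + (x_2 + 1)^2$, whose minimizer over $K$ is $x_0 = (1, 0)$. Because $x \mapsto F(x_0)^T (x - x_0)$ is linear, its minimum over a box is attained at a vertex, so checking the $2^n$ vertices decides (1) exactly for this kind of $K$. The helper `solves_vip_on_box` is a hypothetical name used only for illustration.

```python
from itertools import product

def grad_f(x):
    """Gradient of the hypothetical f(x) = (x1 - 2)**2 + (x2 + 1)**2."""
    return (2 * (x[0] - 2), 2 * (x[1] + 1))

def solves_vip_on_box(F, x0, lower, upper):
    """Check (1) for the box K = [lower_1, upper_1] x ... x [lower_n, upper_n].

    The map x -> F(x0)^T (x - x0) is linear, so its minimum over K is attained
    at a vertex; checking all vertices decides (1) exactly.
    """
    g = F(x0)
    vertices = product(*zip(lower, upper))
    return all(sum(gi * (vi - xi) for gi, vi, xi in zip(g, v, x0)) >= 0
               for v in vertices)

print(solves_vip_on_box(grad_f, (1.0, 0.0), (0.0, 0.0), (1.0, 1.0)))  # True (minimizer)
print(solves_vip_on_box(grad_f, (0.5, 0.5), (0.0, 0.0), (1.0, 1.0)))  # False
```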

As shown by P. Hartman and G. Stampacchia
[17], (1) has a solution if K is compact and F is 
continuous. This result found many applications and 
holds also, with the same assumptions, in infinite- 
dimensional Banach spaces (cf. [26, Prop. 77.8]). How- 
ever, in infinite-dimensional problems this form of the 
theorem is not useful. The reason for this is that in 
almost all interesting applications the assumptions of 
(strong) compactness of the set K and of continuity of 
the operator F are too strong to be met. A decisive step 
forward was made by F. Browder who relaxed both as- 
sumptions, at the cost of imposing another assumption, 
namely monotonicity [7]. Specifically, let X be a real Ba- 
nach space with dual X*, and K a nonempty, weakly 
compact and convex subset of $X$. Given an operator $T: K \to X^*$, consider the following VIP: find $x_0 \in K$ such that

$$\langle T x_0, x - x_0 \rangle \ge 0 \quad \text{for all } x \in K, \qquad (2)$$

where $\langle \cdot, \cdot \rangle$ is the duality pairing between $X^*$ and $X$. As shown by Browder, the VIP (2) has a solution if $T$ is hemicontinuous and monotone. We recall that an operator $T$ is called hemicontinuous if its restriction to line segments is continuous when $X^*$ is equipped with the $w^*$-topology. The operator is called monotone if for all $x, y \in K$ one has

$$\langle Ty - Tx, y - x \rangle \ge 0.$$


It is interesting to note that in the standard example of a variational inequality problem where $X = \mathbb{R}^n$ and $T$ is the gradient of a function $f: K \to \mathbb{R}$, the operator $T$ is monotone if and only if $f$ is convex.
This shows that monotonicity is a natural assumption 
for VIP. But it also shows that it may be too rigid 
in many applications. This led to the consideration of 
variational inequality problems and their extensions 
with generalized monotone operators. The first to con- 
sider generalized monotonicity in connection with vari- 
ational inequalities was H. Brezis [5]. Then S. Kara- 
mardian [19], coming from convex and generalized 
convex optimization [1], began a tradition of intro- 
ducing concepts of generalized monotonicity which, 
unlike the one of Brezis, preserve the connection be- 
tween monotonicity and convexity. They ensure that 
in case of a gradient map, the gradient is general- 
ized monotone (for instance, pseudomonotone, strictly 
pseudomonotone, quasimonotone, strictly quasimono- 
tone, semistrictly quasimonotone) if the underlying 
function is generalized convex (i.e., respectively, pseu- 
doconvex, strictly pseudoconvex, quasiconvex, strictly 
quasiconvex, semistrictly quasiconvex [1]). For definitions and properties of these concepts see ▸ Generalized monotone single valued maps and ▸ Generalized monotone multivalued maps for single- and multivalued generalized monotone maps, respectively.

In the next section, results on the existence of solu- 
tions for the variational inequality problem with gen- 
eralized monotone operators are presented. A general- 
ization of these results to vector valued variational in- 
equality problems is given in the third section. Finally, 
the last section surveys results on the existence of solu- 
tions for equilibrium problems, both in the scalar and 
in the vector case. To begin, consider the following no- 
tation and definitions. 

Let $X$ be a real Banach space. Given $x, y \in X$, $]x, y[$ and $[x, y]$ denote the open line segment and the closed line segment joining $x$ and $y$, respectively; the segments $]x, y]$ and $[x, y[$ are defined analogously. A multivalued operator $T: K \to 2^{X^*} \setminus \{\emptyset\}$ is called upper hemicontinuous if for all $x, y \in K$, the restriction of $T$ to $[x, y]$ is upper semicontinuous with respect to the $w^*$-topology on $X^*$.

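For any nonempty subset $D$ of $X$, a point $x_0 \in X$ is called an inner point of $D$ [14,25] if for all $u \in X^*$ the following implication holds:

$$\langle x, u \rangle \le \langle x_0, u \rangle \;\; \forall x \in D \;\Longrightarrow\; \langle x, u \rangle = \langle x_0, u \rangle \;\; \forall x \in D.$$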


The set of inner points of D is denoted by inn D. The 
concept of an inner point is a generalization of the con- 
cept of a relative algebraic interior point. Indeed, in case 
X is finite dimensional, the two concepts coincide. In 
the general case, any relative algebraic interior point 
is an inner point; in case of a closed convex set, inner 
points have the following properties [14,25]: 


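Theorem 1 Let $K \ne \emptyset$ be a closed and convex subset of $X$. Then one has:

i) $\operatorname{inn} K \subseteq K$;

ii) if $K$ is separable, then $\operatorname{inn} K \ne \emptyset$;

iii) if $x_1 \in K$, $x_0 \in \operatorname{inn} K$, then $]x_1, x_0] \subseteq \operatorname{inn} K$; in particular, $\operatorname{inn} K$ is convex.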


There are many important examples of closed convex 
subsets K which contain inner points, without contain- 
ing any relative algebraic interior points [14]. 


Scalar Variational Inequalities 


Let $X$ be a real Banach space, and $K$ a nonempty, closed, convex subset of $X$. Let further $T: K \to 2^{X^*} \setminus \{\emptyset\}$ be a multivalued operator with nonempty values. The VIP for such an operator is the following: find $x_0 \in K$ such that

$$\forall x \in K \;\; \exists x^* \in T x_0: \langle x^*, x - x_0 \rangle \ge 0. \qquad (3)$$


This problem is closely related to the so-called dual variational inequality problem (DVIP), which is the following: find $x_0 \in K$ such that

$$\forall x \in K \;\; \forall x^* \in T x: \langle x^*, x - x_0 \rangle \ge 0. \qquad (4)$$


Indeed, it is well known that, if xo is a solution of DVIP, 
then it is also a solution of VIP, provided that T is up- 
per hemicontinuous [20]. For this reason, most proofs 
of existence of a solution for VIP establish first the ex- 
istence of a solution of DVIP. 

R.W. Cottle and J.C. Yao [8] were the first to 
show an existence result for a solution of a VIP with 
a single valued pseudomonotone operator, thereby extending Karamardian’s result [19] for complementarity problems in finite-dimensional spaces. Later, Yao [24] generalized this result to multivalued pseudomonotone operators; I.V. Konnov [20] generalized it further to include semistrictly quasimonotone operators; see ▸ Generalized monotone multivalued maps. The most general result in this direction, requiring no assumptions other than coercivity, was derived for properly quasimonotone operators [12]. The operator $T$ is called properly quasimonotone if for all $x_1, \dots, x_n \in K$ and all $y \in \operatorname{co}\{x_1, \dots, x_n\}$ there exists $i \in \{1, \dots, n\}$ such that $\langle x^*, y - x_i \rangle \le 0$ for all $x^* \in T x_i$. The name of this property is
justified by the fact that a lower semicontinuous func- 
tion f: K — R is quasiconvex if and only if its Clarke- 
Rockafellar subdifferential is properly quasimonotone 
[12]. For such operators, the following theorem holds 
[11]: 


Theorem 2 Let $T: K \to 2^{X^*} \setminus \{\emptyset\}$ be a properly quasimonotone operator. Suppose that $K$ is weakly compact, or alternatively that the following coercivity condition holds: there exist a weakly compact subset $W$ of $K$ and $x_0 \in W$ such that

$$\forall x \in K \setminus W \;\; \exists x_0^* \in T x_0: \langle x_0^*, x_0 - x \rangle < 0. \qquad (5)$$

Then the DVIP (4) has a solution. Consequently, if $T$ is upper hemicontinuous, then the VIP (3) also has a solution.


A semistrictly quasimonotone operator (or, a fortiori, 
a pseudomonotone operator) is properly quasimono- 
tone [11]. Thus, the above result generalizes the corre- 
sponding results in [20] and [24]. 

For the still more general case of a quasimonotone 
operator, even for single valued operators, one needs 
a mild assumption on the domain [14]. For multivalued 
operators one needs still stronger assumptions [9]: 


Theorem 3 Let $T: K \to 2^{X^*} \setminus \{\emptyset\}$ be a quasimonotone operator. Suppose that:

a) $K$ is weakly compact, or alternatively the coercivity condition (5) holds;

b) $\operatorname{inn} K \ne \emptyset$;

c) $T$ has compact values;

d) $T$ is upper hemicontinuous.

Then the VIP (3) has a solution.


Vector Variational Inequalities 


The VIP has been generalized in various ways. One 
of these generalizations proposed by F. Giannessi [13] 
suggests considering the variational inequality in a mul-
tidimensional space rather than the real number field. 
This is the so-called vector variational inequality prob- 
lem (VVIP). The VVIP is closely related, just as its 
scalar counterpart, to the least element problem and the 
complementarity problem [23]. 

In the VVIP, apart from the Banach space X and its 
closed, convex subset K, one considers a Banach space 
Y and the space L(X, Y) of all continuous linear oper- 
ators from X to Y. The space Y is ordered by a cone C. 
In this case, the expression ‘the element $x \in Y$ is nonnegative’ can have two different meanings: either $x \in C$ or $x \notin -\operatorname{int} C$. The applicability, especially to economics, increases further without much additional effort if one allows this cone to ‘move’; thus, instead of a cone one considers a multivalued mapping $C: K \to 2^Y$ such that for each $x \in K$, $C(x)$ is a closed convex cone with nonempty interior. Let further $T: K \to 2^{L(X,Y)} \setminus \{\emptyset\}$ be a multivalued operator. The VVIP is the following: find $x_0 \in K$ such that

$$\forall y \in K \;\; \exists A \in T x_0: A(y - x_0) \notin -\operatorname{int} C(x_0). \qquad (6)$$


In the scalar case $Y = \mathbb{R}$, $C(x) = \mathbb{R}_+$, one has $L(X, Y) = X^*$, and VVIP becomes VIP. In the general case, monotonicity and generalized monotonicity have to be newly defined. The operator $T$ is called:
• monotone if for all $x, y \in K$ one has:
$$\forall A \in Tx \;\; \forall B \in Ty: (B - A)(y - x) \in C(x);$$
• pseudomonotone if for all $x, y \in K$ the following implication holds:
$$\exists A \in Tx: A(y - x) \notin -\operatorname{int} C(x) \;\Longrightarrow\; \forall B \in Ty: B(y - x) \notin -\operatorname{int} C(x);$$
• quasimonotone if for all $x, y \in K$ the following implication holds:
$$\exists A \in Tx: A(y - x) \notin -C(x) \;\Longrightarrow\; \forall B \in Ty: B(y - x) \notin -\operatorname{int} C(x).$$


We now recall some topological notions. The strong 
operator topology (SOT) on L(X, Y) is the weakest 
topology such that for each $x \in X$, the function $L(X, Y) \ni A \mapsto Ax \in Y$ is continuous. An operator $A \in L(X, Y)$
is called completely continuous if it maps weakly con- 
vergent sequences into strongly convergent sequences. 
Examples of completely continuous operators are com- 
pact operators. The following result proved in [10] gen- 
eralizes many existence results in the literature as well 
as Theorem 3: 


Theorem 4 Suppose that the following assumptions hold:

i) the operator $T$ is upper hemicontinuous with respect to the SOT;

ii) the graph of the multifunction
$$x \mapsto Y \setminus (-\operatorname{int} C(x))$$
is sequentially closed in $X \times Y$ in the (weak) × (strong) topology;

iii) $K$ is weakly compact;

iv) for each $x \in K$, $Tx$ consists of completely continuous operators;

v) $T$ is pseudomonotone, or

v′) $T$ is quasimonotone, its values are norm compact and $\operatorname{inn} K \ne \emptyset$.

Then the VVIP (6) has a solution.


As in the scalar case, the assumption ‘K is weakly compact’ may be replaced by a coercivity condition.


Equilibrium Problems 


The remainder of this article deals with problems more 
general than VIP. Given a nonempty set $K$ and a bifunction $f: K \times K \to \mathbb{R}$, the equilibrium problem (EP) [4,6] for $f$ is the following: find $x_0 \in K$ such that

$$f(x_0, y) \ge 0 \quad \text{for all } y \in K. \qquad (7)$$


A great variety of problems can be formulated as 
an EP including problems of optimization, saddle point 
theory, game theory, fixed point theory and VIP [4]. For 
instance, if $K$ is a nonempty closed, convex subset of a Banach space $X$ and $T: K \to 2^{X^*} \setminus \{\emptyset\}$ is a multivalued operator with weakly compact values, let $f$ be defined as

$$f(x, y) = \max\{\langle x^*, y - x \rangle : x^* \in Tx\}. \qquad (8)$$


It is easy to see that xo € K is a solution of the EP 
(7) if and only if it is a solution of the VIP (2). Because 
of this correspondence, one is led to define concepts of 
generalized monotonicity for bifunctions. A bifunction 
f is called: 
e monotone [4] if for all x, y € K one has: 


f(xy) + fly. x) <0; 


e pseudomonotone [3] if for all x, y € K the following 
implication holds: 


f(x,y) =0 > fly,x) <9; 


e quasimonotone [3] if for all x, y € K the following 
implication holds: 


f(x,y) >0 => fly.x) <0. 
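For a single-valued operator on an interval $K \subset \mathbb{R}$, relation (8) reduces to $f(x, y) = T(x)(y - x)$, which makes the three definitions easy to probe numerically. The sketch below is only a sampled check on a finite grid; the operators $T(x) = 1/(1+x)$ and $T(x) = x^2$ and the grids are illustrative choices, not taken from the text. A reported violation is a genuine counterexample, while a pass is evidence rather than a proof.

```python
from fractions import Fraction
from itertools import product

def is_monotone(f, pts):
    """Sampled test of f(x, y) + f(y, x) <= 0 over all pairs of sample points."""
    return all(f(x, y) + f(y, x) <= 0 for x, y in product(pts, repeat=2))

def is_pseudomonotone(f, pts):
    """Sampled test of the implication f(x, y) >= 0  =>  f(y, x) <= 0."""
    return all(f(y, x) <= 0 for x, y in product(pts, repeat=2) if f(x, y) >= 0)

def is_quasimonotone(f, pts):
    """Sampled test of the implication f(x, y) > 0  =>  f(y, x) <= 0."""
    return all(f(y, x) <= 0 for x, y in product(pts, repeat=2) if f(x, y) > 0)

# Bifunction (8) for the single-valued operator T(x) = 1/(1 + x) on K = [0, 2].
def f_decreasing(x, y):
    return (y - x) / (1 + x)

# Bifunction (8) for the single-valued operator T(x) = x**2 on K = [-1, 1].
def f_square(x, y):
    return x * x * (y - x)

K1 = [Fraction(k, 4) for k in range(0, 9)]    # exact grid on [0, 2]
K2 = [Fraction(k, 4) for k in range(-4, 5)]   # exact grid on [-1, 1]

for name, f, K in [("T(x)=1/(1+x)", f_decreasing, K1), ("T(x)=x^2", f_square, K2)]:
    print(name, is_monotone(f, K), is_pseudomonotone(f, K), is_quasimonotone(f, K))
```

On these grids the first bifunction is pseudomonotone and quasimonotone but not monotone (its operator is decreasing), and the second is quasimonotone but not pseudomonotone: $f(0, -1) = 0$ while $f(-1, 0) = 1 > 0$.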


It is easy to see that a multivalued operator is monotone (respectively, pseudomonotone, quasimonotone)
if and only if the bifunction defined by relation (8) 
is monotone (respectively, pseudomonotone, quasi- 
monotone). Equilibrium problems with generalized 
monotone bifunctions in the above sense were consid- 
ered in [3]. There the following result was proved: 


Theorem 5 Let $X$ be a real topological Hausdorff vector space and $K \subset X$ be nonempty, convex and closed. Let further $f\colon K \times K \to \mathbb{R}$ be a bifunction such that $f(x, x) = 0$ for all $x \in K$. Consider the following assumptions:

i) $f(\cdot, y)$ is hemicontinuous (i.e., continuous on every line segment in $K$) for all $y \in K$;

ii) $f(x, \cdot)$ is semistrictly quasiconvex [1] and lower semicontinuous for all $x \in K$;

iii) there exist a compact subset $B \subset X$ and $y_0 \in B \cap K$ such that $f(x, y_0) < 0$ for all $x \in K \setminus B$ (coercivity);

iv) for all $x \in K$, if $f(x, y) = 0$ and $f(x, y_1) > 0$, then $f(x, z) > 0$ for all $z \in\, ]y, y_1]$;

v) the algebraic interior of $K$ is nonempty.

If $f$ is pseudomonotone and assumptions i)-iii) hold, then the EP (7) has a solution. Likewise, if $f$ is quasimonotone and all assumptions i)-v) hold, then (7) has a solution.


The above theorem generalizes older results by H. Brezis, L. Nirenberg and G. Stampacchia [6] and is related to more recent results with monotone bifunctions by E. Blum and W. Oettli [4].

Just like vector variational inequalities, vector equilibrium problems have also been considered where the bifunction takes values in a locally convex vector space ordered by a cone [2]. As shown by Oettli [21] for the pseudomonotone case, vector equilibrium problems can also be treated by considering two real-valued bifunctions rather than one vector-valued one. Oettli’s approach can even be applied to the quasimonotone case [15]. For this, let $X$ be a real Hausdorff topological vector space, $K \subset X$ be nonempty and convex, and $f, g\colon K \times K \to \mathbb{R}$ be two bifunctions. The bifunction $f$ is said to be pseudomonotone with respect to the bifunction $g$ [21] if for all $(x, y) \in K \times K$ the following implication holds:

$$ f(x, y) \ge 0 \;\Rightarrow\; g(y, x) \le 0. $$

The bifunctions $f, g$ are said to be a quasimonotone pair [15] if for all $(x, y) \in K \times K$ the following implication holds:

$$ f(x, y) > 0 \;\Rightarrow\; g(y, x) \le 0. $$

If $f = g$, then the above definitions reduce to those of pseudomonotone and quasimonotone bifunctions, respectively.

The following rather technical, but very useful result 
was proved in [21] for the pseudomonotone case and in 
[15] for the quasimonotone case: 


Theorem 6 Consider the following assumptions:

i) $f(x, x) \ge 0$ for all $x \in K$;

ii) the set $\sigma(y) = \{x \in K : g(y, x) \le 0\}$ is closed in $K$ for all $y \in K$;

iii) for all $x, y, z \in K$, if $f(x, y) < 0$ and $f(x, z) < 0$, then $f(x, u) < 0$ for all $u \in\, ]y, z[$;

iv) there exist a compact subset $D$ of $K$ and $y^* \in D$ such that for all $x \in K \setminus D$ one has $f(x, y^*) < 0$;

v) the set $\{u \in [x, z] : g(u, y) \le 0\}$ is closed for all $x, z \in K$;

vi) the relative algebraic interior of $K$ is nonempty.

Suppose that $f$ is pseudomonotone with respect to $g$ and assumptions i)-iv) hold, or that the bifunctions $f, g$ are a quasimonotone pair and all assumptions i)-vi) hold. Then at least one of the following problems has a solution $x_0 \in K$:

$$ f(x_0, y) \ge 0 \quad \text{for all } y \in K, $$
$$ g(y, x_0) \le 0 \quad \text{for all } y \in K. $$


By choosing the bifunctions $f$ and $g$ appropriately, a variety of results can be produced. For instance, let $X$ and $K$ be as before, and let $Z$ be a real Hausdorff locally convex space. Finally, let $C \subset Z$ be a proper, convex, closed cone with nonempty interior $\operatorname{int} C$. Define the relations $\le$, $<$, $\not\le$ and $\not<$ on $Z$ by

$$ x \le y \iff y - x \in C; \qquad x < y \iff y - x \in \operatorname{int} C; $$
$$ x \not\le y \iff y - x \notin C; \qquad x \not< y \iff y - x \notin \operatorname{int} C. $$

Given a bifunction $H\colon K \times K \to Z$, consider the vector equilibrium problem (VEP): find $x_0 \in K$ such that

$$ H(x_0, y) \not< 0 \quad \text{for all } y \in K. \tag{9} $$


Theorem 6 can now be applied to show the existence of a solution for VEP. This is done as follows. Since the cone $C$ has a nonempty interior by assumption, the dual cone $C^*$ has a w*-compact base $B$. (Recall that a (closed) base $B$ of a cone $W$ is a convex subset of $W$ such that $0 \notin B$ and $W = \bigcup_{t \ge 0} tB$.) Define the real-valued bifunctions $f$ and $g$ on $K \times K$ as follows:

$$ f(x, y) = \max_{\varphi \in B} \varphi(H(x, y)), \qquad g(x, y) = \min_{\varphi \in B} \varphi(H(x, y)). $$
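To see why these scalarizations capture the vector relations, consider the illustrative case $Z = \mathbb{R}^2$ with $C = \mathbb{R}^2_+$ (an assumption made only for this sketch). Then $C^* = \mathbb{R}^2_+$ and $B = \{(t, 1 - t) : 0 \le t \le 1\}$ is a compact base; since a linear function attains its extrema over $B$ at $t = 0$ or $t = 1$, a grid containing both endpoints gives exact values. The sketch checks that $f(x, y) \ge 0$ exactly when $H(x, y) \not< 0$, and $g(x, y) \le 0$ exactly when $H(x, y) \notin \operatorname{int} C$; here `h` stands for a value $H(x, y)$.

```python
from fractions import Fraction

# Illustrative assumption: Z = R^2, C = R^2_+, so C* = R^2_+ and
# B = {(t, 1 - t) : 0 <= t <= 1} is a compact base of C*.  A linear
# function attains its max/min over B at t = 0 or t = 1, so this grid
# (which contains both endpoints) yields the exact values.
BASE = [(Fraction(k, 10), 1 - Fraction(k, 10)) for k in range(11)]

def f_of(h):
    """f(x, y) = max over phi in B of phi(H(x, y)), with h = H(x, y)."""
    return max(p * h[0] + s * h[1] for p, s in BASE)

def g_of(h):
    """g(x, y) = min over phi in B of phi(H(x, y))."""
    return min(p * h[0] + s * h[1] for p, s in BASE)

def not_less_than_zero(h):
    """h 'not < 0', i.e. h is not in -int C (components not both negative)."""
    return not (h[0] < 0 and h[1] < 0)

def not_greater_than_zero(h):
    """h 'not > 0', i.e. h is not in int C (components not both positive)."""
    return not (h[0] > 0 and h[1] > 0)

for h in [(1, -3), (-1, -2), (0, -5), (2, 1)]:
    print(h, f_of(h) >= 0, not_less_than_zero(h), g_of(h) <= 0, not_greater_than_zero(h))
```

In each printed row the second and third entries agree, as do the fourth and fifth.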


Applying Theorem 6 to these bifunctions, one ar- 
rives at the following result [15]: 


Theorem 7 Suppose that the bifunction $H$ satisfies the following assumptions, for all $x, y, z$ in $K$:

i) $H(x, x) \not< 0$;

ii) the set $\{x \in K : H(y, x) \not> 0\}$ is closed in $K$;

iii) if $H(x, y) < 0$ and $H(x, z) < 0$, then $H(x, u) < 0$ for all $u \in\, ]y, z[$;

iv) the set $\{u \in\, ]x, z[\, : H(u, y) \not> 0\}$ is closed;

v) there exist a compact subset $D$ of $K$ and $y^* \in D$ such that for all $x \in K \setminus D$ we have $H(x, y^*) < 0$ (coercivity);

vi) $H(x, y) > 0 \Rightarrow H(y, x) < 0$ (quasimonotonicity of $H$);

vii) if $H(u, y) < 0$ for some $u \in\, ]x, y[$, then $H(u, x) > 0$.

Then the VEP (9) has a solution.


The above result considerably strengthens a corre- 
sponding result in [2]. 

As another example of using Theorem 6, consider the Banach spaces $X$, $Y$, the multivalued operator $T$ and the cone-valued map $C$ as in the previous section on VVIP. For each $x \in K$, choose a w*-compact base $B(x)$ of the dual cone $C^*(x)$. Now define the bifunctions $f$ and $g$ as follows:

$$ f(x, y) = \max_{A \in Tx,\ \varphi \in B(x)} \varphi(A(y - x)), \qquad g(x, y) = \min_{A \in Tx,\ \varphi \in B(x)} \varphi(A(y - x)). $$

Then, applying Theorem 6, one can show Theorem 
4 as a corollary, for a set K with nonempty relative alge- 
braic interior. Other variants of Theorem 4 can also be 
deduced [15]. 

In conclusion, this article demonstrates that gener- 
alized monotonicity rather than monotonicity is suf- 
ficient to establish the existence of solutions for VIP, 
VVIP, EP and VEP. A more extensive survey can be 
found in [16]. 

Finally, the reader interested in recent results on the 
relevance of generalized monotone VIP for the general 
economic equilibrium is referred to [18]. 
A network is composed of two types of entities: arcs and nodes. The nodes represent locations or terminals, and the arcs represent one-way links connecting pairs of nodes. The arc $(i, j)$ links node $i$ to node $j$ and the flow is from $i$ to $j$. The structure of a network can be displayed by a drawing, as illustrated in Fig. 1. The structure of a network may also be represented by a node-arc incidence matrix $A$, where $A_{ik}$ is $1$ if arc $k$ is directed away from node $i$, $A_{ik}$ is $-1$ if arc $k$ is directed toward node $i$, and $A_{ik}$ is $0$ otherwise. Any matrix $A$ in which each column has exactly two nonzero entries, a $+1$ and a $-1$, is called a node-arc incidence matrix. The minimum cost network flow problem is a linear program, say

$$ \min\{c^T x : Ax = b,\ l \le x \le u\}, $$

where $A$ is a node-arc incidence matrix. The generalized network problem, as its name implies, is a generalization of the minimum cost network flow problem, also referred to as the pure network problem.

Generalized Networks, Figure 1
Example network with nodes 1, 2, 3, 4 and arcs (1, 2), (1, 3), (2, 3), (2, 4), (3, 2), (3, 4)
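As an illustration, the node-arc incidence matrix of the network in Fig. 1 can be built directly from its arc list. The sketch assumes nodes are numbered from 1 and arcs are indexed in the order listed in the figure caption; the function name is illustrative.

```python
def incidence_matrix(num_nodes, arcs):
    """A[i][k] = 1 if arc k leaves node i+1, -1 if it enters node i+1, else 0."""
    A = [[0] * len(arcs) for _ in range(num_nodes)]
    for k, (i, j) in enumerate(arcs):
        A[i - 1][k] = 1     # arc k is directed away from node i
        A[j - 1][k] = -1    # arc k is directed toward node j
    return A

# Arcs of the network in Fig. 1, in the order of the caption.
ARCS = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 2), (3, 4)]
A = incidence_matrix(4, ARCS)
for node, row in enumerate(A, start=1):
    print(node, row)

# Each column has exactly one +1 and one -1, so every column sum vanishes.
assert all(sum(A[i][k] for i in range(4)) == 0 for k in range(len(ARCS)))
```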

Let f denote the flow in arc (i, j) in a pure network. 
A characteristic of this model is that the f units which 
depart node i must arrive at node j. Many real applica- 
tions violate this assumption. In a pipeline distribution 
network, liquid or gas will be lost due to leakage. In 
a network carrying a perishable commodity, a certain 
percentage of the commodity will be lost as it moves 
along the arcs. For these cases, flow may be lost as it tra- 
verses certain arcs. However, if an arc represents hold- 
ing money in a savings account over a period of time, 
the value at the end of the period will equal the initial 
investment plus the interest earned. An arc in a gener- 
alized network permits flow to increase, decrease, or re- 
main the same as it traverses the arc. This is illustrated 
in Fig. 2 for the arc (i, j). Each end of the arc has a con- 
stant (multiplier) associated with it, which determines 
the gain or loss during traversal. For the pure network 
arc, the +1 and —1 correspond to the coefficients in the 
node-arc incidence matrix. 

Generalized network models are also used to change units in a flow model. The arcs illustrated in Fig. 3 convert from US dollars to pound sterling, and from pound sterling to French francs. That is, dollars which depart New York are converted to pounds when they arrive at London. Pounds leaving London are converted to francs when they arrive at Paris. This is also useful to convert from machine-hours to units of finished parts, or from pallets to truck loads.

Generalized Networks, Figure 2
Different types of generalized network arcs (pure network arc: all flow leaving i arrives at j; generalized network arc with loss: only 90% of flow leaving i arrives at j; generalized network arc with gain: each unit of flow leaving i is doubled when it arrives at j)

Generalized Networks, Figure 3
Generalized network arcs to convert currency (from US $ to pound sterling; from pound sterling to French francs)

Generalized Networks, Figure 4
Sample generalized network


In its most general form, the generalized network problem is a linear program with the special feature that each column of the constraint matrix has at most two nonzero entries. Let $G$ be an $m \times n$ matrix with full row rank having this feature. Let $c$, $l$, and $u$ be $n$-component vectors, and $r$ an $m$-component vector. Let $Y = \{x : Gx = r,\ l \le x \le u\}$, and assume that $Y \neq \emptyset$. The generalized network problem is to find an $n$-component vector $x^*$ such that $cx^* = \min_x \{cx : Gx = r,\ l \le x \le u\}$. For the generalized network model illustrated in Fig. 4, $G$ is

nodes\arcs    1     2     3     4     5
1             1     1     0     0     0
2            -2     0    -1    0.5    0
3             0    -1    -1     0     2
4             0     0     0     1    -1


For each arc, an arbitrary orientation has been as- 
signed so that an arc is defined by the following tuple: 
(from node, to node, from-node multiplier, to-node 
multiplier, cost, lower bound, upper bound). 
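The tuple description makes $G$ easy to assemble. The sketch below keeps only the first four fields of each tuple (the costs and bounds of the Fig. 4 model are not given in the text) and reads the multipliers off the matrix $G$ for Fig. 4; the helper names are illustrative. It also checks the defining "at most two nonzeros per column" feature and computes the node balances $Gx$.

```python
from fractions import Fraction

def build_G(num_nodes, arcs):
    """Arcs are (from_node, to_node, from_mult, to_mult), nodes numbered from 1.
    Column k of G holds from_mult in the from-node row and to_mult in the to-node row."""
    G = [[0] * len(arcs) for _ in range(num_nodes)]
    for k, (i, j, a, b) in enumerate(arcs):
        G[i - 1][k] = a
        G[j - 1][k] = b
    return G

def at_most_two_nonzeros(G):
    """The defining feature of a generalized network constraint matrix."""
    return all(sum(1 for row in G if row[k] != 0) <= 2 for k in range(len(G[0])))

def node_balances(G, x):
    """Gx: net multiplied flow at each node for the arc flows x."""
    return [sum(g * xk for g, xk in zip(row, x)) for row in G]

# Multipliers read off the matrix G for Fig. 4 (costs and bounds omitted).
ARCS_FIG4 = [(1, 2, 1, -2), (1, 3, 1, -1), (2, 3, -1, -1),
             (2, 4, Fraction(1, 2), 1), (3, 4, 2, -1)]
G = build_G(4, ARCS_FIG4)
print(at_most_two_nonzeros(G))   # True
print(node_balances(G, [1] * 5))
```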

Some authors and computer codes require that the from-node multiplier be 1. The above model can be converted to this form via the variable substitution $\bar{x}_k = a_k x_k$ for $k = 1, \dots, n$, where $a_k$ is the from-node multiplier for arc $k$. However, this restriction causes some difficulty if the generalized network solver is ever adapted to solve the integer generalized network model. The code developed by J.L. Kennington and R.A. Mohamed [8] (RAMSES) allows for arbitrary multipliers on both ends of each arc. Other authors assume that the lower bounds are all zero. The above model can be converted to this form via the variable substitution $\bar{x}_k = x_k - l_k$ for $k = 1, \dots, n$, where $l_k$ is the lower bound for arc $k$.
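Both substitutions are mechanical. The sketch below (function names are illustrative) spells out how they change the data: scaling by $a_k$ divides the to-node multiplier and the cost by $a_k$ and, when $a_k < 0$, swaps the bounds; shifting by $l_k$ moves $Gl$ to the right-hand side, replaces $u$ by $u - l$, and adds the constant $c \cdot l$ to the objective.

```python
from fractions import Fraction

def unit_from_multiplier(a, b, cost, lo, hi):
    """Substitute xbar = a * x for an arc with from-node multiplier a != 0 and
    to-node multiplier b. Returns (from_mult, to_mult, cost, lower, upper) for xbar."""
    a = Fraction(a)
    lo_bar, hi_bar = a * lo, a * hi
    if a < 0:                       # a negative multiplier swaps the bounds
        lo_bar, hi_bar = hi_bar, lo_bar
    return 1, b / a, cost / a, lo_bar, hi_bar

def zero_lower_bounds(G, r, c, lo, hi):
    """Substitute xbar = x - l. Returns the new right-hand side r - G l, the new
    upper bounds u - l, and the constant c.l that is added to the objective."""
    r_bar = [ri - sum(g * lk for g, lk in zip(row, lo)) for row, ri in zip(G, r)]
    u_bar = [uk - lk for uk, lk in zip(hi, lo)]
    const = sum(ck * lk for ck, lk in zip(c, lo))
    return r_bar, u_bar, const

print(unit_from_multiplier(-2, 3, 4, 1, 5))
print(zero_lower_bounds([[1, 1], [-2, 0]], [5, -4], [3, 1], [1, 2], [4, 6]))  # ([2, -2], [3, 4], 5)
```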

Many of the computer codes that have been devel- 
oped for the generalized network problem are special- 
izations of the primal simplex algorithm. These special- 
izations exploit the graphical structure of the basis and 
solve systems of equations on a graph rather than with 
standard matrix operations. Let $B$ be a nonsingular $m \times m$ submatrix of $G$, and $N$ be the submatrix composed of the remaining $n - m$ columns of $G$. By imposing similar partitions on $c$, $x$, $l$, and $u$, the generalized network problem is represented as

$$ \min\ c^B x^B + c^N x^N $$
$$ \text{s.t.}\ Bx^B + Nx^N = r,\quad l^B \le x^B \le u^B,\quad l^N \le x^N \le u^N. $$

Any solution $(x^B, x^N)$ in which $x^N_j \in \{l^N_j, u^N_j\}$ for each $j$ and $x^B = B^{-1}(r - Nx^N)$ is called a basic solution with respect to the basis $B$. A feasible solution that is also basic is called a basic feasible solution (BFS). Each iteration of the primal simplex algorithm corresponds to moving from one BFS to another BFS so that the objective function value never increases, proceeding until an optimum is reached.

The dual variables associated with a BFS are given by $\pi = c^B B^{-1}$ and the reduced costs are given by $\Delta = c - \pi G$. The optimality conditions for a given primal-dual pair are

$$ \Delta_j > 0 \Rightarrow x_j = l_j, \qquad \Delta_j = 0 \Rightarrow l_j \le x_j \le u_j, \qquad \Delta_j < 0 \Rightarrow x_j = u_j $$

for each $j$. Using this notation, an iteration of the primal simplex algorithm is as follows.

PROCEDURE primal simplex iteration
1 Let $\pi$ be a solution to $\pi B = c^B$.
2 Set $\Delta^N \leftarrow c^N - \pi N$.
3 Select $q$ such that $\Delta^N_q < 0$ and $x^N_q = l^N_q$, or $\Delta^N_q > 0$ and $x^N_q = u^N_q$.
4a IF no such $q$ exists, THEN the solution is an optimum.
4b IF $x^N_q = l^N_q$, THEN $\lambda \leftarrow 1$, ELSE $\lambda \leftarrow -1$.
5 Let $y$ be a solution to $By = N_q$, where $N_q$ is the $q$th column of $N$.
6 For $i = 1, \dots, m$, set
$$ \delta_i = \begin{cases} (x^B_i - l^B_i)/(\lambda y_i) & \text{for } \lambda y_i > 0, \\ (x^B_i - u^B_i)/(\lambda y_i) & \text{for } \lambda y_i < 0, \\ \infty & \text{otherwise.} \end{cases} $$
7 Let $s = \arg\min\{\delta_i : i = 1, \dots, m\}$.
8 IF $\delta_s > u^N_q - l^N_q$, THEN DO Case 1, ELSE DO Case 2.
Case 1. $x^B \leftarrow x^B - \lambda(u^N_q - l^N_q)\, y$; IF $\lambda = 1$, THEN $x^N_q \leftarrow u^N_q$, ELSE $x^N_q \leftarrow l^N_q$.
Case 2. $x^B \leftarrow x^B - \lambda \delta_s y$; $x^N_q \leftarrow x^N_q + \lambda \delta_s$. Interchange the $q$th column of $N$ and the $s$th column of $B$.

By re-ordering the rows and columns of $B$, it can be displayed in block diagonal form as follows:

$$ B = \begin{pmatrix} B^1 & & \\ & \ddots & \\ & & B^p \end{pmatrix}. $$

For example, the basis can be displayed as $B$ equals [10 × 10 matrix display of the example basis], with $p = 2$ and row order 1, 2, 10, 8, 9, 5, 6, 7, 4, and 3. A display of the graph corresponding to $B$ is illustrated in Fig. 5. The direction of the arcs was selected arbitrarily.

Generalized Networks, Figure 5
A display of the basis B
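The PROCEDURE primal simplex iteration given above can be transcribed almost line by line. The sketch below is a dense, exact-arithmetic illustration under stated assumptions, not a network code: it solves $\pi B = c^B$ and $By = N_q$ by Gaussian elimination instead of on the basis graph, takes the first eligible $q$ in step 3, assumes finite bounds and a given starting BFS, and runs on a toy instance that is not from the text. All function names are hypothetical.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M z = rhs by Gauss-Jordan elimination (exact Fractions)."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def simplex_iteration(G, c, l, u, x, basis):
    """One bounded primal simplex iteration; x and basis (column indices) are
    updated in place. Returns False when the current BFS is optimal."""
    m, n = len(G), len(G[0])
    B = [[G[i][j] for j in basis] for i in range(m)]
    B_T = [[B[i][k] for i in range(m)] for k in range(m)]
    pi = solve(B_T, [c[j] for j in basis])                     # step 1: pi B = c^B
    nonbasic = [j for j in range(n) if j not in basis]
    delta = {j: c[j] - sum(pi[i] * G[i][j] for i in range(m)) for j in nonbasic}  # step 2
    eligible = [j for j in nonbasic                            # step 3
                if (delta[j] < 0 and x[j] == l[j]) or (delta[j] > 0 and x[j] == u[j])]
    if not eligible:                                           # step 4a: optimal
        return False
    q = eligible[0]
    lam = 1 if x[q] == l[q] else -1                            # step 4b
    y = solve(B, [G[i][q] for i in range(m)])                  # step 5: B y = N_q
    ratios = []                                                # step 6: finite delta_i only
    for i, j in enumerate(basis):
        t = lam * y[i]
        if t > 0:
            ratios.append(((x[j] - l[j]) / t, i))
        elif t < 0:
            ratios.append(((x[j] - u[j]) / t, i))
    delta_s, s = min(ratios) if ratios else (None, None)       # step 7
    span = u[q] - l[q]
    if delta_s is None or delta_s > span:                      # step 8 -> Case 1
        for i, j in enumerate(basis):
            x[j] -= lam * span * y[i]
        x[q] = u[q] if lam == 1 else l[q]
    else:                                                      # Case 2: basis change
        for i, j in enumerate(basis):
            x[j] -= lam * delta_s * y[i]
        x[q] += lam * delta_s
        basis[s] = q                                           # x_q replaces basic column s
    return True

def primal_simplex(G, c, l, u, x, basis, max_iter=100):
    for _ in range(max_iter):
        if not simplex_iteration(G, c, l, u, x, basis):
            return x
    raise RuntimeError("iteration limit reached")

# Toy data (not from the text): node 1 supplies 4 units; arc 0 doubles its flow into node 2.
G = [[1, 1, 1, 0],
     [-2, -1, 0, 1]]
x_opt = primal_simplex(G, c=[1, 2, 5, -1], l=[0, 0, 0, 0], u=[3, 3, 4, 10],
                       x=[0, 0, 4, 0], basis=[2, 3])
print([str(v) for v in x_opt])   # ['3', '1', '0', '7']
```

On this instance the first iteration is a bound flip (Case 1, arc 0 goes to its upper bound 3), the second is a basis change (Case 2), and the third finds no eligible $q$.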

A connected network having exactly one cycle (such as the upper component in Fig. 5) is called a one-tree. An arc which is incident to a single node (such as the arc corresponding to the last column of $B$) is called a root arc. A connected network on $k$ nodes having $k - 1$ regular arcs and one root arc is called a rooted tree (such as the lower component in Fig. 5). It has been known since at least the 1960s that the connected components of the graph of a generalized network basis are either one-trees or rooted trees ([5,7]). This structure can be exploited in solving the systems $\pi B = c^B$ and $By = N_q$ needed in the simplex algorithm, the details of which appear in [6].
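The counting characterization translates directly into a check on a basis graph. In the sketch below (names are illustrative), each basis column is given by the node(s) where it is nonzero; a component on $k$ nodes is classified as a rooted tree if it carries $k - 1$ regular arcs and one root arc, and as a one-tree if it carries $k$ regular arcs and no root arc (a connected graph with as many arcs as nodes has exactly one cycle). The example arcs are read off the predecessor labels of Table 1, with the root arc assumed to sit at node 3, the root of the lower component.

```python
def classify_basis_components(num_nodes, columns):
    """columns: one tuple per basis column, listing the node(s) where it is nonzero:
    (i, j) for a regular arc, (i,) for a root arc; nodes are 1..num_nodes.
    Returns {sorted tuple of component nodes: "rooted tree" | "one-tree" | "neither"}."""
    parent = list(range(num_nodes + 1))

    def find(v):                        # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for col in columns:
        if len(col) == 2:
            parent[find(col[0])] = find(col[1])

    comps = {}
    for v in range(1, num_nodes + 1):
        comps.setdefault(find(v), []).append(v)
    counts = {rep: [0, 0] for rep in comps}     # [regular arcs, root arcs]
    for col in columns:
        counts[find(col[0])][0 if len(col) == 2 else 1] += 1

    result = {}
    for rep, nodes in comps.items():
        regular, root_arcs = counts[rep]
        k = len(nodes)
        if regular == k - 1 and root_arcs == 1:
            kind = "rooted tree"
        elif regular == k and root_arcs == 0:
            kind = "one-tree"
        else:
            kind = "neither"
        result[tuple(nodes)] = kind
    return result

# Arcs implied by the predecessor labels of Table 1, plus a root arc at node 3.
COLUMNS = [(1, 10), (2, 1), (5, 10), (8, 2), (9, 2), (10, 2),
           (4, 3), (6, 3), (7, 4), (3,)]
print(classify_basis_components(10, COLUMNS))
```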

In software implementations of the primal simplex algorithm, the basis of a generalized network is maintained using a special data structure. Using the rooted tree illustrated in Fig. 5, one may imagine a line around the contours of the tree as illustrated in Fig. 6a, which is known as a depth-first search. For this example, the nodes in this search are ordered 3, 4, 7, 4, 3, 6, 3. An order called pre-order is obtained by eliminating all duplicate occurrences (i.e. 3, 4, 7, 6). The label which gives the next node in the pre-order is called the thread.

Generalized Networks, Table 1
Labels for the basis illustrated in Fig. 5

Node  Pred  Thread  Card  Last node
1     10    2       1     1
2     1     8       3     9
3     3     4       4     6
4     3     7       2     7
5     10    1       1     5
6     3     3       1     6
7     4     6       1     7
8     2     9       1     8
9     2     10      1     9
10    2     5       2     5

Three additional labels are generally used to maintain the basis. The predecessor of node v, denoted p(v), is the first node encountered on the path from v to the
root. For root nodes, we adopt the convention p(v) = 
v. If the arc linking v and p(v) were deleted, then there 
would be two trees, one containing v and the other ex- 
cluding v. The tree containing v is said to be rooted at 
v. The cardinality of v is defined to be the number of 
nodes in the tree rooted at v. The last node of v is de- 
fined to be the last node in the tree rooted at v when the 
nodes are taken in thread (pre-order) order. 
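The four labels of a rooted tree follow from a single depth-first traversal. The sketch below (a recursive traversal; names are illustrative) takes the children of each node in a fixed order, here 4 before 6 below node 3, and reproduces the contour 3, 4, 7, 4, 3, 6, 3, the pre-order 3, 4, 7, 6, and the labels of nodes 3, 4, 6 and 7 in Table 1. It assumes, as Table 1 shows for node 6, that the thread of the last node in pre-order returns to the root.

```python
def tree_labels(root, children):
    """Depth-first traversal of a rooted tree given as {node: [children in order]}.
    Returns the predecessor, thread, cardinality and last-node labels, together
    with the contour (depth-first search) order and the pre-order."""
    pred, card, last = {root: root}, {}, {}     # convention: p(root) = root
    contour, preorder = [], []

    def visit(v):
        contour.append(v)
        preorder.append(v)
        card[v] = 1
        for w in children.get(v, []):
            pred[w] = v
            visit(w)
            contour.append(v)                   # the contour line returns to v
            card[v] += card[w]
        last[v] = preorder[-1]                  # last node of v's subtree in pre-order

    visit(root)
    # Thread: next node in pre-order; the last node threads back to the root.
    thread = {v: preorder[(k + 1) % len(preorder)] for k, v in enumerate(preorder)}
    return pred, thread, card, last, contour, preorder

pred, thread, card, last, contour, preorder = tree_labels(3, {3: [4, 6], 4: [7]})
print(contour)    # [3, 4, 7, 4, 3, 6, 3]
print(preorder)   # [3, 4, 7, 6]
```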

The data structure used to represent a rooted tree is extended for the one-tree in an obvious way. The cycle in the one-tree plays the role of the root node. The predecessor labels of the nodes in the cycle point to the next node in the cycle. That is, beginning with any node in the cycle, say $v$, the sequence $v, p(v), p(p(v)), \dots$ identifies all nodes in the cycle. The convention adopted for the thread is that traversal around the cycle using the thread is in the opposite direction to that using the predecessor. The pre-order for a one-tree is illustrated in Fig. 6b, and the four labels for the basis illustrated in Fig. 5 are given in Table 1. Using these labels and the ideas presented in [2], all operations of the primal simplex algorithm can be performed directly on the basis forest composed of rooted trees and one-trees.

Generalized Networks, Figure 6
Depth-first search illustrated (panel: A Depth-First Search for a One-Tree)

Generalized Networks, Table 2
Survey of generalized network codes, where A stands for Assembly and F for FORTRAN

Code                 Year  Lang  Authors
NETG                 1973  F     F. Glover, D. Klingman, J. Stutz
-                    1973  F     W. Langley
-                    1981  F     D. Adolphson, L. Heum
GENNET               1984  F     G. Brown, R. McBride
GWHIZNET             1984  A     J. Tomlin
GRNET                1985  F     M. Engquist, M. Chang
LPNETG               1985  F     J. Mulvey, S. Zenios
-                    1986  F     I. Ali, A. Charles, T. Song
GRNET-K (parallel)   1987  F     M. Chang, M. Engquist, M. Finkel, R. Meyer
PGRNET (parallel)    1987  F     R. Clark, R. Meyer, M. Chang
GNO/PC               1988  C     W. Nulty, M. Trick
GRNET-A              1988  A     M. Chang, M. Cheng, C. Chen
GENFLO               1989  F     M. Ramamurti
GRNET2 (serial)      1989  F     R. Clark, R. Meyer, M. Chang
TPGRNET (parallel)   1989  F     R. Clark, R. Meyer
NETPD                1994  F     N. Curet
RAMSES               1997  C     J. Kennington, R. Mohamed

Since the generalized network problem is a specially structured linear program, any LP algorithm can be used to solve it. By utilizing the structure of the network, however, any of the LP algorithms can be specialized to reduce the solution time. A specialization of the dual simplex algorithm may be found in [8] and a primal-dual procedure may be found in [4]. The relaxation method of Bertsekas has also been extended to the generalized network case (see [3]). Interior point algorithms (see [9]) could also be specialized for this problem.

The first specialized software for the generalized 
network problem was developed in the early 1970s.
A partial list of codes which have been developed may 
be found in Table 2. An extensive list of applications of 
the generalized network model may be found in [1] and 
in [10]. 
The complementarity problem and its generalizations
are now established as an important class of applied 
mathematical problems. For these problems, there ex- 
ists a body of theoretical results, algorithms for com- 
puting solutions and many applications from engineer- 
ing to economics and from theoretical physics to com- 
puter science. A recent survey, [6], describes some of 
this progress, including applications in some major in- 
dustrial research laboratories in the United States. Cov- 
ered there are models for elasto-hydrodynamic lubrica- 
tion of bearings (automotive industry) and spatial price 
equilibrium (telecommunications firm). The applica- 
tion of complementarity allowed engineers and ana- 
lysts to comprehend and solve a new range of problems 
which had been out of reach. It is now well documented 
that other approaches do not adequately model these 
application problems while complementarity handles 
them. 

Two main generalizations of the nonlinear complementarity problem were made:
• Generalization of the ordering to that of a cone (see [5]).
• Generalization to several functions per index (see [1] and [7]).
The first of these generalizations was applied to
solve an elasto-hydrodynamic lubrication problem in 
[5], while the second was applied in [7] to solve a more 
complex mixed lubrication problem. These particular 
problems were studied in the past without complemen- 
tarity models, but it is now recognized that the earlier 
attempts were incomplete, and failed to comprehend 
the main features of the physical situation. 

In recent years, the second generalized complementarity problem above has been reconsidered and a related problem, the generalized order complementarity problem, has been studied. It was known for some
time that under certain conditions on the functions in- 
volved, there exists a solution to the linear generalized 
complementarity problem. See [1]. Recently, more ex- 
tensive results have been obtained. For example, B.P. 
Szanc [8] developed a theory and algorithms for non- 
linear functions of the class P, thereby extending the 
work of G.J. Habetler and M.M. Kostreva [2]. Results 
for the infinite-dimensional version of the generalized 
order complementarity problem are presented in [4]. 

The nonlinear complementarity problem is as follows: Given $f\colon \mathbb{R}^n \to \mathbb{R}^n$, find $x \in \mathbb{R}^n$ satisfying $x \ge 0$, $f(x) \ge 0$, and $x^T f(x) = 0$. The most general set of conditions known for existence and uniqueness of solutions for this problem (even removing the requirement for continuity of $f$) is given in [2].
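Because $x \ge 0$ and $f(x) \ge 0$ make every product $x_i f_i(x)$ nonnegative, the condition $x^T f(x) = 0$ forces each product to vanish; equivalently, $\min(x_i, f_i(x)) = 0$ for every $i$. The sketch below uses this componentwise min-residual to test a candidate point; the linear map $f(x) = Mx + q$ and its data are an illustrative choice, not from the text.

```python
from fractions import Fraction

def ncp_residual(f, x):
    """max_i |min(x_i, f_i(x))|; it is zero exactly when x >= 0, f(x) >= 0
    and x^T f(x) = 0."""
    return max(abs(min(xi, fi)) for xi, fi in zip(x, f(x)))

# Illustrative linear map f(x) = M x + q.
M = [[2, 1], [1, 2]]
q = [-5, -6]

def f_lin(x):
    return [sum(mij * xj for mij, xj in zip(row, x)) + qi for row, qi in zip(M, q)]

x_star = [Fraction(4, 3), Fraction(7, 3)]   # solves 2x1 + x2 = 5, x1 + 2x2 = 6
print(ncp_residual(f_lin, x_star))          # 0
print(ncp_residual(f_lin, [0, 0]))          # 6
```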

Considering the generalized complementarity problem with cone ordering, let $K$ be a pointed, solid cone in $\mathbb{R}^n$, let

$$ K^* = \{y \in \mathbb{R}^n : x^T y \ge 0 \text{ for all } x \in K\}, $$

and let $f\colon \mathbb{R}^n \to \mathbb{R}^n$. The generalized complementarity problem $(f, K)$ is to find $x \in \mathbb{R}^n$ satisfying $x \in K$, $f(x) \in K^*$, and $x^T f(x) = 0$.

Finally, the generalization with multiple functions per index is as follows: given $f_i\colon \mathbb{R}^n \to \mathbb{R}$, find $x \in \mathbb{R}^n$ satisfying

$$ x \ge 0, \qquad f_i(x) \ge 0,\ i \in I_j, \qquad x_j \prod_{i \in I_j} f_i(x) = 0, \qquad j = 1, \dots, n. $$

Here the product of the variable $x_j$ with the product of functions ($|I_j|$ of them) plays the role of the complementarity condition.
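The min-residual extends to this setting: when $x_j$ and all $f_i(x)$, $i \in I_j$, are nonnegative, the product $x_j \prod_{i \in I_j} f_i(x)$ vanishes exactly when its smallest factor is zero, so $r_j = \min(x_j, \min_{i \in I_j} f_i(x))$ must vanish for every $j$ (and a negative factor makes $|r_j| > 0$). The functions and index sets below are illustrative assumptions, not from the text.

```python
def gcp_residual(fs, index_sets, x):
    """Residual for x >= 0, f_i(x) >= 0 (i in I_j), x_j * prod_{i in I_j} f_i(x) = 0.
    With all factors nonnegative the product vanishes iff its smallest factor
    is zero, so r_j = min(x_j, min_{i in I_j} f_i(x)) must vanish for every j."""
    return max(abs(min([x[j]] + [fs[i](x) for i in I]))
               for j, I in enumerate(index_sets))

# Illustrative data with n = 2: the first index uses fs[0], fs[1]; the second uses fs[2].
fs = [lambda x: x[0] + x[1] - 1,
      lambda x: 2 - x[0],
      lambda x: x[1] - x[0]]
index_sets = [[0, 1], [2]]
print(gcp_residual(fs, index_sets, [2, 2]))   # 0: a solution
print(gcp_residual(fs, index_sets, [2, 3]))   # 1: the second product is 3 * 1 != 0
```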
► Integer Linear Complementary Problem
This article deals with the solution of mixed integer nonlinear programming (MINLP) problems of the form

$$ (P)\qquad \min_{x,y}\ f(x, y) \quad \text{s.t.}\ g(x, y) \le 0,\quad x \in X,\ y \in Y \text{ integer}. $$


Throughout, the following general assumptions are made:

A1) $f$ and $g$ are twice continuously differentiable convex functions;

A2) $X$ and $Y$ are nonempty compact convex (polyhedral) sets; and

A3) a constraint qualification holds at the solution of every NLP subproblem obtained by fixing the integer variables $y$.

MINLP problems arise in a range of engineering applications (see, e.g., [8] and [10] and references therein).

This article discusses a class of methods for MINLP problems that provide an alternative to nonlinear branch and bound (cf. ► MINLP: Branch and bound methods) [3]. These algorithms are based on the concept of defining an MILP master problem. Relaxations of such a master problem are then used in constructing algorithms for solving the MINLP problem.

The methods presented here are a generalization of outer approximation proposed by M.A. Duran and I.E. Grossmann [4] (see also [14]) and of LP/NLP based branch and bound of I. Quesada and Grossmann [13].

The next section presents the reformulation of (P) 
as an MILP master program. Based on this reformula- 
tion two algorithms are presented in the following sec- 
tions which solve a finite sequence of NLP subproblems 
and MILP or MIQP master problems, respectively. The 
final section shows how the re-solution of these master 
problems can be avoided by updating their branch and 
bound trees. 


Outer Approximation of (P) 


In this section the MINLP model problem (P) is refor- 
mulated as an MILP problem using outer approxima- 
tion. The reformulation employs projection onto the 
integer variables and linearization of the resulting NLP 
subproblems by means of supporting hyperplanes. The 
convexity assumption allows an MILP formulation to 
be given where all supporting hyperplanes are collected 
in a single MILP. 

In order to improve the readability of the material, 
the reformulation is first done under the simplifying as- 
sumption that all integer assignments y € Y are feasi- 
ble. Next a rigorous treatment of infeasible subprob- 
lems is outlined, correcting an inaccuracy in [4] and 
[14], which could cause the algorithm to cycle. Finally, 
the two parts are combined and the correctly reformu- 
lated MILP master program is presented. 

The reformulation presented in the next section af- 
fords new insight into Outer Approximation. It can be 
seen, for example, that it suffices to add the lineariza- 
tions of strongly active constraints to the master pro- 
gram. This is very important since it reduces the size of 
the MILP master program relaxations that are solved in 
the outer approximation algorithms. 


When All $y \in Y$ Are Feasible


In this subsection the simplifying assumption is made that all $y \in Y$ are feasible. The first step in reformulating (P) is to define the NLP subproblem

$$ \mathrm{NLP}(y^j)\qquad \min_x\ f(x, y^j) \quad \text{s.t.}\ g(x, y^j) \le 0,\quad x \in X, $$

in which the integer variables are fixed at the value $y = y^j$. By defining $v(y^j)$ as the optimal value of the subproblem NLP($y^j$), it is possible to express (P) in terms of a projection onto the $y$ variables, that is

$$ \min_{y^j \in Y} \{v(y^j)\}. \tag{1} $$

The assumption that all $y \in Y$ are feasible implies that all subproblems are feasible. Let $x^j$ denote an optimal solution of NLP($y^j$) for $y^j \in Y$ (existence of $x^j$ follows by the compactness of $X$). Because a constraint qualification holds at the solution of every subproblem NLP($y^j$) for every $y^j \in Y$, it follows that (1) has the same optimal value as the problem

$$ \min_{y^j \in Y} \{u(y^j)\}, \tag{2} $$

where $u(y^j)$ is the optimal value of the following LP:

$$ \min_x\ f^j + (\nabla f^j)^T \begin{pmatrix} x - x^j \\ 0 \end{pmatrix} \quad \text{s.t.}\ 0 \ge g^j + [\nabla g^j]^T \begin{pmatrix} x - x^j \\ 0 \end{pmatrix},\quad x \in X. \tag{3} $$

In fact, it suffices to include those linearizations of constraints about $(x^j, y^j)$ which are strongly active at the solution of the corresponding subproblem. Here $f^j = f(x^j, y^j)$ and $\nabla f^j = \nabla f(x^j, y^j)$, etc.

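It is convenient to introduce a dummy variable $\eta \in \mathbb{R}$ into (3), giving rise to the equivalent problem

$$ \min_{x,\eta}\ \eta \quad \text{s.t.}\ \eta \ge f^j + (\nabla f^j)^T \begin{pmatrix} x - x^j \\ 0 \end{pmatrix},\quad 0 \ge g^j + [\nabla g^j]^T \begin{pmatrix} x - x^j \\ 0 \end{pmatrix},\quad x \in X. $$

The convexity assumption A1) implies that $(x^i, y^i)$ is feasible in the inner optimization problem above for all $y^i \in Y$, where $x^i$ is an optimal solution to NLP($y^i$). Thus an equivalent MILP problem

$$
\begin{aligned}
(M_Y)\qquad \min_{x,y,\eta}\ & \eta \\
\text{s.t.}\ & \eta \ge f^j + (\nabla f^j)^T \begin{pmatrix} x - x^j \\ y - y^j \end{pmatrix} \quad \forall y^j \in Y, \\
& 0 \ge g^j + [\nabla g^j]^T \begin{pmatrix} x - x^j \\ y - y^j \end{pmatrix} \quad \forall y^j \in Y, \\
& x \in X,\ y \in Y \text{ integer}
\end{aligned}
$$

is obtained. That is, $(M_Y)$ has one set of linearizations of the objective and constraint functions per integer point $y^j \in Y$.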


Infeasible Subproblems 


Usually, not all y € Y give rise to feasible subproblems. 
Defining the sets 


T= { fi xf optimal solution to NLP( yj )} ; 
V={yeY: Ax € X with g(x, y) < 0}. 


Then V is the set of all integer assignments y that give 
rise to feasible NLP subproblems and T is the set of 
indices of these integer variables. Then (P) can be ex- 
pressed as a projection on to the integer variables 


min(v(y/)). 
ylEeV 


In this projection the set V replaces Y in (1). The equiv- 
alent MILP problem is now given by 


min 7 
X5YoM 


st. on > fl e(VfAT (' 7 “) 
ae 


(My) 7 (x—xi 
0> git [Vgi]" ; 
y= 
VjeT 
xExX, ye V integer 


obtained from (My) by replacing Y by V. 

It remains to find a suitable representation of the constraint $y \in V$ by means of supporting hyperplanes. The master problem given in [4] is obtained from problem $(M_V)$ by replacing $y \in V$ by $y \in Y$. Duran and Grossmann (1986) justify this step by arguing that a representation of the constraints $y \in V$ is included in the linearizations in problem $(M_V)$. However, it is not difficult to derive an MINLP problem where this would lead to an incorrect master problem [5], [11, p. 79].

In order to derive a correct representation of $y \in V$, it is necessary to consider how NLP solvers detect infeasibility. Infeasibility is detected when convergence to an optimal solution of a feasibility problem occurs. At such an optimum, some of the nonlinear constraints will be violated and others will be satisfied, and the norm of the infeasible constraints can only be reduced by making some feasible constraints infeasible. A suitable framework for nonlinear feasibility problems in the context of outer approximation is

$$ F(y^k)\qquad \min_x\ \sum_{i \in J^\perp} w_i^k\, g_i^+(x, y^k) \quad \text{s.t.}\ g_j(x, y^k) \le 0,\ j \in J,\quad x \in X, $$

where $g_i^+ = \max\{0, g_i\}$. The constraints in $F(y^k)$ have been divided into two sets: a set $J$ of constraints that can be satisfied and a set $J^\perp$ of constraints that cannot be satisfied. Infeasible subproblems now correspond to solutions of $F(y^k)$ with $\sum_{i \in J^\perp} w_i^k\, g_i^+(x^k, y^k) > 0$. Most common feasibility problems, such as $\ell_1$ and $\ell_\infty$, as well as the feasibility filter [6], fit into this framework. The following lemma shows how solutions of $F(y^k)$ can be used to construct a representation of $y \in V$.


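Lemma 1 If NLP($y^k$) is infeasible, so that $x^k$ solves $F(y^k)$ with

$$ \sum_{i \in J^\perp} w_i^k (g_i^k)^+ > 0, \tag{4} $$

then $y = y^k$ is infeasible in the constraints

$$ 0 \ge g_i^k + [\nabla g_i^k]^T \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \quad \forall i \in J^\perp, $$
$$ 0 \ge g_i^k + [\nabla g_i^k]^T \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \quad \forall i \in J, $$

for all $x \in X$. The proof of this lemma can be found in [5, Lemma 1].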


The General Case 


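This subsection completes the derivation of the MILP master program by combining the developments of the previous two subsections. Let the integer assignment $y^k$ produce an infeasible subproblem and denote

$$ S = \{k : \text{NLP}(y^k) \text{ infeasible},\ x^k \text{ solves } F(y^k)\}. $$

Note that $S$ is the complement of the set $T$ defined in the previous subsection. It then follows directly from Lemma 1 that the constraints

$$ 0 \ge g^k + [\nabla g^k]^T \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \quad \forall k \in S $$

exclude all integer assignments $y^k$ for which NLP($y^k$) is infeasible. Thus a general way to correctly represent the constraints $y \in V$ in $(M_V)$ is to add linearizations from $F(y^k)$ when infeasible subproblems are obtained, giving rise to the following MILP master problem:

$$
\begin{aligned}
(M)\qquad \min_{x,y,\eta}\ & \eta \\
\text{s.t.}\ & \eta \ge f^j + (\nabla f^j)^T \begin{pmatrix} x - x^j \\ y - y^j \end{pmatrix} \quad \forall j \in T, \\
& 0 \ge g^j + [\nabla g^j]^T \begin{pmatrix} x - x^j \\ y - y^j \end{pmatrix} \quad \forall j \in T, \\
& 0 \ge g^k + [\nabla g^k]^T \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \quad \forall k \in S, \\
& x \in X,\ y \in Y \text{ integer}.
\end{aligned}
$$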


The development of the preceding two subsections pro- 
vides a proof of the following result: 


Theorem 2 If assumptions A1), A2) and A3) hold, then (M) is equivalent to (P) in the sense that $(x^*, y^*)$ solves (P) if and only if it solves (M).


Problem (M) is an MILP problem, but it is not practical to solve (M) directly, since this would require all subproblems NLP($y^j$) to be solved first. This would be a very inefficient way of solving problem (P). Therefore, instead of attempting to solve (M) directly, relaxations of (M) are used in an iterative process that is the subject of the next section.


Linear Outer Approximation 


This section describes how relaxations of the master program (M), developed in the previous section, can be employed to solve the model problem (P). The resulting algorithm is termed linear outer approximation. It is shown to iterate finitely between NLP subproblems and MILP master program relaxations. This algorithm is seen to be less efficient when the problem has significant curvature, which the linear master program cannot capture. A worst-case example, in which linear outer approximation visits all integer assignments, has been derived in [5]. This example motivates the introduction of a second order term into the MILP master program relaxations, resulting in a quadratic outer approximation algorithm which is considered in the next section.




Each iteration of the linear outer approximation al- 
gorithm chooses a new integer assignment y! and at- 
tempts to solve NLP(y’). Either a feasible solution x! is 
obtained or infeasibility is detected and x! is the solu- 
tion of a feasibility problem F(y') (other pathological 
cases are eliminated by the assumption that the set X 
is compact). The algorithm replaces the sets T and S in 
(M) by the sets 


$$
T^i = \{\, j \le i : x^j \text{ solution to NLP}(y^j) \,\}, \qquad
S^i = \{\, k \le i : x^k \text{ solution to F}(y^k) \,\}.
$$


It is also necessary to prevent any $y^j$, $j \in T^i$, from becoming the solution of the relaxed master problem. This can be done by including a constraint
$$
\eta < \mathrm{UBD}^i, \qquad \text{where} \quad \mathrm{UBD}^i = \min\{\, f^j : j \in T^i \,\}
$$
is an upper bound on the optimum. Thus the following master problem is defined:


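$$
\begin{aligned}
(\mathrm{M}^i)\qquad \min_{x,y,\eta}\ & \eta\\
\text{s.t. } & \eta < \mathrm{UBD}^i,\\
& \eta \ge f^j + [\nabla f^j]^T \begin{pmatrix} x - x^j \\ y - y^j \end{pmatrix} && \forall j \in T^i,\\
& 0 \ge g^j + [\nabla g^j]^T \begin{pmatrix} x - x^j \\ y - y^j \end{pmatrix} && \forall j \in T^i,\\
& 0 \ge g^k + [\nabla g^k]^T \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} && \forall k \in S^i,\\
& x \in X,\ y \in Y \text{ integer}.
\end{aligned}
$$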


The algorithm solves (M^i) to obtain a new integer assignment $y^{i+1}$, and the whole process is repeated iteratively. A detailed description of the algorithm is as follows.


[Figure: flowchart. The NLP subproblem, with the integer variables fixed, generates supporting hyperplanes; these are added to the MILP master program; the MILP finds a new integer assignment; if the MILP is infeasible the algorithm stops, otherwise the loop repeats with the new assignment.]
Generalized Outer Approximation, Figure 1
Illustration of linear outer approximation


Initialization: $y^0$ given; set $i = 0$, $T^{-1} = \emptyset$, $S^{-1} = \emptyset$, $\mathrm{UBD}^{-1} = \infty$.

REPEAT

1. Solve NLP($y^i$) or F($y^i$) as appropriate. Let the solution be $x^i$.

2. Linearize objective and constraint functions about $(x^i, y^i)$. Set $T^i = T^{i-1} \cup \{i\}$ or $S^i = S^{i-1} \cup \{i\}$ as appropriate.

3. IF (NLP($y^i$) feasible AND $f^i < \mathrm{UBD}^{i-1}$) THEN
update current best point by setting $x^* = x^i$, $y^* = y^i$, $\mathrm{UBD}^i = f^i$
ELSE set $\mathrm{UBD}^i = \mathrm{UBD}^{i-1}$.

4. Solve the current relaxation (M^i) of the master program (M), giving a new $y^{i+1}$. Set $i = i + 1$.

UNTIL ((M^i) is infeasible)


Algorithm 1: Linear outer approximation


Figure 1 illustrates the different stages of the algorithm.

The algorithm also detects whether or not (P) is infeasible. If $\mathrm{UBD} = \infty$ on exit, then all integer assignments visited by the algorithm are infeasible (i.e. the THEN branch of Step 3 is never invoked). The use of upper bounds on η and the definition of the sets $T^i$ and $S^i$ ensure that no $y^i$ is replicated by the algorithm. This enables a proof to be given that the algorithm terminates after a finite number of steps, provided that there is only a finite number of integer assignments.


Theorem 3 If assumptions A1), A2) and A3) hold, and if $|Y| < \infty$, then Algorithm 1 terminates in a finite number of steps at an optimal solution of (P) or with an indication that (P) is infeasible.


A proof of this theorem can be found in [5]. A brief outline of the proof is as follows. The optimality of $x^i$ in NLP($y^i$) implies that $\eta \ge f^i$ for any feasible point of (M^i) with $y = y^i$. The upper bound $\eta < f^i$ therefore ensures that the choice $y = y^i$ in (M^i) has no feasible points $x \in X$. Hence no integer assignment is visited twice, and the algorithm is finite. The optimality of the algorithm follows from the convexity of f and g, which ensures that the linearizations are supporting hyperplanes.
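As an illustration only, the loop of Algorithm 1 can be sketched in Python on a hypothetical toy instance that is not taken from [5] or [11]: two integer variables in {0, ..., 4}, no continuous variables (so solving NLP($y^i$) reduces to evaluating f and g at $y^i$, and F($y^i$) simply contributes the linearization of the violated constraint), a convex quadratic objective and one convex constraint. The relaxed master problem (M^i) is solved by brute-force enumeration of the finite set Y rather than by an MILP solver, which is an assumption of the sketch. The strict bound $\eta < \mathrm{UBD}^i$ and the constraint cuts are what prevent an assignment from being revisited, mirroring the finiteness argument above.

```python
import itertools

# Hypothetical toy instance (not from the article): two integer variables,
# no continuous variables, convex objective f and one convex constraint g <= 0.
TARGET = (2.25, 1.75)


def f(y):
    return (y[0] - TARGET[0]) ** 2 + (y[1] - TARGET[1]) ** 2


def grad_f(y):
    return (2 * (y[0] - TARGET[0]), 2 * (y[1] - TARGET[1]))


def g(y):
    return y[0] ** 2 + y[1] ** 2 - 9.0


def grad_g(y):
    return (2 * y[0], 2 * y[1])


def linearize(value, grad, point, y):
    """Supporting hyperplane value + grad^T (y - point), evaluated at y."""
    return value + sum(gk * (yk - pk) for gk, yk, pk in zip(grad, y, point))


def outer_approximation(candidates, y0):
    T, S = [], []                      # linearization points (feasible / infeasible)
    ubd, best = float("inf"), None     # UBD and incumbent y*
    y, visited = y0, []
    while True:
        visited.append(y)
        if g(y) <= 0:                  # Step 1: "NLP(y)" is feasible
            T.append(y)
            if f(y) < ubd:             # Step 3: update incumbent and UBD
                ubd, best = f(y), y
        else:                          # "F(y)": keep the violated constraint
            S.append(y)
        # Step 4: relaxed master (M^i), solved by enumerating the finite set Y.
        next_y, next_eta = None, ubd   # only eta < UBD is acceptable
        for cand in candidates:
            if any(linearize(g(p), grad_g(p), p, cand) > 0 for p in T + S):
                continue               # cut off by a constraint linearization
            eta = max((linearize(f(p), grad_f(p), p, cand) for p in T),
                      default=-float("inf"))
            if eta < next_eta:
                next_y, next_eta = cand, eta
        if next_y is None:             # (M^i) infeasible: stop
            return best, ubd, visited
        y = next_y


Y = list(itertools.product(range(5), repeat=2))
best, ubd, visited = outer_approximation(Y, (0, 0))
print(best, ubd, len(visited))        # (2, 2) 0.125 8
```

On this instance the sketch visits eight integer assignments, stops when the enumerated master problem has no candidate left, and returns the same point as exhaustive search. A practical implementation would hand (M^i) to an MILP solver and NLP($y^i$) to an NLP solver.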


Quadratic Outer Approximation 


Curvature can often play an important role in optimization. If this is the case, then an algorithm based on linear representations of the problem functions can be inefficient. In [5], a worst-case example is given for which linear outer approximation visits all integer points. This motivates the introduction of curvature information into the master programs. In the remainder of this section it is shown how this can be achieved for linear outer approximation by including a second-order Lagrangian term in the objective function of the MILP master programs.

These considerations have led to the development of a new algorithm based on the use of second-order information. The development of such an algorithm seems contradictory at first sight, since quadratic functions do not provide underestimators of general convex functions. However, the derivation of the previous section allows a curvature term to be included in the objective function of the MILP master problem. This quadratic term influences the choice of the next iterate without surrendering the finite convergence properties, which rely on the fact that the feasible region of the master problem is an outer approximation of the feasible region of the MINLP problem (P).


The resulting algorithm is referred to as quadratic outer approximation and is obtained by replacing the relaxed master problem (M^i) by the MIQP problem (Q^i) in Step 4 of Algorithm 1. With the quadratic term
$$
q^i(x, y) = \frac{1}{2} \begin{pmatrix} x - x^i \\ y - y^i \end{pmatrix}^T \nabla^2 \mathcal{L}^i \begin{pmatrix} x - x^i \\ y - y^i \end{pmatrix},
$$
where
$$
\mathcal{L}^i = \mathcal{L}(x^i, y^i, \lambda^i) = f(x^i, y^i) + (\lambda^i)^T g(x^i, y^i)
$$
is the usual Lagrangian function, the new master problem (Q^i) can be defined as


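$$
\begin{aligned}
(\mathrm{Q}^i)\qquad \min_{x,y,\eta}\ & \eta + q^i(x, y)\\
\text{s.t. } & \eta < \mathrm{UBD}^i,\\
& \eta \ge f^j + [\nabla f^j]^T \begin{pmatrix} x - x^j \\ y - y^j \end{pmatrix} && \forall j \in T^i,\\
& 0 \ge g^j + [\nabla g^j]^T \begin{pmatrix} x - x^j \\ y - y^j \end{pmatrix} && \forall j \in T^i,\\
& 0 \ge g^k + [\nabla g^k]^T \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} && \forall k \in S^i,\\
& x \in X,\ y \in Y \text{ integer}.
\end{aligned}
$$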


Numerical experience in [11, Chapter 6] indicates that adding a curvature term reduces the number of iterations of outer approximation if general integer variables are present. However, the iteration count is not reduced for problems involving binary variables only. As a consequence, these preliminary results indicate that quadratic outer approximation improves computation times only for problems with general integer variables, since MIQP problems are usually more expensive to solve than MILP problems.


Avoiding Re-solving the Master Problems


This section presents a new approach to the solution of successive master program relaxations. It was proposed by Quesada and Grossmann [13] for a class of problems whose objective and constraint functions are linear in the integer variables, and is called the LP/NLP based branch and bound algorithm. Their approach is generalized here to cover problems with nonlinearities in the integer variables. In addition, a new algorithm, QP/NLP based branch and bound, is proposed; it is based on the quadratic master program (Q^i), which takes curvature information into account.

[Figure: three panels i), ii), iii) showing a branch and bound tree, with the labels "Solution not integer feasible; branch", "Solution integer feasible; solve the NLP-subproblem", "Update all problems on the stack" and "Continue the branch and bound process".]
Generalized Outer Approximation, Figure 2
Progress of LP/NLP based branch and bound

The motivation for the LP/NLP based branch and bound algorithm is that outer approximation usually spends an increasing amount of computing time in solving successive MILP master program relaxations. Since the MILP relaxations are strongly related to one another, a considerable amount of information is regenerated each time a relaxation is solved. The new approach avoids re-solving the MILP master program relaxations by updating the branch and bound tree. This section makes extensive use of branch and bound terminology; see the extensive literature on branch and bound (e.g., [1,2,8,9,12]) for the relevant definitions.




Instead of solving successive relaxations of (M), the new algorithm solves only one MILP problem, which is updated as new integer assignments are encountered during the tree search. Initially an NLP subproblem is solved and the initial master program relaxation (M^0) is set up from the supporting hyperplanes at the solution of the NLP subproblem. The MILP problem (M^0) is then solved by a branch and bound process, with the exception that each time a node (corresponding to an LP problem) gives an integer feasible solution $y^i$, say, the process is interrupted and the corresponding subproblem NLP($y^i$) is solved. New linearizations from NLP($y^i$) are then added to every node on the stack, effectively updating the branch and bound tree. The branch and bound process continues in this fashion until no problems remain on the stack. At that moment all nodes are fathomed and the tree search is exhausted.
Initialization: $y^0$ given; set $i = 1$.
Set up the initial master program:
  Solve NLP($y^0$). Let the solution be $x^0$.
  Linearize objective and constraint functions about $(x^0, y^0)$.
  Set $T^0 = \{0\}$, $S^0 = \emptyset$, $x^* = x^0$, $y^* = y^0$, $\mathrm{UBD}^1 = f^0$.
  Place (M^0) with its integer restrictions relaxed on the stack.

WHILE (stack is not empty) DO BEGIN

1. Remove a problem (P') from the stack and solve the LP, giving $(x', y', \eta')$. $\eta'$ is a lower bound for all NLP child problems whose root is the current problem.

2. IF ($y'$ integer) THEN
     Set $y^i = y'$.
     Solve NLP($y^i$) or F($y^i$). Let the solution be $x^i$.
     Linearize objective and constraint functions about $(x^i, y^i)$.
     Set $T^i = T^{i-1} \cup \{i\}$ or $S^i = S^{i-1} \cup \{i\}$ as appropriate.
     Add linearizations to all pending problems on the stack.
     IF (NLP($y^i$) feasible AND $f^i < \mathrm{UBD}^i$) THEN
       Update best point $x^* = x^i$, $y^* = y^i$, $\mathrm{UBD}^{i+1} = f^i$.
     ELSE set $\mathrm{UBD}^{i+1} = \mathrm{UBD}^i$.
     ENDIF
     Place (P') back on the stack; set $i = i + 1$.
     Pruning: remove all problems from the stack whose lower bound is greater than or equal to the current upper bound $\mathrm{UBD}^i$.
   ELSE
     Branch on an integer variable and add two new problems to the stack.
   ENDIF
END (WHILE-LOOP)


Algorithm 2: LP/NLP based branch and bound


Unlike in ordinary branch and bound, a node cannot be assumed to have been fathomed if it produces an integer feasible solution, since the previous solution at this node is cut out by the linearizations added to the master program. Thus only infeasible nodes can be assumed to be fathomed. In the case of MILP master programs there is an additional opportunity for pruning. Since the LP nodes are outer approximations of the MINLP subproblems corresponding to their respective subtrees, a node can be regarded as fathomed if its lower bound is greater than or equal to the current upper bound $\mathrm{UBD}^i$.

As in the two outer approximation algorithms, the use of an upper bound implies that no integer assignment is generated twice during the tree search. Since both the tree and the set of integer variables are finite, the algorithm eventually encounters only infeasible problems, and the stack is thus emptied so that the procedure stops. This provides a proof of the following corollary of Theorem 3.


Corollary 4 If assumptions A1), A2) and A3) hold, and if $|Y| < \infty$, then Algorithm 2 terminates in a finite number of steps at a solution of (P) or with an indication that (P) is infeasible.


Figure 2 illustrates the progress of Algorithm 2. In i), the LP relaxation of the initial MILP has been solved and two branches have been added to the tree. The LP that is solved next (indicated by a *) does not give an integer feasible solution, and two new branches are introduced. The next LP in ii) produces an integer feasible solution, indicated by a box. The corresponding NLP subproblem is solved and in iii) all nodes on the stack are updated (indicated by the shaded circles) by adding the linearizations from the NLP subproblem, including the updated upper bound on η, which cuts out the current assignment $y^i$. Then the branch and bound process continues on the updated tree by solving the LP marked by a *.

If curvature information plays an important part in the problem (P), then it may be beneficial to add a quadratic term $q^i(x, y)$ to the master problem. This gives rise to the QP/NLP based branch and bound algorithm. It differs from Algorithm 2 in two important aspects. First, QP rather than LP problems are solved in the tree search. The second difference is a consequence of the first: since QP problems do not provide lower bounds on the MINLP problem (P), the pruning step in Algorithm 2 cannot be applied.

In preliminary numerical experiments in [11, Chapter 6] and [7] it has been observed that the LP and QP versions of Algorithm 2 are usually faster than their counterparts based on Algorithm 1, often by a factor of 2. A detailed numerical comparison of the two approaches is still outstanding.
The generalized primal-relaxed dual approach (GPRD) in the context of global optimization employs Benders' idea of partitioning (see [2]) in order to exploit the structure of global optimization problems with complicating variables (variables which, when temporarily fixed, render the remaining optimization problem much simpler; see [4]). For the class of global optimization problems considered by the GPRD approach, fixing the values of the complicating variables reduces the given problem to a convex program, parameterized by the values of the complicating variables. In order to approximate the solution of this class of problems efficiently, the GPRD approach uses the primal and relaxed dual problems with fixed complicating variables to provide sharper upper and lower bounds on the solution, respectively, following the original ideas in [2,4] and [9].

However, it adopts a different way of constructing relaxed dual problems, generalizing the original method used in the GOP algorithm (see [3]) so that it can handle a wider range of global optimization problems, including nonsmooth ones.

Let k, p, n, m be positive integers. Let X and Y be two closed sets in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Let $f, g_i, h_j$ ($1 \le i \le k$ and $1 \le j \le p$) be continuous functions on $\mathbb{R}^n \times \mathbb{R}^m$. Let $g = (g_1, \dots, g_k)^T$ and $h = (h_1, \dots, h_p)^T$.

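Let us consider the following global optimization problem:
$$
\begin{aligned}
(\mathrm{OP})\qquad \min_{x,y}\ & f(x, y)\\
\text{s.t. } & g_i(x, y) \le 0, \quad h_j(x, y) = 0,\\
& x \in X,\ y \in Y, \quad 1 \le i \le k,\ 1 \le j \le p,
\end{aligned}
$$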


where X and Y are nonempty closed convex sets in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. It is assumed that for any fixed $y \in Y$ or $x \in X$, and for $1 \le i \le k$ and $1 \le j \le p$, the functions $f(\cdot, y)$, $g_i(\cdot, y)$, $f(x, \cdot)$, $g_i(x, \cdot)$ are convex and $h_j(\cdot, y)$, $h_j(x, \cdot)$ are affine. It is also assumed that for any fixed $y \in V = \{ y \in Y : \text{there is an } x \in X \text{ with } g(x, y) \le 0 \text{ and } h(x, y) = 0 \}$, the partial primal problem, i.e. (OP) with y fixed, is stable in the sense that its perturbation function has a nonempty subdifferential at zero; see [1]. This assumption holds when, for instance, Slater's constraint qualification holds for (OP) at every fixed $y \in V$, though it is more general than Slater's condition.

Although problem (OP) appears to address only a limited class of global optimization programs, it is shown in [5] that very broad classes of mathematical programming problems can be reformulated in this form by a simple variable transformation. Furthermore, it is shown in [6] that for any fixed $y \in Y$ the reformulated problems are always stable.

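It follows from the stability assumption that for any fixed $y_0 \in V$ there exist Lagrange multipliers $(\lambda_0, \mu_0) \in \mathbb{R}^p \times \mathbb{R}^k_+$ and $x_0 \in X$ such that the triplet $(x_0, \lambda_0, \mu_0)$ is the solution of the following saddle problem:
$$
(\mathrm{SP})\qquad
\begin{aligned}
& g(x_0, y_0) \le 0, \quad h(x_0, y_0) = 0, \quad \mu_0^T g(x_0, y_0) = 0,\\
& L(x_0, y_0, \lambda, \mu) \le L(x_0, y_0, \lambda_0, \mu_0) \le L(x, y_0, \lambda_0, \mu_0) \quad \forall (x, \lambda, \mu) \in X \times \mathbb{R}^p \times \mathbb{R}^k_+,
\end{aligned}
$$
where the Lagrange function of the primal problem (OP) is defined by
$$
L(x, y, \lambda, \mu) = f(x, y) + \lambda^T h(x, y) + \mu^T g(x, y).
$$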


The solution $(x_0, \lambda_0, \mu_0)$ of (SP) can be found by solving the following partial primal problem:
$$
(\mathrm{PP})\qquad \min_{\substack{x \in X\\ g(x, y_0) \le 0,\ h(x, y_0) = 0}} f(x, y_0),
$$
which is a convex minimization problem.
For a given $y_0 \in V$, the GPRD approach finds an upper bound for the solution of (OP) by solving (PP):
$$
v^+(y_0) = \min_{\substack{x \in X\\ g(x, y_0) \le 0,\ h(x, y_0) = 0}} f(x, y_0).
$$
The problem (PP) is in general easier to solve, as it is convex. The GPRD approach then estimates a lower bound for the solution of (OP) by solving the following relaxed dual problem:


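$$
(\mathrm{RD})\qquad v^-(U, H) = \min_{y \in V}\ \max_{(\lambda_t, \mu_t) \in U}\ \min_{x \in X}\ H^{(\lambda_t, \mu_t)}(x, y),
$$
where $U = \{ (\lambda_t, \mu_t) \in \mathbb{R}^p \times \mathbb{R}^k_+ : 1 \le t \le N \}$, and the mapping $H: U \to C^0(X \times Y)$ is such that the function $H^{(\lambda_t, \mu_t)}(\cdot, \cdot)$ is continuous and satisfies, for any fixed $(\lambda_t, \mu_t) \in U$,
$$
L(x, y, \lambda_t, \mu_t) = f(x, y) + \lambda_t^T h(x, y) + \mu_t^T g(x, y) \ge H^{(\lambda_t, \mu_t)}(x, y) \qquad \forall (x, y) \in X \times Y.
$$
In the GPRD approach, the set U is usually constructed to include the multipliers $(\lambda, \mu)$ obtained from solving the problem (PP) above.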

The generalized primal-relaxed dual algorithm constructs, for $n = 0, 1, \dots$, a sequence of elements $y_n \in Y$, sets $U_n$, and functions $H_n^{(\lambda_t, \mu_t)}$ for each $(\lambda_t, \mu_t) \in U_n$ such that $v^+(y_n) - v^-(U_n, H_n) \to 0$ as $n \to \infty$. The selections of $U_n$ and $H_n^{(\lambda_t, \mu_t)}$ are clearly not unique, but they have to be constructed so that the relaxed dual problems (RD) can be solved to global optimality efficiently. In the literature $U_n$ is taken to be the union of all the Lagrange multipliers $(\lambda_m, \mu_m)$ of the partial primal problems (PP) with $y = y_m$ ($m = 1, \dots, n$). Assume that the selection $H_n^{(\lambda_t, \mu_t)}$ is given for any $(\lambda_t, \mu_t) \in U_n$. Then the generalized primal-relaxed dual algorithm reads:


1 | Given $y_0 \in V$ and $\varepsilon > 0$.

2 | Given $y_n \in V$, solve (PP) for $y = y_n$ to obtain $x_n$ and Lagrange multipliers $(\lambda_n, \mu_n)$.

3 | Solve (RD) to obtain $y_{n+1}$, where $U_n = U_{n-1} \cup \{ (\lambda_n, \mu_n) \}$.

4 | Stop if $v^+(y_n) - v^-(U_n, H_n) < \varepsilon$. Otherwise go to Step 2 with $n = n + 1$.


PRD Algorithm for (OP)


It is shown in [7] that the PRD algorithm converges 
to the global solutions of (OP) under some mild as- 
sumptions. 

There are many possible choices for the mapping H. In the literature the following results have been reported. In Geoffrion's original work [4], $H_n^{(\lambda_m, \mu_m)}(x, y) = L(x, y, \lambda_m, \mu_m)$ ($1 \le m \le n$). With this choice of $H_n$, (RD) is in general difficult to solve computationally. In the pioneering work [3], $H_n$ takes the form
$$
H_n^{(\lambda_m, \mu_m)}(x, y) = L(x_m, y_m, \lambda_m, \mu_m) + \nabla_x L(x_m, y, \lambda_m, \mu_m)(x - x_m) + \nabla_y L(x_m, y_m, \lambda_m, \mu_m)(y - y_m) \qquad (m = 1, \dots, n),
$$
where $x_m, y_m, \lambda_m, \mu_m$ are obtained from the previous iterations of the PRD algorithm and $\nabla_x L(x, y, \lambda, \mu)$ (or $\nabla_y L(x, y, \lambda, \mu)$) is the gradient of the Lagrange function L with respect to x (or y) for fixed $(y, \lambda, \mu)$ (or $(x, \lambda, \mu)$). The resulting PRD algorithm has been referred to as the GOP algorithm and has been widely used for various global optimization problems (see, e.g., [8] for a survey). Important progress has been made in developing efficient ways of solving (RD) for the GOP algorithm; see also [8].
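Why the GOP choice of $H_n$ satisfies the requirement $L \ge H$ can be seen in one line under the (OP) assumptions (L is convex in x for fixed y and convex in y for fixed x when $\mu \ge 0$): convexity in x gives $L(x, y) \ge L(x_m, y) + \nabla_x L(x_m, y)(x - x_m)$, and convexity in y gives $L(x_m, y) \ge L(x_m, y_m) + \nabla_y L(x_m, y_m)(y - y_m)$. The Python sketch below checks this numerically for a hypothetical example that is not from the article, $L(x, y) = x^2 y^2$ with no constraints (so the multipliers drop out); this L is not jointly convex, but it is convex in each block separately.

```python
# Hypothetical unconstrained example (no g or h, hence no multipliers):
# L(x, y) = x^2 y^2 is convex in x for fixed y and in y for fixed x.


def L(x, y):
    return x * x * y * y


def dL_dx(x, y):
    return 2.0 * x * y * y


def dL_dy(x, y):
    return 2.0 * x * x * y


def H(x, y, xm, ym):
    """GOP-style relaxation built from the primal point (xm, ym)."""
    return L(xm, ym) + dL_dx(xm, y) * (x - xm) + dL_dy(xm, ym) * (y - ym)


xm, ym = 1.5, -0.5
grid = [i / 4 for i in range(-8, 9)]            # x, y in [-2, 2], step 0.25
gaps = [L(x, y) - H(x, y, xm, ym) for x in grid for y in grid]
print(min(gaps))   # 0.0: H never exceeds L and touches it at (xm, ym)
```

For this example the gap $L - H$ equals $y^2 (x - x_m)^2 + x_m^2 (y - y_m)^2$ exactly, so it vanishes only at $(x_m, y_m)$.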

The GOP algorithm is only applicable to smooth optimization problems, where the objective function and the constraints are differentiable. Nonsmooth optimization problems occur in many important real applications. In [7] the GPRD approach is applied to a class of nonsmooth global optimization problems where
$$
f = F + \max_{e \in E} F^e \qquad\text{and}\qquad g_i = G_i + \max_{e \in E} G_i^e, \quad 1 \le i \le k,
$$
where $E = \{1, \dots, d\}$, and the smooth $C^1$ functions $F$, $F^e$, $G = (G_1, \dots, G_k)^T$, $G^e = (G_1^e, \dots, G_k^e)^T$ satisfy all the conditions in (OP) for $e = 1, \dots, d$. The resulting algorithm, referred to as EGOP, reads:
1 | Given $y_0 \in V$ and $\varepsilon > 0$.

2 | Given $y_n \in V$, solve (PP) for $y = y_n$ to obtain $x_n$ and Lagrange multipliers $(\lambda_n, \mu_n)$.

3 | Solve (RD) to obtain $y_{n+1}$, where $U_n = U_{n-1} \cup \{ (\lambda_n, \mu_n) \}$ and, for any $(\lambda_m, \mu_m) \in U_n$,
$$
\begin{aligned}
H_n^{(\lambda_m, \mu_m)}(x, y) ={}& L^S(x_m, y_m, \lambda_m, \mu_m) + \nabla_x L^S(x_m, y, \lambda_m, \mu_m)(x - x_m) + \nabla_y L^S(x_m, y_m, \lambda_m, \mu_m)(y - y_m)\\
&+ \max_{e \in E} \big( F^e(x_m, y_m) + \nabla_x F^e(x_m, y)(x - x_m) + \nabla_y F^e(x_m, y_m)(y - y_m) \big)\\
&+ \sum_{i=1}^{k} \mu_{m,i} \max_{e \in E} \big( G_i^e(x_m, y_m) + \nabla_x G_i^e(x_m, y)(x - x_m) + \nabla_y G_i^e(x_m, y_m)(y - y_m) \big),
\end{aligned}
$$
where the smooth part of the Lagrangian is defined by $L^S(x, y, \lambda, \mu) = F(x, y) + \lambda^T h(x, y) + \mu^T G(x, y)$.

4 | Stop if $v^+(y_n) - v^-(U_n, H_n) < \varepsilon$. Otherwise go to Step 2 with $n = n + 1$.


EGOP Algorithm for (OP)


This algorithm is identical to the GOP algorithm in the smooth case, where $F^e = 0$, $G^e = 0$ for $e = 1, \dots, d$. EGOP covers a wider range of global optimization problems, and it is shown in [7] to be convergent under essentially the same assumptions that ensure the convergence of the GOP algorithm.

Penalty implementations of the PRD algorithm have also been considered in the literature as another way of coping with possibly infeasible primal or relaxed dual subproblems. In [7], the EGOP algorithm is applied to the following two penalty problems:


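$$
(\mathrm{NPOP})_\rho\qquad \min_{(x,y) \in X \times Y} P_\rho(x, y),
$$
where
$$
P_\rho(x, y) = f(x, y) + \rho \sum_{i=1}^{k} \max\big(0, g_i(x, y)\big) + \rho \sum_{j=1}^{p} |h_j(x, y)|, \qquad \rho > 0,
$$
and
$$
(\mathrm{SPOP})_M\qquad \min_{(x,y) \in X \times Y} P_M(x, y),
$$
where
$$
P_M(x, y) = f(x, y) + M \sum_{i=1}^{k} \max\big(0, g_i(x, y)\big)^2 + M \sum_{j=1}^{p} |h_j(x, y)|^2, \qquad M > 0.
$$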


The convergence of the two penalty implementations of the EGOP algorithm is established in [7], where it is shown that the sequences $\{(x_n, y_n)\}$ generated by the EGOP algorithm for the penalty problems $(\mathrm{NPOP})_\rho$ and $(\mathrm{SPOP})_M$ converge to the solutions of (OP) when ρ is large enough or M tends to infinity.
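The two penalty functions can be written down directly. The Python sketch below assumes that f, the $g_i$ and the $h_j$ are supplied as plain callables and uses hypothetical data (not from [7]); it also assumes, as in the reconstruction of $(\mathrm{SPOP})_M$ above, that both sums in $P_M$ are squared.

```python
def nonsmooth_penalty(f, gs, hs, rho):
    """P_rho of (NPOP)_rho: l1-type penalty of the constraint violations."""
    def P(x, y):
        return (f(x, y)
                + rho * sum(max(0.0, gi(x, y)) for gi in gs)
                + rho * sum(abs(hj(x, y)) for hj in hs))
    return P


def squared_penalty(f, gs, hs, M):
    """P_M of (SPOP)_M: squared violations (exponent 2 assumed on both sums)."""
    def P(x, y):
        return (f(x, y)
                + M * sum(max(0.0, gi(x, y)) ** 2 for gi in gs)
                + M * sum(hj(x, y) ** 2 for hj in hs))
    return P


# Hypothetical data: f = x*y, g_1 = x + y - 1 <= 0, h_1 = x - y = 0.
def f(x, y):
    return x * y


def g1(x, y):
    return x + y - 1.0


def h1(x, y):
    return x - y


P_rho = nonsmooth_penalty(f, [g1], [h1], rho=10.0)
P_M = squared_penalty(f, [g1], [h1], M=10.0)
for point in [(0.25, 0.25), (1.0, 1.0), (2.0, 0.5)]:
    print(point, P_rho(*point), P_M(*point))
# (0.25, 0.25) 0.0625 0.0625
# (1.0, 1.0) 11.0 11.0
# (2.0, 0.5) 31.0 46.0
```

The first penalty is nonsmooth wherever some $g_i$ or $h_j$ vanishes, while the second is continuously differentiable when the $g_i$ and $h_j$ are. This is consistent with the convergence statement above: an $\ell_1$-type penalty can be exact once ρ is large enough, whereas a quadratic penalty generally requires M → ∞.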


See also 


> αBB Algorithm
> Global Optimization in Phase and Chemical 
Reaction Equilibrium 
Introduction


In generalized semi-infinite optimization problems, a finite-dimensional decision variable x is subject to infinitely many inequality constraints; that is, in
$$
\mathrm{GSIP}:\quad \text{minimize } f(x) \quad \text{subject to } x \in M,
$$
the feasible set is described by
$$
M = \{ x \in \mathbb{R}^n \mid g(x, y) \le 0 \text{ for all } y \in Y(x) \},
$$
with the index set
$$
Y(x) = \{ y \in \mathbb{R}^m \mid v_\ell(x, y) \le 0,\ \ell \in L \}.
$$
All defining functions $f, g, v_\ell$, $\ell \in L = \{1, \dots, s\}$, are assumed to be real valued and at least continuous on their respective domains. Moreover, the set-valued mapping $Y: \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is assumed to be locally bounded, that is, for each $\bar x \in \mathbb{R}^n$ there exists a neighborhood U of $\bar x$ such that $\bigcup_{x \in U} Y(x)$ is bounded in $\mathbb{R}^m$.

In applications such as design centering, robust optimization, and (reverse) Chebyshev approximation ([13,32]), often finitely many semi-infinite constraints $g_i(x, y) \le 0$, $y \in Y_i(x)$, $i \in I$, describe the feasible set M of GSIP, along with finitely many equality constraints in the definitions of M and Y(x). In order to avoid technicalities, this article focuses on the basic case of a single semi-infinite constraint (see [13,32] for more general formulations).

As opposed to a standard semi-infinite optimiza- 
tion problem, the possibly infinite index set Y(x) of 
the semi-infinite inequality constraint is allowed to vary 
with x in a GSIP. For surveys and detailed studies about 
standard semi-infinite optimization see [10,15,25]. In 
contrast to standard semi-infinite programs, the feasi- 
ble set of GSIP is not necessarily a closed set, and it 
might possess a stable disjunctive structure ([32]). 

Powerful optimality conditions are based on a thor- 
ough analysis of these topological structures. This arti- 
cle mainly deals with first-order optimality conditions 
and will, thus, begin with a discussion of different first- 
order properties of the feasible set. 


Definitions 


The key to understanding the topological features of the feasible set of GSIP lies in the bilevel structure of semi-infinite programming ([27,32]). In fact, it is not hard to see that the semi-infinite constraint in GSIP is equivalent to
$$
\varphi(x) := \max_{y \in Y(x)} g(x, y) \le 0,
$$
which means that the feasible set M of GSIP is the lower-level set of an optimal value function:
$$
M = \{ x \in \mathbb{R}^n \mid \varphi(x) \le 0 \}. \qquad (1)
$$
The usual convention "$\max \emptyset = -\infty$" is consistent here, as an empty index set Y(x) corresponds, loosely speaking, to "the absence of restrictions" at x and, hence, to the feasibility of x.

The function φ is the optimal value function of the so-called lower-level problem
$$
Q(x):\quad \max_{y \in \mathbb{R}^m} g(x, y) \quad \text{subject to} \quad v_\ell(x, y) \le 0,\ \ell \in L.
$$
In contrast to the upper-level problem, which consists in minimizing f over M, in the lower-level problem x plays the role of an n-dimensional parameter, and y is the decision variable. The main computational problem in semi-infinite programming is that the lower-level problem has to be solved to global optimality, even if, for example, only a stationary point of the upper-level problem is sought.


Topological Properties 


The alternative description of the feasible set in (1) shows that the topological properties of M are determined by the continuity properties of φ, whereas first- and second-order optimality conditions rely on the first- and second-order properties of φ. The properties of optimal value functions have been studied extensively in parametric optimization ([2]; for a brief introduction see [32]).

The optimal value function φ can be shown to be at least upper semicontinuous, so that points $x \in \mathbb{R}^n$ with $\varphi(x) < 0$ belong to the topological interior of M. On the other hand, for investigations of the local structure of M or of local optimality conditions one is only interested in feasible points from the boundary ∂M of M. Hence it suffices to consider the zeros of φ, that is, points $x \in \mathbb{R}^n$ for which Q(x) has maximal value $\varphi(x) = 0$. We denote the globally maximal points of Q(x) for arbitrary $x \in \mathbb{R}^n$ by
$$
Y_*(x) = \{ y \in Y(x) \mid g(x, y) = \varphi(x) \}
$$
and for the special case of $x \in M \cap \partial M$ by
$$
Y_0(x) = \{ y \in Y(x) \mid g(x, y) = 0 \}.
$$
The set $Y_0(x)$ is also called the upper-level active index set of GSIP.

Note that M is closed if for all $x \in \mathbb{R}^n$ the index set Y(x) is nonempty and the Mangasarian-Fromovitz constraint qualification (MFCQ) holds at some element of $Y_0(x)$ ([13,32]). If M is not closed, there may exist infeasible boundary points $x \in \partial M$, that is, boundary points with $\varphi(x) > 0$.
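To make the convention $\max \emptyset = -\infty$ and the possible failure of closedness concrete, the following Python sketch evaluates φ for a hypothetical one-dimensional GSIP that is not taken from the article: $g(x, y) = y + 1$ and $Y(x) = \{ y \in \mathbb{R} \mid y^2 + x \le 0 \}$. Here Y(x) is empty for x > 0, so $M = (0, \infty)$ is not closed, and $\bar x = 0$ is an infeasible boundary point with $\varphi(0) = 1 > 0$; the only index $y = 0$ in $Y(0)$ violates MFCQ because $D_y v(0, 0) = 0$. The lower-level maximization is approximated by sampling y on a grid, which is an assumption of the sketch: sampling can miss parts of Y(x) and in general only underestimates φ.

```python
import math

# Hypothetical 1-D GSIP (not from the article):
#   g(x, y) = y + 1,  Y(x) = {y in R : v(x, y) = y**2 + x <= 0}.


def g(x, y):
    return y + 1.0


def v(x, y):
    return y * y + x


# Sample grid for y in [-2, 2]; step 0.01, contains y = 0.0 exactly.
Y_GRID = [-2.0 + 4.0 * k / 400 for k in range(401)]


def phi(x):
    """Sampled optimal value of Q(x); -inf if no sampled y lies in Y(x)."""
    values = [g(x, y) for y in Y_GRID if v(x, y) <= 0.0]
    return max(values) if values else -math.inf


def in_M(x):
    return phi(x) <= 0.0


for x in (-1.0, 0.0, 0.1):
    print(x, phi(x), in_M(x))
# -1.0 2.0 False
# 0.0 1.0 False
# 0.1 -inf True
```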


The Reduction Ansatz 


For theoretical as well as numerical purposes it is of crucial importance to keep track of the elements of $Y_*(x)$ for varying x. These points solve the lower-level problem, so that for functions g and $v_\ell$, $\ell \in L$, which are continuously differentiable with respect to y, they satisfy the first-order necessary optimality condition of Karush-Kuhn-Tucker: let
$$
\mathcal{L}(x, y, \gamma) = g(x, y) - \gamma^T v(x, y)
$$
denote the Lagrangian of Q(x) with multiplier vector $\gamma \in \mathbb{R}^s$. Then for $\bar x \in M$ and each $\bar y \in Y_*(\bar x)$ such that MFCQ holds at $\bar y$ in $Q(\bar x)$, there exist multipliers $\bar\gamma \ge 0$ with $D_y \mathcal{L}(\bar x, \bar y, \bar\gamma) = 0$ and $\bar\gamma_\ell \cdot v_\ell(\bar x, \bar y) = 0$, $\ell \in L$. Here $D_y \mathcal{L}$ denotes the gradient of $\mathcal{L}$ with respect to y as a row vector. Note that the multiplier vector $\bar\gamma$ is uniquely determined if, instead of MFCQ, the stronger linear independence constraint qualification (LICQ) holds at $\bar y$.

Keeping track of the elements of $Y_*(x)$ can now be achieved, for example, by means of the implicit function theorem, if the functions g and $v_\ell$, $\ell \in L$, are $C^2$ with respect to y. For $\bar x \in M$, a local maximizer $\bar y$ of $Q(\bar x)$ is called nondegenerate in the sense of Jongen/Jonker/Twilt ([19]) if LICQ, strict complementary slackness and a second-order sufficiency condition are satisfied. The Reduction Ansatz ([14,16,35]) is said to hold at $\bar x \in M$ if all global maximizers of $Q(\bar x)$ are nondegenerate. The set $Y_*(\bar x)$ can then only contain finitely many points, say, $Y_*(\bar x) = \{ \bar y^1, \dots, \bar y^p \}$ with $p \in \mathbb{N}$. By a result from [8] the local variation of these points with x can be described by the implicit function theorem.

In fact, for x locally around $\bar x$ there exist continuously differentiable functions $y^i(x)$, $1 \le i \le p$, with $y^i(\bar x) = \bar y^i$ such that $y^i(x)$ is the locally unique local maximizer of Q(x) around $\bar y^i$. It turns out that the functions $\varphi_i(x) := g(x, y^i(x))$ are even $C^2$ in a neighborhood of $\bar x$. Their gradients are
$$
D\varphi_i(\bar x) = D_x \mathcal{L}(\bar x, \bar y^i, \bar\gamma^i), \qquad (2)
$$
where $\bar\gamma^i$ is the uniquely determined multiplier vector corresponding to $\bar y^i$.

A major consequence of the Reduction Ansatz is the so-called Reduction Lemma ([16]): if the Reduction Ansatz holds at $\bar x$, then for all x from a neighborhood U of $\bar x$ one has
$$
\varphi(x) = \max_{1 \le i \le p} \varphi_i(x).
$$
In view of (1) this means that locally around a feasible boundary point $\bar x \in M \cap \partial M$, the feasible set M can be described by finitely many $C^2$ constraints, that is, GSIP locally looks like a smooth finite optimization problem:
$$
M \cap U = \{ x \in U \mid \varphi_i(x) \le 0,\ i = 1, \dots, p \}. \qquad (3)
$$
In particular, locally around $\bar x$ the set M is closed, and it does not possess a stable disjunctive structure at $\bar x$.
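Formula (2) and the Reduction Lemma can be checked numerically on a hypothetical lower-level problem (not from the article): maximize $g(x, y) = -(y^2 - 1)^2 + x y$ over $y \in [-2, 2]$. At $\bar x = 0$ the global maximizers are $\bar y^{1,2} = \pm 1$ with value 0, the box constraint is inactive there (so the multipliers vanish and $D_x \mathcal{L} = D_x g$), and $g_{yy}(0, \pm 1) = -8 < 0$, so both maximizers are nondegenerate. The Python sketch tracks $y^i(x)$ by Newton's method on $g_y(x, y) = 0$, an implementation choice standing in for the implicit function theorem, and approximates φ by a fine grid.

```python
# Hypothetical lower-level problem (not from the article):
#   Q(x): maximize g(x, y) = -(y**2 - 1)**2 + x*y  over y in [-2, 2].
# At x_bar = 0 the global maximizers are y_bar = +1 and -1 (value 0).


def g(x, y):
    return -(y * y - 1.0) ** 2 + x * y


def g_y(x, y):
    return -4.0 * y * (y * y - 1.0) + x


def g_yy(x, y):
    return -(12.0 * y * y - 4.0)


def local_maximizer(x, y_bar, iters=50):
    """Track y^i(x) near y_bar with Newton's method on g_y(x, y) = 0."""
    y = y_bar
    for _ in range(iters):
        y -= g_y(x, y) / g_yy(x, y)
    return y


def phi_i(x, y_bar):
    return g(x, local_maximizer(x, y_bar))


def phi(x, n=40001):
    """Global optimal value of Q(x), approximated on a uniform grid of [-2, 2]."""
    return max(g(x, -2.0 + 4.0 * k / (n - 1)) for k in range(n))


h = 1e-5
for y_bar in (1.0, -1.0):
    slope = (phi_i(h, y_bar) - phi_i(-h, y_bar)) / (2.0 * h)
    print(y_bar, slope)          # formula (2) predicts D_x g(0, y_bar) = y_bar

x = 0.01
print(phi(x), max(phi_i(x, 1.0), phi_i(x, -1.0)))   # Reduction Lemma: nearly equal
```

The printed slopes should be close to $D_x g(0, \pm 1) = \pm 1$, as predicted by (2), and the grid value of φ(0.01) should agree with $\max(\varphi_1(0.01), \varphi_2(0.01))$ up to the grid resolution.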


First-Order Properties of the Feasible Set 


Since the Reduction Ansatz cannot be expected to hold generically everywhere in $M \cap \partial M$, the first-order structure of M is also studied under considerably weaker assumptions. For the first-order approximation of M one defines the contingent cone $\Gamma^*(\bar x, M)$ to M at $\bar x$ as follows: $d \in \Gamma^*(\bar x, M)$ if and only if there exist sequences of scalars $(t^\nu)_{\nu \in \mathbb{N}}$ and of vectors $(d^\nu)_{\nu \in \mathbb{N}}$ such that
$$
t^\nu \searrow 0, \quad d^\nu \to d \ (\nu \to \infty) \quad \text{and} \quad \bar x + t^\nu d^\nu \in M \ \text{ for all } \nu \in \mathbb{N}.
$$


The contingent cone is a closed cone, not necessarily convex, containing first-order information about M. In view of (1) the contingent cone to M at $\bar x$ should be related to a level set of a first-order approximation of φ at $\bar x$. Unfortunately, the differentiability properties of φ can be very weak, so that lower and upper directional derivatives of φ at $\bar x$ in direction d in the Hadamard sense ([4]) come into play:
$$
\varphi'_-(\bar x, d) = \liminf_{t \searrow 0,\ d' \to d} \frac{\varphi(\bar x + t d') - \varphi(\bar x)}{t}
$$
and
$$
\varphi'_+(\bar x, d) = \limsup_{t \searrow 0,\ d' \to d} \frac{\varphi(\bar x + t d') - \varphi(\bar x)}{t}.
$$
φ is called directionally differentiable at $\bar x$ (in the Hadamard sense) if for each direction $d \ne 0$ one has $\varphi'_-(\bar x, d) = \varphi'_+(\bar x, d) =: \varphi'(\bar x, d)$. The outer linearization cone of M at $\bar x$ can now be defined as
$$
L^*(\bar x, M) = \{ d \in \mathbb{R}^n \mid \varphi'_-(\bar x, d) \le 0 \}
$$
and the inner linearization cone by
$$
L(\bar x, M) = \{ d \in \mathbb{R}^n \mid \varphi'_+(\bar x, d) < 0 \}.
$$


For $\bar x \in \partial M \cap M$ the chain of inclusions
$$
L(\bar x, M) \subseteq \Gamma^*(\bar x, M) \subseteq L^*(\bar x, M) \qquad (4)
$$
holds ([22,33]). A good first-order description of M around $\bar x$ can thus be obtained if the linearization cones $L(\bar x, M)$ and $L^*(\bar x, M)$ do not differ too much from each other.

For example, in standard semi-infinite programming the index set mapping $Y(x) = Y$ is constant, and the theorem of Danskin ([6]) then says that φ is directionally differentiable with
$$
\varphi'(\bar x, d) = \max_{y \in Y_0(\bar x)} D_x g(\bar x, y)\, d
$$
for all $d \in \mathbb{R}^n$. The linearization cones
$$
L(\bar x, M) = \bigcap_{y \in Y_0(\bar x)} \{ d \in \mathbb{R}^n \mid D_x g(\bar x, y)\, d < 0 \}
$$
and
$$
L^*(\bar x, M) = \bigcap_{y \in Y_0(\bar x)} \{ d \in \mathbb{R}^n \mid D_x g(\bar x, y)\, d \le 0 \}
$$
thus differ only by the strictness of inequalities, and they do not possess a disjunctive structure.
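Danskin's formula can be illustrated with a hypothetical standard SIP (not from the article) in which the index set is finite and fixed, $Y = \{(1,0), (0,1), (-1,0), (0,-1)\}$, and $g(x, y) = y^T x - 1$. At $\bar x = (1, 1)$ one has $\varphi(\bar x) = 0$ and $Y_0(\bar x) = \{(1,0), (0,1)\}$, so $\varphi'(\bar x, d) = \max(d_1, d_2)$, $L(\bar x, M) = \{ d \mid d_1 < 0,\ d_2 < 0 \}$ and $L^*(\bar x, M) = \{ d \mid d_1 \le 0,\ d_2 \le 0 \}$. The Python sketch below compares the formula with a one-sided difference quotient; the step size t and the tolerance for the active set are arbitrary choices.

```python
# Hypothetical standard SIP data (not from the article): a finite, fixed index set Y
# and g(x, y) = y^T x - 1, so phi(x) = max_y g(x, y) is piecewise affine.
Y = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]


def g(x, y):
    return y[0] * x[0] + y[1] * x[1] - 1.0


def grad_x_g(x, y):
    return y                          # D_x g(x, y) for this linear g


def phi(x):
    return max(g(x, y) for y in Y)


def danskin_derivative(x, d, tol=1e-12):
    """phi'(x, d) = max over the active indices of D_x g(x, y) d (Danskin)."""
    value = phi(x)
    active = [y for y in Y if abs(g(x, y) - value) <= tol]
    return max(sum(a * b for a, b in zip(grad_x_g(x, y), d)) for y in active)


def difference_quotient(x, d, t=1e-7):
    x_t = [xi + t * di for xi, di in zip(x, d)]
    return (phi(x_t) - phi(x)) / t


x_bar = (1.0, 1.0)   # phi(x_bar) = 0, active set Y_0(x_bar) = {(1, 0), (0, 1)}
for d in [(1.0, -2.0), (-1.0, -3.0), (0.5, 0.5)]:
    print(d, danskin_derivative(x_bar, d), difference_quotient(x_bar, d))
```

The direction $d = (-1, -3)$ gives $\varphi'(\bar x, d) = -1 < 0$, so it lies in the inner linearization cone, whereas $d = (1, -2)$ lies in neither cone.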

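If in GSIP the Reduction Ansatz holds at $\bar x$, using (2) it is not hard to see that φ is directionally differentiable with
$$
\varphi'(\bar x, d) = \max_{1 \le i \le p} D_x \mathcal{L}(\bar x, \bar y^i, \bar\gamma^i)\, d
$$
for all $d \in \mathbb{R}^n$. The linearization cones
$$
L(\bar x, M) = \bigcap_{i=1}^{p} \{ d \in \mathbb{R}^n \mid D_x \mathcal{L}(\bar x, \bar y^i, \bar\gamma^i)\, d < 0 \}
$$
and
$$
L^*(\bar x, M) = \bigcap_{i=1}^{p} \{ d \in \mathbb{R}^n \mid D_x \mathcal{L}(\bar x, \bar y^i, \bar\gamma^i)\, d \le 0 \}
$$
again differ only by the strictness of inequalities.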
Under weaker assumptions than the Reduction Ansatz the situation in GSIP becomes more involved, since φ does not even have to be directionally differentiable. The following estimates for the upper and lower directional derivatives from [9,23] are known to be tight: for $\bar x \in \partial M \cap M$ such that MFCQ is satisfied at each $y \in Y_0(\bar x)$ one has for each $d \in \mathbb{R}^n$
$$
\sup_{y \in Y_0(\bar x)}\ \min_{\gamma \in KKT(\bar x, y)} D_x \mathcal{L}(\bar x, y, \gamma)\, d
\ \le\ \varphi'_-(\bar x, d) \ \le\ \varphi'_+(\bar x, d) \ \le\
\max_{y \in Y_0(\bar x)}\ \max_{\gamma \in KKT(\bar x, y)} D_x \mathcal{L}(\bar x, y, \gamma)\, d.
$$
Here
$$
KKT(\bar x, y) = \{ \gamma \in \mathbb{R}^s \mid \gamma \ge 0,\ D_y \mathcal{L}(\bar x, y, \gamma) = 0,\ \gamma_\ell \cdot v_\ell(\bar x, y) = 0,\ \ell \in L \}
$$
denotes the set of Karush-Kuhn-Tucker multipliers at y in $Q(\bar x)$.

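This at least yields estimates for the linearization cones:
$$
\bigcap_{y \in Y_0(\bar x)} \bigcap_{\gamma \in KKT(\bar x, y)} \{ d \in \mathbb{R}^n \mid D_x \mathcal{L}(\bar x, y, \gamma)\, d < 0 \}
\subseteq L(\bar x, M) \subseteq \Gamma^*(\bar x, M) \subseteq L^*(\bar x, M) \subseteq
\bigcap_{y \in Y_0(\bar x)} \bigcup_{\gamma \in KKT(\bar x, y)} \{ d \in \mathbb{R}^n \mid D_x \mathcal{L}(\bar x, y, \gamma)\, d \le 0 \}.
$$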


However, the estimate for the inner linearization cone is rather poor in many situations in which the problem data are endowed with a special structure. In [31] analogous estimates are given without the assumption of MFCQ in $Y_0(\bar x)$.

A disjunctive structure of $\Gamma^\star(\bar x, M)$ is intimately related to the nonuniqueness of the lower-level Karush-Kuhn-Tucker multipliers. This becomes clearer under two assumptions: the lower-level problems $Q(x)$, $x \in U$, are convex for some neighborhood $U$ of $\bar x$, and $Y(\bar x)$ possesses a Slater point. Due to results from [11,18,26], the multiplier set $KKT(\bar x)$ then does not depend on $y \in Y_0(\bar x)$, and $\varphi$ is directionally differentiable at $\bar x$ with

$$\varphi'(\bar x, d) = \min_{\gamma \in KKT(\bar x)} \max_{y \in Y_0(\bar x)} D_x\mathcal{L}(\bar x, y, \gamma)\, d$$

for all $d \in \mathbb{R}^n$. This yields

$$L(\bar x, M) = \bigcup_{\gamma \in KKT(\bar x)} \bigcap_{y \in Y_0(\bar x)} \{ d \in \mathbb{R}^n \mid D_x\mathcal{L}(\bar x, y, \gamma)\, d < 0 \}$$

and

$$L^\star(\bar x, M) = \bigcup_{\gamma \in KKT(\bar x)} \bigcap_{y \in Y_0(\bar x)} \{ d \in \mathbb{R}^n \mid D_x\mathcal{L}(\bar x, y, \gamma)\, d \le 0 \}.$$

Now both the inner and the outer linearization cones possess a disjunctive structure, and they differ only by the strictness of inequalities. Moreover, it becomes obvious that the occurrence of a stable disjunctive structure in GSIP is caused by nonunique lower-level Karush-Kuhn-Tucker multipliers. For more details on lower-level problems with a special structure, see [27,29,32].
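In the convex case, the cones are unions over multipliers of intersections over active points. The illustrative sketch below tests a direction against these formulas. It assumes that $Y_0(\bar x)$ is finite and that the caller supplies a finite sample of $KKT(\bar x)$. It also relies on a hypothetical callable `grad_L(y, gamma)` that returns $D_x\mathcal{L}(\bar x, y, \gamma)$. Because only a sample of multipliers is scanned, a positive answer certifies membership. A negative answer is conclusive only if the sample contains a certifying multiplier.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def disjunctive_cone_membership(
    grad_L: Callable[[int, float], Vector],
    active_ys: Iterable[int],
    kkt_sample: Iterable[float],
    d: Vector,
) -> Tuple[bool, bool]:
    """Return (d in inner cone, d in outer cone) via union over gamma of intersection over y."""
    ys: List[int] = list(active_ys)
    inner = outer = False
    for gamma in kkt_sample:
        values = [dot(grad_L(y, gamma), d) for y in ys]
        inner = inner or all(val < 0 for val in values)   # strict inequalities
        outer = outer or all(val <= 0 for val in values)  # non-strict inequalities
    return inner, outer


# Hypothetical lower-level data: active points y = 0, 1 and a scalar multiplier gamma.
def grad_L(y: int, gamma: float) -> Vector:
    return [1.0 - 2.0 * gamma, 0.0] if y == 0 else [0.0, 2.0 * gamma - 1.0]


print(disjunctive_cone_membership(grad_L, [0, 1], [0.0, 1.0], [-1.0, 1.0]))  # (True, True)
print(disjunctive_cone_membership(grad_L, [0, 1], [0.0, 1.0], [1.0, -1.0]))  # (True, True)
print(disjunctive_cone_membership(grad_L, [0, 1], [1.0], [-1.0, 1.0]))       # (False, False)
```

With these data, both $(-1, 1)$ and $(1, -1)$ belong to the inner cone, but for different multipliers. Their midpoint is the origin, which never lies in the inner cone. The inner cone is therefore not convex, which is the disjunctive structure described above.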


Constraint Qualifications 


In what follows, let the functions $f$, $g$, and $v_\ell$, $\ell \in L$, be continuously differentiable. It is well known ([3]) that at a local minimizer $\bar x$ of $f$ on $M$ the following primal first-order necessary optimality condition holds:

$$\{ d \in \mathbb{R}^n \mid Df(\bar x)\, d < 0 \} \cap \Gamma^\star(\bar x, M) = \emptyset. \qquad (5)$$

To obtain a more explicit condition from (5), one needs an explicit description of $\Gamma^\star(\bar x, M)$. A good candidate would be the outer linearization cone $L^\star(\bar x, M)$, which contains the contingent cone by (4). Even in finite optimization, however, simple examples show that $\Gamma^\star(\bar x, M)$ can be a proper subset of $L^\star(\bar x, M)$. In this case, one cannot replace the contingent cone in (5) by the outer linearization cone.

On the other hand, in view of (4) it is always possible to replace the contingent cone in (5) by the inner linearization cone. However, the resulting optimality condition may be trivially satisfied, since $L(\bar x, M)$ can itself be empty.

These observations give rise to the following definitions.

Definition 1 The extended Mangasarian-Fromovitz constraint qualification (EMFCQ) holds at $\bar x \in M$ if $L(\bar x, M) \ne \emptyset$. The extended Abadie constraint qualification (EACQ) holds at $\bar x \in M$ if $\Gamma^\star(\bar x, M) = L^\star(\bar x, M)$.

Note that EMFCQ coincides with MFCQ for finite differentiable optimization problems ([24]). Furthermore, EACQ obviously coincides with the Abadie constraint qualification (ACQ, [1]) for finite differentiable optimization problems. In finite optimization MFCQ is stronger than ACQ, but for GSIP this is not necessarily the case, as an example in [33] shows (see, however, [31]). For extensions of the Karush-Kuhn-Tucker constraint qualification to GSIP, see [12]. Explicit formulations of EMFCQ under different structural assumptions on the lower-level problem $Q(\bar x)$ can easily be obtained from the descriptions of $L(\bar x, M)$ given above.


Formulation 


An important difference from finite or standard semi-infinite programming is that GSIP has no single first-order necessary optimality condition. Instead, the explicit form of the condition depends heavily on the structure of the lower-level problem. Explicit dual conditions are derived from the abstract primal first-order optimality condition (5) in two steps. First, the contingent cone is replaced with an appropriate linearization cone. Second, the resulting conditions on certain infinite inequality systems are cast into a dual formulation by means of theorems of the alternative, for example Gordan's lemma ([5,17]).


First-Order Optimality Conditions. In what fol- 
lows, such optimality conditions are given for the struc- 
tures discussed above. Recall that optimality conditions 
are trivial at interior points of M. 


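Theorem 1 ([16]) Let $\bar x \in \partial M \cap M$ be a local minimizer of GSIP at which the Reduction Ansatz holds. Moreover, let there exist a $d_0 \in \mathbb{R}^n$ such that

$$D_x\mathcal{L}(\bar x, \bar y^i, \bar\gamma^i)\, d_0 < 0 \quad \text{for all } 1 \le i \le p$$

(i.e., EMFCQ holds at $\bar x$). Then there exist multipliers $\lambda_i \ge 0$, $i = 1, \dots, p$, with $|\{1 \le i \le p \mid \lambda_i > 0\}| \le n$ such that

$$Df(\bar x) + \sum_{i=1}^{p} \lambda_i D_x\mathcal{L}(\bar x, \bar y^i, \bar\gamma^i) = 0.$$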
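The bound on the number of positive multipliers in Theorem 1 comes from Carathéodory's theorem for cones. If $-Df(\bar x)$ is a nonnegative combination of the gradients, it is also a nonnegative combination of at most $n$ linearly independent gradients. The brute-force sketch below, with hypothetical gradient data, exploits this. It enumerates subsets of at most $n$ gradients and solves each linear system exactly with `fractions.Fraction`. It is meant only for small illustrative instances, since the number of subsets grows combinatorially; in practice a linear programming solver would be used instead.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Sequence


def solve_exact(cols: List[Sequence[float]], rhs: Sequence[float]) -> Optional[List[Fraction]]:
    """Solve sum_i lam_i * cols[i] = rhs exactly by Gauss-Jordan elimination.

    Returns None if the columns are linearly dependent or the system is inconsistent.
    """
    n, k = len(rhs), len(cols)
    rows = [[Fraction(c[r]) for c in cols] + [Fraction(rhs[r])] for r in range(n)]
    for col in range(k):
        piv = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if piv is None:
            return None                                  # dependent columns
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    if any(rows[r][k] != 0 for r in range(k, n)):
        return None                                      # rhs not in the span
    return [rows[i][k] / rows[i][i] for i in range(k)]


def first_order_multipliers(df: Sequence[float],
                            grads: List[Sequence[float]]) -> Optional[List[Fraction]]:
    """Find lam >= 0 with Df + sum_i lam_i grads[i] = 0 and at most n positive entries."""
    n, p = len(df), len(grads)
    if all(v == 0 for v in df):
        return [Fraction(0)] * p
    rhs = [-v for v in df]
    for size in range(1, min(n, p) + 1):
        for subset in combinations(range(p), size):
            lam = solve_exact([grads[i] for i in subset], rhs)
            if lam is not None and all(val >= 0 for val in lam):
                full = [Fraction(0)] * p
                for i, val in zip(subset, lam):
                    full[i] = val
                return full
    return None                                          # -Df not in the generated cone


df = [1.0, 0.0]                                          # hypothetical Df(x_bar)
grads = [[-1.0, -1.0], [-1.0, 1.0], [0.0, 5.0]]          # hypothetical D_x L(x_bar, y^i, gamma^i)
print(first_order_multipliers(df, grads))  # [Fraction(1, 2), Fraction(1, 2), Fraction(0, 1)]
```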


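Theorem 2 ([29,32]) Let $\bar x \in \partial M \cap M$ be a local minimizer of GSIP. Let the lower-level problems $Q(x)$, $x \in U$, be convex for some neighborhood $U$ of $\bar x$, and let $Y(\bar x)$ possess a Slater point. Then for each choice $\gamma \in KKT(\bar x)$ such that there exists a $d_0$ with

$$D_x\mathcal{L}(\bar x, y, \gamma)\, d_0 < 0 \quad \text{for all } y \in Y_0(\bar x), \qquad (6)$$

there exist $y^i \in Y_0(\bar x)$ and multipliers $\lambda_i \ge 0$, $i = 1, \dots, p$, with $|\{1 \le i \le p \mid \lambda_i > 0\}| \le n$, such that

$$Df(\bar x) + \sum_{i=1}^{p} \lambda_i D_x\mathcal{L}(\bar x, y^i, \gamma) = 0.$$

If EMFCQ holds at $\bar x$, then at least one such choice $\gamma$ exists.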


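Theorem 3 ([27,32]) Let $\bar x \in \partial M \cap M$ be a local minimizer of GSIP, and let MFCQ hold at all $y \in Y_0(\bar x)$. Moreover, let there exist a $d_0 \in \mathbb{R}^n$ such that

$$D_x\mathcal{L}(\bar x, y, \gamma)\, d_0 < 0 \quad \text{for all } \gamma \in KKT(\bar x, y),\ y \in Y_0(\bar x)$$

(which is sufficient for EMFCQ at $\bar x$). Then there exist $y^i \in Y_0(\bar x)$, $\gamma^i \in KKT(\bar x, y^i)$, and multipliers $\lambda_i \ge 0$, $i = 1, \dots, p$, with $|\{1 \le i \le p \mid \lambda_i > 0\}| \le n$, such that

$$Df(\bar x) + \sum_{i=1}^{p} \lambda_i D_x\mathcal{L}(\bar x, y^i, \gamma^i) = 0.$$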


Note that, under the convexity assumption on the lower-level problem, Theorem 2 provides a whole family of optimality conditions, parametrized by $\gamma \in KKT(\bar x)$. It thus takes a possible disjunctive structure of $M$ at $\bar x$ into account. On the other hand, in the absence of a nice lower-level structure, Theorem 3 yields a much weaker condition. As examples show, this condition cannot be strengthened without further assumptions.

First-order necessary optimality conditions for GSIP have also been derived under several other structural assumptions and with other theoretical approaches. Without the assumption of EMFCQ, Fritz John-type optimality conditions can be derived ([32]). There also exist optimality conditions that require no regularity condition in either the upper-level or the lower-level problem ([20,31,32]). Conditions under other constraint qualifications are investigated in [12]. Other theoretical approaches to optimality conditions include the linearization approach from [27] and conditions based on quasidifferentiable calculus ([7,27,30]). First-order sufficient optimality conditions for GSIP are examined in [32,34].


Second-Order Optimality Conditions 


Second-order necessary and sufficient optimality conditions can be obtained in a straightforward manner under the Reduction Ansatz. One simply writes down the corresponding conditions for the reduced finite optimization problem with the feasible set from (3). Unfortunately, the Hessians of the optimal value functions $\varphi_i(x) = g(x, y^i(x))$, $1 \le i \le p$, have a more complicated structure than the gradients from (2), due to the appearance of so-called shift terms. In fact, one has

$$D^2\varphi_i(\bar x) = D_x^2\mathcal{L} - \begin{pmatrix} D_{yx}^2\mathcal{L} \\ -D_x v_{L_i} \end{pmatrix}^{\!\top} \begin{pmatrix} D_y^2\mathcal{L} & -D_y v_{L_i}^{\top} \\ -D_y v_{L_i} & 0 \end{pmatrix}^{-1} \begin{pmatrix} D_{yx}^2\mathcal{L} \\ -D_x v_{L_i} \end{pmatrix},$$

where all derivatives of $\mathcal{L}$ are taken at $(\bar x, \bar y^i, \bar\gamma^i)$ and all derivatives of $v$ at $(\bar x, \bar y^i)$. Here $D_y v_{L_i}$ (and analogously $D_x v_{L_i}$) stands for the matrix with rows $D_y v_\ell$, $\ell \in L_i := \{\ell \in L \mid v_\ell(\bar x, \bar y^i) = 0\}$.

Second-order conditions are also known under weaker assumptions. Examples are conditions without the strict complementary slackness assumption of the Reduction Ansatz ([16]) and conditions in connection with second-order epiregularity ([13,28]; see also [4]). A second-order stability analysis for GSIP is given in [21].

Conclusions 


First- and second-order optimality conditions are not 
only of theoretical importance, but also of high signifi- 
cance for the design of efficient numerical methods for 
GSIP. A review of such methods, including methods 
of feasible directions, KKT methods, and discretization 
methods, is given in [13]. 


See also 


> Bilevel Optimization: Feasibility Test and Flexibility 
Index 

> Parametric Optimization: Embeddings, Path 
Following and Singularities 

> Second Order Constraint Qualifications 
One important application of nonlinear least squares is data fitting, or parameter estimation. Ordinary least squares for data fitting assumes that the errors in the independent variables are either zero or negligible. This holds in some situations, but in many others, such as experiments and observations, it does not. Using ordinary least squares there may bias the estimated values of the parameter vector and of the variances [8]. Generalized total least squares problems arise in data fitting when errors in all variables are taken into account.

Suppose that we have chosen a model function $y = \phi(x, t)$ to fit a set of data $y_1, \dots, y_m$ sampled at $m$ points $t_1, \dots, t_m$. Here $x \in \mathbb{R}^n$ is an adjustable parameter vector. The generalized total least squares problem for this data fitting determines optimal values of $x$ and $\tau$ such that the function

$$f(x, \tau) = \frac12 \sum_{j=1}^{m} \left[ w_j \big(\phi(x, \tau_j) - y_j\big)^2 + v_j (\tau_j - t_j)^2 \right] = \frac12 \left[ r^\top W r + e^\top V e \right]$$

is minimized. In this formula:

- $(\phi(x, \tau_j), \tau_j)$, $j = 1, \dots, m$, are the true but unknown values of the pair $(y, t)$;
- $W = \operatorname{diag}(w_1, \dots, w_m)$ and $V = \operatorname{diag}(v_1, \dots, v_m)$, with $w_j \ge 0$, $v_j \ge 0$, $j = 1, \dots, m$, are weighting factors;
- $r$ and $e$ are the $m$-vectors with components $r_j = \phi(x, \tau_j) - y_j$ and $e_j = \tau_j - t_j$, $j = 1, \dots, m$, respectively.
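A direct Python transcription of $f(x, \tau)$ is sketched below. It accumulates the weighted residuals $r_j$ and the weighted corrections $e_j$. The model function, the data and the unit weights are hypothetical choices made for illustration.

```python
from typing import Callable, Sequence


def gtls_objective(phi: Callable[[Sequence[float], float], float],
                   x: Sequence[float], tau: Sequence[float],
                   t: Sequence[float], y: Sequence[float],
                   w: Sequence[float], v: Sequence[float]) -> float:
    """f(x, tau) = 1/2 * sum_j [ w_j (phi(x, tau_j) - y_j)^2 + v_j (tau_j - t_j)^2 ]."""
    total = 0.0
    for tau_j, t_j, y_j, w_j, v_j in zip(tau, t, y, w, v):
        r_j = phi(x, tau_j) - y_j      # residual in the dependent variable
        e_j = tau_j - t_j              # correction to the measured abscissa
        total += w_j * r_j ** 2 + v_j * e_j ** 2
    return 0.5 * total


def line(x: Sequence[float], s: float) -> float:
    """Hypothetical straight-line model y = x_1 + x_2 t."""
    return x[0] + x[1] * s


t_data, y_data = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
ones = [1.0, 1.0, 1.0]
print(gtls_objective(line, [1.0, 2.0], t_data, t_data, y_data, ones, ones))           # 0.0
print(gtls_objective(line, [1.0, 2.0], [0.5, 1.0, 2.0], t_data, y_data, ones, ones))  # 0.625
```

In the second call, moving $\tau_1$ away from $t_1$ costs both a residual $r_1 = 1$ and a correction $e_1 = 0.5$. Both terms enter $f$ with their weights.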

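Generalized total least squares problems can be solved by directly applying any method for ordinary nonlinear least squares or for general minimization problems. However, these methods minimize $f(x, \tau)$ with respect to all $n + m$ variables $x$ and $\tau$ and do not exploit the special structure of the function, so using them directly is not efficient. Assume that the functions $r_j(x, \tau)$, $j = 1, \dots, m$, and hence the function $f(x, \tau)$, are twice continuously differentiable. The first- and second-order derivatives of $f(x, \tau)$ are then

$$\nabla f = \begin{pmatrix} \nabla_x f \\ \nabla_\tau f \end{pmatrix} = \begin{pmatrix} A W r \\ V e + D W r \end{pmatrix}, \qquad \nabla^2 f = \begin{pmatrix} \nabla^2_{xx} f & \nabla^2_{x\tau} f \\ \nabla^2_{\tau x} f & \nabla^2_{\tau\tau} f \end{pmatrix},$$

where

$$\nabla^2_{xx} f = A W A^\top + \sum_{j=1}^{m} w_j r_j \nabla^2_{xx} r_j, \quad \nabla^2_{x\tau} f = A W D + \sum_{j=1}^{m} w_j r_j \nabla^2_{x\tau} r_j, \quad \nabla^2_{\tau\tau} f = V + D W D + \sum_{j=1}^{m} w_j r_j \nabla^2_{\tau\tau} r_j,$$

$$A = [\nabla_x r_1, \dots, \nabla_x r_m], \qquad D = \operatorname{diag}\left( \frac{\partial r_1}{\partial \tau_1}, \dots, \frac{\partial r_m}{\partial \tau_m} \right).$$

Since each $r_j$ depends on $\tau$ only through $\tau_j$, the $(m \times m)$-matrix $\nabla^2_{\tau\tau} f$ is diagonal.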

In developing algorithms for generalized total least squares, it is important to exploit the special structure of the function $f(x, \tau)$ and its derivatives. In particular, the variables $x$ and $\tau$ can be treated separately. W.E. Deming [2], M. O'Neill, I.G. Sinclair and J. Smith [5], and D.R. Powell and J.R. Macdonald [6] proposed approximate Newton methods for polynomial data fitting. These methods evaluate the second-order derivatives $\nabla^2_{xx} f$ and $\nabla^2_{\tau\tau} f$ analytically or numerically. They ignore the mixed partial derivatives $\nabla^2_{x\tau} f$ and $\nabla^2_{\tau x} f$.

When analytical derivatives are used, approximate Newton methods are not very efficient because of these crude approximations. Their behavior improves when the derivatives are evaluated from difference quotients and compensation is made for the ignored mixed parts. In this case, the methods are equivalent to two steps: one Newton step separates the problem variables, and the separated problem is then solved by Newton's method.

An optimization problem is separable if optimization with respect to some of the variables is easier than with respect to the others. Generalized total least squares problems are a kind of separable optimization problem. W.H. Southwell [7] uses the first-order necessary condition to separate the vector $x$ from the vector $\tau$, and then solves the separated problem by Newton's method. Gauss-Newton and quasi-Newton methods can also be used to solve the separated problems.


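When Newton's method is applied to a generalized total least squares problem, the solution of the Newton equation

$$\begin{pmatrix} \nabla^2_{xx} f & \nabla^2_{x\tau} f \\ \nabla^2_{\tau x} f & \nabla^2_{\tau\tau} f \end{pmatrix} \begin{pmatrix} \delta x \\ \delta\tau \end{pmatrix} = -\begin{pmatrix} \nabla_x f \\ \nabla_\tau f \end{pmatrix}$$

gives a correction $(\delta x, \delta\tau)$ to $(x, \tau)$, that is,

$$x_+ = x + \delta x, \qquad \tau_+ = \tau + \delta\tau,$$

where $x_+$, $\tau_+$ denote the new iterate. Suppose the fitting function $\phi(x, t)$ is a polynomial of the form

$$\phi(x, t) = \sum_{i=1}^{n} x_i p_i(t),$$

where $p_i(t)$, $i = 1, \dots, n$, are polynomials orthogonal, with respect to the weights $w_j$, over the current points $\tau_j$. Then the off-diagonal elements of the $(n \times n)$-matrix $\nabla^2_{xx} f$ are all zero, and both $\nabla^2_{xx} f$ and $\nabla^2_{\tau\tau} f$ are diagonal. Approximate Newton methods assume that the elements of $\nabla^2_{x\tau} f$ and $\nabla^2_{\tau x} f$ are negligible. They therefore approximate the Hessian $\nabla^2 f$ by the simple diagonal matrix

$$\begin{pmatrix} \nabla^2_{xx} f & 0 \\ 0 & \nabla^2_{\tau\tau} f \end{pmatrix}.$$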
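Since $\nabla^2_{xx} f$ and $\nabla^2_{\tau\tau} f$ are diagonal, the solutions $\delta x$ and $\delta\tau$ are easily given by

$$\delta x_i = -\frac{\sum_{j=1}^{m} w_j r_j p_i(\tau_j)}{\sum_{j=1}^{m} w_j p_i(\tau_j)^2}, \qquad i = 1, \dots, n,$$

$$\delta\tau_j = -\frac{v_j e_j + w_j r_j \,\partial\phi(x, \tau_j)/\partial\tau_j}{v_j + w_j \big(\partial\phi(x, \tau_j)/\partial\tau_j\big)^2 + w_j r_j \,\partial^2\phi(x, \tau_j)/\partial\tau_j^2}, \qquad j = 1, \dots, m.$$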
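With the diagonal Hessian approximation, one approximate Newton step costs only $O(nm)$ operations. The sketch below evaluates the two formulas for $\delta x_i$ and $\delta\tau_j$ term by term. It assumes the caller has already tabulated $p_i(\tau_j)$, together with the first and second derivatives of $\phi(x, \cdot)$ at the current points $\tau_j$. The argument names and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def approximate_newton_step(P: List[Sequence[float]], r: Sequence[float],
                            e: Sequence[float], w: Sequence[float], v: Sequence[float],
                            dphi: Sequence[float], d2phi: Sequence[float]
                            ) -> Tuple[List[float], List[float]]:
    """Diagonal (approximate) Newton step for orthogonal-polynomial fitting.

    P[i][j] = p_{i+1}(tau_j); dphi[j] and d2phi[j] are the first and second
    tau-derivatives of phi(x, .) at tau_j. The mixed Hessian blocks are ignored.
    """
    dx = [-sum(wj * rj * pij for wj, rj, pij in zip(w, r, Pi))
          / sum(wj * pij * pij for wj, pij in zip(w, Pi))
          for Pi in P]
    dtau = [-(vj * ej + wj * rj * gj) / (vj + wj * gj * gj + wj * rj * hj)
            for rj, ej, wj, vj, gj, hj in zip(r, e, w, v, dphi, d2phi)]
    return dx, dtau


# Hypothetical data: p_1 = 1, p_2 = t - 1 at tau = (0, 2) with unit weights and x = (1, 1),
# so phi(x, t) = t, dphi/dtau = 1 and d2phi/dtau2 = 0; the data are y = (0.5, 2.5).
P = [[1.0, 1.0], [-1.0, 1.0]]
r = [-0.5, -0.5]            # phi(x, tau_j) - y_j
e = [0.0, 0.0]              # tau_j - t_j
dx, dtau = approximate_newton_step(P, r, e, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
```

Here the step raises the intercept coefficient by $0.5$, leaves the slope coefficient unchanged, and moves each $\tau_j$ by $0.25$.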
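Polynomials $p_i(t)$, $i = 1, \dots, n$, orthogonal over a set of points $\tau_j$, $j = 1, \dots, m$, can be generated using the recurrence relation

$$p_1(t) = 1, \qquad p_2(t) = t - \alpha_1,$$

$$p_i(t) = (t - \alpha_{i-1})\, p_{i-1}(t) - \beta_{i-1}\, p_{i-2}(t), \qquad i = 3, \dots, n,$$

where

$$\alpha_{i-1} = \frac{\sum_{j=1}^{m} w_j \tau_j\, p_{i-1}(\tau_j)^2}{\sum_{j=1}^{m} w_j\, p_{i-1}(\tau_j)^2}, \qquad \beta_{i-1} = \frac{\sum_{j=1}^{m} w_j \tau_j\, p_{i-1}(\tau_j)\, p_{i-2}(\tau_j)}{\sum_{j=1}^{m} w_j\, p_{i-2}(\tau_j)^2}.$$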
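The recurrence translates directly into code. The sketch below tabulates $p_i(\tau_j)$ and stores $\alpha_k$ and $\beta_k$ under the indices used above. It assumes $n$ does not exceed the number of distinct points; otherwise a weighted norm vanishes. The points and weights in the example are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_inner(w: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Discrete inner product sum_j w_j a_j b_j over the points tau_j."""
    return sum(wj * aj * bj for wj, aj, bj in zip(w, a, b))


def orthogonal_polys(tau: Sequence[float], w: Sequence[float], n: int
                     ) -> Tuple[List[List[float]], Dict[int, float], Dict[int, float]]:
    """Return P with P[i-1][j] = p_i(tau_j), i = 1..n, plus alpha and beta keyed like the text."""
    P = [[1.0] * len(tau)]                                   # p_1(t) = 1
    alpha: Dict[int, float] = {}
    beta: Dict[int, float] = {}
    for i in range(2, n + 1):
        p1 = P[i - 2]                                        # values of p_{i-1}
        t_p1 = [tj * pj for tj, pj in zip(tau, p1)]
        alpha[i - 1] = weighted_inner(w, t_p1, p1) / weighted_inner(w, p1, p1)
        if i == 2:
            P.append([tj - alpha[1] for tj in tau])          # p_2(t) = t - alpha_1
        else:
            p2 = P[i - 3]                                    # values of p_{i-2}
            beta[i - 1] = weighted_inner(w, t_p1, p2) / weighted_inner(w, p2, p2)
            P.append([(tj - alpha[i - 1]) * a - beta[i - 1] * b
                      for tj, a, b in zip(tau, p1, p2)])
    return P, alpha, beta


tau = [0.0, 0.5, 1.0, 1.5, 2.0]
w = [1.0, 2.0, 1.0, 2.0, 1.0]
P, alpha, beta = orthogonal_polys(tau, w, 4)
```

The weighted inner products $\sum_j w_j p_i(\tau_j) p_k(\tau_j)$ for $i \ne k$ vanish up to rounding. This is what makes $\nabla^2_{xx} f$ diagonal in the approximate Newton method.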
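Approximate Newton methods begin the iteration from the initial point $x = 0$ and $\tau_j = t_j$, $j = 1, \dots, m$. Each iteration has two stages. First, the polynomials $p_i(t)$, $i = 1, \dots, n$, orthogonal over the set of points $\tau_j^{(k)}$, $j = 1, \dots, m$, are calculated from the recurrence relation. Then the iteration

$$x^{(k+1)} = x^{(k)} + \delta x, \qquad \tau^{(k+1)} = \tau^{(k)} + \delta\tau$$

is performed to generate a new iterate. The process is repeated until convergence is reached. The resulting fitting polynomial may be required in the form of a power series

$$\phi(x, t) = \sum_{i=1}^{n} c_i t^{i-1} = \sum_{i=1}^{n} x_i p_i(t).$$

The coefficients $c_i$ can then be calculated from

$$c_i = \sum_{k=i}^{n} x_k \sigma_{i,k}, \qquad 1 \le i \le n,$$

where $\sigma_{i,k}$, the coefficient of $t^{i-1}$ in $p_k(t)$, is given by

$$\sigma_{i,k} = \begin{cases} 1 & \text{if } i = k, \\ 0 & \text{if } i > k \text{ or } i < 1 \text{ or } k < 1, \end{cases}$$

$$\sigma_{i+1,k+1} = \sigma_{i,k} - \alpha_k \sigma_{i+1,k} - \beta_k \sigma_{i+1,k-1}, \qquad k \ge 1.$$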
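The $\sigma$-recurrence amounts to building the coefficient vector of each $p_{k+1}$ from those of $p_k$ and $p_{k-1}$. The sketch below implements it with 0-based storage, so that `sigma[k][i]` holds $\sigma_{i+1,k+1}$. The recurrence coefficients in the example are hypothetical. They correspond to $p_1 = 1$, $p_2 = t - 1$ and $p_3 = (t-1)^2 - 0.5$.

```python
from typing import Dict, List, Sequence


def power_series_coefficients(x: Sequence[float], alpha: Dict[int, float],
                              beta: Dict[int, float]) -> List[float]:
    """Return [c_1, ..., c_n], where c_i multiplies t**(i-1) in sum_k x_k p_k(t)."""
    n = len(x)
    # sigma[k][i] (0-based) = coefficient of t**i in p_{k+1}(t)
    sigma = [[0.0] * n for _ in range(n)]
    sigma[0][0] = 1.0                                          # p_1(t) = 1
    if n > 1:
        sigma[1][0], sigma[1][1] = -alpha[1], 1.0              # p_2(t) = t - alpha_1
    for k in range(2, n):                  # p_{k+1} = (t - alpha_k) p_k - beta_k p_{k-1}
        for i in range(n):
            shifted = sigma[k - 1][i - 1] if i > 0 else 0.0    # t * p_k raises degrees by one
            sigma[k][i] = shifted - alpha[k] * sigma[k - 1][i] - beta[k] * sigma[k - 2][i]
    return [sum(x[k] * sigma[k][i] for k in range(n)) for i in range(n)]


print(power_series_coefficients([1.0, 2.0, 3.0], {1: 1.0, 2: 1.0}, {2: 0.5}))  # [0.5, -4.0, 3.0]
```

The printed result is the expansion $1 + 2(t-1) + 3\big((t-1)^2 - 0.5\big) = 0.5 - 4t + 3t^2$.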


Powell and Macdonald extended the method to the more general case where $\phi(x, t)$ is a general nonlinear function of both $x$ and $t$. In this case the $(n \times n)$-matrix $\nabla^2_{xx} f$ is no longer diagonal, and the correction $\delta x$ requires the solution of the equations

$$\nabla^2_{xx} f\, \delta x = -\nabla_x f.$$

To account for the omitted mixed partial derivatives $\nabla^2_{x\tau} f$ and $\nabla^2_{\tau x} f$, they approximate the derivatives in $\nabla^2_{xx} f$ and $\nabla^2_{\tau\tau} f$ by ‘unconventional formulas’. These replace analytical derivatives or the usual difference approximations, so that the omitted parts are compensated to some degree. In fact, their approximate Newton method is equivalent, in some sense, to the separated Newton method.

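Approximate Newton methods require evaluations of second-order derivatives of the problem functions. Ignoring all the second-order terms in $\nabla^2_{xx} f$, $\nabla^2_{x\tau} f$, $\nabla^2_{\tau x} f$ and $\nabla^2_{\tau\tau} f$ gives an approximation to $\nabla^2 f$ that uses only the first-order derivatives of the functions $r_j$ and $e_j$, $j = 1, \dots, m$. The Gauss-Newton method for generalized total least squares is the iteration scheme $x^{(k+1)} = x^{(k)} + \delta x$, $\tau^{(k+1)} = \tau^{(k)} + \delta\tau$, with $\delta x$ and $\delta\tau$ given by

$$\begin{pmatrix} A_k W A_k^\top & A_k W D_k \\ D_k W A_k^\top & V + D_k W D_k \end{pmatrix} \begin{pmatrix} \delta x \\ \delta\tau \end{pmatrix} = -\begin{pmatrix} A_k W r^{(k)} \\ V e^{(k)} + D_k W r^{(k)} \end{pmatrix}.$$

The special structure of this system can be exploited to save work in finding its solution. Define $P_k = V + D_k W D_k$. From the bottom part of the system we have

$$\delta\tau = -P_k^{-1}\left[ V e^{(k)} + D_k W r^{(k)} + D_k W A_k^\top \delta x \right].$$

Since $P_k$ is a diagonal matrix, $\delta\tau$ follows directly by substitution once $\delta x$ is known. Substituting $\delta\tau$ into the top part of the system, we obtain

$$\left[ A_k W A_k^\top - A_k W D_k P_k^{-1} D_k W A_k^\top \right] \delta x = A_k W^{1/2} b^{(k)}$$

with

$$b^{(k)} = W^{1/2}\left[ -r^{(k)} + D_k P_k^{-1}\big( V e^{(k)} + D_k W r^{(k)} \big) \right].$$

This equation can be expressed as

$$A_k W^{1/2} U_k W^{1/2} A_k^\top \delta x = A_k W^{1/2} b^{(k)},$$

where $U_k = I - W^{1/2} D_k P_k^{-1} D_k W^{1/2}$ is a diagonal matrix with diagonal elements $v_j / [v_j + w_j (\partial r_j/\partial\tau_j)^2] > 0$, $j = 1, \dots, m$. To compute $\delta x$, first perform a QR factorization of the matrix $U_k^{1/2} W^{1/2} A_k^\top$,

$$U_k^{1/2} W^{1/2} A_k^\top = Q_k \begin{pmatrix} R_k \\ 0 \end{pmatrix},$$

and then use back substitution in

$$R_k\, \delta x = \left[ Q_k^\top U_k^{-1/2} b^{(k)} \right]_{1:n},$$

where $[\cdot]_{1:n}$ denotes the first $n$ components.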
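The elimination of $\delta\tau$ can be sketched in pure Python as follows. The sketch proceeds in four steps:

1. Form $U_k^{1/2} W^{1/2} A_k^\top$ column by column.
2. Factor it with modified Gram-Schmidt. This is only one possible QR method; the text does not prescribe one.
3. Solve $R_k\,\delta x = [Q_k^\top U_k^{-1/2} b^{(k)}]_{1:n}$ by back substitution.
4. Recover $\delta\tau$ from the bottom block.

It assumes $v_j > 0$ and that the scaled matrix has full column rank. The straight-line example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gauss_newton_step(A: List[Sequence[float]], dr_dtau: Sequence[float],
                      r: Sequence[float], e: Sequence[float],
                      w: Sequence[float], v: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One GTLS Gauss-Newton step (delta_x, delta_tau), eliminating delta_tau first.

    A[i][j] = dr_j/dx_i (A is n x m); dr_dtau[j] = dr_j/dtau_j is the diagonal of D.
    """
    n, m = len(A), len(r)
    P = [vj + wj * dj * dj for vj, wj, dj in zip(v, w, dr_dtau)]         # diagonal of P_k
    u = [vj / pj for vj, pj in zip(v, P)]                                 # diagonal of U_k
    b = [math.sqrt(wj) * (-rj + dj * (vj * ej + dj * wj * rj) / pj)
         for rj, ej, wj, vj, dj, pj in zip(r, e, w, v, dr_dtau, P)]       # b^(k)
    # Columns of U^{1/2} W^{1/2} A^T and the right-hand side U^{-1/2} b.
    cols = [[math.sqrt(u[j] * w[j]) * A[i][j] for j in range(m)] for i in range(n)]
    rhs = [b[j] / math.sqrt(u[j]) for j in range(m)]
    # Thin QR factorization by modified Gram-Schmidt (Q stored as a list of columns).
    Q: List[List[float]] = []
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q = cols[i][:]
        for k in range(i):
            R[k][i] = sum(a * c for a, c in zip(Q[k], q))
            q = [c - R[k][i] * a for a, c in zip(Q[k], q)]
        R[i][i] = math.sqrt(sum(c * c for c in q))
        Q.append([c / R[i][i] for c in q])
    # Back substitution in R dx = Q^T U^{-1/2} b.
    qtb = [sum(a * c for a, c in zip(Q[i], rhs)) for i in range(n)]
    dx = [0.0] * n
    for i in reversed(range(n)):
        dx[i] = (qtb[i] - sum(R[i][k] * dx[k] for k in range(i + 1, n))) / R[i][i]
    # Recover delta_tau from the bottom block (P_k is diagonal).
    dtau = [-(v[j] * e[j] + dr_dtau[j] * w[j] * r[j]
              + dr_dtau[j] * w[j] * sum(A[i][j] * dx[i] for i in range(n))) / P[j]
            for j in range(m)]
    return dx, dtau


# Hypothetical straight-line fit phi(x, t) = x_1 + x_2 t at x_2 = 2.
tau = [0.0, 1.0, 2.0, 3.0]
A = [[1.0] * 4, tau]                  # dr_j/dx_1 = 1, dr_j/dx_2 = tau_j
dr_dtau = [2.0] * 4                   # dr_j/dtau_j = x_2
r = [0.1, -0.2, 0.3, -0.1]
e = [0.05, -0.02, 0.0, 0.01]
w = [1.0, 1.0, 2.0, 1.0]
v = [4.0, 4.0, 4.0, 4.0]
dx, dtau = gauss_newton_step(A, dr_dtau, r, e, w, v)
```

A convenient check is to substitute the returned $(\delta x, \delta\tau)$ back into the full $(n+m)$-dimensional Gauss-Newton system. The factorization itself only involves an $m \times n$ matrix, which is the point of the elimination when $m$ is large.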


The Gauss-Newton method is locally convergent. Its convergence behavior depends on how close the Gauss-Newton matrix is to the true Hessian $\nabla^2 f$ at the solution. To make the Gauss-Newton method globally convergent, a line search technique or a trust region strategy can be used. Let

$$J_k = \begin{pmatrix} A_k W^{1/2} & 0 \\ D_k W^{1/2} & V^{1/2} \end{pmatrix}.$$

Then

$$J_k J_k^\top = \begin{pmatrix} A_k W A_k^\top & A_k W D_k \\ D_k W A_k^\top & V + D_k W D_k \end{pmatrix}.$$

Hence the Gauss-Newton matrix is at least positive semidefinite, and often positive definite, and $(\delta x, \delta\tau)$ is a descent direction of $f(x, \tau)$ at $(x^{(k)}, \tau^{(k)})$. A line search along this direction determines a step length $\alpha_k$ satisfying some descent conditions, and the new iterate is

$$x^{(k+1)} = x^{(k)} + \alpha_k \delta x, \qquad \tau^{(k+1)} = \tau^{(k)} + \alpha_k \delta\tau.$$

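P.T. Boggs, R.H. Byrd and R.B. Schnabel [1] use a trust region technique in their modification of the Gauss-Newton method for generalized total least squares problems. The modification is a generalization of the Levenberg-Marquardt method. It solves the trust region subproblem

$$\min_{\delta z}\ q_k(\delta z) = \left\| J_k^\top \delta z + h_k \right\|^2 \quad \text{s.t.}\quad \|\delta z\| \le \Delta_k,$$

where $\Delta_k$ is the trust region radius, $\delta z = (\delta x, \delta\tau)$, and $h_k = \big(W^{1/2} r^{(k)}, V^{1/2} e^{(k)}\big)$.

The solution of the subproblem, denoted by $\delta z(\mu)$, satisfies the system of equations

$$B_k \begin{pmatrix} \delta x \\ \delta\tau \end{pmatrix} = -\begin{pmatrix} A_k W r^{(k)} \\ V e^{(k)} + D_k W r^{(k)} \end{pmatrix}, \qquad \|\delta z(\mu)\| = \Delta_k,$$

with $\mu > 0$, unless $\|\delta z(0)\| \le \Delta_k$. Here $B_k$ denotes the matrix

$$B_k = \begin{pmatrix} A_k W A_k^\top + \mu I & A_k W D_k \\ D_k W A_k^\top & V + D_k W D_k + \mu I \end{pmatrix}.$$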

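Let $P_k = V + D_k W D_k + \mu I$. From the bottom part of the system, we get

$$\delta\tau = -P_k^{-1}\left[ V e^{(k)} + D_k W r^{(k)} + D_k W A_k^\top \delta x \right].$$

Substituting it into the top part, we have

$$\left( A_k W^{1/2} U_k W^{1/2} A_k^\top + \mu I \right) \delta x = A_k W^{1/2} b^{(k)},$$

$$U_k = I - W^{1/2} D_k P_k^{-1} D_k W^{1/2}, \qquad b^{(k)} = W^{1/2}\left[ -r^{(k)} + D_k P_k^{-1}\big( V e^{(k)} + D_k W r^{(k)} \big) \right].$$

This system is the normal equation of the linear least squares problem

$$\min_{\delta x} \left\| \begin{pmatrix} U_k^{1/2} W^{1/2} A_k^\top \\ \mu^{1/2} I \end{pmatrix} \delta x - \begin{pmatrix} U_k^{-1/2} b^{(k)} \\ 0 \end{pmatrix} \right\|.$$

The solution $\delta x$ can therefore be obtained in three steps: a QR factorization of the matrix $U_k^{1/2} W^{1/2} A_k^\top$, a sequence of plane rotations to eliminate $\mu^{1/2} I$, and back substitution.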

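For a given value of $\mu$, $\delta x(\mu)$ is obtained from the solution of the system, and then $\delta\tau(\mu)$ by substitution. Let $\rho \in (0, 1)$ be a preset tolerance. If

$$|\varphi(\mu)| = \big| \|\delta z(\mu)\| - \Delta_k \big| \le \rho \Delta_k$$

is satisfied, $\delta z(\mu)$ is accepted as an approximate solution of the trust region subproblem. Otherwise, $\mu$ is updated to a new value $\mu^{(\ell+1)}$, and a solution $\delta z(\mu^{(\ell+1)})$ is recomputed from the system. Moré's updating formula [4]

$$\mu^{(\ell+1)} = \mu^{(\ell)} - \frac{\varphi(\mu^{(\ell)})}{\varphi'(\mu^{(\ell)})} \cdot \frac{\|\delta z(\mu^{(\ell)})\|}{\Delta_k}$$

can be used to generate $\mu^{(\ell+1)}$, where $\varphi'(\mu^{(\ell)})$ is evaluated from the difference approximation

$$\varphi'(\mu^{(\ell)}) \approx \frac{\varphi(\mu^{(\ell)}) - \varphi(\mu^{(\ell-1)})}{\mu^{(\ell)} - \mu^{(\ell-1)}}.$$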
Vo(ur’) pO = ptt 


For generalized total least squares problems, the parameter vector $x$ and the variable vector $\tau$ can be treated separately. The first-order necessary condition for a point to be a solution of the problem can be used to eliminate the $\tau$ dependence of $f(x, \tau)$. Consider the system of equations

$$\nabla_\tau f = V e + D W r = 0.$$

These are $m$ nonlinear equations in $m$ unknowns. For a fixed value of $x$, each of them contains only one unknown $\tau_j$:

$$v_j(\tau_j - t_j) + w_j\big(\phi(x, \tau_j) - y_j\big) \frac{\partial\phi(x, \tau_j)}{\partial\tau_j} = 0, \qquad j = 1, \dots, m.$$

Sometimes these equations can be solved algebraically to give an explicit solution $\tau(x)$. Substituting it into $f(x, \tau)$ then allows $x$ to be determined by any conventional method for minimizing $f(x, \tau(x))$, which is now a function of $x$ alone. In most cases, however, an explicit form of $\tau(x)$ is impossible or difficult to obtain. Each equation must then be solved numerically for each given value of $x$ by minimizing the functions

$$\psi_j(x, \tau_j) = \frac12\left[ w_j\big(\phi(x, \tau_j) - y_j\big)^2 + v_j(\tau_j - t_j)^2 \right], \qquad j = 1, \dots, m.$$

This yields an approximate solution $\tilde\tau(x)$ of $\tau(x)$. The values of $f(x, \tau(x))$ and its derivatives with respect to $x$ are then evaluated from $x$ and $\tilde\tau(x)$.

Assume that $\nabla^2_{\tau\tau} f(x^*, \tau^*)$ is positive definite. The implicit function theorem [3] then gives open neighborhoods $N(x^*)$ and $N(\tau^*)$ of $x^*$ and $\tau^*$ with the following property: for any $x \in N(x^*)$, a unique $\tau$ satisfying the system exists in $N(\tau^*)$. This $\tau$ is the vector $\tau(x)$. Furthermore, $\tau(x)$ is continuously differentiable, and $\nabla^2_{\tau\tau} f(x, \tau(x))$ is positive definite for all $x \in N(x^*)$. Substituting $\tau(x)$ into $f(x, \tau)$, we get a separated minimization problem

$$\min_x\ f(x, \tau(x)).$$

It is defined only in terms of $x$ and reduces the problem dimension from $m + n$ to $n$. The separation is particularly efficient because $m$ is usually very large. The chain rule, the differentiability of $\tau(x)$ and the fact that $\nabla_\tau f = 0$ give the derivatives of $f(x, \tau(x))$:

$$g(x) = \nabla_x f + \nabla_x \tau\, \nabla_\tau f = \nabla_x f,$$

$$G(x) = \nabla^2_{xx} f + \nabla^2_{x\tau} f\, (\nabla_x \tau)^\top = \nabla^2_{xx} f - \nabla^2_{x\tau} f \left( \nabla^2_{\tau\tau} f \right)^{-1} \nabla^2_{\tau x} f.$$

$G(x)$ is the Schur complement of $\nabla^2_{\tau\tau} f$ in $\nabla^2 f$, so positive definiteness of $\nabla^2 f$ implies that of $G(x)$. In particular, if $\nabla^2 f$ is positive definite at the solution $(x^*, \tau^*)$, then $G(x^*)$ is positive definite, too.

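The separated Newton method minimizes $f(x, \tau(x))$ using the Newton iteration

$$G_k\, \delta x = -g^{(k)}, \qquad x^{(k+1)} = x^{(k)} + \delta x.$$

This generates a sequence $\{x^{(k)}\}$, where $G_k$ and $g^{(k)}$ are evaluated at $x^{(k)}$ and $\tilde\tau(x^{(k)})$. Here $\tilde\tau(x^{(k)})$ is an approximate solution of the system $\nabla_\tau f = 0$, obtained by the Newton iteration

$$\tau_j^{(s+1)} = \tau_j^{(s)} - \frac{v_j(\tau_j^{(s)} - t_j) + w_j r_j\, \partial\phi/\partial\tau_j}{v_j + w_j (\partial\phi/\partial\tau_j)^2 + w_j r_j\, \partial^2\phi/\partial\tau_j^2}, \qquad s = 1, 2, \dots,\ j = 1, \dots, m.$$

In this formula, $r_j$ and the derivatives of $\phi$ are evaluated at $(x^{(k)}, \tau_j^{(s)})$. Let $\epsilon > 0$ be a preset small constant. When

$$\left| \tau_j^{(s+1)} - \tau_j^{(s)} \right| < \epsilon,$$

$\tau_j^{(s+1)}$ is accepted as $\tilde\tau_j(x^{(k)})$. The starting values of these iterations can be $t_j$ for $k = 1$ and $\tilde\tau_j(x^{(k-1)})$ for $k \ge 2$, $j = 1, \dots, m$.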
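Each inner iteration is a scalar Newton step for $\partial\psi_j/\partial\tau_j = 0$. The sketch below computes $\tilde\tau_j(x)$ for a single data point, starting from $t_j$ as in the case $k = 1$. The exponential model, its derivatives and the data are hypothetical.

```python
import math
from typing import Callable, Sequence

Model = Callable[[Sequence[float], float], float]


def solve_tau_j(phi: Model, dphi: Model, d2phi: Model, x: Sequence[float],
                t_j: float, y_j: float, w_j: float, v_j: float,
                tau0: float, eps: float = 1e-12, max_iter: int = 50) -> float:
    """Newton iteration on d psi_j / d tau_j = 0 for fixed x (one data point)."""
    tau = tau0
    for _ in range(max_iter):
        r = phi(x, tau) - y_j
        g = dphi(x, tau)
        grad = v_j * (tau - t_j) + w_j * r * g               # d psi_j / d tau_j
        hess = v_j + w_j * g * g + w_j * r * d2phi(x, tau)   # d^2 psi_j / d tau_j^2
        tau_new = tau - grad / hess
        if abs(tau_new - tau) < eps:                         # |tau^(s+1) - tau^(s)| < eps
            return tau_new
        tau = tau_new
    return tau


# Hypothetical model phi(x, t) = x_1 exp(x_2 t) and its tau-derivatives.
def phi(x: Sequence[float], s: float) -> float:
    return x[0] * math.exp(x[1] * s)


def dphi(x: Sequence[float], s: float) -> float:
    return x[0] * x[1] * math.exp(x[1] * s)


def d2phi(x: Sequence[float], s: float) -> float:
    return x[0] * x[1] ** 2 * math.exp(x[1] * s)


x = [1.0, 0.5]
tau_1 = solve_tau_j(phi, dphi, d2phi, x, t_j=1.0, y_j=1.7, w_j=1.0, v_j=1.0, tau0=1.0)
```

In a separated method, this routine would be called for every $j$ at each outer iterate $x^{(k)}$, warm-started from the previous $\tilde\tau_j$.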

A careful comparison shows how the Powell-Macdonald method differs from the separated Newton method. For a given value of $x$, the former carries out only one Newton iteration for the system $\nabla_\tau f = 0$. The latter solves the system quite accurately by repeating the iteration.

The separated Newton method still requires the evaluation of second-order derivatives. Ignoring the second-order terms in all the derivatives $\nabla^2_{xx} f$, $\nabla^2_{x\tau} f$, $\nabla^2_{\tau x} f$ and $\nabla^2_{\tau\tau} f$, we get an approximation to $G_k$,

$$M_k = A_k W^{1/2} U_k W^{1/2} A_k^\top, \qquad U_k = \left( I + V^{-1} D_k W D_k \right)^{-1}.$$

Then the iteration

$$M_k\, \delta x = -g^{(k)}, \qquad x^{(k+1)} = x^{(k)} + \delta x$$

is the separated Gauss-Newton method [8]. For ordinary least squares, the convergence of the Gauss-Newton method depends on how close the Gauss-Newton matrix is to the true Hessian. The same holds for the separated Gauss-Newton method. If $M(x^*) = G(x^*)$, the method is locally convergent and the rate of convergence is quadratic. If $M(x^*) \ne G(x^*)$, the method may not converge, and if it does converge, the rate is at best linear. To force global convergence, line search or trust region techniques can be incorporated.

For large-residual problems, the Gauss-Newton matrix $M$ is not a good approximation to $G$, and quasi-Newton updates can be used to generate better approximations. With quasi-Newton updates such as the BFGS update, the separated problem is treated as a general minimization problem. The special structure of the problem function is then not exploited, and the approximations are not obtained directly from first-order derivatives. The vectors $s^{(k)}$ and $y^{(k)}$ used in the updating formulas can be defined by

$$s^{(k)} = x^{(k+1)} - x^{(k)},$$

$$y^{(k)} = g\big(x^{(k+1)}, \tilde\tau(x^{(k+1)})\big) - g\big(x^{(k)}, \tilde\tau(x^{(k)})\big).$$

Alternative definitions of $y^{(k)}$ can be derived from the special structure of the derivatives. Two commonly used definitions are

$$y^{(k)} = A_{k+1} W A_{k+1}^\top \delta x + A_{k+1} W D_{k+1} \delta\tau + (A_{k+1} - A_k) W r^{(k+1)},$$

$$y^{(k)} = A_{k+1} W \big( r^{(k+1)} - r^{(k)} \big) + (A_{k+1} - A_k) W r^{(k+1)},$$

where $\delta\tau = \tilde\tau(x^{(k+1)}) - \tilde\tau(x^{(k)})$. Numerical experiments favor the last definition of $y^{(k)}$ [9].

The separated hybrid method combines the separated Gauss-Newton method and the separated BFGS method. It is a simple generalization of the hybrid method for ordinary nonlinear least squares problems, in which a test [9] decides which step to take at each iteration. When the test chooses the Gauss-Newton step, the approximation $B_k$ to $G_k$ is set to the Gauss-Newton matrix $M_k$. When it chooses the BFGS step, $B_k$ is obtained from $B_{k-1}$ by the BFGS updating formula.

When separated methods are used to solve generalized total least squares problems, computational savings can be obtained by initially ignoring the errors in $t_j$, $j = 1, \dots, m$, and solving only an ordinary nonlinear least squares problem. Once a reasonable reduction in the objective function has been achieved, errors in all variables are taken into account and separated methods are applied. This modification of any separated method is effective in solving generalized total least squares problems.


See also 


> ABS Algorithms for Linear Equations and Linear 
Least Squares 

> ABS Algorithms for Optimization 

> Gauss—Newton Method: Least Squares, Relation to 
Newton’s Method 

> Least Squares Orthogonal Polynomials 

> Least Squares Problems 

> Nonlinear Least Squares: Newton-type Methods 

> Nonlinear Least Squares Problems 

> Nonlinear Least Squares: Trust Region Methods 
Introduction


The theory and applications of variational inequalities (VIs) and the nonlinear complementarity problem (NCP) have proved to be a very powerful tool for studying a wide range of problems in mechanics, physics, optimization, and the applied sciences. A survey of the developments of VI and NCP is given in [7]. In recent years, considerable interest has been shown in developing extensions and generalizations of the VI problem. An important class of such generalizations, introduced in [2], is the so-called generalized variational inequality (GVI). GVIs have many important applications, for example in mathematical physics, control theory, economics, and transportation equilibrium (see, e.g., [1,11]).

The traffic equilibrium problem illustrates this. It can be formulated as a VI when the travel cost between any two given nodes for a given flow is fixed [4]. However, traffic conditions may vary, so that the travel cost between two given nodes is not fixed but lies within a cost interval. In this case, the corresponding problem can be formulated as a GVI. Moreover, GVI provides a unifying framework for many general problems, such as fixed-point, optimization, and complementarity problems. In what follows, we give an overview of recent developments concerning the existence of solutions and equivalent reformulations.


Problem Formulation and Framework 


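In its general form, the GVI problem can be stated as follows: find $x^* \in X$ and $u^* \in F(x^*)$ such that

$$\langle u^*, y - x^* \rangle \ge 0 \quad \forall y \in X,$$

where

• $\langle \cdot, \cdot \rangle$ denotes the usual inner product in $\mathbb{R}^n$,

• $X \subset \mathbb{R}^n$ is a nonempty, closed and convex set,

• $F: \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is a set-valued map, i.e., an operator that associates with each $x \in \mathbb{R}^n$ a set $F(x) \subset \mathbb{R}^n$.

If $F$ is a single-valued function, then the GVI problem reduces to the classical VI, which is to find $x^* \in X$ such that

$$\langle F(x^*), y - x^* \rangle \ge 0 \quad \forall y \in X.$$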
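When $X$ is a box, the defining inequality of the GVI can be checked exactly, because a linear function attains its minimum over a box coordinatewise at an endpoint. The sketch below returns $\min_{y \in X} \langle u^*, y - x^* \rangle$. Assuming $x^* \in X$ and $u^* \in F(x^*)$, the pair solves the GVI exactly when this value is nonnegative. The box and the interval-valued map $F(x) = [x - 1, x + 1]$ used in the example are hypothetical.

```python
from typing import Sequence


def vi_gap_on_box(x_star: Sequence[float], u_star: Sequence[float],
                  lower: Sequence[float], upper: Sequence[float]) -> float:
    """min over y in X = prod [lower_i, upper_i] of <u*, y - x*>.

    The objective is linear and separable, so each coordinate of the
    minimizer sits at one of the two endpoints of its interval.
    """
    return sum(min(u * (lo - x), u * (hi - x))
               for x, u, lo, hi in zip(x_star, u_star, lower, upper))


# Hypothetical one-dimensional GVI: X = [0, 2], F(x) = [x - 1, x + 1].
print(vi_gap_on_box([0.0], [0.5], [0.0], [2.0]))   # 0.0   -> (x*, u*) = (0, 0.5) solves the GVI
print(vi_gap_on_box([1.0], [0.5], [0.0], [2.0]))   # -0.5  -> this choice of u* fails at x* = 1
print(vi_gap_on_box([1.0], [0.0], [0.0], [2.0]))   # u* = 0 is also in F(1)
```

The last two calls show that, for a set-valued $F$, the choice of $u^* \in F(x^*)$ matters. The point $x^* = 1$ fails with $u^* = 0.5$ but succeeds with $u^* = 0 \in F(1)$.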


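In connection with the set-valued map $F: \mathbb{R}^n \rightrightarrows \mathbb{R}^n$, a few definitions need to be recalled. First, $F$ is characterized by its graph:

$$\operatorname{graph}(F) = \{ (x, u) \in \mathbb{R}^n \times \mathbb{R}^n : u \in F(x) \}.$$

The image of $X$ under $F$ is

$$F(X) = \bigcup_{x \in X} F(x),$$

the inverse of $F$ is defined by $F^{-1}(u) = \{ x : u \in F(x) \}$, and the domain of $F$ is the set $\operatorname{dom}(F) = \{ x \in \mathbb{R}^n : F(x) \ne \emptyset \}$.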


Throughout, we assume that $\operatorname{dom}(F) \supseteq X$. Over the past two decades, most effort has been concentrated on the question of the existence of solutions to GVI problems. This question involves several continuity properties of set-valued maps, which we recall in the sequel.

• A set-valued map $F: \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is said to be upper semicontinuous (u.s.c.) at $x \in \mathbb{R}^n$ if for each open set $V \supseteq F(x)$ there exists a neighborhood $U$ of $x$ such that $F(U) \subset V$. $F$ is u.s.c. on a set $X \subset \mathbb{R}^n$ if it is u.s.c. at every point of $X$.

• A set-valued map $F: \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is upper hemicontinuous on $X \subset \mathbb{R}^n$ if its restriction to line segments of $X$ is upper semicontinuous.
The existence of solutions of GVI also depends on monotonicity-type properties of set-valued maps. We recall the definitions below.

(M1) $F$ is quasimonotone on $X$ if, for every pair of distinct points $x, y \in X$ and every $u \in F(x)$, $v \in F(y)$, we have

$$\langle v, x - y \rangle > 0 \implies \langle u, x - y \rangle \ge 0.$$

(M2) $F$ is properly quasimonotone on $X$ if the following holds for any $x^1, \dots, x^m \in X$ and any $\lambda_1, \dots, \lambda_m \ge 0$ with $\sum_{i=1}^{m} \lambda_i = 1$. With $x = \sum_{i=1}^{m} \lambda_i x^i$, there exists $j \in \{1, \dots, m\}$ such that for all $u^j \in F(x^j)$

$$\langle u^j, x - x^j \rangle \le 0.$$

(M3) $F$ is pseudomonotone on $X$ if, for every pair of distinct points $x, y \in X$ and every $u \in F(x)$, $v \in F(y)$, we have

$$\langle v, x - y \rangle \ge 0 \implies \langle u, x - y \rangle \ge 0.$$

(M4) $F$ is monotone on $X$ if, for every pair of distinct points $x, y \in X$ and every $u \in F(x)$, $v \in F(y)$, we have

$$\langle u - v, x - y \rangle \ge 0.$$

(M5) $F$ is strictly monotone on $X$ if, for every pair of distinct points $x, y \in X$ and every $u \in F(x)$, $v \in F(y)$, we have

$$\langle u - v, x - y \rangle > 0.$$

(M6) $F$ is strongly monotone on $X$ with constant $\beta > 0$ if, for every pair of distinct points $x, y \in X$ and every $u \in F(x)$, $v \in F(y)$, we have

$$\langle u - v, x - y \rangle \ge \beta \|x - y\|^2,$$

where $\|\cdot\|$ denotes the Euclidean norm.

(M7) $F$ is maximal monotone on $X$ if it is monotone on $X$ and its graph is not properly contained in the graph of any other monotone operator on $X$.
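Properties (M1), (M3) and (M4) are statements about pairs of points of the graph of $F$, so they can at least be refuted numerically on finitely many samples. The sketch below does exactly that. A `False` result exhibits a violation, while `True` only means that no violation was found among the samples. The sampled map $F(x) = \{-x\}$ is a hypothetical example that is pseudomonotone, but not monotone, on the positive half-line.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], List[Sequence[float]]]   # (x, some elements of F(x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def passes_on_samples(samples: List[Sample], kind: str) -> bool:
    """Check (M1) 'quasimonotone', (M3) 'pseudomonotone' or (M4) 'monotone' on samples."""
    if kind not in {"monotone", "pseudomonotone", "quasimonotone"}:
        raise ValueError(f"unknown property: {kind}")
    for (x, us), (y, vs) in permutations(samples, 2):    # ordered pairs of distinct points
        diff = [a - b for a, b in zip(x, y)]              # x - y
        for u in us:
            for v in vs:
                pu, pv = dot(u, diff), dot(v, diff)       # <u, x - y>, <v, x - y>
                if kind == "monotone" and pu - pv < 0:
                    return False
                if kind == "pseudomonotone" and pv >= 0 and pu < 0:
                    return False
                if kind == "quasimonotone" and pv > 0 and pu < 0:
                    return False
    return True


# Hypothetical single-valued map F(x) = {-x} sampled at x = 1, 2, 3.
samples = [([1.0], [[-1.0]]), ([2.0], [[-2.0]]), ([3.0], [[-3.0]])]
for kind in ("monotone", "pseudomonotone", "quasimonotone"):
    print(kind, passes_on_samples(samples, kind))
# monotone False
# pseudomonotone True
# quasimonotone True
```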


The relationships among these kinds of monotonicity are represented in Fig. 1.

Fig. 1 Relationships among generalized monotonicity conditions (diagram nodes: maximal monotone, strongly monotone, strictly monotone, monotone, pseudomonotone, properly quasimonotone, quasimonotone)


Existence and Uniqueness 


In recent years the existence of solutions to GVIs has 
been investigated extensively. In what follows we pro- 
vide some of the most fundamental results. The basic 
result on the existence of a solution to the GVI problem 
requires the set X to be compact and convex and the 
map F to be u.s.c. From this basic result many others 
can be derived by replacing the compactness of X with 
additional coercivity conditions on F. 


Existence of Solutions: Bounded Domain 


This section presents some existence results for solutions of GVI in the case of a compact domain. The following existence theorem exploits the formulation of GVI as a fixed-point problem.

Theorem 1 ([8]) If $X$ is compact and $F$ is u.s.c. on $X$ with compact and convex values, then GVI has a solution.

Theorem 2 ([12]) If $X$ is compact and $F$ is upper hemicontinuous and properly quasimonotone on $X$ with compact and convex values, then GVI has a solution.
Existence of Solutions: Unbounded Domain


The existence of solutions of GVI on unbounded domains is guaranteed by the same conditions as for bounded domains, together with a coercivity condition. Various coercivity conditions have been considered in the literature, in particular the following (see [5]):

(C1) $\exists R > 0,\ \forall x \in X \setminus X_R,\ \forall u \in F(x),\ \exists y \in X_R:\ \langle u, y - x \rangle < 0$;

(C2) $\exists R > 0,\ \forall x \in X \setminus X_R,\ \exists y \in X_R,\ \forall u \in F(x):\ \langle u, y - x \rangle < 0$;

(C3) $\exists R > 0,\ \forall x \in X \setminus X_R,\ \exists y \in X_R,\ \exists v \in F(y):\ \langle v, y - x \rangle < 0$;

(C4) $X_\infty \cap (F(X))^- = \{0\}$,

where

$$X_R = \{ x \in X : \|x\| \le R \}$$

and

$$(F(X))^- = \{ d \in \mathbb{R}^n : \langle u, d \rangle \le 0,\ \forall u \in F(X) \}$$

is the polar cone of $F(X)$. Further, for $X$ closed and convex, the recession cone $X_\infty$ is defined by

$$X_\infty = \{ d \in \mathbb{R}^n : x + t d \in X,\ \forall t > 0,\ x \in X \}.$$


Some basic relationships among these coercivity conditions are summarized in the following result.

Theorem 3 ([5])

• (C2) ⇒ (C1).

• If $F$ has convex values, then (C2) and (C1) are equivalent.

• If $F$ is pseudomonotone on $X$, then (C3) ⇒ (C2).

• (C4) ⇒ (C3).

• If $F$ is upper hemicontinuous and pseudomonotone on $X$, then (C2), (C3) and (C4) are equivalent.

• If $F$ has convex values and is upper hemicontinuous and pseudomonotone on $X$, then (C1), (C2), (C3), and (C4) are equivalent.


The coercivity conditions allow us to exhibit a sufficiently large ball such that no point of $X$ outside this ball is a solution of the GVI. One can then establish the existence result stated below.

Theorem 4 ([5]) If $F$ is upper hemicontinuous and pseudomonotone on $X$ with compact and convex values, then the following statements are equivalent:

• GVI has a nonempty and compact solution set;

• (C1) holds;

• (C2) holds;

• (C3) holds;

• (C4) holds.


In what follows we state an existence theorem for which 
we require neither the upper semicontinuity of F, nor 
the compactness, nor the convexity of F(x), but we need 
the maximal monotonicity of F. 


Theorem 5 ([15]) Assume that F is maximal monotone on $\mathbb{R}^n$. Then the solution set of GVI is nonempty and compact if and only if (C4) holds.


In general, GVI can have more than one solution. The 
following theorem gives conditions under which GVI 
can have at most one solution. 


Theorem 6

• If F is strictly monotone on X, then GVI has at most one solution.
• If F is u.s.c. and strongly monotone on X, and has nonempty, convex and compact values, then GVI has a unique solution.


GVI and Related Problems 


As stated, the theory of GVI is a powerful unifying methodology that contains as special cases several well-known problems such as fixed-point, optimization, and complementarity problems. In what follows we describe these equivalent formulations of the GVI problem. Such formulations can be very beneficial for both analytical and computational purposes, since classical results for these problems can then be applied to treat the GVI.
GVI and Fixed-Point Problems


In what follows we exploit the formulation of GVI as a fixed-point problem. We recall that x* is a fixed point of the set-valued map $F: X \rightrightarrows \mathbb{R}^n$ if

$$x^* \in X \quad\text{and}\quad x^* \in F(x^*).$$

The fixed-point reformulation is very relevant for the GVI problem. Indeed, we can apply Kakutani's fixed-point theorem, which is instrumental in proving the existence result on a bounded domain. We define the following set-valued map:

$$\Phi : X \times \operatorname{conv}(F(X)) \rightrightarrows X \times \operatorname{conv}(F(X)), \qquad (x, u) \mapsto W(u) \times F(x),$$

where $W(u) = \operatorname{argmin}_{y \in X} \langle u, y \rangle$ is the set of constrained minimizers of the map $y \mapsto \langle u, y \rangle$ on X, and conv(F(X)) denotes the convex hull of F(X). Assuming that X is compact, W(u) is nonempty.


It is easy to see that the problem of finding a fixed point (x*, u*) of Φ, i.e.,

$$x^* \in X, \quad u^* \in F(x^*), \quad x^* \in \operatorname{argmin}_{x \in X} \langle u^*, x \rangle,$$

is equivalent to GVI.
It is worth noting that the GVI problem can also be formulated as an inclusion as follows:

$$\text{find } x^* \in X \text{ such that } 0 \in F(x^*) + N_X(x^*),$$

i.e., finding a zero of the set-valued map $F + N_X$ in the domain X, where the normal cone $N_X(x)$ to the set X at the point $x \in X$ is given by

$$N_X(x) = \{d \in \mathbb{R}^n : \langle d, y - x \rangle \le 0 \ \ \forall y \in X\}.$$


GVI and Optimization Problems 


Let us consider the constrained optimization problem:

$$\min f(x) \quad \text{subject to } x \in X, \qquad (1)$$

where

• X is a closed and convex subset of $\mathbb{R}^n$;
• the objective function f is defined on an open neighborhood of X, denoted Ω.


It is well known that if f is continuously differentiable, then the classical VI with $F = \nabla f$ is a necessary optimality condition for (1). The VI also gives a sufficient condition if f is pseudoconvex on X, i.e.,

$$f(x) > f(y) \implies \langle \nabla f(x), y - x \rangle < 0$$

for all x, y ∈ X.

Therefore, if f is continuously differentiable and pseudoconvex on X, the VI with $F = \nabla f$ is equivalent to the optimization problem (1). In what follows we extend these results in terms of GVI when $f: \Omega \to \mathbb{R}$ is a locally Lipschitz continuous function, that is, for each point $x \in \Omega$ there exists a neighborhood U of x such that f is Lipschitz continuous on U. To this end we recall some basic facts about Clarke calculus for a locally Lipschitz continuous function, see [3]. Clarke's generalized directional derivative of f at x in the direction v, denoted by $f^\circ(x; v)$, is given by

$$f^\circ(x; v) = \limsup_{y \to x,\ t \downarrow 0} \frac{f(y + tv) - f(y)}{t}.$$
The generalized gradient of f at x, denoted by $\partial f(x)$, is defined as follows:

$$\partial f(x) = \{\xi \in \mathbb{R}^n : \langle \xi, v \rangle \le f^\circ(x; v) \ \ \forall v \in \mathbb{R}^n\}.$$


The generalized directional derivative can be recovered from the generalized gradient:

$$f^\circ(x; v) = \max\{\langle \xi, v \rangle : \xi \in \partial f(x)\}.$$
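As a small numerical illustration of the max formula, the sketch below assumes that f is a pointwise maximum of finitely many affine functions, $f(x) = \max_i (\langle a_i, x \rangle + b_i)$; this convex, locally Lipschitz special case is our choice, not the text's. For such f the generalized gradient is the convex hull of the gradients $a_i$ of the pieces active at x, so $f^\circ(x; v)$ is the largest $\langle a_i, v \rangle$ over active pieces (for convex f it coincides with the ordinary directional derivative). The names `pieces`, `active_gradients` and `clarke_derivative` are hypothetical.

```python
from typing import List, Sequence, Tuple

Piece = Tuple[List[float], float]  # (gradient a_i, offset b_i) of one affine piece


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def f_value(pieces: List[Piece], x: Sequence[float]) -> float:
    """f(x) = max_i (<a_i, x> + b_i)."""
    return max(dot(a, x) + b for a, b in pieces)


def active_gradients(pieces: List[Piece], x: Sequence[float], tol: float = 1e-12) -> List[List[float]]:
    """Gradients of the pieces attaining the max at x; their convex hull is the generalized gradient."""
    fx = f_value(pieces, x)
    return [a for a, b in pieces if dot(a, x) + b >= fx - tol]


def clarke_derivative(pieces: List[Piece], x: Sequence[float], v: Sequence[float]) -> float:
    """f°(x; v) = max{<xi, v> : xi in conv(active gradients)}; a linear map peaks at a vertex."""
    return max(dot(a, v) for a in active_gradients(pieces, x))


# f(x) = |x| on R, written as max(x, -x): the generalized gradient at 0 is [-1, 1]
abs_pieces: List[Piece] = [([1.0], 0.0), ([-1.0], 0.0)]
print(clarke_derivative(abs_pieces, [0.0], [2.0]))   # 2.0, i.e. |v|
print(clarke_derivative(abs_pieces, [0.0], [-3.0]))  # 3.0
print(clarke_derivative(abs_pieces, [1.0], [-3.0]))  # -3.0, f is differentiable at 1
```

At the kink x = 0 both pieces are active, which is exactly where the set-valued nature of the generalized gradient matters.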


We can extend the definition of pseudoconvexity to a locally Lipschitz continuous function $f: \Omega \to \mathbb{R}$ [16]: f is pseudoconvex on Ω if, for all $x, y \in \Omega$, there exists $\xi \in \partial f(x)$ such that

$$\langle \xi, y - x \rangle \ge 0 \implies f(x) \le f(y).$$


Let us now consider the GVI with the Clarke gradient operator $F = \partial f$. We can state the following result.

Theorem 7 ([3]) A GVI with $F = \partial f$ provides necessary optimality conditions for problem (1).


In general, a GVI does not give sufficient optimality conditions. However, as shown in [16], when f is pseudoconvex on Ω, the GVI gives sufficient optimality conditions too. Consequently, as in the single-valued case, if f is pseudoconvex on Ω, a GVI with $F = \partial f$ is equivalent to the optimization problem (1). The above discussion focused on the GVI with a gradient operator; however, an arbitrary set-valued map is, in general, not a gradient map. A powerful tool for dealing with the GVI problem by way of an equivalent optimization reformulation is given by the so-called gap functions. Specifically, we say that a function $\varphi: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a gap function for GVI if

• $\varphi(x, u) \ge 0$ for all $(x, u) \in \operatorname{graph}(F)$;
• x* is a solution of GVI if and only if $x^* \in X$ and there exists $u^* \in F(x^*)$ such that $\varphi(x^*, u^*) = 0$.
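Hence, the GVI problem can be rewritten as the following constrained optimization problem:

$$\min \varphi(x, u) \quad\text{subject to } (x, u) \in \operatorname{graph}(F).$$

An example of a gap function, proposed in [6], is

$$\varphi(x, u) = \sup_{y \in X} \langle u, x - y \rangle, \qquad (x, u) \in \mathbb{R}^n \times \mathbb{R}^n. \qquad (2)$$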
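The function $\varphi(x, \cdot)$ is convex and closed for every fixed $x \in \mathbb{R}^n$, and $\varphi(\cdot, u)$ is affine for every fixed $u \in \mathbb{R}^n$ (see [6]). It is worth noting that φ represents a duality gap in the Mosco duality scheme [14] for GVI. Let us consider the following more general GVI problem: find $x^* \in \mathbb{R}^n$ and $u^* \in F(x^*)$ such that

$$\langle u^*, x - x^* \rangle \ge \theta(x^*) - \theta(x) \qquad \forall x \in \mathbb{R}^n, \qquad (3)$$

where $\theta: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper, lower semicontinuous convex function; when θ is the indicator function of X, (3) reduces to GVI. The dual problem of (3) is defined as: find $v^* \in \mathbb{R}^n$ and $y^* \in -F^{-1}(-v^*)$ such that

$$\langle y^*, v - v^* \rangle \ge \theta^*(v^*) - \theta^*(v) \qquad \forall v \in \mathbb{R}^n,$$

where $\theta^*(v) = \sup_{x \in \mathbb{R}^n} \{\langle v, x \rangle - \theta(x)\}$ is the Fenchel conjugate of θ.

Theorem 8 ([15]) When θ is the indicator function of X, the gap function (2) measures the duality gap of Mosco's duality scheme:

$$\theta(x) + \theta^*(-u) + \langle u, x \rangle = \begin{cases} \varphi(x, u) & \text{if } x \in X, \\ +\infty & \text{otherwise.} \end{cases}$$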


The gap function φ is not differentiable in general. Moreover, when graph(F) is unbounded, it is in general not finite valued. These drawbacks can be avoided by using a regularized gap function. Let us consider

$$\varphi_G(x, u) = \sup_{y \in X} \left\{ \langle u, x - y \rangle - \frac{1}{2}\|x - y\|_G^2 \right\},$$

where $(x, u) \in \mathbb{R}^n \times \mathbb{R}^n$, G is a symmetric positive definite matrix, and $\|\cdot\|_G$ is the norm in $\mathbb{R}^n$ defined by $\|x\|_G = \sqrt{\langle x, Gx \rangle}$. This function, introduced in [6] for generalized quasivariational inequalities, i.e., GVIs where the set X depends on the solution x, is a gap function for GVI and is called a regularized gap function. Since


$$\psi_G(x, u, y) = \langle u, x - y \rangle - \frac{1}{2}\|x - y\|_G^2$$

is strongly concave with respect to y, there is a unique maximizer over X, denoted by y(x, u). If we denote by $\Pi_{X,G}(\cdot)$ the projection operator onto the set X with respect to the norm $\|\cdot\|_G$, it is easy to check that this maximizer is

$$y(x, u) = \Pi_{X,G}(x - G^{-1}u).$$

Therefore, the regularized gap function

$$\varphi_G(x, u) = \langle u, x - y(x, u) \rangle - \frac{1}{2}\|x - y(x, u)\|_G^2$$

is finite valued everywhere. Moreover, the regularized gap function is continuously differentiable, and its gradient is given by

$$\nabla_x \varphi_G(x, u) = u + G\,[y(x, u) - x], \qquad \nabla_u \varphi_G(x, u) = x - y(x, u).$$
Therefore, using the regularized gap function we obtain 
an equivalent differentiable optimization reformulation 


of the GVI problem. Gap functions can be used in the 
design of numerical algorithms for solving the GVI. 
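A minimal sketch of the regularized gap function, under illustrative assumptions that are not part of the text: X is a box $[\ell, h] \subset \mathbb{R}^n$ and G is diagonal with positive entries. In that setting the G-projection onto X separates by coordinates, so $\Pi_{X,G}$ reduces to clipping each coordinate of $x - G^{-1}u$. The function name `regularized_gap` is hypothetical.

```python
from typing import List, Sequence, Tuple


def regularized_gap(
    x: Sequence[float],
    u: Sequence[float],
    g_diag: Sequence[float],
    lo: Sequence[float],
    hi: Sequence[float],
) -> Tuple[float, List[float], List[float], List[float]]:
    """Return (phi_G(x,u), y(x,u), grad_x phi_G, grad_u phi_G) for a box X and a diagonal G."""
    # y(x,u) = Pi_{X,G}(x - G^{-1} u); with diagonal G and a box X this is coordinatewise clipping
    z = [xi - ui / gi for xi, ui, gi in zip(x, u, g_diag)]
    y = [min(max(zi, l), h) for zi, l, h in zip(z, lo, hi)]
    diff = [xi - yi for xi, yi in zip(x, y)]  # x - y(x,u)
    value = sum(ui * di for ui, di in zip(u, diff)) - 0.5 * sum(
        gi * di * di for gi, di in zip(g_diag, diff)
    )
    grad_x = [ui - gi * di for ui, gi, di in zip(u, g_diag, diff)]  # u + G (y - x)
    grad_u = diff  # x - y(x,u)
    return value, y, grad_x, grad_u


# X = [0,1]^2, G = I. The point x = (0,1) with u = (1,-1) satisfies <u, z - x> >= 0 for all z in X,
# so the gap vanishes there; at x = (0.5,0.5) with u = (1,0) it is positive.
print(regularized_gap([0.0, 1.0], [1.0, -1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])[0])  # 0.0
print(regularized_gap([0.5, 0.5], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])[0])   # 0.375
```

Minimizing this smooth function over graph(F) is one way such gap functions enter numerical algorithms; the box and diagonal G only keep the projection explicit.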


GVI and Complementarity Problems 


It is well known that, when X is a closed convex cone and $F: X \to \mathbb{R}^n$, the VI problem is equivalent to the NCP problem, which consists in finding $x^* \in X$ such that

$$F(x^*) \in X^* \quad\text{and}\quad \langle F(x^*), x^* \rangle = 0,$$

where

$$X^* = \{d \in \mathbb{R}^n : \langle u, d \rangle \ge 0,\ \forall u \in X\}$$

is the negative polar cone of X. Such a relationship is preserved in the GVI problems. First, let us consider an extension of the NCP problem, see [17], that can be defined as follows.

Let X be a closed convex cone of $\mathbb{R}^n$ and F a set-valued map. The generalized complementarity problem (GCP) is to find $x^* \in X$ such that there exists $u^* \in F(x^*)$ satisfying the following properties:

$$u^* \in X^* \quad\text{and}\quad \langle u^*, x^* \rangle = 0.$$


As in the single-valued case, both problems GVI and 
GCP have the same solution set if the underlying set X 
is a closed convex cone. 
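To make the GCP conditions concrete, the sketch below takes X to be the nonnegative orthant $\mathbb{R}^n_+$, for which $X^* = \mathbb{R}^n_+$ as well, and represents F(x) by a finite list of candidate vectors. Both are assumptions made only so that the check is computable. The helper `solves_gcp_orthant` is hypothetical. It tests $x \in X$ and looks for some $u \in F(x)$ with $u \in X^*$ and $\langle u, x \rangle = 0$, up to a tolerance.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def solves_gcp_orthant(
    x: Sequence[float],
    F: Callable[[Sequence[float]], List[Vector]],
    tol: float = 1e-9,
) -> bool:
    """Check the GCP conditions for X = R^n_+ (so X* = R^n_+), with F(x) given as a finite list."""
    if any(xi < -tol for xi in x):  # x must lie in X
        return False
    for u in F(x):
        in_dual_cone = all(ui >= -tol for ui in u)
        complementary = abs(sum(ui * xi for ui, xi in zip(u, x))) <= tol
        if in_dual_cone and complementary:
            return True
    return False


# Single-valued affine map F(x) = {Mx + q}: this is the NCP special case.
M = [[2.0, 1.0], [1.0, 2.0]]
q = [-1.0, 1.0]


def F_affine(x: Sequence[float]) -> List[Vector]:
    return [[sum(M[i][j] * x[j] for j in range(2)) + q[i] for i in range(2)]]


print(solves_gcp_orthant([0.5, 0.0], F_affine))  # True: F(x) = (0, 1.5) is complementary to x
print(solves_gcp_orthant([1.0, 0.0], F_affine))  # False: <F(x), x> = 1
```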
In this article we describe the main moment problems and their solution methods from theoretical to ap-
sciences and subjects, for a detailed list please see the fi-
General Moment Optimization Problems


The Standard Moment Problem 


Let $g_1, \dots, g_n$ and h be given real-valued Borel measurable functions on a fixed measurable space $X := (X, \mathcal{A})$. We would like to find the best upper and lower bounds on

$$\mu(h) := \int_X h(t)\, \mu(dt),$$

given that μ is a probability measure on X with prescribed moments

$$\int_X g_i(t)\, \mu(dt) = y_i, \qquad i = 1, \dots, n.$$

Here we assume that μ is such that

$$\int_X |g_i|\, \mu(dt) < +\infty, \quad i = 1, \dots, n, \qquad\text{and}\qquad \int_X |h|\, \mu(dt) < +\infty.$$
For each $y := (y_1, \dots, y_n) \in \mathbb{R}^n$, consider the optimal quantities

$$L(y) := L(y \mid h) := \inf_\mu \mu(h), \qquad U(y) := U(y \mid h) := \sup_\mu \mu(h),$$

where μ is a probability measure as above with

$$\mu(g_i) = y_i, \qquad i = 1, \dots, n.$$

If there is no such probability measure μ, we set $L(y) := +\infty$, $U(y) := -\infty$.

If $h := \chi_S$, the characteristic function of a given measurable set S of X, then we agree to write

$$L(y \mid \chi_S) := L_S(y), \qquad U(y \mid \chi_S) := U_S(y).$$

Hence, $L_S(y) \le \mu(S) \le U_S(y)$. Consider $g: X \to \mathbb{R}^n$ such that $g(t) := (g_1(t), \dots, g_n(t))$. Set also $g_0(t) := 1$ for all $t \in X$.
Here we basically present J.H.B. Kemperman's (1968) geometric methods for solving the above main moment problems [13], which were related to and motivated by [18,20,24]. The advantage of the geometric method is that it is often simple and immediate, giving the optimal quantities L, U in closed numerical form; on top of this, it is very elegant. Here the σ-field $\mathcal{A}$ contains all subsets of X.


The next result comes from [22,23,25]. 


Theorem 1 Let $f_1, \dots, f_N$ be given real-valued Borel measurable functions on a measurable space Ω (such as $g_1, \dots, g_n$ and h on X). Let μ be a probability measure on Ω such that each $f_i$ is integrable with respect to μ. Then there exists a probability measure μ' of finite support on Ω (i.e., having nonzero mass only at a finite number of points) satisfying

$$\int_\Omega f_i(t)\, \mu(dt) = \int_\Omega f_i(t)\, \mu'(dt), \qquad i = 1, \dots, N.$$

One can even achieve that the support of μ' has at most N + 1 points. So from now on we can talk only about finitely supported probability measures.

Call

$$V := \operatorname{conv} g(X)$$

(conv stands for convex hull), where $g(X) := \{z \in \mathbb{R}^n : z = g(t) \text{ for some } t \in X\}$ is a curve in $\mathbb{R}^n$ (if $X = [a, b] \subset \mathbb{R}$ or if $X = [a, b] \times [c, d] \subset \mathbb{R}^2$).

Let $S \subset X$, and let $M^*(S)$ denote the set of all probability measures on X whose support is finite and contained in S.

The next results come from [13]. 


Lemma 2 Given $y \in \mathbb{R}^n$, then $y \in V$ if and only if there exists $\mu \in M^*(X)$ such that

$$\mu(g) = y$$

(i.e., $\mu(g_i) = \int_X g_i(t)\, \mu(dt) = y_i$, $i = 1, \dots, n$).

Hence $L(y \mid h) < +\infty$ if and only if $y \in V$. Note that, by Theorem 1,

$$L(y \mid h) = \inf\{\mu(h) : \mu \in M^*(X),\ \mu(g) = y\}$$

and

$$U(y \mid h) = \sup\{\mu(h) : \mu \in M^*(X),\ \mu(g) = y\}.$$

It is easy to see that $L(y) := L(y \mid h)$ is a convex function on V, i.e.,

$$L(\lambda y' + (1 - \lambda) y'') \le \lambda L(y') + (1 - \lambda) L(y''),$$

whenever $0 \le \lambda \le 1$ and $y', y'' \in V$. Also $U(y) := U(y \mid h) = -L(y \mid -h)$ is a concave function on V.

One can also prove that the following three properties are equivalent:
i) $\operatorname{int}(V)$ := interior of $V \neq \emptyset$;
ii) g(X) is not a subset of any hyperplane in $\mathbb{R}^n$;
iii) $1, g_1, \dots, g_n$ are linearly independent on X.

From now on we assume that $1, g_1, \dots, g_n$ are linearly independent, i.e., $\operatorname{int}(V) \neq \emptyset$.

Let $D^*$ denote the set of all (n + 1)-tuples of real numbers $d^* := (d_0, \dots, d_n)$ satisfying

$$h(t) \ge d_0 + \sum_{i=1}^n d_i g_i(t) \quad \text{for all } t \in X. \qquad (1)$$


Theorem 3 For each $y \in \operatorname{int}(V)$ we have that

$$L(y \mid h) = \sup\left\{ d_0 + \sum_{i=1}^n d_i y_i : d^* = (d_0, \dots, d_n) \in D^* \right\}. \qquad (2)$$

Given that $L(y \mid h) > -\infty$, the supremum in (2) is even attained by some $d^* \in D^*$.

If $L(y \mid h)$ is finite in int(V), then for almost all $y \in \operatorname{int}(V)$ the supremum in (2) is attained by a unique $d^* \in D^*$. Thus $L(y \mid h) > -\infty$ in int(V) if and only if $D^* \neq \emptyset$. Note that $y := (y_1, \dots, y_n) \in \operatorname{int}(V) \subset \mathbb{R}^n$ if and only if $d_0 + \sum_{i=1}^n d_i y_i > 0$ for each choice of the real constants $d_i$, not all zero, such that $d_0 + \sum_{i=1}^n d_i g_i(t) \ge 0$ for all $t \in X$. (The last statement comes from [8, p. 5] and [12, p. 573].)
If h is bounded, then $D^* \neq \emptyset$ trivially.


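Theorem 4 Let $d^* \in D^*$ be fixed and set

$$B(d^*) := \left\{ z = g(t) : d_0 + \sum_{i=1}^n d_i g_i(t) = h(t),\ t \in X \right\}. \qquad (3)$$

Then for each point

$$y \in \operatorname{conv} B(d^*) \qquad (4)$$

the quantity $L(y \mid h)$ is found as follows. Set

$$y = \sum_{j=1}^m p_j\, g(t_j)$$

with

$$g(t_j) \in B(d^*), \qquad p_j \ge 0, \qquad \sum_{j=1}^m p_j = 1. \qquad (5)$$

Then

$$L(y \mid h) = \sum_{j=1}^m p_j\, h(t_j) = d_0 + \sum_{i=1}^n d_i y_i. \qquad (6)$$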
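As a worked instance of Theorem 4, chosen by us for illustration, take X = [0, a], n = 1, $g_1(t) = t$, $h(t) = t^2$ and fix 0 < d < a. The lower bound uses the tangent line of h at d; the upper bound applies the same theorem to −h (via $U(y \mid h) = -L(y \mid -h)$) with the chord through (0, 0) and (a, a²):

```latex
% Lower bound: tangent line of h at d
d^* = (d_0, d_1) = (-d^2,\ 2d):\quad
h(t) - (d_0 + d_1 t) = (t - d)^2 \ge 0 \ \ \forall t \in [0, a]
\ \Longrightarrow\ d^* \in D^*,
\qquad B(d^*) = \{d\},\ \ y = d \in \operatorname{conv} B(d^*),

L(d \mid h) \overset{(6)}{=} d_0 + d_1 d = -d^2 + 2d^2 = d^2
\quad (\text{unit mass at } t = d).

% Upper bound: apply Theorem 4 to -h
d^* = (0,\ -a):\quad
-t^2 - (0 - a t) = t(a - t) \ge 0 \ \ \forall t \in [0, a],
\qquad B(d^*) = \{0, a\},\ \ d \in \operatorname{conv}\{0, a\},

L(d \mid -h) = 0 - a d
\ \Longrightarrow\ U(d \mid h) = -L(d \mid -h) = a d
\quad (\text{masses } 1 - d/a \text{ at } 0 \text{ and } d/a \text{ at } a).
```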


Theorem 5 Let $y \in \operatorname{int}(V)$ be fixed. Then the following are equivalent:
i) there exists $\mu \in M^*(X)$ such that $\mu(g) = y$ and $\mu(h) = L(y \mid h)$, i.e., the infimum is attained;
ii) there exists $d^* \in D^*$ satisfying (4).
Furthermore, for almost all $y \in \operatorname{int}(V)$ there exists at most one $d^* \in D^*$ satisfying (4).


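In many situations the above infimum is not attained, so that Theorem 4 is not applicable. The next theorem has more applications. For that, set

$$\eta(z) := \liminf_{\delta \to 0}\, \inf\{h(t) : t \in X,\ |g(t) - z| < \delta\}. \qquad (7)$$

If $\varepsilon \ge 0$ and $d^* \in D^*$, define (with $z_0 := 1$)

$$C_\varepsilon(d^*) := \left\{ z \in \mathbb{R}^n : 0 \le \eta(z) - \sum_{i=0}^n d_i z_i \le \varepsilon \right\} \qquad (8)$$

and

$$G(d^*) := \bigcap_{N=1}^\infty \operatorname{conv} C_{1/N}(d^*). \qquad (9)$$

It is easily proved that $C_\varepsilon(d^*)$ and $G(d^*)$ are closed; furthermore $B(d^*) \subset C_0(d^*) \subset C_\varepsilon(d^*)$, where $B(d^*)$ is defined by (3).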


Theorem 6 Let $y \in \operatorname{int}(V)$ be fixed.
i) Let $d^* \in D^*$ be such that $y \in G(d^*)$. Then

$$L(y \mid h) = d_0 + d_1 y_1 + \dots + d_n y_n. \qquad (10)$$

ii) Assume that g is bounded. Then there exists $d^* \in D^*$ satisfying

$$y \in \operatorname{conv} C_0(d^*) \subset G(d^*)$$

and

$$L(y \mid h) = d_0 + d_1 y_1 + \dots + d_n y_n. \qquad (11)$$

iii) We further obtain, whether or not g is bounded, that for almost all $y \in \operatorname{int}(V)$ there exists at most one $d^* \in D^*$ satisfying $y \in G(d^*)$.


The above results suggest the following practical simple 
geometric methods for finding L(y|h) and U(y|h), see 
REDE 


The Method of Optimal Distance
Call

$$M := \operatorname{conv}_{t \in X}\, (g_1(t), \dots, g_n(t), h(t)).$$

Then $L(y \mid h)$ is equal to the smallest distance between $(y_1, \dots, y_n, 0)$ and $(y_1, \dots, y_n, z) \in \overline{M}$. Also $U(y \mid h)$ is equal to the largest distance between $(y_1, \dots, y_n, 0)$ and $(y_1, \dots, y_n, z) \in \overline{M}$. Here, $\overline{M}$ stands for the closure of M. In particular we see that

$$L(y \mid h) = \inf\{y_{n+1} : (y_1, \dots, y_n, y_{n+1}) \in \overline{M}\}$$

and

$$U(y \mid h) = \sup\{y_{n+1} : (y_1, \dots, y_n, y_{n+1}) \in \overline{M}\}.$$


Example 7 Let μ denote probability measures on [0, a], a > 0. Fix 0 < d < a. Find

$$L := \inf_\mu \int_{[0,a]} t^2\, \mu(dt) \quad\text{and}\quad U := \sup_\mu \int_{[0,a]} t^2\, \mu(dt) \qquad (12)$$

subject to

$$\int_{[0,a]} t\, \mu(dt) = d.$$

So consider the graph $G := \{(t, t^2) : 0 \le t \le a\}$. Call $M := \operatorname{conv} G = \overline{\operatorname{conv}}\, G$.

A direct application of the optimal distance method here gives us $L = d^2$ (an optimal measure μ is supported at d with mass 1), and $U = da$ (an optimal measure μ here is supported at 0 and a with masses $1 - d/a$ and $d/a$, respectively).
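A brute-force numerical check of Example 7, offered as a sketch. It restricts attention to probability measures with at most two atoms on a uniform grid of [0, a]. This is enough here because the optimal measures quoted above have at most two atoms. The sketch enforces the mean constraint exactly by choosing the weights and records the smallest and largest second moment. The function name `two_point_bounds` and the grid size are illustrative assumptions.

```python
from typing import Tuple


def two_point_bounds(a: float, d: float, grid: int = 400) -> Tuple[float, float]:
    """Min and max of E[t^2] over measures on a grid of [0, a] with at most two atoms and mean d."""
    ts = [a * k / grid for k in range(grid + 1)]
    lower, upper = float("inf"), float("-inf")
    for t1 in ts:
        if t1 > d:
            break  # the left atom must not exceed the mean
        for t2 in ts:
            if t2 < d:
                continue  # the right atom must not be below the mean
            if t2 == t1:
                second_moment = t1 * t1  # both atoms coincide with d
            else:
                p2 = (d - t1) / (t2 - t1)  # weight on t2 that makes the mean equal to d
                second_moment = (1.0 - p2) * t1 * t1 + p2 * t2 * t2
            lower = min(lower, second_moment)
            upper = max(upper, second_moment)
    return lower, upper


a, d = 2.0, 0.5
L, U = two_point_bounds(a, d)
print(L, d * d)  # 0.25 0.25
print(U, d * a)  # 1.0 1.0
```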


The Method of Optimal Ratio 


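We would like to find

$$L_S(y) := \inf_\mu \mu(S) \quad\text{and}\quad U_S(y) := \sup_\mu \mu(S),$$

over all probability measures μ such that

$$\mu(g_i) = y_i, \qquad i = 1, \dots, n.$$

Set $S' := X - S$. Call $W_S := \operatorname{conv} g(S)$, $W_{S'} := \operatorname{conv} g(S')$ and $W := \operatorname{conv} g(X)$, where $g := (g_1, \dots, g_n)$.
Finding $L_S(y)$.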

1) Pick a boundary point z of W and 'draw' through z a hyperplane H of support to W.

2) Determine the hyperplane H' parallel to H which supports $W_{S'}$ as well as possible, and on the same side as H supports W.

3) Denote

$$A_d := W \cap H = W_S \cap H \quad\text{and}\quad B_d := W_{S'} \cap H'.$$

Given that $H' \neq H$, set $G_d := \operatorname{conv}(A_d \cup B_d)$. Then we have that

$$L_S(y) = \frac{\Delta(y)}{\Delta} \qquad (13)$$

for each $y \in \operatorname{int}(V)$ such that $y \in G_d$. Here, Δ(y) is the distance from y to H' and Δ is the distance between the distinct parallel hyperplanes H, H'.

Finding $U_S(y)$. (Note that $U_S(y) = 1 - L_{S'}(y)$.)

1) Pick a boundary point z of $W_S$ and 'draw' through z a hyperplane H of support to $W_S$. Set $A_d := W_S \cap H$.

2) Determine the hyperplane H' parallel to H which supports g(X), and hence W, as well as possible, and on the same side as H supports $W_S$. We are interested only in $H' \neq H$, in which case H is between H' and $W_S$.

3) Set $B_d := W \cap H' = W_{S'} \cap H'$. Let $G_d$ be as above. Then

$$U_S(y) = \frac{\Delta(y)}{\Delta} \qquad (14)$$

for each $y \in \operatorname{int}(V)$ with $y \in G_d$, assuming that H and H' are distinct. Here, Δ(y) and Δ are defined as above.
Examples of calculating $L_S(y)$ and $U_S(y)$ tend to be more involved; however, the applications are many.
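The following one-dimensional illustration of formula (14) uses assumptions chosen for the example: X = [0, a], n = 1, $g_1(t) = t$ and S = [c, a] with 0 < c < a. Then $W_S = [c, a]$. Taking z = c and H = {c}, the parallel hyperplane supporting W = [0, a] on the same side is H' = {0}. So Δ = c, Δ(y) = y and $G_d = [0, c]$, and (14) gives $U_S(y) = y/c$ for $0 < y \le c$. This is the sharp form of Markov's inequality. The sketch below checks this value by brute force over two-atom measures on a grid. The helper name `max_mass_on_S` is hypothetical.

```python
def max_mass_on_S(a: float, c: float, y: float, grid: int = 200) -> float:
    """sup mu([c, a]) over measures on a grid of [0, a] with at most two atoms and mean y."""
    ts = [a * k / grid for k in range(grid + 1)]
    best = 0.0
    for t1 in ts:
        if t1 > y:
            break  # the left atom must not exceed the mean
        for t2 in ts:
            if t2 < y:
                continue  # the right atom must not be below the mean
            p2 = 1.0 if t2 == t1 else (y - t1) / (t2 - t1)  # weight on t2 giving mean y
            mass = (1.0 - p2) * (t1 >= c) + p2 * (t2 >= c)  # booleans count as 0 or 1
            best = max(best, mass)
    return best


a, c, y = 4.0, 2.0, 1.0
print(max_mass_on_S(a, c, y), y / c)  # 0.5 0.5
```

The maximizing measure puts mass 1 − y/c at 0 and y/c at c, i.e. on $B_d$ and $A_d$, as the geometric construction predicts.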


The Convex Moment Problem 


Definition 8 Let s ≥ 1 be a fixed natural number and let $x_0 \in \mathbb{R}$ be fixed. By $m_s(x_0)$ we denote the set of probability measures μ on $\mathbb{R}$ such that the associated cumulative distribution function F possesses an (s − 1)th derivative $F^{(s-1)}(x)$ over $(x_0, +\infty)$, and furthermore $(-1)^s F^{(s-1)}(x)$ is convex in $(x_0, +\infty)$.


Description of the Problem 


Let $g_i$, i = 1, …, n, and h be Borel measurable functions from $\mathbb{R}$ into itself. These are assumed to be locally integrable on $[x_0, +\infty)$ relative to Lebesgue measure. Consider $\mu \in m_s(x_0)$, s ≥ 1, such that

$$\mu(|g_i|) := \int_{\mathbb{R}} |g_i(t)|\, \mu(dt) < +\infty, \qquad i = 1, \dots, n, \qquad (15)$$

and

$$\mu(|h|) := \int_{\mathbb{R}} |h(t)|\, \mu(dt) < +\infty. \qquad (16)$$

Let $c := (c_1, \dots, c_n) \in \mathbb{R}^n$ be such that

$$\mu(g_i) = c_i, \quad i = 1, \dots, n, \qquad \mu \in m_s(x_0). \qquad (17)$$

We would like to find

$$L(c) := \inf_\mu \mu(h) \quad\text{and}\quad U(c) := \sup_\mu \mu(h), \qquad (18)$$

where μ is as described above.

Here, the method will be to transform the above convex moment problem into an ordinary one, handled by the first section, see [14].


Definition 9 Consider here another copy of $(\mathbb{R}, \mathcal{B})$, where $\mathcal{B}$ is the Borel σ-field, and further a given function P(y, A) on $\mathbb{R} \times \mathcal{B}$.

Assume that for each fixed $y \in \mathbb{R}$, P(y, ·) is a probability measure on $\mathbb{R}$, and for each fixed $A \in \mathcal{B}$, P(·, A) is a Borel-measurable real-valued function on $\mathbb{R}$. We call P a Markov kernel. For each probability measure ν on $\mathbb{R}$, let μ := Tν denote the probability measure on $\mathbb{R}$ given by

$$\mu(A) = (T\nu)(A) := \int_{\mathbb{R}} P(y, A)\, \nu(dy).$$

T is called a Markov transformation.
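In particular, define the kernel

$$K_s(u, x) := \begin{cases} \dfrac{s(u - x)^{s-1}}{(u - x_0)^s} & \text{if } x_0 < x < u, \\ 0 & \text{elsewhere.} \end{cases} \qquad (19)$$

Notice that $K_s(u, x) \ge 0$ and $\int_{\mathbb{R}} K_s(u, x)\, dx = 1$ for all $u > x_0$. Let $\delta_u$ be the unit (Dirac) measure at u. Define

$$P_s(u, A) := \begin{cases} \delta_u(A) & \text{if } u \le x_0, \\ \int_A K_s(u, x)\, dx & \text{if } u > x_0. \end{cases} \qquad (20)$$

Then

$$(T\nu)(A) := \int_{\mathbb{R}} P_s(u, A)\, \nu(du) \qquad (21)$$

is a Markov transformation.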


Theorem 10 Let $x_0 \in \mathbb{R}$ and a natural number s ≥ 1 be fixed. Then the Markov transformation (21), μ = Tν, defines a 1-1 correspondence between the set $m^*$ of all probability measures ν on $\mathbb{R}$ and the set $m_s(x_0)$ of all probability measures μ on $\mathbb{R}$ as in Definition 8. In fact, T is a homeomorphism given that $m^*$ and $m_s(x_0)$ are endowed with the weak* topology.


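Let $\phi: \mathbb{R} \to \mathbb{R}$ be a bounded and continuous function. Introducing

$$\phi^*(u) := (T\phi)(u) := \int_{\mathbb{R}} \phi(x)\, P_s(u, dx), \qquad (22)$$

then

$$\int \phi\, d\mu = \int \phi^*\, d\nu. \qquad (23)$$

Here $\phi^*$ is a bounded and continuous function from $\mathbb{R}$ into itself. We obtain that

$$\phi^*(u) = \begin{cases} \phi(u) & \text{if } u \le x_0, \\ s \displaystyle\int_0^1 \phi((1 - t)u + t x_0)\, t^{s-1}\, dt & \text{if } u > x_0. \end{cases} \qquad (24)$$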
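In particular,

$$\frac{1}{s}(u - x_0)^s\, \phi^*(u) = \int_{x_0}^u (u - x)^{s-1}\, \phi(x)\, dx, \qquad u > x_0. \qquad (25)$$

Especially, if r > −1, we get for $\phi(u) := (u - x_0)^r$ that

$$\phi^*(u) = \binom{r+s}{s}^{-1} (u - x_0)^r \qquad \text{for all } u > x_0.$$

Here $r! := 1 \cdot 2 \cdots r$ and

$$\binom{r+s}{s} := \frac{(r+1)(r+2)\cdots(r+s)}{s!}.$$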

Solving the Convex Moment Problem 


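Let T be the Markov transformation (21) as described above. To each $\mu \in m_s(x_0)$ there corresponds exactly one $\nu \in m^*$ such that μ = Tν. Call $g_i^* := Tg_i$, i = 1, …, n, and $h^* := Th$. We have

$$\int_{\mathbb{R}} g_i\, d\mu = \int_{\mathbb{R}} g_i^*\, d\nu \quad\text{and}\quad \int_{\mathbb{R}} h\, d\mu = \int_{\mathbb{R}} h^*\, d\nu.$$

Notice that we get

$$\nu(g_i^*) = \int_{\mathbb{R}} g_i^*\, d\nu = c_i, \qquad i = 1, \dots, n. \qquad (26)$$

From (15), (16) we get that

$$\int_{\mathbb{R}} T|g_i|\, d\nu < +\infty, \quad i = 1, \dots, n, \qquad\text{and}\qquad \int_{\mathbb{R}} T|h|\, d\nu < +\infty. \qquad (27)$$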


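Since T is a positive linear operator, we obtain $|Tg_i| \le T|g_i|$, i = 1, …, n, and $|Th| \le T|h|$, i.e.,

$$\int_{\mathbb{R}} |g_i^*|\, d\nu < +\infty, \quad i = 1, \dots, n, \qquad\text{and}\qquad \int_{\mathbb{R}} |h^*|\, d\nu < +\infty.$$

That is, $g_i^*$ and $h^*$ are ν-integrable.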


Finally,

$$L(c) = \inf_\nu \nu(h^*) \qquad (28)$$

and

$$U(c) = \sup_\nu \nu(h^*), \qquad (29)$$

where $\nu \in m^*$ (a probability measure on $\mathbb{R}$) is such that (26) and (27) are true.

Thus the convex moment problem is solved as 
a standard moment problem (see the first section). 


Remark 11 Here we restrict our probability measures to [0, +∞) and we consider the case $x_0 = 0$. That is, $\mu \in m_s(0)$, s ≥ 1, i.e., $(-1)^s F^{(s-1)}(x)$ is convex for all x > 0, but $\mu(\{0\}) = \nu(\{0\})$ can be positive, $\nu \in m^*$. We have

$$\phi^*(u) = s u^{-s} \int_0^u (u - x)^{s-1}\, \phi(x)\, dx, \qquad u > 0. \qquad (30)$$

Further, $\phi^*(0) = \phi(0)$ (with $\phi^* = T\phi$). Especially, if $\phi(x) = x^r$, then

$$\phi^*(u) = \binom{r+s}{s}^{-1} u^r \qquad (r \ge 0). \qquad (31)$$

Hence the moment

$$\alpha_r := \int_0^{+\infty} x^r\, \mu(dx) \qquad (32)$$

is also expressed as

$$\alpha_r = \binom{r+s}{s}^{-1} \beta_r, \qquad (33)$$

where

$$\beta_r := \int_0^{+\infty} u^r\, \nu(du). \qquad (34)$$

Recall that Tν = μ, where ν can be any probability measure on [0, +∞).

Here we restrict our probability measures to [0, b], b > 0, and again we consider the case $x_0 = 0$. Let $\mu \in m_s(0)$ and

$$\int_{[0,b]} x^r\, \mu(dx) := \alpha_r, \qquad (35)$$

where s ≥ 1, r > 0 are fixed. Also let ν be a probability measure on [0, b] unrestricted, i.e., $\nu \in m^*$. Then $\beta_r = \binom{r+s}{s} \alpha_r$, where

$$\beta_r := \int_{[0,b]} u^r\, \nu(du). \qquad (36)$$

Let $h: [0, b] \to \mathbb{R}$ be an integrable function with respect to Lebesgue measure. Consider $\mu \in m_s(0)$ such that

$$\int_{[0,b]} h\, d\mu < +\infty, \qquad (37)$$

equivalently,

$$\int_{[0,b]} h^*\, d\nu < +\infty, \qquad \nu \in m^*. \qquad (38)$$

Here $h^* = Th$, μ = Tν and

$$\int_{[0,b]} h\, d\mu = \int_{[0,b]} h^*\, d\nu.$$


Letting $\alpha_r$ be free, we have that the set of all possible pairs $(\alpha_r, \mu(h)) = (\mu(x^r), \mu(h))$ coincides with the set of all

$$\left( \binom{r+s}{s}^{-1} \nu(u^r),\ \nu(h^*) \right),$$

where μ is as in (37) and ν as in (38), both probability measures on [0, b]. Hence, the set of all possible pairs $(\beta_r, \mu(h)) = (\beta_r, \nu(h^*))$ is precisely the convex hull of the curve

$$\Gamma := \{(u^r, h^*(u)) : 0 \le u \le b\}. \qquad (39)$$

In order to determine $L(\alpha_r)$, the infimum of all μ(h), where μ is as in (35) and (37), one must determine the lowest point of this convex hull on the vertical line through $(\beta_r, 0)$. For $U(\alpha_r)$, the supremum of all μ(h) as above, one must determine the highest point of the above convex hull on the vertical line through $(\beta_r, 0)$.
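The convex-hull recipe above can be sketched numerically. The sketch uses illustrative assumptions not in the text: s = 1, r = 1, b = 2, $h(x) = x^2$ and $\alpha_1 = 0.5$, so that $\beta_1 = \binom{2}{1}\alpha_1 = 1$. It computes $h^* = Th$ from (30) by a midpoint rule and samples the curve Γ of (39). It then reads off the lowest and highest points of the convex hull on the vertical line through $(\beta_1, 0)$ by scanning segments between sample points. Pairs of points suffice because these extreme points lie on hull edges. By hand, $h^*(u) = u^2/3$, so $L = \beta_1^2/3 = 1/3$ (the uniform law on $[0, 2\alpha_1]$) and $U = \beta_1 b/3 = 2/3$. The helper names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def h_star(h: Callable[[float], float], u: float, s: int, n: int = 2000) -> float:
    """(30) with x0 = 0: h*(u) = s u^(-s) int_0^u (u - x)^(s-1) h(x) dx by a midpoint rule; h*(0) = h(0)."""
    if u <= 0.0:
        return h(u)
    step = u / n
    total = sum((u - (k + 0.5) * step) ** (s - 1) * h((k + 0.5) * step) for k in range(n))
    return s * u ** (-s) * total * step


def vertical_bounds(points: List[Tuple[float, float]], beta: float) -> Tuple[float, float]:
    """Lowest and highest points of conv(points) on the vertical line through (beta, 0)."""
    lowest, highest = math.inf, -math.inf
    for x1, y1 in points:
        for x2, y2 in points:
            if x1 <= beta <= x2:
                if x2 == x1:
                    val = y1  # the sample point itself lies on the line
                else:
                    lam = (beta - x1) / (x2 - x1)
                    val = (1.0 - lam) * y1 + lam * y2
                lowest, highest = min(lowest, val), max(highest, val)
    return lowest, highest


s, r, b, alpha_r = 1, 1.0, 2.0, 0.5
# beta_r = binom(r+s, s) * alpha_r with binom(r+s, s) = (r+1)...(r+s)/s!
beta_r = alpha_r * math.prod(r + j for j in range(1, s + 1)) / math.factorial(s)
us = [b * k / 200 for k in range(201)]
curve = [(u ** r, h_star(lambda x: x * x, u, s)) for u in us]  # samples of Gamma in (39)
L, U = vertical_bounds(curve, beta_r)
print(beta_r, L, U)  # beta_r = 1.0; L and U are close to 1/3 and 2/3
```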


For more on the above see again [1].


Infinitely Many Conditions Moment Problem
See also [16]. 


Definition 13 A finite nonnegative measure μ on a compact Hausdorff space S is said to be inner regular when

$$\mu(B) = \sup\{\mu(K) : K \subset B,\ K \text{ compact}\} \qquad (40)$$

holds for each Borel subset B of S.


Theorem 14 See [16]. Let S be a compact Hausdorff topological space and $a_i: S \to \mathbb{R}$ ($i \in I$) continuous functions (I is an index set of arbitrary cardinality); also let $\alpha_i$ ($i \in I$) be an associated set of real constants. Call $M_0(S)$ the set of finite nonnegative inner regular measures μ on S which satisfy the moment conditions

$$\mu(a_i) = \int_S a_i(s)\, \mu(ds) \le \alpha_i \quad \text{for all } i \in I. \qquad (41)$$

Also consider a continuous function $b: S \to \mathbb{R}$, and assume that there exist numbers $d_i \ge 0$ ($i \in I$), all but finitely many equal to zero, and further a number $q \ge 0$ such that

$$1 \le \sum_{i \in I} d_i a_i(s) - q\, b(s) \quad \text{for all } s \in S. \qquad (42)$$

Finally assume that $M_0(S) \neq \emptyset$ and call

$$U_0(b) := \sup\{\mu(b) : \mu \in M_0(S)\} \qquad (43)$$

($\mu(b) := \int_S b(s)\, \mu(ds)$). Then

$$U_0(b) = \inf\left\{ \sum_{i \in I} c_i \alpha_i : c_i \ge 0,\ b(s) \le \sum_{i \in I} c_i a_i(s) \text{ for all } s \in S \right\}, \qquad (44)$$

where all but finitely many $c_i$, $i \in I$, are equal to zero. Moreover, $U_0(b)$ is finite and the above supremum is attained.


Remark 15 In general we have: let S be a fixed measurable space such that each 1-point set {s} is measurable. Further let $M_0(S)$ denote a fixed nonempty set of finite nonnegative measures on S.

For $f: S \to \mathbb{R}$ a measurable function we denote

$$L_0(f) := L(f, M_0(S)) := \inf\left\{ \int_S f(s)\, \mu(ds) : \mu \in M_0(S) \right\}. \qquad (45)$$

Then we have

$$L_0(f) = -U_0(-f). \qquad (46)$$

Now one can apply Theorem 14 in its setting to find $L_0(f)$.


Applications and Discussion 


The moment theory optimization methods described above have many applications across the sciences, including physics, chemistry, statistics, stochastic processes and probability, functional analysis in mathematics, medicine, and materials science. Optimization moment theory could also be considered the theoretical part of linear finite or semi-infinite programming (here we consider discretized finite nonnegative measures).

The above described methods have in particular im- 
portant applications: in the marginal moment prob- 
lems and the related transportation problems, also in 
the quadratic moment problem, see [17]. 

Other important applications are in tomography, crystallography, queueing theory, rounding problems in political science, and martingale inequalities in probability. Last but not least, optimization moment theory has important applications in estimating the speed of convergence of a sequence of positive linear operators to the unit operator, and of the weak convergence of nonnegative finite measures to the unit Dirac measure at a real number; for that, and for the solutions of many other important optimal moment problems, please see [2].


Final Conclusion 


Optimization moment theory is a very active area of mathematical probability theory, with many applications in other subjects and with many researchers from around the world contributing new useful results continuously throughout the 20th century.


See also 


> Approximation of Extremum Problems with 
Probability Functionals 

> Approximation of Multivariate Probability Integrals 

> Discretely Distributed Stochastic Programs: Descent 
Directions and Efficient Points 


> Extremum Problems with Probability Functions: 
Kernel Type Solution Methods 

> Logconcave Measures, Logconvexity 

> Logconcavity of Discrete Distributions 

> L-shaped Method for Two-stage Stochastic 
Programs with Recourse 

> Multistage Stochastic Programming: Barycentric 
Approximation 

> Preprocessing in Stochastic Programming 

> Probabilistic Constrained Linear Programming: 
> Probabilistic Constrained Problems: Convexity
Theory 

> Simple Recourse Problem: Dual Method 

> Simple Recourse Problem: Primal Method 

> Stabilization of Cutting Plane Algorithms for 
> Static Stochastic Programming Models

> Static Stochastic Programming Models: Conditional 
Expectations 

> Stochastic Integer Programming: Continuity, 
Stability, Rates of Convergence 

> Stochastic Integer Programs 

> Stochastic Linear Programming: Decomposition 
and Cutting Planes 
Arbitrary Multivariate Distributions

> Stochastic Network Problems: Massively Parallel 
> Stochastic Programming Models: Random Objective

> Stochastic Programming: Nonanticipativity and 
Lagrange Multipliers 
The general routing problem (GRP) is a routing problem defined on a graph or network where a minimum
cost tour is to be found and where the route must in- 
clude visiting certain required vertices and traversing 
certain required edges. More formally, given a connected, undirected graph G with vertex set V and (undirected) edge set E, a cost $c_e$ for traversing each edge $e \in E$, a set $V_R \subseteq V$ of required vertices and a set $E_R \subseteq E$ of required edges, the GRP is the problem of finding a minimum cost vehicle route, starting and finishing at the same vertex, passing through each $v \in V_R$ and each $e \in E_R$ at least once ([13]).

The GRP contains a number of other routing problems as special cases. When $E_R = \emptyset$, the GRP reduces to the Steiner graphical traveling salesman problem (SGTSP) ([4]), also called the road traveling salesman problem in [7]. On the other hand, when $V_R = \emptyset$, the GRP reduces to the rural postman problem (RPP) ([13]). When $V_R = V$, the SGTSP in turn reduces to the graphical traveling salesman problem or GTSP ([4]). Similarly, when $E_R = E$, the RPP reduces to the Chinese postman problem or CPP ([5,8]).

The CPP can be solved optimally in polynomial time by reduction to a matching problem ([6]), but the RPP, GTSP, SGTSP and GRP are all NP-hard. This means that no polynomial-time algorithm is known for them, and the computational effort of known exact methods can grow exponentially with the size of the problem. Therefore exact algorithms are only practical for a GRP if it is not too large; otherwise a heuristic algorithm is appropriate. The GRP was proved to be NP-hard in [10].
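The reduction of the CPP to matching can be sketched as follows. This is a commonly used construction, described here under our own assumptions rather than taken from [6]. Add to the total edge cost the cost of a minimum-weight perfect matching of the odd-degree vertices, using shortest-path distances as weights. For brevity the sketch finds that matching by dynamic programming over subsets, which is exponential in the number of odd vertices. The polynomial-time bound instead relies on a weighted matching algorithm such as Edmonds', which is not implemented here. The function name and the edge-list format are hypothetical.

```python
import math
from functools import lru_cache
from typing import List, Tuple


def chinese_postman_cost(n: int, edges: List[Tuple[int, int, float]]) -> float:
    """Cost of an optimal closed walk traversing every edge of a connected undirected graph at least once."""
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    degree = [0] * n
    total = 0.0
    for u, v, c in edges:
        total += c
        degree[u] += 1
        degree[v] += 1
        if c < dist[u][v]:
            dist[u][v] = dist[v][u] = c
    # Floyd-Warshall all-pairs shortest paths
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    odd = [v for v in range(n) if degree[v] % 2 == 1]  # always an even number of vertices

    @lru_cache(maxsize=None)
    def min_matching(mask: int) -> float:
        """Minimum-weight perfect matching of the odd vertices whose bits are set in mask."""
        if mask == 0:
            return 0.0
        i = (mask & -mask).bit_length() - 1  # lowest unmatched odd vertex
        rest = mask & ~(1 << i)
        best = math.inf
        for j in range(len(odd)):
            if rest >> j & 1:
                best = min(best, dist[odd[i]][odd[j]] + min_matching(rest & ~(1 << j)))
        return best

    return total + min_matching((1 << len(odd)) - 1)


# Square 0-1-2-3-0 plus diagonal 0-2, all costs 1: vertices 0 and 2 are odd, so one extra
# traversal of the diagonal is needed.
print(chinese_postman_cost(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (0, 2, 1)]))  # 6.0
```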

In [3], an integer programming formulation of the 
GRP is given, along with several classes of valid inequal- 
ities which induce facets of the associated polyhedra 
under mild conditions. Another class of valid inequal- 
ities for the GRP is introduced in [11] and in [12] it is 
shown how to convert facets of the GTSP polyhedron 
into valid inequalities for the GRP polyhedron. These 
valid inequalities form the basis for a promising branch 
and cut style of algorithm described in [2] which can 
solve GRPs of moderate size to optimality. 

In [9], a heuristic algorithm for the GRP is de- 
scribed. The author adapts Christofides’ heuristic for 
the TSP to show that when the triangle inequality holds 
in the graph, the heuristic has a worst-case ratio of 
heuristic solution value to optimum value of 1.5. 

There are many vehicle routing applications of the 
GRP. In these cases, the edges of the graph are used 
to represent streets or roads and the vertices represent 
road junctions or particular locations on a map. In any 
practical application there are likely to be many addi- 
tional constraints which must also be taken into ac- 
count such as the capacity of the vehicles, time-window 
constraints for when the service may be carried out, 


the existence of one-way streets and prohibited turns 
etc. 

Many applications are for the special cases when either $E_R = \emptyset$ or $V_R = \emptyset$. However, there are some types of
vehicle routing applications where the problem is most 
naturally modeled as a GRP with both required edges 
and required vertices. For example, in designing routes 
for solid waste collection services, collecting waste from 
all houses along a street could be modeled as a required 
edge and collecting waste from the foot of a multistory 
apartment block could be modeled as a required vertex. 
Other examples include postal delivery services where 
some customers with heavy demand might be mod- 
eled as required vertices, while other customers with 
homes in the same street might be modeled together as 
a required edge. School bus services are other examples 
of GRPs where a pick-up in a remote village could be 
modeled as a required vertex, but if the school bus must 
pick-up at some point along a street (and is not allowed 
to perform a U-turn in the street) then that may best be 
modeled as a required edge. 

Further details about solution methods and appli- 
cations for various network routing problems can be 
found in [1]. 


See also 
Genetic algorithms (GAs) comprise a class of stochastic global optimization methods based on several strategies from biological evolution. The basic genetic algorithm was developed by J.H. Holland and his students
([5,6,7,8]), and was based on the observation that selec- 
tion (either natural or artificial) can produce highly op- 


timized individuals in a relatively short number of gen- 
erations. This is true despite the fact that the space of all 
gene mutations through which a population must sort 
is astronomical. For instancethe genome of the yeast 
Saccharomyces cerevisiae, which is the simplest eukary- 
ote, contains just over 6000 genes, each of which can 
occur in several mutant forms. Despite this, S. cere- 
visiae can reoptimize itself to survive and flourish in 
many new environments in a relatively short number 
of generations. This is equivalent to having a com- 
puter search for a near-optimal solution to a 6000- 
dimensional problem where each of the 6000 variables 
can take on any one ofa large number of values. 

The most important notion from natural systems 
that the GA employs is the use of a population of in- 
dividuals which go through a selection step to produce 
offspring and pass on their genetic material. Optimality
or fitness is measured by how many offspring an indi- 
vidual produces. A second notion is the use of crossover 
in which individuals share genetic information and pass 
the shared information onto their offspring. A third 
borrowing from nature is the idea of mutation, the consequence
of which is that the transfer of genetic information
is prone to random errors. This helps maintain
the level of genetic diversity in a population. 

The implementation of a simple GA (SGA) which
uses these ideas is straightforward. The description that
follows uses a binary encoding, but all of the ideas carry
over directly to integer or even real-number encodings.
The most important idea is that one works with
a population of individuals which interact through
genetic operators to carry out an optimization process.
An individual is specified by a chromosome $C$, which is
a bit string of length $N$ that can be decoded to give a set
of $n$ parameters $x_i$, the natural parameters for
the optimization application. Each parameter $x_i$ is encoded
by $n_i$ bits, so that $\sum_{i=1}^{n} n_i = N$. In what follows,
chromosome and bit string are synonymous. A fitness
function $f(x_1, \ldots, x_n)$, which is the function to be
optimized, is used to rank the individual chromosomes.
An initial population of $N_{pop}$ individuals is formed by
choosing $N_{pop}$ bit strings at random and evaluating
each individual's fitness (decode $C \to (x_1, \ldots, x_n)$, then
calculate $f(x_1, \ldots, x_n)$). Subsequent generations are formed
as follows. All parents (members of the current generation)
are ranked by fitness, and the highest-fitness
individual is placed directly into the next generation with
no change. (This step of keeping the most-fit individual
intact is termed elitism and is a purely heuristic addition.
It ensures that good solutions to the problem at
hand are not lost until better ones are found.) Next, 
pairs of parents are selected and their chromosomes are 
crossed over to form chromosomes of the remaining in- 
dividuals in the next generation. A parent’s probability 
of being selected increases with its fitness. So for a
minimization application, the parent with the current lowest
value of $f(x_1, \ldots, x_n)$ has the highest chance of being
selected for mating. Crossover consists of taking some
subset of the bits from parent 1 and the complementary 
set of bits from parent 2 and combining them to form 
the chromosome of child 1. A child is simply a member
of the next generation. The remaining bits from the
two parents are combined to form the chromosome of 
child 2. Additionally, during replication there is a small 
probability of a bit flip or mutation in a chromosome. 
This serves primarily to maintain diversity and prevent 
premature convergence. Convergence occurs when the
population becomes largely homogeneous: most individuals
have almost the same values for all of their
parameters. Premature convergence occurs when the
population converges early in a run, before a significant
amount of searching has been performed. The most
common cause is a poor choice of the scaling of the
fitness function. It should be noted that 'premature'
and 'early' are only loosely defined. To make small
parameter changes reachable by single mutations, the
binary chromosomes are usually Gray coded. An integer
represented as a Gray-coded binary number has the
property that the codes of consecutive integers differ in
exactly one bit, so a change of ±1 in the decoded integer
can always be produced by a single bit flip; in plain
binary, by contrast, going from 7 (0111) to 8 (1000)
requires flipping all four bits. In sum, the algorithm
consists of successively transforming one generation of
individuals into the next using the operations of
selection, crossover and mutation. Since the selection
process is biased towards individuals with higher fitness,
individuals are produced that tend to come ever closer
to being optimal solutions to the function of interest.
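The property that makes Gray coding useful here, namely that the codes of consecutive integers differ in exactly one bit, can be checked with a short sketch. It assumes the standard binary-reflected Gray code (the text does not say which Gray code is used), and the function names are illustrative.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer n."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together g, g >> 1, g >> 2, ..."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Consecutive integers differ in exactly one bit of their Gray codes ...
print(all(hamming(to_gray(k), to_gray(k + 1)) == 1 for k in range(1023)))  # True
# ... but in four bits of their plain binary codes when going from 7 to 8.
print(hamming(7, 8))  # 4
# A single flip of the leading Gray bit can still jump far: 1000 decodes to 15.
print(from_gray(0b1000))  # 15
```

The last line is a reminder that Gray coding guarantees that small changes are reachable by single flips, not that every single flip produces a small change.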

It is important to emphasize that crossover is 
the key feature that distinguishes the GA from other 
stochastic global search methods. If crossover is ineffective,
the GA degenerates into a random walk search executed
separately by each individual in the population.
The random walk is generated by the mutation
operator. 


The GA is presented below as pseudocode: 


PROCEDURE genetic algorithm()
    Initialize population;
    FOR (g = 1 to Ngen generations) DO
        FOR (i = 1 to Npop individuals) DO
            Evaluate fitness of individual i: f_i(g);
        END FOR;
        Save best individual to population g + 1;
        WHILE (population g + 1 has fewer than Npop individuals) DO
            Select 2 individuals;
            Crossover: create 2 new individuals;
            Mutate the new individuals;
            Move new individuals to population g + 1 (at most Npop in total);
        END WHILE;
    END FOR;
END genetic algorithm;


Pseudocode for the Simple Genetic Algorithm 


Selection commonly uses a roulette wheel procedure.
Each individual is assigned a slice of the unit circle
proportional to its fitness $f(x_1, \ldots, x_n)$. One then draws
pairs of random numbers to select the next two individuals
to be mated. A typical crossover operator takes the
chromosomes from a pair of individuals and chooses
a common cut point along them. One child gets the
portion of the first parent's chromosome to the left of
the cut point and the portion of the second parent's
chromosome to the right of the cut point. The chromosome
of the second child consists of the remaining
fragments of the two parent chromosomes. In the most
common mutation operator, each bit in the binary
chromosome has an equal and low probability of being flipped
from 1 to 0 or vice versa. Many variants of these
operators have been used.
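The pseudocode and the operators described above can be combined into a minimal runnable sketch. It assumes a maximization problem with non-negative fitness, as roulette-wheel selection requires (a minimization objective would first have to be transformed, for example by subtracting it from a suitable constant), one-point crossover applied with probability p_c, independent bit-flip mutation with probability p_m, and elitism. All names and default parameter values are illustrative, not taken from the original.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def roulette_select(pop: List[Chromosome], fits: List[float], rng: random.Random) -> Chromosome:
    """Choose one individual with probability proportional to its fitness (fitness >= 0)."""
    total = sum(fits)
    if total <= 0:  # every slice has zero width: fall back to a uniform choice
        return rng.choice(pop)
    r = rng.uniform(0.0, total)
    acc = 0.0
    for individual, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return individual
    return pop[-1]  # guard against floating-point round-off


def one_point_crossover(p1: Chromosome, p2: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents after a common cut point."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(c: Chromosome, p_m: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability p_m (returns a new list)."""
    return [1 - b if rng.random() < p_m else b for b in c]


def simple_ga(fitness: Callable[[Chromosome], float], n_bits: int, n_pop: int = 30,
              n_gen: int = 50, p_c: float = 0.9, p_m: float = 0.01, seed: int = 0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_pop)]
    history = []  # best fitness found in each generation
    for _ in range(n_gen):
        fits = [fitness(c) for c in pop]
        best = max(range(n_pop), key=fits.__getitem__)
        history.append(fits[best])
        nxt = [pop[best][:]]  # elitism: copy the best individual unchanged
        while len(nxt) < n_pop:
            a = roulette_select(pop, fits, rng)
            b = roulette_select(pop, fits, rng)
            if rng.random() < p_c:
                a, b = one_point_crossover(a, b, rng)
            nxt.append(mutate(a, p_m, rng))
            if len(nxt) < n_pop:
                nxt.append(mutate(b, p_m, rng))
        pop = nxt
    fits = [fitness(c) for c in pop]
    best = max(range(n_pop), key=fits.__getitem__)
    history.append(fits[best])
    return pop[best], history


# Usage: "OneMax" (fitness = number of 1 bits), whose optimum is the all-ones string.
best, history = simple_ga(sum, n_bits=20)
print(history[0], history[-1], best)
```

Because the elite individual is copied unchanged, the recorded best fitness can never decrease from one generation to the next.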

The important variables in the GA method are the
population size, $N_{pop}$, the total number of generations
allowed, $N_{gen}$, the number of bits used to represent
a real variable, and the mutation rate. The total CPU
time used in an optimization run is proportional to
$N_{pop} \times N_{gen} \times T(f)$, where $T(f)$ is the time required to
evaluate the fitness function $f(x_1, \ldots, x_n)$. For a fixed
computational budget, this leads to a trade-off between
having large, diverse populations that explore parameter
space widely, and having smaller populations that can be
evolved for more generations. In practice, the
choice is problem dependent.
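As a purely illustrative calculation with hypothetical numbers, the same budget of $10^5$ fitness evaluations can be spent either way:

$$N_{pop} \times N_{gen} = 1000 \times 100 = 100 \times 1000 = 10^{5}\ \text{evaluations},$$

so with an assumed $T(f) = 10$ ms both runs cost about $10^{5} \times 0.01\ \text{s} = 1000\ \text{s}$ of CPU time; the choice only changes how that time is divided between breadth (population size) and depth (number of generations).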
The simple GA and a large number of variants
have been used successfully to find near-optimal solutions
to many engineering and scientific applications
([2,3,4,6,9,10,11]). Although much effort has gone into
formally analyzing the GA to understand why it is so
robust, the most important formal result is the schema
theorem ([6,7,8]). Schemata are strings made up of the
characters 1, 0 and *, where * is the 'don't care' character.
These schemata are building blocks out of which
the strings representing individuals' chromosomes can
be constructed. For instance, the string 11100 is an
instance of schemata such as 111**, *1100 and 1*10*. The
schema theorem provides a powerful statement about the
behavior of schemata in a population. Mathematically, it states


$$m(H, g+1) \ge m(H, g)\,\frac{f(H)}{\bar f}\left[1 - p_c\,\frac{\delta(H)}{l-1} - o(H)\,p_m\right] \qquad (1)$$

where $m(H, g)$ is the number of examples of a schema $H$
that exist in the population at generation $g$; $f(H)$ is the
average fitness of chromosomes containing $H$; $\bar f$ is the
average fitness of all chromosomes; $p_c$ is the probability
that crossover will occur at a particular mating; $p_m$ is
the probability that a particular bit will be mutated; $l$
is the length of the chromosome; $\delta(H)$ is the defining
length of the schema, that is, the distance in bits between
its first and last fixed positions; and $o(H)$ is the order of
the schema, defined to be the number of fixed (as opposed
to don't care) positions in the schema.
The factors outside the brackets in (1) indicate that
a particular schema will increase its representation in
the population at a rate proportional to its fitness relative
to the average fitness. Good schemata will increase
their representation exponentially and bad schemata
will decrease their representation likewise. The terms
inside the brackets serve to slow this exponential
growth by disrupting the selection-based pressure.
Both crossover and mutation can disrupt good
schemata. The longer the defining length of a schema,
the more likely it is to be disrupted by crossover and
to disappear from the population. In the same fashion,
schemata with many fixed positions are more likely to
be disrupted by mutations. The competition between
selection, which drives the population towards convergence
on a good solution, and crossover and mutation,
which drive the population towards more diverse states,
is the key to the GA. Crossover is especially important
for keeping the method from being trapped in local minima.
One consequence of the parameter shuffling brought about
by the crossover operator is that the GA is most efficient
at optimizing functions that are at least partially separable.
One individual can find a state where half of the
parameters of the fitness function are optimized and a
second individual can find a state where the other half are
optimized. If these individuals cross over at the correct
point, one of their children will have the parameter
values that globally optimize the function.
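The quantities in the schema theorem (1) are easy to compute for a concrete schema. The sketch below evaluates the order, the defining length and the right-hand side of the bound; the numbers in the example ($m = 10$, $f(H)/\bar f = 1.2$, $p_c = 0.6$, $p_m = 0.01$) are arbitrary illustrative values, and the function names are hypothetical.

```python
def order(schema: str) -> int:
    """o(H): number of fixed (non-'*') positions."""
    return sum(ch != "*" for ch in schema)


def defining_length(schema: str) -> int:
    """delta(H): distance between the first and last fixed positions (0 if fewer than two)."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def matches(schema: str, chromosome: str) -> bool:
    """True if the chromosome is an instance of the schema."""
    return len(schema) == len(chromosome) and all(
        s in ("*", c) for s, c in zip(schema, chromosome))


def schema_lower_bound(m: float, f_H: float, f_bar: float,
                       p_c: float, p_m: float, schema: str) -> float:
    """Right-hand side of the schema theorem for generation g + 1."""
    length = len(schema)  # l, the chromosome length
    survival = 1.0 - p_c * defining_length(schema) / (length - 1) - order(schema) * p_m
    return m * (f_H / f_bar) * survival


H = "1*10*"
print(order(H), defining_length(H), matches(H, "11100"))  # 3 3 True
print(schema_lower_bound(10, 1.2, 1.0, 0.6, 0.01, H))     # about 6.24
```

Here the bracketed survival factor is $1 - 0.6 \cdot 3/4 - 3 \cdot 0.01 = 0.52$, so even a schema 20% fitter than average is expected to lose instances under these (disruptive) settings.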

As with most other heuristic global optimization 
methods, no definitive statements can be made about 
the global optimality of GA-generated solutions. 

A family of algorithms that are very similar to the
GA, called evolution strategies, was developed independently
and virtually simultaneously in Germany by I.
Rechenberg ([1,12]).
Genetic algorithms (GAs; cf. also ▶ Genetic algorithms)
have been used for a large number of modeling
applications in chemical and biological fields [5,9].
At least three factors contribute to this. First, GAs
provide an easy-to-use global search and optimization
approach. Second, they can easily handle discontinuous
functions. Finally, they are relatively robust even
for moderately high-dimensional problems. All of these
factors have contributed to the use of the GA for the important
but computationally demanding field of protein structure
prediction.

Proteins carry out a wide variety of functions in liv- 
ing cells, almost all of which require that the protein 
molecules assume precise 3-dimensional shapes [2,3]. 
Enzymes are typical examples. They generally consist 
of a large structure of 100-300 amino acids stabilizing 
a small active site which is designed to carry out a spe- 
cific chemical reaction such as cleaving a bond in a tar- 
get molecule. Even slight changes in the structure of 
the active site can destroy the protein’s ability to func- 
tion. Many drugs act by fitting snugly into enzymes’ ac- 
tive sites, causing them to shut down. Therefore, a de- 
tailed understanding of the 3-dimensional structure of 
a protein can enhance our understanding of its func- 
tion. This can in turn help understand related disease 
processes and can finally lead to disease cures. Unfortunately,
the experimental determination of protein structures
using X-ray crystallography or solution NMR is
very difficult. Currently the structures of only a few
thousand of the estimated 100,000 proteins that are 
used by the human body have been determined this 
way. The alternative is to predict the structures com- 
putationally. 

The basic computational approach is simple to state, 
although many details have yet to be worked out. It re- 
lies on the experimental fact that a protein in solution 
(as well as any other molecule) will tend to find a state 
of low free energy. Free energy accounts for the inter- 
nal energy (potential plus kinetic) of single molecules as 
well as the entropy of the ensemble of molecules of the 
same type. At absolute zero, the entropy contribution 
to the energy, as well as the kinetic energy, go to zero, 
leaving only the potential energy. Therefore, the most 
likely shape or state of a protein at absolute zero is the 
one of lowest potential energy. The simplest computa- 
tional model then needs a method to search the space of 
conformations and an energy function (approximating 
the physical potential energy) which is minimized dur- 
ing the search. (A protein’s conformation is the descrip- 
tion of the 3-dimensional positions of all of the atoms 
for a fixed set of atoms and atom-atom connections. 
The configuration describes the atom-atom connectiv- 
ity and only changes through chemical bond forming 
or breaking.) The conformation which yields the lowest 
value of the energy function is the best estimate of the
conformation of the natural protein. It is possible to extend
this simple model to include the effects of finite tem- 
perature, but these extensions are beyond the scope of 
this article. In-depth discussions of molecular model- 
ing, including energy functions for proteins and other 
molecules can be found in [6,8,10], and [1]. 

Because proteins possess many degrees of freedom, 
and the energy functions have many local minima, 
global optimization methods that search efficiently and 
are not prone to being caught in local minima are re- 
quired. The GA is often used because it fits both of these 
criteria. 

Proteins [2] are long linear polymers composed of
well-conserved sequences of the 20 amino acids. Each
amino acid is in turn made up of a backbone

$$-\left(\mathrm{NH} - \mathrm{C}_{\alpha}\mathrm{H}(R) - \mathrm{CO}\right)-$$


where R stands for one of the 20 side groups that make 
the amino acids unique. These range from a single hy- 
drogen atom to chains having many degrees of free- 
dom. The primary structure of the protein is simply the 
sequence of amino acids. For many naturally occurring 
proteins, this sequence carries sufficient information to 
determine the final 3-dimensional or tertiary structure 
of the protein. Experimentally, proteins that have been 
denatured (caused to unfold by heating the solution or 
changing its chemical composition) will spontaneously 
refold to their active, or native conformation, when the 
solution is returned to its original state. 

There are two sets of coordinates often used for
specifying the conformation of a protein. The first are
the standard Cartesian coordinates for each atom. For
N atoms, this requires 3N numbers, of which 3N − 6 are
independent once overall translation and rotation are
removed. The alternative is to use internal coordinates,
which are the bond distances (distances between atoms
bound together), the bond angles (angles formed by a
given atom and two atoms bound to it), and the dihedral
angles (the angle of rotation about a center bond for a
set of 4 atoms bound as A-B-C-D). To a good first
approximation, the bond distances and bond angles are
fixed at values that are independent of the particular
amino acid or protein. Therefore, the conformation of a
protein is determined largely by the values of its dihedral
angles. There are on average about 15 atoms and about
3 dihedrals per amino acid, requiring about N/5 degrees
of freedom to describe the conformation of an N-atom
protein. The dimension of conformation space for a
moderate-size protein of 100 amino acids (~1500 atoms)
is ~4500 when using Cartesian coordinates vs. ~300 when
using internal coordinates with fixed bond distances and
angles.
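Spelled out, the counts quoted above follow from the stated per-residue averages (about 15 atoms and about 3 dihedrals per amino acid):

$$N \approx 100 \times 15 = 1500, \qquad 3N - 6 = 4494 \approx 4500, \qquad 100 \times 3 = 300 = N/5.$$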

In many protein structure prediction applications, 
the simple GA approach is used. For each generation, 
one calculates the fitness (energy) of each individual 
in the population, selects pairs of individuals based on 
their energy, and performs crossover and mutation. The GA
chromosome directly codes for the values of the dihe- 
dral angles. Both binary encoded and real number en- 
coded chromosomes have been used with equal success. 
For binary encoded dihedrals, one must decide on the 
resolution of the GA search. The maximum one would 
use is 10 bits per angle which gives a resolution of about 
1/3 degree. Often as few as 5 or 6 bits will be sufficient, 
especially if the GA-generated conformations will be 
subjected to local gradient minimization. 
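A minimal sketch of the binary decoding step, assuming that each dihedral is stored as an unsigned integer of a fixed number of bits and mapped onto the periodic range [−180°, 180°) in steps of 360°/2^bits (other mappings are possible; the function names are illustrative). With 10 bits the step is 360/1024 ≈ 0.35°, the "about 1/3 degree" quoted above, while 5 or 6 bits give steps of 11.25° or 5.625°.

```python
from typing import List


def angle_step(bits_per_angle: int) -> float:
    """Angular resolution in degrees for a periodic encoding with 2**bits levels."""
    return 360.0 / 2 ** bits_per_angle


def decode_dihedrals(chromosome: str, bits_per_angle: int) -> List[float]:
    """Split a bit string into fixed-width words and map each word to an angle in [-180, 180)."""
    if len(chromosome) % bits_per_angle != 0:
        raise ValueError("chromosome length must be a multiple of bits_per_angle")
    step = angle_step(bits_per_angle)
    angles = []
    for start in range(0, len(chromosome), bits_per_angle):
        k = int(chromosome[start:start + bits_per_angle], 2)  # unsigned value of the word
        angles.append(-180.0 + k * step)
    return angles


print(angle_step(10))                                     # 0.3515625
print(decode_dihedrals("0000000000" + "1000000000", 10))  # [-180.0, 0.0]
```

The decoded angles would then be passed to the energy function, whose value serves as the fitness (to be minimized) in the selection step.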

For each GA individual, the chromosome is de- 
coded to give the values of the dihedrals which are 
passed to the energy function. This in turn returns an 
energy which is used as the fitness for the subsequent 
selection process. 

Another encoding scheme that is often used is based 
on the idea of a rotamer library. It is known from study- 
ing the set of experimentally known structures that the 
dihedral angles in many amino acid side chains take 
on restricted sets of values. Also, the values of several 
neighboring dihedrals are often correlated. It has then 
been possible to develop libraries of preferred sidechain 
conformations (called rotamers) for each amino acid. 
This can be incorporated into the GA by having each 
word in the chromosome simply determine which of 
a set of rotamers to use for each amino acid in the se- 
quence. The use of rotamer libraries in the GA frame- 
work is illustrated in references [7,12,13,14], and [11]. 
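A minimal sketch of the rotamer encoding, in which each word of the chromosome is an integer index into a per-residue list of rotamers. The library below is a hypothetical placeholder (idealized staggered χ1 values), not real rotamer-library data, and folding out-of-range genes back with a modulo is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL rotamer library for illustration only: idealized staggered chi1 values
# in degrees. A real library lists more rotamers per residue and more dihedrals.
ROTAMERS: Dict[str, List[Tuple[float, ...]]] = {
    "SER": [(60.0,), (180.0,), (-60.0,)],
    "VAL": [(60.0,), (180.0,), (-60.0,)],
    "GLY": [()],  # glycine has no side-chain dihedral
}


def decode_rotamers(sequence: Sequence[str], genes: Sequence[int]) -> List[Tuple[float, ...]]:
    """Each integer gene selects one rotamer for the residue at the same position."""
    if len(sequence) != len(genes):
        raise ValueError("need exactly one gene per residue")
    chosen = []
    for residue, gene in zip(sequence, genes):
        options = ROTAMERS[residue]
        chosen.append(options[gene % len(options)])  # fold any integer onto a valid index
    return chosen


print(decode_rotamers(["SER", "GLY", "VAL"], [4, 7, 2]))  # [(180.0,), (), (-60.0,)]
```

Compared with the bit-level dihedral encoding, this shrinks the search space to the discrete set of library conformations for each side chain.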

The other major ingredient needed for a protein 
structure prediction method is an energy function to be 
minimized. This is a huge area of research which is be- 
yond the scope of this article, but two major approaches 
will be summarized. The first scheme uses physics- 
based empirical potentials. These are functions of the 
bond distances, bond angles, dihedral angles, and non- 
bonded distances (distances between atoms not directly 
bound together). The functional forms are derived from
the results of accurate but computationally expensive 
quantum mechanical calculations that are performed 
on small molecular fragments such as individual amino 
acids. The results are fitted to simple functions with 
several free parameters. The parameter values are ei- 
ther taken from the original quantum calculations or 
from independent spectroscopic experiments. Various 
methods are used to approximate the effect of the water 
and salt environment around the protein. The advan- 
tage of these potentials is that they are continuous and 
very general. They can be constructed for any protein 
and give reasonable energies for any conformation re- 
quested. The disadvantage is that they are not yet suf- 
ficiently accurate to give reliable structure predictions. 
For many if not all of the proteins whose structure is 
known, there are conformations that have much lower 
calculated energy than that of the experimental confor- 
mation. 

The second approach is to use potentials based on 
observations of known protein structures. Basically,
more probable conformations (ones that look more like
real proteins) are assigned lower energy values. For
instance, certain sequences of amino acids almost always
assume a particular secondary structure. The secondary 
structure of a protein describes the presence of multi- 
amino acid helices, sheets and turns but not the ex- 
act placement of the atoms in the secondary structure 
elements or the spatial orientation of these elements. 
These potentials have the advantage that they build on 
our observations of proteins as entire molecules and in- 
corporate long-range order. As with the empirical po- 
tentials, though, they suffer from accuracy problems. 
However, except for very small proteins (fewer than 20
amino acids), the structure-based potentials show the
most promise.

A common feature of GA-based protein structure
prediction methods is the use of hybrid approaches
combining a standard GA with a local search method.
The GA is then used primarily to perform an efficient
global search which is biased towards regions of
conformation space with low energy. This is a pragmatic
approach driven by the large number of degrees of
freedom even when internal coordinates are used. A simple
and often used approach [5] is to subject GA-generated
conformations to gradient minimization. Another
approach is to use a population of individuals which carry
out independent Monte-Carlo or simulated annealing
walks (cf. also ▶ Simulated annealing methods in protein
folding; ▶ Monte-Carlo simulated annealing in
protein folding) for a number of steps and then undergo
selection, crossover and mutation [4,15,16].
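A compact sketch of the GA-plus-gradient-minimization hybrid. It assumes a Lamarckian variant (the locally minimized angles are written back into the population), truncation selection instead of roulette-wheel selection, a central-difference gradient, and a made-up toy energy function standing in for a real protein energy. None of these choices is prescribed by the text, and all names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_energy(phi: Sequence[float]) -> float:
    """Hypothetical torsion-like energy (NOT a protein force field); always >= 0."""
    e = sum(1.0 + math.cos(3.0 * p) for p in phi)                          # threefold term per angle
    e += sum(0.5 * (1.0 - math.cos(a - b)) for a, b in zip(phi, phi[1:]))  # neighbor coupling
    return e


def numerical_gradient(f: Callable[[Vector], float], x: Vector, h: float = 1e-6) -> Vector:
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def gradient_minimize(f, x, steps: int = 30, lr: float = 0.5) -> Tuple[Vector, float]:
    """Steepest descent with step halving; a step is taken only if f strictly decreases."""
    x, fx = list(x), f(x)
    for _ in range(steps):
        g = numerical_gradient(f, x)
        t = lr
        while t > 1e-8:
            trial = [xi - t * gi for xi, gi in zip(x, g)]
            ft = f(trial)
            if ft < fx:
                x, fx = trial, ft
                break
            t *= 0.5
        else:
            break  # no decrease found along -g: treat x as a local minimum
    return x, fx


def hybrid_ga(energy, n_angles: int, n_pop: int = 16, n_gen: int = 10,
              p_m: float = 0.2, seed: int = 1) -> Tuple[Vector, List[float]]:
    """Real-coded GA in which every new conformation is locally minimized before selection."""
    rng = random.Random(seed)
    pop = [gradient_minimize(energy, [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)])[0]
           for _ in range(n_pop)]
    history = []  # lowest energy in each generation
    for _ in range(n_gen):
        pop.sort(key=energy)                  # lower energy = fitter
        history.append(energy(pop[0]))
        parents = pop[: n_pop // 2]           # truncation selection (assumed, not roulette)
        nxt = [pop[0]]                        # elitism
        while len(nxt) < n_pop:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_angles - 1)
            child = a[:cut] + b[cut:]         # one-point crossover on the angle vector
            child = [x + rng.gauss(0.0, 0.5) if rng.random() < p_m else x for x in child]
            nxt.append(gradient_minimize(energy, child)[0])
        pop = nxt
    pop.sort(key=energy)
    history.append(energy(pop[0]))
    return pop[0], history


best, history = hybrid_ga(toy_energy, n_angles=6)
print(history[0], history[-1])
```

Because the elite conformation is carried over unchanged and the local search only accepts strict decreases, the recorded lowest energy never increases; the GA supplies the global exploration while the gradient steps do the fine positioning.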


See also 


> Adaptive Simulated Annealing and its Application 
to Protein Folding 

> Bayesian Global Optimization 

> Genetic Algorithms 

> Global Optimization Based on Statistical Models 

> Monte-Carlo Simulated Annealing in Protein 
Folding 

> Packet Annealing 

> Random Search Methods 

> Simulated Annealing Methods in Protein Folding 

> Stochastic Global Optimization: Stopping Rules 

> Stochastic Global Optimization: Two-phase 
Methods 
Introduction


Geometric programming is an important class of
nonlinear optimization problems. Its origins date back
to the 1960s, when Zener began to study a special type
of cost-minimization problem arising in engineering
design, now known as geometric programming. The term
geometric programming is adopted because of the crucial
role that the arithmetic-geometric mean inequality
plays in its initial development.

The early work in geometric programming was, for the
most part, concerned with minimizing posynomial
functions subject to inequality constraints on such
functions, which was called posynomial geometric
programming. In the past decade, because a number of
models abstracted from application fields could not be
cast as posynomial geometric programs, the theory had
to be generalized to a much broader class of optimization
problems called generalized geometric programming,
which has spawned a wide variety of applications since
its initial development. Its greatest impact has been in
the areas of (1) engineering design [1,4,10,11]; (2)
economics and statistics [2,3,6,9]; (3) manufacturing [8,17];
and (4) chemical equilibrium [13,16]. Reference [19]
focuses on solutions for generalized geometric programming.


Formulation 


Reference [19] provides a global optimization algorithm for the
generalized geometric programming (GGP) problem
stated as:

$$(\mathrm{GGP}):\quad \begin{cases} \min & G_0(x) \\ \text{s.t.} & G_m(x) \le \beta_m, \quad m = 1, \ldots, M, \\ & x \in X = \{x : 0 < x_i^L \le x_i \le x_i^U,\ i = 1, \ldots, N\}, \end{cases}$$

where $G_m(x) = \sum_{t=1}^{T_m} \delta_{mt} c_{mt} \prod_{i=1}^{N} x_i^{\gamma_{mti}}$, $m = 0, 1, \ldots, M$,
the $c_{mt}$ are positive coefficients, $T_m$ is the given number of
terms in the function $G_m(x)$, $\delta_{mt} = +1$ or $-1$,
$\beta_m = +1$ or $-1$, and the $\gamma_{mti}$ are arbitrary real constant
exponents. In general, formulation GGP corresponds to a
nonlinear optimization problem with a nonconvex objective
function and constraint set.
In $G_m(x)$, if $\delta_{mt} = +1$ for all $t = 1, \ldots, T_m$, and
$x_i > 0$, $i = 1, \ldots, N$, then the function $G_m(x)$ is called
a posynomial. Note that if we set $\delta_{mt} = +1$ for all
$m = 0, 1, \ldots, M$, $t = 1, \ldots, T_m$, and $\beta_m = +1$ for all
$m = 1, \ldots, M$, then the GGP formulation reduces
to the classical posynomial geometric programming
(PGP) formulation that laid the foundation for the
theory of the GGP problem.
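A small sketch of how a GGP function $G_m(x)$ can be evaluated, representing each term by its sign $\delta_{mt}$, its coefficient $c_{mt}$ and its exponent vector. The data layout and function names are illustrative assumptions, and the example function is made up.

```python
import math
from typing import List, Sequence, Tuple

# One term of G_m: (delta_mt, c_mt, [gamma_mt1, ..., gamma_mtN])
Term = Tuple[int, float, Sequence[float]]


def signomial(terms: List[Term], x: Sequence[float]) -> float:
    """Evaluate G_m(x) = sum_t delta_mt * c_mt * prod_i x_i**gamma_mti for x > 0."""
    total = 0.0
    for delta, c, gamma in terms:
        if delta not in (1, -1) or c <= 0:
            raise ValueError("need delta in {+1, -1} and c > 0")
        total += delta * c * math.prod(xi ** g for xi, g in zip(x, gamma))
    return total


def is_posynomial(terms: List[Term]) -> bool:
    """True when every sign delta_mt is +1."""
    return all(delta == 1 for delta, _, _ in terms)


# G(x) = 2 x1 / x2 - x1**0.5 * x2**2 at x = (4, 2): 2*4/2 - 2*4 = 4 - 8
G = [(1, 2.0, [1.0, -1.0]), (-1, 1.0, [0.5, 2.0])]
print(signomial(G, [4.0, 2.0]), is_posynomial(G))  # -4.0 False
```

A single negative sign is enough to make $G_m$ a general signomial rather than a posynomial, which is what rules out the classical PGP duality theory.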
Local optimization approaches for solving the GGP
problem generally fall into three kinds.
First, successive approximation by posynomials, called 
“condensation,” is the most popular [14]. Second, Passy 
and Wilde [15] developed a weaker type of duality, 
called “pseudo-duality,” to accommodate this class of 
nonlinear optimization. Third, some nonlinear pro- 
gramming methods are adopted to solve the GGP prob- 
lem based on exploiting the characteristics of the GGP 
problem [12]. 

Though local optimization methods for solving the 
GGP problem are ubiquitous, global optimization algo- 
rithms based on the characteristics of the GGP prob- 
lem are scarce. Maranas and Floudas [13] proposed 
such a global optimization algorithm based on the ex- 
ponential variable transformation of GGP, the convex 
relaxation, and branch and bound on hyperrectangular
regions. Reference [19] proposes a branch-and-bound
optimization algorithm that solves a sequence
of linear relaxations over partitioned subsets in order
to find a global solution. To generate the linear
relaxation of each subproblem and to ensure convergence
to a global solution, the following strategies are
applied. (1) The equivalent reverse convex programming
(RCP) formulation is considered. (2) A linear
relaxation method for the RCP problem is proposed,
based on the arithmetic-geometric mean inequality and
on a linear upper bound of the reverse convex
constraints; this method is computationally more convenient
than the convex relaxation method of [13].
(3) A bound tightening method is developed to
enhance the solution procedure, and a branch-and-bound
algorithm based on this method is proposed.


Methods and Applications 
Transformation 


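In [5], Duffin and Peterson show that any GGP problem
can be transformed into the following reverse posynomial
geometric programming (RPGP) problem:

$$(\mathrm{RPGP}):\quad \begin{cases} \min & x_0 \\ \text{s.t.} & g_m(x) \le 1, \quad m = 1, \ldots, p, \\ & g_m(x) \ge 1, \quad m = p+1, \ldots, q, \\ & x \in \Omega = \{x : 0 < x_i^L \le x_i \le x_i^U < \infty,\ i = 0, 1, \ldots, n\}, \end{cases}$$

where the $g_m(x)$ are posynomials for $m = 1, \ldots, q$, and
$n \ge N$.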

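To see how such a reformulation is possible, first
consider the objective function in GGP. If the optimal
value of GGP is positive, the GGP problem is equivalent
to the following form:

$$(\mathrm{GGP1}):\quad \begin{cases} \min & x_0 \\ \text{s.t.} & x_0^{-1} G_0(x) \le 1, \\ & G_m(x) \le \beta_m, \quad m = 1, \ldots, M, \\ & x \in X. \end{cases}$$

And if the optimal value of GGP is negative, then GGP
can be transformed into the following form:

$$(\mathrm{GGP2}):\quad \begin{cases} \min & x_0 \\ \text{s.t.} & x_0 G_0(x) \le -1, \\ & G_m(x) \le \beta_m, \quad m = 1, \ldots, M, \\ & x \in X. \end{cases}$$

(With $G_0(x) < 0$, the first constraint of GGP2 reads
$x_0 \ge 1/|G_0(x)|$, so minimizing $x_0$ maximizes $|G_0(x)|$, that is,
minimizes $G_0(x)$.) Alternatively, a sufficiently large
constant can be added to the objective function of GGP
to ensure that its optimal value is positive, after which
the form GGP1 applies. This approach requires a
(possibly rough) lower bound estimate for the optimal
value of GGP.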

Secondly, we turn to the constraints. If the
original constraint function $G_m(x)$ is either a posynomial
or the negative of a posynomial, then the constraint can
be put directly into the form of RPGP. So we only
consider the following constraint function:

$$G(x) = h_1(x) - h_2(x) \le 1,$$

where each $h_i(x)$ $(i = 1, 2)$ is a posynomial. Notice that
$x$ satisfies the above inequality if and only if there exists
a single variable $s > 0$ such that $(x, s)$ satisfies

$$h_1(x) \le s \le h_2(x) + 1.$$

Now note that the above formulation is equivalent to
the following two constraints

$$s^{-1} h_1(x) \le 1 \quad \text{and} \quad s^{-1} h_2(x) + s^{-1} \ge 1,$$

which are in a form consistent with the formulation
RPGP.
By applying the following exponent transformation

$$x_i = \exp z_i, \quad i = 0, \ldots, n,$$

to the formulation RPGP, we can obtain the following
reverse convex programming (RCP) problem:


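$$(\mathrm{RCP}):\quad \begin{cases} \min & \exp(z_0) \\ \text{s.t.} & g_m(z) \le 1, \quad m = 1, \ldots, p, \\ & g_m(z) \ge 1, \quad m = p+1, \ldots, q, \\ & z \in \Omega = \{z : z_i^L \le z_i \le z_i^U,\ i = 0, 1, \ldots, n\}, \end{cases}$$

where

$$g_m(z) = \sum_{t=1}^{T_m} c_{mt} \exp\left(\sum_{i=0}^{n} \gamma_{mti} z_i\right), \quad m = 1, \ldots, q,$$

and $z_i^L = \ln x_i^L$, $z_i^U = \ln x_i^U$. Because each $\exp(\sum_{i=0}^{n} \gamma_{mti} z_i)$
is convex and the coefficients $c_{mt}$ are positive, both the
objective and constraint functions are convex.

The main difficulty for solving the RCP problem is
connected with the presence of the reverse convex
constraints $g_m(z) \ge 1$, $m = p+1, \ldots, q$, which destroy
the convexity and possibly even the connectivity of the
feasible set and give rise to a nonconvex feasible region.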


Linear Relaxation Programming 


The principal construct in the development of a solution
procedure for the RCP problem is the construction
of a linear relaxation programming problem of RCP
for obtaining a lower bound for this problem, as well
as for its partitioned subproblems. Reference [19] derives
such a linear relaxation by applying the arithmetic-geometric
mean inequality to the convex constraints and by
overestimating the convex function in every reverse convex
constraint over either the initial bounds on the variables
of the problem or the modified bounds defined for some
partitioned subproblem in a branch-and-bound scheme.


(1) Linear Relaxation for Convex Constraints. The
arithmetic-geometric mean inequality that played such
a crucial role in developing the duality theory for
posynomial programming is also used to obtain the
linear relaxation programming. Recall that this inequality
states that for any vector $u > 0$ and any nonnegative
weight vector $\varepsilon$ whose components sum to one, we have

$$\sum_{t} u_t \ge \prod_{t} \left(\frac{u_t}{\varepsilon_t}\right)^{\varepsilon_t},$$

provided $(u_t/\varepsilon_t)^{\varepsilon_t}$ is defined to be 1 when $\varepsilon_t = 0$. Given
a posynomial

$$g_m(x) = \sum_{t=1}^{T_m} u_{mt}(x) = \sum_{t=1}^{T_m} c_{mt} \prod_{i=0}^{n} x_i^{\gamma_{mti}}$$

and $\varepsilon_m > 0$ with $\sum_t \varepsilon_{mt} = 1$, the condensed
posynomial $\bar g_m(x)$ is defined by

$$\bar g_m(x) = \bar c_m \prod_{i=0}^{n} x_i^{\bar\gamma_{mi}},$$

where $\bar c_m = \prod_t (c_{mt}/\varepsilon_{mt})^{\varepsilon_{mt}}$ and $\bar\gamma_{mi} = \sum_t \gamma_{mti} \varepsilon_{mt}$.

Thus the condensed posynomial $\bar g_m(x)$ is also
a posynomial, and it has a single term (a monomial).
According to this method, the condensed single term
for the convex constraints $g_m(z) \le 1$ of RCP, where
$z_i = \ln x_i$, is of the following form:

$$\bar g_m(z) = \bar c_m \exp\left(\sum_{i=0}^{n} \bar\gamma_{mi} z_i\right), \qquad (1)$$

where $\bar c_m$ and $\bar\gamma_{mi}$ are as defined above.

To illustrate how the condensed term can be used to
obtain the linear relaxation, consider the convex
constraints $g_m(z) \le 1$, $m = 1, \ldots, p$, and select
an arbitrary weight vector $\varepsilon_m > 0$ whose components
sum to one. We use the condensed constraint functions
to replace these convex constraints:

$$\bar g_m(z) \le 1, \quad m = 1, \ldots, p. \qquad (2)$$

By the arithmetic-geometric mean inequality it follows that

$$\bar g_m(z) \le g_m(z)$$

for each $m = 1, \ldots, p$. Thus if in RCP the convex
constraints are replaced by the condensed constraints, the
feasible region of RCP will be contained in the new
feasible region. Notice that, after taking logarithms, the
condensed constraints (2) can easily be transformed into
equivalent linear constraints:

$$L_m(z) = \sum_{i=0}^{n} \bar\gamma_{mi} z_i + \ln \bar c_m \le 0, \quad m = 1, \ldots, p.$$
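The condensation step and the resulting linear constraint can be checked numerically. The sketch below assumes a posynomial given as (coefficient, exponent-vector) pairs, and the particular posynomial is made up. It uses a common choice of weights, $\varepsilon_{mt} = u_{mt}(\hat x)/g_m(\hat x)$ at a chosen point $\hat x$, for which $\bar g_m(\hat x) = g_m(\hat x)$; the text itself allows any positive weights summing to one.

```python
import math
from typing import List, Sequence, Tuple

# Posynomial term: (c_t, [gamma_t0, ..., gamma_tn]) with c_t > 0
Term = Tuple[float, Sequence[float]]


def monomial(c: float, gamma: Sequence[float], x: Sequence[float]) -> float:
    """c * prod_i x_i**gamma_i for x > 0."""
    return c * math.prod(xi ** gi for xi, gi in zip(x, gamma))


def posynomial(terms: List[Term], x: Sequence[float]) -> float:
    return sum(monomial(c, gamma, x) for c, gamma in terms)


def condense(terms: List[Term], eps: Sequence[float]) -> Tuple[float, List[float]]:
    """AM-GM condensation: c_bar = prod_t (c_t/eps_t)**eps_t, gamma_bar_i = sum_t eps_t*gamma_ti."""
    if any(e <= 0 for e in eps) or abs(sum(eps) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to one")
    c_bar = math.prod((c / e) ** e for (c, _), e in zip(terms, eps))
    n = len(terms[0][1])
    gamma_bar = [sum(e * gamma[i] for (_, gamma), e in zip(terms, eps)) for i in range(n)]
    return c_bar, gamma_bar


def linear_form(c_bar: float, gamma_bar: Sequence[float], z: Sequence[float]) -> float:
    """L_m(z) = sum_i gamma_bar_i*z_i + ln c_bar, the log of the condensed term at x = exp(z)."""
    return sum(gi * zi for gi, zi in zip(gamma_bar, z)) + math.log(c_bar)


terms = [(1.0, [2.0, 0.0]), (2.0, [-1.0, 1.0]), (0.5, [0.0, -0.5])]  # made-up posynomial
x_hat = [1.5, 0.8]
u = [monomial(c, gamma, x_hat) for c, gamma in terms]
eps = [ui / sum(u) for ui in u]  # weights proportional to the terms at x_hat
c_bar, gamma_bar = condense(terms, eps)

print(math.isclose(monomial(c_bar, gamma_bar, x_hat), posynomial(terms, x_hat)))  # True
x = [0.7, 2.0]
print(monomial(c_bar, gamma_bar, x) <= posynomial(terms, x))                     # True
```

The first check shows the condensation is tight at $\hat x$; the second shows the AM-GM underestimate elsewhere, which is why replacing $g_m \le 1$ by $L_m \le 0$ can only enlarge the feasible region.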


(2) Linear Relaxation for Reverse Convex Constraints.
For reverse convex constraints, such a linear relaxation
can be obtained by overestimating the convex function
$g_m(z)$ of every reverse convex constraint with a linear
function $L_m(z)$, $m = p+1, \ldots, q$. The method of [13]
for underestimating a concave function with a linear
function is adopted, and the linear function is as follows:

$$L_m(z) = \sum_{t=1}^{T_m} c_{mt}\left(A_{mt} + B_{mt} \sum_{i=0}^{n} \gamma_{mti} z_i\right),$$

where

$$A_{mt} = \frac{Y_{mt}^U \exp(Y_{mt}^L) - Y_{mt}^L \exp(Y_{mt}^U)}{Y_{mt}^U - Y_{mt}^L}, \qquad B_{mt} = \frac{\exp(Y_{mt}^U) - \exp(Y_{mt}^L)}{Y_{mt}^U - Y_{mt}^L},$$

$$Y_{mt}^L = \sum_{i=0}^{n} \min(\gamma_{mti} z_i^L, \gamma_{mti} z_i^U), \qquad Y_{mt}^U = \sum_{i=0}^{n} \max(\gamma_{mti} z_i^L, \gamma_{mti} z_i^U).$$

Here $A_{mt} + B_{mt} y$ is the secant of $\exp(y)$ over $[Y_{mt}^L, Y_{mt}^U]$, the
range of $y = \sum_i \gamma_{mti} z_i$ on $\Omega$, and since $\exp$ is convex it follows that

$$L_m(z) \ge g_m(z), \quad m = p+1, \ldots, q.$$


Thus if in (RCP) the reverse convex constraints are re- 
placed by the overestimation linear constraints, the fea- 
sible region for RCP will be contained in the new feasi- 
ble region. 
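The secant construction can be sketched and checked on a grid of points in a box. The posynomial, the box and the grid are made-up test data, and the handling of a degenerate range $Y^U_{mt} = Y^L_{mt}$ is an added safeguard not discussed in the text.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Posynomial term in z-space: (c_t, [gamma_t0, ..., gamma_tn]) with c_t > 0
Term = Tuple[float, Sequence[float]]


def g(terms: List[Term], z: Sequence[float]) -> float:
    """g_m(z) = sum_t c_t * exp(sum_i gamma_ti * z_i)."""
    return sum(c * math.exp(sum(gi * zi for gi, zi in zip(gamma, z))) for c, gamma in terms)


def secant_overestimator(terms: List[Term], zL: Sequence[float],
                         zU: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Linear L_m(z) >= g_m(z) on the box zL <= z <= zU, built from A_mt and B_mt."""
    pieces = []
    for c, gamma in terms:
        yL = sum(min(gi * lo, gi * hi) for gi, lo, hi in zip(gamma, zL, zU))
        yU = sum(max(gi * lo, gi * hi) for gi, lo, hi in zip(gamma, zL, zU))
        if yU - yL < 1e-12:
            # Degenerate range (our own safeguard, not in the text): exp(y) <= exp(yU).
            A, B = math.exp(yU), 0.0
        else:
            B = (math.exp(yU) - math.exp(yL)) / (yU - yL)
            A = (yU * math.exp(yL) - yL * math.exp(yU)) / (yU - yL)
        pieces.append((c, A, B, gamma))

    def L(z: Sequence[float]) -> float:
        return sum(c * (A + B * sum(gi * zi for gi, zi in zip(gamma, z)))
                   for c, A, B, gamma in pieces)

    return L


terms = [(1.0, [1.0, -2.0]), (0.3, [0.5, 1.0])]
zL, zU = [-1.0, 0.0], [1.0, 0.5]
L = secant_overestimator(terms, zL, zU)
grid = [[zL[0] + a * (zU[0] - zL[0]) / 4, zL[1] + b * (zU[1] - zL[1]) / 4]
        for a in range(5) for b in range(5)]
print(all(L(z) >= g(terms, z) - 1e-12 for z in grid))  # True
```

The overestimate is exact where $y$ hits an endpoint of its range and loosest in the middle, which is why shrinking the box by branching tightens the relaxation.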


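(3) Linear Relaxation Programming. For the objective
function of RCP, it is obvious that $\min \exp(z_0)$ is
equivalent to $\min z_0$, since $\exp$ is strictly increasing.
From the above discussion of the two kinds of constraints,
[19] constructs the corresponding linear relaxation
programming problem on the region $\Omega$, LRP($\Omega$), as follows:

$$\mathrm{LRP}(\Omega):\quad \begin{cases} \min & z_0 \\ \text{s.t.} & L_m(z) \le 0, \quad m = 1, \ldots, p, \\ & L_m(z) \ge 1, \quad m = p+1, \ldots, q, \\ & z \in \Omega = \{z : z_i^L \le z_i \le z_i^U,\ i = 0, 1, \ldots, n\}. \end{cases}$$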


The following results establish some salient properties 
of the linear relaxation programming LRP(S2) that are 
essential in designing the proposed algorithm. 


Lemma 1 Assume the minimum of LRP($\Omega$) is $LB^*$;
then $\exp(LB^*)$ provides a lower bound of the optimal
value of the RCP problem.


Proof Denote the feasible regions of RCP and
LRP($\Omega$) by $D$ and $P$, respectively; then $P \supseteq D$ by
construction. Since LRP($\Omega$) minimizes $z_0$ over the larger
set $P$, we have $LB^* \le z_0$ for every $z \in D$, and since $\exp$
is increasing, $\exp(LB^*)$ is a lower bound of the minimum
of the RCP problem. □


Branch-and-Bound Algorithm 


Reference [19] develops a branch-and-bound algorithm to solve the RCP based on the above linear relaxation method. This algorithm solves a sequence of linear relaxation programming problems over $\Omega$ or subsets of $\Omega$ in order to find a global solution. Furthermore, to ensure convergence to a global solution, a new bound tightening method (BTM) is proposed and applied to enhance the solution procedure.

The critical element in guaranteeing convergence to 
a global minimum is the choice of a suitable branching 
rule. In [18] three kinds of branching methods are pro- 
vided. Reference [19] chooses the first method, a simple 
and standard bisection rule. This method is sufficient 
to ensure convergence since it drives all the intervals to 
zero for the variables that are associated with the term 
that yields the greatest discrepancy in the employed ap- 
proximation along any infinite branch of the branch- 
and-bound tree. 

Branching rule:

Assume that the hyperrectangle $\Omega^q$ is going to be divided, where $\Omega^q = \{z : z_i^L(\Omega^q) \le z_i \le z_i^U(\Omega^q),\ i = 0, 1, \dots, n\}$. The branching variable $z_e$, which has the maximum interval length in $\Omega^q$, is selected and $\Omega^q$ is partitioned by the following rules. Let

$$e = \arg\max\{z_i^U(\Omega^q) - z_i^L(\Omega^q) : i = 0, 1, \dots, n\},$$

and partition $\Omega^q$ by bisecting the interval $[z_e^L(\Omega^q), z_e^U(\Omega^q)]$ into the subintervals $[z_e^L(\Omega^q), (z_e^L(\Omega^q) + z_e^U(\Omega^q))/2]$ and $[(z_e^L(\Omega^q) + z_e^U(\Omega^q))/2, z_e^U(\Omega^q)]$.
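The branching rule is a plain longest-edge bisection. The short Python sketch below represents a box by two coordinate lists, which is an assumption made here for illustration. Applied to $\Omega^1$ of Example 1, it selects $z_2$ and splits at 0.5493, which matches the regions $\Omega^2$ and $\Omega^3$ given later.

```python
def bisect_longest_edge(lower, upper):
    """Bisect the box [lower, upper] along its longest edge.

    Returns (e, (lower1, upper1), (lower2, upper2)), where e is the index
    of the branching variable z_e.
    """
    e = max(range(len(lower)), key=lambda i: upper[i] - lower[i])
    mid = 0.5 * (lower[e] + upper[e])
    upper1 = list(upper)
    upper1[e] = mid                 # first child: [z_e^L, midpoint]
    lower2 = list(lower)
    lower2[e] = mid                 # second child: [midpoint, z_e^U]
    return e, (list(lower), upper1), (lower2, list(upper))


# Omega^1 of Example 1: the z2 edge (length 1.0986) is the longest
e, box2, box3 = bisect_longest_edge([1.6094, 0.6931, 0.0], [2.3026, 1.6094, 1.0986])
print(e, box2[1][e], box3[0][e])  # 2 0.5493 0.5493
```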

In what follows we describe the BTM strategy pro- 
posed by [19]. 

Assume that the subhyperrectangle $\Omega^{q(s)}$ ($s$ is the iteration counter) is selected for further consideration. If in the node $q(s)$ the corresponding solution $\hat z(\Omega^{q(s)})$ is not feasible for some convex constraint, let

$$l = \arg\max\Big\{g_m(\hat z(\Omega^{q(s)})) : g_m(\hat z(\Omega^{q(s)})) = \sum_{t=1}^{T_m} u_{mt}(\hat z(\Omega^{q(s)})) > 1,\ m = 1, \dots, p\Big\},$$

where $u_{mt}(z) = c_{mt}\exp(\sum_{i=0}^{n}\gamma_{mti} z_i)$ is the $t$th term of $g_m$. Compute the weight vector $\varepsilon_l$ by $\varepsilon_{lt} = u_{lt}(\hat z)/g_l(\hat z)$, $t = 1, \dots, T_l$, with $\hat z = \hat z(\Omega^{q(s)})$, and then condense the function $g_l(z)$ using this weight vector as described in Sect. "Linear Relaxation Programming." This yields a new single term, and therefore a new linear constraint is added to the linear relaxation programming LRP($\Omega^{q(s)}$). Denote this new linear relaxation programming and the newly added condensed single term by LRP($\bar\Omega^{q(s)}$) and $\hat g_l(z)$. From the discussion in Sect. "Linear Relaxation Programming" we know that $\hat g_l(z) = c_l\exp(\sum_i \hat\gamma_{li} z_i)$, where

$$c_l = \prod_{t=1}^{T_l}\Big(\frac{c_{lt}}{\varepsilon_{lt}}\Big)^{\varepsilon_{lt}}, \qquad \hat\gamma_{li} = \sum_{t=1}^{T_l}\gamma_{lti}\,\varepsilon_{lt}.$$
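It is obvious that

$$\hat g_l(\hat z(\Omega^{q(s)})) = g_l(\hat z(\Omega^{q(s)})),$$

because with this choice of weights every ratio $u_{lt}(\hat z)/\varepsilon_{lt}$ equals $g_l(\hat z)$, so the arithmetic-geometric mean inequality holds with equality at $\hat z$. Since $g_l(\hat z(\Omega^{q(s)})) > 1$, it follows that $\hat z(\Omega^{q(s)})$ does not satisfy the newly added constraint $\hat g_l(z) \le 1$. From the arithmetic-geometric mean inequality, we have $\hat g_l(z) \le g_l(z)$. Of course, the new single-term constraint $\hat g_l(z) \le 1$ is equivalent to a linear constraint. Hence, if $z$ is feasible for RCP, it is certainly feasible for LRP($\bar\Omega^{q(s)}$), whose feasible region does not contain the point $\hat z(\Omega^{q(s)})$. In this way the BTM technique cuts off the current relaxed solution without removing any feasible point of RCP, which tightens the lower bound and enhances the solution procedure.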
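The two facts used by BTM can be checked numerically: the weights $\varepsilon_{lt} = u_{lt}(\hat z)/g_l(\hat z)$ make the condensed term exact at $\hat z$, and it underestimates $g_l$ elsewhere. Below is a minimal Python sketch with illustrative helper names and the same hypothetical `(c_t, gamma_t)` storage. It uses the first relaxed solution of Example 1 and should give the weights (0.7191, 0.2809) reported there.

```python
import math


def term_values(terms, z):
    """u_t(z) = c_t * exp(gamma_t . z) for every term of a posynomial in z."""
    return [c * math.exp(sum(g * zi for g, zi in zip(gamma, z))) for c, gamma in terms]


def btm_weights(terms, z_hat):
    """BTM weights eps_t = u_t(z_hat) / g(z_hat); also returns g(z_hat)."""
    u = term_values(terms, z_hat)
    total = sum(u)
    return [ut / total for ut in u], total


def condensed_value(terms, weights, z):
    """Value at z of the condensed single term prod_t (u_t(z) / eps_t) ** eps_t."""
    value = 1.0
    for ut, eps in zip(term_values(terms, z), weights):
        value *= (ut / eps) ** eps
    return value


f1 = [(1.0, [-1.0, 2.0, 0.0]), (1.0, [-1.0, 0.0, 2.0])]   # f_1 of Example 1
z_hat = [1.6094, 0.6931, 0.2231]                         # solution of LRP(Omega^1)
eps, g_hat = btm_weights(f1, z_hat)
print([round(e, 4) for e in eps])                        # [0.7191, 0.2809]
print(g_hat > 1, math.isclose(condensed_value(f1, eps, z_hat), g_hat))  # True True
```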

Based on the previous BTM technique, [19] con- 
structs the global optimization algorithm. The basic 
steps of the algorithm are summarized in the following 
statement. 


Algorithm Statement 


step 0: (Initialization)


0.1: Choose a convergence tolerance $\delta > 0$ and the initial weights $\varepsilon_m$, $m = 1, \dots, p$. Set the iteration counter $s = 0$, $Q_s = Q_0 = \{1\}$, $q(s) = q(0) = 1$, $\Omega^{q(s)} = \Omega^1 = \Omega$. Set an initial upper bound $U^* = \infty$.

0.2: Solve the problem LRP($\Omega^1$), and denote its solution and minimum by $(\hat z(\Omega^1), LB_{q(s)})$.

0.3: If $\hat z(\Omega^1)$ is feasible for RCP, then stop with $\hat z(\Omega^1)$ as the solution to the RCP problem; else let $LB(s) = LB_{q(s)}$.


0.4: If $\hat z(\Omega^{q(s)})$ is not feasible for some convex constraints, apply the BTM technique.


step 1: (Partitioning step) Choose a branching variable $z_e$, then partition $\Omega^{q(s)}$ to get $\Omega^{q(s).1}$ and $\Omega^{q(s).2}$. Replace $q(s)$ by the node indices $q(s).1$, $q(s).2$ in $Q_s$.


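step 2: (Feasibility check for RCP) For each $q(s).w$, where $w = 1, 2$, compute

$$\underline{g}_m(w) = c_m \exp\Big(\sum_{i=0}^{n}\min(\hat\gamma_{mi} z_i^L, \hat\gamma_{mi} z_i^U)\Big), \quad m = 1, \dots, p,$$

$$\overline{g}_m(w) = \sum_{t=1}^{T_m} c_{mt}\exp\Big(\sum_{i=0}^{n}\max(\gamma_{mti} z_i^L, \gamma_{mti} z_i^U)\Big), \quad m = p+1, \dots, q,$$

where $z^L$ and $z^U$ are the bounds of $\Omega^{q(s).w}$, and $c_m$, $\hat\gamma_{mi}$, $c_{mt}$, $\gamma_{mti}$ have been defined in Sect. "Linear Relaxation Programming." If for some $m \in \{1, \dots, p\}$, $\underline{g}_m(w) > 1$, or for some $m \in \{p+1, \dots, q\}$, $\overline{g}_m(w) < 1$, then the node $q(s).w$ is eliminated. If $\Omega^{q(s).w}$ ($w = 1, 2$) are both eliminated, then go to step 5.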
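step 3: (Updating upper bound) For each undeleted subhyperrectangle, update the parameters $Y^L_{mt}$, $Y^U_{mt}$, $A_{mt}$ and $B_{mt}$ of the linear functions $L_m(z)$, $m = p+1, \dots, q$. Solve LRP($\Omega^{q(s).w}$) for each undeleted node ($w = 1$, $w = 2$, or both), and denote the solutions and optimal values by $(\hat z(\Omega^{q(s).w}), LB_{q(s).w})$. Then, if $\hat z(\Omega^{q(s).w})$ is feasible for RCP, set $U^* = \min\{U^*, LB_{q(s).w}\}$.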


step 4: (Deleting step) If $LB_{q(s).w} > U^* + \delta$, then delete the corresponding node;


step 5: (Fathoming step) Fathom any nonimproving nodes by setting $Q_{s+1} = Q_s - \{q \in Q_s : LB_q \ge U^* - \delta\}$. If $Q_{s+1} = \emptyset$, then stop: $\exp(U^*)$ is the optimal value, and $z^*(k)$ ($k \in K_0$) are the global solutions, where $K_0 = \{k : z_0^*(k) = U^*\}$. Otherwise, set $s = s + 1$;


step 6: (Node-selection step) Set $LB(s) = \min\{LB_q : q \in Q_s\}$, then select an active node $q(s) \in \arg\min\{LB_q : q \in Q_s\}$ for further consideration;
step 7: (Bound tightening step) If in this node $q(s)$, $\hat z(\Omega^{q(s)})$ is feasible for all convex constraints of RCP, then return to step 1; else apply the BTM technique and then return to step 1.
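The elimination test of step 2 only needs interval bounds on the exponents, so it can be evaluated before any LP is solved. The following is a minimal Python sketch, assuming all coefficients are positive and that condensed convex terms and reverse convex posynomials are stored as hypothetical `(c, gamma)` pairs. The function name and data layout are illustrative and are not taken from [19].

```python
import math


def node_is_infeasible(condensed, reverse, lower, upper):
    """Elimination test of step 2 on the box [lower, upper].

    condensed -- list of single terms (c_m, gamma_hat_m) condensing the convex
                 constraints g_m(z) <= 1, m = 1..p
    reverse   -- list of posynomials [(c_mt, gamma_mt), ...] for the reverse
                 convex constraints g_m(z) >= 1, m = p+1..q (all c_mt > 0)
    Returns True when the box provably contains no feasible point of RCP.
    """
    def exponent_range(gamma):
        lo = sum(min(g * a, g * b) for g, a, b in zip(gamma, lower, upper))
        hi = sum(max(g * a, g * b) for g, a, b in zip(gamma, lower, upper))
        return lo, hi

    for c, gamma in condensed:
        # minimum of the condensed term over the box; it underestimates g_m
        if c * math.exp(exponent_range(gamma)[0]) > 1:
            return True
    for terms in reverse:
        # maximum of g_m over the box, bounded term by term
        if sum(c * math.exp(exponent_range(gamma)[1]) for c, gamma in terms) < 1:
            return True
    return False


# Example 1 with weights (1/2, 1/2): condensed f_1 is 2 * exp(-z0 + z1 + z2)
condensed = [(2.0, [-1.0, 1.0, 1.0])]
reverse = [[(0.3, [0.0, 1.0, 1.0])]]
print(node_is_infeasible(condensed, reverse, [1.6094, 0.6931, 0.0], [2.3026, 1.6094, 1.0986]))  # False
print(node_is_infeasible(condensed, reverse, [1.6094, 0.6931, 0.0], [2.3026, 0.7, 0.1]))        # True
```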


Theorem 1 (convergence result) The above algorithm 
either terminates finitely with the incumbent solution be- 
ing optimal to RCP or it generates an infinite sequence 
of iterations such that along any infinite branch of the 
branch-and-bound tree, any accumulation point of the 
sequence LB(s) will be the global minimum of the RCP 
problem. 


Proof A sufficient condition for a global optimization algorithm to be convergent to the global minimum, stated in Horst and Tuy [7], requires that the bounding operation be consistent and the selection operation be bound improving.

A bounding operation is called consistent if at every step any unfathomed partition can be further refined and if any infinitely decreasing sequence of successively refined partition elements satisfies

$$\lim_{s \to +\infty}(U^* - LB(s)) = 0, \tag{3}$$

where $LB(s)$ is a lower bound inside some subhyperrectangle at stage $s$ and $U^*$ is the best upper bound at iteration $s$, not necessarily attained inside the same subhyperrectangle. We now show that (3) holds.

Since the employed subdivision process is bisection, the process is exhaustive. Consequently, by the discussion in [13], (3) holds, which means that the employed bounding operation is consistent.

A selection operation is called bound improving if 
at least one partition element where the actual lower 
bound is attained is selected for further partition af- 
ter a finite number of refinements. Clearly, the em- 
ployed selection operation is bound improving because 
the partition element where the actual lower bound is 
attained is selected for further partition in the immedi- 
ately following iteration. 

In summary, the bounding operation is consistent and the selection operation is bound improving; therefore, according to Theorem IV.3 in Horst and Tuy [7], the employed global optimization algorithm is convergent to the global minimum. □


Applications 


Reference [19] reports numerical experiments with the deterministic global optimization algorithm described above to demonstrate its potential and feasibility. The experiments were carried out in the C programming language, and the simplex method was applied to solve the linear relaxation programming problems.

To illustrate how the proposed algorithm works, [19] first gives a simple example that shows the solving procedure step by step.

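Example 1:

$$\begin{aligned}
\min\ & x_1^2 + x_2^2 \\
\text{s.t. } & 0.3\,x_1 x_2 \ge 1, \\
& x \in X = \{x : 2 \le x_1 \le 5;\ 1 \le x_2 \le 3\}.
\end{aligned}$$

First, transform the above problem into the RPGP form as follows:

$$\begin{aligned}
\min\ & x_0 \\
\text{s.t. } & f(x) = x_0^{-1}x_1^2 + x_0^{-1}x_2^2 \le 1, \\
& g(x) = 0.3\,x_1 x_2 \ge 1, \\
& x \in \Omega_0 = \{x : 5 \le x_0 \le 10;\ 2 \le x_1 \le 5;\ 1 \le x_2 \le 3\}.
\end{aligned}$$

Let $x_i = \exp(z_i)$ $(i = 0, 1, 2)$; then we obtain the following reverse convex programming problem (P) of Example 1:

$$\begin{aligned}
\min\ & \exp(z_0) \\
\text{s.t. } & f_1(z) = \exp(-z_0 + 2z_1) + \exp(-z_0 + 2z_2) \le 1, \\
& f_2(z) = 0.3\exp(z_1 + z_2) \ge 1, \\
& z \in \Omega = \{z : 1.6094 \le z_0 \le 2.3026;\ 0.6931 \le z_1 \le 1.6094;\ 0 \le z_2 \le 1.0986\}.
\end{aligned}$$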

In step 0, choose the convergence tolerance $\delta$ and set $s = 0$, $U^* = \infty$. For the convex constraint function $f_1(z)$, choose the initial weight $\varepsilon_1 = (1/2, 1/2)$, since it has two terms. Then $q(s) = 1$, $Q_s = Q_0 = \{1\}$, $\Omega^{q(s)} = \Omega^1 = \Omega$. According to the discussion in Sect. "Methods and Applications", LRP($\Omega^1$) of problem P is formulated as follows:

$$\begin{aligned}
\min\ & z_0 \\
\text{s.t. } & L_1(z) = -z_0 + z_1 + z_2 \le -0.6931, \\
& L_2(z) = 1.9356z_1 + 1.9356z_2 \ge 1.7416, \\
& z \in \Omega^1.
\end{aligned}$$

The solution and optimal value of LRP($\Omega^1$) are:

$$\hat z(\Omega^1) = (1.6094, 0.6931, 0.2231), \qquad LB_1 = 1.6094.$$


Since $\hat z(\Omega^1)$ is not feasible for problem P, $LB(s) = LB(0) = 1.6094$. Since $\hat z(\Omega^1)$ violates $f_1(z) \le 1$, the BTM technique is adopted. First, update the weight $\varepsilon_1$ according to the solution $\hat z(\Omega^1)$, which gives $\varepsilon_1 = (0.7191, 0.2809)$; then formula (1) in Sect. "Methods and Applications" yields a new linear constraint:

$$L_3(z) = -z_0 + 1.4382z_1 + 0.5618z_2 \le -0.5938.$$

The current linear relaxation programming, denoted LRP($\bar\Omega^1$), is:

$$\begin{aligned}
\min\ & z_0 \\
\text{s.t. } & L_1(z) = -z_0 + z_1 + z_2 \le -0.6931, \\
& L_2(z) = 1.9356z_1 + 1.9356z_2 \ge 1.7416, \\
& L_3(z) = -z_0 + 1.4382z_1 + 0.5618z_2 \le -0.5938, \\
& z \in \Omega^1.
\end{aligned}$$


In step 1, divide the region $\Omega^1$ into the following two regions:

$$\Omega^2 = \{z : 1.6094 \le z_0 \le 2.3026;\ 0.6931 \le z_1 \le 1.6094;\ 0 \le z_2 \le 0.5493\},$$
$$\Omega^3 = \{z : 1.6094 \le z_0 \le 2.3026;\ 0.6931 \le z_1 \le 1.6094;\ 0.5493 \le z_2 \le 1.0986\},$$

so the node set is $Q_0 = \{2, 3\}$.
In step 2, neither node is deleted; then go to step 3. After updating the parameters according to the formulas in Sect. "Linear Relaxation Programming", we obtain a new function $L_2(z)$ in each node. Then LRP($\Omega^2$) is:

$$\begin{aligned}
\min\ & z_0 \\
\text{s.t. } & L_1(z) = -z_0 + z_1 + z_2 \le -0.6931, \\
& L_2(z) = 1.3633z_1 + 1.3633z_2 \ge 1.3450, \\
& L_3(z) = -z_0 + 1.4382z_1 + 0.5618z_2 \le -0.5938, \\
& z \in \Omega^2,
\end{aligned}$$

and LRP($\Omega^3$) is:

$$\begin{aligned}
\min\ & z_0 \\
\text{s.t. } & L_1(z) = -z_0 + z_1 + z_2 \le -0.6931, \\
& L_2(z) = 2.3613z_1 + 2.3613z_2 \ge 2.8946, \\
& L_3(z) = -z_0 + 1.4382z_1 + 0.5618z_2 \le -0.5938, \\
& z \in \Omega^3.
\end{aligned}$$

The solutions and optimal values are, respectively,

$$\hat z(\Omega^2) = (1.7555, 0.6931, 0.2934), \qquad LB_2 = 1.7555,$$
$$\hat z(\Omega^3) = (1.9356, 0.6931, 0.8427), \qquad LB_3 = 1.9356.$$


In step 4, neither node is deleted; then go to step 5. Compute

$$Q_1 = Q_0 - \{q \in Q_0 : LB_q \ge U^* - \delta\} = \{2, 3\},$$

and set $s = 1$. In step 6, the current lower bound is

$$LB(s) = LB(1) = \min\{LB_q : q \in Q_1\} = \min\{LB_2, LB_3\} = 1.7555.$$

So the active node $q(1) = 2$ is chosen for further consideration.

In step 7, the BTM technique is adopted in the node $q(1)$. From formula (1) in Sect. "Methods and Applications" we compute the new weight $\varepsilon_1 = (0.6899, 0.3101)$ according to the solution $\hat z(\Omega^2)$, and we obtain the following new linear constraint:

$$L_4(z) = -z_0 + 1.3797z_1 + 0.6203z_2 \le 1.2919.$$

The current linear relaxation programming, denoted LRP($\bar\Omega^2$), is:

$$\begin{aligned}
\min\ & z_0 \\
\text{s.t. } & L_1(z) = -z_0 + z_1 + z_2 \le -0.6931, \\
& L_2(z) = 1.3633z_1 + 1.3633z_2 \ge 1.3450, \\
& L_3(z) = -z_0 + 1.4382z_1 + 0.5618z_2 \le -0.5938, \\
& L_4(z) = -z_0 + 1.3797z_1 + 0.6203z_2 \le 1.2919, \\
& z \in \Omega^2.
\end{aligned}$$


Then return to step 1, divide the region $\Omega^2$, and start a new cycle. After 22 iterations, the procedure stops. The optimal value of $z_0$ for problem P is 1.9140, and the global solution is

$$z^* = (1.9140, 0.6933, 0.5107).$$

Hence the global minimum of Example 1 is 6.7804, and the global solution is $x^* = (2.0003, 1.6664)$.
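Back-transforming $x_i = \exp(z_i)$ and evaluating both constraints is a quick sanity check on the reported solution. The Python snippet below does only that arithmetic. The reported values are rounded to four digits, so the constraints and the objective should be expected to agree only to about $10^{-3}$.

```python
import math

# Reported optimum of problem P in Example 1 (z-space); back-transform x_i = exp(z_i)
z_star = (1.9140, 0.6933, 0.5107)
x0, x1, x2 = (math.exp(zi) for zi in z_star)

objective = x1 ** 2 + x2 ** 2           # original objective x1^2 + x2^2
convex_lhs = objective / x0             # x0^{-1} x1^2 + x0^{-1} x2^2 <= 1
reverse_lhs = 0.3 * x1 * x2             # 0.3 x1 x2 >= 1
print(round(x0, 4), round(x1, 4), round(x2, 4))
print(round(objective, 4), round(convex_lhs, 4), round(reverse_lhs, 4))
```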
Additionally, to test the algorithm, [19] chooses five examples, all taken from engineering; for the detailed application context, please refer to the relevant references.
Example 2 ([1]): 


min Xo 


-1 
s.t. Xo 
1 


xy lxylxs5 + Bas lx? xaXts <1 
Me his _ x2 <-l 

—Xx5 — 2x9x1xX2x3xq 1x5 < —1 

$$x \in X = \{x : 30 \le x_0 \le 40;\ 0.01 \le x_1 \le 1;\ 0.0001 \le x_2 \le 1;\ 15 \le x_3 \le 20;\ 15 \le x_4 \le 20;\ 0.1 \le x_5 \le 1\}.$$


Example 3 ([11]):


min Xo 
st. 0.274x3x7 + 2520.66x1x3 + xox} 
—X9X1X2x3 +1< 1 
Xixe ‘xe <1 
x1x4 <1 
1 
xeX={x|10°? < xp <2; 
20 < x, < 35; 
120 < x2 < 160; 
1< x3 < 10; 


10°<x,<1h. 


IA 


X3Xq 


Example 4 ([20]): 


min Xo 

st, 3.7x5 xe? 4 1.985x, Tx: 
4700.30, X40 = 1 
0.7673x3- — 0.05x; < 1 
xEX = {x|5 <x < 15; 
Ola <5: 


380 < x2 < 450}. 
Example 5 ([20]):

$$\begin{aligned}
\min\ & x_0 \\
\text{s.t. } & 4x_1 - 4x_0^2 \le 1, \\
& -x_0 - x_1 \le -1, \\
& x \in X = \{x : 0.01 \le x_0 \le 15;\ 0.01 \le x_1 \le 15\}.
\end{aligned}$$


Example 6 ([5]): 
in AP 


st. xixy’ +azixy <1 
—x7?xy!—x2xy! <-1 
$$x \in X = \{x : 0.1 \le x_1 \le 1;\ 5 \le x_2 \le 10;\ 8 \le x_3 \le 15;\ 0.01 \le x_4 \le 1\}.$$


The following table summarizes the computational results on the above five examples. In the table, $s$ denotes the number of iterations, $L$ the largest number of nodes stored in $Q_s$ (see the algorithm statement), and $\delta$ the convergence tolerance. The results show that the algorithm of [19] can globally solve the GGP problem effectively.


No.  Solution
2    (37.0070, 0.4489, 0.0048, 18.0348, 16.0449, 0.5667)
3    (0.0000, 32.7781, 155.0000, 4.7288, 0.0027)
4    (11.9637, 0.8098, 442.0915)
5    (0.5, 0.5)
6    (0.1020, 7.0711, 8.3284, 0.2434)
Abstract


Global equilibrium search is a method that can be ap- 
plied to a variety of hard optimization problems. The 
algorithm utilizes ideas similar to those of the simu- 
lated annealing method. The algorithm accumulates in- 
formation about the search space in order to generate 
new solutions for the subsequent stages. This method 
has been successfully applied to well-known prob- 
lems such as the multidimensional knapsack problem, 
the job-shop scheduling problem, the unconstrained 
quadratic programming problem, the maximum satisfiability problem, etc.


The numerous discrete optimization problems that 
arise in practice have such different characteristics that 
development of a general purpose solution method is 
clearly impracticable. One way of tackling this issue is 
to develop a library of suitable solution methods, allow- 
ing the practitioner to choose the most suitable for his 
problem under his time constraints and quality require- 
ments. In recent decades, heuristic approaches, such as 
tabu search [1], simulated annealing (SA) [3], etc., have 
gained a considerable amount of attention from the sci- 
entific community for being the only practical tool that 
can be applied to a wide range of difficult problems. 
Global equilibrium search (GES) offers another highly 
effective tool for solving large-scale optimization prob- 
lems. 

The method was introduced by Shylo [7] in 1999. It 
shares ideas similar to those that inspired the SA tech- 
nique, while providing, in practice, faster asymptotic 
convergence to the optimal solution on a wide class of 
optimization problems. Moreover, the GES method can 
be used in an ensemble with other techniques, which 
makes it more versatile than most of its predecessors. 

Consider a discrete optimization problem of the following form:

$$\min\{f(x) : x \in S \subseteq \{0, 1\}^n\}, \tag{1}$$


where $f$ is some quality function. Let us introduce a random binary vector $\xi(\mu)$ that takes values in the feasible set $S$ according to the Boltzmann distribution, with $\mu \ge 0$ being the temperature parameter:

$$P\{\xi(\mu) = x\} = \frac{\exp(-\mu f(x))}{\sum_{x \in S}\exp(-\mu f(x))}, \quad x \in S. \tag{2}$$

Consider the SA method applied to problem (1). It can be shown easily that under certain conditions (i.e., a symmetric neighborhood structure) the stationary probabilities of the Markov chain associated with the SA method are given by (2).

The set $S$ can be split into two subsets in such a way that one of them contains the feasible solutions whose $j$th component is 1, and the other contains the solutions whose $j$th component is 0. Let us name these two sets $S_j^1$ and $S_j^0$. Obviously, $S_j^1 \cup S_j^0 = S$. Then the probability of the $j$th component of $\xi$ being 1 can be expressed as

$$p_j(\mu) = P\{\xi_j(\mu) = 1\} = \frac{\sum_{x \in S_j^1}\exp(-\mu f(x))}{\sum_{x \in S}\exp(-\mu f(x))}. \tag{3}$$


The idea of the GES method is to use some subset of known solutions $\tilde S$ to generate new solutions in the successive stages of the algorithm. The distribution (3), or any other equivalent formula [4], can be used for such a generation (substituting $\tilde S$ for $S$ in the formula):

$$\tilde p_j(\mu) = P\{\tilde\xi_j(\mu) = 1\} = \frac{\sum_{x \in \tilde S_j^1}\exp(-\mu f(x))}{\sum_{x \in \tilde S}\exp(-\mu f(x))}. \tag{4}$$

If $\arg\min\{f(x) : x \in \tilde S\}$ is unique, then the average Hamming distance between newly generated solutions and the best solution in the set $\tilde S$ converges to zero as $\mu$ goes to infinity. However, the speed of this convergence is not the same for different components of the generated solutions: the speed of convergence of the $j$th component depends on the quality of the solutions in $\tilde S_j^1$ compared with the quality of the solutions in $\tilde S_j^0$. Simply put, the temperature parameter in (3) controls the level of similarity of the newly generated solutions to the high-quality solutions in $\tilde S$. The uniqueness of the best solution $x^*$ in $\tilde S$ mentioned above should be maintained at all stages of the algorithm.
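Formula (4) is a Boltzmann-weighted count over the known solutions. The Python sketch below computes it for a toy set. It assumes $\tilde S$ is given as a list of 0/1 tuples, and its cost function (number of ones) is made up for illustration. It shifts costs by the best cost before exponentiating, which leaves (4) unchanged. As $\mu$ grows, the probability vector should approach the unique best solution (0, 0, 1).

```python
import math


def generation_probabilities(solutions, f, mu):
    """p_j(mu) from eq. (4): Boltzmann-weighted share of known solutions with x_j = 1.

    solutions -- list of 0/1 tuples (the set S~ of known solutions)
    f         -- cost function to be minimized
    Costs are shifted by the best cost before exponentiation; this multiplies
    numerator and denominator by the same factor and so leaves p_j unchanged.
    """
    costs = [f(x) for x in solutions]
    best = min(costs)
    weights = [math.exp(-mu * (c - best)) for c in costs]
    den = sum(weights)
    n = len(solutions[0])
    return [sum(w for w, x in zip(weights, solutions) if x[j] == 1) / den
            for j in range(n)]


# Toy set with cost = number of ones, so (0, 0, 1) is the unique best solution
S = [(1, 1, 0), (1, 0, 1), (0, 0, 1), (0, 1, 1)]
for mu in (0.0, 1.0, 10.0):
    print(mu, [round(p, 3) for p in generation_probabilities(S, sum, mu)])
```

At $\mu = 0$ the result is just the bit frequencies in $\tilde S$, here [0.5, 0.5, 0.75].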

One of the limitations of the strategy described 
above is that in order to implement it, there should exist 
an easy way of generating random solutions from S with 
the distribution given by (4). Unfortunately, for some 
problems, the structure of set S would make this hard 
to achieve. For such cases, the local search based techniques (i.e., the SA method, the tabu method, the GES method) are not easily applicable.

Another issue with generating the random solution 
x from S using (4) is that the components of the ran- 
dom solution x are not independent random variables. 
However, for the simplicity of an algorithm, this is usu- 
ally ignored because the convergence property is more 
important for the performance of the algorithm. 

Whenever a new solution is added to the set $\tilde S$, it is easy to recalculate the probabilities $\tilde p_j$ if the denominator and numerator in (4) are stored separately. Therefore, there is no need to store the whole set $\tilde S$ in the
Input: μ - vector of temperature values, K - number of temperature stages,
nfail* - restart parameter, ngen - number of solutions generated during each stage
Output: x_best

1: x_best ← construct random solution; S = E = {x_best}
2: while stopping criterion = FALSE do
3:   if S = ∅ then
4:     x ← construct random solution
5:     x_max = x
6:     S = {x_max} (set of known solutions)
7:     E = {x_max} (set of elite solutions)
8:   end if
9:   for nfail = 0 to nfail* do
10:    x_old = x_max
11:    for k = 0 to K do
12:      calculate generation probabilities(p^k, S, μ_k)
13:      for g = 0 to ngen do
14:        x ← generate solution(x_max, p^k)
15:        R ← search method(x) (R is some subset of encountered solutions)
16:        S = S ∪ R
17:        x_max = arg min{f(x) : x ∈ S}
18:        if f(x_max) < f(x_best) then
19:          x_best = x_max
20:        end if
21:        update_elite_set(E, R)
22:      end for
23:    end for
24:    if f(x_old) > f(x_max) then
25:      nfail = 0
26:    end if
27:    S = E
28:  end for
29:  P = P ∪ N(x_best, d_p)
30:  E = E \ P
31:  if RESTART-criterion = TRUE then
32:    E = ∅
33:  end if
34:  S = E
35:  x_max = arg min{f(x) : x ∈ S}
36: end while
37: return x_best


Global Equilibrium Search, Figure 1 
Global equilibrium search method (general scheme) 


memory. The notion of $\tilde S$ is used below mainly for simplicity of discussion.

The performance of any GES-based algorithm depends on the choice of the temperature schedule. As with the SA method, there is no basic recipe that provides an optimal schedule for GES. The general advice here is to choose an increasing sequence of values $\mu_0 = 0, \mu_1, \mu_2 = \mu_1\alpha, \dots, \mu_K = \mu_{K-1}\alpha$ ($K$ is the number of temperature stages and $\alpha > 1$), in such
a manner that the algorithm will find the best solution from the set $\tilde S$ almost surely when generating solutions with the temperature parameter $\mu_K$. However, there is no need to provide a separate cooling schedule for each problem solved. Simple scaling of the cost function ($f'(x) = C \cdot f(x)$, $C > 0$) can make one temperature schedule suitable for a wide range of problems from the same class. The scaling factor can be chosen, for example, in the initial stage of the algorithm, when $\mu_0 = 0$. Additionally, if we multiply the denominator and numerator of (4) by $\exp(\mu f(x^*))$, where $x^*$ is the best solution in $\tilde S$, then the convergence to the best solution in $\tilde S$ is less dependent on the absolute values of solution costs.
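The remarks on storing numerator and denominator separately, and on multiplying both by $\exp(\mu f(x^*))$, combine into a small accumulator. The following is a minimal Python sketch for one fixed temperature $\mu$. The class name and interface are hypothetical. Whenever a better cost arrives, the stored sums are rescaled so that they stay referenced to the current best cost.

```python
import math


class ProbabilityStore:
    """Running numerator/denominator sums of eq. (4) for one temperature mu.

    Sums are kept relative to the best cost seen so far, i.e. multiplied by
    exp(mu * f(x*)), so the terms stay in a safe floating-point range.
    """

    def __init__(self, n, mu):
        self.mu = mu
        self.best = None          # best cost f(x*) seen so far
        self.num = [0.0] * n      # sum over x with x_j = 1 of exp(-mu * (f(x) - best))
        self.den = 0.0            # sum over all x of exp(-mu * (f(x) - best))

    def add(self, x, cost):
        if self.best is None or cost < self.best:
            if self.best is not None:
                # re-reference the stored sums to the new, lower best cost
                scale = math.exp(-self.mu * (self.best - cost))
                self.num = [v * scale for v in self.num]
                self.den *= scale
            self.best = cost
        w = math.exp(-self.mu * (cost - self.best))
        self.den += w
        for j, bit in enumerate(x):
            if bit:
                self.num[j] += w

    def probabilities(self):
        return [v / self.den for v in self.num]


store = ProbabilityStore(n=3, mu=2.0)
for x in [(1, 1, 0), (1, 0, 1), (0, 0, 1), (0, 1, 1)]:
    store.add(x, cost=sum(x))         # toy cost: number of ones
print([round(p, 4) for p in store.probabilities()])
```

Each insertion costs $O(n)$, so the set $\tilde S$ itself never needs to be kept in memory.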

The general scheme of the GES method is presented 
in Fig. 1. There are some elements that are included in 
the scheme, but that were not discussed above: elite so- 
lutions set, prohibition of certain solutions and restart- 
ing the search. These elements are not necessary for 
success of the GES method and can be easily excluded. 
However, for some classes of problems they can provide 
a significant performance improvement. 

The main cycle (lines 2-36) is repeated until some 
stopping criterion is satisfied. The algorithm execution 
can be terminated when the best known record for the 
given problem is improved, or when the running time 
exceeds some limiting value. If the set of known solu- 
tions S is empty, then the initialization of the data set 
is performed in lines 3-7. The cycle in lines 9-28 is executed until there is no improvement in nfail* consecutive cycles. The main element of the GES method is
the temperature cycle (lines 11-23). The probabilities 
that guide the search are estimated using expression (4) 
at the beginning of each temperature stage (line 12). 
For each probability vector, ngen solutions are gener- 
ated (lines 13-22). These solutions are used as initial 
solutions for the local search procedure (line 15). The 
subset of encountered solutions R is used to update set 
S (line 16). 

Some set of the solutions can be stored in mem- 
ory, in order to provide a fast initialization of the algo- 
rithm’s memory structures (lines 27 and 34). Such a set 
is referred to as an elite set in the algorithm pseudocode. 
Certain solutions can be excluded from this set to avoid 
searching the same areas multiple times. In lines 29 and 30, the solutions whose Hamming distance to $x_{best}$ is less than the parameter $d_p$ are excluded from the elite set.
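The main temperature cycle can be sketched compactly. The Python code below is a simplified, hypothetical GES for unconstrained $x \in \{0,1\}^n$ and is not the scheme of Fig. 1 in full. It omits the elite set, the prohibition of lines 29-30 and restarts, and it replaces the nfail loop with a fixed number of cycles. It assumes a 1-flip descent as the search method and samples each component independently from $\tilde p_j$, rather than perturbing $x_{max}$. The toy objective and all parameter values are made up for illustration.

```python
import math
import random


def local_search(x, f):
    """First-improvement 1-flip descent.

    Returns the local optimum, its cost and the list R of (solution, cost)
    pairs encountered on the way.
    """
    x = list(x)
    fx = f(x)
    visited = [(tuple(x), fx)]
    improved = True
    while improved:
        improved = False
        for j in range(len(x)):
            x[j] ^= 1                      # try flipping bit j
            fy = f(x)
            if fy < fx:
                fx = fy
                visited.append((tuple(x), fx))
                improved = True
            else:
                x[j] ^= 1                  # undo a non-improving flip
    return tuple(x), fx, visited


def ges(f, n, mu1=0.5, alpha=1.5, K=8, ngen=10, cycles=20, seed=0):
    """Simplified GES for min f(x), x in {0,1}^n (no elite set, prohibition or restarts)."""
    rng = random.Random(seed)
    mus = [0.0] + [mu1 * alpha ** k for k in range(K)]    # mu_0 = 0 < mu_1 < ... < mu_K
    x0 = tuple(rng.randint(0, 1) for _ in range(n))
    best, best_f, visited = local_search(x0, f)
    known = dict(visited)                                  # S~ as solution -> cost
    for _ in range(cycles):
        for mu in mus:                                     # temperature cycle
            fmin = min(known.values())
            # eq. (4), numerator and denominator multiplied by exp(mu * f(x*))
            w = {x: math.exp(-mu * (c - fmin)) for x, c in known.items()}
            den = sum(w.values())
            p = [sum(wx for x, wx in w.items() if x[j]) / den for j in range(n)]
            for _ in range(ngen):
                # sample each component independently with P(x_j = 1) = p_j
                x = tuple(1 if rng.random() < pj else 0 for pj in p)
                y, fy, visited = local_search(x, f)
                known.update(visited)                      # S~ = S~ U R
                if fy < best_f:
                    best, best_f = y, fy
    return best, best_f


# Toy objective: linear costs plus a penalty of 4 for every pair of adjacent ones
W = [3, -5, 2, -4, -1, 6, -2, -3, 4, -6]


def toy_cost(x):
    return (sum(w * xi for w, xi in zip(W, x))
            + 4 * sum(x[j] * x[j + 1] for j in range(len(x) - 1)))


best, best_f = ges(toy_cost, len(W))
print(best, best_f)
```

For a 10-bit toy problem, brute force over all 1024 vectors is cheap and gives a way to confirm the result. In this objective the penalty creates several 1-flip local optima, so the probabilistic restarts do real work.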

A number of successful applications of the GES 
method have been reported in recent years [6]. The ap- 
plication of the GES method for the multidimensional 
knapsack problem is described in [8]. 

The GES based method was presented in [5] for 
solving job-shop scheduling problems. To date, suit- 
able exact solution methods are not able to find high- 
quality solutions with reasonable computational effort 
for the problems involving more than ten jobs and ten 
machines. The computational testing of the GES algo- 
rithm provided a set of new upper bounds for a wide 
set of challenging benchmark problems [2]. The comparison with existing techniques for job-shop scheduling indicates that the GES method has great potential for solving scheduling problems.

The application of GES for the unconstrained 
quadratic programming problem was discussed in [4], 
where GES was used in a combination with a tabu al- 
gorithm. Such an ensemble proved to be an extremely 
efficient tool for large-scale problems, outperforming 
some of the best available solution techniques. 

In conclusion, the universality of the GES method 
together with its flexibility make it an optimization tool 
worth considering. 
Probability-one homotopy methods are a class of algorithms for solving nonlinear systems of equations that
are accurate, robust, and converge from an arbitrary 
starting point almost surely. These new globally con- 
vergent homotopy techniques have been successfully 
applied to solve Brouwer fixed point problems, poly- 
nomial systems of equations, constrained and uncon- 
strained optimization problems, discretizations of non- 
linear two-point boundary value problems based on 
shooting, finite differences, collocation, and finite ele- 
ments, and finite difference, collocation, and Galerkin 
approximations to nonlinear partial differential equa- 
tions. 


Probability-One Globally 
Convergent Homotopies 


A homotopy is a continuous map from the interval [0, 1] into a function space, where the continuity is with respect to the topology of the function space. Intuitively, a homotopy $\rho(\lambda)$ continuously deforms the function $\rho(0) = g$ into the function $\rho(1) = f$ as $\lambda$ goes from 0 to 1. In this case, $f$ and $g$ are said to be homotopic. Homotopy maps are fundamental tools in topology, and provide


a powerful mechanism for defining equivalence classes 
of functions. 

Homotopies provide a mathematical formalism for 
describing an old procedure in numerical analysis, vari- 
ously known as continuation, incremental loading, and 
embedding. The continuation procedure for solving 
a nonlinear system of equations $f(x) = 0$ starts with a (generally simpler) problem $g(x) = 0$ whose solution $x_0$ is known. The continuation procedure is to track the set of zeros of

$$\rho(\lambda, x) = \lambda f(x) + (1 - \lambda)g(x) \tag{1}$$

as $\lambda$ is increased monotonically from 0 to 1, starting at the known initial point $(0, x_0)$ satisfying $\rho(0, x_0) = 0$. Each step of this tracking process is done by starting at a point $(\tilde\lambda, \tilde x)$ on the zero set of $\rho$, fixing some $\Delta\lambda > 0$, and then solving $\rho(\tilde\lambda + \Delta\lambda, x) = 0$ for $x$ using a locally convergent iterative procedure, which requires an invertible Jacobian matrix $D_x\rho(\tilde\lambda + \Delta\lambda, x)$. The process stops at $\lambda = 1$, since $f(\bar x) = \rho(1, \bar x) = 0$ gives a zero $\bar x$ of $f(x)$. Note that continuation assumes that the zeros of $\rho$ connect the zero $x_0$ of $g$ to a zero $\bar x$ of $f$, and that the Jacobian matrix $D_x\rho(\lambda, x)$ is invertible along the zero set of $\rho$; these are strong assumptions, which are frequently not satisfied in practice.
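The continuation procedure just described fits in a few lines of Python for a scalar equation. This is an illustrative sketch, not taken from the article. It assumes the choice $g(x) = x - x_0$ in (1), a fixed step $\Delta\lambda = 1/\text{steps}$, and Newton's method as the local solver. It succeeds on the cubic below only because $\lambda$ increases monotonically along that zero curve and $D_x\rho$ stays nonsingular. At a turning point the Newton step would divide by a (near-)zero derivative.

```python
def continuation(f, df, x0, steps=20, newton_iters=20, tol=1e-12):
    """Classical continuation for rho(lam, x) = lam * f(x) + (1 - lam) * (x - x0).

    lam is increased monotonically in equal steps; at each new lam the
    previous zero is used as the starting point of Newton's method on x.
    """
    x = x0
    for k in range(1, steps + 1):
        lam = k / steps
        for _ in range(newton_iters):
            r = lam * f(x) + (1.0 - lam) * (x - x0)      # rho(lam, x)
            dr = lam * df(x) + (1.0 - lam)               # D_x rho(lam, x)
            if abs(dr) < 1e-14:
                raise ZeroDivisionError(f"D_x rho is singular near lam = {lam}")
            dx = r / dr
            x -= dx
            if abs(dx) < tol:
                break
    return x


def cubic(x):
    return x ** 3 - 2 * x - 5


def d_cubic(x):
    return 3 * x ** 2 - 2


root = continuation(cubic, d_cubic, 0.0)
print(root, cubic(root))
```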

Continuation can fail because the curve $\gamma$ of zeros of $\rho(\lambda, x)$ emanating from $(0, x_0)$ may:

1) have turning points,

2) bifurcate,

3) fail to exist at some $\lambda$ values, or

4) wander off to infinity without reaching $\lambda = 1$.
Turning points and bifurcation correspond to singular $D_x\rho(\lambda, x)$. Generalizations of continuation known as homotopy methods attempt to deal with cases 1) and 2) and allow tracking of $\gamma$ to continue through singularities. In particular, continuation monotonically increases $\lambda$, whereas homotopy methods permit $\lambda$ to both increase and decrease along $\gamma$. Homotopy methods can also fail via cases 3) or 4).

The map $\rho(\lambda, x)$ connects the functions $g(x)$ and $f(x)$, hence the use of the word 'homotopy'. In general the homotopy map $\rho(\lambda, x)$ need not be a simple convex combination of $g$ and $f$ as in (1), and can involve $\lambda$ nonlinearly. Sometimes $\lambda$ is a physical parameter in the original problem $f(x; \lambda) = 0$, where $\lambda = 1$ is the (nondimensionalized) value of interest, although 'artificial parameter' homotopies are generally more computationally efficient than 'natural parameter' homotopies $\rho(\lambda, x) = f(x; \lambda)$. An example of an artificial parameter homotopy map is

$$\rho(\lambda, x) = \lambda f(x; \lambda) + (1 - \lambda)(x - a), \tag{2}$$

which satisfies $\rho(0, a) = 0$. The name 'artificial' reflects the fact that solutions to $\rho(\lambda, x) = 0$ have no physical interpretation for $\lambda < 1$. Note that $\rho(\lambda, x)$ in (2) has a unique zero $x = a$ at $\lambda = 0$, regardless of the structure of $f(x; \lambda)$.

All four shortcomings of continuation and homotopy methods have been overcome by probability-one homotopies, proposed in 1976 by S.N. Chow, J. Mallet-Paret, and J.A. Yorke [2]. The supporting theory, based on differential geometry, will be reformulated in less technical jargon here.


Definition 1 Let $U \subset \mathbb{R}^m$ and $V \subset \mathbb{R}^p$ be open sets, and let $\rho: U \times [0, 1) \times V \to \mathbb{R}^p$ be a $C^2$ map. $\rho$ is said to be transversal to zero if the $p \times (m+1+p)$ Jacobian matrix $D\rho$ has full rank on $\rho^{-1}(0)$.


The $C^2$ requirement is technical, and part of the definition of transversality. The basis for the probability-one homotopy theory is the parametrized Sard's theorem [2]:


Theorem 2 Let $\rho: U \times [0, 1) \times V \to \mathbb{R}^p$ be a $C^2$ map. If $\rho$ is transversal to zero, then for almost all $a \in U$ the map

$$\rho_a(\lambda, x) = \rho(a, \lambda, x)$$

is also transversal to zero.


To discuss the importance of this theorem, take $U = \mathbb{R}^m$, $V = \mathbb{R}^p$, and suppose that the $C^2$ map $\rho: \mathbb{R}^m \times [0, 1) \times \mathbb{R}^p \to \mathbb{R}^p$ is transversal to zero. A straightforward application of the implicit function theorem yields that for almost all $a \in \mathbb{R}^m$, the zero set of $\rho_a$ consists of smooth, nonintersecting curves which either:

1) are closed loops lying entirely in $(0, 1) \times \mathbb{R}^p$,

2) have both endpoints in $\{0\} \times \mathbb{R}^p$,

3) have both endpoints in $\{1\} \times \mathbb{R}^p$,

4) are unbounded with one endpoint in either $\{0\} \times \mathbb{R}^p$ or in $\{1\} \times \mathbb{R}^p$, or

5) have one endpoint in $\{0\} \times \mathbb{R}^p$ and the other in $\{1\} \times \mathbb{R}^p$.

Furthermore, for almost all $a \in \mathbb{R}^m$, the Jacobian matrix $D\rho_a$ has full rank at every point in $\rho_a^{-1}(0)$. The goal is to construct a map $\rho_a$ whose zero set has an endpoint in $\{0\} \times \mathbb{R}^p$, and which rules out 2) and 4). Then 5) obtains, and a zero curve starting at $(0, x_0)$ is guaranteed to reach a point $(1, \bar x)$. All of this holds for almost all $a \in \mathbb{R}^m$, and hence with probability one [2]. Furthermore, since $a \in \mathbb{R}^m$ can be almost any point (and, indirectly, so can the starting point $x_0$), an algorithm based on tracking the zero curve in 5) is legitimately called globally convergent. This discussion is summarized in the following theorem (and illustrated in Fig. 1).

Zero set for $\rho_a(\lambda, x)$ satisfying properties 1)-4)


Theorem 3 Let $f: \mathbb{R}^p \to \mathbb{R}^p$ be a $C^2$ map, $\rho: \mathbb{R}^m \times [0, 1) \times \mathbb{R}^p \to \mathbb{R}^p$ a $C^2$ map, and $\rho_a(\lambda, x) = \rho(a, \lambda, x)$. Suppose that
1) $\rho$ is transversal to zero.
Suppose also that for each fixed $a \in \mathbb{R}^m$,
2) $\rho_a(0, x) = 0$ has a unique nonsingular solution $x_0$,
3) $\rho_a(1, x) = f(x)$ $(x \in \mathbb{R}^p)$.
Then, for almost all $a \in \mathbb{R}^m$, there exists a zero curve $\gamma$ of $\rho_a$ emanating from $(0, x_0)$, along which the Jacobian matrix $D\rho_a$ has full rank.

If, in addition,
4) $\rho_a^{-1}(0)$ is bounded,
then $\gamma$ reaches a point $(1, \bar x)$ such that $f(\bar x) = 0$. Furthermore, if $Df(\bar x)$ is invertible, then $\gamma$ has finite arc length.


Any algorithm for tracking $\gamma$ from $(0, x_0)$ to $(1, \bar x)$, based on a homotopy map satisfying the hypotheses of this theorem, is called a globally convergent probability-one homotopy algorithm. Of course, the practical numerical details of tracking $\gamma$ are nontrivial, and have been the subject of twenty years of research in numerical analysis. Production quality software called HOMPACK90 [6] exists for tracking $\gamma$. The distinctions between continuation, homotopy methods, and probability-one homotopy methods are subtle but worth noting. Only the latter are provably globally convergent and (by construction) expressly avoid dealing with singularities numerically, unlike continuation and homotopy methods, which must explicitly handle singularities numerically.
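To make the idea of tracking $\gamma$ by arc length concrete, here is a toy Python tracker. It is not HOMPACK90 and rests on simplifying assumptions. $x$ is scalar, the homotopy is the artificial-parameter map (2) with $f$ independent of $\lambda$, and the step length, tolerances and end game are illustrative choices. The end game interpolates linearly to $\lambda = 1$ and then applies Newton's method to $f$. Because the tracker steps along the unit tangent of $\gamma$ rather than in $\lambda$, $\lambda$ may decrease along the way, so turning points in $\lambda$ do not stop it. It does not verify transversality or boundedness of $\gamma$.

```python
import math


def track_zero_curve(f, df, a, h=0.05, tol=1e-12, max_steps=100000):
    """Follow gamma, the zero curve of rho_a(lam, x) = lam*f(x) + (1 - lam)*(x - a),
    from (0, a) by arc length (scalar x only) and return a zero of f.
    """
    def rho(lam, x):
        return lam * f(x) + (1.0 - lam) * (x - a)

    def grad(lam, x):                        # (d rho / d lam, d rho / d x)
        return f(x) - (x - a), lam * df(x) + (1.0 - lam)

    lam, x = 0.0, a
    prev_t = (1.0, 0.0)                      # leave lam = 0 towards lam > 0
    for _ in range(max_steps):
        g_lam, g_x = grad(lam, x)
        norm = math.hypot(g_lam, g_x)
        t = (g_x / norm, -g_lam / norm)      # unit tangent: grad . t = 0
        if t[0] * prev_t[0] + t[1] * prev_t[1] < 0:
            t = (-t[0], -t[1])               # keep the direction of travel
        pred = (lam + h * t[0], x + h * t[1])    # Euler predictor
        lp, xp = pred
        for _ in range(50):                  # Newton corrector on the line
            r = rho(lp, xp)                  # through pred orthogonal to t
            c = (lp - pred[0]) * t[0] + (xp - pred[1]) * t[1]
            a11, a12 = grad(lp, xp)
            det = a11 * t[1] - a12 * t[0]
            d_lam = (-r * t[1] + a12 * c) / det
            d_x = (-a11 * c + t[0] * r) / det
            lp, xp = lp + d_lam, xp + d_x
            if abs(d_lam) + abs(d_x) < tol:
                break
        if lp >= 1.0:
            # gamma crossed lam = 1: interpolate, then refine with Newton on f
            x_end = x + (1.0 - lam) * (xp - x) / (lp - lam)
            for _ in range(50):
                step = f(x_end) / df(x_end)
                x_end -= step
                if abs(step) < tol:
                    break
            return x_end
        prev_t, lam, x = t, lp, xp
    raise RuntimeError("lam = 1 not reached; gamma may be unbounded")


def cubic(x):
    return x ** 3 - 2 * x - 5


def d_cubic(x):
    return 3 * x ** 2 - 2


print(track_zero_curve(cubic, d_cubic, a=0.0))
```

The choice $a = 0$ is arbitrary here. The theory guarantees success only for almost all $a$ together with hypotheses 1)-4) of Theorem 3.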

Assumptions 2) and 3) in Theorem 3 are usually achieved by the construction of $\rho$ (such as (2)), and are straightforward to verify. Although assumption 1) is trivial to verify for some maps, if $\lambda$ and $a$ are involved nonlinearly in $\rho$ the verification is nontrivial. Assumption 4) is typically very hard to verify, and often is a deep result, since 1)-4) holding implies the existence of a solution to $f(x) = 0$.

Note that 1)-4) are sufficient, but not necessary, for the existence of a solution to $f(x) = 0$, which is why homotopy maps not satisfying the hypotheses of the theorem can still be very successful on practical problems. If 1)-3) hold and a solution does not exist, then 4) must fail, and nonexistence is manifested by $\gamma$ going off to infinity. Properties 1)-3) are important because they guarantee good numerical properties along the zero curve $\gamma$, which, if bounded, results in a globally convergent algorithm. If $\gamma$ is unbounded, then either the homotopy approach (with this particular $\rho$) has failed or $f(x) = 0$ has no solution.

A few remarks about the applicability and limita- 
tions of probability-one homotopy methods are in or- 
der. They are designed to solve a single nonlinear sys- 
tem of equations, not to track the solutions of a param- 
eterized family of nonlinear systems as that parameter 
is varied. Thus drastic changes in the solution behavior 
with respect to that (natural problem) parameter have 
no effect on the efficacy of the homotopy algorithm, 
which is solving the problem for a fixed value of the 
natural parameter. In fact, it is precisely for this case of 
rapidly varying solutions that the probability-one ho- 
motopy approach is superior to classical continuation 
(which would be trying to track the rapidly varying so- 
lutions with respect to the problem parameter). Since 
the homotopy methods described here are not for gen- 
eral solution curve tracking, they are not (directly) ap- 
plicable to bifurcation problems. 


Homotopy methods also require the nonlinear system to be $C^2$ (twice continuously differentiable), and this limitation cannot be relaxed. However, requiring
a finite-dimensional discretization to be smooth does 
not mean the solution to the infinite-dimensional prob- 
lem must also be smooth. For example, a Galerkin 
formulation may produce a smooth nonlinear system 
in the basis function coefficients even though the ba- 
sis functions themselves are discontinuous. Homotopy 
methods for optimization problems may converge to 
a local minimum or stationary point, and in this regard 
are no better or worse than other optimization algo- 
rithms. In special cases homotopy methods can find all 
the solutions if there is more than one, but in general 
the homotopy algorithms are only guaranteed to find 
one solution. 


Optimization Homotopies 


A few typical convergence theorems for optimization 
are given next (see the survey in [5] for more examples 
and references). Consider first the unconstrained opti- 
mization problem 


$$\min_{x \in \mathbb{R}^n} f(x). \tag{3}$$


Theorem 4 Let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^3$ convex map with a minimum at $\tilde{x}$, $\|\tilde{x}\|_2 < M$. Then for almost all $a$, $\|a\|_2 < M$, there exists a zero curve $\gamma$ of the homotopy map

$$\rho_a(\lambda, x) = \lambda \nabla f(x) + (1 - \lambda)(x - a),$$

along which the Jacobian matrix $D\rho_a(\lambda, x)$ has full rank, emanating from $(0, a)$ and reaching a point $(1, \tilde{x})$, where $\tilde{x}$ solves (3).
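For convex $f$ the Jacobian block $\partial \rho_a / \partial x = \lambda \nabla^2 f(x) + (1 - \lambda) I$ is positive definite for $0 \le \lambda < 1$, so the zero curve of Theorem 4 has no turning points and can be parametrized by $\lambda$ itself. That justifies the following naive sketch (an illustration under this convexity assumption, not a production tracker): step $\lambda$ uniformly from 0 to 1 and correct with Newton's method at each step. The names `convex_homotopy_min`, `grad` and `hess` are hypothetical.

```python
import numpy as np


def convex_homotopy_min(grad, hess, a, steps=50, tol=1e-12):
    """Minimize a convex C^3 function by following
    rho_a(lam, x) = lam * grad(x) + (1 - lam) * (x - a) from (0, a) to lam = 1.
    Natural-parameter continuation is safe here because d rho / dx is
    positive definite for lam < 1 when f is convex."""
    a = np.asarray(a, dtype=float)
    x = a.copy()                       # rho_a(0, x) = 0 has the solution x = a
    eye = np.eye(a.size)
    for lam in np.linspace(0.0, 1.0, steps + 1)[1:]:
        for _ in range(30):            # Newton corrector at fixed lam
            r = lam * grad(x) + (1.0 - lam) * (x - a)
            J = lam * hess(x) + (1.0 - lam) * eye
            dx = np.linalg.solve(J, -r)
            x = x + dx
            if np.linalg.norm(dx) < tol:
                break
    return x
```

As a usage example, take $f(x) = \sum_i x_i^4/4 + \|x\|_2^2/2 - c^T x$ with $c = (2, 10)$. Its gradient is $x^3 + x - c$ (componentwise), so the minimizer is $\tilde{x} = (1, 2)$, which the sketch recovers starting from $a = 0$. At $\lambda = 1$ the last corrector is simply Newton's method on $\nabla f(x) = 0$.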


A function is called uniformly convex if it is convex and its Hessian’s smallest eigenvalue is bounded away from zero. Consider next the constrained optimization problem

$$\min_{x \ge 0} f(x). \tag{4}$$

This is more general than it might appear because the general convex quadratic program reduces to a problem of the form (4).


Theorem 5 Let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^2$ uniformly convex map. Then there exists $\delta > 0$ such that for almost all $a > 0$ with $\|a\|_2 < \delta$ there exists a zero curve $\gamma$ of the homotopy map

$$\rho_a(\lambda, x) = \lambda K(x) + (1 - \lambda)(x - a),$$

where

$$K_i(x) = -\left|\frac{\partial f(x)}{\partial x_i} - x_i\right|^3 + \left(\frac{\partial f(x)}{\partial x_i}\right)^3 + x_i^3, \quad i = 1, \dots, n,$$

along which the Jacobian matrix $D\rho_a(\lambda, x)$ has full rank, connecting $(0, a)$ to a point $(1, \bar{x})$, where $\bar{x}$ solves the constrained optimization problem (4).


Given $F: \mathbb{R}^n \to \mathbb{R}^n$, the nonlinear complementarity problem is to find a vector $x \in \mathbb{R}^n$ such that

$$x \ge 0, \quad F(x) \ge 0, \quad x^T F(x) = 0. \tag{5}$$

It is interesting that homotopy methods can be adapted to deal with nonlinear inequality constraints and combinatorial conditions as in (5). Define $G: \mathbb{R}^n \to \mathbb{R}^n$ by

$$G_i(z) = -|F_i(z) - z_i|^3 + (F_i(z))^3 + z_i^3, \quad i = 1, \dots, n,$$

and let

$$\rho_a(\lambda, z) = \lambda G(z) + (1 - \lambda)(z - a).$$
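The reformulation works because, for scalars $u$ and $v$, $-|u - v|^3 + u^3 + v^3 = 0$ holds exactly when $u \ge 0$, $v \ge 0$ and $uv = 0$. (For $u \ge v$ the expression equals $v(3u^2 - 3uv + 2v^2)$, and the quadratic factor vanishes only at $u = v = 0$.) Hence $G(z) = 0$ is equivalent to (5). The small NumPy sketch below evaluates $G$ alongside a direct check of (5), so that the equivalence can be tested numerically. The helper names and the test problem are hypothetical.

```python
import numpy as np


def ncp_G(F, z):
    """Componentwise G_i(z) = -|F_i(z) - z_i|^3 + F_i(z)^3 + z_i^3.
    G(z) = 0 exactly when z >= 0, F(z) >= 0 and z_i * F_i(z) = 0 for all i."""
    z = np.asarray(z, dtype=float)
    Fz = F(z)
    return -np.abs(Fz - z) ** 3 + Fz ** 3 + z ** 3


def ncp_conditions_hold(F, z, tol=1e-12):
    """Direct check of the complementarity conditions (5), for comparison."""
    z = np.asarray(z, dtype=float)
    Fz = F(z)
    return bool(np.all(z >= -tol) and np.all(Fz >= -tol) and abs(z @ Fz) <= tol)
```

For instance, $F(z) = z - (1, -1)$ has the unique solution $\bar{z} = (1, 0)$. There $G$ vanishes and (5) holds. At $z = (0.5, 0)$ both tests fail, because $F_1(z) = -0.5 < 0$.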


Theorem 6 Let $F: \mathbb{R}^n \to \mathbb{R}^n$ be a $C^2$ map, and let the Jacobian matrix $DG(z)$ be nonsingular at every zero of $G(z)$. Suppose there exists $r > 0$ such that $z \ge 0$ and $z_k = \|z\|_\infty = r$ imply $F_k(z) > 0$. Then for almost all $a > 0$ there exists a zero curve $\gamma$ of $\rho_a(\lambda, z)$, along which the Jacobian matrix $D\rho_a(\lambda, z)$ has full rank, having finite arc length and connecting $(0, a)$ to $(1, \bar{z})$, where $\bar{z}$ solves (5).


Theorem 7 Let $F: \mathbb{R}^n \to \mathbb{R}^n$ be a $C^2$ map, and let the Jacobian matrix $DG(z)$ be nonsingular at every zero of $G(z)$. Suppose there exists $r > 0$ such that $z \ge 0$ and $\|z\|_\infty \ge r$ imply $z_k F_k(z) > 0$ for some index $k$. Then there exists $\delta > 0$ such that for almost all $a \ge 0$ with $\|a\|_\infty < \delta$ there exists a zero curve $\gamma$ of $\rho_a(\lambda, z)$, along which the Jacobian matrix $D\rho_a(\lambda, z)$ has full rank, having finite arc length and connecting $(0, a)$ to $(1, \bar{z})$, where $\bar{z}$ solves (5).


Homotopy algorithms for convex unconstrained optimization are generally not computationally competitive with other approaches. For constrained optimization the homotopy approach offers some advantages, and, especially for the nonlinear complementarity problem, is competitive with and often superior to other algorithms. Consider next the general nonlinear programming problem


$$\begin{aligned} \min\ & \theta(x) \\ \text{s.t. }\ & g(x) \le 0, \\ & h(x) = 0, \end{aligned} \tag{6}$$

where $x \in \mathbb{R}^n$, $\theta$ is real valued, $g$ is an $m$-dimensional vector, and $h$ is a $p$-dimensional vector. Assume that $\theta$, $g$, and $h$ are $C^3$. The Kuhn-Tucker necessary optimality conditions for (6) are (cf. also Equality-constrained nonlinear programming: KKT necessary optimality conditions):


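$$\begin{aligned} &\nabla\theta(x) + \beta^T \nabla h(x) + \mu^T \nabla g(x) = 0, \\ &h(x) = 0, \\ &g(x) \le 0, \\ &\mu \ge 0, \\ &\mu^T g(x) = 0, \end{aligned} \tag{7}$$

where $\beta \in \mathbb{R}^p$ and $\mu \in \mathbb{R}^m$. The complementarity conditions $\mu \ge 0$, $g(x) \le 0$, $\mu^T g(x) = 0$ are replaced by the equivalent nonlinear system of equations

$$W(x, \mu) = 0, \tag{8}$$

where

$$W_i(x, \mu) = -|\mu_i + g_i(x)|^3 + \mu_i^3 - (g_i(x))^3, \quad i = 1, \dots, m. \tag{9}$$

Thus the optimality conditions (7) take the form

$$F(x, \beta, \mu) = \begin{pmatrix} \left[\nabla\theta(x) + \beta^T \nabla h(x) + \mu^T \nabla g(x)\right]^T \\ h(x) \\ W(x, \mu) \end{pmatrix} = 0. \tag{10}$$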

With $z = (x, \beta, \mu)$, the proposed homotopy map is

$$\rho_a(\lambda, z) = \lambda F(z) + (1 - \lambda)(z - a), \tag{11}$$

where $a \in \mathbb{R}^{n+p+m}$. Simple conditions on $\theta$, $g$, and $h$ guaranteeing that the above homotopy map $\rho_a(\lambda, z)$ will work are unknown, although this map has worked very well on some difficult realistic engineering problems.
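The following sketch illustrates how (7)-(11) fit together. It assembles $F(x, \beta, \mu)$ for a hypothetical toy problem: $\min (x_1 - 2)^2 + (x_2 - 1)^2$ subject to $g(x) = x_1 + x_2 - 2 \le 0$ and $h(x) = x_1 - x_2 = 0$. The solution of this problem is $\bar{x} = (1, 1)$, with multipliers $\beta = \mu = 1$. The sketch only evaluates $F$ and the homotopy map (11) and checks their values at the two ends of the curve. Actually tracking $\gamma$ between them would need a curve tracker such as those in HOMPACK90. All names here are hypothetical.

```python
import numpy as np

# Hypothetical toy instance of (6): n = 2 variables, p = 1 equality, m = 1 inequality.
def grad_theta(x):
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)])

def g(x):
    return np.array([x[0] + x[1] - 2.0])

def h(x):
    return np.array([x[0] - x[1]])

JAC_G = np.array([[1.0, 1.0]])    # row i is the gradient of g_i (constant here)
JAC_H = np.array([[1.0, -1.0]])   # row j is the gradient of h_j (constant here)


def kkt_F(z, n=2, p=1):
    """F(x, beta, mu) from (10), with W from (9); z = (x, beta, mu)."""
    x, beta, mu = z[:n], z[n:n + p], z[n + p:]
    stationarity = grad_theta(x) + JAC_H.T @ beta + JAC_G.T @ mu
    W = -np.abs(mu + g(x)) ** 3 + mu ** 3 - g(x) ** 3
    return np.concatenate([stationarity, h(x), W])


def homotopy_map(lam, z, a):
    """rho_a(lam, z) = lam * F(z) + (1 - lam) * (z - a), as in (11)."""
    return lam * kkt_F(z) + (1.0 - lam) * (z - a)
```

At $\lambda = 0$ the map vanishes at $z = a$ for any choice of $a$. At $\lambda = 1$ it vanishes at the KKT point $z = (1, 1, 1, 1)$.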
Table 1 Taxonomy of homotopy subroutines

x = f(x)            F(x) = 0            ρ(a, λ, x) = 0
dense     sparse    dense     sparse    dense     sparse    algorithm
FIXPDF    FIXPDS    FIXPDF    FIXPDS    FIXPDF    FIXPDS    ordinary differential equation
FIXPNF    FIXPNS    FIXPNF    FIXPNS    FIXPNF    FIXPNS    normal flow
FIXPQF    FIXPQS    FIXPQF    FIXPQS    FIXPQF    FIXPQS    augmented Jacobian matrix


Frequently in practice the functions $\theta$, $g$, and $h$ involve a parameter vector $c$, and a solution to (6) is known for some $c = c^{(0)}$. Suppose that the problem under consideration has parameter vector $c = c^{(1)}$. Then

$$c = (1 - \lambda)c^{(0)} + \lambda c^{(1)} \tag{12}$$

parametrizes $c$ by $\lambda$, and $\theta = \theta(x; c) = \theta(x; c(\lambda))$, $g = g(x; c(\lambda))$, $h = h(x; c(\lambda))$. The optimality conditions in (10) become functions of $\lambda$ as well, $F(\lambda, x, \beta, \mu) = 0$, and

$$\rho_a(\lambda, z) = \lambda F(\lambda, z) + (1 - \lambda)(z - a) \tag{13}$$

is a highly implicit nonlinear function of $\lambda$. If $F(0, z_0) = 0$, a good choice for $a$ in practice has been found to be $a = z_0$. A natural choice for a homotopy would be simply

$$F(\lambda, z) = 0, \tag{14}$$

since the solution $z_0$ to $F(0, z) = 0$ (the problem corresponding to $c = c^{(0)}$) is known. However, for various technical reasons, (13) is much better than (14).


Software 


There are several software packages implementing both 
continuous and simplicial homotopy methods; see [1] 
and [6] for a discussion of some of these packages. 
A production quality software package written in For- 
tran 90 is described here. HOMPACK90 [6] is a For- 
tran 90 collection of codes for finding zeros or fixed 
points of nonlinear systems using globally convergent 
probability-one homotopy algorithms. Three qualitatively different algorithms (ordinary differential equation based, normal flow, quasi-Newton augmented Jacobian matrix) are provided for tracking homotopy zero curves, as well as separate routines for dense and sparse Jacobian matrices. A high level driver for the special case of polynomial systems is also provided. HOMPACK90 features elegant interfaces, use of modules, support for several sparse matrix data structures, and modern iterative algorithms for large sparse Jacobian matrices.

HOMPACK90 is logically organized in two differ- 
ent ways: by algorithm/problem type and by subroutine 
level. There are three levels of subroutines. The top level 
consists of drivers, one for each problem type and algo- 
rithm type. The second subroutine level implements the 
major components of the algorithms such as stepping 
along the homotopy zero curve, computing tangents, 
and the end game for the solution at $\lambda = 1$. The third
subroutine level handles high level numerical linear al- 
gebra such as QR factorization, and includes some LA- 
PACK and BLAS routines. The organization of HOM- 
PACK90 by algorithm/problem type is shown in Ta- 
ble 1, which lists the driver name for each algorithm and 
problem type. 

The naming convention is

FIXP [D | N | Q] [F | S],

where D ≈ ordinary differential equation algorithm, N ≈ normal flow algorithm, Q ≈ quasi-Newton augmented Jacobian matrix algorithm, F ≈ dense Jacobian matrix, and S ≈ sparse Jacobian matrix. Depending on the problem type and the driver chosen, the user must write exactly two subroutines, whose interfaces are specified in the module HOMOTOPY, defining the problem ($f$ or $\rho$). The module REAL_PRECISION specifies the real numeric model with

SELECTED_REAL_KIND(13),

which will result in 64-bit real arithmetic on a Cray, DEC VAX, and IEEE 754 Standard compliant hardware.
The special purpose polynomial system solver POLSYS1H can find all solutions in complex projective space of a polynomial system of equations. Since a polynomial programming problem (where the objective function, inequality constraints, and equality constraints are all in terms of polynomials) can be formulated as a polynomial system of equations, POLSYS1H can effectively find the global optimum of a polynomial program. However, polynomial systems can have a huge number of solutions, so this approach is only practical for small polynomial programs (e.g., surface intersection problems that arise in CAD/CAM modeling).

The organization of the Fortran 90 code into mod- 
ules gives an object oriented flavor to the package. For 
instance, all of the drivers are encapsulated in a single 
MODULE HOMPACK90. The user’s calling program 
would then simply contain a statement like 


USE HOMPACK90, ONLY : FIXPNF 


Many scientific programmers prefer the reverse call 
paradigm, whereby a subroutine returns to the calling 
program whenever the subroutine needs certain infor- 
mation (e.g., a function value) or a certain operation 
performed (e. g., a matrix-vector multiply). Two reverse 
call subroutines (STEPNX, ROOTNX) are provided for 
‘expert’ users. STEPNX is an expert reverse call step- 
ping routine for tracking a homotopy zero curve y that 
returns to the caller for all linear algebra, all function 
and derivative values, and can deal gracefully with sit- 
uations such as the function being undefined at the re- 
quested steplength. 

ROOTNX provides an expert reverse call end game routine that finds a point on the zero curve where $g(\lambda, x) = 0$, as opposed to just the point where $\lambda = 1$. Thus ROOTNX can find turning points, bifurcation points, and other ‘special’ points along the zero curve. The combination of STEPNX and ROOTNX provides considerable flexibility for an expert user.
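The reverse call idea can be illustrated independently of HOMPACK90 with a Python generator. The solver never calls user code itself. Instead, it hands a request back to the caller and resumes when the caller sends the answer. The sketch below applies this pattern to a scalar Newton iteration. It is a hypothetical illustration of the paradigm only and does not reproduce the actual interfaces of STEPNX or ROOTNX.

```python
def newton_reverse_call(x0, tol=1e-12, max_iter=50):
    """Reverse-call Newton iteration for a scalar equation g(x) = 0.

    Instead of calling g itself, the routine yields ("eval", x) and expects
    the caller to send back the pair (g(x), g'(x)).  The root is delivered
    as the generator's return value (StopIteration.value)."""
    x = x0
    for _ in range(max_iter):
        gx, dgx = yield ("eval", x)
        step = gx / dgx
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def solve_with_caller(x0):
    """Caller side: owns the function evaluations and answers each request."""
    solver = newton_reverse_call(x0)
    request = next(solver)                      # run until the first request
    while True:
        _, x = request
        try:
            request = solver.send((x**3 - 2*x - 5, 3*x**2 - 2))
        except StopIteration as done:
            return done.value
```

Because the caller performs every evaluation, it is free to compute $g$ and $g'$ however it likes, or to do extra work between requests, which is the flexibility the reverse call paradigm is meant to provide. Here `solve_with_caller(2.0)` returns approximately $2.0945515$, the real root of $x^3 - 2x - 5$.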


See also 


> Parametric Optimization: Embeddings, Path 
Following and Singularities 
> Topology of Global Optimization 
Abstract


It is becoming apparent that convex financial planning 
models are at times a poor approximation of the real 
world. More realistic, and more relevant, models need 
to dispense with normality assumptions and concavity 
of the utility functions to be optimized. Moreover, the 
problems are large scale but structured; consequently 
specialized algorithms have been proposed for their 
solution. The aim of this article is to discuss a non-convex portfolio-selection problem and describe algorithms that can be used for its solution.


Background 


Modern portfolio theory started in the 1950s with 
H. Markowitz’s work [16,17]. Since then a lot of re- 
search has been done in improving the basic models 
and dispensing with the limiting assumptions of the 
field. The aim of this article is to introduce the problem 
of optimization of higher-order moments of a portfo- 
lio. This model is an extension of the celebrated mean- 
variance model of Markowitz [16,17]. The inclusion of 
higher-order moments has been proposed as one possi- 
ble augmentation to the model in order to make it more 
applicable. The applicability of the model can be broadened by relaxing one of its major assumptions, i. e. that the rates of return are normally distributed. In order to solve the
portfolio-selection problem, we first need to address the 
problem of scenario generation, i.e. the description of 
the uncertainties used in the portfolio-selection prob- 
lem. Both problems are non-convex, large-scale, and 
highly relevant in financial optimization. 

We focus on a single-period model where the deci- 
sion maker (DM) provides as input preferences with re- 
spect to mean, variance, skewness and possibly kurtosis 
of the portfolio. Using these four parameters we then 
formulate the multicriterion optimization problem as 
a standard non-linear programming problem. This ver- 
sion of the decision model is a non-convex linearly con- 
strained problem. 

Before we can solve the portfolio-selection prob- 
lem we need to describe the uncertainties regarding the 
returns of the risky assets. In particular we need to spec- 
ify: (1) the possible states of the world and (2) the prob- 
ability of each state. A common approach to this mod- 
elling problem is the method of matching moments (see 
e. g. [5,9,20]). The first step in this approach is to use the 
historical data to estimate the moments (in this paper 
we consider the first four central moments, i.e. mean, 
variance, skewness and kurtosis). The second step is to 
compute a discrete distribution with the same statisti- 
cal properties as those calculated in the previous step. 
Given that our interest is in real-world applications, we recognize that there may not always be a distribution that matches the calculated statistical properties. For this reason we formulate the problem as a least-squares problem [5,9]. The rationale behind this formulation
is that we try to calculate a description of the uncer- 
tainty that matches our beliefs as well as possible. The 
scenario-generation problem also has a non-convex ob- 
jective function and is linearly constrained. 

For the two problems described above we apply 
a new stochastic global optimization algorithm that has 
been developed specifically for this class of problems. 
The algorithm is described in [19]. It is an extension 
of the constrained case of the so-called diffusion algo- 
rithm [1,4,6,7]. The method follows the trajectory of 
an appropriately defined stochastic differential equa- 
tion (SDE). Feasibility of the trajectory is achieved by 
projecting its dynamics onto the set defined by the lin- 
ear equality constraints. A barrier term is used for the 
purpose of forcing the trajectory to stay within any 
bound constraints (e.g. positivity of the probabilities, 
or bounds on how much of each asset to own). 

A review of applications of global optimization to 
portfolio selection problems appeared in [13]. A de- 
terministic global optimization algorithm for a mul- 
tiperiod model appeared in [15]. This article comple- 
ments the work mentioned above in the sense that we 
describe a complete framework for the solution of a re- 
alistic financial model. The type of models we consider, 
due to the large number of variables, cannot be solved 
by deterministic algorithms. Consequently, practition- 
ers are left with two options: solve a simpler, but less 
relevant, model or use a heuristic algorithm (e.g. tabu- 
search or evolutionary algorithms). The approach pro- 
posed in this paper lies somewhere in the middle. The 
proposed algorithm belongs to the simulated-anneal- 
ing family of algorithms, and it has been shown in [19] 
that it converges to the global optimum (in a proba- 
bilistic sense). Moreover, the computational experience 
reported in [19] seems to indicate that the method is 
robust (in terms of finding the global optimum) and re- 
liable. We believe that such an approach will be useful 
in many practical applications. 


Models 
Scenario Generation 


From its inception stochastic programming (SP) has 
found several diverse applications as an effective 
paradigm for modelling decisions under uncertainty. 
The focus of initial research was on developing effective algorithms for models of realistic size. An area that has only recently received attention is the development of methods to represent the uncertainties of the decision problem.

A review of available methods to generate meaning- 
ful descriptions of the uncertainties from data can be 
found in [5]. We will use a least-squares formulation 
(see e.g. [5,9]). It is motivated by the practical concern 
that the moments, given as input, may be inconsistent. 
Consequently, the best one can do is to find a distribu- 
tion that fits the available data as well as possible. It is 
further assumed that the distribution is discrete. Under 
these assumptions the problem can be written as 


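$$\begin{aligned} \min_{p,\,\omega}\ & \sum_{i=1}^{n} \Big( \sum_{j=1}^{k} p_j\, m_i(\omega_j) - \mu_i \Big)^2 \\ \text{s.t. }\ & \sum_{j=1}^{k} p_j = 1, \quad p_j \ge 0, \quad j = 1, \dots, k, \end{aligned}$$

where $\mu_i$ represents the statistical properties of interest and $m_i(\cdot)$ is the associated ‘moment’ function. For example, if $\mu_i$ is the target mean for the $i$th asset, then $m_i(\omega_j) = \omega_j^i$, i.e. the $j$th realization of the $i$th asset.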
Numerical experiments using this approach for a mul- 
tistage model were reported in [9] (without arbitrage 
considerations). Other methods such as maximum en- 
tropy [18] and semidefinite programming [2] enjoy 
strong theoretical properties but cannot be used when 
the data of the problem are inconsistent. A disadvan- 
tage of the least-squares model is that it is highly non- 
convex, which makes it very difficult to handle numer- 
ically. These considerations lead to the development 
of the algorithm described in Sect. “A Stochastic Opti- 
mization Algorithm” (see also [19]) that can efficiently 
compute global optima for problems in this class. 
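A minimal sketch of the least-squares objective above follows. It assumes scalar realizations $\omega_j$ (a single asset) and user-supplied moment functions $m_i$. The names `moment_mismatch`, `targets` and `moment_fns` are hypothetical. The constraints $\sum_j p_j = 1$, $p_j \ge 0$ are left to whatever optimizer minimizes this function over $p$ and $\omega$.

```python
import numpy as np


def moment_mismatch(p, omega, targets, moment_fns):
    """sum_i (sum_j p_j * m_i(omega_j) - mu_i)^2 for a discrete distribution.

    p          -- scenario probabilities p_j
    omega      -- scenario realizations omega_j
    targets    -- target statistics mu_i
    moment_fns -- 'moment' functions m_i, one per target"""
    p = np.asarray(p, dtype=float)
    total = 0.0
    for mu_i, m_i in zip(targets, moment_fns):
        fitted = sum(p_j * m_i(w_j) for p_j, w_j in zip(p, omega))
        total += (fitted - mu_i) ** 2
    return total
```

For example, with a target mean of 0.05 ($m_1(\omega) = \omega$) and a target variance of 0.0025 ($m_2(\omega) = (\omega - 0.05)^2$, which assumes the mean target is met), the two equally likely outcomes $\omega = (0, 0.1)$ match both targets and give a mismatch of zero. Putting all the mass on $\omega_1 = 0$ instead gives $0.05^2 = 0.0025$.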
When using scenario trees for financial planning 
problems it becomes necessary to address the issue of 
arbitrage opportunities [9,12]. An arbitrage opportu- 
nity is a self-financing trading strategy that generates 
a strictly positive cash flow in at least one state and 
whose payoffs are non-negative in all other states. In 
other words, it is possible to get something for nothing. 
In our implementation we eliminate arbitrage opportunities by computing a sufficient set of states so that the resulting scenario tree has the arbitrage-free property. This is achieved by a simple two-step process. In the first step we generate random rates of return; these are sampled from a uniform distribution. We then test for arbitrage by solving the system


$$x_i^0 = \frac{1}{1 + r} \sum_{j=1}^{m} x_i^j \pi_j, \quad i = 1, \dots, n, \qquad \sum_{j=1}^{m} \pi_j = 1, \qquad \pi_j > 0, \quad j = 1, \dots, m, \tag{1}$$


where $x_i^0$ represents the current (known) state of the world for the $i$th asset and $x_i^j$ represents the $j$th realization of the $i$th asset in the next time period (these are generated by the simulations mentioned above). $r$ is the riskless rate of return. The $\pi_j$ are called the risk-neutral probabilities. According to a fundamental result of Harrison and Kreps [10], the existence of the risk-neutral probabilities is enough to guarantee that the scenario tree has the desired property. In the second step, we solve the least-squares problem with some of the states fixed to the states calculated in the first step. In other words, we solve the following problem:


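$$\begin{aligned} \min_{p,\,\omega}\ & \sum_{i=1}^{n} \Big( \sum_{j=1}^{k} p_j\, m_i(\omega_j) + \sum_{l=1}^{m} p_{k+l}\, m_i(\hat{\omega}_l) - \mu_i \Big)^2 \\ \text{s.t. }\ & \sum_{j=1}^{k+m} p_j = 1, \quad p_j \ge 0, \quad j = 1, \dots, k+m. \end{aligned} \tag{2}$$

In the problem above, the $\hat{\omega}_l$ are fixed. Solving the preceding problem guarantees a scenario tree that is arbitrage free.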
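The first-step arbitrage test, system (1), is a linear feasibility problem in $\pi$. One way to handle the strict inequalities $\pi_j > 0$ is to maximize a common lower bound $t$ on the $\pi_j$ with SciPy's `linprog` and accept the scenario set only if $t > 0$. This is an assumption of this sketch, not necessarily the procedure used in the article. The function name `risk_neutral_probabilities` and the tolerance `eps` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def risk_neutral_probabilities(x0, X, r, eps=1e-9):
    """Solve system (1) for pi with pi_j > 0, or return None if none exists.

    x0 -- current prices x_i^0, shape (n,)
    X  -- next-period realizations x_i^j, shape (n, m)
    r  -- riskless rate of return"""
    x0 = np.asarray(x0, dtype=float)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n, m = X.shape
    # Variables (pi_1, ..., pi_m, t); maximize t subject to pi_j >= t.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_eq = np.zeros((n + 1, m + 1))
    A_eq[:n, :m] = X                 # sum_j x_i^j pi_j = (1 + r) x_i^0
    A_eq[n, :m] = 1.0                # sum_j pi_j = 1
    b_eq = np.append((1.0 + r) * x0, 1.0)
    A_ub = np.hstack([-np.eye(m), np.ones((m, 1))])   # t - pi_j <= 0
    b_ub = np.zeros(m)
    bounds = [(0.0, None)] * m + [(None, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if res.status != 0 or res.x[-1] <= eps:
        return None                  # no strictly positive solution: arbitrage
    return res.x[:m]
```

For an asset priced 1 today that pays either 0.9 or 1.2, with $r = 0$, the sketch returns $\pi = (2/3, 1/3)$. If both outcomes exceed 1, no risk-neutral probabilities exist (buying the asset with borrowed money is an arbitrage), and the function returns `None`.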


Portfolio Selection 


In this section we describe the portfolio-selection prob- 
lem when higher-order terms are taken into account. 
The classical mean-variance approach to portfolio 
analysis seeks to balance risk (measured by variance) 
and reward (measured by expected value). There are 
many ways to specify the single-period problem. We 
will be using the following basic model: 


$$\begin{aligned} \min_{w}\ & -\alpha E[w] + \beta V[w] \\ \text{s.t. }\ & \sum_{i=1}^{n} w_i = 1, \quad l_i \le w_i \le u_i, \quad i = 1, \dots, n, \end{aligned} \tag{3}$$

where $E[\cdot]$ and $V[\cdot]$ represent the mean rate of return and its variance respectively. The equality constraint is known as the budget constraint and it specifies the initial wealth (without loss of generality we have assumed that this is one). The $\alpha$ and $\beta$ are positive scalars and are chosen so that $\alpha + \beta = 1$. They specify the DM’s preferences, i.e. $\alpha = 1$ means that the DM is risk seeking, while $\beta = 1$ implies that the DM is risk averse. Any other selection of the parameters will produce a point on the efficient frontier. The decision variable ($w$) represents the commitment of the DM to a particular asset. Note that this problem is a convex quadratic programming problem for which very efficient algorithms exist.
The interested reader is referred to the review in [23] 
for more information regarding the Markowitz model. 

We propose an extension of the mean-variance 
model using higher-order moments. The vector- 
optimization problem can be formulated as a standard 
non-convex optimization problem using two additional 
scalars to act as weights. These weights are used to en- 
force the DM’s preferences. The problem is then for- 
mulated as follows: 


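$$\begin{aligned} \min_{w}\ & -\alpha E[w] + \beta V[w] - \gamma S[w] + \delta K[w] \\ \text{s.t. }\ & \sum_{i=1}^{n} w_i = 1, \quad l_i \le w_i \le u_i, \quad i = 1, \dots, n, \end{aligned} \tag{4}$$

where $S[\cdot]$ and $K[\cdot]$ represent the skewness and kurtosis of the rate of return respectively. $\gamma$ and $\delta$ are positive scalars. The four scalar parameters are chosen so that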
they sum to one. Positive skewness is desirable (since 
it corresponds to higher returns, albeit with low proba- 
bility), while kurtosis is undesirable since it implies that 
the DM is exposed to more risk. The model in (4) can 
be extended to multiple periods while maintaining the 
same structure (non-convex objective and linear con- 
straints). The numerical solution of (2) and (4) will be 
discussed in the next section. 
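Here is a sketch of how the objective of (4) might be evaluated from a discrete scenario set such as the one produced by (2). It assumes that $S[w]$ and $K[w]$ are the third and fourth central moments of the portfolio return. The article does not state whether standardized moments are intended. Setting $\gamma = \delta = 0$ recovers the mean-variance objective of (3). The function name `higher_moment_objective` is hypothetical.

```python
import numpy as np


def higher_moment_objective(w, R, probs, alpha, beta, gamma, delta):
    """-alpha*E + beta*V - gamma*S + delta*K for the portfolio w.

    R     -- scenario returns, shape (k, n): row j holds the n asset returns
    probs -- scenario probabilities, shape (k,)
    S, K  -- taken here as third and fourth central moments (an assumption)"""
    port = R @ w                    # portfolio return in each scenario
    mean = probs @ port
    dev = port - mean
    var = probs @ dev ** 2
    skew = probs @ dev ** 3
    kurt = probs @ dev ** 4
    return -alpha * mean + beta * var - gamma * skew + delta * kurt
```

With two equally likely scenarios in which the chosen portfolio returns $+0.1$ and $-0.1$, the mean and third moment vanish, the variance is $0.01$ and the fourth moment is $0.0001$. With all four weights equal to $0.25$ the objective is therefore $0.002525$.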


Methods 
A Stochastic Optimization Algorithm 
The models described in the previous section can be written as:

$$\begin{aligned} \min_{x}\ & f(x) \\ \text{s.t. }\ & Ax = b, \\ & x \ge 0. \end{aligned}$$


A well-known method for obtaining a solution to an unconstrained optimization problem is to consider the following ordinary differential equation (ODE):

$$dX(t) = -\nabla f(X(t))\, dt. \tag{5}$$


By studying the behaviour of X(t) for large t, it can be 
shown that X(t) will eventually converge to a stationary 
point of the unconstrained problem. A review of so- 
called continuous-path methods can be found in [25]. 
A deficiency of using (5) to solve optimization prob- 
lems is that it will get trapped in local minima. To al- 
low the trajectory to escape from local minima, it has 
been proposed by various authors (e.g. [1,4,6,7]) to 
add a stochastic term that would allow the trajectory 
to ‘climb’ hills. One possible augmentation to (5) that 
would enable us to escape from local minima is to add 
noise. One then considers the diffusion process: 


$$dX(t) = -\nabla f(X(t))\, dt + \sqrt{2T(t)}\, dB(t), \tag{6}$$

where $B(t)$ is the standard Brownian motion in $\mathbb{R}^n$. It has been shown in [4,6,7], under appropriate conditions on $f$ and $T(t)$, that as $t \to \infty$, the transition probability of $X(t)$ converges to a probability measure $\Pi$. The latter has its support on the set of global minimizers.

For the sake of argument, suppose we did not have 
any linear constraints but only positivity constraints. 
We could then consider enforcing the feasibility of the 
iterates by using a barrier function. According to the al- 
gorithmic framework sketched out above, we could ob- 
tain a solution to our (simplified) problem by following 
the trajectory of the following SDE: 


$$dX(t) = -\nabla f(X(t))\, dt + \mu X(t)^{-1}\, dt + \sqrt{2T(t)}\, dB(t), \tag{7}$$

where $\mu > 0$ is the barrier parameter. By $X^{-1}$ we will denote an $n$-dimensional vector whose $i$th component is given by $1/X_i$. Having used a barrier function to deal with the positivity constraints, we can now introduce the linear constraints into our SDE. This process has been carried out in [19] using the projected SDE:


$$dX(t) = P\left[-\nabla f(X(t)) + \mu X(t)^{-1}\right] dt + \sqrt{2T(t)}\, P\, dB(t), \tag{8}$$

where $P = I - A^T(AA^T)^{-1}A$. The proposed algorithm works in a similar manner to gradient-projection algorithms. The key difference is the addition of a barrier parameter for the positivity of the iterates and a stochastic term that helps the algorithm escape from local minima.

The global optimization problem can be solved by fixing $\mu$ and following the trajectory of (8) for a suitably defined function $T(t)$. After sufficient time passes, we reduce $\mu$ and repeat the process. The proof that following the trajectory of (8) will eventually lead us to the global minimum appears in [19]. Note that the projection matrix for the type of constraints we need to impose for our models is particularly simple. For a constraint of the type $\sum_{i=1}^{n} x_i = 1$ the projection matrix is given by

$$P_{ij} = \begin{cases} -\dfrac{1}{n} & \text{if } i \ne j, \\[4pt] 1 - \dfrac{1}{n} & \text{otherwise.} \end{cases}$$
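The procedure just described can be sketched with an Euler-Maruyama discretization of (8) on the set $\{x : \sum_i x_i = 1,\ x > 0\}$, using the projection matrix above. Several choices in this sketch are assumptions rather than details taken from [19]:

- the discretization itself and the step `dt`;
- the cooling schedule $T = T_0 / \log(k + 2)$;
- the staged values of $\mu$;
- rejecting any discrete step that leaves the positive orthant.

The function name `projected_diffusion` is hypothetical.

```python
import numpy as np


def projected_diffusion(f, grad_f, x0, mus=(1e-2, 1e-3, 1e-4), T0=1e-3,
                        dt=1e-3, steps=5000, seed=0):
    """Euler-Maruyama sketch of the projected SDE (8) on the set
    {x : sum(x) = 1, x > 0}, with the barrier parameter mu reduced in stages.
    Returns the best point visited (by f, barrier excluded) and its value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    P = np.eye(n) - np.ones((n, n)) / n        # P_ij = -1/n, P_ii = 1 - 1/n
    best_x, best_f = x.copy(), f(x)
    k = 0
    for mu in mus:                              # fix mu, follow (8), then reduce mu
        for _ in range(steps):
            T = T0 / np.log(k + 2.0)            # assumed cooling schedule T(t)
            k += 1
            drift = P @ (-grad_f(x) + mu / x)   # mu / x is mu * X(t)^{-1}
            noise = np.sqrt(2.0 * T * dt) * (P @ rng.standard_normal(n))
            x_new = x + drift * dt + noise
            if np.any(x_new <= 0.0):            # discrete step left x > 0: reject it
                continue
            x = x_new
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f
```

Because every increment is multiplied by $P$, the iterates keep $\sum_i x_i = 1$ up to rounding, while the $\mu / x$ term pushes them away from the boundary. On the test function $f(x) = \|x - c\|_2^2$ with $c = (0.2, 0.3, 0.5)$ and start $x_0 = (1/3, 1/3, 1/3)$, the best point visited ends up close to $c$.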
Other Methods 


In this article we have focused on the numerical solu- 
tion of a financial planning problem using a stochas- 
tic algorithm. We end this article by briefly discussing 
other possible approaches. Only stochastic methods 
will be discussed; for deterministic methods we refer 
the interested reader to [13]. 

Two-phase methods: Methods belonging to this class, as the name suggests, have two phases: a local and a global phase. In the global phase, the feasible region is uniformly sampled. From each feasible point a local optimization algorithm is started. The latter process is the local phase. This basic algorithmic framework has
been modified to improve its performance by various 
authors. Improving this type of method requires care- 
ful selection of the sample points from which to start 
the local optimizations. Inevitably there is some com- 
promise between computational efficiency and theoret- 
ical convergence. For a review of two-phase methods 
we refer the reader to [21] and references therein. 

Simulated annealing (SA): This family of algo- 
rithms was inspired by the physical behaviour of atoms 
in a liquid. The method was independently proposed 
by Černý [3] and Kirkpatrick et al. [11]. The method
is inspired by a fundamental question of statistical me- 
chanics concerning the behaviour of the system in low 
temperatures. For example, will the atoms remain fluid 
or will they solidify? If they solidify, do they form 
a crystalline solid or a glass? It turns out [11] that if the temperature is decreased slowly, then they form a pure crystal; this state corresponds to the minimum energy of the system. If the temperature is decreased too quickly, then they form a crystal with many defects.
SA algorithms generate a point from some distribution. 
Whether to accept the new point or not is decided by 
an acceptance function. The latter function is ‘temper- 
ature’ dependent. At high temperatures the function is 
likely to accept the new point, while at low tempera- 
tures only points close to the global optimum value are 
supposed to be accepted. As can be anticipated, the per- 
formance of the algorithm depends on the annealing 
schedule, i. e. how fast the temperature is reduced. Per- 
formance also depends on how points are sampled, the 
acceptance function and, of course, the stopping condi- 
tions. An excellent review article for SA is [14]. 
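Below is a compact sketch of the simulated-annealing loop just described. The text leaves the ingredients open, so this sketch assumes one common set of choices:

- uniform random-walk proposals;
- the Metropolis acceptance rule (accept if the objective does not increase, otherwise accept with probability $\exp(-\Delta/T)$);
- geometric cooling $T \leftarrow cT$.

All names and parameter values are hypothetical.

```python
import math
import random


def simulated_annealing(f, x0, T0=10.0, cooling=0.999, step=1.0,
                        iters=5000, seed=0):
    """Minimize a scalar function f by simulated annealing; returns the best
    point seen and its value."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    T = T0
    for _ in range(iters):
        y = x + rng.uniform(-step, step)        # sample a candidate point
        fy = f(y)
        delta = fy - fx
        # Metropolis acceptance: always take improvements, sometimes uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
        T *= cooling                             # annealing schedule
    return best_x, best_f
```

The test function $f(x) = x^2 + 10\sin^2 x$ has several local minima and its global minimum at $x = 0$. Started from $x_0 = 6$, the loop accepts many uphill moves while $T$ is large and behaves almost like a descent method once $T$ is small. As noted above, how fast `cooling` drives $T$ down largely determines the outcome.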

Stochastic adaptive search methods: These types 
of algorithms have strong theoretical properties but 
present challenging implementation issues. A typical 
algorithm from this class is the pure adaptive search 
method. This method works like a pure random search 
method but with the additional assumption of the abil- 
ity to sample from a distribution that gives realizations 
that are strictly better than the incumbent. There exist 
many variants and combinations of this type of method, 
and an excellent review of them is given in [24]. 

Genetic algorithms: This class of algorithms has 
been inspired by concepts from evolutionary biology 
and from aspects of natural selection. There are two 
phases in these algorithms: generation of the population and updating. During the generation phase, candidate points (offspring) are generated by sampling a p.d.f. This p.d.f. is usually specified from the original or the previous generation (the parents). In the second phase the population is updated. This update is performed by applying a selection mechanism and performing mutation operations on the population. There are very few theoretical results concerning the convergence properties of genetic algorithms. However, if their success in applications is anything to go by, then more attention needs to be devoted to convergence aspects of the method. An excellent review of genetic algorithms is given in [22].

Tabu search: This is another heuristic algorithm 
that has been successfully used for global optimization 
(especially combinatorial problems) but lacks theoret- 
ical backing. This class of algorithms was proposed by 
Glover, and a review of the method appeared in [8]. The algorithm has three phases: preliminary search, intensification, and diversification. In the first phase, the algorithm takes the current configuration, examines neighbouring solutions, and selects the one with the best
objective function value. This process is continued un- 
til no improving state can be identified. At this stage the 
possibility of returning to this point is ruled out by plac- 
ing it into a list. This list is called the tabu list. In the 
second phase (intensification), the tabu list is cleared 
and the algorithm returns to the first phase. In the fi- 
nal stage (diversification), the most frequent moves that 
were placed into the tabu list during the first phase are 
placed from the start into the list. The algorithm then 
starts from a random initial point. In this phase the al- 
gorithm is not allowed to make any moves that are in 
the tabu list. 
Environmental Systems Analysis
and Optimization 


The harmonized consideration of technical, economic 
and environmental objectives in strategic planning and 
operational decision making is of paramount impor- 
tance, on a worldwide scale. Environmental quality is- 
sues are of serious concern even in the most developed 
countries, although direct pollution control expendi- 
tures are typically in the 2-3 percent range of their gross 
domestic product. The ‘optimized’ or at least ‘acceptable’ solution of environmental quality problems requires combining knowledge from a multitude of areas in an interdisciplinary effort.

In the past decades, mathematical programming 
(MP) models have also been applied to the analysis
and management of environmental systems. The an- 
notated bibliography [9] reviews over 350 works, in- 
cluding some thirty books. Note further that the en- 
gineering, economic and environmental science litera- 
ture contains a very large amount of work that can serve 
as a basis and therefore is closely related to such mod- 
eling efforts. For instance, the classic textbook [28] re- 
views the basic quantitative models applied in describ- 
ing physical, chemical and biological phenomena of rel- 
evance. A more recent exposition (with a somewhat 
broader scope) is presented in, for instance, [11]. The 
chapters in the latter edited volume discuss the follow- 
ing issues: 

• environmental crisis, as a multidisciplinary challenge;
• soil pollution;
• air pollution;
• water pollution;
• water resources management;
• pesticides;
• gene technology;
• landscape planning;
• environmental economics;
• ecological aspects;
• environmental impact assessment;
• environmental management models.
Environmental management models were already discussed, in the broader context of governmental planning and operations, in [8]. In addition to the items listed above, the relevant topics covered there also include

• solid waste management;
• urban development;
• policy analysis.

Numerous further books can be mentioned, with varying emphasis on environmental science, engineering, economics or systems analysis; consult, e.g., [1,2,3,4,6,10,13,15,16,17,18,19,23,24,25,29,31,32,33]. Most of
these works also provide extensive lists of additional 
references. 

In the framework of this short article there is no 
room to go into any detailed discussion of environ- 
mental models. Therefore we shall only emphasize one 
important methodological aspect reflected by the ti- 
tle: namely, the relevance of global optimization in this 
context. 

The predominant majority of MP models presented, e.g., in the books listed or in [9] belong to (continuous or possibly mixed integer) linear programming, or to convex nonlinear programming, with additional (usually rather simplified) considerations regarding system stochasticity. At the same time, more detailed or more realistic models of natural systems and their governing processes often possess (explicit or hidden) high nonlinearity. For instance, one may think of power
laws, periodic or chaotic processes, and (semi)random 
fluctuations, reflected by many natural objects on var- 
ious scales: mountains, waters, plants, animals, and so 
on. For related far-reaching discussions, consult, for 
example, [5,7,20,21], or [30]. Since many natural objects and processes are inherently nonlinear, management models that optimize the behavior of environmental systems frequently lead to multi-extremal decision problems. Continuous global optimization (GO) is aimed at finding the ‘absolutely best’ solution of such models, in the possible presence of many other (locally optimal) solutions of various quality. See ▸ Continuous Global Optimization: Models, Algorithms and Software and ▸ Continuous Global Optimization: Applications for a number of textbooks and WWW sites related to the subject of GO. Therefore, here we mention only the handbook [14] and the WWW site [22].

We shall illustrate the relevance of GO by two very 
general examples, adapted from [26]. The latter book 
presents also a number of other case studies related 
to environmental modeling and management, with nu- 
merous additional references pertinent to this subject. 


Model Calibration 


The incomplete or poor understanding of environmental (as well as many other complex) systems calls for descriptive model development as an essential tool
of the related research. The following main phases of 
quantitative systems modeling can be distinguished: 

• identification: formulation of principal modeling objectives, determination (selection) of suitable model structure;
• calibration: (inverse) model fitting to available data and background information;
• validation and application in analysis, forecasting, control, management.

Consequently, the ‘adequate’ or ‘best’ parameterization of descriptive models is an important stage in the process of understanding environmental systems. Interesting, practically motivated discussions of the model calibration problem are also presented in [1,3,12,32].

A fairly simple and commonly applied instance of the model calibration problem can be stated as follows. Given

• a descriptive system model (e.g., of a lake, river, groundwater or atmospheric system) that depends on certain unknown (physical, chemical) parameters, whose vector is denoted by $x$;
• the set $D$ of a priori feasible parameterizations;
• the model output values $y_t^{(m)} = y_t^{(m)}(x)$ at time moments $t = 1, \ldots, T$;
• a set of corresponding observations $y_t$ at $t = 1, \ldots, T$;
• a discrepancy measure, denoted by $f$, which expresses the distance between $y_t^{(m)}$ and $y_t$,

the optimized model calibration problem can be formulated as

$$\min\ f(x) := f\left(\{y_t^{(m)}(x)\}, \{y_t\}\right) \quad \text{s.t.} \quad x \in D. \qquad (1)$$


Frequently, D is a finite n-interval (a ‘box’); fur- 
thermore, f is a continuous or somewhat more special 
(smooth, Lipschitz, etc.) function. Additional structural 
assumptions regarding f may be difficult to postulate, 
for the following reason. For each fixed parameter vector $x$, the model output sequence $\{y_t^{(m)}(x)\}$ may be produced by some implicit formulas, or by a computationally demanding numerical procedure (such as the solution of a system of partial differential equations). Consequently, although model (1) most typically
belongs to the general class of continuous GO prob- 
lems, a more specific classification may be difficult to 
provide. Therefore one needs to apply a GO procedure 
that enables the solution of the calibration problem un- 
der the very general conditions outlined above. 
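To make problem (1) concrete, the Python sketch below calibrates a hypothetical two-parameter decay model $y_t^{(m)}(x) = x_1 e^{-x_2 t}$ (chosen for brevity; it is not one of the models studied in [26]) against synthetic, noise-free observations. The discrepancy measure f is assumed to be the sum of squared deviations, and the box D is searched with a uniform grid, the simplest passive global method. The grid needs nothing but evaluations of f, which fits the general conditions described above, but its cost grows exponentially with the number of parameters, so it serves only as a baseline and not as a stand-in for the solvers discussed in [26].

```python
import math
from itertools import product


def model_output(x, t):
    """Hypothetical descriptive model: first-order decay y = x1 * exp(-x2 * t)."""
    return x[0] * math.exp(-x[1] * t)


def discrepancy(x, times, observed):
    """Least-squares discrepancy f(x) between model outputs and observations."""
    return sum((model_output(x, t) - y) ** 2 for t, y in zip(times, observed))


def grid_search(f, box, points_per_axis):
    """Evaluate f on a uniform grid over the box D and return the best point."""
    axes = [
        [lo + (hi - lo) * i / (points_per_axis - 1) for i in range(points_per_axis)]
        for lo, hi in box
    ]
    best_x, best_f = None, math.inf
    for x in product(*axes):
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


times = list(range(1, 11))                           # t = 1, ..., T with T = 10
true_x = (2.0, 0.5)                                  # only used to create synthetic data
observed = [model_output(true_x, t) for t in times]

box = [(0.0, 4.0), (0.0, 1.0)]                       # the feasible set D (a 'box')
x_best, f_best = grid_search(lambda x: discrepancy(x, times, observed), box, 81)
print(x_best, f_best)
```

Because the synthetic data were generated with x = (2.0, 0.5), which lies on the grid, the search returns that point with zero discrepancy; with real, noisy data the minimum value would be positive and the landscape could be multi-extremal.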

To conclude the brief discussion of this example, 
note that in [26] several variants of the calibration prob- 
lem statement are studied in detail. Namely, the model 
development and solver system LGO is applied to solve 
model calibration problems related to water quality 
analysis in rivers and lakes, river flow hydraulics, and 
aquifer modeling. (More recent implementations of 
LGO are described elsewhere: consult, e. g., [27].) 


‘Black Box’ Optimization 
(in Environmental Systems) 


As outlined above, the more realistic (as opposed to strongly simplified) analysis of environmental processes frequently requires the development of sophisticated systems of (sub)models: these are then connected
to a suitable optimization modeling framework. For ex- 
amples of various complexity, consult [1,2,10,19,32]. 
We shall illustrate this point by briefly discussing 
a modeling framework for river water quality man- 
agement: for additional details, see [26] and references 
therein. 

Assume that the ambient water quality in a river at 
time t is characterized by a certain vector s(t). The com- 
ponents in s(t) can include, for instance the following: 
suspended solids concentration, dissolved oxygen concentration, biological oxygen demand, chemical oxygen demand, concentrations of micro-pollutants and
heavy metals, and so on. Naturally, the resulting water 
quality is influenced by a number of factors. These in- 
clude the often stochastically fluctuating (discharge or 
nonpoint source) pollution load, as well as the regional 
hydro-meteorological conditions (streamflow rate, water temperature, etc.). Some of these factors can be directly observed, while others may not be completely known. In a typical model development process, submodels are constructed to describe all physical, chemical, biological, and ecological processes of relevance. (As an example, one can refer to the classical Streeter-Phelps differential equations that approximate the longitudinal evolution of biological oxygen demand in a river; consult [25,28].)
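As an illustration of such a submodel, the classical Streeter-Phelps model (first-order BOD decay at rate $k_1$, reaeration at rate $k_2$) has a well-known closed-form solution for the dissolved oxygen deficit $d(t)$ after travel time t, given the initial BOD $L_0$ and the initial deficit $d_0$:

$$d(t) = \frac{k_1 L_0}{k_2 - k_1}\left(e^{-k_1 t} - e^{-k_2 t}\right) + d_0 e^{-k_2 t}, \qquad k_1 \ne k_2.$$

The Python sketch evaluates it together with the time of the maximum deficit (the critical point of the oxygen sag curve). The parameter values in the usage line are made up for illustration and are not taken from [25,28].

```python
import math


def oxygen_deficit(t, L0, d0, k1, k2):
    """Streeter-Phelps dissolved-oxygen deficit after travel time t.

    L0 -- initial BOD, d0 -- initial oxygen deficit (same concentration units),
    k1 -- deoxygenation rate, k2 -- reaeration rate (1/time); requires k1 != k2.
    """
    if math.isclose(k1, k2):
        raise ValueError("this closed form requires k1 != k2")
    return (k1 * L0 / (k2 - k1)) * (math.exp(-k1 * t) - math.exp(-k2 * t)) \
        + d0 * math.exp(-k2 * t)


def critical_time(L0, d0, k1, k2):
    """Travel time at which d'(t) = 0 (maximum deficit), if that time is positive."""
    arg = (k2 / k1) * (1.0 - d0 * (k2 - k1) / (k1 * L0))
    if arg <= 0.0:
        raise ValueError("no positive critical time for these parameters")
    tc = math.log(arg) / (k2 - k1)
    if tc <= 0.0:
        raise ValueError("no positive critical time for these parameters")
    return tc


# Made-up values: L0 = 20 mg/l, d0 = 1 mg/l, k1 = 0.3 1/day, k2 = 0.6 1/day.
tc = critical_time(20.0, 1.0, 0.3, 0.6)
print(round(tc, 3), round(oxygen_deficit(tc, 20.0, 1.0, 0.3, 0.6), 3))  # prints 2.14 5.263
```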

In order to combine such a system description with management models, one has to be able to evaluate all decisions considered. Each given decision x can be related, inter alia, to the location and sizing of industrial and municipal wastewater treatment plants, the control of nonpoint source (agricultural) pollution, the design of a wastewater sewage collection network, the daily operation of these facilities, and so on. The analysis frequently involves the computationally intensive evaluation of environmental quality (e.g., by solving a system of (partial) differential equations) for each decision option considered. Stochastic extensions of such models, which are quite possibly more realistic, may also require the execution of Monte-Carlo simulation cycles. Under such or similar circumstances, environmental management models can be (very) complex, consisting of a number of ‘black box’ submodels. Consequently, the following general conceptual modeling framework may, and often will, lead to multi-extremal model instances requiring the application of suitable GO techniques:


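$$\min\ \mathrm{TCEM}(x) \quad \text{s.t.} \quad \mathrm{EQ}_{\min} \le \mathrm{EQ}(x) \le \mathrm{EQ}_{\max}, \quad \mathrm{TF}_{\min} \le \mathrm{TF}(x) \le \mathrm{TF}_{\max}, \qquad (2)$$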


in which 

• TCEM(x) is the total (discounted, expected) cost of environmental management;
• EQ(x) is the resulting environmental quality (vector);
• EQmin and EQmax are vector bounds on ‘acceptable’ environmental quality indicators;
• TF(x) are the resulting technical system characteristics (vector);
• TFmin and TFmax are vector bounds on ‘acceptable’ technical characteristics.
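A generic solver for bound-constrained problems cannot use the bounds on EQ(x) and TF(x) in model (2) directly. One common device, assumed here for illustration and not prescribed by the text, is an exterior penalty that adds weighted squared bound violations to TCEM(x). In the Python sketch, `tcem`, `eq` and `tf` are hypothetical stand-ins for black-box submodels (in practice each could hide a PDE solver or a Monte-Carlo simulation); only their input/output behavior is used. The penalty weight and the coarse grid search are illustrative choices.

```python
from itertools import product


def bound_violation(values, lower, upper):
    """Sum of squared violations of lower <= value <= upper, componentwise."""
    total = 0.0
    for v, lo, hi in zip(values, lower, upper):
        if v < lo:
            total += (lo - v) ** 2
        elif v > hi:
            total += (v - hi) ** 2
    return total


def penalized_objective(x, tcem, eq, tf, eq_bounds, tf_bounds, weight=1e3):
    """Exterior-penalty form of model (2): TCEM(x) plus weighted bound violations."""
    violation = (bound_violation(eq(x), *eq_bounds)
                 + bound_violation(tf(x), *tf_bounds))
    return tcem(x) + weight * violation


# Hypothetical black-box submodels for two treatment levels x = (x0, x1) in [0, 1]^2.
def tcem(x):
    return 3.0 * x[0] + 5.0 * x[1]      # total management cost


def eq(x):
    return [x[0] + x[1]]                # one aggregated quality indicator


def tf(x):
    return [x[0]]                       # one technical characteristic


EQ_BOUNDS = ([1.0], [2.0])              # (EQmin, EQmax)
TF_BOUNDS = ([0.0], [0.8])              # (TFmin, TFmax)

grid = [i / 20 for i in range(21)]
best = min(product(grid, grid),
           key=lambda x: penalized_objective(x, tcem, eq, tf, EQ_BOUNDS, TF_BOUNDS))
print(best)  # (0.8, 0.2)
```

With a finite penalty weight, slightly infeasible points can in general beat feasible ones; here the penalty paid by the nearest infeasible grid points outweighs their cost savings, so the constrained optimum (0.8, 0.2) is recovered.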
Numerous other examples could be cited: similarly to 
the case considered above, they may involve the solu- 
tion of systems of (algebraic, ordinary or partial dif- 
ferential) equations, and/or the statistical analysis of 
the environmental (model) system studied. For further examples (including data analysis, combination of expert opinions, environmental model calibration, industrial wastewater management, regional pollution management in rivers and lakes, risk assessment and control of accidental pollution) in the context of global optimization, consult, e.g., [26], and references
therein. 


See also 


▸ Continuous Global Optimization: Applications
▸ Continuous Global Optimization: Models, Algorithms and Software
▸ Interval Global Optimization
▸ Optimization in Water Resources
The reliable calculation of phase equilibrium for multicomponent mixtures is a critical aspect in the simulation, optimization and design of a wide variety of industrial processes, especially those involving separation
operations such as distillation and extraction. It is also 
important in the simulation of enhanced oil recovery 
processes such as miscible or immiscible gas flooding. 
Unfortunately, however, even when accurate models of 
the necessary thermodynamic properties are available, 
it is often very difficult to actually solve the phase equi- 
librium problem reliably. 


Background 


The computation of phase equilibrium is often consid- 
ered in two stages, as outlined by M.L. Michelsen [12, 
13]. The first involves the phase stability problem, that 
is, to determine whether or not a given mixture will split 
into multiple phases. The second involves the phase 
split problem, that is to determine the amounts and 
compositions of the phases assumed to be present. Af- 
ter a phase split problem is solved it may be necessary 
to do phase stability analysis on the results to determine whether the postulated number of phases was correct, and if not repeat the phase split problem. Both the
phase stability and phase split problems can be formu- 
lated as minimization problems, or as equivalent non- 
linear equation solving problems. 

For determining phase equilibrium at constant tem- 
perature and pressure, the most commonly considered 
case, a model of the Gibbs free energy of the system is 
required. This is usually based on an excess Gibbs en- 
ergy model (activity coefficient model) or an equation 
of state model. At equilibrium the total Gibbs energy of 
the system is minimized. Phase stability analysis may be 
interpreted as a global optimality test that determines 
whether the phase being tested corresponds to a global 
optimum in the total Gibbs energy of the system. If it 
is determined that a phase will split, then a phase split 
problem is solved, which can be interpreted as finding 
a local minimum in the total Gibbs energy of the sys- 
tem. This local minimum can then be tested for global 
optimality using phase stability analysis. If necessary 
the phase split calculation must then be repeated, per- 
haps changing the number of phases assumed to be 
present, until a solution is found that meets the global 
optimality test. Clearly the correct solution of the phase 
stability problem, itself a global optimization problem, 
is the key in this two-stage global optimization pro- 
cedure for phase equilibrium. As emphasized in [10], 
while it is possible to apply rigorous global optimization 
techniques directly to the phase equilibrium problem, 
it is computationally more efficient to use a two-stage 
approach such as outlined above, since the dimension- 
ality of the global optimization problem that must be 
solved (phase stability problem) is less than that of the 
full phase equilibrium problem. 

In solving the phase stability problem, the conven- 
tional solution methods are initialization dependent, 
and may fail by converging to trivial or nonphysical 
solutions or to a point that is a local but not a global 
minimum. Thus there is no guarantee that the phase 
equilibrium problem has been correctly solved. Because 
of the difficulties that may arise in solving phase equi- 
librium problems by standard methods (e. g., [12,13]), 
there has been significant interest in the development 
of more reliable methods. For example, the methods of 
A.C. Sun and W.D. Seider [16], who use a homotopy 
continuation approach, and of S.K. Wasylkiewicz, L.N. 
Sridhar, M.F. Malone and M.F. Doherty [18], who use 
an approach based on topological considerations, can 


offer significant improvements in reliability. C.M. McDonald and C.A. Floudas [7,8,9,10] show that, for certain activity coefficient models, the phase stability and
equilibrium problems can be made amenable to solu- 
tion by powerful global optimization techniques, which 
provide a mathematical guarantee of reliability. 

An alternative approach for solving the phase sta- 
bility problem, based on interval analysis, that pro- 
vides both mathematical and computational guarantees 
of global optimality, was originally suggested by M.A. 
Stadtherr, C.A. Schnepper and J.F. Brennecke [15], who 
applied it in connection with activity coefficient mod- 
els, as later done also in [11]. This technique, in par- 
ticular the use of an interval Newton and generalized 
bisection algorithm, is initialization independent and 
can solve the phase stability problem with mathematical 
certainty, and, since it deals automatically with round- 
ing error, with computational certainty as well. J.Z. 
Hua, Brennecke and Stadtherr [3,4,5,6] extended this 
method to problems modeled with cubic equation of 
state models, in particular the Van der Waals, Peng- 
Robinson, and Soave-Redlich-Kwong models. Though 
interval analysis provides a general purpose and model 
independent approach for guaranteed solution of the 
phase stability problem, the discussion below will focus 
on the use of cubic equation of state models. 


Phase Stability Analysis 


The determination of phase stability is often done us- 
ing tangent plane analysis [1,12]. A phase at specified 
temperature T, pressure P, and feed mole fraction vec- 
tor z is unstable and can split (in this context, ‘unstable’ 
refers to both the thermodynamically metastable and 
classically unstable cases), if the molar Gibbs energy of 
mixing surface m(x, v) ever falls below a plane tangent 
to the surface at z. That is, if the tangent plane distance 


$$D(\mathbf{x}, v) = m(\mathbf{x}, v) - m_0 - \sum_{i=1}^{n-1} \left(\frac{\partial m}{\partial x_i}\right)_0 (x_i - z_i)$$

is negative for any composition (mole fraction) vector x, the phase is unstable. The subscript zero indicates evaluation at x = z, n is the number of components, and v is the molar volume of the mixture. A common approach for determining if D is ever negative is to minimize D subject to the mole fractions summing to one,

$$1 - \sum_{i=1}^{n} x_i = 0, \qquad (1)$$

and subject to the equation of state relating x and v:

$$P - \frac{RT}{v - b} + \frac{a}{v^2 + u b v + w b^2} = 0. \qquad (2)$$


Here a and b are functions of x determined by specified mixing rules. The ‘standard’ mixing rules are $b = \sum_{i=1}^{n} x_i b_i$ and $a = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j a_{ij}$ with $a_{ij} = (1 - k_{ij})\sqrt{a_i a_j}$. The $a_i(T)$ and $b_i$ are pure component properties determined from the system temperature T, the critical temperatures $T_{ci}$, the critical pressures $P_{ci}$ and the acentric factors $\omega_i$. The binary interaction parameter $k_{ij}$ is generally determined experimentally by fitting binary vapor-liquid equilibrium data. Equation (2) is a generalized cubic equation of state model. With the appropriate choice of u and w, common models such as Peng-Robinson (u = 2, w = -1), Soave-Redlich-Kwong (u = 1, w = 0), and Van der Waals (u = 0, w = 0) may be obtained. It is readily shown that the stationary points in this optimization problem must satisfy

$$s_i(\mathbf{x}, v) - s_i(\mathbf{z}, v_0) = 0, \qquad i = 1, \ldots, n-1, \qquad (3)$$

where

$$s_i = \frac{\partial m}{\partial x_i} - \frac{\partial m}{\partial x_n}.$$

The (n + 1) × (n + 1) system given by equations (1), (2) and (3) above can be used to solve for the stationary points in the optimization problem.
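Equation (2) and the ‘standard’ mixing rules can be coded directly. The Python sketch below computes a and b for a mixture and returns the real molar-volume roots v > b of (2), obtained by multiplying (2) by its denominators, which gives a cubic polynomial in v. The pure-component values $a_i$, $b_i$ and the $k_{ij}$ are inputs: their temperature-dependent correlations are not reproduced here, and the function names are illustrative. The cubic is solved in closed form to keep the sketch dependency-free; the usage example uses Van der Waals in reduced units, where, for the subcritical state chosen, three real volume roots appear.

```python
import math


def mixture_parameters(x, a_pure, b_pure, k):
    """'Standard' mixing rules: b = sum_i x_i b_i and a = sum_i sum_j x_i x_j a_ij,
    with a_ij = (1 - k_ij) * sqrt(a_i * a_j)."""
    n = len(x)
    b = sum(x[i] * b_pure[i] for i in range(n))
    a = sum(x[i] * x[j] * (1.0 - k[i][j]) * math.sqrt(a_pure[i] * a_pure[j])
            for i in range(n) for j in range(n))
    return a, b


def real_cubic_roots(c3, c2, c1, c0):
    """Real roots of c3*v**3 + c2*v**2 + c1*v + c0 = 0 (c3 != 0).

    Trigonometric form for three real roots, Cardano's formula otherwise; the
    borderline double-root case is not treated separately in this sketch.
    """
    p2, p1, p0 = c2 / c3, c1 / c3, c0 / c3
    shift = p2 / 3.0                          # v = t - shift gives t^3 + p t + q = 0
    p = p1 - p2 * p2 / 3.0
    q = 2.0 * p2 ** 3 / 27.0 - p2 * p1 / 3.0 + p0
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc < 0.0:                            # three distinct real roots (p < 0)
        r = 2.0 * math.sqrt(-p / 3.0)
        phi = math.acos(max(-1.0, min(1.0, 3.0 * q / (p * r))))
        ts = [r * math.cos((phi - 2.0 * math.pi * k) / 3.0) for k in range(3)]
    else:                                     # one real root
        s = math.sqrt(disc)
        c_plus, c_minus = -q / 2.0 + s, -q / 2.0 - s
        ts = [math.copysign(abs(c_plus) ** (1.0 / 3.0), c_plus)
              + math.copysign(abs(c_minus) ** (1.0 / 3.0), c_minus)]
    return sorted(t - shift for t in ts)


def volume_roots(P, T, a, b, u, w, R=8.314):
    """Real molar-volume roots v > b of equation (2),
    P = RT/(v - b) - a/(v^2 + u b v + w b^2).

    (u, w) = (2, -1) Peng-Robinson, (1, 0) Soave-Redlich-Kwong, (0, 0) Van der Waals.
    R, P, T, a and b must be given in consistent units.
    """
    # Multiplying (2) by (v - b)(v^2 + u b v + w b^2) gives a cubic in v.
    c3 = P
    c2 = P * (u - 1) * b - R * T
    c1 = P * (w - u) * b ** 2 - R * T * u * b + a
    c0 = -P * w * b ** 3 - R * T * w * b ** 2 - a * b
    return [v for v in real_cubic_roots(c3, c2, c1, c0) if v > b]


# Van der Waals in reduced units (R = Tc = Pc = 1, so a = 27/64 and b = 1/8):
print(volume_roots(P=0.6, T=0.9, a=27 / 64, b=1 / 8, u=0, w=0, R=1.0))  # three roots
print(volume_roots(P=1.0, T=2.0, a=27 / 64, b=1 / 8, u=0, w=0, R=1.0))  # one root
```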

The equation system for the stationary points has a trivial root at (x, v) = (z, v_0) and frequently has multiple nontrivial roots as well. Thus conventional equation
solving techniques may fail by converging to the trivial 
root or give an incorrect answer to the phase stability 
problem by converging to a stationary point that is not 
the global minimum of D. This is aptly demonstrated by 
the experiments of K.A. Green, S. Zhou and K.D. Luks 
[2], who show that the pattern of convergence from dif- 
ferent initial guesses demonstrates a complex fractal- 
like behavior for even very simple models like Van der 
Waals. The problem is further complicated by the fact 
that the cubic equation of state (2) may have multiple 
real volume roots v. 
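The tangent plane test can be illustrated without an equation of state. As a simplifying assumption (not the model used in this article), the Python sketch replaces the cubic EOS by a binary two-suffix Margules activity coefficient model, for which the reduced Gibbs energy of mixing is $m(x_1) = x_1 \ln x_1 + x_2 \ln x_2 + A x_1 x_2$ with $x_2 = 1 - x_1$ and no volume variable, so that $D(x_1) = m(x_1) - m(z_1) - m'(z_1)(x_1 - z_1)$. Scanning D on a grid is only a heuristic check: a grid can miss a narrow region where D < 0, which is exactly the weakness that the interval methods discussed below remove.

```python
import math


def gibbs_mixing(x1, A):
    """Reduced Gibbs energy of mixing m = g_mix/RT, binary two-suffix Margules model."""
    x2 = 1.0 - x1
    return x1 * math.log(x1) + x2 * math.log(x2) + A * x1 * x2


def dgibbs_dx1(x1, A):
    """dm/dx1 in the reduced composition space x2 = 1 - x1."""
    return math.log(x1 / (1.0 - x1)) + A * (1.0 - 2.0 * x1)


def tangent_plane_distance(x1, z1, A):
    """D(x1) = m(x1) - m(z1) - m'(z1) * (x1 - z1), the n = 2 case of D."""
    return gibbs_mixing(x1, A) - gibbs_mixing(z1, A) - dgibbs_dx1(z1, A) * (x1 - z1)


def min_tpd_on_grid(z1, A, points=9999):
    """Smallest D over the grid x1 = k/(points + 1); a crude check, not a guarantee."""
    return min(tangent_plane_distance(k / (points + 1), z1, A)
               for k in range(1, points + 1))


for A in (1.5, 3.0):
    d_min = min_tpd_on_grid(z1=0.5, A=A)
    print(A, "unstable (D < 0 found)" if d_min < 0 else "no negative D found on the grid")
```

For A = 1.5 the function m is convex ($m'' = 1/(x_1 x_2) - 2A > 0$), so D ≥ 0 and the mixture is stable; for A = 3 and $z_1 = 0.5$, D is negative, for example D ≈ -0.115 near $x_1 \approx 0.07$.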


As an example of a system that causes numerical difficulties, consider the binary mixture of hydrogen sulfide (component 1) and methane (component 2) at a temperature of 190 K and pressure of 40.53 bar (40 atm), modeled using the Soave-Redlich-Kwong equation of state, with an overall feed composition of $z_1 = 0.0187$. Figure 1 shows a plot of the reduced Gibbs energy of mixing m vs. $x_1$ for this system (in the reduced composition space where $x_2 = 1 - x_1$), and also shows the tangent at the feed composition.

The corresponding tangent plane distance function is shown in Fig. 2 and Fig. 3.

Note that this system has a region, around $x_1$ of 0.03 to 0.05, where multiple real volume roots occur and thus multiple values of m and D exist; only the lowest values are physically significant. This system has five stationary points, four minima and one maximum. Conventional locally convergent methods are typically used with multiple initial guesses, generally at or near the pure components ($x_1 = 0$ and $x_1 = 1$). When this is done, convergence will likely be to the local minimum at the feed composition (0.0187) and to the local minimum around 0.88. The global minimum with D < 0 is missed, leading to the incorrect conclusion that the mixture is stable.

Global Optimization: Application to Phase Equilibrium Problems, Figure 1
Reduced Gibbs energy of mixing m versus $x_1$ for the system hydrogen sulfide and methane, showing tangent at a feed composition of 0.0187

Global Optimization: Application to Phase Equilibrium Problems, Figure 2
Tangent plane distance D versus $x_1$ for the example system of Fig. 1. See Fig. 3 for enlargement of area near the origin

Global Optimization: Application to Phase Equilibrium Problems, Figure 3
Enlargement of part of Fig. 2, showing area near the origin


Interval Analysis 


Interval analysis makes possible the mathematically and computationally guaranteed solution of the phase stability problem. Since the mole fraction variables $x_i$ are known to lie between zero and one, and it is easy to put
physical upper and lower bounds on the molar volume 
v as well, a feasible interval for all variables is readily 
identified. By applying an interval Newton/generalized 
bisection approach to the entire feasible interval, enclo- 
sures of all the stationary points of the tangent plane 
distance D can be found by solving the nonlinear equa- 
tion system (1)-(3), and the global minimum of D thus 
identified. This approach requires no initial guess, and 
is applicable to any model for the Gibbs energy, not 
just those derived from equations of state. For the bi- 
nary system used as an example above, all five station- 
ary points are easily found, and the global minimum at 
$x_1 = 0.0767$, v = 64.06 cm³/mol, and D = -0.004 thus
identified [3,6]. 

The efficiency of the interval approach can depend 
significantly on how tightly one can compute interval 
extensions for the functions involved. The interval ex- 
tension of a function over a given interval is an enclo- 
sure for the range of the function over that interval. 
When the natural interval extension, that is the func- 
tion range computed using interval arithmetic, is used, 
it may tightly bound the actual function range. How- 


ever, it is not uncommon for the natural interval exten- 
sion to provide a significant overestimation of the true 
function range, especially for functions of the complex- 
ity encountered in the phase stability and equilibrium 
problems. 
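The overestimation is easy to reproduce with a toy interval type. The Python sketch below implements only addition, subtraction, multiplication and an exact square, and it ignores outward rounding, so unlike the methods described here its enclosures are not rigorous in floating point; the class name `Interval` is illustrative. For f(x) = x(1 - x) on [0, 1] the natural interval extension gives [0, 1], while the true range is [0, 1/4], because the two occurrences of x are treated as independent (the dependency problem). The algebraically equivalent form 1/4 - (x - 1/2)², in which x occurs once, gives the exact range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        other = _as_interval(other)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __rsub__(self, other):
        return _as_interval(other) - self

    def __mul__(self, other):
        other = _as_interval(other)
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def sqr(self):
        """Exact range of t**2 over the interval (tighter than self * self)."""
        lo2, hi2 = self.lo ** 2, self.hi ** 2
        if self.lo <= 0.0 <= self.hi:
            return Interval(0.0, max(lo2, hi2))
        return Interval(min(lo2, hi2), max(lo2, hi2))


def _as_interval(v):
    """Promote a number to a degenerate interval."""
    return v if isinstance(v, Interval) else Interval(v, v)


x = Interval(0.0, 1.0)
print(x * (1 - x))             # natural extension: Interval(lo=0.0, hi=1.0)
print(0.25 - (x - 0.5).sqr())  # rearranged form:  Interval(lo=0.0, hi=0.25)
print(x - x)                   # dependency problem: Interval(lo=-1.0, hi=1.0)
```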

Some tightening of bounds can be achieved by taking advantage of information about function monotonicity. Another simple and effective way to alleviate this difficulty in this context is to focus on tightening the enclosure when computing interval extensions of mole fraction weighted averages, such as $\bar{r} = \sum_{i=1}^{n} x_i r_i$, where the $r_i$ are constants. Due to the mixing rules for determining a and b, such expressions occur frequently, both in the equation of state (2) itself and in the derived model m(x, v) for the Gibbs energy of mixing, and thus in equation (3). The natural interval extension of $\bar{r}$ will yield the true range (within roundout) of the expression in the space in which all the mole fraction variables $x_i$ are independent. However, the range can be tightened by considering the constraint that the mole fractions must sum to one. One approach for doing this is simply to eliminate one of the mole fraction variables, say $x_n$. Then an enclosure for the range of $\bar{r}$ in the constrained space can be determined by computing the natural interval extension of $r_n + \sum_{i=1}^{n-1} (r_i - r_n) x_i$. However, this may not yield the sharpest possible bounds on $\bar{r}$ in the constrained space.

For constructing the exact (within roundout) bounds on $\bar{r}$ in the constrained space, S.R. Tessier [17] and Hua, Brennecke and Stadtherr [5] have presented a very simple method, based on the observation that at the extrema of $\bar{r}$ in the constrained space, at least n - 1 of the mole fraction variables must be at their upper or lower bound. This observation can be derived by viewing the problem of bounding the range of $\bar{r}$ in the constrained space as a linear programming problem. As shown in [5], when the constrained space interval extensions for mole fraction weighted averages are used, together with information about function monotonicity, significant improvements in computational efficiency (nearly an order of magnitude, even for small binary and ternary problems) can be achieved in using the interval approach for solving the phase stability problem.
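Because bounding $\bar{r}$ in the constrained space is a small linear program with a single equality constraint, its exact solution can be sketched greedily: start every $x_i$ at its lower bound and assign the remaining mole fraction to the components with the smallest $r_i$ first (for the lower bound of $\bar{r}$) or the largest $r_i$ first (for the upper bound). At most one variable then ends strictly between its bounds, in line with the observation above. The Python sketch compares this with the natural extension and with the elimination approach; it assumes a nonempty feasible set, ignores outward rounding, and is not claimed to reproduce the exact procedure of [5] or [17].

```python
def natural_bounds(r, lo, hi):
    """Natural interval extension of sum_i x_i r_i with independent x_i in [lo_i, hi_i]."""
    low = sum(min(ri * l, ri * h) for ri, l, h in zip(r, lo, hi))
    high = sum(max(ri * l, ri * h) for ri, l, h in zip(r, lo, hi))
    return low, high


def eliminated_bounds(r, lo, hi):
    """Natural extension after eliminating x_n: r_n + sum_{i<n} (r_i - r_n) x_i."""
    rn = r[-1]
    low, high = natural_bounds([ri - rn for ri in r[:-1]], lo[:-1], hi[:-1])
    return rn + low, rn + high


def constrained_bounds(r, lo, hi):
    """Exact range of sum_i x_i r_i over lo_i <= x_i <= hi_i with sum_i x_i = 1."""
    if not sum(lo) <= 1.0 <= sum(hi):
        raise ValueError("the box does not intersect the plane sum(x) = 1")

    def extreme(order):
        x = list(lo)
        remaining = 1.0 - sum(lo)
        for i in order:                       # raise variables in the given order
            step = min(hi[i] - lo[i], remaining)
            x[i] += step
            remaining -= step
        return sum(ri * xi for ri, xi in zip(r, x))

    by_r = sorted(range(len(r)), key=lambda i: r[i])
    return extreme(by_r), extreme(reversed(by_r))


r, lo, hi = [1.0, 2.0, 3.0], [0.2, 0.1, 0.3], [0.5, 0.4, 0.6]
print(natural_bounds(r, lo, hi))      # widest enclosure
print(eliminated_bounds(r, lo, hi))   # tighter
print(constrained_bounds(r, lo, hi))  # exact range
```

For this ternary example the three enclosures are nested: [1.3, 3.1] for the natural extension, [1.6, 2.5] after eliminating $x_3$, and the exact [1.8, 2.4] (up to rounding).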

For small problems, it is usually efficient to globally 
minimize D by finding all of its stationary points, since 
this does not require repeated evaluation of the range 
of D. However, in general, for making a determination
of phase stability or instability, finding all the station- 
ary points is not really necessary, nor for larger prob- 
lems, desirable. For example, if an interval is encoun- 
tered over which the interval extension of D has a neg- 
ative upper bound, this guarantees that there is a point 
at which D < 0, and so one can immediately conclude 
that the mixture is unstable without determining all the 
stationary points. It is also possible to easily make use 
of the underlying global minimization problem. Since 
the objective function D has a known value of zero at 
the mixture feed composition (tangent point), any in- 
terval over which the interval extension of D has a lower 
bound greater than zero cannot contain the global min- 
imum and can be discarded, even though it may contain 
a stationary point (at which D will be positive and thus 
not of interest). Thus, one can essentially combine the 
interval-Newton technique with an interval branch and 
bound procedure in which lower bounds are generated 
using interval techniques. 
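The discarding rules just described can be combined into a simple interval branch and bound. The Python sketch is a one-variable illustration under several assumptions: it uses the binary Margules tangent plane distance (with A ≥ 0) in place of the cubic EOS model, natural interval extensions without outward rounding (so it is not rigorous in floating point), and plain bisection instead of an interval-Newton step; `tpd_interval` and `classify_stability` are hypothetical names. A box whose upper bound on D is negative proves instability at once, and a box whose lower bound is positive is discarded. Near the tangent point itself, where D = 0, the lower bound can never become positive, so boxes narrower than a tolerance are dropped; a rigorous code would need interval-Newton or another device to deal with that point.

```python
import math


def _imul(x, y):
    """Product of two intervals represented as (lo, hi) tuples."""
    p = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return min(p), max(p)


def tpd_interval(a, b, z1, A):
    """Natural interval extension of the binary Margules tangent plane distance
    on [a, b] with 0 < a <= b < 1 and A >= 0 (no outward rounding)."""
    X = (a, b)
    Y = (1.0 - b, 1.0 - a)                                    # range of x2 = 1 - x1
    x_log_x = _imul(X, (math.log(a), math.log(b)))            # log is increasing
    y_log_y = _imul(Y, (math.log(1.0 - b), math.log(1.0 - a)))
    x_y = _imul(X, Y)
    m_lo = x_log_x[0] + y_log_y[0] + A * x_y[0]
    m_hi = x_log_x[1] + y_log_y[1] + A * x_y[1]
    # Point values at the feed composition z1 (the tangent point).
    m_z = z1 * math.log(z1) + (1.0 - z1) * math.log(1.0 - z1) + A * z1 * (1.0 - z1)
    g_z = math.log(z1 / (1.0 - z1)) + A * (1.0 - 2.0 * z1)
    # The tangent term -g_z * (x1 - z1) is linear, so its range comes from the endpoints.
    t_a, t_b = -g_z * (a - z1), -g_z * (b - z1)
    return m_lo - m_z + min(t_a, t_b), m_hi - m_z + max(t_a, t_b)


def classify_stability(z1, A, eps=1e-3, lo=1e-9, hi=1.0 - 1e-9):
    """Interval branch and bound on the sign of D over [lo, hi] (illustrative only)."""
    boxes = [(lo, hi)]                  # endpoints 0 and 1 excluded because of log
    while boxes:
        a, b = boxes.pop()
        d_lo, d_hi = tpd_interval(a, b, z1, A)
        if d_hi < 0.0:
            return "unstable"           # D < 0 at every point of [a, b]
        if d_lo > 0.0:
            continue                    # no point with D < 0 here: discard the box
        if b - a < eps:
            continue                    # tolerance: boxes around x1 = z1 never get d_lo > 0
        mid = 0.5 * (a + b)
        boxes.append((a, mid))
        boxes.append((mid, b))
    return "stable"                     # up to the tolerance eps


print(classify_stability(0.5, 1.5), classify_stability(0.5, 3.0))  # prints stable unstable
```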

Also, it should be noted that the global interval ap- 
proach described here can easily be combined with ex- 
isting local methods for determining phase stability and 
equilibrium. First, some (fast) local method is used. If it 
indicates instability then this is the correct answer as it 
means a point at which D < 0 has been found. If the local 
method indicates stability, however, this may not be the 
correct answer since the local method may have missed 
the global minimum in D. Applying interval analysis 
as described here can then be used to confirm that the 
mixture is stable if that is the case, or to correctly deter- 
mine that it is really unstable if that is the case. 


Conclusion 


As demonstrated in [3,4,5,6,11,15], interval analysis can 
be used to solve phase stability and equilibrium prob- 
lems efficiently and with complete reliability, provid- 
ing a method that can guarantee with mathematical 
and computational certainty that the correct result is 
found, and thus eliminating computational problems 
that are encountered with conventional techniques. 
The method is initialization independent; it is also 
model independent, straightforward to use, and can be 
applied in connection with any equation of state or 
activity coefficient model for the Gibbs free energy of 
a mixture. There are many other problems in the analysis of phase behavior, and in chemical process analysis
in general [14], that likewise are amenable to solution 
using this powerful approach. 
Many practically significant problems require optimization in a ‘black box’ situation, where the objective function is given by a code whose structure is not known.
Various heuristic ideas are implemented in algorithms developed for such a case. A disadvantage of heuristic algorithms is the dependence of the results on many parameters whose choice is difficult because their meaning is rather vague. To develop
a theory of global optimization the ‘black box’ should be 
replaced by a ‘grey box’ corresponding to some model 
of predictability/uncertainty of values of an objective 
function. 

A model of an objective function is an important 
counterpart of any optimization theory (e. g., quadratic 
models are widely used to construct algorithms for local nonlinear optimization). The uncertainty about the values of a multimodal function at arbitrary points of the feasible region is more essential than the uncertainty about the value of the objective function that will be calculated at the current iteration of a local descent.
Therefore, the global optimization models that describe 
the objective function with respect to information ob- 
tained during the previous iterations are different from 
polynomial models used in local optimization. Differ- 
ent models may be used; e. g., a deterministic model, 
defining the guaranteed intervals for unknown func- 
tion values, or a statistical model, modeling the uncer- 
tainty on function value by means of a random variable. 
The choice of a model is crucial because it defines the 
methodology of constructing the corresponding algo- 
rithms. A Lipschitzian type model enables the construc- 
tion of global optimization algorithms with guaranteed 
(worst case) accuracy. However, the number of func- 
tion evaluations in the worst case grows drastically with 
the dimensionality of the problem and the prescribed 
accuracy. In spite of this pessimistic theoretical result, many rather complicated practical problems have been solved heuristically. Because heuristics are a methodology based on human experience and oriented towards average (typical, normal) conditions, it seems reasonable to develop a theory formalizing the principle of rational behavior with respect to average conditions in global optimization. Average rationality is well justified for playing a ‘game against nature’, which models optimization conditions better than an antagonistic game, where the principle of minimax (guaranteed result) is well justified. The methodology of average rationality was applied to develop the general theory of rational choice
under statistically interpreted uncertainty [4]. This gen- 
eral theory was further specified to develop the theory 
of global optimization based on statistical models of 
multimodal functions [11]. 
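A back-of-the-envelope computation shows how fast the worst-case effort of the Lipschitzian approach grows. Assume (for illustration) that f is Lipschitz with constant L in the maximum norm on the unit cube $[0,1]^n$. Evaluating f at the centers of the $k^n$ cells of a uniform grid with k cells per axis guarantees that the best value found exceeds the global minimum by at most $L/(2k)$, so $k = \lceil L/(2\varepsilon) \rceil$ points per axis suffice for accuracy ε. The Python sketch merely evaluates this count; the function name is illustrative.

```python
import math


def passive_grid_size(L, eps, n):
    """Evaluations of a cell-centred uniform grid on [0, 1]^n that guarantee
    accuracy eps for a function with max-norm Lipschitz constant L."""
    k = math.ceil(L / (2.0 * eps))   # points per coordinate axis
    return k ** n


for n in (1, 2, 5, 10):
    print(n, passive_grid_size(L=10.0, eps=0.01, n=n))
```

With L = 10 and ε = 0.01 this gives 500 evaluations for n = 1 but $500^{10} \approx 10^{27}$ for n = 10.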

To construct a statistical model of a multimodal function f(x), $x \in A \subset \mathbb{R}^n$, the axiomatic approach is applied: the rationality of comparisons of the likelihood of different values of f(·) is postulated by simple, intuitively acceptable axioms, and it is proved that the interpretation of an unknown value f(x) as a Gaussian random variable $\xi_x$ is compatible with the axioms. The parameters of $\xi_x$ (the mean value $m(x \mid (x_i, y_i))$ and the variance $\sigma^2(x \mid (x_i, y_i))$, where $y_i = f(x_i)$ are the known function values obtained during the search) are introduced by an axiomatic theory of extrapolation under uncertainty. In the one-dimensional case both functions are very simple: $m(x \mid (x_i, y_i))$ is piecewise linear (connecting the neighboring trial points) and $\sigma^2(x \mid (x_i, y_i))$ is piecewise quadratic.
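As a concrete instance (an assumption for illustration; the axiomatic model does not require it), the Wiener-process model with variance parameter $\sigma_0^2$ gives, for x between neighboring trial points $x_i < x_{i+1}$,

$$m(x \mid (x_i, y_i)) = y_i + (y_{i+1} - y_i)\frac{x - x_i}{x_{i+1} - x_i}, \qquad \sigma^2(x \mid (x_i, y_i)) = \sigma_0^2 \frac{(x - x_i)(x_{i+1} - x)}{x_{i+1} - x_i},$$

which are piecewise linear and piecewise quadratic, respectively. The Python sketch evaluates them; the function name and the default $\sigma_0^2 = 1$ are illustrative.

```python
import bisect


def wiener_conditional(x, xs, ys, sigma2=1.0):
    """Conditional mean and variance of a Wiener-process model at x, given
    sorted trial points xs (with xs[0] <= x <= xs[-1]) and their values ys."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x must lie between the first and last trial points")
    j = bisect.bisect_right(xs, x)
    if j == len(xs):                                    # x is the last trial point
        return ys[-1], 0.0
    i = j - 1
    x0, x1, y0, y1 = xs[i], xs[j], ys[i], ys[j]
    mean = y0 + (y1 - y0) * (x - x0) / (x1 - x0)        # piecewise linear
    var = sigma2 * (x - x0) * (x1 - x) / (x1 - x0)      # piecewise quadratic
    return mean, var


print(wiener_conditional(0.25, [0.0, 0.5, 1.0], [1.0, 0.0, 2.0]))
```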

By means of further (more restrictive) assumptions, 
the statistical models, corresponding to the stochas- 
tic functions, may be specified. The one-dimensional 
model corresponding to the Wiener process was intro- 
duced in [3]. However, the specification of a model as 
a stochastic function is not very reasonable: this nor- 
mally involves additional very serious implementation 
difficulties and does not help to choose the model ac- 
cording to the a priori information on the problem. 
Using a statistical model, the algorithm is constructed by maximizing the probability of finding points better than those found during the previous search. Such a strategy is also justified by the natural axioms of rationality of search. In the one-dimensional case the algorithm is easy to implement. In the multidimensional case, an auxiliary optimization problem must be solved [8].
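The one-dimensional rule can be sketched in a few lines. This is a minimal sketch, assuming the Wiener-process model of [3] (piecewise-linear conditional mean, conditional variance proportional to $(x - x_i)(x_{i+1} - x)/(x_{i+1} - x_i)$ inside each interval), a fixed improvement threshold `eps` of our own choosing, and a plain grid search inside each interval in place of an exact maximization; the function names are hypothetical.

```python
import bisect
import math


def next_trial_point(xs, ys, eps=0.1, grid=200):
    """Trial point maximizing P(xi_x < min(ys) - eps) under a Wiener-process model.

    xs must be sorted and distinct, ys[k] = f(xs[k]). Inside [a, b] the conditional
    mean is linear and the conditional variance is proportional to (x-a)(b-x)/(b-a).
    The normal CDF is increasing, so maximizing the standardized score
    (target - mean) / std maximizes the probability; the Wiener variance
    parameter only rescales the score and is therefore omitted.
    """
    target = min(ys) - eps
    best_x, best_score = None, -math.inf
    for k in range(len(xs) - 1):
        a, b, fa, fb = xs[k], xs[k + 1], ys[k], ys[k + 1]
        h = b - a
        for s in range(1, grid):  # interior points only, so the variance is > 0
            x = a + h * s / grid
            mean = fa + (fb - fa) * (x - a) / h
            std = math.sqrt((x - a) * (b - x) / h)
            score = (target - mean) / std
            if score > best_score:
                best_x, best_score = x, score
    return best_x


def p_algorithm(f, lo, hi, n_evals=30, eps=0.1):
    """Minimize f on [lo, hi] using n_evals evaluations; returns (x_best, f_best)."""
    xs, ys = [lo, hi], [f(lo), f(hi)]
    for _ in range(n_evals - 2):
        x = next_trial_point(xs, ys, eps)
        k = bisect.bisect(xs, x)  # insertion index that keeps xs sorted
        xs.insert(k, x)
        ys.insert(k, f(x))
    k_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[k_best], ys[k_best]
```

For example, `p_algorithm(lambda x: (x - 0.3) ** 2, 0.0, 1.0)` places its trial points increasingly densely around $x = 0.3$ while still sampling the rest of the interval; a larger `eps` makes the search more global.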


Although the algorithm is based on a statistical model, it is described without the use of randomization. Therefore it may be investigated by the usual deterministic methods; e.g., the convergence of the algorithm is proved under weak assumptions on the underlying statistical model (continuity of $m(x|\cdot)$ and $\sigma^2(x|\cdot)$, and weak dependence of both characteristics at a point $x$ on $(x_i, y_i)$ for relatively remote points $x_i$ [8]).

The models and algorithms of this approach are well 
grounded theoretically because they are derived from 
natural assumptions on rational behavior of an opti- 
mizer. As a topic for further research, the theory of av-
erage complexity seems very promising. It would be
important to evaluate the complexity of practically ef- 
ficient algorithms constructed by the approach as well 
as to obtain general bounds and compare them with 
those obtained for Lipschitzian algorithms. The first re- 
sults in this direction are interesting even for the one- 
dimensional case: the limit distribution of error of pas- 
sive random search in case of the Wiener model exists 
or does not exist depending on a subtle interpretation of 
the model [2]. Other important theoretical topics are: 
developing dual (global-local) models for the multidi- 
mensional case, and justification of multidimensional 
statistical models oriented towards algorithms of the 
branch and bound type (cf. also » Integer program- 
ming: Branch and bound methods), whose auxiliary 
computations would be essentially less time consum- 
ing than maximization of the probability over the whole 
feasible region at each iteration. 

Many algorithms were constructed using different 
statistical models and more or less theoretically justi- 
fied ideas. For example, a Bayesian algorithm (cf. also 
> Bayesian global optimization) is defined by mini- 
mizing the average error with respect to the stochas- 
tic function chosen for a model [5]. In the interpolation approach, the next value of the objective function is calculated at the minimum point of $m(\cdot|(x_i, y_i))$ [1,6].
For the information-statistical method, an ad hoc one- 
dimensional model is constructed [1,7]. The algorithms 
may be generalized for the case with ‘noisy’ functions, 
see for example the algorithm in [8,10]. 

The known results from the theory of stochastic 
functions as well as axiomatic construction of statis- 
tical models do not give numerically tractable models 
which are completely adequate to describe local and 
global properties of a typical global optimization problem [1]. But in the framework of statistical models the adequacy, e.g., to local properties of the objective function, might be tested as a statistical hypothesis. If the
statistical model is locally inadequate in a subset of the 
feasible region, then the objective function is assumed 
unimodal in this subset and a local minimum of f(x) 
may be found by a local technique. An example of the 
combination of global and local search with a stopping 
rule corresponding to a high probability of finding the 
global minimum is presented in [9]. 

In the case of one-dimensional global optimization 
there are many competing algorithms including algo- 
rithms based on statistical models [8]. The algorithms 
representing different approaches may be compared 
with sufficient reliability by means of experimental test- 
ing. Since codes in the one-dimensional case are very precise realizations of the theoretical algorithms, the influence of implementation specifics is insignificant (at least compared with multidimensional cases), and the comparison results may be generalized from codes to the corresponding approaches. The results in [8] show that the algorithm from [9] and its modification [8] outperform algorithms based on Lipschitzian type models even if a good estimate of the Lipschitz constant
is available. The comparison of multidimensional al- 
gorithms is methodologically more difficult, partly be- 
cause of very different stopping conditions. But gener- 
ally speaking, the algorithms based on statistical models 
are efficient with respect to the number of evaluations 
of the objective function for the multimodal functions 
up to 10-15 variables [8]. The auxiliary computations 
require much computing time and computer memory. 
Therefore, it is rational to use such algorithms for problems whose objective function is expensive to evaluate. If an objective function is cheap to evaluate, the
gain obtained from a low number of function evalua- 
tions may be less than the loss caused by the auxiliary 
computations. 

A detailed review of the subject is presented in [8]; 
further references may be found in [1]. 
Batch processes are a popular method for manufacturing products that are made in low volumes or that require several complicated steps in the synthesis procedure. The growth
the demand for efficient batch plants. Batch processes 
are especially attractive due to their inherent flexibil- 
ity. They can accommodate a wide range of production 
requirements. Batch equipment can be reconfigured to 
produce more than one product. Finally, certain pieces 
of equipment in batch processes can be used for more 
than one task. 

An important area of concern in the design of batch 
processes is their ability to accommodate changes in
production requirements and processing parameters.
The key issue is: given some degree of uncertainty in 
a) the future demand for the products and b) the pa- 
rameters that describe the chemical and physical steps 
involved in the process, what is the appropriate amount 
of flexibility the process should possess so as to main- 
tain feasible operation while maximizing profits? 

Many methods have been proposed for the design of 
batch plants under known market conditions and nom- 
inal operating conditions. Two major classes of batch 
plant designs are multiproduct plants and multipurpose 
plants. In the multiproduct plant, all products follow 
the same sequence of processing steps. Typically, one 
product is produced at a time in what is termed a single- 
product campaign (SPC). Multipurpose batch plants al- 
low products to be processed using different sequences 
of equipment, and in some cases products can be pro- 
duced simultaneously. 

While significant progress has been made in the de- 
sign and scheduling of batch plants, until recently the 
issues of flexibility and design under uncertainty have 
received little attention. Among the first to address the 
problem of batch plant design under uncertainty in 
a novel way were [10] and [8]. They divided the vari-
ables in the design problem into five categories: struc- 
tural, design, state, operating, and uncertain. Structural 
variables describe the interconnections of the equip- 
ment in the plant. Design variables describe the size 
of the process equipment and are fixed once the plant 
is constructed. State variables are dependent variables 
and are determined once the design and operating vari- 
ables are specified. Operating variables are those whose 
values can be changed in response to variations in the 
uncertain variables. Finally, the uncertain parameters 
are the quantities that can have random values which 
can be described by a probability distribution. Usu- 
ally the uncertain parameters have normal distributions 
and are considered to be independent of each other. 
[8] also introduced the distinction between variations 
which have short-term effects and those with long-term 
effects. [18] extended this idea, suggesting a distinction 
between ‘hard’ and ‘soft’ constraints in which the for- 
mer must be satisfied for feasible plant operation, but 
the latter may be violated, subject to a penalty in the ob- 
jective function. They considered the time required to 
produce a product as uncertain and developed a prob- 
lem formulation. 
In [12] and [13], the authors addressed the prob-
lem of multiproduct batch plant design with uncertain- 
ties in both demand for the products and in technical 
parameters such as processing times and size factors. 
They restricted their designs to one piece of equipment 
per stage. [3] presented several variations on the prob- 
lem of design with uncertain demands. They used inter- 
val methods to develop different solution procedures, 
including a two-stage approach and a penalty function 
approach. Another type of batch plant is the multipur- 
pose plant. [14] proposed a scenario-based approach for 
the design of multipurpose batch plants with uncertain 
production requirements. The multipurpose approach 
resulted in a large scale MILP model for which efficient 
techniques for obtaining good upper and lower bounds 
were proposed. [15] developed a model for the multi- 
product batch design problem which takes into account 
uncertainties in the product demands and in equipment 
availability. They considered the problem of design 
feasibility separately from the maximization of profits 
and presented an approach for achieving both criteria. 
[16] addressed the problem of uncertain demands, and 
used a scenario-based approach with discrete proba- 
bility distributions for the demands. In addition, they 
considered the scheduling problem as a second stage, 
following the design problem. [6] and [7] considered
the multiproduct batch plant design problem based on 
a stochastic programming formulation. They developed 
a relaxation of the production feasibility requirement 
and added a penalty term to the objective function to 
account for partial feasibility. Through this analysis, the 
problem can be reformulated as a single large scale non- 
convex optimization problem. [2] extended this work 
to the design of multipurpose batch plants and imple- 
mented an efficient Gaussian quadrature technique to 
improve the estimation of the expected profit. [5] iden- 
tified special structures in the nonconvex constraints 
for multiproduct and multipurpose batch design for- 
mulations. These properties can be exploited to obtain 
tight bounds on the global solution. This allows very 
large scale design problems to be solved in reasonable 
CPU time using the αBB method of [1].


Conceptual Framework 


Most batch design problems are variations on the same 
basic model of a batch plant. The plant consists of M processing stages, where each stage $j$ contains $N_j$ identical pieces of equipment. The volume of each unit, $V_j$, is a design variable, and the number of units per stage, $N_j$, may be a variable or a fixed parameter.

In the batch plant, NP products are to be made, and the amount of product $i$ produced is $Q_i$. Each product is produced in a number of batches of identical size, $B_i$. Using these definitions, a number of constraints on the design of the plant can be imposed. These constraints are:

1) an upper limit on the batch size, 

2) a lower limit on the amount of time between 
batches, 

3) an upper limit on the total processing time allowed, 
and 

4) a constraint on the production related to the de- 
mand for each product. The basic form of these con- 
straints is shown below, for a multiproduct batch 
plant with single-product campaigns. 


Constraints on Batch Size 


The batch size for each product i cannot be larger than 
the size of the pieces of equipment in each stage j. This 
can be written 


$$B_i \le \frac{V_j}{S_{ij}}, \qquad i = 1,\dots,NP,\quad j = 1,\dots,M.$$

The size factor, $S_{ij}$, is the capacity required in stage $j$ to process one unit of product $i$.
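Since the constraint must hold in every stage, the largest admissible batch of product $i$ is set by the most restrictive stage, $B_i = \min_j V_j / S_{ij}$. A minimal sketch (function name and numbers are hypothetical):

```python
def max_batch_sizes(V, S):
    """B_i = min over stages j of V_j / S_ij (largest batch every stage can hold).

    V: unit volumes V_j for the M stages.
    S: size factors, S[i][j] = capacity needed in stage j per unit of product i.
    """
    return [min(v / s for v, s in zip(V, row)) for row in S]
```

With stage volumes `[1000.0, 2000.0]` and size factors `[[2.0, 5.0], [4.0, 2.0]]`, product 1 is limited by stage 2 and product 2 by stage 1, giving `[400.0, 250.0]`.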


Minimum Cycle Time 


In order to make sure that each batch is processed sep- 
arately in a given stage, one batch cannot begin pro- 
cessing until the previous batch has been processed for 
a certain amount of time. This is called the cycle time, $T_{Li}$:

$$T_{Li} \ge \frac{t_{ij}}{N_j}, \qquad i = 1,\dots,NP,\quad j = 1,\dots,M.$$

The time factor, $t_{ij}$, is the amount of time to process one batch of product $i$ in stage $j$.
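Analogously, the smallest cycle time compatible with all stages is the slowest stage, $T_{Li} = \max_j t_{ij}/N_j$. A minimal sketch (hypothetical names and data):

```python
def limiting_cycle_times(t, N):
    """T_Li = max over stages j of t_ij / N_j (smallest feasible cycle time).

    t: time factors, t[i][j] = time to process one batch of product i in stage j.
    N: number of identical units in each stage.
    """
    return [max(tij / nj for tij, nj in zip(row, N)) for row in t]
```

Duplicating units in a stage ($N_j = 2$) halves that stage's contribution, which is why adding parallel units can relieve a bottleneck stage.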
Constraints on Production Time


The amount of time needed to produce all of the batches must be less than the total time available, $H$:

$$\sum_{i=1}^{NP} \frac{Q_i}{B_i}\, T_{Li} \le H.$$
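Assuming the single-product-campaign form $\sum_i (Q_i/B_i)\,T_{Li} \le H$ (each product needs $Q_i/B_i$ batches, one every $T_{Li}$; the same form reappears in exponential variables in problem (2) below), a candidate design could be checked as follows. Names and numbers are hypothetical.

```python
def production_time(Q, B, TL):
    """Total processing time sum_i (Q_i / B_i) * T_Li for single-product campaigns."""
    return sum(q / b * tl for q, b, tl in zip(Q, B, TL))


def meets_horizon(Q, B, TL, H):
    """True if all batches fit into the available time horizon H."""
    return production_time(Q, B, TL) <= H
```

For instance, 40,000 and 20,000 units in batches of 400 and 250 with cycle times 4 and 5 need $100 \cdot 4 + 80 \cdot 5 = 800$ time units.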


Demand Constraints 


The production for each product must meet the de- 
mand. 


Economic Objective Function 


The objective is to maximize profits. The profit is calcu- 
lated by subtracting the annualized capital costs from 
the revenues: 


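$$\text{Profit} = \sum_{i=1}^{NP} Q_i\, p_i - \sum_{j=1}^{M} \alpha_j N_j V_j^{\beta_j},$$

where $p_i$ is the price of product $i$. The annualization factor for the cost of the units in stage $j$ is $\alpha_j$, and $\beta_j$ is the corresponding cost exponent.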
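Evaluated for a fixed design, the objective is just revenue minus annualized capital cost. A minimal sketch, assuming the power-law capital cost $\alpha_j N_j V_j^{\beta_j}$ (the same form as $\alpha_j N_j \exp(\beta_j v_j)$ in problem (2) below); the numbers are hypothetical.

```python
def profit(Q, p, alpha, N, V, beta):
    """sum_i Q_i p_i - sum_j alpha_j N_j V_j**beta_j."""
    revenue = sum(qi * pi for qi, pi in zip(Q, p))
    capital = sum(a * n * v ** b for a, n, v, b in zip(alpha, N, V, beta))
    return revenue - capital
```

With a cost exponent below 1, one large unit is cheaper than two small ones of the same total volume, which is part of what makes the design trade-off nontrivial.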

In the case where the number of units per stage, $N_j$, is variable and/or the unit sizes, $V_j$, take only discrete values, this problem is a mixed integer nonlinear optimization problem (MINLP). If $N_j$ is fixed and the unit sizes are continuous, the problem is a nonlinear program (NLP). In either case, the problem is nonconvex; therefore conventional mixed integer and nonlinear solvers cannot be relied on to find the global optimum. Instead, global optimization techniques must be employed to guarantee that the optimal solution is located.


Sources of Uncertainty 


Within the mathematical framework for a multiprod- 
uct batch plant there are a number of possible sources 
of uncertainty. The most commonly studied are uncertainty in the process parameters, like the size factors, $S_{ij}$, and the time factors, $t_{ij}$, and uncertainty in the product demand, $D_i$. In addition to these, [3] considered uncertainty in the time horizon, $H$, and in the product prices, $p_i$.

Uncertainty in the process parameters is model-inherent uncertainty, as classified by [11]. That is, uncertainty in the process parameters affects the feasible operation of the batch plant. Conversely, uncertainty in the product demand is an external source of uncertainty; therefore it only affects the objective function, and not the feasibility of the plant design.


Uncertainty in Process Parameters 


The size factors and processing times affect the feasi- 
ble design and operation of the batch plant. The goal is 
to design a plant that can operate feasibly, even if there 
is some uncertainty in the values of these parameters. 
The approach that is commonly followed is to consider 
a number of different scenarios, where each scenario 
corresponds to a set of parameter realizations. For example, if the size factors, $S_{ij}$, have some nominal value, $\bar{S}_{ij}$, then one scenario is that all of the size factors are at their nominal value. Similarly, if we have some knowledge about the amount of uncertainty in the size factors, we can construct a lower extreme scenario, where each size factor is at its lower bound, $S_{ij}^L$, and an upper extreme scenario, where each is at its upper bound, $S_{ij}^U$. The new set of size factors, reflecting the different scenarios, is represented by the parameter $S_{ij}^p$. The scenarios can be weighted using the factor $w^p$.

The set of constraints for the batch design problem must be modified so that the design is feasible over the whole set of scenarios, $P$:

$$V_j \ge S_{ij}^p B_i, \qquad T_{Li}^p \ge \frac{t_{ij}^p}{N_j}, \qquad \sum_{i=1}^{NP} \frac{Q_i^p}{B_i}\, T_{Li}^p \le H, \qquad \forall p \in P.$$
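A minimal sketch of checking one fixed design against a finite scenario set, assuming the three scenario constraints take the forms $V_j \ge S_{ij}^p B_i$, $T_{Li}^p \ge t_{ij}^p/N_j$ and $\sum_i (Q_i^p/B_i)\,T_{Li}^p \le H$. The data layout (nested lists indexed `[p][i][j]`) and the function name are our own choices.

```python
def feasible_over_scenarios(V, N, B, Q, S, t, H):
    """Check a fixed design (V, N, B) against every parameter scenario p.

    S[p][i][j], t[p][i][j]: size and time factors in scenario p.
    Q[p][i]: production of product i in scenario p.
    """
    for Sp, tp, Qp in zip(S, t, Q):
        # batch-size constraint V_j >= S_ij^p * B_i for every product and stage
        if any(v < s * b for b, row in zip(B, Sp) for v, s in zip(V, row)):
            return False
        # limiting cycle time of each product in this scenario
        TL = [max(tij / nj for tij, nj in zip(row, N)) for row in tp]
        # time horizon constraint
        if sum(q / b * tl for q, b, tl in zip(Qp, B, TL)) > H:
            return False
    return True
```

A design fixed at nominal values often fails the upper extreme scenario; in a design study the unit volumes would be enlarged until every scenario passes, which is what the scenario constraints enforce inside the optimization.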


Uncertainty in Product Demand 


Uncertainty in the demand for the products affects the 
profitability of the plant. In this case, the product de- 
mand is given by a probability distribution function $J(\theta_i)$, where $\theta_i$ represents the uncertain demand for product $i$. The calculation of the expected revenues requires the integration over an optimization problem:

$$E_{\theta}\Big[\max_{Q} \sum_{i=1}^{NP} p_i Q_i\Big] = \int_{\theta} \max_{Q} \sum_{i=1}^{NP} p_i Q_i\, J(\theta)\, d\theta. \qquad (1)$$
The integration should be performed over the feasible region of the plant, which is unknown at the design stage. See [6] for a Gaussian quadrature approach to discretize the integration. The range of uncertain demands is covered by a grid, where each point on the grid represents a set of demand realizations and is assigned a weight corresponding to its probability, $w^q J^q$. The set of quadrature points is represented by $Q$. The expected revenues are now calculated as a multiple summation:

$$E_{\theta}\Big[\sum_{i=1}^{NP} p_i Q_i\Big] \approx \sum_{p=1}^{P} w^p \sum_{q=1}^{Q} w^q J^q \sum_{i=1}^{NP} p_i Q_i^{pq}.$$

In addition, the time horizon constraint must be modified:

$$\sum_{i=1}^{NP} \frac{Q_i^{pq}}{B_i}\, T_{Li}^p \le H, \qquad \forall p \in P,\ \forall q \in Q.$$


Global Optimization Approaches 


The set of constraints for the design of a multiprod-
uct batch plant under uncertainty forms a nonconvex
optimization problem. Global optimization techniques 
must be used in order to ensure that the true optimal 
design is located. 

Following the analysis of [9], an exponential trans- 
formation can be applied, reducing the number of non- 
linear terms in the model. 


$$V_j = \exp(v_j),\ \forall j \in M, \qquad B_i = \exp(b_i),\ \forall i \in NP, \qquad T_{Li}^p = \exp(t_{Li}^p),\ \forall i \in NP.$$
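To see why the transformation helps, substitute it into the nominal constraints and cost (scenario superscripts dropped for brevity). Because $\ln$ is strictly increasing, each inequality is equivalent to its transformed version:

```latex
\begin{aligned}
B_i \le \frac{V_j}{S_{ij}}
  &\iff b_i \le v_j - \ln S_{ij}
  \iff v_j \ge \ln S_{ij} + b_i
  && \text{(linear)}\\
T_{Li} \ge \frac{t_{ij}}{N_j}
  &\iff t_{Li} \ge \ln t_{ij} - \ln N_j
  && \text{(linear for fixed } N_j)\\
\alpha_j N_j V_j^{\beta_j}
  &= \alpha_j N_j \exp(\beta_j v_j)
  && \text{(convex in } v_j)\\
\sum_{i=1}^{NP} \frac{Q_i}{B_i}\, T_{Li} \le H
  &\iff \sum_{i=1}^{NP} Q_i \exp(t_{Li} - b_i) \le H
  && \text{(nonconvex when } Q_i \text{ varies)}
\end{aligned}
```

Only the last line still couples a variable $Q_i$ with an exponential of the other variables, which is the single nonconvex constraint left in problem (2).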


In [5] and [6] global optimization methods were de- 
veloped to solve this problem, where the number of 
units in each stage, $N_j$, is fixed. In this case, the cycle time becomes a parameter, determined by

$$t_{Li}^p = \max_{j} \ln\Big(\frac{t_{ij}^p}{N_j}\Big), \qquad \forall i \in NP,\ \forall p \in P.$$

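The nonlinear optimization problem to be solved is written as a minimization:

$$\begin{aligned}
\min_{b_i,\, v_j,\, Q_i^{pq}} \quad & \delta \sum_{j=1}^{M} \alpha_j N_j \exp(\beta_j v_j) - \sum_{p=1}^{P} w^p \sum_{q=1}^{Q} w^q J^q \sum_{i=1}^{NP} p_i Q_i^{pq} \\
& + \gamma \sum_{p=1}^{P} w^p \sum_{q=1}^{Q} w^q J^q \sum_{i=1}^{NP} p_i \big(\theta_i^q - Q_i^{pq}\big) \\
\text{s.t.} \quad & v_j \ge \ln(S_{ij}^p) + b_i, && \forall i, j, p,\\
& \sum_{i=1}^{NP} Q_i^{pq} \exp(t_{Li}^p - b_i) \le H, && \forall p, q,\\
& Q_i^{pq} \le \theta_i^q, && \forall i, p, q,\\
& \ln(V_j^L) \le v_j \le \ln(V_j^U), && \forall j,\\
& \min_{j,p} \ln\Big(\frac{V_j^L}{S_{ij}^p}\Big) \le b_i \le \min_{j,p} \ln\Big(\frac{V_j^U}{S_{ij}^p}\Big), && \forall i.
\end{aligned} \qquad (2)$$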


Note that the time horizon constraint is the only nonconvex constraint remaining in the problem formulation. A penalty term is added to the objective function to account for unsatisfied demand; the penalty parameter is $\gamma$.


The GOP Approach 


In [7] and [2] the GOP algorithm of [4,17] has been ap- 
plied to solve design formulations for both multipur- 
pose and multiproduct batch plants. GOP converges to 
the global optimum solution by solving a primal prob- 
lem and a number of relaxed dual problems in each it- 
eration. In [7] it is observed that if the variables in the batch design problem are partitioned so that $y = \{v_j, b_i\}$ and $x = \{Q_i^{pq}\}$, then the problem is convex in $y$ for every fixed $x$, and linear in $x$ for every fixed $y$. This satisfies Condition A) of the GOP algorithm.

A property was developed in [7] that allows the 
number of relaxed duals per iteration to be reduced 
from $2^{NP \cdot P \cdot Q}$ to $2^{NP}$, making the problem computationally tractable.


αBB Approach


The αBB approach of [1] was applied in [5] to solve both multiproduct and multipurpose design formulations. αBB is a branch and bound approach that converges to the global solution by solving a sequence of upper and lower bounding problems. The lower bounding problem is formulated by subtracting a quadratic term, multiplied by the constant $\alpha$, from each of the nonconvex terms, thus convexifying the problem. Often, the size of the $\alpha$ term must be estimated, resulting in poor lower bounds in the first few levels of the branch and bound tree. However, the nonconvex terms in the batch plant design formulation allow the exact value of $\alpha$ to be calculated, resulting in a tight lower bound on the global solution. This technique has been used to find the optimal design for a multiproduct batch plant with 5 products in 6 stages. This corresponds to a nonconvex NLP with 15,636 variables, 3155 constraints, and 15,625 nonconvex terms.
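As an illustration only (this is our own derivation, not necessarily how [5] computes its $\alpha$ values), take a single term $g(Q, b) = Q\,e^{c - b}$, the shape of each summand of the transformed time horizon constraint with $c = t_{Li}^p$ fixed. Its Hessian is $e^{c-b}\begin{pmatrix}0 & -1\\ -1 & Q\end{pmatrix}$, with smallest eigenvalue $e^{c-b}\,(Q - \sqrt{Q^2+4})/2$; this is most negative at the lower bounds of $Q$ and $b$, so $\alpha = \max(0, -\lambda_{\min}/2)$ is available in closed form. Function names are hypothetical.

```python
import math


def alpha_for_term(c, q_lo, b_lo):
    """alpha >= max(0, -lambda_min / 2) for g(Q, b) = Q * exp(c - b) on a box.

    lambda_min(Q, b) = exp(c - b) * (Q - sqrt(Q**2 + 4)) / 2 is increasing in Q
    and its magnitude grows as b decreases, so it is smallest at (q_lo, b_lo).
    """
    lam_min = math.exp(c - b_lo) * (q_lo - math.sqrt(q_lo ** 2 + 4.0)) / 2.0
    return max(0.0, -lam_min / 2.0)


def underestimator(c, box, alpha):
    """Convex alphaBB underestimator L = g - alpha * sum_k (x_k - x_k^L)(x_k^U - x_k)."""
    (q_lo, q_hi), (b_lo, b_hi) = box

    def L(q, b):
        g = q * math.exp(c - b)
        # the subtracted term is >= 0 on the box and adds 2*alpha to each Hessian diagonal
        return g - alpha * ((q - q_lo) * (q_hi - q) + (b - b_lo) * (b_hi - b))

    return L
```

Because the subtracted quadratic vanishes at the box corners and shrinks with the box, the gap between $L$ and $g$ closes as branch and bound subdivides the domain.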


Other Types of Batch Plants 


In addition to the multiproduct batch plant with single- 
product campaign illustrated in the preceding sections, 
there are many other batch plant design formulations 
that can be adapted to consider the issue of uncertainty 
in design. 


Mixed-Product Campaign 


This is another example of a multiproduct batch plant. 
In this case, storage of the intermediate products is al- 
lowed between processing steps. In addition, batches of 
different products can be alternated. This allows a re- 
duction in the total production time. Rather than be- 
ing limited by the largest cycle time for all stages, this 
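method calculates the total production time for each stage:

$$T_j^{pq,\mathrm{tot}} = \sum_{i=1}^{NP} \frac{Q_i^{pq}}{B_i}\, t_{ij}^p, \qquad j = 1,\dots,M,\ p \in P,\ q \in Q.$$

The total time for each stage must be less than the total time allowed:

$$T_j^{pq,\mathrm{tot}} \le H, \qquad j = 1,\dots,M,\ p \in P,\ q \in Q.$$

This can be written

$$\sum_{i=1}^{NP} Q_i^{pq}\, \frac{t_{ij}^p}{B_i} \le H, \qquad j = 1,\dots,M,\ p \in P,\ q \in Q.$$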


Note that this constraint has the same form as the 
time horizon constraint for the single-product cam- 
paign formulation. 
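A small sketch of why mixed-product campaigns can shorten the schedule, assuming one unit per stage, no changeover or transfer times, and $n_i = Q_i/B_i$ batches of product $i$ (names and numbers are hypothetical). Single-product campaigns need $\sum_i n_i \max_j t_{ij}$, while with intermediate storage each stage only needs $\sum_i n_i t_{ij}$, so the horizon is set by the busiest stage.

```python
def spc_time(n, t):
    """Single-product campaigns: sum_i n_i * max_j t_ij."""
    return sum(ni * max(row) for ni, row in zip(n, t))


def mpc_time(n, t):
    """Mixed-product campaigns: busiest stage, max_j sum_i n_i * t_ij."""
    n_stages = len(t[0])
    return max(sum(ni * row[j] for ni, row in zip(n, t)) for j in range(n_stages))
```

With two products of 10 batches each and time factors `[[4, 1], [1, 4]]`, single-product campaigns need 80 time units but mixed campaigns only 50. Since a maximum of sums never exceeds the corresponding sum of maxima, the mixed-product time is never longer.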


Multipurpose Batch Plant-Single Equipment 
Sequence 


In a multipurpose batch plant, the equipment can be used for more than one function; therefore each product may have a different route through the plant. In the single equipment sequence case, there is one distinct route for each product. Production is carried out in a sequence of $L$ campaigns, and more than one product may be produced simultaneously in a campaign $h$. The time needed for each campaign, $C_h$, is based on the maximum cycle time for all products in the campaign,

$$\sum_{h=1}^{L} a_{hi}\, C_h \ge \frac{Q_i^{pq}\, T_{Li}^p}{B_i}, \qquad i = 1,\dots,NP,$$

where

$$a_{hi} = \begin{cases} 1 & \text{if product } i \text{ is allowed in campaign } h,\\ 0 & \text{otherwise.} \end{cases}$$

Finally, the sum of all campaign times must be less than the total time available:

$$\sum_{h=1}^{L} C_h \le H.$$
Multipurpose Batch Plant-Multiple Equipment 
Sequence 


In this case, there are multiple routes through the plant for each product $i$, $PR_i$. The total amount of product $i$ produced is the sum over the production of $i$ in each route:

$$Q_i^{pq} = \sum_{r \in PR_i} Q_r^{pq}.$$

The time for campaign $C_h$ is based on the maximum cycle time for each route in the campaign,

$$\sum_{h=1}^{L} a_{hr}\, C_h \ge \frac{Q_r^{pq}\, T_{Lr}^p}{B_r}, \qquad \forall r.$$
The sum of all campaign times must be less than the total time available,

$$\sum_{h=1}^{L} C_h \le H.$$

Note that in both of the multipurpose batch design formulations shown above, the constraints that are added are either linear or have exactly the same form of nonconvexities as shown for the multiproduct batch design formulation. Therefore, the global optimization techniques discussed in Section “Global Optimization Approaches” are applicable to these problems.


See also 


> αBB Algorithm
> Continuous Global Optimization: Models, Algorithms and Software
> Global Optimization in Generalized Geometric Programming
> Global Optimization Methods for Systems of Nonlinear Equations
> Global Optimization in Phase and Chemical Reaction Equilibrium
> Interval Global Optimization
> MINLP: Branch and Bound Global Optimization Algorithm
> MINLP: Global Optimization with αBB
> Smooth Nonlinear Nonconvex Optimization
Global optimization techniques are still quite unpopular in the astronomical community, in particular among double-star astronomers. Among the reasons for their reticence are a long practice of manual and graphical methods, ‘least squares’ adjustments of a linearized objective function, differential correction, etc.

Unfortunately, this article does not present the state of the art in orbit determination, even if a few astronomers, mostly young ones, try to convince the others that a global minimization step is useful. This article presents a possible way to obtain the orbital parameters of double-lined spectroscopic visual binaries.


Astronomical Problem 


The generic term ‘binary star’ designates two stars that are gravitationally bound together. Since J. Kepler, it has been known that such an interaction leads to an elliptic orbital motion of one star around the other (Kepler’s first law). Kepler’s third law tells us that there is a simple relation between the orbital period ($P$), the semimajor axis of the relative orbit ($a$) and the mass sum of the two stars ($M_A$, the mass of the brighter star, and $M_B$, the mass of the fainter component):

$$\frac{a^3}{P^2} = M_A + M_B,$$

where $a$ is expressed in astronomical units (1 A.U. is equal to the average distance of the Earth from the Sun), $P$ is expressed in years and the masses in solar masses ($M_\odot$). This relation is still, almost 400 years after Kepler, the only direct and hypothesis-free method to estimate stellar masses.

A visual binary corresponds to a situation where the two stars are visually resolved and the orbital motion, projected on the plane orthogonal to the line of sight, can be perceived. From the relative positions of B with respect to A over time ($t$, $x$ and $y$), one can extract the 7 parameters characterizing the visual orbit. Among these parameters are $P$ and the angular value of $a$ (expressed in seconds of arc). The latter cannot be converted into its linear value in A.U. unless the distance to the binary system is known (or, equivalently, the parallax of the system, $\varpi$, is known).

A binary star is spectroscopic if the motion of its spectral lines is observable. This motion is due to the Doppler effect: all lines issued from one star are shifted toward the blue (red) side of the spectrum when that star is moving toward (away from) the observer. The wavelength shift between the laboratory wavelength, $\lambda_L$, and the observed one, $\lambda_O$, is connected to the radial velocity $V$ through

$$\frac{\lambda_O - \lambda_L}{\lambda_L} = \frac{V}{c},$$

where $c$ stands for the speed of light in vacuum. In a double-lined spectroscopic binary, lines from the two components are seen in the spectrum.
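Solved for $V$, the (non-relativistic, $V \ll c$) Doppler relation is a one-liner. In this minimal sketch the Hα laboratory wavelength of 656.28 nm is used only as an illustrative value, and the function name is hypothetical.

```python
C_KM_S = 299_792.458  # speed of light in vacuum, km/s


def radial_velocity(lambda_obs, lambda_lab):
    """V = c * (lambda_obs - lambda_lab) / lambda_lab in km/s; positive = receding (redshift)."""
    return C_KM_S * (lambda_obs - lambda_lab) / lambda_lab
```

A shift of only 0.02 nm at 656.28 nm already corresponds to about 9.1 km/s, which shows why high spectral resolution is needed to measure orbital radial velocities.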

The radial velocity curve ($(t, V_A)$, $(t, V_B)$) of each component shows a periodic variation over time. Let $K_A$ designate the amplitude of the radial velocity curve of component A and $K_B$ the amplitude of component B. There is a relation between the mass ratio and the $K$ values:

$$\frac{K_A}{K_B} = \frac{M_B}{M_A}.$$

The amplitudes are usually expressed in km/s. Hence, if a binary star is simultaneously visual and double-lined spectroscopic, one can extract the individual masses and the distance to the system with no extra hypothesis.
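The chain of reasoning can be made concrete. A minimal sketch, assuming the parameters $a''$, $\varpi$, $P$ and $\kappa = a_A/(a_A + a_B)$ defined in the list below: the parallax converts the angular semimajor axis into A.U., Kepler’s third law gives the mass sum, and since $M_A a_A = M_B a_B$, the fraction $\kappa$ equals $M_B/(M_A + M_B)$. The function name is hypothetical.

```python
def binary_masses(a_arcsec, parallax_arcsec, period_yr, kappa):
    """Return (M_A, M_B) in solar masses.

    a[AU] = a'' / parallax''; Kepler III gives M_A + M_B = a**3 / P**2 (AU, yr);
    M_A * a_A = M_B * a_B, hence M_B / (M_A + M_B) = a_A / (a_A + a_B) = kappa.
    """
    a_au = a_arcsec / parallax_arcsec
    total = a_au ** 3 / period_yr ** 2
    m_b = kappa * total
    return total - m_b, m_b
```

Plugging in the HIP111170 values listed later in this article ($a'' = 0.072$, $\varpi = 0.038$, $P = 1.7255$ yr, $\kappa = 0.349$) gives roughly 1.49 and 0.80 $M_\odot$.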


Objective Function 


Describing the observations of a double-lined spectroscopic visual binary requires at least 10 parameters. By observations, one means the relative positions of the fainter component with respect to the brighter star and the radial velocities of both components. Why more than 10 parameters could be necessary is beyond the scope of this article. Among the different possible sets of 10 parameters, we select:
• $a''$: the angular semimajor axis of the relative orbit of the fainter component around the brighter star;
• $i$: the inclination of the orbital plane with respect to the plane orthogonal to the line of sight;
• $\omega$: the argument of the periastron;
• $\Omega$: the longitude of the ascending node;
• $e$: the eccentricity;
• $P$: the period;
• $T$: the periastron epoch (one of them);
• $V_0$: the radial velocity of the system’s center of mass;
• $\varpi$: the parallax of the system;
• $\kappa$: the ratio of the semimajor axis (relative to the brighter component) to the sum of the two semimajor axes.


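The most natural way to combine visual and spectroscopic observations is to use a least squares approach and to seek the minimum of an expression like:

$$D(a'', i, \dots, \varpi, \kappa) = \sum_{j=1}^{N} \left[ \left(\frac{\hat{x}_j - x_j^{o}}{\sigma_{x_j}}\right)^2 + \left(\frac{\hat{y}_j - y_j^{o}}{\sigma_{y_j}}\right)^2 \right] + \sum_{k=1}^{N_{sp,A}} \left(\frac{\hat{V}_{A_k} - V_{A_k}^{o}}{\sigma_{V_{A_k}}}\right)^2 + \sum_{l=1}^{N_{sp,B}} \left(\frac{\hat{V}_{B_l} - V_{B_l}^{o}}{\sigma_{V_{B_l}}}\right)^2, \qquad (1)$$

where the hat (superscript $o$) stands for the adjusted (observed) quantity and the $\sigma$ are the a priori known (or estimated) standard deviations of the observations.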
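Once model values are available, evaluating equation (1) is straightforward. A minimal sketch; the tuple layout and function name are our own choices:

```python
def chi2_objective(visual, rv_a, rv_b):
    """D = sum of squared, sigma-weighted residuals of all observations.

    visual: iterable of (x_obs, x_model, sx, y_obs, y_model, sy) tuples
    rv_a, rv_b: iterables of (v_obs, v_model, sv) tuples for components A and B
    """
    d = sum(((xo - xm) / sx) ** 2 + ((yo - ym) / sy) ** 2
            for xo, xm, sx, yo, ym, sy in visual)
    d += sum(((vo - vm) / sv) ** 2 for vo, vm, sv in rv_a)
    d += sum(((vo - vm) / sv) ** 2 for vo, vm, sv in rv_b)
    return d
```

Dividing by the standard deviations puts arcsecond and km/s residuals on a common, dimensionless scale, which is what allows the two kinds of observations to be combined in one sum.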

In fact, this idea of combining the two aspects of the orbit is still unusual. Most of the time, astronomers keep them separate when computing the orbital parameters. Visual observers compute their own orbit and spectroscopists theirs: one group simply fixes some parameters ($\omega$, $e$, $P$ and $T$) to the values obtained by the other group (e.g., [5]). Only a few papers present a simultaneous adjustment of the ten parameters (e.g., [12,18]).

The reader could be puzzled by the fact that the expression of $D$ seems too simple to have numerous local minima and to require a global optimization method to be minimized. A description of how $x$, $y$, $V_A$ and $V_B$ are computed will justify our approach.

The visual orbit requires

$$x = AX + FY, \qquad y = BX + GY, \qquad X = \cos E - e, \qquad Y = \sqrt{1 - e^2}\,\sin E,$$

where $X$ and $Y$ ($x$ and $y$) are the angular rectangular coordinates, in the orbital (tangential) plane, of the fainter component with respect to the brighter one; $A$, $B$, $F$ and $G$ are the Thiele-Innes constants, expressed in terms of $a''$, $i$, $\omega$ and $\Omega$ as

$$\begin{aligned}
A &= a''(\cos\omega\cos\Omega - \sin\omega\sin\Omega\cos i),\\
B &= a''(\cos\omega\sin\Omega + \sin\omega\cos\Omega\cos i),\\
F &= a''(-\sin\omega\cos\Omega - \cos\omega\sin\Omega\cos i),\\
G &= a''(-\sin\omega\sin\Omega + \cos\omega\cos\Omega\cos i).
\end{aligned}$$

$E$ is the eccentric anomaly at time $t$, determined unambiguously by Kepler’s equation

$$E - e\sin E = \frac{2\pi}{P}(t - T).$$
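A minimal sketch of the visual-orbit model: Kepler’s equation is solved by Newton’s method (our choice of solver; any robust root finder would do), then the Thiele-Innes constants map the orbital-plane coordinates to the sky. Angles are in radians, $t$, $T$ and $P$ in years; the function names are hypothetical.

```python
import math


def solve_kepler(M, e, tol=1e-12, max_iter=50):
    """Solve E - e*sin(E) = M for the eccentric anomaly E (0 <= e < 1) by Newton's method."""
    E = M if e < 0.8 else math.pi  # common starting guesses
    for _ in range(max_iter):
        step = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E


def visual_position(t, a, i, omega, Omega, e, P, T):
    """Relative position (x, y) of the fainter star, in the units of a (arcsec)."""
    # Thiele-Innes constants
    A = a * (math.cos(omega) * math.cos(Omega) - math.sin(omega) * math.sin(Omega) * math.cos(i))
    B = a * (math.cos(omega) * math.sin(Omega) + math.sin(omega) * math.cos(Omega) * math.cos(i))
    F = a * (-math.sin(omega) * math.cos(Omega) - math.cos(omega) * math.sin(Omega) * math.cos(i))
    G = a * (-math.sin(omega) * math.sin(Omega) + math.cos(omega) * math.cos(Omega) * math.cos(i))
    M = (2.0 * math.pi * (t - T) / P) % (2.0 * math.pi)  # mean anomaly in [0, 2*pi)
    E = solve_kepler(M, e)
    X = math.cos(E) - e
    Y = math.sqrt(1.0 - e * e) * math.sin(E)
    return A * X + F * Y, B * X + G * Y
```

For a face-on orbit ($i = 0$) the separation is $a(1-e)$ at periastron and $a(1+e)$ half a period later, a quick sanity check. Note how $\omega$ and $\Omega$ enter only through products of sines and cosines, one source of the many local minima of $D$.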
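For a spectroscopic orbit $j$ ($j = A$ or $j = B$), one needs ($v$ is the true anomaly)

$$\begin{aligned}
V_A &= V_0 - K_A\big(\cos(\omega + v) + e\cos\omega\big),\\
V_B &= V_0 + K_B\big(\cos(\omega + v) + e\cos\omega\big),\\
K_j &= \frac{2\pi\, a_j^{(\mathrm{km})} \sin i}{86400 \cdot 365.242198781\, P \sqrt{1 - e^2}},\\
\tan\frac{v}{2} &= \sqrt{\frac{1+e}{1-e}}\,\tan\frac{E}{2}.
\end{aligned}$$

The angular separation in arcseconds is converted into its linear value using

$$a^{(\mathrm{km})} = \frac{a''}{\varpi''}\cdot 1.49598 \cdot 10^{8}, \qquad a_A^{(\mathrm{km})} = \kappa\, a^{(\mathrm{km})}, \qquad a_B^{(\mathrm{km})} = (1 - \kappa)\, a^{(\mathrm{km})}.$$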
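The spectroscopic part can be sketched the same way. This sketch uses the conversion constants quoted above, recomputes the eccentric anomaly with its own small Newton solver so that it is self-contained, and evaluates the half-angle formula for $v$ in a quadrant-safe `atan2` form; names are hypothetical.

```python
import math

AU_KM = 1.49598e8                  # km per astronomical unit, as in the text
YEAR_S = 86400.0 * 365.242198781   # seconds per year, as in the K_j formula


def eccentric_anomaly(M, e, tol=1e-12):
    """Newton iteration for E - e*sin(E) = M."""
    E = M if e < 0.8 else math.pi
    for _ in range(50):
        step = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E


def radial_velocities(t, a, i, omega, e, P, T, V0, parallax, kappa):
    """(V_A, V_B) in km/s; angles in radians, a and parallax in arcsec, t, T, P in years."""
    a_km = a / parallax * AU_KM
    a_A, a_B = kappa * a_km, (1.0 - kappa) * a_km
    denom = YEAR_S * P * math.sqrt(1.0 - e * e)
    K_A = 2.0 * math.pi * a_A * math.sin(i) / denom
    K_B = 2.0 * math.pi * a_B * math.sin(i) / denom
    E = eccentric_anomaly((2.0 * math.pi * (t - T) / P) % (2.0 * math.pi), e)
    # true anomaly: tan(v/2) = sqrt((1+e)/(1-e)) * tan(E/2)
    v = 2.0 * math.atan2(math.sqrt(1.0 + e) * math.sin(E / 2.0),
                         math.sqrt(1.0 - e) * math.cos(E / 2.0))
    shape = math.cos(omega + v) + e * math.cos(omega)
    return V0 - K_A * shape, V0 + K_B * shape
```

Because $K_A/K_B = \kappa/(1-\kappa)$, the combination $(1-\kappa)(V_A - V_0) + \kappa(V_B - V_0)$ vanishes at every epoch, a convenient consistency check on observed data.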
Global Search 


Faced with a low-dimensional but highly nonlinear problem, what can be used to find the minimum of an expression such as $D$ (equation (1))? Simulated annealing (SA) ([8,11]) has already been successfully applied to the determination of the orbital parameters of visual binaries [14]. In that case, only 7 parameters are required, but the nature of the problem seems close enough to the current one that it is tempting to use the same approach. The implementation of SA used for the visual problem gives satisfactory results ([1,15]). Nevertheless, the increase of the working space dimension is, by itself, enough to justify the search for an improved algorithm for the combined spectroscopic-visual problem.
Among the few SA implementations for continuous functions, the one in ‘Numerical Recipes’ [16] was selected. Although the published code behaves very well, some improvements (at least for our purpose) are possible. We focus on modifications of the basic algorithm, mainly some improvements of the guess generator. A rough pseudocode of the algorithm in [16] is given below:


DO
    use a simplex to get a new solution;
    decrease the temperature;
WHILE (temperature > T_min);


Suggested pseudocode after [16]
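For readers who want something executable, the outer loop can be sketched in Python. To keep it short, this sketch swaps in a plain Metropolis move with Gaussian perturbations for the thermally perturbed simplex of [16] and uses geometric cooling with a hypothetical factor `cooling`; it illustrates the DO/WHILE (temperature > T_min) structure, not the Numerical Recipes code.

```python
import math
import random


def simulated_annealing(f, x0, step, t0, t_min, cooling=0.95, moves_per_t=100, seed=0):
    """Minimize f by simulated annealing; returns (best_point, best_value).

    Stand-in generator: Gaussian perturbations of width step[k] per coordinate
    (NOT the thermally perturbed simplex of Numerical Recipes).
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, f_best = list(x), fx
    temp = t0
    while temp > t_min:                       # WHILE (temperature > T_min)
        for _ in range(moves_per_t):          # get new solutions at this temperature
            y = [xi + rng.gauss(0.0, s) for xi, s in zip(x, step)]
            fy = f(y)
            # Metropolis rule: accept downhill moves, uphill ones with prob. exp(-dF/T)
            if fy <= fx or rng.random() < math.exp(-(fy - fx) / temp):
                x, fx = y, fy
                if fx < f_best:
                    best, f_best = list(x), fx
        temp *= cooling                       # decrease the temperature
    return best, f_best
```

Keeping track of the best point ever met, independently of the current state of the chain, is what later allows it to be reused as a vertex when the simplex is reinitialized.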


Recall first that the guess generator proposed in [16] is based on a thermally disturbed simplex [13]. When the temperature approaches 0, the generator reduces to the Nelder-Mead algorithm and local convergence can be expected. W.H. Press et al. claim local convergence, whereas V. Torczon [17] showed that such convergence cannot be guaranteed with the Nelder-Mead algorithm.

The major drawback of this algorithm is that the simplex can degenerate (a vertex becomes a linear combination of strictly fewer than the other $n$ vertices). If that happens, only a subspace of the complete working space can be visited and the risk of missing the minimum rises.

The decision whether to reinitialize the simplex can be based on the mean of the values at the $n+1$ vertices. The mean is compared with the mean at the previous temperature. If the relative change is not large enough, or the generator stops at a local minimum, a new simplex is generated. The best point ever met is chosen as one of the vertices.

A natural way to initialize a simplex is to choose the n remaining vertices such that each edge issued from the (n+1)th vertex is parallel to a different coordinate axis. A refined version of that approach is adopted: the value of each component is still chosen randomly in the interval of accepted values for that component, but some ‘taboo’ restrictions are added.

The overall working space is divided into cells. When a new simplex is generated, each cell containing a vertex is marked as taboo. The random selection of the value of a component is repeated until the resulting cell (C) is not in the taboo list (TL).


Even if the best point found so far does not change between two successive re-initializations, this procedure guarantees that the two simplices are different. This raises the probability of visiting the whole space. In practice, the taboo cells are kept in a circular linked list and discarded when space for a new cell is required. The resulting pseudocode is given below:


DO
    use a simplex to get a new solution;
    IF initialization required
    THEN adopt the best solution as the (n+1)th vertex V_{n+1};
         FOR each of the first n vertices V_i
         DO
             V_i = V_{n+1};
             DO
                 change the ith component of V_i;
                 identify the cell C containing V_i;
             WHILE (C in TL);
             add C to TL;
         OD;
    FI;
    decrease the temperature;
WHILE (temperature > T_min);


Adopted pseudocode 
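The adopted initialization can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the working space is assumed to be a box `[lower, upper]` split into a uniform grid with `cells_per_axis` cells per coordinate, and a `collections.deque` with `maxlen` stands in for the circular linked list of taboo cells. The names `cell_of` and `reinit_simplex` are hypothetical.

```python
import random
from collections import deque

def cell_of(point, lower, upper, cells_per_axis):
    """Integer grid coordinates of the cell that contains `point`."""
    idx = []
    for x, lo, hi in zip(point, lower, upper):
        k = int((x - lo) / (hi - lo) * cells_per_axis)
        idx.append(min(max(k, 0), cells_per_axis - 1))   # keep x == hi in the last cell
    return tuple(idx)

def reinit_simplex(best, lower, upper, taboo, cells_per_axis=10,
                   max_tries=1000, rng=random):
    """Rebuild the simplex around `best`, which becomes the (n+1)th vertex.

    Vertex V_i starts as a copy of `best`; its i-th component is redrawn
    uniformly in [lower[i], upper[i]] until its cell is not in `taboo`, and the
    cell is then added to `taboo`. With deque(maxlen=...) the oldest cell is
    dropped automatically, like a circular list. If max_tries is exhausted,
    the last draw is accepted even though its cell is taboo.
    """
    n = len(best)
    simplex = []
    for i in range(n):
        v = list(best)                              # V_i = V_{n+1}
        for _ in range(max_tries):
            v[i] = rng.uniform(lower[i], upper[i])  # change the i-th component
            cell = cell_of(v, lower, upper, cells_per_axis)
            if cell not in taboo:
                break
        taboo.append(cell)                          # add C to TL
        simplex.append(v)
    simplex.append(list(best))
    return simplex

taboo_list = deque(maxlen=200)
new_simplex = reinit_simplex([0.5, 0.5], [0.0, 0.0], [1.0, 1.0], taboo_list,
                             rng=random.Random(1))
```

A bounded deque forgets its oldest entry when a new one is appended to a full list, which mirrors the circular-list behaviour described above.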


Ingber’s algorithm ([6,7]) is used for the annealing schedule. The initial temperature is set to $10^{\langle \log_{10} f \rangle}$, where $\langle \log_{10} f \rangle$ stands for the mean of the logarithm of the objective function f over the vertices of the first generated simplex.
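Since $10^{\langle \log_{10} f \rangle}$ is simply the geometric mean of the objective values at the vertices, the initial temperature can be computed as in the sketch below. The helper name is hypothetical, strictly positive objective values are assumed, and Ingber's schedule itself is not reproduced.

```python
import math

def initial_temperature(vertex_values):
    """T0 = 10 ** <log10 f>, the mean being taken over the first simplex.

    This equals the geometric mean of the (strictly positive) objective values.
    """
    logs = [math.log10(v) for v in vertex_values]
    return 10.0 ** (sum(logs) / len(logs))
```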


Element              Value       Std. dev.
a (″)                0.072       0.0010
i (°)                68          1.3
ω (°)                352         ?
Ω (°)                262.0       0.53
e                    0.38        0.016
P (yr)               1.7255      0.00098
T (Besselian yr)     1979.332    0.0099
V0 (km/s)            ?           0.13
ϖ (″)                0.038       0.0012
K                    0.349       0.0096
mass A (M☉)          1.5         0.18
mass B (M☉)          0.8         0.12

Orbital parameters of HIP111170 and their standard deviations
Example 1 (HIP111170)


The double star HIP111170 (= HR8851 = HD213429) is a good example of a case where a simultaneous adjustment succeeds whereas a disjoint one would fail. The visual observations ([9,10]) are too few to allow a visual orbit determination: 3.5 observations (2 quantities each) are necessary to adjust 7 parameters. Fortunately, the spectroscopic data are more numerous and the two radial velocity curves are well covered. From a mathematical point of view, two visual observations are the minimum if the spectroscopic observations [3] are well spread over the two curves.

Global Optimization in Binary Star Astronomy, Figure 1
Adjusted visual orbit of HIP 111170 (panel "Visual orbit"; axes x (″) and y). The cross represents component A

Global Optimization in Binary Star Astronomy, Figure 2
Adjusted spectroscopic orbits of HIP 111170 (panel "Spectroscopic orbit"; radial velocity (km/s) versus phase)

The table above gives the orbital parameters used for the figures. The obtained parallax is in quite good agreement with the value 0.03918 ± 0.00183″ obtained by the Hipparcos mission [4].


Conclusion 


Even when the observations seem very precise, the objective function describing the residual between the observed and computed data has many local minima. Astronomers should be aware of this fact, as well as of techniques to tackle such situations efficiently.
Introduction


The cutting angle method (CAM) is a deterministic 
method for solving different classes of global optimiza- 
tion problems. It is a version of the generalized cutting 
plane method, and it works by building a sequence of 
tight underestimates of the objective function. The se- 
quence of global minima of the underestimates con- 
verges to the global minimum of the objective func- 
tion. It can also be seen from the perspective of branch- 
and-bound type methods, which iterate the steps of 
branching (partitioning the domain), bounding the ob- 
jective function on the elements of the partition, and 
also fathoming (eliminating those elements of the par- 
tition which cannot contain the global minimum). 

The key element of CAM is the construction of tight 
underestimates of the objective function and their effi- 
cient minimization in a structured optimization prob- 
lem. CAM is based on the theory of abstract convex- 
ity [23], which provides the necessary tools for building 
accurate underestimates of various classes of functions. 
Such underestimates arise from a generalization of the following classical result: each lower semicontinuous convex function is the upper envelope of its affine minorants [21]. In abstract convex analysis, the requirement of linearity of the minorants is dropped, and abstract convex functions are represented as the upper envelopes of some simple minorants, or support functions, which are not necessarily affine. Depending on the choice of the support functions, one obtains different flavours of abstract convex analysis.
By using a subset of support functions, one obtains an approximation of an abstract convex function from below. Such a one-sided approximation, or underestimate, is very useful in optimization, as the global minimum of the underestimate provides a lower bound on
the global minimum of the objective function. One can 
find the global minimum of the objective function as 
the limiting point of the sequence of global minima of 
the underestimates. This is the principle of the cutting 
angle method of global optimization [1,2,23]. 

The cutting angle method was first introduced for 
global minimization of increasing positive homoge- 
neous (IPH) functions over the unit simplex [1,2,23]. 
Then it was extended to a broader class of Lipschitz pro- 
gramming problems [9,25]. In this Chapter, after pro- 
viding the necessary theoretical background, we will de- 
scribe versions of CAM for global minimization of IPH 
and Lipschitz functions over a polytope (in particular 
the unit simplex), and provide details of its algorithmic 
implementation. 


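Definitions
Notation

• $n$ is the dimension of the optimization problem;
• $I = \{1, \dots, n\}$;
• $x_i$ is the $i$th coordinate of a vector $x \in \mathbb{R}^n$;
• $x^k \in \mathbb{R}^n$ denotes the $k$th vector of some sequence $\{x^k\}$;
• $[l, x] = \sum_{i \in I} l_i x_i$ is the inner product of vectors $l$ and $x$;
• if $x, y \in \mathbb{R}^n$, then $x \ge y \iff x_i \ge y_i$ for all $i \in I$;
• if $x, y \in \mathbb{R}^n$, then $x \gg y \iff x_i > y_i$ for all $i \in I$;
• $\mathbb{R}^n_+ := \{x = (x_1, \dots, x_n) \in \mathbb{R}^n : x_i \ge 0 \text{ for all } i \in I\}$ (nonnegative orthant);
• $\mathbb{R}_{+\infty}$ denotes $(-\infty, +\infty]$;
• $e^m = (0, \dots, 0, 1, 0, \dots, 0)$ denotes the $m$th unit vector of the space $\mathbb{R}^n$;
• $S = \{x \in \mathbb{R}^n_+ : \sum_{i \in I} x_i = 1\}$ (unit simplex).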


Abstract Convex Functions 


Let $X \subseteq \mathbb{R}^n$ be some set, and let $H$ be a nonempty set of functions $h: X \to V \subseteq [-\infty, +\infty]$. We have the following definitions [23].


Definition 1 A function $f$ is abstract convex with respect to the set of functions $H$ (or $H$-convex) if there exists $U \subseteq H$ such that
$$f(x) = \sup\{h(x) : h \in U\}, \quad \forall x \in X.$$

Definition 2 The set of all $H$-minorants of $f$ is called the support set of $f$ with respect to the set of functions $H$:
$$\operatorname{supp}(f, H) = \{h \in H : h(x) \le f(x)\ \forall x \in X\}.$$

Definition 3 An $H$-subgradient of $f$ at $x$ is a function $h \in H$ such that
$$f(y) \ge h(y) - (h(x) - f(x)), \quad \forall y \in X.$$
The set of all $H$-subgradients of $f$ at $x$ is called the $H$-subdifferential:
$$\partial_H f(x) = \{h \in H : f(y) \ge h(y) - (h(x) - f(x))\ \forall y \in X\}.$$

Definition 4 The set $\partial'_H f(x)$ at $x$ is defined as
$$\partial'_H f(x) = \{h \in \operatorname{supp}(f, H) : h(x) = f(x)\}.$$


Proposition 1 [23], p.10. If the set $H$ is closed under vertical shifts, i.e. $(h \in H, c \in \mathbb{R})$ implies $h - c \in H$, then $\partial'_H f(x) = \partial_H f(x)$.

When the set of support functions H consists of all 
affine functions, then we obtain the classical convexity. 
Next we examine two other examples of sets of support 
functions H. 


IPH Functions 


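Recall that a function $f$ defined on $\mathbb{R}^n_+$ is increasing if $x \ge y$ implies $f(x) \ge f(y)$.

Definition 5 A function $f: \mathbb{R}^n_+ \to \mathbb{R}$ is called IPH (Increasing Positively Homogeneous function of degree one) if
$$\forall x, y \in \mathbb{R}^n_+:\ x \ge y \implies f(x) \ge f(y),$$
$$\forall x \in \mathbb{R}^n_+,\ \forall \lambda > 0:\ f(\lambda x) = \lambda f(x).$$
Let the set $H_1$ be the set of min-type functions
$$H_1 = \{h : h(x) = \min_{i \in I} a_i x_i,\ a \in \mathbb{R}^n_+,\ x \in \mathbb{R}^n_+\}.$$

Proposition 2 [23] A function $f: \mathbb{R}^n_+ \to \mathbb{R}_{+\infty}$ is abstract convex with respect to $H_1$ if and only if $f$ is IPH.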
Example 1 The following functions are IPH:
1) $f(x) = \sum_{i \in I} a_i x_i$ with $a_i \ge 0$;
2) $p_k(x) = \left(\sum_{i \in I} x_i^k\right)^{1/k}$ $(k > 0)$;
3) $f(x) = \sqrt{[Ax, x]}$, where $A$ is a matrix with nonnegative entries;
4) $f(x) = \prod_{i \in I} x_i^{c_i}$, where $c_i > 0$ and $\sum_{i \in I} c_i = 1$.


It is easy to check that
• the sum of two IPH functions is also an IPH function;
• if $f$ is IPH, then the function $\gamma f$ is IPH for all $\gamma > 0$;
• if $T$ is an arbitrary index set and $(f_t)_{t \in T}$ is a family of IPH functions, then the function $f_{\inf}(x) = \inf_{t \in T} f_t(x)$ is IPH;
• if $(f_t)_{t \in T}$ is the same family and there exists a point $y \gg 0$ such that $\sup_{t \in T} f_t(y) < +\infty$, then the function $f_{\sup}(x) = \sup_{t \in T} f_t(x)$ is finite and IPH.

These properties allow us to give two more exam- 
ples of IPH functions. 


Example 2 The following maxmin functions are IPH:
1) $$f(x) = \max_{k \in K} \min_{j \in J} [a^{kj}, x],$$
where $a^{kj} \in \mathbb{R}^n_+$, $k \in K$, $j \in J$. Here $J$ and $K$ are finite sets of indices;
2) $$f(x) = \max_{k \in K} \min_{j \in J_k} [a^j, x], \qquad (1)$$
where $a^j \in \mathbb{R}^n_+$, $j \in J_k$, $k \in K$. Here $J_k$ and $K$ are finite sets of indices.

Note that an arbitrary piecewise linear function $f$ generated by a collection of linear functions $f^1, \dots, f^m$ can be represented in the form (1) (see [5]); hence an arbitrary piecewise linear function generated by nonnegative vectors is IPH.
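As a quick numerical illustration of Example 2, the sketch below evaluates a maxmin function of form (1) and spot-checks monotonicity and positive homogeneity on random nonnegative data. It is a sanity check on test data of our own choosing, not a proof; the name `maxmin` is hypothetical.

```python
import numpy as np

def maxmin(blocks, x):
    """f(x) = max_k min_{j in J_k} [a^j, x], form (1).

    blocks[k] is a 2-D array whose rows are the nonnegative vectors a^j, j in J_k.
    """
    x = np.asarray(x, dtype=float)
    return max(float(np.min(np.asarray(A, dtype=float) @ x)) for A in blocks)

# Spot-check the two IPH properties on random nonnegative data.
rng = np.random.default_rng(0)
blocks = [rng.random((3, 4)) for _ in range(5)]   # |K| = 5, |J_k| = 3, n = 4
x = rng.random(4)
y = x + rng.random(4)                             # y >= x componentwise
print(maxmin(blocks, y) >= maxmin(blocks, x))     # increasing
print(np.isclose(maxmin(blocks, 2.5 * x), 2.5 * maxmin(blocks, x)))  # homogeneous
```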


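Let $l \in \mathbb{R}^n_+$, $l \ne 0$, and $I(l) = \{i \in I : l_i > 0\}$. We consider the function $x \mapsto \langle l, x \rangle$ defined by the formula $l(x) = \langle l, x \rangle$, where the coupling function $\langle \cdot, \cdot \rangle$ is defined as
$$\langle l, x \rangle = \min_{i \in I(l)} l_i x_i. \qquad (2)$$
This function is called a min-type function generated by the vector $l$. We shall denote this function by the same symbol $l(x)$. Clearly, a min-type function is IPH. It follows from Proposition 2 that:
• A finite function $f$ defined on $\mathbb{R}^n_+$ is IPH if and only if
$$f(x) = \max\{\langle l, x \rangle : l \in \mathbb{R}^n_+,\ \langle l, \cdot \rangle \le f\}; \qquad (3)$$
• Let $f$ be IPH, let $x^0 \in \mathbb{R}^n_+$ be a vector such that $f(x^0) > 0$, and let $l = f(x^0)/x^0$, i.e. $l_i = f(x^0)/x^0_i$ for $i \in I(x^0)$ and $l_i = 0$ otherwise (convention $c/0 = 0$). Then
$$\langle l, x \rangle \le f(x) \quad \text{for all } x \in \mathbb{R}^n_+, \qquad \langle l, x^0 \rangle = f(x^0).$$
The vector $f(x^0)/x^0$ is called the support vector of the function $f$ at the point $x^0$.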
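The support-vector construction can be checked numerically. In this Python sketch (helper names hypothetical), `support_vector` applies the convention $c/0 = 0$ and `min_type` evaluates (2); as a safety assumption beyond the text, which requires $l \ne 0$, `min_type` returns 0 when $I(l)$ is empty. The IPH test function is the Euclidean norm $p_2$ from Example 1.

```python
import numpy as np

def support_vector(f, x0):
    """l = f(x0)/x0 with the convention c/0 = 0, so that I(l) = I(x0)."""
    x0 = np.asarray(x0, dtype=float)
    l = np.zeros_like(x0)
    pos = x0 > 0
    l[pos] = f(x0) / x0[pos]
    return l

def min_type(l, x):
    """<l, x> = min over I(l) of l_i x_i, formula (2); 0.0 if I(l) is empty."""
    l = np.asarray(l, dtype=float)
    x = np.asarray(x, dtype=float)
    pos = l > 0
    return float(np.min(l[pos] * x[pos])) if pos.any() else 0.0

def f(x):
    """p_2 from Example 1 (the Euclidean norm), an IPH function on R^n_+."""
    return float(np.linalg.norm(x))

x0 = np.array([0.2, 0.3, 0.5])
l = support_vector(f, x0)
print(min_type(l, x0), f(x0))   # the two values agree at the support point
```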


Lipschitz Functions 


Definition 6 A function $f: X \to \mathbb{R}$ is called Lipschitz-continuous in $X$ if there exists a number $M > 0$ such that
$$\forall x, y \in X:\ |f(x) - f(y)| \le M \|x - y\|.$$
The smallest such number is called the Lipschitz constant of $f$ in the norm $\|\cdot\|$¹.

Let the set $H_2$ be the set of functions of the form
$$H_2 = \{h : h(x) = a - C\|x - b\|,\ x, b \in \mathbb{R}^n,\ a \in \mathbb{R},\ C \in \mathbb{R}_+\}.$$

Proposition 3 [23] A function $f: \mathbb{R}^n \to \mathbb{R}_{+\infty}$ is $H_2$-convex if and only if $f$ is a lower semicontinuous function. The $H_2$-subdifferential of $f$ is not empty if $f$ is Lipschitz.
There is an interesting relation between IPH functions and Lipschitz functions, which allows one to formulate the problem of minimizing a Lipschitz function over the unit simplex as the problem of minimizing an IPH function restricted to the unit simplex.


Theorem 1 (see [23,25]). Let $f: S \to \mathbb{R}$ be a Lipschitz function and let
$$M = \sup_{x, y \in S,\ x \ne y} \frac{|f(x) - f(y)|}{\|x - y\|_1} \qquad (4)$$
be the least Lipschitz constant of $f$ in the $\|\cdot\|_1$-norm, where $\|x\|_1 = \sum_{i \in I} |x_i|$. Assume that
$$\min_{x \in S} f(x) \ge 2M.$$
Then there exists an IPH function $g: \mathbb{R}^n_+ \to \mathbb{R}$ such that $g(x) = f(x)$ for all $x \in S$.

¹The norm $\|\cdot\|$ can be replaced by any metric or, more generally, any distance function based on a Minkowski gauge. For example, a polyhedral distance $d_P(x, y) = \max\{[x - y, h_i] \mid 1 \le i \le m\}$, where $h_i \in \mathbb{R}^n$, $i = 1, \dots, m$, is the set of vectors that define a finite polyhedron $P = \bigcap_{i=1}^m \{x \mid [x, h_i] \le 1\}$.


Methods 


We consider the problem of global minimization of an $H$-convex function $f$ on a compact convex set $D \subset X$:
$$\text{minimize } f(x) \text{ subject to } x \in D. \qquad (5)$$
We will deal with the two cases mentioned above: $f$ being $H_1$-convex (IPH) and $H_2$-convex (Lipschitz).


Generalized Cutting Plane Method 


A consequence of Propositions 2 and 3 is that we can approximate $H$-convex functions from below using a finite subset of functions from $\operatorname{supp}(f, H)$. Suppose we know the values of the function $f$ at the points $x^k$, $k = 1, \dots, K$. Then the pointwise maximum of the support functions $h^k \in \partial'_H f(x^k)$,
$$H^K(x) = \max_{k = 1, \dots, K} h^k(x), \qquad (6)$$
is a lower approximation, or underestimate, of $f$. We have the following generalization of the classical cutting plane method by Kelley [16].

$K_{\max}$ is the limit on the number of iterations of the algorithm. The problem at Step 2.1 is called the auxiliary, or relaxed, problem. Its efficient solution is the key to the numerical performance of the algorithm. For convex objective functions, $H^K$ is piecewise affine, and the relaxed problem is solved by linear programming. However, when we consider other abstract convex functions, such as IPH or Lipschitz functions, the relaxed problem is not linear, but it has a special structure that leads to its efficient solution.


Step 0. (Initialisation)
  0.1 Set $K = 1$.
  0.2 Choose an arbitrary initial point $x^1 \in D$.
Step 1. (Calculate H-subdifferential)
  1.1 Calculate $h^K \in \partial'_H f(x^K)$.
  1.2 Define $H^K(x) := \max_{k \le K} h^k(x)$ for all $x \in D$.
Step 2. (Minimize $H^K$)
  2.1 Solve the problem: minimize $H^K(x)$ subject to $x \in D$. Let $x^*$ be its solution.
  2.2 Set $K := K + 1$, $x^K := x^*$.
Step 3. (Stopping criterion)
  3.1 If $K \le K_{\max}$ and $f_{\text{best}} - H^{K-1}(x^K) \ge \varepsilon$, where $f_{\text{best}} = \min_{k < K} f(x^k)$, go to Step 1.

Global Optimization: Cutting Angle Method, Algorithm 1
Generalized Cutting Plane Algorithm


Global Minimization of IPH Functions
over Unit Simplex


In this section we present an algorithm for the search for a global minimizer of an IPH function $f$ over the unit simplex $S$, that is, we study the following optimization problem:
$$\text{minimize } f(x) \text{ subject to } x \in S, \qquad (7)$$
where $f$ is an IPH function defined on $\mathbb{R}^n_+$. Note that an IPH function is nonnegative on $\mathbb{R}^n_+$, since $f(x) \ge f(0) = 0$. We assume that $f(x) > 0$ for all $x \in S$. It follows from the positivity of $f$ that the support vector $l = f(x)/x$ at any $x \in S$ satisfies $I(l) = I(x)$.

Since $I(e^m) = \{m\}$, the vector $l = f(e^m)/e^m$ can be represented in the form $l = f(e^m)\, e^m$, and
$$\langle f(e^m)\, e^m, x \rangle = f(e^m)\, x_m.$$

Remark 1 Note that $H^K(x) = \max\left\{H^{K-1}(x),\ \min_{i \in I(l^K)} l^K_i x_i\right\}$, which simplifies the solution of the auxiliary problem at Step 2.1.


The cutting angle method for problem (7), Algorithm 2, reduces the global minimization (7) to a sequence of auxiliary problems. It provides lower and upper estimates of the global minimum $f_*$ of problem (7). Indeed, let $\lambda_K = \min_{x \in S} H^K(x)$ be the value of the auxiliary problem. It follows from (3) that
$$\langle l^k, x \rangle = \min_{i \in I(l^k)} l^k_i x_i \le f(x) \quad \text{for all } x \in S,\ k = 1, \dots, K.$$
Hence $H^K(x) \le f(x)$ for all $x \in S$, and $\lambda_K = \min_{x \in S} H^K(x) \le \min_{x \in S} f(x)$. Thus $\lambda_K$ is a lower estimate of the global minimum $f_*$. Consider the number $\mu_K = \min_{k = 1, \dots, K} f(x^k) =: f_{\text{best}}$. Clearly $\mu_K \ge f_*$, so $\mu_K$ is an upper estimate of $f_*$. It is shown in [23] that $\lambda_K$ is an increasing sequence and $\mu_K - \lambda_K \to 0$ as $K \to +\infty$. Thus we have a stopping criterion, which enables us to obtain an approximate solution with an arbitrary given tolerance.
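To make the roles of $\lambda_K$ and $\mu_K$ concrete, here is a small Python sketch of the cutting angle method for an IPH function on $S$. It is deliberately naive and is not the tree-based method discussed later: the auxiliary problem at Step 2.1 is solved by trying every ordered choice $(k_1, \dots, k_n)$ of support vectors with $l^{k_i}_i > 0$, forming the point (15) of Proposition 7 and keeping the one with the smallest $H^K$. By Propositions 4 and 7 every local minimizer of $H^K$ is among these candidates, so the auxiliary solution is exact, but the cost grows like $K^n$ and the sketch is only meant for $n = 2$ or $3$. Function names and defaults are our own.

```python
import itertools
import numpy as np

def h_max(L, X):
    """H^K(x) = max_k min_{i in I(l^k)} l^k_i x_i, evaluated for each row x of X."""
    prod = np.where(L[None, :, :] > 0, L[None, :, :] * X[:, None, :], np.inf)
    return prod.min(axis=2).max(axis=1)

def solve_auxiliary(L):
    """Exact but brute-force Step 2.1: enumerate all candidate points (15)."""
    K, n = L.shape
    idx = np.array(list(itertools.product(range(K), repeat=n)))   # K**n tuples
    diag = L[idx, np.arange(n)]                # diag[t, i] = l^{k_i}_i
    diag = diag[np.all(diag > 0, axis=1)]
    d = 1.0 / np.sum(1.0 / diag, axis=1)       # d in (15)
    X = d[:, None] / diag                      # candidate points, all in ri(S)
    vals = h_max(L, X)
    j = int(np.argmin(vals))
    return X[j], float(vals[j])

def cam_iph(f, n, max_iter=60, eps=1e-6):
    """Cutting angle method for an IPH f > 0 on S; returns (mu_K, lambda_K)."""
    L = np.diag([f(e) for e in np.eye(n)])     # Step 0: l^m = f(e^m) e^m
    f_best = float(L.diagonal().min())         # upper estimate mu_K
    lam = 0.0                                  # lower estimate lambda_K
    for _ in range(max_iter):
        x, lam = solve_auxiliary(L)            # Step 2.1
        fx = f(x)
        f_best = min(f_best, fx)
        if f_best - lam < eps:                 # Step 3
            break
        L = np.vstack([L, fx / x])             # Step 2.3: x >> 0 by Proposition 4
    return f_best, lam

def norm(x):
    return float(np.sqrt(x @ x))

upper, lower = cam_iph(norm, 2)
print(lower, upper)
```

For $f(x) = \|x\|_2$ (IPH by Example 1) the true minimum over $S$ is $1/\sqrt{n}$, attained at the barycentre, so the printed pair should bracket $1/\sqrt{2}$.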


Global Minimization of Lipschitz Functions 


Method Based on IPH Functions By using Theorem 1, the global minimization of a Lipschitz function over the simplex $S$ can be reduced to the global minimization of a certain IPH function over $S$.

Let $f: S \to \mathbb{R}$ be a Lipschitz function and let
$$c \ge 2M - \min_{x \in S} f(x), \qquad (8)$$
where $M$ is defined by (4). Let $f_1(x) = f(x) + c$. Then $f_1$ has the same Lipschitz constant $M$ as $f$ and $\min_{x \in S} f_1(x) \ge 2M$, so it follows from Theorem 1 that $f_1$ can be extended to an IPH function $g$. The problem
$$\text{minimize } g(x) \text{ subject to } x \in S \qquad (9)$$
is clearly equivalent to the problem
$$\text{minimize } f_1(x) \text{ subject to } x \in S. \qquad (10)$$
Thus we apply the cutting angle method to solve problem (10). Clearly, the functions $f$ and $f_1$ have the same minimizers on the simplex $S$. If the constant $c$ in (8) is known, CAM is applied to the minimization of a Lipschitz function $f$ over $S$ with no modification. If $c$ is unknown, we can assume that $c$ is a sufficiently large number; however, numerical experiments show that CAM is rather sensitive to the choice of $c$, and in particular, when $c$ is very large, the method converges very slowly. In order to estimate $c$, we need to know an upper bound on the least Lipschitz constant $M$ and a lower estimate of the global minimum of $f$.

If the feasible domain is not the unit simplex S but 
a polytope, it can be embedded into S with a simple 


change of variables. Solution to the constrained auxil- 
iary problem in Step 2.1 of the algorithm was investi- 
gated in [8]. 


Direct Method Consider $H_2$-convex functions, which, by Proposition 3, include all Lipschitz functions. Let $d_P$ be a polyhedral distance function. As a consequence of $H_2$-convexity, we can approximate Lipschitz functions from below using underestimates of the form
$$H^K(x) = \max_{k = 1, \dots, K} \left(f(x^k) - C\, d_P(x, x^k)\right), \qquad (11)$$
where $C \ge M$, and $M$ is the Lipschitz constant of $f$ with respect to the distance $d_P$. Then we apply Algorithm 1 to the function $f$ on the feasible domain $D$. The auxiliary problem at Step 2.1 becomes
$$\text{minimize } \max_{k = 1, \dots, K} \left(f(x^k) - C\, d_P(x, x^k)\right) \text{ subject to } x \in D.$$


Step 0. (Initialisation)
  0.1 Take points $x^m = e^m$, $m = 1, \dots, n$. Set $K = n$.
  0.2 Calculate $l^m = f(x^m)/x^m$, $m = 1, \dots, n$.
Step 1. (Calculate H-subdifferential)
  1.1 Define $H^K(x) := \max_{k = 1, \dots, K} \min_{i \in I(l^k)} l^k_i x_i$ for all $x \in S$.
Step 2. (Minimize $H^K$)
  2.1 Solve the problem: minimize $H^K(x)$ subject to $x \in S$. Let $x^*$ be its solution.
  2.2 Set $K := K + 1$, $x^K := x^*$.
  2.3 Compute $l^K = f(x^K)/x^K$.
Step 3. (Stopping criterion)
  3.1 If $K \le K_{\max}$ and $f_{\text{best}} - H^{K-1}(x^K) \ge \varepsilon$, go to Step 1.

Global Optimization: Cutting Angle Method, Algorithm 2

The same considerations about the convergence of the algorithm as those for Algorithm 2 apply. Note that in the univariate case the underestimate $H^K$ in (11) is exactly the same as the saw-tooth underestimate in the Piyavski-Shubert method [20,26] if $d_P$ is symmetric.
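In one dimension the direct method fits in a few lines. The Python sketch below is a plain Piyavski-Shubert loop, i.e. Algorithm 1 with the support functions $f(x^k) - C|x - x^k|$ on an interval $[a, b]$; it assumes that $C$ is at least the Lipschitz constant of $f$, and the function name and default tolerances are our own choices. On each interval between neighbouring sample points the underestimate is the maximum of the two adjacent cones, so its minimum there has a closed form.

```python
import math

def piyavskii_shubert(f, a, b, C, tol=1e-3, max_iter=5000):
    """Minimize a Lipschitz f on [a, b] with the saw-tooth underestimate
    H^K(x) = max_k (f(x_k) - C |x - x_k|), assuming C >= Lipschitz constant.
    Returns (x_best, f_best, lower) with lower <= min f <= f_best."""
    pts = [(a, f(a)), (b, f(b))]          # sample points, kept sorted by x
    f_best = min(pts[0][1], pts[1][1])    # upper estimate mu_K
    lower = -math.inf                     # lower estimate lambda_K
    for _ in range(max_iter):
        lower, x_new, pos = math.inf, None, None
        for j in range(len(pts) - 1):
            (x1, f1), (x2, f2) = pts[j], pts[j + 1]
            # On [x1, x2] H^K is the max of the two neighbouring cones; its
            # minimum is where f1 - C(x - x1) meets f2 - C(x2 - x).
            val = 0.5 * (f1 + f2) - 0.5 * C * (x2 - x1)
            if val < lower:
                lower = val
                x_new = 0.5 * (x1 + x2) + (f1 - f2) / (2.0 * C)
                pos = j + 1
        if f_best - lower < tol:          # stopping criterion
            break
        f_new = f(x_new)
        pts.insert(pos, (x_new, f_new))   # keeps pts sorted: x_new lies in [x1, x2]
        f_best = min(f_best, f_new)
    x_best, f_best = min(pts, key=lambda p: p[1])
    return x_best, f_best, lower

def g(x):
    return math.sin(x) + math.sin(10.0 * x / 3.0)   # |g'| <= 1 + 10/3

print(piyavskii_shubert(g, 2.7, 7.5, C=4.5))
```

The test function and the bound $C = 4.5 \ge 1 + 10/3$ are illustrative choices; with an underestimated $C$ the cones are no longer minorants and the returned lower bound is not valid.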

For the minimization of Lipschitz functions, an estimate of the Lipschitz constant is required in both cases, whether $f$ is transformed into an IPH function or Algorithm 1 is used directly. The crucial part of both methods is the efficient solution of the auxiliary problem in Step 2.1. The next section presents a very fast combinatorial algorithm for the enumeration of all local minimizers of the functions $H^K$.


The Auxiliary Problem 


Step 2.1 (finding the global minimum of $H^K(x)$) is the most difficult part of the cutting angle method. This problem is stated in the following form:
$$\text{minimize } H^K(x) \text{ subject to } x \in S, \qquad (12)$$
where
$$H^K(x) = \max_{k \le K} \min_{i \in I(l^k)} l^k_i x_i = \max_{k \le K} h^k(x), \qquad (13)$$
$K > n$, and $l^k = f(x^k)/x^k$, $k = 1, \dots, K$, are given vectors. Note that $x^k = e^k$ for $k = 1, \dots, n$.


Proposition 4 [2,3] Let $K > n$, $l^k = d_k e^k$ with $d_k > 0$ for $k = 1, \dots, n$, and $I(l^k) = I$ for $k = n+1, \dots, K$. Then each local minimizer of the function $H^K(x)$ defined by (13) over the simplex $S$ is a strictly positive vector.

Corollary 1 Let $\{x^k\}$ be a sequence generated by Algorithm 2. Then $x^k \gg 0$ for all $k > n$.


Let $\operatorname{ri}(S) = \{x \in S : x_i > 0 \text{ for all } i \in I\}$ be the relative interior of the simplex $S$. It follows from Proposition 4 and Corollary 1 that we can solve problem (12) by enumerating the local minima of the function $H^K$ over the set $\operatorname{ri}(S)$. We now describe some properties of local minima of $H^K$ on $\operatorname{ri}(S)$, which will allow us to identify these minima explicitly.

It is well known that the functions $h^k$ and $H^K$ are directionally differentiable. Let $f'(x, u)$ denote the directional derivative of the function $f$ at the point $x$ in the direction $u$. Also let
$$R(x) = \{k : h^k(x) = H^K(x)\},$$
$$Q_k(x) = \{i \in I(l^k) : l^k_i x_i = h^k(x)\}.$$

Proposition 5 (see, for example, [13]). Let $x \gg 0$. Then
$$(h^k)'(x, u) = \min_{i \in Q_k(x)} l^k_i u_i,$$
$$(H^K)'(x, u) = \max_{k \in R(x)} (h^k)'(x, u) = \max_{k \in R(x)} \min_{i \in Q_k(x)} l^k_i u_i.$$


Let $x \in S$. The cone
$$K(x, S) = \{u \in \mathbb{R}^n : \exists\, \alpha_0 > 0 \text{ such that } x + \alpha u \in S\ \forall \alpha \in (0, \alpha_0)\}$$
is called the tangent cone at the point $x$ with respect to the simplex $S$. If $x \in \operatorname{ri}(S)$, then $K(x, S) = \{u : \sum_{i \in I} u_i = 0\}$. The following necessary condition for a local minimum holds (see, for example, [13]).

Proposition 6 Let $x \in S$ be a local minimizer of the function $H^K$ over the set $S$. Then $(H^K)'(x, u) \ge 0$ for all $u \in K(x, S)$.


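Applying Propositions 5 and 6, we obtain the following result.

Proposition 7 [2,3] Let $x \gg 0$ be a local minimizer of the function $H^K$ over the set $\operatorname{ri}(S)$ such that $H^K(x) > 0$. Then there exists an ordered subset $\{l^{k_1}, l^{k_2}, \dots, l^{k_n}\}$ of the set $\{l^1, \dots, l^K\}$ such that
1) $$x = \left(\frac{d}{l^{k_1}_1}, \dots, \frac{d}{l^{k_n}_n}\right), \quad \text{where } d = \frac{1}{\sum_{i \in I} 1/l^{k_i}_i}; \qquad (15)$$
2) $$\max_{k \le K} \min_{i \in I(l^k)} \frac{l^k_i}{l^{k_i}_i} = 1; \qquad (16)$$
3) either $k_i = i$ for all $i \in I$, or there exists $m \in I$ such that $k_m \ge n + 1$; if $k_m \le n$, then $k_m = m$;
4) if $k_m \ge n + 1$ and $l^{k_m} \gg 0$, then $l^{k_m}_i > l^{k_i}_i$ for all $i \in I$, $i \ne m$.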


Solution of the Auxiliary Problem 


It follows from Propositions 4 and 7 that we can find a global minimizer of the function $H^K$ defined by (13) over the unit simplex using the following procedure:
• enumerate all subsets $\{l^{k_1}, \dots, l^{k_n}\}$ of the given set of vectors $\{l^1, \dots, l^K\}$ such that (16) holds, $l^{k_m}_i > l^{k_i}_i$ if $k_m \ge n + 1$, $i \in I(l^{k_m})$, $i \ne m$, and $k_m = m$ if $k_m \le n$;
• for each such subset, find the vector $x$ defined by (15);
• choose the vector with the least value of the function $H^K$ among all the vectors described above.

Thus, the search for a global minimizer is reduced to enumerating subsets containing $n$ elements of the given set $\{l^1, \dots, l^K\}$ with $K > n$. Fortunately, Proposition 7 allows one to substantially reduce the number of subsets that have to be examined.

The subsets $L = \{l^{k_1}, \dots, l^{k_n}\}$ can be visualized with the help of an $n \times n$ matrix whose rows are given by the participating support vectors:
$$L = \begin{pmatrix} l^{k_1}_1 & l^{k_1}_2 & \cdots & l^{k_1}_n \\ l^{k_2}_1 & l^{k_2}_2 & \cdots & l^{k_2}_n \\ \vdots & \vdots & \ddots & \vdots \\ l^{k_n}_1 & l^{k_n}_2 & \cdots & l^{k_n}_n \end{pmatrix} \qquad (17)$$
Conditions 2) and 4) of Proposition 7 are then easily interpreted as follows. Condition 4) implies that the diagonal elements of matrix $L$ are smaller than the other elements in their respective columns, and condition 2) implies that the diagonal of $L$ is not dominated by any other support vector $l^k \notin L$ (zero entries of matrix $L$ are excluded from comparisons). Thus we obtain a combinatorial problem of enumerating all combinations $L$ that satisfy conditions 2) and 4).
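Conditions (15) and (16) are easy to evaluate for a given choice of rows. The Python sketch below (hypothetical helper names, 0-based indices, and $l^{k_i}_i > 0$ assumed for every $i$) builds the candidate point (15) and tests (16). At that point $H^K(x) = d \cdot \max_k \min_{i \in I(l^k)} l^k_i / l^{k_i}_i$, so condition (16) is the same as $H^K(x) = d$. Condition 4) is not checked here.

```python
import numpy as np

def candidate_point(L_all, ks):
    """Point x and value d from (15) for the ordered choice ks = (k_1, ..., k_n)."""
    diag = L_all[list(ks), np.arange(L_all.shape[1])]   # diagonal of L in (17)
    d = 1.0 / np.sum(1.0 / diag)
    return d / diag, d

def satisfies_16(L_all, ks, tol=1e-12):
    """Condition (16): max_k min_{i in I(l^k)} l^k_i / l^{k_i}_i == 1."""
    diag = L_all[list(ks), np.arange(L_all.shape[1])]
    ratios = np.where(L_all > 0, L_all / diag, np.inf)
    return bool(abs(ratios.min(axis=1).max() - 1.0) <= tol)

def h_max(L_all, x):
    """H^K(x) from (13) at a single point x."""
    return max(min(l[i] * x[i] for i in range(len(x)) if l[i] > 0) for l in L_all)

# Support vectors for n = 2: the two unit-vector rows and one interior row.
L_all = np.array([[1.0, 0.0], [0.0, 1.0], [np.sqrt(2.0), np.sqrt(2.0)]])
x, d = candidate_point(L_all, (0, 2))
print(satisfies_16(L_all, (0, 2)), h_max(L_all, x), d)
```

In this example the choice $(0, 1)$ gives the barycentre $(1/2, 1/2)$, which the third support vector dominates, so (16) fails there and $H^K$ exceeds $d$.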

However, it is impractical to enumerate all such combinations directly for large $K$. Fortunately, there is no need to do so. It was shown in [6,7,8] that the required combinations can be put into a tree structure. The leaves of the tree correspond to the local minimizers of $H^K$, whereas the intermediate nodes correspond to the minimizers of $H^n, H^{n+1}, \dots, H^{K-1}$. The incremental algorithm based on the tree structure makes the computations very efficient numerically, as processing queries on a tree takes time logarithmic in the number of nodes. It is possible to enumerate several billion local minimizers of $H^K$ (e.g., when $n = 5$ and $K = 100{,}000$) in a matter of seconds on a standard Pentium IV based workstation.

The direct method of minimization of Lipschitz functions involves the solution of a different auxiliary problem, that of minimizing $H^K$ given in (11), with $d_P$ being a simplicial distance function. It turns out that a very similar method of enumeration of local minimizers of $H^K$, by putting them in a tree structure, also works [9]. There is a counterpart of Proposition 7, with the difference that the support vectors are defined by
$$l^k = \frac{f(x^k)}{C} - x^k, \qquad (18)$$
and the local minima and minimizers of $H^K$ are identified through
$$d = H^K(x^*) = \frac{C\,(\operatorname{Trace}(L) + 1)}{n}, \qquad x^*_i = \frac{d}{C} - l^{k_i}_i, \quad i = 1, \dots, n,$$
where the constant $C$ is chosen greater than or equal to the Lipschitz constant $M$ of $f$ in the simplicial distance $d_P$. Thus both versions of CAM, for IPH and for Lipschitz functions, share the same algorithm, but with different definitions of support vectors.

The actual algorithms for enumeration of local 
minima of H* and maintaining the tree structure, as 
well as treatment of linear constraints, are presented 
in [7,8,9]. The algorithms involve a crucial fathoming 
step, and can be seen as branch-and-bound type algo- 
rithms [9,12,23]. 


Conclusions 


Cutting angle methods are versions of the general- 
ized cutting plane method for IPH, Lipschitz and other 
classes of abstract convex functions. The main idea 
of this deterministic method is to replace the original 
problem of minimizing f with a sequence of relaxed 
problems with special structure. The objective functions in the relaxed problems provide tight lower estimates of $f$, and the sequence of their solutions converges to the global minimum of $f$. Efficient solution of the relaxed problem makes CAM very fast on a class of global optimization problems.

Optimization is not the only field in which such underestimates are applied. Versions of CAM are also used for non-uniform random variate generation [10] and multivariate data interpolation [11].

Both versions of CAM described here have been 
successfully applied to a number of real life problems, 
including very difficult molecular geometry prediction
and protein folding problems [12,17]. A software li- 
brary GANSO for global and non-smooth optimiza- 
tion, which includes the cutting angle method, is avail- 
able from http://www.ganso.com.au. 
Some classical methods of finite-dimensional convex minimization can be extended to quite broad classes of multi-extremal optimization problems. One successful generalization is based on the so-called envelope representation of the objective function.

We begin with the simplest case of a convex differentiable function $f$ in order to introduce this approach. For such a function the tangent hyperplane $T = \{(x, t) : t = \nabla f(y)(x - y) + f(y)\}$ is simultaneously a support hyperplane. That is, the inequality $f(x) \ge f(y) + \nabla f(y)(x - y)$ holds for each $x$. This inequality can also be expressed in the following form: the affine function
$$h_y(x) = \nabla f(y)(x - y) + f(y) \qquad (1)$$
is a support function for the function $f$. Thus the function $f$ can be represented as the pointwise maximum of the functions of the form $h_y$:
$$f(x) = \max_y h_y(x).$$


One of the main results of convex analysis asserts that an arbitrary lower semicontinuous convex function $f$ (perhaps admitting the value $+\infty$) is the upper envelope (UE) of the set of all its affine minorants:
$$f(x) = \sup\{h(x) : h \text{ is an affine function},\ h \le f\}.$$
(The inequality $h \le f$ stands for $h(x) \le f(x)$ for all $x$.) The supremum above is attained if and only if the subdifferential of $f$ at the point $x$ is nonempty. Since affine functions are defined by means of linear functions, one can say that convexity is ‘linearity + envelope representation’.

As it turns out, the contribution of ‘envelope representation’ to convexity is fairly large. This observation stimulated the development of the rich theory of ‘convexity without linearity’ (see [12,14,19] and references therein). In particular, this theory studies functions that can be represented as the UE of subsets of a set of sufficiently simple functions.

We need the following definition. Let $H$ be a set of functions. A function $f$ is called abstract convex (AC) with respect to $H$ (or $H$-convex) if $f$ is the UE of a subset of $H$, that is,
$$f(x) = \sup\{h(x) : h \in H,\ h \le f\}. \qquad (2)$$
The set $H$ is called the set of elementary functions. For applications we need sufficiently simple elementary functions.


Many results from convex analysis related to various kinds of convex duality can be extended to abstract convex analysis. Abstract convexity sheds new light on the classical Fenchel-Moreau duality and the so-called level sets conjugation (see [19]). The set $s(f, H) = \{h \in H : h \le f\}$, presented in (2), is called the support set of $f$. The mapping $f \mapsto s(f, H)$ is called the Minkowski duality ([9]). The support set accumulates global information about a function $f$ in terms of the set of elementary functions $H$, and it can be useful in the study of global optimization problems involving the function $f$.

One of the main notions of convex analysis, which plays a key role in applications to optimization, is the subdifferential. There are two equivalent definitions of the subdifferential of a convex function. The first of them is based on the global behavior of the function. A linear function $l$ is called a subgradient (i.e. a member of the subdifferential) of the function $f$ at a point $y$ if the affine function $h(x) = l(x) - (l(y) - f(y))$ is a support function with respect to $f$, that is, $h(x) \le f(x)$ for all $x$. The second definition has a local nature and is connected with local approximation of the function: the subdifferential is a closed convex set of linear functions such that the directional derivative $u \mapsto f'_y(u)$ at the point $y$ is the UE of this set. For a differentiable convex function these two definitions reflect, respectively, the support and tangent sides of the gradient.

The various generalizations of the second definition have led to the development of the rich theory of nonsmooth analysis. The natural field for generalizations of the first definition is AC.

A function $h \in H$ is called a subgradient (or $H$-subgradient) of an $H$-convex function $f$ at a point $y$ if $f(x) \ge h(x) - (h(y) - f(y))$ for all $x$. The set $\partial_H f(y)$ of all subgradients of $f$ at $y$ is referred to as the subdifferential of the function $f$ at the point $y$.

Let $H'$ be the closure of the set $H$ under vertical shifts, that is,
$$H' = \{h' : h'(x) = h(x) - c,\ h \in H,\ c \in \mathbb{R}\}.$$
Clearly $h \in \partial_H f(y)$ if and only if $f(y) = \max\{h'(y) : h' \le f,\ h' \in H'\}$, the maximum being attained at the shifted function $h' = h - (h(y) - f(y))$. Thus, if $H$ is already closed under shifts, then
$$\partial_H f(y) = \{h \in s(f, H) : h(y) = f(y)\}. \qquad (3)$$
Thus the subdifferential is not empty if and only if the
supremum in (2) is attained. 

Sometimes (3) is used as the definition of the subdifferential for an arbitrary set of elementary functions $H$ (not necessarily closed under shifts).

Many methods of convex minimization are based on the local properties of the convex subdifferential (more precisely, on the directional derivative). However, there are some methods which exploit only the support property of the subdifferential. The conceptual schemes of these methods can easily be extended to AC functions. One of these methods is presented below.

Consider the following problem:
$$f(x) \to \min, \quad x \in X, \qquad (4)$$
where $X$ is a compact set. Assume that $f$ is AC with respect to a set of elementary functions $H$. We consider the following algorithm based on the generalized cutting plane idea, which is a nonlinear generalization of the classical cutting plane method.


0 Let $k := 0$. Choose an arbitrary initial point $x_0 \in X$;
1 Calculate a subgradient in the form (3), that is, an element $h_k \in s(f, H)$ such that $h_k(x_k) = f(x_k)$;
2 Find a global optimum $y^*$ of the problem
$$\max_{0 \le i \le k} h_i(x) \to \min, \quad x \in X; \qquad (5)$$
3 Let $x_{k+1} = y^*$, $k := k + 1$. Go to step 1.

Conceptual scheme (generalized cutting plane method)


Convergence of the sequence constructed by this 
procedure to a global minimizer has been proved under 
very mild assumptions by D. Pallaschke and S. Rolewicz 
[12]. Upper and lower estimates of the optimal value of 
the problem (4) can be computed, which lead to an ef- 
ficient stopping criterion (compare with [2]). 

There are two major difficulties in the numerical implementation of the algorithm. The first is the calculation of a subgradient. In general, a subgradient is very difficult to find numerically; however, it is possible in several important particular cases. The second difficulty is the solution of the auxiliary problem (5). This is a linear programming problem in the case of the set $H$ of affine functions, but for sets of more complicated functions problem (5) is essentially of a combinatorial nature or a problem of convex maximization.

The simplest example of this approach is Lipschitz programming. If $f$ is a Lipschitz function, we can, for example, take as $H$ the set of functions $h$ of the form $h(x) = -a\|x - x_0\| - c$, where $a$ is a positive number, $c$ is a real number and $x_0 \in X$. In order to find an $H$-subgradient we should take $a \ge L$, where $L$ is the Lipschitz constant of $f$; thus we need to know an upper estimate of this constant, which is a special piece of global information about the function. With such $H$, problem (4) can be reduced to a sequence of special problems of concave minimization. Some known algorithms of Lipschitz programming fall within the described approach [11,21].
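In one dimension, with $X = [lo, hi]$ and the elementary functions $h_k(x) = f(x_k) - a|x - x_k|$ (the form above with $x_0 = x_k$ and $c = -f(x_k)$), the auxiliary problem (5) can be solved exactly; the resulting method is the classical Piyavskii–Shubert algorithm. The Python sketch below is an illustration under stated assumptions: the function name and the tolerance are hypothetical, and the caller must supply $a \ge L$. For such $a$, the envelope $\max_i h_i$ between two neighbouring sample points is determined by those two points alone, so (5) reduces to a scan over adjacent pairs.

```python
def piyavskii(f, lo, hi, a, tol=1e-6, max_iter=1000):
    """Minimize f on [lo, hi] with minorants h_k(x) = f(x_k) - a*|x - x_k|, a >= L."""
    pts = [(lo, f(lo)), (hi, f(hi))]           # sample points (x_k, f(x_k)), kept sorted
    lower = float("-inf")
    for _ in range(max_iter):
        lower, x_new = float("inf"), None
        for (x1, f1), (x2, f2) in zip(pts, pts[1:]):
            # lowest point of max(h_1, h_2) on [x1, x2]: where the two cones meet
            value = 0.5 * (f1 + f2) - 0.5 * a * (x2 - x1)
            if value < lower:
                lower = value
                x_new = 0.5 * (x1 + x2) + (f1 - f2) / (2.0 * a)
        upper = min(fx for _, fx in pts)
        if upper - lower <= tol:                # stopping rule from the two estimates
            break
        pts.append((x_new, f(x_new)))           # x_{k+1} = y*
        pts.sort()
    x_best, f_best = min(pts, key=lambda p: p[1])
    return x_best, f_best, lower


x_best, f_best, lower = piyavskii(lambda x: abs(x - 0.3), 0.0, 1.0, a=2.0)
```

The returned `lower` is a lower estimate of the optimal value and `f_best` an upper one; both are valid only because $a$ overestimates the Lipschitz constant.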

For fairly large classes of functions defined on the cone $\mathbb{R}^n_+$ of all $n$-vectors with nonnegative coordinates, it is possible to take as $H$ a set of functions which includes as its main part a min-type function of the form

$$l(x) = \min_{i \in I(l)} l_i x_i, \qquad l, x \in \mathbb{R}^n_+, \quad (6)$$

with $I(l) = \{i : l_i > 0\}$.

We define the infimum over the empty set to be zero. If $l$ is a strictly positive vector and $c$ a positive number, then the set $\{x : \min_i l_i x_i < c\}$ is the complement of a 'right angle'. Using min-type functions instead of linear functions allows us to separate a point from a (not necessarily convex) set by complements of 'right angles'.
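Formula (6) transcribes directly into Python; the sketch below is illustrative and its function names are hypothetical. The loop checks the geometric remark above: for a strictly positive $l$ and $c > 0$, $l(x) \ge c$ holds exactly on the 'right angle' $\{x : x_i \ge c/l_i \text{ for all } i\}$, so the set $\{x : l(x) < c\}$ is its complement.

```python
def min_type(l, x):
    """l(x) = min over I(l) = {i : l_i > 0} of l_i * x_i; the infimum over the empty set is 0."""
    values = [li * xi for li, xi in zip(l, x) if li > 0]
    return min(values) if values else 0.0


def in_right_angle(l, c, x):
    """Membership in {x : x_i >= c / l_i for all i}, assuming every l_i > 0."""
    return all(xi >= c / li for li, xi in zip(l, x))


l, c = (2.0, 0.5), 1.0
for x in [(1.0, 3.0), (0.4, 5.0), (0.6, 1.9)]:
    # l(x) >= c exactly on the right angle, so {x : l(x) < c} is its complement
    assert (min_type(l, x) >= c) == in_right_angle(l, c, x)
```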

Various classes of elementary functions arise based on the set $L$ of all functions of the form (6) with $l \in \mathbb{R}^n_+$. In particular, $L$ itself and the sets

$$H_1 = \{h : h(x) = l(x) - c,\ l \in L,\ c \in \mathbb{R}\},$$
$$H_2 = \{h : h(x) = \min(l(x), c),\ l \in L,\ c \in \mathbb{R}\}$$

are convenient for applications. The classes of functions that are AC with respect to $H_1$ and $H_2$ are quite large [14]. The first consists of all increasing (with respect to the usual order relation) functions $f$ such that the function of a real variable $t \mapsto f(tx)$, $t \in [0, +\infty)$, is convex for all $x \in \mathbb{R}^n_+$. This class contains all homogeneous functions of degree $\delta \ge 1$, their sums and UE of sets of such functions. In particular, it contains all polynomials with nonnegative coefficients. The second class consists of all increasing functions $f$ such that $f(tx) \ge t f(x)$ for all $x \in \mathbb{R}^n_+$ and $t \in [0, 1]$. Concave increasing functions $f$ with $f(0) \ge 0$ and UE of sets of such functions belong to this class. Also, positively homogeneous functions of degree $\delta \le 1$, their sums and UE of sets of such functions belong to it.

For minimizing functions that are AC with respect to $H_i$ ($i = 1, 2$), we again need to calculate the $H_i$-subgradients in the form (3) and then to reduce problem (4) to a sequence of auxiliary problems. A version of the generalized cutting plane method in this case is called the 'cutting angle method' ([2,14]).

A.M. Rubinov et al. ([1,14,16,17]) have demonstrated that for AC functions generated by various classes of min-type functions it is possible to find subgradients very easily. In particular, only the number $f(x)$ (resp. $f'(x, x)$) is required for the calculation of an element of $\partial_{H_2} f(x)$ (resp. $\partial_{H_1} f(x)$), without any additional information about the global behavior of the function $f$. Thus the main problem in implementing the cutting angle method is to solve the auxiliary subproblem, which in this case is a mixed integer programming problem of a special kind.

Let $L$ be the set of all functions (6) with $l \in \mathbb{R}^n_+$. It can be shown ([14,16]) that a function $f$ defined on $\mathbb{R}^n_+$ is $L$-convex if and only if $f$ is IPH (increasing and positively homogeneous of degree one). IPH functions can serve for the minimization of a Lipschitz function over the unit simplex $S_n = \{x \in \mathbb{R}^n_+ : \sum_i x_i = 1\}$. First ([14,15]), for each Lipschitz function $g$ defined on $S_n$ there exists a constant $c > 0$ such that the function $\tilde g(x) = g(x) + c$ can be extended to an IPH function defined on $\mathbb{R}^n_+$. Second, the auxiliary problem (5) for problem (4) with an IPH function $f$ and $X = S_n$ has a special structure and can be efficiently solved for fairly large $n$ (see [14, Chap. 9] and references therein). Thus the minimization of a Lipschitz function over the unit simplex can be efficiently accomplished by the cutting angle method.
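The following Python sketch combines these ingredients for $n = 2$. It is an illustration under assumptions, not the algorithm of [14]: the names are hypothetical, the test function $f(x) = (x_1^2 + 4x_2^2)^{1/2}$ is an arbitrary IPH function on $\mathbb{R}^2_+$, and the auxiliary problem (5) is solved by brute force on a uniform grid of $S_2$ instead of by the specialized methods cited above. The subgradient uses only the number $f(x_k)$: with $l^k_i = f(x_k)/x_{k,i}$ for $x_{k,i} > 0$ and $l^k_i = 0$ otherwise, an increasing and positively homogeneous $f$ satisfies $l^k(x) \le f(x)$ on $\mathbb{R}^n_+$, with equality at $x = x_k$.

```python
import math


def min_type(l, x):
    """Min-type function (6): min over {i : l_i > 0} of l_i * x_i (0 if that set is empty)."""
    values = [li * xi for li, xi in zip(l, x) if li > 0]
    return min(values) if values else 0.0


def support_vector(f, x):
    """l_i = f(x) / x_i where x_i > 0 and l_i = 0 otherwise; uses only the value f(x)."""
    fx = f(x)
    return [fx / xi if xi > 0 else 0.0 for xi in x]


def cutting_angle_s2(f, grid=401, iters=40):
    """Cutting angle sketch on S_2 = {(t, 1 - t)}; (5) is solved by grid search."""
    points = [(k / (grid - 1), 1.0 - k / (grid - 1)) for k in range(grid)]
    xs = [(1.0, 0.0), (0.0, 1.0)]                 # start from the vertices of S_2
    cuts = [support_vector(f, x) for x in xs]
    lower = -math.inf
    for _ in range(iters):
        envelope = [max(min_type(l, y) for l in cuts) for y in points]
        j = min(range(grid), key=envelope.__getitem__)
        lower = envelope[j]                       # approximate lower estimate of min f
        xs.append(points[j])                      # x_{k+1} = y*
        cuts.append(support_vector(f, points[j]))
    best = min(xs, key=f)
    return best, f(best), lower


f = lambda x: math.sqrt(x[0] ** 2 + 4.0 * x[1] ** 2)   # IPH on R^2_+ (assumed example)
best, f_best, lower = cutting_angle_s2(f)
```

On this example the minimizer on $S_2$ is $(0.8, 0.2)$ with value $\sqrt{0.8} \approx 0.894$; because (5) is only solved on a grid, `lower` is an approximate lower estimate.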

Numerical experiments demonstrate that a combi- 
nation of the cutting angle method with a local search 
is very efficient, since the cutting angle method allows 
one to leave a local minimizer fairly quickly. 

Envelope representation is also useful in the study of some theoretical problems arising in optimization. Many interesting examples of such applications can be found in the books [12,14,19]. In particular, a general scheme of penalty and augmented Lagrangian methods based on the notion of the subdifferential is presented in [12]. I. Singer [19] demonstrated that Fenchel–Moreau duality leads to a unified theory of duality results for very general optimization problems. It can be shown [18] that AC forms the natural framework for the study of solvability theorems (generalizations of Farkas' lemma; cf. ▸ Farkas lemma; ▸ Farkas lemma: Generalizations). In contrast with numerical methods based on applications of subdifferentials, the study of solvability theorems is based on the application of support sets. AC also serves for the study of some problems of quasiconvex minimization (see, for example, [10,13,20]).

A subset $H$ of a set $X$ of functions is called a supremal generator ([9]) of $X$ if each function from $X$ is AC with respect to $H$. There exist very small supremal generators of very large classes of functions. The following two examples of such supremal generators are useful for nonsmooth optimization.

1) Recall that a function $p$ is called positively homogeneous (PH) of degree $k$ if $p(\lambda x) = \lambda^k p(x)$ for $\lambda > 0$. It can be shown ([14]) that the set of all functions of the form

$$h(x) = -a\Big(\sum_{i=1}^n x_i^2\Big)^{1/2} + \sum_{i=1}^n l_i x_i, \quad (7)$$

where $a \ge 0$ and $l_1, \dots, l_n$ are real numbers, is a supremal generator of the set $PH_1$ of all lower semicontinuous PH functions of degree one defined on the $n$-dimensional space $\mathbb{R}^n$. Since each function (7) is concave, it follows that the set of all concave PH functions of degree one is a supremal generator of $PH_1$.

2) It can be shown ([3,4,9,14]) that the set $H$ of all quadratic functions $h$ of the form

$$h(x) = -a \sum_{i=1}^n x_i^2 + \sum_{i=1}^n l_i x_i + c, \quad (8)$$

where $a \ge 0$ and $l_1, \dots, l_n, c$ are real numbers, is a supremal generator of the set of all lower semicontinuous functions $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ minorized by $H$ in the following sense: there exists $h \in H$ such that $f \ge h$.

Supremal generators are a convenient tool in the study of nonsmooth optimization problems. A local approximation of the first (resp. second) order of a nonsmooth function is very often provided by various kinds of generalized derivatives of the first (resp. second) order, which are PH functions of degree one (resp. two). Practical applications of these derivatives to optimization are based on their representation in terms of linear (resp. quadratic) functions.

Linearization of lower semicontinuous PH functions of degree one can be accomplished by supremal generators of the space $PH_1$ consisting of concave functions. Each finite concave function $g \in PH_1$ can be represented as $g(x) = \min\{\langle l, x\rangle : l \in \partial g(0)\}$, where $\partial g(0)$ is the superdifferential (in the sense of convex analysis) of $g$ at the origin. Hence each function in $PH_1$ can be linearized by the operation sup min.

The second order approximation of a nonsmooth function $f$ at a point $x$ can be accomplished by the subjet, that is, the set

$$\partial^{2,-} f(x) = \big\{(\nabla g(x), \nabla^2 g(x)) : g \in C^2(\mathbb{R}^n),\ f - g \text{ has a local minimum at } x\big\}.$$

(Here $\nabla g(x)$ (resp. $\nabla^2 g(x)$) stands for the gradient (resp. Hessian) of a function $g$ at a point $x$.) Let $H$ be the set of all functions of the form (8). It can be shown (see [5,6]) that the subjet $\partial^{2,-} f(x)$ is nonempty if and only if the $H$-subdifferential $\partial_H f(x)$ is nonempty. AC with respect to $H$ can also serve for the supremal representation of the second order generalized derivatives of nonsmooth functions in terms of quadratic functions (see [5]).
Introduction


The filled function methods describe a class of global optimization methods for attacking the problem of finding a global minimizer of a function $f: X \to \mathbb{R}$ over a certain subset $X \subseteq \mathbb{R}^n$. Each variant of such methods replaces the objective function $f(x)$ by a specific auxiliary function that is associated with a local minimum and some parameters in every iteration, and is minimized through some local search strategy. The term "filled function" means that every auxiliary function can fill the region of attraction in a certain neighborhood of a local minimum of the objective function.

The definition of a filled function involves some basic concepts. The term "basin" was first introduced in [1]. A basin of a function $f(x)$ at an isolated minimizer $x_1^*$ is a connected domain $B_1^*$ which contains $x_1^*$ and in which, starting from any point, the steepest descent trajectory of $f(x)$ converges to $x_1^*$, but outside of which the steepest descent trajectory of $f(x)$ does not converge to $x_1^*$. Accordingly, a hill of a function $f(x)$ at a maximizer $x_1$ is a basin of $-f(x)$ at the point $x_1$.


In addition, the basin $B_2^*$ at a minimizer $x_2^*$ is lower (or higher) than the basin $B_1^*$ at another minimizer $x_1^*$ if the following inequality holds:

$$f(x_2^*) < f(x_1^*) \quad (\text{or } f(x_2^*) > f(x_1^*)).$$


Definitions

The first kind of filled function method was proposed in [5] for the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} f(x).$$

The corresponding filled function involved two parameters and was defined by

$$P(x, x_1^*, r, \rho) = \frac{1}{r + f(x)} \exp\Big(-\frac{\|x - x_1^*\|^2}{\rho^2}\Big), \quad (1)$$

where $x_1^*$ is a minimizer of the objective function $f(x)$, and $r$ and $\rho$ are parameters such that $r + f(x_1^*) > 0$ and $\rho > 0$. In order to demonstrate the principle of the filled function method, one usually assumes that the function $f(x)$ is twice continuously differentiable (i.e., its Hessian is continuous) and coercive, i.e., the following condition holds:

$$\lim_{\|x\| \to +\infty} f(x) = +\infty. \quad (2)$$

It is also assumed that the function $f(x)$ has only a finite number of minimizers in a closed domain $\Omega \subset \mathbb{R}^n$ that contains all global minimizers of $f(x)$.

Under certain other conditions concerning the parameters $r$ and $\rho$, the function $P(x, x_1^*, r, \rho)$ defined in (1) has the following three properties:

(a) $x_1^*$ is a maximizer of $P(x, x_1^*, r, \rho)$, and the whole basin $B_1^*$ at $x_1^*$ becomes a part of a hill of $P(x, x_1^*, r, \rho)$ at $x_1^*$.

(b) $P(x, x_1^*, r, \rho)$ has no minimizers or saddle points in any basin of $f(x)$ higher than $B_1^*$ at $x_1^*$.

(c) If $f(x)$ has a basin $B_2^*$ lower than $B_1^*$ at $x_1^*$, then there is a point $x'$ in such a basin $B_2^*$ that minimizes $P(x, x_1^*, r, \rho)$ on the line through $x'$ and $x_1^*$.

A function satisfying the above three properties is said to be a filled function of $f(x)$ at the local minimizer $x_1^*$.

Note that the above definition just lists the main properties required of a filled function; the number of parameters is not an important factor (see the discussion of categories of filled functions below).
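A minimal numerical check of (1) and property (a) can be sketched in Python. The test function $f(x) = (x^2 - 1)^2 + 0.3x$, the parameters $r = 0.5$ and $\rho = 1$, and the helper names are assumptions for the example; $r + f(x) > 0$ holds on $[-2, 2]$, and $x_1^* \approx 0.9603$ is the higher of the two local minimizers. The check only confirms that $x_1^*$ is a strict local maximizer of $P$; it does not verify properties (b) and (c), which depend on how $r$ and $\rho$ are tuned.

```python
import math


def f(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def P(x, x1, r, rho):
    """Two-parameter filled function (1) in one dimension."""
    return math.exp(-((x - x1) ** 2) / rho ** 2) / (r + f(x))


# Locate the local minimizer near x = 1 by bisection on f'(x) = 4x(x^2 - 1) + 0.3.
lo, hi = 0.9, 1.1
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if 4.0 * mid * (mid * mid - 1.0) + 0.3 > 0.0:
        hi = mid
    else:
        lo = mid
x1 = 0.5 * (lo + hi)

r, rho = 0.5, 1.0
for d in (0.01, 0.05, 0.1):
    # Property (a): P decreases when moving away from x1 in either direction.
    assert P(x1, x1, r, rho) > P(x1 + d, x1, r, rho)
    assert P(x1, x1, r, rho) > P(x1 - d, x1, r, rho)
```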
Usually, when a new variant of the filled function method is developed, property (c) in the above definition may be replaced by a similar one. For example, it was replaced in [21] by

(c') If $f(x)$ has a basin $B_2^*$ at $x_2^*$ that is lower than $B_1^*$, then there is a point $x' \in B_2^*$ that minimizes $P(x, x_1^*, r, \rho)$ on the line through $x_1^*$ and $x''$, for every $x''$ in some neighborhood of $x_2^*$.

Note that property (c') is much stronger than that required in [5], since a minimizer is required on the lines connecting the current minimizer with every point in some neighborhood of a better minimizer.

In addition, for the unconstrained global optimization problem, two classes of continuously differentiable filled functions, with multiplicative and additive structures respectively, were proposed in [16]; they assume the existence of a local minimizer in a lower basin, not just on lines.

Under assumptions such as that the objective function $f: \mathbb{R}^n \to \mathbb{R}$ is coercive, continuously differentiable and has finitely many local minimizers, another, stronger variant of the filled functions can be found in [18], where the concept of a basin at a local minimizer was extended to that of a G-basin. A subset $B^* \subset \mathbb{R}^n$ is said to be a G-basin of $f(x)$ corresponding to a local minimizer $x^*$ if it is a connected domain with the following properties:

(i) $f(x) \ge f(x^*)$ for any $x \in B^*$;
(ii) $x \in B^*$ is a local minimizer of $f(x)$ if and only if $f(x) = f(x^*)$.


The definition in [18] requires that a filled function $p(x)$ be differentiable and satisfy the following modifications of conditions (a) and (b):

(a') $x^*$ is a strict local maximizer of $p(x)$.

(b') For any $x \ne x^*$ satisfying $f(x) > f(x^*)$, $x$ is not a stationary point of $p(x)$.

Furthermore, any local minimizer of $f(x)$ that is lower than a nonglobal minimizer $x^*$ is also a local minimizer of the filled function and is lower than every point on the boundary of the box $\Omega$ which contains all global minimizers of $f(x)$. For points of $\Omega$ higher than $x^*$, the farther they are from $x^*$, the lower the value of the filled function.

Recently, in order to take advantage of filled functions while reducing the difficulty of adjusting the parameter values, the concept of a locally filled function was introduced in [9,22], based on the concept of a local basin.

Given a bounded and closed convex set $\omega \subset \Omega$ and a basin $B_1$ of the objective function $f(x)$ at a local minimizer $x_1^*$, if the set

$$B_1(\omega) = \omega \cap B_1 \ne \emptyset,$$

then $B_1(\omega)$ is called a local basin associated with $x_1^*$ and $\omega$. Furthermore, a continuously differentiable function $P(x)$ is said to be a locally filled function associated with $\omega$ at a local minimizer $x_1^*$ of $f(x)$ if the following conditions hold:


($a_L$) $x_1^*$ is an interior point of $\omega$ and a strict local maximizer of $P(x)$.

($b_L$) If $B_1(\omega)$ is a local basin containing the point $x_1^*$, then $P(x)$ does not have any local minimizer or saddle point in $B_1(\omega)$.

($c_L$) If there exist local basins lower than $B_1(\omega)$, then at least one of them, e.g., $B_2(\omega)$, satisfies the following condition: there is a point $x_2 \in B_2(\omega)$ such that $P(x)$ decreases strictly along the segment connecting $x_1^*$ and $x_2$, that is, $P((1 - \alpha)x_1^* + \alpha x_2)$ is strictly decreasing with respect to $\alpha \in [0, 1]$.


In [9,22], the difference between a filled function and a locally filled function was illustrated by a function $y = f(x)$ defined on the interval $[-0.5, 0.5]$ as

$$f(x) = z\,(\sin(12\pi x) + 1.5),$$

where the variable $z$ was defined through the logarithm of an auxiliary quantity $z_1$ depending on $x$ (see [9,22] for the full expressions).

Note that $x^* = 0.2366$ is one of its local minimizers. An auxiliary function

$$Q(x, x^*, A) = -[f(x) - f(x^*)] \exp(A\|x - x^*\|^2)$$

does not satisfy the definition of the filled function on $[-0.5, 0.5]$ for the parameter $A = 16$, but it satisfies all conditions associated with a locally filled function for $A = 16$ and the choice of the interval $\omega = [-0.1, 0.3]$.
Methods


If the objective function $f: \mathbb{R}^n \to \mathbb{R}$ is coercive, then its global minimizer can be found in a suitably large bounded closed set $\Omega \subset \mathbb{R}^n$, which should be explored completely. In general, let us denote the feasible region for minimizing $f: X \subseteq \mathbb{R}^n \to \mathbb{R}$ by $\Omega$, and assume that $f(x) > \min_{y \in \Omega} f(y)$ for any point $x \in \partial\Omega$.

The basic outline of filled function methods can be 
described as follows: 


Step 1 Choose an initial point $x_0 \in \Omega$. Denote the maximum number of iterations and the iteration index by Iter_No and $k$, respectively. Set $k = 0$.

Step 2 Minimize the function $f(x)$ in $\Omega$ starting from the point $x_0 \in \Omega$ and obtain a local minimizer $x_1^*$ of $f(x)$. Denote the basin of the objective function $f$ at $x_1^*$ by $B_1^*$.

Step 3 Choose two suitable parameters $r$ and $\rho$, and construct a filled function $P(x, x_1^*, r, \rho)$ associated with $x_1^*$ and $f$, for example, the one defined by (1).

Step 4 Minimize the filled function $P(x, x_1^*, r, \rho)$ and find a point $x_2$ in a basin $B_2^*$ of $f$ lower than $B_1^*$, if such a point $x_2$ exists for a suitable choice of the parameters $r$ and $\rho$.

Step 4.1 If a basin $B_2^*$ of $f$ lower than $B_1^*$ at $x_1^*$ is found, then a new local minimizer $x_2^*$ can be obtained by any local search strategy. Furthermore, perform the replacements

$$x_2^* \to x_1^*, \quad B_2^* \to B_1^*, \quad k + 1 \to k,$$

and go to Step 3 (the method continues searching for a global minimum by minimizing another filled function corresponding to the new local minimizer $x_1^*$).

Step 4.2 Otherwise, either the parameters should be adjusted again by an internal updating strategy, or no better local minimizer than $x_1^*$ can be found in $\Omega$.

Step 5 If the iteration index $k > $ Iter_No, or no better local minimizer of $f$ can be found in $\Omega$, the current best local minimizer is regarded as a global minimizer of $f$ in $\Omega$.


In the above outline of filled function methods, how to choose the parameters of a filled function is an important issue; it may be handled through an internal iterative process that minimizes $P(x, x_1^*, r, \rho)$ approximately in order to find a lower basin of $f$ or an increasing direction $x - x_1^*$ of $P(x, x_1^*, r, \rho)$ at a point $x$. An algorithmic implementation and some practical considerations can be found in [5].
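The outline can be sketched in Python as follows. This is a hypothetical illustration, not the implementation of [5]: a compass search stands in for "any local search strategy", the filled function is the single-parameter function $Q(x, x_1^*, A) = -[f(x) - f(x_1^*)]\exp(A\|x - x_1^*\|^2)$ proposed in [7], minimized from the points $x_1^* \pm \delta e_i$, and $A = 1$, $\delta = 0.1$ and the box $\Omega = [-2, 2]$ are assumptions. Step 4.2 is simplified: instead of adjusting $A$, the method stops when no start point leads to a lower local minimizer.

```python
import math


def compass_search(g, x, lo, hi, step=0.1, tol=1e-9):
    """Derivative-free local search on the box [lo, hi] (stand-in for 'any local search')."""
    x = list(x)
    gx = g(x)
    while step > tol:
        moved = False
        for i in range(len(x)):
            for s in (step, -step):
                y = list(x)
                y[i] = min(hi[i], max(lo[i], y[i] + s))   # stay inside the box
                gy = g(y)
                if gy < gx:
                    x, gx, moved = y, gy, True
                    break
        if not moved:
            step *= 0.5
    return x, gx


def filled_function_method(f, x0, lo, hi, A=1.0, delta=0.1, iter_no=20):
    xk, fk = compass_search(f, x0, lo, hi)                 # Step 2: local minimizer x_1^*
    for _ in range(iter_no):                               # Step 5: at most Iter_No rounds
        def Q(x, xk=xk, fk=fk):                            # Step 3: Q(x, x_1^*, A)
            d2 = sum((a - b) ** 2 for a, b in zip(x, xk))
            return -(f(x) - fk) * math.exp(A * d2)
        improved = False
        for i in range(len(xk)):                           # Step 4: start near x_1^* along +/- e_i
            for s in (delta, -delta):
                start = list(xk)
                start[i] = min(hi[i], max(lo[i], start[i] + s))
                xq, _ = compass_search(Q, start, lo, hi)
                xn, fn = compass_search(f, xq, lo, hi)     # Step 4.1: local search on f
                if fn < fk - 1e-9:                         # found a lower basin
                    xk, fk, improved = xn, fn, True
                    break
            if improved:
                break
        if not improved:                                   # Step 4.2 (simplified): stop
            break
    return xk, fk


f = lambda x: (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]
x_best, f_best = filled_function_method(f, [1.5], [-2.0], [2.0])
```

On this test function the local search from $x_0 = 1.5$ should stop at the higher minimizer near $0.96$, and the filled function phase should then move across the hill to the global minimizer near $-1.0356$.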

Many kinds of filled functions have been proposed so far; some are general, while many others are specific [3,5,6,7,8,10,11,12,13,14,15,17,20,21,23]. These filled functions can be classified into four categories.


Two-Parameter Filled Functions 


A two-parameter filled function was presented in (1). Although the first filled function method was proposed to deal with unconstrained optimization problems, the two-parameter filled function method has been extended to find a constrained global minimizer [3].

The constrained optimization problem can be formulated as follows:

$$\begin{aligned} \text{minimize} \quad & f(x), \\ \text{subject to} \quad & g_i(x) \ge 0, \quad i \in I, \\ & h_j(x) = 0, \quad j \in E, \end{aligned} \qquad (3)$$

where $I$ and $E$ are the index sets corresponding to inequalities and equalities, respectively. The two-parameter filled function for problem (3) is defined by

$$P_F(x, x_1^*, r, \rho) = \frac{1}{r + F(x)} \exp\Big(-\frac{\|x - x_1^*\|^2}{\rho^2}\Big), \quad (4)$$

where

$$F(x) = f(x) + \sum_{i \in I} \lambda_i \max\{0, -g_i(x)\} + \sum_{j \in E} \mu_j |h_j(x)| \quad (5)$$

is an exact penalty function for the constrained minimization problem (3), and $\lambda \in \mathbb{R}^{|I|}_+$, $\mu \in \mathbb{R}^{|E|}_+$. Since the function defined by (4) is a nonsmooth filled function, the definitions of notions such as basin and filled function should be modified accordingly; see [3].
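Formulas (4) and (5) can be sketched in Python as follows; the helper names, the toy constraint $g_1(x) = x - 1 \ge 0$, the weight $\lambda_1 = 10$ and the values $r = \rho = 1$ are assumptions for the example. The kinks of $\max\{0, -g_i\}$ and $|h_j|$ where constraints become active are what make $P_F$ nonsmooth.

```python
import math


def exact_penalty(f, ineq, eq, lam, mu):
    """F(x) = f(x) + sum_i lam_i*max(0, -g_i(x)) + sum_j mu_j*|h_j(x)|, as in (5)."""
    def F(x):
        return (f(x)
                + sum(l * max(0.0, -g(x)) for l, g in zip(lam, ineq))
                + sum(m * abs(h(x)) for m, h in zip(mu, eq)))
    return F


def filled_PF(F, x1, r, rho):
    """Two-parameter filled function (4) built on the penalty function F."""
    def P(x):
        d2 = sum((a - b) ** 2 for a, b in zip(x, x1))
        return math.exp(-d2 / rho ** 2) / (r + F(x))
    return P


# Hypothetical example: minimize x^2 subject to g_1(x) = x - 1 >= 0 (no equalities).
F = exact_penalty(lambda x: x[0] ** 2, [lambda x: x[0] - 1.0], [], lam=[10.0], mu=[])
P = filled_PF(F, x1=[1.0], r=1.0, rho=1.0)
```

With this weight the penalty is exact for the toy problem: $F$ attains its minimum at the constrained minimizer $x = 1$, and infeasible points such as $x = 0.9$ receive a larger value.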
Two-parameter filled functions have two disadvantages. One is that the changes of both the filled function and its gradient (if available) are affected by the term $\exp(-\|x - x_1^*\|^2/\rho^2)$. When $\|x - x_1^*\|^2$ is large, it is difficult to distinguish these changes, so some pseudo-minimizers, saddle points or higher minimizers of the filled function may be located. The other is that the coordination between $r$ and $\rho$ is very difficult; even a global minimizer $x^*$ may be lost for an improper setting of the parameters.

Several modified two-parameter filled functions were proposed in [7] as follows:

$$\tilde P(x, x_1^*, r, \rho) = \frac{1}{r + f(x)} \exp\Big(-\frac{\|x - x_1^*\|}{\rho^2}\Big),$$
$$G(x, x_1^*, r, \rho) = -\rho^2 \log[r + f(x)] - \|x - x_1^*\|^2,$$
$$\tilde G(x, x_1^*, r, \rho) = -\rho^2 \log[r + f(x)] - \|x - x_1^*\|.$$


A more general form of filled functions with two parameters can be found in [20]:

$$P(x, r, A) = \psi(r + f(x)) \exp\big(-A\, w(\|x - x_1^*\|^\beta)\big), \quad (6)$$

where $\beta > 1$, $A > 0$, the parameter $r$ is chosen such that $r + f(x) > 0$ for all $x \in \Omega$, and the functions $\psi(t)$, $w(t)$ have the following properties:

(i) $\psi(t)$ and $w(t)$ are continuously differentiable for $t \in (0, +\infty)$.

(ii) For $t \in (0, +\infty)$, $\psi(t) > 0$, $\psi'(t) < 0$, and $\psi'(t)/\psi(t)$ is monotonically increasing.

(iii) $w(0) = 0$ and for any $t \in (0, +\infty)$, $w(t) > 0$ and $w'(t) \ge c > 0$.

Note that choices for the functions $\psi(t)$ and $w(t)$ can be $1/t^a$ ($a > 0$), $\operatorname{csch}(t)$, $\exp(1/t) - 1, \dots$ and $t$, $\sinh(t)$, $e^t - 1, \dots$, respectively. The general form of filled functions in (6) includes the class of generalized filled functions considered in [24], which are special two-parameter filled functions.

Since the above filled functions may tend to zero or $-\infty$ as $\|x\| \to +\infty$ for some objective functions $f(x)$ or $F(x)$, they do not approximate a coercive objective function properly. In such a case, a coercive filled function may be preferred. In [8] the concept of a globally convexized filled function for a twice continuously differentiable function $f: \Omega \to \mathbb{R}$ was introduced.

A continuous function U(x) is a globally convexized 
filled function if it has three properties: 


(a) $U(x)$ has no stationary point in the region

$$S_1 = \{x \mid f(x) \ge f(x_1^*),\ x \in \Omega\},$$

except a prefixed point $x_0 \in S_1$ that is a minimizer of $U(x)$.

(b) $U(x)$ has a minimizer in the region (if it exists)

$$S_2 = \{x \mid f(x) < f(x_1^*),\ x \in \Omega\}.$$


Two successful globally convexized filled functions can be found in [8] as follows:

$$U_1(x, x_1^*, x_0, A, h) = \|x - x_0\| \times \arctan\{A[f(x) - f(x_1^*) + h]\},$$
$$U_2(x, x_1^*, x_0, A, h) = \|x - x_0\| \times \tanh\{A[f(x) - f(x_1^*) + h]\}.$$

In general, such globally convexized filled functions may be expressed by

$$U(x, x_1^*, x_0, A, h) = \eta(\|x - x_0\|)\, \phi(A[f(x) - f(x_1^*) + h])$$

for a large enough $A > 0$ and a suitable parameter $h$ such that

$$0 < h < f(x_1^*) - f(x^*),$$

where $x^*$ is a global minimizer of $f(x)$, $x_1^*$ is a local but not global minimizer of $f(x)$, and $\eta(t)$ and $\phi(t)$ are continuously differentiable univariate functions satisfying the following conditions [8]:

(i) $\eta(0) = 0$, $\eta'(t) \ge a > 0$, $\forall t > 0$.

(ii) $\phi(0) = 0$, and $\phi(t)$ is monotonically increasing for all $t \in \mathbb{R}$ (or for $t \in (-t_1, +\infty)$, where $t_1 > 0$).

(iii) $\phi'(t) > 0$, $\forall t \in \mathbb{R}$ (or $\phi'(t) > 0$, $\forall t \in (-t_1, +\infty)$, where $t_1 > 0$).

(iv) When $t \to +\infty$, $\phi'(t)$ decreases monotonically to 0 at least as fast as $1/t$.

Note that choices for these two functions can be $t$, $\tan(t)$, $e^t - 1, \dots$ for $\eta(t)$ and $\arctan t$, $\tanh(t)$, $1 - e^{-t}, \dots$ for $\phi(t)$.
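The sign structure behind properties (a) and (b) can be checked numerically for $U_1$ in one dimension. In the Python sketch below (illustrative; the test function, $x_1^* \approx 0.9603$, $x_0 = 1.5$, $A = 10$ and $h = 0.3$ are assumptions) the global minimum value of $f(x) = (x^2 - 1)^2 + 0.3x$ is about $-0.3054$ and $f(x_1^*) \approx 0.2942$, so $0 < h < f(x_1^*) - f(x^*)$ holds. Since $\arctan$ is positive for positive arguments, $U_1 \ge 0$ wherever $f(x) \ge f(x_1^*)$, with equality only at $x_0$, while $U_1 < 0$ wherever $f(x) < f(x_1^*) - h$ and $x \ne x_0$; hence a minimizer of $U_1$ over a grid of $[-2, 2]$ lands in $S_2$.

```python
import math


def f(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def U1(x, x1, x0, A, h):
    """Globally convexized filled function U_1 = |x - x0| * arctan(A[f(x) - f(x1) + h])."""
    return abs(x - x0) * math.atan(A * (f(x) - f(x1) + h))


x1, x0, A, h = 0.9603, 1.5, 10.0, 0.3
grid = [-2.0 + 4.0 * k / 4000 for k in range(4001)]
values = [U1(x, x1, x0, A, h) for x in grid]
x_min = grid[min(range(len(grid)), key=values.__getitem__)]   # grid minimizer of U_1
```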


Single-Parameter Filled Functions 


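In order to reduce the difficulty of coordinating $r$ and $\rho$ in a two-parameter filled function, several single-parameter filled functions were proposed in [7]:

$$Q(x, x_1^*, A) = -[f(x) - f(x_1^*)] \exp(A\|x - x_1^*\|^2),$$
$$\tilde Q(x, x_1^*, A) = -[f(x) - f(x_1^*)] \exp(A\|x - x_1^*\|),$$
$$V(x, x_1^*, A) = -\nabla f(x) - 2A[f(x) - f(x_1^*)](x - x_1^*),$$
$$\tilde V(x, x_1^*, A) = -\nabla f(x) - A[f(x) - f(x_1^*)]\frac{x - x_1^*}{\|x - x_1^*\|}.$$

The functions $V$ and $\tilde V$ are the gradients of $Q$ and $\tilde Q$ divided by the respective positive exponential factors.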
More and more single-parameter filled functions appeared afterwards. For example, the function

$$H(x, x_1^*, a) = \frac{1}{\ln[1 + f(x) - f(x_1^*)]} - a\|x - x_1^*\|^2$$

was proposed in [11]; it is defined only in the region where $f(x) > f(x_1^*) - 1$. The L function $L(x, x_1^*, a)$ and the mitigated $L_2$ function $ML_2(x, x_1^*, a)$ were proposed in [12] and [13], respectively, where $m > 1$ is a prefixed natural number, $\rho$ is a positive parameter, and $\phi$ is a mitigator. A function $\phi: \mathbb{R} \to \mathbb{R}$ is said to be a mitigator if it is twice continuously differentiable in its domain and has the following properties:

(i) $\phi(0) = 0$, $\phi'(t) > 0$, and $\phi''(t) < 0$ for all $t > 0$.

(ii) $\lim_{t \to +\infty} \phi(t)$ exists.

Note that the $ML_2$ function can significantly reduce the negative definite effect in the Hessian of a single-parameter filled function such as the L function. The numerical results and generalizations can be found in [12,13,14,15].

A more general form for the single-parameter filled functions can be expressed by

$$Q(x, A) = -\phi(f(x) - f(x_1^*)) \exp\big(A\, w(\|x - x_1^*\|^\beta)\big),$$

where $\beta > 1$, $A > 0$, and the functions $\phi(t)$ and $w(t)$ have the following properties [20]:

(i) $\phi(t)$ is continuously differentiable for $t \ge 0$.

(ii) $\phi(0) = 0$, $\phi'(t) > 0$, $\forall t \ge 0$.

(iii) $\phi'(t)/\phi(t)$ is monotonically decreasing for $t \in (0, +\infty)$.

(iv) $w(0) = 0$ and for any $t \in (0, +\infty)$, $w(t) > 0$ and $w'(t) \ge c > 0$.

Note that the choices for these functions can be $t$, $a^t - 1$ ($a > 1$), $\sinh(t), \dots$ for $\phi(t)$ and $t$, $\sinh(t)$, $e^t - 1, \dots$ for $w(t)$.

In order to avoid the influence of the exponential term, a general single-parameter filled function can be set by

$$U(x, A) = -\eta(f(x) - f(x_1^*)) - A\, w(\|x - x_1^*\|^\beta),$$

where the function $\eta(t)$ is continuous on $[0, +\infty)$ and differentiable in $(0, +\infty)$. Furthermore, the functions $\eta(t)$ and $w(t)$ have the following properties [20]:

(i) $\eta(0) = 0$;

(ii) $\eta'(t) > 0$ is monotonically decreasing for $t \in (0, +\infty)$ and $\lim_{t \to 0^+} \eta'(t) = +\infty$;

(iii) $w(0) = 0$ and for any $t \in (0, +\infty)$, $w(t) > 0$ and $w'(t) \ge c > 0$.


Nonsmooth Filled Functions 


It is well known that a constrained optimization problem can be formulated as a nonsmooth optimization problem by using an exact penalty function; see [3] or (3)–(5). Using methods of nonsmooth analysis, a nonsmooth unconstrained optimization problem was studied in [10], which involved the following modified filled function:

$$P(x, x_1^*, r, \rho) = \ln\Big(1 + \frac{1}{r + F(x)}\Big) \exp\Big(-\frac{\|x - x_1^*\|^2}{\rho^2}\Big),$$

where $F(x)$ is a weakly semismooth objective function and $x_1^*$ is a local minimizer of $F(x)$.
For a composite function $F(x)$ of the form

$$F(x) = f(x) + h(c(x)),$$

where $f(x)$ and $c(x) = (c_1(x), \dots, c_m(x))^T$ are smooth functions and $h: \mathbb{R}^m \to \mathbb{R}$ is convex but nonsmooth [2], a kind of two-parameter filled function

$$P(x, r, A) = \psi(r + F(x)) \exp(-A\|x - x_1^*\|^2)$$

was considered in [20], where the function $\psi(t)$ has the following properties:

(i) $\psi(t) > 0$ for $t > 0$.

(ii) $\psi(t)$ is monotonically decreasing for $t > 0$.

(iii) $\psi(t_1) - \psi(t_2) \le c_2(t_2 - t_1)$ for $t_2 > t_1 > 0$, where $c_2 > 0$ is a constant.

In addition, for the single-parameter filled functions, we can also consider the following general forms:

$$U(x, A) = -\phi(f(x) - f(x_1^*)) \exp(A\|x - x_1^*\|^2),$$

or

$$U(x, A) = -\phi(f(x) - f(x_1^*)) - A\|x - x_1^*\|,$$

where $A > 0$ is a parameter, and the function $\phi(t)$ is required to satisfy certain conditions [20]:

(i) $\phi(0) = 0$, and $\phi(t)$ is monotonically increasing for $t \ge 0$;

(ii) $c_1(t_2 - t_1) \le \phi(t_2) - \phi(t_1) \le c_2(t_2 - t_1)$ for $t_2 > t_1 > 0$, where $0 < c_1 < c_2$ are constants.

Note that even for a continuously differentiable unconstrained optimization problem, a nonsmooth filled function may be used. For example, a two-parameter nonsmooth filled function

$$P(x, x_1^*, r, \rho) = f(x_1^*) - \min[f(x_1^*), f(x)] - \rho\|x - x_1^*\|^2 + r\{\max[0, f(x) - f(x_1^*)]\}^2 \quad (8)$$

was introduced in [21], where $f(x)$ is coercive and Lipschitz continuous with a constant $L$ in $\mathbb{R}^n$.


Discrete Filled Functions 


After the concept of filled functions was introduced for continuous global optimization by Ge [5], some researchers tried to transform discrete global optimization problems into continuous ones and then to solve them by continuous filled function methods [6,17,23].

For the discrete case, since the third property of a continuous filled function usually does not hold, such an extension is not trivial. Difficulties may also occur when continuous optimization methods are applied to discrete optimization problems in which gradient vectors are unavailable or expensive to compute.

Discrete filled functions are related to the concept of a discrete neighborhood. The discrete neighborhood of a point $x \in \mathbb{Z}^n$ is usually defined by

$$N(x) = \{x,\ x \pm e_i \mid i = 1, 2, \dots, n\},$$

where $e_i$ is the $i$th unit vector (i.e., the $n$-dimensional vector with the $i$th component equal to 1 and all other components equal to 0). On the basis of the local search approach and the two-parameter filled function defined by (1), Zhu [23] proposed an approximate algorithm for a class of nonlinear integer programming problems

$$\min_{x \in \Omega \cap \mathbb{Z}^n} f(x),$$

where $\Omega$ is a bounded closed box with all vertices integral. The algorithm is a direct method, which tries to improve a current discrete local minimal solution by minimizing an associated filled function. In [23], the author used two examples to illustrate the numerical performance of the proposed algorithm.
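The neighborhood $N(x)$ and a discrete local search of the kind used as the local phase of such methods can be sketched in Python. The names, the test function and the integer box are assumptions; the local search moves to the best point of $N(x)$ inside the box (a discrete steepest descent) and stops at a discrete local minimizer, i.e., a point with no better neighbor.

```python
def neighborhood(x, lo, hi):
    """N(x) = {x, x +/- e_i : i = 1..n}, restricted to the integer box [lo, hi]."""
    pts = [tuple(x)]
    for i in range(len(x)):
        for s in (1, -1):
            y = list(x)
            y[i] += s
            if lo[i] <= y[i] <= hi[i]:
                pts.append(tuple(y))
    return pts


def discrete_steepest_descent(f, x, lo, hi):
    """Move to the best point of N(x) until x itself is best (a discrete local minimizer)."""
    x = tuple(x)
    while True:
        best = min(neighborhood(x, lo, hi), key=f)
        if f(best) >= f(x):
            return x
        x = best


f = lambda x: (x[0] - 3) ** 2 + (x[1] + 2) ** 2
x_loc = discrete_steepest_descent(f, (-5, 5), (-10, -10), (10, 10))
```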

In addition, based on the concept of the 1/5-neighborhood of an integer point $x$,

$$N(x) = \Big\{y \in \mathbb{R}^n \;\Big|\; \|y - x\|_\infty < \frac{1}{5}\Big\},$$

Ge and Huang [6] investigated unconstrained nonlinear integer programming, constrained nonlinear integer programming, and mixed nonlinear integer programming problems. For such cases, the authors used a penalty function to transform a nonlinear integer programming problem into a global optimization problem, which can be solved by the filled function method if the objective function is twice continuously differentiable in $\mathbb{R}^n$ and its gradient and Hessian matrix are bounded. In particular, when the constraints are equalities, all constraint functions are assumed to be twice continuously differentiable.

The unconstrained nonlinear integer programming model in [6] has the form:

$$\begin{aligned} \text{minimize} \quad & f(x), \\ \text{subject to} \quad & |x_i| \le b_i, \quad i = 1, 2, \dots, n, \\ & x \in \mathbb{Z}^n, \end{aligned} \qquad (9)$$

where each $b_i$ is an integer. Under certain conditions, if $x^*$ is a global minimizer of the penalty function

$$\phi_1(x, k) = f(x) - k \sum_{i=1}^n \cos 2\pi x_i$$

in the box $\{x \mid |x_i| \le b_i,\ i = 1, 2, \dots, n\}$ and $x^*$ is in a 1/5-neighborhood of an integer point $x$, then $x$ is a solution of problem (9).
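The role of the trigonometric term in $\phi_1$ can be illustrated numerically: at integer points every $\cos 2\pi x_i$ equals 1, so $\phi_1 = f - kn$ there, while at non-integer points $\phi_1 > f - kn$. The Python sketch below is illustrative: $f$, $b_i = 3$, $k = 5$ and the grid search that approximates the global minimizer of $\phi_1$ are assumptions, and `round_if_close` recovers the integer point when the minimizer lies in its 1/5-neighborhood (measured in the maximum norm, an assumption).

```python
import math


def phi1(f, x, k):
    """Penalty function phi_1(x, k) = f(x) - k * sum_i cos(2*pi*x_i)."""
    return f(x) - k * sum(math.cos(2.0 * math.pi * xi) for xi in x)


def round_if_close(x, radius=0.2):
    """Return the integer point within max-norm distance < 1/5 of x, else None."""
    xi = [round(v) for v in x]
    return xi if all(abs(v - w) < radius for v, w in zip(x, xi)) else None


f = lambda x: (x[0] - 1.3) ** 2 + (x[1] + 0.6) ** 2    # integer solution of (9): (1, -1)
k, b = 5.0, 3
steps = [-b + j / 50.0 for j in range(2 * b * 50 + 1)]  # grid on the box |x_i| <= 3
x_star = min(((u, v) for u in steps for v in steps), key=lambda x: phi1(f, x, k))
x_int = round_if_close(x_star)
```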

For some integer $m < n$, if the second constraint in (9), $x \in \mathbb{Z}^n$, is replaced by $x_i \in \mathbb{Z}$ ($i = m + 1, \dots, n$), then the corresponding problem is called the mixed nonlinear integer programming problem, for which a similar function

$$\phi_2(x, k) = f(x) - k \sum_{i=m+1}^{n} \cos 2\pi x_i$$

can be used as a penalty function.
Similarly, for a constrained nonlinear integer programming problem

$$\begin{aligned} \text{minimize} \quad & f(x), \\ \text{subject to} \quad & c_i(x) = 0, \quad i = 1, 2, \dots, p, \\ & \min\{0, c_i(x)\} = 0, \quad i = p + 1, \dots, q, \\ & |x_i| \le b_i, \quad i = 1, 2, \dots, n, \\ & x \in \mathbb{Z}^n, \end{aligned} \qquad (10)$$

some results can be derived by using the following penalty function:

$$\phi_3(x, r, k) = f(x) + r \sum_{i=1}^{p} c_i^2(x) + r \sum_{i=p+1}^{q} [\min\{0, c_i(x)\}]^2 - k \sum_{j=1}^{n} \cos 2\pi x_j.$$

The minimization of $\phi_3(x, r, k)$ can be dealt with by the filled function method proposed for constrained optimization problems [3].

For the discrete optimization problem

$$\min_{x \in X} f(x), \qquad X \subset \mathbb{Z}^n,$$

where $f$ is a Lipschitz function and $X$ is a bounded and (strictly) pathwise connected domain, Ng et al. [17] modified the definition of continuous filled functions so that they can be applied to discrete cases. A discrete filled function is defined as follows:

Given a discrete local minimizer $x^*$ of a function $f: X \subset \mathbb{Z}^n \to \mathbb{R}$, let $B^*$ be the discrete basin of $f$ at $x^*$ over $X$. A function $F: X \to \mathbb{R}$ is said to be a discrete filled function of $f$ at $x^*$ if it satisfies the following conditions:

(a) $x^*$ is a strict local maximizer of $F$ over $X$;

(b) $F$ has no discrete local minimizers in $B^*$ or in any discrete basin of $f$ higher than $B^*$;

(c) If $f$ has a discrete basin $B^{**}$ at $x^{**}$ that is lower than $B^*$, then there is a discrete point $x' \in B^{**}$ that minimizes $F$ on a discrete path from $x^*$ to $x^{**}$ in $X$.

On the basis of the two-parameter nonsmooth filled function defined by (8) at a local minimizer $x_1^*$, a two-phase algorithm for discrete global optimization problems was proposed in [17]. In phase 1, called the local search, a discrete steepest descent method was applied to find a local minimizer $x_1^*$ of $f$ over $X$. In phase 2, called the global search, the algorithm searched for a minimum of the discrete filled function defined by (8) on a discrete path in $X$ via some special search directions. The global search would identify a point $x'$ in a discrete basin lower than the discrete basin $B_1^*$ of $f$ at $x_1^*$. The algorithm stopped when minimizing a discrete filled function did not yield a better solution than the current best local minimizer.
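The local-search phase can be sketched as follows. This is only an illustration: the neighbourhood (unit moves $\pm e_i$ that stay inside a box $|x_i| \le b_i$) and all names are assumptions, because the text does not specify the exact discrete steepest descent rule used in [17].

```python
from typing import Callable, Iterator, Sequence, Tuple

Point = Tuple[int, ...]


def neighbours(x: Point, bounds: Sequence[int]) -> Iterator[Point]:
    """Unit coordinate moves x +/- e_i that stay inside |x_i| <= b_i (assumed neighbourhood)."""
    for i, xi in enumerate(x):
        for step in (-1, 1):
            yi = xi + step
            if abs(yi) <= bounds[i]:
                yield x[:i] + (yi,) + x[i + 1:]


def discrete_steepest_descent(f: Callable[[Point], float], x0: Sequence[int],
                              bounds: Sequence[int]) -> Point:
    """Phase 1 (local search): move to the best neighbour while it strictly improves f."""
    x = tuple(x0)
    fx = f(x)
    while True:
        best, f_best = x, fx
        for y in neighbours(x, bounds):
            fy = f(y)
            if fy < f_best:
                best, f_best = y, fy
        if best == x:  # no improving neighbour: x is a discrete local minimizer
            return x
        x, fx = best, f_best


if __name__ == "__main__":
    g = lambda p: 0.1 * (p[0] ** 2 - 9) ** 2 + p[0]   # discrete local minima at x = 3 and x = -3
    print(discrete_steepest_descent(g, (3,), (5,)))    # stays in the higher basin
    print(discrete_steepest_descent(g, (-1,), (5,)))   # descends into the lower basin
```

Started at $x = 3$, the local search stops in the higher basin. Escaping it is the job of phase 2, which depends on the filled function (8) and is not reproduced here.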


Summary 


Many existing filled function methods require the assumption that the objective function has only a finite number of local minimizers. In addition, they require that these local minima have different objective values. The assignment of the one or two parameters in a filled function is a very important issue. It ensures the existence of a specific point of the filled function from which a better local minimum of the original objective function can be found in a lower basin, if one exists. Note that even when such a lower basin exists, locating its local minimizer is itself a (reduced) optimization problem.

Furthermore, it is hard to find a general stopping criterion for the filled function methods, i.e., to check whether a feasible point obtained by any of the filled function methods is a global minimizer or not. These drawbacks indicate that the filled function methods will remain an active area of research. Researchers may consider broader approaches to solving global optimization problems, for example, using modified functions that include some nonfilled functions [19], or using locally filled functions integrated with techniques from cluster analysis [9,22].
Introduction


Given the wide variety of different global optimiza- 
tion techniques, every time we have a new optimiza- 
tion problem we must select the best technique for 
solving this problem. This selection problem is made 
more complex by the fact that most techniques for solv- 
ing global optimization problems have parameters that 
need to be adjusted to the problem or to the class of 
problems. For example, in gradient methods, one can 
select different step sizes. 

When we have only one or a few parameters to choose, it is possible to try many values empirically and arrive at an (almost) optimal value. Thus, in such situations, we can identify an optimal version of the corresponding technique. In other approaches, such as convex underestimator methods (described in detail in the next section), we have to select an auxiliary function instead of the value of a single numerical parameter. It is not practically possible to test all possible functions, so it is not easy to identify an optimal version of the corresponding technique [9].

This entry presents the work of Floudas and Kreinovich [9,10] on the functional forms of convex underestimators for twice continuously differentiable functions. They consider the problem of selecting the best auxiliary function within a given global optimization technique. Specifically, they showed that in many such selection situations, natural symmetry requirements enable one either to solve analytically the problem of finding the optimal auxiliary function, or at least to reduce this problem to the easier problem of finding a few parameters.

In particular, they showed that this approach explains both the αBB method [1,2,6,16] and the generalized αBB recently proposed in [4,5]. A recent review article on these deterministic global optimization approaches can be found in [8].


Selecting Convex Underestimators: The αBB Method


It is well known that convex functions are compu- 
tationally easier to minimize than non-convex ones 
(see [7]). This relative easiness is not only an empirical 
fact, it also has a theoretical justification (see [13,19]). 

Because of this relative easiness, one approach to minimizing a non-convex function $f(x) = f(x_1, \dots, x_n)$ (under certain constraints) over a box $[x^L, x^U] = [x_1^L, x_1^U] \times \dots \times [x_n^L, x_n^U]$ is to first minimize its convex “underestimator”, i.e., a convex function $L(x) \le f(x)$. Since $L(x)$ is an underestimator, the minimum of $L(x)$ is a lower bound for the minimum of $f(x)$. By selecting $L(x)$ as close to $f(x)$ as possible, we can get estimates for $\min f(x)$ that are as close to the actual minimum as possible.

The quality of approximation improves as the boxes become smaller. To get more accurate bounds on $\min f(x)$, we can bisect the box $[x^L, x^U]$ into sub-boxes within a regular branch-and-bound framework, use the above technique to estimate $\min f(x)$ at each node, and fathom branches where appropriate.

A known efficient approach to designing a convex underestimator is the αBB global optimization algorithm [1,2,6,16], in which we select an underestimator $L(x) = f(x) + \Phi(x)$, where

$$\Phi(x) = -\sum_{i=1}^{n} \alpha_i \,(x_i - x_i^L)(x_i^U - x_i). \tag{1}$$

Here, the parameters $\alpha_i$ are selected so that the resulting function $L(x)$ is convex but still not too far from the original objective function $f(x)$. For a thorough presentation of ways to select these parameters, see [1,2,3,11].
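Here is a minimal numerical sketch of (1) for a one-dimensional test function chosen only for illustration: $f(x) = \sin x$ on $[0, 2\pi]$. Since $f''(x) = -\sin x \ge -1$, the value $\alpha = 1/2$ gives $L''(x) = 1 - \sin x \ge 0$. The code checks on a grid that $L \le f$ and that the second differences of $L$ are nonnegative. The helper name `alpha_bb_underestimator` is an assumption.

```python
import math
from typing import Callable, Sequence


def alpha_bb_underestimator(f: Callable[[Sequence[float]], float], alpha: Sequence[float],
                            xl: Sequence[float], xu: Sequence[float]):
    """Return L(x) = f(x) + Phi(x) with Phi(x) = -sum_i alpha_i (x_i - xl_i)(xu_i - x_i), eq. (1)."""
    def L(x: Sequence[float]) -> float:
        phi = -sum(a * (xi - l) * (u - xi) for a, xi, l, u in zip(alpha, x, xl, xu))
        return f(x) + phi
    return L


# Illustrative 1-D case: f(x) = sin(x) on [0, 2*pi]; f''(x) = -sin(x) >= -1, so alpha = 0.5.
f = lambda x: math.sin(x[0])
xl, xu = [0.0], [2.0 * math.pi]
L = alpha_bb_underestimator(f, [0.5], xl, xu)

grid = [xl[0] + (xu[0] - xl[0]) * t / 200 for t in range(201)]
vals = [L([x]) for x in grid]
max_gap = max(f([x]) - v for x, v in zip(grid, vals))          # largest separation f - L on the grid
min_second_diff = min(vals[i - 1] - 2 * vals[i] + vals[i + 1] for i in range(1, 200))
print(max_gap, min_second_diff)
```

The largest separation, $\alpha (x^U - x^L)^2/4 = \pi^2/2$, is reached at the midpoint. This is why smaller boxes give tighter bounds.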

In many optimization problems, the αBB techniques are very efficient, but in some non-convex optimization problems it is desirable to improve their performance. One way to do that is to provide a more general class of methods, with more parameters to tune. In the αBB techniques, each coordinate $x_i$ has a single parameter $\alpha_i$ affecting it. Changing $\alpha_i$ is equivalent to a linear re-scaling of $x_i$. Indeed, if we change the unit for measuring $x_i$ to a new unit that is $\lambda_i$ times smaller, then all the numerical values become $\lambda_i$ times larger: $x_i \to y_i = g_i(x_i)$, where $g_i(x_i) = \lambda_i \cdot x_i$. In principle, we can have two different re-scalings:

• $x_i \to y_i = g_i(x_i) = \lambda_i \cdot x_i$ on the interval $[x_i^L, x_i]$, and
• $x_i \to z_i = h_i(x_i) = \mu_i \cdot x_i$ on the interval $[x_i, x_i^U]$.

If we substitute the new values $y_i = g_i(x_i)$ and $z_i = h_i(x_i)$ into the formula (1), then we get the following expression


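$$\Phi(x) = -\sum_{i=1}^{n} \alpha_i \,\bigl(g_i(x_i) - g_i(x_i^L)\bigr)\bigl(h_i(x_i^U) - h_i(x_i)\bigr). \tag{2}$$

For the above linear re-scalings, we get

$$\Phi(x) = -\sum_{i=1}^{n} \tilde{\alpha}_i \,(x_i - x_i^L)(x_i^U - x_i),$$

where $\tilde{\alpha}_i = \alpha_i \cdot \lambda_i \cdot \mu_i$.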
From this viewpoint, a natural generalization is to replace the linear re-scalings $g_i(x_i)$ and $h_i(x_i)$ with non-linear ones. That is, we consider convex underestimators of the type $L(x) = f(x) + \Phi(x)$, where $\Phi(x)$ is described by the formula (2) with non-linear functions $g_i(x_i)$ and $h_i(x_i)$. Now, instead of selecting a number $\alpha_i$ for each coordinate $i$, we have the additional freedom of choosing arbitrary non-linear functions $g_i(x_i)$ and $h_i(x_i)$. This raises the question of which choices are best. In [4,5], several different non-linear functions were tried. Among the tested functions, the best results were achieved for the exponential functions $g_i(x_i) = \exp(\gamma_i \cdot x_i)$ and $h_i(x_i) = -\exp(-\gamma_i \cdot x_i)$. For these functions, the expression (2) can be somewhat simplified. Indeed,


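$$
\begin{aligned}
\alpha_i \bigl(g_i(x_i) - g_i(x_i^L)\bigr)\bigl(h_i(x_i^U) - h_i(x_i)\bigr)
&= \alpha_i \bigl(e^{\gamma_i x_i} - e^{\gamma_i x_i^L}\bigr)\bigl(-e^{-\gamma_i x_i^U} + e^{-\gamma_i x_i}\bigr)\\
&= \tilde{\alpha}_i \bigl(e^{\gamma_i (x_i - x_i^L)} - 1\bigr)\bigl(e^{\gamma_i (x_i^U - x_i)} - 1\bigr),
\end{aligned}
$$

where $\tilde{\alpha}_i \stackrel{\text{def}}{=} \alpha_i \cdot e^{\gamma_i (x_i^L - x_i^U)}$.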
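The simplification can be checked numerically. The sketch below compares the term of (2), with $g_i(x_i) = e^{\gamma_i x_i}$ and $h_i(x_i) = -e^{-\gamma_i x_i}$, against the simplified product with $\tilde{\alpha}_i = \alpha_i e^{\gamma_i (x_i^L - x_i^U)}$. It uses randomly drawn boxes and parameters chosen only for illustration.

```python
import math
import random


def generalized_term(alpha, gamma, x, xl, xu):
    """alpha (g(x) - g(xl)) (h(xu) - h(x)) with g(t) = e^{gamma t}, h(t) = -e^{-gamma t}."""
    g = lambda t: math.exp(gamma * t)
    h = lambda t: -math.exp(-gamma * t)
    return alpha * (g(x) - g(xl)) * (h(xu) - h(x))


def simplified_term(alpha, gamma, x, xl, xu):
    """alpha_tilde (e^{gamma(x - xl)} - 1)(e^{gamma(xu - x)} - 1), alpha_tilde = alpha e^{gamma(xl - xu)}."""
    alpha_tilde = alpha * math.exp(gamma * (xl - xu))
    return alpha_tilde * (math.exp(gamma * (x - xl)) - 1.0) * (math.exp(gamma * (xu - x)) - 1.0)


random.seed(0)
max_rel_err = 0.0
for _ in range(1000):
    xl, xu = random.uniform(-3.0, 0.0), random.uniform(0.1, 3.0)
    x = random.uniform(xl, xu)
    alpha, gamma = random.uniform(0.0, 5.0), random.uniform(0.01, 2.0)
    a = generalized_term(alpha, gamma, x, xl, xu)
    b = simplified_term(alpha, gamma, x, xl, xu)
    max_rel_err = max(max_rel_err, abs(a - b) / max(1.0, abs(a)))
print(max_rel_err)  # expected to be at round-off level
```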

Two related questions naturally arise and are addressed in the work of Floudas and Kreinovich [9,10]:

• first, a practical question: the empirical choice was made by trying only finitely many functions. Is this choice indeed the best, or are there other, even better functions $g_i(x_i)$ and $h_i(x_i)$ that were not discovered because they were not tried?

• second, a theoretical question: how can we explain the above empirical fact?


Shift Invariance 


The starting point for measuring each coordinate $x_i$ is often a matter of arbitrary choice. Suppose a selection of the functions $g_i(x_i)$ and $h_i(x_i)$ is “optimal” (in some intuitive sense). Then the results of using these optimal functions should not change if we simply change the starting point for measuring $x_i$, that is, replace each value $x_i$ with a new value $x_i + s$, where $s$ is the shift in the starting point. Otherwise, if the “quality” of the resulting convex underestimators changed with the shift, we could apply a shift and get better functions $g_i(x_i)$ and $h_i(x_i)$. This contradicts the assumption that the selection of $g_i(x_i)$ and $h_i(x_i)$ is already optimal.

The “optimal” choices $g_i(x_i)$ and $h_i(x_i)$ can be determined from a requirement on each component $\alpha_i \bigl(g_i(x_i) - g_i(x_i^L)\bigr)\bigl(h_i(x_i^U) - h_i(x_i)\bigr)$ in the sum (2). Each component must be invariant under the corresponding shift, that is, the pair $(g_i, h_i)$ must satisfy the following definition.

Definition 1 A pair of smooth functions $(g(x), h(x))$ from real numbers to real numbers is shift-invariant if for every $s$ and $\alpha$, there exists $\tilde{\alpha}(\alpha, s)$ such that for every $x^L$, $x$, and $x^U$, we have

$$\alpha \bigl(g(x) - g(x^L)\bigr)\bigl(h(x^U) - h(x)\bigr) = \tilde{\alpha}(\alpha, s) \bigl(g(x+s) - g(x^L+s)\bigr)\bigl(h(x^U+s) - h(x+s)\bigr). \tag{3}$$


At first glance, shift invariance is a reasonable but weak property. It turns out, however, that this seemingly weak property almost uniquely determines the optimal selection. As Proposition 1 shows, only exponential and linear functions qualify.

Proposition 1 If a pair of functions $(g(x), h(x))$ is shift-invariant, then this pair is either exponential or linear, i.e., each of the functions $g(x)$ and $h(x)$ has the form $g(x) = A + C \cdot \exp(\gamma \cdot x)$ or $g(x) = A + k \cdot x$.


For a proof, see [9] or [10]. 
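A quick numerical illustration of Definition 1 follows; it is not taken from [9]. Take the exponential pair $g(x) = e^{\gamma x}$, $h(x) = -e^{-\gamma x}$, or the linear pair. For either one, the ratio of the unshifted to the shifted product is the same for every choice of $x^L, x, x^U$, so a single $\tilde{\alpha}(\alpha, s)$ works. For the cubic pair $g(x) = h(x) = x^3$, used here only as a counterexample, the ratio varies, so no single $\tilde{\alpha}$ exists.

```python
import math


def product(g, h, x, xl, xu):
    """(g(x) - g(xl)) (h(xu) - h(x)), the factor multiplying alpha in (2)."""
    return (g(x) - g(xl)) * (h(xu) - h(x))


def shift_ratios(g, h, s, samples):
    """Ratios of unshifted to shifted products; Definition 1 needs them independent of the sample."""
    return [product(g, h, x, xl, xu) / product(g, h, x + s, xl + s, xu + s)
            for (xl, x, xu) in samples]


samples = [(-1.0, 0.2, 1.5), (0.0, 0.7, 2.0), (-2.0, -1.0, 0.5)]
gamma, s = 0.8, 1.3
pairs = {
    "exponential": (lambda t: math.exp(gamma * t), lambda t: -math.exp(-gamma * t)),
    "linear": (lambda t: t, lambda t: t),
    "cubic": (lambda t: t ** 3, lambda t: t ** 3),
}
for name, (g, h) in pairs.items():
    print(name, [round(r, 6) for r in shift_ratios(g, h, s, samples)])
```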


Sign Invariance 


In addition to shift, another natural symmetry is changing the sign. If we require that the expression (2) remain invariant under a replacement of $x$ by $-x$, then we get the relation $h(x) = -g(-x)$ between $g(x)$ and $h(x)$. So, if a pair $(g(x), h(x))$ is shift-invariant and sign-invariant, then:

• either $g(x) = \exp(\gamma \cdot x)$ and $h(x) = -\exp(-\gamma \cdot x)$,
• or $g(x) = h(x) = x$.

In other words, the optimal generalized αBB scheme is either the original αBB [1,2,6,16] or the scheme with exponential functions described in [4,5].


Scale Invariance 


Sign-invariance can be viewed as a special case of scale-invariance. Scale-invariance addresses a change in the unit for measuring $x$, that is, transformations $x \to \lambda \cdot x$.

We have already shown that there are only two shift-invariant solutions: exponential and linear functions. Of these two solutions, only the linear one, corresponding to the original αBB, is scale-invariant. Thus, if we also require scale-invariance, we restrict ourselves to the original αBB technique and miss the (often better) exponential generalizations.

Imposing both shift- and scale-invariance is therefore too restrictive. One could still choose to employ scale-invariance alone, formally expressed as follows:


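Definition 2 A pair of smooth functions $(g(x), h(x))$ from real numbers to real numbers is scale-invariant if for every $\lambda$ and $\alpha$, there exists $\tilde{\alpha}(\alpha, \lambda)$ such that for every $x^L$, $x$, and $x^U$, we have

$$\alpha \bigl(g(x) - g(x^L)\bigr)\bigl(h(x^U) - h(x)\bigr) = \tilde{\alpha}(\alpha, \lambda) \bigl(g(\lambda x) - g(\lambda x^L)\bigr)\bigl(h(\lambda x^U) - h(\lambda x)\bigr). \tag{4}$$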


The following proposition applies. For a proof, see [9]. 


Proposition 2 If a pair of functions $(g(x), h(x))$ is scale-invariant, then this pair is either power-law or logarithmic, i.e., each of the functions $g(x)$ and $h(x)$ has the form $g(x) = A \cdot x^{\gamma}$ or $g(x) = A + k \cdot \ln(x)$.


From the theoretical viewpoint, these functions may look as good as the exponential functions coming from shift invariance, but in practice they do not work as well. The problem with these solutions is that they do not preserve smoothness. Both the linear and the exponential functions that come from shift-invariance are infinitely differentiable for all $x$. Hence, adding the corresponding term $\Phi(x)$ does not decrease the smoothness level of the objective function. In contrast, the functions $g(x) = x^{\gamma}$ that come from scale invariance are not infinitely differentiable at $x = 0$ unless $\gamma$ is a nonnegative integer. So, if we use scale invariance to select a convex underestimator, we end up with a new parameter $\gamma$ that can only take integer values. It is thus less flexible than the continuous-valued parameters coming from shift-invariance.


Generalization of Shift Invariance 


Instead of the expression (2), we can consider an even more general expression

$$\Phi(x) = -\sum_{i=1}^{n} \alpha_i \cdot a_i(x_i, x_i^L) \cdot b_i(x_i, x_i^U). \tag{5}$$

What can be concluded from shift-invariance in this more general case?


Definition 3 A pair of smooth functions $(a(x, x^L), b(x, x^U))$ from pairs of real numbers to real numbers is shift-invariant if for every $s$ and $\alpha$, there exists $\tilde{\alpha}(\alpha, s)$ such that for every $x^L$, $x$, and $x^U$, we have

$$\alpha \cdot a(x, x^L) \cdot b(x, x^U) = \tilde{\alpha}(\alpha, s) \cdot a(x+s, x^L+s) \cdot b(x+s, x^U+s). \tag{6}$$


Regarding such functions, Floudas and Kreinovich [9] proved the following proposition:

Proposition 3 If a pair of functions $(a(x, x^L), b(x, x^U))$ is shift-invariant, then

$$a(x, x^L) \cdot b(x, x^U) = A(x - x^L) \cdot B(x^U - x) \cdot e^{\gamma \cdot x}$$

for some functions $A(x)$ and $B(x)$ and for some real number $\gamma$.

Comment. If we additionally require that the expression $a(x, x^L) \cdot b(x, x^U)$ be invariant under $x \to -x$, then we conclude that $B(x) = A(x)$.

Another shift-invariance result comes from the following observation. Both the αBB expression

$$(x - x^L)(x^U - x)$$

and the generalized expression

$$\bigl(1 - e^{\gamma (x - x^L)}\bigr)\bigl(1 - e^{\gamma (x^U - x)}\bigr)$$

have the form $a(x - x^L) \cdot a(x^U - x)$ with $a(0) = 0$. The differences $x - x^L$ and $x^U - x$ come from the requirement that these expressions be shift-invariant. The product form makes sense, since we want the product to be 0 on each border $x = x^L$ and $x = x^U$ of the corresponding interval $[x^L, x^U]$.

On the other hand, it is well known that optimizing a product is more difficult than optimizing a sum. Since we will be minimizing the expression $f(x) + \Phi(x)$, it is desirable to reformulate it as an easier-to-minimize sum, e.g., as $b(x - x^L) + b(x^U - x) + c(x^U - x^L)$ for some functions $b$ and $c$. For minimization purposes, $c$ does not depend on $x$ and is thus a constant. Both the αBB expression and its exponential generalization allow such a representation. Note that:

$$-(x - x^L)(x^U - x) = \tfrac{1}{2}(x - x^L)^2 + \tfrac{1}{2}(x^U - x)^2 - \tfrac{1}{2}(x^U - x^L)^2$$

and

$$-\bigl(1 - e^{\gamma (x - x^L)}\bigr)\bigl(1 - e^{\gamma (x^U - x)}\bigr) = -1 + e^{\gamma (x - x^L)} + e^{\gamma (x^U - x)} - e^{\gamma (x^U - x^L)}.$$
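Both decompositions can be verified numerically. The helper below is an illustrative sketch that returns the residuals of the two identities for given $x^L \le x \le x^U$ and $\gamma$. In the form $b(x - x^L) + b(x^U - x) + c(x^U - x^L)$, the αBB product uses $b(z) = z^2/2$ and $c(z) = -z^2/2$. The exponential product uses $b(z) = e^{\gamma z}$ and $c(z) = -1 - e^{\gamma z}$.

```python
import math
import random


def residuals(xl: float, x: float, xu: float, gamma: float):
    """Residuals of the two product-to-sum identities (both should be ~0)."""
    u, v, d = x - xl, xu - x, xu - xl
    # alpha-BB product: -(u v) = b(u) + b(v) + c(d) with b(z) = z^2/2, c(z) = -z^2/2
    lin = -u * v - (0.5 * u * u + 0.5 * v * v - 0.5 * d * d)
    # exponential product: b(z) = e^{gamma z}, c(z) = -1 - e^{gamma z}
    exp_lhs = -(1.0 - math.exp(gamma * u)) * (1.0 - math.exp(gamma * v))
    exp_rhs = math.exp(gamma * u) + math.exp(gamma * v) - 1.0 - math.exp(gamma * d)
    return lin, exp_lhs - exp_rhs


random.seed(1)
worst = 0.0
for _ in range(1000):
    xl = random.uniform(-2.0, 0.0)
    xu = random.uniform(0.5, 2.0)
    x = random.uniform(xl, xu)
    gamma = random.uniform(0.0, 1.5)
    worst = max(worst, *map(abs, residuals(xl, x, xu, gamma)))
print(worst)  # round-off level
```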
Interestingly, the above two expressions are the only ones that have this easy-to-compute property:

Definition 4 We say that a smooth function $a(x)$ from real numbers to real numbers describes an easy-to-compute underestimator if $a(0) = 0$, $a'(0) \ne 0$, and there exist smooth functions $b(x)$ and $c(x)$ such that for every $x$, $x^L$, and $x^U$, we have

$$a(x - x^L) \cdot a(x^U - x) = b(x - x^L) + b(x^U - x) + c(x^U - x^L). \tag{7}$$


The condition $a'(0) \ne 0$ is needed for the following reason. Otherwise, for small differences $x - x^L$ and $x^U - x$, each value $a(x - x^L)$ would be (at least) quadratic in $x - x^L$, and the resulting product would be of fourth order. We would then be unable to compensate for quadratic non-convex terms in the original objective function $f(x)$, which defeats the purpose of using $f(x) + \Phi(x)$ as a convex underestimator.

Proposition 4 The only functions that describe easy-to-compute underestimators are $a(x) = k \cdot x$ and $a(x) = k \cdot (1 - e^{\gamma \cdot x})$.


This is another shift-invariance related result that is also 
proven in [9]. It selects linear and exponential functions 
as “the best” in some reasonable sense. Floudas and 
Kreinovich [9] proved that any “natural” shift-invariant 
optimality criterion on the set of all possible underesti- 
mator methods selects either a linear or an exponential 
function. 


Final Remarks 


The work of Floudas and Kreinovich [9,10] reaches well beyond the αBB-based convex underestimation mainly discussed here. A symmetry-based approach also leads to optimal techniques for bisection (selecting box-splitting strategies) and for the selection of penalty and barrier functions. Other empirically optimal techniques can also be explained by symmetry-based arguments. These include the “epsilon-inflation” technique [15,18], results in simulated annealing and genetic algorithms [17], and the optimal selection of probabilities in swarm optimization [12,14].


References 


1. Adjiman CS, Androulakis IP, Floudas CA (1998) A Global Optimization Method, αBB, for General Twice-Differentiable Constrained NLPs II. Implementation and Computational Results. Comput Chem Eng 22:1159-1179

2. Adjiman CS, Dallwig S, Floudas CA, Neumaier A (1998) A Global Optimization Method, αBB, for General Twice-Differentiable Constrained NLPs I. Theoretical Advances. Comput Chem Eng 22:1137-1158

3. Adjiman CS, Floudas CA (1996) Rigorous Convex Underestimators for General Twice-Differentiable Problems. J Global Optim 9:23-40

4. Akrotirianakis IG, Floudas CA (2004) A New Class of Improved Convex Underestimators for Twice Continuously Differentiable Constrained NLPs. J Global Optim 30:367-390

5. Akrotirianakis IG, Floudas CA (2004) Computational Experience with a New Class of Convex Underestimators: Box-Constrained NLP Problems. J Global Optim 29:249-264

6. Androulakis IP, Maranas CD, Floudas CA (1995) αBB: A Global Optimization Method for General Constrained Nonconvex Problems. J Global Optim 7:337-363

7. Floudas CA (2000) Deterministic Global Optimization: Theory, Algorithms and Applications. Kluwer, Dordrecht

8. Floudas CA, Akrotirianakis IG, Caratzoulas S, Meyer CA, Kallrath J (2005) Global Optimization in the 21st Century: Advances and Challenges. Comput Chem Eng 29:1185-1202

9. Floudas CA, Kreinovich V (2006) Towards Optimal Techniques for Solving Global Optimization Problems: Symmetry-Based Approach. In: Törn A, Žilinskas J (eds) Models and Algorithms for Global Optimization. Springer, Berlin, pp 21-42

10. Floudas CA, Kreinovich V (2007) On the Functional Form of Convex Underestimators for Twice Continuously Differentiable Functions. Optim Lett 1:187-192

11. Hertz D, Adjiman CS, Floudas CA (1999) Two results on bounding the roots of interval polynomials. Comput Chem Eng 23:1333-1339

12. Iourinski D, Starks SA, Kreinovich V, Smith SF (2002) Swarm Intelligence: Theoretical Proof that Empirical Techniques are Optimal. In: Proceedings of the 2002 World Automation Congress WAC 2002, Orlando, FL, pp 107-112

13. Kearfott RB, Kreinovich V (2005) Beyond Convex? Global Optimization is Feasible only for Convex Objective Functions: A Theorem. J Global Optim 33:617-624




14. Kennedy J, Eberhart R, Shi Y (2001) Swarm Intelligence. Morgan Kaufmann, New York

15. Kreinovich V, Starks SA, Meyer G (1997) On a Theoretical Justification of the Choice of Epsilon-Inflation in PASCAL-XSC. Reliable Comput 3:437-452

16. Maranas CD, Floudas CA (1994) Global Minimum Potential Energy Conformations of Small Molecules. J Global Optim 4:135-170

17. Nguyen HT, Kreinovich V (1997) Applications of Continuous Mathematics to Computer Science. Kluwer, Dordrecht

18. Rump SM (1998) A Note on Epsilon-Inflation. Reliable Comput 4:371-375

19. Vavasis SA (1991) Nonlinear Optimization: Complexity Issues. Oxford University Press, Oxford


Global Optimization: g-αBB Approach


CHRYSANTHOS E. GOUNARIS, 
CHRISTODOULOS A. FLOUDAS 
Department of Chemical Engineering, 
Princeton University, Princeton, USA 


MSC2000: 49M37, 65K10, 90C26, 90C30, 46N10, 
47N10 


Article Outline 


Keywords and Phrases 

Introduction 

The New Relaxation Term 

The New Underestimating Function 
Selection of Appropriate Parameter Values 
Computational Results 

References 


Keywords and Phrases 


Convex underestimators; αBB; Global optimization


Introduction 


Various deterministic global optimization algorithms that use a branch and bound framework rely on convex underestimators of the functions under consideration. For a recent review of such approaches, see [7]. For arbitrarily nonconvex $C^2$-continuous functions $f(x)$ defined on the domain $X = [x^L, x^U]$, the αBB underestimator [1,2,3,6,10] is typically used. It is constructed by adding the following separable relaxation term $\Phi(x; \alpha)$ to the original function:

$$\Phi(x; \alpha) = -\sum_{i=1}^{n} \alpha_i (x_i - x_i^L)(x_i^U - x_i), \tag{1}$$

where $\alpha_i \ge 0$, $i = 1, 2, \dots, n$. The resulting underestimator of $f(x)$ is thus

$$L_{\alpha BB}(x; \alpha) = f(x) + \Phi(x; \alpha). \tag{2}$$


Since the relaxation term is separable, the following relationship exists among the Hessian matrices of $L_{\alpha BB}(x; \alpha)$, $f(x)$ and $\Phi(x; \alpha)$:

$$\nabla^2 L_{\alpha BB}(x; \alpha) = \nabla^2 f(x) + 2\Delta, \tag{3}$$

where $\Delta = \operatorname{diag}\{\alpha_1, \alpha_2, \dots, \alpha_n\}$, so that $\nabla^2 \Phi(x; \alpha) = 2\Delta$. The addition of the relaxation term thus corresponds to a diagonal shift of the Hessian matrix. Therefore, if we select large enough values for the $\alpha_i$ parameters, the nonconvexities in the original function can be overpowered, and the resulting underestimator $L_{\alpha BB}(x; \alpha)$ becomes convex.

A number of rigorous methods have been devised to select appropriate values for these parameters [1,2,3,8]. Extensive computational testing of the algorithm [3] showed that the most efficient of those methods is the one based on the scaled Gerschgorin theorem. According to this method, it suffices to select

$$\alpha_i = \max\left\{0,\; -\frac{1}{2}\left(h_{ii}^L - \sum_{j \ne i} \max\bigl\{|h_{ij}^L|, |h_{ij}^U|\bigr\}\, \frac{x_j^U - x_j^L}{x_i^U - x_i^L}\right)\right\}, \tag{4}$$

where $h_{ij}^L$ and $h_{ij}^U$ are lower and upper bounds of $\partial^2 f / \partial x_i \partial x_j$ that can be calculated by interval analysis.
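Equation (4) can be transcribed directly, as sketched below. The sketch assumes the interval Hessian is supplied as two $n \times n$ lists `h_lo` and `h_up` of elementwise lower and upper bounds. Computing those bounds, e.g. by interval arithmetic, is outside the sketch. The bilinear function $f(x_1, x_2) = x_1 x_2$ on $[-1, 1]^2$ serves only as an example. Its Hessian is constant with off-diagonal entries 1, and (4) gives $\alpha_1 = \alpha_2 = 1/2$.

```python
from typing import List, Sequence


def scaled_gerschgorin_alpha(h_lo: Sequence[Sequence[float]], h_up: Sequence[Sequence[float]],
                             xl: Sequence[float], xu: Sequence[float]) -> List[float]:
    """alpha_i of eq. (4) from elementwise bounds h_lo[i][j] <= d2f/dxi dxj <= h_up[i][j]."""
    n = len(xl)
    width = [u - l for l, u in zip(xl, xu)]
    alphas = []
    for i in range(n):
        # Off-diagonal radius, scaled by the box widths (x_j^U - x_j^L) / (x_i^U - x_i^L).
        radius = sum(max(abs(h_lo[i][j]), abs(h_up[i][j])) * width[j] / width[i]
                     for j in range(n) if j != i)
        alphas.append(max(0.0, -0.5 * (h_lo[i][i] - radius)))
    return alphas


if __name__ == "__main__":
    # f(x1, x2) = x1 * x2 on [-1, 1]^2: Hessian [[0, 1], [1, 0]] everywhere.
    H = [[0.0, 1.0], [1.0, 0.0]]
    print(scaled_gerschgorin_alpha(H, H, [-1.0, -1.0], [1.0, 1.0]))
```

For the example, (3) then gives the Hessian $\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}$, which has eigenvalues 0 and 2 and is therefore positive semidefinite.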

The g-αBB approach was developed in [4,5]. It offers an alternative to the convex underestimation functional form originally proposed in the αBB theory. The new relaxation scheme subtracts a similar separable term that is exponential rather than quadratic in nature.


The New Relaxation Term 


In this section, we present the new relaxation function. It shares most of the characteristics of the relaxation function $\Phi(x; \alpha)$ used in the original αBB underestimator described above. However, it has novel additional properties that allow it to yield convex underestimators that are tighter, i.e., closer to the original function. Thus, the new underestimators can help expedite the branch and bound process of the overall global optimization framework.
The new relaxation function is defined as follows:

$$\Phi(x; \gamma) = -\sum_{i=1}^{n} \bigl(1 - e^{\gamma_i (x_i - x_i^L)}\bigr)\bigl(1 - e^{\gamma_i (x_i^U - x_i)}\bigr), \tag{5}$$

where $\gamma = (\gamma_1, \gamma_2, \dots, \gamma_n)$ is a vector of nonnegative parameters. As explained later, these parameters play a role similar to that of the $\alpha_i$'s in the original αBB method.


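The gradient of $\Phi(x; \gamma)$ is

$$\nabla \Phi(x; \gamma) = \begin{pmatrix} \gamma_1 e^{\gamma_1 (x_1 - x_1^L)} - \gamma_1 e^{\gamma_1 (x_1^U - x_1)} \\ \gamma_2 e^{\gamma_2 (x_2 - x_2^L)} - \gamma_2 e^{\gamma_2 (x_2^U - x_2)} \\ \vdots \\ \gamma_n e^{\gamma_n (x_n - x_n^L)} - \gamma_n e^{\gamma_n (x_n^U - x_n)} \end{pmatrix}$$

and its Hessian is the diagonal matrix

$$\nabla^2 \Phi(x; \gamma) = \operatorname{diag}_{i=1,\dots,n}\left\{\gamma_i^2 e^{\gamma_i (x_i - x_i^L)} + \gamma_i^2 e^{\gamma_i (x_i^U - x_i)}\right\}.$$

Note that $\nabla^2 \Phi(x; \gamma)$ is a function of $x$. In contrast, the Hessian matrix of $\Phi(x; \alpha)$ used in αBB is constant throughout the domain $X$.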
The new relaxation function $\Phi(x; \gamma)$ has the following important properties:
P1: $\Phi(x; \gamma) \le 0$ for all $x \in [x^L, x^U]$.
P2: $\Phi(x; \gamma) = 0$ at the corner points of the interval $[x^L, x^U]$.
P3: $\Phi(x; \gamma)$ is a convex function.
P4: $\Phi(x; \gamma)$ achieves its minimum at the middle point $x^M$ of $X$ and its maximum at the corner points.
P5: The $i$th diagonal element of $\nabla^2 \Phi(x; \gamma)$ is a convex function of $x_i$. It achieves its minimum at the middle point and its maximum at the endpoints of $[x_i^L, x_i^U]$.
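The sketch below evaluates (5), its gradient and its Hessian diagonal, and spot-checks properties P1, P2 and P4 on a small two-dimensional box. The box and the $\gamma$ values are arbitrary illustrative choices, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def phi_gamma(x: Sequence[float], gamma: Sequence[float],
              xl: Sequence[float], xu: Sequence[float]) -> float:
    """Phi(x; gamma) = -sum_i (1 - e^{g_i (x_i - xl_i)}) (1 - e^{g_i (xu_i - x_i)}), eq. (5)."""
    return -sum((1.0 - math.exp(g * (xi - l))) * (1.0 - math.exp(g * (u - xi)))
                for xi, g, l, u in zip(x, gamma, xl, xu))


def grad_phi_gamma(x, gamma, xl, xu) -> List[float]:
    """Component i: g_i e^{g_i (x_i - xl_i)} - g_i e^{g_i (xu_i - x_i)}."""
    return [g * math.exp(g * (xi - l)) - g * math.exp(g * (u - xi))
            for xi, g, l, u in zip(x, gamma, xl, xu)]


def hess_diag_phi_gamma(x, gamma, xl, xu) -> List[float]:
    """Diagonal entry i: g_i^2 (e^{g_i (x_i - xl_i)} + e^{g_i (xu_i - x_i)}), positive for g_i > 0."""
    return [g * g * (math.exp(g * (xi - l)) + math.exp(g * (u - xi)))
            for xi, g, l, u in zip(x, gamma, xl, xu)]


if __name__ == "__main__":
    xl, xu, gamma = [0.0, -1.0], [2.0, 3.0], [0.7, 1.2]
    mid = [(l + u) / 2 for l, u in zip(xl, xu)]
    corners = [[a, b] for a in (xl[0], xu[0]) for b in (xl[1], xu[1])]
    print([phi_gamma(c, gamma, xl, xu) for c in corners])   # P2: zero at the corners
    print(grad_phi_gamma(mid, gamma, xl, xu))               # stationary at the midpoint (P4)
    print(phi_gamma(mid, gamma, xl, xu))                    # P1/P4: most negative value
```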


The New Underestimating Function 


The new underestimating function $L_1(x; \gamma)$ is formed by adding $\Phi(x; \gamma)$ to the nonconvex function $f(x)$, that is,

$$L_1(x; \gamma) = f(x) + \Phi(x; \gamma). \tag{6}$$

The Hessian of $L_1$ is

$$\nabla^2 L_1(x; \gamma) = \nabla^2 f(x) + \nabla^2 \Phi(x; \gamma).$$

The underestimator $L_1(x; \gamma)$ has the following important properties:

U1: $L_1(x; \gamma)$ is an underestimator of $f(x)$.

U2: $L_1(x; \gamma)$ matches $f(x)$ at all corner points of $X$.

U3: The maximum separation distance between the nonconvex function $f(x)$ and its underestimator $L_1(x; \gamma)$ is bounded.

U4: The underestimators constructed over supersets of the current set are always less tight than the underestimator constructed over the current box constraints.

The function $\Phi(x; \gamma)$ is convex for every $x \in X$ and $\gamma \ge 0$, and its Hessian diagonal grows with $\gamma_i$. Consequently, all nonconvexities in the original function $f(x)$ can be eliminated, provided that the parameters $\gamma_i$ have appropriate values. The selection of these values is presented in the next section.


Selection of Appropriate Parameter Values 


The initial values for the $\gamma_i$ parameters are selected by solving the following system of nonlinear equations:

$$\ell_i + \gamma_i^2 \bigl(1 + e^{\gamma_i (x_i^U - x_i^L)}\bigr) = 0, \quad i = 1, 2, \dots, n, \tag{7}$$

where $\ell_i \le 0$, $i = 1, 2, \dots, n$. The parameters $\ell_i$ carry second-order characteristics of the original nonconvex function into the construction of the underestimator. Candidate values for these parameters can be selected as follows:

$$\ell_i = -2\alpha_i, \quad i = 1, 2, \dots, n, \tag{8}$$

where $\alpha_i \ge 0$, $i = 1, 2, \dots, n$, are the parameters of the original αBB method, as given by (4).
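The $\gamma_i$ can be computed one coordinate at a time. The sketch below assumes that (7) has the form $\ell_i + \gamma_i^2 (1 + e^{\gamma_i (x_i^U - x_i^L)}) = 0$, with $\ell_i = -2\alpha_i$ as in (8). The left-hand side increases with $\gamma_i \ge 0$ and is nonpositive at $\gamma_i = 0$, so a bracketing bisection suffices. The function name and tolerance are illustrative choices.

```python
import math


def gamma_from_alpha_bb(alpha: float, width: float, tol: float = 1e-12) -> float:
    """Solve ell + gamma^2 (1 + e^{gamma * width}) = 0 with ell = -2 alpha (eq. (8)) for gamma >= 0."""
    ell = -2.0 * alpha
    F = lambda g: ell + g * g * (1.0 + math.exp(g * width))   # increasing for g >= 0, F(0) = ell <= 0
    lo, hi = 0.0, 1.0
    while F(hi) < 0.0:            # enlarge the bracket until the root lies in [lo, hi]
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Example: alpha = 1 (from (4)) on an interval of width 1.
    g = gamma_from_alpha_bb(1.0, 1.0)
    print(g, g * g * (1.0 + math.exp(g)))   # the second value should be close to 2 = -ell
```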
Akrotirianakis and Floudas [4] proved that this selection of the $\gamma_i$ parameters always results in an underestimator that is tighter than the one from the original method, i.e., (2). However, this new underestimator is not necessarily convex. They also proved that some selection of the $\gamma_i$ parameters always exists that results in a convex underestimator.

They therefore developed a systematic procedure that determines values for all parameters $\gamma_i$. These values guarantee the convexity of the underestimating function $L_1(x; \gamma)$ and also ensure that $L_1(x; \gamma)$ is at least as tight as the underestimating function $L_{\alpha BB}(x; \alpha)$. The procedure is an iterative scheme based on interval analysis and consecutive partitions of the domain $X$. Before presenting the scheme, we state two additional relevant results from [4]:


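Theorem 1 Let $\underline{\gamma} = (\underline{\gamma}_1, \underline{\gamma}_2, \dots, \underline{\gamma}_n)^T$ be the solution of system (7), with $\ell_i$ defined by (8). Then, the two underestimators $L_1(x; \underline{\gamma})$ and $L_{\alpha BB}(x; \underline{\alpha})$, where

$$\underline{\alpha} = \left(\frac{4\bigl(1 - e^{0.5\underline{\gamma}_1 (x_1^U - x_1^L)}\bigr)^2}{(x_1^U - x_1^L)^2}, \dots, \frac{4\bigl(1 - e^{0.5\underline{\gamma}_n (x_n^U - x_n^L)}\bigr)^2}{(x_n^U - x_n^L)^2}\right)^T, \tag{9}$$

have the same maximum separation distance from $f(x)$.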


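Theorem 2 Let $\bar{\alpha} = (\bar{\alpha}_1, \bar{\alpha}_2, \dots, \bar{\alpha}_n)^T$ be the values of the $\alpha$ parameters as computed by (4). Then, the two underestimators $L_1(x; \bar{\gamma})$ and $L_{\alpha BB}(x; \bar{\alpha})$, where

$$\bar{\gamma} = \left(\frac{2\log\bigl(1 + \sqrt{\bar{\alpha}_1}\,(x_1^U - x_1^L)/2\bigr)}{x_1^U - x_1^L}, \dots, \frac{2\log\bigl(1 + \sqrt{\bar{\alpha}_n}\,(x_n^U - x_n^L)/2\bigr)}{x_n^U - x_n^L}\right)^T,$$

have the same maximum separation distance from $f(x)$.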
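The two conversions can be written as small helper functions, sketched below per coordinate for an interval of width $d$. The sketch also encodes the maximum separations that the theorems equate. For the αBB term it is $\alpha d^2/4$, and for the exponential term it is $(1 - e^{\gamma d/2})^2$; both are attained at the midpoint. The round trip $\gamma \to \alpha \to \gamma$ in the test is a consistency check, not part of [4].

```python
import math


def alpha_from_gamma(gamma: float, d: float) -> float:
    """Eq. (9): the alpha whose alpha-BB term has the same maximum separation as gamma."""
    return 4.0 * (1.0 - math.exp(0.5 * gamma * d)) ** 2 / d ** 2


def gamma_from_alpha(alpha: float, d: float) -> float:
    """Theorem 2: the gamma whose exponential term has the same maximum separation as alpha."""
    return 2.0 * math.log(1.0 + math.sqrt(alpha) * d / 2.0) / d


def max_sep_alpha(alpha: float, d: float) -> float:
    """alpha (x - xl)(xu - x) is largest at the midpoint, where it equals alpha d^2 / 4."""
    return alpha * d * d / 4.0


def max_sep_gamma(gamma: float, d: float) -> float:
    """(1 - e^{gamma u})(1 - e^{gamma v}) with u + v = d is largest at u = v = d/2."""
    return (1.0 - math.exp(0.5 * gamma * d)) ** 2


if __name__ == "__main__":
    d, alpha_bar = 1.0, 1.0
    gamma_bar = gamma_from_alpha(alpha_bar, d)       # upper end of the search interval for gamma
    print(gamma_bar, max_sep_alpha(alpha_bar, d), max_sep_gamma(gamma_bar, d))
```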


The main result of the above two theorems is the following. For any $\gamma \in [\underline{\gamma}, \bar{\gamma}]$ there exists an $\alpha \in [\underline{\alpha}, \bar{\alpha}]$ such that the underestimators $L_1(x; \gamma)$ and $L_{\alpha BB}(x; \alpha)$ have the same maximum separation distance from the nonconvex function $f(x)$. Of all these pairs of underestimators, the only one known a priori to be convex is $L_{\alpha BB}(x; \bar{\alpha})$, since it results from the classical αBB method. However, the examples presented later show that the underestimators $L_{\alpha BB}(x; \alpha)$ and $L_1(x; \gamma)$ are convex within a large portion of the intervals $[\underline{\alpha}, \bar{\alpha}]$ and $[\underline{\gamma}, \bar{\gamma}]$, respectively. These observations suggest searching for a vector $\gamma$ in $[\underline{\gamma}, \bar{\gamma}]$, or a vector $\alpha$ in $[\underline{\alpha}, \bar{\alpha}]$, such that at least one of the underestimators $L_1(x; \gamma)$ and $L_{\alpha BB}(x; \alpha)$ is convex.


The algorithm described below was developed in [4] to select values for the $\gamma$ parameters such that the corresponding underestimator is convex and at least as tight as the underestimator of the classical αBB method. It searches for a vector $\gamma \in [\underline{\gamma}, \bar{\gamma}]$ whose corresponding $\alpha \in [\underline{\alpha}, \bar{\alpha}]$ produces a convex underestimating function $L_{\alpha BB}(x; \alpha)$. The search starts by setting $\gamma = \underline{\gamma}$ and $\alpha = \underline{\alpha}$ and then checks whether $L_{\alpha BB}(x; \alpha)$ is convex. To do so, it uses the scaled Gerschgorin method to determine lower bounds on the eigenvalues of the Hessian matrix $\nabla^2 L_{\alpha BB}(x; \alpha)$. For each negative lower bound, the interval of the corresponding variable is bisected. This generates a number of subdomains, which are stored in a list denoted by $A_1$. The algorithm then checks, again with the scaled Gerschgorin method, whether $\nabla^2 L_{\alpha BB}(x; \alpha)$ is positive semidefinite in each of those subdomains. If the size of the list $A_1$ exceeds a certain number of nodes, then $\nabla^2 L_{\alpha BB}(x; \alpha)$ is most likely not positive semidefinite. In that case, all $\gamma_i$'s are increased by a prespecified factor $\eta > 1$, and the corresponding new $\alpha_i$'s are calculated. The algorithm then tries to verify whether $\nabla^2 L_{\alpha BB}(x; \alpha)$ with the increased $\alpha$ parameters is positive semidefinite. It continues in this manner until the list $A_1$ becomes empty. At that point, the corresponding $\alpha$ values make the Hessian matrix $\nabla^2 L_{\alpha BB}(x; \alpha)$ positive semidefinite for all $x \in X$, and consequently $L_{\alpha BB}(x; \alpha)$ is a convex underestimator. The algorithm works with $L_{\alpha BB}(x; \alpha)$ rather than $L_1(x; \gamma)$ mainly because positive semidefiniteness is easier to verify for $\nabla^2 L_{\alpha BB}(x; \alpha)$ than for $\nabla^2 L_1(x; \gamma)$.
For more details, see Algorithm 1.

Termination of the above algorithm is guaranteed because $L_{\alpha BB}(x; \bar{\alpha})$ is known a priori to be a convex underestimator.
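To make the control flow concrete, here is a deliberately simplified one-variable sketch ($n = 1$) of the selection loop. It is not the implementation of [4], and it makes several assumptions:

- The hypothetical callback `hess_lower(lo, hi)` stands in for the interval-analysis lower bound on $f''$ over a subinterval.
- (7) is taken in the form $\ell + \gamma^2 (1 + e^{\gamma d}) = 0$ with $\ell = -2\bar{\alpha}$.
- The list of subdomains is reset for every $K$.
- The test function $f(x) = x^4/12 - x^3/6$ on $[0, 1]$ is an illustrative choice. Its naive interval bound on $f''(x) = x^2 - x$ over the whole box is $-1$, which is looser than the true minimum $-1/4$.

```python
import math
from typing import Callable


def gamma_from_ell(ell: float, d: float, tol: float = 1e-12) -> float:
    """Root gamma >= 0 of ell + gamma^2 (1 + e^{gamma d}) = 0 (assumed form of (7))."""
    F = lambda g: ell + g * g * (1.0 + math.exp(g * d))
    lo, hi = 0.0, 1.0
    while F(hi) < 0.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)


def alpha_from_gamma(gamma: float, d: float) -> float:
    """Eq. (9) for a single coordinate of width d."""
    return 4.0 * (1.0 - math.exp(0.5 * gamma * d)) ** 2 / d ** 2


def select_alpha_1d(hess_lower: Callable[[float, float], float], xl: float, xu: float,
                    alpha_bar: float, eta: float = 1.1, j_max: int = 64) -> float:
    """Simplified n = 1 version of the gamma/alpha selection loop (illustrative only)."""
    d = xu - xl
    gamma = gamma_from_ell(-2.0 * alpha_bar, d)      # Step 1: start from (7) with ell = -2 alpha_bar
    while True:
        alpha = alpha_from_gamma(gamma, d)           # Step 2
        # Step 3: the alpha-BB separation alpha d^2/4 grows with alpha, so compare alphas.
        if alpha >= alpha_bar:
            return alpha_bar
        boxes, j = [(xl, xu)], 1                     # unexplored subdomains (reset for each K)
        while boxes and j < j_max:                   # Step 4
            lo, hi = boxes.pop()                     # 4.1: last element of the list
            if hess_lower(lo, hi) + 2.0 * alpha < 0.0:   # 4.3-4.4: eigenvalue bound is negative
                mid = 0.5 * (lo + hi)
                boxes.extend([(lo, mid), (mid, hi)])     # 4.5: bisect and append
                j += 1                                   # 4.6: 2^1 new nodes, 1 removed
        if not boxes:                                # Step 5: convexity verified everywhere
            return alpha
        gamma *= eta                                 # otherwise enlarge gamma and repeat


def hess_lower(lo: float, hi: float) -> float:
    """Naive interval bound of f''(x) = x^2 - x on [lo, hi] with 0 <= lo (x^2 >= lo^2, -x >= -hi)."""
    return lo * lo - hi


if __name__ == "__main__":
    alpha_bar = max(0.0, -0.5 * hess_lower(0.0, 1.0))   # whole-box bound -1 gives alpha_bar = 0.5
    alpha_k = select_alpha_1d(hess_lower, 0.0, 1.0, alpha_bar)
    print(alpha_bar, alpha_k)   # alpha_k < alpha_bar means a tighter, still convex, underestimator
```

Bisection lets the convexity check use sharper bounds on subintervals. This is how the loop can accept an $\alpha$ below the classical value while still certifying convexity.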


Computational Results 


Because an iterative procedure is needed to determine appropriate values for the $\gamma_i$ parameters, constructing the new underestimators requires more computational effort than the classical αBB method. However, within a global optimization framework, actual computational savings may be realized. The tighter underestimators produced by the new method could expedite the branch and bound process through faster fathoming and visits to fewer tree nodes.

Algorithm 1

Step 1 (initialization): Set $K = 1$, $J = 1$, $J_{\max} = 2^{n+1}$, $\eta = 1.1$, $X_1 = X$, $A_1 = \{X_1\}$ and $\gamma_{i,K} = \underline{\gamma}_i$, $i = 1, 2, \dots, n$.

Step 2: For all $i = 1, 2, \dots, n$, use (9) to calculate the $\alpha_{i,K}$ that correspond to $\gamma_{i,K}$, and form the underestimator $L_{\alpha BB}(x; \alpha_K)$.

Step 3: If the maximum separation distance of $L_{\alpha BB}(x; \alpha_K)$ from $f(x)$ is less than the maximum separation distance of $L_{\alpha BB}(x; \bar{\alpha})$ from $f(x)$, then go to Step 4. Otherwise, adopt the classical αBB underestimator $L_{\alpha BB}(x; \bar{\alpha})$ and stop.

Step 4: Check whether $L_{\alpha BB}(x; \alpha_K)$ is convex:
Repeat
  Step 4.1: Remove the last element from the list $A_1$ of unexplored subdomains. Call that subdomain $X_{\text{last}}$.
  Step 4.2: Form the interval Hessian $[\nabla^2 L_{\alpha BB}(x; \alpha_K)]$ with $x \in X_{\text{last}}$.
  Step 4.3: Use (4) and (8) to find lower bounds $\ell_i$ on each eigenvalue of the interval Hessian $[\nabla^2 L_{\alpha BB}(x; \alpha_K)]$ in $X_{\text{last}}$.
  Step 4.4: Form the set $I_- = \{i : \ell_i < 0\}$.
  Step 4.5: If $I_- \ne \emptyset$, bisect all intervals $[x_{i,\text{last}}^L, x_{i,\text{last}}^U]$ with $i \in I_-$, and add the resulting subdomains at the end of the list $A_1$.
  Step 4.6: Set $J = J + 2^{|I_-|} - 1$, where $|I_-|$ is the cardinality of the set $I_-$ (i.e., $2^{|I_-|}$ new subdomains have been generated and added to the list, and one node has been removed).
Until ($A_1 = \emptyset$ or $J \ge J_{\max}$).

Step 5: If $A_1 = \emptyset$, then stop. The Hessian $\nabla^2 L_{\alpha BB}(x; \alpha_K)$ is positive semidefinite for all $x \in X$, and $L_{\alpha BB}(x; \alpha_K)$ is a convex underestimator. Moreover, $L_{\alpha BB}(x; \alpha_K)$ is tighter than the underestimator $L_{\alpha BB}(x; \bar{\alpha})$ obtained by the classical αBB method.
Otherwise, increase the values of all $\gamma_{i,K}$, $i = 1, 2, \dots, n$, by setting $\gamma_{i,K+1} = \eta\, \gamma_{i,K}$. Set $K = K + 1$ and go to Step 2.

Akrotirianakis and Floudas [5] performed a detailed computational comparison between the new underestimators and those of the classical αBB method. They concluded that the new underestimators usually perform better than the classical αBB method, in terms of both overall CPU time and the number of nodes generated in the enumeration tree. The new underestimators also performed better when the problem involves many arbitrarily nonconvex terms in the objective or constraints.

In the same study, Akrotirianakis and Floudas [5] also presented a hybrid optimization framework in which the underestimators $L_1(x; \gamma)$ were used to construct the relaxation at every node of the branch and bound tree.
A stochastic random-linkage algorithm [9] was then 
employed to solve these relaxations and the method 
exhibited improved computational efficiency. Interest- 
ingly enough, the method located the actual global op- 
timum in all case studies, despite the lack of theoretical 
guarantees owing to the fact that the underestimators 
L(x; y) are not necessarily convex. 
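The convexity check of Step 4 of the algorithm above is essentially a subdivision loop over a list of boxes. The Python sketch below mirrors Steps 4.1 to 4.6 under stated assumptions: a box is a list of (lower, upper) pairs, and the eigenvalue lower bounds that (4) and (8) would supply are delegated to a caller-provided function `eig_lower_bounds` (a hypothetical name) returning one bound per variable. The termination test is written as `j >= j_max` so that an overshoot of the counter also stops the loop, and each call starts again from the full domain, which the steps leave implicit when Step 5 returns to Step 2.

```python
from typing import Callable, List, Sequence, Tuple

Box = List[Tuple[float, float]]  # one (lower, upper) interval per variable


def bisect_box(box: Box, dims: Sequence[int]) -> List[Box]:
    """Halve `box` along every coordinate in `dims`, giving 2**len(dims) children."""
    children = [list(box)]
    for i in dims:
        lo, hi = box[i]
        mid = 0.5 * (lo + hi)
        split = []
        for child in children:
            left, right = list(child), list(child)
            left[i], right[i] = (lo, mid), (mid, hi)
            split.extend([left, right])
        children = split
    return children


def check_convexity(box: Box,
                    eig_lower_bounds: Callable[[Box], Sequence[float]],
                    j_max: int) -> bool:
    """Steps 4.1-4.6: True if every subdomain passes before J reaches J_max."""
    pending = [list(box)]                  # the list A_J of unexplored subdomains
    j = 1
    while pending and j < j_max:           # until (A_J empty or J >= J_max)
        x_last = pending.pop()             # Step 4.1: take the last element
        lam = eig_lower_bounds(x_last)     # Steps 4.2-4.3 (supplied by the caller)
        negative = [i for i, lb in enumerate(lam) if lb < 0]  # Step 4.4: I_-
        if negative:                       # Step 4.5: bisect along every i in I_-
            pending.extend(bisect_box(x_last, negative))
        j += 2 ** len(negative) - 1        # Step 4.6
    return not pending                     # Step 5: an empty list means convex
```

With a toy bound function that reports a negative value for every interval wider than 0.5, a call on the unit square bisects once into four subboxes and returns `True`; with `j_max = 3` the counter is exhausted after that first bisection and the call returns `False`, which corresponds to the branch of Step 5 that increases the $\gamma_{i,K}$.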

As an illustration, we present here two examples 
from [5]: 


Example 1 

Figure 1
Function $f_1(x)$ and comparison of underestimators $L_{\alpha\mathrm{BB}}(x; \bar{\alpha})$ and $L_{\alpha\mathrm{BB}}(x; \alpha)$

This example involves a nonconvex function that describes the molecular conformation of pseudoethane. It is taken from [11], where the global minimum potential energy conformation of small molecules is studied. The Lennard-Jones potential is expressed in terms of a simple dihedral angle. The potential energy of the molecule is given by the following function:


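$$\begin{aligned}
f_1(x) ={}& \frac{588600}{\left(3r_0^2 - 4\cos(\theta)r_0^2 - 2\left(\sin^2(\theta)\cos\left(x - \tfrac{2\pi}{3}\right) - \cos^2(\theta)\right)r_0^2\right)^6}
- \frac{1079.1}{\left(3r_0^2 - 4\cos(\theta)r_0^2 - 2\left(\sin^2(\theta)\cos\left(x - \tfrac{2\pi}{3}\right) - \cos^2(\theta)\right)r_0^2\right)^3}\\
&+ \frac{600800}{\left(3r_0^2 - 4\cos(\theta)r_0^2 - 2\left(\sin^2(\theta)\cos(x) - \cos^2(\theta)\right)r_0^2\right)^6}
- \frac{1071.5}{\left(3r_0^2 - 4\cos(\theta)r_0^2 - 2\left(\sin^2(\theta)\cos(x) - \cos^2(\theta)\right)r_0^2\right)^3}\\
&+ \frac{481300}{\left(3r_0^2 - 4\cos(\theta)r_0^2 - 2\left(\sin^2(\theta)\cos\left(x + \tfrac{2\pi}{3}\right) - \cos^2(\theta)\right)r_0^2\right)^6}
- \frac{1064.6}{\left(3r_0^2 - 4\cos(\theta)r_0^2 - 2\left(\sin^2(\theta)\cos\left(x + \tfrac{2\pi}{3}\right) - \cos^2(\theta)\right)r_0^2\right)^3},
\end{aligned}$$

where $r_0$ is the covalent bond length ($r_0 = 1.54$ Å), $\theta$ is the covalent bond angle ($\theta = 109.5°$) and $x$ is the dihedral angle ($x \in X = [0, 2\pi]$). Figure 1 depicts the graph of $f_1(x)$.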
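As a quick sanity check of the formula for $f_1$ as written above, the following Python sketch evaluates it with $r_0 = 1.54$ Å and $\theta = 109.5°$ and scans $X = [0, 2\pi]$ on a uniform grid. The helper names `denom` and `f1` are illustrative, and the grid scan is only a crude way to locate the minimum, not the αBB procedure discussed in this article.

```python
import math

R0 = 1.54                    # covalent bond length r_0 (angstrom)
THETA = math.radians(109.5)  # covalent bond angle theta


def denom(omega: float) -> float:
    """3r0^2 - 4cos(theta)r0^2 - 2(sin^2(theta)cos(omega) - cos^2(theta))r0^2."""
    r2 = R0 ** 2
    return (3 * r2 - 4 * math.cos(THETA) * r2
            - 2 * (math.sin(THETA) ** 2 * math.cos(omega) - math.cos(THETA) ** 2) * r2)


def f1(x: float) -> float:
    """Pseudoethane potential energy as a function of the dihedral angle x."""
    terms = [(588600.0, 1079.1, x - 2 * math.pi / 3),
             (600800.0, 1071.5, x),
             (481300.0, 1064.6, x + 2 * math.pi / 3)]
    return sum(a / denom(w) ** 6 - b / denom(w) ** 3 for a, b, w in terms)


# Uniform grid over X = [0, 2*pi]; a crude scan, not a rigorous global method.
grid = [2 * math.pi * i / 2000 for i in range(2001)]
x_best = min(grid, key=f1)
print(f"approximate minimiser x = {x_best:.4f}, f1 = {f1(x_best):.4f}")
```

Because every term depends on $x$ only through a cosine, the function is $2\pi$-periodic, so the endpoints of $X$ give the same value.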

The value of the α parameter computed by the classical αBB method using (4) is $\bar{\alpha} = 77.124$ and the corresponding value for the γ parameter, obtained by (10), is $\bar{\gamma} = 1.0673$. Also, by solving (7) for γ we obtain $\underline{\gamma} = 0.8521$, and the corresponding value for the α parameter, obtained by (9), is $\underline{\alpha} = 18.579$. The iterative algorithm checks whether there exist values of $\gamma \in [\underline{\gamma}, \bar{\gamma}]$ and $\alpha \in [\underline{\alpha}, \bar{\alpha}]$ such that the underestimator $L_{\alpha\mathrm{BB}}(x; \alpha)$ is convex. After 16 iterations it concludes that if $\alpha = \underline{\alpha}$, then $L_{\alpha\mathrm{BB}}(x; \alpha)$ is a convex underestimator of $f_1(x)$. Furthermore, if $\gamma = \underline{\gamma}$, then $L(x; \gamma)$ is also a convex underestimator of $f_1(x)$. Note that the values of γ and α did not have to increase at all.

The resulting minima of the two underestimators $L_{\alpha\mathrm{BB}}(x; \bar{\alpha})$ and $L_{\alpha\mathrm{BB}}(x; \underline{\alpha})$ are −762.2377 and −184.4244, respectively. Figure 1 depicts these two underestimators and reveals the improvement in tightness.


Example 2 This example is taken from [2] and examines the following two-dimensional nonconvex function:

$$f_2(x) = \cos(x_1)\sin(x_2) - \frac{x_1}{x_2^2 + 1},$$

where $x_1 \in [-1, 2]$ and $x_2 \in [-1, 1]$. The above function possesses three minima and its graph is depicted in Fig. 2. The values of the α parameters computed by the classical αBB method using (4) are $\bar{\alpha}_1 = 1.921$ and $\bar{\alpha}_2 = 10.921$. Using (10), we can determine the corresponding values for the γ parameters; these are $\bar{\gamma}_1 = 90.75$ and $\bar{\gamma}_2 = 1.46$. Also, by solving (7) for $\gamma_i$, $i = 1, 2$, we obtain $\underline{\gamma}_1 = 0.672$ and $\underline{\gamma}_2 = 1.267$. Using (9), we can determine the corresponding values for the α parameters; these are $\underline{\alpha}_1 = 1.3456$ and $\underline{\alpha}_2 = 6.5$.

The iterative algorithm checks whether there exist values of $\gamma_i \in [\underline{\gamma}_i, \bar{\gamma}_i]$, $i = 1, 2$, and $\alpha_i \in [\underline{\alpha}_i, \bar{\alpha}_i]$, $i = 1, 2$, such that the underestimator $L_{\alpha\mathrm{BB}}(x; \alpha)$ is convex. After eight iterations it concludes that if $\alpha = (1.8325, \underline{\alpha}_2)$, then $L_{\alpha\mathrm{BB}}(x; \alpha)$ is a convex underestimator of $f_2(x)$. Also, if $\gamma = (0.74, \underline{\gamma}_2)$, then $L(x; \gamma)$ is also a convex underestimator of $f_2(x)$. Note that only the value of $\gamma_1$ had to be increased from its original value, $\underline{\gamma}_1$, and the increase was only by 10%.

Figure 2
Function $f_2(x)$ and comparison of underestimators $L_{\alpha\mathrm{BB}}(x; \bar{\alpha})$ and $L_{\alpha\mathrm{BB}}(x; \alpha)$

The resulting minima of the two underestimators $L_{\alpha\mathrm{BB}}(x; \bar{\alpha})$ and $L_{\alpha\mathrm{BB}}(x; \alpha)$ are −15.88469 and −10.22767, respectively. Figure 2 depicts these two underestimators and reveals the improvement in tightness.
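The classical αBB underestimator is assumed here to have its standard form $L_{\alpha\mathrm{BB}}(x; \alpha) = f(x) + \sum_i \alpha_i (x_i^L - x_i)(x_i^U - x_i)$ (its defining equation lies outside this excerpt). Under that assumption, the Python sketch below evaluates it for $f_2$ with $\bar{\alpha} = (1.921, 10.921)$ and takes the smallest value on a uniform grid over the box; the result should be close to the value −15.88469 reported above. The function names are illustrative.

```python
import math

LOWER = (-1.0, -1.0)   # x_1 in [-1, 2], x_2 in [-1, 1]
UPPER = (2.0, 1.0)


def f2(x1: float, x2: float) -> float:
    return math.cos(x1) * math.sin(x2) - x1 / (x2 ** 2 + 1.0)


def l_alpha_bb(x1: float, x2: float, alpha) -> float:
    """Assumed classical form: f + sum_i alpha_i (x_i^L - x_i)(x_i^U - x_i)."""
    quad = sum(a * (lo - xi) * (hi - xi)
               for a, lo, hi, xi in zip(alpha, LOWER, UPPER, (x1, x2)))
    return f2(x1, x2) + quad


def grid_min(func, steps: int = 300) -> float:
    """Smallest value of func(x1, x2) on a (steps+1) x (steps+1) grid over the box."""
    best = math.inf
    for i in range(steps + 1):
        x1 = LOWER[0] + (UPPER[0] - LOWER[0]) * i / steps
        for j in range(steps + 1):
            x2 = LOWER[1] + (UPPER[1] - LOWER[1]) * j / steps
            best = min(best, func(x1, x2))
    return best


alpha_bar = (1.921, 10.921)  # classical values quoted in the text
m = grid_min(lambda a, b: l_alpha_bb(a, b, alpha_bar))
print(f"grid minimum of the classical underestimator: {m:.5f}")
```

Since every $\alpha_i \ge 0$ and $(x_i^L - x_i)(x_i^U - x_i) \le 0$ inside the box, the added quadratic term can only lower the function, which is why $L_{\alpha\mathrm{BB}}$ underestimates $f_2$ everywhere in the domain.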


References 


1. Adjiman CS, Floudas CA (1996) Rigorous convex underestimators for general twice-differentiable problems. J Glob Optim 9:23-40

2. Adjiman CS, Dallwig S, Floudas CA, Neumaier A (1998) A global optimization method, αBB, for general twice-differentiable constrained NLPs I. Theoretical advances. Comput Chem Eng 22:1137-1158

3. Adjiman CS, Androulakis IP, Floudas CA (1998) A global optimization method, αBB, for general twice-differentiable constrained NLPs II. Implementation and computational results. Comput Chem Eng 22:1159-1179

4. Akrotirianakis IG, Floudas CA (2004) A new class of improved convex underestimators for twice continuously differentiable constrained NLPs. J Glob Optim 30:367-390

5. Akrotirianakis IG, Floudas CA (2004) Computational experience with a new class of convex underestimators: box-constrained NLP problems. J Glob Optim 29:249-264

6. Androulakis IP, Maranas CD, Floudas CA (1995) αBB: A global optimization method for general constrained nonconvex problems. J Glob Optim 7:337-363

7. Floudas CA, Akrotirianakis IG, Caratzoulas S, Meyer CA, Kallrath J (2005) Global optimization in the 21st century: advances and challenges. Comput Chem Eng 29:1185-1202

8. Hertz D, Adjiman CS, Floudas CA (1999) Two results on bounding the roots of interval polynomials. Comput Chem Eng 23:1333-1339

9. Locatelli M, Schoen F (1999) Random linkage: a family of acceptance/rejection algorithms for global optimization. Math Program 85:379-396

10. Maranas CD, Floudas CA (1994) Global minimum potential energy conformations of small molecules. J Glob Optim 4:135-170

11. Maranas CD, Floudas CA (1994) A deterministic global optimization approach for molecular structure determination. J Chem Phys 100:1247-1261


Global Optimization in Generalized 
Geometric Programming 


COSTAS D. MARANAS
Pennsylvania State University, University Park, USA 


MSC2000: 90C26, 90C90 


Article Outline 


Keywords 

Robust Stability Analysis 
See also 

References 




Keywords 


Signomials; Generalized geometric programming; 
Global optimization; Robust stability analysis 


Generalized geometric or signomial programming (GGP) problems are characterized by an objective function and constraints which are the difference of two posynomials. A posynomial $G(x)$ is simply the sum of a number of posynomial terms or monomials $g_k(x)$, $k = 1, \dots, K$, multiplied by some positive real constants $c_k$, $k = 1, \dots, K$. Each monomial $g_k(x)$ is in turn the product of a number of positive variables, each of them raised to some real power,


$$g_k(x) = x_1^{d_{1,k}}\, x_2^{d_{2,k}} \cdots x_N^{d_{N,k}}, \qquad k = 1, \dots, K,$$

where $d_{1,k}, \dots, d_{N,k} \in \mathbb{R}$ and are not necessarily integers. The term 'geometric programming' was adopted because of the key role that the well-known arithmetic-geometric inequality played in the initial developments. Generalized geometric problems were first introduced and studied by U. Passy and D.J. Wilde [28] and G.J. Blau and Wilde [8] when existing (posynomial) geometric programming (GP) formulations failed to account for the presence of negatively signed monomials in models for important engineering applications. These applications are extensively reviewed in [31] and [16]. Chemical engineering applications include heat exchanger network design [14], chemical reactor design [8,9], optimal condenser design [4], oxygen production [21], membrane separation process design [12], optimal design of cooling towers [16], chemical equilibrium problems [29], optimal control [23], batch plant modeling [20,33], optimal location of hydrogen supply centers [3] and many more.

By grouping together monomials with identical sign, the generalized geometric problem can be formulated as the following nonlinear optimization problem:


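$$\begin{aligned}
\min_{t}\ & G_0(t) = G_0^+(t) - G_0^-(t)\\
\text{s.t. } & G_j(t) = G_j^+(t) - G_j^-(t) \le 0, \quad j = 1, \dots, M,\\
& t_i \ge 0, \quad i = 1, \dots, N,
\end{aligned} \tag{GGP}$$

where

$$G_j^+(t) = \sum_{k \in K_j^+} c_{jk} \prod_{i=1}^{N} t_i^{\alpha_{ijk}}, \quad j = 0, \dots, M,$$

$$G_j^-(t) = \sum_{k \in K_j^-} c_{jk} \prod_{i=1}^{N} t_i^{\alpha_{ijk}}, \quad j = 0, \dots, M,$$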


where $t = (t_1, \dots, t_N)$ is the positive variable vector; $G_j^+$, $G_j^-$, $j = 0, \dots, M$, are positive posynomial functions in $t$; $\alpha_{ijk}$ are arbitrary real constant exponents; and $c_{jk}$ are positive coefficients. Also, the index sets $K_j^+$, $K_j^-$ collect the positively/negatively signed monomials that form the posynomials $G_j^+$, $G_j^-$, respectively. In general, formulation GGP corresponds to a nonlinear optimization problem with a nonconvex objective function and/or constraint set. Note that if we set $K_j^- = \emptyset$ for all $j = 0, \dots, M$, then the mathematical model for GGP reduces to the (posynomial) geometric programming (GP) formulation, which laid the foundation for the theory of generalized geometric problems.
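A minimal Python sketch of how a signomial $G_j(t) = G_j^+(t) - G_j^-(t)$ can be evaluated from its data: each term carries a sign (membership in $K_j^+$ or $K_j^-$), a positive coefficient $c_{jk}$ and its exponent vector. The data layout and the function names are assumptions made for illustration, and the example signomial is hypothetical.

```python
from typing import Iterable, Sequence, Tuple

# A monomial term: (sign +1 or -1, coefficient c_jk > 0, exponents alpha_1jk..alpha_Njk)
Term = Tuple[int, float, Sequence[float]]


def monomial(t: Sequence[float], exponents: Sequence[float]) -> float:
    """prod_i t_i ** alpha_i for positive t and arbitrary real exponents."""
    value = 1.0
    for ti, a in zip(t, exponents):
        value *= ti ** a
    return value


def signomial(t: Sequence[float], terms: Iterable[Term]) -> float:
    """G_j(t) = G_j^+(t) - G_j^-(t), with terms split by sign as in K_j^+ / K_j^-."""
    if any(ti <= 0 for ti in t):
        raise ValueError("real exponents require strictly positive t")
    terms = list(terms)
    g_plus = sum(c * monomial(t, e) for s, c, e in terms if s > 0)
    g_minus = sum(c * monomial(t, e) for s, c, e in terms if s < 0)
    return g_plus - g_minus


# Hypothetical example: G(t) = 2 t1^2 t2^-1 - 3 t1^0.5
example = [(+1, 2.0, (2.0, -1.0)), (-1, 3.0, (0.5, 0.0))]
print(signomial((4.0, 2.0), example))  # 2*16/2 - 3*2 = 10.0
```

Keeping the positive and negative parts separate mirrors the split into $G_j^+$ and $G_j^-$; setting every sign to +1 gives an ordinary posynomial, i.e. the GP special case.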

Unlike posynomial (GP) problems, GGP problems remain nonconvex in both their primal and dual representations, and no known transformation can convexify them. They may involve multiple local minima and/or nonconvex feasible regions and are therefore much more difficult to solve. Local optimization approaches for solving GGP problems include bounding procedures based on posynomial condensation [2,5,13,15,23]; iterative solution of KKT conditions [9,25,32]; and adaptations of general purpose nonlinear programming methods [1,7,10,19,24,26,31]. A computational comparison of available codes for signomial programming is given in [12,32]. While local optimization methods for solving GGP problems are ubiquitous, application of specialized global optimization algorithms to GGP problems is scarce. J.E. Falk [17] proposed such a global optimization algorithm based on the exponential variable transformation of GGP, with convex relaxation and branch and bound on the space of exponents of negative monomials (j = 1, ..., M and k ∈ K_j^-). Based on these ideas, C.D. Maranas and C.A. Floudas [27] proposed an alternative partitioning in the typically smaller space of variables i =




1, ..., N. The proposed branch and bound type algorithm attains finite ε-convergence to the global minimum through the successive refinement of a convex relaxation of the feasible region and/or of the objective function and the subsequent solution of a series of nonlinear convex optimization problems. The efficiency of the proposed approach is enhanced by eliminating variables through monotonicity analysis, by maintaining tightly bound variables through rescaling, and by further improving the supplied variable bounds through convex minimization. The proposed approach was applied to a large number of test examples, in particular robust stability analysis problems.


Robust Stability Analysis 


Robust stability analysis of nonlinear systems involves the identification of the largest possible region in the uncertain model parameter space for which the controller manages to attenuate any disturbances in the system. The stability of a feedback structure is determined by the roots of the closed loop characteristic equation:


$$\det\big(I + P(s, q)\,C(s, q)\big) = 0,$$


where $q$ is the vector of the uncertain model parameters, and $P(s, q)$, $C(s, q)$ are the transfer functions of the plant and controller, respectively. After expanding the determinant we have:

$$P(s, q) = a_n(q)s^n + a_{n-1}(q)s^{n-1} + \cdots + a_0(q) = 0,$$

where the coefficients $a_i(q)$, $i = 0, \dots, n$, are typically multivariable polynomial functions. The 'zero exclusion condition' (ZEC) implies that a system with characteristic equation $P(s, q) = 0$ is stable only if it does not have any roots on the imaginary axis for any realization of the $q$'s in the uncertain model parameter space $Q$:

$$0 \notin P(j\omega, q), \quad \forall q \in Q, \ \forall \omega \in [0, \infty).$$

A stability margin $k_m$ can then be defined as follows:

$$k_m(Q) = \inf\{k : P(j\omega, q(k)) = 0,\ \forall q \in Q\}.$$

Robust stability for this model is then guaranteed if and only if

$$k_m > 1.$$


Geometrically, $k_m$ expands the initial uncertain parameter region $Q$ as much as possible without losing stability. Note that real parameter uncertainty is typically expressed as bounds on the real parameters of the model.

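Checking the stability of a particular system with characteristic equation $P(j\omega, q)$ involves the solution of the following nonconvex optimization problem:

$$\begin{aligned}
\min_{k, q, \omega}\ & k\\
\text{s.t. } & \operatorname{Re}[P(j\omega, q)] = 0,\\
& \operatorname{Im}[P(j\omega, q)] = 0,\\
& q_i^N - \Delta q_i^- k \le q_i \le q_i^N + \Delta q_i^+ k, \quad i = 1, \dots, n,
\end{aligned} \tag{S}$$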

where $q^N$ is a stable nominal point for the uncertain parameters and $\Delta q^+$, $\Delta q^-$ are estimated bounds. Note that it is important to be able to always locate the global minimum of (S); otherwise the stability margin might be overestimated. This overestimation can sometimes lead to the erroneous conclusion that a system is stable when it is not. Because for most problems without time delays $a_i(q)$, $i = 0, \dots, n$, are multivariable polynomial functions, formulation (S) corresponds to a generalized geometric problem. Next, an illustrative robust stability example is highlighted.

This example was studied in [18] and [30]. The plant has three uncertain parameters and the characteristic equation is:

$$P(s, q_1, q_2, q_3) = s^4 + (10 + q_2 + q_3)s^3 + (q_2q_3 + 10q_2 + 10q_3)s^2 + (10q_2q_3 + q_1)s + 2q_1.$$


The nominal values of the parameters of the system are

$$q_1^N = 800, \quad q_2^N = 4, \quad q_3^N = 6,$$

and the bounded perturbations are:

$$\Delta q_1^+ = \Delta q_1^- = 800, \quad \Delta q_2^+ = \Delta q_2^- = 2, \quad \Delta q_3^+ = \Delta q_3^- = 3.$$
After eliminating ω, the zero exclusion formulation becomes:

$$\begin{aligned}
\min\ & k\\
\text{s.t. } & 10q_2^2q_3^3 + 10q_2^3q_3^2 + 200q_2^2q_3^2 + 100q_2^3q_3 + 100q_2q_3^3 + q_1q_2q_3^2 + q_1q_2^2q_3 + 1000q_2^2q_3\\
& \quad + 8q_1q_3^2 + 1000q_2q_3^2 + 8q_1q_2^2 + 6q_1q_2q_3 + 60q_1q_3 + 60q_1q_2 - q_1^2 - 200q_1 \le 0,\\
& 800 - 800k \le q_1 \le 800 + 800k,\\
& 4 - 2k \le q_2 \le 4 + 2k,\\
& 6 - 3k \le q_3 \le 6 + 3k.
\end{aligned}$$


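The stability margin is found to be $k_m = 0.3417$. Since $k_m < 1$, robust stability is not guaranteed over the full uncertainty region, i.e., the system is unstable for some admissible parameter values. Furthermore, the first instability occurs at:

$$q_1 = 1073.4, \quad q_2 = 3.318, \quad q_3 = 4.975.$$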
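For this quartic, setting the real and imaginary parts of $P(j\omega, q)$ to zero gives $\omega^2 = a_1/a_3$ and hence the single condition $a_1 a_2 a_3 - a_1^2 - a_0 a_3^2 \le 0$, with $a_3 = 10 + q_2 + q_3$, $a_2 = q_2q_3 + 10q_2 + 10q_3$, $a_1 = 10q_2q_3 + q_1$ and $a_0 = 2q_1$; expanding this expression reproduces the polynomial constraint of the formulation above term by term. The Python sketch below uses that compact form and bisects on $k$, testing each scaled box only on a small grid that includes its corners. This sampling is a heuristic, not the deterministic global method of [27]: a grid that misses the worst-case point would overestimate $k_m$, which is exactly the danger the text warns about. For this example the reported critical point is a corner of the box, so the sketch should land near 0.3417. All names are illustrative.

```python
from itertools import product

NOMINAL = (800.0, 4.0, 6.0)  # q_1^N, q_2^N, q_3^N
DELTA = (800.0, 2.0, 3.0)    # symmetric perturbations dq_i^+ = dq_i^-


def zec(q1, q2, q3):
    """a1*a2*a3 - a1**2 - a0*a3**2 for s^4 + a3 s^3 + a2 s^2 + a1 s + a0."""
    a3 = 10 + q2 + q3
    a2 = q2 * q3 + 10 * q2 + 10 * q3
    a1 = 10 * q2 * q3 + q1
    a0 = 2 * q1
    return a1 * a2 * a3 - a1 ** 2 - a0 * a3 ** 2


def sampled_violation(k, n=5):
    """True if some point of an n^3 grid (corners included) of the k-box has zec <= 0."""
    axes = []
    for qn, dq in zip(NOMINAL, DELTA):
        lo, hi = qn - dq * k, qn + dq * k
        axes.append([lo + (hi - lo) * i / (n - 1) for i in range(n)])
    return any(zec(*q) <= 0 for q in product(*axes))


def margin_by_bisection(k_hi=1.0, tol=1e-6):
    """Bisection on k; only as reliable as the sampling in sampled_violation."""
    if not sampled_violation(k_hi):
        return None
    k_lo = 0.0
    while k_hi - k_lo > tol:
        mid = 0.5 * (k_lo + k_hi)
        if sampled_violation(mid):
            k_hi = mid
        else:
            k_lo = mid
    return k_hi


print(zec(800, 4, 6))          # nominal point: 857600 (> 0, stable)
print(margin_by_bisection())   # close to the reported k_m = 0.3417
```

Bisection is valid here because the boxes are nested in $k$: once a sampled point violates the condition for some $k$, it remains in every larger box.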
See also 


> αBB Algorithm

> Continuous Global Optimization: Models, 
Algorithms and Software 

> Convex Envelopes in Optimization Problems 

> Global Optimization in Batch Design Under 
Uncertainty 

> Global Optimization Methods for Systems of 
Nonlinear Equations 

> Global Optimization in Phase and Chemical 
Reaction Equilibrium 

> Interval Global Optimization 

> MINLP: Branch and Bound Global Optimization 
Algorithm 

> MINLP: Global Optimization with αBB

> Smooth Nonlinear Nonconvex Optimization 
Global Optimization of Heat Exchanger Networks, Figure 1
Heat exchanger network superstructure
The cost of energy represents an important part of
the total operating cost of many processing plants. 
Therefore, the recovery of energy through heat ex- 
changer networks (HENs) has played an important role 
in industry, and has been a major concern of design 
engineers for the last two decades (for reviews, see 
[5,10,11]). Design approaches based on mathematical 
programming techniques and models have been devel- 
oped and applied in the synthesis and the optimiza- 
tion of HENs (see for instance [3,12,18]). The synthesis 
of HENs with a mathematical modeling framework in- 
volves the optimization of a superstructure like the one 
in Fig. 1 [18], and represents a difficult global optimiza- 
tion problem from a deterministic point of view [20]. 


Nonconvexities are introduced into mathematical models for HENs by the fractional powers of linear fractional terms that appear in heat transfer area cost terms,

$$\text{Area Cost} = C\left(\frac{q}{U\,\Delta T}\right)^{\beta}.$$

Here the variables are the heat transfer rate, $q$, and the logarithmic mean temperature difference driving force, $\Delta T$ or LMTD; $U$ is the heat transfer coefficient, and $C$ and $\beta$ are the cost coefficient and exponent, respectively.
Other sources of nonconvexities in mathematical programming models for heat exchanger networks arise due to the logarithmic mean temperature difference driving force, which can be given rigorously by

$$\mathrm{LMTD} = \frac{dt_h - dt_c}{\log\left(dt_h / dt_c\right)},$$

or by an approximation like the ones due to W.R. Paterson [13]

$$\mathrm{LMTD} \approx \frac{1}{3}\left[\frac{dt_h + dt_c}{2}\right] + \frac{2}{3}\sqrt{dt_h\, dt_c},$$

or J.J.J. Chen [2]

$$\mathrm{LMTD} \approx \left[\frac{dt_h\, dt_c\,(dt_h + dt_c)}{2}\right]^{1/3}.$$

Here, $dt_h$ and $dt_c$ are the temperature differences at the hot and cold ends of the heat exchanger. Nonconvexities in mathematical models of HENs also may appear in the form of bilinear terms that are used to model the nonisothermal mixing of process streams. For instance, the energy balance for modeling the nonisothermal mixing of process streams 1 and 2 to produce stream 3 would require the inclusion of the following bilinear equation in the mathematical model:

$$f_1 t_1 + f_2 t_2 = f_3 t_3,$$

in which $f$ stands for heat capacity flowrate, and $t$ for stream temperature.
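The three LMTD expressions above can be compared numerically. The short Python sketch below implements the rigorous LMTD (returning its limit $dt_h$ when $dt_h = dt_c$) and the Paterson and Chen approximations exactly as written; for $dt_h = 20$ and $dt_c = 10$ they give about 14.427, 14.428 and 14.422, respectively. The function names are illustrative.

```python
import math


def lmtd(dt_h: float, dt_c: float) -> float:
    """Rigorous LMTD; returns dt_h in the limit dt_h == dt_c."""
    if math.isclose(dt_h, dt_c):
        return dt_h
    return (dt_h - dt_c) / math.log(dt_h / dt_c)


def lmtd_paterson(dt_h: float, dt_c: float) -> float:
    """Paterson: one third of the arithmetic mean plus two thirds of the geometric mean."""
    return (dt_h + dt_c) / 6.0 + 2.0 / 3.0 * math.sqrt(dt_h * dt_c)


def lmtd_chen(dt_h: float, dt_c: float) -> float:
    """Chen: cube root of dt_h * dt_c * (dt_h + dt_c) / 2."""
    return (dt_h * dt_c * (dt_h + dt_c) / 2.0) ** (1.0 / 3.0)


for dt_h, dt_c in [(20.0, 10.0), (50.0, 5.0)]:
    print(dt_h, dt_c, round(lmtd(dt_h, dt_c), 4),
          round(lmtd_paterson(dt_h, dt_c), 4), round(lmtd_chen(dt_h, dt_c), 4))
```

Both approximations avoid the removable singularity of the rigorous formula at $dt_h = dt_c$, where they reduce exactly to the common temperature difference.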

The issue of determining a global optimum solution for problems involving heat exchanger networks was first considered in [17]. Since then, representative global optimization problems in heat exchanger networks have been posed; see, for instance, [4]. Nevertheless, deterministic global optimization algorithms, and their application to the optimization of certain classes of NLP and MINLP models in heat exchanger networks, appeared only in the 1990s [1,6,9,14,15,16,20,21,22].

Most of the applications of deterministic global op- 
timization algorithms for the solution of nonconvex 
problems involving HENs are based on a branch and 
bound framework [7,8]. Within the branch and bound 
approach for global optimization, lower bounds of the 
global minimum value of the objective function are 
computed by solving a convex relaxation of the origi- 
nal nonconvex problem over subsections of the search 
region. For the development of the convex relaxations 


for nonconvex problems in HENs, the following prop- 
erties are exploited. 


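Property 1 ([19,20,21,22]) Let $\theta$ and $\Delta T$ be continuous positive variables with $\Delta T > 0$. Also, let $U$, $C$, $\alpha$ and $\beta$ be positive constants, with $\beta > 0$, and $\alpha = (\beta + 1)/\beta$. Then, the function

$$C\left(\frac{\theta^{\alpha}}{U\,\Delta T}\right)^{\beta}$$

is convex. Furthermore, if $q$ is a positive variable, and $S$ is a convex subset of $\mathbb{R}^2_+$, the convex optimization problem in (2) can be used to compute a rigorous lower bound for the solution of the problem in (1), i.e., the problem in (2) is a valid convex relaxation of the problem in (1):

$$\begin{aligned}
\text{GloMin}\ & C\left(\frac{q}{U\,\Delta T}\right)^{\beta}\\
\text{s.t. } & (q, \Delta T) \in S,\\
& 0 \le q^L \le q \le q^U,
\end{aligned} \tag{1}$$

$$\begin{aligned}
\min\ & C\left(\frac{\theta^{\alpha}}{U\,\Delta T}\right)^{\beta}\\
\text{s.t. } & \theta \ge (q^L)^{1/\alpha} + \frac{(q^U)^{1/\alpha} - (q^L)^{1/\alpha}}{q^U - q^L}\,(q - q^L),\\
& (q, \Delta T) \in S,\\
& 0 \le q^L \le q \le q^U, \quad \theta \ge 0.
\end{aligned} \tag{2}$$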
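Property 2 ([19]) Let $dt_h$, $dt_c$ and $\Delta T$ be continuous positive variables. Also, let $T_1$ and $T_2$ be positive constants such that $T_1 - T_2 > 0$. Then the following inequalities are convex:

$$\Delta T \le \frac{dt_h - dt_c}{\log\left(dt_h/dt_c\right)},$$

$$\Delta T \le \frac{dt_h - (T_1 - T_2)}{\log\left[dt_h/(T_1 - T_2)\right]},$$

$$\Delta T \le \frac{(T_1 - T_2) - dt_c}{\log\left[(T_1 - T_2)/dt_c\right]}.$$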

Property 3 ([19]) Let $dt_h$, $dt_c$ and $\Delta T$ be continuous positive variables. Also, let $T_1$ and $T_2$ be positive constants such that $T_1 - T_2 > 0$. Then the following inequalities, which are based on the Paterson approximation [13] for the LMTD, are convex:

$$\Delta T \le \frac{1}{3}\left[\frac{dt_h + dt_c}{2}\right] + \frac{2}{3}\sqrt{dt_h\, dt_c},$$

$$\Delta T \le \frac{1}{3}\left[\frac{dt_h + T_1 - T_2}{2}\right] + \frac{2}{3}\sqrt{dt_h\,(T_1 - T_2)},$$

$$\Delta T \le \frac{1}{3}\left[\frac{(T_1 - T_2) + dt_c}{2}\right] + \frac{2}{3}\sqrt{(T_1 - T_2)\, dt_c}.$$


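Property 4 ([19]) Let $dt_h$, $dt_c$ and $\Delta T$ be continuous positive variables. Also, let $T_1$ and $T_2$ be positive constants such that $T_1 - T_2 > 0$. Then the following inequalities, which are based on the Chen approximation [2] for the LMTD, are convex:

$$\Delta T \le \left[\frac{dt_h\, dt_c\,(dt_h + dt_c)}{2}\right]^{1/3},$$

$$\Delta T \le \left[\frac{dt_h\,(T_1 - T_2)\,(dt_h + T_1 - T_2)}{2}\right]^{1/3},$$

$$\Delta T \le \left[\frac{(T_1 - T_2)\, dt_c\,(T_1 - T_2 + dt_c)}{2}\right]^{1/3}.$$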


Property 5 ([19]) Let $dt_h$, $dt_c$ be continuous positive variables, and let $\Delta T$ be the logarithmic mean temperature difference, $\Delta T = (dt_h - dt_c)/\log(dt_h/dt_c)$. Also, assume that $r$ is a constant determined by the ratio of two particular values of $dt_h$ and $dt_c$. Then, the following bounding inequality is valid, and holds as an equality along the line determined by the ratio $r = dt_h/dt_c$:

$$\Delta T \le P(r)\,dt_h + Q(r)\,dt_c,$$

where

$$P(r) = \begin{cases} \dfrac{r\log r - r + 1}{r\,(\log r)^2}, & r \ne 1,\\[1ex] \dfrac{1}{2}, & r = 1, \end{cases} \qquad Q(r) = \begin{cases} \dfrac{r - 1 - \log r}{(\log r)^2}, & r \ne 1,\\[1ex] \dfrac{1}{2}, & r = 1. \end{cases}$$

Several other useful properties and their application in the development of convex relaxations for HEN problems can be found in [1,6,14,19] and [20,21,22].

As an illustrative example of the use of the above 
properties, and the application of global optimization 
techniques in heat exchanger networks, consider the 


Global Optimization of Heat Exchanger Networks, Figure 2 
Heat exchanger network for the illustrative problem 




Global Optimization of Heat Exchanger Networks, Figure 3 
Global optimum HEN design of the illustrative problem 


determination of the global optimal design of the HEN shown in Fig. 2 [14]; stream data and cost information are included in Table 1. This problem was originally solved in [14] and [21] using the arithmetic mean temperature difference driving force (AMTD), and assuming isothermal mixing of process streams (t_5 = t_6). Figure 3 shows the global optimum solution of the nonconvex model (P) associated with the illustrative problem. A design with a total network cost of $36,199 is determined. Note that model (P) does not assume isothermal mixing, utilizes the approximation by Chen [2], and enforces a minimum approach temperature of 5 degrees. The global optimization of model (P) was performed with the branch and contract algorithm proposed in [21,23]; the convex model (R) was used in the computation of rigorous lower bounds of the total network cost. The solution process required 7 branch and bound nodes, and approximately 37 CPU seconds on a Pentium I processor running at 133 MHz. Alternative
suboptimal solutions for the illustrative problem based on the rigorous LMTD include network designs with total costs of $38,513, $39,809, $41,836, and $47,681.

Global Optimization of Heat Exchanger Networks, Table 1
Problem data for illustrative example

Stream    T_in (K)    T_out (K)    F (kW K⁻¹)
H1        575         395          5.555
H2        718         398          3.125
C1        300         400          10
C2        365         -            4.545
C3        358         -            3.571

Cost of Heat Exchanger 1 ($ yr⁻¹) = 270 [A_1 (m²)]
Cost of Heat Exchanger 2 ($ yr⁻¹) = 720 [A_2 (m²)]
Cost of Heat Exchanger 3 ($ yr⁻¹) = 240 [A_3 (m²)]
Cost of Heat Exchanger 4 ($ yr⁻¹) = 900 [A_4 (m²)]
U_1 = U_2 = 0.1 kW m⁻² K⁻¹
U_3 = U_4 = 1.0 kW m⁻² K⁻¹


Nonconvex Model (P)

Indices
1, 2, 3, 4: index for heat exchangers
1h, 2h, 3h, 4h: hot side of heat exchangers
1c, 2c, 3c, 4c: cold side of heat exchangers

Parameters
U_1, U_2, U_3, U_4: overall heat transfer coefficients

Positive Variables
t: stream temperature
dt: temperature difference at end of heat exchanger
ΔT: approximation of the logarithmic mean temperature difference
q: heat transfer rate
f: heat capacity flowrate

Objective Function

$$\min\ 270\,\frac{q_1}{U_1\,\Delta T_1} + 720\,\frac{q_2}{U_2\,\Delta T_2} + 240\,\frac{q_3}{U_3\,\Delta T_3} + 900\,\frac{q_4}{U_4\,\Delta T_4}$$

Model Constraints

$$\begin{aligned}
& q_1 = 5.555(t_1 - 395), && q_1 = f_1(t_5 - 300),\\
& q_2 = 3.125(t_2 - 398), && q_2 = f_2(t_6 - 300),\\
& q_3 = 4.545(t_3 - 365), && q_3 = 5.555(575 - t_1),\\
& q_4 = 3.571(t_4 - 358), && q_4 = 3.125(718 - t_2),\\
& q_1 + q_2 = 1000, && q_1 + q_3 = 999.9,\\
& q_2 + q_4 = 1000, && f_1 + f_2 = 10,\\
& dt_{1h} = t_1 - t_5, && dt_{1c} = 95,\\
& dt_{2h} = t_2 - t_6, && dt_{2c} = 98,\\
& dt_{3h} = 575 - t_3, && dt_{3c} = t_1 - 365,\\
& dt_{4h} = 718 - t_4, && dt_{4c} = t_2 - 358,\\
& \Delta T_i = \left[\frac{dt_{ih}\,dt_{ic}\,(dt_{ih} + dt_{ic})}{2}\right]^{1/3}, && i = 1, 2, 3, 4,\\
& f_1 t_5 + f_2 t_6 = 4000,\\
& 0 \le q_i^L \le q_i \le q_i^U, && i = 1, 2, 3, 4,\\
& 0 \le t_j^L \le t_j \le t_j^U, && j = 1, \dots, 6,\\
& dt_k \ge 5, && k = 1h, 1c, 2h, 2c, 3h, 3c, 4h, 4c,\\
& 0 \le f_1^L \le f_1 \le f_1^U, && 0 \le f_2^L \le f_2 \le f_2^U.
\end{aligned}$$
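Convex Model (R)

Objective Function

$$\min\ 270\,\frac{\theta_1^2}{U_1\,\Delta T_1} + 720\,\frac{\theta_2^2}{U_2\,\Delta T_2} + 240\,\frac{\theta_3^2}{U_3\,\Delta T_3} + 900\,\frac{\theta_4^2}{U_4\,\Delta T_4}$$

Model Constraints

$$\begin{aligned}
& \theta_i \ge (q_i^L)^{1/2} + \frac{(q_i^U)^{1/2} - (q_i^L)^{1/2}}{q_i^U - q_i^L}\,(q_i - q_i^L), && i = 1, 2, 3, 4,\\
& q_1 = 5.555(t_1 - 395), && q_1 = y_{15} - 300 f_1,\\
& q_2 = 3.125(t_2 - 398), && q_2 = y_{26} - 300 f_2,\\
& q_3 = 4.545(t_3 - 365), && q_3 = 5.555(575 - t_1),\\
& q_4 = 3.571(t_4 - 358), && q_4 = 3.125(718 - t_2),\\
& q_1 + q_2 = 1000, && q_1 + q_3 = 999.9,\\
& q_2 + q_4 = 1000, && f_1 + f_2 = 10,\\
& dt_{1h} = t_1 - t_5, && dt_{1c} = 95,\\
& dt_{2h} = t_2 - t_6, && dt_{2c} = 98,\\
& dt_{3h} = 575 - t_3, && dt_{3c} = t_1 - 365,\\
& dt_{4h} = 718 - t_4, && dt_{4c} = t_2 - 358,\\
& \Delta T_i \le \left[\frac{dt_{ih}\,dt_{ic}\,(dt_{ih} + dt_{ic})}{2}\right]^{1/3}, && i = 1, 2, 3, 4,\\
& z_{11} = t_5 - 300, && z_{22} = t_6 - 300,\\
& y_{15} + y_{26} = 4000,\\
& y_{15} \ge t_5^L f_1 + f_1^L t_5 - f_1^L t_5^L, && y_{15} \ge t_5^U f_1 + f_1^U t_5 - f_1^U t_5^U,\\
& y_{15} \le t_5^U f_1 + f_1^L t_5 - f_1^L t_5^U, && y_{15} \le t_5^L f_1 + f_1^U t_5 - f_1^U t_5^L,\\
& y_{26} \ge t_6^L f_2 + f_2^L t_6 - f_2^L t_6^L, && y_{26} \ge t_6^U f_2 + f_2^U t_6 - f_2^U t_6^U,\\
& y_{26} \le t_6^U f_2 + f_2^L t_6 - f_2^L t_6^U, && y_{26} \le t_6^L f_2 + f_2^U t_6 - f_2^U t_6^L,
\end{aligned}$$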
1f[at+yaar 
Zu 


Yat Vat 


qa +f 95ay 


aed se 

ea a(z- ak 
wetee (id) 
se hoa(t-4) 


ae 7) 


zu < iF (fa —aifit afr). 
zu < at (fra-afhtarf), 
Z22 ae g—a5f.+ af). 

in SF ap (fe-ghta hf’). 


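$$\begin{aligned}
& 0 \le q_i^L \le q_i \le q_i^U, && i = 1, 2, 3, 4,\\
& 0 \le t_j^L \le t_j \le t_j^U, && j = 1, \dots, 6,\\
& dt_k \ge 5, && k = 1h, 1c, 2h, 2c, 3h, 3c, 4h, 4c,\\
& 0 \le f_1^L \le f_1 \le f_1^U, && 0 \le f_2^L \le f_2 \le f_2^U,\\
& y_{15},\ y_{26},\ z_{11},\ z_{22} \ge 0.
\end{aligned}$$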
See also

> MINLP: Global Optimization with αBB

> MINLP: Heat Exchanger Network Synthesis

> MINLP: Mass and Heat Exchanger Networks

> Mixed Integer Linear Programming: Heat
Exchanger Network Synthesis

> Mixed Integer Linear Programming: Mass and Heat
Exchanger Networks
The hit and run algorithms fall into the category of sequential random search methods (cf. also ▸ Random search methods), or stochastic methods. These meth-
ods can be applied to a broad class of global optimiza- 
tion problems. They seem especially useful for prob- 
lems with black-box functions which have no known 
structure. These problems often involve a very large 
number of variables, and may include both continuous 
and discrete variables. 

The concept of hit and run is to iteratively generate 
a sequence of points by taking steps of random length 
in randomly generated directions. R.L. Smith, in 1984 
[12], showed that this method can be used to generate 
points within a set S that are asymptotically uniformly 
distributed. The hit and run method was originally ap- 
plied to identifying nonredundant constraints in linear 
programs [1,3], and in stochastic programming [2]. 

Hit and run was first applied to optimization in 
[16], and the name improving hit and run (IHR) was 
adopted. The term 'improving' was intended to indicate that the sequence of points was improving with regard to their objective function values. The IHR algorithm couples the idea of pure adaptive search [8,15] with the hit and run generator to produce an easily implemented sequential random search algorithm. Pure adaptive search (PAS, see also ▸ Random search methods) predicts that generating points uniformly in successively improving level sets requires, on average, a number of iterations that is linear in the dimension. One way to approximate PAS would be to use hit and run to generate approximately uniform points, and then select those that land in improving level sets. This is the idea behind improving hit and run.

In addition to IHR, a family of methods based on hit and run has been
developed. Other vari-
ations include: adding an acceptance probability with 
a cooling schedule, varying the choice of direction, 
varying the length of step, and modifying the sampling 
method to include a mixture of continuous and discrete 
variables. 


Hit and Run Based Algorithms 


The underlying concept of hit and run based algorithms 
is that, if hit and run could generate a uniformly dis- 
tributed point in an improving level set, then PAS pre- 
dicts that we need only a linear number of such points. 
The point generated by just one iteration of hit and run 
is far from uniform and may not be in the improv- 
ing set, so the number of function evaluations is not 
expected to be linear in dimension, but in [16] it was 
shown that the expected number of function evaluations for IHR on the class of elliptical programs (e.g., positive definite quadratic programs) is polynomial in dimension, $O(n^{5/2})$. The number of function evalua-
tions includes those points that are rejected because 
they do not fall into the improving level set. This the- 
oretical performance result motivates the use of hit and 
run for optimization. Numerical experience indicates 
that IHR has been especially useful in high-dimensional 
global optimization problems when there are many lo- 
cal minima embedded within a broad convex structure. 

The general framework for a hit and run based op- 
timization algorithm for solving a global optimization 
problem, 


$$\min f(x) \quad \text{s.t.} \quad x \in S,$$


where f is a real-valued function on S, is stated below. 


PROCEDURE hit and run optimization method()
    InputInstance();
    Generate an initial solution X_0;
    Set Y_0 = f(X_0);
    Set k = 0;
    DO until stopping criterion is met;
        Generate a random direction D_k;
        Generate a random steplength λ_k;
        Evaluate candidate point W_k = X_k + λ_k D_k;
        Update the new point,
            X_{k+1} = W_k   if candidate point accepted,
            X_{k+1} = X_k   if rejected;
        Set Y_{k+1} = min(Y_k, f(X_{k+1}));
        Set k = k + 1;
    OD;
    RETURN (Best solution found, Y_k);
END hit and run optimization method;

Pseudocode for a hit and run based optimisation algorithm
Improving hit and run uses the most basic hit and run generator, which is to generate a direction vector $D_k$ that is uniformly distributed on a hypersphere, and then generate a steplength $\lambda_k$ uniformly on the intersection of the line through $X_k$ in direction $D_k$ with the feasible set S. In many applications, S may be an n-dimensional polytope described by linear constraints, in which case the intersection of a direction with S is easily computed using a slight modification of a minimum ratio test (see [16] for details). This is the most basic hit and run generator, but several variations have been developed.
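A minimal Python sketch of improving hit and run, assuming the feasible set S is a box (the simplest polytope), so that the ratio test for the chord reduces to per-coordinate bounds. Directions are drawn uniformly on the hypersphere by normalising a standard Gaussian vector, the steplength is uniform on the chord, and only improving candidates are accepted. The function names, the fixed iteration budget used as the stopping criterion, and the box assumption are illustrative choices, not the implementation of [16].

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def random_direction(n: int, rng: random.Random) -> List[float]:
    """Direction uniformly distributed on the unit hypersphere (normalised Gaussian)."""
    while True:
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in d))
        if norm > 0.0:
            return [v / norm for v in d]


def chord(x, d, lower, upper) -> Tuple[float, float]:
    """Range [lam_min, lam_max] keeping x + lam*d inside the box (a ratio test)."""
    lam_min, lam_max = -math.inf, math.inf
    for xi, di, lo, hi in zip(x, d, lower, upper):
        if di > 0.0:
            lam_min, lam_max = max(lam_min, (lo - xi) / di), min(lam_max, (hi - xi) / di)
        elif di < 0.0:
            lam_min, lam_max = max(lam_min, (hi - xi) / di), min(lam_max, (lo - xi) / di)
    return lam_min, lam_max


def improving_hit_and_run(f: Callable[[Sequence[float]], float], x0, lower, upper,
                          iterations: int = 2000, seed: int = 0):
    rng = random.Random(seed)
    x, y = list(x0), f(x0)
    for _ in range(iterations):
        d = random_direction(len(x), rng)                # direction D_k
        lam = rng.uniform(*chord(x, d, lower, upper))    # steplength lambda_k
        w = [xi + lam * di for xi, di in zip(x, d)]      # candidate W_k
        fw = f(w)
        if fw < y:                                       # accept only improvements
            x, y = w, fw
    return x, y
```

For example, `improving_hit_and_run(lambda v: sum((vi - 0.3) ** 2 for vi in v), [0.9] * 3, [-1.0] * 3, [1.0] * 3)` drives a three-dimensional quadratic close to its minimum at 0.3 in each coordinate. Rejected candidates still cost a function evaluation, which is why the count discussed above includes them.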

One variation is to add an acceptance probability with a cooling schedule to the hit and run generator, as in simulated annealing (cf. ▸ Simulated annealing). This was developed in [10] and called the hide-and-seek algorithm. Just as IHR was motivated by pure adaptive search, hide-and-seek was motivated by adaptive search [9] (see also ▸ Random search methods). Adaptive search generates a series of points according to a sequence of Boltzmann distributions, with parameter T changing on each iteration. The theory predicts that adaptive search with decreasing temperature parameter T will converge with probability one to the global optimum, and the number of improving points has the same linear bound as PAS. Hide-and-seek uses the
basic hit and run generator, but accepts the candidate 
point with the Metropolis criterion and parameter T. It 
is interesting to consider the two extremes of the accep- 
tance probability: if the temperature is fixed at infinity, 
then all candidate points are accepted, and the hit and 
run generator approximates pure random search with 
a uniform distribution; at the other extreme if the tem- 
perature is fixed to zero, then only improving points 
are accepted, and we have improving hit and run. H.E. 
Romeijn and Smith derived a cooling schedule which 
essentially starts with hit and run, and approaches IHR. 
They proved that hide-and-seek will eventually con- 
verge to the global optimum, even though it may expe- 
rience deteriorations in objective function values. They 
also present computational results on several test func- 
tions, which compare favorably with other algorithms 
in the literature. 
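A compact sketch of hide-and-seek on a box-shaped feasible set follows. Candidates come from the hit and run generator, and the Metropolis criterion decides whether to accept them. The geometric cooling `T *= cooling` is an assumption made for illustration; it is not the Romeijn and Smith schedule. The box domain, parameter names and function names are hypothetical.

```python
import math
import numpy as np

def chord_in_box(x, d, lo, hi):
    """Steplength interval [a, b] such that x + lam * d stays in the box [lo, hi]."""
    t1, t2 = (lo - x) / d, (hi - x) / d
    return np.max(np.minimum(t1, t2)), np.min(np.maximum(t1, t2))

def hide_and_seek(f, x0, lo, hi, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Hit and run candidates accepted by the Metropolis criterion at temperature T."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    T = t0
    for _ in range(n_iter):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)                 # uniform direction on the hypersphere
        a, b = chord_in_box(x, d, lo, hi)
        y = x + rng.uniform(a, b) * d          # uniform point on the chord
        fy = f(y)
        # Metropolis: always accept improvements, accept worse points w.p. exp(-(fy - fx)/T)
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / T):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        T *= cooling                           # assumed geometric cooling schedule
    return best_x, best_f
```

A very large `t0` with `cooling=1.0` accepts almost every candidate, as plain hit and run does. A very small `t0` accepts essentially only improving points, which approximates improving hit and run. These are the two extremes described above.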

A second variation to the basic hit and run gener- 
ator is to modify the direction distribution. Thus far, 
we have only described choosing a direction according 
to a uniform distribution on an n-dimensional hypersphere, which has also been termed hyperspherical direction (HD) choice. In [16] and [10], the direction distribution is defined more generally; the direction may
be generated from a multivariate normal distribution 
with mean 0 and covariance matrix H. If the H matrix is 
the identity matrix, then the direction distribution is es- 
sentially the uniform distribution on a hypersphere. In 
[4] a nonuniform direction distribution is derived that 
optimizes the rate of convergence of the algorithm. Al- 
though exact implementation of the optimal direction 
distribution may be very difficult, it motivates an adap- 
tive direction choice rule called artificial centering hit 
and run. 
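The sketch below draws a direction from a multivariate normal distribution with covariance H. Normalizing the sample to unit length is an assumption of this sketch. Hit and run recomputes the chord anyway, so only the direction matters. With H = I this reduces to the HD choice. The function name is hypothetical.

```python
import numpy as np

def direction_from_covariance(H, rng):
    """Unit direction d = z / ||z|| with z ~ N(0, H); H = I gives the HD choice."""
    z = rng.multivariate_normal(np.zeros(H.shape[0]), H)
    return z / np.linalg.norm(z)
```

With a strongly anisotropic H, for example `np.diag([100.0, 0.01])`, almost all directions point nearly along the first coordinate axis.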

Another choice for direction distribution is the coordinate direction (CD) method, in which the direction is chosen uniformly from the n coordinate vectors (spanning $\mathbb{R}^n$). Both HD and CD versions of direction choice were presented and applied to identifying nonredundant linear constraints in [1]. They were
also tested in the context of global optimization in [14]. 
Computationally, CD can outperform HD on specific problems where the optimum is properly aligned; however, HD is guaranteed to converge with probability one,
while it is easy to construct problems where CD will 
never converge to the global optimum. A simple ex- 
ample is given in [5] where local minima are lined up 
on the coordinate directions, and it is impossible for 
the CD algorithm to leave the local minimum unless 
it accepts a nonimproving point. For such an exam- 
ple, in [5] it is shown that the CD algorithm coupled 
with a nonzero acceptance probability for nonimprov- 
ing points will converge with probability one. Experi- 
mental results were also reported. 
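For comparison, here is a sketch of the CD choice on a box-shaped feasible set. In a box, the chord along a coordinate axis is simply the corresponding edge of the box. The box domain and the function names are assumptions made for illustration.

```python
import numpy as np

def coordinate_direction(n, rng):
    """CD choice: one of the coordinate vectors e_1, ..., e_n, each with probability 1/n."""
    d = np.zeros(n)
    d[rng.integers(n)] = 1.0
    return d

def cd_step(x, lo, hi, rng):
    """One hit and run step in the box [lo, hi] along a randomly chosen coordinate axis."""
    i = int(np.argmax(coordinate_direction(x.size, rng)))
    y = x.copy()
    y[i] = rng.uniform(lo[i], hi[i])   # the chord along e_i is the whole edge [lo_i, hi_i]
    return y
```

Each step changes a single coordinate. This is why CD cannot leave a local minimum whose neighbors along every axis are worse unless nonimproving points are accepted.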

A third variation to the basic hit and run generator 
modifies it to be applicable to discrete domains [7,11]. 
Hit and run as described so far has been defined on 
a continuous domain. An extension to a discrete do- 
main was accomplished by superimposing the discrete 
domain onto a continuous real number system. It was 
motivated by design variables such as fiber angles in 
a composite laminate, or diameters in a 10-bar truss, 
where the discrete variables have a natural continuous 
analog. Two slightly different modifications have been 
introduced. 

In [11] the candidate points were generated using hit and run on the expanded continuous domain,
where the objective function of a nondiscrete point is 
equal to the objective function evaluated at its nearest 
discrete value. In this way, the modified algorithm operates on a continuous domain where the objective function is a multidimensional step function, with plateaus
surrounding the discrete points. This modification still 
converges with probability 1 to the global optimum, as 
proven in [11]. 
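The first discrete scheme can be sketched as improving hit and run on the continuous box, with the objective evaluated at the nearest discrete point. The uniform lattice with spacing `GRID`, the box bounds and the function names are hypothetical. This sketch accepts only improving candidates, which is one possible acceptance rule.

```python
import numpy as np

GRID = 0.5  # hypothetical spacing of the discrete values in every coordinate

def nearest_discrete(x):
    """Nearest point of the lattice GRID * Z^n (np.round rounds halves to even)."""
    return GRID * np.round(x / GRID)

def box_chord(x, d, lo, hi):
    """Steplength interval [a, b] keeping x + lam * d inside the box [lo, hi]."""
    t1, t2 = (lo - x) / d, (hi - x) / d
    return np.max(np.minimum(t1, t2)), np.min(np.maximum(t1, t2))

def improving_hr_plateau(f, x0, lo, hi, n_iter=2000, seed=0):
    """Improving hit and run on the continuous domain with f(A) := f(B), B nearest to A."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(nearest_discrete(x))
    for _ in range(n_iter):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        a, b = box_chord(x, d, lo, hi)
        y = x + rng.uniform(a, b) * d          # candidate A on the continuous domain
        fy = f(nearest_discrete(y))            # evaluated at its nearest discrete point B
        if fy < fx:
            x, fx = y, fy                      # the continuous point A is kept
    return x, nearest_discrete(x), fx
```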

The diagram in Fig. 1 illustrates this method. Starting from point $X_1$, hit and run on the continuous domain generates a candidate point such as A. The objective function at A is set equal to that of its nearest discrete point B, forcing f(A) = f(B). If the candidate point is accepted, then $X_2 = A$, and another candidate point (shown as C) is generated.

A second scheme to modify hit and run to operate on discrete domains also generates a point on a continuous domain, and then rounds the generated point to its nearest discrete point in the domain on each iteration [6,7,13]. Again starting from point $X_1$ in Fig. 1, suppose A is generated. In this version, the candidate point is taken as the nearest discrete neighbor, in this example B. The objective function is evaluated at B, and if the point is accepted, then $X_2 = B$. The difference in this variation is that the next candidate point is generated from B instead of from A; see point D in Fig. 1. Also note that only discrete points are maintained. In [6,7] it is shown that this second scheme dominates the first scheme in terms of average performance for the special class of spherical programs, and numerical results have been promising.
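Here is a sketch of the second scheme under the same hypothetical lattice and box assumptions. Each continuous candidate is rounded, and only discrete points are ever kept as iterates. Clipping to the box is an extra safeguard added for this sketch.

```python
import numpy as np

GRID = 0.5  # hypothetical spacing; lo and hi are assumed to be multiples of GRID

def improving_hr_rounded(f, x0, lo, hi, n_iter=2000, seed=0):
    """Improving hit and run that rounds each candidate and keeps only discrete points."""
    rng = np.random.default_rng(seed)
    x = GRID * np.round(np.asarray(x0, dtype=float) / GRID)   # start from a discrete point
    fx = f(x)
    for _ in range(n_iter):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        t1, t2 = (lo - x) / d, (hi - x) / d
        a, b = np.max(np.minimum(t1, t2)), np.min(np.maximum(t1, t2))
        A = x + rng.uniform(a, b) * d                   # point on the continuous domain
        B = np.clip(GRID * np.round(A / GRID), lo, hi)  # candidate = nearest discrete point
        fB = f(B)
        if fB < fx:
            x, fx = B, fB       # the next line is drawn through B, not through A
    return x, fx
```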

Another modification to the basic hit and run gen- 
erator is in the way the steplength is generated. Instead 
of generating the point uniformly on the whole line 
segment, the line segment can be restricted to a fixed 
length, or adaptively modified. S. Neogi [6] refers to 
this as full-line length, restricted line length, or adap- 
tive stepsize. In [6] the adaptive stepsize is coupled with 
an acceptance probability to maintain a fixed probabil- 
ity of generating an improving point. See [6] for a more 
detailed discussion of this variation of a simulated an- 
nealing algorithm based on the hit and run generator. 
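A restricted line length can be implemented by clipping the feasible chord to a window $[-L, L]$. The adaptive update below is a hypothetical rule, not the one in [6]. It enlarges L after an improvement and shrinks it otherwise. The factors are chosen so that L stays constant on average exactly when a fraction `target` of the candidates are improving.

```python
def restricted_chord(a, b, L):
    """Clip the feasible chord [a, b] (with a <= 0 <= b) to the window [-L, L]."""
    return max(a, -L), min(b, L)

def update_length(L, improved, target=0.2, factor=1.1):
    """Hypothetical adaptive stepsize: grow L after an improvement, shrink it otherwise.

    log L rises by log(factor) on success and falls by target/(1-target) * log(factor)
    on failure, so its expected change is zero when the success rate equals target.
    """
    return L * factor if improved else L * factor ** (-target / (1.0 - target))
```

With `target=0.2`, one improvement followed by four failures returns L to its previous value.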

The many variations of hit and run have been nu- 
merically tested on many test functions and applied to 
real applications. All of the papers referenced in this ar- 
ticle include numerical results, but the details are left to 
the individual papers. Overall, the theoretical motivations and numerical experience lead us to believe that
hit and run is a promising approach to global optimiza- 
tion. 


See also 


> Random Search Methods 

> Stochastic Global Optimization: Stopping Rules 

> Stochastic Global Optimization: Two-phase 
Methods 
Introduction


Mathematically the global optimization problem is formulated as

$$f^* = \min_{X \in D} f(X),$$

where a nonlinear function $f(X)$, $f : \mathbb{R}^n \to \mathbb{R}$, of continuous variables $X$ is the objective function; $D \subseteq \mathbb{R}^n$ is the feasible region; $n$ is the number of variables. The global minimum $f^*$ and one or all global minimizers $X^*$,

$$f(X^*) = f^*,$$

should be found. No assumptions on unimodality are 
included in the formulation of the problem. Most often 
an objective function is defined by an analytical formula 
or an algorithm, which evaluates the value of the ob- 
jective function using the values of variables and arith- 
metic operations. » Continuous global optimization: 
models, algorithms and software. 

One class of methods for global optimization is based on interval arithmetic. Interval
arithmetic [10] provides bounds for the function val- 
ues over hyper-rectangular regions defined by intervals 
of variables. The bounds may be used in global opti- 
mization to detect the sub-regions of the feasible region 
which cannot contain a global minimizer. Such sub- 
regions may be discarded from the subsequent search 
for a minimum. 

Interval arithmetic provides guaranteed bounds but 
sometimes they are too pessimistic. Interval arithmetic 
is used in global optimization to provide guaranteed 
solutions, but there are problems for which the time 
for optimization is too long. A disadvantage of interval 
arithmetic is the dependency problem [5]: when a given 
variable occurs more than once in interval computation, it is treated as a different variable in each occurrence. This causes widening of computed intervals and
overestimation of the range of function values. 
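The dependency problem is easy to reproduce in a few lines of Python. In this sketch an interval is a (lower, upper) tuple, and no outward rounding is done.

```python
def isub(x, y):
    """Standard interval subtraction [x_lo - y_hi, x_hi - y_lo]."""
    return (x[0] - y[1], x[1] - y[0])

def imul(x, y):
    """Standard interval multiplication: hull of the four endpoint products."""
    p = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return (min(p), max(p))

print(isub((0.0, 1.0), (0.0, 1.0)))    # (-1.0, 1.0) although x - x = 0 for every x
print(imul((-1.0, 1.0), (-1.0, 1.0)))  # (-1.0, 1.0) although x * x lies in [0, 1]
```

Each occurrence of x is treated as an independent variable, so the computed bounds are valid but wider than the true ranges $\{0\}$ and $[0, 1]$.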

Analysis of both overestimating and underestimat- 
ing intervals is useful to estimate how much inter- 
val bounds overestimate the range of function values. 
Moreover inner interval arithmetic operations may be 
used instead of standard interval arithmetic operations 
in some cases when dependency of operands is known 
or operands are known to be monotonic. Although monotonicity cannot easily be determined in advance, inner and standard interval arithmetic operations may be chosen randomly. This yields random interval arithmetic, and the range of real function values can then be estimated from a sample of random intervals.


Methods / Applications 
Interval Analysis in Global Optimization 


Interval arithmetic was proposed in [10]. Interval arithmetic operates with real intervals $\mathbf{x} = [\underline{x}, \overline{x}] = \{x \in \mathbb{R} \mid \underline{x} \le x \le \overline{x}\}$, defined by two real numbers $\underline{x} \in \mathbb{R}$ and $\overline{x} \in \mathbb{R}$, $\underline{x} \le \overline{x}$. For any real arithmetic operation $x \circ y$, the corresponding interval arithmetic operation $\mathbf{x} \circ \mathbf{y}$ is defined as an operation whose result is an interval containing every possible number produced by the real operation with the real numbers from each interval. The interval multiplication and division operations are defined as:


xy.xy x>0 y>0 
xy. xy x>0,0€y, 
et x>0,y<0, 
[xy, xy]. xX,y>0, 
[ min(xy. xy), 

ae a max(xy, ¥9) | Oex,0Ey, 
xy, xy], Eex,y<0, 
xy, xy], x y>0 
xy xy, x<0,0€y, 
xy, Xy|. x<0,y <0, 


x/y,xly|, x>0,y>0, 
xly,xly|, x>0,y<0, 

ree xly,xly|, OEx,y>O0, 
= [x/y,x/y], 0€xX,7 <0, 
xly,xly|, x<0,y>0, 

xly, xly|, x<0,y <0. 
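The case tables for multiplication and division can be transcribed directly. They can then be checked against the definition, which is the smallest interval containing all endpoint products or quotients. The sketch below is plain Python without outward rounding, so it is not a rigorous interval library. The helper names are hypothetical.

```python
def _sign_class(lo, hi):
    """'pos' if lo >= 0, 'neg' if hi <= 0, otherwise 'mix' (0 strictly inside)."""
    if lo >= 0:
        return "pos"
    if hi <= 0:
        return "neg"
    return "mix"

def imul(x, y):
    """Interval multiplication by the nine-case table."""
    (xl, xu), (yl, yu) = x, y
    table = {
        ("pos", "pos"): (xl * yl, xu * yu),
        ("pos", "mix"): (xu * yl, xu * yu),
        ("pos", "neg"): (xu * yl, xl * yu),
        ("mix", "pos"): (xl * yu, xu * yu),
        ("mix", "mix"): (min(xl * yu, xu * yl), max(xl * yl, xu * yu)),
        ("mix", "neg"): (xu * yl, xl * yl),
        ("neg", "pos"): (xl * yu, xu * yl),
        ("neg", "mix"): (xl * yu, xl * yl),
        ("neg", "neg"): (xu * yu, xl * yl),
    }
    return table[(_sign_class(xl, xu), _sign_class(yl, yu))]

def idiv(x, y):
    """Interval division by the six-case table; requires 0 not in y."""
    (xl, xu), (yl, yu) = x, y
    if yl <= 0 <= yu:
        raise ZeroDivisionError("divisor interval contains 0")
    if yl > 0:
        table = {"pos": (xl / yu, xu / yl), "mix": (xl / yl, xu / yl), "neg": (xl / yl, xu / yu)}
    else:
        table = {"pos": (xu / yu, xl / yl), "mix": (xu / yu, xl / yu), "neg": (xu / yl, xl / yu)}
    return table[_sign_class(xl, xu)]
```

The case analysis needs only two products (or quotients) in all but one case. A naive implementation computes all four and takes the minimum and maximum.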


An interval function can be constructed by replacing
the usual arithmetic operations by interval arithmetic 
operations in the formula or the algorithm for calcu- 
lating values of the function. An interval value of the 
function can be evaluated using the interval function 
with interval arguments. The resulting interval always 
encloses the range of real function values in the hyper- 
rectangular region defined by the vector of interval 
arguments: 


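$$\{ f(X) \mid X \in \mathbf{X} \} \subseteq \mathbf{f}(\mathbf{X}), \qquad X \in \mathbb{R}^n,\ \mathbf{X} \in [\mathbb{R}, \mathbb{R}]^n,$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $\mathbf{f} : [\mathbb{R}, \mathbb{R}]^n \to [\mathbb{R}, \mathbb{R}]$. Because of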
this property the interval value of the function can be 
used as the lower and upper bounds for the function in 
the region which may be used in global optimization. 
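A natural interval extension is obtained by running the same formula with interval operations. The example below uses the hypothetical function $f(x) = x(1-x)$. Its interval value encloses every sampled real value, but on $[0, 1]$ it overestimates the exact range $[0, 0.25]$, because x occurs twice. Intervals are (lower, upper) tuples, and no outward rounding is done.

```python
def isub(x, y):
    return (x[0] - y[1], x[1] - y[0])

def imul(x, y):
    p = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return (min(p), max(p))

def f_real(x):
    return x * (1.0 - x)

def f_interval(X):
    """Same formula as f_real with interval operations (natural interval extension)."""
    return imul(X, isub((1.0, 1.0), X))

print(f_interval((0.0, 1.0)))   # encloses the exact range [0, 0.25]
print(f_interval((0.4, 0.6)))   # narrower box, tighter (still valid) bounds
```

Shrinking the box shrinks the overestimation. Branch and bound methods exploit this by subdividing the feasible region.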

The first interval global optimization algorithm was aimed at the minimization of a rational function by bisection of sub-domains [12]. Interval
methods for global optimization were further devel- 
oped in [3,4,11], where the interval Newton method 
and the test of strict monotonicity were introduced. 
A thorough description including theoretical as well as 
practical aspects can be found in [5] where a very ef- 
ficient interval global optimization method involving 
monotonicity and non-convexity tests and the special 
interval Newton method is described. > Interval global 
optimization. 

A branch and bound technique is usually used to 
construct interval global optimization algorithms. An 
iteration of a classical branch and bound algorithm 
processes a yet unexplored sub-region of the feasi- 
ble region. Iterations have three main components: 
selection of the sub-region from a candidate list to 
process, bound calculation, and branching. In interval global optimization algorithms, bounds are calculated using interval arithmetic. All interval global optimization branch and bound algorithms use hyper-rectangular partitions, and branching is usually performed by bisecting the hyper-rectangle into two. Variants
of interval branch-and-bound algorithms for global 
optimization where the bisection was substituted by 
the subdivision of subregions into many subregions in 
a single iteration step have been investigated in [2]. The 
convergence properties have been investigated in detail. 
An extensive numerical study is presented in [8]. » Bi- 
section global optimization methods; > Interval analy- 
sis: Subdivision directions in interval branch and bound 
techniques. 
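Below is a minimal one-dimensional interval branch and bound with bisection. It uses the natural interval extension for lower bounds and midpoint values for the upper bound. The test function $f(x) = x^4 - 16x^2 + 5x$, the tolerance and the selection rule (lowest lower bound first) are assumptions of this sketch. Practical methods add monotonicity tests and interval Newton steps.

```python
import heapq

def iadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

def imul(x, y):
    p = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return (min(p), max(p))

def iscale(c, x):
    return (min(c * x[0], c * x[1]), max(c * x[0], c * x[1]))

def f(x):
    return x**4 - 16 * x**2 + 5 * x

def F(X):
    """Natural interval extension of f; X occurs several times, so bounds are not tight."""
    X2 = imul(X, X)
    return iadd(iadd(imul(X2, X2), iscale(-16.0, X2)), iscale(5.0, X))

def interval_bb(lo, hi, tol=1e-6):
    best = min(f(lo), f(hi), f(0.5 * (lo + hi)))     # upper bound on f*
    heap = [(F((lo, hi))[0], lo, hi)]                # candidate list keyed by lower bound
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > best:
            continue                                 # cannot contain a global minimizer
        if b - a < tol:
            return (a, b), lb, best                  # lb <= f* <= best
        m = 0.5 * (a + b)
        best = min(best, f(m))
        for sub in ((a, m), (m, b)):                 # bisection
            lb_sub = F(sub)[0]
            if lb_sub <= best:
                heapq.heappush(heap, (lb_sub, *sub))
    raise RuntimeError("candidate list became empty")
```

The returned box lies near the global minimizer $x \approx -2.9035$. Sub-boxes around the local minimizer near 2.75 are discarded because their lower bounds exceed the best value found.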

The tightness of bounds is a very important factor 
for efficiency of branch and bound based global op- 
timization algorithms. An experimental model of in- 
terval arithmetic with controllable tightness of bounds 
to investigate the impact of bound tightening in inter- 
val global optimization was proposed in [14]. The ex- 
perimental results on efficiency of tightening bounds 
were presented for several test and practical problems. 
Experiments have shown that the relative tightness of 
bounds strongly influences efficiency of global opti- 
mization algorithms based on the branch and bound 
approach combined with interval arithmetic. 


Underestimating Interval Arithmetic 


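Kaucher arithmetic [6,7], which defines underestimates, is useful to estimate how much interval bounds overestimate the range of function values. Kaucher arithmetic operations ($\circ_u$) are defined as:

$$\mathbf{x} +_u \mathbf{y} = [\underline{x} + \overline{y} \vee \overline{x} + \underline{y}],$$

$$\mathbf{x} -_u \mathbf{y} = [\underline{x} - \underline{y} \vee \overline{x} - \overline{y}],$$

$$
\mathbf{x} \times_u \mathbf{y} =
\begin{cases}
[\underline{x}\,\underline{y},\ \underline{x}\,\overline{y}], & \mathbf{x} \ge 0,\ 0 \in \mathbf{y},\\
[\underline{x}\,\underline{y} \vee \overline{x}\,\overline{y}], & \mathbf{x} \ge 0,\ \mathbf{y} \le 0 \text{ or } \mathbf{x} \le 0,\ \mathbf{y} \ge 0,\\
[\underline{x}\,\overline{y} \vee \overline{x}\,\underline{y}], & \mathbf{x} \ge 0,\ \mathbf{y} \ge 0 \text{ or } \mathbf{x} \le 0,\ \mathbf{y} \le 0,\\
[\underline{x}\,\underline{y},\ \overline{x}\,\underline{y}], & 0 \in \mathbf{x},\ \mathbf{y} \ge 0,\\
[0, 0], & 0 \in \mathbf{x},\ 0 \in \mathbf{y},\\
[\overline{x}\,\overline{y},\ \underline{x}\,\overline{y}], & 0 \in \mathbf{x},\ \mathbf{y} \le 0,\\
[\overline{x}\,\overline{y},\ \overline{x}\,\underline{y}], & \mathbf{x} \le 0,\ 0 \in \mathbf{y},
\end{cases}
$$

$$
\mathbf{x} /_u \mathbf{y} =
\begin{cases}
[\underline{x}/\underline{y} \vee \overline{x}/\overline{y}], & \mathbf{x} \ge 0,\ \mathbf{y} > 0 \text{ or } \mathbf{x} \le 0,\ \mathbf{y} < 0,\\
[\underline{x}/\overline{y} \vee \overline{x}/\underline{y}], & \mathbf{x} \ge 0,\ \mathbf{y} < 0 \text{ or } \mathbf{x} \le 0,\ \mathbf{y} > 0,\\
[\underline{x}/\overline{y},\ \overline{x}/\overline{y}], & 0 \in \mathbf{x},\ \mathbf{y} > 0,\\
[\overline{x}/\underline{y},\ \underline{x}/\underline{y}], & 0 \in \mathbf{x},\ \mathbf{y} < 0,
\end{cases}
$$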


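where $[a \vee b] = [\min(a, b), \max(a, b)]$. Underestimating interval arithmetic is guaranteed to underestimate the range:

$$\mathbf{f}_u(\mathbf{X}) \subseteq \{ f(X) \mid X \in \mathbf{X} \} \subseteq \mathbf{f}(\mathbf{X}).$$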
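The Kaucher addition and subtraction take only a few lines with intervals as (lower, upper) tuples. For $x \in [0, 1]$, the underestimate of $x + x$ is the single value $[1, 1]$, which lies inside the exact range $[0, 2]$. The underestimate of $x - x$ is exactly $[0, 0]$. This is an illustrative sketch with hypothetical function names.

```python
def vee(a, b):
    """[a v b] = [min(a, b), max(a, b)]"""
    return (min(a, b), max(a, b))

def add_std(x, y):
    return (x[0] + y[0], x[1] + y[1])

def add_u(x, y):
    """Kaucher (underestimating) addition: [x_lo + y_hi  v  x_hi + y_lo]."""
    return vee(x[0] + y[1], x[1] + y[0])

def sub_u(x, y):
    """Kaucher (underestimating) subtraction: [x_lo - y_lo  v  x_hi - y_hi]."""
    return vee(x[0] - y[0], x[1] - y[1])

x = (0.0, 1.0)
print(add_std(x, x), add_u(x, x))   # (0.0, 2.0) (1.0, 1.0)
print(sub_u(x, x))                  # (0.0, 0.0)
```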


An interval defined by Kaucher arithmetic is a worst 
case estimate and can be the degenerate interval [0, 0]. 
A regularized version of Kaucher arithmetic proposed 
in [13] assumes regularity of the dependency between 
variables. In the underestimation assuming regular- 
ity of the dependency between variables, multiplica- 
tion operation (X,,) is defined differently from Kaucher 
arithmetic: 


min(x y, x ee 
x>0,y>0 or X<0,y <0, 


w(x, %, 7, y), max(x v.59) ], 
x> 0.y <0 or x<0,y>0, 


WEF) WE YD). 
otherwise , 


(x2-21) y2—x1(y2—y1) 

*2y1 5 ( piesa Me) >I, 
X2—X1)Y2—X1(y2—Y1 
X12» 2(x2-x1)(y2—-y1) 


(x2 y2—x1 1)" : 
ane otherwise . 


(x1, X2, V1, V2) = <0, 


In [1,9] inner interval arithmetic is defined. If the 
operands in the interval operations to calculate the 
function values are known to be monotonic then stan- 
dard interval arithmetic operations may be combined 
with inner interval operations to tighten resulting in- 
tervals without losing the guarantee of enclosure [1]. If 
it is known that operands in subtraction or division are 
dependent or are monotonic and have the same mono- 
tonicity (either both are monotonically increasing or 
both monotonically decreasing), then inner interval operations may be used instead of standard interval operations. If it is known that operands in summation or
multiplication are monotonic and do not have the same 
monotonicity (one is monotonically increasing and an- 
other is monotonically decreasing) then inner interval 
operations may be used instead of standard interval 
operations. 
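As an illustration of mixing operations when monotonicity is known, consider the hypothetical function $f(x) = x + (1 - x)$ on $[0, 1]$. Here x is increasing and $1 - x$ is decreasing, so by the rule above inner addition may replace standard addition. Intervals are (lower, upper) tuples in this sketch.

```python
def vee(a, b):
    return (min(a, b), max(a, b))

def add(x, y):            # standard addition
    return (x[0] + y[0], x[1] + y[1])

def sub(x, y):            # standard subtraction
    return (x[0] - y[1], x[1] - y[0])

def add_in(x, y):         # inner addition
    return vee(x[0] + y[1], x[1] + y[0])

X = (0.0, 1.0)
one_minus_x = sub((1.0, 1.0), X)       # (0.0, 1.0), decreasing in x
print(add(X, one_minus_x))             # (0.0, 2.0): standard, overestimates
print(add_in(X, one_minus_x))          # (1.0, 1.0): inner, equals the exact range {1}
```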

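The difference between inner interval operations ($\circ_{in}$) and underestimating interval operations ($\circ_u$) concerns the result of multiplication:

$$
\mathbf{x} \times_{in} \mathbf{y} =
\begin{cases}
[\max(\underline{x}\,\overline{y}, \overline{x}\,\underline{y}),\ \min(\underline{x}\,\underline{y}, \overline{x}\,\overline{y})], & 0 \in \mathbf{x},\ 0 \in \mathbf{y},\\
\mathbf{x} \times_u \mathbf{y}, & \text{otherwise},
\end{cases}
\qquad
\mathbf{x} /_{in} \mathbf{y} = \mathbf{x} /_u \mathbf{y}.
$$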


Random Interval Arithmetic 


It is difficult computationally to find which operands 
are dependent, to be certain they are monotonic, and to 
determine their monotonicity (intervals of the deriva- 
tives of all operands need to be found). Random inter- 
val arithmetic proposed in [1] is obtained by choosing 
standard or inner interval operations randomly with the 
same probability at each step of the computation. The 
range of function values is estimated using a number of 
sample intervals evaluated using random interval arith- 
metic. The estimation is based on the assumptions that 
the distribution of the centres of the evaluated inter- 
vals is normal with a very small relative standard de- 
viation and the distribution of the radii is normal but 
taking only positive values. The mean value of the centres $\mu_{\text{centres}}$, the mean value of the radii $\mu_{\text{radii}}$ and the standard deviation of the radii $\sigma_{\text{radii}}$ of the random intervals are used to evaluate an approximate range of the function

$$\left[\mu_{\text{centres}} \pm (\mu_{\text{radii}} + \alpha\,\sigma_{\text{radii}})\right], \qquad (1)$$


where $\alpha$ is between 1 and 3 depending on the number of samples and the desired probability that the exact range is included in the estimated range. It is suggested in [1] that a compromise between efficiency and robustness can be obtained using $\alpha = 1.5$ and 30 samples. Experimental results presented in [1] for some functions over
small intervals show that random interval arithmetic 
provides tight estimates of the ranges of the consid- 
ered function values with probability close to 1. How- 
ever, in the experiments, the intervals of variables of the 
function considered were small. In the case of large in- 
tervals of variables, and particularly for multi-variable 
functions, the obtained estimates for a range of func- 
tion values frequently do not fully enclose the range of 
function values. 
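Below is a sketch of random interval arithmetic for the hypothetical test function $f(x) = x \cdot x + x$ on $[1, 2]$, whose exact range is $[2, 6]$. It is followed by formula (1) with $\alpha = 1.5$ and 30 samples, as suggested in [1]. To keep the example short, the operations are written only for intervals with nonnegative endpoints. The sample standard deviation is used for $\sigma$. Both are assumptions of this sketch.

```python
import random
import statistics

def vee(a, b):
    return (min(a, b), max(a, b))

# Standard and inner operations, valid here only for intervals with nonnegative endpoints.
def add_std(x, y): return (x[0] + y[0], x[1] + y[1])
def add_in(x, y):  return vee(x[0] + y[1], x[1] + y[0])
def mul_std(x, y): return (x[0] * y[0], x[1] * y[1])
def mul_in(x, y):  return vee(x[0] * y[1], x[1] * y[0])

def f_random(X, rng):
    """One random interval evaluation of f(x) = x*x + x: each operation is standard
    or inner with probability 1/2."""
    mul = mul_std if rng.random() < 0.5 else mul_in
    add = add_std if rng.random() < 0.5 else add_in
    return add(mul(X, X), X)

def estimate_range(samples, alpha=1.5):
    """Formula (1): [mu_centres -/+ (mu_radii + alpha * sigma_radii)]."""
    centres = [(lo + hi) / 2 for lo, hi in samples]
    radii = [(hi - lo) / 2 for lo, hi in samples]
    mu_c = statistics.mean(centres)
    half = statistics.mean(radii) + alpha * statistics.stdev(radii)
    return (mu_c - half, mu_c + half)

rng = random.Random(0)
samples = [f_random((1.0, 2.0), rng) for _ in range(30)]
print(estimate_range(samples))
```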

For the application of random interval arithmetic to 
global optimization it is important to extend these ideas 
to the case of functions defined over large multidimensional regions. Balanced random interval arithmetic, proposed in [16] as an extension of the ideas of [1], is obtained
by choosing standard and inner interval operations at 
each step of the computation randomly with prede- 
fined probabilities for the standard and inner opera- 
tions. A number of sample intervals are evaluated. It is 
assumed that the distribution of centres of the evaluated 
balanced random intervals is normal and that the distri- 
bution of radii is folded normal, also known as absolute 
normal, because the radii cannot be negative. The range 
of values of the function in the defined region is estimated using the mean values ($\mu$) and the standard deviations ($\sigma$) of the centres and radii of the evaluated balanced random intervals:

$$\left[\mu_{\text{centres}} \pm (3\sigma_{\text{centres}} + \mu_{\text{radii}} + 3\sigma_{\text{radii}})\right]. \qquad (2)$$


The ranges of values of the objective function esti- 
mated using balanced random interval arithmetic can 
be used in the general branch and bound framework 
building a stochastic global optimization algorithm. 
The performance of such an algorithm has been eval- 
uated experimentally on market model estimation [17] 
and on chemical engineering problems. When speed of 
optimization is more important than guaranteed reli- 
ability, such an algorithm is a good alternative to the 
algorithm with standard interval arithmetic because it 
is several times faster. 
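Formula (2) and the balanced choice of operations can be sketched in the same way. The probability `p_std` of choosing the standard operation is the predefined parameter. The test function and the restriction to nonnegative intervals are again assumptions of the sketch.

```python
import random
import statistics

def vee(a, b):
    return (min(a, b), max(a, b))

# Standard and inner operations, valid here only for intervals with nonnegative endpoints.
def add_std(x, y): return (x[0] + y[0], x[1] + y[1])
def add_in(x, y):  return vee(x[0] + y[1], x[1] + y[0])
def mul_std(x, y): return (x[0] * y[0], x[1] * y[1])
def mul_in(x, y):  return vee(x[0] * y[1], x[1] * y[0])

def f_balanced(X, p_std, rng):
    """Evaluate f(x) = x*x + x, choosing the standard operation with probability p_std."""
    mul = mul_std if rng.random() < p_std else mul_in
    add = add_std if rng.random() < p_std else add_in
    return add(mul(X, X), X)

def estimate_range_balanced(samples):
    """Formula (2): [mu_c -/+ (3 sigma_c + mu_r + 3 sigma_r)]."""
    c = [(lo + hi) / 2 for lo, hi in samples]
    r = [(hi - lo) / 2 for lo, hi in samples]
    half = 3 * statistics.stdev(c) + statistics.mean(r) + 3 * statistics.stdev(r)
    mu_c = statistics.mean(c)
    return (mu_c - half, mu_c + half)

rng = random.Random(0)
samples = [f_balanced((1.0, 2.0), 0.7, rng) for _ in range(30)]
print(estimate_range_balanced(samples))
```

With `p_std = 1.0` every sample equals the standard interval, and the estimate reduces to it.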


Balanced Interval Arithmetic 


The exact range of function values lies between the 
results of overestimating and underestimating interval 
arithmetic. Estimates of the ranges of function values obtained from the results of standard interval arithmetic and inner interval arithmetic were investigated
in [15]. There, balanced interval arithmetic is defined 
as the weighted mean of the overestimating and under- 
estimating intervals of the function: 


$$p_c \times \mathbf{f}(\mathbf{X}) + (1 - p_c) \times \mathbf{f}_u(\mathbf{X}), \qquad (3)$$

where the predefined coefficient $0 \le p_c \le 1$ defines the balance between overestimating and underestimating intervals.
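Formula (3) can be illustrated on the hypothetical example $f(x) = x \cdot x + x$ on $[1, 2]$. The sketch combines the standard (overestimating) and Kaucher (underestimating) results endpoint by endpoint. Reading the weighted mean of two intervals endpoint-wise is an assumption of this sketch.

```python
def vee(a, b):
    return (min(a, b), max(a, b))

# Operations written only for intervals with nonnegative endpoints.
def add_std(x, y): return (x[0] + y[0], x[1] + y[1])
def mul_std(x, y): return (x[0] * y[0], x[1] * y[1])
def add_u(x, y):   return vee(x[0] + y[1], x[1] + y[0])
def mul_u(x, y):   return vee(x[0] * y[1], x[1] * y[0])

def balanced(over, under, p_c):
    """Formula (3): p_c * over + (1 - p_c) * under, combined endpoint by endpoint."""
    return (p_c * over[0] + (1 - p_c) * under[0],
            p_c * over[1] + (1 - p_c) * under[1])

X = (1.0, 2.0)                         # exact range of x*x + x on X is [2, 6]
over = add_std(mul_std(X, X), X)       # (2.0, 6.0)
under = add_u(mul_u(X, X), X)          # (3.0, 4.0)
print(balanced(over, under, 0.5))      # (2.5, 5.0)
```

With $p_c = 1$ the standard enclosure is recovered. Smaller $p_c$ gives narrower intervals that may no longer enclose the exact range.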

The ranges of the values of several functions es- 
timated using balanced interval arithmetic and using 
balanced random interval arithmetic have been experimentally compared [15]. The results of the experiments have shown that ranges estimated using balanced interval arithmetic are competitive with ranges estimated using balanced random interval arithmetic. However, balanced interval arithmetic is not based on the assumption of normal distributions and does not require several samples.

The ranges of values of the objective function esti- 
mated using balanced interval arithmetic can be used 
in the general branch and bound framework building 
a deterministic global optimization algorithm. When 
the predefined coefficient $p_c$ is less than 1, the algorithm may be faster than the algorithm with standard interval arithmetic.

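For each interval function, there exists $\alpha$, $0 \le \alpha \le 1$, for which

$$\{ f(X) \mid X \in \mathbf{X} \} \subseteq \alpha \times \mathbf{f}(\mathbf{X}) + (1 - \alpha) \times \mathbf{f}_u(\mathbf{X})$$

for all possible sub-regions of the feasible region, $\mathbf{X} \subseteq D$. The algorithm guarantees the exact solution if $p_c \ge \alpha$.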
Microclusters [11] are aggregates of atoms, ions, or molecules, sufficiently small that a significant proportion of these units is present on their surfaces. They correspond to systems that are neither single entities nor continua composed of an infinite number of units, but lie somewhere in between, bridging the gap between
single atoms or molecules and bulk matter. Typically, 
microclusters consist of two to several hundred atoms. 
A key word pertaining to the novel features of micro- 
clusters is size effects [26]. The microscopic size of mi- 
croclusters gives rise to unique properties in two ways. 
First, a large percentage of a cluster’s atoms are on or 
close to the surface, and surface atoms do not arrange 
themselves in the same way as do atoms in bulk mat- 
ter, but instead they tend to avoid being exposed on 
the surface. Assuming a spherical shape, the fraction of surface atoms is $4/n^{1/3}$. For $n = 10^2$ this number is 86%, for $n = 10^3$ it is 40%, and for $n = 10^4$ it is still about 20%. For example, in a cluster of 55 argon atoms at least 42 atoms are on the surface in some sense. This effect completely overwhelms the tendency of atoms to arrange themselves in a regular crystalline array as they normally do in bulk matter. For instance, the ordering of silicon atoms in the $\mathrm{Si}_{10}$ cluster is completely different from the ordering in the silicon crystalline structure. Clusters consisting of specific numbers of atoms appear to be extremely stable, since they show up more prominently in the mass spectrum than neighboring cluster sizes. These numbers of particles that enhance stability are called magic numbers, and they are substance specific [2]. For instance [3], xenon clusters consisting of N = 13, 19, 23, 25, ... atoms are particularly stable, whereas for sodium clusters the magic numbers are N = 8, 20, 40, 58, 92, ....
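The surface-fraction estimate is simple arithmetic. The snippet below evaluates $4/n^{1/3}$ for the three cluster sizes quoted above.

```python
def surface_fraction(n):
    """Approximate fraction of surface atoms in a spherical cluster of n atoms."""
    return 4.0 / n ** (1.0 / 3.0)

for n in (10**2, 10**3, 10**4):
    print(n, f"{100 * surface_fraction(n):.0f}%")   # 86%, 40%, 19%
```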

The study of the topography of the potential energy function of a microcluster in the internal configurational space was, and still remains, a central problem in this area of research [11,13]. This problem can be succinctly stated as follows: given N particles interacting with two-body central forces, find their configuration(s) in three-dimensional Euclidean space that attain the global minimum of the total potential energy. This can be expressed mathematically as follows:


$$V = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} v(r_{ij}),$$

where

$$r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2},$$

$$x_1 = y_1 = z_1 = y_2 = z_2 = z_3 = 0.$$


Here, V is the total potential energy of the microcluster as the summation of all two-body interaction terms, $v(r_{ij})$ is the potential energy term corresponding to the interaction of particle i with particle j, and $r_{ij}$ is the Euclidean distance between i and j. Note that in the double summation, j runs from i + 1 to N, so that we avoid double counting pair interactions and the interaction of a particle with itself. Furthermore, by specifying $x_1 = y_1 = z_1 = 0$, we fix the first particle at (0, 0, 0), eliminating all three translational degrees of freedom of the microcluster. By further imposing $y_2 = z_2 = z_3 = 0$, we eliminate the rotational degrees of freedom as well. Pair potentials that have been used in cluster studies include the following [11]:
1) $v(r) = (n - m)^{-1}\,[m r^{-n} - n r^{-m}]$, $n > m$ (Mie);
2) $v(r) = 4\epsilon\,[(\sigma/r)^{12} - (\sigma/r)^{6}]$ (Lennard-Jones);
3) $v(r) = [1 - e^{\alpha(r_0 - r)}]^2 - 1$ (Morse);
4) $v(r) = A e^{-\alpha r^2} - B e^{-\beta r^2}$ (Gaussian);
5) $v(r) = z_i z_j e^2 / r + A e^{-r/\rho}$ (Born–Mayer);
Lennard-Jones and Morse potential models are the 
most popular selections to describe the force field. 
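The objective V can be evaluated directly from its definition. The sketch uses the Lennard-Jones pair potential in reduced units ($\epsilon = \sigma = 1$ by default) and sums each pair once, as in the double sum. The function names are hypothetical.

```python
import itertools
import math

def lj_pair(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair potential v(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)

def total_energy(coords, pair=lj_pair):
    """V = sum over i < j of v(r_ij); each unordered pair is counted once."""
    return sum(pair(math.dist(p, q)) for p, q in itertools.combinations(coords, 2))

r0 = 2 ** (1 / 6)                      # equilibrium distance of the pair potential
print(total_energy([(0.0, 0.0, 0.0), (r0, 0.0, 0.0)]))   # about -1.0
```

With the first particle at the origin and the second on the x axis, the example respects the convention $x_1 = y_1 = z_1 = y_2 = z_2 = 0$.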
Even under simplifying assumptions about the interaction energy, the minimization of the total potential energy is very difficult because it is a nonconvex optimization problem with numerous local minima. Hoare [11] claimed that the number of local minima of an n-atom microcluster grows as $\exp(n^2)$. In fact, L.T. Wille [34] has shown that determining the global minimum energy of a cluster of particles interacting via two-body forces is NP-hard. In other words, no algorithm is known that can solve this problem in polynomial time [22]. A geometrical, possibly topological, proof that a local minimum is both unique and global is not likely to be found. Unsolved problems still exist in the theory of sphere packings [4,5,6,10], where the difficulties are without any doubt less acute than those in the minimization problem at hand.

Existing methods use physical intuition, approxi- 
mation procedures, mimicking of physical phenom- 
ena, random searches, lattice optimization/relaxation, 
or local/global optimization approaches. M.R. Hoare, in 
a series of papers [12,13,14,15,16], proposed a method 
of finding minima of the total potential function of a Lennard-Jones cluster of 5 ≤ N ≤ 66 particles based on
a growth scheme involving the following steps: First, 
a particular compact seed structure involving a small 
number of atoms is selected which is likely to appear in 
the N-particle structure. At each iteration an extra par- 
ticle is placed at all packing vertices and the resulting 
structures are tested for geometrical uniqueness. The 
distinct structures are then relaxed, and a local optimization procedure locates and records the local minima involved. Each of the minima then serves as a new seed structure in a repetition of the procedure. Finally, all of the generated distinct local minima are tabulated in decreasing order of binding energy. A number of ‘growth rules’ that alleviate the computational effort are incorporated in the procedure. Using this method,
Hoare generated a large number of local minima for 
structures from 5 to 66 particles. However, no claim for 
complete enumeration of all local minima, and thus detection of the global minimum, can be made. In fact, it has been reported [32] that low-symmetry solutions are not likely to be found with this method.

Piela’s method [25] is based on the simple idea of 
smoothly deforming the potential energy hypersurface 
[29], in such a way as to make shallow potential wells 
disappear gradually, while the deeper ones grow at their 
expense. As the potential wells evolve they change their 
position and size. One then eventually ends up with 
a single potential well that has absorbed all the others 
which hopefully corresponds to the global minimum. 
A local optimization procedure then can easily find the 
single local minimum corresponding to the global one 
as well. The hypersurface is deformed using the diffu- 
sion equation, with the original shape of the hypersur- 
face representing the initial concentration distribution. 
The main advantage of this method is that one does not have to explore the myriads of local optima, nor know their positions beforehand. However, the
approach depends on the conjecture that shallow po- 
tential wells disappear faster than deeper ones. In fact, 
it has been observed that when the global minimum lies 
on a narrow potential well of large depth, it might dis- 
appear faster than a wider, originally shallower, poten- 
tial well. 
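The diffusion-equation idea can be illustrated in one dimension with the hypothetical quartic $f(x) = x^4 - 16x^2 + 5x$. Its Gaussian average $F(x, s) = E[f(x + \sqrt{s}\,Z)]$, $Z \sim N(0, 1)$, solves the diffusion equation with $s = 2t$ and has a closed form. $F$ is convex for $s \ge 8/3$. Tracing its minimizer back to $s = 0$ by local minimization reaches the global minimizer near $x \approx -2.90$. Plain local descent from $x = 1$ instead stops at the local minimizer near 2.75. This is an illustrative sketch, not Piela's implementation.

```python
def smoothed_grad(x, s):
    """dF/dx for F(x, s) = E[f(x + sqrt(s) Z)] with f(x) = x^4 - 16 x^2 + 5 x.

    Closed form: F(x, s) = x^4 + 6 s x^2 + 3 s^2 - 16 (x^2 + s) + 5 x.
    """
    return 4 * x**3 + (12 * s - 32) * x + 5

def local_min(grad, x, step=1e-3, iters=5000):
    """Fixed-step gradient descent, adequate for this smooth one-dimensional example."""
    for _ in range(iters):
        x -= step * grad(x)
    return x

def diffusion_trace(s_start=3.0, levels=30):
    # F(., s_start) is convex (F'' = 12 x^2 + 4 > 0), so its minimizer is unique.
    x = local_min(lambda x: smoothed_grad(x, s_start), 0.0)
    for k in range(levels - 1, -1, -1):            # gradually undo the deformation
        s = s_start * k / levels
        x = local_min(lambda x, s=s: smoothed_grad(x, s), x)
    return x

print(diffusion_trace())                                   # close to -2.9035
print(local_min(lambda x: smoothed_grad(x, 0.0), 1.0))     # close to 2.7468
```

The example also shows the method's weak point discussed above. Success depends on the deep well still dominating after smoothing. A narrow deep well could vanish first.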

Variations of simulated annealing [18] have been widely used, either alone or in conjunction with some other
method(s). A large number of researchers have been 
using this method for finding the global minimum of 
the potential energy function. Wille [32,33] solved the 
potential minimization problem for up to 25 particles, 
interacting under two-body Lennard-Jones forces and 
he found two new minima for N = 24 that were better 
than the one reported in [11]. P. Ballone and P. Milani [1], using a semi-empirical many-body potential, solved for the ground-state geometries of carbon clusters in the range 50 ≤ N ≤ 72. They found that all the low-energy structures are hollow spheres with a nearly graphitic atomic arrangement. D. Hohl and R.O. Jones [17] applied the same methodology to phosphorus clusters $\mathrm{P}_2$ to $\mathrm{P}_8$, arriving at a rather counterintuitive most stable structure for $\mathrm{P}_8$. In [23], simulated annealing combined with a quasi-Newton-like conjugate-gradient method is used to determine the structure of mixed argon-xenon clusters interacting with two-body Lennard-Jones forces. In [30,31] the binding energy of nickel Lennard-Jones clusters is studied using
the simulated annealing method in a canonical ensem- 
ble Monte-Carlo technique. The simulated annealing 
method can be viewed as a method for stochastically 
tracing the annealing process by Monte-Carlo simu- 
lation. D. Shalloway [27,28] presented a deterministic 
method for annealing the objective function by tracing 
the evolution of a multiple-Gaussian-packet approxi- 
mation and using notions from renormalization group 
theory. This method has been applied to microcluster conformation problems, and in most of the test problems it appears to have identified the global minimum.

Lattice optimization techniques have been very ef- 
ficient in generating structures involving the lowest 
known potential energy. In [7] it is proposed that the most energetically favored microclusters in the range 20 < N < 50 are the ones that involve interpenetrating icosahedra (polyicosahedra, PIC). For N < 55


Global Optimization in Lennard-Jones and Morse Clusters 




a double icosahedral (DIC) growth scheme was intro- 
duced [8] and for 55 < N < 147 [9] a third layer icosa- 
hedral structure using two different surface arrange- 
ments was presented. Using these notions, J.A. Northby 
[24] derived optimal configurations for Lennard-Jones 
microclusters in the range 13 < N < 147 based on a lat- 
tice optimization/relaxation algorithm. First a heuris- 
tic procedure is employed for finding a set of lattice 
local minimizers assuming icosahedral- (IC) or face- 
centered (FC) arrangements. Then, the currently best 
lattice minimizers are relaxed by using a local optimiza- 
tion algorithm. G.L. Xue [35] improved on Northby’s 
method [24] by reducing the time complexity of the 
algorithm. Furthermore, by relaxing every lattice local 
minimizer a number of better optimal configurations 
were found in the range 13 < N < 147. However, it ap- 
peared that the best local lattice does not always relax to 
the structure involving the lowest total Lennard-Jones
potential energy. A parallel implementation [19] al- 
lowed results on minimum energies for clusters of up to 
N = 1,000 atoms. Also by employing a parallel version 
of a two-level simulated annealing algorithm [36,37,38] 
solutions for cluster sizes as large as N = 100,000 have
been reported. 

C.D. Maranas and C.A. Floudas [20,21] introduced 
deterministic global optimization to the microcluster 
minimum potential energy problem. It was shown that 
the problem is convex only if both the first and second 
derivatives of the pairwise potential energy model with 
respect to the Euclidean distance are positive. This left only a narrow convex envelope for both Lennard-Jones and Morse potential energy models. To widen this envelope, the sum of squares of all Cartesian coordinates, multiplied by a positive parameter $\alpha$, was added to the original objective function. It was shown that there exists a value of $\alpha$ for which the augmented objective function is convex, and an upper bound for this value was identified. Building on these results, a branch and bound algorithm was devised that uses the $\alpha$ parameter to construct convex lower bounds of the objective function. The algorithm was implemented for finding the global minimum configuration of small Lennard-Jones and Morse microclusters. For larger clusters, lower and upper bounds were derived using a relaxation procedure. These ideas later sparked the development of the $\alpha$BB algorithm for general nonconvex optimization problems.
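The effect of adding a quadratic term can be seen on a one-variable analogue. The function $g(r) = v(r) + \alpha r^2$ is convex on an interval exactly when $v''(r) + 2\alpha \ge 0$ there. The sketch estimates the smallest such $\alpha$ for the Lennard-Jones pair potential ($\epsilon = \sigma = 1$) on a grid over $[1, 3]$. It is a grid-based illustration of the idea only. It is not the rigorous bound of [20,21], which works with the Cartesian coordinates of all particles.

```python
def lj_second_derivative(r):
    """v''(r) for v(r) = 4 (r^-12 - r^-6)."""
    return 4.0 * (156.0 * r**-14 - 42.0 * r**-8)

def convexifying_alpha(d2, lo, hi, n=20001):
    """Smallest alpha >= 0 with d2(r) + 2 alpha >= 0 at n grid points of [lo, hi]."""
    grid = (lo + (hi - lo) * k / (n - 1) for k in range(n))
    return max(0.0, -0.5 * min(d2(r) for r in grid))

alpha = convexifying_alpha(lj_second_derivative, 1.0, 3.0)
print(alpha)   # about 2.97; v'' is most negative near r = 6.5 ** (1/6)
```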


See also 
A general mathematical problem encountered in various applications is to find the configuration of r unknown points in $\mathbb{R}^n$ (quite often n ≤ 3) satisfying a number of constraints on their mutual distances and
their distances to m fixed points, while minimizing 
a given function of these distances. Often the unknown 
points represent the locations of facilities to be con- 
structed to serve the users located at the fixed points, 
so as to minimize a cost function (travel time, trans- 
port cost for customers, etc.) or to maximize the global 
attraction (utility, number of customers, etc.). Also the 
unknown points may represent the cluster centers while 
the fixed points are the objects to be classified into 
groups (clusters). The biggest challenge occurs when 
the unknown points represent the objects (atoms, parti- 
cles) whose interactions depend upon their mutual dis- 
tances: the objective function in these problems is then 
interpreted as a “potential energy function” that should 
attain a global minimum at the unknown configuration. 

For many years, combinatorial geometric reasoning and nonlinear programming methods have been the basic tools in the study of these problems. However, since most nonconvex problems have many local nonglobal minimizers, other, more suitable methods are needed to cope efficiently with this difficulty.

Global optimization methods began to be introduced in these fields more than two decades ago [9,15]. Subsequently, dc (difference of convex functions) optimization techniques were used to tackle facility location with nonconvex objective functions and nonconvex constraints [5,6,12,13,19,20,21].


Single Facility Location 


The first location problem, introduced by Weber (1909), was to find the location of a facility so as to minimize the sum of its weighted distances to a given set of users located in a plane. Over the years this unconstrained convex minimization problem has been generalized further and further, leading to increasingly complex location models.


Minisum and Maxisum 


Suppose a new facility is designed to serve m users located at $a^1, \dots, a^m \in \mathbb{R}^n$. Certain users, henceforth called “attraction points,” are interested in having the facility located as close to them as possible. Others, called “repulsion points,” would like the facility to be located as far away from them as possible. Let $J_1, J_2$ denote the index sets of attraction and repulsion points, respectively. For each user $j = 1, \dots, m$ a function $q_j(t)$ is known that measures the cost of traveling a distance t away from $a^j$; also, $h_j(x)$ is a function of the distance from user j to the point $x \in \mathbb{R}^n$. It is assumed that $q_j(t)$ is concave increasing with $q_j(t) \to +\infty$ as $t \to +\infty$, while $h_j(x)$ is a convex function such that $h_j(x) \to +\infty$ as $\|x\| \to +\infty$. So if x is the unknown location of the facility, then to take account of the interest of the attraction points one should try to minimize the sum $\sum_{j \in J_1} q_j(h_j(x))$, whereas from the point of view of the repulsion points one should try to maximize the sum $\sum_{j \in J_2} q_j(h_j(x))$. Under these conditions, a reasonable objective of the decision maker may be to locate the facility so as to minimize the quantity


$$\sum_{j\in J_1} q_j[h_j(x)] - \sum_{j\in J_2} q_j[h_j(x)]$$

over $\mathbb{R}^n$. Denoting the right derivative of $q_j(t)$ at 0 by $q_j^+(0)$ and assuming $q_j^+(0) < +\infty$ for all j, it can easily be seen that each function $g_j(x) := K_j h_j(x) - q_j[h_j(x)]$ is convex for $K_j \ge q_j^+(0)$, and so we come up with the dc optimization problem

$$\min\{G(x) - H(x) \mid x \in \mathbb{R}^n\}, \tag{1}$$

where G(x), H(x) are the convex functions defined by

$$G(x) = \sum_{j\in J_2} g_j(x) + \sum_{j\in J_1} K_j h_j(x),$$

$$H(x) = \sum_{j\in J_1} g_j(x) + \sum_{j\in J_2} K_j h_j(x).$$

Problems with the above objective function are called minisum problems.
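The convexity of $g_j$ follows from a standard composition argument. Assume, as the text does, that $q_j$ is concave increasing on $[0, +\infty)$ with a finite right derivative $q_j^+(0)$, and that $h_j$ is convex and nonnegative (it is a function of a distance). Take $g_j(x) := K_j h_j(x) - q_j[h_j(x)]$, so that $q_j[h_j(x)] = K_j h_j(x) - g_j(x)$, which is what the definitions of G and H require. Then:

```latex
\begin{aligned}
\varphi_j(t) &:= K_j t - q_j(t), \qquad t \ge 0
  &&\text{(convex, since } -q_j \text{ is convex)},\\
\varphi_j^{+}(t) &= K_j - q_j^{+}(t) \;\ge\; K_j - q_j^{+}(0) \;\ge\; 0
  &&\text{(} q_j^{+} \text{ is nonincreasing because } q_j \text{ is concave)},\\
g_j(x) &= \varphi_j\bigl(h_j(x)\bigr)
  &&\text{(convex nondecreasing } \varphi_j \text{ of convex } h_j \text{, hence convex)}.
\end{aligned}
```

For a convex decreasing $q_j$, as in the maxisum problem below, the same argument applies to $\varphi_j(t) := q_j(t) + K_j t$ with $K_j \ge |q_j^+(0)|$.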

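In other circumstances, instead of minimizing the cost, one may seek to maximize the total attraction

$$\sum_{j\in J_1} q_j[h_j(x)] - \sum_{j\in J_2} q_j[h_j(x)],$$

where each $q_j$ is now a convex decreasing function. Assuming $q_j^+(0) > -\infty$, the functions $g_j(x) := q_j[h_j(x)] + K_j h_j(x)$ are convex for $K_j \ge |q_j^+(0)|$, and the problem is then

$$\max\{G(x) - H(x) \mid x \in \mathbb{R}^n\}, \tag{2}$$

where G(x), H(x) are now the convex functions

$$G(x) = \sum_{j\in J_1} g_j(x) + \sum_{j\in J_2} K_j h_j(x),$$

$$H(x) = \sum_{j\in J_2} g_j(x) + \sum_{j\in J_1} K_j h_j(x).$$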


Obviously, any maxisum problem can be converted into a minisum one and vice versa. Most problems studied in the literature are minisum, under much more restrictive assumptions than in the above setting (see [16] and references therein). Weber's classical formulation corresponds to the case $J_2 = \emptyset$ (no repulsion points) and $h_j(x) = \|x - a^j\|$, $q_j(t) = w_j t$, $w_j > 0$ for all j. The cases $J_2 \ne \emptyset$ with $q_j(t)$ nonlinear have begun to be investigated only recently, motivated by growing concerns about the environment.
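In Weber's classical case ($J_2 = \emptyset$, $h_j(x) = \|x - a^j\|$, $q_j(t) = w_j t$) the problem is convex, and a standard solution method, not discussed in this article, is the Weiszfeld fixed-point iteration. Each step replaces x by a weighted average of the user locations with weights $w_j / \|x - a^j\|$. The following is a minimal Python sketch assuming users given as coordinate tuples and positive weights. The function name `weiszfeld` and the simplified handling of an iterate that lands exactly on a user location (it simply stops there) are illustrative choices.

```python
import math


def weiszfeld(points, weights, x0, max_iter=1000, tol=1e-12):
    """Approximately minimize sum_j w_j * ||x - a^j|| (Weber problem) by Weiszfeld's iteration."""
    x = list(x0)
    for _ in range(max_iter):
        num = [0.0] * len(x)
        den = 0.0
        for a, w in zip(points, weights):
            d = math.dist(x, a)
            if d < 1e-15:
                # Iterate hit a user location; a full implementation would test optimality here.
                return x
            num = [s + w * ai / d for s, ai in zip(num, a)]
            den += w / d
        x_new = [s / den for s in num]  # weighted average with weights w_j / ||x - a^j||
        if math.dist(x_new, x) < tol:
            return x_new
        x = x_new
    return x


# Three equally weighted users at the vertices of an equilateral triangle:
# the Weber point is the centroid (0.5, sqrt(3)/6).
users = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(weiszfeld(users, [1.0, 1.0, 1.0], x0=(0.1, 0.1)))
```

Once repulsion points or nonlinear $q_j$ enter, this convex machinery no longer applies, which is why the dc formulation above is needed.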


Maximin and Minimax 


When siting emergency services, like a fire station, one does not want to maximize the overall attraction but rather to make the smallest attraction enjoyed by any user as large as possible. The problem, often referred to as the p-center problem, can be formulated as

$$\max\Bigl\{\min_{j=1,\dots,m} q_j[h_j(x)] \Bigm| x \in \mathbb{R}^n\Bigr\}, \tag{3}$$

where the $q_j(t)$ are convex decreasing functions (maximin problem). Assuming $|q_j^+(0)| < \infty$ for all j as previously, we have the dc representation $q_j[h_j(x)] = g_j(x) - K_j h_j(x)$, hence

$$\min_{j=1,\dots,m} q_j[h_j(x)] = \sum_{j=1}^m g_j(x) - \max_{j=1,\dots,m}\Bigl[K_j h_j(x) + \sum_{l \ne j} g_l(x)\Bigr],$$

and so (3) is again a dc optimization problem.
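The representation follows from a short rearrangement that the text leaves implicit. With $q_j[h_j(x)] = g_j(x) - K_j h_j(x)$, where $g_j$ and $K_j h_j$ are convex:

```latex
\min_{j}\bigl(g_j - K_j h_j\bigr)
  = \sum_{l=1}^m g_l + \min_{j}\Bigl(-K_j h_j - \sum_{l\ne j} g_l\Bigr)
  = \sum_{l=1}^m g_l - \max_{j}\Bigl(K_j h_j + \sum_{l\ne j} g_l\Bigr).
```

Both $\sum_l g_l$ and the pointwise maximum of the convex functions $K_j h_j + \sum_{l \ne j} g_l$ are convex. The minimum of finitely many dc functions is therefore again dc. The same rearrangement, with max and min exchanged, gives the dc form of $\max_j q_j[h_j(x)]$ in the minimax problem that follows.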

By contrast, when siting an obnoxious facility, one wants to minimize the maximal attraction to a user, so the optimization problem to be solved is

$$\min\Bigl\{\max_{j=1,\dots,m} q_j[h_j(x)] \Bigm| x \in \mathbb{R}^n\Bigr\}, \tag{4}$$

where the $q_j(t)$ are concave increasing functions (minimax problem). Again, assuming $|q_j^+(0)| < \infty$ for all j, we have the dc representation $q_j[h_j(x)] = K_j h_j(x) - g_j(x)$, and so

$$\max_{j=1,\dots,m} q_j[h_j(x)] = \max_{j=1,\dots,m}\Bigl[K_j h_j(x) + \sum_{l \ne j} g_l(x)\Bigr] - \sum_{j=1}^m g_j(x),$$

i.e., the minimax location problem (4) is again a dc optimization problem.

A special maximin location problem worth mentioning is the design centering problem encountered in engineering design. Given a compact convex set $B \subset \mathbb{R}^n$ containing 0 in its interior and m compact convex sets $D_j$, $j = 1, \dots, m$, contained in a compact convex set $C \subset \mathbb{R}^n$, find $x \in C$ so as to maximize

$$\min_{j=0,1,\dots,m} r_j(x),$$

where $r_j(x) = \min\{p(y - x) : y \in D_j\}$, $p : \mathbb{R}^n \to \mathbb{R}_+$ is the gauge of B, and $D_0 = \mathbb{R}^n \setminus C$. It can be shown [17] that the function $r_0(x)$ is concave while $r_1(x), \dots, r_m(x)$ are convex. This can therefore be viewed as a maximin problem in which each $D_j$ is a user and $r_j(x)$ is the distance from the point x to user j.


Constrained Location 


In the real world many factors may set restrictions on 
the facility sites. Therefore, practical location problems 
are often constrained. 


Location on Union of Convex Sets 


The simplest type of restriction is that the facility can be located only in one of several given convex regions $C_1, \dots, C_k$ [8]. If $C_i = \{x : c_i(x) \le 0\}$, with $c_i(x)$ being convex functions, then the constraint $x \in \bigcup_{i=1}^k C_i$ can be expressed as

$$\min_{i=1,\dots,k} c_i(x) \le 0,$$

which is a dc constraint.


Location on Area with Forbidden Regions 


In other circumstances, the facility can be located only outside some forbidden regions that are, for instance, open convex sets $C_i^{o} = \{x : c_i(x) < 0\}$, with $c_i(x)$ being convex functions (see, e.g., [2]). Since the constraint $x \notin \bigcup_{i=1}^k C_i^{o}$ is equivalent to $\min_{i=1,\dots,k} c_i(x) \ge 0$, this is again a dc constraint.


General Constrained Location Problem 


The most general situation occurs when the constraint set is a compact, not necessarily convex, set. However, a striking result of dc analysis shows that even in this general case the constraint can be expressed as a dc inequality [12,22].

Of course, the corresponding dc optimization problem is very hard. Although a method (the relief indicator method [18]) exists for dealing with general nonconvex constraints, so far it works only in low dimensions.


Multiple Source 


When more than one facility is to be located, the objective function depends on whether these facilities provide the same service or different services to the users. If there are $r \ge 2$ facilities providing the same service, these facilities are called sources. Each user is then served by the closest source. So if $x^i$ is the unknown location of the ith facility and $X = (x^1, \dots, x^r) \in (\mathbb{R}^n)^r$, then the overall attraction is

$$\sum_{j\in J_1} q_j[h_j(X)] - \sum_{j\in J_2} q_j[h_j(X)], \tag{5}$$

where $h_j(X) = \min\{h_j(x^i) : i = 1, \dots, r\}$ and $q_j, h_j$ are as previously. Since $h_j(X) = \sum_{i=1}^r h_j(x^i) - \max_{i=1,\dots,r} \sum_{l \ne i} h_j(x^l)$, the first term in (5) is the dc function

$$\sum_{j\in J_1} g_j(X) - \sum_{j\in J_1} K_j \sum_{i=1}^r h_j(x^i),$$

where $K_j = |q_j^+(0)|$ and

$$g_j(X) = q_j[h_j(X)] + K_j \sum_{i=1}^r h_j(x^i)$$

is a convex function. The same holds for the second term in (5). Hence the objective function in the r-source problem is a dc function on $(\mathbb{R}^n)^r$.

The multisource problem is usually referred to as the generalized Weber problem, or also the r-median problem when $J_2 = \emptyset$. Traditionally it is often viewed as a location-allocation problem and formulated as a mixed 0-1 integer programming problem (see, e.g., [16]).


Clustering 


In many practical situations we have a set of objects of a certain kind that we want to classify into $r \ge 2$ groups
(clusters), each including elements close to each other in some well-defined sense. In the simplest case, this gives rise to the following problem: for a given finite set of points $a^1, \dots, a^m \in \mathbb{R}^n$, find r cluster centers $x^i \in \mathbb{R}^n$, $i = 1, \dots, r$, such that the sum, over all points $a^j$, of the minimum over $i \in \{1, \dots, r\}$ of the “distance” between $a^j$ and the cluster center $x^i$ is minimized. If $d(a, x)$ denotes the distance from a to x, then the problem is

$$\min\Bigl\{\sum_{j=1}^m \min_{i=1,\dots,r} d(a^j, x^i) \Bigm| x^i \in \mathbb{R}^n,\ i = 1, \dots, r\Bigr\}. \tag{6}$$

Formally, this is nothing but the r-median problem, i.e., the generalized Weber problem with $J_2 = \emptyset$.

If $d(a, x) = \sum_{i=1}^n |a_i - x_i|$, then, using the equality $|a_i - x_i| = \min\{y_i : -y_i \le a_i - x_i \le y_i\}$, problem (6) can be written as

$$\min \sum_{j=1}^m \min_{l=1,\dots,r} \sum_{i=1}^n y_i^{jl} \quad \text{s.t.}\quad -y^{jl} \le a^j - x^l \le y^{jl},\quad j = 1,\dots,m,\ l = 1,\dots,r,$$

which is a concave minimization problem under linear constraints. One way to cope with the large dimension of this problem is to replace it with the equivalent bilinear program

$$\begin{aligned}
\min\ & \sum_{j=1}^m \sum_{l=1}^r t_{jl} \sum_{i=1}^n y_i^{jl}\\
\text{s.t.}\ & -y^{jl} \le a^j - x^l \le y^{jl}, \quad j = 1,\dots,m,\ l = 1,\dots,r,\\
& \sum_{l=1}^r t_{jl} = 1,\quad t_{jl} \ge 0,
\end{aligned}$$

and to solve the latter approximately, to a local optimum, by alternately fixing t and y.
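With the centers fixed, minimizing over t puts all the weight on the nearest center in the $\ell_1$ distance. With this assignment fixed, the optimal center of each cluster is a coordinate-wise median, since the median minimizes a sum of absolute deviations. This alternation is the familiar k-medians heuristic. The following is a minimal Python sketch; the function names are illustrative. Like the alternating scheme described above, it reaches only a local optimum that depends on the starting centers.

```python
import statistics


def l1(p, c):
    """Manhattan distance d(a, x) = sum_i |a_i - x_i|."""
    return sum(abs(pi - ci) for pi, ci in zip(p, c))


def k_medians(points, centers, max_iter=100):
    """Alternate assignment (centers fixed) and median updates (assignment fixed)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Centers fixed: put all the weight t_jl on the nearest center l of point j.
        new_labels = [min(range(len(centers)), key=lambda l: l1(p, centers[l]))
                      for p in points]
        if new_labels == labels:
            break  # assignment unchanged: a local optimum of the alternation
        labels = new_labels
        # Assignment fixed: each center moves to the coordinate-wise median of its members.
        for l in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                centers[l] = [statistics.median(coord) for coord in zip(*members)]
    return centers, labels


pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(k_medians(pts, centers=[(2, 2), (8, 8)]))
# ([[0, 0], [10, 10]], [0, 0, 0, 1, 1, 1])
```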


When $d(a, x) = \sqrt{\sum_{i=1}^n (a_i - x_i)^2}$, the problem is no longer a concave minimization problem but can be reduced to a dc program by easy manipulations. In [1], results of solving the generalized Weber problem by dc methods are reported for m = 10,000, p = 2, and m = 1,000, p = 3. Alternatively, (6) can also be transformed into a monotonic optimization problem and solved by recently developed monotonic optimization methods [23,24]. For this, observe that $d(a, x) = \bigl(d(a, x) + \sum_{i=1}^n x_i\bigr) - \sum_{i=1}^n x_i$. Since $u(a, x) = d(a, x) + \sum_{i=1}^n x_i$ and $\sum_{i=1}^n x_i$ are both increasing functions of x, it follows that $d(a, x)$ is a dm (difference of monotonic) function and, hence, (6) is a monotonic optimization problem.
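For the Euclidean distance, the monotonicity of $u(a, \cdot)$ can be verified directly. For $x \ne a$,

```latex
\frac{\partial u(a,x)}{\partial x_i}
  = \frac{x_i - a_i}{d(a,x)} + 1 \;\ge\; 0,
  \qquad\text{since } |x_i - a_i| \le d(a,x),
```

so $u(a, \cdot)$ is nondecreasing in each coordinate (by continuity, also across the point x = a), while $\sum_i x_i$ is clearly increasing. The same argument covers the $\ell_1$ distance, whose partial derivatives with respect to $x_i$ lie in $[-1, 1]$ wherever they exist.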


Multiple Facility 


When the $r \ge 2$ facilities to be located provide different services, one should consider not only the costs due to interactions between facilities and users but also the costs due to pairwise interactions between facilities. The latter costs can be expressed by functions of the form $p_{il}[h_{il}(x^i, x^l)]$. Here again the $h_{il}(x^i, x^l)$ are convex nonnegative-valued functions, and the $p_{il}(t)$ are concave increasing functions on $[0, +\infty)$ with finite right derivatives at 0. The total cost one would like to minimize is then

$$\sum_{i=1}^r F_i(x^i) + \sum_{i<l} p_{il}[h_{il}(x^i, x^l)], \tag{7}$$

where $F_i(x^i) = \sum_{j\in J_1} q_{ji}[h_j(x^i)] - \sum_{j\in J_2} q_{ji}[h_j(x^i)]$ and $q_{ji}, h_j$ are as in minisum single facility problems.

As we saw above, each function $F_i(x^i)$ is dc. By the same argument, each function $p_{il}[h_{il}(x^i, x^l)]$ is dc, too, and so (7) is again a dc function on $(\mathbb{R}^n)^r$. In the special case when there are no repulsion points (every $F_i(x^i)$ is convex) and the pairwise interaction functions $p_{il}(t)$ are convex, this is simply a convex function. Also, in the absence of interactions between facilities ($p_{il}(\cdot) \equiv 0$ for all i, l), the minimization of (7) splits into r independent single facility minisum problems.


Molecular Conformation 


A variant of the multifacility problem that has attracted much research in recent years is the so-called molecular conformation problem encountered in computational biology, computational chemistry, and protein folding. This is the problem of determining ground states or stable states of certain classes of molecular clusters and proteins, and it can be stated as follows [14]. Given a cluster of N atoms (in three-dimensional space), we wish to locate their centers $x^1, \dots, x^N$ so as to minimize the potential energy function

$$V_N(x^1, \dots, x^N) = \sum_{1 \le i < j \le N} v\bigl(\|x^i - x^j\|\bigr),$$

where $\|\cdot\|$ is the Euclidean norm and $v(r)$ is the interatomic pair potential. This can be viewed as a multifacility problem in which there are no users but many facilities (the number N may be rather large; see, e.g., [14]). In models used for computation, the pair potentials of interest include the following:


$$v(r) = r^{-12} - 2r^{-6} \quad \text{(Lennard-Jones)},$$

$$v(r) = \bigl[1 - e^{\rho(1 - r)}\bigr]^2 - 1 \quad \text{(Morse)},$$

$$v(r) = \frac{q_i q_j}{r} + A e^{-r/\rho} \quad \text{(Born-Mayer)}.$$


Using representation theorems in dc optimization, it can be seen that these functions are dc (at least for $r \ge \varepsilon$, where $\varepsilon$ is an arbitrarily small positive number).
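Evaluating $V_N$ is straightforward. The Python sketch below uses the Lennard-Jones form $v(r) = r^{-12} - 2r^{-6}$ and the Morse form $[1 - e^{\rho(1-r)}]^2 - 1$, with an assumed illustrative value ρ = 6; the function names are also illustrative. As a check, consider four atoms at the vertices of a regular tetrahedron with unit edge. All six pair distances equal 1, where both potentials take their minimum value −1, so $V_4 = -6$.

```python
import itertools
import math


def lennard_jones(r):
    """Scaled Lennard-Jones pair potential v(r) = r^-12 - 2 r^-6 (minimum -1 at r = 1)."""
    return r ** -12 - 2.0 * r ** -6


def morse(r, rho=6.0):
    """Morse pair potential v(r) = [1 - exp(rho (1 - r))]^2 - 1; rho = 6 is an assumed value."""
    return (1.0 - math.exp(rho * (1.0 - r))) ** 2 - 1.0


def cluster_energy(coords, pair):
    """V_N = sum over i < j of v(||x^i - x^j||)."""
    return sum(pair(math.dist(p, q)) for p, q in itertools.combinations(coords, 2))


tetrahedron = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.5, math.sqrt(3) / 2, 0.0),
    (0.5, math.sqrt(3) / 6, math.sqrt(2.0 / 3.0)),
]
print(cluster_energy(tetrahedron, lennard_jones))  # approximately -6
print(cluster_energy(tetrahedron, morse))          # approximately -6
```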


Distance Geometry 


A related problem, with applications in molecular conformation and in other areas such as surveying and satellite ranging, data visualization, and pattern recognition, is the multidimensional scaling problem or distance geometry problem. It consists of finding r objects $x^1, \dots, x^r$ in $\mathbb{R}^n$ such that the quantity

$$V_{\Delta,W}(x^1, \dots, x^r) = \sum_{i<j} w_{ij}\bigl(\delta_{ij} - \|x^i - x^j\|\bigr)^2 \tag{8}$$

is smallest, where $\Delta = (\delta_{ij})$, $W = (w_{ij})$ are symmetric matrices of order r such that

$$\delta_{ij} = \delta_{ji} \ge 0,\quad w_{ij} = w_{ji} \ge 0 \quad (i < j); \qquad \delta_{ii} = w_{ii} = 0 \quad (i = 1, \dots, r).$$

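By writing this problem (after dropping the constant term $\sum_{i<j} w_{ij}\delta_{ij}^2$) as

$$\min\Bigl\{\sum_{i<j} w_{ij}\|x^i - x^j\|^2 - 2\sum_{i<j} w_{ij}\delta_{ij}\|x^i - x^j\| \Bigm| x^i \in \mathbb{R}^n\ (i = 1, \dots, r)\Bigr\} \tag{9}$$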
or, alternatively, as

$$\min\Bigl\{\sum_{i<j} w_{ij} t_{ij} \Bigm| t_{ij} \ge \bigl(\delta_{ij} - \|x^i - x^j\|\bigr)^2,\ x^i \in \mathbb{R}^n,\ i = 1, \dots, r\Bigr\}, \tag{10}$$

we again obtain a dc optimization problem that is also a monotonic optimization problem.
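The dc form (9) leads to a simple iteration. Take unit weights $w_{ij} = 1$, linearize the subtracted convex term $2\sum_{i<j}\delta_{ij}\|x^i - x^j\|$ at the current configuration X, and minimize the remaining convex quadratic. This gives the update $X \leftarrow \frac{1}{r}B(X)X$, where $B_{ij}(X) = -\delta_{ij}/\|x^i - x^j\|$ for $i \ne j$ and $B_{ii} = -\sum_{j \ne i} B_{ij}$. This is the Guttman transform used by the SMACOF method for multidimensional scaling, and no step can increase (8). The following is a minimal Python sketch; unit weights are assumed, and the names and random test data are illustrative.

```python
import math
import random


def stress(X, delta):
    """V_{Delta,W} of (8) with unit weights w_ij = 1."""
    r = len(X)
    return sum((delta[i][j] - math.dist(X[i], X[j])) ** 2
               for i in range(r) for j in range(i + 1, r))


def guttman_step(X, delta):
    """One dc-linearization step for unit weights: X <- (1/r) B(X) X."""
    r, n = len(X), len(X[0])
    B = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            if i != j:
                d = math.dist(X[i], X[j])
                B[i][j] = -delta[i][j] / d if d > 1e-12 else 0.0
        B[i][i] = -sum(B[i][j] for j in range(r) if j != i)
    return [[sum(B[i][l] * X[l][c] for l in range(r)) / r for c in range(n)]
            for i in range(r)]


random.seed(0)
truth = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(6)]
delta = [[math.dist(p, q) for q in truth] for p in truth]  # exactly realizable in R^2
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(6)]

history = [stress(X, delta)]
for _ in range(200):
    X = guttman_step(X, delta)
    history.append(stress(X, delta))
print(history[0], history[-1])  # the stress never increases along the iteration
```

Like any local dc scheme, this iteration can stop at a nonglobal configuration; it illustrates the structure of (9), not a global method.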
Global Optimization Methods for Harmonic Retrieval


The harmonic retrieval (HR) problem is a ubiquitous problem that arises in various applications, such as signal modeling and direction-of-arrival estimation. It consists of estimating the parameters of multiple sinusoids from noisy data. The data are modeled as

$$y(t) = \sum_{k=1}^K a_k^* \sin(2\pi f_k^* t + \phi_k^*) + n(t), \quad t = 1, \dots, N, \tag{1}$$

where $a_k^*$, $f_k^*$, and $\phi_k^*$ are the amplitude, frequency, and phase of the kth sinusoid, respectively. It is assumed that the number of sinusoids, K, is known and that all frequencies satisfy $0 < f_k^* < 0.5$, $k = 1, \dots, K$, and $f_k^* \ne f_j^*$ for $k \ne j$. In addition, the noise n(t) is assumed to be zero-mean white Gaussian noise (WGN) with variance $\sigma_n^2$. Given the data y(t) for $t = 1, \dots, N$, the goal is to estimate the sinusoid parameters $\theta^* = [a_1^*, \dots, a_K^*, f_1^*, \dots, f_K^*, \phi_1^*, \dots, \phi_K^*]^T$.

Conventional FFT or periodogram-based methods [4, Chapt. 1] can solve the HR problem only when the frequencies are spaced more than 1/N cycles/sample apart, where N is the number of available data points. When the difference between two frequencies is smaller than the threshold 1/N, high-resolution techniques must be used [4, Chapt. 5]. Sinusoidal parameter estimation can be based on solving the least squares (LS) problem (P):

$$\text{(P)}\qquad \hat\theta_{LS} = \arg\min_{\theta} J(\theta), \tag{2}$$

where

$$J(\theta) = \sum_{t=1}^N \Bigl(y(t) - \sum_{k=1}^K a_k \sin(2\pi f_k t + \phi_k)\Bigr)^2 \tag{3}$$

and $\theta = [a_1, \dots, a_K, f_1, \dots, f_K, \phi_1, \dots, \phi_K]$. We can see from (3) that the objective function is nonconvex, which suggests that a global optimization method is the most appropriate procedure for determining $\hat\theta_{LS}$.
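The nonconvexity of (3) is easy to observe numerically. The Python sketch below evaluates J(θ) for noise-free data with K = 1, $a_1^* = 1$, $f_1^* = 0.2$, $\phi_1^* = 0$ and N = 35. It then scans the frequency, holding amplitude and phase at their true values. The function name and the grid spacing of 0.001 are illustrative choices. The scan finds the global minimizer f = 0.2 together with several spurious local minima.

```python
import math


def ls_objective(theta, y, K):
    """J(theta) of (3); theta = [a_1..a_K, f_1..f_K, phi_1..phi_K], y[t-1] holds y(t)."""
    a, f, phi = theta[:K], theta[K:2 * K], theta[2 * K:3 * K]
    total = 0.0
    for t in range(1, len(y) + 1):
        model = sum(a[k] * math.sin(2 * math.pi * f[k] * t + phi[k]) for k in range(K))
        total += (y[t - 1] - model) ** 2
    return total


N = 35
y = [1.0 * math.sin(2 * math.pi * 0.2 * t + 0.0) for t in range(1, N + 1)]  # noise-free data

freqs = [i / 1000 for i in range(1, 500)]  # f in (0, 0.5)
costs = [ls_objective([1.0, f, 0.0], y, K=1) for f in freqs]
best = freqs[min(range(len(costs)), key=costs.__getitem__)]
local_minima = [freqs[i] for i in range(1, len(costs) - 1)
                if costs[i] < costs[i - 1] and costs[i] < costs[i + 1]]
print(best, len(local_minima))  # global minimizer 0.2 and several spurious local minima
```

A local method started near one of the spurious minima stays there, which is the difficulty the interval approach is meant to address.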

Two methods that have been proposed for solving (P) are the method proposed in [8], which we will refer to as Stoica's method, and the iterative quadratic maximum likelihood method (briefly: IQML method) [1]. Neither method can guarantee convergence unless the initial conditions are sufficiently close to the global minimum. Stoica's method first generates initial estimates using the overdetermined Yule-Walker method. It then improves on these estimates by using a periodogram-based procedure and a simplified Gauss-Newton algorithm to iteratively maximize the likelihood function. In [8], it was shown experimentally that Stoica's method requires extremely large data records. The well-known IQML method is an iterative quadratic maximization algorithm that attempts to determine the maximum likelihood (ML) estimates in terms of a prediction polynomial. As our experiments show, this algorithm produces poor estimates for short data records and/or low signal-to-noise ratio (SNR). The IQML algorithm has also been noted to fail to converge at times, and its frequency estimates are almost always inconsistent [9].

Taking a different approach, we will apply the global optimization algorithm of interval methods (IM) to the HR problem (2). Interval-method type algorithms [3,6,7] have proven to be excellent and reliable procedures for solving global optimization problems involving nonconvex objective functions. One reason for choosing interval methods is that they are applicable to most optimization problems regardless of the convexity and differentiability of the objective function, and without knowledge of its Lipschitz constant. Additionally, for continuous objective functions, their convergence to a global optimum interval has been proven [3]. When the IM is used to compute the LS estimates of (2), however, convergence is very slow.

One way to overcome the problem of slow convergence is to decompose the problem so that optimization takes place over smaller dimensions and in parallel. This can be accomplished by combining the expectation-maximization algorithm (briefly: EM algorithm) [2] with the interval method. We call this combination the expectation-maximization interval method (EMIM) algorithm. The EM algorithm is a computationally efficient method for solving estimation problems. For the HR problem, the EM algorithm decomposes the problem into K subproblems, where K is the number of sinusoids. The K subproblems, which are nonconvex optimization problems, are then solved using an IM global optimization method. The resulting algorithm is able to converge to the global minimum interval with significantly reduced computational complexity compared with using the IM algorithm alone for solving (P).


Interval Arithmetic 


Interval methods are a class of global optimization algorithms that use interval arithmetic. An interval that contains the global minimum is found by partitioning the search space into regions. At each iteration, regions are selected for further search by additional partitioning, and partitions that cannot contain the global minimum are discarded. A major advantage of interval methods is their ability to find the global minimum of nonconvex objective functions, whether differentiable or nondifferentiable.

Interval arithmetic [6] was developed to estimate and control automatically the numerical errors caused by the finite precision of computer arithmetic. The INTLIB library [5] is used to implement interval arithmetic in the IM algorithm. A real interval number X = [a, b] is the set $\{x : a \le x \le b\}$ of real numbers. The following additional notation is used here: the upper bound of X is ub X = b, the lower bound is lb X = a, the midpoint of X is m(X) = (a + b)/2, and the width of X is w(X) = b − a. For an interval vector $X = [X_1, \dots, X_n]^T$, $w(X) = \max_{i=1,\dots,n} w(X_i)$. The general interval arithmetic operation is defined as $X \circ Y = \{x \circ y : x \in X,\ y \in Y\}$, where X and Y are real interval numbers and $\circ$ represents one of the arithmetic operations of addition, subtraction, multiplication, and division (for division, $0 \notin Y$). For additional information on interval arithmetic see [6,7].
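For the four basic operations, the set definition reduces to endpoint formulas. The following Python sketch represents an interval as a pair (lo, hi); the function names are illustrative. Unlike INTLIB [5], it performs no outward rounding, so its results are enclosures only up to floating-point rounding. The example also shows the dependency effect: X − X is not [0, 0] in interval arithmetic.

```python
def iadd(X, Y):
    return (X[0] + Y[0], X[1] + Y[1])


def isub(X, Y):
    return (X[0] - Y[1], X[1] - Y[0])


def imul(X, Y):
    products = [x * y for x in X for y in Y]  # the extremes occur at endpoint products
    return (min(products), max(products))


def idiv(X, Y):
    if Y[0] <= 0.0 <= Y[1]:
        raise ZeroDivisionError("divisor interval contains 0")
    return imul(X, (1.0 / Y[1], 1.0 / Y[0]))


def mid(X):
    return 0.5 * (X[0] + X[1])


def width(X):
    return X[1] - X[0]


X, Y = (1.0, 2.0), (-1.0, 3.0)
print(iadd(X, Y))           # (0.0, 5.0)
print(isub(X, X))           # (-1.0, 1.0): the dependency problem
print(imul(X, Y))           # (-2.0, 6.0)
print(idiv(X, (2.0, 4.0)))  # (0.25, 1.0)
```

The dependency effect is why the way an objective is written matters for the tightness of its natural interval extension.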

The unconstrained global optimization problem can be described as

$$\min_{x \in D} g(x), \tag{4}$$

where $g : \mathbb{R}^n \to \mathbb{R}$, $x \in \mathbb{R}^n$, and $D \subseteq \mathbb{R}^n$ represents the feasible region. The main tool for solving problem (4) is the concept of an inclusion function. Denote by $\mathbb{I}$ the set of real intervals. A function $G : \mathbb{I}^n \to \mathbb{I}$ is an inclusion function of the objective function g(x) if $x \in Y$ implies $g(x) \in G(Y)$; it is isotonic if $X \subseteq Y$ implies $G(X) \subseteq G(Y)$. Inclusion functions with the isotonicity property provide the theoretical basis for using interval methods as a global optimization procedure. In short, an inclusion function encloses the range of the values of g over the interval X.

The optimization procedure of the interval method repeatedly bisects boxes $X^i$, starting from an initial box $X^0$. It continues until an inclusion function value $G(X^i)$ that contains the global minimum satisfies $w(G(X^i)) < \varepsilon$. What differentiates this method from exhaustive search is that a box $X^i$ in the list $\mathcal{L}$ is discarded from further evaluation if lb $G(X^i)$ is greater than the smallest value of ub $G(X^j)$, $j \ne i$, obtained so far. The algorithm of E.R.
Global Optimization Methods for Harmonic Retrieval, Table 1
A pseudocode for interval methods


PROCEDURE interval method

    Set Y := X^0
    Calculate G(Y), y := lb G(Y), g̃ := ub G(m), where m = mid Y
    Initialize list L := {(Y, y)}

    REPEAT until convergence
        Choose a coordinate direction k, parallel to Y_k, of maximal length
        Bisect Y along direction k to obtain boxes V_1, V_2, where Y = V_1 ∪ V_2
        Calculate G(V_1) and G(V_2), and v_i := lb G(V_i) for i = 1, 2
        Remove (Y, y) from L and place (V_1, v_1), (V_2, v_2) at the end of L
        Order L so that its first pair (Z, z) satisfies z ≤ z' for all (Z', z') in L
        Discard all pairs (Z, z) from L with z > g̃
        Terminate if w(G(Z)) < ε for all (Z, z) in L
        Denote the first pair of L by (Y, y)
        Compute m := mid Y and g̃ := min(g̃, ub G(m))

    RETURN

END interval method


Hansen [3,7] is the particular interval method that will be used for locating the LS estimates of the HR problem; it is outlined in Table 1. In [7], it was proven that convergence to the global minimum is achieved if $w(G(X)) \to 0$ as $w(X) \to 0$.
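The following Python sketch follows the spirit of Table 1. It keeps the box with the smallest lower bound first, bisects it along its widest coordinate, and discards boxes whose lower bound exceeds the best upper bound g̃ found at box midpoints. It is not Hansen's full algorithm. There is no outward rounding and there are no monotonicity or Newton tests, and it stops when the smallest lower bound is within ε of g̃ instead of using the width test of Table 1. The objective $g(x_1, x_2) = (x_1^2 - 2)^2 + (x_2 - 1)^2$, with global minimum 0 at $(\pm\sqrt{2}, 1)$, is a made-up test function. Its natural interval extension is evaluated by the same code as the point function, and all names are illustrative.

```python
import heapq
import itertools


class Interval:
    """Closed interval [lo, hi] without outward rounding (illustration only)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = float(lo), float(hi)

    @staticmethod
    def _lift(v):
        return v if isinstance(v, Interval) else Interval(v, v)

    def __add__(self, other):
        o = Interval._lift(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __sub__(self, other):
        o = Interval._lift(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def width(self):
        return self.hi - self.lo

    def mid(self):
        return 0.5 * (self.lo + self.hi)


def sq(v):
    """Square of a number, or the exact range of x^2 over an interval."""
    if not isinstance(v, Interval):
        return v * v
    if v.lo >= 0.0:
        return Interval(v.lo ** 2, v.hi ** 2)
    if v.hi <= 0.0:
        return Interval(v.hi ** 2, v.lo ** 2)
    return Interval(0.0, max(v.lo ** 2, v.hi ** 2))


def g(x1, x2):
    """Test objective; with Interval arguments this is its natural interval extension G."""
    return sq(sq(x1) - 2.0) + sq(x2 - 1.0)


def interval_minimize(f, box, eps=1e-6, max_iter=100_000):
    """Return (lower bound, upper bound g~, box) for min f over box, with g~ - lower < eps."""
    tie = itertools.count()                      # tie-breaker so the heap never compares boxes
    g_best = f(*(iv.mid() for iv in box))        # g~ := g(mid Y)
    heap = [(f(*box).lo, next(tie), box)]        # list L ordered by lb G
    for _ in range(max_iter):
        y, _, Y = heapq.heappop(heap)            # pair with the smallest lower bound
        if g_best - y < eps:
            return y, g_best, Y
        k = max(range(len(Y)), key=lambda i: Y[i].width())  # widest coordinate
        m = Y[k].mid()
        for half in (Interval(Y[k].lo, m), Interval(m, Y[k].hi)):
            V = Y[:k] + (half,) + Y[k + 1:]
            v = f(*V).lo
            if v <= g_best:                      # otherwise V cannot hold the minimum
                heapq.heappush(heap, (v, next(tie), V))
                g_best = min(g_best, f(*(iv.mid() for iv in V)))
        heap = [p for p in heap if p[0] <= g_best]  # discard pairs with lb > g~
        heapq.heapify(heap)
    raise RuntimeError("no convergence within max_iter")


low, up, box = interval_minimize(g, (Interval(-3, 3), Interval(-3, 3)))
print(low, up, [(iv.lo, iv.hi) for iv in box])
```

Because each variable appears only once in g, the natural extension here gives exact ranges. For objectives such as (3), the dependency effect makes the bounds looser and the search correspondingly longer.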


Interval Method for Solving HR 


To apply the IM to the HR problem, the objective function (3) must be placed in its inclusion form:

$$\mathcal{J}(\Theta) = \sum_{t=1}^N \Bigl(y(t) - \sum_{k=1}^K A_k \sin(2\pi F_k t + \Phi_k)\Bigr)^2, \tag{5}$$

where $\Theta = [A_1, \dots, A_K, F_1, \dots, F_K, \Phi_1, \dots, \Phi_K]$ and $A_k$, $F_k$, and $\Phi_k$ are the interval counterparts of $a_k$, $f_k$, and $\phi_k$, respectively. Throughout this article, capital letters denote interval variables that correspond to their real-valued equivalents. The initial interval $\Theta_0$ is chosen so that it encompasses the global minimum. This is accomplished by choosing an interval determined from a priori information or from other high-resolution HR methods [4]. Hansen's IM, described in the previous section, is used to determine the global minimum $\hat\Theta$ of (5). The objective function (5) for a single sinusoid, as a function of frequency with phase and amplitude held constant, is plotted in Fig. 1. It is easy to see that this represents a very difficult, but practical, problem for global optimization.

Global Optimization Methods for Harmonic Retrieval, Figure 1
Objective function of a single sinusoid
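The only nonstandard ingredient of (5) is an interval extension of the sine. The following Python sketch for K = 1 represents intervals as (lo, hi) pairs and uses no outward rounding; the function names are illustrative. `isin` returns the range of sin over an interval by checking whether a maximizer π/2 + 2kπ or a minimizer −π/2 + 2kπ lies inside it. `j_inclusion` combines it term by term with interval multiplication and squaring. Because t > 0, the endpoints of the argument $2\pi F t + \Phi$ come from the endpoints of F and Φ. The check samples points of the initial box [0.7, 1.2] × [0.1, 0.3] × [0, 0.4] from the single-sinusoid simulation and confirms that every value J(θ) lies inside the enclosure.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def isin(lo, hi):
    """Range of sin over [lo, hi] (no outward rounding)."""
    if hi - lo >= TWO_PI:
        return (-1.0, 1.0)
    s_lo, s_hi = sorted((math.sin(lo), math.sin(hi)))
    if math.pi / 2 + math.ceil((lo - math.pi / 2) / TWO_PI) * TWO_PI <= hi:
        s_hi = 1.0   # a maximizer pi/2 + 2k*pi lies in [lo, hi]
    if -math.pi / 2 + math.ceil((lo + math.pi / 2) / TWO_PI) * TWO_PI <= hi:
        s_lo = -1.0  # a minimizer -pi/2 + 2k*pi lies in [lo, hi]
    return (s_lo, s_hi)


def imul(X, Y):
    p = [x * y for x in X for y in Y]
    return (min(p), max(p))


def isqr(X):
    lo, hi = X
    if lo >= 0.0:
        return (lo * lo, hi * hi)
    if hi <= 0.0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


def j_inclusion(y, A, F, PHI):
    """Natural interval extension of J for one sinusoid: sum_t (y(t) - A sin(2 pi F t + PHI))^2."""
    total_lo = total_hi = 0.0
    for t, yt in enumerate(y, start=1):
        S = isin(TWO_PI * F[0] * t + PHI[0], TWO_PI * F[1] * t + PHI[1])
        AS = imul(A, S)
        sq_lo, sq_hi = isqr((yt - AS[1], yt - AS[0]))  # y(t) - A*S, then squared
        total_lo += sq_lo
        total_hi += sq_hi
    return (total_lo, total_hi)


def j_point(y, a, f, phi):
    return sum((yt - a * math.sin(TWO_PI * f * t + phi)) ** 2 for t, yt in enumerate(y, start=1))


y = [math.sin(TWO_PI * 0.2 * t) for t in range(1, 36)]  # noise-free, N = 35
box = ((0.7, 1.2), (0.1, 0.3), (0.0, 0.4))
enclosure = j_inclusion(y, *box)

random.seed(1)
samples = [j_point(y, *(random.uniform(lo, hi) for lo, hi in box)) for _ in range(1000)]
print(enclosure, min(samples), max(samples))  # every sample lies inside the enclosure
```

On the full initial box the enclosure is very loose. It tightens only as the IM bisects the box, which is one reason convergence on (5) is slow.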


Simulations 


In this section, a numerical experiment demonstrates the performance of the IM for solving the HR problem (P). The experiment consists of estimating the sinusoid parameters for the data

$$y(t) = 1.0\sin(2\pi(0.2)t + 0.0) + n(t), \quad t = 1, \dots, N, \tag{6}$$

where n(t) is white Gaussian noise. We choose the initial box for the IM algorithm to be $\Theta_0 = [A, F, \Phi]^T = [[0.7, 1.2], [0.1, 0.3], [0, 0.4]]^T$. The signal-to-noise ratio (SNR) is defined as

$$\mathrm{SNR} = 10\log_{10}\Bigl(\sum_{k=1}^K \frac{0.5\,(a_k^*)^2}{\sigma_n^2}\Bigr),$$

where $\sigma_n^2$ is the variance of the noise. The results of this simulation, shown in Table 2, are given in terms of the sample mean and standard deviation based on 50
Global Optimization Methods for Harmonic Retrieval, Table 2
IM estimates

IM: N = 35 (50 MC runs)

SNR          10                        5                   0
a* = 1.0     1.0155 ± 0.0518           1.0447 ± 0.0913     1.0633 ± 0.1365
f* = 0.2     0.1995 ± 6.591 × 10^-4    0.1993 ± 0.0012     0.1989 ± 0.0017
φ* = 0       0.0654 ± 0.0564           0.0839 ± 0.0975     0.1193 ± 0.1319

Global Optimization Methods for Harmonic Retrieval, Table 3
IQML estimates

IQML: N = 35 (50 MC runs)

SNR          10                        5                   0
a* = 1.0     1.0080 ± 0.0533           0.9862 ± 0.1479     0.8919 ± 0.3230
f* = 0.2     0.1998 ± 7.623 × 10^-4    0.1970 ± 0.0202     0.1728 ± 0.0839
φ* = 0       0.0141 ± 0.0949           −0.0126 ± 0.2852    0.5288 ± 1.1429


Monte-Carlo (MC) runs. These results are based on the midpoints of the final interval Θ. Note that the final estimates $\hat\theta$ are very close to the true value $\theta^*$, with small standard deviations. In comparison with IQML (see Table 3), the IM yields smaller standard deviations for every parameter at every SNR, and at 0 dB its means are also markedly closer to the true values. This is particularly notable for the frequency estimates, since frequency is the most important feature in harmonic retrieval.

The convergence rate of the IM is sensitive to the order K of the HR problem. In fact, the dimension of the parameter space $\mathbb{I}^n$ is n = 3K. Thus, as K increases, the convergence rate becomes prohibitively slow. This curse of dimensionality can be mitigated by decomposing and parallelizing the problem with the EM algorithm, as described in the next section.


EMIM 


The EM algorithm [2] is well known; its development is outlined here as part of the development of the EMIM algorithm for solving the HR problem. To determine the LS estimates of the sinusoidal parameters, the EM algorithm first decomposes the observed data y(t) into its signal components (E step) and then estimates the parameters of each signal component separately (M step). The algorithm iterates back and forth between the E step and the M step, using the current estimate to decompose the observed data better and thus improve the next parameter estimate.

For the HR problem, the incomplete data are the observed data y(1), ..., y(N). The complete data are modeled as the following K data records:

$$y_k(t) = a_k^* \sin(2\pi f_k^* t + \phi_k^*) + n_k(t), \quad k = 1, \dots, K,$$

where $n_k(t) = \beta_k\bigl[y(t) - \sum_{l=1}^K a_l^* \sin(2\pi f_l^* t + \phi_l^*)\bigr]$. The $\beta_k$ are arbitrary real-valued scalars satisfying $\sum_{k=1}^K \beta_k = 1$ and $\beta_k \ge 0$. Thus $\sum_{k=1}^K n_k(t) = n(t)$ for t = 1, ..., N. The EM algorithm, beginning with n = 0, consists of the following two steps:

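E) For k = 1, ..., K, compute

$$\hat y_k^{(n)}(t) = a_k^{(n)} \sin\bigl(2\pi f_k^{(n)} t + \phi_k^{(n)}\bigr) + \beta_k\Bigl[y(t) - \sum_{l=1}^K a_l^{(n)} \sin\bigl(2\pi f_l^{(n)} t + \phi_l^{(n)}\bigr)\Bigr], \quad t = 1, \dots, N. \tag{7}$$

M) For k = 1, ..., K, compute

$$\theta_k^{(n+1)} = \arg\min_{a_k, f_k, \phi_k} J_k, \tag{8}$$

where

$$J_k = \sum_{t=1}^N \bigl(\hat y_k^{(n)}(t) - a_k \sin(2\pi f_k t + \phi_k)\bigr)^2. \tag{9}$$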


The parameter vector $\theta_k^{(n)} = [a_k^{(n)}, f_k^{(n)}, \phi_k^{(n)}]^T$ is the estimate of $\theta_k^* = [a_k^*, f_k^*, \phi_k^*]^T$ after n iterations. In the original HR problem, we have to search the 3K-dimensional parameter space to find the minimum of the least squares objective function. After the EM algorithm decomposes the HR problem into K smaller subproblems, however, we only have to solve K subproblems, each of which requires the search of a 3-dimensional parameter space to find the global optimal point(s). This results in a significant reduction in computational complexity.
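The following Python sketch implements the E and M steps (7)-(9). To keep it short, the IM in the M step is replaced by a different technique: a grid search over $f_k$ combined with the closed-form least squares solution for amplitude and phase. For fixed f, $a\sin(2\pi f t + \phi) = c\sin(2\pi f t) + s\cos(2\pi f t)$ with $c = a\cos\phi$ and $s = a\sin\phi$. Function names, the frequency grid, the initial guesses (chosen on the grid) and the number of iterations are illustrative. The data follow the two-sinusoid example of the simulations below, without noise, with β = (0.09, 0.91). Each M step exactly minimizes $J_k$ over a grid that contains the previous frequency estimate. The iteration is therefore a generalized EM algorithm, and the observed LS cost J(θ) of (3) never increases; the sketch records it to show this.

```python
import math

TWO_PI = 2.0 * math.pi


def synth(params, N):
    """sum_k a_k sin(2 pi f_k t + phi_k) for t = 1..N; params = [(a, f, phi), ...]."""
    return [sum(a * math.sin(TWO_PI * f * t + p) for a, f, p in params)
            for t in range(1, N + 1)]


def ls_cost(y, params):
    """Observed-data LS objective J(theta) of (3)."""
    return sum((yt - st) ** 2 for yt, st in zip(y, synth(params, len(y))))


def m_step(yk, freq_grid):
    """Minimize J_k of (9): grid over f, closed-form LS for (c, s) = (a cos phi, a sin phi)."""
    N, best = len(yk), None
    for f in freq_grid:
        su = [math.sin(TWO_PI * f * t) for t in range(1, N + 1)]
        cu = [math.cos(TWO_PI * f * t) for t in range(1, N + 1)]
        # 2x2 normal equations [[ss, sc], [sc, cc]] [c, s]^T = [bs, bc]^T
        ss = sum(u * u for u in su)
        cc = sum(v * v for v in cu)
        sc = sum(u * v for u, v in zip(su, cu))
        bs = sum(u * w for u, w in zip(su, yk))
        bc = sum(v * w for v, w in zip(cu, yk))
        det = ss * cc - sc * sc
        if abs(det) < 1e-12:
            continue
        c, s = (bs * cc - sc * bc) / det, (ss * bc - sc * bs) / det
        cost = sum((w - c * u - s * v) ** 2 for u, v, w in zip(su, cu, yk))
        if best is None or cost < best[0]:
            best = (cost, (math.hypot(c, s), f, math.atan2(s, c)))
    return best[1]


def em_sinusoids(y, init, betas, freq_grid, n_iter=10):
    params, history = list(init), [ls_cost(y, init)]
    N = len(y)
    for _ in range(n_iter):
        residual = [yt - st for yt, st in zip(y, synth(params, N))]
        # E step (7): own component plus a beta_k share of the current residual
        parts = [[ck + beta * r for ck, r in zip(synth([p], N), residual)]
                 for p, beta in zip(params, betas)]
        # M step (8): the K subproblems are independent (and could run in parallel)
        params = [m_step(yk, freq_grid) for yk in parts]
        history.append(ls_cost(y, params))
    return params, history


N = 35
y = synth([(1.0, 0.2, 0.0), (1.0, 0.22, 0.0)], N)  # noise-free two-sinusoid data
grid = [i / 1000 for i in range(100, 301)]          # f in [0.1, 0.3], step 0.001
est, hist = em_sinusoids(y, [(1.0, 0.19, 0.0), (1.0, 0.23, 0.0)], [0.09, 0.91], grid)
print(est)
print(hist)  # a non-increasing sequence of J values
```

Unlike the IM-based M step of the EMIM, the grid search guarantees nothing between grid points. It only shows how the decomposition turns one 3K-dimensional search into K three-dimensional ones.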
To solve the minimization problem in the M step, we use the IM to find the final interval that contains the point minimizing the objective function. Since the IM has been proven to converge to the global optimum for continuous objective functions [3], the M step will not be trapped in a local extremum. The IM algorithm needs an inclusion function of the objective function (9), which is constructed by forming the natural interval extension [3,7] of $J_k$:

$$\mathcal{J}_k^{(n)} = \sum_{t=1}^N \bigl(\hat y_k^{(n)}(t) - A_k \sin(2\pi F_k t + \Phi_k)\bigr)^2, \tag{10}$$

where $A_k$, $F_k$, and $\Phi_k$ are the interval counterparts of $a_k$, $f_k$, and $\phi_k$, respectively. The initial values $\theta^{(0)} = [\theta_1^{(0)T}, \dots, \theta_K^{(0)T}]^T$ are guessed arbitrarily or can come from other high-resolution estimation methods. The initial interval $\Theta_{k,0} = [A_{k,0}, F_{k,0}, \Phi_{k,0}]^T$ for the M step is the region over which the minimization is carried out; it is used at the beginning of each M step of the EMIM algorithm. At the (n + 1)st iteration of EMIM, the IM partitions $\Theta_{k,0}$ iteratively to find the final interval estimate $\Theta_k^{(n+1)}$. Its midpoint $\theta_k^{(n+1)} = m(\Theta_k^{(n+1)})$ is used as the parameter estimate to compute $\hat y_k^{(n+1)}(t)$ in the next iteration of the EMIM algorithm. The process is repeated until $\|\theta^{(n+1)} - \theta^{(n)}\| < \rho$, where ρ is chosen by the user.

Consider the case where $\theta_i^{(0)} = \theta_j^{(0)}$ and $\beta_i = \beta_j$ for some $i \ne j$. It is straightforward to see that then $\hat y_i^{(n)}(t) = \hat y_j^{(n)}(t)$ and $J_i = J_j$ in the E step and M step, respectively. Thus $\theta_i^{(n)} = \theta_j^{(n)}$ for all n, which means that the final estimates for $\theta_i$ and $\theta_j$ will be the same. To avoid this problem and fully exploit the capability of the EMIM algorithm, either $\beta_i \ne \beta_j$ or $\theta_i^{(0)} \ne \theta_j^{(0)}$ must hold.


Simulations 
Our experiments consist of estimating the sinusoidal parameters for the data

$$y(t) = 1.0\sin(2\pi(0.2)t + 0.0) + 1.0\sin(2\pi(0.22)t + 0.0) + n(t), \quad t = 1, \dots, 35,$$

where n(t) is white Gaussian noise. Since $|0.2 - 0.22| < 1/35 \approx 0.02857$, the periodogram cannot be used to resolve the two frequencies. We choose the initial box for the EMIM algorithm to be

$$\begin{aligned}
[\Theta_{1,0}^T, \Theta_{2,0}^T]^T &= [A_{1,0}, F_{1,0}, \Phi_{1,0}, A_{2,0}, F_{2,0}, \Phi_{2,0}]^T\\
&= [[0.7, 1.2], [0.1, 0.3], [0, 0.4], [0.7, 1.2], [0.1, 0.3], [0, 0.4]]^T
\end{aligned}$$

and $\beta_1 = 0.09$, $\beta_2 = 0.91$. The signal-to-noise ratio (SNR) is defined as

$$\mathrm{SNR} = 10\log_{10}\Bigl(\sum_{k=1}^K \frac{0.5\,(a_k^*)^2}{\sigma_n^2}\Bigr),$$

where $\sigma_n^2$ is the variance of the noise. If no a priori information about the possible values of the sinusoid parameters is available, the full range of possible values for the frequency, the phase, and the amplitude must be used as the initial intervals. Using the full range poses no difficulty when very fast computing engines are available. However, other high-resolution techniques can be used to yield a smaller and more informative initial interval.

Using 50 MC runs, we computed the sample means and standard deviations for the EMIM and IQML algorithms (see Table 4 and Table 5, respectively). For the EMIM, the midpoints of the final interval estimates are taken as the final estimates, from which the
Global Optimization Methods for Harmonic Retrieval, Table 4
EMIM estimates

EMIM: N = 35, ρ = 10^? (50 MC runs)

SNR            10                        5                         0
a_1* = 1.0     1.0305 ± 0.0992           1.0235 ± 0.1389           1.0263 ± 0.1622
f_1* = 0.2     0.1993 ± 2.209 × 10^?     0.1993 ± 4.119 × 10^?     0.1969 ± 0.0110
φ_1* = 0       0.0631 ± 0.0764           0.0851 ± 0.1152           0.1369 ± 0.1609
a_2* = 1.0     1.0284 ± 0.0744           1.0501 ± 0.1036           1.0995 ± 0.1054
f_2* = 0.22    0.2192 ± 0.0012           0.2194 ± 0.0016           0.2182 ± 0.0051
φ_2* = 0       0.0746 ± 0.1177           0.0757 ± 0.1224           0.1314 ± 0.1662
Global Optimization Methods for Harmonic Retrieval, Table 5
IQML estimates

IQML: N = 35 (50 MC runs)

SNR            10                  5                   0
a_1* = 1.0     0.9549 ± 0.3283     0.6615 ± 0.2908     0.7404 ± 0.2778
f_1* = 0.2     0.1963 ± 0.0137     0.1707 ± 0.0476     0.1404 ± 0.0836
φ_1* = 0       0.3332 ± 0.6117     0.9323 ± 1.0843     0.5472 ± 1.0659
a_2* = 1.0     0.9013 ± 0.3732     0.7567 ± 0.2683     0.8582 ± 0.2788
f_2* = 0.22    0.2428 ± 0.0685     0.2559 ± 0.0867     0.2721 ± 0.0985
φ_2* = 0       −0.0886 ± 0.4852    −0.0079 ± 0.7185    0.3123 ± 0.8358


sample mean and variance can be calculated accord- 
ingly. Note that the EMIM generates estimates which 
have mean values very close to the true parameter 
values and relatively very small variances. As for the 
IQML, its variance for each value of SNR is significantly 
larger than the corresponding EMIM. Clearly, EMIM 
outperforms IQML by providing estimates that are less 
biased with smaller variances. 
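The summary statistics described above (take the midpoint of each run's final EMIM interval, then compute the sample mean and spread across MC runs) amount to a few lines of code. This is a minimal Python sketch; the helper names and the three intervals in the usage lines are hypothetical placeholders, not data from the experiments.

```python
import statistics
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) end of one run's final interval


def midpoints(intervals: Sequence[Interval]) -> List[float]:
    """Point estimate of each MC run: the midpoint of its final interval."""
    return [(lo + hi) / 2.0 for lo, hi in intervals]


def mc_summary(intervals: Sequence[Interval]) -> Tuple[float, float]:
    """Sample mean and sample standard deviation of the midpoint estimates."""
    mids = midpoints(intervals)
    return statistics.mean(mids), statistics.stdev(mids)


# Hypothetical final intervals for one frequency parameter from three runs.
runs = [(0.19, 0.21), (0.20, 0.22), (0.18, 0.20)]
mean, std = mc_summary(runs)
```

With 50 runs per SNR level, the same two calls would produce one "mean ± deviation" cell of a table like Table 4.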


Conclusion 


In comparing the two types of IM algorithms with the IQML method, it was shown that both the IM and EMIM algorithms are powerful tools for solving the HR problem. Furthermore, decomposing the problem with the EMIM algorithm was observed not to degrade performance relative to the IM.

We have shown experimentally that the IM and EMIM algorithms are robust for very short data records and low SNR. However, if the dimensionality is low or convergence to the ML estimates is desired, then the IM algorithm can be used. For either EMIM or IM, convergence time can be improved by generating initial intervals of smaller width with other high-resolution HR methods. Furthermore, implementing the decomposed subproblems in parallel on a multiprocessor computer can also reduce the execution time.
The problem of finding a solution of a system of equations and/or a system of inequalities is one of the main research subjects in numerical analysis and optimization. Sources of such systems include many 'real-world' problems ([2,7]), the nonlinear complementarity problem (cf. also » Generalized nonlinear complementarity problem), the variational inequality problem over a convex set (cf. also » Variational inequalities), Karush-Kuhn-Tucker systems, the feasibility problem, and the problem of computing a Brouwer fixed point ([10,15]).

In general, a system of nonlinear equations and/or inequalities is given by

$$
\text{(SNE)}\qquad
\begin{cases}
h_i(x) = 0, & i \in I,\\
g_j(x) \le 0, & j \in J,\\
x \in X,
\end{cases}
$$

where I, J are finite index sets, $X \subset \mathbb{R}^n$ is a convex set, and $h_i$ ($i \in I$), $g_j$ ($j \in J$) are nonlinear functions defined on a suitable set containing X.

Solution methods for (SNE), which are based on 
convex and nonsmooth optimization techniques, and 
fixed point algorithms can be found in [2,3,4,5,14,15], 
and references given therein. 

In order to apply global optimization methods for solving (SNE), one defines a vector function $h: \mathbb{R}^n \to \mathbb{R}^{|I|}$ having components $h_i(x)$ ($i \in I$), a function

$$ f(x) = \max\{\|h(x)\|,\ g_j(x) : j \in J\}, $$

where $\|\cdot\|$ is any vector norm on $\mathbb{R}^{|I|}$, and considers the following global optimization problem

$$ \text{(GOP)}\qquad f^* = \min\{f(x) : x \in X\}. $$

In particular, the function f in (GOP) can be defined by

$$ f(x) = \max\{|h_i(x)| : i \in I,\ g_j(x) : j \in J\}. $$


In general (for $I \neq \emptyset$), a vector $x^* \in \mathbb{R}^n$ is a solution of (SNE) if and only if it is a global optimal solution of (GOP) and $f^* = f(x^*) = 0$. Thus, finding a solution of (SNE) can be replaced by computing a global optimal solution of (GOP). In the case $I = \emptyset$, i.e., when (SNE) is a system of inequalities, a global optimization algorithm applied to (GOP) can terminate as soon as a feasible point $x \in X$ with $f(x) \le 0$ is found. If, while applying a global optimization algorithm to (GOP), it is established that $f^* > 0$ (e.g., a lower bound μ of $f^*$ can be computed such that μ > 0), then (SNE) clearly has no solution.
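The reduction of (SNE) to (GOP) rests on the merit function f. The minimal Python sketch below uses the max-of-absolute-values choice of f given above, evaluated on a hypothetical two-variable toy system (the names `merit`, `hs`, `gs` are ours). A value of 0 marks a solution. For $I = \emptyset$, any value $\le 0$ marks a feasible point.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Func = Callable[[Vector], float]


def merit(x: Vector, hs: Sequence[Func], gs: Sequence[Func]) -> float:
    """f(x) = max{|h_i(x)|, g_j(x)}: the max-norm choice of f in (GOP)."""
    values = [abs(h(x)) for h in hs] + [g(x) for g in gs]
    return max(values)


# Hypothetical toy system: h(x) = x1^2 + x2^2 - 1 = 0 and g(x) = x1 - x2 <= 0.
hs = [lambda x: x[0] ** 2 + x[1] ** 2 - 1.0]
gs = [lambda x: x[0] - x[1]]

print(merit((0.0, 1.0), hs, gs))  # 0.0: the point (0, 1) solves the system
print(merit((1.0, 0.0), hs, gs))  # 1.0: the inequality is violated
```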

There are three main classes of (SNE), which can be 
solved by implementable methods in global optimiza- 
tion: 

i) The functions $h_i$ ($i \in I$) and $g_j$ ($j \in J$) are all d.c. (a function is called d.c. if it can be expressed as the difference of two convex functions; see » D.C. programming).

ii) The functions $h_i$ ($i \in I$) and $g_j$ ($j \in J$) are all Lipschitzian with Lipschitz constants $L_i$ ($i \in I$) and $M_j$ ($j \in J$), respectively.

iii) The corresponding problem (GOP) can be replaced by a convex relaxation problem.

For class i), the function f in (GOP) is d.c., and one can find an explicit form of f as the difference of two convex functions, so that d.c. programming techniques can be applied ([9,11,12,18,19]).

For class ii), if $\ell_p$-norms are used in the definition of f, i.e.,

$$
\|h(x)\|_p =
\begin{cases}
\left(\sum_{i\in I} |h_i(x)|^p\right)^{1/p}, & 1 \le p < \infty,\\
\max_{i\in I} |h_i(x)|, & p = \infty,
\end{cases}
$$

then f is Lipschitzian with Lipschitz constant $L = \max\{\sum_{i\in I} L_i,\ M_j : j \in J\}$. Algorithms for solving Lipschitz optimization problems can be found in [6,7,8,9,10,12,16,17].
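A Lipschitz constant is also what makes the "no solution" test $\mu > 0$ from (GOP) computable. The simplest version is a uniform grid bound: on a cell of width w around a sample point, an L-Lipschitz f cannot fall more than $Lw/2$ below its value at the cell midpoint. The one-dimensional Python sketch below uses a hypothetical function name and toy data. It illustrates the idea and is not one of the cited Lipschitz algorithms.

```python
from typing import Callable


def grid_lower_bound(f: Callable[[float], float], a: float, b: float,
                     lip: float, n: int) -> float:
    """Lower bound on min f over [a, b] for an f with Lipschitz constant lip.

    The interval is cut into n cells of width w; on each cell,
    f(x) >= f(midpoint) - lip * w / 2.
    """
    w = (b - a) / n
    return min(f(a + (k + 0.5) * w) for k in range(n)) - lip * w / 2.0


# Hypothetical example: h(x) = x^2 + 1 on [-2, 2], f = |h|, and |h'| <= 4 there.
mu = grid_lower_bound(lambda x: abs(x * x + 1.0), -2.0, 2.0, 4.0, 100)
print(mu > 0)  # True: mu > 0 certifies that h(x) = 0 has no solution in [-2, 2]
```

For a system that does have a root, such as $x^2 - 1 = 0$, the bound comes out non-positive, so no conclusion is drawn.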
Techniques for the construction of convex relax- 
ation problems for some special cases of class iii) are 
given in [13]. 
Multiplicative functions, i.e., products of real-valued functions $f_i$, $i = 1,\dots,p$, are generally nonconvex even though each $f_i$ is convex. As a result, most multiplicative programming problems containing $\prod_{i=1}^{p} f_i(x)$ in the objective and/or constraints are nonconvex minimization problems, and hence we need global optimization to look for a global minimum among stacks of local minima. Fortunately, however, the number p of $f_i$'s in multiplicative functions encountered in practical applications is rather small in comparison with the number n of variables; e.g., two or three in geometrical optimization [10] and at most five in multiple objective optimization [1]. As will be seen later, this enables us to embed the troublesome nonconvexity into a small subspace of dimension p. Exploiting this property, called low-rank nonconvexity [6], a number of researchers have developed efficient algorithms since the late 1980s to solve various subclasses of multiplicative programming problems, including the linear multiplicative program

$$ \min\ (c_1^T x + c_{10})(c_2^T x + c_{20})\quad \text{s.t. } x \in D, \tag{1} $$

where $D \subset \mathbb{R}^n$ is a polytope and $c_i^T x + c_{i0} > 0$ for any $x \in D$, $i = 1, 2$; the convex multiplicative program

$$ \min\ \prod_{i=1}^{p} f_i(x)\quad \text{s.t. } x \in D, \tag{2} $$

where D is a compact convex set and the $f_i$'s are convex functions positive-valued on D; the generalized convex multiplicative program

$$ \min\ \sum_{i=1}^{p} f_{2i-1}(x)\, f_{2i}(x) + g(x)\quad \text{s.t. } x \in D, \tag{3} $$

where D and the $f_i$'s are the same as in (2) and g is a convex function; and the convex program with an additional convex multiplicative constraint

$$ \min\ g(x)\quad \text{s.t. } x \in D,\quad \prod_{i=1}^{p} f_i(x) \le 1, \tag{4} $$

where D, the $f_i$'s and g are the same as in (3). As long as p is small, all of these nonconvex programs can be solved in a practical amount of time even if n exceeds a few hundred.


Linear Multiplicative Program 


Problem (1), though simple looking, is NP-hard (cf. also » Complexity theory; » Complexity classes in optimization), as shown in [11]. There are two major methods, each of which is based on a variant of the parametric simplex algorithm for linear programming [12].

The first method introduces a parameter ξ ≥ 0 and transforms (1) into an equivalent problem:

$$
\begin{aligned}
\min\ & \xi f_1(x) \\
\text{s.t. } & x \in D, \\
& f_2(x) \le \xi,\quad \xi \ge 0,
\end{aligned} \tag{5}
$$

where $f_i(x) = c_i^T x + c_{i0}$, $i = 1, 2$. To solve (5), we need only solve

$$ \min\{f_1(x) : x \in D,\ f_2(x) \le \xi\} \tag{6} $$

for all $\xi \ge \xi_{\min} = \min\{f_2(x) : x \in D\}$, using the parametric right-hand-side simplex algorithm (cf. also » Parametric linear programming: Cost simplex algorithm). We then have a set of optimal solutions x(ξ) to (6) and the analytical expression of

$$ \phi(\xi) = \xi f_1(x(\xi)), $$

which is a piecewise quadratic function over $\xi \ge \xi_{\min}$. Let

$$ \xi^* \in \arg\min\{\phi(\xi) : \xi \ge \xi_{\min}\}. $$

Then $x(\xi^*)$ is an optimal solution to (1).

This parametric method, proposed by K. Swarup [13] in the mid-1960s, was originally used for finding a locally optimal solution to (1). Strangely, it was not appreciated as an efficient global optimization tool until the second parametric method was developed by H. Konno and T. Kuno [4] more than twenty years later.

The second method also introduces a parameter ξ > 0, but in a different way:

$$
\begin{aligned}
\min\ & F(x,\xi) = \xi f_1(x) + \frac{f_2(x)}{\xi} \\
\text{s.t. } & x \in D,\quad \xi > 0.
\end{aligned} \tag{7}
$$

For any x we have

$$ \min\{F(x,\xi) : \xi > 0\} = 2\sqrt{f_1(x) f_2(x)}. $$

Therefore, (7) is equivalent to (1); moreover, (7) is equivalent to finding a minimum point ξ* of the function

$$ \psi(\xi) = \min\{F(x,\xi) : x \in D\} \tag{8} $$

over ξ > 0. Since the right-hand side of (8) is a linear program, we can locate ξ* using the parametric objective simplex algorithm. In fact, noting that $\lambda = \xi/(\xi + 1/\xi)$ maps $\Xi = \{\xi : \xi > 0\}$ onto the unit interval $\{\lambda : 0 < \lambda < 1\}$ and that $F(x,\xi)/(\xi + 1/\xi) = \lambda f_1(x) + (1-\lambda) f_2(x)$, we solve

$$ \min\{\lambda c_1^T x + (1-\lambda) c_2^T x : x \in D\} \tag{9} $$

parametrically over λ ∈ (0, 1). Let x(λ) denote an optimal solution to (9). Then

$$ \lambda^* \in \arg\min\{f_1(x(\lambda))\, f_2(x(\lambda)) : \lambda \in (0,1)\} $$

gives $\xi^* = \sqrt{\lambda^*/(1-\lambda^*)}$, and $x(\lambda^*)$ is an optimal solution to (1).
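The λ-parametrization (9) can be imitated on a small instance to see why it works. This Python sketch assumes D is given by its list of vertices, so each LP (9) is solved by choosing the best vertex. It also replaces the exact parametric simplex method with a fine grid of λ values in (0, 1). Both simplifications are ours, and the instance is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dot(c: Sequence[float], x: Sequence[float]) -> float:
    return sum(ci * xi for ci, xi in zip(c, x))


def lmp_lambda_sweep(vertices: List[Point], c1, c10, c2, c20, m: int = 1000):
    """Approximately solve (1) by scanning lambda in (0, 1) as in (9).

    Assumes D = conv(vertices), so each LP (9) is solved by picking the best
    vertex; a grid of m lambda values stands in for the parametric simplex.
    """
    best = (math.inf, None, None)  # (objective value, x, lambda)
    for k in range(m):
        lam = (k + 0.5) / m  # interior grid point of (0, 1)
        x = min(vertices, key=lambda v: lam * dot(c1, v) + (1 - lam) * dot(c2, v))
        val = (dot(c1, x) + c10) * (dot(c2, x) + c20)
        if val < best[0]:
            best = (val, x, lam)
    val, x, lam = best
    return x, val, math.sqrt(lam / (1.0 - lam))  # xi* = sqrt(lambda*/(1-lambda*))


# Hypothetical instance: D = unit square, f1 = x1 - x2 + 2, f2 = -x1 + x2 + 2.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
x_opt, val, xi = lmp_lambda_sweep(square, (1.0, -1.0), 2.0, (-1.0, 1.0), 2.0)
brute = min((dot((1.0, -1.0), v) + 2.0) * (dot((-1.0, 1.0), v) + 2.0) for v in square)
print(val, brute)  # 3.0 3.0
```

Only the product of the LP solutions is compared across λ, exactly as in the selection of λ* above. The brute-force line serves as a check and is not part of the method.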
Under some probabilistic assumptions, the average number of simplex pivots needed to solve a linear program with a single parameter is known to be polynomial in the problem input length [12]. Hence, (1) can also be solved in polynomial time on average, which contrasts sharply with the result of the worst-case analysis.


Convex Multiplicative Program 


The above parametric methods for (1) can be extended to more general classes of multiplicative programming problems. For example, (7) is directly applicable to the special case of (2) where p = 2, but it is difficult to design an algorithm for solving (7) parametrically when the $f_i$'s are nonlinear functions. One effective approach in this case is branch and bound on the set of parameter values $\Xi = \{\xi : \xi > 0\}$ [7] (cf. also » Integer programming: Branch and bound methods).

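Let $\mathcal{F}$ denote the family of functions of the form

$$ \alpha\xi + \frac{\beta}{\xi}, $$

where α, β ∈ ℝ. The function ψ defined by (8) is a pointwise minimum of functions in $\mathcal{F}$ with α = $f_1(x)$ and β = $f_2(x)$ for x ∈ D. The family $\mathcal{F}$ possesses the following properties:

i) Any two points $(\xi_s, \psi_s)$, $(\xi_t, \psi_t) \in \mathbb{R}^2$ with $0 < \xi_s < \xi_t$ uniquely determine the function in $\mathcal{F}$

$$ \frac{\psi_t\xi_t - \psi_s\xi_s}{\xi_t^2 - \xi_s^2}\,\xi + \frac{\xi_s\xi_t\,(\psi_s\xi_t - \psi_t\xi_s)}{\xi_t^2 - \xi_s^2}\cdot\frac{1}{\xi}; $$

ii) Any function in $\mathcal{F}$ is Lipschitz continuous over ξ ≥ ξ' for any ξ' > 0;

iii) Two distinct functions in $\mathcal{F}$ have at most one intersection point over ξ > 0.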

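Suppose $[\xi_s, \xi_t] \subset \Xi$ is an interval containing ξ*. Since $f_1$ and $f_2$ are convex, F(·, ξ) is also a convex function for any ξ > 0, and hence $\psi(\xi_s)$ and $\psi(\xi_t)$ can be computed by convex programming. For $(\xi_s, \psi(\xi_s))$ and $(\xi_t, \psi(\xi_t))$, let us construct a function in $\mathcal{F}$ according to i):

$$ u(\xi;\xi_s,\xi_t) = \frac{\psi(\xi_t)\xi_t - \psi(\xi_s)\xi_s}{\xi_t^2 - \xi_s^2}\,\xi + \frac{\xi_s\xi_t\,(\psi(\xi_s)\xi_t - \psi(\xi_t)\xi_s)}{\xi_t^2 - \xi_s^2}\cdot\frac{1}{\xi}. $$

From iii) we have

$$ u(\xi;\xi_s,\xi_t) \le \psi(\xi),\qquad \forall \xi \in [\xi_s,\xi_t]. $$

Let $\xi_m \in \arg\min\{u(\xi;\xi_s,\xi_t) : \xi \in [\xi_s,\xi_t]\}$ and

$$
u_2(\xi) =
\begin{cases}
u(\xi;\xi_s,\xi_m) & \text{if } \xi_s \le \xi \le \xi_m,\\
u(\xi;\xi_m,\xi_t) & \text{if } \xi_m < \xi \le \xi_t.
\end{cases}
$$

Then $u_2$ underestimates ψ over $[\xi_s, \xi_t]$ and is better than $u_1 = u(\cdot;\xi_s,\xi_t)$ in the sense

$$ u_1(\xi) \le u_2(\xi) \le \psi(\xi),\qquad \forall \xi \in [\xi_s,\xi_t]. $$

In this way, by successively improving the underestimator of ψ, we can generate a sequence of minimum points of the $u_k$'s converging to ξ*.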
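The refinement scheme above can be written as a small branch-and-bound loop over ξ. The loop keeps a sorted list of evaluated points $(\xi, \psi(\xi))$ and fits the member of $\mathcal{F}$ through each neighbouring pair, as in property i). Each fitted member gives a lower bound on its segment. The segment with the smallest bound is split at its minimizer $\xi_m$. This Python sketch follows the idea of [7] but is not claimed to be the authors' exact procedure. In particular, a toy ψ built from three hypothetical $(f_1, f_2)$ pairs stands in for the convex programs that would evaluate ψ in practice.

```python
import math
from typing import Callable, List, Tuple


def fit_member(xs: float, ps: float, xt: float, pt: float) -> Tuple[float, float]:
    """(a, b) of the unique a*xi + b/xi through (xs, ps) and (xt, pt), 0 < xs < xt."""
    den = xt * xt - xs * xs
    a = (pt * xt - ps * xs) / den
    b = xs * xt * (ps * xt - pt * xs) / den
    return a, b


def min_member(a: float, b: float, lo: float, hi: float) -> Tuple[float, float]:
    """Minimize a*xi + b/xi over [lo, hi]; returns (argmin, minimum)."""
    cands = [lo, hi]
    if a > 0 and b > 0:
        r = math.sqrt(b / a)  # interior stationary point of a convex member
        if lo < r < hi:
            cands.append(r)
    xi = min(cands, key=lambda t: a * t + b / t)
    return xi, a * xi + b / xi


def bb_over_xi(psi: Callable[[float], float], lo: float, hi: float,
               tol: float = 1e-9, max_iter: int = 200) -> Tuple[float, float]:
    """Refine piecewise underestimators of psi on [lo, hi] until the gap closes."""
    pts: List[Tuple[float, float]] = [(lo, psi(lo)), (hi, psi(hi))]
    for _ in range(max_iter):
        upper = min(p for _, p in pts)
        # Lower bound: smallest minimum of the fitted member on any segment.
        xm, lower = min(
            (min_member(*fit_member(xs, ps, xt, pt), xs, xt)
             for (xs, ps), (xt, pt) in zip(pts, pts[1:])),
            key=lambda c: c[1],
        )
        if upper - lower <= tol:
            break
        pts.append((xm, psi(xm)))
        pts.sort()
    return min(pts, key=lambda p: p[1])


# Toy stand-in: D has three points with these (f1, f2) values.
pairs = [(1.0, 4.0), (2.0, 1.0), (3.0, 3.0)]


def psi(xi: float) -> float:
    return min(f1 * xi + f2 / xi for f1, f2 in pairs)


xi_star, value = bb_over_xi(psi, 0.1, 10.0)  # value -> 2*sqrt(2), xi_star -> sqrt(1/2)
```

The search interval must contain ξ*. Once two points lie on the same active member, the fitted function equals ψ there and the gap closes.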
The parametrization (7) can further be extended to (2) with p > 2 [8] as follows:

$$
\begin{aligned}
\min\ & F(x,\xi) = \sum_{i=1}^{p} \xi_i f_i(x) \\
\text{s.t. } & x \in D, \\
& \prod_{i=1}^{p} \xi_i \ge 1,\quad \xi \ge 0.
\end{aligned} \tag{10}
$$

Karush-Kuhn-Tucker conditions with respect to ξ imply the equivalence between (2) and (10). Let

$$ \psi(\xi) = \min\{F(x,\xi) : x \in D\}. $$

Then (10) reduces to a problem with p variables:

$$ \min\ \psi(\xi)\quad \text{s.t. } \prod_{i=1}^{p}\xi_i \ge 1,\quad \xi \ge 0. \tag{11} $$

The objective function ψ is concave and coordinatewise nondecreasing, and its value at any ξ ≥ 0 can be computed by convex programming.
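The equivalence of (2) and (10) can also be checked without writing out the KKT system, using the arithmetic-geometric mean inequality. The derivation below is a sketch. It assumes the objective of (10) is $\sum_i \xi_i f_i(x)$ and that every $f_i(x) > 0$ on D.

$$
\begin{aligned}
\sum_{i=1}^{p}\xi_i f_i(x) &\ge p\Big(\prod_{i=1}^{p}\xi_i f_i(x)\Big)^{1/p} \ge p\Big(\prod_{i=1}^{p} f_i(x)\Big)^{1/p} \quad\text{whenever } \prod_{i=1}^{p}\xi_i \ge 1,\\
\xi_i^{*} &= \frac{\big(\prod_{l=1}^{p} f_l(x)\big)^{1/p}}{f_i(x)} \;\Rightarrow\; \prod_{i=1}^{p}\xi_i^{*} = 1,\qquad \sum_{i=1}^{p}\xi_i^{*} f_i(x) = p\Big(\prod_{i=1}^{p} f_i(x)\Big)^{1/p}.
\end{aligned}
$$

Hence $\min_\xi F(x,\xi) = p\,(\prod_i f_i(x))^{1/p}$, an increasing function of $\prod_i f_i(x)$, so minimizing over x ∈ D yields the same minimizers as (2). For p = 2 with $\xi = (\xi, 1/\xi)$ this reduces to the identity $\min_{\xi>0} F(x,\xi) = 2\sqrt{f_1(x) f_2(x)}$ used for (7).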
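An alternative approach to (2) with p > 2 [14] is a generalization of (5):

$$
\begin{aligned}
\min\ & \prod_{i=1}^{p}\xi_i \\
\text{s.t. } & x \in D, \\
& f_i(x) \le \xi_i,\quad i = 1,\dots,p, \\
& \xi \ge 0.
\end{aligned} \tag{12}
$$

Let $W \subset \mathbb{R}^n \times \mathbb{R}^p$ denote the feasible region of (12) and

$$ \Omega = \{\xi \in \mathbb{R}^p : \exists x,\ (x,\xi) \in W\}. $$

Then (12) also reduces to a problem with p variables:

$$ \min\ \sum_{i=1}^{p}\log\xi_i\quad \text{s.t. } \xi \in \Omega. \tag{13} $$

The objective function is concave; the feasible region Ω is a projection of the convex set W and hence a convex set.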

Both (11) and (13) are concave minimization problems (cf. also » Concave programming); nevertheless, even general-purpose algorithms such as branch and bound and outer approximation (cf. also » Generalized outer approximation) [3] can handle them very efficiently when p is less than five.


Other Multiplicative Programs 


In a way similar to (11), problem (3) can be reduced to a concave minimization problem with 2p variables [5] through the parametrization

$$
\begin{aligned}
\min\ & \sum_{i=1}^{p}\frac{\xi_{2i-1}\big(f_{2i-1}(x)\big)^2 + \xi_{2i}\big(f_{2i}(x)\big)^2}{2} + g(x) \\
\text{s.t. } & x \in D, \\
& \xi_{2i-1}\xi_{2i} \ge 1,\quad i = 1,\dots,p, \\
& \xi \ge 0.
\end{aligned} \tag{14}
$$

Let ψ(ξ) denote the optimal value of (14) with fixed ξ. Then (14) reduces to

$$ \min\ \psi(\xi)\quad \text{s.t. } \xi_{2i-1}\xi_{2i} \ge 1,\ i = 1,\dots,p,\quad \xi \ge 0. \tag{15} $$

The objective function ψ is concave and coordinatewise nondecreasing. For problem (4), we can use the following parametrization [9]:

$$
\begin{aligned}
\min\ & g(x) \\
\text{s.t. } & x \in D, \\
& f_i(x) \le \xi_i,\quad i = 1,\dots,p, \\
& \prod_{i=1}^{p}\xi_i \le 1,\quad \xi \ge 0.
\end{aligned} \tag{16}
$$

Let ψ(ξ) denote the optimal value of (16) with fixed ξ. Then (16) is equivalent to

$$ \min\ \psi(\xi)\quad \text{s.t. } \prod_{i=1}^{p}\xi_i \le 1,\quad \xi \ge 0. \tag{17} $$

The objective function ψ is convex, but the feasible region is a d.c. set (the difference of two convex sets). Thus, we can solve (3) and (4) by solving the smaller problems (15) and (17), respectively. For a more complete survey of the algorithms, see the article by Konno and Kuno in [2].

We have seen that the parametric approach offers an efficient tool for handling multiplicative programming problems. This approach is not specific to the multiplicative structure but can be extended to a much wider class of nonconvex minimization problems, including minimum concave-cost flow problems, facility location, multilevel programming and so forth. The textbook [6] shows how the parametric approach can be generalized to a broad class of problems.


See also 
» Multiparametric Linear Programming
» Multiplicative Programming
» Parametric Linear Programming: Cost Simplex Algorithm
Introduction

Various deterministic global optimization algorithms that utilize a branch and bound framework make use of convex underestimators of the functions under consideration. This entry presents the work of Meyer and Floudas [11] on the convex underestimation of $C^2$-continuous functions. The work extends and refines the convex underestimation approach used in the αBB global optimization algorithm [1,2,3,4,10]. A recent review of deterministic global optimization approaches can be found in [6].

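Let $f: \mathbb{R}^n \to \mathbb{R}$ be a smooth nonconvex $C^2$-continuous function, and let $\mathbf{x} = [\underline{x}, \overline{x}] \subset \mathbb{R}^n$ be a box. A convex underestimator $\phi: \mathbf{x} \to \mathbb{R}$ of f is defined as

$$ \phi(x) = f(x) - q(x) \tag{1} $$

where $q: \mathbb{R}^n \to \mathbb{R}$ is some perturbation function.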

In the classical αBB approach, a series of simplifications are made to yield an efficient convexification procedure. The first of these simplifications is the imposition of a quadratic structure on the perturbation function:

$$ q(x) := \sum_{i=1}^{n} \alpha_i\,(x_i - \underline{x}_i)(\overline{x}_i - x_i). \tag{2} $$


To ensure that q(x) is nonnegative on $\mathbf{x}$, α is assumed to be nonnegative. Observe that q(x), a quadratic function with a diagonal Hessian matrix $\nabla^2 q(x) = -2\,\mathrm{diag}(\alpha)$, has an eigenvalue-eigenvector structure that is uniform over the entire domain $\mathbf{x}$, with eigenvectors aligned with the coordinate axes. In the work of Adjiman et al. [2], a second simplification is introduced in which the interval extension $H^{\mathbf{x}}$ is used instead of $\nabla^2 f(x)$ itself. The interval extension of the matrix $\nabla^2 f(x) \in \mathbb{R}^{n\times n}$ is a matrix of intervals of ℝ. Each element $H^{\mathbf{x}}_{ij}$ of the matrix $H^{\mathbf{x}}$ is defined in such a way that

$$ \frac{\partial^2 f}{\partial x_i\,\partial x_j}(x) \in H^{\mathbf{x}}_{ij}\quad \text{for all } x \in \mathbf{x}. $$


Computing the tightest possible interval extension is itself a global optimization problem. In practice, an interval extension can be calculated using interval arithmetic [12,14,16]. The overestimation made in the interval calculations may result in a significant loss of accuracy. Adjiman et al. [2] applied the work of [5,7,8,9,13,15,17,18] and devised various methods to compute α vectors that guarantee the convexity of the underestimators. The tightness of the underestimators depends on the particular calculation method used. Extensive computational testing [1] showed that the method based on the scaled Gerschgorin theorem outperforms the other methods in practice.
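The scaled Gerschgorin rule can be sketched in Python. The formula below is the one we recall from the αBB literature: $\alpha_i = \max\{0, -\tfrac12(\underline{h}_{ii} - \sum_{j\ne i}|h|_{ij}\,d_j/d_i)\}$ with $|h|_{ij} = \max(|\underline{h}_{ij}|, |\overline{h}_{ij}|)$ and $d = \overline{x} - \underline{x}$. Treat it as an assumption to be checked against [2]. The interval Hessian in the usage lines is hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper)


def scaled_gerschgorin_alpha(H: Sequence[Sequence[Interval]],
                             lower: Sequence[float],
                             upper: Sequence[float]) -> List[float]:
    """alpha_i = max(0, -1/2 (h_ii_lower - sum_{j != i} |h|_ij * d_j / d_i)).

    |h|_ij = max(|h_ij_lower|, |h_ij_upper|) and d = upper - lower.
    With q as in (2), the Hessian of phi = f - q is Hess f + 2 diag(alpha).
    """
    n = len(H)
    d = [u - l for l, u in zip(lower, upper)]
    alpha = []
    for i in range(n):
        radius = sum(max(abs(H[i][j][0]), abs(H[i][j][1])) * d[j] / d[i]
                     for j in range(n) if j != i)
        alpha.append(max(0.0, -0.5 * (H[i][i][0] - radius)))
    return alpha


H = [[(-2.0, 1.0), (-1.0, 0.5)],
     [(-1.0, 0.5), (3.0, 4.0)]]
print(scaled_gerschgorin_alpha(H, [0.0, 0.0], [1.0, 1.0]))  # [1.5, 0.0]
```

In one dimension the rule collapses to $\alpha = \max\{0, -\tfrac12 \min f''\}$, the value used for the classical underestimator in the Lennard-Jones example.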

In the work of Meyer and Floudas [11], the form of the αBB perturbation function and the way in which it is calculated are reexamined, a novel spline-based method for convex underestimation is proposed, and an efficient means of computing these tighter underestimators is described.


An α Spline Underestimator


The size of the domain $\mathbf{x}$ affects the result of every step in the α calculation and strongly influences the tightness of the resulting convex underestimator. In particular, reducing $\mathbf{x}$ reduces the mismatch between the assumed quadratic functional form and the ideal form; it reduces the overestimation in the interval extension of the Hessian matrix; and the maximum separation distance has been shown to be a quadratic function of interval length [10]. It is therefore useful to construct a convex underestimator using a number of different α vectors, each applying to a subregion of the full domain $\mathbf{x}$.
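For the single-α perturbation (2), the maximum separation between f and φ can be computed directly. The short derivation below is a standard αBB calculation, included as a sketch. It shows where the quadratic dependence on interval length comes from.

$$
\max_{x\in\mathbf{x}}\big(f(x)-\phi(x)\big) = \max_{x\in\mathbf{x}} q(x) = \sum_{i=1}^{n}\alpha_i \max_{x_i\in[\underline{x}_i,\overline{x}_i]}(x_i-\underline{x}_i)(\overline{x}_i-x_i) = \frac14\sum_{i=1}^{n}\alpha_i\,(\overline{x}_i-\underline{x}_i)^2,
$$

The maximum is attained at the midpoint of the box. For fixed α, halving every interval therefore divides this separation by four, before any reduction of the α values themselves.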

Let $f(x): \mathbb{R}^n \to \mathbb{R}$ be a $C^2$-continuous function. For each variable $x_i \in \mathbb{R}$, let the interval $[\underline{x}_i, \overline{x}_i]$ be partitioned into $N_i$ subintervals. The endpoints of these subintervals are denoted $x_i^0, x_i^1, \dots, x_i^{N_i}$, where $\underline{x}_i = x_i^0 < x_i^1 < \dots < x_i^k < \dots < x_i^{N_i} = \overline{x}_i$. In this notation, the kth interval is $[x_i^{k-1}, x_i^k]$. A smooth convex underestimator of f(x) over $\mathbf{x}$ is defined by (1). The new perturbation function q(x) is

$$ q(x) = \sum_{i=1}^{n} q_i^k(x_i)\quad \text{if } x_i \in [x_i^{k-1}, x_i^k], \tag{3} $$

$$ q_i^k(x_i) = \alpha_i^k\,(x_i - x_i^{k-1})(x_i^k - x_i) + \beta_i^k x_i + \gamma_i^k. \tag{4} $$

In each interval $[x_i^{k-1}, x_i^k]$, $\alpha_i^k \ge 0$ is chosen such that $\nabla^2\phi(x)$, the Hessian matrix of φ(x), is positive semidefinite for all members of the set $\{x \in \mathbf{x} : x_i \in [x_i^{k-1}, x_i^k]\}$. $q_i^k(x_i)$ is the quadratic function associated with variable i in interval k. The function q(x) is a piecewise quadratic function constructed from the functions $q_i^k(x_i)$.

The continuity and smoothness properties of q(x) are produced in a spline-like manner. For q(x) to be smooth, the $q_i^k$ functions and their gradients must match at the endpoints $x_i^k$. In addition, we require that q(x) = 0 at the vertices of the hyperrectangle $\mathbf{x}$. To satisfy these requirements, the following conditions are imposed for all i = 1, ..., n:

$$
\begin{aligned}
& q_i^1(x_i^0) = 0, \\
& q_i^k(x_i^k) = q_i^{k+1}(x_i^k), && k = 1,\dots,N_i-1, \\
& q_i^{N_i}(x_i^{N_i}) = 0, \\
& \left.\frac{dq_i^k}{dx_i}\right|_{x_i^k} = \left.\frac{dq_i^{k+1}}{dx_i}\right|_{x_i^k}, && k = 1,\dots,N_i-1.
\end{aligned} \tag{5}
$$
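Expanding these equations for each i = 1, ..., n, one obtains the following system of equations:

$$
\begin{aligned}
& \beta_i^1 x_i^0 + \gamma_i^1 = 0, \\
& \beta_i^k x_i^k + \gamma_i^k = \beta_i^{k+1} x_i^k + \gamma_i^{k+1}, && k = 1,\dots,N_i-1, \\
& \beta_i^{N_i} x_i^{N_i} + \gamma_i^{N_i} = 0, \\
& -\alpha_i^k (x_i^k - x_i^{k-1}) + \beta_i^k = \alpha_i^{k+1}(x_i^{k+1} - x_i^k) + \beta_i^{k+1}, && k = 1,\dots,N_i-1,
\end{aligned} \tag{6}
$$

which can be represented as follows (for each i, with the subscript i dropped and $N = N_i$):

$$
\begin{pmatrix}
x^0 & & & & 1 & & & \\
x^1 & -x^1 & & & 1 & -1 & & \\
 & \ddots & \ddots & & & \ddots & \ddots & \\
 & & x^{N-1} & -x^{N-1} & & & 1 & -1 \\
 & & & x^N & & & & 1 \\
-1 & 1 & & & & & & \\
 & \ddots & \ddots & & & & & \\
 & & -1 & 1 & & & &
\end{pmatrix}
\begin{pmatrix} \beta^1 \\ \vdots \\ \beta^N \\ \gamma^1 \\ \vdots \\ \gamma^N \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ s^1 \\ \vdots \\ s^{N-1} \end{pmatrix}
\tag{7}
$$

where $s_i^k = -\alpha_i^k (x_i^k - x_i^{k-1}) - \alpha_i^{k+1}(x_i^{k+1} - x_i^k)$.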

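The solution of the above linear system of equations is:

$$
\begin{aligned}
\beta_i^1 &= -\frac{1}{x_i^{N_i} - x_i^0}\sum_{j=1}^{N_i-1} s_i^j\,(x_i^{N_i} - x_i^j), \\
\beta_i^k &= \beta_i^1 + \sum_{j=1}^{k-1} s_i^j, && k = 2,\dots,N_i, \\
\gamma_i^k &= -\beta_i^1 x_i^0 - \sum_{j=1}^{k-1} s_i^j x_i^j, && k = 1,\dots,N_i.
\end{aligned} \tag{8}
$$

For a rigorous proof of the continuity, smoothness, convexity and underestimation properties of the underestimator φ(x), see [11].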
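Formula (8) is straightforward to implement for one variable. The Python sketch below computes $\beta^k, \gamma^k$ from the breakpoints and α values and evaluates the resulting piecewise quadratic (4). The function names are ours. The usage lines take the two-interval Lennard-Jones breakpoints and α values listed in Table 1 of the example later in this entry. Recomputing them reproduces that table's β and γ columns to the printed precision.

```python
from typing import List, Sequence, Tuple


def spline_coefficients(x: Sequence[float], alpha: Sequence[float]
                        ) -> Tuple[List[float], List[float]]:
    """beta^k, gamma^k (k = 1..N) of the alpha spline for a single variable.

    x holds the breakpoints x^0 < ... < x^N and alpha[k-1] is alpha^k.
    Implements (8): s^k couples neighbouring intervals, beta^1 enforces
    q(x^N) = 0, and running sums give the remaining beta^k and gamma^k.
    """
    n = len(alpha)
    if len(x) != n + 1:
        raise ValueError("need N + 1 breakpoints for N alpha values")
    # s[k - 1] stores s^k for k = 1..N-1
    s = [-alpha[k - 1] * (x[k] - x[k - 1]) - alpha[k] * (x[k + 1] - x[k])
         for k in range(1, n)]
    beta1 = -sum(s[k - 1] * (x[n] - x[k]) for k in range(1, n)) / (x[n] - x[0])
    beta, gamma = [beta1], [-beta1 * x[0]]
    for k in range(1, n):
        beta.append(beta[-1] + s[k - 1])            # beta^{k+1} = beta^k + s^k
        gamma.append(gamma[-1] - s[k - 1] * x[k])   # gamma^{k+1} = gamma^k - s^k x^k
    return beta, gamma


def q_spline(t: float, x: Sequence[float], alpha: Sequence[float],
             beta: Sequence[float], gamma: Sequence[float]) -> float:
    """Evaluate the piecewise quadratic perturbation (4) at t in [x^0, x^N]."""
    k = 1
    while k < len(x) - 1 and t > x[k]:
        k += 1
    a, lo, hi = alpha[k - 1], x[k - 1], x[k]
    return a * (t - lo) * (hi - t) + beta[k - 1] * t + gamma[k - 1]


# Two-interval Lennard-Jones data (breakpoints and alphas as in Table 1).
xs = [0.85, 1.425, 2.0]
alphas = [3.73905, 1.92231]
beta, gamma = spline_coefficients(xs, alphas)
```

As a check of the boundary conditions in (5), `q_spline` returns (numerically) zero at both end points $x^0$ and $x^N$.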


Nonconcave Perturbation 


Consider a function f(x) which is convex in one subdomain and concave in another. In the α spline approach, φ(x) can be convex even if the α values are negative in the regions in which f(x) is strictly convex. In the classical αBB underestimator, the underestimation property is guaranteed by the concavity of q(x), as given in (2). The concavity of q(x) is, in turn, a result of the nonnegativity of the α values. In this section, we discuss how the underestimation property of φ(x) can be maintained when some α values are negative.

The underestimation property, φ(x) ≤ f(x) for all $x \in \mathbf{x}$, is ensured by the following condition:

$$ \min_{x\in\mathbf{x}} q(x) \ge 0 \tag{9} $$


Instead of solving minimization problems, the key idea is to adjust the α's to prevent the creation of local minima at any nonvertex point in $\mathbf{x}$ by prohibiting the occurrence of stationary points on convex regions of the perturbation function. This is illustrated in Fig. 1. In Fig. 1a, a concave perturbation function is depicted; since it vanishes at the endpoints, its nonnegativity follows from its concavity. In Fig. 1b, a perturbation function is shown which is convex over the domain marked with a bold line. The point x* is a stationary point of q in this convex region, and we note that q(x*) is negative. In Fig. 1c, the perturbation function is again convex over the marked region, but there is no stationary point in this region. This function is nonnegative over the entire domain $[\underline{x}, \overline{x}]$.

Using this idea, Meyer and Floudas [11] derived a tight convex underestimator by starting with q(x) with nonnegative α values, as defined in Sect. “An α Spline Underestimator”, and making the zero α's negative, one at a time, while maintaining the convexity of φ(x) and avoiding the generation of stationary points on the convex portions of q(x). For the rest of this section we will assume that $f: \mathbf{x} \to \mathbb{R}$ is a univariate function, $\mathbf{x} = [\underline{x}, \overline{x}] \subset \mathbb{R}$. The separable structure of the α spline function allows the techniques developed here to be applied to the multivariate case.

Global Optimization: p-αBB Approach, Figure 1
(a) Concave, (b) nonconcave, and (c) nonnegative nonconcave perturbation functions

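Note that the β and γ parameters defining q(x) are functions of the α's and the endpoints $x^0, \dots, x^N$. The following formula, derived from (8), is an expression for $\beta^k$ in terms of $\alpha^1, \dots, \alpha^N$:

$$
\beta^k = \frac{1}{x^N - x^0}\left[-\sum_{j=1}^{k-1}\Big(\alpha^j (x^j - x^{j-1}) + \alpha^{j+1}(x^{j+1} - x^j)\Big)(x^j - x^0) + \sum_{j=k}^{N-1}\Big(\alpha^j (x^j - x^{j-1}) + \alpha^{j+1}(x^{j+1} - x^j)\Big)(x^N - x^j)\right] \tag{10}
$$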

Suppose that, having calculated $\beta \in \mathbb{R}^N$ for some given $\alpha \in \mathbb{R}^N$, we wish to modify some element $\alpha^j$. Meyer and Floudas [11] derived formulae that may be used to update the β's following such an α update. Under the substitution $\alpha^j \to \tilde\alpha^j$, the elements $\tilde\beta^1, \dots, \tilde\beta^N$ that satisfy (8) may be expressed in terms of $\beta^1, \dots, \beta^N$, α and $\tilde\alpha$ using the following update formulae:

$$ \tilde\beta^k - \beta^k = -\frac{(\tilde\alpha^j - \alpha^j)(x^j - x^{j-1})(x^{j-1} + x^j - 2x^0)}{x^N - x^0}\quad \text{if } j < k, \tag{11} $$

$$ \tilde\beta^k - \beta^k = -\frac{(\tilde\alpha^j - \alpha^j)(x^j - x^{j-1})(x^{j-1} + x^j - x^0 - x^N)}{x^N - x^0}\quad \text{if } j = k, \tag{12} $$

$$ \tilde\beta^k - \beta^k = -\frac{(\tilde\alpha^j - \alpha^j)(x^j - x^{j-1})(x^{j-1} + x^j - 2x^N)}{x^N - x^0}\quad \text{if } j > k. \tag{13} $$


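A stationary point x* of the function $q: \mathbb{R} \to \mathbb{R}$ is one that satisfies

$$ \left.\frac{dq}{dx}\right|_{x^*} = 0 \iff \alpha^k (x^k + x^{k-1} - 2x^*) + \beta^k = 0 $$

for some interval with $x^* \in [x^{k-1}, x^k]$. It follows that an interval k with $\alpha^k \neq 0$ contains no stationary point if either $\tfrac12(x^k + x^{k-1} + \beta^k/\alpha^k) > x^k$ or $\tfrac12(x^k + x^{k-1} + \beta^k/\alpha^k) < x^{k-1}$.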
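The stationary-point test is cheap to apply interval by interval. Here is a minimal Python sketch; the function name is ours, and the usage values are the two-interval Lennard-Jones parameters from Table 1. In the first interval α > 0, so q is concave there and its stationary point is a maximum, which the construction allows. The lemmas below concern intervals with negative α.

```python
def has_stationary_point(alpha_k: float, beta_k: float,
                         x_lo: float, x_hi: float) -> bool:
    """True if q'(t) = alpha_k (x_hi + x_lo - 2 t) + beta_k vanishes on [x_lo, x_hi]."""
    if alpha_k == 0.0:
        # q is linear on this interval: it is stationary only if it is constant.
        return beta_k == 0.0
    t_star = 0.5 * (x_hi + x_lo + beta_k / alpha_k)
    return x_lo <= t_star <= x_hi


# Table 1 data: interval 1 = [0.85, 1.425], interval 2 = [1.425, 2.0].
print(has_stationary_point(3.73905, 1.62764, 0.85, 1.425))   # True  (t* ~ 1.355)
print(has_stationary_point(1.92231, -1.62764, 1.425, 2.0))   # False (t* ~ 1.289)
```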

Meyer and Floudas [11] derived conditions on a! 
that guarantee the absence of such stationary points. 
Their results are summarized in the following three 
Lemmas, which correspond to cases j < k, j = k and 
j > k, respectively. 


Lemma 1 Consider two intervals $[x^{j-1}, x^j]$ and $[x^{k-1}, x^k]$, where j < k. Let the sequence of α values defining q(x) be $\{\alpha^1, \dots, \alpha^j, \dots, \alpha^k, \dots, \alpha^N\}$, where $\alpha^k < 0$. Let $\tilde q(x)$ be the function defined by the sequence of α values $\{\alpha^1, \dots, \tilde\alpha^j, \dots, \alpha^k, \dots, \alpha^N\}$, where $\tilde\alpha^j < 0$. There exists no stationary point of $\tilde q(x)$ on the interval $[x^{k-1}, x^k]$ if either of the following bounds on $\tilde\alpha^j$ holds:

$$ \tilde\alpha^j > \frac{(x^N - x^0)\big(-\alpha^k (x^k - x^{k-1}) + \beta^k\big) + \alpha^j (x^j - x^{j-1})(x^j + x^{j-1} - 2x^0)}{(x^j - x^{j-1})(x^j + x^{j-1} - 2x^0)} $$

or

$$ \tilde\alpha^j < \frac{(x^N - x^0)\big(\alpha^k (x^k - x^{k-1}) + \beta^k\big) + \alpha^j (x^j - x^{j-1})(x^j + x^{j-1} - 2x^0)}{(x^j - x^{j-1})(x^j + x^{j-1} - 2x^0)}. $$
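Lemma 2 Consider an interval $[x^{k-1}, x^k]$. Let $\{\alpha^1, \alpha^2, \dots, \alpha^N\}$ be the sequence of α values determining q(x). Let $\tilde q(x)$ be the function defined by the sequence of α values $\{\alpha^1, \dots, \tilde\alpha^k, \dots, \alpha^N\}$, where $\tilde\alpha^k < 0$. A stationary point of $\tilde q(x)$ does not exist on the interval $[x^{k-1}, x^k]$ if either of the following conditions holds:

$$
\begin{cases}
\tilde\alpha^k > \dfrac{\zeta}{(x^k - x^{k-1})(x^k + x^{k-1} - 2x^0)} & \text{if } \zeta < 0,\\[10pt]
\tilde\alpha^k > \dfrac{\zeta}{(x^k - x^{k-1})(x^k + x^{k-1} - 2x^N)} & \text{if } \zeta > 0,
\end{cases} \tag{14}
$$

where

$$ \zeta = \beta^k (x^N - x^0) + \alpha^k (x^k - x^{k-1})(x^k + x^{k-1} - x^0 - x^N). $$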
Lemma 3 Consider two intervals $[x^{j-1}, x^j]$ and $[x^{k-1}, x^k]$, where j > k. Let $\alpha^k < 0$, and let $\{\alpha^1, \dots, \alpha^k, \dots, \alpha^j, \dots, \alpha^N\}$ be the sequence of α values determining q(x). Let $\tilde q(x)$ be the function defined by the sequence of α values $\{\alpha^1, \dots, \alpha^k, \dots, \tilde\alpha^j, \dots, \alpha^N\}$, where $\tilde\alpha^j < 0$. A stationary point of $\tilde q(x)$ does not exist on the interval $[x^{k-1}, x^k]$ if either of the following bounds on $\tilde\alpha^j$ holds:

$$ \tilde\alpha^j < \frac{(x^N - x^0)\big(-\alpha^k (x^k - x^{k-1}) + \beta^k\big) + \alpha^j (x^j - x^{j-1})(x^j + x^{j-1} - 2x^N)}{(x^j - x^{j-1})(x^j + x^{j-1} - 2x^N)} $$

or

$$ \tilde\alpha^j > \frac{(x^N - x^0)\big(\alpha^k (x^k - x^{k-1}) + \beta^k\big) + \alpha^j (x^j - x^{j-1})(x^j + x^{j-1} - 2x^N)}{(x^j - x^{j-1})(x^j + x^{j-1} - 2x^N)}. $$


When q(x) is concave on a set of intervals and is guaranteed to have no stationary point on the remaining intervals, q(x) is monotonically nondecreasing between $x^0$ and a global maximum x* and monotonically nonincreasing between x* and $x^N$. Since $q(x^0) = q(x^N) = 0$, under these conditions the perturbation function q(x) is nonnegative everywhere and, thus, φ(x) is a valid underestimator of f(x) [11].


Illustrative Example 


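As an illustration, we present here an example from Meyer and Floudas [11]. It involves the well-known Lennard-Jones potential energy function

$$ f(x) = \frac{1}{x^{12}} - \frac{2}{x^{6}} $$

in the interval $[\underline{x}, \overline{x}] = [0.85, 2.00]$. The first term of this function is convex and dominates when x is small, while the second term is concave and dominates when x is large. The minimum eigenvalue of this function (the minimum of f'') on an interval $[\underline{x}, \overline{x}]$ can be calculated explicitly, since f'' decreases up to x = 1.21707 and increases beyond it:

$$
\min f'' =
\begin{cases}
\dfrac{156}{\overline{x}^{14}} - \dfrac{84}{\overline{x}^{8}} & \text{if } \overline{x} \le 1.21707,\\[8pt]
-7.47810 & \text{if } 1.21707 \in [\underline{x}, \overline{x}],\\[8pt]
\dfrac{156}{\underline{x}^{14}} - \dfrac{84}{\underline{x}^{8}} & \text{if } \underline{x} \ge 1.21707.
\end{cases}
$$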
Global Optimization: p-αBB Approach, Figure 2
Lennard-Jones convex underestimators with (a) concave and (b) nonconcave perturbations


Global Optimization: p-αBB Approach, Table 1
Parameters for 2 subinterval perturbation for Lennard-Jones function

k | x^k | min f'' | α^k | β^k | γ^k
0 | 0.850 | | | |
1 | 1.425 | −7.47810 | 3.73905 | 1.62764 | −1.38349
2 | 2.000 | −3.84462 | 1.92231 | −1.62764 | 3.25528

Global Optimization: p-αBB Approach, Table 2
Parameters for 16 subinterval perturbation of Lennard-Jones function

k | x^k | min f'' | α^k | β^k | γ^k
0 | 0.850000 | | | |
1 | 0.921875 | 326.18127 | 0.00000 | 1.78326 | −1.51577
2 | 0.993750 | 81.99112 | 0.00000 | 1.78326 | −1.51577
3 | 1.065625 | 13.55346 | 0.00000 | 1.78326 | −1.51577
4 | 1.137500 | −4.27629 | 2.13815 | 1.62958 | −1.35200
5 | 1.209375 | −7.46047 | 3.73024 | 1.20779 | −0.87222
6 | 1.281250 | −7.47810 | 3.73905 | 0.67093 | −0.22296
7 | 1.353125 | −6.71098 | 3.35549 | 0.16101 | 0.43038
8 | 1.425000 | −5.21291 | 2.60645 | −0.26750 | 1.01021
9 | 1.496875 | −3.84462 | 1.92231 | −0.59301 | 1.47405
10 | 1.568750 | −2.78248 | 1.39124 | −0.83117 | 1.83055
11 | 1.640625 | −2.00473 | 1.00236 | −1.00321 | 2.10044
12 | 1.712500 | −1.44791 | 0.72395 | −1.12729 | 2.30401
13 | 1.784375 | −1.05201 | 0.52600 | −1.21713 | 2.45786
14 | 1.856250 | −0.77029 | 0.38515 | −1.28262 | 2.57472
15 | 1.928125 | −0.56887 | 0.28443 | −1.33074 | 2.66405
16 | 2.000000 | −0.42385 | 0.21192 | −1.36642 | 2.73284

Global Optimization: p-αBB Approach, Table 3
Parameters defining nonconcave perturbations for the Lennard-Jones potential

k | x^k | min f'' | α^k | β^k | γ^k
0 | 0.850000 | | | |
1 | 0.921875 | 326.18127 | … | … | …
2 | 0.993750 | 81.99112 | … | … | …
3 | 1.065625 | 13.55346 | … | … | …
4 | 1.137500 | −4.27629 | … | … | …
5 | 1.209375 | −7.46047 | … | … | …
6 | 1.281250 | −7.47810 | … | … | …
7 | 1.353125 | −6.71098 | … | … | …
8 | 1.425000 | −5.21291 | … | … | …
9 | 1.496875 | −3.84462 | … | … | …
10 | 1.568750 | −2.78248 | … | … | …
11 | 1.640625 | −2.00473 | … | … | …
12 | 1.712500 | −1.44791 | … | … | …
13 | 1.784375 | −1.05201 | 0.52600 | … | …
14 | 1.856250 | −0.77029 | 0.38515 | … | …
15 | 1.928125 | −0.56887 | 0.28443 | −1.07909 | 2.16074
16 | 2.000000 | −0.42385 | 0.21192 | −1.11476 | 2.22952


The classical αBB underestimator for this function and interval is $f(x) - \alpha(x - \underline{x})(\overline{x} - x)$, with $\alpha = -\tfrac12 \min f'' = 3.73905$ over the whole interval. Bisecting the domain and applying (8), we obtain a convex underestimator defined by the parameters in Table 1.

Partitioning the domain into 16 equal-sized subintervals and applying (8), we obtain the convex underestimator φ(x) with the parameters defining q(x) given in Table 2.

The potential energy function, the classical αBB underestimator, and the φ(x) underestimators are shown in Fig. 2a. In this figure, the α spline underestimator based on 2 subregions is denoted $\phi^2$, while that based on 16 subregions is denoted $\phi^{16}$.

Figure 2b depicts the strengthening of an underestimating function through the use of nonconcave perturbations. A negative α value has been assigned to two of the three regions in which the second derivative is strictly positive, as shown in Table 3. The resulting underestimator is denoted $\phi^-(x)$, while $\phi^+(x)$ denotes the underestimator with no negative α's (the same as $\phi^{16}$ in Fig. 2a).
The prediction of the behavior of fluid mixtures is
a fundamental aspect of chemical process engineering. 
The physico-chemical problem of computing solutions 
to the phase and chemical equilibrium problem is cen- 
tral to the design, control and operation of many im- 
portant processes. These include distillation (standard 
and azeotropic), extraction trains, petroleum reservoirs 
and applications involving gases at high pressure. The 
ubiquity of the flash calculation in chemical engineer- 
ing is just one example of its prevalence. Because the 
properties of many fluids vary in a complex fashion, 
the thermodynamic models that have arisen to describe 
their behavior create some difficulties in their applica- 
tion. These challenges will be explored in this article. 


Problem Statement 


The equilibrium condition is characterized by an extremum of some thermodynamic function. Most commonly, the focus is on systems that attain equilibrium states under conditions of constant pressure (P) and temperature (T), where the global minimum of the Gibbs free energy describes the true equilibrium state. The problem may be stated as follows:


Given C components participating in up to P po- 
tential phases under isothermal and isobaric con- 
ditions find the number of phases and the distri- 
bution of components in those phases that yield 
the global minimum of the Gibbs free energy. 


The requisite linear material balance constraints must also be satisfied. In what follows, all quantities associated with the Gibbs free energy are treated as dimensionless by dividing by RT, where R is the universal gas constant. The total Gibbs free energy is the sum, over all phases, of the number of mols in each phase times its molar Gibbs free energy:

$$ G = \sum_{k\in\mathcal{P}} n^k g^k = \sum_{k\in\mathcal{P}} G^k, $$

where $n^k$ is the total number of mols present in phase k, and $g^k$ and $G^k$ are, respectively, the molar and total Gibbs free energy of phase k. The composition variables can be defined intensively in terms of mol fractions ($x = \{x_i^k\}$) or extensively as the number of mols of component i in phase k ($n = \{n_i^k\}$). It is easy to move from one form to the other via the relation $n_i^k = n^k x_i^k$. $g^k$ is naturally expressed with intensive variables, while extensive variables are appropriate for $G^k$.


Thermodynamic Models 


The thermodynamic models available to predict fluid phase behavior typically lead to expressions for the molar Gibbs functions that are mathematically complex, nonlinear and nonconvex. In this section, the analysis is presented for the molar Gibbs function.


Liquid Phases 


Many liquid phases are only partially miscible (referred to as phase splitting). Nonideality is often expressed through the employment of excess functions, which attempt to correlate the deviation of the system from ideality. The excess Gibbs free energy is simply the amount by which the Gibbs free energy differs from that of an ideal solution:

$$g^E(x) = g(x) - g^I(x)$$

with

$$g^I(x) = \sum_{i \in C} x_i \mu_i^0 + \sum_{i \in C} x_i \ln x_i,$$

where $\mu_i^0$ is the chemical potential of pure component $i$ referred to the standard state. $g^I(x)$ is convex. A number of different expressions of increasing complexity are now summarized for the excess Gibbs functions. The only variables are the mol fractions $x_i$; all other quantities are parameters particular to the thermodynamic model. References to these equations and their parameters can be found in [21].


The Wilson Equation 


Because the molar Gibbs free energy is convex in this 
case, this equation is the only model described here that 
cannot be used to predict phase splitting. 


$$g^E(x) = -\sum_{i \in C} x_i \ln\Big(\sum_{j \in C} \Lambda_{ij} x_j\Big)$$


Regular Solutions


This equation is bilinear:

$$g^E(x) = \sum_{i \in C} \sum_{j \in C} A_{ij} x_i x_j$$
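The bilinear form makes the connection to phase splitting easy to check. For a binary with the assumed parameterization $A_{11} = A_{22} = 0$, $A_{12} = A_{21} = a/2$, one gets $g^E = a x_1 x_2$ and $d^2 g/dx_1^2 = 1/x_1 + 1/x_2 - 2a$, so the convexity of $g^I$ is lost, and the mixture can split, once $a > 2$. The minimal sketch below (hypothetical function names) detects this numerically from a second difference of $g$; the $\mu_i^0$ term is omitted because a linear term cannot change convexity.

```python
import numpy as np

def regular_excess(x, A):
    """Regular-solution excess Gibbs energy g^E(x) = sum_i sum_j A_ij x_i x_j."""
    x = np.asarray(x, dtype=float)
    return float(x @ np.asarray(A, dtype=float) @ x)

def binary_gibbs(x1, A):
    """Ideal mixing term plus g^E for a binary (linear mu_i^0 term omitted)."""
    x = np.array([x1, 1.0 - x1])
    return float(np.sum(x * np.log(x))) + regular_excess(x, A)

def is_nonconvex(A, num=999, h=1e-4):
    """True if a central second difference of g is negative somewhere on a grid,
    i.e. the molar Gibbs curve is nonconvex and phase splitting is possible."""
    for x1 in np.linspace(0.01, 0.99, num):
        d2 = (binary_gibbs(x1 + h, A) - 2.0 * binary_gibbs(x1, A) + binary_gibbs(x1 - h, A)) / h**2
        if d2 < 0.0:
            return True
    return False
```

With $a = 3$ (`A = [[0, 1.5], [1.5, 0]]`) the check reports nonconvexity; with $a = 1$ it does not.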


The NRTL Equation 


This widely used model consists of a summation of bilinear fractional terms:

$$g^E(x) = \sum_{i \in C} x_i \frac{\sum_{j \in C} \tau_{ji} G_{ji} x_j}{\sum_{j \in C} G_{ji} x_j}$$
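The NRTL expression vectorizes naturally. The sketch below is illustrative only: it assumes the usual NRTL relation $G_{ji} = \exp(-\alpha_{ji}\tau_{ji})$ with $\tau_{ii} = 0$ (not stated in this article), stores $\tau_{ji}$ as `tau[j, i]`, and uses a hypothetical function name.

```python
import numpy as np

def nrtl_excess(x, tau, alpha):
    """Dimensionless NRTL excess Gibbs energy g^E(x).

    tau[j, i], alpha[j, i]: interaction parameters with tau[i, i] = 0;
    G[j, i] = exp(-alpha[j, i] * tau[j, i]) (usual NRTL definition, assumed here).
    """
    x = np.asarray(x, dtype=float)
    tau = np.asarray(tau, dtype=float)
    G = np.exp(-np.asarray(alpha, dtype=float) * tau)
    numer = (tau * G).T @ x     # numer[i] = sum_j tau[j, i] G[j, i] x[j]
    denom = G.T @ x             # denom[i] = sum_j G[j, i] x[j]
    return float(np.sum(x * numer / denom))
```

For a binary this reduces to the familiar $g^E = x_1 x_2\,[\tau_{21}G_{21}/(x_1 + x_2 G_{21}) + \tau_{12}G_{12}/(x_2 + x_1 G_{12})]$, and it vanishes for either pure component.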


The next three models are nonconvex in form. They 
are grouped together because it has been shown in [16] 
how they can be transformed into the difference of two 
convex functions (d.c. form), allowing the application 
of standard branch and bound global optimization al- 
gorithms. 


The UNIQUAC Equation 


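The excess Gibbs function is composed of a residual part and a combinatorial part, the latter denoted $g^C(x)$ and defined as:

$$g^C(x) = \sum_{i \in C} x_i \ln\frac{r_i}{\sum_{j \in C} r_j x_j} + \frac{z}{2}\sum_{i \in C} q_i x_i \ln\frac{q_i \sum_{j \in C} r_j x_j}{r_i \sum_{j \in C} q_j x_j},$$

where $z$ is the lattice coordination number. The excess Gibbs function is then given as:

$$g^E(x) = g^C(x) + \sum_{i \in C} q_i x_i \ln\frac{\sum_{j \in C} q_j x_j}{\sum_{j \in C} q_j x_j \tau_{ji}}.$$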


The next two models represent the behavior of molecules in mixtures by aggregating the properties of constituent functional groups (represented by the index set $G = \{g\} = \{l\}$).


The UNIFAC Equation 


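The combinatorial part is the same as for the UNIQUAC equation:

$$g^E(x) = g^C(x) + \sum_{i \in C} x_i \sum_{g \in G} \nu_{gi} Q_g \left[\ln\frac{\sum_{l \in G} Q_l \nu_{li} \Psi_{lg}}{\sum_{l \in G} Q_l \nu_{li}} - \ln\frac{\sum_{j \in C}\sum_{l \in G} Q_l \nu_{lj} x_j \Psi_{lg}}{\sum_{j \in C}\sum_{l \in G} Q_l \nu_{lj} x_j}\right]$$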


The ASOG Equation 
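$$g^E(x) = \sum_{i \in C} x_i \ln\frac{r_i}{\sum_{j \in C} r_j x_j} - \sum_{i \in C} x_i \sum_{g \in G} \nu_{gi}\left[\ln\Big(\sum_{l \in G} X_l a_{gl}\Big) - \ln\Big(\sum_{l \in G} X_l^{(i)} a_{gl}\Big)\right],$$

where $X_l = \sum_{j \in C} \nu_{lj} x_j \big/ \sum_{j \in C}\sum_{m \in G} \nu_{mj} x_j$ is the fraction of group $l$ in the mixture and $X_l^{(i)}$ is its value in pure component $i$.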

Of all the above models, the NRTL, UNIQUAC and UNIFAC equations are currently the most commonly used. Notice that some of the correlations are of high mathematical complexity. While this is necessary in order to predict multiple liquid phases, it can lead to problems where extraneous and erroneous additional phases are predicted. An example is given in [19], where the NRTL equation mathematically predicts three liquid phases when the physical mixture has only two phases.


Vapor Phases 


Deviation from ideality in vapor phases is often expressed through the use of fugacity coefficients:

$$g(x, z) - g^I(x) = \ln \phi(x, z),$$

where $\phi(x, z)$ is the fugacity coefficient of the mixture. The standard state is usually assumed to be an ideal gas at T and unit fugacity. The compressibility $z = Pv/RT$ measures deviation from the ideal gas law, and an expression for it is required to calculate $\phi(x, z)$. For an ideal gas, $z = 1$; otherwise, $z$ is often obtained from an equation of state (EOS), which correlates the temperature, pressure, volume and composition of nonideal mixtures. This equation of state then becomes an additional constraint (typically nonlinear and nonconvex) that must be obeyed over all compositions. One possible generalized equation of state can be written in its standard form as:

$$\frac{z}{z - B} - \frac{A z}{(z - \alpha B)(z + \beta B)} - z = 0, \qquad (1)$$

$$A = \sum_{i \in C}\sum_{j \in C} A_{ij} x_i x_j, \qquad B = \sum_{i \in C} B_i x_i, \qquad (2)$$

where $\alpha$ and $\beta$ are constants that depend on the equation of state employed. The more important equations of state include the van der Waals ($\alpha = \beta = 0$), Soave-Redlich-Kwong ($\alpha = 0$, $\beta = 1$), and Peng-Robinson ($\alpha = \sqrt{2} - 1$, $\beta = \sqrt{2} + 1$) equations. See [26] for a thorough review. Note that (1) is composed of the sum of a linear fractional and a bilinear fractional function, and that (2) defines A as a bilinear function. This means that when an equation of state is used, an additional level of complexity is added to the problem in the form of nonconvex and nonlinear constraints.

As is demonstrated in several standard texts [26], the overall mixture fugacity coefficient can be obtained using (1) as:

$$\ln \phi(x, z) = (z - 1) - \ln(z - B) - \frac{A}{(\alpha + \beta) B} \ln\frac{z + \beta B}{z - \alpha B}$$

(for the van der Waals case, $\alpha + \beta = 0$, the last term is replaced by its limit $A/z$). This function is highly nonlinear and nonconvex, consisting of a bilinear fractional function (A/B) multiplying the logarithm of a linear fractional function.
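Given A and B from (2), the compressibility and fugacity coefficient can be evaluated numerically. The sketch below (hypothetical function names) clears the denominators in (1), which gives the cubic $z^3 + [(\beta-\alpha)B - (1+B)]z^2 + [A - \alpha\beta B^2 - (1+B)(\beta-\alpha)B]z + (1+B)\alpha\beta B^2 - AB = 0$, keeps the real roots with $z > B$, and evaluates $\ln\phi$ from the expression above. It assumes A and B are supplied directly as dimensionless mixture values and that $\alpha + \beta \neq 0$.

```python
import numpy as np

# Peng-Robinson constants in the sign convention (z - alpha B)(z + beta B) used in (1)
ALPHA_PR, BETA_PR = np.sqrt(2.0) - 1.0, np.sqrt(2.0) + 1.0

def eos_roots(A, B, alpha, beta):
    """Real roots z > B of the cubic obtained from
    z/(z - B) - A z/((z - alpha B)(z + beta B)) - z = 0."""
    c2 = (beta - alpha) * B - (1.0 + B)
    c1 = A - alpha * beta * B**2 - (1.0 + B) * (beta - alpha) * B
    c0 = (1.0 + B) * alpha * beta * B**2 - A * B
    roots = np.roots([1.0, c2, c1, c0])
    real = roots[np.abs(roots.imag) < 1e-10].real
    return np.sort(real[real > B])

def ln_fugacity_coefficient(z, A, B, alpha, beta):
    """ln phi = (z - 1) - ln(z - B) - A/((alpha + beta) B) ln((z + beta B)/(z - alpha B))."""
    return (z - 1.0) - np.log(z - B) - A / ((alpha + beta) * B) * np.log(
        (z + beta * B) / (z - alpha * B))
```

For a vapor phase the largest admissible root is normally taken; for small A and B the results approach the ideal-gas values $z = 1$ and $\ln\phi = 0$.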


Obtaining Equilibrium Solutions 


Here the global minimum of the total Gibbs function is sought subject to the material balance constraints. Because the total Gibbs function is used, extensive variables are appropriate. Following [23], assume there are phase classes $\pi \in \Pi$, each characterized by a separate thermodynamic model; $\pi_{EOS}$ represents the phase class where an EOS is used. Before solving the problem, $P_\pi$, the number of phases of phase class $\pi$, must be selected; the corresponding set of phases is $\mathcal{P}_\pi$, and $\mathcal{P} = \bigcup_\pi \mathcal{P}_\pi$. The solution will then yield $P_\pi^* \le P_\pi$, where $P_\pi^*$ is the number of phases of class $\pi$ present in nonzero amounts at equilibrium. Consider a potential LLV mixture: if the NRTL equation is used to model two liquid phases and the Peng-Robinson equation a single vapor phase, then $\pi_1$ = NRTL, $P_{\pi_1} = 2$, and $\pi_2$ = PR, $P_{\pi_2} = 1$. If the actual physical mixture at equilibrium is calculated as LV, then $P_{\pi_1}^* = P_{\pi_2}^* = 1$. The phase rule [26] gives an upper bound on the number of possible phases. The optimization formulation can now be written as:

$$\min_{n \in N} \; G = \sum_{k \in \mathcal{P}} G^k \qquad \text{s.t.} \quad \mathrm{EOS}^k = 0, \quad \forall k \in \mathcal{P}_{\pi_{EOS}}, \qquad (G)$$

where

$$N = \Big\{ n : \sum_{k \in \mathcal{P}} n_i^k = n_i^T, \ \forall i \in C; \ \ n_i^k \ge 0, \ \forall i \in C, \ k \in \mathcal{P} \Big\}.$$

Here, $n_i^T$ is the total number of moles of component $i$ in the mixture. Note that the equation of state in (G), comprising (1) and (2), is assumed to be written in extensive form.
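For a binary with two candidate liquid phases, (G) has only two free variables once the material balances are substituted, so it can be illustrated by exhaustive enumeration. The sketch below does this on a grid, assuming the regular-solution model $g = x_1\ln x_1 + x_2\ln x_2 + a x_1 x_2$ with $\mu_i^0 = 0$; it is a brute-force illustration with hypothetical names, not one of the algorithms cited in this article, and it does not scale beyond a handful of variables.

```python
import numpy as np

def g_regular(x1, a):
    """Molar Gibbs energy of a binary regular solution (mu_i^0 = 0)."""
    x2 = 1.0 - x1
    return x1 * np.log(x1) + x2 * np.log(x2) + a * x1 * x2

def brute_force_two_phase(n_total, a, num=400, eps=1e-3):
    """Grid search over the moles n_i^1 placed in phase 1; phase 2 receives the rest."""
    n11, n21 = np.meshgrid(np.linspace(eps, n_total[0] - eps, num),
                           np.linspace(eps, n_total[1] - eps, num), indexing="ij")
    n12, n22 = n_total[0] - n11, n_total[1] - n21      # material balances
    N1, N2 = n11 + n21, n12 + n22                      # n^1 and n^2
    G = N1 * g_regular(n11 / N1, a) + N2 * g_regular(n12 / N2, a)
    idx = np.unravel_index(np.argmin(G), G.shape)
    return G[idx], n11[idx] / N1[idx], n12[idx] / N2[idx]   # G and x_1 in each phase
```

With $a = 3$ and an equimolar feed of one mole of each component, the grid minimum places the two phases near $x_1 \approx 0.07$ and $x_1 \approx 0.93$, well below the single-phase value of G.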


Equation Based Approaches 


Even though (G) is naturally expressed as an optimization problem, equation based approaches are by far the most prevalent due to their use in commercial chemical process simulators. The first order necessary optimality conditions of (G) reduce to a set of nonlinear equations, corresponding to the condition of equality of chemical potentials ($\mu$):

$$\mu_i^k = \mu_i^{k'}, \quad \forall i \in C, \ \forall k, k' \in \mathcal{P}. \qquad (3)$$

All chemical engineering undergraduates encounter the direct iteration K-value method for solving (3), known as the single stage flash calculation. A general description is supplied by [12]. The inside-out algorithm of [2] is especially prominent because it outperforms other methods. Because these equations stem from a nonconvex problem, there may be several solutions which satisfy them, and these methods are prone to failure, especially at conditions close to the critical point (which is called the plait point for liquid phases).
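The inner step of the direct-iteration K-value method can be sketched as follows. For fixed K-values $K_i = y_i/x_i$, the vapor fraction V of a two-phase flash satisfies the Rachford-Rice equation $\sum_i z_i (K_i - 1)/(1 + V(K_i - 1)) = 0$, a standard result not derived in this article. The outer loop, which updates the $K_i$ from the thermodynamic model until (3) holds, is omitted: here the K-values are simply assumed constant, and the function name is hypothetical.

```python
def rachford_rice(z, K, tol=1e-12, max_iter=200):
    """Vapor fraction V and phase compositions for fixed K-values, solved by bisection."""
    def f(V):
        return sum(zi * (Ki - 1.0) / (1.0 + V * (Ki - 1.0)) for zi, Ki in zip(z, K))

    if f(0.0) <= 0.0:
        V = 0.0                      # at or below the bubble point: liquid only
    elif f(1.0) >= 0.0:
        V = 1.0                      # at or above the dew point: vapor only
    else:
        lo, hi = 0.0, 1.0
        for _ in range(max_iter):
            mid = 0.5 * (lo + hi)
            if f(mid) > 0.0:
                lo = mid             # f decreases monotonically with V: root lies above mid
            else:
                hi = mid
            if hi - lo < tol:
                break
        V = 0.5 * (lo + hi)
    x = [zi / (1.0 + V * (Ki - 1.0)) for zi, Ki in zip(z, K)]   # liquid mol fractions
    y = [Ki * xi for xi, Ki in zip(x, K)]                        # vapor mol fractions
    return V, x, y
```

For example, an equimolar feed with $K = (2, 0.5)$ gives $V = 0.5$, $x = (1/3, 2/3)$ and $y = (2/3, 1/3)$.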


Local Optimization 


Given the problems associated with the equation based approaches, various attempts have been made to solve (G) using local optimization. A steepest descent method, known as the RAND method, was used in [27]. Various methods were compared to an implementation of Wolfe's quadratic programming algorithm in [5]. A variable projection method was used in [3]. Several other variants of Newton based methods have been employed (see [17] for a brief summary). None of these methods (typically Newton or quasi-Newton algorithms) removes the possibility of converging to a local optimum or to a trivial solution (a saddle point where the mol fractions in two phases of the same class $\pi$ are the same), and all are highly dependent on the starting point. A major problem is that $P_\pi$ is unknown and must be guessed; therefore, an incorrect number of phases $P_\pi^*$ is easily obtained with these methods. Another key problem in these approaches is the development of numerical singularities when phases coalesce or split as the algorithm progresses [22].


Global Optimization


The above facts motivate the employment of global op- 
timization techniques because if an approach can be 
guaranteed to obtain the global optimum solution of 
(G), then a sufficiency condition for phase equilibrium 
is automatically supplied. The first use of global opti- 
mization to solve (G) was undertaken in [17], where it 
was shown how the GOP algorithm [4] could be used in 
cases where the NRTL equation was used to model liq- 
uid phase behavior. New variables were introduced so 
that the formulation of (G) would consist of a biconvex 
objective function and a bilinear constraint set, satisfy- 
ing the requirements of the GOP to guarantee global 
optimality. For the UNIQUAC equation, a branch and 
bound global optimization algorithm described in [10] 
was implemented to determine the global minimum of 
(G) [15]. A key aspect of the work in [17] was the math- 
ematical transformation of the nonconvex expressions 
for the Gibbs free energies into forms with special structure, namely the difference of two convex functions (d.c. form). Similar transformations and this same algorithm can also be applied to the UNIFAC, ASOG and modified Wilson equations, as shown in [16]. These were the
first approaches to guarantee convergence to the global 
solution of (G), regardless of the supplied initial point. 


Verifying Equilibrium Solutions 


The tangent plane criterion provides an alternative sufficiency condition for a candidate equilibrium solution to correspond to a global minimum of the Gibbs free energy [7]. A candidate solution must satisfy the necessary condition for equilibrium, that is, satisfy (3). Stability requires that the tangent hyperplane constructed using the chemical potential values of the candidate solution (denoted $\mu^*$) at no point lies above the molar Gibbs surface for all phase classes used to model the mixture. Stated in optimization terms, if the global minimum of the tangent plane distance function, $D^\pi$, for each phase class $\pi$ used to represent the behavior of the mixture, is nonnegative $\forall \pi$, then the candidate solution corresponds to a global minimum of the Gibbs free energy [1]. The phase stability problem is defined for a phase class $\pi$ as:

$$\min_{x \in X} \; D^\pi = g^\pi(x) - \sum_{i \in C} x_i \mu_i^* \qquad \text{s.t.} \quad \mathrm{EOS}(x, z) = 0 \ \text{ if } \pi = \pi_{EOS}, \qquad (S)$$

where $X = \{x : \sum_{i \in C} x_i = 1, \ x_i \ge 0, \ \forall i\}$. Clearly, (1) and (2) are required for (S) when $\pi = \pi_{EOS}$. $g^\pi$ is obtained from the appropriate thermodynamic models described earlier. The approach therefore involves verifying that a candidate solution is the equilibrium one.
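For a binary, X is one-dimensional, so the criterion can be illustrated by evaluating $D^\pi$ on a fine grid. The sketch assumes a binary regular solution with $\mu_i^0 = 0$, for which $\mu_1 = \ln x_1 + a x_2^2$ and $\mu_2 = \ln x_2 + a x_1^2$; the names are hypothetical, and a grid search carries no rigorous guarantee, it merely shows what (S) measures.

```python
import numpy as np

def g_regular(x1, a):
    """Molar Gibbs energy of a binary regular solution (mu_i^0 = 0)."""
    x2 = 1.0 - x1
    return x1 * np.log(x1) + x2 * np.log(x2) + a * x1 * x2

def mu_regular(x1, a):
    """Chemical potentials mu_1 = ln x1 + a x2^2 and mu_2 = ln x2 + a x1^2."""
    x2 = 1.0 - x1
    return np.log(x1) + a * x2**2, np.log(x2) + a * x1**2

def min_tangent_plane_distance(x1_candidate, a, num=10001):
    """Grid minimum of D(x) = g(x) - sum_i x_i mu_i^* for the candidate composition."""
    mu1, mu2 = mu_regular(x1_candidate, a)          # tangent plane of the candidate
    x1 = np.linspace(1e-6, 1.0 - 1e-6, num)
    D = g_regular(x1, a) - (x1 * mu1 + (1.0 - x1) * mu2)
    k = int(np.argmin(D))
    return D[k], x1[k]
```

For $a = 3$ the equimolar candidate gives a minimum $D \approx -0.115$ near $x_1 \approx 0.07$ (or its mirror image), so it is unstable; for $a = 1$ the minimum is zero, at the candidate itself.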


Equation Based Approaches 


As with (G), the first order necessary optimality conditions of (S) reduce to a set of nonlinear equations:

$$\mu_i^\pi(x) - \mu_i^* = K, \quad \forall i \in C, \qquad x \in X, \qquad (4)$$

where K is a constant and $\mu_i^\pi(x)$ is the chemical potential of component $i$ in a trial phase of class $\pi$ with composition $x$. The EOS must also be satisfied if $\pi = \pi_{EOS}$. If the tangent plane distance is nonnegative at every solution of this set of equations that is obtained, then the postulated solution is assumed to be stable. Standard direct iteration methods [20], as well as homotopy continuation methods [24], have been used to solve (4). However, the typical equation based approach cannot guarantee that all stationary points are obtained. An interval Newton method has been used in [11] to $\epsilon$-enclose all stationary points; this work can be considered a 'global' method for equation solving. It should be noted that a branch and bound global optimization algorithm [13] has been used to obtain all homogeneous azeotropes in mixtures [9]; because the condition of azeotropy adds a single linear constraint (equality of mol fractions in all phases) to (3), this approach can in principle be used to guarantee obtaining all $\epsilon$-global solutions to both (3) and (4).
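A minimal sketch of direct substitution for (4), assuming a binary regular solution ($\mu_i^0 = 0$, $\ln\gamma_1 = a x_2^2$, $\ln\gamma_2 = a x_1^2$). With $d_i = \mu_i^*$, the iteration $\ln W_i = d_i - \ln\gamma_i(x)$, $x = W/\sum W$ has fixed points satisfying $\ln x_i + \ln\gamma_i(x) - d_i = -\ln\sum W$ for every $i$, which is (4) with $K = -\ln\sum W$; at such a point $D^\pi = K$. This is one common form of direct iteration offered as an illustration (hypothetical names), not a reproduction of [20] or [24].

```python
import math

def ln_gamma(x1, a):
    """Regular-solution activity coefficients: ln gamma_1 = a x2^2, ln gamma_2 = a x1^2."""
    x2 = 1.0 - x1
    return a * x2 * x2, a * x1 * x1

def stationary_point(x1_candidate, a, x1_trial, tol=1e-12, max_iter=500):
    """Successive substitution for a stationary point of D; returns (x1, D)."""
    g1, g2 = ln_gamma(x1_candidate, a)
    d1 = math.log(x1_candidate) + g1               # mu_1^* of the candidate phase
    d2 = math.log(1.0 - x1_candidate) + g2         # mu_2^*
    x1 = x1_trial
    for _ in range(max_iter):
        l1, l2 = ln_gamma(x1, a)
        W1, W2 = math.exp(d1 - l1), math.exp(d2 - l2)   # ln W_i = d_i - ln gamma_i(x)
        x1_new = W1 / (W1 + W2)                          # x = W / sum(W)
        converged = abs(x1_new - x1) < tol
        x1 = x1_new
        if converged:
            break
    l1, l2 = ln_gamma(x1, a)
    # D = sum_i x_i (mu_i(x) - mu_i^*), which equals K in (4) at a fixed point
    D = x1 * (math.log(x1) + l1 - d1) + (1.0 - x1) * (math.log(1.0 - x1) + l2 - d2)
    return x1, D
```

Started from $x_1 = 0.99$ with $a = 3$ and the equimolar candidate, the iteration settles near $x_1 \approx 0.929$ with $D < 0$, flagging the candidate as unstable; with $a = 1$ it collapses onto the trivial solution $x = x^*$ with $D = 0$, which is why this kind of iteration cannot guarantee that all stationary points are found.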


Global Optimization 


The advantage of a global optimization approach is 
that if a nonnegative solution is found, then it can be 
definitively asserted that the candidate solution is the 
globally stable equilibrium one, unlike available local 
algorithms. It is shown in [18] how global optimiza- 
tion can be used to solve (S), using the GOP algorithm 
for the NRTL equation, and a branch and bound algo- 
rithm for the UNIQUAC equation. For the modified 
Wilson, ASOG and UNIFAC equations, it was shown 
in [16] how this same branch and bound algorithm 
could be used after transforming the expressions for 
g(x) into d.c. form. It has been shown how the formulations for (G) and (S) involving equations of state can be transformed into biconvex form, allowing the application of a number of global optimization algorithms [14], although no implementation was undertaken. An important recent extension of global optimization to the case of equations of state is supplied in [8], where the nonlinear terms are validly underestimated within the framework of a branch and bound algorithm.
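To make the d.c. idea concrete, here is a minimal best-first branch and bound sketch for (S) in the simplest possible setting: a binary regular solution with $\mu_i^0 = 0$. It is not the algorithm of [10] or [16] and the names are hypothetical. $D$ is split into a convex part minus $a x_1^2$; on each interval $[l, u]$ the concave term $-a x_1^2$ is replaced by its secant $-a[(l+u)x_1 - lu]$, which underestimates it, so the relaxation is convex and its minimizer is available in closed form. Intervals whose bound cannot improve the incumbent by more than a tolerance are discarded.

```python
import heapq
import math

def bb_min_tpd(x1_candidate, a, tol=1e-6, eps=1e-9):
    """epsilon-global minimum of the tangent plane distance D(x1) of a binary regular solution."""
    x2c = 1.0 - x1_candidate
    mu1 = math.log(x1_candidate) + a * x2c**2       # chemical potentials of the candidate
    mu2 = math.log(x2c) + a * x1_candidate**2

    def convex_part(x):
        # x ln x + (1-x) ln(1-x) + a x - x mu1 - (1-x) mu2, convex on (0, 1)
        return x * math.log(x) + (1 - x) * math.log(1 - x) + a * x - x * mu1 - (1 - x) * mu2

    def D(x):
        return convex_part(x) - a * x * x            # the concave part is -a x^2

    def relax(l, u):
        # Replace -a x^2 by its secant on [l, u]; the relaxation is convex and its
        # stationarity condition ln(x/(1-x)) = s gives the minimizer in closed form.
        s = a * (l + u) - a + mu1 - mu2
        x = min(max(1.0 / (1.0 + math.exp(-s)), l), u)
        return convex_part(x) - a * ((l + u) * x - l * u), x

    best_x = x1_candidate
    best = D(best_x)                                  # zero at the candidate itself
    heap = [(relax(eps, 1.0 - eps)[0], eps, 1.0 - eps)]
    while heap:
        lb, l, u = heapq.heappop(heap)
        if lb > best - tol:
            break                                     # best-first: no node can improve by tol
        m = 0.5 * (l + u)
        for l2, u2 in ((l, m), (m, u)):
            lb2, x = relax(l2, u2)
            val = D(x)                                # any feasible point is an upper bound
            if val < best:
                best, best_x = val, x
            if lb2 < best - tol:
                heapq.heappush(heap, (lb2, l2, u2))
    return best, best_x
```

For $a = 3$ and the equimolar candidate the bounds close at $D \approx -0.115$ near $x_1 \approx 0.07$ (or its mirror image), so the candidate is unstable; for $a = 1$ the procedure certifies, to within `tol`, that the minimum of $D$ is zero.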


Combining Approaches 


From the above development, it is apparent that: 
1) To obtain a candidate equilibrium solution, either 
solve (G) or (3); and 
2) To verify a candidate as the equilibrium solution, ei- 
ther solve (S) or (4). 

Approach 1) is problematic because the a priori selection of $P_\pi$ represents a formidable challenge. If too few phases are allowed, then convergence to constrained minima can occur; if too many are assumed, then numerical problems may arise, or convergence to trivial or local extrema may occur. Therefore, the concept of using the tangent plane criterion to provide initial guesses for (G) or (3) has been shown to greatly increase reliability with a tolerable increase in computational effort. In addition, when solving (G) or (3), the number of composition variables is $N_x = |C||\mathcal{P}|$, while for (S) or (4), $N_x = |C|$. The performance of the RAND method was found to improve considerably when combined with a phase-splitting algorithm [6]. The seminal work of M.L. Michelsen [20] proposed an iterative approach whereby the solution from the tangent plane criterion is used to initialize the search for the equilibrium solution. This is implemented using a direct substitution method (K-value approach) as well as an optimization method. The calculations are computationally efficient and reported to be quite reliable, although there is the danger of predicting a stable phase distribution when, in fact, this is not the case. In a comparative study for liquid-liquid phase splitting [25], Michelsen's method was found to be the most reliable. A similar iterative approach using homotopy continuation methods to solve (4) has also been used in [24]. However, there are a number of difficulties associated with these approaches. First, no guarantee of obtaining all stationary points can be provided. Second, since the solutions obtained from the stability problem are then used to initiate the search for a solution with a lower Gibbs free energy, these guesses may lead to local optima, or even infeasible equilibrium solutions. Therefore, no guarantees can be made of having obtained the equilibrium solution, even though overall reliability is significantly increased.


Global Optimization 


When solving (G) using global optimization, the maximum allowable number of phases $P_\pi$, $\forall \pi$, must be considered for rigorous determination of phase and chemical equilibrium. This leads to high computational effort, even though the global solution is often generated early in the global optimization search [19]. For these reasons,
an algorithm known as GLOPEQ (global optimization 
for the phase and chemical equilibrium problem) was 
implemented in [19]. An iterative approach was pro- 
posed based on the fact that solving (S) to global op- 
timality to verify a candidate solution is vastly prefer- 
able to solving (G). GLOPEQ therefore leads to signifi- 
cant computational savings over other global optimiza- 
tion approaches. It should be noted that the approach 
described in [8] can be incorporated into GLOPEQ, 
extending its applicability and giving the first global 
optimization method for both nonideal liquid and va- 
por phases. The key difference between GLOPEQ and 
the other local iterative approaches is that global opti- 
mization is used at each step of the algorithm, allowing 
a guarantee to be made of obtaining the true equilib- 
rium solution no matter the starting point. 


Reaction Equilibria 


If reaction occurs in the mixture, then the permissible 
regions N and X must be adjusted. See [23] for an ele- 
gant analysis of the tangent plane criterion for reacting 
mixtures. Note that N and X remain linear and they do 
not affect the global optimization approach for solving 
(G) or (S). 


General Comments on Efficiency 


Clearly local approaches, while less reliable, are more efficient than global optimization approaches. Because of the relatively heavy computational burden of global optimization, these approaches are more justified for off-line analysis, as they could not be practically used in a chemical process simulator. Having said that, computational times of seconds for highly nonideal mixtures of up to eight components [8] provide a great deal of promise for improving the robustness of the equilibrium calculation without resulting in excessive solution times.


See also


Introduction


Multi-layered dielectric structures are relevant in many 
applications that seek to influence electromagnetic ra- 
diation across the infrared, optical and X-ray spec- 
tra. Anti-reflection coatings, components for integrated 
optics and semiconductor lasers are based on multi- 
layered dielectric designs; they are generally modeled 
using the transfer matrix method that has been in 
widespread use for the past thirty years [5,26]. In 
many cases optical designs can be devised by deduc- 
tive reasoning, but, as design objectives have become 
more elaborate, robust numerical optimization techniques have become increasingly relevant. Baumeister reported the first refinement technique for multilayer dielectric designs in 1958 [3].

The synthesis of multi-layered dielectric structure designs requires a robust global optimization approach. The mathematical model that describes the optical properties of these structures is highly non-linear and presents any solver with the task of sifting through countless local minima. Early approaches relied on stochastic global methods. The lack of deterministic methods in the literature highlights how challenging it is to construct convex approximations that underestimate the objective. As far as the authors know, the only deterministic approach proposed to date is limited in scope due to approximations that are made to derive model equations such that the problem has a unique solution.

This encyclopedia entry examines the problem of 
multilayer dielectric design, which has been treated 
with a range of algorithms over the past 20 years. 
Stochastic approaches are reviewed including Simu- 
lated Annealing, Genetic Algorithms and a Multi-Level 
approach. A deterministic minimization approach is 
also discussed. The study may be considered a review 
and critical comparison of techniques for electromag- 
netic filter design. 


Formulation 
Statement of Physical Problem 


Multilayered dielectric structures have two modes of operation: in passive mode, a structure reflects or transmits light from an external source as a function of the input wavelength and direction; in active mode, a structure creates light internally and distributes the emission both spectrally and spatially. Figure 1 illustrates these two modes of operation. Here, $a_i^{(\mu)+}(\kappa, z_i)$ and $a_i^{(\mu)-}(\kappa, z_i)$ are the forward and backward propagating amplitudes in Region i, respectively. The superscript in brackets ($\mu \in \{s, p\}$) indicates the polarization, which is described as either Transverse Electric (s) or Transverse Magnetic (p).

In the passive geometry, amplitudes are equated with real measurable quantities: $|a_1^+(\kappa, z_1)|^2 = 1$, $|a_1^-(\kappa, z_1)|^2 = R(\kappa)$, $|a_N^+(\kappa, z_N)|^2 = T(\kappa)$ and $|a_N^-(\kappa, z_N)|^2 = 0$, where $R(\kappa)$ and $T(\kappa)$ are the reflectivity and transmissivity of the structure, respectively. In active mode, $|a_1^+(\kappa, z_1)|^2 = 0$, $|a_1^-(\kappa, z_1)|^2 = P_b(\kappa)$, $|a_N^+(\kappa, z_N)|^2 = P_f(\kappa)$ and $|a_N^-(\kappa, z_N)|^2 = 0$, where $P_f$ and $P_b$ are the forward and backward emission powers. The source amplitudes, $a_j^+(\kappa, z_j) = A^+(\kappa)$ and $a_j^-(\kappa, z_j) = A^-(\kappa)$, depend on the type of emission source and will not be elaborated upon here. For more information on these issues, the reader should refer to [10] and [4].
In both operation modes, the structure interacts with light as a function of its wavelength, $\lambda$, and the optical angle parameter, $\kappa = n_i \sin\theta_i$, which is related to the propagation angle, $\theta_i$, in region i. Note that $\kappa$ is invariant throughout the structure, unlike $\theta_i$, which varies with the refractive index, $n_i$, due to Snell's law ($n_i \sin\theta_i$ = constant).

Global Optimization of Planar Multilayered Dielectric Structures, Figure 1
Schematic of a multilayer dielectric structure highlighting the nomenclature...

The following analysis considers only the passive mode of operation, although the approach is readily adaptable to problems involving the active mode. In passive mode, $R(\kappa)$ and $T(\kappa)$ are usually part of some expression to be minimized. The design variables to be optimized are the refractive indices, $n_i$, and layer thicknesses, $d_i$, throughout the structure. The optimization problem is posed as follows: a single-valued objective function $F\{R(\kappa; n, d), T(\kappa; n, d)\}$ involving the reflectivity and transmissivity must be minimized with respect to the unknown variables $n = \{n_i\}$ and $d = \{d_i\}$, where $i \in \{1, \ldots, N\}$. The problem is typically bounded, defining a variable space of finite extent: for example, the unknown variables here are constrained by upper bounds, $n^U$, $d^U$, and lower bounds, $n^L$, $d^L$. This is summarized as

$$\begin{aligned} \min_{n,\, d} \quad & F\{R(\kappa; n, d),\, T(\kappa; n, d)\} \\ \text{s.t.} \quad & n - n^U \le 0 \\ & n^L - n \le 0 \\ & d - d^U \le 0 \\ & d^L - d \le 0. \end{aligned} \qquad (1)$$
The use of the transfer matrix method to describe the propagation of light through multi-layer planar dielectric materials is well-established [5,26]. Oulton and Adjiman [28] present an alternative and more compact representation highlighting the mathematical details of the model and symmetries that are useful for writing efficient code and deriving compact analytical gradients for local optimization.

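Consider the schematic for a general multilayered structure in Fig. 1. The transfer matrix, $T_{ji}^{(\mu)}(\kappa)$, relates the electromagnetic field amplitudes in regions i and j at $z_i$ as follows,

$$\begin{pmatrix} a_j^{(\mu)+}(\kappa, z_i) \\ a_j^{(\mu)-}(\kappa, z_i) \end{pmatrix} = T_{ji}^{(\mu)}(\kappa) \begin{pmatrix} a_i^{(\mu)+}(\kappa, z_i) \\ a_i^{(\mu)-}(\kappa, z_i) \end{pmatrix}, \qquad T_{ji}^{(\mu)}(\kappa) = \begin{pmatrix} \chi_{ji}^{(\mu)+}(\kappa) & \chi_{ji}^{(\mu)-}(\kappa) \\ \chi_{ji}^{(\mu)-}(\kappa) & \chi_{ji}^{(\mu)+}(\kappa) \end{pmatrix}, \qquad (2)$$

$$\chi_{ji}^{(\mu)\pm}(\kappa) = \frac{1}{2}\left(1 \pm C_{ji}^{(\mu)}(\kappa)\right). \qquad (3)$$

The coupling coefficients, $C_{ji}^{(\mu)}$, are

$$C_{ji}^{(s)} = \frac{\beta_i}{\beta_j}, \qquad C_{ji}^{(p)} = \frac{n_j^2\, \beta_i}{n_i^2\, \beta_j}. \qquad (4)$$

Here, $\beta_i = k_0\sqrt{n_i^2 - \kappa^2}$ is related to $\kappa$ and $\lambda$ through $k_0 = 2\pi/\lambda$, the wavenumber of the incident light. $\beta_i$ is the component of the wavenumber normal to the planar layers and is sometimes referred to as the propagation constant. Note that $T_{ji}^{(\mu)}(\kappa)$ is symmetric with only two independent elements.

To describe propagation across region j, of thickness $d_j = z_j - z_i$, the amplitudes $a_j^+(\kappa, z_j)$ and $a_j^-(\kappa, z_j)$ at $z_j$ are related to the amplitudes $a_j^+(\kappa, z_i)$ and $a_j^-(\kappa, z_i)$ at $z_i$ by the transfer matrix, $P_j(\kappa)$, which is independent of polarization for isotropic materials:

$$P_j(\kappa) = \begin{pmatrix} e^{i\beta_j d_j} & 0 \\ 0 & e^{-i\beta_j d_j} \end{pmatrix}. \qquad (5)$$

In order to relate the fields at the interfaces between regions i + 2 and i + 1 at $z_{i+1}$ and regions i + 1 and i at $z_i$, interface and propagation matrices are multiplied together such that,

$$\begin{pmatrix} a_{i+1}^{(\mu)+}(\kappa, z_{i+1}) \\ a_{i+1}^{(\mu)-}(\kappa, z_{i+1}) \end{pmatrix} = M_{i+1,i}^{(\mu)}(\kappa) \begin{pmatrix} a_i^{(\mu)+}(\kappa, z_i) \\ a_i^{(\mu)-}(\kappa, z_i) \end{pmatrix}, \qquad (6)$$

where $M_{i+1,i}^{(\mu)}(\kappa) = P_{i+1}(\kappa)\, T_{i+1,i}^{(\mu)}(\kappa)$. In the general formulation, the amplitudes can be expressed as a vector of plane wave modes corresponding to the various angles of propagation within the planar dielectric medium. When considering N values of $\kappa$ (angles of incidence), the transfer matrix, $M_{i+1,i}^{(\mu)}$, will be a $2N \times 2N$ matrix. Notice, however, that due to Snell's law and the law of reflection $M_{i+1,i}^{(\mu)}$ is sparse, with only 4N components along the diagonals of each quadrant of $M_{i+1,i}^{(\mu)}$. Therefore there are only 2N independent components. From here on, the parameter $\kappa$ will be dropped from the mathematical expressions for brevity.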
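The construction of Eqs. (2)-(4) and (6) can be sketched for a single value of $\kappa$, so every matrix is $2 \times 2$. Two assumptions are labeled in the code: the propagation matrix is taken as $P_j = \mathrm{diag}(e^{i\beta_j d_j}, e^{-i\beta_j d_j})$, and the reflectivity follows from the passive boundary conditions $a_1^+ = 1$, $a_N^- = 0$, which give $a_1^- = -M_{21}/M_{22}$ for the overall matrix $M$. The function names are hypothetical.

```python
import numpy as np

def beta(n, kappa, k0):
    """Normal wavenumber component beta = k0 sqrt(n^2 - kappa^2) (complex if evanescent)."""
    return k0 * np.sqrt(complex(n * n - kappa * kappa))

def interface(n_i, n_j, kappa, k0, pol):
    """T_ji from Eqs. (2)-(4) for the interface from region i to region j."""
    b_i, b_j = beta(n_i, kappa, k0), beta(n_j, kappa, k0)
    C = b_i / b_j if pol == "s" else (n_j**2 * b_i) / (n_i**2 * b_j)
    chi_p, chi_m = 0.5 * (1 + C), 0.5 * (1 - C)
    return np.array([[chi_p, chi_m], [chi_m, chi_p]])

def propagation(n, d, kappa, k0):
    """Assumed propagation matrix P = diag(exp(i beta d), exp(-i beta d))."""
    phi = beta(n, kappa, k0) * d
    return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])

def reflectivity(n, d, wavelength, kappa=0.0, pol="s"):
    """R for regions n[0] (incident), n[1:-1] (layers, thicknesses d), n[-1] (substrate)."""
    k0 = 2 * np.pi / wavelength
    M = interface(n[0], n[1], kappa, k0, pol)
    for j in range(1, len(n) - 1):
        # propagate through layer j, then cross into region j + 1 (Eq. (6))
        M = interface(n[j], n[j + 1], kappa, k0, pol) @ propagation(n[j], d[j - 1], kappa, k0) @ M
    r = -M[1, 0] / M[1, 1]            # a_1^+ = 1 and a_N^- = 0
    return abs(r) ** 2
```

An uncoated $n = 4$ substrate in air gives $R = 0.36$ for either polarization at normal incidence, and a quarter-wave layer of index $2 = \sqrt{4}$ suppresses reflection at the design wavelength.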


Analytical Gradients for Effective Optimization 


The efficiency and accuracy of local optimization can 
be enhanced by using analytically determined gradi- 
ents. Methods for determining the gradients of trans- 
fer matrices have been presented in the literature [29], 
but here the new formalism allows a compact analyti- 
cal evaluation which leads to simplified coding and ef- 
ficient strategies for calculating large numbers of gradi- 
ents for one structure. 

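Consider the derivative of the standard mode matching matrix equation with respect to the variable $\zeta_i$ for an optical structure with N layers:

$$\frac{\partial}{\partial \zeta_i}\begin{pmatrix} a_N^{(\mu)+}(z_N) \\ a_N^{(\mu)-}(z_N) \end{pmatrix} = \frac{\partial M_{N,1}^{(\mu)}}{\partial \zeta_i}\begin{pmatrix} a_1^{(\mu)+}(z_1) \\ a_1^{(\mu)-}(z_1) \end{pmatrix} + M_{N,1}^{(\mu)}\, \frac{\partial}{\partial \zeta_i}\begin{pmatrix} a_1^{(\mu)+}(z_1) \\ a_1^{(\mu)-}(z_1) \end{pmatrix}. \qquad (7)$$

Given any two constant boundary condition amplitudes, the matrix equation can be solved for the derivatives of the unknown amplitudes. Consider now the derivative of the matrix with respect to the variables of a given layer.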

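The derivative with respect to $d_i$ is the easiest to evaluate, as only one phase matrix, $P_i$, needs to be differentiated. The matrix derivative is:

$$\frac{\partial M_{N,1}^{(\mu)}}{\partial d_i} = M_{N,i}^{(\mu)}\, dM_i^{(d)}\, M_{i,1}^{(\mu)}, \qquad (8)$$

where $M_{N,1}^{(\mu)} = M_{N,i}^{(\mu)} M_{i,1}^{(\mu)}$, with $M_{i,1}^{(\mu)}$ including propagation through layer i, and $dM_i^{(d)} = (\partial P_i/\partial d_i)\, P_i^{-1} = i\beta_i\, \mathrm{diag}(1, -1)$ for $P_i = \mathrm{diag}(e^{i\beta_i d_i}, e^{-i\beta_i d_i})$.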


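Differentiation with respect to the refractive index, $n_i$, is much more complicated, as it involves the product of three matrices and must be evaluated using the Leibniz rule. In the current representation, transfer matrix symmetries can be exploited to give a concise form as in Eq. (8), which can be written for the two polarizations as follows:

$$\frac{\partial M_{N,1}^{(\mu)}}{\partial n_i} = M_{N,i}^{(\mu)}\, dM_i^{(\mu)}\, M_{i,1}^{(\mu)}, \qquad (9)$$

where

$$dM_i^{(s)} = \frac{n_i k_0^2}{\beta_i^2}\begin{pmatrix} i\beta_i d_i & \frac{1}{2}\left(e^{2i\beta_i d_i} - 1\right) \\ \frac{1}{2}\left(e^{-2i\beta_i d_i} - 1\right) & -i\beta_i d_i \end{pmatrix},$$

$$dM_i^{(p)} = \frac{k_0^2}{n_i \beta_i^2}\begin{pmatrix} n_i^2\, i\beta_i d_i & \frac{1}{2}\left(2\kappa^2 - n_i^2\right)\left(e^{2i\beta_i d_i} - 1\right) \\ \frac{1}{2}\left(2\kappa^2 - n_i^2\right)\left(e^{-2i\beta_i d_i} - 1\right) & -n_i^2\, i\beta_i d_i \end{pmatrix}. \qquad (10)$$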

These are extremely concise forms for the matrix derivatives of fairly complicated expressions, where each gradient only requires the evaluation of a supplementary transfer matrix, $dM_i^{(\mu)}$, and one additional matrix evaluation.

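The reflectivity, R, and transmissivity, T, involve the absolute square of the field amplitudes. Given the derivative of the amplitude, $a_j(z_j)$, the derivative of its absolute square is given by

$$\frac{d|a_j(z_j)|^2}{d\zeta_i} = \frac{d\big(a_j(z_j)\, a_j(z_j)^*\big)}{d\zeta_i} = 2\,\Re\{a_j(z_j)\}\,\Re\Big\{\frac{d a_j(z_j)}{d\zeta_i}\Big\} + 2\,\Im\{a_j(z_j)\}\,\Im\Big\{\frac{d a_j(z_j)}{d\zeta_i}\Big\}. \qquad (11)$$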
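As a sketch of how Eqs. (8)-(11) combine, the code below computes R together with $\partial R/\partial d_i$ and $\partial R/\partial n_i$ for an s-polarized wave at normal incidence ($\kappa = 0$, so $\beta_i = k_0 n_i$), splitting the overall matrix as $M_{N,1} = M_{N,i} M_{i,1}$. It assumes $P_i = \mathrm{diag}(e^{i\beta_i d_i}, e^{-i\beta_i d_i})$, takes $M_{i,1}$ to include propagation through layer i, and uses the passive boundary conditions $a_1^+ = 1$, $a_N^- = 0$. The function name is hypothetical; a central finite difference provides an independent check.

```python
import numpy as np

def r_and_gradients(n, d, wavelength, i):
    """R, dR/dd_i and dR/dn_i (layer i = 1..L), s-polarization, normal incidence.
    n = [incident, layer 1, ..., layer L, substrate]; d = [d_1, ..., d_L]."""
    k0 = 2 * np.pi / wavelength
    L = len(n) - 2

    def T(j):   # interface region j -> j+1; at kappa = 0, C = beta_j/beta_{j+1} = n_j/n_{j+1}
        C = n[j] / n[j + 1]
        return 0.5 * np.array([[1 + C, 1 - C], [1 - C, 1 + C]], dtype=complex)

    def P(j):   # assumed propagation matrix through layer j
        phi = k0 * n[j] * d[j - 1]
        return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])

    factors = [T(0)]                    # applied in the order T_0, P_1, T_1, ..., P_L, T_L
    for j in range(1, L + 1):
        factors += [P(j), T(j)]

    def chain(fs):
        M = np.eye(2, dtype=complex)
        for F in fs:
            M = F @ M                   # later factors multiply from the left
        return M

    M_i1 = chain(factors[:2 * i])       # region 1 up to and including propagation through layer i
    M_Ni = chain(factors[2 * i:])       # remaining interfaces and layers
    M = M_Ni @ M_i1

    beta = k0 * n[i]
    phi = beta * d[i - 1]
    dM_d = 1j * beta * np.diag([1.0, -1.0])                    # (dP_i/dd_i) P_i^{-1}
    dM_n = (n[i] * k0**2 / beta**2) * np.array(                # Eq. (10), s-polarization
        [[1j * phi, 0.5 * (np.exp(2j * phi) - 1)],
         [0.5 * (np.exp(-2j * phi) - 1), -1j * phi]])

    r = -M[1, 0] / M[1, 1]
    grads = []
    for dX in (dM_d, dM_n):
        dM = M_Ni @ dX @ M_i1                                  # Eqs. (8) and (9)
        dr = -(dM[1, 0] * M[1, 1] - M[1, 0] * dM[1, 1]) / M[1, 1] ** 2
        grads.append(2 * (r.real * dr.real + r.imag * dr.imag))   # Eq. (11)
    return abs(r) ** 2, grads[0], grads[1]
```

Because every gradient reuses the same partial products, the cost per design variable is essentially one extra pair of $2 \times 2$ multiplications, which is the saving the concise forms above are meant to deliver.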


Methods and Applications 


By the early 1990s, a range of optimization methods were being used to generate optical multi-layer filter designs. Dobrowolski and co-workers reviewed ten methods for computational speed and effectiveness at reaching an optimum solution, to determine which would be best suited to these highly non-linear problems [11]. Amongst these were both global and local methods but, at this time of limited computer power, no particular approach was deemed superior within the fixed calculation time of 2 hours. The authors rated a local gradient-based modified Gauss-Newton method highly for its consistency over the problems investigated. They also noted that the global optimization methods performed on a par with local methods, despite the clear limitation of the fixed calculation time.

Computing power today is not as great an is- 
sue and the use of global optimization techniques for 
multi-layered optical design has attracted a great deal 
of attention (see references throughout this Section). 
Global algorithms can be broadly divided into two 
categories: deterministic and stochastic. Deterministic 
methods guarantee a global solution, usually at the 
expense of calculation time, whereas stochastic meth- 
ods converge rapidly to solutions with only a prob- 
abilistic guarantee of global optimality in finite time. 
Liberti and Kucherenko investigated these contrast- 
ing philosophies by comparing the deterministic spa- 
tial Branch and Bound (sBB) and Stochastic Multi Level 
Single Linkage (MLSL) methods for a range of test func- 
tions [24]. The authors concluded that the stochas- 
tic method, in the cases studied, converged faster to 
a global optimum with a high degree of probability, but 
the deterministic method could perform better in cases 
where specific theoretical assumptions about a prob- 
lem’s analytical structure could be taken into account. 
In general, deterministic methods require preparation 
for a particular problem, whereas stochastic methods 
can be more readily adapted for black-box scenarios. 
Nevertheless, stochastic approaches cannot guarantee 
global optimality in finite time. 

In this section, a range of global optimization approaches are reviewed. Usefully, several authors studying these methods have examined the same numerical synthesis problem: the design of an anti-reflection coating to operate in the far infrared [1,6,11,25,28,36,44]. The objective is to design a multilayered structure of Germanium (Ge: refractive index $n_{Ge} = 4.2$) and Zinc Sulfide (ZnS: refractive index $n_{ZnS} = 2.2$) that achieves a normal incidence reflectivity $R(\kappa = 0) \to 0$ for $N_\lambda = 47$ equidistant wavelengths in the spectral band $7.7 \le \lambda \le 12.3$ μm. The incident medium is air, and the substrate on which the structure is built has refractive index $n_{sub} = 4$.

The objective function, $F(d, \lambda_i, R_i)$, was chosen to be the same as that used by authors in the literature to allow comparative studies:

$$F(d, \lambda_i, R_i) = \left(\frac{1}{N_\lambda}\sum_{i=1}^{N_\lambda} R_i^2\right)^{1/2} \qquad (2)$$


In the following studies, the optimum layer thicknesses 
for reproducing the best designs are omitted for brevity, 
so the reader should consider the relevant references for 
this information. 
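The benchmark merit function can be evaluated on the 47-point wavelength grid as sketched below. The sketch assumes that F is the root-mean-square of the reflectivities $R_i$ over the grid, uses a normal-incidence s-polarized transfer matrix in the form of Eqs. (2)-(6) with an assumed propagation matrix $\mathrm{diag}(e^{i\beta d}, e^{-i\beta d})$, and uses hypothetical names; thicknesses are in micrometres.

```python
import numpy as np

WAVELENGTHS = np.linspace(7.7, 12.3, 47)     # um: 47 equidistant points, spacing 0.1 um

def reflectivity_normal(n, d, wavelength):
    """Normal-incidence R of a stack n[0] | n[1:-1] (thicknesses d) | n[-1]."""
    k0 = 2 * np.pi / wavelength

    def T(j):                                  # interface region j -> j+1 (beta ratio = n ratio)
        C = n[j] / n[j + 1]
        return 0.5 * np.array([[1 + C, 1 - C], [1 - C, 1 + C]], dtype=complex)

    M = T(0)
    for j in range(1, len(n) - 1):
        phi = k0 * n[j] * d[j - 1]
        M = T(j) @ np.diag([np.exp(1j * phi), np.exp(-1j * phi)]) @ M
    return abs(M[1, 0] / M[1, 1]) ** 2         # a_1^+ = 1, a_N^- = 0

def merit(n, d, wavelengths=WAVELENGTHS):
    """Assumed RMS merit function: sqrt(mean of R_i^2) over the wavelength grid."""
    R = np.array([reflectivity_normal(n, d, lam) for lam in wavelengths])
    return float(np.sqrt(np.mean(R ** 2)))
```

An uncoated substrate gives F = 0.36, and a single quarter-wave layer of index 2 centred at 10 μm already lowers F well below that; the Ge/ZnS benchmark would instead alternate layers of index 4.2 and 2.2 on the $n_{sub} = 4$ substrate.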


Multi-Level Approaches 


In Multi-Level (ML) approaches (e.g., [20,21,23,24,28]), different starting points are generated by a higher-level algorithm, and the problem is solved from each starting point by a lower-level local optimization algorithm. This approach is very general because it requires no tuning. Oulton and Adjiman [28] applied it to multilayered dielectric device design, using a deterministic sampling of the parameter space and a local nonlinear programming (NLP) solver. The approach can rank many local solutions for post-optimization analysis. It is also non-adaptive at the global level, in that the algorithm depends only on the current state and not on previously calculated states. This brings two advantages. First, non-adaptive methods are deemed superior to adaptive ones in multi-processor applications, which is a clear benefit for computationally intensive global optimization problems. Second, non-adaptive algorithms offer freedom in the specification of convergence criteria. Since the optimization algorithm in [28] essentially operates by batch local optimization, it can be halted after a given number of global iterations or after a set time limit. Rigorous criteria are also applicable to the ML strategy [20,21,23]: as the algorithm progresses, the probability that the current best local solution is the global one increases, primarily because of the global search coverage guaranteed by an appropriate choice of sampling approach. For instance, Oulton and Adjiman used a Sobol’ sequence [33], a deterministic Low Discrepancy Sequence (LDS) that provides good coverage of the variable space. The Sobol’ LDS was selected because its construction is based on i) homogeneity as the number of sample points $n \to \infty$, ii) good distribution for small $n$, and iii) a fast computational algorithm. All these features, but particularly ii), make this LDS well suited to the current problem. A variety of LDSs are constructed on differing requirements, such as the Halton, Faure, Niederreiter and Sobol’ sequences, amongst others [7,18,27,33].
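The two-level structure can be sketched as follows. This is a minimal illustration only. A Sobol’ generator needs direction-number tables, so the sketch substitutes a Halton sequence, another of the LDSs named above. A simple compass (pattern) search stands in for the NLP solver of [28]. The function names (`halton`, `compass_search`, `multi_level`) and the deduplication tolerance are hypothetical.

```python
import math


def van_der_corput(k, base):
    """k-th term (k >= 1) of the van der Corput sequence in the given base."""
    q, denom = 0.0, 1.0
    while k > 0:
        k, rem = divmod(k, base)
        denom *= base
        q += rem / denom
    return q


def halton(n, bounds, primes=(2, 3, 5, 7, 11, 13)):
    """First n Halton points scaled into the box [(lo, hi), ...] (up to 6 dimensions)."""
    return [[lo + (hi - lo) * van_der_corput(k, primes[j])
             for j, (lo, hi) in enumerate(bounds)]
            for k in range(1, n + 1)]


def compass_search(f, x0, bounds, step=0.5, tol=1e-6, max_evals=2000):
    """Lower level: derivative-free local search; try +/- step on each axis, halve when stuck."""
    x, fx, evals = list(x0), f(x0), 1
    while step > tol and evals < max_evals:
        improved = False
        for j, (lo, hi) in enumerate(bounds):
            for s in (step, -step):
                y = list(x)
                y[j] = min(max(y[j] + s, lo), hi)
                fy = f(y)
                evals += 1
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2
    return x, fx


def multi_level(f, bounds, n_starts=20, dedup_tol=1e-3):
    """Upper level: deterministic low-discrepancy starts; returns distinct minima, best first."""
    minima = []
    for x0 in halton(n_starts, bounds):
        x, fx = compass_search(f, x0, bounds)
        if all(math.dist(x, m) > dedup_tol for m, _ in minima):
            minima.append((x, fx))
    return sorted(minima, key=lambda item: item[1])
```

Because the starting points do not depend on earlier results, the loop over `x0` could be distributed across processors unchanged, and the ranked list it returns is what a post-optimization analysis would work from.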


Simulated Annealing 


Simulated annealing (SA) has been applied to a variety of electromagnetic multilayer design problems in the infrared, ultraviolet and X-ray spectra [6,8,9,15,22,41,42]. SA [22] operates by randomly changing an initial design in small steps and accepting or rejecting each change according to criteria that become increasingly stringent as the algorithm progresses. Changes that result in a better design are always accepted. A worse design, on the other hand, is accepted with a probability given by a Boltzmann distribution, $\exp(-\Delta F / T)$, where $\Delta F > 0$ is the increase in the objective function and $T$ is the current temperature. The probability of acceptance is tuned by changing the Boltzmann temperature according to a user-specified schedule. Wider exploration of the variable space at the start of the optimization is achieved by setting a high temperature, which allows the algorithm to accept worse designs and thereby move between local regions of attraction. Lowering the temperature according to the cooling schedule progressively restricts the algorithm's ability to move into adjacent regions of attraction and eventually forces convergence to a local optimum. Convergence to the global optimum therefore depends on the initial design and, especially, on the cooling schedule.
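The acceptance rule and cooling schedule just described can be sketched as follows. The sketch assumes a geometric schedule $T \leftarrow \alpha T$, uniform random perturbations of one variable at a time, and tracking of the best design seen so far. The step size, initial temperature and cooling rate are arbitrary illustrative values, not those used in [6] or [22].

```python
import math
import random


def simulated_annealing(f, x0, step=0.5, t0=10.0, alpha=0.995, n_iter=3000, seed=0):
    """Minimize f over a list of floats using Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(n_iter):
        # Propose a small random change to one randomly chosen variable.
        y = list(x)
        j = rng.randrange(len(y))
        y[j] += rng.uniform(-step, step)
        fy = f(y)
        # Always accept improvements; accept worse designs with Boltzmann probability.
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= alpha  # cooling: worse moves become progressively less likely
    return best_x, best_f
```

A faster cooling rate (smaller `alpha`) shortens the exploratory phase, which is the sensitivity to the schedule noted above.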

The first reports on SA applied to multilayer design mainly highlighted the technique's ability to avoid local minima [41], although adaptations to avoid deep local minima were also reported [8]. These reports concerned structures for the visible to near-infrared spectra. The method has recently been used to design reflectors for the UV [15] and X-ray [9] spectra, with applications that include neutron optics, X-ray astrophysics and synchrotron radiation. In this region of the spectrum, matter interacts with electromagnetic radiation differently, which requires a modified transfer matrix description that accounts for surface roughness and interdiffusion (see [9] and references therein). Wu et al. have applied simulated annealing to a different optics problem involving diffraction gratings [42]. This important design problem concerns the efficient coupling of light into and out of waveguides and optical interconnects.

Boudet et al. [6] use SA to synthesize multilayer designs for the problem given in the introduction to this section. Results for 17-layer (triangles) and 20-layer (filled triangles) designs are shown in Fig. 2b, along with results generated using other approaches. The merits of the method are discussed in Sect. “A Comparison of Methods for an Infrared Filter Design”.


Genetic and Memetic Algorithms 


Evolutionary or Genetic Algorithms (GA) are the preferred method in the optics community [2,14,16,17,19,25,39,43,44,45,46,47]. GAs operate on the principle that the evolution of a random population of parameterizations, subject to iterative rules of reproduction and mutation, will converge to a region of attraction containing the global optimum [17]. Members of the population with high performance are given a greater likelihood of reproducing, thereby generating a better population than the one before. Mutation prevents the algorithm from converging too quickly and provides the mechanism by which the variable space can be explored more fully. Usually, local optimization is required to refine the final solution. In the Memetic algorithm, local optimization is performed on each new member of the population; this approach could therefore be considered a multi-level approach (see Sect. “Multi-Level Approaches”).
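The reproduction, mutation and (for the memetic variant) local-refinement steps can be sketched as follows. The sketch assumes real-coded parameters, tournament selection, arithmetic crossover, Gaussian mutation, two elite members copied unchanged into each generation, and a few greedy coordinate moves as the memetic local search. None of these operator choices is taken from the cited studies. Setting `local_steps=0` gives the plain GA.

```python
import random


def genetic_algorithm(f, bounds, pop_size=30, generations=60, p_mut=0.2,
                      sigma=0.3, local_steps=0, seed=1):
    """Real-coded GA minimizing f; local_steps > 0 turns it into a simple memetic algorithm."""
    rng = random.Random(seed)

    def clip(v, j):
        lo, hi = bounds[j]
        return min(max(v, lo), hi)

    def refine(x):
        # Memetic step: greedy coordinate moves with a shrinking step on each new member.
        fx, h = f(x), sigma
        for _ in range(local_steps):
            for j in range(len(x)):
                for s in (h, -h):
                    y = list(x)
                    y[j] = clip(y[j] + s, j)
                    fy = f(y)
                    if fy < fx:
                        x, fx = y, fy
            h /= 2
        return x

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=f)
        children = ranked[:2]  # elitism: the two best members survive unchanged
        while len(children) < pop_size:
            # Tournament selection: fitter members are more likely to reproduce.
            p1 = min(rng.sample(ranked, 3), key=f)
            p2 = min(rng.sample(ranked, 3), key=f)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]  # arithmetic crossover
            # Mutation perturbs some genes so that the search keeps exploring.
            child = [clip(c + rng.gauss(0.0, sigma), j) if rng.random() < p_mut else c
                     for j, c in enumerate(child)]
            children.append(refine(child) if local_steps else child)
        pop = children
    return min(pop, key=f)
```

The memetic variant spends extra objective evaluations on every child, so comparisons such as the one in [43] are most meaningful when made at an equal number of function evaluations.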

Eisenhammer et al. [14] optimized the performance of heat mirrors for solar cells. These are high-pass filters that transmit optical solar radiation but reflect thermal radiation, which would otherwise be lost by the solar cell. Their designs differ slightly from typical dielectric multilayers because they incorporate metals, which help to reflect thermal frequencies. Bagnoud and Salin [2] and Yakovlev and Tempea [43] have applied GAs to the design of chirped mirrors, which are used to make ultrafast lasers with femtosecond pulses. These sophisticated mirrors are designed to have a high reflectance over a broad range of frequencies and, in addition, must be compensated to ensure stability of the reflection phases. Yakovlev and Tempea [43] used a memetic algorithm and report fast convergence compared with the standard GA. Hoorfar et al. [19] extended the GA approach by including the choice of dielectric materials, from a list of candidates, in the multilayer design. The authors thereby treat a mixed-parameter problem (i.e., one with both discrete and continuous optimization parameters), to which the GA approach appears well suited. Other authors have developed the technique still further by considering multiple objective and constraint functions [39]. The standard infrared filter design problem introduced by Aguilera et al. [1] has been treated by Martin et al. [25,44,46]: the results of these studies are shown in Fig. 2b along with those of other global optimization approaches.


Needle Optimization 


Most optimization strategies for multilayer design problems consider the variation of the layer thicknesses $d_i$ and layer refractive indices $n_i$. Variations of the standard techniques, including multiple objective functions and mixed-parameter optimization, have also been discussed in this article. However, few methods consider varying the number of layers in a multilayer design. The needle optimization approach tackles the problem from exactly this perspective [32,34,35,37,38,40]. First, the optimum position for introducing a needle-like (very thin) layer perturbation into the structure is determined: this is usually the insertion point that gives the greatest improvement in the objective function. Tikhonov Jr et al. [35] provide an algorithm for locating this optimum position before needle insertion. For some objective functions this position can be determined analytically, but, for flexibility, numerical approaches are also available [34,40]. Following insertion of a needle, the new design is used as the starting point for a local optimization. The needle insertion and local optimization procedure is repeated until no further refinement is possible within the constraints of the problem at hand.
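A toy version of the insert-then-refine loop is sketched below using a numerical search for the insertion point. Several simplifications are assumed. Candidate positions are restricted to layer boundaries, whereas published needle methods [34,35,40] typically also consider insertion points inside layers and use analytic or gradient information. The needle thickness (0.01 μm) and the material list are hypothetical. A compass search over thicknesses stands in for the local optimizer. The reflectance model (normal incidence, lossless layers, air on a substrate of index 4) is included so that the block is self-contained.

```python
import math


def reflectance(layers, lam, n0=1.0, n_sub=4.0):
    """Normal-incidence reflectance of lossless (index, thickness) layers."""
    m11, m12, m21, m22 = 1 + 0j, 0j, 0j, 1 + 0j
    for n, d in layers:
        delta = 2 * math.pi * n * d / lam
        c, s = math.cos(delta), math.sin(delta)
        m11, m12, m21, m22 = (m11 * c + m12 * 1j * n * s, m11 * 1j * s / n + m12 * c,
                              m21 * c + m22 * 1j * n * s, m21 * 1j * s / n + m22 * c)
    b, c_total = m11 + m12 * n_sub, m21 + m22 * n_sub
    return abs((n0 * b - c_total) / (n0 * b + c_total)) ** 2


def merit(layers, grid):
    """RMS reflectance over the wavelength grid (as a fraction)."""
    return math.sqrt(sum(reflectance(layers, lam) ** 2 for lam in grid) / len(grid))


def refine(layers, grid, step=0.05, tol=1e-4, max_evals=2000):
    """Local optimization of thicknesses only, by compass search with d >= 0."""
    ns = [n for n, _ in layers]
    ds = [d for _, d in layers]
    best, evals = merit(layers, grid), 1
    while step > tol and evals < max_evals:
        improved = False
        for j in range(len(ds)):
            for s in (step, -step):
                trial = list(ds)
                trial[j] = max(0.0, trial[j] + s)
                f = merit(list(zip(ns, trial)), grid)
                evals += 1
                if f < best:
                    ds, best, improved = trial, f, True
        if not improved:
            step /= 2
    return list(zip(ns, ds)), best


def needle_optimize(grid, materials, layers=(), needle=0.01, max_needles=4):
    """Repeatedly insert the most helpful thin layer, then re-optimize the thicknesses."""
    layers = list(layers)
    current = merit(layers, grid)
    for _ in range(max_needles):
        best_f, best_design = None, None
        # Simplification: needles are tried only at layer boundaries.
        for pos in range(len(layers) + 1):
            for n in materials:
                trial = layers[:pos] + [(n, needle)] + layers[pos:]
                f = merit(trial, grid)
                if best_f is None or f < best_f:
                    best_f, best_design = f, trial
        if best_f >= current:
            break  # no needle improves the design: stop
        layers, current = refine(best_design, grid)
    return layers, current
```

Each accepted needle strictly lowers the merit function, because the refinement step only accepts improvements. The loop therefore ends either when no insertion helps or when the needle budget is exhausted.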

An alternative approach to this problem, which has not yet been explored, would be to formulate it as a Mixed-Integer Program, in which the existence or otherwise of each putative layer would be represented by a binary variable. This problem could be solved locally using standard techniques, and many of the global methods discussed here could be applied.
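The following is a sketch of one way such a formulation might be written. It assumes a fixed list of $L$ candidate layers with prescribed refractive indices, a binary variable $y_j$ indicating whether layer $j$ is present, and a hypothetical upper bound $d_{\max}$ on each thickness. None of these symbols comes from the cited studies. Because $F$ from Eq. (2) is nonlinear in $d$, the result is strictly a mixed-integer nonlinear program:

$$
\begin{aligned}
\min_{d,\,y}\quad & F(d, \lambda_i, R_i) \\
\text{s.t.}\quad & 0 \le d_j \le d_{\max}\, y_j, \qquad j = 1, \dots, L, \\
& y_j \in \{0, 1\}, \qquad j = 1, \dots, L .
\end{aligned}
$$

Setting $y_j = 0$ forces $d_j = 0$ and removes layer $j$ from the stack, so $\sum_j y_j$ counts the layers actually used. A constraint on this sum would limit the number of layers directly.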


Deterministic Methods 


Deterministic algorithms generally require the nonlinear model equations to be analyzed in order to construct a convex problem whose minimum underestimates that of the original design problem. Using a search strategy such as Branch and Bound, it is possible to converge to a feasible global minimum by successively solving such problems, which produce tighter and tighter bounds on the solution along an infeasible design path. These methods are reviewed extensively elsewhere in the Encyclopedia.
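The text does not fix a particular construction for the convex problem. As one concrete illustration, the sketch below uses an αBB-style underestimator, $f(x) + \alpha (x_L - x)(x_U - x)$, on a one-dimensional test function. This underestimator is convex on a box whenever $\alpha \ge \max(0, -\min f''/2)$ there. Deriving such a bound for the coupled transfer matrix equations is the difficulty discussed next, and it is not attempted here. The test function and the value of `alpha` are illustrative only.

```python
import heapq
import math


def golden_min(g, lo, hi, iters=80):
    """Minimize a unimodal (here convex) function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if g(c) <= g(d):
            b = d
        else:
            a = c
    x = 0.5 * (a + b)
    return x, g(x)


def alpha_bb_1d(f, lo, hi, alpha, tol=1e-6, max_nodes=10000):
    """Branch and bound on [lo, hi] using alpha-BB convex underestimators.

    Requires alpha >= max(0, -min f''(x) / 2) on [lo, hi], so that
    f(x) + alpha*(a - x)*(b - x) is convex and lies below f on every [a, b].
    """
    def bound(a, b):
        under = lambda x: f(x) + alpha * (a - x) * (b - x)
        return golden_min(under, a, b)  # (minimizer, lower bound)

    best_x, best_f = lo, f(lo)
    x0, lb0 = bound(lo, hi)
    heap = [(lb0, lo, hi, x0)]
    nodes = 0
    while heap and nodes < max_nodes:
        lb, a, b, xm = heapq.heappop(heap)
        if lb > best_f - tol:
            break  # no remaining box can improve the incumbent by more than tol
        nodes += 1
        fx = f(xm)  # the relaxation's minimizer is feasible: update the upper bound
        if fx < best_f:
            best_x, best_f = xm, fx
        mid = 0.5 * (a + b)
        for a2, b2 in ((a, mid), (mid, b)):  # branch: bisect the box
            x2, lb2 = bound(a2, b2)
            if lb2 < best_f - tol:
                heapq.heappush(heap, (lb2, a2, b2, x2))
    return best_x, best_f
```

For $f(x) = \sin 3x + 0.1 x^2$ on $[-3, 3]$, $f'' \ge -8.8$, so `alpha=4.5` is valid. The search then returns the global minimizer near $x \approx -0.512$ rather than the local minima near $1.57$ or $-2.62$.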

Because the transfer matrix equations are complex and highly coupled, it is difficult to find appropriate convex underestimators. However, Tikhonravov and Dobrowolski [36] treat the above problem using an approximate infeasible path approach, reducing it to a quadratic programming problem with linear inequality constraints that has a single global optimum. Feasible solutions are obtained by local optimization of the resulting design. In their method, the reflection calculation is approximated for k = 0 by i) assuming continuous variation of the refractive index profile, and ii) assuming only a small reflectivity, R(0). In the scope of general multilayer optimization problems these conditions are fairly restrictive, but they are applicable to the filter design problem posed by Aguilera et al. [1]. Strictly, this is not a deterministic global optimization method: it solves an approximate problem to global optimality, and there is no guarantee that this corresponds to the global solution of the original problem. Tikhonravov and Dobrowolski [36] perform the local optimization of a discretized structure to find a feasible solution. One interesting aspect of their approach is the proof of an optimal relationship between the minimum objective function and the optical thickness of a filter for a given set of material parameters. Although solutions lying exactly on this line may not exist, the relationship marks the limit of global optimality. The limiting condition of optimality is plotted in Fig. 2 and marks a theoretical boundary above which all solutions must lie. This is useful as a benchmark for the
development of deterministic global optimization algorithms and for assessing the performance of stochastic methods.

Global Optimization of Planar Multilayered Dielectric Structures, Table 1
Details of solvers and their implementations in this study. (a) [25]; (b) [6]; (c) [44]; (d) [28]. * Value estimated based on computation time and CPU type, taking into account details from [25]. † Value estimated based on 1600 generations and 100 members per population. ‡ Number of function evaluations depends on the number of layers in the design and a fixed computation time of 5 hrs

| Solver | Function evals. | CPU time | Language | Machine |
|---|---|---|---|---|
| (a) GA [25] | 160,000† | | C++ | HP Apollo Series |
| (b) SA [6] | ~100,000* | | C++ | HP Apollo 715/75 |
| (c) GA [44] | 150,000 | | C++ | Unknown |
| (d) ML [28] | 150,000-250,000‡ | 5 hrs | MatLab/C++ | Intel PIV 2 GHz |

Global Optimization of Planar Multilayered Dielectric Structures, Table 2
Comparison of optimum solutions found using ML [28] and GA [44]. The ML algorithm performed between approximately 150,000 and 250,000 function evaluations, depending on the number of layers, while the GA algorithm used between 150,000 and 650,000 function evaluations (specific number is unknown)

| Number of layers | | | | | | |
|---|---|---|---|---|---|---|
| GA: merit function (%) | 0.855 | 0.697 | 0.577 | 0.523 | 0.553 | 0.494 |
| (row label illegible) | 20.34 | 27.04 | 40.17 | 50.99 | 44.98 | 71.15 |
| Optical thickness (μm) | 31.26 | 38.19 | 45.19 | 56.28 | 52.10 | (illegible) |


A Comparison of Methods 
for an Infrared Filter Design 


It is very difficult to compare the general performance of optimization approaches. In the following study, methods are compared through their performance in solving the infrared filter design problem described in the introduction to this section. In each case, the same problem with precisely the same objective function is considered. In addition, past authors have terminated their solvers after a set number of iterations to allow a fair comparison with other methods. However, a fixed iteration budget can be a misleading measure of convergence: in the case of GAs, a global optimum may not have been reached, and in the case of SA, the cooling schedule may limit the effectiveness of the method. Consequently, the reader will note that the optimization methods do not agree on the global optimum. Nevertheless, it is important to place each solver on an equal footing, and the number of function evaluations is used as the measure for this. Table 1 shows information specific to each solver used in the study.


It should be noted that only past studies of this problem using the methods discussed are compared here. Other studies that treat this problem can be found in [1,11,12,30,31], amongst others.

Yang and Kao [44] have provided an extensive study of this problem, analyzing designs with varying numbers of layers. The same approach was taken to generate data for the ML approach following the strategy in [28]. A direct comparison of the optima found by the GA [44] and ML methods is shown in Table 2 as a function of the number of design layers. Here, the GA was allowed 150,000 function evaluations before stopping, whereas ML was allowed 5 hours, which, depending on the number of layers, allows between 150,000 and 250,000 function evaluations. Both methods perform comparably, but ML tends to locate slightly better solutions at the expense of optical thickness (the sum of the layer thicknesses multiplied by the respective refractive indices). This is to be expected, given the slightly larger number of function evaluations allowed for structures with lower numbers of layers.

The trade-off between the optical thickness of a filter and its reflectivity has been examined by Dobrowolski et al. [13]. Based on a quasi-deterministic quadratic approach [36] to anti-reflection coating design, they specify a locus of the merit function against optical thickness. This represents a theoretical limit on optimality for a given anti-reflection bandwidth, and it can be tested in this instance. Fig. 2a plots the merit function, F, against optical thickness for the solutions found by the ML approach [28]. Here, dots represent solutions for the 17-layer structure and crosses represent solutions for the 20-layer structure. All solutions clearly lie on or above the optimal locus, represented by the broken line. Note, however, that the optimal locus does not guarantee that solutions will be found on or near it. Figure 2b shows these results alongside optical designs obtained using GA [25] and SA [6]. It is important to note that the GA and SA approaches limit the total optical thickness to 32 μm, whereas the ML algorithm is free to locate solutions over a larger range. Despite this, all methods appear comparable, with the GA perhaps appearing superior to SA. The effectiveness of the ML approach in identifying solutions near the optimal locus can in fact be assessed after optimization.

Global Optimization of Planar Multilayered Dielectric Structures, Figure 2
(a) Positions of all local solutions found using the ML approach for 17 and 20 layers in the 5-hour calculation time. (b) Comparison of optimum solutions of GA [25], SA [6], ML [28] and the optimum locus for this problem [13]. Note that for the GA and SA algorithms the maximum optical thickness of the filter is 32 μm, whereas the ML algorithm has no upper limit. The top 10 solutions for the ML approach are shown

Global Optimization of Planar Multilayered Dielectric Structures, Figure 3
New results of the ML approach generated by varying the number of layers from 15 to 30. Through post-optimization analysis, the top 50 results nearest the optimal locus [13] were identified

An advantage of the ML approach is the ability to perform post-optimization analysis on the local minima, which makes the design process highly adaptive. This is appealing because supplementary design criteria can be taken into account without altering the objective function directly, and the optimization is usually extremely sensitive to the form of that function. The analysis can be quite effective because ML generates between 100 and 200 local solutions in the 5-hour calculation time, depending on the number of layers in a design. For example, further analysis of the local optima in the current example allows solutions near the optimal locus to be identified. New results were generated using the same approach as in [28] for designs with 15 to 30 layers. The 50 solutions nearest the optimal locus were then filtered from the complete set and are plotted in Fig. 3. The solutions lying very close to the optimal locus highlight the effectiveness of ML. However, the reduced number of function evaluations for structures with larger numbers of layers limits the effectiveness of the search. Interestingly, this analysis also identifies gaps along the optimal locus, which suggests that, in some cases, extra layers are redundant when seeking to optimize both optical thickness and merit function.
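The filtering step behind Fig. 3 needs only each local solution's optical thickness and merit value. The sketch below assumes that the limiting locus of [13] is available as a callable giving the lowest achievable merit at a given optical thickness. Both that callable and the ranking by vertical gap above the locus are assumptions; the exact distance measure used to produce Fig. 3 is not stated in the text.

```python
def optical_thickness(layers):
    """Sum of physical thickness times refractive index over all (index, thickness) layers."""
    return sum(n * d for n, d in layers)


def nearest_to_locus(solutions, locus, k=50):
    """Rank local solutions by their gap above a limiting locus and keep the k closest.

    solutions: list of (layers, merit) pairs collected from a multi-start run.
    locus: hypothetical callable returning the limiting merit at optical thickness T.
    Returns (gap, optical_thickness, merit, layers) tuples, closest first.
    """
    ranked = []
    for layers, merit in solutions:
        t = optical_thickness(layers)
        ranked.append((merit - locus(t), t, merit, layers))
    ranked.sort(key=lambda item: item[0])
    return ranked[:k]
```

Because the ranking uses only stored results, other criteria (for example a cap on total optical thickness) could be applied to the same set of local minima without rerunning the optimization.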


Conclusions 


The design of multilayered dielectric optical structures can be formulated as a highly nonlinear optimization problem in which the thickness and refractive index of each layer are to be optimized, based on an appropriate objective function. This problem is known to have a large number of local minima, and several global optimization algorithms have been proposed to tackle it. These algorithms are mostly stochastic search algorithms (Simulated Annealing, Genetic Algorithms and Memetic Algorithms) or deterministic algorithms with a probabilistic guarantee of convergence (Multi-Level Algorithm). A deterministic approach that guarantees global optimality for an approximation of the design problem has also been proposed. The performance of several of these algorithms has been compared for a specific problem.

Future work on this design problem must continue 
to address the challenges posed by the large number of 
local optima which exist. The design formulation can 
also be extended to include the number of layers as one 
of the design variables. An early and encouraging effort 
in this direction is the needle optimization algorithm. 
Introduction


The multiple-minima problem, i.e., the large number 
of minima associated with the potential functions used 
to represent the conformational energy of a polypep- 
tide chain, is one of the greatest obstacles to overcome 
in order to compute the three-dimensional structure 
of a protein. Despite much effort and a large num- 
ber of interesting ideas and approaches, progress to- 
ward the solution of this problem has been very slow. 
An exhaustive search of the conformational hyper- 
surface of a large polypeptide is not computationally 
feasible even with today’s supercomputers. Originally, 
the challenge was to locate the global energy mini- 
mum of small oligopeptides such as the pentapeptide 
Met-enkephalin [1,17,20,26,45,46,48,55,57,58,59,72,73,
79,91]. 

Since the global minimum of a potential function for a specific sequence is not known a priori, the only way to locate the global minimum of the potential energy is to carry out a large number of independent tests and determine whether they converge to a unique conformation. This approach has been used in test studies of Met-enkephalin, in which hundreds of independent runs using different techniques have led to a unique lowest-energy conformation, shown in Fig. 1, for the Empirical Conformational Energy Program for Peptides (ECEPP/2 [44,50,89] and ECEPP/3 [49]) potential energy functions. Similar results have been achieved for other test cases corresponding to larger sequences [1,23,39,56,62,77,78,80,81,94]. More recently, we have focused our efforts on developing search techniques that combine molecular dynamics with a coarse-grained representation of the protein structure. This approach to the protein folding problem is more rigorous because it accounts for entropic contributions and, at the same time, is computationally more advantageous because of the simplified treatment of the complexities of the amino acid geometry. Our laboratory has made considerable progress in this area of research during the past few years, and we present a description of some of the successful methods that we have developed.

Global Optimization in Protein Folding, Figure 1
Lowest-energy conformation of Met-enkephalin using the ECEPP/2 force field [27]


The Build-up Procedure 


While systematic and exhaustive enumeration of all 
possible conformations is not practically feasible for 
polypeptides and proteins, attempts have been made to
develop algorithms that lead to a truncated systematic 
search of the conformational space of these molecules. 
One of these methods, developed in our laboratory, 
the build-up procedure, [9,71,85,86,88,91] assumes that 
short-range interactions play a dominant role in deter- 
mining the conformation of a polypeptide chain. Thus, 
the method starts by locating the low-energy confor- 
mations of small fragments of the chain by an exhaus- 
tive energy minimization procedure. Then, a selection 
of the minima is carried out, keeping those that lie 
within an appropriately chosen upper bound (the cutoff
energy) of the lowest-energy fragment. Subsequently, 
the limited set of minima for one fragment is combined 
with the set of another fragment to form larger peptides 
which are also subjected to energy-minimization. This 
process is repeated until the whole chain is eventually 
built up from its constituent parts. At successive stages 
of the algorithm, more and more long-range interac- 
tions come into play. 


Outline of the Procedure 


1. The smallest fragment that the build-up procedure 
uses to construct a polypeptide conformation is the 
single amino acid. The ECEPP/2 minimum-energy 
conformations of terminally blocked single residues 
were reported by M. Vasquez, G. Némethy and H.A. 
Scheraga [90]. The conformations were ordered by 
increasing energy using a cutoff energy of 5 kcal/mol 
and were classified according to the code defined 
by S.S. Zimmerman, M.S. Pottle, G. Némethy and 
H.A. Scheraga [98]. The ECEPP/3 force field pro- 
duces the same energy minima for all blocked amino 
acids with the exception of the proline and hydrox- 
yproline residues. 

2. All possible dipeptides for a given molecule are generated from single-residue data (for a peptide with n residues there are n − 1 dipeptides). After energy minimization, the dipeptides are sorted and used to construct tripeptides.

3. Subsequent steps to form larger fragments of the polypeptide chain involve joining two fragments with one or more residues in common; e.g., after conformations have been generated for the tripeptides, these can be used to construct tetrapeptides from two tripeptides having two residues in common. This process is continued until the whole polypeptide chain is built.
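The outline above can be mirrored in a small combinatorial sketch. It assumes that each residue's low-energy minima are abstracted as a small set of discrete states, and it replaces the energy minimization of each fragment with a plain energy evaluation. The state labels and the `energy(start, conformation)` function are hypothetical. Fragments of length L starting at residues i and i+1 share L − 1 residues, which matches steps 2 and 3 (dipeptides from single residues, tripeptides from dipeptides sharing one residue, and so on).

```python
def build_up(n_residues, states, energy, cutoff):
    """Truncated systematic search: grow fragments, keeping those within `cutoff` of the best.

    states: discrete per-residue conformational states (hypothetical labels).
    energy(start, conf): energy of a contiguous fragment starting at residue `start`.
    Returns the surviving conformations of the whole chain.
    """
    def prune(frags, start):
        best = min(energy(start, c) for c in frags)
        return [c for c in frags if energy(start, c) <= best + cutoff]

    # Step 1: single residues, filtered by the cutoff energy.
    level = [prune([(s,) for s in states], i) for i in range(n_residues)]
    for length in range(1, n_residues):
        nxt = []
        for i in range(n_residues - length):
            left, right = level[i], level[i + 1]
            # Join two fragments that agree on their length-1 shared residues.
            joined = {a + b[-1:] for a in left for b in right if a[1:] == b[:-1]}
            nxt.append(prune(list(joined), i) if joined else [])
        level = nxt
    return level[0]
```

With a generous cutoff the search becomes exhaustive. With a tight one it keeps only fragments that are locally favorable, which is exactly where the drawbacks described next come from: a state discarded in a small fragment can no longer appear in the full chain.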


Drawbacks of the Procedure 


One of the major difficulties of the build-up proce- 
dure is that the number of conformations of fragments 
that must be energy-minimized and stored at each step 
increases exponentially. A partial solution, aside from 
using an energy cut-off, is to retain only those min- 
ima whose backbone conformations differ significantly: 
e.g. when several local minima have almost identical 
backbone but different side-chain conformations, only 
the lowest-energy minimum is kept while the degen- 
erate ones are discarded. This approach drastically re- 
duces the number of conformations to be stored at each 
stage of the procedure; however, it may lead to prob- 
lems at later stages because the side-chain rotamers that 
are most favorable energetically in smaller fragments 
are not necessarily favored in the whole polypeptide 
chain. Another difficulty associated with the procedure 
is that atomic overlaps can occur when two fragments 
are joined in an arbitrary manner. These overlaps lead 
to conformations with extremely high energy for which 
minimization is usually not computationally feasible. 
A set of algorithms designed to surmount these prob- 
lems was presented by K.D. Gibson and H.A. Scher- 
aga [9]. 


Applications 


The build-up procedure has been used extensively in 
a number of studies of different molecules, among them 
Met-enkephalin [91], Gramicidin S [6,51], Melittin [71], bovine pancreatic trypsin inhibitor [92,93] and collagen [41,42,43]. The method appears to work well for small oligopeptides and fibrous proteins but, except in a few cases, it becomes unmanageable for polypeptide chains containing 10 or more amino acid residues.


The Self Consistent Electrostatic Field Method 


Among all the interactions that lead to protein folding, electrostatic interactions are the only ones characterized as long-range. Therefore, they undoubtedly must play an important role in folding. The dominant effects of electrostatic interactions in proteins are well recognized [60]. Among these effects, it is worth mentioning:

• The orientation of the CO and NH dipoles in α-helices is very favorable electrostatically [95], leading to a large dipole moment associated with this type of secondary structure.

• The electric field produced by an α-helix constitutes a very important stabilizing factor of the native conformations of proteins containing this type of secondary structure [11].

• The relative orientations of α-helices and β-sheets in proteins are favorable electrostatically [4,12,25].

Based on this evidence, L. Piela and H.A. Scheraga [62] hypothesized that the native conformation of a protein arises when the electrostatic interactions are near optimal or, equivalently, that the native conformation must have approximately optimal orientations of its group dipoles in the electric field generated by the whole molecule and its surrounding solvent. Based on this assumption (which was later confirmed through rigorous calculations on an extensive set of proteins [82]), a conformational search method, named the Self-Consistent Electric Field (SCEF) method, was developed. The SCEF procedure was implemented as follows:

1. Given an arbitrary starting conformation of the molecule, minimize the total (e.g., ECEPP/3) conformational energy to reach the nearest local minimum.

2. For this minimized conformation, calculate the electric field due to the whole molecule at each CO and NH group of the peptide units, and also at the middle of the C'-N peptide bond.

3. The direction of the electric field with respect to the CO and NH bond dipole moments indicates which peptide units are badly oriented. This electrostatic analysis of the alignment between the permanent dipoles and the electric field is used to generate a diagnostic rotation: the change that must be applied to a given torsional angle to obtain the best alignment of the worst-oriented peptide-unit dipoles with respect to the electric field. For example, if the electrostatic analysis indicates that the dipole moment of the peptide bond between residues i and i+1 is the worst oriented, the diagnostic rotation describes the change of the corresponding backbone dihedral angles $\psi_i$ and $\phi_{i+1}$ required to align the dipole moment of that unit.

4. Carry out the diagnostic rotation.

5. Use the new conformation of the molecule as the starting point in step 1:

• if a new local minimum is reached, repeat the procedure from step 2 for the new local minimum;

• if the same local minimum is found, repeat step 3, but using the diagnostic rotation for the next worst-oriented dipole.

6. Repeat steps 1-5 until self-consistency is achieved, i.e., until further application of the procedure does not change the conformation of the molecule.


Computation of the Electric Field 
and Dipole Moments 


If $r$ represents the position vector assigned to dipole moment i of a group of atoms, then the electric field, $\mathcal{E}(r)$, is computed as:

$$ \mathcal{E}(r) = \frac{1}{\varepsilon} {\sum_k}' \frac{q_k (r - r_k)}{|r - r_k|^3} \tag{1} $$

where $\varepsilon$ is the dielectric constant and $q_k$ is the charge on atom k with position vector $r_k$. The prime on the summation sign indicates that the atoms which contribute to the ith dipole moment, as well as the other atoms covalently bonded to them, are excluded from the computation.
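Eq. (1) can be evaluated directly, as in the minimal sketch below. It assumes that atoms are given as parallel lists of partial charges and Cartesian coordinates, and that the primed sum is represented by an explicit set of excluded atom indices. No unit-conversion constant is applied, so the result is in whatever units the inputs imply. The function name is hypothetical.

```python
import math


def electric_field(point, charges, positions, excluded=(), eps=1.0):
    """Eq. (1): field at `point` from point charges, skipping excluded atom indices.

    charges: partial charges q_k; positions: (x, y, z) tuples r_k.
    excluded: indices of the atoms forming the dipole and the atoms bonded to them.
    """
    skip = set(excluded)
    field = [0.0, 0.0, 0.0]
    for k, (q, rk) in enumerate(zip(charges, positions)):
        if k in skip:
            continue  # the prime on the sum in Eq. (1)
        diff = [p - r for p, r in zip(point, rk)]
        dist3 = math.sqrt(sum(c * c for c in diff)) ** 3
        for j in range(3):
            field[j] += q * diff[j] / (eps * dist3)
    return tuple(field)
```

In the SCEF procedure this function would be called three times per peptide unit, at the reference points $r_{i,\mathrm{CO}}$, $r_{i,\mathrm{NH}}$ and $r_i$ defined next.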

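The electric field is computed at three points, $r_{i,\mathrm{CO}}$, $r_{i,\mathrm{NH}}$ and $r_i$. These are the reference points with respect to which the dipole moments of the CO bond, $\mu_i^{\mathrm{CO}}$, the NH bond, $\mu_i^{\mathrm{NH}}$, and the whole ith peptide unit, $\mu_i$, respectively, are calculated. These dipole moments are computed according to the following relations:

$$ \mu_i^{\mathrm{CO}} = q_C (r_{i,C} - r_{i,\mathrm{CO}}) + q_O (r_{i,O} - r_{i,\mathrm{CO}}) \tag{2} $$

$$ \mu_i^{\mathrm{NH}} = q_N (r_{i,N} - r_{i,\mathrm{NH}}) + q_H (r_{i,H} - r_{i,\mathrm{NH}}) \tag{3} $$

$$ \mu_i = q_C (r_{i,C} - r_i) + q_O (r_{i,O} - r_i) + q_N (r_{i,N} - r_i) + q_H (r_{i,H} - r_i) \tag{4} $$

The points $r_{i,\mathrm{CO}}$ and $r_{i,\mathrm{NH}}$ are chosen so that the bond quadrupole moments of the CO and NH bonds, $Q_{\mathrm{CO}}$ and $Q_{\mathrm{NH}}$, respectively, vanish, and $r_i$ is the midpoint of the C'-N bond; i.e., the three points satisfy the following relations:

$$ Q_{\mathrm{CO}} = q_C |r_{i,C} - r_{i,\mathrm{CO}}|^2 + q_O |r_{i,O} - r_{i,\mathrm{CO}}|^2 = 0 \tag{5} $$

$$ Q_{\mathrm{NH}} = q_N |r_{i,N} - r_{i,\mathrm{NH}}|^2 + q_H |r_{i,H} - r_{i,\mathrm{NH}}|^2 = 0 \tag{6} $$

and

$$ r_i = (r_{i,C} + r_{i,N})/2 . \tag{7} $$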
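For a bond whose two atoms carry charges of opposite sign, as C'/O and N/H do in a peptide unit, Eqs. (5) and (6) have a closed-form solution on the line through the two atoms. Writing the point as $r_a + t(r_b - r_a)$ gives $q_a t^2 + q_b (1 - t)^2 = 0$, so $t/(1 - t) = \sqrt{-q_b/q_a}$. The sketch below assumes the reference point lies on the bond line, and of the two roots it takes the one between the atoms. The text does not state which root is used, so that choice is an assumption. The function names are hypothetical.

```python
import math


def zero_quadrupole_point(r_a, r_b, q_a, q_b):
    """Point p on the a-b line with q_a|r_a - p|^2 + q_b|r_b - p|^2 = 0 (Eqs. (5)-(6)).

    Requires opposite-sign charges; returns the root lying between the two atoms.
    """
    if q_a * q_b >= 0:
        raise ValueError("charges must have opposite signs")
    rho = math.sqrt(-q_b / q_a)  # ratio |p - r_a| / |p - r_b|
    t = rho / (1.0 + rho)        # fractional distance from atom a towards atom b
    return tuple(a + t * (b - a) for a, b in zip(r_a, r_b))


def bond_dipole(r_a, r_b, q_a, q_b, ref):
    """Eqs. (2)-(3): bond dipole q_a (r_a - ref) + q_b (r_b - ref) about reference point ref."""
    return tuple(q_a * (a - o) + q_b * (b - o) for a, b, o in zip(r_a, r_b, ref))
```

When the two charges are equal and opposite, the reference point is the bond midpoint and the dipole does not depend on it. Otherwise the bond carries net charge, and the choice of reference point in Eqs. (5) and (6) matters.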


Degree of Alignment of a Dipole Moment 
with the Electric Field 


The process of aligning a particular dipole moment, $\mu_i^X$ (with X being CO or NH), with the electric field can be accomplished by rotations of the backbone dihedral angles $\psi_i$ and $\phi_{i+1}$ (see Fig. 2). When such a rotation is carried out, only the electric field components perpendicular to the rotation axis change:

$$ \mathcal{E}_{\perp k}(r_{i,\mathrm{CO}}) = \mathcal{E}(r_{i,\mathrm{CO}}) - [\mathcal{E}(r_{i,\mathrm{CO}}) \cdot e_{i,k}]\, e_{i,k} \tag{8} $$

$$ \mathcal{E}_{\perp k}(r_{i,\mathrm{NH}}) = \mathcal{E}(r_{i,\mathrm{NH}}) - [\mathcal{E}(r_{i,\mathrm{NH}}) \cdot e_{i,k}]\, e_{i,k} \tag{9} $$

where $e_{i,k}$, for k = 1, 2, denotes the unit vector along the axes of rotation $\psi_i$ and $\phi_{i+1}$, respectively. Furthermore, in writing these equations it was assumed that the points $r_{i,\mathrm{CO}}$ and $r_{i,\mathrm{NH}}$ are sufficiently close to the rotation axis.


SCEF peptide unit i with the atomic charges used in the ECEPP force field


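The energy, E, of a dipole in an electric field is given by:

$$ E = -\mu \cdot \mathcal{E} . \tag{10} $$

Assuming that the electric field in the neighborhood of the ith peptide group is relatively uniform, a lower bound for the energy gain due to a rotation is given by:

$$ \Delta E_i = \Delta E_i^{\mathrm{CO}} + \Delta E_i^{\mathrm{NH}} \tag{11} $$

where the individual energy gains, $\Delta E_i^{\mathrm{CO}}\ (< 0)$ and $\Delta E_i^{\mathrm{NH}}\ (< 0)$, obtained by aligning the dipole and the field vectors are

$$ \Delta E_i^X = -\left[\, |\mu_{i,\perp k}^X|\, |\mathcal{E}_{\perp k}(r_{i,X})| - \mu_{i,\perp k}^X \cdot \mathcal{E}_{\perp k}(r_{i,X}) \,\right] \tag{12} $$

with

$$ \mu_{i,\perp k}^X = \mu_i^X - (\mu_i^X \cdot e_{i,k})\, e_{i,k} . \tag{13} $$

The value of $\Delta E_i$ given by Eq. (11) is used as a measure of the deviation from perfect alignment in the electric field of the ith peptide unit.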
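The misalignment measure of Eqs. (11)-(13) can be sketched as follows. The sketch assumes plain 3-tuples for vectors and a unit rotation axis. The sign convention follows Eq. (10): perfect alignment of the perpendicular components gives energy $-|\mu_\perp||\mathcal{E}_\perp|$, so the gain is that value minus the current $-\mu_\perp \cdot \mathcal{E}_\perp$. The helper names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def perpendicular(v, axis):
    """Component of v perpendicular to the unit vector `axis` (Eqs. (8), (9), (13))."""
    s = _dot(v, axis)
    return tuple(c - s * a for c, a in zip(v, axis))


def alignment_gain(mu, field, axis):
    """Eq. (12): energy change (<= 0) from aligning the perpendicular dipole and field."""
    mu_p, f_p = perpendicular(mu, axis), perpendicular(field, axis)
    norm = lambda v: math.sqrt(_dot(v, v))
    return -(norm(mu_p) * norm(f_p) - _dot(mu_p, f_p))
```

Following Eq. (11), the measure for peptide unit i would be `alignment_gain(mu_co, field_co, e) + alignment_gain(mu_nh, field_nh, e)`. The unit with the most negative value is the worst oriented.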


Best-possible Alignment of a Dipole Moment 
with the Electric Field 


From an analysis of the $\Delta E_i$ values, it is possible to detect which peptide unit is the most unfavorably oriented in the electric field. The SCEF method provides a mechanism to compute the rotation that should lead to an improved orientation of this peptide unit with respect to the electric field. To accomplish this, the electric field $\mathcal{E}(r_i)$ at the ith peptide unit can be viewed as the sum of two contributions generated by the portions of the polypeptide chain on either side of the ith unit: (a) $\mathcal{E}_N(r_i)$, generated by the part of the molecule containing the N-terminus; and (b) $\mathcal{E}_C(r_i)$, generated by the part of the molecule containing the C-terminus,

$$ \mathcal{E}(r_i) = \mathcal{E}_N(r_i) + \mathcal{E}_C(r_i) . \tag{14} $$


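The components of $\boldsymbol{\mu}_i$, $\mathbf{E}_N(\mathbf{r}_i)$ and $\mathbf{E}_C(\mathbf{r}_i)$ parallel to an axis of rotation do not change with rotations about this axis. On the other hand, the components of these vectors perpendicular to a given axis, say $\mathbf{e}_{i,k}$, do change with rotations about the axis, and they are given by:

$\boldsymbol{\mu}_{\perp i,k} = \boldsymbol{\mu}_i - \mathbf{e}_{i,k}(\boldsymbol{\mu}_i \cdot \mathbf{e}_{i,k})$ (15)
$\mathbf{E}_{N,\perp i,k}(\mathbf{r}_i) = \mathbf{E}_N(\mathbf{r}_i) - \mathbf{e}_{i,k}[\mathbf{E}_N(\mathbf{r}_i) \cdot \mathbf{e}_{i,k}]$ (16)
$\mathbf{E}_{C,\perp i,k}(\mathbf{r}_i) = \mathbf{E}_C(\mathbf{r}_i) - \mathbf{e}_{i,k}[\mathbf{E}_C(\mathbf{r}_i) \cdot \mathbf{e}_{i,k}]$ (17)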


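If $\boldsymbol{\mu}_{\perp i,k}$ does not lie along $\mathbf{E}_{\perp i,k} = \mathbf{E}_{N,\perp i,k} + \mathbf{E}_{C,\perp i,k}$, perfect alignment between the vectors can be obtained by a single rotation about $\mathbf{e}_{i,k}$. For $k = 1$, a rotation about the $\psi_i$ axis changes $\mathbf{E}_{N,\perp i,1}$ to $\mathbf{E}'_{N,\perp i,1}$. Similarly, for $k = 2$, a rotation about the $\phi_{i+1}$ axis changes $\mathbf{E}_{C,\perp i,2}$ to $\mathbf{E}'_{C,\perp i,2}$. Therefore, alignment is achieved if either one of the following equations is satisfied: for $k = 1$ ($\psi_i$ axis),

$\boldsymbol{\mu}_{\perp i,1} \cdot (\mathbf{E}'_{N,\perp i,1} + \mathbf{E}_{C,\perp i,1}) = |\boldsymbol{\mu}_{\perp i,1}|\,|\mathbf{E}'_{N,\perp i,1} + \mathbf{E}_{C,\perp i,1}|$ (18)

for $k = 2$ ($\phi_{i+1}$ axis),

$\boldsymbol{\mu}_{\perp i,2} \cdot (\mathbf{E}_{N,\perp i,2} + \mathbf{E}'_{C,\perp i,2}) = |\boldsymbol{\mu}_{\perp i,2}|\,|\mathbf{E}_{N,\perp i,2} + \mathbf{E}'_{C,\perp i,2}|$ (19)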


From geometrical considerations (see Fig. 3), the solution of Eq. (18) (and similarly of Eq. (19)) is found to satisfy the relation

$|\alpha| = \arccos(c/b)$ (20)

where $b = |\mathbf{E}_{N,\perp i,1}|$, $c = d^{1/2}$ with $d = b^2 - a^2 \sin^2\theta_C$, $a = |\mathbf{E}_{C,\perp i,1}|$, and $\theta_C$ is the angle between $\mathbf{E}_{C,\perp i,1}$ and $\boldsymbol{\mu}_{\perp i,1}$. Equation (20) can have different numbers of solutions (none when $d < 0$, since $c$ is then not real). If they exist, these solutions correspond to rotations of the dipole moment $\boldsymbol{\mu}_{\perp i,1}$ with different energies. The value leading to the lowest energy represents the solution to the alignment Eq. (18). This change of $\psi_i$ leads to an energy gain given by

$\Delta E_{i,N} = -\boldsymbol{\mu}_{\perp i,1} \cdot (\mathbf{E}'_{N,\perp i,1} - \mathbf{E}_{N,\perp i,1})$ (21)

SCEF: solution of alignment Eq. (18)

Expressions similar to Eqs. (20) and (21) have been derived for the rotation around the $\phi_{i+1}$ axis ($k = 2$) and for the corresponding energy gain, $\Delta E_{i,C}$.
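The construction behind Eqs. (18)–(21) can be made concrete with a small planar calculation. The Python sketch below treats the $k = 1$ case under the same idealizations as Eq. (20). It assumes that $\boldsymbol{\mu}_{\perp i,1}$, $\mathbf{E}_{N,\perp i,1}$ and $\mathbf{E}_{C,\perp i,1}$ are given as 2D coordinates in the plane perpendicular to $\mathbf{e}_{i,1}$, and that a change of $\psi_i$ rotates $\mathbf{E}_{N,\perp i,1}$ rigidly in that plane relative to the other two vectors. Rather than evaluating the angle $\alpha$ of Eq. (20), it builds the rotated field vector directly from $a$, $b$, $c$ and $\theta_C$. The function names and the counterclockwise sign convention are our own choices.

```python
import math

def rotate(v, angle):
    """Rotate a 2D vector counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def align_dipole(mu, e_n, e_c):
    """Rotation of e_n that aligns mu with e_n' + e_c (Eq. (18), k = 1).

    Returns (rotation_angle, energy_gain) for the lowest-energy solution,
    with the gain computed as in Eq. (21), or None if there is no solution.
    """
    mu_norm = math.hypot(*mu)
    phi_mu = math.atan2(mu[1], mu[0])
    # Work in a frame whose x-axis lies along mu.
    en_loc = rotate(e_n, -phi_mu)
    ec_loc = rotate(e_c, -phi_mu)
    a = math.hypot(*ec_loc)                       # a = |E_C,perp|
    b = math.hypot(*en_loc)                       # b = |E_N,perp|
    theta_c = math.atan2(ec_loc[1], ec_loc[0])    # angle between E_C,perp and mu
    d = b * b - (a * math.sin(theta_c)) ** 2
    if d < 0:
        return None       # E_N' cannot cancel the part of E_C transverse to mu
    c = math.sqrt(d)
    best = None
    for x in (c, -c):
        # E_N' must cancel the component of E_C perpendicular to mu ...
        target = (x, -a * math.sin(theta_c))
        # ... and the total field must point along +mu, not against it.
        if x + a * math.cos(theta_c) <= 0:
            continue
        rot = math.atan2(target[1], target[0]) - math.atan2(en_loc[1], en_loc[0])
        rot = math.atan2(math.sin(rot), math.cos(rot))   # wrap to (-pi, pi]
        # Eq. (21): gain = -mu . (E_N' - E_N); mu = (mu_norm, 0) in this frame.
        gain = -mu_norm * (target[0] - en_loc[0])
        if best is None or gain < best[1]:
            best = (rot, gain)
    return best
```

For example, `align_dipole((1.0, 0.0), (0.0, 1.0), (1.0, 0.0))` returns a rotation of about $-\pi/2$ and a gain of about $-1$. The $k = 2$ case is analogous, with the roles of $\mathbf{E}_N$ and $\mathbf{E}_C$ exchanged.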

It should be mentioned that, in reality, the solution given by Eq. (20) produces only an approximate alignment of $\boldsymbol{\mu}_{\perp i,1}$ with the corresponding electric field component. The reason is that the derivation of these equations was based on the assumptions that (a) the center of the peptide unit is on the $\psi_i$ axis of rotation, and (b) the electric field is homogeneous. Although these conditions are not satisfied in reality, the results obtained from these expressions are reasonably accurate [62].

Finally, after the rotations about both the $\psi_i$ and $\phi_{i+1}$ axes have been computed, the SCEF method has to decide which rotation should be implemented. The method selects the rotation associated with the more negative energy gain ($\Delta E_{i,N}$ or $\Delta E_{i,C}$). In those cases where no solution is found for either $\psi_i$ or $\phi_{i+1}$, another unfavorable peptide unit is chosen.
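The selection logic just described can be summarized in a short Python sketch. It assumes a caller-supplied `solve` function (hypothetical) that wraps the per-axis alignment calculation and returns an angle and an energy gain, or nothing when no solution exists. It also interprets "another unfavorable peptide unit" as the next unit in order of increasingly less negative $\Delta E_i$; that ordering is our assumption rather than a stated detail of the method.

```python
from typing import Callable, Optional, Sequence, Tuple

def scef_choose_rotation(
    delta_e: Sequence[float],
    solve: Callable[[int, str], Optional[Tuple[float, float]]],
) -> Optional[Tuple[int, str, float]]:
    """Return (unit index, axis, angle) for the rotation an SCEF step applies.

    delta_e[i] is the misalignment measure of Eq. (11) for peptide unit i.
    solve(i, axis) returns (angle, energy_gain) for axis "psi" or "phi",
    or None when the alignment equation has no solution.
    """
    # Visit units from the most unfavorably oriented (most negative) onward.
    for i in sorted(range(len(delta_e)), key=lambda j: delta_e[j]):
        candidates = []
        for axis in ("psi", "phi"):
            sol = solve(i, axis)
            if sol is not None:
                angle, gain = sol
                candidates.append((gain, axis, angle))
        if candidates:
            gain, axis, angle = min(candidates)   # the more negative gain wins
            return i, axis, angle
    return None   # no unit admits an aligning rotation
```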


Applications 


The procedure was tested on a 19-residue poly(L-alanine) chain [62] with acetyl and N-methylamide terminal blocking groups. The starting conformations were a series of partially α-helical conformations representing different degrees of distortion from the canonical right-handed α-helix. The right-handed α-helical conformation corresponds to the global energy minimum of the ECEPP/2 (and ECEPP/3) potential function. In the four cases reported, the procedure was able to reach the conformation corresponding to the global energy minimum in a very short computation time.

Figure 4a shows the starting conformation of one of the tests. The conformation contains only 1.5 α-helical turns at each terminus, and 70.6% of the native hydrogen bonds are broken. In subsequent iterations of the SCEF procedure, the right-handed α-helix shown at the bottom of Fig. 4b was completely recovered.
SCEF method: application to poly(L-alanine)


The SCEF procedure was also used [76] in a restrictive search of the conformational space of the 58-residue protein bovine pancreatic trypsin inhibitor (BPTI). In this application, the algorithm led to a series of conformations with energies up to 50 kcal/mol lower than that of the starting conformation.


The Monte Carlo-Minimization Method 


The Monte Carlo-Minimization (MCM) method [26,27], developed by Z. Li and H.A. Scheraga, was motivated by experimental studies indicating that proteins are not static structures but instead undergo fluctuations. For a protein to be stable, its native conformation must be stable not only against small perturbations but also against larger-scale thermal fluctuations. Based on these considerations, Li and Scheraga developed a stochastic approach for global optimization of polypeptides and proteins that combines the power of the Metropolis Monte Carlo method [40] in global combinatorial optimization with that of conventional energy minimization in finding local minima. The underlying working hypothesis of the method is that protein folding can be considered as a Markov process that (a) has Boltzmann transition probabilities and (b) leads to a unique absorbing state [3], which corresponds to the native state of a natural, biologically active protein. For this absorbing state, equilibrium is reached after a sufficiently long time, and the stationary probability of occurrence approaches unity.

The Metropolis Monte Carlo method can simulate thermal processes by taking into account both random fluctuations and energetic considerations. However, straightforward applications of the Metropolis Monte Carlo method to polypeptides have proven to be quite inefficient [10,57,74], mainly because (a) a high-dimensional conformational space has to be sampled by making small increments of the variables in each step, and (b) the large energy barriers in the conformational space tend to confine the sampling to a very restrictive region of the space. To overcome these difficulties, the MCM method includes conventional energy minimization as a second important feature. Thus, the MCM method generates a Markov walk on the hyperlattice of all discrete energy minima, with Boltzmann transition probabilities.

$\phi$-$\psi$ maps for the five residues of Met-enkephalin showing the backbone dihedral angles of 18 random starting conformations (indicated by the numbers 1 to 18) for the MCM method. The backbone dihedral angles of the global minimum achieved by the MCM method in all the runs (see Fig. 1) are indicated by 0

The procedure implemented in the MCM algorithm is as follows:

• Given an energy-minimized conformation, $C^{\min}_{\mathrm{curr}}$, with total energy $E^{\min}_{\mathrm{curr}}$, a Monte Carlo sampling strategy is used to generate a perturbed conformation, $C_{\mathrm{pert}}$. The sampling strategy consists of random changes involving $k$ of the $N_{\mathrm{dihed}}$ dihedral angles used to describe the molecule. The number of changes, $k$, is generated with probability $2^{-k}$ ($k = 1, 2, \ldots, N_{\mathrm{dihed}}$). This probability selection implies that fluctuations involving more degrees of freedom are sampled with successively lower probabilities. This sampling strategy satisfies the ergodicity requirement, i.e., any local minimum is accessible from any other one after a finite number of random sampling steps. Furthermore, in order to improve the average acceptance ratio, random changes involving backbone dihedral angles are sampled more frequently than those of side chains. This type of sampling strategy led to an average acceptance ratio of approximately 20% at 0°C for Met-enkephalin.

• The randomly generated conformation, $C_{\mathrm{pert}}$, is then subjected to conventional energy minimization until it reaches the nearest local minimum of the potential energy function (ECEPP/2 or ECEPP/3). Minimization of the energy is carried out with the Secant Unconstrained Minimization Solver (SUMSL) algorithm [8]. The resulting conformation, $C^{\min}_{\mathrm{pert}}$, has a total energy $E^{\min}_{\mathrm{pert}}$ and is usually free of atomic overlaps.

• The energies of the conformations $C^{\min}_{\mathrm{curr}}$ and $C^{\min}_{\mathrm{pert}}$ are compared, and the Metropolis criterion is used to decide which conformation is to be kept: if the energy difference $\Delta E = E^{\min}_{\mathrm{pert}} - E^{\min}_{\mathrm{curr}} \le 0$ or, when $\Delta E > 0$, if $e^{-\Delta E/k_B T}$ is greater than a random number drawn uniformly between 0 and 1, the new conformation, $C^{\min}_{\mathrm{pert}}$, replaces the current $C^{\min}_{\mathrm{curr}}$; otherwise, $C^{\min}_{\mathrm{pert}}$ is discarded.
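The three MCM steps can be illustrated on a toy problem. In the Python sketch below, the energy function `toy_energy` (a sum of independent torsional terms whose global minimum is $-1.5$ per angle at $x = \pm\pi$), the plain gradient-descent minimizer that stands in for SUMSL, the uniform redraw of each perturbed angle and the value of $k_B T$ are all hypothetical choices. Only the $2^{-k}$ choice of the number of perturbed dihedral angles, the minimization of every perturbed conformation and the Metropolis test on the minimized energies follow the steps above; the preferential sampling of backbone angles is omitted.

```python
import math
import random

def toy_energy(x):
    """Hypothetical torsional energy; global minimum -1.5 per angle at x = +/-pi."""
    return sum(math.cos(3 * a) + 0.5 * math.cos(a) for a in x)

def toy_gradient(x):
    return [-3 * math.sin(3 * a) - 0.5 * math.sin(a) for a in x]

def local_minimize(x, lr=0.05, iters=200):
    """Plain gradient descent to the nearest local minimum (stand-in for SUMSL)."""
    x = list(x)
    for _ in range(iters):
        x = [a - lr * g for a, g in zip(x, toy_gradient(x))]
    return x

def draw_k(n_dihed, rng):
    """Number of dihedral angles to change, with P(k) proportional to 2**-k."""
    ks = range(1, n_dihed + 1)
    return rng.choices(ks, weights=[2.0 ** -k for k in ks])[0]

def mcm(n_dihed, n_steps, kT, seed=0):
    """Monte Carlo-Minimization walk; returns the lowest minimum visited."""
    rng = random.Random(seed)
    curr = local_minimize([rng.uniform(-math.pi, math.pi) for _ in range(n_dihed)])
    e_curr = toy_energy(curr)
    best, e_best = curr, e_curr
    for _ in range(n_steps):
        # Step 1: randomly change k dihedral angles of the current minimum.
        pert = list(curr)
        for i in rng.sample(range(n_dihed), draw_k(n_dihed, rng)):
            pert[i] = rng.uniform(-math.pi, math.pi)
        # Step 2: minimize the perturbed conformation.
        pert = local_minimize(pert)
        e_pert = toy_energy(pert)
        # Step 3: Metropolis criterion on the two minimized energies.
        d_e = e_pert - e_curr
        if d_e <= 0 or math.exp(-d_e / kT) > rng.random():
            curr, e_curr = pert, e_pert
            if e_curr < e_best:
                best, e_best = curr, e_curr
    return best, e_best
```

With two angles, `mcm(2, 1000, 0.3)` should report a best energy of $-3$, the global minimum of this toy function. Because only minimized conformations are compared, the walk moves between minima rather than having to climb the barriers between them step by step.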
Applications 


The MCM procedure was successfully applied to study the conformational preferences of the pentapeptide Met-enkephalin [26,27]. In its initial application [26], 13 of 18 random starting conformations of this oligopeptide converged to the global minimum, shown in Fig. 1, within the time of the simulations. Using a different sampling strategy [27], the 5 remaining runs also converged to the same lowest-energy structure. Figure 5 shows the values of the backbone dihedral angles, $\phi$ and $\psi$, for the 18 starting conformations.

As a further development, we extended the concept of MCM by biasing the perturbations according to electrostatic interactions, giving the Electrostatically Driven Monte Carlo (EDMC) method, which is described in the next section. More recently, we grouped the conformations obtained in the search into families that are updated on the fly and used the properties of these families in subsequent steps of the search; this resulted in the conformation-family Monte Carlo (CFMC) method [65]. The CFMC method was used to search the conformational space of the B-domain of staphylococcal protein A in the united-residue representation [65] and for crystal structure prediction of small molecules [63].


The Electrostatically Driven 
Monte Carlo Method 


The Electrostatically Driven Monte Carlo (EDMC) 
method, introduced by D.R. Ripoll and H.A. Scher- 
aga, is a procedure for iteratively searching the confor- 
mational hypersurface of relatively small polypeptide 
molecules. The EDMC method incorporates the best 
features of the SCEF and MCM methods and combines 
them with a set of new techniques to produce a more 
efficient search of the conformational space. 

The search for the global energy minimum of
a molecule proceeds as a “quasi-random walk” along 
a conformational pathway. As with the MCM method, 
this pathway is defined, in principle, by an infinite se- 
quence of energy-minimized conformations encoun- 
tered over an unbounded number of iterative steps of 
the algorithm. In practice, however, a finite number of 
iterations is specified for a given run. The underlying 
assumption behind the EDMC method is that (a) the 
electrostatic interactions should lead to conformations 
representing an improvement of the charge distribu- 
tion, i.e. the new conformations are expected to have 
lower electrostatic and total energies; and (b) thermal 
fluctuations, on the other hand, are expected to intro- 


duce disorder within the molecule. These thermal ef- 
fects could force the molecule to adopt conformations 
that are higher in energy, but may allow it to escape 
from stable local minima of relatively high energy. 

The implementation of these ideas is accomplished 
as follows: Thermal effects are associated with random 
changes in the molecular conformation, i.e., a small
set of randomly chosen variables is altered randomly.
On the other hand, the reordering effect of the elec- 
trostatic interactions was viewed as a tendency of all 
permanent dipole moments associated with the pep- 
tide units of the polypeptide, to attain their best possi- 
ble alignment in the local electric field produced by the 
rest of the molecule. Additionally, a series of new fea- 
tures [77], included in the latest implementation of the 
EDMC method, has helped to accelerate the search and 
to optimize the process of generation of new conforma- 
tions. 


The Procedure 


The first accepted conformation on the conformational 
pathway followed by the EDMC method is usually an 
unfolded state of the polypeptide chain (i.e. the initial 
values of the variables describing the molecular confor- 
mation are assigned randomly); its energy is minimized 
to relieve possible atomic overlaps. The subsequent ac- 
cepted conformations are obtained by a variety of tech- 
niques described below. An iteration of the procedure 
is defined as a set of manipulations of the currently accepted conformation that leads to its replacement by a newly generated conformation.

The strategy used to produce new conformations 
within an iteration of the method is based upon a com- 
bination of movements associated with the electrostatic 
interactions and thermal motion. 

(a) An important technique that the EDMC method 
uses to generate new conformations is based on an 
electrostatic analysis similar to that produced by 
the SCEF method [62], but extended to consider 
the permanent dipole moments of polar side- 
chains. As a first step of an iteration, this electro- 
static analysis of the currently accepted conforma- 
tion (the initial energy-minimized conformation 
or the accepted conformation from the previous it- 
eration) is carried out to determine the alignment 
of the permanent dipoles with the local electric 
field produced by the whole molecule. As a result,
diagnostic rotations that could improve the local 
dipole alignments with the electric field are pro- 
duced for all permanent dipole moments. These di- 
agnostic rotations are incorporated into a predic- 
tion list of possible conformational changes. The 
information contained in this list is used to gener- 
ate new conformations in a subsequent search for 
states of lower energy. 

(b) Since it may happen that none of these predic- 
tions leads to an acceptable conformation, a ran- 
dom and/or biased sampling technique is also used 
to generate additional conformations, as follows:

1. Specification of the mode in which the variable 
dihedral angles of the selected residues are to be 
altered: 

i) Select all variables at random; 

ii) Select the backbone variables randomly
within specific regions of the $\phi$-$\psi$ map;

iii) Select all variables from pre-computed 
low-energy conformations of the tri-pep- 
tides included in the sequence; 

iv) Select backbone variables compatible with
regular structures (β-sheets or α-helices).

2. Random selection of i) the number of residues 
to be affected by the changes, and ii) their po- 
sitions in the sequence. 

The latest implementation of the algorithm [77] in- 
cludes a technique to produce a cluster analysis of the 
accepted minima. The conformations are grouped into 
clusters using rms distance criteria and ranked on the 
basis of their total energies. Furthermore, every gener- 
ated conformation, even if rejected, is associated with 
an existing cluster or family, but added to it only if its energy is lower than that of the best member of that family. During an iteration, randomly generated conformations can also be produced by perturbing low-energy conformations included in any of the clusters (except the one containing the currently accepted minimum), using the protocol described in item (b) above.

A conformation generated by either of these two procedures ((a) or (b)) is subjected to minimization of the total energy, with the backbone and side-chain dihedral angles of the molecule treated as variables. The energy minimization is carried out with the SUMSL algorithm [8]. The value of the potential energy constitutes the basis for either the acceptance or the rejection of the new minimum-energy conformation.

A newly generated conformation must fulfill two criteria to be accepted:

1. If a generated conformation is found to correspond to an accepted minimum that has already been encountered more than a predefined number of times (usually 5–10), then it is automatically excluded from further consideration. This analysis of the long-term behavior of the search provides one of the criteria to ensure that the search does not become trapped in a set of local minima of the conformational space.

2. If a conformation passes the previous test, its energy $E_{\mathrm{new}}$ is compared with the energy, $E_{\mathrm{curr}}$, of the current accepted conformation, and the Metropolis criterion [40], as described for the MCM method, is applied.

When the energy of the new conformation passes 
both tests successfully, the conformation is accepted, 
replacing the current one, and a new iteration begins. 
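The two acceptance criteria can be combined into a single test. The Python sketch below is illustrative only: it assumes that each minimum is identified by some hashable `key` (for example, a coarse signature of its dihedral angles, which is our assumption) and that the caller maintains `visits`, a count of how often each accepted minimum has been encountered. The parameter `max_visits` plays the role of the predefined limit (usually 5–10), and `kT` is the temperature factor in energy units.

```python
import math
import random
from collections import Counter
from typing import Hashable

def edmc_accept(key: Hashable, e_new: float, e_curr: float, visits: Counter,
                kT: float, max_visits: int = 5, rng=random) -> bool:
    """Two-stage acceptance test for a newly minimized EDMC conformation."""
    # Criterion 1: a minimum already encountered too often is excluded outright.
    if visits[key] > max_visits:
        return False
    # Criterion 2: Metropolis test against the current accepted conformation.
    d_e = e_new - e_curr
    return d_e <= 0 or math.exp(-d_e / kT) > rng.random()
```

Because criterion 1 is checked first, even a lower-energy conformation is rejected when it falls into an over-visited minimum, which is what keeps the search from cycling among the same few minima.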


Backtrack 


The number of conformations generated within a given iteration is limited (usually 100 to 200 conformations). It may happen that neither the set of electrostatic predictions nor the set of randomly generated conformations produces an acceptable conformation. Under
these circumstances, the algorithm then assumes that 
the current local minimum is quite stable and a new 
procedure named backtrack is triggered. The backtrack 
procedure attempts to displace the search to a differ- 
ent region of the conformational hypersurface by sub- 
stantially altering the processes of generation and ac- 
ceptance of conformations. 

The backtrack procedure involves the following: 

a) A new set of conformations is generated by chang- 
ing a large number of variables simultaneously. In 
particular, the procedure tends to select the vari- 
ables associated mainly with the backbone of the 
polypeptide chain; and, 

b) the temperature parameter, T, used in the Metropo- 
lis acceptance criterion is (i) raised abruptly to a very 
high value, or (ii) steadily increased by means of 
a pre-defined heating scheme. 
The backtrack procedure is applied until the acceptance test is satisfied, or until the number of generated conformations reaches a predetermined maximum value. In the rare event that the latter situation
occurs, the run is terminated since it is assumed that it 
is practically impossible to escape from the current re- 
gion of the conformational space. On the other hand, 
when a conformation from the backtrack procedure is 
accepted, the temperature parameter is reset to its origi- 
nal user-specified value, and the generation mechanism 
is switched back to the standard protocol described 
above. 

The objective of the modified generation procedure 
during backtrack is to produce conformations substan- 
tially different from the current minima, while rais- 
ing the temperature has the effect of increasing the 
probability of acceptance of conformations with ener- 
gies much higher than the current local minimum. The 
backtrack mechanism has been shown to be an effective 
technique to help the search avoid being trapped in sta- 
ble, high-energy regions of the conformational space. 

The EDMC method has some similarities with simulated annealing, proposed by S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi [15], since both make use of high temperatures to surmount large energy barriers.
The difference is that the EDMC procedure concen- 
trates the search in the low-energy regions of the con- 
formational space using energy minimization and a low 
temperature value. High temperatures are used rarely 
during backtrack to escape from stable or already vis- 
ited regions. Once this is accomplished, the tempera- 
ture parameter is reset to its initial (low) value. A search 
using simulated annealing, on the other hand, starts 
with a high temperature value and this parameter is 
gradually reduced during the simulation. The expecta- 
tion is that, given a sufficiently high initial temperature 
and a good annealing schedule, the search will over- 
come large energy barriers and will become localized 
in the low-energy region containing the global mini- 
mum. 


Applications 


The multiple-minima problem has been found to be computationally tractable by the EDMC method on existing computers for polypeptide sequences of up to 20 amino acid residues.


Global Optimization in Protein Folding, Figure 6 
Lowest-energy conformation of the membrane-bound por- 
tion of melittin for the ECEPP/3 force field determined by 
the Conformational Space Annealing [23] and the EDMC [77] 
methods 


In applications to Met-enkephalin [79], oxytocin [39],
arginine-vasopressin [39], decaglycine [80],
a 19-residue chain of poly(L-alanine) [78], and the 
20-residue membrane-bound portion of melittin [77] 
(see Fig. 6), the EDMC algorithm converged to unique 
conformations presumed to be the global energy min- 
ima for those particular sequences. 

In other applications, to a seven-residue pep- 
tide epitope [75], and a twelve-residue analogue of 
mastoparan and mastoparan X [7], the method identi- 
fied very low-energy conformations, but it is not certain 
that the global energy minima were attained in these 
cases. 

More recently, the EDMC method has been applied to the 36-residue villin headpiece subdomain [81] and to the 45-residue fragment of the B-domain of staphylococcal protein A [94]. In both applications, unrestricted global searches that started from randomly generated conformations encountered low-energy basins that included native-like conformations. To our knowledge, the application to the B-domain of staphylococcal protein A was, at the time, the first all-atom simulation in which a protein of this size was folded from random initial conformations without resort to knowledge-based information.

The EDMC method has also been used in restric- 
tive searches of the conformational space of larger 
molecules. In an application to the 58-residue protein 
BPTI [76], the algorithm produced the lowest energy 
conformation known for BPTI using the ECEPP/2 or 
ECEPP/3 potential. The EDMC method has also been used to explore the conformational properties of a non-oncogenic p21 protein [30] and of a molecular switch designed as a biological logic gate [2].


The Diffusion Equation Method 
and Other Methods Based on the Deformation 
of the Potential-Energy Surface 


The diffusion equation method (DEM) is a determinis- 
tic approach that attempts to solve the multiple-min- 
ima problem by deforming the potential energy hyper- 
surface. The basic idea of the method, introduced by Piela et al. [61], is to deform the multivariable function that represents the potential energy in such a manner as to make the shallow wells disappear gradually,
while other potential wells grow at their expense. Un- 
der the assumption that the shallower wells will dis- 
appear more easily than the deep wells, it is possi- 
ble to envision an iterative procedure that, applied to 
the potential function, will change its shape, making 
most of the minima become shallower until they dis- 
appear, while leaving a single absorbing minimum re- 
lated to the lowest minimum of the original function. 
At this point of the deformation process, a simple lo- 
cal minimization algorithm should be able to retrieve 
the position of the unique minimum from any starting 
point. However, since the deformation of the potential has most likely altered the location of all minima, the global minimum of the original function does not, in general, coincide with the minimum of the deformed surface.
Its location can, in principle, be attained by slowly re- 
versing the deformation and using standard local min- 
imization procedures. Piela et al. showed that the de- 
formation of the hypersurface can be carried out with 
the aid of the diffusion equation. In this context, the 
original shape of the potential function has the mean- 
ing of an initial concentration (or temperature) distri- 
bution. 


The diffusion equation that must be solved to obtain a deformed potential-energy surface is given by Eq. (22):

$\nabla^2 F(x_1, x_2, \ldots, x_n; t) = \dfrac{\partial F(x_1, x_2, \ldots, x_n; t)}{\partial t}$ (22)

where $x_1, x_2, \ldots, x_n$ are variables describing the conformation of a molecule, $\nabla^2 = \partial^2/\partial x_1^2 + \partial^2/\partial x_2^2 + \cdots + \partial^2/\partial x_n^2$ is the Laplacian operator, the variable $t$ represents time and can be identified with the extent of deformation, and $F$ is the deformed potential-energy function. Additionally, Eq. (22) is solved with the initial condition $F(x_1, x_2, \ldots, x_n; 0) = f(x_1, x_2, \ldots, x_n)$, where $f(x_1, x_2, \ldots, x_n)$ is the original (undeformed) potential-energy function. In the physical interpretation of Eq. (22), $F$ represents a concentration or a temperature distribution.

If the function $f(x_1, x_2, \ldots, x_n)$ is bounded, a solution of Eq. (22) exists for any positive value of $t$.

The procedure described above represents a spontaneous mass transport (or flow of heat) in a medium for an initial distribution of concentration (or temperature) given by the function $f(x_1, x_2, \ldots, x_n)$ (which in our case represents the conformational energy). Governed by the diffusion equation, and independently of the initial conditions, the concentration (or temperature) will evolve with time in such a manner that it becomes constant as $t \to \infty$. However, it is expected that the concentration (or temperature) will exhibit a single minimum for certain (very large) values of $t$. This single minimum should represent the last trace of the potential well corresponding to the global minimum of the original hypersurface $f(x_1, x_2, \ldots, x_n)$. The deformation and its subsequent reversal to retrieve the position of the original minimum are illustrated in Fig. 7.

Application of the DEM consists of the following steps:

• Solve Eq. (22) using $F(x; 0) = f(x)$ as the initial condition, or apply the operator $T(t) = \exp(t\nabla^2)$ for a sufficiently large value of $t$ ($t_0$); then use a local minimization to locate the position $x_0^*$ of the unique minimum on the deformed surface. This is the starting point to be used in the reversing procedure.

• Apply the reversing procedure described above. For a reversing procedure involving $m$ steps of size $\Delta t$, with $t_0 - m\Delta t = 0$, the position $x_m^*$ obtained by minimizing $F(x; t_0 - m\Delta t = 0)$, starting from $x_{m-1}^*$, should, hopefully, correspond to the position of the global minimum of the function $f$.
Global Optimization in Protein Folding, Figure 7

The DEM method: Illustration of the deformation of the original potential $f(x) = x^4 + 2x^3 + 0.9x^2$ by the operator $T(t) = \exp(t\, d^2/dx^2)$, and of the reversing procedure. The deformation applied by the operator $T(t_0 = 0.25)$ leads to a curve with a unique minimum that is reachable from any point of the space with a simple minimization. The reversing procedure is shown by the arrows directed downward. Each step of the reversing procedure is followed by a minimization, symbolized in the figure by a ball moving downhill from the minimum position of the upper curve and always reaching the position of the minimum in the lower curve. In the final step, the global minimum of the original function is found
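For the one-dimensional example of Figure 7, the deformation can be carried out exactly, because $\exp(t\,d^2/dx^2)$ applied to a polynomial is a finite sum. The Python sketch below takes the curve in the caption to be $f(x) = x^4 + 2x^3 + 0.9x^2$, deforms it to $t_0 = 0.25$, locates the unique minimum, and then reverses the deformation in $m$ equal steps, re-minimizing at each step. The gradient-descent minimizer, its step size, $m = 10$ and the starting point $x = 1$ are our assumptions, not part of the published method.

```python
def grad_deformed(x: float, t: float) -> float:
    """dF/dx for F(x; t) = T(t) f(x), with f(x) = x**4 + 2*x**3 + 0.9*x**2.

    Because repeated second derivatives of a polynomial vanish, exp(t d^2/dx^2)
    acts term by term in closed form:
        x**4 -> x**4 + 12 t x**2 + 12 t**2
        x**3 -> x**3 + 6 t x
        x**2 -> x**2 + 2 t
    so F(x; t) = x**4 + 2 x**3 + (0.9 + 12 t) x**2 + 12 t x + const.
    """
    return 4 * x**3 + 6 * x**2 + (1.8 + 24 * t) * x + 12 * t

def local_min(x: float, t: float, lr: float = 0.02, iters: int = 2000) -> float:
    """Plain gradient descent on the deformed curve (a stand-in minimizer)."""
    for _ in range(iters):
        x -= lr * grad_deformed(x, t)
    return x

def dem(t0: float = 0.25, m: int = 10, x_start: float = 1.0) -> float:
    """Deform to t0, minimize, then reverse the deformation in m equal steps."""
    x = local_min(x_start, t0)        # the unique minimum of the deformed curve
    for j in range(1, m + 1):
        t = t0 * (1 - j / m)          # t decreases to exactly 0 at j = m
        x = local_min(x, t)           # follow the minimum from the previous step
    return x
```

`dem()` ends near $x \approx -1.085$, the deeper of the two minima of $f$, whereas running the same minimizer directly on the undeformed curve from $x = 1$ (`local_min(1.0, 0.0)`) stops in the shallower well at $x = 0$. A closed-form transform like this is available only for simple functions; for realistic energy surfaces this step is the hard part.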


Among other applications, the DEM has been applied to:

• A cluster of 55 Lennard-Jones atoms, for which the global minimum was found [16].

• A single terminally blocked alanine [17].

• The pentapeptide Met-enkephalin [17], for which the method led to practically the same global-minimum backbone structure obtained by other methods. The test, however, was carried out under more restrictive conditions, since only the backbone dihedral angles $\phi$ and $\psi$ were considered as variables.

• Prediction of the crystal structures of hexasulfur and benzene molecules [96,97].

Although the DEM is, in theory, a deterministic approach, we found [96,97] that it must be combined with a Monte Carlo search to work for more complex systems. When the potential-energy surface is deformed to contain just a single minimum, it is so flat
that, to the numerical accuracy, it is effectively constant. 
Thus, deformation cannot be carried out to leave only 
one minimum. Moreover, the position of a minimum 
on a highly deformed surface is too far from that on 
the original energy surface. During the process of re- 
versal, the single minimum splits into multiple minima 
and it is not clear which one of those should be chosen 
to continue the reversal. In our successful application 
to crystal-structure prediction [96,97] we, therefore, in- 
troduced the MCM search both on the deformed po- 
tential-energy surface and during reversal. 

Taking advantage of the concept of the deformation of potential-energy surfaces, we developed several other methods for the search of the global minimum of the energy of polypeptides and proteins. The distance scaling method (DSM) [70], developed by J. Pillardy and L. Piela (as well as its predecessor, the
shift method (SM) [68]) attempts to solve the mul- 
tiple-minima problem using transformations of the 
atom-atom distances that lead to smoothing of the 
potential energy hypersurface. These methods have 
subsequently evolved into the Self-Consistent Basin-to-Deformed-Basin Mapping (SCBDBM) method, in which the coupling between the basin containing the global energy minimum and the corresponding basin in the deformed potential-energy surface is established.
The SCBDBM involves some Monte Carlo search on 
the deformed potential-energy surface and during the 
process of reversal. All three methods have been ap- 
plied successfully to clusters of argon atoms and water 
molecules [67,68,69,70] and to the prediction of crystal structures [97]. The SCBDBM method was also applied [66] in searches for low-energy minima of poly-L-alanine chains of up to 100 amino-acid residues in length and of the 10–55 fragment of the B-domain of
staphylococcal protein A using a united-residue rep- 
resentation of the polypeptide chain. As opposed to 
DEM, the SM, DSM, and SCBDBM approaches, al- 
though not so elegant from the theoretical point of 
view, involve simple transformations of the potential- 
energy surface and are, therefore, much better for prac- 
tical use than DEM, which requires solving a parabolic 
differential equation in multiple dimensions and with 
complicated boundary conditions, which is a highly 
non-trivial task. 
Another approach related to deformation of the
potential-energy surface has been developed by K.A. 
Olszewski, L. Piela and H.A. Scheraga [55] and 
termed Self-Consistent Mean Torsional Field (SCMTF) 
method. It is based on the idea that the ground-state 
solution of the Schroedinger equation contains infor- 
mation about the location of the global minimum. 
Their implementation uses a mean field approxima- 
tion to solve a set of coupled Schroedinger equa- 
tions in a dihedral-angle space. Each equation de- 
scribes the changes of a single dihedral angle in the 
averaged field of the others. This approach was suc- 
cessful in finding the lowest-energy conformations of 
Met-enkephalin [55], and decaglycine and eikosaala- 
nine chains [56]. 


The Conformational Space Annealing Method 


One of the most efficient methods to search the con- 
formational space of polypeptide chains developed in 
our laboratory is the Conformational Space Annealing 
(CSA) method [19,21,22,24], which combines the ideas 
of genetic algorithms, the build-up procedure, random 
search, and local minimization. The CSA method be- 
gins with a randomly-generated population of confor- 
mations which are energy minimized to generate the 
first bank of conformations. The first bank is meant 
to represent a sparse sampling of the conformational 
space that captures short-range interactions. From the 
initial population, a number of conformations (called 
seeds) are selected as parents for the trial popula- 
tion. These “seed” conformations are altered in a non- 
random fashion to create new trial conformations. As 
in any genetic algorithm, the trial population is gen- 
erated by the use of genetic operators: mutations and 
crossovers. Unlike traditional genetic algorithms, the 
mutation operator applied in CSA does not change the 
value of the selected variable randomly; instead it uses 
values of the corresponding variables in the initial pop- 
ulation (the first bank) or in the current population of 
conformations as a pool of random numbers. A copy of the first bank is used as a source of "random" variables, which are not uniformly distributed; instead, their distribution is determined by intramolecular interactions, at this stage mainly by steric overlap. The crossover operators copy a set of variables representing a continuous segment of the polypeptide chain, of various sizes, taken from a randomly selected conformation in the current population to a selected parent conformation (seed). This is described in detail in the next section. Care is taken to ensure that all trial conformations are significantly different from each other and from the parent conformations. After generation, all trial conformations are energy minimized. The next step of the CSA
algorithm is the update of the current population (the 
bank) without increasing its size. Each trial conforma- 
tion is compared to each existing conformation of the 
bank. If the trial conformation is similar to an existing 
conformation of the bank, only the lower-energy con- 
formation out of these two is preserved. If the trial con- 
formation is not similar to any existing conformation 
in the bank it represents a new distinct region of con- 
formational space. Then it replaces the highest-energy 
conformation in the bank, if its energy is lower than the 
highest energy in the bank, otherwise it is discarded. 
The distance $D_{ij}$ between conformations i and j is defined as the difference of their dihedral angles. If $D_{ij}$ is less than or equal to some predefined cutoff value, $D_{\mathrm{cut}}$, conformations i and j are considered similar; otherwise they are considered different. CSA achieves its efficacy by beginning with a large $D_{\mathrm{cut}}$ value, so as to search essentially all possible structures, and then gradually reducing ("annealing") $D_{\mathrm{cut}}$, which reduces the minimum distance between the conformations of the bank and focuses the search on low-energy regions of conformational space. After updating the current population, the seed conformations are selected from the set of conformations not previously selected as seeds; additionally, attention is paid to covering the conformational space as broadly as possible by selecting mutually dissimilar conformations as seeds.
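The bank-update rule just described can be sketched as below. This is a minimal illustration under stated assumptions, not the laboratory's implementation: a conformation is assumed to be a list of dihedral angles in degrees, $D_{ij}$ is taken as the sum of absolute periodic angle differences (one possible reading of "the difference of their dihedral angles"), only the nearest bank member is tested for similarity, and the names `update_bank`, `angle_distance` and `anneal_cutoff` (including its geometric schedule) are hypothetical. Energy minimization of the trial conformations is omitted.

```python
from typing import Callable, List, Sequence

Conformation = List[float]  # dihedral angles in degrees (assumed representation)


def angle_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """D_ij: sum of absolute dihedral-angle differences, respecting 360-degree periodicity."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        total += min(d, 360.0 - d)
    return total


def update_bank(bank: List[Conformation], trial: Conformation,
                energy: Callable[[Conformation], float], d_cut: float) -> List[Conformation]:
    """Merge one (already minimized) trial conformation into the bank, keeping its size fixed."""
    e_trial = energy(trial)
    dists = [angle_distance(trial, c) for c in bank]
    nearest = min(range(len(bank)), key=dists.__getitem__)  # assumption: test the nearest member only
    if dists[nearest] <= d_cut:
        # Similar to an existing member: keep only the lower-energy one of the pair.
        if e_trial < energy(bank[nearest]):
            bank[nearest] = trial
        return bank
    # A new region of conformational space: replace the highest-energy member if the trial is lower.
    worst = max(range(len(bank)), key=lambda k: energy(bank[k]))
    if e_trial < energy(bank[worst]):
        bank[worst] = trial
    return bank


def anneal_cutoff(d_start: float, d_end: float, n_stages: int, stage: int) -> float:
    """Hypothetical geometric schedule that shrinks D_cut from d_start to d_end."""
    ratio = (d_end / d_start) ** (1.0 / max(n_stages - 1, 1))
    return d_start * ratio ** stage
```

The bank is modified in place, so its size never grows, matching the requirement that the update does not increase the bank size.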

The CSA method was shown to be very efficient in finding the global minimum of the ECEPP/3 potential energy function for Met-enkephalin [22] and melittin [24]; it was also implemented as a standard search technique with the coarse-grained UNRES force field developed in our laboratory (see next section).


Hierarchical Approach 


Another approach developed in our laboratory [38,87] 
starts with a coarse-grained representation of a protein 
and provides atomistic details at the end. It can be sum- 
marized in the following three stages: 
Global Optimization in Protein Folding, Figure 8
The UNRES model of polypeptide chains. The interaction sites are side-chain centroids of different sizes (SC) and the peptide-bond centers (p) indicated by shaded circles, whereas the α-carbon atoms (small empty circles) are introduced only to assist in defining the geometry. The virtual Cα-Cα bonds have a length of 3.8 Å, corresponding to a trans peptide group; θ and γ, denoting the virtual-bond angle and virtual-bond dihedral angle, respectively, are variable. Each side chain is attached to the corresponding α-carbon with a "bond length", $b_{\mathrm{SC}_i}$, variable "bond angle", $\alpha_{\mathrm{SC}_i}$, formed by $\mathrm{SC}_i$ and the bisector of the angle defined by $\mathrm{C}^\alpha_{i-1}$, $\mathrm{C}^\alpha_i$, and $\mathrm{C}^\alpha_{i+1}$, and with a variable "dihedral angle" $\beta_{\mathrm{SC}_i}$ of counterclockwise rotation about the bisector, starting from the right side of the $\mathrm{C}^\alpha_{i-1}$, $\mathrm{C}^\alpha_i$, $\mathrm{C}^\alpha_{i+1}$ frame

1 Extensive simulations using the coarse-grained UNRES model [28,29,35,36,37,53,54] developed in our laboratory and subsequent selection of structures with the lowest free energy.

2 Conversion of selected coarse-grained structures to 
all-atom structures. 

3 Exploration of the conformational space of all-atom 
structures in the neighborhood of geometries ob- 
tained in Stage 2. 

In the UNRES model, a polypeptide chain is represented as a sequence of α-carbon atoms (Cα) with attached united side chains (SC) and united peptide groups (p), each of which is positioned in the middle between two consecutive Cα atoms, as shown in Fig. 8.


All three stages are executed using physics-based 
potentials; therefore, energy is the determinant of each 
of them. Stage 1 is the key point of the approach, be- 
cause it provides the widest range of exploration of the 
conformational space. Consequently, we have put most of our effort into the development of the coarse-grained UNRES force field. To execute stage 2, we developed an approach in which the peptide groups are first positioned within an α-carbon trace to minimize their energy of local and electrostatic interactions [13] and, subsequently, the side-chain atoms are added to minimize the energy of the chain given a coarse-grained geometry [14]. The all-atom ECEPP/3 [49] force field is used in stage 3.

The effective energy function is a sum of different terms corresponding to interactions between SC and SC ($U_{\mathrm{SC}_i \mathrm{SC}_j}$), SC and p ($U_{\mathrm{SC}_i p_j}$), and p and p ($U_{p_i p_j}$) sites, as well as local terms corresponding to bending of virtual-bond angles θ ($U_b$), side-chain rotamers ($U_{\mathrm{rot}}$), virtual-bond torsional ($U_{\mathrm{tor}}$) and double-torsional ($U_{\mathrm{tord}}$) terms, virtual-bond-stretching ($U_{\mathrm{bond}}$) terms, correlation terms ($U_{\mathrm{corr}}^{(m)}$) pertaining to coupling between backbone-local and backbone-electrostatic interactions [29] (where m denotes the order of correlation), and a term accounting for the energetics of disulfide bonds ($U_{\mathrm{SS}}$). Each of these terms is multiplied by an appropriate weight, w. The energy function is given by Eq. (23).


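$$
\begin{aligned}
U ={}& w_{\mathrm{SC}} \sum_{i<j} U_{\mathrm{SC}_i \mathrm{SC}_j} + w_{\mathrm{SCp}} \sum_{i \ne j} U_{\mathrm{SC}_i p_j} + w_{pp} \sum_{i<j-1} U_{p_i p_j} + w_{\mathrm{tor}} \sum_i U_{\mathrm{tor}}(\gamma_i) \\
&+ w_{\mathrm{tord}} \sum_i U_{\mathrm{tord}}(\gamma_i, \gamma_{i+1}) + w_b \sum_i U_b(\theta_i) + w_{\mathrm{rot}} \sum_i U_{\mathrm{rot}}(\alpha_{\mathrm{SC}_i}, \beta_{\mathrm{SC}_i}) \\
&+ \sum_{m=3}^{6} w_{\mathrm{corr}}^{(m)} U_{\mathrm{corr}}^{(m)} + w_{\mathrm{bond}} \sum_{i=1}^{n_{\mathrm{bond}}} U_{\mathrm{bond}}(d_i) + w_{\mathrm{SS}} \sum_i U_{\mathrm{SS};i}
\end{aligned}
\tag{23}
$$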
The expression for the effective energy in the UNRES model was derived from the physics of interactions, as a cluster-cumulant [18] expansion of the effective free energy of a protein plus the surrounding solvent, in which the secondary degrees of freedom had been averaged out [29,31,35]. Most of the expressions were parameterized based on energy surfaces of model systems computed by ab initio molecular quantum mechanics [35,53]; some of them were parameterized based on statistics from the PDB [36,37]. The energy-term weights (the w's in Eq. (23)) were determined [54] by using the method of hierarchical optimization of the potential-energy landscape developed in our laboratory [28], in which the energy of selected training proteins decreases with increasing native-likeness.

Using the Conformational Space Annealing (CSA) 
method [19,21,22] to search for the global energy min- 
imum of the UNRES energy function, we achieved 
considerable success in the Community Wide Experi- 
ments of Techniques for Protein Structure Prediction 
(CASP). In CASP3, we made the best prediction for target T0061 (protein HDEA), predicting its 60-residue segment within 4.2 Å Cα RMSD from the experimental structure (PDB code: 1BG8) [34]. The experimental and predicted structures are superposed in Fig. 9.
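The Cα RMSD quoted above is measured after optimally superposing the predicted structure on the experimental one. A minimal sketch of that calculation follows, assuming the Kabsch (SVD) superposition method and NumPy, with the two structures given as matched (n, 3) arrays of Cα coordinates; the superposition software actually used for the CASP comparison is not specified here, and `kabsch_rmsd` is a hypothetical name.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched (n, 3) coordinate sets after optimal rigid-body superposition."""
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # d = -1 would mean a reflection; correct it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # optimal proper rotation taking p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Restricting the rotation to det = +1 matters for proteins: a mirror image of a chain is a different (wrong-handed) structure and should not superpose with zero RMSD.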

At that time, our force field did not contain sufficient correlation terms and was unable to account for β-sheet formation. After introducing correlation terms [29], in the CASP4 to CASP6 experiments we were able to predict significant portions of the structures of α+β and β proteins [52,64]. In the CASP6 experiment [52], we predicted complete structures of five proteins and large portions of the structures of other proteins without ancillary information from protein structural databases. The largest α-helical protein predicted in CASP6 in its entirety, except for a short C-terminal fragment, was target T0198 (235 residues; we predicted the topology of its 208-residue α-helical part), and the largest α+β protein was T0230 (97 residues).

We extended our hierarchical approach to treat oligomeric proteins [83,84] and proteins containing disulfide bonds [5]; the second extension includes the energy-based prediction of disulfide-bond topology.

Global Optimization in Protein Folding, Figure 9
Superposition of the crystal (dark gray) and predicted (light gray) structures of HDEA. The Cα atoms of the fragment between residues D25 and I85 were superposed. The RMSD is 4.2 Å. Helices 3, 4 and 5 are indicated as H-3, H-4 and H-5, respectively

Recently [33] we extended the implementation of the UNRES force field to mesoscopic dynamics. The corresponding simulations led us to the conclusion that conformational entropy makes a major contribution to the probability of occurrence of a family of conformations. A particular single conformation can have a very low potential energy but no chance to appear at room temperature if it belongs to a very narrow basin in the potential-energy surface. Conversely, higher-energy conformations could form a very broad basin and, consequently, make an overwhelming contribution to the statistical ensemble at room temperature. Therefore, in our latest work [32] we have reformulated energy-based protein-structure prediction as a search for the basin with the lowest free energy at physiological temperatures, using techniques based on molecular dynamics, such as replica-exchange molecular dynamics [47], to search conformational space.
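A toy calculation makes the narrow-versus-broad basin argument concrete. The sketch below assumes two one-dimensional harmonic basins with hypothetical parameters (they are not UNRES energies): basin A is 1 kcal/mol deeper but 100 times stiffer than basin B. Integrating the Boltzmann factor over a basin $E(x) = E_{\min} + k(x - x_0)^2$ gives $Z = e^{-E_{\min}/kT}\sqrt{\pi kT/k}$ and $F = -kT \ln Z$; at kT ≈ 0.593 kcal/mol (about 298 K) the broad, higher-energy basin B ends up with the lower free energy and the larger population.

```python
import math

KT = 0.593  # kcal/mol, roughly room temperature (298 K)


def harmonic_basin_free_energy(e_min: float, k: float, kt: float = KT) -> float:
    """F = -kT ln Z for a 1-D harmonic basin E(x) = e_min + k (x - x0)^2 integrated over all x."""
    z = math.exp(-e_min / kt) * math.sqrt(math.pi * kt / k)
    return -kt * math.log(z)


def populations(free_energies, kt: float = KT):
    """Equilibrium basin populations p_i proportional to exp(-F_i / kT)."""
    weights = [math.exp(-f / kt) for f in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical basins: A is deeper but narrow, B is shallower but broad.
f_a = harmonic_basin_free_energy(e_min=-2.0, k=100.0)
f_b = harmonic_basin_free_energy(e_min=-1.0, k=1.0)
p_a, p_b = populations([f_a, f_b])
```

The population ratio is $p_B/p_A = 10\,e^{-1/kT} \approx 1.85$: the factor of 10 from the basin widths outweighs the Boltzmann penalty of the 1 kcal/mol higher minimum.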
Introduction


In their effort to locate the global solution, deterministic global optimization algorithms, like αBB [1,2,6,14], employ a branch and bound framework. During this process, convex underestimation techniques are used to formulate relaxed convex problems that can be solved to optimality with local solvers, thus providing valid lower bounds for the original problem. The tightness of the underestimators used is of fundamental importance for the computational performance of these algorithms, since a tighter relaxation can lead to faster fathoming and fewer nodes of the branch and bound tree to be visited [7]. A recent review article on deterministic global optimization approaches can be found in [8].

In the case of arbitrary nonconvex functions that do not exhibit an exploitable mathematical structure, the αBB general underestimator [3,6] can be used:

$$
L(x) = f(x) - \sum_{v=1}^{V} \alpha_v \,(x_v - x_v^L)(x_v^U - x_v). \tag{1}
$$


Originally introduced in [14], this underestimator derives from the function by subtracting a nonnegative quadratic term ($\alpha_v \ge 0 \; \forall v$). Given sufficiently large values of the $\alpha_v$ parameters, all nonconvexities in the original function f(x) can be overpowered, resulting in a convex underestimator L(x) that is valid for the entire domain $[x^L, x^U]$. A number of rigorous methods have been devised to select appropriate values for these parameters [2,3,13]. Extensive computational testing of the algorithm [1] showed that the most efficient of those methods is the one based on the scaled Gerschgorin theorem. According to this method, it suffices to select:


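$$
\alpha_v = \max\Bigl\{0,\; -\tfrac{1}{2}\Bigl(h_{vv}^L - \sum_{u \ne v} \max\{|h_{vu}^L|, |h_{vu}^U|\}\,\frac{x_u^U - x_u^L}{x_v^U - x_v^L}\Bigr)\Bigr\} \tag{2}
$$

where $h_{vu}^L$ and $h_{vu}^U$ are lower and upper bounds of $\partial^2 f / \partial x_v \partial x_u$ that can be calculated by interval analysis.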
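A minimal sketch of (1) and (2) follows. It assumes the interval bounds of the Hessian entries are already available (here they are exact constants supplied by hand rather than computed by interval analysis), and the function names are hypothetical. For $f(x) = x_1 x_2$ on $[-1, 1]^2$ the rule gives $\alpha_1 = \alpha_2 = 1/2$ and $L(x) = \tfrac{1}{2}(x_1 + x_2)^2 - 1$, which is convex and touches f at the minimizers (1, −1) and (−1, 1).

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def gerschgorin_alphas(h_lo: Sequence[Vector], h_hi: Sequence[Vector],
                       xl: Vector, xu: Vector) -> List[float]:
    """alpha_v from the scaled Gerschgorin rule (2), given h_lo <= d2f/dx_v dx_u <= h_hi on the box."""
    n = len(xl)
    d = [xu[v] - xl[v] for v in range(n)]
    alphas = []
    for v in range(n):
        off_diag = sum(max(abs(h_lo[v][u]), abs(h_hi[v][u])) * d[u] / d[v]
                       for u in range(n) if u != v)
        alphas.append(max(0.0, -0.5 * (h_lo[v][v] - off_diag)))
    return alphas


def alpha_bb_underestimator(f: Callable[[Vector], float], alphas: Vector,
                            xl: Vector, xu: Vector) -> Callable[[Vector], float]:
    """Return L(x) = f(x) - sum_v alpha_v (x_v - x_v^L)(x_v^U - x_v), as in (1)."""
    def underestimator(x: Vector) -> float:
        return f(x) - sum(a * (xi - lo) * (hi - xi)
                          for a, xi, lo, hi in zip(alphas, x, xl, xu))
    return underestimator


# Example: f(x) = x1 * x2 on [-1, 1]^2, whose Hessian is [[0, 1], [1, 0]] everywhere.
f = lambda x: x[0] * x[1]
xl, xu = [-1.0, -1.0], [1.0, 1.0]
hess = [[0.0, 1.0], [1.0, 0.0]]
alphas = gerschgorin_alphas(hess, hess, xl, xu)  # [0.5, 0.5]
L = alpha_bb_underestimator(f, alphas, xl, xu)   # equals 0.5 * (x1 + x2)**2 - 1
```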

Alternatively, one could use a new class of general-purpose convex underestimators developed by Akrotirianakis and Floudas [4,5]. These underestimators are derived in a similar fashion, by subtracting an exponential term from the original function, that is:

$$
L(x) = f(x) - \sum_{v=1}^{V} \bigl(1 - e^{\gamma_v (x_v - x_v^L)}\bigr)\bigl(1 - e^{\gamma_v (x_v^U - x_v)}\bigr). \tag{3}
$$


An iterative systematic procedure is used to determine the values of the $\gamma_v$ parameters so that the underestimating function is convex. The procedure also ensures that the resulting underestimator L(x) of (3) is tighter than the underestimator L(x) of (1) that results from the original method. Floudas and Kreinovich [9,10] have in fact shown that these two functional forms (original quadratic and exponential) are the only optimal ones, since they are the only ones that are shift-, sign- and scale-invariant.

Maranas and Floudas [14] showed that the maximum separation distance between the original function f(x) and the underestimator L(x) of (1) is a quadratic function of interval length. Because of this, and because of potentially less overestimation in the interval extension of the Hessian matrix elements $h_{vu}$, the underestimator becomes tighter as the domain under consideration shrinks. This was first exploited by Meyer and Floudas [15], who used a piecewise approach. Their method partitions the domain into many subdomains and constructs the corresponding αBB underestimator for each one of them. These underestimators, although not valid for the entire domain, are much tighter in their respective subdomains. A hyperplane is subsequently added to each one of these underestimators and is selected in such a way that the combination of all these convex pieces results in an overall convex underestimator that is continuous and smooth ($C^1$-continuity).
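The quadratic dependence on interval length can be read off (1) directly: for fixed $\alpha_v$ the separation $f(x) - L(x)$ is separable in the variables, and each term $(x_v - x_v^L)(x_v^U - x_v)$ peaks at the interval midpoint, where it equals $(x_v^U - x_v^L)^2/4$:

$$
\max_{x \in [x^L, x^U]} \bigl(f(x) - L(x)\bigr)
  = \sum_{v=1}^{V} \alpha_v \max_{x_v^L \le x_v \le x_v^U} (x_v - x_v^L)(x_v^U - x_v)
  = \frac{1}{4} \sum_{v=1}^{V} \alpha_v \,(x_v^U - x_v^L)^2 .
$$

Halving every interval therefore divides the maximum separation by four for unchanged $\alpha_v$, and the $\alpha_v$ themselves can only decrease on a smaller box.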

This entry describes the work of [11,12] on the development of tight convex underestimators. The construction of these underestimators is based on a piecewise application of the αBB underestimator, in a fashion similar to the p-αBB approach [15], but, instead of adding hyperplanes, we identify those supporting line segments that have to be combined with convex parts of the original underestimators so as to form a $C^1$-continuous convex underestimator that is valid for the overall domain under consideration. One can also consider only the lines defined by these linear segments, thus obtaining a piecewise linear underestimator that can easily be incorporated in the NLP relaxation as a set of linear constraints.

In their work, Gounaris and Floudas [12] also demonstrated how the high-quality results of the approach in the univariate case can be used to extend its applicability to functions with a higher number of variables. This is achieved by proper projections of the multivariate αBB underestimators onto selected two-dimensional planes. Furthermore, since the method utilizes projections into lower-dimensional spaces, they explored ways to recover some of the information lost in this process. In particular, they apply the method after having transformed the original problem in an orthonormal fashion. This leads to the construction of even tighter underestimators, through the accumulation of additional valid linear cuts in the relaxation.


Theoretical Results for Univariate Functions 


Let f(x) be a univariate function that needs to be underestimated in $D = [x^L, x^U]$. We select an integer $N \ge 1$ and partition the complete domain into N segments of equal length. Thus, the i-th subdomain is defined as $D_i = [x^{i-1}, x^i]$, where:

$$
x^i = x^L + \frac{i}{N}\,(x^U - x^L), \qquad i = 0, 1, \ldots, N.
$$

For every subdomain $D_i$, $i = 1, 2, \ldots, N$, we construct the corresponding αBB underestimator:

$$
P_i(x) = f(x) - \alpha^i \,(x - x^{i-1})(x^i - x), \qquad \alpha^i = \max\Bigl\{0,\; -\tfrac{1}{2}\, f''^{\,i}_{\min}\Bigr\} \tag{4}
$$

where $f''^{\,i}_{\min}$ is a lower bound of the second derivative that is valid for the entire subdomain $D_i$.

Note that although an underestimator $P_i(x)$ can be defined outside its respective subdomain, its convexity is only guaranteed for $x \in [x^{i-1}, x^i]$.

We define P(x), $x \in [x^L, x^U]$, to be the following branched function:

$$
P(x) \triangleq P_i(x), \qquad x \in [x^{i-1}, x^i], \quad i = 1, 2, \ldots, N. \tag{5}
$$
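The construction of (4) and (5) can be sketched for $f_1(x) = (3x - 1.4)\sin(18x) + 1.7$, one of the test functions of Fig. 1. Assumptions: the domain [0, 1] is chosen for illustration, and instead of interval arithmetic the bound $f''^{\,i}_{\min}$ is derived by hand from $f_1''(x) = 108\cos(18x) - 324(3x - 1.4)\sin(18x)$, bounding each term by its largest magnitude on $D_i$. The convexification step (the "inner" and "outer" algorithms of [11]) is not reproduced; `piecewise_alpha_bb` and `f1_d2_lower` are hypothetical names.

```python
import math
from typing import Callable, List, Tuple


def f1(x: float) -> float:
    return (3.0 * x - 1.4) * math.sin(18.0 * x) + 1.7


def f1_d2_lower(a: float, b: float) -> float:
    """Valid (crude) lower bound of f1''(x) = 108 cos(18x) - 324 (3x - 1.4) sin(18x) on [a, b]."""
    return -108.0 - 324.0 * max(abs(3.0 * a - 1.4), abs(3.0 * b - 1.4))


def piecewise_alpha_bb(f: Callable[[float], float],
                       d2_lower: Callable[[float, float], float],
                       xl: float, xu: float, n: int
                       ) -> Tuple[List[float], List[float], Callable[[float], float]]:
    """Breakpoints x^0..x^N, parameters alpha^1..alpha^N of (4), and the branched function P of (5)."""
    pts = [xl + i * (xu - xl) / n for i in range(n + 1)]
    alphas = [max(0.0, -0.5 * d2_lower(pts[i - 1], pts[i])) for i in range(1, n + 1)]

    def p(x: float) -> float:
        # Index i of the subdomain D_i = [x^{i-1}, x^i] containing x (x^U belongs to D_N).
        i = min(max(int((x - xl) / (xu - xl) * n) + 1, 1), n)
        return f(x) - alphas[i - 1] * (x - pts[i - 1]) * (pts[i] - x)

    return pts, alphas, p
```

Because $\alpha^i \ge -\tfrac{1}{2} f''^{\,i}_{\min}$, each piece satisfies $P_i'' = f'' + 2\alpha^i \ge 0$ on $D_i$, while the subtracted term is nonnegative there, so $P \le f$ throughout; P itself is generally not convex across breakpoints, which is why the convexification step is needed.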


Function P(x) is a piecewise convex, valid underestimator of f(x). Since it is not convex in general, a convexification technique has to be employed. The proposed technique involves the identification of those supporting line segments that are required for an overall underestimator U(x). The technique is based on two algorithms, called "inner" and "outer", which are described in detail in [11].

The underestimator U(x) consists of the identified linear parts, as well as convex parts of the underestimators $P_i(x)$; therefore it is a $C^1$-continuous branched function. This might pose some computational complications if the lower bounding (relaxation) problem is to be solved by local optimization solvers that require $C^2$-continuity. To avoid this problem, one can take into account only the lines defined by the line segments. According to this alternative, we first identify the linear segments needed for the construction of underestimator U(x), but we consider them as lines defined in $[x^L, x^U]$. Let there be K such lines, denoted as $T_k(x)$, $k = 1, 2, \ldots, K$, and arranged in order of ascending slope. If applicable, this set can be augmented with lines that are tangential to $P_1$ and $P_N$ at the respective domain edges $x^L$ and $x^U$.

Each of these lines $T_k$ is a valid underestimator of function f(x) across the whole domain. We define the function V(x) to be the pointwise maximum of all these lines. V(x) is convex, since it is the pointwise maximum of linear functions, and it is an underestimator, since each of the lines it is built from is itself a valid underestimator. At the expense of some tightness (in the regions where underestimator U(x) consisted of convex parts), we now have a piecewise linear underestimator V(x) that can be incorporated in the relaxation as a set of linear constraints. The whole lower bounding problem can now be formulated as a linear programming problem (LP).
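Once the K lines are known, the LP relaxation (minimize μ subject to $\mu \ge T_k(x)$ for all k and $x^L \le x \le x^U$) is easy to solve in one dimension, because the minimum of the convex piecewise linear V over the interval lies at a domain edge or at an intersection of two lines. The sketch below enumerates those candidates instead of calling an LP solver; the lines are hypothetical inputs, since identifying them is the job of the algorithms in [11], and the function names are illustrative.

```python
from itertools import combinations
from typing import List, Tuple

Line = Tuple[float, float]  # (slope, intercept): T(x) = slope * x + intercept


def v_of_x(lines: List[Line], x: float) -> float:
    """V(x): pointwise maximum of the supporting lines."""
    return max(s * x + c for s, c in lines)


def lp_lower_bound(lines: List[Line], xl: float, xu: float) -> Tuple[float, float]:
    """Solve min mu s.t. mu >= T_k(x) for all k, xl <= x <= xu, returning (x, mu).

    V is convex and piecewise linear, so its minimum over [xl, xu] is attained at a
    domain edge or at a breakpoint, i.e. an intersection of two lines.
    """
    candidates = [xl, xu]
    for (s1, c1), (s2, c2) in combinations(lines, 2):
        if s1 != s2:
            x = (c2 - c1) / (s1 - s2)
            if xl <= x <= xu:
                candidates.append(x)
    best = min(candidates, key=lambda x: v_of_x(lines, x))
    return best, v_of_x(lines, best)
```

For example, the lines $-x$ and $x - 2$ on [0, 3] give the lower bound −1 at x = 1. Enumeration costs $O(K^2)$ evaluations, which is adequate for a sketch; in a real relaxation the lines are simply passed to an LP solver as constraints.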


Tightness of Univariate Underestimator 


It is apparent that as the level of partitioning increases, the underestimator P(x) comes closer to the function, and therefore the convex underestimators U(x) and V(x) approach the convex envelope of f(x). Gounaris and Floudas [11] proved the following two theorems, which are relevant to the tightness of the resulting underestimators in the univariate case:


Theorem 1. There is some finite partitioning level N for which the convex underestimator U(x) is the convex envelope of function f(x).


Theorem 2. There is some finite partitioning level N for which underestimator V(x) is ε-close to underestimator U(x), that is:

$$
\max_{x \in [x^L, x^U]} \{U(x) - V(x)\} \le \varepsilon \tag{6}
$$

where ε > 0 is an arbitrarily small constant.


Since these univariate underestimators are very tight, the remaining question is whether we can exploit them to construct underestimators of functions in higher dimensions. Gounaris and Floudas [12] presented some extensions of the method to multivariate functions that involve dimension reduction of the problem through proper projections into lower-dimensional spaces. These extensions are described in Sect. "Extension to Multivariate Functions".


Extension to Multivariate Functions 


Let f(x) be a function of V variables that needs to be underestimated in a box domain $D = [x_1^L, x_1^U] \times \cdots \times [x_V^L, x_V^U]$. We choose integers $N_v \ge 1$, $v = 1, 2, \ldots, V$, and partition each range $[x_v^L, x_v^U]$ into $N_v$ segments of equal length. Thus, the j-th segment of the v-th set is defined as $[x_v^{j-1}, x_v^j]$, where $x_v^j = x_v^L + \frac{j}{N_v}(x_v^U - x_v^L)$, $j = 0, 1, \ldots, N_v$. The complete V-dimensional domain D has now been partitioned into $N = \prod_{v=1}^{V} N_v$ box subdomains of equal measure. Let $D_i$ be such a V-dimensional subdomain. It is uniquely defined by a set of indices $i_v$, $1 \le i_v \le N_v$, $\forall v = 1, 2, \ldots, V$. Thus, the i-th subdomain is defined as $D_i = [x_1^{i_1 - 1}, x_1^{i_1}] \times \cdots \times [x_V^{i_V - 1}, x_V^{i_V}]$.
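For every subdomain $D_i$, $i = 1, 2, \ldots, N$, we construct the corresponding αBB underestimator [1,2,3,6,14]:

$$
\begin{aligned}
P_i(x) &= f(x) - \sum_{v=1}^{V} \alpha_v^i \,(x_v - x_v^{i_v - 1})(x_v^{i_v} - x_v), \\
\alpha_v^i &= \max\Bigl\{0,\; -\tfrac{1}{2}\Bigl(h_{vv}^{L,i} - \sum_{u \ne v} \max\{|h_{vu}^{L,i}|, |h_{vu}^{U,i}|\}\,\frac{x_u^{i_u} - x_u^{i_u - 1}}{x_v^{i_v} - x_v^{i_v - 1}}\Bigr)\Bigr\}
\end{aligned}
\tag{7}
$$

where $h_{vu}^{L,i}$ and $h_{vu}^{U,i}$ are respectively lower and upper bounds of $\partial^2 f / \partial x_v \partial x_u$ that are valid for the entire subdomain $D_i$.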

Note that although an underestimator $P_i(x)$ can be defined outside its respective subdomain, its convexity is only guaranteed for $x \in D_i$.

We select a variable w, $1 \le w \le V$, which we designate to be the active variable, and enumerate all $M_w = N / N_w$ permutations of indices $i_v$, $v \ne w$. Every such permutation m, $1 \le m \le M_w$, corresponds to a subdomain $D_{w,m} = [x_w^L, x_w^U] \times \prod_{v \ne w} [x_v^{i_v - 1}, x_v^{i_v}]$, which can be further divided into $N_w$ subdomains $D_{w,m,j} = [x_w^{j-1}, x_w^j] \times \prod_{v \ne w} [x_v^{i_v - 1}, x_v^{i_v}]$, $j = 1, 2, \ldots, N_w$. These subdomains belong to the set of the original subdomains $D_i$ (for $i_w = j$), and therefore each one has an underestimator $P_{w,m,j}(x)$ associated with it, that is:

$$
P_{w,m,j}(x) = f(x) - \alpha_w^i \,(x_w - x_w^{j-1})(x_w^j - x_w) - \sum_{\substack{v=1 \\ v \ne w}}^{V} \alpha_v^i \,(x_v - x_v^{i_v - 1})(x_v^{i_v} - x_v) \tag{8}
$$

where index i satisfies $D_i = D_{w,m,j}$ and the parameters $\alpha_v^i$, $v = 1, 2, \ldots, V$, are calculated according to (7).

For every such subdomain $D_{w,m,j}$, $j = 1, 2, \ldots, N_w$, we define the following univariate function:

$$
G_{w,m,j}(x_w) = \min_{x_v,\; v \ne w} P_{w,m,j}(x), \qquad x_w^{j-1} \le x_w \le x_w^j \tag{9}
$$


Since they correspond to the minimum of a convex function over a subset of its variables, these functions are convex. Furthermore, each one is defined over a different segment of $[x_w^L, x_w^U]$. Therefore, each one can be considered as a convex piece of an overall piecewise convex underestimator. The latter is fully suitable for application of the convex underestimation method for univariate functions described in the previous sections.

Let $V_{w,m}(x_w)$ be the piecewise linear underestimator obtained by the univariate method, and let it be the pointwise maximum of $K_{w,m}$ associated lines, that is:

$$
V_{w,m}(x_w) = \max\{T_{w,m,k}(x_w),\; \forall k = 1, 2, \ldots, K_{w,m}\}, \qquad x_w^L \le x_w \le x_w^U \tag{10}
$$


Without loss of generality, let us assume that the lines $T_{w,m,k}$ are arranged in order of ascending slope, that is, $\mathrm{slope}(T_{w,m,(k-1)}) < \mathrm{slope}(T_{w,m,k})$, $k = 2, 3, \ldots, K_{w,m}$, and that the set already includes the potential augmented tangents at the domain edges, designated earlier as $T_0$ and $T_{K+1}$.

The univariate underestimator $V_{w,m}(x_w)$ could, in principle, be considered as a multivariate function that depends on only one variable, $x_w$, and is defined over the whole V-dimensional subdomain $D_{w,m}$. That is:

$$
V_{w,m}(x_w) = V_{w,m}(x), \qquad x \in D_{w,m} \tag{11}
$$


Function $V_{w,m}(x)$ is piecewise affine and consists of segments of V-dimensional hyperplanes. Since these hyperplanes depend only on the w-th variable, they are parallel to all standard basis vectors $e_v$ with the exception of $e_w$ (to which they are parallel only if the slope of the corresponding line $T_{w,m,k}$ is zero). This function is a valid underestimator for the original function f(x) across the whole subdomain $D_{w,m}$.

Applying the aforementioned procedure for every permutation $m = 1, 2, \ldots, M_w$, we obtain a collection of such underestimating segments, each of which is a valid underestimator for the function f(x) across a subset of its original domain D. To develop a convex underestimator that is valid for the whole domain, we have to combine all these segments. Let m = 0 denote the combination of all permutations $m = 1, 2, \ldots, M_w$. This combination can be achieved back in the projection space, by computing the lower hull of the set of all underestimators $V_{w,m}(x_w)$. In fact, one needs to consider only the vertex points of each underestimator $V_{w,m}(x_w)$ (that is, the points of intersection between two lines $T_{w,m,(k-1)}$ and $T_{w,m,k}$), as well as their end points $(x_w^L, T_{w,m,1}(x_w^L))$ and $(x_w^U, T_{w,m,K_{w,m}}(x_w^U))$. Any standard 2-D convex hull algorithm (e.g., Graham scan) can be used for this purpose. The lower hull is a convex piecewise linear function $V_{w,0}(x_w)$, and it is the pointwise maximum of $K_{w,0}$ lines, that is:

$$
V_{w,0}(x_w) = \max\{T_{w,0,k}(x_w),\; \forall k = 1, 2, \ldots, K_{w,0}\}, \qquad x_w^L \le x_w \le x_w^U \tag{12}
$$
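The lower-hull step can be sketched in a few lines. The sketch uses Andrew's monotone chain, a close relative of the Graham scan (an illustrative choice, not necessarily the algorithm used in [12]); its input is assumed to be the vertex and end points of all $V_{w,m}$ in the $(x_w, \text{value})$ plane, and its output lines play the role of the $T_{w,0,k}$ in (12). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull, ordered left to right, by Andrew's monotone chain."""
    lowest: Dict[float, float] = {}
    for x, y in points:  # at a repeated x only the lowest value can lie on the lower hull
        lowest[x] = min(y, lowest.get(x, y))
    hull: List[Point] = []
    for p in sorted(lowest.items()):
        # Drop the last vertex while it does not make a strict counter-clockwise turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_lines(hull: List[Point]) -> List[Tuple[float, float]]:
    """(slope, intercept) of the lines through consecutive lower-hull vertices."""
    lines = []
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        s = (y2 - y1) / (x2 - x1)
        lines.append((s, y1 - s * x1))
    return lines
```

Keeping only the lowest point per abscissa matters here, because different permutations m share the end points $x_w^L$ and $x_w^U$ with different values; without it a vertical hull edge (and a division by zero) could appear.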


By construction, this function is a convex underestimator of all pieces $G_{w,m,j}(x_w)$ for all permutations, that is:

$$
V_{w,0}(x_w) \le G_{w,m,j}(x_w), \qquad x_w \in [x_w^{j-1}, x_w^j], \quad \forall j = 1, 2, \ldots, N_w, \; \forall m = 1, 2, \ldots, M_w \tag{13}
$$

Therefore, function $V_{w,0}(x_w)$, if considered as $V_{w,0}(x)$, is a valid underestimator for function f(x) across its whole original domain D.

For any selection of the active variable w, the method will yield a convex (piecewise affine) underestimator that is valid for the whole domain of interest, D. However, the method can be applied independently with every variable being active (one at a time), leading to a collection of valid underestimators. The pointwise maximum of all these is itself a valid convex underestimator, and it is at least as tight as any of them. Thus, the resulting underestimator is:

$$
V(x) = \max\{V_{w,0}(x),\; \forall w = 1, 2, \ldots, V\}, \qquad x \in D \tag{14}
$$


Note that the underestimator V(x) is also piecewise hyperplanar and can be represented in the problem relaxation as a set of linear constraints. Since we do not know explicitly which hyperplanes $T_{w,0,k}(x_w) \equiv T_{w,0,k}(x)$, $k = 1, 2, \ldots, K_{w,0}$, $w = 1, 2, \ldots, V$, contribute to the overall underestimator V(x), all of them should be included in this relaxation, even though some may end up being redundant.

Since our method produces piecewise affine underestimators L = V, the resulting convex relaxation is just a linear programming problem (LP), which takes the form of (15).


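$$
\begin{aligned}
\min_{x,\,\mu} \quad & \mu \\
\text{s.t.} \quad & \mu \ge T^{0}_{w,0,k}(x_w), && \forall k = 1, \ldots, K^{0}_{w,0}, \; \forall w = 1, \ldots, V \\
& T^{q}_{w,0,k}(x_w) \le 0, && \forall k = 1, \ldots, K^{q}_{w,0}, \; \forall w = 1, \ldots, V, \; \forall q = 1, \ldots, Q
\end{aligned}
\tag{15}
$$

Here the superscript 0 refers to the lines underestimating the objective function and q = 1, ..., Q to those underestimating the Q nonconvex constraint functions.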


Domain Rotation 


The methodology presented in Sect. "Extension to Multivariate Functions" involves the minimization of underestimators $P_{w,m,j}(x)$ over all their variables with the exception of one, variable $x_w$, which is designated as "active". Whenever such a projection into spaces of lower dimensionality is involved, some useful information may be lost. Some of this lost information will be recovered if we opt to apply the methodology with every variable being "active", one at a time, which basically calls for projecting onto V different two-dimensional planes, each one parallel to a different basis vector $e_v$, $v = 1, 2, \ldots, V$. However, since there is a finite number of variables in our problem, there is a limited number of planes onto which we can project. If we want to further enhance the collection of underestimators that we will eventually accumulate in the relaxation (and thus improve our chances of a tighter lower bound), we will have to project onto additional planes that correspond not to a variable "natural" to the problem but to some linear combination of the variables.

Global Optimization: Tight Convex Underestimators, Figure 1
Univariate functions $f_1$ to $f_4$ with underestimators V(x) for three different partitioning levels (N = 24, 36 and 48)

This can be achieved by applying an orthonormal transformation to the problem's variable space, that is:

$$
x \mapsto x' = R \cdot x. \tag{16}
$$

This transformation has to be orthogonal, which means that it should preserve the lengths of vectors and the angles between vectors. Furthermore, it should be an orientation-preserving transformation. A V × V matrix R that provides such a transformation is called a rotation matrix and has to be a member of the special orthogonal group, that is:

$$
R \in SO(V) \iff R^{T} = R^{-1} \;\text{ and }\; |R| = +1. \tag{17}
$$


In their work, Gounaris and Floudas [12] discuss the selection of a suitable such matrix. They rigorously address the issue of selecting a suitable "rotated" domain and a suitable level of partitioning, and they also present a method to calculate appropriate values for the α parameters in the transformed counterpart of the problem.
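As a small check of conditions (17), the sketch below builds a Givens (plane) rotation, which rotates by an angle θ in the plane of two coordinates and is a standard member of SO(V), and verifies $R^T R = I$ and $|R| = +1$ numerically. It is an illustration only, not the matrix-selection procedure of [12]; the helper names are hypothetical, and angles such as π/16 are used merely because rotation resolutions of that size appear in the examples.

```python
import math
from typing import List

Matrix = List[List[float]]


def givens_rotation(v: int, p: int, q: int, theta: float) -> Matrix:
    """V x V rotation by angle theta in the (x_p, x_q) coordinate plane; an element of SO(V)."""
    r = [[1.0 if i == j else 0.0 for j in range(v)] for i in range(v)]
    c, s = math.cos(theta), math.sin(theta)
    r[p][p], r[p][q] = c, -s
    r[q][p], r[q][q] = s, c
    return r


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def determinant(a: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in a]
    n, det = len(m), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-15:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return det


def is_rotation(r: Matrix, tol: float = 1e-12) -> bool:
    """Check the SO(V) conditions of (17): R^T R = I and det R = +1."""
    n = len(r)
    rtr = matmul(transpose(r), r)
    orthogonal = all(abs(rtr[i][j] - (1.0 if i == j else 0.0)) <= tol
                     for i in range(n) for j in range(n))
    return orthogonal and abs(determinant(r) - 1.0) <= tol
```

Products of Givens rotations are again rotations, so a general element of SO(V) can be assembled from plane rotations, whereas a reflection such as diag(1, 1, −1) is orthogonal but fails the determinant condition.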


Examples 


Figure 1 depicts the plots for four nonlinear univariate functions: $f_1(x) = (3x - 1.4)\sin(18x) + 1.7$, $f_2(x) = x^2 - \cos(18x)$, $f_3(x) = (x + \sin x)\,e^{-x^2}$ and $f_4(x) = -\sum_{k=1}^{5} k \sin[(k+1)x + k]$. The underestimators presented correspond to partitioning into N = 24, 36 and 48 subdomains (increasing tightness).

Global Optimization: Tight Convex Underestimators, Figure 2
Piecewise planar underestimators of bivariate functions


Figure 2 panel data:
N = 32 × 32, Δθ = π/16: total linear cuts = 309; global minimum = −1.03163; lower bound = −1.03164; αBB lower bound = −6.04
N = 32 × 32, Δθ = π/8: total linear cuts = 191; global minimum = 8 × 10^…; lower bound = −7 × 10^…; αBB lower bound = −0.69


Figure 2 depicts plots for four nonlinear bivariate functions. For each case, $N_1 \times N_2$ is the level of partitioning used and Δθ is the resolution of domain rotation. Some additional information regarding the improvement of the lower bound, as well as the number of linear cuts that have to be accumulated in the relaxation, is also included.
A large number of decision problems in the world of applications may be formulated as searching for a constrained global optimum (minimum, for definiteness)

$$
\varphi^* = \varphi(y^*) = \min\{\varphi(y) : y \in D,\; g_i(y) \le 0,\; 1 \le i \le m\},
$$

where the domain of search (DS) is

$$
D = \{y \in \mathbb{R}^N : -2^{-1} \le y_j \le 2^{-1},\; 1 \le j \le N\},
$$

$\mathbb{R}^N$ is the N-dimensional Euclidean space, and the objective function φ(y) (henceforth denoted $g_{m+1}(y)$) and the left-hand sides $g_i(y)$, $1 \le i \le m$, of the constraints are Lipschitzian (with respective constants $L_i$, $1 \le i \le m + 1$) and may be multi-extremal.

If the DS is defined by the hyperparallelepiped

$$
S = \{w \in \mathbb{R}^N : a_j \le w_j \le b_j,\; 1 \le j \le N\},
$$

then, by introducing the transformation

$$
y_j = \bigl(w_j - (a_j + b_j)/2\bigr)/\rho, \qquad \rho = \max\{b_j - a_j : 1 \le j \le N\},
$$

and the extra constraint

$$
g_0(y) = \max\Bigl\{|y_j| - \frac{b_j - a_j}{2\rho} : 1 \le j \le N\Bigr\} \le 0,
$$

it is possible to retain the standard representation D of the DS without altering the relative Lipschitz properties across dimensions.

The assumption that the divided differences of the functions $g_i$, $0 \le i \le m + 1$, are bounded by the respective constants $L_i$ (Lipschitzian property), which may be interpreted as a mathematical description of the limited power of change in real systems, provides a basis for estimating φ* and y* by exploring the DS with a finite number of trials depending on the desired accuracy of search. This Lipschitzian approach ([2,5,9,20]) requires, in general, substantially fewer trials than the plain uniform grid technique, owing to the careful selection of each subsequent trial, which takes into account all the previously computed function values.

Such a selection amounts to solving, at each step of the search process, an auxiliary multidimensional optimization problem (MOP) whose multi-extremality increases with the accumulation of trial outcomes. But the case N = 1 is effectively solvable and, therefore, it is of interest to reduce the MOP to its one-dimensional equivalent.
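Why the case N = 1 is tractable can be illustrated with Piyavskii's saw-tooth method, one classical univariate Lipschitzian algorithm (chosen here for illustration; it is not necessarily the method of [2,5,9,20]). Every trial value $z_i$ together with the constant L defines the cone $z_i - L|x - x_i|$ below f, and each new trial is placed at the lowest point of the maximum of all cones, so the selection uses every previously computed value. The test function $\sin x + \sin(10x/3)$ on [2.7, 7.5] and the bound L = 4.5 (valid, since $|f'| \le 1 + 10/3$) are illustrative assumptions, as is the name `piyavskii`.

```python
import math
from typing import Callable, Tuple


def piyavskii(f: Callable[[float], float], a: float, b: float, lipschitz: float,
              eps: float = 1e-4, max_trials: int = 10_000) -> Tuple[float, float, float]:
    """Minimize f on [a, b] with Piyavskii's saw-tooth minorant method.

    `lipschitz` must bound the Lipschitz constant of f from above. Returns
    (x_best, f_best, lower_bound) with lower_bound <= min f <= f_best.
    """
    xs, zs = [a, b], [f(a), f(b)]
    lower_bound = -math.inf

    def r(i: int) -> float:
        # Minimum of max(z_i - L(x - x_i), z_{i+1} - L(x_{i+1} - x)) over [x_i, x_{i+1}].
        return 0.5 * (zs[i] + zs[i + 1]) - 0.5 * lipschitz * (xs[i + 1] - xs[i])

    while len(xs) < max_trials:
        best_i = min(range(len(xs) - 1), key=r)
        lower_bound = r(best_i)
        if min(zs) - lower_bound <= eps:
            break
        # The next trial goes where the two bounding cones of the chosen interval intersect.
        x_new = (0.5 * (xs[best_i] + xs[best_i + 1])
                 + (zs[best_i] - zs[best_i + 1]) / (2.0 * lipschitz))
        xs.insert(best_i + 1, x_new)
        zs.insert(best_i + 1, f(x_new))
    k = min(range(len(zs)), key=zs.__getitem__)
    return xs[k], zs[k], lower_bound


g = lambda x: math.sin(x) + math.sin(10.0 * x / 3.0)
x_best, f_best, lb = piyavskii(g, 2.7, 7.5, lipschitz=4.5)
```

The trials cluster near the global minimizer and stay sparse elsewhere, which is the economy over a uniform grid described above; the returned lower bound certifies the accuracy of the answer.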

A possible way to do so ([1,7,11,12,14,15,18]) is to employ single-valued Peano curves $y(x)$ that continuously map the unit interval [0, 1] on the x-axis onto the hypercube D. This yields the equality

$$\varphi^* = \varphi(y(x^*)) = \min\{\varphi(y(x)) : x \in [0,1],\ g_i(y(x)) \le 0,\ 1 \le i \le m\}. \tag{1}$$

These curves, first introduced in [4,8], 'fill' the cube, i.e. they pass through every point of D. This property gave rise to the term space filling curves (SFC); see the survey [10].

The construction of SFC can be explained by following the scheme from [4]. Divide D into $2^N$ equal hypercubes of the 'first partition' by cutting D with a set of N mutually orthogonal hyperplanes. Each hyperplane is parallel to one of the coordinate hyperplanes and passes through the midpoints of the edges of D orthogonal to it. Then divide (in the same manner) each of the obtained first-partition cubes into $2^N$ second-partition cubes. Continuing this process, i.e. successively cutting each cube of the current partition into $2^N$ cubes of the subsequent partition, yields the hypercubes of any Mth partition, with edge length equal to $2^{-M}$. The total number of cubes in the Mth partition is equal to $2^{MN}$.

Next, cut the interval [0, 1] into $2^N$ equal parts. Then, once again, cut each of these parts into $2^N$ smaller (equal) parts, and so on. Denote by d(M, v) the subinterval of the Mth partition whose left endpoint has the coordinate v. The length of d(M, v) is equal to $2^{-MN}$. Assume that $v \in d(M, v)$, but that the right endpoint of this subinterval (if it is not equal to 1) does not belong to it.

Establish a one-to-one correspondence between all subintervals of any particular Mth partition and all subcubes of the Mth partition. Henceforth, the notation D(M, v) stands for the subcube corresponding to the subinterval d(M, v), and vice versa. Assume that this correspondence satisfies the following conditions:

• $D(M+1, v) \subset D(M, v')$ if and only if $d(M+1, v) \subset d(M, v')$.

• d(M, v') and d(M, v'') have a common endpoint (which is either v' or v'') if and only if D(M, v') and D(M, v'') have a common face (i.e. these subcubes are contiguous).

Now a single-valued continuous map y(x) is defined by introducing a third requirement:

• If $x \in d(M, v)$, then $y(x) \in D(M, v)$, for $M \ge 1$.

Note that for any integer $M \ge 1$ and any given $x \in [0, 1]$ there is exactly one subinterval satisfying $x \in d(M, v)$. Continuity is a consequence of the first two conditions.


Approximation of SFC 
The center $y^M(x)$ of the subcube D(M, v) containing y(x) may be interpreted as an approximation to y(x). The inequalities

$$\max\{|y_j(x) - y^M_j(x)| : 1 \le j \le N\} \le 2^{-(M+1)},\qquad x \in [0,1],$$

reflect the accuracy attainable for any particular preset value of M.

A constructive way of establishing the above correspondence is described and substantiated in [3,12,15,19]. In short, it can be presented as follows. Introduce the auxiliary hypercube

$$\Delta = \{y \in \mathbb{R}^N : -0.5 \le y_i \le 1.5,\ 1 \le i \le N\}$$

and denote by $\Delta(s)$, $0 \le s \le 2^N - 1$, the subcubes of the first partition of $\Delta$. The centers of the $\Delta(s)$, referred to as u(s), are N-dimensional binary vectors defined by the relations

$$u_i(s) = (\beta_i + \beta_{i-1}) \bmod 2,\quad 1 \le i < N;\qquad u_N(s) = \beta_{N-1}, \tag{2}$$

where $\beta_i$, $0 \le i < N$, are the digits in the binary representation of s:

$$s = \beta_{N-1}2^{N-1} + \cdots + \beta_0 2^0. \tag{3}$$

Owing to this numeration, any two centers u(s) and u(s+1), $0 \le s < 2^N - 1$, differ in exactly one coordinate. The corresponding subcubes $\Delta(s)$ and $\Delta(s+1)$ are therefore contiguous.
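The one-coordinate property of the numeration (2)-(3) can be checked directly. The following Python sketch computes u(s) and lists the coordinates in which consecutive centers differ. The helper names (`center_u`, `differing_coordinates`) are ours. The vector u(s) equals the binary reflected Gray code s XOR (s >> 1), read with $u_1$ as its least significant bit.

```python
from typing import List


def center_u(s: int, N: int) -> List[int]:
    """Center u(s) of the subcube Delta(s), following (2)-(3)."""
    beta = [(s >> k) & 1 for k in range(N)]  # beta[k] is the binary digit of s at 2^k
    u = [(beta[i] + beta[i - 1]) % 2 for i in range(1, N)]  # u_1 .. u_{N-1}
    u.append(beta[N - 1])  # u_N = beta_{N-1}
    return u


def differing_coordinates(a: List[int], b: List[int]) -> List[int]:
    """1-based numbers of the coordinates in which the binary vectors a and b differ."""
    return [i + 1 for i, (p, q) in enumerate(zip(a, b)) if p != q]


N = 3
centers = [center_u(s, N) for s in range(2 ** N)]
for s in range(2 ** N - 1):
    # each line shows u(s), u(s + 1) and the single coordinate in which they differ
    print(s, centers[s], centers[s + 1], differing_coordinates(centers[s], centers[s + 1]))
```

For N = 3 the sequence starts at u(0) = (0, 0, 0) and ends at u(7) = (0, 0, 1), so the first and last subcubes differ only in coordinate N.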

Next, let the binary form of v in d(M, v) be

$$v = \sum_{i=1}^{MN} \alpha_i 2^{-i},\qquad 0 \le v < 1.$$

Then the identity $d(M, v) = d(z_1, \ldots, z_M)$ holds, where

$$z_j = \sum_{i=1}^{N} \alpha_{(j-1)N+i}\, 2^{N-i},\qquad 1 \le j \le M. \tag{4}$$

This identity makes it possible to interpret $d(z_1, \ldots, z_M)$ as the $z_M$th subinterval of the interval $d(z_1, \ldots, z_{M-1})$ divided into $2^N$ equal parts. The numeration runs from left to right along the x-axis. Note that the above identity implies $D(M, v) = D(z_1, \ldots, z_M)$.

Now map $\Delta$ onto D by a linear transformation and assume that $D(z_1) = D(s)$ if D(s) is the image of $\Delta(s)$. This gives a numeration of the first partition of D that satisfies the above conditions. Then, by mapping $\Delta$ onto each subcube $D(z_1)$ of the first partition, we get the desired numeration of the second partition of D, where $D(z_1, z_2) = D(z_1, s)$ if $D(z_1, s)$ is the image of $\Delta(s)$, and so on. This numeration does not yet guarantee that $D(z_1, 2^N - 1)$ and $D(z_1 + 1, 0)$ have a common face. More generally, the last subcube in the first partition of $D(z_1, \ldots, z_M)$ and the first subcube in the first partition of $D(z_1, \ldots, z_M + 1)$ must also be contiguous. To ensure this, we add to the numeration procedure a mechanism that provides the necessary juxtaposition.

Introduce the integer $l = l(z_1, \ldots, z_M)$. It gives the number of the only coordinate in which the centers of the initial subcube $D(z_1, \ldots, z_M, 0)$ and the last subcube $D(z_1, \ldots, z_M, 2^N - 1)$ of the next partition of $D(z_1, \ldots, z_M)$ differ. Introduce also the binary vector $w = w(z_1, \ldots, z_M)$, which indicates the position of the center of the subcube $D(z_1, \ldots, z_M, 0)$. To do so we employ the integer function

$$l(s) = \begin{cases} 1, & s = 0 \text{ or } s = 2^N - 1,\\ \min\{j : 2 \le j \le N,\ \beta_{j-1} \ne \beta_0\}, & \text{otherwise}, \end{cases} \tag{5}$$

where the $\beta_j$ are from (3), and the binary vector function

$$w_i(s+1) = w_i(s) = \begin{cases} \bar u_1(s), & i = 1,\\ u_i(s), & 2 \le i \le N, \end{cases} \tag{6}$$

where s is assumed to be odd, $\bar u_1$ stands for the logical negation of $u_1$, and $w(0) = u(0)$. The amended procedure for successive numeration in subsequent partitions includes the following operations:

• permutation of $u_N$ and $u_t$ in $u(z_j)$ from (2), and of $w_N$ and $w_t$ in $w(z_j)$ from (6), with $t = l(z_{j-1})$, where $z_{j-1}$ is from (4), $1 < j \le M$, and $l(z_{j-1})$ is from (5); $t = N$ if $j = 1$. The new vectors are referred to as $u'(z_j)$ and $w'(z_j)$;

• addition

$$u''_i(z_j) = (u'_i(z_j) + q_i) \bmod 2,\qquad w''_i(z_j) = (w'_i(z_j) + q_i) \bmod 2,\qquad 1 \le i \le N,$$

where $q = w(z_{j-1})$ for $1 < j \le M$, and $q = (0, \ldots, 0) \in \mathbb{R}^N$ if $j = 1$;

• transformation

$$l'(z_j) = \begin{cases} N, & l(z_j) = t,\\ t, & l(z_j) = N,\\ l(z_j), & l(z_j) \ne N \text{ and } l(z_j) \ne t, \end{cases}$$

where t is from the above permutation.
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The successively computed values u'4(z;), w'4(z;), (gj) 
are used instead of the initial values u(z;), w(zj), [(zj), 1 
<j <M, to obtain the approximation 


M 

y(x) = )o(u%(z,) — p)a4, x € d(M,¥v), 

j=l 
with p= (2 ,....927") € RY. 
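The numeration procedure can be traced end to end in code. The following Python sketch, with hypothetical helper names, computes the center $y^M(x)$ for the kth subinterval of the Mth partition. It rests on our reading of the garbled formulas. In (5) we take l(s) = 1 for s = 0 and s = 2^N − 1, and otherwise the smallest j ≥ 2 with $\beta_{j-1} \ne \beta_0$. In (6) we take w(0) = u(0) and, for odd s, w(s) = w(s + 1) = u(s) with $u_1$ negated. As the text prescribes, the values t and q at level j are the transformed $l'$ and $w''$ of level j − 1. The sketch is an illustration, not the authors' code. Its test checks the defining property that consecutive subintervals map to contiguous subcubes.

```python
from typing import List


def center_u(s: int, N: int) -> List[int]:
    """u(s) from (2): centers of the first-partition subcubes of Delta."""
    beta = [(s >> k) & 1 for k in range(N)]
    u = [(beta[i] + beta[i - 1]) % 2 for i in range(1, N)]
    u.append(beta[N - 1])
    return u


def exit_coordinate(s: int, N: int) -> int:
    """l(s) as read from (5): 1-based coordinate separating first and last child."""
    if s == 0 or s == 2 ** N - 1:
        return 1
    beta = [(s >> k) & 1 for k in range(N)]
    return min(j for j in range(2, N + 1) if beta[j - 1] != beta[0])


def entry_vector(s: int, N: int) -> List[int]:
    """w(s) as read from (6): w(0) = u(0); for odd s, w(s) = w(s + 1) = u(s) with u_1 negated."""
    if s == 0:
        return center_u(0, N)
    w = center_u(s if s % 2 == 1 else s - 1, N)
    w[0] = 1 - w[0]
    return w


def subcube_center(k: int, N: int, M: int) -> List[float]:
    """y^M(x) for x in the k-th subinterval d(M, v), v = k * 2^(-MN)."""
    base = 2 ** N
    z = [(k // base ** (M - j)) % base for j in range(1, M + 1)]  # z_1 .. z_M, cf. (4)
    y = [0.0] * N
    t, q = N, [0] * N  # t = N and q = (0, ..., 0) for j = 1
    for j, zj in enumerate(z, start=1):
        u, w, l = center_u(zj, N), entry_vector(zj, N), exit_coordinate(zj, N)
        u[N - 1], u[t - 1] = u[t - 1], u[N - 1]  # permutation of coordinates N and t
        w[N - 1], w[t - 1] = w[t - 1], w[N - 1]
        u = [(a + b) % 2 for a, b in zip(u, q)]  # addition of q modulo 2
        w = [(a + b) % 2 for a, b in zip(w, q)]
        if l == t:  # transformation of l
            l = N
        elif l == N:
            l = t
        y = [yi + (ui - 0.5) * 2.0 ** (-j) for yi, ui in zip(y, u)]  # term (u'' - p) 2^(-j)
        t, q = l, w  # the transformed values feed the next level
    return y


def peano_approx(x: float, N: int, M: int) -> List[float]:
    """y^M(x) for x in [0, 1]; x = 1 is assigned to the last subinterval."""
    k = min(int(x * 2 ** (N * M)), 2 ** (N * M) - 1)
    return subcube_center(k, N, M)


N, M = 2, 2
for k in range(2 ** (N * M)):
    print(k, subcube_center(k, N, M))
```

For N = 2 the printed centers trace the familiar Hilbert-type path through the 16 cells of the second partition. Each step moves by $2^{-M}$ in exactly one coordinate.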

The important property of reducing dimensionality through SFC is the following. The functions $g_i(y(x))$, $0 \le i \le m+1$, from (1), which correspond to the Lipschitzian functions of the initial MOP, satisfy the uniform Hölder conditions ([7,11,15,19])

$$|g_i(y(x')) - g_i(y(x''))| \le K_i\,|x' - x''|^{1/N},\qquad x', x'' \in [0,1],$$

with respective coefficients $K_i = 4L_i\sqrt{N}$, $0 \le i \le m+1$.

Problem (1) can further be reduced to an unconstrained one by employing the index approach (IA) ([7,13,17,18]). The IA makes no use of penalties and therefore requires no adjustment of penalty coefficients. Within the IA, the functions $g_i(y(x))$ from (1) need not be defined throughout [0, 1]. They have to be computable only at the points $x \in [0, 1]$ satisfying $g_k(y(x)) \le 0$, $0 \le k < i$. This property will be referred to as partial computability of the problem functionals. Therefore, within the IA the outcome of each trial is given by the dyad

$$f(x) = g_\nu(y(x)),\qquad \nu = \nu(x), \tag{7}$$

where $\nu$ is the number of the first constraint violated at the point x ($\nu = m+1$ if all constraints are satisfied). This number will be referred to as the index of the corresponding point.
The unconstrained equivalent of (1) is

$$\psi(x^*) = \min\{\psi(x) : x \in [0,1]\},$$

where

$$\psi(x) = \begin{cases} g_\nu(y(x))/K_\nu, & \nu = \nu(x) \le m,\\ \left(g_{m+1}(y(x)) - \varphi^*\right)/K_{m+1}, & \nu = \nu(x) = m+1, \end{cases}$$

and $x^*$ is a solution to (1). The algorithm presented below solves (1) by minimizing $\psi(x)$. It replaces the unknown values $\varphi^*$ and $K_i$, $0 \le i \le m+1$, by their running estimates. It also copes with the discontinuity inherent in $\psi(x)$.


Algorithm 


The first trial is executed at an arbitrary interior point $x^1 \in (0, 1)$. The choice of any subsequent point $x^{k+1}$, $k \ge 1$, is determined by the following rules:

1) Renumber the points $x^1, \ldots, x^k$ of the previous trials by subscripts in increasing order of the coordinate, i.e.

$$0 = x_0 < x_1 < \cdots < x_k < x_{k+1} = 1,$$

and associate them with the computed values $z_i = f(x_i)$, $1 \le i \le k$, from (7). The values $z_0$ and $z_{k+1}$ are undefined.


2) Collect in the sets

$$I_\nu = \{i : 1 \le i \le k,\ \nu = \nu(x_i)\},\qquad 0 \le \nu \le m+1,$$

all subscripts corresponding to the points with equal indices. It is assumed that $\nu(x_0) = \nu(x_{k+1}) = -1$ and $I_{-1} = \{0, k+1\}$.


3) Construct the unions

$$S_\nu = I_{-1} \cup \cdots \cup I_{\nu-1},\qquad 0 \le \nu \le m+1,$$

and

$$T_\nu = I_{\nu+1} \cup \cdots \cup I_{m+1} \cup I_{m+2},\qquad 0 \le \nu \le m+1,$$

of the subscripts corresponding to the trial points with indices less than $\nu$ and exceeding $\nu$, respectively. By definition, $I_{m+2} = \emptyset$.

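4) Compute the running lower bounds

$$\mu_\nu = \max\left\{\frac{|z_i - z_j|}{(x_i - x_j)^{1/N}} : i, j \in I_\nu,\ i > j\right\} \tag{8}$$

for the respective Hölder coefficients of the functions $g_\nu(y(x))$, $0 \le \nu \le m+1$. If $I_\nu$ contains fewer than two elements, or if $\mu_\nu$ from (8) is equal to zero, assume that $\mu_\nu = 1$.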

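5) Find the values

$$z^*_\nu = \begin{cases} \min\{z_i : i \in I_\nu\}, & T_\nu = \emptyset,\\ -\varepsilon_\nu, & T_\nu \ne \emptyset, \end{cases}$$

for all nonempty sets $I_\nu$, $0 \le \nu \le m+1$. The vector $\varepsilon = (\varepsilon_0, \ldots, \varepsilon_m)$ is an input of the algorithm.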




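6) Compute the characteristics R(i), $1 \le i \le k+1$, where

$$R(i) = \Delta_i + \frac{(z_i - z_{i-1})^2}{r^2\mu_\nu^2\Delta_i} - \frac{2(z_i + z_{i-1} - 2z^*_\nu)}{r\mu_\nu},\qquad i-1,\ i \in I_\nu,$$

$$R(i) = 2\Delta_i - \frac{4(z_i - z^*_\nu)}{r\mu_\nu},\qquad i \in I_\nu,\ i-1 \in S_\nu,$$

$$R(i) = 2\Delta_i - \frac{4(z_{i-1} - z^*_\nu)}{r\mu_\nu},\qquad i-1 \in I_\nu,\ i \in S_\nu,$$

$$\Delta_i = (x_i - x_{i-1})^{1/N}.$$

A proper choice of the parameter r > 1 allows the product $r\mu_\nu$ to be used as an upper bound for $K_\nu$.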
7) Select the integer t from

$$R(t) = \max\{R(i) : 1 \le i \le k+1\}$$

and execute the next trial at the point

$$x^{k+1} = \frac{x_t + x_{t-1}}{2} - \operatorname{sign}(z_t - z_{t-1})\,\frac{1}{2r}\left[\frac{|z_t - z_{t-1}|}{\mu_\nu}\right]^N$$

if $\nu(x_{t-1}) = \nu(x_t)$. Otherwise, i.e. if $\nu(x_{t-1}) \ne \nu(x_t)$, the second term is omitted.

The concept of the $\varepsilon$-reserved solution $y_\varepsilon$, where

$$\varphi(y_\varepsilon) = \min\{\varphi(y) : y \in D,\ g_i(y) \le -\varepsilon_i,\ 0 \le i \le m\}$$

and $\varepsilon_i > 0$, $0 \le i \le m$, provides an interpretation of $\varepsilon$ from Step 5). The sequence of points $\{x^k\}$ selected by Steps 1)-7) in the interval [0, 1] generates the corresponding sequence $\{y^k\} = \{y(x^k)\}$ in D.


Convergence Conditions 


([15,16,17,18]). Assume that the following is true:

• problem (1) has an $\varepsilon$-reserved solution;

• the functions $g_i(y)$, $0 \le i \le m+1$, admit Lipschitzian continuations throughout D;

• from some step onwards, the values $\mu_\nu$, $0 \le \nu \le m+1$, from (8) satisfy the inequalities

$$r\mu_\nu > 16L_\nu\sqrt{N},\qquad 0 \le \nu \le m+1.$$

Then any limit point $\bar y$ of the sequence $\{y^k\}$ generated by the above algorithm satisfies the conditions

$$\varphi(\bar y) = \inf\{\varphi(y^k) : g_i(y^k) \le 0,\ 0 \le i \le m,\ k \in \mathbb{N}_1\} \le \varphi(y_\varepsilon),$$

where $\mathbb{N}_1$ is the set of positive integers.

In applications the SFC y(x) has to be approximated by the $y^M(x)$ corresponding to some Mth partition. It is therefore important to note that the substantiation of the above convergence conditions implies the relation

$$2^{-(M+1)} < \frac{1}{\sqrt{N}}\min_{0 \le \nu \le m}\frac{\varepsilon_\nu}{L_\nu}.$$

This means that the existence of an $\varepsilon$-reserved solution may be interpreted as a kind of regularity condition (cf. [6]).

Dimensionality reduction through SFC causes some loss of information about the closeness of trial points in the initial multidimensional space: two close points in D may have distant preimages in [0, 1]. There are two ways to overcome this obstacle. One is to store all preimages of each trial point; close points in D always have some close preimages (see [12]). The other is to use a set of shifted SFC to provide a better transfer of metric information (see [17]).

GOSF based on the reduction to one dimension by 
using SFC and on the reduction to unconstrained prob- 
lems by employing IA admits effective parallelization 
(see [16,19]). 
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Weber’s problem and all its variations with positive 
weights is clearly one of the most extensively studied 
problems in the area of continuous location theory. It 
frequently arises in planning situations where a single 
central facility must be located so as to minimize the 
total cost associated with serving a number of demand 
centers. In all these cases, the underlying assumption, 
that the associated service costs are directly propor- 
tional to the Euclidean distance of the demand center 
from the central facility, has been adopted. 

Weber’s problem with attraction and repulsion can 
be stated as follows: Given a number of ‘attractive’ or 
‘repulsive’ points located on a 2D-plane, find the posi- 
tion of a single facility inside an arbitrary region P such 
that the sum of the weighted distances of all points from 
the single facility is at its global minimum. 

This problem can be formulated as the following nonlinear optimization problem:

$$\min_{(x,y) \in P} F(x,y) = \sum_{i \in I^+} w_i\sqrt{(x - x_i)^2 + (y - y_i)^2} - \sum_{i \in I^-} w_i\sqrt{(x - x_i)^2 + (y - y_i)^2},$$

where:

- $I^+$ and $I^-$ are the sets of attractive (users) and repulsive (residents) points, respectively;
- $w_i$, $i \in I^+$, is the positive weight of the ith attractive point, and $-w_i$, $i \in I^-$, is the negative weight of the ith repulsive point;
- $(x_i, y_i)$ are the coordinates of the ith attractive or repulsive point;
- P is the region where the single facility must be situated.

The unconstrained version of this problem has been shown to possess a number of important properties. The first property provides a sufficient condition for having finite solutions or solutions at infinity. Z. Drezner and G.O. Wesolowsky [6], using the well-known triangle inequality, proved the following:

Property 1 For the unconstrained problem, if W > 0 then the global optimum location is finite; if W < 0 then the global optimum location is at infinity, where

$$W = \sum_{i \in I^+} w_i - \sum_{i \in I^-} w_i.$$
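A small numerical illustration of Property 1 follows. The sketch evaluates the weighted objective (attractive distances minus repulsive distances) and the net weight W for hypothetical data. The function names are ours.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x_i, y_i, w_i) with w_i > 0


def weber_objective(x: float, y: float, attract: List[Point], repel: List[Point]) -> float:
    """Weighted attractive distances minus weighted repulsive distances."""
    plus = sum(w * math.hypot(x - xi, y - yi) for xi, yi, w in attract)
    minus = sum(w * math.hypot(x - xi, y - yi) for xi, yi, w in repel)
    return plus - minus


def net_weight(attract: List[Point], repel: List[Point]) -> float:
    """W of Property 1."""
    return sum(w for _, _, w in attract) - sum(w for _, _, w in repel)


attract = [(0.0, 0.0, 1.0), (4.0, 0.0, 1.0)]
repel = [(2.0, 1.0, 3.0)]
print(net_weight(attract, repel))  # -1.0
for radius in (10.0, 100.0, 1000.0):
    print(radius, weber_objective(radius, 0.0, attract, repel))
```

Here W < 0, and the objective keeps decreasing as the facility moves away along the x-axis. This is consistent with an optimum at infinity.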


The second property deals with the localization of all local minima. Let R be the radius of the smallest circle enclosing all points. The square of this radius can be obtained as the solution of the following nonlinear optimization problem:

$$\min_{x^c,\,y^c,\,R^2} R^2\qquad \text{s.t.}\quad (x^c - x_i)^2 + (y^c - y_i)^2 \le R^2,\quad \forall i \in I^+ \cup I^-.$$

This problem is convex in the combined space of the coordinates $(x^c, y^c)$ of the center of the enclosing circle and the square $R^2$ of its radius. Drezner and Wesolowsky [6] proved the following localization property, which generalizes the majority theorem [24] for Weber’s problem.


Property 2. For the unconstrained problem, all local minima, and therefore the global minimum, are inside a disc with radius equal to

$$\rho = \frac{R}{\sqrt{1 - \alpha^2}},$$

where

$$\alpha = \frac{W^-}{W^+},\qquad W^+ = \sum_{i \in I^+} w_i,\qquad W^- = \sum_{i \in I^-} w_i.$$

Note that the boundary of this disc is attainable.
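The quantities in Property 2 are easy to compute for small instances. The sketch below finds R by brute force instead of solving the convex program above. It relies on the fact that the smallest enclosing circle passes through two points (as a diameter) or three points. It then evaluates $\rho$. The O(n^4) enumeration, the helper names and the data are illustrative assumptions. The sketch computes only the radius, since the text does not state where the disc is centered.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

XY = Tuple[float, float]
Circle = Tuple[XY, float]


def circle_from_two(a: XY, b: XY) -> Circle:
    """Circle having segment ab as a diameter."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2), math.dist(a, b) / 2


def circle_from_three(a: XY, b: XY, c: XY) -> Optional[Circle]:
    """Circumscribed circle of triangle abc, or None for collinear points."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)


def smallest_enclosing_circle(points: Sequence[XY]) -> Circle:
    """Brute force over all circles through 2 or 3 of the points."""
    if len(points) == 1:
        return points[0], 0.0
    candidates: List[Circle] = [circle_from_two(a, b) for a, b in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    covering = [(c, r) for c, r in candidates if all(math.dist(c, p) <= r + 1e-9 for p in points)]
    return min(covering, key=lambda cr: cr[1])


def localization_radius(w_plus: Sequence[float], w_minus: Sequence[float], R: float) -> float:
    """rho = R / sqrt(1 - alpha^2) with alpha = W^- / W^+ (requires W > 0)."""
    alpha = sum(w_minus) / sum(w_plus)
    if alpha >= 1:
        raise ValueError("Property 2 requires W > 0")
    return R / math.sqrt(1 - alpha ** 2)


center, R = smallest_enclosing_circle([(0.0, 0.0), (4.0, 0.0), (2.0, 1.0)])
print(center, R)  # (2.0, 0.0) 2.0
print(localization_radius([2.0, 2.0], [2.0], R))
```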


The case $\alpha = 1$ or, equivalently, W = 0 is handled by finding the optimal solution at infinity and comparing it with the best finite solution. Drezner and Wesolowsky [6], using asymptotic analysis, showed the following:

Property 3. For the unconstrained problem, if W = 0 the best solution at infinity is $-(A^2 + B^2)^{1/2}$, where

$$A = \sum_{i \in I^+} w_i x_i - \sum_{i \in I^-} w_i x_i,\qquad B = \sum_{i \in I^+} w_i y_i - \sum_{i \in I^-} w_i y_i.$$
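Property 3 can be checked numerically for a hypothetical instance with W = 0. For a very distant facility, each distance is approximately r minus the projection of $(x_i, y_i)$ onto the direction of travel. The objective therefore tends to $-(A\cos\theta + B\sin\theta)$, which is smallest along the direction of (A, B). This direction comes from our expansion and is not stated in the property. The sketch evaluates the objective at a far point in that direction.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x_i, y_i, w_i), w_i > 0


def best_value_at_infinity(attract: List[Point], repel: List[Point]) -> Tuple[float, float, float]:
    """Return (-(A^2 + B^2)^(1/2), A, B) as in Property 3 (meaningful when W = 0)."""
    A = sum(w * x for x, _, w in attract) - sum(w * x for x, _, w in repel)
    B = sum(w * y for _, y, w in attract) - sum(w * y for _, y, w in repel)
    return -math.hypot(A, B), A, B


def weber_objective(x: float, y: float, attract: List[Point], repel: List[Point]) -> float:
    return (sum(w * math.hypot(x - a, y - b) for a, b, w in attract)
            - sum(w * math.hypot(x - a, y - b) for a, b, w in repel))


attract = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
repel = [(1.0, 3.0, 2.0)]  # W = 1 + 1 - 2 = 0
value, A, B = best_value_at_infinity(attract, repel)
far = 1e6
x, y = far * A / math.hypot(A, B), far * B / math.hypot(A, B)
print(value, weber_objective(x, y, attract, repel))
```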


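The following property examines whether a demand point corresponds to a local minimum [6].

Property 4 For the unconstrained problem, if there is a point i such that

$$w_i - \left(W_x^2 + W_y^2\right)^{1/2}\ \begin{cases} > 0, & \text{then point } i \text{ is a local minimum},\\ < 0, & \text{then point } i \text{ is not a local minimum},\\ = 0, & \text{then both possibilities are open}, \end{cases}$$

where

$$W_x = \sum_{j \in I^+,\ j \ne i}\frac{w_j(x_i - x_j)}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}} - \sum_{j \in I^-,\ j \ne i}\frac{w_j(x_i - x_j)}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}},$$

$$W_y = \sum_{j \in I^+,\ j \ne i}\frac{w_j(y_i - y_j)}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}} - \sum_{j \in I^-,\ j \ne i}\frac{w_j(y_i - y_j)}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}}.$$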
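The test of Property 4 compares $w_i$ with the length of the gradient $(W_x, W_y)$ of the remaining terms at point i. The sketch below applies it to an attractive point. Signed weights are used ($+w_j$ attractive, $-w_j$ repulsive), so both sums in $W_x$ and $W_y$ collapse into one loop. The data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def property4_margin(i: int, points: Sequence[Tuple[float, float]], signed_w: Sequence[float]) -> float:
    """w_i - sqrt(W_x^2 + W_y^2) for an attractive demand point i."""
    xi, yi = points[i]
    wx = wy = 0.0
    for j, ((xj, yj), wj) in enumerate(zip(points, signed_w)):
        if j == i:
            continue
        d = math.hypot(xi - xj, yi - yj)
        wx += wj * (xi - xj) / d
        wy += wj * (yi - yj) / d
    return signed_w[i] - math.hypot(wx, wy)


# a heavy attractive point: positive margin, so it is a local minimum
print(property4_margin(0, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [3.0, 1.0, 1.0]))
# a nearby repulsive point pushes the facility away: -1.0, not a local minimum
print(property4_margin(0, [(0.0, 0.0), (2.0, 0.0), (-1.0, 0.0)], [1.0, 1.0, -1.0]))
```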


P.-C. Chen and others [5] and F. Plastria [14] independently derived the following sufficient condition for a demand point to be the global minimum solution.

Property 5 For the unconstrained problem, if there is a point $i^* \in I^+$ such that

$$\sum_{i \in I^+ \cup I^-,\ i \ne i^*} w_i \le w_{i^*},$$

then $(x_{i^*}, y_{i^*})$ is the global optimum location.
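Checking Property 5 only requires the weight totals. The sketch below finds such a point, if one exists, and confirms on a coarse grid that no grid point does better. Hypothetical data are used, and the grid check is an illustration only.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x_i, y_i, w_i), w_i > 0


def majority_point(attract: Sequence[Point], repel: Sequence[Point]) -> Optional[int]:
    """Index i* in I+ whose weight is at least the sum of all other weights, or None."""
    total = sum(w for *_, w in attract) + sum(w for *_, w in repel)
    for idx, (_, _, w) in enumerate(attract):
        if total - w <= w:
            return idx
    return None


def objective(x: float, y: float, attract: Sequence[Point], repel: Sequence[Point]) -> float:
    return (sum(w * math.hypot(x - a, y - b) for a, b, w in attract)
            - sum(w * math.hypot(x - a, y - b) for a, b, w in repel))


attract = [(0.0, 0.0, 5.0), (3.0, 1.0, 1.0), (-1.0, 2.0, 1.0)]
repel = [(1.0, 1.0, 2.0)]
i_star = majority_point(attract, repel)
print(i_star)  # 0
grid = [(0.25 * a, 0.25 * b) for a in range(-12, 13) for b in range(-12, 13)]
f_star = objective(attract[i_star][0], attract[i_star][1], attract, repel)
print(all(f_star <= objective(x, y, attract, repel) + 1e-12 for x, y in grid))  # True
```

The condition follows from the triangle inequality. Moving away from $i^*$ by a distance d raises its term by $w_{i^*}d$, while every other term changes by at most $w_i d$.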
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It is quite straightforward to show that if all weights 
are positive then the expression for the weighted sum 
of the Euclidean distances is convex [13] and there- 
fore the single local minimum corresponds to the global 
minimum. This means that the total expression for the 
sum of weighted Euclidean distances is a difference of 
two convex functions. As it has been noted earlier the 
presence of negative weights greatly complicates the lo- 
cation of the global minimum solution by introduc- 
ing concave contributions in the objective function. 
This special class of difference of two convex func- 
tions (DC) optimization problems has recently (1990) 
received considerable attention [7]. The next theorem 
introduces a set of conditions for convexity of F(x, y) at 
some point (x, y). 


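Property 6 F(x, y) is convex at (x, y) if

$$\sum_{i \in I^+ \cup I^-}\frac{W_i}{r_i^{1/2}} \ge 0$$

and

$$\sum_{i \in I^+ \cup I^-}\ \sum_{j \in I^+ \cup I^-,\ j > i}\frac{W_i W_j}{r_i^{3/2} r_j^{3/2}}\left[(x - x_i)(y - y_j) - (x - x_j)(y - y_i)\right]^2 \ge 0,$$

where $r_i = (x - x_i)^2 + (y - y_i)^2$, $i \in I^+ \cup I^-$, and

$$W_i = \begin{cases} w_i, & i \in I^+,\\ -w_i, & i \in I^-. \end{cases}$$

A proof of this property can be found in [11].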
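The two expressions in Property 6 are the trace and the determinant of the Hessian of F at (x, y). The sketch below evaluates them and compares them with a finite-difference Hessian at a hypothetical point away from all demand points. The step size and the data are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Site = Tuple[float, float, float]  # (x_i, y_i, W_i) with W_i signed as in Property 6


def objective(x: float, y: float, sites: Sequence[Site]) -> float:
    return sum(W * math.hypot(x - xi, y - yi) for xi, yi, W in sites)


def property6_terms(x: float, y: float, sites: Sequence[Site]) -> Tuple[float, float]:
    """Left-hand sides of the two conditions of Property 6."""
    r = [(x - xi) ** 2 + (y - yi) ** 2 for xi, yi, _ in sites]
    first = sum(W / math.sqrt(ri) for (_, _, W), ri in zip(sites, r))
    second = 0.0
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            xi, yi, Wi = sites[i]
            xj, yj, Wj = sites[j]
            cross = (x - xi) * (y - yj) - (x - xj) * (y - yi)
            second += Wi * Wj / (r[i] ** 1.5 * r[j] ** 1.5) * cross ** 2
    return first, second


def numeric_hessian(x: float, y: float, sites: Sequence[Site], h: float = 1e-4) -> Tuple[float, float, float]:
    def f(a: float, b: float) -> float:
        return objective(a, b, sites)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return fxx, fyy, fxy


sites = [(0.0, 0.0, 1.0), (3.0, 1.0, 2.0), (1.0, -2.0, -1.5)]
first, second = property6_terms(0.7, 1.3, sites)
fxx, fyy, fxy = numeric_hessian(0.7, 1.3, sites)
print(first, fxx + fyy)                  # trace: analytic vs finite differences
print(second, fxx * fyy - fxy ** 2)      # determinant: analytic vs finite differences
print("convex at this point:", first >= 0 and second >= 0)
```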

A special case of this problem, involving three points with weights equal to one, was first posed by P. Fermat in the seventeenth century and was solved geometrically by E. Torricelli. E. Weiszfeld [23] first proposed a simple iterative algorithm, but without a convergence proof. Later, H.W. Kuhn [8,9,10] proved that Weiszfeld’s algorithm converges provided that no iterate coincides with any of the demand points. L.M. Ostresh [12] and E. Balas and others [1] proposed modifications of the Weiszfeld algorithm that guarantee global convergence by perturbing the current point whenever it coincides with a demand point. C.Y. Wang [22] proved that Weiszfeld’s algorithm has a linear rate of convergence under certain conditions and a sublinear rate otherwise. More recently (1980s), P.H. Calamai and A.R. Conn [2,3,4] and M.L. Overton [13] introduced second-order methods with local quadratic convergence and, under certain conditions, global convergence. G.L. Xue [25,26], and Xue and J.B. Rosen [15], proved unconditional global convergence and conditional local quadratic convergence for a second-order algorithm. They also carried out computational comparisons between Weiszfeld’s algorithm and Newton’s algorithm on a parallel machine.
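For the classical positive-weight case, the Weiszfeld iteration replaces the current point by a weighted average of the demand points. Each weight is $w_i$ divided by the distance to point i. The following Python sketch shows the basic scheme without the safeguards of Ostresh or Balas and others. It simply stops if an iterate lands on a demand point, which is the situation excluded in Kuhn's convergence result. The tolerance, iteration cap and data are assumptions.

```python
import math
from typing import Sequence, Tuple


def weiszfeld(points: Sequence[Tuple[float, float]], weights: Sequence[float],
              start: Tuple[float, float], max_iter: int = 1000, tol: float = 1e-12) -> Tuple[float, float]:
    """Basic Weiszfeld iteration for positive weights (no safeguard at demand points)."""
    x, y = start
    for _ in range(max_iter):
        sx = sy = s = 0.0
        for (xi, yi), w in zip(points, weights):
            d = math.hypot(x - xi, y - yi)
            if d == 0.0:
                # the iterate coincides with a demand point: the update is undefined here
                return x, y
            sx += w * xi / d
            sy += w * yi / d
            s += w / d
        x_new, y_new = sx / s, sy / s
        if math.hypot(x_new - x, y_new - y) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y


# Fermat's three-point problem with unit weights on an equilateral triangle
triangle = [(0.0, 0.0), (2.0, 0.0), (1.0, math.sqrt(3.0))]
print(weiszfeld(triangle, [1.0, 1.0, 1.0], (0.5, 0.5)))
```

By symmetry the Fermat-Torricelli point of an equilateral triangle is its centroid, (1, √3/3), which the iteration approaches.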

Most papers address only positive weights, reflecting the inherent assumption that all points ‘attract’ the central facility. However, in the real world there are many problems in which certain points ‘repel’ the central facility. For example, nuclear plants, sewage treatment plants or polluting industrial units should be as close as possible to their customers so that transportation costs are minimized. At the same time, environmental considerations require that these facilities be as far as possible from residential areas and fragile ecological systems. This need to locate a facility away from certain points can be quantified through the use of negative weights, as shown in [16,19]. A negative weight means that the value of the objective function increases as the facility approaches the corresponding point. The global optimum location of a facility is therefore the one that balances the repulsion and the attraction acting on the central facility. The introduction of negative weights greatly increases the complexity of the problem.

Weber’s problem with some negative weights was 
first considered by L.-N. Tellier [17], who studied the 
case of two attractive and one repulsive point. Later, 
Tellier and D. Pollanski [18] analyzed exhaustively all 
different cases involving three demand points and de- 
rived statistical conclusions regarding the types of pos- 
sible solutions. Drezner and Wesolowsky [6] proved 
a number of theoretical results and proposed a heuris- 
tic algorithm for locating the global minimum solu- 
tion. However, it was Chen and others [5] who first 
presented an exact outer approximation algorithm for 
Weber’s problem with attraction and repulsion by exploiting the d.c. structure of the problem. In addition, they [5] extended their procedure to exponentially decaying repulsion and to facility location within a set of disjoint convex polygons. Later, Maranas and Floudas [11] proposed a branch and bound type global optimization algorithm for solving Weber’s problems with attraction and repulsion. The approach was based on the iterative solution of a set of convex and concave lower bounding problems. Convergence to an ε-global minimum was proven, and examples were solved with as many as 10,000 points. Analyzing the computational results, they made several observations:

- For any given number of points N, the difficulty of the problem increases as more repulsive points are introduced. This trend continues until roughly equal numbers of attractive and repulsive points are reached.
- Beyond that balance, computational requirements decrease sharply as more repulsive points are added. In fact, it is easier to solve problems involving more repulsive points than attractive ones.
- The standard deviation of the total number of required iterations and function evaluations is fairly small for all ratios of attractive to repulsive points. The sole exception is the $N^+ = N^- = N/2$ case, where the standard deviation is substantially larger.
- For a given ratio of attractive to repulsive points, the CPU requirements increase almost linearly with N, reflecting the fact that most of the CPU time is spent on function evaluations.

A generalization of Weber’s problem is the maxi- 
mization of the sum of decreasing convex functions of 
arbitrary metrics. H. Tuy and F.A. Al-Khayyal [20] pro- 
posed the first algorithm for finding global solutions 
to the problem by reducing it to a sequence of uncon- 
strained nondifferentiable convex minimization prob- 
lems. Later, they [21] extended this work to account for 
repulsion as well and proposed a d.c. reformulation of 
the problem which enabled them to develop a global 
optimization procedure. 
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Abstract 


A well-studied problem in the area of computational bi- 
ology is the sequence alignment problem. Three mixed- 
integer linear optimization models have been devel- 
oped to address the global pairwise sequence alignment 
problem in a mathematically rigorous fashion. These 
formulations, in addition to their rigor, allow for (a) the 
natural introduction of functionally important conser- 
vation constraints, (b) the creation of a rank-ordered 
list of the highest scoring alignments and (c) the refine- 
ment of alignments by using pairwise interaction scores 
from simplified force fields. The third model, a path selection approach, exploits some of the algorithmic advantages of dynamic programming methods to outperform the other optimization models.


Keywords and Phrases 


Sequence alignment; Integer linear optimization; 
Global pairwise alignment; Rank-order list of 
alignments 


Introduction 


Sequence alignment methods aim to both identify re- 
lated protein sequences and determine the best align- 
ment between them. This approach provides a rough 
measure of evolutionary distance and may indicate pos- 
sible relationships between the protein structure and 
function of similar sequences. Multiple scoring matrices have been developed to quantify this evolutionary distance between aligned residues. They are based on the percent accepted mutation (PAM) [3] and blocks substitution matrix (BLOSUM) [5] techniques.

The pairwise sequence alignment problem is most 
commonly addressed through either (i) global align- 
ment or (ii) local alignment techniques. The goal of 
global alignment algorithms is to determine the highest 
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scoring overall alignment spanning the length of both 
sequences. One widely used approach for this prob- 
lem is a dynamic programming approach proposed by 
Needleman and Wunsch [10]. 
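For reference, the global dynamic programming recursion can be written in a few lines. The sketch below uses a linear gap penalty and a toy match/mismatch score. Both are assumptions made for brevity: the article's models use an affine gap model with free end gaps, and real applications score with PAM or BLOSUM matrices.

```python
from typing import Callable, Tuple


def needleman_wunsch(s1: str, s2: str, score: Callable[[str, str], int], gap: int) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, row1, row2)."""
    M, N = len(s1), len(s2)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0] = -gap * i
    for j in range(1, N + 1):
        F[0][j] = -gap * j
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),
                          F[i - 1][j] - gap, F[i][j - 1] - gap)
    # traceback from the bottom-right cell
    a1, a2, i, j = [], [], M, N
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]):
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return F[M][N], "".join(reversed(a1)), "".join(reversed(a2))


def toy_score(a: str, b: str) -> int:
    return 2 if a == b else -1  # hypothetical scoring, not PAM or BLOSUM


print(needleman_wunsch("LCEP", "ICWEP", toy_score, 1))  # (4, 'LC-EP', 'ICWEP')
```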

Proteins may share sequence similarity in some re- 
gions, but not in others. Local alignment algorithms are 
more suited to these problems and strive to align only 
the highest scoring subsequence match. Smith and Wa- 
terman extended the dynamic programming approach 
for global pairwise sequence alignment problems to ad- 
dress the local alignment problem [14]. Dynamic pro- 
gramming approaches are computationally inadequate 
for large scale database searches, so a number of heuris- 
tic algorithms for local pairwise sequence alignment 
have been proposed [1,2,11,12]. 

Several researchers have studied the effect of includ- 
ing information about near-optimal alignments. The 
investigation of the suboptimal paths and scores al- 
lows for an evaluation of the reliability of portions of 
a sequence alignment. A review of several approaches 
to this problem and their impact can be found else- 
where [17]. 

In some cases, an alignment between two sequences 
can be improved by constraining the problem to in- 
clude biologically important information in the overall 
alignment. One example of this is the required conser- 
vation of certain residues that form a motif necessary 
for function. This problem has been addressed recently 
by dynamic programming algorithms [4,15]. 


Models 


Several integer linear optimization (ILP) models have been developed to address the problem of global pairwise sequence alignment rigorously, completely and in a general fashion. The following sections present and compare three approaches: a template-based model, a template-free model and a path selection model. Formulating the problem as an integer linear optimization problem has four benefits:

- it provides a deterministic guarantee of identifying the global maximum alignment [6];
- it allows for the introduction of integer cut constraints;
- it provides a framework for the introduction of functionally specific constraints;
- it shows promise for the optimal identification of pairwise interactions.


Template-Based Model 


Consider two protein sequences S1, S2 of lengths M and N respectively, where $M \ge N$. Let the index i represent each position in Sequence S1 and the index j represent each position in S2, as shown in Eqs. 1-2.

$$i \in \{1, 2, \ldots, M\} \tag{1}$$

$$j \in \{1, 2, \ldots, N\} \tag{2}$$


The template-based optimization model assigns each amino acid of both sequences to a template to generate the optimal alignment. Equation 3 defines the template length K as the sum of the length of the larger sequence and the parameter $N\_GAPS_{max}$, which represents the maximum number of allowed gaps. This model requires the introduction of an index k, representing the position in the template, as defined by Eq. 4.

$$K = M + N\_GAPS_{max} \tag{3}$$

$$k \in \{1, 2, \ldots, K\} \tag{4}$$


The assignment of an amino acid to a template position requires the definition of the binary variables $y_{ik}$ and $z_{jk}$, as shown in Eqs. 5-6.

$$y_{ik} = \begin{cases} 1 & \text{if amino acid } i \text{ of S1 is assigned to template position } k\\ 0 & \text{otherwise} \end{cases} \tag{5}$$

$$z_{jk} = \begin{cases} 1 & \text{if amino acid } j \text{ of S2 is assigned to template position } k\\ 0 & \text{otherwise} \end{cases} \tag{6}$$

A position in the template may not have an amino acid assigned to it in the overall alignment. Therefore, Eqs. 7-8 introduce additional binary variables to represent these alignment gaps.

$$y_{0k} = \begin{cases} 1 & \text{if template position } k \text{ is a gap for Sequence S1}\\ 0 & \text{otherwise} \end{cases} \tag{7}$$

$$z_{0k} = \begin{cases} 1 & \text{if template position } k \text{ is a gap for Sequence S2}\\ 0 & \text{otherwise} \end{cases} \tag{8}$$


The objective function of this optimization model max- 
imizes the alignment score, which is the sum of a scor- 
ing matrix value for each matching amino acid pair mi- 
nus any associated penalties for gaps inserted in the 
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sequence. The scoring matrix will assign a weight, wij, 
to any template position k that contains the amino acid 
in position i of Sequence S1 and also the amino acid in 
position j of S2. 

For an affine gap penalty model with no penalties for gaps that begin or end a sequence, the objective function can then be posed as shown in Eq. 9. The terms of Eq. 9 work as follows:

- The scoring matrix contribution $w_{ij}$ for positions i and j is counted only if position i of Sequence 1 is assigned to position k of the template ($y_{ik}$) and position j of Sequence 2 is assigned to the same position k ($z_{jk}$).
- The gap opening existence terms of Sequence 1, $go_k^{S1}$, and Sequence 2, $go_k^{S2}$, are weighted by the gap opening penalty value wo. This assesses the penalty for the first residue of any gap in a sequence.
- Each active gap extension variable of either sequence, $gl_k^{S1}$ or $gl_k^{S2}$, produces a penalty of wl.
- Active $gl_k^{S1}$ and $gl_k^{S2}$ variables that lie within a beginning or an ending gap are counteracted by the product of $gb_k^{S1}$, $gb_k^{S2}$, $ge_k^{S1}$ or $ge_k^{S2}$ with wl.
- Gap opening penalties at the beginning or end of a sequence are explicitly omitted by summing only over the reduced index range $2 \le k \le K-1$.

$$\max\ \sum_i\sum_j\sum_k w_{ij}\,y_{ik}\,z_{jk} - \sum_{k=2}^{K-1}\left(go_k^{S1} + go_k^{S2}\right)wo - \sum_{k=2}^{K-1}\left[\left(gl_k^{S1} - gb_k^{S1} - ge_k^{S1}\right) + \left(gl_k^{S2} - gb_k^{S2} - ge_k^{S2}\right)\right]wl \tag{9}$$


The objective function of Eq. 9 requires the lineariza- 
tion of the product of two binary variables and is sub- 
ject to numerous constraints. The details of the model 
are available elsewhere [8]. 
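The quantity maximized in Eq. 9 can be evaluated directly for a given alignment written as two gapped rows of equal length. The sketch below sums the scores of matched residue pairs and charges each internal gap of length L the amount wo + wl·(L − 1). Gaps that begin or end a row are free. This gap-cost convention, the toy scoring and the example rows are our assumptions. The model's exact treatment of the extension variables is given in [8].

```python
from typing import Callable


def alignment_score(row1: str, row2: str, w: Callable[[str, str], float], wo: float, wl: float) -> float:
    """Score of a gapped alignment with affine internal gaps and free end gaps."""
    if len(row1) != len(row2):
        raise ValueError("rows must have the same length")
    K = len(row1)
    total = sum(w(a, b) for a, b in zip(row1, row2) if a != "-" and b != "-")
    for row in (row1, row2):
        k = 0
        while k < K:
            if row[k] != "-":
                k += 1
                continue
            start = k
            while k < K and row[k] == "-":
                k += 1
            if start == 0 or k == K:
                continue  # gap at the beginning or end of the row: no penalty
            total -= wo + wl * (k - start - 1)  # opening plus extensions
    return total


def toy_score(a: str, b: str) -> int:
    return 2 if a == b else -1  # hypothetical scoring matrix


print(alignment_score("LC-EP", "ICWEP", toy_score, wo=3, wl=1))  # 2
print(alignment_score("LCEP-", "LCEPW", toy_score, wo=3, wl=1))  # 8: the end gap is free
```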


Template-Free Model 


Unlike the previously described mixed-integer linear 
programming formulation of the global pairwise se- 
quence alignment problem in Sect. “Template-Based 
Model”, the optimization model presented here does 
not assign the amino acids of each sequence to a tem- 
plate. However, information about the maximum num- 
ber of allowable gaps is still included in this model, 
through the variable K in Eq. 3. 


In the template-free model, a binary variable $z_{ij}$ is defined in Eq. 10 to represent the alignment of position i in S1 to position j in S2. A method to handle gaps must still be introduced into the model, to account for the evolutionary changes that lead to residue insertions and deletions. Aligning a gap residue to another gap residue is not allowed. This observation leaves two possibilities for gap occurrences: a gap can be either in Sequence 1, across from a residue j in Sequence 2, or in Sequence 2, across from a residue i in Sequence 1. These possibilities are modeled with the binary variables $zg_i$ and $yg_j$, defined by Eqs. 11-12.

$$z_{ij} = \begin{cases} 1 & \text{if position } j \text{ in S2 aligns with position } i \text{ in S1}\\ 0 & \text{otherwise} \end{cases} \tag{10}$$

$$zg_i = \begin{cases} 1 & \text{if no position } j \text{ in S2 aligns to the residue in position } i \text{ of S1}\\ 0 & \text{otherwise} \end{cases} \tag{11}$$

$$yg_j = \begin{cases} 1 & \text{if no position } i \text{ in S1 aligns to the residue in position } j \text{ of S2}\\ 0 & \text{otherwise} \end{cases} \tag{12}$$


The objective function in Eq. 13 maximizes the sum of the weights of the residue-residue alignments minus the sum of the gap penalties. It adds back appropriate terms that remove the penalties from the gaps at the beginning and end of the sequences. Its terms work as follows:

- The scoring matrix value $w_{ij}$ for a given pair of positions is included when the binary variable $z_{ij}$, which indicates an alignment matching positions i and j, is active.
- For an affine gap penalty model, the variables representing the existence of a gap opening, $go_j^{S1}$ and $go_i^{S2}$, and the existence of a gap extension, $gl_j^{S1}$ and $gl_i^{S2}$, are multiplied by their respective weights, wo and wl.
- A gap residue at the beginning or end of a sequence activates one of $gb_j^{S1}$, $ge_j^{S1}$, $gb_i^{S2}$, $ge_i^{S2}$, which removes the penalty assigned by the previous terms.

$$\begin{aligned} \max\ & \sum_i\sum_j w_{ij}\,z_{ij} - \sum_j\left(wo\cdot go_j^{S1} + wl\cdot gl_j^{S1}\right) - \sum_i\left(wo\cdot go_i^{S2} + wl\cdot gl_i^{S2}\right)\\ & + \sum_{j=1}^{N-1} wl\cdot\left(gb_j^{S1} + ge_j^{S1}\right) + \sum_{i=1}^{M-1} wl\cdot\left(gb_i^{S2} + ge_i^{S2}\right)\\ & + wo\cdot\left(gb_1^{S1} + gb_1^{S2}\right) + wo\cdot\left(ge_N^{S1} + ge_M^{S2}\right) \end{aligned} \tag{13}$$
The objective function of Eq. 13 is subject to numerous 
constraints. The details of these constraints are avail- 
able elsewhere [9]. 


Path Selection Model 


Let us introduce a binary variable $N_{ij}$ that represents the alignment of the residue at position $i$ in Sequence 1 to the residue at position $j$ in Sequence 2. This binary variable plays a role similar to that of $z_{ij}$ in Sect. “Template-Free Model”. The typical assignment of this match assesses a weight, $w_{ij}$, based on a scoring matrix developed through evolutionary analysis of protein sequences.

A successful sequence alignment will have many active $N_{ij}$ variables, which we will designate as nodes. Let the binary variable $y_{ii'jj'}$ represent the existence of a connecting path between node $N_{ij}$ and a neighboring node $N_{i'j'}$. Associated with this connecting path is a weight parameter, $C_{ii'jj'}$, which can be calculated in advance from the scoring matrix $w$ and any position-dependent gap penalty form that is specified a priori. An example of the representation of the node and path variables is illustrated in Fig. 1.

Once these variables have been defined, the objective function of the optimal sequence alignment is simply the sum of the products of the path existence variables, $y_{ii'jj'}$, and the path weights, $C_{ii'jj'}$, as shown in Eq. 14.


$$\max \sum_{i}\sum_{i'>i}\sum_{j}\sum_{j'>j} y_{ii'jj'} \cdot C_{ii'jj'} \qquad (14)$$


Global Pairwise Protein Sequence Alignment via Mixed-Integer Linear Optimization, Figure 1

(a) Alignment of two hypothetical sequence fragments (LC-EP and ICWEP). (b) A node and path representation of the alignment problem as formulated by the mathematical model. Note the three active paths connecting the four selected node variables

The variable $y_{ii'jj'}$ is defined only as the existence of a contact between two neighboring nodes, where each node $N_{i'j'}$ that has an incoming connecting path activated must also have an outgoing path. In effect, this constraint can be thought of as a “mass” balance around the node. It is specified by Eq. 15 for all nodes except those that are allowed to begin or end an alignment.


$$\sum_{i<i'}\sum_{j<j'} y_{ii'jj'} = \sum_{i''>i'}\sum_{j''>j'} y_{i'i''j'j''} \qquad \forall\, 1<i'<M,\ 1<j'<N \qquad (15)$$
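The node and path bookkeeping behind Eqs. 14-15 can be sketched in a few lines of Python. This is a minimal illustration, not the MILP itself: the helper names (`path_variables`, `objective`, `flow_balanced`) and the path weights `C` are hypothetical, and the alignment is assumed to be given as an increasing list of matched `(i, j)` pairs, from which the active paths are read off directly instead of being chosen by a solver.

```python
def path_variables(nodes):
    """Active path variables (i, i', j, j') linking consecutive matched nodes.

    `nodes` is a list of matched (i, j) positions (1-based), assumed strictly
    increasing in both coordinates, as any valid alignment is.
    """
    return [(i, i2, j, j2) for (i, j), (i2, j2) in zip(nodes, nodes[1:])]


def objective(paths, C):
    """Eq. 14: sum of C over active paths (inactive paths contribute 0)."""
    return sum(C[p] for p in paths)


def flow_balanced(paths, M, N):
    """Eq. 15: every interior node (1 < i' < M, 1 < j' < N) has as many
    active incoming paths as active outgoing paths."""
    incoming, outgoing = {}, {}
    for i, i2, j, j2 in paths:
        incoming[(i2, j2)] = incoming.get((i2, j2), 0) + 1
        outgoing[(i, j)] = outgoing.get((i, j), 0) + 1
    touched = set(incoming) | set(outgoing)
    return all(incoming.get(n, 0) == outgoing.get(n, 0)
               for n in touched if 1 < n[0] < M and 1 < n[1] < N)


# Hypothetical alignment of a 5-residue and a 6-residue sequence.
paths = path_variables([(1, 2), (2, 3), (4, 4), (5, 6)])
C = {(1, 2, 2, 3): 4, (2, 4, 3, 4): -3, (4, 5, 4, 6): 2}  # made-up path weights
print(objective(paths, C), flow_balanced(paths, 5, 6))   # 3 True
```

Removing the middle path, as in `flow_balanced([paths[0], paths[2]], 5, 6)`, leaves nodes (2, 3) and (4, 4) unbalanced and the check returns `False`.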


Equation 16 requires an alignment to match the first residue in one of the two sequences to a residue in the other sequence. This constraint rules out any alignment that aligns the first residue of both sequences to a gap, which is physically meaningless, and allows the path weights, $C_{ii'jj'}$, to be precalculated.


$$\sum_{i'>1}\sum_{j}\sum_{j'>j} y_{1i'jj'} + \sum_{i}\sum_{i'>i}\sum_{j'>1} y_{ii'1j'} - \sum_{i'>1}\sum_{j'>1} y_{1i'1j'} = 1 \qquad (16)$$


If one sequence ends in a gap, the terminal residues of the other sequence must be prevented from aligning to earlier residues in a physically unrealistic way. Equation 17 allows exactly one active node $N_{i'j'}$ involving a terminal residue ($i' = M$ or $j' = N$) in either Sequence 1 or Sequence 2.

$$\sum_{i<M}\sum_{j}\sum_{j'>j} y_{iMjj'} + \sum_{i}\sum_{i'>i}\sum_{j<N} y_{ii'jN} - \sum_{i<M}\sum_{j<N} y_{iMjN} = 1 \qquad (17)$$


It is more efficient and more meaningful to restrict the search to alignments no longer than a maximum alignment length, K. Equations 18-19 require the length of each sequence plus the number of gaps inserted into it by the alignment not to exceed K, for Sequences 1 and 2 respectively.


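$$\sum_{i}\sum_{i'>i}\sum_{j}\sum_{j'>j+1} (j'-j-1)\, y_{ii'jj'} + \sum_{i'>1}\sum_{j>1}\sum_{j'>j} (j-1)\, y_{1i'jj'} - \sum_{i<M}\sum_{j}\sum_{j<j'<N} (j'-N)\, y_{iMjj'} + M \le K \qquad (18)$$

$$\sum_{i}\sum_{i'>i+1}\sum_{j}\sum_{j'>j} (i'-i-1)\, y_{ii'jj'} + \sum_{i>1}\sum_{i'>i}\sum_{j'>1} (i-1)\, y_{ii'1j'} - \sum_{i}\sum_{i<i'<M}\sum_{j<N} (i'-M)\, y_{ii'jN} + N \le K \qquad (19)$$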
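The lengths bounded by Eqs. 18-19 can also be computed directly from a list of matched nodes. A minimal sketch, assuming the alignment is given as matched `(i, j)` pairs in increasing order; the function name is hypothetical, and leading, internal and trailing gaps are counted separately to mirror the three kinds of terms in the constraints.

```python
def aligned_lengths(nodes, M, N):
    """Lengths of the Sequence 1 and Sequence 2 rows of an alignment.

    `nodes` lists matched (i, j) pairs in increasing order; Sequence 1 has M
    residues and Sequence 2 has N. Row 1 holds M residues plus one gap per
    skipped residue of Sequence 2; row 2 is the mirror image.
    """
    (i_first, j_first), (i_last, j_last) = nodes[0], nodes[-1]
    gaps_s1 = (j_first - 1) + (N - j_last)   # leading + trailing gaps in row 1
    gaps_s2 = (i_first - 1) + (M - i_last)   # leading + trailing gaps in row 2
    for (i, j), (i2, j2) in zip(nodes, nodes[1:]):
        gaps_s1 += j2 - j - 1                # internal gaps, cf. Eq. 18
        gaps_s2 += i2 - i - 1                # internal gaps, cf. Eq. 19
    return M + gaps_s1, N + gaps_s2


rows = aligned_lengths([(1, 2), (2, 3), (4, 4), (5, 6)], M=5, N=6)
K = 7
print(rows, all(length <= K for length in rows))   # (7, 7) True
```

In this sketch both rows always come out the same length, M + N minus the number of matched pairs, as they must for a pairwise alignment.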
i<M j j’>j 


Equations 14-19 form the general mathematical model 
for the path selection approach to the global pairwise 
sequence alignment problem. Any of the three models 
presented can be expanded to include functionally-spe- 
cific constraints, integer cut constraints, and pairwise 
interactions. Only the constraints necessary to include 
these features in the path selection model will be pre- 
sented here. 


Functionally-Specific Constraints 


For some sequence alignment problems, specific residues are related to the function of a protein and should be maintained in a meaningful sequence alignment. This idea can be enforced in a mathematically rigorous way. These constraints can only be defined if the node existence variables, $N_{ij}$, are connected to the path existence variables, $y_{ii'jj'}$. One way to accomplish this is by summing over a pair of indices within the path variables, as shown in Eqs. 20-21.


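$$\sum_{i<i'}\sum_{j<j'} y_{ii'jj'} = N_{i'j'} \qquad \forall\, i'>1,\ j'>1 \qquad (20)$$

$$\sum_{i'>i}\sum_{j'>j} y_{ii'jj'} = N_{ij} \qquad \forall\, i=1 \text{ or } j=1 \qquad (21)$$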


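Constraints enforcing residue identity can then be written in terms of the $N_{ij}$ variables. If position $i^*$ in Sequence 1 must be conserved to maintain function, then Eq. 22 enforces this requirement by aligning residue $i^*$ to an identical residue of Sequence 2, where $AA_{i^*}$ and $AA_j$ denote the amino acids at position $i^*$ of Sequence 1 and position $j$ of Sequence 2.

$$\sum_{j:\ AA_{i^*} = AA_j} N_{i^*j} = 1 \qquad (22)$$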


Integer Cut Constraints 


This alignment model can be further extended by introducing integer cut constraints. After each solution of the above model, the previous solution is excluded from the feasible solution space by Eq. 23, where $A$ is the set of active variables in the solution to be excluded, $I$ is the set of inactive variables, and card($A$) is the cardinality of $A$, that is, the number of elements of $A$.

$$\sum_{(i,i',j,j') \in A} y_{ii'jj'} - \sum_{(i,i',j,j') \in I} y_{ii'jj'} \le \mathrm{card}(A) - 1 \qquad (23)$$
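The effect of Eq. 23 can be checked by brute force on a toy problem. This sketch enumerates every assignment of four hypothetical binary path variables and confirms that the cut built from a previous solution removes that solution and nothing else; the function name and the previous solution are illustrative.

```python
from itertools import product


def cut_off(x, active, inactive):
    """True if assignment x violates Eq. 23:
    sum over A of x - sum over I of x <= card(A) - 1."""
    lhs = sum(x[k] for k in active) - sum(x[k] for k in inactive)
    return lhs > len(active) - 1


previous = (1, 0, 1, 0)                      # hypothetical earlier solution
A = [k for k, value in enumerate(previous) if value == 1]
I = [k for k, value in enumerate(previous) if value == 0]

excluded = [x for x in product((0, 1), repeat=4) if cut_off(x, A, I)]
print(excluded)                              # [(1, 0, 1, 0)]
```

The left-hand side reaches card(A) only when every variable in A is 1 and every variable in I is 0, which is why exactly one point is removed.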


Pairwise Interaction Scores 


A score can also be assigned for the alignment of a pair of amino acids $i, i'$ in one sequence to a specific pair of amino acids $j, j'$ in the second sequence. One promising application of these pairwise interaction scores is the ability to better evaluate the fitness of an alignment between a protein of known structure and an unknown protein with remote sequence homology. A number of recently developed Cα-based distance-dependent force fields [7,13,16] are a good source for these scores because they allow some flexibility between the backbones of the two structures.

A pairwise interaction score requires the definition of the variable $z_{ii'jj'}$, representing the successful alignment of both $i, j$ ($N_{ij}$) and $i', j'$ ($N_{i'j'}$). This variable is initially introduced in Eq. 24 as the product of two node existence binary variables.


$$z_{ii'jj'} = N_{ij} \cdot N_{i'j'} \qquad \forall\, i, i', j, j' \qquad (24)$$

Equation 24 is nonlinear and is linearized using standard optimization techniques by Eqs. 25-27, which replace Eq. 24.


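$$z_{ii'jj'} \le N_{ij} \qquad \forall\, i, i', j, j' \qquad (25)$$

$$z_{ii'jj'} \le N_{i'j'} \qquad \forall\, i, i', j, j' \qquad (26)$$

$$N_{ij} + N_{i'j'} - 1 \le z_{ii'jj'} \qquad \forall\, i, i', j, j' \qquad (27)$$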
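A truth-table check confirms that Eqs. 25-27 reproduce the product of Eq. 24 whenever the node variables are binary. The function name below is illustrative.

```python
from itertools import product


def linearized_z_values(n1, n2):
    """Binary values of z permitted by Eqs. 25-27 for n1 = N_ij and n2 = N_i'j'."""
    return [z for z in (0, 1) if z <= n1 and z <= n2 and z >= n1 + n2 - 1]


for n1, n2 in product((0, 1), repeat=2):
    allowed = linearized_z_values(n1, n2)
    assert allowed == [n1 * n2]        # exactly the product of Eq. 24
    print(n1, n2, allowed)
```

Because the three inequalities pin z to the product whenever the node variables are binary, z could even be declared continuous on [0, 1] without changing the feasible set.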


Let the score of a pairwise interaction be denoted as $P_{ii'jj'}$. The objective function of Eq. 14 is expanded to include an additional contribution, as shown in Eq. 28.


$$\max \sum_{i}\sum_{i'>i}\sum_{j}\sum_{j'>j} \left( y_{ii'jj'} \cdot C_{ii'jj'} + z_{ii'jj'} \cdot P_{ii'jj'} \right) \qquad (28)$$


The ease with which the sequence alignment models accommodate pairwise interaction scores illustrates their flexibility. Because the extended model remains a mixed-integer linear program, it is still guaranteed to reach the optimal solution for problems of this type. This guarantee suggests the effectiveness that could be achieved by incorporating such a model into a fold recognition framework.


Results and Discussion 


The mixed-integer linear programming models of Sect. “Models” can address generic sequence alignment problems of a reasonable size. The method is illustrated on an alignment of G-protein coupled receptors using integer cut constraints and on an alignment of pancreatic trypsin inhibitors demonstrating the use of functionally-relevant conservation constraints. All the alignments are calculated using the BLOSUM62 scoring matrix and an affine gap model with a gap opening penalty of 11 and a gap extension penalty of 1.


G-protein Coupled Receptors 


G-protein coupled receptors are a type of membrane protein that regulate material and ion transport across a cell membrane, which makes them a popular target for drug development. The alignment of the seventh transmembrane helix of bovine rhodopsin (34 amino acids) to the seventh transmembrane helix of H1R (35 amino acids), the first human histamine receptor, will be considered to illustrate alignment uncertainty [9]. Figure 2 shows the regions of uncertainty in the sequence alignment revealed by integer cut constraints. There is a strong conservation of alignment at the ends of the selected sequences, including the preservation of the highly conserved NPxxY motif. The central regions of the aligned sequences show more variability. This observation could be a result of less structural conservation in the region, or of less sequence similarity being required for structural (and functional) conservation.

Sequence 1:
KNCCNEHLHM FTIWLGYINS TLNPLIYPLC NENFK
Sequence 2:
SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFR

ITERATION: 1 OBJECTIVE: 26 (9 matches)
    1234567890 1234567890 1234567890 12345678
S1: KNCCNEHLHM F-TI--WLGY INSTLNPLIY PLCNENFK
               | ||            || ||    |  |
S2: ----SDFGPI FMTIPAFFAK TSAVYNPVIY IMMNKQFR

ITERATION: 2 OBJECTIVE: 25 (8 matches)
    1234567890 1234567890 1234567890 12345678
S1: KNCCNEHLHM FTIWLGYINS T---LNPLIY PLCNENFK
               |          |    || ||    |  |
S2: ----SDFGPI FMTIPAFFAK TSAVYNPVIY IMMNKQFR

ITERATION: 3 OBJECTIVE: 25 (7 matches)
    1234567890 1234567890 1234567890 12345678
S1: KNCCNEHLHM FTI---WLGY INSTLNPLIY PLCNENFK
               |               || ||    |  |
S2: ----SDFGPI FMTIPAFFAK TSAVYNPVIY IMMNKQFR

ITERATION: 4 OBJECTIVE: 25 (7 matches)
    1234567890 1234567890 1234567890 12345678
S1: KNCCNEHLHM FT---IWLGY INSTLNPLIY PLCNENFK
               |               || ||    |  |
S2: ----SDFGPI FMTIPAFFAK TSAVYNPVIY IMMNKQFR

Global Pairwise Protein Sequence Alignment via Mixed-Integer Linear Optimization, Figure 2

A rank-ordered list of the top four optimal alignments of the helix 7 region of the human histamine receptor (Sequence 1) to the helix 7 region of bovine rhodopsin (Sequence 2) for a template length of 50 residues
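The match counts quoted in Figure 2 can be reproduced by counting the columns in which both rows carry the same residue. This sketch assumes that “matches” means identical, non-gap residues in the same column, an interpretation that agrees with the counts reported for all four iterations; the function name is illustrative.

```python
def count_identities(row1, row2):
    """Number of columns in which both aligned rows carry the same residue."""
    r1, r2 = row1.replace(" ", ""), row2.replace(" ", "")
    if len(r1) != len(r2):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for a, b in zip(r1, r2) if a == b and a != "-")


S2 = "----SDFGPI FMTIPAFFAK TSAVYNPVIY IMMNKQFR"
iterations = {
    1: "KNCCNEHLHM F-TI--WLGY INSTLNPLIY PLCNENFK",
    2: "KNCCNEHLHM FTIWLGYINS T---LNPLIY PLCNENFK",
    3: "KNCCNEHLHM FTI---WLGY INSTLNPLIY PLCNENFK",
    4: "KNCCNEHLHM FT---IWLGY INSTLNPLIY PLCNENFK",
}
for it, s1 in iterations.items():
    print(it, count_identities(s1, S2))
```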

A comparison of the computational resources re- 
quired for this problem is presented in Table 1. A larger 
template length results in a more complex optimization 
problem to be solved. The path selection model signif- 
icantly outperforms the other formulations, especially 
for the larger template lengths. 
Sequence 1:
MLKYTSISFL LIILLFSFTN ANPDCLLPIK TGPCKGSFPR YAYDSSEDKC VEFIYGGCQA NANNFETIEE CEAACL

Sequence 2:
RPDFCLEPPY TGPCKARIIR YFYNAKAGLC QTFVYGGCRA KRNNFKSAED CMRTCGGA

OBJECTIVE: 141 (26 exact matches)

    1234567890 1234567890 1234567890 1234567890 1234567890
S1: MLKYTSISFL LIILLFSFTN ANPD-CLLPI KTGPCKGSFP RYAYDSSEDK
S2: ---------- ---------- -RPDFCLEPP YTGPCKARII RYFYNAKAGL

S1: CVEFIYGGCQ ANANNFETIE ECEAACL--
S2: CQTFVYGGCR AKRNNFKSAE DCMRTCGGA

Global Pairwise Protein Sequence Alignment via Mixed-Integer Linear Optimization, Figure 3
Optimal alignment of Bombyx mori kazal-type serine proteinase inhibitor 1 (Sequence 1) to bovine pancreatic trypsin inhibitor (Sequence 2), given the requirement of cysteine conservation and a template length of 100


Global Pairwise Protein Sequence Alignment via Mixed- 
Integer Linear Optimization, Table 1 

Computational performance of the template-based (TB), 
template-free (TF) and path selection (PS) models for helix 7 
of the G-protein coupled receptor proteins (run times in sec- 
onds on an Intel Pentium 3.2 GHz processor, using CPLEX 9.0) 


K TB TF PS Objective 


Serine Protease Inhibitors 


Serine protease inhibitors are responsible for regulating serine proteases, enzymes that hydrolyze peptide bonds. One well-studied protein within this class is the bovine pancreatic trypsin inhibitor (BPTI). Its native three-dimensional structure is stabilized by three disulfide bonds that are conserved across the class of serine protease inhibitors. An alignment of BPTI (58 amino acids) to the Bombyx mori (domestic silkworm) kazal-type serine protease inhibitor (76 amino acids) has previously been investigated in the context of introducing constraints for the functionally important conservation of the disulfide bonds [8]. The results of such an alignment are presented in Fig. 3. The six conserved cysteine residues necessary for the formation of the three disulfide bridges that stabilize the functional protein are apparent from this alignment.

Global Pairwise Protein Sequence Alignment via Mixed-Integer Linear Optimization, Table 2

Computational performance of the template-based (TB), template-free (TF) and path selection (PS) models for the alignment of Bombyx mori kazal-type serine proteinase inhibitor 1 to bovine pancreatic trypsin inhibitor (run times in seconds on an Intel Pentium 3.2 GHz processor, using CPLEX 9.0)

K TB TF PS Objective

A comparison of the computational resources required for this problem is presented in Table 2. Even with the inclusion of the conservation constraints, the path selection model still solves this alignment example quite rapidly for large template lengths. Although the template-free approach slightly outperforms the path selection approach for short template lengths, it does not scale well as the template length increases. As in the first example, the template-free approach solves the problem significantly faster than the template-based approach, but the path selection approach is superior to both of the other mixed-integer linear programming formulations.
A supply chain (SC) may be defined as an integrated process where several business entities such as suppliers, manufacturers, distributors, and retailers work together to plan, coordinate and control the flow of materials, parts, and finished goods from suppliers to customers. This chain is concerned with two distinct flows: a forward flow of materials and a backward flow of information. Similarly, a global supply chain (GSC) may be defined as an SC in which one or more of these business entities operate in different countries. For many years, researchers and practitioners concentrated on the individual processes and entities within the SC. Within the past few years, however, there has been an increasing effort to optimize the entire SC. This article highlights some of the early results, from the 1960s to 1995, that have led to today's SC research, as well as most of the recent results that address the design and management of GSC networks (as of 2000).

Within manufacturing and logistics research, the 
current stream of SC research is largely built on prior 
work in the area of multi-echelon inventory models. 
The early works [4,5] and [14] form the basis for most 
of the research done in this area. See [13] and [3] for ex- 
tensive reviews of multi-echelon inventory models. For 
detailed and more recent discussions of multi-echelon 
models, see [12,19,20]. 
As companies began to realize the benefits of optimizing the SC as a single entity, researchers began using operations research (OR) techniques to better model supply chains. See [2] for an extensive review of the literature in SC modeling. Typically, an SC model tries to determine:

• the transportation modes to be used;
• the suppliers to be selected;
• the amount of inventory to be held at various locations in the chain;
• the number of warehouses and plants to be used; and
• the location and capacities of these warehouses and plants.

However, as a result of the globalization of the economy, the models have become more complex. GSC models now often try to include factors such as exchange rates, international interest rates, trade barriers, taxes and duties, market prices, and duty drawbacks. All of these factors are generally difficult to include in mathematical models because of the uncertainty and nonlinearity they introduce.

See [21] and [7] for extensive reviews of GSC models. [21] concentrates on strategic production-distribution models, whereas [7] focuses on the integration of SC network optimization with real options pricing methods. This article complements these reviews by giving a chronological listing of the models in both areas.

In [15] an international facility location model is presented. This is one of the first mathematical programs that includes financial aspects in GSC modeling. The authors develop a large-scale nonlinear mixed integer programming (MIP) problem. The objective function takes into account the expected profit and the variance of the profit, where the variance of the profit is multiplied by a risk aversion factor. Plant capacities, market demands and financial constraints are included in the model. The formulation considers production and transportation costs, exchange rate fluctuations, international interest rates, market prices, import tariffs, and export taxes.
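The objective described for [15] has the familiar mean-variance form: expected profit minus a risk aversion factor times the variance of profit. The sketch below evaluates that form over a few equally likely scenarios; the design names, the scenario profits and the risk aversion values are invented for illustration and are not taken from [15].

```python
from statistics import fmean, pvariance


def risk_adjusted_profit(profits, risk_aversion):
    """Expected profit minus risk_aversion times the variance of profit,
    treating the listed scenario profits as equally likely."""
    return fmean(profits) - risk_aversion * pvariance(profits)


# Hypothetical profits of two network designs under three exchange-rate scenarios.
designs = {"stable": [100.0, 100.0, 100.0], "exposed": [80.0, 110.0, 140.0]}

for lam in (0.0, 0.1):
    best = max(designs, key=lambda name: risk_adjusted_profit(designs[name], lam))
    print(lam, best)
```

With no risk aversion the more exposed design wins on expected profit (110 versus 100); a factor of 0.1 penalizes its variance of 600 enough to make the stable design preferable.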

In [9] a deterministic model is proposed for maximizing the after-tax profit of a large-scale international distribution network. Transportation costs, fixed setup costs, variable production and purchasing costs, and fixed vendor costs are included in the model. The model enforces production capacity constraints, demand limits, material requirements at each plant, supplier capacity constraints, balance constraints at plants and distribution centers, feasible flow constraints, and offset trade requirements. The model is run sequentially over a fixed time horizon and computational results are presented for various problem sizes.

In [6] the differences between an international SC model and a single-country model are analyzed, and a dynamic, nonlinear MIP model is developed. The inclusion of features such as duties, tariffs, tax rates, and exchange rates produces models that are very difficult to solve optimally even for small problems.

In [8] a normative model is presented for the opera- 
tions of a global company. Plant location, capacity and 
product mix, and material and cash flow determina- 
tion are the decisions included in the model. The model 
consists of a master problem and a set of subproblems. 
The master problem is a multiperiod stochastic program 
and the subproblems are single period stochastic pro- 
grams. These problems are linked through a set of sub- 
models such as a stochastic SC model, a financial flow 
model, a stochastic exchange rate model, and a price- 
demand model. 

In [17] a stochastic dynamic programming (DP) 
model is developed that treats the SC as equivalent to 
owning a financial option instrument. The value of the 
option depends on the real exchange rate. The authors 
consider production switching between two manufac- 
turing plants located in different countries depending 
on the real exchange rate. The model does not consider 
characteristics such as multiple products or different SC 
stages. The model becomes intractable for more than
one exchange rate process.

In [1] a comprehensive, multiperiod, multicom- 
modity MIP model is proposed which is used to opti- 
mize the SC of Digital Equipment Corporation (DEC). 
The objective of the model is to minimize a function 
of total production and distribution cost, savings from 
credit, and an additional term which contains produc- 
tion and transportation times. The total cost includes 
fixed and variable costs of production, transportation 
cost, material handling, inventory, and overhead costs. 
The savings from credit are due to re-exporting products. The model enforces constraints on demand satisfaction, production and throughput capacities at each facility, and bounds on decision variables. In addition, international constraints such as duty drawback, duty relief, and offset trade are included. The authors describe how DEC used this model to manage its GSC.

In [18] a multiperiod stochastic DP is introduced 
that allows the firm to switch among several production 
modes to maximize profit. The production modes considered are exporting from the home country, a joint venture with local partners, and establishing a wholly owned subsidiary for a foreign firm. The paper concludes by identifying cases in which one of these modes would be preferred to the others.

In [16] a stochastic DP formulation is developed 
for the valuation of global manufacturing strategy op- 
tions. A hierarchical approach is proposed. First, the ex- 
change rates are modeled by multinomial approxima- 
tions. Then, options for alternative product and SC net- 
work designs are determined based on the firm’s global 
manufacturing strategy. Finally, an MIP model for each 
exchange rate within every period is solved and the 
value of several manufacturing options is determined. 
The expected profit for each policy option is found by 
solving a stochastic DP using the values of the manu- 
facturing policies. 

In [11] the problem of operating a network of plants 
that are partially-owned subsidiaries of a multinational 
corporation is analyzed. Using real data, a model of 
three subsidiaries and four countries is developed for 
one industry and the effects of coordination under var- 
ious macroeconomic conditions are discussed. 

In [10] optimal policies for operating a network
of plants located in different countries are studied. It
is assumed that production costs are stochastic and 
are influenced by factors such as exchange rates, infla- 
tion, taxes, and tariffs. There is a one-time charge for 
switching (production volume changes between coun- 
tries) and variable production costs are either concave 
or piecewise linear convex at each plant. It is also as- 
sumed that demand is deterministic and stationary. 
Under these assumptions a two-country, single market 
stochastic DP model is developed. The authors show 
that the optimal policy is always a barrier policy when 
switching costs are linear or step functions. (A barrier 
policy is a policy in which each plant operates either at 
a minimum or a maximum output level.) 

The literature on GSC management is quite re- 
cent and the models developed usually do not consider 
most of the uncertainties that international corpora- 


tions face. Each model addresses a limited number of 
the aspects of managing a GSC. There is an ongoing ef- 
fort to develop more comprehensive and practical GSC 
design models that will accommodate the needs of the 
rapidly changing global economy. 


See also 


> Inventory Management in Supply Chains 

> Nonconvex Network Flow Problems 

> Operations Research Models for Supply Chain 
Management and Design 

> Piecewise Linear Network Flow Problems 
Introduction


Global terrain methods [5,6,7] are a class of methods for solving nonlinear programming problems that are based on the simple concept of intelligently following valleys up and down on the terrain or landscape of three times continuously differentiable, or $C^3$, objective function surfaces. They belong to the class of integral path or path following methods [1,2,3,4,8] and can also be used to solve systems of nonlinear equations formulated as nonlinear least-squares problems. The overall approach is based on the reliable and efficient computation of minima, saddle points, and singular points, and on a terrain-following algorithm that efficiently moves from one stationary point to another or to a boundary of the feasible region. What makes global terrain methods superior to other path following methods is the Newton-based predictor-corrector method used to move uphill on the objective function landscape.


Formulation 


The problem under consideration is that of finding a number of minima, saddle points, and singular points of a $C^3$ objective function, $\varphi = \varphi(z)$, defined on $\mathbb{R}^n$ subject to bounds on the variables, $c(z)$, where $z$ are the optimization variables. Let $F = F(z)$ denote the gradient of $\varphi$ and $J(z)$ denote the $n \times n$ symmetric Jacobian matrix of $F$ (or Hessian matrix of $\varphi$).


Problem Statement 


The problem can be stated in the form 


$$\text{Find } \{z_i^*\} \text{ subject to } c(z_i^*) \text{ such that } \nabla(F^{T}F) = 0, \qquad (1)$$

where $\{z_i^*\}$ denotes a set of minima, saddle points, and/or singular points, and the constraints are given by

$$z_i^{L} \le z_i^{*} \quad \text{and} \quad z_i^{*} \le z_i^{U}, \qquad (2)$$

where $z_i^{L}$ and $z_i^{U}$ are the lower and upper bounds on the variable $z_i$.
Note that $\nabla(F^{T}F) = 2J^{T}F = 0$ implies that either

$$F(z) = 0, \qquad (3)$$

or

$$\det J = 0 \text{ with null space vector } F \neq 0. \qquad (4)$$


If $z_i^*$ satisfies Eq. (3), it is either a minimum or a saddle point of $\varphi$, whereas if $z_i^*$ satisfies Eq. (4), it is a singular point of $J$. To distinguish between minima and saddle points, the Hessian matrix of $F^{T}F$ (up to a factor of 2) is required, which is

$$H = J^{T}J + \sum_{i} F_i G_i, \qquad (5)$$

where $F_i$ is the $i$th element function of $F$ and $G_i$ is the corresponding element Hessian matrix of $F_i$. If all eigenvalues of $H$ are positive, $z_i^*$ is a minimum of $\varphi$. If at least one eigenvalue of $H$ is negative, $z_i^*$ is a saddle point of $\varphi$.


Geometrical Foundation 


Figure 1 shows the contours of $F^{T}F$ along with the terrain path for a simple two-dimensional reactor example. To understand the geometric foundation on which global terrain methods are built, consider two neighboring contours or level curves along the curved valley shown in Fig. 1. Note that the distance, $\Delta$, between any two neighboring level curves in the normalized gradient direction is largest exactly in the valley, and that this distance decreases in magnitude as points move out of the valley along the same neighboring level curves (i.e., the contours become more tightly packed together). Therefore the norm of $J^{T}F$ must be smaller at any point in the valley than at any neighboring point on the same level curve, since there the same change in the least-squares function is spread over the largest change in distance. Thus the valley connecting the stationary points shown in Fig. 1 can be characterized as the collection of local minima in the norm of $J^{T}F$ over a set of level curves. The same constrained extremum in the gradient norm also characterizes ridges, ledges and other distinct features of the objective function landscape in any $n$-dimensional space.


Global Terrain Methods, Figure 1
Contours of a least squares surface (axes: conversion, temperature (K))


Valleys, ridges, ledges, etc. can be defined mathematically by a set of solutions, $V$, to a sequence of general nonlinearly constrained optimization problems

$$V = \{\min g^{T}g \text{ such that } F^{T}F = L,\ \forall L \in \mathcal{L}\}, \qquad (6)$$

where $F$ and $J$ are defined as before, $g = 2J^{T}F$, $L$ is any given value (or level) of the least-squares objective function, and $\mathcal{L}$ is some collection of contours. That is, for any given level curve, we find the point on $L$ that corresponds to a local minimum in $g^{T}g$. The collection of minima for all levels gives all (or part) of a valley, ridge, or ledge. Equation (6) forms the geometrical backbone for global terrain methods and plays an important role in the development of the predictor-corrector algorithms used to implement those ideas. Moreover, $\mathcal{L}$ is actually a computational by-product of the terrain-following approach.
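Equation (6) can be illustrated on a hypothetical quadratic least-squares surface, F^T F = x^2 + 10 y^2, whose valley runs along the x-axis (the direction in which the contours are farthest apart). The sketch minimizes g^T g over each level curve by brute-force sampling rather than by the SQP corrector the method actually uses; the functions and levels are made up for illustration.

```python
import math


def least_squares(x, y):
    """Hypothetical F(z) = (x, sqrt(10) * y), so F^T F = x**2 + 10 * y**2."""
    return x * x + 10.0 * y * y


def gradient(x, y):
    """g = gradient of F^T F."""
    return 2.0 * x, 20.0 * y


def valley_point(level, samples=3600):
    """Point on the level curve F^T F = level with the smallest g^T g (Eq. (6)),
    found by sampling the ellipse instead of solving the constrained problem."""
    best = None
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x = math.sqrt(level) * math.cos(t)
        y = math.sqrt(level / 10.0) * math.sin(t)
        gx, gy = gradient(x, y)
        gg = gx * gx + gy * gy
        if best is None or gg < best[0]:
            best = (gg, x, y)
    return best[1], best[2]


for level in (0.5, 1.0, 2.0):
    x, y = valley_point(level)
    print(f"L = {level}: valley point ({x:.3f}, {y:.3f})")
```

Every level yields a point with y = 0, so the collection of minima traces the valley floor, as Eq. (6) describes.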

It is useful to monitor behavior on the landscape of $F^{T}F$ and on the objective function landscape simultaneously, noting that minima and saddle points of $\varphi$ are minima of $F^{T}F$, while singular points of $\varphi$ are saddle points of $F^{T}F$. Valleys on both surfaces closely align.


Methods 


Terrain-following methods consist of a sequence of sub-problems that unfold dynamically during the course of solving a nonlinear programming problem. Since global terrain methods move up and down the landscape of $F^{T}F$, these sub-problems include

1) Reliable downhill equation solving. 

2) Reliable and efficient computation of singular 
points. 

3) Efficient uphill movement comprised of predictor- 
corrector calculations. 

4) Reliable and efficient eigenvalue-eigenvector com- 
putations. 

5) Effective bookkeeping. 

6) A termination criterion to decide when the compu- 
tations have finished. 

7) Advanced techniques to deal with bifurcations and 
non-differentiable points. 


1) Moving Downhill 


Downhill computations use a trust region method and are capable of finding minima, saddle points, and sometimes singular points on an objective function surface. In finding the first point, say $z_1^*$, the initiation of downhill computations is arbitrary. On subsequent downhill sub-problems, calculations always begin in the direction of the eigenvector associated with the smallest negative eigenvalue of $H$.
The basic downhill iteration is defined as follows

$$\Delta = -\beta \Delta_N + (\beta - 1)\, g, \qquad (7)$$

where $\Delta_N = J^{-1}F$ is the Newton direction and $\beta \in [0, 1]$ is determined by the following simple rules. If $\|\Delta_N\| \le R$, then $\beta = 1$, where $R$ is the trust region radius. If $\|\Delta_N\| > R$ and $\|F\| \ge R$, then $\beta = 0$. Otherwise, $\beta$ is the unique value on $[0, 1]$ for which $\Delta$ in Eq. (7) satisfies $\|\Delta\| = R$. The new iterate is accepted if it reduces $\|F\|$. Otherwise, the new iterate is rejected, the trust region radius is reduced and the calculations are repeated until a reduction in $\|F\|$ occurs. Downhill movement is terminated when either $\|F\| < \epsilon$, where $\epsilon$ is a convergence tolerance, or $\|F\|/\|\Delta_N\| < \delta$, where $\delta$ is some small number (typically 10~°). This latter condition implies that the Newton step is very large in comparison to the gradient and the computations are converging to a singular point. The algorithm then switches to quadratic acceleration.


2) Acceleration to Singular Points 


During downhill movement, quadratic acceleration is used once $\|F\|/\|\Delta_N\|$ falls below the small threshold used to terminate downhill movement. Quadratic acceleration is also used during uphill calculations to converge to singular points and is defined by

$$\Delta = -H^{-1}J^{T}F. \qquad (8)$$

During acceleration, norm reduction in $F$ is not enforced because $H$ can have eigenvalues of mixed sign.
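Quadratic acceleration is Newton's method applied to F^T F. On the hypothetical one-dimensional example F(z) = z^2 + 1 (the derivative of φ = z^3/3 + z), F has no root, but J = 2z vanishes at z = 0, which is therefore a singular point. The sketch below applies Eq. (8) from z = 0.5 and converges to that point, where F = 1 is still nonzero; the function names and the example are illustrative.

```python
def F(z):
    """Hypothetical 1-D example: F = dphi/dz for phi(z) = z**3 / 3 + z (F has no root)."""
    return z * z + 1.0


def J(z):
    """J = dF/dz; it vanishes at z = 0, which is therefore a singular point."""
    return 2.0 * z


def G(z):
    """Element Hessian of F (constant in this example)."""
    return 2.0


def accelerate(z, tol=1e-12, max_iter=50):
    """Quadratic acceleration, Eq. (8): Delta = -H^{-1} J^T F, H = J^T J + F G."""
    for _ in range(max_iter):
        jtf = J(z) * F(z)                # J^T F, half the gradient of F^T F
        if abs(jtf) < tol:
            break
        H = J(z) ** 2 + F(z) * G(z)
        z -= jtf / H                     # no check that ||F|| decreases
    return z


z_star = accelerate(0.5)
print(z_star, J(z_star), F(z_star))
```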


3) Moving Uphill 


Uphill movement is initiated in the eigen-direction associated with the smallest positive eigenvalue of the Hessian matrix $H$ and consists of two basic parts: Newton predictor steps and successive quadratic programming (SQP) corrector steps.


Uphill Predictor Steps Predictor steps follow a valley uphill but will ‘drift’ from the valley, as shown by the slight zigzag in the terrain path in Fig. 1 (each drift is followed by corrector steps). Uphill Newton steps are defined by

$$\Delta_U = \alpha \Delta_N, \qquad (9)$$

where $\Delta_N = J^{-1}F$ and the step size $\alpha \in (0, 1]$.


Uphill Corrector Steps Corrector steps (again see Fig. 1) are used intermittently to force iterates back to a valley and are invoked when the condition

$$\theta = 57.295 \arccos\left[\frac{\Delta_N^{T} v}{\|\Delta_N\|\,\|v\|}\right] \ge \Theta \qquad (10)$$

is satisfied, where $v$ is the current estimate of the eigenvector associated with the smallest positive eigenvalue of $H$, $\Theta$ is 5 degrees, and the factor $57.295 \approx 180/\pi$ converts radians to degrees. Corrector steps are formulated as

$$\min g^{T}g \text{ such that } F^{T}F = L, \qquad (11)$$

where $L$ is the current value of $F^{T}F$. Corrector steps are iterative and are considered converged when the necessary conditions

$$F^{T}F - L = 0, \qquad (12)$$

$$Hg - \lambda g = 0 \qquad (13)$$
are satisfied. Corrector steps are computed using a successive quadratic programming (SQP) method; however, other methods can be used for this purpose. The SQP formulation for the problem defined by Eq. (11) is given by

$$\min\ g^{T}H\Delta_c + \tfrac{1}{2}\Delta_c^{T}M\Delta_c \quad \text{such that} \quad g^{T}\Delta_c = -(F^{T}F - L), \qquad (14)$$

where $M$ is the Hessian matrix of the Lagrangian function. The Lagrangian function is defined by $g^{T}g - \lambda(F^{T}F - L)$, where $\lambda$ is a Lagrange multiplier, and $M$ is approximated by the rule $M = H^{T}H - \lambda H$.
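The uphill predictor of Eq. (9) and the drift test of Eq. (10) that triggers the corrector are sketched below. The eigenvector estimate `v` is assumed to be oriented along the direction of travel (an eigenvector is only defined up to sign), and the function names are hypothetical; the SQP corrector itself is not shown.

```python
import math


def predictor(z, d_newton, alpha):
    """Uphill Newton predictor, Eq. (9): z + alpha * J^{-1} F, alpha in (0, 1]."""
    return [zi + alpha * di for zi, di in zip(z, d_newton)]


def drift_angle_deg(d_newton, v):
    """Angle in degrees between the Newton direction and the eigenvector estimate, Eq. (10)."""
    dot = sum(a * b for a, b in zip(d_newton, v))
    cos_t = dot / (math.hypot(*d_newton) * math.hypot(*v))
    cos_t = max(-1.0, min(1.0, cos_t))   # guard against round-off just outside [-1, 1]
    return 57.295 * math.acos(cos_t)     # 57.295 ~ 180 / pi converts radians to degrees


def needs_corrector(d_newton, v, threshold_deg=5.0):
    """True when the predictor has drifted at least threshold_deg from the valley direction."""
    return drift_angle_deg(d_newton, v) >= threshold_deg


v = [math.cos(math.radians(10.0)), math.sin(math.radians(10.0))]
print(needs_corrector([1.0, 0.0], v))   # True: about 10 degrees of drift
```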


4) Eigenvalue-Eigenvector Computations 


It is not always necessary to find all eigenvalues and eigenvectors of $H$ to decide whether to begin the next phase of the computations uphill or downhill, particularly for problems with large $n$. Often it is sufficient to compute a subset of eigenvalues and eigenvectors, which can be conveniently done using the inverse power method. The inverse power method solves the inverse form of

$$Hv - \lambda v = 0 \qquad (15)$$

by constructing the iteration

$$v_{k+1} = \lambda_k H^{-1} v_k, \qquad (16)$$

$$\lambda_{k+1} = \frac{v_{k+1}^{T} H v_{k+1}}{v_{k+1}^{T} v_{k+1}}, \qquad (17)$$

where the calculations alternate between Eqs. (16) 
and (17) until ||v,g44 —AxH!v_|| < €, where e is some 
pre-specified tolerance. Note that an estimate of v is 
necessary to begin the inverse power method. Once the 
first eigenvalue, say A,, and its corresponding eigen- 
vector, v}, have been determined, the Hessian matrix 
is deflated using symmetric orthonormal projection to 
give an (n — 1) x (n — 1) symmetric matrix whose ba- 
sis spans the space orthogonal to v,. The inverse power 
method is used to find the next eigenvalue, A2, and its 
associated eigenvector, v2, and then v2 is lifted to R”. 
This procedure of deflation by orthonormal projection 
to form an (m — j) x (n — j) symmetric matrix whose 


basis spans the space orthogonal to {v,, v2,..., vj} fol- 
lowed by the inverse power method and the lifting of 
vj+1 to R” is continued until as many eigenvalues and 
eigenvectors as desired are determined. 
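A minimal NumPy sketch of the inverse power method with deflation by orthonormal projection follows. It is an illustration under assumptions, not the authors' code: the function names are hypothetical; the eigenvalue estimate is the Rayleigh quotient $v^T H v$ of the normalized iterate rather than Eq. (17); the stopping test is the eigen-residual $\|Hv - \lambda v\|$; and plain inverse iteration converges to the eigenvalue of smallest magnitude, so targeting the smallest positive eigenvalue specifically would require a shift. The orthonormal complement of the vectors already found is taken from a complete QR factorization.

```python
import numpy as np


def inverse_power(H, v0, tol=1e-10, max_iter=1000):
    """Inverse power iteration: converges to the eigenpair of the symmetric,
    nonsingular matrix H whose eigenvalue has the smallest magnitude."""
    v = v0 / np.linalg.norm(v0)
    lam = float(v @ H @ v)
    for _ in range(max_iter):
        w = np.linalg.solve(H, v)      # apply H^{-1} without forming the inverse
        v = w / np.linalg.norm(w)      # normalize the new iterate
        lam = float(v @ H @ v)         # Rayleigh-quotient eigenvalue estimate
        if np.linalg.norm(H @ v - lam * v) < tol:
            break
    return lam, v


def deflated_eigenpairs(H, count):
    """Find `count` eigenpairs by inverse power iteration on the projection of H
    onto the orthogonal complement of the eigenvectors found so far."""
    n = H.shape[0]
    vals, vecs = [], []
    for j in range(count):
        if vecs:
            Q_full, _ = np.linalg.qr(np.column_stack(vecs), mode="complete")
            Q = Q_full[:, j:]          # n x (n - j) orthonormal basis of the complement
        else:
            Q = np.eye(n)
        H_j = Q.T @ H @ Q              # (n - j) x (n - j) symmetric deflated matrix
        lam, w = inverse_power(H_j, np.linspace(1.0, 2.0, n - j))
        v = Q @ w                      # lift the eigenvector back to R^n
        vals.append(lam)
        vecs.append(v / np.linalg.norm(v))
    return vals, vecs
```

The fixed starting vector is an arbitrary choice for the sketch; if it happens to be orthogonal to the wanted eigenvector, a different start is needed.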


5) Effective Bookkeeping 


Another important aspect of global terrain methods is that it is possible to avoid calculating the same $z_i^*$ more than once by effective bookkeeping. This is accomplished by storing solution information that includes the set of solutions, the solution types (i.e., minimum, saddle point, or singular point), corresponding values of $g$ and $F^T F$, and the current set of eigen-connections (i.e., the smallest positive eigenvalue and associated eigenvector for minima and saddles, and the largest negative eigenvalue and associated eigenvector for singular points). Following the determination of the first stationary or singular point, $z_1^*$, uphill movement proceeds in the +/- eigen-direction associated with the smallest positive eigenvalue of $H$. Assume that two new stationary or singular points, $z_2^*$ and $z_3^*$, have been determined by these uphill calculations. The next move will be downhill from $z_3^*$ in the eigen-direction, $v_3$, associated with the largest negative eigenvalue, $\lambda_3$, of $H$ at $z_3^*$. However, since $z_2^*$ and $z_3^*$ are connected by a path to $z_1^*$, care must be exercised so as not to follow the path back to $z_1^*$. To do this, nearest neighbors are determined by finding $k$ such that

$$\|z_3^* - z_k^*\| \text{ is minimum for all } k \neq 3. \qquad (18)$$

Let $j$ be the index for which Eq. (18) is satisfied. Following this, the direction $d_3 = z_j^* - z_3^*$ is defined. Correct downhill movement away from $z_3^*$ is defined by whichever inequality

$$v_3^T d_3 < 0 \quad \text{or} \quad -v_3^T d_3 < 0 \qquad (19)$$

is satisfied. Note that the selection of the proper condition in Eq. (19) guarantees that initial movement from $z_3^*$ will be in the direction away from the nearest solution $z_j^*$. Equations (18) and (19) can be easily generalized to give

$$\|z_i^* - z_k^*\| \text{ is minimum for all } k \neq i,\ k = 1, \ldots, K, \qquad (20)$$

$$v_i^T d_i < 0 \quad \text{or} \quad -v_i^T d_i < 0, \qquad (21)$$

where $d_i = z_j^* - z_i^*$, $j$ is the index that satisfies Eq. (20), and $K$ is the current number of solutions.
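The nearest-neighbor test of Eqs. (20) and (21) reduces to a few lines. In the pure-Python sketch below (the function name is hypothetical), `solutions` holds the stored points $z_k^*$ as coordinate lists, `i` is the index of the point being left, and `v` is its stored eigenvector. The tie case $v_i^T d_i = 0$, which the text does not address, is an assumption here: it simply falls through to $-v_i$.

```python
import math


def downhill_direction(solutions, i, v):
    """Pick +v or -v at stored point z_i so the first step moves away from its
    nearest stored neighbour z_j, following Eqs. (20)-(21).

    Returns (direction, j).
    """
    # Eq. (20): index j of the stored solution closest to z_i (excluding i itself)
    j = min((k for k in range(len(solutions)) if k != i),
            key=lambda k: math.dist(solutions[i], solutions[k]))
    d = [a - b for a, b in zip(solutions[j], solutions[i])]  # d_i = z_j - z_i
    vd = sum(a * b for a, b in zip(v, d))                    # v_i^T d_i
    # Eq. (21): keep the sign for which the inner product with d_i is negative
    direction = list(v) if vd < 0 else [-a for a in v]
    return direction, j
```

For example, with stored points (0, 0), (1, 0) and (5, 5), leaving (0, 0) along $v = (1, 0)$ returns $-v$, because the nearest neighbor (1, 0) lies in the $+v$ direction.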
6) Termination


Termination occurs when either the desired number of points $\{z_k^*\}$ has been calculated or a certain number of bounds has been encountered. The first termination criterion is straightforward. In the normal case, termination occurs when two bounds have been encountered. When bifurcations have been detected, the number of bounds that must be encountered for termination to occur is $n_b + 2$, where $n_b$ is the number of distinct bifurcations.


7) Advanced Techniques 


For any global terrain method to be effective, it must also address issues such as parametric disconnectedness, integral path bifurcations, and non-differentiable points or manifolds.


Parametric Disconnectedness 


Following solutions parametrically is the basis for many 
homotopy-continuation methods. However, when 
parametric solutions exist on disconnected branches 
of solution curves, continuation methods can have 
difficulties. Global terrain methods are completely un- 
affected by parametric disconnectedness since they 
operate in variable and not parameter space. 


Non-differentiable Points and Manifolds 


There are many engineering applications that ex- 
hibit non-differentiable points and/or manifolds as 
a consequence of inherent switching contained in 
the objective function. At the ‘switch’ points, non- 
differentiability can occur and there can be families of 
‘switch’ points that form manifolds. Non-differentiable 
points or manifolds are easily detected because they of- 
ten exhibit retrograde curvature as well as other quali- 
tative changes in model behavior that can be readily 
monitored. 

Figure 2 illustrates a case in which there is a non-differentiable manifold. In this figure, $z_1$, $z_2$ and $z_3$ are the three variables shown on the triangular plot, with $z_1 + z_2 + z_3 = 1$, which is why the feasible region is triangular, and $0 \leq z_i \leq 1$, $i = 1, 2, 3$. This curved manifold of non-differentiable points denotes the boundary between qualitatively different types of behavior for the case where $\phi = \min[\phi_1, \phi_2]$ at each $z$, and it is usually not mentioned in discussions of optimization of physical models. However, it is important in computations. The global terrain methodology has no difficulty finding stationary and singular points on $F^T F$ in this case because it monitors all aspects of $g$, thereby allowing switching to take place on the fly and the correct stationary and singular points to be found easily.


Integral Path Bifurcations 


There are many applications in which integral paths 
either split into two or become tangent to a contour. 
These occurrences are called integral path bifurcations 
and can significantly impact the reliability of global ter- 
rain methods. Fortunately, Gauss curvature can pro- 
vide a deterministic measure of the presence of bifur- 
cation points. 

It is often easier to understand integral path bifurcations from a geometrical perspective. Consider Fig. 3, where $z_1$, $z_2$ and $z_3$ are the three variables shown on the triangular plot, $z_1 + z_2 + z_3 = 1$, and $0 \leq z_i \leq 1$, $i = 1, 2, 3$. Note that there is a pitchfork bifurcation at the point denoted by b on the integral path that runs from the two minima and the saddle point of $F^T F$ in the center of the triangle toward the saddle point and minimum very close to the hypotenuse of the triangular region. If the integral path bifurcation at b goes undetected, then the saddle point and minimum closest to the hypotenuse will not be found, because corrector iterations will force iterates to turn toward the left- or right-hand branches of the pitchfork that end at the corners of the hypotenuse. Note, however, that the level curves begin to flatten in the neighborhood of the bifurcation point as the path moves toward the hypotenuse. This flattening, together with an eigenvector exchange from $J^T F$ to a vector in the tangent subspace of the level constraint, is a necessary condition for integral path pitchfork bifurcations like the one that occurs at b. Moreover, flattening is relatively easy to measure by calculating (Gauss) curvature along a contour.


Gauss Curvature 


To measure Gauss or Gauss-Kronecker curvature, it is necessary to calculate eigenvalues of the Hessian matrix, $H$, projected onto the tangent subspace of the level constraint, which is orthogonal to the gradient at any given point along the integral path. Gauss-Kronecker curvature corresponds to the determinant of this projected Hessian matrix. When the number of unknowns is two, this curvature is called Gauss curvature. Decreasing Gauss or Gauss-Kronecker curvature in a particular part of the feasible region indicates that the level curves are flattening and provides a strong reason to check for an exchange in the 'minimum' eigenvector of $H$ and, if warranted, to search for an integral path bifurcation.


Global Terrain Methods, Figure 2
An Objective Function & Gradient Surface with a Non-Differentiable Manifold: (left) $\phi$; (right) a Composite $F^T F$ ($T = 280.15$ K, $p = 1.013$ bar)


Global Terrain Methods, Figure 3
Integral Path Bifurcation on Objective Function & Gradient Surfaces: (left) Landscape of $\phi$; (right) Landscape of $F^T F$ ($T = 280.15$ K, $p = 1.013$ bar)

The current implementation of these ideas measures
flattening by calculating a few of the smallest eigen- 
values (and eigenvectors) of the projection of H onto 
the tangent subspace at each iteration of the calcula- 
tions. Without a theoretical basis that defines how of- 
ten Gauss curvature should be measured, intermittent 
measurement seems ad hoc at best since very small re- 
gions of decreasing Gauss curvature could go unde- 
tected. 
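Measuring flattening as described above amounts to projecting $H$ onto the subspace orthogonal to the gradient and inspecting the determinant and the few smallest eigenvalues of the projection. A NumPy sketch under assumptions: the names are hypothetical; `grad` may be taken as $g = J^T F$, which is parallel to the gradient $2J^T F$ of $F^T F$; and the normalization that turns the determinant into an actual curvature value is omitted.

```python
import numpy as np


def tangent_projected_hessian(H, grad):
    """Project the symmetric matrix H onto the subspace orthogonal to grad."""
    n = grad.size
    Q_full, _ = np.linalg.qr(grad.reshape(n, 1), mode="complete")
    Q = Q_full[:, 1:]  # columns 2..n form an orthonormal basis orthogonal to grad
    return Q.T @ H @ Q


def flattening_indicators(H, grad, how_many=2):
    """Determinant and the `how_many` smallest eigenvalues of the projected Hessian."""
    Hp = tangent_projected_hessian(H, grad)
    Hp = 0.5 * (Hp + Hp.T)          # symmetrize away round-off
    eigs = np.linalg.eigvalsh(Hp)   # eigenvalues in ascending order
    return float(np.linalg.det(Hp)), eigs[:how_many]
```

Tracking these two indicators from iteration to iteration is what would reveal a decrease in curvature, in line with the monitoring described above.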


Finding Integral Path Tangent Bifurcations 


This type of bifurcation point can be detected by mea- 
suring Gauss curvature and by comparing vectors along 
the flow of an integral path and vectors in the tan- 
gent subspace of the level sets for points on the path. 
When Gauss curvature decreases and the flow of the 
integral path becomes collinear to the tangent subspace, 
a tangent bifurcation point has occurred. Generally, this 
shows up as a ‘jump’ in the path to a point a con- 
siderable distance away on a neighboring level curve. 
Between these two points the value of the constrained 
minimum defining the path is degenerate and H has re- 
peated eigenvalues. 


Finding Integral Path Pitchfork Bifurcations 


When flattening occurs but the flow of the integral path is not collinear with the tangent subspace of the level constraint, an eigenvalue exchange is sought. This exchange in the minimum eigenvalue of $H$ from one associated with $J^T F$ to one associated with the tangent subspace of a level curve is easily determined by monitoring the eigenvalue associated with the terrain path and the smallest eigenvalue of the matrix $H$ projected onto the tangent subspace. Once an eigenvalue exchange is detected, the algorithm searches for a possible bifurcation point by locating a maximum in the norm of $J^T F$ on the level curve, say $L^*$, where the eigenvalue exchange has been detected. This is because, as contours flatten, the distance between these level curves becomes smaller and smaller, which indicates that the nature of $\|J^T F\|$ on $L^*$ has changed from a constrained minimum to a constrained maximum. See the discussion in [7]. Therefore, an approximate bifurcation point


is calculated by solving the NLP problem 


$$\max\; g^T g \quad \text{such that} \quad F^T F = L^*. \qquad (22)$$


Note that Eq. (22) is very similar to Eq. (11). Thus 
the numerical methodology needed to solve Eq. (22) 
already exists in the form of the corrector algorithm. 
However, it is important to note that predictor iter- 
ates rarely land exactly on the contour corresponding to 
a pitchfork bifurcation because finite step sizes are used 
in the predictor-corrector calculations. They generally 
land close and thus the solution to Eq. (22) is usually 
a very good approximation of the bifurcation point - 
since all that is really needed to follow all branches of 
a pitchfork bifurcation is knowledge at a point follow- 
ing the eigenvector exchange. Moreover, because con- 
tours in the neighborhood of a pitchfork bifurcation 
point can be very flat, solving Eq. (22) can be challeng- 
ing in some cases. Extreme flatness creates numerical 
problems because it implies that the Kuhn-Tucker con- 
ditions for Eq. (22) have a near singular coefficient ma- 
trix. Therefore, good step size control should be used 
when solving Eq. (22). 


Finding All Branches Associated 
with a Bifurcation Point 


Once a bifurcation point is located, all branches from 
the bifurcation must be followed in order to increase 
the probability of finding all relevant solution informa- 
tion. Locating these branches is reasonably straightfor- 
ward. Tangent bifurcation points are characterized by 
collinearity and provide only a single branch for further 
exploration that, as noted, manifests itself by a ‘jump’ to 
a widely different point on a neighboring level curve. 
Pitchfork bifurcation points, on the other hand, pro- 
vide three branches of further exploration defined by 
the gradient to the level curve L*, and +/— the ‘mini- 
mum’ eigenvector of H projected onto the tangent sub- 
space at the bifurcation on L*. Each of these vectors is 
easily computed. The gradient vector at a bifurcation, 
which corresponds to the middle part of the pitchfork, 
is a readily available byproduct of the calculations. The 
‘minimum’ eigenvector of H on the tangent subspace at 
$L^*$ is also easily determined. What is difficult is locating the valleys that correspond to the pair of minima of $g^T g$ on $L^*$. For this, a careful initialization of our corrector algorithm is required to solve Eq. (11) with $L = L^*$.


Cases 


There are several problem cases that are encompassed by the global terrain-following formulations and methods presented in earlier sections. These cases include

1) Nonlinear objective functions with simple bounds 
on variables. 

2) Systems of nonlinear algebraic equations. 

3) Nonlinear objective functions with simple bounds 
and linear constraints. 


1) Nonlinear Objective Functions 
with Simple Bounds 


This is the case on which the developments in the for- 
mulation and methods sections are based and no fur- 
ther discussion is necessary. 


2) Systems of Nonlinear Algebraic Equations 


For a system of algebraic equations, $F = 0$ is usually given and $\phi$ is irrelevant. The function of interest becomes the traditional nonlinear least squares function, $F^T F$, and the terrain methodology follows the strategies outlined in previous sections.


3) Nonlinear Objective Functions 
with Simple Bounds and Linear Constraints 


Nonlinear programming problems that involve linear 
equality constraints are easily handled by global ter- 
rain methods by using the linear constraints to elimi- 
nate optimization variables. For m linear constraints, m 
optimization variables can be eliminated. However, it 
is important to understand that the gradient and Hes- 
sian matrix of g must be adjusted to accommodate this 
variable elimination. This can be done by either using 
projection methods or by explicitly doing the elimina- 
tion before formulating the optimization problem to be 
solved by the terrain methodology. 

If projection is used, then $F$ is replaced by $P^T F$, where $P$ is the $n \times (n-m)$ orthonormal projection matrix whose columns are orthogonal to all rows of the Jacobian matrix of the linear constraints. That is, if $J_{EQ}$ is the $m \times n$ Jacobian of the $m$ linear equality constraints, then the projection matrix $P$ satisfies $J_{EQ} P = 0$. Additionally, the Hessian matrix of $\phi$, $J$, must reflect implicit elimination and is easily computed to be $P^T J P$. These projections of $F$ and $J$ permit the use of the terrain methodology in $\mathbb{R}^{n-m}$ while still allowing any bounds on all variables to be enforced.
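The projection approach for linear equality constraints can be sketched with NumPy as below. As an assumption, $P$ is built from the right singular vectors of $J_{EQ}$ (any orthonormal null-space basis would do), and $J_{EQ}$ is assumed to have full row rank $m$; the function names are hypothetical.

```python
import numpy as np


def null_space_basis(J_eq, rel_tol=1e-12):
    """Orthonormal n x (n - rank) matrix P with J_eq @ P = 0."""
    _, s, Vt = np.linalg.svd(J_eq)   # full_matrices=True by default: Vt is n x n
    rank = int(np.sum(s > rel_tol * s[0]))
    return Vt[rank:].T               # right singular vectors for zero singular values


def reduced_gradient_and_hessian(F, J, J_eq):
    """Return P^T F and P^T J P for linear equality constraints with Jacobian J_eq."""
    P = null_space_basis(J_eq)
    return P.T @ F, P.T @ J @ P
```

For the single constraint $z_1 + z_2 + z_3 = 1$ of Figs. 2 and 3, $J_{EQ} = [1\ 1\ 1]$ and $P$ is $3 \times 2$, so the terrain calculations would run in $\mathbb{R}^2$.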


References 


1. Baker J (1986) An algorithm for the location of transition 
states. J Comput Chem 7:385-395 

2. Cerjan CJ, Miller WH (1981) On finding transition states. 
J Chem Phys 75:2800-2806 

3. Diener I (1987) On the global convergence of path-
following methods to determine all solutions to a system of 
nonlinear equations. Math Prog 39:181-188 

4. Jongen HT, Stein O (2004) Constrained global optimization: 
adaptive gradient flows. In: Floudas CA, Pardalos P (eds) 
Frontiers in Global Optimization. Kluwer Acad, Boston 

5. Lucia A, DiMaggio PA, Depa P (2004) A geometric method- 
ology for global optimization. J Global Optim 29:297-314 

6. Lucia A, Yang F (2002) Global terrain methods. Comput 
Chem Eng 26:529-546 

7. Lucia A, Yang F (2003) Multivariable terrain methods. AIChE 
J 49:2553-2563 

8. Page M, McIver JW (1988) On evaluating the reaction path
Hamiltonian. J Chem Phys 88:922-935 


Graph Coloring 


GC 


JUE XUE 
Department Management Sci., 
City University Hong Kong, Kowloon, Hong Kong 


MSC2000: 90C35 


Article Outline 


Keywords 
See also 
References 


Keywords 


Graph; Coloring; Optimization; Approximation; 
Algorithms 


A graph $G = (V, E)$ consists of a vertex set $V$ and an edge set $E \subseteq V \times V$. If $e = (i, j)$ ($\in E$) is an edge of G, then $e$ is incident to $i$ and $j$, and $i$ and $j$ are adjacent. Similarly, if two edges are incident to the same vertex, they are adjacent.

A vertex coloring of $G = (V, E)$ is an assignment of colors to the members of $V$ (a coloring) so that adjacent vertices have different colors; if such an assignment uses $k$ colors, G is k-colorable. The graph coloring problem (GC) is to find the minimum number $k$ such that G is k-colorable.

When a positive integer weight $w_i$ is associated with every $i \in V$ and a color assignment satisfies:

• every vertex $i$ gets $w_i$ different colors,

• $\forall (i, j) \in E$, $i$ and $j$ get $w_i + w_j$ different colors,

then this color assignment is a weighted coloring. The weighted graph coloring problem asks for the minimum number of colors needed for a weighted coloring of G.
An edge coloring and a total coloring of a given graph can be defined in a similar way:

• An edge coloring assigns colors to edges so that adjacent edges have different colors.

• A total coloring assigns colors to vertices and edges so that any pair of adjacent vertices, any pair of adjacent edges, and a vertex and any incident edge have different colors.

The edge coloring problem or the total coloring problem asks for the minimum number of colors needed for an edge coloring or a total coloring, respectively [12,28,30]. Although the weighted graph coloring, edge coloring, and total coloring problems seem different from GC, they can be transformed into a GC [33,42]. Further generalizations of GC tend to change the structure of a coloring solution, and they move closer to other well-known combinatorial optimization problems [16,37].
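The transformations of weighted coloring and edge coloring into GC can be sketched as follows. These are the standard constructions, shown as an illustration and not necessarily the ones used in [33,42]: for weighted coloring, each vertex $i$ is replaced by a clique of $w_i$ copies and every copy of $i$ is joined to every copy of $j$ for $(i, j) \in E$; for edge coloring, GC is applied to the line graph. The function names are hypothetical, and total coloring (which can be handled similarly) is not shown.

```python
from itertools import combinations


def weighted_to_plain(weights, edges):
    """Weighted coloring -> GC: vertex i becomes a clique of w_i copies, and every
    copy of i is joined to every copy of j for each edge (i, j)."""
    copies = {i: [(i, c) for c in range(w)] for i, w in weights.items()}
    new_edges = set()
    for i in copies:
        new_edges.update(combinations(copies[i], 2))  # copies of i need w_i distinct colors
    for i, j in edges:
        # i and j must share no color, giving w_i + w_j distinct colors in total
        new_edges.update((a, b) for a in copies[i] for b in copies[j])
    vertices = [x for i in copies for x in copies[i]]
    return vertices, new_edges


def line_graph(edges):
    """Edge coloring -> GC: vertices are the edges of G, adjacent when they share an endpoint."""
    edges = [tuple(e) for e in edges]
    return edges, {(e, f) for e, f in combinations(edges, 2) if set(e) & set(f)}
```

A vertex coloring of the constructed graph gives a weighted coloring (respectively, an edge coloring) of the original graph with the same number of colors, and conversely.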

GC is well known in graph theory and combinatorial optimization. It originated with the famous four-coloring conjecture [24,38], which says four colors are enough to color any geographic map so that every country gets a color different from those used by its neighbors. Although the four-coloring conjecture is now considered a theorem [1,2], the process to prove or disprove it has inspired many interesting questions [32] and has helped the development of several branches of science, for example, GC and graph theory [27]. The interest in GC also comes from its
vast number of applications in solving real world prob- 
lems. For example, GC can be used to model problems 
in timetabling, scheduling, computer science, informa- 
tion systems, telecommunications, and other indus- 
trial applications [9,11,39]. Typically, a graph is con- 
structed with its vertices representing items of interest 


and edges representing some undesirable binary rela- 
tionship. 

GC has several mathematical programming formulations. For example, one can use an integer variable $x_{ik} = 1$ to indicate that vertex $i$ is colored by color $k$, and $x_{ik} = 0$ otherwise. One can also use an integer variable $y_k = 1$ to indicate that color $k$ is assigned to at least one vertex of G, and $y_k = 0$ otherwise. Then, the solution to the following mathematical programming problem provides an optimal (minimum) coloring of G:

$$
\begin{aligned}
\min \quad & \sum_{k=1}^{|V|} y_k \\
\text{s.t.} \quad & \sum_{k=1}^{|V|} x_{ik} = 1, && \forall i \in V, \\
& x_{ik} + x_{jk} \leq 1, && \forall (i, j) \in E,\ k = 1, \ldots, |V|, \\
& y_k \geq x_{ik}, && \forall i \in V,\ k = 1, \ldots, |V|, \\
& y_k,\ x_{ik} \in \{0, 1\}, && \forall i \in V,\ k = 1, \ldots, |V|,
\end{aligned}
$$

where $|V|$ is the cardinality of the set $V$. In this problem, the objective function equals the number of colors used. The constraints ensure that every vertex is colored, that no adjacent vertices get the same color, and that the counting of used colors is correct.
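To make the formulation concrete, the pure-Python sketch below (hypothetical function name) builds $x_{ik}$ and $y_k$ from a given coloring with colors in $\{1, \ldots, |V|\}$, checks the three constraint families, and returns the objective value together with a feasibility flag. It only evaluates a given assignment; it does not solve the integer program.

```python
def ilp_objective_and_feasibility(vertices, edges, color):
    """Build x_ik and y_k from a coloring (colors in 1..|V|), check the constraints,
    and return (objective value, feasible?)."""
    K = range(1, len(vertices) + 1)
    x = {(i, k): int(color[i] == k) for i in vertices for k in K}
    y = {k: int(any(x[i, k] for i in vertices)) for k in K}
    each_vertex_colored = all(sum(x[i, k] for k in K) == 1 for i in vertices)
    adjacent_differ = all(x[i, k] + x[j, k] <= 1 for i, j in edges for k in K)
    colors_counted = all(y[k] >= x[i, k] for i in vertices for k in K)
    feasible = each_vertex_colored and adjacent_differ and colors_counted
    return sum(y.values()), feasible
```

For a triangle colored 1, 2, 3 the sketch returns (3, True); giving two adjacent vertices the same color makes the second constraint family fail.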

For a feasible coloring, one can group the vertices into subsets based on their colors. Thus, vertices of each subset will be mutually nonadjacent. Such a subset of vertices is called a stable set, a color class, or an independent set [5,8,35,41]. Using the concept of a stable set, one can formulate GC as a set partitioning problem.

Let $S_1, \ldots, S_t$ be all the stable sets of G. Let $A_S$ be a 0-1 matrix whose rows are the characteristic vectors of the $S_j$s. One can use a variable $s_j = 1$ to indicate that all members of $S_j$ have the same color, and $s_j = 0$ otherwise. Then the solution to the following problem also provides an optimal (minimum) coloring of G:

$$
\begin{aligned}
\min \quad & \sum_{j=1}^{t} s_j \\
\text{s.t.} \quad & s A_S = \mathbf{1}, \\
& s_j \in \{0, 1\}, \quad j = 1, \ldots, t,
\end{aligned}
$$

where $s = (s_1, \ldots, s_t)$, and $\mathbf{1} = (1, \ldots, 1)$ is of dimension $|V|$.
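For very small graphs, the set partitioning view can be solved exactly by recursion over vertex subsets. The lowest-indexed uncolored vertex must belong to some stable set, so one tries every stable set containing it and recurses on what remains. The pure-Python sketch below is an exponential-time illustration (roughly $3^{|V|}$ subset visits), not a practical method. Vertices are assumed to be labeled $0, \ldots, n-1$, and the function name is hypothetical.

```python
from functools import lru_cache


def chromatic_number(n, edges):
    """Minimum number of stable sets that partition V = {0, ..., n-1} (tiny graphs only)."""
    adj = [0] * n
    for i, j in edges:
        adj[i] |= 1 << j
        adj[j] |= 1 << i

    def is_stable(mask):
        # no vertex in mask has a neighbour in mask
        return all(not (adj[v] & mask) for v in range(n) if mask >> v & 1)

    @lru_cache(maxsize=None)
    def best(remaining):
        if remaining == 0:
            return 0
        low = remaining & -remaining   # lowest remaining vertex must lie in some stable set
        rest = remaining ^ low
        result = n
        sub = rest
        while True:                    # enumerate all subsets of `rest`, including the empty set
            S = sub | low
            if is_stable(S):
                result = min(result, 1 + best(remaining ^ S))
            if sub == 0:
                break
            sub = (sub - 1) & rest
        return result

    return best((1 << n) - 1)
```

Each stable set chosen along the recursion plays the role of one $s_j = 1$ in the formulation, and the sets chosen partition $V$.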
Other mathematical formulations of GC based
on quadratic programming, semidefinite programming 
etc. are also available. Different formulations have their 
own distinctive advantages in understanding the prob- 
lem structure and in designing solution methods to 
solve the problem [22,33,34]. 

Checking whether G is k-colorable for an arbitrary 
integer k is an NP-complete problem [14,23]. It remains NP-complete even for fixed $k \geq 3$ [14,40]. Therefore, it is unlikely that the solution time of GC can
be bounded by any polynomial function (polynomial 
time) [13]. However, GC can be solved in polynomial 
time for graphs of some special structures. For example, 
polynomial algorithms exist for perfect graphs, Meyniel 
graphs, and triangulated graphs [3,4,15,17,19,41]. 

Let us define the performance guarantee of an approximation method to be the worst ratio between the approximation solution value and the corresponding optimal solution value over all graphs of size $|V|$. Then, $O(|V| / \log |V|)$ seems to be the first performance guarantee provided by a polynomial time GC heuristic [20]. This performance guarantee has been improved over the years. Let $k$ be the optimal (minimum) number of colors needed to color a graph, and let $\Delta$ be the maximum degree (number of edges incident to a vertex) among all vertices. The two recent performance guarantees achieved by polynomial approximation algorithms for GC are $O(|V| (\log \log |V|)^2 / (\log |V|)^3)$ and $\min\{O(\Delta^{1-2/k}), O(|V|^{1-3/(k+1)})\}$ [18,22]. On the other hand, it is known that, unless P = NP, no polynomial time algorithm can approximate an optimal graph coloring within a performance guarantee of $O(|V|^{\epsilon})$ for some $\epsilon > 0$ [14,15].

Available solution methods for GC can be divided 
into approximation algorithms and exact algorithms. 
These methods find a feasible graph coloring and an op- 
timal graph coloring, respectively [29]. 

A popular way to find an approximation solution to 
GC is the sequential greedy coloring heuristic (SGCH). 
In a SGCH, the vertices are ordered in a sequence and 
are colored one at a time according to the sequence. Ev- 
ery vertex is colored by the smallest (first) feasible color. 
It is not hard to see that the initial vertex sequence de- 
cides the resulting graph coloring of a SGCH. 
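A minimal pure-Python sketch of an SGCH follows. The function name is hypothetical, and the graph is assumed to be given as a dictionary mapping each vertex to the set of its neighbors.

```python
def sequential_greedy_coloring(order, adj):
    """Color vertices one at a time in `order`, giving each the smallest color
    (1, 2, ...) not already used by one of its colored neighbors."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color
```

On the path 0-1-2-3, the order (0, 1, 2, 3) yields 2 colors, whereas (0, 3, 1, 2) yields 3, because vertex 2 is colored last and sees colors 2 and 1 on its neighbors. This is the dependence on the initial vertex sequence noted above.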

It is also known that there exists at least one se- 
quence under which a SGCH will find an optimal col- 
oring. However, finding an optimal vertex sequence is 


NP-hard. Extensive work aimed at finding ‘good’ ver- 
tex sequences can be found in the literature [10,32]. 
Once a feasible coloring is available, further improve- 
ment can be made using various methods, including: 
interchange, iterative improvement, and other search- 
ing techniques (such as simulated annealing and tabu 
search) [36]. 

To date, the most popular and efficient way to find 
an optimal solution to GC is through a branch and 
bound (BB), or implicit enumeration, algorithm. A BB 
algorithm typically consists of two parts: the forward 
phases and the backtrack phases. A forward phase starts 
from a partial coloring (e.g., $\emptyset$) and colors the remaining
vertices to find a feasible graph coloring. For exam-
ple, a SGCH can be used in place of a forward phase. 
A backtrack phase will decide the starting point of the 
next forward phase so that an alternative feasible graph 
coloring can be found. 

Now let us consider how a simple BB algorithm [7] finds an optimal coloring of $G = (V, E)$. Let UB be the value of a current best coloring (initially set UB = $\infty$). Suppose the first forward phase applies a SGCH to vertex sequence $(v_1, \ldots, v_{|V|})$ and finds a feasible coloring of G. The number of colors used by the feasible coloring will be the new UB. Clearly, UB is an upper bound on the value of any feasible coloring that one needs to search for.

Since SGCH assigns the smallest feasible color to every vertex, a backtrack phase can be carried out by scanning the vertices in the reverse order of $(v_1, \ldots, v_{|V|})$, that is, by finding the first vertex $v_i$ that can be recolored by an alternative feasible color < UB not used for $v_i$ before. The new forward phase will start from the partial coloring of $\{v_1, \ldots, v_i\}$ and apply a SGCH to $(v_{i+1}, \ldots, v_{|V|})$, up to a $v_j$ whose smallest feasible color is UB, or to $v_{|V|}$ if it has a feasible color < UB. In the latter case, a better coloring is found. Then the BB algorithm will backtrack and repeat the above until it backtracks to vertex $v_1$ (the algorithm terminates).
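The sketch below is a compact recursive form of branch and bound for GC, written for illustration and not a transcription of the algorithm of [7]. A depth-first search over the vertex sequence tries, for each vertex, every feasible color already in use plus one new color, smallest first, so the first complete coloring reached is the SGCH coloring. Any branch that already uses at least UB colors is pruned. UB starts at $|V| + 1$ instead of $\infty$, since $|V|$ colors always suffice. The names are hypothetical.

```python
def branch_and_bound_coloring(order, adj):
    """Exact coloring by depth-first branch and bound over the vertex sequence `order`.
    Returns (minimum number of colors, one optimal coloring)."""
    n = len(order)
    best = {"ub": n + 1, "coloring": None}
    color = {}

    def extend(pos, used_colors):
        if used_colors >= best["ub"]:
            return                      # bound: cannot beat the incumbent
        if pos == n:
            best["ub"], best["coloring"] = used_colors, dict(color)
            return
        v = order[pos]
        forbidden = {color[u] for u in adj[v] if u in color}
        # branch: every color in use, plus one new color, smallest first as in SGCH
        for c in range(1, used_colors + 2):
            if c not in forbidden:
                color[v] = c
                extend(pos + 1, max(used_colors, c))
                del color[v]

    extend(0, 0)
    return best["ub"], best["coloring"]
```

Restricting each vertex to the colors already in use plus a single new one removes colorings that differ only by a relabeling of colors, which keeps the search tree smaller without losing optimality.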

Various improvements have been designed and tested for the above basic BB method. They include 'look ahead', 'dynamic reordering', choosing an appropriate feasible color (instead of the 'smallest') to color a vertex, using tighter lower and upper bounds, and a column generation approach [6,21,26,31,33]. These improvements have greatly reduced the search tree size and enhanced our ability to solve GC optimally. The state-of-the-art method for solving GC on randomly generated graphs seems to be limited to graphs of 100 vertices [21,31,42,43].
Graph Planarization


A graph is said to be planar if it can be drawn on the plane in such a way that no two of its edges cross. Given a graph $G = (V, E)$ with vertex set $V$ and edge set $E$, the objective of graph planarization is to find a minimum cardinality subset of edges $F \subseteq E$ such that the graph $G' = (V, E \setminus F)$, resulting from the removal of the edges in $F$ from G, is planar. This problem is also known as the maximum planar subgraph problem. A related and simpler problem is that of finding a maximal planar subgraph, which is a planar subgraph $G' = (V, E')$ of G such that the addition of any edge $e \in E \setminus E'$ to $G'$ destroys its planarity.

Graph planarization is known to be NP-hard [21]. 
The proof of NP-completeness of its decision version is 
based on a transformation from the Hamiltonian path 
problem restricted to bipartite graphs. Although ex- 
act methods for solving the maximum planar subgraph 
problem have been recently proposed, most algorithms 
to date attempt to find good approximate solutions. 

In this article, we survey graph planarization and re- 
lated problems. In the next section, we describe vari- 
ants and applications of the basic problem formulated 
above. Next, we describe the branch and cut algorithm 
of M. Jünger and P. Mutzel [16]. We then review work
on heuristics based on planarity testing and those based
on two-phase procedures. Finally, computational re-
sults are considered. 


Variants and Applications 


An application of graph planarization arises in the de- 
sign of integrated circuits, in which a graph describing 
the circuit has to be decomposed into a minimum num- 
ber of layers, each of which is a planar graph [19]. Other 
applications arise from variants of the basic graph pla- 
narization problem. 

One such variant is the maximum weighted planar 
graph problem, in which positive weights are associ- 
ated with the edges of the graph and one seeks a pla- 
nar subgraph of maximum weight. Note that the ba- 
sic graph planarization problem is a special case of the 
maximum weighted planar graph problem, in which all 
edge weights are equal to one. An application of this 
problem to facility layout is described in [13]. A graph 
is built in which the vertices represent the facilities and 
the edges define the relationships between them. The 
weight of each edge is the desirability that the two fa- 
cilities that define the edge be adjacent in the design. 
A maximum weighted planar subgraph corresponds to 


a feasible layout with maximum benefit. In [13],
the authors also propose simulated annealing and tabu 
search heuristics for the approximate solution of the 
maximum weighted planar graph problem. Construc- 
tive heuristics based on maintaining a triangulated sub- 
graph while making node and edge insertions are given 
in [8,11], and [20]. 

Another related variant is that of drawing a given 
graph such that the number of edge crossings is mini- 
mized. The crossing number problem has practical ap- 
plications in circuit design and graph drawing, such 
as in CASE tools [27] and automated graphical dis- 
play systems. One particular case is that of minimizing 
straight-line crossings in layered graphs. A GRASP and 
path relinking approach for the two-layer case is given 
in [17], where one can also find a survey of the litera- 
ture. Algorithms for graph drawing are reviewed in [6]. 

In the planar augmentation problem, one wants to 
determine the minimum number of edges that need to 
be added to a planar graph such that the resulting graph 
is still planar and at least k-connected, where k is usu- 
ally fixed to two or three. This variant has applications 
in automatic graph drawing, as well as in the design of 
survivable networks [24]. 


An Exact Algorithm 


An exact branch and bound algorithm for the weighted 
graph planarization problem was introduced in [10], 
but was limited to small dense graphs. Only recently 
(1999) has there been a leap in the performance of exact methods for graph planarization with the Jünger-Mutzel branch and cut algorithm [16], which we describe next.

Given a graph $G = (V, E)$, their approach uses facet-defining inequalities for the planar subgraph polytope PLS(G). Let $x_e$ be a 0-1 variable associated with each edge $e \in E$, such that $x_e = 1$ if and only if edge $e$ appears in the maximum planar subgraph of G. Furthermore, let $x(F) = \sum_{e \in F} x_e$ for $F \subseteq E$.

Trivial inequalities $0 \leq x_e \leq 1$ are implicitly handled by the linear programming (LP) solver. The inequality $x(E) \leq 3|V| - 6$ is added to the initial linear program. Let $\bar{x}$ be the optimal solution of the LP relaxation associated with some node of the enumeration tree. For $0 < \epsilon < 1$, let $E_\epsilon = \{e \in E \mid \bar{x}_e > 1 - \epsilon\}$ and consider the graph $G_\epsilon = (V, E_\epsilon)$, to which the Hopcroft-Tarjan planarity-testing algorithm [14] is applied. The algorithm stops if it finds an edge set $F$ which induces a nonplanar graph in G. If the inequality $x(F) \leq |F| - 1$ is violated, it is added to the set of constraints of the current LP. The back edge of the path which proved the nonplanarity of the graph induced in G by $F$ is removed and the planarity-testing algorithm proceeds, eventually identifying other forbidden subgraphs of the graph $G_\epsilon$. Although these forbidden subgraphs do not necessarily define facets of PLS(G), they must contain facet-defining subgraphs. Facet-defining inequalities are identified as follows. Once a forbidden set $F$ is found, where the inequality $x(F) \leq |F| - 1$ is violated, one successively deletes each edge $f \in F$ and applies the planarity-testing algorithm. If the graph induced by $F \setminus \{f\}$ is planar, then edge $f$ is returned to $F$. In at most $|F|$ steps, $F$ is reduced to a smaller edge set which induces a minimal nonplanar subgraph, leading to the facet-defining inequality $x(F) \leq |F| - 1$ still violated by the current LP solution. Another simple heuristic searches for violated Euler facet-defining inequalities $x(F) \leq 3|V'| - 6$ or $x(F) \leq 2|V'| - 4$, where $(V', F)$ is, respectively, a clique or a complete bipartite subgraph of G.
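Two small ingredients of this separation step can be sketched in pure Python: forming $E_\epsilon$ from the LP solution $\bar{x}$, and checking an Euler inequality for a given clique or complete bipartite edge set. The planarity test itself (Hopcroft-Tarjan) is not reproduced. As assumptions, `x_bar` is a dictionary from edge tuples to LP values, and the function names are hypothetical.

```python
def support_edges(x_bar, eps):
    """E_eps = {e in E : x_bar_e > 1 - eps}, for 0 < eps < 1."""
    return [e for e, val in x_bar.items() if val > 1 - eps]


def euler_violation(x_bar, F, num_vertices, bipartite=False, tol=1e-9):
    """Check x(F) <= 3|V'| - 6 (clique on V') or x(F) <= 2|V'| - 4 (complete
    bipartite graph on V'). Returns (violated?, x(F) - right-hand side)."""
    rhs = (2 * num_vertices - 4) if bipartite else (3 * num_vertices - 6)
    lhs = sum(x_bar[e] for e in F)
    return lhs > rhs + tol, lhs - rhs
```

For $K_5$ with every $\bar{x}_e = 1$, $x(F) = 10$ exceeds $3 \cdot 5 - 6 = 9$, so the clique inequality is violated by 1; for $K_{3,3}$ with every $\bar{x}_e = 1$, $x(F) = 9$ exceeds $2 \cdot 6 - 4 = 8$.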

After an LP has been solved, its solution is exploited by the planarity-testing algorithm to produce a feasible solution for the graph planarization problem. Such feasible solutions provide lower bounds, which are used not only for fathoming nodes in the branch and cut tree, but also for fixing variables using their reduced costs during a cutting plane phase. Other heuristics are implemented to enhance the practical performance of the algorithm.

Branching is done if no cutting plane has been 
found for the current infeasible solution. The variable 
chosen for branching is one with fractional value clos- 
est to 1/2, among those with maximum cost coefficient 
in the objective function. 
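The branching rule takes only a few lines. A sketch, assuming the LP values and cost coefficients are stored in dictionaries keyed by edge, and reading "maximum cost coefficient" as the maximum over the fractional variables (our interpretation). The tolerance `tol` is likewise an assumption.

```python
def choose_branching_variable(x, cost, tol=1e-6):
    """Pick the branching variable for a fractional LP point.

    x    : dict edge -> LP value x_e
    cost : dict edge -> objective coefficient
    Returns None if x is integral (within tol), i.e. no branching is needed.
    """
    fractional = [e for e, v in x.items() if tol < v < 1 - tol]
    if not fractional:
        return None
    cmax = max(cost[e] for e in fractional)
    # keep only the fractional variables with the largest cost coefficient ...
    top = [e for e in fractional if cost[e] >= cmax - tol]
    # ... and among them take the one whose value is closest to 1/2
    return min(top, key=lambda e: abs(x[e] - 0.5))
```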


Heuristics Based on Planarity Testing 


The first linear time algorithm for planarity testing was
proposed by J. Hopcroft and R.E. Tarjan [14]. T. Chiba,
I. Nishioka and I. Shirakawa [4] used the basic ideas of
this approach to devise an algorithm for finding a
maximal planar subgraph of $G = (V, E)$ with time
complexity $O(|V||E|)$. Later, J. Cai, X. Han and Tarjan [3]
proposed another version of the above planarity testing
algorithm. This new algorithm is based on processing
edges instead of paths. It leads to another algorithm to
find a maximal planar subgraph, with improved
$O(|E| \log |V|)$ time complexity.

A. Lempel, S. Even and I. Cederbaum [18] have proposed
another approach to planarity testing. Although
its original complexity was $O(|V|^2)$, K. Booth and G.
Lueker [2] have shown that it can be implemented in
linear time using PQ-trees. A few algorithms for finding
a maximal planar subgraph based on this planarity
testing approach have been proposed in the literature.
However, Jünger, S. Leipert and Mutzel [15] show that
attempts following this strategy are bound to fail.

Another approach for finding a maximal planar 
subgraph of a given graph works as follows. Start with 
an empty subgraph and successively add the edges of 
the original graph, whenever such addition maintains 
the planarity of the subgraph under construction. Using
any of the planarity testing algorithms described above,
such an approach can be implemented in $O(|V||E|)$
time. An incremental planarity testing algorithm,
based on an $O(\log |V|)$ time-per-operation strategy
for the problem of maintaining a planar graph under
edge additions, was proposed by G. Di Battista and
R. Tamassia [7]. Hence, their algorithm leads to a more 
efficient implementation of the incremental approach 
for finding a maximal planar subgraph with O(|E| log 
|V|) time complexity. 
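The incremental approach amounts to the loop below. It is a sketch that assumes the same kind of hypothetical `is_planar` oracle on edge sets. With a linear-time test, each call works on a planar graph (at most $3|V| - 6$ edges) plus one edge and so costs $O(|V|)$, which gives the $O(|V||E|)$ bound quoted above.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def incremental_maximal_planar(
    edges: Sequence[Edge], is_planar: Callable[[FrozenSet[Edge]], bool]
) -> List[Edge]:
    """Greedy incremental construction of a maximal planar subgraph.

    Edges are tried in the given order; an edge is kept only if the
    subgraph under construction stays planar after adding it.
    """
    kept: List[Edge] = []
    for e in edges:
        kept.append(e)
        if not is_planar(frozenset(kept)):
            kept.pop()  # adding e destroys planarity: reject it
    return kept
```

The result is maximal. Once the kept edges plus a rejected edge $e$ form a nonplanar set, every later superset is nonplanar as well, so $e$ could never be added at the end either.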


Two-Phase Heuristics 


The heuristics described in this section are based on 
the separation of the computation into two phases. The 
first phase consists of devising a linear permutation
of the nodes of the input graph, followed by placing 
them along a line. The second phase determines two 
sets of edges that may be represented without cross- 
ings above and below that line, respectively. Y. Takefuji 
and K.C. Lee [25] were the first to propose a heuris- 
tic using this idea. They use an arbitrary sequence of 
nodes in the first phase and apply a parallel heuristic 
using a neural network for the second phase. Takefuji, 
Lee, and Y.B. Cho [26] claimed superior performance 
of the two-phase approach of Takefuji and Lee [25] with 
respect to the heuristics described in the previous sec- 
tion. 
Their approach was later extended and improved
by O. Goldschmidt and A. Takvorian [12]. In the first
phase, these authors attempt to use a linear permutation
of the nodes associated with a Hamiltonian cycle
of G. Two strategies are used:

i) a randomized algorithm [1] that almost certainly 
finds a Hamiltonian cycle if one exists; and 
ii) a greedy deterministic algorithm that seeks a Hamil- 
tonian cycle. 
In the latter, the first node in the linear permutation is
a minimum degree node in $G$. After the first $k$ nodes of
the permutation have been determined, say $v_1, \dots, v_k$,
the next node $v_{k+1}$ is selected from the nodes adjacent to
$v_k$ in $G$ having the fewest adjacencies in the subgraph $G_k$
of $G$ induced by $V \setminus \{v_1, \dots, v_k\}$. If there is no node of $G_k$
adjacent to $v_k$ in $G$, then $v_{k+1}$ is selected as a minimum
degree node in $G_k$.
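The deterministic greedy rule can be sketched as follows, assuming the graph is given as a dictionary of neighbour sets. Breaking ties by node label is our assumption, since the source specifies no tie rule.

```python
def greedy_permutation(adj):
    """Greedy linear permutation in the style described above.

    adj: dict mapping each node to the set of its neighbours in G.
    Ties are broken by node label (an assumption; labels must be comparable).
    """
    remaining = set(adj)

    def deg_left(v):
        # number of neighbours of v in G_k, the subgraph induced by unplaced nodes
        return len(adj[v] & remaining)

    current = min(remaining, key=lambda v: (len(adj[v]), v))  # min degree in G
    order = [current]
    remaining.discard(current)
    while remaining:
        candidates = adj[current] & remaining
        if not candidates:          # v_k has no unplaced neighbour:
            candidates = remaining  # fall back to a minimum degree node of G_k
        current = min(candidates, key=lambda v: (deg_left(v), v))
        order.append(current)
        remaining.discard(current)
    return order
```

On a 5-cycle labelled $0, \dots, 4$ the sketch returns the Hamiltonian order `[0, 1, 2, 3, 4]`.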

Let $H = (E, I)$ be a graph where each of its nodes
corresponds to an edge of the input graph $G$. Nodes $e_1$ and
$e_2$ of $H$ are connected by an edge if the corresponding
edges of $G$ cross with respect to the linear permutation
of the nodes established during the first phase (if the
endpoints of two edges occupy positions $a < b$ and
$c < d$, the edges cross exactly when $a < c < b < d$ or
$c < a < d < b$). A graph
is called an overlap graph if its nodes can be placed in
one-to-one correspondence with a family of intervals
on a line. Two intervals are said to overlap if they intersect
and neither is contained in the other. Two nodes of the
overlap graph are connected by an edge if and only if 
their corresponding intervals overlap. Hence, the graph 
H as constructed above is the overlap graph associated 
with the representation of G defined by the linear per- 
mutation of its nodes. 
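The crossing test and the resulting overlap graph $H$ can be computed directly from the node positions. A sketch, assuming edges are pairs of node labels and `order` is the linear permutation from the first phase (both names are illustrative).

```python
from itertools import combinations


def edges_cross(e, f, pos):
    """True if edges e and f cross when the nodes sit on a line at positions pos."""
    a, b = sorted((pos[e[0]], pos[e[1]]))
    c, d = sorted((pos[f[0]], pos[f[1]]))
    # strictly interleaved endpoints; edges sharing a node never cross
    return a < c < b < d or c < a < d < b


def overlap_graph(edges, order):
    """Edge set I of the overlap graph H = (E, I) for the permutation `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return {(e, f) for e, f in combinations(edges, 2) if edges_cross(e, f, pos)}
```

For $K_4$ placed in the order $0, 1, 2, 3$, the only crossing pair is $\{(0,2), (1,3)\}$. For $K_5$ in the order $0, \dots, 4$ there are five crossing pairs, all among the diagonals.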

The second phase of the heuristic of Goldschmidt 
and Takvorian consists of two-coloring a maximum
number of the nodes of the overlap graph H, such 
that each of the two color classes B (blue) and R 
(red) forms an independent set. Equivalently, the sec- 
ond phase seeks a maximum bipartite subgraph of the 
overlap graph H, i.e. a bipartite subgraph having the 
largest number of nodes. This problem is equivalent to 
drawing the edges of the input graph G above or be- 
low the line where its nodes have been placed, accord- 
ing to their linear permutation. A greedy algorithm is 
used to construct a maximal bipartite subgraph of the 
overlap graph. This algorithm finds a maximum
independent set $B \subseteq E$ of the overlap graph $H = (E, I)$,
reduces the overlap graph by removing from it the nodes
in $B$ and all edges incident to nodes in $B$, and then finds
a maximum independent set $R \subseteq E \setminus B$ in the remaining
overlap graph $H' = (E \setminus B, I')$. The two independent sets
so obtained induce a bipartite subgraph of the original 
overlap graph, not necessarily with a maximum num- 
ber of nodes. 
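Here an independent set of $H$ is a set of pairwise non-crossing edges, so a maximum one can be found by dynamic programming over intervals of positions. The sketch below uses such an interval recursion (a standard technique for this structure, not necessarily the algorithm used in [12]) and then applies the greedy blue/red scheme just described. Function names are illustrative.

```python
from functools import lru_cache


def max_noncrossing(edges, order):
    """Maximum independent set of the overlap graph, i.e. the largest set
    of pairwise non-crossing edges for nodes placed on a line in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    span = {}  # (left position, right position) -> original edge
    for u, v in edges:
        span[tuple(sorted((pos[u], pos[v])))] = (u, v)
    right = {}  # left position -> right positions of edges starting there
    for a, b in span:
        right.setdefault(a, []).append(b)

    @lru_cache(maxsize=None)
    def best(i, j):
        # best non-crossing subset of the edges lying inside positions [i, j]
        if i >= j:
            return ()
        choice = best(i + 1, j)  # no chosen edge other than (i, j) starts at i
        for k in right.get(i, []):
            if k < j:  # (i, k) is the longest chosen edge starting at i
                trial = best(i, k) + best(k, j)
                if len(trial) > len(choice):
                    choice = trial
        if (i, j) in span:  # (i, j) crosses nothing inside [i, j]
            choice = choice + (span[(i, j)],)
        return choice

    return set(best(0, len(order) - 1))


def two_phase_coloring(edges, order):
    """Blue = maximum independent set of H; red = maximum independent set
    of what is left. Returns (blue, red)."""
    blue = max_noncrossing(edges, order)
    red = max_noncrossing([e for e in edges if e not in blue], order)
    return blue, red
```

For $K_5$ placed in the order $0, \dots, 4$, the blue set has 7 edges and the red set 2. Nine of the ten edges are kept, which is the size of a maximum planar subgraph of $K_5$.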

The linear permutation obtained in the first phase 
affects the size of the planar subgraph found in the sec- 
ond phase of the above heuristic. Moreover, it is not 
clear that the permutation produced by the greedy algo- 
rithm is the best. To produce possibly better permuta- 
tions, randomization and local search have been intro- 
duced in the greedy algorithm by M.G.C. Resende and 
C.C. Ribeiro [22] in the form of a greedy randomized 
adaptive search procedure (GRASP). 

A GRASP [9] is an iterative process, in which each 
iteration consists of two phases: construction and local 
search. The construction phase builds a feasible solu- 
tion, whose neighborhood is explored by local search. 
The best solution over all GRASP iterations is returned 
as the result. 

In the construction phase, a feasible solution is built, 
one element at a time. At each construction iteration, 
the next element to be added is determined by ordering 
all elements in a candidate list with respect to a greedy 
function that estimates the benefit of selecting each el- 
ement. The adaptive component of the heuristic arises 
from the fact that the benefits associated with every el- 
ement are updated at each iteration of the construction 
phase to reflect the changes brought on by the selection 
of the previous elements. The probabilistic component 
of a GRASP is characterized by randomly choosing one 
of the best candidates in the list, but not necessarily the
top candidate. This way of making the choice allows for
different solutions to be obtained at each iteration, but 
does not necessarily jeopardize the power of GRASP’s 
adaptive greedy component. 

The solutions generated by a GRASP construction 
are not guaranteed to be locally optimal, even with re- 
spect to simple neighborhood definitions. Hence, it is 
almost always beneficial to apply a local search to at- 
tempt to improve each constructed solution. A local 
search algorithm works in an iterative fashion by suc- 
cessively replacing the current solution by a better so- 
lution from its neighborhood. 

Resende and Ribeiro [22] proposed an extension 
of the above described heuristic of Goldschmidt and 
Takvorian, in which a GRASP is used for finding 
a linear permutation of the nodes. In the construction
phase of this GRASP, the greedy algorithm used in the 
first phase by Goldschmidt and Takvorian is random- 
ized: instead of selecting the node of minimum degree 
among those yet unselected, the selection is made from 
a set of low degree nodes. The local search phase of 
this GRASP explores the neighborhood of the current 
permutation by swapping the positions of two nodes 
at a time, attempting to reduce the number of possible 
edge crossings. 

Incorporating the second phase of the Goldschmidt-Takvorian
heuristic into the above GRASP for
finding a linear permutation of the nodes results in 
a GRASP for graph planarization. 
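A compact sketch of such a GRASP for the linear permutation follows. It rests on two assumptions. First, the restricted candidate list keeps nodes whose degree in $G_k$ lies within a fraction `alpha` (a hypothetical parameter) of the range between the lowest and highest candidate degree. Second, permutations are scored by the number of crossing edge pairs, the quantity the swap local search tries to reduce. In the full heuristic, each permutation would instead be passed to the second phase and scored by the size of the planar subgraph found.

```python
import random
from itertools import combinations


def count_crossings(edges, order):
    """Number of pairs of edges that cross for the linear permutation `order`."""
    pos = {v: i for i, v in enumerate(order)}
    spans = [tuple(sorted((pos[u], pos[v]))) for u, v in edges]
    return sum(1 for (a, b), (c, d) in combinations(spans, 2)
               if a < c < b < d or c < a < d < b)


def randomized_greedy(adj, alpha, rng):
    """Construction phase: like the greedy permutation, but the next node is
    drawn at random from a restricted candidate list of low-degree nodes."""
    remaining, order = set(adj), []
    while remaining:
        pool = adj[order[-1]] & remaining if order else set()
        if not pool:
            pool = remaining
        deg = {v: len(adj[v] & remaining) for v in pool}
        lo, hi = min(deg.values()), max(deg.values())
        rcl = sorted(v for v in pool if deg[v] <= lo + alpha * (hi - lo))
        v = rng.choice(rcl)
        order.append(v)
        remaining.discard(v)
    return order


def swap_local_search(edges, order):
    """Local search: swap two nodes whenever that reduces the crossing count."""
    order = list(order)
    best = count_crossings(edges, order)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            c = count_crossings(edges, order)
            if c < best:
                best, improved = c, True
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
    return order, best


def grasp_permutation(adj, iterations=20, alpha=0.3, seed=0):
    """Return the best permutation found and its number of crossing pairs."""
    rng = random.Random(seed)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    best_order, best_cost = None, float("inf")
    for _ in range(iterations):
        order, cost = swap_local_search(edges, randomized_greedy(adj, alpha, rng))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```

On a cycle the construction follows the cycle and finds a crossing-free order. On $K_4$ every order has exactly one crossing pair.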

Each iteration of this GRASP produces three edge 
sets: B (blue edges), R (red edges), and P (the remain- 
ing edges, which are referred to as the pale edges). By 
construction, $B$, $R$, and $P$ are such that no red or pale
edge can be colored blue. Likewise, pale edges cannot
be colored red. However, if there exists a pale edge $p \in P$
such that the blue edges crossing $p$ (let $B_p \subseteq B$
be the set of those blue edges) do not cross any red
edge $r \in R$, then all blue edges $b \in B_p$ can be colored
red and p can be colored blue. In case this reassignment 
of colors is possible, then the size of the planar subgraph 
is increased by one edge. This post-optimization proce- 
dure is incorporated at the end of each GRASP itera- 
tion. 
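The post-optimization move can be sketched as follows. The sketch assumes edges are given by the positions of their endpoints on the line, and it repeats the move until no pale edge qualifies. That repetition is an assumption, since the source does not say whether one pass or several are made.

```python
def intervals_cross(e, f):
    """Edges given by endpoint positions on the line; True if they cross."""
    (a, b), (c, d) = sorted(e), sorted(f)
    return a < c < b < d or c < a < d < b


def recolor_pale_edges(blue, red, pale, crosses=intervals_cross):
    """Move a pale edge p to blue after recoloring red the blue edges that
    cross p (the set B_p), whenever no edge of B_p crosses a red edge."""
    blue, red, pale = set(blue), set(red), set(pale)
    moved = True
    while moved:
        moved = False
        for p in sorted(pale):
            b_p = {b for b in blue if crosses(b, p)}
            if all(not crosses(b, r) for b in b_p for r in red):
                blue = (blue - b_p) | {p}  # p crosses no remaining blue edge
                red |= b_p                 # B_p was pairwise non-crossing
                pale.discard(p)
                moved = True               # the planar subgraph grew by one edge
    return blue, red, pale
```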


Computational Results 


Detailed results on a set of 75 test problems described 
in the literature [5,12] are reported in [22]. The de- 
scription of the code used can be found in [23]. Here, 
we summarize computational results illustrating the ef- 
fectiveness of the two-phase heuristics described in the 
previous section, as well as that of the exact branch and 
cut algorithm. These results are based on a Fortran im- 
plementation of the GRASP heuristic of Resende and 
Ribeiro [22], on the original code of the branch and 
cut algorithm of Jünger and Mutzel [16], and on published
results for the heuristics of Takefuji and Lee [25]
and Goldschmidt and Takvorian [12] (using the greedy
algorithm for building the linear permutation of the 
nodes). 

We give, in the table below, results comparing the 
four approaches on a subset of the test problems
described in [12]. For each instance, the table lists the
number of nodes, the number of edges, and the size 
of the planar subgraphs produced by each algorithm. 
A time limit of 1000 seconds (on a SUN SPARCstation 
10/41) was imposed on the runs of the branch and cut 
algorithm and the best solution found was returned as 
a heuristic solution when optimality was not attained in 
that time limit. This time limit was reached on instances 
G12-G19. 

The results in this table show that the Goldschmidt- 
Takvorian algorithm is a substantial improvement over 
the neural network approach of Takefuji and Lee. The 
GRASP consistently outperforms both other two-phase 
heuristics, not only for the problems reported in this 
table, but also for all of the remaining instances consid- 
ered in [22]. 


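Problem | Nodes | Edges |  T-L |  G-T |  R-R |  J-M
G1      |    10 |     ? |   20 |   20 |   20 |   20
G2      |    45 |    85 |   80 |   80 |   82 |   82
G3      |    10 |    24 |   21 |   21 |   24 |   24
G4      |    10 |     ? |    ? |   21 |   24 |   24
G5      |    10 |    26 |   22 |   21 |   24 |   24
G6      |    10 |     ? |    ? |   21 |   24 |   24
G7      |    10 |    34 |   23 |   22 |   24 |   24
G8      |    25 |    69 |   58 |   60 |   69 |   69
G9      |    25 |    70 |   59 |   60 |   69 |   69
G10     |    25 |    71 |   58 |   59 |   69 |   69
G11     |    25 |    72 |   60 |   59 |   69 |   69
G12     |    25 |    90 |   61 |   62 |   67 |   68
G13     |    50 |   367 |   70 |  131 |  135 |  125
G14     |    50 |   491 |  100 |  136 |  143 |  133
G15     |    50 |   582 |  101 |  142 |  144 |  138
G16     |   100 |   451 |   92 |  180 |  196 |  187
G17     |   100 |   742 |  116 |  219 |  236 |  213
G18     |   100 |     ? |  115 |  237 |  246 |  223
G19     |   150 |  1064 |  127 |  297 |  311 |  290

T-L: Takefuji and Lee [25]; G-T: Goldschmidt and Takvorian [12]; R-R: GRASP of Resende and Ribeiro [22]; J-M: branch and cut of Jünger and Mutzel [16].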


The comparison of GRASP with the branch and cut
algorithm depends heavily on the instances. The results
reported in [22] can be separated into two groups. On
49 of the 55 instances in the first group, the GRASP
either matched or improved on the solution of the
branch and cut algorithm, and on 30 of those 55
instances the GRASP solution was strictly better. Note
that, on these instances, the branch and cut algorithm
was forced to stop because of the 1000 second time
limit. However, on all of the remaining 20 instances,
the branch and cut algorithm performed remarkably
well and outperformed all other algorithms.


See also 


> Feedback Set Problems 

> Generalized Assignment Problem 

> Graph Coloring 

> Greedy Randomized Adaptive Search Procedures 
> Optimization in Leveled Graphs 

> Quadratic Assignment Problem 

> Quadratic Semi-assignment Problem 
Introduction


Due to its fundamental nature and versatile modelling 
power, the Graph Realization Problem is one of the 
most well-studied problems in distance geometry and 
has received attention in many communities. In that 
problem, one is given a graph $G = (V, E)$ and a set
of non-negative edge weights $\{d_{ij} : (i, j) \in E\}$, and the
goal is to compute a realization of $G$ in the Euclidean
space $\mathbb{R}^k$ for a given dimension $k \ge 1$, i.e. to place
the vertices of $G$ in $\mathbb{R}^k$ such that the Euclidean
distance between every pair of adjacent vertices $v_i, v_j$ is
equal to the prescribed weight $d_{ij}$. The Graph
Realization Problem and its variants arise from
applications in very diverse areas, the two most
prominent being molecular conformation (see,
e.g., [13,15,16,19,32]) and wireless sensor network
localization (see, e.g., [2,8,14,22,24]). In molecular
conformation, one is interested in determining the spatial
structure of a molecule from a set of geometric
constraints; in wireless sensor network localization, one is
interested in inferring the locations of sensor nodes in
a sensor network from connectivity-imposed proximity
constraints. Thus, in these contexts, an algorithm
that finds a realization of the vertices in the required
dimension will have interesting biochemical and
engineering consequences. Unfortunately, unless P = NP,
there is no efficient algorithm for solving the Graph
Realization Problem for any fixed $k \ge 1$ ([23]; see
also [3,4]). Nevertheless, many heuristics have been
developed for the problem over the years, and various
approaches have been taken to improve their
efficiency (see, e.g., [1,2,13,14,15,18,20]). However, these
approaches have their limitations. Specifically, either 
they solve the original problem only for a very restricted 
family of instances, or it is not clear when the algorithm 
would solve the original problem. Thus, an interesting 
question arises: given a relaxation of the Graph Realiza- 
tion Problem, can one derive reasonably general condi- 
tions under which the relaxation is exact? 


We begin by examining a semidefinite program- 
ming (SDP) relaxation proposed by [10] in Section 
Formulation. We introduce the notion of unique k- 
realizability and show that the SDP relaxation is exact 
if and only if the input instance is uniquely k-realizable, 
where k is the given dimension. The notion of unique k- 
realizability is attractive, as it has a straightforward ge- 
ometric interpretation and is also very suitable for the 
algorithmic treatment of the Graph Realization Prob- 
lem. 

Although we have formulated the Graph Realiza- 
tion Problem as a feasibility problem, it is clear that 
one can also formulate various optimization versions 
of it. One particularly useful objective is to maximize 
the sum of the distances between certain pairs of non- 
adjacent vertices. Such an objective essentially stretches 
apart pairs of non-adjacent vertices, and is more likely 
to flatten a high-dimensional realization into a lower
dimensional one. Indeed, such a device has proven
very useful for finding low-dimensional realizations
both in theory (see, e.g., [6,7]) and in practice
(see, e.g., [9,29,30]). In Section Applications, we
show how these ideas can be incorporated into the SDP 
model and demonstrate a connection between SDP the- 
ory and tensegrity theory in discrete geometry. 


Formulation 


We begin by introducing the semidefinite programming
(SDP) relaxation proposed by [10]. Let $G = (V, E)$ be a
graph, and let $k \ge 1$ be an integer. Let
$V_1 = \{1, \dots, n\}$ and $V_2 = \{n+1, \dots, n+m\}$ be a
partition of $V$. The vertices in $V_1$ (resp. $V_2$) are said to be
unpinned (resp. pinned). Specifically, let $a = (a_i)_{i \in V_2}$
be given, where $a_i \in \mathbb{R}^k$ for all $i \in V_2$. Then, the
vertex $i \in V_2$ is constrained to be at $a_i$, while there
are no such restrictions on the vertices in $V_1$. For our
purposes, we may assume that $V_2 \neq \emptyset$, since we can
always pin one vertex at the origin. We may also assume
that $E' = \{(i, j) : i, j \in V_2\} \subseteq E$, since the
distance between any two pinned vertices is trivially
known. Now, let $E_1 = \{(i, j) \in E : i, j \in V_1\}$ be
the set of edges between two unpinned vertices, and let
$E_2 = \{(i, j) \in E : i \in V_2, j \in V_1\}$ be the set of
edges between a pinned and an unpinned vertex. Let
$d = (d_{ij})_{(i,j) \in E_1}$ (resp. $\bar{d} = (\bar{d}_{ij})_{(i,j) \in E_2}$) be a set of
weights on the edges in $E_1$ (resp. $E_2$). We are then
interested in finding vectors $x_1, \dots, x_n \in \mathbb{R}^k$ such that:

$$\|x_i - x_j\|^2 = d_{ij}^2 \quad \text{for } (i, j) \in E_1, \qquad \|a_i - x_j\|^2 = \bar{d}_{ij}^2 \quad \text{for } (i, j) \in E_2 \qquad (1)$$
Here, $\|\cdot\|$ is the Euclidean norm, i.e.
$\|x\| = \left(\sum_{l=1}^k x_l^2\right)^{1/2}$ for $x \in \mathbb{R}^k$. We say that
$p = (p_1, \dots, p_n) \in \mathbb{R}^{kn}$ is a realization of $(G, (d, \bar{d}), a)$ in $\mathbb{R}^k$ if it
satisfies (1). One may obtain a semidefinite relaxation of
(1) as follows. Let $X = [x_1\ x_2\ \dots\ x_n]$ be the $k \times n$ matrix
that needs to be determined. Then, for all $(i, j) \in E_1$, we
have:


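$$\|x_i - x_j\|^2 = (e_i - e_j)^T X^T X (e_i - e_j) = (e_i - e_j)(e_i - e_j)^T \bullet (X^T X)$$

and for all $(i, j) \in E_2$, we have:

$$\|a_i - x_j\|^2 = \begin{pmatrix} a_i \\ -e_j \end{pmatrix}^T [I_k\ X]^T [I_k\ X] \begin{pmatrix} a_i \\ -e_j \end{pmatrix} = \begin{pmatrix} a_i \\ -e_j \end{pmatrix} \begin{pmatrix} a_i \\ -e_j \end{pmatrix}^T \bullet \begin{pmatrix} I_k & X \\ X^T & X^T X \end{pmatrix}$$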


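Here, $e_i$ is the $i$th standard basis vector of $\mathbb{R}^n$, $I_k$ is
the $k$-dimensional identity matrix, and $\bullet$ is the Frobenius
inner product on the space of symmetric matrices,
i.e. $A \bullet B = \operatorname{tr}(A^T B) = \sum_{i,j=1}^n a_{ij} b_{ij}$ for symmetric
$n \times n$ matrices $A$ and $B$. Thus, problem (1) becomes that
of finding a symmetric matrix $Y \in \mathbb{R}^{n \times n}$ and a matrix
$X \in \mathbb{R}^{k \times n}$ that satisfy the following system:

$$\begin{aligned}
&(e_i - e_j)(e_i - e_j)^T \bullet Y = d_{ij}^2 && \text{for } (i, j) \in E_1 \\
&\begin{pmatrix} a_i \\ -e_j \end{pmatrix} \begin{pmatrix} a_i \\ -e_j \end{pmatrix}^T \bullet \begin{pmatrix} I_k & X \\ X^T & Y \end{pmatrix} = \bar{d}_{ij}^2 && \text{for } (i, j) \in E_2 \\
&Y = X^T X
\end{aligned} \qquad (2)$$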


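By relaxing $Y = X^T X$ to $Y \succeq X^T X$ and using
Schur's complement (see, e.g., [11]), we obtain the
following relaxed problem, in which $Z$ stands for the matrix
$\begin{pmatrix} I_k & X \\ X^T & Y \end{pmatrix}$ (by the Schur complement, $Z \succeq 0$ is
equivalent to $Y - X^T X \succeq 0$):

$$\begin{aligned}
\sup\ & 0 \\
\text{subject to}\ & E_{ij} \bullet Z = d_{ij}^2 && \text{for } (i, j) \in E_1 \\
& \bar{E}_{ij} \bullet Z = \bar{d}_{ij}^2 && \text{for } (i, j) \in E_2 \\
& Z \succeq 0, \quad Z_{1:k,1:k} = I_k
\end{aligned} \qquad (3)$$

where $Z_{1:k,1:k}$ is the $k \times k$ principal submatrix of $Z$
indexed by the first $k$ rows (columns),

$$E_{ij} = \begin{pmatrix} 0 \\ e_i - e_j \end{pmatrix} \begin{pmatrix} 0 \\ e_i - e_j \end{pmatrix}^T, \qquad \bar{E}_{ij} = \begin{pmatrix} a_i \\ -e_j \end{pmatrix} \begin{pmatrix} a_i \\ -e_j \end{pmatrix}^T.$$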

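Note that this formulation forces any feasible solution
matrix to have rank at least $k$. To derive the dual of
(3), let $(\theta_{ij})_{(i,j) \in E_1}$ and $(w_{ij})_{(i,j) \in E_2}$ be the dual
multipliers of the constraints on $E_1$ and $E_2$, respectively. Then,
the dual of (3) is given by (with $V$ a symmetric $k \times k$ matrix):

$$\begin{aligned}
\inf\ & I_k \bullet V + \sum_{(i,j) \in E_1} \theta_{ij} d_{ij}^2 + \sum_{(i,j) \in E_2} w_{ij} \bar{d}_{ij}^2 \\
\text{subject to}\ & U \equiv \begin{pmatrix} V & 0 \\ 0 & 0 \end{pmatrix} + \sum_{(i,j) \in E_1} \theta_{ij} E_{ij} + \sum_{(i,j) \in E_2} w_{ij} \bar{E}_{ij} \succeq 0 \\
& \theta_{ij} \in \mathbb{R} \ \text{for all } (i, j) \in E_1, \quad w_{ij} \in \mathbb{R} \ \text{for all } (i, j) \in E_2
\end{aligned} \qquad (4)$$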


Note that the dual is always feasible, as $V = 0$,
$\theta_{ij} = 0$ for all $(i, j) \in E_1$ and $w_{ij} = 0$ for all $(i, j) \in E_2$
is a feasible solution. Moreover, this solution has a dual
objective value of 0. Thus, by the SDP strong duality
theorem, if the primal is also feasible, then there is no
duality gap between (3) and (4). Moreover, if $Z$ is feasible
for (3) and $U$ is optimal for (4), then by complementarity
($ZU = 0$, so the ranges of $Z$ and $U$ are orthogonal
subspaces of $\mathbb{R}^{k+n}$), we have $\operatorname{rank}(Z) + \operatorname{rank}(U) \le k + n$.
In particular, since $\operatorname{rank}(Z) \ge k$, we must have $\operatorname{rank}(U) \le n$.

We are interested in deriving the conditions under 
which the relaxation (3) is exact for (2). Towards that 
end, let us first introduce a definition: 


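Definition 1 We say that an instance $(G, (d, \bar{d}), a)$ is
uniquely $k$-realizable if (i) there is a unique realization
$p = (p_1, \dots, p_n)$ of $(G, (d, \bar{d}), a)$ in $\mathbb{R}^k$, and (ii) there
does not exist $p'_1, \dots, p'_n \in \mathbb{R}^l$, where $l > k$, such that:

$$\begin{aligned}
&\|p'_i - p'_j\|^2 = d_{ij}^2 && \text{for } (i, j) \in E_1 \\
&\left\| \begin{pmatrix} a_i \\ 0 \end{pmatrix} - p'_j \right\|^2 = \bar{d}_{ij}^2 && \text{for } (i, j) \in E_2 \\
&p'_i \neq \begin{pmatrix} p_i \\ 0 \end{pmatrix} && \text{for some } 1 \le i \le n
\end{aligned}$$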
For the motivation of this definition, see [25]. We
remark that Definition 1 can be viewed as a new notion of
rigidity which takes into account both the combinato- 
rial and the geometric aspects of the Graph Realization 
Problem. 

At this point it is fair to ask whether Definition 1 
is vacuous, i. e. whether uniquely k-realizable instances 
exist at all. It is not hard to see that they do exist for
all $k \ge 1$. In fact, there exists a family of uniquely
$k$-realizable instances in which the number of edges scales
linearly with the number of vertices ([25]). This refutes
a common belief in the literature (see, e.g., [2,5]) that
the graph of any uniquely $k$-realizable instance must
have $\Omega(n^2)$ edges.

Having established the existence of uniquely k-re- 
alizable instances, we are now ready to state the main 
theorem of this section. For a proof, see [25,27]. 


Theorem 1 Let $G = (V, E)$ be connected, and let $d$, $\bar{d}$
and $a$ be given. Then, the following are equivalent:

(1) The instance $(G, (d, \bar{d}), a)$ is uniquely $k$-realizable.
(2) The max-rank solution matrix of (3) has rank $k$.
(3) The solution matrix of (3) satisfies $Y = X^T X$.


Although unique k-realizability is a useful notion in de- 
termining the solvability of the Graph Realization Prob- 
lem, it is not stable under perturbation. Indeed, there 
exist instances that are uniquely k-realizable, but may 
no longer be so after small perturbation of the un- 
pinned vertices; see [27]. This motivates us to define 
another notion called strong k-realizability: 


Definition 2 We say that an instance $(G, (d, \bar{d}), a)$ is
strongly $k$-realizable if (4) has a rank-$n$ optimal dual
slack matrix.

Note that if an instance is strongly $k$-realizable, then it
is uniquely $k$-realizable by complementarity and
Theorem 1: a rank-$n$ optimal $U$ forces every solution of (3)
to have rank at most $(k + n) - n = k$, hence exactly $k$.

Given an instance $I = (G, (d, \bar{d}), a)$, we say that the
instance $(G', (d', \bar{d}'), a)$ is a sub-instance of $I$ if $G'$ is
a subgraph of $G$ that includes all the pinned vertices,
and $(d', \bar{d}')$ is the restriction of $(d, \bar{d})$ to $G'$. As
indicated by the following theorem, the notion of strong
$k$-realizability is very useful in identifying the uniquely
$k$-realizable sub-instances of a given instance. Its proof
can be found in [25,27].


Theorem 2 Suppose that a given instance $I$ contains
a sub-instance $I'$ that is strongly $k$-realizable. Then, in
any solution to (3), the submatrix that corresponds to $I'$
has rank $k$.


Applications 


It is often observed in practice that by “stretching apart” 
pairs of non-adjacent vertices, one is more likely to flat- 
ten a high-dimensional realization into a lower dimen- 
sional one. We now formalize this observation using 
elements of tensegrity theory (see, e. g., [12,21]). We be- 
gin with some definitions: 


Definition 3 A tensegrity $G(p)$ is a graph $G = (V, E)$
together with a configuration $p = (p_1, \dots, p_n) \in \mathbb{R}^{kn}$
such that each edge is labelled as a cable, strut, or bar;
each vertex is labelled as pinned or unpinned; and
vertex $i \in V$ is assigned the coordinates $p_i \in \mathbb{R}^k$ for
$1 \le i \le n$.


The label on each edge is intended to indicate its func- 
tionality. Cables (resp. struts) are allowed to decrease 
(resp. increase) in length (or stay the same length), 
but not to increase (resp. decrease) in length. Bars are 
forced to remain the same length. As before, a pinned 
vertex is forced to remain where it is. Given a graph 
$G = (V, E)$ and a set $d$ of weights on the edges, if
$(i, j)$ is a cable (resp. strut), then $d_{ij}$ will be the upper
(resp. lower) bound on its length. If $(i, j)$ is a bar, then
$d_{ij}$ will simply be its length.

An important concept in the study of tensegrities is 
that of an equilibrium stress: 


Definition 4 An equilibrium stress for $G(p)$ is an
assignment of real numbers $\omega_{ij} = \omega_{ji}$ to each edge
$(i, j) \in E$ such that for each unpinned vertex $i$ of $G$,
we have:

$$\sum_{j : (i, j) \in E} \omega_{ij} (p_i - p_j) = 0 \qquad (5)$$

Furthermore, we say that the equilibrium stress
$\omega = \{\omega_{ij}\}$ is proper if $\omega_{ij} = \omega_{ji} \ge 0$ (resp. $\le 0$) if
$(i, j)$ is a cable (resp. strut).
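Condition (5) and properness are easy to test for a given configuration. A plain-Python sketch, assuming each edge appears once in the stress dictionary (so $\omega_{ji} = \omega_{ij}$ is implied). The small example in the note below is ours, chosen for illustration.

```python
def is_equilibrium_stress(p, stress, unpinned, tol=1e-9):
    """Check Eq. (5): sum_j w_ij (p_i - p_j) = 0 at every unpinned vertex i.

    p        : dict vertex -> coordinate tuple
    stress   : dict edge (i, j) -> w_ij, each edge listed once
    unpinned : iterable of unpinned vertices
    """
    dim = len(next(iter(p.values())))
    net = {i: [0.0] * dim for i in unpinned}
    for (i, j), w in stress.items():
        for t in range(dim):
            force = w * (p[i][t] - p[j][t])
            if i in net:
                net[i][t] += force  # contribution w_ij (p_i - p_j) at i
            if j in net:
                net[j][t] -= force  # contribution w_ji (p_j - p_i) at j
    return all(abs(c) <= tol for vec in net.values() for c in vec)


def is_proper(stress, label):
    """Cables need w >= 0, struts w <= 0; bars are unrestricted."""
    for e, w in stress.items():
        if label[e] == "cable" and w < 0:
            return False
        if label[e] == "strut" and w > 0:
            return False
    return True
```

Take the collinear points $p_0 = (0,0)$, $p_1 = (1,0)$, $p_2 = (2,0)$, with cables $(0,1)$ and $(1,2)$ carrying stress 2 and a strut $(0,2)$ carrying $-1$. Both checks return `True`, and the non-zero stress sits on vectors that span only a line.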


Clearly, the zero stress $\omega = 0$ is a proper equilibrium
stress, but it is not too interesting. On the other hand,
suppose that $G(p)$ has a non-zero equilibrium stress,
and that at least one of the incident edges of vertex $i$ has
a non-zero stress. Then, Eq. (5) implies that the set of
vectors $\{p_j - p_i : (i, j) \in E\}$ is linearly dependent, and
hence those vectors span a space of dimension less than
the degree of $i$. Thus, it would be nice to have conditions
that guarantee the existence of a non-zero proper
equilibrium stress. It turns out that the concept of an
unyielding tensegrity is useful for that purpose.


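Definition 5 Let $G = (V, E)$ be a graph, and let $p$
and $q$ be two configurations of $G$. We say that $G(p)$
dominates $G(q)$ (denoted by $G(p) \trianglerighteq G(q)$) if for every
pinned vertex $i$, we have $p_i = q_i$, and for every edge
$(i, j) \in E$, we have:

$$\begin{aligned}
\|p_i - p_j\| &\ge \|q_i - q_j\| && \text{if } (i, j) \text{ is a cable} \\
\|p_i - p_j\| &= \|q_i - q_j\| && \text{if } (i, j) \text{ is a bar} \\
\|p_i - p_j\| &\le \|q_i - q_j\| && \text{if } (i, j) \text{ is a strut}
\end{aligned}$$

We call $G(p)$ an unyielding tensegrity and $p$ an
unyielding configuration if any other configuration $q$ with
$G(p) \trianglerighteq G(q)$ satisfies $\|p_i - p_j\| = \|q_i - q_j\|$ for all
$(i, j) \in E$.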
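Dominance can likewise be checked edge by edge. A sketch, assuming configurations are dictionaries of coordinate tuples and lengths are compared with a small tolerance (our choice).

```python
import math


def dominates(p, q, edges, label, pinned, tol=1e-9):
    """True if G(p) dominates G(q) in the sense of Definition 5.

    label: dict edge -> "cable" | "bar" | "strut".
    """
    if any(math.dist(p[i], q[i]) > tol for i in pinned):
        return False  # pinned vertices must not move
    for e in edges:
        i, j = e
        lp, lq = math.dist(p[i], p[j]), math.dist(q[i], q[j])
        if label[e] == "cable" and lp < lq - tol:
            return False  # a cable may only shorten from p to q
        if label[e] == "strut" and lp > lq + tol:
            return False  # a strut may only lengthen from p to q
        if label[e] == "bar" and abs(lp - lq) > tol:
            return False  # a bar keeps its length
    return True
```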


We are now ready to state the following theorem due
to [6], which plays a crucial role in the characterization
of the so-called 3-realizable graphs (informally, a graph
$G$ is 3-realizable if, given any set $d$ of edge weights,
whenever $(G, d)$ is realizable at all, then it can also be
realized in $\mathbb{R}^3$; for further details, see [7]):


Theorem 3 If G(p) is an unyielding tensegrity with ex- 
actly one strut or cable, then G(p) has an equilibrium 
stress that is non-zero on at least one edge. 


Belk’s proof of Theorem 3 uses the Inverse Function 
Theorem and hence is not constructive. It turns out that 
the problem of computing an unyielding configuration 
p of a graph G can be formulated as an SDP. What is 
even more interesting is that the optimal dual multi- 
pliers of the SDP will give rise to a non-zero proper 
equilibrium stress for G(p). Consequently, we obtain 
a constructive proof of Theorem 3. In fact, the SDP- 
based proof yields more information than that offered 
by Belk’s proof. 

Specifically, let $V_1, V_2, E_1, E_2$ be as before, and set
$\bar{E}_1 = \{(i, j) \notin E : i, j \in V_1\}$ and $\bar{E}_2 = \{(i, j) \notin E :
i \in V_2, j \in V_1\}$. Let $C_1, S_1$ be disjoint subsets of $\bar{E}_1$,
and let $C_2, S_2$ be disjoint subsets of $\bar{E}_2$. The pairs in $C_i$
are intended to be cables, and those in $S_i$ are intended
to be struts. We remark that we do not assume the sets
$C_1, C_2, S_1, S_2$ to be non-empty.


Now, consider the following SDP, where we aug- 
ment the formulation (3) with an objective function: 


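$$\begin{aligned}
\sup\ & \sum_{(i,j) \in S_1} E_{ij} \bullet Z + \sum_{(i,j) \in S_2} \bar{E}_{ij} \bullet Z - \sum_{(i,j) \in C_1} E_{ij} \bullet Z - \sum_{(i,j) \in C_2} \bar{E}_{ij} \bullet Z \\
\text{subject to}\ & E_{ij} \bullet Z = d_{ij}^2 \quad \text{for } (i, j) \in E_1 \\
& \bar{E}_{ij} \bullet Z = \bar{d}_{ij}^2 \quad \text{for } (i, j) \in E_2 \\
& Z \succeq 0, \quad Z_{1:k,1:k} = I_k
\end{aligned} \qquad (6)$$

The dual of (6) is given by:

$$\begin{aligned}
\inf\ & I_k \bullet V + \sum_{(i,j) \in E_1} \theta_{ij} d_{ij}^2 + \sum_{(i,j) \in E_2} w_{ij} \bar{d}_{ij}^2 \\
\text{subject to}\ & U \equiv -\sum_{(i,j) \in S_1} E_{ij} - \sum_{(i,j) \in S_2} \bar{E}_{ij} + \sum_{(i,j) \in C_1} E_{ij} + \sum_{(i,j) \in C_2} \bar{E}_{ij} \\
& \qquad + \begin{pmatrix} V & 0 \\ 0 & 0 \end{pmatrix} + \sum_{(i,j) \in E_1} \theta_{ij} E_{ij} + \sum_{(i,j) \in E_2} w_{ij} \bar{E}_{ij} \succeq 0
\end{aligned} \qquad (7)$$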


We then have the following theorem due to [26]: 


Theorem 4 Let $G = (V, E)$, $d$, $\bar{d}$ and $a$ be given such
that:

(1) there is at least one pinned vertex, and
(2) the graph $G \setminus \{n + 2, \dots, n + m\}$ is connected.

Consider the SDP (6), where we assume that:

(3) it is strictly feasible, and
(4) the objective function is not vacuous, i.e. at least one
of the sets $C_1, C_2, S_1, S_2$ is non-empty.

Let $x = (x_1, \dots, x_n) \in \mathbb{R}^{ln}$ be the positions of the
unpinned vertices in $\mathbb{R}^l$ (for some $l \ge k$), obtained from
the optimal primal matrix $Z$, and let $\{\theta_{ij}, w_{ij}\}$ be the
optimal dual multipliers. Suppose that we assign the stress
$\theta_{ij}$ (resp. $w_{ij}$) to the bar $(i, j) \in E_1$ (resp. $(i, j) \in E_2$),
a stress of 1 to all the cables in $C_1 \cup C_2$, and a stress of
$-1$ to all the struts in $S_1 \cup S_2$. Then, the resulting
assignment yields a non-zero proper equilibrium stress for the
tensegrity $G'(x, \bar{a})$, where $G' = (V, E \cup C_1 \cup C_2 \cup S_1 \cup S_2)$
and $\bar{a} = (\bar{a}_{n+1}, \dots, \bar{a}_{n+m})$, where:

$$\bar{a}_i = \begin{pmatrix} a_i \\ 0 \end{pmatrix} \in \mathbb{R}^l.$$


The intuition behind the proof of Theorem 4 is sim- 
ple. Suppose that (6) and (7) achieve the same optimal 
value, and that the common optimal value is attained 
by the primal matrix Z and the dual matrix U. Then, 
the desired result should follow from the complementarity
condition $ZU = 0$, which holds whenever strong duality holds with both optimal values attained.
Of course, strong duality for SDP does not necessar- 
ily hold, and even when it does, there is no guarantee 
that the optimal value is attained by any matrix (see, 
e.g., [17] for some examples). Thus, some additional 
technical assumptions are needed, and items (2) and (3) 
in the statement of Theorem 4 turn out to be sufficient. 
In fact, the conclusion of Theorem 4 remains valid if we 
replace (3) by the following: 

(3') the optimal value of (7) is attained by some dual 
feasible matrix 

We remark that in most applications of Theorem 4, 
there will only be one pinned vertex, namely $a_{n+1} = 0$.
Thus, primal strict feasibility can be ensured if the given 
weights d admit a realization whose vertices are in gen- 
eral position, and the connectivity condition is simply 
the statement that G is connected. However, the strict 
feasibility assumption (or the dual attainment assump- 
tion) does weaken the applicability of Theorem 4. In 
particular, Theorem 4 is not as general as Theorem 3, 
although this can be fixed (see [25] for details). 

Besides strict feasibility, it is also assumed that the 
given instance has at least one pinned vertex. Such an 
assumption is necessary in order to ensure that the en- 
tries of Z are bounded, but one can no longer argue 
that the net stress exerted on a pinned vertex is zero. 
However, if there is only one pinned vertex in the given 
instance, then the net stress exerted on it will be zero. 
Thus, one may assume without loss of generality that 
the given instance has one pinned vertex. 

Finally, observe that the assumptions in the state- 
ment of Theorem 4 buy us some additional information 
that is not offered by Theorem 3. Specifically, the equi- 
librium stress obtained in Theorem 4 is non-zero on all 
the cables and struts, and the magnitudes of the stress 
on all the cables and struts can be prescribed (by
assigning appropriate weights to each summand in the primal
objective function).


Relation to the Maximum Variance Unfolding Method


The idea of stretching apart pairs of non-adjacent
vertices has also been used in the artificial intelligence
community to detect and discover low-dimensional
structure in high-dimensional data. For instance, in [29] (see
also [30]), the authors proposed the so-called Maximum
Variance Unfolding (MVU) method for the problem
of manifold learning. The idea is to map a given set
of high-dimensional vectors $p_1, \dots, p_n \in \mathbb{R}^l$ to a set
of low-dimensional vectors $q_1, \dots, q_n \in \mathbb{R}^k$ (where
$1 \le k \le l$ are given) with maximum total variance,
while at the same time preserving the local distances.
More precisely, consider an n-vertex connected graph 
G = (V,E), where the set E of edges represents the 
set of distances that need to be preserved. The desired 
set of low-dimensional vectors can then be obtained by 
solving the following quadratic program: 


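$$\begin{aligned}
\text{maximize}\ & \sum_{i=1}^n \|x_i\|^2 \\
\text{subject to}\ & \sum_{i=1}^n x_i = 0 \\
& \|x_i - x_j\|^2 = \|p_i - p_j\|^2 \quad \text{for } (i, j) \in E \\
& x_i \in \mathbb{R}^k \quad \text{for } 1 \le i \le n
\end{aligned} \qquad (8)$$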


To explain the rationale behind the above formu- 
lation, we observe that the first constraint centers the 
solution vectors at the origin and eliminates the trans- 
lational degree of freedom. Moreover, it implies that the 
objective function of (8) can be written as: 


$$
\sum_{i=1}^{n} \|x_i\|^2 = \frac{1}{2n} \sum_{i,j=1}^{n} \|x_i - x_j\|^2.
$$
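As a quick numerical sanity check of this identity, the following plain-Python sketch centers a few random points (enforcing the first constraint of (8)) and compares both sides. The helper name `variance_identity` and the random test points are illustrative only and are not part of the MVU method itself.

```python
import random

def variance_identity(points):
    """Center the points, then return (sum ||x_i||^2, sum_{i,j} ||x_i - x_j||^2 / (2n))."""
    n, k = len(points), len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(k)]
    # First constraint of (8): the centered vectors x_i sum to zero.
    x = [[p[d] - centroid[d] for d in range(k)] for p in points]
    total_variance = sum(c * c for xi in x for c in xi)
    pairwise = sum(
        (a - b) ** 2 for xi in x for xj in x for a, b in zip(xi, xj)
    ) / (2 * n)
    return total_variance, pairwise

rng = random.Random(1)
pts = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(6)]
lhs, rhs = variance_identity(pts)
print(lhs, rhs)  # the two values agree up to rounding error
```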

Thus, we see that the MVU method attempts to 
“unfold” the manifold by pulling the data points as far 
apart as possible while preserving the local distances. 
We remark that such a technique has also been used 
for the problem of sensor network localization (see, 
e.g., [9,31]). Now, using the ideas in Section Formu- 
lation, we can formulate a semidefinite relaxation of (8) 
as follows:

$$
\begin{array}{ll}
\sup & I \bullet X \\
\text{subject to} & ee^T \bullet X = 0 \\
& E_{ij} \bullet X = \|p_i - p_j\|^2 \quad \text{for } (i,j) \in E \\
& X \succeq 0
\end{array}
\qquad (9)
$$


Here, $e = (1, \dots, 1) \in \mathbb{R}^n$, $E_{ij} = (e_i - e_j)(e_i - e_j)^T$, and $e_i$ is the $i$th standard basis vector of $\mathbb{R}^n$. It turns
out that problem (9) and its dual are closely related to 
the problem of finding the fastest mixing Markov pro- 
cess on a graph, as well as to various spectral meth- 
ods for dimensionality reduction. We shall not elabo- 
rate on these results here and refer the interested reader 
to [28,33] for further details. Instead, we will show that 
the MVU problem (9) can be viewed as a problem of 
finding an unyielding configuration of a certain tensegrity. To begin, suppose that we are given an $n$-vertex connected graph $G = (\{1, \dots, n\}, E)$ and a configuration $p = (p_1, \dots, p_n) \in \mathbb{R}^{ln}$ of the vertices. Consider the tensegrity $G'(p')$, where $G'$ is obtained from $G$ by adding a new vertex $n+1$ and connecting it to all the vertices of $G$, and $p' = (p, 0) \in \mathbb{R}^{l(n+1)}$, i.e. vertex $n+1$ is located at the origin. Furthermore, we label the edges in $E$ as bars and the edges in $S = \{(n+1, i) : 1 \le i \le n\}$ as struts. Suppose that we pin vertex $n+1$ at the origin, i.e. $a_{n+1} = 0$. Now, consider the following SDP:


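$$
\begin{array}{ll}
\sup & \displaystyle\sum_{i:(n+1,i) \in S} E_{n+1,i} \bullet Z \\
\text{subject to} & E_{ij} \bullet Z = \|p_i - p_j\|^2 \quad \text{for } (i,j) \in E \\
& Z \succeq 0, \quad Z_{1:l,1:l} = I_l
\end{array}
\qquad (10)
$$

where:

$$
E_{ij} = \begin{pmatrix} 0 \\ e_i - e_j \end{pmatrix} \begin{pmatrix} 0 \\ e_i - e_j \end{pmatrix}^T
\quad \text{and} \quad
E_{n+1,i} = \begin{pmatrix} 0 \\ -e_i \end{pmatrix} \begin{pmatrix} 0 \\ -e_i \end{pmatrix}^T,
$$

with $0 \in \mathbb{R}^l$ and $e_i$ the $i$th standard basis vector of $\mathbb{R}^n$.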


It is clear that (10) is an instance of (6). Moreover, it can be shown ([25]) that the positions $x \in \mathbb{R}^{ln}$ of the unpinned vertices obtained from the optimal primal matrix $Z$ are automatically centered at the origin, even though such a constraint is not explicitly enforced.
Thus, we see that problem (10) is equivalent to the 
MVU problem (9). 

From the above discussion, we see that the formula- 
tion (6) is more general than the MVU formulation (9). 
Moreover, the flexibility in the formulation (6) often al- 
lows one to achieve the desired dimensionality reduc- 
tion which the MVU formulation cannot achieve. For 
instance, consider the case where the input graph G is 
a tree. It is not hard to show that there is a placement 
of struts such that all the optimal solutions to (6) have 
rank 1 and hence they all give rise to one-dimensional 
realizations. On the other hand, the MVU formulation 
may yield a two-dimensional realization; see [25] for an 
example. 
Optimization problems that involve a large finite num-
ber of alternatives often arise in industry, government 
and science. In these problems, one is given a finite 
solution set $X$ and a real-valued function $f: X \to \mathbb{R}$, and one seeks a solution $x^* \in X$ with $f(x^*) \le f(x)$ for all $x \in X$. Common examples include designing efficient telecommunication networks and constructing cost-effective airline crew schedules. To find the optimal solution in a combinatorial optimization problem it is theoretically possible to enumerate the solutions and evaluate each with respect to the stated objective. However, from a practical perspective, it is infeasible to follow such a strategy of complete enumeration because the number of combinations often grows exponentially with the size of the problem.

Much work has been done over the last five decades 
to develop optimum-seeking methods that do not ex-
plicitly require an examination of each alternative. This 
research has given rise to the field of combinatorial 
optimization (see [55]), and an increasing capability 
to solve ever larger real-world problems. Nevertheless, 
most problems found in industry and government are 
either computationally intractable by their nature, or 
sufficiently large so as to preclude the use of exact 
algorithms. In such cases, heuristic methods are usually employed to find good, but not necessarily optimal, solutions. The effectiveness of these
methods depends upon their ability to adapt to a par- 
ticular realization, avoid entrapment at local optima, 
and exploit the basic structure of the problem, such 
as a network or a natural ordering among its compo- 
nents. Furthermore, restart procedures, controlled ran- 
domization, efficient data structures, and preprocess- 
ing are also beneficial. Building on these notions, var- 
ious heuristic search techniques have been developed 
that have demonstrably improved our ability to obtain good solutions to difficult combinatorial optimization
problems. The most promising of such techniques in- 
clude simulated annealing [35], tabu search [27,28,29], 
genetic algorithms [30] and GRASP (greedy random- 
ized adaptive search procedures) [21,22]. 

In this article, we review GRASP. The components 
of a basic GRASP heuristic are addressed and enhance- 
ments proposed to the basic heuristic are discussed. The 
paper concludes with a brief literature review of appli- 
cations of GRASP. 


A Basic GRASP 


A GRASP is a multistart or iterative process, in which 
each GRASP iteration consists of two phases, a con- 
struction phase, in which a feasible solution is produced, 
and a local search phase, in which a local optimum in 
the neighborhood of the constructed solution is sought. 
The best overall solution is kept as the result. The pseu- 
docode below illustrates a GRASP procedure for mini- 
mization in which maxitr GRASP iterations are done. 


    f(x*) = ∞;
    FOR k = 1,...,maxitr DO
        construct(g(·), α, x);
        local(f(·), x);
        IF f(x) < f(x*) DO
            x* = x;
        END IF;
    END FOR

Procedure grasp(f(·), g(·), maxitr, x*)
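The procedure grasp above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the toy objective, the purely random construction (which plays the role of $\alpha = 1$) and the $\pm 1$ neighborhood are hypothetical stand-ins for a real construct and local.

```python
import random

def grasp(f, construct, local, maxitr, seed=0):
    """Multistart loop: construct, improve by local search, keep the best."""
    rng = random.Random(seed)
    best, best_val = None, float("inf")   # f(x*) = infinity
    for _ in range(maxitr):
        x = construct(rng)                # construction phase
        x = local(f, x)                   # local search phase
        if f(x) < best_val:               # IF f(x) < f(x*)
            best, best_val = x, f(x)      # x* = x
    return best, best_val

# Hypothetical toy problem: minimize (x - 7)^2 over the integers 0..20.
def toy_f(x):
    return (x - 7) ** 2

def toy_construct(rng):
    return rng.randint(0, 20)             # purely random start (alpha = 1)

def toy_local(f, x):
    """Move to an improving neighbor x - 1 or x + 1 until none exists."""
    improved = True
    while improved:
        improved = False
        for y in (x - 1, x + 1):
            if 0 <= y <= 20 and f(y) < f(x):
                x, improved = y, True
                break
    return x

best, best_val = grasp(toy_f, toy_construct, toy_local, maxitr=5)
print(best, best_val)  # 7 0
```

Because the toy objective is unimodal, every local search ends at 7; in a real application each iteration would generally end at a different local optimum.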


In the construction phase, a feasible solution is it- 
eratively constructed, one element at a time. The basic 
GRASP construction phase is similar to the semigreedy 
heuristic proposed independently by J.P. Hart and A.W. 
Shogan [31]. At each construction iteration, the choice 
of the next element to be added is determined by order- 
ing all candidate elements (i.e. those that can be added 
to the solution) in a candidate list C with respect to 
a greedy function $g: C \to \mathbb{R}$. This function measures the
(myopic) benefit of selecting each element. The heuris- 
tic is adaptive because the benefits associated with ev- 
ery element are updated at each iteration of the con- 
struction phase to reflect the changes brought on by 
the selection of the previous element. The probabilistic 
component of a GRASP is characterized by randomly
choosing one of the best candidates in the list, but not 
necessarily the top candidate. The list of best candidates 
is called the restricted candidate list (RCL). This choice 
technique allows for different solutions to be obtained 
at each GRASP iteration, but does not necessarily com- 
promise the power of the adaptive greedy component 
of the method. Let $\alpha \in [0, 1]$ be a given parameter. The
pseudocode below describes a basic GRASP construc- 
tion phase. 


    x = ∅;
    Initialize candidate set C;
    WHILE C ≠ ∅ DO
        s_min = min{g(t): t ∈ C};
        s_max = max{g(t): t ∈ C};
        RCL = {s ∈ C: g(s) ≤ s_min + α(s_max - s_min)};
        Select s, at random, from the set RCL;
        x = x ∪ {s};
        Update candidate set C;
    END WHILE

Procedure construct(g(·), α, x)


The pseudocode shows that the parameter $\alpha$ controls the amounts of greediness and randomness in the algorithm. A value $\alpha = 0$ corresponds to a greedy construction procedure, while $\alpha = 1$ produces random construction.
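A minimal Python sketch of the construction phase, assuming a hypothetical tour-building problem in which the greedy function $g(t)$ is the distance from the last city added to candidate city $t$. Recomputing $g$ at every step is what makes the procedure adaptive. The function name `construct_tour` and the choice to start at city 0 are illustrative assumptions.

```python
import math
import random

def construct_tour(points, alpha, rng):
    """Randomized greedy construction with a value-based RCL threshold."""
    tour = [0]                                  # start at city 0 (assumption)
    candidates = set(range(1, len(points)))     # candidate set C
    while candidates:
        last = points[tour[-1]]
        # Adaptive greedy function: distance from the last city added.
        g = {t: math.dist(last, points[t]) for t in candidates}
        s_min, s_max = min(g.values()), max(g.values())
        threshold = s_min + alpha * (s_max - s_min)
        rcl = sorted(t for t in candidates if g[t] <= threshold)
        s = rng.choice(rcl)                     # random pick from the RCL
        tour.append(s)
        candidates.remove(s)                    # update candidate set C
    return tour

line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(construct_tour(line, 0.0, random.Random(0)))  # [0, 1, 2, 3]
print(construct_tour(line, 1.0, random.Random(5)))
```

With `alpha=0.0` on these collinear points the RCL holds only the nearest city, so the sketch reduces to nearest-neighbor construction; with `alpha=1.0` every remaining city is eligible at each step.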

As is the case for many deterministic methods, the 
solutions generated by a GRASP construction are not 
guaranteed to be locally optimal with respect to sim- 
ple neighborhood definitions. Hence, it is almost al- 
ways beneficial to apply a local search to attempt to 
improve each constructed solution. A local search al- 
gorithm works in an iterative fashion by successively 
replacing the current solution by a better solution in 
the neighborhood of the current solution. It termi- 
nates when no better solution is found in the neigh- 
borhood. The neighborhood structure N for a problem 
P relates a solution s of the problem to a subset of so- 
lutions N(s). A solution s is said to be locally optimal 
if there is no better solution in N(s). The key to suc- 
cess for a local search algorithm consists of the suitable 
choice of a neighborhood structure, efficient neighbor- 
hood search techniques, and the starting solution. 


While such local optimization procedures can re- 
quire exponential time from an arbitrary starting point, 
empirically their efficiency significantly improves as 
the initial solution improves. Through the use of cus- 
tomized data structures and careful implementation, an 
efficient construction phase can be created which pro- 
duces good initial solutions for efficient local search. 
The result is that often many GRASP solutions are gen- 
erated in the same amount of time required for the local 
optimization procedure to converge from a single ran- 
dom start. Furthermore, the best of these GRASP so- 
lutions is generally significantly better than the single 
solution obtained from a random starting point. The 
pseudocode below describes a basic local search proce- 
dure. 


    H = {y ∈ N(x): f(y) < f(x)};
    WHILE |H| > 0 DO
        Select x ∈ H;
        H = {y ∈ N(x): f(y) < f(x)};
    END WHILE

Procedure local(f(·), N(·), x)
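The procedure local can be sketched in Python as follows. The bit-flip neighborhood and the Hamming-distance objective are hypothetical choices used only to exercise the loop, and $x$ is picked uniformly at random from $H$, which is one of several possible selection rules.

```python
import random

def local(f, neighbors, x, rng=None):
    """Repeatedly move to a random improving neighbor; stop at a local optimum."""
    if rng is None:
        rng = random.Random(0)
    H = [y for y in neighbors(x) if f(y) < f(x)]     # H = {y in N(x): f(y) < f(x)}
    while H:                                          # WHILE |H| > 0
        x = rng.choice(H)                             # select x in H
        H = [y for y in neighbors(x) if f(y) < f(x)]
    return x

def bit_flips(x):
    """Hypothetical neighborhood N(x): all 0-1 vectors at Hamming distance one."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

TARGET = (1, 0, 1, 1, 0)

def distance_to_target(x):
    return sum(a != b for a, b in zip(x, TARGET))

result = local(distance_to_target, bit_flips, (0, 0, 0, 0, 0))
print(result)  # (1, 0, 1, 1, 0)
```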


It is difficult to formally analyze the quality of so- 
lution values found by using the GRASP methodol- 
ogy. However, there is an intuitive justification that 
views GRASP as a repetitive sampling technique. Each 
GRASP iteration produces a sample solution from an 
unknown distribution of all obtainable results. The 
mean and variance of the distribution are functions 
of the restrictive nature of the candidate list. For ex- 
ample, if the cardinality of the restricted candidate 
list is limited to one, then only one solution will be 
produced and the variance of the distribution will be 
zero. Given an effective greedy function, the mean so- 
lution value in this case should be good, but prob- 
ably suboptimal. If a less restrictive cardinality limit 
is imposed, many different solutions will be produced 
implying a larger variance. Since the greedy function 
is more compromised in this case, the mean solution 
value should degrade. Intuitively, however, by order 
statistics and the fact that the samples are randomly 
produced, the best value found should outperform the 
mean value. Indeed, often the best solutions sampled 
are optimal. 
An especially appealing characteristic of GRASP is
the ease with which it can be implemented. Few param- 
eters need to be set and tuned, and therefore develop- 
ment can focus on implementing efficient data struc- 
tures to assure quick GRASP iterations. Finally, GRASP 
can be trivially implemented in parallel. Each processor 
can be initialized with its own copy of the procedure, 
the instance data, and an independent random number 
sequence. The GRASP iterations are then performed in 
parallel with only a single global variable required to 
store the best solution found over all processors. 


Enhancements to the Basic GRASP 


A number of enhancements to the basic GRASP, pre- 
sented in the previous section, have been proposed in 
the literature. In this section we review the use of path relinking, long-term memory, the proximate optimality
principle, and bias functions in a GRASP. We discuss 
a parallelization scheme and the use of GRASP in hy- 
brid metaheuristics. 


Path Relinking 


M. Laguna and R. Marti [43] adapted the concept of 
path relinking for use within a GRASP. To test their 
concept, they implemented a GRASP with path relink-
ing for the 2-layer straight line crossing minimization 
problem. A small set of high-quality, or elite, solutions 
is stored to serve as guiding solutions for path relink- 
ing. Each GRASP iteration produces a locally optimal 
solution x*. A solution y* is chosen at random from 
the elite set and a path of solutions linking x* to y* is 
constructed by applying a series of changes to the orig- 
inal solution. For example, let $x^* = (1, 0, 0, 0)$ and $y^* = (0, 1, 0, 1)$. A path relinking of $x^*$ and $y^*$ is $x^* = (1, 0, 0, 0) \to (0, 0, 0, 0) \to (0, 1, 0, 0) \to (0, 1, 0, 1) = y^*$. Each
of these path solutions is evaluated for solution qual- 
ity. Laguna and Marti report that often improvements 
to the incumbent are found in this path relinking. 
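A small sketch of this walk for 0-1 vectors, assuming (as in the example above) that each step copies one differing component of the guiding solution, with components scanned in index order. The objective `f=sum` (number of ones) used below is hypothetical. With the example vectors the sketch reproduces the path listed above.

```python
def path_relink(x, y, f):
    """Walk from x toward guiding solution y, one differing component per step."""
    current = list(x)
    path = [tuple(current)]
    for i, target in enumerate(y):
        if current[i] != target:
            current[i] = target              # one elementary change
            path.append(tuple(current))
    best = min(path, key=f)                  # evaluate every solution on the path
    return path, best

path, best = path_relink((1, 0, 0, 0), (0, 1, 0, 1), f=sum)
print(path)
print(best)  # (0, 0, 0, 0)
```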


Long-Term Memory 


Long-term memory is the basis for tabu search. Besides path relinking, which can be thought of as a form of long-term memory, other uses of long-term memory have been proposed for use in a GRASP. C. Fleurent and F. Glover [26] observe that the basic GRASP does not make use of information gathered in previous iterations and propose a long-term memory scheme to address this issue. M. Prais and C.C. Ribeiro [64] propose a scheme to learn an appropriate value for the RCL parameter $\alpha$.

Fleurent and Glover introduced a way to use long- 
term memory in multistart heuristics such as GRASP. 
Their scheme maintains a set S of elite solutions to 
be used in the construction phase. To become an elite 
solution a solution s must be either better than the 
best member of S, or better than the worst member 
of S and sufficiently different from the other elite so- 
lutions. For example, one can count identical solution 
vector components and set a threshold for rejection. 
A strongly determined variable is one that cannot be 
changed without eroding the objective or changing sig- 
nificantly other variables. A consistent variable is one 
that receives a particular value in a large portion of the 
elite solution set. Let I(e) be a measure of the strongly 
determined and consistent features of choice e, i. e. I(e) 
becomes larger as e resembles solutions in elite set S. 
The intensity function $I(e)$ is used in the construction phase as follows. Recall that $g(e)$ is the greedy function. Let $E(e) = F(g(e), I(e))$ be a function of the greedy and the intensification functions, for example $E(e) = \lambda g(e) + I(e)$. The intensification scheme biases selection from the RCL toward those elements $e$ with a high value of $E(e)$ by setting the probability of selecting $e$ to be $p(e) = E(e) / \sum_{s \in RCL} E(s)$. The function $E(e)$ can vary with time by changing the value of $\lambda$; e.g. initially $\lambda$ is set to a large value, and when diversification is called for, $\lambda$ is decreased. A procedure for changing the value of $\lambda$ is given by Fleurent and Glover. See also [11] for an application of this long-term memory strategy.
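A minimal sketch of the biased RCL selection, assuming the linear combination $E(e) = \lambda g(e) + I(e)$ from the example and $E(e) > 0$ for every RCL element (the roulette-wheel rule needs positive weights). The element names and scores are hypothetical.

```python
import random

def intensified_choice(rcl, g, I, lam, rng):
    """Draw e from the RCL with p(e) = E(e) / sum_s E(s), where E(e) = lam*g(e) + I(e)."""
    E = {e: lam * g(e) + I(e) for e in rcl}
    total = sum(E.values())
    probs = {e: E[e] / total for e in rcl}
    chosen = rng.choices(rcl, weights=[E[e] for e in rcl], k=1)[0]
    return chosen, probs

# Hypothetical greedy scores g and intensity scores I for two RCL elements.
g = {"a": 1.0, "b": 3.0}
I = {"a": 1.0, "b": 0.0}
choice, probs = intensified_choice(["a", "b"], g.get, I.get, lam=1.0, rng=random.Random(0))
print(probs)  # {'a': 0.4, 'b': 0.6}
```

Lowering `lam` shifts weight from the greedy score toward the intensity score, which is the time-varying behavior described above.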


Reactive GRASP 


The term ‘reactive GRASP’ was introduced by Prais and Ribeiro [64] for a GRASP that reacts to solutions produced by different settings of the RCL parameter $\alpha$ and seeks to adjust $\alpha$ to give the GRASP an appropriate level of greediness and randomness. At each GRASP iteration, the value of $\alpha$ is chosen from a discrete set of values $\{\alpha_1, \dots, \alpha_m\}$. The probability of selecting the value $\alpha_k$ is $p(\alpha_k)$, for $k = 1, \dots, m$. Reactive GRASP adaptively changes the probabilities $\{p(\alpha_1), \dots, p(\alpha_m)\}$ to favor




Greedy Randomized Adaptive Search Procedures 


values that produce good solutions. Consider applying Reactive GRASP to a minimization problem. Initially the probabilities are set as $p(\alpha_i) = 1/m$, for $i = 1, \dots, m$, so that the values are selected uniformly. To adaptively redefine the probabilities, define $F(S^*)$ to be the value of the best solution found so far and let $A_i$ be the average value of the solutions obtained with $\alpha_i$. Prais and Ribeiro propose a period of warm-up iterations to initialize the $A_i$ values. Periodically (say every $N_\alpha$ iterations) the quantities $q_i = (F(S^*)/A_i)^\delta$ are computed for $i = 1, \dots, m$ and the probabilities are updated to $p(\alpha_i) = q_i / \sum_{j=1}^{m} q_j$, for $i = 1, \dots, m$. Observe that the more suitable a value $\alpha_i$ is, the larger the value of $q_i$ is and, consequently, the higher the value of $p(\alpha_i)$, making $\alpha_i$ more likely to be selected. The parameter $\delta$ can be used as an attenuation parameter. See also [16] for an application of reactive GRASP.
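A sketch of the probability update, assuming a minimization problem with positive objective values so that $F(S^*)/A_i \le 1$. The set of $\alpha$ values and the averages $A_i$ below are hypothetical numbers, not results from [64].

```python
import random

def reactive_probabilities(best_value, averages, delta):
    """q_i = (F(S*)/A_i)**delta and p(alpha_i) = q_i / sum_j q_j."""
    q = [(best_value / a) ** delta for a in averages]
    total = sum(q)
    return [qi / total for qi in q]

alphas = [0.0, 0.5, 1.0]                       # hypothetical discrete alpha values
probs = [1 / len(alphas)] * len(alphas)        # initially p(alpha_i) = 1/m
# After warm-up: hypothetical average values A_i obtained with each alpha_i.
probs = reactive_probabilities(best_value=100.0, averages=[110.0, 105.0, 140.0], delta=10)
alpha = random.Random(0).choices(alphas, weights=probs, k=1)[0]
print(probs, alpha)
```

A larger `delta` sharpens the preference for the $\alpha$ values with the best averages; `delta=0` keeps the probabilities uniform.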


Proximate Optimality Principle 


The proximate optimality principle (POP) is based on the idea that ‘good solutions at one level are likely to be found close to good solutions at an adjacent level’ [29]. Fleurent and Glover [26] provide a GRASP interpretation of this principle. They suggest that imperfections introduced during steps of GRASP construction can be ‘ironed-out’ by applying local search during (and not only at the end of) GRASP construction. Because of efficiency considerations, a practical implementation of POP in GRASP is to apply local search at a few points in the construction phase and not during each
construction iteration. See also [11] for an application 
of the proximate optimality principle. 


Global Convergence 


In [52] it was pointed out that GRASP with a fixed nonzero RCL parameter $\alpha$ is not asymptotically convergent to a global optimum. During construction, a fixed RCL parameter may rule out a candidate that is present in all optimal solutions. Several remedies have been proposed to get around this problem. The most straightforward is the use of a randomly selected $\alpha$ [72]. In this approach, the parameter is selected at random from the continuous interval $[0, 1]$ at the start of each GRASP iteration. That value is used during the entire iteration. Since a subset of the iterations are random, the algorithm becomes asymptotically globally convergent. Reactive GRASP, as described above, can also be made asymptotically globally convergent by making $\alpha_m = 1$, i.e. allowing the choice of a value that produces a random GRASP iteration. J.L. Bresina [13] introduced the
concept of a bias function to select a candidate element 
to be included in the solution. Bresina’s method, which 
is directly applicable to GRASP construction, also al- 
lows for purely random construction and is therefore 
asymptotically globally convergent. At each construc- 
tion step, the elements in the candidate set C are ranked 
by their greedy function values. A bias value bias(r) 
is assigned to the $r$th ranked element. Bresina proposes several bias functions. In logarithmic bias, $\text{bias}(r) = 1/\log(r+1)$. In linear bias, $\text{bias}(r) = 1/r$. In polynomial bias of order $n$, $\text{bias}(r) = 1/r^n$. In exponential bias, $\text{bias}(r) = 1/e^r$. Finally, in random bias, $\text{bias}(r) = 1$. During construction, the probability of selecting the $r$th ranked candidate is $\text{bias}(r) / \sum_{i=1}^{|C|} \text{bias}(i)$. See also [11] for an application of this bias function strategy.
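The bias functions and the resulting selection probabilities can be sketched as follows. The polynomial order $n = 2$, the function names, and ranking by increasing greedy value (i.e. a minimization setting) are assumptions made for illustration.

```python
import math
import random

# Bresina's bias functions of the rank r = 1, 2, ... (polynomial order n = 2 assumed).
BIAS = {
    "logarithmic": lambda r: 1 / math.log(r + 1),
    "linear": lambda r: 1 / r,
    "polynomial": lambda r: 1 / r**2,
    "exponential": lambda r: 1 / math.exp(r),
    "random": lambda r: 1.0,
}

def rank_probabilities(num_candidates, bias):
    """Probability bias(r) / sum_i bias(i) of picking the r-th ranked candidate."""
    weights = [bias(r) for r in range(1, num_candidates + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def biased_select(candidates, g, bias, rng):
    """Rank candidates by greedy value (smaller is better) and draw one."""
    ranked = sorted(candidates, key=g)
    return rng.choices(ranked, weights=rank_probabilities(len(ranked), bias), k=1)[0]

print(rank_probabilities(3, BIAS["linear"]))
```

Random bias gives every rank the same weight, which is why it permits purely random construction and hence asymptotic global convergence.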


Parallel GRASP 


Parallel implementation of GRASP is straightforward. 
Two general strategies have been proposed. In search 
space decomposition, the search space is partitioned 
into several regions and GRASP is applied to each in 
parallel. An example of this is the GRASP for maximum 
independent set [23,69] where the search space is de- 
composed by fixing two vertices to be in the indepen- 
dent set. In iteration parallelization, the GRASP itera- 
tions are partitioned and each partition is assigned to 
a processor. See [54,56,57,58,67] for examples of par- 
allel implementations of GRASP. Some care is needed 
so that different random number generator seeds are 
assigned to the different iterations. This can be done 
by running the random number generator through an 
entire cycle, recording all the seeds in a seed array. It-
eration i is started with seed(i). GRASP has been im- 
plemented on distributed architectures. In [58] a PVM- 
based implementation is described. Two MPI-based im- 
plementations are given in [4,50]. A.C.F. Alvim [4] 
proposes a general scheme for MPI implementations. 
A master process manages seeds for slave processors. It 
passes blocks of seeds to each slave processor and awaits 
the slaves to indicate that they have finished processing
the block and need another block. Slaves also pass back 
to the master the best solution found for each block of 
iterations. 
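A minimal sketch of the master/slave seed scheme, assuming a hypothetical `grasp_iteration(seed)` that stands in for one construction plus local search. Threads are used only to keep the example self-contained (the implementations cited above use PVM or MPI processes), and the seeds are drawn from a master generator rather than by cycling through the generator as described above.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def grasp_iteration(seed):
    """Hypothetical stand-in for one GRASP iteration; returns (value, solution)."""
    rng = random.Random(seed)
    x = rng.randint(0, 100)
    return (x - 42) ** 2, x

def run_block(seeds):
    """Slave: run one block of iterations and report its best (value, solution)."""
    return min(grasp_iteration(s) for s in seeds)

def parallel_grasp(maxitr, block_size, workers, master_seed=0):
    """Master: fix seed(i) for each iteration i and hand out blocks of seeds."""
    master = random.Random(master_seed)
    seeds = [master.randrange(2**32) for _ in range(maxitr)]
    blocks = [seeds[i:i + block_size] for i in range(0, maxitr, block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        block_bests = list(pool.map(run_block, blocks))
    return min(block_bests)      # the single global "best so far"

print(parallel_grasp(maxitr=40, block_size=8, workers=4))
```

Because iteration $i$ always uses the same seed, the result does not depend on the number of workers or on the block size.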


GRASP in Hybrid Metaheuristics 


GRASP has been used in hybrid metaheuristic schemes. 
Laguna and J.L. Gonzalez-Velarde [41] proposed 
a GRASP in which local search is done by tabu search. 
See also [16,46] for implementations of GRASP using 
tabu search as the local search procedure. Simulated an- 
nealing can also be used as a GRASP local search proce- 
dure if the initial temperature is low so that it remains 
near the neighborhood of the constructed solution. R.K. 
Ahuja, J.B. Orlin and A. Tiwari [3] use GRASP con- 
struction as a mechanism for generating the initial pop- 
ulation in a genetic algorithm. GRASP is used in [45] in 
a genetic algorithm to implement a type of crossover 
called perfect offspring. 


Applications of GRASP 


We now turn our attention to a number of GRASP 
implementations that have appeared in the literature, 
covering a wide range of applications. An early tuto- 
rial on GRASP appears in [22]. We group the work 
into two categories, applications to operations research 
problems and to industrial applications. 


Operations Research Problems 


Applications of GRASP to operations research prob- 
lems can be classified into eight categories: scheduling 
problems, routing problems, logic, partitioning prob- 
lems, location problems, graph theoretic problems, 
assignment problems, and nonconvex network flow 
problems. 

GRASP has been applied to several scheduling 
problems, including operations sequencing in discrete 
parts manufacturing [7], flight scheduling [18], just-in- 
time scheduling in parallel machines [41], printed wire 
assembly scheduling [9,19], single machine schedul- 
ing with sequence dependent setup costs and delay 
penalties [24], field technician scheduling [79], flow- 
shop with setup costs [76,77], and bus-driver schedul- 
ing [45]. 


Applications of GRASP to routing problems include 
vehicle routing with time windows [38], vehicle rout- 
ing [32], aircraft routing [5], inventory routing prob- 
lem with satellite facilities [10], and permanent virtual 
circuit (PVC) routing [66]. 

Problems in logic have been approached with 
GRASP. These include the satisfiability problem [68], 
maximum satisfiability [58,71,72], and inference of log- 
ical clauses from examples [15]. 

GRASP has been applied to partitioning problems, 
including graph two-partition [40] and number parti-
tioning [6]. 

Applications of GRASP to location problems in- 
clude p-hub location [36], pure integer capacitated 
plant location [14], location with economies of scale 
[33], single source capacitated plant location [16], lo- 
cation of concentrators in network access design [74], 
and maximum covering [67]. 

GRASP has been used for finding approximate 
solutions to a number of graph theoretic problems, 
including set covering [21], maximum independent 
set [23,69], maximum clique with weighted edges 
[48], graph planarization [73,75], 2-layer straight line 
crossing minimization [43], sparse graph coloring 
[42], maximum weighted edge subgraph [47], the 
Steiner tree problem in graphs [49,50], feedback ver- 
tex set in directed graphs [60], maximum clique [1,61], 
and the capacitated minimum spanning tree prob- 
lem [2]. 

Several assignment problems have been approached 
with GRASP. A GRASP was introduced for the 
quadratic assignment problem in [44]. A parallel ver- 
sion of this GRASP is described in [57]. Fortran subrou- 
tines for dense and sparse quadratic assignment prob- 
lems can be found respectively in [70] and [59]. A mod- 
ified local search for the GRASP for quadratic assign- 
ment problems is proposed in [65]. GRASP has been 
used to generate the initial population of a genetic algo- 
rithm for the quadratic assignment problem [3]. Long 
term memory schemes have been adapted to a GRASP 
for the quadratic assignment problem in [26]. A GRASP
for the biquadratic assignment problem is described 
in [51]. GRASP has been applied to two multidimen- 
sional assignment problems [53,78] and to the radio 
link frequency assignment problem [62]. A GRASP 
for the generalized assignment problem was proposed 
in [46]. 
GRASP has been used for finding approximate so-
lutions to a concave-cost network flow problem [34]. 


Industrial Applications 


Industrial applications of GRASP can be classi- 
fied into seven categories: manufacturing, transporta- 
tion, telecommunications, automatic drawing, electri- 
cal power systems, military, and biology. 

GRASP has been applied to several manufactur- 
ing problems, including operations sequencing in dis- 
crete parts manufacturing [7], cutting path and tool 
selection in computer-aided process planning [17], 
manufacturing equipment selection [8], component 
grouping [37], and printed wire assembly schedul- 
ing [9,19]. 

Applications of GRASP in transportation include 
flight scheduling and maintenance base planning [18], 
intermodal trailer assignment [20], and aircraft routing 
in response to groundings and delay [5]. 

In telecommunications, GRASP has been applied 
to the design of SDH mesh-restorable networks [63], 
the Steiner tree problem in graphs [49,50], permanent 
virtual circuit (PVC) routing [66], location of concen- 
trators in network access design [74], traffic schedul- 
ing in satellite switched time division multi-access 
(SS/TDMA) systems [64], location of points of pres- 
ence (PoPs) [67], and to the multicriteria radio link fre- 
quency assignment problem [62]. 

GRASP has been applied to automatic drawing 
problems, including seam drawing in mosaicing of 
aerial photographic maps [25], graph planarization 
[73,75], and 2-layer straight line crossing minimization 
[43]. 

GRASP has been applied to other industrial prob- 
lems. An application to electrical power systems is trans- 
mission expansion planning [12]. A military applica- 
tion of GRASP is in multitarget multisensor tracking 
[53]. GRASP has been applied in biology for protein 
structure prediction [39]. 


Conclusion 


We have surveyed the literature on greedy randomized 
adaptive search procedures (GRASP) in the 1990s. In 
these years many enhancements to the basic GRASP 
introduced in 1988 have been proposed. The number and variety of applications has grown and continues to
grow. 


See also 


> Feedback Set Problems 

> Generalized Assignment Problem 

> Graph Coloring 

> Graph Planarization 

> Heuristics for Maximum Clique and Independent 
Set 

> Maximum Satisfiability Problem 

> Quadratic Assignment Problem 

> Quadratic Semi-assignment Problem 
Polynomial equations (in several variables) arise in
many areas connected to management science. They 
could describe the feasible set of an optimization prob- 
lem, the Karush-Kuhn-Tucker conditions for the same 
problem, or maybe constraints on the positions of the 
links of a robot arm in a flexible manufacturing system. 
There are many analogies between polynomial 
equations and their special case, linear equations. 

• One might want to solve the equations, i.e. find one or all solutions, determine whether a solution is unique, or determine whether the system is inconsistent.

• One might want to answer more abstract questions, such as whether a given equation is a consequence of a given set of equations (cf. » Farkas lemma; » Farkas lemma: Generalizations).


For linear equations a fundamental concept is that of 
a (linear) basis and the fundamental tool is that of Gaus- 
sian elimination, by which one can construct a basis 
from a given set of vectors. Similarly, for polynomials there are the corresponding concepts of a Gröbner basis and the Buchberger algorithm, which for a given set of polynomials constructs a Gröbner basis. In particular one can convert a system of polynomial equations to triangular form, which allows for a solution by back substitution. In Gaussian elimination, the variables/columns have an ordering that influences the end result. Similarly, for Gröbner bases we need an order, not only for the variables, but for monomials, i.e. the simplest possible polynomials, such as $x_1^2 x_4$, that are products of variables. In this short note we will review Gröbner bases for polynomial equations.

Before defining a Gröbner basis we will give an ex-
ample. 


Example 1 Suppose we want to find the local optima of 
the following optimization problem ([4, Problem 337]; 
also used in [3]), by solving the KKT-conditions: 


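$$
(P) \qquad
\begin{array}{ll}
\min & f(x) = 9x_1^2 + x_2^2 + 9x_3^2 \\
\text{s.t.} & g_1(x) = 1 - x_1 x_2 \le 0 \\
& g_2(x) = 1 - x_2 \le 0 \\
& g_3(x) = x_3 - 1 \le 0.
\end{array}
$$

The KKT conditions for (P) are:

$$
(KKT) \qquad
\begin{aligned}
18x_1 - \lambda_1 x_2 &= 0 \\
2x_2 - \lambda_1 x_1 - \lambda_2 &= 0 \\
18x_3 + \lambda_3 &= 0 \\
\lambda_1 (1 - x_1 x_2) &= 0 \\
\lambda_2 (1 - x_2) &= 0 \\
\lambda_3 (x_3 - 1) &= 0.
\end{aligned}
$$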


Further suppose we use a lexicographical order of the monomials such that $x_1 > x_2 > x_3 > \lambda_1 > \lambda_2 > \lambda_3$. Then, computing the Gröbner basis for the set of polynomials in the above system and forming the corresponding



Grobner Bases for Polynomial Equations 


equation system, we get 


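$$
\begin{aligned}
18x_1 - x_2\lambda_1 &= 0 \\
x_2\lambda_1^2 - 36x_2 + 18\lambda_2 &= 0 \\
x_2\lambda_2 - \lambda_2 &= 0 \\
2x_2^2 - \lambda_1 - \lambda_2 &= 0 \\
18x_3 + \lambda_3 &= 0 \\
\lambda_1^3 - 36\lambda_1 - 18\lambda_2^2 + 36\lambda_2 &= 0 \\
\lambda_1\lambda_2 + \lambda_2^2 - 2\lambda_2 &= 0 \\
\lambda_2^3 + 14\lambda_2^2 - 32\lambda_2 &= 0 \\
\lambda_3^2 + 18\lambda_3 &= 0
\end{aligned}
$$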


This system has an obvious triangular structure, which we have tried to display graphically. The last equation contains only $\lambda_3$. Then come equations in $\lambda_2$ (and possibly $\lambda_3$), and so on. In a similar way as in Gaussian elimination, the system can thus be solved by back substitution. In each step, one then has to solve a single-variable polynomial equation, possibly giving several solutions, each of which is substituted into the preceding equations. Thus the solution process evolves in a tree-like structure. It might happen that one has to solve for a variable that is already computed. Then of course the solutions have to agree, else they are discarded.
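The back substitution for Example 1 can be sketched in plain Python. This is a hand-written solver for this particular basis, not a general method: each step solves one univariate equation from the list above (using `cmath` so that complex roots are kept), branches over its roots, and finally discards any branch that violates one of the nine equations, mirroring the rule that values computed twice have to agree.

```python
import cmath

TOL = 1e-9

def groebner_residuals(x1, x2, x3, l1, l2, l3):
    """Values of the nine Groebner basis polynomials of Example 1."""
    return [
        18 * x1 - x2 * l1,
        x2 * l1**2 - 36 * x2 + 18 * l2,
        x2 * l2 - l2,
        2 * x2**2 - l1 - l2,
        18 * x3 + l3,
        l1**3 - 36 * l1 - 18 * l2**2 + 36 * l2,
        l1 * l2 + l2**2 - 2 * l2,
        l2**3 + 14 * l2**2 - 32 * l2,
        l3**2 + 18 * l3,
    ]

def quadratic_roots(a, b, c):
    """Both (possibly complex) roots of a*t^2 + b*t + c."""
    d = cmath.sqrt(b * b - 4 * a * c)
    return [(-b + d) / (2 * a), (-b - d) / (2 * a)]

def signed_sqrts(v):
    r = cmath.sqrt(v)
    return [r] if abs(r) < TOL else [r, -r]

def back_substitute():
    solutions = []
    for l3 in quadratic_roots(1, 18, 0):                  # l3^2 + 18 l3 = 0
        x3 = -l3 / 18                                     # 18 x3 + l3 = 0
        for l2 in [0] + quadratic_roots(1, 14, -32):      # l2 (l2^2 + 14 l2 - 32) = 0
            if abs(l2) > TOL:
                l1_branch = [2 - l2]                      # (l1 l2 + l2^2 - 2 l2) / l2 = 0
            else:
                l1_branch = [0] + quadratic_roots(1, 0, -36)  # l1^3 - 36 l1 = 0
            for l1 in l1_branch:
                for x2 in signed_sqrts((l1 + l2) / 2):    # 2 x2^2 - l1 - l2 = 0
                    x1 = x2 * l1 / 18                     # 18 x1 - x2 l1 = 0
                    cand = (x1, x2, x3, l1, l2, l3)
                    # Discard branches violating an equation not used to derive them.
                    if all(abs(r) < TOL for r in groebner_residuals(*cand)):
                        solutions.append(cand)
    return solutions

sols = back_substitute()
real = [s for s in sols if all(abs(complex(v).imag) < TOL for v in s)]
print(len(sols), len(real))  # 14 solutions, 10 of them real
```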


The above type of structure will always occur if there are finitely many solutions. It might happen, though, that the system allows a manifold of solutions. In this case it might, e.g., happen that the last equation contains two variables, or that in the back substitution process one arrives at an equation with two (or more) undetermined variables. These equations then give a parametrization of the manifold.


What is a Gröbner Basis


In Gaussian elimination the variables are ordered and 
the basic reduction rule is to replace the equations $f = 0$, $g = 0$ by $f = 0$, $g - cf = 0$, where the constant $c$ is chosen so that the leading terms in $g$ and $cf$ coincide.

In systems of polynomial equations we do something quite similar. First we extend the ordering of the variables to a total ordering of all monomials in such a way that $m' < m'' \Rightarrow mm' < mm''$ for all monomials $m$, $m'$ and $m''$, and so that $1$ is the least one.


The basic reduction rule is now to replace the equations $f = 0$, $g = 0$ by $f = 0$, $g - cmf = 0$, where the constant $c$ and the monomial $m$ are chosen so that the leading terms of $g$ and $cmf$ coincide. This implies that $h = g - cmf$ is ‘smaller’ than $g$ in the ordering. If such a reduction of $g$ with $f$ is possible and $h = g - cmf$, we will write $g \to_f h$.
Definition 2 A finite set $G$ of polynomials is a Gröbner basis if for every polynomial $q$ there exists a unique $r$ such that there is a finite reduction chain $q \to_{g_1} q_1 \to_{g_2} \cdots \to_{g_k} q_k = r$ for some $g_1, \dots, g_k$ in $G$ and such that $r$ cannot be reduced further. The unique polynomial $r$ is called the normal form of $q$ modulo $G$.


Given a finite set of vectors, we can use Gaussian elimination to compute a basis of vectors spanning the same linear space. Given a finite set $P$ of polynomials (and an admissible monomial ordering), one can use the Buchberger algorithm to compute a Gröbner basis $G$ spanning the same 'space' of polynomials as $P$. (By the space of polynomials spanned by $P$ is meant the ideal generated by $P$, i.e. the set of finite linear combinations $q_1p_1 + \cdots + q_sp_s$, where the $p_i$ are in $P$ and the $q_i$ are arbitrary polynomials.) We say that $G$ is a Gröbner basis for $P$. Moreover, the common zeros of $P$ are the same as those of $G$.
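The text only names the Buchberger algorithm, so here is a compact textbook version in Python. It is a sketch, not an efficient implementation. It uses exact rational arithmetic, the lexicographic order, and no pair-selection criteria. Its normal form reduces every term of the polynomial, not only the leading one, which is the usual convention that makes the remainder unique. The generators $xy - 1$ and $y^2 - 1$ are a hypothetical example. With the resulting basis, ideal membership can be decided by checking whether the normal form is zero.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, List, Tuple

Monomial = Tuple[int, ...]
Poly = Dict[Monomial, Fraction]


def lead(p: Poly) -> Tuple[Monomial, Fraction]:
    m = max(p)                          # lexicographic order on exponent tuples
    return m, p[m]


def divides(a: Monomial, b: Monomial) -> bool:
    return all(x <= y for x, y in zip(a, b))


def sub_scaled(p: Poly, c: Fraction, m: Monomial, q: Poly) -> Poly:
    """Return p - c * x^m * q, dropping zero coefficients."""
    r = dict(p)
    for e, a in q.items():
        key = tuple(x + y for x, y in zip(m, e))
        r[key] = r.get(key, Fraction(0)) - c * a
        if r[key] == 0:
            del r[key]
    return r


def normal_form(p: Poly, G: List[Poly]) -> Poly:
    """Reduce every term of p as far as possible modulo the polynomials in G."""
    p = {e: Fraction(c) for e, c in p.items() if c}
    rem: Poly = {}
    while p:
        m, c = lead(p)
        for g in G:
            gm, gc = lead(g)
            if divides(gm, m):
                p = sub_scaled(p, c / gc, tuple(x - y for x, y in zip(m, gm)), g)
                break
        else:                           # leading term irreducible: move it to the remainder
            rem[m] = c
            del p[m]
    return rem


def s_poly(f: Poly, g: Poly) -> Poly:
    (fm, fc), (gm, gc) = lead(f), lead(g)
    lcm = tuple(max(x, y) for x, y in zip(fm, gm))
    s = sub_scaled({}, -1 / fc, tuple(x - y for x, y in zip(lcm, fm)), f)
    return sub_scaled(s, 1 / gc, tuple(x - y for x, y in zip(lcm, gm)), g)


def buchberger(F: List[Poly]) -> List[Poly]:
    G = [{e: Fraction(c) for e, c in f.items() if c} for f in F]
    G = [g for g in G if g]
    pairs = list(combinations(range(len(G)), 2))
    while pairs:
        i, j = pairs.pop()
        r = normal_form(s_poly(G[i], G[j]), G)
        if r:                           # nonzero remainder: enlarge the basis
            G.append(r)
            pairs.extend((k, len(G) - 1) for k in range(len(G) - 1))
    return G


# Variables (x, y), x > y:  P = {x*y - 1, y^2 - 1}
F = [{(1, 1): 1, (0, 0): -1}, {(0, 2): 1, (0, 0): -1}]
G = buchberger(F)
print(normal_form({(2, 0): 1, (0, 0): -1}, G))  # {}  (x^2 - 1 lies in the ideal)
print(normal_form({(1, 0): 1}, G))              # {(0, 1): Fraction(1, 1)}  (x reduces to y)
```

Here the algorithm adds the single polynomial $x - y$. The common zeros are visibly unchanged: $y = \pm 1$ and $x = y$.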


What are Gröbner Bases good for


Roughly speaking, all questions concerning a system of polynomial equations $f_1 = \cdots = f_s = 0$ can be answered if we have a corresponding Gröbner basis. Here we list just a few of them.

• Is the system solvable?

• If the system is solvable, how many solutions are there, and which are they?

• How many real solutions are there (in case the coefficients are real)? Here we can also allow for inequalities.

• Is it possible to eliminate some of the variables?

• Given some polynomial $f$, does $f$ vanish whenever $f_1, \dots, f_s$ do? This can be used for automated proofs in geometry.

• Given some polynomial $f$, do there exist polynomials $q_1, \dots, q_s$ such that $f = q_1f_1 + \cdots + q_sf_s$?

• Is it possible to describe the algebraic relations between the $f_i$, i.e. the set of polynomials $q$ in $s$ variables such that $q(f_1, \dots, f_s)$ is the zero polynomial?

• Can a given polynomial $f$ be written as $f = q(f_1, \dots, f_s)$ for some polynomial $q$ in $s$ variables, and in case it can, is it possible to compute $q$?

• Can we compute a vector space basis for the vector space of polynomials modulo $f_1, \dots, f_s$?


Using Gröbner Bases
and Learning More About Them


Essentially all major mathematical computer packages with symbolic capabilities contain modules for Gröbner bases; the main examples are Maple and Mathematica. For a short but more detailed introduction to Gröbner bases, see [3]. The book [2] gives a rather short introduction to the field. One standard textbook is [1].


See also 


> Contraction-mapping
> Fundamental Theorem of Algebra
> Global Optimization Methods for Systems of Nonlinear Equations
> Interval Analysis: Systems of Nonlinear Equations
Even though dynamic programming [2] was originally developed for the solution of problems which exhibit discrete types of decisions, it has also been applied to continuous formulations. In this article, the application of dynamic programming to the solution of continuous-time optimal control problems is discussed. By discretizing the problem, applying the dynamic programming equations, and then returning to the continuous domain, one obtains a partial differential equation, the Hamilton-Jacobi-Bellman equation (HJB equation). This equation is often referred to as the continuous-time equivalent of the dynamic programming algorithm. In this article, the HJB equation will first be derived. A simple application will be presented, in addition to its use in solving the linear quadratic control problem. Finally, a brief overview of some solution methods and applications presented in the literature will be given.


Problem Formulation 


The dynamic programming approach will be applied to a system of the following form:

$$\dot z(t) = f(z(t), u(t)), \quad z(0) = z_0, \quad 0 \le t \le T, \tag{1}$$

where $z(t) \in \mathbb{R}^n$ is the state vector at time $t$ with time derivative $\dot z(t)$, $u(t) \in U \subset \mathbb{R}^m$ is the control vector at time $t$, $U$ is the set of control constraints, and $T$ is the terminal time. The function $f(z(t), u(t))$ is continuously differentiable with respect to $z$ and continuous with respect to $u$. The set of admissible control trajectories is given by the piecewise constant functions $\{u(t) : u(t) \in U,\ \forall t \in [0, T]\}$. It is assumed that for any admissible control trajectory a state trajectory $z^u(t)$ exists and is unique.

The objective is to determine a control trajectory and the corresponding state trajectory which minimize a cost function of the form

$$h(z^u(T)) + \int_0^T g(z(t), u(t))\,dt, \tag{2}$$

where the functions $g$ and $h$ are continuously differentiable with respect to both $z$ and $u$.


Derivation 


The derivation of the Hamilton-Jacobi-Bellman equation is taken from [3]. The time horizon is first discretized into $N$ equally spaced intervals with

$$\delta = \frac{T}{N}.$$

Also, the state and control are represented by

$$z_k = z(k\delta), \quad u_k = u(k\delta), \quad k = 0, \dots, N.$$

The continuous-time system is approximated by

$$z_{k+1} = z_k + f(z_k, u_k)\,\delta.$$

The cost function is rewritten as

$$h(z_N) + \sum_{k=0}^{N-1} g(z_k, u_k)\,\delta.$$


The dynamic programming algorithm is now applied with the following definitions:

• $J^*(t, z)$ is the optimal cost-to-go for the continuous problem;

• $\tilde J^*(t, z)$ is the optimal cost-to-go for the discrete approximation.

The dynamic programming equations then take the form

$$\tilde J^*(N\delta, z) = h(z), \tag{3}$$

$$\tilde J^*(k\delta, z) = \min_{u \in U}\Big[g(z, u)\,\delta + \tilde J^*\big((k+1)\delta,\ z + f(z, u)\,\delta\big)\Big], \quad k = 0, \dots, N-1. \tag{4}$$
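The recursion (3)-(4) can be carried out directly on a grid. The Python sketch below is an illustration only. It assumes a scalar state and a uniform state grid with linear interpolation (and constant extrapolation at the grid ends). It replaces the minimization over $U$ by a finite set of candidate controls, and it uses the fact that $f$ and $g$ do not depend on time. The sketch is applied to the test problem $\dot z = u$, $u \in [-1, 1]$, with $g = 0$ and terminal cost $h(z) = z^2/2$. For this problem the exact optimal cost-to-go is $\frac12(\max\{0, |z| - (T - t)\})^2$.

```python
import math
from typing import Callable, List, Sequence, Tuple


def interp(zs: Sequence[float], vals: Sequence[float], z: float) -> float:
    """Linear interpolation on a uniform grid; constant extrapolation beyond the ends."""
    dz = (zs[-1] - zs[0]) / (len(zs) - 1)
    x = (z - zs[0]) / dz
    k = min(max(int(math.floor(x)), 0), len(zs) - 2)
    w = min(max(x - k, 0.0), 1.0)
    return (1.0 - w) * vals[k] + w * vals[k + 1]


def discrete_cost_to_go(f: Callable[[float, float], float],
                        g: Callable[[float, float], float],
                        h: Callable[[float], float],
                        controls: Sequence[float],
                        T: float, N: int,
                        z_min: float, z_max: float, M: int) -> Tuple[List[float], List[float]]:
    """Backward recursion (3)-(4) for time-invariant f, g; returns the grid and J~*(0, z)."""
    delta = T / N
    zs = [z_min + i * (z_max - z_min) / (M - 1) for i in range(M)]
    J = [h(z) for z in zs]                              # (3): J~*(N delta, z) = h(z)
    for _ in range(N):                                  # k = N-1, ..., 0
        J = [min(g(z, u) * delta + interp(zs, J, z + f(z, u) * delta)
                 for u in controls)                     # (4) over the finite control set
             for z in zs]
    return zs, J


T, N = 1.0, 10
zs, J0 = discrete_cost_to_go(
    f=lambda z, u: u, g=lambda z, u: 0.0, h=lambda z: 0.5 * z * z,
    controls=(-1.0, 0.0, 1.0), T=T, N=N, z_min=-3.0, z_max=3.0, M=61)

exact = lambda t, z: 0.5 * max(0.0, abs(z) - (T - t)) ** 2
for z in (0.5, 2.0, -2.5):
    print(z, interp(zs, J0, z), exact(0.0, z))
```

The grid spacing $0.1$ is chosen equal to $\delta$, so $z \pm \delta$ lands on grid points and the discrete values agree with the exact cost-to-go at grid points. With other spacings the interpolation introduces an additional discretization error.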


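It is assumed that $\tilde J^*(t, z)$ has the differentiability properties needed to write the following Taylor series expansion:

$$\tilde J^*\big((k+1)\delta,\ z + f(z, u)\delta\big) = \tilde J^*(k\delta, z) + \nabla_t \tilde J^*(k\delta, z)\,\delta + \nabla_z \tilde J^*(k\delta, z)^{\mathsf T} f(z, u)\,\delta + o(\delta), \tag{5}$$

where $o(\delta)$ represents second-order terms which satisfy $o(\delta)/\delta \to 0$ as $\delta \to 0$. Substituting (5) into (4) results in

$$\tilde J^*(k\delta, z) = \min_{u \in U}\Big[g(z, u)\,\delta + \tilde J^*(k\delta, z) + \nabla_t \tilde J^*(k\delta, z)\,\delta + \nabla_z \tilde J^*(k\delta, z)^{\mathsf T} f(z, u)\,\delta + o(\delta)\Big]. \tag{6}$$

Cancelling $\tilde J^*(k\delta, z)$, which does not depend on $u$, from both sides of (6), dividing by $\delta$, and taking the limit as $\delta \to 0$ with the assumption that

$$\lim_{\substack{k \to \infty,\ \delta \to 0 \\ k\delta = t}} \tilde J^*(k\delta, z) = J^*(t, z)$$

results in

$$0 = \min_{u \in U}\Big[g(z, u) + \nabla_t J^*(t, z) + \nabla_z J^*(t, z)^{\mathsf T} f(z, u)\Big], \quad \forall\, t, z, \tag{7}$$

with the boundary condition

$$J^*(T, z) = h(z).$$

This partial differential equation is known as the Hamilton-Jacobi-Bellman equation (HJB equation).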


Sufficiency Theorem 


This theorem is presented in [3]. Suppose $V(t, z)$ is a solution to the HJB equation, that is, $V$ is continuously differentiable with respect to $z$ and $t$ and satisfies

$$0 = \min_{u \in U}\Big[g(z, u) + \nabla_t V(t, z) + \nabla_z V(t, z)^{\mathsf T} f(z, u)\Big], \quad \forall\, t, z, \tag{8}$$

$$V(T, z) = h(z), \quad \forall\, z. \tag{9}$$

Suppose also that $\mu^*(t, z)$ attains the minimum in (8) for all $t$ and $z$. Let $z^*(t)$ be the state trajectory obtained from the given initial condition $z(0)$ when the control trajectory $u^*(t) = \mu^*(t, z^*(t))$ is used. (That is, $z^*(0) = z(0)$ and $\dot z^*(t) = f(z^*(t), \mu^*(t, z^*(t)))$; one also assumes that this differential equation has a unique solution starting at any pair $(t, z)$ and that the control trajectory is piecewise continuous in time.) Then $V$ is the unique solution of the HJB equation and is equal to the optimal cost-to-go function:

$$V(t, z) = J^*(t, z), \quad \forall\, t, z.$$

Furthermore, the control trajectory $u^*(t)$ is optimal for all $t \in [0, T]$.

Example 


Consider the simple dynamic system

$$\dot z(t) = u(t),$$

with the control bounded by $u(t) \in [-1, 1]$ and time ranging over $t \in [0, T]$. The cost function is given as

$$\frac{1}{2}\, z(T)^2.$$

Writing the HJB equation for this system gives

$$0 = \min_{u \in [-1, 1]}\big[\nabla_t V(t, z) + \nabla_z V(t, z)\,u\big], \quad \forall\, t, z,$$

with the boundary condition

$$V(T, z) = \frac{1}{2}\, z^2.$$


The obvious choice of a control policy is to drive the state to zero as fast as possible and keep it there. This corresponds to the policy

$$\mu^*(t, z) = -\operatorname{sgn}(z) = \begin{cases} 1 & \text{if } z < 0, \\ 0 & \text{if } z = 0, \\ -1 & \text{if } z > 0. \end{cases}$$

The cost associated with this policy for a given initial time and state is

$$J^*(t, z) = \frac{1}{2}\big(\max\{0,\ |z| - (T - t)\}\big)^2.$$

This function satisfies the terminal condition $J^*(T, z) = z^2/2$. Also,

$$\nabla_t J^*(t, z) = \max\{0,\ |z| - (T - t)\}, \qquad \nabla_z J^*(t, z) = \operatorname{sgn}(z)\max\{0,\ |z| - (T - t)\}.$$


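Substituting these expressions into the HJB equation results in

$$0 = \min_{u \in [-1, 1]}\big[1 + \operatorname{sgn}(z)\,u\big]\max\{0,\ |z| - (T - t)\},$$

which holds for all $(t, z)$. For $z \neq 0$ the bracket is nonnegative on $[-1, 1]$ and vanishes at $u = -\operatorname{sgn}(z)$. For $z = 0$ the factor $\max\{0, |z| - (T - t)\}$ is zero. The minimum is therefore attained for $u = -\operatorname{sgn}(z)$, and one concludes from the sufficiency theorem presented above that $J^*(t, z)$ is indeed the optimal cost-to-go function.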
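The claim that $J^*$ satisfies the HJB equation can also be checked numerically. The Python sketch below uses an arbitrarily chosen horizon $T = 2$ and a sample grid of $(t, z)$ points. It evaluates the bracket in the HJB equation with the partial derivatives given above. Since the bracket is affine in $u$, its minimum over $[-1, 1]$ is attained at an endpoint, so only $u = \pm 1$ need to be tried. Away from the kink $|z| = T - t$, the sketch also confirms the derivative formulas by central differences.

```python
def sgn(z: float) -> int:
    return (z > 0) - (z < 0)


T = 2.0


def J(t: float, z: float) -> float:
    return 0.5 * max(0.0, abs(z) - (T - t)) ** 2


def grad(t: float, z: float):
    m = max(0.0, abs(z) - (T - t))
    return m, sgn(z) * m                 # (dJ/dt, dJ/dz) as derived in the text


def hjb_residual(t: float, z: float) -> float:
    Jt, Jz = grad(t, z)
    # g = 0 and f(z, u) = u: the bracket Jt + Jz*u is affine in u,
    # so its minimum over [-1, 1] is attained at u = -1 or u = +1.
    return min(Jt + Jz * u for u in (-1.0, 1.0))


worst_residual, worst_fd_error = 0.0, 0.0
eps = 1e-6
for i in range(21):
    for k in range(41):
        t, z = i * T / 20, -4.0 + k * 0.2
        worst_residual = max(worst_residual, abs(hjb_residual(t, z)))
        if abs(abs(z) - (T - t)) > 1e-3:  # stay away from the kink |z| = T - t
            Jt, Jz = grad(t, z)
            fd_t = (J(t + eps, z) - J(t - eps, z)) / (2 * eps)
            fd_z = (J(t, z + eps) - J(t, z - eps)) / (2 * eps)
            worst_fd_error = max(worst_fd_error, abs(fd_t - Jt), abs(fd_z - Jz))

print(worst_residual, worst_fd_error < 1e-6)  # 0.0 True
```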


Linear-Quadratic Problem 


Consider a general $n$-dimensional time-invariant linear system

$$\dot z(t) = Az(t) + Bu(t)$$

with a cost function defined by

$$z(T)^{\mathsf T} Q_T\, z(T) + \int_0^T \big(z(t)^{\mathsf T} Q\, z(t) + u(t)^{\mathsf T} R\, u(t)\big)\,dt,$$

where the matrices $Q$ and $Q_T$ are symmetric positive semidefinite, and the matrix $R$ is symmetric positive definite. The HJB equation is written as

$$0 = \min_{u \in \mathbb{R}^m}\Big[z^{\mathsf T} Q z + u^{\mathsf T} R u + \nabla_t V(t, z) + \nabla_z V(t, z)^{\mathsf T}(Az + Bu)\Big], \qquad V(T, z) = z^{\mathsf T} Q_T z. \tag{10}$$
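Try a solution of the form

$$V(t, z) = z^{\mathsf T} K(t)\, z,$$

where $K(t)$ is a symmetric $n \times n$ matrix. One then has

$$\nabla_z V(t, z) = 2K(t)z, \qquad \nabla_t V(t, z) = z^{\mathsf T} \dot K(t)\, z.$$

Substituting the above expressions into (10) results in

$$0 = \min_{u \in \mathbb{R}^m}\Big[z^{\mathsf T} Q z + u^{\mathsf T} R u + z^{\mathsf T} \dot K(t) z + 2z^{\mathsf T} K(t) A z + 2z^{\mathsf T} K(t) B u\Big]. \tag{11}$$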


The minimum is obtained when the gradient with respect to $u$ is zero. This results in

$$2B^{\mathsf T} K(t)\, z + 2Ru = 0 \quad\text{or}\quad u = -R^{-1}B^{\mathsf T} K(t)\, z.$$

Substituting this expression into (11), the following results:

$$0 = z^{\mathsf T}\big(\dot K(t) + K(t)A + A^{\mathsf T} K(t) - K(t)BR^{-1}B^{\mathsf T} K(t) + Q\big)\, z.$$

Therefore, $K(t)$ must satisfy the following matrix differential equation:

$$\dot K(t) = -K(t)A - A^{\mathsf T} K(t) + K(t)BR^{-1}B^{\mathsf T} K(t) - Q,$$

with the terminal condition

$$K(T) = Q_T.$$

This equation is known as the continuous-time Riccati equation.
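The Riccati equation can be integrated backward in time numerically. The Python sketch below treats only the scalar case $n = m = 1$, so that $A, B, Q, R, Q_T$ are numbers $a, b, q, r, q_T$. It integrates in the reversed time $s = T - t$ with the classical fourth-order Runge-Kutta scheme, and the test data $a = b = q = r = 1$, $q_T = 0$ are chosen arbitrarily. For a long horizon, $K(0)$ should approach the positive root of the algebraic equation $2aK - (b^2/r)K^2 + q = 0$, which here is $1 + \sqrt 2$.

```python
import math


def riccati_scalar(a: float, b: float, q: float, r: float, qT: float,
                   T: float, steps: int = 10_000) -> float:
    """Integrate dK/dt = -2aK + (b^2/r)K^2 - q backward from K(T) = qT; return K(0)."""
    def rhs(K: float) -> float:          # dK/ds with s = T - t, i.e. minus dK/dt
        return 2 * a * K - (b * b / r) * K * K + q

    h = T / steps
    K = qT
    for _ in range(steps):               # classical RK4 step in s
        k1 = rhs(K)
        k2 = rhs(K + 0.5 * h * k1)
        k3 = rhs(K + 0.5 * h * k2)
        k4 = rhs(K + h * k3)
        K += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return K


a = b = q = r = 1.0
K0 = riccati_scalar(a, b, q, r, qT=0.0, T=10.0)
K_ss = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
gain = b * K0 / r                        # feedback u = -R^{-1} B^T K z = -gain * z
print(K0, K_ss, gain)
```

In the matrix case the same backward integration applies entrywise to the symmetric matrix $K(t)$. The feedback $u = -R^{-1}B^{\mathsf T}K(t)z$ is then time-varying, and it approaches a constant gain when the horizon is long.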


Solution Methods and Applications 


In the general case of a nonlinear system, the solution cannot be determined analytically, and one has to rely on numerical methods. The numerical solution of the Hamilton-Jacobi-Bellman equation is not trivial owing to its partial differential nature. Nevertheless, the HJB equation and accompanying numerical methods have been used to solve a wide variety of problems.

See [4] for many applications in the area of optimal control, and for an approach that advocates solution by the method of characteristics. This classical technique for the solution of partial differential equations can be found in many textbooks. See [6] for remarks about the application of the HJB equation to minimum-time optimal control problems. See [1] for an approximate method for the solution of the time-invariant HJB equation. The method consists of a reduction to a set of linear partial differential equations and an approximation via the Galerkin spectral method; [1] also presents an extensive review of various approximation approaches and an application to the voltage regulation of a power generator. See [7] for an alternating direction algorithm for the solution of HJB equations. See [5] for an application to the optimal path timing of robot manipulators and for the approximate solution of the resulting HJB equation using finite difference methods.

The aforementioned references cover only a subset of the solution methods for, and applications of, the Hamilton-Jacobi-Bellman equation.
Variational expressions, also called for historical reasons variational principles, play a significant role in mechanics. They have their origin in the study of problems of analytical mechanics, which were extensively studied in previous centuries, at a time when scientists commonly worked across disciplines. Today, variational principles provide the basis for a correct and efficient modeling of a variety of physical phenomena; for instance, they provide the theoretical basis of the finite element method [19].


Variational equalities are the most commonly met form of variational expressions. Having in mind problems which can be obtained from the minimization of a smooth (i.e., sufficiently differentiable) potential energy function, one may consider the variation of this function at a given point. A necessary condition for this function to attain a critical point is that every variation of the function in the neighborhood of this point is equal to zero. Thus, one formulates a variational equality problem. In mechanics, the differential of a potential energy function has the physical meaning of (stored or consumed) work. Let us consider a problem in elastostatics. In a formulation based on displacements, all variations of the system's variables around a sought point are called virtual displacements. For obvious reasons the variational equality is called in this case the principle of virtual work: for small virtual displacements around the equilibrium, the virtual work of the system is equal to zero. Analogously, one arrives at the principle of complementary virtual work, or at mixed variational principles (the latter being derived from saddle point theorems). At this point it should be mentioned that a variational formulation may also be written for certain classes of problems which do not possess a potential.

The introduction of inequality constraints in the studied problem, or the assumption of nondifferentiable (nonsmooth) potential energy functions, leads to variational inequalities or more complicated variational problems. Intuitively speaking, either not all virtual variations of the problem variables around a given point are permitted (the case of inequality constraints, for instance, unilateral contact constraints), or a linear approximation of the potential energy function is no longer sufficient (the case of nondifferentiable or nonsmooth energy). Convex problems have certain theoretical and numerical advantages. They are connected with monotone operators. This is the case, e.g., of small-displacement and small-deformation elastostatics with monotone material laws or interface and boundary conditions. These problems lead to variational inequalities and, in some cases, to convex (possibly nonsmooth) energy minimization problems (convex superpotentials in the sense of J.-J. Moreau [10]). The techniques of convex analysis and minimization can be used for their effective solution. Unilateral contact problems [10,15,17] and problems of elastoplasticity [7,17] have been studied within this framework.

Hemivariational inequalities are connected with 
nonconvex and possibly nonsmooth energy functions. 
In elastostatics, convexity is usually lost if the effects 
of large displacements or deformations are considered. 
Moreover, falling branches in material, interface or 
boundary laws lead to nonconvex potentials. The latter 
laws may be of a phenomenological nature and may be 
used for modeling of delamination and strength degra- 
dation effects, fracture, etc. Several methods have been 
developed for the study of nonconvex problems. The 
notion of the generalized gradient in the sense of F.H. 
Clarke has been used by P.D. Panagiotopoulos for the 
construction of hemivariational inequalities [16,17,18]. 
Following the example of nonsmooth analysis, he called 
this new field nonsmooth mechanics. A short introduc- 
tion to this theory and its applications in mechanics 
is outlined in this article. The interested reader may 
also consult » Nonconvex energy functions: Hemivari- 
ational inequalities and the monographs [14,18]. 

One should mention that the study of hemivariational inequalities provides an interesting field for mathematicians and engineers alike. For engineers, several types of hemivariational inequalities have been used for the study and the efficient numerical treatment of as yet unsolved or partially solved problems, e.g., in nonmonotone semipermeability problems, in modeling of delamination of simple and multilayered plates, in the theory of composite structures and adhesive joints, etc. Several of these concrete practical applications cannot be treated by more naive engineering methods that lack mathematical justification. Furthermore, the potential of this research field can be estimated if one considers that nonconvex energy functions are connected with instabilities, complex dynamics, fractals and chaos. Certainly, a lot of work remains to be done in this area.


Abstract Hemivariational Inequality 


The derivation of hemivariational inequalities is based on the mathematical notion of the generalized gradient of Clarke (denoted here by $\partial$). In contrast to variational inequalities, hemivariational inequalities are not equivalent to minimum problems, but they give rise to substationarity problems. A hemivariational inequality problem reads: find $u \in V$ such that

$$a(u, v - u) + \int_\Omega j^\circ(u, v - u)\,d\Omega \ge (l, v - u), \quad \forall\, v \in V. \tag{1}$$
In the abstract form used here, let $V$ be a real Hilbert space and $V'$ its dual space, such that $V \subset L^2(\Omega) \subset V'$ with continuous and dense injections. The problem is defined in $\Omega$, which is an open bounded subset of $\mathbb{R}^n$. Furthermore, let $(\cdot, \cdot)$ be the $L^2(\Omega)$ product and the duality pairing, $\|\cdot\|$ the norm of $V$ and $|\cdot|$ the $L^2(\Omega)$-norm. Note that $(\cdot, \cdot)$ extends uniquely from $V \times L^2(\Omega)$ to $V \times V'$. Further, let the injection $V \subset L^2(\Omega)$ be compact, and let $V \cap L^\infty(\Omega)$ be dense in $V$ for the $V$-norm and have a Galerkin base. The bilinear form $a(\cdot, \cdot): V \times V \to \mathbb{R}$ is symmetric, continuous and coercive, i.e. there exists a constant $c > 0$ such that

$$a(v, v) \ge c\,\|v\|^2, \quad \forall\, v \in V. \tag{2}$$


Moreover, $j: \mathbb{R} \to \mathbb{R}$ denotes a locally Lipschitz function which is defined by the following procedure: let $\beta \in L^\infty_{\mathrm{loc}}(\mathbb{R})$ and consider

$$\bar\beta_\mu(\xi) = \operatorname*{ess\,sup}_{|\xi_1 - \xi| \le \mu} \beta(\xi_1) \tag{3}$$

and

$$\underline\beta_\mu(\xi) = \operatorname*{ess\,inf}_{|\xi_1 - \xi| \le \mu} \beta(\xi_1). \tag{4}$$

They are increasing and decreasing functions of $\mu$, respectively, and thus the limits for $\mu \to 0^+$ exist. We denote them by $\bar\beta(\xi)$ and $\underline\beta(\xi)$, respectively, and we define the multivalued function

$$\hat\beta(\xi) = \big[\underline\beta(\xi),\ \bar\beta(\xi)\big]. \tag{5}$$

If $\beta(\xi \pm 0)$ exists for every $\xi \in \mathbb{R}$, then a locally Lipschitz function $j: \mathbb{R} \to \mathbb{R}$ can be determined (up to an additive constant) such that $\hat\beta(\xi) = \partial j(\xi)$. Finally, in relation (1), $j^\circ(u, v - u)$ denotes the generalized directional derivative of the nonconvex and nonsmooth locally Lipschitz potential $j$. By definition one has the following connection with the generalized gradient in the sense of Clarke:

$$j^\circ(u, v) = \max\{\langle w, v\rangle : w \in \partial_{Cl}\, j(u)\}. \tag{6}$$
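The construction (3)-(6) can be illustrated numerically in one dimension. The Python sketch below uses a hypothetical law $\beta$ with a falling branch: a drop from 1 to 0 at $\xi = 1$, of the kind used to model debonding. It approximates the ess sup and ess inf in (3)-(4) by the max and min over a fine sample of $|\xi_1 - \xi| \le \mu$ for one small $\mu$, which is adequate only for piecewise continuous $\beta$. It then evaluates $j^\circ$ through (6). For comparison, it also computes one-sided difference quotients of $j(\xi) = \int_0^\xi \beta$, which here equals $\xi^2/2$ for $\xi < 1$ and $1/2$ for $\xi \ge 1$.

```python
def beta(xi: float) -> float:
    # Hypothetical nonmonotone law: linear branch, then a drop to zero at xi = 1
    return xi if xi < 1.0 else 0.0


def j(xi: float) -> float:
    # Nonconvex potential with j' = beta almost everywhere and j(0) = 0
    return 0.5 * xi * xi if xi < 1.0 else 0.5


def beta_hat(xi: float, mu: float = 1e-6, samples: int = 2001):
    """Approximate [beta_lower(xi), beta_upper(xi)] from (3)-(5) by sampling |xi1 - xi| <= mu."""
    vals = [beta(xi + mu * (2 * k / (samples - 1) - 1)) for k in range(samples)]
    return min(vals), max(vals)


def clarke_dd(xi: float, v: float) -> float:
    lo, hi = beta_hat(xi)             # generalized gradient [lo, hi] of j at xi
    return max(lo * v, hi * v)        # formula (6): maximum of w*v over w in [lo, hi]


def one_sided_dd(xi: float, v: float, t: float = 1e-7) -> float:
    # Ordinary one-sided directional derivative of j, for comparison
    return (j(xi + t * v) - j(xi)) / t


for v in (1.0, -1.0):
    print(v, round(clarke_dd(1.0, v), 6), round(one_sided_dd(1.0, v), 6))
```

At the jump, $\hat\beta(1) = [0, 1]$. The generalized directional derivative is therefore $j^\circ(1, 1) = 1$ and $j^\circ(1, -1) = 0$, while the ordinary one-sided derivatives are $0$ and $-1$. This gap between $j^\circ$ and the directional derivative is what distinguishes (1) from a convex variational inequality.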


Speaking in terms of mechanics, one identifies relation (1) as a virtual work expression in inequality form. The first term is the internal work, the second term is the energy contribution of the nonlinear elements modeled by the nonconvex superpotential $j$, and the right-hand side term represents the loading contribution. Detailed formulations of variational problems for concrete applications, up to the hemivariational inequality (1), follow in the next section.


Elastostatics with Nonlinear Boundary Conditions 


A variational formulation is a statement that a solution of an operator equation subjected to certain boundary and/or initial conditions makes an expression involving variations of the quantities of the problem equal to zero or nonnegative. Thus one may distinguish between the bilateral or equality problems and the unilateral or inequality problems. Certain variational principles for a deformable body with nonlinear boundary interaction effects are derived in this section in order to demonstrate the hemivariational inequalities and their relation to classical equations and convex variational inequalities. Let $\Omega \subset \mathbb{R}^3$ be an open bounded subset occupied by a deformable body in its undeformed state. On the assumption of small deformations we can write the relation

$$\int_\Omega \sigma_{ij}\big(\varepsilon_{ij}(v) - \varepsilon_{ij}(u)\big)\,d\Omega = \int_\Omega f_i\,(v_i - u_i)\,d\Omega + \int_\Gamma \sigma_{ij} n_j\,(v_i - u_i)\,d\Gamma, \quad \forall\, v \in V, \tag{7}$$

for $u \in V$, where $n$ is the outward unit normal and $S_i = \sigma_{ij} n_j$ is the boundary traction vector. Here $V$ denotes the function space of the displacements, which will be defined further. Relation (7) is the expression of the principle of virtual work for the body when it is considered free, without constraints on its boundary $\Gamma$. For the derivation of (7) the following steps are followed. The elastostatic equilibrium equation is first considered:


$$\sigma_{ij,j} + f_i = 0, \tag{8}$$

where $f_i$ is the volume force vector. Relation (8) is multiplied by the virtual variation $v_i - u_i$, and then an integration over $\Omega$ is performed. On the assumption of appropriately smooth functions, the Green-Gauss theorem is applied. One recalls here the strain-displacement relation (small deformation theory):

$$\varepsilon_{ij} = \frac{1}{2}\,(u_{i,j} + u_{j,i}). \tag{9}$$


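Let a linearly elastic body be assumed, i.e., the constitutive material relation reads

$$\sigma_{ij} = C_{ijhk}\,\varepsilon_{hk}, \tag{10}$$

where $C = \{C_{ijhk}\}$, $i, j, h, k = 1, 2, 3$, is the elasticity tensor, which satisfies the well-known symmetry and ellipticity properties

$$C_{ijhk} = C_{jihk} = C_{hkij}, \tag{11}$$

$$C_{ijhk}\,\varepsilon_{ij}\,\varepsilon_{hk} \ge c\,\varepsilon_{ij}\,\varepsilon_{ij}, \quad c > 0, \quad \forall\, \varepsilon = \{\varepsilon_{ij}\} \text{ symmetric}. \tag{12}$$

The bilinear form of linear elasticity $a(\cdot, \cdot)$ reads in this case

$$a(u, v) = \int_\Omega C_{ijhk}\,\varepsilon_{ij}(u)\,\varepsilon_{hk}(v)\,d\Omega. \tag{13}$$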


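For further reference one splits the last term in (7) into the work of the normal and of the tangential tractions on the boundary. Then (7) may also be written in the form

$$\int_\Omega \sigma_{ij}\big(\varepsilon_{ij}(v) - \varepsilon_{ij}(u)\big)\,d\Omega = \int_\Omega f_i\,(v_i - u_i)\,d\Omega + \int_\Gamma S_N\,(v_N - u_N)\,d\Gamma + \int_\Gamma S_{T_i}\,(v_{T_i} - u_{T_i})\,d\Gamma, \quad \forall\, v \in V. \tag{14}$$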


Single-Valued Boundary Laws 
and Variational Equalities 


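Let us assume first that on $\Gamma$ the classical boundary conditions $S_N = 0$ and $u_{T_i} = 0$, $i = 1, 2, 3$, hold. Then (14) with (13) leads to the following variational equality:

$$\text{Find } u \in V_0 = \{v : v \in V,\ v_{T_i} = 0 \text{ on } \Gamma\} \ \text{ s.t. } \ a(u, v) = \int_\Omega f_i\,v_i\,d\Omega, \quad \forall\, v \in V_0. \tag{15}$$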
Analogously, one treats all linear or nonlinear boundary conditions which can be expressed in an equality form. Relation (15), under appropriate smoothness assumptions, implies that the governing equations of the mechanical problem (8) and the assumed boundary conditions hold in a weak (integral or energetic) form.


Multivalued, Monotone Laws 
and Variational Inequalities 


Let us assume now that on $\Gamma$ the general monotone multivalued boundary condition

$$-S \in \partial j(u) \tag{16}$$

holds. Here $j(u)$ is assumed to be a convex superpotential and $\partial$ denotes the subdifferential of convex analysis. Moreover, all (normal and tangential) contributions of boundary displacements $u$ and tractions $S$ are included in (16), which holds as a multidimensional boundary condition at each point of the boundary $\Gamma$. Relation (16) is, by definition of the subdifferential, equivalent to

$$j(v) - j(u) \ge -S_i\,(v_i - u_i), \quad \forall\, v = \{v_i\} \in \mathbb{R}^3. \tag{17}$$

By using (17) and (7) one gets the variational inequality

$$\begin{aligned} &\text{Find } u \in V \text{ with } j(u) < \infty \text{ s.t.} \\ &a(u, v - u) + \int_\Gamma \big(j(v) - j(u)\big)\,d\Gamma \ge \int_\Omega f_i\,(v_i - u_i)\,d\Omega, \quad \forall\, v \in V \text{ with } j(v) < \infty. \end{aligned} \tag{18}$$


It is trivial to formulate analogous variational inequalities for simpler one-dimensional laws. This is the case where independent contact laws and tangential (e.g., frictional) mechanisms are assumed on the boundary $\Gamma$. One should mention in passing that unilateral contact relations are included in this formulation by means of the indicator function in place of $j(u)$. The indicator function is defined by $I_{U_{ad}}(u) = 0$ if $u \in U_{ad}$ and $+\infty$ otherwise, and it includes the inequality constraints that describe the no-penetration requirements.


Multivalued, Nonmonotone Laws 
and Hemivariational Inequalities 


In this case the basic building element is the definition of boundary conditions and material laws based on the Clarke subdifferential (6). For instance, let on $\Gamma$ the nonmonotone, possibly multivalued boundary condition

$$-S \in \partial_{Cl}\, j(u) \tag{19}$$

hold, where $j$ is a locally Lipschitz superpotential function. Combining (7) with the inequality

$$j^\circ(u, v - u) \ge -S_i\,(v_i - u_i), \quad \forall\, v = \{v_i\} \in \mathbb{R}^3, \tag{20}$$

which defines on $\Gamma$ the condition (19), one gets the following hemivariational inequality:

$$\text{Find } u \in V \ \text{ s.t. } \ a(u, v - u) + \int_\Gamma j^\circ(u, v - u)\,d\Gamma \ge \int_\Omega f_i\,(v_i - u_i)\,d\Omega, \quad \forall\, v \in V. \tag{21}$$
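If instead of (19) one assumes on $\Gamma$ that

$$-S_N \in \partial_{Cl}\, j_N(u_N), \qquad -S_T \in \partial_{Cl}\, j_T(u_T), \tag{22}$$

then one gets analogously the hemivariational inequality

$$\begin{aligned} &\text{Find } u \in V \text{ s.t.} \\ &a(u, v - u) + \int_\Gamma j_N^\circ(u_N, v_N - u_N)\,d\Gamma + \int_\Gamma j_T^\circ(u_T, v_T - u_T)\,d\Gamma \ge \int_\Omega f_i\,(v_i - u_i)\,d\Omega, \quad \forall\, v \in V. \end{aligned} \tag{23}$$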


Hemivariational Inequalities: Applications in Mechanics 




The last types of variational expressions, involving $j^\circ(\cdot, \cdot)$ or $j_N^\circ(\cdot, \cdot)$ and $j_T^\circ(\cdot, \cdot)$, have been called hemivariational inequalities by Panagiotopoulos, who introduced and studied them in mechanics [14,16,17,18]. Note that in the more general case in which $j$, or $j_N$ and $j_T$, are not locally Lipschitz, $j^\circ(\cdot, \cdot)$ in (21) and $j_N^\circ(\cdot, \cdot)$, $j_T^\circ(\cdot, \cdot)$ in (23) are replaced by the corresponding generalized directional derivatives $j^\uparrow(\cdot, \cdot)$ and $j_N^\uparrow(\cdot, \cdot)$, $j_T^\uparrow(\cdot, \cdot)$ of Clarke and Rockafellar. Moreover, a combination of monotone subdifferential laws (cf. (16)) and nonmonotone laws (cf. (19)) on different (nonoverlapping) parts of the boundary $\Gamma$ is possible. One then gets variational-hemivariational inequality problems.

The solution of variational problems, such as the variational equalities or the hemivariational inequalities derived previously, satisfies the operator equations of the problem (e.g., the equation of equilibrium) and the boundary conditions of the problem in a weak sense. This means, roughly speaking, that these relations are satisfied in an integral form on the body or on the boundary of the structure, respectively. Analogous considerations are familiar from the weak formulations used in the finite element method.


Inequality or Nonsmooth Mechanics 


A boundary value problem is called bilateral (resp. unilateral) if it leads to variational equality (resp. variational or hemivariational inequality) formulations. Unilateral problems are also called inequality problems. Inequality problems in mechanics usually characterize structures with variable mechanical behavior, i.e. where the material or boundary law depends on the direction of the stress or boundary traction variation. Owing to their connection with nonsmooth energy functions, all inequality problems belong to the area that Panagiotopoulos called nonsmooth mechanics [11,12].


Discretized Hemivariational Inequalities 
for Nonlinear Material Laws 


In order to make the subject more accessible to engineers, a discretized hemivariational inequality is formulated in this section. A finite element discretization is assumed, and all relations are written in an elementary matrix analysis form. An elastic structure with both classical, linearly elastic elements and degrading elements is considered.


The stress equilibrium equations read

$$\begin{pmatrix} G & G_n \end{pmatrix}\begin{pmatrix} s \\ s_n \end{pmatrix} = p, \tag{24}$$

where $(G \ \ G_n)$ is the equilibrium matrix of the discretized structure, whose blocks take into account the stress contributions of the linear elements ($s$) and of the nonlinear elements ($s_n$), and $p$ is the loading vector.

The strain-displacement compatibility equations take the form

$$\begin{pmatrix} e \\ e_n \end{pmatrix} = \begin{pmatrix} G^{\mathsf T} \\ G_n^{\mathsf T} \end{pmatrix} u, \tag{25}$$

where $e$, $e_n$ are the deformation vectors of the linear and nonlinear elements and $u$ is the displacement vector.
The linear material constitutive law for the structure reads

$$s = K_0\,(e - e_0), \tag{26}$$

where $K_0$ is the natural stiffness matrix and $e_0$ is the initial deformation vector.

The nonlinear material law is considered in the form

$$s_n \in \partial_{Cl}\, \phi_n(e_n). \tag{27}$$

Here $\phi_n(\cdot)$ is a general nonconvex superpotential, and summation over all $q$ nonlinear elements gives their total strain energy contribution as

$$\Phi_n(e_n) = \sum_{i=1}^{q} \phi_n^{(i)}\big(e_n^{(i)}\big). \tag{28}$$

Finally, classical support boundary conditions complete the description of the problem.

The discretized form of the virtual work equation reads

$$s^{\mathsf T}(e^* - e) + s_n^{\mathsf T}(e_n^* - e_n) = p^{\mathsf T}(u^* - u), \quad \forall\, e^*, u^*, e_n^* \text{ kinematically admissible}. \tag{29}$$

Entering the elasticity law (26) into the virtual work equation (29), and using (25), we get

$$u^{\mathsf T} G K_0 G^{\mathsf T}(u^* - u) - (p + G K_0 e_0)^{\mathsf T}(u^* - u) + s_n^{\mathsf T}(e_n^* - e_n) = 0, \quad \forall\, u^* \in V_{ad}, \tag{30}$$

where $K = G K_0 G^{\mathsf T}$ denotes the stiffness matrix of the structure, $\hat p = p + G K_0 e_0$ denotes the nodal equivalent loading vector, and $V_{ad}$ includes all support boundary conditions of the structure.
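As an illustration of the matrix relations (25), (26) and (30), the Python sketch below assembles $K = G K_0 G^{\mathsf T}$ and $\hat p = p + G K_0 e_0$ for a hypothetical chain of three linear springs between two fixed supports, with two free nodal displacements. The element numbering, the stiffness values and the initial deformation are assumptions chosen for the example. The sign convention is $p = Gs$ for equilibrium and $e = G^{\mathsf T}u$ for compatibility. Plain Python lists keep the example self-contained.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


# Hypothetical structure: support -(1)- node 1 -(2)- node 2 -(3)- support.
# Compatibility (25), e = G^T u: e1 = u1, e2 = u2 - u1, e3 = -u2.
G = [[1.0, -1.0, 0.0],
     [0.0, 1.0, -1.0]]
K0 = [[1.0, 0.0, 0.0],              # natural stiffness matrix diag(k1, k2, k3)
      [0.0, 2.0, 0.0],
      [0.0, 0.0, 3.0]]
e0 = [[0.0], [0.1], [0.0]]          # initial deformation of element 2 (column vector)
p = [[0.0], [0.0]]                  # external nodal loads

K = matmul(matmul(G, K0), transpose(G))               # K = G K0 G^T
GK0e0 = matmul(matmul(G, K0), e0)
p_hat = [[p[i][0] + GK0e0[i][0]] for i in range(2)]   # p_hat = p + G K0 e0
print(K)       # [[3.0, -2.0], [-2.0, 5.0]]
print(p_hat)   # [[-0.2], [0.2]]
```

The assembled $K$ is symmetric with $k_1 + k_2$ and $k_2 + k_3$ on the diagonal, as expected for two nodes connected by spring 2. The initial deformation of element 2 produces a self-equilibrated pair of equivalent nodal loads.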

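Further, one considers the nonlinear elements (27) in the inequality form

$$s_n^{\mathsf T}(e_n^* - e_n) \le \Phi_n^\circ(e_n, e_n^* - e_n), \quad \forall\, e_n^*, \tag{31}$$

where $\Phi_n^\circ(e_n, e_n^* - e_n)$ is the generalized (Clarke) directional derivative of the potential $\Phi_n$. Thus the following discretized hemivariational inequality is obtained:

$$\begin{aligned} &\text{Find kinematically admissible displacements } u \in V_{ad} \text{ s.t.} \\ &u^{\mathsf T} K (u^* - u) - \hat p^{\mathsf T}(u^* - u) + \Phi_n^\circ(e_n, e_n^* - e_n) \ge 0, \quad \forall\, u^* \in V_{ad}, \end{aligned} \tag{32}$$

where $e_n = G_n^{\mathsf T} u$ and $e_n^* = G_n^{\mathsf T} u^*$. Equivalently, a substationarity problem for the total potential energy can be written:

$$\text{Find } u \in V_{ad} \ \text{ s.t. } \ \Pi(u) = \operatorname*{stat}_{v \in V_{ad}} \{\Pi(v)\}. \tag{33}$$

Here the potential energy reads $\Pi(v) = \frac{1}{2} v^{\mathsf T} K v - \hat p^{\mathsf T} v + \Phi_n(v)$, where $\Phi_n$ is regarded as a function of the displacements through $e_n = G_n^{\mathsf T} v$; the first two terms (quadratic potential) are well known in the structural analysis community.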
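A one-degree-of-freedom version of (32)-(33) shows why substationarity, rather than minimization, is the appropriate notion. In the hypothetical Python sketch below, $K = k$ is a scalar stiffness. A single degrading element with $e_n = u$ has the nonconvex potential $\Phi_n(v) = \frac12 k_a v^2$ for $v < v_0$ and $\frac12 k_a v_0^2$ for $v \ge v_0$, modeling an adhesive bond that breaks at $v_0$, so that $\partial_{Cl}\Phi_n(v_0) = [0, k_a v_0]$. Substationary points are the $u$ with $0 \in k u - \hat p + \partial_{Cl}\Phi_n(u)$. All parameter values are assumptions.

```python
from typing import List


def substationary_points(k: float, p: float, ka: float, v0: float) -> List[float]:
    """Solve 0 in k*u - p + dPhi(u) for the piecewise potential above (k, ka, v0 > 0)."""
    pts = []
    u = p / (k + ka)                   # bonded branch u < v0: dPhi = {ka * u}
    if u < v0:
        pts.append(u)
    u = p / k                          # broken branch u > v0: dPhi = {0}
    if u > v0:
        pts.append(u)
    if 0.0 <= p - k * v0 <= ka * v0:   # kink u = v0: dPhi = [0, ka * v0]
        pts.append(v0)
    return sorted(pts)


def potential(u: float, k: float, p: float, ka: float, v0: float) -> float:
    phi = 0.5 * ka * u * u if u < v0 else 0.5 * ka * v0 * v0
    return 0.5 * k * u * u - p * u + phi      # Pi(u) as in (33)


k, p, ka, v0 = 1.0, 1.5, 1.0, 1.0
for u in substationary_points(k, p, ka, v0):
    print(u, potential(u, k, p, ka, v0))
# 0.75 -0.5625
# 1.0 -0.5
# 1.5 -0.625
```

With these data, $u = 0.75$ is a local minimum of $\Pi$ and $u = 1.5$ the global one. The kink point $u = 1.0$ is a local maximum that nevertheless satisfies the substationarity condition.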


Other Applications in Mechanics 


Hemivariational inequalities have been used for the modeling and solution of delamination effects in composite and multilayered plates, in composite structures, for nonmonotone friction and skin effects, and for nonlinear mechanics applications (for instance, in the analysis of semi-rigid joints in steel structures). Details can be found in [9,11,12,17,18] and in the citations given there. Another area of application is nonconvex problems arising in elastoplasticity (cf. [4,5,6]). Some nonconvex problems in elastoplasticity have been treated by hemivariational inequality techniques in [17,18]. Mathematical results which are useful for the study of hemivariational inequalities can also be found in [2,3,13,14].


Numerical Algorithms 


A number of algorithms based on nonsmooth and nonconvex optimization concepts, on engineering methods or heuristics, and on combinations of these two approaches have been tested so far for the numerical solution of hemivariational inequality problems. Both finite elements and boundary elements have been used, the latter for problems that are nonlinear only on the boundary; see > Nonconvex energy functions: Hemivariational inequalities and [1,8,9,17].


See also 


> Generalized Monotonicity: Applications to 
Variational Inequalities and Equilibrium Problems 

> Hemivariational Inequalities: Eigenvalue Problems 

> Hemivariational Inequalities: Static Problems 

> Nonconvex Energy Functions: Hemivariational 
Inequalities 

> Nonconvex-nonsmooth Calculus of Variations 

> Quasidifferentiable Optimization 

> Quasidifferentiable Optimization: Algorithms for 
Hypodifferentiable Functions 

> Quasidifferentiable Optimization: Algorithms for 
QD Functions 

> Quasidifferentiable Optimization: Applications 

> Quasidifferentiable Optimization: Applications to 
Thermoelasticity 

> Quasidifferentiable Optimization: Calculus of 
Quasidifferentials 

> Quasidifferentiable Optimization: Codifferentiable 
Functions 

> Quasidifferentiable Optimization: Dini Derivatives, 
Clarke Derivatives 

> Quasidifferentiable Optimization: Exact Penalty 
Methods 

> Quasidifferentiable Optimization: Optimality 
Conditions 

> Quasidifferentiable Optimization: Stability of 
Dynamic Systems 

> Quasidifferentiable Optimization: Variational 
Formulations 

> Quasivariational Inequalities 

> Sensitivity Analysis of Variational Inequality 
Problems 

> Solving Hemivariational Inequalities by Nonsmooth 
Optimization Methods 
> Variational Inequalities

> Variational Inequalities: F. E. Approach 

> Variational Inequalities: Geometric Interpretation, 
Existence and Uniqueness 

> Variational Inequalities: Projected Dynamical 
System 

> Variational Principles 
The theory of hemivariational inequalities was created by P.D. Panagiotopoulos et al. (see [3,5,6,7]) for studying nonconvex and nonsmooth energy functions under nonmonotone multivalued laws. In this setting many relevant models lead to nonsmooth eigenvalue problems. A typical example is provided by the analysis of hysteresis phenomena. To illustrate it, we present here the loading and unloading problems with hysteresis modes.

Consider a plane linear elastic body $\Omega$ with boundary $\Gamma$. Its mechanical behavior is described by the virtual displacement variable $u$ and by the scalar parameter $\lambda$, which determines the magnitude of the external loading on the system. The variable $u$ must satisfy certain boundary or support conditions. For the sake of simplicity we assume that $u = 0$ on $\Gamma$. The space of kinematically admissible displacements $u$ is then the Sobolev space $H_0^1(\Omega)$, that is, the closure of $C_0^\infty(\Omega)$ with respect to the $L^2$-norm of the gradient. Suppose that there exist a fundamental (pre-bifurcation) solution $\lambda \mapsto u_0(\lambda)$ and another solution $\lambda \mapsto u(\lambda) = u_0(\lambda) + z(\lambda)$ that coincide for $\lambda < \lambda_0$. Then one has $\lim_{\lambda\to\lambda_0} z(\lambda) = 0$, and the hysteresis bifurcation mode has the expression

$$ u_1(\lambda_0) = \lim_{\lambda\to\lambda_0} \frac{z(\lambda)}{\|z(\lambda)\|}. \tag{1} $$


Using the principle of virtual work together with physically realistic assumptions on the data $a$ and $S$ (see e.g. [7]), we obtain the relation

$$ a(u_1(\lambda_0), v) + \langle S(u_1(\lambda_0)), v\rangle - \lambda_0 \int_\Omega u_1(\lambda_0)\, v\,dx = 0, \quad \forall v \in H_0^1(\Omega). \tag{2} $$

It is justified to assume that a generalized nonmonotone reaction-displacement law $(-S, u)$ holds in $\Omega$, expressed by

$$ \int_\Omega j^0(u_1(\lambda_0); v)\,dx \ge \langle S(u_1(\lambda_0)), v\rangle, \quad \forall v \in H_0^1(\Omega). \tag{3} $$

Here $j\colon \mathbb{R} \to \mathbb{R}$ stands for a locally Lipschitz function with generalized gradient $\partial j$ and generalized directional derivative

$$ j^0(x; y) = \max\{ z y : z \in \partial j(x) \} $$

(see [2]). Relations (2) and (3) yield the following eigenvalue problem in hemivariational inequality form: find $(u, \lambda) \in H_0^1(\Omega) \times \mathbb{R}$ such that

$$ a(u, v) + \int_\Omega j^0(u; v)\,dx \ge \lambda \int_\Omega u v\,dx, \quad \forall v \in H_0^1(\Omega). \tag{4} $$


Additional information concerning problems of type 
(4) can be found in [3,5,6,7]. 
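The generalized directional derivative $j^0(x; y)$ in (3) and (4) can be made concrete for a simple nonconvex function. The Python sketch below uses the hypothetical example $j(x) = -|x|$, which does not come from the text. Its generalized gradient is $\{-1\}$ for $x > 0$, $\{1\}$ for $x < 0$ and $[-1, 1]$ at $x = 0$. The sketch evaluates $j^0(x; y) = \max\{zy : z \in \partial j(x)\}$ and compares the result with a crude sampling of the limsup of difference quotients. The function names are illustrative only.

```python
from typing import Tuple


def j(t: float) -> float:
    """Hypothetical nonconvex, locally Lipschitz example: j(t) = -|t|."""
    return -abs(t)


def clarke_gradient(x: float) -> Tuple[float, float]:
    """Endpoints (lo, hi) of the generalized gradient of j at x."""
    if x > 0:
        return (-1.0, -1.0)
    if x < 0:
        return (1.0, 1.0)
    return (-1.0, 1.0)  # at the kink: convex hull of the one-sided slopes


def clarke_derivative(x: float, y: float) -> float:
    """j0(x; y) = max{z * y : z in dj(x)}; a linear function of z peaks at an endpoint."""
    lo, hi = clarke_gradient(x)
    return max(lo * y, hi * y)


def sampled_limsup(x: float, y: float, h_max: float = 1e-3,
                   lam: float = 1e-6, samples: int = 201) -> float:
    """Crude estimate of limsup_{h -> 0, lam -> 0+} [j(x + h + lam*y) - j(x + h)] / lam,
    taking the max over a grid of shifts h (a sanity check, not a proof)."""
    best = float("-inf")
    for k in range(samples):
        h = -h_max + 2.0 * h_max * k / (samples - 1)
        best = max(best, (j(x + h + lam * y) - j(x + h)) / lam)
    return best


print(clarke_derivative(0.0, 1.0), clarke_derivative(0.0, -1.0))
print(sampled_limsup(0.0, 1.0), sampled_limsup(2.0, 1.0))
```

At the kink, $j^0(0; y) = |y|$, whereas the ordinary one-sided directional derivative of this $j$ at $0$ is $-|y|$. The Clarke derivative always dominates the one-sided derivative.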

Relation (4), like other models, motivates the study of abstract eigenvalue problems for hemivariational inequalities. The specific case of problem (4) can be reformulated as follows. We are given:

- a Banach space $V$ embedded in $L^2(\Omega)$, the space of square-integrable functions on $\Omega \subset \mathbb{R}^m$;
- a continuous symmetric bilinear form $a\colon V \times V \to \mathbb{R}$;
- a locally Lipschitz function $j\colon \mathbb{R} \to \mathbb{R}$ with an appropriate growth condition for its generalized gradient.

We seek $u \in V$ and $\lambda \in \mathbb{R}$ such that

$$ a(u, v) + \int_\Omega j^0(u; v)\,dx \ge \lambda \int_\Omega u v\,dx, \quad \forall v \in V. \tag{5} $$

This mathematical model can also be used to formulate various other problems in mechanics, such as unilateral bending problems in elasticity.

A general approach to the abstract eigenvalue problem (5) is the nonsmooth critical point theory developed by K.-C. Chang [1]. In that paper the minimax principles of critical point theory are extended from smooth functionals (see [8]) to locally Lipschitz functionals. Accordingly, we associate with problem (5), for each $\lambda$, the locally Lipschitz functional $I_\lambda\colon V \to \mathbb{R}$,

$$ I_\lambda(u) = \frac12 a(u, u) + \int_\Omega j(u)\,dx - \frac{\lambda}{2}\int_\Omega u^2\,dx, \quad \forall u \in V. \tag{6} $$

A critical point $u$ of $I_\lambda$, i.e. a point with $0 \in \partial I_\lambda(u)$, is a solution of (5) because

$$ \partial I_\lambda(u) \subset a(u,\cdot) - \lambda (u,\cdot)_{L^2(\Omega)} + \partial\Big(\int_\Omega j(u)\,dx\Big) \subset a(u,\cdot) - \lambda (u,\cdot)_{L^2(\Omega)} + \int_\Omega \partial j(u)\,dx $$

(see [2]). Indeed, there is then a measurable selection $w(x) \in \partial j(u(x))$ with $a(u,v) - \lambda\int_\Omega uv\,dx + \int_\Omega wv\,dx = 0$, and $wv \le j^0(u; v)$ pointwise. Thus, to solve (5), it suffices to establish the existence of nontrivial critical points of the functional $I_\lambda$ introduced in (6). To this end we proceed along the lines of [4], arguing in an abstract framework.

Let $V$ be a Banach space and $\Omega$ a bounded domain in $\mathbb{R}^m$, $m \ge 1$. Let $T\colon V \to L^s(\Omega;\mathbb{R}^N)$ be a compact linear operator. Here $L^s(\Omega;\mathbb{R}^N)$, $1 < s < \infty$, is the Banach space of all Lebesgue measurable functions $f\colon \Omega \to \mathbb{R}^N$ for which $|f|^s$ is integrable. Let $F\colon V \to \mathbb{R}$ be a locally Lipschitz function. Let $G\colon \Omega \times \mathbb{R}^N \to \mathbb{R}$ be a (Carathéodory) function such that $G(x, y)$ is measurable in $x \in \Omega$ and locally Lipschitz in $y \in \mathbb{R}^N$, with $G(x, 0) = F(0) = 0$ for $x \in \Omega$. The following hypotheses are imposed:

H1) $|w| \le c(1 + |y|^{s-1})$, $\forall w \in \partial_y G(x, y)$, $x \in \Omega$, $y \in \mathbb{R}^N$, with a constant $c > 0$;

H2) i) $F(v) - r\langle z, v\rangle_V \ge \alpha\|v\|_V^{\sigma} - \alpha_0$, $\forall v \in V$, $z \in \partial F(v)$;
ii) $G(x, y) - r\langle w, y\rangle \ge -b|y|^{\sigma_0} - b_0$, for a.e. $x \in \Omega$, $y \in \mathbb{R}^N$, $w \in \partial_y G(x, y)$,
with positive constants $r, \alpha, \alpha_0, b, b_0, \sigma, \sigma_0$, where $1 \le \sigma_0 < \min\{\sigma, r^{-1}, s\}$;

H3) any bounded sequence $\{v_n\} \subset V$ for which there is $z_n \in \partial F(v_n)$ converging in $V^*$ contains a convergent subsequence in $V$;

H4) i) $\liminf_{v\to 0} F(v)\|v\|_V^{-p} > 0$;
ii) $\liminf_{v\to 0} F(v)\|v\|_V^{-p} + |\Omega|^{(s-p)/s}\|T\|^p \liminf_{y\to 0} G(x, y)|y|^{-p} > 0$,
uniformly with respect to $x$, where $1 < p < s$;

H5) $\liminf_{t\to+\infty} F(tv_0)\,t^{-1/r} < -\liminf_{t\to+\infty} t^{-1/r}\int_\Omega G(x, t(Tv_0)(x))\,dx$ for some $v_0 \in V$.
The following statement is our main result on the abstract eigenvalue problem (5).

Theorem 1 Assume that hypotheses H1)-H5) hold. Then there exists a nontrivial critical point $u \in V$ of $I\colon V \to \mathbb{R}$ defined by

$$ I(v) = F(v) + \int_\Omega G(x, (Tv)(x))\,dx, \quad v \in V. $$

Moreover, there exist $z \in \partial F(u)$ and $w \in L^{s/(s-1)}(\Omega;\mathbb{R}^N)$ such that

$$ w(x) \in \partial_y G(x, (Tu)(x)) \quad \text{a.e. } x \in \Omega, $$
$$ \langle z, v\rangle_V + \int_\Omega \langle w(x), (Tv)(x)\rangle\,dx = 0, \quad v \in V. $$

Conversely, suppose $u \in V$ verifies the relations above for some $z$ and $w$, and the function $G(x,\cdot)$ is regular at $(Tu)(x)$ for each $x \in \Omega$ (in the sense of F.H. Clarke [2]). Then $u$ is a critical point of $I$.


The locally Lipschitz functional $I$ satisfies the Palais-Smale condition in the sense of Chang [1]. Indeed, let $(v_n)$ be a sequence in $V$ with $I(v_n) \le M$ for which there exists a sequence $J_n \in \partial I(v_n)$ with $J_n \to 0$ in $V^*$. We have

$$ J_n = z_n + T^* w_n, \quad z_n \in \partial F(v_n), \quad w_n(x) \in \partial_y G(x, (Tv_n)(x)) \ \text{a.e. } x \in \Omega. $$

From this and H2) we infer that

$$ M + r\|v_n\|_V \ge F(v_n) - r\langle z_n, v_n\rangle_V + \int_\Omega \big(G(x, (Tv_n)(x)) - r\langle w_n(x), (Tv_n)(x)\rangle\big)\,dx \ge \alpha\|v_n\|_V^{\sigma} + C_1\|v_n\|_V^{\sigma_0} + C_2, $$

with real constants $C_1, C_2$, provided that $n$ is large enough (so that $\|J_n\|_{V^*} \le 1$). Since $1 \le \sigma_0 < \sigma$, this estimate implies that $(v_n)$ is bounded in $V$.

A standard argument based on assumption H3) then shows that $(v_n)$ has a strongly convergent subsequence. Since $(v_n)$ is bounded, $(Tv_n)$ is bounded in $L^s(\Omega;\mathbb{R}^N)$. Assumption H1) then makes $(w_n)$ bounded in $L^{s/(s-1)}(\Omega;\mathbb{R}^N)$. Since $T^*$ is a compact operator, $(T^*w_n)$ has a convergent subsequence. Because $J_n \to 0$, the same holds for $z_n = J_n - T^*w_n$ in $V^*$. Together with the boundedness of $(v_n)$, this allows us to apply hypothesis H3). Hence the locally Lipschitz functional $I$ satisfies the Palais-Smale condition.

Assumption H4) ensures the existence of constants $\delta > 0$, $A > 0$ and $B > 0$, with

$$ A - B|\Omega|^{(s-p)/s}\|T\|^p > 0, $$

such that

$$ F(v) \ge A\|v\|_V^p, \quad \|v\|_V \le \delta, \tag{7} $$

and

$$ G(x, y) \ge -B|y|^p, \quad \forall x \in \Omega,\ |y| \le \delta. $$

Combining the last inequality with H1), one obtains

$$ \int_\Omega G(x, (Tv)(x))\,dx \ge (\eta - A)\|v\|_V^p, \quad \|v\|_V \le \rho, \tag{8} $$

for some $\eta > 0$ and $0 < \rho < \delta$. Indeed, assumption H1) and Lebourg's mean value theorem imply that $G$ fulfills the growth condition

$$ |G(x, y)| \le a_1|y| + a_2|y|^s, \quad \forall x \in \Omega,\ y \in \mathbb{R}^N, $$

with constants $a_1, a_2 > 0$. The two estimates above for $G(x, y)$ show that

$$ G(x, y) \ge -B|y|^p - (a_1\delta^{1-s} + a_2)|y|^s, \quad \forall x \in \Omega,\ y \in \mathbb{R}^N. $$

The continuity of $T$ and Hölder's inequality then give

$$ \int_\Omega G(x, (Tv)(x))\,dx \ge \Big(-B|\Omega|^{(s-p)/s}\|T\|^p - (a_1\delta^{1-s} + a_2)\|T\|^s\|v\|_V^{s-p}\Big)\|v\|_V^p, \quad \forall v \in V. $$

Since $s > p$, the numbers $\eta > 0$ and $\rho > 0$ can be chosen so small that relation (8) holds.
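For the reader's convenience, the two elementary estimates behind (8) are written out below. This is a sketch of the standard argument. Here $|\Omega|$ is the Lebesgue measure of $\Omega$ and $\|T\|$ is the norm of $T\colon V \to L^s(\Omega;\mathbb{R}^N)$. The choice $a_1 = a_2 = c$ is one admissible, not necessarily optimal, choice of the constants.

```latex
% Lebourg's mean value theorem: for some t \in (0,1) and some w \in \partial_y G(x, ty),
|G(x,y)| = |G(x,y) - G(x,0)| = |\langle w, y\rangle|
  \le c\,\bigl(1 + |ty|^{s-1}\bigr)\,|y| \le c\,|y| + c\,|y|^{s}
  \quad \text{(by H1))}.

% Hoelder's inequality with the exponents s/p and s/(s-p):
\int_\Omega |(Tv)(x)|^{p}\,dx
  \le |\Omega|^{(s-p)/s}\,\|Tv\|_{L^s(\Omega;\mathbb{R}^N)}^{p}
  \le |\Omega|^{(s-p)/s}\,\|T\|^{p}\,\|v\|_V^{p},
\qquad
\int_\Omega |(Tv)(x)|^{s}\,dx \le \|T\|^{s}\,\|v\|_V^{s}.
```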

By (7) and (8) we conclude that there exist positive numbers $\rho$, $\eta$ such that

$$ I(v) \ge \eta\rho^p, \quad \|v\|_V = \rho. \tag{9} $$


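The formula

$$ \partial_t\big(t^{-1/r} G(x, ty)\big) = \frac{1}{r}\,t^{-1/r-1}\big[r\langle \partial_y G(x, ty), ty\rangle - G(x, ty)\big], $$

together with absolute continuity and H2) ii), shows that

$$ t^{-1/r} G(x, ty) - G(x, y) = \int_1^t \partial_\tau\big(\tau^{-1/r} G(x, \tau y)\big)\,d\tau \le C|y|^{\sigma_0} + C_0 $$

for a.e. $x \in \Omega$, $y \in \mathbb{R}^N$, $t \ge 1$, where $C, C_0$ are positive constants. Then one obtains

$$ I(t\theta v_0) \le (t\theta)^{1/r}\Big[F(t\theta v_0)(t\theta)^{-1/r} + C\|v_0\|_V^{\sigma_0}\theta^{\sigma_0 - 1/r} + C_0\theta^{-1/r} + \theta^{-1/r}\int_\Omega G(x, \theta(Tv_0)(x))\,dx\Big] $$

for all $t \ge 1$, $\theta \ge 1$, with new positive constants $C, C_0$. In view of H5) and since $\sigma_0 < 1/r$, we can find $\theta$ sufficiently large such that

$$ C\|v_0\|_V^{\sigma_0}\theta^{\sigma_0 - 1/r} + C_0\theta^{-1/r} + \theta^{-1/r}\int_\Omega G(x, \theta(Tv_0)(x))\,dx < -\liminf_{\tau\to+\infty} F(\tau v_0)\tau^{-1/r}. $$

With such a fixed $\theta$, there exist arbitrarily large $t$ satisfying

$$ F(t\theta v_0)(t\theta)^{-1/r} + C\|v_0\|_V^{\sigma_0}\theta^{\sigma_0 - 1/r} + C_0\theta^{-1/r} + \theta^{-1/r}\int_\Omega G(x, \theta(Tv_0)(x))\,dx < 0. $$

We deduce that

$$ I(t_n\theta v_0) < 0 \tag{10} $$

for a sequence $t_n \to \infty$. Properties (9) and (10) allow us to apply the mountain pass theorem in the nonsmooth version of Chang [1]. This yields the desired critical point $u$ of $I$. It is nontrivial because $I(u) \ge \eta\rho^p > 0 = I(0)$. The other assertions of the first part of Theorem 1 follow directly from $0 \in \partial I(u)$.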
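The integral bound for $t^{-1/r}G(x, ty) - G(x, y)$ used above can be made explicit. This is a sketch. It assumes the standard fact that $\tau \mapsto G(x, \tau y)$ is differentiable a.e. with derivative $\langle w, y\rangle$ for some $w \in \partial_y G(x, \tau y)$. H2) ii) then gives the bound below, with one admissible choice of constants $C = b/(1 - r\sigma_0)$, $C_0 = b_0$.

```latex
\frac{d}{d\tau}\Bigl(\tau^{-1/r}G(x,\tau y)\Bigr)
  = \frac{1}{r}\,\tau^{-1/r-1}\Bigl[r\langle w,\tau y\rangle - G(x,\tau y)\Bigr]
  \le \frac{1}{r}\,\tau^{-1/r-1}\Bigl(b\,\tau^{\sigma_0}|y|^{\sigma_0} + b_0\Bigr),

% integrating over [1, t] and using \sigma_0 < 1/r:
t^{-1/r}G(x,ty) - G(x,y)
  \le \frac{b\,|y|^{\sigma_0}}{r}\int_1^t \tau^{\sigma_0-1/r-1}\,d\tau
    + \frac{b_0}{r}\int_1^t \tau^{-1/r-1}\,d\tau
  \le \frac{b}{1-r\sigma_0}\,|y|^{\sigma_0} + b_0 .
```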

The converse part of Theorem 1 follows from the formula

$$ \partial\Big(\int_\Omega G(x, v(x))\,dx\Big) = \int_\Omega \partial_y G(x, v(x))\,dx, \quad \forall v \in L^s(\Omega;\mathbb{R}^N), $$

which is valid under the growth condition in H1) and the regularity assumption on $G$ (see [2]). This completes the proof of Theorem 1.

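In the case of problem (4) we choose $V = H_0^1(\Omega)$. The compact linear operator $T\colon H_0^1(\Omega) \to L^s(\Omega)$ is the embedding $H_0^1(\Omega) \subset L^s(\Omega)$, with $2 < s < 2m(m-2)^{-1}$ if $m \ge 3$. We set

$$ F(v) = \frac12\int_\Omega \big(|\nabla v|^2 - \lambda v^2\big)\,dx, \quad \forall v \in H_0^1(\Omega), $$

where for simplicity we take $a(u, v) = \int_\Omega \nabla u\cdot\nabla v\,dx$, and $G(x, t) = j(t)$. A significant possible choice for $j$ is

$$ j(t) = -\frac1s|t|^s + \int_0^t \beta(\tau)\,d\tau, \quad t \in \mathbb{R}, \tag{11} $$

where $\beta \in L^\infty_{\mathrm{loc}}(\mathbb{R})$ satisfies $t\beta(t) \ge 0$ for $t$ near $0$ and $|\beta(t)| \le c(1 + |t|^{\gamma})$, $t \in \mathbb{R}$, with constants $c > 0$, $0 \le \gamma < 1$.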
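The mountain pass geometry (9)-(10) can be seen in the simplest instance of this application. The Python sketch below makes hypothetical choices that are not from the text: $\Omega = (0,1)$, $\beta \equiv 0$ in (11), $s = 4$ and $v_0(x) = \sin(\pi x)$. Then $I(tv_0) = \frac12\int_0^1(|tv_0'|^2 - \lambda (tv_0)^2)\,dx - \frac14\int_0^1 |tv_0|^4\,dx$. For $0 \le \lambda < \pi^2$ this energy is positive for small $t$ and negative for large $t$, the shape exploited by the mountain pass theorem. The midpoint rule used here is exact, up to rounding, for the trigonometric integrands involved.

```python
import math


def energy_along_ray(t: float, lam: float = 0.0, s: float = 4.0, n: int = 2000) -> float:
    """Midpoint-rule value of I(t*v0) on Omega = (0, 1) with v0(x) = sin(pi x),
    I(v) = 1/2 * int(v'^2 - lam*v^2) - (1/s) * int(|v|^s)   (beta = 0 assumed)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        v = t * math.sin(math.pi * x)
        dv = t * math.pi * math.cos(math.pi * x)
        total += (0.5 * (dv * dv - lam * v * v) - abs(v) ** s / s) * h
    return total


for t in (0.1, 1.0, 3.0, 10.0):
    print(t, energy_along_ray(t))
```

For $s = 4$ and $\lambda = 0$ the closed form is $I(tv_0) = t^2\pi^2/4 - 3t^4/32$. This makes the sign change between small and large $t$ explicit.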


Hemivariational Inequalities: Eigenvalue Problems 




Corollary 2 Let $j\colon \mathbb{R} \to \mathbb{R}$ be given by (11), and let $\lambda_1$ denote the first eigenvalue of $-\Delta$ on $H_0^1(\Omega)$. Then for every $\lambda < \lambda_1$, problem (5), with $a$ as above, has a nontrivial eigenfunction $u \in H_0^1(\Omega)$. In addition, $u$ solves the following nonsmooth Dirichlet problem, which contains both superlinear and sublinear terms:

$$ \Delta u + \lambda u + |u|^{s-2}u \in [\underline{\beta}(u(x)), \overline{\beta}(u(x))] \quad \text{a.e. } x \in \Omega, \qquad u = 0 \ \text{on } \partial\Omega, $$

where the notation of [1] is used.

The argument consists in verifying assumptions H1)-H5) for the functional $I = I_\lambda$, $\lambda < \lambda_1$, with $I_\lambda$ given by (6). To this end it suffices to take $r \in (1/s, 1/2)$, $p = \sigma = 2$, $\sigma_0 = \gamma + 1$ and $v_0 \in H_0^1(\Omega) \setminus \{0\}$. Applying Theorem 1 yields the stated result.
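Corollary 2 is stated for $\lambda < \lambda_1$. The Python sketch below is a hedged numerical illustration. It assumes the special case $\Omega = (0,1)$, where the first Dirichlet eigenvalue of $-\Delta$ is known to be $\pi^2$. It estimates $\lambda_1$ by inverse iteration on the standard three-point finite-difference matrix, solving each tridiagonal system with the Thomas algorithm. The functions `thomas_solve` and `first_dirichlet_eigenvalue` are illustrative names.

For $0 \le \lambda < \lambda_1$ the Poincaré inequality gives $F(v) \ge \frac12(1 - \lambda/\lambda_1)\int_\Omega |\nabla v|^2\,dx$. This is presumably the kind of coercivity used when verifying H2) i) and H4) i) with $\sigma = p = 2$.

```python
import math
from typing import List


def thomas_solve(diag: float, off: float, rhs: List[float]) -> List[float]:
    """Solve K x = rhs for the tridiagonal matrix K with constant diagonal `diag`
    and constant sub/super-diagonal `off` (Thomas algorithm, no pivoting)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        m = diag - off * c[i - 1]
        c[i] = off / m
        d[i] = (rhs[i] - off * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def first_dirichlet_eigenvalue(n: int = 200, iters: int = 60) -> float:
    """Inverse iteration for the smallest eigenvalue of the finite-difference
    matrix of -d^2/dx^2 on (0, 1) with u(0) = u(1) = 0 (n interior nodes)."""
    h = 1.0 / (n + 1)
    diag, off = 2.0 / h ** 2, -1.0 / h ** 2
    u = [1.0] * n  # positive start vector: it overlaps the first (positive) mode
    for _ in range(iters):
        w = thomas_solve(diag, off, u)
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    # Rayleigh quotient u^T K u for the normalized iterate u
    ku = [diag * u[i]
          + off * ((u[i - 1] if i > 0 else 0.0) + (u[i + 1] if i < n - 1 else 0.0))
          for i in range(n)]
    return sum(a * b for a, b in zip(u, ku))


print(first_dirichlet_eigenvalue(), math.pi ** 2)
```

The discrete eigenvalue $4h^{-2}\sin^2(\pi h/2)$ approaches $\pi^2$ from below at rate $O(h^2)$. The inverse iteration converges quickly because the ratio of the two smallest eigenvalues is about $1/4$.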

Other related results and applications for eigenvalue 
problems in the form of hemivariational inequalities 
are given in [3,4,5,6,7] and the references therein. 
Let $V = H^1(\Omega;\mathbb{R}^N)$, $N \ge 1$, be the vector-valued Sobolev space of functions that are square integrable together with their first partial distributional derivatives in $\Omega$. Here $\Omega$ is a bounded domain in $\mathbb{R}^m$, $m \ge 2$, with sufficiently smooth boundary $\Gamma$. Assume that $V$ is compactly imbedded into $L^p(\Omega;\mathbb{R}^N)$ ($1 < p < 2m/(m-2)$, [12]). We write $\|\cdot\|_V$ and $\|\cdot\|_{L^p(\Omega;\mathbb{R}^N)}$ for the norms in $V$ and $L^p(\Omega;\mathbb{R}^N)$, respectively. The symbol $\langle\cdot,\cdot\rangle_V$ denotes the pairing over $V^* \times V$, $V^*$ being the dual of $V$.

Let $A\colon V \to V^*$ be a bounded, pseudomonotone operator. This means that $A$ maps bounded sets into bounded sets and that the following conditions hold [3,5]:

i) the effective domain of $A$ coincides with the whole of $V$;

ii) if $u_n \to u$ weakly in $V$ and $\limsup_{n\to\infty}\langle Au_n, u_n - u\rangle_V \le 0$, then $\liminf_{n\to\infty}\langle Au_n, u_n - v\rangle_V \ge \langle Au, u - v\rangle_V$ for any $v \in V$.

Note that i) and ii) imply that $A$ is demicontinuous, i.e.

iii) if $u_n \to u$ strongly in $V$, then $Au_n \to Au$ weakly in $V^*$.


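Moreover, we assume that $V$ has a direct sum decomposition $V = \bar V \oplus V_0$, where $V_0$ is a finite-dimensional linear subspace, with respect to which $A$ is semicoercive. That is, for every $u \in V$ there exist $\bar u \in \bar V$ and $\theta \in V_0$ such that $u = \bar u + \theta$ and

$$ \langle Au, u\rangle_V \ge c(\|\bar u\|_V)\,\|\bar u\|_V, \tag{1} $$

where $c\colon \mathbb{R}_+ \to \mathbb{R}$ is a coercivity function with $c(r) \to \infty$ as $r \to \infty$. Further, let $j\colon \mathbb{R}^N \to \mathbb{R}$ be a locally Lipschitz function fulfilling the unilateral growth conditions ([16,21])

$$ j^0(\xi; \eta - \xi) \le \alpha(r)(1 + |\xi|^{\sigma}), \quad \forall \xi, \eta \in \mathbb{R}^N,\ |\eta| \le r,\ r \ge 0, \tag{2} $$

and

$$ j^0(\xi; -\xi) \le k|\xi|, \quad \forall \xi \in \mathbb{R}^N, \tag{3} $$

where $1 \le \sigma < p$, $k$ is a nonnegative constant and $\alpha\colon \mathbb{R}_+ \to \mathbb{R}_+$ is a nondecreasing function. Here $j^0(\cdot;\cdot)$ stands for the Clarke directional derivative

$$ j^0(\xi; \eta) = \limsup_{\substack{h\to 0\\ \lambda\to 0^+}} \frac{j(\xi + h + \lambda\eta) - j(\xi + h)}{\lambda}, \quad \xi, \eta \in \mathbb{R}^N. \tag{4} $$

The Clarke generalized gradient of $j$ is defined by means of $j^0$ [6]:

$$ \partial j(\xi) = \{ w \in \mathbb{R}^N : j^0(\xi; \eta) \ge w\cdot\eta,\ \forall \eta \in \mathbb{R}^N \}, \quad \xi \in \mathbb{R}^N. $$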
Remark 1 The unilateral growth condition (2) generalizes the well-known sign condition used in the study of nonlinear partial differential equations in scalar-valued function spaces (cf. [27,28]).


Consider the problem of finding $u \in V$ that satisfies the hemivariational inequality

$$ \langle Au - g, v - u\rangle_V + \int_\Omega j^0(u; v - u)\,d\Omega \ge 0, \quad \forall v \in V. \tag{5} $$

It will be assumed that $g \in V^*$ fulfills the compatibility condition

$$ \langle g, \theta\rangle_V < \int_\Omega j^\infty(\theta)\,d\Omega, \quad \forall \theta \in V_0 \setminus \{0\}, \tag{6} $$

where $j^\infty\colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ is the recession functional given by (cf. [2,4,10])

$$ j^\infty(\xi) = \liminf_{\substack{t\to+\infty\\ \eta\to\xi}} \big[-j^0(t\eta; -\eta)\big], \quad \xi \in \mathbb{R}^N. \tag{7} $$

Because of (1), the problem considered here will be called a semicoercive hemivariational inequality.
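The recession functional (7) can be probed numerically along the single path $\eta = \xi$, $t \to +\infty$. The Python sketch below assumes the hypothetical smooth convex choice $N = 1$, $j(\xi) = \sqrt{1 + \xi^2}$, which is not from the text. This $j$ satisfies (2) with $\alpha(r) = r$, $\sigma = 1$, and (3) with $k = 0$. Since $j$ is $C^1$, its Clarke derivative is $j^0(\xi; v) = j'(\xi)v$, and $-j^0(t\xi; -\xi) = t\xi^2/\sqrt{1 + t^2\xi^2}$ increases to $|\xi|$. For this $j$ the limit is the same along every path $\eta \to \xi \ne 0$. Hence $j^\infty(\xi) = |\xi|$, and (6) would read $\langle g, \theta\rangle_V < \int_\Omega |\theta|\,d\Omega$.

```python
import math


def clarke_derivative(xi: float, v: float) -> float:
    """For the hypothetical C^1 example j(xi) = sqrt(1 + xi^2): j0(xi; v) = j'(xi) * v."""
    return xi / math.sqrt(1.0 + xi * xi) * v


def recession_probe(xi: float, t: float) -> float:
    """-j0(t*xi; -xi): the quantity whose liminf (t -> +inf, eta -> xi) defines j_inf(xi)."""
    return -clarke_derivative(t * xi, -xi)


for t in (1.0, 10.0, 1e3, 1e6):
    print(t, recession_probe(2.0, t))
```

Fixing $\eta = \xi$ only probes one path. For a nonsmooth $j$ the liminf in (7) over all paths can be strictly smaller.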

The notion of hemivariational inequality was first introduced by P.D. Panagiotopoulos in [22,23]. It was designed to describe important problems in physics and engineering where nonmonotone, multivalued boundary or interface conditions occur. It also covers cases where nonmonotone, multivalued relations between stress and strain, or between reaction and displacement, must be taken into account. The theory of hemivariational inequalities (as a generalization of variational inequalities, cf. [7]) has proved very useful in understanding many problems of mechanics involving nonconvex, nonsmooth energy functionals. For the general study of hemivariational inequalities and their applications, see [13,14,15,17,18,19,20,21,24,26] and the references quoted there. Some results on static, semicoercive inequality problems can be found in [9,10,25].

To prove the existence of solutions to (5), the 
Galerkin method combined with the pseudomonotone 
regularization of the nonlinearities will be applied. 

Let us start with the following preliminary results. 

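The regularization $j_R^0(\cdot;\cdot)$, $R > 0$, of the Clarke directional derivative $j^0(\cdot;\cdot)$ is defined as follows: for any $\xi, \eta \in \mathbb{R}^N$, set

$$ j_R^0(\xi; \eta) = \begin{cases} j^0(\xi; \eta) & \text{if } |\xi| \le R, \\ j^0\big(R\xi/|\xi|; \eta\big) & \text{if } |\xi| > R. \end{cases} \tag{8} $$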


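Lemma 2 Suppose that (2) and (3) are fulfilled. Then for $R > 0$,

$$ j_R^0(\xi; \eta - \xi) \le \hat\alpha(r)(1 + |\xi|^{\sigma}), \quad \forall \xi \in \mathbb{R}^N,\ \forall \eta \in \mathbb{R}^N,\ |\eta| \le r,\ r \ge 0, \tag{9} $$

$$ j_R^0(\xi; -\xi) \le k|\xi|, \quad \forall \xi \in \mathbb{R}^N, \tag{10} $$

where $\hat\alpha\colon \mathbb{R}_+ \to \mathbb{R}_+$ is a nondecreasing function independent of $R$.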


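Proof For $|\xi| \le R$, (9) and (10) reduce to (2) and (3). It therefore suffices to consider the case $|\xi| > R$ and to invoke the estimates

$$ j_R^0(\xi; \eta - \xi) = j^0\Big(\frac{R\xi}{|\xi|}; \eta - \xi\Big) \le j^0\Big(\frac{R\xi}{|\xi|}; \eta - \frac{R\xi}{|\xi|}\Big) + \frac{|\xi| - R}{R}\, j^0\Big(\frac{R\xi}{|\xi|}; -\frac{R\xi}{|\xi|}\Big) \le \alpha(r)(1 + R^{\sigma}) + \frac{|\xi| - R}{R}\,kR \le \alpha(r)(1 + |\xi|^{\sigma}) + k|\xi|, $$

for all $\xi, \eta \in \mathbb{R}^N$ with $|\eta| \le r$, $r \ge 0$, and

$$ j_R^0(\xi; -\xi) = j^0\Big(\frac{R\xi}{|\xi|}; -\xi\Big) = \frac{|\xi|}{R}\, j^0\Big(\frac{R\xi}{|\xi|}; -\frac{R\xi}{|\xi|}\Big) \le \frac{|\xi|}{R}\,kR = k|\xi|. $$

The first estimate uses the subadditivity and positive homogeneity of $j^0(\zeta;\cdot)$. Since $|\xi| \le 1 + |\xi|^{\sigma}$ for $\sigma \ge 1$, (9) holds with $\hat\alpha(r) = \alpha(r) + k$. The proof is complete.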
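Lemma 2 can be checked numerically on a hypothetical example. The Python sketch below assumes $j(\xi) = -|\xi|$ on $\mathbb{R}^N$, which is not from the text. For this $j$, $j^0(\xi; \eta) = -\xi\cdot\eta/|\xi|$ when $\xi \ne 0$ and $j^0(0; \eta) = |\eta|$. It satisfies (2) with $\alpha(r) = 1 + r$, $\sigma = 1$, and (3) with $k = 1$, so (9) and (10) should hold with $\hat\alpha(r) = 2 + r$. The function `j0_R` implements the regularization (8) by freezing the first argument on the sphere $|\xi| = R$. Random sampling is only a sanity check, not a proof.

```python
import math
import random
from typing import List

Vec = List[float]


def norm(v: Vec) -> float:
    return math.sqrt(sum(c * c for c in v))


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def j0(xi: Vec, eta: Vec) -> float:
    """Clarke directional derivative of the hypothetical example j(xi) = -|xi|."""
    n = norm(xi)
    if n == 0.0:
        return norm(eta)          # dj(0) is the closed unit ball
    return -dot(xi, eta) / n      # j is smooth away from the origin


def j0_R(xi: Vec, eta: Vec, R: float) -> float:
    """Regularization (8): evaluate j0 at R*xi/|xi| whenever |xi| > R."""
    n = norm(xi)
    if n <= R:
        return j0(xi, eta)
    return j0([R * c / n for c in xi], eta)


def check_lemma2(samples: int = 2000, N: int = 3, r: float = 2.0,
                 R: float = 1.5, seed: int = 0) -> bool:
    """Sample xi and eta with |eta| <= r; test (9) with alpha_hat(r) = 2 + r and (10) with k = 1."""
    rng = random.Random(seed)
    for _ in range(samples):
        xi = [rng.uniform(-5.0, 5.0) for _ in range(N)]
        eta = [rng.uniform(-1.0, 1.0) for _ in range(N)]
        size = norm(eta)
        if size > r:
            eta = [c * r / size for c in eta]
        if j0_R(xi, [-c for c in xi], R) > norm(xi) + 1e-12:             # (10)
            return False
        diff = [a - b for a, b in zip(eta, xi)]
        if j0_R(xi, diff, R) > (2.0 + r) * (1.0 + norm(xi)) + 1e-12:     # (9)
            return False
    return True


print(check_lemma2())
```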


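For any $R > 0$, the following regularization of the primal problem can be formulated:

$(P_R)$ Find $(u_R, \chi_R) \in V \times L^q(\Omega;\mathbb{R}^N)$, $1/p + 1/q = 1$, such that

$$ \langle Au_R - g, v - u_R\rangle_V + \int_\Omega \chi_R\cdot(v - u_R)\,d\Omega = 0, \quad \forall v \in V, \tag{11} $$
$$ \chi_R \in \Gamma_R(u_R), \tag{12} $$

where

$$ \Gamma_R(u) := \Big\{ \psi \in L^q(\Omega;\mathbb{R}^N) : \int_\Omega \psi\cdot v\,d\Omega \le \int_\Omega j_R^0(u; v)\,d\Omega,\ \forall v \in L^p(\Omega;\mathbb{R}^N) \Big\}. $$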

To show that $(P_R)$ has solutions, we need the following auxiliary result.

Lemma 3 Suppose that (1)-(3) and (6) hold. Then there exists $R_0 > 0$ such that, for any $R > R_0$, the set of all $u \in V$ with the property

$$ \langle Au - g, u\rangle_V - \int_\Omega j_R^0(u; -u)\,d\Omega \le 0 \tag{13} $$

is bounded in $V$. That is, there exists $M > 0$ (possibly depending on $R > R_0$) such that (13) implies

$$ \|u\|_V \le M. \tag{14} $$


Proof Suppose, on the contrary, that the claim is false. Then there exists a sequence $\{u_n\}_{n=1}^\infty \subset V$ with the property

$$ \langle Au_n - g, u_n\rangle_V - \int_\Omega j_R^0(u_n; -u_n)\,d\Omega \le 0, \tag{15} $$

where $\|u_n\|_V \to \infty$ as $n \to \infty$. By hypothesis, each element $u_n$ can be represented as

$$ u_n = \bar u_n + e_n\theta_n, \tag{16} $$

where $\bar u_n \in \bar V$, $e_n \ge 0$, $\theta_n \in V_0$, $\|\theta_n\|_V = 1$, and $\langle Au_n, u_n\rangle_V \ge c(\|\bar u_n\|_V)\|\bar u_n\|_V$. Taking (10) into account, it follows that

$$ \langle Au_n - g, u_n\rangle_V - \int_\Omega j_R^0(u_n; -u_n)\,d\Omega \ge c(\|\bar u_n\|_V)\|\bar u_n\|_V - \|g\|_{V^*}\|\bar u_n\|_V - e_n\langle g, \theta_n\rangle_V - k\int_\Omega |u_n|\,d\Omega $$
$$ \ge c(\|\bar u_n\|_V)\|\bar u_n\|_V - \|g\|_{V^*}(\|\bar u_n\|_V + e_n) - k_1\|\bar u_n\|_V - e_n k_1\|\theta_n\|_V, \tag{17} $$

where $k_1$ is a constant. These estimates imply that $\{e_n\}$ is unbounded. Otherwise, because of the behavior of $c(\cdot)$ at infinity, $\{\bar u_n\}$ would have to be bounded, contradicting $\|u_n\|_V \to \infty$ as $n \to \infty$. Therefore we may suppose without loss of generality that $e_n \to +\infty$ as $n \to \infty$.
The next claim is that

$$ \frac{\bar u_n}{e_n} \to 0 \quad \text{strongly in } V. \tag{18} $$

If $\{\|\bar u_n\|_V\}$ is bounded, (18) follows immediately. Suppose instead that $\|\bar u_n\|_V \to \infty$, so that $c(\|\bar u_n\|_V) \to +\infty$. From (15) and (17) one has

$$ k_1 + \|g\|_{V^*} \ge \big(c(\|\bar u_n\|_V) - \|g\|_{V^*} - k_1\big)\frac{\|\bar u_n\|_V}{e_n}. $$

Thus the sequence

$$ \big(c(\|\bar u_n\|_V) - \|g\|_{V^*} - k_1\big)\frac{\|\bar u_n\|_V}{e_n} $$

is bounded. Since $c(\|\bar u_n\|_V) - \|g\|_{V^*} - k_1 \to +\infty$ as $n \to \infty$, this implies (18). These results give the following representation of $u_n$:

$$ u_n = e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big), $$

where $\bar u_n/e_n \to 0$ strongly in $V$, and $\theta_n \to \theta$ in $V_0$ as $n \to \infty$ for some $\theta \in V_0$ with $\|\theta\|_V = 1$ (recall that $V_0$ is finite dimensional). Moreover, by the compact imbedding $V \subset L^p(\Omega;\mathbb{R}^N)$ we may suppose that $\bar u_n/e_n \to 0$ and $\theta_n \to \theta$ a.e. in $\Omega$.

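Further, (15) and the semicoercivity of $A$ lead to

$$ 0 \ge \langle Au_n - g, u_n\rangle_V - \int_\Omega j_R^0(u_n; -u_n)\,d\Omega \ge \big(c(\|\bar u_n\|_V) - \|g\|_{V^*}\big)\|\bar u_n\|_V - e_n\langle g, \theta_n\rangle_V + e_n\int_\Omega -j_R^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big)\,d\Omega. $$

Hence

$$ \langle g, \theta_n\rangle_V \ge \big(c(\|\bar u_n\|_V) - \|g\|_{V^*}\big)\frac{\|\bar u_n\|_V}{e_n} + \int_\Omega -j_R^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big)\,d\Omega. \tag{19} $$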


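Now observe that

$$ \big(c(\|\bar u_n\|_V) - \|g\|_{V^*}\big)\frac{\|\bar u_n\|_V}{e_n} \to 0 \quad \text{as } n \to \infty $$

if $\{\|\bar u_n\|_V\}$ is bounded, while

$$ \big(c(\|\bar u_n\|_V) - \|g\|_{V^*}\big)\frac{\|\bar u_n\|_V}{e_n} \ge 0 $$

for sufficiently large $n$ if $\|\bar u_n\|_V \to \infty$. Therefore, in either case,

$$ \liminf_{n\to\infty}\big(c(\|\bar u_n\|_V) - \|g\|_{V^*}\big)\frac{\|\bar u_n\|_V}{e_n} \ge 0. $$

Moreover, (10) yields the estimate

$$ -j_R^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big) \ge -k\Big|\frac{\bar u_n}{e_n} + \theta_n\Big|. \tag{20} $$

This allows us to apply Fatou's lemma in (19), which leads to

$$ \langle g, \theta\rangle_V \ge \int_\Omega \liminf_{n\to\infty}\Big[-j_R^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big)\Big]\,d\Omega. \tag{21} $$

Using (8) and the upper semicontinuity of $j^0(\cdot;\cdot)$, one can easily verify that, wherever $\theta(x) \neq 0$,

$$ \liminf_{n\to\infty}\Big[-j_R^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big)\Big] \ge -j^0\Big(\frac{R\theta}{|\theta|}; -\theta\Big). $$

This leads to

$$ \langle g, \theta\rangle_V \ge \int_\Omega -j^0\Big(\frac{R\theta}{|\theta|}; -\theta\Big)\,d\Omega, \tag{22} $$

where the integrand is taken to be zero on the set where $\theta(x) = 0$; there the right-hand side of (20) tends to zero. Since $j^\infty(\cdot)$ is lower semicontinuous and $V_0$ is finite dimensional, (6) implies that a $\delta > 0$ can be found such that for any $\theta \in V_0$ with $\|\theta\|_V = 1$,

$$ \langle g, \theta\rangle_V + \delta \le \int_\Omega j^\infty(\theta)\,d\Omega. \tag{23} $$

With the help of Fatou's lemma (permitted by (20)) we arrive at

$$ \liminf_{R\to\infty}\int_\Omega -j^0\Big(\frac{R\theta}{|\theta|}; -\theta\Big)\,d\Omega \ge \int_\Omega j^\infty(\theta)\,d\Omega. $$

The upper semicontinuity of $j^0(\cdot;\cdot)$ yields $R_\theta > 0$ and $\varepsilon_\theta > 0$ such that

$$ \int_\Omega -j^0\Big(\frac{R\theta'}{|\theta'|}; -\theta'\Big)\,d\Omega > \int_\Omega j^\infty(\theta)\,d\Omega - \frac{\delta}{2} $$

for each $R > R_\theta$ and each $\theta' \in V_0$ with $\|\theta - \theta'\|_V < \varepsilon_\theta$. As the sphere $\{v \in V_0 : \|v\|_V = 1\}$ is compact in $V_0$, there exists $R_0 > 0$ such that

$$ \int_\Omega -j^0\Big(\frac{R\theta}{|\theta|}; -\theta\Big)\,d\Omega > \int_\Omega j^\infty(\theta)\,d\Omega - \frac{\delta}{2} $$

for any $\theta \in V_0$ with $\|\theta\|_V = 1$ and any $R > R_0$. Combined with (23), this contradicts (22). We have thus established a constant $M > 0$ such that (13) implies (14) whenever $R > R_0$. The proof of Lemma 3 is complete.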
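The passage from (15) to (19), and the lower bound (20), rest on the positive homogeneity of $j_R^0(\xi; \cdot)$ in its second argument, which is inherited from $j^0$ through (8). Written out with the auxiliary notation $\zeta_n := \bar u_n/e_n + \theta_n$, introduced here only for brevity, the steps are:

```latex
% u_n = e_n \zeta_n with e_n > 0, and j_R^0(\xi;\lambda\eta) = \lambda\, j_R^0(\xi;\eta) for \lambda \ge 0:
j_R^0(u_n; -u_n) = j_R^0(e_n\zeta_n; -e_n\zeta_n) = e_n\, j_R^0(e_n\zeta_n; -\zeta_n),

% and (10) at \xi = e_n\zeta_n, divided by e_n, gives (20):
-\,j_R^0(e_n\zeta_n; -\zeta_n) = -\frac{1}{e_n}\, j_R^0(e_n\zeta_n; -e_n\zeta_n)
  \ge -\frac{1}{e_n}\,k\,|e_n\zeta_n| = -\,k\,|\zeta_n| .
```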


Proposition 4 Assume all the hypotheses stated above. Then for any $R > R_0$ problem $(P_R)$ has at least one solution. Moreover, if $(u_R, \chi_R)$ is a solution of $(P_R)$, then

$$ \|u_R\|_V \le M \tag{24} $$

for some constant $M$ not depending on $R > R_0$.


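Proof Let $\Lambda$ be the family of all finite-dimensional subspaces $F$ of $V$, ordered by inclusion. Denote by $i_F\colon F \to V$ the inclusion mapping of $F$ into $V$, and by $i_F^*\colon V^* \to F^*$ the dual projection mapping of $V^*$ onto the dual $F^*$ of $F$. The pairing over $F^* \times F$ will be denoted by $\langle\cdot,\cdot\rangle_F$. Set $A_F := i_F^* \circ A \circ i_F$ and $g_F := i_F^* g$.

Fix $R > R_0$. For any $F \in \Lambda$ consider the following finite-dimensional regularization of $(P_R)$:

$(P_F)$ Find $(u_F, \chi_F) \in F \times L^q(\Omega;\mathbb{R}^N)$ such that

$$ \langle A_F u_F - g_F, v\rangle_F + \int_\Omega \chi_F\cdot v\,d\Omega = 0, \quad \forall v \in F, \tag{25} $$
$$ \chi_F \in \Gamma_R(u_F). \tag{26} $$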
The first task is to show that $(P_F)$ has solutions for each $F \in \Lambda$. Note that $\Gamma_R(\cdot)$ has nonempty, convex and closed values. Moreover, if $\psi \in \Gamma_R(v)$, $v \in L^p(\Omega;\mathbb{R}^N)$, then

$$ \|\psi\|_{L^q(\Omega;\mathbb{R}^N)} \le K_R, \tag{27} $$

for some $K_R > 0$ depending on the Lipschitz constant of $j$ in the ball $\{\eta \in \mathbb{R}^N : |\eta| \le R\}$. The upper semicontinuity of $j_R^0(\cdot;\cdot)$ and Fatou's lemma then show that $\Gamma_R$ is upper semicontinuous from $L^p(\Omega;\mathbb{R}^N)$ to $L^q(\Omega;\mathbb{R}^N)$, the latter endowed with the weak topology.

Further, let $\tau_F\colon L^q(\Omega;\mathbb{R}^N) \to F^*$ be the operator that assigns to any $\psi \in L^q(\Omega;\mathbb{R}^N)$ the element $\tau_F\psi \in F^*$ defined by

$$ \langle \tau_F\psi, v\rangle_F = \int_\Omega \psi\cdot v\,d\Omega \quad \text{for any } v \in F. \tag{28} $$

Note that $\tau_F$ is linear and continuous from the weak topology of $L^q(\Omega;\mathbb{R}^N)$ to the (unique) topology on $F^*$. Therefore the map $G_F\colon F \to 2^{F^*}$ given by

$$ G_F(v_F) := \tau_F\,\Gamma_R(v_F) \quad \text{for } v_F \in F \tag{29} $$

is upper semicontinuous.

By the pseudomonotonicity of $A$, the operator $A_F\colon F \to F^*$ is continuous. Thus $A_F + G_F - g_F\colon F \to 2^{F^*}$ is an upper semicontinuous multivalued mapping with nonempty, bounded, closed and convex values. Moreover, for any $v_F \in F$ and $\psi_F \in G_F(v_F)$ one has

$$ \langle A_F v_F + \psi_F - g_F, v_F\rangle_F \ge \langle Av_F - g, v_F\rangle_V - \int_\Omega j_R^0(v_F; -v_F)\,d\Omega. \tag{30} $$

Hence, by Lemma 3, for $R > R_0$ there exists $M > 0$ not depending on $F \in \Lambda$ such that $\|v_F\|_V = M + 1$ implies

$$ \langle A_F v_F + \psi_F - g_F, v_F\rangle_F > 0. \tag{31} $$

Accordingly, one can invoke [1, Corol. 3, p. 337] to deduce the existence of $u_F \in F$ with

$$ \|u_F\|_V \le M + 1 \tag{32} $$

such that $0 \in A_F u_F + G_F(u_F) - g_F$. This means that $A_F u_F + \tau_F\chi_F - g_F = 0$ for some $\chi_F \in \Gamma_R(u_F)$, i.e. $(u_F, \chi_F)$ is a solution of $(P_F)$.


Next we show that $(P_R)$, $R > R_0$, has solutions. For $F \in \Lambda$, let

$$ W_F = \bigcup_{F' \in \Lambda,\ F' \supset F} \big\{ u_{F'} \in F' : (u_{F'}, \chi_{F'}) \text{ satisfies } (P_{F'}) \text{ for some } \chi_{F'} \in L^q(\Omega;\mathbb{R}^N) \big\}. $$

Let $\mathrm{weakcl}(W_F)$ denote the closure of $W_F$ in the weak topology of $V$. From (32) one gets

$$ \mathrm{weakcl}(W_F) \subset B_V(0, M+1), \quad \forall F \in \Lambda, $$

where $B_V(0, M+1) := \{v \in V : \|v\|_V \le M + 1\}$. Thus the family $\{\mathrm{weakcl}(W_F) : F \in \Lambda\}$ lies in the weakly compact set $B_V(0, M+1)$ of $V$. Further, for any $F_1, \dots, F_k \in \Lambda$, $k = 1, 2, \dots$, the inclusion $W_{F_1} \cap \dots \cap W_{F_k} \supset W_F$ holds with $F = F_1 + \dots + F_k$. Therefore the family $\{\mathrm{weakcl}(W_F) : F \in \Lambda\}$ has the finite intersection property, so $\bigcap_{F\in\Lambda}\mathrm{weakcl}(W_F)$ is not empty. From now on, let $u_R \in B_V(0, M+1)$ belong to this intersection.

Fix $v \in V$ arbitrarily and choose $F \in \Lambda$ such that $u_R, v \in F$. Then there exists a sequence $\{u_{F_n}\} \subset W_F$ with $u_{F_n} \to u_R$ weakly in $V$. Let $\chi_{F_n} \in \Gamma_R(u_{F_n})$ be such that $(u_{F_n}, \chi_{F_n})$ is a solution of $(P_{F_n})$. For simplicity of notation we write $\{u_n\}$ and $\{\chi_n\}$ instead of $\{u_{F_n}\}$ and $\{\chi_{F_n}\}$. Therefore

$$ \langle Au_n - g, w - u_n\rangle_V + \int_\Omega \chi_n\cdot(w - u_n)\,d\Omega = 0, \quad \forall w \in F_n. \tag{33} $$

Since $\|\chi_n\|_{L^q(\Omega;\mathbb{R}^N)} \le K_R$ and $L^q(\Omega;\mathbb{R}^N)$ is reflexive, we may suppose that $\chi_n \to \chi_R$ weakly in $L^q(\Omega;\mathbb{R}^N)$ for some $\chi_R \in L^q(\Omega;\mathbb{R}^N)$. By hypothesis the imbedding $V \subset L^p(\Omega;\mathbb{R}^N)$ is compact, so $u_n \to u_R$ strongly in $L^p(\Omega;\mathbb{R}^N)$. Consequently, by the upper semicontinuity of $\Gamma_R$ from $L^p(\Omega;\mathbb{R}^N)$ to $L^q(\Omega;\mathbb{R}^N)$ (weak topology on the latter), $\chi_R \in \Gamma_R(u_R)$, i.e. (12) holds. Moreover, $\int_\Omega \chi_n\cdot(u_R - u_n)\,d\Omega \to 0$ as $n \to \infty$. Together with (33) for $w = u_R$, this gives $\lim_{n\to\infty}\langle Au_n, u_n - u_R\rangle_V = 0$. The pseudomonotonicity of $A$ then yields $\langle Au_n, u_n\rangle_V \to \langle Au_R, u_R\rangle_V$ and $Au_n \to Au_R$ weakly in $V^*$. Finally, substituting $w = v$ in (33) and letting $n \to \infty$ gives (11), since $v \in V$ was arbitrary. Thus $(P_R)$ has solutions.

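We now prove that the solutions $\{u_R\}_{R>R_0}$ of $(P_R)$ are bounded. Suppose, on the contrary, that this is false. Then, by (11) (with $v = 0$) and (12), there would exist a sequence $R_n \to \infty$ such that $\|u_{R_n}\|_V \to \infty$ as $n \to \infty$ and

$$ \langle Au_{R_n} - g, u_{R_n}\rangle_V - \int_\Omega j_{R_n}^0(u_{R_n}; -u_{R_n})\,d\Omega \le 0. \tag{34} $$

From now on, for simplicity of notation, we write the subscript $n$ instead of $R_n$. Inequality (34) allows us to follow the proof of Lemma 3. First, one arrives in the same way at the representation

$$ u_n = e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big), $$

with $\bar u_n/e_n \to 0$ strongly in $V$ and $\theta_n \to \theta$ in $V_0$ as $n \to \infty$, for some $\theta \in V_0$ with $\|\theta\|_V = 1$. Secondly, one obtains the counterpart of (21), where now, by (8),

$$ -j_{R_n}^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big) = -j^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big) $$

if $|\bar u_n + e_n\theta_n| \le R_n$, and

$$ -j_{R_n}^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big) = -j^0\Big(R_n\frac{\bar u_n/e_n + \theta_n}{|\bar u_n/e_n + \theta_n|}; -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big) $$

if $|\bar u_n + e_n\theta_n| > R_n$. Using (7), we easily conclude that

$$ \liminf_{n\to\infty}\Big[-j_{R_n}^0\Big(e_n\Big(\frac{\bar u_n}{e_n} + \theta_n\Big); -\Big(\frac{\bar u_n}{e_n} + \theta_n\Big)\Big)\Big] \ge j^\infty(\theta). $$

Consequently, by Fatou's lemma,

$$ \langle g, \theta\rangle_V \ge \int_\Omega j^\infty(\theta)\,d\Omega, $$

contrary to (6). Thus $\{u_R\}_{R>R_0}$ is bounded, and the proof of Proposition 4 is complete.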


The next result concerns the compactness of $\{\chi_R : R > R_0\}$ in $L^1(\Omega;\mathbb{R}^N)$.

Proposition 5 Let a pair $(u_R, \chi_R) \in V \times L^q(\Omega;\mathbb{R}^N)$ be a solution of $(P_R)$. Then the set

$$ \big\{ \chi_R \in L^q(\Omega;\mathbb{R}^N) : (u_R, \chi_R) \text{ is a solution of } (P_R) \text{ for some } u_R \in V,\ R > R_0 \big\} $$

is weakly precompact in $L^1(\Omega;\mathbb{R}^N)$.

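Proof By the Dunford-Pettis theorem [8], it suffices to show the following: for each $\varepsilon > 0$ there is a $\delta > 0$ such that for any $\omega \subset \Omega$ with $\mathrm{meas}\,\omega < \delta$,

$$ \int_\omega |\chi_R|\,d\Omega < \varepsilon, \quad R > R_0. \tag{36} $$

Fix $r > 0$ and let $\eta \in \mathbb{R}^N$ satisfy $|\eta| \le r$. By (9), from $\chi_R\cdot(\eta - u_R) \le j_R^0(u_R; \eta - u_R)$ it follows that

$$ \chi_R\cdot\eta \le \chi_R\cdot u_R + \hat\alpha(r)(1 + |u_R|^{\sigma}) \tag{37} $$

a.e. in $\Omega$. Let us set

$$ \eta = \frac{r}{\sqrt N}\big(\mathrm{sgn}\,\chi_{R1}, \dots, \mathrm{sgn}\,\chi_{RN}\big). $$

Here $\chi_{Ri}$, $i = 1, \dots, N$, are the components of $\chi_R$, and $\mathrm{sgn}\,y$ equals $1$ if $y > 0$, $0$ if $y = 0$, and $-1$ if $y < 0$. It is easy to verify that $|\eta| \le r$ for almost all $x \in \Omega$ and that

$$ \chi_R\cdot\eta = \frac{r}{\sqrt N}\sum_{i=1}^N |\chi_{Ri}| \ge \frac{r}{\sqrt N}|\chi_R|. $$

Therefore, by (37),

$$ \frac{r}{\sqrt N}|\chi_R| \le \chi_R\cdot u_R + \hat\alpha(r)(1 + |u_R|^{\sigma}). $$

Integrating this inequality over $\omega \subset \Omega$ and applying Hölder's inequality yields

$$ \int_\omega |\chi_R|\,d\Omega \le \frac{\sqrt N}{r}\int_\omega \chi_R\cdot u_R\,d\Omega + \frac{\sqrt N}{r}\hat\alpha(r)\,\mathrm{meas}\,\omega + \frac{\sqrt N}{r}\hat\alpha(r)(\mathrm{meas}\,\omega)^{(p-\sigma)/p}\|u_R\|_{L^p(\Omega;\mathbb{R}^N)}^{\sigma}. \tag{38} $$

Thus, from (24) one obtains

$$ \int_\omega |\chi_R|\,d\Omega \le \frac{\sqrt N}{r}\int_\omega \chi_R\cdot u_R\,d\Omega + \frac{\sqrt N}{r}\hat\alpha(r)\,\mathrm{meas}\,\omega + \frac{\sqrt N}{r}\hat\alpha(r)(\mathrm{meas}\,\omega)^{(p-\sigma)/p}\gamma^{\sigma}M^{\sigma}, \tag{39} $$

where $\gamma$ is the imbedding constant, $\|v\|_{L^p(\Omega;\mathbb{R}^N)} \le \gamma\|v\|_V$.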
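We now show that

$$ \int_\omega \chi_R\cdot u_R\,d\Omega \le C \tag{40} $$

for some positive constant $C$ not depending on $\omega \subset \Omega$ and $R > R_0$. Indeed, (10) readily gives

$$ \chi_R\cdot u_R + k|u_R| \ge 0 \quad \text{a.e. in } \Omega. $$

It follows that

$$ \int_\omega (\chi_R\cdot u_R + k|u_R|)\,d\Omega \le \int_\Omega (\chi_R\cdot u_R + k|u_R|)\,d\Omega, $$

and consequently

$$ \int_\omega \chi_R\cdot u_R\,d\Omega \le \int_\Omega \chi_R\cdot u_R\,d\Omega + 2k_1\|u_R\|_V. $$

But $A$ maps bounded sets into bounded sets. Therefore, by (11) and (24),

$$ \int_\Omega \chi_R\cdot u_R\,d\Omega = -\langle Au_R - g, u_R\rangle_V \le \|Au_R - g\|_{V^*}\|u_R\|_V \le C_0, \quad C_0 = \text{const}, $$

and (40) follows easily. Further, from (39) and (40), for $r > 0$,

$$ \int_\omega |\chi_R|\,d\Omega \le \frac{\sqrt N}{r}C + \frac{\sqrt N}{r}\hat\alpha(r)\,\mathrm{meas}\,\omega + \frac{\sqrt N}{r}\hat\alpha(r)(\mathrm{meas}\,\omega)^{(p-\sigma)/p}\gamma^{\sigma}M^{\sigma}. \tag{41} $$

This estimate is crucial for obtaining (36). Let $\varepsilon > 0$, and fix $r > 0$ with

$$ \frac{\sqrt N}{r}C < \frac{\varepsilon}{2}. $$

Then choose $\delta > 0$ small enough that

$$ \frac{\sqrt N}{r}\hat\alpha(r)\,\mathrm{meas}\,\omega + \frac{\sqrt N}{r}\hat\alpha(r)(\mathrm{meas}\,\omega)^{(p-\sigma)/p}\gamma^{\sigma}M^{\sigma} < \frac{\varepsilon}{2} $$

whenever $\mathrm{meas}\,\omega < \delta$. From (41) it follows that for any $\omega \subset \Omega$ with $\mathrm{meas}\,\omega < \delta$,

$$ \int_\omega |\chi_R|\,d\Omega < \varepsilon, \quad R > R_0. \tag{43} $$

Hence $\{\chi_R\}_{R>R_0}$ is equi-integrable, which proves its weak precompactness in $L^1(\Omega;\mathbb{R}^N)$ [8].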
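The choice of the test vector $\eta$ in the proof above is elementary but easy to misread. The Python sketch below works on a single value $\chi \in \mathbb{R}^N$; its function names are chosen here for illustration. It builds $\eta = (r/\sqrt N)(\mathrm{sgn}\,\chi_1, \dots, \mathrm{sgn}\,\chi_N)$ and checks the two facts used: $|\eta| \le r$, and $\chi\cdot\eta = (r/\sqrt N)\sum_i|\chi_i| \ge (r/\sqrt N)|\chi|$.

```python
import math
from typing import List


def sgn(y: float) -> int:
    """sgn y = 1, 0, -1 for y > 0, y = 0, y < 0."""
    return (y > 0) - (y < 0)


def sign_direction(chi: List[float], r: float) -> List[float]:
    """eta = (r / sqrt(N)) * (sgn chi_1, ..., sgn chi_N), as in the proof of Proposition 5."""
    scale = r / math.sqrt(len(chi))
    return [scale * sgn(c) for c in chi]


def check(chi: List[float], r: float) -> bool:
    """Verify |eta| <= r and chi . eta >= (r / sqrt(N)) |chi| (up to rounding)."""
    eta = sign_direction(chi, r)
    size_eta = math.sqrt(sum(e * e for e in eta))
    pairing = sum(c * e for c, e in zip(chi, eta))
    lower = r / math.sqrt(len(chi)) * math.sqrt(sum(c * c for c in chi))
    return size_eta <= r + 1e-12 and pairing >= lower - 1e-12


print(check([3.0, -4.0, 0.0], 2.0))
```

The inequality $\sum_i |\chi_i| \ge |\chi|$ is what turns the pairing into a pointwise bound on $|\chi_R|$.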


We can now formulate the main result.

Theorem 6 Let $A\colon V \to V^*$ be a pseudomonotone, bounded operator and $j\colon \mathbb{R}^N \to \mathbb{R}$ a locally Lipschitz function. Suppose that (1)-(3) and (6) hold. Then there exist $u \in V$ and $\chi \in L^1(\Omega;\mathbb{R}^N)$ such that

$$ \langle Au - g, v - u\rangle_V + \int_\Omega \chi\cdot(v - u)\,d\Omega = 0, \quad \forall v \in V \cap L^\infty(\Omega;\mathbb{R}^N), \tag{44} $$
$$ \chi \in \partial j(u) \ \text{a.e. in } \Omega, \qquad \chi\cdot u \in L^1(\Omega). \tag{45} $$

Moreover, the hemivariational inequality

$$ \langle Au - g, v - u\rangle_V + \int_\Omega j^0(u; v - u)\,d\Omega \ge 0, \quad \forall v \in V, \tag{46} $$

holds, where the integral is taken to be $+\infty$ if $j^0(u; v - u) \notin L^1(\Omega)$.


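Proof The proof is divided into a sequence of steps.

Step 1. By Propositions 4 and 5, we can extract from the set $\{u_R, \chi_R\}_{R>R_0}$ of solutions of $(P_R)$ a sequence $\{u_{R_n}, \chi_{R_n}\}$ with $R_n \to \infty$ as $n \to \infty$. For simplicity of notation it will be denoted by $(u_n, \chi_n)$. This sequence satisfies

$$ \langle Au_n - g, v - u_n\rangle_V + \int_\Omega \chi_n\cdot(v - u_n)\,d\Omega = 0, \quad \forall v \in V, \tag{47} $$

and

$$ \chi_n \in \Gamma_{R_n}(u_n), \qquad u_n \to u \ \text{weakly in } V, \qquad \chi_n \to \chi \ \text{weakly in } L^1(\Omega;\mathbb{R}^N), \tag{48} $$

for some $u \in V$ and $\chi \in L^1(\Omega;\mathbb{R}^N)$.

Recall that $A$ is bounded and that $\|u_n\|_V \le M$, so $\{Au_n\}$ is bounded in $V^*$. Hence, for some $B \in V^*$,

$$ Au_n \to B \quad \text{weakly in } V^* \tag{49} $$

(passing to a subsequence if necessary). Replace $v$ by $v + u_n$ in (47) and note that $v \in L^\infty$ is an admissible test function for weak convergence in $L^1$. Then (47) implies that the equality

$$ \langle B - g, v\rangle_V + \int_\Omega \chi\cdot v\,d\Omega = 0 \tag{50} $$

holds for any $v \in V \cap L^\infty(\Omega;\mathbb{R}^N)$.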

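Step 2. We now prove that $\chi \in \partial j(u)$ a.e. in $\Omega$, i.e. that the first condition in (45) holds. Since $V$ is compactly imbedded into $L^p(\Omega;\mathbb{R}^N)$, (48) allows us to suppose that

$$ u_n \to u \quad \text{strongly in } L^p(\Omega;\mathbb{R}^N). \tag{51} $$

Hence, for a subsequence of $\{u_n\}$ (again denoted by the same symbol), $u_n \to u$ a.e. in $\Omega$. By Egoroff's theorem, for any $\varepsilon > 0$ there is a subset $\omega \subset \Omega$ with $\mathrm{meas}\,\omega < \varepsilon$ such that $u_n \to u$ uniformly in $\Omega \setminus \omega$ and $u \in L^\infty(\Omega \setminus \omega;\mathbb{R}^N)$. Let $v \in L^\infty(\Omega \setminus \omega;\mathbb{R}^N)$ be arbitrary. Since $u_n$ remains pointwise uniformly bounded in $\Omega\setminus\omega$ and $R_n \to \infty$ as $n \to \infty$, we have the estimate

$$ \int_{\Omega\setminus\omega} \chi_n\cdot v\,d\Omega \le \int_{\Omega\setminus\omega} j_{R_n}^0(u_n; v)\,d\Omega = \int_{\Omega\setminus\omega} j^0(u_n; v)\,d\Omega \quad \text{(for large } n\text{)}. $$

We combine this estimate with three further facts:

- the weak convergence of $\chi_n$ to $\chi$ in $L^1(\Omega;\mathbb{R}^N)$;
- the strong convergence (51);
- the upper semicontinuity of the map

$$ L^\infty(\Omega\setminus\omega;\mathbb{R}^N) \ni u_n \mapsto \int_{\Omega\setminus\omega} j^0(u_n; v)\,d\Omega. $$

Together they yield

$$ \int_{\Omega\setminus\omega} \chi\cdot v\,d\Omega \le \int_{\Omega\setminus\omega} j^0(u; v)\,d\Omega, \quad \forall v \in L^\infty(\Omega\setminus\omega;\mathbb{R}^N). $$

This inequality implies that $\chi \in \partial j(u)$ a.e. in $\Omega\setminus\omega$. Since $\mathrm{meas}\,\omega < \varepsilon$ and $\varepsilon$ was arbitrary,

$$ \chi \in \partial j(u) \quad \text{a.e. in } \Omega, \tag{52} $$

as claimed.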

Step 3. We now show that $\chi\cdot u \in L^1(\Omega)$, i.e. that the second condition in (45) holds. For this purpose we need the following truncation result for vector-valued Sobolev spaces.

Theorem 7 ([20]) For each $v \in H^1(\Omega;\mathbb{R}^N)$ there exists a sequence of functions $\{\varepsilon_n\} \subset L^\infty(\Omega)$ with $0 \le \varepsilon_n \le 1$ such that

$$ \{(1 - \varepsilon_n)v\} \subset H^1(\Omega;\mathbb{R}^N) \cap L^\infty(\Omega;\mathbb{R}^N), \qquad (1 - \varepsilon_n)v \to v \ \text{strongly in } H^1(\Omega;\mathbb{R}^N). \tag{53} $$

Remark 8 For a truncation procedure of the form (53) in the case of a scalar-valued Sobolev space $W^{m,p}(\Omega)$, the reader is referred to [11].
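Theorem 7 is quoted from [20] without its construction. One simple candidate, assumed here purely for illustration and not claimed to be the construction of [20], is the radial truncation $\varepsilon_n = \max\{0, 1 - n/|v|\}$. It gives $(1 - \varepsilon_n)v = v\min\{1, n/|v|\}$, the pointwise projection of the value $v(x)$ onto the ball of radius $n$. The Python sketch below treats only this pointwise operation on sampled values; the Sobolev-space statements of Theorem 7 are not checked. It verifies that $0 \le \varepsilon_n \le 1$, that $|(1 - \varepsilon_n)v| \le n$, and that nothing changes once $n \ge \max|v|$. The name `radial_truncation` is hypothetical.

```python
import math
from typing import List, Tuple


def radial_truncation(v: List[float], n: float) -> Tuple[float, List[float]]:
    """Return (eps, (1 - eps) * v) with eps = max(0, 1 - n/|v|), so that
    (1 - eps) * v is the projection of the value v onto the ball |y| <= n."""
    size = math.sqrt(sum(c * c for c in v))
    eps = 0.0 if size <= n else 1.0 - n / size
    return eps, [(1.0 - eps) * c for c in v]


samples = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0], [0.0, 0.0]]
for level in (1.0, 5.0, 10.0):
    eps_values = [radial_truncation(v, level)[0] for v in samples]
    print(level, [round(e, 3) for e in eps_values])
```

As the level $n$ grows, the set where $\varepsilon_n \ne 0$ shrinks, which is the mechanism behind the convergence $(1 - \varepsilon_n)v \to v$.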


By Theorem 7, for $u \in V$ there is a sequence $\{\varepsilon_k\} \subset L^\infty(\Omega)$ with $0 \le \varepsilon_k \le 1$ such that $\tilde u_k := (1 - \varepsilon_k)u \in V \cap L^\infty(\Omega;\mathbb{R}^N)$ and $\tilde u_k \to u$ in $V$ as $k \to \infty$. Without loss of generality we may assume that $\tilde u_k \to u$ a.e. in $\Omega$. Since we already know that $\chi \in \partial j(u)$, (3) gives $\chi\cdot(-u) \le j^0(u; -u) \le k|u|$. Hence

$$ \chi\cdot\tilde u_k = (1 - \varepsilon_k)\chi\cdot u \ge -k|u|. \tag{54} $$

Thus the sequence $\{\chi\cdot\tilde u_k\}$ is bounded from below by an integrable function, and $\chi\cdot\tilde u_k \to \chi\cdot u$ a.e. in $\Omega$. On the other hand, (50) gives

$$ C \ge \langle -B + g, \tilde u_k\rangle_V = \int_\Omega \chi\cdot\tilde u_k\,d\Omega $$

for a positive constant $C$, because $\{\tilde u_k\}$ is bounded in $V$. Thus, by Fatou's lemma, $\chi\cdot u \in L^1(\Omega)$, as required.
Step 4. Now the inequality 


lim int f Xn- Un dQ = fu dQ (55) 


oo 
2 i?) 
will be established. It can be supposed that u, — u ae. 
in Q, because u, — u strongly in L?(2;R). Fix v € 
L™(92;R¥) arbitrarily. Since x, € Iz, (Un), Eq. (9) im- 
plies 


Xn°* (v—Un) < Teall — Un) 


S AV] poopaigny)A + |unl®). (56) 


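From Egoroff's theorem it follows that for any $\varepsilon>0$ a subset $\omega\subset\Omega$ with $\operatorname{meas}\omega<\varepsilon$ can be determined such that $u_n\to u$ uniformly in $\Omega\setminus\omega$. One can also suppose that $\omega$ is small enough to fulfill $\int_\omega\alpha(\|v\|_{L^\infty(\Omega;\mathbb{R}^N)})(1+|u_n|^\sigma)\,d\Omega<\varepsilon$, $n=1,2,\dots$, and $\int_\omega\alpha(\|v\|_{L^\infty(\Omega;\mathbb{R}^N)})(1+|u|^\sigma)\,d\Omega<\varepsilon$. Hence

$$\int_\Omega j^0_{\varepsilon_n}(u_n;v-u_n)\,d\Omega\le\int_{\Omega\setminus\omega}j^0_{\varepsilon_n}(u_n;v-u_n)\,d\Omega+\varepsilon\le\int_{\Omega\setminus\omega}j^0(u_n;v-u_n)\,d\Omega+\varepsilon\quad(\text{for large }n),$$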


which by Fatou's lemma and the upper semicontinuity of $j^0(\cdot;\cdot)$ yields

$$\liminf_{n\to\infty}\int_\Omega-j^0_{\varepsilon_n}(u_n;v-u_n)\,d\Omega\ge\int_\Omega-j^0(u;v-u)\,d\Omega-2\varepsilon.$$


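By the arbitrariness of $\varepsilon>0$ and (56) one obtains

$$\liminf_{n\to\infty}\int_\Omega\chi_n\cdot u_n\,d\Omega\ge\int_\Omega\chi\cdot v\,d\Omega-\int_\Omega j^0(u;v-u)\,d\Omega\qquad\forall v\in V\cap L^\infty(\Omega;\mathbb{R}^N). \tag{57}$$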
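By substituting $v=\tilde u_k:=(1-\varepsilon_k)u$ (with $\tilde u_k$ as described in the truncation argument of Theorem 7) into the right-hand side of (57) one gets

$$\liminf_{n\to\infty}\int_\Omega\chi_n\cdot u_n\,d\Omega\ge\liminf_{k\to\infty}\int_\Omega\chi\cdot\tilde u_k\,d\Omega-\limsup_{k\to\infty}\int_\Omega j^0(u;\tilde u_k-u)\,d\Omega. \tag{58}$$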


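Taking into account that $\tilde u_k\to u$ a.e. in $\Omega$,

$$j^0(u;\tilde u_k-u)=\varepsilon_k\,j^0(u;-u)\le\varepsilon_k\,k\,|u|\le k\,|u|$$

and $|\chi\cdot u|\ge\chi\cdot\tilde u_k=(1-\varepsilon_k)\,\chi\cdot u\ge-k\,|u|$, Fatou's lemma and the dominated convergence theorem can be used to deduce

$$\limsup_{k\to\infty}\int_\Omega j^0(u;\tilde u_k-u)\,d\Omega\le 0$$

and

$$\lim_{k\to\infty}\int_\Omega\chi\cdot\tilde u_k\,d\Omega=\int_\Omega\chi\cdot u\,d\Omega.$$

Finally, combining the last two relations with (58) yields (55), as required.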
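Step 5. The next claim is that

$$\langle\beta-g,u\rangle_V+\int_\Omega\chi\cdot u\,d\Omega=0. \tag{59}$$

Indeed, (50) implies

$$\langle\beta-g,\tilde u_k\rangle_V+\int_\Omega\chi\cdot\tilde u_k\,d\Omega=0 \tag{60}$$

with $\{\tilde u_k\}$ as in Step 3. Since $\chi\cdot u\in L^1(\Omega)$ and $-k\,|u|\le\chi\cdot\tilde u_k=(1-\varepsilon_k)\,\chi\cdot u\le|\chi\cdot u|$, by the dominated convergence theorem

$$\lim_{k\to\infty}\int_\Omega\chi\cdot\tilde u_k\,d\Omega=\int_\Omega\chi\cdot u\,d\Omega.$$

It means that (59) has to hold, by passing to the limit as $k\to\infty$ in (60).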

Step 6. In this step it will be shown that the pseudomonotonicity of $A$ and (47) imply (44). Indeed, (47) with $v\in V\cap L^\infty(\Omega;\mathbb{R}^N)$ and (49) allow us to state that

$$\limsup_{n\to\infty}\langle Au_n,u_n-v\rangle_V\le\langle\beta-g,v-u\rangle_V+\int_\Omega\chi\cdot v\,d\Omega-\liminf_{n\to\infty}\int_\Omega\chi_n\cdot u_n\,d\Omega.$$

Substituting $v=\tilde u_k$ with $\tilde u_k$ as in Step 3 and taking into account (55), one arrives at $\limsup_{n\to\infty}\langle Au_n,u_n-u\rangle_V\le 0$ (by the application of the limit procedure as $k\to\infty$). Therefore the pseudomonotonicity of $A$ can be used, and it yields $\langle Au_n,u_n\rangle_V\to\langle Au,u\rangle_V$ and $Au_n\rightharpoonup\beta=Au$ weakly in $V^*$ as $n\to\infty$. Finally, (47) implies (44), as claimed.

Step 7. In the final step of the proof it will be shown that (44) and (45) imply (46). For this purpose, choose $v\in V\cap L^\infty(\Omega;\mathbb{R}^N)$ arbitrarily. From (2) one has $\chi\cdot(v-u)\le j^0(u;v-u)\le\alpha(\|v\|_{L^\infty(\Omega;\mathbb{R}^N)})(1+|u|^\sigma)$ with $\chi\cdot(v-u)\in L^1(\Omega)$ and $\alpha(\|v\|_{L^\infty(\Omega;\mathbb{R}^N)})(1+|u|^\sigma)\in L^1(\Omega)$. Hence $j^0(u;v-u)$ is finite and integrable, and consequently (46) follows immediately from (44).

Now consider the case $j^0(u;v-u)\in L^1(\Omega)$ with $v\in V\setminus L^\infty(\Omega;\mathbb{R}^N)$. According to Theorem 7 there exists a sequence $v_k=(1-\varepsilon_k)v$ such that $\{v_k\}\subset V\cap L^\infty(\Omega;\mathbb{R}^N)$ and $v_k\to v$ strongly in $V$. Since

$$\langle Au-g,v_k-u\rangle_V+\int_\Omega j^0(u;v_k-u)\,d\Omega\ge 0,$$

in order to establish (46) it remains to show that

$$\limsup_{k\to\infty}\int_\Omega j^0(u;v_k-u)\,d\Omega\le\int_\Omega j^0(u;v-u)\,d\Omega.$$

For this purpose let us observe that $v_k-u=(1-\varepsilon_k)(v-u)+\varepsilon_k(-u)$, which combined with the convexity of $j^0(u;\cdot)$ yields the estimate

$$j^0(u;v_k-u)\le(1-\varepsilon_k)\,j^0(u;v-u)+\varepsilon_k\,j^0(u;-u)\le|j^0(u;v-u)|+k\,|u|.$$

Thus the application of Fatou's lemma gives the assertion. This completes the proof of Theorem 6.


Remark 9 An analogous result to that of Theorem 6 can be formulated for the hemivariational inequality (46) in which $\int_\Omega(\cdot)\,d\Omega$ is replaced by the boundary integral $\int_\Gamma(\cdot)\,d\Gamma$, provided the imbedding $H^1(\Omega)\subset L^p(\Gamma)$ is compact ($1\le p<(2m-2)/(m-2)$, [12]).


Example 10 Let us consider a linear elastic body which in its undeformed state occupies an open, bounded, connected subset $\Omega$ of $\mathbb{R}^3$. $\Omega$ is referred to a fixed Cartesian coordinate system $0x_1x_2x_3$ and its boundary $\Gamma$ is assumed to be Lipschitz regular; $n=(n_i)$ denotes the outward unit normal vector to $\Gamma$. We decompose $\Gamma$ into two disjoint parts $\Gamma_F$ and $\Gamma_S$ such that $\Gamma=\Gamma_F\cup\Gamma_S$. As usual, the symbols $u:\Omega\to\mathbb{R}^3$ and $\sigma:\Omega\to S^3$ are used to denote the displacement field and the stress tensor field, respectively. Here $S^3$ stands for the space of all real-valued $3\times 3$ symmetric matrices.
Consider the boundary value problem:

i) The equilibrium equations:
$$\sigma_{ij,j}+b_i=0\quad\text{in }\Omega. \tag{61}$$

ii) The displacement-strain relation:
$$\varepsilon_{ij}(u)=\tfrac12\,(u_{i,j}+u_{j,i})\quad\text{in }\Omega. \tag{62}$$

iii) Hooke's law:
$$\sigma_{ij}=C_{ijkl}\,\varepsilon_{kl}(u)\quad\text{in }\Omega. \tag{63}$$

iv) The surface traction conditions:
$$\sigma_{ij}n_j=F_i\quad\text{on }\Gamma_F. \tag{64}$$

v) The nonmonotone subdifferential boundary conditions:
$$-S\in\partial j(u)\quad\text{on }\Gamma_S. \tag{65}$$

Here, $S=(S_i)=(\sigma_{ij}n_j)$ is the stress vector, and $\partial j(\cdot)$ is the generalized gradient of Clarke of a locally Lipschitz function $j:\mathbb{R}^3\to\mathbb{R}$; the summation convention over repeated indices holds, and the elasticity tensor $C=(C_{ijkl})$ is assumed to satisfy the classical conditions of ellipticity and symmetry [24].

Let $V=H^1(\Omega;\mathbb{R}^3)$. By making use of the standard technique (cf. [24]), Eqs. (61)-(65) lead to the problem of finding $u\in V$ such as to satisfy the hemivariational inequality

$$\int_\Omega C_{ijkl}\,\varepsilon_{ij}(u)\,\varepsilon_{kl}(v-u)\,d\Omega-\int_\Omega b_i(v_i-u_i)\,d\Omega-\int_{\Gamma_F}F_i(v_i-u_i)\,d\Gamma+\int_{\Gamma_S}j^0(u;v-u)\,d\Gamma\ge 0\qquad\forall v\in V. \tag{66}$$


Define $A:V\to V^*$ by

$$\langle Au,v\rangle_V=\int_\Omega C_{ijkl}\,\varepsilon_{ij}(u)\,\varepsilon_{kl}(v)\,d\Omega,\qquad u,v\in V,$$

and let $V_0=\mathcal{R}=\{\rho\in V:\varepsilon_{ij}(\rho)=0,\ i,j=1,2,3\}$ denote the space of all rigid-body displacements. Then (1) holds (for details see [24, p. 121]). Accordingly, if (2) (with $\sigma<4$) and (3) are fulfilled and, moreover, the compatibility condition


2 Ip Ts 


is valid for any $\rho\in\mathcal{R}\setminus\{0\}$, then the hypotheses of the theorem mentioned in Remark 9 are satisfied. Consequently, the existence of solutions to the hemivariational inequality (66) is ensured.
Introduction


The Traveling Salesman Problem (TSP) is one of the most representative problems in combinatorial optimization. If we consider a salesman who has to visit $n$ cities [46], the Traveling Salesman Problem asks for the shortest tour through all the cities such that no city is visited twice and the salesman returns at the end of the tour to the starting city. Mathematically, the problem may be stated as follows: let $G=(V,E)$ be a graph, where $V$ is a set of $n$ nodes and $E$ is a set of arcs, and let $C=[c_{ij}]$ be a cost matrix associated with $E$, where $c_{ij}$ represents the cost of going from city $i$ to city $j$ $(i,j=1,\dots,n)$. The problem is to find a permutation $(i_1,i_2,i_3,\dots,i_n)$ of the integers from 1 through $n$ that minimizes the quantity $c_{i_1i_2}+c_{i_2i_3}+\dots+c_{i_ni_1}$.
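The objective of the permutation formulation can be evaluated directly. The following Python sketch assumes 0-based city indices and a cost matrix stored as a list of lists; `tour_cost` is a hypothetical helper name, not taken from the article.

```python
def tour_cost(tour, c):
    """Return c[i1][i2] + c[i2][i3] + ... + c[in][i1] for the permutation `tour`."""
    n = len(tour)
    # (k + 1) % n wraps the last city back to the first one, closing the tour.
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


# A small asymmetric instance (0-based city indices).
c = [[0, 2, 9],
     [1, 0, 6],
     [7, 3, 0]]
print(tour_cost([0, 1, 2], c))  # 2 + 6 + 7 = 15
print(tour_cost([0, 2, 1], c))  # 9 + 3 + 1 = 13
```

Because the example matrix is asymmetric, the two orientations of the same cycle have different costs.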

We speak of a Symmetric TSP if, for all pairs $(i,j)$, the distance $c_{ij}$ is equal to the distance $c_{ji}$. Otherwise, we speak of the Asymmetric TSP [7]. If the triangle inequality holds ($c_{ij}\le c_{ii_1}+c_{i_1j}$ for all $i,j,i_1$), the problem is said to be metric. If the cities can be represented as points in the plane such that $c_{ij}$ is the Euclidean distance between point $i$ and point $j$, then the corresponding TSP is called the Euclidean TSP. The Euclidean TSP obeys in particular the triangle inequality $c_{ij}\le c_{ii_1}+c_{i_1j}$ for all $i,j,i_1$.
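The three properties just defined can be checked mechanically. In this sketch the helper names are hypothetical, and a small tolerance `tol` is assumed so that floating-point rounding in Euclidean distances does not produce false violations of the triangle inequality.

```python
import itertools
import math


def is_symmetric(c):
    """True if c[i][j] == c[j][i] for all pairs (i, j)."""
    n = len(c)
    return all(c[i][j] == c[j][i] for i in range(n) for j in range(n))


def is_metric(c, tol=1e-9):
    """True if c[i][j] <= c[i][k] + c[k][j] for all i, j, k (triangle inequality)."""
    n = len(c)
    return all(c[i][j] <= c[i][k] + c[k][j] + tol
               for i, j, k in itertools.product(range(n), repeat=3))


def euclidean_costs(points):
    """Cost matrix of a Euclidean TSP: c[i][j] = distance between points i and j."""
    return [[math.dist(p, q) for q in points] for p in points]


c = euclidean_costs([(0, 0), (3, 0), (3, 4)])
print(is_symmetric(c), is_metric(c))                 # True True
print(is_metric([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # False, since 5 > 1 + 1
```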

An integer programming formulation of the Traveling Salesman Problem is defined on a complete graph $G=(V,E)$ of $n$ nodes, with node set $V=\{1,\dots,n\}$, arc set $E=\{(i,j)\mid i,j=1,\dots,n\}$, and nonnegative costs $c_{ij}$ associated with the arcs [8]:

$$z=\min\sum_{i\in V}\sum_{j\in V}c_{ij}x_{ij} \tag{1}$$

s.t.

$$\sum_{j\in V}x_{ij}=1,\quad i\in V \tag{2}$$

$$\sum_{i\in V}x_{ij}=1,\quad j\in V \tag{3}$$

$$\sum_{i\in S}\sum_{j\in S}x_{ij}\le|S|-1,\quad\forall S\subsetneq V,\ S\ne\emptyset \tag{4}$$

$$x_{ij}\in\{0,1\},\quad\text{for all }i,j\in V, \tag{5}$$

where $x_{ij}=1$ if arc $(i,j)$ is in the solution and 0 otherwise. In this formulation, the objective function clearly describes the cost of the optimal tour. Constraints (2) and (3) are degree constraints: they specify that every node is entered exactly once and left exactly once. Constraints (4) are subtour elimination constraints. These constraints prohibit the formation of subtours, i.e. tours on subsets of fewer than $|V|$ nodes. If there were such a subtour on a subset $S$ of nodes, this subtour would contain $|S|$ edges and as many nodes. Constraints (4) would then be violated for this subset, since the left-hand side of (4) would be equal to $|S|$ while the right-hand side would be equal to $|S|-1$. Because of the degree constraints, subtours over one node (and hence over $n-1$ nodes) cannot occur. For more formulations of the problem see [34,60].
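To see how constraints (4) act, the sketch below (hypothetical helper names, 0-based nodes) takes a 0/1 matrix x that already satisfies the degree constraints (2) and (3), splits it into its cycles, and reports every proper node subset S whose subtour violates (4). A check of this kind is one common way to add the exponentially many constraints (4) only when they are violated, instead of listing them all in advance.

```python
def subtours(x):
    """Decompose a 0/1 solution x into its cycles.

    Assumes x satisfies the degree constraints (2) and (3), so that every
    node has exactly one outgoing and one incoming arc."""
    n = len(x)
    succ = [row.index(1) for row in x]  # succ[i] = j such that x[i][j] == 1
    seen, cycles = set(), []
    for start in range(n):
        if start in seen:
            continue
        cycle, node = [], start
        while node not in seen:
            seen.add(node)
            cycle.append(node)
            node = succ[node]
        cycles.append(cycle)
    return cycles


def violated_subtour_constraints(x):
    """Node sets S (proper subsets of V) for which constraint (4) fails."""
    n = len(x)
    violated = []
    for S in subtours(x):
        lhs = sum(x[i][j] for i in S for j in S)
        if len(S) < n and lhs > len(S) - 1:
            violated.append(S)
    return violated


def arcs_to_x(n, arcs):
    """Build the 0/1 matrix x from a list of arcs (i, j)."""
    x = [[0] * n for _ in range(n)]
    for i, j in arcs:
        x[i][j] = 1
    return x


two_cycles = arcs_to_x(4, [(0, 1), (1, 0), (2, 3), (3, 2)])
tour = arcs_to_x(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(violated_subtour_constraints(two_cycles))  # [[0, 1], [2, 3]]
print(violated_subtour_constraints(tour))        # []
```

Both matrices satisfy the degree constraints; only the first one consists of subtours, and for each of them the left-hand side of (4) equals |S| = 2 > |S| - 1.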

The Traveling Salesman Problem (TSP) is one of the most famous hard combinatorial optimization problems. It has been proven that the TSP is a member of the set of NP-complete problems. This is a class of difficult problems whose time complexity is probably exponential. The members of the class are related so that if a polynomial time algorithm were found for one problem, polynomial time algorithms would exist for all of them [41]. However, it is commonly believed that no such polynomial algorithm exists. Therefore, any attempt to construct a general algorithm for finding optimal solutions for the TSP in polynomial time must (probably) fail. That is, for any such algorithm it is (probably) possible to construct problem instances for which the execution time grows faster than any polynomial in the size of the input. Note, however, that time complexity here refers to the worst case behavior of the algorithm. It cannot be excluded that there exist algorithms whose average running time is polynomial. The existence of such algorithms is still an open question. Since the 1950s many algorithms have been proposed, developed and tested for the solution of the problem. Algorithms for solving the TSP may be divided into two categories: exact algorithms and heuristic-metaheuristic algorithms.


Heuristics for the Traveling Salesman Problem 


There is a great need for powerful heuristics that find 
good suboptimal solutions in reasonable amounts of 
computing time. These algorithms are usually very sim- 
ple and have short running times. There is a huge num- 
ber of papers dealing with finding near optimal solu- 
tions for the TSP. Our aim is to present the most in- 
teresting and efficient algorithms and the most impor- 
tant ones for facing practical problems. In the 1960s, 
1970s and 1980s the attempts to solve the Traveling 
Salesman Problem focused on tour construction meth- 
ods and tour improvement methods. In the last fifteen 
years, metaheuristics, such as simulated annealing, 
tabu search, genetic algorithms and neural networks, 
were introduced. These algorithms have the ability to 
find their way out of local optima. Heuristics and meta- 
heuristics constitute an increasingly essential compo- 
nent of solution approaches intended to tackle diffi- 
cult problems, in general, and global and combinatorial 
problems in particular. 

When a heuristic is designed, the question that arises concerns the quality of the produced solution. There are three different ways in which one may try to answer this question.

1. Empirical. The heuristic is applied to a number of test problem instances and the solutions are compared to the optimal values, if these are known, or to lower bounds on these values [33,35].

2. Worst Case Analysis. The idea is to derive bounds on the worst possible deviation from the optimum that the heuristic could produce and to devise bad problem instances for which the heuristic actually achieves this deviation [42].

3. Probabilistic Analysis. In the probabilistic analysis it is assumed that problem instances are drawn from certain simple probability distributions, and it is then proven mathematically that particular solution methods are highly likely to yield near-optimal solutions when the number of cities is large [43].

Tour construction methods build up a tour step by step. Such heuristics build a solution (tour) from scratch by a growth process (usually a greedy one) that terminates as soon as a feasible solution has been constructed. The problem with construction heuristics is that, although they are usually fast, they do not, in general, produce very good solutions. One of the simplest tour construction methods is the nearest neighbor method, in which a salesman starts from an arbitrary city and goes to its nearest neighbor. Then he proceeds from there in the same manner: he visits the nearest unvisited city, until all cities are visited, and then returns to the starting city [65,68].
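A minimal Python sketch of the nearest neighbor method, assuming a cost matrix `c` indexed from 0 and breaking ties by the smaller city index (both are choices of this sketch, and the function name is hypothetical):

```python
import math


def nearest_neighbor_tour(c, start=0):
    """Nearest neighbor construction: always move to the closest unvisited city."""
    n = len(c)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # ties are broken by the smaller city index to keep the result deterministic
        nxt = min(unvisited, key=lambda j: (c[last][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour  # the arc from tour[-1] back to `start` closes the tour


pts = [(0, 0), (1, 0), (5, 0), (2, 0)]
c = [[math.dist(p, q) for q in pts] for p in pts]
print(nearest_neighbor_tour(c))  # [0, 1, 3, 2]
```

Each step scans all unvisited cities, so this naive version takes O(n²) time overall.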

An extension of the nearest neighbor method is the double-sided nearest neighbor method, where the current path can be extended from both of its end nodes. Some authors use the name Greedy for Nearest Neighbor, but it is more appropriately reserved for the special case of the greedy algorithm of matroid theory [39]. Bentley [11] proposed two very fast and efficient algorithms, based on K-d Trees and on Lazily Updated Priority Queues. His paper was the first to suggest the use of data structures for the solution of the TSP. A priority queue contains items with associated values (the priorities) and supports operations that [40]:

• remove the highest priority item from the queue and deliver it to the user,

• insert an item,

• delete an item, and

• modify the priority of an item.
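The four operations can be sketched with Python's `heapq` module using the common lazy-deletion technique described in the `heapq` documentation: deleted or re-prioritized entries are only marked as stale and are discarded when they reach the top of the heap. This is a generic sketch, not Bentley's implementation; here the smallest value is treated as the highest priority (as for distances), and the class name is hypothetical.

```python
import heapq
import itertools


class LazyPriorityQueue:
    """Priority queue in which delete/modify only mark heap entries as stale."""

    _REMOVED = object()  # marker stored in place of a deleted item

    def __init__(self):
        self._heap = []
        self._entries = {}                # item -> its live heap entry
        self._counter = itertools.count()  # tie-breaker, so items are never compared

    def insert(self, item, priority):
        if item in self._entries:
            self.delete(item)
        entry = [priority, next(self._counter), item]
        self._entries[item] = entry
        heapq.heappush(self._heap, entry)

    def delete(self, item):
        entry = self._entries.pop(item)
        entry[2] = self._REMOVED  # lazy: the entry stays in the heap for now

    def modify(self, item, priority):
        self.insert(item, priority)  # stale old entry + fresh new entry

    def pop(self):
        """Remove and return (item, priority) with the highest priority."""
        while self._heap:
            priority, _, item = heapq.heappop(self._heap)
            if item is not self._REMOVED:  # skip stale entries
                del self._entries[item]
                return item, priority
        raise KeyError("pop from an empty priority queue")


q = LazyPriorityQueue()
q.insert("a", 5)
q.insert("b", 2)
q.insert("c", 7)
q.modify("c", 1)
q.delete("b")
print(q.pop(), q.pop())  # ('c', 1) ('a', 5)
```

Deletion and modification cost only O(log n) for the new push (or O(1) for a delete), at the price of stale entries that occupy the heap until they are popped.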

The insertion procedures [68] take a subtour on a subset of the nodes and attempt to determine which node (not already in the subtour) should join the subtour next (the selection step) and then determine where in the subtour it should be inserted (the insertion step). The best known of these algorithms is the nearest insertion algorithm. Similar to the nearest insertion procedure are the cheapest insertion [65], the arbitrary insertion [12], the farthest insertion [65], the quick insertion [12], and the convex hull insertion [12] algorithms.
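The selection and insertion steps of the nearest insertion algorithm can be sketched as follows. The sketch assumes a symmetric cost matrix and starts from the subtour formed by city 0 and its nearest neighbor; that starting rule and the function name are illustrative choices. The selection step picks the city closest to the current subtour, and the insertion step places it between the consecutive pair (i, j) that minimizes the increase c_ik + c_kj − c_ij.

```python
import math


def nearest_insertion_tour(c, start=0):
    """Nearest insertion for a symmetric cost matrix c (a sketch)."""
    n = len(c)
    others = [v for v in range(n) if v != start]
    if not others:
        return [start]
    first = min(others, key=lambda v: c[start][v])
    tour = [start, first]
    remaining = set(others) - {first}
    # dist[v] = distance from v to the closest node already in the subtour
    dist = {v: min(c[start][v], c[first][v]) for v in remaining}
    while remaining:
        k = min(remaining, key=lambda v: (dist[v], v))  # selection step
        best_pos, best_inc = None, math.inf
        for p in range(len(tour)):  # insertion step: cheapest position for k
            i, j = tour[p], tour[(p + 1) % len(tour)]
            inc = c[i][k] + c[k][j] - c[i][j]
            if inc < best_inc:
                best_pos, best_inc = p + 1, inc
        tour.insert(best_pos, k)
        remaining.remove(k)
        for v in remaining:
            dist[v] = min(dist[v], c[k][v])
    return tour


pts = [(0, 0), (1, 0), (1, 1), (0, 1)]  # corners of a unit square
c = [[math.dist(p, q) for q in pts] for p in pts]
t = nearest_insertion_tour(c)
print(sum(c[t[k]][t[(k + 1) % 4]] for k in range(4)))  # 4.0
```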

There are a number of heuristic algorithms that are designed for speed rather than for the quality of the tour they construct [40]. The three best known heuristic algorithms of this category are the Strip algorithm, proposed by Beardwood et al. [10], the Spacefilling Curve heuristic, proposed by Platzman and Bartholdi [58], and the Fast Recursive Partitioning heuristic, proposed by Bentley [11]. The savings algorithms are exchange procedures. The best known of them is the Clarke-Wright algorithm [17]. Christofides [12,65] suggested a procedure for solving the TSP based on spanning trees. He proposed a method of transforming spanning trees to Eulerian graphs.

The improvement methods or local search methods start with a tour, try to find tours that are “neighboring” to it and shorter than it, and then replace it. The tour improvement methods can be divided into three categories according to the type of neighborhood that they use [64]. First, the constructive neighborhood methods, which successively add new components to create a new solution while keeping some components of the current solution fixed. Some of these methods will be presented in the next section, where the best known metaheuristics are presented. Second, the transition neighborhood methods, which are the classic local search algorithms (classic tour improvement methods) and which iteratively move from one solution to another based on the definition of a neighborhood structure. Finally, the population based neighborhood methods, which generalize the two previous categories by considering neighborhoods of more than one solution.

The best known of the local search algorithms is the 2-opt heuristic, in which two edges are deleted and the open ends are connected in a different way in order to obtain a new tour [48]. Note that there is only one way to reconnect the paths so that a tour results. The 3-opt heuristic is quite similar to 2-opt, but it introduces more flexibility in modifying the current tour because it uses a larger neighborhood: the tour is broken into three parts instead of only two [48]. In the general case, λ edges in a feasible tour are exchanged for λ edges not in that solution, as long as the result remains a tour and the length of that tour is less than the length of the previous tour. The Lin-Kernighan method (LK) was developed by Lin and Kernighan [37,49,54] and for many years was considered to be the best heuristic for the TSP.
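A minimal first-improvement 2-opt sketch, assuming a symmetric cost matrix: deleting edges (a, b) and (d, e) and reconnecting the open ends as (a, d) and (b, e) is the same as reversing the path between b and d. The tolerance `1e-12` is an assumption of this sketch that prevents cycling on floating-point ties.

```python
import math


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt(tour, c):
    """First-improvement 2-opt local search (symmetric costs assumed)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                d, e = tour[j], tour[(j + 1) % n]
                if a == e:  # edges (a, b) and (d, e) are adjacent: nothing to exchange
                    continue
                # gain of replacing (a, b), (d, e) by (a, d), (b, e)
                if c[a][d] + c[b][e] - c[a][b] - c[d][e] < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]  # reverse b..d
                    improved = True
    return tour


pts = [(0, 0), (1, 1), (1, 0), (0, 1)]  # tour 0-1-2-3 crosses itself
c = [[math.dist(p, q) for q in pts] for p in pts]
better = two_opt([0, 1, 2, 3], c)
print(tour_length(better, c))  # 4.0
```

On the crossing tour of the example, a single exchange removes the crossing and yields the square's perimeter.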
The Or-opt procedure, also known as the node exchange heuristic, was first introduced by Or [56]. It removes a sequence of up to three adjacent nodes and inserts it at another location within the same route. Or-opt can be considered as a special case of 3-opt (three-arc exchanges), where three arcs are removed and substituted by three other arcs. The GENI algorithm was presented by Gendreau, Hertz and Laporte [22]. GENI is a hybrid of tour construction and local optimization.
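The Or-opt move can be sketched in the same style. For brevity this version (a simplification of this sketch, not Or's original procedure) only moves segments that do not wrap around the end of the list, keeps their orientation, and recomputes the full tour length for every candidate, which is simple but costs O(n) per evaluation.

```python
import math


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def or_opt_moves(tour, max_len):
    """Yield every tour obtained by moving a segment of 1..max_len cities."""
    n = len(tour)
    for seg_len in range(1, max_len + 1):
        for i in range(n - seg_len + 1):
            segment = tour[i:i + seg_len]
            rest = tour[:i] + tour[i + seg_len:]
            for p in range(len(rest) + 1):
                if p != i:  # p == i would rebuild the original tour
                    yield rest[:p] + segment + rest[p:]


def or_opt(tour, c, max_len=3):
    """Apply improving Or-opt moves until none is left."""
    tour = list(tour)
    best = tour_length(tour, c)
    improved = True
    while improved:
        improved = False
        for cand in or_opt_moves(tour, max_len):
            cand_len = tour_length(cand, c)
            if cand_len < best - 1e-12:
                tour, best, improved = cand, cand_len, True
                break  # restart the scan from the improved tour
    return tour


pts = [(0, 0), (2, 0), (1, 0), (3, 0)]
c = [[math.dist(p, q) for q in pts] for p in pts]
t = or_opt([0, 1, 2, 3], c)
print(tour_length(t, c))  # 6.0
```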


Metaheuristics for the Traveling Salesman Problem


Over the last fifteen years, an increasing number of metaheuristic algorithms have been proposed. Simulated annealing, genetic algorithms, neural networks, tabu search and ant algorithms, together with a number of hybrid techniques, are the main categories of metaheuristic procedures. These algorithms have the ability to find their way out of local optima. A number of metaheuristic algorithms have been proposed for the solution of the Traveling Salesman Problem. The most important algorithms published for each metaheuristic are given in the following:

• Simulated Annealing (SA) [1,2,45,64] belongs to a class of local search algorithms known as threshold accepting algorithms. These algorithms play a special role within local search for two reasons. First, they appear to be quite successful when applied to a broad range of practical problems. Second, some threshold accepting algorithms, such as SA, have a stochastic component, which facilitates a theoretical analysis of their asymptotic convergence. Simulated Annealing [3] is a stochastic algorithm that allows random uphill jumps in a controlled fashion in order to provide possible escapes from poor local optima. The probability of allowing the objective function value to increase is gradually lowered until no more transformations are possible. Simulated Annealing owes its name to an analogy with the annealing process in condensed matter physics, where a solid is heated to a maximum temperature at which all particles of the solid randomly arrange themselves in the liquid phase, followed by cooling through careful and slow reduction of the temperature until the liquid is frozen with the particles arranged in a highly structured lattice and minimal system energy. This ground state is reachable only if the maximum temperature is sufficiently high and the cooling sufficiently slow. Otherwise a meta-stable state is reached. The meta-stable state is also reached with a process known as quenching, in which the temperature is instantaneously lowered. The predecessor of SA is the so-called Metropolis filter. Simulated Annealing algorithms for the TSP are presented in [15,55,65].

• Tabu search (TS) was introduced by Glover [24,25] as a general iterative metaheuristic for solving combinatorial optimization problems. Computational experience has shown that TS is a well established approximation technique, which can compete with almost all known techniques and which, by its flexibility, can beat many classic procedures. It is a form of local neighbor search. Each solution $S$ has an associated set of neighbors $N(S)$. A solution $S'\in N(S)$ can be reached from $S$ by an operation called a move. TS can be viewed as an iterative technique which explores a set of problem solutions by repeatedly making moves from one solution $S$ to another solution $S'$ located in the neighborhood $N(S)$ of $S$ [31]. TS moves from a solution to its best admissible neighbor, even if this causes the objective function to deteriorate. To avoid cycling, solutions that have been recently explored are declared forbidden, or tabu, for a number of iterations. The tabu status of a solution is overridden when certain criteria (aspiration criteria) are satisfied. Sometimes, intensification and diversification strategies are used to improve the search. In the first case, the search is accentuated in the promising regions of the feasible domain. In the second case, an attempt is made to consider solutions in a broad area of the search space. The first Tabu Search algorithm implemented for the TSP appears to be the one described by Glover [23,29]. Limited results for this implementation and variants on it were reported by Glover [26]. Other Tabu Search algorithms for the TSP are presented in [74].

• Genetic Algorithms (GAs) are search procedures based on the mechanics of natural selection and natural genetics. The first GA was developed by John H. Holland in the 1960s to allow computers to evolve solutions to difficult search and combinatorial problems, such as function optimization and machine learning [38]. Genetic algorithms offer a particularly attractive approach for problems like the traveling salesman problem, since they are generally quite effective for rapid global search of large, non-linear and poorly understood spaces. Moreover, genetic algorithms are very effective in solving large-scale problems. Genetic algorithms mimic the evolution process in nature. GAs are based on an imitation of the biological process in which new and better populations among different species are developed during evolution. Thus, unlike most standard heuristics, GAs use information about a population of solutions, called individuals, when they search for better solutions. A GA is a stochastic iterative procedure that keeps the population size constant in each iteration, called a generation. Its basic operation is the mating of two solutions in order to form a new solution. To form a new population, a binary operator called crossover and a unary operator called mutation are applied [61,62]. Crossover takes two individuals, called parents, and produces two new individuals, called offspring, by swapping parts of the parents. Genetic algorithms for the TSP are presented in [9,51,59,64,67].

• Greedy Randomized Adaptive Search Procedure (GRASP) [66] is an iterative two-phase search method which has gained considerable popularity in combinatorial optimization. Each iteration consists of two phases, a construction phase and a local search procedure. In the construction phase, a randomized greedy function is used to build up an initial solution. This randomized technique provides a feasible solution within each iteration. This solution is then subjected to improvement attempts in the local search phase. The final result is simply the best solution found over all iterations. Greedy Randomized Adaptive Search Procedure algorithms for the TSP are presented in [50,51].

• The use of Artificial Neural Networks to find good solutions to combinatorial optimization problems has recently attracted some attention. A neural network [57] consists of a network of elementary nodes (neurons) that are linked through weighted connections. The nodes represent computational units, which are capable of performing a simple computation, consisting of a summation of the weighted inputs, followed by the addition of a constant called the threshold or bias, and the application of a nonlinear response (activation) function. The result of the computation of a unit constitutes its output. This output is used as an input for the nodes to which it is linked through an outgoing connection. The overall task of the network is to achieve a certain network configuration, for instance a required input-output relation, by means of the collective computation of the nodes. This process is often called self-organization. Neural Network algorithms for the TSP are presented in [4,6,53,69].

• The Ant Colony Optimization (ACO) metaheuristic is a relatively new technique for solving combinatorial optimization problems (COPs). Based strongly on the Ant System (AS) metaheuristic developed by Dorigo, Maniezzo and Colorni [19], ant colony optimization is derived from the foraging behaviour of real ants in nature. The main idea of ACO is to model the problem as the search for a minimum cost path in a graph. Artificial ants walk through this graph, looking for good paths. Each ant has a rather simple behavior, so that it will typically only find rather poor-quality paths on its own. Better paths are found as the emergent result of the global cooperation among ants in the colony. An ACO algorithm consists of a number of cycles (iterations) of solution construction. During each iteration, a number of ants (which is a parameter) construct complete solutions using heuristic information and the collected experiences of previous groups of ants. These collected experiences are represented by a digital analogue of trail pheromone, which is deposited on the constituent elements of a solution. Small quantities are deposited during the construction phase, while larger amounts are deposited at the end of each iteration in proportion to solution quality. Pheromone can be deposited on the components and/or the connections used in a solution, depending on the problem. Ant Colony Optimization algorithms for the TSP are presented in [16,18,19,70].

• One way to invest extra computation time is to exploit the fact that many local improvement heuristics have random components, if only in their initial tour construction phase. Thus, if one runs the heuristic multiple times, one will get different results and can take the best. The Iterated Lin-Kernighan algorithm (ILK) [54] was proposed by Johnson [39] and is considered to be one of the best algorithms for obtaining a first local minimum. To improve this local minimum, the algorithm examines other local minimum tours ‘near’ the current local minimum. To generate these tours, ILK first applies a random and unbiased nonsequential 4-opt exchange to the current local minimum and then optimizes this 4-opt neighbor using the LK algorithm. If the tour obtained by this process is better than the current local minimum, then ILK makes this tour the current local minimum and continues from there using the same neighbor generation process. Otherwise, the current local minimum remains as it is and further random 4-opt moves are tried. The algorithm stops when a stopping criterion based either on the number of iterations or on the computational time is satisfied. Two other approaches are the Iterated 3-opt and the Chained Lin-Kernighan [5], where random kicks are generated from the solution and from these new points the exploration for a better solution is continued [40].

• The Ejection Chain Method provides a wide variety of reference structures, which have the ability to generate moves not available to the neighborhood search approaches traditionally applied to the TSP [63,64]. Ejection Chains are variable depth methods that generate a sequence of interrelated simple moves to create a more complex compound move. An ejection chain consists of a succession of operations performed on a given set of elements, where the $m_i$-th operation changes the state of one or more elements which are said to be ejected in the $m_{i+1}$-th operation. This may in turn change the state of other elements, which leads to further ejections, until no more operations can be made [27]. Other Ejection Chain Algorithms are presented in [20,21].

• Scatter Search is an evolutionary strategy originally proposed by Glover [28,30]. Scatter Search operates on a set of reference solutions to generate a new set of solutions by weighted linear combinations of structured subsets of solutions. The reference set is required to be made up of high quality and diverse solutions, and the goal is to produce weighted centers of selected subregions and to project these centers into regions of the solution space that are to be explored by auxiliary heuristic procedures.

• Path Relinking [28,30] combines solutions by generating paths between them using local search neighborhoods, and selecting new solutions encountered along these paths.
• Guided Local Search (GLS), originally proposed by Voudouris and Tsang [71,72], is a general optimization technique suitable for a wide range of combinatorial optimization problems. The main focus is on the exploitation of problem and search-related information to effectively guide local search heuristics in the vast search spaces of NP-hard optimization problems. This is achieved by augmenting the objective function of the problem to be minimized with a set of penalty terms, which are dynamically manipulated during the search process to steer the heuristic. GLS augments the cost function of the problem to include a set of penalty terms and passes this, instead of the original one, for minimization by the local search procedure. Local search is confined by the penalty terms and focuses attention on promising regions of the search space. Iterative calls are made to local search. Each time local search gets caught in a local minimum, the penalties are modified and local search is called again to minimize the modified cost function. Guided Local Search algorithms for the TSP are presented in [71,72].

• The Noising Method was proposed by Charon and Hudry [13]. It is a metaheuristic in which, to minimize a function $f$, the true values of $f$ are not used directly; instead, they are considered to be perturbed in some way by noises in order to obtain a noised function $f_{\text{noised}}$. During the run of the algorithm, the range of the perturbing noises decreases (typically to zero), so that, at the end, there is no significant noise and the optimization of $f_{\text{noised}}$ leads to the same solution as the one provided by a descent algorithm applied to $f$ with the same initial solution. This algorithm was applied to the Traveling Salesman Problem by Charon and Hudry [14].

• Particle Swarm Optimization (PSO) is a population-based swarm intelligence algorithm. It was originally proposed by Kennedy and Eberhart as a simulation of the social behavior of organisms such as bird flocks and fish schools [44]. PSO uses the physical movements of the individuals in the swarm and has a flexible and well-balanced mechanism to enhance and adapt the global and local exploration abilities. PSO algorithms for the solution of the Traveling Salesman Problem are presented in [32,47,73].


• Variable Neighborhood Search (VNS) is a metaheuristic for solving combinatorial optimization problems whose basic idea is the systematic change of neighborhood within a local search [36]. Variable Neighborhood Search algorithms for the TSP are presented in [52].
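The simulated annealing entry of the list above can be illustrated with a short Python sketch. It assumes a symmetric instance, segment reversals (2-opt moves) as the neighborhood, the Metropolis acceptance rule and a geometric cooling schedule; the parameter values `t0`, `cooling` and `iters` are illustrative, not recommendations from the literature.

```python
import math
import random


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def simulated_annealing(c, t0=1.0, cooling=0.999, iters=20000, seed=0):
    """Simulated annealing for the TSP with segment-reversal (2-opt) moves."""
    rng = random.Random(seed)
    n = len(c)
    tour = list(range(n))
    rng.shuffle(tour)
    cur_len = tour_length(tour, c)
    best, best_len = tour[:], cur_len
    temp = t0
    for _ in range(iters):
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse one segment
        cand_len = tour_length(cand, c)
        delta = cand_len - cur_len
        # Metropolis criterion: accept every improvement; accept an uphill
        # move with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        temp *= cooling  # geometric cooling schedule
    return best, best_len


# eight points on the unit circle, listed in a scrambled order
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
tour, length = simulated_annealing(c)
print(tour, round(length, 4))
```

The best tour seen is returned separately from the current one, since the current tour may move uphill late in the run.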
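A compact sketch of tabu search for the TSP, corresponding to the tabu search entry above. The neighborhood (swapping two cities), the tabu attribute (the pair of swapped cities) and the tenure are all illustrative assumptions; the aspiration criterion used here lets a tabu move through when it yields a tour better than the best one found so far.

```python
import math
from collections import deque


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def tabu_search(c, iters=100, tenure=5):
    """Tabu search over the swap neighborhood (exchange two cities' positions)."""
    n = len(c)
    tour = list(range(n))
    best, best_len = tour[:], tour_length(tour, c)
    tabu = deque(maxlen=tenure)  # recently swapped city pairs are tabu
    for _ in range(iters):
        chosen = None
        for i in range(1, n - 1):  # city tour[0] stays fixed
            for j in range(i + 1, n):
                cand = tour[:]
                cand[i], cand[j] = cand[j], cand[i]
                cand_len = tour_length(cand, c)
                move = frozenset((tour[i], tour[j]))
                # aspiration criterion: a tabu move is allowed if it beats the best tour
                if move in tabu and cand_len >= best_len:
                    continue
                if chosen is None or cand_len < chosen[0]:
                    chosen = (cand_len, cand, move)
        if chosen is None:  # every move is tabu
            break
        cur_len, tour, move = chosen  # best admissible neighbor, even if worse
        tabu.append(move)
        if cur_len < best_len:
            best, best_len = tour[:], cur_len
    return best, best_len


pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
best, best_len = tabu_search(c)
print(best, round(best_len, 4))
```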
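The two genetic operators described in the genetic algorithms entry above can be sketched for tours stored as permutations. Order crossover (OX) and swap mutation are used here as common permutation-preserving choices; they are not claimed to be the operators of the cited TSP papers, and the function names are hypothetical. OX copies a slice of one parent and fills the remaining positions with the missing cities in the order in which they appear in the other parent, so every child is again a valid tour.

```python
import random


def ox_crossover(p1, p2, a, b):
    """Order crossover (OX) with cut points a <= b."""
    n = len(p1)
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]  # inherit a slice of parent 1 unchanged
    kept = set(p1[a:b + 1])
    # remaining cities in the order they appear in parent 2, starting after b
    fill = [city for city in p2[b + 1:] + p2[:b + 1] if city not in kept]
    positions = list(range(b + 1, n)) + list(range(a))
    for pos, city in zip(positions, fill):
        child[pos] = city
    return child


def crossover(p1, p2, rng):
    """Produce two offspring by applying OX with random cut points in both orders."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    return ox_crossover(p1, p2, a, b), ox_crossover(p2, p1, a, b)


def swap_mutation(tour, rng, rate=0.1):
    """With probability `rate`, exchange two randomly chosen cities."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour


rng = random.Random(1)
p1, p2 = list(range(8)), [7, 6, 5, 4, 3, 2, 1, 0]
print(ox_crossover(p1, p2, 2, 4))  # [6, 5, 2, 3, 4, 1, 0, 7]
children = crossover(p1, p2, rng)
print([swap_mutation(ch, rng, rate=1.0) for ch in children])
```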
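The two GRASP phases from the list above can be sketched as follows, under stated assumptions: the construction phase is a randomized nearest neighbor rule that draws the next city from a restricted candidate list (RCL) of cities within `alpha` of the best choice, and the local search phase is first-improvement 2-opt. All names and parameter values are hypothetical.

```python
import math
import random


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt(tour, c):
    """First-improvement 2-opt local search (symmetric costs assumed)."""
    tour, n, improved = list(tour), len(tour), True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b, d, e = tour[i], tour[i + 1], tour[j], tour[(j + 1) % n]
                if a != e and c[a][d] + c[b][e] - c[a][b] - c[d][e] < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


def randomized_greedy(c, alpha, rng):
    """Construction phase: pick the next city at random from the RCL."""
    n = len(c)
    tour = [rng.randrange(n)]
    left = set(range(n)) - {tour[0]}
    while left:
        last = tour[-1]
        d_min = min(c[last][v] for v in left)
        d_max = max(c[last][v] for v in left)
        rcl = sorted(v for v in left if c[last][v] <= d_min + alpha * (d_max - d_min))
        nxt = rng.choice(rcl)
        tour.append(nxt)
        left.remove(nxt)
    return tour


def grasp(c, iters=20, alpha=0.3, seed=0):
    rng = random.Random(seed)
    best, best_len = None, math.inf
    for _ in range(iters):
        tour = two_opt(randomized_greedy(c, alpha, rng), c)  # local search phase
        length = tour_length(tour, c)
        if length < best_len:  # keep the best solution over all iterations
            best, best_len = tour, length
    return best, best_len


pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
best, best_len = grasp(c)
print(round(best_len, 4))
```

With `alpha = 0` the construction becomes the plain nearest neighbor method; with `alpha = 1` every unvisited city is a candidate.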
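A minimal Ant System sketch for the ant colony entry above. It follows the usual pattern of pheromone-weighted random construction, evaporation, and a deposit proportional to tour quality (q / length), but it omits the small deposits during construction mentioned in the text; the parameters `alpha`, `beta`, `rho` and `q` are illustrative values.

```python
import math
import random


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def ant_system(c, n_ants=10, iters=50, alpha=1.0, beta=3.0, rho=0.5, q=1.0, seed=0):
    """Ant System for a symmetric TSP (distinct cities, so c[i][j] > 0 for i != j)."""
    rng = random.Random(seed)
    n = len(c)
    tau = [[1.0] * n for _ in range(n)]  # pheromone trails
    eta = [[0.0 if i == j else 1.0 / c[i][j] for j in range(n)]
           for i in range(n)]  # heuristic information (inverse distance)
    best, best_len = None, math.inf
    for _ in range(iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            left = set(range(n)) - {tour[0]}
            while left:
                i = tour[-1]
                cand = sorted(left)
                # transition probability proportional to tau^alpha * eta^beta
                weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                left.remove(nxt)
            tours.append((tour_length(tour, c), tour))
        for row in tau:  # evaporation
            for j in range(n):
                row[j] *= 1.0 - rho
        for length, tour in tours:  # deposit in proportion to solution quality
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i][j] += q / length
                tau[j][i] += q / length
            if length < best_len:
                best, best_len = tour, length
    return best, best_len


pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
best, best_len = ant_system(c)
print(round(best_len, 4))
```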
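The iterated scheme described for ILK in the list above can be sketched with a smaller local search. In this sketch 2-opt stands in for Lin-Kernighan (a deliberate simplification), while the kick is the double-bridge move, a common form of random nonsequential 4-opt exchange that reconnects the segments A B C D as A C B D. Function names and the number of kicks are hypothetical.

```python
import math
import random


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt(tour, c):
    """First-improvement 2-opt local search (symmetric costs assumed)."""
    tour, n, improved = list(tour), len(tour), True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b, d, e = tour[i], tour[i + 1], tour[j], tour[(j + 1) % n]
                if a != e and c[a][d] + c[b][e] - c[a][b] - c[d][e] < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


def double_bridge(tour, rng):
    """Random nonsequential 4-opt kick: A B C D -> A C B D (needs >= 4 cities)."""
    a, b, d = sorted(rng.sample(range(1, len(tour)), 3))
    return tour[:a] + tour[b:d] + tour[a:b] + tour[d:]


def iterated_two_opt(c, kicks=50, seed=0):
    rng = random.Random(seed)
    current = two_opt(list(range(len(c))), c)  # first local minimum
    current_len = tour_length(current, c)
    for _ in range(kicks):  # stopping criterion: a fixed number of kicks
        cand = two_opt(double_bridge(current, rng), c)
        cand_len = tour_length(cand, c)
        if cand_len < current_len:  # otherwise keep the current local minimum
            current, current_len = cand, cand_len
    return current, current_len


pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
tour, length = iterated_two_opt(c)
print(round(length, 4))
```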
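A sketch of the path relinking entry above, assuming the swap neighborhood: starting from one tour, each step swaps a city into the position it occupies in the guiding tour, and the best tour met along the way is returned. Visiting positions from left to right is a simplification of this sketch; variants choose the best swap at every step.

```python
import math


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def path_relinking(start, guide, c):
    """Walk from `start` to `guide` by swaps; return the best tour on the path."""
    current = list(start)
    best, best_len = current[:], tour_length(current, c)
    pos = {city: k for k, city in enumerate(current)}  # city -> position
    for k in range(len(guide)):
        if current[k] != guide[k]:
            j = pos[guide[k]]
            current[k], current[j] = current[j], current[k]  # one swap move
            pos[current[j]], pos[current[k]] = j, k
            length = tour_length(current, c)
            if length < best_len:
                best, best_len = current[:], length
    return best, best_len  # at the end of the loop, current == guide


pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
start, guide = list(range(8)), [0, 2, 4, 6, 1, 3, 5, 7]
best, best_len = path_relinking(start, guide, c)
print(best, round(best_len, 4))
```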
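Guided local search from the list above can be sketched for the TSP by taking the edges as the penalized features. The sketch assumes 2-opt as the local search, the utility c_ij / (1 + p_ij) for choosing which edges of a local minimum to penalize, and a penalty weight scaled by the average edge length with an illustrative coefficient `a`; all names are hypothetical.

```python
import math


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt(tour, c):
    """First-improvement 2-opt local search (symmetric costs assumed)."""
    tour, n, improved = list(tour), len(tour), True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b, d, e = tour[i], tour[i + 1], tour[j], tour[(j + 1) % n]
                if a != e and c[a][d] + c[b][e] - c[a][b] - c[d][e] < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


def guided_local_search(c, rounds=30, a=0.3):
    n = len(c)
    pen = [[0] * n for _ in range(n)]  # penalty counter of each edge feature
    tour = two_opt(list(range(n)), c)
    best, best_len = tour[:], tour_length(tour, c)
    lam = a * best_len / n  # penalty weight scaled by the average edge length
    for _ in range(rounds):
        h = [[c[i][j] + lam * pen[i][j] for j in range(n)] for i in range(n)]
        tour = two_opt(tour, h)  # local search on the augmented cost
        length = tour_length(tour, c)  # tours are judged by the true cost
        if length < best_len:
            best, best_len = tour[:], length
        # penalize the edge(s) of the local minimum with maximum utility
        edges = [(tour[k], tour[(k + 1) % n]) for k in range(n)]
        util = [c[i][j] / (1 + pen[i][j]) for i, j in edges]
        top = max(util)
        for (i, j), u in zip(edges, util):
            if u == top:
                pen[i][j] += 1
                pen[j][i] += 1
    return best, best_len


pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
best, best_len = guided_local_search(c)
print(round(best_len, 4))
```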
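The noising method entry in the list above can be illustrated by noising the variations of the objective rather than the data (a choice made here for brevity): each candidate 2-opt move is accepted when its true length change plus a uniform noise is negative, and the noise range shrinks linearly to zero. The initial noise range, the number of steps and the function name are assumptions of this sketch.

```python
import math
import random


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def noising_method(c, steps=20000, rate0=None, seed=0):
    rng = random.Random(seed)
    n = len(c)
    tour = list(range(n))
    cur_len = tour_length(tour, c)
    if rate0 is None:
        rate0 = cur_len / n  # initial noise range: one average edge (an assumption)
    best, best_len = tour[:], cur_len
    for s in range(steps):
        rate = rate0 * (1 - s / steps)  # noise range decreases linearly to zero
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = tour_length(cand, c)
        # descent on the noised variation instead of the true one
        if cand_len - cur_len + rng.uniform(-rate, rate) < 0:
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
    return best, best_len


pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
best, best_len = noising_method(c)
print(round(best_len, 4))
```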
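A basic variable neighborhood search sketch for the last entry of the list. The k-th neighborhood is assumed to be "k random swaps of cities" (an illustrative choice), the local search is 2-opt, and the search returns to the first neighborhood after every improvement and moves on to the next, larger one otherwise.

```python
import math
import random


def tour_length(tour, c):
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt(tour, c):
    """First-improvement 2-opt local search (symmetric costs assumed)."""
    tour, n, improved = list(tour), len(tour), True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b, d, e = tour[i], tour[i + 1], tour[j], tour[(j + 1) % n]
                if a != e and c[a][d] + c[b][e] - c[a][b] - c[d][e] < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


def shake(tour, k, rng):
    """Random point of the k-th neighborhood: k random city swaps."""
    tour = tour[:]
    for _ in range(k):
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour


def vns(c, k_max=3, rounds=30, seed=0):
    rng = random.Random(seed)
    best = two_opt(list(range(len(c))), c)
    best_len = tour_length(best, c)
    for _ in range(rounds):
        k = 1
        while k <= k_max:
            cand = two_opt(shake(best, k, rng), c)
            cand_len = tour_length(cand, c)
            if cand_len < best_len - 1e-12:
                best, best_len, k = cand, cand_len, 1  # improvement: back to N_1
            else:
                k += 1  # no improvement: try the next neighborhood
    return best, best_len


pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in [0, 3, 6, 1, 4, 7, 2, 5]]
c = [[math.dist(p, q) for q in pts] for p in pts]
best, best_len = vns(c)
print(round(best_len, 4))
```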
Introduction


Heuristic search [7,9] is a common technique for finding a solution in a decision tree or graph containing one or more solutions. Many applications in operations research and artificial intelligence rely on heuristic search as their primary solution method.

Heuristic search techniques can be classified into 
two broad categories: depth-first search (DFS) and best- 
first search (BFS). As a consequence of its better infor- 
mation base, BFS usually examines fewer nodes but oc- 
cupies more storage space for maintaining the already 
explored nodes. 


Depth-First Search 


DFS expands an initial state by generating its immedi- 
ate successors. At each subsequent step, one of the most 
recently generated successors is selected and expanded. 
At terminal states, or when it can be determined that 
the current state does not lead to a solution, the search 
backtracks, that is, the node expansion proceeds with 
the next most recently generated state. Practical imple- 
mentations use a stack data structure for maintaining 
the states (nodes) on the path to the currently explored 
state. The space complexity of the stack, O(d), increases 
only linearly with the search depth d. 

Backtracking is the most rudimentary variant of 
DFS. It terminates as soon as any solution has been 
found; hence, there is no guarantee of finding an optimal (least-cost) solution. Moreover, backtracking
might not terminate in graphs containing cycles or 
when the search depth is unbounded. 
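The stack-based DFS and its backtracking variant can be sketched in Python as follows. This is a minimal illustration, assuming the state space is given by a successor function; the names `backtrack`, `successors`, `is_goal` and the depth bound `max_depth` are hypothetical and not part of the original text. Each stack entry holds a state on the current path plus an iterator over its unexplored successors, so the stack never grows beyond the depth bound; the bound is one simple way to force termination on cyclic or unbounded graphs.

```python
from typing import Callable, Hashable, Iterable, Iterator, List, Optional, Tuple

_EXHAUSTED = object()  # sentinel: an iterator has no successors left


def backtrack(start: Hashable,
              successors: Callable[[Hashable], Iterable[Hashable]],
              is_goal: Callable[[Hashable], bool],
              max_depth: int) -> Optional[List[Hashable]]:
    """Return the first solution path found by depth-first backtracking, or None."""
    if is_goal(start):
        return [start]
    # Each entry: (state on the current path, iterator over its remaining successors).
    stack: List[Tuple[Hashable, Iterator[Hashable]]] = [(start, iter(successors(start)))]
    while stack:
        _, children = stack[-1]
        child = next(children, _EXHAUSTED)
        if child is _EXHAUSTED:
            stack.pop()  # all successors tried: backtrack to the parent
            continue
        if is_goal(child):
            # Stop at the first solution; it need not be a least-cost one.
            return [state for state, _ in stack] + [child]
        if len(stack) < max_depth:  # the child's depth equals len(stack)
            stack.append((child, iter(successors(child))))
    return None
```

On a graph with the cycle a ↔ b, the depth bound keeps the search from looping forever: with successors `{'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['e']}` and goal `'e'`, the search returns `['a', 'c', 'e']`.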

Depth-first branch and bound (DFBB) [6] employs 
a heuristic function to eliminate parts of the search 
space that cannot contain an optimal solution. It con- 
tinues after finding a first solution until the search space 
is completely exhausted. Whenever a better solution is 
found, the current solution path and its value are up- 
dated. Inferior subtrees, i. e., subtrees that are known to 
be worse than the current solution, are eliminated. 
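A minimal sketch of DFBB for a least-cost path problem follows, assuming a weighted directed graph stored as a dictionary and a heuristic table `h` that never overestimates the remaining cost (both are illustrative assumptions, as is the function name `dfbb`). A subtree is pruned as soon as g + h reaches the cost of the incumbent solution, and the incumbent is replaced whenever a cheaper goal path is found; the search only stops when the space is exhausted.

```python
import math
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(successor, edge cost), ...]


def dfbb(graph: Graph, start: str, goal: str,
         h: Dict[str, float]) -> Tuple[float, Optional[List[str]]]:
    """Depth-first branch and bound for a least-cost start-goal path."""
    best_cost = math.inf
    best_path: Optional[List[str]] = None
    path = [start]

    def visit(node: str, g: float) -> None:
        nonlocal best_cost, best_path
        # g + h is a lower bound on any solution through this node
        # (h must not overestimate; missing entries default to 0).
        if g + h.get(node, 0.0) >= best_cost:
            return                                  # inferior subtree: prune it
        if node == goal:
            best_cost, best_path = g, list(path)    # better solution: update the incumbent
            return
        for succ, cost in graph.get(node, []):
            if succ not in path:                    # avoid cycles along the current path
                path.append(succ)
                visit(succ, g + cost)
                path.pop()

    visit(start, 0.0)                               # runs until the space is exhausted
    return best_cost, best_path
```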

The alpha-beta algorithm [2] used in game tree 
searching is a variant of DFBB that operates on trees 
with alternating levels of AND and OR nodes [5]. Because the strength of play correlates with the depth of the
search, much effort has been spent on devising efficient 
parallel implementations (> parallel heuristic search). 
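The alpha-beta cutoff rule can be illustrated on an explicit game tree. In this minimal sketch the tree is written as nested Python lists with numeric leaf values (a representation chosen for illustration); maximizing and minimizing levels alternate, corresponding informally to the alternating OR and AND levels mentioned above. A plain minimax function is included to check that pruning does not change the root value.

```python
import math
from typing import List, Union

# A game tree as nested lists: inner lists are interior nodes, numbers are leaf values.
Tree = Union[float, List["Tree"]]


def alphabeta(node: Tree, alpha: float = -math.inf, beta: float = math.inf,
              maximizing: bool = True) -> float:
    """Minimax value of `node`, skipping subtrees that cannot affect the result."""
    if not isinstance(node, list):
        return node                      # leaf: static evaluation
    if maximizing:                       # MAX node
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # cutoff: MIN would never allow this line
        return value
    value = math.inf                     # MIN node
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break                        # cutoff: MAX already has something better
    return value


def minimax(node: Tree, maximizing: bool = True) -> float:
    """Plain minimax without pruning, for comparison."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```

For the tree `[[3, 5], [6, 9], [1, 2]]` both functions return 6, but alpha-beta never evaluates the last leaf: once the third MIN node sees 1, it cannot beat the 6 already guaranteed at the root.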


Best-First Search 


BFS sorts the sequence of node expansions according to a heuristic function. The A* search algorithm [7] uses a heuristic evaluation function $f(n) = g(n) + h(n)$ to decide which successor node n to expand next. Here, g(n) is the cost of the path from the initial state to the current node n and h(n) is the estimated completion cost to a nearest goal state. If h does not overestimate the remaining cost, A* is guaranteed to find an optimal (least-cost) solution: it is said to be admissible. It does so with a minimal number of node expansions [9]; no other search algorithm (with the same heuristic h) can do better. This is possible because A* keeps the search graph in memory, occupying $O(w^d)$ memory cells for trees of width w and depth d.
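A compact A* can be built on a binary heap ordered by f(n) = g(n) + h(n). The sketch below is a minimal illustration; the function and parameter names are hypothetical, and `neighbors` is assumed to yield (successor, nonnegative cost) pairs. Instead of a closed list it skips stale heap entries, so a node is re-expanded if a cheaper path to it appears later. This keeps the result optimal for heuristics that are admissible but not consistent. The goal test happens when a node is expanded, not when it is generated.

```python
import heapq
import itertools
import math
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Neighbors = Callable[[Hashable], Iterable[Tuple[Hashable, float]]]


def astar(start: Hashable, goal: Hashable, neighbors: Neighbors,
          h: Callable[[Hashable], float]) -> Tuple[float, Optional[List[Hashable]]]:
    """A* search: expand nodes in order of f(n) = g(n) + h(n)."""
    tie = itertools.count()                    # tie-breaker so the heap never compares states
    open_heap = [(h(start), next(tie), 0.0, start)]
    g_best: Dict[Hashable, float] = {start: 0.0}
    parent: Dict[Hashable, Hashable] = {}
    while open_heap:
        _, _, g, node = heapq.heappop(open_heap)
        if g > g_best[node]:
            continue                           # stale entry: a cheaper path was found later
        if node == goal:                       # goal test on expansion
            path = [node]
            while path[-1] in parent:
                path.append(parent[path[-1]])
            return g, path[::-1]
        for succ, cost in neighbors(node):
            new_g = g + cost
            if new_g < g_best.get(succ, math.inf):
                g_best[succ] = new_g
                parent[succ] = node
                heapq.heappush(open_heap, (new_g + h(succ), next(tie), new_g, succ))
    return math.inf, None
```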

Best-first frontier search [4] also finds an optimal 
solution, but with a much lower space complexity than 
A*. It only keeps the frontier nodes in memory and dis- 
cards the interior (closed) nodes. Care must be taken 
to ensure that the search frontier does not contain gaps 
that would allow the search to leak back into interior re- 
gions. The memory savings are most pronounced in di- 
rected acyclic graphs. In the worst case, that is, in trees 
of width w, it still saves a fraction of 1/w of the nodes 
that BFS would need to store. 

Iterative-deepening A* (IDA*) [3] simulates A*’s 
best-first node expansion by a series of DFSs, each with 
the cost-bound f(n) increased by the minimal amount. 
The cost-bound is initially set to the heuristic estimate 
of the root node, h (root). Then, for each iteration, the 
bound is increased to the minimum value that exceeded 
the previous bound. Like A*, IDA* is guaranteed to find an optimal solution [3], provided the heuristic estimate function h is admissible, i.e., never overestimates the cost of the path to the goal. IDA* obeys the same asymptotic
branching factor as A* [7], if the number of newly 
expanded nodes grows exponentially with the search 
depth [3]. This growth rate, the heuristic branching fac- 
tor, depends on the average number of applicable op- 
erators per node and the discrimination power of the 
heuristic function h. 
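The iteration scheme of IDA* described above can be sketched as follows. This is a minimal illustration with hypothetical names; `neighbors` is assumed to yield (successor, nonnegative cost) pairs. Each iteration is a depth-first search that cuts off nodes whose f-value exceeds the current bound and reports the smallest f-value that exceeded it; that value becomes the next bound. Only the current path is stored, so memory grows linearly with the depth.

```python
import math
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

Neighbors = Callable[[Hashable], Iterable[Tuple[Hashable, float]]]


def ida_star(start: Hashable, goal: Hashable, neighbors: Neighbors,
             h: Callable[[Hashable], float]) -> Tuple[float, Optional[List[Hashable]]]:
    """Iterative-deepening A*: repeated depth-first searches with a growing f-bound."""
    path = [start]                      # only the current path is stored

    def search(g: float, bound: float) -> Tuple[float, bool]:
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f, False             # report the f-value that exceeded the bound
        if node == goal:
            return g, True
        smallest = math.inf             # minimum f-value seen beyond the bound
        for succ, cost in neighbors(node):
            if succ in path:
                continue                # skip cycles along the current path
            path.append(succ)
            t, found = search(g + cost, bound)
            if found:
                return t, True
            path.pop()
            smallest = min(smallest, t)
        return smallest, False

    bound = h(start)                    # initial bound: h(root)
    while True:
        t, found = search(0.0, bound)
        if found:
            return t, list(path)
        if t == math.inf:
            return math.inf, None       # no solution
        bound = t                       # raise the bound by the minimal amount
```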


Applications 


Typical applications of heuristic search techniques may 
be found in many areas—not only in the fields of ar- 
tificial intelligence and operations research, but also in 
other parts of computer science. 

In the two-dimensional rectangular cutting-stock problem [1], we are given a set $R = \{R_i = (l_i, w_i) : i = 1, \dots, m\}$ of m rectangles of width $w_i$ and length $l_i$ that are to be cut out of a single rectangular stock sheet S. Assuming that S is of width W and that its length L is theoretically unbounded, the problem is to find an optimal cut with minimal length expansion. Since the elements $R_i$ are cut after the cutting pattern has been determined, we can look at the problem as a bin-packing or vehicle-routing problem, both of which are known to be nondeterministic polynomial-time (NP) complete [8].

Very large scale integration (VLSI) floorplan opti- 
mization is a stage in the design of VLSI chips, where 
the dimensions of the basic building blocks (cells) must 
be determined, subject to the minimization of the to- 
tal chip layout area. This can be done with a BFS or 
a DFBB approach. Again, only small problem cases can 
be solved optimally, because VLSI floorplan optimiza- 
tion is also NP-complete. 

In the satisfiability problem, it must be determined 
whether a Boolean formula containing binary vari- 
ables in conjunctive normal form is satisfiable, that is, 
whether an assignment of truth values to the variables 
exists for which the formula is true. 
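The depth-first techniques above apply directly to satisfiability. Below is a minimal chronological-backtracking sketch. It has no unit propagation or branching heuristics, so it is not a practical solver. Clauses are lists of nonzero integers in the DIMACS convention, where k stands for $x_k$ and -k for its negation; this encoding and the name `solve` are assumptions made for illustration. Variables are assigned in order, and the search backtracks as soon as some clause has all of its literals assigned false.

```python
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: k means x_k, -k means NOT x_k


def solve(clauses: List[Clause], n_vars: int) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment for a CNF formula, or None if none exists."""
    assignment: Dict[int, bool] = {}

    def falsified(clause: Clause) -> bool:
        # A clause is falsified once every literal is assigned and none is true.
        for lit in clause:
            value = assignment.get(abs(lit))
            if value is None or value == (lit > 0):
                return False
        return True

    def dfs(var: int) -> bool:
        if any(falsified(c) for c in clauses):
            return False                # dead end: backtrack
        if var > n_vars:
            return True                 # complete assignment, no clause falsified
        for value in (True, False):
            assignment[var] = value
            if dfs(var + 1):
                return True
            del assignment[var]         # undo and try the other value
        return False

    return dict(assignment) if dfs(1) else None
```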

The 15-puzzle benchmark in single-agent game-tree search consists of 15 square tiles located in a square tray of size 4 × 4. One square, the “blank square,” is kept empty so that an orthogonally adjacent tile can slide into its position, thus leaving an empty position at its origin. The problem is to rearrange a given initial configuration into a goal configuration with the fewest moves, without lifting one tile over another. While it would seem easy to obtain any solution, finding optimal (shortest) solutions for the generalized n × n version of the puzzle is NP-complete. The 15-puzzle has a search space of $16! \approx 2 \cdot 10^{13}$ states.
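The state count quoted above can be checked directly. The 15 tiles and the blank occupy 16 cells, giving 16! arrangements. By the classical parity argument, a sliding move preserves an invariant, so only half of these arrangements are reachable from any given configuration.

```python
from math import factorial

arrangements = factorial(16)    # ways to place 15 tiles plus the blank on 16 cells
reachable = arrangements // 2   # sliding moves preserve a parity invariant
print(f"{arrangements:,} arrangements, {reachable:,} reachable from a given start")
# prints: 20,922,789,888,000 arrangements, 10,461,394,944,000 reachable from a given start
```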


See also 


> Asynchronous Distributed Optimization 
Algorithms 

> Automatic Differentiation: Parallel Computation 

> Load Balancing for Parallel Optimization 
Techniques 

> Parallel Computing: Complexity Classes 

> Parallel Computing: Models 

> Parallel Heuristic Search 

> Stochastic Network Problems: Massively Parallel 
Solution 


References 


1. Christofides N, Whitlock C (1977) An algorithm for two- 
dimensional cutting problems. Oper Res 25(1):30-44 

2. Knuth DE, Moore RW (1975) An analysis of alpha-beta prun- 
ing. Artif Intell 6(4):293-326 

3. Korf RE (1985) Depth-first iterative-deepening: An optimal 
admissible tree search. Artif Intell 27:97-109 

4. Korf RE, Zhang W, Thayer I, Hohwald H (2005) Frontier search. J ACM 52:715-748

5. Kumar V, Nau DS, Kanal L (1988) A general branch- 
and-bound formulation for AND/OR graph and game-tree 
search. In: Kanal L, Kumar V (eds) Search in Artificial Intelli- 
gence. Springer, New York, pp 91-130 

6. Lawler EL, Wood DE (1966) Branch and bound methods: A survey. Oper Res 14:699-719

7. Nilsson NJ (1980) Principles of artificial intelligence. Tioga 
Publ., Palo Alto 

8. Papadimitriou CH, Steiglitz K (1982) Combinatorial opti- 
mization: Algorithms and complexity. Prentice-Hall, Engle- 
wood Cliffs, NJ 

9. Pearl J (1984) Heuristics. Intelligent search strategies for 
computer problem solving. Addison-Wesley, Reading 


Heuristics for Maximum Clique 
and Independent Set 


MARCELLO PELILLO 
University Ca’ Foscari di Venezia, 
Venezia Mestre, Italy 


MSC2000: 90C59, 05C69, 05C85, 68W01 


Article Outline 


Keywords 

Sequential Greedy Heuristics 
Local Search Heuristics 
Advanced Search Heuristics 


Simulated Annealing 
Neural Networks 
Genetic Algorithms 
Tabu Search 


Continuous Based Heuristics 
Miscellaneous 

Conclusions 

See also 

References 


Keywords 


Heuristics; Algorithms; Clique; Independent set 




Throughout this article, G = (V, E) is an arbitrary undirected and weighted graph unless otherwise specified, where $V = \{1, \dots, n\}$ is the vertex set of G and $E \subseteq V \times V$ is its edge set. For each vertex $i \in V$, a positive weight $w_i$ is associated with i, collected in the weight vector $w \in \mathbb{R}^n$. For a subset $S \subseteq V$, the weight of S is defined as $W(S) = \sum_{i \in S} w_i$, and $G(S) = (S, E \cap (S \times S))$ is the subgraph induced by S. The cardinality of S, i.e., the number of its vertices, will be denoted by $|S|$.

A graph G = (V, E) is complete if all its vertices are pairwise adjacent, i.e., for all $i, j \in V$ with $i \neq j$, we have $(i, j) \in E$. A clique C is a subset of V such that G(C) is complete. The clique number of G, denoted by $\omega(G)$, is the cardinality of a maximum clique. The maximum clique problem asks for cliques of maximum cardinality. The maximum weight clique problem asks for cliques of maximum weight. Given the weight vector $w \in \mathbb{R}^n$, the weighted clique number is the total weight of the maximum weight clique, and will be denoted by $\omega(G, w)$.

We should distinguish a maximum clique from 
a maximal clique. A maximal clique is one that is 
not a proper subset of any other clique. A maximum 
(weight) clique is a maximal clique that has the maxi- 
mum cardinality (weight). 
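These definitions translate directly into code. The following minimal sketch assumes a graph stored as a dictionary of adjacency sets on V = {1, ..., n}; the representation and all function names are illustrative. The small example graph contains a clique that is maximal without being maximum.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph on V = {1, ..., n}


def make_graph(n: int, edges: Iterable[Tuple[int, int]]) -> Graph:
    g: Graph = {i: set() for i in range(1, n + 1)}
    for i, j in edges:
        g[i].add(j)
        g[j].add(i)
    return g


def weight(S: Iterable[int], w: Dict[int, float]) -> float:
    """W(S): the sum of the vertex weights in S."""
    return sum(w[i] for i in S)


def is_clique(g: Graph, S: Set[int]) -> bool:
    """True if every pair of distinct vertices in S is adjacent, i.e. G(S) is complete."""
    return all(j in g[i] for i, j in combinations(S, 2))


def is_maximal_clique(g: Graph, S: Set[int]) -> bool:
    """True if S is a clique that no outside vertex can extend."""
    return is_clique(g, S) and not any(S <= g[v] for v in g if v not in S)
```

In the graph with edges (1,2), (1,3), (2,3), (3,4), both {1, 2, 3} and {3, 4} are maximal cliques. Only the first is a maximum clique, so ω(G) = 3.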

An independent set (also called stable set or vertex 
packing) is a subset of V whose elements are pairwise 
nonadjacent. The maximum independent set problem 
asks for an independent set of maximum cardinality. 
The size of a maximum independent set is the stability number of G, denoted by $\alpha(G)$. The maximum weight independent set problem asks for an independent set of maximum weight. Given the weight vector $w \in \mathbb{R}^n$, the weighted stability number, denoted $\alpha(G, w)$, is the weight of the maximum weight independent set.

The complement graph of G = (V, E) is the graph $\bar{G} = (V, \bar{E})$, where $\bar{E} = \{(i, j) : i, j \in V,\ i \neq j \text{ and } (i, j) \notin E\}$. It is easy to see that S is a clique of G if and only if S is an independent set of $\bar{G}$. Any result or algorithm obtained for one of the two problems therefore has an equivalent form for the other. Hence $\alpha(G) = \omega(\bar{G})$ and, more generally, $\alpha(G, w) = \omega(\bar{G}, w)$.
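The identity α(G, w) = ω(Ḡ, w) can be checked on small graphs by exhaustive search. The sketch below builds the complement from adjacency sets and enumerates all subsets, which takes exponential time. It is meant only as an illustration on tiny instances; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def complement(g: Graph) -> Graph:
    """The complement graph: i, j adjacent iff i != j and (i, j) is not an edge of g."""
    vertices = set(g)
    return {i: vertices - g[i] - {i} for i in g}


def best_subset(g: Graph, w: Dict[int, float], want_clique: bool) -> Tuple[float, FrozenSet[int]]:
    """Exhaustive search (exponential time) for a maximum weight clique or independent set."""
    best: Tuple[float, FrozenSet[int]] = (0.0, frozenset())
    for r in range(1, len(g) + 1):
        for S in combinations(sorted(g), r):
            pairs_adjacent = [j in g[i] for i, j in combinations(S, 2)]
            ok = all(pairs_adjacent) if want_clique else not any(pairs_adjacent)
            total = sum(w[i] for i in S)
            if ok and total > best[0]:
                best = (total, frozenset(S))
    return best
```

For the path 1-2-3-4 with weights (1, 5, 1, 1), the maximum weight independent set of G and the maximum weight clique of Ḡ are both {2, 4}, with weight 6.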

The maximum clique and independent set prob- 
lems are well-known examples of intractable combi- 
natorial optimization problems [18]. Apart from the 
theoretical interest around these problems, they also 
find practical applications in such diverse domains as 


computer vision, experimental design, information re- 
trieval, fault tolerance, etc. Moreover, many important 
problems turn out to be easily reducible to them, and 
these include, for example, the Boolean satisfiability 
problem, the subgraph isomorphism problem, and the 
vertex covering problem. The maximum clique problem also has a certain historical value, as it was one of
the first problems shown to be NP-complete in the now 
classical paper of R.M. Karp on computational com- 
plexity [64]. 

Due to their inherent computational complexity, 
exact algorithms are guaranteed to return a solution 
only in a time which increases exponentially with 
the number of vertices in the graph, and this makes 
them inapplicable even to moderately large problem 
instances. Moreover, a series of recent theoretical re- 
sults show that the problems are in fact difficult to solve 
even in terms of approximation. Strong evidence of this 
fact came in 1991, when it was proved in [32] that if there is a polynomial time algorithm that approximates the maximum clique within a factor of $2^{\log^{1-\epsilon} n}$, then any NP-hard problem can be solved in ‘quasipolynomial’ time (i.e., in $2^{\log^{O(1)} n}$ time). The result was further refined in [6,7] one year later. Specifically, it was proved that there exists an $\epsilon > 0$ such that no polynomial time algorithm can approximate the size of the maximum clique within a factor of $n^{\epsilon}$, unless P = NP. Developments along these lines can be found
in [14,15,49]. 

In light of these negative results, much effort has recently been directed towards devising efficient heuristics for maximum clique and independent set which, although they come with no formal performance guarantee, are nevertheless of interest in practical applications. Lacking (almost by definition) a general theory of how these
algorithms work, their evaluation is essentially based on 
massive experimentation. In order to facilitate compar- 
isons among different heuristics, a set of benchmark 
graphs arising from different applications and prob- 
lems has recently been constructed in conjunction with 
the 1993 DIMACS challenge on cliques, coloring and 
satisfiability [63]. 

In this article we provide an informal survey of re- 
cent heuristics for maximum clique and related prob- 
lems, and up-to-date bibliographic pointers to the rele- 
vant literature. A more comprehensive review and bib- 
liography can be found in [18]. 
Sequential Greedy Heuristics


Many approximation algorithms in the literature for 
the maximum clique problem are called sequential 
greedy heuristics. These heuristics generate a maximal 
clique through the repeated addition of a vertex into 
a partial clique, or the repeated deletion of a vertex from 
a set that is not a clique. Decisions on which vertex to add or remove next are based on certain indicators associated with candidate vertices, for example, the vertex degree. There is also a distinction between heuristics that update the indicators every time a vertex is added or removed, and those that do not. Examples of such heuristics can be found in [62,89]. These heuristics differ in their choice of indicators and in how the indicators are updated. A heuristic of this type can run very fast.
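Here is a minimal sketch of a sequential greedy heuristic of the addition type, assuming adjacency sets and integer vertex labels. The indicator used (the degree of a candidate within the current candidate set, recomputed after every addition) is one common choice, picked for illustration. It is not necessarily the rule used in [62,89].

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def greedy_clique(g: Graph) -> Set[int]:
    """Grow a maximal clique by repeatedly adding the most promising candidate vertex."""
    clique: Set[int] = set()
    candidates = set(g)  # vertices adjacent to every vertex already in the clique
    while candidates:
        # Indicator: degree inside the candidate set, updated after each addition;
        # ties are broken in favor of the smallest vertex label.
        v = max(candidates, key=lambda u: (len(g[u] & candidates), -u))
        clique.add(v)
        candidates &= g[v]  # keep only vertices still adjacent to all of the clique
    return clique
```

The loop ends only when no vertex is adjacent to the whole clique, so the result is always a maximal clique, though not necessarily a maximum one.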


Local Search Heuristics 


Let us define $C_G$ to be the set of all the maximal cliques of G. Basically, a sequential greedy heuristic finds one set in $C_G$, hoping it is (close to) the optimal set, and stops. A possible way to improve our approximate solutions is to expand the search in $C_G$. For example, once we find a set $S \in C_G$, we can search its ‘neighbors’ to improve S. This leads to the class of the local
search heuristics [2]. Depending on the neighborhood 
structure and how the search is performed, different lo- 
cal search heuristics result. 

A well-known class of local search heuristics in the literature is the class of k-interchange heuristics. They are based on the k-neighborhood of a feasible solution. In the case of the maximum clique problem, a set $C \in C_G$ is a k-neighbor of S if $|C \,\triangle\, S| \le k$, where $k \le |S|$. A k-interchange heuristic first finds a maximal clique $S \in C_G$, then it searches all the k-neighbors of S and returns the best clique found. Clearly, the main factors for the complexity of this class of heuristics are the size of the neighborhood and the searches involved. For example, in the k-interchange heuristic, the complexity grows roughly as $O(n^k)$.

A class of heuristics designed to search various sets of $C_G$ is called the randomized heuristics. The main ingredient of this class of heuristics is the part that finds a random set in $C_G$. A possible way to do that is to include some random factors in the generation of a set of $C_G$. A randomized heuristic runs a heuristic (with random factors included) a number of times to find different sets over $C_G$. For example, we can randomize a sequential greedy heuristic and let it run N times. The complexity of a randomized heuristic depends on the complexity of the heuristic and the number N.
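A minimal sketch of a randomized heuristic follows, assuming adjacency sets and integer vertex labels. The greedy addition rule is randomized by drawing each new vertex with probability proportional to one plus its degree in the candidate set. This random factor is an illustrative choice, not the rule of [33] or [5]. The construction is repeated `n_runs` times (N in the text), and the largest clique is kept, so the cost is roughly N times that of one greedy run.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def randomized_greedy_clique(g: Graph, n_runs: int, seed: int = 0) -> Set[int]:
    """Run a randomized sequential greedy heuristic n_runs times; return the best clique."""
    rng = random.Random(seed)
    best: Set[int] = set()
    for _ in range(n_runs):
        clique: Set[int] = set()
        candidates = set(g)
        while candidates:
            pool = sorted(candidates)                       # fixed order for reproducibility
            weights = [1 + len(g[u] & candidates) for u in pool]
            v = rng.choices(pool, weights=weights, k=1)[0]  # random, degree-biased choice
            clique.add(v)
            candidates &= g[v]
        if len(clique) > len(best):
            best = clique                                   # each run yields a maximal clique
    return best
```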

An elaborate implementation of the randomized heuristic for the maximum independent set problem can be found in [33], where local search is combined with a randomized heuristic. The computational results reported there indicate that the approach is effective in finding large cliques in randomly generated graphs. A different implementation of a randomized algorithm for
the maximum independent set problem can be found 
in [5]. 


Advanced Search Heuristics 


Local search algorithms are only capable of finding local solutions of an optimization problem. Powerful variations of the basic local search procedure have been developed which try to avoid this problem, many of which are inspired by various phenomena occurring in nature.


Simulated Annealing 


In condensed-matter physics, the term ‘annealing’ 
refers to a physical process to obtain a pure lattice struc- 
ture, where a solid is first heated up in a heat bath un- 
til it melts, and next cooled down slowly until it solidi- 
fies into a low-energy state. During the process, the free 
energy of the system is minimized. Simulated anneal- 
ing, introduced in 1983 by S. Kirkpatrick, C.D. Gelatt 
and M.P. Vecchi [65], is a randomized neighborhood 
search algorithm based on the physical annealing pro- 
cess. Here, the solutions of a combinatorial optimiza- 
tion problem correspond to the states of the physical 
system, and the cost of a solution is equivalent to the 
energy of the state. 

In its original formulation, simulated annealing 
works essentially as follows. Initially, a tentative solu- 
tion in the state space is somehow generated. A new 
neighboring state is then produced from the previous 
one and, if the value of the cost function f improves, the new state is accepted; otherwise it is accepted with probability $\exp(\Delta f / t)$, where $\Delta f$ is the difference of the cost function between the new and the current state, and t is a parameter, usually called the temperature in analogy with physical annealing, which is varied carefully during the optimization process. The algorithm proceeds iteratively this way until a stopping condition is met. One of the critical aspects of the algorithm relates to the choice of the proper ‘cooling schedule’, i.e., how to decrease the temperature as the process evolves. While a logarithmically slow cooling schedule (yielding an
exponential time algorithm) provably guarantees the 
exact solution, faster cooling schedules, producing ac- 
ceptably good results, are in widespread use. Introduc- 
tory textbooks describing both theoretical and practical 
issues of the algorithm are [1,66]. 

E. Aarts and J. Korst [1], without presenting any ex- 
perimental result, suggested the use of simulated an- 
nealing for solving the independent set problem, using 
a penalty function approach. Here, the solution space 
is the set of all possible subsets of vertices of the graph 
G, and the problem is formulated as one of maximizing the cost function $f(V') = |V'| - \lambda |E'|$, where $|E'|$ is the number of edges in $G(V')$, and $\lambda$ is a weighting factor exceeding 1.
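The penalty-function approach can be sketched as follows. The sketch assumes adjacency sets, a single-vertex toggle move, and a geometric cooling schedule. The schedule, its parameters and the function name are illustrative assumptions, not the settings used in [1] or [56]. Toggling one vertex in or out of V' changes |V'| by one and |E'| by the number k of its neighbors in V', so Δf = ±(1 - λk) can be computed locally. The best penalty-free (independent) set visited is returned.

```python
import math
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def sa_independent_set(g: Graph, lam: float = 2.0, t0: float = 2.0, alpha: float = 0.95,
                       moves_per_t: int = 100, t_min: float = 0.01, seed: int = 0) -> Set[int]:
    """Simulated annealing maximizing f(V') = |V'| - lam * |E'| over vertex subsets V'."""
    rng = random.Random(seed)
    vertices = sorted(g)
    current: Set[int] = set()
    edges_inside = 0          # |E'|: number of edges of G(V')
    best: Set[int] = set()
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            v = rng.choice(vertices)
            k = len(g[v] & current)           # edges joining v to the other vertices of V'
            delta = (-1 + lam * k) if v in current else (1 - lam * k)
            if delta >= 0 or rng.random() < math.exp(delta / t):
                if v in current:
                    current.remove(v)
                    edges_inside -= k
                else:
                    current.add(v)
                    edges_inside += k
                if edges_inside == 0 and len(current) > len(best):
                    best = set(current)       # feasible: V' is an independent set
        t *= alpha                            # geometric cooling (an assumption)
    return best
```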

M. Jerrum [61] conducted a theoretical analysis of 
the performance of a clique-finding Metropolis process, 
i.e., simulated annealing at fixed temperature, on ran- 
dom graphs. He proved that the expected time for the 
algorithm to find a clique that is only slightly bigger 
than that produced by a naive greedy heuristic grows 
faster than any polynomial in the number of vertices. 
This suggests that ‘true’ simulated annealing would be 
ineffective for the maximum clique problem. 

Jerrum’s conclusion seems to be contradicted by 
practical experience. In [56], S. Homer and M. Peinado 
compare the performance of three heuristics, namely 
the greedy heuristic developed in [62], a randomized version of the Boppana-Halldórsson subgraph-exclusion algorithm [24], and simulated annealing,
over very large graphs. The simulated annealing algo- 
rithm was essentially that proposed by Aarts and Korst, 
with a simple cooling schedule. This penalty function 
approach was found to work better than the method in 
which only cliques are considered, as proposed in [61]. 
The algorithms were tested on various random graphs 
as well as on DIMACS benchmark graphs. The authors 
ran the algorithms on an SGI workstation for graphs
with up to 10,000 vertices, and on a Connection Ma- 
chine for graphs with up to 70,000 vertices. The overall 
conclusion was that simulated annealing outperforms 


the other competing algorithms; it also ranked among 
the best heuristics for maximum clique presented at the 
1993 DIMACS challenge [63]. 


Neural Networks 


Artificial neural networks (often simply referred to as 
‘neural networks’) are massively parallel, distributed 
systems inspired by the anatomy and physiology of the 
cerebral cortex, which exhibit a number of useful prop- 
erties such as learning and adaptation, universal ap- 
proximation, and pattern recognition (see [50,52] for 
an introduction). 

In the mid 1980s, J.J. Hopfield and D.W. Tank [57] 
showed that certain feedback continuous neural mod- 
els are capable of finding approximate solutions to dif- 
ficult optimization problems such as the traveling sales- 
man problem [57]. This application was motivated by 
the property that the temporal evolution of these mod- 
els is governed by a quadratic Liapunov function (typically called ‘energy function’ because of its analogy with
physical systems) which is iteratively minimized as the 
process evolves. Since then, a variety of combinatorial 
optimization problems have been tackled within this 
framework. The customary approach is to formulate 
the original problem as one of energy minimization, 
and then to use a proper relaxation network to find 
minimizers of this function. Almost invariably, the al- 
gorithms developed so far incorporate techniques bor- 
rowed from statistical mechanics, in particular mean 
field theory, which allow one to escape from poor local 
solutions. We mention the articles [69,82] and the text- 
book [88] for surveys of this field. In [1], an excellent in- 
troduction to a particular class of neural networks (the 
Boltzmann machine) for combinatorial optimization is 
provided. 

Early attempts at encoding the maximum clique and 
related problems in terms of a neural network were made in the late 1980s in [1,12,44,83], and [84]
(see also [85]). However, little or no experimental re- 
sults were presented, thereby making it difficult to eval- 
uate the merits of these algorithms. In [68], F. Lin and 
K. Lee used the quadratic zero-one formulation from 
[78] as the basis for their neural network heuristic. On 
random graphs with up to 300 vertices, they found their 
algorithm to be faster than the implicit enumerative al- 
gorithm in [26], while obtaining slightly worse results 
in terms of clique size. 
T. Grossman [45] proposed a discrete, deterministic version of the Hopfield model for maximum clique,
originally designed for an all-optical implementation. 
The model has a threshold parameter which determines 
the character of the stable states of the network. The 
author suggests an annealing strategy on this parame- 
ter, and an adaptive procedure to choose the network’s 
initial state and threshold. On DIMACS graphs the al- 
gorithm performs satisfactorily but it does not compare 
well with more powerful heuristics such as simulated 
annealing. 

A. Jagota [58] developed several variations of the 
Hopfield model, both discrete and continuous, to ap- 
proximate maximum clique. He evaluated the per- 
formance of his algorithms over randomly generated 
graphs as well as on harder graphs obtained by gen- 
erating cliques of varying size at random and taking 
their union. Experiments on graphs coming from the 
Solomonoff-Levin, or ‘universal’ distribution are also 
presented in [59]. The best results were obtained us- 
ing a stochastic steepest descent dynamics and a mean- 
field annealing algorithm, an efficient deterministic 
approximation of simulated annealing. These algo- 
rithms, however, were also the slowest, and this moti- 
vated Jagota et al. [60] to improve their running time. 
The mean-field annealing heuristic was implemented 
on a 32-processor Connection Machine, and a two- 
temperature annealing strategy was used. Addition- 
ally, a ‘reinforcement learning’ strategy was developed 
for the stochastic steepest descent heuristic, to auto- 
matically adjust its internal parameters as the process 
evolves. On various benchmark graphs, all their algo- 
rithms obtained significantly larger cliques than other 
simpler heuristics but ran slightly slower. Compared 
to more sophisticated heuristics, they obtained signifi- 
cantly smaller cliques on average but were considerably 
faster. 

M. Pelillo [80] takes a completely different approach 
to the problem, by exploiting a continuous formulation 
of maximum clique and the dynamical properties of the 
so-called relaxation labeling networks. His algorithm is described below, under Continuous Based Heuristics.


Genetic Algorithms 


Genetic algorithms are parallel search procedures inspired by the mechanisms of evolution in natural systems [45,55]. In contrast to more traditional optimization techniques, they work on a population of
points, which in the genetic algorithm terminology, are 
called chromosomes or individuals. In the simplest and 
most popular implementation, chromosomes are sim- 
ply long strings of bits. Each individual has an associ- 
ated ‘fitness’ value which determines its probability of 
survival in the next ‘generation’: the higher the fitness, 
the higher the probability of survival. The genetic algo- 
rithm starts out with an initial population of members 
generally chosen at random and, in its simplest ver- 
sion, makes use of three basic operators: reproduction, 
crossover and mutation. Reproduction usually consists 
of choosing the chromosomes to be copied in the next 
generation according to a probability proportional to 
their fitness. After reproduction, the crossover operator 
is applied between pairs of selected individuals to produce new offspring. The operator consists of swapping two or more subsegments of the strings corresponding to the two chosen individuals. Finally, the mutation operator is applied, which flips each bit of a chromosome independently with a fixed probability. The procedure just described is sometimes referred
to as the ‘simple’ genetic algorithm [45]. 
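The following minimal sketch applies the simple genetic algorithm to maximum independent set. It assumes bit-string chromosomes, where bit i set means vertex i is selected (vertices are numbered 0 to n-1 here), and a penalty fitness |V'| - λ|E'|. The fitness, the operator rates, the shift that makes fitness values positive for roulette-wheel selection, and all names are illustrative assumptions. Reproduction, one-point crossover and bit-flip mutation follow the description above.

```python
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def conflicts(bits: Sequence[int], edges: Sequence[Edge]) -> int:
    """Number of edges with both endpoints selected."""
    return sum(1 for i, j in edges if bits[i] and bits[j])


def simple_ga(n: int, edges: Sequence[Edge], pop_size: int = 30, generations: int = 100,
              p_cross: float = 0.8, lam: float = 2.0, seed: int = 0) -> List[int]:
    """Simple GA (reproduction, one-point crossover, mutation); returns the best independent set."""
    rng = random.Random(seed)
    p_mut = 1.0 / n                                   # per-bit mutation probability
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best: List[int] = []
    for _ in range(generations):
        scores = [sum(ind) - lam * conflicts(ind, edges) for ind in pop]
        for ind in pop:                               # remember the best feasible chromosome
            if conflicts(ind, edges) == 0 and sum(ind) > len(best):
                best = [i for i, b in enumerate(ind) if b]
        # Reproduction: roulette wheel on fitness shifted to be strictly positive.
        low = min(scores)
        weights = [s - low + 1.0 for s in scores]
        mates = [list(ind) for ind in rng.choices(pop, weights=weights, k=pop_size)]
        # Crossover: swap the tails of consecutive pairs at a random cut point.
        for a, b in zip(mates[0::2], mates[1::2]):
            if rng.random() < p_cross:
                cut = rng.randrange(1, n)
                a[cut:], b[cut:] = b[cut:], a[cut:]
        # Mutation: flip each bit independently with probability p_mut.
        for ind in mates:
            for i in range(n):
                if rng.random() < p_mut:
                    ind[i] = 1 - ind[i]
        pop = mates
    return best
```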

One of the earliest attempts to solve the maximum 
clique problem using genetic algorithms was done in 
1993 by B. Carter and K. Park [27]. After showing 
the weakness of the simple genetic algorithm in find- 
ing large cliques, even on small random graphs, they 
introduced several modifications in an attempt to im- 
prove performance. However, despite their efforts they 
did not get satisfactory results, and their general con- 
clusion was that genetic algorithms need to be heavily 
customized in order to be competitive with traditional 
approaches, and that they are computationally very expensive. In a later study [79], genetic algorithms were found to be less effective than simulated annealing. At almost the same time, T. Bäck and S. Khuri [8], working on the maximum independent set problem, arrived
at the opposite conclusion. By using a straightforward, 
general-purpose genetic algorithm called GENEsYs and 
a suitable fitness function which included a graded 
penalty term to penalize infeasible solutions, they got 
interesting results over random and regular graphs with 
up to 200 vertices. These results indicate that the choice 
of the fitness function is crucial for genetic algorithms 
to provide satisfactory results. 
A.S. Murthy et al. [74] also experimented with a genetic algorithm using a novel ‘partial copy crossover’,
and a modified mutation operator. However, they pre- 
sented results over very small (i.e., up to 50 vertices) 
graphs, thereby making it difficult to properly evaluate 
the algorithm. 

T.N. Bui and P.H. Eppley [25] obtained encourag- 
ing results by using a hybrid strategy which incorpo- 
rates a local optimization step at each generation of the 
genetic algorithm, and a vertex-ordering preprocessing 
phase. They tested the algorithm on some DIMACS graphs, obtaining results comparable to those in [39].

Instead of using the standard binary representation 
for chromosomes, J.A. Foster and T. Soule [36] em- 
ployed an integer-based encoding scheme. Moreover, 
they used a time weighting fitness function similar in 
spirit to those in [27]. The results obtained are inter- 
esting, but still not comparable to those obtained using 
more traditional search heuristics. 

C. Fleurent and J.A. Ferland [35] developed 
a general-purpose system for solving graph coloring, 
maximum clique, and satisfiability problems. As far 
as the maximum clique problem is concerned, they 
conducted several experiments using a hybrid genetic 
search scheme which incorporates tabu search and 
other local search techniques as alternative mutation 
operators. The results presented are encouraging, but 
running time is quite high. 

In [53], M. Hifi modifies the basic genetic algorithm 
in several aspects: 

a) a particular crossover operator creates two new dif- 
ferent children; 

b) the mutation operator is replaced by a spe- 
cific heuristic feasibility transition adapted to the 
weighted maximum stable set problem. 

This approach is also easily parallelizable. Experimental results on randomly generated graphs and also on some (unweighted) instances from the DIMACS testbed [63] are reported to validate this approach.

Finally, E. Marchiori [71] has developed a sim- 
ple heuristic-based genetic algorithm which consists 
of a combination of the simple genetic algorithm and 
a naive greedy heuristic procedure. Unlike previous ap- 
proaches, here there is a neat division of labor, the 
search for a large subgraph and the search for a clique 
being incorporated into the fitness function and the 
heuristic procedure, respectively. The algorithm outperforms previous genetic-based clique finding procedures over various DIMACS graphs, both in terms of
quality of solutions and speed. 


Tabu Search 


Tabu search, introduced independently by F. Glover 
[41,42] and P. Hansen and B. Jaumard [48], is a mod- 
ified local search algorithm, in which a prohibition- 
based strategy is employed to avoid cycles in the search 
trajectories and to explore new regions in the search 
space. At each step of the algorithm, the next solution 
visited is always chosen to be the best legal neighbor of 
the current state, even if its cost is worse than the cur- 
rent solution. The set of legal neighbors is restricted by 
one or more tabu lists which prevent the algorithm from going back to recently visited solutions. These lists are used to store historical information on the path followed by the search procedure. Sometimes the tabu restriction is relaxed, and tabu solutions are accepted if they satisfy some aspiration level condition. The standard example of a tabu list is one which contains the last k solutions examined, where k may be fixed or variable. Additional lists containing the last modifications performed, i.e., the changes made when moving from one solution to
the next, are also common. These types of lists are re- 
ferred to as short-term memories; other forms of memo- 
ries are also used to intensify the search in a promising 
region or to diversify the search to unexplored areas. 
Details on the algorithm and its variants can be found 
in [43] and [51]. 

In 1989, C. Friden et al. [37] proposed a heuristic for 
the maximum independent set problem based on tabu 
search. The size of the independent set to search for 
is fixed, and the algorithm tries to minimize the num- 
ber of edges in the current subset of vertices. They used 
three tabu lists: one for storing the last visited solutions 
and the other two to contain the last introduced/deleted 
vertices. They showed that by using hashing for imple- 
menting the first list and choosing a small value for the 
dimensions of the other two lists, a best neighbor may 
be found in almost constant time. 
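The fixed-size formulation used by Friden et al. can be illustrated with a small swap-based tabu search. This is a minimal sketch under simplifying assumptions: it keeps a single tabu list of recently moved vertices with a fixed tenure (the hypothetical parameter `tenure`) plus an aspiration criterion, and it does not reproduce the three hashed lists of [37]. The function names are illustrative only.

```python
def edges_inside(adj, S):
    """Number of edges of the graph with both endpoints in S."""
    return sum(1 for u in S for v in S if u < v and v in adj[u])


def tabu_independent_set(adj, k, max_iters=1000, tenure=2):
    """Search for k vertices spanning no edge by minimizing edges_inside with swap moves."""
    vertices = sorted(adj)
    S = set(vertices[:k])
    best, best_cost = set(S), edges_inside(adj, S)
    tabu = {}  # vertex -> last iteration during which it may not be moved again
    for it in range(max_iters):
        if best_cost == 0:  # an independent set of size k has been found
            break
        candidates = []
        for u in S:
            for v in vertices:
                if v in S:
                    continue
                cost = edges_inside(adj, (S - {u}) | {v})
                forbidden = tabu.get(u, -1) >= it or tabu.get(v, -1) >= it
                # aspiration: a tabu swap is allowed if it improves on the best cost
                if not forbidden or cost < best_cost:
                    candidates.append((cost, u, v))
        if not candidates:
            break
        cost, u, v = min(candidates)  # best legal neighbor, even if worse than S
        S = (S - {u}) | {v}
        tabu[u] = tabu[v] = it + tenure
        if cost < best_cost:
            best, best_cost = set(S), cost
    return best, best_cost


# Hypothetical example: the 4-cycle 0-1-2-3-0 with k = 2.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(tabu_independent_set(c4, 2))  # ({1, 3}, 0)
```

A cost of 0 certifies an independent set of size k; repeating the search for increasing k gives a heuristic lower bound on the independence number.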

In [38,86], three variants of tabu search for maxi- 
mum clique are presented. Here the search space con- 
sists of complete subgraphs whose size has to be max- 
imized. The first two versions are deterministic algo- 
rithms in which no sampling of the neighborhood is 
performed. The main difference between the two algorithms is that the first one uses just one tabu list (of the
last solutions visited), while the second one uses an additional list (with an associated aspiration mechanism)
containing the last vertices deleted. Also, two diversification strategies were implemented. The third algorithm is probabilistic in nature, and uses the same two
tabu lists and aspiration mechanism as the second one.
It differs from the second one in that it performs a random sampling
of the neighborhood, and also in that it allows for multiple deletion of vertices in the current solution. Here
no diversification strategy was used. In [38,86] results 
on randomly generated graphs were presented and the 
algorithms were shown to be very effective. P. Soriano 
and M. Gendreau [87] tested their algorithms over the 
DIMACS benchmark graphs and the results confirmed 
the early conclusions. 

R. Battiti and M. Protasi [13] extended the tabu 
search framework by introducing a reactive local search 
method. They modified a previously introduced reac- 
tive scheme by exploiting the particular neighborhood 
structure of the maximum clique problem. In general 
reactive schemes aim at avoiding the manual selection 
of control parameters by means of an internal feed- 
back loop. Battiti and Protasi’s algorithm adopts such 
a strategy to automatically determine the so-called pro- 
hibition parameter k, i.e., the size of the tabu list. Also 
an explicit memory-influenced restart procedure is ac- 
tivated periodically to introduce diversification. The 
search space consists of all possible cliques, as in the 
approach by Friden et al., and the function to be maximized is the clique size. The worst-case computational
complexity of a single iteration of this algorithm is O(max{n, m}), where
n and m are the number of vertices and edges of the
graph, respectively. They noticed, however, that in practice
the number of operations tends to be proportional
to the average degree of the vertices of the complement graph. They tested their algorithm on many DIMACS benchmark graphs, obtaining better results than
those presented at the DIMACS workshop in competitive time.


Continuous Based Heuristics 


In 1965, T.S. Motzkin and E.G. Straus [73] established
a remarkable connection between the maximum clique
problem and a certain quadratic programming problem. Let G = (V, E) be an undirected (unweighted)
graph and let $\Delta$ denote the standard simplex in the n-dimensional Euclidean space $\mathbb{R}^n$:

$$\Delta = \{x \in \mathbb{R}^n : x_i \ge 0 \text{ for all } i \in V,\ e^\top x = 1\},$$

where the letter e is reserved for a vector of appropriate length, consisting of unit entries (hence $e^\top x = \sum_{i \in V} x_i$).


Now, consider the following quadratic function,
sometimes called the Lagrangian of G:

$$g(x) = x^\top A_G x, \tag{1}$$

where $A_G = (a_{ij})$ is the adjacency matrix of G, i.e. the
symmetric n × n matrix where $a_{ij} = 1$ if $(i, j) \in E$ and
$a_{ij} = 0$ if $(i, j) \notin E$, and let $x^*$ be a global maximizer of g
on $\Delta$. In [73] it is proved that the clique number of G is
related to $g(x^*)$ by the following formula:

$$\omega(G) = \frac{1}{1 - g(x^*)}.$$


Additionally, it is shown that a subset of vertices S is
a maximum clique of G if and only if its characteristic vector $x^S$, which is the vector of $\Delta$ defined as $x^S_i =
1/|S|$ if $i \in S$ and $x^S_i = 0$ otherwise, is a global maximizer
of g on $\Delta$. In [40,81], the Motzkin-Straus theorem has
been extended by providing a characterization of maximal cliques in terms of local maximizers of g on $\Delta$.
One drawback associated with the original Motz- 
kin-Straus formulation relates to the existence of spuri- 
ous solutions, i. e., maximizers of g which are not in the 
form of characteristic vectors [77,81]. In principle, spu- 
rious solutions represent a problem since, while provid- 
ing information about the cardinality of the maximum 
clique, they do not allow us to easily extract its vertices. 
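The Motzkin-Straus formula and the spurious-solution phenomenon can both be checked by hand on a tiny graph. The sketch below uses exact rational arithmetic on a hypothetical path 0-1-2 (so $\omega(G) = 2$ and $\max_\Delta g = 1 - 1/2 = 1/2$); the helper names are illustrative only.

```python
from fractions import Fraction


def lagrangian(adj, x):
    """g(x) = x^T A_G x for a 0/1 adjacency matrix given as a list of lists."""
    n = len(adj)
    return sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))


def characteristic_vector(n, S):
    """x^S with x_i = 1/|S| for i in S and 0 otherwise."""
    return [Fraction(1, len(S)) if i in S else Fraction(0) for i in range(n)]


# Path 0 - 1 - 2: the maximum cliques are {0, 1} and {1, 2}, so omega(G) = 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
g_clique = lagrangian(path, characteristic_vector(3, {0, 1}))
print(g_clique, 1 / (1 - g_clique))  # 1/2 2

# A point of the simplex attaining the same (global maximum) value of g
# that is not a characteristic vector: a spurious solution.
spurious = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(lagrangian(path, spurious))  # 1/2
```

The spurious maximizer gives the correct value $g(x^*) = 1/2$, hence $\omega(G) = 2$, but its support {0, 1, 2} is not a clique.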
During the 1990s, there was much interest
in the Motzkin-Straus and related continuous formulations of the maximum clique problem. In fact, they suggest a fundamentally new way of solving the
maximum clique problem, by allowing us to shift from 
the discrete to the continuous domain in an elegant 
manner. As pointed out in [76], continuous formula- 
tions of discrete optimization problems turn out to be 
particularly attractive. They not only allow us to exploit 
the full arsenal of continuous optimization techniques, 
thereby leading to the development of new algorithms, 
but may also reveal unexpected theoretical properties. 
In [77], P.M. Pardalos and A.T. Phillips developed
a global optimization approach based on the Motzkin- 
Straus formulation and implemented an iterative clique 
retrieval process to find the vertices of the maximum 
clique. However, due to its high computational cost 
they were not able to run the algorithm over graphs 
with more than 75 vertices. 

Pelillo [80] used relaxation labeling algorithms to
approximately determine the size of the maximum
clique using the original Motzkin-Straus formulation.
These are parallel, distributed algorithms developed
and studied in computer vision and pattern recognition, which are also surprisingly related to replicator
equations, a class of dynamical systems widely studied in evolutionary game theory and related fields [54].
The model operates in the simplex $\Delta$ and possesses
a quadratic Liapunov function which drives its dynamical behavior. It is these properties that naturally suggest
using them as a local optimization algorithm for the 
Motzkin-Straus program. The algorithm is especially 
suited for parallel implementation, and is attractive for 
its operational simplicity, since no parameters need 
to be determined. Extensive simulations over random 
graphs with up to 2000 vertices have demonstrated the 
effectiveness of the approach and showed that the algo- 
rithm outperforms previous neural network heuristics. 
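A replicator-based local optimizer for the Motzkin-Straus program can be sketched in a few lines. As an assumption, the sketch uses the standard discrete-time replicator update $x_i \leftarrow x_i (A_G x)_i / (x^\top A_G x)$, which keeps x in $\Delta$ and does not decrease g for a nonnegative matrix; the details of Pelillo's relaxation labeling scheme may differ. The function name and stopping rule are hypothetical.

```python
def replicator_clique_estimate(adj, iters=1000, tol=1e-12):
    """Run discrete-time replicator dynamics on g(x) = x^T A x over the simplex.

    Returns the final point and the clique-size estimate 1 / (1 - g(x))
    suggested by the Motzkin-Straus formula.
    """
    n = len(adj)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iters):
        ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        g = sum(x[i] * ax[i] for i in range(n))
        if g == 0.0:  # edgeless graph: nothing to optimize
            break
        new_x = [x[i] * ax[i] / g for i in range(n)]  # stays on the simplex
        done = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if done:
            break
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    g = sum(x[i] * ax[i] for i in range(n))
    return x, 1.0 / (1.0 - g)


# Hypothetical example: a triangle {0, 1, 2} plus an isolated vertex 3.
tri = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
x, w = replicator_clique_estimate(tri)
print(round(w, 6))  # 3.0
```

Because the dynamics only climb to a local maximizer, on general graphs the estimate can be smaller than $\omega(G)$, and the final point may be a spurious (non-characteristic) vector.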

In order to avoid time-consuming iterative proce- 
dures to extract the vertices of the clique, L.E. Gibbons, 
D.W. Hearn and Pardalos [39] have proposed a heuris- 
tic which is based on a parameterized formulation of 
the Motzkin-Straus program. They consider the prob- 
lem of minimizing the function: 


$$h(x) = \frac{1}{2}\, x^\top A_G x + \Big(\sum_{i=1}^{n} x_i - 1\Big)^2$$

on the domain:

$$S(k) = \Big\{x \in \mathbb{R}^n : \sum_{i=1}^{n} x_i^2 \le \frac{1}{k},\ x_i \ge 0 \text{ for all } i\Big\},$$

where k is a fixed parameter. Let $x^*$ be a global minimizer of h on S(k), and let $V(k) = h(x^*)$. In [39] it is
proved that V(k) = 0 if and only if there exists an independent set S of G with size $|S| \ge k$. Moreover, the
vertices of G associated with the indices of the positive components of $x^*$ form an independent set of size
greater than or equal to k.

These properties motivated the following procedure
to find a maximum independent set of G or, equivalently, a maximum clique of the complement graph $\bar G$. Minimize the function
h over S(k), for different values of k between predetermined upper and lower bounds. If V(k) = 0 and
$V(k+1) \ne 0$ for some k, then the maximum clique of
$\bar G$ has size k, and its vertices are determined by the positive components of the solution. Since minimizing h
on S(k) is a difficult problem, they developed a heuristic
based on the observation that by removing the nonnegativity constraints, the problem becomes that of minimizing
a quadratic form over a sphere, a problem which is solvable in polynomial time. However, in so doing a heuristic procedure is needed to round the approximate solutions of this new problem to approximate solutions of
the original one. Moreover, since the problem is solved
approximately, one has to find the value of the spherical
constraint 1/k which yields the largest independent set.
A careful choice of k is therefore needed. The resulting
algorithm was tested over various DIMACS benchmark 
graphs [63] and the results obtained confirmed the ef- 
fectiveness of the approach. 
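The easy direction of the characterization in [39] can be verified directly: if S is an independent set with $|S| \ge k$, then its characteristic vector lies in S(k) and makes h vanish, so V(k) = 0. The sketch below checks this with exact arithmetic on a hypothetical 4-cycle; it does not perform the global minimization needed to show $V(k) \ne 0$ for larger k, and the helper names are illustrative.

```python
from fractions import Fraction


def h(adj, x):
    """h(x) = 1/2 x^T A_G x + (sum_i x_i - 1)^2."""
    n = len(adj)
    quad = sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))
    return Fraction(1, 2) * quad + (sum(x) - 1) ** 2


def in_S(x, k):
    """Membership in S(k) = {x >= 0 : sum_i x_i^2 <= 1/k}."""
    return all(v >= 0 for v in x) and sum(v * v for v in x) <= Fraction(1, k)


# 4-cycle 0-1-2-3-0: {0, 2} is a maximum independent set of size 2.
c4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
x = [Fraction(1, 2), 0, Fraction(1, 2), 0]  # characteristic vector of {0, 2}
print(h(c4, x), in_S(x, 2), in_S(x, 3))  # 0 True False
```

For |S| = 2 the spherical constraint $\sum_i x_i^2 = 1/2 \le 1/k$ holds exactly for k up to 2, which is why the search over k brackets the independence number.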

The spurious solution problem has been solved
in [16]. Consider the following regularized version of
function g:

$$\hat g(x) = x^\top A_G x + \frac{1}{2}\, x^\top x, \tag{2}$$

which is obtained from (1) by substituting the adjacency matrix $A_G$ of G with

$$\hat A_G = A_G + \frac{1}{2} I,$$

where I is the identity matrix. Unlike the Motzkin-Straus formulation, it can be proved that all maximizers of $\hat g$ on $\Delta$ are strict, and are characteristic vectors
of maximal/maximum cliques in the graph. In an exact sense, therefore, a one-to-one correspondence exists
between maximal cliques and local maximizers of $\hat g$ in
$\Delta$ on the one hand and maximum cliques and global
maximizers on the other hand. In [16,20], replicator
equations are used in conjunction with this spurious-free formulation to find maximal cliques of G. Note
that here the vertices comprising the clique are directly
given by the positive components of the converged vectors, and no iterative procedure is needed to determine
them, unlike in [77]. The results obtained over a set of random as well as DIMACS benchmark graphs were encouraging, especially considering that replicator equations do not incorporate any mechanism to escape from
local optimal solutions. This suggests that the basins
of attraction of the global solution with respect to the
quadratic functions g and $\hat g$ are quite large; for a thorough empirical analysis see also [23]. One may wonder whether a subtle choice of initial conditions and/or
a variant of the dynamics may significantly improve the 
results, but experiments in [22] indicate this is not the 
case. 
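The effect of the regularization term can be seen on the same hypothetical path 0-1-2 used for the unregularized program. For the characteristic vector of a clique of size k one gets $\hat g(x^S) = 1 - 1/k + 1/(2k) = 1 - 1/(2k)$, while the former spurious maximizer now has a strictly smaller value. A minimal exact-arithmetic check, with an illustrative helper name:

```python
from fractions import Fraction


def g_hat(adj, x):
    """Regularized objective (2): x^T (A_G + I/2) x."""
    n = len(adj)
    quad = sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + Fraction(1, 2) * sum(v * v for v in x)


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]                      # omega(G) = 2
clique_vec = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]    # characteristic vector of {0, 1}
spurious = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]   # maximizer of g, not of g_hat
print(g_hat(path, clique_vec), g_hat(path, spurious))  # 3/4 11/16
```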

In [19] the properties of the following function are
studied:

$$\hat g_\alpha(x) = x^\top A_G x + \alpha\, x^\top x.$$

It is shown that when $0 < \alpha < 1$ all the properties
enjoyed by the standard regularization approach [16]
hold true. Specifically, in this case a one-to-one correspondence between local/global maximizers in the
continuous space and local/global solutions in the discrete space exists. For negative values of $\alpha$ an interesting picture emerges: as the absolute value of $\alpha$ grows larger, local maximizers corresponding to maximal cliques disappear. In [19], bounds on the parameter $\alpha$ are derived which affect the stability of these solutions. These
results have suggested an annealed replication heuristic, which consists of starting from a large negative
$\alpha$ and then properly reducing its absolute value during the optimization process. For each value of $\alpha$, standard replicator
equations are run in order to obtain local solutions of
the corresponding objective function. The rationale behind this idea is that for values of $\alpha$ with a properly
large absolute value only local solutions corresponding to large maximal cliques will survive, together with
various spurious maximizers. As the absolute value of $\alpha$ is reduced, spurious solutions disappear and smaller maximal cliques become stable. An annealing schedule is
proposed which is based on the assumption that the 
graphs being considered are random. In this respect, 
the proposed procedure differs from usual simulated 
annealing approaches, which mostly use a ‘black-box’ 
cooling schedule. Experiments conducted over several 
DIMACS benchmark graphs confirm the effectiveness 
of the proposed approach and the robustness of the 
annealing strategy. The overall conclusion is that the 
annealing procedure does help to avoid inefficient lo- 


cal solutions, by initially driving the dynamics towards 
promising regions in state space, and then refining the 
search as the annealing parameter is increased. 
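The annealing loop can be sketched as follows. Two assumptions are made explicit here. First, replicator dynamics need nonnegative payoffs, so for negative $\alpha$ the sketch adds a constant c to every entry of $A_G + \alpha I$; on $\Delta$ this only shifts the objective by c and leaves its maximizers unchanged. Second, the schedule passed in is hypothetical and is not the random-graph schedule proposed in [19]. The value $\alpha = 1/2$ recovers the regularized program (2).

```python
def replicator(M, x, iters=5000, tol=1e-12):
    """Discrete-time replicator dynamics for maximizing x^T M x over the simplex."""
    n = len(M)
    # x^T (M + c ee^T) x = x^T M x + c on the simplex, so the shift keeps the
    # maximizers unchanged while making all payoffs nonnegative.
    c = max(0.0, -min(min(row) for row in M))
    for _ in range(iters):
        p = [sum((M[i][j] + c) * x[j] for j in range(n)) for i in range(n)]
        g = sum(x[i] * p[i] for i in range(n))
        if g <= 0.0:
            break
        new_x = [x[i] * p[i] / g for i in range(n)]
        done = max(abs(u - v) for u, v in zip(new_x, x)) < tol
        x = new_x
        if done:
            break
    return x


def annealed_replication(adj, alphas, eps=1e-6):
    """Run replicator dynamics on A_G + alpha*I for an increasing sequence of alphas."""
    n = len(adj)
    x = [1.0 / n] * n
    for alpha in alphas:  # hypothetical schedule, e.g. from negative alpha up to 1/2
        M = [[adj[i][j] + (alpha if i == j else 0.0) for j in range(n)] for i in range(n)]
        x = replicator(M, x)  # warm start from the previous stage
    return [i for i in range(n) if x[i] > eps]  # support of the final vector


# Hypothetical example: a triangle {0, 1, 2} and a disjoint edge {3, 4}.
adj = [[0, 1, 1, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]]
print(annealed_replication(adj, [-1.0, -0.5, 0.0, 0.5]))  # [0, 1, 2]
```

Because the last stage uses $\alpha = 1/2$, the support of the returned vector is read off directly as a clique, as in the spurious-free formulation.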

The Motzkin-Straus theorem has been generalized
to the weighted case in [40]. Note that the Motzkin-Straus program can be reformulated as a minimization
problem by considering the function

$$f(x) = x^\top (I + A_{\bar G})\, x,$$

where $A_{\bar G}$ is the adjacency matrix of the complement
graph $\bar G$. Since $A_G + A_{\bar G} = ee^\top - I$ and $e^\top x = 1$ on $\Delta$, we have $f(x) = 1 - g(x)$ on $\Delta$; hence, if $x^*$ is a global
minimizer of f in $\Delta$, then $\omega(G) = 1/f(x^*)$.
This is simply a different formulation of the Motzkin-Straus theorem. Given a weighted graph G = (V, E) with
weight vector w, let $\mathcal{M}(G, w)$ be the class of symmetric
n × n matrices $B = (b_{ij})_{i,j \in V}$ defined by $2b_{ij} \ge b_{ii} + b_{jj}$ if
$(i, j) \notin E$, $b_{ij} = 0$ otherwise ($i \ne j$), and $b_{ii} = 1/w_i$ for all $i
\in V$.

Given the following quadratic program, which is in
general indefinite,

$$\min f(x) = x^\top B x \quad \text{s.t. } x \in \Delta, \tag{3}$$

in [40] it is shown that for any $B \in \mathcal{M}(G, w)$ we have

$$\frac{1}{\omega(G, w)} = f(x^*),$$

where $x^*$ is a global minimizer of program (3). Furthermore, denote by $x^S$ the weighted characteristic vector of
S, which is a vector with coordinates $x^S_i = w_i/W(S)$ if
$i \in S$ and $x^S_i = 0$ otherwise, where $W(S) = \sum_{i \in S} w_i$ is the total weight of S. It can be seen that a subset S of vertices of a weighted graph G is a maximum
weight clique if and only if its characteristic vector $x^S$ is
a global minimizer of (3). Notice that the matrix $I + A_{\bar G}$
belongs to $\mathcal{M}(G, e)$. In other words, the Motzkin-Straus
theorem turns out to be a special case of the preceding
result.
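For a weighted clique S the value of (3) at $x^S$ is $\sum_{i \in S} (w_i/W(S))^2 / w_i = 1/W(S)$, because all off-diagonal entries of B inside a clique vanish. The sketch below builds one admissible member of $\mathcal{M}(G, w)$ (choosing $b_{ij} = (b_{ii} + b_{jj})/2$ on non-edges, which is an assumption, not the only choice) and checks this on a hypothetical weighted path. Helper names are illustrative.

```python
from fractions import Fraction


def build_B(n, edges, w):
    """One member of M(G, w): b_ii = 1/w_i, b_ij = 0 on edges and
    b_ij = (b_ii + b_jj)/2 on non-edges (smallest value allowed by 2b_ij >= b_ii + b_jj)."""
    E = {frozenset(e) for e in edges}
    B = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        B[i][i] = Fraction(1, w[i])
    for i in range(n):
        for j in range(n):
            if i != j and frozenset((i, j)) not in E:
                B[i][j] = (B[i][i] + B[j][j]) / 2
    return B


def f(B, x):
    """Objective of program (3): x^T B x."""
    n = len(B)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))


def weighted_char(n, S, w):
    """Weighted characteristic vector x^S_i = w_i / W(S) on S, 0 elsewhere."""
    W = sum(w[i] for i in S)
    return [Fraction(w[i], W) if i in S else Fraction(0) for i in range(n)]


# Path 0 - 1 - 2 with weights (1, 2, 4): the maximum weight clique is {1, 2}, W = 6.
w = [1, 2, 4]
B = build_B(3, [(0, 1), (1, 2)], w)
for S in ({0, 1}, {1, 2}):
    val = f(B, weighted_char(3, S, w))
    print(sorted(S), val, 1 / val)
# [0, 1] 1/3 3
# [1, 2] 1/6 6
```

The smaller objective value belongs to the heavier clique, in agreement with $\omega(G, w) = 1/f(x^*)$.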

As in the unweighted case, the existence of spurious 
solutions entails the lack of one-to-one correspondence 
between the solutions of the continuous problem and 
those of the original, discrete one. In [21] these spurious solutions are characterized and a regularized version which avoids this kind of problem is introduced,
exactly as in the unweighted case (see also [17]). Replicator equations are then used to find maximal weight
cliques in weighted graphs, using this formulation. Experiments with this approach on both random graphs
and DIMACS graphs are reported. The results obtained 
are compared with those produced by a very efficient 
maximum weight clique algorithm of the branch and 
bound variety. The algorithm performed remarkably 
well especially on large and dense graphs, and it was 
typically an order of magnitude more efficient than its 
competitor. 

Finally, we mention the work by Massaro and Pelillo 
[72], who transformed the Motzkin-Straus program 
into a linear complementarity problem [31], and then 
solved it using a variation of Lemke’s well-known algo- 
rithm [67]. The preliminary results obtained over many 
weighted and unweighted DIMACS graphs show that 
this approach substantially outperforms all other con- 
tinuous based heuristics. 


Miscellaneous 


Another type of heuristic that finds a maximal clique of
G is called the subgraph approach (see [11]). It is based
on the fact that a maximum clique C of a subgraph
$G' \subseteq G$ is also a clique of G. The subgraph approach
first finds a subgraph $G' \subseteq G$ such that the maximum
clique of G' can be found in polynomial time. Then
it finds a maximum clique of G' and uses it as the approximate solution. The advantage of this approach
is that in finding the maximum clique $C \subseteq G'$, one has
(implicitly) searched many other cliques of G' (every clique of G' is also a clique of G). Because of the special structure of G', this implicit
search can be done efficiently. In [11], G' is a maximal induced triangulated subgraph of G. Since many
classes of graphs have polynomial algorithms for the
maximum clique problem, the same idea also applies
there. For example, the class of edge-maximal triangulated subgraphs was chosen in [9,90,91]. Some of
the greedy heuristics, randomized heuristics and sub- 
graph approach heuristics are compared in [90] and 
[91] on randomly generated weighted and unweighted 
graphs. 
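The polynomial step of the subgraph approach, finding a maximum clique of a triangulated (chordal) subgraph G', can be sketched with maximum cardinality search: for a chordal graph, the visiting order is the reverse of a perfect elimination ordering, so each vertex together with its previously visited neighbors forms a clique, and the largest of these is a maximum clique. This is one standard technique chosen for illustration; the article does not specify which method [11] uses, and the sketch assumes G' is already given.

```python
def max_clique_chordal(adj):
    """Maximum clique of a chordal graph (dict: vertex -> set of neighbors)
    via maximum cardinality search."""
    weight = {v: 0 for v in adj}  # number of already visited neighbors
    visited = []
    unvisited = set(adj)
    best = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        # v plus its previously visited neighbors is a clique in a chordal graph
        clique = [u for u in visited if u in adj[v]] + [v]
        if len(clique) > len(best):
            best = clique
        unvisited.remove(v)
        visited.append(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    return best


# Hypothetical chordal graph: triangles {0,1,2} and {1,2,3} sharing an edge, plus pendant 4.
g = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
print(len(max_clique_chordal(g)))  # 3
```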

Various new heuristics were presented at the 1993 
DIMACS challenge devoted to clique, coloring and sat- 
isfiability [63]. In particular, in [10] an algorithm is pro- 
posed which is based on the observation that finding 
the maximum clique in the union of two cliques can be 
done using bipartite matching techniques. In [46] re- 


stricted backtracking is used to provide a trade-off be- 
tween the size of the clique and the completeness of the 
search. In [70] an edge projection technique is proposed 
to obtain a new upper bound heuristic for the max- 
imum independent set problem. This procedure was 
used, in conjunction with the Balas-Yu branching rule 
[11], to develop an exact branch and bound algorithm 
which works well especially on sparse graphs. 

In [3] a new population-based optimization
heuristic, inspired by the natural behavior of human or
animal scouts in exploring unknown regions, is proposed and applied to maximum clique. The results obtained over
a few DIMACS graphs are comparable with those ob- 
tained using continuous-based heuristics but are infe- 
rior to those obtained by reactive local search. 

Recently, DNA computing [4] has also emerged as 
a potential technique for the maximum clique problem 
[75,92]. The major advantage of DNA computing is its 
high parallelism, but at present the size of graphs that
DNA-based algorithms can handle is limited to a few tens of vertices.

Additional heuristics for the maximum clique/ 
independent set and related problems on arbitrary or 
special classes of graphs can be found in [28,29,30,34].


Conclusions 


During the 1990s, research on the maximum clique and 
related problems has yielded many interesting heuris- 
tics. This article has provided an expository survey on 
these algorithms and an up-to-date bibliography (as of 
2000). However, the activity in this field is so extensive 
that a survey of this nature is outdated before it is writ- 
ten. 


See also 


> Graph Coloring 

> Greedy Randomized Adaptive Search Procedures 

> Replicator Dynamics in Combinatorial 
Optimization 
We formulate a generalized local maximum principle
which gives necessary conditions for optimality of abnormal trajectories in optimal control problems. The
results are based on a hierarchy of primal construc- 
tions of high-order approximating cones (consisting of 
tangent directions for equality constraints, feasible di- 
rections for inequality constraints, and directions of 
decrease for the objective) and dual characterizations 
of empty intersection properties of these cones (see 
> High-order necessary conditions for optimality for 
abnormal points). Characteristic for the theorem is that 
the multiplier associated with the objective is nonzero. 

We consider an optimal control problem in Bolza 
form with fixed terminal time: 

(OC) Minimize the functional

$$I(x,u) = \int_0^T L(x(t), u(t), t)\, dt + \ell(x(T)) \tag{1}$$

subject to the constraints

$$\dot x(t) = f(x(t), u(t), t), \qquad x(0) = 0, \qquad q(x(T)) = 0,$$
$$u(\cdot) \in \mathcal{U} = \{u \in L^m_\infty(0, T) : u(t) \in U\}.$$


The terminal time T is fixed and we make the following
regularity assumptions on the data: $L: \mathbb{R}^n \times \mathbb{R}^m \times [0, T] \to \mathbb{R}$ and $f: \mathbb{R}^n \times \mathbb{R}^m \times [0, T] \to \mathbb{R}^n$ are $C^\infty$ in (x, u) for
every $t \in [0, T]$; both functions and their derivatives are
measurable in t for every (x, u), and the functions and all
partial derivatives are bounded on compact subsets of
$\mathbb{R}^n \times \mathbb{R}^m \times [0, T]$; $\ell: \mathbb{R}^n \to \mathbb{R}$ and $q: \mathbb{R}^n \to \mathbb{R}^k$ are $C^\infty$ and
the rows of the Jacobian matrix $q_x$ (i.e. the gradients
of the equations defining the terminal constraint) are
linearly independent; $U \subset \mathbb{R}^m$ is a closed and convex
set with nonempty interior. Let


$$H(\lambda_0, \lambda, x, u, t) = \lambda_0 L(x, u, t) + \lambda f(x, u, t) \tag{2}$$


be the Hamiltonian for the control problem. If the
input-trajectory pair $(x_*, u_*)$ is optimal for problem
(OC), then the local maximum principle [7] states that
there exist a constant $\lambda_0 \ge 0$ and an absolutely continuous
function $\lambda: [0, T] \to (\mathbb{R}^n)^*$ (which we write as a row vector), which is a solution to the adjoint equation

$$\dot\lambda = -H_x(\lambda_0, \lambda(t), x_*(t), u_*(t), t),$$

with terminal condition

$$\lambda(T) = \lambda_0 \ell_x(x_*(T)) + \nu q_x(x_*(T)) \tag{3}$$

(for some row vector $\nu \in (\mathbb{R}^k)^*$), such that $(\lambda_0, \lambda(t)) \ne 0$ for all $t \in [0, T]$ and the following local minimum
condition holds for all $u \in U$:

$$\langle H_u(\lambda_0, \lambda(t), x_*(t), u_*(t), t),\, u - u_*(t)\rangle \ge 0. \tag{4}$$


Input-trajectory pairs $(x_*, u_*)$ for which multipliers $\lambda_0$ and $\lambda$ exist such that these conditions are satisfied are called (weak) extremals. If $\lambda_0 > 0$, then it is
possible to normalize $\lambda_0 = 1$ and the extremal is called
normal, while extremals with $\lambda_0 = 0$ are called abnormal. Although the terminology abnormal, which has
its origins in the calculus of variations [4], seems to
suggest that this type of extremal is an aberration,
for optimal control problems this is not the case. The
phenomenon is quite general and abnormal extremals
cannot be excluded from optimality a priori. For instance, there exist optimal abnormal trajectories for the
standard problem of stabilizing the harmonic oscillator in minimum time, a simple time-invariant linear system.

In the abnormal case conventional necessary conditions for optimality provide conditions which only describe the structure of the constraints. For example, if
there are no control constraints, then these conditions
only involve the equality constraint defined by the dynamics and terminal conditions as the zero set of an operator $F: Z \to Y$ between Banach spaces. If $F'(z_*)$ is
not onto, but $\operatorname{Im} F'(z_*)$ is closed (and this is always the
case for the optimal control problem), then the standard
Lagrange multiplier type necessary conditions for optimality (which imply the local maximum principle [7])
can be satisfied trivially by choosing a multiplier which
annihilates the image of $F'(z_*)$ and setting all other
multipliers to zero. The corresponding necessary conditions are independent of the objective and describe
only the structure of the constraint, yielding little information about the optimality of the abnormal trajectory.

Much of the difficulty in analyzing abnormal points 
in extremum problems can be traced back to the fact 
that the equality constraint is typically no longer a man- 
ifold near abnormal points, but intersections of man- 
ifolds are common. Hence, in order to develop neces- 
sary and/or sufficient conditions for optimality of ab- 
normal extremals, it is imperative to analyze different 
branches of the zero-set of F. Finding these branches 
is at the heart of the matter. Generalizing a result of
E.R. Avakov [2,3], a high-order generalization of
the classical Lyusternik theorem is given in [10] which, for general $p \in \mathbb{N}$, describes the structure of p-order tangent
directions to an operator equality constraint in a Banach space for nonregular operators under a more general surjectivity assumption involving the first p derivatives of the operator. Based on these results, p-order tangent cones to the equality constraint can be calculated explicitly along critical directions which comprise the
low-order terms. Combining these cones with standard
constructions of high-order cones of decrease for the
functional and high-order feasible cones to inequality
constraints, all taken along critical directions, generalized necessary conditions for optimality for extremum
problems in Banach spaces can be derived which allow one to incorporate the objective with a nonzero multiplier. Characteristic of these results is that they are
parametrized by critical directions as it is ‘natural’ near 
abnormal points. 

In [12] (see > High-order necessary conditions for
optimality for abnormal points) an abstract formula- 
tion of these results is presented for minimization prob- 
lems in Banach spaces. The main result gives a dual 
characterization for the empty intersection property of 
the various approximating cones along critical direc- 
tions, but primal arguments using the cones themselves 
are often equally effective. In this article we formulate 
these abstract results for the optimal control problem, 
but we only consider the so-called weak or local version 
of the maximum principle. This result is weaker than 
the Pontryagin maximum principle [15] in the sense 
that the Pontryagin maximum principle asserts that the 
Hamiltonian of the control problem is indeed minimized over the control set at every time along the reference trajectory by the reference control. The local version only gives the first-order necessary conditions
for this minimization property. However, it is well known how to use
an argument of A. Ya. Dubovitskii to derive the Pon- 
tryagin maximum principle from the local version [7, 
Lecture 13] and a preliminary strong version of the re- 
sults of this article is given in [9]. 

Other theories of necessary conditions which are 
tailored to abnormal processes include a method 
known as ‘weakening equality constraints’ introduced 
in [14] and developed further in [5]. References [2,3] 
are along the lines of the results described here and 
give necessary conditions for optimality of abnormal 
extremals based on quadratic approximations. Simi- 
larly, both weak and strong versions of a second or- 
der generalized maximum principle are given by the 
authors in [8]. While mostly optimization related tech- 
niques are used in these papers, on a different level [1] 
uses differential geometric techniques to develop a the- 
ory of the second variation for abnormal extremals. 
They give both necessary and sufficient conditions for 
so-called corank-1 abnormal extremals (extremals for 
which there exists a unique multiplier) in terms of the 
Jacobi equation and related Morse indices and nullity 
theorems. Second-order necessary conditions for optimality of accessory-problem type without normality assumptions were first given in [6].
Also, the results in [16] are derived without making 
normality assumptions. 


Regularity in the Equality Constraint 


We model the optimal control problem (OC) in the
framework of optimization theory as a minimization
problem in a Banach space under equality and inequality constraints. Let $W^n_1(0, T)$ denote the Banach space
of all absolutely continuous functions $x: [0, T] \to \mathbb{R}^n$
with norm $\|x\| = \|x(0)\| + \int_0^T \|\dot x(s)\|\, ds$ and let
$W^n_{1,0}(0,T) = W^n_1(0, T) \cap \{x \in W^n_1(0, T) : x(0) = 0\}$.
Then the problem is to minimize the functional I
over a class $\mathcal{A}$ of input-trajectory pairs $(x, u) \in
W^n_{1,0}(0,T) \times L^m_\infty(0,T)$ which is defined by equality constraints and the convex inequality constraint
$u \in \mathcal{U}$. The equality constraints can be modeled as

$$\mathcal{F} = \{(x, u) \in W^n_{1,0}(0, T) \times L^m_\infty(0, T) : F(x, u) = 0\},$$

where F is the operator

$$F: W^n_{1,0}(0, T) \times L^m_\infty(0, T) \to W^n_{1,0}(0, T) \times \mathbb{R}^k$$

with F(x, u) given by

$$F(x, u) = \Big( x(\cdot) - \int_0^{\cdot} f(x(s), u(s), s)\, ds,\ q(x(T)) \Big).$$

It is easy to see that the operator F has continuous
Fréchet derivatives of arbitrary order. For instance,
$F'(x, u)$ acting on $(\eta, \xi) \in W^n_{1,0}(0, T) \times L^m_\infty(0, T)$ is given
by

$$F'(x, u)(\eta, \xi) = \Big( \eta(\cdot) - \int_0^{\cdot} \big(f_x \eta(s) + f_u \xi(s)\big)\, ds,\ q_x(x(T))\, \eta(T) \Big).$$

All partial derivatives of f are evaluated along a reference input-trajectory pair $(x, u) \in \mathcal{A}$. The formulas for
higher-order derivatives are given by equally straightforward multilinear forms.

We first describe the image of the operator $F'(x_*, u_*)$ for a reference input-trajectory pair $(x_*, u_*)$. Denote the fundamental matrix of the variational equation
by $\Phi(t, s)$, i.e.

$$\frac{\partial \Phi}{\partial t}(t, s) = f_x(x_*(t), u_*(t), t)\, \Phi(t, s), \qquad \Phi(s, s) = \mathrm{Id}.$$

Furthermore, let $R \subset \mathbb{R}^n$ denote the reachable subspace
of the linearized system

$$\dot h(t) = f_x h + f_u v, \qquad h(0) = 0, \tag{5}$$

at time T. It is well known that R is a linear subspace of
$\mathbb{R}^n$ and that $R = \mathbb{R}^n$ if and only if equation (5) is completely controllable. In general we have that


Lemma 1 $\operatorname{Im} F'(x_*, u_*)$ consists of all pairs $(a, b) \in
W^n_{1,0}(0, T) \times \mathbb{R}^k$ such that

$$b \in q_x(x_*(T)) \Big[ \int_0^T \Phi(T, s)\, \dot a(s)\, ds + R \Big]. \tag{6}$$

In particular, $\operatorname{Im} F'(x_*, u_*)$ is closed and of finite codimension.
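For a time-invariant linearization ($f_x \equiv A$, $f_u \equiv B$ constant), the reachable subspace R in Lemma 1 is the column space of the Kalman matrix $[B, AB, \dots, A^{n-1}B]$. If, as an additional hypothetical assumption, all terminal states are fixed ($q(x) = x$, so $q_x = I$), Lemma 1 shows that the codimension of $\operatorname{Im} F'$ equals $n - \dim R$. The sketch below computes this rank exactly; the function names are illustrative.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def rank(M):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def kalman_matrix(A, B):
    """Column blocks [B, AB, ..., A^{n-1} B]."""
    n = len(A)
    blocks, P = [], B
    for _ in range(n):
        blocks.append(P)
        P = mat_mul(A, P)
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


def codim_fixed_endpoint(A, B):
    """n - dim R, the codimension of Im F' when q(x) = x (by Lemma 1)."""
    return len(A) - rank(kalman_matrix(A, B))


# Harmonic oscillator x1' = x2, x2' = -x1 + u: completely controllable.
print(codim_fixed_endpoint([[0, 1], [-1, 0]], [[0], [1]]))  # 0
# A = I, B = e_1: R is one-dimensional, so F' is not onto.
print(codim_fixed_endpoint([[1, 0], [0, 1]], [[1], [0]]))  # 1
```

A positive codimension signals, via Propositions 2 and 3 below, the existence of nontrivial multipliers with $\lambda_0 = 0$.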


The following characterizations of the nonregularity of
the operator F and its codimension are well known.


Proposition 2 The codimension of $\operatorname{Im} F'(x_*, u_*)$ is
equal to the number of linearly independent solutions to $\dot\lambda(t) = -\lambda(t) f_x(x_*(t), u_*(t), t)$ which satisfy
$\lambda(t) f_u(x_*(t), u_*(t), t) \equiv 0$ on [0, T] and for which $\lambda(T)$
is orthogonal to $\ker q_x(x_*(T))$.


Proposition 3 The operator F is nonregular at $\Gamma = (x_*,
u_*)$ if and only if $\Gamma$ is an abnormal weak extremal which
satisfies $H_u(0, \lambda(t), x_*(t), u_*(t), t) \equiv 0$ on [0, T].


Critical Directions 


We describe the set of critical directions along which
high-order tangent approximations to the equality constraint $\mathcal{F}$ can be set up. Let $Z = W^n_{1,0}(0, T) \times L^m_\infty(0, T)$
and suppose an admissible process $z_* = (x_*, u_*) \in \mathcal{A}$
and a finite sequence $H_{p-1} = (h_1, \dots, h_{p-1}) \in Z^{p-1}$
are given. The following operators allow one to formalize
high-order approximations to an equality constraint at
nonregular points (see > High-order necessary conditions for optimality for abnormal points). For $k = 1, \dots,
p - 1$, the directional derivatives $\nabla^k F(z_*)(H_k)$ of F at $z_*$
along the sequence $H_k = (h_1, \dots, h_k)$ are given by

$$\nabla^k F(z_*)(H_k) = \sum_{r=1}^{k} \frac{1}{r!} \sum_{j_1+\cdots+j_r=k} F^{(r)}(z_*)(h_{j_1}, \dots, h_{j_r}), \tag{7}$$

and we let $G_k[F](z_*; H_{k-1})$ denote the Fréchet derivative of the (k − 1)th directional derivative of F at
$z_*$ along $H_{k-1}$. Thus formally $G_1[F](z_*) = F'(z_*)$ and
in general for $k \ge 2$, $G_k = G_k[F](z_*; H_{k-1}): Z \to Y$, $v \mapsto
G_k(v)$, is given by

$$G_k(v) = \sum_{r=1}^{k-1} \frac{1}{r!} \sum_{j_1+\cdots+j_r=k-1} F^{(r+1)}(z_*)(h_{j_1}, \dots, h_{j_r}, v). \tag{8}$$
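Formula (7) is exactly the coefficient of $\varepsilon^k$ in the Taylor expansion of $F(z_* + \sum_{i \ge 1} \varepsilon^i h_i)$, since $\delta^r$ with $\delta = \sum_i \varepsilon^i h_i$ expands into a sum over ordered index tuples. The sketch below checks this numerically for the hypothetical scalar test function $F(z) = z^3$, for which the multilinear forms $F^{(r)}(z)(h_{j_1}, \dots, h_{j_r})$ reduce to $F^{(r)}(z)\, h_{j_1} \cdots h_{j_r}$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def cube_taylor_coefficients(z, hs):
    """Coefficients of eps^0, eps^1, ... in F(z + sum_i eps^i h_i) for F(z) = z**3."""
    base = [Fraction(z)] + [Fraction(h) for h in hs]  # polynomial in eps
    out = [Fraction(1)]
    for _ in range(3):  # multiply by base three times
        new = [Fraction(0)] * (len(out) + len(base) - 1)
        for i, a in enumerate(out):
            for j, b in enumerate(base):
                new[i + j] += a * b
        out = new
    return out


def directional_derivative(derivs, hs, k):
    """Formula (7) for a scalar F: sum over r of F^(r)(z)/r! times the sum, over
    ordered tuples (j_1, ..., j_r) with j_1 + ... + j_r = k, of h_{j_1} * ... * h_{j_r}."""
    total = Fraction(0)
    for r in range(1, k + 1):
        coeff = derivs.get(r, 0)  # F^(r)(z); missing orders are zero
        for js in product(range(1, k + 1), repeat=r):
            if sum(js) == k:
                term = Fraction(coeff, factorial(r))
                for j in js:
                    term *= hs[j - 1]
                total += term
    return total


z, hs = 2, [1, 3, 5]
derivs = {1: 3 * z**2, 2: 6 * z, 3: 6}  # derivatives of z**3 at z
coeffs = cube_taylor_coefficients(z, hs)
for k in range(1, 4):
    print(k, directional_derivative(derivs, hs, k), coeffs[k])
# 1 12 12
# 2 42 42
# 3 97 97
```

In the same scalar setting, (8) for k = 2 reduces to $G_2(v) = F''(z_*)\, h_1 v$, the derivative of $\nabla^1 F(z)(H_1) = F'(z) h_1$ in the direction v.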


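We also denote by $R_q[F](z_*; H_k)$ those terms in the Taylor expansion of $F(z_* + \sum_{i \ge 1} \varepsilon^i h_i)$ which are homogeneous of degree $q \ge 2$ in $\varepsilon$, but only involve vectors from
$H_k$. The general structure of these remainders is given
by

$$R_q[F](z_*; H_k) = \sum_{r=2}^{q} \frac{1}{r!} \sum_{\substack{j_1+\cdots+j_r=q \\ 1 \le j_i \le k}} F^{(r)}(z_*)(h_{j_1}, \dots, h_{j_r}). \tag{9}$$

Let

$$Y_j = \sum_{k=1}^{j} \operatorname{Im} G_k[F](z_*; H_{k-1}), \qquad j = 1, \dots, p. \tag{10}$$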


The following conditions are necessary for the existence
of a p-order tangent vector along $H_{p-1}$ [10]:

i) the first p − 1 directional derivatives of F along
$H_{p-1}$ vanish,

$$\nabla^i F(z_*)(H_i) = 0, \qquad i = 1, \dots, p - 1;$$

ii) the compatibility conditions

$$R_{p-1+i}[F](z_*; H_{p-1}) \in Y_i$$

are satisfied.
In these equations all partial derivatives of f are evaluated along the reference trajectory. These conditions
are also sufficient if the operator F is p-regular at $z_*$ in
direction of the sequence $H_{p-1}$, in the sense of the following definition.


Definition 4 Let $F: Z \to Y$ be an operator between Banach spaces. We say the operator F is p-regular at $z_*$ in
direction of the sequence $H_{p-1} \in Z^{p-1}$ if the following
conditions are satisfied:

A1) $F: Z \to Y$ is (2p − 1)-times continuously Fréchet
differentiable in a neighborhood of $z_*$.

A2) The subspaces $Y_i$, $i = 1, \dots, p$, are closed.

A3) The map $\mathbf{G}_p = \mathbf{G}_p[F](z_*; H_{p-1})$,

$$\mathbf{G}_p: Z \to Y_1 \times Y_2/Y_1 \times \cdots \times Y_p/Y_{p-1}, \qquad v \mapsto \big(G_1(v), \pi_1 G_2(v), \dots, \pi_{p-1} G_p(v)\big),$$

where $\pi_i: Y_{i+1} \to Y_{i+1}/Y_i$ denotes the canonical
projection onto the quotient space, is onto.


In the sense of this definition 1-regularity corresponds
to the classical Lyusternik condition while 2-regularity
is similar to Avakov's definition [3]. Under these assumptions vectors $h_p$ exist which extend $H_{p-1}$ to p-order tangent vectors to $\mathcal{F}$ at $z_*$ [10,12].

For the critical directions for the objective I we focus on the least degenerate critical case and therefore
make the following assumption:

iii) $I'(z_*)$ is not identically zero and $\nabla^i I(z_*)(H_i) = 0$ for
$i = 1, \dots, p - 1$.

The assumption that the first p − 1 directional derivatives vanish is directly tied in with optimality. If there
exists a first nonzero directional derivative $\nabla^j I(z_*)(H_j)$
with $j < p$ which is positive, then I indeed has a local
minimum at $z_*$ along any curve $z(\varepsilon) = z_* + \sum_{i=1}^{j} \varepsilon^i h_i + o(\varepsilon^j)$,
$\varepsilon > 0$, and none of the directions $H_{p-1}$ is of any use in
improving the value. We restrict to $\varepsilon > 0$ since we also
want to include inequality constraints. On the other
hand, if $\nabla^j I(z_*)(H_j) < 0$, then $H_j$ is indeed a direction
of decrease and arbitrary high-order extensions of this
sequence will give better values. Thus the reference trajectory is not optimal.

We also need to define the critical directions for the inequality constraint $U$ in the optimal control problem. More generally, we define a $p$-order feasible set to an inequality constraint in a Banach space.


Definition 5 Let $S \subset Z$ be a subset with nonempty interior. We call $v$ a $p$-order feasible vector for $S$ at $z_*$ in direction of $H_{p-1} = (h_1,\dots,h_{p-1}) \in Z^{p-1}$ if there exist an $\varepsilon_0 > 0$ and a neighborhood $V$ of $v$ so that for all $0 < \varepsilon \le \varepsilon_0$,

$$z_* + \sum_{i=1}^{p-1} \varepsilon^i h_i + \varepsilon^p V \subset S.$$

The collection of all $p$-order feasible vectors $v$ for $S$ at $z_*$ in direction of the sequence $H_{p-1}$ will be called the $p$-order feasible set to $S$ at $z_*$ in direction of the sequence $H_{p-1}$ and will be denoted by $FS^{(p)}(S; z_*, H_{p-1})$.


It follows from this definition that $FS^{(p)}(S; z_*, H_{p-1})$ is open. It is also clear that $FS^{(p)}(S; z_*, H_{p-1})$ is convex if $S$ is. Furthermore, if $h_j \in FS^{(j)}(S; z_*, H_{j-1})$ for some integer $j < p$, then any vector $v$ is allowed as a $p$-order feasible direction and thus trivially $FS^{(p)}(S; z_*, H_{p-1}) = Z$.

For the optimal control problem and $H_{p-1} = ((\eta_1, \xi_1); \dots; (\eta_{p-1}, \xi_{p-1}))$, let $V_{p-1} = (\xi_1, \dots, \xi_{p-1}) \in L_\infty^m(0,T)^{p-1}$ denote the sequence of controls. Then the critical feasible directions for the convex inequality constraint $U$ in $L_\infty^m(0,T)$ consist of all $H_{p-1}$ for which

iv) $FS^{(p)}(U; u_*, V_{p-1})$ is nonempty.


Definition 6 We call a direction $H_{p-1}$ a $p$-regular critical direction for the extremum problem at $z_*$ if the operator $F$ is $p$-regular at $z_*$ along $H_{p-1}$ and if conditions (i)-(iv) are satisfied.


p-Order Local Maximum Principle 


Theorem 7 below gives a generalized $p$-order version of the maximum principle obtained from a dual characterization of the fact that if $(x_*, u_*)$ is optimal, then the $p$-order tangent cone to the set $\{F = 0\}$, the $p$-order feasible cone to $U$ and the $p$-order cone of decrease for the functional $I$ cannot intersect. Notice that we write covectors like $\psi$ as row vectors. This is consistent with a multiplier interpretation of the adjoint variable. Also we denote partial derivatives by subscripts. For instance, if $\nabla^i f(H_i)$ denotes the $i$th directional derivative of $f = f(x,u,t)$ with respect to the sequence $H_i$, then $(\nabla^i f(H_i))_x$ denotes its partial derivative in $x$. For example, suppose $H_1 = (\eta_1, \xi_1)$. Then

$$\nabla^1 f(H_1) = f_x(x,u,t)\,\eta_1 + f_u(x,u,t)\,\xi_1$$

and thus

$$(\nabla^1 f(H_1))_x = f_{xx}(x,u,t)\,\eta_1 + f_{ux}(x,u,t)\,\xi_1$$

and

$$(\nabla^1 f(H_1))_u = f_{xu}(x,u,t)\,\eta_1 + f_{uu}(x,u,t)\,\xi_1.$$
Theorem 7 ($p$-order local maximum principle) Suppose the admissible process $(x_*, u_*)$ is optimal for the optimal control problem (OC). Then for every $p$-regular critical direction $H_{p-1}$ there exist a number $\nu_0 = \nu_0(H_{p-1}) \ge 0$, vectors $a_i = a_i(H_{p-1}) \in (\mathbb{R}^k)^*$, $i = 0,\dots,p-1$, and absolutely continuous functions $\psi(\cdot) = \psi(H_{p-1})(\cdot)$ and $\rho_i(\cdot) = \rho_i(H_{p-1})(\cdot)$, $i = 1,\dots,p-1$, from $[0,T]$ into $(\mathbb{R}^n)^*$, which satisfy the following conditions along the optimal trajectory $(x_*(t), u_*(t), t)$:

a) nontriviality condition: $\nu_0$ and the functional $\Lambda: L_\infty^m(0,T) \to \mathbb{R}$, $\xi \mapsto \Lambda(\xi)$, given by

$$\Lambda(\xi) = \int_0^T \Big(\nu_0 L_u + \psi(t) f_u + \sum_{i=1}^{p-1} \rho_i(t)\,(\nabla^i f(H_i))_u\Big)\,\xi(t)\,dt \tag{11}$$

do not both vanish identically.
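b) extended adjoint equation

$$\dot\psi(t) = -\nu_0 L_x - \psi(t) f_x - \sum_{i=1}^{p-1} \rho_i(t)\,(\nabla^i f(H_i))_x \tag{12}$$

with terminal condition

$$\psi(T) = \nu_0\,\ell_x(x_*(T)) + a_0\,q_x(x_*(T)) + \sum_{i=1}^{p-1} a_i\,\big(\nabla^i q(x_*(T); H_i)\big)_x. \tag{13}$$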


c) orthogonality conditions on the additional multipliers: The functions $\rho_i(\cdot)$, $i = 1,\dots,p-1$, satisfy

$$\dot\rho_i(t) = -\rho_i(t) f_x, \qquad \rho_i(t) f_u = 0, \qquad \rho_i(T) = a_i\,q_x(x_*(T)), \tag{14}$$

and for $j = 1,\dots,i-1$, the following conditions are satisfied for a.e. $t \in [0,T]$:

$$\rho_i(t)\,(\nabla^j f(H_j))_x = 0, \tag{15}$$
$$\rho_i(t)\,(\nabla^j f(H_j))_u = 0, \tag{16}$$
$$a_i\,\big(\nabla^j q(x_*(T); H_j)\big)_x = 0. \tag{17}$$


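d) separation condition: for all vectors $\xi \in FS^{(p)}(U; u_*, V_{p-1})$ we have that

$$
\begin{aligned}
0 \le{} & \nu_0 R_p[\ell](H_{p-1}) + a_0 R_p[q](H_{p-1}) + \sum_{i=1}^{p-1} a_i R_{p+i}[q](H_{p-1}) \\
& + \int_0^T \Big(\nu_0 L_u + \psi(t) f_u + \sum_{i=1}^{p-1} \rho_i(t)\,(\nabla^i f(H_i))_u\Big)\,\xi(t)\,dt \\
& + \int_0^T \Big(\nu_0 R_p[L](H_{p-1}) + \psi(t) R_p[f](H_{p-1}) + \sum_{i=1}^{p-1} \rho_i(t) R_{p+i}[f](H_{p-1})\Big)\,dt.
\end{aligned} \tag{18}
$$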


Corollary 8 The separation condition d) implies the following $p$-order local minimum condition: along $(x_*(t), u_*(t), t)$ we have for every $u \in U$ and a.e. $t \in [0,T]$:

$$0 \le \Big(\nu_0 L_u + \psi(t) f_u + \sum_{i=1}^{p-1} \rho_i(t)\,(\nabla^i f(H_i))_u\Big)\big(u - u_*(t)\big). \tag{19}$$


In the case of a Lagrangian minimization problem which has no control constraints, or more generally if the control takes values in the interior of the control set, the functional $\Lambda$ vanishes identically. In this case we can normalize $\nu_0 = 1$ and we obtain the following corollary:


Corollary 9 ($p$-order local maximum principle for Lagrangian problems) Consider the optimal control problem (OC) without control constraints ($U = \mathbb{R}^m$) and suppose the admissible process $(x_*, u_*)$ is optimal. Then for every $p$-regular critical direction $H_{p-1}$ there exist vectors $a_i = a_i(H_{p-1}) \in (\mathbb{R}^k)^*$, $i = 0,\dots,p-1$, and absolutely continuous functions $\psi(\cdot) = \psi(H_{p-1})(\cdot)$ and $\rho_i(\cdot) = \rho_i(H_{p-1})(\cdot)$, $i = 1,\dots,p-1$, from $[0,T]$ into $(\mathbb{R}^n)^*$, which satisfy the conditions b)-d) of Theorem 7 along the optimal trajectory $(x_*(t), u_*(t), t)$ for $\nu_0 = 1$. In particular, we thus have

$$L_u + \psi(t) f_u + \sum_{i=1}^{p-1} \rho_i(t)\,(\nabla^i f(H_i))_u = 0.$$
Example 10 We illustrate Theorem 7 with an example. Consider the problem to minimize the functional $I(x,u)$ given by

$$I(x,u) = \int_0^T \big[(x_1 - 1)^2 + x_2^p + (x_3 + 1)^2 - 2\big]\,dt \tag{20}$$


over all $(x,u) \in W_{1,\infty}^3(0,T) \times L_\infty^2(0,T)$ subject to the dynamics


a(t)=|] x? J+l[-1 0 (') . O05 
p-l u2 
AX, x3 QO -1 


initial condition $x(0) = 0$ and terminal constraints $x_1(T) = 0$ and $x_3(T) = 0$. Here $p$ is an integer, $p \ge 2$, and $a$ is an arbitrary real number. For simplicity we have not imposed any control constraints.

It can easily be seen that the reference trajectory $\Gamma = (x_*, u_*) = (0,0)$ is an abnormal extremal for each problem. In fact, setting $\lambda(t) = (\nu, 0, \nu)$ with $\nu \ne 0$ and $\lambda_0 = 0$ defines an adjoint vector for $\Gamma$ such that $H_u \equiv 0$. Hence $F'(0,0)$ is not onto, i.e. the problem is nonregular at $\Gamma$.

Theorem 7 can be used to eliminate $\Gamma$ from optimality for any $p \ge 2$. We choose $H_{p-1}$ of the form

$$H_{p-1} = \big((\eta_1, \xi_1); (0,0); \dots; (0,0)\big) \tag{22}$$

with $(\eta_1, \xi_1) \in \ker F'(0,0)$. With this choice of directions the compatibility conditions ii) simplify considerably and reduce to the first condition only which becomes


vector 7. We satisfy this by choosing nV) =— 
2] \ 


0 (ie, ae = 0). Then choosing a nonzero 7; 
zero boundary conditions defines a nontrivial vector $H_{p-1}$ of the form (22) for which conditions i) and ii) in the definition of $p$-regular critical directions are satisfied. Furthermore, it is easily seen that the operator $F$ is $p$-regular in direction of $H_{p-1}$ at $\Gamma$. Finally, these directions are also critical for the objective: we have $I'(0,0)(\eta_1, \xi_1) = 0$ and furthermore


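$$\nabla^2 I(0,0)(H_2) = \tfrac{1}{2} I''(0,0)\big((\eta_1, \xi_1), (\eta_1, \xi_1)\big) = \int_0^T \Big(\big(\eta_1^{[1]}\big)^2 + \big(\eta_1^{[3]}\big)^2\Big)\,ds = 0$$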


provided $p > 2$. Since no other derivatives of $I$ arise in the directional derivatives $\nabla^i I(0,0)(H_i)$ for $i = 3,\dots,p-1$, the direction $H_{p-1} = ((\eta_1, \xi_1); (0,0); \dots; (0,0))$ with $\eta_1^{[1]} = \eta_1^{[3]} = 0$ and a nonzero $\eta_1^{[2]}$ is a nonzero $p$-regular critical direction for the problem to minimize $I$ subject to $F = 0$ for any $p \ge 2$.


We thus can apply Theorem 7. Since there are no control constraints we can normalize the multipliers so that $\nu_0 = 1$. The additional multipliers $\rho_i$, $i = 1,\dots,p-1$, are associated with elements in the dual spaces of the quotients $Y_{i+1}/Y_i$ (see ▸ High-order necessary conditions for optimality for abnormal points). But here $Y_i = \operatorname{Im} F'(0,0)$ for $i = 1,\dots,p-1$, and $Y_p$ is the full space. Thus we have $\rho_i = 0$ for $i = 1,\dots,p-2$, and the only nonzero multipliers are $\psi$ and $\rho_{p-1}$, which for simplicity of notation we just call $\rho$. Now (14) states that $\rho$ is an adjoint multiplier for which the conditions of the local Maximum Principle for an abnormal extremal are satisfied. This multiplier is unique and of the form $\rho(t) = (\nu, 0, \nu)$, but $\nu \in \mathbb{R}$ could be zero. For the extended adjoint equation and minimum condition (19) we need to evaluate the directional derivatives $\nabla^{p-1} f(x,u)(H_{p-1})$. Straightforward but somewhat tedious calculations show that


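$$\big(\nabla^{p-1} f(0,0)(H_{p-1})\big)_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \big(\eta_1^{[2]}\big)^{p-1} \end{pmatrix}$$

and

$$\big(\nabla^{p-1} f(0,0)(H_{p-1})\big)_u = 0.$$

Thus the extended minimum condition reduces to $\psi B = 0$, the minimum condition of the weak maximum principle. Hence also $\psi_2(t) \equiv 0$ and $\psi_1(t) \equiv \psi_3(t)$. But now the extended adjoint equation is given by

$$\dot\psi(t) = (2, 0, -2) - \rho(t) \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \big(\eta_1^{[2]}\big)^{p-1} \end{pmatrix}$$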




and thus

$$4 = -\nu\,\big(\eta_1^{[2]}(t)\big)^{p-1}.$$

But we can certainly choose $\eta_1^{[2]}$ nonconstant to violate this condition. This contradiction proves that $\Gamma$ cannot be optimal for the problem to minimize $I$ for any $p \ge 2$.


Conclusion 


Theorem 7 is based on $p$-order approximations. If these remain inconclusive, higher order approximations can easily be set up. If the operator $F$ is $p$-regular in direction of $H_{p-1}$, then given a $p$-regular tangent direction, it is possible to set up higher order approximations of arbitrary order. In fact, only a system of $p$ linear equations needs to be solved in every step.
These results provide a complete hierarchy of pri- 
mal constructions of higher-order approximating di- 
rections and dual characterizations of empty intersec- 
tion properties of approximating cones which can be 
used to give necessary conditions for optimality for in- 
creasingly more degenerate structures. For these results 
see [13]. 


See also 


> Dynamic Programming: Continuous-time Optimal 
Control 

> Hamilton-Jacobi-Bellman Equation 

> Pontryagin Maximum Principle 
We consider the problem of minimizing a functional $I: X \to \mathbb{R}$ in a Banach space $X$ under both equality and inequality constraints. The inequality constraints are of two types, either described by smooth functionals $f: X \to \mathbb{R}$ as $P = \{x \in X : f(x) \le 0\}$ or described by closed convex sets $C$ with nonempty interior. The equality constraints are given in operator form as $Q = \{x \in X : F(x) = 0\}$ where $F: X \to Y$ is an operator between Banach spaces. Models of this type are common in optimal control problems.

The standard first order Lagrange multiplier type necessary conditions for optimality at the point $x_*$ state that there exist multipliers $\lambda_0, \dots, \lambda_m, y^*$ which do not all vanish identically such that the Euler-Lagrange equation

$$\lambda_0 I'(x_*) + \sum_{j=1}^{m} \lambda_j f_j'(x_*) + F'(x_*)^* y^* = 0 \tag{1}$$

is satisfied (see for instance [7,9]). This article addresses the case when the Fréchet derivative $F'(x_*)$ of the operator defining the equality constraint is not onto, i.e. the nonregular case. In this case the classical Lyusternik theorem [14] does not apply to describe the tangent space to $Q$, and (1) can be satisfied trivially by choosing a nonzero multiplier $y^*$ from the annihilator of $\operatorname{Im} F'(x_*)$ while setting all other multipliers to zero. This generates so-called abnormal points for which the standard necessary conditions for optimality only describe the degeneration of the equality constraint without any relation to optimality. Here we describe an approach to high-order necessary conditions for optimality in these cases which is based on a high-order generalization of the Lyusternik theorem [12]. By using this theorem one can determine the precise structure of polynomial approximations to $Q$ at $x_*$ when the surjectivity condition on $F'(x_*)$ is not satisfied, but when instead a certain operator $G_p$, which takes into account all derivatives up to and including order $p$, is onto. The order $p$ is chosen as the minimum number for which the operator $G_p$ becomes onto. If $G_p$ is onto, then the precise structure of $q$-order polynomial approximations to $Q$ at $x_*$ for any $q \ge p$ can be determined. This leads to the notion of high-order tangent cones to the equality constraint $Q$ at points $x_*$ in a nonregular case. Combining these with high-order feasible cones for the inequality constraints and high-order cones of decrease, a generalization of the Dubovitskii-Milyutin theorem is formulated. From this theorem generalized necessary conditions for optimality can be deduced which reduce to classical conditions for normal cases, but give new and nontrivial conditions for abnormal cases.

First results of this type have been obtained for quadratic approximations ($p = 2$) in [3,4,5] and [11]. Some of these conditions have been analyzed further in connection with sufficient conditions for optimality [1,2]. Quadratic approximations for problems with inequality constraints are also considered in [10]. For the regular case, when $F'(x_*)$ is onto, second order approximating sets were introduced in [6] to derive second order necessary conditions for optimality, while higher order necessary conditions for optimality in this case are given, for instance, in [8] or [15]. These, however, are not the topic of this article.


A High-Order Formulation 
of the Dubovitskii-Milyutin Theorem 


Let $X$ and $Y$ be Banach spaces. Let $I: X \to \mathbb{R}$ be a functional, $F: X \to Y$ an operator, $f_j: X \to \mathbb{R}$, $j = 1,\dots,m$, functionals, and let $C \subset X$ be a closed convex set with nonempty interior. We assume that $I$, the functionals $f_j$ and the operator $F$ are sufficiently often continuously Fréchet-differentiable and consider the problem

$$
\begin{aligned}
&\min\ I(x) \quad \text{s.t.} \quad x \in A = \Big(\bigcap_{j=1}^{m} P_j\Big) \cap Q \cap C,\\
&P_j = \{x \in X : f_j(x) \le 0\}, \qquad Q = \{x \in X : F(x) = 0\}.
\end{aligned} \tag{P}
$$


We define high-order polynomial approximations to the admissible domain $A$. We denote sequences $(h_1, \dots, h_k) \in X^k$ by $H_k$, with the subscript giving the length of the sequence.


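Definition 1 Let $H_{p-1} = (h_1, \dots, h_{p-1}) \in X^{p-1}$ and set $x(\varepsilon) = x_* + \sum_{i=1}^{p-1} \varepsilon^i h_i$. We call $H_{p-1}$ a $(p-1)$-order approximating sequence to a set $S \subset X$ at $x_* \in \operatorname{Clos} S$, respectively we call $x: \varepsilon \mapsto x(\varepsilon)$ a $(p-1)$-order approximating curve, if there exist an $\varepsilon_0 > 0$ and a function $r: [0, \varepsilon_0] \to X$ with the property that

$$x(\varepsilon) + r(\varepsilon) = x_* + \sum_{i=1}^{p-1} \varepsilon^i h_i + r(\varepsilon) \in S \tag{2}$$

and

$$\lim_{\varepsilon \to 0^+} \frac{\|r(\varepsilon)\|}{\varepsilon^{p-1}} = 0. \tag{3}$$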

We call a $(p-1)$-order approximating sequence/curve $(p-1)$-order feasible if $S$ is an inequality constraint, respectively $(p-1)$-order tangent if $S$ is an equality constraint.

Let $x_* \in A$ and assume as given a $(p-1)$-order approximating sequence $H_{p-1} = (h_1, \dots, h_{p-1}) \in X^{p-1}$ with corresponding $(p-1)$-order approximation $x(\varepsilon) = x_* + \sum_{i=1}^{p-1} \varepsilon^i h_i$. It is implicitly assumed that $x_*$ has not been ruled out for optimality. Then we extend the existing $(p-1)$-order approximations to $p$-order approximations and derive the corresponding necessary conditions for optimality. The following definitions are direct generalizations of standard existing definitions [7].


Definition 2 We call $v_0$ a $p$-order vector of decrease for a functional $I: X \to \mathbb{R}$ at $x_* \in X$ in direction of the sequence $H_{p-1} = (h_1, \dots, h_{p-1}) \in X^{p-1}$ if there exist a neighborhood $V$ of $v_0$ and a number $\alpha < 0$ so that for all $v \in V$ and all sufficiently small $\varepsilon > 0$ we have

$$I\Big(x_* + \sum_{i=1}^{p-1} \varepsilon^i h_i + \varepsilon^p v\Big) = I\big(x(\varepsilon) + \varepsilon^p v\big) \le I(x_*) + \alpha \varepsilon^p. \tag{4}$$

The collection of all $p$-order vectors of decrease for $I$ at $x_*$ in direction of the sequence $H_{p-1}$ will be called the $p$-order set of decrease to $I$ at $x_*$ in direction of the sequence $H_{p-1}$ and will be denoted by $DS^{(p)}(I; x_*, H_{p-1})$.


Definition 3 We call $v_0$ a $p$-order feasible vector for an inequality constraint $P$ at $x_* \in X$ in direction of $H_{p-1}$ if there exist an $\varepsilon_0 > 0$ and a neighborhood $V$ of $v_0$ so that for all $0 < \varepsilon \le \varepsilon_0$

$$x_* + \sum_{i=1}^{p-1} \varepsilon^i h_i + \varepsilon^p V = x(\varepsilon) + \varepsilon^p V \subset P. \tag{5}$$

The collection of all $p$-order feasible vectors $v_0$ for $P$ at $x_*$ in direction of the sequence $H_{p-1}$ will be called the $p$-order feasible set to $P$ at $x_*$ in direction of the sequence $H_{p-1}$ and will be denoted by $FS^{(p)}(P; x_*, H_{p-1})$.


Note that by definition the $p$-order set of decrease to $I$ and the $p$-order feasible set to $P$, both at $x_*$ in direction of the sequence $H_{p-1}$, are open.


Definition 4 We call $h_p$ a $p$-order tangent vector to an equality constraint $Q$ at $x_*$ in direction of the sequence $H_{p-1}$ if $H_p = (h_1, \dots, h_p) \in X^p$ is a $p$-order approximating sequence to the set $Q$ at $x_* \in Q$. The collection of all $p$-order tangent vectors to $Q$ at $x_*$ in direction of the sequence $H_{p-1}$ will be called the $p$-order tangent set to $Q$ at $x_*$ in direction of the sequence $H_{p-1}$ and will be denoted by $TS^{(p)}(Q; x_*, H_{p-1})$.
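As a sketch of Definitions 1 and 4 in the regular case, consider the hypothetical example $Q = \{x \in \mathbb{R}^2 : \|x\| = 1\}$, $x_* = (1, 0)$ and $h_1 = (0, 1)$ (data chosen only for illustration, not taken from the references). Since the nearest point of the circle to $x \ne 0$ is $x/\|x\|$, a correction $r(\varepsilon)$ with $\|r(\varepsilon)\| = o(\varepsilon^2)$ exists exactly when $\operatorname{dist}(x(\varepsilon), Q)/\varepsilon^2 \to 0$. Expanding $\|x(\varepsilon)\|$ by hand for $h_2 = (a, c)$ gives $\|x(\varepsilon)\| - 1 = (a + \tfrac12)\varepsilon^2 + c\,\varepsilon^3 + O(\varepsilon^4)$, so $TS^{(2)}(Q; x_*, H_1) = \{(-\tfrac12, c) : c \in \mathbb{R}\}$ is an affine line. The Python snippet below (the function name `defect` is our own) simply evaluates this ratio numerically.

```python
import math


def defect(h1, h2, eps):
    """Return dist(x(eps), Q) / eps**2 for x(eps) = x_* + eps*h1 + eps**2*h2,
    where Q is the unit circle and x_* = (1, 0)."""
    x = (1.0 + eps * h1[0] + eps ** 2 * h2[0],
         0.0 + eps * h1[1] + eps ** 2 * h2[1])
    # The nearest point of Q to x is x / |x|, so the distance is | |x| - 1 |.
    return abs(math.hypot(*x) - 1.0) / eps ** 2


h1 = (0.0, 1.0)
for h2 in [(0.0, 0.0), (-0.5, 0.0), (-0.5, 3.0)]:
    print(h2, [defect(h1, h2, e) for e in (1e-1, 1e-2, 1e-3)])
```

For $h_2 = (0, 0)$ the ratio stays near $1/2$, so $H_2$ is not a 2-order approximating sequence; for $h_2 = (-1/2, 0)$ and $h_2 = (-1/2, 3)$ it tends to 0 (roughly like $\varepsilon^2/8$ and $3\varepsilon$, respectively).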


These approximating sets can be embedded into cones in the extended state space $X \times \mathbb{R}$. This has the advantage that many classical results like the Minkowski-Farkas lemma or the annihilator lemma can be directly applied in calculating dual cones (see also [11]). Let us generally refer to $p$-order sets of decrease, feasible sets and tangent sets as $p$-order approximating sets and denote them by $AS^{(p)}(Z; x_*, H_{p-1})$. Then we define the corresponding approximating cones as follows:


Definition 5 Given a $p$-order approximating set $AS^{(p)}(Z; x_*, H_{p-1})$ to a set $Z \subset X$ at $x_*$ in direction of the sequence $H_{p-1}$, the $p$-order approximating cone to $Z$ at $x_*$ in direction of $H_{p-1}$, $AC^{(p)}(Z; x_*, H_{p-1})$, is the cone in $X \times \mathbb{R}$ generated by the vectors $(v, 1) \in AS^{(p)}(Z; x_*, H_{p-1}) \times \mathbb{R}$.


Thus we talk of the $p$-order cone of decrease for the functional $I$, $p$-order feasible cones for inequality constraints and $p$-order tangent cones for equality constraints, all at $x_*$ in direction of the sequence $H_{p-1}$.


Definition 6 Let $C \subset Z$ be a cone in a Banach space $Z$ with apex at 0. The dual (or polar) cone to $C$ consists of all continuous linear functionals $\lambda \in Z^*$ which are nonnegative on $C$, i.e.

$$C^* = \{\lambda \in Z^* : \langle \lambda, z \rangle \ge 0 \ \ \forall z \in C\}. \tag{6}$$


Then we have 


Theorem 7 [11,13] ($p$-order Dubovitskii-Milyutin theorem). Suppose the functional $I$ attains a local minimum for problem (P) at $x_* \in A$. Let $H_{p-1} = (h_1, \dots, h_{p-1}) \in X^{p-1}$ be a $(p-1)$-order approximating sequence such that the $p$-order cone of decrease for the functional $I$, the $p$-order feasible cones for the inequality constraints $P_j$, $j = 1, \dots, m$, and $C$, and the $p$-order tangent cone to the equality constraint $Q$, all at $x_*$ in direction of the sequence $H_{p-1}$, are nonempty and convex. Then there exist continuous linear functionals

$$\Phi_0 = (\lambda_0, \mu_0) \in \big(DC^{(p)}(I; x_*, H_{p-1})\big)^*,$$
$$\Phi_j = (\lambda_j, \mu_j) \in \big(FC^{(p)}(P_j; x_*, H_{p-1})\big)^*, \qquad j = 1, \dots, m,$$
$$\Phi_{m+1} = (\lambda_{m+1}, \mu_{m+1}) \in \big(FC^{(p)}(C; x_*, H_{p-1})\big)^*,$$
$$\Phi_{m+2} = (\lambda_{m+2}, \mu_{m+2}) \in \big(TC^{(p)}(Q; x_*, H_{p-1})\big)^*,$$

all depending on $H_{p-1}$, such that

$$\sum_{j=0}^{m+2} \lambda_j = 0 \qquad \text{and} \qquad \sum_{j=0}^{m+2} \mu_j = 0 \tag{7}$$

hold. Furthermore, not all the $\Phi_j$, $j = 0, \dots, m+2$, vanish identically.


High-Order Directional Derivatives 


We describe a formalism to calculate higher derivatives [12,13] which will be needed to describe high-order approximating cones. Let $F: X \to Y$ be an operator between Banach spaces which is sufficiently often continuously Fréchet differentiable in a neighborhood of $x_* \in X$ and consider the Taylor expansion of $F$ along a curve

$$y(\varepsilon) = x_* + \sum_{i=1}^{m} \varepsilon^i h_i.$$

We have

$$F(y(\varepsilon)) = F(x_*) + \sum_{i=1}^{m} \varepsilon^i \nabla^i F(x_*)(h_1, \dots, h_i) + \tilde r(\varepsilon),$$

where $\nabla^i F(x_*)(h_1, \dots, h_i)$ is given by

$$\nabla^i F(x_*)(h_1, \dots, h_i) = \sum_{r=1}^{i} \frac{1}{r!} \sum_{j_1 + \cdots + j_r = i} F^{(r)}(x_*)(h_{j_1}, \dots, h_{j_r}) \tag{8}$$

and $\tilde r(\varepsilon)$ is a function of order $o(\varepsilon^m)$ as $\varepsilon \to 0$. Note that $\nabla^i F(x_*)(h_1, \dots, h_i)$ simply collects the $\varepsilon^i$-terms in this expansion. These terms, which we call the $i$th-order directional derivatives of $F$ along the sequence $H_i = (h_1, \dots, h_i)$, $1 \le i \le m$, are easily calculated by straightforward Taylor expansions. For example,

$$\nabla^1 F(x_*)(H_1) = F'(x_*)h_1,$$
$$\nabla^2 F(x_*)(H_2) = F'(x_*)h_2 + \tfrac{1}{2} F''(x_*)(h_1, h_1).$$
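The observation that $\nabla^i F(x_*)(h_1, \dots, h_i)$ simply collects the $\varepsilon^i$-terms suggests a direct way to compute these directional derivatives. The following is a minimal, dependency-free Python sketch (our own illustration, not the method of [12,13]), assuming a scalar-valued $F$ built from $+$, $-$, $*$ and integer powers, so that truncated power series in $\varepsilon$ can be pushed through it; the names `Series` and `directional_derivatives` are hypothetical. For the test function $F(x) = x_0^2 x_1$ at $x_* = (1, 2)$ with $h_1 = (1, 0)$, $h_2 = (0, 1)$, the formulas above give $F'(x_*)h_1 = 4$ and $F'(x_*)h_2 + \tfrac12 F''(x_*)(h_1, h_1) = 1 + 2 = 3$.

```python
from fractions import Fraction


class Series:
    """Truncated power series c[0] + c[1]*eps + ... + c[N]*eps**N with exact coefficients."""

    def __init__(self, coeffs, order):
        self.order = order
        c = [Fraction(v) for v in coeffs][: order + 1]
        self.c = c + [Fraction(0)] * (order + 1 - len(c))

    def _lift(self, other):
        # Treat plain numbers as constant series.
        return other if isinstance(other, Series) else Series([other], self.order)

    def __add__(self, other):
        other = self._lift(other)
        return Series([a + b for a, b in zip(self.c, other.c)], self.order)

    __radd__ = __add__

    def __neg__(self):
        return Series([-a for a in self.c], self.order)

    def __sub__(self, other):
        return self + (-self._lift(other))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        other = self._lift(other)
        out = [Fraction(0)] * (self.order + 1)
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c[: self.order + 1 - i]):
                out[i + j] += a * b  # powers above eps**order are dropped
        return Series(out, self.order)

    __rmul__ = __mul__

    def __pow__(self, n):
        result = Series([1], self.order)
        for _ in range(n):
            result = result * self
        return result


def directional_derivatives(F, x_star, H):
    """Return [nabla^1 F(x_*)(H_1), ..., nabla^m F(x_*)(H_m)] for scalar F and H = (h_1, ..., h_m)."""
    m = len(H)
    # Component k of the curve y(eps) = x_* + sum_i eps**i * h_i.
    y = [Series([x_star[k]] + [h[k] for h in H], m) for k in range(len(x_star))]
    return F(y).c[1:]


# F(x) = x0**2 * x1 at x_* = (1, 2), h1 = (1, 0), h2 = (0, 1):
print(directional_derivatives(lambda y: y[0] ** 2 * y[1], (1, 2), [(1, 0), (0, 1)]))
# [Fraction(4, 1), Fraction(3, 1)]
```

Replacing the directions by $(t h_1, t^2 h_2)$ multiplies the $i$th output by $t^i$ (for $t = 2$ one gets $8$ and $12$), in line with the homogeneity property of $\nabla^i F(x_*)$; a vector-valued $F$ can be handled componentwise.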


The higher-order directional derivative $\nabla^i F(x_*)$ is homogeneous of degree $i$ in the directions in the sense that $\nabla^i F(x_*)(\varepsilon h_1, \dots, \varepsilon^i h_i) = \varepsilon^i \nabla^i F(x_*)(h_1, \dots, h_i)$. In particular, no indices $j_1$ and $j_2$ with $j_1 + j_2 > i$ can occur together as arguments in any of the terms in $\nabla^i F(x_*)$. Thus all vectors $h_j$ whose index satisfies $2j > i$ appear linearly in $\nabla^i F(x_*)$ and are multiplied by terms which are homogeneous of degree $i - j$. In fact, there exist linear operators $G_k = G_k[F](x_*; H_{k-1})$, $k \in \mathbb{N}$, depending on the derivatives up to order $k$ of $F$ in the point $x_*$ and on the vectors $H_{k-1} = (h_1, \dots, h_{k-1})$, which describe the contributions of these components. We have $G_1[F](x_*) = F'(x_*)$ and in general
$G_k = G_k[F](x_*; H_{k-1}): X \to Y$, $v \mapsto G_k(v)$, is given by

$$G_k(v) = \sum_{r=1}^{k-1} \frac{1}{r!} \sum_{j_1 + \cdots + j_r = k-1} F^{(r+1)}(x_*)(h_{j_1}, \dots, h_{j_r}, v). \tag{9}$$
These operators $G_k[F](x_*; H_{k-1})$ are the Fréchet derivatives of the $(k-1)$th directional derivative of $F$ at $x_*$ along $H_{k-1}$. Note that these terms are homogeneous of degree $k-1$. For simplicity of notation we often suppress the arguments. For example, we write

$$G_1(v) = F'(x_*)v, \qquad G_2(v) = F''(x_*)(h_1, v),$$
$$G_3(v) = F''(x_*)(h_2, v) + \tfrac{1}{2} F'''(x_*)(h_1, h_1, v).$$


Given an order $p \in \mathbb{N}$, it follows that we can separate the linear contributions of the vectors $h_p, \dots, h_{2p-1}$ in the directional derivatives of orders $p$ through $2p-1$, and for $i = 1, \dots, p$ we have an expression of the form

$$\nabla^{p-1+i} F(x_*)(H_{p-1+i}) = \sum_{k=1}^{i} G_k[F](x_*; H_{k-1})\, h_{p+i-k} + R_{p-1+i}[F](x_*; H_{p-1}).$$
k=1 


Here, among the terms which are homogeneous of degree $p-1+i$, the sum gives the terms which contain one of the vectors $h_p, \dots, h_{p-1+i}$, and the remainder $R$ combines all other terms, which only include vectors of index $\le p-1$. The general structure of the remainder $R_q[F](x_*; H_\ell)$ for arbitrary $q \ge 2$ and $\ell$ is given by

$$R_q[F](x_*; H_\ell) = \sum_{r=2}^{q} \frac{1}{r!} \sum_{\substack{j_1 + \cdots + j_r = q \\ 1 \le j_k \le \ell,\ 1 \le k \le r}} F^{(r)}(x_*)(h_{j_1}, \dots, h_{j_r}). \tag{10}$$

Thus $R_q(H_\ell)$ consists of the terms which are homogeneous of degree $q$, but only involve vectors from $H_\ell$. For example, $R_3[F](x_*; H_2)$ is given by

$$F''(x_*)(h_1, h_2) + \tfrac{1}{6} F'''(x_*)(h_1, h_1, h_1).$$


Note that the remainders only have contributions from derivatives of at least order two. These operators allow one to formalize high-order approximations to an equality constraint at nonregular points [13].
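Because $R_q[F](x_*; H_\ell)$ with $q > \ell$ collects exactly the degree-$q$ terms built from $h_1, \dots, h_\ell$, it can be read off as the $\varepsilon^q$-coefficient of $F(x_* + \sum_{j \le \ell} \varepsilon^j h_j)$, that is, by switching off $h_{\ell+1}, h_{\ell+2}, \dots$. The Python sketch below is an illustration under our own assumptions ($F$ is a polynomial encoded as a dictionary from exponent tuples to coefficients, and the helper names are hypothetical). It checks this for $F(x_1, x_2) = x_1^2 x_2 + x_1^3$ at $x_* = (1, 1)$ with $h_1 = (1, 0)$, $h_2 = (0, 1)$, $h_3 = (1, 1)$. Here (10) gives $R_3[F](x_*; H_2) = F''(x_*)(h_1, h_2) + \tfrac16 F'''(x_*)(h_1, h_1, h_1) = 2 + 1 = 3$, and for $p = 2$, $i = 2$ the decomposition of $\nabla^{p-1+i} F$ reads $\nabla^3 F(x_*)(H_3) = G_1 h_3 + G_2 h_2 + R_3[F](x_*; H_1) = 6 + 2 + 1 = 9$.

```python
from fractions import Fraction


def series_mul(a, b, order):
    """Product of two truncated power series in eps (coefficient lists), cut at eps**order."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out


def eps_coefficients(poly, x_star, H, order):
    """Coefficients of eps**0, ..., eps**order in poly(x_* + sum_j eps**j h_j).

    poly is a polynomial encoded as {exponent tuple: coefficient}."""
    curve = [[Fraction(x_star[k])] + [Fraction(h[k]) for h in H] for k in range(len(x_star))]
    total = [Fraction(0)] * (order + 1)
    for exps, coeff in poly.items():
        term = [Fraction(coeff)] + [Fraction(0)] * order
        for k, e in enumerate(exps):
            for _ in range(e):
                term = series_mul(term, curve[k], order)
        total = [t + s for t, s in zip(total, term)]
    return total


def remainder(poly, x_star, H_ell, q):
    """R_q[F](x_*; H_ell) for q > ell: only h_1, ..., h_ell enter the curve, so the
    eps**q coefficient contains no first-order term F'(x_*) h_q."""
    if q <= len(H_ell):
        raise ValueError("this shortcut needs q > ell")
    return eps_coefficients(poly, x_star, H_ell, q)[q]


# Hypothetical test polynomial F(x1, x2) = x1**2 * x2 + x1**3 at x_* = (1, 1).
F = {(2, 1): 1, (3, 0): 1}
x_star, h1, h2, h3 = (1, 1), (1, 0), (0, 1), (1, 1)
print(remainder(F, x_star, [h1, h2], 3))                # R_3[F](x_*; H_2): 3
print(eps_coefficients(F, x_star, [h1, h2, h3], 3)[3])  # nabla^3 F(x_*)(H_3): 9
print(remainder(F, x_star, [h1], 3))                    # R_3[F](x_*; H_1): 1
```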


High-Order Tangent Cones 


We first describe the set of critical directions along which high-order tangent approximations to the equality constraint $Q$ can be set up. For a given admissible point $x_* \in A$ and a finite sequence $H_{p-1} = (h_1, \dots, h_{p-1}) \in X^{p-1}$, let

$$Y_i = \sum_{k=1}^{i} \operatorname{Im} G_k[F](x_*; H_{k-1}), \qquad i = 1, \dots, p.$$

It is clear that the first $p-1$ directional derivatives of $F$ along $H_{p-1}$ must vanish,

$$\nabla^i F(x_*)(H_i) = 0 \qquad \forall\, i = 1, \dots, p-1, \tag{11}$$

if $H_{p-1}$ is a $(p-1)$-order tangent direction. But additional compatibility conditions of the form

$$R_{p-1+i}[F](x_*; H_{p-1}) \in Y_i, \qquad i = 1, \dots, p-1, \tag{12}$$

are necessary as well if we want to extend $H_{p-1}$ to a $p$-order tangent direction $H_p = (H_{p-1}; h_p)$. Conditions (11) and (12) are indeed sufficient for the existence of $p$-order approximations along $H_{p-1}$ under the following regularity condition:


Definition 8 Let $F: X \to Y$ be an operator between Banach spaces. We say the operator $F$ is $p$-regular at $x_*$ in direction of the sequence $H_{p-1} \in X^{p-1}$ if the following conditions are satisfied:

A1) $F: X \to Y$ is $(2p-1)$-times continuously Fréchet differentiable in a neighborhood of $x_*$;

A2) the subspaces $Y_i$, $i = 1, \dots, p$, are closed;

A3) the map $G_p = G_p[F](x_*; H_{p-1})$,

$$G_p: X \to Y_1 \times \frac{Y_2}{Y_1} \times \cdots \times \frac{Y_p}{Y_{p-1}}, \qquad v \mapsto \big(G_1(v), \pi_1 G_2(v), \dots, \pi_{p-1} G_p(v)\big),$$

where $\pi_i: Y_{i+1} \to Y_{i+1}/Y_i$ denotes the canonical projection onto the quotient space, is onto.


In the sense of this Definition, 1-regularity corresponds 
to the classical Lyusternik condition while 2-regularity 
is similar to Avakov’s definition [5]. 


Theorem 9 [12] Let $H_{p-1}$ be a sequence so that $\nabla^i F(x_*)(H_i) = 0$ for $i = 1, \dots, p-1$, and suppose the operator $F$ is $p$-regular at $x_*$ in direction of $H_{p-1}$. Then $TS^{(p)}(Q; x_*, H_{p-1})$ is nonempty if and only if for $i = 1, \dots, p-1$ the compatibility conditions

$$R_{p-1+i}[F](x_*; H_{p-1}) \in Y_i$$

are satisfied. In this case $TS^{(p)}(Q; x_*, H_{p-1})$ is the closed affine subspace of $X$ given by the solutions to the linear equation

$$G_p[F](x_*; H_{p-1})(v) + R_{p-1}[F](x_*; H_{p-1}) = 0, \tag{13}$$

where $R_{p-1}[F](x_*; H_{p-1}) \in Z = Y_1 \times Y_2/Y_1 \times \cdots \times Y_p/Y_{p-1}$ is the point with components

$$\big(R_p[F](x_*; H_{p-1}),\ \pi_1 R_{p+1}[F](x_*; H_{p-1}),\ \dots,\ \pi_{p-1} R_{2p-1}[F](x_*; H_{p-1})\big).$$

This formulation of the result clearly brings out the geometric structure of the $p$-order tangent sets as closed affine linear subspaces of $X$ generated by the kernel of $G_p$, $\ker G_p$.
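A small worked instance of Theorem 9 may help; the example is our own and is meant only as an illustration. Take $F(x_1, x_2) = x_1^2 - x_2^2$ on $X = \mathbb{R}^2$, $Y = \mathbb{R}$, $x_* = 0$, $p = 2$ and $h_1 = (1, 1)$. Then $F'(x_*) = 0$, so $Y_1 = \{0\}$ and $F$ is not 1-regular, but $G_2(v) = F''(x_*)(h_1, v) = 2v_1 - 2v_2$ maps onto $Y_2 = \mathbb{R}$, so $F$ is 2-regular in direction of $H_1$. The compatibility condition holds since $R_2[F](x_*; H_1) = \tfrac12 F''(x_*)(h_1, h_1) = 0 \in Y_1$, and $R_3[F](x_*; H_1) = 0$ because $F$ is quadratic, so (13) reduces to $2v_1 - 2v_2 = 0$: the 2-order tangent set is the line $h_2 = (c, c)$. The Python sketch below (function names are hypothetical) compares this with the distance of $x(\varepsilon) = \varepsilon h_1 + \varepsilon^2 h_2$ to $Q$.

```python
import math


def G2(v, h1=(1.0, 1.0)):
    """G_2[F](x_*; H_1)(v) = F''(x_*)(h1, v), with F''(x_*) = diag(2, -2)."""
    return 2.0 * h1[0] * v[0] - 2.0 * h1[1] * v[1]


def in_tangent_set(h2, tol=1e-12):
    """Equation (13) for p = 2: G_2(h2) + R_3[F](x_*; H_1) = 0, where R_3 = 0 here."""
    return abs(G2(h2)) < tol


def dist_to_Q(x):
    """Distance to Q = {x1**2 = x2**2}, the union of the lines x1 = x2 and x1 = -x2."""
    return min(abs(x[0] - x[1]), abs(x[0] + x[1])) / math.sqrt(2.0)


def defect(h2, eps, h1=(1.0, 1.0)):
    """dist(x(eps), Q) / eps**2 for x(eps) = eps*h1 + eps**2*h2; it must tend to 0 for tangency."""
    x = (eps * h1[0] + eps ** 2 * h2[0], eps * h1[1] + eps ** 2 * h2[1])
    return dist_to_Q(x) / eps ** 2


for h2 in [(2.0, 2.0), (1.0, 0.0)]:
    print(h2, in_tangent_set(h2), [defect(h2, e) for e in (1e-2, 1e-3, 1e-4)])
```

For $h_2 = (1, 0)$ the residual $F(x(\varepsilon)) = 2\varepsilon^3 + \varepsilon^4$ is already $o(\varepsilon^2)$, yet the distance ratio stays near $1/\sqrt{2}$: at a nonregular point a small residual alone does not certify tangency, which is what the operator $G_p$ and the compatibility conditions account for.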


Corollary 10 [12] Let $H_{p-1}$ be a sequence such that the operator $F$ is $p$-regular at $x_*$ in direction of $H_{p-1}$. Suppose the first $p-1$ directional derivatives $\nabla^i F(x_*)(H_i)$ vanish for $i = 1, \dots, p-1$, and the compatibility conditions $R_{p-1+i}[F](x_*; H_{p-1}) \in Y_i$ are satisfied for $i = 1, \dots, p$. Then the $p$-order tangent cone to $Q = \{x \in X : F(x) = F(x_*)\}$ at $x_*$ in direction of $H_{p-1}$, $TC^{(p)}(Q; x_*, H_{p-1})$, consists of all solutions $(w, \gamma) \in X \times \mathbb{R}_+$ (i.e. $\gamma \ge 0$) of the linear equation

$$G_p[F](w) + \gamma R_{p-1}[F](x_*; H_{p-1}) = 0.$$


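For applications to optimization problems we need the subspace of continuous linear functionals which annihilate $\ker G_p$. Since the operator $G_p$ is onto, it follows by the annihilator lemma or the closed-range theorem [9] that

$$(\ker G_p)^\perp = \operatorname{Im}(G_p^*),$$

where

$$G_p^*: Z^* = Y_1^* \times \Big(\frac{Y_2}{Y_1}\Big)^* \times \cdots \times \Big(\frac{Y_p}{Y_{p-1}}\Big)^* \to X^*$$

denotes the adjoint map. Let

$$\tau_{i+1}: \Big(\frac{Y_{i+1}}{Y_i}\Big)^* \to Y_i^{\perp_{i+1}}$$

denote the canonical isomorphism. Here $Y_i^{\perp_{i+1}}$ denotes the annihilator of $Y_i$ in $Y_{i+1}^*$, i.e.

$$Y_i^{\perp_{i+1}} = \{y^* \in Y_{i+1}^* : \langle y^*, v \rangle = 0 \ \ \forall v \in Y_i\},$$

and we formally set $Y_0 = \{0\}$, so that $Y_0^{\perp_1} = Y_1^*$. Then we have: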


Proposition 11 [11,13] A functional $\lambda \in X^*$ lies in $(\ker G_p)^\perp$ if and only if it can be represented in the form

$$\lambda = \sum_{i=1}^{p} G_i[F](x_*; H_{i-1})^*\, y_i^* \tag{14}$$

for some functionals $y_i^* \in Y_{i-1}^{\perp_i}$, $i = 1, \dots, p$.

Proposition 12 [11,13] The dual or polar $p$-order tangent cone consists of all linear functionals $(\lambda, \mu) \in X^* \times \mathbb{R}$ which can be represented in the following form: there exist functionals $y_i^* \in Y_{i-1}^{\perp_i}$, $i = 1, \dots, p$, and a number $r \ge 0$ such that

$$\lambda = \sum_{i=1}^{p} G_i[F](x_*; H_{i-1})^*\, y_i^*, \qquad \mu = \sum_{i=1}^{p} \big\langle y_i^*, R_{p-1+i}[F](x_*; H_{p-1}) \big\rangle + r.$$


High-Order Cones of Decrease 


We now consider critical directions for the objective $I$ and determine the $p$-order sets of decrease of a functional $I: X \to \mathbb{R}$. These results also apply to $p$-order feasible sets to inequality constraints defined by smooth functionals. We assume as given a $(p-1)$-order sequence $H_{p-1}$ and we calculate the $p$-order set of decrease of $I$ at $x_*$ along $H_{p-1}$. Trivial cases arise if there exists a first nonzero directional derivative $\nabla^i I(x_*)(H_i)$ of $I$ with $i \le p-1$. In this case we have either $DS^{(p)}(I; x_*, H_{p-1}) = \emptyset$ if $\nabla^i I(x_*)(H_i) > 0$, or $DS^{(p)}(I; x_*, H_{p-1}) = X$ if $\nabla^i I(x_*)(H_i) < 0$. In the first case the sequence $H_{p-1}$ cannot be used to exclude optimality of $x_*$, since indeed $x_*$ is a local minimum along the approximating curve generated by $H_{p-1}$. In the second case $h_i$ is an $i$th-order direction of decrease along $H_{i-1}$, and thus every vector $v \in X$ is admissible as a $p$th-order component. The only nontrivial case arises if $\nabla^i I(x_*)(H_i) = 0$ for all $i$ with $i \le p-1$ and if $I'(x_*) \ne 0$.
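The case distinction above is mechanical once the numbers $\nabla^i I(x_*)(H_i)$, $i = 1, \dots, p-1$, are known, for instance from a Taylor expansion as in the formalism for high-order directional derivatives. A minimal Python sketch, assuming the derivatives are given as a list of floats; the function name `classify_direction` and the round-off tolerance `tol` are our own, hypothetical choices:

```python
def classify_direction(derivs, tol=0.0):
    """Classify H_{p-1} for the objective I from
    derivs = [nabla^1 I(x_*)(H_1), ..., nabla^{p-1} I(x_*)(H_{p-1})].

    Returns one of
      'empty'       : first nonzero derivative > 0, so DS^(p) is empty
                      (x_* is a local minimum along the approximating curve);
      'whole space' : first nonzero derivative < 0, so DS^(p) = X;
      'critical'    : all derivatives vanish (the nontrivial case, which
                      additionally requires I'(x_*) != 0).
    """
    for d in derivs:
        if d > tol:
            return "empty"
        if d < -tol:
            return "whole space"
    return "critical"


print(classify_direction([0.0, -0.3, 1.0]))  # whole space
```

Only the first nonzero entry matters, which is why later entries (the $1.0$ above) are ignored.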


Proposition 13 [13] Suppose $I'(x_*) \ne 0$ and for all $i$ with $i \le p-1$ we have $\nabla^i I(x_*)(H_i) = 0$. Then the $p$-order cone of decrease for the functional $I$ at $x_*$ in direction of $H_{p-1}$, $DC^{(p)}(I; x_*, H_{p-1})$, consists of all vectors $(w, \gamma) \in X \times \mathbb{R}$ which satisfy

$$I'(x_*)w + \gamma R_p[I](x_*; H_{p-1}) < 0.$$

Thus $DC^{(p)}(I; x_*, H_{p-1})$ is nonempty, open and convex. The dual or polar cone to $DC^{(p)}(I; x_*, H_{p-1})$ can easily be calculated using the Minkowski-Farkas lemma [7].
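A hypothetical example (our own, for illustration only) makes Proposition 13 concrete. For $I(x_1, x_2) = x_1 + x_2^2$ at $x_* = 0$ with $p = 2$ and $h_1 = (0, 1)$ we have $\nabla^1 I(x_*)(H_1) = I'(x_*)h_1 = 0$ and $I'(x_*) = (1, 0) \ne 0$, and $R_2[I](x_*; H_1) = \tfrac12 I''(x_*)(h_1, h_1) = 1$. Taking $\gamma = 1$, the proposition predicts that $v$ is a 2-order vector of decrease exactly when $v_1 + 1 < 0$. The Python sketch below compares this prediction with the difference quotient $(I(x(\varepsilon) + \varepsilon^2 v) - I(x_*))/\varepsilon^2 = v_1 + 1 + 2\varepsilon v_2 + \varepsilon^2 v_2^2$.

```python
# Hypothetical data: I(x1, x2) = x1 + x2**2, x_* = (0, 0), p = 2, H_1 = (h1,) with h1 = (0, 1).


def I(x):
    return x[0] + x[1] ** 2


x_star, h1 = (0.0, 0.0), (0.0, 1.0)
grad_I = (1.0, 0.0)  # I'(x_*)
R2 = 1.0             # R_2[I](x_*; H_1) = (1/2) I''(x_*)(h1, h1) = (1/2) * 2 * h1[1]**2


def predicted(v):
    """I'(x_*) v + R_2[I](x_*; H_1): negative exactly for v in DS^(2)(I; x_*, H_1)."""
    return grad_I[0] * v[0] + grad_I[1] * v[1] + R2


def measured(v, eps):
    """(I(x(eps) + eps**2 v) - I(x_*)) / eps**2 along x(eps) = x_* + eps*h1."""
    x = (x_star[0] + eps * h1[0] + eps ** 2 * v[0],
         x_star[1] + eps * h1[1] + eps ** 2 * v[1])
    return (I(x) - I(x_star)) / eps ** 2


for v in [(-2.0, 0.5), (0.0, 0.0)]:
    print(v, predicted(v), [measured(v, e) for e in (1e-1, 1e-2, 1e-3)])
```

The difference quotients approach $-1$ for $v = (-2, 0.5)$, a vector of decrease, and equal $1$ for $v = 0$, which is not.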


High-Order Feasible Cones to Inequality 
Constraints Given by Smooth Functionals 


In this section we give the form of the $p$-order feasible cones, $FC^{(p)}(P; x_*, H_{p-1})$, for inequality constraints $P$ described by smooth functionals,

$$P = \{x \in X : f(x) \le 0\}.$$

Similar to the case of sets of decrease, if there exists a first index $i \le p-1$ such that $\nabla^i f(x_*)(H_i) \ne 0$, then the constraint will either be satisfied for any $p$-order vector $v \in X$ if $\nabla^i f(x_*)(H_i) < 0$, or it will be violated if $\nabla^i f(x_*)(H_i) > 0$. This leads to the definition of $p$-order active constraints.


Definition 14 The inequality constraint $P$ is said to be $p$-order active along the sequence $H_{p-1}$ if for all $i = 1, \dots, p-1$ we have $\nabla^i f(x_*)(H_i) = 0$.


Only $p$-order active constraints enter the necessary conditions for optimality derived via $p$-order approximations along an admissible sequence $H_{p-1}$; $p$-order inactive constraints generate zero multipliers since $FS^{(p)}(P; x_*, H_{p-1}) = X$ ($p$-order complementary slackness conditions) and can be ignored for high-order approximations.


Proposition 15 If the constraint $P = \{x \in X : f(x) \le 0\}$ is $p$-order active along the sequence $H_{p-1}$, then the $p$-order feasible cone, $FC^{(p)}(P; x_*, H_{p-1})$, consists of all vectors $(w, \gamma) \in X \times \mathbb{R}$ which satisfy

$$f'(x_*)w + \gamma R_p[f](x_*; H_{p-1}) < 0.$$

Hence, if $f'(x_*) \ne 0$, then $FC^{(p)}(P; x_*, H_{p-1})$ is nonempty, open and convex.


High-Order Feasible Cones 
to Closed Convex Inequality Constraints 


Let $C \subset X$ be a closed convex set with nonempty interior. Again we assume that $H_{p-1}$ is a $(p-1)$-order feasible sequence. Note that it follows from Definition 3 that $FS^{(p)}(C; x_*, H_{p-1})$ is open (since any vector in the neighborhood $V$ of $v$ also lies in $FS^{(p)}(C; x_*, H_{p-1})$). It is also clear that $FS^{(p)}(C; x_*, H_{p-1})$ is convex, since $C$ is. Thus $FC^{(p)}(C; x_*, H_{p-1})$ is an open, convex cone. Furthermore, if there exists an integer $j < p$ so that $h_j \in FS^{(j)}(C; x_*, H_{j-1})$, then any vector $v$ is allowed as a $p$-order feasible direction and thus trivially $FS^{(p)}(C; x_*, H_{p-1}) = X$, i.e. the convex constraint $x \in C$ is not $p$-order active. In this case the necessary conditions for optimality along $H_{p-1}$ are exactly the same as without $C$.

The dual or polar cone $FC^{(p)}(C; x_*, H_{p-1})^*$ can be identified with all supporting hyperplanes to $FS^{(p)}(C; x_*, H_{p-1})$ at $x_*$. More precisely, it consists of all linear functionals $(\lambda, \mu) \in X^* \times \mathbb{R}$ which satisfy

$$\langle \lambda, v \rangle + \mu \ge 0 \qquad \forall v \in FS^{(p)}(C; x_*, H_{p-1}).$$


Corollary 16 [13] Let $C \subset X$ be a closed convex set with nonempty interior and suppose the $p$-order feasible set $FS^{(p)}(C; x_*, H_{p-1})$ is nonempty. If $(\lambda, \mu) \in FC^{(p)}(C; x_*, H_{p-1})^*$, then $\lambda$ is a supporting hyperplane to $C$ at $x_*$.


Generalized Necessary Conditions for Optimality 


We now give generalized necessary conditions for optimality for problem (P) based on general $p$-order approximations. We assume as given a sequence $H_{p-1} = (h_1, \dots, h_{p-1}) \in X^{p-1}$ with the following properties:

P1) The first $p-1$ directional derivatives of $F$ along $H_{p-1}$ vanish, $\nabla^i F(x_*)(H_i) = 0$, the compatibility conditions $R_{p-1+i}[F](x_*; H_{p-1}) \in Y_i$ are satisfied for $i = 1, \dots, p-1$, and the operator $F$ is $p$-regular at $x_*$ in direction of the sequence $H_{p-1}$.

P2) Either the first nonvanishing derivative $\nabla^i I(x_*)(H_i)$ is negative or $\nabla^i I(x_*)(H_i) = 0$ for $i = 1, \dots, p-1$.

P3) If the $j$th constraint is not $p$-order active, then the first nonzero derivative $\nabla^i f_j(x_*)(H_i)$ is negative.

P4) $FS^{(p)}(C; x_*, H_{p-1})$ is nonempty.

These conditions guarantee, respectively, that the corresponding $p$-order approximating cones to the constraints or the functional $I$ are nonempty and convex. The next theorem generalizes the classical first order necessary conditions for optimality for a mathematical programming problem with convex inequality constraints [7, Thm. 11.4].


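Theorem 17 If $x_*$ is optimal for problem (P), then given any sequence $H_{p-1} = (h_1, \dots, h_{p-1}) \in X^{p-1}$ for which conditions P1)-P4) are satisfied, there exist Lagrange multipliers $\nu_j \ge 0$, $j = 0, \dots, m$, functionals $y_i^* \in Y_{i-1}^{\perp_i}$, $i = 1, \dots, p$, and a supporting hyperplane $\langle \lambda, v \rangle + \mu \ge 0$ for all $v \in FS^{(p)}(C; x_*, H_{p-1})$, all depending on the sequence $H_{p-1}$, such that the multipliers $\nu_j$, $j = 0, \dots, m$, and $\lambda$ do not all vanish, and

$$\lambda = \nu_0 I'(x_*) + \sum_{j=1}^{m} \nu_j f_j'(x_*) + \sum_{i=1}^{p} G_i^* y_i^*, \tag{15}$$

$$\mu \le \nu_0 R_p[I](x_*; H_{p-1}) + \sum_{j=1}^{m} \nu_j R_p[f_j](x_*; H_{p-1}) + \sum_{i=1}^{p} \big\langle y_i^*, R_{p-1+i}[F](x_*; H_{p-1}) \big\rangle. \tag{16}$$

Furthermore, the following $p$-order complementary slackness conditions hold:

• $\nu_0 = 0$ if $DS^{(p)}(I; x_*, H_{p-1}) = X$;

• $\nu_j = 0$ if $FS^{(p)}(P_j; x_*, H_{p-1}) = X$;

• $\lambda = 0$ if $FS^{(p)}(C; x_*, H_{p-1}) = X$.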


Remark 18 This theorem gives the formulation for the case which is nondegenerate in the sense that the operator $G_p$ is onto, and it is this condition which implies the nontriviality of the multipliers $\nu_j$, $j = 0, \dots, m$, and $\lambda$. If $G_p$ is not onto, but $\operatorname{Im} G_p$ is closed, while all the other conditions remain in effect, then a degenerate version of this theorem can easily be obtained by choosing a nontrivial multiplier $y^* \in (\operatorname{Im} G_p)^\perp$. This then gives rise to nontrivial multipliers $y_i^* \in Y_i^*$ which have the property that $\sum_{i=1}^{p} G_i^* y_i^* = 0$. Thus (15) still holds if we set $\nu_j = 0$ for $j = 0, \dots, m$, and $\lambda = 0$. The difference is that it can only be asserted that not all of the multipliers $\nu_j$, $j = 0, \dots, m$, $y_i^* \in Y_i^*$, $i = 1, \dots, p$, and $\lambda$ vanish.


See also 


> Kuhn-Tucker Optimality Conditions 
The formulation of Hilbert’s thirteenth problem [8] reads: ‘impossibility of solving the general equation of degree 7 by means of any continuous functions depending only on two variables’ [21].

On this basis, D. Hilbert proposed that the complexity of functions is specified essentially by the number of variables. However, as it turned out later, this proposal, though valid for analytic functions, is not true in the general case. In particular, the complexity of r times continuously differentiable functions of n variables depends not on the number of variables n but on the ratio n/r.

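It is known that the equation of the third degree can be reduced by translation to

$$X^3 + pX + q = 0,$$

which has the solution (S. del Ferro, 16th century)

$$X = \left(-\frac{q}{2} + \sqrt{\frac{4p^3 + 27q^2}{4 \cdot 27}}\right)^{1/3} + \left(-\frac{q}{2} - \sqrt{\frac{4p^3 + 27q^2}{4 \cdot 27}}\right)^{1/3}.$$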
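The reduction and the formula can be checked numerically. The sketch below is an illustration under stated assumptions, not a general cubic solver: it removes the quadratic term of a monic cubic $X^3 + aX^2 + bX + c$ by the translation $X = Y - a/3$ and then applies del Ferro's formula to $Y^3 + pY + q = 0$. It handles only the case $4p^3 + 27q^2 > 0$, where there is exactly one real root and the square root is real; the function names are hypothetical.

```python
import math


def depress(a: float, b: float, c: float) -> tuple[float, float]:
    """Coefficients (p, q) of Y^3 + pY + q after substituting X = Y - a/3."""
    p = b - a * a / 3.0
    q = 2.0 * a ** 3 / 27.0 - a * b / 3.0 + c
    return p, q


def real_cbrt(t: float) -> float:
    """Real cube root, valid for negative t as well."""
    return math.copysign(abs(t) ** (1.0 / 3.0), t)


def del_ferro_root(p: float, q: float) -> float:
    """The real root of Y^3 + pY + q = 0, assuming 4p^3 + 27q^2 > 0."""
    disc = (4.0 * p ** 3 + 27.0 * q ** 2) / (4.0 * 27.0)
    if disc <= 0.0:
        raise ValueError("4p^3 + 27q^2 <= 0: the formula needs complex cube roots here")
    r = math.sqrt(disc)
    return real_cbrt(-q / 2.0 + r) + real_cbrt(-q / 2.0 - r)


def cubic_real_root(a: float, b: float, c: float) -> float:
    """A real root of X^3 + aX^2 + bX + c = 0 (single-real-root case only)."""
    p, q = depress(a, b, c)
    return del_ferro_root(p, q) - a / 3.0


# X^3 - 2X^2 + X - 2 = (X - 2)(X^2 + 1) has the single real root X = 2.
print(cubic_real_root(-2.0, 1.0, -2.0))
```

When $4p^3 + 27q^2 < 0$ all three roots are real but the formula passes through complex numbers, which is why the sketch refuses that case instead of guessing.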


The equation of fourth degree can be solved by super- 
position of addition, multiplication, square roots, cube 
roots and fourth roots. 

To try to solve algebraic equations of higher degree (a vain hope according to N.H. Abel and E. Galois), the idea of W. Tschirnhausen in 1683 [24] was to adjoin a new equation, i.e., to

$$P(X) = 0$$

one adjoins

$$Y = Q(X),$$


where Q is a polynomial of degree strictly less than that of P, chosen expediently. In this way one can show that the roots of an equation of degree 5 can be expressed via the usual arithmetic operations in terms of radicals and of the solution $\varphi(x)$ of the quintic equation

$$X^5 + xX + 1 = 0$$

depending on the parameter x. Similarly, for the equation of degree 6 the roots are expressible in the same way if we also include a function $\theta(x, y)$, a solution of a 6th-degree equation depending on two parameters x and y.

For degree 7 we would also have to include a function $\omega(x, y, z)$, a solution of the equation

$$X^7 + xX^3 + yX^2 + zX + 1 = 0.$$

Hence the natural question: Can $\omega(x, y, z)$ be expressed by superposition of algebraic functions of two variables [10]?

A great number of papers are devoted to the rep- 
resentability of functions as superpositions of functions 
depending on a smaller number of variables and sat- 
isfying certain additional conditions such as algebraic- 
ity, analyticity and smoothness. Hilbert was aware of 
the fact that superpositions of discontinuous functions 
represent all functions of a larger number of variables. 
He also knew about the existence of analytic functions 
of three variables that cannot be represented by any fi- 
nite superpositions of analytic functions of two vari- 
ables [8]. 

In the statement of his 13th problem, Hilbert proceeded from a result of Tschirnhausen [24], according to which a root of an algebraic equation of degree $n \ge 5$, i.e., a function $f(x_1, \dots, x_n)$ determined by an equation

$$f^n + x_1 f^{n-1} + \cdots + x_n = 0, \qquad (1)$$

can be expressed as a superposition of algebraic functions of $n - 4$ variables [21]. Hilbert assumed that the
number $n - 4$ cannot be reduced for $n = 6, 7, 8$, and also proved that in order to solve an equation of degree $n = 9$ it suffices to have functions of $n - 5$ variables [9]. A. Wiman [26] extended the latter result to $n > 9$, while N. Chebotarev [6] reduced the number of variables involved in the representation of functions to $n - 6$ for $n \ge 21$ and to $n - 7$ for $n \ge 121$.

Chebotarev was the first to attempt to find topo- 
logical obstructions to the representability of algebraic 
functions as superpositions of algebraic functions, but 
his proofs were not convincing [5,17]. Using topologi- 
cal notions related to the behavior of a many-valued al- 
gebraic function on and near a branching manifold, it is 
proved that algebraic functions cannot be represented 
by complete superpositions of integral algebraic func- 
tions. Completeness means that the represented func- 
tion must involve all the branches of the many-valued 
functions and not only one of them as, for example, in 
the formulas expressing solutions to equations of the 
3rd and the 4th degree [21]. 

Certain topological obstructions to the representation by complete superpositions of algebraic functions were constructed in this way [2]. V. Lin [15] established the following, most complete, result: in any neighborhood of the origin, for $n > 3$, the root $f(x_1, \dots, x_n)$ of equation (1) is not a complete superposition of entire algebroid functions of fewer than $n - 1$ variables and single-valued holomorphic functions of an arbitrary number of variables. Thus, from the standpoint of complete superpositions of entire algebraic functions, even fourth-degree equations cannot be solved without using functions of three variables [21].

Hilbert had another motivation for his thirteenth problem: nomography, the method of solving equations by drawing a one-parameter family of curves. This problem, arising in the methods of computation of Hilbert’s time, inspired the development of Kolmogorov’s notion of ε-entropy [20]. The notion of ε-entropy plays a crucial role in the theories of approximation now used in computer science [22].

In Kolmogorov’s theory of ε-entropy, a natural characteristic of a function class F is

$$H_\varepsilon(F) = \log_2 N_\varepsilon(F),$$

where $N_\varepsilon(F)$ is the minimum number of points in an ε-net in F. Broadly speaking, the ε-entropy of a function class F is the amount of information needed to specify a function of the class F with accuracy ε. A main problem in the theory of ε-entropy is to estimate the rate of growth of $H_\varepsilon(F)$ as $\varepsilon \to 0$ for Lipschitz functions, classes of analytic functions and functions possessing a given number of derivatives. A.N. Kolmogorov showed that the ε-entropy of r times continuously differentiable functions of n variables grows as $\varepsilon^{-n/r}$ [20].
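To make $H_\varepsilon(F) = \log_2 N_\varepsilon(F)$ concrete in the simplest possible setting, the sketch below takes F to be a finite set of real numbers with the usual distance, which is far simpler than the function classes Kolmogorov studied. On the line, a greedy sweep (take the leftmost uncovered point, then place the net point as far right as possible while still covering it) yields a smallest ε-net with points in F; a standard exchange argument shows this greedy choice is optimal. The function names are hypothetical.

```python
import math
from bisect import bisect_right


def minimal_eps_net(points: list[float], eps: float) -> list[float]:
    """Smallest subset N of `points` such that every point is within eps of N."""
    xs = sorted(points)
    net: list[float] = []
    i = 0
    while i < len(xs):
        left = xs[i]                               # leftmost point not yet covered
        center = xs[bisect_right(xs, left + eps) - 1]  # rightmost point within eps of it
        net.append(center)
        i = bisect_right(xs, center + eps)         # skip every point the center covers
    return net


def eps_entropy(points: list[float], eps: float) -> float:
    """H_eps(F) = log2 N_eps(F) for a finite set F on the real line."""
    return math.log2(len(minimal_eps_net(points, eps)))


# The integers 0..10 with eps = 1: each net point covers at most 3 of them.
print(minimal_eps_net(list(range(11)), 1.0), eps_entropy(list(range(11)), 1.0))
```

Halving ε roughly doubles $N_\varepsilon$ here, i.e. adds about one bit to $H_\varepsilon$, which is the one-dimensional, $r = 1$ instance of the $\varepsilon^{-n/r}$ growth quoted above.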

Since a digital computer can store only a finite set 
of numbers, functions must be replaced by such finite 
sets. Therefore, studies in ε-entropy are important for
the correct estimation of the possibilities of computa- 
tional methods for approximately representing func- 
tions, their implementation on computers and their 
storage in the computer memory. 

The notion of ε-entropy also has many other applications [23]. An ε-net of Lipschitz functions of n variables is constructed to design global optimization algorithms. This ε-net is based on Kolmogorov’s minimal ε-net of one-dimensional Lipschitz functions and is encoded in terms of monotone functions of k-valued logic. This construction gives a representation of an n-dimensional global optimization problem by a minimal number of one-dimensional ones without loss of information [13].

Let us briefly recall the history of the solution of Hilbert’s thirteenth problem by Kolmogorov and V. Arnol’d. Hilbert’s problem was first attacked with a technique developed by A. Kronrod [14]. In this way Kolmogorov proved that any continuous function of $n \ge 4$ variables can be represented as a superposition of continuous functions of three variables [11]. For an arbitrary function of four variables the representation has the form

$$F(x_1, x_2, x_3, x_4) = \sum_{r=1}^{4} h_r\bigl[x_4, \varphi_1^r(x_1, x_2, x_3), \varphi_2^r(x_1, x_2, x_3)\bigr].$$


The question whether an arbitrary continuous func- 
tion of three variables can be represented as a super- 
position of continuous functions of two variables re- 
mained open. The method reduced the representabil- 
ity of functions of three variables as superpositions of 
functions of two variables to a representability prob- 
lem for functions defined on universal trees of three- 
dimensional space [21]. 
Contrary to the expectations of Hilbert and of his contemporaries, in 1957 Arnol’d [1], who was a student of Kolmogorov, solved the latter problem and gave the final solution to Hilbert’s thirteenth problem in the form of a theorem asserting that any continuous function of $n \ge 3$ variables can be represented as a superposition of functions of two variables [21].

A few weeks later Kolmogorov showed that any continuous function f of n variables can be represented as a superposition

$$f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \chi_q\Bigl[\sum_{p=1}^{n} \varphi^{pq}(x_p)\Bigr] \qquad (2)$$

of continuous functions of one variable and the operation of addition [12]. In Kolmogorov’s representation (2) the inner functions $\varphi^{pq}$ are fixed and only the outer functions $\chi_q$ depend on the represented function f.
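The shape of (2) can be illustrated with a toy example that is not Kolmogorov's construction. For $n = 2$ the product $f(x_1, x_2) = x_1 x_2 = \tfrac14(x_1 + x_2)^2 - \tfrac14(x_1 - x_2)^2$ already has the form (2), with outer functions $\chi_1(t) = t^2/4$, $\chi_2(t) = -t^2/4$, inner functions $\varphi^{11}(x_1) = \varphi^{12}(x_1) = x_1$, $\varphi^{21}(x_2) = x_2$, $\varphi^{22}(x_2) = -x_2$, and the remaining three outer functions identically zero. Unlike in Kolmogorov's theorem, these inner functions were chosen with f in mind; the sketch (hypothetical names) only shows how a superposition of one-variable functions and addition is evaluated.

```python
from typing import Callable, Sequence

Univariate = Callable[[float], float]


def superposition(outer: Sequence[Univariate],
                  inner: Sequence[Sequence[Univariate]],
                  xs: Sequence[float]) -> float:
    """Evaluate sum_q outer[q](sum_p inner[q][p](xs[p])), the shape of (2)."""
    return sum(chi(sum(phi(x) for phi, x in zip(row, xs)))
               for chi, row in zip(outer, inner))


def zero(t: float) -> float:
    return 0.0


n = 2
unused = 2 * n + 1 - 2                      # outer terms q = 3, 4, 5 are identically zero
outer = [lambda t: t * t / 4.0, lambda t: -t * t / 4.0] + [zero] * unused
inner = [[lambda x1: x1, lambda x2: x2],    # q = 1: x1 + x2
         [lambda x1: x1, lambda x2: -x2]]   # q = 2: x1 - x2
inner += [[zero, zero]] * unused

print(superposition(outer, inner, [3.0, -2.5]))   # equals 3.0 * -2.5 = -7.5
```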

The results of [11] do not follow from the theorem 
presented in [12] in their exact statements, but their 
essence (in the sense of the possibility of representing 
functions of several variables by means of superposi- 
tions of functions of a smaller number of variables and 
their approximation by superpositions of a fixed form 
involving polynomials in one variable and addition) is 
obviously contained in it [12]. The method for prov- 
ing the theorem is more elementary than that in [1,11] 
and reduces to direct constructions and calculations. In 
Kolmogorov’s opinion, the proof of the theorem was 
his most technically difficult achievement [21]. 

Thorough proofs of Kolmogorov’s theorem and the lemmas of his paper [12] were published in [16,18,20] and elsewhere. G. Lorentz [16] noted that the outer functions $\chi_q$ can be replaced by a single function $\chi$. D. Sprecher [18] reduced all the inner functions to translations and dilations of a single function $\psi$, with the property that there exist $\varepsilon > 0$ and $\lambda > 0$ such that any continuous function of n variables can be represented as

$$f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \chi_q\Bigl[\sum_{p=1}^{n} \lambda^p \psi(x_p + \varepsilon q) + q\Bigr]. \qquad (3)$$

B. Fridman [7] proved that the inner functions $\varphi^{pq}$ in (2) can be chosen so that they satisfy a Lipschitz condition. Sprecher [19] extended this result to the representation (3) (the function $\psi$ can be chosen to satisfy a Lipschitz condition).

It follows from Kolmogorov’s representation (2) and Bari’s representation [3] of any continuous function of one variable as a sum of three superpositions of absolutely continuous functions, $\sum_{k=1}^{3} f_k \circ g_k$, that all continuous functions of any number of variables can be represented by means of superpositions of absolutely continuous functions of one variable and the operation of addition [21].

In the opposite direction are the results of A. Vitushkin [25] and L. Bassalygo [4]. When we deal with superpositions of formal series or analytic functions, it can be shown, for example, that almost every entire function has at an arbitrary point of $\mathbb{C}^3$ a germ which is not expressible by superposition of series in two variables. So there are many more entire functions of three variables than of two [10]. The result of Vitushkin is that there exist r times continuously differentiable functions of n variables that cannot be expressed in terms of finite superpositions of $s \ge 1$ times continuously differentiable functions of $k < n$ variables if $n/r > k/s$ [25]; that is, representability depends on the ratio n/r. Bassalygo proved that for any three functions $\varphi_k$ continuous on a square there exists a continuous function f which cannot be represented as $\sum_{k=1}^{3} g_k \circ \varphi_k$ for any continuous $g_k$ [4].
Have you ever watched a spider catch a fly or a mosquito? Usually, a spider hides at the edge of its web. When a fly or a mosquito hits the web, the spider plucks each thread to find the taut one and then runs rapidly along that thread to its prey. Why does the spider choose the taut thread? Some biologists explain that this thread gives the shortest path from the spider to its prey.

Have you heard the following story about a wise general? He had orders to capture a town behind a mountain. When he and his soldiers reached the top of the mountain, he found that the enemy had already come very close to the town by another route. His dilemma was how to get into the town before the enemy arrived. It was a challenging problem for the general. He solved it by asking each soldier to roll down the mountain wrapped in a blanket. Why is this faster? Physicists tell us that a free ball rolling down a mountain always chooses the most rapid way.

Do you know the tale of Tian Gi’s horse race? It is a story set in the centuries before the common era. Tian Gi was a general in Qi, one of several small states of ancient China. The King of Qi knew that Tian Gi had several good horses and ordered Tian Gi to race against him. The match consisted of three rounds; in each round, each side chose a horse to compete against the other side’s. Tian Gi knew that his best horse could not beat the King’s best, his second-best horse could not beat the King’s second-best, and his third-best horse could not beat the King’s third-best. Therefore, he did not use his best horse against the King’s best. Instead, he put his third-best horse in the first round against the King’s best, his best horse in the second round against the King’s second-best, and his second-best horse in the third round against the King’s third-best. As a result, although he lost the first round, he won the last two. Tian Gi’s strategy was the best way to win this match. Today, economists tell us that many economic and social systems can be modeled as games, in which each contestant tries to maximize certain benefits.
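Tian Gi's choice can be confirmed by brute force over the $3! = 6$ ways of ordering his horses against the King's fixed order. The speeds below are invented for illustration; the only assumption taken from the story is the ranking King's best > Tian's best > King's second > Tian's second > King's third > Tian's third.

```python
from itertools import permutations

# Hypothetical speeds, consistent with the ranking implied by the story.
king = {"best": 10.0, "second": 8.0, "third": 6.0}
tian = {"best": 9.0, "second": 7.0, "third": 5.0}
king_order = ("best", "second", "third")        # the King races in this order


def rounds_won(tian_order: tuple[str, ...]) -> int:
    """Number of rounds Tian wins when his horses race in `tian_order`."""
    return sum(tian[t] > king[k] for t, k in zip(tian_order, king_order))


best = max(permutations(tian), key=rounds_won)  # try all six assignments
print(best, rounds_won(best))
```

Matching horse for horse (best against best, and so on) wins no round at all, while the order from the story is the unique assignment that wins two.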

Optimality is a fundamental principle, establishing natural laws, governing biological behavior, and guiding social activities. Optimization therefore began at the earliest stages of human civilization. Of course, before mathematics was well established, optimization could be done only by simulation. One may find many stories of wise men in human history about it. For example, to find the best way out of the mountains, someone followed a stream, and to find the best way out of a desert, someone set an old horse free and followed its tracks.

Even in the 19th century, and indeed today, simulation has been used for optimization. For example, to find a shortest path in a network, one may build a scale model of the network out of rope and pull it tight between the two endpoints; the taut rope shows the shortest path. To find an optimal location for a school serving three villages, one may drill three holes in a table, one at the scaled position of each village, and pass a piece of rope through each hole. Then tie the three rope ends above the table together and hang a one-kg weight on each rope end under the table. When this mechanical system is in equilibrium, the knot of the three rope pieces marks the location of the school.
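The taut-rope device has a standard digital counterpart, Dijkstra's shortest-path algorithm, which this article does not describe. The sketch below is a minimal version, assuming nonnegative rope lengths stored in a hypothetical adjacency dictionary, with every rope listed in both directions.

```python
import heapq


def shortest_path(graph: dict[str, dict[str, float]], source: str, target: str):
    """Dijkstra's algorithm for nonnegative edge lengths.

    Returns (length, path), or (inf, []) if target cannot be reached.
    """
    dist = {source: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, source)]
    settled: set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue                       # outdated heap entry
        settled.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if target not in settled:
        return float("inf"), []
    path = [target]
    while path[-1] != source:              # walk the predecessor links back
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# A small hypothetical rope network; each rope is listed in both directions.
ropes = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"A": 4.0, "C": 2.0, "D": 1.0},
    "C": {"A": 1.0, "B": 2.0, "D": 7.0},
    "D": {"B": 1.0, "C": 7.0},
}
print(shortest_path(ropes, "A", "D"))
```

Like the rope model, the algorithm settles places in order of increasing distance from the source, so the first time the target is settled its distance is final.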

The history of optimization in mathematics can be 
divided into three periods. 

In the first period, one did not know any general method to find a maximum/minimum point of a function. Only special techniques were found to maximize/minimize some special functions. A typical function is the quadratic function of one variable

$$y = ax^2 + bx + c.$$


The study of quadratic functions was closely related to the study of uniformly accelerated motion. What is the highest point reached by a stone thrown with a certain initial speed at a certain angle? What is the farthest point that a stone thrown with a certain initial speed can reach as the throwing angle varies? These were questions considered by some physicists and generals. In fact, the stone-throwing machine was an important military weapon.
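Both stone-throwing questions reduce to a quadratic. Ignoring air resistance (an assumption of this sketch), a stone launched with speed $v$ at angle $\theta$ follows $y(x) = x\tan\theta - \frac{g}{2v^2\cos^2\theta}\,x^2$. Its highest point is the vertex $x = -b/(2a)$ of this parabola, and its landing point is the nonzero root. The sketch evaluates both, and scanning whole-degree angles recovers the classical answer that the range $v^2\sin 2\theta/g$ is largest at $45^\circ$. The names and the value of $g$ are assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed)


def trajectory_coeffs(v: float, theta: float) -> tuple[float, float]:
    """Coefficients (a, b) of y = a x^2 + b x for launch speed v and angle theta."""
    a = -G / (2.0 * v * v * math.cos(theta) ** 2)
    b = math.tan(theta)
    return a, b


def max_height(v: float, theta: float) -> float:
    a, b = trajectory_coeffs(v, theta)
    x_vertex = -b / (2.0 * a)              # vertex of the parabola
    return a * x_vertex ** 2 + b * x_vertex


def landing_distance(v: float, theta: float) -> float:
    a, b = trajectory_coeffs(v, theta)
    return -b / a                          # nonzero root of a x^2 + b x = 0


best_deg = max(range(1, 90), key=lambda deg: landing_distance(20.0, math.radians(deg)))
print(best_deg, max_height(20.0, math.radians(30)))
```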

Today (as of 2000), computing maximum/minimum points of a quadratic function is still an important technique of optimization, found in elementary mathematics books. The technique has also been extended to other functions, such as

$$y = \frac{x^2 + x + 1}{x^2 + 2x + 3}.$$

Indeed, multiplying both sides by $x^2 + 2x + 3$ and simplifying, we obtain

$$(y - 1)x^2 + (2y - 1)x + (3y - 1) = 0.$$

Since x is a real number, for $y \ne 1$ the discriminant of this quadratic in x must be nonnegative (for $y = 1$ the equation is linear, with the real solution $x = -2$):

$$(2y - 1)^2 - 4(y - 1)(3y - 1) \ge 0.$$

Therefore,

$$-8y^2 + 12y - 3 \ge 0,$$

that is,

$$\frac{3 - \sqrt{3}}{4} \le y \le \frac{3 + \sqrt{3}}{4}.$$


It is interesting to note that with this technique we ob- 
tained the global maximum and minimum of y. 
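The discriminant argument can be checked numerically. The sketch below takes the two roots $(3 \pm \sqrt{3})/4$ of $8y^2 - 12y + 3 = 0$ and computes where they are attained: the discriminant vanishes there, so x is the double root $-(2y - 1)/(2(y - 1))$. It then compares the bounds with a brute-force scan over a grid chosen arbitrarily for this sketch.

```python
import math


def y(x: float) -> float:
    return (x * x + x + 1) / (x * x + 2 * x + 3)


# Roots of 8y^2 - 12y + 3 = 0, i.e. the bounds derived above.
y_min = (3 - math.sqrt(3)) / 4
y_max = (3 + math.sqrt(3)) / 4


def attained_at(yv: float) -> float:
    """Double root of (y-1)x^2 + (2y-1)x + (3y-1) = 0 when the discriminant is zero."""
    return -(2 * yv - 1) / (2 * (yv - 1))


x_hi, x_lo = attained_at(y_max), attained_at(y_min)   # -(2 + sqrt 3) and -(2 - sqrt 3)
samples = [y(-50 + k * 0.001) for k in range(100_001)]  # brute-force scan of [-50, 50]
print(min(samples), max(samples), y_min, y_max)
```

Because the bounds are attained, at $x = -(2 \pm \sqrt{3})$, they are the global extrema and not merely bounds, which is the point made in the text.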

A new period was started in 1646 by P. de Fermat. In his paper [5] he proposed a general approach to computing local maximum/minimum points of a differentiable function, namely setting the derivative of the function to zero. Today, this approach is still included in almost all calculus textbooks as an application of differentiation. In this period, optimization results were scattered across mathematics without any systematic organization. Because optimization had not yet become an important branch of applied mathematics, some mathematicians did not pay much attention to results on optimization, and some contributions were never even published. This left many mysteries in the history of optimization.

For example, who first proposed the Steiner tree problem? This is one such mystery. To obtain a clear view, let us explain it in a little detail.

In the same paper, Fermat also studied the problem of finding a point that minimizes the total distance from it to three given points in the Euclidean plane. Suppose the three given points are $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Then the total distance from a point $(x, y)$ to these three points is

$$f(x, y) = \sum_{i=1}^{3} \sqrt{(x - x_i)^2 + (y - y_i)^2}.$$

By Fermat’s general method, a minimum point of $f(x, y)$ other than the given points (where f is not differentiable) must satisfy the equations

$$\frac{\partial f}{\partial x} = \sum_{i=1}^{3} \frac{x - x_i}{\sqrt{(x - x_i)^2 + (y - y_i)^2}} = 0,$$

$$\frac{\partial f}{\partial y} = \sum_{i=1}^{3} \frac{y - y_i}{\sqrt{(x - x_i)^2 + (y - y_i)^2}} = 0.$$


However, obtaining x and y from this system of equations seems hopeless. Fermat therefore raised the problem again in a letter to M. Mersenne, remarking that it would be nice if a clear solution could be obtained.

E. Torricelli, a student of G. Galilei, obtained a clever solution by a geometric method. He showed that if the three given points form a triangle with no angle of 120° or more, then the solution is the point from which the segments to the three given points form three angles of 120°. Otherwise, the solution is the vertex at which the triangle formed by the three given points has an angle of at least 120°. This result can also be proved with the mechanical system described at the beginning of this article. In the first case, the knot of the three rope pieces does not rest at any of the given points, and hence the balance of three forces of equal magnitude yields the condition on the angles. In the second case, the knot falls into one of the three holes, and the condition on the angle guarantees that the knot does not move away from the hole.
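Torricelli's characterization can be turned into a small numerical procedure. The sketch below is an illustration, not Torricelli's geometric construction. It applies his vertex test (an angle of at least 120° means that vertex is the answer) and otherwise runs Weiszfeld's fixed-point iteration, a later method that this article does not discuss. The test then checks that the three segments from the result meet at 120°. Names, tolerances and the starting point (the centroid) are assumptions.

```python
import math

Point = tuple[float, float]


def angle_at(p: Point, q: Point, r: Point) -> float:
    """Angle qpr in degrees."""
    v = (q[0] - p[0], q[1] - p[1])
    w = (r[0] - p[0], r[1] - p[1])
    cos = (v[0] * w[0] + v[1] * w[1]) / (math.hypot(*v) * math.hypot(*w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def fermat_point(pts: list[Point], iters: int = 10_000) -> Point:
    a, b, c = pts
    for p, q, r in ((a, b, c), (b, c, a), (c, a, b)):
        if angle_at(p, q, r) >= 120.0:
            return p                           # Torricelli's second case
    x, y = (a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3
    for _ in range(iters):
        # Weiszfeld step: average of the points weighted by 1 / distance.
        # (Assumes no iterate lands exactly on a given point.)
        ws = [1.0 / math.hypot(x - px, y - py) for px, py in pts]
        nx = sum(w * px for w, (px, _) in zip(ws, pts)) / sum(ws)
        ny = sum(w * py for w, (_, py) in zip(ws, pts)) / sum(ws)
        if math.hypot(nx - x, ny - y) < 1e-13:
            break
        x, y = nx, ny
    return x, y


tri = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8)]    # all angles below 120 degrees
F = fermat_point(tri)
print(F, [angle_at(F, tri[i], tri[(i + 1) % 3]) for i in range(3)])
```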

Fermat’s problem was extensively studied later and 
was generalized to four points by J.Fr. Fagnano in 1775 
and to n points by P. Tedenat and S. L’Huiller in 1810. 


Fagnano pointed out that it is very easy to find the solution of Fermat’s problem for four points. When the four given points form a convex quadrilateral, the solution is the intersection of the two diagonals, i.e., the intersection of the diagonals minimizes the total distance to the four given points. Otherwise, one of the given points lies inside the triangle formed by the other three, and this point is the solution.
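Fagnano's rule is easy because of the triangle inequality: for any point P, $PA + PC \ge AC$ and $PB + PD \ge BD$, with equality in both exactly when P lies on both diagonals. The sketch below, with hypothetical names and a hypothetical quadrilateral, computes the intersection of the diagonals of a convex quadrilateral ABCD and spot-checks it against random points.

```python
import math
import random

Point = tuple[float, float]


def diagonal_intersection(a: Point, b: Point, c: Point, d: Point) -> Point:
    """Intersection of diagonals AC and BD of a convex quadrilateral ABCD."""
    r = (c[0] - a[0], c[1] - a[1])          # direction of AC
    q = (d[0] - b[0], d[1] - b[1])          # direction of BD
    denom = r[0] * q[1] - r[1] * q[0]       # nonzero when the diagonals cross
    # Solve a + t*r = b + s*q for t (Cramer's rule).
    t = ((b[0] - a[0]) * q[1] - (b[1] - a[1]) * q[0]) / denom
    return a[0] + t * r[0], a[1] + t * r[1]


def total_distance(p: Point, pts: list[Point]) -> float:
    return sum(math.hypot(p[0] - x, p[1] - y) for x, y in pts)


quad = [(0.0, 0.0), (4.0, 0.0), (5.0, 3.0), (1.0, 4.0)]   # hypothetical convex ABCD
E = diagonal_intersection(*quad)
random.seed(1)
others = [(random.uniform(-1, 6), random.uniform(-1, 6)) for _ in range(1000)]
print(E, total_distance(E, quad), min(total_distance(p, quad) for p in others))
```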

On March 19, 1836, H.C. Schumacher wrote a letter to C.F. Gauss in which he mentioned a paradox about Fermat’s problem. Consider a convex quadrilateral ABCD. It is known that the solution of Fermat’s problem for the four points A, B, C, and D is the intersection E of the diagonals AC and BD. Suppose that extending DA and CB gives an intersection point F. Now move A and B to F. Then E also moves to F. However, when the angle at F is less than 120°, the point F cannot be the solution of Fermat’s problem for the three given points F, D, and C. What happens?

On March 21, 1836, Gauss replied to Schumacher, explaining that the mistake in Schumacher’s paradox occurs at the point where Fermat’s problem for the four points A, B, C, and D is changed to Fermat’s problem for the three points F, C, and D. When A and B coincide with F, the total distance from E to the four points A, B, C, and D equals 2EF + EC + ED, not EF + EC + ED. Thus, the point E need not be the solution of Fermat’s problem for F, C, and D. More importantly, Gauss proposed a new problem: he said that it is more interesting to find a shortest network than a point. Gauss also presented several possible connections of the shortest network for four given points.
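Gauss's resolution can be checked with the standard optimality condition for a weighted sum of distances $\sum_i w_i\,|EP_i|$, a fact from convex analysis rather than from this article. A given point $P_k$ is a minimizer exactly when $\bigl\|\sum_{i \ne k} w_i u_i\bigr\| \le w_k$, where $u_i$ is the unit vector from $P_k$ toward $P_i$. With A and B merged into F, the weight of F is 2. The sketch below uses hypothetical coordinates with an angle of about 53° at F. It confirms that F minimizes 2EF + EC + ED but not EF + EC + ED, which removes the paradox.

```python
import math

Point = tuple[float, float]


def vertex_is_optimal(points: list[Point], weights: list[float], k: int) -> bool:
    """True if points[k] minimizes sum_i weights[i] * |E - points[i]| over E."""
    pk = points[k]
    gx = gy = 0.0
    for i, (p, w) in enumerate(zip(points, weights)):
        if i == k:
            continue
        d = math.hypot(p[0] - pk[0], p[1] - pk[1])   # assumes p differs from pk
        gx += w * (p[0] - pk[0]) / d                  # weighted unit vector toward p
        gy += w * (p[1] - pk[1]) / d
    return math.hypot(gx, gy) <= weights[k]


F, C, D = (0.0, 0.0), (2.0, 1.0), (2.0, -1.0)         # angle CFD is about 53 degrees
merged = vertex_is_optimal([F, C, D], [2.0, 1.0, 1.0], 0)   # A = B = F: 2EF + EC + ED
naive = vertex_is_optimal([F, C, D], [1.0, 1.0, 1.0], 0)    # EF + EC + ED
print(merged, naive)
```

With unit weights the same test reproduces Torricelli's rule, since $\|u_1 + u_2\| = 2\cos(\varphi/2) \le 1$ exactly when the angle $\varphi$ at the vertex is at least 120°.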

Unfortunately, Gauss’ letter was discovered only in 1986. From 1941 to 1986, many publications followed R. Courant and H. Robbins, who in their popular book [2] called Gauss’ problem the Steiner tree problem. ‘Steiner tree’ has become a popular and important name: at the time of writing, a search for ‘Steiner tree’ on ‘yahoo.com’ returned a list of 4675 web pages on Steiner trees. There is no longer any way to change the name from Steiner trees back to Gauss trees. It may be worth mentioning that J. Steiner, the 19th-century geometer whose name is used for these shortest networks, has not so far been found to have made any significant contribution to Steiner trees.
G.B. Dantzig, who first proposed the simplex method for solving linear programs in 1947, stated in [4]: ‘What seems to characterize the pre-1947 era was lack of any interest in trying to optimize.’ Because of this lack of interest in optimization, many important works that appeared before 1947 were ignored. This happened not only with Steiner trees but also in other areas of optimization, including some important contributions to linear and nonlinear programming.

The discovery of linear programming started a new age of optimization. However, in [4], Dantzig made the following comment: ‘Linear programming was unknown prior to 1947. This is not quite correct; there were some late exceptions. J.B.J. Fourier (of Fourier series fame) in 1823 and the well-known Belgian mathematician Ch. de la Vallée Poussin in 1911 each wrote a paper about it. Their work had as much influence on post-1947 developments as would finding in an Egyptian tomb an electronic computer built in 3000 BC. L.V. Kantorovich’s remarkable 1939 monograph on the subject was also neglected for ideological reasons in the USSR. It was resurrected two decades later after the major developments had already taken place in the West. An excellent paper by F.L. Hitchcock in 1941 on the transportation problem was also overlooked until after others in the late 1940s and early 1950s had independently rediscovered its properties.’

He also recalled how he made his discovery: “My 
own contribution grew out of my World War II expe- 
rience in the Pentagon. During the war period (1941- 
1945), I had become an expert on programming- 
planning methods using desk calculators. In 1946 I was 
mathematical advisor to the US Air Force Comptroller 
in the Pentagon. I had just received my PhD (for re- 
search I had done mostly before the war) and was look- 
ing for an academic position that would pay better than 
a low offer I had received from Berkeley. In order to 
entice me to not take another job, my Pentagon col- 
leagues, D. Hitchcock and M. Wood, challenged me to 
see what I could do to mechanize the planning pro- 
cess. I was asked to find a way to more rapidly com- 
pute a time-staged development, training and logistical 
supply program. In those days mechanizing planning 
meant using analog devices or punch-card equipment. 
There were no electronic computers.”

This challenge led Dantzig to his great work on linear programming, done without electronic computers. We should point out, however, that it is the rapid development of computer technology that has made the applications of linear programming so wide and so important, and has allowed the areas of optimization to grow so fast.

In 1951, A.W. Tucker and his student H.W. Kuhn published the Kuhn-Tucker conditions. This is considered the starting point of nonlinear programming. However, A. Takayama has an interesting comment on these conditions: “Linear programming aroused interest in constraints in the form of inequalities and in the theory of linear inequalities and convex sets. The Kuhn-Tucker study appeared in the middle of this interest with a full recognition of such developments. However, the theory of nonlinear programming when constraints are all in the form of equalities has been known for a long time - in fact, since Euler and Lagrange. The inequality constraints were treated in a fairly satisfactory manner already in 1939 by Karush. Karush’s work is apparently under the influence of a similar work in the calculus of variations by Valentine. Unfortunately, Karush’s work has been largely ignored.” This is yet another work that appeared before 1947 and was ignored. In the 1960s, G. Zoutendijk, J.B. Rosen, P. Wolfe, M.J.D. Powell, and others published a number of algorithms for solving nonlinear optimization problems. These algorithms form the basis of contemporary nonlinear programming.

In 1954, L.R. Ford and D.R. Fulkerson initiated the study of network flows. This is considered a starting point of combinatorial optimization, although Fermat was the first to study a major combinatorial optimization problem. In fact, it was because of the influence of the results of Ford and Fulkerson that interest in combinatorial optimization grew, and many problems, including Steiner trees, were proposed or rediscovered. In 1958, R.E. Gomory published the cutting plane method. This is considered the beginning of integer programming, an important direction of combinatorial optimization.

In 1955, Dantzig published his paper [3] and E.M.L. Beale proposed an algorithm to solve similar problems. These works started the study of stochastic programming. R.J-B. Wets in the 1960s, and J.R. Birge and A. Prékopa in the 1980s, made important contributions to this branch of optimization.
Today, optimization has spread into almost every corner of economics. New branches of optimization have appeared in almost every decade: global optimization, nondifferentiable optimization, geometric programming, large-scale optimization, etc. No one can study all branches of optimization in a lifetime; each researcher can be an expert in only a few of them.

Of course, the rapid development of optimization has been accompanied by recognition of its achievements. One important fact is that several researchers in optimization have received the Nobel Prize in economics, including Kantorovich and T.C. Koopmans, who received it in 1975 for their contributions to the theory of optimum allocation of resources. H.M. Markowitz received the Nobel Prize in economics in 1990 for his contribution on the quadratic programming model of financial analysis.

Today, optimization has become a very large and important interdisciplinary area between mathematics, computer science, industrial engineering, and management science. The ‘International Symposium on Mathematical Programming’ is one of the major conferences on optimization. The growing number of papers presented at this conference reflects the growth of the optimization area:

1949) Chicago, USA, 34 papers;
1951) Washington DC, USA, 19 papers;
1955) Washington DC, USA, 33 papers;
1959) Santa Monica, USA, 57 papers;
1962) Chicago, USA, 43 papers;
1964) London, UK, 83 papers;
1967) Princeton, USA, 91 papers;
1970) The Hague, The Netherlands, 137 papers;
1973) Stanford, USA, about 250 papers;
1976) Budapest, Hungary, 327 papers;
1979) Montreal, Canada, 458 papers;
1982) Bonn, FRG, 554 papers;
1985) Cambridge, USA, 589 papers;
1988) Tokyo, Japan, 624 papers.

(This data is quoted from [1].) 


With the current fast growth of computer technology, optimization is expected to continue developing at great speed. These developments may include a deeper understanding of successful heuristics for combinatorial optimization problems through nonlinear programming approaches. They may also include digital simulations of some natural optimization processes. As many mysteries and open problems still exist in optimization, it will remain an area receiving great attention.
The linear program

$$\begin{aligned} \min\ & c^T x \\ \text{s.t.}\ & Ax = b, \\ & x \ge 0 \end{aligned} \qquad (1)$$


may have an optimal solution, be primal infeasible, or be dual infeasible for a particular set of data $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. In fact, for some data the problem can be both primal and dual infeasible, where (1) is called dual infeasible if the dual problem

$$\begin{aligned} \max\ & b^T y \\ \text{s.t.}\ & A^T y + s = c, \\ & s \ge 0 \end{aligned} \qquad (2)$$

corresponding to (1) is infeasible. The vector s contains the so-called dual slacks.

However, most methods for solving (1) assume that the problem has an optimal solution. This is true in particular for interior point methods. To overcome this limitation it has been suggested to solve the homogeneous and self-dual model

$$\begin{aligned} \min\ & 0 \\ \text{s.t.}\ & Ax - b\tau = 0, \\ & -A^T y + c\tau \ge 0, \\ & b^T y - c^T x \ge 0, \\ & x \ge 0,\ \tau \ge 0 \end{aligned} \qquad (3)$$

instead of (1). Clearly, (3) is a homogeneous LP, and it is self-dual, which essentially follows from the fact that its constraints form a skew-symmetric system. In (3), τ is a homogenizing variable, and the constraints represent primal feasibility, dual feasibility, and reversed weak duality.
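The self-duality claim can be checked mechanically. Ordering the variables as $(y, x, \tau)$, the constraint matrix of (3) is $M = \begin{pmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{pmatrix}$, and (3) is self-dual because $M^T = -M$. The plain-Python sketch below (no linear-algebra library assumed; names and data hypothetical) assembles M for given data and tests skew-symmetry.

```python
Matrix = list[list[float]]


def hsd_matrix(A: Matrix, b: list[float], c: list[float]) -> Matrix:
    """Constraint matrix of (3) in the variable order (y, x, tau)."""
    m, n = len(A), len(A[0])
    size = m + n + 1
    M = [[0.0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            M[i][m + j] = A[i][j]          # A x
            M[m + j][i] = -A[i][j]         # -A^T y
        M[i][m + n] = -b[i]                # -b tau
        M[m + n][i] = b[i]                 # b^T y
    for j in range(n):
        M[m + j][m + n] = c[j]             # c tau
        M[m + n][m + j] = -c[j]            # -c^T x
    return M


def is_skew_symmetric(M: Matrix) -> bool:
    return all(M[i][j] == -M[j][i] for i in range(len(M)) for j in range(len(M)))


M = hsd_matrix([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]], [3.0, 1.0], [1.0, -2.0, 4.0])
print(is_skew_symmetric(M))
```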

The homogeneous model (3) was first studied by A.J. Goldman and A.W. Tucker [2] in 1956, who proved that (3) always has a nontrivial solution $(x^*, \tau^*, y^*)$ satisfying

$$\begin{aligned} x_j^* s_j^* &= 0, & x_j^* + s_j^* &> 0, \quad \forall j, \\ \tau^* \kappa^* &= 0, & \tau^* + \kappa^* &> 0, \end{aligned} \qquad (4)$$

where $s^* := c\tau^* - A^T y^* \ge 0$ and $\kappa^* := b^T y^* - c^T x^* \ge 0$. A solution to (3) satisfying condition (4) is said to be a strictly complementary solution. Moreover, Goldman and Tucker showed that if $(x^*, \tau^*, y^*, s^*, \kappa^*)$ is any strictly complementary solution, then exactly one of the following two situations occurs:

• $\tau^* > 0$ if and only if (1) has an optimal solution. In this case $(x^*, y^*, s^*)/\tau^*$ is an optimal primal-dual solution to (1).

• $\kappa^* > 0$ if and only if (1) is primal or dual infeasible. In the case $b^T y^* > 0$ ($c^T x^* < 0$), (1) is primal (dual) infeasible.

The conclusion is that a strictly complementary solution to (3) provides all the information required, because in the case $\tau^* > 0$ an optimal primal-dual solution to (1) is trivially given by $(x, y, s) = (x^*, y^*, s^*)/\tau^*$. Otherwise, the problem is primal or dual infeasible. Therefore, the main algorithmic idea is to compute a strictly complementary solution to (3) instead of solving (1) directly.
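A minimal sketch of this case analysis, assuming a strictly complementary solution of (3) is already at hand. The function name `interpret_hsd_solution` and the tolerance `tol` (needed for floating-point data) are hypothetical choices, not part of the cited work.

```python
import numpy as np

def interpret_hsd_solution(x, tau, y, s, kappa, b, c, tol=1e-9):
    """Classify LP (1) from a strictly complementary solution (x, tau, y, s, kappa) of (3)."""
    if tau > tol:                      # tau* > 0: (1) has an optimal solution
        return "optimal", (x / tau, y / tau, s / tau)
    if kappa > tol:                    # kappa* > 0: (1) is primal or dual infeasible
        primal_inf = b @ y > tol       # b^T y* > 0  => primal infeasible
        dual_inf = c @ x < -tol        # c^T x* < 0  => dual infeasible
        if primal_inf and dual_inf:
            return "primal and dual infeasible", None
        return ("primal infeasible" if primal_inf else "dual infeasible"), None
    raise ValueError("tau and kappa are both zero: not a strictly complementary solution")
```

For example, for $\min\{x_1 + 2x_2 : x_1 + x_2 = 1,\ x \ge 0\}$ the scaled point $x^* = (2, 0)$, $\tau^* = 2$, $y^* = 2$, $s^* = (0, 2)$, $\kappa^* = 0$ is strictly complementary, and dividing by $\tau^*$ returns the optimal primal-dual solution $x = (1, 0)$, $y = 1$, $s = (0, 1)$.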

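Y. Ye, M.J. Todd, and S. Mizuno [6] suggested solving (3) by solving the problem

$$
\begin{aligned}
\min\ & n^0 z\\
\text{s.t. } & Ax - b\tau - \bar{b} z = 0,\\
& -A^T y + c\tau + \bar{c} z \ge 0,\\
& b^T y - c^T x + \bar{d} z \ge 0,\\
& \bar{b}^T y - \bar{c}^T x - \bar{d}\tau = -n^0,\\
& x \ge 0,\ \tau \ge 0,
\end{aligned}
\tag{5}
$$

where

$$
\begin{aligned}
\bar{b} &:= Ax^0 - b\tau^0, &
\bar{c} &:= -c\tau^0 + A^T y^0 + s^0,\\
\bar{d} &:= c^T x^0 - b^T y^0 + \kappa^0, &
n^0 &:= (x^0)^T s^0 + \tau^0 \kappa^0,
\end{aligned}
$$

and

$$
(x^0, \tau^0, y^0, s^0, \kappa^0) = (e, 1, 0, e, 1)
$$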
($e$ is an $n$-vector of all ones). It can be proved that problem (5) always has an optimal solution. Moreover, the optimal value is equal to zero, and it is easy to verify that if $(x, \tau, y, z)$ is an optimal strictly complementary solution to (5), then $(x, \tau, y)$ is a strictly complementary solution to (3). Hence, problem (5) can be solved using any method that generates an optimal strictly complementary solution, because the problem always has a solution. Note that, by construction, $(x, \tau, y, z) = (x^0, \tau^0, y^0, 1)$ is an interior feasible solution to (5). This implies that problem (1) can be solved by most feasible interior-point algorithms.
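The feasibility claim can be checked with a short NumPy sketch (hypothetical helper names `ytm_data` and `ytm_residuals`; dense random data assumed). It builds $\bar b$, $\bar c$, $\bar d$, $n^0$ from the starting point and evaluates the constraint rows of (5) at $(x^0, \tau^0, y^0, 1)$. The two equality rows vanish, while the two inequality rows equal $s^0 = e$ and $\kappa^0 = 1$, so the point is strictly interior.

```python
import numpy as np

def ytm_data(A, b, c):
    """Data of problem (5) built from the start (x0, tau0, y0, s0, kappa0) = (e, 1, 0, e, 1)."""
    m, n = A.shape
    x0, tau0, y0, s0, kappa0 = np.ones(n), 1.0, np.zeros(m), np.ones(n), 1.0
    b_bar = A @ x0 - b * tau0
    c_bar = -c * tau0 + A.T @ y0 + s0
    d_bar = c @ x0 - b @ y0 + kappa0
    n0 = x0 @ s0 + tau0 * kappa0
    return (x0, tau0, y0, s0, kappa0), b_bar, c_bar, d_bar, n0

def ytm_residuals(A, b, c, x, tau, y, z, b_bar, c_bar, d_bar, n0):
    """Constraint rows of (5); the last one is returned with its right-hand side moved left."""
    r1 = A @ x - b * tau - b_bar * z                  # must be 0
    r2 = -A.T @ y + c * tau + c_bar * z               # must be >= 0
    r3 = b @ y - c @ x + d_bar * z                    # must be >= 0
    r4 = b_bar @ y - c_bar @ x - d_bar * tau + n0     # must be 0
    return r1, r2, r3, r4

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)
c = rng.standard_normal(4)
start, b_bar, c_bar, d_bar, n0 = ytm_data(A, b, c)
x0, tau0, y0, s0, kappa0 = start
r1, r2, r3, r4 = ytm_residuals(A, b, c, x0, tau0, y0, 1.0, b_bar, c_bar, d_bar, n0)
print(np.allclose(r1, 0), r2, r3, np.isclose(r4, 0))
```

With this starting point $n^0 = n + 1$, so the objective of (5) starts at $n + 1$ and must be driven to its optimal value zero.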

X. Xu, P.-F. Hung, and Ye [4] suggest an alternative solution method which is also an interior point algorithm, but specially adapted to problem (3). The so-called homogeneous algorithm can be stated as follows:
1) Choose $(x^0, \tau^0, y^0, s^0, \kappa^0)$ such that $(x^0, \tau^0, s^0, \kappa^0) > 0$. Choose $\varepsilon_f, \varepsilon_g > 0$ and $\gamma \in (0, 1)$, and let $\eta := 1 - \gamma$.
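2) $k := 0$.
3) Compute:

$$
\begin{aligned}
r_p^k &:= b\tau^k - Ax^k,\\
r_d^k &:= c\tau^k - A^T y^k - s^k,\\
r_g^k &:= \kappa^k + c^T x^k - b^T y^k,\\
\mu^k &:= \frac{(x^k)^T s^k + \tau^k \kappa^k}{n+1}.
\end{aligned}
$$

4) If $\|(r_p^k; r_d^k; r_g^k)\| \le \varepsilon_f$ and $\mu^k \le \varepsilon_g$, then terminate.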
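5) Solve the linear equations

$$
\begin{aligned}
A d_x - b d_\tau &= \eta r_p^k,\\
A^T d_y + d_s - c d_\tau &= \eta r_d^k,\\
-c^T d_x + b^T d_y - d_\kappa &= \eta r_g^k,\\
S^k d_x + X^k d_s &= -X^k s^k + \gamma \mu^k e,\\
\kappa^k d_\tau + \tau^k d_\kappa &= -\tau^k \kappa^k + \gamma \mu^k,
\end{aligned}
$$

for $(d_x, d_\tau, d_y, d_s, d_\kappa)$, where $X^k := \operatorname{diag}(x^k)$ and $S^k := \operatorname{diag}(s^k)$.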
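6) For some $\theta \in (0, 1)$, let $\alpha^k$ be the optimal objective value of

$$
\begin{aligned}
\max\ & \theta\alpha\\
\text{s.t. } & \begin{pmatrix} x^k\\ \tau^k\\ s^k\\ \kappa^k \end{pmatrix} + \alpha \begin{pmatrix} d_x\\ d_\tau\\ d_s\\ d_\kappa \end{pmatrix} \ge 0,\\
& \alpha \le \theta^{-1}.
\end{aligned}
$$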

7) 
xktl xk d, 
gktl uk d, 
yt |= | yk | tak) dy 
k+l sk d, 
kth Kk dx 

8) k=k+1. 

9) goto 3) 


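The following facts can be proved about the algorithm:

$$
\begin{aligned}
r_p^{k+1} &= \left(1 - (1-\gamma)\alpha^k\right) r_p^k,\\
r_d^{k+1} &= \left(1 - (1-\gamma)\alpha^k\right) r_d^k,\\
r_g^{k+1} &= \left(1 - (1-\gamma)\alpha^k\right) r_g^k,
\end{aligned}
$$

and

$$
(x^{k+1})^T s^{k+1} + \tau^{k+1}\kappa^{k+1} = \left(1 - (1-\gamma)\alpha^k\right)\left((x^k)^T s^k + \tau^k \kappa^k\right),
$$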

which shows that the primal residuals ($r_p$), the dual residuals ($r_d$), the gap residual ($r_g$), and the complementary gap ($x^T s + \tau\kappa$) are all reduced strictly if $\alpha^k > 0$, and at the same rate. This shows that the sequence $(x^k, \tau^k, y^k, s^k, \kappa^k)$ generated by the algorithm converges towards an optimal solution to (3) (and the termination criteria in step 4) are ultimately reached). In principle, the initial point and the step size $\alpha^k$ should be chosen such that

$$
\min_j \left(x_j^k s_j^k,\ \tau^k \kappa^k\right) \ge \beta \mu^k, \quad k = 0, 1, \ldots,
$$

is satisfied for some $\beta \in (0, 1)$, because this guarantees that $(x^k, \tau^k, y^k, s^k, \kappa^k)$ converges towards a strictly complementary solution. Finally, it is possible to prove that the algorithm has complexity $O(n^{3.5} L)$, given an appropriate choice of the starting point and the algorithmic parameters.
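To see the algorithm and the contraction property at work, here is a dense NumPy sketch of one pass through steps 3 and 5 to 7. It is an illustration under stated assumptions, not the implementation of [4]: the function name `homogeneous_step` is hypothetical, $\gamma = 0.5$ and $\theta = 0.9$ are arbitrary defaults, the step-5 system is assembled and solved as one dense matrix (practical codes reduce it to smaller normal equations), and the termination test of step 4 and the neighborhood condition involving $\beta$ are omitted.

```python
import numpy as np

def homogeneous_step(A, b, c, x, tau, y, s, kappa, gamma=0.5, theta=0.9):
    """One pass through steps 3, 5, 6, 7 of the homogeneous algorithm (dense sketch)."""
    m, n = A.shape
    eta = 1.0 - gamma
    # Step 3: residuals and average complementarity
    rp = b * tau - A @ x
    rd = c * tau - A.T @ y - s
    rg = kappa + c @ x - b @ y
    mu = (x @ s + tau * kappa) / (n + 1)
    # Step 5: unknowns ordered as [dx (n), dtau (1), dy (m), ds (n), dkappa (1)]
    N = 2 * n + m + 2
    ix, it, iy = slice(0, n), n, slice(n + 1, n + 1 + m)
    isl, ik = slice(n + 1 + m, 2 * n + 1 + m), 2 * n + 1 + m
    K, rhs = np.zeros((N, N)), np.zeros(N)
    r = 0
    # Row block 1: A dx - b dtau = eta * rp
    K[r:r + m, ix] = A
    K[r:r + m, it] = -b
    rhs[r:r + m] = eta * rp
    r += m
    # Row block 2: A^T dy + ds - c dtau = eta * rd
    K[r:r + n, iy] = A.T
    K[r:r + n, isl] = np.eye(n)
    K[r:r + n, it] = -c
    rhs[r:r + n] = eta * rd
    r += n
    # Row 3: -c^T dx + b^T dy - dkappa = eta * rg
    K[r, ix] = -c
    K[r, iy] = b
    K[r, ik] = -1.0
    rhs[r] = eta * rg
    r += 1
    # Row block 4: S dx + X ds = -X s + gamma * mu * e
    K[r:r + n, ix] = np.diag(s)
    K[r:r + n, isl] = np.diag(x)
    rhs[r:r + n] = -x * s + gamma * mu
    r += n
    # Row 5: kappa dtau + tau dkappa = -tau * kappa + gamma * mu
    K[r, it] = kappa
    K[r, ik] = tau
    rhs[r] = -tau * kappa + gamma * mu
    d = np.linalg.solve(K, rhs)
    dx, dtau, dy, ds, dkappa = d[ix], d[it], d[iy], d[isl], d[ik]
    # Step 6: alpha = theta * (largest step keeping x, tau, s, kappa >= 0), capped at 1
    v = np.concatenate([x, [tau], s, [kappa]])
    dv = np.concatenate([dx, [dtau], ds, [dkappa]])
    neg = dv < 0
    alpha = min(theta * np.min(-v[neg] / dv[neg]), 1.0) if neg.any() else 1.0
    # Step 7: update all variables with the common step length
    return (x + alpha * dx, tau + alpha * dtau, y + alpha * dy,
            s + alpha * ds, kappa + alpha * dkappa, alpha)
```

On a tiny LP, one call from the starting point $(e, 1, 0, e, 1)$ reduces $r_p$, $r_d$, $r_g$, and $x^T s + \tau\kappa$ by the same factor $1 - (1 - \gamma)\alpha^k$, matching the facts stated above; the cap $\alpha^k \le 1$ in step 6 keeps this factor nonnegative.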

Further details about the homogeneous algorithm can be found in [3,5]. Issues related to implementing the homogeneous algorithm are discussed in [1,4].


See also 


> Entropy Optimization: Interior Point Methods 

> Interior Point Methods for Semidefinite 
Programming 

> Linear Programming: Interior Point Methods 
> Linear Programming: Karmarkar Projective
Algorithm 

> Potential Reduction Methods for Linear 
Programming 

> Sequential Quadratic Programming: Interior Point 
Methods for Distributed Optimal Control Problems 

> Successive Quadratic Programming: Solution by 
Active Sets and Interior Point Methods 
Let $V$ be an $\ell$-dimensional affine space over the field $K$. An arrangement of hyperplanes, $\mathcal{A}$, is a finite collection of codimension-one affine subspaces in $V$ [5].


Some Examples 


1) A subset of the coordinate hyperplanes is called a Boolean arrangement.

2) An arrangement is in general position if at each point it is locally Boolean.

3) The braid arrangement consists of the hyperplanes $\{x_i = x_j : 1 \le i < j \le \ell\}$. It is the set of reflecting hyperplanes of the symmetric group on $\ell$ letters.

4) The reflecting hyperplanes of a finite reflection group form a reflection arrangement.


Combinatorics 


An edge $X$ of $\mathcal{A}$ is a nonempty intersection of elements of $\mathcal{A}$. Let $L(\mathcal{A})$ be the set of edges partially ordered by reverse inclusion. Then $L$ is a geometric semilattice with minimal element $V$, rank given by codimension, and maximal elements of the same rank, $r(\mathcal{A})$. The Moebius function on $L$ is defined by $\mu(V) = 1$ and, for $X > V$, $\sum_{V \le Y \le X} \mu(Y) = 0$. The characteristic polynomial of $\mathcal{A}$ is $\chi(\mathcal{A}, t) = \sum_{X \in L} \mu(X)\, t^{\dim X}$. The $\beta$-invariant of $\mathcal{A}$ is $\beta(\mathcal{A}) = (-1)^{r(\mathcal{A})} \chi(\mathcal{A}, 1)$. For a generic arrangement of $n$ hyperplanes, $\chi(\mathcal{A}, t) = \sum_{k=0}^{\ell} (-1)^k \binom{n}{k} t^{\ell - k}$. For the braid arrangement, $\chi(\mathcal{A}, t) = t(t-1)(t-2)\cdots(t-(\ell-1))$. Similar factorizations hold for all reflection arrangements, involving the (co)exponents of the reflection group. Given a $p$-tuple of hyperplanes $S = (H_1, \ldots, H_p)$, let $\cap S = H_1 \cap \cdots \cap H_p$, and note that $\cap S$ may be empty. We say that $S$ is dependent if $\cap S \ne \emptyset$ and $\operatorname{codim}(\cap S) < |S|$. Let $E(\mathcal{A})$ be the exterior algebra on symbols $(H)$ for $H \in \mathcal{A}$, where the product is juxtaposition. Define $\partial: E \to E$ by $\partial 1 = 0$, $\partial(H) = 1$, and, for $p \ge 2$, $\partial(H_1 \cdots H_p) = \sum_{k=1}^{p} (-1)^{k-1} H_1 \cdots \widehat{H_k} \cdots H_p$. Let $I(\mathcal{A})$ be the ideal of $E(\mathcal{A})$ generated by $\{S : \cap S = \emptyset\} \cup \{\partial S : S \text{ is dependent}\}$. The Orlik-Solomon algebra of $\mathcal{A}$ is $A(\mathcal{A}) = E(\mathcal{A})/I(\mathcal{A})$. See also connections with matroid theory [3].
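For a central arrangement (every hyperplane through the origin), $\chi(\mathcal{A}, t)$ can also be computed without building $L(\mathcal{A})$ explicitly, via Whitney's formula $\chi(\mathcal{A}, t) = \sum_{\mathcal{B} \subseteq \mathcal{A}} (-1)^{|\mathcal{B}|} t^{\ell - \operatorname{rank} \mathcal{B}}$, which is equivalent to the Moebius-function definition above. The sketch below is not from the article: the function name is hypothetical, hyperplanes are given by real normal vectors, and the enumeration is exponential in the number of hyperplanes. It reproduces the braid-arrangement factorization for $\ell = 3$.

```python
from itertools import combinations

import numpy as np

def char_poly_central(normals, ell):
    """Coefficients of chi(A, t) for a central arrangement in R^ell.

    Whitney's formula: chi(A, t) = sum over subsets B of (-1)^|B| * t^(ell - rank B),
    where rank B is the rank of the normal vectors in B; coeffs[k] multiplies t^k.
    """
    coeffs = [0] * (ell + 1)
    for p in range(len(normals) + 1):
        for B in combinations(normals, p):
            rank = np.linalg.matrix_rank(np.array(B, dtype=float)) if B else 0
            coeffs[ell - rank] += (-1) ** p
    return coeffs

braid3 = [(1, -1, 0), (1, 0, -1), (0, 1, -1)]   # x1 = x2, x1 = x3, x2 = x3
print(char_poly_central(braid3, 3))   # [0, 2, -3, 1], i.e. t^3 - 3t^2 + 2t = t(t-1)(t-2)
```

Evaluating at $t = -1$ gives $\chi = -6$, so $(-1)^3 \chi(\mathcal{A}, -1) = 6 = 3!$ real chambers, one for each ordering of $x_1, x_2, x_3$.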


Divisor 

The divisor of $\mathcal{A}$ is the union of the hyperplanes, $N(\mathcal{A})$. If $K = \mathbb{R}$ or $K = \mathbb{C}$, then $N$ has the homotopy type of a wedge of $\beta(\mathcal{A})$ spheres of dimension $r(\mathcal{A}) - 1$ [4].
The singularities of N are not isolated. The divisor of 
a general position arrangement has normal crossings, 
but this is not true for arbitrary A. Blowing up N along 
all edges where it is not locally a product of arrange- 
ments yields a normal crossing divisor. 


Complement 


The complement of $\mathcal{A}$ is $M(\mathcal{A}) = V - N(\mathcal{A})$.

1) If $K = \mathbb{F}_q$, then $M$ is a finite set of cardinality $|M| = \chi(\mathcal{A}, q)$.

2) If $K = \mathbb{R}$, then $M$ is a disjoint union of open convex sets (chambers) whose number is $(-1)^{\ell} \chi(\mathcal{A}, -1)$. If $r(\mathcal{A}) = \ell$, $M$ contains $\beta(\mathcal{A})$ chambers with compact closure [7].

3) If $K = \mathbb{C}$, then $M$ is an open complex (Stein) manifold of the homotopy type of a finite CW complex. Its cohomology is torsion-free and its Poincaré polynomial is $\operatorname{Poin}(M, t) = (-t)^{\ell} \chi(\mathcal{A}, -t^{-1})$. The product structure is determined by the isomorphism of graded algebras $H^*(M) \simeq A(\mathcal{A})$. The fundamental group of $M$ has an effective presentation, but the higher homotopy groups of $M$ are not known in general. The complement of a Boolean arrangement is a complex torus. In a general position arrangement of $n > \ell$ hyperplanes, $M$ has nontrivial higher homotopy groups. For the braid arrangement, $M$ is called the pure braid space and its higher homotopy groups are trivial. The symmetric group acts freely on $M$ with orbit space the braid space, whose fundamental group is the braid group. The quotient of the divisor by the symmetric group is called the discriminant, which has connections with singularity theory.
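The counting statements in 1) and 2) are easy to check on small cases. The sketch below (hypothetical function names) evaluates the generic-arrangement polynomial from the Combinatorics paragraph to count all chambers and the bounded chambers of $n$ lines in general position in the real plane, and counts $|M|$ for the braid arrangement over $\mathbb{F}_q$ ($q$ prime, so that $\mathbb{F}_q = \mathbb{Z}/q$) by brute force; the latter should equal $\chi(\mathcal{A}, q) = q(q-1)(q-2)$ for $\ell = 3$.

```python
from itertools import product
from math import comb

def generic_char_poly(n, ell, t):
    """chi(A, t) = sum_{k=0}^{ell} (-1)^k C(n, k) t^(ell - k) for n generic hyperplanes."""
    return sum((-1) ** k * comb(n, k) * t ** (ell - k) for k in range(ell + 1))

def real_chamber_counts(n, ell):
    """(all chambers, bounded chambers) over R, assuming n >= ell so that r(A) = ell."""
    chambers = (-1) ** ell * generic_char_poly(n, ell, -1)
    bounded = (-1) ** ell * generic_char_poly(n, ell, 1)   # beta(A)
    return chambers, bounded

def braid_complement_size(ell, q):
    """|M| over F_q: points of F_q^ell whose coordinates are pairwise distinct."""
    return sum(1 for p in product(range(q), repeat=ell) if len(set(p)) == ell)

print(real_chamber_counts(3, 2))                 # (7, 1): three lines, one bounded triangle
print(braid_complement_size(3, 5), 5 * 4 * 3)    # 60 60
```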


Ball Quotients 


Examples of algebraic surfaces whose universal cover is 
the complex ball were constructed as ‘Kummer’ covers 
of the projective plane branched along certain arrange- 
ments of projective lines, [2]. 


Logarithmic Forms 


For $H \in \mathcal{A}$ choose a linear polynomial $\alpha_H$ with $H = \ker \alpha_H$, and let $Q(\mathcal{A}) = \prod_{H \in \mathcal{A}} \alpha_H$. Let $\Omega^p[V]$ denote all global regular (i.e., polynomial) $p$-forms on $V$. Let $\Omega^p(V)$ denote the space of all global rational $p$-forms on $V$. The space $\Omega^p(\mathcal{A})$ of logarithmic $p$-forms with poles along $\mathcal{A}$ is

$$
\Omega^p(\mathcal{A}) = \{\omega \in \Omega^p(V) : Q\omega \in \Omega^p[V],\ Q(d\omega) \in \Omega^{p+1}[V]\}.
$$

The arrangement is free if $\Omega^1(\mathcal{A})$ is a free module over the polynomial ring. A free arrangement $\mathcal{A}$ has integer exponents $\{b_1, \ldots, b_\ell\}$ such that $\chi(\mathcal{A}, t) = \prod_{i=1}^{\ell} (t - b_i)$. Reflection arrangements are free. This explains the factorization of their characteristic polynomials.


Hypergeometric Integrals 


Certain rank-one local system cohomology groups of $M$ may be identified with spaces of hypergeometric integrals [1]. If the local system is suitably generic, these cohomology groups may be computed using the algebra $A(\mathcal{A})$. Only the top cohomology group is nonzero, and it has dimension $\beta(\mathcal{A})$. See [6] for connections with the representation theory of Lie algebras and quantum groups, and with the Knizhnik-Zamolodchikov differential equations of physics.


See also 


> Hyperplane Arrangements in Optimization 
A finite set $S$ of hyperplanes in $\mathbb{R}^d$ defines a dissection of $\mathbb{R}^d$ into connected sets of various dimensions. We call this dissection the arrangement $\mathcal{A}(S)$ of $S$.

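Given a vector $\eta = (\eta_1, \ldots, \eta_d) \in \mathbb{R}^d - \{0\}$ and a number $\eta_0 \in \mathbb{R}$, we may define a hyperplane $H$ and associated halfspaces $H^-$, $H^+$ by

$$
\begin{aligned}
H &= \{x \in \mathbb{R}^d : \eta \cdot x = \eta_0\},\\
H^- &= \{x \in \mathbb{R}^d : \eta \cdot x < \eta_0\},\\
H^+ &= \{x \in \mathbb{R}^d : \eta \cdot x > \eta_0\}.
\end{aligned}
$$

Clearly, $H$, $H^-$, $H^+$ are disjoint and $H \cup H^- \cup H^+ = \mathbb{R}^d$.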

We may now specify the location of a point relative to the set of hyperplanes $S = \{H_1, \ldots, H_n\}$. For a point $p$ and $1 \le j \le n$, define

$$
s_j(p) = \begin{cases} -1 & \text{if } p \in H_j^-,\\ 0 & \text{if } p \in H_j,\\ +1 & \text{if } p \in H_j^+. \end{cases}
$$

The vector $s(p) = (s_1(p), \ldots, s_n(p))$ is called the position vector of $p$.


Clearly there are at most $3^n$ possible position vectors; however, in general most of these will not occur. We say that points $p$ and $q$ lie on the same face if $s(p) = s(q)$. The nonempty set of points with position vector $r$ is called the face $f(r)$:

$$
f(r) = \{p \in \mathbb{R}^d : s(p) = r\}.
$$

The nonempty sets of this form are called the faces of the arrangement $\mathcal{A}(S)$. The position vector of a face $f(r) = g$ is defined to be $r$,

$$
s(f(r)) = r.
$$
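Position vectors are straightforward to compute. The following sketch (hypothetical names; a small tolerance `tol`, an assumption for floating-point input, decides membership in $H_j$) evaluates $s(p)$ for hyperplanes given by normal vectors $\eta$ and offsets $\eta_0$, and groups sample points by the face they lie on.

```python
import numpy as np

def position_vector(p, normals, offsets, tol=1e-12):
    """s(p): entry j is -1, 0 or +1 as p lies in H_j^-, on H_j, or in H_j^+.

    Hyperplane j is {x : normals[j] . x = offsets[j]}.
    """
    vals = np.asarray(normals, float) @ np.asarray(p, float) - np.asarray(offsets, float)
    signs = np.where(np.abs(vals) <= tol, 0.0, np.sign(vals))
    return tuple(int(v) for v in signs)

# Two coordinate lines in R^2: H_1 = {x_1 = 0}, H_2 = {x_2 = 0}.
normals, offsets = [[1, 0], [0, 1]], [0, 0]
points = [(1, 2), (3, 1), (0, 3), (0, 0), (-1, -5)]
faces = {}
for p in points:
    faces.setdefault(position_vector(p, normals, offsets), []).append(p)
print(faces)
```

Here $(1, 2)$ and $(3, 1)$ share the cell with position vector $(1, 1)$, $(0, 3)$ lies on the edge $(0, 1)$, and the origin is the vertex $(0, 0)$.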


A face $f$ is called a $k$-face if its dimension is $k$. Special names are used to denote $k$-faces for special values of $k$: a 0-face is called a vertex, a 1-face is called an edge, a $(d-1)$-face is called a facet, and a $d$-face is called a cell. A face $f$ is said to be a subface of another face $g$ if the dimension of $f$ is one less than the dimension of $g$ and $f$ is contained in the boundary of $g$; it follows that $s_i(f) = 0$ unless $s_i(f) = s_i(g)$, for $1 \le i \le n$. If $f$ is a subface of $g$, then we also say that $f$ and $g$ are incident (upon each other) or that they define an incidence.

An arrangement $\mathcal{A}(S)$ of $n \ge d$ hyperplanes is called simple if any $d$ hyperplanes of $S$ have a unique point in common and if any $d + 1$ hyperplanes have no point in common. If $n < d$, we say that $\mathcal{A}(S)$ is simple if the common intersection of the $n$ hyperplanes is a $(d - n)$-flat. For more details see [3,4] and [5].
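The definition of a simple arrangement translates into rank tests on the hyperplane data: $d$ hyperplanes meet in exactly one point when their normals are linearly independent, and $d + 1$ such hyperplanes have no common point when the augmented matrix $[\,N \mid \eta_0\,]$ of their normals and offsets has rank $d + 1$. A brute-force sketch (hypothetical function name; exponential in $n$; assumes `numpy.linalg.matrix_rank` decides ranks correctly for the given data):

```python
from itertools import combinations

import numpy as np

def is_simple(normals, offsets):
    """Check simplicity of the arrangement {x : normals[j] . x = offsets[j]} in R^d."""
    N = np.asarray(normals, float)
    aug = np.column_stack([N, np.asarray(offsets, float)])
    n, d = N.shape
    rank = np.linalg.matrix_rank
    if n < d:
        return bool(rank(N) == n)            # common intersection is a (d - n)-flat
    for idx in combinations(range(n), d):
        if rank(N[list(idx)]) != d:          # any d hyperplanes: exactly one common point
            return False
    for idx in combinations(range(n), d + 1):
        if rank(aug[list(idx)]) != d + 1:    # any d + 1 hyperplanes: no common point
            return False
    return True
```

For instance, the lines $x_1 = 0$, $x_2 = 0$, $x_1 + x_2 = 1$ form a simple arrangement, whereas $x_1 = 0$, $x_2 = 0$, $x_1 + x_2 = 0$ (three concurrent lines) do not.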

As an application of hyperplane arrangements in algorithm design for optimization problems, see [1]. There, the problem of minimizing the Euclidean distance function on $\mathbb{R}^n$ subject to $m$ equality constraints and upper and lower bounds (box constraints) is considered. A parametric characterization in $\mathbb{R}^m$ of the family of solutions to this problem is provided, thereby showing equivalence with a problem of search in an arrangement of hyperplanes in $\mathbb{R}^m$. This characterization and the technique for constructing arrangements due to H. Edelsbrunner, J. O'Rourke and R. Seidel are used to develop an exact algorithm for the problem. The algorithm is strongly polynomial, running in time $O(n^m)$ for each fixed $m$.


See also 


> Hyperplane Arrangements 
Introduction


Kinetic phenomena drive the macroscopic behavior of 
biological, chemical, and physical systems. The lack of 
mechanistic understanding of these kinetic phenomena 
is still the major bottleneck for a more widespread ap- 
plication of model-based techniques in process design, 
optimization, and control. In recent years, kinetic phe- 
nomena have become of increasing importance given 
the rapidly developing capabilities for the numerical 
treatment of more complex models on the one hand 
and the need for predictive models on the other. 
Despite this demand, kinetic modeling of process 
systems is still a challenge. This contribution presents 
systematic work processes to derive and validate mod- 
els that capture the underlying physicochemical mech- 
anisms of an observed behavior. The work process of 
model-based experimental analysis (or MEXA for short) is introduced in the next section. The key factor in the procedure is an incremental strategy for model structure refinement tailored for the identification of reaction kinetics and transport phenomena [30]. While identification of kinetic models from experimental data can, in principle, be performed by application of standard statistical tools of nonlinear regression [2] and model discrimination [39], this direct approach generally requires solving a large number of NLP or even MINLP problems [16,21,34,37]; this may be computationally prohibitive and, in particular, does not reflect the
underlying physics. In contrast, the incremental iden- 
tification approach discussed here presents a physically 
motivated and adapted divide-and-conquer strategy to 
the complex optimization problem of kinetic model 
identification. Applications of this approach in the ar- 
eas of (bio)chemical reactions [6,12,13,15,32], multi- 
component diffusion [3,5], and heat transfer in fluid 
flow [22,25] are discussed. 


Methods and Applications 
Model-Based Experimental Analysis 


The typical work flow of the MEXA procedure is as follows (Fig. 1):

1. An initial experiment with a suitable measurement 
system is designed on the basis of a priori knowledge 
and intuition. 

2. A first mathematical model of experiment and mea- 
surement system is proposed. 

3. Numerical simulation studies are performed to ex- 
plore the expected behavior of the experiment. 

4. The model is then employed for rigorous experimen- 
tal design [41] to gain maximum information with 
respect to the goal of the investigation. 


Identification Methods for Reaction Kinetics and Transport, Figure 1
Model-based experimental analysis [30]
[Figure: work flow linking a priori knowledge and intuition, experimental design, the experiment with its measurements, and the formulation and solution of inverse problems (sensor calibration, model selection, confidence regions), with loops for iterative improvement of the experiment and iterative model refinement.]


5. The designed experiment is performed and at least 
some of the variables of interest are observed using 
appropriate measurement techniques. 

6. Formulation and solution of inverse problems refers 
to combinations of state, parameter, and unknown 
input estimation as well as model structure identifi- 
cation and selection. 

7. Typically, the first model does not reflect the stud- 
ied phenomena with sufficient detail and accuracy. 
Therefore, iterative model refinement, intertwined 
with iterative improvement of the experimental and 
measurement techniques, must be carried out to im- 
prove the predictive capabilities of the model based 
on the extended understanding gained. 

Work processes consisting of the steps of design of experiments, data analysis, and modeling date back to at least the 1970s [26]. However, the development and benchmarking of such work processes has only recently been formulated as an important research objective, e. g., by the Collaborative Research Center CRC 540 “Model-based Experimental Analysis in Fluid Multi-Phase Reactive Systems” (http://www.sfb540.rwth-aachen.de/) at RWTH Aachen University as well as by Asprey and Macchietto [1]. The power of these work processes depends on the specific strategies employed for systematically improving both the model structure and the experimental setup in every refinement step during model identification. While experimental design is the focus of the work of Asprey and Macchietto [1], the research in CRC 540 is complementary and emphasizes the strategy for model structure refinement as discussed in what follows.


Incremental vs. Simultaneous Model Identification 


Incremental Modeling and Identification The key 
idea of the incremental approach for model structure 
refinement is to follow the incremental steps of system- 
atic model development [29] also in model identification 
(Fig. 2). 

Therefore, the main steps of model development 
and their connection to incremental identification are 
outlined next. 


Model B In model development, balance envelopes and their interactions are determined first, the spatiotemporal resolution of the model is decided, and the extensive quantities $x$ to be balanced are selected. The balance equation is formulated as a sum of generalized fluxes, e. g.,

$$
\frac{\partial x}{\partial t} = -\nabla \cdot J_f + J_s, \tag{1}
$$

$$
\frac{dx}{dt} = A(x) + B(x)\, w. \tag{2}
$$

Equation (1) exemplifies a balance for a distributed quantity $x$ flowing with the flux $J_f$ and being generated/consumed according to the source/sink term $J_s$. Note that further generalized fluxes may arise through initial or boundary conditions. A lumped quantity is balanced in Eq. (2), where $A$, $B$ are matrix functions of appropriate dimensions describing, e.g., inter- and intraphase transport and source/sink terms. Note that no constitutive equations are considered yet to specify the generalized fluxes $J(\cdot)$¹ (here: $J_f$, $J_s$, $w$) as a function of the intensive thermodynamic state variables.

In incremental model identification, the unknown 
generalized fluxes J(-) are estimated directly from 
the balance equation. For this purpose, measure- 
ments of the states x(-) with sufficient resolution in 
time t and/or space z are assumed. The unknown 
flux J(-) in the balance equation is then determined 
as a function of time and space coordinates — with- 
out the need for specifying a constitutive equation. 


Model BF In model development, constitutive equations are specified for each flux term in the balances on the next decision level:

$$
J(\cdot) = J\big(x(\cdot), \nabla x(\cdot), \ldots, k(\cdot)\big). \tag{3}
$$

This could be, e. g., correlations for interfacial fluxes or reaction rates.


¹The $(\cdot)$-argument summarizes the spatial and/or temporal dependency of the quantity.


Identification Methods for Reaction Kinetics and Transport, Figure 2 
Incremental modeling and identification [30] 


Similarly, in incremental model identification on level BF, flux model candidates (3) are selected or generated to relate the flux to rate coefficients, to measured states, and to their derivatives. The flux estimates obtained on level B are now interpreted as inferential measurements. These can then be used, together with the real measurements, to determine a rate coefficient $k(\cdot)$ as a function of time and space. Often, the flux model can directly be solved for the rate coefficient function $k(\cdot)$.


Model BFR In model development, the rate coefficients introduced in the correlations on the level BF, such as a reaction rate or heat and mass transfer coefficients, often themselves depend on the states. Consequently, a model relating rate coefficients and states has to be chosen on yet another level, BFR:

$$
k(\cdot) = k\big(x(\cdot), \nabla x(\cdot), \ldots, \theta\big). \tag{4}
$$


This cascaded decision process can continue as long as the submodels considered involve not only constant parameters $\theta$ but also functions of the states. Mirroring this step in incremental model identification, a model for the rate coefficients is identified. This model (4) is assumed to depend only on the measured states and constant parameters. These parameters $\theta$ can be computed from the estimated rate coefficients $k(\cdot)$ and the measured states $x(\cdot)$ by solving an algebraic regression problem.
Such a structured approach during model identification
renders the individual decisions in the modeling and 
identification process completely transparent: the mod- 
eler is in full control of the model refinement process. 


Simultaneous Model Identification In the previous 
section, it was shown that there exists a natural hi- 
erarchy in models of kinetic phenomena. Classic ap- 
proaches to model identification, however, neglect this 
inherent structure. These simultaneous approaches as- 
sume that the model structure is correct and consider 
only the fully specified model (Fig. 2). Models for the 
flux expression (Model BF) and the phenomenologi- 
cal coefficients (Model BFR) have to be specified a pri- 
ori. 

In practical situations, these models are initially un- 
certain. Now, all assumptions on the process will si- 
multaneously influence the results of the model iden- 
tification procedure. The estimates may be biased if the 
parameter estimation is based on a model containing 
structural errors [42]. The theoretically optimal prop- 
erties of a maximum likelihood approach [2] are there- 
fore lost in the presence of structural model mismatch. 
Initialization and convergence may be difficult since the 
whole problem is solved in one step [18]. More impor- 
tantly, it may be difficult in a simultaneous approach 
to identify which part of the model introduced the er- 
ror. 

Furthermore, several candidate model structures 
may exist for each kinetic phenomenon. The aggre- 
gation of such submodels with the balance equations 
will inevitably lead to a multitude of candidate mod- 
els. Alternatively, general approximation schemes like 
neural nets can be used, often leading to several hun- 
dred unknown parameters. Both approaches may be 
prohibitive due to computational cost, especially when 
more complex or even distributed parameter systems 
are considered. 


Discussion of Identification Approaches The incre- 
mental approach splits the identification procedure into 
a sequence of inverse problems, thereby reducing un- 
certainty and computational complexity. It thus has the 
potential to overcome a number of the disadvantages of 
the simultaneous approach: 
• Avoid combinatorial complexity: Rather than postulating large numbers of nested model structures, a structured, fully transparent process is used in the incremental model refinement strategy. An uncontrolled combinatorial growth of the number of model candidates is avoided.

• Reduce uncertainty: In the incremental approach, any decision on the model structure relates to a single physicochemical phenomenon. Submodel selection is guided by the previous estimation step, which provides input-output data inferred from the measurements. Identifiability can also be assessed more easily on the level of the submodel.

• Computational advantages: The decomposition inherent in incremental model refinement avoids the solution of many difficult output least-squares problems with (partial-)differential-algebraic constraints and potentially large data sets. Rather, an often linear inverse problem must be solved first. All the following problems are nonlinear regression problems with algebraic constraints, regardless of the complexity of the overall model. This decomposition not only facilitates initialization and convergence, but it also allows for incremental testing of model validity at every decision level for the submodels. Largely intractable estimation problems may become computationally feasible.

Still, it should be kept in mind that the incremental and the simultaneous methods were derived for different purposes: the incremental approach is aimed at gross elimination of candidate models and/or systematic derivation of suitable candidate model structures, whereas the simultaneous approach gives the best parameter estimates once the correct model structure is known [6].

Multistep approaches to model identification have 
been applied rather intuitively in the past. The sequence 
of flux estimation and parameter regression is, e. g., 
commonly employed in reaction kinetics as the so- 
called differential method [19]. More recently, a two- 
step approach has been applied for the hybrid mod- 
eling of fermentation processes [36,38]. First reaction 
fluxes are estimated from measured data, then neural 
networks and fuzzy models are employed to correlate 
the fluxes with the measurements. Mahoney et al. [28] 
estimate the crystal growth rate directly from the popu- 
lation balance equations using a method of characteris- 
tics approach and indicate the possibility of correlating 
it with solute concentration next. 
Though the incremental refinement approach is rather intuitive, a successful implementation requires tailored ingredients such as
• high-resolution field measurement techniques for state variables,
• algorithms for model-free flux estimation by inversion of the balance equations,
• methodologies for the generation, assessment, and selection of the most suitable model structures, and
• model-based experimental design methods.
A detailed discussion of these areas in relation to in- 
cremental model identification can be found in [30]. 
Various aspects are highlighted in the following case 
studies. Here, the progress in the development of the 
incremental model identification approach is reported 
for challenging kinetic modeling problems of gradually 
increasing complexity from (bio)chemical reactions to 
diffusion in liquids and to heat transfer at falling liquid 
films. In addition, the incremental approach has been 
successful in the identification of hybrid process mod- 
els [24]. 


Case Studies 


(Bio)chemical Reaction Kinetics The identification 
of the mechanism and kinetics of chemical reactions is 
one of the most relevant and still not yet fully satisfacto- 
rily solved tasks in process systems modeling [8]. In bi- 
ological systems, the situation is often even more severe 
due to the complexity of living systems. The incremen- 
tal identification approach has been applied for a va- 
riety of reaction systems [6,12,13,15,32]. Here, selected 
features are discussed to elucidate the general proper- 
ties of this problem class. 


Model B: Reaction flux estimation in lumped systems For illustration, we assume a well-mixed and isothermal homogeneous reaction system. The balance equation for the mole number $n_i$ of species $i$ is then

$$
\frac{dn_i}{dt} = \dot{n}_i^{\mathrm{in}} - \dot{n}_i^{\mathrm{out}} + f_i^R, \qquad i = 1, \ldots, n_c, \tag{5}
$$

where $\dot{n}_i^{\mathrm{in}}$ and $\dot{n}_i^{\mathrm{out}}$ are, respectively, the molar flow rates into and out of the reactor and $f_i^R$ is the unknown reaction flux of species $i$. It is worth noting that the fluxes enter the balance equations linearly and the equations are decoupled for each species.


All reaction fluxes $f_i^R$ can thus be estimated individually by numerical differentiation of concentration data for each measured species on level B from material balances only. Tikhonov-Arsenin filtering [31] or smoothing splines [6], with the regularization parameter chosen by the L-curve or by generalized cross-validation, have been shown to give reliable estimates.

Model BF: Estimation of reaction rates and stoichiometry If the reaction stoichiometry is unknown, target factor analysis (TFA) [11] is used to test possible stoichiometries and to determine the number of relevant reactions. The reaction rates $r(t)$ can then be calculated from the typically nonsquare linear equation system relating reaction fluxes $f^R(t)$ and rates by the stoichiometric matrix $N$:

$$
f^R(t) = V(t)\, N^T r(t), \tag{6}
$$

with $V(t)$ denoting the reactor volume.

Model BFR: Estimation of kinetic coefficients On the 
next level, concentrations are determined either 
from smoothed measurements using nonparamet- 
ric methods [40] or unmeasured concentrations are 
reconstructed from stoichiometry and mass bal- 
ances [13]. Since a complete set of concentration 
and rate data is now available, candidate reaction 
rate laws of the general form 


$$
r(t) = m(c(t), \theta) \tag{7}
$$


can now be discriminated by nonlinear algebraic re- 
gression [42]. 
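The three levels can be illustrated end to end on synthetic data. The sketch below is a hypothetical example, not taken from the cited studies: a batch reactor (no in/out flows, constant volume $V$) with the series reaction A → B → C and mass-action kinetics. Noise-free "measurements" are generated from the known solution; plain finite differences (`numpy.gradient`) stand in for the regularized differentiation used in practice, least squares solves the nonsquare system (6), and a one-parameter linear regression plays the role of the rate-law fit (7).

```python
import numpy as np

# Hypothetical batch experiment: A -> B -> C, first-order steps, constant volume V.
# Rows of N are reactions, columns are species (A, B, C).
N = np.array([[-1.0, 1.0, 0.0],
              [0.0, -1.0, 1.0]])
V, k1, k2 = 2.0, 0.9, 0.3
t = np.linspace(0.0, 6.0, 601)
cA = np.exp(-k1 * t)
cB = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
cC = 1.0 - cA - cB
n = V * np.vstack([cA, cB, cC])       # mole numbers: stand-in for measurements

# Model B, Eq. (5) without in/out flows: f_i^R = dn_i/dt, species by species.
f = np.gradient(n, t, axis=1, edge_order=2)

# Model BF, Eq. (6): solve the nonsquare system f = V N^T r for r(t) by least squares.
r, *_ = np.linalg.lstsq(V * N.T, f, rcond=None)

# Model BFR, Eq. (7): fit candidate laws r_1 = k1 * cA and r_2 = k2 * cB (linear in k).
k1_est = (cA @ r[0]) / (cA @ cA)
k2_est = (cB @ r[1]) / (cB @ cB)
print(f"k1 = {k1_est:.4f}, k2 = {k2_est:.4f}")
```

With noisy data, the finite-difference step would have to be replaced by a regularized estimate as described above; the subsequent levels are unchanged, and each of them only involves algebraic problems.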


Model identification may not immediately result in reli- 
able model structures and parameters because of a lack 
of information content in the data. Iterative improve- 
ment with optimally chosen experimental conditions as 
suggested in the MEXA work process can then be em- 
ployed [13]. 

The incremental identification method has been 
worked out for arbitrary reaction schemes with re- 
versible or irreversible as well as dependent or inde- 
pendent reactions. The minimum type of concentration 
measurements required to guarantee identifiability has 
been assessed theoretically. 

The incremental identification strategy has been 
used in a benchmark study considering a homogeneous 
reaction system [13]. Computational effort for model 
identification could be reduced by almost two orders of
magnitude using the structured search in the incremen- 
tal method. The inclusion of data-driven model sub- 
structures in hybrid models is straightforward, as al- 
ready exemplified for neural networks [15] and sparse 
grids [14]. The basic framework can easily be extended 
to nonisothermal systems, and even multiphase trans- 
port has been considered [12,23]. An application study 
to a biochemical reaction system is presented in [32]. 


Multicomponent Diffusion While phase equilib- 
rium models are available even for complex multicom- 
ponent mixtures [17], there is a lack of experimentally 
validated diffusion models in particular for multicom- 
ponent liquid mixtures [9]. The incremental identifi- 
cation of diffusive mass transport models is therefore 
outlined in this section. The application is based on the 
recently introduced Raman diffusion experiment [4,7]. 
Here, one-dimensional interdiffusion of two initially 
layered liquid mixtures is observed by 1D-Raman spec- 
troscopy. Concentration profiles $c_i$ of all species are ob-
tained with high resolution in time and space [20]. 


Model B: Estimation of 1D-diffusion fluxes For the 1D- 
diffusion process, the mass balance equation for 
each species i can be given as 


$$
\frac{\partial c_i}{\partial t} = -\frac{\partial J_i}{\partial z}. \tag{8}
$$


The determination of the diffusive flux J; falls into 
the class of interior flux estimation in distributed 
parameter systems [30]. While interior fluxes can- 
not be determined in 2D or 3D situations without 
specification of a constitutive model, the model-free 
flux estimation is possible in the one-dimensional 
situation considered here. Only one nonzero mass 
flux component has to be determined from differ- 
entiated concentrations measured along a line in 
the direction of the diffusive flux. Such a strategy 
has been followed in [3,5]. 

The Raman concentration measurements were first 
differentiated with respect to time by means of 
spline smoothing [33] and subsequently integrated 
over the spatial coordinate to render a diffusive flux 
estimate without specifying a diffusion model: 


$$
J_i(z, t) = -\int_0^z \frac{\partial c_i}{\partial t}(\zeta, t)\, d\zeta. \tag{9}
$$


This technique directly carries over to multicompo- 
nent diffusion [5] provided concentration measure- 
ments are available for every species. In particular, 
there is only a linear increase in complexity due to 
the natural decoupling of the multicomponent ma- 
terial balances (8). 

Model BF: Estimation of diffusion flux models A flux 
model has to be introduced on the next level. For 
example, generalized Fick or Maxwell-Stefan mod- 
els could be selected as candidates. In case of binary 
mixtures, the Fick diffusion coefficient can, e. g., be 
determined at any point in time and space: 


$$D(z,t) = -\frac{J(z,t)}{\partial c(z,t)/\partial z}. \qquad (10)$$

Positivity requirements may now be used, e.g., to assess model assumptions on this level.
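Models B and BF can be illustrated on synthetic data. The sketch below is not the Raman workflow of [4,7]: it assumes a hypothetical binary step profile that follows the erfc solution for a constant Fick coefficient, uses central finite differences in time as a stand-in for the spline smoothing of [33], and assumes that the left end of the domain is flux-free. Under these assumptions, Eq. (9) followed by Eq. (10) should recover the assumed coefficient.

```python
import math


def concentration(z, t, d_true):
    """Hypothetical synthetic data: step at z = 0 smoothed by Fickian diffusion with constant D."""
    return 0.5 * math.erfc(z / (2.0 * math.sqrt(d_true * t)))


def estimate_fick_coefficient(d_true=1e-9, t=600.0, half_width=5e-3, n=2001, dt=1.0):
    """Model B (Eq. 9) followed by Model BF (Eq. 10) on noise-free synthetic profiles."""
    zs = [-half_width + 2.0 * half_width * k / (n - 1) for k in range(n)]
    dz = zs[1] - zs[0]
    # dc/dt by central differences in time (stand-in for spline smoothing)
    dcdt = [
        (concentration(z, t + dt, d_true) - concentration(z, t - dt, d_true)) / (2.0 * dt)
        for z in zs
    ]
    # Eq. (9): J(z,t) = -integral of dc/dt over z, trapezoidal rule,
    # starting at a left boundary that is assumed to be flux-free
    flux = [0.0]
    for k in range(1, n):
        flux.append(flux[-1] - 0.5 * (dcdt[k - 1] + dcdt[k]) * dz)
    # Eq. (10): D = -J / (dc/dz), evaluated in the middle of the interdiffusion zone
    mid = n // 2
    dcdz = (concentration(zs[mid + 1], t, d_true) - concentration(zs[mid - 1], t, d_true)) / (2.0 * dz)
    return -flux[mid] / dcdz


d_est = estimate_fick_coefficient()
print(f"recovered D = {d_est:.4e} m^2/s")  # should be close to the assumed 1e-9
```

With measured data, the time derivative would come from smoothed measurements and the result would be a field $D(z,t)$ rather than a single value.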

Model BFR: Estimation of model parameters The esti- 
mated diffusion coefficient data can now be corre- 
lated with the measured concentrations to obtain 
a diffusion model: 


$$D(z,t) = m\big(c(z,t), \theta\big). \qquad (11)$$


Error-in-variables methods [10] and statistical
model discrimination techniques [35] are employed
to decide on the most appropriate model for the 
concentration dependence of the diffusion coef- 
ficient. This concentration dependency has been 
shown to be even identifiable from a single Raman 
diffusion experiment [3]. An application in food 
science has recently been presented in [27]. 
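The parameter estimation step of Model BFR can be sketched, under strong simplifications, as a regression of pointwise diffusion coefficient estimates on the measured concentrations. This sketch assumes a hypothetical linear model $m(c, \theta) = \theta_0 + \theta_1 c$ and uses plain ordinary least squares instead of the error-in-variables methods of [10]. The data are synthetic.

```python
def fit_linear_diffusivity(c_values, d_values):
    """Ordinary least-squares fit of the hypothetical model D = theta0 + theta1 * c."""
    n = len(c_values)
    c_mean = sum(c_values) / n
    d_mean = sum(d_values) / n
    s_cc = sum((c - c_mean) ** 2 for c in c_values)
    s_cd = sum((c - c_mean) * (d - d_mean) for c, d in zip(c_values, d_values))
    theta1 = s_cd / s_cc
    theta0 = d_mean - theta1 * c_mean
    return theta0, theta1


# Hypothetical pointwise estimates D(z, t) paired with concentrations c(z, t), no noise
cs = [0.1 * k for k in range(11)]
ds = [1.0e-9 + 0.5e-9 * c for c in cs]
theta = fit_linear_diffusivity(cs, ds)
print(theta)
```

Ordinary least squares treats the concentrations as exact; when both $c$ and $D$ carry measurement errors, an error-in-variables formulation as in [10] is the appropriate choice.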

In case of multicomponent diffusion, the last two levels have to be merged because, due to the cross effects of multicomponent diffusion, the diffusive flux of any species depends on the concentration gradients of all species [3,5]. The merged steps
BF and BFR then allow for efficient initialization of 
these complex estimation problems. 


Heat Transfer at Falling Films Liquid falling films 
are a challenging benchmark problem for general fluid 
multiphase reaction systems as they show all the rele- 
vant features of this problem class. Here, the first steps 
in the application of the incremental approach to heat 
transfer in falling films are considered [22,25]. 


Model B: Boundary flux estimation in distributed systems In order to study its heat transfer characteristics, a laminar-wavy falling film is heated by resistance heating using a supporting wall as heater.
Infrared thermography is employed to measure 
a transient 2D temperature field on the backside of 
the wall. An inverse heat conduction problem for 
the three-dimensional wall has to be solved to deter- 
mine the boundary heat flux between the wall and 
the falling film as the first step of the incremental 
approach: 


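$$\frac{\partial T}{\partial t} = a\,\Delta T, \qquad (12)$$
$$-\lambda \nabla T\big|_{\Gamma} = w(z_\Gamma, t), \qquad (13)$$
$$-\lambda \nabla T\big|_{\Lambda} = q(z_\Lambda, t), \qquad (14)$$

with $\Gamma$ and $\Lambda$ being the parts of the surface with unknown and with known boundary heat fluxes $w(z_\Gamma, t)$ and $q(z_\Lambda, t)$, respectively. The boundary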
flux estimation problem is solved by means of 
a multigrid finite-element discretization of the heat 
conduction Eq. (12) in conjunction with the conju- 
gate gradient method. Gradient computation is per- 
formed using the adjoint method. This framework 
allows for the solution of the discretized problem 
involving about three million variables on a desk- 
top computer [22]. 


These results show that the identification of kinetic 
phenomena may become feasible even in complex flow 
problems using the structured search strategy of the in- 
cremental approach. A generalization of the presented 
problem to work out the full incremental identification 
concept for heat transfer problems in falling films is 
currently in progress [25]. 
It is generally accepted that the notion 'ill-posed problem' originates from a considered concept of well-posedness: a problem is called ill-posed if it is not well-posed. There are many different notions of well-posedness (cf. [15,23,27,35,38] and [40]), which correspond to certain classes of variational problems and numerical methods and take into account the 'quality' of the input data, in particular their exactness. For a comparison of different concepts of well-posedness see [12,15] and [35].

For instance, Tikhonov well-posedness [35,38] is convenient if we deal with methods generating feasible minimizing sequences, but it is not appropriate for analysing the stability of exterior penalty methods.
We shall proceed from two concepts of well-posedness which are suitable for wide classes of problems and methods.

The first concept applies to the problem

$$\min\{J(u) : u \in K\}, \qquad (1)$$


where $K$ is a nonempty closed subset of a Banach space $V$ with the norm $\|\cdot\|$ and $J: V \to \mathbb{R} \cup \{+\infty\}$ is a proper lower-semicontinuous functional.


Definition 1 (cf. [27]) The sequence $\{u^n\} \subset V$ is said to be a generalized minimizing sequence (Levitin-Polyak minimizing sequence) for (1) if

$$\lim_{n \to \infty} d(u^n, K) = 0 \quad \text{and} \quad \lim_{n \to \infty} J(u^n) = \inf_{u \in K} J(u),$$

with

$$d(u, K) = \inf_{v \in K} \|u - v\|$$

the distance function.


Definition 2 (1) is called well-posed (Levitin-Polyak well-posed) if

i) it is uniquely solvable, and

ii) any generalized minimizing sequence converges to $u^* = \arg\min\{J(u) : u \in K\}$.


The second concept (cf. [20,23]) concerns (1) with

$$K = \{u \in U_0 : B(u) \le 0\}, \qquad (2)$$

$U_0 \subset V$ a convex closed set, $B: U_0 \to Y$ a convex continuous mapping into a Banach space $Y$, and $J: U_0 \to \mathbb{R}$ a convex continuous functional. The relation '$\le$' in (2) and the convexity of $B$ are defined with respect to a positive cone in $Y$.

In this case, the study of the dependence of a so- 
lution on data perturbations is often more natural and 
simpler than the analysis of the convergence of a gener- 
alized minimizing sequence. 

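We suppose that $U_0$ is exactly given and that a violation of the condition $u \in U_0$ does not arise. For a fixed $\delta > 0$, the set of variations is defined by

$$\Phi_\delta = \Big\{\varphi_\delta = (J_\delta, B_\delta) : \sup_{u \in U_0} |J(u) - J_\delta(u)| \le \delta,\ \sup_{u \in U_0} \|B(u) - B_\delta(u)\|_Y \le \delta\Big\}, \qquad (3)$$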


where $J_\delta: U_0 \to \mathbb{R}$, $B_\delta: U_0 \to Y$ are assumed to be continuous. Then the problem

$$\min\{J_\delta(u) : u \in U_0,\ B_\delta(u) \le 0\} \qquad (4)$$

corresponds to an arbitrary but fixed variation $\varphi_\delta \in \Phi_\delta$. The set of optimal solutions of (4) will be denoted by $U^*(\varphi_\delta)$.


Definition 3 Problem (1), (2) is called well-posed if

i) it is uniquely solvable,

ii) there exists a constant $\delta_0 > 0$ such that for any $\delta \in (0, \delta_0)$ and any $\varphi_\delta \in \Phi_\delta$ the set $U^*(\varphi_\delta)$ is nonempty,

iii) $\lim_{\delta \to 0} d(u^*, U^*(\varphi_\delta)) = 0$ for arbitrary $\varphi_\delta \in \Phi_\delta$.


Depending on the peculiarities of the problem considered, the 'quality' of the data and the requirements on an approximation, other norms in (3) and additional assumptions on $J_\delta$, $B_\delta$ can be considered (for instance, convexity of $J_\delta$, $B_\delta$). For a relaxation of the inequalities in (3) see [39,40].

Of course, Definitions 2 and 3 are not equivalent, and within the chosen concept of well-posedness the problem is called ill-posed if any condition of the corresponding definition is violated.


Example 4 Problem (1), (2) with 


V=R, Y=R, 


J(u) = up, 
Up = {ue R*: 0<u;, <2}, 
Bu) 


+ : : ie 
= (4, +42. -— -,u3-— -,-u, —u,—U 
1 ee 1— U2 — 43 


is well-posed according to Definition 2, but ill-posed according to Definition 3. This example reflects, in particular, the situation that an arbitrarily small data perturbation may lead to an unsolvable problem.


Example 5 The unconstrained problem 


[o.e} 
minimize J(u) = >. k uy ovr V=l1, 
k=1 
is ill-posed according to both Definitions 2 and 3. To verify this, take


MP at-10...2010.5, Ree 
7U,---5,U,1,U.~..), n a 


ji = os kluy + max{n'u? nn}. 


If $V$ is a reflexive Banach space, $Y = C(T)$ (with $T$ a compact set), Problem (1), (2) is uniquely solvable and Slater's condition is valid, then the condition

$$\lim_{\delta \to +0}\ \sup_{u \in W_\delta} \|u - u^*\| = 0,$$

with

$$W_\delta = \Big\{u \in U_0 : J(u) \le J(u^*) + \delta,\ \max_{t \in T} B(u)(t) \le \delta\Big\},$$

is necessary and sufficient for this problem to be well-posed according to Definition 3 (with convex $J_\delta$, $B_\delta$) (cf. [20]).

Let us also mention concepts of well-posedness using different notions of hyper- or epiconvergence. For example, in [9] functions are identified with their epigraphs and, for the class of Problem (1), the closeness of data is measured in the Attouch-Wets metric defined on the data space. Here, Problem (1) is said to be well-posed if it is uniquely solvable and its solution depends continuously (in $V$) on the data perturbation (for details see [35]). These concepts are closely related to Hadamard's classical idea of the continuous dependence of the solution on the data.

Some notions of well-posedness do not presuppose uniqueness of the solution of the problem considered (cf. [35,40]). A corresponding generalization of Definition 2 leads to the following conditions:

i) the optimal set $U^*$ is nonempty,

ii) each generalized minimizing sequence has a subsequence converging to an element of $U^*$,
or (the weaker condition)

ii') $d(u^n, U^*) \to 0$ for each generalized minimizing sequence $\{u^n\}$.

If the problem is ill-posed, the following difficulties oc- 
cur: 


1) using approximate data one cannot be sure that a so- 
lution of the ‘perturbed’ problem is close to the solu- 
tion (or to the solution set) of the original problem; 

2) in the majority of the numerical methods it is pos- 
sible that the calculated minimizing sequence does 
not converge (in a suitable sense) to a solution of the 
problem. 

It may also happen that standard solution methods break down for such problems.


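Example 6 Problem (1), (2) with

$$V = \mathbb{R}^2, \quad Y = C[0,1], \quad U_0 = \{u \in \mathbb{R}^2 : u_2 \ge 0\}, \quad J(u) = -u_1,$$

and

$$B(u)(t) = u_1 - \Big(t - \frac{1}{\sqrt{2}}\Big)^2 u_2.$$

Obviously, the solutions of this linear semi-infinite problem are the points $u^* \in U^* = \{(0, a) : a \ge 0\}$, since the constraint at $t = 1/\sqrt{2}$ forces $u_1 \le 0$. Choose a finite grid $T' \subset [0, 1]$ and let

$$t_0 = \arg\min_{t \in T'} \Big|t - \frac{1}{\sqrt{2}}\Big|.$$

If $t_0 \ne 1/\sqrt{2}$ (which is always the case for a grid of rational points), then for the approximate problem (with $T'$ instead of $[0, 1]$) the ray

$$\Big\{u \in \mathbb{R}^2 : u_1 = \Big(t_0 - \frac{1}{\sqrt{2}}\Big)^2 u_2,\ u_2 \ge 0\Big\}$$

is feasible, and $J(u) \to -\infty$ on this ray as $\|u\| \to \infty$.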


This example shows the typical behavior of ill-posed semi-infinite problems: although the original problem is solvable, the discretized ones may not be solvable, even if dense grids are used. Due to the unsolvability of
the discretized problems, the direct application of dis- 
cretization and exchange methods for solving semi- 
infinite programs is impossible. Moreover, the assump- 
tions required for the application of reduction methods 
are violated in this example, too. (For the conceptual 
description of the methods mentioned see [16]). 
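The failure of discretization in Example 6 can be checked numerically. The following minimal sketch assumes a hypothetical uniform grid of 1001 points on $[0, 1]$; it builds points on the ray through the grid point $t_0$ closest to $1/\sqrt{2}$, checks that they satisfy every grid constraint, and shows that $J(u) = -u_1$ decreases without bound while the original constraint at $t = 1/\sqrt{2}$ is violated.

```python
import math

S = 1.0 / math.sqrt(2.0)  # the point where the original constraint forces u1 <= 0


def B(u, t):
    """Constraint of Example 6: B(u)(t) = u1 - (t - 1/sqrt(2))**2 * u2."""
    return u[0] - (t - S) ** 2 * u[1]


def ray_point(grid, scale):
    """Point with u2 = scale on the ray u1 = (t0 - 1/sqrt(2))**2 * u2 of the discretized problem."""
    t0 = min(grid, key=lambda t: abs(t - S))  # grid point closest to 1/sqrt(2)
    return ((t0 - S) ** 2 * scale, scale)


grid = [k / 1000 for k in range(1001)]  # hypothetical uniform grid on [0, 1]
for scale in (1e2, 1e4, 1e6):
    u = ray_point(grid, scale)
    feasible_on_grid = all(B(u, t) <= 0.0 for t in grid)
    # J(u) = -u1 decreases without bound, while B(u)(1/sqrt(2)) = u1 > 0 in the original problem
    print(scale, feasible_on_grid, -u[0], B(u, S))
```

Refining the grid only shrinks the slope $(t_0 - 1/\sqrt{2})^2$ of the ray; it never makes the discretized problem bounded.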

Nevertheless, it is well-known that some classical 
methods, applied to ill-posed problems, possess stabi- 
lizing qualities: They generate minimizing sequences 
with better convergence properties than those proper- 
ties which are guaranteed for an arbitrary minimizing 
sequence. 
For instance, for the ill-posed problem (1), where $J$ is a convex functional of the class $C^{1,1}$ and $K$ is a convex closed subset of a Hilbert space $V$, the gradient projection method (with a constant steplength parameter and an inexact calculation of the gradient $p^k \approx \nabla J(u^k)$ at each step $k$) converges weakly to some element of $U^*$ if $U^* \ne \emptyset$ and $\|p^k - \nabla J(u^k)\| \le \varepsilon_k$, $\sum_k \varepsilon_k < \infty$ (cf. [33]).
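The gradient projection method with inexact gradients can be sketched on a hypothetical finite-dimensional ill-posed instance: $J(u) = \tfrac12 (u_1 + u_2 - 1)^2$ on $K = [0, 1]^2$, whose solution set $U^* = \{u \in K : u_1 + u_2 = 1\}$ is not a singleton. The gradient is Lipschitz with constant $L = 2$, the constant steplength $0.5$ lies in $(0, 2/L)$, and the gradient errors are assumed to have summable norms $\varepsilon_k = 1/k^2$. In finite dimensions, weak and strong convergence coincide.

```python
import math


def project_box(u, lo=0.0, hi=1.0):
    """Euclidean projection onto the box K = [lo, hi]^2."""
    return [min(max(x, lo), hi) for x in u]


def grad_J(u):
    """Gradient of J(u) = 0.5 * (u1 + u2 - 1)**2 (Lipschitz constant L = 2)."""
    r = u[0] + u[1] - 1.0
    return [r, r]


def inexact_gradient_projection(u0, step=0.5, iters=2000):
    u = list(u0)
    for k in range(1, iters + 1):
        eps = 1.0 / k ** 2  # summable error bounds eps_k
        g = grad_J(u)
        # perturbed gradient p^k with ||p^k - grad J(u^k)|| = eps_k
        p = [g[0] + eps / math.sqrt(2.0), g[1] - eps / math.sqrt(2.0)]
        u = project_box([u[0] - step * p[0], u[1] - step * p[1]])
    return u


u = inexact_gradient_projection([1.0, 0.8])
print(u, u[0] + u[1])  # the limit lies on the solution segment u1 + u2 = 1
```

Which element of $U^*$ is reached depends on the starting point and on the accumulated gradient errors; only membership in $U^*$ is guaranteed.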

In [20] it is shown that penalty methods applied to 
a finite-dimensional convex programming problem, for 
which the conditions ii), iii) in Definition 3 may be vi- 
olated, converge to the unique solution of this problem 
if the exactness of the data is improved within the solu- 
tion process by a special rule, depending on the change 
of the penalty parameter. 

Stable methods for solving convex ill-posed varia- 
tional problems are mainly based on Tikhonov’s reg- 
ularization approach (cf. [29,39,40]) and the proximal 
point approach (cf. [30,37]). Nowadays the direct application of these approaches (when multiple regularization of the original problem is performed and the regularized problems are solved with high accuracy) is losing importance in comparison with techniques that build the regularization into a basic numerical algorithm suitably chosen for solving well-posed problems of the corresponding class.

Let us briefly describe these techniques under the assumption that $V$ is a Hilbert space. Suppose a certain basic method (for instance, a discretization or penalization method) generates the sequence of auxiliary problems

$$J_i(u) \to \min, \quad u \in K_i \subset V;$$

then in the Tikhonov approach the auxiliary problems

$$J_i(u) + \alpha_i \|u - \bar u\|^2 \to \min, \quad u \in K_i, \qquad (5)$$

($\alpha_i > 0$, $\lim_{i \to \infty} \alpha_i = 0$, $\bar u \in V$ a fixed element) are solved successively, whereas the proximal point approach leads to the following sequence of auxiliary problems

$$J_i(u) + \chi_i \|u - u^{i-1}\|^2 \to \min, \quad u \in K_i, \qquad (6)$$

with $0 < \chi_i \le \bar\chi$, $u^{i-1}$ an approximate solution of (6) at the stage $i := i - 1$, and $u^0 \in V$ an arbitrary starting point.


We refer to (5) and (6) as Tikhonov’s iterative reg- 
ularization method and proximal-like method, respec- 
tively. 
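The different behavior of (5) and (6) can be seen on a hypothetical unconstrained instance with exact data: $J_i = J$, $K_i = \mathbb{R}^2$ and $J(u) = \tfrac12 (u_1 + u_2 - 1)^2$, whose minimizers form the line $u_1 + u_2 = 1$. For this quadratic, each regularized subproblem has a closed-form minimizer. Tikhonov's scheme needs $\alpha_i \to 0$ and approaches the solution closest to $\bar u$, while the proximal scheme converges with a constant weight $\chi_i \equiv \chi$ (here to the projection of $u^0$ onto the solution line).

```python
def regularized_step(v, weight):
    """Exact minimizer over R^2 of 0.5*(u1 + u2 - 1)**2 + weight * ||u - v||**2."""
    # optimality: (s - 1)*(1, 1) + 2*weight*(u - v) = 0 with s = u1 + u2, solved in closed form
    s = (2.0 * weight * (v[0] + v[1]) + 2.0) / (2.0 * weight + 2.0)
    shift = (s - 1.0) / (2.0 * weight)
    return [v[0] - shift, v[1] - shift]


def tikhonov(u_bar, alphas):
    """Scheme (5) with J_i = J, K_i = R^2: regularize towards the fixed element u_bar."""
    return [regularized_step(u_bar, alpha) for alpha in alphas]


def proximal(u0, chi, iters):
    """Scheme (6) with J_i = J, K_i = R^2 and a constant weight chi_i = chi."""
    u = list(u0)
    for _ in range(iters):
        u = regularized_step(u, chi)  # regularize towards the previous iterate
    return u


print(tikhonov([0.0, 0.0], [1.0, 1e-2, 1e-4])[-1])  # approaches (0.5, 0.5) as alpha -> 0
print(proximal([1.0, 0.8], chi=1.0, iters=60))      # approaches (0.6, 0.4) with chi fixed
```

Keeping $\chi$ bounded away from zero also keeps every subproblem well conditioned, which is the stability advantage of proximal-like methods.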

Usually, when dealing with a convex variational problem, the functions $J_i$ are convex and the sets $K_i$ are convex and closed. Therefore, the objective functions in Problems (5) and (6) are strongly convex, and hence these problems are uniquely solvable (if $K_i \ne \emptyset$). It should be emphasized that, since $\chi_i \to 0$ is not necessary for the convergence of proximal-like methods (in particular, $\chi_i \equiv \chi > 0$ can be chosen), these methods possess better stability and make fast convergent methods for solving the regularized auxiliary problems more efficient.

Theoretical foundations for the construction and 
the convergence analysis of Tikhonov’s iterative regu- 
larization methods have been developed in [32,40]. We 
refer to some methods coupling Tikhonov’s regular- 
ization with gradient projection methods [8], Newton 
methods [7], augmented Lagrangian [2] and penalty 
methods [40]. In the latter paper the stability of regularized penalty methods for Problem (1), (2) with $Y = \mathbb{R}^m$ is proved without assuming convexity of $J$ and $B$. For
applications of Tikhonov’s regularization in the frame- 
work of successive discretization of ill-posed variational 
problems see [28,40]. 

Proximal-like methods have been intensively de- 
veloped during the last two decades. Starting with the 
papers [3] and [36], where the proximal method of 
multipliers has been investigated, regularized variants 
of different penalty methods (cf. [1,5,6,19]), steepest 
descent method [18], Newton methods [34,41] and 
quasi-Newton methods [10] have been suggested. In 
[21] proximal regularization is coupled with penaliza- 
tion and successive discretization for solving ill-posed 
convex semi-infinite problems, and in [22] a proximal 
method with successive discretization has been studied for solving elliptic variational inequalities. There are a number of papers in which proximal regularization is used to obtain new decomposition (splitting) algorithms ([11,14,17]) and new bundle algorithms for nondifferentiable optimization problems ([4,10,24,25,31]). In some of the papers mentioned, a nonquadratic proximal regularization is carried out by means of a Bregman function [13].

General schemes for the investigation of proximal- 
like methods have been developed in [20,26] and [37]. 
The scheme in [20] includes a generalization of (6), where the proximal iterations are repeated for fixed $J_i$, $K_i$ until they provide an 'appropriate' decrease of the functional $J_i$.


See also 


▶ Sensitivity and Stability in NLP
The study of the properties of the image of a real-valued function is an old one; recently, it has been extended to multifunctions and to vector-valued functions. However, in most cases the properties of the image have not been the purpose of study, and their investigation has occurred as an auxiliary step toward other achievements (see, e.g., [4,16,17]).

Traces of the idea of studying the images of functions involved in a constrained extremum problem go back to the work of C. Carathéodory (in 1935, [3, Chap. 5]). In the 1950s R. Bellman [1], with his celebrated principle of optimality, proposed, for the first time in the field of optimization, to replace the given unknown by a new one which runs in the image; however, here too the image is not the main purpose. Only in the late 1960s and 1970s did some authors, independently of each other, explicitly bring such a study into the field of optimization [2,6,7,10,11].

The approach consists in introducing the space, call it the image space (IS), where the images of the functions of the given optimization problem run. Then, a new problem is defined in the IS, which is equivalent to the given one. In a certain sense, such an approach has some analogies with what happens in measure theory when one goes from the Mengoli-Cauchy-Riemann measure to the Lebesgue one.

The approach will now briefly be described. Assume we are given the integers $m$ and $p$ with $m \ge 0$ and $0 \le p \le m$, the subset $X$ of a Hilbert space $H$ whose scalar product is denoted by $\langle\cdot,\cdot\rangle$, and the functions $f: X \to \mathbb{R}$, $g_i: X \to \mathbb{R}$, $i = 1, \dots, m$. Consider the minimization problem:

$$(P)\qquad \min f(x) \quad \text{s.t. } g_i(x) = 0,\ i \in I^0;\quad g_i(x) \ge 0,\ i \in I^+;\quad x \in X,$$

where $I^0 := \{1, \dots, p\}$, $I^+ := \{p+1, \dots, m\}$, and $p = 0 \Rightarrow I^0 = \emptyset$, $p = m \Rightarrow I^+ = \emptyset$, $m = 0 \Rightarrow I := I^0 \cup I^+ = \emptyset$. Here and in the sequel, all the considered extrema and integrals are supposed to exist; the discussion of their existence goes beyond the scope of this article. Let us set $g(x) := (g_1(x), \dots, g_m(x))$, $O_p := (0, \dots, 0) \in \mathbb{R}^p$, $C := O_p \times \mathbb{R}_+^{m-p}$ and $R := \{x \in X : g(x) \in C\}$, with the stipulation that $C = \mathbb{R}_+^m$ when $p = 0$ and $C = O_m := (0, \dots, 0) \in \mathbb{R}^m$ when $p = m$; $m = 0$ does not require defining $C$.

A particular case of (P), call it $(P)_{\rm iso}$, is a classic isoperimetric-type problem defined in the following way. Let $AC_n(T)$ denote the class of absolutely continuous $n$-vector functions $x(t) := (x_1(t), \dots, x_n(t))$ on $T := [a, b] \subset \mathbb{R}$ with square integrable derivatives. By suitably defining the scalar product, and consequently the norm, such a class is a Hilbert space; set $H = AC_n(T)$ and

$$f(x) = \int_T \psi_0(t, x(t), \dot x(t))\,\mathrm{d}t, \qquad g_i(x) = \int_T \psi_i(t, x(t), \dot x(t))\,\mathrm{d}t, \quad i \in I,$$

where $\psi_i: \mathbb{R}^{1+2n} \to \mathbb{R}$, $i \in \{0\} \cup I$, are given integrands. Fixed endpoint conditions can be included in the definition of $X$. We point out that the problems which can be reduced to the format of (P) share the characteristic of having a finite-dimensional image. Hence, certain problems, for instance of geodesic type, are not covered by (P); their image analysis is outside the scope of the present article.

The IS approach arises naturally inasmuch as an optimality condition for (P) is achieved through the impossibility of a system. More precisely, by paraphrasing the very definition of global minimum, we can say that a feasible $\bar x \in X$ is a global minimum point for (P) if and only if the system (in the unknown $x$)

$$(S)\qquad \varphi(\bar x; x) := f(\bar x) - f(x) > 0, \quad g(x) \in C, \quad x \in X,$$

is impossible, or

$$(S')\qquad \mathcal{H} \cap \mathcal{K}(\bar x) = \emptyset,$$

where $\mathcal{H} := \{(u, v) \in \mathbb{R} \times \mathbb{R}^m : u > 0,\ v \in C\}$ and $\mathcal{K}(\bar x) := \{(u, v) \in \mathbb{R} \times \mathbb{R}^m : u = \varphi(\bar x; x),\ v = g(x),\ x \in X\} = F(X)$, where $F := (\varphi, g)$. It is easy to see that (S') holds if and only if

$$(S'')\qquad \mathcal{H} \cap [\mathcal{K}(\bar x) - \operatorname{cl}\mathcal{H}] = \emptyset,$$

where the difference is in the vector sense and cl denotes closure. $\mathcal{K}(\bar x)$ is called the image of (P) and $\mathcal{K}(\bar x) - \operatorname{cl}\mathcal{H}$ its conic extension. Replacing the image with its conic extension, which corresponds to modifying $f$ and $g$, does not affect the optimality conditions and has several advantages. For instance, if $f$ and $-g$ are convex (or, more generally, if $(f, -g)$ is convexlike [6,20]), then $\mathcal{K}(\bar x) - \operatorname{cl}\mathcal{H}$ is convex even if $\mathcal{K}(\bar x)$ is not. Note that a change of $\bar x$ merely translates $\mathcal{K}(\bar x)$ along the $u$-axis. Hence, the properties of the image can be studied independently of the choice of $\bar x$.

The analysis in the IS must be viewed as a preliminary and auxiliary step for studying (P), not as a competing analysis. If this aspect is understood, then the IS analysis may be highly fruitful. In fact, in the IS we may have a sort of 'regularization': the conic extension of the image of (P) may be convex, continuous or smooth when (P) (and its image) do not enjoy the same property, so that convex, continuous or smooth analysis can be developed in the IS but not in $H$. For instance, (P) with $H = \mathbb{R}^n$ and $(P)_{\rm iso}$ have their unknowns in a finite- and an infinite-dimensional space, respectively, while the images of both problems run in a finite-dimensional space; hence, in the given spaces, namely $\mathbb{R}^n$ and $AC_n(T)$, they require substantially different mathematical tools, while in the IS they can be treated in the same way.

It is easy to show that (P) is equivalent to the following image problem:

$$(IP)\qquad \max u \quad \text{s.t. } (u, v) \in \mathcal{K}(\bar x),\ v \in C.$$

A maximum point, say $(\bar u, \bar v)$, of (IP) is the image, through the pair $(\varphi, g)$, of a minimum point, say $\hat x$, of (P), and we have $f(\bar x) - \bar u = f(\hat x)$, whatever $\bar x \in X$ may be. If (IP) is replaced by the problem

$$(IP_\xi)\qquad \max u \quad \text{s.t. } (u, v) \in \mathcal{K}(\bar x),\quad v_i = \xi_i,\ i \in I^0,\quad v_i \ge \xi_i,\ i \in I^+,$$

then the maximum, which is now a function of $\xi := (\xi_1, \dots, \xi_m)$, gives the so-called perturbation function (also called the optimal value function) of (P), and, with obvious notation, we have $f(\bar x) - \bar u(\xi) = f(\hat x(\xi))$.

If in (P) minimization is replaced by maximization, then the entire image space analysis remains unchanged, provided $\varphi$ receives the new definition $\varphi(\bar x; x) := f(x) - f(\bar x)$.

Proving directly whether or not (S') (or (S''), and hence the impossibility of (S)) holds is generally impracticable. An indirect way of showing it consists in proving that $\mathcal{H}$ and $\mathcal{K}(\bar x)$ lie in two disjoint level sets, respectively. This separation approach is exactly equivalent to finding a theorem of the alternative (cf. ▶ Theorems of the alternative and optimization and [6]) related to (S); only the mathematical languages are different. The separation scheme mainly has a geometrical appeal, while in the study of alternatives the algebraic character is dominant. In general terms, a separation scheme begins by introducing a family $\mathcal{W}$ of functionals such that the intersection of their positive level sets equals $\mathcal{H}$. If a $w \in \mathcal{W}$ is found whose nonpositive level set contains $\mathcal{K}(\bar x)$, then (S) is impossible and $\bar x$ is a solution of (P).
Let us consider the column vector $v = (v_1, \dots, v_m)^\top$ and the row vectors $\lambda = (\lambda_1, \dots, \lambda_m)$, $\mu = (\mu_1, \dots, \mu_m)$. A particular, but wide, family of type $\mathcal{W}$ is offered by the class of parabolic-exponential functions $w: \mathbb{R}^{1+m} \to \mathbb{R}$, defined by

$$\mathcal{W}_{pe}:\quad w = w(u, v; \lambda, \mu) := u + \gamma(v; \lambda, \mu), \qquad \lambda \in C^*,\ \mu \in \mathbb{R}_+^m,$$

where $C^* = \mathbb{R}^p \times \mathbb{R}_+^{m-p}$, and

$$\gamma(v; \lambda, \mu) := \sum_{i \in I^0} (\lambda_i v_i - \mu_i v_i^2) + \sum_{i \in I^+} \lambda_i v_i \exp(-\mu_i v_i),$$

where $^*$ marks the positive polar, and $\lambda$, $\mu$ are parameters which describe the family. At $\mu = 0$, this family collapses to that of linear functions:

$$\mathcal{W}_\ell:\quad \ell = \ell(u, v; \lambda) := u + \langle \lambda, v \rangle, \qquad \lambda \in C^*,$$

where $\langle\cdot,\cdot\rangle$ marks the usual scalar product. It is easy to check that the family $\mathcal{W}_{pe}$ (and hence $\mathcal{W}_\ell$) fulfills the above-mentioned intersection property (with respect to $\lambda$, $\mu$). Starting from such a separation scheme, it is possible to develop most of the theory of constrained extrema, as will be briefly shown.

The intersection of the positive level sets of the elements of $\mathcal{W}_{pe}$ is not open, since $\mathcal{H}$ is not open. Therefore, an element of $\mathcal{H}$ may be a limit point of elements of $\mathcal{K}(\bar x)$, or the Bouligand tangent cone to $\mathcal{K}(\bar x)$ at $O_{1+m}$ may intersect $\mathcal{H}$ even when (S') holds, so that there may exist no element of $\mathcal{W}_{pe}$ which separates $\mathcal{H}$ and $\mathcal{K}(\bar x)$. This drawback can be overcome by enlarging the family $\mathcal{W}_{pe}$. For instance, in the linear case, $\mathcal{W}_\ell$ is replaced by the family $\ell(u, v; \theta, \lambda) := \theta u + \langle \lambda, v \rangle$, $\theta \ge 0$, $\lambda \in C^*$, $(\theta, \lambda) \ne 0$; at $p = 0$, this is the family considered in ▶ Theorems of the alternative and optimization. Of course, such a relaxation no longer guarantees (S'), even if $\mathcal{K}(\bar x)$ is included in the nonpositive level set of $\ell$, since at $\theta = 0$ the nonpositive level set of $\ell$ intersects $\mathcal{H}$, the intersection containing the positive $u$-axis. As a consequence, the separation scheme would be useless. A way of remedying this consists in cutting off the set of problems whose images can be included in the nonpositive level set of $\ell$ only at $\theta = 0$. Such an exclusion, which is extended to the (linear) approximations of (P), is achieved by imposing suitable conditions on $f$ and $g$, which are called constraint qualifications if they involve only $g$, and regularity conditions if they involve both $f$ and $g$ (cf. ▶ Theorems of the alternative and optimization; [12]).

Now, let us show some consequences of the separation scheme. A first result is a sufficient condition. From the very definition of $w$, we see that the existence of vectors $\bar\lambda \in C^*$ and $\bar\mu \in \mathbb{R}_+^m$ such that $\mathcal{K}(\bar x) \subset \operatorname{lev}_{\le 0} w(u, v; \bar\lambda, \bar\mu)$ (where $\operatorname{lev}_{\le 0}$ denotes the nonpositive level set) is sufficient for $\bar x$ to be a solution of (P). Hence, to achieve a sufficient condition in terms of $X$, $f$ and $g$, it is enough to replace, in the above inclusion, $u$ and $v$ with their expressions:

Theorem 1 Let $\bar x \in R$. If there exist $\bar\lambda \in C^*$ and $\bar\mu \in \mathbb{R}_+^m$ such that

$$f(\bar x) - f(x) + \gamma(g(x); \bar\lambda, \bar\mu) \le 0, \qquad \forall x \in X, \qquad (1)$$

then $\bar x$ is a global minimum point of (P).
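Condition (1) can be probed numerically on a finite sample of $X$. A sample check is only evidence, never a proof, since (1) must hold for every $x \in X$. The sketch below takes $\gamma$ in the parabolic-exponential form, with terms $\lambda_i v_i - \mu_i v_i^2$ for the first $p$ (equality) constraints and $\lambda_i v_i \exp(-\mu_i v_i)$ for the remaining ones, and applies it to a hypothetical instance: minimize $x^2$ subject to $g(x) = x - 1 \ge 0$ ($p = 0$, $m = 1$, $\bar x = 1$).

```python
import math


def gamma(v, lam, mu, p):
    """Parabolic terms for the first p (equality) constraints, exponential terms for the rest."""
    total = 0.0
    for i, (vi, li, mi) in enumerate(zip(v, lam, mu)):
        if i < p:
            total += li * vi - mi * vi ** 2
        else:
            total += li * vi * math.exp(-mi * vi)
    return total


def satisfies_condition_1(f, g, x_bar, lam, mu, p, samples, tol=1e-12):
    """Check f(x_bar) - f(x) + gamma(g(x); lam, mu) <= 0 on a finite sample of X."""
    return all(f(x_bar) - f(x) + gamma(g(x), lam, mu, p) <= tol for x in samples)


def f(x):
    return x ** 2  # hypothetical objective


def g(x):
    return [x - 1.0]  # hypothetical constraint g(x) >= 0


xs = [-5.0 + 0.01 * k for k in range(1001)]  # sample of X = R
print(satisfies_condition_1(f, g, 1.0, [2.0], [0.0], 0, xs))  # linear case, mu = 0
print(satisfies_condition_1(f, g, 1.0, [2.0], [0.5], 0, xs))  # parabolic-exponential case
print(satisfies_condition_1(f, g, 1.0, [1.0], [0.0], 0, xs))  # wrong multiplier
```

For $\bar\lambda = 2$, $\bar\mu = 0$ the left-hand side of (1) equals $-(x - 1)^2$, so (1) holds exactly; with $\bar\lambda = 1$ it fails, e.g., at $x = 0.5$.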


Let us introduce the function $\mathcal{L}(x; \lambda, \mu) := f(x) - \gamma(g(x); \lambda, \mu)$. It is a generalization of the classic Lagrangian function, which is recovered at $\mu = 0$. At $p = m$, $\mathcal{L}$ becomes the so-called augmented Lagrangian function. At $p = m$ and $\lambda = 0$, $\mathcal{L}$ becomes the classic Courant penalty function. (See [5, p.12], where such a function was introduced under the name 'sensitized'; see also [15].) Obviously, (1) holds if and only if $f(\bar x) \le \mathcal{L}(x; \bar\lambda, \bar\mu)$, $\forall x \in X$. Under the equality

$$\gamma(g(\bar x); \bar\lambda, \bar\mu) = 0, \qquad (2)$$

we have $f(\bar x) = \mathcal{L}(\bar x; \bar\lambda, \bar\mu)$; then (1), which is the algebraic form of a separation condition, can be rewritten in a different (but equivalent) form, which corresponds to a saddle-point condition. At $\mu = 0$, (2) collapses to the orthogonality condition

$$\langle \bar\lambda, g(\bar x) \rangle = 0, \qquad (3)$$

which is classically known as the complementarity condition [6,12], owing to the role it has played in some algorithms. The fundamental condition (2), which underlies the entire theory of constrained extrema, will be proved within the next theorem.
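Theorem 2 Let $\bar x \in X$. If there exist $\bar\lambda \in C^*$ and $\bar\mu \in \mathbb{R}_+^m$ such that

$$\mathcal{L}(\bar x; \lambda, \mu) \le \mathcal{L}(\bar x; \bar\lambda, \bar\mu) \le \mathcal{L}(x; \bar\lambda, \bar\mu), \qquad \forall x \in X,\ \forall \lambda \in C^*,\ \forall \mu \in \mathbb{R}_+^m, \qquad (4)$$

then $\bar x$ is a global minimum point of (P).

Proof It is enough to prove that (1) ⇔ (4).

(⇒) $\bar x \in R$ and $\bar\lambda \in C^*$ imply $\gamma(g(\bar x); \bar\lambda, \bar\mu) \ge 0$; the reverse of this inequality is implied by (1) at $x = \bar x$. Thus (2) follows. Because of (2), (1) is equivalent to the second of inequalities (4). It is easy to show that

$$x \in R \iff \gamma(g(x); \lambda, \mu) \ge 0, \qquad \forall \lambda \in C^*,\ \forall \mu \in \mathbb{R}_+^m. \qquad (5)$$

In fact, the implication ⇒ is trivial; the reverse implication follows quickly by reasoning ab absurdo. (2) and (5) imply the inequality

$$\gamma(g(\bar x); \bar\lambda, \bar\mu) \le \gamma(g(\bar x); \lambda, \mu), \qquad \forall \lambda \in C^*,\ \forall \mu \in \mathbb{R}_+^m, \qquad (6)$$

which is equivalent to the first of (4).

(⇐) The first of (4) is equivalent to (6), which implies $\bar x \in R$ (otherwise, by (5), taking $\mu = 0$ the right-hand side of (6) is linear in $\lambda$, takes negative values, and can therefore be made arbitrarily negative), so that $\gamma(g(\bar x); \bar\lambda, \bar\mu) \ge 0$. The strict inequality contradicts (6) since, for all $i \in I^+$, $\lambda_i g_i(\bar x) \exp(-\mu_i g_i(\bar x))$ can be made arbitrarily small, so that the same happens to $\gamma(g(\bar x); \lambda, \mu)$. Hence (2) holds here too. Taking (2) into account, it is easy to see that the second of inequalities (4) implies (1).

At $\bar\mu = 0$, (4) is the classic saddle-point sufficient condition and (2) becomes (3). Note that Theorem 2 does not contain any assumption on $X$, $f$ and $g$.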


Example 3 Let us set $H = \mathbb{R}$, $X = \,]-1, +\infty[$, $p = 0$, $m = 1$, $f(x) = \log(x + 1)$, $g(x) = x$. At $\mu = 0$, (4) is equivalent to the system of $(\lambda - \bar\lambda)\bar x \ge 0$ and

$$\log\Big(\frac{x + 1}{\bar x + 1}\Big) \ge \bar\lambda (x - \bar x), \qquad \forall x \in \,]-1, +\infty[,\ \forall \lambda \ge 0,$$

which is impossible. Hence, the classic saddle-point condition is not satisfied. At $\bar\mu > 0$, (4) can be satisfied. In fact, at $\bar x = 0$, (4) is equivalent to $\log(x + 1) \ge \bar\lambda x \exp(-\bar\mu x)$, $\forall x \in \,]-1, +\infty[$, which is true if $\bar\lambda = 1$ and $\bar\mu$ is large enough. Hence, Theorem 2 can now be applied to state that $\bar x = 0$ is a global minimum point of (P).


Example 4 Let us set $X \subset H = AC_2(T)$, where $T := [0, \tau]$ is the domain of the elements $x = (x_1, x_2) \in H$; $x_1 = x_1(t)$ and $x_2 = x_2(t)$, $t \in T$, are the parametric equations of a curve $\gamma$ in $\mathbb{R}^2$; given a positive real $\ell$, $\tau$ must be such that the length of $\gamma$ is $\ell$. $X$ is now the set of pairs $x = (x_1, x_2) \in H$ such that $x_1(0) = x_2(0) = 0$, $x_2(t) \ge 0$ for all $t \in T$, and each $x$ is regular in the sense of Jordan and closed. Moreover, we set $p = m = 1$, $T = [0, 2\pi]$, and

$$f(x) = \int_T x_1 \dot x_2\,\mathrm{d}t, \qquad g(x) = \int_T \sqrt{\dot x_1^2 + \dot x_2^2}\,\mathrm{d}t - \ell.$$

Consider the problem

$$P(\ell)\qquad \max f(x) \quad \text{s.t. } g(x) = 0,\ x \in X.$$

The solution of this classic isoperimetric problem is well known:

$$x_1(t; \ell) = \frac{\ell}{2\pi}\cos\Big(t - \frac{\pi}{2}\Big), \qquad x_2(t; \ell) = \frac{\ell}{2\pi}\Big[1 + \sin\Big(t - \frac{\pi}{2}\Big)\Big], \qquad t \in T = [0, 2\pi],$$

or, in nonparametric form, $x_1^2 + x_2^2 - x_2\,\ell/\pi = 0$, and the maximum is $\ell^2/(4\pi)$. If in $P(\ell)$ we replace $g(x) = 0$ with $g(x) = \xi$, so that we consider $P(\ell + \xi)$, then for all $\bar x \in X$ we have

$$u(\xi) = \frac{(\ell + \xi)^2}{4\pi} - f(\bar x).$$

It follows that $\mathcal{K}(\bar x)$ is included in the region bounded by a parabola that is convex (with respect to the $u$-axis); hence $\mathcal{H}$ and $\mathcal{K}(\bar x)$ can be separated by a line, so that (1) and (4) can be verified at $\mu = 0$. Any $\bar x \in X$ (not necessarily an optimal one) allows us to carry out the analysis in the image space. Of course, in general, it is impossible to have an explicit form of $\mathcal{K}(\bar x)$. In the present example, to show a part of $\mathcal{K}(\bar x)$ explicitly, namely the perturbation function, we have exploited the knowledge of the maximum point.
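The closed-form solution of the isoperimetric problem can be checked numerically. The sketch below assumes that the objective is the enclosed area written as $\int_T x_1 \dot x_2\,\mathrm{d}t$ and that the constraint measures arc length; it discretizes the optimal curve $x_1 = \frac{\ell}{2\pi}\cos(t - \pi/2)$, $x_2 = \frac{\ell}{2\pi}[1 + \sin(t - \pi/2)]$ into a fine polygon and compares its area and length with $\ell^2/(4\pi)$ and $\ell$.

```python
import math


def area_and_length(ell, n=20000):
    """Area int x1 dx2 and arc length of a polygonal approximation of the optimal curve."""
    r = ell / (2.0 * math.pi)
    ts = [2.0 * math.pi * k / n for k in range(n + 1)]
    x1 = [r * math.cos(t - math.pi / 2.0) for t in ts]
    x2 = [r * (1.0 + math.sin(t - math.pi / 2.0)) for t in ts]
    area = 0.0
    length = 0.0
    for k in range(n):
        # trapezoidal rule for int x1 dx2 (equals the shoelace area for a closed polygon)
        area += 0.5 * (x1[k] + x1[k + 1]) * (x2[k + 1] - x2[k])
        length += math.hypot(x1[k + 1] - x1[k], x2[k + 1] - x2[k])
    return area, length


ell = 3.0
area, length = area_and_length(ell)
print(area, ell ** 2 / (4.0 * math.pi), length)
```

Evaluating `area_and_length(ell + xi)` for several values of `xi` traces the perturbation function $(\ell + \xi)^2/(4\pi)$, up to the constant $f(\bar x)$.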


Let us stress that the sufficient condition (1), as well as (4), is an important result; however, in general, it is not necessary, and it is difficult to verify, since the inequality must be fulfilled for all $x \in X$. Therefore, it is useful to weaken the analysis by replacing $\mathcal{K}(\bar x)$ with an 'approximation' which is 'easier' to handle. A natural way of doing this consists in approximating $\mathcal{K}(\bar x)$ at $F(\bar x) := (\varphi(\bar x; \bar x), g(\bar x))$ by means of its tangent cone (e.g., in the sense of Bouligand [4,6]); in general, a cone is obviously easier to handle than an arbitrary set. For the sake of simplicity, we will now consider a particular case of (P) and adopt a separation scheme less general than the above one, which nevertheless embeds the classic theory [12]; for more general results see [6,7,9,10,11,13,14,18,19,20].

Consider the particular case where $p = 0$ (so that $C = \mathbb{R}_+^m$; the presence of bilateral constraints makes the analysis extremely difficult, unless $f$ and $g$ are assumed to fulfill conditions which make the Dini or Lyusternik implicit function theorems applicable) and $X$ is open. Denote by $\mathcal{C}$ the set of sublinear real-valued functions (i.e., positively homogeneous of degree one and convex) defined on $H$; $f$ is superlinear if and only if $-f$ is sublinear. $f$ is said to be $\mathcal{C}$-differentiable at $\bar x$ if and only if there exists $D_\mathcal{C} f: X \times X \to \mathbb{R}$ such that $D_\mathcal{C} f(\bar x; \cdot) \in \mathcal{C}$ and

$$\lim_{z \to 0} \frac{1}{\|z\|}\,\varepsilon(\bar x; z) = 0, \qquad \varepsilon(\bar x; z) := f(\bar x + z) - f(\bar x) - D_\mathcal{C} f(\bar x; z), \qquad (7)$$

where $\|\cdot\|$ is the norm in $H$ generated by the scalar product and $z$ belongs to a neighborhood, say $Z$, of the origin. $D_\mathcal{C} f$ is said to be the $\mathcal{C}$-derivative of $f$ at $\bar x$. If $D_\mathcal{C} f$ is linear (the linear functions are obviously elements of $\mathcal{C}$), then $f$ is differentiable [7]. It is easy to see that a $\mathcal{C}$-differentiable function is directionally differentiable in any direction $z$ (in the sense that the limit of $[f(\bar x + \alpha z) - f(\bar x)]/\alpha$ as $\alpha \downarrow 0$ exists); the converse is not true, as shown by the following example (which generalizes the so-called Peano function, used by G. Peano to detect a famous mistake by Lagrange):


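Example 5 $H = X = \mathbb R^2$, $x = (x_1, x_2)$, $\bar x = 0$, $\|z\| = \|z\|_2$, $f(x) = (x_1^2 + x_2^2)^{1/2}$ if $x_2 \le 0$ and $f(x) = a(x_2/x_1^2)\,(x_1^2 + x_2^2)^{1/2}$ if $x_2 > 0$ (with $a = 1$ when $x_1 = 0$), where $a: \mathbb R_+ \setminus \{0\} \to \mathbb R$ is defined by $a(t) = 1$ if $0 < t \le 1$ or $t \ge 3$, $a(t) = 3 - 2t$ if $1 < t \le 2$, and $a(t) = 2t - 5$ if $2 < t < 3$. In this example, at $x = \bar x$ the directional derivative exists and is $f'(\bar x; z) = (z_1^2 + z_2^2)^{1/2}$, while it is not possible to verify (7). Note that f is continuous, but not locally Lipschitz, and $f'(0;z) > 0$, $\forall z \in \mathbb R^2 \setminus \{0\}$.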
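A quick numerical look at Example 5, as a sketch in plain Python (the helper names `a` and `f` only mirror the notation above, and taking $a = 1$ when $x_1 = 0$ is our convention for $t = +\infty$). Along every ray the quotient $f(\alpha z)/\alpha$ approaches $\|z\|$, so the only candidate C-derivative is $D_C f(0;z) = \|z\|$; but along the parabola $x_2 = 2x_1^2$ (where $t = 2$ and $a(2) = -1$) the ratio $f(x)/\|x\|$ stays at $-1$, so the remainder in (7) divided by $\|z\|$ stays at $-2$ instead of vanishing.

```python
import math

def a(t):
    """Continuous piecewise-linear weight of Example 5, defined for t > 0."""
    if t <= 1.0 or t >= 3.0:
        return 1.0
    if t <= 2.0:
        return 3.0 - 2.0 * t
    return 2.0 * t - 5.0

def f(x1, x2):
    """Peano-type function of Example 5 with the Euclidean norm."""
    r = math.hypot(x1, x2)
    if x2 <= 0.0 or x1 == 0.0:   # x1 == 0 with x2 > 0 means t = +inf, where a = 1
        return r
    return a(x2 / (x1 * x1)) * r

# Directional quotients f(alpha z)/alpha for a small alpha: they approach ||z||.
alpha = 1e-6
for z in [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 2.0)]:
    print(z, f(alpha * z[0], alpha * z[1]) / alpha, math.hypot(*z))

# Along the parabola x2 = 2*x1**2 the ratio f(x)/||x|| does not tend to 1.
for s in [1e-2, 1e-4, 1e-6]:
    x1, x2 = s, 2 * s * s
    print(s, f(x1, x2) / math.hypot(x1, x2))
```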


Example 6 $H = X = \mathbb R$, $f: \mathbb R \to \mathbb R$, with $f(x) = |x| + x^2$ if $x \in \mathbb Q$ and $f(x) = |x| + 2x^2$ if $x \notin \mathbb Q$, $\mathbb Q$ being the set of rational numbers. f is C-differentiable at $x = 0$ with C-derivative $D_C f(0;z) = |z|$. Note that f is continuous at $x = 0$ only.


The C-subdifferential of a C-differentiable function f at $\bar x \in X$ is defined by

$$\partial_C f(\bar x) := \{z^* \in H' : D_C f(\bar x; z) \ge \langle z^*, z\rangle,\ \forall z \in Z\},$$

where H' is the continuous dual of H; $z^*$ is called a C-subgradient of f at $\bar x$. When f is convex, $\partial_C f(\bar x)$ collapses to the classic subdifferential, which is denoted simply by $\partial f(\bar x)$; hence $\partial_C f(\bar x)$ is nothing more than the subdifferential of the sublinear function $D_C f(\bar x;\cdot)$ at $z = 0$, i.e. $\partial_C f(\bar x) = \partial D_C f(\bar x;\cdot)(0)$. When $D_C f(\bar x; z)$ is linear, $\partial_C f(\bar x)$ is a singleton and collapses to the classic differential. In the last example, $\partial_C f(0) = [-1, 1]$. Consider the further example: $H = X = \mathbb R$, $f(x) = x^2 \sin(1/x)$ if $x \ne 0$ and $f(0) = 0$. We find $D_C f(0;z) = 0$, $\forall z \in Z$ (indeed f is differentiable), so that $\partial_C f(0) = \{0\}$, while the Clarke subdifferential [4] is $[-1, 1]$. Now consider the following regularity condition:


$$(\mathrm{RC})\qquad T(K(\bar x)) \cap \{(u,v) \in \mathcal H : v = 0\} = \emptyset,$$


where $T(K(\bar x))$ is the Bouligand tangent cone of $K(\bar x)$ at $F(\bar x)$. Several conditions on f and g which guarantee (RC) are well known (mainly when $H = \mathbb R^n$). Consider for instance the case where $H = \mathbb R^n$ and f, g are differentiable. (RC) holds if the gradients $\nabla g_i(\bar x)$, $i \in \{i \in I : g_i(\bar x) = 0\}$, are linearly independent. (RC) holds if g is affine. (RC) holds if g is concave and there exists $x \in X$ such that $g(x) > 0$. For additional conditions see > Theorems of the alternative and optimization; [6,12].

The approximation of (P) we want to discuss in the present particular case $p = 0$ consists in replacing f and $-g_i$, $i \in I$, with their C-derivatives. More precisely, instead of the map $F = (\varphi, g)$, we now consider the superlinear map

$$F_C(\bar x; z) = \big(-D_C f(\bar x; z),\ g_i(\bar x) + D_C g_i(\bar x; z),\ i \in I\big),$$

which is the first-order expansion of the $(-C)$-differentiable map F. $K(\bar x)$ is now replaced by the cone $K_C(\bar x) := F_C(\bar x; X - \bar x)$, and (S') is now replaced by

$$(S'')\qquad \mathcal H \cap K_C(\bar x) = \emptyset,$$


which holds if $\bar x$ is a minimum point of (P) (but not necessarily vice versa, due to the above approximation of $K(\bar x)$). Hence, from the necessity and sufficiency of (S') we pass to the sole necessity of (S''); since $\mathcal H$ and $K_C(\bar x)$ are linearly separable, (S'') can be proved by means of the subclass of $\mathcal W$ at $\mu = 0$. As a consequence, the Lagrangian function $\mathcal L$ (which, as we have seen above, is, up to a formal transformation, the separation function w) will be used at $\mu = 0$; thus we set $L(x;\lambda) := \mathcal L(x;\lambda,0)$. This leads to the following necessary condition, whose proof can be found in [7].


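Theorem 7 Let the functions f, $-g_i$, $i \in I$, be C-differentiable, and assume that (RC) is fulfilled. If $\bar x$ is a minimum point of (P), then there exists $\bar\lambda \in \mathbb R^m$ such that

$$\inf_{z\in B} D_C L(\bar x; z; \bar\lambda) \ge 0, \qquad (8)$$
$$g(\bar x) \ge 0,\qquad \bar\lambda \ge 0, \qquad (9)$$
$$\langle \bar\lambda, g(\bar x)\rangle = 0, \qquad (10)$$

where $D_C L(\bar x; z; \bar\lambda)$ is the C-derivative of L at $x = \bar x$, $\lambda = \bar\lambda$, and $B := \{z : \|z\| = 1\}$.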
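(8) is equivalent to

$$0 \in \partial_C f(\bar x) + \sum_{i\in I}\bar\lambda_i\,\partial_C(-g_i(\bar x)), \qquad (11)$$

which becomes

$$0 \in \partial f(\bar x) + \sum_{i\in I}\bar\lambda_i\,\partial(-g_i(\bar x)), \qquad (12)$$

if, in particular, X, f and $-g$ are convex. When f and g are differentiable on X, (8) collapses to $V_z = 0$ along $x = \bar x$, where V is the first variation of L, and in the case of $(P)_{\mathrm{iso}}$ it becomes

$$W'_x(t,\bar x,\dot{\bar x};\bar\lambda) - \frac{d}{dt}\,W'_{\dot x}(t,\bar x,\dot{\bar x};\bar\lambda) = 0, \qquad (13)$$

where $W := W_0 - \sum_{i\in I}\lambda_i W_i$ is the integrand of L. If $X = H = \mathbb R^n$, then (8) collapses to

$$L'_x(\bar x;\bar\lambda) = 0, \qquad (14)$$

where $L'_x$ is the gradient of L with respect to x.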


Note that (13) is the classic Euler equation and (14) is the classic Lagrange equation; $\bar\lambda$ is the vector of Lagrange multipliers, which turns out to be the gradient of the hyperplane ($w = 0$ at $\mu = 0$) that separates the two sets of (S'').
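To see (9), (10) and (14) at work on a very small instance, here is a hypothetical toy problem that is not from the text: minimize $f(x) = x_1^2 + x_2^2$ subject to $g(x) = x_1 + x_2 - 1 \ge 0$, with $L(x;\lambda) = f(x) - \langle\lambda, g(x)\rangle$, the sign convention suggested by (11) and (12). The NumPy sketch checks the conditions at $\bar x = (1/2, 1/2)$, $\bar\lambda = 1$, and compares $f(\bar x)$ with randomly sampled feasible points.

```python
import numpy as np

def f(x):
    return x @ x

def g(x):
    # a single inequality constraint g(x) >= 0
    return np.array([x.sum() - 1.0])

def grad_f(x):
    return 2.0 * x

def jac_g(x):
    return np.ones((1, x.size))

def grad_L(x, lam):
    """Gradient in x of L(x; lam) = f(x) - <lam, g(x)>."""
    return grad_f(x) - jac_g(x).T @ lam

x_bar = np.array([0.5, 0.5])
lam_bar = np.array([1.0])

print(grad_L(x_bar, lam_bar))                   # Lagrange equation (14)
print(g(x_bar), lam_bar, lam_bar @ g(x_bar))    # conditions (9) and (10)

# Sampled feasible points never do better than x_bar.
rng = np.random.default_rng(0)
pts = rng.uniform(-3.0, 3.0, size=(20000, 2))
feasible = pts[pts.sum(axis=1) >= 1.0]
print(min(f(p) for p in feasible) >= f(x_bar))
```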


Now let us go back to the separation scheme which led to the sufficient condition (1). The choice of proving (S') indirectly through separation has many interesting consequences which go beyond the initial purpose. One of them is the introduction of a (nonlinear) dual space: that of the functionals w. When we restrict ourselves to the class $\mathcal W$, the dual space is isomorphic to $\mathbb R^{m+1}$ (to $\mathbb R^m$ at $\mu = 0$; this is the classic duality scheme in finite-dimensional optimization). Such an isomorphism is characteristic of constrained extremum problems having a finite-dimensional image (independently of the dimension of the space where the unknown runs).

Having recognized that we have introduced a dual space, defining a dual problem is immediate. Indeed, looking at (1), since the inequality must be fulfilled $\forall x \in X$, it is natural, for each λ and μ, to search for $\max_{x\in X} w(\varphi(\bar x;x), g(x); \lambda, \mu)$ and then to find λ, μ which make such a maximum as small as possible and, hopefully, not greater than zero. Hence we are led to study the problem

$$(P^*)\qquad \max_{\lambda\in C^*,\ \mu\in\mathbb R_+}\ \min_{x\in X}\ \mathcal L(x;\lambda,\mu),$$

which we call the generalized dual problem of (P); any pair $(\lambda^*, \mu^*)$ which solves (P*) is a dual variable [6,19]. At $\mu = 0$, (P*) is the classic dual problem of (P) [12]; indeed, the classic duality theory starts by defining (P*) as a dual problem, independently of the separation scheme and hence of the other theories, such as the saddle-point one. It is easy to show that the maximum in (P*) is $\le$ the minimum in (P); the difference between the latter and the former is called the duality gap, and it is now clear that a positive duality gap corresponds to a lack of separation between $\mathcal H$ and $K(\bar x)$ at the minimum point $\bar x$.
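At $\mu = 0$ the problem (P*) is the classic Lagrangian dual, and weak duality is easy to observe numerically. The sketch below assumes NumPy and a hypothetical one-dimensional instance that is not from the text: minimize $x^2$ subject to $g(x) = x - 1 \ge 0$, with $L(x;\lambda) = f(x) - \lambda g(x)$ and $\lambda \ge 0$. A fine grid stands in for $X = \mathbb R$, so both values are approximate. Analytically the dual function is $\lambda - \lambda^2/4$, maximized at $\lambda = 2$ with value 1, which equals the primal minimum, so the gap is zero for this convex instance.

```python
import numpy as np

# Hypothetical toy instance: minimize x**2 subject to g(x) = x - 1 >= 0, X = R
# (approximated by a grid on [-5, 5]).
xs = np.linspace(-5.0, 5.0, 100_001)
f = xs ** 2
g = xs - 1.0

def dual_function(lam):
    """theta(lam) = min over the grid of L(x; lam) = f(x) - lam * g(x)."""
    return np.min(f - lam * g)

lams = np.linspace(0.0, 5.0, 501)           # lam ranges over C* = R_+
theta = np.array([dual_function(lam) for lam in lams])
dual_value = theta.max()
primal_value = f[g >= 0].min()

print(dual_value, lams[theta.argmax()], primal_value)
print(primal_value - dual_value)            # duality gap, up to grid effects
```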

Another important topic which can be derived from the separation scheme is penalization theory. Seemingly independent of the other topics of optimization, it is in fact closely related to them, since it can be drawn from the separation scheme, as will now be briefly outlined (recall the remark after Theorem 1). Consider again the family $\mathcal W$, within which we select a sequence, say $\{w_r := w(u,v;\lambda^r,\mu^r)\}_{r=1}^{\infty}$, of separation functions such that the positive level set (with respect to (u, v)) of $w_{r+1}$ is strictly included in that of $w_r$. Then we can try to 'fulfill (1) asymptotically', i.e., to set up the sequence of problems

$$(P_r)\qquad \min_{x\in X}\ \mathcal L(x;\lambda^r,\mu^r),\qquad r = 1,2,\dots$$

Under suitable conditions, a limit point of $\{x^r\}_{r=1}^{\infty}$ ($x^r$ being a solution of $(P_r)$) is a solution of (P). See [6,15] for details.
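A sketch of the sequence $(P_r)$ in the simplest setting, assuming the classic quadratic exterior penalty as a stand-in for the separation functions $w_r$ of the text (this specific family is our choice, not necessarily the one of [6,15]). For the hypothetical problem of minimizing $x^2$ subject to $x \ge 1$, the penalized problems $\min_x\, x^2 + r\max\{0, 1 - x\}^2$ have minimizers $x^r = r/(1+r)$, which approach the solution $\bar x = 1$ as r grows. The inner minimizations use a plain ternary search.

```python
def penalized(x, r):
    """Quadratic exterior penalty for: minimize x**2 subject to x - 1 >= 0."""
    return x * x + r * max(0.0, 1.0 - x) ** 2

def argmin_convex_1d(phi, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

for r in [1, 10, 100, 1000, 10000]:
    x_r = argmin_convex_1d(lambda x, r=r: penalized(x, r), -10.0, 10.0)
    print(r, x_r, r / (1 + r))   # numerical vs. closed-form minimizer of (P_r)
```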

Let us stress the fact that the separation scheme and its consequences stem from (S) and do not 'see' (P); they are unaware of the fact that the impossibility of (S) expresses optimality for (P). Therefore the separation approach can be applied to every kind of problem which leads to the impossibility of a system like (S). In fact, such an approach can be applied to vector optimization, to variational inequalities [8], and to generalized systems [14,19].


See also 


> Vector Optimization 
The nonlinear complementarity problem (see [3,15]) is to find a point $x \in \mathbb R^n$ such that

$$x \ge 0,\qquad F(x) \ge 0,\qquad \langle x, F(x)\rangle = 0, \qquad (1)$$

where $F: \mathbb R^n \to \mathbb R^n$ and $\langle\cdot,\cdot\rangle$ denotes the usual inner product in $\mathbb R^n$. A popular approach for solving the nonlinear complementarity problem (NCP) is to construct a merit function f such that solutions of the NCP are related in a certain way to the optimal set of the problem

$$\min\ f(x)\quad \text{s.t. } x \in C.$$


Of practical interest is the case when the set C has a simple structure and the smoothness properties of F and the dimensionality n of the variable space are preserved. There are a number of ways to reformulate the NCP as an equivalent optimization problem (for a survey, see [7]).


Unconstrained Implicit Lagrangian 


The first smooth unconstrained merit function was proposed by O.L. Mangasarian and M.V. Solodov [12]. This function is commonly referred to as the implicit Lagrangian; it has the following form:

$$M_\alpha(x) = \langle x, F(x)\rangle + \frac{1}{2\alpha}\Big(\|(x - \alpha F(x))_+\|^2 - \|x\|^2 + \|(F(x) - \alpha x)_+\|^2 - \|F(x)\|^2\Big),$$

where $\alpha > 1$ is a parameter and $(\cdot)_+$ denotes the orthogonal projection map onto the nonnegative orthant $\mathbb R^n_+$, i.e. the ith component of the vector $(z)_+$ is $\max\{0, z_i\}$. It turns out that $M_\alpha(x)$ is nonnegative on $\mathbb R^n$ provided $\alpha > 1$, and is zero if and only if x is a solution of the NCP. If F is differentiable on $\mathbb R^n$, then so is $M_\alpha(\cdot)$, and its gradient vanishes at all solutions of the NCP for $\alpha > 1$. Hence one can attempt to solve the NCP by solving the smooth unconstrained optimization problem

$$\min\ M_\alpha(x)\quad \text{s.t. } x \in \mathbb R^n. \qquad (2)$$
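A minimal NumPy sketch of $M_\alpha$, assuming an affine map $F(x) = Mx + q$ whose data are ours (the NCP solution $x^* = (1/2, 0)$ can be checked by hand). It illustrates three properties of $M_\alpha$ discussed in this article: it vanishes at a solution, it is nonnegative for $\alpha > 1$, and it is identically zero for $\alpha = 1$.

```python
import numpy as np

def plus(z):
    """Orthogonal projection onto the nonnegative orthant: (z)_+ = max{0, z_i}."""
    return np.maximum(z, 0.0)

def implicit_lagrangian(x, Fx, alpha):
    """Mangasarian-Solodov implicit Lagrangian M_alpha(x), given Fx = F(x)."""
    y, w = plus(x - alpha * Fx), plus(Fx - alpha * x)
    return x @ Fx + (y @ y - x @ x + w @ w - Fx @ Fx) / (2 * alpha)

# Hypothetical affine NCP data: F(x) = M x + q.
M = np.array([[2.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 1.0])
F = lambda x: M @ x + q

x_star = np.array([0.5, 0.0])   # x* >= 0, F(x*) = (0, 1.5) >= 0, <x*, F(x*)> = 0
rng = np.random.default_rng(0)
samples = 3.0 * rng.normal(size=(1000, 2))

print(implicit_lagrangian(x_star, F(x_star), 2.0))                     # zero at the solution
print(min(implicit_lagrangian(x, F(x), 2.0) for x in samples))        # nonnegative for alpha > 1
print(max(abs(implicit_lagrangian(x, F(x), 1.0)) for x in samples))   # M_1 vanishes identically
```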


The implicit Lagrangian owes its name to the way the function was first derived in [12]. Consider the constrained minimization problem (MP)

$$\min\ \langle x, F(x)\rangle\quad \text{s.t. } x \ge 0,\ F(x) \ge 0,$$

which is related to the NCP (1) in the sense that its global minima of zero coincide with the solutions of the NCP. Because of the special structure of the MP (the objective function is the inner product of the functions defining the constraints), for every feasible x such that $\langle x, F(x)\rangle = 0$ it can be observed that x plays the role of the Lagrange multiplier [2] for the constraint $F(x) \ge 0$, while $F(x)$ plays a similar role for the constraint $x \ge 0$. Keeping this observation in mind, consider the augmented Lagrangian [1] for the above MP:

$$L_\alpha(x,u,v) = \langle x, F(x)\rangle + \frac{1}{2\alpha}\Big(\|(u - \alpha F(x))_+\|^2 - \|u\|^2\Big) + \frac{1}{2\alpha}\Big(\|(v - \alpha x)_+\|^2 - \|v\|^2\Big),$$

where $u \in \mathbb R^n$ and $v \in \mathbb R^n$ are Lagrange multipliers corresponding to the constraints $F(x) \ge 0$ and $x \ge 0$, respectively. Since it is known a priori that at any solution x of the MP (and of the NCP) one can take $u = x$ and $v = F(x)$, it is intuitively reasonable to 'solve' for the multipliers u and v in terms of the original variables. Replacing u by x and v by $F(x)$ in the augmented Lagrangian, one obtains the implicit Lagrangian function $M_\alpha(x)$.
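The substitution can be replayed numerically. In the sketch below (NumPy assumed; the random affine F is a hypothetical test map), the augmented Lagrangian is coded as displayed, and substituting $u = x$ and $v = F(x)$ reproduces the closed form of $M_\alpha$ up to rounding.

```python
import numpy as np

def plus(z):
    return np.maximum(z, 0.0)

def augmented_lagrangian(x, u, v, Fx, alpha):
    """L_alpha(x, u, v): u is the multiplier of F(x) >= 0, v that of x >= 0."""
    return (x @ Fx
            + (plus(u - alpha * Fx) @ plus(u - alpha * Fx) - u @ u) / (2 * alpha)
            + (plus(v - alpha * x) @ plus(v - alpha * x) - v @ v) / (2 * alpha))

def implicit_lagrangian(x, Fx, alpha):
    """Closed form of M_alpha(x)."""
    y, w = plus(x - alpha * Fx), plus(Fx - alpha * x)
    return x @ Fx + (y @ y - x @ x + w @ w - Fx @ Fx) / (2 * alpha)

rng = np.random.default_rng(1)
M, q = rng.normal(size=(3, 3)), rng.normal(size=3)   # hypothetical affine F
for _ in range(5):
    x = rng.normal(size=3)
    Fx = M @ x + q
    # u = x and v = F(x) turn L_alpha into M_alpha
    print(augmented_lagrangian(x, x, Fx, Fx, 2.0) - implicit_lagrangian(x, Fx, 2.0))
```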

The parameter α must be strictly greater than one, because it can be checked that $M_1(x) = 0$ for all $x \in \mathbb R^n$. Another interesting property is that the partial derivative of the implicit Lagrangian with respect to α is also nonnegative for all x, and is zero if and only if x is a solution of the NCP [11]. However, a merit function based on this derivative is nonsmooth.


Restricted Implicit Lagrangian 


When the implicit Lagrangian is restricted to the nonnegative orthant $\mathbb R^n_+$, where nonnegativity of x is explicitly enforced, the last two terms in the expression for $M_\alpha(x)$ can be dropped. Thus the restricted implicit Lagrangian is obtained:

$$N_\alpha(x) = \langle x, F(x)\rangle + \frac{1}{2\alpha}\Big(\|(x - \alpha F(x))_+\|^2 - \|x\|^2\Big),$$

where $\alpha > 0$. In this form, the function was introduced in [12]. It is also equivalent to the regularized gap function proposed by M. Fukushima [6] in the more general context of variational inequality problems (cf. also > Variational inequalities).
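The equivalence with the regularized gap function can be checked at a point of $\mathbb R^n_+$. Here we assume the regularization term $\|x - y\|^2/(2\alpha)$, i.e. the gap function $\max_{y\ge 0}\{\langle F(x), x - y\rangle - \|x - y\|^2/(2\alpha)\}$ with a scaled identity as regularizing matrix; this particular choice is ours. The inner maximum is attained at $y = (x - \alpha F(x))_+$, and its value coincides with $N_\alpha(x)$. NumPy is assumed and the affine F is hypothetical.

```python
import numpy as np

def plus(z):
    return np.maximum(z, 0.0)

def restricted_implicit_lagrangian(x, Fx, alpha):
    """N_alpha(x) = <x, F(x)> + (||(x - alpha F(x))_+||^2 - ||x||^2) / (2 alpha)."""
    y = plus(x - alpha * Fx)
    return x @ Fx + (y @ y - x @ x) / (2 * alpha)

def gap_objective(x, Fx, y, alpha):
    """Inner function <F(x), x - y> - ||x - y||^2 / (2 alpha) of the regularized gap."""
    return Fx @ (x - y) - (x - y) @ (x - y) / (2 * alpha)

rng = np.random.default_rng(2)
M, q = rng.normal(size=(3, 3)), rng.normal(size=3)   # hypothetical affine F
alpha = 0.7
x = plus(rng.normal(size=3))                          # a point of R^n_+
Fx = M @ x + q

y_star = plus(x - alpha * Fx)                         # maximizer over y >= 0
N = restricted_implicit_lagrangian(x, Fx, alpha)
print(N, gap_objective(x, Fx, y_star, alpha))         # the two values agree
best_random = max(gap_objective(x, Fx, plus(3.0 * rng.normal(size=3)), alpha)
                  for _ in range(1000))
print(best_random <= N + 1e-12)                       # no sampled y >= 0 does better
```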

The restricted implicit Lagrangian is nonnegative for all $x \in \mathbb R^n_+$ provided the parameter α is positive, and its zeros coincide with solutions of the NCP. It also inherits the differentiability of F. Thus the NCP can be solved via the bound-constrained optimization problem

$$\min\ N_\alpha(x)\quad \text{s.t. } x \ge 0. \qquad (3)$$

Note that since it is known a priori that every solution of the NCP is nonnegative, one may also consider a bound-constrained problem with the function $M_\alpha(x)$. However, for the constrained reformulations the function $N_\alpha(x)$ is probably preferable, because it is somewhat simpler.


Regularity Conditions 


It should be emphasized that only global solutions of the optimization problems (2) and (3) are solutions of the underlying NCP (1). On the other hand, standard iterative methods are guaranteed to find stationary points rather than global minima of optimization problems. It is therefore important to derive conditions which ensure that these stationary points are also solutions of the NCP. One such condition is convexity. However, the implicit Lagrangian is known to be convex only in the case of strongly monotone affine F, and provided the parameter α is large enough [17]. Clearly, this is very restrictive. Thus other regularity conditions were investigated.

For the unconstrained problem (2), the first sufficient condition was given by N. Yamashita and Fukushima [24]. They established that if the Jacobian $\nabla F(\bar x)$ is positive definite at a stationary point $\bar x$ of (2), then $\bar x$ solves the NCP. This result was later extended in [8] to the case when $\nabla F(\bar x)$ is a P-matrix. Finally, F. Facchinei and C. Kanzow [5] obtained a regularity condition which is both necessary and sufficient for a stationary point of the unconstrained implicit Lagrangian to be a solution of the NCP (it is similar to the condition stated below for the restricted case).
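The P-matrix property used in [8] can be tested from its classical characterization: all principal minors are positive. The sketch below is a brute-force illustration assuming NumPy; it is exponential in n and meant only for small matrices. The second test matrix shows that a P-matrix need not be positive definite, which is why the P-matrix result genuinely extends the positive-definiteness one.

```python
import itertools
import numpy as np

def is_p_matrix(A):
    """True if every principal minor of the square matrix A is positive."""
    n = A.shape[0]
    for k in range(1, n + 1):
        for idx in itertools.combinations(range(n), k):
            if np.linalg.det(A[np.ix_(idx, idx)]) <= 0.0:
                return False
    return True

def is_positive_definite(A):
    """Positive definiteness of z^T A z, i.e. of the symmetric part of A."""
    return bool(np.all(np.linalg.eigvalsh(0.5 * (A + A.T)) > 0.0))

A1 = np.array([[2.0, 1.0], [1.0, 2.0]])   # positive definite, hence a P-matrix
A2 = np.array([[1.0, 3.0], [0.0, 1.0]])   # P-matrix, but not positive definite
A3 = np.array([[1.0, 2.0], [0.0, -1.0]])  # has a negative principal minor
for A in (A1, A2, A3):
    print(is_p_matrix(A), is_positive_definite(A))
```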

For the constrained problem (3), Fukushima [6] first showed the equivalence of stationary points to NCP solutions under the positive definiteness assumption on the Jacobian of F. A regularity condition which is both necessary and sufficient was given by Solodov [20]: a point $x \in \mathbb R^n_+$ is said to be regular if $\nabla F(x)^\top$ reverses the sign of no nonzero vector $z \in \mathbb R^n$ satisfying

$$z_C = 0,\qquad z_P > 0,\qquad z_N < 0, \qquad (4)$$

where

$$C := \{i : x_i \ge 0,\ F_i(x) \ge 0,\ x_i F_i(x) = 0\},\quad P := \{i : x_i > 0,\ F_i(x) > 0\},\quad N := \{i : x_i = 0,\ F_i(x) < 0\}.$$

Recall [4] that the matrix $\nabla F(x)^\top$ is said to reverse the sign of a vector $z \in \mathbb R^n$ if

$$z_i\,[\nabla F(x)^\top z]_i \le 0,\qquad \forall i \in \{1,\dots,n\}. \qquad (5)$$

Therefore a point $x \in \mathbb R^n_+$ is regular if the only vector $z \in \mathbb R^n$ satisfying both (4) and (5) is the zero vector.

A stationary point of (3) solves the NCP if and only if it is regular in the sense of the given definition.


Derivative-Free Descent Methods 


When F is differentiable, so are the functions $M_\alpha(x)$ and $N_\alpha(x)$. Therefore any standard optimization algorithm which makes use of first-order derivatives can be applied to problems (2) and (3). However, taking advantage of the underlying structure, one can also devise special descent algorithms which do not use derivatives of F. This can be especially useful when derivatives are not readily available or are expensive to compute.

In [24] it was shown that when F is strongly monotone and continuously differentiable, the direction

$$d(x) = (\beta - \alpha)\big(x - (x - \alpha F(x))_+\big) + (1 - \alpha\beta)\big(F(x) - (F(x) - \alpha x)_+\big)$$

is a descent direction for $M_\alpha(\cdot)$ at $x \in \mathbb R^n$, provided $\beta > 0$ is chosen appropriately. A descent method based on this direction, with an appropriate line search, converges globally to the unique solution of the NCP [24]. In [13] it was established that the rate of convergence is at least linear.

For the restricted implicit Lagrangian, a descent method with the direction

$$d(x) = (x - \alpha F(x))_+ - x$$

was proposed in [6]. The algorithm was proven to be convergent for the strongly monotone NCP (no rate of convergence has been established, however). In [26], by using an adaptive parameter α, this method was further extended to monotone (not necessarily strongly monotone) and Lipschitz continuous (not necessarily differentiable) functions.
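A minimal sketch of a derivative-free descent on $N_\alpha$ along $d(x) = (x - \alpha F(x))_+ - x$, assuming NumPy and an Armijo-type backtracking test $N_\alpha(x + td) \le N_\alpha(x) - \sigma t\|d\|^2$ that only uses values of F. The parameter names (`sigma`, `shrink`, `tol`) and their defaults are illustrative choices, not those of [6]. Since $x + td = (1-t)x + t(x - \alpha F(x))_+$ for $t \in (0, 1]$, the iterates stay in $\mathbb R^n_+$. The test map is a hypothetical strongly monotone affine F.

```python
import numpy as np

def plus(z):
    return np.maximum(z, 0.0)

def restricted_il(x, F, alpha):
    """N_alpha(x) = <x, F(x)> + (||(x - alpha F(x))_+||^2 - ||x||^2) / (2 alpha)."""
    Fx = F(x)
    y = plus(x - alpha * Fx)
    return x @ Fx + (y @ y - x @ x) / (2 * alpha)

def derivative_free_descent(F, x0, alpha=1.0, sigma=1e-4, shrink=0.5,
                            tol=1e-10, max_iter=10_000):
    """Descent on N_alpha along d(x) = (x - alpha F(x))_+ - x (no Jacobian of F)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = plus(x - alpha * F(x)) - x
        if np.linalg.norm(d) <= tol:      # d(x) = 0 exactly when x solves the NCP
            break
        fx, t = restricted_il(x, F, alpha), 1.0
        # backtrack until the sufficient-decrease test holds
        while restricted_il(x + t * d, F, alpha) > fx - sigma * t * (d @ d) and t > 1e-12:
            t *= shrink
        x = x + t * d
    return x

# Hypothetical strongly monotone affine map F(x) = M x + q.
M = np.array([[2.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 1.0])
F = lambda x: M @ x + q

x = derivative_free_descent(F, [1.0, 1.0])
print(x, F(x), x @ F(x))
```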
Error Bounds


The implicit Lagrangian also appears useful for providing bounds on the distance from a given point to the solution set of the NCP. In particular, if F is affine, then there exists a constant $c > 0$ such that (see [11])

$$\operatorname{dist}(x, S) \le c\,M_\alpha(x)^{1/2}$$

for all x close to S, where S denotes the solution set. This inequality is called a local error bound. X.-D. Luo and P. Tseng [10] proved that, in the affine case, this bound is global (i.e., it holds for all $x \in \mathbb R^n$) if and only if the associated matrix is of class $R_0$.

For the nonlinear case, Kanzow and Fukushima [9] showed that $M_\alpha(x)^{1/2}$ provides a global error bound if F is a uniform P-function which is Lipschitz continuous.

In the context of error bounds, the following relation established in [11] is useful: for all $x \in \mathbb R^n$,

$$\alpha^{-1}(\alpha - 1)\,\|r(x)\|^2 \le M_\alpha(x) \le (\alpha - 1)\,\|r(x)\|^2,$$

where

$$r(x) = x - (x - F(x))_+$$

is the natural residual [14]. Therefore the implicit Lagrangian $M_\alpha(x)$ provides a local/global error bound if and only if the natural residual $r(x)$ does.
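Because both $M_\alpha$ and $r$ are sums of per-coordinate terms in $(x_i, F_i(x))$, the two-sided estimate above holds for any map F, which makes it easy to spot-check. The sketch below assumes NumPy; the random affine F, the dimension and the sample sizes are arbitrary choices of ours. It counts sample points where the bound fails, up to a small rounding tolerance.

```python
import numpy as np

def plus(z):
    return np.maximum(z, 0.0)

def implicit_lagrangian(x, Fx, alpha):
    y, w = plus(x - alpha * Fx), plus(Fx - alpha * x)
    return x @ Fx + (y @ y - x @ x + w @ w - Fx @ Fx) / (2 * alpha)

def natural_residual(x, Fx):
    """r(x) = x - (x - F(x))_+, i.e. componentwise min{x_i, F_i(x)}."""
    return x - plus(x - Fx)

def count_violations(alpha, trials=2000, n=4, seed=3):
    """Number of samples where (alpha-1)/alpha ||r||^2 <= M_alpha <= (alpha-1) ||r||^2 fails."""
    rng = np.random.default_rng(seed)
    M, q = rng.normal(size=(n, n)), rng.normal(size=n)   # arbitrary affine F
    bad = 0
    for _ in range(trials):
        x = 2.0 * rng.normal(size=n)
        Fx = M @ x + q
        r = natural_residual(x, Fx)
        m, r2 = implicit_lagrangian(x, Fx, alpha), r @ r
        tol = 1e-9 * (1.0 + r2)
        if not ((alpha - 1) / alpha * r2 - tol <= m <= (alpha - 1) * r2 + tol):
            bad += 1
    return bad

for alpha in (1.1, 2.0, 10.0):
    print(alpha, count_violations(alpha))
```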

For the restricted implicit Lagrangian, one only has the following relation, valid for $x \in \mathbb R^n_+$ and $\alpha \ge 1$:

$$2\alpha N_\alpha(x) \ge \|r(x)\|^2.$$

Thus, in principle, $N_\alpha(x)$ may provide a bound in cases when the natural residual does not.
For a general discussion of error bounds see [16].


Extensions 


The implicit Lagrangian can be extended to the context of generalized complementarity problems and variational inequality problems via its relation with the regularized gap function. As observed by J.-M. Peng and Y.X. Yuan [19], the function $M_\alpha(x)$ can be represented as a difference of two regularized gap functions with parameters $1/\alpha$ and α. Since the regularized gap function can also be defined for variational inequalities, one might consider a similar expression in this more general context. Peng [18] established the equivalence of the variational inequality problem to unconstrained minimization of a difference of regularized gap functions. This result was further extended by Yamashita, K. Taji and Fukushima [25], who obtained similar results for differences of regularized gap functions whose parameters are not necessarily the inverse of each other. For algorithms based on this approach, see [22].
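The Peng-Yuan representation can be verified numerically. In the sketch below (NumPy assumed, affine F hypothetical) we write $g_a(x) := \max_{y\ge 0}\{\langle F(x), x - y\rangle - \tfrac{a}{2}\|x - y\|^2\}$ for the regularized gap function with parameter a, whose maximizer is $y = (x - F(x)/a)_+$; this normalization of the parameter is our assumption. With it, $M_\alpha(x) = g_{1/\alpha}(x) - g_\alpha(x)$ for every $x \in \mathbb R^n$, and the printed differences are at rounding level.

```python
import numpy as np

def plus(z):
    return np.maximum(z, 0.0)

def implicit_lagrangian(x, Fx, alpha):
    y, w = plus(x - alpha * Fx), plus(Fx - alpha * x)
    return x @ Fx + (y @ y - x @ x + w @ w - Fx @ Fx) / (2 * alpha)

def regularized_gap(x, Fx, a):
    """max over y >= 0 of <F(x), x - y> - (a/2)||x - y||^2, attained at y = (x - F(x)/a)_+."""
    y = plus(x - Fx / a)
    return Fx @ (x - y) - 0.5 * a * (x - y) @ (x - y)

rng = np.random.default_rng(4)
M, q = rng.normal(size=(3, 3)), rng.normal(size=3)   # hypothetical affine F
alpha = 3.0
for _ in range(5):
    x = rng.normal(size=3)                           # any x in R^n, not only x >= 0
    Fx = M @ x + q
    diff = regularized_gap(x, Fx, 1.0 / alpha) - regularized_gap(x, Fx, alpha)
    print(implicit_lagrangian(x, Fx, alpha) - diff)  # ~0 up to rounding
```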

For a unified treatment of extensions of the implicit 
Lagrangian and the regularized gap function for the 
generalized complementarity problems see [23]. 

Yet another context where the implicit Lagrangian can be used is the optimization reformulation of the extended linear complementarity problem [21].


See also 


> Kuhn-Tucker Optimality Conditions 
> Lagrangian Duality: Basics 
Introduction


The role of convex analysis in optimization is well 
known. One of the major properties of a convex func- 
tion is its representation as the upper envelope of 
a family of affine functions. More specifically, every 
lower semicontinuous proper convex function can be 
expressed as the supremum of the family of affine 
functions, majorized by it [4]. The subject of ab- 
stract convexity arose precisely by generalizing this 
idea (see [5,6]). A function is said to be abstract con- 
vex if and only if it can be represented as the up- 
per envelope of a class of functions, usually called el- 
ementary functions. One of the first studies in abstract 
convexity concerned the analysis of increasing and positively homogeneous (IPH) functions. It was initially carried out for functions defined over $\mathbb R^n_+$ and $\mathbb R^n_{++}$, where $\mathbb R^n_{++} := \operatorname{int}\mathbb R^n_+$, and later extended to an arbitrary closed convex cone in [1] and to an arbitrary topological vector space in [3]. This study was further extended to include increasing and convex-along-rays (ICAR) functions over $\mathbb R^n_+$. The study of IPH and ICAR functions has given rise to the subject of monotonic analysis,
the study of increasing functions enjoying some addi- 
tional properties, which has important applications in 
global optimization (see [5] for more details). The sys- 
tematic study of this subject was started in [1] and [2] 
by J. Dutta, J. E. Martinez-Legaz, and A. M. Rubinov, 
where they analyzed IPH and ICAR functions defined 
on a cone. In the present article, we extend this anal- 
ysis to the study of ICAR functions defined over an 
arbitrary topological vector space. We want to empha- 
size that the role of IPH functions in monotonic anal- 
ysis is the same as the role of sublinear functions in 
convex analysis, whereas ICAR functions play the role
of convex functions. We define elementary functions, 
which can be considered as generalizations of min-type 
functions, and demonstrate that ICAR functions are ab- 
stract convex with respect to the class of such elemen- 
tary functions. This leads us to develop suitable notions 
of subdifferential for ICAR functions. Finally, we study 
the class of decreasing and convex-along-rays (DCAR) 
functions. 

Let H be a set of functions $h: X \to \mathbb R_{+\infty}$ defined on a set X, where $\mathbb R_{+\infty} := \mathbb R \cup \{+\infty\}$. Recall (see [6]) that a function $f: X \to \mathbb R_{+\infty}$ is called abstract convex with respect to H (H-convex) if there exists a set $U \subseteq H$ such that $f(x) = \sup\{h(x) : h \in U\}$. Let $\operatorname{supp}(f, H) := \{h \in H : h \le f\}$ be the support set of a function $f: X \to \mathbb R_{+\infty}$ with respect to H. The function $\operatorname{co}_H f: X \to \mathbb R_{+\infty}$ defined by $\operatorname{co}_H f(x) := \sup\{h(x) : h \in \operatorname{supp}(f, H)\}$ is called the H-convex hull of f. Clearly, f is H-convex if and only if $f = \operatorname{co}_H f$. Let $f: X \to \mathbb R_{+\infty}$ be a proper function and $x_0 \in \operatorname{dom} f = \{x \in X : f(x) < +\infty\}$. The set

$$\partial_H f(x_0) = \{h \in H : f(x) \ge f(x_0) + h(x) - h(x_0)\ \text{for all } x \in X\}$$

is called the H-subdifferential of f at the point $x_0$. Obviously, $\partial_H f(x_0)$ is nonempty if $f(x_0) = \max\{h(x_0) : h \in \operatorname{supp}(f, H)\}$.

Let (X, Y) be a pair of sets with a coupling function $\varphi: X \times Y \to \mathbb R_{+\infty}$. Denote by $\mathcal F_X$ the union of the set of all functions $f: X \to \mathbb R_{+\infty}$ and the function $-\infty$, where $-\infty(x) = -\infty$ for all $x \in X$. The Fenchel-Moreau conjugation corresponding to φ is the mapping $f \mapsto f^\varphi$ defined on $\mathcal F_X$ by

$$f^\varphi(y) = \sup_{x\in X}\{\varphi(x, y) - f(x)\},\qquad y \in Y.$$

Let φ' be the function defined on $Y \times X$ by $\varphi'(y, x) = \varphi(x, y)$. Then the Fenchel-Moreau conjugation corresponding to φ' is the mapping $g \mapsto g^{\varphi'}$ defined on $\mathcal F_Y$ by

$$g^{\varphi'}(x) = \sup_{y\in Y}\{\varphi'(y, x) - g(y)\} = \sup_{y\in Y}\{\varphi(x, y) - g(y)\},\qquad x \in X.$$

In the case where Y is a set of functions defined on the set X, for each $y \in Y$ and $\gamma \in \mathbb R$ consider the function $h_{y,\gamma}(x) := y(x) - \gamma$, $x \in X$. Denote by $H_Y$ the set $\{h_{y,\gamma} : y \in Y,\ \gamma \in \mathbb R\}$. Let $\varphi: X \times Y \to \mathbb R_{+\infty}$ be defined by $\varphi(x, y) = y(x)$. The following result is well known (see, e.g., [5], Theorem 8.8):


Theorem 1 Let $f \in \mathcal F_X$. Then $f^{\varphi\varphi'} = \operatorname{co}_{H_Y} f$. In particular, $f^{\varphi\varphi'} = f$ if and only if f is $H_Y$-convex.


ICAR Functions 


Let X be a topological vector space. A set $K \subseteq X$ is called conic if $\lambda K \subseteq K$ for all $\lambda > 0$. We assume that X is equipped with a closed convex pointed cone K (the latter means that $K \cap (-K) = \{0\}$). The increasing property of our functions will be understood with respect to the ordering $\le$ induced on X by K:

$$x \le y \iff y - x \in K.$$

A function $f: X \to \mathbb R_{+\infty}$ is called convex along rays (CAR, for short) if the function $f_x(\alpha) = f(\alpha x)$ is convex on the ray $[0, +\infty)$ for each $x \in X$. Similarly, f is called increasing along rays (IAR, for short) if the function $f_x(\alpha) = f(\alpha x)$ is increasing on the ray $[0, +\infty)$ for each $x \in X$. Also, the function $f: X \to \mathbb R_{+\infty}$ is called increasing if $x \ge y \implies f(x) \ge f(y)$, and it is called decreasing if $x \ge y \implies f(x) \le f(y)$. In the sequel we shall study the increasing convex-along-rays (ICAR) and decreasing convex-along-rays (DCAR) functions. Consider the coupling function $l: X \times X \to [0, +\infty]$ defined by

$$l(x, y) = \max\{\lambda \ge 0 : \lambda y \le x\},$$

with the conventions $\max\emptyset := 0$ and $\max\mathbb R_+ := +\infty$. This function was introduced and examined in [3]. We shall include some properties of l for the sake of completeness.


Proposition 1 For every $x, x', y \in X$ and $\gamma > 0$ one has

$$l(\gamma x, y) = \gamma\, l(x, y), \qquad (1)$$
$$l(x, \gamma y) = \tfrac{1}{\gamma}\, l(x, y), \qquad (2)$$
$$l(x, x) = 1 \iff x \notin -K, \qquad (3)$$
$$x \le x' \implies l(x, y) \le l(x', y), \qquad (4)$$
$$x \in K,\ y \in -K \implies l(x, y) = +\infty. \qquad (5)$$

Proof 1 We only prove (3). Note that, since $0 \in K$, we have $l(x, x) \ge 1$ for all $x \in X$. Let $x \notin -K$ and suppose that $\lambda x \le x$ for some $\lambda \ge 0$, i.e. $(1 - \lambda)x \in K$. If $\lambda > 1$, this would give $x \in -K$; hence $\lambda \le 1$. Thus $l(x, x) = 1$. Conversely, let $l(x, x) = 1$. Assume, toward a contradiction, that $x \in -K$. Then $\lambda x \le x$ for all $\lambda \ge 1$, and so $l(x, x) = +\infty > 1$, which contradicts the assumption that $l(x, x) = 1$. Hence $x \notin -K$. □
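For $X = \mathbb R^n$ ordered by $K = \mathbb R^n_+$ (the setting of the later examples), $l(x, y)$ can be computed coordinatewise, since each coordinate gives an upper or a lower bound on λ. The plain-Python sketch below illustrates the definition and properties (1) to (5); the function name `l` only mirrors the notation.

```python
import math

def l(x, y):
    """l(x, y) = max{lam >= 0 : lam * y <= x} for the order induced by K = R^n_+,
    with max(empty set) = 0 and an unbounded feasible set giving +inf."""
    lo, hi = 0.0, math.inf
    for xi, yi in zip(x, y):
        if yi > 0:
            hi = min(hi, xi / yi)        # lam * yi <= xi  <=>  lam <= xi / yi
        elif yi < 0:
            lo = max(lo, xi / yi)        # dividing by yi < 0 flips: lam >= xi / yi
        elif xi < 0:
            return 0.0                   # yi == 0 and xi < 0: no lam is feasible
    return hi if lo <= hi else 0.0

print(l((2.0, 3.0), (1.0, 1.0)))                              # largest lam with lam*(1, 1) <= (2, 3)
print(l((4.0, 6.0), (1.0, 1.0)), l((2.0, 3.0), (2.0, 2.0)))   # properties (1) and (2)
print(l((1.0, -1.0), (1.0, -1.0)))                            # (3): x not in -K
print(l((-1.0, -2.0), (-1.0, -2.0)))                          # x in -K: l(x, x) = +inf
print(l((1.0, 2.0), (-1.0, 0.0)))                             # (5): x in K, y in -K
```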


In view of the above proposition, for example, when $x = y \notin -K$, the maximum in the definition of $l(x, y)$ is actually attained. The following proposition gives a sufficient condition for $l(x, y)$ to be finite.


Proposition 2 If $y \notin -K$, then $l(x, y) < +\infty$ for all $x \in X$.


Proof 2 If $l(x, y) = +\infty$, then there exists a sequence $\lambda_n \to +\infty$ such that $y \le (1/\lambda_n)x$. Since K is closed, letting $n \to \infty$ gives $y \le 0$, i.e. $y \in -K$. □


By ([3], Remark 2.1), $l_y := l(\cdot, y): X \to [0, +\infty]$ is an IPH function for each $y \in X$. We also have the following proposition:


Proposition 3 Let $y \notin -K$. The function $l_y: X \to \mathbb R_+$ is upper semicontinuous.


Proof 3 Fix $x \in X$. Let $\{x_n\} \subseteq X$ be such that $x_n \to x$. Set $\lambda = \limsup_n l_y(x_n)$. If $\lambda = 0$, then, by the nonnegativity of $l_y$, we have $l_y(x) \ge \lambda$. Let $\lambda > 0$. It follows from $y \notin -K$ that $\lambda < +\infty$. Consider a subsequence $\{n_s\}_{s\ge 1}$ such that $l_y(x_{n_s}) > 0$ and $l_y(x_{n_s}) \to \lambda$. We have $x_{n_s} - l_y(x_{n_s})\,y \in K$ for all $s \ge 1$. Since K is closed, we get $x - \lambda y \in K$ and so, by the definition of l, $l(x, y) \ge \lambda$. Hence $l_y$ is upper semicontinuous. □


Set $X' = X \setminus (-K)$ and $L' = \{l_y : y \notin -K\}$. Fix $y \in X'$ and let $l: X \to \mathbb R_+$ be defined by

$$l(x) = l_y(x).$$

We have $\lambda l(x) = l(\lambda x)$ for all $\lambda \in [0, +\infty)$. (Note that $l_y(x) < +\infty$ for all $x \in X$.) For each $x \in X$, consider the function $l^x: \mathbb R_+ \to \mathbb R_+$ defined by

$$l^x(t) := l(tx) = l_y(tx).$$

It is not difficult to check that $l^x$ is increasing and continuous. Hence the function l is ICAR, IAR, and continuous along rays.


Proposition 4 Let $f: X \to \mathbb R_{+\infty}$ be increasing and IAR. Then $f(x) = f(0)$ for all $x \le 0$.


Proof 4 Fix $x \in X$ such that $x \le 0$. It follows from $x \le 0$ and the monotonicity of f that $f(x) \le f(0)$. On the other hand, since f is IAR, we have

$$f(x) = f_x(1) \ge f_x(0) = f(0).$$

Hence $f(x) = f(0)$. □


We give an example of an increasing function that is not IAR.


Example 1 Consider the function $f: \mathbb R^3 \to \mathbb R$ defined by

$$f(x) = \min_{1\le i\le 3} x_i,\qquad \forall x \in \mathbb R^3.$$

Recall that $x \le y$ if and only if $x_i \le y_i$ for all $1 \le i \le 3$. It is easy to see that f is increasing but, if we set $x = (-2, 3, 4)$, then $f_x: [0, +\infty) \to \mathbb R$, $f_x(\alpha) = -2\alpha$, is not increasing. Therefore f is not IAR.


The following functions are examples of ICAR and IAR functions.


Example 2 Consider the functions $f: \mathbb R^n \to \mathbb R$ and $g: \mathbb R^n \to \mathbb R$ defined by

$$f(x) = \begin{cases} \max_{1\le j\le n} x_j, & x \in \mathbb R^n_+,\\ 0, & \text{otherwise},\end{cases}$$

and

$$g(x) = \exp(f(x)).$$

It is easy to check that f and g are ICAR and IAR with respect to the coordinatewise ordering on $\mathbb R^n$.
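Examples 1 and 2 can be probed numerically by sampling $f_x(\alpha) = f(\alpha x)$ on a uniform grid of α, with the coordinatewise order of $K = \mathbb R^3_+$. For Example 2 the sketch assumes $f(x) = \max_j x_j$ on $\mathbb R^n_+$ and $f(x) = 0$ otherwise. A grid test can refute, but not prove, monotonicity or convexity along a ray.

```python
import math
import random

def f_min(x):
    """Example 1: increasing, but not increasing along rays."""
    return min(x)

def f_max(x):
    """Example 2 (assumed form): max_j x_j on R^n_+, and 0 elsewhere."""
    return max(x) if all(xi >= 0 for xi in x) else 0.0

def g_exp(x):
    return math.exp(f_max(x))

def along_ray(f, x, alphas):
    """Values of f_x(alpha) = f(alpha * x) on a grid of alphas."""
    return [f([a * xi for xi in x]) for a in alphas]

def is_nondecreasing(v, tol=1e-12):
    return all(b >= a - tol for a, b in zip(v, v[1:]))

def is_convex_on_grid(v, tol=1e-9):
    # nonnegative second differences on a uniform grid
    return all(v[i - 1] - 2 * v[i] + v[i + 1] >= -tol for i in range(1, len(v) - 1))

alphas = [0.1 * k for k in range(21)]   # uniform grid on [0, 2]

print(is_nondecreasing(along_ray(f_min, (-2.0, 3.0, 4.0), alphas)))   # Example 1 fails IAR
for x in [(1.0, 2.0, 0.5), (-1.0, 2.0, 3.0), (0.0, 0.0, 1.0)]:
    for f in (f_max, g_exp):
        v = along_ray(f, x, alphas)
        print(x, f.__name__, is_nondecreasing(v), is_convex_on_grid(v))

# Both f_min and f_max are increasing: x <= y coordinatewise implies f(x) <= f(y).
random.seed(0)
increasing_ok = True
for _ in range(1000):
    x = [random.uniform(-1.0, 2.0) for _ in range(3)]
    y = [xi + random.uniform(0.0, 1.0) for xi in x]
    increasing_ok = increasing_ok and f_max(x) <= f_max(y) and f_min(x) <= f_min(y)
print(increasing_ok)
```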


Let $W = L' \cup \{0\}$, where $0(x) = 0$ for all $x \in X$. Consider the set $H = \{l - \gamma : l \in W,\ \gamma \in \mathbb R\}$. We have the following result:


Theorem 2 Let $f: X \to \mathbb R_{+\infty}$ be a function. Then f is ICAR, IAR, and lscAR (lower semicontinuous along rays) if and only if it is H-convex.


Proof 5 It is clear that each function $h \in H$ is ICAR, IAR, and continuous along rays. Therefore each H-convex function is ICAR, IAR, and lscAR. Conversely, let $f: X \to \mathbb R_{+\infty}$ be an ICAR, IAR, and lscAR function. Consider $y \in X$. Since $f_y$ is increasing, convex, and lsc, it follows from ([5], Lemma 3.1) that there exists a set $V_y \subseteq \mathbb R_+ \times \mathbb R$ such that $f_y(t) = \sup_{v\in V_y}\{v_1 t - v_2\}$ for each $t \ge 0$, where $v = (v_1, v_2) \in V_y$. First, we suppose that $y \notin -K$. For $v = (v_1, v_2) \in V_y$ we set

$$h^v(x) = v_1\, l_y(x) - v_2,\qquad x \in X.$$

Clearly $h^v \in H$. Let $v_1 = 0$ or $l_y(x) = 0$. Since f is IAR, we have

$$f(x) = f_x(1) \ge f_x(0) = f(0) = f_y(0) \ge -v_2 = h^v(x),\qquad \text{for all } x \in X.$$

Suppose now that $v_1 > 0$ and $l_y(x) > 0$. (Note that $l_y(x) < +\infty$.) Since $x - l_y(x)\,y \in K$ and f is increasing, it follows that

$$h^v(x) \le f_y(l_y(x)) = f(l_y(x)\,y) \le f(x).$$

Thus $h^v(x) \le f(x)$ for all $x \in X$, that is, $h^v \in \operatorname{supp}(f, H)$, where

$$\operatorname{supp}(f, H) = \{h \in H : h(x) \le f(x),\ \forall x \in X\}.$$

Therefore

$$\sup\{h^v(y) : v \in V_y\} \le \sup\{h(y) : h \in \operatorname{supp}(f, H)\} \le f(y). \qquad (6)$$

On the other hand, in view of (3), which gives $l_y(y) = 1$, we have

$$f(y) = f_y(1) = \sup_{v\in V_y}(v_1 - v_2) = \sup_{v\in V_y}\big(v_1\, l_y(y) - v_2\big) = \sup_{v\in V_y} h^v(y).$$

Thus $f(y) = \sup\{h(y) : h \in \operatorname{supp}(f, H)\}$. We now assume that $y \in -K$. By Proposition 4, we have $f(y) = f(0)$. It follows from the proof of ([5], Lemma 3.1) that $v_1 = 0$ for all $v \in V_y$. Consider $v \in V_y$. We set

$$h^v(x) = -v_2,\qquad x \in X.$$

Then

$$f(x) = f_x(1) \ge f_x(0) = f(0) \ge -v_2 = h^v(x),\qquad \text{for all } x \in X,$$

and

$$f(y) = f(0) = \sup_{v\in V_y}(-v_2) = \sup_{v\in V_y} h^v(y). \qquad (7)$$

Hence $\{h^v : v \in V_y\} \subseteq \operatorname{supp}(f, H)$, and in view of (6) and (7) we get $f(y) = \sup\{h(y) : h \in \operatorname{supp}(f, H)\}$. This completes the proof. □


It follows from ([3], Proposition 3.3) that there exists a bijection ψ from $X' \cup \{0\}$ onto W. Therefore we can identify W with $Y = X' \cup \{0\}$ by means of the mapping ψ.

We define the coupling function $\varphi: X \times Y \to \mathbb R_{+\infty}$ by

$$\varphi(x, y) = \begin{cases} l_y(x), & y \in X',\\ 0, & y = 0.\end{cases} \qquad (8)$$

Combining Theorem 1 with Theorem 2, one gets:


Theorem 3 A function $f: X \to \mathbb R_{+\infty}$ is ICAR, IAR, and lscAR if and only if $f = f^{\varphi\varphi'}$.


Subdifferentiability of ICAR Functions 


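Consider the subdifferential $\partial_W$ of $f: X \to \mathbb R_{+\infty}$ at a point $x_0 \in \operatorname{dom} f$:

$$\partial_W f(x_0) = \{h \in W : f(x) \ge f(x_0) + h(x) - h(x_0),\ \forall x \in X\}.$$

We have the following result:


Proposition 5 Let $f: X \to \mathbb R_{+\infty}$ be an ICAR and IAR function and $x_0 \in X'$. If $\lambda x_0 \in \operatorname{dom} f$ for some $\lambda > 1$, then $\partial_W f(x_0)$ is nonempty and we have

$$\{\tau\, l_{x_0} : \tau \in \partial f_{x_0}(1)\} \subseteq \partial_W f(x_0), \qquad (9)$$

where $f_{x_0}(\alpha) = f(\alpha x_0)$. Moreover, if f is strictly increasing at the point $x_0$ (i.e., $x \in X$, $x \le x_0$ and $x \ne x_0$ imply $f(x) < f(x_0)$) and $f_{x_0}$ is strictly increasing at the point $\alpha = 1$, then, replacing $\partial_W$ with $\partial_{L'}$, equality holds in (9).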


Proof 6 Since the increasing convex function $f_{x_0}$ is continuous at 1, it has a subgradient $\tau \ge 0$ at this point. Thus

$$\tau t - \tau \le f_{x_0}(t) - f_{x_0}(1)\qquad \text{for all } t \ge 0. \qquad (10)$$

Let $x \in X$ be arbitrary. If $l_{x_0}(x) = 0$, then by setting $t = l_{x_0}(x) = 0$ in (10) we have

$$\tau\, l_{x_0}(x) - \tau \le f_{x_0}(0) - f_{x_0}(1).$$

Since $l_{x_0}(x_0) = 1$ and f is IAR (so that $f_{x_0}(0) = f(0) \le f(x)$), we get

$$\tau\, l_{x_0}(x) - \tau\, l_{x_0}(x_0) \le f(x) - f(x_0).$$

Assume now that $l_{x_0}(x) > 0$. We have $l_{x_0}(x)\,x_0 \le x$. In view of the monotonicity of f, one has

$$f(x) \ge f\big(l_{x_0}(x)\,x_0\big) = f_{x_0}\big(l_{x_0}(x)\big) \ge f(x_0) + \tau\, l_{x_0}(x) - \tau.$$

Thus

$$f(x) - f(x_0) \ge \tau\, l_{x_0}(x) - \tau\, l_{x_0}(x_0),$$

which shows that $\tau l_{x_0} \in \partial_W f(x_0)$. This implies that $\partial_W f(x_0) \ne \emptyset$, and we have (9). Now let $f_{x_0}$ be strictly increasing at $\alpha = 1$ and f be strictly increasing at $x_0$. In this case τ is different from zero in the left-hand side of (9). Consider $l \in \partial_{L'} f(x_0)$. Then there exists $y \in X'$ such that $l = l_y$. We have

$$f(x) - f(x_0) \ge l_y(x) - l_y(x_0),\qquad \text{for all } x \in X. \qquad (11)$$

We will show that $l_y(x_0) > 0$. Reasoning by contradiction, let us assume that $l_y(x_0) = 0$. It follows from (11) that $f_{x_0}(t) - f_{x_0}(1) \ge 0$ for all $t \ge 0$. Since $f_{x_0}$ is strictly increasing at the point 1, we get a contradiction. Thus $l_y(x_0) > 0$. Since $l_y(x_0)\,y \le x_0$, one has

$$f(x_0) \ge f\big(l_y(x_0)\,y\big) \ge f(x_0) + l_y\big(l_y(x_0)\,y\big) - l_y(x_0) = f(x_0) + l_y(x_0)\,l_y(y) - l_y(x_0) = f(x_0).$$

Hence $f(x_0) = f(l_y(x_0)\,y)$. Since f is strictly increasing at $x_0$, we get $l_y(x_0)\,y = x_0$, i.e. $y = x_0/l_y(x_0)$. Moreover, for all $t \ge 0$, by (11) we have

$$f_{x_0}(t) = f(tx_0) \ge f(x_0) + l_y(tx_0) - l_y(x_0) = f_{x_0}(1) + l_y(x_0)(t - 1),$$

which shows that $l_y(x_0) \in \partial f_{x_0}(1)$. We set $\tau = l_y(x_0)$. Then $l = l_y = l_{x_0/\tau} = \tau\, l_{x_0}$. This completes the proof. □


Proposition 6 Let $f: X \to \mathbb R_{+\infty}$ be an ICAR and IAR function. If $x \in X'$ is a point such that the one-sided derivative $f'(x, x) = 0$, then x is a global minimizer of f over X.


Proof 7 Since the function $f_x$ is convex and $(f_x)'_+(1) = f'(x, x) = 0$, we have $f_x(1) = \min_{\alpha\in[0,+\infty)} f_x(\alpha)$. Thus $f_x(1) \le f_x(0)$. On the other hand, since f is IAR, $f(x) = f_x(1) \ge f_x(0) = f(0)$. Hence $f(x) = f(0) = \min_{z\in X} f(z)$, because IAR also gives $f(z) = f_z(1) \ge f_z(0) = f(0)$ for every $z \in X$. □

Recall that, by ([3], Proposition 3.3), we can identify L' with X' by means of a mapping ψ'. Let us denote by $\partial_{X'} f(x_0)$ the set $(\psi')^{-1}(\partial_{L'} f(x_0))$. Then

$$\partial_{X'} f(x_0) = \{y \in X' : l_y(x) - l_y(x_0) \le f(x) - f(x_0),\ \forall x \in X\}.$$


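Proposition 7 Let $f: X \to \mathbb R$ be an ICAR and IAR function. Then

$$\partial_{X'} f(0) = \{y \in X' : l_y(x) \le (f_x)'_+(0),\ \forall x \in X\},$$

where $(f_x)'_+$ is the right derivative of the function $f_x$ given by $f_x(\alpha) = f(\alpha x)$.

Proof 8 For each $y \in X'$ one has

$$\begin{aligned}
y \in \partial_{X'} f(0) &\iff l_y(x) - l_y(0) \le f(x) - f(0),\quad \forall x \in X\\
&\iff l_y(\alpha x) \le f_x(\alpha) - f_x(0),\quad \forall x \in X,\ \forall \alpha > 0\\
&\iff l_y(x) \le \frac{f_x(\alpha) - f_x(0)}{\alpha},\quad \forall x \in X,\ \forall \alpha > 0\\
&\iff l_y(x) \le (f_x)'_+(0),\quad \forall x \in X.
\end{aligned}$$

The second equivalence is a consequence of Proposition 1 and ([3], Remark 2.1). □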


DCAR Functions 


In this section, we shall study decreasing convex-along-rays (DCAR) functions defined on $X$. To this end, we introduce the coupling function $u: X \times X \to \overline{\mathbb{R}}$ defined by

$$u(x, y) = \max\{\lambda \in \mathbb{R} : x \le \lambda y\}.$$


Let $x, x', y \in X$ be arbitrary and $\gamma > 0$. It is easy to check that the function $u$ has the following properties:
(1) $x \le x' \implies u(x', y) \le u(x, y)$;
(2) $u(\gamma x, y) = \gamma\, u(x, y)$;
(3) $u(x, y) = +\infty \implies y \in K$.

We also have $u(-x, -y) = l(x, y)$. For each $y \in X$, consider the cone

$$\Omega_y = \{x \in X : 0 < u(x, y) < +\infty\}.$$
Lemma 1 Let $y \in X$ and $x, x' \in \Omega_y$. The following inequality holds:

$$u(x + x', y) \ge u(x, y) + u(x', y). \qquad (12)$$

Proof Let $A = \{\alpha \in \mathbb{R} : x \le \alpha y\}$, $B = \{\beta \in \mathbb{R} : x' \le \beta y\}$ and $C = \{\gamma \in \mathbb{R} : x + x' \le \gamma y\}$. In view of the transitive property of the relation $\le$, we get $A + B \subseteq C$, and this yields (12). □


It follows from the properties of $u$ and Lemma 1 that $\Omega_y$ is a downward convex cone. Fix $y \in X$. We define the function $r_y: X \to \mathbb{R}_+$ by

$$r_y(x) = \begin{cases} u(x, y), & x \in \Omega_y,\\ 0, & \text{otherwise}. \end{cases} \qquad (13)$$

By the properties of $u$, $r_y$ is a decreasing and positively homogeneous function of degree one. Let $y \notin K$ and set $r = r_y$. It is not difficult to see that, for each $x \in X$, the function $r^x: \mathbb{R}_+ \to \mathbb{R}_+$ defined by $r^x(t) = r(tx)$ is increasing, convex, and continuous. Thus, for each $y \notin K$, the function $r_y$ is DCAR, IAR, and continuous along rays.
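Example 3 Let $X = \mathbb{R}^n$ and $K$ be the cone $\mathbb{R}^n_+$ of all vectors in $\mathbb{R}^n$ with nonnegative coordinates. Let $I = \{1, 2, \dots, n\}$. Each vector $x \in \mathbb{R}^n$ generates the following sets of indices:

$$I_+(x) = \{i \in I : x_i > 0\}, \quad I_0(x) = \{i \in I : x_i = 0\}, \quad I_-(x) = \{i \in I : x_i < 0\}.$$

Let $x \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Denote by $c/x$ the vector with coordinates

$$\left(\frac{c}{x}\right)_i = \begin{cases} \dfrac{c}{x_i}, & i \notin I_0(x),\\ 0, & i \in I_0(x). \end{cases}$$

Then, for each $x, y \in \mathbb{R}^n$, we have

$$r_y(x) = \begin{cases} \min_{i \in I_-(y)} \dfrac{x_i}{y_i}, & x \in \Omega_y,\\ 0, & x \notin \Omega_y, \end{cases}$$

where

$$\Omega_y = \left\{x \in \mathbb{R}^n : I_0(y) \cup I_-(y) \subseteq I_0(x) \cup I_-(x),\ \ \max_{i \in I_+(y)} \frac{x_i}{y_i} \le \min_{i \in I_-(y)} \frac{x_i}{y_i}\right\}.$$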
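To make the coupling function $u$ of this section and the function $r_y$ of (13) concrete in the setting of Example 3 ($X = \mathbb{R}^n$, $K = \mathbb{R}^n_+$), the following Python sketch evaluates $u(x, y) = \max\{\lambda \in \mathbb{R} : x \le \lambda y\}$ coordinate by coordinate and then applies (13) directly, so it does not rely on the closed-form description of $\Omega_y$. The function names are illustrative, and returning `None` when the feasible set is empty is our own choice, since no convention for that case is fixed above.

```python
import math

def u_coupling(x, y):
    """u(x, y) = max{lam in R : x <= lam*y} for the componentwise order of R^n_+.

    Returns math.inf if the feasible set is unbounded above and None if it is empty.
    """
    lo, hi = -math.inf, math.inf
    for xi, yi in zip(x, y):
        if yi > 0:            # x_i <= lam*y_i  <=>  lam >= x_i / y_i
            lo = max(lo, xi / yi)
        elif yi < 0:          # x_i <= lam*y_i  <=>  lam <= x_i / y_i
            hi = min(hi, xi / yi)
        elif xi > 0:          # y_i = 0 forces x_i <= 0
            return None
    return hi if lo <= hi else None

def r_y(x, y):
    """r_y(x) = u(x, y) if 0 < u(x, y) < +inf (x in Omega_y), and 0 otherwise, as in (13)."""
    val = u_coupling(x, y)
    return val if val is not None and 0 < val < math.inf else 0.0

y = [1.0, -1.0]
print(r_y([-3.0, -2.0], y))   # 2.0
print(r_y([-6.0, -4.0], y))   # 4.0 (positive homogeneity)
print(r_y([-3.0, -1.0], y))   # 1.0 (x increased, r_y decreased)
```

The three printed values illustrate the positive homogeneity and the decreasing behaviour of $r_y$ stated after (13).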


Example 4 Consider the function $g: \mathbb{R}^n \to \mathbb{R}$ defined by

$$g(x) = \begin{cases} -\min_{1 \le i \le n} x_i, & x \notin \mathbb{R}^n_+,\\ 0, & \text{otherwise}. \end{cases}$$

It is easy to check that $g$ is DCAR and IAR.
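The following Python sketch spot-checks Example 4 numerically, assuming the reading in which $g(x) = -\min_i x_i$ outside $\mathbb{R}^n_+$ and $g(x) = 0$ on $\mathbb{R}^n_+$. For a few sample vectors it checks that $t \mapsto g(tx)$ is nondecreasing and has nonnegative second differences on a grid of $t \ge 0$, and that one componentwise-ordered pair of points is mapped in decreasing order. It is an illustration on sample points, not a proof.

```python
def g(x):
    """Example 4 (as read here): -min_i x_i if x is not in R^n_+, and 0 otherwise."""
    m = min(x)
    return -m if m < 0 else 0.0

def along_ray(x, ts):
    """Values of t -> g(t*x) on a grid of nonnegative t."""
    return [g([t * xi for xi in x]) for t in ts]

ts = [0.5 * k for k in range(11)]            # t = 0, 0.5, ..., 5
for x in ([1.0, -2.0], [3.0, 4.0], [-1.0, -0.5]):
    vals = along_ray(x, ts)
    nondecreasing = all(a <= b for a, b in zip(vals, vals[1:]))
    convex = all(vals[k - 1] - 2 * vals[k] + vals[k + 1] >= 0
                 for k in range(1, len(vals) - 1))
    print(x, nondecreasing, convex)

# decreasing: [-2, 1] <= [-1, 3] componentwise, so g should not increase
print(g([-2.0, 1.0]) >= g([-1.0, 3.0]))
```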


Now, let

$$U = \{r_y : y \in X \setminus K\} \cup \{0\} \quad\text{and}\quad H_U = \{h - \gamma : h \in U,\ \gamma \in \mathbb{R}\}.$$

It is clear that each $H_U$-convex function is DCAR, IAR, and lscAR. The proof of the following theorem is similar to that of Theorem 2, and therefore we omit it.

Theorem 4 Let $f: X \to \mathbb{R}_{+\infty}$ be a DCAR, IAR, and lscAR function. Then $f$ is $H_U$-convex.

Corollary 1 The function $f: X \to \mathbb{R}_{+\infty}$ is $H_U$-convex if and only if it is DCAR, IAR, and lscAR.
Introduction


We study IPH (increasing and positively homogeneous of degree one) functions defined on a topological vector space $X$ that is equipped with a closed convex pointed cone $K$ (see [7]). The theory of IPH functions defined on a closed convex cone $K$ in a topological vector space $X$ is well developed [2]. Two main results of this theory play a central role. First, each IPH function $p$ defined on $K$ can be represented as the Minkowski gauge of a normal, closed-along-rays (Definitions 1 and 2) subset $U$ of $K$, namely, $U = \{x \in K : p(x) \le 1\}$. Conversely, the Minkowski gauge of a normal closed-along-rays set is an IPH function. The second result is based on ideas of abstract convexity: each IPH function defined on $K$ can be represented as the upper envelope of a set of so-called min-type functions. This result can be considered as a certain form of dual representation of IPH functions. IPH functions are useful for describing radiant and coradiant downward sets and, through them, can be applied to the study of some NTU games arising in mathematical economics [1,13] and to the analysis of topical functions, which are used in the analysis of discrete event systems [3,4,5]. Some of the results related to monotonic analysis on the space $\mathbb{R}^n$ with respect to the coordinatewise order relation can be found in [5,6,10,11]. IPH functions defined on $\mathbb{R}^n$ are examined in [5]. As it turns out, the main results from [5] can be extended to arbitrary topological vector spaces; such an extension is one of the main topics of this article. Studying these problems in a more general framework clarifies and simplifies the main ideas and approaches. The results obtained can be used, for example, in the study of vector optimization problems.

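Let $X$ be a topological vector space. We assume that $X$ is equipped with a closed convex pointed cone $K$ (the latter means that $K \cap (-K) = \{0\}$). The increasing property of our functions will be understood with respect to the ordering $\le$ induced on $X$ by $K$:

$$x \le y \iff y - x \in K.$$

We use the following notation:

$$\mathbb{R} = (-\infty, +\infty), \quad \overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty\} \cup \{+\infty\}, \quad \mathbb{R}_{+\infty} = (-\infty, +\infty],$$
$$\mathbb{R}_+ = [0, +\infty), \quad \overline{\mathbb{R}}_+ = [0, +\infty], \quad \mathbb{R}_- = (-\infty, 0], \quad \overline{\mathbb{R}}_- = [-\infty, 0].$$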


We accept the following conventions:

$$\frac{1}{0} = +\infty, \qquad \frac{1}{+\infty} = 0. \qquad (1)$$


A function $p: X \to \overline{\mathbb{R}}$ is called positively homogeneous if $p(\lambda x) = \lambda p(x)$ for all $x \in X$ and $\lambda > 0$. The function $p$ is called increasing if $x \ge y \implies p(x) \ge p(y)$. We shall study IPH (increasing and positively homogeneous of degree one) functions $p$ such that $0 \in \operatorname{dom} p := \{x \in X : -\infty < p(x) < +\infty\}$, and hence $p(0) = 0$.
The following definitions can be found in [9]. 


Definition 1 A nonempty subset $W$ of $X$ is called closed along rays if $(x \in W,\ \lambda_n > 0,\ \lambda_n x \in W,\ n = 1, 2, \dots,\ \lambda_n \to \lambda,\ \lambda > 0) \implies \lambda x \in W$.

Definition 2 A nonempty subset $A$ of $K$ is called normal if $x \in A$, $x' \in K$ and $x' \le x$ imply $x' \in A$.

Definition 3 A nonempty subset $B$ of $K$ is called conormal if $x \in B$, $x' \in K$ and $x \le x'$ imply $x' \in B$.


A normal subset $A$ of $K$ is radiant, that is, $x \in A$ and $0 < \lambda \le 1$ imply $\lambda x \in A$. A conormal subset $B$ of $K$ is coradiant, that is, $x \in B$ and $\lambda \ge 1$ imply $\lambda x \in B$. A set $W \subseteq X$ is called downward if $(x \in W,\ x' \le x) \implies x' \in W$. (In particular, the empty set is downward.) Similarly, a set $V \subseteq X$ is called upward if $(x' \in V,\ x' \le x) \implies x \in V$. Let $W \subseteq X$ be a radiant set. The Minkowski gauge $\mu_W: X \to \overline{\mathbb{R}}_+$ of this set is defined by

$$\mu_W(x) = \inf\left\{\lambda > 0 : \frac{x}{\lambda} \in W\right\}. \qquad (2)$$

The Minkowski cogauge $\nu_V: X \to \overline{\mathbb{R}}_+$ of a coradiant set $V \subseteq X$ is defined by

$$\nu_V(x) = \sup\left\{\lambda > 0 : \frac{x}{\lambda} \in V\right\}. \qquad (3)$$

It is easy to check that the Minkowski gauge of a downward set and the Minkowski cogauge of an upward set are IPH.
Consider the function $l: K \times K \to \overline{\mathbb{R}}_+$ defined by

$$l(x, y) = \max\{\lambda \in \mathbb{R}_+ : \lambda y \le x\}.$$

This function is introduced and examined in [2]. To motivate our study, we recall the following characterization of the IPH functions defined on $K$.

Theorem 1 ([2], Theorem 16) Let $p: K \to \overline{\mathbb{R}}_+$ be a function. Then $p$ is IPH if and only if $p(x) \ge l(x, y)\,p(y)$ for all $x, y \in K$, with the convention $(+\infty) \times 0 = 0$.


Characterizations of Nonnegative IPH Functions 


Carrying forward the motivation from Theorem 1, we shall now develop a similar property for IPH functions $p: X \to \overline{\mathbb{R}}_+$. To achieve this, we introduce the coupling function $l: X \times X \to \overline{\mathbb{R}}_+$ defined by

$$l(x, y) := \max\{\lambda \ge 0 : \lambda y \le x\} \qquad (4)$$

(we use the conventions $\max \emptyset := 0$ and $\max \mathbb{R}_+ := +\infty$). The next proposition gives some properties of the coupling function $l$.


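Proposition 1 For every $x, x', y, y' \in X$ and $\gamma > 0$, one has

$$\begin{aligned}
&l(\gamma x, y) = \gamma\, l(x, y), && (5)\\
&l(x, \gamma y) = \tfrac{1}{\gamma}\, l(x, y), && (6)\\
&l(x, y) = +\infty \implies y \in -K, && (7)\\
&l(x, x) = 1 \iff x \notin -K, && (8)\\
&x \in K,\ y \in -K \implies l(x, y) = +\infty, && (9)\\
&x \le x' \implies l(x, y) \le l(x', y), && (10)\\
&y \le y' \implies l(x, y) \ge l(x, y'). && (11)
\end{aligned}$$

Proof We only prove parts (7) and (10). Let $l(x, y) = +\infty$ for some $x, y \in X$. By (4) there exists a sequence $\{\lambda_n\}_{n \ge 1}$ such that $\lambda_n \to +\infty$ and $y \le \frac{1}{\lambda_n} x$ for all $n \ge 1$. Since $K$ is a closed cone, we get $y \le 0$. This proves (7). To prove (10), let $x \le x'$, $\Lambda_{x,y} = \{\lambda \ge 0 : \lambda y \le x\}$ and $\Lambda_{x',y} = \{\gamma \ge 0 : \gamma y \le x'\}$. It is clear that $\Lambda_{x,y} \subseteq \Lambda_{x',y}$ (notice that if $\Lambda_{x',y} = \emptyset$, then $\Lambda_{x,y} = \emptyset$). Hence $l(x, y) \le l(x', y)$. □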


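Example 1 Let $X = \mathbb{R}^n$ and $K$ be the cone $\mathbb{R}^n_+$ of all vectors in $\mathbb{R}^n$ with nonnegative coordinates. Let $I = \{1, 2, \dots, n\}$. Each vector $x \in \mathbb{R}^n$ generates the following sets of indices:

$$I_+(x) = \{i \in I : x_i > 0\}, \quad I_0(x) = \{i \in I : x_i = 0\}, \quad I_-(x) = \{i \in I : x_i < 0\}.$$

Let $x \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Denote by $c/x$ the vector with coordinates

$$\left(\frac{c}{x}\right)_i = \begin{cases} \dfrac{c}{x_i}, & i \notin I_0(x),\\ 0, & i \in I_0(x). \end{cases}$$

Then, for each $x, y \in \mathbb{R}^n$, we have

$$l(x, y) = \begin{cases} \min_{i \in I_+(y)} \dfrac{x_i}{y_i}, & x \in K_y,\\ 0, & x \notin K_y, \end{cases}$$

where

$$K_y = \left\{x \in \mathbb{R}^n : x_i \ge 0\ \ \forall i \in I_+(y) \cup I_0(y),\ \ \max_{i \in I_-(y)} \frac{x_i}{y_i} \le \min_{i \in I_+(y)} \frac{x_i}{y_i}\right\}$$

(with $\min \emptyset = +\infty$ and $\max \emptyset = -\infty$).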
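The closed-form expression for $l(x, y)$ in Example 1 is easy to check mechanically. The Python sketch below implements it for $K = \mathbb{R}^n_+$ and compares it with a brute-force search over a finite grid of $\lambda \ge 0$; the helper names and the grid are illustrative choices, and plain floating-point arithmetic is assumed.

```python
import math

def l_coupling(x, y):
    """l(x, y) = max{lam >= 0 : lam*y <= x} for X = R^n, K = R^n_+ (Example 1).

    Conventions of (4): max(empty set) = 0 and max(R_+) = +inf.
    """
    # First condition of K_y: x_i >= 0 for all i in I_+(y) and I_0(y).
    if any(xi < 0 for xi, yi in zip(x, y) if yi >= 0):
        return 0.0
    lower = max((xi / yi for xi, yi in zip(x, y) if yi < 0), default=-math.inf)
    upper = min((xi / yi for xi, yi in zip(x, y) if yi > 0), default=math.inf)
    # Second condition of K_y: max over I_-(y) <= min over I_+(y).
    return upper if lower <= upper else 0.0

def l_brute(x, y, grid):
    """Largest lam in a finite grid with lam*y <= x (0.0 if there is none)."""
    feasible = [lam for lam in grid if all(lam * yi <= xi for xi, yi in zip(x, y))]
    return max(feasible, default=0.0)

grid = [k / 100 for k in range(501)]                # lam = 0.00, 0.01, ..., 5.00
print(l_coupling([2.0, 3.0], [1.0, 1.0]))            # 2.0
print(l_coupling([3.0, -1.0], [1.0, -1.0]))          # 3.0
print(l_coupling([1.0, 1.0], [-1.0, -1.0]))          # inf, since y lies in -K
print(l_brute([3.0, -1.0], [1.0, -1.0], grid))       # 3.0
```

The last two printed values reflect property (9) and the agreement between the formula and the brute-force search.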


We also need to introduce the coupling function $u: X \times X \to \overline{\mathbb{R}}_+$ defined by

$$u(x, y) := \min\{\mu \ge 0 : x \le \mu y\} \qquad (12)$$

(with the convention $\min \emptyset := +\infty$). The following proposition gives some properties of the coupling function $u$.


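Proposition 2 For every $x, x', y, y' \in X$ and $\gamma > 0$, one has

$$\begin{aligned}
&u(\gamma x, y) = \gamma\, u(x, y), && (13)\\
&u(x, \gamma y) = \tfrac{1}{\gamma}\, u(x, y), && (14)\\
&u(x, y) = 0 \implies x \in -K, && (15)\\
&u(x, x) = 1 \iff x \notin -K, && (16)\\
&x \le x' \implies u(x, y) \le u(x', y), && (17)\\
&y \le y' \implies u(x, y) \ge u(x, y'). && (18)
\end{aligned}$$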


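Example 2 Let $X = \mathbb{R}^n$ and $K$ be the cone $\mathbb{R}^n_+$ of all vectors in $\mathbb{R}^n$ with nonnegative coordinates. Then, for each $x, y \in \mathbb{R}^n$, we have

$$u(x, y) = \begin{cases} \max\left\{0,\ \max_{i \in I_+(y)} \dfrac{x_i}{y_i}\right\}, & x \in C_y,\\ +\infty, & x \notin C_y, \end{cases}$$

where

$$C_y = \left\{x \in \mathbb{R}^n : x_i \le 0\ \ \forall i \in I_-(y) \cup I_0(y),\ \ \max_{i \in I_+(y)} \frac{x_i}{y_i} \le \min_{i \in I_-(y)} \frac{x_i}{y_i}\right\}$$

(again with $\min \emptyset = +\infty$ and $\max \emptyset = -\infty$).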
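The coupling $u(x, y) = \min\{\mu \ge 0 : x \le \mu y\}$ can also be computed coordinatewise for $K = \mathbb{R}^n_+$. The Python sketch below derives the admissible interval of $\mu$ directly from the componentwise inequalities $x_i \le \mu y_i$ and $\mu \ge 0$, rather than from the displayed formula of Example 2, under the conventions $\min \emptyset = +\infty$ and $\max \emptyset = -\infty$; the function name is hypothetical.

```python
import math

def u_coupling(x, y):
    """u(x, y) = min{mu >= 0 : x <= mu*y} for X = R^n, K = R^n_+; min(empty set) = +inf."""
    # If y_i <= 0 and x_i > 0, then x_i <= mu*y_i fails for every mu >= 0.
    if any(xi > 0 for xi, yi in zip(x, y) if yi <= 0):
        return math.inf
    lower = max((xi / yi for xi, yi in zip(x, y) if yi > 0), default=-math.inf)
    lower = max(lower, 0.0)                        # mu is restricted to mu >= 0
    upper = min((xi / yi for xi, yi in zip(x, y) if yi < 0), default=math.inf)
    return lower if lower <= upper else math.inf

print(u_coupling([2.0, 3.0], [1.0, 1.0]))      # 3.0
print(u_coupling([-1.0, -1.0], [1.0, 1.0]))    # 0.0, consistent with (15): x lies in -K
print(u_coupling([1.0, 0.0], [0.0, 1.0]))      # inf, no admissible mu
```

Comparing, say, $u(x, y)$ with $u(2x, y)$ or $u(x, 2y)$ gives a quick numerical check of (13) and (14).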


Theorem 2 Let $p: X \to \overline{\mathbb{R}}_+$ be a function. Then the following assertions are equivalent:

(i) $p$ is IPH.
(ii) $p(x) \ge \lambda p(y)$ for all $x, y \in X$ and $\lambda > 0$ such that $\lambda y \le x$.
(iii) $p(x) \ge l(x, y)\,p(y)$ for all $x, y \in X$, with the convention $(+\infty) \times 0 = 0$.
(iv) $p(x) \le u(x, y)\,p(y)$ for all $x, y \in X$, with the convention $(+\infty) \times 0 = +\infty$.


Proof It is clear that (i) implies (ii). To prove the implication (ii) $\implies$ (iii), notice first that, due to (7), $l(x, y) = +\infty$ implies that $y \in -K$, and so $p(y) = 0$. Then, by the convention $(+\infty) \times 0 = 0$, we have $p(x) \ge l(x, y)\,p(y)$. If $l(x, y) = 0$, then, by the nonnegativity of $p$, we get $p(x) \ge l(x, y)\,p(y)$. Finally, let $0 < l(x, y) < +\infty$. Then, in view of (4) and the closedness of $K$, we have $x \ge l(x, y)\,y$, and so (ii) implies (iii). We shall now prove the implication (iii) $\implies$ (i). Consider $x, y \in X$ such that $y \le x$. By (4) we get $l(x, y) \ge 1$. Then (iii) yields $p(x) \ge p(y)$. Hence $p$ is increasing. Let $x \in X$, $\lambda > 0$ and $l(x, \lambda x) = +\infty$. It follows from (6) and (7) that $x, \lambda x \in -K$. Since $p$ is increasing, we get $\lambda p(x) = p(\lambda x) = 0$. Let $x \notin -K$ and set $y = \lambda x$. Then, by (6) and (8), we have $l(x, \lambda x) = 1/\lambda$. Thus $p(x) \ge \frac{1}{\lambda}\, p(\lambda x)$, and so $\lambda p(x) \ge p(\lambda x)$. By replacing $\lambda$ with $1/\lambda$ and $x$ with $\lambda x$, we obtain $p(\lambda x) \ge \lambda p(x)$. This proves that $p$ is positively homogeneous. We next prove the implication (i) $\implies$ (iv). Let $u(x, y) = 0$. By (15) we get $x \le 0$. Then $p(x) = 0$, and so $p(x) \le u(x, y)\,p(y)$. If $u(x, y) = +\infty$, then, in view of the convention $(+\infty) \times 0 = +\infty$, we have $p(x) \le u(x, y)\,p(y)$. We now assume that $0 < u(x, y) < +\infty$. Then, in view of (12) and the closedness of $K$, we get $x \le u(x, y)\,y$. Hence $p(x) \le u(x, y)\,p(y)$. Finally, the proof of the implication (iv) $\implies$ (i) is analogous to that of the implication (iii) $\implies$ (i). □


We shall now describe a class of elementary functions with respect to which the IPH functions are supremally generated. Given $y \in X$, let us set $l_y(x) := l(x, y)$ for all $x \in X$. Thus, by (4),

$$l_y(x) = \max\{\lambda \ge 0 : \lambda y \le x\}, \quad \forall x \in X. \qquad (19)$$

Remark 1 The function $l_y: X \to \overline{\mathbb{R}}_+$ is an IPH function for each $y \in X$. This follows directly from (5) and (10).

Let $L$ be the set of all supremally generating elementary functions defined by (19), that is,

$$L := \{l_y : y \in X\}. \qquad (20)$$

Consider the mapping $\psi: X \to L$ defined by

$$\psi(y) = l_y, \quad y \in X.$$

We have the following proposition:


Proposition 3 The mapping $\psi: X \to L$ is onto. Moreover, it is antitone:

$$y_1 \le y_2 \implies l_{y_1} \ge l_{y_2}, \qquad (21)$$

and antihomogeneous (positively homogeneous of degree $-1$):

$$l_{\lambda y} = \frac{1}{\lambda}\, l_y, \quad \forall y \in X,\ \forall \lambda > 0. \qquad (22)$$

Proof By the definition of $L$, $\psi$ is obviously onto. Implications (21) and (22) follow from (11) and (6), respectively. □


The following example shows that the mapping $\psi$ is not one-to-one.

Example 3 Let $X = \mathbb{R}^2$ and $K = \mathbb{R}^2_+$. Consider the distinct points $y_1 = (a, b)$ and $y_2 = (c, d)$, where $a$, $b$, $c$ and $d$ are negative numbers. By the results obtained in Example 1, we have $K_{y_1} = K_{y_2} = X$. Since $I_+(y_1) = I_+(y_2) = \emptyset$, it follows that $l_{y_1}(x) = l_{y_2}(x) = +\infty$ for every $x \in X$. Thus, $l_{y_1} = l_{y_2}$.


Let $X' = X \setminus (-K)$ and $L' = \{l_y : y \in X'\}$. We then have:

Proposition 4 The mapping $\psi' = \psi|_{X'}$ is a bijection from $X'$ onto $L'$, where $\psi|_{X'}$ is the restriction of $\psi$ to $X'$.

Proof Since, by the definition of $L'$, $\psi'$ is obviously onto, we only have to prove that $\psi'$ is one-to-one. To this aim, assume that $y_1, y_2 \in X'$ are such that $l_{y_1} = l_{y_2}$. By (8), we have $1 = l(y_1, y_1) = l(y_1, y_2)$. Hence, by (4), we get $y_2 \le y_1$. By symmetry it follows that $y_1 \le y_2$. Since $K$ is pointed, we conclude that $y_1 = y_2$. □


Recall (see [8]) that a function $p: X \to \overline{\mathbb{R}}_+$ is called abstract convex with respect to the set $L$, or $L$-convex, if and only if there exists a set $W \subseteq L$ such that $p(x) = \sup_{l \in W} l(x)$. If $W \subseteq L'$, then using $\psi'$ we can identify $W$ with some subset of $X$. In terms of $X$, $p$ is $L'$-convex if there exists a subset $Y \subseteq X'$ such that $p(x) = \sup_{y \in Y} l_y(x)$. It follows from Remark 1 that $L$ consists of nonnegative IPH functions; hence each $L$-convex function is IPH.


Theorem 3 Let $p: X \to \overline{\mathbb{R}}_+$ be a function and $L$ be the set described by Eq. (20). Then $p$ is IPH if and only if there exists a set $Y \subseteq X$ such that

$$p(x) = \max_{y \in Y} l_y(x), \quad \forall x \in X$$

(with the convention $\max \emptyset := 0$). In this case, one can take $Y = \{y \in X : p(y) \ge 1\}$. Hence, $p: X \to \overline{\mathbb{R}}_+$ is IPH if and only if it is $L$-convex.


Proof We shall only show that every IPH function $p: X \to \overline{\mathbb{R}}_+$ satisfies $p(x) = \max_{y \in Y} l_y(x)$ for all $x \in X$, with

$$Y = \{y \in X : p(y) \ge 1\}.$$

It is clear that $Y \cap (-K) = \emptyset$. For any $x, y \in X$ with $p(y) \ge 1$, it follows from Theorem 2 that $p(x) \ge l_y(x)$. This means that $p \ge l_y$ for all $y \in Y$, and so $p \ge \max_{y \in Y} l_y$. If $p(x) = 0$, then, by the nonnegativity of the functions $l_y$, we have $\max_{y \in Y} l_y(x) = 0 = p(x)$. Assume now that $0 < p(x) < +\infty$. Since $p(x/p(x)) = 1$, we get $x/p(x) \in Y$. Moreover, since $x \notin -K$, it follows from (6) and (8) that $p(x) = l(x, x/p(x))$. Therefore, $p(x) = \max_{y \in Y} l_y(x)$. Finally, suppose that $p(x) = +\infty$. It follows from the positive homogeneity of $p$ that $(1/\lambda)x \in Y$ for all $\lambda > 0$. Then $\max_{y \in Y} l_y(x) \ge l_{(1/\lambda)x}(x) = \lambda$ for all $\lambda > 0$. This means that $\max_{y \in Y} l_y(x) = +\infty = p(x)$. This completes the proof. □
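Theorem 3 can be illustrated numerically in $\mathbb{R}^2$ with $K = \mathbb{R}^2_+$. The Python sketch below takes the sample IPH function $p(x) = \max\{x_1, x_2, 0\}$ (our choice, not from the text), draws random points of $Y = \{y : p(y) \ge 1\}$, and checks that each $l_y$ minorizes $p$, while $y = x/p(x)$ attains $p(x)$ whenever $p(x) > 0$. The coupling $l$ is computed by the formula of Example 1; all helper names are hypothetical.

```python
import math
import random

def l_coupling(x, y):
    """l(x, y) = max{lam >= 0 : lam*y <= x} for K = R^n_+ (formula of Example 1)."""
    if any(xi < 0 for xi, yi in zip(x, y) if yi >= 0):
        return 0.0
    lower = max((xi / yi for xi, yi in zip(x, y) if yi < 0), default=-math.inf)
    upper = min((xi / yi for xi, yi in zip(x, y) if yi > 0), default=math.inf)
    return upper if lower <= upper else 0.0

def p(x):
    """Sample nonnegative IPH function on R^2 (our choice, not from the text)."""
    return max(x[0], x[1], 0.0)

rng = random.Random(0)                       # fixed seed for reproducibility
samples = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(2000)]
Y = [y for y in samples if p(y) >= 1]        # Y = {y : p(y) >= 1}

test_points = [[1.0, 2.0], [3.0, 0.5], [0.25, 4.0], [-1.0, -2.0]]
for x in test_points:
    # each l_y with y in Y is a minorant of p (Theorem 2 (iii) with p(y) >= 1)
    assert all(l_coupling(x, y) <= p(x) for y in Y)
    # for p(x) > 0 the value p(x) is attained at y = x / p(x)
    attained = l_coupling(x, [xi / p(x) for xi in x]) if p(x) > 0 else 0.0
    print(x, p(x), attained)
```

Only finitely many $y \in Y$ are sampled, so the check confirms the inequality part of the proof on sample points and the attainment at $x/p(x)$; it does not replace the argument.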


The IPH functions are also infimally generated by the elementary functions $u_y: X \to \overline{\mathbb{R}}_+$, $y \in X$, defined by

$$u_y(x) := u(x, y) = \min\{\mu \ge 0 : x \le \mu y\}, \quad \forall x \in X.$$

In view of (13) and (17), it is clear that the function $u_y$ is IPH. Set

$$U := \{u_y : y \in X\}. \qquad (23)$$

We define the mapping $\varphi: X \to U$ by

$$\varphi(y) = u_y, \quad y \in X.$$


We omit the proofs of the following results, which are similar to those of Propositions 3 and 4.

Proposition 5 The mapping $\varphi: X \to U$ is onto. Moreover, it is antitone and antihomogeneous (positively homogeneous of degree $-1$).

Let $U' = \{u_y : y \in X'\}$. We then have:

Proposition 6 The mapping $\varphi' = \varphi|_{X'}$ is a bijection from $X'$ onto $U'$, where $\varphi|_{X'}$ is the restriction of $\varphi$ to $X'$.

A function $p: X \to \overline{\mathbb{R}}_+$ is called abstract concave with respect to the set $U$, or $U$-concave, if there exists a set $W \subseteq U$ such that $p(x) = \inf_{u \in W} u(x)$. Since $U$ consists of nonnegative IPH functions, each $U$-concave function is IPH.

The proof of the following theorem is analogous to that of Theorem 3.

Theorem 4 Let $p: X \to \overline{\mathbb{R}}_+$ be a function and $U$ be the set described by (23). Then $p$ is IPH if and only if there exists a set $W \subseteq U$ such that

$$p(x) = \min_{u \in W} u(x), \quad \forall x \in X.$$

In this case, one can take $W = \{u_y : y \in X,\ p(y) \le 1\}$. Hence $p: X \to \overline{\mathbb{R}}_+$ is IPH if and only if it is $U$-concave.


Abstract Convexity of Nonnegative IPH Functions 


We now develop an abstract convexity (resp. abstract concavity) approach to IPH functions. The set $L$ (resp. $U$) will play the role of the conjugate space in the usual linear model, while IPH functions will be regarded as analogues of sublinear functions. The well-known dual object related to a sublinear function is the so-called polar function (see, for example, [12,14]). We now give an analog of this concept for IPH functions and also define a related notion of the polar set of a set $W \subseteq X$.


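Definition 4 The lower polar function of $p: X \to \overline{\mathbb{R}}_+$ is the function $p^\circ: L \to \overline{\mathbb{R}}_+$ defined by

$$p^\circ(l) = \sup_{x \in X} \frac{l(x)}{p(x)}, \quad l \in L \qquad (24)$$

(with the conventions $0/0 = 0$ and $\infty/\infty = 0$).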


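Theorem 5 Let $p: X \to \overline{\mathbb{R}}_+$ be a function. Then

$$p^\circ(l_y) \ge \frac{1}{p(y)}, \quad \forall l_y \in L, \qquad (25)$$

and $p$ is IPH if and only if

$$p^\circ(l_y) = \frac{1}{p(y)}, \quad \forall l_y \in L. \qquad (26)$$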


Proof By (8), (9), and (24) we have $p^\circ(l_y) \ge l_y(y)/p(y) = 1/p(y)$ for every $y \in X$. Let $p$ be an IPH function and $x, y \in X$ be arbitrary. Suppose that $0 < p(x) < +\infty$ and $0 < p(y) < +\infty$. It follows from Theorem 2 that

$$\frac{l_y(x)}{p(x)} \le \frac{1}{p(y)}. \qquad (27)$$

If $p(x) = 0$, then, by part (iii) of Theorem 2, we have $l_y(x) = 0$ or $p(y) = 0$, and in both cases (27) holds. In view of (1), (27) also holds in the remaining cases. Therefore, $p^\circ(l_y) = \sup_{x \in X} l_y(x)/p(x) \le 1/p(y)$. This, together with (25), yields $p^\circ(l_y) = 1/p(y)$. To prove the converse, let $x, y \in X$ be arbitrary. It follows from (26) that $l_y(x)/p(x) \le 1/p(y)$. Thus, $l_y(x)\,p(y) \le p(x)$. Since $x$ and $y$ are arbitrary, by Theorem 2 (the implication (iii) $\implies$ (i)), we conclude that $p$ is IPH. □
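Identity (26) can be illustrated by estimating the lower polar $p^\circ(l_y) = \sup_x l_y(x)/p(x)$ over a finite grid of points $x$. The Python sketch below does this for $X = \mathbb{R}^2$, $K = \mathbb{R}^2_+$ and the sample IPH function $p(x) = \max\{x_1, x_2, 0\}$ (our choice, not from the text). Since the grid contains $y$ itself and $l_y(y) = 1$ by (8), the estimate should reproduce $1/p(y)$. Grid points with $p(x) = 0$ are skipped, which agrees with the convention $0/0 = 0$ because Theorem 2 forces $l_y(x) = 0$ there when $p(y) > 0$.

```python
import math

def l_coupling(x, y):
    """l(x, y) = max{lam >= 0 : lam*y <= x} for K = R^n_+ (formula of Example 1)."""
    if any(xi < 0 for xi, yi in zip(x, y) if yi >= 0):
        return 0.0
    lower = max((xi / yi for xi, yi in zip(x, y) if yi < 0), default=-math.inf)
    upper = min((xi / yi for xi, yi in zip(x, y) if yi > 0), default=math.inf)
    return upper if lower <= upper else 0.0

def p(x):
    """Sample IPH function on R^2 (our choice, not from the text)."""
    return max(x[0], x[1], 0.0)

def lower_polar_estimate(y, xs):
    """Sup of l_y(x)/p(x) over the finite sample xs; points with p(x) = 0 are skipped."""
    return max((l_coupling(x, y) / p(x) for x in xs if p(x) > 0), default=0.0)

grid = [0.5 * k for k in range(-8, 9)]          # -4.0, -3.5, ..., 4.0
xs = [[a, b] for a in grid for b in grid]
for y in ([2.0, -1.0], [1.0, 4.0]):
    print(y, lower_polar_estimate(y, xs), 1 / p(y))
```

A finite grid only gives a lower bound for the supremum in (24); here the bound is exact because the maximizer $x = y$ lies on the grid.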


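The set $\operatorname{supp}(p, L) = \{l_y \in L : l_y(x) \le p(x),\ \forall x \in X\}$ is called the support set of the function $p: X \to \overline{\mathbb{R}}_+$ with respect to the set $L$. If $p$ is finite-valued or IPH, then, in view of (9), we get $\operatorname{supp}(p, L) \subseteq L'$, and using $\psi'$ we can identify $\operatorname{supp}(p, L)$ with a subset of $X$. Let us denote by $\operatorname{supp}(p, X)$ the set $(\psi')^{-1}(\operatorname{supp}(p, L))$. Then

$$\operatorname{supp}(p, X) = \{y \in X : l_y \le p\}.$$

We shall call $\operatorname{supp}(p, X)$ the $X$-support of $p$.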


Proposition 7 Let $p: X \to \overline{\mathbb{R}}_+$ be a function. Then $p$ is IPH if and only if

$$\operatorname{supp}(p, X) = \{y \in X : p(y) \ge 1\}. \qquad (28)$$

Proof Let $p$ be an IPH function. We have

$$\operatorname{supp}(p, X) = \{y \in X : l_y(x) \le p(x),\ \forall x \in X\} = \{y \in X : p^\circ(l_y) \le 1\}.$$

Then (26) immediately yields (28). To prove the converse, let $x, y \in X$ be arbitrary. If $p(y) = 0$, then it is clear that $p(y)\,l_y(x) \le p(x)$. Let $0 < p(y) < +\infty$. Then, by hypothesis, we have $r = y/p(y) \in \operatorname{supp}(p, X)$. Thus $l_r(x) \le p(x)$, and by (6) we get $l_y(x)\,p(y) \le p(x)$. Finally, let $p(y) = +\infty$. By (28), $y \in \operatorname{supp}(p, X)$. Thus, $l_y(x) \le p(x)$. If $p(x) = 0$, then the nonnegativity of $l_y$ yields $l_y(x) = 0$, and so $l_y(x)\,p(y) \le p(x)$. Clearly the latter inequality holds for $p(x) = +\infty$. Let $0 < p(x) < +\infty$. Then $r = x/p(x) \in \operatorname{supp}(p, X)$. Therefore, $l_r(y) \le p(y)$ and, by (6), $p(x)\,l_x(y) \le p(y)$. Hence, by Theorem 2, $p$ is IPH, which completes the proof. □


Proposition 8 For any set $W \subseteq X$, the following assertions are equivalent:

(i) $W$ is upward, coradiant and closed along rays.
(ii) There exists an IPH function $p: X \to \overline{\mathbb{R}}_+$ such that $\operatorname{supp}(p, X) = W$.

Furthermore, the function $p$ in (ii) is unique; namely, $p$ is the Minkowski cogauge $\nu_W$ of $W$.
Proof (i) $\implies$ (ii). Let $p = \nu_W$. It is clear that $p$ is positively homogeneous. Moreover, since $W$ is upward, $p$ is increasing. By [9, Proposition 5.6], since $W$ is closed along rays and coradiant, one has

$$W = \{y \in X : p(y) \ge 1\}. \qquad (29)$$

Hence, in view of (28), $W = \operatorname{supp}(p, X)$.

(ii) $\implies$ (i). Let $W = \operatorname{supp}(p, X)$ for an IPH function $p: X \to \overline{\mathbb{R}}_+$. By (28), $W$ is coradiant, upward, and closed along rays. Finally, the uniqueness of $p$ in (ii) follows from the following equalities, the last of which uses the convention $\sup \emptyset = 0$ and is a consequence of (29):

$$\begin{aligned}
p(y) &= \sup\{\lambda > 0 : \lambda \le p(y)\} = \sup\left\{\lambda > 0 : 1 \le p\left(\frac{y}{\lambda}\right)\right\}\\
&= \sup\left\{\lambda > 0 : \frac{y}{\lambda} \in W\right\} = \nu_W(y).
\end{aligned}$$

This completes the proof. □
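Given only a membership oracle for $W$, the Minkowski cogauge $\nu_W(y) = \sup\{\lambda > 0 : y/\lambda \in W\}$ appearing in Proposition 8 can be approximated by bisection, because coradiance makes the admissible values of $\lambda$ an interval starting at 0. The Python sketch below does this for the sample function $p(z) = \max\{z_1, z_2, 0\}$ on $\mathbb{R}^2$ (our choice) and $W = \{y : p(y) \ge 1\}$, so the output should reproduce $p$. The search bound `lam_max`, the iteration count and all function names are arbitrary illustrative choices.

```python
import math

def cogauge(member, y, lam_max=1e6, iters=200):
    """Approximate nu_W(y) = sup{lam > 0 : y/lam in W} by bisection.

    Assumes W is coradiant, so the admissible lam form an interval starting at 0.
    Returns 0.0 if no admissible lam is found (sup of the empty set = 0)
    and math.inf if y/lam_max already lies in W.
    """
    if member([yi / lam_max for yi in y]):
        return math.inf
    lo, hi = 0.0, lam_max            # invariant: lo is admissible (or 0), hi is not
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if member([yi / mid for yi in y]):
            lo = mid
        else:
            hi = mid
    return lo

def p(z):
    """Sample IPH function on R^2 (our choice, not from the text)."""
    return max(z[0], z[1], 0.0)

def in_W(z):
    """Membership oracle for W = {y : p(y) >= 1}, cf. (29)."""
    return p(z) >= 1.0

for y in ([2.0, -1.0], [0.3, 0.1], [-1.0, -3.0]):
    print(y, p(y), cogauge(in_W, y))
```

Bisection is used only because it needs nothing beyond membership queries; for a specific $W$ given by a formula, $\nu_W$ can usually be written down directly.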


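For a function $p: X \to \overline{\mathbb{R}}_+$, the $L$-subdifferential at a point $x_0 \in X$ is defined as follows:

$$\partial_L p(x_0) = \{l_y \in L : p(x) - p(x_0) \ge l_y(x) - l_y(x_0),\ \forall x \in X\}.$$

If $\partial_L p(x_0) \subseteq L'$, then the set $\partial_X p(x_0) = (\psi')^{-1}(\partial_L p(x_0))$ will be called the $X$-subdifferential of $p$ at $x_0$ (note that $\partial_X p(x_0) \subseteq X'$). Thus

$$\partial_X p(x_0) = \{y \in X : p(x) - p(x_0) \ge l_y(x) - l_y(x_0),\ \forall x \in X\}. \qquad (30)$$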


The following simple statement will be useful in the sequel.

Proposition 9 Let $p: X \to \overline{\mathbb{R}}$ be an IPH function and $x \in \operatorname{dom} p$ be a point such that $p(x) \ne 0$. Then $r = x/p(x) \notin -K$.

Proof Let $p(x) > 0$. Then $p(r) = 1 > 0$. Since $p$ is an IPH function, we get $r \notin -K$. If $p(x) < 0$, then $p(-r) = -1$. Then, in view of the monotonicity of $p$, we get $-r \notin K$, that is, $r \notin -K$. This completes the proof. □


Theorem 6 Let $p: X \to \overline{\mathbb{R}}_+$ be an IPH function and $x \in \operatorname{dom} p$ be a point such that $p(x) \ne 0$. Let $r = x/p(x)$. Then $l_r \in \partial_L p(x)$, and hence $\partial_L p(x)$ is nonempty.

Proof It follows from Proposition 9 and (7) that $r \notin -K$ and $l(y, r) < +\infty$ for all $y \in X$. Clearly $p(x) \in \{\lambda \ge 0 : \lambda r \le x\}$. Then, by the definition of $l$, we have $l(x, r) \ge p(x) > 0$. We shall now show that $l_r(y) \le p(y)$ for any $y \in X$. To this end, let $y \in X$ be arbitrary. If $l(y, r) = 0$, then $l_r(y) \le p(y)$. Let $0 < l(y, r) < +\infty$. We have $l(y, r)\,r \le y$. Since $p$ is IPH, we get $l(y, r)\,p(r) \le p(y)$. Because $p(r) = 1$, we get $l_r(y) \le p(y)$. Since $y \in X$ was arbitrary, we conclude that $l_r(x) = p(x)$ and $l_r(y) \le p(y)$ for all $y \in X$. This yields $l_r \in \partial_L p(x)$. □


Remark 2 Let $\operatorname{int} K \ne \emptyset$. Consider a nonzero IPH function $p: X \to \overline{\mathbb{R}}_+$ and $x \in X$ such that $p(x) = 0$. We can show that $\partial_L p(x) \ne \emptyset$. Indeed, since $p \ne 0$, there exists $r \in \operatorname{int} K$ such that $p(r) > 0$ (see [2], Proposition 6). Set $r' = r/p(r)$. It is clear that $p(r') = 1$, and so, by (28), $r' \in \operatorname{supp}(p, X)$, that is, $l_{r'}(t) \le p(t)$ for all $t \in X$. It follows from the nonnegativity of $l_{r'}$ that $l_{r'}(x) = p(x) = 0$. Hence, $l_{r'} \in \partial_L p(x)$.


We next define the upper polar function $p_\circ: U \to \overline{\mathbb{R}}_+$ of the function $p: X \to \overline{\mathbb{R}}_+$ by

$$p_\circ(u_y) = \inf_{x \in X} \frac{u_y(x)}{p(x)}, \quad u_y \in U \qquad (31)$$

(with the conventions $0/0 = +\infty$ and $+\infty/+\infty = +\infty$). The proof of the following result is analogous to that of Theorem 5.


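Theorem 7 Let $p: X \to \overline{\mathbb{R}}_+$ be a function. Then

$$p_\circ(u_y) \le \frac{1}{p(y)}, \quad \forall u_y \in U, \qquad (32)$$

and $p$ is IPH if and only if

$$p_\circ(u_y) = \frac{1}{p(y)}, \quad \forall u_y \in U. \qquad (33)$$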


We shall now study the structure of support sets from above, which are characterized by the elementary functions $u_y$ rather than by the functions $l_y$ (which characterize support sets from below). We denote the support set from above, or upper support set, of the function $p: X \to \overline{\mathbb{R}}_+$ with respect to the set $U$ by

$$\operatorname{Supp}^+(p, U) = \{u_y \in U : u_y \ge p\}.$$
In what follows, we state the counterpart of Proposition 7 for the support set from above.

Proposition 10 Let $p: X \to \overline{\mathbb{R}}_+$ be a function. Then

$$\operatorname{Supp}^+(p, U) = \{u_y \in U : p_\circ(u_y) \ge 1\}. \qquad (34)$$

Furthermore, $p$ is IPH if and only if

$$\operatorname{Supp}^+(p, U) = \{u_y \in U : p(y) \le 1\}. \qquad (35)$$

Proof Equality (34) follows easily from the definitions of $\operatorname{Supp}^+(p, U)$ and $p_\circ$. Furthermore, if $p$ is IPH, then (35) follows from (33) and (34). To prove the converse, let $x, y \in X$ be arbitrary. If $0 < p(y) < +\infty$, then $u_{y/p(y)} \in \operatorname{Supp}^+(p, U)$. Thus, $u_{y/p(y)}(x) \ge p(x)$. By (14) we get $p(y)\,u(x, y) \ge p(x)$. If $p(y) = +\infty$, we have $p(x) \le u(x, y)\,p(y)$ (here we use the convention $(+\infty) \times 0 = +\infty$). Finally, let $p(y) = 0$. It is clear that $u_y \in \operatorname{Supp}^+(p, U)$. Thus, $u_y(x) \ge p(x)$. If $u_y(x) = 0$, then, in view of the nonnegativity of $p$, we get $p(x) = 0$, and so $p(x) \le u(x, y)\,p(y)$. Now, suppose that $0 < u_y(x) < +\infty$. It follows from $p(\lambda y) = 0$ for all $\lambda > 0$ that $u_{\lambda y} \in \operatorname{Supp}^+(p, U)$ for all $\lambda > 0$, and in view of (14), we get $(1/\lambda)\,u_y(x) = u_{\lambda y}(x) \ge p(x)$ for all $\lambda > 0$. This means that $p(x) = 0$. Therefore, $p(x) \le u(x, y)\,p(y)$. Hence, by Theorem 2 (implication (iv) $\implies$ (i)), $p$ is IPH. □


For a function $p: X \to \overline{\mathbb{R}}_+$, in a manner analogous to the case of the $L$-subdifferential, we now define the $U$-superdifferential of $p$ at $x_0$ as follows:

$$\partial_U^+ p(x_0) := \{u_y \in U : u_y(x) - u_y(x_0) \ge p(x) - p(x_0),\ \forall x \in X\}.$$

One can prove the following result for the $U$-superdifferential in a manner analogous to the proof of Theorem 6, and therefore we omit its proof.

Theorem 8 Let $p: X \to \overline{\mathbb{R}}_+$ be an IPH function and $x \in \operatorname{dom} p$ be a point such that $p(x) \ne 0$. Let $r = x/p(x)$. Then $u_r \in \partial_U^+ p(x)$.


Definition 5 Let $W \subseteq X$. Then the left polar set of $W$ is defined by

$$W^{\circ l} = \{x \in X : l(x, y) \le 1,\ \forall y \in W\}.$$

Analogously, we define the right polar set of $V \subseteq X$.

Definition 6 Let $V \subseteq X$. Then the right polar set of $V$ is defined by

$$V^{\circ r} = \{y \in X : l(x, y) \le 1,\ \forall x \in V\}.$$


In the following theorem, we assume that $\operatorname{int} K \ne \emptyset$.

Theorem 9 Let $W, V \subseteq X$ and $V \cap \operatorname{int} K \ne \emptyset$. Then the following assertions are true:

(i) One has $W = W^{\circ l \circ r}$ if and only if $W$ is upward, coradiant and closed along rays.
(ii) One has $V = V^{\circ r \circ l}$ if and only if $V$ is downward, radiant, and closed along rays.

Proof Since $X^{\circ l} = X^{\circ r} = \emptyset$, $\emptyset^{\circ l} = \emptyset^{\circ r} = X$, and $X$ is upward, downward, radiant, coradiant and closed along rays, both statements are true when $W = V = X$. For the rest of the proof we shall assume that $W \ne X$ and $V \ne X$.

(i) Let $W \subseteq X$ and $W^{\circ l} \ne \emptyset$. By the definition of $W^{\circ l}$, Remark 1, and Proposition 3, $W^{\circ l \circ r}$ is coradiant, upward, and closed along rays. Therefore, $W = W^{\circ l \circ r}$ implies that $W$ is coradiant, upward, and closed along rays. To prove the converse, we shall first show that $W \subseteq W^{\circ l \circ r}$. Let $y \in W$. Since $l_y(x) \le 1$ for any $x \in W^{\circ l}$, it is clear that $y \in W^{\circ l \circ r}$. We shall now show that $W^{\circ l \circ r} \subseteq W$. Let $y \in W^{\circ l \circ r}$. By Proposition 8 we have $W = \operatorname{supp}(p, X)$ for some IPH function $p: X \to \overline{\mathbb{R}}_+$. Let $x \in X$ and $\lambda \in (p(x), +\infty)$ be arbitrary. For every $y' \in W = \operatorname{supp}(p, X)$, since $l_{y'}(x) \le p(x) < \lambda$, using (5) one gets $l_{y'}(x/\lambda) = \lambda^{-1} l_{y'}(x) < 1$, whence $x/\lambda \in W^{\circ l}$. Therefore, $l_y(x) = \lambda\, l_y(x/\lambda) \le \lambda$. Since $\lambda > p(x)$ was arbitrary, $l_y(x) \le p(x)$. This proves that $l_y \le p$, that is, $y \in \operatorname{supp}(p, X) = W$.

(ii) Suppose that $V$ is a nonempty set. Then, by the definition of $V^{\circ r}$, Proposition 3, and Remark 1, $V^{\circ r \circ l}$ is downward, radiant, and closed along rays. Therefore, $V = V^{\circ r \circ l}$ implies that $V$ is downward, radiant, and closed along rays. To prove the converse, we shall first show that $V \subseteq V^{\circ r \circ l}$. Let $x \in V$. Since $l_y(x) \le 1$ for any $y \in V^{\circ r}$, it follows that $x \in V^{\circ r \circ l}$. We shall now show that $V^{\circ r \circ l} \subseteq V$. Let $x \in V^{\circ r \circ l}$. Consider the Minkowski gauge $\mu_V: X \to \overline{\mathbb{R}}_+$. It follows from [9], Proposition 5.1 that

$$V = \{t \in X : \mu_V(t) \le 1\}. \qquad (36)$$

Thus, if $\mu_V(x) = 0$, then $x \in V$. Assume that $\mu_V(x) > 0$. Since $V \cap \operatorname{int} K \ne \emptyset$, we get $\mu_V(x) < +\infty$. Set $r = x/\mu_V(x)$. By (28), $r \in \operatorname{supp}(\mu_V, X)$. Then $l_r(t) \le \mu_V(t)$ for each $t \in X$. In view of (36), we obtain $l_r(t) \le 1$ for all $t \in V$, that is, $r \in V^{\circ r}$. Thus, $l_r(x) \le 1$, and so, by (6) and (8), $\mu_V(x) \le 1$ (note that $\mu_V(x) > 0$ implies $x \notin -K$). This proves that $x \in V$, which completes the proof. □


Abstract Concavity of DPH Functions 


Recall that a function $q: X \to \overline{\mathbb{R}}$ is called decreasing if $x \ge y \implies q(x) \le q(y)$. If $p$ is an IPH function, then the functions $q(x) = p(-x)$ and $q(x) = -p(x)$ are DPH (decreasing and positively homogeneous of degree one). Hence, DPH functions can be investigated by using the properties of IPH functions. In this section, we shall study DPH functions separately. To this end, we need to introduce the function $g: X \times X \to \overline{\mathbb{R}}$ defined by

$$g(x, y) := \min\{\lambda \in \mathbb{R} : \lambda y \le x\} \qquad (37)$$

(with the conventions $\min \emptyset := +\infty$ and $\min \mathbb{R} := -\infty$).


The following proposition can be easily proved:

Proposition 11 For every $x, x', y \in X$ and $\lambda > 0$, one has

$$\begin{aligned}
&g(\lambda x, y) = \lambda\, g(x, y), && (38)\\
&g(x, \lambda y) = \tfrac{1}{\lambda}\, g(x, y), && (39)\\
&x \le x' \implies g(x, y) \ge g(x', y), && (40)\\
&g(x, y) = -\infty \implies y \in K, && (41)\\
&g(x, x) = 1 \iff x \notin K. && (42)
\end{aligned}$$

It is worth noting that in (37) we cannot restrict the definition of $g$ to $\lambda \le 0$, because we would lose property (42).

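For each $y \in X$, we consider the cones $C_y$, $C_y^+$ and $C_y^-$ defined by

$$C_y = \{x \in X : g(x, y) \in \mathbb{R}_{+\infty}\}, \qquad (43)$$
$$C_y^+ = \{x \in C_y : g(x, y) > 0\}, \qquad (44)$$
$$C_y^- = \{x \in C_y : g(x, y) \in \mathbb{R}_-\}. \qquad (45)$$

It is easy to check that $C_y$ is an upward convex cone and $C_y^+$ is a downward cone. Each element $y \in X$ generates the following functions:

$$f_y^+(x) = \begin{cases} g(x, y), & x \in C_y^+,\\ +\infty, & \text{otherwise}, \end{cases} \qquad (46)$$

and

$$f_y^-(x) = \begin{cases} g(x, y), & x \in C_y^-,\\ +\infty, & \text{otherwise}. \end{cases} \qquad (47)$$

Let $F$ be the set of all functions defined by (46) and (47).

Remark 3 The function $f_y^+$ is DPH for each $y \in X$.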


The proof of the following proposition is similar to that of Proposition 9, and therefore we omit it.

Proposition 12 Let $q: X \to \overline{\mathbb{R}}$ be a DPH function and $x \in \operatorname{dom} q$ be a point such that $q(x) \ne 0$. Then $r = x/q(x) \notin K$.


Proposition 13 Let $q: X \to \overline{\mathbb{R}}$ be a DPH function and $x \in \operatorname{dom} q$ be a point such that $q(x) \ne 0$. Let $r = x/q(x)$. Then the superdifferential $\partial_F^+ q(x)$ is nonempty and the following assertions are true:

1. If $q(x) > 0$, then $f_r^+ \in \partial_F^+ q(x)$.
2. If $q(x) < 0$, then $f_r^- \in \partial_F^+ q(x)$.

Proof We only prove part 1. Since $q(x) \in \{\lambda \in \mathbb{R} : \lambda r \le x\}$, by (37) we get $g(x, r) \le q(x)$. In view of (39) and (42), we have

$$g(x, r) = q(x)\, g(x, x) = q(x) > 0$$

(note that since $q(x) \ne 0$, it follows from Proposition 12 that $x \notin K$). By (44) and (46) we have $x \in C_r^+$ and $f_r^+(x) = g(x, r) \le q(x)$. We shall now show that $f_r^+(y) \ge q(y)$ for every $y \in X$. Let $y \in X$ be arbitrary. If $y \notin C_r^+$, then $f_r^+(y) = +\infty \ge q(y)$. Assume that $y \in C_r^+$. Then $g(y, r)\,r \le y$. Since $q$ is DPH, we get $g(y, r)\,q(r) \ge q(y)$. It follows from $q(r) = 1$ and (46) that $f_r^+(y) \ge q(y)$. This yields $f_r^+(x) = q(x)$ and $f_r^+(y) \ge q(y)$ for each $y \in X$. Hence $f_r^+ \in \partial_F^+ q(x)$. □
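Part 1 of Proposition 13 can be spot-checked in $\mathbb{R}^2$ with $K = \mathbb{R}^2_+$. The Python sketch below computes $g(x, y) = \min\{\lambda \in \mathbb{R} : \lambda y \le x\}$ coordinatewise, builds $f_r^+$ as in (46), and verifies on a grid that $f_r^+ \ge q$ with equality at $x$, for the sample DPH function $q(x) = \max\{-x_1, -x_2, 0\}$ (our choice, not from the text). The point $x$ is chosen so that $r = x/q(x)$ has power-of-two coordinates, which keeps the floating-point divisions exact.

```python
import math

def g_coupling(x, y):
    """g(x, y) = min{lam in R : lam*y <= x} for K = R^n_+; min(empty) = +inf, min(R) = -inf."""
    if any(xi < 0 for xi, yi in zip(x, y) if yi == 0):
        return math.inf          # lam*0 <= x_i fails for every lam when x_i < 0
    lower = max((xi / yi for xi, yi in zip(x, y) if yi < 0), default=-math.inf)
    upper = min((xi / yi for xi, yi in zip(x, y) if yi > 0), default=math.inf)
    return lower if lower <= upper else math.inf

def f_plus(x, r):
    """f_r^+(x) = g(x, r) if g(x, r) > 0 (x in C_r^+), and +inf otherwise, as in (46)."""
    val = g_coupling(x, r)
    return val if val > 0 else math.inf

def q(x):
    """Sample DPH function on R^2 (our choice, not from the text)."""
    return max(-x[0], -x[1], 0.0)

x = [-2.0, 1.0]
r = [xi / q(x) for xi in x]                      # r = x / q(x) = [-1.0, 0.5]
grid = [0.5 * k for k in range(-6, 7)]
ok = all(f_plus([a, b], r) >= q([a, b]) for a in grid for b in grid)
print(f_plus(x, r), q(x), ok)
```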


It follows from the preceding proposition that we do not need functions of the form (47) in the study of nonpositive DPH functions. For each $r \in X$, we can consider the function $s_r: X \to \mathbb{R}_-$ defined by

$$s_r(x) = \begin{cases} g(x, r), & x \in C_r^-,\\ 0, & x \notin C_r^-, \end{cases} \qquad (48)$$


Increasing and Positively Homogeneous Functions on Topological Vector Spaces 




instead of the function $f_r^-$ defined by (47). Let $S$ be the set of all functions defined by (48). Since $C_r^-$ is an upward set, the set $S$ consists of nonpositive DPH functions; hence each $S$-concave function is DPH. We shall now give an infimal representation of DPH functions.


Proposition 14 Let $q: X \to \mathbb{R}_-$ be a nonzero function. Then $q$ is DPH if and only if it is $S$-concave.

Proof We only prove the "only if" part. Let $W' = \operatorname{Supp}^+(q, S)$. We shall show that $W' \ne \emptyset$. Consider $x \in X$ such that $q(x) < 0$. Set $r = x/q(x)$. It follows from Proposition 12 and (41) that $r \notin K$ and $g(y, r) > -\infty$ for all $y \in X$. Since $q(x) \in \{\lambda \in \mathbb{R} : \lambda r \le x\}$, by (37) we get $g(x, r) \le q(x) < 0$. Then $x \in C_r^-$, and by (48) we obtain $s_r(x) \le q(x)$. We shall now show that $s_r(y) \ge q(y)$ for each $y \in X$. Let $y \in X$ be arbitrary. If $y \notin C_r^-$, then $s_r(y) = 0 \ge q(y)$. Assume that $y \in C_r^-$. Then $(-g(y, r))(-r) = g(y, r)\,r \le y$. Since $q$ is DPH, we get $-g(y, r)\,q(-r) \ge q(y)$. It follows from $q(-r) = -1$ and $y \in C_r^-$ that $s_r(y) \ge q(y)$. Thus $s_r \in W' = \operatorname{Supp}^+(q, S)$ and $s_r(x) = q(x)$. Finally, if $q(x) = 0$, then $s(x) = 0$ for each $s \in W'$. Hence $q(x) = \min_{s \in W'} s(x)$, that is, $q$ is $S$-concave. □


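In the sequel, we introduce the function $h: X \times X \to \overline{\mathbb{R}}$ defined by

$$h(x, y) = \max\{\lambda \in \mathbb{R} : \lambda y \le x\} \qquad (49)$$

(we use the conventions $\max \emptyset := -\infty$ and $\max \mathbb{R} := +\infty$). The next proposition gives some properties of the coupling function $h$. We omit its easy proof.

Proposition 15 For every $x, x', y \in X$ and $\gamma > 0$, one has

$$\begin{aligned}
&h(\gamma x, y) = \gamma\, h(x, y), && (50)\\
&h(x, \gamma y) = \tfrac{1}{\gamma}\, h(x, y), && (51)\\
&h(x, y) = +\infty \implies y \in -K, && (52)\\
&h(x, x) = 1 \iff x \notin -K, && (53)\\
&x \le x' \implies h(x, y) \le h(x', y), && (54)\\
&x \in K,\ y \in -K \implies h(x, y) = +\infty. && (55)
\end{aligned}$$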


For each $y \in X$, consider the cones

$$K_y = \{x \in X : h(x, y) < +\infty\} \qquad (56)$$

and

$$K_y^- = \{x \in K_y : h(x, y) \le 0\}. \qquad (57)$$

Clearly, $K_y^-$ is a downward cone. Each element $y \in X$ generates the function $g_y: X \to \overline{\mathbb{R}}_-$ defined by

$$g_y(x) = \begin{cases} h(x, y), & x \in K_y^-,\\ -\infty, & x \notin K_y^-. \end{cases} \qquad (58)$$

Let $G_-$ be the set of all functions defined by (58). We conclude this section with a result on nonpositive IPH functions.


Theorem 10 Let p: X —> R_ be an IPH function 
and x € dom p be a point such that p(x) #0. Let 
r = x/p(x). Then g, € 0g_ p(x), and hence dg_ p(x) is 
nonempty. 


Proof It is clear that $p(x) < 0$. Since $p(x) \in \{\lambda \in \mathbb{R} : \lambda r \le x\}$, by (49) we get $h(x,r) \ge p(x)$. In view of Proposition 15, we have

$$h(x,r) = p(x)\,h(x,x) = p(x) < 0$$

(note that since $p(x) \neq 0$, it follows from Proposition 9 that $x \notin -K$). By (51) and (52) we have $x \in K_r^-$ and $g_r(x) = h(x,r) \ge p(x)$. We shall now show that $g_r(y) \le p(y)$ for every $y \in X$. Let $y \in X$ be arbitrary. If $y \notin K_r^-$, then $g_r(y) = -\infty \le p(y)$. Assume that $y \in K_r^-$. Then $(-h(y,r))(-r) = h(y,r)\,r \le y$. Since $p$ is IPH, we have $-h(y,r)\,p(-r) \le p(y)$. It follows from $p(-r) = -1$ and (52) that $g_r(y) \le p(y)$. This yields that $g_r(x) = p(x)$ and $g_r(y) \le p(y)$ for each $y \in X$. Hence $g_r \in \partial_{G_-} p(x)$. □
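Theorem 10 can be spot-checked numerically under similar assumptions: $X = \mathbb{R}^2$, $K = \mathbb{R}^2_+$, the hypothetical IPH function $p(x) = \min\{0, x_1, x_2\}$ (increasing, positively homogeneous and nonpositive), and $K_r^-$ read as the set of points where $h(\cdot, r) < 0$. Exact rational arithmetic (`fractions.Fraction`) avoids rounding in the comparisons. Passing such a check is evidence under these assumptions, not a proof.

```python
import math
import random
from fractions import Fraction


def coupling_h(x, y):
    """h(x, y) = max{lam : lam * y <= x} for the componentwise order (assumed)."""
    lower, upper = -math.inf, math.inf
    for xi, yi in zip(x, y):
        if yi > 0:
            upper = min(upper, xi / yi)
        elif yi < 0:
            lower = max(lower, xi / yi)
        elif xi < 0:
            return -math.inf
    return upper if lower <= upper else -math.inf


def p(x):
    """Hypothetical nonpositive IPH function p(x) = min{0, x_1, x_2}."""
    return min(0, *x)


def g_r(y, r):
    """g_r(y) as in (58), reading K_r^- as {y : h(y, r) < 0} (assumption)."""
    value = coupling_h(y, r)
    return value if value < 0 else -math.inf


def random_point(rng):
    return [Fraction(rng.randint(-50, 50), rng.randint(1, 10)) for _ in range(2)]


def check_theorem_10(trials=200, samples=200, seed=0):
    """True if g_r(x) = p(x) and g_r(y) <= p(y) at every sampled point."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = random_point(rng)
        px = p(x)
        if px == 0:
            continue                      # Theorem 10 assumes p(x) != 0
        r = [xi / px for xi in x]         # r = x / p(x)
        if g_r(x, r) != px:
            return False
        for _ in range(samples):
            y = random_point(rng)
            if g_r(y, r) > p(y):
                return False
    return True


print(check_theorem_10())
```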
The Problem


An inequality-constrained nonlinear programming problem may be posed in the form

$$\min_x f(x) \quad \text{s.t.} \quad c(x) \ge 0, \qquad (1)$$

where $f(x)$ is a nonlinear function and $c(x)$ is an $m$-vector of nonlinear functions with $i$th component $c_i(x)$, $i = 1,\dots,m$. We shall assume that $f$ and $c$ are sufficiently smooth. Let $x^*$ denote a solution to (1). We are mainly concerned about smoothness in the neighborhood of $x^*$. In such a neighborhood we assume that both the gradient of $f(x)$, denoted by $g(x)$, and the $m \times n$ Jacobian of $c(x)$, denoted by $J(x)$, exist and are Lipschitz continuous. As is the case with the unconstrained problem, a solution to this problem may not exist. Typically, additional assumptions are made to ensure that a solution does exist. A common assumption is that the objective $f(x)$ is bounded below on the feasible set. However, even this is not sufficient to assure that a minimizer exists, but it is obviously a necessary condition for an algorithm to be assured of converging. If the feasible region is nonempty and compact then a solution does exist. We shall only be concerned with local solutions.


First Order Optimality Conditions 


The problem is closely related to the equality-constrained problem. If it were known which constraints were active (exactly satisfied) at a solution and which were slack (strictly positive), then the optimality conditions for (1) could be replaced by the optimality conditions for the equality case. Note that this does not imply that the inequality problem could be replaced by an equality problem when it comes to determining a solution by an algorithm. The inequality problem may have solutions corresponding to different sets of constraints being active. Also, an equality problem may have solutions that are not solutions of the inequality problem. Nonetheless, this equivalence in a local neighborhood enables us to determine the optimality conditions for this problem from those of an equality-constrained problem. In order to study the optimality conditions it is necessary to introduce some notation.

Let $\hat c(x)$ and $\check c(x)$ denote the constraints active and slack at $x$, respectively. Likewise, let $\hat J(x)$ and $\check J(x)$ denote their respective Jacobians. Assume that $\hat J(x^*)$ is full rank. Points at which the Jacobian of the active constraints is full rank are said to be regular. It follows from the necessary conditions for the equality case that

$$g(x^*) - \hat J(x^*)^T \hat\lambda = 0, \qquad \hat c(x^*) = 0, \qquad \check c(x^*) > 0,$$

where $\hat\lambda$ is a vector of Lagrange multipliers. These equations may be written in the form

$$g(x^*) - J(x^*)^T \lambda^* = 0, \qquad \hat c(x^*) = 0, \qquad \lambda^{*T} c(x^*) = 0,$$

where $\lambda^*$ is the extended set of Lagrange multipliers. The set is extended by defining a multiplier to be zero for each constraint that is slack at $x^*$, i.e., for each component of $\check c(x^*)$.

The above first order optimality conditions are not the only necessary conditions. Unlike the equality case, there may be a feasible arc that moves off one or more of the active constraints along which the objective is reduced. In other words, we need some characterization that is necessary for the active set to be binding. The key to identifying the binding set is to examine the sign of $\hat\lambda$.

It follows from the definition of $\hat\lambda$ that

$$\hat\lambda = (\hat J \hat J^T)^{-1} \hat J g, \qquad (2)$$

where the argument $x^*$ has been dropped for simplicity. Note that (2) implies that $\hat\lambda$ is bounded. Define $p$ to satisfy

$$\hat J p = \delta e + e_j,$$

where $\delta > 0$, $e$ denotes the vector of ones and $e_j$ is the unit column with one in the $j$th position. It follows from the assumption on the continuity of the Jacobian that $x^* + \alpha p$ is feasible for $0 < \alpha < \bar\alpha$, where $\bar\alpha$ is sufficiently small.
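From the mean value theorem we have

$$f(x^* + \alpha p) = f(x^*) + \alpha\, p^T g(x^* + \xi\alpha p),$$

where $0 < \xi < 1$. The Lipschitz continuity of $g$ implies that a constant $M$ exists such that

$$p^T g(x^* + \xi\alpha p) \le p^T g(x^*) + \alpha M.$$

It follows that

$$f(x^* + \alpha p) \le f(x^*) + \alpha\bigl(p^T g(x^*) + \alpha M\bigr).$$

From the necessary conditions on $x^*$ we get $p^T g(x^*) = p^T \hat J^T \hat\lambda$, which implies

$$f(x^* + \alpha p) \le f(x^*) + \alpha\bigl(p^T \hat J^T \hat\lambda + \alpha M\bigr).$$

Using the definition of $p$ gives

$$f(x^* + \alpha p) \le f(x^*) + \alpha\bigl(\delta e^T \hat\lambda + \hat\lambda_j + \alpha M\bigr).$$

It follows from the boundedness of $\hat\lambda$ that if $\hat\lambda_j < 0$ then for $\delta$ sufficiently small there exists $\bar\alpha$ such that for $0 < \alpha < \bar\alpha$,

$$f(x^* + \alpha p) < f(x^*).$$

Consequently, a necessary condition for $x^*$ to be a minimizer under the assumptions made is that $\hat\lambda \ge 0$. Equivalently, $\lambda^* \ge 0$.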
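The multiplier estimate (2) and the sign test can be tried on a small example. The sketch assumes NumPy and a hypothetical problem, $\min \|x - t\|^2$ subject to $c_1(x) = 1 - x_1 \ge 0$ and $c_2(x) = x_2 \ge 0$, examined at $x = (1, 1)$, where only $c_1$ is active. It solves $\hat J^T \hat\lambda = g$ in the least-squares sense, which agrees with (2) when $\hat J$ has full row rank. With $t = (2, 1)$ the estimate is nonnegative; with $t = (0, 1)$ it is negative, and indeed $f$ decreases by moving off $c_1$ into the interior.

```python
import numpy as np


def grad_f(x, target):
    # Gradient of the hypothetical objective f(x) = ||x - target||^2.
    return 2.0 * (x - target)


def c(x):
    # Hypothetical constraints c(x) >= 0: c1 = 1 - x1, c2 = x2.
    return np.array([1.0 - x[0], x[1]])


def jac_c(x):
    # Jacobian J(x) of c (constant because both constraints are linear).
    return np.array([[-1.0, 0.0], [0.0, 1.0]])


def multiplier_estimate(x, target, tol=1e-10):
    """hat-lambda solving J_hat^T lam = g in the least-squares sense."""
    active = np.abs(c(x)) <= tol              # constraints exactly satisfied at x
    J_hat = jac_c(x)[active]                  # rows of the active constraints
    lam_hat, *_ = np.linalg.lstsq(J_hat.T, grad_f(x, target), rcond=None)
    return lam_hat


x = np.array([1.0, 1.0])
print(multiplier_estimate(x, np.array([2.0, 1.0])))  # nonnegative: consistent with a minimizer
print(multiplier_estimate(x, np.array([0.0, 1.0])))  # negative: moving off c1 decreases f
```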

Under different assumptions, such as $\hat J$ not being full rank, the condition need not hold, as the following simple case illustrates. Suppose we have an equality-constrained problem with constraints $c(x) = 0$; then an equivalent inequality-constrained problem is

$$\min_x f(x) \quad \text{s.t.} \quad c(x) \ge 0, \quad -c(x) \ge 0.$$

It follows that all constraints are active at a solution. We know that in this case there are no necessary sign conditions on the Lagrange multipliers. Clearly the Jacobian of the active constraints is not full rank. Geometrically, what breaks down is that there is no perturbation from $x^*$ that moves feasibly off one constraint without violating at least one other constraint.

The condition $c(x^*)^T \lambda^* = 0$ is a complementarity condition: for each $i$, at least one of $c_i(x^*)$ and $\lambda_i^*$ must be zero. It is possible for both to be zero. If there is no index for which both are zero, then $c(x^*)$ and $\lambda^*$ are said to satisfy strict complementarity.

If $\hat J(x^*)$ is full rank, then it follows from (2) that $\lambda^*$ is an isolated point.
The function

$$L(x,\lambda) = f(x) - \lambda^T c(x)$$

is known as the Lagrangian. The optimality condition

$$g(x^*) - J(x^*)^T \lambda^* = 0$$

is equivalent to $\nabla_x L(x^*,\lambda^*) = 0$. It is also equivalent to $Z(x^*)^T g(x^*) = 0$, where the columns of $Z(x)$ are a basis for the null space of the rows of $\hat J(x)$. The vector $Z(x)^T g(x)$ is called the reduced gradient.
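One way to obtain $Z(x)$ is from a singular value decomposition of $\hat J(x)$: the right singular vectors belonging to zero singular values span its null space. The sketch assumes NumPy and uses a hypothetical problem, $\min (x_1-2)^2 + (x_2-1)^2$ subject to $1 - x_1 \ge 0$ and $x_2 \ge 0$, evaluated at points where only the first constraint is active.

```python
import numpy as np


def null_space_basis(J_hat, tol=1e-12):
    """Columns form an orthonormal basis for {p : J_hat p = 0}, taken from the SVD."""
    _, s, vt = np.linalg.svd(J_hat)
    rank = int(np.sum(s > tol))
    return vt[rank:].T


def reduced_gradient(x):
    """Z(x)^T g(x) for the hypothetical problem min (x1-2)^2 + (x2-1)^2,
    assuming only the constraint 1 - x1 >= 0 is active at x."""
    g = 2.0 * (x - np.array([2.0, 1.0]))   # gradient of the objective
    J_hat = np.array([[-1.0, 0.0]])        # Jacobian of the active constraint 1 - x1
    return null_space_basis(J_hat).T @ g


print(reduced_gradient(np.array([1.0, 1.0])))  # x* = (1, 1): reduced gradient is zero
print(reduced_gradient(np.array([1.0, 0.5])))  # nonzero: f decreases along the constraint
```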

Clearly, Lagrange multipliers play a significant role in defining the solution of an inequality-constrained problem. There is a significant difference in that role between linear and nonlinear constraints. In the case of linear constraints the numerical value of the multiplier plays no role in defining $x^*$; only the sign of the multiplier is significant. For nonlinear constraints the numerical value as well as the sign is of significance. To see why, it is first necessary to appreciate that for problems that are nonlinear in either the constraints or the objective, the curvature of the functions is relevant in defining $x^*$; more precisely, the curvature of the Lagrangian. It is easily seen that the curvature of the objective is relevant, since for unconstrained problems no solution would exist otherwise. To see that curvature in $c(x)$ is relevant, note that any problem can be transformed into a problem with just a linear objective by adding an extra variable. For example, add the constraint $x_{n+1} - f(x) = 0$ and minimize $x_{n+1}$ instead of $f(x)$. Since we have established that the curvature of $f(x)$ is relevant, that relevance must still be there even though $f(x)$ now appears only within a constraint. It is harder to appreciate that it is the relative curvature of the various constraints and objective that is of significance.


Second Order Optimality Conditions 


We shall now assume that the problem functions are twice continuously differentiable. From the unconstrained case it is known that a necessary condition is that $\nabla^2 f(x^*)$ is positive semidefinite. Obviously a generalization of this condition needs to hold for (1). Again the Lagrangian will be shown to play a key role. We start by examining the behavior of $f(x)$ along a feasible arc emanating from $x^*$. Although the first order optimality conditions make the first order change in the objective along a feasible arc nonnegative, it could be zero. Consequently, the second order change needs to be nonnegative for arcs where this is true.

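We restrict our interest to feasible arcs that remain on the set of constraints active at $x^*$. If $x(\alpha)$ represents a twice differentiable arc, with $x(0) = x^*$, that lies on the active set, then $\hat c(x(\alpha)) = 0$. Define $p = dx(0)/d\alpha$ and $h = d^2x(0)/d\alpha^2$. We have

$$\frac{d}{d\alpha}\hat c_i(x(\alpha)) = \nabla\hat c_i(x(\alpha))^T \frac{d}{d\alpha}x(\alpha),$$

$$\frac{d^2}{d\alpha^2}\hat c_i(x(\alpha)) = \frac{d}{d\alpha}x(\alpha)^T\, \nabla^2\hat c_i(x(\alpha))\, \frac{d}{d\alpha}x(\alpha) + \nabla\hat c_i(x(\alpha))^T \frac{d^2}{d\alpha^2}x(\alpha).$$

Since $\hat c(x(\alpha)) = 0$ it follows that

$$\frac{d^2}{d\alpha^2}\hat c_i(x(0)) = \nabla\hat c_i(x^*)^T h + p^T \nabla^2\hat c_i(x^*)\, p = 0. \qquad (3)$$

Similarly we get

$$\frac{d^2}{d\alpha^2} f(x(0)) = g(x^*)^T h + p^T \nabla^2 f(x^*)\, p.$$

Since

$$\frac{d}{d\alpha} f(x(0)) = g(x^*)^T p = 0$$

(otherwise there would be a descent direction from $x^*$), we require that

$$g(x^*)^T h + p^T \nabla^2 f(x^*)\, p \ge 0.$$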


Substituting for $g(x^*)$ using the first order optimality conditions gives

$$h^T J(x^*)^T \lambda^* + p^T \nabla^2 f(x^*)\, p \ge 0.$$

It follows from (3) and the definition of the extended multipliers that we require

$$-\sum_{i=1}^{m} \lambda_i^*\, p^T \nabla^2 c_i(x^*)\, p + p^T \nabla^2 f(x^*)\, p \ge 0.$$

From the definition of $L(x^*,\lambda^*)$ and $\hat J(x^*)p = 0$, this condition is equivalent to requiring that $Z(x^*)^T \nabla^2 L(x^*,\lambda^*) Z(x^*)$ be positive semidefinite. This matrix is called the reduced Hessian of the Lagrangian. Since the condition is on the second derivatives, it is termed a second order optimality condition. It can now be appreciated that the numerical values of the Lagrange multipliers play a role in defining the solution of a nonlinearly-constrained problem. Note that when there are $n$ active constraints there is no feasible arc that remains on the active set, and the second order optimality condition is empty. When $\hat J$ has $n$ rows the reduced Hessian has zero dimension. For convenience we can define symmetric matrices of zero dimension to be positive definite.

Necessary and sufficient conditions for $x^*$ to be a minimizer are complex. However, sufficient conditions are easy to appreciate. We have established that no feasible descent direction exists that moves off any of the active constraints. Consequently, if $\hat\lambda > 0$ then $f(x)$ increases along any feasible arc emanating from $x^*$ that moves off a constraint. We now only need to be sure that the same is true for all arcs emanating from $x^*$ that remain on the active set. This is assured if

$$\frac{d^2}{d\alpha^2} f(x(0)) = g(x^*)^T h + p^T \nabla^2 f(x^*)\, p > 0$$

for every such arc with $p \neq 0$, which implies that $Z(x^*)^T \nabla^2 L(x^*,\lambda^*) Z(x^*)$ is positive definite. Assuming that $x^*$ is a regular point, strict complementarity holds, the first order necessary conditions hold, and the reduced Hessian at $x^*$ is positive definite, then $x^*$ is a minimizer and an isolated point.
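The influence of constraint curvature can be seen on a hypothetical example with a linear objective: $\min x_1 + x_2$ subject to $c(x) = 2 - x_1^2 - x_2^2 \ge 0$, whose solution is $x^* = (-1, -1)$ with the single constraint active. Because $\nabla^2 f = 0$, all curvature in the reduced Hessian comes from the term $-\lambda^* \nabla^2 c$. The sketch (NumPy assumed) computes $\lambda^*$ from the first order conditions, forms $\nabla^2 L$, and checks that the reduced Hessian is positive definite.

```python
import numpy as np

x_star = np.array([-1.0, -1.0])
g = np.array([1.0, 1.0])                         # gradient of f(x) = x1 + x2
hess_f = np.zeros((2, 2))                        # f is linear
J_hat = np.array([[-2.0 * x_star[0], -2.0 * x_star[1]]])  # gradient of c = 2 - x1^2 - x2^2
hess_c = -2.0 * np.eye(2)                        # Hessian of c

lam, *_ = np.linalg.lstsq(J_hat.T, g, rcond=None)  # first order: g = J_hat^T lam
hess_L = hess_f - lam[0] * hess_c                  # Hessian of L = f - lam * c

_, s, vt = np.linalg.svd(J_hat)
Z = vt[int(np.sum(s > 1e-12)):].T                  # basis for the null space of J_hat
reduced_hessian = Z.T @ hess_L @ Z
eigenvalues = np.linalg.eigvalsh(reduced_hessian)
print("lambda* =", lam, "eigenvalues of reduced Hessian:", eigenvalues)
```

Using $\nabla^2 f$ alone would give a zero reduced Hessian here, so the sufficient condition could not be confirmed without the multiplier.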


Algorithms 


Algorithms for inequality problems have a combinatorial element not present in algorithms for equality-constrained problems. The simplest case, linear programming (LP), illustrates the point. Under mild assumptions the solution of an LP is given by the solution of a set of linear equations, i.e., a vertex of the feasible region. The difficult issue is determining which of the constraints define those equations. If there are $m$ inequality constraints and $n$ variables, there are $m!/(n!\,(m-n)!)$ choices of active constraints. Even for modest values of $m$ and $n$ the possible choices are astronomical. This clearly rules out methods based on exhaustive search.
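The number of choices is the binomial coefficient $\binom{m}{n}$. The short sketch below (standard library `math.comb`, Python 3.8 or later) shows how quickly it grows; the sample sizes are arbitrary.

```python
from math import comb


def active_set_choices(m, n):
    """Ways to choose which n of m inequality constraints are active: m!/(n!(m-n)!)."""
    return comb(m, n)


for m, n in [(10, 5), (50, 25), (100, 50)]:
    print(f"m = {m:3d}, n = {n:2d}: {active_set_choices(m, n):.3e} choices")
```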

One class of methods to solve inequality problems is the so-called active set methods, an example being the simplex method for LP. First a guess is made of the active set (called the working set), then an estimate to the solution of the resulting equality-constrained problem is computed (in the case of LP or quadratic programming (QP) this estimate would be exact), and at the new point a new guess is made of the active set. The estimate of the solution of the equality-constrained problem is usually made by finding a point that satisfies an approximation to the first order necessary conditions. Unless an intelligent guess is made of the active set, such algorithms are doomed to fail. Typically, after the initial active set, such algorithms generate subsequent working sets automatically. For linearly-constrained problems this is usually a very simple procedure. Assuming the current iterate is feasible, an attempt is made to move to the new estimate of the solution. If this is infeasible, the best feasible point (or a feasible point better than the current iterate) is found along the direction to the new estimate. The constraints active at the new feasible point are then used to define the working set. Usually the active set will be the working set, but occasionally we need to move off a constraint that is currently active. How to identify such a constraint is usually straightforward and can be done by examining an estimate to the Lagrange multipliers (obtained from the solution to the approximation of the first order necessary conditions). More complex strategies are possible that move off several constraints simultaneously.

An initial feasible point is found by solving an LP. 
One consequence of this strategy is that it is only nec- 
essary to consider working sets for which the objective 
function has a lower value than at the current iterate. 
Once we are in a neighborhood of the solution the 
working set does not change if strict complementarity 
holds at the solution and x* is a regular point. Typically 
the change in the working set at each iteration of active 
set methods for linearly-constrained problems is small 
(usually one), which results in efficiencies when com- 
puting the estimate to the new equality-constrained 
problem. In practice active set methods work well and 
usually identify the active set at the solution with very 
little difficulty. For an LP the number of iterations re- 
quired to identify the active set usually grows linearly 
with the size of the problem. However, pathological 
cases exist in which the number of iterations is astro- 
nomical and real LP problems do arise where the num- 
ber of iterates required is much greater than the typical 
case. Nonetheless, algorithms for linearly-constrained problems based on active set methods are highly successful.

For nonlinear problems the issue of identifying the active set at the solution is usually less significant, since even when the active set is known the number of iterations required to solve a problem may be large. A more relevant issue is that not knowing the active set causes some difficulties, such as making the linear algebra routines much more complicated. For small problems this is of little consequence, but in the large scale case it complicates the data structures required.

Nonlinearly-constrained problems are usually an 
order of magnitude more complicated to solve than 
linearly-constrained problems. One reason is that algo- 
rithms for problems with nonlinear constraints usually 
do not maintain feasible iterates. If a problem has just one nonlinear equality constraint, then generating each member of a sequence that lies on that constraint is itself an infinite process. Methods that generate infeasible iterates need some means of assessing whether one point is better than another. For feasible-point algorithms this is a simple issue, since the objective provides a measure of merit. A typical approach is to define a merit function, which balances a change in the objective against the change in the degree of infeasibility. A commonly used merit function is

$$M(x,\rho) = f(x) + \rho \sum_{i=1}^{m} \max\{0, -c_i(x)\},$$

where $\rho$ is a parameter that needs to be sufficiently large. Usually it will not be known what ‘sufficiently’ large is, so this parameter is adjusted as the sequence of iterates is generated. Note that $M(x,\rho)$ is not a smooth function and has a discontinuity in its derivative when any element of $c(x)$ is zero. In particular, it is not differentiable at $x^*$ when a constraint is active at $x^*$. Were this not the case, constrained problems could be transformed to unconstrained problems and solved as such. While transforming a constrained problem into a single smooth unconstrained problem is not possible, the transformation approach is the basis of a variety of methods. A popular alternative to direct methods is to transform the problem into that of solving a sequence of smooth linearly-constrained problems. This is the method at the heart of MINOS (see [8,9]), one of the most widely used methods for solving problems with nonlinear constraints. Other transformation methods transform the problem to that of solving a sequence of unconstrained or bounds-constrained problems. Transformation methods have an advantage over direct methods when developing software. For example, if you have a method for solving large scale linearly-constrained problems, then it can be used as a kernel in an algorithm to solve large scale nonlinearly-constrained problems.
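The behavior of the merit function can be seen in one dimension. The sketch assumes the hypothetical problem $\min x$ subject to $c(x) = x - 1 \ge 0$, for which $x^* = 1$ and $\lambda^* = 1$, and minimizes $M(x,\rho)$ over a grid on $[0, 2]$. In this example $M(\cdot,\rho)$ has its grid minimum at $x^*$ when $\rho = 2$ but not when $\rho = 0.5$, illustrating why $\rho$ must be sufficiently large; the unequal one-sided slopes at $x = 1$ show the kink.

```python
def merit(f, c, x, rho):
    """l1 merit function M(x, rho) = f(x) + rho * sum_i max{0, -c_i(x)}."""
    return f(x) + rho * sum(max(0.0, -ci) for ci in c(x))


def f(x):
    return x                 # hypothetical objective


def c(x):
    return [x - 1.0]         # hypothetical constraint c_1(x) = x - 1 >= 0


grid = [i / 100 for i in range(201)]          # sample points in [0, 2]
for rho in (0.5, 2.0):
    best = min(grid, key=lambda x: merit(f, c, x, rho))
    print(f"rho = {rho}: grid minimizer of M is x = {best}")

# One-sided slopes at x* = 1 differ, so M is not differentiable there.
step, rho = 1e-6, 2.0
left = (merit(f, c, 1.0, rho) - merit(f, c, 1.0 - step, rho)) / step
right = (merit(f, c, 1.0 + step, rho) - merit(f, c, 1.0, rho)) / step
print(f"left slope ~ {left:.3f}, right slope ~ {right:.3f}")
```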
The goal in a classification problem is to uncover a system that places examples into two or more mutually
exclusive groups. Identifying a classification system is 
beneficial in several ways. First of all, examples can 
be organized in a meaningful way, which will make 


the exploration and retrieval of examples belonging to 
specific group(s) more efficient. The tree-like directory 
structure, used by personal computers in organizing 
files, is an example of a classification system which en- 
ables users to locate files quickly by traversing the di- 
rectory paths. A classification system can make the re- 
lations between the examples easy to understand and 
interpret. A poor classification strategy, on the other 
hand, may propose arbitrary, confusing or meaningless 
relations. An extracted classification system can be used 
to classify new examples. For an incomplete or stochas- 
tic system, its structure may pose questions whose an- 
swers may generalize the system or make it more accu- 
rate. 

A special type of classification problem, called the 
Boolean function inference problem, is when all the ex- 
amples are represented by binary (0 or 1) attributes and 
each example belongs to one of two categories. Many 
other types of classification problems may be converted 
into a Boolean function inference problem. For exam- 
ple, a multicategory classification problem may be con- 
verted into several two-category problems. In a similar 
fashion, example attributes can be converted into a set 
of binary variables. 

In solving the Boolean function inference problem 
many properties of Boolean logic are directly applica- 
ble. A Boolean function will assign a binary value to each 
Boolean vector (example). See [22] for an overview of 
Boolean functions. Usually, a Boolean function is ex- 
pressed as a conjunction of disjunctions, called the con- 
junctive normal form (CNF), or a disjunction of con- 
junctions, called the disjunctive normal form (DNF). 
CNF can be written as:

$$\bigwedge_{j=1}^{k}\Bigl(\bigvee_{i \in \rho_j} x_i\Bigr),$$

where $x_i$ is either the attribute or its negation, $k$ is the number of attribute disjunctions and $\rho_j$ is the $j$th index set for the $j$th attribute disjunction. Similarly, DNF can be written as:

$$\bigvee_{j=1}^{k}\Bigl(\bigwedge_{i \in \rho_j} x_i\Bigr).$$

It is well known that any Boolean function can be written in CNF or DNF form. See [20] for an algorithm converting any Boolean expression into CNF. Two functions in different forms are regarded as equivalent as
long as they assign the same function values to all the 
Boolean vectors. However, placing every example into 
the correct category is only one part of the task. The 
other part is to make the classification criteria mean- 
ingful and understandable. That is, an inferred Boolean 
function should be as simple as possible. One part of 
the Boolean function inference problem that has re- 
ceived substantial research efforts is that of simplifying 
the representation of Boolean functions, while main- 
taining a general representation power. 


Inference of Monotone Boolean Functions 


When the target function can be any Boolean function with $n$ attributes, all of the $2^n$ examples have to be examined to reconstruct the entire function. When we have a priori knowledge about the subclass of Boolean functions the target function belongs to, on the other hand, it may be possible to reconstruct it using a subset of the examples. Often one can obtain the function values on examples one by one. That is, at each inference step, an example is posed as a question to an oracle, which, in return, provides the correct function value. A function $f$ can be defined by its oracle $A_f$ which, when fed with a vector $x = (x_1,\dots,x_n)$, returns its value $f(x)$. The inference of a Boolean function from questions and answers is known as interactive learning of Boolean functions. In many cases, especially when it is either difficult or costly to query the oracle, it is desirable to pose as few questions as possible. Therefore, the choice of examples should be based on the previously classified examples.
The monotone Boolean functions form a subset of 
the Boolean functions that have been extensively stud- 
ied not only because of their wide range of applica- 
tions (see [2,7,8] and [24]) but also their intuitive in- 
terpretation. Each attribute’s contribution to a mono- 
tone function is either nonnegative or nonpositive (not 
both). Furthermore, if all of the attributes have nonneg- 
ative (or nonpositive) effects on the function value then 
the underlying monotone Boolean function is referred 
to as isotone (respectively antitone). Any isotone func- 
tion can be expressed in DNF without using negated 
attributes. In combinatorial mathematics, the set of iso- 
tone Boolean functions is often represented by the free 
distributive lattice (FDL). To formally define monotone Boolean functions, consider ordering the binary vectors as follows [21]:


Definition 1 Let $E^n$ denote the set of all binary vectors of length $n$; let $x$ and $y$ be two such vectors. Then the vector $x = (x_1,\dots,x_n)$ precedes vector $y = (y_1,\dots,y_n)$ (denoted as $x \preceq y$) if and only if $x_i \le y_i$ for $1 \le i \le n$. If, at the same time, $x \neq y$, then $x$ strictly precedes $y$ (denoted as $x \prec y$).

According to this definition, the order of vectors in $E^2$ can be listed as follows:

$$(11) \succ (01) \succ (00)$$

and

$$(11) \succ (10) \succ (00).$$

Note that the vectors (01) and (10) are incomparable.

Based on the order of the Boolean vectors, a nondecreasing monotone (isotone) Boolean function can be defined as follows [21]:

Definition 2 A Boolean function $f$ is said to be a nondecreasing monotone Boolean function if and only if for any vectors $x, y \in E^n$ such that $x \preceq y$, we have $f(x) \le f(y)$.

A nonincreasing monotone (antitone) Boolean function can be defined in a similar fashion. As the method used to infer an antitone Boolean function is the same as that of an isotone Boolean function, we will restrict our attention to isotone Boolean functions.

When analyzing a subclass of Boolean functions, it is always informative to determine its size. This may give some indication of how general the functions are and how hard it is to infer them. The number of isotone Boolean functions, $\psi(n)$, defined on $E^n$ is sometimes referred to as the $n$th Dedekind number after R. Dedekind [6], who computed it for $n = 4$. Since then it has been computed for up to $E^8$:

$\psi(1) = 3$;
$\psi(2) = 6$;
$\psi(3) = 20$;
$\psi(4) = 168$ [6];
$\psi(5) = 7{,}581$ [4];
$\psi(6) = 7{,}828{,}354$ [28];
$\psi(7) = 2{,}414{,}682{,}040{,}998$ [5];
$\psi(8) = 56{,}130{,}437{,}228{,}687{,}557{,}907{,}788$ [29].
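For very small $n$ these values can be reproduced by brute force: enumerate all $2^{2^n}$ Boolean functions on $E^n$ and keep those satisfying Definition 2. The sketch below does exactly that; it already has to check $2^{16}$ candidates for $n = 4$, which illustrates why exact counting quickly becomes infeasible.

```python
from itertools import product


def precedes(x, y):
    """Definition 1: x precedes y iff x_i <= y_i for every coordinate."""
    return all(a <= b for a, b in zip(x, y))


def count_isotone(n):
    """Brute-force count of isotone Boolean functions on E^n (small n only)."""
    vectors = list(product((0, 1), repeat=n))
    comparable = [(i, j) for i, x in enumerate(vectors)
                  for j, y in enumerate(vectors) if precedes(x, y)]
    count = 0
    for values in product((0, 1), repeat=len(vectors)):        # all 2^(2^n) functions
        if all(values[i] <= values[j] for i, j in comparable):  # Definition 2
            count += 1
    return count


print([count_isotone(n) for n in range(1, 4)])
```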
Wiedemann's algorithm [29] employed a Cray-2 processor for 200 hours to compute the value for $n = 8$. This gives a flavor of the complexity of computing the exact number of isotone Boolean functions. The computational infeasibility for larger values of $n$ provides the motivation for approximations and bounds. The best known bound on $\psi(n)$ is due to D. Kleitman [12] and Kleitman and G. Markowsky [13]:

$$\psi(n) \le 2^{\left(1 + c\,\frac{\log n}{\sqrt{n}}\right)\binom{n}{\lfloor n/2 \rfloor}},$$

where $c$ is a constant and $\lfloor n/2 \rfloor$ is the integer part of $n/2$.

This bound, which is an improvement over the first bound obtained by G. Hansel [11], is also based on the Hansel chains described below. Even though these bounds can lead to good approximations for $\psi(n)$ when $n$ is large, the best known asymptotic is due to A.D. Korshunov [15]:

$$\psi(n) \sim \begin{cases} 2^{\binom{n}{n/2}}\, e^{f(n)} & \text{for even } n,\\[4pt] 2^{\binom{n}{(n-1)/2} + 1}\, e^{g(n)} & \text{for odd } n, \end{cases}$$

where

$$f(n) = \binom{n}{n/2 - 1}\left(\frac{1}{2^{n/2}} + \frac{n^2}{2^{n+5}} - \frac{n}{2^{n+4}}\right),$$

$$g(n) = \binom{n}{n/2 - 3/2}\left(\frac{1}{2^{(n+3)/2}} - \frac{n^2}{2^{n+6}} - \frac{n}{2^{n+3}}\right) + \binom{n}{n/2 - 1/2}\left(\frac{1}{2^{(n+1)/2}} + \frac{n^2}{2^{n+4}}\right).$$


I. Shmulevich [24] achieved a similar but slightly infe- 
rior asymptotic for even n in a simpler and more el- 
egant manner, which led to some interesting distribu- 
tional conjectures regarding isotone Boolean functions. 

Even though the number of isotone Boolean functions is large, it is a small fraction of the number of general Boolean functions, $2^{2^n}$. This is the first hint towards the feasibility of efficiently inferring monotone Boolean functions. Intuitively, one would conjecture that the generality of this class was sacrificed. That is true; however, a general Boolean function consists of a set of areas where it is monotone. In fact, any Boolean function $q(x_1,\dots,x_n)$ can be represented by several nondecreasing $g_j(x)$ and nonincreasing $h_j(x)$ monotone Boolean functions in the following manner [17]:

$$q(x) = \bigvee_{j} \bigl(g_j(x) \wedge h_j(x)\bigr).$$


As a result, one may be able to solve the general Boolean function inference problem by considering several monotone Boolean function inference problems. Intuitively, the closer the target function is to a monotone Boolean function, the fewer monotone Boolean functions are needed to represent it and the more successful this approach might be. In [17] the problem of joint restoration of two nested monotone Boolean functions $f_1$ and $f_2$ is stated. The approach in [17] allows one to further decrease the dialogue with an expert (oracle) and restore a complex function of the form $f_1 \,\&\, \neg f_2$, which is not necessarily monotone.


The Shannon Function and the Hansel Theorem 


The complexity of inferring isotone Boolean functions was mentioned in the previous section, when noting that the number of isotone Boolean functions is a small fraction of the total number of general Boolean functions. In defining the most common complexity measure for the Boolean function inference problem, consider the following notation. Let $M_n$ denote the set of all monotone Boolean functions on $E^n$, $A = \{F\}$ be the set of all algorithms which infer $f \in M_n$, and $\varphi(F, f)$ be the number of questions to the oracle $A_f$ required to infer $f$. The Shannon function $\varphi(n)$ can be introduced as follows [14]:

$$\varphi(n) = \min_{F \in A}\Bigl(\max_{f \in M_n} \varphi(F, f)\Bigr).$$


An upper bound on the number of questions needed to restore a monotone Boolean function is given by the following equation (known as the Hansel theorem) [11]:

$$\varphi(n) = \binom{n}{\lfloor n/2 \rfloor} + \binom{n}{\lfloor n/2 \rfloor + 1}.$$

That is, if a proper question-asking strategy is applied, the total number of questions needed to infer any monotone Boolean function should not exceed $\varphi(n)$. The Hansel theorem can be viewed as a worst-case scenario analysis. Recall, from the previous section, that




all of the $2^n$ questions are necessary to restore a general Boolean function. D.N. Gainanov [9] proposed three other criteria for evaluating the efficiency of algorithms used to infer isotone Boolean functions. One of them is the average case scenario, and the other two consider two different ways of normalizing the Shannon function by the size of the target function.
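The Hansel bound is simple to evaluate. The sketch below (Python standard library only) compares $\varphi(n)$ with the $2^n$ questions needed for a general Boolean function; the values of $n$ are arbitrary.

```python
from math import comb


def hansel_bound(n):
    """phi(n) = C(n, floor(n/2)) + C(n, floor(n/2) + 1) from the Hansel theorem."""
    k = n // 2
    return comb(n, k) + comb(n, k + 1)


for n in (2, 3, 4, 10, 20):
    print(f"n = {n:2d}: phi(n) = {hansel_bound(n):7d}, 2^n = {2 ** n:8d}")
```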


Hansel Chains 


The vectors in $E^n$ can be placed in chains (sequences of vectors) according to monotonicity. The Hansel chains are a particular set of chains that can be formed using a dimensionally recursive algorithm [11]. It starts with the single Hansel chain in $E^1$:

$$H^1 = \{(0), (1)\}.$$

To form the Hansel chains in $E^2$, three steps are required, as follows:

1. Attach the element ‘0’ to the front of each vector in $H^1$ and get chain $C^{2,\min} = \{(00), (01)\}$.
2. Attach the element ‘1’ to the front of each vector in $H^1$ and get chain $C^{2,\max} = \{(10), (11)\}$.
3. Move the last vector in $C^{2,\max}$, i.e. vector (11), to the end of $C^{2,\min}$: $H^{2,1} = \{(00), (01), (11)\}$; $H^{2,2} = \{(10)\}$.

To form the Hansel chains in $E^3$, these steps are repeated:

1. $C^{3,1,\min} = \{(000), (001), (011)\}$; $C^{3,2,\min} = \{(010)\}$.
2. $C^{3,1,\max} = \{(100), (101), (111)\}$; $C^{3,2,\max} = \{(110)\}$.
3. $H^{3,1} = \{(000), (001), (011), (111)\}$; $H^{3,2} = \{(100), (101)\}$; $H^{3,3} = \{(010), (110)\}$.

Note that since there is only one vector in the $C^{3,2,\max}$ chain, it can be deleted after the vector (110) is moved to $C^{3,2,\min}$. This leaves the three chains listed in Table 1. In general, the Hansel chains in $E^n$ can be generated recursively from the Hansel chains in $E^{n-1}$ by following the three steps described above.

Inference of Monotone Boolean Functions, Table 1
Hansel chains for $E^3$

vector | chain # | in-chain index
000 | 1 | 1
001 | 1 | 2
011 | 1 | 3
111 | 1 | 4
100 | 2 | 1
101 | 2 | 2
010 | 3 | 1
110 | 3 | 2

A nice property of the Hansel chains is that all the vectors in a particular chain are arranged in increasing order. That is, if the vectors $V_j$ and $V_k$ are in the same chain, then $V_j \prec V_k$ (i.e., $V_j$ strictly precedes $V_k$) when $j < k$. Therefore, if the underlying Boolean function is isotone, then one can classify vectors within a chain easily. For example, if a vector $V_j$ is negative (i.e., $f(V_j) = 0$), then all the vectors preceding $V_j$ in the same chain are also negative (i.e., $f(V_k) = 0$ for any $k < j$). Similarly, if a vector $V_j$ is positive, then all the vectors succeeding $V_j$ in the same chain are also positive. The monotone ordering of the vectors in Hansel chains motivates the composition of an efficient question-asking strategy discussed in the next section.
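The three-step recursion for building Hansel chains can be written in a few lines. In this Python sketch, vectors are tuples of 0s and 1s and chains are lists ordered from bottom to top; the function name is illustrative. For $n = 3$ it reproduces the chains of Table 1 in the same order.

```python
def hansel_chains(n):
    """Hansel chains of E^n (n >= 1), built by the three-step recursion."""
    chains = [[(0,), (1,)]]                      # the single chain H^1 in E^1
    for _ in range(n - 1):
        next_chains = []
        for chain in chains:
            low = [(0,) + v for v in chain]      # step 1: prefix '0'
            high = [(1,) + v for v in chain]     # step 2: prefix '1'
            low.append(high.pop())               # step 3: move last of 'high' to end of 'low'
            next_chains.append(low)
            if high:                             # drop a chain left empty by step 3
                next_chains.append(high)
        chains = next_chains
    return chains


for index, chain in enumerate(hansel_chains(3), start=1):
    print(index, ["".join(map(str, v)) for v in chain])
# 1 ['000', '001', '011', '111']
# 2 ['100', '101']
# 3 ['010', '110']
```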


Devising a Smart Question-Asking Strategy 


The most straightforward question-asking strategy, which uses Hansel chains, sequentially moves from chain to chain. Within each chain one may also sequentially select vectors to pose as questions. After an answer is given, the vectors (in other chains also) that are classified as a result of monotonicity are eliminated from further questioning. Once all the vectors have been eliminated, the underlying function is revealed. The maximum number of questions for this method, called the sequential Hansel chains question-asking strategy, will not exceed the upper limit $\varphi(n)$ given in the Hansel theorem, as long as the chains are searched in increasing size.
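A minimal sketch of the sequential strategy follows, assuming the chains are processed shortest first and each chain bottom-up; the oracle is any Python callable returning 0 or 1, and the example target $x_1 \vee (x_2 \wedge x_3)$ is hypothetical. After each answer, every vector above a positive answer or below a negative answer is labeled by monotonicity, so later questions about it are skipped.

```python
from itertools import product
from math import comb


def hansel_chains(n):
    """Hansel chains of E^n (n >= 1), built by the three-step recursion."""
    chains = [[(0,), (1,)]]
    for _ in range(n - 1):
        next_chains = []
        for chain in chains:
            low = [(0,) + v for v in chain]
            high = [(1,) + v for v in chain]
            low.append(high.pop())
            next_chains.append(low)
            if high:
                next_chains.append(high)
        chains = next_chains
    return chains


def precedes(x, y):
    """Definition 1: x precedes y iff x_i <= y_i for all i."""
    return all(a <= b for a, b in zip(x, y))


def sequential_hansel(n, oracle):
    """Sequential Hansel chains strategy; returns (labels, number of questions)."""
    vectors = list(product((0, 1), repeat=n))
    labels, questions = {}, 0
    for chain in sorted(hansel_chains(n), key=len):   # shortest chains first
        for v in chain:                               # bottom-up within a chain
            if v in labels:
                continue                              # already known by monotonicity
            answer = oracle(v)
            questions += 1
            for w in vectors:
                if answer == 1 and precedes(v, w):
                    labels[w] = 1                     # above a positive vector
                elif answer == 0 and precedes(w, v):
                    labels[w] = 0                     # below a negative vector
    return labels, questions


def target(v):
    """Hypothetical isotone target x1 OR (x2 AND x3)."""
    return int(v[0] or (v[1] and v[2]))


labels, asked = sequential_hansel(3, target)
print(asked, "questions; Hansel bound:", comb(3, 1) + comb(3, 2))
```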

Although the sequential question-asking strategy is easy to implement and effective in reducing the total number of questions, there is still room for improvement. N.A. Sokolov [25] introduced an algorithm that sequentially moves between the Hansel chains in decreasing size and performs a middle vector search of each chain. His algorithm does not require storing all the Hansel chains, since at each iteration it only requires a single chain. This advantage is obtained at the cost of asking more questions than needed.

Inference of Monotone Boolean Functions, Table 2
Iteration 1

chain # | index of vector in the chain | vector | vector classified | middle vector in the chain | reward P if the vector is positive | reward N if the vector is negative | selected middle vector with the largest min(P, N) | answer | other vectors determined
1 | 1 | 000 | | | | | | |
1 | 2 | 001 | | ← | 4 | 2 | ← | 1 |
1 | 3 | 011 | | | | | | | 1
1 | 4 | 111 | | | | | | | 1
2 | 1 | 100 | | ← | 4 | 2 | | |
2 | 2 | 101 | | | | | | | 1
3 | 1 | 010 | | ← | 4 | 2 | | |
3 | 2 | 110 | | | | | | |

In an entirely different approach, Gainanov [9] presented a heuristic that has been used in numerous algorithms for inferring a monotone Boolean function, such as in [3] and in [18]. This heuristic takes as input an unclassified vector and finds a border vector (maximal false or minimal true) by sequentially questioning neighboring vectors. The problem with most of the inference algorithms based on this heuristic is that they do not keep track of the vectors classified, only of the resulting border vectors. Note that in an execution of this heuristic, not all of the vectors questioned are necessarily covered by the resulting border vector, implying that valuable information may be lost. In fact, several border vectors may be unveiled during a single execution of this heuristic, but only one is stored. Many of these methods are designed to solve large problems, where it might be inefficient or even infeasible to store all of the information gained within the execution of the heuristic. However, these methods are not efficient, not even for small problems, in terms of the number of queries they require.
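The border-vector idea can be sketched as a single greedy walk. This is an illustrative reading of the heuristic as summarized above, not Gainanov's original formulation: starting from a positive vector, the walk tries to switch each 1 to 0 and moves whenever the oracle still answers 1, which ends at a minimal true vector; the dual walk from a negative vector switches 0s to 1s and ends at a maximal false vector. The function name `border_vector` and the oracle ($x_2 \lor x_3$) are hypothetical.

```python
def oracle(v):
    """Hypothetical monotone target x2 OR x3 on bit strings v = x1 x2 x3."""
    return 1 if v[1] == "1" or v[2] == "1" else 0


def border_vector(v, ask):
    """Walk from v to a minimal true (or maximal false) vector.

    Returns the border vector and the list of (vector, answer) pairs asked.
    """
    value = ask(v)
    asked = [(v, value)]
    current = list(v)
    flip_from, flip_to = ("1", "0") if value == 1 else ("0", "1")
    for i in range(len(current)):
        if current[i] != flip_from:
            continue
        neighbor = current.copy()
        neighbor[i] = flip_to
        answer = ask("".join(neighbor))
        asked.append(("".join(neighbor), answer))
        if answer == value:  # still on the same side: move to the neighbor
            current = neighbor
    return "".join(current), asked


if __name__ == "__main__":
    border, asked = border_vector("111", oracle)
    print("border vector:", border)
    print("questions and answers:", asked)
```

Starting from (111), the walk asks (111), (011), (001) and (000) and returns the minimal true vector (001). The negative answer for (000) is not implied by (001), so it is lost if only the border vector is stored, which is exactly the loss of information noted above.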

One may look at each vector as carrying a 'reward' value in terms of the number of other vectors that will be classified concurrently. This reward value is a random variable that takes one of two values (a single value if the two coincide), depending on whether the vector is a positive or a negative example of the target function. The expected reward lies somewhere between these two possible values. If one wishes to maximize the expected number of classified vectors at each step, the probabilities associated with each of these two values need to be computed in addition to the values themselves. Finding the exact probabilities is hard, while finding the reward values is relatively simple for a small set of examples.
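For small examples the two reward values of a vector can be counted directly. A minimal sketch, assuming (as in the worked example below) that P counts the still unclassified vectors $w$ with $v \le w$, which a positive answer for $v$ would settle, and N counts the unclassified $w \le v$, which a negative answer would settle; both counts include $v$ itself. The helper names are hypothetical.

```python
from itertools import product


def leq(u, v):
    """Componentwise order on bit strings: u <= v."""
    return all(a <= b for a, b in zip(u, v))


def rewards(v, unclassified):
    """Return (P, N): vectors classified if v turns out positive / negative."""
    p = sum(1 for w in unclassified if leq(v, w))  # v and everything above it
    n = sum(1 for w in unclassified if leq(w, v))  # v and everything below it
    return p, n


if __name__ == "__main__":
    cube = ["".join(bits) for bits in product("01", repeat=3)]  # all of E^3
    for v in ("001", "100", "010"):  # the middle vectors of Table 2
        print(v, rewards(v, cube))
```

Each of the three middle vectors of Iteration 1 gets $(P, N) = (4, 2)$, matching Table 2.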

This is one of the underlying ideas of the new inference algorithm termed the binary search-Hansel chains question-asking strategy. This method draws its motivation for calculating and comparing the 'reward' values of the middle vectors in each Hansel chain from the widely used binary search algorithm (see, for instance, [19]). Within a given chain, a binary search dramatically reduces the number of questions: for a chain of $k$ vectors it needs on the order of $\log_2 k$ questions, whereas a sequential search needs on the order of $k$. Once the 'reward' values of all the middle vectors have been found, the most promising one is posed as a question to the oracle. Because each vector has two reward values, selecting the most promising vector is subjective, and several different evaluative criteria can be used.

The binary search-Hansel chains question-asking 
strategy can be divided into the following steps: 

1) Select the middle vector of the unclassified vectors in each Hansel chain.

2) Calculate the reward values for each middle vector. That is, calculate the number of vectors that can be classified as positive (denoted as P) if it is positive and as negative (denoted as N) if it is negative.

3) Select the most promising middle vector, based on the (P, N) pairs of the middle vectors, and ask the oracle for its membership value.

4) Based on the answer in Step 3, eliminate all the vectors that can be classified as a result of the previous answer and the property of monotonicity.

5) Redefine the middle vectors in each chain as necessary.

6) Unless all the vectors have been classified, go back to Step 2.

Inference of Monotone Boolean Functions, Table 3
Iteration 2. The vector (100) is chosen and, based on the answer, the class membership of the vectors (100) and (000) is determined

chain # | index of vectors in the chain | vector | vector classified | middle vector in the chain | reward P if the vector is positive | reward N if the vector is negative | selected middle vector with the largest min(P, N) | answer | other vectors determined
1 | 1 | 000 | | ← | 4 | 1 | | | 0
  | 2 | 001 | 1 | | | | | |
  | 3 | 011 | 1 | | | | | |
  | 4 | 111 | 1 | | | | | |
2 | 1 | 100 | | ← | 2 | 2 | ← | 0 |
  | 2 | 101 | 1 | | | | | |
3 | 1 | 010 | | ← | 2 | 2 | | |
  | 2 | 110 | | | | | | |

The inference of a monotone Boolean function on $E^3$ by using the binary search-Hansel chains question-asking strategy is illustrated below. The specifics of Iteration 1, described below, are also shown in Table 2. At the beginning of the first iteration, the middle vectors in each Hansel chain (as described in Step 1) are selected and marked with the '←' symbol in Table 2. Then, according to Step 2, the reward value for each one of these middle vectors is calculated. For instance, if (001) (the second vector in chain 1) has a function value of 1, then the three vectors (011), (101) and (111) are also classified as positive. That is, the value of P for vector (001) equals 4. Similarly, (000) will be classified as 0 if (001) is classified as 0, and thus its reward value N equals 2.

Once the 'reward' values of all the middle vectors have been evaluated, the most promising middle vector is selected based on their (P, N) pairs. Here we choose the vector whose min(P, N) value is the largest among the middle vectors. If there is a tie, it is broken randomly. Based on this evaluative criterion, vector 2 in chain 1, (001), is chosen and is marked with '←' in the column 'selected middle vector with the largest min(P, N)'. After the function value of 1 is received for vector (001), this value is placed in the 'answer' column. This answer is used to eliminate all of the vectors succeeding (001). The middle vectors in the remaining chains are updated as needed. At least one more iteration is required, as there are still unclassified vectors.

In the second iteration (Table 3), the vector (100) is selected; its answer 0 also determines the class of (000). After the second iteration, no unclassified vectors are left in chains 1 and 2, and the middles of these chains need not be considered anymore. Therefore, an 'X' is placed in the column called 'middle vector in the chain' in Table 4. At the beginning of the third iteration, the vector (010) is chosen, and the function values of the remaining two vectors (010) and (110) are determined. At this point all the vectors have been classified and the question-asking process stops.

The algorithm posed a total of three questions in order to classify all the examples. The final classifications listed in Table 5 correspond to the monotone Boolean function $x_2 \lor x_3$.
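Steps 1 to 6, applied to this example, can be sketched in Python as follows. The sketch is an illustration under stated assumptions, not the authors' implementation: the chains are those of Tables 2 to 5, the oracle `target` (hypothetical) answers according to $x_2 \lor x_3$, the middle vector of a chain is taken to be the $\lceil k/2 \rceil$-th of its $k$ unclassified vectors (which matches the choices marked in Tables 2 to 4), and ties in min(P, N) are broken by chain order rather than randomly.

```python
# Hansel chains of E^3 exactly as listed in Tables 2 to 5 (bottom to top).
CHAINS = [["000", "001", "011", "111"], ["100", "101"], ["010", "110"]]


def target(v):
    """Hypothetical oracle: the target function x2 OR x3, with v = x1 x2 x3."""
    return 1 if v[1] == "1" or v[2] == "1" else 0


def leq(u, v):
    """Componentwise order on bit strings: u <= v."""
    return all(a <= b for a, b in zip(u, v))


def binary_search_hansel(chains, ask):
    """Binary search-Hansel chains strategy with the largest-min(P, N) criterion.

    Returns (classes, questions) where classes maps each vector to 0 or 1.
    """
    vectors = [v for chain in chains for v in chain]
    known = {}
    questions = []
    while len(known) < len(vectors):
        unknown = [w for w in vectors if w not in known]
        candidates = []
        for chain in chains:
            open_vectors = [v for v in chain if v not in known]
            if not open_vectors:
                continue  # chain fully classified ('X' in the tables)
            # Steps 1 and 5: middle of the unclassified part of the chain
            mid = open_vectors[(len(open_vectors) - 1) // 2]
            # Step 2: rewards P (if positive) and N (if negative)
            p = sum(1 for w in unknown if leq(mid, w))
            n = sum(1 for w in unknown if leq(w, mid))
            candidates.append((min(p, n), mid))
        # Step 3: largest min(P, N); max() keeps the first maximum, so ties
        # go to the lowest-numbered chain instead of being broken randomly
        _, v = max(candidates, key=lambda c: c[0])
        value = ask(v)
        questions.append(v)
        # Step 4: propagate the answer by monotonicity
        for w in unknown:
            if (leq(v, w) if value else leq(w, v)):
                known[w] = value
    return known, questions


if __name__ == "__main__":
    classes, asked = binary_search_hansel(CHAINS, target)
    print("questions:", asked)
    print("classes:", dict(sorted(classes.items())))
```

With this deterministic tie-breaking the sketch asks exactly the questions of the worked example, (001), (100) and (010), and returns the classifications of Table 5.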


Conclusions 


This paper described some approaches to, and some of the latest developments in, the problem of inferring monotone Boolean functions. As described here, by using Hansel chains in the sequential question-asking strategy, the number of questions will not exceed the upper bound stated in the Hansel theorem. However, by combining the binary search of Hansel chains with the notion of an evaluative criterion, the number of questions asked can be further reduced. At present, the binary search-Hansel chains question-asking strategy has only been applied to Hansel chains with a dimension of less than 10. However, it is expected that this method can be applied to infer monotone Boolean functions of larger dimensions with slight modifications.

Inference of Monotone Boolean Functions, Table 4
Iteration 3

chain # | index of vectors in the chain | vector | vector classified | middle vector in the chain | reward P if the vector is positive | reward N if the vector is negative | selected middle vector with the largest min(P, N) | answer | other vectors determined
1 | 1 | 000 | 0 | | | | | |
  | 2 | 001 | 1 | X | | | | |
  | 3 | 011 | 1 | | | | | |
  | 4 | 111 | 1 | | | | | |
2 | 1 | 100 | 0 | X | | | | |
  | 2 | 101 | 1 | | | | | |
3 | 1 | 010 | | ← | 2 | 1 | ← | 1 |
  | 2 | 110 | | | | | | | 1

Inference of Monotone Boolean Functions, Table 5
The resulting class memberships

chain # | vector index in chain | vector | function value
1 | 1 | 100 | 0
  | 2 | 101 | 1
2 | 1 | 010 | 1
  | 2 | 110 | 1
3 | 1 | 000 | 0
  | 2 | 001 | 1
  | 3 | 011 | 1
  | 4 | 111 | 1

See also

▶ Optimization in Boolean Classification Problems
▶ Optimization in Classifying Text Documents

References

1. Alekseev VB (1988) Monotone Boolean functions. Encycl. Math., vol 6. Kluwer, Dordrecht, 306-307
2. Bioch JC, Ibaraki T (1995) Complexity of identification and dualization of positive Boolean functions. Inform and Comput 123:50-63
3. Boros E, Hammer PL, Ibaraki T, Kawakami K (1997) Polynomial-time recognition of 2-monotonic positive Boolean functions given by an oracle. SIAM J Comput 26(1):93-109
In economics or biology there is no natural end time for a process. Nations as well as species have a very long future to consider. A mathematical abstraction for this phenomenon is the concept of an infinite time horizon, simply defined as an unbounded time interval of the form $[0, +\infty)$. The study of competing agents in a dynamic deterministic setting over a long time period can be cast in the framework of an infinite horizon dynamic game. This game is defined by the following 'objects':

• A system evolving over an infinite horizon is characterized by a state $x \in X \subset \mathbb{R}^{n_x}$. Some agents, also called the players $i = 1, \ldots, p$, can influence the state's evolution through the choice of an appropriate control in an admissible class. The control value at a given time $n$ for player $i$ is denoted $u_i(n) \in U_i \subset \mathbb{R}^{m_i}$.

• The state evolution of such a dynamical system may be described either as a difference equation, if discrete time is used, or as a differential equation in a continuous time framework. For definiteness we fix our attention here on a stationary difference equation, and merely remark that similar comments apply when other types of dynamical systems are considered:

$$x(n+1) = f(x(n), u_1(n), \ldots, u_p(n))$$

for $n = 0, 1, \ldots$, where $f: \mathbb{R}^{n_x} \times \mathbb{R}^{m_1} \times \cdots \times \mathbb{R}^{m_p} \to \mathbb{R}^{n_x}$ is a given state transition function.

• We assume that the agents can observe the state of the system and remember the history of the system evolution up to the current time $n$, that is, the sequence

$$h_n = \{x(0), u(0), \ldots, u(n-1), x(n)\},$$

where $u(m)$ denotes the controls chosen by all players at period $m$ (i.e., $u(m) = (u_1(m), \ldots, u_p(m))$). A policy or a strategy is a way for each agent to adapt his/her current control choice to the history of the system, that is, a mapping $\gamma_i: (n, h_n) \mapsto u_i(n) \in U_i$, which tells player $i$ which control $u_i(n) \in U_i$ to select given that the time period is $n$ and the state history is $h_n$.

• Once such a model is formulated, the question arises as to what strategy or policy each agent should adopt so that his/her decision provides him/her with the most benefit. The decision to adopt a good strategy is based on a performance criterion defined over the life of the agent (in this case $[0, +\infty)$); that is, for each time horizon $N$ the payoff to player $i$ is determined by

$$J_N^i(x, u) = \sum_{n=1}^{N} \beta_i^{\,n}\, g_i(x(n), u(n)),$$

where $x$ and $u$ denote the state and control evolutions over time, $g_i: \mathbb{R}^{n_x} \times \mathbb{R}^{m_1} \times \cdots \times \mathbb{R}^{m_p} \to \mathbb{R}$ is a given reward function and $\beta_i \in [0, 1]$ is a discount factor for each player $i = 1, \ldots, p$.

Two categories of difficulties have to be dealt with when one studies infinite horizon dynamic games:

• The consideration of an unbounded time horizon gives rise to the possibility of diverging values for the performance criterion (i.e., tending to $+\infty$ on all possible evolutions). This typically happens when there is no discounting ($\beta_i = 1$). A related issue is the stability or instability of the optimally controlled system.

• A second category of difficulties is associated with the consideration of all possible actions and reactions of the different agents over time, since an infinite time horizon will always give any agent enough time to correct his/her strategy choice, if necessary.

The first difficulty is already present in a single agent system, where the problem reduces to a dynamic optimization problem and is typically cast in the framework of the calculus of variations or optimal control in either discrete or continuous time. The second type of difficulty typically arises in nonzero-sum games.


Unbounded Cost 


To introduce the difficulties involved in studying infinite horizon problems, we first consider the single player case. The single player case is the most studied of these problems, with a relatively rich history beginning with the seminal paper of F. Ramsey [8]. Therefore we shall introduce the subject with the Ramsey model, using simpler notation than that introduced above. In Ramsey's work a continuous time model for the economic growth of a nation is developed and analyzed. In discrete time, the dynamics of Ramsey's model are described by the difference equation

$$x_{n+1} = x_n + f(x_n) - c_n$$

with a fixed initial condition $x_0$. Here, $x_n \ge 0$ denotes the amount of capital stock at the end of time period $n$; $f(x_n)$ is a nonnegative valued function, known as the production function, which is defined for all positive $x_n$ and represents the rate at which capital stock is produced given a stock level $x_n$; and $c_n \ge 0$ represents the rate at which the nation consumes the capital stock. Since a nation usually does not consume at a rate faster than it produces, we also have the inequality constraint

$$0 \le c_n \le f(x_n) \quad \text{for all } n = 1, 2, \ldots$$
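Because $c_n \le f(x_n)$, every feasible path satisfies $x_{n+1} \ge x_n$, so capital never decreases. The Python sketch below iterates the difference equation to make this concrete; the production function $f(x) = \sqrt{x}$ and the rule $c_n = s\,f(x_n)$ with a fixed share $0 \le s \le 1$ are hypothetical choices made only for illustration, and consumption is indexed from $n = 0$ for simplicity.

```python
import math


def production(x):
    """Hypothetical production function f(x) = sqrt(x); not taken from the text."""
    return math.sqrt(x)


def simulate_ramsey(x0, share, steps):
    """Iterate x_{n+1} = x_n + f(x_n) - c_n under the rule c_n = share * f(x_n).

    Any share in [0, 1] keeps 0 <= c_n <= f(x_n), i.e. the path is feasible.
    Returns the capital path [x_0, ..., x_steps] and the consumptions used.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie in [0, 1] so that 0 <= c_n <= f(x_n)")
    capital, consumption = [x0], []
    for _ in range(steps):
        x = capital[-1]
        c = share * production(x)
        consumption.append(c)
        capital.append(x + production(x) - c)
    return capital, consumption


if __name__ == "__main__":
    path, cons = simulate_ramsey(x0=1.0, share=0.5, steps=5)
    print("capital:", [round(x, 3) for x in path])
```

With `share = 1` the whole output is consumed and capital stays at $x_0$; any smaller share makes capital grow.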


The performance of the system is measured as an accumulation of social welfare over the time scale. Thus, up to a fixed time $N$, this is represented by the sum

$$J_N(\{c_n\}) = \sum_{n=1}^{N} U(c_n),$$

in which $U(c_n)$ is called a social utility function and represents the 'rate of enjoyment' of society at a consumption rate $c_n$. The goal of a decision maker in this model is to determine $c_n$, $n = 1, 2, \ldots$, so that

$$\lim_{N \to \infty} J_N(\{c_n\}) = \lim_{N \to \infty} \sum_{n=1}^{N} U(c_n)$$

is maximized. An immediate concern in attempting to solve such a problem is whether the performance criterion is well defined. That is, for a given feasible element $\{x_n, c_n\}$, $n = 1, 2, \ldots$, is the above infinite series convergent? Additionally, if there exist feasible elements for which convergence is assured, how does one know whether the supremum is finite? Ramsey was aware of these two difficulties, and these issues were addressed in his work. In dealing with this lack of precision two ideas have arisen. The first of these is to introduce the notion of discounting to 'level the playing field' by scaling units to present value terms. This manifests itself through a positive weighting scheme. Specifically, the performance criterion is modified through the introduction of a constant 'discount rate' $\beta$ between 0 and 1. That is, the above infinite series is replaced by

$$\lim_{N \to \infty} J_N(\{c_n\}) = \lim_{N \to \infty} \sum_{n=1}^{N} \beta^{n} U(c_n).$$

It is now an easy matter to see that if the sequence $\{U(c_n)\}$ is bounded, then the infinite series converges. Moreover, if all feasible sequences $\{c_n\}$ are bounded and $U(\cdot)$ is a continuous function, it is easy to see that the supremum (as well as the infimum) over all such sequences is finite and the optimization problem is well defined. A criticism of discounting voiced by Ramsey is that it weights a decision maker's preference toward the present at the expense of the future. Consequently, Ramsey sought another approach. His alternative idea was that the rate at which a nation consumes is bounded and that ideally the best system would be one in which this rate is as large as possible. Thus Ramsey introduced the notion of a 'maximal sustainable rate of enjoyment', which he referred to as bliss. The notion of bliss, denoted by $B$, is now defined as an optimal steady state problem. That is,

$$B = \max\{U(c) : c = f(x),\ x \ge 0\} = \max\{U(c) : c \ge 0\}.$$


With this idea, the performance index is replaced by a new performance index given as

$$\lim_{N \to \infty} J_N(\{c_n\}) = \lim_{N \to \infty} \sum_{n=1}^{N} \left[ B - U(c_n) \right],$$

and the goal is to choose $\{c_n\}$ as a minimizer instead of a maximizer. Observe that $B - U(c_n) \ge 0$ for all $n$, so the above limit is bounded below by zero. Thus, if bliss is attained by some feasible sequence (that is, $c_n = \bar{c}$ for all $n$ sufficiently large, with $B = U(\bar{c})$), then the performance criterion is finite for at least one feasible element $\{x_n, c_n\}$ and the minimization problem is well defined. Using the notion of bliss, Ramsey solved this problem with classical variational analysis (i.e., the Euler-Lagrange equation from the calculus of variations) to arrive at what is now referred to as Ramsey's rule of economic growth. Finally, we remark that the solution, say $\{x^*, c^*\}$, obtained by Ramsey asymptotically approaches $\{\bar{x}, \bar{c}\}$, where $\bar{x}$ is the unique solution to the equation $\bar{c} = f(\bar{x})$.

The approach adopted by Ramsey in his model has 
become a prototype for studying more complex prob- 
lems. In particular, the notion of bliss and the optimal 
steady state problem combined with the idea that bliss 
is obtained in finite time is now referred to as a reduc- 
tion to finite costs. Finally the asymptotic convergence 
to the optimal steady state is referred to as an asymp- 
totic turnpike property. 

Since Ramsey 'solved' his problem through an application of necessary conditions, he did not directly address the question of existence of an optimal solution. He assumed that the solution to the necessary condition was in fact a solution. However, in 1962, S. Chakravarty [4] gave a simple example in which the solution of Ramsey's rule was not a minimizer but a maximizer! This led to the quest for the existence of optimal solutions for these problems. As the performance objective is unbounded, the traditional notion of a minimizer is no longer valid. Thus, new types of optimality were introduced in the 1960s by C.C. von Weizsäcker [10] to deal with this problem. These notions are now known as overtaking optimality, weakly overtaking optimality, and finite optimality. The most useful and strongest of these three types of optimality is overtaking optimality. In words, a sequence $\{x^*, c^*\}$ is overtaking optimal if, when compared with any other feasible sequence $\{x_n, c_n\}$, the finite horizon performance criterion $J_N(\{c^*_n\})$ is larger than $J_N(\{c_n\})$ to within an arbitrarily small margin of error for all $N$ sufficiently large.
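In symbols (a standard way of writing the verbal definition above, for the maximization form of the criterion), $\{x^*, c^*\}$ is overtaking optimal if, for every feasible $\{x_n, c_n\}$,

$$\liminf_{N \to \infty} \left[ J_N(\{c^*_n\}) - J_N(\{c_n\}) \right] \ge 0,$$

which is equivalent to requiring that for every $\varepsilon > 0$ there is an $N_0$ with $J_N(\{c^*_n\}) > J_N(\{c_n\}) - \varepsilon$ for all $N \ge N_0$. For the bliss formulation, which is minimized, the inequality is reversed: $\limsup_{N \to \infty} [ J_N(\{c^*_n\}) - J_N(\{c_n\}) ] \le 0$.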

The introduction of new types of optimality led to important new results concerning these problems. The first necessary conditions for these types of optimality were given in 1974 by H. Halkin [5], in which the classical Pontryagin maximum principle was extended. Of particular note in this result was the fact that the classical transversality condition found in corresponding finite horizon problems does not necessarily hold. This fact led to many results which ensure that some sort of boundary condition holds at infinity. The first general existence theorem for these optimization problems was given by W.A. Brock and A. Haurie [1] in 1976. During the 1980s these major results were extended in a variety of directions, and many of these results are discussed in [3].


Nonzero-Sum Infinite Horizon Games 


We now turn our attention to $p$-player games. From now on we use the general notation introduced at the beginning of this article. To simplify the exposition somewhat, we shall use a simplified paradigm in which each player controls his/her own dynamical system. Hence each player enjoys his/her own state and control, say $\{x_i(n), u_i(n)\}$ for $i = 1, \ldots, p$ and $n = 1, 2, \ldots$, and has a performance criterion, say $J_N^i(x, u)$, which is described in discrete time up to the end of period $N$ as

$$J_N^i(x, u) = \sum_{n=1}^{N} g_i(x(n), u(n)).$$

Here we use the notation

$$x(n) = (x_1(n), \ldots, x_p(n))$$

and

$$u(n) = (u_1(n), \ldots, u_p(n)).$$


From the notation we see that each player's performance measure depends not only on his/her own decision but also on those of the other players. This coupling may also occur in the dynamical system. In discrete time these systems may be represented by a system of $p$ difference equations

$$x_i(n+1) = f_i(x(n), u(n))$$

for $n = 0, 1, \ldots$ and $i = 1, \ldots, p$.

The goal of each of the players is to 'play' the game so that their decisions provide them with the best performance possible. This aim is in conflict with that of the other players, and therefore it is generally not possible for all the players to minimize or maximize their performance simultaneously. The way one defines optimality in a game depends on the mode of play, i.e., whether the players behave in a cooperative or in a noncooperative fashion.


Cooperative Solution 


If players cooperate they will want to reach an undominated solution, also called a Pareto solution after its originator, V. Pareto [7], who introduced the concept in 1896. A pair $\{x, u\}$ is called a cooperative solution if there does not exist a feasible point $\{y, v\}$ satisfying $J^i(y, v) \ge J^i(x, u)$ for all players $i = 1, \ldots, p$, with strict inequality for at least one of the players. It is well known that such a solution can be obtained by solving an appropriate single player problem in which the payoff is a weighted sum of the payoffs of all of the players,

$$\sum_{j=1}^{p} r_j J^j(x, u), \qquad r_j \ge 0, \quad j = 1, \ldots, p.$$

In this way the problem is reduced to the case of infinite horizon optimization, and the remarks made earlier apply.


Noncooperative Solutions 


If players do not cooperate, one may consider that they will be satisfied with the outcome if, for each player, his/her strategy is the best response he/she can make to the strategies adopted by the other players. This is the concept of equilibrium introduced in 1951 by J.F. Nash [6] in the context of matrix games. In general, the search for a Nash equilibrium cannot be reduced to an optimization problem. Since each player's decision is his/her best decision under the assumption that the decisions of the other players are fixed, the search for an equilibrium is equivalent to the search for a fixed point of a reaction mapping that associates with each strategy choice by the $p$ players the set of optimal responses by each of them. To better understand this concept it is preferable to consider first a game defined in its normal form. Let $\gamma_j \in \Gamma_j$ denote the strategies of player $j$. Let $V_j(\gamma_1, \ldots, \gamma_p) \in \mathbb{R}$ be the payoff to player $j$ associated with the strategy choices $\gamma = (\gamma_1, \ldots, \gamma_p)$ of the $p$ players. $\gamma^*$ is a Nash equilibrium if

$$V_j(\gamma_1^*, \ldots, \gamma_{j-1}^*, \gamma_j, \gamma_{j+1}^*, \ldots, \gamma_p^*) \le V_j(\gamma_1^*, \ldots, \gamma_p^*) \qquad \forall \gamma_j \in \Gamma_j, \quad j = 1, \ldots, p.$$


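Now we introduce the product strategy set $\Gamma = \Gamma_1 \times \cdots \times \Gamma_p$ and the mapping $\theta: \Gamma \times \Gamma \to \mathbb{R}$ defined by

$$\theta(\gamma, \gamma') = \sum_{j=1}^{p} V_j(\gamma_1, \ldots, \gamma_{j-1}, \gamma'_j, \gamma_{j+1}, \ldots, \gamma_p).$$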


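Finally, let us define the point-to-set mapping $\Psi: \Gamma \to 2^{\Gamma}$ by

$$\Psi(\gamma) = \left\{ \gamma' \in \Gamma : \theta(\gamma, \gamma') = \sup_{\gamma'' \in \Gamma} \theta(\gamma, \gamma'') \right\}.$$

$\Psi$ is the best response mapping for the game. A fixed point of $\Psi$ is a strategy vector $\gamma^*$ such that

$$\gamma^* \in \Psi(\gamma^*).$$

$\gamma^*$ is a fixed point of $\Psi$ if and only if it is a Nash equilibrium.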
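For a finite game in normal form, the fixed-point characterization can be checked by brute force. The Python sketch below is illustrative only: the payoff table is a hypothetical prisoner's-dilemma-type game (strategy 0 = cooperate, 1 = defect), and $\theta$ is taken to be the sum of unilateral-deviation payoffs $\theta(\gamma, \gamma') = \sum_j V_j(\gamma_1, \ldots, \gamma'_j, \ldots, \gamma_p)$, an assumption under which $\Psi$ is exactly the best response mapping.

```python
from itertools import product

# Hypothetical two-player game: V[j][(s1, s2)] is the payoff V_j of player j.
V = [
    {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1},  # player 1
    {(0, 0): 3, (0, 1): 5, (1, 0): 0, (1, 1): 1},  # player 2
]
GAMMA = [(0, 1), (0, 1)]  # finite strategy sets Gamma_1, Gamma_2


def theta(g, g_prime):
    """Sum over players j of V_j when only player j switches to g_prime[j]."""
    total = 0
    for j in range(len(g)):
        deviation = list(g)
        deviation[j] = g_prime[j]
        total += V[j][tuple(deviation)]
    return total


def psi(g):
    """Best response mapping: all profiles g' that maximize theta(g, g')."""
    profiles = list(product(*GAMMA))
    best = max(theta(g, gp) for gp in profiles)
    return {gp for gp in profiles if theta(g, gp) == best}


def nash_equilibria():
    """Profiles g with g in psi(g), i.e. fixed points of the best response map."""
    return [g for g in product(*GAMMA) if g in psi(g)]


if __name__ == "__main__":
    print(nash_equilibria())
```

Only mutual defection (1, 1) is a fixed point of $\Psi$; it is the unique pure-strategy Nash equilibrium of this game, even though both players would prefer (0, 0).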

In a dynamic setting the concept of strategy is closely related to the information structure of the game. We assumed at the beginning that the players can remember the whole (state and control) history of the dynamical system they contribute to control. This is the most precise information that can be available to the players at each instant of time. At the other extreme, we can assume that the only information available to a player is the initial state of the system $x^0 = x(0)$ and the current time $n$. A strategy $\gamma_j$ for player $j$ will thus be an open-loop control $\{u_j(n)\}_{n = 0, \ldots, \infty}$. An equilibrium in this class of strategies is called an open-loop Nash equilibrium. An intermediate case is the one where each player can observe the state of the system at each time period but does not recall the previous history of the system, neither the state nor the control values. A (stationary) strategy $\gamma_j$ for player $j$ will thus be a closed-loop control or a feedback control $\gamma_j: x \mapsto u_j = \gamma_j(x)$. An equilibrium in this class of strategies is called a feedback Nash equilibrium. In the economics literature, feedback strategies are also called Markov strategies to emphasize the lack of memory in the information structure.

For a single agent deterministic system, i.e. an op- 
timal control problem, the information structure does 
not really matter. The agent will not be able to do bet- 
ter than the optimal open-loop control, even if he/she 
has a perfect memory. In a two-player zero-sum dy- 
namic game this will also be the case. In a nonzero-sum 
game the different information structures lead to differ- 
ent types of equilibria. 

A criticism of the open-loop Nash equilibrium is that it is not necessarily subgame perfect in the sense of R. Selten [9]. This means that if a player deviates from the equilibrium control for a while and then decides to play 'correctly' again, the continuation of the previously defined equilibrium is no longer an equilibrium. A feedback Nash equilibrium can be made subgame perfect if one uses dynamic programming to characterize it. A memory strategy Nash equilibrium can also be made subgame perfect. Furthermore, the possibility of remembering past actions or state values permits the players to define a so-called communication equilibrium where, before play, the agents communicate with each other and decide to use a specific memory strategy equilibrium. The memory permits the inclusion of threats that would support a cooperative outcome. The cooperative outcome then also becomes a Nash equilibrium outcome. Results of this type are known as the 'folk theorem' in economics. The infinite horizon is essential to obtain this type of result.

Nevertheless, the open-loop concept is still of wide interest for a variety of reasons. In infinite horizon games the notion of an overtaking Nash equilibrium is defined analogously to the concept in the single-player case. These ideas have only recently begun to be studied extensively, with the first existence theory for open-loop Nash equilibria and a corresponding turnpike theory being given in 1996 in [2]. Finally, from a practical standpoint, the numerical computation of a feedback Nash equilibrium is much less understood than the computation of an open-loop Nash equilibrium. The analogous theory for feedback (or closed-loop) equilibria is still waiting to be developed.

In closing, the theory of infinite horizon dynamic games is for the most part still in its infancy, and much remains to be studied. One important open question concerns the existence of overtaking feedback Nash equilibria; another is whether, once such an equilibrium is known to exist, a robust numerical procedure for computing it can be developed.
This article concerns optimization in two senses. The first is that information-based complexity (IBC) is the study of the minimal computational resources needed to solve continuous mathematical problems. (Other types of mathematical problems are also studied; the problems studied by IBC will be characterized later.) J.F. Traub and A.G. Werschulz [14] provide an expository introduction to the theory and applications of IBC, with a bibliography of over 400 recent papers and books. A general formulation with proofs can be found in [13].

The second is that the computational complexity of 
optimization problems is one of the areas studied in IBC. 
S.A. Vavasis [16, p. 135] calls this information-based
optimization. We will discuss information-based com- 
plexity and information-based optimization in turn. 


Information-Based Complexity 


To introduce computational complexity, we first define 
the model of computation. The model of computation 
states which operations are permitted and how much 
they cost. The model of computation is based on two 
assumptions: 

1) We can perform arithmetic operations and comparisons on real numbers at unit cost.

2) We can perform an information operation at cost $c$. Usually, $c \gg 1$.

We comment on these assumptions. The real number 
model (Assumption 1) is used as an abstraction of the 
floating-point model typically used in scientific compu- 
tation. Except for the possible effect of roundoff errors 
and numerical stability, complexity results will be the 
same in these two models. 

The real number model should be contrasted with 
the Turing machine model, typically used for discrete 
problems. The cost of an operation in a Turing ma- 
chine model depends on the size of the operands, which 
is not a good assumption for floating point numbers. 
For a full discussion of the pros and cons of the Turing machine and real number models see [14, Chap. 8]. Whether the real number or Turing machine model
is used can make an enormous difference. For exam- 
ple, L.G. Khachiyan [3] shows that linear program- 
ming is polynomial in the Turing machine model. In 
1982, Traub and H. Wozniakowski [15] showed that 
Khachiyan’s algorithm is not polynomial in the real 
number model and conjectured that linear program- 
ming is not polynomial in this model. This conjecture 
is still open. 


The purpose of information operations (Assump- 
tion 2) is to replace the input by a finite set of num- 
bers. For integration, the information operations are 
typically function evaluations. 


Computational Complexity 
of High-Dimensional Integration 


We illustrate some of the important ideas of IBC with 
the example of high-dimensional integration. 

We wish to compute the integral of a real-valued function $f$ of $d$ variables over the unit cube in $d$ dimensions. Typically, we have to settle for computing a numerical approximation with an error $\varepsilon$. To guarantee an $\varepsilon$-approximation we have to know some global information about the integrand. We assume that the class $F$ of integrands has smoothness $r$. One such class is $F_r$, which consists of those functions having continuous derivatives through order $r$, these derivatives satisfying a uniform bound.

A real function of a real variable cannot be entered 
into a digital computer. We evaluate f at a finite num- 
ber of points and we call the set of values of f the local 
information, for brevity information, about f. An algo- 
rithm combines the function values into a number that 
approximates the integral. 

In the worst-case setting we want to guarantee an error of at most $\varepsilon$ for every $f \in F$. The computational complexity, for brevity complexity, is the least cost of computing the integral to within $\varepsilon$ for every $f$. We want to stress that the complexity depends on the problem and on $\varepsilon$, but not on the algorithm. Every possible algorithm, whether or not it is known, and all possible points at which the integrand is evaluated are permitted to compete when we consider least possible cost.

It can be shown that if $F = F_r$, then the complexity of our integration problem is of order $\varepsilon^{-d/r}$. If $r = 0$, e.g., if our set of integrands consists of uniformly bounded continuous functions, the complexity is infinite. That is, it is impossible to solve the problem to within $\varepsilon$. Let $r$ be positive and in particular let $r = 1$. Then the complexity is of order $\varepsilon^{-d}$. Because of the exponential dependence on $d$, we say the problem is computationally intractable. This is sometimes called the curse of dimensionality.
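The exponential dependence on $d$ is easy to see in a classical algorithm. The sketch below uses a tensor-product midpoint rule (an illustrative choice, not an algorithm discussed in this article) with $m$ nodes per axis. For integrands with bounded first derivatives ($r = 1$) its error decreases roughly like $1/m$ for fixed $d$, so an $\varepsilon$-approximation needs $m$ of order $1/\varepsilon$ and hence about $\varepsilon^{-d}$ function evaluations, in line with the complexity stated above. The complexity result itself is stronger, since it bounds every algorithm, not just this one.

```python
from itertools import product


def product_midpoint(f, d, m):
    """Tensor-product midpoint rule for the integral of f over [0, 1]^d.

    Uses m equally spaced midpoints per axis and returns
    (estimate, number of function evaluations), the latter being m**d.
    """
    nodes = [(k + 0.5) / m for k in range(m)]
    total = 0.0
    evaluations = 0
    for point in product(nodes, repeat=d):
        total += f(point)
        evaluations += 1
    return total / m**d, evaluations


if __name__ == "__main__":
    # The exact integral of x_1 + ... + x_d over the unit cube is d / 2.
    for d in range(1, 6):
        estimate, n = product_midpoint(sum, d, m=10)
        print(f"d = {d}: estimate = {estimate:.6f} (exact {d / 2}), evaluations = {n}")
```

Keeping $m = 10$ nodes per axis, the number of evaluations grows from 10 at $d = 1$ to 100000 at $d = 5$.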

We will compare this $d$-dimensional integration problem with the well-known traveling salesman problem (TSP), an example of a discrete combinatorial
problem. The input is the location of the n cities and 
the desired output is the minimal route; the city lo- 
cations are usually represented by a finite number 
of bits. Therefore the input can be exactly entered 
into a digital computer. The complexity of this prob- 
lem is unknown but conjectured to be exponential 
in the number of cities. That is, the problem is con- 
jectured to be computationally intractable and many 
other combinatorial problems are conjectured to be in- 
tractable. 

Most problems in scientific computation which involve
multivariate functions belonging to F_r have been
proven computationally intractable in the number of 
variables in the worst-case setting. These include non- 
linear equations [10], partial differential equations [19], 
function approximation [7], integral equations [19], 
and optimization [6]. Material on the computational 
complexity of optimization will be presented in the sec- 
ond half of this article. 

Very high-dimensional integrals occur in many 
disciplines. For example, problems with dimension 
ranging from the hundreds to the thousands occur 
in mathematical finance. Path integrals, which are of 
great importance in physics, are infinite-dimensional, 
and therefore invite high-dimensional approximations. 
This motivates our interest in breaking the curse of di- 
mensionality. Since this is a complexity result, we can- 
not get around it by a clever algorithm. We can try 
to break the curse by settling for a stochastic assur- 
ance rather than a worst-case deterministic assurance. 
Examples of stochastic assurance are provided by the 
randomized and average case settings which we will 
consider below. We can also try to break the curse by 
changing the class of inputs. A good example of this oc- 
curs in mathematical finance. 
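Monte Carlo sampling is the classic example of a randomized method. The sketch below estimates an integral over [0, 1]^360, a dimension typical of the finance problems discussed next. For square-integrable f its root-mean-square error behaves like σ(f)/√n whatever d is, but this is a stochastic assurance, not the worst-case guarantee discussed above. The names `monte_carlo_integral` and `mean_coordinate` are hypothetical.

```python
import random


def monte_carlo_integral(f, d, n, seed=0):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]^d.

    The root-mean-square error is sigma(f) / sqrt(n), with no explicit
    dependence on the dimension d.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]   # one uniform random point in the cube
        total += f(x)
    return total / n


def mean_coordinate(x):
    """Test integrand whose exact integral over [0, 1]^d is 1/2 for every d."""
    return sum(x) / len(x)


estimate = monte_carlo_integral(mean_coordinate, d=360, n=2000)
```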


Mathematical Finance 


The valuation of financial instruments often requires 
the calculation of very high-dimensional integrals. Di- 
mensions of 360 and higher are not unusual. Further- 
more, since the integrals can be very complicated re- 
quiring between 10° and 10° floating point operations 
per integrand evaluation, it is important to minimize 
the number of evaluations. Extensive numerical testing 
shows that these problems do not suffer from the curse
of dimensionality. A possible explanation is given by I.
Sloan and Woźniakowski [11], who show that the curse
can be broken by changing the class of integrands to 
capture the essence of the mathematical finance prob- 
lem. See [14 Chapt. 4] for a survey of high-dimensional 
integration and mathematical finance. 


General Theory 


In general, IBC is defined by the assumptions that the 
information concerning the mathematical model is 

• partial,

• contaminated, and

• priced.

Referring to the integration example, the mathematical 
input is the integrand and the information is a finite 
set of function values. It is partial because the integral cannot, in general,
be recovered from finitely many function values. For a partial
differential equation the mathematical input consists of 
the functions specifying the initial value and/or bound- 
ary conditions. Generally, the mathematical input is re- 
placed using a finite number of information operations. 
These operations may be functionals on the mathemat- 
ical input or physical measurements that are fed into 
a mathematical model. 

In addition to being partial the information is often 
contaminated by, for example, round-off or measure- 
ment error ([8]). If the information is partial or con- 
taminated it is impossible to solve the problem exactly. 
Finally, the information is priced. As examples, func- 
tions can be costly to evaluate or information needed 
for oil exploration models can be obtained by set- 
ting off shocks. With the exception of certain finite- 
dimensional problems, such as roots of systems of poly- 
nomial equations and problems in numerical linear al- 
gebra, the problems typically encountered in scientific 
computation have information that is partial and/or 
contaminated and priced. 

As part of our study of complexity we investigate 
optimal algorithms, that is, algorithms whose cost is 
equal or close to the complexity of the problem. This 
has sometimes led to new solution methods. The reason that we can often obtain the complexity and an
optimal algorithm for IBC problems is that partial and/or contaminated information permits arguments at the
information level. This level does not exist for combinatorial problems, where we usually have to settle
for trying to establish a complexity hierarchy and trying to prove conjectures such as P ≠ NP.

A powerful tool at the information level is the no- 
tion of the radius of information, R. The radius of in- 
formation measures the intrinsic uncertainty of solving 
a problem using given information. We can compute
an ε-approximation if and only if R ≤ ε. The radius depends only on the problem being solved and the
available information; it is independent of the algorithm.
The radius of information is defined in all IBC settings. 
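A small example may help. Consider the class of functions on [0, 1] satisfying a Lipschitz condition with constant L, and take as information the values at the n midpoints (2i − 1)/(2n). The functions ±L·dist(x, nodes) belong to the class and agree with the zero function at every node, so no algorithm can distinguish them. For this class and these nodes the radius of information equals ∫ L·dist(x, nodes) dx = L/(4n), a classical result. The Python sketch below checks that value numerically and computes the smallest n with L/(4n) ≤ ε. The function names are hypothetical.

```python
import math


def radius_lipschitz_midpoints(L, n, m=100_000):
    """Integral over [0, 1] of L * dist(x, nodes), nodes = (2i - 1) / (2n), i = 1..n.

    Computed with an m-point midpoint sum. When 2n divides m the integrand is
    linear on every subinterval, so the sum is exact up to rounding.
    """
    h = 1.0 / m
    total = 0.0
    for j in range(m):
        x = (j + 0.5) * h
        k = min(n - 1, int(x * n))   # index of the cell [k/n, (k+1)/n) containing x
        node = (k + 0.5) / n         # the cell midpoint is the nearest node
        total += L * abs(x - node)
    return total * h


def evaluations_needed(L, eps):
    """Smallest n with radius L / (4n) <= eps."""
    return math.ceil(L / (4 * eps))
```

For instance, `radius_lipschitz_midpoints(2.0, 5)` is close to 2/(4·5) = 0.1. This matches ε^{-d/r} with d = r = 1.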


Information-Based Optimization 


We turn to the application of IBC concepts to informa- 
tion-based optimization. 

In their seminal book, A.S. Nemirovsky and D.B. 
Yudin [6] study a constrained optimization problem. 
They wish to minimize a nonlinear function subject to 
nonlinear constraints. Let f = [f_0, ..., f_m], where f_0 denotes the objective function and f_1, ..., f_m
denote constraints. Let F be the product of m + 1 copies of F_r. Then

$$\mathrm{comp}(\varepsilon) = \Theta\left(\left(\frac{1}{\varepsilon}\right)^{d/r}\right).$$

Thus this problem suffers from the curse of dimensionality.

Vavasis [16 Chapt. 6] reports on the worst-case 
complexity of minimizing an objective function with 
box constraints. He assumes objective functions de- 
fined on the unit cube in d dimensions and takes F as 
the class of continuous functions with uniform Lipschitz constant L. For global minimization,

$$\mathrm{comp}(\varepsilon) = \Theta\left(\left(\frac{L}{\varepsilon}\right)^{d}\right).$$

Thus global minimization is intractable. 
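Grid search shows where the exponent d comes from. For simplicity the sketch assumes a Lipschitz condition in the maximum norm, |f(x) − f(y)| ≤ L·max_i |x_i − y_i|, which may differ from the norm used in [16]. With m = ⌈L/(2ε)⌉ cell centers per axis, every point of the cube is within 1/(2m) of a center. The best sampled value is therefore within ε of the minimum, at a cost of m^d evaluations. The function name is hypothetical.

```python
import itertools
import math


def grid_search_min(f, d, L, eps):
    """Approximate min f over [0, 1]^d within eps by evaluating f at cell centers.

    Assumes |f(x) - f(y)| <= L * max_i |x_i - y_i|. With m = ceil(L / (2 eps))
    centers per axis, every point is within 1 / (2m) of a center in the max norm,
    so the smallest sampled value exceeds the minimum by at most L / (2m) <= eps.
    Returns (approximate minimum, number of evaluations m**d).
    """
    m = max(1, math.ceil(L / (2 * eps)))
    centers = [(k + 0.5) / m for k in range(m)]
    best = min(f(x) for x in itertools.product(centers, repeat=d))
    return best, m ** d
```

With L = 1 and ε = 0.01 this already uses 50 points per axis, i.e. 50^d evaluations.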

In contrast to global minimization, the problem of 
computing a local minimum is tractable with suitable 
conditions on F. Let F consist of continuously differentiable real functions on [0, 1]^d whose gradients
satisfy a uniform Lipschitz condition with constant M. Then 4d(M/ε)^2 function and gradient evaluations are
sufficient.
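The sketch below shows one standard method of this kind: projected gradient descent on [0, 1]^d with step 1/M. It illustrates the idea under that assumption and is not necessarily the algorithm behind the bound above. The function name is hypothetical.

```python
def projected_gradient(grad, x0, M, iters):
    """Projected gradient descent on the box [0, 1]^d with step size 1/M.

    grad(x) returns the gradient as a list; M is the Lipschitz constant of the
    gradient. Each step moves against the gradient and clips back into the box.
    """
    x = [min(1.0, max(0.0, xi)) for xi in x0]
    for _ in range(iters):
        g = grad(x)
        x = [min(1.0, max(0.0, xi - gi / M)) for xi, gi in zip(x, g)]
    return x
```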

As discussed above, there are two ways one can attempt to break the curse of dimensionality: by settling
for a stochastic assurance, or by changing the class of
inputs. For the constrained optimization problem, we 
first describe changing the class of functions, and then 
turn to weakening the assurance. 

Nemirovsky and Yudin [6] take F = F_conv to be the class of convex functions that satisfy a Lipschitz
condition with a uniform constant on a bounded convex set D. Then

$$\mathrm{comp}(\varepsilon) = \Theta\left(\log \frac{1}{\varepsilon}\right),$$

where the constant in the Θ-notation depends polynomially on the dimension d of D and m, the number of
constraints. Thus, convexity breaks the curse of dimen- 
sionality. 
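In one dimension the effect of convexity is easy to see. A subgradient at the midpoint tells us which half of the interval contains a minimizer, so each evaluation halves the uncertainty and about log2(L(b − a)/ε) evaluations suffice. The Python sketch below implements this bisection, assuming a subgradient oracle is available. It is only a one-dimensional illustration, not the multivariate method of [6], and the names are hypothetical.

```python
import math


def convex_bisection_min(f, subgrad, a, b, L, eps):
    """Approximate min f on [a, b] within eps for convex, L-Lipschitz f.

    Each step evaluates one subgradient at the midpoint c and keeps the half
    of [a, b] that must contain a minimizer. After k steps the interval has
    length (b - a) / 2**k, so f(midpoint) - min f <= L * (b - a) / 2**(k + 1).
    Returns (approximate minimum value, number of subgradient evaluations).
    """
    steps = max(0, math.ceil(math.log2(L * (b - a) / eps)))
    evals = 0
    for _ in range(steps):
        c = 0.5 * (a + b)
        g = subgrad(c)
        evals += 1
        if g > 0:      # f(x) >= f(c) + g (x - c) > f(c) for x > c: minimizer in [a, c]
            b = c
        elif g < 0:    # symmetric case: a minimizer lies in [c, b]
            a = c
        else:          # zero subgradient: c is a global minimizer
            return f(c), evals
    return f(0.5 * (a + b)), evals
```

For ε = 10^{-6} on [0, 1] with L = 1 this uses 20 evaluations, compared with the roughly 5·10^5 that the Lipschitz grid search above would need.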

The worst-case deterministic assurance may be 
weakened to a stochastic assurance; we report on the 
randomized and average case settings. 

Nemirovsky and Yudin [6] show that randomiza- 
tion does not break the curse of dimensionality for com- 
puting the minimum value of the nonlinear constrained 
problem. G.W. Wasilkowski [17] establishes an even more negative result if an ε-approximation to the value
of x that minimizes f_0 is sought. He permits randomization and shows that for all ε < 1/2, this problem is
unsolvable even if d = 1.

The results considered so far use a sequential model 
of computation. One could also ask about the complex- 
ity under a parallel model of computation. If we have k 
processors running in parallel, how much can the com- 
putation of the minimum be sped up? Clearly, the best 
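possible speedup is k. Nemirovsky [5] considers this problem for the case F = F_conv, showing that

$$\mathrm{comp}^{\mathrm{par}}(\varepsilon, k) = \Omega\left(\left(\frac{d}{\ln(2kd)}\right)^{1/3} \ln\frac{1}{\varepsilon}\right),$$

where the Ω-constant is independent of k and ε. Hence we find that

$$\frac{\mathrm{comp}(\varepsilon)}{\mathrm{comp}^{\mathrm{par}}(\varepsilon, k)} = O\left(d \left(\frac{\ln(2kd)}{d}\right)^{1/3}\right),$$

which is much less than k. Thus parallel computation is not very attractive for this problem.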
The average case setting looks more promising than 
the randomized setting, but since it is technically very difficult, the results to date are quite limited.
In the average case setting we want to guarantee that the expected error is at most ε and we minimize the
expected cost.

In the average case setting, an a priori measure must 
be placed on F. Typically, this measure is Gaussian; in 
particular, Wiener measures are used. Since the distribution
of the random variable min_x f(x) is difficult to
obtain, the average case analysis of the global optimiza- 
tion problem is very difficult. Only partial results have 
been obtained. 

Let d = 1 and F ⊂ C^r[0, 1]. Assume that F is endowed with the r-fold Wiener measure. Wasilkowski
[18] shows that approximately (ε^{-1} √(ln ε^{-1}))^{1/(r+1/2)} function evaluations suffice. This is better
than the worst case, where some ε^{-1/r} function values are needed.

Stronger results have been obtained for the case of d = 1 and r = 0, i.e., optimization for continuous
scalar functions, equipped with the Wiener measure. K. Ritter [9] considers the case of nonadaptive methods,
showing that

$$\mathrm{comp}^{\mathrm{non}}(\varepsilon) = \Theta\left(\left(\frac{1}{\varepsilon}\right)^{2}\right).$$

Moreover, the optimal evaluation points are equidistant knots. More recently (1997), J.M. Calvin [1]
investigates adaptive methods for this problem, showing that for any δ ∈ (0, 1),

$$\mathrm{comp}^{\mathrm{ad}}(\varepsilon) = O\left(\left(\frac{1}{\varepsilon}\right)^{1/(1-\delta)}\right).$$
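The nonadaptive rate can be observed in a small simulation. The sketch below draws Wiener paths on a fine grid of 1024 steps, which stands in for the continuous path (a discretization assumption). It then compares the minimum over n + 1 equidistant knots with the fine-grid minimum. Ritter's result suggests an average error that decays roughly like n^{-1/2}. The function names are hypothetical.

```python
import random


def brownian_path(n_steps, rng):
    """Wiener process on [0, 1] sampled at n_steps + 1 equally spaced times, W(0) = 0."""
    sd = (1.0 / n_steps) ** 0.5
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, sd))   # independent Gaussian increments
    return w


def avg_error_equidistant(n_knots, n_paths=200, fine=1024, seed=1):
    """Average of (min over n_knots + 1 equidistant knots) - (min over the fine grid).

    fine must be a multiple of n_knots; the knots include t = 0 and t = 1.
    """
    if fine % n_knots:
        raise ValueError("fine must be a multiple of n_knots")
    rng = random.Random(seed)
    stride = fine // n_knots
    total = 0.0
    for _ in range(n_paths):
        w = brownian_path(fine, rng)
        total += min(w[::stride]) - min(w)
    return total / n_paths
```

Both calls in a comparison use the same seed, so they see the same sample paths, which makes the comparison less noisy.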


The study of optimization in the average case setting 
is a very promising area for future research. Important 
open problems include: 

• obtaining multivariate results,

• obtaining lower bounds,

• obtaining better upper bounds.

We now restrict our attention to the special optimiza- 
tion problem of linear programming (LP), which we 
discuss in the worst-case setting. 

In 1979, Khachiyan [3] studied an ellipsoid algorithm and proved that LP is polynomial in the Turing machine
model. In 1982, Traub and Woźniakowski [15] showed that the cost of this ellipsoid algorithm is not
polynomial in the real-number model, and conjectured that the LP problem is not polynomial in the
real-number model. This nicely illustrates the difference between the cost of an algorithm and the
complexity of a problem, since the result concerning the cost of the ellipsoid algorithm leaves open the
question of problem complexity. The Traub-Woźniakowski conjecture remains open.

A related open question is whether LP can be solved 
in strongly polynomial time. (Note that the underlying 
models of computation are different: the real-number 
model versus the Turing machine model.) This ques- 
tion is also still open, with results known only for spe- 
cial cases. In 1984, N. Megiddo [4] showed that LP can 
be solved in linear time if the number of variables is 
fixed, while in 1986, E. Tardos [12] showed that many 
LP problems that arise from combinatorial applications 
can be solved in strongly polynomial time. 

We now discuss the computation of fixed points, 
which we include here because the result involves el- 
lipsoid methods. The problem is to compute the fixed 
point of f(x); that is, to solve the nonlinear equation x = f(x) for any f ∈ F, where F is the class of
functions on [0, 1]^d having a Lipschitz constant q, with q ∈ (0, 1).

The simple iteration algorithm x_{i+1} = f(x_i), with x_0 = 0, can compute an ε-approximation with at most

$$m(\varepsilon, q) = \left\lceil \frac{\ln(1/\varepsilon)}{\ln(1/q)} \right\rceil$$

evaluations of f. Thus the simple iteration algorithm behaves poorly if q is close to one.
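The bound follows from |x_k − x*| ≤ q^k |x_0 − x*| ≤ q^k, since x_0 and x* both lie in the unit cube. For simplicity this sketch works in one dimension, where that distance is at most 1. The Python sketch below runs exactly that many steps; the function name is hypothetical.

```python
import math


def simple_iteration(f, x0, q, eps):
    """Run x_{i+1} = f(x_i) for ceil(ln(1/eps) / ln(1/q)) steps.

    If f is a contraction with constant q and |x0 - x*| <= 1 (true for any
    x0, x* in [0, 1]), then |x_k - x*| <= q**k <= eps after these k steps.
    Returns (x_k, k).
    """
    k = math.ceil(math.log(1.0 / eps) / math.log(1.0 / q))
    x = x0
    for _ in range(k):
        x = f(x)
    return x, k
```

With q = 0.5 and ε = 10^{-6} the sketch runs 20 steps. With q = 0.999 the same formula asks for about 13,800 evaluations.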

Z. Huang, Khachiyan, and K. Sikorski [2] show that an inscribed ellipsoid algorithm computes an
ε-approximation with

$$n(\varepsilon, q) = O\left(d\left(\ln\frac{1}{\varepsilon} + \ln\frac{1}{1-q}\right)\right)$$

function evaluations. Thus their algorithm is excellent for computing fixed points of functions with q close
to unity; that is, almost noncontracting functions.


See also 


▸ Complexity Classes in Optimization

▸ Complexity of Degeneracy

▸ Complexity of Gradients, Jacobians, and Hessians

▸ Complexity Theory

▸ Complexity Theory: Quadratic Programming


Let M be a given n × n matrix, and let q be a given n-vector. The linear complementarity problem (LCP;
cf. also ▸ Linear complementarity problem) is to find a vector x which satisfies the following system:

$$x \ge 0, \qquad Mx + q \ge 0, \qquad x^{\top}(Mx + q) = 0. \tag{1}$$
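A direct check of the three conditions in (1) is often useful when testing LCP solvers. A minimal Python sketch follows; the function name is hypothetical, and a tolerance absorbs rounding.

```python
def is_lcp_solution(M, q, x, tol=1e-9):
    """Check the conditions of (1): x >= 0, w = Mx + q >= 0 and x^T w = 0, up to tol."""
    n = len(q)
    w = [sum(M[i][j] * x[j] for j in range(n)) + q[i] for i in range(n)]
    return (all(xi >= -tol for xi in x)
            and all(wi >= -tol for wi in w)
            and abs(sum(xi * wi for xi, wi in zip(x, w))) <= tol)
```

Because x and w are both nonnegative, x^⊤w = 0 forces x_i w_i = 0 for every i.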


When some or all the variables are required to be integers, the problem is called the integer linear
complementarity problem (ILCP).

Suppose that for each i (i = 1, ..., k), the variable x_i is required to be an integer with

$$x_i \in \{0, \dots, n_i\},$$

while for each i (i = k + 1, ..., n), the variable x_i is continuous and

$$0 \le x_i \le \beta_i.$$

The problem can be formulated as the feasibility problem which finds a solution x and z such that

$$
\begin{aligned}
& 0 \le M_i x + q_i \le B_i (1 - z_i), && i = 1, \dots, n,\\
& 0 \le n_i z_i - x_i, \quad x_i \in \{0, \dots, n_i\}, && i = 1, \dots, k,\\
& 0 \le \beta_i z_i - x_i, \quad 0 \le x_i \le \beta_i, && i = k+1, \dots, n,\\
& z \in \{0, 1\}^n,
\end{aligned}
\tag{2}
$$

where M_i is the ith row of M, q_i is the ith component of q, and B_i is the optimal value of the following
problem:

$$
\begin{aligned}
\max\ & M_i x + q_i\\
\text{s.t. } & x_j \in \{0, \dots, n_j\}, && j = 1, \dots, k,\\
& 0 \le x_j \le \beta_j, && j = k+1, \dots, n,
\end{aligned}
$$

which can be solved analytically.

It has been shown [2] that if the region (2) is empty, 
the associated (ILCP) has no solution. Otherwise, if 
(x, z) satisfies (2), then x solves the (ILCP).

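Obviously, when all the variables of the LCP are zero-one integers, the (ILCP) is formulated as a zero-one
integer feasibility problem of the form:

$$
\begin{aligned}
\text{Find } & x, z\\
\text{s.t. } & 0 \le M_i x + q_i \le B_i (1 - z_i), && i = 1, \dots, n,\\
& 0 \le z_i - x_i, && i = 1, \dots, n,\\
& x, z \in \{0, 1\}^n,
\end{aligned}
\tag{3}
$$

where

$$B_i = \max\{M_i x + q_i : x \in \{0, 1\}^n\}.$$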
It is worth noting that the minimum norm solution of the zero-one (ILCP) can be obtained by solving the
following linear zero-one integer problem:

$$\min \sum_{i=1}^{n} x_i \quad \text{s.t. } (x, z) \text{ satisfies (3)}. \tag{4}$$
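For very small n, problem (4) can be solved by plain enumeration, which is handy for validating a formulation. In (3), z_i ≥ x_i forces z_i = 1 whenever x_i = 1. Then M_i x + q_i ≤ B_i(1 − z_i) = 0, which together with M_i x + q_i ≥ 0 gives complementarity. The sketch below computes B_i by setting x_j = 1 exactly when M_ij > 0, then searches all (x, z). The names are hypothetical and the running time is exponential.

```python
import itertools


def row_bounds(M, q):
    """B_i = max{M_i x + q_i : x in {0,1}^n}: set x_j = 1 exactly when M_ij > 0."""
    return [q[i] + sum(max(m, 0) for m in M[i]) for i in range(len(q))]


def satisfies_3(M, q, B, x, z):
    """Check the constraints of (3) for given zero-one vectors x and z."""
    n = len(q)
    for i in range(n):
        w_i = sum(M[i][j] * x[j] for j in range(n)) + q[i]
        if not 0 <= w_i <= B[i] * (1 - z[i]):
            return False
        if z[i] < x[i]:
            return False
    return True


def min_sum_zero_one_ilcp(M, q):
    """Solve (4) by enumerating all (x, z); returns the best x, or None if (3) is infeasible.

    Exponential in n: only meant for checking tiny instances.
    """
    n = len(q)
    B = row_bounds(M, q)
    best = None
    for x in itertools.product((0, 1), repeat=n):
        if best is not None and sum(x) >= sum(best):
            continue   # cannot improve the current objective value
        if any(satisfies_3(M, q, B, x, z) for z in itertools.product((0, 1), repeat=n)):
            best = x
    return best
```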


We note that many algorithms can solve this problem for
instances of practical size.


Integer variables without known upper bounds 
make the problem much harder. Let us consider the 
(ILCP) defined below: 


$$
\begin{aligned}
\text{Find } & x\\
\text{s.t. } & x \ge 0,\\
& Mx + q \ge 0,\\
& x^{\top}(Mx + q) = 0,\\
& x_i \text{ integer for } i = 1, \dots, k.
\end{aligned}
$$


This problem can be rewritten in the form:

$$
\begin{aligned}
\text{Find } & \alpha, y, z\\
\text{s.t. } & 0 \le My + \alpha q \le e - z,\\
& 0 < \alpha,\\
& 0 \le y \le z,\\
& z \in \{0, 1\}^n,\\
& y_i/\alpha \text{ integer for } i = 1, \dots, k,
\end{aligned}
\tag{5}
$$

where e is a vector of all ones. If (ᾱ, ȳ, z̄) solves (5), i.e., ȳ_i/ᾱ is an integer for each i = 1, ..., k,
then x = ȳ/ᾱ solves its associated (ILCP). See [3] and [2] for a proof of the equivalence.

The (ILCP) arises in several contexts such as poly- 
matrix games in pure strategies, economic equilibrium 
with discrete activity levels and spatial price equilibrium
in discrete commodities. See [2] for details.

Let us consider the polymatrix game. Suppose that each player i (i = 1, ..., n) has a finite number m_i of
strategies, and the partial payoffs to player i, resulting from choices by him/her and player j, are given
by m_i × m_j matrices A^{ij} (i, j = 1, ..., n). The elements of A^{ij} are assumed to be positive without
loss of generality.

Let

$$X^{\top} = \left((X^{1})^{\top}, \dots, (X^{n})^{\top}\right)$$

be a vector where each component X^i_s expresses the probability of player i playing his sth strategy. It
has been shown [1] that finding equilibria of polymatrix games is equivalent to finding solutions of the
linear complementarity problem defined below.

Let 


Pat) yA, fa lyani 
j#i 




ILPs for Routing and Protection Problems in Optical Networks 


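Also, let us define

$$x^{\top} = \left((X^{1})^{\top}, \dots, (X^{n})^{\top}, y_1, \dots, y_n\right), \qquad q^{\top} = (\underbrace{0, \dots, 0}_{\sum_j m_j}, \underbrace{-1, \dots, -1}_{n}),$$

and

$$
M = \begin{pmatrix}
0 & A^{12} & \cdots & A^{1n} & -e & 0 & \cdots & 0\\
A^{21} & 0 & \cdots & A^{2n} & 0 & -e & \cdots & 0\\
\vdots & & \ddots & \vdots & \vdots & & \ddots & \vdots\\
A^{n1} & A^{n2} & \cdots & 0 & 0 & 0 & \cdots & -e\\
e^{\top} & 0 & \cdots & 0 & 0 & 0 & \cdots & 0\\
0 & e^{\top} & \cdots & 0 & 0 & 0 & \cdots & 0\\
\vdots & & \ddots & \vdots & \vdots & & & \vdots\\
0 & 0 & \cdots & e^{\top} & 0 & 0 & \cdots & 0
\end{pmatrix},
$$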

where e is a vector of all ones whose dimension is given by context. Then, the above polymatrix game can be
equivalently written as (1). Moreover, suppose that some players, i (i = 1, ..., k), can select only pure
strategies, while the other players, i (i = k + 1, ..., n), can select mixed strategies. For each player i
(i = 1, ..., k), the vector X^i is then required to be a zero-one integer vector, which results in an (ILCP).


Introduction


Owing to the exponential growth of traffic demand, 
there is an emerging challenge as well as an opportunity 
for service providers to employ Internet Protocol (IP) 
backbone networks carried on top of optical transport 
networks, forming IP over optical infrastructure. This 
technology is poised to take over most broadband operational services as an integrated transport platform,
for the following reasons. Optical networks can carry numerous wavelength signals, or channels,
simultaneously without interaction between wavelengths, a technique known as wavelength division
multiplexing (WDM) [14,16,18].

WDM optical switches are also known to be reliable, to support high speeds, and to be economical, which
makes them an attractive default choice for modern transport networks. Moreover, new emerging services that
require high bandwidth and reliability (such as Internet Protocol TV, IPTV [2]) are considering optical
networks as the underlying network to carry their traffic directly. However, to use WDM networks
efficiently, network operators face a number of management and operational challenges, which often require
complex mathematical models and advanced optimization techniques. This article focuses on these challenges
and briefly reviews how integer linear programming (ILP) formulations and algorithms have been developed and
applied to the domain of optical networks.


Motivations and Challenges 
in Optimization Models 


The management and operation of WDM networks involve a number of challenges, including physical topology
formation, logical topology formation, survivability, and fault management. Designing a new transport
network is very complex, as it requires one to decide where to place optical nodes so as to provide
survivability, connectivity, and cost-effectiveness. Once the physical topology has been fixed, the logical
topology of the backbone is decided by setting up lightpaths from one optical node to another. In transport
networks, providing survivability and fault management is the most important task. In particular, routing
should recover rapidly from any failure in the logical topology, because even a short outage causes a
massive loss of traffic in high-speed transport networks. Addressing these complex management and
operational challenges benefits greatly from mathematical modeling and optimization techniques.

Today's backbone mostly takes the form of a layered
IP over optical network. For survivability, it is
extremely important to address the practical issue of 
how IP routing and protection schemes can effectively ex- 
ploit the lower layer path diversity. IP layer failures are 
known to occur most frequently, while fiber span failures are catastrophic in that they lead to multiple
simultaneous upper layer failures. Moreover, some IP layer failures can only be addressed in the IP layer, as
lower layer (optical) survivability mechanisms cannot 
detect failures occurring at higher (IP or applications) 
layers. In order to rapidly recover from network-wide 
failures and provide persistent end-to-end path quality, 
service providers may set up two diverse paths: the ser- 
vice (primary) path and the restoration (backup) path. 
Any failure in the service path can be hidden, as traffic 
can be instantly rerouted to the restoration path. Obvi- 
ously, the efficacy of the restoration path depends heav- 
ily on how disjoint these two paths are (under the most 
frequent single failures). Therefore, it is important to 
understand how the layering employed in the network 
affects the correlation of failures among paths.

This demonstrates the importance of protection
against failures in layered networks arising out of 
shared risk resource groups (SRRG). An example of 
SRRG is multiple IP links sharing a common optical 
component, including ducts or conduits through which 
multiple optical links are routed under the ground. To 
effectively deliver high-quality services to customers, 
network providers are required to incorporate this 
SRRG information into their routing and protection 
schemes, which has become one of the most challenging 
problems in networking practice. Note that the SRRG- 
diverse constraint involves multiple link failure mod- 
els, in the form of shared risk link groups (SRLGs) and 
shared risk node groups (SRNGs). Most previous
studies in IP-over-WDM networks considered only two
possible alternatives of routing and protection schemes: 
protection at the optical layer or restoration at the IP 
layer [7,11,15,16,17]. 


Path-Protection (Diverse) Routing Problem 


Finding, for each working or “primary” path, a backup path that is disjoint from it is generally known as
path protection, and has been widely studied
in optical networks. Medard et al. [12] focused on the 
problem of identifying two redundant trees from a sin- 
gle source to a set of destinations that can survive any 
single link failure (i.e., the elimination of any vertex in
the graph leaves each destination vertex connected to 
the source via at least one of the directed trees). El- 
linas et al. [5] focused on the problem of identifying 
two diverse paths that are SRRG failure resilient. They 
were the first to theoretically prove that if an arbitrary 
set of links can belong to a common SRLG, then the 
problem of finding SRLG-diverse paths between a given 
source and destination is NP-complete for unicast traf- 
fic. Subsequently, Zang et al. [22] proposed heuristic al- 
gorithms for the combined problem of finding SRLG- 
diverse paths and wavelength assignment for one-to- 
one (unicast) traffic. Most recently, Cha et al. [2] stud- 
ied the SRLG-diverse routing for one-to-many (multi- 
cast) traffic, where they focused on the combined prob- 
lem of minimizing the network cost of multicasting 
traffic from dual sources to multiple destinations while 
providing path protection against a single SRLG fail- 
ure. 


Minimum Color Problem 


Coudert et al. [4] proposed new techniques for the minimum color path problem, which addresses tolerance
to the multiple failures caused by a single SRRG failure. The related minimum color st-cut problem was also
shown to be NP-complete and hard to approximate. Each SRRG is associated with a so-called color in a
colored graph G_c = (V, E, C), where C is a family of subsets of E. The minimum color path problem is to
find a path from a node s to a node t that minimizes the number of different colors of its links. This
problem was proven to be NP-complete in [21] and polynomial in the special case where all the edges of each
color have a common extremity. Many insightful theoretical results on this problem were reported in [4].


Definition 1 [4] Let G = (V, E, C) be a colored graph, where C is a partition of E. The minimum color cut
consists in finding a minimum set of colors disconnecting G. Let s, t ∈ V be two distinct vertices in G. The
minimum color st-cut problem is to find a minimum set of colors disconnecting s from t.
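For small instances the definition can be checked by exhaustive search over color subsets of increasing size. The Python sketch below does so; the names are hypothetical. Given Theorem 1, such enumeration is only practical for tiny instances.

```python
import itertools
from collections import deque


def connected(n, edges, s, t):
    """Breadth-first search: is t reachable from s in the undirected graph on nodes 0..n-1?"""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def min_color_st_cut(n, colored_edges, s, t):
    """Smallest set of colors whose removal disconnects s from t.

    colored_edges: list of (u, v, color). Tries all color subsets in order of
    increasing size, so the running time is exponential in the number of colors.
    Returns None when no color set works (only possible if s == t).
    """
    colors = sorted({c for _, _, c in colored_edges})
    for k in range(len(colors) + 1):
        for chosen in itertools.combinations(colors, k):
            kept = [(u, v) for u, v, c in colored_edges if c not in chosen]
            if not connected(n, kept, s, t):
                return set(chosen)
    return None
```

In the first test instance a single color suffices, although at least two links must be removed to separate the nodes.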


Theorem 1 [4] The minimum color st-cut is NP-hard. 


Coudert et al. [4] proved this theorem by a reduction from set cover, mapping each set of the set cover
instance to a color of the minimum color st-cut instance.


Theorem 2 [4] The minimum color st-cut problem is not approximable within a factor o(log n) unless
NP ⊆ TIME(n^{O(log log n)}).


Theorem 3 [4] When the edges of each color induce 
a connected subgraph the minimum color st-cut and the 
minimum color cut problems are polynomial. 


Theorem 4 [4] The minimum color st-cut is k-approx- 
imable when the number of connected components of the 
subgraph induced by edges of each color is bounded by k. 


Define a nonnegative variable x_i for each node i, where an edge e between nodes i and j is cut if x_i ≠ x_j.
If any edge of color c is cut, then c is a color cut. Let a binary variable y_c be associated with each
color c, where y_c = 1 when the color c is selected to be in the set of cut colors, and y_c = 0 otherwise.
The minimum color st-cut problem can be formulated as a mixed integer linear program as follows:

$$\min \sum_{c \in C} y_c \tag{1}$$

$$\text{subject to } y_c \ge |x_i - x_j| \quad \forall c \in C,\ \forall (i,j) \in c, \tag{2}$$

$$x_i \ge 0 \quad \forall i \in V, \tag{3}$$

$$x_s = 0, \quad x_t = 1. \tag{4}$$

Constraint (2) becomes linear once each absolute value is replaced by the pair of inequalities
y_c ≥ x_i − x_j and y_c ≥ x_j − x_i.
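Any set of colors that separates s from t yields a feasible solution of (1)-(4) whose objective value equals the size of the set. Put x_i = 0 on the nodes still reachable from s once the edges of those colors are removed, x_i = 1 elsewhere, and y_c = 1 exactly for the chosen colors. An edge with x_i ≠ x_j joins a reachable and an unreachable node, so its color must be one of the removed ones. The sketch below builds and checks such a certificate; the names are hypothetical.

```python
from collections import deque


def cut_certificate(n, colored_edges, s, t, cut_colors):
    """Build (x, y) for formulation (1)-(4) from a candidate set of cut colors.

    x_i = 0 for nodes reachable from s without using a cut color, x_i = 1 otherwise;
    y_c = 1 exactly for the cut colors. Returns None if the colors do not separate s and t.
    """
    adj = {v: [] for v in range(n)}
    for u, v, c in colored_edges:
        if c not in cut_colors:
            adj[u].append(v)
            adj[v].append(u)
    reach = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in reach:
                reach.add(v)
                queue.append(v)
    if t in reach:
        return None
    x = [0 if v in reach else 1 for v in range(n)]
    y = {c: int(c in cut_colors) for _, _, c in colored_edges}
    return x, y


def satisfies_formulation(colored_edges, s, t, x, y):
    """Check (2)-(4): y_c >= |x_i - x_j| on every edge of color c, x >= 0, x_s = 0, x_t = 1."""
    return (all(y[c] >= abs(x[i] - x[j]) for i, j, c in colored_edges)
            and all(v >= 0 for v in x)
            and x[s] == 0 and x[t] == 1)
```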


Finding the Dual Link Problem 


The problems related to finding a pair of link or node 
disjoint paths in single cost networks have been stud- 
ied since the mid 1980s [19]. The min-sum problem 
of dual link is to minimize the sum of the costs of the 
two disjoint paths and can be solved using a polynomial 
time algorithm called the shortest pair of paths [19]. 
In a recent study, the min-sum problem was shown to 
be a special case of the min-cost flow problem [2]. In 
contrast to the min-sum problem, the min-max problem, whose objective is to minimize the length of the
longer of the two paths, was proven to be NP-complete [10]. The min-min problem, whose objective is to
minimize the length of the shorter of the two paths, can also be proven to be NP-complete [20] by a
reduction from the well-known 3-satisfiability (3SAT) problem. The proof can be described as follows [20].
An instance of 3SAT is a Boolean formula that is the AND of m clauses C_j (j = 1, ..., m). A clause is the
OR of three literals, each of which is an occurrence of a variable x_i (i = 1, ..., n) or its negation. A
truth assignment is a function t : {x_i} → {true, false}. C_j is satisfied by t if it contains a literal
with truth value true. The question of 3SAT is to determine whether there is a truth assignment that
satisfies all m clauses simultaneously. With the 3SAT approach, Xu et al. [20] established the following
theorem.
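The min-sum case mentioned above can be solved as a minimum-cost flow of value 2 with unit capacities. The Python sketch below uses successive shortest paths with Bellman-Ford on the residual graph, a standard technique. It is not necessarily the algorithm of [19], and the names are hypothetical. It finds arc-disjoint paths in a directed graph; undirected links would need extra care.

```python
import math


def min_sum_disjoint_pair(n, arcs, s, t):
    """Minimum total cost of two arc-disjoint s-t paths in a directed graph.

    Solved as a min-cost flow of value 2 with unit arc capacities, using
    successive shortest paths (Bellman-Ford on the residual graph).
    arcs: list of (u, v, cost) with nonnegative cost and u != v.
    Returns None if two arc-disjoint paths do not exist.
    """
    # Residual graph: graph[u] holds [head, residual capacity, cost, index of reverse arc].
    graph = [[] for _ in range(n)]
    for u, v, c in arcs:
        graph[u].append([v, 1, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])
    total = 0
    for _ in range(2):                      # push two units of flow, one path at a time
        dist = [math.inf] * n
        prev = [None] * n                   # (tail node, arc index) of the last arc used
        dist[s] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if dist[t] == math.inf:
            return None
        v = t
        while v != s:                       # augment one unit along the path found
            u, k = prev[v]
            arc = graph[u][k]
            arc[1] -= 1
            graph[v][arc[3]][1] += 1
            v = u
        total += dist[t]
    return total
```

In the first test the greedy choice of the single shortest path (0-1-2-3, cost 3) leaves no disjoint second path. The flow formulation instead returns the optimal pair 0-1-3 and 0-2-3 with total cost 6.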


Theorem 5 [20] The problem of finding two node/link- 
disjoint paths between a pair of source and destination 
nodes in a directed/undirected network with minimum 
cost for the shorter one is NP-complete. 


SRRG-Diverse Routing Problem 


The SRRG-diverse routing problem can be consid- 
ered to be a generalization of the link-diverse/disjoint 
routing problem. In a multicast context, the link- 
disjoint path-protection problem can be viewed as a di- 
verse routing problem of identifying two redundant 
trees from a single source to a set of destinations 


that can survive any single link failure [6,12]. The di- 
verse routing problem has been previously shown to 
be NP-complete [1,7,8]. During the past few years, 
there has been increasing interest in the diverse routing problem with SRRG-diverse constraints, as
SRRG-diversity requirements play a crucial role in real-life network provisioning problems. One real-life
example is finding a pair of diverse paths at the optical layer, which involves searching for two
SRLG-diverse paths, since each link at the OXC (optical cross connect) layer may be related to several
SRLGs. Many recent studies have shown that generalized SRLG (or SRRG) diverse routing is a special
case of the diverse routing problem, which is also NP- 
complete [5,8,11,13,22]. Among those studies, the di- 
verse routing problem of unicast (one-to-one) traffic 
with SRLG-diverse constraints has been proven to be 
NP-complete [5]. In later studies, the diverse routing 
problem was extended to many special cases (e. g., di- 
verse routing under both wavelength capacity and path 
length constraints [22], multicast routing under SRLG- 
diverse constraint [3]). 

Cha et al. [3] proposed a generalized case of the 
SRRG-diverse routing problem where there are two 
source nodes. The problem can be formally defined as 
follows. Let G = (V,E) be an undirected graph rep- 
resenting the backbone network. We denote the set of 
network nodes by V, while E is the set of duplex com- 
munications links (edges). Let the number of nodes and 
edges be n and m, respectively. There is a set of two 
source nodes, denoted by S C V, and there is a set of 
destination nodes, denoted by D C V. Denote Basa set 
of SRLGs. Each link (i,j) € E in the graph has a cost 
function (c;;) associated with it and belongs to a subset 
of SRLGs in B. Note that the cost c;; is the sum of the 
port cost at nodes i and j and the transport cost relative 
to the distance of link (i, j). This problem was proven to 
be NP-hard in [3] by using a reduction from the SRLG- 
diverse path problem [5]. 
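The SRLG-diversity condition itself is simple to test for a given pair of paths. The union of the SRLGs of the links on one path must be disjoint from that of the other. The Python sketch below treats links as undirected and shows that two link-disjoint paths can still fail the test. The names are hypothetical.

```python
def srlgs_of_path(path, srlg_of_link):
    """Union of the SRLGs of the links on a path given as a list of nodes."""
    groups = set()
    for u, v in zip(path, path[1:]):
        groups |= srlg_of_link[frozenset((u, v))]   # undirected link key
    return groups


def srlg_diverse(path_a, path_b, srlg_of_link):
    """True if no SRLG is shared by the two paths, so no single SRLG failure hits both."""
    return srlgs_of_path(path_a, srlg_of_link).isdisjoint(srlgs_of_path(path_b, srlg_of_link))


# Two link-disjoint paths from 0 to 3 whose last links share the group "b2".
example_srlgs = {frozenset((0, 1)): {"b1"}, frozenset((1, 3)): {"b2"},
                 frozenset((0, 2)): {"b3"}, frozenset((2, 3)): {"b2"}}
```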


Theorem 6 The two-source SRRG-diverse routing 
problem is strongly NP-hard. 


This theorem was proven in [3], where this problem was claimed to be a generalization of the problem of
finding SRLG-diverse paths between a source and a destination in a given graph (SRG (shared risk
group)-diverse routing) proposed in the paper by Ellinas et al. [5]. We add two nodes (|S| = 2) to the
graph and two links of cost 0, each connecting one of the new nodes to the source. Assume that the two new
links do not share any SRLGs. We then add |D| nodes and 2·|D| edges. Each of these nodes is connected to
the destination node by two edges that do not share any SRLGs. Then the transformation is complete and
clearly polynomial.

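The ILP of the two-source SRRG-diverse routing problem can be formulated as follows. Define the following
decision variables. Y^s_{ij} = 1 if link (i, j) is used by the multicast tree rooted at source node s;
0 otherwise. X^s_{ijd} = 1 if link (i, j) is used by the multicast tree rooted at source node s to reach
destination d; 0 otherwise. Z^s_{bd} = 1 if the path from source s to destination d uses SRLG b;
0 otherwise. The ILP formulation is given by

$$\min \sum_{s \in S} \sum_{(i,j) \in E} c_{ij}\, Y^{s}_{ij} \tag{5}$$

subject to

$$Y^{s}_{ij} \ge X^{s}_{ijd} \quad \forall (i,j) \in E,\ \forall s \in S,\ \forall d \in D, \tag{6}$$

$$\sum_{j:(i,j) \in E} X^{s}_{ijd} - \sum_{j:(j,i) \in E} X^{s}_{jid} = \sigma^{s}_{id} \quad \forall i \in V,\ \forall s \in S,\ \forall d \in D, \tag{7}$$

$$Z^{s}_{bd} \ge X^{s}_{ijd} \quad \forall (i,j) \in b,\ \forall s \in S,\ \forall d \in D,\ \forall b \in B, \tag{8}$$

$$\sum_{s \in S} Z^{s}_{bd} \le 1 \quad \forall d \in D,\ \forall b \in B, \tag{9}$$

$$X^{s}_{ijd},\ Y^{s}_{ij},\ Z^{s}_{bd} \in \{0, 1\} \quad \forall s \in S,\ \forall d \in D,\ \forall (i,j) \in E,\ \forall b \in B, \tag{10}$$

where

$$\sigma^{s}_{id} = \begin{cases} 1 & \text{if } i = s,\\ -1 & \text{if } i = d,\\ 0 & \text{otherwise.} \end{cases} \tag{11}$$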


The constraints in (6) ensure that an edge must be selected by the multicast tree when it is used by the
multicast tree to carry any traffic. The flow constraints in (7) ensure flow conservation at each node,
allowing each destination to have a flow path from the source. More precisely, σ^s_{id} is the net flow
generated, carried, or destined at node i for destination d by the multicast tree rooted at node s. It has
the value 1 if node i is the source, −1 if node i is the destination (acting as a sink), and 0 otherwise
(whether or not node i belongs to the multicast tree). The constraints in (8) ensure that an SRLG must be
selected by the path from a source to a destination when a link that belongs to the SRLG is used by the
path. The constraints in (9) state that, for each destination d and SRLG b, at most one of the two sources
may reach d through a path that uses b, so that no single SRLG failure can cut d off from both sources.
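Given candidate paths, the objective (5) and constraints (9) can be evaluated directly, which is a cheap way to sanity-check an ILP implementation. The sketch below derives the link sets forced by (6) and the SRLG indicators forced by (8) from one path per source-destination pair. It treats links as undirected, whereas the ILP uses ordered pairs (i, j). The names and example data are hypothetical.

```python
def evaluate_two_source_solution(paths, cost, srlg_of_link, sources, destinations):
    """Evaluate objective (5) and check constraints (9) for candidate paths.

    paths[(s, d)]: node list of the path from source s to destination d.
    cost[link], srlg_of_link[link]: link cost c_ij and set of SRLGs, with
    link = frozenset((i, j)) (links treated as undirected).
    Returns (objective value, True if every constraint (9) holds).
    """
    tree_links = {s: set() for s in sources}   # links with Y^s_ij = 1, as forced by (6)
    uses = set()                                # triples (s, b, d) with Z^s_bd = 1, as forced by (8)
    for (s, d), path in paths.items():
        for i, j in zip(path, path[1:]):
            link = frozenset((i, j))
            tree_links[s].add(link)
            for b in srlg_of_link[link]:
                uses.add((s, b, d))
    # A link shared by several destinations of the same source is paid once per source tree.
    objective = sum(cost[link] for s in sources for link in tree_links[s])
    all_srlgs = set().union(*srlg_of_link.values())
    ok_9 = all(sum((s, b, d) in uses for s in sources) <= 1
               for d in destinations for b in all_srlgs)
    return objective, ok_9


# Example: two sources reaching one destination through SRLG-disjoint paths.
example_cost = {frozenset(("s1", "a")): 1, frozenset(("a", "d")): 1,
                frozenset(("s2", "c")): 1, frozenset(("c", "d")): 1}
example_srlgs = {frozenset(("s1", "a")): {"b1"}, frozenset(("a", "d")): {"b2"},
                 frozenset(("s2", "c")): {"b3"}, frozenset(("c", "d")): {"b4"}}
example_paths = {("s1", "d"): ["s1", "a", "d"], ("s2", "d"): ["s2", "c", "d"]}
```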


Shared Path-Protection Problem 


Sahasrabuddhe et al. [17] proposed fault management
in IP over WDM networks using two techniques:
protection at the WDM layer and restoration at the IP
layer. “Protection” refers to preprovisioned failure re- 
covery (i.e., set up a backup lightpath for every pri- 
mary lightpath), whereas “restoration” refers to more 
dynamic recovery (i.e., overprovision the network so 
that after a fiber failure, the network should still be able 
to carry all the traffic it was carrying before the fiber 
failure). Their protection scheme focuses on
shared path protection against single fiber span failures, 
where multiple independent primary paths share the 
backup path capacity to minimize the total capacity re- 
quired in the network. 

Given E as a set of unidirectional fiber links in the
network, F as a set of bidirectional fibers in the network,
$R_{ij}$ as a set of alternate routes for node pair ij, and W
as the maximum number of wavelengths on a link, the
ILP of the shared path-protection routing problem can
be formulated as follows. Define the following decision
variables. $w_k$ is the number of wavelengths used by primary lightpaths on link k, $s_k$ is the number of spare
wavelengths used on link k, and $V_{ij}$ is the number of
primary lightpaths between node pair ij. $m^{w}_{k} = 1$ if one
or more backup lightpaths are using wavelength w on
link k; 0 otherwise. $\gamma^{w}_{ij,r} = 1$ if the rth route between
node pair ij utilizes wavelength w before any fiber failures; 0 otherwise. $\delta^{b,w}_{ij,p} = 1$ if a primary on route p between node pair ij is protected by route b between the
node pair by employing wavelength w; 0 otherwise. The
ILP of the shared path-protection routing problem is
rather complicated as it considers end-to-end lightpath
assignment on the physical links, physical diversity of
the primary and backup paths, and sharing backup path
capacity for failure-independent primary paths. Therefore, only the key part of the ILP is given here (please refer
to [17] for the complete set of equations):


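$$\min \sum_{k=1}^{E} (w_k + s_k) \tag{12}$$

subject to

$$\sum_{ij} \sum_{p \in R_{ij}:\, f \in p} \sum_{b \in R_{ij}:\, k \in b} \delta^{b,w}_{ij,p} \le 1, \quad 1 \le f \le F,\ 1 \le k \le E,\ 1 \le w \le W, \tag{13}$$

$$w_k + s_k \le W, \quad 1 \le k \le E, \tag{14}$$

$$\sum_{r \in R_{ij}} \sum_{w=1}^{W} \gamma^{w}_{ij,r} = V_{ij}, \quad \forall ij, \tag{15}$$

$$w_k = \sum_{ij} \sum_{r \in R_{ij}:\, k \in r} \sum_{w=1}^{W} \gamma^{w}_{ij,r}, \quad 1 \le k \le E, \tag{16}$$

$$s_k = \sum_{w=1}^{W} m^{w}_{k}, \quad 1 \le k \le E, \tag{17}$$

$$\sum_{ij} \sum_{r \in R_{ij}:\, k \in r} \gamma^{w}_{ij,r} + m^{w}_{k} \le 1, \quad 1 \le k \le E,\ 1 \le w \le W, \tag{18}$$

$$\sum_{w=1}^{W} \gamma^{w}_{ij,p} = \sum_{b \in R_{ij}:\, b \ne p} \sum_{w=1}^{W} \delta^{b,w}_{ij,p}, \quad \forall ij,\ \forall p \in R_{ij}. \tag{19}$$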


The objective of the ILP, given in (12), is to minimize
the total capacity used. The crux of the formulation lies in
the set of constraints in (13), which ensure that two
backup lightpaths share wavelength w on link k only if 
the corresponding primary paths are fiber-disjoint. The 
above ILP formulation includes constraints for setting 
up lightpaths with shared path protection, where the 
number of channels on each link is bounded by (14), 
the number of primary lightpaths between a node pair 
is defined by (15), the number of primary lightpaths 
traversing a link is defined by (16), the spare capacity 
of each link is defined by (17), the usage of primary or 
backup lightpaths on a wavelength is defined by (18), 
and every primary lightpath is ensured to be protected 
by a backup lightpath by (19). In addition, multicommodity flow constraints (omitted here) are added to ensure
that the amount of traffic sourced from each source node to each destination
node is covered by the wavelength capacity.


Note that additional constraints on the number of re- 
ceivers and transmitters used can be incorporated in the 
model [17]. 
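Constraint (13) can be read as a counting condition: for each fiber f, link k and wavelength w, at most one backup lightpath on wavelength w over link k may protect a primary that traverses f. A minimal Python sketch of that check follows; representing routes as sets of fiber and link names, and the function name `backup_sharing_conflicts`, are assumptions made only for illustration.

```python
from collections import Counter


def backup_sharing_conflicts(assignments):
    """List the (fiber, link, wavelength) triples at which constraint (13) fails.

    Each assignment is (primary_fibers, backup_links, w): the fibers crossed by
    a primary lightpath, the links of the backup that protects it, and the
    wavelength used by that backup. (13) allows at most one such backup per
    (fiber f, link k, wavelength w).
    """
    load = Counter()
    for primary_fibers, backup_links, w in assignments:
        for f in primary_fibers:
            for k in backup_links:
                load[(f, k, w)] += 1
    return sorted(key for key, count in load.items() if count > 1)


assignments = [
    ({"F1", "F2"}, {"L5", "L6"}, 1),
    ({"F1", "F3"}, {"L6", "L7"}, 1),  # shares fiber F1 with the first primary
    ({"F4"}, {"L6"}, 1),              # fiber-disjoint primary: sharing L6 is fine
]
print(backup_sharing_conflicts(assignments))  # [('F1', 'L6', 1)]
```

Two backups may thus share wavelength 1 on link L6 only when their primaries are fiber-disjoint, which is exactly the sharing rule the text attributes to (13).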


Path-Restoration Problem 


Path-restoration techniques have been frequently em- 
ployed to provide highly capacity efficient (close to) 
real-time restoration of a network failure. In contrast 
to the path-protection routing, the operation mode of 
the path restoration only uses the full bandwidth from 
the primary while finding an alternative path. The path- 
restoration routing problem can involve different net- 
work layers: physical (optical) and IP. The optical path 
restoration directly replaces the prefailure capacity at 
the transmission carrier signal level, which has no per- 
formance effects in the upper layers. On the other hand, 
the IP path restoration dynamically reroutes the sig- 
nals around failures using routing table updates or dy- 
namic call-routing. An interior gateway protocol (IGP) 
is widely used to dynamically find an alternative path 
and perform load sharing in traffic distribution. 


Optical Path Restoration Problem 


Iraschko and Grover [9] studied the path-restoration 
routing problem, where the task is to deploy a set of 
replacement signal paths between two end nodes of 
the failed span, capable of yielding the maximum total 
amount of replacement capacity, while respecting the 
finite number of spare links on each span. They assume 
that, given failure scenarios, a predefined set of distinct 
eligible routes are precomputed for end node pairs (i. e., 
primary paths). Then, the goal of the path-restoration 
routing problem is to maximize the total of all restora- 
tion flow assignments for those primary paths, using 
only the commodities selected by the failure scenario 
and only the surviving spans in the reserve network. 
It also requires that all flow assignments made over 
all routes, for all simultaneously restored node pairs, 
should not exceed the spare capacity of any span in the 
reserve network. The outcome of such optimized de- 
sign will require minimal capacity for the reserve net- 
work. 

The ILP of the path-restoration routing problem 
can be formulated as follows. Define the following decision variables and parameters. Let i represent a failure
scenario, such as a single span cut or a node loss. $D_i$
is the number of end node pairs that have lost one or
more units of demand owing to the failure i, and r is the
index of these node pairs, $r \in \{1, \dots, D_i\}$. $X^{i}_{r}$ is
the number of demand units lost by end node pair r under
failure i. Finally, the network is G(N, E, s), where N is
the set of nodes, E is the set of spans, and s is the vector
of spare capacities $s_j$ of the spans. The ILP formulation
is given by


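$$\max \sum_{r=1}^{D_i} \sum_{p=1}^{P^{i}_{r}} f^{i,p}_{r} \tag{20}$$

subject to

$$\sum_{p=1}^{P^{i}_{r}} f^{i,p}_{r} \le X^{i}_{r}, \quad \forall r \in \{1, \dots, D_i\}, \tag{21}$$

$$\sum_{r=1}^{D_i} \sum_{p=1}^{P^{i}_{r}} \delta^{i,p}_{r,j}\, f^{i,p}_{r} \le s_j, \quad \forall j \in E, \tag{22}$$

$$f^{i,p}_{r} \ge 0, \text{ integer}, \quad \forall (r, p), \tag{23}$$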


where $f^{i,p}_{r}$ is the whole-number flow assigned to the
pth route available for restoration of node pair r under failure scenario i, $P^{i}_{r}$ is the total number of eligible restoration routes available to node pair r for the
restoration of failure i, and $\delta^{i,p}_{r,j} = 1$ if span j is in the pth
eligible route for restoration of node pair r in the event
of failure scenario i; 0 otherwise.

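For a toy instance the ILP (20) to (23) can simply be solved by enumeration. The sketch below does that in Python, reading (21) as an upper bound on the flow restored for each pair; the instance data and the function name `max_restoration` are hypothetical, and anything beyond a handful of variables would call for a real ILP solver.

```python
from itertools import product


def max_restoration(lost, routes, spare):
    """Exhaustively solve (20)-(23) for a single failure scenario i.

    lost[r]   : X_r, demand units lost by affected node pair r
    routes[r] : eligible restoration routes of pair r, each a set of spans
                (span j in routes[r][p] plays the role of delta = 1)
    spare[j]  : s_j, spare capacity of surviving span j
    Returns (restored flow, flows) with flows[r][p] = f_r^p.
    """
    variables = [(r, p) for r in range(len(lost)) for p in range(len(routes[r]))]
    # By (21) and (23), each f_r^p is an integer between 0 and X_r.
    ranges = [range(lost[r] + 1) for r, _ in variables]
    best_total, best_flows = -1, None
    for values in product(*ranges):
        f = dict(zip(variables, values))
        # Constraint (21): pair r never gets back more than it lost.
        if any(sum(f[(r, p)] for p in range(len(routes[r]))) > lost[r]
               for r in range(len(lost))):
            continue
        # Constraint (22): flow crossing span j fits in its spare capacity.
        if any(sum(v for (r, p), v in f.items() if j in routes[r][p]) > cap
               for j, cap in spare.items()):
            continue
        total = sum(values)  # objective (20)
        if total > best_total:
            best_total = total
            best_flows = [[f[(r, p)] for p in range(len(routes[r]))]
                          for r in range(len(lost))]
    return best_total, best_flows


lost = [2, 2]                                 # two affected node pairs
routes = [[{"A", "B"}, {"C"}], [{"B", "D"}]]  # eligible routes per pair
spare = {"A": 2, "B": 2, "C": 1, "D": 2}
print(max_restoration(lost, routes, spare))   # (3, [[0, 1], [2]])
```

Here span B is shared by both pairs and span C has a single spare unit, so only 3 of the 4 lost units can be restored.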

IP Path-Restoration Problem 


In IP-layer communication, as opposed to the
abovementioned optical path restoration problem, the 
path-restoration problem is defined differently on the 
basis of the WDM protection model from Sect. “Shared 
Path-Protection Problem” (for the complete model, 
see [17]). The key ILP model for this IP path restora- 
tion problem is given by 


max ) > ee (24) 
sd ij 
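subject to

$$\sum_{j} \sum_{r \in R_{ij}} \gamma^{w}_{ij,r} \le \mathit{Trans}^{w}_{i}, \quad \forall i,\ 1 \le w \le W, \tag{25}$$

$$\sum_{i} \sum_{r \in R_{ij}} \gamma^{w}_{ij,r} \le \mathit{Rec}^{w}_{j}, \quad \forall j,\ 1 \le w \le W, \tag{26}$$

$$\sum_{ij} \sum_{r \in R_{ij}:\, k \in r} \gamma^{w}_{ij,r} \le 1, \quad 1 \le k \le E,\ 1 \le w \le W, \tag{27}$$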


where the objective function in (24) is to minimize
the average hop distance before a fault, the constraints
in (25) and (26) ensure that node i uses at most $\mathit{Trans}^{w}_{i}$
transmitters and node j uses at most $\mathit{Rec}^{w}_{j}$ receivers on
wavelength w, and the constraints in (27) ensure that
the wavelength w on link k is used either by a primary
lightpath or by backup lightpaths.


Concluding Remarks 


In this article, we reviewed how ILP formulations are 
used in WDM optical networking. In optical networks, 
preprovisioning the network to support fast restoration of failures is critical, which means setting up physically disjoint backup paths (links) for the primary paths
(links). Here, ILP formulations are valuable in finding
the globally optimal backup paths among all the possible alternative paths for traffic demands of interest.
A set of constraints in ILP can be set up to represent 
the restoration flow balance constraint, the link capac- 
ity flow constraint, and physical diversity of the primary 
and backup paths. 
One of the byproducts of World War II was the discovery that the routing of ship convoys, and other human activities like transportation, production, allocation, etc. could be modeled mathematically, i.e. the often intricate choices that they involve could be captured into a system of equations and inequalities and
could be optimized according to some agreed upon cri- 
terion. The simplest such model, involving only lin- 
ear functions, became known as linear programming. 
Parallel developments have led to the discovery of the 
computer, which made it practical to solve linear pro- 
grams of a realistic size. A few years later the theory was 
extended to systems involving nonlinear convex func- 
tions. Convexity was needed to ensure that any local 
optimum is a global optimum. 

An amazing variety of activities and situations can 
be adequately modeled as linear programs (LPs) or 
convex nonlinear programs (NLPs). Adequately means 
that the degree of approximation to reality that such 
a representation involves is acceptable. However, as the 
world that we inhabit is neither linear nor convex, the 
most common obstacle to the acceptability of these 
models is their inability to represent nonconvexities or 
discontinuities of different sorts. This is where integer 
programming comes into play: it is a universal tool for 
modeling nonconvexities of various kinds. 

To illustrate, imagine a factory that produces two 
items and whose capacity is determined by the four linear constraints represented in Fig. 1. These inequalities,
along with $x_1 \ge 0$, $x_2 \ge 0$ (only nonnegative amounts
can be produced), define the feasible set, shown as the 
shaded area. Since the latter is a convex polyhedron, 
if profit is a linear function of the amounts produced, 
then an optimal (i.e. profit-maximizing) production 
plan will correspond to one of the vertices of the poly- 
hedron. 

[Fig. 1: Feasible set without thresholds]
[Fig. 2: Feasible set with thresholds]

Imagine now that the following reasonable condition is imposed: for each item there is a threshold
quantity below which it is not worth producing it. The
threshold is b units for item 1 and d units for item 2.
Furthermore, at least one of the two items must be produced. As shown in Fig. 2, the feasible set now consists
of three separate pieces: the two line segments [b, c] and [d, e],
and the shaded area (polyhedron) whose vertices are f,
g, h, i. Depending on the slope of the objective function,
the optimum may now occur at any of the points b, c, d,
e, g, h, i. The problem has become qualitatively different. The ‘threshold’ conditions, of the form ‘either $x_1 =
0$ or else $x_1 \ge b$’ and ‘either $x_2 = 0$ or $x_2 \ge d$’, as well as
the condition ‘at most one of $x_1 = 0$ and $x_2 = 0$ can hold’,
cannot be modeled by linear or convex programming 
techniques. The same is true of a host of other ‘logical’ 
conditions: disjunctions (‘either this or that’; ‘at least 
one of several constraints must hold’, ‘at most one of 
several variables can be positive’), implications (‘if this 
action is taken, then that action must be taken’), prece- 
dence relations (‘this event must precede that’, ‘this ac- 
tion cannot start until some others are completed’), etc. 
Yet, it is quite obvious that conditions of this type are 
in no way exceptional; on the contrary, their presence 
is pervasive in many real world situations. 

A linear (nonlinear) programming problem whose 
variables are restricted to integer values is called a linear 


(nonlinear) integer programming problem, or simply 
an integer program (IP, linear unless otherwise stated). 
If only some of the variables are restricted to integer 
values, we have a mixed integer program (MIP). Such 
a problem can be stated as 


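$$\begin{aligned} \min\ & cx \\ \text{s.t.}\ & Ax \ge b \\ & x \ge 0 \\ & x_j \text{ integer}, \quad j \in N_1 \subseteq N, \end{aligned}$$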


where A is a given m × n matrix, c and b are given vectors of conformable dimensions, N := {1, ..., n} and x
is a variable n-vector. The ‘pure’ integer program (IP)
is the special case of MIP when $N_1 = N$. If, in addition,
all entries of A, b, c are integer, then the slack or surplus
variables can also be restricted to integers.

Integer programming as a field started in the mid- 
1950s. A number of excellent textbooks are available for 
its study [15,16,17,20]. 


Scope and Applicability 


Applications of integer programming abound in all 
spheres of decision making. Some typical real-world 
problem areas where integer programming is particularly useful as a modeling tool include facility (plant,
warehouse, hospital, fire station) location; scheduling 
(of personnel, production, other activities); routing 
(of trucks, tankers, aircraft); design of communication 
(road, pipeline, telephone, computer) networks; capi- 
tal budgeting; project selection; analysis of capital de- 
velopment alternatives. Various problems in science 
(physics: the Ising spin glass problem; genetics: the se- 
quencing of DNA segments) and medicine (optimiz- 
ing tumor radiation patterns) have been successfully 
modeled as integer programs. In engineering (electri- 
cal, chemical, civil and mechanical) the sphere of appli- 
cations is growing steadily. 

By far the most important special case of integer 
programming is the (pure or mixed) 0-1 program- 
ming problem, in which the integer-constrained vari- 
ables are restricted to 0 or 1. This is so because a host of 
frequently occurring nonconvexities, such as the ones 
listed above, can be formulated via 0-1 variables. If 
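the constraint set of the production planning problem
shown in Fig. 1 is

$$a_{i1}x_1 + a_{i2}x_2 \le b_i, \quad i = 1, \dots, 4, \qquad x_1 \ge 0,\ x_2 \ge 0,$$

then the conditions

$$\begin{aligned} & x_1 \le 0 \ \text{ or } \ x_1 \ge b, \\ & x_2 \le 0 \ \text{ or } \ x_2 \ge d, \\ & x_1 > 0 \ \text{ or } \ x_2 > 0, \end{aligned}$$

imposed in the variant shown in Fig. 2, can be formulated by introducing two 0-1 variables, $\delta_1$ and $\delta_2$, and
the constraints

$$\begin{aligned} & b\delta_1 \le x_1 \le c\delta_1, \\ & d\delta_2 \le x_2 \le e\delta_2, \\ & \delta_1 + \delta_2 \ge 1, \quad \delta_1, \delta_2 \in \{0, 1\}. \end{aligned}$$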


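The equivalence of the 0-1 formulation and the logical conditions can be checked numerically. The Python sketch below assumes illustrative values for b, c, d, e, with c and e taken as upper bounds on $x_1$ and $x_2$, ignores the four polyhedral constraints, and compares both descriptions on a grid of points; the function names are hypothetical.

```python
from itertools import product

# Hypothetical data: thresholds b and d, and upper bounds c and e on x1 and x2.
b, c = 2, 5
d, e = 1, 4


def logical_ok(x1, x2):
    """The threshold conditions stated directly as logic."""
    return ((x1 == 0 or b <= x1 <= c)
            and (x2 == 0 or d <= x2 <= e)
            and (x1 > 0 or x2 > 0))


def mip_ok(x1, x2):
    """Feasibility for the 0-1 formulation, trying all (delta1, delta2)."""
    return any(b * delta1 <= x1 <= c * delta1
               and d * delta2 <= x2 <= e * delta2
               and delta1 + delta2 >= 1
               for delta1, delta2 in product((0, 1), repeat=2))


grid = [k / 2 for k in range(13)]  # 0.0, 0.5, ..., 6.0
agree = all(logical_ok(x1, x2) == mip_ok(x1, x2) for x1 in grid for x2 in grid)
print(agree)  # True
```

The check relies on b > 0 and d > 0: with $\delta_1 = 0$ the bounds force $x_1 = 0$, and with $\delta_1 = 1$ they force $x_1$ into [b, c].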
Next we present a few well-known pure and mixed in- 
teger models. 

The fixed charge problem asks for the minimization,
subject to linear constraints, of a function of the form
$\sum_i c_i(x_i)$, with

$$c_i(x_i) = \begin{cases} f_i + c_i x_i & \text{if } x_i > 0, \\ 0 & \text{if } x_i = 0. \end{cases}$$

Whenever $x_i$ is bounded by $U_i$ and $f_i > 0$ for all i, such
a problem can be restated as a (linear) MIP by setting

$$c_i(x_i) = c_i x_i + f_i y_i, \qquad x_i \le U_i y_i, \qquad y_i \in \{0, 1\} \ \text{ for all } i.$$

Clearly, when $x_i > 0$ then $y_i$ is forced to 1, and when $x_i
= 0$ the minimization of the objective function drives $y_i$
to 0.

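A short numerical check of the fixed charge reformulation, for a single variable and with made-up values of c, f and U; the function names are hypothetical. For every 0 ≤ x ≤ U, choosing the cheaper feasible y reproduces the original discontinuous cost.

```python
def fixed_charge_cost(x, c, f):
    """Original nonlinear cost: f + c*x if x > 0, and 0 if x == 0."""
    return f + c * x if x > 0 else 0.0


def mip_cost(x, c, f, upper):
    """Cost of the MIP reformulation c*x + f*y, subject to x <= upper*y and
    y in {0, 1}, with y chosen as the minimization would choose it.
    Assumes 0 <= x <= upper, so that at least one y is feasible."""
    feasible_y = [y for y in (0, 1) if x <= upper * y]
    return min(c * x + f * y for y in feasible_y)


for x in (0.0, 0.5, 3.0, 10.0):
    print(x, fixed_charge_cost(x, c=2.0, f=5.0), mip_cost(x, c=2.0, f=5.0, upper=10.0))
```

For x > 0 only y = 1 is feasible, and for x = 0 the value y = 0 is cheaper because f > 0, which is the argument given in the text.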
The facility location problem consists of choosing
among m potential sites (and associated capacities) of
facilities to serve n clients at a minimum total cost:

$$\begin{aligned} \min\ & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} + \sum_{i=1}^{m} f_i y_i \\ \text{s.t.}\ & \sum_{i=1}^{m} x_{ij} = d_j, && j = 1, \dots, n, \\ & \sum_{j=1}^{n} x_{ij} \le a_i y_i, && i = 1, \dots, m, \\ & x_{ij} \ge 0, && i = 1, \dots, m,\ j = 1, \dots, n, \\ & y_i \in \{0, 1\}, && i = 1, \dots, m. \end{aligned}$$

Here $d_j$ is the demand of client j, $a_i$ is the capacity of
a potential facility to be located at site i, $c_{ij}$ is the per-unit cost of shipments from facility i to client j, and $f_i$
is the fixed cost of opening a facility of capacity $a_i$ at location i. In any feasible solution, the indices i such that
$y_i = 1$ designate the chosen locations for the facilities to
be opened.

Variants of this problem include the uncapacitated 
facility location problem (where $d_j = 1$ for all j and the
constraints involving the capacities can be replaced by
$x_{ij} \le y_i$, i = 1, ..., m, j = 1, ..., n), the warehouse
location problem (which considers cheap bulk ship- 
ments from plants to warehouses and expensive pack- 
aged shipments to retailers), and various emergency fa- 
cility location problems (where one chooses locations to 
minimize the maximum distance traveled by any user 
of a facility, rather than the sum of travel costs). 
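When capacities are dropped, an optimal solution serves each client from its cheapest open facility, so a tiny uncapacitated instance can be solved by enumerating the sets of opened sites. The Python sketch below does this for invented data; the function name is hypothetical, and the approach only illustrates the model, since the number of sets grows as $2^m$.

```python
from itertools import combinations


def ufl_brute_force(open_cost, assign_cost):
    """Exact solution of a tiny uncapacitated facility location instance.

    open_cost[i]      : fixed cost f_i of opening a facility at site i
    assign_cost[i][j] : cost c_ij of serving client j (with d_j = 1) from site i
    Without capacities, each client is served by its cheapest open facility,
    so it suffices to enumerate the nonempty sets of open sites.
    """
    m, n = len(open_cost), len(assign_cost[0])
    best = (float("inf"), None)
    for size in range(1, m + 1):
        for opened in combinations(range(m), size):
            cost = sum(open_cost[i] for i in opened)
            cost += sum(min(assign_cost[i][j] for i in opened) for j in range(n))
            best = min(best, (cost, opened))
    return best


open_cost = [4, 3, 5]
assign_cost = [[1, 2, 6],
               [6, 1, 2],
               [3, 3, 1]]
print(ufl_brute_force(open_cost, assign_cost))  # (11, (0, 1))
```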

The knapsack problem is an integer program with 
a single constraint: 


$$\max \{cx : ax \le b,\ x \ge 0 \text{ integer}\},$$


where c and a are positive n-vectors, while b is a positive 
scalar. When the variables are restricted to 0 or 1, we 
have the 0-1 knapsack problem. 
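The 0-1 knapsack problem with integer weights is classically solved by dynamic programming over the capacity, in pseudo-polynomial time. This technique is not discussed in the text; the Python sketch below is a minimal version with hypothetical data.

```python
def knapsack_01(values, weights, capacity):
    """Optimal value of max{cx : ax <= b, x in {0,1}^n} for nonnegative integer weights.

    best[w] holds the largest value reachable with total weight at most w
    using the items processed so far.
    """
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Scan capacities downward so that each item is packed at most once.
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]


print(knapsack_01([10, 13, 7, 8], [3, 4, 2, 3], 7))  # 23 (first two items)
```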

A variety of situations can be fruitfully modeled as
set covering problems: Given a set M and a family of
weighted subsets $S_1, \dots, S_n$ of M, find a minimum-weight collection C of subsets whose union is M. If A
is a 0-1 matrix whose rows correspond to the elements
of M and whose columns are the incidence vectors of
the subsets $S_1, \dots, S_n$, and c is the n-vector of subset
weights, the problem can be stated as

$$\begin{aligned} \min\ & cx \\ & Ax \ge \mathbf{1} \\ & x \in \{0, 1\}^n, \end{aligned}$$

where the right-hand side of the inequality is the m-vector of 1s. This model and its close relative, the set
partitioning problem (in which ≥ is replaced by =), have
been (and are being) widely used in airline, bus, and train
crew scheduling (each row represents a leg of a trip that 
has to be covered; each column stands for a potential 
duty period of a crew). Another application is in med- 
ical diagnostics (each column represents a diagnostic 
test, each row stands for a pair of diseases, with a 1 in 
column j if the pair’s reactions to the tests are differ- 
ent, and a 0 if they are the same; the goal being to select 




Integer Programming 


a minimum-cost battery of tests guaranteed not to yield 
identical outcomes for any two diseases). 
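For very small instances the set covering model can be solved by enumerating all 0-1 vectors x. The following Python sketch, with an invented ground set and subsets and a hypothetical function name, makes the roles of A, c and the all-ones right-hand side concrete.

```python
from itertools import product


def min_weight_set_cover(A, c):
    """Exhaustive solution of min{cx : Ax >= 1, x in {0,1}^n}.

    A has one row per element of M and one 0/1 column per subset S_j;
    c holds the subset weights. Only sensible for very small n.
    """
    best_cost, best_x = float("inf"), None
    for x in product((0, 1), repeat=len(c)):
        covers = all(sum(a * xj for a, xj in zip(row, x)) >= 1 for row in A)
        cost = sum(cj * xj for cj, xj in zip(c, x))
        if covers and cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x


# M = {1, 2, 3, 4}; S1 = {1, 2}, S2 = {2, 3}, S3 = {3, 4}, S4 = {1, 4}, S5 = M.
A = [[1, 0, 0, 1, 1],
     [1, 1, 0, 0, 1],
     [0, 1, 1, 0, 1],
     [0, 0, 1, 1, 1]]
c = [1, 1, 1, 1, 3]
print(min_weight_set_cover(A, c))  # (2, (0, 1, 0, 1, 0)): S2 and S4
```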


Combinatorial Optimization 


A host of interesting combinatorial problems can be 
formulated as 0-1 programming problems defined 
on graphs, undirected or directed, vertex-weighted or 
edge-weighted. The joint study of these problems by 
mathematical programmers and computer scientists, 
starting from around 1960, has led to the develop- 
ment of the burgeoning field called combinatorial op- 
timization [7]. Some typical problems of this field 
are: edge matching (finding a maximum-weight col- 
lection of pairwise non-adjacent edges) and edge cov- 
ering (finding a minimum-weight collection of edges 
that together cover every vertex); vertex packing (find- 
ing a maximum-weight independent set, i.e. collection 
of pairwise non-adjacent vertices) and vertex covering 
(finding a minimum-weight collection of vertices that 
together cover every edge); maximum clique (finding 
a maximum cardinality complete subgraph) and min- 
imum vertex coloring (partitioning the vertices into 
a minimum number of independent sets, i. e. coloring 
the vertices with a minimum number of colors such that 
all adjacent pairs differ in color); the traveling salesman 
problem (finding a cycle of minimum total edge-weight 
that meets every vertex). 

We will briefly discuss two of the above problems, 
which in a sense span the universe of combinatorial op- 
timization. At one end of the spectrum, the matching 
problem on a graph G = (V, E) can be stated as 


$$\begin{aligned} \max\ & x(E) \\ \text{s.t.}\ & x(\delta(v)) \le 1, && v \in V, \\ & x_e \ge 0,\ x_e \text{ integer}, && e \in E, \end{aligned}$$

where for $F \subseteq E$, $x(F) = \sum_{e \in F} x_e$, and $\delta(v)$ is the set
of edges incident with v. Its weighted version asks for
maximizing $\sum_{e \in E} w_e x_e$, where $w_e$ is the weight of edge
e. This problem has the nice property that the integrality condition can be omitted if the above nonnegativity
and degree constraints are supplemented with the inequalities

$$x(\gamma(S)) \le \frac{|S| - 1}{2} \quad \text{for all } S \subseteq V,\ |S| \text{ odd},$$

where $\gamma(S)$ is the set of edges with both ends in S.


In other words, the ‘odd set inequalities’, along with 
the nonnegativity and degree constraints, fully describe 
the convex hull of incidence vectors of matchings. The 
discovery of this remarkable phenomenon in the mid- 
1960s ([8]) has started a massive pursuit of facets of the 
convex hull of other combinatorial polyhedra, and can 
be viewed as the inaugural step in the development of 
the field called polyhedral combinatorics. Close relatives 
of the matching problem are the perfect matching prob- 
lem (in which a maximum or minimum-weight match- 
ing is sought, when it exists, that leaves no vertex un- 
matched), the 2-matching problem (in which the degree 
constraints have right-hand side 2) and more generally, 
the b-matching, or degree-constrained subgraph problem (in which the degree constraint for vertex v has the
positive integer $b_v$ as right-hand side). In each of these
cases, a complete description of the convex hull of fea- 
sible integer points is available in the form of a class 
of inequalities similar to the above ones, which makes 
these problems polynomially solvable. 
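The triangle is the smallest graph for which the degree constraints alone do not suffice. In the Python sketch below, a hypothetical check written for illustration, $x_e = 1/2$ on every edge satisfies the degree and nonnegativity constraints with value 3/2, while the odd set inequality for S = V cuts it off; exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import combinations

V = [1, 2, 3]
E = [(1, 2), (1, 3), (2, 3)]  # the triangle K3


def satisfies_degree(x):
    """Nonnegativity and degree constraints x(delta(v)) <= 1 for all v."""
    return (all(x[e] >= 0 for e in E)
            and all(sum(x[e] for e in E if v in e) <= 1 for v in V))


def violated_odd_sets(x):
    """Odd sets S (|S| >= 3) whose inequality x(gamma(S)) <= (|S| - 1) / 2 fails.

    Sets with |S| = 1 only give the trivial inequality 0 <= 0.
    """
    bad = []
    for size in range(3, len(V) + 1, 2):
        for S in combinations(V, size):
            inside = sum(x[e] for e in E if e[0] in S and e[1] in S)
            if inside > Fraction(size - 1, 2):
                bad.append(S)
    return bad


half = {e: Fraction(1, 2) for e in E}
print(satisfies_degree(half), sum(half.values()))  # True 3/2
print(violated_odd_sets(half))                     # [(1, 2, 3)]
```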

At the other end of the spectrum, one of the hard- 
est and most thoroughly investigated combinatorial op- 
timization problems is the traveling salesman problem 
(TSP) already mentioned, in which a salesman is look- 
ing for a cheapest tour of n cities, given the cost of travel 
between all pairs of cities. This is the prototype model 
for situations dealing with the optimal sequencing of 
objects (e. g., items to be processed on a machine in the 
presence of sequence-dependent setup costs). The standard formulation on a complete directed graph with
node set N and arc costs $c_{ij}$ is

$$\begin{aligned} \min\ & \sum_{i \in N} \sum_{j \in N \setminus \{i\}} c_{ij} x_{ij} \\ \text{s.t.}\ & \sum_{j \in N \setminus \{i\}} x_{ij} = 1, && i \in N, \\ & \sum_{i \in N \setminus \{j\}} x_{ij} = 1, && j \in N, \\ & \sum_{i \in S} \sum_{j \in S \setminus \{i\}} x_{ij} \le |S| - 1, && S \subset N,\ 2 \le |S| \le n - 1, \\ & x_{ij} \in \{0, 1\}, && i, j \in N,\ i \ne j. \end{aligned}$$

The first two sets of equations define an assignment
problem whose solutions are spanning unions of directed cycles. The third set, consisting of inequalities
called subtour elimination constraints, excludes all cycles with fewer than n = |N| arcs. The number of solutions (tours) is factorial in the number of nodes.
Problems with random costs can be solved even for 
thousands of nodes, but some cost structures that oc- 
cur in practice tend to give rise to very hard problems 
even at small sizes: a class of machine scheduling problems at chemical plants turns out to be hard to solve as
TSPs on 30-50 node directed graphs. The TSP has become a test bed for the development of, and experimentation with, various approaches to combinatorial optimization. A summary of results until the mid-1980s can
be found in [12].

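The following Python sketch, for a made-up 4-node cost matrix, solves the directed TSP by enumerating tours and shows how an assignment solution decomposes into cycles; two or more cycles mean that a subtour elimination constraint is violated. The function names are hypothetical, and enumeration is only viable for a handful of nodes.

```python
from itertools import permutations


def tsp_brute_force(cost):
    """Cheapest directed tour on nodes 0..n-1 by enumerating the (n-1)! tours."""
    n = len(cost)
    best = (float("inf"), None)
    for order in permutations(range(1, n)):
        tour = (0,) + order  # fix node 0 as the start to avoid rotations
        length = sum(cost[tour[k]][tour[(k + 1) % n]] for k in range(n))
        best = min(best, (length, tour))
    return best


def cycles_of_assignment(successor):
    """Split an assignment solution (arc i -> successor[i]) into directed cycles.

    More than one cycle means some subtour elimination constraint is violated:
    the node set S of any cycle then has |S| arcs with both ends in S.
    """
    unseen, cycles = set(successor), []
    while unseen:
        node = min(unseen)
        cycle = []
        while node in unseen:
            unseen.remove(node)
            cycle.append(node)
            node = successor[node]
        cycles.append(cycle)
    return cycles


cost = [[0, 1, 9, 4],
        [9, 0, 2, 9],
        [9, 9, 0, 3],
        [5, 9, 9, 0]]
print(tsp_brute_force(cost))                           # (11, (0, 1, 2, 3))
print(cycles_of_assignment({0: 1, 1: 0, 2: 3, 3: 2}))  # [[0, 1], [2, 3]]: two subtours
```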
A generalization of the TSP, in which the salesman 
does not have to visit all the cities, but gets a prize 
for every city that he does visit, is called the prize col- 
lecting traveling salesman problem (PCTSP). It asks for 
a cheapest tour of just enough cities to collect a re- 
quired amount of prize money. This is the model used 
for scheduling the daily ‘rounds’ of a steel rolling mill, 
an operation that combines the tasks of selecting the 
items for the next round with that of putting them in 
the proper sequence. On a directed graph with loops, it 
can be formulated as the problem of finding a directed 
cycle of minimum arc cost subject to an upper bound 
on the sum of loop penalties on nodes not included in 
the cycle: 


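$$\begin{aligned} \min\ & \sum_{i \in N} \sum_{j \in N \setminus \{i\}} c_{ij} x_{ij} \\ \text{s.t.}\ & \sum_{j \in N \setminus \{i\}} x_{ij} + y_i = 1, && i \in N, \\ & \sum_{i \in N \setminus \{j\}} x_{ij} + y_j = 1, && j \in N, \\ & \sum_{i \in S} \sum_{j \in S \setminus \{i\}} x_{ij} + \sum_{i \in S \setminus \{k\}} y_i - y_\ell \le |S| - 1, && S \subset N,\ 2 \le |S| \le n - 1,\ k \in S,\ \ell \in N \setminus S, \\ & \sum_{i \in N} w_i y_i \le U, \\ & x_{ij} \in \{0, 1\},\ y_i \in \{0, 1\}, && \forall i, j. \end{aligned}$$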


The first two sets of equations are satisfied by any span- 
ning union of cycles and loops, while the third set of 
constraints excludes multiple cycles. Finally, the last in- 
equality bounds at U the weighted sum of loop vari- 
ables. 
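The PCTSP model can likewise be checked by brute force on a toy instance. The Python sketch below assumes that a solution consists of one directed cycle on at least two nodes, with every other node receiving a loop, and enforces the bound $\sum_i w_i y_i \le U$ on the penalties of the loops; the data and the function name are hypothetical.

```python
from itertools import combinations, permutations


def pctsp_brute_force(cost, penalty, U):
    """Exhaustive solution of the prize-collecting TSP model.

    A solution is a directed cycle on a node subset T (|T| >= 2); every node
    outside T gets a loop (y_i = 1), and the loop penalties must satisfy
    sum(penalty[i] for i not in T) <= U. Returns (cycle cost, cycle).
    """
    n = len(cost)
    best = (float("inf"), None)
    for size in range(2, n + 1):
        for T in combinations(range(n), size):
            if sum(penalty[i] for i in range(n) if i not in T) > U:
                continue  # too many skipped nodes: violates sum w_i y_i <= U
            first, rest = T[0], T[1:]
            for order in permutations(rest):
                cycle = (first,) + order
                length = sum(cost[cycle[k]][cycle[(k + 1) % size]]
                             for k in range(size))
                best = min(best, (length, cycle))
    return best


cost = [[0, 1, 9, 4],
        [9, 0, 2, 9],
        [9, 9, 0, 3],
        [5, 9, 9, 0]]
penalty = [5, 1, 1, 5]
print(pctsp_brute_force(cost, penalty, U=2))  # (9, (0, 3)): nodes 1 and 2 become loops
print(pctsp_brute_force(cost, penalty, U=1))  # (11, (0, 1, 2, 3)): full tour needed
```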


Solution Methods 


Unlike linear programs, which are polynomially solv- 
able, integer programming problems, including 0-1 
programming and most combinatorial optimization 
problems, are notoriously difficult: in the language 
of computational complexity theory, they are NP-complete. No polynomial-time algorithm is known for them,
and none exists unless P = NP. However, sometimes an integer
program can be solved as a linear program, in the sense 
that solving the linear programming relaxation (L) of 
the integer program (i.e. the problem obtained by re- 
moving the integrality conditions), one obtains an inte- 
ger solution. In particular, this is the case when all the 
basic solutions of (L) are integer. The constraint set
Ax ≥ b, x ≥ 0 is known [10] to have only integer basic
solutions for every integer vector b for which it is feasible
if and only if the matrix A is totally unimodular (i.e. all
square submatrices of A have a determinant equal to 0,
1 or −1).

The best-known instances of total unimodular- 
ity are the vertex-edge incidence matrices of directed 
graphs, and of undirected bipartite graphs. As a con- 
sequence, shortest path and network flow problems on 
arbitrary directed graphs, as well as edge matching (or 
covering) and vertex packing (or covering) problems 
on undirected bipartite graphs, are in fact linear pro- 
grams, as are all those integer programs whose LP relax- 
ation has as its coefficient matrix the incidence matrix 
of a directed graph, or that of an undirected bipartite 
graph, with arbitrary integer right-hand sides. 
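Total unimodularity can be verified directly from the definition on small matrices. The brute-force Python sketch below, exponential in the matrix size and meant only as an illustration with hypothetical function names, confirms the property for a directed incidence matrix and a bipartite one and refutes it for the triangle.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def is_totally_unimodular(A):
    """Brute-force test: every square submatrix has determinant 0, 1 or -1."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Node-arc incidence matrix of a directed graph (arcs 1->2, 2->3, 1->3).
directed = [[1, 0, 1],
            [-1, 1, 0],
            [0, -1, -1]]
# Vertex-edge incidence matrix of an undirected triangle (not bipartite).
triangle = [[1, 0, 1],
            [1, 1, 0],
            [0, 1, 1]]
# Vertex-edge incidence matrix of the bipartite 4-cycle u1-v1-u2-v2-u1.
square = [[1, 1, 0, 0],
          [0, 0, 1, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1]]
print(is_totally_unimodular(directed))  # True
print(is_totally_unimodular(triangle))  # False: the full 3x3 determinant is 2
print(is_totally_unimodular(square))    # True
```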

Apart from this important but very special class of 
problems, the difficulty in solving integer programs, as 
already mentioned, lies in the nonconvexity of the fea- 
sible set, which makes it impossible to establish global 
optimality from local conditions. The two principal ap- 
proaches to solving integer programs try to circumvent 
this difficulty in two different ways. 

The first approach, which until the late 1980s was 
the standard way of solving integer programs, is enu- 
merative (branch and bound, implicit enumeration). It 
partitions the feasible set into successively smaller sub- 
sets tied together as nodes of a branch and bound tree, 
calculates bounds on the objective function value over 
each subset, and uses these bounds to discard certain 
subsets (nodes) from further consideration. The lower 
bounds (in a minimization problem) typically come 
from solving the linear programming relaxation corresponding to the given node, the upper bounds come
from integer solutions found at some of the nodes. The 
procedure ends when each subset has either produced 
a feasible solution, or was shown to contain no better 
solution than the one in hand. The efficiency of the pro- 
cedure depends crucially on the strength of the bounds. 
Two early prototypes of this approach are due to A.H. 
Land and A.G. Doig [11] and E. Balas [1]. 
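The enumerative approach can be illustrated on the 0-1 knapsack problem, where the linear programming relaxation of every node is solved by a greedy fill in order of value-to-weight ratio. The Python sketch below is a minimal depth-first branch and bound under that assumption (positive weights), with a hypothetical function name; it is not the specific procedure of [11] or [1].

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Depth-first branch and bound for max{cx : ax <= b, x in {0,1}^n}.

    Upper bounds: LP relaxation of each node (greedy fill by value/weight,
    with a fractional last item). Lower bound: best integer solution so far.
    Assumes positive weights.
    """
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)
    best = 0

    def lp_bound(k, value, room):
        # LP relaxation of the node in which items k, k+1, ... are still free.
        for c, a in items[k:]:
            if a <= room:
                room -= a
                value += c
            else:
                return value + c * room / a
        return value

    def branch(k, value, room):
        nonlocal best
        if k == n:
            best = max(best, value)  # integer solution: update the incumbent
            return
        if lp_bound(k, value, room) <= best:
            return  # prune: this node cannot improve on the incumbent
        c, a = items[k]
        if a <= room:
            branch(k + 1, value + c, room - a)  # branch x_k = 1
        branch(k + 1, value, room)              # branch x_k = 0

    branch(0, 0, capacity)
    return best


print(knapsack_branch_and_bound([10, 13, 7, 8], [3, 4, 2, 3], 7))  # 23
```

The strength of the LP bound decides how many nodes are pruned, which mirrors the remark that the efficiency of the procedure depends crucially on the bounds.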

The second approach, known as the cutting plane 
method, is a convexification procedure: it approximates 
the convex hull of the set of feasible integer points by 
a sequence of inequalities that cut off (hence the term 
‘cutting planes’) part of the linear programming poly- 
hedron, without removing any feasible integer point. 
The first finitely convergent procedure of this type, 
which uses modular arithmetic to derive valid cutting 
planes for pure integer programs, is due to R.E. Gomory 
[9]. V. Chvátal [6] has shown that the procedure can
be viewed as one of integer rounding, in which nonnegative
multiples of the inequalities Ax ≥ b are added up and, since x ≥ 0, the
coefficients and the right-hand side of the resulting inequality are rounded up
to the nearest integer (rounded down if the system is written in ≤ form). The resulting inequalities form
the elementary closure of Ax ≥ b, x ≥ 0. The procedure can then be applied to the elementary closure, and
so on. The number of times the procedure needs to be
iterated in order to obtain the convex hull of feasible integer points is called the Chvátal rank of the given polyhedron. No bound is known on the Chvátal rank of an
arbitrary integer polyhedron (convex hull of feasible integer points). By contrast, the matching polyhedron has
Chvátal rank one, since the odd set inequalities can be
obtained from the degree inequalities by integer rounding.

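The integer rounding view can be replayed on the triangle: adding the three degree inequalities with multiplier 1/2 gives $x_{12} + x_{13} + x_{23} \le 3/2$, and rounding the right-hand side down yields the odd set inequality $x_{12} + x_{13} + x_{23} \le 1$. The Python sketch below, written for a system in ≤ form with x ≥ 0, performs this computation; the function name is hypothetical.

```python
import math
from fractions import Fraction


def chvatal_gomory_cut(rows, rhs, multipliers):
    """Chvatal-Gomory cut for a system rows * x <= rhs with x >= 0 and x integer.

    The inequalities are combined with nonnegative multipliers u; since x >= 0,
    rounding the coefficients down keeps the inequality valid, and since the
    left-hand side is then integer, the right-hand side can be rounded down too.
    """
    n = len(rows[0])
    combined = [sum(u * row[j] for u, row in zip(multipliers, rows)) for j in range(n)]
    combined_rhs = sum(u * b for u, b in zip(multipliers, rhs))
    return [math.floor(a) for a in combined], math.floor(combined_rhs)


# Degree constraints of the triangle; variables ordered (x12, x13, x23).
rows = [[1, 1, 0],   # vertex 1: x12 + x13 <= 1
        [1, 0, 1],   # vertex 2: x12 + x23 <= 1
        [0, 1, 1]]   # vertex 3: x13 + x23 <= 1
half = Fraction(1, 2)
print(chvatal_gomory_cut(rows, [1, 1, 1], [half, half, half]))  # ([1, 1, 1], 1)
```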
The Gomory-Chvátal procedure has been extended
to mixed integer programming and has been enhanced 
by the use of subadditive functions and group theory. 

A different approach comes from disjunctive pro- 
gramming [2,3], or linear programming with logical 
conditions (conjunctions, disjunctions and implica- 
tions involving inequalities). In this approach, which 
uses the tools of convex analysis, like polarity and pro- 
jection, 0-1 programming (pure or mixed) is viewed as 
optimization over the (nonconvex) union of (convex) 
polyhedra, i.e. a set of the form $\bigcup_{i \in Q} P_i$, where $P_i =
\{x : A^i x \ge b^i\}$, $i \in Q$. There is a compact characterization of the convex hull $P_Q := \operatorname{conv} \bigcup_{i \in Q} P_i$ in a higher-dimensional space, whose projection onto the original
space yields all the valid cutting planes. Thus $\alpha x \ge \beta$
is a valid inequality for $\bigcup_{i \in Q} P_i$ if and only if $\alpha \ge u^i A^i$
and $\beta \le u^i b^i$ for some $u^i \ge 0$, $i \in Q$. A central result of
this approach is that an important class of disjunctive
programs, called facial, which includes pure and mixed
0-1 programs, is sequentially convexifiable. For a 0-1
program (pure or mixed) with n 0-1 variables and a linear programming relaxation $P_0$, this means that one can
impose the 0-1 condition on $x_1$ and generate the convex hull $P_1$ of $P_{00} \cup P_{01}$, where $P_{00} := \{x \in P_0 : x_1 =
0\}$, $P_{01} := \{x \in P_0 : x_1 = 1\}$; then impose the 0-1 condition on $x_2$ and generate the convex hull $P_2$ of $P_{10} \cup
P_{11}$, where $P_{10} := \{x \in P_1 : x_2 = 0\}$, $P_{11} := \{x \in P_1 : x_2 =
1\}$; etc., and at the end of n steps, the convex hull $P_n$
of $P_{n-1,0} \cup P_{n-1,1}$ turns out to be the convex hull of
$\{x \in P_0 : x_j \in \{0, 1\},\ j = 1, \dots, n\}$. This property does not
hold for arbitrary integer programs, and is thus a main
distinguishing feature of 0-1 programs. If one defines
the disjunctive rank of a polyhedron as the number of
times the above procedure has to be iterated in order to
generate all of its facets, it follows that an arbitrary 0-1
programming polyhedron has disjunctive rank at most n.
Although these results date back to 1974, it was 
not until the early 1990s that they were implemented 
into an efficient computational tool called lift-and- 
project ([4]). The name conveys the idea of a higher- 
dimensional representation of the convex hull (lift- 
ing), which is then projected back to generate cutting 
planes. In the meantime, L. Lovász and A. Schrijver
[13] (see also [18]) developed a closely related procedure which derives higher-dimensional representations
of a 0-1 programming polyhedron by multiplying the
constraint set of $P_0$ with the inequalities $x_j \ge 0$ and
$1 - x_j \ge 0$, $j \in N$, then linearizing the resulting quadratic
forms, and projecting them back into the original space.
As in the disjunctive programming approach, n iterations of this procedure yield the convex hull of the
0-1 programming polyhedron. However, the quadratic
forms obtained during the procedure can also be used 
to derive positive semidefiniteness constraints that are 
stronger than the inequalities obtained by linearization. 
Semidefiniteness constraints aside, a streamlined
version of the Lovász-Schrijver procedure, in which $P_0$
is multiplied at every iteration by just one pair of inequalities $x_j \ge 0$, $1 - x_j \ge 0$, rather than by all pairs,
was shown in [4] to be equivalent to the disjunctive
programming procedure for 0-1 polyhedra, in that the
linearized version of the quadratic constraints obtained 
by multiplication is exactly the same as the higher- 
dimensional representation of the convex hull used in 
disjunctive programming. The paper [4] also showed 
how to use a cut generating linear program (CGLP) 
to obtain lift-and-project (or disjunctive) cuts that are 
deepest in a well defined sense. Most importantly, these 
cuts can be generated in a subspace, i.e. using only 
a subset of the variables, and then lifted to the full space. 
It was this aspect which has led to a computational 
breakthrough. Earlier attempts to implement cutting 
plane procedures of whatever type for general integer or 
0-1 programs foundered on the phenomenon known as 
‘stalling’: in the process of generating a sequence of cut- 
ting planes and reoptimizing the linear program, after 
a while the new cuts tended to become shallower and 
the process tended to run into numerical difficulties. 
Now the possibility of lifting cuts generated in a sub- 
space has opened the door to combining the enumer- 
ative and convexifying approaches into a branch and 
cut procedure, which generates cutting planes as long 
as they ‘work’, but branches whenever the cut generat- 
ing ‘stalls’. This was made possible by the fact that cuts 
generated at a node of the search tree can be lifted to 
be valid at any node. The outcome was a robust proce- 
dure, considerably more efficient than either a branch 
and bound or a cutting plane algorithm by itself [5]. 

Besides these two basic approaches (enumerative 
and convexifying), two further procedures need to be 
mentioned that do not belong to either category, but 
can be combined with either of them. Both procedures 
essentially decompose the problem, one of them by par- 
titioning the variables, the other by partitioning the 
constraints. The first one, known as column generation, 
starts with a subset of the columns and generates the 
missing columns as needed, by pricing them out. It is an 
extension to integer programming of well-known linear 
programming decomposition techniques. The second 
one, known as Lagrangian relaxation, works with a sub- 
set of the constraints, while assigning Lagrange multi- 
pliers to the remaining constraints and taking them into 
the objective function. 

Each of the approaches outlined above aims at solv- 
ing the integer program exactly. However, due to the 
NP-completeness of the problem, approximation methods
and heuristics play an increasingly important role
in this field. Some highly efficient heuristics are known
for several special structures. As to the general problem, 
heuristics have so far been less uniformly successful. 


The State of the Art 


Until about 1985, most of the integer programming 
problems encountered in practice were too large to 
be solved in useful time by existing algorithms and 
codes. This situation has drastically changed during the 
decade of the 1990s. MIPLIB is a collection of inte- 
ger programming problems that various people have 
tried to solve over the last two decades or so, often 
unsuccessfully. It contains scores of instances varying 
in structure, size and computational toughness. A few 
years ago even those instances that could be solved 
took a very long time. Today a large majority of these 
problems have been solved and can be solved in use- 
ful time. A number of tools are available for this pur- 
pose. Among academic codes, we mention MIPO [5], 
MINTO [14] and ABACUS [19]. The best-known com- 
mercial codes are CPLEX, XPRESS and OSL. 
bases of toric ideals and polytopes see [29], and for
Gröbner basis theory for general polynomial ideals see
[1] and [11]. Toric ideals and, more generally, ‘lattice
ideals’ [32] have been the subject of much research in
the past decade. The discussion in this article follows
a specific route through the work done in this area. Every
effort will be made to include references needed for
details and further reading.


Toric Ideals and Integer Programming 


We will be concerned with integer programs of the
form $IP_{A,c}(b) := \min\{c \cdot x : Ax = b,\ x \in \mathbb{N}^n\}$, where $A$
is a fixed $d \times n$ integer matrix of rank $d$. Here $\mathbb{N}$ denotes
the nonnegative integers. The right-hand side vector $b$
will be assumed to lie in the monoid $\mathrm{pos}_{\mathbb{Z}}(A) := \{Ax : x \in \mathbb{N}^n\}$,
which guarantees that $IP_{A,c}(b)$ is always feasible.
Let $\ker_{\mathbb{Z}}(A)$ denote the $(n-d)$-dimensional saturated
lattice $\{u \in \mathbb{Z}^n : Au = 0\}$. For simplicity we assume that
$\ker_{\mathbb{Z}}(A) \cap \mathbb{N}^n = \{0\}$, which implies that
$P_b := \mathrm{conv}\{x \in \mathbb{N}^n : Ax = b\}$ is a polytope for all $b \in \mathrm{pos}_{\mathbb{Z}}(A)$.
For $b \in \mathrm{pos}_{\mathbb{Z}}(A)$ and $v \in P_b \cap \mathbb{N}^n$, the set of lattice points in
$P_b$ is precisely the congruence class in $\mathbb{N}^n$ of $v$ modulo
$\ker_{\mathbb{Z}}(A)$.
The toric ideal of $A$ is the $d$-dimensional binomial
prime ideal

$$I_A := \langle x^{u^+} - x^{u^-} : u^+ - u^- \in \ker_{\mathbb{Z}}(A),\ u^+, u^- \in \mathbb{N}^n \rangle$$

in $k[x] := k[x_1, \dots, x_n]$, where $k$ is a field. The cost vector
$c$ lies in $\mathbb{R}^n$, and for each polynomial $f = \sum_{i=1}^{t} k_i x^{a_i} \in I_A$,
the initial term of $f$ with respect to $c$, denoted $\mathrm{in}_c(f)$,
is the sum of those terms in $f$ for which $c \cdot a_i$ is maximal.
The initial ideal of $I_A$ with respect to $c$ is then the
ideal $\mathrm{in}_c(I_A) := \langle \mathrm{in}_c(f) : f \in I_A \rangle \subseteq k[x]$. We will assume
unless stated otherwise that the cost vector $c$ is such that
$\mathrm{in}_c(I_A)$ is a monomial ideal, i.e., $\mathrm{in}_c(I_A)$ can be generated
by monomials. Such a $c$ is said to be generic with respect
to $IP_A$, the family of all integer programs $IP_{A,c}(b)$ as $b$
and $c$ vary. Equivalently, $c$ is generic with respect to $IP_A$
if and only if each integer program in the family
$IP_{A,c} := \{IP_{A,c}(b) : b \in \mathrm{pos}_{\mathbb{Z}}(A)\}$ has a unique optimal solution.
Note that each lattice point $\alpha \in \mathbb{N}^n$ is a solution
to a unique integer program in $IP_{A,c}$, since $\alpha$ lies in $P_{A\alpha}$
and in no other polytope of the form $P_b$. The following
lemma relates $\mathrm{in}_c(I_A)$ to $IP_{A,c}$.


Lemma 1 The lattice point $\alpha \in \mathbb{N}^n$ is a nonoptimal solution
to $IP_{A,c}(A\alpha)$ if and only if the monomial $x^\alpha$ lies in
the initial ideal $\mathrm{in}_c(I_A)$.


Proof The lattice point $\alpha \in \mathbb{N}^n$ is a nonoptimal solution
to $IP_{A,c}(A\alpha)$ if and only if there exists $\beta \in P_{A\alpha} \cap \mathbb{N}^n$
such that $c \cdot \alpha > c \cdot \beta$. This is equivalent to the
statement that $x^\alpha - x^\beta$ is a nonzero element of $I_A$ with
$\mathrm{in}_c(x^\alpha - x^\beta) = x^\alpha$.


The standard monomials of $\mathrm{in}_c(I_A)$ are precisely all the
monomials in $k[x]$ that do not lie in $\mathrm{in}_c(I_A)$.


Corollary 2 A monomial $x^\gamma \in k[x]$ is a standard monomial
of $\mathrm{in}_c(I_A)$ if and only if $\gamma$ is the unique optimal
solution to the integer program $IP_{A,c}(A\gamma)$.


By Corollary 2, there is a bijection between the standard
monomials of $\mathrm{in}_c(I_A)$ and the elements of the monoid
$\mathrm{pos}_{\mathbb{Z}}(A)$.
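Lemma 1 and Corollary 2 can be checked by brute force on a tiny instance. The Python sketch below assumes a hypothetical example that is not taken from the article, $A = [1\ 2\ 3]$ and $c = (5, 8, 10)$; for this $c$ every program $IP_{A,c}(b)$ has a unique optimum, so $c$ is generic. The helper names `fiber`, `optimum` and `is_standard` are illustrative only. The code enumerates the lattice points of $P_b$ and declares $x^\gamma$ standard exactly when $\gamma$ is the unique optimum of $IP_{A,c}(A\gamma)$.

```python
from itertools import product

# Hypothetical example, not from the article: A = [1 2 3], c = (5, 8, 10).
A = [[1, 2, 3]]
c = (5, 8, 10)


def fiber(A, b):
    """Lattice points of P_b, i.e. all x in N^n with Ax = b.

    Assumes A has nonnegative integer entries and no zero column, so that
    x_j <= b_i // A[i][j] for any row i with A[i][j] > 0.
    """
    d, n = len(A), len(A[0])
    bounds = [min(b[i] // A[i][j] for i in range(d) if A[i][j] > 0) for j in range(n)]
    return [x for x in product(*(range(u + 1) for u in bounds))
            if all(sum(A[i][j] * x[j] for j in range(n)) == b[i] for i in range(d))]


def cost(x):
    return sum(ci * xi for ci, xi in zip(c, x))


def image(A, x):
    """The right-hand side A x."""
    return tuple(sum(row[j] * x[j] for j in range(len(x))) for row in A)


def optimum(A, b):
    """Optimal solution of IP_{A,c}(b); unique because c is generic here."""
    return min(fiber(A, b), key=cost)


def is_standard(A, gamma):
    """Corollary 2: x^gamma is a standard monomial of in_c(I_A) iff gamma
    is the unique optimal solution of IP_{A,c}(A gamma)."""
    return optimum(A, image(A, gamma)) == tuple(gamma)


print(is_standard(A, (2, 0, 0)))  # False: x1^2 lies in in_c(I_A) (Lemma 1)
print(is_standard(A, (1, 0, 4)))  # True
# The bijection after Corollary 2: every fiber has exactly one standard monomial.
for b in range(15):
    assert sum(is_standard(A, x) for x in fiber(A, (b,))) == 1
```

Enumeration grows exponentially with $n$, so this is only a way to experiment with the definitions, not a solution method.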


The Conti-Traverso Algorithm 


In [9], P. Conti and C. Traverso gave an algorithm to
solve integer programs using Gröbner bases of toric
ideals. A Gröbner basis of the toric ideal $I_A$ with respect
to $c$ is any finite subset $\mathcal{H}$ of $I_A$ such that
$\mathrm{in}_c(I_A) = \langle \mathrm{in}_c(f) : f \in \mathcal{H} \rangle$. A Gröbner basis $\mathcal{H}$ is reduced if it has
the additional property that for each $f \in \mathcal{H}$, the coefficient
of $\mathrm{in}_c(f)$ is the identity in $k$ and $\mathrm{in}_c(f)$ does not
divide any term in another element $g$ of $\mathcal{H}$. Reduced
Gröbner bases are unique.

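Let $\mathcal{G}_c$ denote the reduced Gröbner basis of $I_A$ with
respect to $c$. Then $\mathcal{G}_c$ has the form $\{x^{\alpha_i} - x^{\beta_i} : i = 1, \dots, t\}$,
where $\alpha_i - \beta_i \in \ker_{\mathbb{Z}}(A)$, $\alpha_i, \beta_i \in \mathbb{N}^n$ and
$\mathrm{supp}(\alpha_i) \cap \mathrm{supp}(\beta_i) = \emptyset$ for all $i = 1, \dots, t$. For $p \in \mathbb{Z}^n$,
$\mathrm{supp}(p) := \{i \in [n] := \{1, \dots, n\} : p_i \neq 0\}$ denotes the support of $p$. If
$x^{\alpha_i} - x^{\beta_i} \in \mathcal{G}_c$, then we always assume that $c \cdot \alpha_i > c \cdot \beta_i$.


Lemma 3 If $\mathcal{G}_c = \{x^{\alpha_i} - x^{\beta_i} : i = 1, \dots, t\}$ is the reduced
Gröbner basis of $I_A$ with respect to $c$, then

i) $\{x^{\alpha_i} : i = 1, \dots, t\}$ is the minimal generating set of the
initial ideal $\mathrm{in}_c(I_A)$; and

ii) for each binomial $x^{\alpha_i} - x^{\beta_i} \in \mathcal{G}_c$, $\beta_i$ is the unique
optimal solution to the integer program $IP_{A,c}(A\alpha_i)$.


Proof Part i) follows from the definition of reduced
Gröbner bases. For each binomial $x^{\alpha_i} - x^{\beta_i} \in \mathcal{G}_c$ we
have $A\alpha_i = A\beta_i$, $\alpha_i, \beta_i \in \mathbb{N}^n$ and $c \cdot \alpha_i > c \cdot \beta_i$. If $\beta_i$
is a nonoptimal solution to $IP_{A,c}(A\alpha_i)$, then $x^{\beta_i}$ lies in
$\mathrm{in}_c(I_A)$ by Lemma 1, and hence some $x^{\alpha_j}$, $j \in \{1, \dots, t\}$,
will divide $x^{\beta_i}$, contradicting the definition of a reduced
Gröbner basis.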


The conditions in Lemma 3 are in fact also sufficient
for a finite subset of binomials in $I_A$ to be the reduced
Gröbner basis of $I_A$ with respect to $c$. Given $f \in k[x]$, the
normal form of $f$ with respect to $\mathcal{G}_c$ is the unique
remainder obtained upon dividing $f$ by $\mathcal{G}_c$. See [11] for
details on the division algorithm in $k[x]$. The structure
of $\mathcal{G}_c$ implies that the normal form of a monomial $x^v$
with respect to $\mathcal{G}_c$ is a monomial $x^{v'}$ such that both $v$
and $v'$ are solutions to $IP_{A,c}(Av)$. The Conti-Traverso
algorithm for $IP_{A,c}$ can be summarized as follows.


Input: The matrix $A$ and cost vector $c$.
Pre-processing:

1 | Find a generating set for the toric ideal $I_A$.
2 | Compute the reduced Gröbner basis $\mathcal{G}_c$
  | of $I_A$ with respect to the cost vector $c$.

To solve $IP_{A,c}(b)$:

3 | Find a solution $v$ to $IP_{A,c}(b)$.

4 | Compute the normal form $x^{v^*}$ of the monomial
  | $x^v$ with respect to the reduced Gröbner
  | basis $\mathcal{G}_c$.

Then $v^*$ is the optimal solution to $IP_{A,c}(b)$.


Conti-Traverso algorithm: How to solve programs in $IP_{A,c}$


Proof In order to prove the correctness of this algorithm,
it suffices to show that for each solution $v$ of
$IP_{A,c}(b)$, the normal form of $x^v$ modulo $\mathcal{G}_c$ is the monomial
$x^{v^*}$, where $v^*$ is the unique optimal solution to
$IP_{A,c}(b)$. Suppose $x^w$ is the normal form of the monomial
$x^v$. Then $w$ is also a solution to $IP_{A,c}(b)$, since the
exponent vectors of all monomials $x^{w'}$ obtained during
division of $x^v$ by $\mathcal{G}_c$ satisfy $b = Av = Aw'$, $w' \in \mathbb{N}^n$. If
$w \neq v^*$, then $x^w - x^{v^*} \in I_A$ and $\mathrm{in}_c(x^w - x^{v^*}) = x^w$, since
$c \cdot w > c \cdot v^*$. This implies that $x^w \in \mathrm{in}_c(I_A)$, and hence it can
be further reduced by $\mathcal{G}_c$, contradicting the definition of
the normal form.
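Steps 3 and 4 can be sketched in Python for a small instance. The sketch assumes the hypothetical data $A = [1\ 2\ 3]$, $c = (5, 8, 10)$ (not from the article) and hard-codes $\mathcal{G}_c = \{x_1^2 - x_2,\ x_1x_2 - x_3,\ x_2^2 - x_1x_3\}$, which we worked out by hand from Lemma 3: the minimal non-standard monomials are $x_1^2$, $x_1x_2$, $x_2^2$, and each $\beta_i$ is the optimum of $IP_{A,c}(A\alpha_i)$. Steps 1 and 2 are not implemented, and all function names are hypothetical. Dividing a monomial by $x^{\alpha_i} - x^{\beta_i}$ just replaces $x^{\alpha_i}$ by $x^{\beta_i}$, so the normal form is obtained by repeated rewriting.

```python
from itertools import product

# Hypothetical example: A = [1 2 3], c = (5, 8, 10).  G_c is hard-coded
# (computed by hand via Lemma 3); each binomial x^alpha - x^beta is stored
# as (alpha, beta) with c.alpha > c.beta.
A = (1, 2, 3)
c = (5, 8, 10)
G = [((2, 0, 0), (0, 1, 0)),
     ((1, 1, 0), (0, 0, 1)),
     ((0, 2, 0), (1, 0, 1))]


def normal_form(v, G):
    """Step 4: the normal form of the monomial x^v modulo G.

    Dividing x^v by x^alpha - x^beta replaces it with x^(v - alpha + beta)
    whenever x^alpha divides x^v.  Each step lowers c.v strictly, so the
    loop terminates.
    """
    v = list(v)
    changed = True
    while changed:
        changed = False
        for alpha, beta in G:
            if all(vi >= ai for vi, ai in zip(v, alpha)):
                v = [vi - ai + bi for vi, ai, bi in zip(v, alpha, beta)]
                changed = True
                break
    return tuple(v)


def initial_solution(b):
    """Step 3; trivial here because the first column of A equals 1."""
    return (b, 0, 0)


def brute_force(b):
    """Enumerate the fiber of b to double-check the result."""
    feasible = [x for x in product(range(b + 1), repeat=3)
                if sum(a * xi for a, xi in zip(A, x)) == b]
    return min(feasible, key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))


print(normal_form(initial_solution(11), G))  # (0, 1, 3)
assert all(normal_form(initial_solution(b), G) == brute_force(b) for b in range(20))
```

Because $\mathcal{G}_c$ is a test set, the order in which the binomials are tried does not change the result, only the number of rewriting steps.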


Computational Issues 


The Conti-Traverso algorithm above raises several
computational issues. In Step 1, we require a generating
set of the toric ideal $I_A$, which can be a computationally
challenging task as the size of $A$ increases. The original
Conti-Traverso algorithm starts with the ideal

$$J_A := \langle x_j t^{a_j^-} - t^{a_j^+} : j = 1, \dots, n,\ \ t_0 t_1 \cdots t_d - 1 \rangle$$

in the larger polynomial ring $k[t_0, \dots, t_d, x_1, \dots, x_n]$, where $a_j = a_j^+ -
a_j^-$ is the $j$th column of the matrix $A$. The toric ideal $I_A
= J_A \cap k[x]$, and hence the reduced Gröbner basis of $I_A$
with respect to $c$ can be obtained by elimination (see [11,
Chapt. 3]). Although conceptually simple, this method
has its limitations as the size of $A$ increases, since it
requires $d + 1$ extra variables over those present in $I_A$ and
the Buchberger algorithm for computing Gröbner bases
[8] is sensitive to the number of variables involved. Two
different algorithms for computing a generating set for
$I_A$ without introducing extra variables can be found in
[5] and [18] respectively.

Once the generating set of $I_A$ has been found, one
needs to compute the reduced Gröbner basis $\mathcal{G}_c$ of
$I_A$. This can be done by any computer algebra package
that does Gröbner basis computations, such as Macaulay2,
Maple, Reduce, Singular or CoCoA, to name a few. CoCoA
has a dedicated implementation for toric ideals [6].
As the size of the problem increases, a straightforward
computation of reduced Gröbner bases of $I_A$ can
become expensive and even impossible. Several tricks can
be applied to help the computation, many of which are
problem specific.

In Step 3 of the Conti-Traverso algorithm above,
one requires an initial solution to $IP_{A,c}(b)$. The original
Conti-Traverso algorithm achieves this indirectly during
the elimination procedure. Theoretically this task
can be as hard as solving $IP_{A,c}(b)$ itself, although in practice
this depends on the specific problem at hand. The last
step, computing the normal form of a monomial with
respect to the current reduced Gröbner basis, is
(relatively speaking) a computationally easy task.

In practice, one is often only interested in solving
$IP_{A,c}(b)$ for a fixed $b$. In this situation, the Buchberger
algorithm can be truncated to produce a sufficient set of
binomials that will solve this integer program [35]. This
idea was originally introduced in [36] in the context of
0-1 integer programs in which all the data are
nonnegative. See also [10]. A ‘nontoric algorithm’ for solving
integer programs with fixed right-hand sides was recently
proposed in [4].


Test Sets in Integer Programming 


A geometric interpretation of the Conti-Traverso algorithm
above and, more generally, of the Buchberger
algorithm for toric ideals can be found in [34]. A test set
for $IP_{A,c}$ is a finite subset of vectors in $\ker_{\mathbb{Z}}(A)$ such that
for each integer program $IP_{A,c}(b)$ and each nonoptimal
solution $v$ to this program, there is some $u$ in the test set
such that $v - u \in \mathbb{N}^n$ and $c \cdot v > c \cdot (v - u)$. By interpreting a binomial
$x^{\alpha_i} - x^{\beta_i} \in \mathcal{G}_c$ as the vector $\alpha_i - \beta_i \in \ker_{\mathbb{Z}}(A)$, it can be
seen that $\mathcal{G}_c$ is the unique minimal test set for the family
$IP_{A,c}$. A closely related test set for integer programming
is the set of neighbors of the origin introduced by H.E.
Scarf [26].

The binomial $x^{\alpha_i} - x^{\beta_i} \in \mathcal{G}_c$ can also be viewed as
the line segment $[\alpha_i, \beta_i]$ directed from $\alpha_i$ to
$\beta_i$. For each $b \in \mathrm{pos}_{\mathbb{Z}}(A)$ we now construct a directed
graph $F_{b,c}$ as follows: the vertices of this graph are the
solutions to $IP_{A,c}(b)$ and the edges of this graph are all
possible translates of directed line segments from $\mathcal{G}_c$ that connect
two vertices of this graph. Then $\mathcal{G}_c$ is a necessary and
sufficient set of directed line segments such that $F_{b,c}$
is a connected graph with a unique sink (at the optimal
solution) for each $b \in \mathrm{pos}_{\mathbb{Z}}(A)$. This geometric
interpretation of $\mathcal{G}_c$ can be used to solve several problems. By
reversing the directions on all edges in $F_{b,c}$, one obtains
a directed graph with a unique root. One can enumerate
all lattice points in $P_b$ by searching this graph starting
at its root. This idea was used in [33] to solve a class
of manufacturing problems. The graphs $F_{b,c}$ provide
a way to connect all the feasible solutions to an integer
program by lattice moves. This idea was applied to
statistical sampling in [13].
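The graph $F_{b,c}$ is easy to build once $\mathcal{G}_c$ is known. The following Python sketch assumes the hypothetical instance $A = [1\ 2\ 3]$, $c = (5, 8, 10)$, with $\mathcal{G}_c = \{x_1^2 - x_2,\ x_1x_2 - x_3,\ x_2^2 - x_1x_3\}$ worked out by hand from Lemma 3 (none of this data comes from the article). It adds an edge $v \to v - \alpha_i + \beta_i$ whenever that translate stays in $\mathbb{N}^n$, checks that there is a single sink, and enumerates $P_b \cap \mathbb{N}^n$ by searching the reversed graph from that sink. Function names are illustrative.

```python
from itertools import product

# Hypothetical example: A = [1 2 3], c = (5, 8, 10), with the reduced
# Groebner basis G_c worked out by hand from Lemma 3.  Each binomial
# x^alpha - x^beta is stored as the directed segment (alpha, beta).
A = (1, 2, 3)
G = [((2, 0, 0), (0, 1, 0)),
     ((1, 1, 0), (0, 0, 1)),
     ((0, 2, 0), (1, 0, 1))]


def fiber(b):
    """Lattice points of P_b = conv{x in N^3 : x1 + 2 x2 + 3 x3 = b}."""
    return [x for x in product(range(b + 1), repeat=3)
            if sum(a * xi for a, xi in zip(A, x)) == b]


def fiber_graph(b):
    """The directed graph F_{b,c}: an edge v -> v - alpha + beta for every
    segment (alpha, beta) in G whose translate stays inside N^3."""
    edges = {}
    for v in fiber(b):
        edges[v] = []
        for alpha, beta in G:
            w = tuple(vi - ai + bi for vi, ai, bi in zip(v, alpha, beta))
            if min(w) >= 0:          # A w = A v because A alpha = A beta
                edges[v].append(w)
    return edges


def sinks(edges):
    return [v for v, out in edges.items() if not out]


def enumerate_from_root(edges, root):
    """Reverse every edge and run a depth-first search from the root
    (the optimal solution); this reaches every lattice point of P_b."""
    reverse = {v: [] for v in edges}
    for v, out in edges.items():
        for w in out:
            reverse[w].append(v)
    seen, stack = {root}, [root]
    while stack:
        for u in reverse[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


F = fiber_graph(10)
print(sinks(F))                                     # [(1, 0, 3)]
print(enumerate_from_root(F, (1, 0, 3)) == set(F))  # True
```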


Universal Gröbner Bases


A subset $\mathcal{U}_A$ of $I_A$ is a universal Gröbner basis for $I_A$
if $\mathcal{U}_A$ contains a Gröbner basis of $I_A$ with respect to
all (generic) cost vectors $c \in \mathbb{R}^n$. The Graver basis of
$A$ [16] is a finite universal Gröbner basis of $I_A$ that
can be described as follows. For each $\sigma \in \{+, -\}^n$, let
$\mathcal{H}_\sigma$ be the unique minimal generating set (over $\mathbb{N}$) of
the semigroup $\ker_{\mathbb{Z}}(A) \cap \mathbb{R}^n_\sigma$, where $\mathbb{R}^n_\sigma$ is the orthant of $\mathbb{R}^n$
with sign pattern $\sigma$. Then the Graver basis is
$\mathrm{Gr}_A := \bigcup_\sigma \mathcal{H}_\sigma \setminus \{0\}$, a vector $u$ being identified with the
binomial $x^{u^+} - x^{u^-}$. An algorithm to compute $\mathrm{Gr}_A$ can
be found in [30]. It was shown in [34] that all reduced
Gröbner bases of $I_A$ are contained in $\mathrm{Gr}_A$, which implies
that there are only finitely many distinct reduced Gröbner
bases for $I_A$ as $c$ varies over generic cost vectors.
Let $\mathrm{UGB}_A$ denote the union of all the distinct reduced
Gröbner bases of $I_A$. Then $\mathrm{UGB}_A$ is a universal Gröbner
basis of $I_A$ that is contained in the Graver basis $\mathrm{Gr}_A$. The
following theorem from [30] characterizes the elements
of $\mathrm{UGB}_A$ and thus allows one to test whether a binomial
$x^{\alpha_i} - x^{\beta_i} \in \mathrm{Gr}_A$ belongs to $\mathrm{UGB}_A$. A second test can also
be found in [30]. A vector $u \in \mathbb{Z}^n$ is said to be primitive
if the g.c.d. of its components is one.


Theorem 4 For a primitive vector $u \in \ker_{\mathbb{Z}}(A)$, the
binomial $x^{u^+} - x^{u^-}$ belongs to $\mathrm{UGB}_A$ if and only if the
line segment $[u^+, u^-]$ is a primitive edge in the polytope
$P_{Au^+}$.
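For very small matrices the Graver basis can be found by brute force, using the standard equivalent description of $\mathrm{Gr}_A$ as the nonzero vectors of $\ker_{\mathbb{Z}}(A)$ that are minimal in the conformal order ($w \sqsubseteq u$ when $w$ lies in the same orthant as $u$ and $|w_j| \leq |u_j|$ for all $j$). The Python sketch below searches only the box $|u_j| \leq B$, so it returns all of $\mathrm{Gr}_A$ only if $B$ bounds every Graver element; for the hypothetical example $A = [1\ 2\ 3]$ we simply assume $B = 4$ is large enough. The function names are illustrative.

```python
from itertools import product


def kernel_box(A, B):
    """Nonzero u in Z^n with Au = 0 and |u_j| <= B (brute force)."""
    n = len(A[0])
    return [u for u in product(range(-B, B + 1), repeat=n)
            if any(u) and all(sum(r[j] * u[j] for j in range(n)) == 0 for r in A)]


def conformal_leq(w, u):
    """w is conformal to u: same orthant (w_j u_j >= 0) and |w_j| <= |u_j|."""
    return all(wj * uj >= 0 and abs(wj) <= abs(uj) for wj, uj in zip(w, u))


def graver_box(A, B):
    """Graver basis elements with all |u_j| <= B: the nonzero kernel vectors
    that are minimal with respect to the conformal order."""
    K = kernel_box(A, B)
    return {u for u in K if not any(w != u and conformal_leq(w, u) for w in K)}


Gr = graver_box([[1, 2, 3]], 4)
print(len(Gr))  # 10
# For c = (5, 8, 10) the reduced Groebner basis of I_A, worked out by hand
# via Lemma 3, is {x1^2 - x2, x1 x2 - x3, x2^2 - x1 x3}; its vectors lie in Gr.
print(all(u in Gr for u in [(2, -1, 0), (1, 1, -1), (-1, 2, -1)]))  # True
```

The quadratic pairwise comparison is fine for a few hundred kernel vectors but is no substitute for the algorithm of [30].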


The degree of a binomial $x^{\alpha_i} - x^{\beta_i} \in I_A$ is defined to
be $\sum_j \alpha_{ij} + \sum_j \beta_{ij}$. The degree of the universal Gröbner
basis $\mathrm{UGB}_A$ is then simply the maximum degree of any
binomial in $\mathrm{UGB}_A$. This number is an important
complexity measure for the family of integer programs that
have $A$ as coefficient matrix. The current best bound for
the degree of $\mathrm{UGB}_A$ is as follows. See [29, Chapt. 4] for
a full discussion.


Theorem 5 The degree of a binomial $x^{\alpha_i} - x^{\beta_i} \in
\mathrm{UGB}_A$ is at most $(n - d)(d + 1)D(A)$, where $D(A)$ is the
maximum absolute value of the determinant of a $d \times d$
submatrix of $A$.


It has been conjectured that this bound can be im- 
proved to (d + 1)D(A) and some partial results in this 
direction can be found in [17]. 
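The quantities in Theorem 5 are straightforward to compute for small matrices. The Python sketch below (function names hypothetical) evaluates all maximal minors exactly by rational Gaussian elimination, takes $D(A)$ as their largest absolute value and returns the bound $(n - d)(d + 1)D(A)$. The two example matrices are illustrative and not taken from the article.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant of a square integer matrix by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for k in range(n):
        pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            result = -result
        result *= M[k][k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
    return result


def maximal_minors(A):
    """All d x d minors of the d x n matrix A, one per column subset."""
    d, n = len(A), len(A[0])
    return [int(det([[A[i][j] for j in cols] for i in range(d)]))
            for cols in combinations(range(n), d)]


def degree_bound(A):
    """Theorem 5: every binomial in UGB_A has degree <= (n - d)(d + 1)D(A)."""
    d, n = len(A), len(A[0])
    D = max(abs(m) for m in maximal_minors(A))
    return (n - d) * (d + 1) * D


print(maximal_minors([[1, 2, 3]]))                  # [1, 2, 3]
print(degree_bound([[1, 2, 3]]))                    # 12
print(degree_bound([[1, 1, 1, 1], [0, 1, 2, 3]]))   # 18
```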

The universal Gröbner bases of several special
instances of $A$ have been investigated in the literature,
a few of which we mention here. For the family of
$1 \times n$ matrices $A(n) := [1, \dots, n]$ it was shown in [12] that
the Graver basis of $A(n)$ is in bijection with the
primitive partition identities with largest part $n$. A matrix
$A \in \mathbb{Z}^{d \times n}$ is unimodular if the absolute values of the
determinants of all its nonsingular maximal minors are the
same positive constant. For $u \in \ker_{\mathbb{Z}}(A)$, the binomial
$x^{u^+} - x^{u^-} \in I_A$ is a circuit of $A$ if $u$ is primitive and
has minimal support with respect to inclusion. Let $\mathcal{C}_A$
denote the set of circuits of $A$. Then in general,
$\mathcal{C}_A \subseteq \mathrm{UGB}_A \subseteq \mathrm{Gr}_A$. If $A$ is unimodular, then all of the above
containments hold with equality, although the converse is
false: there are nonunimodular matrices for which
$\mathcal{C}_A = \mathrm{Gr}_A$. If $A_n$ is the node-edge incidence matrix of the
complete graph $K_n$, then the elements in $\mathrm{UGB}_{A_n}$ can be
identified with certain subgraphs of $K_n$. Gröbner bases
of these matrices were investigated in [23]. The integer
programs associated with $A_n$ are the b-matching
problems in the literature [24]. See [29, Chapt. 14] for some
other specific examples of Gröbner bases.
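The chain $\mathcal{C}_A \subseteq \mathrm{UGB}_A \subseteq \mathrm{Gr}_A$ and the unimodular case can be explored numerically. The Python sketch below is a brute-force illustration that assumes a box $|u_j| \leq B$ contains all circuits and Graver elements of the two (hypothetical) example matrices. It computes circuits as primitive kernel vectors of inclusion-minimal support and the Graver basis as conformally minimal kernel vectors. For the non-unimodular $A = [1\ 2\ 3]$ the containment $\mathcal{C}_A \subseteq \mathrm{Gr}_A$ is strict, while for the unimodular $A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$ the two sets coincide.

```python
from functools import reduce
from itertools import product
from math import gcd


def kernel_box(A, B):
    """Nonzero u in Z^n with Au = 0 and |u_j| <= B (brute force)."""
    n = len(A[0])
    return [u for u in product(range(-B, B + 1), repeat=n)
            if any(u) and all(sum(r[j] * u[j] for j in range(n)) == 0 for r in A)]


def support(u):
    return frozenset(j for j, x in enumerate(u) if x != 0)


def circuits_box(A, B):
    """Primitive kernel vectors whose support is minimal under inclusion."""
    K = kernel_box(A, B)
    supports = {support(u) for u in K}
    return {u for u in K
            if reduce(gcd, (abs(x) for x in u)) == 1
            and not any(s < support(u) for s in supports)}


def graver_box(A, B):
    """Conformally minimal nonzero kernel vectors (the Graver basis)."""
    K = kernel_box(A, B)
    return {u for u in K
            if not any(w != u and all(x * y >= 0 and abs(x) <= abs(y)
                                      for x, y in zip(w, u)) for w in K)}


A1 = [[1, 2, 3]]                 # maximal minors 1, 2, 3: not unimodular
print(len(circuits_box(A1, 4)), len(graver_box(A1, 4)))   # 6 10
A2 = [[1, 1, 0], [0, 1, 1]]      # maximal minors 1, 1, 1: unimodular
print(circuits_box(A2, 3) == graver_box(A2, 3))            # True
```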


Variation of Cost Functions 
in Integer Programming 


We now consider all cost vectors in $\mathbb{R}^n$ (not just the
generic ones) and study the effect of varying them. As
seen earlier, $I_A$ has only finitely many distinct reduced
Gröbner bases as $c$ is varied over the generic cost
vectors. We say that two cost vectors $c_1$ and $c_2$ are
equivalent with respect to $IP_A$ if for each $b \in \mathrm{pos}_{\mathbb{Z}}(A)$, the
integer programs $IP_{A,c_1}(b)$ and $IP_{A,c_2}(b)$ have the same
set of optimal solutions. The Gröbner basis approach
to integer programming allows a complete
characterization of the structure of these equivalence classes of
cost vectors.


Theorem 6 [30]

i) There exist only finitely many equivalence classes of
cost vectors with respect to $IP_A$.

ii) Each equivalence class is the relative interior of
a convex polyhedral cone in $\mathbb{R}^n$.

iii) The collection of all these cones defines a complete
polyhedral fan in $\mathbb{R}^n$ called the Gröbner fan of $A$.

iv) Let $db$ denote any probability measure with support
$\mathrm{pos}_{\mathbb{Z}}(A)$ such that $\int b \, db < \infty$.
Then the Minkowski integral $\mathrm{St}(A) = \int P_b \, db$ is an
$(n - d)$-dimensional convex polytope, called the state polytope
of $A$. The normal fan of $\mathrm{St}(A)$ equals the Gröbner fan
of $A$.


Gröbner fans and state polytopes of graded polynomial
ideals were introduced in [25] and [2] respectively. For
a toric ideal both these entities have self-contained
construction methods that are rooted in the combinatorics
of these ideals [30]. For a software system for computing
Gröbner fans of toric ideals see [21].

We call $P_b$ for $b \in \mathrm{pos}_{\mathbb{Z}}(A)$ a Gröbner fiber of $A$ if
there is some $x^{u^+} - x^{u^-} \in \mathrm{UGB}_A$ such that $b = Au^+
= Au^-$. Since there are only finitely many elements in
$\mathrm{UGB}_A$, the matrix $A$ has only finitely many Gröbner
fibers. Moreover, the Minkowski sum of all Gröbner fibers
of $A$ is a state polytope of $A$. For a survey of algorithms
to construct state polytopes and Gröbner fans of graded
polynomial ideals see [29, Chapt. 2; 3]. The Gröbner fan
of $A$ provides a model for global sensitivity analysis for
the family of integer programs $IP_{A,c}$.



We now briefly discuss a theory analogous to the
above for linear programming, based on results in [7]
and [14]. For a comparison of integer and linear
programming from this point of view see [30]. Let
$LP_{A,c}(b) := \min\{c \cdot x : Ax = b,\ x \geq 0\}$, where $A$ and $c$ are as
before and $b$ is any vector in the rational polyhedral cone
$\mathrm{pos}(A) := \{Ax : x \geq 0\}$. We define two cost vectors $c_1$
and $c_2$ to be equivalent with respect to $LP_A$ if the linear
programs $LP_{A,c_1}(b)$ and $LP_{A,c_2}(b)$ have the same set of
optimal solutions for all $b \in \mathrm{pos}(A)$. Let $\mathcal{A} := \{a_1, \dots,
a_n\}$ be the vector configuration in $\mathbb{Z}^d$ consisting of the
columns of $A$. For a subset $\sigma \subseteq \mathcal{A}$, we let $\mathrm{pos}(\sigma)$ denote
the cone generated by $\sigma$. A polyhedral subdivision $\Delta$ of
$\mathcal{A}$ is a collection of subsets of $\mathcal{A}$ such that $\{\mathrm{pos}(\sigma) : \sigma
\in \Delta\}$ is the set of cones in a polyhedral fan whose
support is $\mathrm{pos}(A)$. The elements of $\Delta$ are called the faces or
cells of $\Delta$. For convenience we identify $\mathcal{A}$ with the set
of indices $[n]$ and any subset of $\mathcal{A}$ with the corresponding
subset $\sigma \subseteq [n]$. A cost vector $c \in \mathbb{R}^n$ induces the regular
subdivision $\Delta_c$ of $\mathcal{A}$ [7,14] as follows: $\sigma$ is a face of $\Delta_c$
if there exists a vector $y \in \mathbb{R}^d$ such that $a_j \cdot y = c_j$
whenever $j \in \sigma$ and $a_j \cdot y < c_j$ otherwise. A cost vector $c \in \mathbb{R}^n$ is
said to be generic with respect to $LP_A$ if every linear
program in the family $LP_{A,c}$ has a unique optimal solution.
When $c$ is generic for $LP_A$, the regular subdivision $\Delta_c$ is
in fact a triangulation, called the regular triangulation of
$\mathcal{A}$ with respect to $c$.
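When $c$ is generic for $LP_A$, the maximal cells of $\Delta_c$ can be read off directly from the definition: for each $d$-subset $\sigma$ with $A_\sigma$ nonsingular, solve $a_j \cdot y = c_j$ ($j \in \sigma$) and keep $\sigma$ if the strict inequalities hold for all other columns. The Python sketch below does exactly this with exact rational arithmetic. It only reports maximal cells, relies on the genericity assumption, and its examples (three collinear points, homogenized, and a $1 \times 3$ matrix) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, r):
    """Exact solution of the square system M y = r, or None if singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(ri)] for row, ri in zip(M, r)]
    for k in range(n):
        p = next((i for i in range(k, n) if aug[i][k] != 0), None)
        if p is None:
            return None
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(n):
            if i != k and aug[i][k] != 0:
                f = aug[i][k] / aug[k][k]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[k])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def regular_cells(A, c):
    """Maximal cells sigma (|sigma| = d) of the regular subdivision Delta_c,
    assuming c is generic so that Delta_c is a triangulation.  sigma is a
    cell iff the unique y with a_j . y = c_j (j in sigma) satisfies
    a_j . y < c_j for every column a_j with j outside sigma."""
    d, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(d)] for j in range(n)]
    cells = []
    for sigma in combinations(range(n), d):
        # The rows of this system are the columns a_j, j in sigma.
        y = solve_square([cols[j] for j in sigma], [c[j] for j in sigma])
        if y is None:
            continue
        if all(sum(x * yi for x, yi in zip(cols[j], y)) < c[j]
               for j in range(n) if j not in sigma):
            cells.append(sigma)
    return cells


A = [[1, 1, 1], [0, 1, 2]]            # points 0, 1, 2 on a line, homogenized
print(regular_cells(A, (0, 1, 0)))    # [(0, 2)]
print(regular_cells(A, (0, -1, 0)))   # [(0, 1), (1, 2)]
```

Geometrically, $c$ lifts each point to height $c_j$ and the cells are the projections of the lower faces: with height 1 the middle point is not on the lower hull, with height $-1$ it is.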

Two cost vectors $c_1$ and $c_2$ are equivalent with
respect to $LP_A$ if and only if $\Delta_{c_1} = \Delta_{c_2}$. The equivalence
class of $c$ with respect to $LP_A$ is hence $\{c' \in \mathbb{R}^n : \Delta_{c'} =
\Delta_c\}$, which is the relative interior of a polyhedral cone
in $\mathbb{R}^n$ called the secondary cone of $c$, denoted $S_c$. The
cone $S_c$ is $n$-dimensional if and only if $c$ is generic with
respect to $LP_A$. The equivalence classes of cost
vectors fit together to form a complete polyhedral fan in
$\mathbb{R}^n$ called the secondary fan of $A$. This fan is the normal
fan of a polytope called the secondary polytope of $A$. See
[7] for construction methods for both the secondary fan
and the secondary polytope of $A$.

We conclude this section by showing that the Gröbner
and secondary fans of $A$ are related. The
Stanley-Reisner ideal of $\Delta_c$ is the square-free monomial ideal
$\langle x_{i_1} \cdots x_{i_r} : \{i_1, \dots, i_r\} \text{ is a nonface of } \Delta_c \rangle \subseteq k[x]$.


Theorem 7 [28] The radical of the initial ideal $\mathrm{in}_c(I_A)$
is the Stanley-Reisner ideal of the regular triangulation
$\Delta_c$.


Corollary 8 [28]
i) The Gröbner fan of $A$ is a refinement of the secondary
fan of $A$.

ii) A secondary polytope of $A$ is a summand of a state
polytope of $A$.


Corollary 8 reaffirms the view that integer programming
is an arithmetic refinement of linear programming.


Group Relaxations in Integer Programming 


We now investigate group relaxations of integer
programs in the family $IP_{A,c}$ from an algebraic point of
view. The results in this section are taken from [19,20]
and [32], sometimes after an appropriate translation
into polyhedral language. See these papers for the
algebraic motivations that led to these results.

The group relaxation of $IP_{A,c}(b)$ [15] is the program

$$\mathrm{Group}^\sigma(b) := \min\{\tilde{c}_{\bar\sigma} \cdot x_{\bar\sigma} : A_\sigma x_\sigma + A_{\bar\sigma} x_{\bar\sigma} = b,\ x_{\bar\sigma} \geq 0,\ x = (x_\sigma, x_{\bar\sigma}) \in \mathbb{Z}^n\},$$

where $A_\sigma$, the submatrix of $A$ whose columns are indexed
by $\sigma \subseteq [n]$, is the optimal basis of the linear program
$LP_{A,c}(b)$, $\bar\sigma := [n] \setminus \sigma$, and $\tilde{c}_{\bar\sigma} := c_{\bar\sigma} - c_\sigma A_\sigma^{-1} A_{\bar\sigma}$.
Here the cost vector $c$ has also been partitioned as
$c = (c_\sigma, c_{\bar\sigma})$ using the set $\sigma \subseteq [n]$.


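Definition 9 Suppose $\mathcal{L}$ is any sublattice of $\mathbb{Z}^n$, $w \in \mathbb{R}^n$
and $v \in \mathbb{N}^n$. The lattice program $\mathrm{Lat}_{\mathcal{L},w}(v)$ defined by
this data is

$$\begin{aligned} \min\ & w \cdot u \\ \text{s.t. } & u \equiv v \pmod{\mathcal{L}}, \\ & u \in \mathbb{N}^n. \end{aligned}$$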


Lattice programs are a generalization of integer
programs: $IP_{A,c}(b) = \mathrm{Lat}_{\mathcal{L},c}(v)$, where $\mathcal{L} = \ker_{\mathbb{Z}}(A)$ and $v$
is any feasible solution to $IP_{A,c}(b)$. Gröbner basis
methods for integer programs can be extended to solve
lattice programs (see [20,32]). Given the lattice $\mathcal{L}$ and
a cost vector $w$, we first construct the lattice ideal
$I_{\mathcal{L}} := \langle x^\alpha - x^\beta : \alpha - \beta \in \mathcal{L},\ \alpha, \beta \in \mathbb{N}^n \rangle \subseteq k[x]$. We then
compute the reduced Gröbner basis of $I_{\mathcal{L}}$ with respect
to $w$, denoted $\mathcal{G}_w(I_{\mathcal{L}})$. (If $w$ does not induce a total
order on $\mathbb{N}^n$ via the inner product $w \cdot x$, $x \in \mathbb{N}^n$, then
we use a tie-breaking term order to refine the order
induced by $w$.) For a particular lattice program $\mathrm{Lat}_{\mathcal{L},w}(v)$,
the optimal solution is the exponent vector of the
normal form of $x^v$ with respect to $\mathcal{G}_w(I_{\mathcal{L}})$.


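Let $\tau \subseteq [n]$ and let $\pi_\tau : \mathbb{Z}^n \to \mathbb{Z}^{n - |\tau|}$ be the coordinate
projection map that eliminates the coordinates indexed by $\tau$.
Consider the lattice $\mathcal{L}_\tau := \pi_\tau(\mathcal{L})$, where $\mathcal{L}
= \ker_{\mathbb{Z}}(A)$. Given a basis $\{b_1, \dots, b_{n-d}\}$ of $\mathcal{L}$, the set
$\{\pi_\tau(b_1), \dots, \pi_\tau(b_{n-d})\}$ generates $\mathcal{L}_\tau$. Further,
$\pi_\tau : \mathcal{L} \to \mathcal{L}_\tau$ is an isomorphism (so that this generating set
is a basis) whenever $\mathrm{rank}(A_\tau) = |\tau|$.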


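Proposition 10 [32] Let $v$ be a feasible solution to
$IP_{A,c}(b)$ and $A_\sigma$ be the optimal basis of $LP_{A,c}(b)$. Then
the group relaxation $\mathrm{Group}^\sigma(b)$ of $IP_{A,c}(b)$ is the lattice
program $\mathrm{Lat}_{\mathcal{L}_\sigma, \tilde{c}_{\bar\sigma}}(\pi_\sigma(v))$, where $\tilde{c}_{\bar\sigma} = \pi_\sigma(c -
c_\sigma A_\sigma^{-1} A) = c_{\bar\sigma} - c_\sigma A_\sigma^{-1} A_{\bar\sigma}$.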


The program $\mathrm{Group}^\sigma(b)$ can be solved by Gröbner basis
methods as explained earlier or by dynamic
programming [15]. The optimal solution $x^*_{\bar\sigma}$ to $\mathrm{Group}^\sigma(b)$
is then lifted to the unique vector $x^* = (x^*_\sigma, x^*_{\bar\sigma}) \in \mathbb{Z}^n$
by solving the equation $A_\sigma x_\sigma + A_{\bar\sigma} x^*_{\bar\sigma} = b$. If all
components of $x^*$ are nonnegative, then $x^*$ is the optimal
solution to $IP_{A,c}(b)$. Otherwise $c \cdot x^*$ is a lower bound
on the optimal value of $IP_{A,c}(b)$.
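A small sketch may help to see the lifting step and the lower bound. The Python code below handles only a hypothetical $1 \times 3$ instance, $A = [1\ 3\ 5]$ and $c = (5, 7, 10)$, whose LP optimal basis is $\sigma = \{3\}$ for every $b > 0$ (the ratio $c_j / a_j$ is smallest for $j = 3$). Instead of Gröbner bases or dynamic programming it solves $\mathrm{Group}^\sigma(b)$ by brute force over a box of nonbasic values, which we assume is large enough, and then lifts via $x_3 = (b - x_1 - 3x_2)/5$. For $b = 1$ the lifted vector has a negative component, so only a lower bound is obtained. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 1 x 3 instance: A = [1 3 5], c = (5, 7, 10).  The LP optimal
# basis is sigma = {3} (0-based index 2); the relaxation keeps x_3 integral
# but drops x_3 >= 0.
a = (1, 3, 5)
c = (5, 7, 10)
k = 2
rest = [j for j in range(3) if j != k]
# Reduced costs c~_j = c_j - c_k a_j / a_k of the nonbasic columns: (3, 1).
c_red = {j: Fraction(c[j]) - Fraction(c[k] * a[j], a[k]) for j in rest}


def group_relaxation(b, box=10):
    """Solve Group^sigma(b) by brute force over 0 <= x_j <= box (j not in
    sigma), then lift: x_k = (b - sum_j a_j x_j) / a_k may be negative."""
    best = None
    for xs in product(range(box + 1), repeat=len(rest)):
        slack = b - sum(a[j] * xj for j, xj in zip(rest, xs))
        if slack % a[k]:                 # needs a . x == b (mod a_k)
            continue
        val = sum(c_red[j] * xj for j, xj in zip(rest, xs))
        if best is None or val < best[0]:
            best = (val, xs, slack // a[k])
    _, xs, xk = best
    x = [0] * 3
    for j, xj in zip(rest, xs):
        x[j] = xj
    x[k] = xk
    return tuple(x)


def ip_optimum(b):
    """Brute-force optimum of IP_{A,c}(b), for comparison."""
    fib = [x for x in product(range(b + 1), repeat=3)
           if sum(ai * xi for ai, xi in zip(a, x)) == b]
    return min(fib, key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))


print(group_relaxation(6))   # (0, 2, 0): nonnegative, hence optimal
print(group_relaxation(1))   # (0, 2, -1): cost 4 <= 5 = cost of (1, 0, 0)
```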

When $\mathrm{Group}^\sigma(b)$ fails to solve $IP_{A,c}(b)$, L. Wolsey
[37] suggested using extended group relaxations of
$IP_{A,c}(b)$. We introduce a more general set of extended
group relaxations of $IP_{A,c}(b)$ inspired by the following
close relationship between the linear programs in $LP_{A,c}$
and the regular triangulation $\Delta_c$.


Proposition 11 [30] The optimal solutions $x$ to
$LP_{A,c}(b)$ are the solutions to the problem: Find $x \in \mathbb{R}^n$
such that $Ax = b$, $x \geq 0$, and $\mathrm{supp}(x)$ is a subset of a face
of $\Delta_c$.


Proposition 11 says that the set $\sigma$ in $\mathrm{Group}^\sigma(b)$ is
a maximal face of $\Delta_c$.


Definition 12 Consider the integer program $IP_{A,c}(b)$
and a feasible solution $v$ to this program. Let $\tau$ be
a face of $\Delta_c$ and $\sigma$ be any maximal face of $\Delta_c$
containing $\tau$. Then the group relaxation of $IP_{A,c}(b)$ with
respect to $\tau$, denoted $\mathrm{Group}^\tau(b)$, is the lattice program
$\mathrm{Lat}_{\mathcal{L}_\tau, \tilde{c}_\tau}(\pi_\tau(v))$, where $\tilde{c}_\tau := \pi_\tau(c - c_\sigma A_\sigma^{-1} A)$.


The extended group relaxations in [37] are precisely
those $\mathrm{Group}^\tau(b)$ where $\tau$ is a subset of the maximal
face $\sigma$ of $\Delta_c$ that gives the optimal basis of $LP_{A,c}(b)$.
Clearly, one such relaxation will solve $IP_{A,c}(b)$: for
$\tau = \emptyset$, $\mathrm{Group}^\emptyset(b)$ is $IP_{A,c}(b)$ up to the constant
$c_\sigma A_\sigma^{-1} b$ in the objective. However, we consider all
relaxations of $IP_{A,c}(b)$ of the form $\mathrm{Group}^\tau(b)$ as $\tau$ varies
over all faces of $\Delta_c$, in order to
avoid keeping track of which $b$ is being considered and
what the optimal basis of $LP_{A,c}(b)$ is.

It was shown in [32] that the lattice program
$\mathrm{Group}^\tau(b)$ is related to the localization of the initial
ideal $\mathrm{in}_c(I_A)$ at the prime ideal $p_\tau := \langle x_j : j \notin \tau \rangle$ in $k[x]$.
Since group relaxations are always defined with respect
to a face $\tau$ of $\Delta_c$, we are guaranteed that $\mathrm{rank}(A_\tau) =
|\tau|$, which allows a unique lifting of the optimal solution
of $\mathrm{Group}^\tau(b)$ to a vector in the same congruence class
modulo $\mathcal{L}$ as the solutions to $IP_{A,c}(b)$.


Theorem 13 [20] Suppose $u' \in \mathbb{N}^{n - |\tau|}$ is the optimal
solution to the group relaxation $\mathrm{Group}^\tau(b)$. Then there exists
a unique $u \in \mathbb{Z}^n$ such that $A(u - v) = 0$ for any feasible
solution $v$ to $IP_{A,c}(b)$ and $\pi_\tau(u) = u'$. If $u \geq 0$, then it is
the optimal solution to $IP_{A,c}(b)$.


A group relaxation $\mathrm{Group}^\tau(b)$ is easiest to solve when $\tau$
is a maximal face of $\Delta_c$. In this situation, the lattice ideal
$I_{\mathcal{L}_\tau}$ is zero-dimensional, and hence its Gröbner bases
are easier to compute than otherwise. We call such
group relaxations the Gomory relaxations of $IP_{A,c}(b)$.
In general one is most interested in those group
relaxations $\mathrm{Group}^\tau(b)$ that solve $IP_{A,c}(b)$ with $|\tau|$ as large
as possible. In the rest of this section we study
several structural properties of these ‘least tight’ extended
group relaxations that solve programs in $IP_{A,c}$. We first
need a diversion into combinatorics.
For $m \in \mathbb{N}^n$, we define the support of $x^m \in k[x]$ to be
$\mathrm{supp}(m)$.
Definition 14 For a monomial $x^m \in k[x]$ and $\sigma \subseteq [n]$,
we say that $(x^m, \sigma)$ is an admissible pair of a monomial
ideal $M$ if
i) $\mathrm{supp}(m) \cap \sigma = \emptyset$; and
ii) every monomial in $x^m \cdot k[x_j : j \in \sigma]$ is a standard
monomial of $M$.


There is a natural partial order on the set of all
admissible pairs of $M$ given by $(x^m, \sigma) \leq (x^{m'}, \sigma')$ if and only if
$x^m$ divides $x^{m'}$ and $\mathrm{supp}(x^{m'}/x^m) \cup \sigma' \subseteq \sigma$.


Definition 15 An admissible pair $(x^m, \sigma)$ of $M$ is called
a standard pair of $M$ if it is a minimal element in the
poset of all admissible pairs with respect to the above
partial order.


The standard pairs of $M$ induce a unique covering of
the set of standard monomials of $M$, which we refer to
as the standard pair decomposition of $M$. This
decomposition was introduced in [31] to study the associated
primes of $M$ and their multiplicities, and thus the
arithmetic degree [3] of $M$. When $M$ is the initial ideal of
a toric ideal, stronger conclusions can be drawn. In our
exposition below we bypass many of the algebraic
results associated with the standard pair decomposition
of $M$, and instead use these results to motivate
appropriate definitions to continue our discussion of group
relaxations.


Definition 16

i) For $\tau \subseteq [n]$, we define the multiplicity of $\tau$, denoted
$\mathrm{mult}(\tau)$, to be the number of standard pairs of the
form $(x^m, \tau)$ in the standard pair decomposition of
$M$.

ii) The sum of the multiplicities of $\tau$ as $\tau$ varies over
the subsets of $[n]$ is called the arithmetic degree of
$M$, denoted $\mathrm{arithdeg}(M)$.


In the rest of this section we let $M = \mathrm{in}_c(I_A)$.


Proposition 17 [29, Sect. 12.D]

i) If $(x^m, \tau)$ is a standard pair of $\mathrm{in}_c(I_A)$, then $\tau$ is a face
of $\Delta_c$.

ii) The standard pair $(1, \sigma)$ occurs in the standard pair
decomposition of $\mathrm{in}_c(I_A)$ if and only if $\sigma$ is a maximal
face of $\Delta_c$. In this case, $\mathrm{mult}(\sigma)$ is the normalized
volume of $\sigma$ in $\Delta_c$.


The normalized volume of a maximal face $\sigma \in \Delta_c$ is
the quotient $|\det(A_\sigma)|/\Gamma$, where $\Gamma$ is the g.c.d. of all
$|\det(A_{\sigma'})|$ as $\sigma'$ varies over the $d$-subsets of $[n]$ with $A_{\sigma'}$
nonsingular, i.e. the g.c.d. of the maximal minors of $A$.
We note that the converse to Proposition 17 i) is false.
If $\tau$ is a nonmaximal face of $\Delta_c$, then there may not be
a standard pair of the form $(x^m, \tau)$ in the standard pair
decomposition of $\mathrm{in}_c(I_A)$.

The standard pair decomposition of $\mathrm{in}_c(I_A)$ reduces
the problem of solving integer programs in $IP_{A,c}$ to
solving systems of linear equations: if $\beta$ is the optimal
solution to the program $IP_{A,c}(b)$, then the monomial
$x^\beta$ is covered by some standard pair $(x^u, \tau)$. Thinking
of $u$ as a vector in $\mathbb{N}^{n - |\tau|}$ (dropping its components indexed
by $\tau$, which are zero), we get $\beta_{\bar\tau} = u$, where $\bar\tau := [n] \setminus \tau$,
and $\beta_\tau$ is the unique solution to the linear system
$A_\tau x_\tau = b - A_{\bar\tau} u$. Therefore,
if the standard pairs of $\mathrm{in}_c(I_A)$ are known a priori, then
one can set up $\mathrm{arithdeg}(\mathrm{in}_c(I_A))$-many systems of linear
equations, one for each standard pair. For each $b \in
\mathrm{pos}_{\mathbb{Z}}(A)$, one then solves for $\beta_\tau$ as above. Whenever the
$\beta_\tau$ obtained this way is both integral and nonnegative,
we have found the optimal solution to $IP_{A,c}(b)$. Hence
$\mathrm{arithdeg}(\mathrm{in}_c(I_A))$ can be seen as a complexity measure
of $IP_{A,c}$. See [22] for another preprocessing of $IP_{A,c}$ that
reduces solving $IP_{A,c}(b)$ to solving a sequence of
subproblems involving superadditive functions.
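The reduction to linear systems can be sketched in Python. The code assumes the hypothetical instance $A = [1\ 2\ 3]$, $c = (5, 8, 10)$, for which $\mathrm{in}_c(I_A) = \langle x_1^2, x_1x_2, x_2^2 \rangle$ and, by our own hand computation, the standard pairs are $(1, \{3\})$, $(x_1, \{3\})$ and $(x_2, \{3\})$, so $\mathrm{arithdeg}(\mathrm{in}_c(I_A)) = 3$. Since every $\tau$ here is a maximal face, each system $A_\tau x_\tau = b - A_{\bar\tau}u$ is square; the sketch does not handle non-square systems, and `STANDARD_PAIRS` and the function names are illustrative.

```python
from fractions import Fraction

A = [[1, 2, 3]]
c = (5, 8, 10)
# Standard pairs (u, tau) of in_c(I_A) = <x1^2, x1*x2, x2^2> for this
# hypothetical example, worked out by hand (0-based column indices):
# (1, {3}), (x1, {3}), (x2, {3}).
STANDARD_PAIRS = [((0, 0, 0), (2,)), ((1, 0, 0), (2,)), ((0, 1, 0), (2,))]


def solve_square(M, r):
    """Exact solution of the square system M y = r, or None if singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(ri)] for row, ri in zip(M, r)]
    for k in range(n):
        p = next((i for i in range(k, n) if aug[i][k] != 0), None)
        if p is None:
            return None
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(n):
            if i != k and aug[i][k] != 0:
                f = aug[i][k] / aug[k][k]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[k])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def solve_by_standard_pairs(b):
    """For each standard pair (x^u, tau), solve A_tau x_tau = b - A u; an
    integral, nonnegative solution gives the optimum beta."""
    d, n = len(A), len(A[0])
    for u, tau in STANDARD_PAIRS:
        rhs = [b[i] - sum(A[i][j] * u[j] for j in range(n)) for i in range(d)]
        sol = solve_square([[A[i][j] for j in tau] for i in range(d)], rhs)
        if sol is not None and all(s.denominator == 1 and s >= 0 for s in sol):
            beta = list(u)
            for j, s in zip(tau, sol):
                beta[j] = int(s)
            return tuple(beta)
    return None


print(solve_by_standard_pairs((7,)))   # (1, 0, 2)
```

Once the standard pairs are known, each right-hand side costs at most $\mathrm{arithdeg}(\mathrm{in}_c(I_A))$ small linear solves, which is the point of the preprocessing described above.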


Theorem 18 [20]

i) The integer program $IP_{A,c}(b)$ is solved by the group
relaxation $\mathrm{Group}^\tau(b)$ if and only if the monomial $x^\beta$,
where $\beta$ is the optimal solution to $IP_{A,c}(b)$, is covered
by a standard pair $(x^m, \tau')$ of $\mathrm{in}_c(I_A)$ for some $\tau' \supseteq \tau$.


In order to state the main results, we need yet
another interpretation of group relaxations of programs
in $IP_{A,c}$.

Let $\varphi_A : \mathbb{N}^n \to \mathbb{Z}^d$ be the linear map $x \mapsto Ax$. Then
$P_b$ is the convex hull of $\varphi_A^{-1}(b)$ for each $b \in \mathrm{pos}_{\mathbb{Z}}(A)$.
Consider a matrix $B \in \mathbb{Z}^{n \times (n-d)}$ such that the columns
of $B$ form a basis for $\ker_{\mathbb{Z}}(A)$ (as an Abelian group). For
$v \in \varphi_A^{-1}(b)$ we can identify $\varphi_A^{-1}(b)$ with the lattice points
in the polytope

$$Q_v := \{u \in \mathbb{R}^{n-d} : Bu \leq v\} \qquad (1)$$

via the bijection $Q_v \cap \mathbb{Z}^{n-d} \to \varphi_A^{-1}(b)$ given by $u \mapsto
v - Bu$. Under this bijection, $v \in \varphi_A^{-1}(b)$ corresponds
to $0 \in Q_v$. We refer to $Q_v$ as a Scarf formulation of $P_b
= \mathrm{conv}(\varphi_A^{-1}(b))$. If $v, v' \in \varphi_A^{-1}(b)$, then $Q_v$ and $Q_{v'}$ are
simply lattice translates of each other.


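Proposition 19 If $v$ is a feasible solution to $IP_{A,c}(b)$,
then $IP_{A,c}(b)$ is equivalent to

$$\min\{-(cB) \cdot u : u \in Q_v \cap \mathbb{Z}^{n-d}\}. \qquad (2)$$


Proof A lattice point $v^*$ is the optimal solution to
$IP_{A,c}(b)$

$\iff$ there exists $u^* \in \mathbb{Z}^{n-d}$ such that $v^* = v - Bu^*
\geq 0$ and $c \cdot (v - Bu^*) < c \cdot (v - Bu)$ for all $u \neq u^*$ in $\mathbb{Z}^{n-d}$
with $v - Bu \geq 0$

$\iff$ there exists $u^* \in Q_v \cap \mathbb{Z}^{n-d}$ such that $-(cB) \cdot u^*
< -(cB) \cdot u$ for all $u \neq u^*$ in $Q_v \cap \mathbb{Z}^{n-d}$

$\iff$ $u^*$ is the optimal solution of the integer program
(2).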
We will refer to the integer program (2) as a Scarf
formulation of $IP_{A,c}(b)$. Using the optimal solution $u^*$ of
the Scarf formulation (2), we define the following
subpolytope of $Q_v$:

$$Q_v(u^*) := \{u \in \mathbb{R}^{n-d} : Bu \leq v,\ -(cB) \cdot u \leq -(cB) \cdot u^*\}. \qquad (3)$$


Theorem 20 Let $v \in \mathbb{N}^n$ be a feasible solution to
$IP_{A,c}(b)$. Then $u^*$ is the optimal solution to (2) if and
only if $u^*$ is the unique lattice point in $Q_v(u^*)$. In
particular, $v$ is the optimal solution to $IP_{A,c}(b)$ if and only if $0$
is the unique lattice point in $Q_v(0) = \{u \in \mathbb{R}^{n-d} : Bu \leq v,
-(cB) \cdot u \leq 0\}$.


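Proof A vector $u^* \in \mathbb{Z}^{n-d}$ is the optimal solution to (2) if and only if $u^*$ is in $Q_v$ and there is no $u \in Q_v \cap \mathbb{Z}^{n-d}$ such that $-(cB)\cdot u < -(cB)\cdot u^*$. Since $c$ is a generic cost vector, an optimal $u^*$ is the only lattice point of $Q_v$ attaining the value $-(cB)\cdot u^*$, so this is equivalent to $u^*$ being the unique lattice point in $Q_v(u^*)$. The second statement follows immediately.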


Corollary 21 A monomial $x^v$ is a standard monomial of $\mathrm{in}_c(I_A)$ if and only if $0$ is the unique lattice point in $Q_v(0)$.
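The bijection $u \mapsto v - Bu$, Proposition 19 and Theorem 20 can be checked by brute force on a toy instance. The sketch below is only an illustration under assumed data not taken from the text: $A = (1\ 2\ 3)$ (so $d = 1$, $n = 3$), a hand-picked lattice basis $B$ of $\ker_{\mathbb{Z}}(A)$, and the cost vector $c = (1, \sqrt 2, \pi)$, whose irrational ratios stand in for genericity. Lattice points are found by enumerating a box, not by any algorithm from the literature; the helper names are hypothetical.

```python
import itertools
import math

# Hypothetical toy instance (not from the text): A = [1 2 3], so d = 1, n = 3.
A = (1, 2, 3)
# Columns of B form a lattice basis of ker_Z(A) = {x in Z^3 : x1 + 2x2 + 3x3 = 0}.
B = ((-2, -3),
     (1, 0),
     (0, 1))
# Irrational cost ratios rule out ties, standing in for a generic c.
c = (1.0, math.sqrt(2), math.pi)


def times_B(u):
    """Return the vector Bu for u in Z^2."""
    return tuple(row[0] * u[0] + row[1] * u[1] for row in B)


def cB_dot(u):
    """Return (cB) . u, computed as c . (Bu)."""
    return sum(ci * bi for ci, bi in zip(c, times_B(u)))


def lattice_points(v, radius, extra=None):
    """Lattice points u of a box with Bu <= v (and extra(u), if given)."""
    box = range(-radius, radius + 1)
    return [u for u in itertools.product(box, repeat=2)
            if all(bi <= vi for bi, vi in zip(times_B(u), v))
            and (extra is None or extra(u))]


def fiber(b):
    """Brute-force pi_A^{-1}(b) = {x in N^3 : Ax = b}."""
    return [x for x in itertools.product(range(b + 1), repeat=3)
            if sum(a * xi for a, xi in zip(A, x)) == b]


def scarf_optimum(v, radius):
    """Solve program (2) by enumeration and map u* back to v* = v - Bu*."""
    u_star = min(lattice_points(v, radius), key=lambda u: -cB_dot(u))
    return tuple(vi - bi for vi, bi in zip(v, times_B(u_star)))


def q_v_zero(v, radius):
    """Lattice points of Q_v(0) = {u : Bu <= v, -(cB).u <= 0}."""
    return lattice_points(v, radius, extra=lambda u: -cB_dot(u) <= 0)


v = (10, 0, 0)                                   # feasible for b = 10
image = sorted(tuple(vi - bi for vi, bi in zip(v, times_B(u)))
               for u in lattice_points(v, radius=20))
print(image == sorted(fiber(10)))                # the map u -> v - Bu covers the fiber
v_star = scarf_optimum(v, radius=20)
print(v_star, q_v_zero(v_star, radius=20))
```

For $b = 10$ this prints `True`, then `(0, 5, 0) [(0, 0)]`: the optimum $v^* = (0, 5, 0)$ is recovered from the Scarf formulation, and $Q_{v^*}(0)$ contains only the origin, as Theorem 20 predicts, whereas $Q_v(0)$ for the non-optimal $v = (10, 0, 0)$ contains further lattice points.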


Let $B^{\tau}$ denote the submatrix of $B$ whose rows are indexed by the complement $\bar\tau = [n] \setminus \tau$ of $\tau \subseteq [n]$.


Lemma 22 Suppose $\sigma$ is a maximal face of $\Delta_c$, and $\tau$ a subface of $\sigma$. Then $\tilde c_\tau B^{\tau} = cB$, where $\tilde c_\tau = \pi_\tau\big(c - c_\sigma (A_\sigma)^{-1} A\big)$.


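Proof Since the entries of $c - c_\sigma(A_\sigma)^{-1}A$ indexed by $\sigma \supseteq \tau$ vanish, its support is contained in $\bar\tau$, so $\tilde c_\tau B^{\tau} = (c - c_\sigma(A_\sigma)^{-1}A)B = cB$, where the last equality holds because $AB = 0$.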
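Lemma 22 is a purely algebraic identity, so it can be checked with exact rational arithmetic. In the sketch below the $2 \times 4$ matrix $A$, the kernel basis $B$, the cost vector $c$ and the index sets $\sigma = \{0, 1\}$, $\tau = \{0\}$ (0-based) are hypothetical. The identity only uses that $A_\sigma$ is invertible and $AB = 0$, so the check does not verify that $\sigma$ is actually a maximal face of $\Delta_c$.

```python
from fractions import Fraction


def solve_exact(M, rhs):
    """Solve M z = rhs for square nonsingular M by Gauss-Jordan over Q."""
    k = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(k):
        piv = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def reduced_cost(A, c, sigma):
    """c~ = c - c_sigma (A_sigma)^{-1} A, with A given as a list of d rows."""
    d, n = len(A), len(A[0])
    A_sigma_T = [[A[i][j] for i in range(d)] for j in sigma]   # (A_sigma)^T
    y = solve_exact(A_sigma_T, [c[j] for j in sigma])           # y A_sigma = c_sigma
    return [c[j] - sum(y[i] * A[i][j] for i in range(d)) for j in range(n)]


def lemma_22_sides(A, B, c, sigma, tau):
    """Return (c~_tau B^tau, cB); B^tau keeps the rows of B indexed by tau-bar."""
    n, m = len(c), len(B[0])
    ct = reduced_cost(A, c, sigma)
    tau_bar = [j for j in range(n) if j not in tau]
    lhs = [sum(ct[j] * B[j][k] for j in tau_bar) for k in range(m)]
    rhs = [sum(c[j] * B[j][k] for j in range(n)) for k in range(m)]
    return lhs, rhs


# Hypothetical instance: A is 2 x 4 and the columns of B span ker_Z(A).
A = [[1, 1, 1, 1], [0, 1, 2, 3]]
B = [[1, 0], [-2, 1], [1, -2], [0, 1]]
c = [3, 5, 2, 7]
print(reduced_cost(A, c, sigma=[0, 1]))                # vanishes on sigma
print(lemma_22_sides(A, B, c, sigma=[0, 1], tau=[0]))  # both sides agree
```

Here $\tilde c = (0, 0, -5, -2)$ vanishes on $\sigma$, and both $\tilde c_\tau B^{\tau}$ and $cB$ equal $(-5, 8)$.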


Theorem 23 Let $v$ be a feasible solution to $\mathrm{IP}_{A,c}(b)$, and suppose that $\sigma$ is a maximal face of $\Delta_c$, and $\tau$ a subface of $\sigma$. Then the group relaxation $\mathrm{Group}^{\tau}(b)$ is the integer program $\min\{-(cB)\cdot u : B^{\tau}u \le \pi_\tau(v),\ u \in \mathbb{Z}^{n-d}\}$.


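Proof Since $L_\tau = \{B^{\tau}u : u \in \mathbb{Z}^{n-d}\}$, we have

$$
\begin{aligned}
\mathrm{Group}^{\tau}(b) &= \min\{\tilde c_\tau \cdot w : w \equiv \pi_\tau(v) \ (\mathrm{mod}\ L_\tau),\ w \in \mathbb{N}^{|\bar\tau|}\}\\
&= \min\{\tilde c_\tau \cdot w : w = \pi_\tau(v) - B^{\tau}u,\ w \in \mathbb{N}^{|\bar\tau|},\ u \in \mathbb{Z}^{n-d}\}\\
&= \min\{\tilde c_\tau \cdot (\pi_\tau(v) - B^{\tau}u) : \pi_\tau(v) - B^{\tau}u \ge 0,\ u \in \mathbb{Z}^{n-d}\}\\
&= \min\{\tilde c_\tau \cdot (\pi_\tau(v) - B^{\tau}u) : B^{\tau}u \le \pi_\tau(v),\ u \in \mathbb{Z}^{n-d}\}.
\end{aligned}
$$

Since $\tilde c_\tau \cdot \pi_\tau(v)$ is a constant and $\tilde c_\tau B^{\tau} = cB$ by Lemma 22, this program has the same optimal solutions as $\min\{-(cB)\cdot u : B^{\tau}u \le \pi_\tau(v),\ u \in \mathbb{Z}^{n-d}\}$.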


We will denote by $Q_v^{\tau}$ the polyhedron obtained from $Q_v$ by removing the inequalities corresponding to $\tau$. By the above theorem, solving the group relaxation of $\mathrm{IP}_{A,c}(b)$ with respect to $\tau \in \Delta_c$ is equivalent to minimizing the linear functional $-(cB)\cdot u$ over the lattice points in $Q_v^{\tau}$. Now we can characterize which group relaxations will solve $\mathrm{IP}_{A,c}(b)$.


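Corollary 24

i) Let $v$ be a feasible solution to $\mathrm{IP}_{A,c}(b)$. Then $\mathrm{Group}^{\tau}(b)$ solves $\mathrm{IP}_{A,c}(b)$ if and only if the programs $\min\{-(cB)\cdot u : u \in Q_v \cap \mathbb{Z}^{n-d}\}$ and $\min\{-(cB)\cdot u : u \in Q_v^{\tau} \cap \mathbb{Z}^{n-d}\}$ have the same optimal solutions.

ii) If $v$ is optimal for $\mathrm{IP}_{A,c}(b)$, then $\mathrm{Group}^{\tau}(b)$ solves $\mathrm{IP}_{A,c}(b)$ if and only if $0$ is the unique lattice point in $Q_v^{\tau}(0) := \{u \in \mathbb{R}^{n-d} : B^{\tau}u \le \pi_\tau(v),\ -(cB)\cdot u \le 0\}$.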


For a polyhedron $P = \{x : Tx \le t\}$ we say that an inequality $T_i x \le t_i$ is essential if the relaxation of the polyhedron obtained by removing $T_i x \le t_i$ contains a new lattice point.


Theorem 25 [19] An admissible pair $(x^v, \tau)$ is a standard pair of $\mathrm{in}_c(I_A)$ if and only if $0$ is the unique lattice point in $Q_v^{\tau}(0)$ and all of the inequalities in the system $B^{\tau}u \le \pi_\tau(v)$ are essential.


Using the above characterization of the standard pairs of $\mathrm{in}_c(I_A)$ we obtain a combinatorial interpretation for $\mathrm{mult}(\tau)$ and $\mathrm{arithdeg}(\mathrm{in}_c(I_A))$.


Corollary 26

i) The multiplicity of $\tau$ is the number of polytopes of the form $Q_v^{\tau}(0) := \{u \in \mathbb{R}^{n-d} : B^{\tau}u \le v,\ -(cB)\cdot u \le 0\}$, where $v \in \mathbb{N}^{|\bar\tau|}$, $0$ is the unique lattice point in $Q_v^{\tau}(0)$ and all inequalities in $B^{\tau}u \le v$ are essential.

ii) The arithmetic degree of $\mathrm{in}_c(I_A)$ is the total number of such polytopes $Q_v^{\tau}(0)$ as $\tau$ ranges over the subsets of $[n]$.

The result that $\mathrm{mult}(\sigma)$ is the normalized volume of $\sigma$ when $\sigma$ is a maximal face of $\Delta_c$ is a special case of the above more general interpretation of multiplicity. See [19].


Corollary 27 For the initial ideal $\mathrm{in}_c(I_A)$, the following are equivalent:

i) The initial ideal $\mathrm{in}_c(I_A)$ has no standard pairs of the form $(x^{\alpha}, \tau)$ where $\tau$ is a nonmaximal face of $\Delta_c$.

ii) For a face $\tau \in \Delta_c$, if there exists a $v \in \mathbb{N}^{|\bar\tau|}$ such that $Q_v^{\tau}(0)$ contains the origin as its unique lattice point and all inequalities are essential, then $\tau$ is a maximal face of $\Delta_c$ and $Q_v^{\tau}(0)$ is a simplex.

iii) All programs in $\mathrm{IP}_{A,c}$ can be solved by group relaxations with respect to maximal faces of $\Delta_c$.

iv) The arithmetic degree of $\mathrm{in}_c(I_A)$ is $\mathrm{vol}(\mathrm{conv}(A))$.


Proposition 17 shows that the set of all $\tau$ in $\Delta_c$ that index standard pairs of $\mathrm{in}_c(I_A)$ is a subposet (with respect to inclusion) of the face lattice of $\Delta_c$. We denote this subposet by $\mathrm{Std}(\Delta_c)$. Note that both (the face lattice of) $\Delta_c$ and $\mathrm{Std}(\Delta_c)$ have the same maximal elements. We now show that the elements of $\mathrm{Std}(\Delta_c)$ come in chains.


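Theorem 28 [19] Let $\tau$, $|\tau| < d$, be a nonmaximal face of $\Delta_c$ such that $\tau \in \mathrm{Std}(\Delta_c)$. Then there exists some $\tau' \in \Delta_c$ such that $\tau' \in \mathrm{Std}(\Delta_c)$ with the property that

i) $\tau' \supset \tau$ and

ii) $|\tau'| = |\tau| + 1$.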


See [19] for a proof of this theorem. The tools needed in the proof are polyhedral and depend heavily on the polyhedral interpretation of a standard pair given in Theorem 25. In terms of group relaxations, Theorem 28 says that whenever there is a $b \in \mathrm{pos}_{\mathbb{Z}}(A)$ that is solved by a 'least tight' $\mathrm{Group}^{\tau}(b)$, then there exists a $b' \in \mathrm{pos}_{\mathbb{Z}}(A)$ that is solved by a 'least tight' $\mathrm{Group}^{\tau'}(b')$ where

i) $\tau' \supset \tau$ and

ii) $|\tau'| = |\tau| + 1$.

Hence the 'least tight' extended group relaxations that solve the programs in $\mathrm{IP}_{A,c}$ form saturated chains in the poset $\mathrm{Std}(\Delta_c)$.

Since a maximal face of $\Delta_c$ has dimension $d$, the length of a maximal chain in $\mathrm{Std}(\Delta_c)$, which we denote by $\mathrm{length}(\mathrm{Std}(\Delta_c))$, is at most $d$. However, when $n - d$, the corank of $A$, is small compared to $d$, $\mathrm{length}(\mathrm{Std}(\Delta_c))$ has a stronger upper bound, as shown below. We need the following result [27, Corol. 16.5a]:


Theorem 29 Let $Ax \le b$ be a system of linear inequalities in $n$ variables, and let $c \in \mathbb{R}^n$. If $\max\{c\cdot x : Ax \le b,\ x \in \mathbb{Z}^n\}$ is finite, then $\max\{c\cdot x : Ax \le b,\ x \in \mathbb{Z}^n\} = \max\{c\cdot x : A'x \le b',\ x \in \mathbb{Z}^n\}$ for some subsystem $A'x \le b'$ of $Ax \le b$ with at most $2^n - 1$ inequalities.

Theorem 30 The length of a maximal chain in $\mathrm{Std}(\Delta_c)$ is at most $\min(d,\ 2^{n-d} - (n - d + 1))$.


Proof Suppose $v$ is the optimal solution to $\mathrm{IP}_{A,c}(b)$, which is equivalent to $\min\{-(cB)\cdot u : Bu \le v,\ u \in \mathbb{Z}^{n-d}\}$. Since this program has $n - d$ variables, by Theorem 29 we need at most $2^{n-d} - 1$ of its $n$ inequalities to describe the same integer program. This means we can remove at least $n - (2^{n-d} - 1)$ inequalities from $Bu \le v$ without changing the optimal solution. Therefore, by Theorem 23, $\mathrm{IP}_{A,c}(b)$ can be solved by a group relaxation with respect to a $\tau \in \Delta_c$ of size at least $n - (2^{n-d} - 1)$. This implies that the maximal length of a chain in $\mathrm{Std}(\Delta_c)$ is at most $d - (n - (2^{n-d} - 1)) = 2^{n-d} - (n - d + 1)$.


Corollary 31 If $A \in \mathbb{Z}^{d\times n}$ has corank two, then $\mathrm{length}(\mathrm{Std}(\Delta_c)) \le 1$.

Proof In this situation, $2^{n-d} - (n - d + 1) = 4 - (2 + 1) = 1$.


Corollary 32 All programs in the family $\mathrm{IP}_{A,c}$ can be solved by group relaxations with respect to a $\tau \in \Delta_c$ of size at least $\max(0,\ n - (2^{n-d} - 1))$.
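The bounds in Theorem 30 and Corollary 32 depend only on $n$ and $d$, so they are easy to tabulate. The sketch below (function names are hypothetical) evaluates them for $d = 10$ and coranks 1 to 5; it shows that the corank bound improves on the trivial bound $d$ only for small coranks, here $n - d \le 3$.

```python
def chain_length_bound(n, d):
    """Theorem 30: length(Std(Delta_c)) <= min(d, 2^(n-d) - (n-d+1))."""
    return min(d, 2 ** (n - d) - (n - d + 1))


def min_group_face_size(n, d):
    """Corollary 32: some face of size >= max(0, n - (2^(n-d) - 1)) suffices."""
    return max(0, n - (2 ** (n - d) - 1))


d = 10
for corank in range(1, 6):
    n = d + corank
    print(corank, chain_length_bound(n, d), min_group_face_size(n, d))
```

The printed rows are `1 0 10`, `2 1 9`, `3 4 6`, `4 10 0` and `5 10 0`; the corank-two row agrees with Corollary 31.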


We conclude by remarking that the bound in Theorem 
30 is sharp. See [19] for details. 
Integer Programming: Branch and Bound Methods


Synonyms


Branch and bound 


Overview 

An integer programming problem (IP) is an optimiza- 
tion problem in which some or all of the variables are 
restricted to take on only integer values. The exposition 
presented here will focus on the case in which the ob- 
jective and constraints of the optimization problem are 
defined via linear functions. In addition, for simplicity, 
it will be assumed that all of the variables are restricted 
to be nonnegative integer valued. Thus, the mathemati- 
cal formulation of the problem under consideration can 
be stated as: 


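$$
(\mathrm{IP}) \qquad
\begin{array}{ll}
\max & c^{\top}x\\
\text{s.t.} & Ax \le b\\
 & x \in \mathbb{Z}^n_+,
\end{array}
$$

where $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$. For notational convenience, let $S$ denote the constraint set of problem (IP); i.e.,

$$S := \{x \in \mathbb{Z}^n_+ : Ax \le b\}.$$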


The classical approach to solving integer programs 
is branch and bound [39]. The branch and bound 
method is based on the idea of iteratively partitioning 
the set S (branching) to form subproblems of the origi- 
nal integer program. Each subproblem is solved - either 
exactly or approximately - to obtain an upper bound 
on the subproblem objective value. The driving force 
behind the branch and bound approach lies in the fact 
that if an upper bound for the objective value of a given 
subproblem is less than the objective value of a known 
integer feasible solution (e.g., obtained by solving some
other subproblem) then the optimal solution of the 
original integer program cannot lie in the subset of S as- 
sociated with the given subproblem. Hence, the upper 
bounds on subproblem objective values are, in essence, 
used to construct a proof of optimality without exhaus- 
tive search. 

One concept that is fundamental to obtaining upper bounds on subproblem objective values is that of problem relaxation. A relaxation of the optimization problem

$$\max\{c^{\top}x : x \in S\}$$

is an optimization problem

$$\max\{c_R^{\top}x : x \in S_R\},$$

where $S \subseteq S_R$ and $c^{\top}x \le c_R^{\top}x$ for all $x \in S$. Clearly, solving a problem relaxation provides an upper bound on the objective value of the underlying problem. Perhaps the most common relaxation of problem (IP) is the linear programming relaxation formed by relaxing the integer restrictions and enforcing appropriate bound conditions on the variables; i.e., $c_R = c$ and $S_R = \{x \in \mathbb{R}^n : Ax \le b,\ l \le x \le u\}$.

A formal statement of a general branch and bound algorithm [48] is presented in Table 1. The notation $L$ is used to denote the list of active subproblems $\{\mathrm{IP}^i\}$, where $\mathrm{IP}^0 = (\mathrm{IP})$ denotes the original integer program. The notation $\bar z_i$ denotes an upper bound on the optimal objective value of $\mathrm{IP}^i$, and $z_{ip}$ denotes the incumbent objective value (i.e., the objective value corresponding to the current best integral feasible solution to (IP)).


Integer Programming: Branch and Bound Methods, Table 1
General branch and bound algorithm

1 | (Initialization): Set $L = \{\mathrm{IP}^0\}$, $\bar z_0 = +\infty$, and $z_{ip} = -\infty$.

2 | (Termination): If $L = \emptyset$, then the solution $x^*$ which yielded the incumbent objective value $z_{ip}$ is optimal. If no such $x^*$ exists (i.e., $z_{ip} = -\infty$), then (IP) is infeasible.

3 | (Problem selection and relaxation): Select and delete a problem $\mathrm{IP}^i$ from $L$. Solve a relaxation of $\mathrm{IP}^i$. Let $z_i^R$ denote the optimal objective value of the relaxation, and let $x^{iR}$ be an optimal solution if one exists. (Thus, $z_i^R = c^{\top}x^{iR}$, or $z_i^R = -\infty$.)

4 | (Fathoming and Pruning):
  i) If $z_i^R \le z_{ip}$, go to Step 2.
  ii) If $z_i^R > z_{ip}$ and $x^{iR}$ is integral feasible, update $z_{ip} = z_i^R$. Delete from $L$ all problems with $\bar z_j \le z_{ip}$. Go to Step 2.

5 | (Partitioning): Let $\{S^{ij}\}_{j=1}^{k}$ be a partition of the constraint set $S^i$ of the problem $\mathrm{IP}^i$. Add problems $\{\mathrm{IP}^{ij}\}_{j=1}^{k}$ to $L$, where $\mathrm{IP}^{ij}$ is $\mathrm{IP}^i$ with feasible region restricted to $S^{ij}$ and $\bar z_{ij} = z_i^R$ for $j = 1, \ldots, k$. Go to Step 2.
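The steps of Table 1 can be sketched generically, with the relaxation and the partitioning rule passed in as functions. This is a minimal illustration under stated assumptions, not a reference implementation: subproblems are plain Python objects, Step 3 uses best-bound selection (one of the node selection rules discussed below), and `partition` returns an empty list exactly when the relaxed solution is integral. It is exercised on a hypothetical 0-1 knapsack instance whose LP relaxation is solved by Dantzig's greedy rule (exact for a single knapsack constraint with $0 \le x \le 1$) and whose partition fixes the fractional variable to 0 or 1.

```python
import math
from fractions import Fraction


def branch_and_bound(root, relax, partition):
    """Steps 1-5 of Table 1 for a maximization problem.

    relax(P) returns (z_R, x_R), with z_R = -inf if the relaxation is infeasible;
    partition(P, x_R) returns the child subproblems, or [] if x_R is integral.
    """
    L = [(math.inf, root)]                 # Step 1: pairs (z-bar_i, IP^i)
    z_ip, x_star = -math.inf, None
    while L:                               # Step 2: stop once L is empty
        L.sort(key=lambda item: item[0])   # best-bound selection (an assumption)
        _, prob = L.pop()                  # Step 3: select and delete IP^i
        z_R, x_R = relax(prob)
        if z_R <= z_ip:                    # Step 4 i): fathom by bounds
            continue
        children = partition(prob, x_R)
        if not children:                   # Step 4 ii): integral, new incumbent
            z_ip, x_star = z_R, x_R
            L = [(zb, p) for zb, p in L if zb > z_ip]
            continue
        L.extend((z_R, child) for child in children)   # Step 5: z-bar_ij = z_i^R
    return z_ip, x_star                    # x_star is None iff (IP) is infeasible


VALUES, WEIGHTS, CAP = [60, 100, 120], [10, 20, 30], 50   # hypothetical data


def knapsack_relax(fixed):
    """LP relaxation of a 0-1 knapsack with the variables in `fixed` set to 0/1."""
    x = [Fraction(0)] * len(VALUES)
    room = Fraction(CAP)
    for j, val in fixed.items():
        x[j] = Fraction(val)
        room -= val * WEIGHTS[j]
    if room < 0:
        return -math.inf, None
    free = sorted((j for j in range(len(VALUES)) if j not in fixed),
                  key=lambda j: Fraction(VALUES[j], WEIGHTS[j]), reverse=True)
    for j in free:                         # Dantzig's greedy rule solves this LP
        x[j] = min(Fraction(1), room / WEIGHTS[j])
        room -= x[j] * WEIGHTS[j]
    return sum(v * xj for v, xj in zip(VALUES, x)), x


def knapsack_partition(fixed, x):
    """Variable dichotomy on the fractional variable (at most one exists here)."""
    frac = [j for j, xj in enumerate(x) if xj.denominator != 1]
    if not frac:
        return []
    j = frac[0]
    return [{**fixed, j: 0}, {**fixed, j: 1}]


print(branch_and_bound({}, knapsack_relax, knapsack_partition))
```

On this instance the root relaxation has value 240, and the search returns the value 220 with items 1 and 2 (0-based) selected.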

The actual implementation of a branch and bound 
algorithm is typically viewed as a tree search, where 
the problem at the root node of the tree is the origi- 
nal (IP). The tree is constructed in an iterative fashion 
with new nodes formed by branching on an existing 
node for which the optimal solution of the relaxation 
is fractional (i.e., some of the integer restricted vari- 
ables have fractional values). Typically, two child nodes 
are formed by selecting a fractional valued variable and 
adding appropriate constraints in each child subprob- 
lem to ensure that the associated constraint sets do not 
include solutions for which this chosen branching vari- 
able assumes the same fractional value. 

The phrase fathoming a node is used in reference to 
criteria that imply that a node need not be explored fur- 
ther. As indicated in Step 4, these criteria include: 

a) the objective value of the subproblem relaxation at 
the node is less than or equal to the incumbent ob- 
jective value; and 
b) the solution for the subproblem relaxation is integer valued.

Note that a) includes the case when the relaxation is infeasible, since in that case its objective value is $-\infty$. Condition b) provides an opportunity to prune the tree: once the incumbent is updated, every active node whose relaxation objective value is less than or equal to the updated incumbent objective value is fathomed as well. The tree search ends when all nodes are fathomed.

A variety of strategies have been proposed for intel- 
ligently selecting branching variables, for problem par- 
titioning, and for selecting nodes to process. However, 
no single collection of strategies stands out as being 
best in all cases. In the remainder of this article, some 
of the strategies that have been implemented or pro- 
posed are summarized. An illustrative example is pre- 
sented. Some of the related computational strategies - 
preprocessing and reformulation, heuristic procedures, 
and the concept of reduced cost fixing - which have 
proved to be highly effective in branch and bound im- 
plementations are considered. Finally, there is a discus- 
sion of recent linear programming based branch and 
bound algorithms that have employed interior point 
methods for the subproblem relaxation solver, which is 
in contrast to using the more traditional simplex-based 
solvers. 

Though branch and bound is a classic approach for 
solving integer programs, there are practical limitations 
to its success in applications. Often integer feasible so- 
lutions are not readily available, and node pruning be- 
comes impossible. In this case, branch and bound fails 
to find an optimal solution due to memory explosion 
as a result of excessive accumulation of active nodes. In 
fact, general integer programs are NP-hard; and conse- 
quently, as of this writing (1998), there exists no known 
polynomial time algorithm for solving general integer 
programs [30]. 

In 1983, a breakthrough in the computational possibilities of branch and bound came as a result of the research by H. Crowder, E.L. Johnson, and M.W. Padberg. In their paper [22], cutting planes were added at the root node to strengthen the LP formulation before branch and bound was called. In addition, features such as reduced cost fixing, heuristics and preprocessing were added within the tree search algorithm to facilitate the solution process. See » Integer programming: Cutting plane algorithms for details on cutting plane applications to integer programming.

Since branch and bound itself is an inherently par- 
allel technique, there has been active research activity 
among the computer science and operations research 
communities in developing parallel algorithms to im- 
prove its solution capability. 

Most commercial integer programming solvers use 
a branch and bound algorithm with linear program- 
ming relaxations. Unless otherwise mentioned, the de- 
scriptions of the strategies discussed herein are based 
on using the linear programming relaxation. 

See [48] for references not listed here; [51] also in- 
cludes useful material about branch and bound. 


Partitioning Strategies 


When linear programming relaxation is employed, partitioning is done via addition of linear constraints. Typically, two new nodes are formed on each division. Suppose $x^R$ is an optimal solution to the relaxation of a branch and bound node. Common partitioning strategies include:

• Variable dichotomy [23]. If $x_j^R$ is fractional, then two new nodes are created, one with the simple bound $x_j \le \lfloor x_j^R \rfloor$ and the other with $x_j \ge \lceil x_j^R \rceil$, where $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ denote the floor and the ceiling of a real number. In particular, if $x_j$ is restricted to be binary, then the branching reduces to fixing $x_j = 0$ and $x_j = 1$, respectively. One advantage of simple bounds is that they maintain the size of the basis among branch and bound nodes, since the simplex method can be implemented to handle both upper and lower bounds on variables without explicitly increasing the dimensions of the basis.

• Generalized-upper-bound dichotomy (GUB dichotomy) [8]. If the constraint $\sum_{j\in Q} x_j = 1$ is present in the original integer program, and $x_j^R$, $j \in Q$, are fractional, one can partition $Q = Q_1 \cup Q_2$ such that $\sum_{j\in Q_1} x_j^R$ and $\sum_{j\in Q_2} x_j^R$ are approximately of equal value. Then two branches can be formed by setting $\sum_{j\in Q_1} x_j = 0$ and $\sum_{j\in Q_2} x_j = 0$, respectively.

• Multiple branches for bounded integer variable. If $x_j^R$ is fractional, and $x_j \in \{0, \ldots, l\}$, then one can create $l + 1$ new nodes, with $x_j = k$ for node $k$, $k = 0, \ldots, l$. This idea was proposed in the first branch and bound algorithm by A.H. Land and A.G. Doig [39], but currently (1998) is not commonly used.
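The three partitioning strategies can be sketched as functions that turn a node's variable bounds into child bounds. In this sketch a node is represented, by assumption, as a dict mapping a variable index $j$ to its bounds `(lo, hi)`. For the GUB dichotomy, the text only asks for two parts of approximately equal LP weight; the greedy balancing rule used below is one illustrative choice, not a prescribed one.

```python
import math


def variable_dichotomy(bounds, j, xj):
    """Children x_j <= floor(xj) and x_j >= ceil(xj); bounds maps j -> (lo, hi)."""
    lo, hi = bounds[j]
    down = {**bounds, j: (lo, math.floor(xj))}
    up = {**bounds, j: (math.ceil(xj), hi)}
    return [down, up]


def gub_dichotomy(Q, x):
    """Split Q, where sum_{j in Q} x_j = 1, into Q1 and Q2 of similar LP weight.

    Branch 1 imposes sum_{j in Q1} x_j = 0 and branch 2 sum_{j in Q2} x_j = 0.
    Greedy balancing (largest x_j first, to the lighter side) is an assumption.
    """
    Q1, Q2, w1, w2 = [], [], 0.0, 0.0
    for j in sorted(Q, key=lambda j: x[j], reverse=True):
        if w1 <= w2:
            Q1.append(j)
            w1 += x[j]
        else:
            Q2.append(j)
            w2 += x[j]
    return Q1, Q2


def multiple_branches(bounds, j, l):
    """Land-Doig style: one child per value x_j = k, k = 0, ..., l."""
    return [{**bounds, j: (k, k)} for k in range(l + 1)]


print(variable_dichotomy({0: (0, 10)}, 0, 2.5))    # [{0: (0, 2)}, {0: (3, 10)}]
print(gub_dichotomy([0, 1, 2, 3], [0.4, 0.3, 0.2, 0.1]))
print(len(multiple_branches({0: (0, 3)}, 0, 3)))    # 4 children
```

For the GUB example the split is `Q1 = [0, 3]`, `Q2 = [1, 2]`, each carrying LP weight 0.5.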


Branching Variable Selection 


During the partitioning process, branching variables must be selected to help create the children nodes. Clearly the choice of a branching variable affects the running time of the algorithm. Many different approaches have been developed and tested on different types of integer programs. Some common approaches are listed below:

• Most/least infeasible integer variable. In this approach, the integer variable whose fractional value is farthest from (closest to) an integral value is chosen as the branching variable.

• Driebeck-Tomlin penalties [25,57]. Penalties give a lower bound on the degradation of the objective value for branching each direction from a given variable. The penalties are the cost of the dual pivot needed to remove the fractional variable from the basis. If many pivots are required to restore primal feasibility, these penalties are not very informative. The up penalty, when forcing the value of the $k$th basic variable up, is

$$u_k = \min_{j:\, a_{kj} < 0} \frac{(1 - f_k)\, \bar c_j}{-a_{kj}},$$

where $f_k$ is the fractional part of $x_k$, $\bar c_j$ is the reduced cost of variable $x_j$, and the $a_{kj}$ are the transformed matrix coefficients from the $k$th row of the optimal dictionary for the LP relaxation. The down penalty $d_k$ is calculated as

$$d_k = \min_{j:\, a_{kj} > 0} \frac{f_k\, \bar c_j}{a_{kj}}.$$

Once the penalties have been computed, a variety of rules can be used to select the branching variable (e.g., $\max_k \max(u_k, d_k)$, or $\max_k \min(u_k, d_k)$). A penalty can be used to eliminate a branch if the LP objective value for the parent node minus the penalty is worse than the incumbent integer solution. Penalties are out of favor because their cost is considered too high for their benefit.

• Pseudocost estimate. Pseudocosts provide a way to estimate the degradation to the objective value by forcing a fractional variable to an integral value. The technique was introduced in 1970 by M. Benichou et al. [10]. Pseudocosts attempt to reflect the total cost, not just the cost of the first pivot, as with penalties. Once a variable $x_k$ is labeled as a candidate branching variable, the pseudocosts are computed as:

$$D_k = \frac{z_k - z_k^d}{f_k}, \qquad U_k = \frac{z_k - z_k^u}{1 - f_k},$$

where $z_k$ is the objective value of the parent, $z_k^u$ is the objective value resulting from forcing up, and $z_k^d$ is the objective value from forcing down. (If the subproblem is infeasible, the associated pseudocost is not calculated.) If a variable has been branched upon repeatedly, an average may be used. The branching variable is chosen as that with the maximum degradation, where the degradation is computed as $D_k f_k + U_k(1 - f_k)$. Pseudocosts are not considered to be beneficial on problems where there is a large percentage of integer variables.
• Pseudoshadow prices. Similar to pseudocosts, pseudoshadow prices estimate the total cost to force a variable to an integral value. Up and down pseudoshadow prices for each constraint and pseudoshadow prices for each integer variable are specified by the user or given an initial value. The degradation in the objective function for forcing an integer variable $x_k$ up or down to an integral value can be estimated. The branching variable is chosen using criteria similar to penalties and pseudocosts. See [27,40] for precise mathematical formulations of this approach.
• Strong branching. This branching strategy arose in connection with research on solving difficult instances of the traveling salesman problem and general mixed zero-one integer programming problems [2,12,13]. Applied to zero-one integer programs within a simplex-based branch and cut setting, strong branching works as follows. Let $N$ and $K$ be positive integers. Given the solution of some linear programming relaxation, make a list of $N$ binary variables that are fractional and closest to 0.5 (if there are fewer than $N$ fractional variables, take all fractional variables). Suppose that $I$ is the index set of this list. Then, for each $i \in I$, fix $x_i$ first to 0 and then to 1 and perform $K$ iterations (starting with the optimal basis for the LP relaxation of the current node) of the dual simplex method with steepest edge pricing. Let $L_i$, $U_i$, $i \in I$, be the objective values that result from these simplex runs, where $L_i$ corresponds to fixing $x_i$ to 0 and $U_i$ to fixing it to 1. A branching variable can be selected based on the best weighted sum of these two values.

• Priorities selection. Variables are selected based on their priorities. Priorities can be user-assigned, or based on objective function coefficients, or on pseudocosts.
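The scoring rules for most infeasible branching, Driebeck-Tomlin penalties and pseudocosts can be written directly from their formulas. The sketch below is illustrative only: the dictionary row, reduced costs and objective values in the usage lines are hypothetical numbers; reduced costs are assumed to be supplied as nonnegative magnitudes; and an empty candidate set in a penalty is reported as `math.inf`, an assumption of this sketch (no dual pivot exists in that direction).

```python
import math


def most_infeasible(x, int_vars):
    """Index in int_vars whose value x[j] is farthest from an integer."""
    def distance(j):
        f = x[j] - math.floor(x[j])
        return min(f, 1 - f)
    return max(int_vars, key=distance)


def driebeck_tomlin(f_k, row, rc):
    """Up and down penalties (u_k, d_k) for the k-th basic variable.

    row maps j -> a_kj (k-th row of the optimal dictionary); rc maps j -> the
    reduced cost of x_j, assumed here to be supplied as a nonnegative magnitude.
    """
    up = [(1 - f_k) * rc[j] / -a for j, a in row.items() if a < 0]
    down = [f_k * rc[j] / a for j, a in row.items() if a > 0]
    return (min(up, default=math.inf), min(down, default=math.inf))


def pseudocosts(z_parent, z_up, z_down, f_k):
    """(D_k, U_k) = ((z_k - z_k^d) / f_k, (z_k - z_k^u) / (1 - f_k))."""
    return (z_parent - z_down) / f_k, (z_parent - z_up) / (1 - f_k)


def degradation(D_k, U_k, f_k):
    """Estimated degradation D_k f_k + U_k (1 - f_k); branch on the largest."""
    return D_k * f_k + U_k * (1 - f_k)


print(most_infeasible([2.5, 3.75], [0, 1]))                     # 0
print(driebeck_tomlin(0.25, {3: -0.5, 4: 2.0, 5: 1.0},
                      {3: 1.0, 4: 4.0, 5: 1.0}))                 # (1.5, 0.25)
D, U = pseudocosts(40.0, 37.0, 38.5, 0.25)
print(D, U, degradation(D, U, 0.25))                              # 6.0 4.0 4.5
```

A selection rule such as $\max_k \min(u_k, d_k)$ or the maximum degradation is then just a `max` over candidate variables with one of these scores as the key.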


Node Selection 


Given a list of active problems, one has to decide which subproblem should be selected to be examined next. This in turn will affect the possibilities of improving the incumbent, the chance of node fathoming, and the total number of problems needed to be solved before optimality is achieved. Below, various strategies given in [7,10,11,20,27,29,31,47] are presented.

• Depth-first search with backtracking. Choose a child of the previous node as the next node; if it is pruned, choose the other child. If this node is also pruned, choose the most recently created unexplored node, which will be the other child node of the last successful node.

• Best bound. Among all unexplored nodes, choose the one which has the best LP objective value. In the case of maximization, the node with the largest LP objective value will be chosen. The rationale is that since nodes can only be pruned when the relaxation objective value is less than the current incumbent objective value, the node with largest LP objective value cannot be pruned, since the best objective value corresponding to an integer feasible solution cannot exceed this largest value.

• Sum of integer infeasibilities. The sum of infeasibilities at a node is calculated as

$$s = \sum_j \min(f_j,\ 1 - f_j).$$

Choose the node with either maximum or minimum sum of integer infeasibilities.

• Best estimate using pseudocosts. This technique was introduced [10] along with the idea of using pseudocosts to select a branching variable. The individual pseudocosts can be used to estimate the resulting integer objective value attainable from node $k$:

$$E_k = z_k - \sum_j \min\big(D_j f_j,\ U_j(1 - f_j)\big),$$

where $z_k$ is the value of the LP relaxation at node $k$. The node with the best estimate is chosen.

• Best estimate using pseudoshadow prices. Pseudoshadow prices can also be used to provide an estimate of the resulting integer objective value attainable from the node, and the node with the best estimate can then be chosen.

• Best projection [29,47]. Choose the node among all unexplored nodes which has the best projection. The projection is an estimate of the objective function value associated with an integer solution obtained by following the subtree starting at this node. It takes into account both the current objective function value and a measure of the integer infeasibility. In particular, the projection $p_k$ associated with node $k$ is defined as

$$p_k = z_k - \frac{s_k(z_0 - z_{ip})}{s_0},$$

where $z_0$ denotes the objective value of the LP at the root node, $z_{ip}$ denotes an estimate of the optimal integer solution, and $s_k$ denotes the sum of the integer infeasibilities at node $k$.

The projection is a weighting between the objective function and the sum of infeasibilities. The weight $(z_0 - z_{ip})/s_0$ corresponds to the slope of the line between node 0 and the node producing the optimal integer solution. It can be thought of as the cost to remove one unit of infeasibility. Let $n_k$ be the number of integer infeasibilities at node $k$. A more general projection formula is to let $w_k = \mu n_k + (1 - \mu) s_k$, where $\mu \in [0, 1]$, and define

$$p_k = z_k - \frac{w_k(z_0 - z_{ip})}{w_0}.$$

Example 1 Here, a two-variable integer program is solved using branch and bound. The most infeasible integer variable is used as the branching variable, and best bound is used for node selection. Consider the problem

$$
\mathrm{IP}^0 \qquad
\begin{array}{ll}
\max & 13x_1 + 8x_2\\
\text{s.t.} & x_1 + 2x_2 \le 10\\
 & 5x_1 + 2x_2 \le 20\\
 & x_1 \ge 0,\ x_2 \ge 0,\\
 & x_1, x_2 \text{ integer}.
\end{array}
$$
Initially, $L$ consists of just this problem $\mathrm{IP}^0$. The solution to the LP relaxation is $x_1^0 = 2.5$, $x_2^0 = 3.75$, with value $z_0^R = 13(2.5) + 8(3.75) = 62.5$. The most infeasible integer variable is $x_1$, so two new subproblems are created, $\mathrm{IP}^1$ where $x_1 \ge 3$ and $\mathrm{IP}^2$ where $x_1 \le 2$, and $L = \{\mathrm{IP}^1, \mathrm{IP}^2\}$.

Both problems in $L$ have the same bound 62.5, so assume the algorithm arbitrarily selects $\mathrm{IP}^1$. The optimal solution to the LP relaxation of $\mathrm{IP}^1$ is $x_1^1 = 3$, $x_2^1 = 2.5$, with value $z_1^R = 59$. The most infeasible integer variable is $x_2$, so two new subproblems of $\mathrm{IP}^1$ are created, $\mathrm{IP}^3$ where $x_2 \ge 3$ and $\mathrm{IP}^4$ where $x_2 \le 2$, and now $L = \{\mathrm{IP}^2, \mathrm{IP}^3, \mathrm{IP}^4\}$.

The algorithm next examines $\mathrm{IP}^2$, since this is the problem with the best bound. The optimal solution to the LP relaxation is $x_1^2 = 2$, $x_2^2 = 4$, with value $z_2^R = 58$. Since $x^2$ is integral feasible, $z_{ip}$ can be updated to 58 and $\mathrm{IP}^2$ is fathomed.

Both of the two problems remaining in $L$ have best bound greater than 58, so neither can yet be fathomed. Since these two subproblems have the same bound 59, assume the algorithm arbitrarily selects $\mathrm{IP}^3$ to examine next. The LP relaxation to this problem is infeasible, since it requires that $x$ satisfy $x_1 \ge 3$, $x_2 \ge 3$ and $5x_1 + 2x_2 \le 20$ simultaneously. Therefore, $z_3^R = -\infty$, and this node can be fathomed by bounds since $z_3^R \le z_{ip}$.

That leaves the single problem $\mathrm{IP}^4$ in $L$. The solution to the LP relaxation of this problem is $x_1^4 = 3.2$, $x_2^4 = 2$, with value $z_4^R = 57.6$. Since $z_4^R < z_{ip}$, this subproblem can also be fathomed by bounds. The set $L$ is now empty, so $x^2 = (2, 4)$ is optimal for the integer programming problem $\mathrm{IP}^0$.

The progress of the algorithm is indicated in Fig. 1. Each box contains the name of the subproblem, the solution to the LP relaxation, and the value of the solution.
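Example 1 can be replayed in a few lines. The sketch below is an illustration, not the procedure used to produce Fig. 1: it solves each two-variable LP relaxation exactly by enumerating the vertices of the feasible polygon with rational arithmetic (valid here because every feasible region is bounded), branches on the most infeasible variable, and selects nodes by best bound. Ties are broken in favour of the earlier-created node, which happens to reproduce the example's arbitrary choices; the helper names are hypothetical.

```python
import heapq
import math
from fractions import Fraction
from itertools import combinations

C = (13, 8)
# Each constraint (a1, a2, r) means a1*x1 + a2*x2 <= r; the last two encode x >= 0.
BASE = [(1, 2, 10), (5, 2, 20), (-1, 0, 0), (0, -1, 0)]


def solve_lp_2d(cons):
    """Exact LP in two variables by vertex enumeration (bounded region assumed).

    Returns (value, (x1, x2)), or None when the region is empty.
    """
    best = None
    for (a, b, r), (p, q, s) in combinations(cons, 2):
        det = a * q - b * p
        if det == 0:
            continue                          # parallel constraint lines
        x1 = Fraction(r * q - b * s, det)     # Cramer's rule
        x2 = Fraction(a * s - r * p, det)
        if all(u * x1 + w * x2 <= t for u, w, t in cons):
            val = C[0] * x1 + C[1] * x2
            if best is None or val > best[0]:
                best = (val, (x1, x2))
    return best


def example_1():
    """Most infeasible branching with best-bound node selection."""
    z_ip, incumbent, order, trace = -math.inf, None, 0, []
    heap = [(-math.inf, order, BASE)]        # key -(bound); the root bound is +inf
    while heap:
        neg_bound, _, cons = heapq.heappop(heap)
        if -neg_bound <= z_ip:
            continue                          # pruned after the incumbent improved
        lp = solve_lp_2d(cons)
        trace.append(None if lp is None else lp[0])
        if lp is None or lp[0] <= z_ip:
            continue                          # infeasible (value -inf) or bounded out
        val, x = lp
        f = [xi - math.floor(xi) for xi in x]
        if all(fi == 0 for fi in f):
            z_ip, incumbent = val, x          # integral: update the incumbent
            continue
        j = max(range(2), key=lambda i: min(f[i], 1 - f[i]))   # most infeasible
        e = (1, 0) if j == 0 else (0, 1)
        up = cons + [(-e[0], -e[1], -math.ceil(x[j]))]         # x_j >= ceil(x_j)
        down = cons + [(e[0], e[1], math.floor(x[j]))]         # x_j <= floor(x_j)
        for child in (up, down):              # children inherit the parent bound
            order += 1
            heapq.heappush(heap, (-val, order, child))
    return z_ip, incumbent, trace


print(example_1())
```

The returned trace lists the LP values of $\mathrm{IP}^0, \mathrm{IP}^1, \mathrm{IP}^2, \mathrm{IP}^3$ (infeasible, `None`) and $\mathrm{IP}^4$ in the order visited, and the incumbent is $(2, 4)$ with value 58.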


Preprocessing and Reformulation 


Problem preprocessing and reformulation has been 
shown to be a very effective way of improving inte- 
ger programming formulations prior to and during 
branch and bound [14,15,18,19,22,24,33,34,35,54]. Be- 
low, some commonly employed preprocessing tech- 
niques are listed. For more details on these procedures, 
see the references. 
1) Removal of empty (all zeros) rows and columns. De- 
tection of implicit bounds and implicit slack vari- 
ables. 


Integer Programming: Branch and Bound Methods, Figure 1
Branch and bound example


2) Removal of rows dominated by multiples of other 
rows, including pairs of rows for which the support 
of one is a subset of the support of the other. 

3) Strengthen the bounds within rows by comparing 
individual variables and coefficients to the right- 
hand side. Additional strengthening may be possible 
for integral variables using rounding. 

4) Use variable bounds to determine upper and lower 
bounds for the left-hand side of a constraint, 
and compare these bounds to the right-hand side. 
Where possible, conclude that a constraint is incon- 
sistent, redundant, or forces the fixing of some or all 
variables in its support. Several of these row-driven 
operations can be dualized to columns. 

5) Aggregation: Given an equality constraint where the bound on some variable is implied by the satisfaction of the bounds on the other variables, this variable can be substituted out, and the constraint deleted. Note that free variables always satisfy this condition. Note also that in order to control fill-in (and coefficient growth), not all such substitutions may be desirable. For integral variables, there is the added restriction that they can be eliminated only if their integrality is implied by the integrality of the remaining variables. For integer programming problems, an added advantage of aggregation relative to LPs is that the reduction in the number of equality constraints increases the relative dimension of the underlying polytope.

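6) Coefficient reduction: Consider a constraint $\sum_{j\in K} a_j x_j \ge b$ in which all $a_j \ge 0$ and all $x_j \ge 0$. If $x_j$ is a 0-1 variable and $a_j > b$ for some $j \in K$, replace $a_j$ by $b$; any solution with $x_j = 1$ satisfies the constraint both before and after the change, and solutions with $x_j = 0$ are unaffected. A stronger version of this procedure is possible when the problem formulation involves other constraints of appropriate structure.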

7) Logical implications and probing: 

a) Logical implications: Choose a binary variable $x_k$ and fix it to 0 or 1. Perform the bound analysis of 4). This analysis may yield logical implications such as $x_k = 1$ implies $x_j = 0$, or $x_k = 1$ implies $x_j = 1$, for some other variable $x_j$. The implied equality is then added as an explicit constraint.

b) Probing: Perform logical implications recur- 
sively. An efficient implementation of probing 
appears to be very difficult. Details of compu- 
tational issues regarding probing are discussed 
in [33], and [54]. 
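Two of the row-based reductions above, the activity-bound test of 4) and the coefficient reduction of 6), fit in a few lines. The sketch below assumes a single row given as a coefficient list with explicit variable bounds; the classification labels and function names are hypothetical, and the "forcing" case only flags the row (fixing the variables at their min-activity bounds is left to the caller).

```python
def activity_bounds(a, lo, hi):
    """Min and max of sum_j a_j x_j over the box lo <= x <= hi."""
    amin = sum(aj * (l if aj >= 0 else h) for aj, l, h in zip(a, lo, hi))
    amax = sum(aj * (h if aj >= 0 else l) for aj, l, h in zip(a, lo, hi))
    return amin, amax


def classify_row(a, b, lo, hi):
    """Item 4) for a row a.x <= b: compare the activity bounds with b."""
    amin, amax = activity_bounds(a, lo, hi)
    if amin > b:
        return "inconsistent"
    if amax <= b:
        return "redundant"
    if amin == b:
        return "forcing"   # each x_j with a_j != 0 must sit at its min-activity bound
    return "undecided"


def reduce_coefficients(a, b, is_binary):
    """Item 6) for a row a.x >= b with a >= 0, x >= 0: clip binary a_j > b to b."""
    return [b if (is_binary[j] and aj > b) else aj for j, aj in enumerate(a)]


print(classify_row([1, 2], 5, [0, 0], [1, 1]))             # redundant
print(reduce_coefficients([7, 3, 2], 5, [True, True, False]))   # [5, 3, 2]
```

The third coefficient in the last example is not clipped because it belongs to a non-binary variable and is already below $b$.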


Heuristics 


Heuristic procedures provide a means for obtaining integer feasible solutions quickly, and can be used repeatedly within the branch and bound search tree. A good heuristic (one that produces good integer feasible solutions) is a crucial component in the branch and bound algorithm since it provides an incumbent objective value $z_{ip}$, which is a lower bound on the optimal value of the maximization problem (IP), for reduced cost fixing (discussed later) at the root, and thus allows reduction in the size of the linear program that must be solved. This in turn may reduce the time required to solve subsequent linear programs at nodes within the search tree. In addition, a good incumbent increases the likelihood of being able to fathom active nodes, which is extremely important when solving large-scale integer programs, as they tend to create many active nodes, leading to memory explosion.

Broadly speaking, five ideas are commonly used in 
developing heuristics. The first idea is that of greed- 
iness. Greedy algorithms work by successively choos- 
ing variables based on best improvement in the objec- 
tive value. Kruskal’s algorithm [37], which is an exact 
algorithm for finding the minimum-weight spanning 
tree in a graph, is one of the most well-known greedy 


algorithms. Greedy algorithms have been applied to 
a variety of problems, including 0-1 knapsack prob- 
lems [36,41,53], uncapacitated facility location prob- 
lems [38,56], set covering problems [3,4], and the trav- 
eling salesman problem [52]. 
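As a concrete instance of greediness, a 0-1 knapsack problem (maximize $\sum_j v_j x_j$ subject to $\sum_j w_j x_j \le W$, $x$ binary) can be attacked by taking items in decreasing order of value per unit weight while they still fit. This is a minimal sketch on hypothetical data, not one of the cited algorithms in particular.

```python
def greedy_knapsack(values, weights, cap):
    """Greedy heuristic: take items by decreasing value/weight while they fit."""
    order = sorted(range(len(values)), key=lambda j: values[j] / weights[j],
                   reverse=True)
    chosen, load = [], 0
    for j in order:                 # best value-per-weight first
        if load + weights[j] <= cap:
            chosen.append(j)
            load += weights[j]
    return sorted(chosen), sum(values[j] for j in chosen)


print(greedy_knapsack([60, 100, 120], [10, 20, 30], 50))   # ([0, 1], 160)
```

On this instance the greedy value is 160, whereas the optimum is 220 (items 1 and 2), which illustrates why such a solution serves as an incumbent rather than a proof of optimality.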

A second idea is that of local search, which involves searching in a local neighborhood of a given integer feasible solution for a feasible solution with a better objective value. The k-interchange heuristic is a classic example of a local search heuristic [38,44,46]. Simulated annealing is another example, but with a bit of a twist. It allows, with a certain probability, updated solutions with less favorable objective values in order to increase the likelihood of escaping from a local optimum [16].
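A simple local search for the 0-1 knapsack swaps one chosen item for one unchosen item whenever the swap stays within capacity and strictly increases the objective. The sketch below illustrates the interchange idea with $k = 1$ on hypothetical data; the neighborhood and first-improvement rule are assumptions, not a specific method from [38,44,46].

```python
from itertools import product


def one_interchange(values, weights, cap, chosen):
    """Local search: swap one chosen item for one unchosen item while it helps.

    Each accepted swap strictly increases the objective, so the loop terminates.
    """
    chosen = set(chosen)
    improved = True
    while improved:
        improved = False
        load = sum(weights[j] for j in chosen)
        outside = set(range(len(values))) - chosen
        for i, j in product(sorted(chosen), sorted(outside)):
            if (load - weights[i] + weights[j] <= cap
                    and values[j] > values[i]):
                chosen = (chosen - {i}) | {j}   # first improving swap
                improved = True
                break
    return sorted(chosen), sum(values[j] for j in chosen)


print(one_interchange([60, 100, 120], [10, 20, 30], 50, [0, 1]))   # ([1, 2], 220)
```

Starting from the feasible solution {0, 1} with value 160, a single swap reaches {1, 2} with value 220, after which no improving swap remains.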

Randomized enumeration is a third idea that is used 
to obtain integer feasible solutions. One such method is 
that of genetic algorithms (cf. » Genetic algorithms), 
where the randomness is modeled on the biological 
mechanisms of evolution and natural selection [32]. Re- 
cent work on applying a genetic algorithm to the set 
covering problem can be found in [9]. 

The term primal heuristics refers to certain LP- 
based procedures for constructing integral feasible so- 
lutions from points that are in some sense good, but fail 
to satisfy integrality. Typically, these nonintegral points 
are obtained as optimal solutions of LP relaxations. Pri- 
mal heuristic procedures involve successive variable fix- 
ing and rounding (according to rules usually governed 
by problem structure) and subsequent re-solves of the 
modified primal LP [6,12,14,34,35]. 

The fifth general principle is that of exploiting the 
interplay between primal and dual solutions. For exam- 
ple, an optimal or heuristic solution to the dual of an 
LP relaxation may be used to construct a heuristic so- 
lution for the primal (IP). Problem dependent crite- 
ria based on the generated primal-dual pair may sug- 
gest seeking an alternative heuristic solution to the dual, 
which would then be used to construct a new heuris- 
tic solution to the primal. Iterating back-and-forth be- 
tween primal and dual heuristic solutions would con- 
tinue until an appropriate termination condition is sat- 
isfied [21,26,28]. 

It is not uncommon that a heuristic involves more 
than one of these ideas. For example, pivot-and- 
complement is a simplex-based heuristic in which bi- 
nary variables in the basis are pivoted out and replaced 
by slack variables. When a feasible integer solution is 
obtained, the algorithm performs a local search in an
attempt to obtain a better integer feasible solution [5]. 
Within a branch and bound implementation, the design of an effective heuristic is strongly influenced by the structure of the problems that the implementation targets [2,12,13,14,22,26,34,35,43].


Continuous Reduced Cost Implications 


Reduced cost fixing is a well-known and important idea in the literature of integer programming [22]. Given an optimal solution to an LP relaxation, the reduced costs $\bar c_j$ are nonpositive for all nonbasic variables $x_j$ at their lower bounds, and nonnegative for all nonbasic variables at their upper bounds. Let $x_j$ be a nonbasic variable in a continuous optimal solution having objective value $z_{LP}$, and let $z_{IP}$ be the objective value associated with an integer feasible solution to (IP). The following are true:

a) If $x_j$ is at its lower bound in the continuous solution and $z_{LP} - z_{IP} < -\bar c_j$, then there exists an optimal solution to the integer program with $x_j$ at its lower bound.

b) If $x_j$ is at its upper bound in the continuous solution and $z_{LP} - z_{IP} < \bar c_j$, then there exists an optimal solution to the integer program with $x_j$ at its upper bound.
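The two rules can be applied mechanically once an LP optimum and an incumbent are known. The Python sketch below assumes, consistently with the sign conventions above, that (IP) is a maximization problem, so that $z_{IP} \le z_{LP}$. The reasoning is that moving an integer variable one unit away from its bound lowers the LP bound by at least $|\bar c_j|$, which would push it below the incumbent value. The input format and the name `reduced_cost_fixing` are hypothetical.

```python
def reduced_cost_fixing(z_lp, z_ip, nonbasic):
    """Return the nonbasic variables that may be fixed at their current bound.

    Assumes a maximization problem (so z_ip <= z_lp), with reduced costs
    nonpositive at lower bounds and nonnegative at upper bounds.
    `nonbasic` is a list of (name, status, reduced_cost) triples, where
    status is "lower" or "upper".
    """
    gap = z_lp - z_ip
    fixed = {}
    for name, status, cbar in nonbasic:
        if status == "lower" and gap < -cbar:
            fixed[name] = "lower"   # rule a)
        elif status == "upper" and gap < cbar:
            fixed[name] = "upper"   # rule b)
    return fixed


# Hypothetical data: z_LP = 100 and incumbent z_IP = 96, so the gap is 4.
print(reduced_cost_fixing(100, 96, [
    ("x1", "lower", -5),   # 4 < 5: fix x1 at its lower bound
    ("x2", "lower", -3),   # 4 < 3 fails: x2 stays free
    ("x3", "upper", 6),    # 4 < 6: fix x3 at its upper bound
    ("x4", "upper", 4),    # 4 < 4 fails: x4 stays free
]))
```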

When reduced cost fixing is applied to the root node 
of a branch and bound tree, variables which are fixed 
can be removed from the problem, resulting in a re- 
duction in the size of the integer program. A vari- 
ety of studies have examined the effectiveness of re- 
duced cost fixing within the branch and bound tree 
search [12,14,22,34,35,49,50]. 


Subproblem Solver 


When linear programs are employed as the relaxations within a branch and bound algorithm, it is common to use a simplex-based algorithm to solve each subproblem, using the dual simplex method to reoptimize from the optimal basis of the parent node. This advanced basis technique has been shown to reduce the number of simplex iterations needed to solve the child node to optimality, and thus to speed up the overall computation. Recently, with advances in computational technology, the increase in the size of integer programs, and the success of interior point methods (cf. also » Linear programming: Interior point methods) in solving large scale linear programs [1,45], some branch and bound algorithms employ interior point algorithms as the linear programming solver [17,42,43,55]. In this case, an advanced basis is no longer available, and care must be taken to exploit warmstart vectors for the interior point solver in order to obtain effective computational results. The ideas of ‘advanced warmstart’ and corresponding computational results are presented in [42,43].
Branch and cut methods are exact algorithms for integer programming problems. They consist of a combination of a cutting plane method (cf. » Integer programming: Cutting plane algorithms) with a branch and
bound algorithm (cf. » Integer programming: Branch 
and bound methods). These methods work by solving 
a sequence of linear programming relaxations of the 
integer programming problem. Cutting plane methods 
improve the relaxation of the problem to more closely 
approximate the integer programming problem, and 
branch and bound algorithms proceed by a sophisti- 
cated divide-and-conquer approach to solve problems. 
The material in this entry builds on the material con- 
tained in the entries on cutting plane and branch and 
bound methods. 

Perhaps the best known branch and cut algorithms 
are those that have been used to solve traveling sales- 
man problems. This approach is able to solve and prove 
optimality of far larger instances than other methods. 
Two papers that describe some of this research and 
also serve as good introductions to the area of branch 
and cut algorithms are [21,32]. A more recent work on 
the branch and cut approach to the traveling salesman 
problem is [1]. Branch and cut methods have also been 
used to solve other combinatorial optimization prob- 
lems; recent references include [8,10,13,23,24,26]. For 
these problems, the cutting planes are typically derived 
from studies of the polyhedral combinatorics of the corresponding integer program. This enables the addition
of strong cutting planes (usually facet defining inequal- 
ities), which make it possible to considerably reduce the 
size of the branch and bound tree. Far more detail about 
these strong cutting planes can be found in » Integer
programming: Cutting plane algorithms. 

Branch and cut methods for general integer pro- 
gramming problems are also of great interest (see, for 
example, the papers [4,7,11,16,17,22,28,30]). It is usually not possible to efficiently solve a general integer programming problem using just a cutting plane approach, and it is therefore necessary also to branch,
resulting in a branch and cut approach. A pure branch 
and bound approach can be sped up considerably by 
the employment of a cutting plane scheme, either just 
at the top of the tree, or at every node of the tree. 

For general problems, the specialized facets used 
when solving a specific combinatorial optimization 
problem are not available. Some useful families of gen- 
eral inequalities have been developed; these include cuts 
based on knapsack problems [17,22,23], Gomory cut- 
ting planes [5,12,19,20], lift and project cutting planes 
[3,4,29,33], and Fenchel cutting planes [9]. All of these 
families of cutting planes are discussed in more detail 
later in this entry. 

The software packages MINTO [30] and ABACUS 
[28] implement branch and cut algorithms to solve in- 
teger programming problems. The packages use stan- 
dard linear programming solvers to solve the relax- 
ations and they have a default implementation avail- 
able. They also offer the user many options, including 
how to add cutting planes and how to branch. 


Example 1 Consider the integer programming problem

$$
\begin{aligned}
\min\ & -5x_1 - 6x_2 \\
\text{s.t.}\ & x_1 + 2x_2 \le 7 \\
& 2x_1 - x_2 \le 3 \\
& x_1, x_2 \ge 0 \text{ and integer.}
\end{aligned}
$$


This problem is illustrated in Fig. 1. The feasible inte- 
ger points are indicated. The linear programming re- 
laxation (or LP relaxation) is obtained by ignoring the 
integrality restrictions; this is given by the polyhedron 
contained in the solid lines. 




Integer Programming: Branch and Cut Algorithms, Figure 1 
A branch-and-cut example 


The first step in a branch and cut approach is to solve the linear programming relaxation, which gives the point (2.6, 2.2), with value −26.2. There is now a choice: should the LP relaxation be improved by adding a cutting plane, for example, $x_1 + x_2 \le 4$, or should the problem be divided into two by splitting on a variable?

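Assume the algorithm makes the second choice, and further assume that the decision is to split on $x_2$, giving two new problems:

$$
\begin{aligned}
\min\ & -5x_1 - 6x_2 \\
\text{s.t.}\ & x_1 + 2x_2 \le 7 \\
& 2x_1 - x_2 \le 3 \\
& x_2 \ge 3 \\
& x_1, x_2 \ge 0 \text{ and integer,}
\end{aligned}
$$

and

$$
\begin{aligned}
\min\ & -5x_1 - 6x_2 \\
\text{s.t.}\ & x_1 + 2x_2 \le 7 \\
& 2x_1 - x_2 \le 3 \\
& x_2 \le 2 \\
& x_1, x_2 \ge 0 \text{ and integer.}
\end{aligned}
$$

The optimal solution to the original problem will be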
the better of the solutions to these two subproblems. 
The solution to the linear programming relaxation of the first subproblem is (1, 3), with value −23. Since this solution is integral, it solves the first subproblem. This solution becomes the incumbent best known feasible solution. The optimal solution for the linear programming relaxation of the second subproblem is (2.5, 2), with value −24.5. Since this point is nonintegral, it does not solve the subproblem. Therefore, the second subproblem must be attacked further.

It is possible to branch using $x_1$ in the second subproblem; instead, assume the algorithm uses a cutting plane approach and adds the inequality $x_1 + 2x_2 \le 6$. This is a valid inequality, in that it is satisfied by every integral point that is feasible in the second subproblem. Further, this inequality is a cutting plane: it is violated by (2.5, 2). Adding this inequality to the relaxation and resolving gives the optimal solution (2.4, 1.8), with value −22.8. The subproblem still does not have an integral solution. However, notice that the optimal value for this modified relaxation is larger than the value −23 of the incumbent solution. The value of the optimal integral solution to the second subproblem must be at least as large. Therefore, the incumbent solution is better than any feasible integral solution for the second subproblem, so it actually solves the original problem.
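The LP values quoted in this example can be checked with a few lines of Python. The sketch below is not a general LP solver: it solves each two-variable relaxation exactly by enumerating intersections of pairs of constraint lines (using `fractions.Fraction` to avoid rounding), which is only valid for tiny problems with a nonempty, bounded feasible region. The helper name `solve_lp_2d` and the encoding of each constraint as a triple (p, q, r) meaning p·x1 + q·x2 ≤ r are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def solve_lp_2d(c, cons):
    """Minimize c[0]*x1 + c[1]*x2 subject to p*x1 + q*x2 <= r for (p, q, r) in cons.

    Brute-force vertex enumeration with exact arithmetic: only suitable for
    tiny two-variable examples with a nonempty, bounded feasible region.
    """
    best = None
    for (a1, a2, b), (d1, d2, e) in combinations(cons, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue  # parallel lines: no unique intersection point
        # Cramer's rule for the two constraints holding with equality.
        x1 = Fraction(b * d2 - a2 * e, det)
        x2 = Fraction(a1 * e - b * d1, det)
        if all(p * x1 + q * x2 <= r for p, q, r in cons):
            value = c[0] * x1 + c[1] * x2
            if best is None or value < best[0]:
                best = (value, (x1, x2))
    return best


C = (-5, -6)                                  # min -5x1 - 6x2
ROOT = [(1, 2, 7), (2, -1, 3), (-1, 0, 0), (0, -1, 0)]
UP = ROOT + [(0, -1, -3)]                     # first subproblem: x2 >= 3
DOWN = ROOT + [(0, 1, 2)]                     # second subproblem: x2 <= 2
DOWN_CUT = DOWN + [(1, 2, 6)]                 # plus the cut x1 + 2x2 <= 6

for name, cons in [("root", ROOT), ("x2 >= 3", UP),
                   ("x2 <= 2", DOWN), ("x2 <= 2, cut", DOWN_CUT)]:
    value, (x1, x2) = solve_lp_2d(C, cons)
    print(f"{name}: x = ({float(x1)}, {float(x2)}), value = {float(value)}")
```

Running it should reproduce the root solution (2.6, 2.2), the branch solutions (1, 3) and (2.5, 2), and the post-cut solution (2.4, 1.8), with values −26.2, −23, −24.5, and −22.8.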

Of course, there are several issues to be resolved 
with this algorithm, including the major questions of 
deciding whether to branch or to cut and deciding how 
to branch and how to generate cutting planes. Notice 
that the cutting plane introduced in the second sub- 
problem is not valid for the first subproblem. This in- 
equality can be modified to make it valid for the first 
subproblem by using a lifting technique, which is dis- 
cussed later in this entry. 


A Standard Form 


To fix notation, the following problem is regarded as 
the standard form mixed integer linear programming 
problem in this entry: 


$$
\begin{aligned}
\min\ & c^T x \\
\text{s.t.}\ & Ax \le b \\
& x \ge 0 \\
& x_i \text{ integer}, \quad i = 1, \dots, p.
\end{aligned}
$$


Here, x and c are n-vectors, b is an m-vector, and A is an 
m x n matrix. The first p variables are restricted to be in- 
teger, and the remainder may be fractional. If p = n then 
this is an integer programming problem. If a variable is
restricted to take the values 0 or 1 then it is a binary 
variable. If all variables are binary then the problem is 
a binary program. 


Primal Heuristics 


In the example problem, it was possible to prune the 
second subproblem by bounds, once an appropriate 
cutting plane had been found. The existence of a good 
incumbent solution made it possible to prune in this 
way. In this case, the solution to the linear program- 
ming relaxation of the first subproblem was integral, 
providing the good incumbent solution. In many cases, 
it takes many stages until the solution to a relaxation 
is integral. Therefore, it is often useful to have good 
heuristics for converting the fractional solution to a re- 
laxation into a good integral solution that can be used 
to prune other subproblems. Primal heuristics are discussed further in » Integer programming: Branch and bound methods.


Preprocessing 


A very important component of a practical branch and 
cut algorithm is preprocessing to eliminate unnecessary 
constraints, determine any fixed variables, and simplify 
the problem in other ways. Preprocessing techniques 
are discussed in » Integer programming: Branch and
bound methods. 


Families of Cutting Planes 


Perhaps the first family of cutting planes for general 
mixed integer programming problems was that of Chvátal-Gomory cutting planes [15,19,20]. These inequalities
can be derived from the final tableau of the linear pro- 
gramming relaxation, and they are discussed in more 
detail in » Integer programming: Cutting plane algorithms. These cuts can be useful if they are applied in
a computationally efficient manner [5,12]. Gomory cuts 
can contain a large number of nonzeros, so care is re- 
quired to ensure that the LP relaxation does not be- 
come very hard with large memory requirements. The 
cuts are generated directly from the basis inverse, so 
care must also be taken to avoid numerical difficulties.

One of the breakthroughs in the development of 
branch and cut algorithms was the paper by H.P. Crow- 
der, E.L. Johnson, and M.W. Padberg [17]. This paper 
showed that it was possible to solve far larger general 
problems than had previously been thought possible. 
The algorithm used extensive preprocessing, good primal heuristics, and cutting planes derived from knapsack problems with binary variables. Any inequality in binary variables can be represented as a knapsack inequality $\sum_{j \in N} a_j x_j \le b$ with all $a_j > 0$ for some subset N of the variables, eliminating variables or replacing a variable $x_j$ by $1 - x_j$ as necessary. The facial structure of the knapsack polytope can then be used to derive valid inequalities for the problem. For example, if $R \subseteq N$ with $\sum_{j \in R} a_j > b$ (R is then called a cover), then $\sum_{j \in R} x_j \le |R| - 1$ is a valid inequality. Further, if R is a minimal such set, so that deleting any member of R leaves a sum of coefficients no larger than b, then the inequality defines a facet of the corresponding knapsack polytope in which the variables outside R are fixed at zero. Other inequalities can be
derived from the knapsack polytope. These inequalities 
have been extended to knapsacks with general integer 
variables and one continuous variable [11] and to bi- 
nary problems with generalized upper bounds [34]. 
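Cover inequalities are easy to test by brute force on a small knapsack. The Python sketch below checks the cover and minimality conditions stated above and verifies the resulting inequality against all binary solutions. The knapsack $11x_1 + 6x_2 + 6x_3 + 5x_4 + 5x_5 + 4x_6 + x_7 \le 19$ and the function names are hypothetical, and the enumeration is only practical for a few variables.

```python
from itertools import combinations, product


def is_cover(a, b, R):
    """R is a cover if its coefficients sum to more than b."""
    return sum(a[j] for j in R) > b


def is_minimal_cover(a, b, R):
    """Minimal: removing any single member leaves a sum of at most b."""
    return is_cover(a, b, R) and all(
        sum(a[j] for j in R if j != k) <= b for k in R)


def minimal_covers(a, b):
    """All minimal covers of sum_j a_j x_j <= b (brute force, small n only)."""
    n = len(a)
    return [set(R) for size in range(1, n + 1)
            for R in combinations(range(n), size)
            if is_minimal_cover(a, b, R)]


def cover_inequality_is_valid(a, b, R):
    """Check sum_{j in R} x_j <= |R| - 1 on every binary knapsack solution."""
    for x in product((0, 1), repeat=len(a)):
        if sum(aj * xj for aj, xj in zip(a, x)) <= b:
            if sum(x[j] for j in R) > len(R) - 1:
                return False
    return True


# Hypothetical knapsack 11x1 + 6x2 + 6x3 + 5x4 + 5x5 + 4x6 + x7 <= 19
# (indices are 0-based in the code).
a, b = [11, 6, 6, 5, 5, 4, 1], 19
R = {2, 3, 4, 5}
print(is_minimal_cover(a, b, R), cover_inequality_is_valid(a, b, R))
```

The set {2, 3, 4, 5} is a minimal cover (6 + 5 + 5 + 4 = 20 > 19), so $x_3 + x_4 + x_5 + x_6 \le 3$ is valid in the 1-based notation of the text.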

Another family of useful inequalities is that of lift-and-project or disjunctive inequalities. These were originally introduced by E. Balas [2], and it is only in the last few years that the value of these cuts has become apparent for general integer programming problems [3,4]. Given the feasible region for a binary programming problem $S := \{x : Ax \le b,\ x_i \in \{0, 1\}\ \forall i\}$, each variable can be used to generate a set of disjunctive inequalities. Let $S_j^0 := \{x : Ax \le b,\ 0 \le x_i \le 1\ \forall i,\ x_j = 0\}$ and $S_j^1 := \{x : Ax \le b,\ 0 \le x_i \le 1\ \forall i,\ x_j = 1\}$. Then $S \subseteq S_j^0 \cup S_j^1$, so valid inequalities for S can be generated by finding valid inequalities for the convex hull of $S_j^0 \cup S_j^1$. These inequalities are generated by solving linear programming problems. Because of the expense, the cuts are
usually only generated at the root node. Nonetheless, 
they can be very effective computationally. 

Other general cutting planes have been developed. 
The paper [16] describes several families and discusses
routines for identifying violated inequalities. Fenchel 
cutting planes, which are generated using ideas from 
Lagrangian duality and convex duality, are introduced 
in [9]. 


When to Add Cutting Planes 


The computational overhead of searching for cutting 
planes can be prohibitive. Therefore, it is common not to
search at every node of the tree. Alternatives include
searching at every eighth node, say, or at every node at 
a depth of a multiple of eight in the tree. 

Generally, at each node of the branch and bound 
tree, the linear programming relaxation is solved, cut- 
ting planes are found, these are added to the relax- 
ation, and the process is repeated. Usually, there comes 
a point at which the process tails off, that is, the solu- 
tion to one relaxation is not much better than the so- 
lutions to the recent relaxations. It is then advisable to 
stop work on this node and branch. Tailing off is a func- 
tion of lack of knowledge about the polyhedral struc- 
ture of the relaxation, rather than a fundamental weak- 
ness of the cutting plane approach [32]. In some imple- 
mentations, a fixed number of rounds of cutting plane 
searching are performed at a node, with perhaps several 
rounds performed at the root node, and fewer rounds 
performed lower in the tree. 
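The scheduling rules mentioned in this section can be written as small predicates. The Python sketch below is hypothetical throughout: the period of 8 comes from the text, but the round limits (20 at the root, 3 elsewhere), the window of 3 rounds, and the relative threshold 1e-3 used to detect tailing off are arbitrary illustrative values, not recommendations from the cited work.

```python
def search_cuts_here(node_count, depth, period=8):
    """Hypothetical rule: search for cuts at every `period`-th node processed,
    or at any node whose depth is a multiple of `period` (the root has depth 0)."""
    return node_count % period == 0 or depth % period == 0


def rounds_allowed(depth, root_rounds=20, tree_rounds=3):
    """More rounds of cut generation at the root, fewer deeper in the tree."""
    return root_rounds if depth == 0 else tree_rounds


def tailing_off(bounds, window=3, min_rel_gain=1e-3):
    """Detect tailing off from the LP bounds after each round at one node.

    `bounds` holds the relaxation value after each round (nondecreasing for a
    minimization problem). Returns True when the total improvement over the
    last `window` rounds is below `min_rel_gain` in relative terms, which is
    the signal to stop cutting and branch.
    """
    if len(bounds) <= window:
        return False
    old, new = bounds[-1 - window], bounds[-1]
    return (new - old) / max(1.0, abs(old)) < min_rel_gain


print(tailing_off([10, 12, 12.0001, 12.0002, 12.0003]))   # small gains: branch
print(tailing_off([10, 11, 12, 13, 14]))                  # still improving
```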

The cut-and-branch variant adds cutting planes 
only at the root node of the tree. Usually, an implemen- 
tation of such a method will expend a great deal of effort 
on generating cutting planes, requiring time far greater 
than just solving the relaxation at the root. The benefits 
of cut-and-branch include:
• all generated cuts are valid throughout the tree, since they are valid at the root;
• bookkeeping is reduced, since the relaxations are identical at each node;
• no time is spent generating cutting planes at other nodes.

Cut-and-branch is an excellent technique for many general integer programs, but it lacks the power of branch and cut for some hard problems. See [16] for more discussion of the relative computational performance of cut-and-branch and branch and cut.


Lifting Cuts 


A cut added at one node of the branch and cut tree may 
well not be valid for another subproblem. Of course, it is 
not necessary to add the cut at any other node, in which 
case the cut is called a local cut. This cut will then only 
affect the current subproblem and its descendants. The 
drawback to such an approach is the potential memory requirement of needing to store a different version
of the problem for each node of the tree. In order to 
make a cut valid throughout the tree (or global), it is 
necessary to lift it. 

Lifting can be regarded as a method of rotating 
a constraint. Returning to the example problem once again, the constraint $x_1 + 2x_2 \le 6$ is valid if $x_2 \le 2$. To extend this constraint so that it is also valid when $x_2 \ge 3$, consider the inequality

$$x_1 + 2x_2 + \alpha (x_2 - 2) \le 6.$$

It is desired to take α as large as possible while ensuring that this is a valid inequality. If $x_2 = 3$ then $x_1 \le 1$, so the inequality is valid for $x_2 = 3$ provided $\alpha \le -1$. If $x_2 = 1$, the inequality is valid provided $\alpha \ge -2$. Finally, the inequality is valid when $x_2 = 0$ provided $\alpha \ge -2.5$. Combining these conditions gives the valid range $-2 \le \alpha \le -1$. The two extreme choices $\alpha = -1$ and $\alpha = -2$ give the valid inequalities $x_1 + x_2 \le 4$ and $x_1 \le 2$, respectively. Other valid choices for α give inequalities that are convex combinations of these two. In this way, valid inequalities for one node of the tree can be extended to valid inequalities at any node.
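The range for α can be double-checked by brute force. The Python sketch below enumerates every integer point that is feasible for the original example problem and intersects the bounds each point imposes on α in $x_1 + 2x_2 + \alpha(x_2 - 2) \le 6$. The function names are illustrative choices.

```python
from fractions import Fraction


def feasible_integer_points():
    """Integer points of x1 + 2x2 <= 7, 2x1 - x2 <= 3, x1, x2 >= 0."""
    # x1 + 2x2 <= 7 with x >= 0 implies x1 <= 7 and x2 <= 3.
    return [(x1, x2) for x1 in range(8) for x2 in range(4)
            if x1 + 2 * x2 <= 7 and 2 * x1 - x2 <= 3]


def lifting_range():
    """Interval [lo, hi] of alpha keeping x1 + 2x2 + alpha*(x2 - 2) <= 6 valid."""
    lo, hi = None, None
    for x1, x2 in feasible_integer_points():
        slack = 6 - x1 - 2 * x2      # need alpha * (x2 - 2) <= slack
        d = x2 - 2
        if d > 0:                    # gives an upper bound on alpha
            bound = Fraction(slack, d)
            hi = bound if hi is None else min(hi, bound)
        elif d < 0:                  # dividing by d < 0 flips the inequality
            bound = Fraction(slack, d)
            lo = bound if lo is None else max(lo, bound)
        elif slack < 0:
            raise ValueError("x1 + 2x2 <= 6 is violated at a point with x2 = 2")
    return lo, hi


print(lifting_range())   # → (Fraction(-2, 1), Fraction(-1, 1))
```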

See [11] for more discussion of lifting in the case of 
general mixed integer linear programming problems. It 
is often not possible to lift inequalities for such prob- 
lems because the upper and lower bounds on the coeffi- 
cients conflict. Of course, if an inequality is valid at the 
root node then it is valid throughout the tree so there 
is no need to lift. This is one of the reasons why general inequalities such as Chvátal-Gomory cuts or lift-and-project cuts are often more successfully employed
in a cut-and-branch approach. 

The method of calculating coefficients in the case of 
binary problems is now outlined — see [31] for more de- 
tails. The inequality generated at a node in the tree will 
generally only use the variables that are not fixed at that 
node. Lifting can be used to make the inequality valid at 
any node of the tree. It is necessary to apply the lifting 
process for each variable that has been fixed at the node, 
examining the opposite value for that variable. For ex- 
ample, if the inequality 


$$\sum_{j \in J} a_j x_j \le h \quad \text{for some subset } J \subseteq \{1, \dots, n\}$$

is valid at a node where $x_k$ ($k \notin J$) has been fixed to zero, the lifted inequality takes the form

$$\sum_{j \in J} a_j x_j + a_k x_k \le h$$

for some scalar $a_k$. This scalar should be maximized in order to make the inequality as strong as possible. Now, maximizing $a_k$ requires solving another integer
program, so it may be necessary to make an approxi- 
mation. This process has to be applied successively to 
each variable that has been fixed at the node. The or- 
der in which the variables are examined may well affect 
the final inequality, and other valid inequalities can be 
obtained by lifting more than one variable at a time. 
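For a single knapsack constraint the "other integer program" can be written explicitly: if $x_k$ was fixed at zero, the largest valid coefficient is $a_k = h - \max\{\sum_{j} a_j x_j : x \text{ knapsack feasible},\ x_k = 1\}$, where the maximum is taken with all variables not yet lifted still at zero. The Python sketch below computes these coefficients one variable at a time by brute-force enumeration, so it is exact but only usable for a handful of variables; practical codes approximate this maximization, as noted above. The knapsack data, the starting cover inequality, and the name `lift_sequentially` are hypothetical illustrations.

```python
from itertools import product


def lift_sequentially(weights, capacity, coeffs, rhs, order):
    """Lift sum_j coeffs[j] x_j <= rhs for the knapsack sum_j weights[j] x_j <= capacity.

    Each k in `order` is a binary variable that was fixed at 0; it receives
    coefficient rhs - max{current left hand side : knapsack feasible, x_k = 1},
    where variables not yet lifted stay at 0. Brute force: small n only.
    Assumes weights[k] <= capacity, so x_k = 1 is feasible on its own.
    """
    coeffs = dict(coeffs)
    for k in order:
        support = list(coeffs)
        best = 0
        for bits in product((0, 1), repeat=len(support)):
            load = weights[k] + sum(weights[j] * b for j, b in zip(support, bits))
            if load <= capacity:
                value = sum(coeffs[j] * b for j, b in zip(support, bits))
                best = max(best, value)
        coeffs[k] = rhs - best   # largest coefficient keeping the cut valid
    return coeffs


# Hypothetical knapsack 11x1 + 6x2 + 6x3 + 5x4 + 5x5 + 4x6 + x7 <= 19
# (0-based indices in the code) and the cover inequality x3 + x4 + x5 + x6 <= 3.
weights = [11, 6, 6, 5, 5, 4, 1]
lifted = lift_sequentially(weights, 19, {2: 1, 3: 1, 4: 1, 5: 1}, 3, order=[0, 1, 6])
# In 1-based notation: 2x1 + x2 + x3 + x4 + x5 + x6 + 0x7 <= 3.
print(sorted(lifted.items()))
```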


Implementation Details 


Many details of tree management can be found in » Integer programming: Branch and bound methods. This
includes node selection, branching variable selection, 
and storage requirements, among other issues. Typi- 
cally, a branch and bound algorithm stores the solution 
to a node as a list of the indices of the basic variables. 
Branch and cut algorithms may require more storage if 
cuts are added locally, because it is then necessary to be 
able to recreate the current relaxation at any active node 
with just the appropriate constraints. If cuts are added 
globally, then it suffices to store a single representation 
of the problem. 

It is possible to fix variables using information about 
reduced costs and the value of the best known feasible 
integral solution, as described in » Integer program- 
ming: Cutting plane algorithms. Once variables have 
been fixed in this way, it is often possible to fix addi- 
tional variables using logical implications. In order to 
fully exploit the fixing of variables, parent node recon- 
struction [32] is performed as follows. Once a parent 
node has been selected, it is not immediately divided 
into two children, but is solved again using the cutting 
plane algorithm. When the cutting plane procedure ter- 
minates, the optimal reduced cost vector has been re- 
constructed and this is used to perform variable fixing. 

Many branch and cut implementations use a pool 
of cuts [32]. This is typically a set of constraints that 
have been generated earlier and either not included in 
the relaxation or subsequently dropped because they no 
longer appeared to be active. It is easy to check these 
cuts for violation and this is usually done before more
involved separation routines are invoked. The pool of 
cuts also makes it possible to reconstruct the parent 
node more efficiently, partly because difficulties with 
tailing off are reduced. 
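Checking a cut pool amounts to evaluating stored inequalities at the current LP point, which is why it is cheap compared with a separation routine. A minimal Python sketch follows; the (coeffs, rhs) representation, the tolerance, and the function name are hypothetical, and the pool holds three inequalities that are valid for the example problem $x_1 + 2x_2 \le 7$, $2x_1 - x_2 \le 3$ used in these entries.

```python
def violated_pool_cuts(pool, x, tol=1e-9):
    """Return the pool cuts violated by the point x.

    Each cut is (coeffs, rhs), representing sum_j coeffs[j] * x[j] <= rhs.
    `tol` avoids flagging cuts that are satisfied up to rounding error.
    """
    return [(a, r) for a, r in pool
            if sum(aj * xj for aj, xj in zip(a, x)) > r + tol]


# Pool: x1 + x2 <= 4, x1 <= 2, x2 <= 3.
pool = [((1, 1), 4), ((1, 0), 2), ((0, 1), 3)]
print(violated_pool_cuts(pool, (2.0, 2.5)))   # only x1 + x2 <= 4 is violated
```

Violated pool cuts would then be added back to the relaxation before any more expensive separation routine is called.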


Solving Large Problems 


The difficulty of a particular integer programming 
problem is not purely a function of the size of the prob- 
lem. There are problems in the MIPLIB test set [6] with 
just a few hundred variables that prove resistant to stan- 
dard solution approaches. The difficulty is caused by an 
explosion in the size of the tree. 

For some problems, difficulties are caused by the 
size of the LP relaxation, and interior point methods 
may be useful in such cases. Interior point methods 
are superior to simplex methods for many linear pro- 
gramming problems with thousands of variables. How- 
ever, restarting is harder with an interior point method 
than with a simplex method when the relaxation is 
only slightly changed. Therefore, for very large prob- 
lems, the first relaxation at the top node of the tree can 
be solved using an interior point method, and subse- 
quent relaxations can be solved using the (dual) simplex 
method. For some problems, the relaxations are just too 
large to be handled with a simplex method. For exam- 
ple, the relaxations of the quadratic assignment problem 
given in [25] were solved using interior point methods. 
Interior point methods also handle degeneracy better 
than the simplex method. Therefore, the branch and 
cut solver described in [1] occasionally uses an interior 
point method to handle some subproblems. 

One way to enable the solution of far larger prob- 
lems is to use a parallel computer. The nature of branch 
and cut and branch and bound algorithms makes it pos- 
sible for them to exploit coarse grain parallel computers 
efficiently: typically, a linear programming relaxation is 
solved on a node of the computer. It is possible to use 
one node to manage the distribution of linear programs 
to nodes. Alternatively, methods have been developed 
where a common data structure is maintained and all 
nodes access this data structure to obtain a relaxation 
that requires solution, for example [18]. For a discus- 
sion of parallel branch and cut algorithms, see [7,27]. 
It is also possible to generate cutting planes in parallel; 
see, for example, [14]. 


Conclusions 


Branch and cut methods have been successfully used 
to solve both specialized integer programming prob- 
lems such as the traveling salesman problem and ve- 
hicle scheduling, and also general integer program- 
ming problems. In both cases, these methods are the 
most promising techniques available for proving op- 
timality. For specialized problems, cutting planes are 
derived using the polyhedral theory of the underlying 
problem. For general mixed integer linear program- 
ming problems, important components of an efficient 
implementation include preprocessing, primal heuristics, routines for generating cutting planes such as lift-and-project or Gomory's rounding procedure or cuts
derived from knapsack problems, and also routines for 
lifting constraints to strengthen them. This is an active 
research area, with refinements and developments be- 
ing continuously discovered. 
Cutting plane methods are exact algorithms for integer
programming problems. They have proven to be very 
useful computationally in the last few years, especially 
when combined with a branch and bound algorithm 
(cf. » Integer programming: Branch and bound meth- 
ods) in a branch and cut framework (cf. » Integer pro- 
gramming: Branch and cut algorithms). These methods 
work by solving a sequence of linear programming re- 
laxations of the integer programming problem. The re- 
laxations are gradually improved to give better approx- 
imations to the integer programming problem, at least 
in the neighborhood of the optimal solution. For hard 
instances that cannot be solved to optimality, cutting 
plane algorithms can produce approximations to the 
optimal solution in moderate computation times, with 
guarantees on the distance to optimality. 

Cutting plane algorithms have been used to solve 
many different integer programming problems, including the traveling salesman problem [1,15,33], the linear ordering problem [16,29,30], maximum cut problems [4,28,36], and packing problems [18,31]. See [22]
for a survey of applications of cutting plane methods, 
as well as a guide to the successful implementation 
of a cutting plane algorithm. G.L. Nemhauser and L. 
Wolsey [32] provide an excellent and detailed descrip- 
tion of cutting plane algorithms and the other material 
in this entry, as well as other aspects of integer pro- 
gramming. The book [34] and also the more recent ar- 
ticle [35] are excellent sources of additional material. 

Cutting plane algorithms for general integer pro- 
gramming problems were first proposed by R.E. Go- 
mory in [12,13]. Unfortunately, the cutting planes pro- 
posed by Gomory did not appear to be very strong, 
leading to slow convergence of these algorithms, so the 
algorithms were neglected for many years. The devel- 
opment of polyhedral theory and the consequent intro- 
duction of strong, problem specific cutting planes led to 
a resurgence of cutting plane methods in the 1980s, and 
cutting plane methods are now the method of choice 
for a variety of problems, including the traveling sales- 
man problem. Recently, there has also been some re- 
search showing that the original cutting planes pro- 
posed by Gomory can actually be useful. There has also 
been research on other types of cutting planes for gen- 
eral integer programming problems. Current research 
is focused on developing cutting plane algorithms for 
a variety of hard combinatorial optimization problems, 
and on solving large instances of integer programming 
problems using these methods. All of these issues are 
discussed below. 


Example 1 Consider, for example, the integer pro- 
gramming problem 


$$
\begin{aligned}
\min\ & -2x_1 - x_2 \\
\text{s.t.}\ & x_1 + 2x_2 \le 7 \\
& 2x_1 - x_2 \le 3 \\
& x_1, x_2 \ge 0 \text{ and integer.}
\end{aligned}
$$
This problem is illustrated in Fig. 1. The feasible inte- 
ger points are indicated. The linear programming re- 
laxation (or LP relaxation) is obtained by ignoring the 
integrality restrictions; this is given by the polyhedron 
contained in the solid lines. The boundary of the convex 
hull of the feasible integer points is indicated by dashed 
lines. 
Integer Programming: Cutting Plane Algorithms, Figure 1
A cutting plane example 


If a cutting plane algorithm were used to solve this 
problem, the linear programming relaxation would first 
be solved, giving the point $x_1 = 2.6$, $x_2 = 2.2$, which has value −7.4. The inequalities $x_1 + x_2 \le 4$ and $x_1 \le 2$ are satisfied by all the feasible integer points but they are violated by the point (2.6, 2.2). Thus, these two inequalities are valid cutting planes. These two constraints can then be added to the relaxation, and when the relaxation is solved again, the point $x_1 = 2$, $x_2 = 2$ results, with value −6. Notice that this point is feasible in the
original integer program, so it must actually be optimal 
for that problem, since it is optimal for a relaxation of 
the integer program. 

If, instead of adding both inequalities, just the inequality $x_1 \le 2$ had been added, the optimal solution to the new relaxation would have been $x_1 = 2$, $x_2 = 2.5$, with value −6.5. The relaxation could then have been modified by adding a cutting plane that separates this point from the convex hull, for example $x_1 + x_2 \le 4$.
Solving this new relaxation will again result in the opti- 
mal solution to the integer program. This illustrates the 
basic structure of a cutting plane algorithm: 

• Solve the linear programming relaxation.
• If the solution to the relaxation is feasible in the integer programming problem, STOP with optimality.
• Otherwise, find one or more cutting planes that separate the optimal solution to the relaxation from the convex hull of feasible integral points, and add a subset of these constraints to the relaxation.
• Return to the first step.
Typically, the first relaxation is solved using the primal 
simplex algorithm. After the addition of cutting planes, 
the current primal iterate is no longer feasible. How- 
ever, the dual problem is only modified by the addition 
of some variables. If these extra dual variables are given 
the value 0 then the current dual solution is still dual 
feasible. Therefore, subsequent relaxations are solved 
using the dual simplex method. 
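The loop described above can be sketched in Python for the two-variable example. Two simplifications are assumptions of this sketch: each relaxation is solved exactly by enumerating vertices (only sensible for two variables and a bounded region), not by the dual simplex method, and the separation step merely scans a fixed, hand-supplied list of known valid inequalities instead of generating cuts. The names `solve_lp_2d` and `cutting_plane` are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def solve_lp_2d(c, cons):
    """Minimize c . x over {x in R^2 : p*x1 + q*x2 <= r for (p, q, r) in cons}.

    Exact vertex enumeration; assumes the feasible region is nonempty and bounded.
    """
    best = None
    for (a1, a2, b), (d1, d2, e) in combinations(cons, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue  # parallel constraint lines
        x = (Fraction(b * d2 - a2 * e, det), Fraction(a1 * e - b * d1, det))
        if all(p * x[0] + q * x[1] <= r for p, q, r in cons):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value < best[0]:
                best = (value, x)
    return best


def cutting_plane(c, cons, cut_pool, max_rounds=10):
    """Basic cutting plane loop; returns (status, value, x) of the last LP solved."""
    cons = list(cons)
    for _ in range(max_rounds):
        value, x = solve_lp_2d(c, cons)
        if all(xi.denominator == 1 for xi in x):
            return "optimal", value, x          # integral LP optimum solves the IP
        cuts = [cut for cut in cut_pool
                if cut[0] * x[0] + cut[1] * x[1] > cut[2]]
        if not cuts:
            return "no cut found", value, x     # a branch and cut code would branch here
        cons += cuts                             # add the violated cuts and resolve
    return "round limit", value, x


# Example 1: min -2x1 - x2, x1 + 2x2 <= 7, 2x1 - x2 <= 3, x1, x2 >= 0.
BASE = [(1, 2, 7), (2, -1, 3), (-1, 0, 0), (0, -1, 0)]
print(cutting_plane((-2, -1), BASE, [(1, 0, 2), (1, 1, 4)]))   # optimal at (2, 2), value -6
print(cutting_plane((-2, -1), BASE, [(1, 0, 2)]))              # stalls at (2, 2.5), value -6.5
```

With only $x_1 \le 2$ available, the loop stalls at (2, 2.5), which mirrors the discussion above: a further cut such as $x_1 + x_2 \le 4$ (or branching) is needed.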

Notice that the values of the relaxations provide 
lower bounds on the optimal value of the integer pro- 
gram. These lower bounds can be used to measure 
progress towards optimality, and to give performance 
guarantees on integral solutions. 


Totally Unimodular Matrices 


Consider the integer program 
$$\min \{c^T x : Ax = b,\ 0 \le x \le u,\ x \text{ integer}\},$$


where A is an m x n matrix, c, x, and u are n-vectors, 
and b is an m-vector. A cutting plane method attempts 
to refine a linear programming relaxation until it gives 
a good approximation of the convex hull of feasible in- 
teger points, at least in the region of the optimal solu- 
tion. In some settings, the solution to the initial linear 
programming relaxation $\min \{c^T x : Ax = b,\ 0 \le x \le u\}$ may give the optimal solution to the integer program.
This is guaranteed to happen when b and u are integral and the constraint matrix A is totally unimodular, that is, the determinant of every square submatrix of A is 0, +1, or −1. Examples of totally unimodular matrices include the node-arc incidence matrix of a directed graph, the node-edge incidence matrix of a bipartite undirected graph,
and interval matrices (where each row of A consists 
of a possibly empty set of zeroes followed by a set of 
ones followed by another possibly empty set of zeros). 
It therefore suffices to solve the linear programming re- 
laxation of maximum flow problems and shortest path 
problems on directed graphs, the assignment problem, 
and some problems that involve assigning workers to 
shifts, among others. 
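The definition of total unimodularity can be checked directly on small matrices. The following Python sketch is only a brute-force illustration, not a practical test: it examines every square submatrix, so its running time is exponential. The helper names `det` and `is_totally_unimodular` and the two example matrices are our own choices.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Exact determinant of a square matrix by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def is_totally_unimodular(A):
    """Check every square submatrix of A (exponential time: tiny matrices only)."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Node-arc incidence matrix of the directed graph with arcs 0->1, 1->2, 0->2.
directed = [[1, 0, 1], [-1, 1, 0], [0, -1, -1]]
# Node-edge incidence matrix of an undirected triangle (an odd cycle, so not bipartite).
triangle = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
print(is_totally_unimodular(directed))   # True
print(is_totally_unimodular(triangle))   # False: the full determinant is 2
```

The triangle fails precisely because its graph is not bipartite, which matches the examples listed above.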
Chvátal-Gomory Cutting Planes


One method of generating cutting planes involves
combining together inequalities from the current
description of the linear programming relaxation. This
process is known as integer rounding, and the cutting
planes generated are known as Chvátal-Gomory cutting
planes. Integer rounding was implicitly described
by Gomory in [12,13], and described explicitly by
V. Chvátal in [7].

Consider again the example problem given earlier. 
The first step is to take a weighted combination of the 
inequalities. For example, 


$$0.2\,(x_1 + 2x_2 \le 7) + 0.4\,(2x_1 - x_2 \le 3)$$

gives the valid inequality for the relaxation:

$$x_1 \le 2.6.$$


In any feasible solution to the integer programming
problem, the left-hand side of this inequality must take
an integer value. Therefore, the right-hand side can be
rounded down to give the following valid inequality for
the integer programming problem:


$$x_1 \le 2.$$


This process can be modified to generate additional
inequalities. For example, taking the combination
0.5(x1 + 2x2 ≤ 7) + 0(2x1 − x2 ≤ 3) gives
0.5x1 + x2 ≤ 3.5, which is valid for the relaxation.
Since all the variables are constrained to be
nonnegative, rounding down the coefficients on the
left-hand side of this inequality will only weaken it,
giving x2 ≤ 3.5, also valid for the LP relaxation. Now
rounding down the right-hand side gives x2 ≤ 3, which
is valid for the integer programming problem, even
though it is not valid for the LP relaxation.
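The integer rounding step can be written as a short function. This is a minimal sketch assuming the constraints are given as Ax ≤ b with x ≥ 0 and integral, and a nonnegative multiplier for each row; exact `Fraction` arithmetic avoids floating-point errors when rounding down. The name `chvatal_gomory_cut` is ours.

```python
from fractions import Fraction
from math import floor


def chvatal_gomory_cut(A, b, u):
    """Return (coeffs, rhs) of the cut floor(uA) x <= floor(ub).

    Assumes x >= 0 and integral, and u >= 0 (one multiplier per row of Ax <= b).
    Rounding the coefficients down is valid because x >= 0; rounding the
    right-hand side down is valid because the left-hand side is then integral.
    """
    if any(ui < 0 for ui in u):
        raise ValueError("multipliers must be nonnegative")
    n = len(A[0])
    combo = [sum(Fraction(ui) * row[j] for ui, row in zip(u, A)) for j in range(n)]
    rhs = sum(Fraction(ui) * bi for ui, bi in zip(u, b))
    return [floor(c) for c in combo], floor(rhs)


A = [[1, 2], [2, -1]]  # x1 + 2x2 <= 7, 2x1 - x2 <= 3
b = [7, 3]
print(chvatal_gomory_cut(A, b, [Fraction(1, 5), Fraction(2, 5)]))  # ([1, 0], 2): x1 <= 2
print(chvatal_gomory_cut(A, b, [Fraction(1, 2), 0]))               # ([0, 1], 3): x2 <= 3
```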

Gomory originally derived constraints using the op- 
timal simplex tableau. The LP relaxation of the simple 
example above can be expressed in equality form as: 

$$\begin{aligned}\min\ & -2x_1 - x_2\\ \text{s.t.}\ & x_1 + 2x_2 + x_3 = 7\\ & 2x_1 - x_2 + x_4 = 3\\ & x_i \ge 0,\quad i = 1,\dots,4\end{aligned}$$


Notice that if x1 and x2 take integral values, then so
must x3 and x4. Solving this LP using the simplex
algorithm gives the optimal tableau (the first row is
the objective function row):


     RHS |  x1    x2    x3     x4
     7.4 |  0     0     0.8    0.6
     2.2 |  0     1     0.4   −0.2
     2.6 |  1     0     0.2    0.4


The rows of the tableau are linear combinations of
the original objective function and constraints, and
cutting planes can be generated using them. The
objective function row states that
−2x1 − x2 = −7.4 + 0.8x3 + 0.6x4; since the objective
value is integral in any integral feasible solution, it
implies that 0.8x3 + 0.6x4 ≥ 0.4. It can be seen that
this is equivalent to requiring that 2x1 + x2 ≤ 7, by
substituting for x3 and x4 from the equality form given
above. It is also possible to generate constraints from
the other rows of the tableau. For example, the first
constraint row of the tableau is equivalent to the
equality 2.2 = x2 + 0.4x3 − 0.2x4. Replacing each
coefficient on the right-hand side of this equation by
its fractional part gives 0.4x3 + 0.8x4 (the fractional
part of −0.2 is 0.8). In any feasible integral solution
this expression must be at least as large as the
fractional part 0.2 of the left-hand side, giving the
valid cutting plane 0.4x3 + 0.8x4 ≥ 0.2, which is
equivalent to x1 ≤ 2.5. Similarly, the final row of the
tableau can be used to generate the constraint
0.2x3 + 0.4x4 ≥ 0.6, or equivalently x1 ≤ 2. In
practice, the cut added to the tableau should be
expressed in the nonbasic variables, here x3 and x4,
since the tableau will then be in standard form for the
dual simplex algorithm.
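The fractional cuts from the constraint rows can be recomputed from the optimal basis alone. The sketch below assumes the basis {x1, x2} of the equality form, builds B⁻¹ with the 2 × 2 inverse formula (the slack columns form the identity, so the x3, x4 entries of each tableau row are a row of B⁻¹), and applies the fractional-part rule to each row. The names `frac` and `gomory_cut` are illustrative.

```python
from fractions import Fraction
from math import floor


def frac(q):
    """Fractional part q - floor(q); always lies in [0, 1), also for negative q."""
    return q - floor(q)


def gomory_cut(row, rhs):
    """Gomory fractional cut from a tableau row x_B + sum_j a_j x_j = rhs.

    Returns (f, f0) describing the cut sum_j f[j] * x_j >= f0 in the
    nonbasic variables x_j.
    """
    return [frac(a) for a in row], frac(rhs)


# Basis matrix: columns of the basic variables x1, x2 in the equality form.
B = [[Fraction(1), Fraction(2)], [Fraction(2), Fraction(-1)]]
b = [Fraction(7), Fraction(3)]
d = B[0][0] * B[1][1] - B[0][1] * B[1][0]
B_inv = [[B[1][1] / d, -B[0][1] / d], [-B[1][0] / d, B[0][0] / d]]
# Tableau rows over the nonbasic slacks x3, x4, and right-hand sides B^{-1} b.
rows = {"x1": B_inv[0], "x2": B_inv[1]}
rhs = {name: sum(r * bi for r, bi in zip(row, b)) for name, row in rows.items()}

for name in ("x2", "x1"):
    f, f0 = gomory_cut(rows[name], rhs[name])
    print(name, float(rhs[name]), [float(v) for v in f], ">=", float(f0))
# x2 2.2 [0.4, 0.8] >= 0.2
# x1 2.6 [0.2, 0.4] >= 0.6
```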

Gomory's cutting plane algorithm solves an integer
program by solving the LP relaxation to optimality,
generating a cutting plane from a row of the tableau
if necessary, adding this additional constraint to the
relaxation, solving the new relaxation, and repeating
until the solution to the relaxation is integral. It was
shown in [13] that if a cutting plane is always generated
from the first possible row, then Gomory's cutting plane
algorithm will solve an integer program in a finite
number of iterations.

Unfortunately, this finite convergence tends to be
slow in practice. However, it was shown in [3,6] that
Gomory's cutting plane algorithm can be made
competitive with other methods if certain techniques
are used, such as adding many Chvátal-Gomory cuts
at once.

It follows from the finite convergence of Gomory's
cutting plane algorithm that every valid inequality for
the convex hull of feasible integral points is either
generated by repeated application of integer rounding
or is dominated by an inequality generated in such
a way. There are many different ways to generate
a given inequality using integer rounding. The Chvátal
rank of a valid inequality is the minimum number of
successive applications of the integer rounding
procedure that are needed in order to generate the
inequality. Note that, for example, a rank 2 inequality
may be generated by applying the integer rounding
procedure to a large number of rank 1 and rank 0
inequalities.

It was shown in [26] that Gomory cutting planes 
can be generated even when an interior point method 
is used to solve the LP relaxations, because much of the 
information in the simplex tableau can still be obtained 
easily. 


Strong Cutting Planes From Polyhedral Theory 


The resurgence of interest in cutting plane algorithms 
in the 1980s was due to the development of polyhedral 
combinatorics and the consequent implementation of 
cutting plane algorithms that used facets of the convex 
hull of integral feasible points as cuts. A facet is a face
of a polytope that has dimension one less than the
dimension of the polytope. Facets are the indispensable
inequalities: any complete linear inequality description
of a full-dimensional polytope must contain an
inequality that represents each facet.

In the example above, the convex hull of the set of 
feasible integer points has dimension 2, and all of the 
dashed lines represent facets. The valid inequality
x1 + 2x2 ≤ 7 represents a face of the convex hull of
dimension 0, namely the point (1, 3).

If a complete description of the convex hull of the 
set of integer feasible points is known, then the inte- 
ger problem can be solved as a linear programming 
problem by minimizing the objective function over this 
convex hull. Unfortunately, it is not easy to get such 
a description. In fact, for an NP-complete problem [11] 
(cf. also ▸ NP-complete problems and proof methodology),
such a description must contain an exponential
number of facets, unless P = NP. 

The paper [22] contains a survey of problems that 
have been solved using strong cutting plane algorithms. 
Typically in these algorithms, first a partial polyhedral 
description of the convex hull of the set of integer fea- 
sible points is determined. This description will usu- 
ally contain families of facets of certain types. Separa- 
tion routines for these families can often be developed; 
such a routine will take as input a point (for example, 
the optimal solution to the LP relaxation), and return 


as output violated constraints from the family, if any 
exist. 

The prototypical combinatorial optimization prob- 
lem that has been successfully attacked using cutting 
plane methods is the traveling salesman problem. In this 
problem, a set of cities is provided along with distances 
between the cities. A route that visits each city exactly 
once and returns to the original city is called a tour. 
It is desired to choose the shortest tour. This problem 
has many applications, including printed circuit board 
(PCB) production: a PCB needs holes drilled in certain 
places to hold electronic components such as resistors, 
diodes, and integrated circuits. These holes can be re- 
garded as the cities, and the objective is to minimize the 
total distance traveled by the drill. 

The traveling salesman problem can be represented 
on a graph, G = (V, E), where V is the set of vertices 
(or cities) and E is the set of edges (or links between 
the cities). Each edge e ∈ E has an associated cost (or
length) c_e. If the incidence vector x is defined by

$$x_e = \begin{cases} 1 & \text{if edge } e \text{ is used},\\ 0 & \text{otherwise},\end{cases}$$

then the traveling salesman problem can be formulated
as

$$\min\Big\{\sum_{e \in E} c_e x_e : x \text{ the incidence vector of a tour}\Big\}.$$


Notice that for a tour, at each vertex the sum of the edge 
variables must be two; this is called a degree constraint. 
This leads to the relaxation of the traveling salesman 
problem: 


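$$\begin{aligned}\min\ & \sum_{e \in E} c_e x_e\\ \text{s.t.}\ & \sum_{e \in \delta(v)} x_e = 2 \quad \text{for all vertices } v\\ & x_e = 0 \text{ or } 1 \quad \text{for all edges } e.\end{aligned}$$

Here, δ(v) denotes the set of all edges incident to
vertex v. All tours are feasible in this formulation, but it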
also allows infeasible solutions corresponding to sub- 
tours, consisting of several distinct unconnected loops. 
To force the solution to be a tour, it is necessary to in- 
clude subtour elimination constraints of the form 


$$\sum_{e \in \delta(U)} x_e \ge 2$$

for every subset U ⊂ V with cardinality 2 ≤ |U| ≤ |V|/2,
where δ(U) denotes the set of edges with exactly one
endpoint in U. Any feasible solution to the relaxation 
given above which also satisfies the subtour elimina- 
tion constraints must be the incidence vector of a tour. 
Unfortunately, the number of subtour elimination con- 
straints is exponential in the number of cities. This led 
G.B. Dantzig et al. to propose a cutting plane algorithm 
in [9], where the subtour elimination constraints are 
added as cutting planes as necessary. 
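For an integral solution that satisfies the degree constraints, separating the subtour elimination constraints reduces to finding the connected components of the graph formed by the edges with x_e = 1: if there is more than one component, each component U gives a violated constraint (the constraint for U coincides with the one for its complement, so components larger than |V|/2 are fine too). The sketch below assumes that integral setting only; fractional points require a minimum cut computation, which is not shown. The function names and the six-city data are hypothetical.

```python
def support_components(n, x):
    """Connected components of the graph on vertices 0..n-1 formed by edges with x_e = 1."""
    adj = {v: [] for v in range(n)}
    for (i, j), val in x.items():
        if val > 0.5:                      # edge used in the integral solution
            adj[i].append(j)
            adj[j].append(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:                       # depth-first search
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)
    return components


def violated_subtour_sets(n, x):
    """Sets U whose subtour elimination constraint is violated (empty list for a tour)."""
    comps = support_components(n, x)
    return comps if len(comps) > 1 else []


def cut_value(U, x):
    """Left-hand side of the constraint: sum of x_e over edges with exactly one end in U."""
    return sum(val for (i, j), val in x.items() if (i in U) != (j in U))


two_subtours = {(0, 1): 1, (1, 2): 1, (0, 2): 1, (3, 4): 1, (4, 5): 1, (3, 5): 1}
for U in violated_subtour_sets(6, two_subtours):
    print(sorted(U), cut_value(U, two_subtours))   # [0, 1, 2] 0, then [3, 4, 5] 0
```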

The degree constraints and the subtour elimination 
constraints, together with the simple bounds
0 ≤ x_e ≤ 1, are still not sufficient to describe the
convex hull of the incidence vectors of tours. This
approach of [9] has been extended in recent years by the
incorporation of additional families of cutting planes;
see, for example, [25,15,33].

Thus, cutting plane algorithms can be used even 
when the integer programming formulation of the 
problem has an exponential number of constraints. 
Similar ideas are used in papers on the matching prob- 
lem [10,14], maximum cut problems [4,28,36], and the 
linear ordering problem [16,30], among others. The pi- 
oneering work of J. Edmonds on the matching problem 
gave a complete description of the matching polytope, 
and this work was used in subsequent algorithms; it was 
also an inspiration to future work on many other prob- 
lems and even to the formulation of complexity theory 
and the concept of a ‘good’ algorithm. 


Alternative General Cutting Planes 


A knapsack problem is an integer programming prob- 
lem with just one linear inequality constraint. A gen- 
eral integer programming problem can be regarded as 
the intersection of several knapsack problems, one for 
each constraint. This observation was used in [8,19,20] 
to solve general integer programming problems. The 
approach consists of finding facets and strong cutting 
planes for the knapsack problem and adding these con- 
straints to the LP relaxation of the integer program as 
cutting planes. 

There has been interest recently in other families of 
cutting planes for general integer programming prob- 
lems. Two such families of cuts are lift-and-project cuts 
[2] and Fenchel cuts [5]. To find a cut of this type, it 
is generally necessary to solve a linear programming 
problem. 


These alternative general cutting planes are not usually
strong enough on their own to solve an integer
programming problem, and they are most successfully
employed in branch and cut algorithms for integer
programming; they are discussed in more detail in
▸ Integer programming: Branch and cut algorithms.


Fixing Variables 


If the reduced cost of a nonbasic variable is sufficiently 
large at the optimal solution to an LP relaxation, then 
that variable must take its current value in any optimal 
solution to the integer programming problem. To make 
this more precise, suppose the binary variable x_j takes
value zero in the optimal solution to an LP relaxation
and that the reduced cost of this variable is r_j. The
optimal value of the relaxation gives a lower bound z on
the optimal value of the integer programming problem.
The value z_UB of the best known feasible integral
solution provides an upper bound on the optimal value.
Any feasible point in the relaxation with x_j = 1 must
have value at least z + r_j, so such a point cannot be
optimal if r_j > z_UB − z. Similar tests can be derived for
nonbasic variables at their upper bounds. It is also pos- 
sible to fix variables when an interior point method is 
used to solve the relaxations [26]. 
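The fixing test amounts to one comparison per variable. Below is a minimal sketch for a minimization problem, assuming the reduced costs of the binary variables that are nonbasic at zero are available in a dictionary; the function name and the numbers in the example are made up for illustration.

```python
def fixable_at_zero(reduced_costs, z_lp, z_ub):
    """Reduced-cost fixing for a minimization problem.

    reduced_costs maps each binary variable that is nonbasic at value 0 in the
    LP optimum to its reduced cost r_j >= 0. z_lp is the LP value (a lower bound)
    and z_ub the value of the best known integral solution (an upper bound).
    Any solution with x_j = 1 costs at least z_lp + r_j, so x_j can be fixed
    to 0 whenever r_j > z_ub - z_lp.
    """
    gap = z_ub - z_lp
    return sorted(j for j, r in reduced_costs.items() if r > gap)


# Hypothetical data: LP value 10, incumbent value 11.
print(fixable_at_zero({"x1": 3.0, "x2": 0.5, "x3": 1.5}, 10.0, 11.0))  # ['x1', 'x3']
```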

Once some variables have been fixed in this man- 
ner, it is often possible to fix further variables using log- 
ical implications. For example, in a traveling salesman 
problem, if x_e has been set equal to one for two edges
incident to a particular vertex, then all other edges inci- 
dent to that vertex can have their values fixed to zero. 


Solving Large Problems 


It is generally accepted that interior point methods 
are superior to the simplex algorithm for solving suf- 
ficiently large linear programming problems. The situ- 
ation for cutting plane algorithms for large integer pro- 
gramming problems is not so clear, because the dual 
simplex method is very good at reoptimizing if only 
a handful of cutting planes are added. Nonetheless, it 
does appear that interior point cutting plane algorithms 
may well have a role to play, especially for problems 
with very large relaxations (thousands of variables and 
constraints) and where a large number of cutting planes 
are added simultaneously (hundreds or thousands). LP 
relaxations of integer programming problems can expe- 
rience severe degeneracy, which can cause the simplex 
method to stall. Interior point methods suffer far less
from the effects of degeneracy. 

In [27], an interior point cutting plane algorithm is 
used for a maximum cut problem on a sparse graph, 
and the use of the interior point solver enables the so- 
lution of far larger instances than with a simplex solver, 
because of both the size of the problems and their de- 
generacy. 

A combined interior point and simplex cutting 
plane algorithm for the linear ordering problem is de- 
scribed in [30]. In the early stages, an interior point 
method is used, because the linear programs are large 
and many constraints are added at once. In the later 
stages, the dual simplex algorithm is used, because just 
a few constraints are added at a time and the dual 
simplex method can then reoptimize very quickly. The 
combined algorithm is up to ten times faster than either 
a pure interior point cutting plane algorithm or a pure 
simplex cutting plane algorithm on the larger instances 
considered. 

The polyhedral combinatorics of the quadratic as- 
signment problem are investigated in [21]. It was found 
necessary to use an interior point method to solve the 
relaxations, because of the size of the relaxations. 


Provably Good Solutions 


Even if a cutting plane algorithm is unable to solve 
a problem to optimality, it can still be used to generate 
good feasible solutions with a guaranteed bound to opti- 
mality. This approach for the traveling salesman prob- 
lem is described in [23]. The value of the current LP 
relaxation provides a lower bound on the optimal value 
of the integer programming problem. The optimal so- 
lution to the current LP relaxation (or a good feasible 
solution) can often be used to generate a good integral 
feasible solution using a heuristic procedure. The value 
of an integral solution obtained in this manner provides 
an upper bound on the optimal value of the integer pro- 
gramming problem. 

For example, for the traveling salesman problem, 
edges that have x_e close to one can be set equal to one,
edges with x_e close to zero can be set to zero, and the
remaining edges can be set so that the solution is the 
incidence vector of a tour. Further refinements are pos- 
sible, such as using 2-change or 3-change procedures to 
improve the tour, as described in [25]. 
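As an illustration of the improvement step, here is a generic 2-change (2-opt) local search on a symmetric distance matrix: it replaces two non-adjacent tour edges (a, b), (c, d) by (a, c), (b, d) whenever this shortens the tour, by reversing the segment in between. It is a sketch of the textbook procedure, not the specific implementation of [25]; the function names and the four-point instance are our own.

```python
import math


def tour_length(tour, dist):
    """Total length of the closed tour under the distance matrix dist."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def two_opt(tour, dist):
    """Apply improving 2-changes until none is left; returns a new tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share the vertex tour[0]
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-12:
                    # Reverse tour[i+1..j]: edges (a,b),(c,d) become (a,c),(b,d).
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


# Corners of the unit square visited in a crossing order.
pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
start = [0, 1, 2, 3]
best = two_opt(start, dist)
print(round(tour_length(start, dist), 3), round(tour_length(best, dist), 3))  # 4.828 4.0
```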


This has great practical importance. In many situa- 
tions, it is not necessary to obtain an optimal solution, 
and a good solution will suffice. If it is only necessary to 
have a solution within 0.5% of optimality, say, then the 
cutting plane algorithm can be terminated when the gap 
between the lower bound and upper bound is smaller 
than this tolerance. If the objective function value must 
be integral, then the algorithm can be stopped with an 
optimal solution once this gap is less than one. 
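The two stopping rules just described fit in a few lines. This sketch assumes a minimization problem with a positive lower bound, so that upper − lower ≤ 0.005 · lower guarantees a solution within 0.5% of the optimum; the name `can_stop` and its parameters are illustrative.

```python
def can_stop(lower, upper, rel_tol=0.005, integral_objective=False):
    """Stopping test for a cutting plane algorithm on a minimization problem.

    lower: value of the current LP relaxation (a lower bound, assumed > 0 for
    the relative test); upper: value of the best integral solution found.
    """
    gap = upper - lower
    if integral_objective and gap < 1:
        return True   # no integer lies in [lower, upper), so upper is optimal
    return gap <= rel_tol * abs(lower)


print(can_stop(100.0, 100.4))                       # True: within 0.5%
print(can_stop(7.4, 8, integral_objective=True))    # True: gap 0.6 < 1
print(can_stop(7.0, 8.0, integral_objective=True))  # False: gap is exactly 1
```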


Equivalence of Separation and Optimization 


The separation problem for an integer programming 
problem can be stated as follows: 


Given an instance of an integer programming 
problem and a point x, determine whether x is in
the convex hull of feasible integral points. Fur- 
ther, if it is not in the convex hull, find a sepa- 
rating hyperplane that cuts off x from the convex 
hull. 


An algorithm for solving a separation problem is called 
a separation routine, and it can be used to solve an in- 
teger programming problem. 

The ellipsoid algorithm [17,24] is a method for solv- 
ing linear programming problems in polynomial time. 
It can be used to solve an integer programming problem 
with a cutting plane method, and it will work in a poly- 
nomial number of stages, or calls to the separation rou- 
tine. If the separation routine requires only polynomial 
time then the ellipsoid algorithm can be used to solve 
the problem in polynomial time. It can also be shown 
that if an optimization problem can be solved in poly- 
nomial time then the corresponding separation prob- 
lem can also be solved in polynomial time. 

Unless P = NP, no NP-hard problem can be solved in
polynomial time. By the equivalence above, the
separation problem for such a problem then cannot be
solved in polynomial time either. Therefore, a cutting
plane algorithm cannot always generate good cutting
planes quickly for NP-hard problems. In practice, fast
heuristics are used, and these heuristics may
occasionally be unable to find a cutting plane even
when one exists.


Conclusions 


Cutting plane methods have been known for almost as 
long as the simplex algorithm. They have come back 
into favor since the early 1980s because of the develop- 
ment of strong cutting planes from polyhedral theory. 
In practice, cutting plane methods have proven very
successful for a wide variety of problems, giving prov- 
ably optimal solutions. Because they solve relaxations of 
the problem of interest, they make it possible to obtain 
bounds on the optimal value, even for large instances 
that cannot currently be solved to optimality. 
One of the more elegant and satisfying ideas in the
theory of optimization is linear programming duality. The
dual of a linear programming problem is not only in- 
teresting theoretically but has great practical value, be- 
cause it provides sensitivity analysis, bounds on the op- 
timal value, and marginal values for resources. 

It is natural to want to extend duality to integer pro- 
gramming in order to obtain these same benefits. The 
matter is not so simple, however. Linear programming 
duality actually represents several concepts of duality 
that happen to coincide in the case of linear program- 
ming but diverge as one moves to other types of opti- 
mization problems. The benefits also decouple, because 
each duality concept provides some of them but not 
others. 

Five types of integer programming duality are sur- 
veyed here. None is clearly superior to the others, and 
their strengths and weaknesses are summarized at
the end of the article.


Linear Programming Duality 

A brief summary of linear programming duality will 
provide a foundation for the rest of the discussion. Con- 
sider the linear programming (primal) problem, 


$$\begin{aligned}\max\ & cx\\ \text{s.t.}\ & Ax \le b\\ & x \ge 0\end{aligned}\tag{1}$$


where A is an m × n matrix. The dual problem may be
stated


$$\begin{aligned}\min\ & ub\\ \text{s.t.}\ & uA \ge c\\ & u \ge 0,\end{aligned}\tag{2}$$

where u is a vector of dual variables. This is a strong
dual because its optimal value is the same as that of the
primal problem, unless both primal and dual are
infeasible. (An unbounded or infeasible maximization
problem is regarded as having optimal value ∞ or −∞,
respectively, and analogously for minimization
problems.)

The linear programming dual brings at least three 
important benefits. 

a) (Bounds) The value of any feasible dual solution 
provides an upper bound on the optimal value of 
the primal problem. For any x and u that are primal
and dual feasible, respectively, ub ≥ uAx ≥ cx.
The first inequality is due to the fact that Ax ≤ b
and u ≥ 0, and the second is due to the fact that
uA ≥ c and x ≥ 0. By finding a dual feasible solution,
one can estimate how much a primal feasible solu- 
tion falls short of optimality. Although this property 
is less important for linear programming, where ro- 
bust solution algorithms are available, it is essential 
for integer programming. 

b) (Sensitivity analysis) Due to a), the dual solution
provides a partial sensitivity analysis. Let ū be an
optimal solution of the dual problem (2), so that ūb
is the optimal value of both primal and dual. If the
right-hand side of the constraint in (1) is perturbed
by Δb, so that it becomes Ax ≤ b + Δb, then the dual
(2) becomes

$$\begin{aligned}\min\ & u(b + \Delta b)\\ \text{s.t.}\ & uA \ge c\\ & u \ge 0.\end{aligned}\tag{3}$$

Because only the objective function changes, ū is
feasible in (3) as well as (2). So ū(b + Δb) is an upper
bound on the optimal value of the perturbed primal
problem. The (possibly negative) change in the
optimal value ūb of the original problem is bounded
above by ūΔb. The change is in fact equal to ūΔb
if the perturbation Δb lies within easily computable
ranges.

c) (Complementary slackness) Due to b), the marginal
values of resources are readily available. If the
right-hand side b_i of a particular constraint of (1)
represents a resource constraint, then a change Δb_i in
the amount of resource available raises the optimal
profit by at most ū_i Δb_i. In particular there is a
complementary slackness property, which says that
a surplus resource has no marginal value. If x is
optimal in (1), then ū(b − Ax) = 0.

Integer Programming Duality, Figure 1
The shaded polyhedron is the feasible set of a linear
programming problem with optimal solution (x1, x2) = (2.8, 1.3)
and optimal value 69. The dashed lines represent a
perturbation of the right-hand sides. The black dots represent
feasible solutions of the corresponding integer programming
problem, which has optimal solution (x1, x2) = (2, 1) and
optimal value 50

Consider for example the linear programming
problem

$$\begin{aligned}\max\ & 20x_1 + 10x_2\\ \text{s.t.}\ & x_1 + 4x_2 \le 8\\ & 2x_1 - 2x_2 \le 3\\ & x_1, x_2 \ge 0,\end{aligned}\tag{4}$$

which is graphed in Fig. 1. The optimal dual solution
is (ū1, ū2) = (6, 7). If the two constraints represent two
resource limitations, then the resources have marginal
values of at most 6 and 7, respectively. If one less unit
of each resource is available (represented by dashed
lines in Fig. 1), then the change in the objective function
value is bounded above by −6 − 7 = −13. In fact, the
profit decreases by exactly 13. 
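These numbers can be checked with exact arithmetic. The sketch assumes, as the quoted solution indicates, that both constraints of (4) are tight at the primal optimum and (since x1, x2 > 0) both dual constraints are tight at the dual optimum; it solves the two 2 × 2 systems with Cramer's rule and compares the primal value, the dual value and the perturbed value. The helper names are ours.

```python
from fractions import Fraction


def solve2(a11, a12, a21, a22, r1, r2):
    """Solve a11*y1 + a12*y2 = r1, a21*y1 + a22*y2 = r2 exactly (Cramer's rule)."""
    det = Fraction(a11 * a22 - a12 * a21)
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - r1 * a21) / det


c = (20, 10)
A = ((1, 4), (2, -2))


def primal_vertex(b):
    """Point where both constraints of (4) are tight, and its objective value."""
    x1, x2 = solve2(A[0][0], A[0][1], A[1][0], A[1][1], b[0], b[1])
    return (x1, x2), c[0] * x1 + c[1] * x2


# Dual of (4) with both constraints tight: u1 + 2u2 = 20, 4u1 - 2u2 = 10.
u = solve2(A[0][0], A[1][0], A[0][1], A[1][1], c[0], c[1])
x, value = primal_vertex((8, 3))
dual_value = u[0] * 8 + u[1] * 3
_, perturbed = primal_vertex((7, 2))   # one unit less of each resource
print(value, dual_value, perturbed - value)   # 69 69 -13
```

Equal primal and dual values certify optimality of both points; because u = (6, 7) stays dual feasible after the perturbation and its dual value 6·7 + 7·2 = 56 matches the perturbed primal value, the decrease is exactly 13.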


Integer Programming 


Integer programming modifies the linear programming 
problem (1) by requiring the variables to take integral 
values: 


$$\begin{aligned}\max\ & cx\\ \text{s.t.}\ & Ax \le b\\ & x \ge 0 \text{ and integer}.\end{aligned}\tag{5}$$
In mixed integer/linear programming (MILP) some
variables are continuous and some are integral. For ease 
of exposition, the discussion here is restricted to pure 
(unmixed) integer programming. 

The example problem (4) may be modified to obtain 
the integer programming problem, 


$$\begin{aligned}\max\ & 20x_1 + 10x_2\\ \text{s.t.}\ & x_1 + 4x_2 \le 8\\ & 2x_1 - 2x_2 \le 3\\ & x_1, x_2 \ge 0 \text{ and integers}.\end{aligned}\tag{6}$$


Figure 1 illustrates this problem as well.

In linear programming, a constraint with slack at 
the optimal solution is redundant. It may be omitted 
without changing the optimal solution. The example 
shows that this is untrue for integer programming. Both 
constraints in (6) contain slack at the solution
(x1, x2) = (2, 1). Yet removing either would result in a different
solution. 
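Because (6) is tiny, these claims can be checked by enumeration. In this sketch the search box 0 ≤ x1, x2 ≤ 8 is justified only while x1 + 4x2 ≤ 8 is present; dropping that constraint instead makes the problem unbounded (x1 = x2 can grow without limit), so that case is not enumerated. The function name is ours.

```python
from itertools import product


def best_integer_point(constraints, c=(20, 10), box=8):
    """Enumerate integer points 0 <= x1, x2 <= box and return the best feasible one.

    constraints: list of (a1, a2, rhs) meaning a1*x1 + a2*x2 <= rhs.
    The box is valid only because x1 + 4x2 <= 8 forces x1 <= 8 and x2 <= 2.
    """
    best = None
    for x1, x2 in product(range(box + 1), repeat=2):
        if all(a1 * x1 + a2 * x2 <= r for a1, a2, r in constraints):
            val = c[0] * x1 + c[1] * x2
            if best is None or val > best[1]:
                best = ((x1, x2), val)
    return best


both = [(1, 4, 8), (2, -2, 3)]
print(best_integer_point(both))       # ((2, 1), 50)
print(best_integer_point(both[:1]))   # ((8, 0), 160): 2x1 - 2x2 <= 3 dropped
```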

The concept of a marginal value is problematic in 
integer programming. Let the value function v*(b)
indicate the optimal value of (1) or (5) for a given
right-hand side b. In linear programming the marginal value
of resource i is essentially the partial derivative of v*(b)
with respect to b_i. Yet in integer programming v*(b)
is a step function with respect to any b_i, as illustrated
by the dotted line in Fig. 2. So it is unclear what would
be meant by a marginal value. However, there may be
a complementary slackness property of some kind,
depending on the duality in question.

Integer Programming Duality, Figure 2
Upper bounds on the optimal value of an integer
programming problem provided by the superadditive and
branch and bound duals, as a function of the right-hand side
perturbation Δb_i. The value function indicates the exact
optimal value for each Δb_i


Surrogate Duality 


One general scheme for formulating integer program- 
ming duals is to define a family of relaxations of the 
original problem that are parameterized by dual vari- 
ables. The dual problem is then the problem of finding 
the tightest relaxation. It will be seen that the linear pro- 
gramming dual does exactly this. 

One instance of this scheme is surrogate duality 
[9,10,11]. The integer programming problem (5) can be 
relaxed by replacing the constraints Ax ≤ b with a
surrogate constraint, i.e., a nonnegative linear
combination of the inequalities in Ax ≤ b. This yields
a surrogate relaxation of (5):


$$\begin{aligned}\max\ & cx\\ \text{s.t.}\ & uAx \le ub\\ & x \ge 0 \text{ and integer}.\end{aligned}\tag{7}$$


This is a relaxation in the sense that its feasible set
contains that of (5). Its optimal value σ(u) is therefore an
upper bound on that of (5) for any u ≥ 0. The surrogate
relaxation may be much easier to solve than the original
problem because it has only one constraint (other than
nonnegativity). The surrogate dual problem is to find
a u that gives the best bound:


$$\begin{aligned}\min\ & \sigma(u)\\ \text{s.t.}\ & u \ge 0.\end{aligned}\tag{8}$$


The surrogate relaxation of a linear programming
problem (1) is (7) without the integrality constraint. From
strong linear programming duality, its optimal value is
σ(u) = min_α {αub : αuA ≥ c, α ≥ 0}. Since the products
αu range over all nonnegative vectors, the surrogate
dual (8) becomes precisely the linear programming
dual (2).
Integer Programming Duality, Figure 3
Plot of σ(1, u) for a surrogate dual problem


Surrogate duality can be illustrated with the integer 
programming problem (6). The surrogate relaxation is 


$$\begin{aligned}\max\ & 20x_1 + 10x_2\\ \text{s.t.}\ & (1 + 2u)x_1 + (4 - 2u)x_2 \le 8 + 3u\\ & x_1, x_2 \ge 0 \text{ and integers}.\end{aligned}$$


Because there are only two constraints and only the
ratio u2/u1 matters, u1 is set to 1 and u2 replaced with u.
A plot of σ(1, u) appears in Fig. 3.

The primary utility of the surrogate dual is to pro- 
vide an upper bound on the optimal value of the origi- 
nal problem. In the example, the dual attains its optimal 
value of 60 when 1 < u < 20/17. This is better than the 
bound of 69 provided by the linear programming
relaxation. But there is a duality gap of 60 − 50 = 10.
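The values of σ(1, u) behind Fig. 3 can be computed by enumeration, since for 0 ≤ u < 2 both coefficients of the surrogate constraint are positive and, for each x1, the best x2 is simply the largest feasible one. A minimal sketch using exact fractions; the function name `sigma` is ours.

```python
from fractions import Fraction
from math import floor


def sigma(u):
    """Optimal value of the surrogate relaxation of (6) with multipliers (1, u).

    Maximizes 20x1 + 10x2 over integers x >= 0 with
    (1 + 2u)x1 + (4 - 2u)x2 <= 8 + 3u. Requires 0 <= u < 2 so that both
    coefficients are positive and the enumeration is finite.
    """
    u = Fraction(u)
    if not (0 <= u < 2):
        raise ValueError("enumeration needs positive coefficients")
    a1, a2, r = 1 + 2 * u, 4 - 2 * u, 8 + 3 * u
    best = 0
    for x1 in range(floor(r / a1) + 1):
        x2 = floor((r - a1 * x1) / a2)  # largest feasible x2 for this x1
        best = max(best, 20 * x1 + 10 * x2)
    return best


print(sigma(1), sigma(Fraction(11, 10)), sigma(Fraction(20, 17)))  # 70 60 70
```

At the endpoints u = 1 and u = 20/17 the points (3, 1) and (0, 7), both of value 70, become feasible, which is why the minimum value 60 is attained only strictly between them.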

One might speculate that the surrogate multipliers 
indicate the relative importance of the two constraints, 
but it is unclear what this means. One can say, how- 
ever, that omitting a constraint with a vanishing multi- 
plier does not raise the optimal value above that of the 
surrogate dual. Vanishing multipliers therefore identify 
redundant constraints when there is no duality gap. 

The surrogate dual (8) must be solved by a search 
method that does not require gradient or subgradient 
information. Possible algorithms are discussed in [16]. 
The dual problem need not be solved to optimality, be- 
cause only an upper bound is sought in any case. 


Lagrangian Duality 


Another form of relaxation duality, Lagrangian dual- 
ity [5,6,7], removes some of the more troublesome con- 
straints from (5) but inserts into the objective function 
a penalty for violating them. Thus the constraints are 
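partitioned into ‘hard’ constraints A¹x ≤ b¹ and ‘easy’
constraints A²x ≤ b²:


$$\begin{aligned}\max\ & cx\\ \text{s.t.}\ & A^1x \le b^1\\ & A^2x \le b^2\\ & x \ge 0 \text{ and integer}.\end{aligned}\tag{9}$$


The hard constraints are dualized to obtain the
Lagrangian relaxation:


$$\begin{aligned}\max\ & cx + u(b^1 - A^1x)\\ \text{s.t.}\ & A^2x \le b^2\\ & x \ge 0 \text{ and integer}.\end{aligned}\tag{10}$$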


This is a relaxation in the sense that its optimal value
θ(u) is an upper bound on the optimal value of (5) for
any u ≥ 0. Any x that is feasible in (9) is also feasible
in (10), and cx ≤ cx + u(b¹ − A¹x) because u ≥ 0 and
b¹ − A¹x ≥ 0. The Lagrangian dual problem is


$$\begin{aligned}\min\ & \theta(u)\\ \text{s.t.}\ & u \ge 0.\end{aligned}\tag{11}$$


If all the constraints of (9) are dualized, then the
Lagrangian dual is no improvement over the linear
programming dual. In this case
θ(u) = max{(c − uA)x + ub : x ≥ 0, integer}. So θ(u) is ub
when c − uA ≤ 0 and is ∞ otherwise. The Lagrangian
dual problem (11) is now the problem of minimizing ub
subject to c − uA ≤ 0 and u ≥ 0, which is precisely the
linear programming dual (2). (It follows that linear
programming duality is a special case of Lagrangian
duality.)

The Lagrangian dual is therefore useful only when 
some constraints are not dualized. These constraints 
must be carefully chosen so that the Lagrangian
relaxation (10) is easy to solve. It may, for example,
decouple into smaller problems or have other special 
structure. 

As an example of Lagrangian duality, suppose that 
the first constraint of (6) is dualized. The Lagrangian 
relaxation is

$$\begin{aligned}\max\ & 20x_1 + 10x_2 + u(8 - x_1 - 4x_2)\\ \text{s.t.}\ & 2x_1 - 2x_2 \le 3\\ & x_1, x_2 \ge 0 \text{ and integers}.\end{aligned}$$


The optimal solution of the Lagrangian dual is u = 6,
with value θ(u) = 62, slightly worse than the surrogate
bound of 60 but still better than the linear programming
bound of 69.
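This example can be reproduced in closed form. The integer points satisfying 2x1 − 2x2 ≤ 3, x ≥ 0 are exactly those with x1 − x2 ≤ 1, whose convex hull has vertices (0, 0), (1, 0) and extreme rays (0, 1), (1, 1); the sketch below uses this derived description (our own step, not stated in the text) to evaluate θ(u) and then scans a grid of multipliers. The function name `theta` and the grid are illustrative.

```python
import math
from fractions import Fraction


def theta(u):
    """Value of the Lagrangian relaxation of (6) with x1 + 4x2 <= 8 dualized.

    theta(u) = 8u + max{(20 - u)x1 + (10 - 4u)x2 : 2x1 - 2x2 <= 3, x >= 0 integer}.
    Integer points of 2x1 - 2x2 <= 3, x >= 0 are exactly those with x1 - x2 <= 1;
    their convex hull has vertices (0, 0), (1, 0) and extreme rays (0, 1), (1, 1).
    """
    p1, p2 = 20 - u, 10 - 4 * u          # objective coefficients after dualizing
    if p2 > 0 or p1 + p2 > 0:
        return math.inf                   # objective grows along an extreme ray
    return 8 * u + max(0, p1)             # best of the two vertices


best_u = min((Fraction(k, 4) for k in range(81)), key=theta)  # grid 0, 1/4, ..., 20
print(best_u, theta(best_u))  # 6 62
```

For u < 6 the relaxation is unbounded along the ray (1, 1), and for u > 6 the value 7u + 20 (while u ≤ 20) increases, so u = 6 is the exact minimizer, not just the best grid point.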

The Lagrangian and surrogate duals can be com- 
pared in general if the surrogate relaxation dualizes the 
same constraints as the Lagrangian relaxation. In this 
case it can be shown that the surrogate duality gap is 
never larger than the Lagrangian duality gap [7], and it 
tends to be smaller. The Lagrangian relaxation has the 
advantage, however, that it is often easier to solve than 
the surrogate relaxation that dualizes the same
constraints. Moreover, θ(u) is convex and piecewise linear,
so any local minimum is a global minimum, and
a subgradient optimization method can be used to find
a global minimum of θ(u) by finding a local minimum.
In fact, if θ(u) = cx_u + u(b¹ − A¹x_u), where x_u is an
optimal solution of (10), then b¹ − A¹x_u is
a subgradient of θ at u.

When there is no duality gap, the Lagrangian
multipliers u_i can be viewed as sensitivities to right-hand
side perturbations, with respect to at least one optimal
solution. It can be shown that there is no duality gap
if and only if there exists a feasible solution x of (5) and
u ≥ 0 that satisfy θ(u) = cx + u(b¹ − A¹x) and
complementary slackness: u(b¹ − A¹x) = 0. However, solution
of the Lagrangian dual does not necessarily yield a so- 
lution x with these properties. Further search may be 
required. 


Superadditive Duality 


So far the linear programming dual has been viewed as 
a relaxation dual, of either the surrogate or Lagrangian 
type. It can also be viewed as representing the classical 
duality of vectors and linear functionals. For this pur- 
pose the dual of (1) is written: 


$$\begin{aligned}\min\ & f(b)\\ \text{s.t.}\ & f(A) \ge c\\ & f \in F.\end{aligned}\tag{12}$$


Here f is a linear functional defined by a nonnegative
row vector u, so that f(b) = ub and f(A) = uA. The
minimization is over the set F of all such functionals.


A similar dual of the integer programming problem 
(5) can be written as (12), but with minimization over 
a broader class F of functions. It can be shown that if F 
is the class of superadditive nondecreasing functions f 
with f(0) = 0, then (12) is a strong integer programming 
dual. This superadditive dual [2,15,20,24] provides sen- 
sitivities to right-hand side perturbations. It is also pos- 
sible, at least in principle, to construct a function f that 
solves the dual, by means of a cutting plane algorithm. 

A superadditive function f is one that satisfies f(a + b) ≥ f(a) + f(b) for all vectors a, b. The superadditive dual satisfies weak duality because if x is feasible in (5) and f is feasible in (12), then

$$cx \le \sum_j f(a^j)x_j \le \sum_j f(a^j x_j) \le f(Ax) \le f(b),$$

where a^j is column j of A. The first inequality follows from f(A) ≥ c and x ≥ 0. The second is due to superadditivity of f and the fact that multiplication by a nonnegative integer x_j creates a sum of zero or more terms (also f(0) = 0). The third is due to superadditivity. The fourth follows from the fact that f is nondecreasing and Ax ≤ b.

Strong duality can be established by exhibiting a dual feasible solution f for which there is no duality gap. Let a rounding function be a function of the form

$$R(d) = \lfloor M_k \lfloor M_{k-1} \cdots \lfloor M_1 d \rfloor \cdots \rfloor \rfloor, \qquad (13)$$

where each M_i is a nonnegative matrix and ⌊α⌋ is α rounded down. A Chvátal function has the form uR, where u ≥ 0 and R is a rounding function. Because Chvátal functions clearly belong to F, it suffices to exhibit a Chvátal function ūR for which ūR(b) is the optimal value of (5).

This is done by generating Chvátal-Gomory cuts. A rank 1 Chvátal-Gomory cut for Ax ≤ b, x ≥ 0 is an inequality of the form ⌊mA⌋x ≤ ⌊mb⌋, where m ≥ 0 defines a linear combination of the rows of Ax ≤ b. Rank 2 cuts are obtained by applying the same operation to rank 1 cuts, and so forth. Chvátal showed that the integer hull of any polyhedron (i.e., the convex hull of its integral points) is described by finitely many cuts of finite rank.
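A Chvátal-Gomory cut is easy to compute exactly with rational arithmetic. The sketch below (the helper name `cg_cut` is hypothetical) uses Python's `fractions` module on the constraints of example (6). The multipliers are chosen for illustration so that the rank 1 cuts come out as x1 + 3x2 ≤ 6, x1 ≤ 2 and x1 − x2 ≤ 1; the last call applies the same operation to those cuts and yields a rank 2 cut. Rounding the left-hand coefficients down is valid only because x ≥ 0.

```python
from fractions import Fraction as F
from math import floor

def cg_cut(m, A, b):
    """Chvatal-Gomory cut floor(mA) x <= floor(mb) from multipliers m >= 0.

    Rounding the left-hand coefficients down is valid because x >= 0."""
    if any(mi < 0 for mi in m):
        raise ValueError("multipliers must be nonnegative")
    coeffs = [floor(sum(mi * row[j] for mi, row in zip(m, A))) for j in range(len(A[0]))]
    rhs = floor(sum(mi * bi for mi, bi in zip(m, b)))
    return coeffs, rhs

# Example problem (6): x1 + 4x2 <= 8, 2x1 - 2x2 <= 3, x >= 0.
A_EX, B_EX = [[1, 4], [2, -2]], [8, 3]
multipliers = [(F(4, 5), F(1, 10)), (F(1, 5), F(2, 5)), (F(0), F(1, 2))]
rank1 = [cg_cut(m, A_EX, B_EX) for m in multipliers]
# rank1 == [([1, 3], 6), ([1, 0], 2), ([1, -1], 1)]

# Rank 2: apply the same operation to the rank 1 cuts themselves.
A_P1 = [coeffs for coeffs, _ in rank1]
B_P1 = [rhs for _, rhs in rank1]
rank2 = cg_cut((F(2, 3), F(1, 3), F(0)), A_P1, B_P1)
# rank2 == ([1, 2], 4)
```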

This implies that for some rounding function R, R(A)x ≤ R(b) and x ≥ 0 describe the integer hull of P = {x ≥ 0 : Ax ≤ b}. So the optimal value of the integer programming problem (5) is the optimal value of

$$\max\ cx \quad \text{s.t. } R(A)x \le R(b),\ \ x \ge 0. \qquad (14)$$

The linear programming dual of (14) is

$$\min\ uR(b) \quad \text{s.t. } uR(A) \ge c,\ \ u \ge 0. \qquad (15)$$


If ū solves problem (15), then its optimal value ūR(b) is the optimal value of (14) and therefore of the original integer programming problem (5). Thus f = ūR is a Chvátal function that solves the dual problem (12).

The dual solution provides sensitivity analysis with respect to right-hand sides. Due to weak duality, ūR(b + Δb) is an upper bound on the optimal value when the right-hand side in (5) is perturbed to b + Δb. There is a form of complementary slackness, because for any optimal solution x̄ of (5), (ūR(A) − c)x̄ = 0.

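Consider again the example problem (6). It will be seen below that R is

$$R(d) = \left\lfloor \begin{pmatrix} 2/3 & 1/3 & 0 \\ 0 & 0 & 1 \end{pmatrix} \left\lfloor \begin{pmatrix} 4/5 & 1/10 \\ 1/5 & 2/5 \\ 0 & 1/2 \end{pmatrix} d \right\rfloor \right\rfloor. \qquad (16)$$

Problem (14) is

$$\begin{aligned} \max\ & 20x_1 + 10x_2 \\ \text{s.t.}\ & x_1 + 2x_2 \le 4 \\ & x_1 - x_2 \le 1 \\ & x_1, x_2 \ge 0. \end{aligned}$$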


The solution of (15) is ū = (10 10). So if the right-hand side of (6) is perturbed by Δb = (Δb1, Δb2), the new optimal value of (6) is bounded above by

$$\bar u R(b + \Delta b) = (10\ \ 10) \left\lfloor \begin{pmatrix} 2/3 & 1/3 & 0 \\ 0 & 0 & 1 \end{pmatrix} \left\lfloor \begin{pmatrix} 4/5 & 1/10 \\ 1/5 & 2/5 \\ 0 & 1/2 \end{pmatrix} \begin{pmatrix} 8 + \Delta b_1 \\ 3 + \Delta b_2 \end{pmatrix} \right\rfloor \right\rfloor.$$

For instance, if each resource is reduced by one (Δb = (−1, −1)), then the new optimal value is at most 50. In fact it is exactly 50. Figure 2 plots ūR(b + Δb) against Δb, for comparison with the value function v*(b + Δb). Note that there is complementary slackness, because (ūR(A) − c)x̄ = [(20 10) − (20 10)](2, 1)^T = 0.
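The rounding function (16) and the bound ūR(b + Δb) can be evaluated exactly as a check, alongside a brute-force solution of problem (6). In this sketch M1 and M2 are taken as the matrices of (16) and ū = (10, 10) as the solution of (15); the enumeration box in `ip_value` and all function names are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product
from math import floor

M1 = [[F(4, 5), F(1, 10)], [F(1, 5), F(2, 5)], [F(0), F(1, 2)]]
M2 = [[F(2, 3), F(1, 3), F(0)], [F(0), F(0), F(1)]]
U_BAR = (10, 10)  # optimal solution of (15) for the example

def floor_matvec(M, v):
    """Componentwise floor of the matrix-vector product M v."""
    return [floor(sum(mij * vj for mij, vj in zip(row, v))) for row in M]

def R(d):
    """Rounding function (16): R(d) = floor(M2 floor(M1 d))."""
    return floor_matvec(M2, floor_matvec(M1, d))

def f(d):
    """Chvatal function f = u_bar R, a feasible solution of the superadditive dual (12)."""
    return sum(u * r for u, r in zip(U_BAR, R(d)))

def chvatal_bound(delta_b=(0, 0)):
    """Upper bound u_bar R(b + delta_b) for problem (6) with b = (8, 3)."""
    return f((8 + delta_b[0], 3 + delta_b[1]))

def ip_value(b, box=12):
    """Brute-force optimum of problem (6) with right-hand side b (box assumed large enough)."""
    feasible = [(x1, x2) for x1, x2 in product(range(box + 1), repeat=2)
                if x1 + 4 * x2 <= b[0] and 2 * x1 - 2 * x2 <= b[1]]
    return max(20 * x1 + 10 * x2 for x1, x2 in feasible)
```

For example, R((8, 3)) gives [4, 1], the columns of A map to R((1, 2)) = [1, 1] and R((4, −2)) = [2, −1] (the coefficients of problem (14)), and the bound stays at 50 for Δb1 = 1/4 but rises to 60 at Δb1 = 3/8.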


Solving the Superadditive Dual 


A solution of the dual problem (12) can be constructed in stages that correspond to Chvátal ranks [3,14,23]. It is assumed without practical loss of generality that the components of A, b and c are rational numbers.

The first stage proceeds as follows. Let x^1, ..., x^p be the vertices of P0 = P. For each x^k consider the cone C_k of directions d for which x^k maximizes dx subject to x ∈ P0. To describe C_k, let Āx ≤ b̄ be the constraints of Ax ≤ b that are active at x^k (i.e., the constraints a^i x ≤ b_i for which a^i x^k = b_i), and let −Īx ≤ 0 be the active constraints of −x ≤ 0 (the constraints −x_j ≤ 0 for which x^k_j = 0). Then C_k is the cone spanned by the rows of Ā and −Ī.

It suffices to identify a Hilbert basis [8] for C_k; i.e., a set of directions d^1, ..., d^q such that every integer vector in C_k is a nonnegative integer combination of d^1, ..., d^q. Assume without loss of generality that the components of A are integers (the inequalities Ax ≤ b can be multiplied by appropriate integers to achieve this). Then the integer vectors d^1, ..., d^q in the set

$$\{\lambda \bar A - \mu \bar I : 0 \le \lambda \le e,\ 0 \le \mu \le e\}$$

form a Hilbert basis for C_k, where e is a row vector of ones.

The next step is to generate rank 1 Chvátal-Gomory cuts associated with x^k. First note that each inequality of the form d^j x ≤ d^j x^k supports P0 at x^k and is therefore a nonnegative linear combination of the rows of Āx ≤ b̄, −Īx ≤ 0. Thus one can write

$$d^j = m^j \bar A - p^j \bar I. \qquad (17)$$

The multipliers m^j and p^j can be obtained by solving (17). The valid inequalities d^j x ≤ ⌊d^j x^k⌋ are clearly rank 1 cuts for Ax ≤ b, −x ≤ 0. Rank 1 cuts are generated in this fashion for all the vertices x^k.

Now let P1 be defined by all of the rank 1 cuts generated, plus x ≥ 0. Let the rows of M1 be the vectors m^j corresponding to rank 1 cuts that define facets of P1. Then P1 = {x ≥ 0 : ⌊M1A⌋x ≤ ⌊M1b⌋}.

This same procedure is now applied to the vertices of P1 to obtain M2 and P2, and so forth until all the vertices of Pk are integral. At this point (13) is the desired rounding function R, and f = ūR solves the dual (12).
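For a vertex with exactly two linearly independent active constraints, the Hilbert basis construction and the cuts d^j x ≤ ⌊d^j x^k⌋ can be sketched as below. The sketch assumes the two-dimensional setting of example (6) at x^1 = (2.8, 1.3), where no bound constraint is active, so μ does not appear; it solves for λ by Cramer's rule and keeps every nonzero integer point of {λ1 g1 + λ2 g2 : 0 ≤ λ ≤ 1}. The function and variable names are illustrative.

```python
from fractions import Fraction as F
from itertools import product
from math import floor

def parallelepiped_points(g1, g2):
    """Nonzero integer points lam1*g1 + lam2*g2 with 0 <= lam1, lam2 <= 1.

    Assumes two linearly independent integer generators g1, g2 in the plane."""
    det = g1[0] * g2[1] - g1[1] * g2[0]
    if det == 0:
        raise ValueError("generators must be linearly independent")
    corners = [(0, 0), g1, g2, (g1[0] + g2[0], g1[1] + g2[1])]
    xs = range(min(c[0] for c in corners), max(c[0] for c in corners) + 1)
    ys = range(min(c[1] for c in corners), max(c[1] for c in corners) + 1)
    points = []
    for x, y in product(xs, ys):
        # Cramer's rule for lam1*g1 + lam2*g2 = (x, y).
        lam1 = F(x * g2[1] - y * g2[0], det)
        lam2 = F(g1[0] * y - g1[1] * x, det)
        if (x, y) != (0, 0) and 0 <= lam1 <= 1 and 0 <= lam2 <= 1:
            points.append((x, y))
    return points

# Vertex x^1 = (2.8, 1.3) of P0; active rows x1 + 4x2 <= 8 and 2x1 - 2x2 <= 3.
X1 = (F(14, 5), F(13, 10))
basis = parallelepiped_points((1, 4), (2, -2))
cuts = {d: floor(d[0] * X1[0] + d[1] * X1[1]) for d in basis}  # d x <= floor(d x^1)
```

The enumeration returns all 13 nonzero integer points of the parallelepiped, and Table 1 lists only some of them; for example (2, 0) = 2(1, 0) gives 2x1 ≤ 5, which is already implied by x1 ≤ 2.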

The Hilbert bases and inequalities d^j x ≤ ⌊d^j x^k⌋ for problem (6) appear in Table 1. (The origin need not be considered as a vertex.) At vertex x^1 = (2.8, 1.3), for example, the two constraints x1 + 4x2 ≤ 8 and 2x1 − 2x2 ≤ 3 are active, and so the cone C1 is spanned by (1, 4) and (1, −1). The Hilbert basis consists of the integer vectors of the region depicted in Fig. 4.

Integer Programming Duality, Figure 4
The black dots indicate a Hilbert basis for the cone spanned by (1, 4) and (1, −1)

The polyhedra P1 and P2 are shown by dashed and dotted lines, respectively, in Fig. 5. Their facets (other than x ≥ 0) and the corresponding vectors m^j appear in Table 2. The matrices M1 and M2 that appear in the rounding function (16) can be read from Table 2.


Another Functional Dual 


It is practical to solve the superadditive dual only when the problem is small or has special structure. An alternative is to derive a dual solution for (12) from the branch and bound tree that solves the primal problem, as proposed in [18] on the basis of work in [24]. This maneuver sacrifices an independently computed upper bound on the optimal value, but it provides useful sensitivity analysis in a more practical fashion than the superadditive dual. It might be called a “branch and bound dual”.


Integer Programming Duality, Table 1
Hilbert basis vectors d^j and rank 1 cuts d^j x ≤ ⌊d^j x^k⌋ corresponding to vertices x^k of an integer programming problem

x^k        | d^j     | d^j x ≤ ⌊d^j x^k⌋
(2.8, 1.3) | (1, −1) | x1 − x2 ≤ 1
           | (1, 0)  | x1 ≤ 2
           | (1, 1)  | x1 + x2 ≤ 4
           | (1, 2)  | x1 + 2x2 ≤ 5
           | (1, 3)  | x1 + 3x2 ≤ 6
           | (1, 4)  | x1 + 4x2 ≤ 8
           | (2, 3)  | 2x1 + 3x2 ≤ 9
(1.5, 0)   | (0, −1) | −x2 ≤ 0
           | (1, −1) | x1 − x2 ≤ 1
           | (1, −2) | x1 − 2x2 ≤ 1
(0, 2)     | (−1, 0) | −x1 ≤ 0
           | (0, 1)  | x2 ≤ 2
           | (0, 2)  | 2x2 ≤ 4
           | (0, 3)  | 3x2 ≤ 6
           | (1, 3)  | x1 + 3x2 ≤ 6

Integer Programming Duality, Table 2
Polyhedra P1, P2 and vectors m^j corresponding to their facets

P_i | Facet         | m^j
P1  | x1 + 3x2 ≤ 6  | (4/5 1/10)
    | x1 ≤ 2        | (1/5 2/5)
    | x1 − x2 ≤ 1   | (0 1/2)
P2  | x1 + 2x2 ≤ 4  | (2/3 1/3 0)
    | x1 − x2 ≤ 1   | (0 0 1)

Integer Programming Duality, Figure 5
The polyhedra P0 (solid line), P1 (dashed line), and P2 (dotted line) for an integer programming problem
Integer Programming Duality, Figure 6
A branch and bound tree with information relevant to the branch and bound dual and the inference dual


Rather than superadditive functions, the feasible set F in (12) will contain functions of the form

$$f(d) = \min\{y d + y_0,\ \max\{f_1(d), f_2(d)\}\}, \qquad (18)$$

where y ≥ 0 and f1 and f2 are either identically zero or of the form (18). Weak duality is easily shown. Strong duality is shown by constructing a solution as follows.

At each node t of the branch and bound tree for (5), one solves the linear relaxation

$$\begin{aligned} \max\ & cx \\ \text{s.t.}\ & Ax \le b && (u) \\ & -x \le -L^t && (\alpha) \\ & x \le U^t && (\beta) \end{aligned} \qquad (19)$$

where the lower and upper bounds L^t, U^t are defined by branching, and associated dual variables are shown on the right. By weak linear programming duality, v_t(b̂) = ub̂ − αL^t + βU^t is an upper bound on the optimal value of (19) with perturbed right-hand side b̂ = b + Δb. If (19) is infeasible, let u, α, β be the dual solution of the phase I problem in which the objective function is the sum of negative constraint violations. In this case v_t(b̂) is −∞ if ub̂ − αL^t + βU^t < 0 and is ∞ otherwise.

Now if t1, t2 are the successor nodes of node t in the search tree,

$$w_t(\hat b) = \min\{v_t(\hat b),\ \max\{w_{t_1}(\hat b), w_{t_2}(\hat b)\}\}$$

is an upper bound on the optimal value of (19) with right-hand side b̂ = b + Δb and integral x. (At leaf nodes, the max expression is omitted.) The recursively computed function w0 associated with the root node solves the dual problem (12) because w0(b) is the optimal value of (5).

The dual solution w0 for the example problem (6) is indicated in the branch and bound tree for this problem depicted in Fig. 6. A plot of w0(b + Δb) as a function of Δb1 appears in Fig. 2.
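The recursion for w_t is short enough to sketch directly. The node bounds below are our transcription of the branch and bound tree for problem (6) in Fig. 6 and should be treated as assumptions: root dual u = (6, 7); node 1 branches on x1 ≤ 2 (u = (2.5, 0), β1 = 17.5); node 4 on x1 ≥ 3 (infeasible, phase I dual u = (0.5, 1), α1 = 2.5); nodes 2 and 3 split node 1 on x2 ≤ 1 and x2 ≥ 2. The `Node` class and the helper names are illustrative.

```python
import math

class Node:
    """A branch and bound node with its LP-duality bound v_t(b) and successors."""
    def __init__(self, v, children=()):
        self.v = v                # callable: b -> v_t(b)
        self.children = children  # successor nodes (empty at a leaf)

def w(node, b):
    """w_t(b) = min{v_t(b), max{w_t1(b), w_t2(b)}}; at a leaf, w_t(b) = v_t(b)."""
    bound = node.v(b)
    if not node.children:
        return bound
    return min(bound, max(w(child, b) for child in node.children))

def infeasible_leaf(phase1_value):
    """v_t(b) for an infeasible node: -inf if the phase I dual value is negative, else +inf."""
    return lambda b: -math.inf if phase1_value(b) < 0 else math.inf

# Assumed transcription of the tree for problem (6); b = (b1, b2).
node2 = Node(lambda b: 50)                                        # x2 <= 1: integral (2, 1)
node3 = Node(lambda b: 20 * b[0] - 140)                           # x2 >= 2: pruned
node1 = Node(lambda b: 2.5 * b[0] + 35, (node2, node3))           # x1 <= 2
node4 = Node(infeasible_leaf(lambda b: 0.5 * b[0] + b[1] - 7.5))  # x1 >= 3: infeasible
root = Node(lambda b: 6 * b[0] + 7 * b[1], (node1, node4))
```

With these bounds w0(8, 3) = 50 and the value stays at 50 for b1 = 8.5, but at b1 = 9 the infeasibility proof at node 4 no longer holds and the bound jumps to the root value 75.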
Inference Duality


Still another interpretation of the linear programming dual views it as an inference problem. It seeks the smallest upper bound z* on the objective function that can be inferred from the constraints. This dual of (1) can be written

$$\min\ z \quad \text{s.t. } \left(\begin{array}{l} Ax \le b \\ x \ge 0 \end{array}\right) \text{ imply } cx \le z. \qquad (20)$$


A corollary of the classical Farkas lemma states that the constraints Ax ≤ b, x ≥ 0 imply cx ≤ z if and only if they are infeasible or some surrogate uAx ≤ ub with u ≥ 0 dominates cx ≤ z; i.e., uA ≥ c and ub ≤ z. So the inference dual (20) seeks the smallest ub for which uA ≥ c and u ≥ 0 (assuming the constraints are feasible), which is precisely the linear programming dual.

The inference dual can be generalized to integer programming if the implication in (20) is interpreted differently [4,13]. Constraints Ax ≤ b, x ≥ 0 imply cx ≤ z if and only if all integer (rather than all real) vectors x that satisfy the former also satisfy the latter. There is obviously no duality gap, because the maximum value z* of cx is the smallest upper bound on cx implied by the constraints. As will be seen, inference duality allows calculation of sensitivity ranges for all problem data (not just right-hand sides) by solving linear programming problems.

To solve the dual (20) is in effect to exhibit a proof that the value of cx is at most z*. In linear programming, a proof is a nonnegative linear combination of constraints, and the optimal dual multipliers u encode the desired proof. A method of proof suitable for integer programming is developed in [12], but for present purposes it suffices to reconstruct a proof from the branch and bound tree that solves the primal problem. Actually it will be proved that cx is at most z* + Δz (for any Δz ≥ 0), to provide a more flexible analysis.

The proof is by contradiction. Assume, contrary to the claim, that the optimal value of (5) is strictly more than z* + Δz. Then each branch of the tree can be seen as leading to a contradiction. At any given leaf node t let z^LB_t be the value of the best integral solution found so far (z^LB_t = −∞ if none has been found). One of the following cases obtains.

a) The linear relaxation (19) is infeasible. Then the dual solution (u^t, α^t, β^t) proves infeasibility; i.e., u^tA − α^t + β^t ≥ 0 and u^tb − α^tL^t + β^tU^t < 0. So the constraints u^tAx ≤ u^tb, L^t ≤ x ≤ U^t, x ≥ 0 are also infeasible. In other words, the bounds L^t ≤ x ≤ U^t are inconsistent with the surrogate u^tAx ≤ u^tb.

b) The solution of (19) is integral with value z_t, where z_t > z^LB_t. So the constraints

$$\begin{aligned} -cx &< -z_t - \Delta z \\ Ax &\le b \\ -x &\le -L^t \\ x &\le U^t \end{aligned} \qquad (21)$$

are infeasible. If (u^t, α^t, β^t) is the dual solution of (19), the multipliers (1, u^t, α^t, β^t) prove infeasibility of (21). This means that the bounds L^t ≤ x ≤ U^t are inconsistent with the surrogate (u^tA − c)x < u^tb − z_t − Δz.

c) The optimal value z̄_t of (19) satisfies z̄_t ≤ z^LB_t, where z^LB_t is the current lower bound (the tree is pruned at this node). Here the bounds L^t ≤ x ≤ U^t are inconsistent with the surrogate (u^tA − c)x < u^tb − z^LB_t − Δz.

Thus there is a contradiction at every leaf node, because the bounds L^t ≤ x ≤ U^t are inconsistent with some surrogate at every leaf node.

The key to sensitivity analysis is that a contradiction remains at every leaf node, and the proof remains valid, so long as the bounds remain inconsistent with the surrogates after perturbation of the data. To analyze how much perturbation is possible, the following observation is helpful. The bounds L ≤ x ≤ U are inconsistent with the inequality dx < δ if and only if there exists a vector d̄ ≥ 0 such that

$$dU - \bar d(U - L) \ge \delta, \qquad \bar d \ge d, \quad \bar d \ge 0. \qquad (22)$$

(Taking d̄ = max{d, 0} componentwise makes the left-hand side of (22) equal to the minimum of dx over the box L ≤ x ≤ U.)
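Condition (22) reduces to a closed-form test once d̄ = max{d, 0} is chosen. The sketch below applies it to two leaf nodes of example (6), using multipliers and bounds read from the branch and bound tree of Fig. 6 (leaf 2: u = 0 and 0 ≤ x ≤ (2, 1); leaf 4: u = (1/2, 1) and x1 ≥ 3) together with a large finite stand-in M = 1000 for missing upper bounds. These values and the helper names are assumptions for illustration. Leaf 4 is case a), whose surrogate is not strict, so there the test requires a strictly positive margin.

```python
from fractions import Fraction as F

def box_min(d, L, U):
    """Minimum of d.x over L <= x <= U, written as dU - dbar(U - L) with dbar = max(d, 0)."""
    dbar = [max(dj, 0) for dj in d]
    return (sum(dj * uj for dj, uj in zip(d, U))
            - sum(bj * (uj - lj) for bj, uj, lj in zip(dbar, U, L)))

def inconsistent(d, delta, L, U, strict=True):
    """True if no x with L <= x <= U satisfies d.x < delta (strict) or d.x <= delta."""
    m = box_min(d, L, U)
    return m >= delta if strict else m > delta

M = 1000  # assumed finite stand-in for a missing upper bound

def leaf2_ok(dc):
    """Leaf 2 (case b): surrogate -(c + dc)x < -50 over 0 <= x <= (2, 1)."""
    return inconsistent((-20 - dc[0], -10 - dc[1]), -50, (0, 0), (2, 1))

def leaf4_ok(db1):
    """Leaf 4 (case a): u = (1/2, 1) gives uA = (5/2, 0) and u(b + db) = 7 + db1/2."""
    return inconsistent((F(5, 2), 0), 7 + F(db1) / 2, (3, 0), (M, M), strict=False)
```

Raising c1 by one breaks the proof at leaf 2, and the infeasibility proof at leaf 4 survives for Δb1 = 9/10 but not for Δb1 = 1.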
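Now let (5) be perturbed as follows:

$$\begin{aligned} \max\ & (c + \Delta c)x \\ \text{s.t.}\ & (A + \Delta A)x \le b + \Delta b \\ & x \ge 0 \text{ and integer.} \end{aligned} \qquad (23)$$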


Thus the violated surrogate in case a) becomes u^t(A + ΔA)x ≤ u^t(b + Δb), and similarly in cases b) and c). Using (22), the optimal value of (23) rises no more than Δz (Δz ≥ 0) if the perturbation satisfies the following for some q̄^t ≥ 0 at every leaf node t:

$$(q^t + \Delta q^t)U^t - \bar q^{\,t}(U^t - L^t) \ge u^t(b + \Delta b) + \hat z_t, \qquad \bar q^{\,t} \ge q^t + \Delta q^t, \quad \bar q^{\,t} \ge 0. \qquad (24)$$

Here q^t = u^tA − u_0c and Δq^t = u^tΔA − u_0Δc, where (u_0, ẑ_t) = (0, ε) in case a), (1, −z_t − Δz) in case b), and (1, −z^LB_t − Δz) in case c), and ε is a small positive number (in case a) the surrogate is not strict, so the bounds must violate it by a positive margin).

This can be checked by linear programming. Note that the perturbations ΔA, Δb, Δc are not restricted to be nonnegative. Ranges for any perturbation can be computed by minimizing and maximizing it subject to (24) with all other perturbations set to zero.

The dual solutions in Fig. 6 suffice to generate the inequalities (24) for the example problem (6). Leaf nodes 2, 3 and 4 respectively illustrate cases b), c) and a). For instance, with Δz = 0 the inequalities for leaf node t = 2 are

$$\begin{aligned} -2\Delta c_1 - \Delta c_2 - 2\bar q_1 - \bar q_2 &\ge 0, \\ \bar q_1 &\ge -20 - \Delta c_1, \\ \bar q_2 &\ge -10 - \Delta c_2, \\ \bar q_1, \bar q_2 &\ge 0. \end{aligned}$$

At leaf nodes 3 and 4 one must assume some large but finite upper bound on variables x_j for which U^t_j is otherwise infinite. The resulting sensitivity range for b1 is given below, along with the ranges yielded by the superadditive dual, the branch and bound dual, and the true value function (the last three from Fig. 2):

• inference dual: −∞ < Δb1 < 1;
• superadditive dual: −∞ < Δb1 < 0.375;
• branch and bound dual: −∞ < Δb1 < 1;
• maximum range: −∞ < Δb1 < 2.

No perturbation within the maximum range causes the optimal value to rise above 50. The various forms of sensitivity analysis generally provide more conservative ranges (the same is true of classical linear programming). This example shows that the superadditive dual, although the hardest to compute, does not necessarily provide the sharpest analysis. The inference dual, unlike the others, provides ranges for all problem data; for the objective coefficients, for instance, it gives

(−∞, −∞) ≤ Δc ≤ (0, 0).

By setting Δz to 10 rather than zero in (24), one obtains ranges within which perturbations do not increase the optimal value more than 10, and so forth.

Like branch and bound duality, inference duality is computationally impractical if the branch and bound tree is too large, although it requires fewer data from the tree. It does not provide an explicit approximation of the value function as superadditive and branch and bound duality do. However, only inference duality provides easily computable sensitivity ranges, not only for right-hand sides but for all problem data.


Integer Programming Duality, Table 3
Properties of five integer programming duals

Type of dual     | Strong duality? | Computational bounds? | Sensitivity analysis?                | Complementary slackness?
Surrogate        | No              | Yes                   | Very limited, even if no duality gap | No
Lagrangian       | No              | Yes                   | For RHS, if no duality gap           | Yes, if no duality gap
Superadditive    | Yes             | Not practical         | For RHS only                         | Yes
Branch and bound | Yes             | No                    | For RHS only                         | No
Inference        | Yes             | No                    | Bounds for all problem data          | No


Conclusions 


Table 3 summarizes the properties of the various duals. The surrogate and Lagrangian duals are used primarily for computational purposes, because they provide independent bounds on the optimal value. The remaining duals are useful for sensitivity analysis. The superadditive and branch and bound duals provide a more complete analysis of right-hand side sensitivity. The latter requires a branch and bound solution of the problem but considerably less computation. Inference duality requires a branch and bound solution and provides only sensitivity ranges, but ranges can be obtained for all problem data by solving linear programming problems.

One can also formulate a dual based on congruence 
relations [21] that is not discussed here. H.P. Williams 
provides an interesting discussion of this and some 
other duals (surrogate, Lagrangian, superadditive) in 
[22]. General treatments of Lagrangian and superaddi- 
tive duality may be found in [17,19], with a brief discus- 
sion of surrogate duality in the former. Excellent pre- 
sentations of Lagrangian duality appear in [6] and [1, 
Chap. 6]. 
Relaxation is important in optimization because it provides bounds on the optimal value of a problem. One of the more popular forms of relaxation is Lagrangian relaxation, which is used in integer programming and elsewhere.

A problem is relaxed by making its constraints 
weaker, so that the feasible set is larger, or by approx- 
imating the objective function. In the case of a mini- 
mization problem, the optimal value of the relaxation is 
a lower bound on the optimal value of the original prob- 
lem. For a maximization problem it is an upper bound. 
The art of relaxation is to design a relaxed problem that 
is easy to solve and yet provides a good bound. 


Purpose of Relaxation 


Relaxation bounds are useful for two reasons. First, they 
can indicate whether a suboptimal solution is close to 
the optimum. If a minimization problem, for example, 
is hard to solve, one might settle for a suboptimal solu- 
tion whose value is close to a known lower bound. An 
optimal solution would not be much better. 

Second, relaxation bounds are useful in accelerating 
a search for an optimal solution. In a solution of an inte- 
ger programming problem, for example, one normally 
solves a relaxation of the problem at each node of the 
branch and bound tree. Suppose again that the objec- 
tive is to minimize. If the value of the relaxation at some 
node is greater than or equal to the value of a feasible so- 
lution found earlier in the search, then there is no point 
in branching further at that node. Any optimal solution
found by branching further will have a value no better
than that of the relaxation and therefore no better than 
that of the solution already found. Lagrangian relax- 
ation is often used in this context, because it may pro- 
vide better bounds than the standard linear program- 
ming (LP) relaxation. 


Lagrangian Relaxation 


Lagrangian relaxation is named for the French math- 
ematician J.L. Lagrange, presumably due to the occur- 
rence of what we now call Lagrange multipliers in his 
calculus of variations [2]. Because this form of relax- 
ation changes the objective function as well as enlarging 
the feasible set, it is necessary to broaden the concept of 
relaxation somewhat. 

Consider the problem of minimizing a function f(x) subject to x ∈ S, where x is a vector of variables and S the set of feasible solutions. The epigraph of the problem is the set of all points (z, x) for which x ∈ S and z ≥ f(x). This is illustrated in Fig. 1. The problem of minimizing f′(x) subject to x ∈ S′ is a relaxation of the original problem if its epigraph contains the epigraph of the original problem. That is, a) S ⊆ S′ and b) f′(x) ≤ f(x) for all x ∈ S. Relaxation is therefore conceived as enlarging the epigraph; enlarging the feasible set is a special case. It is clear that the optimal value of a relaxation still provides a lower bound on the optimal value of the original problem.


Integer Programming: Lagrangian Relaxation, Figure 1
Epigraph of an optimization problem min {f(x): x ∈ S} (darker shaded area) and of a relaxation min {f′(x): x ∈ S′} (darker and lighter shaded areas)
Lagrangian relaxation is available for problems in which some of the constraints are inequalities or equations. Such problems may be written as

$$\begin{aligned} \text{minimize}\ & f(x) && (1) \\ \text{subject to}\ & g(x) \le 0 && (2) \\ & x \in S. && (3) \end{aligned}$$

Here, g(x) is a vector of functions (g1(x), ..., gm(x)), and (2) is a family of m constraints g_i(x) ≤ 0. There is no loss of generality in omitting equality constraints h_j(x) = 0 from this formulation, because they can be written as two inequality constraints, h_j(x) ≤ 0 and −h_j(x) ≤ 0. The constraints (3) may take any form, inequality or otherwise.

The Lagrangian relaxation is formed by “dualizing” the constraints (2):

$$\min\ f(x) + \lambda g(x) \quad \text{s.t. } x \in S. \qquad (4)$$

Here, λ = (λ1, ..., λm) is a vector of nonnegative Lagrange multipliers that correspond to the inequality constraints. The aim of dualization is to remove the hardest constraints from the constraint set, so that the relaxed problem is relatively easy to solve.


The Lagrangian relaxation is in fact a relaxation be- 
cause its epigraph contains the epigraph of the original 
problem (1)-(3). This can be verified by checking con- 
ditions a) and b): 

a) The feasible set of the original problem is a subset of 
the feasible set of the relaxation, because the relax- 
ation omits some of the original constraints. 

b) If x is feasible in the original problem, then f(x) ≥ f(x) + λg(x). This is because λ ≥ 0 and, due to the feasibility of x, g(x) ≤ 0.


The Lagrangian Dual 


A relaxation can be constructed simply by eliminating the constraints (2) rather than dualizing them. One might ask what the advantage of dualization is. One rationale is that when the Lagrange multipliers are properly chosen, the penalties λ_i g_i(x) in the objective function hedge against infeasibility. To the extent that constraints g_i(x) ≤ 0 are violated and the bound thereby weakened, the objective function will be penalized, restoring the quality of the bound.

Fortunately one can search for a proper choice of multipliers. The Lagrangian relaxation is actually a ‘family’ of relaxations, parameterized by the vector λ of multipliers. This provides the possibility of searching over values of λ to find a relaxation that gives a good lower bound on the optimal value.

The problem of finding the best possible relaxation bound is the Lagrangian dual problem. If θ(λ) is the optimal value of the relaxation (4), the Lagrangian dual of (1)-(3) is the problem of maximizing θ(λ) subject to λ ≥ 0.

Under certain conditions the best relaxation bound 
is equal to the optimal value of the original problem (1)- 
(3) [1]. Generally, however, it falls short. The amount 
by which it falls short is the duality gap. 

The Lagrangian dual problem has three attractive features:

• It need not be solved to optimality. Any feasible solution provides a valid lower bound.

• Its objective function θ(λ) is always a concave function of λ. One need only find a local maximum, which is necessarily a global maximum as well.

• Its solutions have a complementary slackness property. If certain λ_i's are positive in an optimal solution of the dual problem, then the corresponding constraints g_i(x) ≤ 0 are satisfied as equations in some optimal solution of the primal problem (1)-(3).

A serious drawback of the Lagrangian dual is that simply evaluating the objective function θ(λ) for a given λ normally requires solution of an optimization problem. The relaxation must be carefully chosen so that this is practical. Moreover the function θ is typically nondifferentiable.

Why is the Lagrangian dual a ‘dual’? Perhaps because it generalizes the LP dual, which is the Lagrangian dual of an LP problem. To see this, consider the LP problem min {cx: Ax ≥ a, x ≥ 0}. Its Lagrangian dual is to maximize

$$\theta(\lambda) = \min_{x \ge 0}\{cx + \lambda(a - Ax)\} = \min_{x \ge 0}\{(c - \lambda A)x + \lambda a\}$$

over λ ≥ 0. So θ(λ) is −∞ if some component of c − λA is negative and is λa otherwise. This means that maximizing θ(λ) over λ ≥ 0 is equivalent to maximizing λa subject to λA ≤ c and λ ≥ 0, which is precisely the LP dual.

The duality relationship holds more convincingly, 
however, between two problem-solving strategies: solu- 
tion of strengthenings and solution of relaxations [12]. 
Methods that solve strengthenings include branching 
methods, local search heuristics, and other techniques 
that enumerate solutions or partial solutions by fixing 
some or all of the variables. Solution of each strength- 
ening provides an upper bound on the optimal value, 
and the goal is to find the smallest upper bound. If the 
search is exhaustive, the smallest upper bound is equal 
to the optimal value. 

The dual strategy is to solve relaxations of the prob- 
lem in order to find the largest possible lower bound 
on the optimal value. There is no obvious way to enu- 
merate relaxations, however, unless they are somehow 
parametrized, in which case one can enumerate values 
of the parameters. The Lagrangian dual is one way of 
doing this but by no means the only. Another is the sur- 
rogate dual [7,8,9], in which the relaxed constraint set 
is a nonnegative linear combination of inequality con- 
straints, and relaxations are parametrized by the vector 
of multipliers in the linear combination. The dual ap- 
proach also differs from the primal in that an exhaustive 
enumeration normally does not guarantee that the best 
bound obtained is equal to the optimal value. There is 
usually a duality gap. 


Integer Programming 


The application of Lagrangian ideas to integer programming dates back at least to H. Everett [4]. In this arena the optimization problem (1)-(3) becomes

$$\begin{aligned} \min\ & cx \\ \text{s.t.}\ & Ax \le a \\ & Bx \le b \\ & x_j \text{ integer for all } j. \end{aligned} \qquad (5)$$

The ‘hard’ constraints Ax ≤ a are dualized in the Lagrangian relaxation,

$$\begin{aligned} \min\ & cx + \lambda(Ax - a) \\ \text{s.t.}\ & Bx \le b \\ & x_j \text{ integer for all } j, \end{aligned} \qquad (6)$$

and θ(λ) is the minimum value of this problem for a given λ. The optimal value z_LD of the Lagrangian dual is a lower bound on the optimal value z_IP of (5). It will be seen shortly that the bound z_LD is at least as good as the bound z_LP obtained by solving the LP relaxation of (5). (The LP relaxation is the result of dropping the integrality constraints.)

In the context of integer programming, the Lagrangian function θ(λ) is not only concave but piecewise linear. This is because θ(λ) is the minimum of a set of linear functions cx + λ(Ax − a) over all integral values of x that satisfy Bx ≤ b.

A fundamental property of the Lagrangian dual is that z_LD is equal to the optimal value z_C of

$$\min\ cx \quad \text{s.t. } Ax \le a,\ \ x \in \text{conv}(S), \qquad (7)$$

where S is the set of integer points satisfying Bx ≤ b, and conv(S) is the convex hull of S [6]. The Lagrangian dual can therefore be written as an LP problem, if a linear description of conv(S) is available.

The reasoning behind this fact goes as follows. Because the feasible set of (7) is that of an LP problem, the optimal value of its Lagrangian dual is equal to z_C. To see that it is also equal to z_LD, thereby proving z_C = z_LD, it suffices to show that the Lagrangian relaxation of (7) always has the same optimal value as the Lagrangian relaxation of (5). But this is true because the former is the same problem as the latter, except that the constraints in the former are x ∈ conv(S) and in the latter are x ∈ S. This substitution has no effect on the optimal value because the objective function is linear.

It can now be seen that the bound z_LD is always at least as good as z_LP. Let C_IP be the problem (7) corresponding to (5), and let C_LP be the problem (7) corresponding to the LP relaxation of (5). C_LP's feasible set contains that of C_IP, and its optimal value is therefore less than or equal to z_LD. But because C_LP is identical to (5)'s LP relaxation, z_LP ≤ z_LD.

When Bx ≤ b happens to describe a polyhedron whose vertices have integral coordinates, C_IP and C_LP are the same problem. In this case z_LP = z_LD.

To sum up,

$$z_{LP} \le z_{LD} = z_C \le z_{IP},$$

where the first inequality is an equation when Bx ≤ b describes an integral polyhedron.


Integer Programming: Lagrangian Relaxation, Figure 2
Feasible set of an integer programming problem (large dots) and its linear programming relaxation (area shaded by small dots). The point (2, 0) is the optimal solution, and (2.5, 0) is the solution of the LP relaxation

As an example consider the integer programming problem (Fig. 2):

$$\begin{aligned} \min\ & -2x_1 - x_2 \\ \text{s.t.}\ & 4x_1 + 5x_2 \le 10 \\ & 0 \le x_j \le 3,\ x_j \text{ integer},\ j = 1, 2. \end{aligned} \qquad (8)$$

The optimal solution is x = (2, 0), with value z_IP = −4.


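Dualizing the first constraint decouples the variables:

$$\theta(\lambda) = \min_{\substack{0 \le x_j \le 3 \\ x_j \text{ integer}}} \{-2x_1 - x_2 + \lambda(4x_1 + 5x_2 - 10)\} = \min_{\substack{0 \le x_j \le 3 \\ x_j \text{ integer}}} \{(4\lambda - 2)x_1 + (5\lambda - 1)x_2 - 10\lambda\}.$$

Because of the decoupling, θ(λ) is easily computed:

$$\theta(\lambda) = \begin{cases} 17\lambda - 9 & \text{if } 0 \le \lambda \le 1/5, \\ 2\lambda - 6 & \text{if } 1/5 \le \lambda \le 1/2, \\ -10\lambda & \text{if } \lambda \ge 1/2. \end{cases}$$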


It is evident in Fig. 3 that θ is a concave, piecewise linear function. The optimal value of the Lagrangian dual is z_LD = θ(1/2) = −5, resulting in a duality gap of z_IP − z_LD = 1. The optimal value of the LP relaxation is likewise −5, so that in the present case z_LP = z_LD. This is predictable because Bx ≤ b consists of the bounds 0 ≤ x_j ≤ 3, which define an integral polyhedron.


Integer Programming: Lagrangian Relaxation, Figure 3
The Lagrangian function θ(λ) for an integer programming problem. The optimal value of the Lagrangian dual problem is θ(1/2) = −5
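The closed-form θ(λ) and the optimal sets of the relaxation for example (8) can be confirmed by enumerating the 16 integer points of the box. The sketch uses exact fractions so that ties at the breakpoints λ = 1/5 and λ = 1/2 are detected rather than lost to rounding; the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

C = (-2, -1)        # objective of (8): min -2x1 - x2
A, a = (4, 5), 10   # dualized constraint 4x1 + 5x2 <= 10
DOMAIN = range(4)   # 0 <= xj <= 3, xj integer

def lagrangian(lam, x):
    """cx + lam(Ax - a) for the relaxation of (8)."""
    return (sum(cj * xj for cj, xj in zip(C, x))
            + lam * (sum(aj * xj for aj, xj in zip(A, x)) - a))

def theta(lam):
    """theta(lam) and the set of optimal solutions of the relaxation, by enumeration."""
    values = {x: lagrangian(lam, x) for x in product(DOMAIN, repeat=2)}
    best = min(values.values())
    return best, sorted(x for x, v in values.items() if v == best)

def subgradient_interval(lam):
    """Smallest and largest slope Ax - a over the optimal solutions at lam."""
    _, X = theta(lam)
    slopes = [sum(aj * xj for aj, xj in zip(A, x)) - a for x in X]
    return min(slopes), max(slopes)

# Brute-force optimum of (8) itself.
z_ip = min(-2 * x1 - x2 for x1, x2 in product(DOMAIN, repeat=2) if 4 * x1 + 5 * x2 <= 10)
```

Evaluating `theta` on a grid of λ values reproduces the three linear pieces, and the largest value found, −5 at λ = 1/2, sits one unit below z_ip = −4.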

In practical applications, the Lagrangian relaxation 
is generally constructed so that it can be solved in poly- 
nomial time. It might be a problem in which the vari- 
ables can be decoupled, as in the above example, or 
whose feasible set is an integral polyhedron. Popular 
relaxations include assignment or transportation prob- 
lems, which can be solved quickly. 

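A notable example is the traveling salesman problem on n cities:

$$\begin{aligned} \text{minimize}\ & \sum_{ij} c_{ij}x_{ij} && (9) \\ \text{subject to}\ & \sum_j x_{ij} = 1, \text{ all } i, && (10) \\ & \sum_i x_{ij} = 1, \text{ all } j, && (11) \\ & \sum_{i \notin V} \sum_{j \in V} x_{ij} \ge 1, \text{ all nonempty } V \subseteq \{2, \dots, n\}, && (12) \\ & x_{ij} \ge 0,\ x_{ij} \text{ integral, all } i, j. && (13) \end{aligned}$$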
If the assignment constraints (10) are dualized, the Lagrangian relaxation minimizes

$$\sum_{ij} c_{ij} x_{ij} + \sum_i \lambda_i \Big(\sum_j x_{ij} - 1\Big) = \sum_{ij} (c_{ij} + \lambda_i) x_{ij} - \sum_i \lambda_i$$

subject to (11)-(13). This is equivalent to finding a minimum-cost spanning arborescence rooted at node 1, which can be done in polynomial time [3]. Alternatively, the Lagrangian relaxation can be solved as an LP problem without the integrality constraints, because the same optimal value results [3]. See [10,14] for a survey of efforts along this line.


Solving the Dual 


Subgradient optimization is a popular method for solving the Lagrangian dual, because subgradients of $\theta$ (and gradients when they exist) can be readily calculated.

Let $X(\bar\lambda)$ be the set of optimal solutions of the Lagrangian relaxation (4) when $\lambda = \bar\lambda$. If $X(\bar\lambda)$ is a singleton $\{\bar x\}$, then the gradient of $\theta$ at $\bar\lambda$ is simply the vector $g(\bar x)$. This is because for values of $\lambda$ in a neighborhood of $\bar\lambda$, $\theta(\lambda)$ is the linear function $f(\bar x) + \lambda g(\bar x)$.

More generally, for any $\bar x \in X(\bar\lambda)$, $g(\bar x)$ is a subgradient of $\theta$ at $\bar\lambda$. In fact, every subgradient of $\theta$ at $\bar\lambda$ is a convex combination of subgradients that correspond to the solutions in $X(\bar\lambda)$.

In the integer programming case, the subgradients of $\theta$ at $\bar\lambda$ are $A\bar x - a$ for each $\bar x \in X(\bar\lambda)$, and convex combinations thereof. Consider the example (8), where

$$X(\lambda) = \begin{cases} \{(3,3)\} & \text{if } 0 \le \lambda < 1/5,\\ \{(3,0), (3,1), (3,2), (3,3)\} & \text{if } \lambda = 1/5,\\ \{(3,0)\} & \text{if } 1/5 < \lambda < 1/2,\\ \{(0,0), (1,0), (2,0), (3,0)\} & \text{if } \lambda = 1/2,\\ \{(0,0)\} & \text{if } \lambda > 1/2. \end{cases}$$

Thus at $\lambda = 0$, $\theta$ has the gradient (slope) $4(3) + 5(3) - 10 = 17$. At $\lambda = 1/5$, the subgradients of $\theta$ are 17, 2, and their convex combinations; i.e., all slopes in the interval $[2, 17]$. This can be seen in Fig. 3.
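Subgradient optimization on example (8) can be sketched as follows. This is a hypothetical, minimal Python version: the diminishing step-size rule $s_k = s_0/(k+1)$ is one common choice assumed here (the article does not prescribe one), and each step is projected onto $\lambda \ge 0$ because the dualized constraint is an inequality. Since $\theta$ need not increase at every iteration, the best value seen so far is kept. The function names are illustrative.

```python
from itertools import product

BOX = list(product(range(4), repeat=2))  # integer points of 0 <= xj <= 3

def f(x):
    return -2 * x[0] - x[1]

def g(x):
    return 4 * x[0] + 5 * x[1] - 10      # dualized constraint g(x) <= 0

def solve_relaxation(lam):
    """Return one optimal x of the Lagrangian relaxation for this lam."""
    return min(BOX, key=lambda x: f(x) + lam * g(x))

def subgradient_ascent(lam=0.0, step0=0.1, iters=2000):
    """Projected subgradient method with diminishing steps step0 / (k + 1)."""
    best_val, best_lam = float("-inf"), lam
    for k in range(iters):
        x = solve_relaxation(lam)
        val = f(x) + lam * g(x)          # theta(lam)
        if val > best_val:               # theta is not monotone along the iterates
            best_val, best_lam = val, lam
        # move along the subgradient g(x), then project onto lam >= 0
        lam = max(0.0, lam + step0 / (k + 1) * g(x))
    return best_val, best_lam

print(subgradient_ascent())
```

The iterates oscillate around $\lambda = 1/2$ with shrinking amplitude, where the slope changes from 2 to $-10$, so the best value approaches $\theta(1/2) = -5$ from below.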


Further Reading and Extensions 


A lucid geometrical exposition of Lagrangian duality may be found in [1, Chap. 6]. A widely read treatment of its application to integer programming is [5]. A recent tutorial is [11], which also surveys methods for solving the dual. [13, Sect. III.2.6] describes some methods for strengthening the Lagrangian relaxation. There is a vast literature on applications and enhancements.

The idea of the Lagrangian dual need not be lim- 
ited to the use of Lagrange multipliers. A dual prob- 
lem can be solved over any parametrized family of re- 
laxations. The dual problem might be solved by a local 
search heuristic over the parameter space. 
Introduction


Integrated planning and scheduling of events in production systems are among the most important factors that affect the efficient operation of these systems. The main objective of integrated planning and scheduling is to allocate resources that change dynamically over time and to coordinate the activities required to satisfy customer demand. One difference between planning and scheduling is the time scale on which their decisions are made. During the planning phase, available resources are allocated and the given demand is satisfied over a medium-term horizon that is usually expressed in months. During the scheduling phase, however, the main decisions are the short-term allocation of available resources and the timing of the production of specific orders, on time scales of days to weeks.

Integrated Planning and Scheduling, Figure 1
Time relation between planning and scheduling (labels: planning horizon; time period for planning; time slot for scheduling)

In both the planning and the scheduling phase, the time horizon is divided into time slots. In terms of execution, the main difference between planning and scheduling problems is the length of these time scales: the time scale of the planning phase is longer, and the time scales of the scheduling phase are shorter, as shown in Fig. 1.

In a realistic case, the time period for the planning phase is measured in weeks or months; therefore, the total production and the inventory of each product at the end of each time period are the only performance measures of the system. For scheduling, however, the length of the time slots is measured in hours, so in addition to the production quantity, the production sequence of each product becomes important.

Traditionally, planning and scheduling have been performed separately on the shop floor, but separating the two activities in this way reduces the efficiency of the operations performed in the production centers. Moreover, medium-term plans may turn out to be infeasible if they are made without considering short-term performance requirements, and conversely, short-term plans may lead to myopic decisions if long-term performance issues are not considered.

Ideally, an integrated model is required in which 
planning and scheduling can be considered simultane- 
ously. This model should include both medium-term 
capacity utilization and production level values and 
short-term production sequence and machine assign- 
ment decisions. Due to simultaneous consideration of 
medium- and short-term decisions, the representation 
of time slots is one of the primary concerns in inte- 
grated planning and scheduling problems. 


Representation of Time Slots 


The first decision that should be taken during the mod- 
eling of the planning and scheduling of a system is the 
representation of time. Time can be represented in in- 
tegrated planning and scheduling problems using dis- 
crete or continuous formulations [2]. The choice of 
time representation is related to whether an event in the 
system can take place only at predefined times or at any 
instant. If the events can take place only at predefined 
times, then a discrete time representation is required. 

Discrete Time Model: In a discrete time model, the whole planning horizon is divided into predefined time intervals. This representation assumes that an event can occur only at the boundaries of the time intervals [4]. Therefore, during the solution of the model, no time points other than the boundaries of a finite number of time slots are considered, and this simplification makes the model more tractable. To capture the exact behavior of the system during the planning horizon, however, the time slots should be kept as short as possible, which may cause an explosive increase in the number of variables. On the other hand, increasing the length of the time slots may give infeasible results.
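Both effects of the slot length can be seen in a toy calculation. The Python sketch below is a hypothetical illustration, not taken from the article: it assumes a uniform grid that must fit inside the horizon, one binary assignment variable per (product, slot) pair, and task durations rounded up to whole slots. The function name `discrete_grid` and all data (a 168-hour week, three tasks of 30, 45 and 50 hours) are made up.

```python
import math

def discrete_grid(horizon_h, slot_h, durations_h, n_products):
    """Hypothetical toy: size and feasibility of a uniform discrete time grid.

    Each task must occupy a whole number of slots, so its duration is rounded up.
    """
    n_slots = int(horizon_h // slot_h)               # slots that fit in the horizon
    n_binaries = n_products * n_slots                # one assignment binary per (product, slot)
    slots_needed = sum(math.ceil(d / slot_h) for d in durations_h)
    return n_slots, n_binaries, slots_needed <= n_slots

durations = [30, 45, 50]  # made-up processing times in hours
for slot in (1, 35):
    print(slot, discrete_grid(168, slot, durations, len(durations)))
```

With 1-hour slots the grid has 168 slots and 504 assignment binaries, and the tasks fit. With 35-hour slots only 12 binaries remain, but the rounded-up durations need 5 of the 4 available slots, so the coarse model reports infeasibility even though 125 of the 168 hours would suffice.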

Continuous Time Model: In the continuous time model, an event can occur at any instant within the planning horizon. This makes the model more realistic and flexible, and the total number of variables decreases. The representation of some constraints becomes more complex, however, and this decreases the tractability of the model [2].

Mixed Time Representation: A mixed time representation that includes both discrete and continuous time has also been studied [3]. In a purely discrete representation, the time slots are fixed and the durations of the processes are kept constant. In the mixed time representation, by contrast, the durations of the processing tasks are expressed as variables. This is accomplished by restricting the durations of the processing tasks to multiples of a fixed time grid.


Problem Definition 


In this section, the model given by Dogan and Grossmann [1] is examined in detail. Several products are to be produced in a single production unit. The planning horizon is divided into planning periods of one week, and each week-long planning period is divided into N time slots, where N is the number of products to be produced. The demand for each product is specified at the end of each week. The production rate of each product is constant, and production also incurs a cost in the model. Sequence-dependent transition times between products are given.

The problem is to determine which products should be produced each week and in what sequence. During sequencing, the timing, amount and duration of production are determined for each product. In addition, the inventory levels must be determined for each time period.

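The MILP Model:

The objective of the model is the maximization of profit:

$$\max \sum_i \sum_t p_{it} S_{it} - \sum_i \sum_t c_i^{inv} \mathit{Area}_{it} - \sum_i \sum_t c_{it}^{oper} X_{it} - \sum_i \sum_k \sum_l \sum_t c_{ik}^{trans} Z_{iklt} - \sum_i \sum_k \sum_t c_{ik}^{trans} \mathit{TRT}_{ikt} \tag{1}$$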
In the objective function, $p_{it}$ is the price of product $i$ in time period $t$, $S_{it}$ is the amount of product $i$ sold in time period $t$, $c_i^{inv}$ is the inventory cost, $\mathit{Area}_{it}$ is the area below the inventory-versus-time graph for product $i$ in time period $t$, $c_{it}^{oper}$ is the operating cost for product $i$ in time period $t$, $X_{it}$ is the amount of product $i$ produced in time period $t$, $c_{ik}^{trans}$ is the transition cost from product $i$ to product $k$, $Z_{iklt}$ is a binary variable indicating that production of product $i$ is followed by product $k$ in time slot $l$ of time period $t$, and $\mathit{TRT}_{ikt}$ is a binary variable indicating that production of product $i$ is followed by product $k$ at the end of time period $t$.

The first term of the objective function gives the revenue generated, the second term is the inventory cost, the third term is the total operating cost, the fourth term is the transition cost within each week, and the last term is the transition cost between consecutive weeks.

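The assignment of production orders and the corresponding processing times for each order are given by the following:

$$\sum_i W_{ilt} = 1 \quad \forall l, t \tag{2}$$
$$\theta_{ilt} \le H_t W_{ilt} \quad \forall i, l, t \tag{3}$$
$$\Theta_{it} = \sum_l \theta_{ilt} \quad \forall i, t \tag{4}$$
$$X_{ilt} = r_i \theta_{ilt} \quad \forall i, l, t \tag{5}$$
$$X_{it} = \sum_l X_{ilt} \quad \forall i, t \tag{6}$$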


In Eqs. (2)-(6), $W_{ilt}$ is a binary variable that denotes the production of product $i$ in time slot $l$ of time period $t$, $\theta_{ilt}$ is the production time of product $i$ in time slot $l$ of time period $t$, $H_t$ is the length of time period $t$, $\Theta_{it}$ is the total production time of product $i$ in time period $t$, $r_i$ is the production rate of product $i$, and $X_{ilt}$ is the production amount of product $i$ in time slot $l$ of time period $t$.

Equation (2) states that exactly one product is assigned to each time slot. According to Eq. (3), the production time of product $i$ in time slot $l$ of time period $t$ is zero if this product is not assigned to that time slot. Equation (4) states that the total production time of product $i$ in time period $t$ equals the sum of the production times of this product over the time slots of that period. Equation (5) gives the production amount of product $i$ in time slot $l$ of time period $t$, and Eq. (6) calculates the total production of product $i$ in time period $t$.

The transition from one product to another is expressed by the following constraint:

$$Z_{iklt} \ge W_{ilt} + W_{k,l+1,t} - 1 \quad \forall i, k, l, t \tag{7}$$


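Equation (7) ensures that if product $k$ is produced in slot $l+1$ immediately after product $i$ in slot $l$ of time period $t$, then $Z_{iklt}$ equals 1; otherwise it can take the value 0, which it does at the optimum because transitions are penalized in the objective.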


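An important consideration is the start and end times of each task in the production schedule. The following constraints are used to calculate them:

$$Te_{lt} = Ts_{lt} + \sum_i \theta_{ilt} + \sum_i \sum_k \tau_{ik} Z_{iklt} \quad \forall l, t \tag{8}$$
$$\mathit{TRT}_{ikt} \ge W_{ilt} + W_{k,l',t+1} - 1 \quad \forall i, k, t,\; l = N,\; l' = 1 \tag{9}$$
$$Te_{lt} + \sum_i \sum_k \tau_{ik} \mathit{TRT}_{ikt} = Ts_{l',t+1} \quad \forall t,\; l = N,\; l' = 1 \tag{10}$$
$$Te_{lt} = Ts_{l+1,t} \quad \forall l \ne N,\; t \tag{11}$$
$$Te_{Nt} \le \mathit{HT}_t \quad \forall t \tag{12}$$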

In Eq. (8), $Te_{lt}$ is the end time of time slot $l$ of time period $t$, $Ts_{lt}$ is the start time of time slot $l$ of time period $t$, and $\tau_{ik}$ is the transition time between product $i$ and product $k$. Equation (8) states that the end of time slot $l$ of time period $t$ equals its start time plus the total processing time of the products produced in that slot and the total transition time. In Eq. (9), $\mathit{TRT}_{ikt}$ equals 1 if product $i$ is produced at the end of period $t$ and product $k$ is produced at the beginning of period $t+1$.

Equations (10) and (11) ensure the connectivity between consecutive time slots, across and within time periods, respectively. In Eq. (12), $\mathit{HT}_t$ is the length of time period $t$; Eq. (12) ensures that the end time of the last time slot in each time period cannot exceed the length of the corresponding time period.
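For a fixed slot assignment, the timing constraints reduce to simple bookkeeping. The Python sketch below is a hypothetical illustration for a single week that evaluates Eqs. (2)-(8), (11) and (12); it ignores the end-of-week transitions of Eqs. (9)-(10). It assumes that a missing entry in the transition table means zero transition time, and the function name `week_timing` and all data are made up.

```python
def week_timing(sequence, durations, rate, tau, horizon, start=0.0):
    """Evaluate a fixed slot assignment for one week (single unit).

    sequence[l]  : product in slot l (exactly one per slot, as in Eq. (2))
    durations[l] : production time theta of that slot
    rate[i]      : production rate r_i, so amount = r_i * theta (Eq. (5))
    tau[(i, k)]  : transition time from product i to product k (missing -> 0)
    """
    Theta = {i: 0.0 for i in rate}                   # Eq. (4): total time per product
    for p, d in zip(sequence, durations):
        Theta[p] += d
    X = {i: rate[i] * Theta[i] for i in rate}        # Eqs. (5)-(6): amount per product
    Ts, Te, t = [], [], start
    for l, (p, d) in enumerate(zip(sequence, durations)):
        Ts.append(t)
        # Eqs. (7)-(8): a transition i -> k is charged to slot l when slot l+1 makes k
        nxt = sequence[l + 1] if l + 1 < len(sequence) else None
        trans = tau.get((p, nxt), 0.0) if nxt is not None else 0.0
        t += d + trans
        Te.append(t)                                 # Eq. (11): next slot starts here
    return Theta, X, Ts, Te, Te[-1] <= horizon       # Eq. (12)

Theta, X, Ts, Te, ok = week_timing(
    ["A", "B", "C"], [40.0, 30.0, 50.0],
    rate={"A": 10.0, "B": 5.0, "C": 8.0},
    tau={("A", "B"): 4.0, ("B", "C"): 6.0},
    horizon=168.0,
)
print(Te, X, ok)
```

Here the slots end at 44, 80 and 130 hours, well within the 168-hour week; in the MILP these quantities are decision variables rather than computed values.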

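The inventory levels for each product are updated with the following constraints:

$$\mathit{INV}_{it} = \mathit{INV}_{i0} + \sum_l r_i \theta_{ilt} \quad \forall i,\; t = 1 \tag{13}$$
$$\mathit{INV}_{it} = \mathit{INVO}_{i,t-1} + \sum_l r_i \theta_{ilt} \quad \forall i,\; t \ne 1 \tag{14}$$
$$\mathit{INVO}_{it} = \mathit{INV}_{it} - S_{it} \quad \forall i, t \tag{15}$$
$$\mathit{Area}_{it} \ge \mathit{INVO}_{i,t-1} H_t + r_i \Theta_{it} H_t \quad \forall i, t \tag{16}$$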

In Eq. (13), $\mathit{INV}_{it}$ is the inventory level of product $i$ in time period $t$ and $\mathit{INV}_{i0}$ is the initial inventory of product $i$; Eq. (13) updates the inventory of each product in the first time period. In Eq. (14), $\mathit{INVO}_{i,t-1}$ is the inventory of product $i$ in time period $t-1$ after the demand for that product has been satisfied; Eq. (14) establishes the relationship between inventory and production quantity for all periods other than the first. According to Eq. (15), the inventory of product $i$ remaining after its demand is satisfied equals the total inventory of product $i$ minus the sales of this product in each time period. In Eq. (16), $\mathit{Area}_{it}$ is the area below the inventory-versus-time plot for product $i$ in time period $t$. Because the exact area is nonlinear, the constraint overestimates this area.
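The inventory recursion can be traced for one product over a few periods. The following Python sketch is a hypothetical illustration: it applies Eqs. (13)-(15) and evaluates the right-hand side of Eq. (16) at equality. It assumes that inventory may not become negative (raising an error if sales exceed stock), and the function name `inventory_balance` and the data are invented for the example.

```python
def inventory_balance(inv0, produced, sales, period_len):
    """Eqs. (13)-(16) for one product over consecutive periods (illustrative).

    produced[t] = sum over slots of r_i * theta_ilt, sales[t] = S_it,
    period_len[t] = H_t.  Returns (INV_it, INVO_it, Area bound) per period.
    """
    rows = []
    inv_prev = inv0                         # INV_i0 for t = 1, else INVO_i,t-1
    for x_t, s_t, h_t in zip(produced, sales, period_len):
        inv = inv_prev + x_t                # Eqs. (13)/(14)
        invo = inv - s_t                    # Eq. (15): inventory left after sales
        if invo < 0:
            raise ValueError("sales exceed available inventory")
        area = inv_prev * h_t + x_t * h_t   # Eq. (16) at equality: an over-estimate
        rows.append((inv, invo, area))
        inv_prev = invo
    return rows

print(inventory_balance(20, [100, 50], [80, 60], [168, 168]))
```

The area bound treats all of a period's production as if it were available from the start of the period, which is why it overestimates the true area under the inventory curve.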

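The demand for products is incorporated into the integrated planning and scheduling model with the following constraints:

$$S_{it} \ge d_{it} \quad \forall i, t \tag{17}$$
$$\mathit{NY}_{it} = \sum_l W_{ilt} \quad \forall i, t \tag{18}$$
$$\mathit{YOP}_{it} \ge W_{ilt} \quad \forall i, l, t \tag{19}$$
$$\mathit{YOP}_{it} \le \mathit{NY}_{it} \le N \cdot \mathit{YOP}_{it} \quad \forall i, t \tag{20}$$
$$\mathit{NY}_{it} \ge N - \Big(\sum_{i'} \mathit{YOP}_{i't} - 1\Big) - M(1 - W_{i1t}) \quad \forall i, t \tag{21}$$
$$\mathit{NY}_{it} \le N - \Big(\sum_{i'} \mathit{YOP}_{i't} - 1\Big) + M(1 - W_{i1t}) \quad \forall i, t \tag{22}$$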


In Eqs. (17)-(22), $d_{it}$ is the demand for product $i$ in time period $t$, $\mathit{NY}_{it}$ is the number of time slots in which product $i$ is produced in time period $t$, $\mathit{YOP}_{it}$ is a binary variable that shows whether product $i$ is produced during time period $t$, and $M$ is a sufficiently large number. Equation (17) ensures that at least the demand is satisfied. Equation (18) counts the number of time slots in which product $i$ is produced within time period $t$. Equation (19) ensures that if product $i$ is produced in time slot $l$ of time period $t$, then $\mathit{YOP}_{it}$ equals 1. Equations (21) and (22) prevent the occurrence of degenerate solutions. According to the model formulation, if a product is produced in more than one time slot in a given time period, these time slots should be consecutive; if production takes place in non-consecutive time slots, the solution obtained will be suboptimal because of the transition costs. The reason for degeneracy in this consecutive production is illustrated in Fig. 2.

Integrated Planning and Scheduling, Figure 2
An example of a degenerate solution

As seen in Fig. 2, both solutions give the same objective value, since the total production time of all products is the same; therefore, without Eqs. (21) and (22) the model formulation would be highly degenerate.


Solution Strategy 


The integrated planning and scheduling model presented in Eqs. (1)-(22) gives rise to a complex mixed-integer programming problem. In the model, inventory and demand satisfaction are tracked weekly, because these two performance measures are planned over the medium-term horizon. Timing and assignment constraints, however, are applied to each time slot within a week. An important assumption of this model concerns the production center: it is assumed that there is only one production center, which carries out all of the planning and scheduling activities, yet the model is intractable even for this special case. This intractability is not specific to this formulation, as most planning and scheduling models that include realistic details are intractable. Two approaches are considered for addressing the intractability of the problem. In the first approach, an integrated model is first formulated and then, according to some defined criterion, a decomposition scheme is applied to the model. In the second approach, a simple model such as single-item capacitated or uncapacitated lot sizing is formulated, and a superposition of all the simple models is then derived to obtain the planning and scheduling scheme.

In the first approach, planning and scheduling problems are considered in an integrated manner, and the quality of the result depends on the decomposition scheme. The aim of the decomposition is to reduce the intractability of the model by removing from the formulation those details that make it computationally complex, namely the scheduling constraints that ensure feasibility on the shop floor. In this approach, an iterative solution procedure is applied to keep the problem computationally tractable while generating feasible schedules.

According to this method, an integrated planning and scheduling model is first constructed, and then the details that make the model difficult to solve are removed or aggregated over the time domain or with respect to common parts. This high-level model is solved to obtain a tentative solution, which is then checked against the scheduling criteria. If the obtained solution is feasible on the shop floor, it is accepted. Otherwise, the procedure returns to the high-level model and modifies it so that it produces a feasible plan. The quality of the high-level plan depends on the accuracy of this model: if the model is not accurate enough to produce a feasible solution, many iterations may be needed, which is undesirable for the dependability and tractability of the approach.

To avoid the direct solution of the integrated MILP planning and scheduling model, Dogan and Grossmann [1] proposed a bilevel decomposition algorithm based on a hierarchical decomposition scheme. In this scheme, the original model is decomposed into two separate models. The first is the upper-level planning model, which determines the products to be produced and the levels of production and inventory in each time period. The second is the lower-level planning and scheduling model, which is the originally formulated detailed model: in the lower-level problem, the original model formulation is solved only for the products that the upper-level planning model has decided to produce.

The upper-level planning problem is an MILP model, and it is used to predict an upper bound on the objective of the original model formulation. This bound is obtained by ignoring the detailed sequencing constraints that are important for the scheduling model. The result of the lower-level model, which is obtained by using the result of the upper-level model, provides a lower bound on the global optimum. Because the lower-level model, solved with the decisions of the upper-level planning problem fixed, is a subproblem of the original model, the solution it produces is feasible.

The proposed algorithm is applied iteratively. The upper bound found by the upper-level planning problem and the lower bound found by the detailed planning and scheduling model are compared. If the difference between these bounds is less than a predefined tolerance, the algorithm terminates. Otherwise, integer and logic cuts are added to the upper-level planning model to obtain a more refined solution of the original model.
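The bound-comparison loop can be sketched independently of any particular solver. The Python below is a hypothetical illustration, not the authors' implementation: `solve_upper` and `solve_lower` are stand-ins for the two MILP models, and the only cut used is a simple "no-good" cut that excludes an already examined upper-level plan (a simplification of the integer and logic cuts described above). The toy data assume a per-product margin and a made-up changeover loss of 4 units for every product beyond the first, which only the lower level sees.

```python
from itertools import combinations

def bilevel_decomposition(solve_upper, solve_lower, tol=1e-6, max_iter=50):
    """Iterate an upper-level (relaxed) model and a lower-level (detailed) model.

    solve_upper(cuts) -> (upper_bound, plan) over plans not excluded by cuts
    solve_lower(plan) -> profit of a feasible detailed schedule for that plan
    """
    cuts, best_lb, best_plan = [], float("-inf"), None
    for it in range(1, max_iter + 1):
        ub, plan = solve_upper(cuts)
        if plan is None:                  # every plan has been cut off
            break
        lb = solve_lower(plan)
        if lb > best_lb:
            best_lb, best_plan = lb, plan
        if ub - best_lb <= tol:           # bounds have met: stop
            break
        cuts.append(plan)                 # no-good cut excluding this plan
    return best_lb, best_plan, it

# Hypothetical toy data (made up for illustration)
MARGIN = {"A": 10.0, "B": 8.0, "C": 3.0}

def solve_upper(cuts):
    plans = [frozenset(c) for r in range(len(MARGIN) + 1)
             for c in combinations(MARGIN, r) if frozenset(c) not in cuts]
    if not plans:
        return float("-inf"), None
    best = max(plans, key=lambda p: sum(MARGIN[i] for i in p))
    return sum(MARGIN[i] for i in best), best

def solve_lower(plan):
    return sum(MARGIN[i] for i in plan) - 4.0 * max(len(plan) - 1, 0)

print(bilevel_decomposition(solve_upper, solve_lower))
```

In this toy run the upper level first proposes all three products (bound 21, detailed profit 13), then products A and B (bound 18, profit 14), and stops on the third iteration once the best remaining upper bound, 13, falls below the incumbent 14.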

Zhu and Majozi [6] proposed another decomposition algorithm to address the intractability of a planning and scheduling model for multipurpose batch plants. They classified the economic concerns of the model as part of the planning problem and its sequencing concerns as part of the scheduling problem. The proposed decomposition of the detailed planning and scheduling problem is based on the assumption of a block angular structure for the model, with two types of blocks. The first type consists of the constraints that are common to all plants in the system; in the multi-plant model, these blocks are concerned with the allocation of resources such as raw materials and labor. The second type consists of the sets of constraints that are specific to each plant; these blocks are linked to the first type through the resources allocated by the first-type constraints. In the context of the planning and scheduling problem, the first type of block represents the planning problem and the second type represents the scheduling problem of each plant, separately. The proposed solution lies in extracting the block angular structure of the integrated model. After this step, the model is decomposed into two separate parts. The first part consists only of planning blocks; within these constraints, the allocations of common resources are planned, so this part can be classified as the planning model. The second part consists only of scheduling blocks, which include a separate set of constraints for each plant. In this second part, the aim is to obtain the optimal schedule for each plant, and it is therefore classified as a scheduling problem.

To apply this decomposition scheme successfully, the integrated model must satisfy two main conditions. The first condition is the convexity of the constraints. If that is the case, the feasible region of the integrated model forms a convex set, and therefore the schedules obtained from the separate scheduling block problems for a given optimal resource allocation provide the globally optimal solution of the integrated model. The second condition is the block angular decomposability of the integrated model formulation. To satisfy this condition, it is necessary to show that all scheduling constraints can be separated into non-intersecting sets of constraints, which makes the scheduling part of the decomposition scheme plant specific. On the shop floor, this structure can be achieved if the manufacturing of each product begins and ends in the same plant.

After the model has been decomposed, an iterative solution approach is applied to obtain a feasible and near-optimal schedule. First, the planning problem that includes the A blocks is solved, and the allocation of the common resources to each plant is obtained. This result is then incorporated into separate scheduling models to produce detailed schedules for each plant. If the results of the scheduling problems do not match the targets of the planning problem, the planning problem is re-solved using the results obtained from the scheduling problems. This procedure continues until the differences between the results of the planning problem and those of the scheduling problems fall below a small threshold value.

In the second approach, a superposition of simple and frequently studied models, such as lot-sizing models, is used. Pochet and Wolsey [5] proposed a single-item decomposition method to deal with the intractability problem. In this method, a capacitated lot-sizing problem is solved for each finished product individually. Each model is solved to satisfy the demand for the end product, and the inventory and production quantity of the end product are also monitored during the solution process. Because the models are solved for each product separately, however, and some products may share common resources, infeasibilities may arise when the individual schedules are combined. The superposition of the individual models is required to address this problem. The bill of materials (BOM) should be used during the superposition, since each end product is produced from many intermediate products. The single-item decomposition technique begins by solving a capacitated lot-sizing model for each product, which constitutes the master production scheduling (MPS) model [5]. After the MPS is solved, the batch size of each product over the planning horizon is obtained. The batch sizes are then decomposed into batches of intermediate products by using the BOM. A rough-cut capacity planning (RCCP) check is executed in parallel to verify roughly the feasibility of the MPS with respect to the available capacity. In cases of infeasibility, the MPS model is revisited or the capacity of the scarce resource is increased.
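The BOM explosion and rough-cut capacity check can be illustrated for a single period. The Python sketch below is hypothetical: the MPS quantities, the two-level bill of materials, the hours per unit on each resource and the capacities are all made up, and the functions `explode` and `rough_cut` are illustrative names, not part of any cited method or library.

```python
def explode(mps, bom):
    """Gross requirements of every item implied by end-product quantities (BOM explosion)."""
    req = {}

    def add(item, qty):
        req[item] = req.get(item, 0) + qty
        for child, per_unit in bom.get(item, {}).items():
            add(child, qty * per_unit)   # recurse down the bill of materials

    for item, qty in mps.items():
        add(item, qty)
    return req

def rough_cut(req, hours_per_unit, capacity):
    """Aggregate load per resource and report resources whose capacity is exceeded."""
    load = {}
    for item, qty in req.items():
        for res, h in hours_per_unit.get(item, {}).items():
            load[res] = load.get(res, 0) + qty * h
    return load, {r: l for r, l in load.items() if l > capacity.get(r, 0)}

# Made-up single-period data
mps = {"P1": 10, "P2": 5}                                 # MPS batch sizes of end products
bom = {"P1": {"I1": 2}, "P2": {"I1": 1, "I2": 3}}         # intermediates per unit of product
hours = {"I1": {"reactor": 1.0}, "I2": {"reactor": 0.5},
         "P1": {"packing": 0.2}, "P2": {"packing": 0.2}}
req = explode(mps, bom)
load, overloaded = rough_cut(req, hours, {"reactor": 30.0, "packing": 5.0})
print(req, overloaded)
```

Here the exploded requirements load the reactor with 32.5 hours against 30 available, so the MPS would have to be revisited or reactor capacity increased, as the text describes.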


Conclusions 


The execution of planning and scheduling tasks sep- 
arately on the shop floor causes infeasible and sub- 
optimal decisions. To prevent these problems, it is 
necessary to optimize the planning and scheduling 
tasks simultaneously but models in which planning 
and scheduling are integrated can become computa- 
tionally intractable because of the highly complex struc- 
ture of the model. To deal with this problem, two ap- 
proaches can be used. In the first approach the detailed 
planning and scheduling model is decomposed into 
simpler ones, and these models are solved iteratively 
with the detailed original model until a feasible solution 
that has satisfactory objective value is obtained. In con- 
trast, in the second approach, a single model is solved 
for each product individually and then superposition of 
these models is used to determine the feasibility of the 
results with respect to some shared resources. 
The problem of optimizing a linear function of an unknown symmetric positive semidefinite matrix, subject to finitely many linear equations and inequalities, is called the semidefinite programming (SDP) problem. As is clear from the above definition, the SDP problem is a generalization of the very well known linear programming (LP) problem.

Parallel to the great success of LP in applications, at 
the time of this writing, there is a tremendous amount 
of activity and excitement regarding the applications of 
SDP. Even at these early stages of the development, the 
applications are far-reaching. 

The SDP problem brings together many fields of engineering, computer science and mathematics. The theory and practice of SDP both draw from and contribute to a very large number of fields. For applications in combinatorial optimization see [4], for eigenvalue optimization see [8], and for applications in engineering, system and control theory see [22]. Also see the special issue of Mathematical Programming dedicated to SDP [14], as well as the Handbook on Semidefinite Programming [18] and the proceedings of a Fields Institute workshop [15].


Preliminaries 


Let a symmetric $n \times n$ matrix $X$ with real entries be given. Then $X$ is positive semidefinite, denoted $X \succeq 0$, if $u^T X u \ge 0$ for all $u \in \mathbb{R}^n$. $X$ is positive definite, denoted $X \succ 0$, if $u^T X u > 0$ for all $u \in \mathbb{R}^n \setminus \{0\}$. To define linear functions of this variable $X$, one equips the space of $n \times n$ matrices with an inner product: $\langle C, X\rangle$ denotes the trace of $C^T X$. Using this notation, one can define a specific form of the SDP problem. Let a symmetric $n \times n$ matrix $C$, a column vector $b$, and $m$ symmetric $n \times n$ matrices $A_1, A_2, \ldots, A_m$ be given. Then the following is the primal form of SDP:

$$\text{(P)}\quad \min\ \langle C, X\rangle \quad \text{s.t.}\quad \langle A_i, X\rangle = b_i,\ i \in \{1, \ldots, m\}, \quad X \succeq 0.$$


Any SDP problem can be put into the above form. The dual of (P) can be defined (similarly to the LP dual) as follows:

$$\text{(D)}\quad \max\ b^T y \quad \text{s.t.}\quad \sum_{i=1}^m y_i A_i + S = C, \quad S \succeq 0.$$


Without loss of generality, it can be assumed that the matrices $A_1, A_2, \ldots, A_m$ are linearly independent. If they are linearly dependent, then either the system $\langle A_i, X\rangle = b_i$, $i \in \{1, \ldots, m\}$, has no solution, or there are some redundant equations which can be eliminated. In the first case, (P) is infeasible. In the second case, all redundant equations and the corresponding $A_i$, $b_i$ can be eliminated, to arrive at an equivalent problem satisfying the assumption. Under this assumption, for any solution $(y, S)$ of the equation $\sum_{i=1}^m y_i A_i + S = C$, the $S$ part of the solution uniquely identifies $y$. Sometimes, in interior-point algorithms, it is convenient to refer only to $S$ when one mentions a feasible solution of (D).
Even though one writes “min” and “max” in the definitions of (P) and (D), the optimum values of these problems may not be attained, and even if they are attained, the primal and dual objective values may not equal each other. Thus, the duality theory for SDP is quite a bit more complicated than that for LP. (See [16] and the references therein for a discussion of various duals and duality theorems.)

However, if we assume that there exists $X \succ 0$ such that

$$\langle A_i, X\rangle = b_i, \quad i \in \{1, \ldots, m\},$$

and that there exists $(y, S)$ such that

$$\sum_{i=1}^m y_i A_i + S = C, \quad S \succ 0,$$

then the duality theory guarantees that the optimum values of both problems (P) and (D) are attained and that they are equal.

Now, we introduce the very general notion of interior-point methods. These are methods for solving convex optimization problems by generating a sequence that lies in the relative interior of a convex set defined by the “difficult” constraints. In this definition, one envisions an abstract formulation of convex optimization problems in which one has a maximal set of linear equality constraints and some convex set constraint. The convex set constraint is the “difficult” constraint.

The basic idea of interior-point methods goes back at least to Frisch [3] (1950s) and to Fiacco and McCormick [2] (1960s). However, modern interior-point algorithms have their origins in Karmarkar’s groundbreaking work [6].

The general definition above does not reflect the fact that much of the interest is in algorithms that can be proven to generate approximately optimal solutions in a number of iterations polynomial in the dimension of the problem and the desired accuracy, prescribed as part of the input. The amount and type of work required per iteration will be described shortly. Certain practical variants of such interior-point algorithms turn out to be very fast and robust for a wide range of applications. Indeed, just as for any other optimization problem, algorithms that perform very well in extensive computational tests spark great interest in theoretical investigations, which further our comprehension of the efficiency of interior-point methods as well as our understanding of the degree of difficulty of certain SDP problems for interior-point approaches.

At the time of this writing, the most popular al- 
gorithms are the primal-dual ones. These algorithms 
work almost equally hard in improving both the pri- 
mal and the dual solutions. However, for certain appli- 
cations, primal-only (or dual-only) algorithms, which 
work almost exclusively on the primal problem and 
use the dual to only generate bounds on the optimum 
value, are indispensable. This is usually due to the spe- 
cial structure of the problem at hand. The main ingre- 
dients of interior-point methods for SDP will be illus- 
trated for primal-dual algorithms. 

Interior-point algorithms can be classified with respect to many criteria. A rather obvious criterion is the initial iterate $(X^{(0)}, y^{(0)}, S^{(0)})$. All interior-point algorithms start with $X^{(0)} \succ 0$, $S^{(0)} \succ 0$, and keep all iterates positive definite. If the algorithm allows $X^{(0)}$ or $(y^{(0)}, S^{(0)})$ not to satisfy the corresponding equality constraints, then the algorithm is called an infeasible-start interior-point algorithm. In this article, the details of the methods will be illustrated mostly for feasible starting points.

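If $X$ is feasible in (P) and $(y, S)$ is feasible in (D), then

$$\langle C, X\rangle - b^{T}y = \langle X, S\rangle \ge 0.$$

The identity follows by substituting $C = \sum_{i=1}^{m} y_i A_i + S$ and $\langle A_i, X\rangle = b_i$; nonnegativity holds because the inner product of two positive semidefinite matrices is nonnegative. In the above, $\langle X, S\rangle = 0$ if and only if both $X$ and $(y, S)$ are optimal in their respective problems. For this reason, $\langle X, S\rangle$ is called the duality gap.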
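The identity can be checked numerically. The sketch below assumes NumPy and uses randomly generated data (nothing here comes from the article): it makes $X$ feasible for (P) by choosing $b_i := \langle A_i, X\rangle$, makes $(y, S)$ feasible for (D) by choosing $C := \sum_i y_i A_i + S$, and then compares both sides of the identity.

```python
import numpy as np

def inner(U, V):
    """Trace inner product <U, V> = trace(U^T V)."""
    return float(np.sum(U * V))

def random_spd(n, rng):
    """A random symmetric positive definite n x n matrix."""
    G = rng.normal(size=(n, n))
    return G @ G.T + np.eye(n)

rng = np.random.default_rng(1)
n, m = 4, 3
A = [(B + B.T) / 2 for B in rng.normal(size=(m, n, n))]   # symmetric constraint matrices
X = random_spd(n, rng)                                      # primal interior point X > 0
b = np.array([inner(Ai, X) for Ai in A])                    # makes X feasible for (P)
y = rng.normal(size=m)
S = random_spd(n, rng)                                      # dual slack S > 0
C = sum(yi * Ai for yi, Ai in zip(y, A)) + S                # makes (y, S) feasible for (D)

gap = inner(C, X) - b @ y
print(gap, inner(X, S))   # the two numbers agree up to rounding
```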

Next, some important concepts used in the making of an interior-point algorithm are mentioned. Once the initial iterate $(X^{(0)}, y^{(0)}, S^{(0)})$ with $X^{(0)} \succ 0$, $S^{(0)} \succ 0$ is given, one needs to generate a search direction $(d_X, d_y, d_S)$ along which to move. Then a step size $\alpha > 0$ describing how far to move in the given search direction must be determined. Practical methods allow two different step sizes: one for the primal iterate $X^{(k)}$, and the other for the dual iterate $(y^{(k)}, S^{(k)})$. Below, the main steps of a very basic interior-point algorithm for SDP are given. For this description and the rest of the article, a common step size $\alpha$ is used for both iterates.

    INPUT instance (C, A_1, ..., A_m, b), initial iterate (X^(0), y^(0), S^(0)), tolerance ε;
    iteration counter k := 0;
    WHILE <X^(k), S^(k)> > ε DO
        find a search direction (d_X, d_y, d_S);
        find a step size α > 0 such that
            X^(k) + α d_X ≻ 0,  S^(k) + α d_S ≻ 0;
        set (X^(k+1), y^(k+1), S^(k+1)) := (X^(k), y^(k), S^(k)) + α (d_X, d_y, d_S);
        set k := k + 1;
    END {WHILE};
    OUTPUT (X^(k), y^(k), S^(k))

Interior Point Methods for Semidefinite Programming, Algorithm 1
Main steps of an interior-point algorithm

The way in which the search directions and the step sizes are determined has a very significant impact on the performance of interior-point methods. Two of the main theoretical foundations for choosing the search direction are mentioned below.


Path-Following Algorithms 


Many practical algorithms are influenced by this foundation. One first picks a barrier function $F(X)$. This is a function defined on the set of symmetric positive definite matrices such that, for every sequence of matrices from this set converging to a point on the boundary of the set, the value of $F$ tends to infinity. For improved theoretical and/or practical performance, one must impose further conditions on the barrier function. For the purposes of this article, the barrier function (which possesses many such desirable properties) is

$$F(X) := -\ln\det X,$$

the negative of the logarithm of the determinant of $X$. Consider the family of optimization problems parameterized by $\mu > 0$:

$$(P_\mu)\qquad \min\ \langle C, X\rangle + \mu F(X)\quad \text{s.t.}\quad \langle A_i, X\rangle = b_i\ \ \forall i,\quad X \succ 0.$$


The unique minimizer of this optimization problem defines a point $(X(\mu), y(\mu), S(\mu))$ on the primal-dual central path. Here, $y(\mu)$ and $S(\mu)$ represent the dual variables for the equality and the semidefiniteness constraints of $(P_\mu)$, respectively. As $\mu \to 0$, $(X(\mu), y(\mu), S(\mu))$ converges to optimal solutions of (P) and (D).

The primal-dual central path can be expressed more explicitly: $(X(\mu), y(\mu), S(\mu))$ is the unique solution of the following system:

$$\langle A_i, X\rangle = b_i\ \ \forall i,\quad X \succ 0,$$
$$\sum_{i=1}^{m} y_i A_i + S = C,$$
$$S = \mu X^{-1}.$$


Path-following algorithms choose the search direc- 
tion to approximately follow this path. They are usually 
based on Newton’s method or are related to it. For a re- 
view of such search directions, see [7] and [19]. 


Potential Reduction Algorithms 


For these algorithms, one defines a potential function, based on a barrier function, to measure how good a given point is with respect to the duality gap and the proximity to the central path:

$$\varphi(X, S) = (n + q)\ln\langle X, S\rangle + F(X) + F(S).$$

One chooses $q := \Theta(\sqrt{n})$ for the (currently) best theoretical complexity results and $q := \Theta(n)$ or larger for better practical performance. In this setting, the search directions are usually obtained by computing a steepest-descent direction for the potential function $\varphi(X, S)$ and projecting it onto the appropriate linear subspace to satisfy the equality constraints in (P) and (D).


Step Size 
Once the search direction $(d_X, d_y, d_S)$ is computed, a practical interior-point algorithm usually calculates $\alpha$ as a constant fraction of the maximum step size that keeps the next iterate positive definite. In a path-following algorithm, for robust performance or for good theoretical results, one might want to confine all (or some) of the iterates to a neighborhood of the central path. In a potential reduction algorithm, one usually chooses the value of $\alpha$ as the minimizer of $\varphi$ along the given search direction. There are many other possibilities; see the potential reduction survey [20] and the references therein (such as [11] and [12]).

Currently, interior-point methods provide the fastest algorithms in theory with respect to the worst-case complexity bounds proven so far. In current practice, interior-point methods also provide the fastest and most robust solution techniques for general SDP problems. One such theoretical result can be summarized as follows.


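Theorem: Let $(X^{(0)}, y^{(0)}, S^{(0)})$ be an interior feasible solution of (P) and (D), and let $\epsilon > 0$ be such that

$$n\ln\frac{\langle X^{(0)}, S^{(0)}\rangle}{n} + F(X^{(0)}) + F(S^{(0)}) \le \sqrt{n}\,\ln(1/\epsilon).$$

Then (certain variants of) the potential-reduction interior-point algorithm described above will generate, in $O(\sqrt{n}\,\ln(1/\epsilon))$ iterations, feasible interior points $X$, $S$ such that

$$\langle X, S\rangle \le \epsilon.$$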


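In the above theorem, the function

$$n\ln\frac{\langle X, S\rangle}{n} + F(X) + F(S)$$

is a proximity measure. It is nonnegative for every pair of interior points $(X, S)$, and it is equal to zero if and only if $(X, S)$ lies on the central path (for $\mu := \langle X, S\rangle/n$). Both facts follow from the arithmetic-geometric mean inequality applied to the eigenvalues of $X^{1/2} S X^{1/2}$, whose sum is $\langle X, S\rangle$ and whose product is $\det X \det S$; equality holds exactly when all these eigenvalues equal $\mu$, that is, when $S = \mu X^{-1}$.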
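A small numerical check of these two properties, assuming NumPy: $F$ is evaluated with `numpy.linalg.slogdet`, a point on the central path is built as $S = \mu X^{-1}$ for a random $X \succ 0$, and a generic interior point is taken as $S = I$. The data, the value $\mu = 0.5$ and the function names are illustrative only.

```python
import numpy as np

def F(X):
    """Barrier F(X) = -ln det X for a symmetric positive definite X."""
    sign, logdet = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("X must be positive definite")
    return -logdet

def proximity(X, S):
    """Proximity measure n ln(<X, S>/n) + F(X) + F(S) for interior points X, S."""
    n = X.shape[0]
    return n * np.log(np.sum(X * S) / n) + F(X) + F(S)

rng = np.random.default_rng(0)
G = rng.normal(size=(3, 3))
X = G @ G.T + np.eye(3)            # a random interior point X > 0
mu = 0.5
S_central = mu * np.linalg.inv(X)  # S = mu X^{-1}: on the central path
S_off = np.eye(3)                  # a generic interior point off the path
print(proximity(X, S_central), proximity(X, S_off))
```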

Alizadeh [1], and Nesterov and Nemirovskii [10] 
independently generalized interior-point methods to 
SDP problems. The above theorem is a specialization 
of a more general result for convex programming prob- 
lems from [10]. A similar result was independently ob- 
tained for SDP [1]. 


Search Directions 


In obtaining search directions, both path-following and potential reduction approaches end up with some linear system of equations, defined by the input of the problem, the current iterate $(X^{(k)}, y^{(k)}, S^{(k)})$ and some other parameters of the algorithm. The resulting system has the following structure:

$$\langle A_i, d_X\rangle = r^{P}_i,\quad i = 1, \dots, m,$$
$$\sum_{i=1}^{m} (d_y)_i A_i + d_S = r^{D},$$
$$\mathcal{E}\,d_X + \mathcal{F}\,d_S = r^{XS},$$

where $r^{P}$ is an $m$-vector representing the primal residual (infeasibility of $X^{(k)}$), $r^{D}$ is an $n \times n$ symmetric matrix representing the dual residual (infeasibility of $(y^{(k)}, S^{(k)})$), and $\mathcal{E}$ and $\mathcal{F}$ are linear operators on $n \times n$ symmetric matrices. The operators $\mathcal{E}$ and $\mathcal{F}$ vary from one algorithm to the next and depend on the current iterate, as well as on some other parameters of the underlying algorithm. The choice of $\mathcal{E}$ and $\mathcal{F}$ can have profound theoretical and/or practical effects on the performance of the algorithm. Finally, $r^{XS}$ denotes an $n \times n$ symmetric matrix, a residual related to the desired value of $\langle X, S\rangle$ for the next iterate as well as the desired value of the proximity measure for the next iterate.

In solving such systems to determine the search di- 
rections, many tools from numerical analysis become 
relevant. Moreover, one must exploit the existing struc- 
ture of the problem at hand to be able to efficiently solve 
the large-scale instances. 
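To make Algorithm 1 concrete, here is a minimal NumPy sketch of a feasible-start primal-dual method. The search direction is one assumed choice among many: it linearizes $XS = \sigma\mu I$, i.e. it takes $\mathcal{E}(d_X) = d_X S^{(k)}$, $\mathcal{F}(d_S) = X^{(k)} d_S$ and $r^{XS} = \sigma\mu I - X^{(k)}S^{(k)}$ with $\mu = \langle X^{(k)}, S^{(k)}\rangle/n$, eliminates $d_X$ and $d_S$ to get an $m \times m$ system for $d_y$, and symmetrizes $d_X$ afterwards (an HKM-type direction). The step size is a fixed fraction of the largest common step keeping both iterates positive definite. The parameter values ($\sigma = 0.3$, fraction $0.9$) and all names are illustrative, not taken from the article.

```python
import numpy as np

def inner(U, V):
    """Trace inner product <U, V> = trace(U^T V)."""
    return float(np.sum(U * V))

def max_step(X, D):
    """Largest alpha with X + alpha * D positive definite (np.inf if unbounded)."""
    Linv = np.linalg.inv(np.linalg.cholesky(X))
    lam_min = np.linalg.eigvalsh(Linv @ D @ Linv.T)[0]
    return np.inf if lam_min >= 0 else -1.0 / lam_min

def feasible_ipm(C, A, b, X, y, S, eps=1e-8, sigma=0.3, frac=0.9, max_iter=200):
    """Feasible-start primal-dual loop following the steps of Algorithm 1."""
    n = C.shape[0]
    k = 0
    while inner(X, S) > eps and k < max_iter:
        mu = inner(X, S) / n
        Sinv = np.linalg.inv(S)
        # Schur complement system M d_y = rhs with M_ij = trace(A_i X A_j S^{-1}).
        M = np.array([[np.trace(Ai @ X @ Aj @ Sinv) for Aj in A] for Ai in A])
        rhs = np.array([bi - sigma * mu * np.trace(Ai @ Sinv) for Ai, bi in zip(A, b)])
        dy = np.linalg.solve(M, rhs)
        dS = -sum(dyi * Ai for dyi, Ai in zip(dy, A))   # keeps sum_i y_i A_i + S = C
        dX = sigma * mu * Sinv - X - X @ dS @ Sinv      # from d_X S + X d_S = sigma mu I - X S
        dX = (dX + dX.T) / 2                            # symmetrize
        # Common step size: a fixed fraction of the largest step keeping both iterates PD.
        alpha = min(1.0, frac * min(max_step(X, dX), max_step(S, dS)))
        X, y, S = X + alpha * dX, y + alpha * dy, S + alpha * dS
        k += 1
    return X, y, S, k

# Toy SDP: min <C, X> s.t. trace(X) = 1, X >= 0; its optimal value is lambda_min(C).
C = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
A = [np.eye(3)]
b = np.array([1.0])
lam_min = np.linalg.eigvalsh(C)[0]
X0 = np.eye(3) / 3                   # strictly feasible primal start
y0 = np.array([lam_min - 1.0])
S0 = C - y0[0] * np.eye(3)           # eigenvalues >= 1, so S0 is positive definite
X, y, S, k = feasible_ipm(C, A, b, X0, y0, S0)
print(k, inner(C, X), lam_min)
```

On this toy problem every iterate commutes with $C$, so the method behaves like a primal-dual LP method on the eigenvalues of $C$; larger instances would need the structure exploitation mentioned above.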

As shown in [10], many of the fundamental ideas of 
interior-point methods can be applied to general con- 
vex programming problems. For primal-dual interior- 
point algorithms for general convex programming 
problems, see also [9] and [21]. The most general the- 
oretical results in this direction rely on the existence 
of very special barrier functions for every convex set 
(see [10], also see [5] for connections of interior-point 
methods to many other branches of mathematics via 
the barrier functions for SDP and more general convex 
optimization problems). 

Another important issue is that of the initial point 
and the detection of infeasibility in interior-point meth- 
ods. For such problems, the quality of the initial point 
and the value of the condition measures (measuring in 
the input space how far a given instance is from the 
boundary separating feasible and infeasible instances, 
etc. [17]) for the given instance are good attributes for 
evaluating the performance of the methods. 
Interval analysis can provide valuable tools in several
aspects of chemical process design, including steady- 
state process simulation and optimization, and the ini- 
tial synthesis and screening of process alternatives. The 
discussion below highlights the use of interval analy- 
sis in these areas. For a general description of prob- 
lems and issues in chemical engineering design, see [8] 
and [7]. 


Process Simulation 


Process simulators are used to compute the perfor- 
mance of a chemical process given its design (the pro- 
cess simulation problem), or to compute a design that 
meets given performance specifications (the process de- 
sign problem). In either case, the central problem in 
steady-state process simulation is the solution of an
$n \times n$ system of nonlinear algebraic equations
$f(x) = 0$, where $n$ may be very large (hundreds of thousands
or more) and the equation system represents a math- 
ematical model of the process, including material and 
energy balances, thermodynamic equilibrium relation- 
ships, and other equations needed to describe the pro- 
cess. 

For solving the process model, Newton and quasi- 
Newton methods are widely used, but may not reliably 
converge, especially since a good initial guess is often 
hard to obtain. To improve convergence in these circumstances, various approaches have been used. These include trust region techniques, such as the dogleg method [4], and homotopy continuation methods (e.g., [10]).
An additional difficulty is that in process simulation there are invariably upper and lower bounds on the
variables, $x^{L} \le x \le x^{U}$, violation of which may cause
some functions to become undefined. Bounds are of- 
ten dealt with in an ad hoc manner involving trunca- 
tion or reflection of the correction step. A more natu- 
ral way of dealing with bounds is to use a mathematical 
programming approach (e. g., [1]), in which the bounds 
become an integral part of the problem. While a num- 
ber of the techniques noted above demonstrate excel- 
lent global convergence properties in practice, none of- 
fer a rigorous mathematical guarantee of convergence. 

A further difficulty in solving the nonlinear equa- 
tion systems arising in process simulation is that they 
may have multiple solutions. With the exception of ho- 
motopy based methods, none of the techniques men- 
tioned above are designed for finding multiple solutions 
when they exist. While in practice homotopy based 
methods are frequently able to locate all solutions to 
a problem, they offer no guarantee that all solutions 
have been found, except in special cases. 

All of the difficulties noted above, namely the lack 
of good initial guesses, the presence of variable bounds, 
and the possibility of multiple solutions, can be dealt 
with using interval analysis. For example, R.E. Swaney 
and C.E. Wilhelm [11] use a technique, based on re- 
peated solution of linear programs, which, through 
the use of bounds generated using interval analysis 
within a branch and bound framework, provides rig- 
orous global convergence to a solution of the process 
model. 

C.A. Schnepper and M.A. Stadtherr [5] use an inter- 
val Newton approach. This can rigorously enclose any 
and all solutions to the process model, and is essen- 
tially initialization independent, since it requires only 
initial intervals for the variables, and some of these 
bounds may be specified as part of the problem. Both 
serial and parallel implementations are described in [5], 
and provision is made for efficient handling of sparse 
matrices. Several example problems were successfully 
solved, ranging in size from 3 to 177 variables, includ- 
ing problems with multiple solutions. Performance on 
the larger problems was unpredictable, with two prob- 
lems of over one hundred variables being solved very ef- 
ficiently, even with very large initial bounds on the vari- 
ables, but one problem of 50 variables being unsolvable 
due to excessive computation time. However, for this
50-variable problem, once smaller, more intelligently
chosen (using knowledge of boiling points and critical 
temperatures) initial intervals were used, the problem 
was easily and efficiently solved. 
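As an illustration of the interval Newton idea (not of the sparse, parallel solver of [5]), the following sketch encloses every zero of a scalar function on an initial interval. It uses plain floating point without outward rounding, so unlike a genuine interval implementation it is not rigorous; the test function $f(x) = x^2 - 2$ and all names are hypothetical. A box is discarded when the range enclosure of $f$ excludes zero, bisected when the derivative enclosure contains zero or the Newton step makes little progress, and otherwise intersected with the Newton operator $N(X) = m - f(m)/F'(X)$.

```python
import math

def interval_newton(f, F, dF, lo, hi, tol=1e-10):
    """All-zeros search on [lo, hi]. f is the point function; F and dF return
    (low, high) enclosures of f and f' over a box [a, b]."""
    work, roots = [(lo, hi)], []
    while work:
        a, b = work.pop()
        flo, fhi = F(a, b)
        if flo > 0 or fhi < 0:
            continue                          # 0 not in F([a, b]): no zero here
        if b - a < tol:
            roots.append((a, b))              # narrow enough: record the enclosure
            continue
        dlo, dhi = dF(a, b)
        mid = 0.5 * (a + b)
        if dlo <= 0 <= dhi:
            work += [(a, mid), (mid, b)]      # f' may vanish: bisect
            continue
        q1, q2 = f(mid) / dlo, f(mid) / dhi
        na = max(a, mid - max(q1, q2))        # N([a, b]) intersected with [a, b]
        nb = min(b, mid - min(q1, q2))
        if na > nb:
            continue                          # empty intersection: no zero here
        if nb - na > 0.5 * (b - a):
            m2 = 0.5 * (na + nb)
            work += [(na, m2), (m2, nb)]      # little progress: bisect
        else:
            work.append((na, nb))
    return roots

def f_sq(x):
    return x * x - 2.0

def F_sq(a, b):
    """Range enclosure of x^2 - 2 over [a, b]."""
    if a >= 0:
        lo, hi = a * a, b * b
    elif b <= 0:
        lo, hi = b * b, a * a
    else:
        lo, hi = 0.0, max(a * a, b * b)
    return lo - 2.0, hi - 2.0

def dF_sq(a, b):
    """Enclosure of the derivative 2x over [a, b]."""
    return 2.0 * a, 2.0 * b

roots = interval_newton(f_sq, F_sq, dF_sq, -3.0, 3.0)
print(roots)   # two narrow boxes, around -sqrt(2) and +sqrt(2)
```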


Process Optimization 


Perhaps the most natural formulation for a process de- 
sign problem is as an optimization problem. The pro- 
cess simulation problem is then viewed as an optimiza- 
tion problem with zero degrees of freedom. A typi- 
cal process optimization problem features a nonlin- 
ear objective function, nonlinear equality constraints 
(the process model), nonlinear inequality constraints, 
and upper and lower bounds on variables. Frequently 
these nonlinear programming problems are nonconvex 
as well, prompting interest in global optimization tech- 
niques to deal with the potential for multiple extrema. 

Several approaches to global optimization in pro- 
cess engineering have been proposed, including both 
deterministic and nondeterministic methods. Among 
the deterministic techniques used are branch and 
bound, cutting plane, primal-dual decomposition, and 
interval analysis. The work of R. Vaidyanathan and 
M.M. El-Halwagi [9] provides a good example of the 
use of interval analysis in this context. This is an inter- 
val branch and bound approach that is guaranteed to 
yield the global solution. The procedure is accelerated 
by using a ‘distrust region’ method for eliminating in- 
feasible portions of the search space and by use of local 
methods for some purposes. 

R.P. Byrne and I.D.L. Bogle [3] and Byrne [2] also 
use an interval branch and bound approach, but treat 
the interval lower bounding process as a convex pro- 
gramming problem. In [2] it is also shown how this 
interval-based approach can be applied in the context 
of modular process optimization software. Since mod- 
ular software predominates commercially, this work is 
of particular interest. 


Process Synthesis 


Before the process simulation and optimization prob- 
lems discussed above can be formulated, it is neces- 
sary to synthesize and screen process alternatives. These 
provide the base case problems for later process simu- 
lation and optimization studies. When there is uncertainty in design specifications, or when the design specification covers a range of values, interval analysis can be a particularly useful tool in process synthesis.

For example, see [6] for a problem involving the
processing of high-level nuclear waste. In this problem,
the waste to be processed is characterized by intervals of 
composition, as are the requirements for a stable glass 
product. In [6], an interval propagation scheme that ex- 
ploits the structure of the problem and a simple pro- 
cess model is developed, and it is demonstrated how to 
use this to screen process alternatives and to infer other 
knowledge about the process design. 


Conclusion 


Interval analysis provides tools that can be used to 
solve process simulation problems with complete re- 
liability, providing a method that can guarantee with 
mathematical and computational certainty that the cor- 
rect result is found, and thus eliminating computa- 
tional problems that are encountered with conventional 
techniques. The method is essentially initialization in- 
dependent, deals with variable bounds naturally, and 
also guarantees the enclosure of multiple solutions if 
present. In process optimization, similar guarantees can 
also be provided that the global extremum has been 
found. There are many other problems in chemical pro- 
cess design, for instance in many aspects of process syn- 
thesis, that likewise are amenable to solution using this 
powerful approach. 
Optimization can involve differential equations, for in-
stance in the formulation of constraints. Interval anal- 
ysis provides methods for computing interval-valued 
functions, for example polynomials with interval coef- 
ficients, guaranteed to contain solutions to differential 
equations. Methods have been developed for initial and 
boundary value problems for both ordinary (ODE) and 
partial (PDE) differential equations [1]-[32]. 

For the initial value problem in ODEs, the Cauchy- 
Peano approach of classical analysis can be made into 
a constructive method using interval analysis. With in- 
terval arithmetic and interval extensions to standard 
functions, we can computationally verify sufficient con- 
ditions for existence of solutions, as well as construct 
upper and lower bounds on solutions. The techniques 
of automatic differentiation provide for efficient use of 
and (using interval arithmetic) bounding of remainder 
terms in Taylor series expansions, making interval Tay- 
lor series an effective method for initial value problems 
in ODEs [5,13,21]. See especially [33]. 

Many problems in differential equations, both ini- 
tial and boundary value problems for ODEs and PDEs, 
can be reformulated as integral equations. Interval anal- 
ysis provides means for using fixed-point theory and Picard-Lindelöf-type iteration constructively on such reformulations [3,4,9,15], [17]-[29].

For initial value problems, it was noticed early 
on [12] that local coordinate transformations are often 
needed to prevent excessive growth (‘the wrapping ef- 
fect’) of the widths of interval enclosures. It has been a continuing project to improve on such transformations [6,8,12,13,14,16,17,21,28,30,32,33].

Variable-precision interval computation provides 
a means of controlling computational error in ill-posed 
problems [1,2,7,8]. Using interval methods, we can, in 
principle (with enough computing), find solutions to 
prescribed accuracy for differential equations [28]. 

Some examples will illustrate the kinds of results ob- 
tainable by interval methods. 


Example 1 A problem that occurs in chemical reactor theory involves the differential equation

$$y'' + \frac{1}{x}\,y' + b\,e^{-1/y} = 0,\qquad 0 < x < 1,\quad b > 0,$$

with boundary values $y'(0) = 0$ and $y(1) = t > 0$.

It turns out there may be one or more solutions depending on the values of $t$ and $b$. Using interval methods, it can be proved easily [20] that, for every $t, b > 0$, we have

$$y(x) \le t + \frac{b}{4}\,e^{-1/(t + b/4)}\,(1 - x^2)$$

for all solutions $y$ and all $0 \le x \le 1$.

Example 2 Consider the nonlinear hyperbolic PDE

$$u_{xy} = 1 + (u_x + u_y)\,u$$

with initial conditions $u(x, 0) = 0$ and $u(0, y) = 0$.

Using interval methods, we can prove [23] that for all $0 \le x \le 0.5$ and all $0 \le y \le 0.5$, the solution $u(x, y)$ is contained in the interval-valued function

$$xy + [0.1666,\ 0.2144]\,(x^2y^3 + x^3y^2).$$

This means that we have guaranteed lower and upper bounds on the solution, namely

$$xy + 0.1666\,(x^2y^3 + x^3y^2) \le u(x, y) \le xy + 0.2144\,(x^2y^3 + x^3y^2)$$

for all $x$ and $y$ in $[0, 0.5]$.
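The coefficient $1/6$, which lies just above the lower endpoint $0.1666$, can be reproduced exactly. Because $u(x, 0) = u(0, y) = 0$, the PDE is equivalent to $u = \int_0^x\!\int_0^y [1 + (u_x + u_y)u]$, and two Picard steps from $u = 0$, carried out below in exact rational arithmetic with a small hypothetical polynomial representation, give $xy + \tfrac{1}{6}(x^2y^3 + x^3y^2)$. The upper endpoint $0.2144$ requires bounding the remaining terms with interval arithmetic, which this sketch does not attempt.

```python
from fractions import Fraction

# Polynomials in x, y stored as {(i, j): coeff}, meaning coeff * x**i * y**j.
def add(p, q):
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + c
    return {k: c for k, c in out.items() if c != 0}

def mul(p, q):
    out = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {k: c for k, c in out.items() if c != 0}

def d_dx(p):
    return {(i - 1, j): c * i for (i, j), c in p.items() if i > 0}

def d_dy(p):
    return {(i, j - 1): c * j for (i, j), c in p.items() if j > 0}

def integrate_xy(p):
    """Integral over [0, x] x [0, y]; the result vanishes on x = 0 and on y = 0."""
    return {(i + 1, j + 1): Fraction(c) / ((i + 1) * (j + 1)) for (i, j), c in p.items()}

def picard_step(u):
    """One step u <- double integral of 1 + (u_x + u_y) * u."""
    rhs = add({(0, 0): 1}, mul(add(d_dx(u), d_dy(u)), u))
    return integrate_xy(rhs)

u1 = picard_step({})   # starting from u = 0 gives u1 = x*y
u2 = picard_step(u1)   # x*y + (1/6)(x^2 y^3 + x^3 y^2)
print(u2)
```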
Example 3 Consider the initial value problem [21]

$$x' = c\,t^2 + b\,x^2,\qquad x(0) = a.$$

Suppose there is uncertainty about the values of $a$, $b$ and $c$, and all we know is that $0 \le a \le 0.1$, $0.2 \le b \le 0.38$, $3.3 \le c \le 3.6$.

In a single iteration of the integral equation representation of the problem, we find that every solution satisfies

$$1.1\,t^3 \le x(t) \le 0.1 + t + 1.2\,t^3$$

for all $t \in [0, 0.6]$ and all $a$, $b$, $c$ in the given ranges.
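The single Picard step can be imitated numerically. Assuming the model $x' = c\,t^2 + b\,x^2$ and the candidate bounds $Y(t) = [1.1\,t^3,\ 0.1 + t + 1.2\,t^3]$, the sketch bounds the Picard operator $a + \int_0^t (c\,s^2 + b\,x(s)^2)\,ds$ over all parameter values and all $x$ between the candidate bounds, and checks on a grid of $t$ values that the result lies inside $Y(t)$. It samples points in floating point without outward rounding, so it illustrates the inclusion $P(Y) \subseteq Y$ but does not prove it the way an interval computation would; the names are hypothetical.

```python
# Coefficient lists: p[k] is the coefficient of t**k.
def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_integrate(p):
    """Antiderivative that vanishes at t = 0."""
    return [0.0] + [c / (k + 1) for k, c in enumerate(p)]

def poly_eval(p, t):
    return sum(c * t**k for k, c in enumerate(p))

A_RANGE, B_RANGE, C_RANGE = (0.0, 0.1), (0.2, 0.38), (3.3, 3.6)
Y_LO = [0.0, 0.0, 0.0, 1.1]   # candidate lower bound 1.1 t^3
Y_HI = [0.1, 1.0, 0.0, 1.2]   # candidate upper bound 0.1 + t + 1.2 t^3

def picard_image(t):
    """Bounds on a + int_0^t (c s^2 + b x(s)^2) ds over all a, b, c in range and all x
    with Y_LO <= x <= Y_HI; both bounds are >= 0 on [0, 0.6], so x^2 lies between their squares."""
    sq_lo = poly_eval(poly_integrate(poly_mul(Y_LO, Y_LO)), t)
    sq_hi = poly_eval(poly_integrate(poly_mul(Y_HI, Y_HI)), t)
    lo = A_RANGE[0] + C_RANGE[0] * t**3 / 3 + B_RANGE[0] * sq_lo
    hi = A_RANGE[1] + C_RANGE[1] * t**3 / 3 + B_RANGE[1] * sq_hi
    return lo, hi

def encloses(t_max=0.6, n_grid=601, slack=1e-12):
    """Sampled floating-point check that P(Y)(t) lies within Y(t); not a proof."""
    for i in range(n_grid):
        t = t_max * i / (n_grid - 1)
        lo, hi = picard_image(t)
        if lo < poly_eval(Y_LO, t) - slack or hi > poly_eval(Y_HI, t) + slack:
            return False
    return True

print(encloses())
```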

Moving coordinate systems for initial value problems 
help reduce the growth of interval bounds from one 
expansion step to the next. The wrapping effect arises 
from a rotation of the vector field associated with the 
differential equations. Such a rotation cannot be fol- 
lowed by interval vector bounds, which are boxes with 
faces parallel to the coordinate planes. This wrapping 
effect can be partially controlled in a number of ways 
with varying degrees of success. R.E. Moore [16] suggested the use of the ‘connection matrix’. F. Krückeberg [12] devised an algorithm he called the 3PM process. R.J. Lohner [13] suggests the use of parallelepipeds and also QR-decomposition for matrix transformations of enclosing hyper-rectangles.
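The wrapping effect can be seen already for a linear rotation. The sketch below (plain Python; all names are hypothetical) maps a box centred at the origin through eight rotations by $\pi/4$, once re-enclosing the image in an axis-parallel box after every step, as interval vectors must, and once carrying the accumulated linear map and enclosing only at the end, which is the idea behind the parallelepiped option. It ignores the nonlinear remainder terms that Lohner's method must also handle.

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def box_radius(M, r):
    """Radii of the smallest axis-parallel box containing M applied to the box
    centred at the origin with radii r (uses |M|, the entrywise absolute value)."""
    return [sum(abs(M[i][k]) * r[k] for k in range(2)) for i in range(2)]

def wrap_every_step(steps, theta, r0=(1.0, 1.0)):
    """Interval-vector style: re-enclose in a box after every step."""
    r = list(r0)
    for _ in range(steps):
        r = box_radius(rotation(theta), r)
    return r

def carry_linear_map(steps, theta, r0=(1.0, 1.0)):
    """Parallelepiped style: keep the accumulated map A, enclose in a box only at the end."""
    A = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(steps):
        A = matmul(rotation(theta), A)
    return box_radius(A, list(r0))

print(wrap_every_step(8, math.pi / 4))
print(carry_linear_map(8, math.pi / 4))
```

With $\theta = \pi/4$ each re-enclosure multiplies the radii by about $\sqrt{2}$, so after one full turn the box has grown by a factor of about 16, while the carried map returns to the identity up to rounding.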


Among the sample programs given in [10, Appendix D] there is ‘AWA’ (AnfangsWert Aufgabe, German for ‘initial value problem’) for initial value problems ([10, pp. 248-251]). AWA implements the ideas of Lohner concerning control of the wrapping effect. Five options are provided:
0) interval vector; 
1) parallelepiped; 
2) QR-decomposition; 
3) intersections of 0) and 1); 
4) intersections of 0) and 2). 

For an example using AWA, we considered a Volterra model of conflicting populations

$$x_1' = 2x_1(1 - x_2),$$
$$x_2' = -x_2(1 - x_1),$$

with initial conditions $x_1(0) = 1$ and $x_2(0) = 3$.
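Automatic differentiation software is incorporated in AWA; it automatically introduces auxiliary variables $T_1, T_2, \dots$ as needed, and derives recursion relations for generating the Taylor coefficients $(x_1)_k$ and $(x_2)_k$, $k \ge 1$, line by line from a compiler code list, see [31]. For the example above, the automatically derived recursion relations for derivatives of any order would look like the following (for $k \ge 1$, with $(T_1)_0 = 1 - (x_2)_0$ and $(T_3)_0 = 1 - (x_1)_0$):

$$T_1 = 1 - x_2,\quad \text{so}\quad (T_1)_k = -(x_2)_k,$$
$$T_2 = x_1 T_1,\quad \text{so}\quad (T_2)_k = \sum_{j=0}^{k} (x_1)_j (T_1)_{k-j},$$
$$x_1' = 2T_2,\quad \text{so}\quad (x_1)_{k+1} = \frac{2}{k+1}(T_2)_k,$$
$$T_3 = 1 - x_1,\quad \text{so}\quad (T_3)_k = -(x_1)_k,$$
$$T_4 = x_2 T_3,\quad \text{so}\quad (T_4)_k = \sum_{j=0}^{k} (x_2)_j (T_3)_{k-j},$$
$$x_2' = -T_4,\quad \text{so}\quad (x_2)_{k+1} = -\frac{1}{k+1}(T_4)_k.$$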
The interval Taylor expansion about some $t_0$ is then, for $i = 1, 2$:

$$x_i(t_0 + h) \in \sum_{j=0}^{K-1} (x_i)_j(t_0)\,h^j + R_i\,h^K,$$

where the remainder coefficients $R_i$, $i = 1, 2$, are computed from the above recursion relations with interval inputs (also found automatically) for $x_1 = (x_1)_0$ and $x_2 = (x_2)_0$.
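The recursions above can be run directly. The sketch below is a plain floating-point Taylor method for the Volterra model: it computes the normalized coefficients $(x_i)_k$ with the $T_1, \dots, T_4$ recursions and sums the first $K$ terms at each step, but unlike AWA it does not compute the interval remainder $R_i h^K$, so it produces approximations, not enclosures. As a check it uses $V = x_1 - \ln x_1 + 2(x_2 - \ln x_2)$, which one can verify by differentiation is constant along solutions of this system. The function names are hypothetical.

```python
import math

def taylor_coeffs(x1, x2, K):
    """Normalized Taylor coefficients (x1)_k, (x2)_k, k = 0..K-1, of
    x1' = 2 x1 (1 - x2), x2' = -x2 (1 - x1), via the T1..T4 recursions."""
    a, c = [x1], [x2]                      # (x1)_k and (x2)_k
    T1, T3 = [1.0 - x2], [1.0 - x1]        # T1 = 1 - x2, T3 = 1 - x1
    for k in range(K - 1):
        T2_k = sum(a[j] * T1[k - j] for j in range(k + 1))   # T2 = x1 * T1
        T4_k = sum(c[j] * T3[k - j] for j in range(k + 1))   # T4 = x2 * T3
        a.append(2.0 * T2_k / (k + 1))     # x1' = 2 T2
        c.append(-T4_k / (k + 1))          # x2' = -T4
        T1.append(-c[k + 1])               # (T1)_k = -(x2)_k for k >= 1
        T3.append(-a[k + 1])               # (T3)_k = -(x1)_k for k >= 1
    return a, c

def horner(coeffs, h):
    """Evaluate sum_j coeffs[j] * h**j."""
    s = 0.0
    for ck in reversed(coeffs):
        s = s * h + ck
    return s

def taylor_integrate(x1, x2, t_end, h, K):
    """Point (non-validated) Taylor method: truncates after K terms, no remainder term."""
    for _ in range(round(t_end / h)):
        a, c = taylor_coeffs(x1, x2, K)
        x1, x2 = horner(a, h), horner(c, h)
    return x1, x2

def first_integral(x1, x2):
    """V = x1 - ln x1 + 2 (x2 - ln x2), constant along solutions of this model."""
    return x1 - math.log(x1) + 2.0 * (x2 - math.log(x2))

x1_10, x2_10 = taylor_integrate(1.0, 3.0, 10.0, 0.1, 9)   # K = 9, h = 0.1 as in the text
print(x2_10)
```

The printed $x_2(10)$ can be compared with the AWA enclosures below, though this non-validated sketch offers no guarantee of lying inside them.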

Using the program AWA given on the diskette for 
C-XSC, we obtained the following results using the op- 
tions described. 

Using nine terms in the Taylor expansions, K = 9, 
and continuing the solution to t = 10, using a relatively 
large stepsize h = 0.1, we obtained different results for 
two options, both containing the exact solution. 

Option 0) produced

$$x_2(10) \in [0.347636\ldots,\ 0.350002\ldots],$$

whereas option 1) produced the narrower enclosure

$$x_2(10) \in [0.34875\ldots,\ 0.34888\ldots].$$


The optimal choices of all the various program 
parameters and the method of coordinate transfor- 
mation and how they depend on a particular initial 
value problem are matters that are still being studied. 
See [5,13,14,33]. 

J.S. Ely [7] developed a variable precision interval 
package (VPI), using the programming language C++, 
and applied it to the study of the following ill-posed 
partial differential equation from the theory of vortex dynamics ([8]):

$$\frac{\partial z^{*}(p, t)}{\partial t} = \frac{1}{4\pi i}\,\mathrm{P.V.}\int_{0}^{2\pi} B(q)\,\cot\frac{z(p, t) - z(q, t)}{2}\,dq,$$

with initial condition $z(p, 0) = p$, the integral being taken over $[0, 2\pi]$.

Here, $z(p, t)$ is a $2\pi$-periodic complex function of two real variables $p$ and $t$, $z^{*}$ denotes complex conjugation, $B(q) = 1 + A\cos(q)$, and P.V. stands for the Cauchy principal value of the singular integral.

In order to obtain satisfactory results on this diffi- 
cult problem, in particular to determine the time of on- 
set of turbulence, and to rule out rounding errors as the 
cause of the observed behavior of the computer sim- 
ulation, as many as 896 bits were used in the interval 
arithmetic. To improve efficiency, trigonometric trans- 
formations were used for a reformulation of the differ- 
ential equation. Further speed-ups were obtained using 
parallel programming for a distributed network of com- 
puters, and in another version a Cray supercomputer. 
The results [7,8] settled a long-standing controversy 
concerning the reliability of the mathematical model in 
the face of previously unknown effects of rounding er- 
ror. It is certainly not obvious in advance how many 
bits are needed for such a difficult problem. The point 
is that with interval computation we can see, from the 
widths of interval results, how accurately the answers 
have been determined. If we have not yet obtained de- 
sired accuracy, we can repeat the computations carry- 
ing more bits. This can be automated so that, in the 
words of O. Aberth [1]: “We expect the computer to do whatever is necessary to obtain such answers.”

See [3] for further discussion of interval methods for 
differential equations, and some nontrivial applications 
(e.g.: existence proofs for bifurcations, computer as- 
sisted proofs in dynamics, globally convergent domain 
decomposition methods). 


See also 


> Automatic Differentiation: Point and Interval 

> Automatic Differentiation: Point and Interval 
Taylor Operators 

> Bounding Derivative Ranges 


> Global Optimization: Application to Phase 
Equilibrium Problems 

> Interval Analysis: Application to Chemical 
Engineering Design Problems 

> Interval Analysis: Eigenvalue Bounds of Interval 
Matrices 

> Interval Analysis: Intermediate Terms 

> Interval Analysis: Nondifferentiable Problems 

> Interval Analysis: Parallel Methods for Global 
Optimization 

> Interval Analysis: Subdivision Directions in Interval 
Branch and Bound Methods 

> Interval Analysis: Systems of Nonlinear Equations 

> Interval Analysis: Unconstrained and Constrained 
Optimization 

> Interval Analysis: Verifying Feasibility 

> Interval Constraints 

> Interval Fixed Point Theory 

> Interval Global Optimization 

> Interval Linear Systems 

> Interval Newton Methods 
Most of the square matrices appearing in practice undergo quantization, which can be coarse, and are represented by finite precision numbers. Hence, the underlying unquantized matrices belong to real (complex) interval matrices whose entries are closed intervals (rectangles). Interval matrices can also be used to model unstructured matrix perturbations. This self-contained article focuses on eigenvalue bounds of interval matrices and provides proofs of all theorems and lemmas.
The organization of this article is as follows. In Section 1 we study the eigenvalues of $(n \times n)$-dimensional real symmetric interval matrices and show that the exact real interval of variation of their first and last eigenvalues can be found by considering two sets of vertex matrices, each of cardinality $2^{n-1}$. In addition, we give a counterexample that falsifies the conjecture that for $n \ge 3$ the interval(s) of variation of the other eigenvalues are attained at vertex matrices. We also remark that this result can be applied to real skew-symmetric interval matrices, as well as to finding the interval of variation of the
real part of the eigenvalues of a class of interval matrices 
whose endpoints are real symmetric matrices. In Section 2 we study the eigenvalues of $(n \times n)$-dimensional Hermitian interval matrices and show that the exact real interval of variation of their first and last eigenvalues can be found by considering two sets of vertex matrices, each of cardinality $2^{(n^2+n-2)/2}$. The above-mentioned counterexample also falsifies the conjecture that for $n \ge 3$ the interval(s) of variation of the other eigenvalues are attained at vertex matrices. We also remark that this result can be applied to skew-Hermitian interval matrices, as well as to finding the interval of variation of the real part of the eigenvalues of a class of interval matrices whose endpoints are Hermitian matrices with their imaginary part fixed. Finally, in Section 3 we present rectangular bounds for the eigenvalues of complex interval matrices.

In signal processing, control, and statistics, real symmetric and Hermitian interval matrices represent, e.g., the quantized sampled covariance matrices of vector stochastic processes, and their eigenvalues represent the variances of their decorrelated elements.

In a recent global optimization algorithm, see [1], 
real symmetric interval matrices were used to tightly 
bound the sets of Hessian matrices resulting from the 
objective function’s nonconvex addends and then their 
minimal eigenvalues were used to tightly convexify 
them. 

Eigenvalues of general interval matrices are useful for studying robust stability margins of analog and discrete systems, and convergence rates in numerical analysis. The reader interested in additional work in this area is referred to [7] and the references therein, to [5], and to [4,6,10], where Toeplitz and Hankel interval matrices were studied. Genetic algorithms are promising for solving the above problems for large $n$.


Real Symmetric Interval Matrices 


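A real $(n \times n)$-dimensional symmetric interval matrix $\mathbf{S} = \mathbf{S}[\underline{S}, \overline{S}]$, where $\underline{S}$ and $\overline{S}$ are both real symmetric matrices, is defined by

$$\mathbf{S} = \left\{ S = [s_{k\ell}] : S = S^{T},\ \underline{s}_{k\ell} \le s_{k\ell} \le \overline{s}_{k\ell},\ k, \ell = 1, \dots, n \right\}, \tag{1}$$

where the superscript $T$ denotes transposition. Further, let $\mathcal{S} \subset \mathbf{S}$ denote the set of all vertex matrices, i.e., if $S = [s_{k\ell}] \in \mathcal{S}$, then $s_{k\ell} = \underline{s}_{k\ell}$ or $s_{k\ell} = \overline{s}_{k\ell}$. Note that $|\mathcal{S}| = 2^{(n^2+n)/2}$, where $|\mathcal{S}|$ denotes the cardinality of $\mathcal{S}$.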
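It is well known that all the eigenvalues of a real symmetric matrix $S$ are real, see e.g. [11]. So let $\lambda_1(S) \le \dots \le \lambda_n(S)$ be the ordered eigenvalues of $S$ and $\lambda(S) = \{\lambda_k(S) : k = 1, \dots, n\}$. Further, let $\lambda_k(\mathbf{S}) = \{\lambda : \lambda = \lambda_k(S),\ S \in \mathbf{S}\}$, $\lambda(\mathbf{S}) = \{\lambda : \lambda \in \lambda(S),\ S \in \mathbf{S}\}$,

$$\underline{\lambda}_k(\mathbf{S}) = \min_{S \in \mathbf{S}} \lambda_k(S),\qquad \overline{\lambda}_k(\mathbf{S}) = \max_{S \in \mathbf{S}} \lambda_k(S).$$

Because $\mathbf{S}$ is a compact set (i.e., closed and bounded in $\mathbb{R}^m$, where $m = (n^2 + n)/2$ is the number of free parameters of $S$) and the eigenvalues of a matrix depend continuously upon its entries [9, Appendix D], it follows from [14, Thm. 4.16] that $\underline{\lambda}_k(\mathbf{S})$ and $\overline{\lambda}_k(\mathbf{S})$ are attained. That is, there exist matrices $\underline{S}_k, \overline{S}_k \in \mathbf{S}$ that satisfy $\underline{\lambda}_k(\mathbf{S}) = \lambda_k(\underline{S}_k)$ and $\overline{\lambda}_k(\mathbf{S}) = \lambda_k(\overline{S}_k)$, where $k = 1, \dots, n$.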

The purpose of this section is to study the problem of computing the possibly overlapping eigenvalue intervals $\lambda_k(\mathbf{S}) = [\underline{\lambda}_k(\mathbf{S}), \overline{\lambda}_k(\mathbf{S})]$, where $k = 1, \dots, n$.

We have shown in [3] and [7] that the four endpoints of $\lambda_1(\mathbf{S})$ and $\lambda_n(\mathbf{S})$ are attained by considering two subsets of $\mathcal{S}$, each of size at most $2^{n-1}$; see Theorem 1 below.

Regarding $\lambda_k(\mathbf{S})$ for $k = 2, \dots, n-1$ and $n > 2$, we present below a $(3 \times 3)$-dimensional real symmetric interval matrix $\mathbf{S}$ and a matrix $S_0 \in \mathbf{S}$ for which $\overline{\lambda}_2(\mathbf{S}) = \lambda_2(S_0) > \lambda_2(S)$ for every $S \in \mathcal{S}$. That is, for $k \ne 1, n$ the endpoints of $\lambda_k(\mathbf{S})$ are not necessarily attained at vertex matrices in $\mathcal{S}$.

It is well known that if $S \in \mathbf{S}$, then $\lambda_1(S)$ and $\lambda_n(S)$ are the minimal and maximal values, respectively, of the set $\{x^{T} S x : \|x\| = 1,\ x \in \mathbb{R}^n\}$ [11, Thm. 5.2], where $\|x\|$ denotes the Euclidean norm of the vector $x$. Note that because $x^{T} S x = (-x)^{T} S (-x)$, we can let $x_1 \ge 0$. Hence, the endpoints of $\lambda_1(\mathbf{S})$ are given by

$$\underline{\lambda}_1(\mathbf{S}) = \min_{S \in \mathbf{S}}\ \min_{x \in \mathbb{R}^n,\ \|x\| = 1,\ x_1 \ge 0} x^{T} S x \tag{3}$$

and

$$\overline{\lambda}_1(\mathbf{S}) = \max_{S \in \mathbf{S}}\ \min_{x \in \mathbb{R}^n,\ \|x\| = 1,\ x_1 \ge 0} x^{T} S x. \tag{4}$$

Similarly, one can give expressions for the endpoints of $\lambda_n(\mathbf{S})$, or, alternatively, use the relation $\lambda_n(S) = -\lambda_1(-S)$.
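Expanding $x^{T} S x$, where $x \in \mathbb{R}^n$ and $S \in \mathbf{S}$, we obtain

$$x^{T} S x = \sum_{k=1}^{n} s_{kk}\,x_k^2 + 2\sum_{k=1}^{n-1}\ \sum_{\ell=k+1}^{n} s_{k\ell}\,x_k x_\ell. \tag{5}$$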


Since $x_1 \ge 0$, $x$ may vary in $2^{n-1}$ closed orthants wherein the sign pattern of its elements is preserved. Let $O_p \subset \mathbb{R}^n$, $p = 1, \dots, 2^{n-1}$, denote the set of unit-length real vectors $x$ with $x_1 \ge 0$ that belong to the $p$th orthant, where the orthants are ordered according to the binary order of the signs of the last $n - 1$ elements of $x$, and a negative (nonnegative) element corresponds to ‘0’ (‘1’).
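Hence we obtain the following subset of $\mathcal{S}$:

$$\underline{\mathcal{S}} = \left\{ \underline{S}^p : \underline{S}^p = \arg\min_{S \in \mathbf{S}} x^T S x \ \text{for some } x \in \mathring{O}_p,\ p = 1, \dots, 2^{n-1} \right\}, \tag{6}$$

where $\underline{S}^p = [\underline{s}^p_{k\ell}]$, $\mathring{O}_p$ denotes the interior of $O_p$, and

$$\underline{s}^p_{k\ell} = \begin{cases} \underline{s}_{kk}, & \text{if } k = \ell, \\ \underline{s}_{k\ell}, & \text{if } (x_k x_\ell > 0) \wedge (k \ne \ell), \\ \overline{s}_{k\ell}, & \text{if } (x_k x_\ell < 0) \wedge (k \ne \ell). \end{cases} \tag{7}$$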


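Similarly as above, by maximizing $x^T S x$ over $S \in \mathbf{S}$ and some $x \in \mathring{O}_p$, one arrives at $\overline{S}^p$ and $\overline{\mathcal{S}}$, where $\overline{S}^p \in \overline{\mathcal{S}} \subset \mathcal{S}$ and $|\overline{\mathcal{S}}| = |\underline{\mathcal{S}}|$.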


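Theorem 1

$$\underline{\lambda}_1(\mathbf{S}) = \underline{\lambda}_1(\underline{\mathcal{S}}), \qquad \overline{\lambda}_1(\mathbf{S}) = \overline{\lambda}_1(\overline{\mathcal{S}}),$$

and

$$\underline{\lambda}_n(\mathbf{S}) = \underline{\lambda}_n(\underline{\mathcal{S}}), \qquad \overline{\lambda}_n(\mathbf{S}) = \overline{\lambda}_n(\overline{\mathcal{S}}).$$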

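Proof We will prove that $\underline{\lambda}_1(\mathbf{S}) = \underline{\lambda}_1(\underline{\mathcal{S}})$. The rest of the proof is similar and will therefore be omitted. Because the minimization in (3) is over a compact set (i.e., $\{x, S : x \in \mathbb{R}^n, \|x\| = 1, S \in \mathbf{S}\}$) and $x^T S x$ is a real continuous function of $x$ and $S$, it follows that $x^T S x$ attains its minimal value for some $x^0 \in O_p$ and $S^0 \in \mathbf{S}$. By expanding the quadratic form $x^{0T} S^0 x^0$ as in (5), it can be seen that $x^{0T} S^0 x^0 \ge x^{0T} \underline{S}^p x^0 \ge x^{pT} \underline{S}^p x^p$, where $x^p$ denotes the unit-length eigenvector of $\underline{S}^p \in \underline{\mathcal{S}}$ corresponding to $\lambda_1(\underline{S}^p)$. Moreover, because $x^0$ and $S^0$ solve the optimization problem (3), it follows that $x^{0T} S^0 x^0 \le x^{pT} \underline{S}^p x^p$. Hence $x^{0T} S^0 x^0 = x^{pT} \underline{S}^p x^p$ and therefore $\underline{\lambda}_1(\mathbf{S}) = \underline{\lambda}_1(\underline{\mathcal{S}})$.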
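The construction (6)–(7) lends itself to a short computation. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it builds one matrix per sign pattern $(1, \sigma_2, \dots, \sigma_n)$ for the minimizing set $\underline{\mathcal{S}}$ and its maximizing counterpart $\overline{\mathcal{S}}$, evaluates only the two endpoints $\underline{\lambda}_1(\mathbf{S})$ and $\overline{\lambda}_n(\mathbf{S})$, and, for small $n$, compares them with a brute-force scan of all $2^{n(n+1)/2}$ vertex matrices. The helper names are hypothetical, eigenvalues come from a plain cyclic Jacobi iteration, and no rounding-error control is attempted.

```python
import itertools
import math


def sym_eigvals(a, sweeps=100):
    """Ascending eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    m = [list(map(float, row)) for row in a]
    for _ in range(sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q of M J
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # rows p, q of J^T (M J)
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k], m[q][k] = c * mpk - s * mqk, s * mpk + c * mqk
    return sorted(m[i][i] for i in range(n))


def sign_pattern_sets(lo, hi):
    """Matrices of (6)-(7) and their maximizing counterparts, one per sign pattern
    sigma = (1, sigma_2, ..., sigma_n); lo and hi must be symmetric (hypothetical helper)."""
    n = len(lo)
    lower, upper = [], []
    for tail in itertools.product((1, -1), repeat=n - 1):
        sigma = (1,) + tail
        s_low = [[0.0] * n for _ in range(n)]
        s_up = [[0.0] * n for _ in range(n)]
        for k in range(n):
            for l in range(n):
                if k == l or sigma[k] * sigma[l] > 0:
                    # diagonal or x_k x_l > 0: lower bound minimizes, upper bound maximizes
                    s_low[k][l], s_up[k][l] = lo[k][l], hi[k][l]
                else:
                    # x_k x_l < 0: the roles of the two bounds are swapped
                    s_low[k][l], s_up[k][l] = hi[k][l], lo[k][l]
        lower.append(s_low)
        upper.append(s_up)
    return lower, upper


def extreme_endpoints(lo, hi):
    """(lower endpoint of Lambda_1, upper endpoint of Lambda_n) from the 2^(n-1)-element sets."""
    lower, upper = sign_pattern_sets(lo, hi)
    return (min(sym_eigvals(s)[0] for s in lower),
            max(sym_eigvals(s)[-1] for s in upper))


def brute_force_endpoints(lo, hi):
    """Same two endpoints by scanning all 2^(n(n+1)/2) vertex matrices (small n only)."""
    n = len(lo)
    free = [(k, l) for k in range(n) for l in range(k, n)]
    best_low, best_up = math.inf, -math.inf
    for bits in itertools.product((False, True), repeat=len(free)):
        s = [[0.0] * n for _ in range(n)]
        for (k, l), use_hi in zip(free, bits):
            s[k][l] = s[l][k] = hi[k][l] if use_hi else lo[k][l]
        ev = sym_eigvals(s)
        best_low, best_up = min(best_low, ev[0]), max(best_up, ev[-1])
    return best_low, best_up
```

For a $3 \times 3$ interval matrix the sign-pattern sets contain 4 matrices each, against 64 vertex matrices for the brute-force scan; the two calls should agree up to rounding.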


Note that similar results as in Theorem 1 hold for real 
skew-symmetric interval matrices, see [13]. 


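Remark 2 Let $\mathbf{S}[\underline{S}, \overline{S}]$ be defined as before with $\underline{S} = \underline{S}^T$ and $\overline{S} = \overline{S}^T$. Define the real interval matrix $\mathbf{B}[\underline{S}, \overline{S}] \supset \mathbf{S}[\underline{S}, \overline{S}]$ by $\mathbf{B} = \{B = [b_{k\ell}] : \underline{s}_{k\ell} \le b_{k\ell} \le \overline{s}_{k\ell},\ k, \ell = 1, \dots, n\}$. Using Bendixson's theorem [11, Thm. 5.3] (i.e., for $B \in \mathbf{B}$, $\min \operatorname{Re} \lambda(B) \ge \lambda_1(B')$ and $\max \operatorname{Re} \lambda(B) \le \lambda_n(B')$, where $B' = (B + B^T)/2$, and $\operatorname{Re}$, $\operatorname{Im}$ denote the real and imaginary parts, respectively), and noting that $B' \in \mathbf{S}$ for every $B \in \mathbf{B}$ while $\mathbf{S} \subset \mathbf{B}$, it follows that $\min \operatorname{Re} \lambda(\mathbf{B}[\underline{S}, \overline{S}]) = \underline{\lambda}_1(\mathbf{S})$ and $\max \operatorname{Re} \lambda(\mathbf{B}[\underline{S}, \overline{S}]) = \overline{\lambda}_n(\mathbf{S})$. Hence, using Theorem 1, we obtain that $\min \operatorname{Re} \lambda(\mathbf{B}) = \underline{\lambda}_1(\underline{\mathcal{S}})$ and $\max \operatorname{Re} \lambda(\mathbf{B}) = \overline{\lambda}_n(\overline{\mathcal{S}})$, see also [12].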


Example 3 Let the 3-dimensional real symmetric interval matrix $\mathbf{S} = [\underline{S}, \overline{S}]$ be given by


an es 

S=[1 3 -1], 
=f =1 =I 
$$\overline{S} = \begin{pmatrix} 6 & 2 & -2 \\ 2 & 11 & 7 \\ -2 & 7 & 5 \end{pmatrix}.$$


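Here $\mathcal{S}$ contains $2^6 = 64$ vertex matrices, i.e., $|\mathcal{S}| = 64$. Using MATLAB we obtain $\overline{\lambda}_2(\mathcal{S}) \approx 10.6$ as the largest value of $\lambda_2$ over these vertex matrices. Next, results of $10^6$ computer runs produced the following matrix $S_0 \in \mathbf{S}$,

$$S_0 = \begin{pmatrix} 5.2054 & 1.6556 & -6.8321 \\ 1.6556 & 10.9244 & 1.9721 \\ -6.8321 & 1.9721 & 4.6703 \end{pmatrix},$$

with $\lambda_2(S_0) = 11.3514$. Hence $\overline{\lambda}_2(\mathbf{S}) \ge \lambda_2(S_0) > \overline{\lambda}_2(\mathcal{S})$.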

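Next, considering $-\mathbf{S} = \{S : -S \in \mathbf{S}\}$, we obtain that $S_1 = -S_0 \in -\mathbf{S}$ satisfies $\lambda_2(S_1) = -\lambda_2(S_0)$ and consequently $\underline{\lambda}_2(-\mathbf{S}) \le \lambda_2(S_1) < \underline{\lambda}_2(-\mathcal{S})$, where $-\mathcal{S}$ has a similar meaning as $-\mathbf{S}$.

Hence, the endpoints of $\Lambda_2(\mathbf{S})$ need not be attained at vertex matrices of $\mathcal{S}$.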


Hermitian Interval Matrices 


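An $(n \times n)$-dimensional Hermitian interval matrix $\mathbf{H} = \mathbf{H}[\underline{H}, \overline{H}]$, where $\underline{H}$ and $\overline{H}$ are both Hermitian matrices, is defined by

$$\mathbf{H} = \left\{ H : H = H^*,\ \begin{array}{ll} \operatorname{Re} \underline{h}_{k\ell} \le \operatorname{Re} h_{k\ell} \le \operatorname{Re} \overline{h}_{k\ell}, & k, \ell = 1, \dots, n,\ k \ge \ell, \\ \operatorname{Im} \underline{h}_{k\ell} \le \operatorname{Im} h_{k\ell} \le \operatorname{Im} \overline{h}_{k\ell}, & k, \ell = 1, \dots, n,\ k > \ell \end{array} \right\}, \tag{8}$$

where $\operatorname{Im} h_{kk} = 0$, $k = 1, \dots, n$, and $^*$ denotes the Hermitian operator (i.e., conjugation followed by transposition). Further, let $\mathcal{H} \subset \mathbf{H}$ denote the set of all vertex matrices, i.e., the matrices $H = [h_{k\ell}] \in \mathbf{H}$ for which every $\operatorname{Re} h_{k\ell}$ and every $\operatorname{Im} h_{k\ell}$ constrained in (8) equals its lower or its upper bound. Note that the cardinality of $\mathcal{H}$ is $|\mathcal{H}| = 2^{n^2}$. It is well known that the eigenvalues of a Hermitian matrix $H$ are real, see, e.g., [11]. So let $\lambda_1(H) \le \lambda_2(H) \le \dots \le \lambda_n(H)$ be the ordered eigenvalues of $H$ and $\lambda(H) = \{\lambda_k(H) : k = 1, \dots, n\}$. Further, let, as before, $\Lambda_k(\mathbf{H}) = \{\lambda : \lambda = \lambda_k(H),\ H \in \mathbf{H}\}$, $\Lambda(\mathbf{H}) = \{\lambda : \lambda \in \lambda(H),\ H \in \mathbf{H}\}$, $\underline{\lambda}_k(\mathbf{H}) = \min(\Lambda_k(\mathbf{H}))$, and $\overline{\lambda}_k(\mathbf{H}) = \max(\Lambda_k(\mathbf{H}))$.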

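Because $\mathbf{H}$ is a compact set (i.e., closed and bounded in $\mathbb{R}^{n^2}$, where $n^2$ is the number of free real parameters of $\mathbf{H}$) and the eigenvalues of a matrix depend continuously upon its entries [9, Appendix D], it follows from [14, Thm. 4.16] that $\underline{\lambda}_k(\mathbf{H})$ and $\overline{\lambda}_k(\mathbf{H})$ are attained. That is, there exist matrices $H_k, H^k \in \mathbf{H}$ that satisfy $\underline{\lambda}_k(\mathbf{H}) = \lambda_k(H_k)$, $\overline{\lambda}_k(\mathbf{H}) = \lambda_k(H^k)$, where $k = 1, \dots, n$.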

The purpose of this Section is to study the problem of computing the possibly overlapping eigenvalue intervals $\Lambda_k(\mathbf{H}) = [\underline{\lambda}_k(\mathbf{H}), \overline{\lambda}_k(\mathbf{H})]$, where $k = 1, \dots, n$. We have shown in [2] and [7] that the four endpoints of $\Lambda_1(\mathbf{H})$ and $\Lambda_n(\mathbf{H})$ are attained by considering two subsets of $\mathcal{H}$ each of size at most $2^{(n^2+n-2)/2}$, see Theorem 4 below.

Regarding $\Lambda_k(\mathbf{H})$ for $k = 2, \dots, n-1$ and $n \ge 3$, since real symmetric interval matrices are a special case of Hermitian interval matrices, it follows from Example 3 that for $k \ne 1, n$ the endpoints of $\Lambda_k(\mathbf{H})$ are not necessarily attained at vertex matrices of $\mathcal{H}$.

It is well known that if $H \in \mathbf{H}$, then $\lambda_1(H)$ and $\lambda_n(H)$ are the minimal and maximal values, respectively, of the set $\{x^* H x : \|x\| = 1,\ x \in \mathbb{C}^n\}$, see [11, Thm. 5.2]. Note that because $x^* H x = (e^{i\theta} x)^* H (e^{i\theta} x)$ for all real $\theta$, we can choose $x_1 \in \mathbb{R}^+$, where $\mathbb{R}^+ = \{x \in \mathbb{R} : x \ge 0\}$. Hence, the endpoints of $\Lambda_1(\mathbf{H})$ are given by


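$$\underline{\lambda}_1(\mathbf{H}) = \min_{H \in \mathbf{H}} \ \min_{x \in \mathbb{C}^n,\ \|x\| = 1,\ x_1 \in \mathbb{R}^+} x^* H x \tag{9}$$

and

$$\overline{\lambda}_1(\mathbf{H}) = \max_{H \in \mathbf{H}} \ \min_{x \in \mathbb{C}^n,\ \|x\| = 1,\ x_1 \in \mathbb{R}^+} x^* H x. \tag{10}$$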


Similarly, one can give expressions for the endpoints of $\Lambda_n(\mathbf{H})$, or, alternatively, use the relation $\lambda_n(H) = -\lambda_1(-H)$. Since $x_1 \in \mathbb{R}^+$, $x \in \mathbb{C}^n$ may vary in $2^{2n-2}$ closed orthants contained in $\mathbb{R}^+ \times \mathbb{C}^{n-1}$ wherein the sign pattern of the real and imaginary parts of its elements is preserved. Let $O_p$, $p = 1, \dots, 2^{2n-2}$, denote the set of unit-length complex vectors $x$ with $x_1 \in \mathbb{R}^+$ that belong to the $p$th orthant, where the orthants are ordered according to the binary order of the signs of the vector $(\operatorname{Re} x_2, \operatorname{Im} x_2, \dots, \operatorname{Re} x_n, \operatorname{Im} x_n) \in \mathbb{R}^{2n-2}$ and a negative (nonnegative) element corresponds to '0' ('1').

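Expanding $x^* H x$, where $x = u + iv$, $u, v \in \mathbb{R}^n$, and $H = B + iC \in \mathbf{H}$, $B = B^T \in \mathbb{R}^{n \times n}$, $C = -C^T \in \mathbb{R}^{n \times n}$, and noting that $\operatorname{diag}(C) = 0$ and $v_1 = 0$ (because $x_1$ is real), we obtain

$$\begin{aligned} x^* H x = {} & b_{11} u_1^2 + \sum_{k=2}^{n} b_{kk} (u_k^2 + v_k^2) + 2 \sum_{\ell=2}^{n} \left[ b_{1\ell} u_1 u_\ell + c_{1\ell} (-u_1 v_\ell) \right] \\ & + 2 \sum_{k=2}^{n} \sum_{\ell=k+1}^{n} \left[ b_{k\ell} (u_k u_\ell + v_k v_\ell) + c_{k\ell} (-u_k v_\ell + v_k u_\ell) \right]. \end{aligned} \tag{11}$$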


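Hence, minimizing $x^* H x$ over $H = [h_{k\ell}] = [b_{k\ell} + i c_{k\ell}] \in \mathbf{H}$ for some $x \in O_p$, we obtain that only part of the entries $b_{k\ell}$ and $c_{k\ell}$ are determined by the sign pattern of $x$ and must be chosen at vertex points, namely

$$b_{k\ell} = \begin{cases} \operatorname{Re} \underline{h}_{kk}, & \text{if } k = \ell, \\ \operatorname{Re} \underline{h}_{k\ell}, & \text{if } (u_\ell > 0) \wedge (\ell > k = 1), \\ \operatorname{Re} \overline{h}_{k\ell}, & \text{if } (u_\ell < 0) \wedge (\ell > k = 1), \\ \operatorname{Re} \underline{h}_{k\ell}, & \text{if } (u_k u_\ell > 0,\ v_k v_\ell > 0) \wedge (\ell > k > 1), \\ \operatorname{Re} \overline{h}_{k\ell}, & \text{if } (u_k u_\ell < 0,\ v_k v_\ell < 0) \wedge (\ell > k > 1), \end{cases} \tag{12}$$

and

$$c_{k\ell} = \begin{cases} 0, & \text{if } k = \ell, \\ \operatorname{Im} \underline{h}_{k\ell}, & \text{if } (v_\ell < 0) \wedge (\ell > k = 1), \\ \operatorname{Im} \overline{h}_{k\ell}, & \text{if } (v_\ell > 0) \wedge (\ell > k = 1), \\ \operatorname{Im} \underline{h}_{k\ell}, & \text{if } (u_k v_\ell < 0,\ v_k u_\ell > 0) \wedge (\ell > k > 1), \\ \operatorname{Im} \overline{h}_{k\ell}, & \text{if } (u_k v_\ell > 0,\ v_k u_\ell < 0) \wedge (\ell > k > 1). \end{cases} \tag{13}$$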
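As a numerical sanity check of (11)–(13), the Python sketch below uses hypothetical helper names and randomly generated data to (a) compare $x^* H x$ computed directly with the expansion (11) for real $x_1$, and (b) list, for an $x$ whose components $u_\ell, v_\ell$ ($\ell \ge 2$) are all nonzero, which entries (12)–(13) pin to a bound. For each pair $\ell > k > 1$ exactly one of $b_{k\ell}$, $c_{k\ell}$ should be pinned, because $(u_k u_\ell)(v_k v_\ell) = (u_k v_\ell)(v_k u_\ell)$. Indices in the code are 0-based, so index 0 is the document's $k = 1$.

```python
import random


def quad_form(b, c, u, v):
    """x^* H x computed directly for H = B + iC and x = u + iv (complex arithmetic)."""
    n = len(u)
    x = [complex(u[k], v[k]) for k in range(n)]
    return sum(x[k].conjugate() * complex(b[k][l], c[k][l]) * x[l]
               for k in range(n) for l in range(n))


def expansion_11(b, c, u, v):
    """Right-hand side of (11); assumes v[0] == 0 (x_1 real) and diag(C) = 0."""
    n = len(u)
    total = b[0][0] * u[0] ** 2
    total += sum(b[k][k] * (u[k] ** 2 + v[k] ** 2) for k in range(1, n))
    total += 2 * sum(b[0][l] * u[0] * u[l] + c[0][l] * (-u[0] * v[l]) for l in range(1, n))
    total += 2 * sum(b[k][l] * (u[k] * u[l] + v[k] * v[l])
                     + c[k][l] * (-u[k] * v[l] + v[k] * u[l])
                     for k in range(1, n) for l in range(k + 1, n))
    return total


def pinned_entries(u, v):
    """Off-diagonal entries fixed by (12)-(13) when minimizing; keys (k, l, 'b' or 'c'), l > k.
    Assumes x lies in the interior of its orthant (u[l], v[l] nonzero for l >= 1)."""
    n = len(u)
    out = {}
    for l in range(1, n):
        out[(0, l, 'b')] = 'lower' if u[l] > 0 else 'upper'
        out[(0, l, 'c')] = 'lower' if v[l] < 0 else 'upper'
    for k in range(1, n):
        for l in range(k + 1, n):
            if u[k] * u[l] > 0 and v[k] * v[l] > 0:
                out[(k, l, 'b')] = 'lower'
            elif u[k] * u[l] < 0 and v[k] * v[l] < 0:
                out[(k, l, 'b')] = 'upper'
            if u[k] * v[l] < 0 and v[k] * u[l] > 0:
                out[(k, l, 'c')] = 'lower'
            elif u[k] * v[l] > 0 and v[k] * u[l] < 0:
                out[(k, l, 'c')] = 'upper'
    return out


def random_instance(n, rng):
    """Random symmetric B, skew-symmetric C, and x = u + iv with u[0] > 0, v[0] = 0."""
    b = [[0.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(k, n):
            b[k][l] = b[l][k] = rng.uniform(-1, 1)
            if l > k:
                c[k][l] = rng.uniform(-1, 1)
                c[l][k] = -c[k][l]
    u = [rng.uniform(0.1, 1)] + [rng.choice((-1, 1)) * rng.uniform(0.1, 1) for _ in range(n - 1)]
    v = [0.0] + [rng.choice((-1, 1)) * rng.uniform(0.1, 1) for _ in range(n - 1)]
    return b, c, u, v
```

Feeding a seeded instance such as `random_instance(4, random.Random(1))` to both evaluators should give real parts that agree to rounding error and a vanishing imaginary part for the direct form.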


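Let $\underline{\mathcal{H}}^p \subset \mathcal{H}$ denote the set of all vertex matrices that satisfy (12) and (13) for some $x$ in the interior of $O_p$. Using (12) and (13) one can set in $B$ and $C$ their diagonal, first column, and first row, and, if $\ell > k > 1$, either an entry of $B$ or an entry of $C$ (exactly one of the two sign conditions holds when the components of $u$ and $v$ other than $v_1$ are nonzero), see [2] for more details. Hence, since the number of free parameters in $\mathbf{H}$ is $n^2$, and the $(n-1)(n-2)/2$ pairs with $\ell > k > 1$ leave one free binary choice each, it follows that $|\underline{\mathcal{H}}^p| = 2^{(n^2-3n+2)/2}$. Further, let $\underline{\mathcal{H}}^p = \{H^{pq} : q = 1, \dots, 2^{(n^2-3n+2)/2}\}$ and $\underline{\mathcal{H}} = \{H^{pq} : p = 1, \dots, 2^{2n-2},\ q = 1, \dots, 2^{(n^2-3n+2)/2}\}$; hence $|\underline{\mathcal{H}}| = 2^{(n^2+n-2)/2}$. Similarly as above, by maximizing $x^* H x$ over $H \in \mathbf{H}$ and some $x \in O_p$, we arrive at $\overline{\mathcal{H}}^p$ and $\overline{\mathcal{H}}$, where $\overline{\mathcal{H}}^p \subset \overline{\mathcal{H}} \subset \mathcal{H}$, $|\overline{\mathcal{H}}^p| = |\underline{\mathcal{H}}^p|$, and $|\overline{\mathcal{H}}| = |\underline{\mathcal{H}}|$.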
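Theorem 4

$$\underline{\lambda}_1(\mathbf{H}) = \underline{\lambda}_1(\underline{\mathcal{H}}), \qquad \overline{\lambda}_1(\mathbf{H}) = \overline{\lambda}_1(\overline{\mathcal{H}}),$$

and

$$\underline{\lambda}_n(\mathbf{H}) = \underline{\lambda}_n(\underline{\mathcal{H}}), \qquad \overline{\lambda}_n(\mathbf{H}) = \overline{\lambda}_n(\overline{\mathcal{H}}).$$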

Proof We will prove that $\underline{\lambda}_1(\mathbf{H}) = \underline{\lambda}_1(\underline{\mathcal{H}})$. The rest of the proof is similar and will therefore be omitted. Because the minimization in (9) is over a compact set (i.e., $\{x, H : x \in \mathbb{C}^n, \|x\| = 1, H \in \mathbf{H}\}$) and $x^* H x$ is a real continuous function of $x$ and $H$, it follows that $x^* H x$ attains its minimal value for some $x^0 \in O_p$ and $H^0 \in \mathbf{H}$. By expanding $x^{0*} H^0 x^0$ as in (11) and noting that $x^0$ is constant, it can be seen that there is an $H^{pq} \in \underline{\mathcal{H}}^p$ for which $x^{0*} H^0 x^0 \ge x^{0*} H^{pq} x^0 \ge x^{pq*} H^{pq} x^{pq}$, where $x^{pq}$ denotes the unit-length eigenvector of $H^{pq}$ corresponding to $\lambda_1(H^{pq})$. Moreover, because $x^0$ and $H^0$ solve the optimization problem (9), it follows that $x^{0*} H^0 x^0 \le x^{pq*} H^{pq} x^{pq}$. Hence $x^{0*} H^0 x^0 = x^{pq*} H^{pq} x^{pq}$ and therefore $\underline{\lambda}_1(\mathbf{H}) = \underline{\lambda}_1(\underline{\mathcal{H}})$.


Note that similar results as in Theorem 4 hold for skew- 
Hermitian interval matrices. 


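Remark 5 This remark is similar to Remark 2. Let $\mathbf{H}[\underline{H}, \overline{H}]$ be defined as before with $\underline{H} = \underline{H}^*$, $\overline{H} = \overline{H}^*$, and $\operatorname{Im} \underline{H} = \operatorname{Im} \overline{H}$. Define the complex interval matrix $\mathbf{A}[\underline{H}, \overline{H}] \supset \mathbf{H}[\underline{H}, \overline{H}]$ by

$$\mathbf{A} = \left\{ A = [a_{k\ell}] : \operatorname{Re} \underline{h}_{k\ell} \le \operatorname{Re} a_{k\ell} \le \operatorname{Re} \overline{h}_{k\ell},\ \operatorname{Im} a_{k\ell} = \operatorname{Im} \underline{h}_{k\ell},\ k, \ell = 1, \dots, n \right\}.$$

Using Bendixson's theorem and Theorem 4, it follows that $\min \operatorname{Re} \lambda(\mathbf{A}[\underline{H}, \overline{H}]) = \underline{\lambda}_1(\mathbf{H}) = \underline{\lambda}_1(\underline{\mathcal{H}})$ and $\max \operatorname{Re} \lambda(\mathbf{A}[\underline{H}, \overline{H}]) = \overline{\lambda}_n(\mathbf{H}) = \overline{\lambda}_n(\overline{\mathcal{H}})$.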


Complex Interval Matrices 


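In this Section we present rectangular bounds on the eigenvalues of a complex interval matrix by extending similar results for a real interval matrix [13]. A complex interval matrix, denoted by $\mathbf{A}$, is defined by $\mathbf{A} = \mathbf{B} + i\mathbf{C}$, where $\mathbf{B} = \{B : \underline{B} \le B \le \overline{B},\ B \in \mathbb{R}^{n \times n}\}$, $\mathbf{C} = \{C : \underline{C} \le C \le \overline{C},\ C \in \mathbb{R}^{n \times n}\}$ (inequalities taken elementwise), $i^2 = -1$, and $\underline{B}, \overline{B}, \underline{C}, \overline{C} \in \mathbb{R}^{n \times n}$ are fixed matrices. Further, let $\underline{A} = \underline{B} + i\underline{C}$, $\overline{A} = \overline{B} + i\overline{C}$, and $A_c = (\underline{A} + \overline{A})/2$.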

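Let $\lambda = \lambda_r + i\lambda_i$ and $x = u + iv$ ($\lambda_r, \lambda_i \in \mathbb{R}$, $u, v \in \mathbb{R}^n$, $\|x\| = 1$) be an eigenvalue and the corresponding unit-length eigenvector of some matrix $A \in \mathbf{A}$. First, note that $\|x\| = 1$ implies

$$\|x\|^2 = x^* x = u^T u + v^T v = 1. \tag{14}$$

Let $A = B + iC$, $B \in \mathbf{B}$, $C \in \mathbf{C}$. Since $Ax = \lambda x$ we obtain

$$(\lambda_r + i\lambda_i) x = (B + iC)(u + iv) = Bu - Cv + i(Bv + Cu).$$

Premultiplying the above equation by $x^* = u^T - iv^T$, equating the real and imaginary parts, and noting that $x^* x = 1$, we obtain

$$\lambda_r = u^T B u - u^T C v + v^T B v + v^T C u$$

and

$$\lambda_i = -v^T B u + v^T C v + u^T B v + u^T C u.$$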


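We have that $u^T B u = u^T B' u \le \lambda_{\max}(B') u^T u$ and $v^T B v = v^T B' v \le \lambda_{\max}(B') v^T v$, see [11], where

$$B' = \frac{B + B^T}{2}, \tag{15}$$

and $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ denote the largest and smallest eigenvalues of a real symmetric matrix. Note that similar results pertain to the real matrix $C$. Hence, using (14), we obtain

$$\lambda_r \le \lambda_{\max}(B') - u^T C v + v^T C u. \tag{16}$$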
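Choose $B_c = (\underline{B} + \overline{B})/2$ and $C_c = (\underline{C} + \overline{C})/2$; then

$$\begin{aligned} \lambda_{\max}(B') &= \max_{\|x\| = 1,\ x \in \mathbb{R}^n} x^T B x = \max_{\|x\| = 1,\ x \in \mathbb{R}^n} \left( x^T B_c x + x^T (B - B_c) x \right) \\ &\le \lambda_{\max}(B'_c) + \max_{\|x\| = 1,\ x \in \mathbb{R}^n} |x|^T \Delta_B |x| = \lambda_{\max}(B'_c) + \lambda_{\max}(\Delta'_B), \end{aligned} \tag{17}$$

where $|x| = \operatorname{abs}(x)$ taken elementwise,

$$\Delta_B = \overline{B} - B_c, \tag{18}$$

and both $\Delta'_B$ and $B'_c$ have similar meaning as $B'$ defined in (15).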


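To obtain the final form of the upper bound $\overline{\lambda}_r$ on $\lambda_r$, it remains to carry out the following derivation:

$$\begin{aligned} -u^T C v + v^T C u &= -u^T C_c v + v^T C_c u - u^T (C - C_c) v + v^T (C - C_c) u \\ &\le \max_{\|(u^T, v^T)\| = 1} \left( -u^T C_c v + v^T C_c u \right) + \max_{\|(u^T, v^T)\| = 1} \left( |u|^T \Delta_C |v| + |v|^T \Delta_C |u| \right) \\ &= \max_{\|(u^T, v^T)\| = 1} \begin{pmatrix} u \\ v \end{pmatrix}^T \begin{pmatrix} 0 & -C_c \\ C_c & 0 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} + \max_{\|(u^T, v^T)\| = 1} \begin{pmatrix} |u| \\ |v| \end{pmatrix}^T \begin{pmatrix} 0 & \Delta_C \\ \Delta_C & 0 \end{pmatrix} \begin{pmatrix} |u| \\ |v| \end{pmatrix} \\ &= \lambda_{\max}(C''_c) + \lambda_{\max}(\Delta''_C), \end{aligned} \tag{19}$$

where $\Delta_C$ has similar meaning as $\Delta_B$ defined in (18),

$$C''_c = \begin{pmatrix} 0 & \dfrac{-C_c + C_c^T}{2} \\ \dfrac{C_c - C_c^T}{2} & 0 \end{pmatrix}, \qquad \Delta''_C = \begin{pmatrix} 0 & \Delta'_C \\ \Delta'_C & 0 \end{pmatrix}, \tag{20}$$

and $\Delta'_C$ has similar meaning as $B'$ defined in (15).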
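Hence, using (16), (17), and (19), we finally obtain

$$\lambda_r \le \overline{\lambda}_r = \lambda_{\max}(B'_c) + \lambda_{\max}(\Delta'_B) + \lambda_{\max}(C''_c) + \lambda_{\max}(\Delta''_C). \tag{21}$$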


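The lower bound $\underline{\lambda}_r$ on $\lambda_r$ can be obtained by noting that $-\lambda = -\lambda_r - i\lambda_i$ is an eigenvalue of $-A \in -\mathbf{A} = -\mathbf{B} + i(-\mathbf{C})$, where $-\mathbf{B} = \{B : -\overline{B} \le B \le -\underline{B},\ B \in \mathbb{R}^{n \times n}\}$ and $-\mathbf{C} = \{C : -\overline{C} \le C \le -\underline{C},\ C \in \mathbb{R}^{n \times n}\}$. Using (21) and then replacing the roles of $\mathbf{B}$ and $\mathbf{C}$ by $-\mathbf{B}$ and $-\mathbf{C}$, we obtain

$$\lambda_r \ge \underline{\lambda}_r = \lambda_{\min}(B'_c) - \lambda_{\max}(\Delta'_B) + \lambda_{\min}(C''_c) - \lambda_{\max}(\Delta''_C). \tag{22}$$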


The upper and lower bounds on $\lambda_i$ can be similarly obtained by noting that $-i\lambda = \lambda_i - i\lambda_r$ is an eigenvalue of $-iA \in -i\mathbf{A} = \mathbf{C} + i(-\mathbf{B})$, using (21) and (22), respectively, and then replacing the roles of $\mathbf{B}$ and $\mathbf{C}$ by $\mathbf{C}$ and $-\mathbf{B}$. We thus obtain


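Theorem 6 Let $\mathbf{A} = \mathbf{B} + i\mathbf{C}$ be as defined above, let $A_c = (\underline{A} + \overline{A})/2 = B_c + iC_c$ be the central matrix of $\mathbf{A}$, and let $\lambda = \lambda_r + i\lambda_i$ be any eigenvalue of a matrix $A \in \mathbf{A}$; then

$$\underline{\lambda}_r \le \lambda_r \le \overline{\lambda}_r \quad \text{and} \quad \underline{\lambda}_i \le \lambda_i \le \overline{\lambda}_i,$$

where

$$\begin{aligned} \underline{\lambda}_r &= \lambda_{\min}(B'_c) - \lambda_{\max}(\Delta'_B) + \lambda_{\min}(C''_c) - \lambda_{\max}(\Delta''_C), \\ \overline{\lambda}_r &= \lambda_{\max}(B'_c) + \lambda_{\max}(\Delta'_B) + \lambda_{\max}(C''_c) + \lambda_{\max}(\Delta''_C), \\ \underline{\lambda}_i &= \lambda_{\min}(C'_c) - \lambda_{\max}(\Delta'_C) - \lambda_{\max}(B''_c) - \lambda_{\max}(\Delta''_B), \\ \overline{\lambda}_i &= \lambda_{\max}(C'_c) + \lambda_{\max}(\Delta'_C) - \lambda_{\min}(B''_c) + \lambda_{\max}(\Delta''_B); \end{aligned}$$

$\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the smallest and largest eigenvalues of a real symmetric matrix; all the primed matrices (i.e., $B'_c$, $C'_c$, $\Delta'_B$, and $\Delta'_C$) have similar meaning as $B'$ defined in (15); $\Delta_B$ is as in (18) and $\Delta_C$ has similar meaning; and $C''_c$ and $\Delta''_C$ are as in (20), with $B''_c$ and $\Delta''_B$ having similar meaning, respectively.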
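Corollary 7 Note the following consequences:
i) if $\overline{\lambda}_r < 0$, then the interval matrix $\mathbf{A}$ is Hurwitz stable;
ii) if the rectangle $\{(x, y) : \underline{\lambda}_r \le x \le \overline{\lambda}_r,\ \underline{\lambda}_i \le y \le \overline{\lambda}_i\}$ is contained in the open unit disk, then $\mathbf{A}$ is Schur stable.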


Some computational simplifications for Theorem 6 can 
be obtained by using the following lemmas. 

Lemma 8 Let $C''_c$ be as defined in (20); then $\lambda_{\max}(C''_c) = -\lambda_{\min}(C''_c) = \rho\left(\frac{C_c - C_c^T}{2}\right)$, where $\rho(A) = \max\{|\lambda| : \lambda \in \lambda(A)\}$ denotes the spectral radius of the matrix $A$.

Proof Let $G = (C_c - C_c^T)/2$ and $Gv = \lambda v$ (note that since $G$ is skew-symmetric, $\lambda$ is purely imaginary, see [11]); the eigenvalues of $C''_c$ are $\mp i\lambda$ with corresponding eigenvectors $(v^T, \pm i v^T)^T$, which gives the desired result.


Note that this Lemma can also be applied to $B''_c$.
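Lemma 9 Let

$$D = \begin{pmatrix} 0 & S \\ S & 0 \end{pmatrix},$$

where $S \in \mathbb{R}^{n \times n}$ and $S = S^T$; then $\lambda_{\max}(D) = -\lambda_{\min}(D) = \rho(S)$.

Proof The eigenvectors of $D$ are $(w^T, \pm w^T)^T$, where $w$ is an eigenvector of $S$. Hence the eigenvalues of $D$ are $\pm\lambda$, with $\lambda$ an eigenvalue of $S$, which gives the desired result.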


Note that this Lemma can also be applied to $\Delta''_B$ and $\Delta''_C$.
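Theorem 6 together with Lemmas 8 and 9 reduces the rectangle to a handful of real symmetric eigenvalue computations. The Python sketch below is one possible evaluation under stated assumptions: matrices are lists of lists, symmetric eigenvalues come from a plain cyclic Jacobi iteration (hypothetical helper `sym_eigvals`), the spectral radius of a skew-symmetric part $G$ is taken as $\sqrt{\lambda_{\max}(G^T G)}$, and rounding errors are ignored, so the output approximates the enclosure rather than verifying it. The last helper applies the two tests of Corollary 7.

```python
import math


def sym_eigvals(a, sweeps=100):
    """Ascending eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    m = [list(map(float, row)) for row in a]
    for _ in range(sweeps):
        if sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k], m[q][k] = c * mpk - s * mqk, s * mpk + c * mqk
    return sorted(m[i][i] for i in range(n))


def sym(m):
    """Symmetric part (M + M^T)/2, i.e. the 'primed' matrix of (15)."""
    n = len(m)
    return [[0.5 * (m[k][l] + m[l][k]) for l in range(n)] for k in range(n)]


def rho_sym(s):
    """Spectral radius of a real symmetric matrix."""
    ev = sym_eigvals(s)
    return max(abs(ev[0]), abs(ev[-1]))


def rho_skew_part(m):
    """rho((M - M^T)/2) = sqrt(lambda_max(G^T G)); by Lemma 8 this is lambda_max(M''_c)."""
    n = len(m)
    g = [[0.5 * (m[k][l] - m[l][k]) for l in range(n)] for k in range(n)]
    gtg = [[sum(g[j][k] * g[j][l] for j in range(n)) for l in range(n)] for k in range(n)]
    return math.sqrt(max(sym_eigvals(gtg)[-1], 0.0))


def eigen_rectangle(b_lo, b_hi, c_lo, c_hi):
    """Bounds (lr_lo, lr_hi, li_lo, li_hi) of Theorem 6 for A = B + iC."""
    n = len(b_lo)

    def center(lo, hi):
        return [[0.5 * (lo[k][l] + hi[k][l]) for l in range(n)] for k in range(n)]

    def radius(lo, hi):  # Delta = upper bound - center, as in (18)
        return [[0.5 * (hi[k][l] - lo[k][l]) for l in range(n)] for k in range(n)]

    b_c, c_c = center(b_lo, b_hi), center(c_lo, c_hi)
    eb, ec = sym_eigvals(sym(b_c)), sym_eigvals(sym(c_c))       # spectra of B'_c, C'_c
    # Delta' >= 0, so lambda_max(Delta') = rho(Delta') = lambda_max(Delta'') by Lemma 9
    d_b, d_c = rho_sym(sym(radius(b_lo, b_hi))), rho_sym(sym(radius(c_lo, c_hi)))
    g_b, g_c = rho_skew_part(b_c), rho_skew_part(c_c)            # Lemma 8
    return (eb[0] - d_b - g_c - d_c, eb[-1] + d_b + g_c + d_c,
            ec[0] - d_c - g_b - d_b, ec[-1] + d_c + g_b + d_b)


def corollary_7(rect):
    """(Hurwitz test, Schur test); the rectangle lies in the open unit disk iff its corners do."""
    lr_lo, lr_hi, li_lo, li_hi = rect
    schur = all(x * x + y * y < 1.0 for x in (lr_lo, lr_hi) for y in (li_lo, li_hi))
    return lr_hi < 0.0, schur
```

For a degenerate (point) interval with a real symmetric $B$ and $C = 0$, the rectangle collapses to the segment between the extreme eigenvalues of $B$ on the real axis, which is a quick consistency check of the signs in Theorem 6.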


See also 


> αBB algorithm

> Automatic Differentiation: Point and Interval 

> Automatic Differentiation: Point and Interval 
Taylor Operators 

> Bounding Derivative Ranges 

> Eigenvalue Enclosures for Ordinary Differential 
Equations 

> Global Optimization: Application to Phase 
Equilibrium Problems 

> Hemivariational Inequalities: Eigenvalue Problems 

> Interval Analysis: Application to Chemical 
Engineering Design Problems 

> Interval Analysis: Differential Equations 
> Interval Analysis: Nondifferentiable Problems

> Interval Analysis: Parallel Methods for Global 
Optimization 

> Interval Analysis: Subdivision Directions in Interval 
Branch and Bound Methods 

> Interval Analysis: Systems of Nonlinear Equations 

> Interval Analysis: Unconstrained and Constrained Optimization
$v_1 = x_1$,
$v_2 = x_2$,
(i) $v_3 = v_1^2$,
(ii) $v_4 = v_2^2$,
(iii) $v_5 = v_4 v_3$,
(iv) $v_6 = v_3 - v_5$,
(v) $v_7 = v_6 + v_4$
OP   p   q   r
 5   3   1   –
 5   4   2   –
 4   5   4   3
21   6   3   5
20   7   6   4


In global optimization algorithms, the computer must 
repeatedly evaluate an objective function, as well as, 
possibly, inequality and equality constraints. Such func- 
tions are given as algebraic expressions or as subrou- 
tines or sections of computer code. When such com- 
puter code is executed, operations are applied to the 
independent variables, producing intermediate terms. 
These intermediate terms are, in turn, combined to pro- 
duce other intermediate terms, or, eventually, the ob- 
jective function value. For example, consider the prob- 
lem 

$$\min \varphi(x) = x_1^2 - x_1^2 x_2^2 + x_2^2 \tag{1}$$

over the box $\mathbf{x} = ([-1, 1], [-1, 1])^T$.


To evaluate $\varphi$, the computer may start with the independent variable values $v_1 = x_1$ and $v_2 = x_2$, internally produce quantities $v_3$, $v_4$, $v_5$, and $v_6$, and finally produce the dependent variable value $\varphi(x) = v_7$. Table 1 indicates how this may be done.

A list such as in Table 1 may be represented as a table of addresses of variables and operations. For example, if the operation $x_p \leftarrow x_q^2$ corresponds to operation code 5, $x_p \leftarrow x_q x_r$ corresponds to operation code 4, $x_p \leftarrow x_q + x_r$ corresponds to operation code 20, and $x_p \leftarrow x_q - x_r$ corresponds to operation code 21, then the set of relations in Table 1 is represented by Table 2.
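The address table can be interpreted directly. The Python sketch below is a hypothetical minimal interpreter, not taken from any particular package: it stores Table 2 as tuples (OP, p, q, r), uses the operation codes 5, 4, 20, and 21 described above, and runs the same code list once with ordinary floating-point operations and once with naive interval operations (intervals as (lower, upper) pairs, without outward rounding).

```python
# Operation codes as described in the text.
SQR, MUL, ADD, SUB = 5, 4, 20, 21

# Table 2 as (OP, p, q, r); r is unused (None) for the unary square.
TABLE_2 = [(SQR, 3, 1, None), (SQR, 4, 2, None), (MUL, 5, 4, 3),
           (SUB, 6, 3, 5), (ADD, 7, 6, 4)]

POINT_OPS = {
    SQR: lambda a, _: a * a,
    MUL: lambda a, b: a * b,
    ADD: lambda a, b: a + b,
    SUB: lambda a, b: a - b,
}


def _isqr(a, _):
    """Interval square; the lower bound is 0 when the interval contains 0."""
    lo, hi = a
    if lo <= 0.0 <= hi:
        return (0.0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))


def _imul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


INTERVAL_OPS = {
    SQR: _isqr,
    MUL: _imul,
    ADD: lambda a, b: (a[0] + b[0], a[1] + b[1]),
    SUB: lambda a, b: (a[0] - b[1], a[1] - b[0]),
}


def run_code_list(code_list, inputs, ops):
    """Evaluate a code list; v[1], v[2], ... are seeded with the independent variables."""
    v = {i + 1: x for i, x in enumerate(inputs)}
    for op, p, q, r in code_list:
        v[p] = ops[op](v[q], v[r] if r is not None else None)
    return v
```

For example, `run_code_list(TABLE_2, (0.5, -0.5), POINT_OPS)[7]` returns $\varphi(0.5, -0.5) = 0.4375$, and the interval run over $([0.5, 1], [-1, -0.5])^T$ gives $v_7 = [-0.5, 1.9375]$, the same enclosure of $\varphi$ that appears in Table 4. Keeping the operations in a dictionary keyed by operation code is what lets one code list serve both point and interval evaluation.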

Such a sequence of operations is called a code list, but is sometimes called other things, such as a tape. Assuming the axioms of real arithmetic hold for evaluation, code lists for a given algebraic expression or portion of a computer program are not unique.

$v_1 = x_1$,
$v_2 = x_2$,
(i) $v_3 = v_1^2$,
(ii) $v_4 = v_2^2$,
(iii) $v_5 = v_4 v_3$,
(iv) $v_6 = v_3 - v_5$,
(v) $v_7 = v_6 + v_4$,
(vi) $v_8 = 2 v_1$,
(vii) $v_9 = v_8 v_4$,
(viii) $v_{10} = v_8 - v_9$,
(ix) $v_{11} = 2 v_2$,
(x) $v_{12} = v_{11} v_3$,
(xi) $v_{13} = -v_{12}$,
(xii) $v_{14} = v_{13} + v_{11}$,
$\varphi = v_7$,
$\partial\varphi/\partial x_1 = v_{10}$,
$\partial\varphi/\partial x_2 = v_{14}$

The concept of a code list is familiar to computer 
science students who have worked with compilers, 
since a compiler produces such lists while translating 
algebraic expressions into machine language. However, 
code lists and access to the intermediate expressions 
are of particular importance in interval global optimiza- 
tion, for the following reasons. 

• Code lists provide a convenient internal representation for the objective and constraints, to be used for automatic differentiation, for both point and interval evaluation of objectives, gradients, and Hessian matrices.

• The values of the intermediate quantities can be used within the optimization algorithm in processes that reduce the size of the search region.

• Symbolic manipulation can reduce the overestimation, or interval dependency, that would otherwise occur with interval evaluations.

Details are given below. 


Use In Automatic Differentiation 


A code list can be used either as a pattern to specify the computations in the forward mode of automatic differentiation or as a symbolic representation of the system of equations to be solved in the backward mode. See [7] for an in-depth look at the forward mode of automatic differentiation, and see [3] for somewhat more recent research on the subject. See [6, pp. 37–39] for some examples and additional references. Also see > Automatic differentiation: Introduction, history and rounding error estimation.


Use In Constraint Satisfaction Techniques. 


Since each intermediate variable in the code list is connected to one or two others via an elementary, invertible operation, narrow bounds on one such intermediate variable can be used to obtain narrow bounds on others. For example, suppose that the code list in Table 1 has been symbolically differentiated, to get the code list in Table 3. Then, if the subbox $\mathbf{x} = ([0.5, 1], [-1, -0.5])^T$ is to be considered for possible inclusion of optima, the derivative code list in Table 3 can be evaluated by forward substitution to obtain the interval set of intermediate values in Table 4. Furthermore, since (1) is an unconstrained problem, an optimum must occur where $\partial\varphi/\partial x_1 = 0$ and $\partial\varphi/\partial x_2 = 0$. In particular, any global optimizer $x^*$ must have

$$v_{10}(x^*) = 0. \tag{2}$$
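Using (2) in line (viii) of the derivative code list in Table 3,

$$v_9 = v_8 - v_{10},$$

whence

$$v_9 \leftarrow [1, 2] - 0 = [1, 2], \qquad v_9 \leftarrow v_9 \cap [.25, 2] = [1, 2].$$

Now, using (vii) of Table 3,

$$v_4 \leftarrow \frac{v_9}{v_8} = \frac{[1, 2]}{[1, 2]} = [0.5, 2], \qquad v_4 \leftarrow v_4 \cap [.25, 1] = [0.5, 1].$$

Now using (ii) of Table 3 gives

$$v_2 \in \sqrt{v_4} \cup -\sqrt{v_4} \subseteq [0.70, 1] \cup [-1, -0.70], \tag{3}$$

$$v_2 \leftarrow v_2 \cap [-1, -0.5] = [-1, -0.70]. \tag{4}$$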

The last computation represents a narrowing of the 
range of one of the independent variables. 
$v_1 = [.5, 1]$,
$v_2 = [-1, -.5]$,
$v_3 = [.25, 1]$,
$v_4 = [.25, 1]$,
$v_5 = [.0625, 1]$,
$v_6 = [-.75, .9375]$,
$v_7 = [-.5, 1.9375]$,
$v_8 = [1, 2]$,
$v_9 = [.25, 2]$,
$v_{10} = [-1, 1.75]$,
$v_{11} = [-2, -1]$,
$v_{12} = [-2, -.25]$,
$v_{13} = [.25, 2]$,
$v_{14} = [-1.75, 1]$,
$\varphi \in [-.5, 1.9375]$,
$\partial\varphi/\partial x_1 \in [-1, 1.75]$,
$\partial\varphi/\partial x_2 \in [-1.75, 1]$
$v_1 = x_1$,
$v_2 = x_2$,
(i) $v_3 = v_1^2$,
(ii) $v_4 = v_3^2$,
(iii) $v_5 = v_3 v_2$,
(iv) $v_6 = v_2^2$,
(v) $v_7 = v_6 + v_4$,
(vi) $v_8 = v_7 + 1$,
(vii) $v_8 + v_5 = 0$,
(viii) $v_8 - 3 v_5 = 0$


(A similar computation could also have been carried out to obtain narrower bounds on $v_1$.)

If, in addition, an upper bound $\overline{\varphi} = 0$ for the global optimum of $\varphi$ is known, then

$$v_7 \in [-\infty, 0] \cap [-0.5, 1.9375] = [-0.5, 0].$$

This can now be used in Table 3, (v), along with new intermediate variable bounds, wherever possible, to obtain

$$v_4 \leftarrow v_7 - v_6 = [-0.5, 0] - [-0.75, 0.9375] = [-1.4375, 0.75], \qquad v_4 \leftarrow v_4 \cap [0.5, 1] = [0.5, 0.75].$$


Now using Table 3, (vii),

$$v_9 \leftarrow [1, 2] \cdot [0.5, 0.75] = [0.5, 1.5], \qquad v_9 \leftarrow v_9 \cap [1, 2] = [1, 1.5];$$

then using (viii) and $v_{10} = 0$ gives $v_8 = [1, 1.5]$. Finally, using Table 3, (vi), gives

$$v_1 \leftarrow \frac{v_8}{2} \cap [0.5, 1] = [0.5, 0.75] \cap [0.5, 1] = [0.5, 0.75]. \tag{5}$$


Now, evaluating $\varphi$ in (1) (or redoing the forward substitution represented in Table 4) at $(x_1, x_2) = ([0.5, 0.75], [-1, -0.70])$ gives

$$\begin{aligned} \varphi &\in [0.5, 0.75]^2 - [0.5, 0.75]^2 [-1, -0.70]^2 + [-1, -0.70]^2 \\ &= [.25, .5625] - [.25, .5625][.49, 1] + [.49, 1] \\ &= [.25, .5625] + [-.5625, -.1225] + [.49, 1] \\ &= [.1775, 1.44], \end{aligned}$$

contradicting the known upper bound $\overline{\varphi} = 0$. This proves that there can be no global optimizer of (1) within $([0.5, 1], [-1, -0.5])^T$. (Note that, in fact, there are no global optimizers in $([-1, 1], [-1, 1])^T$ if the problem is considered to be unconstrained.)
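The narrowing steps above can be replayed mechanically. The Python sketch below hard-codes the particular sequence of inverse operations used in the text (it is not a general constraint-propagation engine like those in GlobSol or UniCalc), uses naive interval arithmetic without outward rounding, and takes the exact square root $\sqrt{0.5}$ instead of the hand-rounded $0.70$; its final enclosure for $\varphi$ is therefore slightly narrower than $[.1775, 1.44]$ but leads to the same contradiction. All function names are hypothetical.

```python
import math


def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def div(a, b):
    if b[0] <= 0.0 <= b[1]:
        raise ZeroDivisionError("divisor interval contains 0")
    return mul(a, (1.0 / b[1], 1.0 / b[0]))


def sqr(a):
    if a[0] <= 0.0 <= a[1]:
        return (0.0, max(a[0] ** 2, a[1] ** 2))
    return (min(a[0] ** 2, a[1] ** 2), max(a[0] ** 2, a[1] ** 2))


def meet(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("empty intersection: no optimizer in this box")
    return (lo, hi)


def inv_sqr(y, x):
    """Hull of both preimages of y = x**2, each intersected with the current x."""
    r = (math.sqrt(max(y[0], 0.0)), math.sqrt(y[1]))
    pieces = []
    for lo, hi in ((r[0], r[1]), (-r[1], -r[0])):
        lo, hi = max(lo, x[0]), min(hi, x[1])
        if lo <= hi:
            pieces.append((lo, hi))
    if not pieces:
        raise ValueError("empty intersection: no optimizer in this box")
    return (min(p[0] for p in pieces), max(p[1] for p in pieces))


def narrow_subbox(x1, x2, phi_upper):
    # forward sweep through the derivative code list (Table 3)
    v1, v2 = x1, x2
    v3, v4 = sqr(v1), sqr(v2)
    v5 = mul(v4, v3)
    v6 = sub(v3, v5)
    v7 = add(v6, v4)
    v8 = mul((2.0, 2.0), v1)
    v9 = mul(v8, v4)
    # (2): v10 = 0; then (viii) v9 = v8 - v10, (vii) v4 = v9 / v8, (ii) v4 = v2**2
    v9 = meet(sub(v8, (0.0, 0.0)), v9)
    v4 = meet(div(v9, v8), v4)
    v2 = inv_sqr(v4, v2)
    # known upper bound on the optimum: v7 <= phi_upper; then (v), (vii), (viii), (vi)
    v7 = meet(v7, (-math.inf, phi_upper))
    v4 = meet(sub(v7, v6), v4)
    v9 = meet(mul(v8, v4), v9)
    v8 = meet(add(v9, (0.0, 0.0)), v8)
    v1 = meet(div(v8, (2.0, 2.0)), v1)
    # re-evaluate phi = x1^2 - x1^2 x2^2 + x2^2 over the narrowed box
    phi = add(sub(sqr(v1), mul(sqr(v1), sqr(v2))), sqr(v2))
    return v1, v2, phi
```

Calling `narrow_subbox((0.5, 1.0), (-1.0, -0.5), 0.0)` narrows $x_1$ to $[0.5, 0.75]$ and $x_2$ to $[-1, -\sqrt{0.5}]$, and the re-evaluated enclosure of $\varphi$ has a strictly positive lower bound, so the subbox can be discarded.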

The above procedure is easily automated, as is done 
in, say, GlobSol [2,6], UniCalc [1], or other interval con- 
straint propagation software. 

This example illustrates a more general technique, 
associated with constraint propagation and logic pro- 
gramming. See [4] for an introduction to this view of 
the subject, and see [5] for alternate techniques of in- 
terval constraint satisfaction. 


Use In Symbolic Preprocessing. 


To understand how symbolic analysis based on the code list may help, consider the following example:

Find all solutions to

$$f_i(x_1, x_2) = 0, \quad i = 1, 2, \quad \text{within the box } \mathbf{x} = ([-2, 0], [-1, 1])^T, \tag{6}$$

where

$$f_1(x_1, x_2) = x_1^4 + x_1^2 x_2 + x_2^2 + 1,$$

$$f_2(x_1, x_2) = x_1^4 - 3 x_1^2 x_2 + x_2^2 + 1.$$

A possible code list is given in Table 5.

There is much interval dependency in this system, both in the individual equations (since each variable occurs in various terms), and between the equations (since the equations share common terms). However, examination of the code list in Table 5 reveals that a change of variables can make the system more amenable to interval computation. Seeing that (vii) and (viii) are linear in $v_5$ and $v_8 = v_4 + v_6 + 1$, define

$$y_1 = v_5 = x_1^2 x_2, \qquad y_2 = v_4 + v_6 = x_1^4 + x_2^2. \tag{7}$$

Then the system becomes

$$y_2 + y_1 + 1 = 0, \qquad y_2 - 3 y_1 + 1 = 0. \tag{8}$$

Thus, the linear system (8) may be solved easily for $y_1$ and $y_2$. The interval bounds may then be plugged into (7) to obtain $x_1$ and $x_2$. There is no overestimation in any of the expressions for function components or partial derivatives in either (8) or (7).
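The payoff of the substitution (7) is that (8) is an ordinary linear system. The small Python sketch below, with hypothetical helper names, solves (8) exactly with rational arithmetic and then compares the required value of $y_2$ with the interval range of $y_2 = x_1^4 + x_2^2$ over the box in (6); it assumes the reading of $f_1$ and $f_2$ given above. Because each variable occurs only once in $y_2$, the naive interval range is exact here.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Cramer's rule for a11*y1 + a12*y2 = r1 and a21*y1 + a22*y2 = r2."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - r1 * a21) / det


def even_power_range(lo, hi, k):
    """Exact range of x**k over [lo, hi] for even k."""
    a, b = lo ** k, hi ** k
    return (0 if lo <= 0 <= hi else min(a, b)), max(a, b)


# (8) written as: y1 + y2 = -1 and -3*y1 + y2 = -1
y1, y2 = solve_2x2(Fraction(1), Fraction(1), Fraction(-3), Fraction(1),
                   Fraction(-1), Fraction(-1))

# Range of y2 = x1**4 + x2**2 over x = ([-2, 0], [-1, 1]) from (7)
r4, r2 = even_power_range(-2, 0, 4), even_power_range(-1, 1, 2)
y2_range = (r4[0] + r2[0], r4[1] + r2[1])
has_solution = y2_range[0] <= y2 <= y2_range[1]
```

Under this reading the sketch finds $y_1 = 0$ and $y_2 = -1$, while $y_2 = x_1^4 + x_2^2$ ranges over $[0, 17]$ on the box, so `has_solution` is `False`.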

Additional research should reveal how to automate 
this change of variables process. 
Nondifferentiable problems arise in various places in global optimization. One example is $l_1$ and $l_\infty$ optimization. That is, problems of the form

$$\min \varphi(x) = \min \|F\|_1 = \min \sum_{i=1}^{m} |f_i(x)| \tag{1}$$

and

$$\min \varphi(x) = \min \|F\|_\infty = \min \max_{1 \le i \le m} |f_i(x)|, \tag{2}$$

where $x$ is an $n$-vector and $F = (f_1, \dots, f_m)^T$, arise in data fitting, etc., and $\varphi$ has a discontinuous gradient. In other problems, piecewise linear or piecewise quadratic approximations are used, and the gradient or the Hessian matrix is discontinuous. In fact, in some problems, even the objective function can be discontinuous.

Much thought has been given to nondifferentia- 
bility in algorithms to find local optima, and various 
techniques have been developed for local optimization. 
Some of these techniques can be used directly in inter- 
val global optimization algorithms. However, the power 
of interval arithmetic to bound the range of a point- 
valued function, even if that function is discontinuous, 
can be used to design effective algorithms for nondif- 
ferentiable or discontinuous problems whose structure 
is virtually identical to that of algorithms for differen- 
tiable or continuous problems. 


Posing As Continuous Problems 


Several techniques are available for re-posing problems as differentiable problems, in particular for Problem (1) and Problem (2). One such technique, suggested in [2, p. 74] and elsewhere, involves rewriting the forms $|e|$, $\max\{e_1, e_2\}$, and $\min\{e_1, e_2\}$ occurring in variable expressions in the objective and constraints in terms of additional constraints, as follows:

• Replace an expression $|e|$ by a new variable $x_{n+1}$ and the two constraints $x_{n+1} \ge 0$ and $x_{n+1}^2 = e^2$.

• Replace $\max\{e_1, e_2\}$ by $\dfrac{e_1 + e_2 + |e_1 - e_2|}{2}$.

• Replace $\min\{e_1, e_2\}$ by $\dfrac{e_1 + e_2 - |e_1 - e_2|}{2}$.
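Alternately, as explained in [1] and elsewhere, the entire Problems (1) and (2) can be replaced by constrained problems. In particular, (1) can be replaced by

$$\begin{aligned} \min\ & \sum_{i=1}^{m} v_i \\ \text{s.t. } & v_i \ge f_i(x), \quad i = 1, \dots, m, \\ & v_i \ge -f_i(x), \quad i = 1, \dots, m, \end{aligned} \tag{3}$$

where the $v_i$ are new variables. Likewise, (2) can be replaced by

$$\begin{aligned} \min\ & v \\ \text{s.t. } & v \ge f_i(x), \quad i = 1, \dots, m, \\ & v \ge -f_i(x), \quad i = 1, \dots, m, \end{aligned} \tag{4}$$

where $v$ is a new variable.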


A Special Method for Minimax Problems 


In [4], a special interval algorithm for (2) is presented. 


Treating As Continuous Problems 


Due to inclusion properties of interval arithmetic, interval algorithms based on a particular degree of smoothness can be effective, essentially unchanged, when less smoothness is present. In particular,

• If the objective function is discontinuous, algorithms designed for continuous objective functions can be used effectively.

• If the function is nonsmooth (that is, if the gradient has discontinuities), then algorithms based on second-order information can often be used effectively.

For a brief discussion and further references for these general algorithms, see > Interval analysis: Unconstrained and constrained optimization. For a more in-depth discussion of how continuous algorithms can be used for discontinuous problems, see [3, Chap. 6]. The main ideas are highlighted below.

Minima of $\varphi: \mathbb{R}^n \to \mathbb{R}^1$ can still be located when the objective $\varphi$ is discontinuous because bounds on the range of $\varphi$ are all that is necessary to do a branch and bound search. For a simple example, suppose

$$\varphi(x) = \begin{cases} x^2, & \text{if } x \le 1, \\ 1 + x, & \text{if } x > 1, \end{cases} \tag{5}$$

and suppose the interval $[-2, 2]$ is to be searched for global minima. For illustration purposes, suppose $\varphi(0.25) = 0.0625$ has been evaluated, so that $0.0625$ is an upper bound on the global optimum, and suppose the subinterval $\mathbf{x} = [0.5, 1.5]$ is to be analyzed. To obtain an interval enclosure for the range of $\varphi$ over $\mathbf{x}$, we take

$$\varphi(\mathbf{x}) \subseteq [0.5, 1.0]^2 \cup (1 + [1.0, 1.5]) = [0.25, 1.0] \cup [2.0, 2.5] = [0.25, 2.5],$$

where $\mathbf{a} \cup \mathbf{b}$ is the smallest interval that contains both $\mathbf{a}$ and $\mathbf{b}$. Thus, since $0.0625$ is less than the lower bound $0.25$ of $[0.25, 2.5]$, a minimum of $\varphi$ cannot possibly occur within the interval $[0.5, 1.5]$.
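The same bookkeeping drives a complete, if tiny, branch and bound search. The Python sketch below is illustrative only (no outward rounding, hypothetical function names): it encloses the range of the discontinuous function (5) by splitting a subinterval at the break point $x = 1$ and taking the hull of the two pieces, as in the computation above, and it discards any subinterval whose lower bound exceeds the best function value found so far. Nothing in it requires $\varphi$ to be continuous.

```python
def phi(x):
    """Objective (5): x**2 for x <= 1 and 1 + x for x > 1 (discontinuous at x = 1)."""
    return x * x if x <= 1.0 else 1.0 + x


def phi_range(lo, hi):
    """Enclosure of phi over [lo, hi]: split at the break point, take the hull of the pieces."""
    pieces = []
    if lo <= 1.0:                         # part where phi(x) = x**2
        a, b = lo, min(hi, 1.0)
        squares = (a * a, b * b)
        pieces.append((0.0 if a <= 0.0 <= b else min(squares), max(squares)))
    if hi > 1.0:                          # part where phi(x) = 1 + x
        pieces.append((1.0 + max(lo, 1.0), 1.0 + hi))
    return min(p[0] for p in pieces), max(p[1] for p in pieces)


def branch_and_bound(lo, hi, tol=1e-3):
    """Bisection search keeping only subintervals that may contain a global minimizer."""
    best = phi(0.5 * (lo + hi))           # any point value is an upper bound on the minimum
    work, kept = [(lo, hi)], []
    while work:
        a, b = work.pop()
        if phi_range(a, b)[0] > best:     # lower bound exceeds a known value: discard
            continue
        m = 0.5 * (a + b)
        best = min(best, phi(m))
        if b - a < tol:
            kept.append((a, b))
        else:
            work.extend([(a, m), (m, b)])
    return best, kept
```

Here `phi_range(0.5, 1.5)` reproduces the enclosure $[0.25, 2.5]$, and `branch_and_bound(-2.0, 2.0)` ends with two narrow subintervals that touch the global minimizer $x = 0$.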

Similar considerations apply if the gradient $\nabla\varphi$ is discontinuous. In such cases, the gradient test (see > Interval analysis: Unconstrained and constrained optimization) will keep boxes that either contain zeros of the gradient or critical points corresponding to gradient discontinuities where the gradient changes sign.

When the gradient is discontinuous, interval Newton methods can still be used for iteration, as well as to verify existence. (See [3, (6.4) and (6.5), p. 217] for a formula; see > Interval Newton methods for an introduction to interval Newton methods; see > Interval fixed point theory for an explanation of interval fixed point theory.) Application to problems with discontinuous gradients is based on extended interval arithmetic (with infinities) and astute computation of slope bounds; see [3, pp. 214–215] for details.


Example 1 Consider

$$f(x) = |x^2 - x| - 2x + 2 = 0. \tag{6}$$

This function has both a root and a cusp at $x = 1$, with a left derivative of $-3$ and a right derivative of $-1$ at $x = 1$. If $1 \in \mathbf{x}$, then a slope enclosure is given by $S(f, \mathbf{x}, \check{x}) = [-1, 1](\mathbf{x} + \check{x} - 1) - 2$.


Fig. 1 The concept of a slope range for a nondifferentiable function [plot of $f(x) = |x^2 - x| - 2x + 2$ near $x = 1$, horizontal axis from about 0.7 to 1.2, with a slope range indicated]


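Consider using the interval Newton method

$$\tilde{\mathbf{x}}^{(k)} = \check{x}^{(k)} - \frac{f(\check{x}^{(k)})}{S(f, \mathbf{x}^{(k)}, \check{x}^{(k)})}, \qquad \mathbf{x}^{(k+1)} = \tilde{\mathbf{x}}^{(k)} \cap \mathbf{x}^{(k)},$$

with $\check{x}^{(k)}$ equal to the midpoint of $\mathbf{x}^{(k)}$, starting from $\mathbf{x}^{(0)} = [0.7, 1.1]$ and $\check{x}^{(0)} = 0.9$, where $S(f, \mathbf{x}, \check{x})$ is a bound on the slope enclosure of $f$ at $\check{x}$. (See Fig. 1 for the concept of slope range.)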

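An initial slope enclosure is then

$$S(f, [0.7, 1.1], 0.9) = [-1, 1]([0.7, 1.1] + 0.9 - 1) - 2 = [-1, 1] \cdot [0.6, 1.0] - 2 = [-3, -1],$$

and $f(0.9) = 0.29$. If this interval Newton method is iterated, then on iteration 3, existence of a root within $\mathbf{x}^{(3)}$ was proven, since $\mathbf{x}^{(3)} \subset \operatorname{int} \mathbf{x}^{(2)}$, where $\operatorname{int} \mathbf{x}^{(2)}$ is the interior of $\mathbf{x}^{(2)}$. For details, see [3, pp. 224–225].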
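The iteration just described is easy to reproduce. The Python sketch below is a minimal, non-rigorous illustration (ordinary floating point without outward rounding, hypothetical function names): it uses the slope enclosure $S(f, \mathbf{x}, \check{x}) = [-1, 1](\mathbf{x} + \check{x} - 1) - 2$ from the text with $\check{x}$ the midpoint, and it reports the first iteration at which the Newton image lies strictly inside the current interval, which is the existence test quoted above.

```python
def f(x):
    return abs(x * x - x) - 2.0 * x + 2.0


def imul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def slope(box, c):
    """Slope enclosure S(f, box, c) = [-1, 1] * (box + c - 1) - 2 from the text."""
    t = imul((-1.0, 1.0), (box[0] + c - 1.0, box[1] + c - 1.0))
    return (t[0] - 2.0, t[1] - 2.0)


def interval_newton(box, max_iter=20):
    """Return (iteration at which existence is verified or None, final interval)."""
    for k in range(1, max_iter + 1):
        c = 0.5 * (box[0] + box[1])
        s = slope(box, c)
        if s[0] <= 0.0 <= s[1]:
            raise ValueError("slope enclosure contains 0; extended arithmetic needed")
        fc = f(c)
        q = sorted((fc / s[0], fc / s[1]))      # f(c) / S as an interval
        image = (c - q[1], c - q[0])            # Newton image of the current interval
        inside = box[0] < image[0] and image[1] < box[1]
        box = (max(box[0], image[0]), min(box[1], image[1]))
        if box[0] > box[1]:
            return None, None                   # empty intersection: no root in the interval
        if inside:
            return k, box
    return None, box
```

Starting from `interval_newton((0.7, 1.1))`, the first iteration gives $[0.9967, 1.1]$ (approximately), the second does not yet map into the interior, and the third does, matching the count in the text; a rigorous implementation would add directed rounding.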
Introduction


There are many applications of optimization for dy- 
namical systems, including parameter estimation from 
time series data, determination of optimal operating 
profiles for batch and semibatch processes, optimal 
start-up, shutdown, and switching of continuous systems, etc. To address such problems, one approach is to discretize any control profiles that appear as decision variables. There are then basically two types of methods available: (1) the complete discretization or simultaneous approach [20,28], in which both state variables and control profiles are discretized, and (2) the control parameterization or sequential approach [3,26], in which only the control profiles are discretized. In this article, only the sequential approach is considered. Since these problems are often nonconvex and thus may exhibit multiple local solutions, the classical techniques based on solving the necessary conditions for a local minimum may fail to determine the global optimum. This is true even for a rather simple temperature-control problem with a batch reactor [12]. Therefore, there is an interest in global optimization algorithms which can rigorously guarantee optimal performance.

There has been significant recent work on this problem. For example, Esposito and Floudas [6,7] used the αBB approach [1,2] to address the global optimization of dynamic systems. In this method, convex underestimating functions are used in connection with a branch-and-bound framework. A theoretical guarantee of attaining an ε-global solution is offered as long as rigorous underestimators are used, and this requires that sufficiently large values of α be used. However, this is difficult in this context because determining proper values of α depends on the Hessian of the function being underestimated, and this matrix is not available in explicit functional form when the sequential approach is used. Thus, as discussed in more detail by Papamichail and Adjiman [21], this approach does not provide a theoretical guarantee of global optimality. Alternative approaches have been given by Chachuat and Latifi [4] and by Papamichail and Adjiman [21,22] that do provide a theoretical guarantee of ε-global optimality; however, this is achieved at a high computational cost. Singer and Barton [25] have described a branch-and-bound approach for determining a theoretically guaranteed ε-global optimum with significantly less computational effort. In this method, convex underestimators and concave overestimators are used to construct two bounding initial value problems (IVPs), which are then solved to obtain lower and upper bounds on the trajectories of the state variables [24]. However, the bounding IVPs are solved using standard numerical methods that do not provide guaranteed error estimates, and so this approach does not provide fully guaranteed results from a computational standpoint.

In this article we discuss an approach [8,9] for the 
deterministic global optimization of dynamical systems 
based on interval analysis. A key feature of the method 
is the use of a verifying solver [10] for parametric or- 
dinary differential equations (ODEs), which is used to 
produce guaranteed bounds on the solutions of dy- 
namic systems with interval-valued parameters. This is 
combined with a technique for domain reduction based
on using Taylor models [19] in an efficient constraint 
propagation scheme. The result is that problems can be 
solved to global optimality with both mathematical and 
computational certainty. 


Formulation 


In this section we give the mathematical formulation of the nonlinear dynamic optimization problem to be addressed. Assume the system is described by the nonlinear ODE model $\dot{x} = f(x, \theta)$. Here $x$ is the vector of state variables (length $n$) and θ is a vector of adjustable parameters (length $p$), which may be a parameterization of a control profile θ(t). The model is given as an autonomous system; a nonautonomous system can easily be converted into autonomous form by treating the independent variable (t) as an additional state variable with derivative equal to 1. The objective function φ is expressed in terms of the adjustable parameters and the values of the states at discrete points $t_\mu$, $\mu = 0, 1, \ldots, r$. That is, $\phi = \phi[x_\mu(\theta), \theta;\ \mu = 0, 1, \ldots, r]$, where $x_\mu(\theta) = x(t_\mu, \theta)$. If an integral appears in the objective function, it can be eliminated by introducing an appropriate quadrature variable.
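The two conversions just mentioned can be sketched as follows. The first treats time as an extra state with derivative 1. The second turns an integral term of the objective into a quadrature state. This is a minimal illustration under stated assumptions. The helper names `augment` and `rk4` are hypothetical. The integrator is an ordinary floating-point RK4, not the verified solver used by the method. The test problem is made up for illustration.

```python
import math

def augment(f, integrand):
    """Return an autonomous right-hand side g(z, theta) with z = (x..., t, q).

    f(x, theta, t) is a possibly nonautonomous RHS returning a list, and
    integrand(x, theta, t) is the integral term of the objective. Time t is
    appended as a state with derivative 1, and q is a quadrature state whose
    derivative is the integrand, so q at the final time equals the integral.
    """
    def g(z, theta):
        x, t = z[:-2], z[-2]
        return list(f(x, theta, t)) + [1.0, integrand(x, theta, t)]
    return g

def rk4(g, z0, theta, duration, steps=1000):
    """Fixed-step classical RK4 for dz/dt = g(z, theta); NOT a verified integrator."""
    h = duration / steps
    z = list(z0)
    for _ in range(steps):
        k1 = g(z, theta)
        k2 = g([zi + 0.5 * h * ki for zi, ki in zip(z, k1)], theta)
        k3 = g([zi + 0.5 * h * ki for zi, ki in zip(z, k2)], theta)
        k4 = g([zi + h * ki for zi, ki in zip(z, k3)], theta)
        z = [zi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
    return z

# Hypothetical example: x' = -theta * x, objective = integral of x**2 over [0, 1].
g = augment(lambda x, th, t: [-th * x[0]], lambda x, th, t: x[0] ** 2)
x_end, t_end, q_end = rk4(g, [1.0, 0.0, 0.0], theta=1.0, duration=1.0)
```

Here `q_end` approximates $(1 - e^{-2})/2$, the value of the integral. The objective therefore depends only on the augmented state at the final time, as the formulation above requires.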
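The optimization problem is then stated as

$$
\begin{aligned}
\min_{\theta,\, x_\mu}\quad & \phi\left[x_\mu(\theta), \theta;\ \mu = 0, 1, \ldots, r\right]\\
\text{subject to}\quad & \dot{x} = f(x, \theta),\\
& x_0 = x_0(\theta),\\
& t \in [t_0, t_r],\\
& \theta \in \Theta.
\end{aligned}
\tag{1}
$$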


Here Θ is an interval vector that provides upper and lower parameter bounds (uppercase will be used to denote interval-valued quantities, unless noted otherwise). We assume that $f$ is $(k-1)$ times continuously differentiable with respect to the state variables $x$, and $(q+1)$ times continuously differentiable with respect to the parameters θ. We also assume that φ is $(q+1)$ times continuously differentiable with respect to the parameters θ. Here $k$ is the order of the truncation error in the interval Taylor series (ITS) method to be used in the integration procedure, and $q$ is the order of the Taylor model to be used to represent parameter dependence. When a typical sequential approach is used, an ODE solver is applied to the constraints with a given set of parameter values, as determined by the optimization routine. This effectively eliminates $x_\mu$, $\mu = 0, 1, \ldots, r$, and leaves a bound-constrained minimization in the adjustable parameters θ only. The
method discussed here can also be extended to opti- 
mization problems with general state path constraints, 
and more general equality or inequality constraints on 
parameters. This is done by adapting the constraint 
propagation procedure (CPP) discussed below to han- 
dle the additional constraints. 


Methods 
Taylor Models 


Makino and Berz [13] have described a remainder 
differential algebra (RDA) approach that uses Taylor 
models for bounding function ranges. This represents 
an approach for controlling the “dependency problem” 
of interval arithmetic, which leads to overestimation of 
function ranges. In the RDA approach, a function is 
represented using a model consisting of a Taylor poly- 
nomial and an interval remainder bound. 

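One way of forming a Taylor model of a function is by using a truncated Taylor series. Consider a function $f: x \in X \subset \mathbb{R}^m \to \mathbb{R}$ that is $(q+1)$ times partially differentiable on $X$ and let $x_0 \in X$. The Taylor theorem states that for each $x \in X$, there exists a $\zeta \in \mathbb{R}$ with $0 < \zeta < 1$ such that

$$
f(x) = \sum_{i=0}^{q} \frac{1}{i!}\left[(x - x_0)\cdot\nabla\right]^i f(x_0) + \frac{1}{(q+1)!}\left[(x - x_0)\cdot\nabla\right]^{q+1} f\left[x_0 + (x - x_0)\zeta\right], \tag{2}
$$

where the partial differential operator $[g\cdot\nabla]^k$ is

$$
[g\cdot\nabla]^k = \sum_{\substack{j_1 + \cdots + j_m = k\\ 0 \le j_1, \ldots, j_m \le k}} \frac{k!}{j_1! \cdots j_m!}\, g_1^{j_1} \cdots g_m^{j_m} \frac{\partial^k}{\partial x_1^{j_1} \cdots \partial x_m^{j_m}}. \tag{3}
$$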


The last (remainder) term in (2) can be quantitatively bounded over $0 < \zeta < 1$ and $x \in X$ using interval arithmetic or other methods to obtain an interval remainder bound $R_f$. The summation in (2) is a $q$th order polynomial (truncated Taylor series) in $(x - x_0)$, which we denote by $p_f(x - x_0)$. A $q$th order Taylor model $T_f$ for $f(x)$ then consists of the polynomial $p_f$ and the interval remainder bound $R_f$ and is denoted by $T_f = (p_f, R_f)$. Note that $f \in T_f$ for $x \in X$ and thus $T_f$ encloses the range of $f$ over $X$.
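As a concrete instance of this construction, the sketch below forms a $q$th order Taylor model of a hypothetical one-variable test function, $f(x) = e^x$. Every derivative of this function is known in closed form. It is a sketch only. The function names are our own. No outward rounding is performed, so, unlike an RDA implementation, the remainder bound is not rigorous in floating point.

```python
import math

def ipow(lo, hi, n):
    """Tight interval power [lo, hi]**n for integer n >= 1."""
    a, b = lo ** n, hi ** n
    if n % 2 == 0 and lo <= 0.0 <= hi:
        return 0.0, max(a, b)
    return min(a, b), max(a, b)

def imul(a, b):
    """Interval product of a = (lo, hi) and b = (lo, hi)."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return min(p), max(p)

def exp_taylor_model(x0, X, q):
    """qth order Taylor model (p_f, R_f) of f(x) = exp(x) on X = (lo, hi) about x0.

    Every derivative of exp is exp, so p_f has coefficients exp(x0)/i! in
    powers of (x - x0). The remainder term of (2) is
    exp(xi) * (x - x0)**(q+1) / (q+1)! with xi in X, bounded here with
    interval arithmetic using the monotonicity of exp.
    """
    coeffs = [math.exp(x0) / math.factorial(i) for i in range(q + 1)]
    d = ipow(X[0] - x0, X[1] - x0, q + 1)
    r = imul(d, (math.exp(X[0]), math.exp(X[1])))
    fact = math.factorial(q + 1)
    return coeffs, (r[0] / fact, r[1] / fact)

def eval_poly(coeffs, dx):
    """Evaluate the polynomial part p_f at dx = x - x0."""
    return sum(c * dx ** i for i, c in enumerate(coeffs))

p_f, R_f = exp_taylor_model(0.0, (0.0, 0.5), 2)
```

For $X = [0, 0.5]$, $x_0 = 0$ and $q = 2$, this gives $p_f = 1 + (x - x_0) + (x - x_0)^2/2$ and $R_f = [0, e^{0.5}/48]$. For every $x \in X$, $p_f(x - x_0) + R_f$ then contains $e^x$.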

In practice, it is more useful to compute Taylor 
models of functions by performing Taylor model op- 
erations. Arithmetic operations with Taylor models can 
be done using the RDA operations described by Makino 
and Berz [13,14], which include addition, multiplica- 
tion, reciprocal, and intrinsic functions. Therefore, it is 
possible to compute a Taylor model for any function 
representable in a computer environment by simple 
operator overloading through RDA operations. When 
RDA operations are performed, only the coefficients of $p_f$ are stored and operated on; however, rounding errors are bounded and added to $R_f$. It has been shown
that, compared with other rigorous bounding meth- 
ods, the Taylor model can be used to obtain sharper 
bounds for modest to complicated functional depen- 
dencies [13,19]. 
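The following sketch shows how addition and multiplication of one-dimensional Taylor models can be implemented by operator overloading. Terms of order higher than $q$ are folded into the remainder. It is a simplified illustration, not the RDA algorithm of Makino and Berz. The class and helper names are hypothetical, each monomial is bounded separately, and rounding errors are not bounded.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]

def iadd(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])

def imul(a: Interval, b: Interval) -> Interval:
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def ipow(a: Interval, n: int) -> Interval:
    """Tight interval power a**n for integer n >= 0."""
    if n == 0:
        return (1.0, 1.0)
    lo, hi = a[0] ** n, a[1] ** n
    if n % 2 == 0 and a[0] <= 0.0 <= a[1]:
        return (0.0, max(lo, hi))
    return (min(lo, hi), max(lo, hi))

@dataclass
class TaylorModel:
    """1-D Taylor model T = (p, R): p in powers of d = x - x0, d ranging over dom."""
    coeffs: List[float]  # polynomial coefficients, lowest degree first
    rem: Interval        # interval remainder bound R
    dom: Interval        # range of d = x - x0 over X
    q: int               # order of the model

    def bound_poly(self, coeffs: List[float]) -> Interval:
        # Bound each monomial c_i * d**i separately and add the bounds.
        acc = (0.0, 0.0)
        for i, c in enumerate(coeffs):
            acc = iadd(acc, imul((c, c), ipow(self.dom, i)))
        return acc

    def bound(self) -> Interval:
        # B(T) = B(p) + R
        return iadd(self.bound_poly(self.coeffs), self.rem)

    def __add__(self, other: "TaylorModel") -> "TaylorModel":
        n = max(len(self.coeffs), len(other.coeffs))
        a = self.coeffs + [0.0] * (n - len(self.coeffs))
        b = other.coeffs + [0.0] * (n - len(other.coeffs))
        return TaylorModel([x + y for x, y in zip(a, b)],
                           iadd(self.rem, other.rem), self.dom, self.q)

    def __mul__(self, other: "TaylorModel") -> "TaylorModel":
        prod = [0.0] * (len(self.coeffs) + len(other.coeffs) - 1)
        for i, a in enumerate(self.coeffs):
            for j, b in enumerate(other.coeffs):
                prod[i + j] += a * b
        keep = prod[: self.q + 1]
        high = [0.0] * len(keep) + prod[self.q + 1:]
        # Terms above order q, plus p1*R2 + p2*R1 + R1*R2, go into the remainder.
        rem = self.bound_poly(high)
        rem = iadd(rem, imul(self.bound_poly(self.coeffs), other.rem))
        rem = iadd(rem, imul(other.bound_poly(other.coeffs), self.rem))
        rem = iadd(rem, imul(self.rem, other.rem))
        return TaylorModel(keep, rem, self.dom, self.q)

# Hypothetical use: x on X = [-0.5, 0.5] about x0 = 0, first-order models.
x = TaylorModel([0.0, 1.0], (0.0, 0.0), (-0.5, 0.5), q=1)
sq = x * x
```

Take $x$ on $[-0.5, 0.5]$. The bound of the product `x * x` is $[0, 0.25]$. Naive interval multiplication of $[-0.5, 0.5]$ by itself instead gives $[-0.25, 0.25]$. In this case, the Taylor model avoids the dependency problem.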

An interval bound on a Taylor model T = (p, R) 
over X is denoted by B(T), and is found by de- 
termining an interval bound B(p) on the polyno- 
mial part p and then adding the remainder bound; 
that is, B(T) = B(p) + R. The range bounding of the 
polynomials $B(p) = P(X - x_0)$ is an important issue,
which directly affects the performance of Taylor model 
methods. Unfortunately, the exact range bounding of 
an interval polynomial is nondeterministic polyno- 
mial-time hard, and direct evaluation using interval 
arithmetic is very inefficient, often yielding only loose 
bounds. Thus, various bounding schemes [15,19] have 
been used, mostly focused on exact bounding of the 
dominant parts of P, i.e., the first- and second-order 
terms. However, exact bounding of a general inter- 
val quadratic is also computationally expensive (in the 
worst case, exponential in the number of variables m). 
Lin and Stadtherr [8] have adopted a very simple com- 
promise approach, in which only the first-order and the 
diagonal second-order terms are considered for exact 
bounding, and other terms are evaluated directly. That 
is, 


$$
B(p) = \sum_{i=1}^{m}\left[a_i (X_i - x_{i0})^2 + b_i (X_i - x_{i0})\right] + Q, \tag{4}
$$

where $Q$ is the interval bound of all other terms, and is obtained by direct evaluation with interval arithmetic. In (4), since $X_i$ occurs twice, there exists a dependency problem. For $|a_i| \ge \omega$, where ω is a small positive number, (4) can be rearranged so that each $X_i$ occurs only once; that is,

$$
B(p) = \sum_{i=1}^{m}\left[a_i\left(X_i - x_{i0} + \frac{b_i}{2a_i}\right)^2 - \frac{b_i^2}{4a_i}\right] + Q. \tag{5}
$$

In this way, the dependency problem in bounding the interval polynomial is alleviated so that a sharper bound can be obtained. If $|a_i| < \omega$, direct evaluation can be used instead.
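The sketch below shows the effect of the rearrangement from (4) to (5). It bounds a single term $a D^2 + b D$, with $D = X_i - x_{i0}$, both ways, and omits the $Q$ term. The function names and the default ω are illustrative choices, and rounding is not controlled.

```python
def isq(lo, hi):
    """Tight interval square of [lo, hi]."""
    if lo <= 0.0 <= hi:
        return 0.0, max(lo * lo, hi * hi)
    return min(lo * lo, hi * hi), max(lo * lo, hi * hi)

def scale(c, iv):
    """Product of the real number c with the interval iv."""
    a, b = c * iv[0], c * iv[1]
    return min(a, b), max(a, b)

def bound_direct(a, b, d):
    """Bound a*D**2 + b*D term by term, as in (4); D = X_i - x_i0 occurs twice."""
    s = scale(a, isq(*d))
    t = scale(b, d)
    return s[0] + t[0], s[1] + t[1]

def bound_completed_square(a, b, d, omega=1e-12):
    """Bound a*(D + b/(2a))**2 - b**2/(4a), as in (5); D occurs only once."""
    if abs(a) < omega:
        return bound_direct(a, b, d)  # fall back to direct evaluation
    shift = b / (2.0 * a)
    s = scale(a, isq(d[0] + shift, d[1] + shift))
    c = b * b / (4.0 * a)
    return s[0] - c, s[1] - c
```

Take $a = 1$, $b = -1$ and $D = [-1, 1]$. Direct evaluation gives $[-1, 2]$. The completed-square form gives $[-0.25, 2]$, which here is the exact range of $D^2 - D$.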


Verifying Solver for Parametric ODEs 


When a traditional sequential approach is applied to the optimization of nonlinear dynamical systems, the objective function φ is evaluated, for a given value of θ, by applying an ODE solver to the constraints to eliminate the state variables $x$. In the global optimization approach discussed here, a sequential approach based on interval analysis is used. This approach requires the evaluation of bounds on φ, given some parameter interval Θ. Thus, an ODE solver is needed that can compute bounds on $x_\mu$, $\mu = 0, 1, \ldots, r$, for the case in which the parameters are interval-valued. Interval methods (also called validated methods or verified methods) for ODEs [16] provide a natural approach for computing the desired enclosure of the state variables at $t_\mu$, $\mu = 0, 1, \ldots, r$. An excellent review of interval methods for IVPs has been given by Nedialkov
et al. [17]. Much work has been done for the case in 
which the initial values are given by intervals, and there 
are several software packages available that deal with 
this case. However, less work has been done on the case 
in which parameters are also given by intervals. In the 
global optimization method discussed here, a verifying 
solver for parametric ODEs [10], called VSPODE, is 
used to produce guaranteed bounds on the solutions of 
dynamic systems with interval-valued initial states and 
parameters. In this section, we review the key ideas be- 
hind the method used in VSPODE, and outline the pro- 
cedures used. Additional details are given by Lin and 
Stadtherr [10]. 
Consider the parametric ODE system

$$
\dot{x} = f(x, \theta), \quad x_0 \in X_0, \quad \theta \in \Theta, \tag{6}
$$

where $t \in [t_0, t_r]$ for some $t_r > t_0$. The interval vectors $X_0$ and Θ represent enclosures of initial values and parameters, respectively. It is desired to determine a verified enclosure of all possible solutions to this initial value problem. We denote by $x(t; t_j, X_j, \Theta) = \{x(t; t_j, x_j, \theta) \mid x_j \in X_j, \theta \in \Theta\}$ the set of solutions, where $x(t; t_j, x_j, \theta)$ denotes a solution of $\dot{x} = f(x, \theta)$ for the initial condition $x = x_j$ at $t = t_j$. We will outline a method for determining enclosures $X_j$ of the state variables at each time step $j = 1, \ldots, r$, such that $x(t_j; t_0, X_0, \Theta) \subseteq X_j$.

Assume that at $t_j$ we have an enclosure $X_j$ of $x(t_j; t_0, X_0, \Theta)$, and that we want to carry out an integration step to compute the next enclosure $X_{j+1}$. Then, in the first phase of the method, the goal is to find a step size $h_j = t_{j+1} - t_j > 0$ and an a priori enclosure (coarse enclosure) $\tilde{X}_j$ of the solution such that a unique solution $x(t; t_j, x_j, \theta) \in \tilde{X}_j$ is guaranteed to exist for all $t \in [t_j, t_{j+1}]$, all $x_j \in X_j$, and all $\theta \in \Theta$. One can apply a traditional interval method, with high-order enclosure, to the parametric ODEs by using an ITS with respect to time. That is, $h_j$ and $\tilde{X}_j$ are determined such that for $X_j \subseteq \tilde{X}_j^0$,

$$
\tilde{X}_j = \sum_{i=0}^{k-1} [0, h_j]^i F^{[i]}(X_j, \Theta) + [0, h_j]^k F^{[k]}(\tilde{X}_j^0, \Theta) \subseteq \tilde{X}_j^0. \tag{7}
$$

Here $\tilde{X}_j^0$ is an initial estimate of $\tilde{X}_j$, $k$ denotes the order of the Taylor expansion, and the coefficients $F^{[i]}$ are interval extensions of the Taylor coefficients $f^{[i]}$ of $x(t)$ with respect to time. Satisfaction of (7) demonstrates [5] that there exists a unique solution $x(t; t_j, x_j, \theta) \in \tilde{X}_j$ for all $t \in [t_j, t_{j+1}]$, all $x_j \in X_j$, and all $\theta \in \Theta$.
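A minimal sketch of the phase-one test (7) for a single scalar step follows. To keep it self-contained, it assumes the hypothetical test problem $\dot{x} = -\theta x$ with $k = 2$. For this problem the time Taylor coefficients $f^{[i]} = (-\theta)^i x / i!$ can be written by hand. VSPODE itself obtains the coefficients automatically for general systems. No outward rounding is performed.

```python
def iadd(a, b):
    return a[0] + b[0], a[1] + b[1]

def imul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return min(p), max(p)

def taylor_coeff(i, X, Theta):
    """Interval extension F^[i](X, Theta) of the i-th time Taylor coefficient
    f^[i] = (-theta)**i * x / i! for the hypothetical test ODE x' = -theta * x."""
    if i == 0:
        return X
    if i == 1:
        return imul((-Theta[1], -Theta[0]), X)
    if i == 2:
        t2 = imul(Theta, Theta)
        return imul((t2[0] / 2.0, t2[1] / 2.0), X)
    raise ValueError("this sketch only implements i <= 2")

def apriori_enclosure(Xj, Theta, h, X0_guess, k=2):
    """Left-hand side of (7) for one step of size h, plus the inclusion test.

    Since h > 0, [0, h]**i = [0, h**i] for i >= 1, and [0, h]**0 = [1, 1].
    Returns (X_tilde, ok), where ok means X_tilde lies inside X0_guess.
    """
    acc = (0.0, 0.0)
    for i in range(k):
        hp = (1.0, 1.0) if i == 0 else (0.0, h ** i)
        acc = iadd(acc, imul(hp, taylor_coeff(i, Xj, Theta)))
    acc = iadd(acc, imul((0.0, h ** k), taylor_coeff(k, X0_guess, Theta)))
    ok = X0_guess[0] <= acc[0] and acc[1] <= X0_guess[1]
    return acc, ok

X_tilde, ok = apriori_enclosure((0.9, 1.1), (0.5, 1.0), 0.1, (0.7, 1.2))
```

Take $X_j = [0.9, 1.1]$, Θ = [0.5, 1.0], $h = 0.1$ and the trial enclosure $[0.7, 1.2]$. The computed $\tilde{X}_j \approx [0.79, 1.106]$ lies inside the trial box, so the step is accepted. With $h = 2$ the inclusion fails, and a smaller step or a larger trial box would be needed.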

In the second phase of the method, a tighter enclosure $X_{j+1} \subseteq \tilde{X}_j$ is computed such that $x(t_{j+1}; t_0, X_0, \Theta) \subseteq X_{j+1}$. This is done by using an ITS approach to compute a Taylor model $T_{x_{j+1}}$ of $x_{j+1}$ in terms of the parameter vector θ and initial state vector $x_0$, and then obtaining the enclosure $X_{j+1} = B(T_{x_{j+1}})$ by bounding $T_{x_{j+1}}$ over $\theta \in \Theta$ and $x_0 \in X_0$. To determine enclosures of the ITS coefficients $f^{[i]}(x_j, \theta)$, an approach combining RDA operations with the mean value theorem is used to obtain the Taylor models $T_{f^{[i]}}$. Now using an ITS for $x_{j+1}$ with coefficients given by $T_{f^{[i]}}$, one can obtain a result for $T_{x_{j+1}}$ in terms of the parameters and initial states. In order to address the wrapping effect [16], results are propagated from one time step to the next using a type of Taylor model in which the remainder bound is not an interval but a parallelepiped. That is, the remainder bound is a set of the form $P = \{Av \mid v \in V\}$, where $A \in \mathbb{R}^{n \times n}$ is a real and regular matrix. If $A$ is orthogonal, as from a QR-factorization, then $P$ can be interpreted as a rotated $n$-dimensional rectangle. Complete details of the computation of $T_{x_{j+1}}$ were given by Lin and Stadtherr [10].
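When an ordinary box is needed, the parallelepiped remainder $P = \{Av \mid v \in V\}$ can be enclosed in one. The sketch below computes that interval hull. It obtains an orthogonal $A$ from a QR factorization by classical Gram–Schmidt. These choices and the helper names are ours, for illustration only; Lin and Stadtherr [10] give the actual procedure used in VSPODE.

```python
import math

def gram_schmidt_q(M):
    """Orthogonal factor Q of M = QR via classical Gram-Schmidt (M square, nonsingular)."""
    n = len(M)
    cols = [[M[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    for v in cols:
        w = list(v)
        for q in q_cols:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        q_cols.append([wi / norm for wi in w])
    return [[q_cols[j][i] for j in range(n)] for i in range(n)]

def hull_of_parallelepiped(A, V):
    """Smallest box containing P = {A v | v in V}, with V a box [(lo, hi), ...].

    Row i has center (A mid(V))_i and radius (|A| rad(V))_i; this is exact
    for a linear map, up to floating-point rounding.
    """
    n = len(A)
    mid = [(lo + hi) / 2.0 for lo, hi in V]
    rad = [(hi - lo) / 2.0 for lo, hi in V]
    out = []
    for i in range(n):
        c = sum(A[i][j] * mid[j] for j in range(n))
        r = sum(abs(A[i][j]) * rad[j] for j in range(n))
        out.append((c - r, c + r))
    return out

Q = gram_schmidt_q([[1.0, 1.0], [1.0, -1.0]])
hull = hull_of_parallelepiped(Q, [(-1.0, 1.0), (-0.1, 0.1)])
```

Consider the thin rotated rectangle $P$ with $V = [-1, 1] \times [-0.1, 0.1]$. Each coordinate of its box hull is $[-1.1/\sqrt{2}, 1.1/\sqrt{2}]$, so the hull is far larger than $P$. Repeated boxing of this kind (the wrapping effect) would accumulate that overestimation.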

The approach outlined above, as implemented in 
VSPODE, has been tested by Lin and Stadtherr [10], 
who compared its performance with results obtained 
using the popular VNODE package [18]. For the test 
problems used, VSPODE provided tighter enclosures 
on the state variables than VNODE, and required sig- 
nificantly less computation time. 


Deterministic Global Optimization Method 


In this section, we summarize a method for the de- 
terministic global optimization of dynamical systems, 
based on the use of the tools described above. As noted 
previously, when a sequential approach is used, the 
state variables are effectively eliminated using the ODE 
constraints, in this case by employing VSPODE, leaving a bound-constrained minimization of φ(θ) with respect to the adjustable parameters (decision variables) θ. The optimization method discussed here can be thought of as a type of branch-and-bound method, with a CPP used for domain reduction. Therefore, it can also be viewed as a branch-and-reduce algorithm. The basic idea is that only those parts of the decision variable space Θ that satisfy the constraint $c(\theta) = \phi(\theta) - \hat{\phi} \le 0$, where $\hat{\phi}$ is a known upper bound on the global minimum found using local minimization, need to be retained. To perform this domain reduction, a CPP can be used.

Partial information expressed by a constraint can 
be used to eliminate incompatible values from the do- 
main of its variables. This domain reduction can then 
be propagated to all constraints on that variable, where 
it may be used to further reduce the domains of other 
variables. This process is known as constraint propagation. It is applied to a sequence of subintervals of Θ, which arises in a bisection process. For a subinterval $\Theta^{(b)}$, the Taylor model $T_\phi$ of the objective function φ over $\Theta^{(b)}$ is computed. To do this, Taylor models of $x_\mu$, the state variables at times $t_\mu$, $\mu = 1, \ldots, r$, in terms of θ are determined using VSPODE. Note that $T_\phi$ then consists of a $q$th order polynomial in the decision variables θ, plus a remainder bound. The part of $\Theta^{(b)}$ that can contain the global minimum must satisfy the constraint $c(\theta) = \phi(\theta) - \hat{\phi} \le 0$. In the CPP outlined here, $B(T_c)$, with $T_c = T_\phi - \hat{\phi}$, is determined and then there are three possible outcomes (in the following, an underline is used to indicate the lower bound of an interval, and an overline is used to indicate the upper bound):


1. If $\underline{B}(T_c) > 0$, then no $\theta \in \Theta^{(b)}$ will ever satisfy the constraint; thus, the CPP can be stopped and $\Theta^{(b)}$ discarded. Testing for this outcome amounts to checking if the lower bound of $T_\phi$, $\underline{B}(T_\phi)$, is greater than $\hat{\phi}$. If so, then $\Theta^{(b)}$ can be discarded because it cannot contain the global minimum and need not be tested further.

2. If $\overline{B}(T_c) < 0$, then every $\theta \in \Theta^{(b)}$ will always satisfy the constraint; thus, $\Theta^{(b)}$ cannot be reduced and the CPP can be stopped. This amounts to checking if the upper bound of $T_\phi$, $\overline{B}(T_\phi)$, is less than $\hat{\phi}$. This also indicates, with certainty, that there is a point in $\Theta^{(b)}$ that can be used to update $\hat{\phi}$, which can then be done using a local optimization routine.

3. If neither of the previous two cases occurs, then part of the interval $\Theta^{(b)}$ may be eliminated. To do this, an approach [8,9] based on the range bounding strategy for Taylor models is used, as given by (5). If insufficient reduction of $\Theta^{(b)}$ occurs, then it is bisected and the resulting subintervals are added to the sequence of subintervals to be processed.
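The three-way test above can be written compactly. In this sketch, whose names are hypothetical, the bound $B(T_\phi) = [\phi_{lo}, \phi_{hi}]$ is assumed to have been computed already. The outcome names merely label the three cases. Testing $\underline{B}(T_c) > 0$ is the same as testing $\phi_{lo} > \hat{\phi}$, and similarly for the upper bound. The Taylor-model-based reduction of case 3 is not reproduced; only the fallback bisection is shown.

```python
from enum import Enum

class Outcome(Enum):
    DISCARD = 1           # case 1: the subinterval cannot contain the global minimum
    KEEP_AND_UPDATE = 2   # case 2: cannot be reduced; some point improves phi_hat
    REDUCE_OR_BISECT = 3  # case 3: try domain reduction, else bisect

def classify(phi_lo, phi_hi, phi_hat):
    """Classify a subinterval from B(T_phi) = [phi_lo, phi_hi] and the incumbent phi_hat."""
    if phi_lo > phi_hat:      # lower bound of T_c = T_phi - phi_hat is > 0
        return Outcome.DISCARD
    if phi_hi < phi_hat:      # upper bound of T_c is < 0
        return Outcome.KEEP_AND_UPDATE
    return Outcome.REDUCE_OR_BISECT

def bisect(box):
    """Split a one-dimensional subinterval (lo, hi) at its midpoint."""
    lo, hi = box
    mid = 0.5 * (lo + hi)
    return (lo, mid), (mid, hi)
```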


Complete details of the optimization method based 
on these ideas were given by Lin and Stadtherr [8,9]. 
It can be implemented either as an ε-global algorithm, or, by incorporating interval-Newton steps in the method, as an exact (ε = 0) algorithm. The latter
requires the application of VSPODE to the first- and 
second-order sensitivity equations. An exact algorithm 
using interval-Newton steps was implemented by Lin 
and Stadtherr [8] for the special case of parameter esti- 
mation problems. However, this has not been fully im- 
plemented for more general cases. 


Cases 


Lin and Stadtherr [8,9] have tested the performance of 
the algorithm discussed above on a variety of test prob- 
lems. In this section we summarize the results for two 
of these problems. Both example problems were solved 
using an Intel Pentium 4 3.2 GHz machine running Red 
Hat Linux. The VSPODE package [9], with a $k = 17$ order ITS, $q = 3$ order Taylor model, and QR approach
for wrapping, was used to integrate the dynamical sys- 
tems in each problem. Using a smaller value of k will 
result in the need for smaller step sizes in the integra- 
tion and so will tend to increase computation time. Us- 
ing a larger value of q will result in somewhat tighter 
bounds on the states, though at the expense of addi- 
tional complexity in the Taylor model computations. 


Catalytic Cracking of Gas Oil 


This problem involves parameter estimation in a model 
representing the catalytic cracking of gas oil (A) to 
gasoline (Q) and other side products (S), as described 
by Tjoa and Biegler [27] and also studied by several oth- 
ers [4,7,22,25]. The reaction is 


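$$
A \xrightarrow{k_1} Q, \qquad Q \xrightarrow{k_2} S, \qquad A \xrightarrow{k_3} S
$$

Only the concentrations of A and Q were measured. This reaction scheme involves nonlinear reaction kinetics. A least-squares objective was used for parameter estimation, resulting in the optimization problem

$$
\begin{aligned}
\min_{\theta}\quad & \phi = \sum_{\mu=1}^{20} \sum_{i=1}^{2} \left(\hat{x}_{\mu,i} - x_{\mu,i}\right)^2\\
\text{subject to}\quad & \dot{x}_1 = -(\theta_1 + \theta_3)\, x_1^2,\\
& \dot{x}_2 = \theta_1 x_1^2 - \theta_2 x_2,\\
& t \in [0, 0.95],\\
& x_\mu = x(t_\mu),\\
& x_0 = (1, 0)^T,\\
& \theta \in [0, 20] \times [0, 20] \times [0, 20],
\end{aligned}
$$

where $\hat{x}_\mu$ is given experimental data. Here the state vector, $x$, is defined as the concentration vector $(A, Q)^T$ and the parameter vector, θ, is defined as $(k_1, k_2, k_3)^T$.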
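For illustration, the sketch below evaluates the least-squares objective for a given θ by ordinary (non-verified) RK4 integration. The measured data of [27] are not reproduced here. The 20 sample times and the "measurements" are synthetic, generated from the hypothetical parameter values θ = (12, 8, 2), so the objective vanishes at those values by construction.

```python
def rhs(x, theta):
    """Gas-oil cracking model: x = (A, Q) concentrations, theta = (k1, k2, k3)."""
    k1, k2, k3 = theta
    a, q = x
    return [-(k1 + k3) * a * a, k1 * a * a - k2 * q]

def rk4_step(x, theta, h):
    """One classical RK4 step (plain floating point, not verified)."""
    s1 = rhs(x, theta)
    s2 = rhs([xi + 0.5 * h * si for xi, si in zip(x, s1)], theta)
    s3 = rhs([xi + 0.5 * h * si for xi, si in zip(x, s2)], theta)
    s4 = rhs([xi + h * si for xi, si in zip(x, s3)], theta)
    return [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, s1, s2, s3, s4)]

def simulate(theta, times, substeps=50):
    """States x_mu at increasing sample times t_mu, starting from x0 = (1, 0) at t = 0."""
    x, t, out = [1.0, 0.0], 0.0, []
    for t_mu in times:
        h = (t_mu - t) / substeps
        for _ in range(substeps):
            x = rk4_step(x, theta, h)
        t = t_mu
        out.append(list(x))
    return out

def objective(theta, times, data):
    """phi(theta) = sum over mu and i of (xhat_{mu,i} - x_{mu,i})**2."""
    model = simulate(theta, times)
    return sum((d - m) ** 2
               for row_d, row_m in zip(data, model)
               for d, m in zip(row_d, row_m))

# Hypothetical synthetic measurements (NOT the experimental data of [27]).
times = [0.95 * mu / 20 for mu in range(1, 21)]
data = simulate((12.0, 8.0, 2.0), times)
```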
For the ε-global algorithm, with a relative convergence tolerance of $\varepsilon^{\text{rel}} = 10^{-3}$, 14.3 s was required to solve this problem. For the exact (ε = 0) global algorithm using interval-Newton, 11.5 s was required. For this problem, the exact algorithm required less computation than the ε-global algorithm. However, this may or may not be the case for other problems [8]. Papamichail and Adjiman [22] solved this problem to ε-global optimality in 35,478 s (Sun UltraSPARC-II 360 MHz; Matlab), and Chachuat and Latifi [4] obtained an ε-global solution in 10,400 s (unspecified machine; prototype implementation). Singer and Barton [25] solved this problem to ε-global optimality for a series of absolute tolerances, so their results are not directly comparable. However, the computational cost of their method on this problem appears to be quite low. These other methods all provide for ε-convergence only.


Singular Control Problem 


This example is a nonlinear singular optimal control 
problem originally formulated by Luus [11] and also 
considered by Esposito and Floudas [7], Chachuat and 
Latifi [4], and Singer and Barton [25]. This problem is 
known to have multiple local solutions. In autonomous 
form and using a quadrature variable, this problem is 
given by 


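$$
\begin{aligned}
\min_{\theta}\quad & \phi = x_5(t_f)\\
\text{subject to}\quad & \dot{x}_1 = x_2,\\
& \dot{x}_2 = -x_3\theta + 16x_4 - 8,\\
& \dot{x}_3 = \theta,\\
& \dot{x}_4 = 1,\\
& \dot{x}_5 = x_1^2 + x_2^2 + 0.0005\left(x_2 + 16x_4 - 8 - 0.1x_3\theta^2\right)^2,\\
& x_0 = (0, -1, -\sqrt{5}, 0, 0)^T,\\
& t \in [t_0, t_f] = [0, 1],\\
& \theta \in [-4, 10].
\end{aligned}
\tag{8}
$$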
The control θ(t) is parameterized as a piecewise constant profile with a specified number of equal time intervals. Five problems are considered, corresponding to one, two, three, four, and five time intervals in the parameterization. Each problem was solved to an absolute tolerance of $\varepsilon^{\text{abs}} = 10^{-3}$. Computational results [9] are presented in Table 1. This shows, for each problem, the globally optimal objective value φ* and the corresponding optimal controls θ*, as well as the CPU time (in seconds) and number of iterations required. Chachuat and Latifi [4] solved the two-interval problem to ε-global optimality using four different strategies, with the most efficient requiring 502 CPU seconds, using an unspecified machine and a “prototype” implementation. Singer and Barton [25] solved the one-, two-, and three-interval cases with $\varepsilon^{\text{abs}} = 10^{-3}$ using two different problem formulations (with and without a quadrature variable) and two different implementations (with and without branch-and-bound heuristics). The best results in terms of efficiency were achieved with heuristics and without a quadrature variable, with CPU times of 1.8, 22.5, and 540.3 s (1.667 GHz AMD Athlon XP2000+) for the one-, two-, and three-interval problems, respectively. This compares with CPU times of 0.02, 0.32, and 10.88 s (3.2 GHz Intel Pentium 4) for the method discussed here. Even accounting for the roughly factor of 2 difference in the speeds of the machines used, the method described here appears to be well over an order of magnitude faster. The four- and five-interval problems were solved [9] in 369 and 8,580.6 CPU seconds, respectively, and apparently had not been solved previously using a method rigorously guaranteed to find an ε-global minimum. It should be noted that the solution to the three-interval problem, as given in Table 1, differs from the result reported by Singer and Barton [25], which is known to be a misprint [23].
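The piecewise constant control parameterization can be sketched as follows. The helper integrates (8) with a constant θ on each of the equal subintervals of [0, 1] and returns $x_5(1)$. This is a plain floating-point RK4 evaluation, and its step count is an arbitrary choice. It can be used to check candidate controls such as those in Table 1, but it provides none of the guarantees of the verified method.

```python
import math

def rhs(x, theta):
    """Right-hand side of (8) for a constant control value theta."""
    x1, x2, x3, x4, x5 = x
    return [
        x2,
        -x3 * theta + 16.0 * x4 - 8.0,
        theta,
        1.0,
        x1 ** 2 + x2 ** 2
        + 0.0005 * (x2 + 16.0 * x4 - 8.0 - 0.1 * x3 * theta ** 2) ** 2,
    ]

def objective(controls, steps_per_interval=200):
    """Return (phi, final state), with phi = x5(1), for a piecewise-constant control."""
    x = [0.0, -1.0, -math.sqrt(5.0), 0.0, 0.0]
    h = 1.0 / (len(controls) * steps_per_interval)
    for theta in controls:            # one constant value per equal time interval
        for _ in range(steps_per_interval):
            k1 = rhs(x, theta)
            k2 = rhs([a + 0.5 * h * b for a, b in zip(x, k1)], theta)
            k3 = rhs([a + 0.5 * h * b for a, b in zip(x, k2)], theta)
            k4 = rhs([a + h * b for a, b in zip(x, k3)], theta)
            x = [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                 for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x[4], x

phi, x_final = objective([4.071])
```

Since $\dot{x}_3 = \theta$ and $\dot{x}_4 = 1$, the final state satisfies $x_3(1) = -\sqrt{5} + \theta$ and $x_4(1) = 1$ up to rounding, which is a quick consistency check on the implementation.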


Conclusions 


In this article, we have described an approach for the 
deterministic global optimization of dynamical sys- 
tems, including parameter estimation and optimal con- 
trol problems. This method [8,9] is based on inter- 
val analysis and Taylor models and employs a type of 
sequential approach. A key feature of the method is 
the use of a new verifying solver [10] for parametric 
ODEs, which is used to produce guaranteed bounds 
on the solutions of dynamic systems with interval- 
valued parameters. This is combined with techniques 
for domain reduction based on using Taylor mod- 
els in an efficient constraint propagation scheme. The 
result is that problems can be solved to global optimality with both mathematical and computational certainty. On parameter estimation problems, an exact (ε = 0) algorithm, using interval-Newton steps, can be applied at a cost comparable to, and perhaps less than, that of the ε-global algorithm. The new approach can provide significant improvements in computational efficiency, potentially well over an order of magnitude, relative to other recently described methods.


Interval Analysis for Optimization of Dynamical Systems, Table 1
Results [9] for the singular control problem

Time intervals | φ* | θ* | CPU time (s) | No. of iterations
1 |  | (4.071) | 0.02 | 9
2 |  | (5.575, −4.000) | 0.32 | 71
3 |  | (8.001, −1.944, 6.042) | 10.88 | 1,414
4 |  | (9.789, −1.200, 1.257, 6.256) | 369 | 31,073
5 |  |  | 8,580.6 | 
Introduction


The ability of interval arithmetic (IA) [21,22,23,24] to automatically compute reliable solution bounds in numerical computations makes it an ideal mechanism
for solving continuous nonlinear global optimization 
problems. To date, most efforts at developing paral- 
lel IA methods for global optimization have used the 
branch and bound (B&B) global search strategy [1,4,8]. 
The sequential B&B-based IA global optimization al- 
gorithm [10,17] executes a tree-like search process 
which is naturally parallelized and amenable to mas- 
sive coarse-grained data parallelism (i. e. workload scal- 
able [14]). 

Several noteworthy advances in parallel algorithms 
for global optimization using interval arithmetic have 
occurred over the past few years [7,15,26]. In addition, 
new software packages have been developed as a re- 
sult of recent implementations of new or existing paral- 
lel IA global optimization algorithms [13,29]. A paral- 
lel programming language expressing a message-driven 
model is utilized in one implementation, resulting in 
a significantly different computational flow than is typ- 
ical with the more classic and popular message-passing 
(e.g. MPI, PVM) and shared-memory (e.g. pthreads) 
parallel implementations [20]. Recently, the ubiquity of 
multi-core processor architectures has opened up new 
possibilities for exploiting thread-level parallelism. 

In the sections that follow, a sequential (B&B) IA 
global optimization algorithm is presented along with 
relevant IA and parallel computing definitions. Next, 
a general formulation of a parallel IA global opti- 
mization algorithm (PIAGO) based on the B&B global 
search strategy is presented. In the methods section, 
a survey of recent algorithmic advances, novel imple- 
mentations, and pertinent language and programming 
environments is discussed. Finally, some concluding re- 
marks are made along with thoughts on fertile future 
research avenues. 


Definitions 
Interval Arithmetic 


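A “box” is an n-dimensional interval:

$$
\begin{aligned}
\mathbf{X} &= \{x \in \mathbb{R}^n \mid \underline{x}_i \le x_i \le \overline{x}_i,\ i = 0, \ldots, n-1\}\\
&= \left([\underline{x}_0, \overline{x}_0], [\underline{x}_1, \overline{x}_1], \ldots, [\underline{x}_{n-1}, \overline{x}_{n-1}]\right)^T\\
&= (\mathbf{X}_0, \mathbf{X}_1, \ldots, \mathbf{X}_{n-1})^T
\end{aligned}
$$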


Boldface letters and capital letters are used to denote interval quantities and vectors, respectively (as proposed in [17]). The midpoint of X is denoted m(X). The width of X is denoted w(X). The greatest lower bound and the least upper bound of an interval x are denoted $\underline{x}$ and $\overline{x}$, respectively.
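A minimal sketch of these definitions for a box of $n$ intervals follows. The class and function names are hypothetical, and outward rounding is omitted. The text does not define w for boxes, so this sketch assumes a common convention: the width of a box is its largest component width.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Interval:
    lo: float  # greatest lower bound
    hi: float  # least upper bound

    def mid(self) -> float:    # m(X)
        return 0.5 * (self.lo + self.hi)

    def width(self) -> float:  # w(X)
        return self.hi - self.lo

    def __add__(self, o: "Interval") -> "Interval":
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o: "Interval") -> "Interval":
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o: "Interval") -> "Interval":
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

Box = List[Interval]

def box_mid(X: Box) -> List[float]:
    """Midpoint m(X) of a box, taken componentwise."""
    return [xi.mid() for xi in X]

def box_width(X: Box) -> float:
    """Width w(X) of a box, assumed here to be the widest component."""
    return max(xi.width() for xi in X)
```

`Interval(0, 1) - Interval(0, 1)` yields `[-1, 1]` rather than 0. Interval arithmetic treats the two operands as independent, which is the dependency problem.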


Sequential IA Branch and Bound 


A canonical sequential (B&B) IA global optimization algorithm, SIAGO, iterates over a prioritized list of boxes, representing current candidate subregions (of the initial search space) for containing global minimizer(s). The prioritized list, Q, is typically implemented as a heap data structure (see Algorithm 1). In each iteration, a box, X, is removed from Q. If w(X) and w(f(X)) are less than the prescribed tolerances, $\varepsilon_x$ and $\varepsilon_f$, respectively, then X is placed on the solution list, S; otherwise, X is subdivided into smaller boxes, $X_0, X_1, \ldots, X_{k-1}$. Each of the k boxes, $X_i$, is subjected to a set of deletion/reduction tests. Surviving $X_i$ boxes are placed onto Q.

The boolean operator, Delete(), takes as input a box, X, and a floating point number, U* (the upper bound on the smallest function value known thus far), and returns TRUE if and only if one of the following tests returns TRUE:

• $\underline{f}(X) > U^*$
• X is strictly feasible (i.e., does not lie on the boundary of the feasible space) and $0 \notin \nabla_i f(X)$ (the gradient) for some $i = 0, \ldots, n-1$
• X is strictly feasible and the Hessian, $\nabla^2 f(X)$, is not positive semi-definite anywhere in X
• interval Newton’s method can eliminate all of X.

These tests are known as the midpoint test, monotonicity test, Hessian test, and Newton test, respectively.

More elaborate versions of SIAGO exist today (e.g., Newton’s method box reduction, unique critical point existence tests) [10,16] but have little effect on the survey in Sect. “Methods” of parallel IA global optimization algorithms¹.

¹ SIAGO efficiency can affect experimental parallel speedup measurements, as noted in Sect. “Superlinear Speedup”.


Formulation


The following two facts about SIAGO (see Algorithm 1) reveal a potential for scalable parallelism.

First, Delete() can be performed independently on different feasible subregions and therefore can be done in parallel. This allows for “massive” data parallelism as sub-boxes can be distributed across all processors.

    U* = ∞; Q.insert(X)                      // initial box
    while true do
        repeat
            if Q.empty then S.print; Halt
            Q.remove(X)
        until lower bound of f(X) ≤ U*       // cut-off test
        if WithinTol(X, ε_x, ε_f, U*) then
            S.insert(X)
        else
            Subdivide(X, X_0, X_1, ..., X_{k-1})
            for i = 0 to k-1 do
                if not Delete(X_i, U*) then
                    U* = min(f(m(X_i)), U*)
                    Q.insert(X_i)
                end
            end
        end
    end

Algorithm 1
SIAGO

Actually, some dependence exists for the midpoint 
test (see U* in Algorithm 1). However, this dependence 
only affects the “sharpness” of this test and not the cor- 
rectness. In practice, newly discovered lower U* val- 
ues are shared among all participating processors via 
broadcasts or shared memory (see Sect. “Parallel Com- 
puter Models”). 

Second, if a feasible region is not deleted (and not 
reduced via interval Newton’s method), the procedure, 
Subdivide(), will divide it into k subregions which to- 
gether entirely cover the whole feasible region. The 
workload has just grown by k. 

Such workload growth makes possible workload 
scalability [14]. This means that the workload can scale 
to match the parallel computing power (i.e. CPU uti- 
lization is optimized). In fact, the workload growth of 
SIAGO is potentially exponential and for high dimen- 
sional problems can overtake the parallel computing 
power and memory resources. 

The exponential workload growth of SIAGO is no 
surprise in that the global optimization problem in 
general is NP-hard (i.e., no polynomial-time algorithm is known, and in the worst case the search for the solution may require exponential time). IA allows one to remove
(or reduce using interval Newton’s method) potentially 
large “chunks” of the search space with the hope of 
pruning/squeezing one’s way to a solution. 
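A compact Python sketch of Algorithm 1 follows. It is simplified under these assumptions:

- The objective and its natural interval extension are hypothetical two-variable examples.
- Delete() applies only the midpoint test.
- Subdivide() bisects the widest coordinate (k = 2).
- The point value $f(m(X_i))$ is computed in ordinary floating point, and no outward rounding is done.

```python
import heapq
import itertools

def isq(lo, hi):
    """Tight interval square of [lo, hi]."""
    if lo <= 0.0 <= hi:
        return 0.0, max(lo * lo, hi * hi)
    return min(lo * lo, hi * hi), max(lo * lo, hi * hi)

def f(x):
    """Hypothetical objective f(x, y) = x**2 + y**2 - 2x; global minimum -1 at (1, 0)."""
    return x[0] ** 2 + x[1] ** 2 - 2.0 * x[0]

def F(box):
    """Natural interval extension of f over box = [(xlo, xhi), (ylo, yhi)]."""
    (xl, xh), (yl, yh) = box
    sx, sy = isq(xl, xh), isq(yl, yh)
    return sx[0] + sy[0] - 2.0 * xh, sx[1] + sy[1] - 2.0 * xl

def siago(box, eps_x=1e-2, eps_f=1e-2):
    """SIAGO-style best-first B&B using only the midpoint (cut-off) test."""
    u_star = float("inf")
    tie = itertools.count()            # tie-breaker so the heap never compares boxes
    Q = [(F(box)[0], next(tie), box)]  # heap keyed on the lower bound of F(X)
    S = []
    while Q:
        lb, _, X = heapq.heappop(Q)
        if lb > u_star:                # cut-off test
            continue
        fx = F(X)
        w = max(hi - lo for lo, hi in X)
        if w < eps_x and fx[1] - fx[0] < eps_f:  # WithinTol
            S.append(X)
            continue
        # Subdivide: bisect the widest coordinate, giving k = 2 boxes.
        i = max(range(len(X)), key=lambda j: X[j][1] - X[j][0])
        lo, hi = X[i]
        mid = 0.5 * (lo + hi)
        for part in ((lo, mid), (mid, hi)):
            Xi = list(X)
            Xi[i] = part
            flo = F(Xi)[0]
            if flo > u_star:           # Delete(): midpoint test only
                continue
            u_star = min(u_star, f([0.5 * (a + b) for a, b in Xi]))
            heapq.heappush(Q, (flo, next(tie), Xi))
    return u_star, S

u_star, S = siago([(-4.0, 10.0), (-4.0, 10.0)])
```

Each call to Subdivide() can double the number of live boxes. This is the workload growth discussed above. The loop over the k children is where a parallel version would distribute boxes.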


Parallel Computer Models 


A parallel version of SIAGO is implementable on two 
basic categories of parallel computers: shared mem- 
ory multiprocessors (including multi-core processors 
common today) and distributed memory multicomput- 
ers. Although multiprocessors are easier to program 
than multicomputers (one doesn’t have to worry about 
communication primitives), multicomputers have been 
the most popular choice for PIAGO for several rea- 
sons. First, multicomputers are more scalable. Second, 
there are freely-available, robust, easy-to-use parallel 
programming language extensions such as Parallel Vir- 
tual Machine (PVM) and the Message Passing Interface 
(MPI). Third, multicomputers are more cost effective 
(e.g. a simple cluster of workstations (COW) with in- 
expensive gigabit Ethernet). Fourth, the massive work- 
load generated by PIAGO implementations on hard 
global optimization problems (which PIAGO algorithms were designed to solve) keeps each processor busy working on a local subregion of the search
space. If an effective workload management scheme is 
adopted (see Sect. “Workload Management”), CPU uti- 
lization will be maximized and communication will not 
be a limiting factor. 


PIAGO 


A generalized distributed memory parallel IA global 
optimization algorithm (PIAGO) has the following 
form: 


    Initialize/start up all processors
    Perform SIAGO in parallel:
        manage workload
        broadcast improved U* values
    Detect global termination state
    Terminate all processors

Algorithm 2
PIAGO


Workload Management 


Workload in PIAGO algorithms is characterized at any 
given time by the set of boxes remaining to be processed
or searched. PIAGO methods distinguish themselves 
primarily in the manner they manage workload (see Al- 
gorithm 2). In SIAGO, workload resides on a single pri- 
ority queue of boxes, Q. In PIAGO, workload can be 
centrally managed on a single “master” node (proces- 
sor), or it can be distributed among all nodes with each 
processor managing its own local Q. Hybrid schemes 
can also be employed consisting of a centrally managed 
global priority search queue on the master node work- 
ing in concert with local search queues on each slave 
node. 


Distributed Management In this scheme, workload 
is distributed either statically to all processors at the 
beginning of the computation (static load balancing) 
or dynamically during computation (dynamic load 
balancing). With dynamic load balancing, processors 
coordinate and redistribute workload during computa- 
tion in order to maximize CPU utilization and mini- 
mize total execution time. Workload state information 
(e.g. local search queue size) is continually (but not 
necessarily frequently) broadcast among all processors. 

Dynamic load balancing is generally scalable. How- 
ever, each processor must communicate (by request, 
event, or at programmed time intervals) workload state 
information in order to make effective workload bal- 
ancing decisions. Too much state information being 
broadcast frequently detracts from box processing and 
may saturate the machine’s bandwidth. Stale informa- 
tion concerning a processor’s state risks poor load bal- 
ancing decisions being made on an inaccurate depiction 
of the current global state. 
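The trade-offs of dynamic load balancing can be explored with a small, sequential, round-based simulation. Everything in it is a modeling assumption:

- Boxes are abstract items in a complete binary tree, and no box is ever deleted.
- Each processor handles one box per round.
- An idle processor takes half of the queue of the processor that most recently reported the most work.
- `broadcast_every` controls how stale that reported state may be.

```python
def simulate_dynamic_balancing(n_procs=4, depth=10, broadcast_every=1):
    """Round-based simulation of receiver-initiated dynamic load balancing.

    Work items are hypothetical boxes labelled by their depth in the B&B tree;
    every box of depth < `depth` splits into two, so the total work is
    2**(depth + 1) - 1 boxes. Idle processors ask the processor with the
    largest *reported* queue for half of its boxes; queue sizes are
    re-broadcast only every `broadcast_every` rounds, so decisions may rely on
    stale information. Returns (rounds, boxes processed by each processor).
    """
    queues = [[] for _ in range(n_procs)]
    queues[0].append(0)                     # the initial box starts on processor 0
    processed = [0] * n_procs
    reported = [len(q) for q in queues]     # last broadcast workload state
    rounds = 0
    while any(queues):
        rounds += 1
        # Load balancing: each idle processor requests work.
        for p in range(n_procs):
            if not queues[p]:
                donor = max(range(n_procs), key=lambda r: reported[r])
                share = len(queues[donor]) // 2
                if donor != p and share > 0:
                    queues[p] = queues[donor][-share:]
                    del queues[donor][-share:]
        # Computation: every busy processor handles one box from its local queue.
        for p in range(n_procs):
            if queues[p]:
                d = queues[p].pop()
                processed[p] += 1
                if d < depth:
                    queues[p].extend([d + 1, d + 1])
        # Workload state broadcast (possibly infrequent).
        if rounds % broadcast_every == 0:
            reported = [len(q) for q in queues]
    return rounds, processed

r1, p1 = simulate_dynamic_balancing(n_procs=1)
r4, p4 = simulate_dynamic_balancing(n_procs=4)
```

With one processor the simulation needs 2,047 rounds for depth 10. With four processors, the same 2,047 boxes are processed in fewer rounds. Increasing `broadcast_every` shows the cost of decisions made on stale state information.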


Centralized Management In this scheme (some- 
times called master/slave), one master node is respon- 
sible for managing (scheduling) the workload. Slave 
(worker) nodes request work (or are “pushed” work) 
from the master. The master node is responsible for 
scheduling the workload in a way that maximizes CPU 
utilization and (hopefully) minimizes total execution 
time. 

One advantage of centralized control is that workload can be prioritized globally (e.g., boxes X ordered on a priority queue based on minimum f(X)). Global termination detection is easy: the computation is done
when the master node has no more workload and all 
slave processors are idle. Load balancing is achieved 
through effective scheduling. 
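A minimal sketch of centralized (master/slave) management is given below. It uses Python threads purely to illustrate the control flow. A real PIAGO implementation would use processes on separate nodes (e.g. via MPI), and in CPython, threads give no CPU speedup for this kind of code. The one-dimensional objective, its interval extension, and all names are hypothetical.

The master keeps the only priority queue and U*. It hands the globally best boxes to idle workers together with the current U*. It detects termination exactly as described above: the queue is empty and no worker is busy.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def f(x):
    """Hypothetical objective f(x) = x**2 - 2x; global minimum -1 at x = 1."""
    return x * x - 2.0 * x

def F(lo, hi):
    """Natural interval extension of f over [lo, hi]."""
    sq_lo = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    sq_hi = max(lo * lo, hi * hi)
    return sq_lo - 2.0 * hi, sq_hi - 2.0 * lo

def worker(box, u_star, eps):
    """Slave task: process one box with the U* value the master sent along."""
    lo, hi = box
    mid = 0.5 * (lo + hi)
    if hi - lo < eps:
        return [], [box], f(mid)          # small enough: report as a solution box
    kept, best = [], f(mid)
    for child in ((lo, mid), (mid, hi)):
        lb = F(*child)[0]
        if lb <= u_star:                  # cut-off test with a possibly stale U*
            kept.append((lb, child))
            best = min(best, f(0.5 * (child[0] + child[1])))
    return kept, [], best

def master(box, n_workers=4, eps=1e-3):
    """Master node: owns the global priority queue and U*."""
    u_star = f(0.5 * (box[0] + box[1]))
    tie = itertools.count()
    heap = [(F(*box)[0], next(tie), box)]
    solutions, pending = [], set()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while heap or pending:
            # Scheduling: give the globally best boxes to idle workers.
            while heap and len(pending) < n_workers:
                lb, _, X = heapq.heappop(heap)
                if lb <= u_star:
                    pending.add(pool.submit(worker, X, u_star, eps))
            if not pending:
                continue
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                kept, sols, best = fut.result()
                u_star = min(u_star, best)  # new U* reaches workers on next dispatch
                solutions.extend(sols)
                for lb, child in kept:
                    heapq.heappush(heap, (lb, next(tie), child))
    # Termination: the queue is empty and every worker is idle.
    return u_star, [b for b in solutions if F(*b)[0] <= u_star]

u_best, sol_boxes = master((-4.0, 10.0))
```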

Centralized workload management is not scalable, 
in the theoretical sense. In practice, a centralized 
scheme is successful provided the master node does not 
become a “bottleneck”. Communication between one 
master node and a large set of worker nodes can be- 
come intensive and exhaust the communication band- 
width of the parallel machine. Moreover, memory and 
CPU resources on one processor are limited (relative to 
the total CPU power and memory of the parallel ma- 
chine) and can easily become saturated if stressed with 
too much workload or communication. 


Hybrid Workload Management Hybrid schemes allow each worker processor to manage its own local Q while still maintaining a master process responsible for handling work requests from idle processors. The benefits of the hybrid approach over the pure centralized approach are two-fold:

• fewer requests for work (to the master) are required, since each worker must first complete its local workload (including self-generated workload resulting from box splitting) before it becomes idle

• the potential memory bottleneck at the master is mitigated, since the local memory resources on each worker are utilized.

The main disadvantage of the hybrid approach versus the centralized approach is the sacrifice of total (global) workload ordering. The master node “running out of boxes” and a worker process generating too much work to be held in local memory are two other issues that need to be addressed.

The main advantages of the hybrid approach compared to the distributed approach are two-fold:

• a better approximation to a total workload ordering

• fewer retransmissions of work requests, as the master node is (usually) guaranteed to have boxes.

Because the hybrid approach still uses a master node for scheduling workload, this method inherits the scalability weakness of its centralized parent.


Load Balancing 


One necessary condition for load balancing is ensuring that no worker processor sits idle. A second goal of load balancing in PIAGO algorithms is the distribution of “quality” boxes among the worker processors. A quality box is defined as a box more likely to contain a minimizer (or near minimizer). It is natural to expect that global minima will be discovered more quickly if participating processors focus their efforts on subregions of the workspace that are more likely to contain minimizers. Early improvements to the SIAGO algorithm recognized this fact and (efficiently) sorted boxes, X, on increasing f(X) using a priority search queue, Q.
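A priority search queue of this kind can be sketched with Python's heapq module. This is a minimal illustration, not code from any of the cited implementations: a box is represented (by assumption) as a tuple of coordinate intervals, the caller supplies the lower bound f(X) used as the priority, and the class name BestFirstQueue is hypothetical.

```python
import heapq
import itertools


class BestFirstQueue:
    """Priority search queue Q ordered on increasing lower bound f(X)."""

    def __init__(self):
        self._heap = []
        # A running counter breaks ties so boxes themselves are never compared.
        self._counter = itertools.count()

    def push(self, box, lower_bound):
        heapq.heappush(self._heap, (lower_bound, next(self._counter), box))

    def pop(self):
        """Remove and return (lower_bound, box) for the box with the lowest f(X)."""
        lower_bound, _, box = heapq.heappop(self._heap)
        return lower_bound, box

    def __len__(self):
        return len(self._heap)


q = BestFirstQueue()
q.push(((0.0, 1.0), (0.0, 1.0)), 2.5)
q.push(((1.0, 2.0), (0.0, 1.0)), -1.0)
q.push(((2.0, 3.0), (0.0, 1.0)), 0.7)
print(q.pop())  # (-1.0, ((1.0, 2.0), (0.0, 1.0)))
```

A binary heap gives O(log n) insertion and removal, which is why it is a common choice for best-first ordering.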


Superlinear Speedup 


Speedup is defined as $S_p = T_1/T_p$, where $T_1$ is the sequential execution time (e.g. SIAGO on one processor) and $T_p$ is the parallel execution time (e.g. PIAGO on $p$ processors). Theoretically, superlinear speedup (i.e. $S_p > p$) of an efficient algorithm is not possible [6]. In practice, however, superlinear speedup has often been reported for B&B algorithms in general and PIAGO algorithms in particular [2,5,7,13,15,18,20,25,26].
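The definition is simple arithmetic, but a tiny helper makes the superlinear criterion explicit. The timings below are hypothetical and are not measurements from the cited papers; dividing by $p$ gives the parallel efficiency, which exceeds 1 exactly when $S_p > p$.

```python
def speedup(t_seq, t_par):
    """S_p = T_1 / T_p."""
    return t_seq / t_par


def efficiency(t_seq, t_par, p):
    """S_p / p; a value above 1 means superlinear speedup (S_p > p)."""
    return speedup(t_seq, t_par) / p


# Hypothetical timings: 320 s on one processor, 8 s on 32 processors.
t1, t32 = 320.0, 8.0
print(speedup(t1, t32), efficiency(t1, t32, 32))  # 40.0 1.25
```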

One reason why superlinear speedup may be achieved in practice is that the sequential algorithm may be inefficient. Some of the earliest PIAGO implementations reported large superlinear speedups. For example, a superlinear speedup of 170 is reported on 32 nodes in [25]. Using a priority search queue ordered on lowest f(X) [9], Leclerc [18] reports only sublinear speedup of approximately 1/2 for the same problem.

In [2] a theorem is presented that “clearly indicates that no substantial superlinear speedup is possible, assuming that the best-first strategy is used”. Here, best-first strategy refers to the same lowest f(X) ordering of boxes on the search queue used by Leclerc. Note that the theorem does not claim that the best-first strategy is the best strategy to use. It only claims that if the best-first strategy is used for both the sequential version and the parallel version, then superlinear speedup is not expected.

In fact, most of the superlinear speedups that have been reported recently are just slightly above linear. This can be explained by one or more of the following factors:

• high memory utilization in the sequential case may result in poor caching and possibly paging, thus extending execution time

• non-deterministic timing anomalies (race conditions) that occur in parallel executions may not have been “smoothed out” by averaging the results of many execution runs

• the partial breadth-first search that parallelization introduces into the computation may indeed accelerate finding global solutions for some problems.


Methods 


Following is a survey of PIAGO methods that have evolved over the past 15 years. Performance comparisons of the various methods based on execution times are difficult to make. For example, differences in implementation hardware, in the problems being solved, and in the IA software used will affect execution times.

Instead, most articles report speedup as a measure of the efficiency of the parallel algorithm. However, speedup also depends on several factors, including box ordering on the search queue, memory utilization, non-deterministic parallel “race condition” effects, implementation hardware, and the specific problem being solved (see Sect. “Superlinear Speedup”). For these reasons, no effort is made to compare the various algorithms with regard to reported efficiency.

Various acceleration techniques or general im- 
provements to SIAGO are not considered. It is assumed 
that such improvements would benefit most if not all of 
the methods surveyed. 

Finally, no discussion of global termination detec- 
tion is made. Although this is an interesting topic [28], 
the methods (both centralized and distributed) are few, 
well analyzed, and not affected by the particular na- 
ture of B&B IA global optimization algorithms. More- 
over, the contribution of global termination detection 
to the total execution time for hard problems is negligi- 
ble. 

The key component differentiating the various 
PIAGO algorithms is workload management (see 
Sect. “Workload Management”). Each considered PI- 
AGO algorithm is categorized into one of distributed, 
centralized, or hybrid categories. A discussion of the 
workload management scheme is given along with rel- 
evant comments concerning scalability, code complex- 
ity, and communication costs. 


Distributed Approaches 


As mentioned in “Distributed Management”, distributed workload approaches are generally scalable. Asynchronous non-blocking communication is more efficient, but also more difficult to program. By either interleaving message probing (e.g. MPI_Iprobe) within the main computation loop (see Algorithm 1) or dedicating a separate thread to the task of receiving messages, one can use efficient non-blocking communication in the approaches that follow. No further discussion of synchronous versus asynchronous communication is made.

Let $P_0, P_1, \ldots, P_{m-1}$ represent $m$ processors on a parallel machine. Let $W_0, W_1, \ldots, W_{m-1}$ represent recorded workload state information for each processor. A given processor can query the (approximate) current workload queue size or minimum f(X) on processor $j$ using $W_j$.Qsize or $W_j$.Qlbf (the lower bound on the function over all boxes in the queue), respectively.


The Leclerc Approach This approach [18,19] is fully distributed and utilizes the best-first queuing strategy. It uses the load balancing procedure listed in Procedure loadbalance, with the function WorkloadBalanced returning FALSE when the processor’s Q is empty (i.e. no work). This is a simple demand-driven load balancing scheme. The lowest f(X) value for boxes on each processor’s local Q is broadcast at regular intervals to all processors and recorded in W.


// Load balance on processor P_i
E = {i}
if not WorkloadBalanced(Q, W) then
    repeat
        P_b = processor with lowest W_j.Qlbf, j ∉ E
        Request fraction of boxes from P_b
        if no boxes received then
            E = E ∪ {b}
        end
    until boxes received
end

Procedure loadbalance(i)
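A runnable version of Procedure loadbalance can be sketched by simulating all processors in one Python process. Assumptions of this sketch: each local Q is a sorted list of box lower bounds, W maps each processor j to its last broadcast $W_j$.Qlbf (possibly stale), and request_boxes is a hypothetical stand-in for the real message exchange that hands over every second box. Unlike the pseudocode, the sketch gives up once every other processor has been tried, so it cannot loop forever when all processors are idle.

```python
import math


def request_boxes(queues, donor):
    """Hypothetical donor side: give away every second box of the donor's Q.

    Alternating keeps good boxes on both sides; a donor holding a single
    box gives nothing, so the requester treats it like an idle processor.
    """
    q = queues[donor]
    given, kept = q[1::2], q[0::2]
    queues[donor] = kept
    return given


def loadbalance(i, queues, W):
    """Demand-driven load balancing for processor i.

    queues[j]: processor j's local Q as a sorted list of box lower bounds.
    W[j]: last broadcast value of W_j.Qlbf (may be out of date).
    Returns True if processor i has work afterwards.
    """
    if queues[i]:                        # WorkloadBalanced: local Q not empty
        return True
    excluded = {i}                       # E = {i}
    while True:
        candidates = [j for j in W if j not in excluded]
        if not candidates:               # every other processor tried: give up
            return False
        b = min(candidates, key=lambda j: W[j])   # lowest recorded Qlbf
        boxes = request_boxes(queues, b)
        if boxes:
            queues[i] = sorted(queues[i] + boxes)
            return True
        excluded.add(b)                  # E = E ∪ {b}


queues = {0: [], 1: [5.0], 2: [1.0, 2.0, 3.0, 4.0]}
W = {0: math.inf, 1: 0.5, 2: 1.0}        # W[1] is stale: P_1 now holds one box
print(loadbalance(0, queues, W), queues)
# True {0: [2.0, 4.0], 1: [5.0], 2: [1.0, 3.0]}
```

The stale entry for processor 1 triggers one wasted request before processor 2 supplies work, which is exactly the retransmission cost discussed in the Conclusions.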


The Hu, Kearfott, Xu, and Yang Approach This approach [13] is similar to the one used by Leclerc, but with an initial static assignment of one box to each processor on startup. One box is requested, instead of a fraction of boxes, when a processor becomes idle. From the paper it is unclear to which processor(s) a request for workload is made. It is also unclear whether workload state information, W, is maintained.


The Caprani and Madsen Approach This simple, yet promising approach [3] uses static load balancing rather than dynamic load balancing. First, a “good” U* is computed on one processor. Next, a “sufficient number” (e.g. 10m) of sub-boxes are generated using SIAGO and placed into m sets of “approximately equal difficulty”. The m sets, along with U*, are statically distributed onto m processors. SIAGO is then performed on each processor with no communication.
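The description does not say how difficulty is estimated or how the sets are formed. One plausible sketch, assuming a numeric difficulty estimate is available for each sub-box (how to obtain it is left open here), is the greedy longest-processing-time heuristic: sort sub-boxes by decreasing difficulty and always add the next one to the currently lightest set. This is a swapped-in technique for illustration, not necessarily the method of [3].

```python
import heapq


def partition_equal_difficulty(difficulties, m):
    """Split sub-boxes into m sets of roughly equal total difficulty.

    difficulties[k] is a (hypothetical) difficulty estimate for sub-box k.
    Returns m lists of sub-box indices.
    """
    sets = [[] for _ in range(m)]
    heap = [(0.0, s) for s in range(m)]          # (total difficulty, set index)
    order = sorted(range(len(difficulties)), key=lambda k: -difficulties[k])
    for k in order:                              # hardest sub-boxes first
        total, s = heapq.heappop(heap)           # currently lightest set
        sets[s].append(k)
        heapq.heappush(heap, (total + difficulties[k], s))
    return sets


print(partition_equal_difficulty([5, 4, 3, 3, 3], 2))  # [[0, 3], [1, 2, 4]]
```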


The Eriksson and Lindström Approach Here [5], load balancing is considered on a specialized parallel computer, an Intel iPSC/2 hypercube. No workload state information, W, is maintained. In order to load balance qualitatively as well as quantitatively, a hybrid of two load balancing strategies is used: receiver-initiated and sender-initiated.

The receiver-initiated load balancer is conceptually similar to Procedure loadbalance. But, rather than a selection based on min($W_j$.Qlbf), an unprioritized linear search (for a non-idle node) along a ring is performed. This ensures that no processor stays idle for very long.

The sender-initiated load balancer seeks to balance qualitatively. Here, the “best” box on the Q (i.e. the one with the lowest f(X)) is “pushed” to a random processor each time G boxes have been split. The push interval G on a particular processor is decremented by one (so pushes become more frequent) when the pushed box is placed at the front of the Q of the randomly selected processor; otherwise, G is incremented by one. The net effect is that if truly “good” boxes are being pushed, they will continue to be pushed at a high frequency; otherwise, pushes will occur less often.
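The adaptive push interval can be captured in a few lines. In this sketch the starting value of G, its lower limit of 1, and the names SenderInitiatedPusher and pick_target are assumptions; the receiver's report on where the pushed box landed is modeled as a boolean passed to feedback.

```python
import random


class SenderInitiatedPusher:
    """Adaptive push interval G for one processor (start value and floor assumed)."""

    def __init__(self, g_initial=10, g_min=1):
        self.g = g_initial
        self.g_min = g_min
        self.splits_since_push = 0

    def after_split(self):
        """Call after each box split; True means: push the best box of Q now."""
        self.splits_since_push += 1
        if self.splits_since_push >= self.g:
            self.splits_since_push = 0
            return True
        return False

    def feedback(self, landed_at_front):
        """Receiver reports whether the pushed box went to the front of its Q."""
        if landed_at_front:
            self.g = max(self.g_min, self.g - 1)   # good box: push more often
        else:
            self.g += 1                            # not better: push less often


def pick_target(me, m, rng=random):
    """Uniformly random receiver among the m processors, excluding `me`."""
    target = rng.randrange(m - 1)
    return target if target < me else target + 1
```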


The Gau and Stadtherr Approach Here [7] two fundamental algorithms are proposed. The first is the synchronous work stealing (SWS) approach. This approach is very similar to the approaches by Hu, Kearfott, Xu, and Yang, and by Leclerc. The difference is that the largest Q length is used instead of the lowest f(X).


Next, an asynchronous diffusive load balancing 
(ADLB) scheme is proposed. A group of “nearest neigh- 
bors” is defined. Neighbors exchange workload infor- 
mation. Then, boxes are either “pushed” or “pulled” 
to/from neighbors depending on workload distribution 
inequities as determined by each processor. The mech- 
anism is analogous to heat or mass diffusion. 

In theory this approach should be able to handle 
qualitative issues regarding workload. However, this is 
not considered in the paper. 
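The diffusion analogy can be made concrete with a synchronous toy model. This is an assumption-laden sketch, not the ADLB scheme of [7]: processors sit on a ring (so each has two nearest neighbors), only queue lengths are exchanged, and in each sweep a processor sends a fixed fraction alpha of its load difference to every less-loaded neighbor. The real scheme is asynchronous; the name diffusion_step is hypothetical.

```python
def diffusion_step(loads, alpha=0.25):
    """One synchronous diffusion sweep on a ring of m >= 3 processors.

    loads[j] is the local Q length of processor j. Each processor pushes
    int(alpha * difference) boxes to each ring neighbor holding fewer boxes;
    alpha <= 0.5 keeps every load non-negative.
    """
    m = len(loads)
    delta = [0] * m
    for j in range(m):
        for nb in ((j - 1) % m, (j + 1) % m):
            diff = loads[j] - loads[nb]
            if diff > 0:
                k = int(alpha * diff)
                delta[j] -= k
                delta[nb] += k
    return [load + d for load, d in zip(loads, delta)]


loads = [40, 0, 0, 0, 0, 0]
for sweep in range(3):
    loads = diffusion_step(loads)
    print(sweep, loads)
```

Boxes are only moved, never created, so the total workload is conserved while the load spreads out from the busy processor, much like heat.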


The Martinez, Casado, Alvarez, and Garcia Approach This recent approach [20] is most novel for its implementation language, Charm++. The execution model of Charm++ is message-driven (i.e. the arrival of messages “triggers” associated computations). This model is similar to a data flow machine.

Essentially, a process (chare) runs on each processor. This process responds to (is triggered by) messages to either process a box, Process-Box, or update U*, update-U*. A Process-Box message can either:

• reject the box with no messages generated

• subdivide the box, generating two Process-Box messages sent to two random processors

• send a message to the main chare to enqueue a new solution.

Messages can be prioritized so that update-U* mes- 
sages take precedence over Process-Box messages. This 
should help improve the efficiency of the parallel algo- 
rithm. Also, Process-Box messages can be prioritized on 
lowest f(X) in order to load balance qualitatively. 

Data flow solutions are truly elegant. Load balancing quantitatively and qualitatively is achieved via randomness and built-in message prioritization.
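The effect of message prioritization can be imitated with an ordinary priority queue. The sketch below is plain Python, not the Charm++ API; the message kinds and the class name PriorityMailbox are hypothetical. update-U* messages always come first, and Process-Box messages are ordered on the box lower bound f(X).

```python
import heapq
import itertools

UPDATE_U, PROCESS_BOX = 0, 1        # smaller value = delivered first


class PriorityMailbox:
    """Prioritized message queue in the spirit of message-driven execution."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # FIFO among messages of equal priority

    def send(self, kind, payload, lower_bound=0.0):
        key = (kind, lower_bound)       # update-U* first, then lowest f(X)
        heapq.heappush(self._heap, (key, next(self._seq), kind, payload))

    def deliver(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


mb = PriorityMailbox()
mb.send(PROCESS_BOX, "box A", lower_bound=3.0)
mb.send(PROCESS_BOX, "box B", lower_bound=1.0)
mb.send(UPDATE_U, 2.5)              # new upper bound U* = 2.5
while mb:
    print(mb.deliver())
```

Because the U* update is delivered before box A (f(X) = 3.0), a handler could reject box A immediately instead of subdividing it.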


Centralized Approaches 


The Henriksen and Madsen Approach An early im- 
plementation of a PIAGO algorithm using a cen- 
tralized workload manager is that of Henriksen and 
Madsen [11]. A master node maintains the priority 
workload queue, Q, and schedules work to each slave 
processor. When a slave node splits a box, it keeps only 
one box and sends the remainder back to the master, to 
be inserted into Q. U* is also maintained at the master. 
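The master/slave organization can be illustrated with a sequential simulation. Everything here is an assumption made for the sketch: a one-dimensional toy objective f(x) = (x − 1)² whose exact range serves as the inclusion function, midpoint values f(mid) as updates of U*, and workers that take turns instead of running concurrently. As in the description above, each worker keeps one half of a split box and returns the other half to the master’s priority queue Q.

```python
import heapq
import itertools


def f_range(a, b):
    """Exact range of the toy objective f(x) = (x - 1)**2 on [a, b]."""
    fa, fb = (a - 1.0) ** 2, (b - 1.0) ** 2
    lower = 0.0 if a <= 1.0 <= b else min(fa, fb)
    return lower, max(fa, fb)


def centralized_bnb(a, b, n_workers=4, eps=1e-3):
    """Simulated master/slave interval B&B on [a, b] for the toy objective.

    The master owns the priority queue Q and U*; each simulated worker
    bisects its box, keeps the better half and sends the rest to the master.
    """
    seq = itertools.count()
    Q = [(f_range(a, b)[0], next(seq), (a, b))]
    u_star = f_range(a, b)[1]
    held = [None] * n_workers                 # box currently held by each worker
    results = []
    while Q or any(box is not None for box in held):
        # Master: hand the best boxes in Q to idle workers (discard hopeless ones).
        for w in range(n_workers):
            if held[w] is None and Q:
                lb, _, box = heapq.heappop(Q)
                if lb <= u_star:
                    held[w] = box
        # Workers: one bisection step each.
        for w in range(n_workers):
            if held[w] is None:
                continue
            lo, hi = held[w]
            if hi - lo < eps:
                results.append(held[w])
                held[w] = None
                continue
            mid = 0.5 * (lo + hi)
            u_star = min(u_star, (mid - 1.0) ** 2)     # f(mid) bounds the minimum
            halves = sorted((f_range(*h)[0], h) for h in ((lo, mid), (mid, hi)))
            halves = [(lb, h) for lb, h in halves if lb <= u_star]
            held[w] = halves[0][1] if halves else None  # keep the better half
            for lb, h in halves[1:]:
                heapq.heappush(Q, (lb, next(seq), h))   # send the rest to the master
    return sorted(r for r in results if f_range(*r)[0] <= u_star), u_star


print(centralized_bnb(-4.0, 4.0))
```

Every box passes through the single list Q, which gives total ordering but also shows why the master becomes a hotspot as the number of workers grows.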

The algorithm is load balanced (both quantitatively and qualitatively) and has the advantage of total ordering of boxes. However, its weakness is poor scalability. The master quickly becomes a memory bottleneck and communication “hotspot” on parallel machines with 32 or more nodes [2].

To be fair, however, such an algorithm is better 
suited to shared memory multiprocessors, and in par- 
ticular, multi-core processors (e.g. AMD Opteron, In- 
tel Core 2 Duo). Though multi-core processors don’t 
offer as great an opportunity for massive parallelism 
(usually 16 cores or less on a processor), they are ubiq- 
uitous today and inexpensive. Therefore, one can envi- 
sion Henriksen and Madsen’s approach being used on 
a distributed memory multicomputer in which the in- 
dividual processors are multi-core. A more scalable al- 
gorithm would be used on the multicomputer architec- 
ture as a whole, but the centralized approach could be 
used as a multithreaded PIAGO application running on 
each multi-core processor. 

The advantage of this hierarchical workload man- 
agement approach is a better approximation to the best- 
first strategy. In addition, more efficiency would be ob- 
tained with the centralized implementation on each 
multi-core processor, since shared memory is faster 
than message passing. The main disadvantage would be 
code complexity. 


Hybrid Approaches 


As was mentioned in Sect. “Centralized Approaches”, 
pure centralized approaches, though offering total or- 
dering of the workload Q, are not scalable. Hybrid ap- 
proaches are theoretically not scalable either. However, 
some of the scalability issues are mitigated by leverag- 
ing local memory on worker processors. Three hybrid 
approaches are considered. 


The Berner Approach Here [2], a master node han- 
dles requests from idle processors. A dynamically ad- 
justed variable, max, is used to “throttle” the workload 
on the worker processors as well as help ensure the mas- 
ter does not run out of work. Processors with more than 
max boxes on the local Q will send “some of them” to 
the master. 
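The throttle can be sketched on the worker side as follows. The article does not say which boxes are sent or how max is adjusted, so both are assumptions here: this sketch sends the boxes with the largest lower bounds (an arbitrary choice), and adjust_max is a hypothetical rule that lowers max when the master is short of work and raises it when the master holds plenty.

```python
def throttle(local_q, max_boxes):
    """Worker side: keep at most max_boxes boxes, return the excess for the master.

    local_q holds box lower bounds; sending the boxes with the largest lower
    bounds is an arbitrary choice made for this sketch.
    """
    local_q.sort()
    excess = local_q[max_boxes:]
    del local_q[max_boxes:]
    return excess


def adjust_max(max_boxes, master_q_len, low=4, high=64):
    """Hypothetical rule: lower max when the master is short of work, raise it when flush."""
    if master_q_len < low:
        return max(1, max_boxes // 2)
    if master_q_len > high:
        return 2 * max_boxes
    return max_boxes


q = [5.0, 1.0, 3.0, 2.0, 4.0]
print(throttle(q, 3), q)  # [4.0, 5.0] [1.0, 2.0, 3.0]
```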


The Ibraev Approach This approach [15] is a variation on the Berner approach, with the master (leader) node continually “floating” to the processor that discovers a better f(X). Workers discovering a possibly lower f(X) “challenge” the current leader. The current leader makes a determination as to the next leader and broadcasts the index of the new leader along with the improved f(X) to all processors.

In this approach, no effort is made to approximate 
a totally ordered global Q. Rather, the approach seeks 
only to ensure that work requests are made to the pro- 
cessor with the best quality boxes. 


The Tapamo and Frommer Approach Tapamo and 
Frommer [26] propose a variation of the Berner ap- 
proach which allows non-idle processors to serve re- 
quests. The master node keeps track of the lengths of 
each processor’s local Q. When one or more processors 
become idle, the master then instructs non-idle proces- 
sors (in decreasing order of Q length) to concurrently 
satisfy requests from idle processors. 
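The master’s matching step can be sketched as a pairing of idle processors with donors taken in decreasing order of reported Q length. The threshold min_to_share (a donor must be able to spare a box) and the function name are assumptions of this sketch; idle processors left without a donor simply wait for the next round.

```python
def assign_donors(q_lengths, min_to_share=2):
    """Master side: pair idle processors with donors in decreasing order of Q length.

    q_lengths maps processor index -> last reported local Q length.
    min_to_share (an assumption) is the smallest Q that can spare a box.
    Idle processors left without a donor simply wait.
    """
    idle = sorted(p for p, n in q_lengths.items() if n == 0)
    donors = sorted((p for p, n in q_lengths.items() if n >= min_to_share),
                    key=lambda p: -q_lengths[p])
    return list(zip(idle, donors))


print(assign_donors({0: 0, 1: 7, 2: 0, 3: 12, 4: 1, 5: 0}))  # [(0, 3), (2, 1)]
```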

Workload state information (i.e. local Q sizes) must be sent to the master at some frequency. The same issues regarding this frequency are present in the various distributed approaches (see Sect. “Distributed Management”). Delay is introduced because requests have to “pass” through the master node.


Conclusions 


The pure centralized workload management scheme is clearly impractical to implement on large distributed memory multicomputers due to issues of scalability. Fully distributed algorithms are scalable, but some would question their efficiency based on concerns that the following phenomena may significantly impact performance:

• frequent broadcasting of workload state
• repeated retransmissions of work requests due to an idle $P_b$ in Procedure loadbalance
• a global best-fit exploration of boxes is not being performed (i.e. perhaps the best quality boxes are not being evenly distributed).

Hybrid methods were apparently developed to resolve one or more of the perceived deficiencies of distributed methods and the scalability problem of the pure centralized method. Though hybrid methods have reduced bottleneck potential, they still suffer from poor scalability.
A closer examination of the apparent deficiencies of the distributed methods is worthwhile. Efficient (up to practically constant-time) broadcast primitives have been implemented [27,12]. Thus it would seem that frequent broadcasting of workload state may not significantly affect performance. Moreover, the frequency of broadcast can easily be throttled if required.

A good estimate of the workload state on each pro- 
cessor for large problems is reasonable to expect. Thus, 
a high probability exists that the first or possibly second 
request will fall on a non-idle processor with “good” 
work. Retransmissions may in fact be few. 

Finally, a global best-fit exploration of boxes is not 
being performed using distributed schemes. However, 
such a totally ordered exploration is not being done 
using any of the hybrid methods either. An argument 
claiming hybrid methods yield better approximations 
to a global ordering is difficult to make. 

A complete and fair assessment of the various PIAGO algorithms (in particular distributed methods versus hybrid methods) should cover a wide range of difficult global optimization test problems. The same efficient SIAGO algorithm (e.g. using best-first ordering) should be used in each, and a common hardware platform should be utilized. Furthermore, each test case should be run multiple times and the results averaged in order to “smooth out” non-deterministic parallel computation effects. To date no such comprehensive analysis has been performed.
The selection of the subdivision direction is one of the points where the efficiency of the basic ► branch-and-bound algorithm for unconstrained global optimization can be improved (see ► Interval analysis: unconstrained and constrained optimization). The traditional approach is to choose the direction for subdivision in which the actual box has the largest width. If the inclusion function $\Phi(\mathbf{x})$ is the only available information about the problem

$$\min \varphi(x),$$

then it is usually the best possible choice. If, however, other information, such as the inclusion of the gradient ($\nabla\Phi$) or even the inclusion of the Hessian ($H$), is calculated, then a better decision can be made.


Subdivision Directions 


All the rules select a direction with a merit function:

$$k = \arg\max_{i=1,\ldots,n} D(i), \qquad (1)$$

where $D(i)$ is determined by the given rule. If several such optimal indices $k$ exist, then the algorithm can choose the smallest one, or it can select an optimal direction randomly.


Rule A. The first rule was the interval-width oriented rule. This rule chooses the coordinate direction with

$$D(i) := w(\mathbf{x}_i). \qquad (2)$$

This rule is justified by the idea that, if the original interval is subdivided in a uniform way, then the width of the actual subintervals goes to zero most rapidly.

The algorithm with Rule A is convergent both with 
and without the monotonicity test [8]. This rule allows 
a relatively simple analysis of the convergence speed (as 
in [8], Chapter 3, Theorem 6). 


Rule B. E. Hansen described another rule (initiated by G. W. Walster [5]). The direct aim of this heuristic direction selection rule is to find the component for which

$$W_i = \max_{t \in \mathbf{x}_i} \varphi(m_1,\ldots,m_{i-1},t,m_{i+1},\ldots,m_n) - \min_{t \in \mathbf{x}_i} \varphi(m_1,\ldots,m_{i-1},t,m_{i+1},\ldots,m_n)$$

is the largest (where $m_i = (\underline{x}_i + \overline{x}_i)/2$ is the midpoint of the interval $\mathbf{x}_i$). The factor $W_i$, which should reflect how much $\varphi$ varies as $x_i$ varies over $\mathbf{x}_i$, is then approximated by $w(\nabla\Phi_i(\mathbf{x}))\,w(\mathbf{x}_i)$ (where $\nabla\Phi_i(\mathbf{x})$ denotes the $i$th component of $\nabla\Phi(\mathbf{x})$). The latter is not an upper bound for $W_i$ (cf. [5] page 131 and Example 2 in Section 3 of [4]), yet it can be useful as a merit function.


Rule B selects the coordinate direction for which (1) holds with

$$D(i) := w(\nabla\Phi_i(\mathbf{x}))\, w(\mathbf{x}_i). \qquad (3)$$


It should be noted that the basic bisection algorithm 
represents only one way in which Rule B was applied 
in [5]. There the subdivision was, e. g., also carried out 
for many directions in a single iteration step. 


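Rule C. The next rule was defined by Ratz [9]. The underlying idea was to minimize the width of the inclusion:

$$w(\Phi(\mathbf{x})) = w(\Phi(\mathbf{x}) - \varphi(m(\mathbf{x}))) \approx w(\nabla\Phi(\mathbf{x})(\mathbf{x} - m(\mathbf{x}))) = \sum_{i=1}^{n} w(\nabla\Phi_i(\mathbf{x})(\mathbf{x}_i - m(\mathbf{x}_i))).$$

Obviously, that component is to be chosen for which the term $w(\nabla\Phi_i(\mathbf{x})(\mathbf{x}_i - m(\mathbf{x}_i)))$ is the largest. Thus, Rule C can also be formulated with (1) and

$$D(i) := w(\nabla\Phi_i(\mathbf{x})(\mathbf{x}_i - m(\mathbf{x}_i))). \qquad (4)$$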


The important difference between (3) and (4) is that in Rule C the width of the multiplied intervals is maximized, not the multiplied widths of the respective intervals (and these are in general not equal). After a short calculation, the right-hand side of (4) can be written as $\max\{|\min \nabla\Phi_i(\mathbf{x})|, |\max \nabla\Phi_i(\mathbf{x})|\}\, w(\mathbf{x}_i)$. This corresponds to the maximum smear defined by R.B. Kearfott (used as a direction selection merit function for solving systems of nonlinear equations [6,7]) for the case $\varphi: \mathbb{R}^n \to \mathbb{R}$. It is easy to see that Rules B and C give the same merit function value (for $w(\mathbf{x}_i) > 0$) if and only if either $\min \nabla\Phi_i(\mathbf{x}) = 0$ or $\max \nabla\Phi_i(\mathbf{x}) = 0$.
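The three merit functions are easy to compare on a small example. The sketch assumes boxes and gradient enclosures are given as lists of (lower, upper) pairs, ignores outward rounding, and indexes directions from 0; the gradient enclosure in the example is made up for illustration. The helper imul (interval product) is included so the closed form of Rule C can be checked against the definition (4). Note that Rules B and C pick different directions here.

```python
def width(iv):
    lo, hi = iv
    return hi - lo


def imul(a, b):
    """Interval product [a] * [b]."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def rule_a(x):
    return [width(xi) for xi in x]                                   # (2)


def rule_b(x, grad):
    return [width(gi) * width(xi) for xi, gi in zip(x, grad)]        # (3)


def rule_c(x, grad):
    # closed form of (4): max(|lower(G_i)|, |upper(G_i)|) * w(x_i)
    return [max(abs(gi[0]), abs(gi[1])) * width(xi) for xi, gi in zip(x, grad)]


def select_direction(merits):
    """Equation (1), 0-based; ties go to the smallest index."""
    return merits.index(max(merits))


x = [(0.0, 4.0), (0.0, 2.0)]           # box
grad = [(2.0, 3.0), (-1.5, 1.5)]       # assumed enclosure of the gradient on x
for rule in (rule_a, rule_b, rule_c):
    merits = rule(x) if rule is rule_a else rule(x, grad)
    print(rule.__name__, merits, select_direction(merits))
# rule_a [4.0, 2.0] 0
# rule_b [4.0, 6.0] 1
# rule_c [12.0, 3.0] 0
```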


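Rule D. The fourth rule, Rule D, is derivative-free like Rule A, and reflects the machine representation of the inclusion function $\Phi(\mathbf{x})$ (see [5]). It is again defined by (1) and by

$$D(i) := \begin{cases} w(\mathbf{x}_i) & \text{if } 0 \in \mathbf{x}_i, \\ w(\mathbf{x}_i)/\langle \mathbf{x}_i \rangle & \text{otherwise}, \end{cases} \qquad (5)$$

where $\langle \mathbf{x} \rangle$ is the mignitude of the interval $\mathbf{x}$: $\langle \mathbf{x} \rangle := \min_{x \in \mathbf{x}} |x|$.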


This rule may decrease the excess width $w(\Phi(\mathbf{x})) - w(\varphi(\mathbf{x}))$ of the inclusion function (where $\varphi(\mathbf{x})$ denotes the range of $\varphi$ on $\mathbf{x}$) that is caused in part by the floating point computer representation of real numbers. Consider the case when the component widths are of similar order, and the absolute value of one component is dominant. The subdivision of the latter component may result in a worse inclusion, since the representable numbers are sparser in this direction.

[Interval Analysis: Subdivision Directions in Interval Branch and Bound Methods, Figure 1. Remaining subintervals after 250 iteration steps of the model algorithm with the direction selection Rules A and B for the Three-Hump-Camel-Back problem [3]. Panels: Rule A, Rule B.]
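Merit function (5) needs only the box itself. In this sketch (boxes as (lower, upper) pairs, 0-based directions, names chosen for illustration) the first component has the larger width, so Rule A would prefer it over the second, but because it lies far from zero Rule D scales it down; the third component contains zero and keeps its plain width.

```python
def mignitude(iv):
    """<x> = min |x| over the interval (0 if the interval contains 0)."""
    lo, hi = iv
    if lo <= 0.0 <= hi:
        return 0.0
    return min(abs(lo), abs(hi))


def rule_d(x):
    """Merit function (5)."""
    merits = []
    for lo, hi in x:
        w = hi - lo
        merits.append(w if lo <= 0.0 <= hi else w / mignitude((lo, hi)))
    return merits


x = [(1000.0, 1001.0), (0.5, 1.0), (-2.0, 3.0)]
print(rule_d(x))  # [0.001, 1.0, 5.0]
```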
Rule E. Similar to Rule C, the underlying idea of Rule E is to minimize the width of the inclusion, but this time based on second-order information (suggested by Ratz [10]):

$$D(i) := w\Big((\mathbf{x}_i - m(\mathbf{x}_i))\Big(\nabla\Phi_i(m(\mathbf{x})) + \frac{1}{2}\sum_{j=1}^{n} H_{ij}(\mathbf{x})(\mathbf{x}_j - m(\mathbf{x}_j))\Big)\Big). \qquad (6)$$


Many interval optimization codes use ► automatic differentiation to produce the gradient and Hessian values. For such an implementation, subdivision selection Rule E requires little overhead.
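A sketch of merit function (6), assuming the gradient at the midpoint and an enclosure of the Hessian over the box are already available (for instance from automatic differentiation) and are supplied as (lower, upper) pairs. The example objective φ(x) = x1² + 10·x2 on the box [−1, 1] × [0, 2] is chosen only for illustration; its Hessian is constant, and Rule E correctly points at the second direction, along which φ varies most.

```python
def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])


def imul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def rule_e(x, grad_mid, hess):
    """Merit function (6).

    x: box as (lo, hi) pairs; grad_mid[i]: enclosure of the i-th gradient
    component at the midpoint m(x); hess[i][j]: enclosure of H_ij over x.
    """
    n = len(x)
    centered = [(lo - 0.5 * (lo + hi), hi - 0.5 * (lo + hi)) for lo, hi in x]
    merits = []
    for i in range(n):
        s = (0.0, 0.0)
        for j in range(n):
            s = iadd(s, imul(hess[i][j], centered[j]))   # sum_j H_ij (x_j - m_j)
        inner = iadd(grad_mid[i], imul((0.5, 0.5), s))
        t = imul(centered[i], inner)
        merits.append(t[1] - t[0])
    return merits


x = [(-1.0, 1.0), (0.0, 2.0)]
grad_mid = [(0.0, 0.0), (10.0, 10.0)]       # gradient (2*x1, 10) at m(x) = (0, 1)
hess = [[(2.0, 2.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
print(rule_e(x, grad_mid, hess))  # [2.0, 20.0]
```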


Properties of Direction Selection Rules 


Both the theoretical and numerical properties of subdivision direction selection rules have been studied extensively [1,3,4,10,11]. The exact definitions, theorems and details of numerical comparison tests can be found in these papers. Denote the global minimum value by $\varphi^*$.


Theoretical Properties 


In [4] the property of balanced direction selection has been defined. A subdivision direction selection rule is, basically, balanced if the B&B algorithm with this direction selection rule is not unfair to any coordinate direction: each direction will be selected an infinite number of times in each infinite subdivision sequence of the leading boxes generated by the optimization algorithm. A global minimizer point $x' \in \mathbf{x}^0$ is called a hidden global minimizer point if there exists a subbox $\mathbf{x}' \subseteq \mathbf{x}^0$ with positive volume for which $x' \in \mathbf{x}'$ and $\min \Phi(\mathbf{x}') = \varphi^*$, while there exists another global minimizer point $x''$ of the same problem such that $\min \Phi(\mathbf{x}'') < \varphi^*$ holds for each subbox $\mathbf{x}'' \subseteq \mathbf{x}^0$ with positive volume that contains $x''$ [11]. Now the following statements can be made:

1. The basic branch-and-bound algorithm converges in the sense that $\lim_{s\to\infty} w(\mathbf{x}^s) = 0$ if and only if the interval subdivision selection rule is balanced [4] (where $\mathbf{x}^s$ is the leading box of the algorithm in iteration step number $s$).

2. Assume that the subdivision direction selection rule is balanced. Then the basic B&B algorithm converges to global minimizer points in the sense that $\lim_{s\to\infty} \min \Phi(\mathbf{x}^s) = \varphi^*$, the set $A$ of accumulation points of the leading box sequence is not empty, and $A$ contains only global minimizer points.

3. Assume that the optimization algorithm converges for a given problem in the sense that $\lim_{s\to\infty} \min \Phi(\mathbf{x}^s) = \varphi^*$. Then either the algorithm proceeds on the problem as one with a balanced direction selection rule, or there exists a box $\mathbf{y}$ such that $\varphi(x) = \varphi^*$ for all $x \in \mathbf{y}$, and $w(\mathbf{y}_i) > 0$ for all coordinate directions $i$ that are selected only a finite number of times.

4. The subdivision selection Rules A and D are balanced, and thus the related algorithms converge to global minimizer points.

5. Either the subdivision selection Rules B and C choose each direction an infinite number of times (they behave as balanced), or the related algorithms converge to a positive-width subinterval of the search region $\mathbf{x}^0$ that contains only global minimizer points.

6. Sonja Berner proved that the basic algorithm is convergent with Rule E in the sense of $\lim_{s\to\infty} \min \Phi(\mathbf{x}^s) = \varphi^*$, if an additional condition holds for the inclusion function [1].

7. If the branch-and-bound algorithm with any of the direction selection Rules A-E converges to a global minimizer point, then it converges to all non-hidden global minimizer points [11].

[Interval Analysis: Subdivision Directions in Interval Branch and Bound Methods, Figure 2. Remaining subintervals after 250 iteration steps of the model algorithm with the direction selection Rules C and D for the Three-Hump-Camel-Back problem [3]. Panels: Rule C, Rule D.]


Numerical Properties 


The numerical comparison tests were carried out on a wide set of test problems and in several computational environments. The set of numerical test problems contained the standard global optimization test problems [3,4], the set of problems studied in [5], and also some additional ones [10,11]. The computing environments included IBM RISC 6000-580 and HP 9000/730 workstations and Pentium PCs. The programs were coded in FORTRAN-90, PASCAL-XSC, and C++. The tests were carried out both with simple natural interval extension and with more sophisticated inclusion functions involving centered forms. The derivatives were handcoded in some tests [4], while they were generated by automatic differentiation in the others [3,10,11]. The range of the investigated algorithms included simple B&B procedures and also optimization codes with many acceleration devices (like the ► interval Newton method).

The conclusions were essentially the same: Rules B, C, and E gave similar, substantial efficiency improvements over Rules A and D, and these improvements were greater the more difficult the solved problem was. The average performance of Rule D was the worst. Rule C was usually the best, closely followed by Rules B and E. It seems that the use of Rule E is justified only if the second derivatives are also calculated for other purposes. The numerical results were diverse; thus, if the user has a characteristic problem set, it is worth testing all the subdivision direction selection rules to find the most fitting one.

A computationally intensive numerical study [2] has shown that the most efficient subdivision direction selection rules are not those that minimize the width of the objective function inclusions for the resulting subintervals (which was the common belief), but those that maximize the lower bound of the worse subinterval obtained or minimize the width of the intersection of the resulting subintervals. The decisions of these a posteriori rules coincide most often with those of the a priori Rules B, C, and E. These findings confirm the numerical efficiency results mentioned earlier.
A system of nonlinear equations can be represented in vector form as $f(x) = 0$, where the components are $f_i(x) = f_i(x_1,\ldots,x_n) = 0$, $i = 1,\ldots,n$.

Sometimes we seek one solution; sometimes we are 
interested in locating all solutions. 

A naive interval approach can be used to search 
a box (an interval vector) V for solutions. Using re- 
peated bisections in various coordinate directions, we 
can chisel off parts of V that cannot contain a solution. 
That is, if f(W) does not contain the zero vector for 
some W in V, then we can delete W as containing no 
solutions to f(x) = 0. The remaining parts of V contain 
all the solutions, if any, that were in the initial V. 
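The naive exclusion approach can be sketched for a small example system chosen only for illustration, $f_1 = x_1^2 + x_2^2 - 1$ and $f_2 = x_1 - x_2$, whose solutions are $\pm(1/\sqrt{2}, 1/\sqrt{2})$. The interval extension F and the helper names are assumptions of this sketch, and ordinary floating point is used without outward rounding, so the exclusion test is not rigorous as it would be in a real interval code.

```python
def isqr(a):
    """Exact range of t**2 for t in [a]."""
    lo, hi = a
    if lo >= 0.0:
        return (lo * lo, hi * hi)
    if hi <= 0.0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


def F(box):
    """Interval extension of f1 = x1^2 + x2^2 - 1, f2 = x1 - x2 (example system)."""
    x1, x2 = box
    s, t = isqr(x1), isqr(x2)
    f1 = (s[0] + t[0] - 1.0, s[1] + t[1] - 1.0)
    f2 = (x1[0] - x2[1], x1[1] - x2[0])
    return f1, f2


def naive_search(box, tol=1e-3):
    """Bisect boxes, deleting every W whose F(W) excludes the zero vector."""
    stack, kept = [box], []
    while stack:
        W = stack.pop()
        if not all(lo <= 0.0 <= hi for lo, hi in F(W)):
            continue                       # no solution of f(x) = 0 in W
        widths = [hi - lo for lo, hi in W]
        k = widths.index(max(widths))      # bisect the widest direction
        if widths[k] < tol:
            kept.append(W)
            continue
        lo, hi = W[k]
        mid = 0.5 * (lo + hi)
        for part in ((lo, mid), (mid, hi)):
            V = list(W)
            V[k] = part
            stack.append(tuple(V))
    return kept


boxes = naive_search(((-2.0, 2.0), (-2.0, 2.0)))
print(len(boxes))
```

The surviving boxes cluster around the two solutions; the method needs no derivatives, but it converges only linearly, which is why the Newton-type methods below are preferred for differentiable systems.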

For differentiable systems, there are much more ef- 
ficient methods for finding a solution or the set of all so- 
lutions. Even so, the naive approach does have its uses. 
In practice it often pays to combine a number of tech- 
niques. 

One approach to solving f(x) = 0 is to formulate an 
equivalent fixed-point problem, and use iterative meth- 
ods to solve it. We can define 


$$g(x) = x + Y f(x)$$


for any linear mapping Y. If Y is nonsingular, then f(x) 
= 0 is equivalent to g(x) =x. 

If g is continuous and S is a compact, convex subset of $\mathbb{R}^n$, and g maps S into itself, then g has a fixed point in S (Brouwer's fixed-point theorem), and so f(x) = 0 has a solution in S.

An interval vector V is a compact, convex set, so $g(V) \subseteq V$ implies $f(x) = 0$ for at least one point x in V.

Classical iterative methods consider sequences of points generated by

$$x^{(k+1)} = g(x^{(k)}),$$

starting from some initial point $x^{(0)}$.

If we denote the Jacobian matrix for the system by $f'(x)$, then choosing $Y = -f'(x)^{-1}$, we will have Newton's method. If we take Y as an approximation to $-f'(x)^{-1}$, then we obtain a Newton-like method.
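The difference between the two choices of Y shows up clearly in one dimension. This sketch uses the illustrative equation f(x) = x² − 2 and, for the Newton-like variant, freezes Y at the starting point x = 1.5 (an assumption made for the example). Newton's method converges quadratically; the frozen-Y iteration converges only linearly, with a rate of about 0.057 here.

```python
def fixed_point(g, x0, iterations):
    """Iterate x_{k+1} = g(x_k) a fixed number of times."""
    x = x0
    for _ in range(iterations):
        x = g(x)
    return x


def f(x):
    return x * x - 2.0


def df(x):
    return 2.0 * x


def newton_g(x):
    return x - f(x) / df(x)           # Y = -f'(x)^(-1): Newton's method


Y_FROZEN = -1.0 / df(1.5)             # Y fixed at the starting point (assumption)


def newton_like_g(x):
    return x + Y_FROZEN * f(x)        # Newton-like: constant approximation of Y


root = 2.0 ** 0.5
print(abs(fixed_point(newton_g, 1.5, 5) - root))
print(abs(fixed_point(newton_like_g, 1.5, 5) - root))
```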

Interval versions of Newton’s method, however, 
also involve intersections, as we will see. 

An interval Newton method for finite systems of 
nonlinear equations was introduced by R.E. Moore 
[11,12]. Subsequently, many improvements have been 
made, e. g., [4,6,7,8,10,13,16,17,18]. 

In order to explain as clearly as possible, consider the one-dimensional case. We have the mean value theorem for continuously differentiable f:

$$f(x^*) = f(x) + f'(\xi)(x^* - x)$$

for some $\xi$ between $x$ and $x^*$. We have $f(x^*) = 0$ if $x^*$ satisfies

$$x^* = x - f'(\xi)^{-1} f(x).$$

Now the ordinary Newton method replaces the unknown $\xi$ by $x$.

The initial idea was to use an interval for $\xi$ and use interval computation throughout the iterations. If we start with an interval, say $X^{(0)}$, that contains $x$ and happens to also contain a solution, say $x^*$, of $f(x) = 0$, then $X^{(0)}$ also contains $\xi$, and therefore $x^*$ is contained in

$$N(X) = x - [f'(X)]^{-1} f(x)$$

(N for Newton), where $f'(X) \supseteq \{f'(x) : x \in X\}$. The first idea was to iterate $X^{(k+1)} = N(X^{(k)})$, but this turns out not to converge in all cases.
Then the following idea was proposed, [12]. Since 
a solution x in X is also in N(X), it follows that x is 
also contained in the intersection: X° M N(X). 
Therefore, we iterate 


xX) = xh) Aq N(x) 
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with 
Nx) = y — [FROM FO), 


choosing y in X, say the midpoint of X, 

With this modification, the interval Newton 
method does what we want, as will be explained. From 
the above arguments we have proved that for an inter- 
val X, 


1) if $N(X) \cap X$ is empty, then there is no solution in X.


If we divide by an interval containing zero, we may obtain for N(X) either one semi-infinite interval or the union of two. The intersection with the finite interval X in the interval Newton method reduces the result to a finite interval, the union of two finite intervals, or the empty set.
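The extended division described above can be sketched as follows. This minimal Python illustration assumes a nonzero real numerator. It uses plain floating point with no outward rounding, so it is not rigorous. The function names `extended_div` and `newton_pieces` are hypothetical. The sketch replays the first step of the worked example below, where y = 0, p(y) = 0.2 and p'(X) = [−5.32, 3.32].

```python
import math


def extended_div(p, d_lo, d_hi):
    """p / [d_lo, d_hi] for a real p != 0, as a sorted list of intervals.

    Plain floating point (no outward rounding); math.inf marks unbounded ends.
    """
    if d_lo > 0 or d_hi < 0:                    # 0 not in the divisor
        q1, q2 = p / d_lo, p / d_hi
        return [(min(q1, q2), max(q1, q2))]
    pieces = []
    if d_lo < 0:                                # divisors in [d_lo, 0)
        pieces.append((-math.inf, p / d_lo) if p > 0 else (p / d_lo, math.inf))
    if d_hi > 0:                                # divisors in (0, d_hi]
        pieces.append((p / d_hi, math.inf) if p > 0 else (-math.inf, p / d_hi))
    return sorted(pieces)                       # empty if d_lo == d_hi == 0


def newton_pieces(y, fy, d_lo, d_hi, X):
    """N(X) ∩ X with N(X) = y - f(y) / f'(X), f'(X) = [d_lo, d_hi], X = (a, b)."""
    a, b = X
    out = []
    for q_lo, q_hi in extended_div(fy, d_lo, d_hi):
        n_lo, n_hi = y - q_hi, y - q_lo         # y - [q_lo, q_hi]
        if max(a, n_lo) <= min(b, n_hi):
            out.append((max(a, n_lo), min(b, n_hi)))
    return sorted(out)


# First step of the worked example: y = 0, p(y) = 0.2, p'(X) = [-5.32, 3.32].
step = newton_pieces(0.0, 0.2, -5.32, 3.32, (-1.2, 1.2))
print(step)   # two pieces, roughly [-1.2, -0.0602] and [0.0376, 1.2]
```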

During the iterations, if X turns out to be the 
union of two intervals, we put one on a list and pro- 
ceed to further iterate with the other one. This idea was 
first presented in E.R. Hansen [6]. 

We can also prove that [5,6]: 

2) if $N(X) \subseteq X$, then there exists a unique solution in N(X).

The existence follows from the compactness and convexity of the interval X and from the continuity of f'. The uniqueness follows from the boundedness of $N(X) \subseteq X$. If there were two solutions in N(X), then by Rolle's theorem f'(y) would be zero for some y in X. In that case f'(X) would contain zero and N(X) would be unbounded.

If f is twice continuously differentiable, then we can 
also prove the following [16]: 

3) if $N(X) \subseteq X$, then the interval Newton method converges quadratically to the unique solution in X, as does the ordinary Newton method from any starting point in X.

‘Quadratically’ here means there is a constant C such that $w(X^{(k+1)}) \le C\, w(X^{(k)})^2$, k = 1, 2, ..., where w(X) denotes the width of an interval X; thus, w([a, b]) = b − a.

We illustrate the different behaviors of the ordinary 
Newton method and the interval Newton method in the 
following figures. Fig 1 shows that the ordinary Newton 
method cannot find the middle solution unless we start 
very close to it. 

The first three iterations of the ordinary Newton method

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

for $f(x) = x^3 - x + 0.2$, starting with $x^{(0)} = -0.375$, are shown in Fig 1. The algorithm produces $x^{(1)} = 0.528\ldots$, $x^{(2)} = -0.584\ldots$, $x^{(3)} = -22.569\ldots$. In order to converge to the middle root, we need an initial guess $x^{(0)}$ very close to that root.

Interval Analysis: Systems of Nonlinear Equations, Figure 1
The ordinary Newton method
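The three ordinary Newton iterates quoted above can be checked with a few lines of plain floating-point Python. The function name `newton_iterates` is illustrative. The first two iterates match the leading digits 0.528... and −0.584.... The third is large and negative. Because $f'(x^{(2)})$ is close to zero, it is also very sensitive to rounding in the earlier steps.

```python
def newton_iterates(f, df, x0, steps):
    """Return [x0, x1, ..., x_steps] for x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs


def f(x):
    return x**3 - x + 0.2


def df(x):
    return 3 * x**2 - 1


xs = newton_iterates(f, df, -0.375, 3)
print(xs)
```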

The interval Newton method finds all three solutions in the starting interval $X^{(0)} = [-1.2, 1.2]$ without difficulty. We choose that starting interval because the roots of a polynomial

$$p(z) = a_n z^n + \cdots + a_1 z + a_0$$

with $a_n \ne 0$ are well known to lie in the complex disk

$$|z| \le \max\Bigl\{1, \sum_{k<n} \Bigl|\frac{a_k}{a_n}\Bigr|\Bigr\},$$

so, for the example $p(x) = x^3 - x + 0.2$, the real roots are known to satisfy

$$-1.2 \le x \le 1.2.$$
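The bound used above is easy to compute. The following minimal Python sketch uses the hypothetical name `root_bound` and lists the coefficients from $a_0$ up to $a_n$.

```python
def root_bound(coeffs):
    """Return max(1, sum_{k<n} |a_k / a_n|) for coefficients [a_0, ..., a_n], a_n != 0.

    Every complex root z of p(z) = a_n z^n + ... + a_0 satisfies |z| <= this value.
    """
    *lower, a_n = coeffs
    if a_n == 0:
        raise ValueError("leading coefficient a_n must be nonzero")
    return max(1.0, sum(abs(a / a_n) for a in lower))


print(root_bound([0.2, -1.0, 0.0, 1.0]))  # p(x) = x^3 - x + 0.2  ->  1.2
```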


If the intersection $N(X^{(0)}) \cap X^{(0)}$ splits into two intervals, we list one and analyze the other. See Fig 2 and Fig 3.

We used the usual recursive algorithm (Horner's scheme) for evaluating the polynomial and its derivative, namely

p ← a_n
p' ← 0
FOR i = n − 1 TO 0 STEP −1 DO
    p' ← x p' + p
    p ← a_i + x p
END DO

Interval Analysis: Systems of Nonlinear Equations, Figure 2
The interval N([−1.2, 1.2]) ∩ [−1.2, 1.2] splits into [−1.2, −0.0602...] ∪ [0.0375..., 1.2]. We list the first and analyze the second. See Fig 3


Of course, for the interval Newton method, the eval- 
uations are carried out in interval arithmetic with out- 
ward rounding. The real coefficients are entered as de- 
generate intervals, for example 


0 = [0, 0], 1 = [1, 1], 0.2 = [0.2, 0.2].


Recalling that we began with the initial interval $X^{(0)} = [-1.2, 1.2]$, we find that the midpoint of $X^{(0)}$ is y = [0, 0], and so:

$$N(X^{(0)}) = y - \frac{p(y)}{p'(X^{(0)})} = [0, 0] - \frac{[0.2, 0.2]}{[-5.32, 3.32]} = (-\infty, -0.06024\ldots] \cup [0.03759\ldots, \infty).$$

When we intersect this union with $X^{(0)}$, we obtain [−1.2, −0.06024...] ∪ [0.03759..., 1.2].
The calculation of p'([−1.2, 1.2]) was as follows:

$$\begin{aligned}
p'([-1.2, 1.2]) &= [3, 3]([-1.2, 1.2][-1.2, 1.2]) - [1, 1] \\
&= [3, 3][-1.44, 1.44] - [1, 1] \\
&= [-4.32, 4.32] - [1, 1] \\
&= [-5.32, 3.32].
\end{aligned}$$
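The calculation of p'([−1.2, 1.2]) can be reproduced with a small interval type. This sketch is an illustration under stated assumptions, not C-XSC. It emulates outward rounding by moving each round-to-nearest result one unit in the last place outward with `math.nextafter` (Python 3.9+). It evaluates p and p' by the recursive scheme given above and, for comparison, by the direct formula $[3,3](XX) - [1,1]$. The names `Interval`, `down`, `up` and `horner` are hypothetical.

```python
import math


def down(v):
    """Next float toward -inf: widens a round-to-nearest lower bound."""
    return math.nextafter(v, -math.inf)


def up(v):
    """Next float toward +inf: widens a round-to-nearest upper bound."""
    return math.nextafter(v, math.inf)


class Interval:
    """Closed interval [lo, hi]; each operation is rounded outward by one ulp."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    def __add__(self, other):
        return Interval(down(self.lo + other.lo), up(self.hi + other.hi))

    def __sub__(self, other):
        return Interval(down(self.lo - other.hi), up(self.hi - other.lo))

    def __mul__(self, other):
        ps = (self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi)
        return Interval(down(min(ps)), up(max(ps)))

    def __repr__(self):
        return f"[{self.lo!r}, {self.hi!r}]"


def horner(coeffs, x):
    """Enclose p(x) and p'(x) by the recursive scheme; coeffs are a_0..a_n."""
    p, dp = coeffs[-1], Interval(0.0)
    for a in reversed(coeffs[:-1]):
        dp = x * dp + p      # p' <- x p' + p
        p = a + x * p        # p  <- a_i + x p
    return p, dp


# p(x) = x^3 - x + 0.2; the decimal 0.2 is entered as a tiny interval containing it.
coeffs = [Interval(down(0.2), up(0.2)), Interval(-1.0), Interval(0.0), Interval(1.0)]

X0 = Interval(-1.2, 1.2)                                 # float endpoints nearest to -1.2, 1.2
_, dp = horner(coeffs, X0)                               # p'(X0) via the recursive scheme
direct = Interval(3.0) * (X0 * X0) - Interval(1.0)       # [3,3]([X0][X0]) - [1,1]
p_y, _ = horner(coeffs, Interval(0.0))                   # p(y) at the midpoint y = 0
print("p'(X0) =", dp, direct)   # both slightly wider than [-5.32, 3.32]
print("p(0)   =", p_y)          # a tiny interval around 0.2
```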


Referring to Fig 2 and Fig 3, we see that the interval Newton method first splits the starting interval $X^{(0)}$ into two subintervals [−1.2, −0.0602...] and [0.0375..., 1.2]. It then splits the second one again into two subintervals [0.0375..., 0.436...] and [0.6738..., 1.2].

Interval Analysis: Systems of Nonlinear Equations, Figure 3
The interval being analyzed, X = [0.0375..., 1.2], is shown enlarged for clarity. Again N(X) ∩ X splits into two intervals [0.0375..., 0.436...] ∪ [0.6738..., 1.2]. We list the first and analyze the second. See Fig 4

Interval Analysis: Systems of Nonlinear Equations, Figure 4
We analyze the interval X = [0.6738..., 1.2]. This time we have N(X) ⊆ X, because N(X) = [0.724..., 0.911...], so we will have convergence to the unique solution in X

The intervals

[−1.2, −0.0602...] and [0.0375..., 0.436...]

were listed for later analysis, and the interval [0.6738..., 1.2] was analyzed. With X = [0.6738..., 1.2], the method produced

$$N(X) = [0.724\ldots, 0.911\ldots] \subseteq X,$$

so there is a unique solution in that X.
We can prove the following:

4) If $N(X) \subseteq X$ and we carry out the interval computations with outward rounding after a fixed number of digits (or bits), the iterations

$$X^{(k+1)} = N(X^{(k)}) \cap X^{(k)}, \quad k = 0, 1, \ldots,$$

with $X^{(0)} = X$ form a nested sequence of intervals containing the solution. After a finite number of iterations we can stop with $X^{(k+1)} = X^{(k)}$.


This follows from the fact that there are only finitely 
many different machine numbers of a given ‘precision’, 
that is with a given finite number of digits (bits). In the 
example at hand, the final results were as follows. For 
X = [0.6738..., 1.2], after four iterations: 

THERE IS A ROOT IN 


[0.8788850662, 0.8788850663]. 


The program then removed a new $X^{(0)} = [0.037\ldots, 0.436]$ from the list. It turned out that $N(X^{(0)}) = [0.2007\ldots, 0.213\ldots] \subseteq X^{(0)}$, so there is a unique solution in the new $X^{(0)}$. After three iterations, we obtained:

THERE IS A ROOT IN 


[0.2091488484, 0.2091488485]. 


Finally, the program removed the last remaining interval on the list, namely [−1.2, −0.06...], which was then taken as a new $X^{(0)}$. This time the intersection came into play because

$$N(X^{(0)}) = (-\infty, -0.809\ldots] \cup [-0.04\ldots, \infty).$$

The intersection of this union with $X^{(0)} = [-1.2, -0.06\ldots]$ gave the single interval

$$X^{(1)} = N(X^{(0)}) \cap X^{(0)} = [-1.2, -0.809\ldots].$$

After four iterations, we obtained the result:
THERE IS A ROOT IN

[−1.0880339147, −1.0880339146].

A final message was printed out (which happens when the list becomes empty after the last one taken out is analyzed):

THERE ARE NO MORE ROOTS IN

[−1.2, 1.2].
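The list-based search just described can be sketched in Python. This is not the program used for the results above. It is a hypothetical reconstruction under these assumptions:

- Outward rounding is emulated with `math.nextafter`.
- The point y is the midpoint of X.
- The decimal coefficient 0.2 is entered as an interval that contains it.
- When an interval that has not yet been verified shrinks by less than half in one step, it is bisected. The text does not mention this safeguard.

A root is reported only after $N(X) \subseteq X$ has been observed and the iteration has stopped with $X^{(k+1)} = X^{(k)}$. All function names are illustrative.

```python
import math


def dn(v):
    return math.nextafter(v, -math.inf)   # one ulp toward -inf


def up(v):
    return math.nextafter(v, math.inf)    # one ulp toward +inf


def iadd(a, b):
    return (dn(a[0] + b[0]), up(a[1] + b[1]))


def imul(a, b):
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (dn(min(ps)), up(max(ps)))


def horner(coeffs, x):
    """Enclose p(x) and p'(x); coeffs are intervals a_0..a_n, x is an interval."""
    p, dp = coeffs[-1], (0.0, 0.0)
    for a in reversed(coeffs[:-1]):
        dp = iadd(imul(x, dp), p)
        p = iadd(a, imul(x, p))
    return p, dp


def newton_step(coeffs, X):
    """Return (pieces of N(X) ∩ X, True if N(X) ⊆ X was proven)."""
    lo, hi = X
    y = 0.5 * (lo + hi)                    # midpoint of X
    (pl, ph), _ = horner(coeffs, (y, y))   # enclosure of p(y)
    _, (dl, dh) = horner(coeffs, X)        # enclosure of p'(X)
    if dl > 0 or dh < 0:                   # 0 not in p'(X): ordinary division
        qs = (pl / dl, pl / dh, ph / dl, ph / dh)
        n_lo, n_hi = dn(y - up(max(qs))), up(y - dn(min(qs)))
        piece = (max(lo, n_lo), min(hi, n_hi))
        pieces = [piece] if piece[0] <= piece[1] else []
        return pieces, lo <= n_lo and n_hi <= hi
    if pl <= 0.0 <= ph:                    # 0 in p(y) and in p'(X): N(X) is all of R
        return [X], False
    rays = []                              # extended division: one or two rays
    if pl > 0:
        if dl < 0:
            rays.append((dn(y - up(pl / dl)), math.inf))
        if dh > 0:
            rays.append((-math.inf, up(y - dn(pl / dh))))
    else:
        if dh > 0:
            rays.append((dn(y - up(ph / dh)), math.inf))
        if dl < 0:
            rays.append((-math.inf, up(y - dn(ph / dl))))
    pieces = [(max(lo, a), min(hi, b)) for a, b in rays]
    return sorted(pc for pc in pieces if pc[0] <= pc[1]), False


def all_roots(coeffs, X, tol=1e-12):
    """Return (verified enclosures, unresolved small intervals) for roots in X."""
    work, roots, unresolved = [X], [], []
    while work:
        X, verified = work.pop(), False
        while True:
            pieces, contained = newton_step(coeffs, X)
            verified = verified or contained
            if not pieces:                 # N(X) ∩ X empty: no root in X
                break
            if len(pieces) == 2:           # list one piece, analyze the other
                work.append(pieces[0])
                X, verified = pieces[1], False
                continue
            Xn = pieces[0]
            if verified:
                if Xn == X:                # stopping criterion X(k+1) = X(k)
                    roots.append(X)
                    break
                X = Xn
            elif Xn[1] - Xn[0] < tol:      # tiny but never verified
                unresolved.append(Xn)
                break
            elif Xn[1] - Xn[0] > 0.5 * (X[1] - X[0]):
                m = 0.5 * (Xn[0] + Xn[1])  # poor progress: bisect (safeguard)
                work.append((Xn[0], m))
                X = (m, Xn[1])
            else:
                X = Xn
    return sorted(roots), unresolved


# p(x) = x^3 - x + 0.2, with 0.2 entered as an interval that contains it.
COEFFS = [(dn(0.2), up(0.2)), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
roots, unresolved = all_roots(COEFFS, (-1.2, 1.2))
for lo, hi in roots:
    print(f"THERE IS A ROOT IN [{lo:.10f}, {hi:.10f}]")
```

Because the last-in, first-out list is used, the intervals are analyzed in the same order as in the narrative above: first [0.6738..., 1.2], then [0.0375..., 0.436...], and finally [−1.2, −0.0602...].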


The following additional examples were carried out, 
using a program implementing the interval Newton 
method just described, in C-XSC [9], and run on an In- 
tel 486 processor. 


1) Find the real roots in [−3, 3] of

$$p(x) = x^3 - 1.5201x^2 + 0.770201x - 0.1300755.$$

This polynomial is the expanded form of

$$p(x) = (x - 0.5)(x - 0.51)(x - 0.5101),$$

thus the roots are fairly closely packed.
When we entered p(x) as

x*x*x - 1.5201*x*x + 0.770201*x - 0.1300755,


the program SES (Single Equation Solver, in C-XSC [9]) 
produced three roots in the intervals (shown outwardly 
rounded to six places): 


[0.510099, 0.510101], 
[0.509999, 0.510001], 
[0.499999, 0.500001]. 


Sadly, when we entered p(x) as
x^3 - 1.5201*x^2 + 0.770201*x - 0.1300755,


the program SES produced a false result, a ‘root’ in the 
interval [0.168885, 0.168886]. 

Unfortunately, one must be cautious when using 
a programmed implementation (‘software’) even for 
a method that is guaranteed. It can be very difficult to 
prove the correctness of a computer program, particularly when it still has bugs, such as the implementation of x^3 in an early version of SES. Hopefully, it
has been fixed in later versions. Usually extensive test- 
ing of a program before its release will uncover most 
such bugs. The subject of proving correctness of com- 
puter programs is a difficult and active area of research 
in computer science. 

Here are some additional examples run in C-XSC. 
They were all independently checked with another pro- 
gram written from scratch and run on another com- 
puter. This is another way to test a computer imple- 
mentation of a computational method. 
2) Find all the roots in [.01, 1.0] of

$$f(x) = x^2 + \sin\frac{1}{x}.$$

The program found 30 roots from right to left, becoming more and more closely packed. The first four digits of the last three roots found were .0109, .0106, .0102.

The last root lies in the interval (shown to 11 digits) 


[.01026804972, .01026804973]. 


The CPU time was 0.16 seconds using C-XSC on 
a 486 processor. 


3) An example suggested by E.A. Galperin, see [14], is

$$f(x) = x^2 + \sin\frac{1}{x^p}, \quad p = 1, 2, 3.$$

We ran the program to find all roots in [0.1, 1.0] for the cases p = 1, 2, 3. Three roots were found right to left for p = 1, thirty-one roots for p = 2, and 318 roots for p = 3. The last root found for p = 3 lies in the interval

[0.10003280626, 0.10003280628].


To find all the 318 closely packed roots in the case p = 
3, the CPU time was 1.5 seconds using C-XSC on a 486 
processor. 

The method generalizes to n dimensions. Interval 
Newton methods for n-dimensional nonlinear systems 
have some remarkable properties; among them are the 
following: 

1) If $N(X) \cap X$ is empty, then there is no solution in X;

2) If $N(X) \subseteq X$, then there is a unique solution in N(X);

3) If $N(X) \subseteq X$, then the interval Newton method, with $X^{(0)} = X$, converges quadratically to the unique solution in X, as does the ordinary Newton method from any starting point in X.

4) If $N(X) \subseteq X$, with outward rounding in the interval computation of N(X), then the interval Newton method converges in a finite number of iterations, because of the intersection, to an interval vector containing the unique solution in X, using the stopping criterion: STOP when $X^{(k+1)} = X^{(k)}$.

The general form of such algorithms is

$$X^{(k+1)} = N(X^{(k)}) \cap X^{(k)},$$

where the interval Newton operator N is defined in various ways.
The original way [12] was

$$N(X^{(k)}) = m(X^{(k)}) - [f'(X^{(k)})]^{-1} f(m(X^{(k)})),$$

where m(X) is the midpoint of X. Newer versions (see e.g. [4,7,10,18]) avoid having to find the inverse of the Jacobian matrix f'(X) for an interval vector X. Krawczyk's variation [10] was to define N(X) as

$$N(X) = y - Y f(y) + \{I - Y f'(X)\} Z,$$

where y is the midpoint of the interval vector X, I is the identity matrix, Y is a nonsingular real matrix, such as an approximation to the inverse of f'(m(X)), and Z = X − y.
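For a single equation, Krawczyk's operator reduces to $y - Y f(y) + (1 - Y f'(X))(X - y)$, with real numbers in place of matrices. The following Python sketch applies this scalar specialization to $p(x) = x^3 - x + 0.2$ on X = [0.8, 0.95], an interval chosen here for illustration around the root 0.8788.... Outward rounding is again emulated with `math.nextafter`. Y is taken as the reciprocal of the midpoint of the enclosure of p'(y). The helper names are hypothetical. The sketch checks the classical sufficient condition that K(X) lies in the interior of X.

```python
import math


def dn(v):
    return math.nextafter(v, -math.inf)


def up(v):
    return math.nextafter(v, math.inf)


def add(a, b):
    return (dn(a[0] + b[0]), up(a[1] + b[1]))


def sub(a, b):
    return (dn(a[0] - b[1]), up(a[1] - b[0]))


def mul(a, b):
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (dn(min(ps)), up(max(ps)))


def p(x):
    """Interval extension of p(x) = x^3 - x + 0.2 (0.2 enclosed outward)."""
    return add(sub(mul(mul(x, x), x), x), (dn(0.2), up(0.2)))


def dp(x):
    """Interval extension of p'(x) = 3x^2 - 1."""
    return sub(mul((3.0, 3.0), mul(x, x)), (1.0, 1.0))


def krawczyk(X):
    """Scalar Krawczyk operator y - Y p(y) + (1 - Y p'(X))(X - y)."""
    y = 0.5 * (X[0] + X[1])
    yy = (y, y)
    d = dp(yy)
    Y = 1.0 / (0.5 * (d[0] + d[1]))      # any nonzero real works; this approximates 1/p'(y)
    YY = (Y, Y)
    head = sub(yy, mul(YY, p(yy)))
    tail = mul(sub((1.0, 1.0), mul(YY, dp(X))), sub(X, yy))
    return add(head, tail)


X = (0.8, 0.95)
K = krawczyk(X)
# K(X) in the interior of X implies that X contains exactly one root of p.
print(K, X[0] < K[0] and K[1] < X[1])
```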

In cases 2) and 3) above, the subsequent iterations converge to the solution, quadratically if Y converges to $[f'(m(X))]^{-1}$ as the width of X goes to zero. With outward rounding in a computer implementation, the iterations will stop at a finite value of k with the stopping criterion: STOP if $X^{(k+1)} = X^{(k)}$.

For a technique for searching for a safe starting region $X^{(0)}$ satisfying 2), from which convergence to a solution is guaranteed, see [13,16]. The technique involves starting with a large initial box and using bisections in a depth-first search in suitable directions to find a sub-box that satisfies property 2) above.

To find enclosures of all solutions in a given initial box, we form a list of ‘sub-boxes’ (interval vectors) in a way analogous to that explained in the one-dimensional example. When the intersection $N(X) \cap X$ produces two sub-boxes, we can list one and analyze
(continue iterations with) the other, or we can list them 
both and choose some other box on the list to analyze 
next. Such an interval method lends itself to paralleliza- 
tion. We can distribute sub-boxes remaining on the list 
to processors in a network, and gain a speed-up factor. 
This is particularly important for applications to global 
optimization; see e.g. [8,15]. 


Numerical Example 


A solution was sought for the following nine-dimen- 
sional system obtained from P. Rabinowitz (private 
communication). The system concerns finding weights 
and argument spacings for a certain type of multidimensional integration formula


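$$\begin{aligned}
x_1 + x_3 + x_5 + 2x_7 &= 1,\\
x_1x_2 + x_3x_4 + 2x_5x_6 + 2x_7x_8 + 2x_7x_9 &= \tfrac{2}{3},\\
x_1x_2^2 + x_3x_4^2 + 2x_5x_6^2 + 2x_7x_8^2 + 2x_7x_9^2 &= \tfrac{2}{5},\\
x_1x_2^3 + x_3x_4^3 + 2x_5x_6^3 + 2x_7x_8^3 + 2x_7x_9^3 &= \tfrac{2}{7},\\
x_1x_2^4 + x_3x_4^4 + 2x_5x_6^4 + 2x_7x_8^4 + 2x_7x_9^4 &= \tfrac{2}{9},\\
x_5x_6^2 + 2x_7x_8x_9 &= \tfrac{1}{9},\\
x_5x_6^4 + 2x_7x_8^2x_9^2 &= \tfrac{1}{25},\\
x_5x_6^3 + x_7x_8x_9^2 + x_7x_8^2x_9 &= \tfrac{1}{15},\\
x_5x_6^4 + x_7x_8x_9^3 + x_7x_8^3x_9 &= \tfrac{1}{21}.
\end{aligned}$$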


A solution was sought in the unit 9-cube. We started 
with an initial box slightly larger than the unit 9-cube 
in case there was a solution on the boundary. A depth- 
first search for a safe starting region was carried out and 
was successful after 168 bisections in a certain sequence 
of coordinate directions determined by the program as 
the process proceeded, see [16]. Finally, a solution was 
quickly bounded in a small box (9-dimensional interval 
vector here). The reader is invited to try a favorite non- 
interval nonlinear systems solver to find a solution of 
this system. Better still, find all the solutions and prove 
there are no more, as the interval method did. 

An alternative approach using interval analysis to 
solve nonlinear systems is computing the topological de- 
gree of the mapping f over a box (n-dimensional inter- 
val vector). See [1,2]. 

For access to voluminous literature, available soft- 
ware, current research efforts, conferences, etc. in the 
area of interval computation, see [21]. 


See also 


> Automatic Differentiation: Point and Interval 
> Automatic Differentiation: Point and Interval 
Taylor Operators 


> Bounding Derivative Ranges 

> Contraction-mapping 

> Global Optimization: Application to Phase 
Equilibrium Problems 

> Global Optimization Methods for Systems of 
Nonlinear Equations 

> Interval Analysis: Application to Chemical 
Engineering Design Problems 

> Interval Analysis: Differential Equations 

> Interval Analysis: Eigenvalue Bounds of Interval 
Matrices 

> Interval Analysis: Intermediate Terms 

> Interval Analysis: Nondifferentiable Problems 

> Interval Analysis: Parallel Methods for Global 
Optimization 

> Interval Analysis: Subdivision Directions in Interval 
Branch and Bound Methods 

> Interval Analysis: Unconstrained and Constrained 
Optimization 

> Interval Analysis: Verifying Feasibility 

> Interval Constraints 

> Interval Fixed Point Theory 

> Interval Global Optimization 

> Interval Linear Systems 

> Interval Newton Methods 

> Nonlinear Least Squares: Newton-type Methods 

> Nonlinear Systems of Equations: Application to the 
Enclosure of All Azeotropes 
Interval algorithms for constrained and unconstrained optimization are based on adaptive, exhaustive search of the domain. Their overall structure is virtually identical to Lipschitz optimization as in [4], since interval evaluations of an objective function φ over an interval vector x correspond to estimation of the range of φ over x with Lipschitz constants. However, interval algorithms offer additional opportunities for accelerating the process. Moreover, outwardly rounded interval arithmetic gives the computations the rigor of a mathematical proof.

The interval algorithms are both complicated and 
accelerated by the presence of constraints, as is ex- 
plained below. 

See [5,2] or [3] for further details of concepts in this 
article. 


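The basic problem is

$$\begin{aligned}
\min\ & \phi(x)\\
\text{s.t. } & c(x) = 0,\\
& g(x) \le 0,
\end{aligned} \qquad (1)$$

where $\phi: \mathbf{x} \subseteq \mathbb{R}^n \to \mathbb{R}$, $c: \mathbf{x} \to \mathbb{R}^{m_1}$, and $g: \mathbf{x} \to \mathbb{R}^{m_2}$, and where $\mathbf{x}$ is an interval vector

$$\mathbf{x} = ([\underline{x}_1, \overline{x}_1], \ldots, [\underline{x}_n, \overline{x}_n])^T.$$

The values $m_1 = 0$ and $m_2 = 0$ will be allowed, in which case the problem is considered to be unconstrained. It is emphasized here that, in problem (1), a global optimum, that is, the lowest possible value of φ over the feasible set, is sought.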


The Basic Branch and Bound Algorithm 
for Unconstrained Optimization 


The overall outline of an interval branch and bound al- 
gorithm for unconstrained global optimization is given 
in Table 1. 

Interval Analysis: Unconstrained and Constrained Optimization, Table 1

INPUT: an initial box $\mathbf{x}_0$.
OUTPUT: a list C of boxes that have been proven to contain critical points and a list U of boxes with small objective function values, but which could not otherwise be resolved.

1. Initialize a list of boxes L by placing the initial search region $\mathbf{x}_0$ in L.
2. DO WHILE L ≠ ∅
   a) Remove the first box $\mathbf{x}$ from L. (The boxes in L are in general in a particular order, depending on the actual algorithm.)
   b) (Process $\mathbf{x}$) Do one of the following:
      - reject $\mathbf{x}$;
      - reduce the size of $\mathbf{x}$;
      - determine that $\mathbf{x}$ contains a unique critical point, then find the critical point to high accuracy;
      - subdivide $\mathbf{x}$ to make it more likely to succeed at rejecting, reducing, or verifying uniqueness.
   c) Do the following to the box(es) resulting from Step 2b):
      - If $\mathbf{x}$ was rejected, do nothing.
      - If more than one box was derived from $\mathbf{x}$, insert all but one of them into L. Call the remaining box derived from $\mathbf{x}$ $\tilde{\mathbf{x}}$.
      - If there is an $\tilde{\mathbf{x}}$ that has been proven to contain a critical point, insert it into C.
      - If there is an $\tilde{\mathbf{x}}$ that is small, but has not been proven to contain a critical point, insert it into U.
   END DO

One way that a box is rejected in step 2b) of this algorithm is by using a bound on the range of the function φ over the interval vector (box) $\mathbf{x}$. In particular, suppose the value $\phi(\check{x})$ at a point $\check{x}$ is known. Then $\phi(\check{x})$ is an upper bound for the global optimum. (In fact, if φ has been evaluated at various points, then the minimum of the resulting values is a usable upper bound on the global optimum.) Now suppose a lower bound $\underline{\phi}$ on the range of φ over a box (or, more generally, a region) $\mathbf{x} \subseteq \mathbb{R}^n$ can be computed, and that $\underline{\phi} > \phi(\check{x})$. Then there cannot be any global optimizers of φ within $\mathbf{x}$. The value $\underline{\phi}$ can be obtained through an interval function value. This process is illustrated in the following figure.

The lower bound $\underline{\phi}$ for the objective over the box $\mathbf{x}$ need not be obtained via interval computations. Indeed, suppose a Lipschitz constant $L_\phi$ for φ is known over $\mathbf{x}$, and $\phi(\check{x})$ is known for $\check{x}$, the center of $\mathbf{x}$. Then, for any $\tilde{x} \in \mathbf{x}$,

$$\phi(\tilde{x}) \ge \phi(\check{x}) - \frac{L_\phi}{2}\, \|w(\mathbf{x})\|,$$

where $w(\mathbf{x})$ is the vector of widths of the components of the interval vector $\mathbf{x}$. However, getting rigorous bounds on Lipschitz constants can require more human effort than the interval computation. It also often yields bounds that are not as sharp as those from interval computation. (Heuristically obtained approximate Lipschitz constants, as employed in the calculations in [4], have nevertheless been highly successful at solving practical problems, albeit not rigorously.) Similarly, automated computations for Lipschitz constants as presently formulated result in bounds that are provably not as sharp as interval computations. Furthermore, properly rounded interval arithmetic, if used in computing both $\underline{\phi}$ and $\phi(\check{x})$, allows one to conclude with mathematical rigor that there are no global optima of φ within $\mathbf{x}$.

Interval Analysis: Unconstrained and Constrained Optimization, Figure 1
The midpoint test: Rejecting x because of a high objective value (plot label: interval extension of φ over x)

Use of this lower bound for φ is sometimes called the midpoint test, since the points $\check{x}$ at which $\phi(\check{x})$ is evaluated are often taken to be the vectors of midpoints of the boxes $\mathbf{x}$ produced during the subdivision process. (Actually, some implementations use the output of an approximate or local optimizer as $\check{x}$, to get an upper bound on the global optimum that is as low as possible.)
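Here is a minimal Python sketch of a branch and bound search that uses only the midpoint test and bisection. It is one-dimensional for brevity. It assumes the following:

- The objective $\phi(x) = (x^2 - 2)^2$ is a hypothetical example whose global minimizers are $\pm\sqrt{2}$.
- Its interval extension is built from exact ranges of squares.
- Plain floating point is used, with no outward rounding, so the result is illustrative rather than rigorous.

The returned boxes play the role of the list U of small unresolved boxes. Function names are hypothetical.

```python
def sqr(X):
    """Exact range of t*t for t in X = (lo, hi)."""
    lo, hi = X
    a, b = lo * lo, hi * hi
    if lo <= 0.0 <= hi:
        return (0.0, max(a, b))
    return (min(a, b), max(a, b))


def phi(x):
    """Hypothetical objective (x^2 - 2)^2, with global minimizers +-sqrt(2)."""
    return (x * x - 2.0) ** 2


def phi_range(X):
    """Interval extension of phi over X (exact range up to rounding)."""
    s = sqr(X)
    return sqr((s[0] - 2.0, s[1] - 2.0))


def midpoint_test_search(X0, tol=1e-4):
    """Branch and bound with only the midpoint test and bisection."""
    best = phi(0.5 * (X0[0] + X0[1]))       # phi at any point bounds the minimum above
    work, small = [X0], []
    while work:
        X = work.pop()
        if phi_range(X)[0] > best:          # midpoint test: no global minimizer in X
            continue
        m = 0.5 * (X[0] + X[1])
        best = min(best, phi(m))            # improve the upper bound
        if X[1] - X[0] < tol:
            small.append(X)                 # plays the role of the list U
        else:
            work += [(X[0], m), (m, X[1])]
    # Re-apply the test with the final upper bound.
    return best, [X for X in small if phi_range(X)[0] <= best]


best, boxes = midpoint_test_search((-3.0, 3.0))
print(best, len(boxes))
```

The surviving boxes cluster around $-\sqrt{2}$ and $\sqrt{2}$. With only this test and bisection, no box is ever proven to contain a critical point, as discussed below.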

The simplest possible branch and bound algo- 
rithms need to contain both a box rejection mechanism 
and a subdivision mechanism. A common subdivision 
mechanism is to form two sub-boxes by bisecting the 
widest coordinate interval of x (with possible scaling 
factors). Heuristics and scaling factors, as well as several references to the literature, appear in [3, §4.3.2, p. 157 ff]. Alternatives to bisection, such as trisection,
forming two boxes by cutting other than at a midpoint, 
etc. have also been discussed at conferences and studied 
empirically [1]. 
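The common subdivision rule, which bisects the widest (optionally scaled) coordinate, can be sketched as follows. Boxes are lists of (lo, hi) pairs. The function name and the `scale` argument are hypothetical.

```python
def bisect_widest(box, scale=None):
    """Split a box (list of (lo, hi) pairs) at the midpoint of its widest coordinate.

    Optional positive scale factors weight the coordinate widths.
    """
    if scale is None:
        scale = [1.0] * len(box)
    i = max(range(len(box)), key=lambda k: scale[k] * (box[k][1] - box[k][0]))
    lo, hi = box[i]
    mid = 0.5 * (lo + hi)
    left, right = list(box), list(box)
    left[i], right[i] = (lo, mid), (mid, hi)
    return left, right


print(bisect_widest([(0.0, 1.0), (0.0, 4.0)]))
# ([(0.0, 1.0), (0.0, 2.0)], [(0.0, 1.0), (2.0, 4.0)])
```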


Acceleration Tools 


Early and simple algorithms contain only the midpoint 
test mechanism and bisection mechanism described 
above. Such algorithms produce as output a large list 
U of small boxes (with diameters smaller than a stop- 
ping tolerance) and no list C of boxes that contain ver- 
ified critical points. The list U in such algorithms con- 
tains clusters of boxes around actual global optimizers. 
Some Lipschitz constant-based algorithms are of this 
form. Note, however, that such algorithms are of lim- 
ited use in high dimensions, since the number of boxes 
produced increases exponentially in the dimension n. 

Interval computations provide more powerful tools for accelerating the algorithm. For a start, if an interval extension of the gradient $\nabla\phi(\mathbf{x})$ is computable, then $0 \notin \nabla\phi(\mathbf{x})$ implies that $\mathbf{x}$ cannot contain a critical point, and $\mathbf{x}$ can be rejected. This tool for rejecting a box $\mathbf{x}$ is sometimes called the monotonicity test, since $0 \notin (\nabla\phi(\mathbf{x}))_i$ implies φ is monotonic over $\mathbf{x}$ in the ith component $x_i$. Here $(\nabla\phi(\mathbf{x}))_i$ represents the ith component of the interval evaluation of the gradient ∇φ.
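Here is a sketch of the monotonicity test in Python. The objective $\phi(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2$ is a hypothetical example. Its gradient $(2x_1 + x_2,\ x_1 + 2x_2)$ is linear with nonnegative coefficients, so its interval evaluation over a box can be computed directly from the endpoints. Function names are illustrative.

```python
def grad_phi(box):
    """Interval gradient of the hypothetical phi(x1, x2) = x1^2 + x1*x2 + x2^2.

    Both components are linear with nonnegative coefficients, so lower and
    upper endpoints map to lower and upper bounds.
    """
    (a, b), (c, d) = box
    return [(2 * a + c, 2 * b + d), (a + 2 * c, b + 2 * d)]


def monotonicity_test(grad_box):
    """True if some gradient component excludes 0, so the box holds no critical point."""
    return any(lo > 0.0 or hi < 0.0 for lo, hi in grad_box)


print(monotonicity_test(grad_phi([(1.0, 2.0), (0.0, 1.0)])))    # True: reject
print(monotonicity_test(grad_phi([(-1.0, 1.0), (-1.0, 1.0)])))  # False: keep
```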


Perhaps the most powerful interval acceleration tool is interval Newton methods, applied to the system ∇φ = 0. Interval Newton methods can result in quadratic convergence to a critical point, in the sense that the widths of the coordinates of the image of $\mathbf{x}$ are proportional to the square of the widths of the coordinates of $\mathbf{x}$. Interval Newton methods also can prove existence and uniqueness of a critical point or nonexistence of a critical point in $\mathbf{x}$. Thus, the need to subdivide a relatively large $\mathbf{x}$ is often eliminated, making a previously impractical algorithm practical. See > Interval Newton methods and > Interval fixed point theory.

For a more detailed algorithm, and for a discussion 
of parallelization of the branch and bound process, see 
> Interval analysis: Parallel methods for global opti- 
mization. 


Differences Between Unconstrained 
and Constrained Optimization 


If $m_1 > 0$ or $m_2 > 0$ in problem (1), then the problem is one of constrained optimization. The midpoint test cannot be applied directly to constrained problems, since $\phi(\check{x})$ is guaranteed to be an upper bound on the global optimum only if the constraints $c(\check{x}) = 0$ and $g(\check{x}) \le 0$ are also satisfied at $\check{x}$. If there are only inequality constraints, and all of them are strictly satisfied (none is active) at $\check{x}$, then an interval evaluation of $g(\check{x})$ will rigorously verify $g(\check{x}) < 0$, and $\check{x}$ can be used in the midpoint test. However, if there are equality constraints (or if one or more of the inequality constraints is active), then an interval evaluation will yield $0 \in c(\check{x})$ (or $0 \in g_i(\check{x})$ for some i), and it cannot be concluded that $\check{x}$ is feasible. In such cases, a small box $\mathbf{x}$ can be constructed about $\check{x}$, and interval Newton methods can be used to verify that $\mathbf{x}$ contains a feasible point. The upper bound of the interval evaluation $\phi(\mathbf{x})$ then serves as an upper bound on the global optimum, for use in the midpoint test. For details and references, see > Interval analysis: Verifying feasibility.

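On the other hand, constraints can be beneficial in eliminating infeasible boxes $\mathbf{x}$. In particular, $\mathbf{x}$ can be rejected if $0 \notin c_i(\mathbf{x})$ for some i, or if $g_j(\mathbf{x}) > 0$ (that is, the lower bound of $g_j(\mathbf{x})$ is positive) for some j.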
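The rejection rules above, together with the interval check for strict feasibility of inequality constraints, can be sketched as a small classification function. It takes already computed interval enclosures of the $c_i$ and $g_j$ over a box. The function name and the returned labels are hypothetical.

```python
def classify_box(c_ranges, g_ranges):
    """Classify a box from interval enclosures of the c_i and g_j over it.

    c_ranges, g_ranges: lists of (lo, hi) pairs, one per constraint.
    """
    if any(lo > 0.0 or hi < 0.0 for lo, hi in c_ranges):
        return "reject"        # 0 not in c_i(x): no point of the box satisfies c_i = 0
    if any(lo > 0.0 for lo, _ in g_ranges):
        return "reject"        # g_j(x) > 0: every point of the box violates g_j <= 0
    if not c_ranges and all(hi < 0.0 for _, hi in g_ranges):
        return "feasible"      # every point satisfies every g_j(x) < 0
    return "undecided"         # e.g. try an interval Newton feasibility proof


print(classify_box([(0.1, 0.3)], []))               # reject
print(classify_box([], [(-2.0, -0.1)]))             # feasible
print(classify_box([(-0.1, 0.2)], [(-1.0, -0.5)]))  # undecided
```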

It is sometimes useful to consider bound constraints of the form $x_i \ge \underline{x}_i$ and $x_i \le \overline{x}_i$ separately from the general inequality constraints $g(x) \le 0$. Such bound constraints can generally coincide with the limits on the search region $\mathbf{x}_0$, but are distinguished from simple search bounds. (It is possible for an unconstrained problem to have no optima within a search region, but it is not possible if all of the search region limits represent bound constraints.) See [3, §5.2.3, p. 180 ff] for details.


Example 1 Consider

$$\begin{aligned}
\min\ & \phi(x) = -(x_1 + x_2)^2\\
\text{s.t. } & c(x) = x_2 - 2x_1 = 0.
\end{aligned} \qquad (2)$$

Example (2) represents a constrained optimization problem with a single equality constraint and no bound constraints or inequality constraints. To apply the midpoint test in a rigorously verified algorithm, a box must first be found in which a feasible point is verified to exist. Suppose that a point algorithm, such as a generalized Newton method, has been used to find an approximate feasible point, say $\check{x} = (\tfrac14, \tfrac12)^T$. Now observe that $\nabla c = (-2, 1)^T$, so c varies least rapidly in $x_2$. Therefore, as suggested in > Interval analysis: Verifying feasibility, $x_2$ can be held fixed at $x_2 = 1/2$. Thus, to prove existence of a feasible point in a neighborhood of $\check{x}$, an interval Newton method can be applied to $f(x_1) = c(x_1, 0.5) = 0.5 - 2x_1$. We may choose the initial interval $\mathbf{x}_1 = [0.25 - \epsilon, 0.25 + \epsilon]$ with ε = 0.1, to obtain

$$\mathbf{x}_1 = [0.15, 0.35],$$

$$\tilde{\mathbf{x}}_1 = 0.25 - \frac{0}{-2} = [0.25, 0.25] \subseteq \mathbf{x}_1.$$

This computation proves that, for $x_2 = 0.5$, there is a feasible point for $x_1 \in [0.25, 0.25]$. (See > Interval Newton methods and > Interval fixed point theory.) We may now evaluate φ over the box $([0.25, 0.25], [0.5, 0.5])^T$, which is degenerate in the second coordinate and, for this example, also happens to be degenerate in the first coordinate. We thus obtain

$$\phi([0.25, 0.25], [0.5, 0.5]) = -\Bigl(\frac{3}{4}\Bigr)^2 = -\frac{9}{16},$$

and −9/16 has been proven to be an upper bound on the global optimum for example problem (2).
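The verification in Example 1 can be replayed in a few lines of Python. This sketch assumes that outward rounding is emulated with `math.nextafter`. Because $f'(x_1) = -2$ is constant, the interval derivative is degenerate. The variable and helper names are illustrative.

```python
import math


def dn(v):
    return math.nextafter(v, -math.inf)


def up(v):
    return math.nextafter(v, math.inf)


X2 = 0.5                                 # x2 held fixed at 1/2
EPS = 0.1
X1 = (dn(0.25 - EPS), up(0.25 + EPS))    # initial interval [0.25 - eps, 0.25 + eps]
D = -2.0                                 # f'(x1) = -2 on all of X1 (degenerate)


def f_point(x1):
    """Enclosure of f(x1) = c(x1, 0.5) = 0.5 - 2*x1 at a point (2*x1 is exact)."""
    v = X2 - 2.0 * x1
    return dn(v), up(v)


# One interval Newton step: N(X1) = y - f(y) / f'(X1), with y the midpoint.
y = 0.25
f_lo, f_hi = f_point(y)
q_lo, q_hi = sorted((f_lo / D, f_hi / D))
N = (dn(y - up(q_hi)), up(y - dn(q_lo)))
feasible_proven = X1[0] <= N[0] and N[1] <= X1[1]

# Upper bound of phi(x) = -(x1 + x2)^2 over the box (N, [0.5, 0.5]):
# x1 + x2 is positive here, so -(x1 + x2)^2 is largest where x1 + x2 is smallest.
s_lo = dn(N[0] + X2)
phi_upper = -dn(s_lo * s_lo)
print(feasible_proven, N, phi_upper)
```

The printed upper bound is a floating-point number just above −9/16, slightly weaker than the exact value because of the outward rounding.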
Introduction


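Constrained optimization problems are of the form

$$\begin{aligned}
\min\ & \phi(x)\\
\text{s.t. } & c_i(x) = 0, && i = 1, \ldots, m,\\
& g_j(x) \le 0, && j = 1, \ldots, q_1,\\
& x_{i_k} \ge \underline{x}_{i_k}, && k = 1, \ldots, p,\\
& x_{i_k} \le \overline{x}_{i_k}, && k = p + 1, \ldots, q_2,
\end{aligned} \qquad (1)$$

where $\phi: \mathbb{R}^n \to \mathbb{R}$, $c_i: \mathbb{R}^n \to \mathbb{R}$, and $g_j: \mathbb{R}^n \to \mathbb{R}$.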
In interval branch and bound algorithms for finding 
global optima for problem (1), a search box of the form 


$$\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^T, \quad \mathbf{x}_i = [\underline{x}_i, \overline{x}_i], \qquad (2)$$

is generally given. Some of the sides in (2) correspond to bound constraints of problem (1), and the other sides merely define the extent of the search region. If there are no constraints $c_i$ and $g_j$, then the box $\mathbf{x}$ is systematically tessellated into sub-boxes. The branch and bound algorithm, in its most basic form, proceeds as follows. Over each sub-box $\mathbf{x}$, $\phi(\check{x})$ is computed for some $\check{x} \in \mathbf{x}$, and the range of φ over $\mathbf{x}$ is bounded (e.g. with a straightforward interval evaluation). If there are no constraints $c_i$ and $g_j$, then the value $\phi(\check{x})$ represents an upper bound on the minimum of φ. The minimum such value $\overline{\phi}$ is kept as the tessellation and search proceed. If any box $\tilde{\mathbf{x}}$ has a lower range bound greater than $\overline{\phi}$, it is rejected as not containing a global optimum. See [1,2], or [3] for details of such algorithms.


The situation is more complicated in the constrained case. In particular, the values $\phi(\check{x})$ cannot be taken as upper bounds on the global optimum unless it is known that $\check{x}$ is feasible. More generally, an upper bound on the range of φ over a small box $\mathbf{x}$ can be taken as an upper bound for the global optimum provided it is proven that there exists a feasible point of problem (1) within $\mathbf{x}$. This article outlines how this can be done.


General Feasibility: the Fritz John Conditions 


An interval Newton method (see > Interval Newton methods) can sometimes be used to prove existence of a feasible point of problem (1) that is a critical point of φ. In particular, the interval Newton method can sometimes prove existence of a solution to the Lagrange multiplier or Fritz John system within $\mathbf{x}$. For the Fritz John system, it is convenient to consider the $q_2$ bound constraints in the same form as the $q_1$ general inequality constraints, so that there are $q = q_1 + q_2$ general inequality constraints of the form $g_j(x) \le 0$. With that, the Fritz John system can be written as

$$F(x, u, v) = \begin{pmatrix}
u_0 \nabla\phi(x) + \sum_{j=1}^{q} u_j \nabla g_j(x) + \sum_{i=1}^{m} v_i \nabla c_i(x)\\
u_1 g_1(x)\\
\vdots\\
u_q g_q(x)\\
c_1(x)\\
\vdots\\
c_m(x)\\
\bigl(u_0 + \sum_{j=1}^{q} u_j + \sum_{i=1}^{m} v_i^2\bigr) - 1
\end{pmatrix} = 0, \qquad (3)$$

where $u_j \ge 0$, j = 1, ..., q, the $v_i$ are unconstrained, and the last equation is one of several possible normalization conditions. For details, see [1, §10.5] or [2, §5.2.5].

However, computational problems occur in practice with the system (3). It is more difficult to find a good approximate critical point (for an appropriate small box $\mathbf{x}$) of the entire system (3) than it is to find a point where the inequality and equality constraints are satisfied. Furthermore, if an interval Newton method is applied to (3) over a large box, the corresponding interval Jacobian matrix or slope matrix typically contains singular matrices and hence is useless for existence verification. This is especially true if it is difficult to get good estimates for the Lagrange multipliers $u_j$ and $v_i$. For this reason, the techniques outlined below are useful.


Feasibility of Inequality Constraints 


Proving feasibility of the inequality constraints is sometimes possible by evaluating the $g_j$ with interval arithmetic. If $g_j(\mathbf{x}) < 0$ (that is, the upper bound of the interval evaluation is negative), then every point in $\mathbf{x}$ is feasible with respect to the constraint $g_j(x) \le 0$; see [3]. However, if $\mathbf{x}$ corresponds to a point at which $g_j$ is active, then $0 \in g_j(\mathbf{x})$, and no conclusion can be reached from an interval evaluation. In such cases, feasibility can sometimes be proven by treating $g_j(x) = 0$ as one of the equality constraints, then using the techniques below.


Infeasibility 


An inequality constraint $g_j$ is proven infeasible over $\mathbf{x}$ if $g_j(\mathbf{x}) > 0$, and an equality constraint $c_i$ is infeasible over $\mathbf{x}$ if either $c_i(\mathbf{x}) > 0$ or $c_i(\mathbf{x}) < 0$. See [3].


Feasibility of Equality Constraints 
The equality constraints

$$c(x) = (c_1(x), \ldots, c_m(x))^T = 0,$$

with $c: \mathbb{R}^n \to \mathbb{R}^m$ and n > m, can be considered an underdetermined system of equations, whereas interval Newton methods generally prove existence and/or uniqueness for square systems. However, fixing n − m coordinates $x_i \in \mathbf{x}_i$ allows interval Newton methods to work with $c: \mathbb{R}^m \to \mathbb{R}^m$, to prove existence of a feasible point within $\mathbf{x}$. In principle, the coordinates to be held fixed are chosen to be those in which c varies least rapidly. For a set of test problems, the most successful way appears to be choosing those coordinates corresponding to the rightmost columns after Gaussian elimination with complete pivoting has been applied to the rectangular matrix $c'(\check{x})$ for some $\check{x} \in \mathbf{x}$. Figure 1 illustrates the process in two dimensions.
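The coordinate-selection heuristic can be sketched as Gaussian elimination with complete pivoting on an m × n point Jacobian $c'(\check{x})$, keeping track of the column permutations. The columns that end up rightmost (the last n − m) are the coordinates to hold fixed, at their values at $\check{x}$. This Python sketch is hypothetical: the name `columns_to_fix` is illustrative, and it uses dense lists with no scaling.

```python
def columns_to_fix(J):
    """Gaussian elimination with complete pivoting on an m x n matrix J (m < n).

    Returns (pivot columns, columns to hold fixed); the latter are the n - m
    columns that end up rightmost after the column permutations.
    """
    A = [[float(v) for v in row] for row in J]
    m, n = len(A), len(A[0])
    cols = list(range(n))              # cols[k]: original index of current column k
    for k in range(m):
        # Complete pivoting: largest |entry| in the trailing submatrix.
        p, q = max(((i, j) for i in range(k, m) for j in range(k, n)),
                   key=lambda ij: abs(A[ij[0]][ij[1]]))
        if A[p][q] == 0.0:
            raise ValueError("constraint Jacobian is rank deficient")
        A[k], A[p] = A[p], A[k]        # row interchange
        for row in A:                  # column interchange
            row[k], row[q] = row[q], row[k]
        cols[k], cols[q] = cols[q], cols[k]
        for i in range(k + 1, m):      # eliminate below the pivot
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
    return cols[:m], cols[m:]


print(columns_to_fix([[-2.0, 1.0]]))   # ([0], [1]): hold x_2 fixed
```

For the 1 × 2 Jacobian (−2, 1), the largest entry lies in the first column, so the second coordinate is the one held fixed.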

Certain complications arise. For example, if bound 
constraints or inequality constraints are active, then ei- 
ther the point x must be perturbed or else the bound or 
inequality constraints must be treated as equality con- 
straints. Handling this case by perturbation is discussed 
in [2, p. 191ff]. 


Interval Analysis: Verifying Feasibility, Figure 1
Proving that there exists a feasible point of an underdetermined constraint system (plot labels: center of box; first coordinate is held fixed at center of box, second coordinate varies; this point is proven to exist)


For the original explanation of the Gaussian 
elimination-based process, see [1, §12.4]. In [2, §5.2.4], 
additional background, discussion, and references ap- 
pear. 
Interval constraint processing is an alternative technology designed to process sets of (generally nonlinear) continuous or mixed constraints over the real numbers. It combines propagation and search techniques developed in artificial intelligence with methods from interval analysis.

Interval constraints are used in the design of the 
constraint solving and optimization engines of most 
modern constraint programming languages and have 
been used to solve industrial applications in areas like 
mechanical design, chemistry, aeronautics, medical di- 
agnosis or image synthesis. 

The term interval constraint is a generic term denoting a constraint (that is, a first-order atomic formula such as an equation, an inequation or, more generally, a relation) in which variables are associated with intervals. These intervals denote domains of possible values for these variables. In general, intervals are defined over the real numbers, but the concept is general enough to address other constraint domains (e.g. nonnegative integers, Booleans, lists, sets, etc.). When defined over the real numbers, interval constraint sets are often called continuous or numerical constraint satisfaction problems.

The main idea underlying interval constraint processing (also called interval propagation) is, given a set of constraints $S$ involving variables $\{v_1, \dots, v_n\}$ and a set of floating point intervals $\{I_1, \dots, I_n\}$ representing the domains of possible values of the variables, to isolate a set of $n$-ary canonical boxes (Cartesian products of subintervals of the $I_i$ whose bounds are either equal or consecutive floating point numbers) approximating the solution space of the constraint system. To compute such a set, a search procedure navigates through the Cartesian product $I_1 \times \cdots \times I_n$, alternating pruning and branching steps.

The pruning step uses a relational form of interval 
arithmetic [1,11]. Given a set of constraints over the re- 
als, interval arithmetic is used to compute local approx- 
imations of the solution space for a given constraint. 
This approximation results in the elimination of val- 
ues from the domains of the variables and these do- 
main modifications are propagated through the whole 
constraint set until reaching a stable state. This stable 
state is closely related to the notion of arc consistency 
[9,10], a well-known concept in artificial intelligence. 
The branching step consists of a bisection-based divide-and-conquer procedure to which a number of strategies and heuristics can be applied.

Interval constraints were first introduced by J.G. Cleary in [5]. The initial goal was to address the incorrectness of floating point numerical computations in the Prolog language while introducing a relational form of arithmetic better adapted to the formal model of the language. These ideas, clearly connected to the concepts developed in constraint logic programming [6,7], were then implemented in BNR-Prolog [12]. Since then, many other constraint languages and systems have used interval constraints as their basic constraint solving engine, for example CLP(BNR) [4], Prolog IV [13] or Numerica [16].

In the interval framework, the basic data structure 
is a set of ordered pairs of numbers taken in a finite set 
of particular rational numbers augmented with the two 
infinities (this set generally coincides with a set of IEEE 
floating point numbers). Such a pair, called a floating 
point interval or, more concisely, an interval, denotes, 
as expected, the set of real numbers in between the
lower and upper bounds. Operations and relations over 
the reals can be lifted to intervals (using floating point 
operations and outward rounding) so as to keep nu- 
merical errors under control. In particular, correctness 
of computations is guaranteed by a fundamental theo- 
rem due to R.E. Moore [11]. 

Assuming a finite set of intervals closed under intersection, every relation $\rho$ over $\mathbb{R}$ can be approximated by its interval enclosure (i.e. the intersection of all intervals containing it). The approximation of any $n$-ary relation is then defined as the Cartesian product of the approximations of its projections. These Cartesian products of intervals are called boxes. The set of boxes, partially ordered by inclusion, is the complete lattice made of the fixed points of the closure operator that maps $n$-ary relations over $\mathbb{R}$ to their approximations. The intersection of all boxes containing an $n$-ary relation defines an outer approximation notion. A dual notion of inner approximation can be defined as the union of all boxes contained in the relation.

Given a finite set of constraints S and an initial n- 
ary box X representing the domains (intervals) for all 
variables occurring in $S$, every constraint in $S$ represents an $n$-ary relation $\rho$ (modulo an appropriate cylindrification). The main idea is then to compute a box
approximating the solution set defined by S and X. In 
the interval constraint framework, this approximation 
is generally computed by applying the following algo- 
rithm, called here NC3 to reflect its similarity to the arc 
consistency algorithm AC3 [10]. 

The call to the function narrow in NC3 is an algorithmic narrowing process. Every constraint $c$ in $S$ and its corresponding relation $\rho$ are associated with an operator $N_c$, called a constraint narrowing operator, mapping boxes to boxes and satisfying the properties of correctness, contractance, monotonicity and idempotence, that is, for all boxes $X$, $X'$:

1) $X \cap \rho \subseteq N_c(X)$;

2) $N_c(X) \subseteq X$;

3) $X \subseteq X'$ implies $N_c(X) \subseteq N_c(X')$;

4) $N_c(N_c(X)) = N_c(X)$.

When such operators are associated with the constraints of a set $S$, the function narrow$(X, c)$ simply returns $N_c(X)$. The algorithm stops when a stable state is reached, i.e. when no (strict) narrowing is possible with respect to any constraint. The result of the main step is to remove (some) incompatible values from the domains of the variables occurring in $c$.

function NC3()
  input: S, a (nonempty) constraint system,
         X, a (nonempty) box
  output: X' ⊆ X
  Queue all constraints from S in Q
  REPEAT
    select a constraint c from Q
    % Narrow down X with respect to c
    X' ← narrow(X, c)
    % if X' is empty, S is inconsistent
    IF X' = ∅ THEN return ∅
    % Queue the constraints whose variables'
    % domains have changed. Delete c from Q
    Let S' = {c' ∈ S : ∃v ∈ var(c'), X'_v ⊊ X_v}
    Q ← (Q ∪ S') \ {c}
    X ← X'
  UNTIL Q is empty
  return X
END % NC3

NC3: A generic narrowing algorithm

Furthermore, it can be shown that NC3 terminates, is correct (the final box contains all solutions of the initial system included in $X$), is confluent (the selection of constraints in the main loop is strategy independent) and computes the greatest common fixed point of the constraint narrowing operators that is included in the initial box [2].
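The NC3 loop can be sketched in Python for systems made only of primitive constraints $x + y = z$. This is a hypothetical illustration under simplifying assumptions: boxes are dictionaries mapping variable names to (lo, hi) pairs, endpoints are plain floats without outward rounding, and the narrowing operator `narrow_sum` repeats its projections until nothing changes, so that it is idempotent as required above.

```python
from collections import deque


def narrow_sum(box, x, y, z):
    """Narrowing operator for the primitive constraint x + y = z.

    Applies the three projections until the box stops changing (idempotence).
    Plain floats, no outward rounding: a sketch, not a rigorous solver.
    """
    while True:
        (xl, xh), (yl, yh), (zl, zh) = box[x], box[y], box[z]
        zl, zh = max(zl, xl + yl), min(zh, xh + yh)   # z in x + y
        xl, xh = max(xl, zl - yh), min(xh, zh - yl)   # x in z - y
        yl, yh = max(yl, zl - xh), min(yh, zh - xl)   # y in z - x
        new = dict(box)
        new[x], new[y], new[z] = (xl, xh), (yl, yh), (zl, zh)
        if new == box or any(lo > hi for lo, hi in new.values()):
            return new
        box = new


def nc3(constraints, box):
    """NC3 for constraints given as triples (x, y, z) meaning x + y = z.

    Returns the narrowed box, or None if some domain becomes empty.
    """
    queue = deque(constraints)
    while queue:
        c = queue.popleft()                       # select a constraint c from Q
        new_box = narrow_sum(box, *c)             # X' <- narrow(X, c)
        if any(lo > hi for lo, hi in new_box.values()):
            return None                           # X' is empty: S is inconsistent
        changed = {v for v in c if new_box[v] != box[v]}
        for other in constraints:                 # queue constraints sharing a changed variable
            if other != c and other not in queue and changed & set(other):
                queue.append(other)
        box = new_box                             # X <- X'
    return box


system = [("x", "y", "z"), ("z", "y", "w")]      # x + y = z and z + y = w
start = {"x": (0.0, 10.0), "y": (2.0, 3.0), "z": (0.0, 4.0), "w": (0.0, 5.0)}
print(nc3(system, start))
# {'x': (0.0, 1.0), 'y': (2.0, 3.0), 'z': (2.0, 3.0), 'w': (4.0, 5.0)}
```

Narrowing $w$ through the second constraint shrinks $z$, which re-queues the first constraint and in turn shrinks $x$; processing the constraints in the opposite order reaches the same box, as confluence predicts.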

Over the real numbers, different constraint narrow- 
ing operators can be defined, resulting in different local 
consistency notions. A system is said to be locally con- 
sistent (with respect to a family of constraint narrowing 
operators), if the Cartesian product of the domains of 
its variables is a common fixed point of the constraint 
narrowing operators associated with its constraints. 
The main local consistency notions used in continuous constraint satisfaction problems are first-order local consistencies derived from arc consistency (hull, or 2B, consistency [4,8] and box consistency [3]) and higher-order local consistencies derived from k-consistency (3B- and kB-consistency [8], box(2)-consistency [14]).

More precisely, let apx$(c)$ denote the smallest box enclosing the relation associated with a constraint $c$. The family of constraint narrowing operators $N$ defined by $N_c(X) = \operatorname{apx}(X \cap c)$ for every box $X$ and every constraint $c$ is the support of hull consistency. These operators can be computed for very simple constraints, often called primitive constraints (e.g. $x + y = z$, $\sin(x) = y$, ...); complex constraints are decomposed into primitive constraints, possibly by adding fresh variables.
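To illustrate decomposition into primitive constraints, the Python sketch below handles the hypothetical constraint $x^2 + y = 1$ by introducing a fresh variable $t$ together with the primitives $t = x^2$ and $t + y = 1$. The hull narrowing helpers are illustrative and not part of any particular system; they use plain floats without outward rounding, and a fixed number of sweeps stands in for a proper propagation loop such as NC3.

```python
import math


def meet(a, b):
    """Intersection of intervals a and b given as (lo, hi); lo > hi means empty."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def narrow_sqr(x, t):
    """Hull narrowing for the primitive t = x**2; returns the new (x, t)."""
    a, b = x[0] * x[0], x[1] * x[1]
    x_sq = (0.0 if x[0] <= 0.0 <= x[1] else min(a, b), max(a, b))
    t = meet(t, x_sq)                                # t in x**2
    if t[0] > t[1]:
        return x, t                                  # empty: no solution
    r_lo, r_hi = math.sqrt(max(t[0], 0.0)), math.sqrt(t[1])
    parts = [meet(x, (-r_hi, -r_lo)), meet(x, (r_lo, r_hi))]  # x in -sqrt(t) U sqrt(t)
    parts = [p for p in parts if p[0] <= p[1]]
    if not parts:
        return (1.0, 0.0), t                         # empty x
    return (min(p[0] for p in parts), max(p[1] for p in parts)), t


def narrow_add(x, y, z):
    """One pass of hull narrowing for the primitive x + y = z."""
    z = meet(z, (x[0] + y[0], x[1] + y[1]))          # z in x + y
    x = meet(x, (z[0] - y[1], z[1] - y[0]))          # x in z - y
    y = meet(y, (z[0] - x[1], z[1] - x[0]))          # y in z - x
    return x, y, z


def narrow_decomposed(x, y, sweeps=10):
    """x**2 + y = 1 rewritten as t = x**2 and t + y = 1 with a fresh variable t."""
    t, one = (-math.inf, math.inf), (1.0, 1.0)
    for _ in range(sweeps):
        x, t = narrow_sqr(x, t)
        t, y, one = narrow_add(t, y, one)
    return x, y


print(narrow_decomposed((-10.0, 10.0), (0.75, 2.0)))  # ((-0.5, 0.5), (0.75, 1.0))
```

In this small example the propagation reaches the exact hull of the solution set, $x \in [-0.5, 0.5]$ and $y \in [0.75, 1]$; in general, decomposition can lose precision when a variable occurs several times in the original constraint.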

The definition of box consistency involves the intro- 
duction of projection constraints. Given a multivariate 
constraint c over the reals, the projection constraint of 
c with respect to a variable v is obtained by computing 
an interval extension of the constraint and by replacing 
every variable but v with the interval corresponding to 
its domain. The constraint narrowing operator associated with this projection constraint computes the greatest interval $[a, b]$ such that $[a, a^+]$ and $[b^-, b]$, where $a^+$ (resp. $b^-$) denotes the successor (resp. the predecessor) of $a$ (resp. $b$), verify the projection constraint. Besides not requiring any additional variables, these operators can be computed with an algorithm mixing interval Newton methods (cf. the entry Interval Newton methods), propagation and bisection-based search.
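The box-consistency narrowing of one variable can be sketched in Python as below. It is a simplified sketch under stated assumptions: the floating-point successor $a^+$ and predecessor $b^-$ are replaced by a fixed tolerance `eps`, the search uses plain bisection only (no interval Newton steps), endpoints are not rounded outward, and the projection `F` of $x^2 + y^2 - 1 = 0$ onto $x$, with $y$ replaced by a hypothetical domain $[0.6, 0.8]$, is a made-up example.

```python
def sqr(lo, hi):
    """Exact range of t**2 over [lo, hi] (no outward rounding)."""
    a, b = lo * lo, hi * hi
    return (0.0 if lo <= 0.0 <= hi else min(a, b), max(a, b))


def consistent(F, lo, hi):
    """Projection constraint test: does the interval F([lo, hi]) contain 0?"""
    f_lo, f_hi = F(lo, hi)
    return f_lo <= 0.0 <= f_hi


def leftmost(F, lo, hi, eps):
    """Left end of the leftmost consistent subinterval of width <= eps, or None."""
    if not consistent(F, lo, hi):
        return None
    if hi - lo <= eps:
        return lo
    mid = 0.5 * (lo + hi)
    left = leftmost(F, lo, mid, eps)
    return left if left is not None else leftmost(F, mid, hi, eps)


def rightmost(F, lo, hi, eps):
    """Right end of the rightmost consistent subinterval of width <= eps, or None."""
    if not consistent(F, lo, hi):
        return None
    if hi - lo <= eps:
        return hi
    mid = 0.5 * (lo + hi)
    right = rightmost(F, mid, hi, eps)
    return right if right is not None else rightmost(F, lo, mid, eps)


def box_narrow(F, lo, hi, eps=1e-9):
    """Narrow [lo, hi] so that both end pieces satisfy the projection constraint."""
    a = leftmost(F, lo, hi, eps)
    if a is None:
        return None                      # the constraint has no solution in [lo, hi]
    return a, rightmost(F, lo, hi, eps)


Y = (0.6, 0.8)                           # hypothetical domain of y


def F(lo, hi):
    """Projection of x**2 + y**2 - 1 = 0 onto x, with y replaced by Y."""
    x2, y2 = sqr(lo, hi), sqr(*Y)
    return (x2[0] + y2[0] - 1.0, x2[1] + y2[1] - 1.0)


print(box_narrow(F, 0.0, 2.0))           # approximately (0.6, 0.8)
```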

Higher-order local consistencies are based on their first-order counterparts. Operationally, the idea is to improve the accuracy of the enclosures by eliminating subintervals of the locally consistent domains that can be detected as locally inconsistent. The general procedure is as follows. Consider a hull consistent interval constraint system $(S, X)$. An equivalent 3B-consistent system is a system $(S, X')$ such that, for every variable $v$, if $X'_v = [a, b]$, the system $(S, X'_{v \leftarrow [a, a^+]})$ (resp. $(S, X'_{v \leftarrow [b^-, b]})$) is hull consistent, where $X'_{v \leftarrow I}$ denotes the Cartesian product $X'$ in which $X'_v$ is replaced with $I$. Box(2)-consistency is defined in the same manner with respect to box consistency. The computational cost of higher-order local consistencies is generally high, but, thanks to the local gain in accuracy, they were shown to outperform most existing techniques on several challenging problems (see, for example, the circuit design problem in [14]).

Finally, the above mentioned interval constraint 
techniques are also used for unconstrained and con- 
strained optimization problems (see for example 
[15,16]). 
Interval methods (interval Newton methods and the
Krawczyk method) can be used to prove existence and 
uniqueness of solutions to linear and nonlinear finite- 
dimensional and infinite-dimensional systems, given 
floating-point approximations to such solutions. (See the entry Interval Newton methods and [6,8].) In turn, these existence-proving interval operators have a close relationship with the classical theory of fixed-point iteration. This relationship is sketched here.


Classical Fixed Point Theory 
and Interval Arithmetic 


Various fixed point theorems, applicable in finite- 
or infinite-dimensional spaces, state roughly that, if 
a mapping maps a set into itself, then that mapping has 
a fixed point within that set. For example, the Brouwer 
fixed point theorem states that, if $D$ is homeomorphic to the closed unit ball in $\mathbb{R}^n$ and $P$ is a continuous mapping such that $P$ maps $D$ into $D$, then $P$ has a fixed point in $D$, that is, there is an $x \in D$ with $x = P(x)$.

Interval arithmetic can be naturally used to test the 
hypotheses of the Brouwer fixed point theorem. An interval extension $\mathbf{P}$ of $P$ has the property that, if $\mathbf{x}$ is an interval vector with $\mathbf{x} \subseteq D$, then $\mathbf{P}(\mathbf{x})$ contains the range $\{P(x) : x \in \mathbf{x}\}$, and an interval extension $\mathbf{P}$ can be obtained simply by evaluating $P$ with interval arithmetic. Furthermore, with outward roundings, this evaluation can be carried out so that the floating point intervals (whose end points are machine numbers) rigorously contain the actual range of $P$. Thus, if $\mathbf{P}(\mathbf{x}) \subseteq \mathbf{x}$, one can conclude that $P$ has a fixed point within $\mathbf{x}$.
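A minimal Python sketch of this test for the hypothetical scalar map $P(x) = (x^2 + 1)/4$. The interval extension is hand-written, and since no outward rounding is performed, a positive answer here is only indicative, not a computer proof.

```python
import math


def P_interval(lo, hi):
    """Natural interval extension of P(x) = (x**2 + 1) / 4, with an exact square."""
    a, b = lo * lo, hi * hi
    sq_lo = 0.0 if lo <= 0.0 <= hi else min(a, b)
    return ((sq_lo + 1.0) / 4.0, (max(a, b) + 1.0) / 4.0)


def has_fixed_point(P, lo, hi):
    """True if P([lo, hi]) is contained in [lo, hi] (Brouwer's hypothesis)."""
    p_lo, p_hi = P(lo, hi)
    return lo <= p_lo and p_hi <= hi


print(P_interval(0.0, 1.0))                   # (0.25, 0.5)
print(has_fixed_point(P_interval, 0.0, 1.0))  # True: P maps [0, 1] into itself
print(has_fixed_point(P_interval, 0.0, 0.2))  # False: the test is inconclusive
print(2.0 - math.sqrt(3.0))                   # the actual fixed point, about 0.268
```

The fixed point $2 - \sqrt{3}$ indeed lies in $\mathbf{P}([0, 1]) = [0.25, 0.5]$; a failed inclusion test proves nothing either way.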

Another fixed point theorem, Miranda’s theorem, 
follows from the Brouwer fixed point theorem, and is 
directly useful in theoretical studies of several interval 
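methods. Miranda's theorem is most easily stated with the notation of interval computations. Suppose $\mathbf{x} \subset \mathbb{R}^n$ is an interval vector, and for each $i$, consider the lower $i$th face $\underline{\mathbf{x}}_i$ of $\mathbf{x}$, defined to be the interval vector all of whose components except the $i$th component are those of $\mathbf{x}$, and whose $i$th component is the lower bound $\underline{x}_i$ of the $i$th component $\mathbf{x}_i$ of $\mathbf{x}$. Define the upper $i$th face $\overline{\mathbf{x}}_i$ of $\mathbf{x}$ similarly. Let $P: \mathbf{x} \to \mathbb{R}^n$, $P(x) = (P_1(x), \dots, P_n(x))$, be continuous, and let $\mathbf{P} = (\mathbf{P}_1, \dots, \mathbf{P}_n)$ be any interval extension of $P$. Miranda's theorem states that, if

$$\mathbf{P}_i(\underline{\mathbf{x}}_i)\, \mathbf{P}_i(\overline{\mathbf{x}}_i) \le 0 \quad \text{for } i = 1, \dots, n, \qquad (1)$$

then $P$ has a zero within $\mathbf{x}$.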
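The sign condition (1) can be checked with a few interval evaluations on the faces of the box. The Python sketch below uses a hypothetical minimal interval class `Iv` without outward rounding and a made-up map $P(x) = (x_1 - 0.2 x_2^2,\ x_2 + 0.1 x_1)$; when the check succeeds, the conclusion of Miranda's theorem applies to the box.

```python
class Iv:
    """Minimal closed interval [lo, hi] (plain floats, no outward rounding)."""

    def __init__(self, lo, hi=None):
        self.lo, self.hi = lo, lo if hi is None else hi

    def __add__(self, other):
        other = lift(other)
        return Iv(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Iv(self.lo - other.hi, self.hi - other.lo)

    def __rsub__(self, other):
        return lift(other) - self

    def __mul__(self, other):
        other = lift(other)
        p = (self.lo * other.lo, self.lo * other.hi, self.hi * other.lo, self.hi * other.hi)
        return Iv(min(p), max(p))

    __rmul__ = __mul__


def lift(v):
    return v if isinstance(v, Iv) else Iv(float(v))


def miranda_holds(P, box):
    """Check P_i(lower i-th face) * P_i(upper i-th face) <= 0 for every i."""
    for i in range(len(box)):
        lower = list(box)
        lower[i] = Iv(box[i].lo)            # i-th component fixed at its lower bound
        upper = list(box)
        upper[i] = Iv(box[i].hi)            # i-th component fixed at its upper bound
        if (P(lower)[i] * P(upper)[i]).hi > 0.0:
            return False
    return True


def P(x):
    """Hypothetical example map P(x) = (x1 - 0.2*x2**2, x2 + 0.1*x1)."""
    return [x[0] - 0.2 * (x[1] * x[1]), x[1] + 0.1 * x[0]]


print(miranda_holds(P, [Iv(-1.0, 1.0), Iv(-1.0, 1.0)]))  # True
print(miranda_holds(P, [Iv(0.5, 1.0), Iv(-1.0, 1.0)]))   # False: no conclusion
```

Note that $x_2^2$ is evaluated as $x_2 \cdot x_2$, which overestimates its range; the condition still holds here, but sharper extensions make the test succeed more often.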


The Krawczyk Method and Fixed Point Theory 


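R.E. Moore provided one of the earlier careful analyses of interval Newton methods in [5]. There, the Krawczyk method was analyzed as follows. The chord method is defined as

$$P(x) = x - Y f(x), \qquad (2)$$

where the iteration matrix is normally taken to be $Y = (f'(\check{x}))^{-1}$ for some Jacobi matrix $f'(\check{x})$ with $\check{x} \in \mathbf{x}$, and where solutions of $f(x) = 0$, $f: D \subseteq \mathbb{R}^n \to \mathbb{R}^n$, are sought. A mean value extension is then used:

$$P(x) \in P(\check{x}) + \mathbf{P}'(\mathbf{x})(\mathbf{x} - \check{x}) \quad \text{for all } x \in \mathbf{x},$$

whence

$$\begin{aligned} K(\mathbf{x}, \check{x}) = \mathbf{P}(\mathbf{x}) &= P(\check{x}) + \mathbf{P}'(\mathbf{x})(\mathbf{x} - \check{x}) \\ &= \check{x} - Y f(\check{x}) + \big(I - Y \mathbf{f}'(\mathbf{x})\big)(\mathbf{x} - \check{x}) \end{aligned} \qquad (3)$$

is an interval extension of $P$. Thus, the fact that the range of $P$ obeys

$$\{P(x) : x \in \mathbf{x}\} \subseteq \mathbf{P}(\mathbf{x}) = K(\mathbf{x}, \check{x}),$$

coupled with the Brouwer fixed point theorem, implies that, if

$$K(\mathbf{x}, \check{x}) \subseteq \mathbf{x},$$

then there exists a fixed point of $P$, and hence a solution $x^* \in K(\mathbf{x}, \check{x})$ with $f(x^*) = 0$.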

By analyzing the norm $\| I - Y \mathbf{f}'(\mathbf{x}) \|$, Moore further concludes, basically, that if

$$\| I - Y \mathbf{f}'(\mathbf{x}) \| < 1,$$

then any solution $x^* \in \mathbf{x}$ must be unique; for an exact statement and details, see [5].
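For a scalar equation the Krawczyk operator (3) and the norm criterion reduce to a few lines. The Python sketch below assumes the hypothetical example $f(x) = x^2 - 2$ on $\mathbf{x} = [1.3, 1.5]$, takes $\check{x}$ as the midpoint, and uses plain floating point without outward rounding, so it illustrates the test rather than proving anything.

```python
import math


def imul(a, b):
    """Product of intervals a = (lo, hi) and b = (lo, hi)."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def krawczyk(f, df, lo, hi):
    """Return K(x, xc) and max |1 - Y f'(x)| for the box x = [lo, hi].

    df(lo, hi) must enclose the range of f' over [lo, hi];
    Y = 1 / f'(xc) is the (point) iteration "matrix" of the chord method.
    """
    xc = 0.5 * (lo + hi)
    Y = 1.0 / df(xc, xc)[0]
    yd = imul((Y, Y), df(lo, hi))                 # Y f'(x)
    m = (1.0 - yd[1], 1.0 - yd[0])                # I - Y f'(x)
    term = imul(m, (lo - xc, hi - xc))            # (I - Y f'(x)) (x - xc)
    center = xc - Y * f(xc)
    return (center + term[0], center + term[1]), max(abs(m[0]), abs(m[1]))


def f(x):
    return x * x - 2.0


def df(lo, hi):
    return (2.0 * lo, 2.0 * hi)                   # encloses f'(x) = 2x when lo >= 0


K, contraction = krawczyk(f, df, 1.3, 1.5)
print(K)            # approximately (1.4071, 1.4214), contained in [1.3, 1.5]
print(contraction)  # approximately 0.0714 < 1, so the norm criterion holds
```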


Interval Newton Methods and Fixed Point Theory 


Traditional interval Newton methods are of the form

$$N(f, \mathbf{x}, \check{x}) = \check{x} + \mathbf{v}, \qquad (4)$$

where $\mathbf{v}$ is an interval vector that contains all solutions $v$ to point systems $Av = -f(\check{x})$, for $A \in \mathbf{f}'(\mathbf{x})$, where $\mathbf{f}'(\mathbf{x})$ is either an interval extension of the Jacobi matrix of $f$ over $\mathbf{x}$ or an interval slope matrix; see the entry Interval Newton methods. [7, Thm. 5.1.7] asserts that, if $N(f, \mathbf{x}, \check{x}) \subseteq \operatorname{int} \mathbf{x}$, where $\mathbf{f}'(\mathbf{x})$ is a 'Lipschitz set' for $f$, $\operatorname{int} \mathbf{x}$ denotes the interior of $\mathbf{x}$, and $\check{x} \in \operatorname{int}(\mathbf{x})$, then there is a solution of $f(x) = 0$ within $N(f, \mathbf{x}, \check{x})$, and this solution is unique within $\mathbf{x}$. Classical fixed point theory is used in the succinct proof of this general theorem.
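In one dimension, the interval $\mathbf{v}$ in (4) is simply $-f(\check{x})$ divided by the derivative enclosure, so the existence and uniqueness test $N(f, \mathbf{x}, \check{x}) \subseteq \operatorname{int} \mathbf{x}$ can be sketched as follows. This Python fragment is illustrative only: `newton_operator` and the example $f(x) = x^2 - 2$ are hypothetical, the derivative enclosure must not contain zero, and no outward rounding is done.

```python
import math


def newton_operator(f, df, lo, hi):
    """N(f, x, xc) = xc + v with v = -f(xc) / f'(x), xc the midpoint of x = [lo, hi]."""
    xc = 0.5 * (lo + hi)
    d_lo, d_hi = df(lo, hi)                       # enclosure of f' over x
    if d_lo <= 0.0 <= d_hi:
        raise ZeroDivisionError("0 in f'(x): extended division would be needed")
    q = (-f(xc) / d_lo, -f(xc) / d_hi)            # all solutions v of A v = -f(xc)
    return xc + min(q), xc + max(q)


def proves_unique_zero(f, df, lo, hi):
    """True if N(f, x, xc) lies in the interior of x = [lo, hi]."""
    n_lo, n_hi = newton_operator(f, df, lo, hi)
    return lo < n_lo and n_hi < hi


def f(x):
    return x * x - 2.0


def df(lo, hi):
    return (2.0 * lo, 2.0 * hi)                   # encloses f'(x) = 2x when lo >= 0


print(newton_operator(f, df, 1.3, 1.5))     # approximately (1.41333, 1.41538)
print(proves_unique_zero(f, df, 1.3, 1.5))  # True
print(proves_unique_zero(f, df, 2.0, 3.0))  # False
```

For the box $[2, 3]$ the operator lands entirely outside $\mathbf{x}$; since every zero in $\mathbf{x}$ must also lie in $N(f, \mathbf{x}, \check{x})$, the empty intersection in fact shows that this box contains no zero.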

When the interval Gauss-Seidel method is used to 
find the solution set bounds v, a very clear correspon- 
dence to Miranda’s theorem can be set up. This is done 
in [3]. 


Uniqueness 


In classical fixed point theory, the contractive mapping 
theorem (a nongeneric property) is often used to prove 
uniqueness. For example, suppose P is Lipschitz with 
Lipschitz constant $L < 1$, that is,

$$\|P(x) - P(y)\| \le L \|x - y\| \quad \text{for some } L < 1. \qquad (5)$$

Then $x = P(x)$ and $y = P(y)$ imply $\|x - y\| = \|P(x) - P(y)\| \le L \|x - y\|$, which, since $L < 1$, can only happen if $x = y$. (This argument appears in many elementary numerical analysis texts, such as [4].)

An alternate proof of uniqueness involves nonsin- 
gularity (i.e., regularity) of the mapping f for which we 
seek $x$ with $f(x) = 0$. In particular, if $f(x) = Ax$ is linear, corresponding to a nonsingular matrix $A$, then $f(x) = 0$ and $f(y) = 0$ imply

$$0 = f(x) - f(y) = Ax - Ay = A(x - y), \qquad (6)$$

whence nonsingularity of $A$ implies $x - y = 0$, i.e. $x = y$.

Without interval arithmetic, the argument in (6) 
cannot be generalized easily to nonlinear systems. Ba- 
sically, invertibility implies uniqueness, and one must 
somehow prove invertibility. However, with interval 
arithmetic, uniqueness follows directly from an equa- 
tion similar to (6), and regularity can be proven di- 
rectly with an interval Newton method. In particular, 
if the image under the interval Newton method (4) is bounded, then every point matrix $A \in \mathbf{f}'(\mathbf{x})$ must be nonsingular. (This is because the bounds on the solution set of the linear system $\mathbf{f}'(\mathbf{x})v = -f(\check{x})$ must contain the set of solutions to all systems of the form $Av = -f(\check{x})$, $A \in \mathbf{f}'(\mathbf{x})$.) Then the mean value theorem implies that, for every $x \in \mathbf{x}$, $y \in \mathbf{x}$,

$$f(x) - f(y) = A(x - y) \quad \text{for some } A \in \mathbf{f}'(\mathbf{x}). \qquad (7)$$

This holds even though, in (7), $A$ is in general not equal to $f'(\xi)$ for any single $\xi \in \mathbf{x}$. In fact, (7) follows from considering $f$ componentwise:

$$f_i(y) = f_i(x) + (\nabla f_i(c_i))^{T} (y - x),$$

for some $c_i$, different for each $i$, on the line segment connecting $x$ and $y$; the matrix $A \in \mathbf{f}'(\mathbf{x})$ can be taken to have its $i$th row equal to $(\nabla f_i(c_i))^{T}$. Thus, because of the nonsingularity of $A$ in (7), $f(x) = 0$ and $f(y) = 0$ imply $0 = A(x - y)$, and hence $x = y$.

Summarizing the actual results: if

$$N(f, \mathbf{x}, \check{x}) \subseteq \operatorname{int} \mathbf{x}, \qquad (8)$$

where $N(f, \mathbf{x}, \check{x})$ is as in (4), then classical fixed-point theory combined with properties of interval arithmetic implies that there is a unique solution to $f(x) = 0$ in $N(f, \mathbf{x}, \check{x})$, and hence in $\mathbf{x}$.

If slope matrices are used in place of an interval Ja- 
cobi matrix f’(x), then (7) no longer holds, and (8) no 
longer implies uniqueness. However, a two-stage pro- 
cess, involving evaluation of an interval derivative over 
a small box containing the solution and evaluation of 
a slope matrix over a large box containing the small box, 
leads to an even more powerful existence and unique- 
ness test than using interval Jacobi matrices alone. This 
technique perhaps originally appeared in [9]. A state- 
ment and proof of the main theorem can also be found 
in [3, Thm. 1.23, p. 64]. 


Infinite-Dimensional Problems 


Many problems in infinite-dimensional spaces (e.g. certain variational optimization problems) can be written in the form of a compact operator fixed point equation, $x = P(x)$, where $P: S \to S$ is some compact operator on some normed linear space $S$. In many such cases, $P$ is approximated numerically from a finite-dimensional space of basis functions $\{\phi_i : i = 1, \dots, n\}$ (e.g. splines or finite element basis functions $\phi_i$), and the approximation error can be computed. That is, $P(x) = P_n(y) + R_n(y)$, where $y \in \mathbb{R}^n$ is an approximation to $x \in S$, and $R_n(y)$ is the error, computable as a function of $y$. Thus, a fixed point iteration can be set up of the form

$$y \leftarrow P(y) = P_n(y) + R_n(y), \qquad (9)$$

where $y \in \mathbb{R}^n$. (The dimension $n$ can be increased as the iteration proceeds.)


For (9), the Schauder fixed point theorem is an 
analogue of the Brouwer fixed point theorem; see [1, 
p. 154]. Furthermore, interval extensions can be provided for both $P_n$ and $R_n$, so that an analogue of finite-dimensional computational fixed point theory exists. In particular, if

$$\mathbf{P}_n(\mathbf{y}) \subseteq \operatorname{int} \mathbf{y}, \qquad (10)$$

then there exists a fixed point of $P$ within the ball in $S$ centered at the midpoint of $\mathbf{y}$ and with radius equal to the radius of $\mathbf{y}$. (For these purposes,

$$y = \sum_{i=1}^{n} a_i \phi_i$$

can be identified with the interval vector $(\mathbf{a}_1, \dots, \mathbf{a}_n)^T$ corresponding to the coefficients in the expansion.) For details, see [6, Chap. 15]. Also see [2] for a theoretical development and various examples worked out in detail.
We give an overview of the general ideas involved
in solving constrained and unconstrained global op- 
timization problems using interval arithmetic. We in- 
clude a discussion of a few prototype optimization algo- 
rithms and enumerate some applications in engineer- 
ing, chemistry, manufacturing, economics and physics. 


Introduction 


Let $I$ be the set of real compact intervals, $\mathbb{R}$ the set of reals, $m$ a positive integer, $X \in I^m$, and $f: X \to \mathbb{R}$ the objective function. We assume that a global minimum $f^*$ of $f$ exists over $X$. Let $X^*$ be the set of global minimizers of $f$ over $X$. Then the global unconstrained optimization problem is written concisely as

$$\min_{x \in X} f(x), \qquad (1)$$

which means that $f^*$ or $X^*$ is to be determined. The global constrained optimization problem arises if a more general set $M \subseteq \mathbb{R}^m$, the so-called feasible domain, is considered. Solution methods for global constrained problems use the tools for global unconstrained problems, but additionally, further concepts are needed, such as numerical proofs of the guaranteed existence of feasible points in subareas of the working domain. Therefore we separate the treatment of the constrained case from the treatment of the unconstrained case.

The first interval techniques for treating global opti- 
mization problems were established by [13,20,30,31,46, 
64,65,66,67,71,72,74,81,98, 103,104,105], etc. Although 
some of these references were focusing on special prob- 
lems like convex or signomial programming, they pro- 
vided concepts which would give insight into more gen- 
eral problems where they were later applied. 

The overview that we provide in this article can only cast a quick glance at the various topics that will be considered. Their thorough investigation may be found in [33,39,47,48,49,56,70,90,95,100].

Solving an optimization problem such as (1) re- 
quires, in general, the repetitive comparison of con- 
tinua of values and the choosing of an optimum value. 
Since interval computation is a tool for handling con- 
tinua, it provides competitive methods for solving 
global optimization problems. To keep the presentation simple, only prototype algorithms for unconstrained problems are discussed. We choose three variants, on the one hand in order to keep track of the historical origins, on the other hand in order to show how
small changes in the prototypes influence their conver- 
gence behavior. These prototypes are based on ideas of 
S. Skelboe [103], R.E. Moore [74], N.S. Asaithambi, Z. 
Shen and Moore [2], E.R. Hansen [30,31], and K. Ichida 
and Y. Fujii [46]. We do not have the space to provide 
prototype algorithms for constrained problems as well 
in this article. Thus we only discuss parts which we have 
to add to the unconstrained prototypes in order to get 
a procedure for constrained problems. 

In general, interval algorithms for solving global op- 
timization problems consist of 
i) the main algorithm, 

ii) accelerating devices. 

The main algorithm is a sequential deterministic al- 
gorithm where branch and bound techniques are used. 
(An algorithm is called sequential if the nth step of the 
computation depends on the former steps. A method 
is deterministic if stochastic methods are avoided. By 
branch and bound principles we mean that the whole area $X$ or $M$ is not searched uniformly for the global minimizers; instead, some parts (branches) are preferred. The branching depends on the bounding. It is required that for any box $Y$ of the working area a lower bound for $f$ over $Y$ is known or computable.)

Interval arithmetic is used for point i) to achieve the 
bounds needed for the branch and bound techniques (f 
need not be Lipschitz, convex, etc.) and for point ii) to 
remove superfluous parts of the domains X or M. 
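To make the branch and bound principle concrete, here is a small Python sketch in the spirit of the Moore-Skelboe approach. It is a hypothetical illustration, not one of the prototype algorithms of this article: boxes are kept in a priority queue ordered by the lower bound $\min F(Y)$ of an inclusion function, the box with the smallest lower bound is bisected, and boxes whose lower bound exceeds the best function value found so far at midpoints are discarded. The objective $f(x) = x^2 - x$, the tolerance and the absence of outward rounding are all assumptions of the example.

```python
import heapq


def F(lo, hi):
    """Natural interval extension of f(x) = x**2 - x, with x**2 taken as the exact range."""
    a, b = lo * lo, hi * hi
    sq_lo = 0.0 if lo <= 0.0 <= hi else min(a, b)
    return (sq_lo - hi, max(a, b) - lo)


def f(x):
    return x * x - x


def moore_skelboe(F, f, lo, hi, eps=1e-4):
    """Return (lb, ub, box) with lb <= f* <= ub; box is the leading box of width <= eps."""
    ub = f(0.5 * (lo + hi))                     # best function value found so far
    heap = [(F(lo, hi)[0], lo, hi)]             # boxes keyed by the lower bound min F(Y)
    while heap:
        lb, a, b = heapq.heappop(heap)          # box with the smallest lower bound
        if b - a <= eps:
            return lb, ub, (a, b)
        m = 0.5 * (a + b)
        ub = min(ub, f(m))                      # point evaluations give upper bounds
        for c, d in ((a, m), (m, b)):           # branch: bisect the leading box
            c_lb = F(c, d)[0]
            if c_lb <= ub:                      # bound: keep only boxes that may hold a minimizer
                heapq.heappush(heap, (c_lb, c, d))
    return None


lb, ub, box = moore_skelboe(F, f, -2.0, 2.0)
print(lb, ub, box)   # lb <= -0.25 <= ub, and box lies close to the minimizer x = 0.5
```

Because every box is bounded from below by the inclusion function, $f$ need not be Lipschitz or convex; only the evaluation of $F$ on boxes is required.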

The contents of this article are as follows. In the next
two sections we introduce the interval tools which are 
required in the article. In section 4, three algorithms 
for solving (1) are presented. They are seemingly very 
similar, but their convergence properties, which are dis- 
cussed in section 5, are different. The three algorithms 
are also of interest for historical reasons. A survey of ac- 
celeration devices, which aim to speed up the computa- 
tion, is given in section 6. It is shown in section 7 that 
interval analysis is an excellent means for dealing with 
problems which have an unbounded domain or a nons- 
mooth objective function. In section 8, the constrained 
case is touched upon. Applications of these methods are 
collected in the final section 9. 


Interval Arithmetic 


The interval tools which are needed for the explanation 
of the basic features of interval methods in global opti- 
mization are described in this section. A thorough in- 
troduction to the whole area of interval arithmetic can 
be found, for example, in [1,4,52,74,102], etc. More ad- 
vanced readers will enjoy [79]. The development of in- 
terval tools appropriate for dealing with optimization 
problems is presented in [88,90]; cf. also the Appendix 
of [86]. 
The interval arithmetic operations are defined by

$$A * B = \{a * b : a \in A,\ b \in B\} \quad \text{for } A, B \in I, \qquad (2)$$

where the symbol $*$ may denote $+$, $-$, $\cdot$, or $/$. In general, $A/B$ is not defined if $0 \in B$. (But see the sections on 'interval Newton methods' and 'global optimization over unbounded domains' below.) The meaning of (2) is the following: if some unknown reals $\alpha$, $\beta$ are included in known intervals, say $\alpha \in A$, $\beta \in B$, then it is guaranteed that the desired result $\alpha * \beta$, which is in general unknown, is contained in the known interval $A * B$. Definition (2) is equivalent to the following rules:

$$\begin{aligned}
[a, b] + [c, d] &= [a + c,\ b + d], \\
[a, b] - [c, d] &= [a - d,\ b - c], \\
[a, b] \cdot [c, d] &= [\min(ac, ad, bc, bd),\ \max(ac, ad, bc, bd)], \\
\frac{[a, b]}{[c, d]} &= [a, b] \cdot \left[\frac{1}{d}, \frac{1}{c}\right] \quad \text{if } 0 \notin [c, d].
\end{aligned}$$


Therefore, the interval arithmetic operations can 
easily be realized on a computer. The algebraic prop- 
erties of (2) are different from those of real arithmetic 
operations. The distributive law, for instance, does not 
hold for (2). A summary of the algebraic behavior of 
interval arithmetic is given in [85]. 
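The four rules translate directly into code. The Python sketch below uses a hypothetical `Interval` class with plain floating-point endpoints (so without the outward rounding of machine interval arithmetic discussed below) and also shows that the distributive law fails, only the subdistributive inclusion $A(B + C) \subseteq AB + AC$ remaining.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    def __truediv__(self, o):
        if o.lo <= 0.0 <= o.hi:
            raise ZeroDivisionError("0 in divisor interval")
        return self * Interval(1.0 / o.hi, 1.0 / o.lo)

    def within(self, other):
        """Interval inclusion: self is a subset of other."""
        return other.lo <= self.lo and self.hi <= other.hi


A, B, C = Interval(1.0, 2.0), Interval(1.0, 1.0), Interval(-1.0, -1.0)
print(A + Interval(3.0, 4.0))                    # Interval(lo=4.0, hi=6.0)
print(A - Interval(3.0, 4.0))                    # Interval(lo=-3.0, hi=-1.0)
print(Interval(-1.0, 2.0) * Interval(3.0, 4.0))  # Interval(lo=-4.0, hi=8.0)
print(A / Interval(4.0, 8.0))                    # Interval(lo=0.125, hi=0.5)
print(A * (B + C), A * B + A * C)                # A(B + C) = [0, 0] but AB + AC = [-1, 1]
```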

The main interval arithmetic tool applied to optimization problems is the concept of an inclusion function. Let again $X \in I^m$ and $f: X \to \mathbb{R}$. The set of compact intervals contained in $X$ is denoted by $I(X)$. Let $f(Y) = \{f(x) : x \in Y\}$ for $Y \in I(X)$ be the range of $f$ over $Y$. A function $F$ is called an inclusion function for $f$ if

$$f(Y) \subseteq F(Y) \quad \text{for any } Y \in I(X).$$

The left and the right endpoints of $F(Y)$ will be denoted by $\min F(Y)$ and $\max F(Y)$, respectively.

Inclusion functions can be constructed in any pro- 
gramming language in which interval arithmetic is sim- 
ulated or implemented via natural interval extensions: 
Firstly, let g be any function pre-declared in some pro- 
gramming language (like sin, cos, exp, etc.). Then the 
corresponding pre-declared interval function IG is de- 
fined by 


$$IG(Y) = g(Y)$$

for any $Y \in I$ contained in the domain of $g$.

Since the monotonicity intervals of pre-declared functions $g$ are well known, it is easy to realize the interval
functions IG on a computer. Nevertheless, the influence 
of rounding errors may be considered, see [30], for in- 
stance. 

Secondly, let $f(x)$ be any function expression in the variable $x \in \mathbb{R}^m$. So, $f(x)$ may be an explicit formula or may be described by an algorithm that, for the moment, contains no logical connectives. For simplicity, we assume that $f(x)$ is representable in a programming language. Let $Y \in I^m$ or let $Y$ be an interval variable over $I^m$. Then the expression which arises if each occurrence of $x$ in $f(x)$ is replaced by $Y$, if each occurrence of a pre-declared function $g$ in $f(x)$ is replaced by $IG$, and if the arithmetic operations in $f(x)$ are replaced by the corresponding interval arithmetic operations, is called the natural interval extension of $f(x)$ to $Y$, and it is denoted by $f(Y)$; see [71]. Due to (2) and the definition of the $IG$'s we get the inclusion principle for (programmable) functions

$$a \in Y \quad \text{implies} \quad f(a) \in f(Y). \qquad (3)$$

Therefore, $f(Y)$, seen as a function of $Y$, is an inclusion function for the function $f(x)$.

For example, if $f(x) = x_1 \sin x_2 - x_3$ for $x \in \mathbb{R}^3$, then $f(Y) = Y_1 \cdot \sin Y_2 - Y_3$, with $\sin$ standing for the corresponding pre-declared interval function, is the natural interval extension of $f(x)$ to $Y \in I^3$.

If logical connectives occur in an expression, the ex- 
tensions are similar, cf. [55,87]. 

Due to the algebraic properties of interval arith- 
metic, different expressions for a real function f can 
lead to interval expressions which are different as functions. For example, if $f_1(x) = x - x^2$ and $f_2(x) = x(1 - x)$ for $x \in \mathbb{R}$, then $f_1(Y) = Y - Y^2 = [-1, 1]$ and $f_2(Y) = Y(1 - Y) = [0, 1]$ for $Y = [0, 1]$. For comparison, $f(Y) = [0, \tfrac{1}{4}]$. In general, the problem arises as to
how to find expressions of a given function that lead to 
natural interval extensions that are as good as possible. 
A partial solution to this problem can be found in [88]. 
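The dependency effect in this example is easy to reproduce. The Python sketch below (again a hypothetical minimal interval type without outward rounding) evaluates the natural interval extensions of $f_1$ and $f_2$ on $Y = [0, 1]$ and compares them with a sampled estimate of the true range $[0, 1/4]$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, c):                  # c - Y for a real number c
        return Interval(c - self.hi, c - self.lo)

    def __mul__(self, o):
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))


def f1(y):
    return y - y * y          # x - x^2: y occurs twice, and the occurrences are treated independently


def f2(y):
    return y * (1.0 - y)      # x(1 - x): still two occurrences of y


Y = Interval(0.0, 1.0)
print(f1(Y))                  # Interval(lo=-1.0, hi=1.0)
print(f2(Y))                  # Interval(lo=0.0, hi=1.0)
samples = [k / 1000 for k in range(1001)]
print(min(x - x * x for x in samples), max(x - x * x for x in samples))  # 0.0 0.25
```

Neither expression reaches the exact range because $Y$ occurs more than once; an expression in which the variable occurs only once, such as $1/4 - (x - 1/2)^2$, would give $[0, 1/4]$ exactly.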

A measure of the quality of an inclusion function $F$ for $f: X \to \mathbb{R}$ is the so-called excess width ([71]), defined as $w(F(Y)) - w(f(Y))$ for all $Y \in I(X)$, where $w([a, b]) = b - a$ is the width of an interval. $F$ is said to be of order $\alpha > 0$ if

$$w(F(Y)) - w(f(Y)) = O\big(w(Y)^{\alpha}\big) \quad \text{for } Y \in I(X),$$

where the width of a box $Y = Y_1 \times \cdots \times Y_m$ is defined by $w(Y) = \max_{i=1,\dots,m} w(Y_i)$. In order to obtain good computational results, it is necessary to choose inclusion functions having as high an order $\alpha$ as possible when $w(Y)$ is small; see, for example, [88].

The endpoints of the intervals must be machine numbers if interval arithmetic is implemented on a machine. This leads to a special topic called machine interval arithmetic, which can be considered as an approximation to interval arithmetic on computer systems.

Machine interval arithmetic is based on the inclusion isotonicity of the interval operations in the following manner. Let us again assume that $\alpha$, $\beta$ are the unknown exact values at any stage of the calculation, and that only enclosing intervals are known, $\alpha \in A$, $\beta \in B$.
Then $A$, $B$ might not be representable on the machine. Therefore $A$ and $B$ are replaced by the smallest machine intervals that contain $A$ and $B$,

$$A \subseteq A_M, \quad B \subseteq B_M.$$

A machine interval is an interval whose left and right endpoints are machine numbers. From (2) it follows that

$$A * B \subseteq A_M * B_M.$$

The interval $A_M * B_M$ need not be a machine interval, and it is therefore approximated by $\Diamond(A_M * B_M)$, which is the smallest interval representable on the machine that contains $A_M * B_M$. This leads to the inclusion principle of machine interval arithmetic:

$$\alpha \in A,\ \beta \in B \quad \text{implies} \quad \alpha * \beta \in \Diamond(A_M * B_M).$$

Thus, the basic principle of interval arithmetic is retained in machine interval arithmetic, that is, the exact unknown result is contained in the corresponding known interval, and rounding errors are under control.

To sum up, when a concrete problem has to be solved, our procedure is as follows: first, the theory is done in interval arithmetic; second, the calculation is done in machine interval arithmetic; and finally, the inclusion principle provides the transition from interval arithmetic to machine interval arithmetic.

Many software packages for interval arithmetic are now available, which work under Fortran 77, Fortran 90, Pascal, C, C++, Prolog, etc. A good survey can be found, for instance, in [57].
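In a language without direct control over the rounding mode, a simple and slightly pessimistic way to obtain machine intervals is to widen every endpoint computed with round-to-nearest by one machine number outward; the result can be one machine number wider than the smallest enclosing machine interval. The Python sketch below assumes Python 3.9 or later for `math.nextafter` and finite, non-overflowing values; the function names are hypothetical, and `fractions.Fraction` is used only to check the enclosure with exact rational arithmetic.

```python
import math
from fractions import Fraction


def down(v):
    return math.nextafter(v, -math.inf)     # next machine number below v


def up(v):
    return math.nextafter(v, math.inf)      # next machine number above v


def add_machine(a, b):
    """Machine interval enclosing [a0 + b0, a1 + b1].

    Each endpoint is computed with round-to-nearest and then moved one machine
    number outward, which suffices because round-to-nearest errs by at most
    half the gap to the neighbouring machine number.
    """
    return (down(a[0] + b[0]), up(a[1] + b[1]))


def mul_machine(a, b):
    """Machine interval enclosing the exact product of two machine intervals."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (down(min(p)), up(max(p)))


s = add_machine((0.1, 0.1), (0.2, 0.2))
exact = Fraction(0.1) + Fraction(0.2)       # exact sum of the two machine numbers
print(Fraction(s[0]) <= exact <= Fraction(s[1]))   # True
```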


Interval Newton Methods 


Interval Newton methods are excellent methods for determining all zeros of a continuously differentiable vector-valued function $\varphi: X \to \mathbb{R}^m$, where $X \in I^m$. These methods are important tools for nonlinear optimization problems since they can be used for computing all critical points of the objective function $f$, by applying the methods to $\varphi$, where $\varphi$ is the gradient function of $f$ and $J\varphi$ the Jacobian of $\varphi$, or for solving the Karush-Kuhn-Tucker or John conditions in constrained optimization.

The interval Newton method was introduced by Moore [71] and has since been extensively developed by many researchers. The latest state of the art for interval Newton methods may be found in [79]. See also the entry Interval Newton methods. An extensive treatment of the interval Newton method is beyond the scope of this introductory article, so we sketch it only in a greatly simplified manner, just to make the aim of the method understandable. For a detailed treatment see, for instance, [1,33,79,90,93], etc.

Interval Newton methods are closely connected to solving systems of linear interval equations. The notation widely used to describe this situation is unfortunate: it uses the notation of interval arithmetic in a questionable manner that can lead to misunderstandings. To see this, let $A \in I^{m \times m}$ and $B \in I^m$. The solution of the linear interval equation (with respect to $x$ or $X$)

$$Ax = B \quad \text{or} \quad AX = B$$

is then not an interval vector $X_0$ that satisfies the equation $AX_0 = B$, as one would expect. Instead, the solution is defined as the set

$$X = \{x \in \mathbb{R}^m : ax = b \text{ for some } a \in A,\ b \in B\}.$$


Thus, for example, the solution of the linear interval equation

$$[1, 2]\,x = [1, 2]$$

is $X = [1/2, 2]$. In general, the solution set is not a box if $m \ge 2$.
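The one-dimensional example can be checked with exact rational arithmetic. The sketch below computes the solution set $\{b/a : a \in A,\ b \in B\}$ from the endpoint quotients. It then multiplies the result back by $A$, which shows that the solution set does not itself satisfy the interval equation. It assumes that $A$ lies in the positive reals, and its function names are hypothetical.

```python
from fractions import Fraction as Q
from typing import Tuple

Iv = Tuple[Q, Q]


def solution_set_1d(A: Iv, B: Iv) -> Iv:
    """The set {x : a*x = b for some a in A, b in B}, assuming A > 0 (an interval here)."""
    if A[0] <= 0:
        raise ValueError("this sketch assumes A lies in the positive reals")
    # For a > 0, b/a is monotone in each argument, so the extremes are endpoint quotients.
    quotients = [b / a for a in A for b in B]
    return (min(quotients), max(quotients))


def interval_mul(U: Iv, V: Iv) -> Iv:
    products = [u * v for u in U for v in V]
    return (min(products), max(products))


A, B = (Q(1), Q(2)), (Q(1), Q(2))
X = solution_set_1d(A, B)
print(X)                   # (Fraction(1, 2), Fraction(2, 1))
print(interval_mul(A, X))  # (Fraction(1, 2), Fraction(4, 1)), which differs from B
```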


It is therefore the aim of interval arithmetic solu- 
tion methods to find at least a box which contains 
the solution set. 


Accordingly, if $c \in \mathbb{R}^m$, then the solution of the linear interval equation

$$A(x - c) = B \quad \text{or} \quad A(X - c) = B$$

with respect to $x$ or $X$ is defined to be the set

$$X := c + Y := \{c + y : y \in Y\},$$

where $Y$ is the solution of the interval equation $Ay = B$.

The following prototype algorithm aims to determine the zeros of $\varphi: X \to \mathbb{R}^m$ in $X \in I^m$. Let $J(Y)$ be an inclusion function for the Jacobian matrix $J_\varphi(x)$ for $Y \in I(X)$.

Set $X_1 := X$. For $n = 1, 2, \ldots$:

a) choose $x_n \in X_n$;

b) determine a superbox $Z_{n+1}$ of the solution set $Y_{n+1}$ of the linear interval equation, with respect to $Y$,
$$J(X_n)(x_n - Y) = \varphi(x_n);$$

c) set $X_{n+1} := X_n \cap Z_{n+1}$.

The interval Newton algorithm


Since it will be used later, we emphasize that one iteration of the interval Newton algorithm is just the execution of a), b) and c) for a particular value of $n$.
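For $m = 1$ the algorithm reduces to a few lines. In this case the linear interval equation in step b) can be solved explicitly: $Y = x_n - \varphi(x_n)/J(X_n)$ whenever $0 \notin J(X_n)$. The following Python sketch takes step c) to be the intersection $X_{n+1} = X_n \cap Z_{n+1}$ and takes $x_n$ to be the midpoint of $X_n$. It is illustrative only. It uses plain floating-point arithmetic without outward rounding, so it stops at a modest tolerance before rounding errors dominate. It also does not implement the extended division needed when $J(X_n)$ contains zero. The name `interval_newton_1d` is hypothetical.

```python
import math
from typing import Callable, Optional, Tuple

Interval = Tuple[float, float]


def interval_newton_1d(
    phi: Callable[[float], float],
    dphi_incl: Callable[[Interval], Interval],
    X: Interval,
    tol: float = 1e-8,
    max_iter: int = 50,
) -> Optional[Interval]:
    """Interval Newton iteration for a scalar phi (m = 1), without outward rounding.

    dphi_incl(Y) must enclose phi'(y) for all y in Y. Returns a box that (up to
    rounding errors) contains every zero of phi in X, or None if the box became empty.
    """
    lo, hi = X
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        x = 0.5 * (lo + hi)                      # a) choose x_n in X_n (midpoint)
        dlo, dhi = dphi_incl((lo, hi))           # J(X_n)
        if dlo <= 0.0 <= dhi:
            raise ValueError("0 in J(X_n): extended interval division not covered")
        fx = phi(x)
        q1, q2 = fx / dlo, fx / dhi              # phi(x_n) / J(X_n) for a real numerator
        zlo, zhi = x - max(q1, q2), x - min(q1, q2)   # b) Z_{n+1}
        lo, hi = max(lo, zlo), min(hi, zhi)      # c) X_{n+1} = X_n intersected with Z_{n+1}
        if lo > hi:
            return None                          # empty box: no zero of phi in X
    return (lo, hi)


# phi(x) = x^2 - 2 on [1, 2]; phi'(Y) = 2Y is enclosed exactly for Y > 0.
enclosure = interval_newton_1d(lambda x: x * x - 2.0,
                               lambda Y: (2.0 * Y[0], 2.0 * Y[1]), (1.0, 2.0))
print(enclosure, math.sqrt(2.0))
```

An empty intersection returns `None`, which corresponds to property 2) listed below. On $[2, 3]$ the same call returns `None` after one step, since $x^2 - 2$ has no zero there.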

Interval Newton methods are distinguished by the particular choice of the superbox $Z_{n+1}$. For example, if $Z_{n+1}$ is the box hull of $Y_{n+1}$, that is, the smallest box containing $Y_{n+1}$, then the method is called the interval Newton method (in the proper sense). If $Z_{n+1}$ is obtained by interval Gauss-Seidel steps combined with preconditioning, as explained below, the method is named after Hansen and S. Sengupta [35]. Krawczyk's method [60] and Hansen-Greenberg's methods [34] are also widely used. Convergence results hold under certain assumptions. The following general properties are useful for understanding how the algorithm is applied, see [1,71,73,79]:

1) If a zero $\xi$ of $\varphi$ exists in $X$, then $\xi \in X_n$ for all $n$. This means that no zero is ever lost. This implies that:

2) If $X_n$ is empty for some $n$, then $\varphi$ has no zeros in $X$.

3) If $Z_{n+1}$ is obtained by Gauss-Seidel steps or Gauss elimination, possibly combined with preconditioning as mentioned below, then

i) if $Z_{n+1} \subseteq X_n$ for some $n$, then $\varphi$ has a zero in $X_n$;

ii) if $Z_{n+1} \subseteq \operatorname{int} X_n$ for some $n$, then $\varphi$ has a unique zero in $X_n$ (where int denotes the topological interior).

4) Under certain conditions one obtains

$$w(X_{n+1}) \le \alpha\, \big(w(X_n)\big)^2$$

for some constant $\alpha > 0$.
A very promising realization of the interval Newton al- 
gorithm is the Hansen-Sengupta version [35] where the 
linear system occurring in the Newton iteration step is 
solved by a preconditioning step and by relaxation steps 
(Gauss-Seidel). 

We now discuss a single iteration of the Hansen-Sengupta variant. We suppress the index $n$ when writing down the formulas that occur in the $n$th iteration. That is, we write

$$J(X)(x - Y) = \varphi(x) \qquad (5)$$

instead of

$$J(X_n)(x_n - Y) = \varphi(x_n).$$

Accordingly, we search for a superset $Z$ of the solution set of (5), where $X$, $J(X)$, $x$ and $\varphi(x)$ are given. The solution set of (5) is also denoted by $Y$.


The Preconditioning Step 


It was already argued by Hansen and R.R. Smith [37] that (5) is best solved by premultiplying by an approximate inverse of the midpoint of $J(X)$. If the approximate inverse is $B$, we obtain

$$B J(X)(x - Y) = B\varphi(x)$$

or

$$M(x - Y) = b, \qquad (6)$$

where $M = BJ(X)$ and $b = B\varphi(x)$. In this manner the system is transformed into one whose matrix is close to the identity, and hence almost diagonally dominant, provided the widths of the Jacobian entries are not too large. The transformed system is then amenable to Gauss-Seidel type iterations. The solution set of (6) contains the solution set of (5), since $Ba \in BJ(X)$ for every real matrix $a \in J(X)$. Hence no solution is lost in this transformation. In recent years much research has focused on the preconditioning step, cf. for example, [59].
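The preconditioning step can be sketched for a $2 \times 2$ system. In this hypothetical helper, $B$ is the exact inverse of the midpoint matrix of $J(X)$, computed by the explicit $2 \times 2$ formula for brevity. A general implementation would call a linear-algebra routine instead. No outward rounding is used. With entry widths of $0.2$, the resulting $M = BJ(X)$ lies close to the identity matrix.

```python
from typing import List, Tuple

Iv = Tuple[float, float]


def scale(c: float, y: Iv) -> Iv:
    """Real number times interval."""
    a, b = c * y[0], c * y[1]
    return (min(a, b), max(a, b))


def add(u: Iv, v: Iv) -> Iv:
    return (u[0] + v[0], u[1] + v[1])


def precondition_2x2(J: List[List[Iv]], phi_x: List[float]):
    """Return M = B J(X) and b = B phi(x), where B is the inverse of mid(J(X))."""
    mid = [[0.5 * (e[0] + e[1]) for e in row] for row in J]
    det = mid[0][0] * mid[1][1] - mid[0][1] * mid[1][0]
    if det == 0.0:
        raise ValueError("midpoint matrix is singular")
    B = [[mid[1][1] / det, -mid[0][1] / det],
         [-mid[1][0] / det, mid[0][0] / det]]
    # Entry (i, j) of B J(X) is sum_k B[i][k] * J[k][j], evaluated in interval arithmetic.
    M = [[add(scale(B[i][0], J[0][j]), scale(B[i][1], J[1][j])) for j in range(2)]
         for i in range(2)]
    b = [B[i][0] * phi_x[0] + B[i][1] * phi_x[1] for i in range(2)]
    return M, b


J = [[(3.9, 4.1), (0.9, 1.1)],
     [(1.9, 2.1), (4.9, 5.1)]]
M, b = precondition_2x2(J, [1.0, 2.0])
for row in M:
    print([(round(lo, 4), round(hi, 4)) for lo, hi in row])
```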


The Relaxation Steps (Gauss-Seidel) 


The relaxation procedure for linear interval equations was developed in [36]. It consists mainly in interpreting the well-known noninterval Gauss-Seidel iteration in an interval context. Much care is needed in the interval realization, however, when division by intervals that contain zero occurs. We do not have the space for a complete discussion and refer, for example, to [33,90,93].
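A single interval Gauss-Seidel sweep for the preconditioned system (6) can be sketched as follows. The sketch solves for $D = x - Y$, starting from $D \subseteq x - X$. It updates one component at a time from $D_i = (b_i - \sum_{j \ne i} M_{ij} D_j)/M_{ii}$, intersects the result with the old $D_i$, and finally returns $Z = x - D$. It assumes $0 \notin M_{ii}$, so no extended division is needed. It uses floating-point arithmetic without outward rounding, and its function names are hypothetical.

```python
from typing import List, Optional, Tuple

Iv = Tuple[float, float]


def mul(u: Iv, v: Iv) -> Iv:
    p = (u[0] * v[0], u[0] * v[1], u[1] * v[0], u[1] * v[1])
    return (min(p), max(p))


def div(u: Iv, v: Iv) -> Iv:
    if v[0] <= 0.0 <= v[1]:
        raise ValueError("0 in divisor: extended division is not covered by this sketch")
    return mul(u, (1.0 / v[1], 1.0 / v[0]))


def gauss_seidel_sweep(M: List[List[Iv]], b: List[float], x: List[float],
                       X: List[Iv]) -> Optional[List[Iv]]:
    """One interval Gauss-Seidel sweep for M(x - Y) = b with Y in X.

    Returns a box Z (a subset of X) that, up to rounding, contains every y in X
    solving a(x - y) = b for some real matrix a in M, or None if there is none.
    """
    n = len(x)
    D = [(x[i] - X[i][1], x[i] - X[i][0]) for i in range(n)]   # D = x - X
    for i in range(n):
        acc = (b[i], b[i])
        for j in range(n):
            if j != i:
                t = mul(M[i][j], D[j])
                acc = (acc[0] - t[1], acc[1] - t[0])            # acc := acc - t
        new = div(acc, M[i][i])
        lo, hi = max(D[i][0], new[0]), min(D[i][1], new[1])    # intersect with old D_i
        if lo > hi:
            return None                                         # no solution in X
        D[i] = (lo, hi)
    return [(x[i] - D[i][1], x[i] - D[i][0]) for i in range(n)]  # Z = x - D


M = [[(0.9, 1.1), (-0.1, 0.1)], [(-0.1, 0.1), (0.9, 1.1)]]
print(gauss_seidel_sweep(M, [0.1, 0.1], [1.0, 1.0], [(0.0, 2.0), (0.0, 2.0)]))
```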

Instead of a relaxation iteration, interval Gauss elimination can be used. This is nothing more than the well-known Gauss elimination performed in an interval setting. Interval Gauss elimination is not as robust as the interval Gauss-Seidel steps. It is, however, more effective under certain conditions, for instance if the Jacobian or the preconditioned Jacobian matrix is diagonally dominant, see [79]. Practical experience shows that it is best to combine Gauss-Seidel steps with Gaussian elimination, cf. [33,79,90].

There is no urgent need to discuss the convergence properties of the interval Newton algorithm. Only single iterations of it are incorporated into the optimization algorithms (cf. the sections on ‘accelerating and related devices’ and ‘applications’ below), so the convergence theory of the optimization algorithms applies (cf. the sections on ‘convergence properties of the prototype algorithms’ and ‘applications’ below). Only when it is already certain, or very likely, that the computation is approaching a global minimizer does it make sense to switch to the complete interval Newton algorithm and finally enjoy its quadratic convergence (cf. property 4). Such a situation occurs, for example, if the objective function in the unconstrained case, or the Lagrangian in the constrained case, is convex.


Three Prototype Algorithms 
for the Unconstrained Problem 


The algorithms are designed to determine $f^*$ or $X^*$ or both, as will be described later. Their input parameters are the box $X$, the inclusion function $F$ for $f: X \to \mathbb{R}$, and some accuracy parameters that may occur in the termination criteria. The termination criteria depend on the actual case and will not be specified here, but see, for example, item c) in the section on ‘convergence properties of the prototype algorithms’. For historical reasons, we go back to the roots of interval arithmetic optimization theory. We start with Moore's algorithm [71], which used uniform subdivision, but we already incorporate the first branch and bound steps as proposed by Skelboe [103]. Finally we arrive at Hansen's algorithm [30,31], which was the first algorithm to feature convergence to both $f^*$ and $X^*$.

Algorithm 1 initializes a list $\mathcal{L} = \mathcal{L}_1$ consisting of one pair $(X, y)$, see Step 3. The list is then modified and enlarged at each iteration, see Steps 8 and 9. At the $n$th iteration a list $\mathcal{L} = \mathcal{L}_n$ consisting of $n$ pairs is present,

$$\mathcal{L}_n = \big((Z_{n1}, z_{n1}), \ldots, (Z_{nn}, z_{nn})\big), \quad \text{where } z_{ni} = \min F(Z_{ni}).$$


1. Calculate $F(X)$.

2. Set $y := \min F(X)$.

3. Initialize the list $\mathcal{L} := ((X, y))$.

4. Choose a coordinate direction $k$ parallel to an edge of maximum length of $X = X_1 \times \cdots \times X_m$, i.e., $k \in \{i : w(X) = w(X_i)\}$.

5. Bisect $X$ normal to direction $k$, obtaining boxes $V_1$, $V_2$ such that $X = V_1 \cup V_2$.

6. Calculate $F(V_1)$, $F(V_2)$.

7. Set $v_i := \min F(V_i)$ for $i = 1, 2$.

8. Remove $(X, y)$ from the list $\mathcal{L}$.

9. Enter the pairs $(V_1, v_1)$ and $(V_2, v_2)$ into the list such that the second members of all pairs of the list do not decrease.

10. Denote the first pair of the list by $(X, y)$.

11. If the termination criteria hold, go to 13.

12. Go to 4.

13. End.

Algorithm 1: Moore-Skelboe


The leading pair of the list $\mathcal{L}_n$ will be denoted by

$$(X_n, y_n) = (Z_{n1}, z_{n1}).$$

The boxes $X_n$ are called the leading boxes of the algorithm. It is assumed that the termination criteria of Step 11 are never satisfied, so that the algorithm does not stop. In this case an infinite sequence of lists is produced.
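Algorithm 1 can be sketched in Python as follows. The list $\mathcal{L}$ is kept as a binary heap keyed by the lower bounds $\min F(Z)$. This serves the same purpose as keeping the second members nondecreasing in Step 9, since only the first pair is ever needed. Several choices are assumptions made for illustration: the test function $f(x) = (x_1 - 1)^2 + (x_2 + 0.5)^2$, its natural interval extension `F`, and the termination criterion $w(F(X_n)) \le \varepsilon$ (cf. item c) of the convergence results). No outward rounding is performed, and all names are hypothetical.

```python
import heapq
from itertools import count
from typing import Callable, Tuple

Iv = Tuple[float, float]
Box = Tuple[Iv, ...]


def sq(y: Iv) -> Iv:
    """Interval square."""
    lo, hi = y
    if lo <= 0.0 <= hi:
        return (0.0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))


def F(Y: Box) -> Iv:
    """Natural interval extension of the hypothetical test function
    f(x) = (x1 - 1)^2 + (x2 + 0.5)^2, whose global minimum 0 is at (1, -0.5)."""
    a = sq((Y[0][0] - 1.0, Y[0][1] - 1.0))
    b = sq((Y[1][0] + 0.5, Y[1][1] + 0.5))
    return (a[0] + b[0], a[1] + b[1])


def moore_skelboe(F: Callable[[Box], Iv], X: Box, eps: float = 1e-6,
                  max_iter: int = 10_000) -> Tuple[Box, float]:
    tie = count()                                    # tie-breaker for equal lower bounds
    L = [(F(X)[0], next(tie), X)]                    # Steps 1-3: list with one pair (X, y)
    for _ in range(max_iter):
        y, _tie, X = L[0]                            # Step 10: leading pair (X, y)
        lo, hi = F(X)
        if hi - lo <= eps:                           # Step 11: error estimate w(F(X)) small
            break
        heapq.heappop(L)                             # Step 8: remove (X, y)
        k = max(range(len(X)), key=lambda i: X[i][1] - X[i][0])   # Step 4
        c = 0.5 * (X[k][0] + X[k][1])
        for half in ((X[k][0], c), (c, X[k][1])):    # Step 5: bisect normal to direction k
            V = X[:k] + (half,) + X[k + 1:]
            heapq.heappush(L, (F(V)[0], next(tie), V))   # Steps 6, 7, 9
    y, _tie, X = L[0]
    return X, y


box, y = moore_skelboe(F, ((-2.0, 2.0), (-2.0, 2.0)))
print(box, y)
```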

Algorithm 1 was mainly established to determine $f^*$. Ichida and Fujii [46] and Hansen [30,31] then focused on the boxes $Z_{ni}$ in order to obtain reasonable inclusions of $X^*$. Midpoint tests (cf. [2,30,31,46]) have no impact on the convergence properties of Algorithm 1, but they become important when inclusions of $X^*$ are sought. Midpoint tests are incorporated as follows. Let $\tilde f_n$ be the lowest function value that has been calculated up to the completion of the list $\mathcal{L}_n$. (If no function values are available, then $\min_{i=1,\ldots,n} \max F(Z_{ni})$ can be taken as $\tilde f_n$.) Then all pairs $(Z_{ni}, z_{ni})$ of $\mathcal{L}_n$ that satisfy

$$\tilde f_n < z_{ni}$$

are discarded, since such boxes cannot contain a global minimizer. This gives a reduced list $\tilde{\mathcal{L}}_n$. Let $U_n = \bigcup Z_{ni}$, taken over all $Z_{ni}$ of the reduced list. Two different procedures are then known:

• Algorithm 2 [46] emerges from Algorithm 1 by keeping track of $\tilde{\mathcal{L}}_n$ instead of $\mathcal{L}_n$ (and thus having $U_n$ available at each iteration).

• Algorithm 3 [30,31] is like Algorithm 2, but the reduced lists $\tilde{\mathcal{L}}_n$ are ordered with respect to the age or the widths of the boxes.

Variants of the three prototypes arise if the ordering of the lists and the bisection directions are changed, cf. the next two sections.
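The midpoint test itself is a simple filter. In the sketch below, boxes are one-dimensional for brevity. The value $\tilde f_n$ is updated by evaluating $f$ at the box midpoints, which is one common way of obtaining function values. A rigorous version would use an upper bound of $f$ at each midpoint computed in machine interval arithmetic. The function name is hypothetical.

```python
from typing import Callable, List, Tuple

Iv = Tuple[float, float]
Pair = Tuple[Iv, float]          # (box Z, lower bound z = min F(Z)); 1-D boxes for brevity


def midpoint_test(pairs: List[Pair], f: Callable[[float], float],
                  f_tilde: float = float("inf")) -> Tuple[List[Pair], float]:
    """Update f_tilde with f at box midpoints, then discard pairs with f_tilde < z."""
    for (lo, hi), _ in pairs:
        # Any computed function value is an upper bound for f*.
        f_tilde = min(f_tilde, f(0.5 * (lo + hi)))
    reduced = [(Z, z) for (Z, z) in pairs if not f_tilde < z]
    return reduced, f_tilde


pairs = [((0.0, 0.5), 0.0), ((0.5, 1.0), 0.04), ((1.0, 2.0), 0.49)]
reduced, f_tilde = midpoint_test(pairs, lambda x: (x - 0.3) ** 2)
print(reduced, f_tilde)
```

Here $f(0.25) \approx 0.0025$ is smaller than the lower bounds $0.04$ and $0.49$, so only the first box survives.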


Convergence Properties 
of the Prototype Algorithms 


The results presented in this section are proven in 
[77,86,89]. 

Let us first consider Algorithm 1. As in the previous section, we denote the leading pairs of Algorithm 1 by $(X_n, y_n)$. One can show that

a) $w(X_n) \to 0$ as $n \to \infty$.

This fact seems to be self-evident, but it is not. For example, small modifications of the basic algorithm can violate a), as is the case with the cyclic bisection method [74]. From the assumption


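$$w(F(Y)) - w(f(Y)) \to 0 \quad \text{as } w(Y) \to 0 \quad (Y \in I(X)) \qquad (7)$$

it follows that

b)
$$\begin{aligned} & y_n \le f^* \quad \text{for any } n,\\ & y_n \to f^* \quad \text{as } n \to \infty,\\ & f^* - y_n \le w(F(X_n)) \quad \text{(error estimate)}. \end{aligned}$$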


Assumption (7) is not very restrictive; it is almost always satisfied if natural interval extensions are used. However, (7) does not imply continuity of $f$, a Lipschitz condition on $f$, etc. Let $F$ now satisfy

$$w(F(Y)) \to 0 \quad \text{as } w(Y) \to 0. \qquad (8)$$

Clearly, (8) implies (7) and the continuity of $f$. Then

c) $w(F(X_n)) \to 0$ as $n \to \infty$ (that is, the error estimate tends to 0 and can thus be used in termination criteria),

d) each accumulation point of the sequence $(X_n)$ is a global minimizer.


The convergence order of the approach $y_n \to f^*$ is described by the following two results:

e) Let any $\alpha > 0$ and any converging sequence of reals be given. Then, for any $f$, there exists an inclusion function of order $\alpha$ for which $(y_n)$ converges more slowly than the given sequence.


This result indicates that the convergence can be arbitrarily slow and that no worst case exists, although a worst case is what is usually used to establish formulas for the convergence speed or convergence order. If, however, only isotone inclusion functions are considered ($F$ is called isotone if $Y \subseteq Z$ implies $F(Y) \subseteq F(Z)$), then the following estimate of the convergence speed is valid. In practice this estimate characterizes the complete convergence theory, since isotone inclusion functions can always be found with little effort.


f) If $F$ is isotone and of order $\alpha$, then $f^* - y_n = O(n^{-\alpha/m})$.


In [16] some variants of this assertion are proven.
Algorithms 2 and 3 behave nearly the same as Algorithm 1 as far as convergence to $f^*$ is concerned. Their properties with respect to the determination of $X^*$ are as follows. Let $(U_n)$ be the sequence of unions produced by Algorithm 2. If (8) is assumed, then

g) the sequence $(U_n)$ is nested and converges (with respect to the Hausdorff metric for compact sets) to a superset $D \supseteq X^*$. The probability that $D$ is not equal to $X^*$ is zero, however.


Let now $(U_n)$ be the sequence of unions produced by Algorithm 3. If (8) is assumed, then

h) the sequence $(U_n)$ is nested and converges to $X^*$.


Therefore, Hansen’s Algorithm 3 is the only one of 
the three which features a satisfactory and guaranteed 
convergence to f* and X*. This algorithm will therefore 
play the main role in our further considerations. 


Accelerating and Related Devices 


Algorithm 3 and the predecessors treated so far are based on the exhaustion principle, that is, the principle of removing areas (subboxes of $X$) that cannot contain a global minimizer. At the same time, the branch and bound principle forms the overlying structure: the areas processed first are those with the best chance of containing a global minimizer. This is a time-consuming process, and it is therefore important to combine these principles with techniques for speeding up the computations. In this section we deal with only a few of these techniques, in order to demonstrate how they may be combined with the basic algorithm. Much research has been done to find an optimal combination of the basic algorithms and the acceleration devices, cf. for instance, [26,31,33,48,49,93,95,96].

In the following we give an overview of several ac- 
celeration devices and other tools that are used to im- 
prove the computational efficiency of unconstrained in- 
terval based global optimization. Most of them are also 
developed for constrained optimization. 


The Monotonicity Test 


It can be applied if $f$ is differentiable and an inclusion function for $\nabla f$ is available ([30,31,72]). It allows one to recognize automatically that $f$ is strictly monotone in one of the variables on the subbox $Y \subseteq X$ on which the algorithm is focusing. In that case $Y$ can be discarded from the list if $Y$ lies in the interior of $X$; otherwise $Y$ can be replaced by an edge piece of $Y$. This is possible because the parts removed do not contain a global minimizer. More precisely, let $G_i$ be an inclusion function of $\partial f/\partial x_i$ for $i = 1, \ldots, m$. If $0 \notin G_i(Y)$ for even a single index $i$, then $f$ is strictly monotone in the variable $x_i$ over $Y$. Hence $Y$ can be discarded or replaced by an edge piece, as mentioned before. (For the application of the test it is already sufficient that $f$ is locally Lipschitz, cf. the next section.)
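The monotonicity test can be sketched as a function that receives the enclosures $G_i(Y)$ of the partial derivatives. It either discards $Y$ or reduces it to the edge piece on the boundary of $X$ where the minimum over $Y$ must lie. This is a minimal sketch with hypothetical names, in which boxes are lists of `(lo, hi)` pairs.

```python
from typing import List, Optional, Tuple

Iv = Tuple[float, float]


def monotonicity_test(Y: List[Iv], G: List[Iv], X: List[Iv]) -> Optional[List[Iv]]:
    """G[i] encloses df/dx_i over Y. Returns None if Y can be discarded,
    otherwise Y, possibly reduced to an edge piece on the boundary of X."""
    Y = list(Y)
    for i, (glo, ghi) in enumerate(G):
        if glo > 0.0:                      # f strictly increasing in x_i on Y
            if Y[i][0] > X[i][0]:
                return None                # the smaller values lie outside Y, inside X
            Y[i] = (Y[i][0], Y[i][0])      # keep only the face on the boundary of X
        elif ghi < 0.0:                    # f strictly decreasing in x_i on Y
            if Y[i][1] < X[i][1]:
                return None
            Y[i] = (Y[i][1], Y[i][1])
    return Y


X = [(0.0, 4.0), (0.0, 4.0)]
print(monotonicity_test([(0.0, 1.0), (1.0, 2.0)], [(0.5, 1.0), (-1.0, 1.0)], X))
```

In the example, $f$ increases in $x_1$ and the box touches the lower boundary of $X$ in that direction, so $Y$ is reduced to the face $x_1 = 0$.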


The Interval Newton Method 


If $f$ is twice continuously differentiable and an inclusion function for the Hessian matrix function $f''$ exists, then the interval Newton algorithm can be applied to $f'$ in order to get boxes that contain all zeros of $f'$. Together with the monotonicity test, the interval Newton algorithm counts as one of the most effective tools for solving optimization problems. Its main advantage lies not only in the localization of the zeros of $f'$, but also in its computationally very successful performance. This performance rests on the properties mentioned in the section on ‘interval Newton methods’, which result in reducing or splitting the search area. Finally, the contraction shows quadratic convergence under reasonable conditions.
Interval Newton methods can be applied in two different ways:

i) The method is applied to $f'$ in $X$ (necessarily combined with some splittings of the search area) until all critical points of $f$ are included in sufficiently small boxes $Z$, for example boxes with $w(Z) < \varepsilon$. The search for the global minimizers is then restricted to these remaining boxes $Z$ and to the facets of $X$. This approach is, however, not very effective, since these zeros can be saddle points, local maximizers, or even local but not global minimizers. Hence the following procedure is generally used:

ii) Each iteration of the optimization algorithm is combined with the monotonicity test and one or two interval Newton iterations. That is, after $X$ has been bisected into the subboxes $V_1$ and $V_2$ (cf. Step 5 of Algorithm 3), the midpoint test, the monotonicity test and one interval Newton iteration are applied to $V_1$ and $V_2$ in order to diminish the size of $V_1$ and $V_2$ or to discard them. This procedure avoids superfluous and costly interval Newton iterations in boxes in which $f$ is strictly monotone or which have too large function values.

The interval Newton method can be improved by using slopes whenever possible, cf. [79]. See also ‘use of good inclusion functions’ below.


Finding a Function Value as Small as Possible 


The smaller the smallest known or computed function value is at the $n$th iteration, the more effective the midpoint test becomes; that is, boxes are removed earlier than they would be without such values.

There are many possible techniques for obtaining lower function values, such as statistical and line search methods, bundle methods (line search in the nonsmooth case), descent methods and Newton-like methods. Which of them is applicable depends on the differentiability of the objective function. Many of these variations lead to so-called globally convergent methods. This does not mean that a global solution is found; it means that a local solution is found from any starting point. Good results in finding small function values have been attained by generating a not very dense set of points and using them as starting points for the globally convergent methods mentioned above.


Bisections 


The computations can be accelerated by a good choice 
of 

A) the next box of the list to be bisected; and 

B) the bisection direction of that box. 


These two topics drew little attention in the first years of interval optimization. They were considered a tiresome task needed to complete the algorithms rather than topics important for their success. It has since been recognized how important the right choice of box and bisection direction is for keeping computation time and costs low. The right choice of bisection direction is equally important for the global zero search of systems of functions.

Strategies for choosing the next box include uniform 
subdivision [71], bisecting a box which has a minimal 
lower bound, cf. Algorithms 1 and 2, bisecting that box 
which has been longest on the list [31], bisecting a box 
which has maximum width [31], last in-first out [79], 
that is, the youngest boxes are always processed first 
which keeps the list length short under certain circum- 
stances. 

When a box has been selected for getting bisected 
one has to choose the bisection direction. Historically, 
the first three criteria were uniform subdivision [71] 
(that is, bisections were done in all m directions), cyclic 
bisection [74] (that is, the bisection directions change 
cyclically, i. e., the first box gets bisected normal to the 
first coordinate direction, the second normal to the sec- 
ond coordinate direction, etc.), and bisection normal to 
one of the longest box edges [31]. 

It turned out that using the box width as the only criterion for deciding the bisection direction can be very ineffective; for a typical example, see [91]. The conclusion from such examples is that the choice of the bisection direction should also take into account the behavior of the function $f$ over the box. Hence, formulas for deciding a bisection direction are built up using the box widths together with bounds for the objective function and for its first and second partial derivatives. Natural interval extensions of noninterval scaling formulas are also used. Our own tests and experiments show that no optimal bisection strategy exists and that it is reasonable to use several bisection strategies, each pursuing a different heuristic aim. This led to systematic investigations of bisections and also trisections by several authors, mainly [15,18,53,54,96,97,106]. For further strategies see [93], where a survey of the convergence properties of some of the strategies can also be found, and [92].
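The simpler direction rules mentioned above are easy to state in code. `widest_edge` and `cyclic` implement bisection normal to a longest edge and cyclic bisection. `derivative_weighted` is one hypothetical instance of a rule that also looks at the behavior of $f$: it chooses the direction that maximizes $w(G_i(Y)) \cdot w(Y_i)$, where $G_i(Y)$ encloses $\partial f/\partial x_i$ over $Y$. It is meant only to illustrate the idea, not to reproduce any of the cited strategies.

```python
from typing import List, Tuple

Iv = Tuple[float, float]


def width(y: Iv) -> float:
    return y[1] - y[0]


def widest_edge(Y: List[Iv]) -> int:
    # Bisect normal to a longest edge of Y.
    return max(range(len(Y)), key=lambda i: width(Y[i]))


def cyclic(step: int, m: int) -> int:
    # Cyclic bisection: directions 0, 1, ..., m-1, 0, 1, ...
    return step % m


def derivative_weighted(Y: List[Iv], G: List[Iv]) -> int:
    # Hypothetical function-aware rule: maximize w(G_i(Y)) * w(Y_i).
    return max(range(len(Y)), key=lambda i: width(G[i]) * width(Y[i]))


def bisect(Y: List[Iv], k: int) -> Tuple[List[Iv], List[Iv]]:
    c = 0.5 * (Y[k][0] + Y[k][1])
    V1, V2 = list(Y), list(Y)
    V1[k], V2[k] = (Y[k][0], c), (c, Y[k][1])
    return V1, V2


Y = [(0.0, 4.0), (0.0, 1.0)]
G = [(0.0, 0.1), (-10.0, 10.0)]   # f hardly changes along x_1, strongly along x_2
print(widest_edge(Y), derivative_weighted(Y, G))   # 0 1
```

The two rules disagree here: the width-only rule splits the long but flat direction, while the weighted rule splits the direction in which $f$ varies most.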


Use of Good Inclusion Functions, Slope Arithmetic 


The better the inclusion functions are, the more effective are tests such as the midpoint test and the monotonicity test, cf. for instance, [88]. Derivatives can frequently be replaced by slopes, which leads to inclusions of smaller width. An automatic slope arithmetic is also available, comparable to automatic differentiation, cf. ‘automatic differentiation’ below. The interested reader is referred to [1,40,61,62,79,88,101].
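For $f(x) = x^2$ the slope at $c$ is $(x^2 - c^2)/(x - c) = x + c$, so its interval extension over $Y$ is $Y + c$. This is narrower than the derivative enclosure $2Y$ whenever $w(Y) > 0$. The sketch compares the mean-value form with the slope form on $Y = [1, 3]$ with $c = 2$; the true range is $[1, 9]$. Function names are hypothetical and no outward rounding is used.

```python
from typing import Tuple

Iv = Tuple[float, float]


def imul(u: Iv, v: Iv) -> Iv:
    p = (u[0] * v[0], u[0] * v[1], u[1] * v[0], u[1] * v[1])
    return (min(p), max(p))


def mean_value_form(Y: Iv) -> Iv:
    """f(c) + f'(Y)(Y - c) for f(x) = x**2, with f'(Y) = 2Y and c = mid(Y)."""
    c = 0.5 * (Y[0] + Y[1])
    d = imul((2.0, 2.0), Y)
    t = imul(d, (Y[0] - c, Y[1] - c))
    return (c * c + t[0], c * c + t[1])


def slope_form(Y: Iv) -> Iv:
    """f(c) + S(Y, c)(Y - c) with the slope S(Y, c) = Y + c of x**2 at c."""
    c = 0.5 * (Y[0] + Y[1])
    s = (Y[0] + c, Y[1] + c)
    t = imul(s, (Y[0] - c, Y[1] - c))
    return (c * c + t[0], c * c + t[1])


print(mean_value_form((1.0, 3.0)))  # (-2.0, 10.0)
print(slope_form((1.0, 3.0)))       # (-1.0, 9.0)
```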


The Nonconvexity Test 


The aim of this test is to verify that the objective function is nowhere convex on some subbox $Y \in I(X)$, by computationally checking that the Hessian of the objective function violates some standard condition for convexity. In that case the interior of $Y$ cannot contain a minimizer. Here $f \in C^2$ is assumed. The first such test seems to date back to [64].
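One simple instance of such a test checks the diagonal of an interval Hessian. A local minimizer in the interior of $Y$ requires a positive semidefinite Hessian and hence nonnegative diagonal entries. So if the upper bound of some diagonal enclosure is negative, the interior of $Y$ contains no minimizer. The sketch assumes the diagonal enclosures are already available. The example function $f(x) = x_1^2 - x_1^4 + x_2^2$ and all names are hypothetical.

```python
from typing import List, Tuple

Iv = Tuple[float, float]


def nonconvexity_test(hess_diag: List[Iv]) -> bool:
    """hess_diag[i] encloses d2f/dx_i^2 over the box Y.

    Returns True if some diagonal entry is negative on all of Y, in which case
    the interior of Y can be discarded.
    """
    return any(hi < 0.0 for _, hi in hess_diag)


def h11(Y1: Iv) -> Iv:
    """Encloses d2f/dx1^2 = 2 - 12 x1^2 for the hypothetical f(x) = x1^2 - x1^4 + x2^2,
    valid for boxes Y1 with Y1[0] >= 0."""
    return (2.0 - 12.0 * Y1[1] ** 2, 2.0 - 12.0 * Y1[0] ** 2)


print(nonconvexity_test([h11((1.0, 2.0)), (2.0, 2.0)]))   # True
print(nonconvexity_test([h11((0.0, 0.2)), (2.0, 2.0)]))   # False
```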


Thin Evaluation of the Hessian Matrix 


If interval Newton steps are incorporated, they will be applied to $\varphi = f'$. The matrix $J_\varphi(Y) = H_f(Y)$ is then required, where $H_f(Y)$ is the natural interval extension of the Hessian matrix. By certain rearrangements of $H_f(Y)$ and a special method of obtaining an interval extension, in which not all real entries are replaced by intervals, it is possible to obtain an interval matrix that is thinner, and hence better (cf. ‘use of good inclusion functions’ above), than $H_f(Y)$. A detailed discussion and formulas can be found in [33,90].


Constraint Logic Programming 


(also known as constraint solving) involves techniques in which, among others, equations (for example, the Karush-Kuhn-Tucker or F. John conditions) are not primarily evaluated numerically. Instead, they are seen as constraints on, or relations between, the variables (a concept borrowed from artificial intelligence). The relation is then used to shrink the search domain. For example, the equation (constraint) $y - x^2 = 0$ immediately allows the halfspace defined by $y < 0$ to be removed from the working area. Several methods are based on this idea; they are best embedded in appropriate languages in which symbolic manipulation is available, such as PROLOG. For example, a method called relational interval algebra is embedded in the computer language CLP, which has PROLOG as metalanguage (cf. [7] or [83]). In this connection it is also opportune to add redundant constraints automatically in order to accelerate the computations (see, for example, [5] or [82]).
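The effect of the constraint $y - x^2 = 0$ on a box $X \times Y$ can be sketched as a simple projection (contraction) step. First, $Y$ is intersected with the range of $x^2$ over $X$, which in particular removes $y < 0$. Then $X$ is intersected with $[-\sqrt{\max Y}, \sqrt{\max Y}]$. This is a hand-written illustration of the idea, not the relational interval algebra of CLP or the box-consistency algorithm of NUMERICA. Names are hypothetical and rounding is ignored.

```python
import math
from typing import Optional, Tuple

Iv = Tuple[float, float]


def intersect(a: Iv, b: Iv) -> Optional[Iv]:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def square(x: Iv) -> Iv:
    lo, hi = x
    if lo <= 0.0 <= hi:
        return (0.0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))


def contract_parabola(X: Iv, Y: Iv) -> Optional[Tuple[Iv, Iv]]:
    """Shrink the box X x Y using the constraint y - x**2 = 0; None if it becomes empty."""
    Y = intersect(Y, square(X))          # y = x**2: in particular removes y < 0
    if Y is None:
        return None
    r = math.sqrt(Y[1])                  # x must lie in [-sqrt(max Y), sqrt(max Y)]
    X = intersect(X, (-r, r))
    if X is None:
        return None
    return X, Y


print(contract_parabola((-2.0, 2.0), (-5.0, 1.0)))  # ((-1.0, 1.0), (0.0, 1.0))
```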

Another approach is called branch and prune ([41]). The pruning concept aims to shrink the search area by several tests. The crucial property searched for is the so-called box consistency, which was introduced in [6] and is also known in connection with discrete combinatorial search problems. Box consistency is primarily used to indicate the existence of solutions in the considered subarea and serves as a kind of substitute for interval Newton techniques. An interesting means of proving box consistency is bound consistency, which requires checking the facets of the box instead of the box itself. The branch and prune algorithm is embedded in NUMERICA, which is designed as a modeling language for global optimization and related problems, cf. [41].

There are several other approaches based on constraint logic programming, such as the use of relational manipulations or of set-valued operations, see for example, [3] or [45] and the references listed there.


Automatic Differentiation 


This technique seems to go back to [108]. It helps to reduce costs when computing derivatives or their inclusion functions, or expressions like $(x - c)^T f'(c)$, $(Y - c)^T f'(Y)$, $(x - c)^T f'(Y)$, $(Y - c)^T f''(Y)(Y - c)$, etc., where $x, c \in X$ and $Y \in I(X)$. There are two modes of automatic differentiation, the forward mode and the reverse mode. Both modes evaluate function values recursively and apply the chain rule of differentiation. In the forward mode, all intermediate values of the function are determined simultaneously with the corresponding intermediate values of the derivative, Hessian, etc., and all these intermediate values are computed from values calculated in earlier steps. The reverse mode requires some structural planning of the formulas, similar to the construction of Kantorovich graphs of functional expressions, where a new variable is assigned to each node. The differentiation then proceeds backwards from the function, with respect to the variables introduced. Both modes have advantages.

Our own experience, however, shows that for interval expressions like $(Y - c)^T f'(Y)$, or when computing generalized gradients, it is not always wise to use automatic differentiation. The reason is that in such cases information about dependencies between intervals can be lost, so that the widths of the resulting interval values increase unnecessarily.

For a detailed description of automatic differentia- 
tion cf. for instance, [23,24,29,84]. 
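Forward-mode automatic differentiation over intervals can be sketched by propagating pairs (value enclosure, derivative enclosure) through each operation with the sum and product rules. The class `IDual` is a hypothetical minimal example that supports only `+` and `*` and does not use outward rounding.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Iv = Tuple[float, float]


def iadd(u: Iv, v: Iv) -> Iv:
    return (u[0] + v[0], u[1] + v[1])


def imul(u: Iv, v: Iv) -> Iv:
    p = (u[0] * v[0], u[0] * v[1], u[1] * v[0], u[1] * v[1])
    return (min(p), max(p))


@dataclass(frozen=True)
class IDual:
    """Forward-mode pair: enclosures of a value and of its derivative."""
    val: Iv
    der: Iv

    @staticmethod
    def lift(c: Union["IDual", float]) -> "IDual":
        # Constants have a zero derivative.
        if isinstance(c, IDual):
            return c
        return IDual((float(c), float(c)), (0.0, 0.0))

    def __add__(self, other):
        o = IDual.lift(other)
        return IDual(iadd(self.val, o.val), iadd(self.der, o.der))

    __radd__ = __add__

    def __mul__(self, other):
        o = IDual.lift(other)
        # Product rule (uv)' = u'v + uv', evaluated in interval arithmetic.
        return IDual(imul(self.val, o.val),
                     iadd(imul(self.der, o.val), imul(self.val, o.der)))

    __rmul__ = __mul__


def variable(Y: Iv) -> IDual:
    # The independent variable x over the box Y has derivative 1.
    return IDual(Y, (1.0, 1.0))


x = variable((1.0, 2.0))
f = x * x + 3 * x
print(f.val, f.der)  # (4.0, 10.0) (5.0, 7.0)
```

For $Y = [-1, 1]$ the value enclosure of `x * x` is $[-1, 1]$ instead of the true range $[0, 1]$, because the product does not know that both factors are the same variable. This is a small instance of the loss of dependency information mentioned above.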


Parallel Computations 


for global optimization were investigated and imple- 
mented primarily by [8,12,21,22]. 


Global Optimization Over Unbounded Domains 
and Nonsmooth Optimization 


Global Optimization over Unbounded Domains 


Almost all methods for solving global optimization problems need the assumption that a bounded domain containing the solution points is known. Boundedness is necessary for the numerical computation as well as for guaranteeing the convergence properties. If no a priori box $X$ is known as a search area for the global solutions, the previous algorithms, especially Algorithm 3, can be extended so that they also operate over unbounded boxes, cf. [90,94]. It is not even necessary to change the algorithms formally. One only has to define the midpoint and width of infinite intervals (both values have to be finite) and an arithmetic for infinite intervals. This arithmetic should provide intervals of minimal width in order to get reasonable inclusion functions. It would go too far to present this arithmetic here, but a short example may be illustrative. This arithmetic assigns the value $[0, \infty]$ to the quotient $[0, \infty]/[0, \infty]$, whereas the arithmetic called Kahan-Novoa-Ratz arithmetic in [55] yields the value $[-\infty, \infty]$. Most of the convergence properties of the section on ‘convergence properties of the prototype algorithms’ remain valid under slight modifications of the assumptions, since one can interpret the algorithms as algorithms in $\overline{\mathbb{R}}^m$, where $\overline{\mathbb{R}}$ is the two-point compactification of the real axis $\mathbb{R}$. Thus compact intervals are generated by the algorithms, and that is all one needs for the convergence proofs. For details see [90,94] or the survey in [93].


Nonsmooth Optimization 

A broad spectrum of mathematical programming problems can be reduced to nondifferentiable problems without constraints or with simple constraints. Nonsmooth optimization problems arise from several sources: exact nonsmooth penalty functions in nonlinear programming, maximum functions that estimate discrepancies in constraints, piecewise smooth approximations of technical-economic characteristics in practical problems of optimal planning and design, and minimax compromise functions in multicriteria optimization. Thus, the objective function $f$ of the optimization problem may look like $f(x) = \max\{f_1(x), \ldots, f_n(x)\}$, where $f_i \in C^1$, or like

$$f(x) = \mu f_0(x) + \sum_i \max(0, f_i(x)),$$

which is a typical objective function arising from penalty methods, where $f_0, f_i \in C^1$ and $\mu > 0$ is a (reciprocal) penalty factor.

Interval methods have no difficulty at all in handling nonsmooth problems, a fact that was discovered in [87] and rediscovered, with great emphasis, in [55]. The construction of inclusion functions does not depend at all on the smoothness of a function. Monotonicity tests and other devices that use gradients (for instance, local noninterval methods, cf. ‘finding a function value as small as possible’ in the section on ‘accelerating and related devices’) can still be applied as long as the function is locally Lipschitz. This means that, at any argument $x$ of the function $f$, there exists an open neighborhood $U_x$ of $x$ in which $f$ satisfies a Lipschitz condition. It follows from a theorem of H. Rademacher that $f$ is differentiable almost everywhere in $U_x$. Let $\Omega$ be the set of points in $U_x$ at which $f$ is not differentiable, and let $S$ be any other set of Lebesgue measure 0. Then the generalized gradient (also called subdifferential) of $f$ at $x$ is defined as

$$\partial f(x) = \operatorname{conv}\Big\{\lim_{n \to \infty} \nabla f(x_n) : x_n \to x,\ x_n \notin S \cup \Omega\Big\},$$

where conv denotes the convex hull, cf. [14]. Let $(x, y) \subset \mathbb{R}^m$ denote the open line segment between $x$ and $y$. A theorem of G. Lebourg says that, if $y \in U_x$ with $(x, y) \subset U_x$ is given, then some $u \in (x, y)$ exists such that

$$f(y) - f(x) \in (y - x)^T \partial f(u). \qquad (9)$$

Locally, (9) can be approximated by means of the Lipschitz constant. Globally, (9) can be used to construct inclusion functions of $f$ of mean value type explicitly. If $G(Y)$ is a (not necessarily bounded) box that contains $\partial f(u)$ for any $u \in Y$, then

$$F(Y) = f(c) + (Y - c)^T G(Y) \quad \text{for } Y \in I(X),$$

where $c$ denotes the midpoint of $Y$ (any other point of $Y$ may also be chosen), is an inclusion function of $f$ and is appropriate for use in Algorithms 1 to 3. Furthermore, $G(Y)$ can be used for the monotonicity test: if some component of $G(Y)$ does not contain zero, then $f$ is strictly monotone with respect to the corresponding direction.
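For the locally Lipschitz function $f(x) = |x|$, the generalized gradient is $\{-1\}$ for $x < 0$, $\{1\}$ for $x > 0$ and $[-1, 1]$ at $x = 0$. A box $G(Y)$ containing $\partial f(u)$ for all $u \in Y$ is therefore easy to write down. The sketch builds the mean-value inclusion $F(Y) = f(c) + (Y - c)G(Y)$ with $c$ the midpoint of $Y$, in one dimension and without outward rounding. The function names are hypothetical.

```python
from typing import Tuple

Iv = Tuple[float, float]


def imul(u: Iv, v: Iv) -> Iv:
    p = (u[0] * v[0], u[0] * v[1], u[1] * v[0], u[1] * v[1])
    return (min(p), max(p))


def G_abs(Y: Iv) -> Iv:
    """Box containing the generalized gradient of |x| at every point of Y."""
    lo, hi = Y
    if hi < 0.0:
        return (-1.0, -1.0)
    if lo > 0.0:
        return (1.0, 1.0)
    return (-1.0, 1.0)          # Y contains 0, where the subdifferential is [-1, 1]


def F_abs(Y: Iv) -> Iv:
    """Mean-value inclusion F(Y) = f(c) + (Y - c) G(Y) for f(x) = |x|, c = mid(Y)."""
    c = 0.5 * (Y[0] + Y[1])
    t = imul((Y[0] - c, Y[1] - c), G_abs(Y))
    return (abs(c) + t[0], abs(c) + t[1])


print(F_abs((-1.0, 2.0)))   # (-1.0, 2.0), which contains the range [0, 2]
print(F_abs((1.0, 3.0)))    # (1.0, 3.0)
```

Since `G_abs((1.0, 3.0))` is `(1.0, 1.0)` and excludes zero, the monotonicity test applies to that box.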

Algorithms 1 to 3, as well as the monotonicity test, can therefore be applied to problem (1) without modification if the objective function $f$ is locally Lipschitz. It is, however, only possible to apply the interval Newton algorithm to a very restricted class of functions, since second ‘derivatives’ of locally Lipschitz functions have not yet been explored satisfactorily. With the aid of the infinite interval arithmetic mentioned in the subsection above, one can also admit and handle unbounded subdifferentials. For the construction of inclusion functions of the objective function and the subdifferential, and for numerical tests (with bounded and unbounded search areas), see, for instance, [27,28,55,87,90,94]. For further results in connection with estimates of the penalty factor see [111].


Constrained Optimization 


The principles developed in the previous sections are also useful for constrained problems, that is,

$$\min_{x \in M} f(x), \qquad (10)$$

where $M \subseteq \mathbb{R}^m$ denotes the feasible set defined by the constraints

$$\begin{aligned} g_i(x) &\le 0, \quad i = 1, \ldots, k,\\ h_j(x) &= 0, \quad j = 1, \ldots, s. \end{aligned}$$
For simplicity, we assume that $M \subseteq X$ for some $X \in I^m$ and that the functions $f$, $g_i$ and $h_j$ are defined on $X$. For a successful treatment of problem (10) we need inclusion functions $F$, $G_i$ and $H_j$ of $f$, $g_i$ and $h_j$, respectively, which satisfy (8) and have the property that

$$w(G_i(Y)) \to 0 \ \text{as } w(Y) \to 0, \qquad w(H_j(Y)) \to 0 \ \text{as } w(Y) \to 0$$

for $i = 1, \ldots, k$, $j = 1, \ldots, s$, and $Y \in I(X)$.
Then a very effective means of interval arithmetic is 


the infeasibility test which is applicable to any Y € I(X): 
If either 


(11) 


G;(Y) >0 forsomei € {1,...,k} 
or if 


0¢H(Y) for some j € {1,...,s} 


then all points of $Y$ are infeasible. (The notation $[a,b] > 0$ or $[a,b] < 0$ indicates that $a > 0$ or $b < 0$ holds, respectively.) Hence the box $Y$ cannot contain a solution of (10), so $Y$ can be discarded from any procedure for solving (10). On the other hand, if

$$ G_i(Y) \le 0 \quad \text{for } i = 1,\dots,k \qquad \text{and} \qquad H_j(Y) = 0 \quad \text{for } j = 1,\dots,s $$

(that is, $H_j(Y) = [0,0]$), then all points of $Y$ are feasible (feasibility test). This is due to the inclusion principle (3). By that principle, $a \in Y$ implies $g_i(a) \in G_i(Y)$ as well as $h_j(a) \in H_j(Y)$ for all indices $i$ and $j$, that is, $g_i(a) \le 0$ and $h_j(a) = 0$ for all $i$ and $j$. This guarantees that every point $a \in Y$ is feasible. However, if equality constraints are present in (10), it is extremely unlikely that conditions like $H_j(Y) = 0$ are satisfied. The feasibility test is therefore of little more than academic interest if $s > 0$.
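The infeasibility and feasibility tests reduce to endpoint comparisons on the constraint enclosures. The sketch below assumes the enclosures G_i(Y) and H_j(Y) have already been evaluated and are given as (lower, upper) pairs. `classify_box` is a hypothetical helper, and plain floating-point comparisons stand in for properly rounded interval bounds.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper)

def classify_box(G: List[Interval], H: List[Interval]) -> str:
    """Classify a box Y from enclosures G_i(Y) of g_i(x) <= 0 and H_j(Y) of h_j(x) = 0."""
    # Infeasibility test: some G_i(Y) > 0, or 0 not contained in some H_j(Y).
    if any(lo > 0 for lo, _ in G) or any(not (lo <= 0 <= hi) for lo, hi in H):
        return "infeasible"
    # Feasibility test: every G_i(Y) <= 0 and every H_j(Y) == [0, 0].
    if all(hi <= 0 for _, hi in G) and all(lo == 0 == hi for lo, hi in H):
        return "feasible"
    # Neither test is conclusive for this box.
    return "indeterminate"
```

With a nondegenerate equality constraint, `H` will almost never be exactly `[0, 0]`, which is why such boxes usually stay indeterminate.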

There are two main approaches to solving the constrained problem. The first is to transform the problem into an unconstrained problem within a penalty setup. One then applies the methods of the previous sections together with the feasibility and infeasibility tests, in order to guarantee that points lie in M or to discard infeasible regions. The second is a direct approach, in which Algorithm 3 is enriched by the feasibility and infeasibility tests and adapted to handle the constrained case. We now briefly discuss these two possibilities.


The Penalty Approach 


Two kinds of penalty functions are usually preferred. The first is the so-called $L_1$-exact penalty function

$$ \varphi(x) = \mu f(x) + \sum_{i=1}^{k} \max(0, g_i(x)) + \sum_{j=1}^{s} |h_j(x)|, $$

cf. also the subsection 'nonsmooth optimization' in the previous section. The second, already introduced by R. Courant, is defined as

$$ \psi(x) = \mu f(x) + \sum_{i=1}^{k} \max(0, g_i(x))^2 + \sum_{j=1}^{s} (h_j(x))^2. $$

In both cases, $\mu$ is a penalty factor. For details, and for how penalty methods are applied to solve constrained optimization problems, cf. [25]. (Augmented Lagrangian functions could also be used for the penalty approach.)

When solving (10) locally with standard noninterval methods, $\varphi$ has the advantage that there exists a $\mu$ such that the local minimizers of $\varphi$ are also local minimizers of (10). Its disadvantage is that it is nonsmooth. The use of $\psi$ has the advantage of dealing with a smooth function (provided $f$ and the constraints are smooth). Its disadvantage is that the minimizers of $\psi$ may attain the solutions of (10) only asymptotically as $\mu$ tends to zero.

If $f$ and the constraint functions are smooth, then for both penalty functions there exists a value of $\mu$ such that the global minimizers of $\varphi$ and $\psi$ are also global solutions of (10) when (10) is solved with interval methods. The explicit determination of this value of $\mu$ is still under investigation, cf. [111]. On the other hand, knowledge of this value is not necessary if only convergence is required. Infeasible regions are removed by the infeasibility test, which has to be incorporated into the prototype algorithm, such as Algorithm 3. Knowing the value of $\mu$ does, however, accelerate the computation. A further discussion would be too extensive for this article.
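For reference, the two penalty functions can be written as ordinary (noninterval) Python closures. This sketch assumes that the penalty factor μ multiplies f in both cases, as in the Courant function. The helper names are hypothetical. An interval version would evaluate inclusion functions of f, g_i and h_j instead of point values.

```python
from typing import Callable, Sequence

Fn = Callable[[Sequence[float]], float]

def l1_exact_penalty(f: Fn, gs: Sequence[Fn], hs: Sequence[Fn], mu: float) -> Fn:
    """phi(x) = mu*f(x) + sum max(0, g_i(x)) + sum |h_j(x)|  (nonsmooth)."""
    def phi(x: Sequence[float]) -> float:
        return (mu * f(x)
                + sum(max(0.0, g(x)) for g in gs)
                + sum(abs(h(x)) for h in hs))
    return phi

def courant_penalty(f: Fn, gs: Sequence[Fn], hs: Sequence[Fn], mu: float) -> Fn:
    """psi(x) = mu*f(x) + sum max(0, g_i(x))^2 + sum h_j(x)^2  (smooth if f, g, h are)."""
    def psi(x: Sequence[float]) -> float:
        return (mu * f(x)
                + sum(max(0.0, g(x)) ** 2 for g in gs)
                + sum(h(x) ** 2 for h in hs))
    return psi

# Example: f = x0 + x1, g: x0 - 1 <= 0, h: x0 - x1 = 0, mu = 0.5
phi = l1_exact_penalty(lambda x: x[0] + x[1], [lambda x: x[0] - 1], [lambda x: x[0] - x[1]], 0.5)
print(phi([3.0, 1.0]))  # 6.0
```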


The Direct Approach 


Algorithm 3 is also appropriate as a base algorithm for dealing with the constrained case. To take the constraints into account, one just has to add the feasibility and infeasibility tests. The latter is applied as a box-deleting device to the boxes $V_1$ and $V_2$ after Step 5 of the algorithm. If a box turns out to be feasible, it should be marked as such by a flag or a Boolean value. The remaining boxes of the list are indeterminate: the tests executed up to the current state of the computation have not yet been able to decide whether the box is feasible or not.

It can happen that boxes $V_i$ which are feasible (respectively, infeasible) are not recognized as such by the feasibility (respectively, infeasibility) test. This is due to the excess width (see the section 'interval arithmetic' above). For instance, it can cause $0 \in G_i(V)$ for some box $V$ even though $g_i(V) > 0$, that is, $g_i(x) > 0$ for all $x \in V$. The continued processing of the indeterminate boxes of the list by the steps of Algorithm 3, however, reduces the box widths to zero. Their excess widths therefore also tend to zero as long as (11) is assumed. This also implies that, as the computation proceeds, the union of the boxes of the list $\mathcal{L}$ tends to $M$ with respect to the Hausdorff metric (cf. [90]), provided the midpoint test is not applied.

The midpoint test itself helps to discard feasible as well as indeterminate boxes which contain no global minimizer. It is executed as in the unconstrained case. Let $\tilde f_n$ be the lowest function value which has been calculated up to the completion of the list $\mathcal{L}_n$. Then all pairs $(Z_{ni}, z_{ni})$ of $\mathcal{L}_n$ are discarded that satisfy

$$ \tilde f_n < z_{ni}, $$

where $z_{ni}$ is the lower bound of $f$ stored with the box $Z_{ni}$.
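The midpoint test, and the rule that only values at points of M may enter it, can be sketched as follows. Each list entry is assumed to be a pair (Z, z), with z a lower bound of f over the box Z. `f_tilde` is the best function value attained so far at a certified feasible point. Both helper names are hypothetical.

```python
from typing import Any, List, Tuple

Entry = Tuple[Any, float]  # (box Z, lower bound z of f over Z)

def update_best(f_tilde: float, value: float, certified_feasible: bool) -> float:
    """Lower f_tilde only with a function value taken at a point certified to lie in M."""
    return min(f_tilde, value) if certified_feasible else f_tilde

def midpoint_test(entries: List[Entry], f_tilde: float) -> List[Entry]:
    """Discard every pair (Z, z) with f_tilde < z; all other entries are kept."""
    return [(Z, z) for (Z, z) in entries if not f_tilde < z]
```

Admitting a value from an uncertified point could make `f_tilde` too small. The test could then delete a box that contains the global minimizer.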


It is important for the correctness of the algorithm that only function values of points $x \in M$ are admitted. Hence, if $x$ is taken from a feasible box of the list, $x$ is certainly feasible. If the list contains only indeterminate boxes, no direct access to feasible points of $M$ is at hand. This is regularly the case if equality constraints are present. But without knowledge of points $x \in M$ the midpoint test cannot be executed.

Two possibilities are known for overcoming this hurdle. The first is the so-called $\varepsilon$-inflation, which accepts that the constraints are satisfied only within a tolerance $\varepsilon$. $\varepsilon$-inflation is widely used in noninterval computations, but applying it sacrifices the reliability of the computation. This possibility is therefore avoided as far as possible in interval computations.

The second possibility for overcoming the difficulties caused by equality constraints is based on Moore's test for the existence of solutions of equations [73]. Hansen and G.W. Walster [38] were the first to suggest applying this test to constrained optimization. It is used as follows. Suppose $Y$ is an indeterminate box under processing and one looks for a feasible point in $Y$. The equality constraints and the inequality constraints which are active with respect to $Y$ are combined into a system of equations. Interval Newton iterations are then applied to this system in $Y$, not to solve it but only to prove, via a contraction of the Newton operator, that a solution exists within $Y$. If the proof succeeds, all pairs $(Z_{ni}, z_{ni})$ of the list (feasible or indeterminate) that satisfy $\max f(Y) < z_{ni}$ can be discarded, since $\max f(Y)$ is an upper bound for the function value of a feasible point. If the system of equations has more variables than equations, some variables are replaced by constants.

The existence test in $Y$ is best done as follows. First, apply a simple local noninterval optimization algorithm to the objective function

$$ w(x) = \sum_{i=1}^{k} \max(0, g_i(x))^2 + \sum_{j=1}^{s} (h_j(x))^2 $$

(this is the Courant penalty function for $f(x) = 0$, cf. the first subsection of this section) in order to come near a feasible point, say $c$. Then place a small box around $c$ that lies in $Y$. Finally, apply the existence test to the system in this box, possibly after removing inequality constraints that have meanwhile become inactive. If the test is positive, the existence of a feasible point in the box, and hence in $Y$, is guaranteed. However, if the test fails, this is not at all a proof that $Y$ is infeasible.
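The local search that precedes the existence test can be sketched as projected gradient descent on w with central-difference gradients. This is only an illustration under stated assumptions. The step size, iteration count, difference step and box radius `r` are arbitrary choices, and the helper names are hypothetical. The existence test itself (for example, an interval Newton step on the small box) is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Fn = Callable[[Sequence[float]], float]
Box = List[Tuple[float, float]]

def w(x: Sequence[float], gs: Sequence[Fn], hs: Sequence[Fn]) -> float:
    """Courant penalty with f = 0: sum max(0, g_i)^2 + sum h_j^2."""
    return sum(max(0.0, g(x)) ** 2 for g in gs) + sum(h(x) ** 2 for h in hs)

def approach_feasible_point(x0, Y: Box, gs, hs, step=0.05, iters=2000, fd=1e-6):
    """Projected gradient descent on w inside the box Y (central-difference gradient)."""
    x = list(x0)
    for _ in range(iters):
        grad = []
        for k in range(len(x)):
            xp, xm = list(x), list(x)
            xp[k] += fd
            xm[k] -= fd
            grad.append((w(xp, gs, hs) - w(xm, gs, hs)) / (2 * fd))
        # Gradient step, then clip back into Y so that the point c stays in Y.
        x = [min(max(xi - step * gi, lo), hi) for xi, gi, (lo, hi) in zip(x, grad, Y)]
    return x

def small_box_around(c: Sequence[float], Y: Box, r: float = 1e-3) -> Box:
    """Small box around c, intersected with Y, for the subsequent existence test."""
    return [(max(ci - r, lo), min(ci + r, hi)) for ci, (lo, hi) in zip(c, Y)]

# Example: g(x) = x0 + x1 - 1 <= 0, h(x) = x0 - 2*x1 = 0, Y = [0, 3] x [0, 3]
gs = [lambda x: x[0] + x[1] - 1]
hs = [lambda x: x[0] - 2 * x[1]]
Y = [(0.0, 3.0), (0.0, 3.0)]
c = approach_feasible_point([3.0, 3.0], Y, gs, hs)
```

For this toy data the iterate ends up very close to the feasible segment x0 = 2·x1, x0 + x1 ≤ 1. It is close to, but not certified to be, a feasible point.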

An improvement is due to [58], where techniques to search for points $c \in Y$ are designed so as to optimize the chances of finding a nearby feasible point. Moreover, the number of variables may be larger than the number of equations in the underlying system.

The convergence of the union of the list boxes to 
the set of global minimizers can be shown if the test for 
the existence of feasible points is applied systematically 
and successfully to the boxes of the list (as far as they are 
indeterminate). Other convergence proofs can be found 
in [8,90]. 

To obtain an algorithm that is not only convergent but also fast, acceleration devices and related techniques are again extremely important for practical computations. Well-known techniques are the following:

i) Interval Newton iterations. They are applied to the F. John conditions to enclose the stationary points, similarly to the unconstrained case. The F. John conditions contain one more unknown than equations, so an additional equation is added which does not affect the stationary points, cf. [39]. As in the unconstrained case, the interval Newton iterations are not executed until termination; instead, they are merged with the steps of the optimization algorithm. Again, if an iteration proves the existence of an F. John point, this point is feasible and can be used for the midpoint test.

In contrast to several authors we do not count the 
interval Newton iterations as basic steps of an opti- 
mization algorithm, since they do not influence the 
convergence, only the convergence speed of the al- 
gorithm. 

ii) Monotonicity and nonconvexity tests, cf. the section on 'accelerating and related devices'. These tests are best applied to feasible boxes, but there are also exceptions to this rule, cf. [70,93]. Linearization of the constraints, supplementing the infeasibility test, is also used [33].

iii) Good inclusion functions, slope arithmetic, automatic differentiation, bisections, parallel algorithms and constraint logic programming were already mentioned in the section on 'accelerating and related devices'.

iv) Local search devices. To obtain function values of feasible points early, local noninterval optimization procedures are applied to the function $w$, as defined above, restricted to the current box. The search runs until it reaches a feasible point or comes close to one. In the latter case the existence test has to be applied at this approximation, with respect to a small surrounding box, in order to guarantee existence. For full-dimensional feasible domains, the local search can be continued with $\varphi$ instead of $w$, but one has to take care not to leave the domain $M$.

It turned out that the performance of the algorithm was 
greatly influenced by how the steps of the optimiza- 
tion algorithm and the acceleration devices were com- 
bined. Several investigations dealing with this matter 
have been done, cf. for example, [8,18,19,26,38,39,49, 
56,95,100,109,110]. 


Applications 


Global optimization using interval arithmetic has been 
applied to optimization problems in a variety of science, 
engineering and social science areas. Below we briefly 
describe representative examples from several areas. 


Chemistry and Chemical Engineering 


Many optimization problems in the fields of chem- 
istry and chemical engineering can be investigated ef- 


fectively using the tools described in the previous sec- 
tions. 

As a first example we consider the diagram of a chemical process showing the processing units and the connections between them. Such a diagram depicts the flow of chemical components through the system. This is often referred to as process flowsheeting, and the associated optimization problems are called process flowsheeting problems. They require the solution of large sparse differential-algebraic systems. In [99] a parallel interval Newton algorithm combined with bisection techniques is applied to a number of simple problems of this type. Parallelization is required in order to complete the computations within a reasonable time frame.

The reliable prediction of phase stability in a chem- 
ical process simulation has been considered by [42,43]. 
It is pointed out that conventional methods that are ini- 
tialization dependent may converge to trivial or non- 
physical solutions or to a nonglobal local minimum. 
It is furthermore shown that these difficulties can be avoided by using a cubic equation-of-state model combined with interval tools. Their technique is independent of initialization and solves the phase stability problem with complete reliability. In [44] the approach is
further developed with respect to computational effi- 
ciency. An enhanced method is presented based on 
sharpening the range of the interval functions that oc- 
cur in the algorithm. It is shown that the computation 
time can be reduced by nearly an order of magnitude in 
some cases. 

The paper [69] addresses the problem of minimiz- 
ing the Gibbs free energy in the m-component mul- 
tiphase chemical and phase equilibrium problem in- 
volving different thermodynamic models. The solution 
method is based on the tangent-plane criterion of Gibbs 
and it is reduced to a finite sequence of local optimization steps in $K(m-1)$-dimensional space, where $K \le m$ is the number of phases, and global optimization steps in $(m-1)$-dimensional space. The algorithm developed in the lower-dimensional space uses
techniques from interval analysis. Some promising results are reported for the algorithm. A parallel interval algorithm for the problem was developed in [9].

Chemists performing photoelectron spectroscopy collide photons with atoms or molecules. These collisions result in the ejection of photoelectrons. The chemist is left with a photoelectron spectrum, which is a plot of the number of photoelectrons ejected as a function of their kinetic energy. A typical spectrum consists of a number of peaks, and the chemist would like to resolve the individual peaks. In the paper [76] a test problem is constructed as a sum of two Gaussian functions involving a number of parameters. These parameters are found using interval techniques of global optimization.


Physics, Electronics and Mechanical Engineering 


A wide variety of problems in physics, electronics and 
mechanical engineering can be formulated as optimiza- 
tion problems amenable to the techniques described in 
the previous sections. We provide some representative 
examples below. 

An early application is found in [68], where interval global optimization is applied to electronic switching systems for efficiency reasons.

In [10] interval global optimization is used to deter- 
mine rigorous bounds on Taylor maps of general opti- 
cal systems. It is also pointed out that stability for stor- 
age rings and other weakly nonlinear systems can be 
guaranteed using their developments. 

In [78] Hansen's method is applied to a demagnifying system for an electron beam lithography device, in order to find all real minimizers of a real-valued objective function of several variables.

Computer-aided simulation tools for liquid crystal 
displays have been developed in recent years. These 
tools calculate the molecule orientation of the liquid 
crystal material by minimizing an energy function. The 
results of such simulations are used to optimize note- 
book computer displays. In the paper [80] interval 
global optimization is used to calculate all minimizing 
molecule configurations. 

Interval global optimization is applied to the opti- 
mal design of a flat composite plate and a composite 
stiffened panel structure in [63]. The methodology is to 
generate a feasible suboptimal interval which is used to 
examine the manufacturing tolerance in the design op- 
timization. 


Economics 

Global optimization using interval analysis has also 
found applications in economics. Two examples are 
presented below. 


A model of copyable products such as software is 
considered by [107] who based their model on the 
model developed by I.E. Besanko and W.L. Winston 
[11]. In the paper [107] this model is solved for a glob- 
ally optimal result using an interval branch and bound 
method. 

In [50] another problem in economics is consid- 
ered. The problem is to minimize an econometric func- 
tion 


Yin _ Bi _ BoX2 _ PxXey 


where the data are artificially generated for the vari- 
ables. Several tests are performed and it is shown that 
interval methods are competitive with other methods 
such as simulated annealing. 


See also 


> αBB Algorithm

> Automatic Differentiation: Point and Interval 

> Automatic Differentiation: Point and Interval 
Taylor Operators 

> Bounding Derivative Ranges 

> Continuous Global Optimization: Applications 

> Continuous Global Optimization: Models, 
Algorithms and Software 

> Global Optimization in the Analysis and 
Management of Environmental Systems 

> Global Optimization: Application to Phase 
Equilibrium Problems 

> Global Optimization in Batch Design Under 
Uncertainty 

> Global Optimization in Generalized Geometric 
Programming 

> Global Optimization Methods for Systems of 
Nonlinear Equations 

> Global Optimization in Phase and Chemical 
Reaction Equilibrium 

> Interval Analysis: Application to Chemical 
Engineering Design Problems 

> Interval Analysis: Differential Equations 

> Interval Analysis: Eigenvalue Bounds of Interval 
Matrices 

> Interval Analysis: Intermediate Terms 

> Interval Analysis: Nondifferentiable Problems 

> Interval Analysis: Parallel Methods for Global 
Optimization 
> Interval Analysis: Subdivision Directions in Interval
Branch and Bound Methods 

> Interval Analysis: Systems of Nonlinear 
Equations 

> Interval Analysis: Unconstrained and Constrained 
Optimization 

> Interval Analysis: Verifying Feasibility 

> Interval Constraints 

> Interval Fixed Point Theory 

> Interval Linear Systems 

> Interval Newton Methods 

> MINLP: Branch and Bound Global Optimization 
Algorithm 

> MINLP: Global Optimization with αBB

> Mixed Integer Nonlinear Programming 

> Smooth Nonlinear Nonconvex Optimization 
Interval Linear Systems

In many applications the coefficients of real linear systems are not known exactly, due to measurement or approximation errors. Therefore, the family of real linear systems

$$ A \cdot x = b, \tag{1} $$

where $A$, $b$ satisfy the inequalities

$$ |A^c - A| \le \Delta, \qquad |b^c - b| \le \delta, \tag{2} $$

is considered. The absolute value and comparisons are used entrywise. The matrices $A^c, A, \Delta \in \mathbb{R}^{n \times n}$ are real $n \times n$ matrices, $b^c, b, \delta \in \mathbb{R}^n$, and $\Delta$, $\delta$, which describe the perturbation bounds, are assumed to be nonnegative. This family of real linear systems is called an interval linear system. The reason is that each matrix $A$ is contained in the interval matrix $\mathbf{A} := [A^c - \Delta, A^c + \Delta]$, and each right-hand side $b$ is contained in the interval vector $\mathbf{b} := [b^c - \delta, b^c + \delta]$. $A^c$ and $b^c$ are called the centers of the interval linear system.

The corresponding solution set $X$ is defined as the union of all solutions of this family, that is,

$$ X := \{x \in \mathbb{R}^n : x, A, b \text{ satisfy (1), (2)}\}. \tag{3} $$

Naturally, the main interest is to determine the exact range of each component of the solution set, that is, to calculate the exact or optimal componentwise bounds

$$ \min\{x_i : x \in X\}, \qquad \max\{x_i : x \in X\} \tag{4} $$

for $i = 1,\dots,n$. The minima and maxima exist provided $\mathbf{A}$ is regular, that is, all matrices $A \in \mathbf{A}$ are nonsingular. Otherwise, $\mathbf{A}$ is called singular, and $X$ is unbounded or empty.

In general, the solution set $X$ is not convex and has a complicated shape: see Fig. 1, which is taken from a book by A. Neumaier [18, p. 97]. Hence, calculating bounds for the solution set $X$ is a global optimization problem. Moreover, $X$ need not be connected or bounded. This is shown by the simple one-dimensional equation $A \cdot x = 1$, $A \in [-1, 1]$, with solution set $X = (-\infty, -1] \cup [1, \infty)$.

From the point of view of complexity theory, J. Rohn [25] has proved that the problem of calculating bounds for the solution set is NP-hard. Roughly speaking, he has shown that, unless P = NP, there is no polynomial-time algorithm which calculates bounds of the solution set with overestimation less than any given positive constant.


Interval Linear Systems, Figure 1 
A projection of a three-dimensional solution set 
This is true even if the interval matrix $\mathbf{A}$ is strongly regular, i.e., if the spectral radius $\rho(|(A^c)^{-1}| \cdot \Delta)$ is less than one. If $\mathbf{A}$ is strongly regular, then the regularity of $\mathbf{A}$ follows immediately. For $A \in \mathbf{A}$ there holds

$$ A = A^c - \Delta' = A^c \cdot \bigl(I - (A^c)^{-1} \cdot \Delta'\bigr), \tag{5} $$

where $|\Delta'| \le \Delta$. Hence, $A$ is singular if and only if $(A^c)^{-1} \cdot \Delta'$ has the eigenvalue 1. Entrywise, $|(A^c)^{-1}\Delta'| \le |(A^c)^{-1}| \cdot \Delta$, and therefore $\rho\bigl((A^c)^{-1}\Delta'\bigr) \le \rho\bigl(|(A^c)^{-1}| \cdot \Delta\bigr) < 1$. It follows that $A$ is regular. By Perron–Frobenius theory, strong regularity implies that the radius matrix $\Delta$ is not too large. For further NP-hardness results related to other interval problems see [27].
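Strong regularity can be checked approximately in floating point by computing ρ(|(A^c)^{-1}| Δ). The sketch below uses NumPy. It is not a verified test: a rigorous version would need interval bounds on the inverse and on the spectral radius. The function name is hypothetical.

```python
import numpy as np

def strong_regularity(Ac, Delta):
    """Return (is_strongly_regular, rho) with rho = rho(|Ac^{-1}| Delta), in floating point."""
    M = np.abs(np.linalg.inv(np.asarray(Ac, dtype=float))) @ np.asarray(Delta, dtype=float)
    rho = float(np.max(np.abs(np.linalg.eigvals(M))))
    return rho < 1.0, rho

ok, rho = strong_regularity([[-1.2, 1.2], [1.2, 1.2]], [[0.2, 0.2], [0.2, 0.2]])
print(ok, round(rho, 4))  # True 0.3333
```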

During the last three decades the problem of calculating componentwise bounds for $X$, not necessarily optimal ones, has received much attention, and many methods have been developed. No attempt can be made in this short survey to review all the different approaches. The literature cited in this section may, however, serve as a guide for further reading.

The first algorithm for calculating optimal componentwise bounds was given by W. Oettli and W. Prager [19,20]. There the solution set $X$ is described as the set of feasible solutions of a special system of nonlinear inequalities:

$$ X = \{x \in \mathbb{R}^n : |A^c x - b^c| \le \Delta \cdot |x| + \delta\}. \tag{6} $$

Within each orthant, however, $|x|$ is a linear function of $x$, so there this system describes a convex polyhedron. Hence, in each orthant optimal bounds can be calculated by linear programming techniques. Unfortunately, there are $2^n$ orthants. This method therefore requires exponential time for every instance and can work only for problems of very small size.
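The Oettli–Prager characterization (6) gives a direct membership test for X. The following NumPy sketch checks it for a given point. The function name and the optional tolerance `tol` are assumptions added for illustration.

```python
import numpy as np

def in_solution_set(x, Ac, Delta, bc, delta, tol=0.0):
    """Oettli-Prager test (6): |Ac x - bc| <= Delta |x| + delta, componentwise."""
    x = np.asarray(x, dtype=float)
    lhs = np.abs(np.asarray(Ac, float) @ x - np.asarray(bc, float))
    rhs = np.asarray(Delta, float) @ np.abs(x) + np.asarray(delta, float)
    return bool(np.all(lhs <= rhs + tol))
```

Consider the one-dimensional example A·x = 1, A ∈ [−1, 1], i.e. A^c = 0, Δ = 1, b^c = 1, δ = 0. The test accepts x = 2 and x = −1 and rejects x = 0.5, in agreement with X = (−∞, −1] ∪ [1, ∞).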

Recently, based on the result of Oettli and Prager, a more efficient method for calculating optimal bounds was presented in [9]. This method uses linear programming techniques only in those orthants which are intersected by the solution set $X$.

Starting with the pioneering book of R.E. Moore 
[15], a large number of methods were proposed using 
the tools of interval arithmetic. Many algorithms can 
be found for example in the monographs [2,16], and 
[18]. These methods are polynomial time algorithms, 
calculate only componentwise (not optimal) bounds, 
and work under special assumptions: in almost all cases 
strong regularity of A is required. 


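In interval arithmetic the elementary operations for intervals $\mathbf{x} = [\underline{x}, \overline{x}]$, $\mathbf{y} = [\underline{y}, \overline{y}] \in \mathbb{IR}$ are defined by

$$ \mathbf{x} * \mathbf{y} = \{x * y : x \in \mathbf{x},\ y \in \mathbf{y}\}, \tag{7} $$

where $* \in \{+, -, \cdot, /\}$, and in the case of division $0 \notin \mathbf{y}$ is assumed. By a simple monotonicity argument it follows that

$$ \mathbf{x} * \mathbf{y} = [\min S, \max S], \tag{8} $$

where the set $S$ is defined by $S := \{\underline{x} * \underline{y},\ \underline{x} * \overline{y},\ \overline{x} * \underline{y},\ \overline{x} * \overline{y}\}$.

Interval operations between real matrices, interval matrices, real vectors and interval vectors are defined as in the real case; only the real operations are replaced by the corresponding interval operations (7).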
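Formula (8) translates directly into a small interval type. This Python sketch uses ordinary floating point without outward rounding. It therefore illustrates the definitions rather than providing rigorous enclosures. The class name `Interval` is our own.

```python
import operator
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; plain floating point, no outward rounding."""
    lo: float
    hi: float

    def _op(self, other: "Interval", op) -> "Interval":
        # Formula (8): [min S, max S] over the four endpoint combinations.
        s = [op(a, b) for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(min(s), max(s))

    def __add__(self, other: "Interval") -> "Interval":
        return self._op(other, operator.add)

    def __sub__(self, other: "Interval") -> "Interval":
        return self._op(other, operator.sub)

    def __mul__(self, other: "Interval") -> "Interval":
        return self._op(other, operator.mul)

    def __truediv__(self, other: "Interval") -> "Interval":
        if other.lo <= 0 <= other.hi:
            raise ZeroDivisionError("0 lies in the divisor interval")
        return self._op(other, operator.truediv)

print(Interval(1, 2) * Interval(-3, 4))  # Interval(lo=-6, hi=8)
```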

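For example, if $R = (r_{ij}) \in \mathbb{R}^{n \times n}$ is a real $n \times n$ matrix and $\mathbf{b} \in \mathbb{IR}^n$, then $R \cdot \mathbf{b}$ is defined as follows: the real coefficients $r_{ij}$ are replaced by the point intervals $\mathbf{r}_{ij} = [r_{ij}, r_{ij}]$ and

$$ (R \cdot \mathbf{b})_i = \sum_{j=1}^{n} \mathbf{r}_{ij} \cdot \mathbf{b}_j. $$

By definition (7), and since each $\mathbf{b}_j$ occurs only once, for all $i$ the equation

$$ (R \cdot \mathbf{b})_i = \Bigl\{\sum_{j=1}^{n} r_{ij} b_j : b \in \mathbf{b}\Bigr\} $$

holds. Therefore $R \cdot \mathbf{b}$ is the smallest interval vector containing the set $\{R \cdot b : b \in \mathbf{b}\}$. In general, however, $R \cdot \mathbf{b}$ overestimates the latter set.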


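Example 1 Let

$$ R = \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} [1, 2] \\ [1, 4] \end{pmatrix}; $$

then $R \cdot \mathbf{b} = ([4, 14], [2, 6])^T$, but $\{R \cdot b : b \in \mathbf{b}\}$ is the convex hull of the set $\{(4, 2)^T, (13, 5)^T, (5, 3)^T, (14, 6)^T\}$, see Fig. 2.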
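The overestimation in Example 1 can be reproduced numerically. The sketch assumes R = ((1, 3), (1, 1)) and b = ([1, 2], [1, 4])^T, the data consistent with the results stated in Example 1. The helper name is hypothetical.

```python
from itertools import product

def real_times_interval_vector(R, b):
    """R * b with point intervals r_ij = [r_ij, r_ij]; intervals are (lo, hi) tuples."""
    out = []
    for row in R:
        lo = sum(min(r * bj[0], r * bj[1]) for r, bj in zip(row, b))
        hi = sum(max(r * bj[0], r * bj[1]) for r, bj in zip(row, b))
        out.append((lo, hi))
    return out

R = [[1, 3], [1, 1]]
b = [(1, 2), (1, 4)]
print(real_times_interval_vector(R, b))  # [(4, 14), (2, 6)]

# Images of the four corners of b; the exact set {R b : b in b} is their convex hull.
corners = [tuple(sum(r * v for r, v in zip(row, c)) for row in R) for c in product(*b)]
print(corners)  # [(4, 2), (13, 5), (5, 3), (14, 6)]
```

The interval result [4, 14] × [2, 6] is the bounding box of the parallelogram spanned by these four points. This is why R·b overestimates the exact image set.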


In many interval methods for calculating bounds of $X$, the interval linear system is first preconditioned by an appropriate matrix $R$, which in most cases is an approximate inverse of the center $A^c$. This yields the preconditioned interval linear system

$$ (R \cdot \mathbf{A}) \cdot x = R \cdot \mathbf{b} \tag{9} $$

with interval matrix $R \cdot \mathbf{A}$ and interval right-hand side $R \cdot \mathbf{b}$.

Interval Linear Systems, Figure 2

From the discussion above it follows that $R \cdot \mathbf{A}$ and $R \cdot \mathbf{b}$ overestimate the sets $\{R \cdot A : A \in \mathbf{A}\}$ and $\{R \cdot b : b \in \mathbf{b}\}$, respectively. Therefore, the solution set $X'$ of the preconditioned interval linear system (9) contains, but overestimates, the solution set $X$. However, for small $\Delta$ the interval matrix $R \cdot \mathbf{A}$ is close to the identity matrix and diagonally dominant. Hence, only small overestimations will occur.

Bounds for the solution set X’ of the precondi- 
tioned interval linear system (9) are then calculated 
by using interval Gaussian elimination (cf. for exam- 
ple [1,7]), interval Gauss-Seidel iteration (cf. for exam- 
ple [6,17,22]), or fixed point iteration (cf. for example 
[14,28]). In general, these bounds are not optimal for 
X’, and the overestimation depends on the method. 

However, E.R. Hansen [5] and Rohn [26] have recently presented a polynomial-time algorithm for calculating optimal bounds for the solution set $X'$ of the preconditioned interval system (9). Only two matrix inversions are required.

Preconditioning of interval linear systems was first 
suggested by Hansen and R. Smith [7]. Later, R.B. Kear- 
fott [11,12] introduced the so-called width optimal pre- 
conditioners by using linear programming techniques. 

Preconditioning requires the computation of an approximate inverse. For sparse linear systems the inverse is in general full. Therefore, the approaches described above are not applicable for large dimensions. Recently, however, S.M. Rump [29,30] generalized his iteration method (cf. [28]) to sparse nonlinear systems without preconditioning with a full inverse. Roughly speaking, his idea was to replace the inverse by a lower bound of the smallest singular value of the center matrix.


Last, an interval method not using precondition- 
ing should be mentioned. This method is a branch and 
bound scheme proposed by S.P. Shary [31]. 

In the following, two methods are described in more detail. First, in the next section, Rump's method [28] is presented. This method is implemented in several programming packages such as ACRITH [8], ARITHMOS [3], PASCAL-XSC [4] and PROFIL [13]. Moreover, as mentioned above, the method can be modified for solving sparse interval linear systems and nonlinear systems. Then, in the last section, the method presented in [9] for calculating optimal bounds of $X$ is described.


An Iterative Interval Method 


It is assumed that $A$, $b$ satisfy the inequalities (2), $R$ is an approximate inverse of $A^c$, and $\tilde{x}$ is an approximate solution of $A^c x = b^c$. No assumptions about the quality of these approximations are made. It is well known from numerical linear algebra that defect iteration with the iteration function

$$ f(x) := x + R \cdot \bigl(b - A(\tilde{x} + x)\bigr) = R \cdot (b - A\tilde{x}) + (I - RA)\,x \tag{10} $$

can be used to improve the quality of the approximation $\tilde{x}$. This function is continuous. If for a given interval vector $\mathbf{x}$ the condition $f(\mathbf{x}) \subseteq \mathbf{x}$ holds, then by Brouwer's fixed point theorem there exists $x \in \mathbf{x}$ with $f(x) = x$. Using (10) yields $R \cdot (b - A(\tilde{x} + x)) = 0$. If $R$ is nonsingular, then $A(\tilde{x} + x) = b$, so $\tilde{x} + x \in \tilde{x} + \mathbf{x}$ is the exact solution of $Ax = b$. Moreover, a contradiction argument shows the following: provided $f(\mathbf{x})$ is contained in $\operatorname{int}(\mathbf{x})$, the interior of $\mathbf{x}$, the solution $\tilde{x} + x$ is unique and $R$ and $A$ are nonsingular.
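An immediate consequence is that if the condition

$$ R \cdot (\mathbf{b} - \mathbf{A} \cdot \tilde{x}) + (I - R \cdot \mathbf{A}) \cdot \mathbf{x} \subseteq \operatorname{int}(\mathbf{x}) \tag{11} $$

is satisfied, then $f(\mathbf{x}) \subseteq \operatorname{int}(\mathbf{x})$ holds for all $A \in \mathbf{A}$, $b \in \mathbf{b}$. Hence $X \subseteq \tilde{x} + \mathbf{x}$. Notice that (11) can easily be checked by using interval arithmetic.

The remaining problem is to find an appropriate box $\mathbf{x}$ satisfying (11). The following iteration, starting with $\mathbf{x}^0 := R \cdot (\mathbf{b} - \mathbf{A} \cdot \tilde{x})$, can be used:

$$ \begin{aligned} \mathbf{y}^k &:= \mathbf{x}^k \cdot [1 - \varepsilon, 1 + \varepsilon] + [-\mu, \mu] \cdot e, \\ \mathbf{x}^{k+1} &:= \mathbf{x}^0 + (I - R \cdot \mathbf{A}) \cdot \mathbf{y}^k. \end{aligned} \tag{12} $$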
The values $\varepsilon, \mu > 0$ are called inflation parameters, and $e$ is the vector with 1 in each component. The main property of this iteration (cf. [30]) is that for every starting box $\mathbf{x}^0$

$$ \exists\, k \in \mathbb{N}:\ \mathbf{x}^{k+1} \subseteq \operatorname{int}(\mathbf{y}^k) \iff \rho(|I - R \cdot \mathbf{A}|) < 1, $$

where $\rho$ denotes the spectral radius of the absolute value of $I - R \cdot \mathbf{A}$. This means that after a finite number of steps, bounds $\tilde{x} + \mathbf{x}^{k+1}$ of $X$ are calculated, provided the spectral radius $\rho(|I - R \cdot \mathbf{A}|) < 1$; this means that $\mathbf{A}$ is strongly regular. For practical applications it is recommended to execute at most $k = 10$ iteration steps, and $\mu$ should be greater than the smallest positive floating point number. With these parameters we obviously get an $O(n^3)$ polynomial-time algorithm.


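Example 2 To demonstrate how this algorithm works, the following interval linear system with centers

$$ A^c = \begin{pmatrix} -1.2 & 1.2 \\ 1.2 & 1.2 \end{pmatrix}, \qquad b^c = \begin{pmatrix} 1.5 \\ 3.5 \end{pmatrix} $$

and perturbation bounds

$$ \Delta = \begin{pmatrix} 0.2 & 0.2 \\ 0.2 & 0.2 \end{pmatrix}, \qquad \delta = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix} $$

is considered.

This system is a slight modification of an example of Rohn [23]. We have chosen $\varepsilon = 0.05$ and $\mu$ equal to the smallest positive machine number. In the following, five (appropriately rounded) decimal digits are displayed. The two-dimensional interval vector with both components equal to $[-1, 1]$ is denoted by $[-1, 1]$.

The spectral radius is $\rho(|I - R \cdot \mathbf{A}|) \approx 0.3333 < 1$, where $R = (A^c)^{-1}$. Therefore the iteration (12) computes a box containing the solution set $X$ in finitely many steps.

The approximate solution of the center system is $\tilde{x} = (0.83333, 2.0833)^T$, yielding the starting box $\mathbf{x}^0 := R \cdot (\mathbf{b} - \mathbf{A} \cdot \tilde{x}) = 0.9028 \cdot [-1, 1]$.

Iteration (12) yields

$$ \begin{aligned} \mathbf{y}^0 &= 0.9480 \cdot [-1, 1], & \mathbf{x}^1 &= 1.2188 \cdot [-1, 1], \\ \mathbf{y}^1 &= 1.2797 \cdot [-1, 1], & \mathbf{x}^2 &= 1.3294 \cdot [-1, 1], \\ \mathbf{y}^2 &= 1.3959 \cdot [-1, 1], & \mathbf{x}^3 &= 1.3681 \cdot [-1, 1]. \end{aligned} $$

Hence, for $k = 2$ it follows that $\mathbf{x}^3 \subseteq \operatorname{int}(\mathbf{y}^2)$, and the solution set $X$ is contained in

$$ \tilde{x} + \mathbf{x}^3 = \begin{pmatrix} [-0.5348, 2.2014] \\ [0.7152, 3.4514] \end{pmatrix}. \tag{13} $$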
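Iteration (12), together with the verification condition, can be sketched in plain Python. The sketch works in ordinary floating point without outward rounding. Unlike the implementations in the packages mentioned above, it therefore does not produce rigorous bounds. The helper names are hypothetical, and the defaults ε = 0.05, μ = 1e-300 (standing in for a tiny positive constant) and at most 10 steps are assumptions that follow the recommendations in the text. It is applied below to the interval system with A^c = ((−1.2, 1.2), (1.2, 1.2)), b^c = (1.5, 3.5)^T, Δ = 0.2 in every entry, δ = (0.5, 0.5)^T and R = (A^c)^{-1}. For this system it should stop at k = 2, with an enclosure that agrees with (13) to about four decimals.

```python
from typing import List, Tuple

Iv = Tuple[float, float]  # interval (lower, upper); plain floats, no outward rounding

def iadd(x: Iv, y: Iv) -> Iv:
    return (x[0] + y[0], x[1] + y[1])

def imul(x: Iv, y: Iv) -> Iv:
    # Formula (8) for multiplication.
    p = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return (min(p), max(p))

def imatvec(M: List[List[Iv]], v: List[Iv]) -> List[Iv]:
    out = []
    for row in M:
        acc = (0.0, 0.0)
        for mij, vj in zip(row, v):
            acc = iadd(acc, imul(mij, vj))
        out.append(acc)
    return out

def rump_enclosure(Ac, Delta, bc, delta, R, xt, eps=0.05, mu=1e-300, kmax=10):
    """Iteration (12); returns (k, enclosure of X) once x^{k+1} lies in int(y^k), else None."""
    n = len(bc)
    A = [[(Ac[i][j] - Delta[i][j], Ac[i][j] + Delta[i][j]) for j in range(n)] for i in range(n)]
    b = [(bc[i] - delta[i], bc[i] + delta[i]) for i in range(n)]
    Rp = [[(r, r) for r in row] for row in R]                       # R as point intervals
    Axt = imatvec(A, [(v, v) for v in xt])
    res = [(b[i][0] - Axt[i][1], b[i][1] - Axt[i][0]) for i in range(n)]  # b - A*xt
    x0 = imatvec(Rp, res)                                           # x^0 = R*(b - A*xt)
    C = []                                                          # C = I - R*A
    for i in range(n):
        row = []
        for j in range(n):
            s = imatvec([Rp[i]], [A[m][j] for m in range(n)])[0]
            d = 1.0 if i == j else 0.0
            row.append((d - s[1], d - s[0]))
        C.append(row)
    x = x0
    for k in range(kmax):
        # epsilon-inflation: y^k = x^k * [1 - eps, 1 + eps] + [-mu, mu] * e
        y = [iadd(imul(xi, (1 - eps, 1 + eps)), (-mu, mu)) for xi in x]
        x = [iadd(a, c) for a, c in zip(x0, imatvec(C, y))]        # x^{k+1}
        if all(yi[0] < xi[0] and xi[1] < yi[1] for xi, yi in zip(x, y)):
            return k, [(xt[i] + x[i][0], xt[i] + x[i][1]) for i in range(n)]
    return None

Ac = [[-1.2, 1.2], [1.2, 1.2]]
Delta = [[0.2, 0.2], [0.2, 0.2]]
bc, delta = [1.5, 3.5], [0.5, 0.5]
R = [[-5 / 12, 5 / 12], [5 / 12, 5 / 12]]                          # exact inverse of Ac
xt = [sum(r * bi for r, bi in zip(row, bc)) for row in R]           # approx. (0.83333, 2.0833)
result = rump_enclosure(Ac, Delta, bc, delta, R, xt)
```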


For numerical results of this method and its generalization to sparse systems, see [29,30]. There, examples with up to 1,000,000 variables, including the 'Harwell test cases', are presented.


Optimal Bounds 


As pointed out in the introduction, a polynomial time 
algorithm may overestimate the solution set X drasti- 
cally or may fail. Therefore, in this section a method (cf. 
[9]) which produces optimal bounds of X if and only if 
A is regular is described. 

An immediate consequence of (6) is that the solution set $X$ is a finite union of convex polyhedra. To see this, let $\{-1, 1\}^n$ denote the set of all sign vectors with components equal to $1$ or $-1$. For a sign vector $s \in \{-1, 1\}^n$ let $D(s)$ denote the diagonal matrix with diagonal $s$, and let $\mathbb{R}^n(s) := \{x \in \mathbb{R}^n : D(s) \cdot x \ge 0\}$. Then the intersection $X(s) := X \cap \mathbb{R}^n(s)$ of the solution set with the orthant corresponding to $s$ is given by the following system of linear inequalities (in this orthant $|x| = D(s) \cdot x$):

$$ \begin{aligned} (A^c - \Delta \cdot D(s)) \cdot x &\le b^c + \delta, \\ (A^c + \Delta \cdot D(s)) \cdot x &\ge b^c - \delta, \\ D(s) \cdot x &\ge 0. \end{aligned} \tag{14} $$


Therefore, for a fixed orthant $\mathbb{R}^n(s)$, optimal bounds of $X(s)$ can be calculated by minimizing and maximizing each coordinate $x_i$ subject to the constraints (14). These are linear programming problems which can be solved in polynomial time. Hence optimal bounds of $X(s)$ can also be calculated in polynomial time.

Now, one can obtain optimal bounds of $X$ by calculating optimal bounds of $X(s)$ for each orthant $\mathbb{R}^n(s)$. Unfortunately, there are $2^n$ orthants, and this approach can work only for very small dimension $n$.
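For one orthant, the linear programs (14) can be solved with an off-the-shelf LP solver. The sketch below uses SciPy's `linprog` with the HiGHS backend. This choice is an assumption; any LP solver would do. It works in plain floating point, so the bounds are approximate rather than verified. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def orthant_bounds(Ac, Delta, bc, delta, s):
    """Approximate optimal bounds of X(s) from the linear system (14).

    Returns None if X(s) is empty, else (lo, hi) arrays; -inf/+inf mark an
    unbounded direction (then X is unbounded, so the interval matrix is singular).
    """
    Ac, Delta = np.asarray(Ac, float), np.asarray(Delta, float)
    bc, delta = np.asarray(bc, float), np.asarray(delta, float)
    n = len(bc)
    D = np.diag(np.asarray(s, float))
    # Write (14) in the form A_ub @ x <= b_ub.
    A_ub = np.vstack([Ac - Delta @ D, -(Ac + Delta @ D), -D])
    b_ub = np.concatenate([bc + delta, -(bc - delta), np.zeros(n)])
    free = [(None, None)] * n  # linprog's default bounds would impose x >= 0
    lo, hi = np.empty(n), np.empty(n)
    for i in range(n):
        c = np.zeros(n)
        c[i] = 1.0
        for sign, out in ((1.0, lo), (-1.0, hi)):  # minimize x_i, then minimize -x_i
            res = linprog(sign * c, A_ub=A_ub, b_ub=b_ub, bounds=free, method="highs")
            if res.status == 2:      # infeasible: X(s) is empty
                return None
            if res.status == 3:      # unbounded in this direction
                out[i] = -sign * np.inf
            elif res.status == 0:
                out[i] = sign * res.fun
            else:
                raise RuntimeError(res.message)
    return lo, hi
```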

For interval linear systems with $\Delta = 0$, $\delta = 0$, the solution set $X$ is by definition equal to the exact solution of the corresponding real linear system. Therefore $X$ lies in one orthant, except in degenerate cases. With growing radii $\Delta$, $\delta$ the solution set may intersect more orthants. In many cases, however, only a few orthants will intersect the solution set.

Then most of the computing time of the above approach will be spent on checking that $X \cap \mathbb{R}^n(s)$ is empty for almost all orthants. The question therefore arises whether it is possible to construct an algorithm which picks out exactly those orthants where $X \cap \mathbb{R}^n(s)$ is nonempty. Such an algorithm is presented in the following. The approach relies heavily on a topological alternative: for nonempty $X$, exactly one of the following two statements is true:

i) $X$ is compact and connected, and $\mathbf{A}$ is regular;
ii) $X$ is unbounded, each topologically connected component of $X$ is unbounded, and $\mathbf{A}$ is singular.

An immediate consequence is that the solution set 
X cannot be the union of bounded and unbounded 
topologically connected components. Therefore, each 
method which only calculates optimal bounds of 
a topologically connected component of X suffices to 
solve the problem. To do this, the representation graph
$G = (V, E)$ of the solution set X is defined, with the set of nodes

$$V = \{\, s \in \{-1, 1\}^n : X(s) \neq \emptyset \,\} \qquad (15)$$

and the set of edges

$$E = \{\, \{s, t\} : s, t \in V,\ s \text{ and } t \text{ differ in exactly one component} \,\}. \qquad (16)$$


Now the following basic relationship between the
solution set and its representation graph can be proved:

a) Each nonempty topologically connected component $\tilde X$ of X can be represented in the form

$$\tilde X = \bigcup \{\, X(s) : s \in U \,\}, \qquad (17)$$

where U is the node set of a connected component
of G;

b) If X is nonempty and bounded, then $G = (V, E)$ is
a connected graph, and

$$X = \bigcup \{\, X(s) : s \in V \,\}. \qquad (18)$$


This property makes it possible to apply the well-known graph search method (see, for example, [21]) to the
implicitly defined representation graph G in order to
compute a connected component:


1) Compute a starting node $s \in V$ by solving the midpoint system $A^c x = b^c$. The vector s is defined as the
sign vector of this solution and is stored in a list L.

2) Take a sign vector $s \in L$, remove it from L, and solve the linear programming problems

$$\min\{x_i : x \in X(s)\}, \qquad \max\{x_i : x \in X(s)\} \qquad (19)$$

for $i = 1, \dots, n$.
If a problem is unbounded, then an unbounded
topologically connected component of X has been found.
Hence every other topologically connected component of X is unbounded, A is singular, and the
method stops. Otherwise, the linear programming problems yield optimal bounds of $X(s)$,
which are also stored. By definition of the edge set
E, it follows immediately that

$$t = (s_1, \dots, s_{i-1}, -s_i, s_{i+1}, \dots, s_n) \qquad (20)$$

is adjacent to s if and only if one of the two LPs in (19) for the coordinate $x_i$
has an optimal value equal to zero. All neighbors
t of s are stored in the list L, except those that
have already been treated. Then go back to step 2), and repeat this process until L is empty.

It follows that this algorithm terminates in a finite 
number of steps, and either calculates optimal bounds 
of the solution set and proves regularity of A, or shows 
that X is unbounded and A is singular. The algorithm 
searches only in those orthants which have a nonempty 
intersection with the solution set, and avoids all other 
ones. Therefore, |V| calls of a polynomial time algorithm are needed, where |V| is the number of nonempty
intersections of the solution set with the orthants. 
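
The bookkeeping of steps 1) and 2) can be sketched in Python as below. The LP solver is abstracted as a hypothetical oracle `lp_bounds(s)` that returns the optimal bounds (19) of X(s). In the usage example it is replaced by a table holding the optimal bounds (22) and (23) of Rohn's example discussed below, so the sketch reproduces only the graph search (list L, treated nodes, neighbor rule (20)), not the LP solution itself. By default the zero test is exact; with a floating-point LP solver a small tolerance would be needed.

```python
from collections import deque

def search_orthants(s0, lp_bounds, tol=0.0):
    """Graph search over the representation graph G = (V, E) of X.

    s0: starting sign vector (tuple of +1/-1), e.g. the sign vector of the
        midpoint solution (step 1).
    lp_bounds(s): hypothetical oracle solving the 2n LPs (19) for X(s); returns
        a list of (lo, hi) pairs, with None for an unbounded LP.
    Returns (componentwise bounds of X, set of visited sign vectors), or
    None if an unbounded component is found (then A is singular).
    """
    n = len(s0)
    L = deque([tuple(s0)])
    treated = {tuple(s0)}              # nodes that have already been stored in L
    hull = [[float("inf"), float("-inf")] for _ in range(n)]
    while L:
        s = L.popleft()                # step 2: take s out of L
        bounds = lp_bounds(s)
        if any(b is None for b in bounds):
            return None                # unbounded LP: stop
        for i, (lo, hi) in enumerate(bounds):
            hull[i][0] = min(hull[i][0], lo)
            hull[i][1] = max(hull[i][1], hi)
            if abs(lo) <= tol or abs(hi) <= tol:
                # X(s) touches x_i = 0, so the neighbor (20) belongs to V
                t = s[:i] + (-s[i],) + s[i + 1:]
                if t not in treated:
                    treated.add(t)
                    L.append(t)
    return [tuple(b) for b in hull], treated

# Stub oracle: hard-coded optimal bounds (22) and (23) of Rohn's example
table = {(-1, 1): [(-3.9950, 0.0), (0.001002, 3.9980)],
         (1, 1): [(0.0, 1.9950), (0.0030, 2.0)]}
print(search_orthants((-1, 1), table.__getitem__))
```

With this table the search visits only the orthants (−1, 1) and (1, 1) and returns the componentwise bounds (24).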

In many practical cases, due to physical or economic requirements, only a few variables change
sign, implying that only a few orthants are intersected by the solution set. In those cases the method
works efficiently. Nevertheless, due to the NP-hardness results of Rohn mentioned earlier, there are also cases where
exponential computing time occurs.


Example 3 In order to see how this algorithm behaves
in detail, the example of the previous section is discussed. The solution $x = (-0.8333, 2.0833)^T$ of the midpoint system gives the
sign vector $s = (-1, 1)$, which is stored in L.

Now we take this sign vector from list L (then L is
empty) and solve the LPs (19), which give the optimal
bounds of $X(s)$

$$\begin{pmatrix} [\ \ldots\ ] \\ [1.3095,\ 3.1667] \end{pmatrix}. \qquad (21)$$

No optimal bound has a value equal to zero, which implies that $s = (-1, 1)$ has no neighbor with respect to
the edge set E. It follows that $X(s)$ is a topologically connected component and $X = X(s)$. Therefore, (21) gives
the optimal bounds of X.

Next, the original example of Rohn [24] is discussed, which differs from the previous one by changing

$$A^c = \begin{pmatrix} 500.5 & 500.5 \\ -500.5 & 500.5 \end{pmatrix}, \qquad \Delta = \begin{pmatrix} 499.5 & 499.5 \\ 499.5 & 499.5 \end{pmatrix}.$$

Thus very large perturbations $\Delta$ are allowed, and the
spectral radius $\rho(|(A^c)^{-1}| \cdot \Delta) = 1.9960$. Hence the iteration method of the previous section cannot work,
because A is not strongly regular. The solution $x = (-0.001998, 0.004995)^T$ of the midpoint system gives $s = (-1, 1)^T$ and $L = \{s\}$.

Now s is removed from list L (then L is empty), and
the LPs (19) yield the following optimal bounds of $X(s)$:

$$\begin{pmatrix} [-3.9950,\ 0] \\ [0.001002,\ 3.9980] \end{pmatrix}. \qquad (22)$$

One optimal bound of the first component has a value
equal to zero. Therefore, by (20), $t = (-s_1, s_2)^T = (1, 1)^T$
is adjacent to s, and $L := \{t\}$.

Now we take t from list L (then L is empty), and the
LPs (19) yield the optimal bounds of $X(t)$:

$$\begin{pmatrix} [0,\ 1.9950] \\ [0.0030,\ 2.0000] \end{pmatrix}. \qquad (23)$$

Only the lower optimal bound of the first component
is equal to zero. This gives the adjacent sign vector $s = (-t_1, t_2)^T = (-1, 1)^T$. But this sign vector has already been treated and is therefore not stored in list L. Since
list L is empty, the algorithm terminates, and the optimal bounds (22) and (23) together deliver the optimal
bounds

$$\begin{pmatrix} [-3.9950,\ 1.9950] \\ [0.001002,\ 3.9980] \end{pmatrix} \qquad (24)$$

for the solution set X.


By comparing the bounds (21) and (13), we see that
the optimal bounds (21) clearly improve the bounds
(13) calculated by the iteration method of the previous section. This overestimation is mainly due to the
preconditioning with the midpoint inverse. However,
the bounds (13) additionally give the information that
the solution set X intersects at most 2 orthants. Thus, an
a priori estimate of the computing time for the exact method in this section is available: the above method
has to search in only two orthants. Hence, in the strongly regular case, first applying a polynomial time method
provides rough bounds for X as well as a bound on the
computing time needed for calculating exact
bounds.


Several other examples up to dimension n = 50 can be 
found in [9] and [10]. 
Introduction


Interval Newton methods combine the classical Newton
method, the mean value theorem, and interval analysis. The result is an iterative method that can be used
to refine enclosures to solutions of nonlinear systems of equations, to prove existence and uniqueness of
such solutions, and to provide rigorous bounds on such
solutions, including tight and rigorous bounds on critical points of constrained optimization problems. Interval Newton methods can also prove nonexistence
of solutions within regions. Such capabilities can be
used in isolation, for example, to provide rigorous error bounds for an approximate solution obtained with
floating point computations, or as an integral part of
global branch and bound algorithms.


Univariate Interval Newton Methods 


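Suppose $f: \mathbf{x} = [\underline{x}, \overline{x}] \to \mathbb{R}$ has a continuous first
derivative on $\mathbf{x}$, suppose that there exists $x^* \in \mathbf{x}$ such
that $f(x^*) = 0$, and suppose that $\tilde x \in \mathbf{x}$. Then, since the
mean value theorem implies

$$0 = f(x^*) = f(\tilde x) + f'(\xi)(x^* - \tilde x),$$

we have $x^* = \tilde x - f(\tilde x)/f'(\xi)$ for some $\xi \in \mathbf{x}$. If $\mathbf{f}'(\mathbf{x})$ is any
interval extension of the derivative of f over $\mathbf{x}$, then

$$x^* \in \tilde x - \frac{f(\tilde x)}{\mathbf{f}'(\mathbf{x})} \quad \text{for any } \tilde x \in \mathbf{x}. \qquad (1)$$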


(Note that, in certain contexts, a slope set for f centered
at $\tilde x$ may be substituted for $\mathbf{f}'(\mathbf{x})$; see [1] for further references.) Equation (1) forms the basis of the univariate
interval Newton operator:

$$N(f, \mathbf{x}, \tilde x) = \tilde x - \frac{f(\tilde x)}{\mathbf{f}'(\mathbf{x})}. \qquad (2)$$

Because of (1), any solutions of $f(x) = 0$ that are in $\mathbf{x}$
must also be in $N(f, \mathbf{x}, \tilde x)$. Furthermore, local convergence of iteration of the interval Newton method (2)
is quadratic in the sense that the width of $N(f, \mathbf{x}, \tilde x)$ is
roughly proportional to the square of the width of $\mathbf{x}$.
Moreover, if an interval derivative extension (in
contrast to an interval slope) is used for $\mathbf{f}'(\mathbf{x})$, then

$$N(f, \mathbf{x}, \tilde x) \subset \operatorname{int}(\mathbf{x}),$$

where $\operatorname{int}(\mathbf{x})$ represents the interior of $\mathbf{x}$, implies that
there is a unique solution of $f(x) = 0$ within $N(f, \mathbf{x}, \tilde x)$,
and hence within x. 
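
The univariate operator (2) can be sketched in Python as follows. This is a minimal, non-rigorous illustration: it uses ordinary floating point without outward rounding, it does not implement extended division for the case 0 ∈ f'(x), and the example f(x) = x^2 − 2 on x = [1, 2] with the midpoint as x̃ is our own choice. The function names are hypothetical.

```python
import math

def idiv(a, b):
    """Interval quotient a / b, assuming 0 is not in b (no extended division)."""
    if b[0] <= 0.0 <= b[1]:
        raise ValueError("0 in f'(x): extended interval division not handled here")
    q = (a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1])
    return (min(q), max(q))

def newton_step(f, df, x):
    """One interval Newton step with xt = midpoint of x, cf. (2).

    df(x) must return an interval enclosure of f' over the interval x.
    Returns N(f, x, xt) and the next iterate, N(f, x, xt) intersected with x.
    Plain floating point, no outward rounding: not rigorous.
    """
    xt = 0.5 * (x[0] + x[1])
    fx = f(xt)
    q = idiv((fx, fx), df(x))
    N = (xt - q[1], xt - q[0])
    return N, (max(N[0], x[0]), min(N[1], x[1]))

f = lambda t: t * t - 2.0
df = lambda x: (2.0 * x[0], 2.0 * x[1])   # valid enclosure of 2t when x > 0

x = (1.0, 2.0)
N, x = newton_step(f, df, x)
print(N)                                  # lies in the interior of [1, 2]
for _ in range(2):
    N, x = newton_step(f, df, x)
print(x, x[0] <= math.sqrt(2.0) <= x[1])
```

The first step gives N = [1.375, 1.4375] ⊂ int([1, 2]), which (up to the missing outward rounding) already proves that [1, 2] contains exactly one zero; the widths then shrink roughly quadratically.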


Multivariate Interval Newton Methods 


Multivariate interval Newton methods are analogous to 
univariate ones in the sense that they obey an iteration 
equation similar to equation (2), and in the sense that 
they have quadratic convergence properties and can be 
used to prove existence and uniqueness. However, mul- 
tivariate interval Newton methods are complicated by 
the necessity to bound the solution set of a linear sys- 
tem of equations with interval coefficients. 

Suppose now that $f: \mathbb{R}^n \to \mathbb{R}^n$, suppose $\mathbf{x}$ is an interval vector (i.e., a box), and suppose that $\tilde x \in \mathbb{R}^n$. (If interval derivatives, rather than slope sets, are to be used,
then further suppose that $\tilde x \in \mathbf{x}$.) Then a general form
for multivariate interval Newton methods is

$$N(f, \mathbf{x}, \tilde x) = \tilde x + \mathbf{v}, \qquad (3)$$

where $\mathbf{v}$ is an interval vector that contains all solutions
v to point systems $Av = -f(\tilde x)$, for $A \in \mathbf{f}'(\mathbf{x})$, where
$\mathbf{f}'(\mathbf{x})$ is an interval extension of the Jacobi matrix of f
over $\mathbf{x}$. (Under certain conditions, $\mathbf{f}'$ may be replaced by
an interval slope matrix.) As with the univariate interval Newton method, under certain natural smoothness
conditions,

• $N(f, \mathbf{x}, \tilde x)$ must contain all solutions $x^* \in \mathbf{x}$ with
$f(x^*) = 0$. (Consequently, if $N(f, \mathbf{x}, \tilde x) \cap \mathbf{x} = \emptyset$, then
there are no solutions of $f(x) = 0$ in $\mathbf{x}$.)

• For $\mathbf{x}$ containing a solution of $f(x) = 0$ and the
widths of the components of $\mathbf{x}$ sufficiently small, the
width of $N(f, \mathbf{x}, \tilde x)$ is roughly proportional to the
square of the widths of the components of $\mathbf{x}$.

• If $N(f, \mathbf{x}, \tilde x) \subset \operatorname{int}(\mathbf{x})$, where $\operatorname{int}(\mathbf{x})$ represents the
interior of $\mathbf{x}$, then there is a unique solution of $f(x) = 0$ within $N(f, \mathbf{x}, \tilde x)$, and hence within $\mathbf{x}$.

For details and further references, see [1, §1.5].
Finding the interval vector $\mathbf{v}$ in the iteration formula
(3), that is, bounding the solution set of the interval linear system

$$\mathbf{f}'(\mathbf{x})\, v = -f(\tilde x),$$

is a major aspect of the multivariate interval Newton
method. Finding the narrowest possible intervals for
the components of $\mathbf{v}$ is, in general, an NP-hard problem.
(See » Complexity classes in optimization.) However,
procedures that are asymptotically good in the sense
that the overestimation in $\mathbf{v}$ decreases as the square of
the widths of the elements of $\mathbf{f}'(\mathbf{x})$ can be based on first
preconditioning the interval matrix $\mathbf{f}'(\mathbf{x})$ by the inverse
of its matrix of midpoints or by other special preconditioners (see [1, Chapt. 3]), then applying the interval
Gauss-Seidel method or interval Gaussian elimination.


Existence-Proving Properties 


The existence-proving properties of interval Newton
methods can be analyzed in the framework of classical fixed-point theory. See » Interval fixed point theory, or [1, §1.5.2]. Of particular interest in this context
is a variant interval Newton method, not fitting directly
into the framework of formula (3), that is derived directly by considering the classical chord method (Newton method with fixed iteration matrix) as a fixed point
iteration. Called the Krawczyk method, this method has
various nice theoretical properties, but its image is usually not as narrow as those of other interval Newton methods.
See [1, p. 56]. 
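
For comparison, a univariate version of the Krawczyk operator can be sketched in the same style. We assume the standard form K(x, x̃) = x̃ − y f(x̃) + (1 − y f'(x))(x − x̃), with x̃ the midpoint of x and y = 1/mid f'(x). The function names and the example f(x) = x^2 − 2 on x = [1.3, 1.5] are illustrative, and again no outward rounding is done.

```python
import math

def imul(a, b):
    """Product of intervals a = (lo, hi) and b = (lo, hi)."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def krawczyk(f, df, x):
    """Univariate Krawczyk operator K(x, xt) with xt = mid(x), y = 1/mid(f'(x))."""
    xt = 0.5 * (x[0] + x[1])
    d = df(x)                          # enclosure of f' over x
    y = 2.0 / (d[0] + d[1])            # point preconditioner 1 / mid f'(x)
    c = xt - y * f(xt)                 # point part: chord step from xt
    # 1 - y f'(x) as an interval (ordered correctly for either sign of y)
    m = (min(1.0 - y * d[0], 1.0 - y * d[1]), max(1.0 - y * d[0], 1.0 - y * d[1]))
    r = imul(m, (x[0] - xt, x[1] - xt))
    return (c + r[0], c + r[1])

f = lambda t: t * t - 2.0
df = lambda x: (2.0 * x[0], 2.0 * x[1])
K = krawczyk(f, df, (1.3, 1.5))
print(K, K[0] <= math.sqrt(2.0) <= K[1])
```

Here K ≈ [1.4071, 1.4214] lies in the interior of [1.3, 1.5], which indicates a unique zero in that interval.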

Uniqueness-proving properties of interval Newton 
methods are based on proving that each point matrix 
formed elementwise from the interval matrix f’(x) is 
nonsingular. 


Example 1 For an example of a multivariate interval 
Newton method, take 


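An interval extension of the Jacobi matrix for f is

$$\mathbf{f}'(\mathbf{x}) = \begin{pmatrix} 2x_1 & \cdots \\ 2x_2 & 2x_1 \end{pmatrix},$$

and its value at $\mathbf{x}$ is

$$\begin{pmatrix} [1.8, 2.4] & [-0.2, 0.2] \\ [-0.2, 0.2] & [1.8, 2.4] \end{pmatrix}.$$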


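The usual procedure (although not required in this special case) is to precondition the system

$$\mathbf{f}'(\mathbf{x})\, v = -f(\tilde x),$$

say, by the inverse of the midpoint matrix

$$Y = \begin{pmatrix} 2.1 & 0 \\ 0 & 2.1 \end{pmatrix}^{-1} \approx \begin{pmatrix} 0.476 & 0 \\ 0 & 0.476 \end{pmatrix},$$

to obtain

$$Y \mathbf{f}'(\mathbf{x})\, v = -Y f(\tilde x),$$

i.e., rounded out,

$$\begin{pmatrix} [0.85, 1.15] & [-0.096, 0.096] \\ [-0.096, 0.096] & [0.85, 1.15] \end{pmatrix} v = \begin{pmatrix} [-0.0488, -0.0487] \\ 0 \end{pmatrix}.$$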


(Rigor is not lost by taking floating point approximations for the preconditioner, but the interval arithmetic should be outwardly rounded.) The interval
Gauss-Seidel method can then be used to compute
sharper bounds on $v = x - \tilde x$, beginning with

$$\mathbf{v} = \begin{pmatrix} [-0.15, 0.15] \\ [-0.1, 0.1] \end{pmatrix}.$$

In the first step,

$$\tilde v_1 = \frac{[-0.0488, -0.0487] - [-0.096, 0.096]\, v_2}{[0.85, 1.15]} \subseteq [-0.0688, -0.034].$$

Thus, the first component of $N(f, \mathbf{x}, \tilde x)$ is
$\tilde x_1 + \tilde v_1 \subseteq [0.9812, 1.016]$.

In the second step of the interval Gauss-Seidel method,

$$\tilde v_2 = \frac{0 - [-0.096, 0.096]\, \tilde v_1}{[0.85, 1.15]} \subseteq [-0.00778, 0.00778],$$

so, rounded out, $N(f, \mathbf{x}, \tilde x)$ is computed to be

$$\begin{pmatrix} [0.981, 1.016] \\ [-0.00778, 0.00778] \end{pmatrix} \subset \operatorname{int}\begin{pmatrix} [0.9, 1.2] \\ [-0.1, 0.1] \end{pmatrix}.$$


This last inclusion proves that there exists a unique solution to $f(x) = 0$ within $\mathbf{x}$, and hence within $N(f, \mathbf{x}, \tilde x)$.
Furthermore, iteration of the procedure will result in 
bounds on the exact solution that become narrow 
quadratically. 
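
The two Gauss-Seidel steps of Example 1 can be reproduced with a short Python sketch. It uses plain floating point instead of outwardly rounded interval arithmetic, so it only illustrates the computation. The point x̃ = (1.05, 0)^T is our inference from the starting interval v = x − x̃ and the box ([0.9, 1.2], [−0.1, 0.1])^T, and all function names are hypothetical.

```python
def imul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def isub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def idiv(a, b):
    if b[0] <= 0.0 <= b[1]:
        raise ValueError("0 in diagonal entry: extended division not handled")
    q = (a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1])
    return (min(q), max(q))

def gauss_seidel_sweep(M, rhs, v):
    """One interval Gauss-Seidel sweep for the (preconditioned) system M v = rhs.

    v_i <- ((rhs_i - sum_{j != i} M_ij v_j) / M_ii) intersected with v_i,
    always using the components already updated in this sweep.
    Returns None if some intersection is empty (no solution in the box).
    """
    v = list(v)
    for i in range(len(v)):
        acc = rhs[i]
        for j in range(len(v)):
            if j != i:
                acc = isub(acc, imul(M[i][j], v[j]))
        new = idiv(acc, M[i][i])
        lo, hi = max(new[0], v[i][0]), min(new[1], v[i][1])
        if lo > hi:
            return None
        v[i] = (lo, hi)
    return v

# Data of Example 1 after preconditioning, rounded out as in the text
M = [[(0.85, 1.15), (-0.096, 0.096)],
     [(-0.096, 0.096), (0.85, 1.15)]]
rhs = [(-0.0488, -0.0487), (0.0, 0.0)]
v = gauss_seidel_sweep(M, rhs, [(-0.15, 0.15), (-0.1, 0.1)])
xt = (1.05, 0.0)                      # midpoint inferred from the starting v
N = [(xt[i] + v[i][0], xt[i] + v[i][1]) for i in range(2)]
print(v, N)
```

Up to rounding, the sweep returns ṽ1 ≈ [−0.0687, −0.0340] and ṽ2 ≈ [−0.00776, 0.00776], so N(f, x, x̃) lies in the interior of x, as in the example.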
A supply chain (SC) can be defined as an integrated system, where various firms work together, including suppliers of raw materials, manufacturers, distributors and
retailers. Their efforts are concentrated on transforming the raw materials into final products that satisfy customer requirements, and delivering these products to
the right place, at the right time. An SC contains two basic, integrated processes:

a) production planning and inventory management (IM); and

b) distribution and logistics processes [6].

This article gives a brief review of the literature on single-stage IM and multistage IM models. The objective is
to provide an overview of this research and emphasize current achievements in this field. Inventories exist
throughout the SC in the form of raw materials, work-in-process, and finished goods. Typical relevant inventory costs are inventory carrying costs, order costs, and
shortage costs. These costs often conflict: decreasing one generally requires increasing another. The main motivation for keeping inventories is to cope with the uncertainty of external demand,
supply and lead-time [18]. Keeping inventories is important to increase customer service level and reduce
distribution costs, but it is estimated [5] that inventories cost approximately 20% to 40% of their value per
year. Thus, managing inventories in a scientific manner
to maintain the minimal levels required for meeting service
objectives makes economic sense. K. Arrow [2] presents
an interesting discussion of the motives of a firm for
holding inventories. There are several opportunities for
streamlining SC inventories. It is important to understand that for a given service level the lowest inventory
investment results when the entire SC is considered as
a single system. Such coordinated decisions at Xerox
and Hewlett Packard reduced their inventory levels by
over 25% [9].


Single Stage Inventory Management Models 


The simplest inventory model is the deterministic economic order quantity (EOQ) model presented by F.
Harris [12]. He recognized this problem in 1913 in his
work at Westinghouse. The model determines the constant order quantity that minimizes the average annual cost of purchasing and carrying inventory, assuming a deterministic and constant demand rate, no
shortages, and zero order lead-times. A number of important scholars turned their attention to mathematical inventory models during the 1950s. A collection
of mathematical models by Arrow, S. Karlin and H.E.
Scarf [3] influenced later work in this area. At about
the same time, H.M. Wagner and T.M. Whitin [24]
developed a solution algorithm for the dynamic lot-sizing problem subject to time-varying demand. Their
model assumes periodic, deterministic demand over
a finite planning horizon, no capacity restrictions on
production, and zero inventory at the beginning and
the end of the planning horizon. This problem is formulated as a mixed integer linear program (MILP)
and can be represented as a fixed-charge network flow
problem. The Wagner-Whitin algorithm is best illustrated using a shortest-path graph representation. Although the Wagner-Whitin model gives an optimal solution, in practice other heuristic lot-sizing algorithms
are adopted. See [18] for a survey on the EOQ lot-sizing,
Silver-Meal, least unit cost heuristics, etc. These models
trade off productivity losses from making small batches
against the opportunity costs of tying up capital in inventory due to large batches. U.S. Karmarkar [14] extends
the lot-sizing model to include lead-time related costs. 
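
Both deterministic models above can be made concrete in a few lines of Python. The sketch uses the textbook EOQ formula Q* = √(2DK/h) and the O(T^2) forward recursion usually associated with the Wagner-Whitin algorithm, in which each order covers the demand of consecutive periods up to the next order. The function and parameter names are ours, and the cost model (one setup cost K per order, linear holding cost h per unit and period, zero initial and final inventory, no capacities) is a simplifying assumption.

```python
import math

def eoq(D, K, h):
    """Classical EOQ: Q* = sqrt(2 D K / h) minimizes K D / Q + h Q / 2
    (D: demand rate, K: fixed cost per order, h: holding cost per unit and time)."""
    return math.sqrt(2.0 * D * K / h)

def wagner_whitin(d, K, h):
    """Dynamic lot sizing by forward recursion (zero initial/final inventory).

    d[t]: demand in period t, K: setup cost per order,
    h: holding cost per unit carried from one period to the next.
    Returns (minimum total cost, sorted list of order periods).
    """
    T = len(d)
    F = [0.0] + [math.inf] * T     # F[j]: min cost of covering periods 0..j-1
    pred = [0] * (T + 1)
    for j in range(1, T + 1):
        for i in range(j):         # last order placed in period i covers i..j-1
            hold = sum(h * (t - i) * d[t] for t in range(i, j))
            cost = F[i] + K + hold
            if cost < F[j]:
                F[j], pred[j] = cost, i
    orders, j = [], T
    while j > 0:                   # trace back the order periods
        orders.append(pred[j])
        j = pred[j]
    return F[T], sorted(orders)

print(eoq(1200, 100, 6))                    # 200.0
print(wagner_whitin([20, 200], 100, 1))     # (200.0, [0, 1])
```

In the second call, carrying 200 units for one period would cost more than a second setup, so the recursion orders in both periods.
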
Inventory control models subject to uncertain demand
are basically of two types: periodic review models and
continuous review models. Periodic review models exist
for one planning period or for multiple planning periods. The single-period, stochastic inventory model is
known as the newsboy model. For single-period models with a fixed order cost and initial inventory, (s, S) policies are optimal.
These policies state that if the inventory position is less than
s, then order up to S; otherwise, do not order. The periodic review models with an infinite horizon are formulated in a dynamic programming framework [23].
Continuous review systems under uncertain demand
track demands as they occur, and the inventory position
is always known. These models lead to the (Q, R) policy, under which a fixed amount of Q units is ordered
each time the inventory position reaches a certain level
R. The model typically assumes either backordering or
lost sales when shortages occur. 
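
The single-period (newsboy) model and the (s, S) rule mentioned above can be illustrated as follows. The sketch assumes normally distributed demand and the classical critical-fractile solution Q* = F^{-1}(c_u/(c_u + c_o)), where c_u and c_o are the per-unit underage and overage costs. These symbols and the function names are ours, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def newsboy_quantity(mu, sigma, cu, co):
    """Critical-fractile order quantity for N(mu, sigma^2) demand:
    order up to the cu / (cu + co) quantile of the demand distribution."""
    return NormalDist(mu, sigma).inv_cdf(cu / (cu + co))

def ss_order(position, s, S):
    """(s, S) rule: if the inventory position is below s, order up to S."""
    return S - position if position < s else 0

print(newsboy_quantity(100, 20, 5, 5))   # equal costs: the mean demand
print(newsboy_quantity(100, 20, 3, 1))   # about 113.5
print(ss_order(5, 10, 50), ss_order(12, 10, 50))   # 45 0
```

A higher underage cost relative to the overage cost pushes the order quantity above the mean demand.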


Multistage Inventory Management Models 


Coordinating decisions at different levels of an organization is driven by the need to reduce operating costs. This
coordination can be seen in terms of integrating different decision types, e.g., facility location, inventory planning, distribution, etc., or linking decisions within the
same function at different stages in the SC. Multistage 
inventory management models (MSIM models) con- 
centrate on integrating IM policies in different stages 
of the SC. The typical MSIM problem analyzed in the 
literature is a two-level system composed of a number 
of retailers being served by a central warehouse. The de- 
mand at each retailer is satisfied using on-hand inven- 
tory. When insufficient inventory is available, a back- 
order typically occurs, and demand must be satisfied 
later using inventory from the warehouse. The model 
decides on the inventory level at each retailer and the 
warehouse, such that a set of prespecified criteria is
satisfied at minimum inventory-related costs. The first
MSIM model was developed by A. Clark and Scarf [7].
They consider a system with a single product and N facilities, where facility i supplies facility i + 1, for i = 1,
..., N − 1. The model considers a periodic review of the
inventory level and assumes fixed lead-times, a finite 
planning horizon, backordering of demand shortages 
and variable order cost. The aim is to find IM policies to 
be applied in each of the echelons, such that system cost 
is minimized. They show that under the above assump- 
tions an optimal policy for the system can be found 
by decomposing the problem into N separate single- 
location problems and solving the problem recursively. 
The above model was extended to incorporate an infi- 
nite horizon and lead time uncertainty. A generaliza- 
tion of the system described above is the multi-echelon 
arborescence system, where each location has a unique 
supplier. A.F. Veinott [23] provides an excellent sum- 
mary of these early modeling efforts. One of the earliest 
continuous review MSIM models was presented by C.C. 
Sherbrooke [21]. He considers a two-stage system with 
several retailers and a single warehouse that supplies 
to these retailers. He introduces the well-known MET- 
RIC approximation to determine the optimal level of 
inventory in the system. The METRIC approximation 
assumes a Poisson distribution of demand and constant 
replenishment lead-times. S.C. Graves [11] extends the 
METRIC approximation by estimating the mean and 
the variance of the outstanding retailer orders. He fits 
the negative binomial distribution to these parameters 
to determine the optimal inventory policy. S. Axsäter
[4] provides an exact solution to the problem and shows
that the METRIC approximation provides an underestimate, whereas Graves' two-parameter approximation
[11] overestimates the retailer backorders. The above
studies use the one-for-one ordering policy (S − 1, S),
i.e., an order is placed as soon as a demand occurs.
This policy is appropriate for items with high value and
a low demand rate. Axsäter [4] shows that the models used for the one-for-one ordering policy can be extended to the case of batch ordering with only one retailer. Analysis of batch ordering policies in arborescent systems (when the number of retailers is greater
than one) is similar to Sherbrooke's model. B.L. Deuermeyer and L.B. Schwarz [8] were the first to analyze
such a system. They estimate the mean and the variance of lead-time demand to obtain average inventory
levels and backorders at the warehouse, assuming that 
lead-time demand is normally distributed. The retailer 
lead-time demand is also approximated using a normal 
distribution. In addition to reviewing the literature in 
the area, [15,17] and [22] also provide several extensions to the Deuermeyer and Schwarz model. In [10]
the concept of stability in a capacitated, multi-echelon 
production-inventory system under a base-stock policy 
is introduced. W.L. Maxwell and others [16] extend the 
analysis to multiproduct, continuous review, deterministic demand MSIM problems. Their model aims
to schedule the orders for each of the products over an
infinite horizon so as to minimize the long-run average 
cost. The authors define a new class of policies in which 
each product uses a stationary interval of time between 
successive orders. Their model finds a lot-sizing rule 
that is within 6% of the average cost of the optimal pol- 
icy. R. Roundy [19] develops a similar multistage, mul- 
tiproduct lot-sizing model. Under the assumption that 
the ratio of the order intervals of any two products is 
an integer power of two, it is shown that the solution is 
within 2% of optimality. 
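
One building block of METRIC-type models can be sketched briefly. Under a one-for-one (S − 1, S) policy with Poisson demand, the number of outstanding orders at a location is Poisson with mean m = (demand rate) × (lead time) (Palm's theorem), and the expected number of backorders is E[(X − S)^+]. The Python sketch below computes only this single-location quantity. It is not Sherbrooke's complete multi-echelon METRIC procedure, and the function names and example numbers are ours.

```python
import math

def poisson_pmf(k, m):
    return math.exp(-m) * m ** k / math.factorial(k)

def expected_backorders(S, m):
    """E[(X - S)^+] for X ~ Poisson(m) and base-stock level S >= 0.

    Uses E[(X - S)^+] = E[X - S] + E[(S - X)^+]; the second term is a finite sum.
    """
    return m - S + sum((S - k) * poisson_pmf(k, m) for k in range(S + 1))

m = 2.5   # e.g. demand rate 0.5 per day times a lead time of 5 days
for S in range(6):
    print(S, round(expected_backorders(S, m), 4))
```

Raising S by one unit lowers the expected backorders by P(X > S), which is the marginal quantity that greedy stock-allocation procedures of the METRIC type compare across locations.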

D. Sculli and others [20] extend the analysis of 
MSIM systems for the case when two suppliers are 
used to replenish stock of a single item. They calcu- 
late the mean and the standard deviation of the effec- 
tive lead time demand and interarrival time when re- 
plenishment orders are placed at the same time with 
the two suppliers, in a continuous review system. The 
lead-time distribution of each supplier is assumed to 
be normal. R. Ganeshan [9] presents a near-optimal 
(s, Q) inventory policy for a production/distribution 
network with multiple suppliers replenishing a central 
warehouse, which distributes to a large number of re- 
tailers. The model concentrates on inventory analysis 
at the retailers and the warehouse, and the demand process
at the warehouse. The model finds a near-optimal order 
quantity and a reorder point at both the retailer and the 
distribution center (DC) under stochastic demand and 
lead-time, subject to customer service constraints. The 
main contribution of this model is the integration of the 
above components for analyzing simple supply chains. 
P. Afentakis and others [1] develop a procedure for op- 
timally solving medium size lot-scheduling problems in 
multistage structures with periodic review of the inventory and dynamic deterministic demand. They formulate the problem in terms of echelon stock, which simplifies its decomposition by Lagrangian relaxation. An
efficient branch and bound algorithm is used to solve 
the problem. 

M.C. van der Heijden and others [13] consider 
the periodic review, order-up-to (R, S) inventory sys- 
tem under stochastic demand. They propose a new ap- 
proach to calculate the mean physical stock. The stan- 
dard approximation appears to yield inaccurate results 
in the case of low service levels. Low service levels usu- 
ally occur at intermediate nodes in optimal solutions 
for multi-echelon systems. 


Conclusions 


With the trend toward just-in-time deliveries and re- 
duction of inventories, many firms are reexamining 
their inventory and logistics policies. Some firms are al- 
tering their inventory, production and shipping poli- 
cies, and others are working on coordinating inven- 
tory decisions throughout their SC, with the goal of 
reducing costs and improving service. Single-stage IM
models give some insight into how to manage inventories under certain demand and lead-time considerations, while MSIM models take the analysis further, coordinating inventory decisions throughout the SC. This
article reviews the literature on single-stage and multi- 
stage inventory management models, with an emphasis 
on achievements in this field. 
Lagrangian conditions are often necessary, but not sufficient, for a minimum of an optimization problem in
continuous variables. Sufficiency holds under convexity
assumptions, which, however, are often not satisfied in
applications. Invexity is a less restrictive assumption
than convexity, under which Lagrangian conditions are
sufficient for a minimum, and related duality results also
hold. It also provides a structure, showing relations with various other kinds of generalized convexity.
Consider a constrained minimization problem

$$\text{(P1)} \qquad \min f(x) \quad \text{s.t.} \quad g_j(x) \le 0, \quad j = 1, \dots, m,$$


where the functions f and $g_j$ are differentiable. Define
the Lagrangian $L(\cdot\,; \lambda) := f(\cdot) + \sum_j \lambda_j g_j(\cdot)$, where $\lambda = (\lambda_1, \dots, \lambda_m)$ is a vector of nonnegative Lagrange multipliers. When p is a feasible point, define also a reduced
Lagrangian $L_{(p)}(\cdot\,; \lambda)$ obtained by omitting the constraints
inactive at p (those with $g_j(p) < 0$). A minimum point p
for (P1), assuming some regularity for the constraints,
is then a Karush-Kuhn-Tucker (KKT) point, namely
one where the gradient of the Lagrangian with respect
to x satisfies $L_{(p)}'(p; \lambda) = 0$ for some $\lambda \ge 0$. However,
a KKT point is not generally a minimum point. It is one,
in particular, if f and each $g_j$ are convex.
But convexity often does not hold in applications,
and less restrictive conditions are sought under which a KKT
point is a minimum.

A differentiable vector function $F := (f, g_1, \dots, g_m)$,
with gradient $F'$, is called invex if, for some scale function $\eta$ and all x and p,

$$\text{(INV)} \qquad F(x) - F(p) \ge F'(p)\, \eta(x, p).$$

This property was first called $\eta$-convex by M.A. Hanson
[11]. 
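
A small numerical illustration of (INV) in the scalar case, with our own example: f(x) = x^3 + 3x has f'(x) = 3x^2 + 3 > 0, so it is not convex but has no stationary points, and η(x, p) = (f(x) − f(p))/f'(p) is one admissible scale function (it makes (INV) hold with equality). The Python sketch checks (INV) on a grid for this η and for the convex choice η(x, p) = x − p; a grid check is only a sanity test, not a proof.

```python
def f(x):
    return x ** 3 + 3 * x            # f'(x) = 3x^2 + 3 > 0; f is not convex

def df(x):
    return 3 * x ** 2 + 3

def eta(x, p):
    # One admissible scale function: makes (INV) hold with equality,
    # well defined because f' never vanishes.
    return (f(x) - f(p)) / df(p)

def satisfies_inv(x, p, scale, tol=1e-9):
    """Check (INV) for a scalar f: f(x) - f(p) >= f'(p) * scale(x, p)."""
    return f(x) - f(p) >= df(p) * scale(x, p) - tol

grid = [k / 4 for k in range(-12, 13)]          # points in [-3, 3]
invex_ok = all(satisfies_inv(x, p, eta) for x in grid for p in grid)
convex_ok = all(satisfies_inv(x, p, lambda x, p: x - p) for x in grid for p in grid)
print(invex_ok, convex_ok)   # True False
```

The convexity check fails, for example, at p = −1, x = −3, where f(x) − f(p) = −32 < f'(p)(x − p) = −12.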

Usually, at a given p, $\eta(x, p) = (x - p) + o(\|x - p\|)$.
In particular, F is convex if $\eta(x, p) = x - p$. If F is invex
and $\lambda \ge 0$, then $L_{(p)}(\cdot\,; \lambda)$ is invex, so if x
is a feasible point, then (since $\lambda_j g_j(x) \le 0$ and $g_j(p) = 0$ for the active constraints)

$$f(x) - f(p) \ge L_{(p)}(x; \lambda) - L_{(p)}(p; \lambda) \ge L_{(p)}'(p; \lambda)\, \eta(x, p) = 0$$

from KKT, so that p is a minimum point of (P1). This
minimum is global if (INV) holds globally, otherwise
local. (But how can (INV) be verified as a global property?)

By a similar argument, since $L(\cdot\,; v)$ is invex when $v \ge 0$, duality holds for (P1) and the Wolfe dual problem

$$\text{(D1)} \qquad \max_{u, v}\; f(u) + \sum_j v_j g_j(u) \quad \text{s.t.} \quad f'(u) + \sum_j v_j g_j'(u) = 0, \quad v_1 \ge 0, \dots, v_m \ge 0,$$

if KKT holds for (P1) and the invex property holds (see
[17]). Duality means that $f(x) \ge f(u) + \sum_j v_j g_j(u)$ whenever x and $(u, v)$ are feasible for their respective problems,
and also that the optimal objective values are equal. Consider also the Mond-Weir dual [18]:


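$$\text{(D2)} \qquad \max_{u, v}\; f(u) \quad \text{s.t.} \quad f'(u) + \sum_j v_j g_j'(u) = 0, \quad \sum_j v_j g_j(u) \ge 0, \quad v_1 \ge 0, \dots, v_m \ge 0.$$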
Assume that F is invex. If x is feasible for (P1) and $(u, v)$ for (D2), then

$$
\begin{aligned}
f(x) - f(u) &\ge f'(u)\, \eta(x, u) \\
&= -\sum_j v_j g_j'(u)\, \eta(x, u) \\
&\ge \sum_j \bigl(-v_j g_j(x) + v_j g_j(u)\bigr) \ge 0,
\end{aligned}
$$

and this, with KKT, proves duality.

If the vector function F is locally Lipschitz, rather
than differentiable, then (INV) is replaced by generalized invex (see [6]):

$$\text{(INV2)} \qquad F(x) - F(p) \ge F^\circ(p; \eta(x, p)),$$

where $F^\circ(p; d)$ denotes Clarke's generalized directional
derivative (see [2]) of F at p in direction d. Most properties of (INV) extend to (INV2). In particular, if $F_1, F_2, \dots$ are (generalized) invex functions with the same
scale function $\eta$, then so also are $\max_i F_i(\cdot)$, any element
of the convex hull $\operatorname{co}\{F_i(\cdot)\}$, and (if it exists) $\lim_i F_i(\cdot)$. But
the assumption of the same $\eta$ is necessary here.

Suppose now that the following V-invex property
(see [14]) holds for $g_0(\cdot) := f(\cdot) - f(p)$ and for the active constraints $g_j(\cdot)$ (those for which $g_j(p) = 0$):

$$(\forall x): \quad g_j(x) - g_j(p) \ge \beta_j(x, p)\, g_j'(p)\, \eta(x, p).$$

Note that $\eta(\cdot, \cdot)$ is a vector function, the same for each
j, and the weight $\beta_j(\cdot, \cdot)$ is a positive scalar function. If
KKT holds, and if x is feasible for (P1), then, setting $\mu_j := \lambda_j / \beta_j$, the minimum follows from

$$f(x) - f(p) \ge L_{(p)}(x; \mu) - L_{(p)}(p; \mu) \ge \sum_j \beta_j \mu_j\, g_j'(p)\, \eta(x, p) = 0.$$


In the problem (P1), set $G_j(x) := r_j(x)\, g_j(x)$, where
$r_j(\cdot) > 0$; then $g_j(x) \le 0$ if and only if $G_j(x) \le 0$ $(j = 1, \dots, m)$, and $f(x) \ge f(p)$ if and only if $G_0(x) \ge 0$. So
(P1) is equivalently formulated in terms of the weighted
constraint functions $G_j$. For each active constraint $g_j(x) \le 0$ with $g_j$ invex,

$$G_j(x) - G_j(p) = G_j(x) \ge \beta_j(x, p)\, G_j'(p)\, \eta(x, p),$$

where the weight $\beta_j(x, p) = r_j(x)/r_j(p)$. Thus the $G_j$ have
the V-invex property. Note also that if n and d are real
functions with $n(\cdot) \ge 0$ and $d(\cdot) > 0$, and n and $-d$ are
invex with the same scale function, then [14] the
ratio $n(\cdot)/d(\cdot)$ is V-invex with the same scale function,
and weight $\beta(x, p) = d(p)/d(x)$.

Invexity for (P1) can also be relaxed to the requirement, called Type I in [12]:

$$f(x) - f(p) \ge f'(p)\, \eta(x, p), \qquad -g_j(p) \ge g_j'(p)\, \eta(x, p), \quad \forall j \in J,$$

where J is the set of indices of constraints active at p. If
this property holds at a KKT point p, and $g_j(x) \le 0$ $(\forall j)$,
then

$$f(x) - f(p) \ge -\sum_{j \in J} \lambda_j\, g_j'(p)\, \eta(x, p) \ge 0;$$

thus p is a minimum for (P1).

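Invexity is related as follows to some other properties. The vector function $F: \mathbb{R}^n \to \mathbb{R}^r$ is convexifiable if
$H := F \circ \varphi^{-1}$ is convex, for some invertible transformation $\varphi: \mathbb{R}^n \to \mathbb{R}^n$. For $0 \le \alpha \le 1$,

$$
\begin{aligned}
\text{(CL)} \qquad (1 - \alpha) F(p) + \alpha F(x) &= (1 - \alpha) H(\varphi(p)) + \alpha H(\varphi(x)) \\
&\ge H\bigl((1 - \alpha)\varphi(p) + \alpha\varphi(x)\bigr) \\
&= F(\xi(\alpha, x, p))
\end{aligned}
$$

if H is convex, where

$$\text{(K)} \qquad \xi(\alpha, x, p) := \varphi^{-1}\bigl((1 - \alpha)\varphi(p) + \alpha\varphi(x)\bigr).$$

This reduces, for a convex function F, to $\xi(\alpha, x, p) = (1 - \alpha)p + \alpha x$. If $\varphi$ is differentiable, then also

$$\text{(K2)} \qquad \frac{\partial}{\partial \alpha}\, \xi(\alpha, x, p)\Big|_{\alpha = 0} = (\varphi^{-1})'(\varphi(p))\,[\varphi(x) - \varphi(p)] =: \eta(x, p).$$

Hence, dividing $F(\xi(\alpha, x, p)) - F(p) \le \alpha\,[F(x) - F(p)]$ by $\alpha$ and letting $\alpha \downarrow 0$, invexity follows. Thus,

$$\text{Convexifiable} \;\Rightarrow\; \text{(CL)} + \text{(K)} \;\Rightarrow\; \text{Invex},$$

with the second implication assuming differentiable
functions. The name 'invex' was given in [4], from invariant convex, since invex preserves that part of the convex property that is invariant to the transformation $\varphi$.
(See also [1].) The property (CL), together with the existence of $(\partial/\partial\alpha)\, \xi(\alpha, x, p)|_{\alpha = 0}$, was called protoconvex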
in [10]. It holds, in particular, if $\xi(\alpha, x, p) = p + \alpha(x - p)$ holds in (CL). If F is locally Lipschitz, then (CL) and
(K2) for all x and p in a domain imply generalized invex.
Note that invex does not imply convexifiable. A counterexample (see [8]) is F(x) := (− x1x2 + 1, − 1x2 + 1).

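Suppose that $f$ is invex, with scale function $\eta$, and that $\phi$ is a differentiable invertible transformation of the domain space. An application of the chain rule shows that $f \circ \phi$ is also invex, with a new scale function

$$\hat\eta(a,b) = \phi'(b)^{-1}\,\eta(\phi(a), \phi(b)),$$

since $f(\phi(a)) - f(\phi(b)) \ge f'(\phi(b))\,\eta(\phi(a),\phi(b)) = (f \circ \phi)'(b)\,\hat\eta(a,b)$.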


This invariance of invexity to domain transformations extends to two relaxed properties (see [15,18]), defined by:
• quasi-invex: $f(x) \le f(p) \Rightarrow f'(p)\,\eta(x,p) \le 0$;
• pseudo-invex: $f(x) < f(p) \Rightarrow f'(p)\,\eta(x,p) < 0$.

These reduce to quasiconvex (respectively pseudoconvex; see [16,19]) when $\eta(x,p) = x - p$. If $f$ is quasi-invex (respectively, pseudo-invex) with scale function $\eta$, then $f \circ \phi$ is quasi-invex (respectively pseudo-invex) with scale function $\hat\eta$. Note that each pseudoconvex real function is invex, but not conversely.

In (P1), if $f$ is pseudo-invex, and each $g_j$ is quasi-invex, all with the same scale function $\eta$, then KKT is sufficient for a minimum. For each active constraint,

$$g_j(x) \le 0 = g_j(p) \Rightarrow g_j'(p)\,\eta(x,p) \le 0.$$

Then $\sum_j \lambda_j\, g_j'(p)\,\eta(x,p) \le 0$, hence KKT gives $f'(p)\,\eta(x,p) \ge 0$. From pseudo-invexity, $f(x) \ge f(p)$, proving the minimum. There are various results (see [18,21]) showing sufficiency of KKT for a minimum when various combinations of the functions $f$ and $g_j$ have specified pseudo-invex and quasi-invex properties, all with the same scale function $\eta$. Some further examples of pseudo-invex functions are given in [15] (where they are called $\eta$-pseudoconvex).

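The property (CL) is called convex-like (see [13]). If $\Gamma \subset \mathbb{R}^n$ is a convex set, $F$ is (CL), and $Q$ is the orthant $\mathbb{R}^r_+$, then $F(\Gamma) + Q$ is a convex set. For, taking $x, p \in \Gamma$ and $q, r \in Q$, with $0 < \alpha < 1$,

$$(1-\alpha)[F(p) + q] + \alpha[F(x) + r] - \big[F(\xi(\alpha,x,p)) + (1-\alpha)q + \alpha r\big] \in Q,$$

so the convex combination lies in $F(\Gamma) + Q$.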


From this follows (see [3,8,13]) the basic alternative theorem that

$$\neg\,(\exists x \in \Gamma):\ F(x) < 0 \iff (\exists\,\rho \ge 0,\ \rho \ne 0):\ \rho F(\cdot) \ge 0 \text{ on } \Gamma. \tag{BAT}$$


Consider (P1) with inactive constraints omitted. If (CL) holds, then (from (BAT)) (P1) reaches a minimum at $p$ if and only if there are nonnegative multipliers $\tau$ and $\lambda$, not both zero, for which

$$\tau[f(x) - f(p)] + \lambda g(x) \ge 0 \quad \forall x.$$


If Slater’s constraint qualification holds, that $g(c) < 0$ for some $c$, then $\tau = 1$ can be assumed. If $f$ and $g$ are directionally differentiable, then the directional derivatives satisfy

$$f'(p;d) + \lambda g'(p;d) \ge 0$$

for each direction $d$. If $f$ and $g$ are Lipschitz functions and (CL) holds, then

$$0 \in \partial(f + \lambda g)(p), \qquad \lambda g(p) = 0,$$

is necessary and sufficient for a minimum, where $\partial$ denotes here Clarke’s subdifferential [2].

When is a vector function invex at a point $p$? Assume now twice differentiable functions, and expand

$$F(x) = F(p) + F'(p)v + \tfrac{1}{2}\,v^T F''(p)\,v + \cdots, \qquad \eta(x,p) = v + \tfrac{1}{2}\,v^T Q\,v + \cdots,$$

where $v = x - p$, and $v^T F''(p)\,v$ means that component $j$ of $F$ has second-order term $v^T F_j''(p)\,v$, and similarly for $v^T Q\,v$; denote the matrix component $k$ of $Q$ by $Q_k$. Then ([5]), by substituting in (INV), local invexity implies that

$$F_s''(p) - \sum_k F'(p)_{sk}\, Q_k$$

is positive semidefinite, for each $s$. Conversely, if each of these matrices is positive definite, then $F$ is locally invex at $p$.
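As an illustration of this second-order test, the sketch below assumes a scalar example not taken from the source: $f(x) = x_1 - x_1^2 + x_2^2$ at $p = 0$, with the hypothetical choice $Q_1 = \mathrm{diag}(-3, 0)$ and $Q_2 = 0$. The Hessian $\mathrm{diag}(-2, 2)$ is indefinite, so $f$ is not convex near $p$, yet the test matrix equals $\mathrm{diag}(1, 2)$, which is positive definite. The function names are made up for this example, and matrices are plain nested lists to keep the code dependency-free.

```python
def invexity_test_matrices(hessians, jac, qs):
    """Return F_s''(p) - sum_k F'(p)[s][k] * Q_k for each component s.

    hessians: list of n x n matrices F_s''(p); jac: r x n Jacobian F'(p);
    qs: list of n x n matrices Q_k from the second-order term of eta.
    """
    out = []
    for s, h in enumerate(hessians):
        m = [row[:] for row in h]
        for k, qk in enumerate(qs):
            c = jac[s][k]
            for i in range(len(m)):
                for j in range(len(m)):
                    m[i][j] -= c * qk[i][j]
        out.append(m)
    return out


def is_pd_2x2(m):
    """Sylvester's criterion for a symmetric 2 x 2 matrix."""
    return m[0][0] > 0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] > 0


# Assumed example: f(x) = x1 - x1^2 + x2^2 at p = 0
hess = [[[-2, 0], [0, 2]]]
jac = [[1, 0]]
qs = [[[-3, 0], [0, 0]], [[0, 0], [0, 0]]]
mats = invexity_test_matrices(hess, jac, qs)
print(mats, is_pd_2x2(hess[0]), is_pd_2x2(mats[0]))
# prints [[[1, 0], [0, 2]]] False True


# Direct check of f(x) - f(p) >= f'(p) eta(x, p) with eta = v + (1/2) v^T Q v
def f(x1, x2): return x1 - x1 ** 2 + x2 ** 2
def eta1(x1, x2): return x1 - 1.5 * x1 ** 2   # first component of eta(x, 0)
def gap(x1, x2): return f(x1, x2) - f(0, 0) - 1 * eta1(x1, x2)
```

Here the gap equals $\tfrac12 x_1^2 + x_2^2 \ge 0$, so the local test agrees with a direct check of the invex inequality for this example.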

Some further classes of invex functions are described as follows (see [9]). Let $X_0$ be an open domain in $\mathbb{R}^n$, let $A: X_0 \to \mathbb{R}^m$ be convex, and let $B: X_0 \to \mathbb{R}$ be differentiable and satisfy $B(X_0) \subset (0, \infty)$. Then $A(\cdot)/B(\cdot)$ is invex if also $B$ is convex with $A(X_0) \subset -\mathbb{R}^m_+$, or if $B$ is concave with $A(X_0) \subset \mathbb{R}^m_+$. If $g: X_0 \to \mathbb{R}^m$ is differentiable, and $g'(a)\,d < 0$ for some direction $d$, then $g$ is invex at the point $a$.

Let $\phi: \mathbb{R}^n \to \mathbb{R}^n$ be an invertible $C^2$ mapping; let $r: \mathbb{R}_+ \to \mathbb{R}_+$ be strictly increasing, with $r(0) = 0$, $r'(0) = 0$, and $r''(0) < 0$ on some interval. Then $r$ and $h(\cdot) := r \circ \|\cdot\|$ are pseudoconvex, and $h \circ \phi$ is pseudo-invex (hence also quasi-invex).

The invex property, and also pseudo-invex, quasi-invex and V-invex, are also applicable when the spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ are replaced by infinite-dimensional normed spaces of functions, such as occur in optimal control (see [3,8]) and continuous programming (see [7,20]). The definitions, and proofs of basic properties (see [3,7,8]), are unchanged, interpreting $a \ge b$ as $a - b \in Q$, where the order cone $Q$ is a closed convex cone. Examples of spaces of control functions are the spaces $C(I, \mathbb{R}^r)$ (respectively, $PC(I, \mathbb{R}^r)$) of continuous (respectively, piecewise continuous) functions from an interval $I$ into $\mathbb{R}^r$, with the uniform norm, and the space $L^2(I, \mathbb{R}^r)$ of square-integrable functions from $I$ into $\mathbb{R}^r$. Consider, for example, an integral objective function $f(x) := \int_0^T \varphi(x(t), \dot x(t), t)\,dt$, where $x \in C^1[0,T]$, $\varphi$ is differentiable, and $\dot x(t) = (d/dt)\,x(t)$. Assume boundary conditions $x(0) = x_0$, $x(T) = x_T$, so that admissible directions $z$ satisfy $z(0) = z(T) = 0$. Denote $\varphi_x(x(t), \dot x(t), t) := (\partial/\partial x)\,\varphi(x(t), \dot x(t), t)$, and similarly $\varphi_{\dot x}$. Then the gradient $f'(p)$ of $f$ at $p \in C^1[0,T]$ is given by

$$f'(p)z = \int_0^T \big[\varphi_x(p(t),\dot p(t),t)\,z(t) + \varphi_{\dot x}(p(t),\dot p(t),t)\,\dot z(t)\big]\,dt = \int_0^T \Big[\varphi_x(p(t),\dot p(t),t) - \frac{d}{dt}\varphi_{\dot x}(p(t),\dot p(t),t)\Big]\,z(t)\,dt$$

after integrating by parts. Then $f$ is invex if, for some scale function $\eta$,

$$f(x) - f(p) \ge \int_0^T \Big[\varphi_x(p(t),\dot p(t),t) - \frac{d}{dt}\varphi_{\dot x}(p(t),\dot p(t),t)\Big]\,\eta(x(t), p(t), t)\,dt.$$


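For a constraint $\psi(x(t), t) \le 0$ $(\forall t \in [0,T])$, the analog of the term $\sum_j \lambda_j\, g_j(x)$ in the Lagrangian is

$$\int_0^T \lambda(t)\,\psi(x(t), t)\,dt,$$

and invexity requires that

$$\int_0^T \lambda(t)\,[\psi(x(t), t) - \psi(p(t), t)]\,dt \ge \int_0^T \lambda(t)\,\psi_x(p(t), t)\,\eta(x(t), p(t), t)\,dt.$$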
0 


There are converse KKT and duality properties for such 
infinite-dimensional problems (see e.g. [8,20]), using 
invexity quite similarly to finite-dimensional problems. 


See also 


> Generalized Concavity in Multi-objective 
Optimization 

> Isotonic Regression Problems 

> L-convex Functions and M-convex Functions 
Problem Statement


Given a finite set $X$ with an ordering $\preceq$, a real function $g$ on $X$ and a positive weight function $w$ on $X$, the isotonic regression problem is to find a function $g^*$ which minimizes

$$\sum_{x \in X} [g(x) - f(x)]^2\, w(x)$$

among the class $F$ of isotonic functions $f$ defined on $X$, i.e.

$$F = \{f : \forall x, y \in X,\ x \preceq y \Rightarrow f(x) \le f(y)\}.$$

The function $g^*$ is called the isotonic regression, and it exists and is unique [5].

Isotonic regression can be viewed as a least squares 
problem under order restrictions; here, order restric- 
tions on parameters can be regarded as requiring that 
the parameter, as a function of an index, will be isotonic 
(the adjective ‘isotonic’ is used as a synonym for ‘order 
preserving’) with respect to an order on the index set. 

If $\preceq$ is reflexive, transitive and antisymmetric, and every pair of elements is comparable, the problem is called simple order isotonic regression.

A very important result in the theory of isotonic regression is that the increasing function $f$ closest to a given function $g$ on $X$ in the (weighted) least squares sense can be constructed graphically. A geometrical interpretation of isotonic regression over a simply ordered finite set $X = \{x_1, \ldots, x_n\}$ is the following. Let $W_j = \sum_{i=1}^{j} w(x_i)$ and $G_j = \sum_{i=1}^{j} g(x_i)\,w(x_i)$; the points $P_j = (W_j, G_j)$ obtained by plotting the cumulative sums $G_j$ against the cumulative sums $W_j$, $j = 0, \ldots, n$ ($W_0 = 0$, $G_0 = 0$), constitute the cumulative sum diagram (CSD) of the given function $g$ with weights $w$. The isotonic regression of $g$ is given by the slope of the greatest convex minorant (GCM) of the CSD, i.e. of the graph of the supremum of all convex functions whose graphs lie below the CSD. The slope of the segment joining $P_{j-1}$ to $P_j$ is just $g(x_j)$, $j = 1, \ldots, n$, while the slope of the segment joining $P_{i-1}$ to $P_j$, $i < j$, is the weighted average

$$\mathrm{Av}\{x_i, \ldots, x_j\} = \frac{\sum_{r=i}^{j} g(x_r)\,w(x_r)}{\sum_{r=i}^{j} w(x_r)}.$$

Therefore, the value of the isotonic regression $g^*$ at the point $x_i$ is just the slope of the GCM at the point $P_i^* = (W_i, G_i^*)$, where $G_i^* = \sum_{j=1}^{i} g^*(x_j)\,w(x_j)$. Note that, if $P_i^*$ is a ‘corner’ of the graph, $g^*(x_i)$ is the slope of the segment extending to the left. An illustrative example is shown in Fig. 1.
 j   w(x_j)   W_j   g(x_j)   G_j   g*(x_j)   G*_j
 1     1       1     -1      -1     -1        -1
 2     1       2      3       2     1/5      -4/5
 3     2       4      1       4     1/5      -2/5
 4     2       6     -2       0     1/5        0
 5     1       7      2       2      2         2


Isotonic Regression Problems, Figure 1
Example of CSD and GCM
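The graphical construction can be reproduced in a few lines. The following Python sketch (the function name `gcm_isotonic` is an illustrative choice, not from the source) builds the CSD points $P_0, \ldots, P_n$, keeps their lower convex hull with a monotone-chain scan, and reads off $g^*$ as the slope of the hull segment above each $x_i$. Exact rational arithmetic via `fractions.Fraction` is used so that collinear points and the values 1/5 of Fig. 1 come out exactly.

```python
from fractions import Fraction


def gcm_isotonic(g, w):
    """Isotonic regression of g (positive weights w, simple order) as the
    slopes of the greatest convex minorant of the CSD P_j = (W_j, G_j)."""
    pts = [(Fraction(0), Fraction(0))]                  # P_0 = (0, 0)
    for gi, wi in zip(g, w):
        W, G = pts[-1]
        pts.append((W + Fraction(wi), G + Fraction(gi) * Fraction(wi)))

    hull = []  # indices of CSD points on the lower convex hull
    for j, (x, y) in enumerate(pts):
        # drop the last hull point while it lies on or above the chord to P_j
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = pts[hull[-2]], pts[hull[-1]]
            if (y2 - y1) * (x - x1) >= (y - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(j)

    fitted = []
    for a, b in zip(hull, hull[1:]):
        slope = (pts[b][1] - pts[a][1]) / (pts[b][0] - pts[a][0])
        fitted.extend([slope] * (b - a))  # x_{a+1}, ..., x_b share this slope
    return fitted


print([str(v) for v in gcm_isotonic([-1, 3, 1, -2, 2], [1, 1, 2, 2, 1])])
# prints ['-1', '1/5', '1/5', '1/5', '2']
```

For the data of Fig. 1 the hull keeps $P_0$, $P_1$, $P_4$ and $P_5$, so $x_2, x_3, x_4$ share the slope $1/5$ of the segment from $P_1$ to $P_4$.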


Other isotonic regression problems are based on less restrictive kinds of order: partial order and quasi-order. In the partial order isotonic regression problem the binary relation $\preceq$ on $X$ is reflexive, transitive and antisymmetric, but there may be noncomparable elements. In the quasi-order isotonic regression problem, the ordering relation satisfies only the first two conditions (reflexivity and transitivity).

The isotonic regression problem arises in both 
statistics and operations research. Applications in 
statistics are discussed in [2,10] and [14]. Applications 
in operations research can be found in [11] and [15]. 

The problem under consideration is also of theoretical interest, being one of the very few quadratic problems known for which strongly polynomial solution algorithms exist (an algorithm is said to be strongly polynomial if the number of elementary arithmetic operations it requires is bounded by a polynomial in the number of input values, here the size of $X$, and does not depend on the magnitudes of those values). That is why many researchers in the area of order restricted statistical inference have paid a great deal of attention to the problem of developing algorithms for isotonic regression. Most of the proposed algorithms involve averaging $g$ over suitably selected subsets $S$ of $X$, on each of which $g^*(x)$ is constant.


The Pool Adjacent Violators Algorithm 


The first and most widely used algorithm for simply ordered isotonic regression is the pool adjacent violators algorithm (PAV), proposed by M. Ayer et al. [1] in 1955.

This algorithm follows directly from the geometrical interpretation of isotonic regression. As stated before, the solution of the problem under consideration is given by the left derivative of the greatest convex function lying below the CSD. If, for some index $i$, $g(x_{i-1}) > g(x_i)$, then the point $P_{i-1}$ lies above the segment joining $P_{i-2}$ to $P_i$, so the part of the GCM between the abscissae $W_{i-2}$ and $W_i$ is a straight line segment. Thus the CSD could be altered by connecting $P_{i-2}$ with $P_i$ by a straight line segment, without changing the GCM. The PAV algorithm is based on this idea of successive approximation to the GCM. (See Fig. 2 for a geometrical interpretation of ‘pooling’ adjacent violators.)

In describing the algorithm, an arbitrary set of consecutive elements of $X$ will be referred to as a block. The aim is to find the solution blocks, that is, a partitioning of $X$ into sets on each of which the isotonic regression function $g^*$ is constant.

 j                  1     2     3     4     5
 w(x_j)             1     1     2     2     1
 g(x_j)            -1     3     1    -2     2
First pool (x_3 with x_4)
 w                  1     1     4     4     1
 g                 -1     3   -1/2  -1/2    2
Second pool (x_2 with the block {x_3, x_4})
 w                  1     5     5     5     1
 g = g*(x_j)       -1    1/5   1/5   1/5    2


Isotonic Regression Problems, Figure 2
Geometrical interpretation of pooling adjacent violators

The PAV algorithm starts from the initial block class $A$ consisting of the singleton sets $\{x_i\}$, $1 \le i \le n$. At each stage of the algorithm, a new block class is obtained from the previous block class by joining blocks together, until the final partition is reached. If $g(x_1) \le \cdots \le g(x_n)$, then the initial partition is also the final partition, and $g^*(x_i) = g(x_i)$, $i = 1, \ldots, n$.

Otherwise, select any of the pairs of violators of the ordering; that is, select an index $i$ such that $g(x_i) > g(x_{i+1})$. ‘Pool’ these two values of $g$, i.e., join the two points $x_i$ and $x_{i+1}$ in a block $\{x_i, x_{i+1}\}$ ordered between $\{x_{i-1}\}$ and $\{x_{i+2}\}$, and associate to this block the average value $\mathrm{Av}\{x_i, x_{i+1}\}$ and the weight $w(x_i) + w(x_{i+1})$. After each step in the algorithm, the average values associated with the blocks are examined to see whether or not they are in the required order. If so, the final partition has been reached, and the value of $g^*$ at each point of a block is the ‘pooled’ value associated with the block. If not, a pair of adjacent violating blocks is selected and pooled to form a single block, with associated weight the sum of their weights and associated average value the weighted average of their average values, completing another step of the algorithm.

A pseudocode for the PAV algorithm is presented below, where $B$ is the first block in $A$ and $B_+$ is the block that follows $B$ in the block partition.


A = {{x_1}, ..., {x_n}};  g*(x) = g(x), ∀x ∈ X
REPEAT
    set B := first block of A, B_+ := succ(B)
    WHILE B_+ ≠ ∅
        IF Av B > Av B_+ THEN
            A = A \ {B, B_+} ∪ {B ∪ B_+}
            B = B ∪ B_+            (with weight w(B) + w(B_+))
            g*(x) = Av B, ∀x ∈ B
        ELSE
            B = B_+
        ENDIF
        B_+ = succ(B)
    ENDWHILE
UNTIL there are no violating blocks


A pseudocode for PAV algorithm
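For readers who want to experiment, here is a minimal Python sketch of PAV for a simple order (the function name `pav` is illustrative). Rather than repeatedly sweeping the block list as in the pseudocode, it uses a common single-pass variant: blocks are kept on a stack, and each new point is pooled backwards with its predecessors for as long as adjacent block averages violate the order. Because the isotonic regression is unique, the order in which violators are pooled does not affect the result. Exact `Fraction` arithmetic is used to reproduce the values of Fig. 2.

```python
from fractions import Fraction


def pav(g, w):
    """Pool adjacent violators for a simple order x_1 < ... < x_n.

    Returns g*(x_1), ..., g*(x_n) as exact fractions.
    """
    blocks = []  # each block: [sum of g*w, total weight, number of points]
    for gi, wi in zip(g, w):
        blocks.append([Fraction(gi) * Fraction(wi), Fraction(wi), 1])
        # pool backwards while the last two block averages violate the order
        while len(blocks) >= 2 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, wt, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += wt
            blocks[-1][2] += c
    fitted = []
    for s, wt, c in blocks:
        fitted.extend([s / wt] * c)  # every point of a block gets its average
    return fitted


print([str(v) for v in pav([-1, 3, 1, -2, 2], [1, 1, 2, 2, 1])])
# prints ['-1', '1/5', '1/5', '1/5', '2']
```

On the data of Fig. 2 the stack pools $x_2$ with $x_3$ first and then absorbs $x_4$, whereas the figure pools $x_3$ with $x_4$ first; both routes end with the same blocks, as uniqueness of $g^*$ requires. Each point is pushed once and popped at most once, so this variant runs in $O(n)$ time.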


S.J. Grotzinger and C. Witzgall [9] have shown that 
the computational complexity of the PAV algorithm 
is O(n). 


Minimum Lower Set Algorithm 


The first algorithm proposed for partially ordered isotonic regression is the minimum lower sets (MLS) algorithm, due to H.D. Brunk [4].

For describing this algorithm, as for most of the algorithms for general partial orders, it is convenient to introduce the concept of ‘level set’, which generalizes the concept of ‘block’. To define this set, the concepts of lower and upper set are needed. A set $L \subseteq X$ is called a lower set if, for all $x \in X$ and $y \in L$, $x \preceq y \Rightarrow x \in L$. A set $U \subseteq X$ is called an upper set if, for all $x \in X$ and $y \in U$, $y \preceq x \Rightarrow x \in U$. Finally, $S \subseteq X$ is called a level set if there are a lower set $L \subseteq X$ and an upper set $U \subseteq X$ such that $S = L \cap U$.

The isotonic regression with respect to any partial order is constant on level sets. The MLS algorithm computes the isotonic regression function by partitioning $X$ into $l$ level sets $S_1, \ldots, S_l$ such that $\mathrm{Av}\,S_1 < \cdots < \mathrm{Av}\,S_l$. In doing so, the algorithm performs $l$ steps, in each of which it searches for the largest level set of minimum average $S_i$ among the level sets $L \cap L_{i-1}^c$, $L \in \mathcal{L}$, where $L_{i-1}^c$ is the complement of $L_{i-1}$. The isotonic regression values are given by the weighted average of the observations in each of the level sets that belong to the solution partition.

In the following a pseudocode for the MLS algorithm is given, where $\mathcal{L}$ is the family of lower sets of $X$.

M.J. Best and N. Chakravarti [3] have proved that the MLS algorithm is of computational complexity $O(n^2)$.


select L_1 ∈ ℒ: Av B_1 = Av L_1 = min{Av L : L ∈ ℒ}
i = 1
REPEAT
    i = i + 1
    select L_i ∈ ℒ: Av B_i = Av(L_i ∩ L_{i-1}^c)
                   = min{Av(L ∩ L_{i-1}^c) : L ∈ ℒ, L ⊄ L_{i-1}}
    L_i = L_i ∪ L_{i-1}
UNTIL X is exhausted
g*(x) = Av B_j, ∀x ∈ B_j, j = 1, ..., i


A pseudocode for MLS algorithm 
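Because the family $\mathcal{L}$ can be enumerated directly when $X$ is tiny, the MLS idea can be checked by brute force. The Python sketch below is exponential in $n$ and is meant only as an illustration; the function name and the encoding of the order as pairs $(a, b)$ with $a \preceq b$ are assumptions. At each step it takes, among the lower sets of the elements not yet assigned, one of minimum weighted average, preferring the largest on ties; this matches choosing $L \cap L_{i-1}^c$ in the description above, since removing the lower set already found leaves exactly these candidates.

```python
from fractions import Fraction
from itertools import combinations


def mls_isotonic(g, w, order):
    """Brute-force minimum lower sets (MLS) isotonic regression.

    g, w  : observations and positive weights for elements 0..n-1
    order : pairs (a, b) meaning a precedes b in the partial order;
            exponential in n, for tiny illustrative examples only.
    """
    n = len(g)
    preds = {y: set() for y in range(n)}
    for a, b in order:
        preds[b].add(a)

    def is_lower(subset, universe):
        # every predecessor that is still unassigned must lie inside subset
        return all((preds[y] & universe) <= subset for y in subset)

    def avg(subset):
        num = sum(Fraction(g[i]) * Fraction(w[i]) for i in subset)
        return num / sum(Fraction(w[i]) for i in subset)

    fitted = [None] * n
    remaining = set(range(n))
    while remaining:
        best, best_avg = None, None
        for k in range(1, len(remaining) + 1):
            for combo in combinations(sorted(remaining), k):
                s = set(combo)
                if not is_lower(s, remaining):
                    continue
                a = avg(s)
                # minimum average first; on ties keep the larger set
                if best is None or a < best_avg or (a == best_avg and len(s) > len(best)):
                    best, best_avg = s, a
        for i in best:
            fitted[i] = best_avg
        remaining -= best
    return fitted


# Hypothetical partial order: elements 0 and 1 are incomparable, both precede 2
print([str(v) for v in mls_isotonic([3, 1, 2], [1, 1, 1], [(0, 2), (1, 2)])])
# prints ['5/2', '1', '5/2']
```

On a chain the function reproduces the PAV result of Fig. 2; in the partial order example, element 1 forms its own level set and elements 0 and 2 are pooled.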
Other Algorithms


Several other algorithms are available for solving the isotonic regression problem as well as its various special cases. Their descriptions are provided in [2,3,6,7,8,10,11,12,13,15], among others.

Best and Chakravarti, in their paper [3], have 
pointed out that several of the proposed algorithms are 
active set quadratic programming methods and that this 
methodology provides a unifying framework for study- 
ing algorithms for isotonic regression. 


See also 


> Regression by Special Functions 
C.E. Shannon’s seminal discovery [7] (1948) of his entropy measure in connection with communication theory has found useful applications in several other probabilistic systems. E.T. Jaynes further extended its scope by discovering the maximum entropy principle (MaxEnt) [1] (1957), which is inherent in the process of optimizing the entropy measure when some incomplete information is given about a system in the form of moment constraints. Over the past four decades, MaxEnt has given rise to an interdisciplinary methodology for the formulation and solution of a large class of probabilistic systems. Furthermore, MaxEnt’s natural kinship with Bayesian methods of analysis has further bolstered its importance as a viable tool for statistical inference.


Entropy and Uncertainty 


The word entropy first originated in the discipline of thermodynamics, but Shannon entropy has a much broader meaning since it deals with the more pervasive concept of information. The word entropy itself has now crept into common usage to mean the transformation of a quantity, or phenomenon, from order to disorder. This implies an irreversible rise in uncertainty. In fact, the word uncertainty would have been less ambiguous as to its intended meaning in the context of information theory, but for historical reasons the word entropy has come to stay in the literature.

Uncertainty arises both in probabilistic phenomena, such as the tossing of a coin, and equally well in deterministic phenomena, where we know that the outcome is not a chance event but we are merely fuzzy about the possibility of the specific outcome. What is germane to our study of MaxEnt is only probabilistic uncertainty. The concept of probability used in this context is what is generally known as the subjective interpretation, as distinct from the objective interpretation based on the frequency of outcome of an event. The subjective notion of probability considers a probability distribution as representing a state of knowledge, and hence it is observer-dependent.

The underlying basis for an initial probability assignment is given by Laplace’s principle of insufficient reason. According to this principle, if in an experiment with $n$ possible outcomes we have no information except that each probability $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$, then the most unbiased choice is the uniform distribution $(1/n, \ldots, 1/n)$. Laplace’s principle underscores the choice of maximum uncertainty based on logical reasoning only.


Why Choose Maximum Uncertainty? 


We shall now consider the example of a die in order to highlight the importance of maximum uncertainty as a preamble to our later discussion of MaxEnt. When the only information available is that the die has six faces, the uniform probability distribution $(1/6, \ldots, 1/6)$, satisfying the natural constraint

$$\sum_{i=1}^{6} p_i = 1, \qquad p_i \ge 0, \quad i = 1, \ldots, 6, \tag{1}$$

represents the maximum uncertainty. If, in addition, we are also given the mean number of points on the die, that is, if we are given that

$$p_1 + 2p_2 + 3p_3 + 4p_4 + 5p_5 + 6p_6 = 4.5, \tag{2}$$


the choice of distributions is restricted to the incom- 
plete information given by both (1) and (2), and, conse- 
quently, the maximum uncertainty encountered at the 
first stage is reduced. Since there are only two indepen- 
dent equations in six variables, there is an infinity of 
probability distributions satisfying the constraints. Out 
of all such distributions, one can anticipate the exis- 
tence of a distribution giving rise to the maximum un- 
certainty Smax. The importance of Smax can be deduced 
from a careful consideration of the process by which 
uncertainty is reduced (or never increased) by provid- 
ing more and more information in terms of moment 
constraints. If we use any distribution from amongst 
the infinity of distributions satisfying the constraints 
that is different from the one corresponding to Smax, 
it would imply that we would be using some informa- 
tion in addition to those given by (1) and (2). But scien- 
tific objectivity would behoove that we should use only 
the information that is given to us, and scrupulously 
avoid using any extraneous information. The principle 
of maximum uncertainty can, accordingly, be stated as: 


Out of all probability distributions consistent 
with a given set of constraints, the distribution 
with maximum uncertainty should be chosen. 


At first glance, it may seem paradoxical that while the 
goal is reduction of uncertainty, we are actually trying 


to maximize it. However, what we are ensuring by the 
principle of maximum uncertainty is that we are maxi- 
mally uncertain about what we do not know. 


Shannon Entropy 


The conclusion from the example of the die is that, in making inferences based on incomplete information, the probability distribution that has the maximum uncertainty permitted by the available information should be used. It is therefore necessary to have a quantitative measure of uncertainty in a probability distribution. A unique function was defined by Shannon [7] to measure uncertainty. Let $p = (p_1, \ldots, p_n)$ be a probability distribution satisfying the natural constraint

$$\sum_{i=1}^{n} p_i = 1. \tag{3}$$

Shannon’s measure of entropy (uncertainty) for this distribution is given by

$$S = -\sum_{i=1}^{n} p_i \ln p_i. \tag{4}$$

Shannon arrived at this unique measure by first stating the desirable properties that such a measure should have. Since not all of the properties in this long list are independent, he considered a smaller independent set of properties and deduced the uniqueness of (4). Similarly, A.I. Khinchin [5] assumed a different independent set and arrived at the same measure.
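Formula (4) is straightforward to evaluate. The short Python sketch below (the helper name is illustrative) uses the natural logarithm, as in (4), together with the usual convention $0 \ln 0 = 0$, which is an assumption not spelled out in the text. It confirms, for example, that the uniform distribution on $n$ outcomes has entropy $\ln n$, while a non-uniform distribution has less.

```python
import math


def shannon_entropy(p, tol=1e-9):
    """Shannon entropy S = -sum p_i ln p_i of eq. (4), using 0 ln 0 = 0."""
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > tol:
        raise ValueError("p must satisfy p_i >= 0 and the natural constraint (3)")
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


print(shannon_entropy([1 / 6] * 6), math.log(6))  # both approximately 1.7918
print(shannon_entropy([0.1, 0.2, 0.3, 0.4]) < math.log(4))  # prints True
```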

The Shannon entropy measure is the basis for 
Jaynes’ maximum entropy principle. Of particular im- 
portance is the property of concavity of the measure 
which guarantees the existence of a maximum entropy 
distribution with all p; > 0. Shannon’s work in infor- 
mation theory did not involve optimization and as such 
he did not make use of the concavity property whereas 
here, it is central to the development. 


Jaynes’ Maximum Entropy Formalism 


We will now present Jaynes’ maximum entropy formalism based on discrete multivariate distributions of the random variable $X$ and state some important results arising from it.

The ensemble

$$(X, p) = ((x_1, p_1), \ldots, (x_n, p_n)),$$

where $n$ is finite, represents all the possible realizations of $X$ and their probabilities of occurrence. $p$ is estimated by maximizing the Shannon measure (4) subject to the natural constraint (3) and the moment constraints

$$\sum_{i=1}^{n} p_i\, g_{ri} = a_r, \quad r = 1, \ldots, m; \qquad p_i \ge 0, \tag{5}$$

where $g_{ri}$ denotes the value of the $r$th moment function at $x_i$.

The Lagrangian is given by

$$L = -\sum_{i=1}^{n} p_i \ln p_i - (\lambda_0 - 1)\Big(\sum_{i=1}^{n} p_i - 1\Big) - \sum_{r=1}^{m} \lambda_r \Big(\sum_{i=1}^{n} p_i\, g_{ri} - a_r\Big), \tag{6}$$

where $\lambda_0, \ldots, \lambda_m$ are the $(m+1)$ Lagrange multipliers corresponding to the $(m+1)$ constraints of (3) and (5). Setting the partial derivatives to zero gives

$$\frac{\partial L}{\partial p_i} = -\ln p_i - \lambda_0 - \sum_{r=1}^{m} \lambda_r\, g_{ri} = 0 \tag{7}$$

or

$$p_i = \exp(-\lambda_0 - \lambda_1 g_{1i} - \cdots - \lambda_m g_{mi}), \tag{8}$$

for $i = 1, \ldots, n$. The $(m+1)$ multipliers are determined by substituting for $p_i$ from (8) in (3) and (5), so that

$$\sum_{i=1}^{n} \exp\Big(-\lambda_0 - \sum_{j=1}^{m} \lambda_j g_{ji}\Big) = 1 \tag{9}$$

and

$$\sum_{i=1}^{n} g_{ri} \exp\Big(-\lambda_0 - \sum_{j=1}^{m} \lambda_j g_{ji}\Big) = a_r, \tag{10}$$

for $r = 1, \ldots, m$, or

$$\exp(\lambda_0) = \sum_{i=1}^{n} \exp\Big(-\sum_{j=1}^{m} \lambda_j g_{ji}\Big) \tag{11}$$

and

$$a_r \exp(\lambda_0) = \sum_{i=1}^{n} g_{ri} \exp\Big(-\sum_{j=1}^{m} \lambda_j g_{ji}\Big), \tag{12}$$

for $r = 1, \ldots, m$, so that

$$a_r = \frac{\sum_{i=1}^{n} g_{ri} \exp\big(-\sum_{j=1}^{m} \lambda_j g_{ji}\big)}{\sum_{i=1}^{n} \exp\big(-\sum_{j=1}^{m} \lambda_j g_{ji}\big)}, \qquad r = 1, \ldots, m. \tag{13}$$

Equation (11) gives $\lambda_0$ as a function of $\lambda_1, \ldots, \lambda_m$. Equations (13) give $a_1, \ldots, a_m$ as functions of $\lambda_1, \ldots, \lambda_m$.

Based on the above formalism, we can derive the following results, which are useful in applications.

• The Lagrange multiplier $\lambda_0$ is a convex function of $\lambda_1, \ldots, \lambda_m$.

• The value of the maximum entropy is $S_{\max} = \lambda_0 + \lambda_1 a_1 + \cdots + \lambda_m a_m$.

• $S_{\max}$ is a concave function of $a_1, \ldots, a_m$.

• The Lagrange multipliers $\lambda_1, \ldots, \lambda_m$ are the partial derivatives of $S_{\max}$ with respect to $a_1, \ldots, a_m$, respectively.

• An alternative proof that MaxEnt gives globally maximum values of entropy, that is, $S_{\max} - S \ge 0$, can be given on the basis of Shannon’s inequality

$$\sum_{i=1}^{n} q_i \ln\frac{q_i}{p_i} \ge 0, \tag{14}$$

where $p$ is the MaxEnt distribution (8) and $q$ is any probability distribution satisfying the same constraints, with entropy $S$. Indeed, by (8), $-\sum_i q_i \ln p_i = \lambda_0 + \sum_r \lambda_r a_r = S_{\max}$, so the left-hand side of (14) equals $S_{\max} - S$.

• Jaynes’ formalism also leads to Jaynes’ entropy concentration theorem, which asserts that the constrained maximum probability distribution is the one that best represents our state of knowledge about the behavior of the system, and that MaxEnt is the preferred method of inferring that distribution. The conclusion is based on (14) and the chi-square test.

• Jaynes’ formalism is also applicable to continuous-variate probability distributions.

• In our earlier discussion, we stated that Laplace’s principle of insufficient reason was based purely on logic. We can now show that the uniform distribution results from MaxEnt when only the natural constraint (3) is specified: with $m = 0$, (8) gives $p_i = \exp(-\lambda_0)$ for every $i$, and (3) then forces $p_i = 1/n$.
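Returning to the die example of (1) and (2), the formalism can be applied numerically. With one moment constraint, $g_{1i} = i$ and $a_1 = 4.5$, (8) gives $p_i \propto \exp(-\lambda_1 i)$, and (13) becomes a single equation in $\lambda_1$ whose right-hand side is strictly decreasing in $\lambda_1$ (its derivative is minus a variance), so bisection suffices. The Python sketch below is a minimal illustration under those assumptions; the function name, the bracketing interval $[-50, 50]$ and the fixed iteration count are choices made here, not part of the source. It returns $\lambda_0$ from (11) as well, so that the result $S_{\max} = \lambda_0 + \lambda_1 a_1$ stated above can be checked.

```python
import math


def maxent_die(mean, faces=6, lo=-50.0, hi=50.0, iters=200):
    """MaxEnt distribution on {1, ..., faces} with a prescribed mean.

    Uses p_i = exp(-lambda0 - lambda1 * i), eq. (8) with g_1i = i, solves
    eq. (13) for lambda1 by bisection and gets lambda0 from eq. (11).
    Requires 1 < mean < faces.
    """
    xs = range(1, faces + 1)

    def dist(lam):
        z = [math.exp(-lam * x) for x in xs]
        s = sum(z)
        return [v / s for v in z]

    def mean_of(lam):
        return sum(x * p for x, p in zip(xs, dist(lam)))

    # mean_of is strictly decreasing in lambda1, so bisection applies
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_of(mid) > mean:
            lo = mid
        else:
            hi = mid
    lam1 = 0.5 * (lo + hi)
    lam0 = math.log(sum(math.exp(-lam1 * x) for x in xs))  # eq. (11)
    return lam0, lam1, dist(lam1)


lam0, lam1, p = maxent_die(4.5)
entropy = -sum(pi * math.log(pi) for pi in p)
print(lam1 < 0, abs(entropy - (lam0 + lam1 * 4.5)) < 1e-9)  # prints True True
```

For a mean of 4.5 the multiplier $\lambda_1$ is negative, so the probabilities increase with the number of points; for a mean of 3.5 the uniform distribution is recovered, in line with Laplace’s principle.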


Applications of MaxEnt and Conclusions 


As the very first application, Jaynes demonstrated the power of MaxEnt by deriving all the principal distributions of statistical mechanics without reference to the classical methods of derivation [1]. An important application that closely followed the application of MaxEnt to statistical mechanics was the correspondence that M. Tribus [9,10] established with the laws of thermodynamics. This application, incidentally, clarified the connection between the Shannon entropy and thermodynamic entropy. He also demonstrated that most of the commonly encountered statistical distributions can be derived from MaxEnt by making use of appropriate moment constraints, which later came to be known as characterizing moments. Thus, he established the integral link between information theory and statistics. For example, the normal distribution is a maximum-entropy distribution resulting from maximizing the Shannon entropy with respect to the characterizing moments of mean and variance.

These remarkable successes set in motion the ap- 
plications of MaxEnt in several other disciplines. To 
name only a few, MaxEnt has been applied to prob- 
lems in urban and regional planning, transportation, 
business, economics, finance, statistical inference, op- 
erations research, queueing theory, nonlinear spec- 
tral analysis, pattern recognition and image process- 
ing, computerized tomography, risk analysis, popula- 
tion growth models, chemical reactions and many other 
areas. These are all problems that are inherently prob- 
abilistic in nature or, alternatively, where the MaxEnt 
model is made to fit by artificially interpreting prob- 
abilities as proportions. References to these problems 
can be found in [2,3,4]. 

For the past ten years, research into MaxEnt has moved in the direction of using the principle in conjunction with Bayes’ theorem. A series of workshops has been conducted under this heading, which appears in the series [6].

Also of great interest is the concept of minimum entropy, which is found useful in recognizing patterns contained in an information structure. However, research in this direction has been hampered by the computational complexity of determining this quantity, because it results from the global minimization of a concave function, which is an NP-hard problem.

Closely associated with MaxEnt are the methods of 
optimization based on Kullback-Leibler measure [8] 
to measure distance between two probability distribu- 
tions. However, the school dedicated to the use of Max- 
Ent steers clear of this approach since it does not in- 
volve the concept of entropy. 


See also 


> Entropy Optimization: Interior Point Methods 
> Entropy Optimization: Parameter Estimation 


> Entropy Optimization: Shannon Measure of 
Entropy and its Properties 
> Maximum Entropy Principle: Image Reconstruction 
The job-shop problem may be formulated as follows. Given are $n$ jobs $j = 1, \ldots, n$ and $m$ machines $M_1, \ldots, M_m$. Job $j$ consists of a sequence

$$O_{1j}, \ldots, O_{n_j j}$$

of $n_j$ operations which must be processed in the given order, i.e. $O_{i+1,j}$ cannot start before $O_{ij}$ is completed, for $i = 1, \ldots, n_j - 1$. Associated with each operation $O_{ij}$ there is a processing time $p_{ij}$ and a machine $\mu_{ij} \in \{M_1, \ldots, M_m\}$. $O_{ij}$ must be processed for $p_{ij}$ time units on machine $\mu_{ij}$. Each job can be processed by at most one machine at a time and each machine can process only one operation at a time. If not stated otherwise, preemption of operations is not allowed. One has to find a feasible schedule which minimizes the makespan.

It is assumed that all processing times are nonnegative integers and that all jobs and machines are available at starting time zero. Furthermore, if not stated otherwise, machine repetition is allowed, i.e. $\mu_{i+1,j} = \mu_{ij}$ is possible.

For a precise formulation of the job-shop problem, let $O$ be the set of all operations, let $p(k)$ be the processing time of operation $k \in O$, and define $(k, l) \in C$ if and only if $k = O_{ij}$ and $l = O_{i+1,j}$ for some job $j$ and some $i = 1, \ldots, n_j - 1$. Finally, let $M(k)$ be the machine on which operation $k$ must be processed.

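Then the job-shop problem may be formulated as a disjunctive linear program (cf. » Linear programming):

$$\min \max_{k \in O} \{s(k) + p(k)\} \tag{1}$$

such that

$$s(k) + p(k) \le s(l) \quad \text{for all } k, l \in O \text{ with } (k, l) \in C, \tag{2}$$

$$s(k) + p(k) \le s(l) \ \text{ or } \ s(l) + p(l) \le s(k) \quad \text{for all } k, l \in O \text{ with } k \ne l,\ M(k) = M(l), \tag{3}$$

$$s(k) \ge 0 \quad \text{for all } k \in O. \tag{4}$$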


Here $s(k)$ represents the starting time of operation $k$. Due to (2), all operations of the same job are processed in the right order. The constraints (3) make sure that a machine cannot process two operations at the same time.
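The constraints can be checked mechanically for a candidate vector of start times. In the Python sketch below, operations are numbered $0, \ldots, |O|-1$, `conj` lists the pairs of $C$ and `machine[k]` plays the role of $M(k)$; this encoding, the small two-job instance and the function name are assumptions made for illustration. The function returns the objective value (1) if (2), (3) and (4) all hold, and `None` otherwise.

```python
def check_disjunctive(s, p, conj, machine):
    """Return objective (1) if start times s satisfy (2)-(4), else None."""
    ops = range(len(p))
    if any(s[k] < 0 for k in ops):                   # constraint (4)
        return None
    if any(s[k] + p[k] > s[l] for k, l in conj):     # constraint (2)
        return None
    for k in ops:                                    # constraint (3)
        for l in ops:
            if k < l and machine[k] == machine[l]:
                if not (s[k] + p[k] <= s[l] or s[l] + p[l] <= s[k]):
                    return None
    return max(s[k] + p[k] for k in ops)             # objective (1)


# Hypothetical instance: job 1 = ops 0 (M1, 3), 1 (M2, 2); job 2 = ops 2 (M2, 2), 3 (M1, 4)
p = [3, 2, 2, 4]
conj = [(0, 1), (2, 3)]
machine = [1, 2, 2, 1]
print(check_disjunctive([0, 3, 0, 3], p, conj, machine))  # prints 7
print(check_disjunctive([0, 3, 0, 2], p, conj, machine))  # prints None
```

The second call fails because operations 0 and 3 would overlap on $M_1$, which is exactly what the disjunction (3) rules out.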

The job-shop problem may be represented by a mixed graph $G = (O, C, D)$ with vertex set $O$, the set $C$ of (directed) arcs, and a set $D = \{\{k, l\} : k, l \in O,\ k \ne l,\ M(k) = M(l)\}$ of (undirected) edges. Furthermore, the processing time $p(k)$ is associated with each vertex $k \in O$. The arcs are called conjunctions and the edges are called disjunctions.

The basic scheduling decision is to define a processing order of the operations on each machine, i.e. to fix precedence relations between these operations.

In the mixed graph model this is done by orienting edges, i.e. by turning disjunctions into conjunctions. A set $S$ of these oriented edges is called an orientation. An orientation $S$ is called a complete orientation if
• every edge becomes oriented; and
• the resulting directed graph $G(S) = (O, C \cup S)$ has no cycles.

A complete orientation $S$ defines a feasible schedule, which is calculated in the following way. For each operation $k$ let $l(k)$ be the length of a longest path to vertex $k$ in $G(S)$. A path to $k$ is a sequence of vertices $v_1, \ldots, v_r = k$ such that $(v_i, v_{i+1})$ is an arc for $i = 1, \ldots, r - 1$. The length of a path $P$ to $k$ is the sum of all processing times of operations in $P$ excluding operation $k$. Choose $l(k)$ as the starting time of operation $k$. It is not difficult to see that this schedule is feasible. Furthermore, the length of a longest path in $G(S)$, now counting the processing time of its last operation as well, defines the makespan of this schedule. A corresponding path is called a critical path.

On the other hand, a feasible schedule $s = (s(k))_{k \in O}$ defines a complete orientation $S$, and the critical path length in $G(S)$ is not greater than the makespan of the schedule $s$. Thus, one may restrict attention to schedules defined by complete orientations.
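The longest-path computation can be sketched in Python as follows; the operation numbering, the instance and the function name are illustrative assumptions. The code visits $G(S)$ in a topological order (Kahn's algorithm), so $l(k)$ is final by the time $k$ is reached; if no topological order exists, $G(S)$ contains a cycle and $S$ is not a complete orientation.

```python
from collections import deque


def schedule_from_orientation(p, conj, orientation):
    """Start times l(k) and makespan from a complete orientation S.

    p: processing times; conj: arcs of C; orientation: arcs of S.
    Returns (starts, makespan), or None if G(S) = (O, C u S) has a cycle.
    """
    n = len(p)
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for k, l in list(conj) + list(orientation):
        succ[k].append(l)
        indeg[l] += 1
    start = [0] * n
    queue = deque(k for k in range(n) if indeg[k] == 0)
    visited = 0
    while queue:  # Kahn's algorithm: vertices leave the queue in topological order
        k = queue.popleft()
        visited += 1
        for l in succ[k]:
            # longest path to l: path to k plus p[k] (p[l] itself is excluded)
            start[l] = max(start[l], start[k] + p[k])
            indeg[l] -= 1
            if indeg[l] == 0:
                queue.append(l)
    if visited < n:
        return None  # some vertex never became free: G(S) contains a cycle
    return start, max(start[k] + p[k] for k in range(n))


# Hypothetical instance: ops 0 (M1, 3) -> 1 (M2, 2) form job 1,
# ops 2 (M2, 2) -> 3 (M1, 4) form job 2
p = [3, 2, 2, 4]
conj = [(0, 1), (2, 3)]
print(schedule_from_orientation(p, conj, [(0, 3), (2, 1)]))  # prints ([0, 3, 0, 3], 7)
print(schedule_from_orientation(p, conj, [(3, 0), (1, 2)]))  # prints None
```

The second orientation closes the cycle $0 \to 1 \to 2 \to 3 \to 0$, so it is not a complete orientation.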

There are only a few special cases of the job-shop 
problem which are polynomially solvable (cf. » Com- 
plexity classes in optimization). They will be discussed 
next. 


Complexity Results 


The two-machine job-shop problem in which each job has at most two operations can be solved by a simple extension of Johnson’s algorithm for the two-machine flow-shop problem [16]. Let $I_i$ be the set of jobs with operations on $M_i$ only ($i = 1, 2$), and let $I_{1,2}$ ($I_{2,1}$) be the set of jobs which are processed first on $M_1$ ($M_2$) and then on $M_2$ ($M_1$). Order the latter two sets by means of Johnson’s algorithm and the former two sets arbitrarily. Then one obtains an optimal schedule by executing the jobs on $M_1$ in order $(I_{1,2}, I_1, I_{2,1})$ and on $M_2$ in order $(I_{2,1}, I_2, I_{1,2})$.
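A minimal Python sketch of this rule follows. Johnson’s algorithm is written in its usual form: jobs whose first operation is no longer than the second go first, by increasing first-operation time, and the rest follow by decreasing second-operation time. The input format (a dictionary mapping each job name to its list of (machine, time) pairs, with the two operations of a two-operation job on different machines), the instance and the function names are assumptions made for illustration. The makespan is obtained by starting every operation as early as the machine sequences allow.

```python
def johnson_order(jobs):
    """Johnson's rule for two stages; jobs are (name, first_time, second_time)."""
    head = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    tail = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return [j[0] for j in head + tail]


def jackson_two_machine(jobs):
    """Two-machine job shop, at most two operations per job (machines 1 and 2).

    jobs: dict name -> list of (machine, time); returns (M1 order, M2 order, makespan).
    """
    def group(first_machine, length):
        return [(name, ops) for name, ops in jobs.items()
                if len(ops) == length and ops[0][0] == first_machine]

    i1 = [name for name, _ in group(1, 1)]
    i2 = [name for name, _ in group(2, 1)]
    i12 = johnson_order([(nm, ops[0][1], ops[1][1]) for nm, ops in group(1, 2)])
    i21 = johnson_order([(nm, ops[0][1], ops[1][1]) for nm, ops in group(2, 2)])

    t1 = t2 = 0
    first_done = {}  # completion time of each job's first operation
    for name in i12 + i1:   # M1 starts with I12 (first operations), then I1
        t1 += jobs[name][0][1]
        first_done[name] = t1
    for name in i21 + i2:   # M2 starts with I21 (first operations), then I2
        t2 += jobs[name][0][1]
        first_done[name] = t2
    for name in i12:        # M2 ends with the second operations of I12
        t2 = max(t2, first_done[name]) + jobs[name][1][1]
    for name in i21:        # M1 ends with the second operations of I21
        t1 = max(t1, first_done[name]) + jobs[name][1][1]
    return i12 + i1 + i21, i21 + i2 + i12, max(t1, t2)


# Hypothetical instance: J0, J1 visit M1 then M2; J2 visits M2 then M1; J3 uses M1 only
jobs = {"J0": [(1, 3), (2, 2)], "J1": [(1, 1), (2, 4)],
        "J2": [(2, 2), (1, 2)], "J3": [(1, 2)]}
print(jackson_two_machine(jobs))
# prints (['J1', 'J0', 'J3', 'J2'], ['J2', 'J1', 'J0'], 8)
```

In this instance the makespan 8 equals the total processing time on $M_1$, which is a lower bound, so the schedule is optimal.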

In [15] the two-machine job-shop problem with unit-time operations ($p_{ij} = 1$) and no machine repetition is solved in time linear in the total number of operations, through a rule that gives priority to the longest remaining job. Although this algorithm is fast, it is not polynomial if each job j is represented by the machine which processes its first operation $O_{1j}$ and by the number $n_j$ of operations of j, because the size of this encoding grows only logarithmically with $n_j$. However, there exists a clever implementation of this algorithm which is polynomial [17,26]. Surprisingly, the problem is NP-hard if we allow repetition of machines [12].
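A minimal time-stepped sketch of the "longest remaining job first" rule is given below. It assumes, as no machine repetition implies for two machines, that every job alternates between M1 and M2, so a job is described by its first machine and its number n_j of unit operations. It is neither the linear-time algorithm of [15] nor the polynomial implementation of [17,26], and ties are broken by input order, an arbitrary choice:

```python
def longest_remaining_first(jobs):
    """Unit-time two-machine job shop; each job alternates between M1 and M2.

    jobs -- dict: name -> (first machine in {1, 2}, number n_j of operations)
    Returns (makespan, list of (time slot, machine, job)).
    """
    nxt = {j: m for j, (m, n) in jobs.items() if n > 0}   # machine of next operation
    rem = {j: n for j, (m, n) in jobs.items() if n > 0}   # operations still to do
    t, slots = 0, []
    while rem:
        chosen = {}
        for m in (1, 2):
            ready = [j for j in rem if nxt[j] == m]
            if ready:
                # priority: the job with the most remaining (unit) operations
                chosen[m] = max(ready, key=lambda j: rem[j])
        for m, j in chosen.items():
            slots.append((t, m, j))
            rem[j] -= 1
            nxt[j] = 3 - m          # the next operation is on the other machine
            if rem[j] == 0:
                del rem[j]
                del nxt[j]
        t += 1
    return t, slots


cmax, slots = longest_remaining_first({"A": (1, 3), "B": (1, 1), "C": (2, 2)})
print(cmax)
```

In this instance M1 has to process four unit operations, so the makespan 4 returned by the sketch is optimal.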

This, however, is probably as far as one can get if the number of jobs is not fixed but part of the input. Two-machine job-shop problems with $n_j \le 3$ or $p_{ij} \in \{1, 2\}$ are NP-hard, as are three-machine problems with $n_j \le 2$ or $p_{ij} = 1$ [18,19].

The job-shop problem with two jobs may be for- 
mulated as a shortest path problem in the plane with 
regular objects as obstacles [2]. Figure 1 shows a short- 
est path problem with obstacles which corresponds to 
a job-shop problem with two jobs with $n_1 = 4$ and $n_2 =
3$. The processing times of the operations of job 1 (job 2)
are represented by intervals on the x-axis (y-axis) which 
are arranged in the order in which the corresponding 
operations are to be processed. Furthermore, the inter- 
vals are labeled by the machines on which the corre- 
sponding operations must be processed. 

A feasible schedule corresponds to a path from 0 to F which consists of segments that are either parallel to one of the axes or diagonal, and which avoids the interior of every rectangular obstacle. If one defines the length of a diagonal part of the path to be equal to the length of its projection on one of the axes, then the length of the path corresponds to the length of the schedule. It can be shown that this geometric problem can be formulated as a shortest path problem in an acyclic network with $O(r^2)$ arcs, where $r = \max\{n_1, n_2\}$, and that this network can be calculated in time $O(r^2 \log r)$, which is also the complexity for solving the problem [7]. The corresponding preemptive problem can be solved in $O(r^3)$ time by allowing the paths to go horizontally or vertically through the obstacles [24].

Job-shop Scheduling Problem, Figure 1: Path problem with obstacles (horizontal axis: operations of $J_1$ on $M_1, M_2, M_1, M_3$)

The two-machine job-shop problem with a fixed 
number k of jobs has been solved with time complex- 
ity O(n?*) [9]. However, the three machine job-shop 
problem with k = 3 is NP-hard [25] (cf. also » Com- 
plexity theory). If one allows preemption even the two- 
machine problem with k = 3 is NP-hard [12]. This 
is very surprising because the corresponding problem 
without preemption is polynomially solvable. 


Branch and Bound Algorithms 


Effective branch and bound algorithms (cf. » Integer programming: Branch and bound methods) have been developed for the job-shop scheduling problem from the 1990s onwards [3,11,13,20]. Rather than describing each of these algorithms in detail, we present some of their main concepts: lower and upper bounds, branching rules, and immediate selection.

Most of the branch and bound algorithms use the 
mixed graph model. The nodes of the enumeration tree 
correspond to orientations of edges representing sets 
of feasible schedules. Branching is done by adding fur- 
ther orientations in different ways. The leaves of the 
enumeration tree correspond to complete orientations 
while the root is defined by the empty orientation. 
Given an orientation S one may define heads and tails. 
A head r(k) of operation k is a lower bound for an ear- 
liest possible starting time of k. A tail q(k) of operation 
k is a lower bound for the time period between the fin- 
ishing time of operation k and the optimal makespan. 
A simple way to derive a head r(k) would be to calculate
the length of a longest path to k in G(S). Similarly,
a tail q(k) could be calculated as the length of a longest 
path starting in k excluding p(k). 
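Both longest-path computations can be sketched in a few lines, assuming the current (possibly partial) orientation S is given as a list of arcs including the job conjunctions; heads are filled in topological order and tails in reverse topological order. The names are illustrative, and `graphlib` is our choice of tool:

```python
from graphlib import TopologicalSorter


def heads_and_tails(p, arcs):
    """Heads r(k) and tails q(k) as longest paths in G(S).

    r(k): longest path to k, excluding p(k)
    q(k): longest path starting in k, excluding p(k)
    """
    preds = {k: [] for k in p}
    succs = {k: [] for k in p}
    for u, v in arcs:
        preds[v].append(u)
        succs[u].append(v)
    order = list(TopologicalSorter(preds).static_order())
    r, q = {}, {}
    for k in order:
        r[k] = max((r[u] + p[u] for u in preds[k]), default=0)
    for k in reversed(order):
        q[k] = max((p[v] + q[v] for v in succs[k]), default=0)
    return r, q


# Root of the enumeration tree: only the job conjunctions a -> b and c -> d.
p = {"a": 3, "b": 2, "c": 4, "d": 1}
r, q = heads_and_tails(p, [("a", "b"), ("c", "d")])
lower_bound = max(r[k] + p[k] + q[k] for k in p)  # critical path length
print(r, q, lower_bound)
```

The quantity r(k) + p(k) + q(k) is the length of a longest path through k, so its maximum is the critical path bound discussed under Lower Bounds.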

Let P be a critical path in G(S) and L(S) be the length of P. A maximal sequence $u_1, \dots, u_k$ of successive operations in P to be processed on the same machine is called a block if it contains at least two operations.

The following block theorem is used in connection with branch and bound algorithms. It also plays an important role when defining neighborhoods for local search methods.


Theorem 1 (block theorem) Let S be a complete ori- 
entation. If there exists another complete orientation S’ 
such that L(S')< L(S), then in S' at least one operation of 
some block B of G(S) has to be processed before the first 
or after the last operation of B. 
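Once a critical path is known, its blocks are easy to extract. A minimal sketch, assuming the path is a list of operations and `machine` maps each operation to its machine (both names are illustrative):

```python
from itertools import groupby


def blocks(path, machine):
    """Maximal runs of successive operations of a critical path on one machine,
    keeping only runs with at least two operations (the blocks)."""
    runs = [list(run) for _, run in groupby(path, key=lambda k: machine[k])]
    return [run for run in runs if len(run) >= 2]


path = ["a", "b", "c", "d", "e"]
machine = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 3}
print(blocks(path, machine))
```

By the block theorem, an improving orientation must move some operation of one of the returned blocks before the first or after the last operation of its block.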


Upper Bounds 


Each feasible solution provides an upper bound. At the root of the enumeration tree some time is invested in calculating a good initial upper bound. This is accomplished by applying tabu search using an appropriate neighborhood. Some branch and bound algorithms also heuristically calculate, in each vertex of the enumeration tree, a feasible solution satisfying the given orientation. If this solution provides a better upper bound than the current one, the current bound is updated. Furthermore, information provided by the heuristic solution is used in the branching process.


Lower Bounds 


Lower bounds are calculated for each node of the enumeration tree, i.e. for the set of solutions feasible with respect to the current orientation S. Lower bounds may be calculated constructively or destructively. Constructive lower bounds are calculated by solving relaxations of the problem. The destructive methods work as follows. For a given integer U one tries to prove that there exists no feasible solution with value $C_{\max} \le U$. In this case U + 1 is a valid lower bound. In case of a failure one repeats the test for a smaller U-value. Binary search can be applied to find a large lower bound.
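The binary search can be sketched as follows. The infeasibility test is a hypothetical callback: it must return True only when it proves that no schedule with $C_{\max} \le U$ exists, and it is assumed to be monotone (a proof for U also holds for every smaller U). The function name is ours:

```python
def destructive_lower_bound(lb, ub, infeasible):
    """Binary search over trial values U in [lb, ub].

    infeasible(U) must return True only if it *proves* that no schedule with
    C_max <= U exists; it is assumed monotone (a proof for U also holds for
    every smaller U). Returns the best lower bound found.
    """
    best = lb
    lo, hi = lb, ub
    while lo <= hi:
        U = (lo + hi) // 2
        if infeasible(U):
            best = max(best, U + 1)   # U + 1 is a valid lower bound
            lo = U + 1
        else:
            hi = U - 1                # test failed: try a smaller U
    return best


# Toy stand-in for a real test: infeasibility can be proved for every U <= 16.
print(destructive_lower_bound(0, 30, lambda U: U <= 16))
```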

The length of a critical path in G(S) provides a constructive lower bound which can be calculated fast. Good bounds are provided by certain one-machine relaxations known as head-body-tail problems: given a set of jobs $j = 1, \dots, n$ with release times (heads) r(j), processing times p(j), and tails q(j) to be processed on a single machine, find a schedule with starting times s(j) satisfying the release times such that $\max_{j=1}^{n} \{s(j) + p(j) + q(j)\}$ is minimized.

Unfortunately the one-machine head-body-tail problem is NP-hard. However, the preemptive version of this problem can be solved in time $O(n \log n)$ by applying the following scheduling rules:

• Take the release times and completion times as decision points.

• Process the decision points in increasing order; at each of them, schedule an available job with the longest tail.

By applying this algorithm to all operations to be processed on $M_k$ one gets a lower bound $L_k$ ($k = 1, \dots, m$). The best (largest) of these $L_k$-values is chosen.
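The preemptive rule can be sketched with a priority queue keyed by tail. Each job is pushed and popped once, which gives the stated $O(n \log n)$ running time. The function name and the (r, p, q) tuple format are our own choices:

```python
import heapq


def preemptive_head_body_tail(jobs):
    """Optimal value of the preemptive one-machine head-body-tail problem.

    jobs -- list of (r, p, q): head (release time), body (processing time), tail.
    At every decision point (a release or a completion) the available job with
    the longest tail is processed. Returns max_j (C_j + q_j).
    """
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])  # by release time
    left = [p for _, p, _ in jobs]           # remaining processing times
    ready = []                               # heap of (-tail, job)
    t, i, value = 0, 0, 0
    while i < len(order) or ready:
        if not ready:                        # machine idle: jump to next release
            t = max(t, jobs[order[i]][0])
        while i < len(order) and jobs[order[i]][0] <= t:
            j = order[i]
            heapq.heappush(ready, (-jobs[j][2], j))
            i += 1
        neg_q, j = ready[0]
        next_release = jobs[order[i]][0] if i < len(order) else float("inf")
        run = min(left[j], next_release - t)  # run until completion or next release
        t += run
        left[j] -= run
        if left[j] == 0:
            heapq.heappop(ready)
            value = max(value, t - neg_q)     # completion time plus tail
    return value


print(preemptive_head_body_tail([(0, 4, 1), (1, 2, 5), (3, 1, 0)]))
```

When the sketch is applied to the operations of machine $M_k$ with heads and tails taken from G(S), the returned value is the bound $L_k$. In the three-job example the value 8 is attained by the job with r = 1, p = 2, q = 5, which cannot finish before time 3.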

Other constructive lower bounds are based on two-job relaxations [10] and cutting plane approaches (cf. also » Integer programming: Cutting plane algorithms) [3].

For destructive methods one assumes that U is a fictive upper bound and wants to prove that no feasible schedule satisfying $C_{\max} \le U$ exists. From U one derives the time window $[r(k), d(k)]$ with $d(k) = U - q(k)$ in which each operation k must be processed if U is valid. For each job $j = 1, \dots, n$ let $\mathcal{S}_j$ be the set of schedules for j which are feasible with respect to its time windows. The feasibility problem can be reduced to a zero-one linear program as follows. For each schedule $\sigma \in \mathcal{S}_j$ ($j = 1, \dots, n$) one defines $a(\sigma, i, t) = 1$ if σ requires machine $M_i$ in time period $[t-1, t]$ and $a(\sigma, i, t) = 0$ otherwise. Let $x_{j\sigma}$ be a 0-1 decision variable that indicates whether job j is scheduled according to schedule σ. Then there exists no feasible schedule if and only if the following linear program has an optimal solution value $\lambda^* > 1$ (see [20]).


min A (5) 
such that 

Yee. Fae (6) 

o€S; 

n 

S » a(o,i, t)xjg <A, 

j=l o€Sj 

t=low.w me £=1 ,U, (7) 


Xjo € {0, 1}, 




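Due to (6) exactly one schedule is chosen from each set $\mathcal{S}_j$. The left-hand side of (7) counts the number of jobs to be processed on machine $M_i$ in time period $[t-1, t]$. Thus, one has a feasible schedule if and only if $\lambda^* = 1$. To check infeasibility one uses the continuous version of (5) to (8), where (8) is replaced by

$$x_{j\sigma} \ge 0, \quad j = 1, \dots, n;\ \sigma \in \mathcal{S}_j.$$

Since this relaxation can only lower the optimal value, an optimal value greater than 1 for the continuous version still proves that no feasible schedule exists.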
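To make the columns of the linear program (5) to (8) concrete, here is a minimal sketch, assuming integer data. It enumerates the schedules of a single job that fit into the time windows [r(k), d(k)] and lists the pairs (i, t) with a(σ, i, t) = 1. Solving the resulting linear program is not shown, and all names are hypothetical:

```python
def job_schedules(ops, windows):
    """All integer start-time vectors for one job that respect its time windows.

    ops     -- list of (machine, p) in the job's processing order
    windows -- list of (r, d) per operation: start >= r and start + p <= d
    """
    def extend(k, earliest, starts):
        if k == len(ops):
            yield tuple(starts)
            return
        _, p = ops[k]
        r, d = windows[k]
        # start after the job predecessor finishes and inside the window
        for s in range(max(r, earliest), d - p + 1):
            yield from extend(k + 1, s + p, starts + [s])

    yield from extend(0, 0, [])


def occupation(ops, starts):
    """Pairs (i, t) with a(sigma, i, t) = 1: machine i is busy in period [t-1, t]."""
    return {(m, t) for (m, p), s in zip(ops, starts) for t in range(s + 1, s + p + 1)}


ops = [(1, 2), (2, 1)]          # first 2 units on M1, then 1 unit on M2
windows = [(0, 3), (2, 4)]      # [r(k), d(k)] derived from U
S_j = list(job_schedules(ops, windows))
print(S_j, occupation(ops, S_j[0]))
```

The number of such schedules grows quickly with the window widths, so explicit enumeration as done here only suits small examples.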


A second destructive lower bound based on immediate 
selection will be discussed later. 


Branching 


The simplest branching scheme is to choose a not yet oriented edge and orient it in the two possible ways. Another, more sophisticated branching scheme is based on the block theorem: a node of the enumeration tree is branched into several children, the idea being to orient many edges at once (see [11]). In [20] a time-oriented branching scheme is used.


Immediate Selection 


Branching orients disjunctions. Another method to orient disjunctions, due to [13], is called immediate selection. It uses an upper bound UB for the optimal makespan and simple lower bounds based on heads r(k) and tails q(k) of operations k.

Let I be a set of n operations to be processed on the same machine and consider a strict subset $J \subset I$ and an operation $c \in I \setminus J$. If the condition

$$\min_{j \in J \cup \{c\}} r(j) + \sum_{j \in J \cup \{c\}} p(j) + \min_{j \in J} q(j) \ge UB \tag{9}$$

holds, then all operations $j \in J$ must be processed before operation c if we want to improve the current upper bound UB. This follows from the fact that the left-hand side of (9) is a lower bound for all schedules in which c does not succeed all jobs in J: in such a schedule some operation of J is the last of $J \cup \{c\}$ to be processed. Due to integrality, (9) is equivalent to

$$\min_{j \in J \cup \{c\}} r(j) + \sum_{j \in J \cup \{c\}} p(j) + \min_{j \in J} q(j) > UB - 1$$

or

$$\min_{j \in J \cup \{c\}} r(j) + \sum_{j \in J \cup \{c\}} p(j) > \max_{j \in J} d(j), \tag{10}$$

where $d(j) := UB - q(j) - 1$.

(J, c) is called a primal pair if (9) or, equivalently, (10) holds. The corresponding arcs $j \to c$ with $j \in J$ are called primal arcs. Similarly, (c, J) is called a dual pair and the arcs $c \to j$ are called dual arcs if

$$\min_{j \in J} r(j) + \sum_{j \in J \cup \{c\}} p(j) > \max_{j \in J \cup \{c\}} d(j) \tag{11}$$

holds. In this case all operations $j \in J$ must be processed after operation c if we want to improve the current solution value UB.

It can be shown [9] that all primal and dual arcs for the set I can be calculated in $O(n^2)$ time.
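Conditions (10) and (11) translate directly into code. The sketch below only tests a given pair; it enumerates no subsets and is not the efficient procedure of [9]. The function names and the dictionary-based data layout are assumptions made for illustration:

```python
def deadlines(q, UB):
    """d(j) := UB - q(j) - 1."""
    return {j: UB - qj - 1 for j, qj in q.items()}


def is_primal_pair(J, c, r, p, q, UB):
    """Condition (10): every j in J must precede c to beat UB."""
    d = deadlines(q, UB)
    S = list(J) + [c]
    return min(r[j] for j in S) + sum(p[j] for j in S) > max(d[j] for j in J)


def is_dual_pair(c, J, r, p, q, UB):
    """Condition (11): c must precede every j in J to beat UB."""
    d = deadlines(q, UB)
    S = list(J) + [c]
    return min(r[j] for j in J) + sum(p[j] for j in S) > max(d[j] for j in S)


# Three operations on one machine; c has the smallest head.
r = {"a": 4, "b": 4, "c": 0}
p = {"a": 2, "b": 2, "c": 3}
q = {"a": 0, "b": 0, "c": 0}
print(is_primal_pair({"a", "b"}, "c", r, p, q, UB=10),
      is_dual_pair("c", {"a", "b"}, r, p, q, UB=10))
```

Here (c, {a, b}) is a dual pair: if c is not processed first, one of a, b starts at time 4 or later, so the three operations cannot all finish before 4 + 7 = 11, which is at least UB; hence the dual arcs c → a and c → b can be fixed.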

Immediate selection is applied to speed up a branch 
and bound algorithm. For each machine the set I of all 
operations to be processed on this machine is consid- 
ered and all corresponding primal and dual arcs are cal- 
culated. Then heads and tails are recalculated and the 
whole process is repeated until there are no new primal 
or dual arcs to be added. 

By this method the orientation S is increased step by 
step. A possible outcome of this process is that one de- 
duces a graph G(S) which contains cycles. This implies 
that there exists no feasible solution with makespan < 
UB. 

Immediate selection and a corresponding cycle check are applied to each node of the enumeration tree. If a cycle is detected, one can backtrack.

Immediate selection is also used to calculate a lower bound by the destructive method.


Heuristic Procedures 


Using a branch and bound algorithm and immediate selection, J. Carlier and E. Pinson [13] were able to solve the notorious 10 × 10 benchmark problem in [21] for the first time. Recently (2000), 15 × 15 benchmark problems have been solved [6]. Problems of dimension n × n for n > 15 are still out of reach if one is interested in optimal solutions. Thus, the only way to find solutions for larger job-shop problems is to apply heuristics, which provide solutions that are not too far away from the optimum. Some of these heuristics will be discussed next.


Priority Rule Based Heuristics 


These are probably the most frequently applied heuristics, due to their ease of implementation and low computation times. The idea of a priority heuristic is to schedule operations step by step, each as early as possible. In each step, among all unscheduled operations whose conjunctive predecessors are already scheduled, a candidate with the highest priority is chosen to be scheduled next. For extended summaries and discussions of priority rules see [5,14,23].
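A generic priority-rule heuristic can be sketched as below. The priority function is a parameter. The "most work remaining" rule used in the example is one common choice and is our assumption, not a rule prescribed by the text. For simplicity each operation is appended after the operations already on its machine, so earlier idle gaps are not filled:

```python
def priority_schedule(jobs, priority):
    """List scheduling driven by a priority rule.

    jobs     -- dict: name -> list of (machine, p) in processing order
    priority -- function (job, k, jobs) -> number; higher is scheduled first
    Each chosen operation starts as early as its job and machine allow; it is
    appended after the operations already on its machine (no gap filling).
    """
    next_op = {j: 0 for j in jobs}
    job_ready = {j: 0 for j in jobs}
    machine_free = {}
    starts = {}
    while True:
        # candidates: next operation of each unfinished job (its
        # conjunctive predecessors are already scheduled)
        cands = [j for j in jobs if next_op[j] < len(jobs[j])]
        if not cands:
            break
        j = max(cands, key=lambda j: priority(j, next_op[j], jobs))
        k = next_op[j]
        m, p = jobs[j][k]
        s = max(job_ready[j], machine_free.get(m, 0))
        starts[(j, k)] = s
        job_ready[j] = machine_free[m] = s + p
        next_op[j] += 1
    return starts, max(job_ready.values(), default=0)


def most_work_remaining(j, k, jobs):
    """Example rule (our choice): total processing time still to do for job j."""
    return sum(p for _, p in jobs[j][k:])


jobs = {"A": [(1, 3), (2, 2)], "B": [(2, 4), (1, 1)]}
starts, cmax = priority_schedule(jobs, most_work_remaining)
print(starts, cmax)
```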


Shifting Bottleneck Heuristic 


(See [1,4].) This is one of the most powerful heuristics for the job-shop scheduling problem. In each iteration a machine $M_i$ is chosen and all disjunctions between operations to be processed on $M_i$ are oriented. This is done according to the exact solution of the head-body-tail problem for $M_i$. Thus, after k steps the disjunctions for k machines are oriented. Let $\mathcal{M}_k$ be the set of these k machines. Before choosing a new machine $M_i \notin \mathcal{M}_k$ in the next step, the orientations for the machines in $\mathcal{M}_k$ are updated by applying the head-body-tail algorithm to each of the machines in $\mathcal{M}_k$ in a given machine order. As the machine $M_i \notin \mathcal{M}_k$ to be added to $\mathcal{M}_k$ in the next step, a bottleneck machine is chosen. A bottleneck machine is a machine $M_i \notin \mathcal{M}_k$ with a largest head-body-tail problem solution value. It is important to note that before each application of the solution procedure for a head-body-tail problem, heads and tails are updated according to the current orientation.


Local Search 


An important class of heuristics is formed by local search methods such as iterative improvement, simulated annealing (cf. » Simulated annealing), tabu search and genetic algorithms (cf. » Genetic algorithms). All these methods have been applied to the job-shop problem (see [27] for an excellent survey).

The local search techniques are based on the con- 
cept of local improvement. Given an existing solution 
or representation of such a solution, a modification is 
made in order to obtain a different (usually better) so- 
lution. 

For the job-shop problem solutions are represented by complete orientations. To modify a solution one usually restricts attention to critical paths (which must be destroyed in order to improve the current makespan). One possibility for modifications is to choose an arc (v, w) on a critical path such that v and w are processed on the same machine and to replace (v, w) by the reverse arc (w, v). It can be shown that the corresponding new orientation is complete again, i.e. no cycles are created by such a reversal. Other modifications are based on the block theorem: one modifies an orientation by shifting an operation of some block to the beginning or the end of the block. Such modifications are not defined if the resulting selections are not complete, i.e. contain cycles.
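The first modification can be sketched as follows, assuming a complete orientation is stored as one processing sequence per machine. The names are illustrative, and the sketch checks adjacency on the machine explicitly rather than relying on a property of critical paths:

```python
def swap_neighbors(seqs, path, machine):
    """Neighbors obtained by reversing one critical arc (v, w) with v, w on the same machine.

    seqs    -- dict: machine -> list of operations in processing order
    path    -- a critical path, as a list of operations
    machine -- dict: operation -> machine
    Yields ((v, w), new_seqs); seqs itself is not modified.
    """
    for v, w in zip(path, path[1:]):
        m = machine[v]
        if machine[w] != m:
            continue                      # arc between different machines
        seq = seqs[m]
        i = seq.index(v)
        if i + 1 < len(seq) and seq[i + 1] == w:   # v, w adjacent on m
            new = dict(seqs)              # shallow copy; only seqs[m] changes
            new[m] = seq[:i] + [w, v] + seq[i + 2:]
            yield (v, w), new


seqs = {1: ["a", "d"], 2: ["c", "b"]}
machine = {"a": 1, "b": 2, "c": 2, "d": 1}
moves = list(swap_neighbors(seqs, ["c", "b"], machine))
print(moves)
```

In a local search such as tabu search, each neighbor would then be evaluated, for example by recomputing the longest paths in the new graph.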

One of the best local search methods in terms of so- 
lution quality and computation time is a tabu search 
procedure described in [22]. 


See also 


> MINLP: Design and Scheduling of Batch Processes 
> Vehicle Scheduling
L.V. Kantorovich was born in St. Petersburg, Russia,
on January 6, 1912 and died on April 7, 1986. Kan- 
torovich shared the 1975 Nobel Prize for Economics 
with T. Koopmans for their work on the optimal allo- 
cation of scarce resources [4,5]. 

Kantorovich was educated at Leningrad State Univ., receiving his doctorate in mathematics (1930) there at the age of 18. He became a professor at Leningrad in 1934, a position he held until 1960. He headed the department of mathematics and economics in the Siberian branch of the U.S.S.R. Academy of Sciences from 1961 to 1971 and then served as head of the research laboratory at Moscow's Institute of National Economic Planning (1971-1976). Kantorovich was elected to the prestigious Academy of Sciences of the Soviet Union (1964) and was awarded the Lenin Prize in 1965. For detailed information on the life and scientific views of Kantorovich, see his paper [2].

Kantorovich was one of the first to use linear pro- 
gramming as a tool in economics. His most famous 
work is [1]. The characteristic of Kantorovich’s work is 
a combination of theoretical and applied research. His 
first works concerned delicate problems of set theory. 
Later he became one of the first Soviet specialists on 
functional analysis. In the 1930s he laid down the foundations of the theory of semi-ordered spaces, which now constitutes a vast chapter of functional analysis bordering on algebra and measure theory. At the same time he anticipated the ideas of the future theory of generalized functions, which became current only in the 1950s. Kantorovich obtained beautiful results in approximation theory. The approach to Sobolev's embedding theorem suggested by Kantorovich (based on his estimates of integral operators) is well known.
In the mid 1960s R. Solomonoff [16], A.N. Kolmogorov [11] and G. Chaitin [4] independently invented the field now generally known as Kolmogorov complexity. It is also known variously as algorithmic complexity, algorithmic information, algorithmic entropy, Solomonoff-Kolmogorov-Chaitin complexity, descriptional complexity, shortest program length, algorithmic randomness and others. An extensive history of the field can be found in [14].

The Kolmogorov complexity formalizes the notion 
of amount of information necessary to uniquely de- 
scribe a digital object. A digital object means one that 
can be represented as a finite binary string, for exam- 
ple, a genome, an Ising microstate, or an appropriately 
coarse-grained representation of a point in some con- 
tinuum state space. In particular, the Kolmogorov com- 
plexity of a string of bits is the length of the short- 
est computer program that prints that string and stops 
running. The Kolmogorov complexity of an object is a form of absolute information of the individual object. Shannon's information theory cannot capture this: unlike Kolmogorov complexity, it is concerned only with the average information of a random source [14].

Solomonoff was addressing the problem: How do 
we assign a priori probabilities to hypotheses when we 


begin an experiment? He represented a scientist’s ob- 
servations as a series of binary digits and weighted to- 
gether all the programs for a given result into a proba- 
bility measure. The scientist seeks to explain these ob- 
servations through a theory, which can be regarded as 
an algorithm capable of generating the series and ex- 
tending it, that is, predicting future observations. For 
any given series of observations there are always sev- 
eral competing theories and the scientist must choose 
among them. The model demands that the smallest al- 
gorithm, the one consisting of the fewest bits, be se- 
lected. Stated another way, this rule is the familiar for- 
mulation of Occam’s razor: Given differing theories 
of apparently equal merit, the simplest is to be pre- 
ferred [6]. 

Thus in the Solomonoff model a theory that enables 
one to understand a series of observations is seen as 
a small computer program that reproduces the observa- 
tions and makes predictions about possible future ob- 
servations. The smaller the program, the more compre- 
hensive the theory and the greater the degree of under- 
standing. Observations that are random cannot be re- 
produced by a small program and therefore cannot be 
explained by a theory. In addition the future behavior 
of a random system cannot be predicted. For random 
data the most compact way for the scientist to commu- 
nicate his or her observations is to publish them in their 
entirety [6]. 

Kolmogorov and Chaitin independently suggested 
that computers be applied to the problem of defining 
what is meant by a random finite binary string of 0s and 
1s [5]. In the traditional foundations of the mathemati- 
cal theory of probability, as expounded by Kolmogorov 
in his classic [10], there is no place for the concept of 
an individual random string of 0s and 1s. Yet it is not
altogether meaningless to say that the string 


001110100001110011010000111110 
is more random than the string 
000000000000000000000000000000 


for we may describe the second string as thirty 0s, but 
there is no shorter way to specify the first string than by 
just writing it all out [5]. 

We believe that the random strings of a given length are those that require the longest programs. Those strings of length n that can be produced by a computer program much shorter than n are the nonrandom strings. The more a binary string can be compressed into a short program, the less random the string is.
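Kolmogorov complexity itself is not computable, but the compression view can be illustrated with an ordinary compressor. The sketch below uses Python's standard zlib module as a stand-in for "a short program", which is our assumption. Compressed length is only an upper bound on K(s), up to a constant. For very short strings such as the two 30-digit examples above, the compressor's fixed overhead dominates, so longer strings are used:

```python
import random
import zlib


def compressed_bits(s: str) -> int:
    """Bits in a zlib encoding of s: a computable upper-bound proxy for K(s),
    valid only up to an additive constant that depends on the compressor."""
    return 8 * len(zlib.compress(s.encode("ascii"), 9))


regular = "0" * 1000                                   # highly regular string
rng = random.Random(0)                                 # fixed seed for repeatability
irregular = "".join(rng.choice("01") for _ in range(1000))
print(compressed_bits(regular), compressed_bits(irregular))
```

The exact byte counts depend on the zlib version, but the all-zero string compresses to a small fraction of the size of the pseudo-random one.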

Solomonoff, Kolmogorov and Chaitin saw that the 
notion of ‘computable’ lay at the heart of their ques- 
tions. They arrived at equivalent notions, showing that 
these two questions are fundamentally related. 

The Kolmogorov complexity of a string is low if 
it can easily be obtained by a computation, whereas it 
will be high if it is difficult to obtain it [1]. The dif- 
ficulty is measured by the length of the shortest pro- 
gram that computes the string on a universal Turing 
machine. The use of Turing machines to determine the 
length of the shortest program that computes a particular bit-string is intuitive: since a universal Turing machine can simulate any other Turing machine, the length of the program computing string s on Turing machine T can differ from the length of the program computing the same string on Turing machine T' only by a finite length $l(T, T')$, the length of the prefix code necessary to simulate T on T'. As this difference is a constant that does not depend on s, the length of the shortest program computing s on a universal Turing machine becomes independent of the choice of machine in the limit of infinitely long strings s, and the Kolmogorov complexity of string s is defined as

$$K(s) = \min\{|p| : s = A_T(p)\},$$

where |p| stands for the length of program p and $A_T(p)$ represents the result of running program p on Turing machine T.

This measure can be illustrated by a few exam- 
ples. A blank tape (the string with all zeros) is clearly 
a highly regular string and correspondingly its Kol- 
mogorov complexity will be low. Indeed, the program 
needed to produce this string can be very short: print 
zero, advance, repeat. The same is true, of course, for 
every string with a repetitive pattern. 

Another way of viewing algorithmic regularity is 
by saying that an algorithmically regular string can be 
compressed to a much smaller size: the size of the small- 
est program that computes it. More interesting is the 
regularity of a string that can be obtained by the application of a finite but nontrivial algorithm, such as the calculation of the transcendental number π. The string representing the binary equivalent of π certainly appears completely random, yet the minimal program necessary to compute it is finite. Thus, such a string is also classified as algorithmically regular (though not quite as regular as the blank tape) [1].
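That a finite program suffices for π can be made concrete. The sketch below prints any number of binary digits of π using only integer arithmetic and Machin's formula π = 16 arctan(1/5) - 4 arctan(1/239). The choice of formula, the number of guard bits and the function names are ours:

```python
def arctan_inv(x: int, one: int) -> int:
    """one * arctan(1/x), computed with integer arithmetic (Taylor series)."""
    total = term = one // x
    x2 = x * x
    n, sign = 1, -1
    while term:
        term //= x2                              # one / x^(2n+1)
        total += sign * (term // (2 * n + 1))
        sign, n = -sign, n + 1
    return total


def pi_bits(n: int) -> str:
    """First n bits of pi after the binary point, via Machin's formula."""
    guard = 32                                   # extra bits absorb rounding errors
    one = 1 << (n + guard)
    pi = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)
    frac = (pi >> guard) - (3 << n)              # drop the integer part 3 = 0b11
    return format(frac, f"0{n}b")


print(pi_bits(32))
```

The program has a fixed size, while its output can be made as long as desired, which is why the binary expansion of π counts as algorithmically regular.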

Kolmogorov complexity also provides a means to 
define randomness in this context. According to the 
Kolmogorov measure, a string r is declared random if 
the size of the smallest program to compute r is as long 
as r itself, i. e., 


$$K(r) \approx |r|.$$


Why should this definition of randomness be 
preferable to any other we might come up with? The 
answer to that was provided by P. Martin-Loef, who 
was a postdoc of Kolmogorov. Roughly, he demon- 
strated that the definition ‘an n-bit string s is random 
if and only if K(s) > n’ ensures that every such individ- 
ual random string possesses with certainty all effectively 
testable properties of randomness that hold for strings 
produced by random sources on the average [9]. 

The algorithmic definition of randomness provides 
a new foundation for the theory of probability. By no 
means does it supersede classical probability theory, 
which is based on an ensemble of possibilities, each of 
which is assigned a probability. Rather, the algorithmic 
approach complements the ensemble method by giving 
precise meaning to concepts that had been intuitively 
appealing but that could not be formally adopted [6]. 

The Kolmogorov complexity of a string s can also be defined, up to an additive constant, as the negative base-2 logarithm of the string's algorithmic probability P(s) [2,18]. This in turn is defined as the probability that a standard universal computer T, randomly programmed, would embark on a computation yielding s as its sole output, afterward halting. The algorithmic probability P(s) may be thought of as a weighted sum of contributions from all programs that produce s, each weighted according to the negative exponential of its binary length, that is, $P(s) = \sum_{p:\,A_T(p) = s} 2^{-|p|}$.

Turning to the sum of P(s) over outputs, the sum $\sum_s P(s)$ is not equal to unity because, as is well known, an undecidable subset of all universal computations fails to halt and so produces no output. Therefore $\sum_s P(s)$ is an uncomputable irrational number less than 1. This number, called Chaitin's Omega [7], has many remarkable properties [8], such as the fact that its uncomputable digit string is a maximally compressed form of the information required to solve the halting problem [2].
Though very differently defined, Kolmogorov complexity is typically very close in value to the ordinary statistical entropy $-\sum p \log p$. To take a simple example, it is known that almost all N-bit strings drawn from a uniform distribution (of statistical entropy N bits) have Kolmogorov complexity nearly N bits. More generally, in any concisely describable ensemble of digital objects, e.g., a canonical ensemble of Ising microstates at a given temperature, the ensemble average of the objects' Kolmogorov complexity closely approximates the whole ensemble's statistical entropy [2,18]. In the case of continuous ensembles, the relation between Kolmogorov complexity and statistical entropy is less direct because it depends on the choice of coarse-graining. Some of the conceptual issues involved are discussed in [17].

The basic flaw in the Kolmogorov construction (as 
far as physical complexity is concerned) is the absence 
of a context [1]. This is easily rectified by providing 
the Turing machine with a tape u, which represents the 
physical ‘universe’, while the Turing machine with u as 
input computes various strings from u. 

The conditional Kolmogorov complexity of a string s is defined as the length of the shortest program that computes s given string u [12]:

$$K(s|u) = \min\{|p| : s = A_T(p, u)\},$$

where $A_T(p, u)$ denotes the result of the computation running p on Turing machine T with u as input tape. The conditional complexity measures the remaining randomness in string s, i.e., it counts those bits that are not correlated with bits in u. In other words, the program p is the maximally compressed string containing those bits that cannot be computed from u, as well as the instructions necessary to compute those bits of s that can be obtained from u.

The latter part of the program is of vanishing length in the limit of infinitely long strings, which implies that the program p mainly contains the remaining randomness of s. The mutual complexity is defined by

$$K(s:u) = K(s) - K(s|u),$$

which clearly just measures the number of bits that mean something in the universe u.
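A rough, purely illustrative estimate of K(s : u) can be obtained with a compressor C by approximating K(s|u) by C(us) - C(u), the extra length needed for s once u has been seen. This compression-based approximation, and the use of zlib, are heuristics of our choosing, not part of the construction above:

```python
import random
import zlib


def c(s: str) -> int:
    """Compressed length in bytes: a crude stand-in for K(s)."""
    return len(zlib.compress(s.encode("ascii"), 9))


def mutual(s: str, u: str) -> int:
    """Estimate of K(s:u) = K(s) - K(s|u), with K(s|u) approximated by C(us) - C(u)."""
    return c(s) - (c(u + s) - c(u))


rng = random.Random(1)
s = "".join(rng.choice("01") for _ in range(2000))
unrelated = "".join(rng.choice("01") for _ in range(2000))
print(mutual(s, s), mutual(s, unrelated))
```

A universe containing s itself yields a large estimate, while an independent random string yields one close to zero, mirroring the interpretation of K(s : u) as the bits of s that mean something in u.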

Let us consider K(s : u) in more detail. Its meaning becomes clearer if, instead of a single string s obtained by Turing machine T with universe u, one considers the ensemble of strings S that can be obtained from a universe u with T. This ensemble can be thought of as a probabilistic mixture subject to random bit-flips.

In other words, one can imagine the output tapes to be connected to a heat bath. In that case, an entropy H(S) can be associated with the ensemble of strings S. Consider a Turing machine operating on u, a specific universe. Obtaining s from u then constitutes a measurement on the universe U and consequently reduces not only the conditional entropy of S given u, but also the conditional entropy of U given s.

Note that the universe is assumed here to be fully known, i.e., there is only one tape u in the ensemble U. While this need not strictly be so, it is sometimes convenient to assume that there is no randomness in the universe. Also, the length of the smallest program that computes s from u, averaged over the possible realizations of s, then just equals the conditional entropy of S given u [1].

It is known that the average Kolmogorov complexity over an ensemble of strings just equals the entropy of the ensemble. Then

$$H(S|u) = \langle K(s|u) \rangle_S = -\sum_s p(s|u) \log p(s|u) \tag{1}$$

and

$$\langle K(s) - K(s|u) \rangle_S = H(S) - H(S|u).$$


Note that (1) is not strictly a conditional entropy, 
as no average over different realizations of the universe 
takes place. Indeed, it looks just like a conventional 
Shannon entropy only with all probabilities being prob- 
abilities conditional on u. 

Determining the Kolmogorov complexity of a string is a difficult problem; in general, K(s) is not even computable. For this reason, Kolmogorov complexity remained more of a curiosity than a practical mathematical tool. In the last few years, mainly due to P. Vitanyi and M. Li, significant progress in using Kolmogorov complexity has been made [14].

In particular, several successful applications of Kolmogorov complexity in the theory of computation have been made, and the general pattern of the incompressibility method has emerged [14]. The incompressibility method and Kolmogorov complexity are truly versatile mathematical tools. The incompressibility method is a basic general technique like the 'pigeon hole argument', the 'counting method' or the 'probabilistic method'. Kolmogorov complexity is a sharper relative of classical information theory (absolute information of individual objects rather than average information over a random ensemble) and yet satisfies many of the laws of classical information theory, although with a slight error term. Applications of Kolmogorov complexity have been given in a number of areas, including [14]:

• randomness of individual finite objects or infinite strings, Martin-Loef tests for randomness, Gödel's incompleteness result, information theory of individual objects;

• universal probability, general inductive reasoning, inductive inference, prediction, mistake bounds, computational learning theory, inference in statistics;

• the incompressibility method, combinatorics, graph theory, Kolmogorov 0-1 laws, probability theory;

• theory of parallel and distributed computation, time and space complexity of computations, average case analysis of algorithms, language recognition, string matching, routing in computer networks, circuit theory, formal language and automata theory, parallel computation, Turing machine complexity, complexity of tapes, stacks, queues, average complexity, lower bound proof techniques; structural complexity theory, oracles;

• logical depth, universal optimal search, physics and computation, dissipationless reversible computing, information distance and picture similarity, thermodynamics of computing, statistical thermodynamics and Boltzmann entropy.
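The counting argument at the core of the incompressibility method can be written out in a few lines. The following standard derivation, with programs taken as binary strings, is added for illustration:

$$\#\{p : |p| < n - c\} = \sum_{k=0}^{n-c-1} 2^k = 2^{n-c} - 1 < 2^{n-c}.$$

Since each program prints at most one string, fewer than $2^{n-c}$ of the $2^n$ binary strings of length n can have $K(s) < n - c$. Hence at least a fraction $1 - 2^{-c}$ of them satisfy $K(s) \ge n - c$, and for c = 0 at least one string of every length n is incompressible.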

Based on the Turing model of computation, the field of Kolmogorov complexity will probably need to be modified to account for the new quantum modes of computation. Recent studies increasingly suggest that this modification is likely to be based on notions that go beyond the framework of space-time (for example [15]) and to be sought within the world view which considers natural systems not as separate entities but as integrated parts of an undivided whole [3].

An attempt to contribute to such a modification is made in [13]. The results are based on a mathematical structure, called a web of relations, which is a collection of hierarchical formations of integer relationships. The web of relations allows one to introduce a concept of structural complexity that measures the complexity of a binary string in terms of the corresponding hierarchical formations of integer relationships. Importantly, the concept of structural complexity is based on the integers only and does not rely on notions that derive from space-time.


See also 


> Complexity Classes in Optimization 

> Complexity of Degeneracy 

> Complexity of Gradients, Jacobians, and Hessians 

> Complexity Theory 

> Complexity Theory: Quadratic Programming 

> Computational Complexity Theory 

> Fractional Combinatorial Optimization 

> Information-based Complexity and 
Information-based Optimization 

> Mixed Integer Nonlinear Programming 

> Parallel Computing: Complexity Classes 


References 


1. Adami C (1998) Introduction to artificial life. Springer, 
Berlin 
2. Bennett C (1982) The thermodynamics of computation - 
a review. Internat J Theoret Physics 21:905-940 
3. Bohm D (1980) Wholeness and the implicate order. Rout- 
ledge and Kegan Paul, London 
4. Chaitin G (1966) On the length of programs for computing 
binary sequences. J ACM 13:547-569 
5. Chaitin G (1970) On the difficulty of computations. IEEE 
Trans Inform Theory 16:5-9 
6. Chaitin G (1975) Randomness and mathematical proof. Sci- 
entif Amer 232:47-52 
7. Chaitin G (1975) A theory of program size formally identical 
to information theory. J ACM 22:329-340 
8. Gardner M (1979) Mathematical games. Scientif Amer 
10:20-34 
9. Kirchherr W, Li M, Vitanyi P (1997) The miraculous universal 
distribution. Math Intelligencer 19(4):7-15 
10. Kolmogorov A (1950) Foundations of the theory of proba- 
bility. Chelsea, New York 
11. Kolmogorov A (1965) Three approaches to the definition of 
the concept “quantity of information”. Probl Inform Trans- 
mission 1:1-7 
12. Kolmogorov A (1983) Combinatorial foundations of infor- 
mation theory and the calculus of probabilities. Russian 
Math Surveys 38:29 
13. Korotkich V (1999) A mathematical structure for emergent 
computation. Kluwer, Dordrecht 
14. Li M, Vitanyi P (1997) An introduction to Kolmogorov complexity and its applications, 2nd revised and expanded edn. Springer, Berlin




15. Penrose R (1995) Shadows of the mind. Vintage, London 

16. Solomonoff R (1964) A formal theory of inductive infer- 
ence. Inform and Control 7:1-22 

17. Zurek W (1989) Algorithmic randomness and physical en- 
tropy. Phys Rev A 40:4731-4751 

18. Zvonkin A, Levin L (1970) The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Math Surveys 25(6):83-124


Krein–Milman Theorem


GABRIELE E. DANNINGER-UCHIDA 
University Vienna, Vienna, Austria 


MSC2000: 90C05 


Article Outline 


Keywords 
See also 
References 


Keywords 


Convex; Convex hull; Extreme point 


A theorem stating that a compact convex set can be represented as the convex hull of its extreme points. First shown by H. Minkowski [4] and studied by several others ([1,2,5]), it is named after the paper by M. Krein and D. Milman [3]. See also, for example, [6,7,8].


Theorem 1 Let $C \subset \mathbb{R}^n$ be convex and compact, and let $S = \operatorname{ext}(C)$ be the set of extreme points of $C$. Then $\operatorname{conv}(S) = C$, i.e., the convex hull of the extreme points of $C$ coincides with the set $C$.


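Proof Since $S \subset C$, $\operatorname{conv}(S) \subset \operatorname{conv}(C) = C$. So it remains to show that $C \subset \operatorname{conv}(S)$. We prove this by induction on $d = \dim C$. For $d = -1$ ($C = \emptyset$), $d = 0$ and $d = 1$ the proof is trivial.

Let us assume that the theorem is true for all convex compact sets of dimension at most $d - 1 \ge 0$. If $x \in C$ but $x \notin \operatorname{conv}(S)$, then $x$ is not an extreme point, so there exists a line segment in $C$ having $x$ in its (relative) interior. The line through this segment intersects the relative boundary of $C$ in two points $u$ and $v$, with $x \in [u, v]$. At least one of them is not extremal, since otherwise $x \in \operatorname{conv}(S)$. Assume $u \notin S$. Since $u$ is on the relative boundary of $C$, there exists a supporting hyperplane $H$ of $C$ that contains $u$. So $u \in C \cap H = \operatorname{conv}(\operatorname{ext}(C \cap H))$ (by induction, since $\dim(C \cap H) \le d - 1$). But $\operatorname{ext}(C \cap H) = \operatorname{ext}(C) \cap H$, and so $u \in \operatorname{conv}(\operatorname{ext}(C) \cap H) \subset \operatorname{conv}(\operatorname{ext}(C))$.

An analogous result holds for $v$. Since $x \in [u, v]$, we have $x = \lambda u + (1 - \lambda) v$ with $\lambda \in\ ]0, 1[$, and so $x \in \operatorname{conv}(\operatorname{ext}(C))$, contradicting the assumption $x \notin \operatorname{conv}(S)$.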
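The following sketch is our own small illustration of the theorem in the plane, not part of the source. It assumes that $C$ is the convex hull of a finite point set (a polygon). The extreme points are then the hull vertices, computed here with Andrew's monotone chain algorithm. The code then writes every point of $C$ explicitly as a convex combination of those vertices, using exact barycentric coordinates in a fan triangulation. The function names are hypothetical.

```python
from fractions import Fraction


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def extreme_points(points):
    """Vertices of conv(points) in counter-clockwise order (Andrew's monotone chain).

    Points lying on an edge are discarded, since they are not extreme points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convex_weights(x, verts):
    """Weights lam_i >= 0 with sum 1 such that x = sum lam_i * verts[i].

    Uses a fan triangulation from verts[0] and exact barycentric coordinates.
    Assumes verts is a counter-clockwise convex polygon with at least 3 vertices."""
    v0 = verts[0]
    for i in range(1, len(verts) - 1):
        a, b = verts[i], verts[i + 1]
        area = cross(v0, a, b)
        l0 = Fraction(cross(x, a, b), area)   # weight of v0
        la = Fraction(cross(v0, x, b), area)  # weight of a
        lb = Fraction(cross(v0, a, x), area)  # weight of b
        if min(l0, la, lb) >= 0:
            lam = [Fraction(0)] * len(verts)
            lam[0], lam[i], lam[i + 1] = l0, la, lb
            return lam
    raise ValueError("x does not lie in conv(verts)")


cloud = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (2, 0), (1, 3)]
S = extreme_points(cloud)
print(S)  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Every point of `cloud`, including the non-extreme points (2, 2), (2, 0) and (1, 3), is recovered by `convex_weights` as a convex combination of the four vertices in `S`.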


See also 


> Carathéodory Theorem 
> Linear Programming 
In this article we discuss necessary conditions for local optimality of an optimization problem in terms of a system of equations and inequalities that form the well-known Kuhn-Tucker (KT) conditions. Under suitable convexity assumptions on the objective function and the feasible domain, the Kuhn-Tucker conditions are sufficient even for global optimality. In the nonconvex case, however, sufficiency is no longer guaranteed. The material of this article has been adapted from [2] and [4].

First, we consider the nonlinear optimization problem with inequality constraints,

$$\min_{x \in S} f(x), \qquad (1)$$

where $S = \{x : g_i(x) \le 0,\ i = 1,\dots,p\} \subset \mathbb{R}^n$. We may assume that all the functions in the optimization problem are continuously differentiable on an open set containing $S$.

A vector $d \in \mathbb{R}^n$ is called a feasible direction at $x^*$ if there exists $\lambda^* > 0$ such that $x^* + \lambda d \in S$ for every $0 \le \lambda \le \lambda^*$. Let $Z(x^*)$ denote the set of all feasible directions at $x^*$. A local minimum point $x^* \in S$ satisfies $d^\top \nabla f(x^*) \ge 0$ for every feasible direction $d \in Z(x^*)$.

Let

$$I(x^*) := \{\, i \in \{1,\dots,p\} : g_i(x^*) = 0 \,\}$$

be the index set of the active constraints at $x^*$. Recall that $d^\top \nabla g_i(x^*) < 0$ implies $g_i(x^* + \lambda d) < g_i(x^*)$ for $0 < \lambda < \lambda_i$, for some $\lambda_i > 0$, and that $d^\top \nabla g_i(x^*) > 0$ implies $g_i(x^* + \lambda d) > g_i(x^*)$ for $0 < \lambda < \lambda_i'$, for some $\lambda_i' > 0$. Moreover, a constraint that is not active at $x^*$, i.e., one for which $g_i(x^*) < 0$, does not influence $Z(x^*)$, because $g_i(x) < 0$ holds in a neighborhood of $x^*$. It follows that

$$\{d : d^\top \nabla g_i(x^*) < 0,\ i \in I(x^*)\} \subset Z(x^*) \subset \{d : d^\top \nabla g_i(x^*) \le 0,\ i \in I(x^*)\} =: L(x^*).$$

It is easy to see that for linearly constrained problems, where $g_i(x) = a_i^\top x - b_i$, $a_i \in \mathbb{R}^n \setminus \{0\}$, $b_i \in \mathbb{R}$, we have $Z(x^*) = L(x^*)$. On the other hand, one can readily construct examples of nonlinear constraints such that

$$Z(x^*) = \{d : d^\top \nabla g_i(x^*) < 0,\ i \in I(x^*)\}.$$


Now $\{d : d^\top \nabla g_i(x^*) < 0\}$ is an open set, whereas $L(x^*)$ is closed. Recall that the closure $\operatorname{cl} M$ of a set $M \subset \mathbb{R}^n$ is the smallest closed set containing $M$. Because the map $d \mapsto d^\top \nabla f(x^*)$ is continuous, $d^\top \nabla f(x^*) \ge 0$ for every $d \in Z(x^*)$ implies $d^\top \nabla f(x^*) \ge 0$ for every $d \in \operatorname{cl} Z(x^*)$. Hence, the condition $d^\top \nabla f(x^*) \ge 0$ for every $d \in \operatorname{cl} Z(x^*)$ is also necessary for $x^*$ to be a local minimum point. By the discussion so far, we clearly have $\operatorname{cl} Z(x^*) \subset L(x^*)$. One would expect that also $\operatorname{cl}\{d : d^\top \nabla g_i(x^*) < 0,\ i \in I(x^*)\} = L(x^*)$, and hence $\operatorname{cl} Z(x^*) = L(x^*)$. Indeed, this is true apart from a few rather pathological cases. An example of such a pathological case is $S = \{x \in \mathbb{R} : g_1(x) := x^2 \le 0\}$ and $x^* = 0$. Here we have $S = Z(x^*) = \operatorname{cl} Z(x^*) = \{0\}$, but $L(x^*) = \{d \in \mathbb{R} : d \cdot 2 \cdot 0 \le 0\} = \mathbb{R}$.

The constraints of the optimization problem $\min_{x \in S} f(x)$ are said to be regular at $x^* \in S$ when $L(x^*) = \operatorname{cl} Z(x^*)$. Every condition that ensures regularity in this sense is called a constraint qualification. Three of the best-known constraint qualifications are given in the next theorem.


Theorem 1 Each of the following conditions is a constraint qualification:

• $g_i(x) = a_i^\top x - b_i$, $a_i \in \mathbb{R}^n \setminus \{0\}$, $b_i \in \mathbb{R}$ ($i = 1,\dots,p$; linear constraints).

• $g_i(x)$ is convex ($i = 1,\dots,p$), and there exists $\bar x$ satisfying $g_i(\bar x) < 0$ ($i = 1,\dots,p$; Slater condition).

• The vectors $\nabla g_i(x^*)$, $i \in I(x^*)$, are linearly independent.

The first two conditions ensure regularity at every $x^* \in S$, whereas the third requires knowledge of $x^*$.
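The third condition can be tested mechanically once $x^*$ is known. Below is a minimal sketch, assuming the constraint values and gradients at $x^*$ are supplied exactly (as integers or fractions). The helpers `rank` and `licq_holds` are hypothetical names. The code collects the gradients of the active constraints and compares their rank with their number, using exact Gaussian elimination.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of equal-length vectors (exact Gaussian elimination)."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        r += 1
    return r


def licq_holds(g_vals, g_grads):
    """Third condition of Theorem 1 at x*: the gradients of the active
    constraints (g_i(x*) == 0, tested exactly) are linearly independent."""
    active = [grad for val, grad in zip(g_vals, g_grads) if val == 0]
    return rank(active) == len(active)


# Pathological example from the text: g_1(x) = x^2 at x* = 0, gradient 2 * 0 = 0.
print(licq_holds([0], [[0]]))                 # False
# Unit disk g_1 = x1^2 + x2^2 - 1 and g_2 = -x2 at x* = (1, 0): gradients (2, 0), (0, -1).
print(licq_holds([0, 0], [[2, 0], [0, -1]]))  # True
```

With floating-point data, the exact tests `val == 0` and `!= 0` would have to be replaced by tolerance-based ones.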

Finally, one applies the well-known Farkas lemma (cf. Farkas lemma). It states that, whenever $d^\top \nabla f(x^*) \ge 0$ for every $d$ satisfying $d^\top \nabla g_i(x^*) \le 0$, $i \in I(x^*)$, there exist $\lambda_i \ge 0$, $i \in I(x^*)$, such that

$$\nabla f(x^*) + \sum_{i \in I(x^*)} \lambda_i \nabla g_i(x^*) = 0.$$

Since $I(x^*)$ is not known in advance, one formulates this equation in the following equivalent form:


Theorem 2 Let $f$, $g_i$ be continuously differentiable on an open set containing $S$, and let $x^*$ be a local minimum point such that the constraints are regular at $x^*$. Then the following KT conditions hold:

• $g_i(x^*) \le 0$, $i = 1,\dots,p$;

• there exist $\lambda_i \ge 0$ such that $\lambda_i g_i(x^*) = 0$, $i = 1,\dots,p$;

• $\nabla f(x^*) + \sum_{i=1}^{p} \lambda_i \nabla g_i(x^*) = 0$.
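For a candidate point and candidate multipliers, the three conditions of Theorem 2 can be checked numerically. Below is a minimal sketch, assuming the gradients at $x^*$ are already available as lists of floats. The function names and the convex test problem ($\min (x_1-2)^2 + (x_2-1)^2$ subject to $x_1 + x_2 - 2 \le 0$) are our own illustrative choices, not taken from the source.

```python
def kt_residuals(grad_f, g_vals, g_grads, lam):
    """Violations of the three KT conditions of Theorem 2 at a candidate point x*.

    grad_f  : gradient of f at x* (list of n floats)
    g_vals  : g_vals[i] = g_i(x*)  (p >= 1 values)
    g_grads : g_grads[i] = gradient of g_i at x*
    lam     : candidate multipliers lambda_1, ..., lambda_p
    Each returned number is 0 exactly when the corresponding condition holds."""
    primal = max(max(g, 0.0) for g in g_vals)             # g_i(x*) <= 0
    dual = max(max(-l, 0.0) for l in lam)                 # lambda_i >= 0
    compl = max(abs(l * g) for l, g in zip(lam, g_vals))  # lambda_i g_i(x*) = 0
    stat = max(                                           # grad f + sum lambda_i grad g_i = 0
        abs(df + sum(l * gr[j] for l, gr in zip(lam, g_grads)))
        for j, df in enumerate(grad_f)
    )
    return primal, dual, compl, stat


def is_kt_point(grad_f, g_vals, g_grads, lam, tol=1e-9):
    """True if every KT residual is at most tol."""
    return all(r <= tol for r in kt_residuals(grad_f, g_vals, g_grads, lam))


# Hypothetical convex example: min (x1 - 2)^2 + (x2 - 1)^2  s.t.  g_1(x) = x1 + x2 - 2 <= 0.
def grad_f(x):
    return [2 * (x[0] - 2), 2 * (x[1] - 1)]


def g(x):
    return [x[0] + x[1] - 2]


G_GRADS = [[1.0, 1.0]]  # gradient of the affine constraint

x_star = [1.5, 0.5]
print(is_kt_point(grad_f(x_star), g(x_star), G_GRADS, [1.0]))  # True
```

Because this example is convex, Theorem 3 below implies that the KT point $(1.5, 0.5)$ is in fact the global minimizer.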
Theorem 3 The KT conditions are sufficient for a constrained global minimum at $x^*$ provided that the functions $f(x)$ and $g_i(x)$, $i = 1,\dots,p$, are convex.

Proof By convexity of $f(x)$ and $g_i(x)$ we have

$$f(x) \ge f(x^*) + (x - x^*)^\top \nabla f(x^*),$$
$$g_i(x) \ge g_i(x^*) + (x - x^*)^\top \nabla g_i(x^*), \qquad i = 1,\dots,p.$$

Multiplying the last inequalities by $\lambda_i \ge 0$ and adding them to the first one, we obtain

$$f(x) + \sum_{i=1}^{p} \lambda_i g_i(x) \ge f(x^*) + \sum_{i=1}^{p} \lambda_i g_i(x^*) + (x - x^*)^\top \Big[\nabla f(x^*) + \sum_{i=1}^{p} \lambda_i \nabla g_i(x^*)\Big].$$

Since the last two terms vanish because of the KT conditions, this implies $f(x) \ge f(x^*) - \sum_{i=1}^{p} \lambda_i g_i(x) \ge f(x^*)$ for all $x \in S$, that is, $x^*$ is a global minimum.

Note that no constraint qualification is required in the above theorem.
Consider now problems with inequality and equality constraints

$$\min_{x \in S} f(x), \quad \text{where} \quad S = \{x : g_i(x) \le 0\ (i = 1,\dots,p),\ h_i(x) = 0\ (i = 1,\dots,t)\}.$$

Theorem 4 Let $f$, $g_i$ ($i = 1,\dots,p$), $h_i$ ($i = 1,\dots,t$) be continuously differentiable on an open set containing $S$, and let $x^*$ be a local minimum point. Further, assume that the vectors $\nabla g_i(x^*)$ ($i \in I(x^*)$), $\nabla h_i(x^*)$ ($i = 1,\dots,t$) are linearly independent. Then the following KT conditions hold:

• $g_i(x^*) \le 0$ ($i = 1,\dots,p$), $h_i(x^*) = 0$ ($i = 1,\dots,t$).

• There exist $\lambda_i \ge 0$ ($i = 1,\dots,p$) and $\mu_i \in \mathbb{R}$ ($i = 1,\dots,t$) such that

$$\nabla f(x^*) + \sum_{i=1}^{p} \lambda_i \nabla g_i(x^*) + \sum_{i=1}^{t} \mu_i \nabla h_i(x^*) = 0, \qquad \lambda_i g_i(x^*) = 0 \quad (i = 1,\dots,p).$$

When the functions $f$, $g_i$ ($i = 1,\dots,p$) are convex and the functions $h_i(x)$ are affine, the above two conditions are again sufficient for $x^*$ to be a global minimum.


Next, we consider the situation in which Kuhn-Tucker theory is applied to nonconvex programming. We illustrate some difficulties arising from nonconvexity with the following simple examples of concave minimization problems.


Example 5 Consider the problem

$$\min\ -(x_1^2 + x_2^2) \quad \text{s.t.} \quad x_1 \le 1.$$

The KT conditions for this problem are

$$x_1 - 1 \le 0, \qquad \lambda_1 - 2x_1 = 0, \qquad -2x_2 = 0, \qquad \lambda_1 (x_1 - 1) = 0, \qquad \lambda_1 \ge 0.$$

It is easy to see that the KT conditions are satisfied at $x^* = (0, 0)$ with $\lambda_1 = 0$ and at $x^* = (1, 0)$ with $\lambda_1 = 2$. The first is a global maximum. The second is neither a local minimum nor a local maximum. The problem has no local minima. Moreover, $\inf\{f(x) : x \in S\} = -\infty$, since $f$ is unbounded from below over $S$.


Example 6

$$\min\ 2x - x^2 \quad \text{s.t.} \quad 0 \le x \le 3.$$

The KT conditions for this problem are

$$\lambda_1 (x^* - 3) = 0, \qquad \lambda_2 x^* = 0, \qquad 2(1 - x^*) + \lambda_1 - \lambda_2 = 0, \qquad \lambda_1 \ge 0,\ \lambda_2 \ge 0.$$

Since the objective function is strictly concave, local minima occur at the endpoints of the interval $[0, 3]$. The point $x^* = 3$ is the global minimizer. The endpoints satisfy the KT conditions ($x^* = 0$ with $\lambda_1 = 0$, $\lambda_2 = 2$, and $x^* = 3$ with $\lambda_1 = 4$, $\lambda_2 = 0$). However, we can easily see that the KT conditions are also satisfied at $x^* = 1$ (with $\lambda_1 = \lambda_2 = 0$), and that this point is a global maximum.
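The enumeration in Example 6 can be automated for any smooth one-dimensional objective on an interval $[lo, hi]$. Below is a minimal sketch of our own. It writes the constraints as $g_1 = x - hi \le 0$ and $g_2 = lo - x \le 0$, so stationarity reads $f'(x) + \lambda_1 - \lambda_2 = 0$. The sketch lists every KT point and classifies it by a crude comparison with neighbouring feasible points. The step `h`, the grid size and the function name are illustrative assumptions.

```python
def kt_points_on_interval(f, df, lo, hi, n_grid=1000, h=1e-6):
    """KT points of  min f(x)  s.t.  g_1 = x - hi <= 0,  g_2 = lo - x <= 0  (lo < hi).

    Returns tuples (x, lambda_1, lambda_2, kind), where kind compares f(x) with f
    at the feasible neighbours x - h and x + h (a crude local test).
    Interior stationary points are located from sign changes of df on a grid and
    refined by bisection; zeros of df that do not change sign between grid points are missed."""
    pts = []
    if df(lo) >= 0:                      # x = lo: lambda_1 = 0, lambda_2 = f'(lo)
        pts.append((lo, 0.0, df(lo)))
    if df(hi) <= 0:                      # x = hi: lambda_1 = -f'(hi), lambda_2 = 0
        pts.append((hi, -df(hi), 0.0))
    xs = [lo + (hi - lo) * i / n_grid for i in range(n_grid + 1)]
    for a, b in zip(xs, xs[1:]):
        if a > lo and df(a) == 0:        # stationary point exactly on the grid
            pts.append((a, 0.0, 0.0))
        elif df(a) * df(b) < 0:          # sign change: bisect on [a, b]
            for _ in range(100):
                m = 0.5 * (a + b)
                if df(a) * df(m) <= 0:
                    b = m
                else:
                    a = m
            pts.append((0.5 * (a + b), 0.0, 0.0))

    def kind(x):
        fx = f(x)
        nbrs = [f(y) for y in (x - h, x + h) if lo <= y <= hi]
        if all(fx <= fy for fy in nbrs):
            return "local min"
        if all(fx >= fy for fy in nbrs):
            return "local max"
        return "neither"

    return [(x, l1, l2, kind(x)) for x, l1, l2 in pts]


# Example 6: min 2x - x^2 on [0, 3]
for x, l1, l2, k in kt_points_on_interval(lambda x: 2 * x - x * x, lambda x: 2 - 2 * x, 0.0, 3.0):
    print(f"x* = {x:.6f}, lambda = ({l1:g}, {l2:g}): {k}")
```

The sketch reports the same three KT points as the text. It classifies the two endpoints as local minima and $x^* = 1$ as a local maximum.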


These examples show that, for minimization problems with nonconvex functions, KT points need not be local minima.

Next, we consider the complexity of deciding whether a Kuhn-Tucker point exists for quadratic programming problems. When the feasible domain is unbounded, we prove that this decision problem is NP-hard.

Most classical optimization algorithms compute points that satisfy the Kuhn-Tucker (KT) conditions. When the feasible domain is unbounded, it is not easy to check whether a KT point exists. More precisely, consider the following quadratic problem

$$\min\ f(x) = c^\top x + \tfrac{1}{2} x^\top Q x \quad \text{s.t.} \quad x \ge 0, \qquad (2)$$

where $Q$ is an $n \times n$ symmetric matrix and $c \in \mathbb{R}^n$. The Kuhn-Tucker optimality conditions for this problem become the following so-called linear complementarity problem (denoted by LCP$(Q, c)$): find $x \in \mathbb{R}^n$ (or prove that no such $x$ exists) such that

$$Qx + c \ge 0, \qquad x \ge 0, \qquad x^\top (Qx + c) = 0.$$

Hence, the complexity of finding Kuhn-Tucker points for the above quadratic problem (or proving their existence) reduces to the complexity of solving the corresponding LCP.


Theorem 7 The problem LCP(Q, c), where Q is sym- 
metric, is NP-hard. 


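Proof Consider the LCP$(Q, c)$ problem in $\mathbb{R}^{n+3}$ defined by

$$Q_{(n+3)\times(n+3)} = \begin{pmatrix} -I_n & e_n & -e_n & 0_n \\ e_n^\top & -1 & -1 & -1 \\ -e_n^\top & -1 & -1 & -1 \\ 0_n^\top & -1 & -1 & -1 \end{pmatrix}$$

and $c^\top = (a_1, \dots, a_n, -b, b, 0)$, where $a_i$, $i = 1,\dots,n$, and $b$ are positive integers, $I_n$ is the $(n \times n)$ unit matrix, and the vectors $e_n, 0_n \in \mathbb{R}^n$ are defined by

$$e_n = (1, 1, \dots, 1)^\top, \qquad 0_n = (0, 0, \dots, 0)^\top.$$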
Now define the following knapsack problem: find a feasible solution to the system

$$\sum_{i=1}^{n} a_i x_i = b, \qquad x_i \in \{0, 1\} \quad (i = 1,\dots,n).$$

This problem is known to be NP-complete [1]. Next we show that LCP$(Q, c)$ is solvable if and only if the associated knapsack problem is solvable.

Obviously, if $x$ solves the knapsack problem, then $y = (a_1 x_1, \dots, a_n x_n, 0, 0, 0)^\top$ solves LCP$(Q, c)$.

Conversely, assume that the point $y$ solves the LCP$(Q, c)$ given above. Since $Qy + c \ge 0$ and $y \ge 0$, we obtain $y_{n+1} = y_{n+2} = y_{n+3} = 0$. This in turn implies that $\sum_{i=1}^{n} y_i = b$ and $0 \le y_i \le a_i$. Finally, if $y_i < a_i$, then $y^\top (Qy + c) = 0$ enforces $y_i = 0$. Hence, $x = (y_1/a_1, \dots, y_n/a_n)$ solves the knapsack problem.

Therefore, even in quadratic programming, the 
problem of ‘deciding whether a Kuhn-Tucker point ex- 
ists’ is NP-hard. 
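To make the reduction concrete, below is a minimal sketch of our own. It builds $(Q, c)$ from a knapsack instance exactly as in the proof and then searches for an LCP solution by brute force over all $2^{n+3}$ complementary index sets $J$: $x_i = 0$ off $J$, $(Qx + c)_i = 0$ on $J$. This exhaustive search only illustrates the equivalence and is not an efficient algorithm. Index sets with a singular principal submatrix $Q_{JJ}$ are skipped, which is enough for this reduction. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve(A, b):
    """Solve A x = b exactly (Gauss-Jordan with Fractions); None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def lcp_enumerate(Q, c):
    """Return x >= 0 with w = Qx + c >= 0 and x^T w = 0, or None.

    Tries every index set J (x_i = 0 off J, w_i = 0 on J); skips J with singular Q_JJ."""
    n = len(c)
    for k in range(n + 1):
        for J in combinations(range(n), k):
            xJ = solve([[Q[i][j] for j in J] for i in J], [-c[i] for i in J])
            if xJ is None or any(v < 0 for v in xJ):
                continue
            x = [Fraction(0)] * n
            for i, v in zip(J, xJ):
                x[i] = v
            w = [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
            if all(v >= 0 for v in w):
                return x
    return None


def knapsack_lcp(a, b):
    """(Q, c) of the reduction in Theorem 7 for the instance sum a_i x_i = b."""
    n = len(a)
    Q = [[0] * (n + 3) for _ in range(n + 3)]
    for i in range(n):
        Q[i][i] = -1                       # -I_n block
        Q[i][n] = Q[n][i] = 1              # e_n and e_n^T blocks
        Q[i][n + 1] = Q[n + 1][i] = -1     # -e_n and -e_n^T blocks
    for i in range(n, n + 3):
        for j in range(n, n + 3):
            Q[i][j] = -1                   # lower-right 3 x 3 block of -1's
    c = list(a) + [-b, b, 0]
    return Q, c


a, b = [3, 5, 7], 10
y = lcp_enumerate(*knapsack_lcp(a, b))
print([int(y[i] / a[i]) for i in range(len(a))])  # knapsack solution x_i = y_i / a_i
```

For $a = (3, 5, 7)$ and $b = 10$, the search recovers $x = (1, 0, 1)$. For an instance with no 0-1 solution, such as $a = (4, 6)$, $b = 5$, it returns `None`, as the equivalence predicts.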
J.L. Lagrange (1736–1813) made significant contributions to many branches of mathematics and physics, among them number theory, the theory of equations, ordinary and partial differential equations, the calculus of variations, analytic geometry and mechanics. His outstanding discoveries sowed the first seeds of ideas that later nourished C.F. Gauss and N.H. Abel.

During the first thirty years of his life he lived in Turin (then the capital of the Kingdom of Sardinia, now in Italy) and, as a boy, his tastes were more classical than scientific. His interest in mathematics began while he was still in school, when he read a paper by E. Halley on the uses of algebra in optics. He then began a course of independent study and progressed so rapidly in mathematical analysis that by the age of nineteen he was appointed professor at the Royal Artillery School; he also helped to found the Royal Academy of Science in 1757. His ideas greatly impressed L. Euler, one of the giants of European mathematics. Euler and Lagrange would together rank among the foremost mathematicians of the eighteenth century, and their careers and research were often related [5].

In 1759 Lagrange focused his research on analysis and mechanics and wrote 'Sur la propagation du son dans les fluides', a very difficult problem for that time [4]. From 1759 to 1761 his first publications appeared in the 'Miscellanea' of the Turin Academy. His reputation was established.

Lagrange developed a new calculus that would enrich the sciences, called the calculus of variations. In its simplest form, the subject seeks to determine a functional relationship $y = f(x)$ such that an integral $\int g(x, y)\,dx$ attains a maximum or a minimum. It was viewed as a mathematical study of economy, or of the 'best income' [4]. This was Lagrange's earliest contribution to optimization.

In 1766, Lagrange was appointed Director of Mathematics at the Berlin Academy of Science, succeeding Euler. In offering this appointment, Frederick the Great wanted to turn his Academy into one of the best institutes of its day, proclaiming that the 'greatest mathematician in Europe' should live near the 'greatest king in Europe' [1]. This was a productive period, during which he did important work in calculus and introduced rigor into the differential and integral calculus. In 1767 he published a memoir on the approximation of roots of polynomial equations by means of continued fractions, and in 1770 he wrote a paper on the solvability of equations in terms of permutations of their roots.

After Frederick's death, Lagrange left Berlin and became a member of the Paris Academy of Science at the invitation of Louis XVI (1787). He remained in Paris for the rest of his career and wrote a lengthy treatise on the numerical solution of equations, which represents a significant portion of his mathematical research. His papers on the solution of third- and fourth-degree polynomial equations by radicals received considerable attention. His methods, which anticipated group theory in their approach to solving polynomial equations, were later taken up by E. Galois. Lagrange's name is attached to one of the most important theorems of group theory [3]:


Theorem 1 If $o$ is the order of a subgroup $g$ of a finite group $G$ of order $O$, then $o$ is a factor of $O$.


In 1788 he published his masterpiece, the treatise 'Méchanique Analytique', which summarized and unified all the work done in general mechanics since the time of I. Newton. This work, notable for its use of the theory of differential equations, transformed mechanics into a branch of mathematical analysis. As W. Hamilton later said, 'he made a kind of scientific poem' [6].

In 1793, Lagrange headed a commission, which in- 
cluded P.S. Laplace and A. Lavoisier, to devise a new 
system of weights and measures. Out of this came the 
metric system. 

Lagrange developed the method of variation of parameters for solving nonhomogeneous linear differential equations. For determining the maxima and minima of a function, say $f(x, y, z, w)$, subject to constraints such as $g(x, y, z, w) = 0$ and $h(x, y, z, w) = 0$, he proposed Lagrange multipliers as an elegant algorithm. In this method two undetermined constants $\lambda$ and $\mu$ are introduced to form the function $F = f + \lambda g + \mu h$. From the related equations $F_x = 0$, $F_y = 0$, $F_z = 0$, $F_w = 0$, $g = 0$ and $h = 0$, the multipliers $\lambda$ and $\mu$ are then eliminated, and the problem is solved. This procedure and its variations have become a very important class of optimization methods [1,2].
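As a small worked instance of the procedure just described, here is a toy example of our own with one constraint instead of two. We minimize $x^2 + y^2$ on the line $x + y = 1$:

$$\min\ f(x, y) = x^2 + y^2 \quad \text{s.t.} \quad g(x, y) = x + y - 1 = 0,$$
$$F = x^2 + y^2 + \lambda (x + y - 1), \qquad F_x = 2x + \lambda = 0, \qquad F_y = 2y + \lambda = 0,$$
$$x = y = -\tfrac{\lambda}{2}, \qquad g = -\lambda - 1 = 0 \ \Rightarrow\ \lambda = -1, \quad (x, y) = \big(\tfrac{1}{2}, \tfrac{1}{2}\big), \quad f = \tfrac{1}{2}.$$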

One can characterize Lagrange's contribution to optimization as laying its formal foundations. Most of his results were retained and developed further by the following generations, who gave his theory a different and more practical direction.

Toward the end of his life, Lagrange was pessimistic about the future of mathematics. He felt that other sciences such as chemistry, physics and biology would attract the ablest minds of the future. His pessimism was unfounded. Much more was to come with Gauss and his successors, making the nineteenth century the richest in the history of mathematics.
Lagrange and penalty function methods provide a powerful approach, both as a theoretical tool and as a computational vehicle, for the study of constrained optimization problems. However, for a nonconvex constrained optimization problem, the classical Lagrange primal-dual method may fail to find a minimum, because a zero duality gap is not always guaranteed. For classical quadratic penalty functions, a large penalty parameter is in general required for minima of the penalty problems to be good approximations of those of the original constrained problem. It is well known that penalty functions with very large parameters are an obstacle to numerical implementation. Thus the question arises how to generalize classical Lagrange and penalty functions in order to obtain a scheme for reducing constrained optimization problems to unconstrained ones that is suitable for sufficiently broad classes of optimization problems from both the theoretical and the computational viewpoint.

One approach to such a scheme is as follows. An unconstrained problem is constructed whose objective function is a convolution of the objective and constraint functions of the original problem. A linear convolution leads to the classical Lagrange function, while different kinds of nonlinear convolutions lead to interesting generalizations. We call functions that arise as a convolution of the objective function and the constraint functions Lagrange-type functions. It can be shown that these functions arise naturally from a nonlinear separation of the image set of the problem and a cone in the image space of the problem under consideration (see [4]). The class of Lagrange-type functions also includes augmented Lagrangians corresponding to the so-called canonical dualizing parameterization. However, augmented Lagrangians constructed by means of some general dualizing parameterizations cannot be included in this scheme.

Consider the following problem $P(f, g)$:

$$\min f(x) \quad \text{subject to} \quad x \in X,\ g(x) \le 0,$$

where $X$ is a metric space, $f$ is a real-valued function defined on $X$, and $g$ maps $X$ into $\mathbb{R}^m$, that is, $g(x) = (g_1(x), \dots, g_m(x))$, where the $g_i$ are real-valued functions defined on $X$. We assume that the set of feasible solutions $X_0 = \{x \in X : g(x) \le 0\}$ is nonempty and that the objective function $f$ is bounded from below on $X$.

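Let $\Omega$ be a set of parameters and let $h\colon \mathbb{R}^{1+m} \times \Omega \to \mathbb{R}$ be a function. Let $\eta \in \mathbb{R}$. Then the function

$$L(x, \omega) = h(f(x) - \eta, g(x); \omega) + \eta, \qquad x \in X,\ \omega \in \Omega, \qquad (1)$$

is called a Lagrange-type function for problem $P(f, g)$ corresponding to $h$ and $\eta$, and $h$ is called a convolution function.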

If $h$ is linear with respect to the first variable, more specifically,

$$h(u, v; \omega) = u + \chi(v; \omega),$$

where $\chi\colon \mathbb{R}^m \times \Omega \to \mathbb{R}$ is a real-valued function, then the parameter $\eta$ can be omitted. Indeed, for each $\eta \in \mathbb{R}$ we have

$$L(x, \omega) = f(x) + \chi(g(x); \omega).$$

In the general nonlinear situation, however, the presence of $\eta$ is important, and different values of $\eta$ lead to Lagrange-type functions with different properties.

One possible choice of the number $\eta$ is $\eta = f(x_*)$, where $x_*$ is a reference point; in particular, $x_*$ may be a solution of $P(f, g)$ (see [4]). The Lagrange-type function then has the form

$$L(x, \omega) = h(f(x) - f(x_*), g(x); \omega) + f(x_*), \qquad x \in X,\ \omega \in \Omega.$$

The Lagrange-type function (1) is a very general scheme and includes linear Lagrange functions, classical penalty functions and augmented Lagrange functions as special cases.

Let $\Omega = \mathbb{R}^m_+$ and let $p$ be a real-valued function defined on $\mathbb{R}^{1+m}$. Define

$$h(u, v; \omega) = p(u, \omega_1 v_1, \dots, \omega_m v_m). \qquad (2)$$

The Lagrange-type function then has the form

$$L_p(x, \omega) = p(f(x) - \eta, \omega_1 g_1(x), \dots, \omega_m g_m(x)) + \eta.$$

We can obtain fairly good results if the function $p$ enjoys certain properties. In particular, we assume that

(i) $p$ is increasing;

(ii) $p(u, 0_m) \le u$ for all $u \in \mathbb{R}$. (Here $0_m$ is the origin of $\mathbb{R}^m$.)

One more assumption is useful for applications:

(iii) $p$ is positively homogeneous ($p(\lambda x) = \lambda p(x)$ for $\lambda > 0$).

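If both (i) and (iii) hold, then $p$ is called an IPH function. Let $p$ be a real-valued function defined on $\mathbb{R}^{1+m}$ and let $h$ be a convolution function defined by (2). Write $\mathbb{R}^m_- := -\mathbb{R}^m_+$ for the nonpositive orthant. Then:

(a) If $p$ enjoys properties (i) and (ii), then

$$\sup_{\omega \in \Omega} h(u, v; \omega) \le u, \qquad \forall u \in \mathbb{R},\ v \in \mathbb{R}^m_-.$$

(b) If $p$ is an IPH function and $p(u, e_i) > 0$, where $e_i$ is the $i$-th unit vector, $i = 1,\dots,m$, then

$$\sup_{\omega \in \Omega} h(u, v; \omega) = +\infty, \qquad \forall v \notin \mathbb{R}^m_-.$$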


We now give some examples of Lagrange-type functions. The first two examples correspond to functions of the form (2).

1) Let $p(u, v) = u + \sum_{i=1}^{m} v_i$. Then $L_p(x, \omega) = f(x) + \sum_{i=1}^{m} \omega_i g_i(x)$ coincides with the classical Lagrange function.

2) Let $p(u, v) = u + \sum_{i=1}^{m} v_i^+$, where $v^+ = \max(v, 0)$. Then $L_p(x, \omega) = f(x) + \sum_{i=1}^{m} \omega_i g_i(x)^+$ coincides with the classical (linear) penalty function. If $p(u, v) = u + \sum_{i=1}^{m} (v_i^+)^2$, then $L_p(x, \omega) = f(x) + \sum_{i=1}^{m} \omega_i^2 (g_i(x)^+)^2$ is a quadratic penalty function.
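The construction (2) can be written generically, with $p$ passed in as a function. Below is a minimal sketch. The helper names are our own, and the values of $f(x)$ and $g(x)$ at a point are supplied directly as numbers. The sketch evaluates $L_p(x, \omega) = p(f(x) - \eta, \omega_1 g_1(x), \dots, \omega_m g_m(x)) + \eta$ for the choices of $p$ in examples 1) and 2).

```python
def lagrange_type(p, f_x, g_x, omega, eta=0.0):
    """L_p(x, omega) = p(f(x) - eta, omega_1 g_1(x), ..., omega_m g_m(x)) + eta, as in (2)."""
    v = [w * g for w, g in zip(omega, g_x)]
    return p(f_x - eta, v) + eta


def p_lagrange(u, v):           # example 1): p(u, v) = u + sum v_i
    return u + sum(v)


def p_linear_penalty(u, v):     # example 2): p(u, v) = u + sum v_i^+
    return u + sum(max(vi, 0.0) for vi in v)


def p_quadratic_penalty(u, v):  # p(u, v) = u + sum (v_i^+)^2
    return u + sum(max(vi, 0.0) ** 2 for vi in v)


f_x, g_x, omega = 3.0, [1.0, -2.0], [2.0, 5.0]
print(lagrange_type(p_lagrange, f_x, g_x, omega))           # 3 + 2 - 10 = -5.0
print(lagrange_type(p_linear_penalty, f_x, g_x, omega))     # 3 + 2 = 5.0
print(lagrange_type(p_quadratic_penalty, f_x, g_x, omega))  # 3 + 2^2 = 7.0
```

Since these three choices of $p$ are linear in $u$, the value of `eta` does not affect the result. For a nonlinear choice such as `lambda u, v: max(u, *v)`, it does: with the data above, the result is 3.0 for `eta=0.0` and 4.0 for `eta=2.0`.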
We now give the definition of a penalty-type function. Let $\Omega$ be a set of parameters and let $h\colon \mathbb{R}^{1+m} \times \Omega \to \mathbb{R}$ be a convolution function with the property

$$h(u, v; \omega) = u, \qquad u \in \mathbb{R},\ v \in \mathbb{R}^m_-,\ \omega \in \Omega.$$

Then the Lagrange-type function $L(x, \omega)$ corresponding to $h$ is called a penalty-type function.

The next two examples cannot be presented in the form (2).

3) Augmented Lagrangians. Let $\sigma\colon \mathbb{R}^m \to \mathbb{R}$ be an augmenting function, i.e., $\sigma(0) = 0$ and $\sigma(z) > 0$ for $z \ne 0$. Let $\Omega \subset \{(y, r) : y \in \mathbb{R}^m,\ r \ge 0\}$ be a set of parameters satisfying $(0, 0) \in \Omega$ and such that $(y, r) \in \Omega$ implies $(y, r') \in \Omega$ for all $r' \ge r$. Let $h\colon \mathbb{R}^{1+m} \times \Omega \to \mathbb{R}$ be the convolution function defined by

$$h(u, v; (y, r)) = \inf_{z + v \le 0} \big(u - [y, z] + r\sigma(z)\big) = u + \inf_{z + v \le 0} \big(-[y, z] + r\sigma(z)\big).$$

Then the Lagrange-type function corresponding to $\eta = 0$ coincides with the augmented Lagrangian [5], that is,

$$L(x, (y, r)) = h(f(x), g(x); (y, r)) = \inf_{z + g(x) \le 0} \big(f(x) - [y, z] + r\sigma(z)\big).$$
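For the common choice $\sigma(z) = \|z\|^2$ with $r > 0$, which is our assumption here, the infimum separates over the components of $z$. Each component minimizes the convex quadratic $-y_i z_i + r z_i^2$ over $z_i \le -v_i$. The result is the closed form $\frac{1}{4r}\big[(\max(0,\, y_i + 2 r v_i))^2 - y_i^2\big]$ per component, which is the familiar quadratic augmented Lagrangian. The sketch below, with hypothetical names, compares this closed form with a brute-force grid minimization of the defining infimum.

```python
def aug_h_closed_form(u, v, y, r):
    """h(u, v; (y, r)) for sigma(z) = ||z||^2 and r > 0, using the per-component formula
    inf_{z_i <= -v_i} (-y_i z_i + r z_i^2) = ((max(0, y_i + 2 r v_i))^2 - y_i^2) / (4 r)."""
    return u + sum((max(0.0, yi + 2 * r * vi) ** 2 - yi ** 2) / (4 * r)
                   for vi, yi in zip(v, y))


def aug_h_brute_force(u, v, y, r, width=10.0, n=200001):
    """The same infimum, approximated per component on the grid z_i in [-v_i - width, -v_i]."""
    total = u
    for vi, yi in zip(v, y):
        zs = (-vi - width * k / (n - 1) for k in range(n))
        total += min(-yi * z + r * z * z for z in zs)
    return total


u, v, y, r = 1.0, [0.5, -1.0], [1.0, 3.0], 2.0
print(aug_h_closed_form(u, v, y, r))  # 1 + 1.0 - 1.125 = 0.875
print(aug_h_brute_force(u, v, y, r))
```

In the first component the constraint $z_1 \le -v_1$ is active (value $y_1 v_1 + r v_1^2$). In the second, the unconstrained minimizer $z_2 = y_2/(2r)$ is feasible (value $-y_2^2/(4r)$).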


4) Morrison-type functions. Let $\Omega = \mathbb{R}_+$ and

$$h(u, v; \omega) = \big((u - \omega)^+\big)^2 + \sigma(v_1^+, \dots, v_m^+),$$

where $\sigma$ is an augmenting function. Then the Lagrange-type function corresponding to $\eta = 0$ has the form

$$L(x, \omega) = \big((f(x) - \omega)^+\big)^2 + \sigma(g_1(x)^+, \dots, g_m(x)^+).$$

Functions of this kind were introduced by Morrison [6].

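Consider problem $P(f, g)$, a convolution function $h\colon \mathbb{R}^{1+m} \times \Omega \to \mathbb{R}$ and the corresponding Lagrange-type function

$$L(x, \omega) = h(f(x) - \eta, g(x); \omega) + \eta.$$

The dual function $q\colon \Omega \to \bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$ of $P(f, g)$ with respect to $h$ and $\eta$ is defined by

$$q(\omega) = \inf_{x \in X} h(f(x) - \eta, g(x); \omega) + \eta, \qquad \omega \in \Omega.$$

Consider the dual problem to $P(f, g)$ with respect to $h$ and $\eta$:

$$\max q(\omega) \quad \text{subject to} \quad \omega \in \Omega.$$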


We are interested in the following questions. Find conditions under which

1) weak duality holds, i.e.,

$$M(f, g) := \inf_{x \in X_0} f(x) \ge \sup_{\omega \in \Omega} q(\omega) =: M^*(f, g);$$

2) the zero duality gap property holds, i.e.,

$$M(f, g) = \sup_{\omega \in \Omega} q(\omega);$$

3) an exact Lagrange parameter exists, i.e., weak duality holds and there exists $\bar\omega \in \Omega$ such that

$$\inf_{x \in X_0} f(x) = \inf_{x \in X} L(x, \bar\omega);$$

4) a strong exact parameter exists: there exists an exact parameter $\bar\omega \in \Omega$ such that

$$\operatorname{argmin} P(f, g) := \operatorname{argmin}_{x \in X_0} f(x) = \operatorname{argmin}_{x \in X} L(x, \bar\omega);$$

5) a saddle point exists and generates a solution of $P(f, g)$. The first part of this question means that there exists $(x_*, \omega_*) \in X \times \Omega$ such that

$$L(x_*, \omega) \le L(x_*, \omega_*) \le L(x, \omega_*), \qquad x \in X,\ \omega \in \Omega. \qquad (3)$$

The second part means that (3) implies $x_* \in \operatorname{argmin} P(f, g)$.

Weak duality allows one to estimate the optimal value $M(f, g)$ from below by solving the unconstrained problem $\inf_{x \in X} L(x, \omega)$. The zero duality gap property allows one to find $M(f, g)$ by solving a sequence of unconstrained problems $\inf_{x \in X} L(x, \omega_k)$, where $\{\omega_k\} \subset \Omega$. The existence of an exact Lagrange parameter $\bar\omega$ means that $M(f, g)$ can be found by solving a single unconstrained problem $\inf_{x \in X} L(x, \bar\omega)$. The existence of a strong exact parameter $\bar\omega$ means that the solution set of $P(f, g)$ coincides with that of $\min_{x \in X} L(x, \bar\omega)$.

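Let $h\colon \mathbb{R}^{1+m} \times \Omega \to \mathbb{R}$ be a convolution function such that

$$\sup_{\omega \in \Omega} h(u, v; \omega) \le u \qquad \text{for all } (u, v) \in \mathbb{R} \times \mathbb{R}^m_-. \qquad (4)$$

Then weak duality holds. Condition (4) can be guaranteed if

$$h(u, v; \omega) = p(u, \omega_1 v_1, \dots, \omega_m v_m), \qquad (u, v) \in \mathbb{R}^{1+m},\ \omega \in \mathbb{R}^m_+,$$

and $p\colon \mathbb{R}^{1+m} \to \mathbb{R}$ is an IPH function satisfying

$$p(1, 0_m) \le 1, \qquad p(-1, 0_m) \le -1.$$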


Assume that $\eta$ is a lower estimate of the function $f$ over the set $X$, i.e., $f(x) - \eta \ge b > 0$ for all $x \in X$. Then, in order to establish weak duality, we need only consider convolution functions defined on $[b, +\infty) \times \mathbb{R}^m \times \Omega$ such that

$$\sup_{\omega \in \Omega} h(u, v; \omega) \le u, \qquad \forall (u, v) \in [b, +\infty) \times \mathbb{R}^m_-. \qquad (5)$$


To investigate the zero duality gap property, we further assume that, for any $\varepsilon \in (0, b)$, there exists $\delta > 0$ such that

$$\inf_{\omega \in \Omega} h(u, v; \omega) \ge u - \varepsilon, \qquad \forall u \ge b,\ r(v) \le \delta, \qquad (6)$$

and that, for each $c > 0$, there exists $\omega \in \Omega$ such that

$$h(u, v; \omega) \ge c\, r(v), \qquad \forall u \ge b,\ v \in \mathbb{R}^m, \qquad (7)$$

where $r\colon \mathbb{R}^m \to \mathbb{R}$ is such that $r(v) \le 0 \Rightarrow v \in \mathbb{R}^m_-$. Assume further that

(f1) the function $f$ is uniformly positive on $X_0$, i.e., $\inf_{x \in X_0} f(x) = M(f, g) > 0$;

(f2) the function $f$ is uniformly continuous on an open set containing the set $X_0$;

(g) the mapping $g$ is continuous and the set-valued mapping

$$D(\delta) = \{x \in X : r(g(x)) \le \delta\}$$

is upper semicontinuous at the point $\delta = 0$.


Theorem 1 Under assumptions (5)–(7), (f1), (f2) and (g), the zero duality gap property holds for $P(f, g)$ with respect to the Lagrange-type function $L(x, \omega)$ corresponding to $h$ and $\eta = 0$.


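Let $b > 0$. Define a convolution function $h\colon [b, +\infty) \times \mathbb{R}^m \times \Omega \to \mathbb{R}$ by

$$h(u, v; \omega) = p(u, \omega_1 v_1, \dots, \omega_m v_m),$$

where $p\colon \mathbb{R}_+ \times \mathbb{R}^m \to \mathbb{R}$ is an increasing function satisfying

$$p(u, 0_m) \le u \qquad \text{for all } u > 0. \qquad (8)$$

Consider problem $P(f, g)$ with an objective function $f$ that is uniformly positive on $X$. Let $L$ be the Lagrange-type function defined by

$$L(x, \omega) = p(f(x), \omega_1 g_1(x), \dots, \omega_m g_m(x)),$$

where $p$ is defined on $\mathbb{R}_+ \times \mathbb{R}^m$. Define the perturbation function $\beta(y)$ of $P(f, g)$ by

$$\beta(y) = \inf\{f(x) : x \in X,\ g(x) \le y\}, \qquad y \in \mathbb{R}^m.$$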
Theorem 2 Let $p$ be a continuous increasing function satisfying (8), and let the zero duality gap property with respect to $p$ hold. Then the perturbation function $\beta$ is lower semicontinuous at the origin.


Further assume that $p$ satisfies the following property: there exist positive numbers $a_1, \dots, a_m$ such that, for all $u > 0$ and $(v_1, \dots, v_m) \in \mathbb{R}^m$, we have

$$p(u, v_1, \dots, v_m) \ge \max(u, a_1 v_1, \dots, a_m v_m). \qquad (9)$$

Theorem 3 Assume that $p$ is an increasing convolution function that possesses properties (8) and (9). Let the perturbation function $\beta$ of problem $P(f, g)$ be lower semicontinuous at the origin. Then the zero duality gap property with respect to $p$ holds.


Remark 1 The perturbation function $\beta$ depends on $P(f, g)$ and does not depend on the exogenous function $p$. It is worth noting that Theorems 2 and 3 establish that, for a broad class of convolution functions $p$, the zero duality gap properties with respect to different $p$ are equivalent.

Remark 2 If $p$ is a linear function, then lower semicontinuity does not imply the zero duality gap property, so we need to impose a condition that does not hold for linear functions. This is the role of (9). Results similar to Theorems 2 and 3 also hold for penalty-type functions, where $p(u, v)$ is a function defined on $\mathbb{R}_+ \times \mathbb{R}^m_+$ and $L(x, \omega) = p(f(x), \omega_1 g_1(x)^+, \dots, \omega_m g_m(x)^+)$. In this case (9) need only hold for $u > 0$, $v \in \mathbb{R}^m_+$. This requirement is very weak and is satisfied by many increasing functions, including the function $p(u, v) = u + \sum_{i=1}^{m} v_i$.
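Remark 2 can be seen on a tiny hypothetical problem of our own, with $X$ finite and a single constraint. Let $X = \{x_1, x_2, x_3\}$ with $(f, g)$ values $(5, -1)$, $(1, 1)$ and $(4, 0)$, so $M(f, g) = 4$. The objective is positive, and $\beta$ is lower semicontinuous at the origin: $\beta(y) = 5$ for $-1 \le y < 0$, and $\beta(y) = 4$ for $0 \le y < 1$. The sketch below approximates $\sup_\omega q(\omega)$ on a grid of parameters with $\eta = 0$. For the linear $p(u, v) = u + v$ (the classical Lagrangian), the dual value is 3, leaving a duality gap. For $p(u, v) = \max(u, v)$, which is IPH and satisfies (8) and (9) with $a_1 = 1$, the gap is zero, and every $\omega \ge 4$ is an exact parameter.

```python
from fractions import Fraction

# Hypothetical finite problem: X = {x1, x2, x3}, values (f(x), g(x)); m = 1.
POINTS = [(5, -1), (1, 1), (4, 0)]  # x1 and x3 are feasible, x2 is not


def primal_value(points):
    """M(f, g) = min of f over the feasible points."""
    return min(f for f, g in points if g <= 0)


def dual_value(p, points, omegas):
    """max over the grid omegas of q(omega) = min_x p(f(x), omega * g(x))  (eta = 0)."""
    return max(min(p(f, w * g) for f, g in points) for w in omegas)


def p_linear(u, v):  # classical Lagrange function
    return u + v


def p_max(u, v):     # IPH; satisfies (8) and (9) with a_1 = 1
    return max(u, v)


omegas = [Fraction(k, 2) for k in range(41)]  # omega in {0, 0.5, ..., 20}
print(primal_value(POINTS))                   # 4
print(dual_value(p_linear, POINTS, omegas))   # 3: duality gap
print(dual_value(p_max, POINTS, omegas))      # 4: zero duality gap
```

For the linear choice, $q(\lambda) = \min(5 - \lambda,\ 1 + \lambda,\ 4)$ is maximized at $\lambda = 2$ with value 3. No multiplier closes the gap, because the optimal point $x_3$ is not supported by any line through the image set.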


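Let the Lagrange-type function be of the form

$$L(x, \omega) = f(x) + \chi(g(x); \omega), \qquad x \in X,\ \omega \in \Omega.$$

Consider the set $K$ of functions $\chi\colon \mathbb{R}^m \times \Omega \to \mathbb{R}$ with the following two properties:

(i) $\chi(\cdot, \omega)$ is lower semicontinuous for all $\omega \in \Omega$;

(ii) $\sup_{\omega \in \Omega} \chi(v; \omega) = 0$ for all $v \in \mathbb{R}^m_-$.

Consider a point $(x_*, \omega_*) \in X \times \Omega$ such that

$$L(x_*, \omega_*) = \min_{x \in X} L(x, \omega_*), \qquad (10)$$

$$\chi(g(x_*); \omega_*) = 0. \qquad (11)$$

Theorem 4 Let $\chi \in K$. If (10) and (11) hold for $x_* \in X_0$ and $\omega_* \in \Omega$, then $\omega_*$ is an exact Lagrange parameter.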


The most advanced theory has been developed for two special classes of Lagrange-type functions. One of them is the class of augmented Lagrangians (see the corresponding article in this encyclopedia). The other class consists of penalty-type functions for problems with a positive objective and a single constraint. These penalty-type functions are built from convolution functions of the form (2) with IPH functions $p$.


Remark 3 Consider problem $P(f, g)$ with $m$ constraints $g_1, \dots, g_m$. These constraints can be converted into a single one in many different ways. In particular, the system $g_1(x) \le 0, \dots, g_m(x) \le 0$ is equivalent to the single inequality $f_1(x) := \max_{i=1,\dots,m} g_i(x) \le 0$. The function $f_1$ is nonsmooth. If all the functions $g_i(x)$ are smooth, then a smoothing procedure can be applied to $f_1$ (see [13] for details). Problems with a single constraint are convenient to deal with from many points of view.


Let $P(f,f_1)$ be a problem with a positive objective $f$ and a single constraint $f_1$. We consider here only the IPH functions $s_k$ defined on $\mathbb{R}^2_+$ by

$$s_k(u,v) = (u^k + v^k)^{1/k}, \quad u, v \ge 0. \qquad (12)$$

(Many results that are valid for $s_k$ can also be extended to IPH functions $p: \mathbb{R}^2_+ \to \mathbb{R}_+$ with the properties $p(1,0) = 1$ and $\lim_{u \to +\infty} p(1,u) = +\infty$.)

A penalty-type function $L_k^+$ corresponding to $s_k$ has the form $L_k^+(x,d) = \left(f(x)^k + d\, f_1^+(x)^k\right)^{1/k}$, where $f_1^+(x) = \max\{f_1(x), 0\}$. Here $d$ is a penalty parameter. It can be shown that, for problems that are 'regular' in a certain sense, an exact parameter does not exist if $k > 1$, so we consider here only the classical penalty function with $k = 1$ and lower order penalty functions with $k < 1$. It can be shown that the existence of an exact parameter for $k < 1$ implies the existence of exact parameters for every $k'$ with $0 < k' < k$. One of the main questions that can be studied in the framework of this class of penalty-type functions is the size of exact penalty parameters. Generally speaking, the size of the exact parameter can be reduced by a suitable choice of $k$ and by simple reformulations of the problem $P(f,f_1)$ at hand.

For the function $L_k^+$ an explicit value of the least exact penalty parameter can be expressed through the perturbation function. Let $\beta(y)$ be the perturbation function of the problem $P(f,f_1)$. Note that $\beta(0) = M(f,f_1)$ and $\beta$ is a decreasing function, so $\beta(y) \le M(f,f_1)$ for $y \ge 0$. For the sake of simplicity, we assume that $\beta(y) < M(f,f_1)$ for all $y > 0$. Let

$$d_k = \sup_{y > 0} \frac{M(f,f_1)^k - \beta(y)^k}{y^k}. \qquad (13)$$

Then the least exact parameter exists if and only if the supremum in (13) is finite, and the least exact parameter is equal to $d_k$. For $k = 1$ the existence easily follows from the calmness results of Burke [1].
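As an illustration of (13), the following sketch approximates $d_k$ on a grid for a hypothetical toy problem that is not from the source: minimize $f(x) = (x-2)^2 + 1$ subject to $f_1(x) = x - 1 \le 0$, whose perturbation function is $\beta(y) = \min\{(x-2)^2 + 1 : x \le 1 + y\}$. A finite grid of $y$ values only gives a lower estimate of the supremum, so the numbers are approximations.

```python
def beta(y):
    # Perturbation function of the hypothetical problem
    #   min (x - 2)^2 + 1  s.t.  x - 1 <= y,
    # whose minimizer is x = min(1 + y, 2).
    x = min(1.0 + y, 2.0)
    return (x - 2.0) ** 2 + 1.0


def least_exact_parameter(beta, k, ys):
    """Grid estimate of d_k = sup_{y>0} (M^k - beta(y)^k) / y^k with M = beta(0)."""
    M = beta(0.0)
    return max((M ** k - beta(y) ** k) / y ** k for y in ys)


ys = [1e-6 + i * 5e-5 for i in range(100_000)]  # grid on (0, 5]
d1 = least_exact_parameter(beta, 1.0, ys)       # classical penalty, k = 1
d_half = least_exact_parameter(beta, 0.5, ys)   # lower order penalty, k = 1/2
print(f"d_1 ~ {d1:.4f}, d_1/2 ~ {d_half:.4f}")
```

For $k = 1$ the ratio is largest as $y \to 0$ and approaches 2, the magnitude of $f'(1)$; for $k = 1/2$ the supremum is attained at an interior $y$ and is considerably smaller, in line with the remark that the choice of $k$ can reduce the exact parameter.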

Let $f^c(x) = f(x) + c$ with $c > 0$, and let $d_{k,c}$ be the least exact parameter for problem $P(f^c, f_1)$. Then it can be proved that $d_{k,c} \to 0$ as $c \to +\infty$.

Assume that the functions $f$ and $f_1$ are Lipschitz. Since $k < 1$, the function $L_k^+$ is not locally Lipschitz at points $x$ where $f_1(x) = 0$, so a special smoothing procedure is needed in order to apply numerical methods for the unconstrained minimization of this function. Such a procedure is described in [14]. This procedure can be applied to different types of lower order penalty functions.

Another approach for constructing a Lipschitz 
penalty function with a small exact parameter is also of 
interest (see [11] and references therein). 

Let $\sigma$ be a strictly increasing continuous concave function defined on $[a,+\infty)$, where $a > 0$. Assume that $\sigma(a) = 0$ and $\lim_{y \to +\infty} \sigma'_+(y) = 0$, where $\sigma'_+$ is the right derivative of the concave function $\sigma$. Consider the function $f^c(x) = \sigma(f(x) + c)$ and the classical penalty function $L_1^+(x,d) = \sigma(f(x) + c) + d f_1^+(x)$ for the problem $P(f^c, f_1)$. Let $d_{\sigma,c}$ be the least exact parameter of $L_1^+$ (assuming that this parameter exists). Then we can assert that $d_{\sigma,c} \to 0$ as $c \to +\infty$ under very mild assumptions.
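The effect of the transformation $\sigma$ can be seen on a hypothetical toy problem (not from the source): minimize $(x-2)^2 + 1$ subject to $x - 1 \le 0$, with the hypothetical choice $\sigma(t) = \log t$ on $[1, +\infty)$, so that $a = 1$, $\sigma(1) = 0$ and $\sigma'_+(t) = 1/t \to 0$. Because $\sigma$ is increasing, the perturbation function of $P(f^c, f_1)$ is $\sigma(\beta(y) + c)$, so formula (13) with $k = 1$ can be applied to it. The sketch below estimates $d_{\sigma,c}$ on a grid of $y$ values.

```python
import math


def beta(y):
    # Perturbation function of the hypothetical problem
    #   min (x - 2)^2 + 1  s.t.  x - 1 <= y
    x = min(1.0 + y, 2.0)
    return (x - 2.0) ** 2 + 1.0


def sigma(t):
    # Hypothetical sigma(t) = log t on [1, +inf): strictly increasing,
    # concave, sigma(1) = 0 and sigma'(t) = 1/t -> 0.
    return math.log(t)


def least_exact_parameter_sigma(c, ys):
    # sigma is increasing, so the perturbation function of P(f^c, f_1)
    # is sigma(beta(y) + c); apply (13) with k = 1 to it.
    M = sigma(beta(0.0) + c)
    return max((M - sigma(beta(y) + c)) / y for y in ys)


ys = [1e-6 + i * 5e-5 for i in range(100_000)]  # grid on (0, 5]
for c in (0.5, 5.0, 50.0, 500.0):
    print(c, round(least_exact_parameter_sigma(c, ys), 5))
```

For this particular toy problem the estimates behave like $2/(2+c)$, so the least exact parameter shrinks toward 0 as $c$ grows, as the assertion above predicts.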
The Primal Problem
and the Lagrangian Dual Problem 


For a given primal optimization problem (P) it is pos- 
sible to construct a related dual problem which de- 
pends on the same data and often facilitates the anal- 
ysis and solution of (P). This section focuses on the La- 
grangian dual, a particular form of dual problem which 
has proven to be very useful in many optimization ap- 
plications. 
A general form of primal problem is

$$(P)\quad \begin{array}{ll} \min & f(x) \\ \text{s.t.} & g(x) \le 0, \\ & h(x) = 0, \\ & x \in S, \end{array}$$

where $f$ is a scalar function of the $n$-dimensional vector $x$, and $g$ and $h$ are vector functions of $x$. $S$ is a nonempty subset of $\mathbb{R}^n$. It is convenient to associate dual variables with the constraints as follows: components of the dual vector $u$ correspond to components of the vector constraint $g(x) \le 0$, and similarly the components of $v$ are associated with components of the constraint $h(x) = 0$.

There is a great deal of flexibility in defining prob- 
lem (P). For example, any or all of the explicit con- 
straints g(x) < 0 and h(x) = 0 could be incorporated 
in the definition of the set S. This, of course, governs 
the number and type of dual variables. As will be seen 
in the examples, defining S is the first step in defining 
a Lagrangian dual of (P). To illustrate the basic duality 
results, certain assumptions regarding the functions f, 
g and h and the set S will be made to simplify the presentation below. For more thorough treatments, see the references.

Given (P), define the Lagrangian function $L(x,u,v) = f(x) + u^T g(x) + v^T h(x)$. The Lagrangian dual problem is then

$$(D)\quad \begin{array}{ll} \max & \theta(u,v) \\ \text{s.t.} & u \ge 0, \end{array}$$

where, for fixed $(u,v)$, the dual function $\theta$ is defined in terms of the infimum of the Lagrangian function with respect to $x \in S$:

$$\theta(u,v) = \inf_{x \in S} L(x,u,v).$$


Below are five examples of primal problems and 
their duals. The first is a geometrical example, three are 
classes of optimization problems: linear programs, con- 
vex programs, and quadratic programs, and the final 
example is an integer program. 


Example 1 (geometrical problem) In this two-variable example, a linear function is to be minimized over the intersection of the unit disk and the nonnegative orthant. The optimal solution is at the origin with objective value zero.

$$(P1)\quad \begin{array}{ll} \min & x_1 + x_2 \\ \text{s.t.} & x_1^2 + x_2^2 \le 1, \\ & -x_1 \le 0, \\ & -x_2 \le 0. \end{array}$$

Letting $S = \{(x_1,x_2): x_1^2 + x_2^2 \le 1\}$, the dual problem is

$$(D1)\quad \begin{array}{ll} \max & \theta(u) \\ \text{s.t.} & u \ge 0, \end{array}$$

where

$$\theta(u) = \min_{x_1^2 + x_2^2 \le 1} (1 - u_1)x_1 + (1 - u_2)x_2.$$

Note that min replaces inf in the definition of $\theta$ since a continuous function attains its infimum over the compact unit disk, so the infimum is finite and attained in this example.


Example 2 (linear programming) Duality is an important topic in any treatment of linear programming. This example shows that the Lagrangian dual of a primal linear program is equivalent to the dual linear program as it is usually formulated in textbooks. Letting the primal be

$$(P2)\quad \begin{array}{ll} \min & c^T x \\ \text{s.t.} & b - Ax \le 0, \\ & x \ge 0, \end{array}$$

and choosing $S = \{x: x \ge 0\}$, the Lagrangian dual is

$$(D2)\quad \begin{array}{ll} \max & \theta(u) \\ \text{s.t.} & u \ge 0, \end{array}$$

where

$$\theta(u) = \inf_{x \ge 0}\ c^T x + u^T(b - Ax).$$

This reduces to

$$\theta(u) = \begin{cases} b^T u & \text{if } c - A^T u \ge 0, \\ -\infty & \text{otherwise}. \end{cases}$$

Assuming there are nonnegative values of $u$ such that $c \ge A^T u$, these would be the only viable choices for the maximization of $\theta(u)$, and therefore (D2) takes the form familiar from linear programming duality:

$$\begin{array}{ll} \max & b^T u \\ \text{s.t.} & A^T u \le c, \\ & u \ge 0. \end{array}$$
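To make the reduction concrete, the sketch below evaluates $\theta(u)$ for a small hypothetical instance (the data c, A and b are invented for illustration and are not from the source) and compares it with the primal objective at a feasible point. It uses plain Python and no LP solver.

```python
import math

# Hypothetical LP data for (P2): min c^T x  s.t.  Ax >= b, x >= 0
c = [2.0, 3.0]
A = [[1.0, 1.0],
     [1.0, 2.0]]
b = [2.0, 3.0]


def theta(u):
    """Lagrangian dual function of (P2) with S = {x : x >= 0}."""
    # reduced costs c - A^T u
    reduced = [c[j] - sum(A[i][j] * u[i] for i in range(len(b)))
               for j in range(len(c))]
    if all(r >= 0 for r in reduced):
        return sum(bi * ui for bi, ui in zip(b, u))  # infimum attained at x = 0
    return -math.inf  # letting some x_j -> +inf drives the Lagrangian to -inf


def primal_objective(x):
    return sum(cj * xj for cj, xj in zip(c, x))


x_feas = [1.0, 1.0]  # satisfies Ax >= b and x >= 0
print(theta([1.0, 1.0]), primal_objective(x_feas))  # dual and primal values
print(theta([3.0, 0.0]))                             # violates A^T u <= c
```

Since the dual value at $u = (1,1)$ equals the primal value at $x = (1,1)$, both points are optimal for this instance, a consequence of the weak duality result discussed in the next section.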


Example 3 (differentiable convex programming) One of the first nonlinear duals was developed by P. Wolfe [27] for the primal problem

$$(P3)\quad \begin{array}{ll} \min & f(x) \\ \text{s.t.} & g(x) \le 0, \\ & x \in S, \end{array}$$

where $S$ is an open convex set, and $f$ and $g$ are differentiable convex functions defined on $S$. The Lagrangian function is $L(x,u) = f(x) + u^T g(x)$, and it is further assumed that $\theta(u) \ne -\infty$ for all $u \ge 0$. With these assumptions the Lagrangian function is convex in $x$ and has a minimum where its gradient is zero. That is, the requirement $\theta(u) = \min_{x \in S} L(x,u)$ is the same as requiring $\nabla_x L(x,u) = 0$. Thus the dual problem may be written

$$(D3)\quad \begin{array}{ll} \max & L(x,u) \\ \text{s.t.} & \nabla_x L(x,u) = 0, \\ & u \ge 0. \end{array}$$


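Example 4 (convex quadratic programming) An important special case of the preceding example is the problem

$$(P4)\quad \begin{array}{ll} \min & \tfrac{1}{2} x^T H x + d^T x \\ \text{s.t.} & Ax \le b, \\ & x \in \mathbb{R}^n, \end{array}$$

where $H$ is a given symmetric positive definite $n \times n$ matrix and $d$ is a given vector in $\mathbb{R}^n$. Applying the results for (P3) above and using the equality constraints of (D3), namely $\nabla_x L(x,u) = Hx + d + A^T u = 0$, to eliminate $x = -H^{-1}(d + A^T u)$, the dual of (P4) can be written

$$(D4)\quad \begin{array}{ll} \max & \theta(u) = -\tfrac{1}{2} u^T (A H^{-1} A^T) u - u^T (b + A H^{-1} d) - \tfrac{1}{2} d^T H^{-1} d \\ \text{s.t.} & u \ge 0. \end{array}$$

Thus, the dual of (P4) is also a quadratic program in the dual variables $u$.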
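A quick numerical check of (D4), using small matrices chosen arbitrarily for illustration (they are not from the source): the sketch evaluates the Lagrangian at the eliminating point $x(u) = -H^{-1}(d + A^T u)$ and compares it with the closed-form $\theta(u)$.

```python
import numpy as np

# Hypothetical data for (P4); H is symmetric positive definite
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
d = np.array([-1.0, 2.0])
A = np.array([[1.0, 1.0],
              [-1.0, 2.0],
              [0.0, -1.0]])
b = np.array([1.0, 2.0, 0.5])


def lagrangian(x, u):
    return 0.5 * x @ H @ x + d @ x + u @ (A @ x - b)


def theta_closed_form(u):
    Hinv = np.linalg.inv(H)
    return (-0.5 * u @ (A @ Hinv @ A.T) @ u
            - u @ (b + A @ Hinv @ d)
            - 0.5 * d @ Hinv @ d)


def theta_by_elimination(u):
    # grad_x L = Hx + d + A^T u = 0  =>  x(u) = -H^{-1}(d + A^T u)
    x = -np.linalg.solve(H, d + A.T @ u)
    return lagrangian(x, u)


u = np.array([0.3, 1.2, 0.0])
print(theta_closed_form(u), theta_by_elimination(u))
```

Solving with `np.linalg.solve` rather than forming $H^{-1}$ is the usual numerically preferable choice; the explicit inverse in `theta_closed_form` simply mirrors the formula.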


Example 5 (integer program) The following numerical example of a linear problem with binary variables will be used to illustrate various dual properties in the following sections.

$$(P5)\quad \begin{array}{ll} \min & 20 - x_1 - 5x_2 - 7x_3 \\ \text{s.t.} & x_1 + 3x_2 + 4x_3 \le 5, \\ & x_j \in \{0,1\}, \quad j = 1,2,3. \end{array}$$

For this problem, let $S$ be defined by the binary restrictions on the components of $x$. Then $L(x,u) = 20 - x_1 - 5x_2 - 7x_3 + u(x_1 + 3x_2 + 4x_3 - 5)$ and the dual problem is

$$(D5)\quad \begin{array}{ll} \max & \theta(u) \\ \text{s.t.} & u \ge 0, \end{array}$$

where

$$\theta(u) = \min_{x_j \in \{0,1\},\ j = 1,2,3} (u - 1)x_1 + (3u - 5)x_2 + (4u - 7)x_3 - 5u + 20.$$
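Because $S$ has only eight points, $\theta(u)$ and the minimizer set $S(u)$ can be computed by brute-force enumeration, using nothing beyond the data of (P5). The sketch below uses exact rational arithmetic (`fractions.Fraction`) so that ties between minimizers are detected exactly.

```python
from fractions import Fraction
from itertools import product


def f(x):  # objective of (P5)
    return 20 - x[0] - 5 * x[1] - 7 * x[2]


def g(x):  # constraint function of (P5), g(x) <= 0
    return x[0] + 3 * x[1] + 4 * x[2] - 5


S = list(product((0, 1), repeat=3))  # the 8 binary points


def theta(u):
    """Dual function of (D5) by enumeration; returns (value, minimizers S(u))."""
    values = {x: f(x) + u * g(x) for x in S}
    best = min(values.values())
    return best, [x for x in S if values[x] == best]


for u in (Fraction(0), Fraction(1), Fraction(5, 3), Fraction(7, 4), Fraction(2)):
    value, minimizers = theta(u)
    print(u, value, minimizers)
```

The printed values agree with the corresponding rows of Table 1, including the ties at $u = 1$, $5/3$ and $7/4$, where $S(u)$ contains two points.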
Weak and Strong Duality 


For a given primal problem (P) and associated dual problem (D), a fundamental relationship showing that the two objective function values bound each other is given by the following weak duality result:

Theorem 6 If $\bar{x}$ is feasible to (P) and $(\bar{u},\bar{v})$ is feasible to (D), then

$$f(\bar{x}) \ge \theta(\bar{u},\bar{v}).$$

Proof

$$\theta(\bar{u},\bar{v}) = \inf_{x \in S} L(x,\bar{u},\bar{v}) \le f(\bar{x}) + \bar{u}^T g(\bar{x}) + \bar{v}^T h(\bar{x}) \le f(\bar{x}).$$

The first inequality follows since $\bar{x} \in S$, and the second from $\bar{u}^T g(\bar{x}) \le 0$ (because $\bar{u} \ge 0$ and $g(\bar{x}) \le 0$) and $h(\bar{x}) = 0$. □


If the optimal primal and dual objective values are equal, strong duality is said to hold for the primal and dual pair. The following theorem illustrates such a result for the pair (P3) and (D3).

Theorem 7 Let $x^*$ be an optimal solution for (P3) and assume the function $g$ satisfies some constraint qualification. Then there exists a vector $u^*$ such that $(x^*,u^*)$ solves (D3) and

$$f(x^*) = L(x^*,u^*).$$

Proof Under the assumptions there exists a $u^* \ge 0$ such that $(x^*,u^*)$ satisfies the Karush-Kuhn-Tucker conditions:

$$\nabla_x L(x^*,u^*) = 0, \qquad u^{*T} g(x^*) = 0,$$

from which it follows that

$$f(x^*) = L(x^*,u^*)$$

and that $(x^*,u^*)$ is feasible to (D3). Using this and the weak duality theorem gives

$$L(x^*,u^*) = f(x^*) \ge L(x,u)$$

for any $(x,u)$ satisfying the constraints of (D3). The results of the theorem follow. □


The references contain additional strong duality results, 
including cases where differentiability is not required. 
However, as will be seen in examples below, it often 
happens that there is a difference, known as the dual- 
ity gap, between the optimal values of the primal and 
dual objective functions. 


Properties of the Lagrangian Dual Function 


The Lagrangian dual function enjoys two useful proper- 
ties: it is concave and, although it is not necessarily dif- 
ferentiable, it is relatively straightforward to compute 
a subgradient at any dual feasible point. 


Theorem 8 $\theta(u,v)$ is concave.

Proof For fixed $x$, $L(x,u,v)$ is linear in $(u,v)$, and thus $\theta(u,v)$ is the infimum of a (possibly infinite) collection of functions linear in $(u,v)$; such an infimum is concave. □

It is important to note that the above result is true under very general conditions. In particular, it is true when the set $S$ is discrete.

Since $\theta(u,v)$ is concave, it is known that at least one linear supporting function exists at each $(u,v)$. Collectively, the gradients of all linear supports at $(u,v)$ are called the set of subgradients of $\theta$ at $(u,v)$.

For any $(u,v)$ for which $\theta(u,v)$ is finite, denote by $S(u,v)$ the solution set of the minimization defining $\theta(u,v)$.


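Theorem 9 For fixed $(\bar{u},\bar{v})$, let $\bar{x} \in S(\bar{u},\bar{v})$. Then $(g(\bar{x}), h(\bar{x}))$ is a subgradient of $\theta$ at $(\bar{u},\bar{v})$.

Proof For any $(u,v)$,

$$\begin{aligned} \theta(u,v) &= \inf_{x \in S} f(x) + u^T g(x) + v^T h(x) \\ &\le f(\bar{x}) + u^T g(\bar{x}) + v^T h(\bar{x}) \\ &= f(\bar{x}) + (u - \bar{u})^T g(\bar{x}) + \bar{u}^T g(\bar{x}) + (v - \bar{v})^T h(\bar{x}) + \bar{v}^T h(\bar{x}). \end{aligned}$$

Since $\bar{x} \in S(\bar{u},\bar{v})$, we have $f(\bar{x}) + \bar{u}^T g(\bar{x}) + \bar{v}^T h(\bar{x}) = \theta(\bar{u},\bar{v})$. Hence

$$\theta(u,v) \le \theta(\bar{u},\bar{v}) + g(\bar{x})^T (u - \bar{u}) + h(\bar{x})^T (v - \bar{v}).$$ □

If $S(\bar{u},\bar{v})$ is a single point $\bar{x}$, then there is only one subgradient of $\theta$ at $(\bar{u},\bar{v})$, in which case $\theta$ is differentiable at $(\bar{u},\bar{v})$, i.e., $\nabla\theta(\bar{u},\bar{v}) = (g(\bar{x}), h(\bar{x}))$.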

From the above, $\theta$ is always concave and it is relatively easy to calculate a slope at any point. Much use of this is made in algorithms for large scale integer programs. Also, the maximum value of the dual provides a lower bound on the optimal objective function value, which is exploited in methods (such as branch and bound) for solving the primal problem. While strong duality generally holds for convex programs, this is rarely true for integer programs.
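Theorem 9 is what makes dual ascent practical for integer programs. The sketch below applies a projected subgradient method, a standard technique named here for illustration, to the dual (D5) of Example 5, using $g(x(u))$ for one minimizer $x(u) \in S(u)$ as the subgradient. The diminishing step size $s_k = 1/(k+1)$, the starting point and the iteration count are illustrative assumptions, not prescriptions from the source.

```python
from itertools import product

S = list(product((0, 1), repeat=3))  # binary points of Example 5


def f(x):
    return 20 - x[0] - 5 * x[1] - 7 * x[2]


def g(x):
    return x[0] + 3 * x[1] + 4 * x[2] - 5


def solve_lagrangian(u):
    """Return theta(u) and one minimizer x(u) in S(u)."""
    x = min(S, key=lambda z: f(z) + u * g(z))
    return f(x) + u * g(x), x


def subgradient_ascent(u0=0.0, iters=2000):
    u, best = u0, float("-inf")
    for k in range(iters):
        value, x = solve_lagrangian(u)
        best = max(best, value)          # theta need not increase at every step
        step = 1.0 / (k + 1)             # diminishing, non-summable step sizes
        u = max(0.0, u + step * g(x))    # move along g(x(u)), project onto u >= 0
    return u, best


u_final, best_value = subgradient_ascent()
print(round(u_final, 3), round(best_value, 4))  # compare with u = 5/3, theta = 34/3
```

The best dual value found approaches $\theta(5/3) = 34/3$ from below, which is a valid lower bound on the optimal value 12 of (P5) at every iteration.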

Revisiting the examples of the first section, for Example 1 the Karush-Kuhn-Tucker conditions can be employed to derive

$$\theta(u) = -\sqrt{(1 - u_1)^2 + (1 - u_2)^2}.$$

There is no duality gap for this problem: the dual maximum occurs at $(u_1,u_2) = (1,1)$, where $\theta$ is zero, in agreement with the primal minimum. The dual function is differentiable except at its maximizing point. The dual function of Example 2, a linear program, is linear on the set where it is finite, and thus it is concave and differentiable there. Similarly, in Example 4, since $H$ is positive definite, $H^{-1}$ is also positive definite and the dual function is again concave and differentiable everywhere. For Example 5, the integer program, values of $u$ feasible to the dual problem, $S(u)$ and $\theta(u)$ are given in Table 1. Each element of $S(u)$ is a triple $(x_1(u), x_2(u), x_3(u))$.
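The closed form for Example 1 can be checked by brute force. A linear function attains its minimum over the unit disk on the boundary circle, so sampling the circle finely approximates $\theta(u)$; the sample size below is an arbitrary choice.

```python
import math


def theta_closed_form(u1, u2):
    return -math.hypot(1.0 - u1, 1.0 - u2)


def theta_sampled(u1, u2, n=100_000):
    # A linear function attains its minimum over the unit disk on the unit circle.
    best = math.inf
    for k in range(n):
        t = 2.0 * math.pi * k / n
        x1, x2 = math.cos(t), math.sin(t)
        best = min(best, (1.0 - u1) * x1 + (1.0 - u2) * x2)
    return best


for u in [(0.0, 0.0), (0.5, 2.0), (1.0, 1.0)]:
    print(u, theta_closed_form(*u), theta_sampled(*u))
```

The largest of these values, 0 at $u = (1,1)$, equals the primal optimum, in line with the absence of a duality gap.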

Figure 1 is a graph of the function $\theta(u)$. Again, $\theta(u)$ is a concave function, and it is differentiable except at $u = 1$, $u = 5/3$ and $u = 7/4$.

The maximum dual value is

$$\theta\!\left(\tfrac{5}{3}\right) = 11\tfrac{1}{3},$$

which indicates a duality gap of size $2/3$, since the optimal value of (P5) is $f(1,0,1) = 12$.

By contrast, Theorem 8 does not apply in Example 
3 because the objective of (D3), a Lagrangian function, 
depends on both x and u, rather than the dual variables 
alone. Lagrangian functions are generally not concave. 
Lagrangian Duality: BASICS, Table 1
Values of the dual function for Example 5

u              | S(u)                    | θ(u)
0 ≤ u < 1      | (1, 1, 1)               | 7 + 3u
u = 1          | {(1, 1, 1), (0, 1, 1)}  | 8 + 2u
1 < u < 5/3    | (0, 1, 1)               | 8 + 2u
u = 5/3        | {(0, 1, 1), (0, 0, 1)}  | 13 - u
5/3 < u < 7/4  | (0, 0, 1)               | 13 - u
u = 7/4        | {(0, 0, 1), (0, 0, 0)}  | 20 - 5u
7/4 < u        | (0, 0, 0)               | 20 - 5u


Lagrangian Duality: BASICS, Figure 1
θ(u) for Example 5


Geometrical Interpretations 
of Lagrangian Duality 


The Resource-Payoff Space 


One interpretation of the dual problem is provided via the resource-payoff set $RP$ for problem (P). To illustrate geometrically, assume that (P) has just one inequality constraint $g(x) \le 0$ and there are no explicit equality constraints. Then the resource-payoff set for the problem is the set of points defined by

$$RP = \{(g(x), f(x)): x \in S\}.$$

That is, $RP$ is a mapping of all $x \in S$ into the $(g,f)$-plane. In this plane, the Lagrangian equated to a constant $\theta$ has the form $f + ug = \theta$, which defines a line of slope $-u$ and intercept $\theta$. For any $u \ge 0$, the dual function $\theta(u)$ is defined by minimizing $f(x) + u g(x)$ over $x \in S$. Thus $\theta(u)$ is the intercept of a linear support to the resource-payoff set at $\{(g(x), f(x)): x \in S(u)\}$. To illustrate, consider the problem

$$(P6)\quad \begin{array}{ll} \min & x_1^2 + x_2^2 \\ \text{s.t.} & 1 - x_1 - x_2 \le 0, \\ & -x_1 \le 0, \\ & -x_2 \le 0. \end{array}$$


The optimal solution is $x_1^* = x_2^* = 1/2$ and $f(x_1^*, x_2^*) = 1/2$. Letting $S = \{(x_1,x_2): x_1 \ge 0, x_2 \ge 0\}$ and $g(x_1,x_2) = 1 - x_1 - x_2$, the resource-payoff set is a subset of $\mathbb{R}^2$ defined by $RP = \{(g(x_1,x_2), f(x_1,x_2)): x_1 \ge 0, x_2 \ge 0\}$. It can be verified that $RP$ consists of all points in $\mathbb{R}^2$ between the curves $f = (g-1)^2$ and $f = (g-1)^2/2$ for $g \le 1$ (for fixed $s = x_1 + x_2 = 1 - g \ge 0$, the value $x_1^2 + x_2^2$ ranges from $s^2/2$ to $s^2$), as shown in Fig. 2.

Lagrangian Duality: BASICS, Figure 2
RP set for (P6)

The two linear supports of $RP$ shown have slopes $-2$ and $-1$, corresponding to $u$ values of 2 and 1. With $u = 2$, $S(u)$ is the singleton $\{(1,1)\}$ and $(g(x(u)), f(x(u))) = (-1, 2)$. The line with slope $-2$ passing through the point $(g,f) = (-1,2)$ intersects the $f$-axis at the origin. Thus $\theta(2) = 0 < f(x^*)$, illustrating the weak duality theorem.

For $u = 1$, $(x_1(u), x_2(u)) = (1/2, 1/2)$ and $(g(x(u)), f(x(u))) = (0, 1/2)$. Since this point lies on the $f$-axis, it follows that $\theta(1) = 1/2 = f(x^*)$. This illustrates the strong duality theorem.
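For (P6) the dual function has a simple closed form: minimizing $x_1^2 + x_2^2 + u(1 - x_1 - x_2)$ over $x \ge 0$ gives $x_1 = x_2 = u/2$ for $u \ge 0$, hence $\theta(u) = u - u^2/2$. This derivation is offered as a check, not taken from the source. The sketch computes the support point $(g, f)$ and the intercept of the support line for the two slopes discussed above.

```python
def support_point(u):
    """(g, f) at the minimizer of the Lagrangian of (P6) over S = {x >= 0}, u >= 0."""
    x1 = x2 = u / 2.0            # from 2 * x_i - u = 0
    g = 1.0 - x1 - x2
    f = x1 ** 2 + x2 ** 2
    return g, f


def theta(u):
    g, f = support_point(u)
    return f + u * g             # intercept of the line f + u * g = theta


for u in (2.0, 1.0):
    print(u, support_point(u), theta(u))
```

Both outputs reproduce the points $(-1, 2)$ and $(0, 1/2)$ and the intercepts $0$ and $1/2$ described in the text.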

As an alternative, consider Example 5, the binary linear programming problem. Since $S$ is discrete, $RP$ consists of the eight points in $\mathbb{R}^2$ listed in the last two columns of Table 2. The optimal solution to the problem is $x^* = (1,0,1)$, $f(x^*) = 12$. The resource-payoff set for this example is shown in Fig. 3. The lines in the figure trace out the lower envelope of the resource-payoff set and are found by minimizing the Lagrangian function over $S$ using $u^1 = 7/4$, $u^2 = 5/3$ and $u^3 = 1$. The lines with slopes $-7/4$, $-5/3$ and $-1$ intersect the $f$-axis at

$$11\tfrac{1}{4}, \quad 11\tfrac{1}{3}, \quad 10,$$

respectively. Thus

$$\theta\!\left(\tfrac{7}{4}\right) = 11\tfrac{1}{4}, \quad \theta(1) = 10, \quad \theta\!\left(\tfrac{5}{3}\right) = 11\tfrac{1}{3}.$$
Lagrangian Duality: BASICS, Table 2
Values of g and f for Example 5

x1 | x2 | x3 | g(x1,x2,x3) = x1 + 3x2 + 4x3 - 5 | f(x1,x2,x3) = 20 - x1 - 5x2 - 7x3
0  | 0  | 0  | -5 | 20
0  | 0  | 1  | -1 | 13
0  | 1  | 0  | -2 | 15
0  | 1  | 1  |  2 |  8
1  | 0  | 0  | -4 | 19
1  | 0  | 1  |  0 | 12
1  | 1  | 0  | -1 | 14
1  | 1  | 1  |  3 |  7


Lagrangian Duality: BASICS, Figure 3
The set RP for Example 5


The duality gap for this problem, as noted earlier, is

$$f(x^*) - \theta(u^*) = 12 - 11\tfrac{1}{3} = \tfrac{2}{3}.$$

These two examples illustrate a sufficient condition for there to be no duality gap. There is no gap if the point $(g(x^*), f(x^*))$ lies on the lower envelope of the resource-payoff set, and there is a linear support of slope $-u \le 0$ at that point with intercept $f(x^*)$.

This condition is satisfied for (P6), but not for (P5). However, if the constraint in (P5) is replaced by $g(x) = x_1 + 3x_2 + 4x_3 - 4 \le 0$, the condition is satisfied. The effect of this constraint change can be seen in Table 2 and Fig. 3. In the table, the $g$ column entries would be increased by 1, and thus $f(x^*) = 13$, with $x^* = (x_1^*, x_2^*, x_3^*) = (0,0,1)$. In Fig. 3, the $f$-axis would be shifted one unit to the left. In this case, note that $(g(x^*), f(x^*))$ now lies on the lower envelope of the set $RP$. Furthermore, an optimal dual variable is any value $u^* \in [5/3, 7/4]$.
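Both claims can be checked by enumeration. The sketch below computes the primal optimum of the binary program and the maximum of $\theta$ for the right-hand side 5 of (P5) and the modified right-hand side 4. Since $\theta$ is the lower envelope of the finitely many lines $f(x) + u\,g(x)$, it is piecewise linear and concave, so (assuming it is bounded above, which holds here) its maximum over $u \ge 0$ is attained at $u = 0$ or at a crossing point of two such lines; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import combinations, product

S = list(product((0, 1), repeat=3))


def f(x):
    return 20 - x[0] - 5 * x[1] - 7 * x[2]


def make_g(rhs):
    # constraint x1 + 3 x2 + 4 x3 - rhs <= 0
    return lambda x: x[0] + 3 * x[1] + 4 * x[2] - rhs


def duality_gap(rhs):
    g = make_g(rhs)
    primal = min(f(x) for x in S if g(x) <= 0)
    theta = lambda u: min(f(x) + u * g(x) for x in S)
    # candidate maximizers: u = 0 and every nonnegative crossing of two lines
    candidates = {Fraction(0)}
    for x, y in combinations(S, 2):
        if g(x) != g(y):
            u = Fraction(f(y) - f(x), g(x) - g(y))
            if u >= 0:
                candidates.add(u)
    dual = max(theta(u) for u in candidates)
    return primal, dual, primal - dual


print(duality_gap(5))  # original constraint of (P5)
print(duality_gap(4))  # modified constraint
```

With right-hand side 5 the gap is $2/3$; with right-hand side 4 the primal and dual optimal values coincide at 13.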


Gap Function 


Another geometrical interpretation can be given for the primal problem

$$(P7)\quad \begin{array}{ll} \min & f(x) \\ \text{s.t.} & Ax = b, \\ & x \ge 0, \end{array}$$

where $f$ is assumed convex and differentiable. In what follows, let $S = \{x \in \mathbb{R}^n: Ax = b, x \ge 0\}$, which is assumed to be a compact subset of $\mathbb{R}^n$.

For any feasible $x$, define the gap function by

$$G(x) = \max_{y \in S} \nabla f(x)^T (x - y) = -\min_{y \in S} \nabla f(x)^T (y - x).$$

The gap function has several interesting properties. Letting $y(x)$ be the solution of the linear program defining $G(x)$, note first that the gap function at $x$ is the negative of the directional derivative of $f$ at $x$ in the direction $(y(x) - x)$. Second, it can be used to construct a lower bound on the optimal value $f(x^*)$ of (P7). To see this, consider the convexity inequality

$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \forall y \in S.$$

Minimizing both sides over $y \in S$ implies

$$f(x^*) \ge f(x) - G(x), \quad \forall x \in S.$$
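A minimal sketch of the gap function, under two assumptions made only for illustration: $S$ is the unit simplex $\{x \ge 0: x_1 + \dots + x_n = 1\}$ (so $A$ is a single row of ones and $b = 1$), and $f(x) = \|x - p\|^2$ for a hypothetical point $p$ in the simplex, so that $f(x^*) = 0$. Over the simplex, the linear program defining $G(x)$ is solved at a vertex, which gives $G(x) = \nabla f(x)^T x - \min_i \partial f(x)/\partial x_i$.

```python
def f(x, p):
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))


def grad_f(x, p):
    return [2.0 * (xi - pi) for xi, pi in zip(x, p)]


def gap(x, p):
    """G(x) = max over the simplex of grad f(x)^T (x - y); the LP optimum is a vertex."""
    gr = grad_f(x, p)
    return sum(gi * xi for gi, xi in zip(gr, x)) - min(gr)


p = [0.2, 0.3, 0.5]  # hypothetical target point inside the simplex, so f* = 0
for x in ([1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3], p):
    G = gap(x, p)
    print(x, round(G, 4), "lower bound on f*:", round(f(x, p) - G, 4))
```

Every printed lower bound $f(x) - G(x)$ is at most $f^* = 0$, and the gap vanishes at the optimum $x = p$.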


By the weak duality result, a lower bound for $f(x^*)$ can also be obtained by evaluating the dual objective at any dual feasible solution. The next theorem employs the Wolfe dual of (P7) to show that the bound given above is equivalent to obtaining the maximum dual objective value for a given $x$.

Let $v$ and $u$ be the dual variables associated with $Ax = b$ and $x \ge 0$, respectively. The Lagrangian function of (P7) is $L(x,v,u) = f(x) + v^T(b - Ax) - u^T x$. Then, for the given $x$, the maximum dual objective value is

$$d(x) = \max_{(v,u) \in D(x)} L(x,v,u),$$

where $D(x)$ is the set of all multipliers such that $(x,v,u)$ is feasible to the Wolfe dual:

$$D(x) = \{(v,u): \nabla_x L(x,v,u) = 0 \text{ and } u \ge 0\}.$$

Theorem 10 For any $x \in S$, $G(x) = f(x) - d(x)$.


Proof First it is verified that $D(x)$ is nonempty, so that $d(x)$ is well defined. Since $\nabla_x L(x,v,u) = \nabla f(x) - A^T v - u$, this will be true if there exists a $v$ such that $A^T v \le \nabla f(x)$. By adaptation of Farkas' lemma (cf. also Farkas lemma; Farkas lemma: Generalizations), such a $v$ exists if and only if the alternative system

$$\nabla f(x)^T z < 0, \quad Az = 0, \quad z \ge 0$$

has no solution. However, $Az = 0$, $z \ge 0$ imply that $x + \lambda z \in S$ for all $\lambda \ge 0$. Since $S$ is assumed to be compact, the only possibility is $z = 0$, and the alternative system has no solution. Thus $D(x)$ is nonempty.

The dual constraints imply that $u^T x = \nabla f(x)^T x - v^T A x$, so

$$d(x) = f(x) - \nabla f(x)^T x + \max\{b^T v : A^T v \le \nabla f(x)\}.$$

By linear programming duality

$$\min\{\nabla f(x)^T y : Ay = b,\ y \ge 0\} = \max\{b^T v : A^T v \le \nabla f(x)\},$$

and it follows that

$$d(x) = f(x) + \min_{y \in S} \nabla f(x)^T (y - x) = f(x) - G(x).$$

□
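Theorem 10 can be spot-checked in the simplex setting, assumed here for illustration ($A = (1,\dots,1)$, $b = 1$, and the hypothetical $f(x) = \|x - p\|^2$). The dual side is then explicit: $A^T v \le \nabla f(x)$ means $v \le \min_i \partial f(x)/\partial x_i$, so $d(x) = f(x) - \nabla f(x)^T x + \min_i \partial f(x)/\partial x_i$. The sketch computes $G(x)$ from the primal linear program by checking every vertex, computes $d(x)$ from the dual side, and compares $G(x)$ with $f(x) - d(x)$.

```python
import random


def f(x, p):
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))


def grad_f(x, p):
    return [2.0 * (xi - pi) for xi, pi in zip(x, p)]


def gap_primal(x, p):
    # G(x) = max over simplex vertices e_i of grad f(x)^T (x - e_i)
    gr = grad_f(x, p)
    gx = sum(g * xi for g, xi in zip(gr, x))
    return max(gx - gr[i] for i in range(len(x)))


def d_dual(x, p):
    # max over v of f(x) - grad^T x + v  subject to  v <= grad_i for all i
    gr = grad_f(x, p)
    return f(x, p) - sum(g * xi for g, xi in zip(gr, x)) + min(gr)


random.seed(0)
p = [0.2, 0.3, 0.5]
for _ in range(5):
    w = [random.random() for _ in p]
    x = [wi / sum(w) for wi in w]  # random point of the simplex
    print(round(gap_primal(x, p), 6), round(f(x, p) - d_dual(x, p), 6))
```

Each printed pair agrees up to rounding, as the theorem asserts.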


Expressing the duality gap in terms of $x$ allows a simple interpretation of weak and strong duality in the convex case. Figure 4 illustrates the gap function in one variable with $S$ being the interval $[a,b]$. Let $x = x^1$. The linear function

$$f(x^1) + \nabla f(x^1)^T (y - x^1)$$

is the tangent line shown. Over $S$ it attains its minimum at $y(x^1) = a$, and by convexity this minimum value must lie below $f(x^*)$. Hence the weak duality result holds: $f(x^*) \ge f(x^1) - G(x^1) = d(x^1)$. Strong duality occurs when $x^1 = x^*$ and the minimum of the linear function (i.e., the tangent at $x^*$) has the value $f(x^*)$. In this case $G(x^*) = 0$. If $x^*$ were at an interior point of $S$, and/or if $x^1$ is infeasible to $S$, this same interpretation holds provided only that $f(x^1)$ and $\nabla f(x^1)$ are defined.

Lagrangian Duality: BASICS, Figure 4
A one variable interpretation of weak and strong duality


Summary 


This section has illustrated basic results and geometri- 
cal interpretations of Lagrangian duality. The reference 
list below is a selection of texts and journal articles on 
this topic for further reading. 


See also 


> Equality-constrained Nonlinear Programming: KKT 
Necessary Optimality Conditions 

> First Order Constraint Qualifications 

> Inequality-constrained Nonlinear Optimization 

> Kuhn-Tucker Optimality Conditions 

> Rosen’s Method, Global Convergence, and Powell’s 
Conjecture 

> Saddle Point Theory and Optimality Conditions 

> Second Order Constraint Qualifications 

> Second Order Optimality Conditions for Nonlinear 
Optimization 
Optimization problems concern the minimization or maximization of functions over some set of conditions called constraints. The original treatment of constrained optimization problems dealt only with equality constraints, via the introduction of Lagrange multipliers, which found their origin in basic mechanics. Modeling real world situations often requires inequality constraints, leading to more challenging optimization problems. Lagrange multipliers are used in optimality conditions and play a key role in devising algorithms for constrained problems. Summarized here are the basic elements of various algorithms based on Lagrange multipliers for solving constrained optimization problems, and particularly convex optimization problems. A standard formulation of an optimization problem is:


$$(O)\quad \min\{f(x): x \in X \cap C\},$$

where $X$ is a certain subset of $\mathbb{R}^n$ and $C$ is the set of constraints described by equality and inequality constraints

$$C = \left\{x \in \mathbb{R}^n: \begin{array}{ll} g_i(x) \le 0, & i = 1,\ldots,m, \\ h_i(x) = 0, & i = 1,\ldots,p \end{array}\right\}.$$

All the functions in problem (O) are real valued functions on $\mathbb{R}^n$, and the set $X$ can describe more abstract constraints of the problem. A point $x \in X \cap C$ is called a feasible solution of the problem, and an optimal solution is any feasible point where the local or global minimum of $f$ relative to $X \cap C$ is actually attained. By a convex problem we mean the case where $X$ is a convex set, the functions $f, g_1, \ldots, g_m$ are convex and $h_1, \ldots, h_p$ are affine. Recall that a set $S \subseteq \mathbb{R}^n$ is convex if the line segment joining any two different points of $S$ is contained in it.

Let $S$ be a convex subset of $\mathbb{R}^n$. A real valued function $f: S \to \mathbb{R}$ is convex if for any $x, y \in S$ and any $\lambda \in [0,1]$,

$$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y).$$

Convexity plays a fundamental role in optimization (even in nonconvex problems). One of the key facts is that when a convex function is minimized over a convex set, every local optimal solution is global. Another fundamental point is that a powerful duality theory can be developed for convex problems, which, as we shall see, is also at the root of the development and analysis of Lagrangian multiplier methods.


Augmented Lagrangians 


The basic idea of augmented Lagrangian methods for solving constrained optimization problems, also called multiplier methods, is to transform a constrained problem into a sequence of unconstrained problems. The approach differs from penalty-barrier methods [13] in that the functional defining the unconstrained problem to be solved contains, in addition to a penalty parameter, multipliers associated with the constraints. Multiplier methods can be seen as a combination of penalty and dual methods. The motivation for these methods came from the desire to avoid the ill-conditioning associated with the usual penalty-barrier methods. Indeed, in contrast to penalty methods, the penalty parameter need not go to infinity to achieve convergence of the multiplier methods. As a consequence, the augmented Lagrangian has 'good' conditioning, and the methods are robust for solving nonlinear programs. Augmented Lagrangian methods were proposed independently by M.R. Hestenes [16] and M.J.D. Powell [26] for the case of equality constraints, and extended to the case of inequality constraints by R.T. Rockafellar [27]. Many other researchers have contributed to the development of augmented Lagrangian methods; for an excellent and comprehensive study of multiplier methods, see [7] and references therein.


Quadratic Lagrangian 


We start by briefly describing the basic steps involved in generating a multiplier method for the equality constrained problem

$$(E)\quad \min\{f(x): h_i(x) = 0,\ i = 1,\ldots,p\}.$$

Here $f$ and $h_i$ are real valued functions on $\mathbb{R}^n$ and no convexity is assumed (it would not help anyway because of the nonlinear equality constraints). Also, for simplicity, we let $X = \mathbb{R}^n$. The ordinary Lagrangian associated with (E) is

$$l(x,y) = f(x) + \sum_{i=1}^p y_i h_i(x).$$


One of the oldest and simplest ways to solve (E) is by sequential minimization of the Lagrangian ([2]). Namely, we start with an initial multiplier $y^0$ and minimize $l(x, y^k)$ over $x \in \mathbb{R}^n$ to produce $x^{k+1}$. We then update the multiplier sequence via the formula

$$y_i^{k+1} = y_i^k + s_k h_i(x^{k+1}), \quad i = 1,\ldots,p,$$

where $s_k$ is a stepsize parameter. The rationale behind the above method is that it can be simply interpreted as a gradient-type algorithm for solving an associated dual problem. Unfortunately, such a method, while simple, requires too many assumptions on the problem's data to generate points converging rapidly toward an optimal solution. Thus this primal-dual framework is not, in general, particularly attractive. However, combining the primal-dual idea with that of penalty leads to another class of algorithms called multiplier methods. In these methods one uses, instead of the classical Lagrangian $l(x,y)$, a 'penalized' Lagrangian of the form:


$$P_c(x,y) = f(x) + \sum_{i=1}^p y_i h_i(x) + \frac{c}{2} \sum_{i=1}^p h_i(x)^2,$$

where $c > 0$ is a penalty parameter. Then, starting with an initial multiplier $y^0$ and penalty parameter $c_0$, the augmented Lagrangian $P_c$ is minimized with respect to $x$, and at the end of each minimization the multipliers (and sometimes also the penalty parameter) are updated according to some scheme; the process is continued until convergence. More precisely, the method of multipliers generates the sequences $\{y^k\} \subset \mathbb{R}^p$, $\{x^k\} \subset \mathbb{R}^n$ as follows. Given a sequence of nondecreasing scalars $c_k > 0$, compute

$$x^{k+1} \in \operatorname{argmin}\{P_{c_k}(x, y^k): x \in \mathbb{R}^n\},$$

$$y_i^{k+1} = y_i^k + c_k h_i(x^{k+1}), \quad i = 1,\ldots,p.$$
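A minimal sketch of this scheme on a hypothetical problem chosen for illustration: minimize $x_1^2 + x_2^2$ subject to $h(x) = x_1 + x_2 - 1 = 0$, whose solution is $x^* = (1/2, 1/2)$ with multiplier $y^* = -1$. Because $P_c(\cdot, y)$ is quadratic here, the inner minimization reduces to a linear system; in general it would be carried out by an unconstrained minimization routine. The penalty parameter is kept fixed at $c = 10$ to show that it need not grow.

```python
import numpy as np

a = np.array([1.0, 1.0])  # h(x) = a^T x - b
b = 1.0


def method_of_multipliers(y0=0.0, c=10.0, iters=20):
    y = y0
    for _ in range(iters):
        # P_c(x, y) = ||x||^2 + y (a^T x - b) + c/2 (a^T x - b)^2 is quadratic in x;
        # its gradient 2x + y a + c (a^T x - b) a vanishes where
        # (2 I + c a a^T) x = -y a + c b a.
        H = 2.0 * np.eye(2) + c * np.outer(a, a)
        rhs = -y * a + c * b * a
        x = np.linalg.solve(H, rhs)
        y = y + c * (a @ x - b)  # multiplier update y^{k+1} = y^k + c h(x^{k+1})
    return x, y


x, y = method_of_multipliers()
print(x, y)
```

For this problem one can check by hand that the multiplier error shrinks by the factor $1/(1+c)$ per iteration, so convergence is fast even though $c$ stays fixed.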

The rationale behind the updating of the multipliers $y^k$ is that if the generated sequence $x^k$ converges to a local minimum, then the sequence $y^k$ will converge to the corresponding Lagrange multiplier $y^*$. Under reasonable assumptions, this happens without increasing the parameter $c_k$ to infinity and thus avoids the difficulty with ill-conditioning. The above scheme provides the key steps in devising a multiplier method for equality constrained optimization problems. We now turn to the case of problems with inequality constraints:


$$(I)\quad \min\{f(x): g_i(x) \le 0,\ i = 1,\ldots,m\}.$$


One simple way to treat this case is to transform the inequality constraints into equalities using squared slack variables and then apply the multiplier framework previously outlined. Thus, we convert problem (I) into the equality constrained problem in the variables $(x,z)$:

$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & g_i(x) + z_i^2 = 0, \quad i = 1,\ldots,m, \end{array}$$

where $z \in \mathbb{R}^m$ are additional variables. The quadratic augmented Lagrangian to be minimized with respect to $(x,z)$ thus takes the form:

$$Q_c(x,z,y) = f(x) + \sum_{i=1}^m y_i \left(g_i(x) + z_i^2\right) + \frac{c}{2} \sum_{i=1}^m \left(g_i(x) + z_i^2\right)^2.$$

The key observation here is that the minimization with respect to $z$ can be carried out analytically. One can verify via simple calculus that for fixed $(x,y)$, $\min_{z \in \mathbb{R}^m} Q_c(x,z,y) = L_c(x,y)$, with

$$L_c(x,y) = f(x) + \frac{1}{2c} \sum_{i=1}^m \left[\max\{0, y_i + c g_i(x)\}^2 - y_i^2\right].$$
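The 'simple calculus' can be spot-checked numerically. For one constraint, with $w = z_i^2 \ge 0$, the term $y_i(g_i + w) + \frac{c}{2}(g_i + w)^2$ is minimized at $w = \max\{0, -g_i - y_i/c\}$. The sketch below compares a brute-force minimization over a grid of $w$ values with the closed form $\frac{1}{2c}(\max\{0, y_i + c g_i\}^2 - y_i^2)$; the grid bounds and test values are arbitrary choices.

```python
def penalty_term_bruteforce(gi, yi, c, w_max=20.0, n=20_001):
    # min over w = z_i^2 >= 0 of yi*(gi + w) + c/2*(gi + w)^2, on a grid of w
    best = float("inf")
    for k in range(n):
        w = w_max * k / (n - 1)
        best = min(best, yi * (gi + w) + 0.5 * c * (gi + w) ** 2)
    return best


def penalty_term_closed_form(gi, yi, c):
    return (max(0.0, yi + c * gi) ** 2 - yi ** 2) / (2.0 * c)


for gi, yi, c in [(0.5, 1.0, 2.0), (-3.0, 1.0, 2.0), (-0.2, 0.0, 10.0)]:
    print(gi, yi, c,
          penalty_term_bruteforce(gi, yi, c),
          penalty_term_closed_form(gi, yi, c))
```

The first case has an active slack minimum at $w = 0$, while the other two have an interior minimizer $w > 0$, so both branches of the $\max$ are exercised.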


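Summarizing, the multiplier method for the inequality constrained problem (I) consists of the following two steps:

$$x^{k+1} \in \operatorname{argmin}\{L_{c_k}(x, y^k): x \in \mathbb{R}^n\},$$

$$y^{k+1} = \max\{0, y^k + c_k g(x^{k+1})\},$$

where the maximum is taken componentwise.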
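A minimal sketch of these two steps for a hypothetical one-dimensional problem chosen for illustration: minimize $(x-2)^2$ subject to $g(x) = x - 1 \le 0$, with solution $x^* = 1$ and multiplier $y^* = 2$. Since $L_c(\cdot, y)$ is convex with the nondecreasing derivative $2(x-2) + \max\{0, y + c(x-1)\}$, the inner minimization is done here by bisection on that derivative; the bracket $[-100, 100]$, the fixed $c = 5$ and the iteration counts are illustrative assumptions.

```python
def dL(x, y, c):
    # derivative in x of L_c(x, y) = (x-2)^2 + (max(0, y + c(x-1))^2 - y^2) / (2c)
    return 2.0 * (x - 2.0) + max(0.0, y + c * (x - 1.0))


def argmin_Lc(y, c, lo=-100.0, hi=100.0, iters=200):
    # bisection on the nondecreasing derivative of the convex function L_c(., y)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dL(mid, y, c) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def multiplier_method(y=0.0, c=5.0, outer=30):
    for _ in range(outer):
        x = argmin_Lc(y, c)                # x^{k+1} minimizes L_c(., y^k)
        y = max(0.0, y + c * (x - 1.0))    # y^{k+1} = max{0, y^k + c g(x^{k+1})}
    return x, y


x, y = multiplier_method()
print(x, y)
```

The iterates approach $x^* = 1$ and $y^* = 2$ with $c$ held fixed, consistent with the remark that the penalty parameter need not be driven to infinity.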

For the general optimization problem (O), namely the case of mixed equality and inequality constraints, Lagrangian multiplier methods can be developed in a similar fashion. Convergence results to a local minimum for the above schemes can be established under second order sufficiency assumptions ([7,28]). In the case of convex programs, namely when in problem (I) the functions $f, g_1, \ldots, g_m$ are assumed convex (or, more generally, in problem (O), if we also assume $h_i$ affine and $X$ convex), much stronger convergence results can be established under mild assumptions ([29]). A typical result is as follows.


Assumption 1 The set of optimal solutions of the convex problem (I) is nonempty and compact, and the set of multipliers is nonempty and compact.

The assumption on the optimal set of multipliers is guaranteed under the standard Slater constraint qualification:

$$\exists\, \hat{x}: \ g_i(\hat{x}) < 0, \quad i = 1,\ldots,m.$$

Under Assumption 1, one can prove that the sequence $y^k$ converges to some Lagrange multiplier $y^*$ and any limit point of the sequence $x^k$ is an optimal solution of the convex program. Note that we do not require $c_k$ to be sufficiently large, and convergence is obtained from any starting point $y^0 \in \mathbb{R}^m$.

The multiplier method for inequality constrained 
problems was derived by using slack variables in the 
inequality constraints and then by applying the multi- 
plier method which was originally devised for problems 
having only equality constraints. An alternative way of 
constructing an augmented Lagrangian method is via 
the proximal framework. 


Proximal Minimization 


Consider the convex optimization problem

$$(C)\quad \min\{F(x): x \in \mathbb{R}^n\},$$

where $F: \mathbb{R}^n \to (-\infty, +\infty]$ is a proper, lower semicontinuous convex function. One method to solve (C) is to 'regularize' the objective function using the proximal map of J.-J. Moreau [22]. Given a real positive number $c$, a proximal approximation of $F$ is defined by:

$$F_c(x) = \inf_u \left\{F(u) + (2c)^{-1} \|x - u\|^2\right\}. \qquad (1)$$

The resulting function $F_c$ enjoys several important properties: it is convex and differentiable with a gradient that is Lipschitz with constant $c^{-1}$, and, when minimized, it possesses the same set of minimizers and the same optimal value as problem (C). The quadratic regularization process of the function $F$ leads to an iterative procedure for solving problem (C), called the proximal point algorithm [21,30]. The method is as follows: given an initial point $x^0 \in \mathbb{R}^n$, a sequence $\{x^k\}$ is generated by solving

$$x^{k+1} = \operatorname{argmin}_x \left\{F(x) + \frac{1}{2c_k}\|x - x^k\|^2\right\}, \qquad (2)$$

where $\{c_k\}_{k=0}^\infty$ is a sequence of positive numbers.
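As a minimal sketch of (1) and (2), assume $n = 1$ and the hypothetical choice $F(x) = |x|$, which is not from the source. Then the minimization in (2) has the closed-form soft-thresholding solution $x^{k+1} = \operatorname{sign}(x^k)\max\{|x^k| - c_k, 0\}$, and the Moreau envelope $F_c$ is the Huber function, equal to $x^2/(2c)$ for $|x| \le c$ and $|x| - c/2$ otherwise.

```python
import math


def prox_abs(x, c):
    """argmin_u |u| + (u - x)^2 / (2c), i.e. soft-thresholding."""
    return math.copysign(max(abs(x) - c, 0.0), x)


def moreau_envelope_abs(x, c):
    """F_c(x) = inf_u |u| + (x - u)^2 / (2c), evaluated at the prox point."""
    u = prox_abs(x, c)
    return abs(u) + (x - u) ** 2 / (2.0 * c)


def proximal_point(x0, cs):
    xs = [x0]
    for c in cs:
        xs.append(prox_abs(xs[-1], c))  # one step of (2) with c_k = c
    return xs


print(proximal_point(5.0, [1.0] * 7))  # iterates x^0, ..., x^7
print(moreau_envelope_abs(0.5, 1.0), moreau_envelope_abs(3.0, 1.0))
```

In this example the iterates reach the minimizer 0 after finitely many steps, and the envelope is differentiable everywhere even though $|x|$ is not at the origin, matching the properties listed above.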

One of the most powerful applications of the proximal algorithm is to the dual of an optimization problem. Indeed, as shown by Rockafellar [27,29], a direct calculation shows that $L_c$ can be written as

$$L_c(x, y) = \max_{\lambda \in \mathbb{R}^m_+} \left\{ l(x, \lambda) - \frac{1}{2c}\|\lambda - y\|^2 \right\}, \quad (3)$$

where the maximum is attained uniquely at $\lambda_i = \max\{0,\ y_i + c\,g_i(x)\}$, $i = 1, \dots, m$. Here $l: \mathbb{R}^n \times \mathbb{R}^m_+ \to \mathbb{R}$ denotes the usual Lagrangian associated with the inequality constrained problem (I), and $\mathbb{R}^m_+$ stands for the nonnegative orthant. This shows that the quadratic augmented Lagrangian is nothing else but the Moreau proximal regularization of the ordinary Lagrangian, and that the quadratic multiplier method can be interpreted as applying the proximal minimization algorithm to the dual problem associated with (I):


$$(D)\qquad \sup\{d(y) : y \ge 0\},$$

where $d(y) := \inf_x l(x, y)$ is the dual functional. This interplay between the proximal algorithm and multiplier methods is particularly interesting, since it makes it possible to design multiplier methods and analyze their convergence properties through the proximal framework, and it also suggests useful extensions of multiplier methods, which are discussed next.
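The maximization in (3) is separable in $\lambda$ and can be carried out explicitly. The following short derivation is our own expansion of the statement above. It assumes the standard Lagrangian $l(x, \lambda) = f(x) + \sum_i \lambda_i g_i(x)$ of (I), which is not displayed in this section. Each component gives

$$\max_{\lambda_i \ge 0}\left\{\lambda_i g_i(x) - \frac{1}{2c}(\lambda_i - y_i)^2\right\} = \begin{cases} y_i g_i(x) + \frac{c}{2}\,g_i(x)^2, & y_i + c\,g_i(x) \ge 0,\\[2pt] -\dfrac{y_i^2}{2c}, & y_i + c\,g_i(x) < 0,\end{cases}$$

and both branches equal $(2c)^{-1}\left(\max\{0, y_i + c\,g_i(x)\}^2 - y_i^2\right)$, so that

$$L_c(x, y) = f(x) + \frac{1}{2c}\sum_{i=1}^{m}\left(\max\{0,\ y_i + c\,g_i(x)\}^2 - y_i^2\right).$$

This is the familiar quadratic augmented Lagrangian for inequality constraints. The two branches join with matching first derivatives where $y_i + c\,g_i(x) = 0$, but there the Hessian jumps by $c\,\nabla g_i(x)\nabla g_i(x)^T$. This is the source of the differentiability issue discussed in the next subsection.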


Modified Lagrangians 


One of the main disadvantages of quadratic multiplier methods for inequality constrained problems is that, even when the problem data are twice continuously differentiable, the corresponding functional $L_c$ is not. Indeed, with twice continuously differentiable data $\{f, g_i\}$, the augmented Lagrangian $L_c$ is continuously differentiable in $x$. However, its Hessian matrix is discontinuous at all $x$ such that $g_i(x) = -c^{-1}y_i$ for some $i$. This may cause difficulties in designing an efficient unconstrained minimization algorithm for $L_c$. It also motivates the search for alternative augmented Lagrangians to handle inequality constrained problems, which we call here modified Lagrangians. These Lagrangians possess better differentiability properties, which allow the use of efficient Newton-like methods in the minimization step. Modified Lagrangians can be found in several works [1,15,19,20]. An approach originally developed in [19] proposed a class of methods which uses, instead of $L_c$, a modified Lagrangian of the form:


$$B_c(x, y) = f(x) + c^{-1}\sum_{i=1}^{m} y_i\,\psi(c\,g_i(x)),$$

where $\psi$ is a scalar penalty function which is at least $C^2$ and satisfies some other technical conditions. For each choice of $\psi$ we then have a multiplier method, which consists of the sequence of unconstrained minimization problems

$$x^{k+1} \in \arg\min_{x \in \mathbb{R}^n} B_{c_k}(x, y^k),$$

followed by the multiplier updates

$$y_i^{k+1} = y_i^k\,\psi'(c_k\,g_i(x^{k+1})), \quad i = 1, \dots, m.$$

The multiplier updating formula can be explained simply as follows. Suppose the functions in problem (I) are differentiable. Then the fact that $x^{k+1}$ minimizes $B_{c_k}(x, y^k)$ means that $\nabla_x B_{c_k}(x^{k+1}, y^k) = 0$, i.e.

$$\nabla f(x^{k+1}) + \sum_{i=1}^{m} y_i^k\,\psi'(c_k\,g_i(x^{k+1}))\,\nabla g_i(x^{k+1}) = 0,$$

and using the multiplier updates defined above, the equation reduces to

$$\nabla f(x^{k+1}) + \sum_{i=1}^{m} y_i^{k+1}\,\nabla g_i(x^{k+1}) = 0.$$

This shows that $(x^{k+1}, y^{k+1})$ also satisfies the optimality conditions for minimizing the classical Lagrangian, namely $\nabla_x l(x^{k+1}, y^{k+1}) = 0$. Interesting special cases of the generic method described above include the exponential method ([23,35]), with the choice $\psi(t) = e^t - 1$, and the modified barrier method [24], which is based on the choice $\psi(t) = -\ln(1 - t)$. More examples and further analysis of these methods can be found in [25].
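The following is a minimal sketch of the exponential multiplier method ($\psi(t) = e^t - 1$) on a hypothetical one-dimensional instance of (I) chosen only for illustration: minimize $x^2$ subject to $1 - x \le 0$, whose solution is $x^* = 1$ with multiplier $y^* = 2$. The inner minimization of $B_c$ is done by bisection on its derivative. This works here only because $B_c$ is strictly convex in $x$ for this instance, and the bracket used is specific to it. A practical implementation would use a Newton-type inner solver, as discussed above.

```python
import math

# Hypothetical toy instance of problem (I), chosen for illustration only:
#   minimize f(x) = x^2  subject to  g(x) = 1 - x <= 0,
# whose solution is x* = 1 with Lagrange multiplier y* = 2.


def f_prime(x):
    return 2.0 * x


def g(x):
    return 1.0 - x


def g_prime(x):
    return -1.0


def dpsi(t):
    """psi'(t) for the exponential penalty psi(t) = e^t - 1."""
    return math.exp(t)


def minimize_B(c, y, iters=200):
    """Approximately minimize B_c(x, y) = f(x) + (y / c) * psi(c * g(x)) over x.

    B_c is strictly convex in x for this instance, so we bisect on its derivative
    f'(x) + y * psi'(c g(x)) * g'(x). The bracket [0, max(1, y)] is specific to this
    example: for y > 0 the derivative is negative at 0 and positive at the right end.
    """
    lo, hi = 0.0, max(1.0, y)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_prime(mid) + y * dpsi(c * g(mid)) * g_prime(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def exponential_multiplier_method(c=1.0, y=1.0, iters=60):
    x = None
    for _ in range(iters):
        x = minimize_B(c, y)          # x^{k+1} in argmin_x B_c(x, y^k)
        y = y * dpsi(c * g(x))        # y^{k+1} = y^k * psi'(c * g(x^{k+1}))
    return x, y


x, y = exponential_multiplier_method()
print(x, y)   # close to (1, 2)
```

The multipliers stay positive automatically, because each update multiplies $y^k$ by $e^{c\,g(x^{k+1})} > 0$. No projection onto the nonnegative orthant is needed.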

Another way of constructing modified Lagrangians, in view of the results of the previous section, is to try alternative proximal regularization terms which could lead to better differentiability properties of the corresponding augmented Lagrangian functional. This approach was considered in [32], which suggested new classes of proximal approximations of a function, given by

$$F_\lambda^D(x) := \inf_u \left\{ F(u) + \lambda^{-1} D(u, x) \right\}, \quad \lambda > 0. \quad (4)$$


Here $D(\cdot, \cdot)$, which replaces the quadratic proximal term in (1), is a measure of 'closeness' between $x$ and $y$ satisfying $D(x, y) \ge 0$, with equality if and only if $x = y$. One generic form for $D$ is the use of a 'proximal-like' term defined by

$$D(x, y) = d_\varphi(x, y) = \sum_{i=1}^{m} y_i\,\varphi(y_i^{-1}x_i),$$

where $\varphi$ is a given convex function defined on the nonnegative real line which satisfies some technical conditions ([33]). The motivation for using such a functional is the desire to eliminate nonnegativity constraints such as those present in the dual problem. Thus, by mimicking (2) and (3) with the proximal term $d_\varphi$, one can design a wide variety of modified Lagrangian methods with an appropriate choice of $\varphi$.
The basic steps of the resulting modified multiplier method can be described as follows. Given a sequence of positive numbers $\{c_k\}$ and initial points $x^0 \in \mathbb{R}^n$, $y^0 \in \mathbb{R}^m_{++}$ (the positive orthant), generate the next points iteratively by solving

$$x^{k+1} \in \arg\min\left\{ M_{c_k}(x, y^k) : x \in \mathbb{R}^n \right\}, \quad (5)$$

followed by the multiplier updates

$$y^{k+1} = \arg\max_{y \ge 0}\left\{ y^T g(x^{k+1}) - c_k^{-1}\, d_\varphi(y, y^k) \right\}, \quad (6)$$

where $M_c$ is the modified Lagrangian defined by

$$M_c(x, y) = \sup_{\mu \in \mathbb{R}^m_+}\left\{ l(x, \mu) - c^{-1}\, d_\varphi(\mu, y) \right\}, \quad (7)$$

i.e., the proximal-like regularization of the usual Lagrangian $l(x, \mu)$ associated with problem (I). In equation (6), $g(x)$ denotes the column vector $(g_1(x), \dots, g_m(x))^T \in \mathbb{R}^m$, and the superscript $T$ denotes transposition. The method is viable because both (6) and (7) can be solved analytically, so the computational analysis and effort should concentrate on (5). This method of multipliers is nothing else but a proximal-like algorithm applied to the dual problem (D) ([17]), i.e. starting with $y^0 \in \mathbb{R}^m_{++}$, generate a sequence $\{y^k\}$ by solving

$$y^{k+1} = \arg\max_{y \ge 0}\left\{ d(y) - c_k^{-1}\, d_\varphi(y, y^k) \right\}.$$
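As a worked check of the claim that (6) and (7) can be solved analytically, take the entropy-like kernel $\varphi(t) = t\ln t - t + 1$. This is our choice for illustration; the cited works treat general $\varphi$. For this kernel, $d_\varphi(\mu, y) = \sum_i \left[\mu_i \ln(\mu_i / y_i) - \mu_i + y_i\right]$. With $l(x, \mu) = f(x) + \sum_i \mu_i g_i(x)$, the supremum in (7) separates over $i$. Setting the derivative of $\mu_i g_i(x) - c^{-1}\left[\mu_i\ln(\mu_i/y_i) - \mu_i + y_i\right]$ to zero gives $\mu_i = y_i e^{c\,g_i(x)}$. Substituting, and treating (6) in the same way,

$$M_c(x, y) = f(x) + c^{-1}\sum_{i=1}^{m} y_i\left(e^{c\,g_i(x)} - 1\right), \qquad y_i^{k+1} = y_i^k\, e^{c_k g_i(x^{k+1})}.$$

This is exactly $B_c$ with $\psi(t) = e^t - 1$, together with its update $y_i^{k+1} = y_i^k\,\psi'(c_k g_i(x^{k+1}))$. The exponential multiplier method of the previous subsection is therefore recovered as a special case of (5)-(7).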


The above scheme gives rise to a rich family of numerical methods, which includes (with an appropriate choice of $\varphi$) several classes of nonquadratic multiplier methods ([7,24,35]). One of the main advantages of these modified multiplier methods is that, in contrast with the usual quadratic augmented Lagrangian function, the modified Lagrangian is twice continuously differentiable for various choices of $d_\varphi$ whenever the problem's data $f, g$ are. This opens the possibility of using Newton methods to solve (5) efficiently.

Under assumption 1 and appropriate conditions on the kernel $\varphi$, one can prove convergence results for these modified multiplier methods similar to those obtained in the quadratic case ([17]). There has been considerable recent research on modified Lagrangian methods; for further results see [3,4,5,11,18,25].

The Lagrangian functional plays a central role in the 
analysis and algorithmic development of constrained 
optimization problems. Lagrangian based methods 
and the related proximal framework have been used 
in other optimization contexts, such as convexifica- 
tion of nonconvex optimization problems [6,28], de- 
composition algorithms [9,12,31,34], semidefinite pro- 
gramming [10] and in many other applications, see 
e.g., [8,14] where more references can be found. 


See also 


> Convex Max-functions 

> Decomposition Techniques for MILP: Lagrangian 
Relaxation 

> Integer Programming: Lagrangian Relaxation 

> Lagrange, Joseph-Louis 

> Multi-objective Optimization: Lagrange Duality


Abstract


The Laplace method has found many applications in the theoretical and applied study of optimization problems. It has been used to study the asymptotic behavior of stochastic algorithms and 'phase transitions' in combinatorial optimization, and as a smoothing technique for non-differentiable minimax problems. This article describes the theoretical foundation and practical applications of this useful technique.


Background 


Laplace's method is based on an ingenious trick used by Laplace in one of his papers [19]. The technique is most frequently used to perform asymptotic evaluations of integrals that depend on a scalar parameter $t$, as $t$ tends to infinity. Its use can be theoretically justified for integrals of the following form:

$$I(t) = \int_A \exp\left\{ -\frac{f(x)}{T(t)} \right\} d\lambda(x),$$

where $f: \mathbb{R}^n \to \mathbb{R}$ and $T: \mathbb{R} \to \mathbb{R}$ are assumed to be smooth, and $T(t) \to 0$ as $t$ tends to $\infty$. $A$ is some compact set, and $\lambda$ is some measure on $\mathcal{B}$ (the $\sigma$-field generated by $A$). Since $A$ is compact, the continuous function $f$ has a global minimum in $A$. For simplicity, assume that the global minimizer $x^*$ is unique and that it lies in the interior of $A$. Under these conditions, as $t$ tends to infinity, only points in the immediate neighborhood of $x^*$ contribute to the asymptotic expansion of $I(t)$. This argument can be made precise, and the complete proof can be found in [2] and in [4]. Instead, we give a heuristic but didactic argument that is usually used when introducing the method.


Heuristic Foundations of the Method 


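For the purpose of this subsection only, assume that $f$ is a function of one variable and that $A$ is some interval $[a, b]$. It will be instructive to justify the method using the one-dimensional integral

$$K(t) = \int_a^b \exp\left\{ -\frac{f(x)}{t} \right\} dx.$$

Here $t$ plays the role of $T(t)$, so the regime of interest is $t \to 0$. Suppose that $f$ has a unique global minimum, say $c$, such that $c \in (a, b)$ and $f''(c) > 0$. Since $t$ is assumed to be small, we only need to take into account points near $c$ when evaluating $K(t)$. We therefore approximate $K(t)$ by $K(t; \epsilon)$, which is given by:

$$K(t; \epsilon) = \int_{c-\epsilon}^{c+\epsilon} \exp\left\{ -\frac{f(x)}{t} \right\} dx.$$

Expanding $f$ to second order and noting that $f'(c) = 0$, we obtain the following approximation:

$$K(t; \epsilon) \approx \int_{c-\epsilon}^{c+\epsilon} \exp\left\{ -\frac{f(c) + \frac12 f''(c)(x - c)^2}{t} \right\} dx = \exp\left\{ -\frac{f(c)}{t} \right\} \int_{c-\epsilon}^{c+\epsilon} \exp\left\{ -\frac{f''(c)(x - c)^2}{2t} \right\} dx.$$

The limits of the integral above can be extended to infinity. This extension is justified by the fact that only points around $c$ contribute to the asymptotic evaluation of the integral:

$$K(t; \epsilon) \approx \exp\left\{ -\frac{f(c)}{t} \right\} \int_{-\infty}^{+\infty} \exp\left\{ -\frac{f''(c)(x - c)^2}{2t} \right\} dx = \exp\left\{ -\frac{f(c)}{t} \right\} \sqrt{\frac{2\pi t}{f''(c)}}.$$

In conclusion we have that:

$$K(t) \sim \exp\left\{ -\frac{f(c)}{t} \right\} \sqrt{\frac{2\pi t}{f''(c)}}, \qquad t \to 0.$$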

Rigorous justifications of the above arguments can be 
found in [4]. These types of results are standard in the 
field of asymptotic analysis. The same ideas can be ap- 
plied to optimization problems. 
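The heuristic result can be checked numerically. The sketch below compares $K(t)$, computed by composite Simpson quadrature, with the Laplace approximation for a hypothetical $f(x) = x^2/2 + x^4$ on $[a, b] = [-1, 2]$, which has $c = 0$, $f(c) = 0$ and $f''(c) = 1$. The function, interval and grid size are illustrative assumptions. For this $f$ the relative error is known to behave like $3t$, so the ratio should approach 1 as $t \to 0$.

```python
import math


def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


# Hypothetical example: on [a, b] = [-1, 2], f has its unique minimum at c = 0,
# with f(c) = 0 and f''(c) = 1.
def f(x):
    return 0.5 * x * x + x ** 4


def laplace_ratio(t, a=-1.0, b=2.0, c=0.0, f2=1.0):
    """K(t) computed by quadrature, divided by exp(-f(c)/t) * sqrt(2 pi t / f''(c))."""
    k_numeric = simpson(lambda x: math.exp(-f(x) / t), a, b)
    k_laplace = math.exp(-f(c) / t) * math.sqrt(2.0 * math.pi * t / f2)
    return k_numeric / k_laplace


for t in (1e-1, 1e-2, 1e-3):
    print(t, laplace_ratio(t))   # the ratio approaches 1 as t -> 0
```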


Applications 


Consider the following problem:

$$F^* = \min f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \quad i = 1, \dots, m. \quad (1)$$

Let $S$ denote the feasible region of the problem above, and assume that it is nonempty and compact. Then

$$\lim_{t \to \infty} -T(t)\ln c(t) = F^*, \quad (2)$$

where

$$c(t) = \int_S \exp\left\{ -\frac{f(x)}{T(t)} \right\} d\lambda(x), \quad (3)$$

and $\lambda$ is a suitable measure on $(\mathbb{R}^n, \mathcal{B})$, such as Lebesgue measure. A proof of Eq. (2) can be found in [16].
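Eq. (2) can be illustrated numerically on a hypothetical one-dimensional instance of (1): minimize $(x - 2)^2$ subject to $x - 1 \le 0$ and $-x \le 0$, so $S = [0, 1]$ and $F^* = 1$ is attained at the boundary point $x = 1$. The sketch varies $T$ directly instead of going through a schedule $T(t)$, and it uses Lebesgue measure on $S$. For small $T$ the integrand underflows, so the exponent is shifted by the smallest sampled value of $f$, in the spirit of the usual log-sum-exp trick.

```python
import math


def neg_T_log_c(f, a, b, T, n=20000):
    """-T ln c(T) for c(T) = integral over S = [a, b] of exp(-f(x)/T) dx (Simpson rule).

    Lebesgue measure on S is assumed. To avoid underflow for small T the exponent is
    shifted by m, the smallest value of f on the grid:
        -T ln c(T) = m - T ln( integral of exp(-(f(x) - m)/T) dx ).
    n must be even.
    """
    h = (b - a) / n
    fs = [f(a + i * h) for i in range(n + 1)]
    m = min(fs)
    s = 0.0
    for i, fx in enumerate(fs):
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        s += w * math.exp(-(fx - m) / T)
    return m - T * math.log(s * h / 3.0)


# Hypothetical instance of (1): minimize (x - 2)^2 subject to g1(x) = x - 1 <= 0 and
# g2(x) = -x <= 0, so S = [0, 1] and F* = 1 is attained on the boundary point x = 1.
def f(x):
    return (x - 2.0) ** 2


for T in (1e-1, 1e-2, 1e-3):
    print(T, neg_T_log_c(f, 0.0, 1.0, T))   # decreases toward F* = 1 as T -> 0
```

The convergence is slow, of order $T\ln(1/T)$, because the minimizer lies on the boundary of $S$.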

The integral in Eq. (3) can be evaluated using the Laplace method. The link between the Laplace method and optimization has been explored in the following areas:

• Stochastic methods for global optimization.

• Phase transitions in combinatorial optimization.

• Algorithms for worst case analysis.

These application areas will be explored next. 


Stochastic Methods for Global Optimization 


Global optimization is concerned with the computation of global solutions of Eq. (1). In other words, one seeks to compute $F^*$ and, if possible, to obtain points from the following set:

$$S^* = \{x \in S \mid f(x) = F^*\}.$$

Often the only practical way to solve such problems is to use a stochastic method. Deterministic methods are also available, but they are usually applicable only to low-dimensional problems. When designing stochastic methods for global optimization, it is often the case that the algorithm can be analyzed as a stochastic process. Then, in order to analyze the behavior of the algorithm, we can examine the asymptotic behavior of the stochastic process. To perform this analysis we need to define a probability measure that has its support in $S^*$. This strategy has been implemented in [3,6,7,8,9,10,16].

A well-known method for obtaining a solution to an unconstrained optimization problem is to consider the following ordinary differential equation (ODE):

$$dX(t) = -\nabla f(X(t))\,dt. \quad (4)$$

By studying the behavior of $X(t)$ for large $t$, it can be shown that $X(t)$ will eventually converge to a stationary point of the unconstrained problem. A review of so-called continuous-path methods can be found in [22]. More recently, application of this method to large scale problems was considered by Li-Zhi et al. [13]. A deficiency of using Eq. (4) to solve optimization problems is that the trajectory can get trapped in local minima. To allow the trajectory to escape from local minima, various authors (e.g. [1,3,7,8,12,16]) have proposed adding a stochastic term that allows the trajectory to "climb" hills. One possible augmentation of Eq. (4) that would enable us to escape from local minima is to add noise. One then considers the diffusion process:

$$dX(t) = -\nabla f(X(t))\,dt + \sqrt{2T(t)}\,dB(t), \quad (5)$$

where $B(t)$ is the standard Brownian motion in $\mathbb{R}^n$.
It has been shown in [3,7,8], under appropriate conditions on $f$, that if the annealing schedule is chosen as follows:

$$T(t) = \frac{c}{\log(2 + t)}, \quad \text{for some } c > c_0, \quad (6)$$

where $c_0$ is a positive constant whose exact value is problem dependent, then, as $t \to \infty$, the transition probability of $X(t)$ converges (weakly) to a probability measure $\Pi$. The latter has its support on the set of global minimizers. A characterization of $\Pi$ was given by Hwang in [11]. It was shown that $\Pi$ is the weak limit, as $t \to \infty$, of the following so-called Boltzmann density:

$$\frac{\exp\{-f(x)/T(t)\}}{\int_{\mathbb{R}^n} \exp\{-f(y)/T(t)\}\,dy}. \quad (7)$$

Discussion of the conditions for the existence of $\Pi$ can be found in [11]. A description of $\Pi$ in terms of the Hessian of $f$ can also be found in [11]. Extensions of these results to constrained optimization problems appear in [16].
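The concentration of the Boltzmann density (7) on the global minimizers can be seen with a deterministic quadrature experiment. This is not a simulation of the diffusion (5). The sketch uses a hypothetical tilted double well $f(x) = (x^2 - 1)^2 + 0.3x$ restricted to $[-2, 2]$, which has a global minimizer near $x = -1$ and a non-global local minimizer near $x = +1$. It computes the probability mass that (7) assigns to $x < 0$ for decreasing temperatures.

```python
import math


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


# Hypothetical tilted double well: global minimizer near x = -1,
# non-global local minimizer near x = +1.
def f(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def mass_near_global(T, a=-2.0, b=2.0):
    """Probability that the Boltzmann density (7), restricted to [a, b], gives to x < 0."""
    shift = f(-1.0)   # close to min f; subtracting it avoids overflow in exp

    def weight(x):
        return math.exp(-(f(x) - shift) / T)

    left, right = simpson(weight, a, 0.0), simpson(weight, 0.0, b)
    return left / (left + right)


for T in (1.0, 0.1, 0.01):
    print(T, mass_near_global(T))   # tends to 1 as T -> 0
```

At high temperature both wells carry substantial mass. As $T$ decreases, the mass of the non-global well vanishes roughly like $\exp\{-\Delta f / T\}$, where $\Delta f \approx 0.6$ is the gap between the two minimum values.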


Phase Transitions in Combinatorial Optimization 


The aim in combinatorial optimization is to select, from a finite set of configurations of the system, the one that minimizes an objective function. The most famous combinatorial problem is the Travelling Salesman Problem (TSP). A large part of theoretical computer science is concerned with estimating the complexity of combinatorial problems. Loosely speaking, the aim of computational complexity theory is to classify problems in terms of their degree of difficulty. One measure of complexity is time complexity, and worst case time complexity is the aspect that has received the most attention. We refer the interested reader to [15] for results in this direction. We will briefly summarize results that have to do with average time complexity, the Laplace method, and phase transitions.

Most of complexity theory is concerned with worst case complexity. However, many useful methods (e.g. the simplex method) require an exponential amount of time to converge only in pathological cases. It is therefore of great interest to estimate average case complexity. The physics community has recently proposed the use of tools from statistical mechanics as one way of estimating average case complexity. A review in the form of a tutorial can be found in [14]. Here we only outline the main ideas.

The first step in the statistical mechanics approach is to define a probability measure on the configurations of the system. This is done with the Boltzmann density:

$$p(C) = \frac{\exp\{-f(C)/t\}}{\sum_{C'} \exp\{-f(C')/t\}}.$$


The preceding equation is of course the discrete version 
of Eq. (7). Using the above definition, the average value 
of the objective function is given by: 


$$\langle f \rangle_t = \sum_{C} p(C)\, f(C).$$
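For a tiny instance, both $p(C)$ and $\langle f \rangle_t$ can be computed by brute-force enumeration. The sketch below uses a hypothetical number partitioning instance, a classic setting for computational phase transitions: split $a = (4, 5, 6, 7, 8)$ into two groups, with $f(C) = |\sum_i s_i a_i|$ for a sign vector $s \in \{-1, +1\}^5$. Exponents are shifted by the minimum cost for numerical stability. Real phase-transition studies work with large random instances and statistical mechanics tools rather than enumeration.

```python
import itertools
import math

# Hypothetical instance: number partitioning of a = (4, 5, 6, 7, 8). A configuration C is a
# sign vector s in {-1, +1}^5 and f(C) = |sum_i s_i a_i| (0 means a perfect partition).
a = [4, 5, 6, 7, 8]
configs = list(itertools.product((-1, 1), repeat=len(a)))
costs = [abs(sum(s * ai for s, ai in zip(sign, a))) for sign in configs]


def boltzmann_average(t):
    """<f>_t = sum_C p(C) f(C) with p(C) proportional to exp(-f(C)/t)."""
    fmin = min(costs)   # shift exponents by the minimum cost for numerical stability
    weights = [math.exp(-(fc - fmin) / t) for fc in costs]
    z = sum(weights)
    return sum(w * fc for w, fc in zip(weights, costs)) / z


for t in (100.0, 1.0, 0.01):
    print(t, boltzmann_average(t))   # decreases toward min f = 0 as t -> 0
```

Since $d\langle f \rangle_t / dt = \mathrm{Var}_t(f)/t^2 \ge 0$, the average cost decreases monotonically as $t$ is lowered, down to the optimal value $\min_C f(C)$.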


Tools and techniques of statistical mechanics can be 
used to calculate ‘computational phase transitions’. 
A computational phase transition is an abrupt change 
in the computational effort required to solve a combi- 
natorial optimization problem. It is beyond the scope of 
this article to elaborate on this interesting area of opti- 
mization. We refer the interested reader to the review 
in [14]. The book of Talagrand [20] presents some rig- 
orous results on this subject. 


Worst Case Optimization 


In many areas where optimization methods can be fruitfully applied, worst case analysis can provide considerable insight into the decision process. The fundamental tool for worst case analysis is the continuous minimax problem:

$$\min_{x \in X} \Phi(x),$$

where $\Phi(x) = \max_{y \in Y} f(x, y)$. The continuous minimax problem arises in numerous disciplines, including $n$-person games, finance, economics and policy optimization (see [18] for a review). In general, minimax formulations are used by the decision maker to assess the worst-case strategy of the opponent and compute the optimal response. The opponent can also be interpreted as nature choosing the worst-case value of the uncertainty, and the solution would be the strategy which ensures the optimal response to the worst case. Neither the robust decision maker nor the opponent would benefit by deviating unilaterally from this strategy. The solution can be characterized as a saddle point when $f(\cdot, y)$ is convex in $x$ and $f(x, \cdot)$ is concave in $y$. A survey of algorithms for computing saddle points can be found in [5,18].

Evaluating $\Phi(x)$ can be extremely difficult, since it requires global optimization over $Y$. Moreover, $\Phi$ is in general non-differentiable. For this reason, many researchers (e.g. [17,21]) have suggested approximating $\Phi(x)$ by means of

$$\Phi(x; t) = \int_Y \exp\left\{ \frac{f(x, y)}{t} \right\} dy.$$

This is another application of the Laplace method, and it can easily be seen that:

$$\lim_{t \to 0}\, t \ln \Phi(x; t) = \Phi(x).$$
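A minimal numerical sketch of this smoothing follows. It assumes the hypothetical instance $f(x, y) = (x - y)^2$ with $Y = [-1, 1]$, for which $\Phi(x) = (|x| + 1)^2$ is nonsmooth at $x = 0$. It evaluates $t \ln \Phi(x; t)$ by Simpson quadrature at $x = 0.3$, where $\Phi(0.3) = 1.69$. Before exponentiating, the exponent is shifted by the maximum of $f(x, \cdot)$, which for this convex-in-$y$ example is attained at an endpoint of $Y$.

```python
import math


def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


# Hypothetical instance: f(x, y) = (x - y)^2 with Y = [-1, 1], so that
# Phi(x) = max_y f(x, y) = (|x| + 1)^2, which is not differentiable at x = 0.
def f(x, y):
    return (x - y) ** 2


def smoothed_phi(x, t, lo=-1.0, hi=1.0):
    """t ln Phi(x; t) with Phi(x; t) = integral over Y of exp(f(x, y)/t) dy.

    f(x, .) is convex, so its maximum over Y is at an endpoint; we shift the exponent
    by that value M to avoid overflow: t ln Phi(x; t) = M + t ln(integral of exp((f - M)/t)).
    """
    M = max(f(x, lo), f(x, hi))
    integral = simpson(lambda y: math.exp((f(x, y) - M) / t), lo, hi)
    return M + t * math.log(integral)


x = 0.3
for t in (1e-1, 1e-2, 1e-3):
    print(t, smoothed_phi(x, t))   # approaches Phi(0.3) = 1.69 as t -> 0
```

For any $t > 0$ the surrogate is a smooth function of $x$, and the error shrinks roughly like $t\ln(1/t)$ in this example.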


This idea has been implemented in [17,21] with considerable success.


The trust region (TR) problem consists in minimizing a general quadratic function $q: \mathbb{R}^n \to \mathbb{R}$ of the type

$$q(x) = \frac12 x^T Q x + c^T x$$

subject to an ellipsoidal constraint $x^T H x \le r^2$, where $H$ is a symmetric positive definite matrix and $r$ a positive scalar. By a change of variables we may assume, without loss of generality, that $H = I$. Hence the TR problem is

$$\min\ q(x) \quad \text{s.t.} \quad \|x\|^2 \le r^2, \quad (1)$$

where $\|\cdot\|$ denotes the $\ell_2$ norm.


Interest in this problem initially arose in the context of unconstrained optimization, where $q(x)$ is a local quadratic model of the objective function which is 'trusted' to be valid over a restricted ellipsoidal region centered around the current iterate. Later, however, problems with the same structure as (1) were shown to be at the basis of algorithms for solving general constrained nonlinear programming problems (e.g. [2,14,19,21,27,28] and references therein), and for obtaining bounds for integer programming problems (e.g. [10,11,12,17,18,26]; cf. also ▸ Integer programming).

Many papers have been devoted to the study of the specific features of Problem (1). It is well known [7,22] that a feasible point $x^*$ is a global solution of (1) if and only if there exists a scalar $\lambda^* \ge 0$ such that the following KKT conditions are satisfied:

$$(Q + \lambda^* I)x^* = -c, \qquad \lambda^*\left(\|x^*\|^2 - r^2\right) = 0,$$

and furthermore $Q + \lambda^* I \succeq 0$, where $\succeq$ denotes positive semidefiniteness of the matrix.

Note that a complete characterization of global minimizers is given without requiring any convexity assumption on the matrix $Q$. Moreover, it has been proved that an approximation to the global solution can be computed in polynomial time (see, for example, [1,24,25]). Hence Problem (1) can be considered an 'easy' problem from a theoretical point of view. These peculiarities led to the development of 'ad hoc' algorithms for finding a global solution of Problem (1). The first ones, proposed in [7,16,22], were essentially based on the solution of a sequence of linear systems of the type $(Q + \lambda_k I)x = -c$ for a sequence $\{\lambda_k\}$. These algorithms produce an approximate global minimizer of Problem (1), but they rely on the ability to compute a Cholesky factorization of the matrix $(Q + \lambda_k I)$ at each iteration $k$. Hence these methods are appropriate when forming a factorization for different values of $\lambda_k$ is realistic in terms of both memory and time requirements. They can be used for large scale problems with special structure, but in the general case, when no sparsity pattern is known, one cannot rely on factorizations of the matrices involved.
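For small dense problems the factorization-based idea can be sketched as follows. This is a simplified illustration in the spirit of [7,16,22], not the cited algorithms themselves, which use safeguarded Newton-type updates of $\lambda$. We bisect on $\lambda$ until $\|x(\lambda)\| = r$, where $(Q + \lambda I)x(\lambda) = -c$, and we use a failed Cholesky factorization to detect $\lambda$ values for which $Q + \lambda I$ is not positive definite. Two assumptions apply: $c \neq 0$, and the 'easy case', in which the boundary equation has a root with $Q + \lambda I$ positive definite. The 'hard case' is not handled.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T, or None if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve L L^T x = b by forward and back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def solve_shifted(Q, c, lam):
    """x(lam) with (Q + lam I) x = -c, or None if Q + lam I is not positive definite."""
    n = len(Q)
    L = cholesky([[Q[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)])
    return None if L is None else chol_solve(L, [-ci for ci in c])


def trust_region_dense(Q, c, r, iters=200):
    """Global solution (x, lam) of min 1/2 x^T Q x + c^T x s.t. ||x|| <= r (easy case, c != 0)."""
    x = solve_shifted(Q, c, 0.0)
    if x is not None and norm(x) <= r:
        return x, 0.0                       # interior solution, lam* = 0
    # Gershgorin: every eigenvalue of Q is >= -G, so at hi the matrix Q + hi I is
    # positive definite and ||x(hi)|| <= ||c|| / (||c|| / r) = r.
    G = max(sum(abs(q) for q in row) for row in Q)
    lo, hi = 0.0, G + norm(c) / r
    for _ in range(iters):                  # bisection on ||x(lam)|| = r
        mid = 0.5 * (lo + hi)
        x = solve_shifted(Q, c, mid)
        if x is None or norm(x) > r:
            lo = mid                         # lam too small
        else:
            hi = mid
    return solve_shifted(Q, c, hi), hi


Q = [[-1.0, 0.0], [0.0, 2.0]]
c = [1.0, 1.0]
x, lam = trust_region_dense(Q, c, 1.0)
print(x, lam)   # ||x|| = 1 and lam >= -lambda_min(Q) = 1, so Q + lam I is PSD
```

Each bisection step costs one dense factorization. This is exactly the requirement that makes such methods unattractive for large, unstructured $Q$.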

Thus one concentrates on iterative methods of conjugate gradient type (cf. ▸ Conjugate-gradient methods) that require only matrix-vector products. Among the methods that have been proposed to solve large scale trust region problems, the following two main categories can be identified:

• methods that produce a sequence of KKT points of (1) with progressive improvement of the objective function;

• methods that solve (1) via a sequence of parametric eigenvalue problems.


Algorithms Based on Successive Improvement 
of KKT Points 


Methods in this class are based on special properties of KKT points of Problem (1). Indeed, one can prove the following properties:

1) given a KKT point that is not a global minimizer, it is possible to find a new feasible point with a lower value of the objective function [5,13];

2) the number of distinct values of the objective function $q(x)$ at KKT points is bounded from above by $2m + 2$, where $m$ is the number of negative eigenvalues of $Q$ [13].

Exploiting these properties, a global minimizer of Problem (1) can be found by applying, a finite number of times, an algorithm that starts from a feasible point and locates a KKT point with a lower value of the objective function.

An algorithmic scheme of methods in this framework is summarized in the pseudocode of Table 1. The procedure is well-posed in the sense that it enters the 'DO cycle' only a finite number of times, since by Property 2 the objective function can take only a finite number of values at KKT points.

To complete the scheme of Table 1 and obtain an efficient algorithm for the solution of Problem (1), it remains to specify two things. The first is how to move from a nonglobal KKT point to a feasible point while improving the objective function. The second is how to define a globally and 'fast' convergent algorithm to locate a KKT point.

Table 1
A pseudocode for the TR problem based on successive improvement of KKT points

procedure TR-IMPROVE-KKT()
  input instance (Q, c, r, x^0);
  set k = 0; x^k = x^0; (starting point)
  find a KKT point x̂^k s.t. q(x̂^k) ≤ q(x^k);
  DO (until a global minimizer is found)
    (escape from a nonglobal KKT point)
    find x̄ s.t. ||x̄|| ≤ r, q(x̄) < q(x̂^k);
    (update starting point)
    set k = k + 1, x^k = x̄;
    (find a 'better' KKT point)
    find a KKT point x̂^k s.t. q(x̂^k) ≤ q(x^k);
  OD;
  RETURN (solution)
END TR-IMPROVE-KKT;

To check global optimality of a KKT point (i.e. to check whether $Q + \lambda I \succeq 0$), one needs an estimate of the KKT multiplier $\lambda$ corresponding to the point $x$, and one has to verify whether $\lambda \ge -\lambda_{\min}(Q)$. To obtain $\lambda$, the following multiplier function can be used:

$$\lambda(x) = -\frac{1}{r^2}\, x^T (Qx + c), \quad (2)$$

which is consistent, namely $\lambda(x) = \lambda$ at a KKT point $(x, \lambda)$. If $\lambda < -\lambda_{\min}(Q)$, then $(x, \lambda)$ is a nonglobal KKT point, and a direction of negative curvature for the matrix $Q + \lambda I$ exists, namely a vector $z$ such that $z^T(Q + \lambda I)z < 0$. Such a direction can be used to perform the step 'escape from a nonglobal KKT point'. Roughly speaking, and without discussing the details (see [5,13]), a new feasible point can be obtained by moving from $x$ by a computable step $\alpha$, either along $z$ itself or along a direction easily obtainable from $z$. The efficiency of this step depends on the ability to find such a vector $z$ efficiently. Hence a procedure is needed that finds an approximation of the minimum eigenvalue of $Q + \lambda I$ and of a corresponding eigenvector. In the large scale setting, this can be done efficiently by using a Lanczos method [3,23], which meets the requirement of limited storage and needs only matrix-vector products.
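The sketch below shows how a smallest eigenpair, and hence a negative curvature direction, can be estimated from matrix-vector products alone. For brevity it substitutes plain power iteration on the shifted matrix $\sigma I - A$ for the Lanczos method cited above. Like Lanczos, this uses only products $Av$, but it converges more slowly. Here $\sigma$ is a Gershgorin bound, and the fixed start vector is assumed not to be orthogonal to the target eigenvector.

```python
import math


def matvec(A, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]


def smallest_eigenpair(A, iters=500):
    """Estimate (lambda_min(A), eigenvector) of a symmetric A using only products A v.

    Power iteration on B = sigma I - A, where sigma (a Gershgorin bound) is >= every
    eigenvalue of A, so the dominant eigenvalue of B is sigma - lambda_min(A).
    """
    n = len(A)
    sigma = max(sum(abs(aij) for aij in row) for row in A)
    v = [1.0 + i / n for i in range(n)]   # fixed start; assumed not orthogonal to the target
    for _ in range(iters):
        Av = matvec(A, v)
        w = [sigma * vi - avi for vi, avi in zip(v, Av)]   # w = (sigma I - A) v
        nw = math.sqrt(sum(wi * wi for wi in w))
        if nw == 0.0:
            break
        v = [wi / nw for wi in w]
    rayleigh = sum(vi * avi for vi, avi in zip(v, matvec(A, v))) / sum(vi * vi for vi in v)
    return rayleigh, v


# Hypothetical nonglobal-KKT check: lambda_min(Q) = -1, and a multiplier lam = 0.5 < 1
# means Q + lam I has a direction of negative curvature z.
Q = [[1.0, 2.0], [2.0, 1.0]]   # eigenvalues -1 and 3
lam = 0.5
M = [[Q[i][j] + (lam if i == j else 0.0) for j in range(2)] for i in range(2)]
mu, z = smallest_eigenpair(M)
print(mu, z)   # mu = -0.5 < 0, so z^T (Q + lam I) z < 0
```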

In the algorithmic scheme of Table 1, it remains to define how to find a KKT point for Problem (1) efficiently. Two different approaches have recently (1998) been proposed to perform this step. One is based on a continuously differentiable exact penalty function, and the other on a difference of convex functions decomposition. In both cases, the basic idea is to reformulate the constrained Problem (1) in a different form that allows one to use ideas typical of other fields of mathematical programming. Both approaches, which are described briefly in the sequel, treat the so-called 'easy' and 'hard' cases of Problem (1) in the same way, and both require only matrix-vector products.


Exact Penalty Function Based Algorithm (EPA) 


The main idea behind a continuously differentiable exact penalty function approach is to reformulate the constrained Problem (1) as an unconstrained one. In particular, a continuously differentiable function $P(x)$ can be defined [13] such that Problem (1) is 'equivalent' to the unconstrained problem

$$\min_{x \in \mathbb{R}^n} P(x).$$

The merit function takes full advantage of the structure of Problem (1). It is a piecewise quartic function whose definition relies on the multiplier function (2). The analytic expression of $P$ is

$$P(x) = q(x) + \frac{\epsilon}{8}\left[ \max\left\{0,\ \lambda(x) + \frac{2}{\epsilon}\left(\|x\|^2 - r^2\right)\right\}^2 - \lambda(x)^2 \right],$$

where $0 < \epsilon < 2r^4 / \left[r^2(\|Q\| + 1) + \|c\|\right]^2$. The function $P(x)$ has the following features:

• it has compact level sets;

• stationary (global minimum) points of $P(x)$ are KKT (global minimum) points of Problem (1) and vice versa; moreover $P(x) = q(x)$ at these points;

• the penalty parameter $\epsilon$ need not be updated;

• for points such that $\|x\|^2 \le r^2$, we have $P(x) \le q(x)$;

• $P(x)$ is twice continuously differentiable in a neighborhood of a KKT point that satisfies strict complementarity.

The unconstrained reformulation of Problem (1) can be exploited to define an algorithm that finds a KKT point while improving the value of the objective function with respect to the initial one. Indeed, any unconstrained method for the minimization of $P(x)$ can be used. Starting from a point $x^0$, any of these algorithms produces a sequence of the type

$$x^{k+1} = x^k + \alpha^k d^k, \quad (3)$$

where $d^k$ is a suitable direction and $\alpha^k$ is a stepsize along $d^k$. The sequence $\{x^k\}$ need not be feasible for Problem (1). The boundedness of the level sets of $P(x)$ guarantees that the iterates are bounded and that any convergent unconstrained method obtains a stationary point $\bar x$ of $P$ such that $P(\bar x) \le P(x^0)$. Furthermore, a stationary point of $P(x)$ is a KKT point of Problem (1), and $P(\bar x) = q(\bar x)$. If, in addition, $x^0$ is a feasible point, the following relation holds:

$$q(\bar x) = P(\bar x) \le P(x^0) \le q(x^0),$$

which means that $\bar x$ is a KKT point of Problem (1) with a value of the objective function no greater than the value at the starting point.

As regards the efficiency of the algorithms, in terms of rate of convergence and computational requirements, a 'good' direction $d^k$ can be defined by further exploiting the features of the unconstrained reformulation. Indeed, $P(x) \in C^2$ in a neighborhood of points satisfying the strict complementarity assumption. Therefore any unconstrained truncated Newton algorithm [4] can easily be adapted to define globally convergent methods with a superlinear rate of convergence. Methods in this class include conjugate gradient based iterative methods that require only matrix-vector products and hence are suitable for large scale instances.

The resulting algorithmic scheme is reported in Table 2.

In the nonconvex case ($Q \not\succeq 0$), strict complementarity holds at every global minimizer of Problem (1) [13]. However, this may not be true at other KKT points, and the function $P(x)$ may not be twice differentiable there. Nevertheless, algorithms which exhibit a superlinear rate of convergence can be defined. In fact, drawing inspiration from the results in [6], the direction $d^k$ is defined as the approximate solution of one of the following linear systems:

$$\text{if } \|x^k\|^2 - r^2 < -\tfrac{\epsilon}{2}\lambda^k, \text{ then } (Q + \lambda^k I)\,d^k = -(Qx^k + c),$$

$$\text{if } \|x^k\|^2 - r^2 \ge -\tfrac{\epsilon}{2}\lambda^k, \text{ then } \begin{pmatrix} Q + \lambda^k I & x^k \\ (x^k)^T & 0 \end{pmatrix}\begin{pmatrix} d^k \\ \mu^k \end{pmatrix} = \begin{pmatrix} -Qx^k - c \\ \tfrac12\left(r^2 - \|x^k\|^2\right) \end{pmatrix}. \quad (4)$$

The linear systems (4) can be solved approximately by using the truncated Newton method proposed in [8]. The direction $d^k$ satisfies suitable descent conditions with respect to the penalty function $P$, which can be used to measure the progressive improvement of the iterates. The stepsize $\alpha^k$ can be determined by any Armijo-type line search [9] that uses $P$ as merit function.

It is possible to prove that the sequence $\{x^k\}$ produced by (3), with $d^k$ obtained from (4) and $\lambda^k = \lambda(x^k)$ from (2), converges to a KKT point $(\bar x, \bar\lambda)$. Suppose, moreover, that the KKT point $(\bar x, \bar\lambda)$ satisfies $z^T(Q + \bar\lambda I)z > 0$ for all $z \ne 0$, where $z$ is further restricted to $z^T \bar x = 0$ whenever $\|\bar x\|^2 = r^2$ and $\bar\lambda > 0$. Then there exists a neighborhood of $\bar x$ in which the rate of convergence of the algorithm is superlinear.

Table 2
A pseudocode for finding a KKT point by EPA

procedure KKT point by EPA()
  Given x^0: ||x^0||^2 ≤ r^2 and ε > 0;
  set λ^0 = λ(x^0) and k = 0;
  DO (until a KKT point (x^k, λ^k) is found)
    set x^{k+1} = x^k + α^k d^k
    and λ^{k+1} = λ(x^{k+1});
    k = k + 1;
  OD;
  RETURN (KKT point);
END KKT point by EPA;


D.C. Decomposition Based Algorithm (DCA) 


This algorithm is based on an appropriate reformulation of Problem (1) as the minimization of a difference of convex functions [5]. DCA has been proposed for solving large scale d.c. programming problems. The key aspect in d.c. optimization (cf. ▸ D.C. programming) is the particular structure of the objective function to be minimized on $\mathbb{R}^n$, which is expressed as $f(x) = g(x) - h(x)$, with $g$ and $h$ convex. One uses the tools of convex analysis applied to the two components $g$ and $h$ of the d.c. function. In particular, d.c. duality plays a fundamental role in understanding how DCA works. Indeed, for a generic d.c. problem, DCA constructs two sequences $\{x^k\}$ and $\{y^k\}$, and it can be viewed as a sort of decomposition approach for the primal and dual d.c. problems. It must be pointed out that a d.c. function has infinitely many d.c. decompositions. These give rise to different primal-dual pairs of d.c. problems, and hence to different DCAs. Thus, the choice of d.c. decomposition may have an important influence on the qualities of the DCA, such as robustness, stability and rate of convergence. This aspect is related to regularization techniques in d.c. programming.

In the special case of Problem (1), a quite appropriate d.c. decomposition has been proposed, so that DCA becomes very simple and requires only matrix-vector products. To apply DCA to Problem (1), a d.c. decomposition of the objective function $f(x) = q(x) + \chi_F(x)$ must be defined, where $\chi_F(x)$ is the indicator function of the feasible set $F$, namely

$$\chi_F(x) = \begin{cases} 0 & \text{if } \|x\|^2 \le r^2, \\ +\infty & \text{otherwise.} \end{cases}$$

From the computational point of view, the most efficient decomposition that has been proposed is

$$g(x) = \frac{\rho}{2}\|x\|^2 + c^T x + \chi_F(x), \qquad h(x) = \frac12 x^T(\rho I - Q)x,$$

with $\rho > 0$ such that $\rho I - Q \succeq 0$. In this case the sequence $\{y^k\}$ is obtained by the rule $y^k = (\rho I - Q)x^k$, and $x^{k+1}$ is obtained as the solution of the problem

$$\min_{x \in \mathbb{R}^n} \frac{\rho}{2}\|x\|^2 + x^T(c - y^k) + \chi_F(x).$$

Thus $x^{k+1}$ is the projection of $(y^k - c)/\rho$ onto the feasible region $\|x\|^2 \le r^2$. The scheme for obtaining KKT points by DCA is reported in Table 3.

It has been proved [5] that DCA generates a sequence of feasible points $\{x^k\}$ with strictly decreasing values of the objective function, and that $\{x^k\}$ converges to a KKT point.

In practice the convergence rate depends on the choice of the parameter $\rho$. A possible choice (the best one according to the numerical experiments performed in [5]) consists in taking $\rho$ as close as possible to the largest eigenvalue of the matrix $Q$, namely $\rho = \max\{\lambda_{\max}(Q) + \varepsilon, 10^{-3}\}$ with $\varepsilon > 0$ sufficiently small. Actually, only a low-accuracy estimate of $\lambda_{\max}(Q)$, which can be found by a Lanczos method, is needed.

Table 3. A pseudocode for finding a KKT point by DCA

procedure KKT POINT by DCA()
Given $x^0$, $\rho > 0$ such that $\rho I - Q \succ 0$;
DO (until a KKT point is found)
  IF $\|(\rho I - Q)x^k - c\| \le \rho r$ THEN
    $x^{k+1} = ((\rho I - Q)x^k - c)/\rho$
  ELSE
    $x^{k+1} = r\,((\rho I - Q)x^k - c)/\|(\rho I - Q)x^k - c\|$
  END IF;
  IF $\|x^{k+1} - x^k\| < tol$ exit;
  set $k = k + 1$;
OD;
RETURN (KKT point);
END KKT POINT by DCA;
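The DCA iteration of Table 3 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code of [5]: it assumes $q(x) = \frac{1}{2}x^T Q x + c^T x$ (the form consistent with the decomposition $g - h$ above), uses a dense eigenvalue call in place of a low-accuracy Lanczos estimate of $\lambda_{\max}(Q)$, and the names `dca_trust_region`, `eps`, `rho_floor` and `tol` are hypothetical.

```python
import numpy as np


def dca_trust_region(Q, c, r, x0, eps=1e-3, rho_floor=1e-3, tol=1e-10, max_iter=10_000):
    """Sketch of DCA for min 0.5 x^T Q x + c^T x  s.t. ||x|| <= r.

    rho = max(lambda_max(Q) + eps, rho_floor); the default eps and rho_floor
    are illustrative assumptions.
    """
    Q = np.asarray(Q, dtype=float)
    c = np.asarray(c, dtype=float)
    x = np.asarray(x0, dtype=float)
    # Dense eigenvalue call as a stand-in for a cheap Lanczos estimate.
    rho = max(np.linalg.eigvalsh(Q)[-1] + eps, rho_floor)
    for _ in range(max_iter):
        v = rho * x - Q @ x - c          # (rho I - Q) x^k - c = y^k - c
        norm_v = np.linalg.norm(v)
        if norm_v <= rho * r:
            x_new = v / rho              # (y^k - c)/rho is already feasible
        else:
            x_new = r * v / norm_v       # projection onto the sphere ||x|| = r
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

At a limit point on the boundary the multiplier can be recovered as $\lambda = -x^T(Qx + c)/\|x\|^2$, which makes it easy to check the KKT conditions $(Q + \lambda I)x = -c$, $\lambda \ge 0$.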


Parametric Eigenvalue Reformulation Based Algorithms


The algorithms in this framework are based on the reformulation of the TR problem as a parametric eigenvalue problem for a bordered matrix. Note that if the linear term is not present in the function $q(x)$, i.e. $c = 0$, Problem (1) is a pure quadratic problem that corresponds to finding the smallest eigenvalue of the matrix $Q$. The intuitive observation behind this idea is that, given a real number $t$, one can write

$$\frac{t}{2} + q(x) = \frac{1}{2}\begin{pmatrix} 1 \\ x \end{pmatrix}^T \begin{pmatrix} t & c^T \\ c & Q \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix},$$

and, for a fixed $t$, the goal is to minimize the function $q(x)$ over the set $\{x : \|x\|^2 + 1 = r^2 + 1\}$, that is, to minimize the pure quadratic form $z^T D(t) z/2$, with $z = (1, x^T)^T$, over a spherical region, where

$$D(t) = \begin{pmatrix} t & c^T \\ c & Q \end{pmatrix}.$$
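As a small numerical check of the identity above, the following sketch builds $D(t)$ explicitly and also shows the matrix-free product $D(t)z$, which only needs products with $Q$; this is the kind of product a Lanczos-based method relies on. It assumes $q(x) = \frac{1}{2}x^T Q x + c^T x$; the helper names are hypothetical, and forming $D(t)$ densely is only sensible for tiny examples.

```python
import numpy as np


def bordered_matrix(Q, c, t):
    """Return D(t) = [[t, c^T], [c, Q]] as a dense (n+1) x (n+1) array."""
    Q = np.asarray(Q, dtype=float)
    c = np.asarray(c, dtype=float)
    return np.block([[np.array([[t]]), c[None, :]],
                     [c[:, None], Q]])


def q(Q, c, x):
    """Quadratic objective q(x) = 0.5 x^T Q x + c^T x (assumed form of Problem (1))."""
    return 0.5 * x @ Q @ x + c @ x


def bordered_matvec(Q_matvec, c, t, z):
    """Product D(t) z computed with one matrix-vector product by Q (matrix-free)."""
    z0, v = z[0], z[1:]
    return np.concatenate([[t * z0 + c @ v], z0 * c + Q_matvec(v)])
```

For $z = (1, x^T)^T$, the value `0.5 * z @ bordered_matrix(Q, c, t) @ z` agrees with `t / 2 + q(Q, c, x)` up to rounding, for any symmetric $Q$.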


This suggests that a solution of (1) may be found using eigenpairs of the matrix $D(t)$, where $t$ is a parameter to be adjusted. Indeed, in both algorithms proposed in this framework a key role is played by eigenpairs of $D(t)$. At each iteration the main computational step is the calculation of the smallest eigenvalue and a corresponding normalized eigenvector of the parametric matrix $D(t)$. The eigenvalue-eigenvector pair can be computed by using the Lanczos method as a black box. Therefore the methods can exploit sparsity in the matrices and require only matrix-vector multiplications. Moreover, only one element of the matrix $D(t)$ changes at each iteration of both algorithms, so consecutive runs of the Lanczos algorithm become cheaper.

Both algorithms have to distinguish between the easy and the hard case of Problem (1). The hard case is said to occur when the vector $c$ is orthogonal to the eigenspace associated with the smallest eigenvalue of $Q$, i.e. $c^T y = 0$ for all $y \in S_{\min}$, with

$$S_{\min} = \{x \in \mathbb{R}^n : Qx = \lambda_{\min}(Q)x\}.$$


Depending on whether the easy or the hard case occurs, eigenpairs of the perturbed matrix $D(t)$ satisfy different properties. In the easy case, the smallest eigenvalue $\mu_{\min}(D(t))$ is simple and such that $\mu_{\min}(D(t)) < \lambda_{\min}(Q)$ for all values of $t$. Moreover, in this case the corresponding eigenvector has a nonzero first component, and this plays a fundamental role in defining the iteration of both algorithms. In the hard case caution is needed, because the first component of the eigenvector corresponding to the smallest eigenvalue of $D(t)$ may be zero. Actually, any vector of the form $(0, y^T)^T$ with $y \in S_{\min}$ is an eigenvector of the matrix $D(t)$ if and only if $c \perp S_{\min}$.
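The hard-case condition and the eigenvector claim can be checked numerically on a small dense example. The sketch below, with hypothetical helper names and an assumed relative tolerance, extracts an orthonormal basis of $S_{\min}$ from a full eigendecomposition (a large-scale code would use Lanczos instead) and tests whether $c \perp S_{\min}$.

```python
import numpy as np


def smallest_eigenspace(Q, rtol=1e-10):
    """Return lambda_min(Q) and an orthonormal basis of S_min (dense, small-scale)."""
    eigvals, eigvecs = np.linalg.eigh(Q)            # eigenvalues in ascending order
    lam_min = eigvals[0]
    mask = np.abs(eigvals - lam_min) <= rtol * max(1.0, abs(lam_min))
    return lam_min, eigvecs[:, mask]


def is_hard_case(Q, c, rtol=1e-10):
    """Hard case: c is (numerically) orthogonal to every y in S_min."""
    _, basis = smallest_eigenspace(Q, rtol)
    return bool(np.linalg.norm(basis.T @ c) <= rtol * max(1.0, np.linalg.norm(c)))
```

For $Q = \mathrm{diag}(-1, 2, 3)$ and $c = (0, 1, 1)^T$ the test reports the hard case, and $(0, y^T)^T$ with $y = \pm e_1$ satisfies $D(t)(0, y^T)^T = -(0, y^T)^T$ for every $t$, because the first entry of the product is $c^T y = 0$.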

The two algorithms in this framework are briefly described below. Although the basic idea behind both algorithms is the same, namely inverse interpolation for a parametric eigenvalue problem, the second one is embedded in a semidefinite programming framework. Hence the first one is referred to as the 'inverse interpolation parametric eigenvalue' (IPE) approach and the second one as the 'semidefinite programming' (SDP) approach.


Inverse Interpolation Parametric Eigenvalue Formulation (IPE)


In [23] it is observed that if an eigenvector $z$ of $D(t)$ corresponding to a given eigenvalue $\mu$ can be normalized so that its first component is one, that is $z = (1, x^T)^T$, then a solution of the TR problem can be found in terms of eigenpairs of $D(t)$. This corresponds to the easy case, and indeed the pair $(x, \mu)$ satisfies

$$\begin{pmatrix} t & c^T \\ c & Q \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix} = \mu \begin{pmatrix} 1 \\ x \end{pmatrix},$$

from which we get

$$t - \mu = -c^T x, \qquad (Q - \mu I)x = -c.$$

For $\mu < \lambda_{\min}(Q)$, which holds in the easy case with $\mu = \mu_{\min}(D(t))$, the matrix $Q - \mu I$ is positive definite, and hence one can define the function

$$\phi(\mu) = -c^T x = c^T (Q - \mu I)^{-1} c,$$

whose derivative is

$$\phi'(\mu) = c^T (Q - \mu I)^{-2} c = \|x\|^2.$$

For a given value of $t$, computing the smallest eigenvalue $\mu(t) = \mu_{\min}(D(t)) < \lambda_{\min}(Q)$ and the corresponding eigenvector of $D(t)$, and then normalizing the eigenvector so that its first component equals one, $(1, x_{\mu(t)}^T)^T$, provides a means of evaluating the function $\phi(\mu)$ and its derivative. If $t$ can be adjusted so that the corresponding $x_{\mu(t)}$ satisfies $\phi'(\mu(t)) = \|x_{\mu(t)}\|^2 = r^2$ with $t - \mu(t) = -c^T x_{\mu(t)}$ and $\mu(t) \le 0$, then $(x_{\mu(t)}, -\mu(t))$ satisfies the optimality conditions for Problem (1). If instead, while $t$ is being adjusted, it happens that $\mu(t) > 0$ with $\|x_{\mu(t)}\|^2 < r^2$, then the optimal solution of Problem (1) is actually unconstrained and can be found by solving the system $Qx = -c$ with any iterative method.
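The evaluation of $\phi(\mu)$ and $\phi'(\mu)$ from one eigenpair of $D(t)$ can be sketched as follows. To keep the example short, $t$ is adjusted here by plain bisection rather than by the two-point interpolation of [23]; this substitution, the dense `numpy.linalg.eigh` call used in place of the implicitly restarted Lanczos method, and the function names are all assumptions of the sketch. It covers only the easy case with a boundary solution and requires a bracket $[t_{lo}, t_{hi}]$ with $\|x(t_{lo})\| < r < \|x(t_{hi})\|$.

```python
import numpy as np


def eig_step(Q, c, t):
    """Smallest eigenpair of D(t), eigenvector scaled to (1, x^T)^T (easy case).

    Returns mu = mu_min(D(t)) and x, so that phi(mu) = -c^T x and phi'(mu) = ||x||^2.
    """
    D = np.block([[np.array([[t]]), c[None, :]], [c[:, None], Q]])
    eigvals, eigvecs = np.linalg.eigh(D)        # dense stand-in for a Lanczos solver
    mu, z = eigvals[0], eigvecs[:, 0]
    if abs(z[0]) < 1e-14:
        raise ValueError("first component is zero: hard case")
    return mu, z[1:] / z[0]


def tr_by_bisection(Q, c, r, t_lo, t_hi, tol=1e-12, max_iter=200):
    """Adjust t until ||x(t)||^2 = r^2 (easy case, boundary solution assumed)."""
    for _ in range(max_iter):
        t = 0.5 * (t_lo + t_hi)
        mu, x = eig_step(Q, c, t)
        if x @ x < r * r:
            t_lo = t        # ||x(t)||^2 increases with t in the easy case
        else:
            t_hi = t
        if t_hi - t_lo < tol:
            break
    return x, -mu           # candidate (x*, lambda*) with lambda* = -mu(t)
```

Bisection is valid here because $t = \mu + \phi(\mu)$ has derivative $1 + \|x\|^2 > 0$, so $\mu(t)$ increases with $t$, and $\phi'(\mu) = \|x\|^2$ increases with $\mu$ for $\mu < \lambda_{\min}(Q)$; hence $\|x(t)\|^2$ is increasing in $t$.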

Hence, using the parametric eigenvalue formulation, the optimal pair $(x^*, \lambda^*)$ of Problem (1) can be found by solving a sequence of eigenvalue problems while iteratively adjusting the parameter $t$. To make this observation useful, a modified Lanczos method, the implicitly restarted Lanczos method [23], is used for computing the smallest eigenvalue and the corresponding eigenvector of $D(t)$. Moreover, a rapidly convergent iteration to adjust $t$ has been developed, based on a two-point interpolation method. Recalling that the goal is to adjust $t$ so that $\phi(\mu) = t - \mu$ and $\phi'(\mu) = r^2$, an interpolation-based iteration that exploits the structure of the problem is proposed. The method is based upon an interpolant $\hat\phi(\mu)$ of $\phi(\mu)$ of the form

$$\hat\phi(\mu) = \frac{\gamma^2}{\alpha - \mu} + \beta(\alpha - \mu) + \delta.$$
The values of the parameters $\alpha, \beta, \gamma, \delta$ appearing in the interpolant $\hat\phi(\mu)$ are determined using the values of two iterates, $(x^k, \mu^k)$ and $(x^{k-1}, \mu^{k-1})$, according to the following rules. The pole $\alpha$ is chosen so as to provide the current estimate $\delta_{\min}$ of $\lambda_{\min}(Q)$. In particular, if $\|x^k\| < r$ or $\|x^{k-1}\| < r$,

$$\alpha = \min\left\{\delta_{\min}, \frac{(x^k)^T Q x^k}{\|x^k\|^2}\right\},$$

and if $\|x^k\| > r$ and $\|x^{k-1}\| > r$, then

$$\alpha = \min\left\{\delta_{\min}, \frac{(x^k)^T Q x^k}{\|x^k\|^2}, \frac{(x^{k-1})^T Q x^{k-1}}{\|x^{k-1}\|^2}\right\},$$

Table 4. A pseudocode for TR based on (IPE)

procedure TR INTERPOL-PARAM-EIG()
input instance $(Q, c, r, x^0)$;
(initialization)
Find $\lambda_{\min}(Q)$ and its eigenvector $x$;
initialize the iteration parameters; set $k = 0$;
DO (until a solution is found)
  construct the interpolant $\hat\phi(\mu)$;
  find $\hat\mu$ such that $\hat\phi'(\hat\mu) = r^2$, that is:
    $\hat\mu = \alpha - \left(\gamma^2/(r^2 + \beta)\right)^{1/2}$;
  set $t^{k+1} = \hat\mu + \hat\phi(\hat\mu)$, that is:
    $t^{k+1} = \hat\mu + \delta + \beta(\alpha - \hat\mu) + \gamma^2/(\alpha - \hat\mu)$;
  compute $\mu^{k+1} = \mu_{\min}(D(t^{k+1}))$ and the corresponding normalized eigenvector $(1, (x^{k+1})^T)^T$;
  set $k = k + 1$;
OD;
RETURN (solution)
END TR INTERPOL-PARAM-EIG;

and $\delta_{\min} = \min\{\delta_{\min}, \alpha\} \ge \lambda_{\min}(Q)$. The other coefficients are chosen to satisfy $\hat\phi(\mu^k) = -c^T x^k$, $\hat\phi'(\mu^k) = \|x^k\|^2$ and $\hat\phi'(\mu^{k-1}) = \|x^{k-1}\|^2$.
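A single update of $t$ from the interpolant can be sketched directly from the matching conditions: the two derivative conditions determine $\gamma^2$ and $\beta$, the value condition then gives $\delta$, and the pole $\alpha$ is supplied by the caller. This is a hedged illustration of the formulas in Table 4, not the code of [23]; the function names are hypothetical and no safeguards (for example against $r^2 + \beta \le 0$) are included.

```python
import math


def _dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def ipe_interpolation_step(alpha, mu_k, x_k, mu_prev, x_prev, c, r):
    """One update of t from phi_hat(mu) = gamma^2/(alpha - mu) + beta*(alpha - mu) + delta."""
    d_k, d_prev = _dot(x_k, x_k), _dot(x_prev, x_prev)   # phi'(mu) = ||x||^2
    phi_k = -_dot(c, x_k)                                 # phi(mu_k) = -c^T x^k
    a_k, a_prev = alpha - mu_k, alpha - mu_prev
    # phi_hat'(mu) = gamma^2/(alpha - mu)^2 - beta, matched at mu_k and mu_prev.
    gamma2 = (d_k - d_prev) / (1.0 / a_k**2 - 1.0 / a_prev**2)
    beta = gamma2 / a_k**2 - d_k
    delta = phi_k - gamma2 / a_k - beta * a_k             # phi_hat(mu_k) = phi(mu_k)
    # Solve phi_hat'(mu_hat) = r^2 on the branch mu_hat < alpha (needs r^2 + beta > 0).
    mu_hat = alpha - math.sqrt(gamma2 / (r * r + beta))
    a_hat = alpha - mu_hat
    t_next = mu_hat + delta + beta * a_hat + gamma2 / a_hat   # t = mu + phi_hat(mu)
    return mu_hat, t_next, (alpha, beta, gamma2, delta)
```

For the one-dimensional instance $Q = 2$, $c = 1$, $r = 0.5$, the function $\phi(\mu) = 1/(2 - \mu)$ is itself of the interpolant form with $\alpha = 2$, so one step from $\mu^k = -1$, $\mu^{k-1} = -2$ recovers $\gamma^2 = 1$, $\beta = \delta = 0$ and returns $\hat\mu = 0$, $t^{k+1} = 0.5$, which corresponds to $x = -0.5$ on the boundary $|x| = r$.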

An algorithmic scheme for finding the global minimizer of Problem (1) in the easy case is reported in Table 4.

It has been proved in [23] that there exists a neighborhood of $-\lambda^*$ such that, if $\mu^0$ and $\mu^1$ lie in this neighborhood, the whole sequence $\{\mu^k\}$ is well defined, remains in the neighborhood and converges superlinearly to $-\lambda^*$, with the corresponding iterates $x^k$ converging superlinearly to $x^*$.

Unfortunately, the iteration described above can break down in the hard case. Indeed, the iteration is based on the ability to normalize the eigenvector of the bordered matrix $D(t)$, which is not possible when its first component is zero, that is, in the hard case. From the computational point of view a near-hard case can also be difficult, and it is important to detect such cases and to define alternative rules so as to obtain a convergent iteration. This can be done by using, again, eigenpairs of the bordered matrix together with additional information such as an upper bound on the optimal value $\lambda^*$. When the hard case is detected, the new iteration should be used. Its convergence can be established but, unfortunately, the rate of convergence is no longer superlinear.


Semidefinite Programming Approach (SDP) 


In [20] a primal-dual simplex-type method for Problem (1) has been proposed, which is essentially based on a primal-dual pair of semidefinite programming problems. Primal-dual pairs of SDP problems provide a general framework for the TR problem. The idea arises from the fact that Problem (1) enjoys strong duality, that is, there is no duality gap and

$$q(x^*) = \min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda),$$

where $L(x, \lambda) = q(x) + \lambda(\|x\|^2 - r^2)$ denotes the Lagrangian function. By exploiting this feature it is possible to define a primal-dual pair of linear SDP problems that are strictly connected with the TR problem. In particular, a dual for Problem (1) is

$$\max_t \ (r^2 + 1)\,\mu_{\min}(D(t)) - t \quad \text{s.t.} \quad \mu_{\min}(D(t)) \le 0. \qquad (5)$$


The objective function in (5) is a real-valued concave function. When the constraint in Problem (1) is an equality, its dual problem (5) is unconstrained and, as an immediate consequence, the nonconvex constrained TR problem is transformed into a convex problem, which can therefore be solved in polynomial time by the results for general convex programs. Problem (5) can easily be reformulated as an SDP problem by introducing an additional variable $\mu \in \mathbb{R}$:

$$\max_{t, \mu} \ (r^2 + 1)\mu - t \quad \text{s.t.} \quad D(t) - \mu I \succeq 0, \ \ \mu \le 0. \qquad (6)$$


Slater's condition holds for Problem (6), and its Lagrangian dual is

$$\min_X \ \operatorname{trace}(D(0)X) \quad \text{s.t.} \quad \operatorname{trace}(X) \le r^2 + 1, \ \ X_{11} = 1, \ \ X \succeq 0. \qquad (7)$$

The algorithm parallels the dual simplex method for linear programming. At each iteration it maintains dual feasibility for Problem (6) and complementary slackness, while iterating to attain primal feasibility for Problem (7) ($X_{11} = 1$) and to reduce the duality gap.
Essentially, these steps can be summarized as follows:
1) find a basic solution $(t, \mu_{\min}(D(t)))$ of Problem (6);
2) find an approximate solution of Problem (7) by using the complementary slackness relation
$$\operatorname{trace}((D(t) - \mu I)X) = 0;$$
a normalized eigenvector $z(t) = (z_0(t), v(t)^T)^T$ corresponding to $\mu_{\min}(D(t))$ is used and $X = (r^2 + 1)zz^T$, so that the constraint on the trace of $X$ in Problem (7) is satisfied;
3) use inverse interpolation to predict a value of the parameter $t$ such that $X_{11} = 1$ and/or the duality gap
$$\operatorname{trace}(D(0)X) - ((r^2 + 1)\mu - t)$$
decreases.
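Steps 1 and 2 above can be illustrated for a fixed $t$ with a dense eigensolver. The sketch (hypothetical function name, dense `numpy.linalg.eigh` in place of Lanczos) builds $X = (r^2 + 1)zz^T$ from a unit-norm eigenvector and evaluates the duality gap of step 3.

```python
import numpy as np


def sdp_step(Q, c, r, t):
    """For a given t: mu = mu_min(D(t)), X = (r^2 + 1) z z^T, and the duality gap."""
    D = np.block([[np.array([[t]]), c[None, :]], [c[:, None], Q]])
    eigvals, eigvecs = np.linalg.eigh(D)
    mu, z = eigvals[0], eigvecs[:, 0]            # eigh returns unit-norm eigenvectors
    X = (r * r + 1.0) * np.outer(z, z)           # hence trace(X) = r^2 + 1
    D0 = D.copy()
    D0[0, 0] = 0.0                               # D(0): same matrix with t replaced by 0
    gap = np.trace(D0 @ X) - ((r * r + 1.0) * mu - t)
    return mu, X, gap
```

Because $\operatorname{trace}(D(t)X) = (r^2 + 1)\mu$ for this $X$, the computed gap equals $t(1 - X_{11})$ (in the code, `X[0, 0]` is $X_{11}$), so it vanishes exactly when primal feasibility $X_{11} = 1$ is reached.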
Some differences occur depending on whether the easy or the hard case happens. Let us denote by $z(t) = (z_0(t), v(t)^T)^T$ the normalized eigenvector of $D(t)$ corresponding to $\mu_{\min}(D(t))$.

In the easy case, the first component $z_0(t) \neq 0$ and the vector $v(t)/z_0(t)$ is the unique optimal solution of

$$\min \ q(x) \quad \text{s.t.} \quad \|x\|^2 = \frac{1 - z_0(t)^2}{z_0(t)^2}.$$

Hence a value $t^*$ such that $(1 - z_0(t^*)^2)/z_0(t^*)^2 = r^2$ must be found; then the point $x^* = v(t^*)/z_0(t^*)$ with multiplier $\lambda^* = -\mu_{\min}(D(t^*))$ is the unique solution of Problem (1). The correct value of $t$ can be found by standard search procedures, and the algorithm produces an interval containing $t^*$ that is iteratively updated.

In the hard case, $z_0(t)$ may be zero. However, there is still a value $t_0$ such that $\mu_{\min}(D(t_0)) = \lambda_{\min}(Q)$, and a corresponding eigenvector $z(t_0)$ exists with nonzero first component. In order to obtain the value $t_0$, consider, without loss of generality, a diagonal $Q$ with elements $\lambda_i$ in increasing order, so that $\lambda_1 = \lambda_{\min}(Q)$. Assume that $p$ is the multiplicity of $\lambda_{\min}(Q)$, and define

$$t_0 = \lambda_{\min}(Q) + \sum_{k=p+1}^{n} \frac{c_k^2}{\lambda_k - \lambda_{\min}(Q)}.$$

Then the smallest eigenvalue is $\mu_{\min}(D(t_0)) = \lambda_{\min}(Q)$, with multiplicity $p + 1$.

Two cases can occur. If $(1 - z_0(t_0)^2)/z_0(t_0)^2 > r^2$, then $t^* < t_0$. This case can be treated like the preceding easy case, since there exists $t < t_0$ such that the eigenvalue $\mu_{\min}(D(t))$ is simple, $\mu_{\min}(D(t)) < \lambda_{\min}(Q)$, and the corresponding eigenvector satisfies $(1 - z_0(t)^2)/z_0(t)^2 = r^2$. On the other hand, if $z_0(t_0)^2 \ge 1/(r^2 + 1)$, then a primal step to the boundary of the feasible region of Problem (7) is taken while improving the objective function. In particular, let $w \in S_{\min}$ with $\|w\| = 1$; then the vector

$$x^* = \frac{v(t_0)}{z_0(t_0)} + \tau w, \quad \text{with } \tau \text{ such that } \|x^*\| = r,$$

together with $\lambda^* = -\lambda_{\min}(Q)$, satisfies the optimality conditions for Problem (1), and $t^* = t_0$. Hence in the hard case a vector is found that allows a move to the correct radius while improving the objective function.
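For a diagonal $Q$ the hard-case construction can be written out explicitly. The sketch below computes $t_0$, takes the representative of $v(t_0)/z_0(t_0)$ orthogonal to $S_{\min}$ (which solves $(Q - \lambda_{\min}(Q)I)x = -c$), and, when its norm does not exceed $r$, moves along a unit vector $w \in S_{\min}$ to the boundary. It assumes that $c_k = 0$ for $k \le p$ (the hard case) and uses a hypothetical function name and tolerance.

```python
import numpy as np


def hard_case_boundary_step(lams, c, r, tol=1e-12):
    """Hard case for diagonal Q = diag(lams), lams increasing; assumes c[k] == 0 for k <= p.

    Returns (t0, x_star); x_star is None when ||x(t0)|| > r, i.e. t* < t0.
    """
    lams = np.asarray(lams, dtype=float)
    c = np.asarray(c, dtype=float)
    lam1 = lams[0]
    tail = lams - lam1 > tol                      # indices k = p+1, ..., n
    t0 = lam1 + np.sum(c[tail] ** 2 / (lams[tail] - lam1))
    # Eigenvector of D(t0) with first component 1, orthogonal to S_min: (Q - lam1 I) x = -c.
    x = np.zeros_like(c)
    x[tail] = -c[tail] / (lams[tail] - lam1)
    if x @ x > r * r:
        return t0, None                           # treat as in the easy case
    w = np.zeros_like(c)
    w[0] = 1.0                                    # unit vector in S_min
    tau = np.sqrt(r * r - x @ x)                  # x is orthogonal to w
    return t0, x + tau * w
```

With $Q = \mathrm{diag}(-1, 2, 3)$, $c = (0, 1, 1)^T$ and $r = 1$ one obtains $t_0 = -5/12$, the eigenvalue $-1$ of $D(t_0)$ has multiplicity 2, and $x^*$ satisfies $(Q + I)x^* = -c$ with $\|x^*\| = 1$.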

Inverse interpolation on the value of the first component $z_0$ of the eigenvector corresponding to $\mu_{\min}(D(t^k))$ is used to predict a new value $t^{k+1}$.

A brief scheme of the algorithm is given in Table 5.
Table 5. A pseudocode for TR based on SDP

procedure TR PRIMAL-DUAL-SDP()
input instance $(Q, c, r, x^0)$;
(initialization)
Find $\lambda_{\min}(Q)$; set $k = 0$;
Set the intervals of uncertainty for $t^*$ and for $q(x^*)$;
DO (until a solution is found)
  improve the parameter $t^{k+1}$ using inverse interpolation;
  update the iterate using $\mu_{\min}(D(t^{k+1}))$ and its corresponding eigenvector;
  update the intervals of uncertainty;
  set $k = k + 1$;
OD;
RETURN (solution)
END TR PRIMAL-DUAL-SDP;
Conclusion

All the algorithms described above appear to be potentially equivalent from the computational point of view. They have been implemented in MATLAB [15] codes, and the results of the numerical testing are reported in the corresponding papers.
A large scale unconstrained optimization problem can be formulated as the problem of finding a local minimizer of a real-valued function $f: \mathbb{R}^n \to \mathbb{R}$ over the space $\mathbb{R}^n$, namely solving the problem

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$

where the dimension $n$ is large. The notion of 'large scale' is machine dependent, and hence it can be difficult to state a priori when a problem is of large size. However, today an unconstrained problem with more than one thousand variables is usually considered a large scale problem.

Besides its theoretical importance, the growing interest in recent years in solving problems of large size derives from the fact that problems with an ever larger number of variables arise very frequently in real-world applications, as a result of modeling systems with a very complex structure.

The main difficulty in dealing with large scale problems is that algorithms that are effective for small scale problems do not necessarily translate into efficient algorithms when applied to large problems. Therefore, in most cases it is inappropriate to tackle a problem with a large number of variables by using one of the many existing algorithms for the small scale case and relying on the growing power of modern computers (see, e.g., [11,13,34] for a review of the existing methods for small scale unconstrained optimization).

A basic feature of an algorithm for large scale problems is a low storage overhead, which is needed to make its implementation practicable. Moreover, whenever a large scale problem has some structure, it should be exploited to define reliable algorithms; in fact, the structure of a problem is often reflected in the sparsity of the Hessian matrix of the function $f$, which can be efficiently exploited.

Methods for unconstrained optimization differ according to how much information on the function $f$ is available. In the framework of large scale unconstrained optimization it is usually required that the user provide at least subroutines that evaluate the objective function and its gradient at any point $x$. More effective methods can be obtained if second order derivatives are known. When the derivatives are not available, they can be obtained by finite differences or by automatic differentiation. Throughout, we assume that the function $f$ is twice continuously differentiable, i.e. that the gradient $g(x) = \nabla f(x)$ and the Hessian matrix $H(x) = \nabla^2 f(x)$ of $f$ exist and are continuous. Moreover, we denote by $\|v\|$ the Euclidean norm of a vector $v \in \mathbb{R}^n$.

As in the small scale case, most large scale unconstrained algorithms are iterative methods that generate a sequence of points according to the scheme

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (2)$$

where $d_k \in \mathbb{R}^n$ is a search direction and $\alpha_k \in \mathbb{R}$ is a steplength obtained by means of a one-dimensional search. Also in large scale optimization it is important that an algorithm has both global convergence (i.e. convergence of the sequence $\{x_k\}$ towards a stationary point from any starting point) and a good convergence rate.
The steepest descent method, obtained by setting $d_k = -g(x_k)$ in (2), can be considered a basic method for solving large scale unconstrained optimization problems. This method is based on a linear approximation of the objective function $f$, and hence only first order information is needed. Owing to the very limited storage required by a standard implementation, the steepest descent method could be considered very attractive in the large scale setting; moreover, global convergence can also be ensured. However, its convergence rate is only linear, and therefore it is usually too slow in practice. A particular rule for computing the stepsize $\alpha_k$ has been proposed [39], and this led to a significant improvement in the efficiency of the steepest descent method.
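To illustrate how a different stepsize rule changes steepest descent, the sketch below keeps $d_k = -g(x_k)$ but uses the Barzilai-Borwein stepsize $\alpha_k = s^T s / s^T y$, with $s = x_{k} - x_{k-1}$ and $y = g(x_{k}) - g(x_{k-1})$. This is our choice of a well-known rule of this type, not necessarily the rule of [39]; without the nonmonotone line search safeguards used in practice it is only reliable on well-behaved problems such as the strictly convex quadratic used below. The function and parameter names are hypothetical.

```python
import numpy as np


def bb_gradient(grad, x0, alpha0=1e-3, tol=1e-8, max_iter=10_000):
    """Steepest descent d_k = -g(x_k) with the Barzilai-Borwein stepsize (no safeguards)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0                                   # first step uses a small fixed stepsize
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        x_new = x - alpha * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        alpha = (s @ s) / sy if sy > 0 else alpha0   # BB step s^T s / s^T y
        x, g = x_new, g_new
    return x
```

The method only stores a few vectors of length $n$, in line with the low-storage requirement discussed above.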

One of the most effective methods for solving unconstrained problems is the Newton method (cf. Unconstrained nonlinear optimization: Newton-Cauchy framework). It is based on the quadratic approximation of $f(x_k + w)$ given by

$$q_k(w) = f(x_k) + g(x_k)^T w + \frac{1}{2} w^T H(x_k) w, \qquad (3)$$

and it is defined by iterations of the form

$$x_{k+1} = x_k + s_k, \qquad (4)$$


where the search direction $s_k$ is obtained by minimizing the quadratic model (3) of the objective function over $\mathbb{R}^n$. On the one hand, the Newton method has a quadratic convergence rate and is scale invariant; on the other hand, in its pure form it is not globally convergent. Globally convergent modifications of the Newton method have been defined following the line search approach and the trust region approach (see, e.g., [11,12,27]; cf. also Large scale trust region problems), but the main difficulty in dealing with large scale problems is the ability to solve efficiently, at each iteration, the linear systems that arise in computing the search direction $s_k$. In fact, the problem dimension could be too large for any explicit use of the Hessian matrix, and iterative methods must be used to solve the systems of linear equations instead of factorizations of the matrices involved. Indeed, whereas in the small scale setting the Newton direction $s_k$ is usually determined by using direct methods for solving the linear system


$$H(x_k)s = -g(x_k), \qquad (5)$$


when $n$ is large it is impossible to store or factor the full $n \times n$ Hessian matrix unless it is sparse. Moreover, the exact solution of system (5) at each iteration could be too burdensome, and it is not justified when $x_k$ is far from a solution. In fact, since the benefits of using the Newton direction are mainly local (i.e. in a neighborhood of a solution), a great computational effort to obtain an accurate solution of system (5) should not be necessary when $\|g(x_k)\|$ is large.

On the basis of these remarks, inexact Newton methods were proposed in [8]. They represent the basic approach underlying most Newton-type large scale unconstrained algorithms. The main idea is to solve the system (5) approximately while still ensuring a good convergence rate, by using a particular trade-off rule between the computational burden required to solve the system (5) and the accuracy with which it is solved. The measure of this accuracy is the relative residual

$$\frac{\|r_k\|}{\|g(x_k)\|}, \quad \text{where } r_k = H(x_k)s_k + g(x_k), \qquad (6)$$

and $s_k$ is an approximate solution of (5). The analysis given in [8] shows that if the sequence $\{x_k\}$ generated by (4) converges to a point $x_*$ and if

$$\lim_{k \to \infty} \frac{\|r_k\|}{\|g(x_k)\|} = 0, \qquad (7)$$

then $\{x_k\}$ converges superlinearly to $x_*$. This result is at the basis of the truncated Newton methods, which represent one of the most effective approaches for solving large scale problems. This class of methods was introduced in [9] within the line search based Newton-type methods. They are based on the fact that, whenever the Hessian matrix $H(x_k)$ is positive definite, solving the Newton equation (5) is equivalent to determining the minimizer of the quadratic model (3). Therefore, in these methods a Newton-type direction, i.e. an approximate solution of (5), is computed by applying the (linear) conjugate gradient (CG) method (cf. Conjugate-gradient methods) [23] to approximately minimize the quadratic function (3). A scheme of a line search based truncated Newton algorithm is the following:
Line search based truncated Newton algorithm

OUTER iterations
For $k = 0, 1, \ldots$
  Compute $g(x_k)$
  Test for convergence
  INNER iterations
    (Computation of the direction $s_k$)
    Iterate the CG algorithm until a termination criterion is satisfied
  Compute a stepsize $\alpha_k$ by a line search procedure
  Set $x_{k+1} = x_k + \alpha_k s_k$

A scheme for a truncated Newton algorithm


Given a starting point $x_0$, at each iteration $k$ a Newton-type direction $s_k$ is computed by truncating the CG iterates (the inner iterations) as soon as a required accuracy is obtained. The definition of an effective truncation criterion is a key aspect of any truncated Newton method, and a natural choice is to monitor when the relative residual (6) is sufficiently small. Moreover, by requiring that $\|r_k\|/\|g(x_k)\| \le \eta_k$ with $\lim_{k \to \infty} \eta_k = 0$, condition (7) is satisfied, and hence superlinear convergence is guaranteed [9]. In particular, $\eta_k$ can be chosen to ensure that more accuracy is required as a critical point is approached. Other truncation criteria, based on the reduction of the quadratic model, can be defined [31]. Numerical experience has shown that, in most cases, a relatively small number of CG iterations is needed to obtain a good approximation of the Newton direction; this is one of the main advantages of truncated Newton methods, since considerable computational savings can be obtained while still ensuring a good convergence rate. The performance of the CG algorithm used in the inner iterations can be improved by a preconditioning strategy based either on information gained during the outer iterations or on some scaling of the variables. Several different preconditioning schemes have been proposed and tested [29,40]. Truncated Newton methods can be modified to enable their use when the Hessian matrix is not available; in fact, the CG method only needs the product of the Hessian matrix with a displacement vector, and this product can be approximated by finite differences [35]. The resulting method is called the discrete truncated Newton method. In [41] a Fortran package (TNPACK) implementing a line search based (discrete) truncated Newton algorithm that uses a preconditioned conjugate gradient method is proposed. However, additional safeguards are needed within truncated Newton algorithms, since the Hessian matrix may not be positive definite. In fact, when the Hessian matrix is indefinite the CG inner iterations may break down before satisfying the termination criterion. To handle this case, whenever a direction of negative curvature (i.e. a direction $d_k$ such that $d_k^T H(x_k) d_k < 0$) is encountered, the inner iterations are usually terminated and a descent direction (i.e. a direction $d_k$ such that $g(x_k)^T d_k < 0$) is computed [9]. More sophisticated strategies can be applied for iteratively solving the system (5) when it is indefinite [6,15,36,43]. In particular, the equivalent characterization of the linear conjugate gradient algorithm via the Lanczos method can be exploited to define a truncated Newton algorithm that can be used to solve problems with indefinite Hessian matrices [28]. In fact, the Lanczos algorithm does not require the Hessian matrix to be positive definite, and hence it makes it possible to obtain an effective Newton-type direction.
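The line search based scheme above can be sketched as follows. The sketch makes several assumptions not fixed by the text: the forcing term $\eta_k = \min\{0.5, \|g(x_k)\|^{1/2}\}$ (which tends to zero, as condition (7) requires), an Armijo backtracking line search with constant $10^{-4}$, and, when negative curvature is met at the first CG step, a fallback to $-g(x_k)$. A CG iterate computed before negative curvature appears is a descent direction, so it is kept in that case. Function and parameter names are hypothetical, and the user supplies Hessian-vector products (which could also be finite-difference approximations, as in the discrete variant).

```python
import numpy as np


def truncated_newton(f, grad, hessvec, x0, tol=1e-8, max_outer=200, max_inner=None):
    """Line search truncated Newton sketch: CG inner iterations, Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    max_inner = max_inner or 2 * n
    for _ in range(max_outer):                    # OUTER iterations
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:                          # test for convergence
            break
        eta = min(0.5, np.sqrt(gnorm))            # forcing term, eta_k -> 0
        s = np.zeros(n)                           # INNER iterations: CG on H s = -g
        res = -g.copy()                           # residual -g - H s (norm equals ||r_k||)
        p = res.copy()
        for _ in range(max_inner):
            Hp = hessvec(x, p)
            curv = p @ Hp
            if curv <= 0:                         # negative curvature: stop the inner loop
                if not s.any():
                    s = -g                        # nothing computed yet: steepest descent
                break
            a = (res @ res) / curv
            s = s + a * p
            res_new = res - a * Hp
            if np.linalg.norm(res_new) <= eta * gnorm:   # ||r_k|| / ||g(x_k)|| <= eta_k
                break
            p = res_new + ((res_new @ res_new) / (res @ res)) * p
            res = res_new
        alpha, fx, slope = 1.0, f(x), g @ s       # Armijo backtracking along s
        while f(x + alpha * s) > fx + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * s
    return x
```

Only products $H(x_k)p$ are needed, so the Hessian itself is never stored.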

A truncated Newton method that uses a nonmonotone line search (i.e. one that does not enforce a monotone decrease of the objective function values) was proposed in [20], and the effectiveness of this approach was shown especially in the solution of ill-conditioned problems. Moreover, the CG-truncated scheme proposed in [20] also includes an efficient strategy to handle the indefinite case.

A new class of truncated Newton algorithms for solving large scale unconstrained problems has been defined in [25]. In particular, a nonmonotone stabilization framework is proposed, based on a curvilinear line search, i.e. a line search along the curvilinear path

$$x(\alpha) = x_k + \alpha^2 s_k + \alpha d_k,$$

where $s_k$ is a Newton-type direction and $d_k$ is a particular negative curvature direction that bears some resemblance to an eigenvector of the Hessian matrix corresponding to the minimum eigenvalue. The combination of these two directions makes it possible, also in the large scale case, to define a class of line search based algorithms that are globally convergent towards points satisfying the second order necessary optimality conditions, i.e. stationary points where the Hessian matrix is positive semidefinite. Besides satisfying this important theoretical property, this class of algorithms was also shown to be very efficient in solving large scale unconstrained problems [25,26]. This is also due to the fact that a Lanczos-based iterative scheme is used to compute both directions without terminating the inner iterations when indefiniteness is detected; as a result, more information about the curvature of the objective function is conveyed.

Truncated Newton methods have also been defined within trust region based methods. These methods are characterized by iterations of the form (4) where, at each iteration $k$, the search direction $s_k$ is determined by minimizing the quadratic model (3) of the objective function in a neighborhood of the current iterate, namely by solving the problem

$$\min_{\|s\| \le \Delta} q_k(s), \qquad (8)$$

where $\Delta$ is the trust region radius. Also in this framework most of the existing algorithms require the solution of systems of linear equations. Some approaches are the dogleg methods [10,38], which aim to solve problem (8) over a one-dimensional arc, and the method proposed in [5], which solves problem (8) over a two-dimensional subspace. However, whenever the problem dimension is large, it is impossible to rely on matrix factorizations, and iterative methods must be used.
If the quadratic model (3) is positive definite and the trust region radius is large enough that the trust region constraint is inactive at the unconstrained minimizer of the model, problem (8) can be solved by the preconditioned conjugate gradient method [42,44]. Of course, a suitable strategy is needed whenever the unconstrained minimizer of the quadratic model no longer lies within the trust region and the desired solution belongs to the trust region boundary. A simple strategy to handle this case was proposed in [42] and [44]: it considers the piecewise linear path connecting the CG iterates, stopping at the point where this path leaves the trust region. If the quadratic model (3) is indefinite, the solution must also lie on the trust region boundary, and the piecewise linear path can again be followed until either it leaves the trust region or a negative curvature direction is found. In the latter case two possibilities have been considered: in [42] the path is continued along this direction until the boundary is reached; in [44] the minimizer of the quadratic model within the trust region along the steepest descent direction (the Cauchy point) is considered. This class of algorithms represents a trust region version of truncated Newton methods, and an efficient implementation is available within the LANCELOT package [7]. These methods have become very important in large scale optimization, owing to both their strong theoretical convergence properties and their good efficiency in practice, but they are known to have some drawbacks. Indeed, they are essentially unconcerned with the trust region until they blunder into its boundary and stop. Moreover, numerical experience has shown that this untimely stop very frequently happens during the first inner iterations when negative curvature is present, and this can deteriorate the efficiency of the method. In order to overcome this drawback an alternative strategy is proposed in [16], where ways of continuing the process once the boundary of the trust region is reached are investigated. The key point of this approach is the use of the Lanczos method and the fact that the preconditioned conjugate gradient and Lanczos methods generate different bases for the same Krylov space. Several other large scale trust region methods (cf. Large scale trust region problems) have been proposed.
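The piecewise linear CG path, together with the boundary rules attributed above to [42] (stop where the path leaves the region, or follow a negative curvature direction to the boundary), can be sketched as below. This is the construction commonly known as the Steihaug-Toint CG method, written here without preconditioning; the tolerance, iteration cap and function names are assumptions of the sketch.

```python
import numpy as np


def _to_boundary(s, p, delta):
    """Return tau >= 0 with ||s + tau p|| = delta (assumes ||s|| < delta)."""
    a, b, c = p @ p, 2.0 * (s @ p), s @ s - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def steihaug_cg(hessvec, g, delta, tol=1e-10, max_iter=None):
    """Approximately minimize g^T s + 0.5 s^T H s subject to ||s|| <= delta."""
    n = g.size
    max_iter = max_iter or 2 * n
    s = np.zeros(n)
    res = -g.copy()
    p = res.copy()
    if np.linalg.norm(res) <= tol:
        return s
    for _ in range(max_iter):
        Hp = hessvec(p)
        curv = p @ Hp
        if curv <= 0:                            # negative curvature: go to the boundary
            return s + _to_boundary(s, p, delta) * p
        a = (res @ res) / curv
        if np.linalg.norm(s + a * p) >= delta:   # CG path leaves the trust region
            return s + _to_boundary(s, p, delta) * p
        s = s + a * p
        res_new = res - a * Hp
        if np.linalg.norm(res_new) <= tol:       # interior (unconstrained) minimizer found
            return s
        p = res_new + ((res_new @ res_new) / (res @ res)) * p
        res = res_new
    return s
```

When the radius is large and the model is positive definite, the routine simply returns the CG solution of $H s = -g$; otherwise it stops on the boundary, which is exactly the behavior whose drawbacks are discussed above.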

Another class of methods that can be successfully applied to large scale unconstrained optimization problems is the wide class of nonlinear conjugate gradient methods [14,23]. They are extensions to the general (nonquadratic) case of the linear conjugate gradient method already mentioned. They represent a compromise between the steepest descent method and the Newton method, and they are particularly suited for large scale problems since there is never a need to store a full Hessian matrix. They are defined by the iteration scheme (2), where the search direction is of the form

$$d_k = -g(x_k) + \beta_k d_{k-1}, \qquad (9)$$

with $d_0 = -g(x_0)$, and where $\beta_k$ is a scalar such that the algorithm reduces to the linear conjugate gradient method if the objective function $f$ is a strictly convex quadratic function and $\alpha_k$ in (2) is obtained by means of an exact line search (i.e. $\alpha_k$ is the one-dimensional minimizer of $f(x_k + \alpha d_k)$ with respect to $\alpha$). The most widely used formulas for $\beta_k$ are the Fletcher-Reeves (FR) and Polak-Ribière (PR) formulas, given by

$$\beta_k^{FR} = \frac{\|g(x_k)\|^2}{\|g(x_{k-1})\|^2}, \qquad \beta_k^{PR} = \frac{g(x_k)^T [g(x_k) - g(x_{k-1})]}{\|g(x_{k-1})\|^2}.$$

Many efforts have been devoted to investigating the global convergence of nonlinear conjugate gradient methods. A widespread technique to enforce global convergence is a regular restart along the steepest descent direction every $n$ iterations, obtained by setting $\beta_k = 0$. However, computational experience has shown that this restart can have a negative effect on the efficiency of the method; on the other hand, in the large scale setting restarting does not play a significant role, since $n$ is large and very few restarts can be performed. Global convergence results have been obtained for the Fletcher-Reeves method without restart both in the case of exact line search [46] and when $\alpha_k$ is computed by means of an inexact line search [1]; then global convergence was extended to methods with $|\beta_k| \le \beta_k^{FR}$ [14]. As regards the global convergence of the Polak-Ribière method, for many years it was proved, with exact line search, only under strong convexity assumptions [37]. Global convergence for both exact and inexact line searches can also be enforced by modifying the Polak-Ribière method, setting $\beta_k = \max\{\beta_k^{PR}, 0\}$ [14]; this strategy corresponds to restarting the iterations along the steepest descent direction whenever a negative value of $\beta_k$ occurs. However, an inexact line search that ensures global convergence of the Polak-Ribière method for nonconvex functions has been obtained in [21]. As regards the numerical performance of these two methods, extensive numerical experience has shown that the Polak-Ribière method is usually more efficient than the Fletcher-Reeves method. An efficient implementation of the Polak-Ribière method (with restarts) is available as routine VA14 within the Harwell subroutine library [22]. See, e.g., [34] for a detailed survey of nonlinear conjugate gradient methods.

Another effective approach to large scale uncon- 
strained optimization is represented by the limited- 
memory BFGS method (L-BFGS) proposed in [32] 
and then studied in [24,30]. This method resembles 


the BFGS quasi-Newton method, but it is particularly 
suited for large scale (unstructured) problems because 
the storage of matrices is avoided. It is defined by the 
iterative scheme (2) with the search direction given by 


$d_k = -H_k g(x_k)$


and where $H_k$ is the approximation to the inverse Hessian matrix of the function f at the kth iteration. In the BFGS method the approximation $H_k$ is updated by means of the BFGS correction given by


$H_{k+1} = V_k^T H_k V_k + \rho_k s_k s_k^T$


where $V_k = I - \rho_k y_k s_k^T$, $s_k = x_{k+1} - x_k$, $y_k = g(x_{k+1}) - g(x_k)$, and $\rho_k = 1/(y_k^T s_k)$. In the L-BFGS method, instead of storing the matrices $H_k$, a prefixed number (say m) of vector pairs $\{s_i, y_i\}$ that define them implicitly are stored. Therefore, during the first m iterations the L-BFGS and the BFGS methods are identical, but when k > m only information from the m previous iterations is used to obtain $H_k$. The number m of BFGS corrections that must be kept can be specified by the user. Moreover, in L-BFGS the product $H_k g(x_k)$, which determines the search direction, is obtained by means of a recursive formula involving $g(x_k)$ and the most recent vector pairs $\{s_i, y_i\}$. An implementation of the L-BFGS method is available as routine VA15 within the Harwell subroutine library [22]. An interesting numerical study of L-BFGS
method and a comparison of its numerical perfor- 
mance with the discrete truncated Newton method 
and the Polak-Ribière conjugate gradient method are reported in [30]. The results of numerical experiments with limited-memory quasi-Newton and truncated Newton methods on standard library test problems and on two real-life large scale unconstrained optimization applications can be found in [45]. A method
which combines the discrete Newton method and the 
L-BFGS method is proposed in [4] to produce an ef- 
ficient algorithm able to handle also ill-conditioned 
problems. 
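The recursive formula for $H_k g(x_k)$ is commonly implemented as the so-called two-loop recursion. The sketch below illustrates it under stated assumptions and is not the VA15 code. The initial matrix is taken to be gamma*I, and the helper names are hypothetical. It also builds the explicit BFGS matrix from the same pairs with the correction formula above, so the two results can be compared.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lbfgs_direction(g, pairs, gamma=1.0):
    """Return H g by the two-loop recursion (the search direction is -H g).

    pairs: list of (s_i, y_i), oldest first, each with y_i^T s_i > 0.
    gamma: initial matrix H^0 = gamma * I (an assumption of this sketch).
    """
    q = list(g)
    history = []
    for s, y in reversed(pairs):                       # newest to oldest
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        q = [qi - a * yi for qi, yi in zip(q, y)]
        history.append((rho, a))
    r = [gamma * qi for qi in q]
    for (s, y), (rho, a) in zip(pairs, reversed(history)):  # oldest to newest
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return r


def bfgs_matrix(pairs, gamma, n):
    """Apply H <- V^T H V + rho s s^T, V = I - rho y s^T, starting from gamma*I."""
    H = [[gamma if i == j else 0.0 for j in range(n)] for i in range(n)]
    for s, y in pairs:
        rho = 1.0 / dot(y, s)
        V = [[(1.0 if i == j else 0.0) - rho * y[i] * s[j] for j in range(n)]
             for i in range(n)]
        HV = [[sum(H[i][k] * V[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        H = [[sum(V[k][i] * HV[k][j] for k in range(n)) + rho * s[i] * s[j]
              for j in range(n)] for i in range(n)]
    return H
```

Only vectors are stored in `lbfgs_direction`, which is the point of the limited-memory approach. The dense matrix in `bfgs_matrix` is there only to check that both give the same product.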

Limited memory quasi-Newton methods represent 
an adaptation of the quasi-Newton methods to large 
scale unstructured optimization. However, the quasi- 
Newton approach can be successfully applied to large 
scale problems with a particular structure. In fact, frequently, an optimization problem has some structure
which may be reflected in the sparsity of the Hessian 
matrix. In this framework, the most effective method 
is the partitioned quasi-Newton method proposed in 
[18,19]. It is based on the fact that a function f with 
a sparse Hessian is a partially separable function, i.e. it 
can be written in the form 


$f(x) = \sum_{i=1}^{p} f_i(x)$


where the element functions $f_i$ depend only on a few variables. Many practical problems can be formulated (or recast) in this form, which shows the wide applicability of this approach. The basic idea of the partitioned quasi-Newton method is to decompose the Hessian matrix into a sum of Hessians of the element functions $f_i$. Each approximation to the Hessian of $f_i$ is then updated by using dense updating techniques. These small matrices are assembled to define an approximation to the Hessian matrix of f, which is used to compute the search direction. However, the element Hessian matrices may not be positive definite, in which case the BFGS formula cannot be used and a symmetric rank one formula is used instead. Global convergence results have been obtained under a convexity assumption on the element functions $f_i$ [17]. An implementation of the partitioned quasi-Newton method is available as the VE08 routine of the Harwell subroutine library [22]. A comparison of the performance of partitioned quasi-Newton, L-BFGS, CG Polak-Ribière and truncated discrete Newton methods is reported in [33].
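To make the element-wise idea concrete, here is a small sketch using a hypothetical helper that is not the VE08 routine. It keeps one dense matrix per element function, updates each with the symmetric rank one formula using the element components of the step, and assembles the result. It assumes each element is described by its variable indices and an element gradient function. The safeguard that skips updates with a tiny denominator is a common choice, not taken from the text.

```python
def sr1_update(B, s, y, eps=1e-8):
    """Symmetric rank one update B + r r^T / (r^T s), with r = y - B s.

    The update is skipped when |r^T s| <= eps * |r| * |s|.
    """
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    r = [yi - bsi for yi, bsi in zip(y, Bs)]
    denom = sum(ri * si for ri, si in zip(r, s))
    norm_r = sum(ri * ri for ri in r) ** 0.5
    norm_s = sum(si * si for si in s) ** 0.5
    if abs(denom) <= eps * norm_r * norm_s:
        return B
    return [[B[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]


class PartitionedHessian:
    """Hypothetical helper: one small dense matrix per element function f_i.

    elements: list of (idx, grad), where idx is a tuple of variable indices
    and grad(u) returns the gradient of f_i at the element vector u.
    """

    def __init__(self, elements, n):
        self.elements = elements
        self.n = n
        self.B = [[[1.0 if a == c else 0.0 for c in range(len(idx))]
                   for a in range(len(idx))] for idx, _ in elements]

    def update(self, x, x_new):
        """Update every element matrix with its own part of the step."""
        for e, (idx, grad) in enumerate(self.elements):
            u = [x[i] for i in idx]
            u_new = [x_new[i] for i in idx]
            s = [a - c for a, c in zip(u_new, u)]
            y = [a - c for a, c in zip(grad(u_new), grad(u))]
            self.B[e] = sr1_update(self.B[e], s, y)

    def assemble(self):
        """Sum the element matrices into a dense n x n matrix (sparse in practice)."""
        H = [[0.0] * self.n for _ in range(self.n)]
        for (idx, _), Be in zip(self.elements, self.B):
            for a, i in enumerate(idx):
                for c, j in enumerate(idx):
                    H[i][j] += Be[a][c]
        return H
```

Consider quadratic elements with Hessian [[3, 1], [1, 2]] on a chain of four variables. Two steps with independent element components are enough for each element approximation to become exact, so the assembled matrix equals the tridiagonal true Hessian.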

Another class of methods which has been extended 
to large sparse unconstrained optimization are tensor 
methods [3]. Tensor methods are based on a fourth-order model of the objective function and are particularly suited for problems where the Hessian matrix has a small rank deficiency.

To conclude, it is worth noting that in dealing with large scale unconstrained problems with a very
large number of variables (more than 10*) high per- 
formance computer architectures must be considered. 
See e.g. [2] for the solution of large scale optimization 
problems on vector and parallel architectures. 

The reader can find the details of the methods men- 
tioned in this brief survey in the specific cited refer- 
ences. 
In the field of nonlinear programming (in continuous variables), convex analysis [20,21] plays a pivotal role both in theory and in practice. An analogous theory for discrete optimization (nonlinear integer programming), called "discrete convex analysis" [15,16], is developed for L-convex and M-convex functions by adapting
the ideas in convex analysis and generalizing the results 
in matroid theory. The L- and M-convex functions are 
introduced in [15] and [12,18], respectively. 


Definitions of L- and M-Convexity 


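Let V be a nonempty finite set and $\mathbb{Z}$ be the set of integers. For any function $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ define $\operatorname{dom} g = \{p \in \mathbb{Z}^V : g(p) < +\infty\}$, called the effective domain of g. A function $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ with $\operatorname{dom} g \neq \emptyset$ is called L-convex if

$g(p) + g(q) \ge g(p \vee q) + g(p \wedge q) \quad (p, q \in \mathbb{Z}^V),$
$\exists r \in \mathbb{Z}: \; g(p + \mathbf{1}) = g(p) + r \quad (p \in \mathbb{Z}^V),$

where $p \vee q = (\max(p(v), q(v)) \mid v \in V) \in \mathbb{Z}^V$, $p \wedge q = (\min(p(v), q(v)) \mid v \in V) \in \mathbb{Z}^V$, and $\mathbf{1}$ is the vector in $\mathbb{Z}^V$ with all components equal to 1.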
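The two defining conditions can be tested by brute force on a small box of integer points. The sketch below (hypothetical function name) can only refute L-convexity, since it inspects finitely many points. The join and meet of two box points stay in the box, so the submodularity test needs no points outside it.

```python
from itertools import product


def is_l_convex_on_box(g, n, lo, hi):
    """Test submodularity and linearity along 1 for p in the box [lo, hi]^n."""
    box = list(product(range(lo, hi + 1), repeat=n))
    for p in box:
        for q in box:
            join = tuple(max(a, b) for a, b in zip(p, q))   # p v q
            meet = tuple(min(a, b) for a, b in zip(p, q))   # p ^ q
            if g(p) + g(q) < g(join) + g(meet):
                return False                                # submodularity fails
    # g(p + 1) - g(p) must be one constant r for every p tested
    diffs = {g(tuple(a + 1 for a in p)) - g(p) for p in box}
    return len(diffs) == 1
```

For example, $\sum_{i<j} (p_i - p_j)^2$, a sum of convex functions of differences, passes the test. $(p_1 + p_2)^2$ fails it.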

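A set $D \subseteq \mathbb{Z}^V$ is said to be an L-convex set if its indicator function $\delta_D$ (defined by $\delta_D(p) = 0$ if $p \in D$, and $= +\infty$ otherwise) is an L-convex function, i.e., if
i) $D \neq \emptyset$;
ii) $p, q \in D \Rightarrow p \vee q,\ p \wedge q \in D$; and
iii) $p \in D \Rightarrow p \pm \mathbf{1} \in D$.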

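A function $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ with $\operatorname{dom} f \neq \emptyset$ is called M-convex if it satisfies
• (M-EXC) For $x, y \in \operatorname{dom} f$ and $u \in \operatorname{supp}^+(x - y)$, there exists $v \in \operatorname{supp}^-(x - y)$ such that

$f(x) + f(y) \ge f(x - \chi_u + \chi_v) + f(y + \chi_u - \chi_v),$

where, for any $u \in V$, $\chi_u$ is the characteristic vector of u (defined by $\chi_u(v) = 1$ if $v = u$, and $= 0$ otherwise), and

$\operatorname{supp}^+(z) = \{v \in V : z(v) > 0\} \quad (z \in \mathbb{Z}^V),$
$\operatorname{supp}^-(z) = \{v \in V : z(v) < 0\} \quad (z \in \mathbb{Z}^V).$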

A set $B \subseteq \mathbb{Z}^V$ is said to be an M-convex set if its indicator function is an M-convex function, i.e., if B satisfies
• (B-EXC) For $x, y \in B$ and for $u \in \operatorname{supp}^+(x - y)$, there exists $v \in \operatorname{supp}^-(x - y)$ such that $x - \chi_u + \chi_v \in B$ and $y + \chi_u - \chi_v \in B$.

This means that an M-convex set is the same as the set 
of integer points of the base polyhedron of an integral 
submodular system (see [8] for submodular systems). 
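(M-EXC) can likewise be checked by enumeration. In this sketch f returns infinity outside dom f, and only pairs x, y from the given candidate points are tested. The helper names are hypothetical. Consider a separable convex function on the M-convex set {x ≥ 0 : x_1 + x_2 + x_3 = 3}: it should pass, while its negative should fail.

```python
from itertools import product

INF = float("inf")


def integer_box(lo, hi, n):
    return list(product(range(lo, hi + 1), repeat=n))


def satisfies_m_exc(f, points):
    """Check (M-EXC) for all x, y in dom f among the candidate points."""
    dom = [x for x in points if f(x) < INF]
    n = len(dom[0])
    for x in dom:
        for y in dom:
            d = [a - b for a, b in zip(x, y)]
            for u in [i for i in range(n) if d[i] > 0]:        # supp+(x - y)
                ok = False
                for v in [j for j in range(n) if d[j] < 0]:    # supp-(x - y)
                    x2 = list(x); x2[u] -= 1; x2[v] += 1        # x - chi_u + chi_v
                    y2 = list(y); y2[u] += 1; y2[v] -= 1        # y + chi_u - chi_v
                    if f(x) + f(y) >= f(tuple(x2)) + f(tuple(y2)):
                        ok = True
                        break
                if not ok:
                    return False
    return True
```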

L-convexity and M-convexity are conjugate to each other under the integral Fenchel-Legendre transformation $f \mapsto f^\bullet$ defined by

$f^\bullet(p) = \sup\{\langle p, x\rangle - f(x) : x \in \mathbb{Z}^V\} \quad (p \in \mathbb{Z}^V),$

where $\langle p, x\rangle = \sum_{v \in V} p(v)\, x(v)$. That is, for an L-convex function g and an M-convex function f, it holds [15] that $g^\bullet$ is M-convex, $f^\bullet$ is L-convex, $g^{\bullet\bullet} = g$, and $f^{\bullet\bullet} = f$.
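For a function with a finite effective domain, the supremum in $f^\bullet$ is a maximum and can be computed directly. The sketch below (hypothetical helper names) does this. It also evaluates the biconjugate with p restricted to a finite box, which in general only gives a lower bound on $f^{\bullet\bullet}$. In the one-dimensional test, f(x) = x^2 on {-2, ..., 2}, the box is large enough to recover f exactly.

```python
from itertools import product


def integer_box(lo, hi, n):
    return list(product(range(lo, hi + 1), repeat=n))


def conjugate(f_values, p_points):
    """f*(p) = max_x <p, x> - f(x), with f given as {x: f(x)} on its finite domain."""
    return {p: max(sum(a * b for a, b in zip(p, x)) - fx for x, fx in f_values.items())
            for p in p_points}


def biconjugate_on(f_values, p_box):
    """Evaluate f** on dom f using only dual points in p_box (a lower bound in general)."""
    f_star = conjugate(f_values, p_box)
    return {x: max(sum(a * b for a, b in zip(p, x)) - fs for p, fs in f_star.items())
            for x in f_values}
```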
Example 1 (Minimum cost flow problem) L-convexity and M-convexity are inherent in the integer minimum-cost flow problem, as pointed out in [12,15]. Let $G = (V, A)$ be a graph with vertex set V and arc set A, and let $T \subseteq V$ be given. For $\xi: A \to \mathbb{Z}$ its boundary $\partial\xi: V \to \mathbb{Z}$ is defined by

$\partial\xi(v) = \sum\{\xi(a) : a \in \delta^+ v\} - \sum\{\xi(a) : a \in \delta^- v\} \quad (v \in V),$

where $\delta^+ v$ and $\delta^- v$ denote the sets of out-going and in-coming arcs incident to v, respectively. For $p: V \to \mathbb{Z}$ its coboundary $\delta p: A \to \mathbb{Z}$ is defined by

$\delta p(a) = p(\partial^+ a) - p(\partial^- a) \quad (a \in A),$

where $\partial^+ a$ and $\partial^- a$ mean the initial and terminal vertices of a, respectively. Denote the class of one-dimensional discrete convex functions by

$C_1 = \{\varphi: \mathbb{Z} \to \mathbb{Z} \cup \{+\infty\} \mid \operatorname{dom}\varphi \neq \emptyset,\ \varphi(t-1) + \varphi(t+1) \ge 2\varphi(t)\ (t \in \mathbb{Z})\}.$

For $\varphi_a \in C_1$ $(a \in A)$, representing the arc cost in terms of flow, the total cost function $f: \mathbb{Z}^T \to \mathbb{Z} \cup \{+\infty\}$ defined by

$f(x) = \inf_{\xi}\Big\{\sum_{a \in A} \varphi_a(\xi(a)) : \partial\xi(v) = -x(v)\ (v \in T),\ \partial\xi(v) = 0\ (v \in V \setminus T)\Big\} \quad (x \in \mathbb{Z}^T)$

is M-convex, provided that $f > -\infty$ (i.e., f does not take the value $-\infty$). For $\psi_a \in C_1$ $(a \in A)$, representing the arc cost in terms of tension, the total cost function $g: \mathbb{Z}^T \to \mathbb{Z} \cup \{+\infty\}$ defined by

$g(p) = \inf_{\eta, \tilde p}\Big\{\sum_{a \in A} \psi_a(\eta(a)) : \eta = -\delta\tilde p,\ \tilde p(v) = p(v)\ (v \in T)\Big\} \quad (p \in \mathbb{Z}^T)$

is L-convex, provided that $g > -\infty$. The two cost functions f(x) and g(p) are conjugate to each other in the sense that, if $\psi_a = \varphi_a^\bullet$ $(a \in A)$, then $g = f^\bullet$.
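The boundary and coboundary operators are easy to state in code. In this sketch the names are hypothetical, and arcs are given as a dict from arc label to (initial vertex, terminal vertex). It also checks the pairing identity $\sum_v p(v)\,\partial\xi(v) = \sum_a \delta p(a)\,\xi(a)$, which follows directly from the two definitions.

```python
def boundary(arcs, xi, vertices):
    """d(v) = sum of xi over arcs leaving v minus sum over arcs entering v."""
    d = {v: 0 for v in vertices}
    for a, (tail, head) in arcs.items():
        d[tail] += xi[a]
        d[head] -= xi[a]
    return d


def coboundary(arcs, p):
    """delta p(a) = p(initial vertex of a) - p(terminal vertex of a)."""
    return {a: p[tail] - p[head] for a, (tail, head) in arcs.items()}


def pairing_identity_holds(arcs, xi, p, vertices):
    d = boundary(arcs, xi, vertices)
    dp = coboundary(arcs, p)
    return sum(p[v] * d[v] for v in vertices) == sum(dp[a] * xi[a] for a in arcs)
```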


Example 2 (Polynomial matrix) Let A(s) be an m × n matrix of rank m with each entry being a polynomial in a variable s, and let $\mathcal{B} \subseteq 2^V$ be the family of bases of A(s) with respect to linear independence of the column vectors. Namely, $J \subseteq V$ belongs to $\mathcal{B}$ if and only if |J| = m and the column vectors with indices in J are linearly independent. Then $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ defined by

$f(x) = \begin{cases} -\deg_s \det A[J] & (x = \chi_J,\ J \in \mathcal{B}) \\ +\infty & (\text{otherwise}) \end{cases}$

is M-convex, where $\chi_J \in \{0, 1\}^V$ is the characteristic vector of J (defined by $\chi_J(v) = 1$ if $v \in J$, and $= 0$ otherwise), A[J] denotes the m × m submatrix with column indices in $J \in \mathcal{B}$, and $\deg_s(\cdot)$ means the degree as a polynomial in s. The Grassmann-Plücker identity implies the exchange property of f. This example was the motivation of valuated matroids in [2,3], which in turn can be identified with the negative of M-convex functions f with $\operatorname{dom} f \subseteq \{0, 1\}^V$.

For $p = (p(v))_{v \in V} \in \mathbb{Z}^V$ denote by D(p) the diagonal matrix of order n = |V| with diagonal elements $s^{p(v)}$ $(v \in V)$. Then the function $g: \mathbb{Z}^V \to \mathbb{Z}$ defined by

$g(p) = \max\{\deg_s \det (A \cdot D(p))[J] : J \in \mathcal{B}\}$

is L-convex [16], where $(A \cdot D(p))[J]$ means the m × m submatrix of $A \cdot D(p)$ with column indices in J. We have $g = f^\bullet$. Indeed, $\det (A \cdot D(p))[J] = s^{p(J)} \det A[J]$ with $p(J) = \sum_{v \in J} p(v)$, so $g(p) = \max_{J \in \mathcal{B}} \{p(J) + \deg_s \det A[J]\} = f^\bullet(p)$.


L-Convex Sets 


An L-convex set $D \subseteq \mathbb{Z}^V$ has 'no holes' in the sense that $D = \overline{D} \cap \mathbb{Z}^V$, where $\overline{D}$ denotes the convex hull of D in $\mathbb{R}^V$. Hence it is natural to consider the polyhedral description of $\overline{D}$, the 'L-convex polyhedron' (see [15,16]). For any function $\gamma: V \times V \to \mathbb{Z} \cup \{+\infty\}$ with $\gamma(v, v) = 0$ $(v \in V)$, define

$D(\gamma) = \{p \in \mathbb{R}^V : p(v) - p(u) \le \gamma(u, v)\ (\forall u, v \in V)\}.$

If $D(\gamma) \neq \emptyset$, then $D(\gamma)$ is an integral polyhedron and $D = D(\gamma) \cap \mathbb{Z}^V$ is an L-convex set. If γ satisfies the triangle inequality:

$\gamma(u, v) + \gamma(v, w) \ge \gamma(u, w) \quad (u, v, w \in V),$

then $D(\gamma) \neq \emptyset$ and

$\gamma(u, v) = \sup\{p(v) - p(u) : p \in D(\gamma)\} \quad (u, v \in V).$

Conversely, for any nonempty $D \subseteq \mathbb{Z}^V$, the function

$\gamma(u, v) = \sup\{p(v) - p(u) : p \in D\} \quad (u, v \in V)$

satisfies the triangle inequality as well as $\gamma(v, v) = 0$ $(v \in V)$, and if D is L-convex, then $\overline{D} = D(\gamma)$. Thus there is a one-to-one correspondence between L-convex sets D and functions γ satisfying the triangle inequality. In particular, $D \subseteq \mathbb{Z}^V$ is L-convex if and only if $D = D(\gamma) \cap \mathbb{Z}^V$ for some γ satisfying the triangle inequality. For L-convex sets $D_1, D_2 \subseteq \mathbb{Z}^V$, it holds that $D_1 + D_2 = (\overline{D_1} + \overline{D_2}) \cap \mathbb{Z}^V$ and $\overline{D_1 \cap D_2} = \overline{D_1} \cap \overline{D_2}$.
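$D(\gamma)$ is a system of difference constraints $p(v) - p(u) \le \gamma(u, v)$, so an integer point of it can be found by a shortest-path computation. This is a standard technique not described in the text. The sketch (hypothetical name) runs Bellman-Ford from a virtual source joined to every vertex by an arc of length 0, treats pairs not listed in `gamma` as $+\infty$, and reports emptiness when a negative cycle exists. When γ satisfies the triangle inequality, every cycle has nonnegative length, consistent with $D(\gamma) \neq \emptyset$.

```python
def point_in_D(vertices, gamma):
    """Return an integer p with p(v) - p(u) <= gamma[(u, v)], or None if D(gamma) is empty.

    gamma: dict {(u, v): integer}; pairs not listed are treated as +inf.
    """
    dist = {v: 0 for v in vertices}        # after relaxing the 0-length source arcs
    edges = [(u, v, w) for (u, v), w in gamma.items() if u != v]
    for _ in range(len(vertices)):
        changed = False
        for u, v, w in edges:               # enforce p(v) <= p(u) + gamma(u, v)
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist
    return None                             # still relaxing: negative cycle
```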

It is also true that a function γ satisfying the triangle inequality corresponds one-to-one to a positively homogeneous M-convex function f (i.e., $f(\lambda x) = \lambda f(x)$ for $x \in \mathbb{Z}^V$ and $0 < \lambda \in \mathbb{Z}$). The correspondence $f \mapsto \gamma$ is given by

$\gamma(u, v) = f(\chi_v - \chi_u) \quad (u, v \in V),$

whereas $\gamma \mapsto f$ is given by

$f(x) = \inf\Big\{\sum_{u, v \in V} \lambda_{uv}\, \gamma(u, v) : \sum_{u, v \in V} \lambda_{uv} (\chi_v - \chi_u) = x,\ 0 \le \lambda_{uv} \in \mathbb{Z}\ (u, v \in V)\Big\} \quad (x \in \mathbb{Z}^V).$

The correspondence between L-convex sets and positively homogeneous M-convex functions via functions with the triangle inequality is a special case of the conjugacy relationship between L- and M-convex functions.


M-Convex Sets 


An M-convex set $B \subseteq \mathbb{Z}^V$ has 'no holes' in the sense that $B = \overline{B} \cap \mathbb{Z}^V$. Hence it is natural to consider the polyhedral description of $\overline{B}$, the 'M-convex polyhedron'. A set function $\rho: 2^V \to \mathbb{Z} \cup \{+\infty\}$ is said to be submodular if

$\rho(X) + \rho(Y) \ge \rho(X \cup Y) + \rho(X \cap Y) \quad (X, Y \subseteq V),$

where the inequality is considered to be satisfied if $\rho(X)$ or $\rho(Y)$ is equal to $+\infty$. It is assumed throughout that $\rho(\emptyset) = 0$ and $\rho(V) < +\infty$ for any set function $\rho: 2^V \to \mathbb{Z} \cup \{+\infty\}$. For a set function ρ, define

$P(\rho) = \{x \in \mathbb{R}^V : x(X) \le \rho(X)\ (\forall X \subseteq V),\ x(V) = \rho(V)\},$

where $x(X) = \sum_{v \in X} x(v)$. If ρ is submodular, $P(\rho)$ is a nonempty integral polyhedron, $B = P(\rho) \cap \mathbb{Z}^V$ is an M-convex set, and

$\rho(X) = \sup\{x(X) : x \in P(\rho)\} \quad (X \subseteq V).$

Conversely, for any nonempty $B \subseteq \mathbb{Z}^V$, define a set function ρ by

$\rho(X) = \sup\{x(X) : x \in B\} \quad (X \subseteq V).$

If B is M-convex, then ρ is submodular and $\overline{B} = P(\rho)$. Thus there is a one-to-one correspondence between M-convex sets B and submodular set functions ρ. In particular, $B \subseteq \mathbb{Z}^V$ is M-convex if and only if $B = P(\rho) \cap \mathbb{Z}^V$ for some submodular ρ. The correspondence $B \leftrightarrow \rho$ is a restatement of a well-known fact [4,8]. For M-convex sets $B_1, B_2 \subseteq \mathbb{Z}^V$, it holds that $B_1 + B_2 = (\overline{B_1} + \overline{B_2}) \cap \mathbb{Z}^V$ and $\overline{B_1 \cap B_2} = \overline{B_1} \cap \overline{B_2}$.
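A point of $B = P(\rho) \cap \mathbb{Z}^V$ can be generated by Edmonds' classical greedy procedure, a well-known technique not described above. Order V as $v_1, \ldots, v_n$ and set $x(v_j) = \rho(V_j) - \rho(V_{j-1})$. The sketch uses hypothetical helper names and verifies membership by enumerating all subsets, so it is meant only for tiny V.

```python
from itertools import combinations


def greedy_base(rho, order):
    """x(v_j) = rho(V_j) - rho(V_{j-1}) for the prefixes V_j of the given order."""
    x, prefix, prev = {}, frozenset(), 0
    for v in order:
        prefix = prefix | {v}
        val = rho(prefix)
        x[v] = val - prev
        prev = val
    return x


def in_base_polyhedron(x, rho, V):
    """Check x(X) <= rho(X) for all X subset of V and x(V) = rho(V)."""
    for k in range(len(V) + 1):
        for X in combinations(V, k):
            if sum(x[v] for v in X) > rho(frozenset(X)):
                return False
    return sum(x[v] for v in V) == rho(frozenset(V))
```

For the submodular $\rho(X) = \min(|X|, 2)$, every ordering yields a point of B. For the non-submodular $\rho(X) = |X|^2$, the greedy vector violates $x(X) \le \rho(X)$.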

It is also true that a submodular set function ρ corresponds one-to-one to a positively homogeneous L-convex function g. The correspondence $g \mapsto \rho$ is given by the restriction

$\rho(X) = g(\chi_X) \quad (X \subseteq V)$

($\chi_X$ is the characteristic vector of X), whereas $\rho \mapsto g$ is given by the Lovász extension (explained below). The correspondence between M-convex sets and positively homogeneous L-convex functions via submodular set functions is a special case of the conjugacy relationship between M- and L-convex functions.

For a set function $\rho: 2^V \to \mathbb{Z} \cup \{+\infty\}$, the Lovász extension [11] of ρ is a function $\hat\rho: \mathbb{R}^V \to \mathbb{R} \cup \{+\infty\}$ defined by

$\hat\rho(p) = \sum_{j=1}^{n} (p_j - p_{j+1})\, \rho(V_j) \quad (p \in \mathbb{R}^V),$

where, for each $p \in \mathbb{R}^V$, the elements of V are indexed as $\{v_1, \ldots, v_n\}$ (with n = |V|) in such a way that $p(v_1) \ge \cdots \ge p(v_n)$; $p_j = p(v_j)$, $V_j = \{v_1, \ldots, v_j\}$ for $j = 1, \ldots, n$, and $p_{n+1} = 0$. The right-hand side of the above expression is equal to $+\infty$ if and only if $p_j - p_{j+1} > 0$ and $\rho(V_j) = +\infty$ for some j with $1 \le j \le n - 1$. The Lovász extension $\hat\rho$ is indeed an extension of ρ, since $\hat\rho(\chi_X) = \rho(X)$ for $X \subseteq V$.

The relationship between submodularity and convexity is revealed by the statement [11] that a set function ρ is submodular if and only if its Lovász extension $\hat\rho$ is convex.

The restriction to $\mathbb{Z}^V$ of the Lovász extension of a submodular set function is a positively homogeneous L-convex function, and any positively homogeneous L-convex function can be obtained in this way [15].
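The Lovász extension formula translates almost verbatim into code. In this sketch ρ is assumed finite on all sets, so the $+\infty$ convention is not handled, and the function name is hypothetical. The test evaluates $\hat\rho$ at characteristic vectors and checks one instance of convexity for the submodular $\rho(X) = \min(|X|, 2)$.

```python
from itertools import combinations


def lovasz_extension(rho, V, p):
    """rho_hat(p) = sum_j (p_j - p_{j+1}) rho(V_j) with p_{n+1} = 0.

    rho takes a frozenset; p is a dict {v: real}.
    """
    order = sorted(V, key=lambda v: p[v], reverse=True)   # p(v_1) >= ... >= p(v_n)
    total, prefix = 0.0, frozenset()
    for j, v in enumerate(order):
        prefix = prefix | {v}                             # V_j
        nxt = p[order[j + 1]] if j + 1 < len(order) else 0.0
        total += (p[v] - nxt) * rho(prefix)
    return total
```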


Properties of L-Convex Functions 


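For any $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ and $x \in \mathbb{R}^V$, define $g[-x]: \mathbb{Z}^V \to \mathbb{R} \cup \{+\infty\}$ by

$g[-x](p) = g(p) - \langle p, x\rangle \quad (p \in \mathbb{Z}^V).$

The set of the minimizers of $g[-x]$ is denoted as $\operatorname{argmin}(g[-x])$.

Let $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ be L-convex. Then dom g is an L-convex set. For each $p \in \operatorname{dom} g$,

$\rho_p(X) = g(p + \chi_X) - g(p) \quad (X \subseteq V)$

is a submodular set function with $\rho_p(\emptyset) = 0$ and $\rho_p(V) < +\infty$.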

An L-convex function g can be extended to a convex function $\bar g: \mathbb{R}^V \to \mathbb{R} \cup \{+\infty\}$ through the Lovász extensions of the submodular set functions $\rho_p$ for $p \in \operatorname{dom} g$. Namely, for $p \in \operatorname{dom} g$ and $q \in [0, 1]^V$, it holds [15] that

$\bar g(p + q) = g(p) + \sum_{j=1}^{n} (q_j - q_{j+1}) \big(g(p + \chi_{V_j}) - g(p)\big),$

where, for each q, the elements of V are indexed as $\{v_1, \ldots, v_n\}$ (with n = |V|) in such a way that $q(v_1) \ge \cdots \ge q(v_n)$; $q_j = q(v_j)$, $V_j = \{v_1, \ldots, v_j\}$ for $j = 1, \ldots, n$, and $q_{n+1} = 0$. The expression of $\bar g$ shows that an L-convex function is an integrally convex function in the sense of [5].

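An L-convex function g enjoys discrete midpoint convexity:

$g(p) + g(q) \ge g\Big(\Big\lceil \frac{p + q}{2} \Big\rceil\Big) + g\Big(\Big\lfloor \frac{p + q}{2} \Big\rfloor\Big)$

for $p, q \in \mathbb{Z}^V$, where $\lceil p \rceil$ (or $\lfloor p \rfloor$) for any $p \in \mathbb{R}^V$ denotes the vector obtained by rounding up (or down) the components of p to the nearest integers.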
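Discrete midpoint convexity can be checked on a small box, with the integer ceiling and floor of (p + q)/2 computed coordinatewise. The sketch (hypothetical name) is a refutation test only. It is applied to the L-convex function $\sum_{i<j}(p_i - p_j)^2$ and to the quadratic $2p_1^2 - 2p_1p_2 + 2p_2^2$, whose coefficient matrix has nonpositive off-diagonal entries and nonnegative row sums. It is also applied to $(p_1 + p_2)^2$ as a non-example.

```python
from itertools import product


def midpoint_convex_on_box(g, n, lo, hi):
    """Check g(p) + g(q) >= g(ceil((p+q)/2)) + g(floor((p+q)/2)) on [lo, hi]^n."""
    box = list(product(range(lo, hi + 1), repeat=n))
    for p in box:
        for q in box:
            up = tuple(-((-(a + b)) // 2) for a, b in zip(p, q))   # ceiling
            down = tuple((a + b) // 2 for a, b in zip(p, q))       # floor
            if g(p) + g(q) < g(up) + g(down):
                return False
    return True
```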

The minimum of an L-convex function g is characterized by local minimality in the sense that, for $p \in \operatorname{dom} g$, $g(p) \le g(q)$ for all $q \in \mathbb{Z}^V$ if and only if $g(p + \mathbf{1}) = g(p) \le g(p + \chi_X)$ for all $X \subseteq V$.
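The local optimality criterion suggests a descent method: move to the best neighbour $p \pm \chi_X$ until none improves g. When $g(p + \mathbf{1}) = g(p)$, trying both signs is equivalent to trying $p + \chi_X$ alone, because $p - \chi_X = p + \chi_{V \setminus X} - \mathbf{1}$. The sketch enumerates all subsets X, so it is exponential in |V| and only illustrative. Since $\rho_p$ above is submodular, a more efficient method would find the best X by submodular function minimization. The helper name is hypothetical, and we assume g is integer valued and bounded below, so the loop terminates.

```python
from itertools import combinations


def l_convex_descent(g, p, max_steps=1000):
    """Local search over p +/- chi_X; a final point is a global minimizer of L-convex g."""
    p = tuple(p)
    n = len(p)
    for _ in range(max_steps):
        best, best_val = None, g(p)
        for k in range(1, n + 1):
            for X in combinations(range(n), k):
                for sign in (1, -1):
                    q = tuple(pi + sign if i in X else pi for i, pi in enumerate(p))
                    if g(q) < best_val:
                        best, best_val = q, g(q)
        if best is None:
            return p          # no improving neighbour: optimality certificate
        p = best
    raise RuntimeError("step limit reached")
```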

The minimizers of an L-convex function, if nonempty, form an L-convex set. For any $x \in \mathbb{R}^V$, $\operatorname{argmin}(g[-x])$, if nonempty, is an L-convex set. Conversely, this property characterizes L-convex functions under an auxiliary assumption.

A number of operations can be defined for L-convex functions [15,16]. For $x \in \mathbb{Z}^V$, $g[-x]$ is an L-convex function. For $a \in \mathbb{Z}^V$ and $\beta \in \mathbb{Z}$, $g(a + \beta p)$ is L-convex in p. For $U \subseteq V$, the projection of g to U:

$g^U(p') = \inf\{g(p', p'') : p'' \in \mathbb{Z}^{V \setminus U}\} \quad (p' \in \mathbb{Z}^U)$

is L-convex in p', provided that $g^U > -\infty$. For $\psi_v \in C_1$ $(v \in V)$,

$\tilde g(p) = \inf_{q \in \mathbb{Z}^V}\Big\{g(q) + \sum_{v \in V} \psi_v(p(v) - q(v))\Big\}$

is L-convex in $p \in \mathbb{Z}^V$, provided that $\tilde g > -\infty$. The sum of two (or more) L-convex functions is L-convex, provided that its effective domain is nonempty.


Properties of M-Convex Functions 
Let $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ be M-convex. Then dom f is an M-convex set. For each $x \in \operatorname{dom} f$,

$\gamma_x(u, v) = f(x - \chi_u + \chi_v) - f(x) \quad (u, v \in V)$

satisfies the triangle inequality [16].

An M-convex function f can be extended to a convex function $\bar f: \mathbb{R}^V \to \mathbb{R} \cup \{+\infty\}$, and the value of $\bar f(x)$ for $x \in \mathbb{R}^V$ is determined by $\{f(y) : y \in \mathbb{Z}^V,\ \lfloor x \rfloor \le y \le \lceil x \rceil\}$. That is, an M-convex function is an integrally convex function in the sense of [5].

The minimum of an M-convex function f is characterized by local minimality in the sense that, for $x \in \operatorname{dom} f$, $f(x) \le f(y)$ for all $y \in \mathbb{Z}^V$ if and only if $f(x) \le f(x - \chi_u + \chi_v)$ for all $u, v \in V$ [12,15,18].
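Similarly, the M-convex optimality criterion yields an exchange-based descent: repeatedly move to the best $x - \chi_u + \chi_v$. This is a plain local search sketch with a hypothetical name, not the polynomial-time algorithms cited in the Algorithms section. It assumes f returns infinity outside dom f and is integer valued and bounded below.

```python
def m_convex_descent(f, x, max_steps=1000):
    """Move to the best x - chi_u + chi_v while f strictly decreases.

    For M-convex f the returned point is a global minimizer.
    """
    x = tuple(x)
    n = len(x)
    for _ in range(max_steps):
        best, best_val = None, f(x)
        for u in range(n):
            for v in range(n):
                if u == v:
                    continue
                y = list(x)
                y[u] -= 1            # x - chi_u
                y[v] += 1            # ... + chi_v
                y = tuple(y)
                if f(y) < best_val:
                    best, best_val = y, f(y)
        if best is None:
            return x
        x = best
    raise RuntimeError("step limit reached")
```

With the separable convex $f(x) = \sum_i (x_i - t_i)^2$ on the M-convex set $\{x : x_1 + x_2 + x_3 = 4\}$ and t = (3, 1, 0), the search started at (4, 0, 0) should stop at t.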

The minimizers of an M-convex function, if nonempty, form an M-convex set. Moreover, for any $p \in \mathbb{R}^V$, $\operatorname{argmin}(f[-p])$, if nonempty, is an M-convex set. Conversely, this property characterizes M-convex functions, under an auxiliary assumption that the effective domain is bounded or the function can be extended to a convex function over $\mathbb{R}^V$ (see [12,15]).

The level set of an M-convex function is not necessarily an M-convex set, but it enjoys a weaker exchange property. Namely, for any $p \in \mathbb{R}^V$ and $\alpha \in \mathbb{R}$, the level set $S = \{x \in \mathbb{Z}^V : f[-p](x) \le \alpha\}$ of $f[-p]$ satisfies: for $x, y \in S$ and for $u \in \operatorname{supp}^+(x - y)$, there exists $v \in \operatorname{supp}^-(x - y)$ such that either $x - \chi_u + \chi_v \in S$ or $y + \chi_u - \chi_v \in S$. Conversely, this property characterizes M-convex functions [25].

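A number of operations can be defined for M-convex functions [15,16]. For $p \in \mathbb{Z}^V$, $f[-p]$ is an M-convex function. For $a \in \mathbb{Z}^V$, $f(a - x)$ and $f(a + x)$ are M-convex in x. For $U \subseteq V$, the restriction of f to U:

$f_U(x') = f(x', \mathbf{0}_{V \setminus U}) \quad (x' \in \mathbb{Z}^U)$

(where $\mathbf{0}_{V \setminus U}$ is the zero vector in $\mathbb{Z}^{V \setminus U}$) is M-convex in x', provided that $\operatorname{dom} f_U \neq \emptyset$. For $\varphi_v \in C_1$ $(v \in V)$,

$\tilde f(x) = f(x) + \sum_{v \in V} \varphi_v(x(v)) \quad (x \in \mathbb{Z}^V)$

is M-convex, provided that $\operatorname{dom} \tilde f \neq \emptyset$. In particular, a separable convex function $f(x) = \sum_{v \in V} \varphi_v(x(v))$ with dom f being an M-convex set is an M-convex function. For two M-convex functions $f_1$ and $f_2$, the integral convolution

$(f_1 \,\square\, f_2)(x) = \inf\{f_1(x_1) + f_2(x_2) : x = x_1 + x_2,\ x_1, x_2 \in \mathbb{Z}^V\} \quad (x \in \mathbb{Z}^V)$

is either M-convex or else $(f_1 \,\square\, f_2)(x) = -\infty$ for all $x \in \mathbb{Z}^V$.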

The sum of two M-convex functions is not necessarily M-convex; such a function with nonempty effective domain is called $M_2$-convex. The convolution of two L-convex functions is not necessarily L-convex; such a function with nonempty effective domain is called $L_2$-convex. $M_2$- and $L_2$-convex functions are in one-to-one correspondence through the integral Fenchel-Legendre transformation.


L♮- and M♮-Convexity


L♮- and M♮-convexity are variants of, and essentially equivalent to, L- and M-convexity, respectively. L♮- and M♮-convex functions are introduced in [9] and [19], respectively.

Let $v_0$ be a new element not in V and define $\tilde V = \{v_0\} \cup V$. A function $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ with $\operatorname{dom} g \neq \emptyset$ is called L♮-convex if it is expressed in terms of an L-convex function $\hat g: \mathbb{Z}^{\tilde V} \to \mathbb{Z} \cup \{+\infty\}$ as $g(p) = \hat g(0, p)$. Namely, an L♮-convex function is a function obtained as the restriction of an L-convex function. Conversely, an L♮-convex function determines the corresponding L-convex function up to the constant r in the definition of L-convex function.

An L♮-convex function is essentially the same as a submodular integrally convex function of [5], and hence is characterized by discrete midpoint convexity [9]. An L-convex function, enjoying discrete midpoint convexity, is an L♮-convex function.

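A quadratic function

$g(p) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, p_i p_j \quad (p \in \mathbb{Z}^n)$

with $a_{ij} = a_{ji} \in \mathbb{Z}$ is L♮-convex if and only if $a_{ij} \le 0$ $(i \neq j)$ and $\sum_{j=1}^{n} a_{ij} \ge 0$ $(i = 1, \ldots, n)$. For $\{\psi_i \in C_1 : i = 1, \ldots, n\}$, a separable convex function

$g(p) = \sum_{i=1}^{n} \psi_i(p_i) \quad (p \in \mathbb{Z}^n)$

is L♮-convex.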

The properties of L-convex functions mentioned above carry over, mutatis mutandis, to L♮-convex functions. In addition, the restriction of an L♮-convex function g to $U \subseteq V$, denoted $g_U$, is L♮-convex.

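A subset of $\mathbb{Z}^V$ is called an L♮-convex set if its indicator function is an L♮-convex function. A set $E \subseteq \mathbb{Z}^V$ is an L♮-convex set if and only if

$p, q \in E \Rightarrow \Big\lceil \frac{p + q}{2} \Big\rceil,\ \Big\lfloor \frac{p + q}{2} \Big\rfloor \in E.$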


A function $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ with $\operatorname{dom} f \neq \emptyset$ is called M♮-convex if it is expressed in terms of an M-convex function $\hat f: \mathbb{Z}^{\tilde V} \to \mathbb{Z} \cup \{+\infty\}$ as

$\hat f(x_0, x) = \begin{cases} f(x) & \text{if } x_0 + \sum_{v \in V} x(v) = 0, \\ +\infty & \text{otherwise}. \end{cases}$

Namely, an M♮-convex function is a function obtained as the projection of an M-convex function. Conversely, an M♮-convex function determines the corresponding M-convex function up to a translation of $\operatorname{dom} \hat f$ in the direction of $v_0$. A function $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ with $\operatorname{dom} f \neq \emptyset$ is M♮-convex if and only if (see [19]) it satisfies

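• (M♮-EXC) For $x, y \in \operatorname{dom} f$ and $u \in \operatorname{supp}^+(x - y)$,

$f(x) + f(y) \ge \min\Big( f(x - \chi_u) + f(y + \chi_u),\ \min_{v \in \operatorname{supp}^-(x - y)} \{ f(x - \chi_u + \chi_v) + f(y + \chi_u - \chi_v) \} \Big).$

Since (M-EXC) implies (M♮-EXC), an M-convex function is an M♮-convex function.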
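A quadratic function

$f(x) = \sum_{i=1}^{n} a_i x_i^2 + b \sum_{i<j} x_i x_j \quad (x \in \mathbb{Z}^n)$

with $a_i \in \mathbb{Z}$ $(1 \le i \le n)$ and $b \in \mathbb{Z}$ is M♮-convex if $0 \le b \le 2 \min_{1 \le i \le n} a_i$ (cf. [19]). For $\{\varphi_i \in C_1 : i = 0, \ldots, n\}$, a function of the form

$f(x) = \varphi_0\Big(\sum_{i=1}^{n} x_i\Big) + \sum_{i=1}^{n} \varphi_i(x_i) \quad (x \in \mathbb{Z}^n)$

is M♮-convex [19]; a separable convex function is a special case of this (with $\varphi_0 = 0$). More generally, for $\{\varphi_X \in C_1 : X \in \mathcal{T}\}$ indexed by a laminar family $\mathcal{T} \subseteq 2^V$, the function

$f(x) = \sum_{X \in \mathcal{T}} \varphi_X(x(X)) \quad (x \in \mathbb{Z}^V)$

is M♮-convex [1], where $\mathcal{T}$ is called laminar if for any $X, Y \in \mathcal{T}$, at least one of $X \cap Y$, $X \setminus Y$, $Y \setminus X$ is empty.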
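(M♮-EXC) can be tested by enumeration in the same way as (M-EXC). The sketch (hypothetical names) applies it to the quadratic family above, restricted to the box $\{0, 1, 2\}^2$. Restricting to a box amounts to adding a separable convex indicator function. With $a_1 = a_2 = 1$, the coefficient b = 1 satisfies $0 \le b \le 2\min_i a_i$, whereas b = 3 does not. For b = 3 the pair x = (2, 0), y = (0, 2) violates the exchange inequality: f(x) + f(y) = 8, but both candidate exchanges give at least 10.

```python
from itertools import product

INF = float("inf")


def box_points(lo, hi, n):
    return list(product(range(lo, hi + 1), repeat=n))


def satisfies_m_natural_exc(f, points):
    """Brute-force (M-natural-EXC) for x, y in dom f among the given points."""
    dom = [x for x in points if f(x) < INF]
    for x in dom:
        for y in dom:
            d = [a - b for a, b in zip(x, y)]
            for u in [i for i, di in enumerate(d) if di > 0]:
                x1 = list(x); x1[u] -= 1                      # x - chi_u
                y1 = list(y); y1[u] += 1                      # y + chi_u
                best = f(tuple(x1)) + f(tuple(y1))
                for v in [j for j, dj in enumerate(d) if dj < 0]:
                    x2 = list(x1); x2[v] += 1                 # x - chi_u + chi_v
                    y2 = list(y1); y2[v] -= 1                 # y + chi_u - chi_v
                    best = min(best, f(tuple(x2)) + f(tuple(y2)))
                if f(x) + f(y) < best:
                    return False
    return True


def quadratic_on_box(a, b, allowed):
    """f(x) = sum a_i x_i^2 + b sum_{i<j} x_i x_j if every x_i is allowed, else INF."""
    def f(x):
        if any(xi not in allowed for xi in x):
            return INF
        n = len(x)
        return (sum(a[i] * x[i] ** 2 for i in range(n))
                + b * sum(x[i] * x[j] for i in range(n) for j in range(i + 1, n)))
    return f
```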

The properties of M-convex functions mentioned above carry over, mutatis mutandis, to M♮-convex functions. In addition, the projection of an M♮-convex function f to $U \subseteq V$, denoted $f^U$, is M♮-convex.

A subset of $\mathbb{Z}^V$ is called an M♮-convex set if its indicator function is an M♮-convex function. A set $Q \subseteq \mathbb{Z}^V$ is an M♮-convex set if and only if Q is the set of integer points of an integral generalized polymatroid (cf. [7] for generalized polymatroids).

As a consequence of the conjugacy between L- and M-convexity, L♮-convex functions and M♮-convex functions are conjugate to each other under the integral Fenchel-Legendre transformation.


Duality 


Discrete duality theorems hold true for L-convex/concave and M-convex/concave functions. A function $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{-\infty\}$ is called L-concave (respectively, L♮-, M-, or M♮-concave) if −g is L-convex (respectively, L♮-, M-, or M♮-convex); dom g means the effective domain of −g. The concave counterpart of the discrete Fenchel-Legendre transform is defined as

$g^\circ(p) = \inf\{\langle p, x\rangle - g(x) : x \in \mathbb{Z}^V\} \quad (p \in \mathbb{Z}^V).$

A discrete separation theorem for L-convex/concave functions, named the L-separation theorem [15] (see also [9]), reads as follows. Let $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ be an L♮-convex function and $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{-\infty\}$ be an L♮-concave function such that $\operatorname{dom} f \cap \operatorname{dom} g \neq \emptyset$ or $\operatorname{dom} f^\bullet \cap \operatorname{dom} g^\circ \neq \emptyset$. If $f(p) \ge g(p)$ $(p \in \mathbb{Z}^V)$, there exist $\beta^* \in \mathbb{Z}$ and $x^* \in \mathbb{Z}^V$ such that

$f(p) \ge \beta^* + \langle p, x^*\rangle \ge g(p) \quad (p \in \mathbb{Z}^V).$

Since a submodular set function can be identified with a positively homogeneous L-convex function, the L-separation theorem implies Frank's discrete separation theorem for a pair of sub/supermodular functions [6], which reads as follows. Let $\rho: 2^V \to \mathbb{Z} \cup \{+\infty\}$ and $\mu: 2^V \to \mathbb{Z} \cup \{-\infty\}$ be submodular and supermodular functions, respectively, with $\rho(\emptyset) = \mu(\emptyset) = 0$, $\rho(V) < +\infty$, $\mu(V) > -\infty$, where μ is called supermodular if −μ is submodular. If $\rho(X) \ge \mu(X)$ $(X \subseteq V)$, there exists $x^* \in \mathbb{Z}^V$ such that

$\rho(X) \ge x^*(X) \ge \mu(X) \quad (X \subseteq V).$

Another discrete separation theorem, the M-separation theorem [12,15] (see also [9]), holds true for M-convex/concave functions. Namely, let $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ be an M♮-convex function and $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{-\infty\}$ be an M♮-concave function such that $\operatorname{dom} f \cap \operatorname{dom} g \neq \emptyset$ or $\operatorname{dom} f^\bullet \cap \operatorname{dom} g^\circ \neq \emptyset$. If $f(x) \ge g(x)$ $(x \in \mathbb{Z}^V)$, there exist $\alpha^* \in \mathbb{Z}$ and $p^* \in \mathbb{Z}^V$ such that

$f(x) \ge \alpha^* + \langle p^*, x\rangle \ge g(x) \quad (x \in \mathbb{Z}^V).$

The L- and M-separation theorems are conjugate to each other, while a self-conjugate statement can be made in the form of the Fenchel-type duality [12,15], as follows. Let $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ be an L♮-convex function and $g: \mathbb{Z}^V \to \mathbb{Z} \cup \{-\infty\}$ be an L♮-concave function such that $\operatorname{dom} f \cap \operatorname{dom} g \neq \emptyset$ or $\operatorname{dom} f^\bullet \cap \operatorname{dom} g^\circ \neq \emptyset$. Then it holds that

$\inf\{f(p) - g(p) : p \in \mathbb{Z}^V\} = \sup\{g^\circ(x) - f^\bullet(x) : x \in \mathbb{Z}^V\}.$

Moreover, if this common value is finite, the infimum is attained by some $p \in \operatorname{dom} f \cap \operatorname{dom} g$ and the supremum is attained by some $x \in \operatorname{dom} f^\bullet \cap \operatorname{dom} g^\circ$.


Example 3 Here is a simple example to illustrate the subtlety of discrete separation for discrete functions. Functions $f: \mathbb{Z}^2 \to \mathbb{Z}$ and $g: \mathbb{Z}^2 \to \mathbb{Z}$ defined by $f(x_1, x_2) = \max(0, x_1 + x_2)$ and $g(x_1, x_2) = \min(x_1, x_2)$ can be extended respectively to a convex function $\bar f: \mathbb{R}^2 \to \mathbb{R}$ and a concave function $\bar g: \mathbb{R}^2 \to \mathbb{R}$ according to the defining expressions. With $p = (\frac{1}{2}, \frac{1}{2})$, we have $\bar f(x) \ge \langle p, x\rangle \ge \bar g(x)$ for all $x \in \mathbb{R}^2$, and a fortiori $f(x) \ge \langle p, x\rangle \ge g(x)$ for all $x \in \mathbb{Z}^2$. However, there exists no integral vector $p \in \mathbb{Z}^2$ such that $f(x) \ge \langle p, x\rangle \ge g(x)$ for all $x \in \mathbb{Z}^2$. Note also that f is M♮-convex and g is L-concave.
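Example 3 can be confirmed numerically on a finite window. The sketch checks the separating vector p = (1/2, 1/2) with exact rational arithmetic and searches for integral separators in a box. A small window suffices here. The points (±1, 0) and (0, ±1) force $0 \le p_1, p_2 \le 1$, the points ±(1, 1) force $p_1 + p_2 = 1$, and (1, −1) and (−1, 1) then rule out (1, 0) and (0, 1).

```python
from fractions import Fraction
from itertools import product


def f(x):
    return max(0, x[0] + x[1])


def g(x):
    return min(x[0], x[1])


def separates(p, xs):
    """True if f(x) >= <p, x> >= g(x) for every x in xs."""
    return all(f(x) >= p[0] * x[0] + p[1] * x[1] >= g(x) for x in xs)


xs = list(product(range(-2, 3), repeat=2))                 # window of test points
half = (Fraction(1, 2), Fraction(1, 2))
integral = [p for p in product(range(-3, 4), repeat=2) if separates(p, xs)]
```

Here `separates(half, xs)` should hold, while `integral` should be empty.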


Network Duality 


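A conjugate pair of M- and L-convex functions can be transformed through a network ([12,16]; see also [23]). Let $G = (V, A)$ be a directed graph with arc set A and vertex set V partitioned into three disjoint parts as $V = V^+ \cup V^0 \cup V^-$. For $\varphi_a \in C_1$ $(a \in A)$ and an M-convex $f: \mathbb{Z}^{V^+} \to \mathbb{Z} \cup \{+\infty\}$, define $\tilde f: \mathbb{Z}^{V^-} \to \mathbb{Z} \cup \{+\infty\}$ by

$\tilde f(y) = \inf\Big\{ f(x) + \sum_{a \in A} \varphi_a(\xi(a)) : \partial\xi = (x, 0, -y) \in \mathbb{Z}^{V^+ \cup V^0 \cup V^-},\ \xi \in \mathbb{Z}^A \Big\}.$

For $\psi_a \in C_1$ $(a \in A)$ and an L-convex $g: \mathbb{Z}^{V^+} \to \mathbb{Z} \cup \{+\infty\}$, define $\tilde g: \mathbb{Z}^{V^-} \to \mathbb{Z} \cup \{+\infty\}$ by

$\tilde g(q) = \inf\Big\{ g(p) + \sum_{a \in A} \psi_a(\eta(a)) : \eta = -\delta(p, r, q),\ \eta \in \mathbb{Z}^A,\ (p, r, q) \in \mathbb{Z}^{V^+ \cup V^0 \cup V^-} \Big\}.$

Then $\tilde f$ is M-convex, provided that $\tilde f > -\infty$, and $\tilde g$ is L-convex, provided that $\tilde g > -\infty$. If $g = f^\bullet$ and $\psi_a = \varphi_a^\bullet$ $(a \in A)$, then $\tilde g = \tilde f^\bullet$. A special case ($V^+ = V$) of the last statement yields the network duality:

$\inf\{\Phi(x, \xi) : \partial\xi = x,\ x \in \mathbb{Z}^V,\ \xi \in \mathbb{Z}^A\} = \sup\{\Psi(p, \eta) : \eta = -\delta p,\ p \in \mathbb{Z}^V,\ \eta \in \mathbb{Z}^A\},$

where $\Phi(x, \xi) = f(x) + \sum_{a \in A} \varphi_a(\xi(a))$, $\Psi(p, \eta) = -g(p) - \sum_{a \in A} \psi_a(\eta(a))$, and the finiteness of inf Φ or sup Ψ is assumed. The network duality is equivalent to the Fenchel-type duality.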


Subdifferentials 


The subdifferential of $f: \mathbb{Z}^V \to \mathbb{Z} \cup \{+\infty\}$ at $x \in \operatorname{dom} f$ is defined by $\partial f(x) = \{p \in \mathbb{R}^V : f(y) - f(x) \ge \langle p, y - x\rangle\ (\forall y \in \mathbb{Z}^V)\}$. The subdifferential of an $L_2$- or $M_2$-convex function forms an integral polyhedron. More specifically:

• The subdifferential of an L-convex function is an integral base polyhedron (an M-convex polyhedron).

• The subdifferential of an $L_2$-convex function is the intersection of two integral base polyhedra (M-convex polyhedra).

• The subdifferential of an M-convex function is an L-convex polyhedron.

• The subdifferential of an $M_2$-convex function is the Minkowski sum of two L-convex polyhedra.

Similar statements hold true with L and M replaced respectively by L♮ and M♮.


Algorithms 


On the basis of the equivalence of L♮-convex functions and submodular integrally convex functions, the minimization of an L-convex function can be done by the algorithm of [5], which relies on the ellipsoid method. The minimization of an M-convex function can be done by purely combinatorial algorithms: a greedy-type algorithm [2] for valuated matroids and a domain reduction-type polynomial time algorithm [24] for M-convex functions. Algorithms for the duality of M-convex functions (in other words, for $M_2$-convex functions) have also been developed: polynomial algorithms [14,22] for valuated matroids, and a finite primal algorithm [18] and a polynomial time conjugate-scaling algorithm [10] for the submodular flow problem.
Applications


A discrete analog of the conjugate duality framework 
[21] for nonlinear optimization is developed in [15]. An 
application of M-convex functions to engineering sys- 
tem analysis and matrix theory is in [13,17]. M-convex 
functions find applications also in mathematical eco- 
nomics [1]. 


See also 


> Generalized Concavity in Multi-objective 
Optimization 

> Invexity and its Applications 

> Isotonic Regression Problems 
LCP: Pardalos-Rosen Mixed Integer Formulation


In this article we consider the general linear complementarity problem (LCP) of finding a vector $x \in \mathbb{R}^n$ such that

$Mx + q \ge 0, \quad x \ge 0, \quad x^T M x + q^T x = 0$

(or proving that such an x does not exist), where M is an n × n rational matrix and $q \in \mathbb{R}^n$ is a rational vector. For given data M and q, the problem is generally denoted by LCP(M, q). The LCP unifies a number of important problems in operations research. In particular, it generalizes the primal-dual linear programming problem, convex quadratic programming, and bimatrix games [1,2].

For a general matrix M, where $S = \{x : Mx + q \ge 0,\ x \ge 0\}$ can be bounded or unbounded, the LCP can always be solved by solving a specific zero-one, linear, mixed integer problem with n zero-one variables. Consider the following mixed zero-one integer problem:

$\text{(MIP)} \quad \max_{\alpha, y, z}\ \alpha \quad \text{s.t.} \quad 0 \le My + \alpha q \le e - z, \quad \alpha \ge 0, \quad 0 \le y \le z, \quad z \in \{0, 1\}^n,$

where e is the n-vector of all ones.

Theorem 1 Let $(\alpha^*, y^*, z^*)$ be any optimal solution of (MIP). If $\alpha^* > 0$, then $x^* = y^*/\alpha^*$ solves the LCP. If in the optimal solution $\alpha^* = 0$, then the LCP has no solution.


The equivalent mixed integer programming formulation (MIP) was first given in [3]. Every feasible point (α, y, z) of (MIP) with α > 0 corresponds to a solution of the LCP. Therefore, by solving (MIP), we may generate several solutions of the corresponding LCP. J.B. Rosen [4] proved that the solution obtained by solving (MIP) is the minimum norm solution to the linear complementarity problem.
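For very small instances, (MIP) can be solved by enumerating $z \in \{0, 1\}^n$. For a fixed z, the constraints force $y_i = 0$ when $z_i = 0$ and $(My + \alpha q)_i = 0$ when $z_i = 1$. If the principal submatrix $M_{ZZ}$ (Z = {i : z_i = 1}) is nonsingular, then y = αd for a fixed vector d, and the largest feasible α is 1 divided by the largest relevant coefficient. The exact-arithmetic sketch below (hypothetical names) illustrates Theorem 1 and is not a substitute for a mixed integer programming solver. It assumes q ≠ 0 and simply skips singular $M_{ZZ}$, so it may miss solutions in degenerate cases. Note that $x^* = y^*/\alpha^* = d$.

```python
from fractions import Fraction
from itertools import product


def solve(A, b):
    """Gauss-Jordan elimination over the rationals; None if A is singular."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return None
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * bv for a, bv in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lcp_via_mip(M, q):
    """Return (alpha*, x*) for (MIP) by enumeration, or None when alpha* = 0."""
    assert any(q), "this sketch assumes q != 0"
    n = len(q)
    best = None
    for z in product((0, 1), repeat=n):
        Z = [i for i in range(n) if z[i]]
        # z_i = 1 forces (M y + alpha q)_i = 0, i.e. M_ZZ y_Z = -alpha q_Z
        w = solve([[M[i][j] for j in Z] for i in Z], [-q[i] for i in Z]) if Z else []
        if w is None:
            continue                          # singular M_ZZ: skipped
        d = [Fraction(0)] * n                 # y = alpha * d (y_i = 0 off Z)
        for k, i in enumerate(Z):
            d[i] = w[k]
        s = [sum(M[i][j] * d[j] for j in range(n)) + q[i] for i in range(n)]
        # bounds: 0 <= alpha*d_i <= 1 on Z, 0 <= alpha*s_i <= 1 off Z
        coefs = [d[i] if z[i] else s[i] for i in range(n)]
        if any(c < 0 for c in coefs):
            continue                          # only alpha = 0 is feasible
        alpha = 1 / max(c for c in coefs if c > 0)
        if best is None or alpha > best[0]:
            best = (alpha, d)
    return best
```

For M = [[2, 1], [1, 2]] and q = (−5, −6), only z = (1, 1) admits α > 0, giving α* = 3/7 and x* = (4/3, 7/3) with Mx* + q = 0. For M = [[−1]] and q = (−1), the LCP has no solution and the sketch returns None.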


See also 


> Branch and Price: Integer Programming with 
Column Generation 

> Convex-simplex Algorithm 

> Decomposition Techniques for MILP: Lagrangian 
Relaxation 

> Equivalence Between Nonlinear Complementarity 
Problem and Fixed Point Problem 

> Generalized Nonlinear Complementarity Problem 


> Integer Linear Complementary Problem 

> Integer Programming 

> Integer Programming: Algebraic Methods 

> Integer Programming: Branch and Bound 
Methods 

> Integer Programming: Branch and Cut Algorithms 

> Integer Programming: Cutting Plane Algorithms 

> Integer Programming Duality 

> Integer Programming: Lagrangian Relaxation 

> Lemke Method 

> Linear Complementarity Problem 

> Linear Programming 

> Mixed Integer Classification Problems 

> Multi-objective Integer Linear Programming 

> Multi-objective Mixed Integer Programming 

> Multiparametric Mixed Integer Linear 
Programming 

> Order Complementarity 

> Parametric Linear Programming: Cost Simplex 
Algorithm 

> Parametric Mixed Integer Nonlinear Optimization 

> Principal Pivoting Methods for Linear 
Complementarity Problems 

> Sequential Simplex Method 

> Set Covering, Packing and Partitioning Problems 

> Simplicial Pivoting Algorithms for Integer 
Programming 

> Stochastic Integer Programming: Continuity, 
Stability, Rates of Convergence 

> Stochastic Integer Programs 

> Time-dependent Traveling Salesman Problem 

> Topological Methods in Complementarity Theory 


References 


1. Cottle RW, Dantzig GB (1968) Complementarity pivot theory 
of mathematical programming. In: Dantzig GB, Veinott AF 
(eds) Mathematics of the Decision Sci., Part 1. Amer. Math. 
Soc., Providence, RI, pp 115-136 

2. Horst R, Pardalos PM, Thoai NV (1995) Introduction to global 
optimization. Kluwer, Dordrecht 

3. Pardalos PM, Rosen JB (1988) Global optimization approach 
to the linear complementarity problem. SIAM J Sci Statist 
Comput 9(2):341-353 


4. Rosen JB (1990) Minimum norm solution to the linear complementarity problem. In: Leifman LJ (ed) Functional Analysis, Optimization and Mathematical Economics. Oxford Univ. Press, Oxford, pp 208-216


Least-index Anticycling Rules 


LindAcR 


TAMAS TERLAKY 
Department Comput. & Software, 
McMaster University, West Hamilton, Canada 


MSC2000: 90C05, 90C33, 90C20, 05B35 


Article Outline 


Keywords 
Consistent Labeling For the Max-Flow Problem 


Linear Optimization 
Least-Index Rules for Feasibility Problem 
The Linear Optimization Problem 
Least-Index Pivoting Methods for LO 


Linear Complementarity Problems 
Least-Index Rules and Oriented Matroids 
See also 

References 


Keywords 


Pivot rules; Anticycling; Least-index; Recursion; 
Oriented matroids 


From the early days of mathematical optimization, people have been looking for simple rules that ensure that certain algorithms terminate in a finite number of steps. Specifically, on combinatorial structures the lack of finite termination implies that the algorithm cycles, i.e. periodically visits the same solutions. That is why rules ensuring finite termination of algorithms on finite structures are frequently referred to as anticycling rules.

One frequently used anticycling rule in linear optimization (cf. ▶ Linear programming) is the so-called lexicographic pivoting rule [9]. The other large class of anticycling procedures, the ‘least-index’ rules, is the subject of this paper. Least-index rules were designed for network flow problems, linear optimization problems, linear complementarity problems and oriented matroid programming problems. These classes will be considered in the sequel.


Consistent Labeling For the Max-Flow Problem 

The maximal flow problem (see e.g. [11,24]) is one of the basic problems of mathematical programming. The problem is given as follows. A directed capacitated network (N, A, u) is given, where N, the set of nodes, is a finite set; $A \subseteq N \times N$ is the set of directed arcs; finally, $u \in \mathbb{R}^A$ denotes the nonnegative capacity upper bound for flows through the arcs. Further, let $s, t \in N$ be specified as the source and the sink of the network. A vector $f \in \mathbb{R}^A$ with $0 \le f \le u$ is a flow in the network if, at each node different from the source and the sink, the incoming flow is equal to the flow going out of the node. The goal is to find a maximal flow, namely a flow for which the total flow flowing out of the source or, equivalently, flowing into the sink is the largest possible. The Ford-Fulkerson algorithm is the best known algorithm for finding such a maximal flow. It is based on successively generating augmenting paths. A path P connecting the source s and the sink t is a finite subset of arcs where the source is the tail of the first arc, the sink is the head of the last arc, and the tail of each arc is equal to the head of its predecessor. For simplicity, let us assume that if $(v^1, v^2) \in A$, then $(v^2, v^1) \in A$ as well. If the opposite arc is not present, it can be introduced with zero capacity.


0 | Initialization.

Let f be equal to zero. Let a free capacity network $(N, \bar{A}, \bar{u})$ be defined. Initially let $\bar{A} = \{a \in A : u_a > 0\}$ and $\bar{u} = u$.

1 | Augmenting path.

Let P be a path from s to t in the free capacity network.

IF no such path exists, THEN STOP;

A maximal flow is obtained.

2 | Augmenting the flow.

Let $\vartheta$ be the minimum of the free arc capacities $\bar{u}_a$ along the path P. Clearly $\vartheta > 0$.

Increase the flow f on each arc of P by $\vartheta$.

3 | Update the free-capacity network.

Decrease (increase) $\bar{u}_a$ by $\vartheta$ if arc a (the opposite of arc a) is on the path P.

Let $\bar{A} = \{a \in A : \bar{u}_a > 0\}$.

Go to Step 1.


The Ford-Fulkerson max-flow algorithm


At each iteration the flow value strictly increases. Thus, if the vector u is integral and the max-flow problem is bounded, then the Ford-Fulkerson algorithm provides a maximal flow in a finite number of steps. However, if the vector u contains irrational components, the algorithm may fail to terminate in a finite number of steps and, even worse, it might converge to a nonoptimal flow. For such an example see [11,24]. An elegant solution to this problem is the consistent labeling algorithm of A.W. Tucker [28]. This simple refinement reads as follows:


Be consistent at any time during the algorithm, specifically when building the augmenting path by using the labeling procedure. Whenever a labeled but unscanned subset of nodes is given during the procedure, always pick the same node from the same subset to be scanned.

In particular, if we assign an index to each node, then we are supposed to always choose the least-indexed node among the possibilities.


Tucker writes [28]: “Fulkerson (unpublished) conjectured that a consistent labeling procedure would be polynomially bounded; a proof of this conjecture appears to be very difficult.”
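The consistent labeling rule can be sketched as follows. This minimal sketch assumes integer capacities, nodes numbered 0, ..., n − 1 (the number serving as the node's index) and a dense matrix of free (residual) capacities. The function name is hypothetical. In the labeling procedure, the node scanned next is always the least-indexed labeled but unscanned node, and its neighbours are examined in index order.

```python
def max_flow_consistent_labeling(n, arcs, s, t):
    """Ford-Fulkerson max flow with Tucker-style consistent labeling.
    Nodes are 0..n-1; arcs is a list of (tail, head, capacity) with integer capacities."""
    # free capacities; opposite arcs are implicitly present with zero capacity
    cap = [[0] * n for _ in range(n)]
    for u, v, c in arcs:
        cap[u][v] += c
    value = 0
    while True:
        pred = {s: None}              # labels: predecessor on the augmenting path
        unscanned = {s}
        while unscanned and t not in pred:
            u = min(unscanned)        # consistent choice: least-indexed unscanned node
            unscanned.remove(u)
            for v in range(n):        # scan neighbours in index order
                if cap[u][v] > 0 and v not in pred:
                    pred[v] = u
                    unscanned.add(v)
        if t not in pred:
            return value              # no augmenting path: the flow is maximal
        path, v = [], t
        while pred[v] is not None:    # walk back from the sink to the source
            path.append((pred[v], v))
            v = pred[v]
        delta = min(cap[u][v] for u, v in path)   # bottleneck free capacity
        for u, v in path:
            cap[u][v] -= delta        # decrease free capacity along the path
            cap[v][u] += delta        # increase it on the opposite arcs
        value += delta
```

Usage: `max_flow_consistent_labeling(2, [(0, 1, 5)], 0, 1)` returns 5. With integer capacities each augmentation raises the flow value by at least one unit, so the loop terminates. The consistent rule is what matters for the irrational case discussed above.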


Linear Optimization 


Before discussing the general LO problem, we first consider the linear feasibility problem.


0 | Initialization.

Let T(B) be an arbitrary basis tableau and fix an arbitrary ordering of the variables.

1 | Leaving variable selection.

Let $K_P$ be the set of the indices of the infeasible variables in the basis.

IF $K_P = \emptyset$, THEN STOP;

the feasibility problem is solved.

ELSE, let p be the least index in $K_P$; then $x_p$ will leave the basis.

2 | Entering variable selection.

Let $K_D$ be the set of the column indices of the negative elements in row p of T(B).

IF $K_D = \emptyset$, THEN STOP;

Row p of the tableau T(B) gives evidence that the feasibility problem is inconsistent, and row p of the inverse basis is (up to sign) a solution of the alternative system.

ELSE, let q be the least index in $K_D$; then $x_q$ will enter the basis.

3 | Basis transformation.

Pivot on (p, q). Go to Step 1.


Pivot rule


Least-Index Rules for Feasibility Problem
The feasibility problem

$$Ax = b, \quad x \ge 0,$$

and its alternative pair

$$b^T y > 0, \quad A^T y \le 0,$$

can be solved by a very simple least-index pivot algorithm. A fundamental result, the so-called Farkas lemma (cf. also ▶ Farkas lemma; ▶ Farkas lemma: Generalizations) [10], says that exactly one of the two alternative systems has a solution. This result is also known as the theorem of the alternatives. When a simple finite pivot rule gives a solution to either of the two alternatives, an elementary constructive proof of the Farkas lemma and its relatives is obtained. The above simple finite least-index pivot rule for the feasibility problem is a special case (see below) of Bland's algorithm [5]. It is taken from [19], where the role of pivoting, and specifically the role of finite, least-index pivot rules in linear algebra, is explored.
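A minimal sketch of the least-index feasibility rule in exact rational arithmetic follows. It assumes, as a convenience that is not part of the rule, that the last m columns of A form an identity matrix and so supply the initial basis tableau. `least_index_feasibility` and `pivot` are hypothetical names. When Step 2 stops, the code returns the negative of row p of the inverse basis. That row is read from the columns that initially held the identity, and its negative satisfies $b^T y > 0$, $A^T y \le 0$.

```python
from fractions import Fraction

def pivot(T, p, q):
    """Gauss-Jordan pivot on entry (p, q) of the tableau T (a list of rows)."""
    piv = T[p][q]
    T[p] = [v / piv for v in T[p]]
    for i in range(len(T)):
        if i != p and T[i][q] != 0:
            f = T[i][q]
            T[i] = [a - f * b for a, b in zip(T[i], T[p])]

def least_index_feasibility(A, b):
    """Least-index pivot rule for Ax = b, x >= 0.
    Assumption: the last m columns of A are the identity (initial basis)."""
    m, n = len(A), len(A[0])
    T = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    basis = list(range(n - m, n))                  # basis[i] = variable basic in row i
    while True:
        infeasible = [i for i in range(m) if T[i][n] < 0]
        if not infeasible:                         # K_P empty: feasible basic solution
            x = [Fraction(0)] * n
            for i, j in enumerate(basis):
                x[j] = T[i][n]
            return "feasible", x
        p = min(infeasible, key=lambda i: basis[i])   # least-indexed leaving variable
        cols = [j for j in range(n) if T[p][j] < 0]
        if not cols:                               # K_D empty: alternative system solvable
            y = [-T[p][j] for j in range(n - m, n)]   # minus row p of B^{-1}
            return "infeasible", y
        q = min(cols)                              # least-indexed entering variable
        pivot(T, p, q)
        basis[p] = q
```

For example, $x_0 + x_1 \ge 1$, $x_0 \le 3$ written with slacks as `A = [[-1, -1, 1, 0], [1, 0, 0, 1]]`, `b = [-1, 3]` is made feasible by a single pivot.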


The Linear Optimization Problem 


The general linear optimization (LO), or linear programming (cf. ▶ Linear programming), problem will be considered in the standard primal form

$$\min\{c^T x : Ax = b,\ x \ge 0\},$$

together with its standard dual

$$\max\{b^T y : A^T y \le c\}.$$


One of the most efficient, and for a long time the only practical, methods for solving LO problems was the simplex method of G.B. Dantzig. The simplex method is a pivot algorithm that traverses feasible basic solutions while the objective value improves. In practice the simplex method is one of the most efficient algorithms, but theoretically it is a finite algorithm only for nondegenerate problems.

A basis is called primal degenerate if at least one of 
the basic variables is zero; it is called dual degenerate 
if the reduced cost of at least one nonbasic variable is 
zero. In general, the basis is degenerate if it is either pri- 
mal or dual, or both primal and dual degenerate. The 
LO problem is degenerate if it has a degenerate basis. A pivot is called degenerate when the objective remains unchanged after the pivot. When the problem is degenerate, the objective might stay the same in subsequent iterations and the simplex algorithm may cycle, i.e. starting from a basis, the same basis is revisited after some iterations and this process is repeated endlessly. Because the simplex method produces a sequence of monotonically nonworsening objective values, the objective stays constant in a cycle; thus each pivot in the cycle must be degenerate. The possibility of cycling was recognized shortly after the invention of the simplex algorithm. Cycling examples were given by E.M.L. Beale [2] and by A.J. Hoffman [17]. More recently (1999), a scheme to construct cycling LO examples was presented in [15]. These examples made it evident that extra techniques are needed to ensure finite termination of simplex methods. The first and most widely used such tool is the class of lexicographic pivoting rules (cf. ▶ Lexicographic pivoting rules). Other, more recent techniques are the least-index anticycling rules and some more general recursive schemes.


Least-Index Pivoting Methods for LO 


Cycling of the simplex method is possible only when the 
LO problem is degenerate. In that case not only many 
variables might be eligible to enter, but also to leave 
the basis. The least-index primal simplex rule makes 
the selection of both the entering and the leaving vari- 
able uniquely determined. Least-index rules are based 
on consistent selection among the possibilities. The first 
such rule for the simplex method was published by R.G. 
Bland [4,5]. 

The least-index simplex method is finite. The finiteness proofs are quite elementary. All are based on the simple fact that there is a finite number of different basis tableaus. In addition, they use the orthogonality of the primal and dual spaces together with a recursive argument [4,5,27].

It is straightforward to derive the least-index dual simplex algorithm. The only restriction relative to the dual simplex algorithm is that, when there are several candidates to leave or to enter the basis, the least-indexed candidate must always be selected.

An interesting use of least-index resolution is made in [18], where finite primal-dual type Hungarian methods for LO are designed. Note that finite criss-cross rules (cf. also ▶ Criss-cross pivoting rules) [14,26] make the maximum possible use of least-index resolution.


0 | Initialization.

Let T(B) be a given primal feasible basis tableau and fix an arbitrary ordering of the variables.

1 | Entering variable selection.

Let $K_D$ be the set of the indices of the dual infeasible variables, i.e. those with negative reduced cost.

IF $K_D = \emptyset$, THEN STOP;

The tableau T(B) is optimal, and a pair of optimal primal and dual solutions is obtained.

ELSE, let q be the least index in $K_D$; then $x_q$ will enter the basis.

2 | Leaving variable selection.

Let $K_P$ be the set of the indices of those candidate pivot elements in column q that satisfy the usual pivot selection conditions of the primal simplex method.

IF $K_P = \emptyset$, THEN STOP;

the primal problem is unbounded, and so the dual problem is infeasible.

ELSE, let p be the least index in $K_P$; then $x_p$ will leave the basis.

3 | Basis transformation.

Pivot on (p, q). Go to Step 1.


The least-index primal simplex rule
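A minimal sketch of the least-index primal simplex rule, in exact arithmetic, for $\min\{c^T x : Ax = b, x \ge 0\}$. It assumes a starting primal feasible basis whose columns form an identity in A with $b \ge 0$, so that $[A \mid b]$ is already the basis tableau. The function names are hypothetical. "Usual pivot selection conditions" is read here as the standard minimum-ratio test, with ties broken by the least index of the basic variable. The test data follow the usual textbook form of Beale's degenerate cycling example. That form is our assumption and was not taken from [2]. Its optimal value, −5/4, was checked by hand through a dual feasible solution.

```python
from fractions import Fraction

def pivot(T, p, q):
    """Gauss-Jordan pivot on entry (p, q) of the tableau T (a list of rows)."""
    piv = T[p][q]
    T[p] = [v / piv for v in T[p]]
    for i in range(len(T)):
        if i != p and T[i][q] != 0:
            f = T[i][q]
            T[i] = [a - f * b for a, b in zip(T[i], T[p])]

def least_index_simplex(A, b, c, basis):
    """Least-index primal simplex. Assumes the columns in `basis` form an
    identity in A and b >= 0 (primal feasible starting tableau)."""
    m, n = len(A), len(A[0])
    T = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    basis = list(basis)
    while True:
        # reduced costs d_j = c_j - sum_i c_{B(i)} T[i][j]
        d = [Fraction(c[j]) - sum(Fraction(c[basis[i]]) * T[i][j] for i in range(m))
             for j in range(n)]
        K_D = [j for j in range(n) if d[j] < 0]
        if not K_D:                                    # optimal tableau
            x = [Fraction(0)] * n
            for i, j in enumerate(basis):
                x[j] = T[i][n]
            return "optimal", x
        q = min(K_D)                                   # least-indexed entering variable
        rows = [i for i in range(m) if T[i][q] > 0]
        if not rows:
            return "unbounded", None
        ratio = min(T[i][n] / T[i][q] for i in rows)   # minimum-ratio test
        K_P = [i for i in rows if T[i][n] / T[i][q] == ratio]
        p = min(K_P, key=lambda i: basis[i])           # least-indexed leaving variable
        pivot(T, p, q)
        basis[p] = q

F = Fraction
A = [[1, 0, 0, F(1, 4), -8, -1, 9],
     [0, 1, 0, F(1, 2), -12, F(-1, 2), 3],
     [0, 0, 1, 0, 0, 1, 0]]
b = [0, 0, 1]
c = [0, 0, 0, F(-3, 4), 20, F(-1, 2), 6]
status, x = least_index_simplex(A, b, c, [0, 1, 2])
```

The first pivot of this example already has a tie in the ratio test (both ratios are zero). The least-index choice resolves such ties consistently, which is what guarantees termination.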

Least-index simplex methods are not polynomial: they might require an exponential number of steps to solve an LO problem, as was shown by D. Avis and V. Chvátal [1]. Their example is essentially the Klee-Minty polytope [21]. Another example, again on the Klee-Minty polytope, is Roos's exponential example [25] for the least-index criss-cross method. Here the initial basis is feasible and, although this is not required, feasibility happens to be preserved; thus the criss-cross method reduces to a least-index simplex method.


Linear Complementarity Problems 


A linear complementarity problem (cf. ▶ Linear complementarity problem) (LCP) is given as follows:

$$-Mx + s = t, \quad x, s \ge 0, \quad x^T s = 0.$$

Pivot algorithms look for a complementary basic solution of the LCP. A basis is called complementary if, for each i, exactly one of the complementary variables $x_i$ and $s_i$ is in the basis.

The solvability of the LCP depends on the properties of the matrix M. One of the simplest cases is when M is a P-matrix. The matrix M is a P-matrix if all of its principal minors are positive. K.G. Murty [22] presented an extremely simple finite pivot algorithm for solving the P-matrix LCP. This algorithm is a least-index principal pivot algorithm.

Two extremal behaviors of this finite pivot rule, exponential in the worst case and polynomial on average, are studied in [13].

Finite least-index pivot rules have been developed for larger classes of LCPs. All are least-index principal pivoting methods. Some are more classical feasibility-preserving simplex-type methods [7,8,23], and others are least-index criss-cross pivoting rules (cf. ▶ Criss-cross pivoting rules) [6,16,20]. More details are given in ▶ Principal pivoting methods for linear complementarity problems.


0 | Initialization.

Let T(B) be a complementary basis tableau and fix an arbitrary ordering of the variables. (We can choose x = 0, s = t, i.e., x nonbasic, s basic.)

1 | Leaving variable selection.

Let K be the set of the infeasible variables.

IF $K = \emptyset$, THEN STOP;

a complementary solution for the LCP is obtained.

ELSE, let p be the least index in K.

2 | Basis transformation.

Pivot on (p, p), i.e. replace the least-indexed infeasible variable in the basis by its complementary pair.

Go to Step 1.


Murty’s Bard-type schema
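Murty's scheme can be sketched in exact arithmetic. The set J holds the indices i for which $x_i$ is basic, and $s_i$ is basic for all other indices. Each complementary basic solution then follows from $M_{JJ} x_J = -t_J$ and $s = Mx + t$. This is a minimal sketch assuming M is a P-matrix, so every principal submatrix is nonsingular. The iteration cap is our own safety guard and not part of Murty's rule. The function names are hypothetical.

```python
from fractions import Fraction

def solve(A, rhs):
    """Solve A z = rhs exactly by Gauss-Jordan elimination."""
    k = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(k):
        piv = next((r for r in range(col, k) if M[r][col] != 0), None)
        if piv is None:
            raise ZeroDivisionError("singular principal submatrix")
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def murty_lcp(M, t):
    """Least-index principal pivoting for -Mx + s = t, x, s >= 0, x^T s = 0,
    assuming M is a P-matrix. J = indices with x_i basic (s_i nonbasic)."""
    n = len(M)
    J = set()                                  # start: x = 0 nonbasic, s = t basic
    for _ in range(2 ** n + 1):                # safety cap (assumption, not part of the rule)
        idx = sorted(J)
        xJ = solve([[M[i][j] for j in idx] for i in idx], [-t[i] for i in idx])
        x = [Fraction(0)] * n
        for i, v in zip(idx, xJ):
            x[i] = v
        s = [sum(M[i][j] * x[j] for j in range(n)) + t[i] for i in range(n)]
        # value of the basic variable of each complementary pair
        basic = [x[i] if i in J else s[i] for i in range(n)]
        K = [i for i in range(n) if basic[i] < 0]
        if not K:
            return x, s
        p = min(K)                             # least-indexed infeasible pair
        J ^= {p}                               # pivot (p, p): exchange x_p and s_p
    raise RuntimeError("no complementary solution found; is M a P-matrix?")
```

For example, `murty_lcp([[1, 2], [0, 1]], [-1, -1])` first makes $x_0$ and then $x_1$ basic. It then sees $x_0 < 0$, pivots $x_0$ back out, and ends with x = (0, 1), s = (1, 0).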


Least-Index Rules and Oriented Matroids 


The least-index simplex method was originally designed for oriented matroid linear programming (cf. also ▶ Oriented matroids) [3,4]. It soon turned out that it is not a finite algorithm in the oriented matroid context. The reason is the possibility of nondegenerate cycling [3,12], a phenomenon that is impossible in the linear case. An apparent difference between the linear and the oriented matroid context is that, for oriented matroids, none of the finite-, recursive- or least-index-type rules yield a simplex method, i.e. a pivot method that preserves feasibility of the basis throughout. This discrepancy is also due to the possibility of nondegenerate cycling.


See also 


> Criss-cross Pivoting Rules 

> Lexicographic Pivoting Rules 

> Linear Programming 

> Pivoting Algorithms for Linear Programming 
Generating Two Paths 

> Principal Pivoting Methods for Linear 
Complementarity Problems 

> Probabilistic Analysis of Simplex Algorithms 
Let c be the linear functional on the space of complex polynomials defined by

$$c(x^i) = \begin{cases} c_i \in \mathbb{C}, & i \ge 0, \\ 0, & i < 0. \end{cases}$$

It is said that $\{P_k\}$ forms a family of (formal) orthogonal polynomials with respect to c if, for all k:

• $P_k$ has the exact degree k,

• $c(x^i P_k(x)) = 0$ for $i = 0, \ldots, k-1$.

Such a family exists if, for all k, the Hankel determinant

$$H_k^{(0)} = \begin{vmatrix} c_0 & c_1 & \cdots & c_{k-1} \\ c_1 & c_2 & \cdots & c_k \\ \vdots & \vdots & & \vdots \\ c_{k-1} & c_k & \cdots & c_{2k-2} \end{vmatrix}$$

is different from zero. Such polynomials enjoy most of the properties of the usual orthogonal polynomials, which are obtained when the functional c is given by

$$c(x^i) = \int_a^b x^i \, d\alpha(x),$$

where α is bounded and nondecreasing in [a, b] (see [1] for these properties). In this paper we study the polynomials $R_k$ such that

$$\sum_{i=0}^{m} \left[c\big(x^i R_k(x)\big)\right]^2$$

is minimized, where m is an integer strictly greater than k − 1 (since for m = k − 1 we recover the previous formal orthogonal polynomials) which can possibly depend on k. They will be called least squares (formal) orthogonal polynomials. They depend on the value of m, but for simplicity this dependence will not be indicated in our notation.

Such polynomials arise naturally in problems of 
Padé approximation for power series with perturbed 
coefficients, and in Gaussian quadrature (as described
in the last section). Some properties of these polyno- 
mials are derived, together with a recursive scheme for 
their computation. 


Existence and Uniqueness 


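Since the polynomials $R_k$ will be defined apart from a multiplying factor, and since the degree of $R_k$ is required to be exactly k, we shall write

$$R_k(x) = b_0 + b_1 x + \cdots + b_k x^k$$

with $b_k = 1$. We set

$$\Phi(b_0, \ldots, b_{k-1}) = \sum_{i=0}^{m} \left[c\big(x^i R_k(x)\big)\right]^2$$

and we seek the values of $b_0, \ldots, b_{k-1}$ that minimize this quantity, that is, such that

$$\frac{\partial \Phi}{\partial b_j} = 0 \quad \text{for } j = 0, \ldots, k-1. \qquad (1)$$

Setting $\gamma_n = (c_n, \ldots, c_{n+m})^T$, this system can be written as

$$b_0 (\gamma_0, \gamma_j) + \cdots + b_{k-1} (\gamma_{k-1}, \gamma_j) = -(\gamma_k, \gamma_j) \qquad (2)$$

for $j = 0, \ldots, k-1$. Thus $R_k$ exists and is unique if and only if the matrix $A_k$ of this system is nonsingular. Setting $X = (1, x, \ldots, x^{k-1})$ and calling y the right-hand side of the preceding system, we see that

$$R_k(x) = \frac{1}{\det A_k} \det \begin{pmatrix} A_k & -y \\ X & x^k \end{pmatrix}.$$

If we set

$$B_k = \begin{pmatrix} c_0 & \cdots & c_{k-1} \\ \vdots & & \vdots \\ c_m & \cdots & c_{k+m-1} \end{pmatrix} = (\gamma_0, \ldots, \gamma_{k-1}),$$

then $A_k = B_k^T B_k$ and $y = -B_k^T \gamma_k$, and we recover the usual solution of a system of linear equations in the least squares sense.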
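System (2) is the normal-equation form of $\min_b \|B_k b + \gamma_k\|_2$, so $R_k$ can be computed directly from the moments. The sketch below assumes exact rational moments $c_0, \ldots, c_{k+m}$ and a nonsingular $A_k$. The function names are hypothetical. For $c_i = 1/(i+1)$, the moments of $dx$ on [0, 1], and m = k − 1, it should reproduce the monic shifted Legendre polynomial $x^2 - x + 1/6$.

```python
from fractions import Fraction

def solve(A, rhs):
    """Solve A z = rhs exactly by Gauss-Jordan elimination (A nonsingular)."""
    k = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(k):
        piv = next((r for r in range(col, k) if M[r][col] != 0), None)
        if piv is None:
            raise ZeroDivisionError("A_k is singular: R_k is not unique")
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def ls_orthogonal_poly(c, k, m):
    """Coefficients [b_0, ..., b_{k-1}, 1] of the monic least squares orthogonal
    polynomial R_k, obtained from system (2); needs moments c_0, ..., c_{k+m}."""
    if len(c) < k + m + 1:
        raise ValueError("need the moments c_0, ..., c_{k+m}")
    gamma = [[Fraction(c[n + i]) for i in range(m + 1)] for n in range(k + 1)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    A = [[dot(gamma[i], gamma[j]) for i in range(k)] for j in range(k)]  # row j, column i
    y = [-dot(gamma[k], gamma[j]) for j in range(k)]                     # right-hand side of (2)
    return solve(A, y) + [Fraction(1)]

c = [Fraction(1, i + 1) for i in range(10)]
coeffs = ls_orthogonal_poly(c, 2, 1)      # m = k - 1: formal orthogonal polynomial
```

With k = 1 and m = 1 the minimizer is $R_1(x) = x - 8/15$, which differs from the orthogonal polynomial $x - 1/2$.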


Computation 


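The polynomials $R_k$ can be computed recursively by inverting the matrix $A_k$ of the above system (2) by the bordering method, see [5]. This method is as follows. Set

$$A_{k+1} = \begin{pmatrix} A_k & u_k \\ v_k & a_k \end{pmatrix},$$

where $u_k$ is a column vector, $v_k$ a row vector and $a_k$ a scalar. We then have

$$A_{k+1}^{-1} = \begin{pmatrix} A_k^{-1} + A_k^{-1} u_k \beta_k^{-1} v_k A_k^{-1} & -A_k^{-1} u_k \beta_k^{-1} \\ -\beta_k^{-1} v_k A_k^{-1} & \beta_k^{-1} \end{pmatrix},$$

where $\beta_k = a_k - v_k A_k^{-1} u_k$.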
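The bordering formula translates directly into code. The sketch below (hypothetical function name, exact arithmetic) builds $A_{k+1}^{-1}$ from $A_k^{-1}$, $u_k$, $v_k$ and $a_k$ in $O(k^2)$ operations. It raises an error when $\beta_k = 0$, which is exactly the breakdown case discussed below. The usage example applies it row by row to a 3 × 3 Hilbert matrix, whose inverse is known in closed form.

```python
from fractions import Fraction

def bordered_inverse(Ainv, u, v, a):
    """Return A_{k+1}^{-1} from A_k^{-1}, the column u_k, the row v_k and the
    scalar a_k by the bordering formula; requires beta_k = a_k - v_k A_k^{-1} u_k != 0."""
    k = len(Ainv)
    Au = [sum(Ainv[i][j] * u[j] for j in range(k)) for i in range(k)]   # A_k^{-1} u_k
    vA = [sum(v[i] * Ainv[i][j] for i in range(k)) for j in range(k)]   # v_k A_k^{-1}
    beta = a - sum(v[i] * Au[i] for i in range(k))
    if beta == 0:
        raise ZeroDivisionError("beta_k = 0: the bordering step breaks down")
    top = [[Ainv[i][j] + Au[i] * vA[j] / beta for j in range(k)] + [-Au[i] / beta]
           for i in range(k)]
    bottom = [[-vA[j] / beta for j in range(k)] + [Fraction(1) / beta]]
    return top + bottom

# Usage: invert the 3 x 3 Hilbert matrix by bordering one row/column at a time.
H = [[Fraction(1, i + j + 1) for j in range(3)] for i in range(3)]
Hinv = [[1 / H[0][0]]]
for k in range(1, 3):
    Hinv = bordered_inverse(Hinv, [H[i][k] for i in range(k)],
                            [H[k][j] for j in range(k)], H[k][k])
```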

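Instead of choosing the normalization $b_k = 1$ we could impose the condition $b_0 = 1$. In that case we have the system

$$b_1 (\gamma_1, \gamma_j) + \cdots + b_k (\gamma_k, \gamma_j) = -(\gamma_0, \gamma_j) \qquad (3)$$

for $j = 1, \ldots, k$. The bordering method can then be used not only for computing the inverses of the matrices of the system recursively but also for obtaining its solution, since the new right-hand side contains the previous one.

Let $A'_k$ be the matrix of (3) and $d'_k$ its right-hand side. We then have

$$A'_{k+1} = \begin{pmatrix} A'_k & u'_k \\ v'_k & a'_k \end{pmatrix}, \qquad d'_{k+1} = \begin{pmatrix} d'_k \\ f'_k \end{pmatrix}$$

with

$$u'_k = \big((\gamma_{k+1}, \gamma_1), \ldots, (\gamma_{k+1}, \gamma_k)\big)^T,$$
$$v'_k = \big((\gamma_1, \gamma_{k+1}), \ldots, (\gamma_k, \gamma_{k+1})\big),$$
$$a'_k = (\gamma_{k+1}, \gamma_{k+1}),$$
$$d'_k = -\big((\gamma_0, \gamma_1), \ldots, (\gamma_0, \gamma_k)\big)^T,$$
$$f'_k = -(\gamma_0, \gamma_{k+1}).$$

Setting $z'_k = (b'_1, \ldots, b'_k)^T$ we have

$$z'_{k+1} = \begin{pmatrix} z'_k \\ 0 \end{pmatrix} + \frac{f'_k - v'_k z'_k}{\beta'_k} \begin{pmatrix} -(A'_k)^{-1} u'_k \\ 1 \end{pmatrix}$$

with $\beta'_k = a'_k - v'_k (A'_k)^{-1} u'_k$.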
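For the normalization $b_0 = 1$, the recursion for $z'_{k+1}$ can be sketched as follows. The function name is hypothetical and the arithmetic is exact. The sketch carries $(A'_k)^{-1}$ along with the bordering formula and updates the solution without re-solving. It assumes $\beta'_k \neq 0$ at every step, which holds, for instance, when $\gamma_1, \ldots, \gamma_K$ are linearly independent, since $A'_k$ is then a positive definite Gram matrix.

```python
from fractions import Fraction

def bordering_solutions(c, K, m):
    """Solutions z'_k = (b_1, ..., b_k), k = 1..K, of system (3) (normalization
    b_0 = 1) computed by the bordering recursion; needs moments c_0, ..., c_{K+m}."""
    gamma = [[Fraction(c[n + i]) for i in range(m + 1)] for n in range(K + 1)]

    def g(i, j):
        """Inner product (gamma_i, gamma_j)."""
        return sum(a * b for a, b in zip(gamma[i], gamma[j]))

    Ainv = [[1 / g(1, 1)]]                 # (A'_1)^{-1}
    z = [-g(0, 1) / g(1, 1)]               # z'_1
    sols = [z]
    for k in range(1, K):
        u = [g(k + 1, j) for j in range(1, k + 1)]    # u'_k
        v = [g(j, k + 1) for j in range(1, k + 1)]    # v'_k
        a = g(k + 1, k + 1)                           # a'_k
        f = -g(0, k + 1)                              # f'_k
        Au = [sum(Ainv[i][j] * u[j] for j in range(k)) for i in range(k)]
        vA = [sum(v[i] * Ainv[i][j] for i in range(k)) for j in range(k)]
        beta = a - sum(v[i] * Au[i] for i in range(k))           # beta'_k (assumed nonzero)
        coef = (f - sum(v[i] * z[i] for i in range(k))) / beta   # (f'_k - v'_k z'_k) / beta'_k
        z = [z[i] - coef * Au[i] for i in range(k)] + [coef]     # z'_{k+1}
        Ainv = ([[Ainv[i][j] + Au[i] * vA[j] / beta for j in range(k)] + [-Au[i] / beta]
                 for i in range(k)]
                + [[-vA[j] / beta for j in range(k)] + [1 / beta]])
        sols.append(z)
    return sols

c = [Fraction(1, i + 1) for i in range(8)]
zs = bordering_solutions(c, 3, 3)          # z'_1, z'_2, z'_3 with m = 3
```

Each new step reuses $z'_k$ and $(A'_k)^{-1}$, which is the point of the remark that the new right-hand side contains the previous one.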

Of course, the bordering method can only be used if $\beta_k$ (or $\beta'_k$ in the second case) is different from zero. If this is not the case, then instead of adding one new row and one new column to the system, it is possible to add several rows and columns until a nonsingular $\beta_k$ (which is now a square matrix) has been found (see [3] and [4]).


Location of the Zeros 


We return to the normalization $b_k = 1$. As

$$c\big(x^i R_k(x)\big) = b_0 c_i + \cdots + b_k c_{i+k}$$

and $\partial c(x^i R_k(x)) / \partial b_j = c_{i+j}$, from (1) we obtain

$$\sum_{i=0}^{m} c\big(x^i R_k(x)\big)\, c_{i+j} = 0 \quad \text{for } j = 0, \ldots, k-1. \qquad (4)$$

This relation can be written as

$$c\big(R_k(x)\,(c_j + c_{j+1}x + \cdots + c_{j+m}x^m)\big) = 0$$

for $j = 0, \ldots, k-1$. Let us now assume that

$$c_i = \int_a^b x^i \, d\alpha(x), \quad i = 0, 1, \ldots,$$

with α bounded and nondecreasing in [a, b]. We have

$$c_i + c_{i+1}x + \cdots + c_{i+m}x^m = \sum_{j=0}^{m} x^j \int_a^b y^{i+j}\, d\alpha(y) = \int_a^b y^i \left[\sum_{j=0}^{m} (xy)^j\right] d\alpha(y).$$

Set

$$w(x, \mu) = \int_a^b y^{\mu} \sum_{j=0}^{m} (xy)^j \, d\alpha(y).$$

Thus

$$w(x, i) = c_i + c_{i+1}x + \cdots + c_{i+m}x^m$$

and it follows that

$$c\big(R_k(x)\, w(x, i)\big) = \int_a^b R_k(x)\, w(x, i)\, d\alpha(x) = 0$$

for $i = 0, \ldots, k-1$, which shows that the polynomial $R_k$ is biorthogonal in the sense of [7,8]. Let us now study the location of the zeros of $R_k$. For that purpose we shall apply [7, Thm. 3], also given as [8, Thm. 5]. Set

$$d\Phi(x, \mu) = w(x, \mu)\, d\alpha(x)$$

and

$$I_k(\mu) = \int_a^b x^k \, d\Phi(x, \mu), \quad k = 0, 1, \ldots.$$

In our case, μ takes the values $\mu_i = i - 1$, $i = 1, 2, \ldots$. Thus

$$\det\big[I_{i-1}(\mu_j)\big]_{i,j=1}^{k} = \det\big[(\gamma_{j-1}, \gamma_{i-1})\big]_{i,j=1}^{k} = \det A_k,$$

and the regularity condition of [7,8] is equivalent to our condition for the existence and uniqueness of $R_k$. According to [7,8], we now have to look at the interpolation property of w. We have

$$w(x_i, \mu_j) = (\gamma_{j-1}, X_i),$$

where $X_i = (1, x_i, \ldots, x_i^m)^T$, the $x_i$'s being arbitrary distinct points in [a, b], and thus

$$\det\big[w(x_i, \mu_j)\big]_{i,j=1}^{k} = \begin{vmatrix} (\gamma_0, X_1) & \cdots & (\gamma_{k-1}, X_1) \\ \vdots & & \vdots \\ (\gamma_0, X_k) & \cdots & (\gamma_{k-1}, X_k) \end{vmatrix} = \det(\mathcal{X}_k \Gamma_k)$$

with

$$\mathcal{X}_k = \begin{pmatrix} X_1^T \\ \vdots \\ X_k^T \end{pmatrix} \quad \text{and} \quad \Gamma_k = (\gamma_0, \ldots, \gamma_{k-1}).$$

The interpolation property holds if and only if $\det(\mathcal{X}_k \Gamma_k) \neq 0$, that is, if and only if the matrix $\mathcal{X}_k \Gamma_k$ has rank k. Thus, using the theorem of [7,8], we have proved the following result:


Theorem 1 If $A_k$ is regular and if $\mathcal{X}_k \Gamma_k$ has rank k, then $R_k$ exists and has k distinct zeros in [a, b].


Remark 2 When 0 < a < b, it can be proved that $\det(\mathcal{X}_k \Gamma_k) \neq 0$ (see [2] for the details).


Applications 


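Our first application deals with Padé-type approximation. Let $v_k$ be an arbitrary polynomial of degree k and let $w_k(t) = a_0 + \cdots + a_{k-1}t^{k-1}$ be defined by

$$w_k(t) = c\left(\frac{v_k(x) - v_k(t)}{x - t}\right),$$

where c acts on the variable x. We set

$$\tilde{V}_k(t) = t^k v_k(t^{-1}) \quad \text{and} \quad \tilde{W}_k(t) = t^{k-1} w_k(t^{-1}).$$

Let f be the formal power series

$$f(t) = \sum_{i=0}^{\infty} c_i t^i.$$

Then it can be proved that

$$f(t) - \frac{\tilde{W}_k(t)}{\tilde{V}_k(t)} = O(t^k) \quad (t \to 0).$$

The rational function $\tilde{W}_k/\tilde{V}_k$ is called a Padé-type approximant of f and is denoted by $(k-1/k)_f(t)$ [1]. Moreover, it can also be proved that

$$f(t) - \frac{\tilde{W}_k(t)}{\tilde{V}_k(t)} = \frac{t^k}{\tilde{V}_k(t)}\, c\left(\frac{v_k(x)}{1 - xt}\right) = \frac{t^k}{\tilde{V}_k(t)} \Big( c\big(v_k(x)\big) + c\big(x v_k(x)\big)\, t + c\big(x^2 v_k(x)\big)\, t^2 + \cdots \Big).$$

That is,

$$f(t)\tilde{V}_k(t) - \tilde{W}_k(t) = \sum_{i=0}^{\infty} c\big(x^i v_k(x)\big)\, t^{k+i}.$$

Thus if the polynomial $v_k$, which is called the generating polynomial of $(k-1/k)_f$, satisfies

$$c\big(x^i v_k(x)\big) = 0 \quad \text{for } i = 0, \ldots, k-1,$$

then

$$f(t) - \frac{\tilde{W}_k(t)}{\tilde{V}_k(t)} = O(t^{2k}).$$

In this case $v_k$ is the formal orthogonal polynomial $P_k$ of degree k with respect to c, and $\tilde{W}_k/\tilde{V}_k$ is the usual Padé approximant $[k-1/k]_f$ of f.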
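The identity $f\tilde{V}_k - \tilde{W}_k = \sum_i c(x^i v_k(x))\, t^{k+i}$ can be checked coefficient by coefficient. The sketch below (hypothetical function names, exact arithmetic) expands $w_k(t) = c((v_k(x) - v_k(t))/(x - t))$ for $v_k(x) = \sum_j \beta_j x^j$, which gives $a_i = \sum_{j=i+1}^{k} \beta_j c_{j-i-1}$. It then forms $\tilde{V}_k$ and $\tilde{W}_k$ and returns the leading coefficients of $f\tilde{V}_k - \tilde{W}_k$. With $c_i = 1/(i+1)$ and the orthogonal $v_2(x) = x^2 - x + 1/6$, the first $2k = 4$ coefficients vanish. For an arbitrary $v_2$ only the first k vanish.

```python
from fractions import Fraction

def pade_type(c, beta):
    """Coefficients (increasing powers of t) of tilde V_k(t) = t^k v_k(1/t) and
    tilde W_k(t) = t^(k-1) w_k(1/t), with v_k(x) = sum_j beta[j] x^j and
    w_k(t) = c((v_k(x) - v_k(t)) / (x - t))."""
    k = len(beta) - 1
    a = [sum(beta[j] * c[j - i - 1] for j in range(i + 1, k + 1)) for i in range(k)]
    V = [beta[k - n] for n in range(k + 1)]    # coefficient of t^n in tilde V_k
    W = [a[k - 1 - n] for n in range(k)]       # coefficient of t^n in tilde W_k
    return V, W

def series_defect(c, V, W, N):
    """First N coefficients of f(t) tilde V_k(t) - tilde W_k(t), f(t) = sum c_i t^i."""
    out = []
    for n in range(N):
        fv = sum(V[j] * c[n - j] for j in range(len(V)) if j <= n)
        out.append(fv - (W[n] if n < len(W) else 0))
    return out

c = [Fraction(1, i + 1) for i in range(8)]
V, W = pade_type(c, [Fraction(1, 6), Fraction(-1), Fraction(1)])
defect = series_defect(c, V, W, 5)    # t^0..t^3 vanish; t^4 carries c(x^2 v_2(x))
```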

As explained in [10], Padé approximants can be quite sensitive to perturbations of the coefficients $c_i$ of the series f. Hence the idea arises to take as $v_k$ the least squares orthogonal polynomial $R_k$ of degree k instead of the usual orthogonal polynomial, an idea which in fact motivated our study. Of course such a choice decreases the order of approximation, since the approximants obtained are only of Padé type. However, it can improve the stability of the approximants and also their precision, since $\sum_{i=0}^{m} [c(x^i v_k(x))]^2$ is minimized by the choice $v_k = R_k$. We give a numerical example that illustrates this fact.

We consider the function 


f@= net) = ya 
i=0 


and we assume that we know the coefficients $c_i$ only to a certain precision. For example, we know approximate values $c_i^*$ such that


l= [s10”, t=0,1s... 


In the following table we compare the number of exact figures given by the Padé approximant with that given by the least squares Padé-type approximant, both computed with the same number of coefficients $c_i^*$. We can see that the least squares Padé-type approximant has better stability properties.


z | Padé approx LS Padé-type approx 
| [7/8] [6/7] (m = 8) 
125 6.7 Tell 
1.9 5.7) 7.0 
Zl 52) 6.7 


Another application concerns quadrature methods. We have already shown that if the functional c is given by

$$c_i = \int_a^b x^i\, d\alpha(x), \quad i = 0, 1, \ldots,$$

with α bounded and nondecreasing, then the corresponding least squares orthogonal polynomial of degree k, $R_k$, has k distinct zeros in [a, b] (under the assumptions of Theorem 1). We can then construct quadrature formulas of interpolatory type. If $\lambda_1, \ldots, \lambda_k$ are the zeros of $R_k$, we can approximate the integral

$$I = \int_a^b f(x)\, d\alpha(x)$$

by

$$I_k = A_1 f(\lambda_1) + \cdots + A_k f(\lambda_k), \qquad (5)$$

where

$$A_i = \int_a^b \frac{\pi(x)}{(x - \lambda_i)\,\pi'(\lambda_i)}\, d\alpha(x)$$

and

$$\pi(x) = \prod_{j=1}^{k} (x - \lambda_j).$$

This corresponds to replacing the function f by its interpolating polynomial at the knots $\lambda_1, \ldots, \lambda_k$. The truncation error of (5) is given by

$$I - I_k = E_k = \int_a^b f[\lambda_1, \ldots, \lambda_k, x]\, R_k(x)\, d\alpha(x).$$
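Once the zeros $\lambda_i$ of $R_k$ are known, the weights of (5) can be obtained without forming π. Because (5) integrates every polynomial of degree less than k exactly, the $A_i$ are the unique solution of the Vandermonde system $\sum_i A_i \lambda_i^j = c_j$, $j = 0, \ldots, k-1$. This moment-matching route is a swapped-in technique that gives the same weights as the Lagrange formula above. The sketch (hypothetical function name, floating point with partial pivoting) takes k = 2 and m = k − 1 on [0, 1] with $c_i = 1/(i+1)$. In that case $R_2(x) = x^2 - x + 1/6$, its zeros come from the quadratic formula, and (5) is the two-point Gauss-Legendre rule.

```python
import math

def interpolatory_weights(nodes, c):
    """Weights A_1..A_k of I_k = sum_i A_i f(lambda_i) such that the rule integrates
    1, x, ..., x^(k-1) exactly: sum_i A_i lambda_i^j = c_j, j = 0..k-1."""
    k = len(nodes)
    M = [[lam ** j for lam in nodes] + [c[j]] for j in range(k)]   # Vandermonde system
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(M[r][col]))       # partial pivoting
        M[col], M[p] = M[p], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

# Usage: zeros of R_2(x) = x^2 - x + 1/6 (m = k - 1, dalpha = dx on [0, 1]).
r = 1 / math.sqrt(3)
nodes = [(1 - r) / 2, (1 + r) / 2]
weights = interpolatory_weights(nodes, [1.0, 0.5])
approx = sum(w * x ** 3 for w, x in zip(weights, nodes))   # integral of x^3 over [0, 1]
```

For the genuinely least squares case (m > k − 1), the same routine applies to the zeros of $R_k$, which must be computed numerically.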
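Expanding the divided difference we see that

$$f[\lambda_1, \ldots, \lambda_k, x] = f[\lambda_1, \ldots, \lambda_{k+1}] + f[\lambda_1, \ldots, \lambda_{k+2}](x - \lambda_{k+1}) + \cdots + f[\lambda_1, \ldots, \lambda_{k+m+1}](x - \lambda_{k+1}) \cdots (x - \lambda_{k+m}) + f[\lambda_1, \ldots, \lambda_{k+m+1}, x](x - \lambda_{k+1}) \cdots (x - \lambda_{k+m+1})$$

for $\lambda_{k+1}, \ldots, \lambda_{k+m+1}$ any points in the domain of definition $D_f$ of f. If $0 \in D_f$, then we can choose

$$\lambda_{k+1} = \cdots = \lambda_{k+m+1} = 0.$$

Setting

$$M_i = f[\lambda_1, \ldots, \lambda_{k+i}]$$

we get

$$f[\lambda_1, \ldots, \lambda_k, x] = \sum_{i=1}^{m+1} M_i x^{i-1} + f[\lambda_1, \ldots, \lambda_{k+m+1}, x]\, x^{m+1}$$

and hence, for the truncation error,

$$E_k = \sum_{i=0}^{m} M_{i+1} \left( \int_a^b R_k(x)\, x^i \, d\alpha(x) \right) + \int_a^b f[\lambda_1, \ldots, \lambda_{k+m+1}, x]\, x^{m+1} R_k(x)\, d\alpha(x)$$

with

$$\sum_{i=0}^{m} \left( \int_a^b R_k(x)\, x^i \, d\alpha(x) \right)^2$$

minimized. Moreover, if $f \in C^{k+m+1}([a, b])$, and since $x^{m+1}$ is positive over [a, b], we obtain

$$\int_a^b f[\lambda_1, \ldots, \lambda_{k+m+1}, x]\, x^{m+1} R_k(x)\, d\alpha(x) = \frac{c_{m+1}}{(k+m+1)!}\, R_k(\lambda)\, f^{(k+m+1)}(\eta)$$

with $\lambda, \eta \in [a, b]$, and, for the error,

$$E_k = \sum_{i=0}^{m} \frac{f^{(k+i)}(\eta_i)}{(k+i)!} \left( \int_a^b R_k(x)\, x^i \, d\alpha(x) \right) + \frac{c_{m+1}}{(k+m+1)!}\, R_k(\lambda)\, f^{(k+m+1)}(\eta) \qquad (6)$$

with $\eta_i \in [a, b]$, $i = 0, \ldots, m$, and $\lambda, \eta \in [a, b]$. We remark that in the case where m = k − 1, $R_k$ is the orthogonal polynomial with respect to the functional c, and so (5) corresponds to a Gaussian quadrature formula. An advantage of the quadrature formulas (5) is that they are less sensitive to perturbations of the sequence of moments $c_i$, as is shown in the following numerical example. Such a case can arise in applications where the formula giving the moments $c_i$ is sensitive to rounding errors, see [11] for example.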
Consider the functional c defined by

$$c_i = \int_0^1 x^i \, dx = \frac{1}{i+1},$$

and perturb the coefficients in the following way:

i | $c_i^*$ | i | $c_i^*$
0 | 1.00000011 | 6 | 0.14285700
1 | 0.50000029 | 7 | 0.12500000
2 | 0.33333340 | 8 | 0.11111109
3 | 0.25000101 | 9 | 0.10000000
4 | 0.20000070 | 10 | 0.09090899
5 | 0.16666600 | 11 | 0.08333300


We can construct from these coefficients the least squares orthogonal polynomials and the corresponding quadrature formulas (5). The precision of the numerical approximations of $I = \int_0^1 f(x)\, dx$ is given in the following table:


f(x) k=5;m=4 k = 5; m = 6 least 
Gauss quad. sq. quad. 
W/(x+0.5)]  —2.2* 10> —6.2 * 10°° 
1/(x + 0.3) —2.1* 1074 —1.2* 107° 


We can obtain other applications from the following generalization. Instead of minimizing $\sum_{i=0}^{m} [c(x^i R_k(x))]^2$, we can introduce weights and minimize

$$\Phi^*(b_0, \ldots, b_{k-1}) = \sum_{i=0}^{m} p_i \left[c\big(x^i R_k^*(x)\big)\right]^2$$

with $p_i > 0$, $i = 0, \ldots, m$. If we choose the inner product

$$(\gamma_i, \gamma_j)^* = \sum_{l=0}^{m} p_l\, c_{i+l}\, c_{j+l},$$

the solution of this problem can be computed as in the
previous case and all the properties of the polynomials 
are still true. It can be seen, from numerical examples, 
that if the sequence of moments c; has a decreasing pre- 
cision, we can expect that the least squares Padé-type 
approximants constructed with a decreasing sequence 
of weights will give a better result. In the same way, 
for the quadrature formulas (5), from the expression (6) 
of the truncation error and the knowledge of the mag- 
nitude of the derivatives, we can reduce this error by 
choosing appropriate weights. Some other possible ap- 
plications of least squares orthogonal polynomials will 
be studied in the future. 


See also 


> ABS Algorithms for Linear Equations and Linear 
Least Squares 

> ABS Algorithms for Optimization 

> Gauss-Newton Method: Least Squares, Relation to 
Newton’s Method 

> Generalized Total Least Squares 

> Least Squares Problems 

> Nonlinear Least Squares: Newton-type Methods 

> Nonlinear Least Squares Problems 

> Nonlinear Least Squares: Trust Region Methods 


References 


1. Brezinski C (1980) Padé type approximation and general orthogonal polynomials. ISNM, vol 50. Birkhäuser, Basel

2. Brezinski C, Matos AC (1993) Least squares orthogonal 
polynomials. J Comput Appl Math 46:229-239 

3. Brezinski C, Redivo Zaglia M (1991) Extrapolation methods. 
Theory and practice. North-Holland, Amsterdam 

4. Brezinski C, Redivo Zaglia M, Sadok H (1992) A breakdown- 
free Lanczos type algorithm for solving linear systems. Nu- 
mer Math 63:29-38 

5. Faddeeva VN (1959) Computational methods of linear al- 
gebra. Dover, Mineola, NY 

6. Gantmacher FR (1959) The theory of matrices. Chelsea, 
New York 

7. Iserles A, Nørsett SP (1985) Bi-orthogonal polynomials. In:
Brezinski C, Draux A, Magnus AP, Maroni P, Ronveaux A 
(eds) Orthogonal Polynomials and Their Applications. Lec- 
ture Notes Math. Springer, Berlin, pp 92-100 

8. Iserles A, Nørsett SP (1988) On the theory of biorthogonal
polynomials. Trans Amer Math Soc 306:455-474 

9. Karlin S (1968) Total positivity. Stanford Univ. Press, Palo 
Alto, CA 


10. Mason JC (1981) Some applications and drawbacks of Padé 
approximants. In: Ziegler Z (ed) Approximation Theory and 
Appl. Acad. Press, New York, pp 207-223 

11. Morandi Cecchi M, Redivo Zaglia M (1991) A new recursive 
algorithm for a Gaussian quadrature formula via orthogo- 
nal polynomials. In: Brezinski C, Gori L, Ronveaux A (eds) 
Orthogonal Polynomials and Their Applications. Baltzer, 
Basel, pp 353-358 


Least Squares Problems 


ÅKE BJÖRCK
Linköping University, Linköping, Sweden


MSC2000: 65Fxx 


Article Outline 


Keywords 
Synonyms 
Introduction 
Historical Remarks 
Statistical Models 
Characterization of Least Squares Solutions 
Pseudo-inverse and Conditioning 
Singular Value Decomposition and Pseudo-inverse 
Conditioning of the Least Squares Problem 
Numerical Methods of Solution 
The Method of Normal Equations 
Least Squares by QR Factorization 
Rank-Deficient and Ill-Conditioned Problems
Rank Revealing QR Factorizations 
Updating Least Squares Solutions 
Recursive Least Squares 
Modifying Matrix Factorizations 
Sparse Problems 
Banded Least Squares Problems 
Block Angular Form 
General Sparse Problems 
See also 
References 


Keywords 


Least squares 


Synonyms 


LSP 




Introduction 
Historical Remarks 


The linear least squares problem originally arose from 
the need to fit a linear mathematical model to given ob- 
servations. In order to reduce the influence of errors in 
the observations one uses a greater number of measure- 
ments than the number of unknown parameters in the 
model. 

The algebraic procedure of the method of least 
squares was first published by A.M. Legendre [25]. It 
was justified as a statistical procedure by C.F. Gauss 
[13]. A famous example of the use of the least squares 
principle is the prediction of the orbit of the asteroid 
Ceres by Gauss in 1801. After this success, the method 
of least squares quickly became the standard procedure 
for analysis of astronomical and geodetic data. 

Gauss gave the method a sound theoretical basis in two memoirs, “Theoria Combinationis” [11,12]. In them, Gauss proves the optimality of the least squares estimate without any assumption that the random variables follow a particular distribution.


Statistical Models 


In the general univariate linear model the vector $b \in \mathbb{R}^m$ of observations is related to the unknown parameter vector $x \in \mathbb{R}^n$ by a linear relation

$$Ax = b + \epsilon, \qquad (1)$$

where $A \in \mathbb{R}^{m \times n}$ is a known matrix of full column rank. Further, $\epsilon$ is a vector of random errors with zero mean and covariance matrix $\sigma^2 W \in \mathbb{R}^{m \times m}$, where W is known but $\sigma^2 > 0$ is unknown. The standard linear model is obtained for W = I.


Theorem 1 (Gauss-Markoff theorem) Consider the standard linear model (1) with $W = I$. The best linear unbiased estimator of any linear function $c^T x$ is $c^T \hat x$, where $\hat x$ is obtained by minimizing the sum of the squared residuals,

$$\|r\|_2^2 = \sum_{i=1}^m r_i^2, \qquad (2)$$

where $r = b - Ax$ and $\|\cdot\|_2$ denotes the Euclidean vector norm. Furthermore, $E(s^2) = \sigma^2$, where $s^2$ is the quadratic form

$$s^2 = \frac{1}{m - n}(b - A\hat x)^T (b - A\hat x). \qquad (3)$$

The variance-covariance matrix of the least squares estimate $\hat x$ is given by

$$V(\hat x) = \sigma^2 (A^T A)^{-1}. \qquad (4)$$

The residual vector $\hat r = b - A\hat x$ satisfies $A^T \hat r = 0$, and hence there are $n$ linear relations among the $m$ components of $\hat r$. It can be shown that the residuals $\hat r$, and therefore also the quadratic form $s^2$, are uncorrelated with $\hat x$, i.e., $\mathrm{cov}(\hat r, \hat x) = 0$, $\mathrm{cov}(s^2, \hat x) = 0$.
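As a small numerical illustration of Theorem 1 and of (3) and (4), the following sketch assumes NumPy. The function name `ls_estimate` and the simulated data are hypothetical. It computes the least squares estimate, checks that $A^T\hat r = 0$, and forms $s^2$ together with the estimated covariance $s^2 (A^T A)^{-1}$.

```python
import numpy as np

def ls_estimate(A, b):
    """Return x_hat, r_hat, s^2 and the estimated covariance s^2 (A^T A)^{-1}."""
    m, n = A.shape
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||b - A x||_2, eq. (2)
    r_hat = b - A @ x_hat                           # residual vector, A^T r_hat = 0
    s2 = (r_hat @ r_hat) / (m - n)                  # quadratic form (3), E(s^2) = sigma^2
    cov = s2 * np.linalg.inv(A.T @ A)               # eq. (4) with sigma^2 replaced by s^2
    return x_hat, r_hat, s2, cov                    # (explicit inverse only for illustration)

# Simulated standard linear model (W = I), written as in (1): A x = b + e.
rng = np.random.default_rng(0)
m, n, sigma = 200, 3, 0.5
A = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
e = sigma * rng.standard_normal(m)
b = A @ x_true - e

x_hat, r_hat, s2, cov = ls_estimate(A, b)
```

With 197 degrees of freedom, $s^2$ should come out close to $\sigma^2 = 0.25$, but it will not equal it exactly.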

If the errors in $\epsilon$ are uncorrelated but not of equal variance, then the covariance matrix $W$ is diagonal. Then the least squares estimator is obtained by solving the weighted least squares problem

$$\min_x \|D(Ax - b)\|_2, \qquad D = W^{-1/2}. \qquad (5)$$

For the general case with no restrictions on $A$ and $W$, see [23].
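For diagonal $W$, problem (5) is an ordinary least squares problem with scaled rows. A minimal sketch follows, assuming NumPy. The helper name `weighted_lstsq` and the random variances are illustrative only.

```python
import numpy as np

def weighted_lstsq(A, b, w):
    """Solve min ||D(Ax - b)||_2 with D = W^{-1/2} and W = diag(w), problem (5)."""
    d = 1.0 / np.sqrt(w)                       # diagonal of D = W^{-1/2}
    # Scaling row i of A and b by d_i turns (5) into an ordinary least squares problem.
    x, *_ = np.linalg.lstsq(d[:, None] * A, d * b, rcond=None)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 4))
b = rng.standard_normal(30)
w = rng.uniform(0.1, 10.0, size=30)            # made-up error variances (diagonal of W)
x = weighted_lstsq(A, b, w)
```

Working with the scaled rows avoids forming $A^T W^{-1} A$ explicitly, which matters for the stiff case discussed later under the method of normal equations.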

The assumption made in the linear model that $A$ is known is frequently unrealistic, since sampling or modeling errors often also affect $A$. In the errors-in-variables model one instead assumes a linear relation

$$(A + E)x = b + r, \qquad (6)$$

where $(E, r)$ is an error matrix whose rows are independently and identically distributed with zero mean and the same variance. An estimate of the parameters $x$ in the model (6) is obtained from the total least squares (TLS) problem.


Characterization of Least Squares Solutions 


Let $S$ be the set of all solutions to a least squares problem,

$$S = \{x \in \mathbb{R}^n : \|Ax - b\|_2 = \min\}. \qquad (7)$$

Then $x \in S$ if and only if $A^T(b - Ax) = 0$ holds. Equivalently, $x \in S$ if and only if $x$ satisfies the normal equations

$$A^T A x = A^T b. \qquad (8)$$

Since $A^T b \in \mathcal{R}(A^T) = \mathcal{R}(A^T A)$, the normal equations are always consistent. It follows that $S$ is a nonempty, convex subset of $\mathbb{R}^n$. Any least squares solution $x$ uniquely decomposes the right-hand side $b$ into two orthogonal components

$$b = Ax + r, \qquad Ax \in \mathcal{R}(A) \perp r \in \mathcal{N}(A^T),$$

where $\mathcal{R}(A)$ and $\mathcal{N}(A^T)$ denote the range of $A$ and the nullspace of $A^T$, respectively.

When $\mathrm{rank}(A) < n$ there are many least squares solutions $x$, although the residual $b - Ax$ is still uniquely determined. There is always a unique least squares solution in $S$ of minimum length. The following result applies to both overdetermined and underdetermined linear systems.

Theorem 2 Consider the linear least squares problem

$$\min_{x \in S} \|x\|_2, \qquad S = \{x \in \mathbb{R}^n : \|b - Ax\|_2 = \min\}, \qquad (9)$$

where $A \in \mathbb{R}^{m\times n}$ and $\mathrm{rank}(A) = r \le \min(m, n)$. This problem always has a unique solution, which is distinguished by the property that

$$x \perp \mathcal{N}(A).$$


Pseudo-inverse and Conditioning 
Singular Value Decomposition and Pseudo-inverse 


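A matrix decomposition of great theoretical and practical importance for the treatment of least squares problems is the singular value decomposition (SVD) of $A$,

$$A = U\Sigma V^T = \sum_{i=1}^n u_i \sigma_i v_i^T. \qquad (10)$$

Here $\sigma_i$ are the singular values of $A$, and $u_i$ and $v_i$ are the corresponding left and right singular vectors. Using this decomposition the solution to problem (9) can be written $x = A^\dagger b$, where

$$A^\dagger = V \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T \in \mathbb{R}^{n\times m}, \qquad \Sigma_r = \mathrm{diag}(\sigma_1, \dots, \sigma_r), \qquad (11)$$

and $\sigma_1 \ge \dots \ge \sigma_r > 0$ are the nonzero singular values. Here $A^\dagger$ is called the pseudo-inverse of $A$. Among all matrices $X$ that minimize $\|AX - I\|_F$, where $\|\cdot\|_F$ denotes the Frobenius norm, $A^\dagger$ is the unique one of minimum Frobenius norm. Note that the pseudo-inverse $A^\dagger$ is not a continuous function of $A$, unless one allows only perturbations which do not change the rank of $A$.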

The pseudo-inverse was first introduced by E.H. 
Moore in 1920. R. Penrose [30] later gave the following 
elegant algebraic characterization. 


Theorem 3 (Penrose’s conditions) The pseudo-inverse $X = A^\dagger$ is uniquely determined by the four conditions:

1) $AXA = A$;
2) $XAX = X$;
3) $(AX)^T = AX$;
4) $(XA)^T = XA$.

It can be directly verified that $A^\dagger$ given by (11) satisfies these four conditions.
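The SVD formula (11) and the Penrose conditions can be checked numerically. The sketch below assumes NumPy. The helper `pinv_svd` and its relative rank tolerance `1e-10` are illustrative choices and are not part of the text. The sketch builds $A^\dagger$ from the SVD of a rank-deficient matrix and verifies conditions 1) to 4). It also confirms that $x = A^\dagger b$ is orthogonal to a null vector of $A$, as Theorem 2 states.

```python
import numpy as np

def pinv_svd(A, rtol=1e-10):
    """Pseudo-inverse via (11): invert only singular values above rtol * sigma_1."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rtol * s[0]))            # numerical rank
    return Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

# Rank-deficient 5x4 example: column 4 = column 1 + column 2, so rank(A) = 3.
rng = np.random.default_rng(2)
B = rng.standard_normal((5, 3))
A = np.column_stack([B, B[:, 0] + B[:, 1]])
b = rng.standard_normal(5)

X = pinv_svd(A)
x = X @ b                                       # minimum-norm least squares solution
penrose = [np.allclose(A @ X @ A, A),           # 1) AXA = A
           np.allclose(X @ A @ X, X),           # 2) XAX = X
           np.allclose((A @ X).T, A @ X),       # 3) AX symmetric
           np.allclose((X @ A).T, X @ A)]       # 4) XA symmetric
z = np.array([1.0, 1.0, 0.0, -1.0])             # A z = 0, so z spans N(A)
```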

The total least squares problem (TLS problem) involves finding a perturbation matrix $(E, r)$ of minimal Frobenius norm that lowers the rank of the matrix $(A, b)$. Consider the singular value decomposition of the augmented matrix $(A, b)$:

$$(A, b) = U\Sigma V^T, \qquad \Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_{n+1}),$$

where $\sigma_1 \ge \dots \ge \sigma_{n+1} \ge 0$. Then, in the generic case, $(x^T, -1)^T$ is (up to a scalar factor) a right singular vector corresponding to $\sigma_{n+1}$, and $\min \|(E, r)\|_F = \sigma_{n+1}$.

An excellent survey of theoretical and computa- 
tional aspects of the total least squares problem is given 
in [22]. 
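The generic-case TLS recipe above translates directly into a few lines. The following is a minimal sketch, assuming NumPy and a generic problem in which the last component of the smallest right singular vector is nonzero. The name `tls` and the test data are hypothetical. It returns $x$ together with the minimal perturbation $(E, r) = -\sigma_{n+1} u_{n+1} v_{n+1}^T$.

```python
import numpy as np

def tls(A, b):
    """Generic-case TLS: x from the right singular vector of (A, b) for sigma_{n+1}."""
    m, n = A.shape
    U, s, Vt = np.linalg.svd(np.column_stack([A, b]), full_matrices=False)
    v = Vt[-1]                                  # right singular vector for sigma_{n+1}
    if abs(v[n]) < 1e-12:
        raise ValueError("nongeneric TLS problem: last component of v vanishes")
    x = -v[:n] / v[n]                           # rescale v to the form (x, -1)
    # The perturbation -sigma_{n+1} u_{n+1} v_{n+1}^T makes (A, b) rank deficient.
    delta = -s[-1] * np.outer(U[:, -1], v)
    return x, delta[:, :n], delta[:, n], s[-1]

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 3))
b = A @ np.array([1.0, 2.0, -1.0]) + 0.01 * rng.standard_normal(20)
x, E, r, sigma_min = tls(A, b)
```

By construction, $(A + E)x = b + r$ holds and $\|(E, r)\|_F = \sigma_{n+1}$.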


Conditioning of the Least Squares Problem 


Consider a perturbed least squares problem where $\tilde A = A + \delta A$, $\tilde b = b + \delta b$, and let the perturbed solution be $\tilde x = x + \delta x$. Then, assuming that $\mathrm{rank}(A) = \mathrm{rank}(A + \delta A) = n$, one has the first order bound

$$\|\delta x\|_2 \lesssim \frac{1}{\sigma_n}\left(\|\delta b\|_2 + \|\delta A\|_2 \|x\|_2\right) + \frac{1}{\sigma_n^2}\|\delta A\|_2 \|r\|_2,$$

where $\sigma_n$ is the smallest singular value of $A$.

The condition number of a matrix $A \in \mathbb{R}^{m\times n}$ ($A \ne 0$) is defined as

$$\kappa(A) = \|A\|_2 \|A^\dagger\|_2 = \frac{\sigma_1}{\sigma_r}, \qquad (12)$$

where $\sigma_1 \ge \dots \ge \sigma_r > 0$ are the nonzero singular values of $A$. Hence, the normwise relative condition number of the least squares problem can be written as

$$\kappa_{LS}(A, b) = \kappa(A) + \kappa(A)^2 \frac{\|r\|_2}{\|A\|_2 \|x\|_2}. \qquad (13)$$

For a consistent problem ($r = 0$) the last term is zero. However, in general the condition number depends on the size of $r$ and involves a term proportional to $\kappa(A)^2$.

A more refined perturbation analysis, which applies to both overdetermined and underdetermined systems, has been given in [34]. In order to prove any meaningful result it is necessary to assume that $\mathrm{rank}(A + \delta A) = \mathrm{rank}(A)$. If $\mathrm{rank}(A) = \min(m, n)$, the condition $\eta = \|A^\dagger\|_2 \|\delta A\|_2 < 1$ suffices to ensure that this is the case.
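Formulas (12) and (13) can be evaluated from the singular values. The sketch below assumes NumPy, a matrix of full column rank, and made-up data. It shows that $\kappa_{LS}$ reduces to $\kappa(A)$ for a consistent right-hand side and becomes much larger when the residual is sizable. The residual in the second case is chosen orthogonal to $\mathcal{R}(A)$, so the solution stays $x_0$.

```python
import numpy as np

def ls_condition(A, b):
    """kappa(A) from (12) and kappa_LS(A, b) from (13); A assumed of full column rank."""
    s = np.linalg.svd(A, compute_uv=False)
    kappa = s[0] / s[-1]                        # sigma_1 / sigma_n, and ||A||_2 = sigma_1
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = b - A @ x
    kappa_ls = kappa + kappa**2 * np.linalg.norm(r) / (s[0] * np.linalg.norm(x))
    return kappa, kappa_ls

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 5)) @ np.diag([1.0, 1.0, 1.0, 1.0, 1e-3])  # kappa roughly 1e3
x0 = np.ones(5)
e = rng.standard_normal(50)
e -= A @ np.linalg.lstsq(A, e, rcond=None)[0]   # project e onto N(A^T)
k_cons, kls_cons = ls_condition(A, A @ x0)      # consistent problem: r = 0
k_inc, kls_inc = ls_condition(A, A @ x0 + e)    # inconsistent problem: r = e
```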
Numerical Methods of Solution
The Method of Normal Equations 


The first step in the method of normal equations for the least squares problem is to form the cross-products

$$C = A^T A \in \mathbb{R}^{n\times n}, \qquad d = A^T b \in \mathbb{R}^n. \qquad (14)$$

Since the matrix $C$ is symmetric, it is only necessary to compute and store its upper triangular part. When $m \gg n$ this step will result in a great reduction in the amount of data.

The computation of $C$ and $d$ can be performed either using an inner product form (operating on columns of $A$) or an outer product form (operating on rows of $A$). Row-wise accumulation of $C$ and $d$ is advantageous if the matrix $A$ is sparse or held in secondary storage. Partitioning $A$ by rows, one has

$$C = \sum_{i=1}^m a_i a_i^T, \qquad d = \sum_{i=1}^m b_i a_i, \qquad (15)$$

where $a_i^T$ denotes the $i$th row of $A$. This expresses $C$ as a sum of $m$ matrices of rank one.

Gauss solved the symmetric positive definite system of normal equations by elimination, preserving symmetry, and solving for $x$ by back-substitution. A different sequencing of this algorithm is to compute the Cholesky factorization

$$C = R^T R, \qquad (16)$$

where $R$ is upper triangular with positive diagonal elements, and then solve the two triangular systems

$$R^T z = d, \qquad Rx = z, \qquad (17)$$

by forward- and back-substitution, respectively. The Cholesky factorization, named after the French officer A.L. Cholesky, who worked on geodetic survey problems in Africa, was published by C. Benoit [1]. (In statistical applications this method is often known as the square-root method, although a proper square root $B$ of a matrix $A$ should satisfy $B^2 = A$.)

The method of normal equations is suitable for moderately ill-conditioned problems but is not a backward stable method. The accuracy can be improved by using fixed precision iterative refinement in solving the normal equations:

Set $x_0 = 0$, and for $s = 0, 1, \dots$ until convergence do

$$r_s = b - Ax_s, \qquad R^T(R\,\delta x_s) = A^T r_s, \qquad x_{s+1} = x_s + \delta x_s.$$

(Here, $x_1$ corresponds to the unrefined solution of the normal equations.)
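The steps (14) to (17), together with the refinement loop, can be sketched as follows. This assumes NumPy, and the function names are hypothetical. For simplicity, the sketch accumulates the full symmetric $C$ rather than only its upper triangle. `np.linalg.cholesky` returns the lower factor $L$ with $C = LL^T$, so $R = L^T$. Because $x_0 = 0$, the first step reproduces the plain solution $x_1$ computed from $d$.

```python
import numpy as np

def forward_sub(L, y):
    """Solve L z = y for lower triangular L."""
    z = np.zeros_like(y)
    for i in range(len(y)):
        z[i] = (y[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def back_sub(R, y):
    """Solve R x = y for upper triangular R."""
    x = np.zeros_like(y)
    for i in range(len(y) - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def normal_equations(A, b, refine_steps=2):
    m, n = A.shape
    C, d = np.zeros((n, n)), np.zeros(n)
    for a_i, b_i in zip(A, b):                  # row-wise accumulation (15)
        C += np.outer(a_i, a_i)
        d += b_i * a_i
    R = np.linalg.cholesky(C).T                 # C = R^T R, eq. (16)
    x = back_sub(R, forward_sub(R.T, d))        # (17): x_1, the unrefined solution
    for _ in range(refine_steps):               # fixed precision iterative refinement
        r = b - A @ x
        x = x + back_sub(R, forward_sub(R.T, A.T @ r))
    return x, C

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 4))
b = rng.standard_normal(40)
x, C = normal_equations(A, b)
```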

The method of normal equations can fail when applied to weighted least squares problems. To see this consider a problem with two different weights $\gamma$ and $1$,

$$\min_x \left\| \begin{pmatrix} \gamma A_1 \\ A_2 \end{pmatrix} x - \begin{pmatrix} \gamma b_1 \\ b_2 \end{pmatrix} \right\|_2, \qquad (18)$$

for which the matrix of normal equations is $A^T A = \gamma^2 A_1^T A_1 + A_2^T A_2$. When $\gamma \gg 1$ this problem is called stiff. In the limit $\gamma \to \infty$ the solution will satisfy the subsystem $A_1 x = b_1$ exactly. If $\gamma > u^{-1/2}$ ($u$ is the unit roundoff), the information in the matrix $A_2$ may completely disappear when forming $A^T A$. For possible ways around this difficulty, see [4, Chap. 4.4].
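A tiny made-up instance of (18) shows the loss of information. It uses $A_1 = (1, 1)$, $b_1 = 2$, $A_2 = (1, -1)$, $b_2 = 0$, whose exact solution is $x = (1, 1)$, and takes $\gamma = 10^9$. In IEEE double precision $u \approx 1.1 \cdot 10^{-16}$, so $u^{-1/2} \approx 9.5 \cdot 10^{7}$ and $\gamma$ exceeds it. The sketch assumes NumPy. A QR-based solve is shown for contrast, on the assumption that working with the stacked matrix directly keeps the $A_2$ rows intact.

```python
import numpy as np

gamma = 1e9            # exceeds u^{-1/2} ~ 9.5e7 in IEEE double precision
# Problem (18) with A1 = (1, 1), b1 = 2 and A2 = (1, -1), b2 = 0; exact solution x = (1, 1).
M = np.array([[gamma, gamma],
              [1.0, -1.0]])
rhs = np.array([gamma * 2.0, 0.0])

# Forming gamma^2 A1^T A1 + A2^T A2 in floating point: the +-1 contributions of A2
# are smaller than the spacing of doubles near 1e18 and are rounded away.
C = M.T @ M
print(C)               # all four entries equal 1e18, so C is exactly singular

# A QR factorization works on M itself and keeps the information in A2.
Q, R = np.linalg.qr(M)
x_qr = np.linalg.solve(R, Q.T @ rhs)
print(x_qr)
```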


Least Squares by QR Factorization 


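The QR factorization and its extensions are used extensively in modern numerical methods for solving least squares problems. Let $A \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(A) = n$. Then there are an orthogonal matrix $Q \in \mathbb{R}^{m\times m}$ and an upper triangular $R \in \mathbb{R}^{n\times n}$ such that

$$A = Q\begin{pmatrix} R \\ 0 \end{pmatrix}. \qquad (19)$$

Since orthogonal transformations preserve the Euclidean length, it follows that

$$\|Ax - b\|_2 = \|Q^T(Ax - b)\|_2 \qquad (20)$$

for any orthogonal matrix $Q \in \mathbb{R}^{m\times m}$. Hence using the QR factorization (19) the solution to the least squares problem can be obtained from

$$Q^T b = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}, \qquad Rx = c_1, \qquad \|Ax - b\|_2 = \|c_2\|_2. \qquad (21)$$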

An algorithm based on the QR decomposition by Householder transformations was first developed in a seminal paper by G.H. Golub [18]. Here, $Q$ is compactly represented as a product of Householder matrices $Q = P_1 \cdots P_n$, where $P_k = I - \beta_k u_k u_k^T$. Only the Householder vectors $u_k$ are stored, and advantage is taken of the fact that the first $k - 1$ components of $u_k$ are zero.
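A compact version of Householder QR for least squares can be sketched as follows, assuming NumPy and $\mathrm{rank}(A) = n$. The function name is hypothetical, and this is not presented as Golub's original code. $Q$ is never formed. Each $P_k$ is applied to the remaining columns and to $b$, and only the vectors $u_k$ are kept, stored from position $k$ onward.

```python
import numpy as np

def householder_lstsq(A, b):
    """Least squares by Householder QR; Q = P_1 ... P_n is kept only through u_k."""
    R = np.array(A, dtype=float)
    c = np.array(b, dtype=float)
    m, n = R.shape
    us = []
    for k in range(n):
        col = R[k:, k]
        nrm = np.linalg.norm(col)
        if nrm == 0.0:
            raise ValueError("A does not have full column rank")
        u = col.copy()
        u[0] += np.copysign(nrm, col[0])        # sign choice avoids cancellation
        beta = 2.0 / (u @ u)                    # P_k = I - beta u u^T
        R[k:, k:] -= beta * np.outer(u, u @ R[k:, k:])
        c[k:] -= beta * u * (u @ c[k:])         # apply P_k to b as well
        us.append(u)                            # leading zeros of u_k are not stored
    # R x = c_1 as in (21); np.linalg.solve stands in for back-substitution
    x = np.linalg.solve(np.triu(R[:n]), c[:n])
    return x, np.linalg.norm(c[n:]), us         # ||Ax - b||_2 = ||c_2||_2

rng = np.random.default_rng(15)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)
x, res, us = householder_lstsq(A, b)
```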

Golub’s method for solving the standard least 
squares problem is normwise backward stable, see [24, 
pp. 90ff]. Surprisingly, this method is stable also for 
solving the weighted least squares problem (5), provided only that the equations are sorted by decreasing row norms in $A$, see [8].

Due to storage considerations the matrix $Q$ in a QR decomposition is often discarded when $A$ is large and sparse. This creates a problem, since then it may not be possible to form $Q^T b$. If the original matrix $A$ is saved, one can use the corrected seminormal equations (CSNE)

$$R^T R \bar x = A^T b, \qquad \bar r = b - A\bar x, \qquad R^T R\,\delta x = A^T \bar r, \qquad x_c = \bar x + \delta x.$$

(Note that unless the correction step is carried out, the numerical stability of this method is no better than that of the method of normal equations.) An error analysis of the CSNE method is given in [2]. A comparison with the bounds for a backward stable method shows that in most practical applications the corrected seminormal equations are forward stable.
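The CSNE steps need only $R$ and the original $A$. The sketch below assumes NumPy, whose `np.linalg.qr(A, mode='r')` returns the triangular factor alone. The helper names are hypothetical, and `np.linalg.solve` is used in place of dedicated triangular solvers for brevity.

```python
import numpy as np

def csne(A, b):
    """Corrected seminormal equations: uses R from a QR factorization of A, plus A itself."""
    R = np.linalg.qr(A, mode='r')               # Q is discarded; only R is returned

    def solve_RtR(g):
        # R^T R y = g by two triangular solves (general solver used for brevity)
        return np.linalg.solve(R, np.linalg.solve(R.T, g))

    x_bar = solve_RtR(A.T @ b)                  # seminormal equations
    r_bar = b - A @ x_bar
    dx = solve_RtR(A.T @ r_bar)                 # one correction step
    return x_bar + dx

rng = np.random.default_rng(7)
A = rng.standard_normal((60, 5))
b = rng.standard_normal(60)
x_c = csne(A, b)
```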

Applying the Gram-Schmidt orthogonalization process to the columns of $A$ produces $Q_1$ and $R$ in the factorization

$$A = (a_1, \dots, a_n) = Q_1 R, \qquad Q_1 = (q_1, \dots, q_n),$$

where $Q_1$ has orthonormal columns and $R$ is upper triangular. There are two computational variants of Gram-Schmidt orthogonalization, the classical Gram-Schmidt orthogonalization (CGS) and the modified Gram-Schmidt orthogonalization (MGS). In CGS there may be a catastrophic loss of orthogonality unless reorthogonalization is used. In MGS the loss of orthogonality can be shown to occur in a predictable manner. Using an equivalence between MGS and Householder QR applied to $A$ with a square matrix of zeros on top, backward stable algorithms based on MGS for solving least squares problems have been developed, see [3].
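The MGS ordering and one simple way of using it for least squares can be sketched as follows, assuming NumPy with hypothetical helper names. In MGS, each new $q_k$ is removed from all remaining columns immediately. To solve the least squares problem, MGS is applied here to the augmented matrix $(A, b)$. This variant is an illustrative assumption and not necessarily the exact algorithm of [3]. The last column of the resulting R-factor gives $z$ with $Rx = z$, and the last diagonal entry gives the residual norm.

```python
import numpy as np

def mgs(A):
    """Modified Gram-Schmidt: A = Q1 R, Q1 with orthonormal columns, R upper triangular."""
    Q = np.array(A, dtype=float)
    m, n = Q.shape
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        # MGS: orthogonalize all remaining columns against q_k right away
        R[k, k + 1:] = Q[:, k] @ Q[:, k + 1:]
        Q[:, k + 1:] -= np.outer(Q[:, k], R[k, k + 1:])
    return Q, R

def mgs_lstsq(A, b):
    """Least squares via MGS on (A, b); assumes b is not in the range of A."""
    n = A.shape[1]
    _, Rb = mgs(np.column_stack([A, b]))
    x = np.linalg.solve(Rb[:n, :n], Rb[:n, n])  # R x = z
    return x, Rb[n, n]                          # last diagonal entry = ||b - A x||_2

rng = np.random.default_rng(6)
A = rng.standard_normal((30, 4))
b = rng.standard_normal(30)
Q1, R = mgs(A)
x, rho = mgs_lstsq(A, b)
```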


Rank-Deficient and Ill-Conditioned Problems


The mathematical notion of rank is not always appropriate in numerical computations. For example, if a matrix $A \in \mathbb{R}^{m\times n}$ with (mathematical) rank $k < n$ is randomly perturbed by roundoff, the perturbed matrix most likely has full rank $n$. However, it should be considered to be ‘numerically’ rank deficient.

When solving rank-deficient or ill-conditioned least squares problems, correct assignment of the ‘numerical rank’ of $A$ is often the key issue. The numerical rank should depend on a tolerance which reflects the error level. Overestimating the rank may lead to a computed solution of very large norm, which is totally irrelevant. This behavior is typical in problems arising from discretizations of ill-posed problems, see [21].

Assume that the ‘noise level’ $\delta$ in the data is known. Then a numerical rank $k$, such that $\sigma_k > \delta \ge \sigma_{k+1}$, can be assigned to $A$, where $\sigma_i$ are the singular values of $A$. The approximate solution

$$x_k = \sum_{i=1}^k \frac{c_i}{\sigma_i} v_i, \qquad c = U^T b,$$

is known as the truncated singular value decomposition (TSVD) solution. It is the minimum norm solution of the related least squares problem $\min_x \|A_k x - b\|_2$, where

$$A_k = \sum_{i=1}^k u_i \sigma_i v_i^T, \qquad \|A - A_k\|_2 \le \delta,$$

is the best rank $k$ approximation of $A$. The subspace

$$\mathcal{R}(V_2), \qquad V_2 = (v_{k+1}, \dots, v_n),$$

is called the numerical nullspace of $A$.
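The TSVD solution and its companions $A_k$ and $V_2$ are computed from one SVD. The sketch below assumes NumPy and uses a hypothetical test matrix with prescribed singular values $10, 5, 1, 10^{-8}$ and noise level $\delta = 10^{-6}$. With these values the numerical rank is $k = 3$.

```python
import numpy as np

def tsvd_solution(A, b, delta):
    """TSVD solution x_k with numerical rank k such that sigma_k > delta >= sigma_{k+1}."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > delta))
    c = U.T @ b
    x_k = Vt[:k].T @ (c[:k] / s[:k])            # sum over i <= k of (c_i / sigma_i) v_i
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # best rank-k approximation of A
    V2 = Vt[k:].T                               # basis of the numerical nullspace
    return x_k, k, A_k, V2

# Hypothetical test matrix A = U0 diag(sv) V0^T with prescribed singular values.
rng = np.random.default_rng(8)
U0, _ = np.linalg.qr(rng.standard_normal((8, 4)))
V0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
sv = np.array([10.0, 5.0, 1.0, 1e-8])
A = U0 @ np.diag(sv) @ V0.T
b = rng.standard_normal(8)
x_k, k, A_k, V2 = tsvd_solution(A, b, delta=1e-6)
```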
An alternative to TSVD is Tikhonov regularization, where one considers the regularized problem

$$\min_x \|Ax - b\|_2^2 + \tau^2 \|Dx\|_2^2 \qquad (22)$$

for some positive diagonal matrix $D = \mathrm{diag}(d_1, \dots, d_n)$. The problem (22) is equivalent to the least squares problem

$$\min_x \left\| \begin{pmatrix} \tau D \\ A \end{pmatrix} x - \begin{pmatrix} 0 \\ b \end{pmatrix} \right\|_2, \qquad (23)$$

where the matrix $A$ has been modified by appending the matrix $\tau D$ on top. An advantage of using the regularized problem (23) instead of the TSVD is that its solution can be computed from a QR decomposition. When $\tau > 0$ this problem is always of full column rank and has a unique solution. For $D = I$ it can be shown that $x(\tau)$ will approximately equal the TSVD solution for $\tau = \delta$.
Problem (23) also appears as a subproblem in trust 
region algorithms for solving nonlinear least squares, 
and in interior point methods for constrained linear 
least squares problems. A more difficult case is when 
the noise level 6 is unknown and has to be determined 
in the solution process. Such problems typically arise in 
the treatment of discrete ill-posed problems, see [21]. 
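Solving (23) with a QR factorization of the stacked matrix takes only a few lines. The sketch assumes NumPy; the function name and the rank-deficient test matrix are hypothetical. The check compares against the algebraically equivalent system $(A^T A + \tau^2 D^2)x = A^T b$, which the sketch itself deliberately avoids forming.

```python
import numpy as np

def tikhonov_qr(A, b, tau, d=None):
    """Solve (22) through the stacked least squares problem (23) with a QR factorization."""
    m, n = A.shape
    d = np.ones(n) if d is None else np.asarray(d, dtype=float)
    M = np.vstack([tau * np.diag(d), A])        # (tau D; A): full column rank when tau > 0
    rhs = np.concatenate([np.zeros(n), b])      # (0; b)
    Q, R = np.linalg.qr(M)                      # reduced QR of the (n+m) x n matrix
    return np.linalg.solve(R, Q.T @ rhs)

rng = np.random.default_rng(9)
B = rng.standard_normal((10, 3))
A = np.column_stack([B, B[:, 0] - B[:, 2]])     # rank-deficient: plain LS solution not unique
b = rng.standard_normal(10)
tau = 0.1
x_tau = tikhonov_qr(A, b, tau)                  # D = I
x_d = tikhonov_qr(A, b, tau, d=[1.0, 2.0, 3.0, 4.0])
```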


Rank Revealing QR Factorizations 


In some applications it is too expensive to compute the SVD. In such cases so-called ‘rank revealing’ QR factorizations are often a good substitute.

It can be shown that for any $0 < k < n$ a column permutation $\Pi$ exists such that the QR decomposition of $A\Pi$ has the form

$$A\Pi = Q\begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}, \qquad (24)$$

where

$$\sigma_k(R_{11}) \ge \frac{1}{c}\,\sigma_k, \qquad \|R_{22}\|_2 \le c\,\sigma_{k+1}, \qquad (25)$$

and $c$ is a constant bounded by a low-degree polynomial in $k$ and $n$. In particular, if $A$ has numerical $\delta$-rank equal to $k$, then there is a column permutation such that $\|R_{22}\|_2 \le c\,\delta$. Such a QR factorization is called a rank revealing QR factorization (RRQR). No efficient numerical method is known which can be guaranteed to compute an RRQR factorization satisfying (25), although in practice Chan’s method [7] often gives satisfactory results.
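A common practical heuristic for obtaining a factorization of the form (24) is Householder QR with greedy column pivoting. At each step, the remaining column of largest norm is moved to the front. This is a swapped-in illustration and not Chan's method [7]. Like every efficient method mentioned above, it is not guaranteed to satisfy (25); contrived matrices such as Kahan's defeat it. The sketch assumes NumPy, $m \ge n$, and a hypothetical nearly rank-2 test matrix.

```python
import numpy as np

def qr_column_pivoting(A):
    """Householder QR with greedy column pivoting; returns R (n x n) and perm
    such that A[:, perm] = Q R for an orthogonal Q that is not formed (m >= n)."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    perm = np.arange(n)
    for k in range(n):
        j = k + int(np.argmax(np.linalg.norm(R[k:, k:], axis=0)))
        R[:, [k, j]] = R[:, [j, k]]             # swap columns k and j
        perm[[k, j]] = perm[[j, k]]
        col = R[k:, k]
        nrm = np.linalg.norm(col)
        if nrm == 0.0:
            break                               # remaining block is exactly zero
        u = col.copy()
        u[0] += np.copysign(nrm, col[0])
        R[k:, k:] -= np.outer(u, (2.0 / (u @ u)) * (u @ R[k:, k:]))
    return np.triu(R[:n]), perm

# Hypothetical 8x5 matrix of rank 2 plus a perturbation of size 1e-10.
rng = np.random.default_rng(10)
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 5))
A += 1e-10 * rng.standard_normal((8, 5))
R, perm = qr_column_pivoting(A)
k = 2
R11, R22 = R[:k, :k], R[k:, k:]
```

For this example the trailing block $R_{22}$ is of the size of the perturbation, which reveals the numerical rank $k = 2$.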

A related rank revealing factorization is the complete orthogonal decomposition of the form

$$A = U\begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix} V^T, \qquad (26)$$

where $U$ and $V$ are orthogonal matrices, $R_{11} \in \mathbb{R}^{k\times k}$, $\sigma_k(R_{11}) \ge \sigma_k / c$, and

$$\left(\|R_{12}\|_F^2 + \|R_{22}\|_F^2\right)^{1/2} \le c\,\sigma_{k+1}.$$

This is also often called a rank revealing URV factorization. (An alternative lower triangular form, ULV, is sometimes preferable.) If $V = (V_1, V_2)$ is partitioned conformably, the range of $V_2$ can be taken as an approximation to the numerical nullspace $\mathcal{N}(A)$.


Updating Least Squares Solutions 


It is often desired to solve a sequence of modified least 
squares problems 


$$\min_x \|Ax - b\|_2, \qquad A \in \mathbb{R}^{m\times n}, \qquad (27)$$


where in each step rows of data in (A, b) are added, 
deleted, or both. This need arises, e.g., when data are 
arriving sequentially. In various time-series problems 
a window moving over the data is used; when a new ob- 
servation is added, an old one is deleted as the window 
moves to the next step in the sample. In other applica- 
tions columns of the matrix A may be added or deleted. 
Such modifications are usually referred to as updating 
(downdating) of least squares solutions. 

Important applications where modified least squares problems arise include statistics, optimization, and signal processing. In statistics an efficient and stable procedure for adding and deleting rows to a regression model is needed; see [6]. In regression models one may also want to examine different models, which can be achieved by adding or deleting columns (or permuting columns).


Recursive Least Squares 


Applications in signal processing often require near 
real-time solutions. It is then critical that the modifi- 
cation should be performed with as few operations and 
as little storage requirement as possible. 

Methods based on the normal equations and/or updating of the Cholesky factorization are still often used in statistics and signal processing, although these algorithms lack numerical stability. Consider a least squares problem where an observation $w^T x = \beta$ is added. The updated solution $\tilde x$ then satisfies the modified normal equations

$$(A^T A + ww^T)\tilde x = A^T b + \beta w. \qquad (28)$$

A straightforward method for computing $\tilde x$ is based on updating the (scaled) covariance matrix $C = (A^T A)^{-1}$. By the Sherman-Morrison formula one obtains $\tilde C^{-1} = C^{-1} + ww^T$, and

$$\tilde C = C - \frac{1}{1 + w^T u}\, uu^T, \qquad u = Cw. \qquad (29)$$

From this follows the updating formula

$$\tilde x = x + (\beta - w^T x)\,\tilde u, \qquad \tilde u = \tilde C w. \qquad (30)$$
The equations (29), (30) define a recursive least squares (RLS) algorithm. They can, with slight modifications, also be used for ‘deleting’ observations. The simplicity of this updating algorithm is appealing, but a disadvantage is its serious sensitivity to roundoff errors.
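The RLS recursion (29) and (30) is sketched below, assuming NumPy. The class name is hypothetical. The sketch is meant to show the arithmetic of the update, not to serve as a robust implementation: as noted above, this form is sensitive to roundoff. After a few added observations the result should agree with a batch solve on the stacked data.

```python
import numpy as np

class RecursiveLeastSquares:
    """RLS updating (29)-(30): keeps x and C = (A^T A)^{-1}."""

    def __init__(self, A, b):
        self.C = np.linalg.inv(A.T @ A)         # initial (scaled) covariance matrix
        self.x = self.C @ (A.T @ b)

    def add_observation(self, w, beta):
        u = self.C @ w
        self.C = self.C - np.outer(u, u) / (1.0 + w @ u)   # Sherman-Morrison, (29)
        u_tilde = self.C @ w                                # u~ = C~ w
        self.x = self.x + (beta - w @ self.x) * u_tilde     # updating formula (30)

rng = np.random.default_rng(11)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)
rls = RecursiveLeastSquares(A, b)
W_new = rng.standard_normal((5, 3))
beta_new = rng.standard_normal(5)
for w, beta in zip(W_new, beta_new):            # observations arriving one at a time
    rls.add_observation(w, beta)
```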


Modifying Matrix Factorizations 


The first area where algorithms for modifying matrix
factorizations seem to have been systematically used
is optimization. Numerous aspects of updating various 
matrix factorizations are discussed in [17]. 

There is a simple relationship between the problem of updating matrix factorizations and that of updating least squares solutions. If $A$ has full column rank and the R-factor of the matrix $(A, b)$ is

$$\begin{pmatrix} R & z \\ 0 & \rho \end{pmatrix}, \qquad (31)$$

then the solution to the least squares problem (27) is given by

$$Rx = z, \qquad \|Ax - b\|_2 = |\rho|. \qquad (32)$$

Hence updating algorithms for the QR or Cholesky factorization can be applied to $(A, b)$ in order to give updating algorithms for least squares solutions.

Backward stable algorithms, which require $O(m^2)$ multiplications, exist for updating the QR decomposition for three important kinds of modifications:
• General rank one change of $A$.
• Deleting (adding) a column of $A$.
• Adding (deleting) a row of $A$.
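Relation (32) suggests a simple way to add a row without storing $Q$. The row $(w^T, \beta)$ is rotated into the $(n+1)\times(n+1)$ triangular factor (31) of $(A, b)$ with $n + 1$ Givens rotations, at a cost of $O(n^2)$ operations per row. The following is a minimal sketch, assuming NumPy with hypothetical helper names. It updates only the R-factor, not the full $Q$ as in the algorithms just listed.

```python
import numpy as np

def add_row(Rbar, row):
    """Append the row (w^T, beta) to (A, b) and restore the triangular R-factor (31)
    of (A, b) with Givens rotations; Rbar is (n+1) x (n+1) upper triangular."""
    Rbar = np.array(Rbar, dtype=float)
    row = np.array(row, dtype=float)
    for j in range(Rbar.shape[0]):
        rad = np.hypot(Rbar[j, j], row[j])
        if rad == 0.0:
            continue
        c, s = Rbar[j, j] / rad, row[j] / rad   # rotation that zeroes row[j]
        top = Rbar[j, j:].copy()
        Rbar[j, j:] = c * top + s * row[j:]
        row[j:] = -s * top + c * row[j:]
    return Rbar

def solve_from_Rbar(Rbar):
    """Least squares solution and residual norm from (32)."""
    n = Rbar.shape[0] - 1
    x = np.linalg.solve(Rbar[:n, :n], Rbar[:n, n])   # R x = z
    return x, abs(Rbar[n, n])                         # ||Ax - b||_2 = |rho|

rng = np.random.default_rng(12)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
Rbar = np.linalg.qr(np.column_stack([A, b]), mode='r')   # R-factor of (A, b)
new_rows = rng.standard_normal((4, 4))                    # rows (w^T, beta) to append
for row in new_rows:
    Rbar = add_row(Rbar, row)
x, rho = solve_from_Rbar(Rbar)
```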
In these algorithms, $Q \in \mathbb{R}^{m\times m}$ is stored explicitly as an $m \times m$ matrix. In many applications it suffices to update the ‘Gram-Schmidt’ QR decomposition

$$A = Q_1 R, \qquad Q_1 \in \mathbb{R}^{m\times n}, \qquad (33)$$

where $Q_1$ consists of the first $n$ columns of $Q$ [10,31]. These only require $O(mn)$ storage and operations.

J.R. Bunch and C.P. Nielsen [5] have developed methods for updating the SVD

$$A = U\begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T,$$

where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$, when $A$ is modified by adding or deleting a row or column. However, their algorithms require $O(mn^2)$ flops.


Rank revealing QR factorizations can be updated 
more cheaply, and are often a good alternative to use. 
G.W. Stewart [33] has shown how to compute and up- 
date a rank revealing complete orthogonal decomposi- 
tion from an RRQR decomposition. 

Most updating algorithms can be modified in a straightforward fashion to treat cases where a block of rows or columns is added or deleted; such block algorithms are more amenable to efficient implementation on vector and parallel computers.


Sparse Problems 


The gain in operations and storage in solving the lin- 
ear least squares problems where the matrix A is sparse 
can be huge, making otherwise intractable problems 
possible to solve. Sparse least squares problems of huge 
size arise in a variety of applications, such as geodetic 
surveys, photogrammetry, molecular structure, gravity 
field of the earth, tomography, the force method in 
structural analysis, surface fitting, cluster analysis,
and pattern matching.

Sparse least squares problems may be solved either 
by direct or iterative methods. Preconditioned iterative 
methods can often be considered as hybrids between 
these two classes of solution. Below direct methods are 
reviewed for some classes of sparse problems. 


Banded Least Squares Problems 


A natural distinction is between sparse matrices with 
regular zero pattern (e. g., banded structure) and matri- 
ces with an irregular pattern of nonzero elements. 

A rectangular banded matrix $A \in \mathbb{R}^{m\times n}$ has the property that the nonzero elements in each row lie in a narrow band. $A$ is said to have row bandwidth $w$ if

$$w(A) = \max_{1\le i\le m} \big(l_i(A) - f_i(A) + 1\big), \qquad (34)$$

where

$$f_i(A) = \min\{j : a_{ij} \ne 0\}, \qquad l_i(A) = \max\{j : a_{ij} \ne 0\}$$

are column subscripts of the first and last nonzeros in the $i$th row of $A$. For this structure to have practical significance one needs to have $w \ll n$. Note that, although the row bandwidth is independent of the row ordering, it will depend on the column ordering. To permute the columns in $A$ so that a small bandwidth is achieved, the method of choice is the reverse Cuthill-McKee ordering, see [15].
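The quantities in (34) are easy to compute from the nonzero pattern. The sketch below assumes NumPy, uses 0-based column indices, and uses a hypothetical $8 \times 6$ matrix in which every row holds three consecutive nonzeros. It also checks the band property of $A^T A$ discussed next: entries with $|j - k| \ge w$ vanish.

```python
import numpy as np

def row_bandwidth(A):
    """f_i, l_i (first/last nonzero column in row i, 0-based) and w(A) of (34).
    Every row is assumed to contain at least one nonzero."""
    f = np.array([np.flatnonzero(row)[0] for row in A])
    l = np.array([np.flatnonzero(row)[-1] for row in A])
    return f, l, int(np.max(l - f + 1))

# Hypothetical 8x6 example: row i has nonzeros in columns j, j+1, j+2 with j = min(i, 3).
rng = np.random.default_rng(13)
A = np.zeros((8, 6))
for i in range(8):
    j = min(i, 3)
    A[i, j:j + 3] = rng.uniform(1.0, 2.0, size=3)
f, l, w = row_bandwidth(A)
C = A.T @ A
far = np.abs(np.subtract.outer(np.arange(6), np.arange(6))) >= w   # entries with |j - k| >= w
```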

It is easy to see that if the row bandwidth of $A$ is $w$, then the matrix of normal equations $C = A^T A$ has at most upper bandwidth $p = w - 1$, i.e.,

$$|j - k| \ge w \;\Longrightarrow\; (A^T A)_{jk} = \sum_{i=1}^m a_{ij} a_{ik} = 0.$$

If advantage is taken of the band structure, the solution of a least squares problem where $A$ has bandwidth $w$ by the method of normal equations requires a total of

$$\tfrac{1}{2}\big(mw(w + 3) + n(w - 1)(w + 2)\big) + n(2w - 1)$$

flops.

Similar savings can be obtained for methods based on Givens QR decomposition used to solve banded least squares problems. However, it is then essential that the rows of $A$ are sorted so that the column indices $f_i(A)$, $i = 1, \dots, m$, of the first nonzero element in each row form a nondecreasing sequence, i.e.,

$$i \le k \;\Longrightarrow\; f_i(A) \le f_k(A).$$

A matrix whose rows are sorted in this way is said to be in standard form. Since the matrix $R$ in the QR factorization has the same structure as the Cholesky factor, it must be a banded matrix with nonzero elements only in the first $p = w - 1$ superdiagonals. In the sequential row orthogonalization scheme an upper triangular matrix $R$ is initialized to zero. The orthogonalization then proceeds row-wise, and $R$ is updated by adding one row of $A$ at a time.

If $A$ has constant bandwidth and is in standard form, then in the $i$th step of reduction the last $(n - l_i(A))$ columns of $R$ have not been touched and are still zero as initialized. Further, the first $(f_i(A) - 1)$ rows of $R$ are already finished at this stage and can be read out to secondary storage. Thus, as with the Cholesky method, very large problems can be handled, since primary storage is needed only for the active part of $R$. The complete orthogonalization requires about $2mw^2$ flops, and can be performed in $w(w + 3)/2$ locations of primary storage.

The Givens rotations could also be applied to one or several right-hand sides $b$. Only if right-hand sides which are not initially available are to be treated need the Givens rotations be saved. The algorithm can be modified to also handle problems with variable row bandwidth $w_i$.

For the case when $m \gg n$, a more efficient scheme uses Householder transformations, see [24, Chap. 11]. Let $A_k$ consist of the rows of $A$ for which the first nonzero element is in column $k$. Then, in step $k$ of this algorithm, $A_k$ is merged with $R_{k-1}$ by computing the QR factorization

$$Q_k^T \begin{pmatrix} R_{k-1} \\ A_k \end{pmatrix} = R_k.$$

Note that this and later steps will not involve the first $k - 1$ rows and columns of $R_{k-1}$. Hence the first $k - 1$ rows of $R_{k-1}$ are rows in the final matrix $R$.

The reduction using this algorithm takes about $w(w + 1)(m + 3n/2)$ flops, which is approximately half as many as for the Givens method. As in the Givens algorithm, the Householder transformations can also be applied to one or several right-hand sides $b$ to produce $c = Q^T b$. The least squares solution is then obtained from $Rx = c$ by back-substitution.

It is essential that the Householder transformations be subdivided as outlined above; otherwise intermediate fill will occur and the operation count will increase greatly, see the example in [32].


Block Angular Form 


There is often a substantial similarity in the structure of 
large sparse least squares problems. The matrices pos- 
sess a block structure, perhaps at several levels, which 
reflects a ‘local connection’ structure in the underlying 
physical problem. In particular, the problem can often 
be put in the following bordered block diagonal or block 
angular form: 


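$$A = \begin{pmatrix} A_1 & & & B_1 \\ & \ddots & & \vdots \\ & & A_M & B_M \end{pmatrix}, \qquad (35)$$

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_M \\ x_{M+1} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_M \end{pmatrix}. \qquad (36)$$

From (35) it follows that the variables $x_1, \dots, x_M$ are coupled only to the variables $x_{M+1}$. Some applications where the form (35) arises naturally are photogrammetry, Doppler radar positioning [27], and geodetic survey problems [20].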

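Problems of block angular form can be efficiently treated either by using normal equations or by QR factorization. It is easily seen that the matrix $R$ from Cholesky or QR will have a block structure similar to that of $A$,

$$R = \begin{pmatrix} R_1 & & & S_1 \\ & \ddots & & \vdots \\ & & R_M & S_M \\ & & & R_{M+1} \end{pmatrix}, \qquad (37)$$

where the $R_i \in \mathbb{R}^{n_i\times n_i}$ are upper triangular. This factor can be computed by first performing a sequence of orthogonal transformations yielding

$$Q_i^T (A_i, B_i) = \begin{pmatrix} R_i & S_i \\ 0 & T_i \end{pmatrix}, \qquad Q_i^T b_i = \begin{pmatrix} c_i \\ d_i \end{pmatrix}, \qquad i = 1, \dots, M.$$

Any sparse structure in the blocks $A_i$ should be exploited. The last block row $R_{M+1}$, $c_{M+1}$ is obtained by computing the QR decomposition

$$\tilde Q_{M+1}^T (T, d) = \begin{pmatrix} R_{M+1} & c_{M+1} \\ 0 & d_{M+1} \end{pmatrix},$$

where

$$T = \begin{pmatrix} T_1 \\ \vdots \\ T_M \end{pmatrix}, \qquad d = \begin{pmatrix} d_1 \\ \vdots \\ d_M \end{pmatrix}.$$

The unknown $x_{M+1}$ is determined from the triangular system $R_{M+1} x_{M+1} = c_{M+1}$. Finally $x_M, \dots, x_1$ are computed by back-substitution in the sequence of triangular systems $R_i x_i = c_i - S_i x_{M+1}$, $i = M, \dots, 1$. Note that a large part of the computations can be performed in parallel on the $M$ subsystems.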

Several modifications of this basic algorithm have been suggested in [19] and [9].
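The block procedure above can be sketched densely, assuming NumPy. It further assumes that each $A_i$ has full column rank with more rows than columns and that the stacked $T$ has full column rank. The function name and the $M = 3$ example sizes are hypothetical, and no sparsity within the blocks is exploited. The per-block QR factorizations in the loop are independent of each other, which is where the parallelism mentioned above comes from.

```python
import numpy as np

def block_angular_lstsq(A_blocks, B_blocks, b_blocks):
    """Solve min ||Ax - b||_2 for the block angular A of (35), following (37)."""
    Rs, Ss, cs, Ts, ds = [], [], [], [], []
    for Ai, Bi, bi in zip(A_blocks, B_blocks, b_blocks):   # independent subproblems
        ni = Ai.shape[1]
        Qi, Ri = np.linalg.qr(Ai, mode='complete')         # Qi is m_i x m_i
        QB, Qb = Qi.T @ Bi, Qi.T @ bi
        Rs.append(Ri[:ni]); Ss.append(QB[:ni]); cs.append(Qb[:ni])
        Ts.append(QB[ni:]); ds.append(Qb[ni:])
    # Last block row: QR of (T, d) stacked from all subproblems.
    Q, R_last = np.linalg.qr(np.vstack(Ts))
    x_last = np.linalg.solve(R_last, Q.T @ np.concatenate(ds))   # R_{M+1} x_{M+1} = c_{M+1}
    # Back-substitution R_i x_i = c_i - S_i x_{M+1} for each block.
    xs = [np.linalg.solve(Ri, ci - Si @ x_last) for Ri, Si, ci in zip(Rs, Ss, cs)]
    return xs, x_last

# Hypothetical example with M = 3 blocks, m_i = 5, n_i = 2 and n_{M+1} = 2.
rng = np.random.default_rng(14)
M, mi, ni, nlast = 3, 5, 2, 2
A_blocks = [rng.standard_normal((mi, ni)) for _ in range(M)]
B_blocks = [rng.standard_normal((mi, nlast)) for _ in range(M)]
b_blocks = [rng.standard_normal(mi) for _ in range(M)]
xs, x_last = block_angular_lstsq(A_blocks, B_blocks, b_blocks)

# Assemble the full matrix (35) only to compare with a dense solver.
A_full = np.zeros((M * mi, M * ni + nlast))
for i in range(M):
    A_full[i * mi:(i + 1) * mi, i * ni:(i + 1) * ni] = A_blocks[i]
    A_full[i * mi:(i + 1) * mi, M * ni:] = B_blocks[i]
b_full = np.concatenate(b_blocks)
x_full = np.concatenate(xs + [x_last])
```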


General Sparse Problems 


If $A$ is partitioned by rows, then (15) can be used to compute the matrix $C = A^T A$. Make the ‘no-cancellation assumption’ that whenever two nonzero numerical quantities are added or subtracted, the result is nonzero. Then it follows that the nonzero structure of $A^T A$ is the direct sum of the nonzero structures of $a_i a_i^T$, $i = 1, \dots, m$, where $a_i^T$ denotes the $i$th row of $A$. Hence the undirected graph $G(A^T A)$ representing the structure of $A^T A$ can be constructed as the direct sum of all the graphs $G(a_i a_i^T)$, $i = 1, \dots, m$. The nonzeros in row $a_i^T$ will generate a subgraph where all pairs of nodes are connected. Such a subgraph is called a clique.

From the graph $G(A^T A)$ the structure of the Cholesky factor $R$ can be predicted by using a graph model of Gaussian elimination. The fill under the factorization process can be analyzed by considering a sequence of undirected graphs $G_i = G(A^{(i)})$, $i = 0, \dots, n - 1$, where $A^{(0)} = A$. These elimination graphs can be recursively formed in the following way. Form $G_i$ from $G_{i-1}$ by removing the node $i$ and its incident edges and adding fill edges. The fill edges in eliminating node $v$ in the graph $G$ are

$$\{(j, k) : j, k \in \mathrm{Adj}_G(v),\ j \ne k\}.$$

Thus, the fill edges correspond to the set of edges required to make the adjacent nodes of $v$ pairwise adjacent. The filled graph $G_F(A)$ of $A$ is a graph with $n$ vertices and edges corresponding to all the elimination graphs $G_i$, $i = 0, \dots, n - 1$. The filled graph bounds the structure of the Cholesky factor $R$,

$$G(R^T + R) \subseteq G_F(A). \qquad (38)$$

This also gives an upper bound for the structure of the factor $R$ in the QR decomposition.

A reordering $AP$ of the columns of $A$ corresponds to a symmetric reordering of the rows and columns of $A^T A$. Although this will not affect the number of nonzeros in $A^T A$, only their positions, it may greatly affect the number of nonzeros in the Cholesky factor $R$. Before carrying out the Cholesky or QR factorization numerically, it is therefore important to find a permutation matrix $P$ such that $P^T A^T A P$ has a sparse Cholesky factor $R$.
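The graph model can be simulated directly to see how much the ordering matters. The sketch below uses plain Python; `ata_graph` and `fill_edges` are hypothetical helper names. It builds $G(A^T A)$ from the nonzero patterns of the rows of $A$, with each row contributing a clique. It then eliminates nodes in a given order, adding the fill edges defined above. The example is an ‘arrow’ pattern in which column 0 is coupled to every other column. Eliminating that hub first fills in everything, while eliminating it last produces no fill at all.

```python
from itertools import combinations

def ata_graph(rows, n):
    """Adjacency sets of G(A^T A): the nonzero columns of each row form a clique."""
    adj = {v: set() for v in range(n)}
    for cols in rows:
        for j, k in combinations(cols, 2):
            adj[j].add(k)
            adj[k].add(j)
    return adj

def fill_edges(adj, order):
    """Eliminate nodes in the given order and return the set of fill edges (j, k), j < k."""
    g = {v: set(nb) for v, nb in adj.items()}   # work on a copy of the graph
    fill = set()
    for v in order:
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        for j, k in combinations(sorted(nbrs), 2):   # make the neighbours of v pairwise adjacent
            if k not in g[j]:
                fill.add((j, k))
                g[j].add(k)
                g[k].add(j)
    return fill

rows = [[0, 1], [0, 2], [0, 3], [0, 4]]         # every row couples column 0 with one other
adj = ata_graph(rows, 5)
fill_first = fill_edges(adj, [0, 1, 2, 3, 4])   # hub eliminated first: 6 fill edges
fill_last = fill_edges(adj, [1, 2, 3, 4, 0])    # hub eliminated last: no fill
```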

By far the most important local ordering method is the minimum degree ordering. In terms of the Cholesky factorization, this ordering is equivalent to choosing the $i$th pivot column as one with the minimum number of nonzero elements in the unreduced part of $A^T A$. This will minimize the number of entries that will be modified in the next elimination step. Remarkably fast symbolic implementations of the minimum degree algorithm exist, which use refinements of the elimination graph model of the Cholesky factorization. See [16] for a survey of the extensive development of efficient versions of the minimum degree algorithm.
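In its simplest greedy form, the minimum degree ordering repeatedly eliminates a node of smallest current degree in the elimination graph. This sketch uses plain Python, with ties broken by the smallest node index; that tie rule is an arbitrary choice here. It deliberately omits the refinements surveyed in [16] that make practical implementations fast. On the arrow pattern, the hub is eliminated only once its degree has dropped, so no fill occurs.

```python
from itertools import combinations

def minimum_degree_order(adj):
    """Greedy minimum degree ordering on the elimination graph (ties: smallest index)."""
    g = {v: set(nb) for v, nb in adj.items()}   # work on a copy of the graph
    order = []
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        for j, k in combinations(nbrs, 2):      # fill edges: neighbours of v become a clique
            g[j].add(k)
            g[k].add(j)
        order.append(v)
    return order

# Arrow pattern of G(A^T A): node 0 is adjacent to all other nodes.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
order = minimum_degree_order(adj)
```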

Another important ordering method is substructuring or nested dissection, which results in a nested block angular form. Here the idea is to choose a set of nodes $B$ in the graph $G(A^T A)$ which separates the other nodes into two sets $A_1$ and $A_2$, so that node variables in $A_1$ are not connected to node variables in $A_2$. The variables are then ordered so that those in $A_1$ appear first, those in $A_2$ second, and those in $B$ last. Finally the equations are ordered so that those including $A_1$ come first, those including $A_2$ next, and those only involving variables in $B$ come last. This dissection can be continued recursively, first dissecting the regions $A_1$ and $A_2$ each into two subregions, and so on.

An algorithm using the normal equations for solving sparse linear least squares problems is usually split into a symbolical and a numerical phase as follows.

1) Determine symbolically a column permutation $P_c$ such that $P_c^T A^T A P_c$ has a sparse Cholesky factor $R$.
2) Perform the Cholesky factorization of $P_c^T A^T A P_c$ symbolically to generate a storage structure for $R$.
3) Compute $B = P_c^T A^T A P_c$ and $c = P_c^T A^T b$ numerically, storing $B$ in the data structure of $R$.
4) Compute the Cholesky factor $R$ numerically and solve $R^T z = c$, $Ry = z$, giving the solution $x = P_c y$.

Here, steps 1 and 2 involve only symbolic computation and apply also to a sparse QR algorithm. For details of the implementation of the numerical factorization see [15, Chap. 5]. For moderately ill-conditioned problems a sparse Cholesky factorization, possibly used with iterative refinement, is a satisfactory choice.

Orthogonalization methods are potentially more 
accurate since they work directly with A. The number 
of operations needed to compute the QR decomposi- 
tion depends on the row ordering, and the following 
heuristic row ordering algorithm should be applied to 
A before the numerical factorization takes place: 

First sort the rows by increasing $f_i(A)$, so that $f_i(A) \le f_k(A)$ if $i < k$. Then for each group of rows with $f_i(A) = k$, $k = 1, \dots, \max_i f_i(A)$, sort all the rows by increasing $l_i(A)$.
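The two-level row ordering heuristic amounts to a lexicographic sort on the pair $(f_i(A), l_i(A))$. A minimal plain-Python sketch follows. The function name and the example patterns are hypothetical, and column indices are 0-based.

```python
def order_rows(patterns):
    """Heuristic row ordering: sort by f_i (first nonzero column), then by l_i (last).
    patterns[i] is the sorted list of nonzero column indices of row i."""
    return sorted(range(len(patterns)), key=lambda i: (patterns[i][0], patterns[i][-1]))

patterns = [[2, 4], [0, 3], [0, 1], [2, 3], [1, 5]]   # hypothetical nonzero patterns
order = order_rows(patterns)                          # row indices in processing order
```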

In the sparse case, applying the usual sequence of Householder reflections may cause a lot of intermediate fill-in, with consequent cost in operations and storage. In the row sequential algorithm by J.A. George and M.T. Heath [14], this problem is avoided by using a row-oriented method employing Givens rotations. Even more efficient are multifrontal methods, in which Householder transformations are applied to a sequence of small dense subproblems.

Note that in most sparse QR algorithms the orthog- 
onal factor Q is not stored. The corrected seminormal 
equations are used for treating additional right-hand 
sides. The reason is that for rectangular matrices A the 
matrix Q is usually much less sparse than R. In the mul- 
tifrontal algorithm, however, Q can efficiently be rep- 
resented by the Householder vectors of the frontal or- 
thogonal transformations, see [26]. 

A Fortran multifrontal sparse QR subroutine, called 
QR27, has been developed by P. Matstoms [28]. He [29] 
has also developed a version of this to be used with 
MATLAB, implemented as four M-files and available 
from netlib. 


See also 


> ABS Algorithms for Linear Equations and Linear 
Least Squares 

> ABS Algorithms for Optimization 

> Gauss, Carl Friedrich 

> Gauss-Newton Method: Least Squares, Relation to Newton’s Method

> Generalized Total Least Squares 

> Least Squares Orthogonal Polynomials 

> Nonlinear Least Squares: Newton-type Methods 

> Nonlinear Least Squares Problems 
G.W. Leibniz (1646-1716) was a well-known German philosopher and mathematician. He is considered a forerunner of German idealism and a pioneer of the Enlightenment. Leibniz is known as one of the inventors of the differential and integral calculus [7].

Leibniz’s contribution in philosophy is as significant as in mathematics. In philosophy Leibniz is known for his fundamental ideas and principles, including truth, necessary and contingent truths, possible worlds, the principle of sufficient reason (i.e., nothing happens without a reason), the principle of pre-established harmony (i.e., the universe is created in such a way that corresponding mental and physical events occur simultaneously), and the principle of noncontradiction (i.e., if a contradiction can be derived from a proposition, this proposition is false). Leibniz was fond of the idea that the principles of reasoning could be organized into a formal symbolic system, an algebra or calculus of thought, where disagreements could be settled by calculations [4].

Leibniz was the son of a professor of moral philosophy at Leipzig Univ. Leibniz learned to read from his father before going to school. He taught himself Latin and Greek by age 12, so that he could read the books in his father’s library. He studied law at the Univ. of Leipzig from 1661 to 1666. In 1666 he was refused the degree of doctor of laws at Leipzig. He went to the Univ. of Altdorf, which awarded him a doctorate in jurisprudence in 1667 [1].

Leibniz started his career at the court of Mainz, where he worked until 1672. The Elector of Mainz promoted him to the diplomatic service. In 1672 he visited Paris to try to dissuade Louis XIV from attacking German areas. Leibniz remained in Paris until 1676 and continued to practice law there. In Paris he studied mathematics and physics under Chr. Huygens. During this period he developed the basic features of his version of the calculus. He spent the rest of his life, from 1676 until his death (November 14, 1716), at Hannover [6].

Leibniz’s most important achievement in mathematics was the discovery of infinitesimal calculus. Calculus is so significant that it is regarded as the starting point of modern mathematics. Leibniz’s formulations differed from the earlier investigations of I. Newton. Newton concentrated mainly on the geometrical representation of calculus, while Leibniz took it towards analysis. Newton considered variables changing with time. Leibniz thought of variables x, y as ranging over sequences of infinitely close values. For Newton integration and differentiation were inverses, while Leibniz used integration as a summation. At that time, neither Leibniz nor Newton thought in terms of functions; both always thought in terms of graphs.

In November 1675 he wrote a manuscript using the notation $\int f(x)\,dx$ for the first time [5]. In the same manuscript he presented the product rule for differentiation. The quotient rule first appeared two years later, in July 1677. In 1676 Leibniz arrived at the conclusion that he was in possession of a method that was highly important because of its generality. Whether a function was rational or irrational, algebraic or transcendental (a word that Leibniz coined), his operations of finding sums and differences could always be applied.

In November 1676 Leibniz discovered the familiar notation $d(x^n) = n x^{n-1}\,dx$ for both integer and fractional n. Newton claimed that ‘not a single previously unsolved problem was solved here’, but the formalism of Leibniz’s approach proved to be vital in the development of the calculus. Leibniz never thought of the derivative as a limit; this idea does not appear until the work of J. d'Alembert. Leibniz was convinced that good mathematical notation was the key to progress, so he experimented with different notations for coefficient systems. His language was fresh and appropriate, incorporating such terms as differential, integral, coordinate and function [8]. His notations, which we still use today, were clear and elegant. His unpublished manuscripts contain more than 50 different ways of writing coefficient systems, which he worked on during the decades following 1678.

Leibniz used the word resultant for certain combinatorial sums of terms of a determinant. He proved various results on resultants, including what is essentially Cramer’s rule. He also knew that a determinant could be expanded using any column, which is now called the Laplace expansion. As well as studying coefficient systems of equations, which led him to determinants, Leibniz also studied coefficient systems of quadratic forms, which led naturally towards matrix theory [9]. He also thought about continuity, space and time [2].

In 1684 Leibniz published details of his differential calculus in ‘Acta Eruditorum’, a journal established in Leipzig two years earlier. He described a general method for finding maxima and minima, and for drawing tangents to curves. The paper contained the rules for computing the derivatives of powers, products and quotients.

In 1686 Leibniz published a paper on the principles of the new calculus [3] in ‘Acta Eruditorum’. Leibniz emphasized the inverse relationship between differentiation and integration in the fundamental theorem of calculus.

In 1692 Leibniz wrote a paper that set the basis of the theory of envelopes. This was further developed in another paper published in 1694, where he introduced for the first time the terms coordinates and axes of coordinates.

Leibniz published many papers on mechanical sub- 
jects as well [1]. In 1700 Leibniz founded the Berlin 
Academy and was its first president. 

Leibniz’s principal works are: 

1) ‘De Arte Combinatoria’ (On the Art of Combination), 1666;

2) ‘Hypothesis Physica Nova’ (New Physical Hypothesis), 1671;

3) ‘Discours de Métaphysique’ (Discourse on Metaphysics), 1686;

4) Unpublished Manuscripts on the Calculus of Concepts, 1690;

5) ‘Nouveaux Essais sur l’Entendement Humain’ (New Essays on Human Understanding), 1705;

6) ‘Théodicée’ (Theodicy), 1710;

7) ‘Monadologia’ (The Monadology), 1714.
The linear complementarity problem (LCP) is a well-known problem in mathematical programming. Applications of the LCP have been found in engineering, game theory, economics, and many other scientific fields. The monograph of K.G. Murty [8] is a compendium of LCP developments. One of the most significant approaches to the solution of the linear complementarity problem is called Lemke’s method or Lemke’s algorithm. Two descriptions of the algorithm [6,7] provide many algorithmic proofs and details for the interested reader. Our treatment here is a sketch of the algorithm, together with pointers to related work in the literature.

There are some important related works for those who wish to solve LCPs. A. Ravindran [10] provided a FORTRAN implementation of Lemke’s algorithm in a set-up similar to the revised simplex method. C.B. Garcia [2] described some classes of matrices for which the associated LCPs can be solved by Lemke’s algorithm. J.J.M. Evers [1] enlarged the range of application of Lemke’s algorithm, and showed that it could solve the bimatrix game. P.M. Pardalos and J.B. Rosen [9] presented a global optimization approach to the LCP. D. Solow and P. Sengupta [11] proposed a finite descent theory for the linear complementarity problem. M.M. Kostreva [4] showed that without the nondegeneracy assumption, Lemke’s algorithm may cycle, and showed that the minimum length of such a cycle is four.

The linear complementarity problem considered is: Given an $(n \times n)$-matrix M and an $(n \times 1)$ column vector q, problem LCP(q, M) is to find x (or prove that no such x exists) in $\mathbb{R}^n$ satisfying

$$y = Mx + q, \tag{1}$$

$$y_i \ge 0, \tag{2}$$

$$x_i \ge 0, \tag{3}$$

$$y_i x_i = 0, \tag{4}$$

for all $i = 1, \dots, n$.


Given (2) and (3), the conditions (4) are equivalent to $y^T x = 0$, since a sum of nonnegative terms vanishes only if every term does. The variables $(y_i, x_i)$ are called a complementary pair of variables. Lemke’s algorithm is organized relative to the following extended system of equations:

$$y = Mx + q + x_0 d, \tag{5}$$

where d is an $(n \times 1)$ column vector and $x_0 \ge 0$. Relative to the vector d, it is only required that $q + x_0 d \ge 0$ for some $x_0 \ge 0$. It is assumed that the system of equations (5) is nondegenerate, that is, any solution has at most n + 1 zero values among the variables $(y, x, x_0)$.
If $q \ge 0$, terminate with the complementary feasible solution $y = q$, $x = 0$.

If q has some negative component, then on the first pivot $x_0$ is increased until, for the first time, $y = q + x_0 d \ge 0$. When this occurs, some y variable, say $y_r$, becomes zero. The first pivot exchanges the variables $x_0$ and $y_r$. Now the variable $x_0$ is basic, and the variables $y_r$ and $x_r$ are two complementary nonbasic variables. If a pivot can be made on variable $x_r$ (the complement of the most recently pivoted member of the complementary pair), then it leads to another similar situation with another pair of complementary variables. If a pivot cannot be made, the sequence is terminated. If the variable $x_0$ becomes nonbasic (zero), a solution is at hand. If not, the pivoting continues uniquely, with each new set of equations containing a nonbasic complementary pair of variables, one of which was most recently made nonbasic. Due to the unique choices of pivot row and pivot column, finite termination must occur.
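The pivoting scheme can be sketched as a dense tableau method on system (5), written as $Iy - Mx - d\,x_0 = q$. This is a minimal illustration and not the FORTRAN implementation cited above. It assumes d > 0 componentwise (default $d = e$). When the ratio test ties, it prefers the row in which $x_0$ is basic, which is a common but not unique choice. The name `lemke` and the tolerance parameter are hypothetical.

```python
import numpy as np

def lemke(M, q, d=None, max_iter=1000, tol=1e-12):
    """Sketch of Lemke's method on y = Mx + q + x0*d (assumes d > 0).

    Returns (x, y) with y = Mx + q, x >= 0, y >= 0, x'y = 0,
    or None on secondary ray termination.
    """
    M = np.asarray(M, dtype=float)
    q = np.asarray(q, dtype=float)
    n = q.size
    d = np.ones(n) if d is None else np.asarray(d, dtype=float)
    if np.all(q >= 0):
        return np.zeros(n), q.copy()          # trivial solution y = q, x = 0

    # Tableau for I*y - M*x - d*x0 = q. Columns: y_1..y_n, x_1..x_n, x0, rhs.
    T = np.hstack([np.eye(n), -M, -d.reshape(-1, 1), q.reshape(-1, 1)])
    basis = list(range(n))                    # the y variables start out basic
    art = 2 * n                               # column index of x0

    def pivot(r, c):
        """Make column c basic in row r; return the index of the leaving variable."""
        T[r] /= T[r, c]
        for i in range(n):
            if i != r:
                T[i] -= T[i, c] * T[r]
        leaving, basis[r] = basis[r], c
        return leaving

    # First pivot: raise x0 until q + x0*d >= 0; the row with the largest
    # -q_i/d_i holds the y_r that drops to zero and leaves the basis.
    leaving = pivot(int(np.argmax(-q / d)), art)
    for _ in range(max_iter):
        c = leaving + n if leaving < n else leaving - n   # complement enters
        col, rhs = T[:, c], T[:, -1]
        rows = [i for i in range(n) if col[i] > tol]
        if not rows:
            return None                       # secondary ray termination
        ratios = [rhs[i] / col[i] for i in rows]
        best = min(ratios)
        ties = [i for i, t in zip(rows, ratios) if t <= best + tol]
        r = next((i for i in ties if basis[i] == art), ties[0])
        leaving = pivot(r, c)
        if leaving == art:                    # x0 left the basis: solution found
            break
    else:
        raise RuntimeError("iteration limit reached")

    values = np.zeros(2 * n + 1)
    values[basis] = T[:, -1]
    return values[n:2 * n], values[:n]        # (x, y)
```

On the data of Example 1 with $d = e$, a hand trace of this sketch gives four pivots. It returns x = (1, 1, 1), which is the constraint multiplier followed by $x^* = (1, 1)$.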

Under certain conditions, including the case of a positive semidefinite matrix M, termination without finding a pivot (also called secondary ray termination) can be shown to imply that the set $\{x : y = Mx + q \ge 0,\ x \ge 0\}$ is empty. Under such conditions, Lemke’s algorithm is said to process the LCP: either it is solved, or it is shown not to have a feasible solution.
The set of all LCPs which Lemke’s algorithm will pro- 
cess is unknown, but some recent papers shed light on 
its processing domain. Kostreva and M.M. Wiecek [5] 
use a multiple objective optimization approach which 
eventually results in a larger dimensioned LCP, while G. 
Isac, Kostreva and Wiecek [3] point out a set of prob- 
lems which is impossible for Lemke’s method to pro- 
cess. 


Example 1 Consider the LCP corresponding to the quadratic programming problem

$$\begin{aligned} \min\ & x_1^2 - 2x_1x_2 + x_2^2 + 3x_1 + x_2\\ \text{s.t. } & 3x_1 + x_2 \ge 4,\\ & x_1 \ge 0,\ x_2 \ge 0. \end{aligned}$$

Then $q = (-4, 3, 1)^T$ and

$$M = \begin{pmatrix} 0 & 3 & 1\\ -3 & 2 & -2\\ -1 & -2 & 2 \end{pmatrix},$$

and Lemke’s algorithm requires four pivots to obtain the solution $x^* = (1, 1)^T$, using the vector $d = (1, 1, 1)^T$. It is noteworthy that the nondegeneracy assumption is not satisfied in this example, but Lemke’s algorithm works anyway.
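The data q and M of Example 1 can be reproduced from the standard KKT construction for a quadratic program $\min c^T x + \tfrac12 x^T Q x$ subject to $Ax \ge b$, $x \ge 0$. This construction places the constraint multiplier u in front of x and uses $M = \begin{pmatrix} 0 & A\\ -A^T & Q \end{pmatrix}$, $q = (-b, c)$. The sketch below (function name `qp_to_lcp` hypothetical) rebuilds the example this way. It then checks that $z = (u, x_1, x_2) = (1, 1, 1)$ satisfies the LCP conditions, where the multiplier value u = 1 follows from the stationarity condition at $x^* = (1, 1)$.

```python
import numpy as np

def qp_to_lcp(Q, c, A, b):
    """LCP data (q, M) for the KKT conditions of min c'x + x'Qx/2 s.t. Ax >= b, x >= 0."""
    Q = np.atleast_2d(np.asarray(Q, dtype=float))
    A = np.atleast_2d(np.asarray(A, dtype=float))
    m = A.shape[0]
    M = np.block([[np.zeros((m, m)), A], [-A.T, Q]])
    q = np.concatenate([-np.asarray(b, dtype=float), np.asarray(c, dtype=float)])
    return q, M

# Example 1: objective x1^2 - 2 x1 x2 + x2^2 + 3 x1 + x2, constraint 3 x1 + x2 >= 4
q, M = qp_to_lcp(Q=[[2, -2], [-2, 2]], c=[3, 1], A=[[3, 1]], b=[4])
z = np.array([1.0, 1.0, 1.0])     # (u, x1, x2): multiplier u = 1 and x* = (1, 1)
y = M @ z + q                     # complementary slacks; all zero here
```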


See also 


> Convex-simplex Algorithm 

> Linear Complementarity Problem 
> Parametric Linear Programming: Cost Simplex

> Sequential Simplex Method
The general linear optimization (LO), linear programming (cf. » Linear programming), problem will be considered in the standard primal form

$$\min \{ c^T x : Ax = b,\ x \ge 0 \},$$

together with its standard dual

$$\max \{ b^T y : A^T y \le c \}.$$


One of the most efficient, and for a long time the only, practical methods to solve LO problems was the simplex method of G.B. Dantzig. The simplex method is a pivot algorithm that traverses feasible basic solutions while the objective value is improving. In practice the simplex method is one of the most efficient algorithms, but in theory it is guaranteed to be finite only for nondegenerate problems.

A basis is called primal degenerate if at least one of the basic variables is zero; it is called dual degenerate if the reduced cost of at least one nonbasic variable is zero. In general, a basis is called degenerate if it is primal degenerate, dual degenerate, or both. The LO problem is degenerate if it has a degenerate basis. A pivot is called degenerate when the objective value remains unchanged after the pivot. When the problem is degenerate, the objective might stay the same in subsequent iterations and the simplex algorithm may cycle, i.e. starting from a basis, after some iterations the same basis is revisited and this process is repeated endlessly. Because the simplex method produces a sequence of monotonically nonworsening objective values, the objective stays constant in a cycle; thus each pivot in the cycle must be degenerate. The possibility of cycling was recognized shortly after the invention of the simplex algorithm. Cycling examples were given by E.M.L. Beale [2] and by A.J. Hoffman [10]. Recently (1999) a scheme to construct cycling LO examples was presented in [9]. These examples made evident that extra techniques are needed to ensure finite termination of simplex methods. The first and widely used such tool is the lexicographic simplex rule. Other techniques, like the least-index anticycling rules (cf. » Least-index anticycling rules) and more general recursive schemes, were developed more recently.


Lexicographic Simplex Methods 


First we need to define an ordering, the so-called lexico- 
graphic ordering of vectors. 


Lexicographic Ordering 


An n-dimensional vector $u = (u_1, \dots, u_n)$ is called lexicographically positive or, in other words, lexico-positive if its first nonzero coordinate is positive, i.e. for a certain $j \le n$ one has $u_i = 0$ for $i < j$ and $u_j > 0$. Observe that the zero vector is the only lexico-nonnegative vector which is not lexico-positive. The vector $u^0$ is said to be lexicographically smaller than a vector $u^1$ when the difference $u^1 - u^0$ of the two vectors is lexico-positive. Further, if a finite set of vectors $\{u^0, \dots, u^k\}$ is given, then the vector $u^0$ is said to be lexico-minimal in the given set when $u^0$ is lexicographically smaller than $u^i$ for all $1 \le i \le k$.
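The three definitions translate directly into code. The sketch below follows the definitions literally, and the helper names are hypothetical. Python's built-in tuple comparison already implements the same ordering, so the explicit loop only mirrors the text.

```python
def is_lexico_positive(u):
    """True if the first nonzero coordinate of u is positive."""
    for ui in u:
        if ui > 0:
            return True
        if ui < 0:
            return False
    return False   # the zero vector is lexico-nonnegative but not lexico-positive

def lexico_smaller(u0, u1):
    """True if u0 is lexicographically smaller than u1, i.e. u1 - u0 is lexico-positive."""
    return is_lexico_positive([b - a for a, b in zip(u0, u1)])

def lexico_min(vectors):
    """Index of the lexico-minimal vector in a finite list of distinct vectors."""
    best = 0
    for i in range(1, len(vectors)):
        if lexico_smaller(vectors[i], vectors[best]):
            best = i
    return best
```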


The Lexicographic Primal Simplex Method 


Cycling of the simplex method is possible only when the 
LO problem is degenerate. In that case possibly many 
variables are eligible to enter and to leave the basis. The 
lexicographic primal simplex rule makes the selection 
of the leaving variable uniquely determined when the 
entering variable is already chosen. 


The Use of Lexicographic Ordering 


At start a feasible lexico-positive basis tableau is given. 
A basis tableau is called lexico-positive if, except the re- 
duced cost row, all of its row vectors are lexico-positive. 
Any feasible basis tableau can be made lexico-positive 
by a simple rearrangement of its columns. Specifically, 
we can take the solution column as the first one, and 
then take the current basic variables, in an arbitrary or- 
der, followed by the nonbasic variables, again in an ar- 
bitrary ordering. 

The following lexicographic simplex pivot selection 
rule was first proposed by Dantzig, A. Orden and P. 
Wolfe [7]. 


0 | Initialization. 

Let T(B) be a given primal feasible lexico- 
positive basis tableau. 

(Fix the order of the variables.) 

1 | Entering variable selection. 

Choose a dual infeasible variable, i.e. one with 
negative reduced cost. Let its index be q. 

IF no such variable exists, THEN STOP; 

The tableau T(B) is optimal and this way a pair 
of optimal solutions is obtained. 

2 | Leaving variable selection. 

Collect in column q all the candidate pivot el- 
ements that satisfy the usual pivot selection 
conditions of the primal simplex method. 

Let $K = \{i_1, \dots, i_\ell\}$ be the set of the indices of the candidate leaving variables.

IF there is no pivot candidate, 

THEN STOP; 

The primal problem is unbounded, and so the 
dual problem is infeasible. 

IF there is a unique pivot candidate {p} = K to 
leave the basis, 

THEN go to Step 3. 

IF there are more pivot candidates, 

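THEN look at the scaled row vectors $t^i / t_{iq}$, $i \in K$, of the basis tableau (note that by construction $x_i$ is the first coordinate of $t^i$, so all scaled candidate rows share the same first coordinate, the minimum ratio).

Let p be the pivot row if $t^p / t_{pq}$ is lexico-minimal in this set of row vectors.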

3 | Basis transformation. 

Pivot on (p, q). Go to Step 1.


The lexicographic primal simplex rule 


The following two observations are important. First, note that lexicographic selection plays a role only when the leaving variable is selected. In that case some rows of the tableau are compared in the lexicographic ordering. If the basic variables were originally put right after the solution column, as proposed in order to get a lexico-positive initial tableau, then this comparison is already decided when one considers only the columns corresponding to the initial basis. This claim holds because those columns form a basis, so the related row vectors are linearly independent as well.

On the other hand, when the initial basis is the unit matrix, the basis inverse can be found at each pivot in the place of the initial unit matrix. Putting these two observations together, we can conclude that instead of the rows of the basis tableau, the rows of the basis inverse, each headed by the corresponding solution coordinate, can be used in Step 2 to determine the unique leaving variable. As a consequence, one does not need to calculate and store the complete basis tableau when implementing the lexicographic pivot rule. The solution and the basis inverse provide all the necessary information.
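The leaving-row selection can be sketched using only the solution $x_B$ and the basis inverse, as the preceding paragraph suggests. Each row with a positive pivot-column entry $t_{iq}$ is headed by its solution coordinate, divided by $t_{iq}$ (the standard form of the rule), and compared lexicographically. Because the first key component is the ordinary ratio, the lexicographic comparison also performs the usual ratio test. The sketch uses exact rational arithmetic (`fractions.Fraction`) so that ties are detected exactly. The function name `lex_leaving_row` is hypothetical.

```python
from fractions import Fraction

def lex_leaving_row(x_B, B_inv, col):
    """Row chosen by the lexicographic ratio test, or None if no t_iq > 0 (unbounded).

    x_B   : current values of the basic variables, one per row
    B_inv : rows of the current basis inverse
    col   : entries t_iq of the entering column q
    """
    best_row, best_key = None, None
    for i, t in enumerate(col):
        t = Fraction(t)
        if t <= 0:
            continue                          # only positive entries limit the step
        # Row i of [x_B | B^-1], scaled by its pivot-column entry.
        key = tuple(Fraction(v) / t for v in [x_B[i], *B_inv[i]])
        if best_key is None or key < best_key:   # tuple '<' is lexicographic
            best_row, best_key = i, key
    return best_row
```

For example, with $x_B = (0, 0, 1)$, $B^{-1} = I$ and column $(1, 2, 1)$, rows 0 and 1 tie in the ordinary ratio test with ratio 0. The basis-inverse columns then break the tie in favor of row 1.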

The lexicographic simplex method is finite. The finiteness proof is based on the following simple properties. There is a finite number of different basis tableaus. The first row of the tableau, i.e. the vector having the objective value as its first coordinate followed by the reduced cost vector, strictly increases lexicographically at each iteration. This fact ensures that no basis can be revisited, so cycling is impossible.


Lexicographic Ordering and Perturbation 


Independently of [7], A. Charnes [4] developed a perturbation technique that resulted in a finite simplex algorithm. This algorithm turned out to be equivalent to the lexicographic rule. The perturbation technique is as follows. Let ε be a sufficiently small number, and replace $b_i$ by $b_i + \sum_j a_{ij} \varepsilon^j$ for all i. If ε is small enough, then the resulting problem is nondegenerate. Moreover, starting from a given primal feasible basis, the primal simplex method applied to the new problem produces exactly the same pivot sequence as the lexicographic simplex method on the original problem.

In particular, when the problem is initialized with a feasible basic solution, it suffices to use the perturbation $b_i + \varepsilon^i$. This way only the basis part of the coefficient matrix is used in Charnes’ perturbation technique.

An appealing property of the perturbation technique is that the perturbation does not actually need to be performed with a concrete ε; it can be done symbolically.
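The following short derivation shows why the symbolic perturbation reproduces the lexicographic rule. It assumes that the right-hand side is perturbed as $b(\varepsilon) = b + (\varepsilon, \varepsilon^2, \dots, \varepsilon^m)^T$, and it writes $\beta_{ij}$ for the entries of the current basis inverse $B^{-1}$ (notation introduced here). For an entering column q and rows with $t_{iq} > 0$:

$$x_i(\varepsilon) = \bigl(B^{-1} b(\varepsilon)\bigr)_i = x_i + \sum_{j=1}^{m} \beta_{ij}\,\varepsilon^j, \qquad \theta_i(\varepsilon) = \frac{x_i(\varepsilon)}{t_{iq}},$$

$$\theta_i(\varepsilon) - \theta_k(\varepsilon) = \sum_{j=0}^{m} c_j\,\varepsilon^j, \qquad (c_0, \dots, c_m) = \frac{(x_i, \beta_{i1}, \dots, \beta_{im})}{t_{iq}} - \frac{(x_k, \beta_{k1}, \dots, \beta_{km})}{t_{kq}}.$$

For all sufficiently small ε > 0 this polynomial has the sign of its first nonzero coefficient. Hence $\theta_i(\varepsilon) < \theta_k(\varepsilon)$ exactly when the scaled row $(x_i, \beta_{i\cdot})/t_{iq}$ is lexicographically smaller than $(x_k, \beta_{k\cdot})/t_{kq}$. Since the rows of $B^{-1}$ are linearly independent, two scaled rows never coincide, so the perturbed ratio test never ties.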


Lexicographic Dual Simplex Method 


The dual simplex method is nothing other than the primal simplex method applied to the dual problem, once the dual problem is brought into the primal standard form. This way it is straightforward to develop the lexicographic technique, or the equivalent perturbation technique, for the dual simplex method.


Extensions 


The lexicographic rule is extensively used in proving 
finiteness of pivot algorithms, see e. g. [1] for an appli- 
cation in a monotonic build-up scheme, [14] for fur- 
ther references in LO and [5] for references when lexi- 
cographic degeneracy resolution is applied for comple- 
mentarity problems. 


Lexicography and Oriented Matroids 


Based on the perturbation interpretation, analogous lexicographic techniques and lexicographic pivoting rules were developed for oriented matroid programming [3] (cf. also » Oriented matroids). These techniques are particularly interesting because nondegenerate cycling [3,8] is possible in oriented matroids. An apparent difference between the linear and the oriented matroid context is that for oriented matroids none of the finite rules of recursive or least-index type yields a simplex method, i.e. a pivot method that preserves feasibility of the basis throughout. This discrepancy is also due to the possibility of nondegenerate cycling.

Interestingly, in the case of oriented matroid pro- 
gramming the finite lexicographic method of M.J. Todd 
[15,16] is the only one which preserves feasibility of the 
basis and therefore yields a finite simplex algorithm for 
oriented matroids. 

The equivalence of Dantzig’s self-dual parametric algorithm [6] and Lemke’s complementary pivot algorithm [11,12], applied to the linear complementarity problem (cf. also » Linear complementarity problem) defined by the primal and dual LO problems, was proved by I. Lustig [13]. Todd’s lexicographic pivot rule is essentially a lexicographic Lemke method (or the parametric perturbation method) applied to the specific linear complementarity problem defined by the primal-dual pair of LO problems. Hence, using the equivalence mentioned above, a simplex algorithm for LO can be derived. However, it is more complicated to present this method in the linear optimization context than in the complementarity context. Here Todd’s rule is sketched for the linear case.
0 | Initialization.

Let a lexico-positive feasible tableau T(B) be 
given. 

1 | Entering variable selection. 

Collect all the dual infeasible variables as the set of candidate entering variables. Let their set of indices be denoted by $K_D$.

IF no such variable exists, THEN STOP;

The tableau T(B) is optimal and this way a pair of optimal solutions is obtained.

IF there is a unique candidate $\{q\} = K_D$ to enter the basis,

THEN go to Step 2.

IF there are more pivot candidates,

THEN let q be the index of that variable whose column is lexico-minimal in the set $K_D$. (Analogous to the dual lexicographic simplex selection rule.)

2 | Leaving variable selection.

Collect in column q all the candidate pivot elements that satisfy the usual pivot selection conditions of the primal simplex method.

Let $K_P$ be the set of the indices of the candidate leaving variables.

IF there is no pivot candidate, THEN STOP;
the primal problem is unbounded, and so the dual problem is infeasible.

IF there is a unique pivot candidate $\{p\} = K_P$ to leave the basis,

THEN go to Step 3.

IF there are more pivot candidates,

THEN let p be the index of that variable whose row is lexico-minimal in the set $K_P$. (Analogous to the primal lexicographic simplex selection rule.)

3 | Basis transformation. 

Pivot on (p, q). Go to Step 1. 


Todd's lexicographic Lemke rule (Phase II) 


In Todd’s rule the perturbation is done first in the right-hand side and then in the objective (with increasing order of the perturbation parameter ε). It finally gives a two-phase simplex method. For illustration, only the second phase [14] is presented here. A complete description of the algorithm can be found in [3,16].

This algorithm is not only a unique simplex method 
for oriented matroids, but it is a novel application of 
lexicography in LO as well. 


See also 
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> 
> 
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Criss-cross Pivoting Rules 
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Linear Programming 
Synonyms
LCP 


Definition 

In its standard form, a linear complementarity problem (LCP) is an inequality system stated in terms of a mapping $f: \mathbb{R}^n \to \mathbb{R}^n$, where $f(x) = q + Mx$. Given f, one seeks a vector $x \in \mathbb{R}^n$ such that for $i = 1, \dots, n$,

$$x_i \ge 0, \qquad f_i(x) \ge 0, \qquad \text{and} \qquad x_i f_i(x) = 0. \tag{1}$$

Because the affine mapping f is specified by the vector $q \in \mathbb{R}^n$ and the matrix $M \in \mathbb{R}^{n \times n}$, the problem is ordinarily denoted LCP(q, M) or sometimes just (q, M). A system of the form (1) in which f is not affine is called a nonlinear complementarity problem and is denoted NCP(f). The notation CP(f) is meant to cover both cases.

If x is a solution to (1) satisfying the additional nondegeneracy condition $x_i + f_i(x) > 0$, $i = 1, \dots, n$, then the index sets $\{i : x_i > 0\}$ and $\{i : f_i(x) > 0\}$ are complementary subsets of $\{1, \dots, n\}$. This is believed to be the origin of the term complementary slackness as used in linear and nonlinear programming. It was this terminology that inspired the name complementarity problem.


Sources of Linear Complementarity Problems 


The linear complementarity problem is associated with 
the Karush-Kuhn-Tucker necessary conditions of lo- 
cal optimality found in quadratic programming. This 
connection (as well as the more general connection of 
nonlinear complementarity problems with other types 
of nonlinear programs) was brought out in [1,2] and 
later in [3]. Finding solutions to such systems was one 
of the original motivations for studying the subject. An- 
other was the finding of equilibrium points in bima- 
trix and polymatrix games. This kind of application was 
emphasized in [16] and [22]. These early contributions 
also included essentially the first algorithms for this 
class of problems. There are numerous applications of 
the linear and nonlinear complementarity problems in 
computer science, economics, various engineering dis- 
ciplines, finance, game theory, and mathematics. One 
application of the LCP is in algorithms for the non- 
linear complementarity problem. Descriptions of (and 
references to) these applications can be found in [5,27] 
and [17]. The survey article [10] is a rich compendium 
on engineering and economic applications of linear and 
nonlinear complementarity problems. 


Equivalent Formulations 


Whether linear or nonlinear, the complementarity problem expressed by the system (1) can be formulated in several equivalent ways. An obvious one calls for a solution (x, y) to the system

$$y - f(x) = 0, \qquad x \ge 0, \qquad y \ge 0, \qquad x^T y = 0. \tag{2}$$

Another is to find a zero x of the mapping

$$g(x) = \min\{x, f(x)\}, \tag{3}$$

where the symbol min{a, b} denotes the componentwise minimum of the two n-vectors a and b. A third equivalent formulation asks for a fixed point of the mapping

$$h(x) = x - g(x),$$

that is, a vector $x \in \mathbb{R}^n$ such that $x = h(x)$.
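Formulations (3) and the fixed-point map give a convenient numerical test for a candidate solution. $g(x) = \min\{x, f(x)\}$ vanishes exactly when $x \ge 0$, $f(x) \ge 0$, and in each component at least one of the two is zero. The sketch below is for the affine case $f(x) = q + Mx$. The function names are hypothetical.

```python
import numpy as np

def natural_residual(x, q, M):
    """g(x) = min{x, f(x)} componentwise with f(x) = q + Mx; zero exactly at LCP solutions."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(q, dtype=float) + np.asarray(M, dtype=float) @ x
    return np.minimum(x, fx)

def fixed_point_map(x, q, M):
    """h(x) = x - g(x); the LCP solutions are its fixed points."""
    return np.asarray(x, dtype=float) - natural_residual(x, q, M)
```

For instance, with $M = \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}$ and $q = (-5, -6)^T$, the vector $x = (4/3, 7/3)$ gives $f(x) = 0$, so $g(x) = 0$ and $h(x) = x$.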
The formulation given in (3) is related to the (often nonconvex) optimization problem

$$\min\ x^T f(x) \quad \text{s.t.} \quad f(x) \ge 0,\ \ x \ge 0. \tag{4}$$


In such a problem, the objective is bounded below by zero on the feasible set, so any feasible solution of (4) with objective value $x^T f(x) = 0$ must be a global minimum as well as a solution of (1). As it happens, there are circumstances (for instance, the monotonicity of the mapping f) under which all the local minima of the mathematical programming problem (4) must in fact be solutions of (3). See [28] for an extended discussion of this matter.

Also noteworthy is a result in [8] showing that the LCP is equivalent to solving a system of equations $\varphi(u) = 0$, where the mapping $\varphi: \mathbb{R}^n \to \mathbb{R}^n$ is piecewise linear. In particular, LCP(q, M) is equivalent to finding a vector u such that

$$q + M u^+ - u^- = 0,$$

where, for $i = 1, \dots, n$, $u_i^+ = \max\{0, u_i\}$ and $u_i^- = -\min\{0, u_i\}$.
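The correspondence can be checked directly. If (x, y) solves LCP(q, M), then $u = x - y$ satisfies $u^+ = x$ and $u^- = y$, because x and y are nonnegative and complementary. Hence u is a zero of the piecewise linear map. Conversely, from a zero u one recovers $x = u^+$ and $y = u^-$. The helper name `pl_map` in the sketch below is hypothetical.

```python
import numpy as np

def pl_map(u, q, M):
    """phi(u) = q + M u^+ - u^-, with u^+ = max{0, u} and u^- = -min{0, u}."""
    u = np.asarray(u, dtype=float)
    u_plus = np.maximum(u, 0.0)
    u_minus = -np.minimum(u, 0.0)
    return np.asarray(q, dtype=float) + np.asarray(M, dtype=float) @ u_plus - u_minus
```

For $M = \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}$ and $q = (1, -4)^T$, the LCP solution $x = (0, 2)$, $y = (3, 0)$ gives $u = (-3, 2)$, and $\varphi(u) = 0$.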


The Importance of Matrix Classes 


The extensive literature of the LCP exhibits several 
main directions of study: the existence and unique- 
ness (or number of) solutions, mathematical properties 
of the problem, generalizations of the problem, algo- 
rithms, applications, and implementations. 

Much of the theory of the linear complementarity problem is intimately linked in various ways to matrix classes. For instance, one of the earliest theorems on the existence of solutions to LCPs is due to H. Samelson, R.M. Thrall and O. Wesler [30]. Motivated by a problem in structural mechanics, they showed that LCP(q, M) has a unique solution for every $q \in \mathbb{R}^n$ if and only if the matrix M has positive principal minors. (That is, the determinant of every principal submatrix of M is positive.) The class of such matrices has come to be known as P, and its members are called P-matrices. (The Samelson-Thrall-Wesler theorem characterizes this class of matrices in terms of the LCP.) The class P includes all positive definite (PD) matrices, i.e., those square matrices M for which $x^T M x > 0$ for all $x \ne 0$. In the context of the LCP, the term PD does not require symmetry. An analogous definition (and usage) holds for positive semidefinite (PSD) matrices, namely, M is PSD if $x^T M x \ge 0$ for all x. Some authors refer to such matrices as monotone because of their connection with monotone mappings. PSD matrices have the property that the associated LCPs (q, M) are solvable whenever they are feasible, whereas LCPs (q, M) in which $M \in$ PD are always feasible and (since PD ⊂ PSD) are always solvable. This distinction is given a more general matrix form in [25,26]. There Q is defined as the class of all square matrices for which LCP(q, M) has a solution for all q, and $Q_0$ as the class of all square matrices for which LCP(q, M) has a solution whenever it is feasible. Although the goal of usefully characterizing the classes Q and $Q_0$ has not yet been realized, much is known about some of their special subclasses. Indeed, there are now literally dozens of matrix classes for which LCP existence theorems have been established. See [5,27] and [17] for an abundance of information on this subject.
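The defining property of P can be tested by brute force over all $2^n - 1$ principal minors. This is practical only for small n, and the tolerance is an arbitrary choice. The sketch below, with the hypothetical name `is_p_matrix`, also illustrates that P is strictly larger than PD. The matrix $\begin{pmatrix} 1 & -3\\ 0 & 1 \end{pmatrix}$ has all principal minors equal to 1, yet $x^T M x = -1$ for $x = (1, 1)$.

```python
from itertools import combinations

import numpy as np

def is_p_matrix(M, tol=1e-12):
    """True if every principal minor of M is positive (checks all 2^n - 1 of them)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            if np.linalg.det(M[np.ix_(idx, idx)]) <= tol:
                return False
    return True
```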

From the theoretical standpoint, the class of ‘sufficient matrices’ [6] illustrates the intrinsic role of matrix classes in the study of the LCP. A matrix $M \in \mathbb{R}^{n \times n}$ is column sufficient if

$$\bigl[\, x_i (Mx)_i \le 0 \ \ \forall i \,\bigr] \implies \bigl[\, x_i (Mx)_i = 0 \ \ \forall i \,\bigr]$$

and row sufficient if $M^T$ is column sufficient. When M is both row and column sufficient, it is called sufficient. Row sufficient matrices always have nonnegative principal minors, hence so do (column) sufficient matrices. These classes include both P and PSD as distinct subclasses. The row sufficient matrices form a subclass of $Q_0$; this is not true of column sufficient matrices, however. The column sufficient matrices $M \in \mathbb{R}^{n \times n}$ are characterized by the property that the solution set of LCP(q, M) is convex for every $q \in \mathbb{R}^n$. In the same spirit, a real $n \times n$ matrix M is row sufficient if and only if, for every $q \in \mathbb{R}^n$, the solutions of LCP(q, M) are precisely the optimal solutions of the associated quadratic program (4). Rather surprisingly, the class of sufficient matrices turns out to be identical to the matrix class $P_*$ introduced in [19]. See [13] and [34].


Algorithms for Solving LCPs 


The algorithms for solving linear complementarity problems are of two major types: pivoting (or direct) and iterative (or indirect). Algorithms of the former type are finite procedures that attempt to transform the problem (q, M) to an equivalent system of the form (q′, M′) in which $q' \ge 0$. Doing this is not always possible;
it depends on the problem data, usually on the matrix 
class (such as P, PSD, etc.) to which M belongs. When 
this approach works, it amounts to carrying out a prin- 
cipal pivotal transformation on the system of equations 


w=qt+Mz. 


To such a transformation there corresponds an index 
set a (with complementary index set @ = {1,...,} \ 
a) such that the principal submatrix Mg q is nonsingu- 
lar. When this (block pivot) operation is carried out, the 
system 


Wa = qa + MaaZa + Mazza, 
Wa = qa + MaaZa + Maaza 
becomes 


/ / iG 
Za = dy + Magwa + MogZa; 
/ 
Wa = da + My’ Wa + My’'zZa; 


where 
da = —Mga 4a: 
. = qa — MaaMyo 4a; 
Maa = Maa: 
Mga = MaaMaa: 
Mya = — aa Mow, 


Mia = Maa — MauaMyy Moa: 
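The block pivot formulas can be transcribed directly. The sketch below uses exact `Fraction` arithmetic, and its function names are hypothetical. It returns $(q', M')$ laid out in the original index positions: a row $i \in \alpha$ now expresses $z_i$, and a row $i \in \bar\alpha$ expresses $w_i$. A column $j \in \alpha$ multiplies $w_j$, and a column $j \in \bar\alpha$ multiplies $z_j$. If the pivot produces $q' \ge 0$, then setting the right-hand-side variables to zero solves the LCP.

```python
from fractions import Fraction


def invert(A):
    """Exact Gauss-Jordan inverse of a small square matrix."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        rows = [r for r in range(c, n) if aug[r][c] != 0]
        if not rows:
            raise ValueError("principal submatrix M_aa is singular")
        aug[c], aug[rows[0]] = aug[rows[0]], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def principal_pivot(M, q, alpha):
    """Block principal pivot of w = q + M z on the index set alpha; returns (q', M')."""
    n = len(q)
    a = sorted(alpha)
    b = [i for i in range(n) if i not in alpha]
    M = [[Fraction(v) for v in row] for row in M]
    q = [Fraction(v) for v in q]
    Minv = invert([[M[i][j] for j in a] for i in a])          # M_aa^{-1}
    q2 = [Fraction(0)] * n
    M2 = [[Fraction(0)] * n for _ in range(n)]
    for r, i in enumerate(a):                                  # rows now giving z_alpha
        q2[i] = -sum(Minv[r][s] * q[k] for s, k in enumerate(a))
        for s, j in enumerate(a):
            M2[i][j] = Minv[r][s]                               # M'_aa = M_aa^{-1}
        for j in b:
            M2[i][j] = -sum(Minv[r][s] * M[k][j] for s, k in enumerate(a))
    for i in b:                                                # rows still giving w
        # t = row i of M_{bar a, a} M_aa^{-1}
        t = [sum(M[i][k] * Minv[s][r] for s, k in enumerate(a)) for r in range(len(a))]
        q2[i] = q[i] - sum(t[r] * q[k] for r, k in enumerate(a))
        for r, k in enumerate(a):
            M2[i][k] = t[r]
        for j in b:                                            # Schur complement
            M2[i][j] = M[i][j] - sum(t[r] * M[k][j] for r, k in enumerate(a))
    return q2, M2
```

Pivoting twice on the same index set returns the original data, which makes a convenient sanity check.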


There are two main pivoting algorithms used in pro- 
cessing LCPs. The more robust of the two is due to C.E. 
Lemke [21]. Lemke’s method embeds the LCP (q, M)
in a problem having an extra ‘artificial’ nonbasic (independent) variable $z_0$ with coefficients specially chosen
so that when $z_0$ is sufficiently large, all the basic variables become nonnegative. At the least positive value
of $z_0$ for which this is so, there will (in the nondegenerate case) be (exactly) one basic variable whose value
is zero. That variable is exchanged with $z_0$. Thereafter
the method executes a sequence of (almost complementary) simple pivots. In each case, the variable becoming basic is the complement of the variable that became nonbasic in the previous exchange. The method
terminates if either $z_0$ decreases to zero (in which case
the problem is solved) or else there is no basic variable
whose value decreases as the incoming nonbasic variable is increased. The latter outcome is called termination on a secondary ray. For certain matrix classes, termination on a secondary ray is an indication that the
given LCP has no solution. Lemke’s method is studied
from this point of view in [7].
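A compact sketch of Lemke’s method follows the description above. It rests on three assumptions that the text does not state. First, the artificial variable $z_0$ gets an all-ones coefficient vector (a common choice). Second, arithmetic is exact, using `Fraction`. Third, the problem is nondegenerate, so no lexicographic anti-cycling rule is included. The function names are hypothetical.

```python
from fractions import Fraction


def pivot(T, r, c):
    """Gauss-Jordan pivot of tableau T on row r, column c."""
    p = T[r][c]
    T[r] = [v / p for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [a - f * b for a, b in zip(T[i], T[r])]


def lemke(M, q, max_iter=1000):
    """Lemke's method for w = q + M z, w, z >= 0, z^T w = 0.

    Returns z, or None on termination on a secondary ray.
    """
    n = len(q)
    Z0, RHS = 2 * n, 2 * n + 1          # column of z0 and of the right-hand side
    # Row i encodes w_i - (M z)_i - z0 = q_i; columns: w (0..n-1), z (n..2n-1), z0, rhs.
    T = []
    for i in range(n):
        row = [Fraction(0)] * (2 * n + 2)
        row[i] = Fraction(1)
        for j in range(n):
            row[n + j] = -Fraction(M[i][j])
        row[Z0] = Fraction(-1)
        row[RHS] = Fraction(q[i])
        T.append(row)
    if all(row[RHS] >= 0 for row in T):
        return [Fraction(0)] * n            # z = 0, w = q already solves the LCP
    basis = list(range(n))                  # initially every w_i is basic
    r = min(range(n), key=lambda i: T[i][RHS])   # least z0 making all w_i >= 0
    entering = Z0
    for _ in range(max_iter):
        pivot(T, r, entering)
        leaving, basis[r] = basis[r], entering
        if leaving == Z0:                   # z0 has dropped to zero: solved
            z = [Fraction(0)] * n
            for i, v in enumerate(basis):
                if n <= v < 2 * n:
                    z[v - n] = T[i][RHS]
            return z
        # The complement of the variable that just left enters next.
        entering = leaving + n if leaving < n else leaving - n
        rows = [i for i in range(n) if T[i][entering] > 0]
        if not rows:
            return None                     # secondary ray
        # Minimum ratio test; on ties prefer the row where z0 is basic.
        r = min(rows, key=lambda i: (T[i][RHS] / T[i][entering], basis[i] != Z0))
    raise RuntimeError("iteration limit reached")
```

For M = [[2, 1], [1, 2]] and q = (-5, -6), the sketch returns z = (4/3, 7/3). For the 1×1 problem M = [[-1]], q = (-1), which has no solution, it stops on a secondary ray.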

The other pivoting algorithm for the LCP is called 
the principal pivoting method (PPM), expositions of 
which are given in [3] and [5]. The algorithm has two versions: symmetric and asymmetric. The former executes
a sequence of principal (block) pivots of order 1 or 2,
whereas the latter does sequences of almost complementary pivots, each of which results in a block principal pivot of order potentially larger than 2.

Iterative methods are often favored for the solu- 
tion of very large linear complementarity problems. In 
such problems, the matrix M tends to be sparse (i. e., 
to have a small percentage of nonzero elements) and 
structured. Since iterative methods do not modify the 
problem data, these features of large scale problems can 
be used to advantage. Ordinarily, however, an iterative 
method does not terminate finitely; instead, it generates 
a convergent sequence of trial solutions. The older it- 
erative LCP algorithms are based on equation-solving 
methods (e. g., Gauss-Seidel, Jacobi, and successive over- 
relaxation); the more contemporary ones are varieties 
of the interior point type. In addition to the usual con- 
cerns about practical performance, considerable inter- 
est attaches to the development of polynomial time al- 
gorithms. Not unexpectedly, the allowable analysis and 
applicability of iterative algorithms depend heavily on 
the matrix class to which M belongs. Details on sev- 
eral such algorithms are presented in [36,37], and the 
monographs [5,27] and [17]. 
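As an example of the older, equation-solving type, the following sketch applies projected successive overrelaxation (projected SOR) to LCP(q, M). It assumes M is symmetric positive definite, a setting in which this iteration is known to converge for 0 < ω < 2. The function name and stopping rule are illustrative choices. As noted above, the data M and q are never modified.

```python
def psor(M, q, omega=1.0, tol=1e-10, max_iter=10_000):
    """Projected SOR for LCP(q, M), assuming M symmetric positive definite.

    Sweeps i = 1..n, moving z_i along the residual w_i = q_i + (M z)_i and
    projecting back onto z_i >= 0 (omega = 1 gives projected Gauss-Seidel).
    """
    n = len(q)
    z = [0.0] * n
    for _ in range(max_iter):
        largest_step = 0.0
        for i in range(n):
            w_i = q[i] + sum(M[i][j] * z[j] for j in range(n))   # uses updated z_j, j < i
            new = max(0.0, z[i] - omega * w_i / M[i][i])
            largest_step = max(largest_step, abs(new - z[i]))
            z[i] = new
        if largest_step < tol:
            return z
    raise RuntimeError("no convergence within max_iter sweeps")
```

On M = [[2, 1], [1, 2]], q = (-5, -6), the iterates approach z = (4/3, 7/3), the same solution a pivoting method finds exactly.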


Software 


For decades researchers have experimented with com- 
puter codes for various linear (and nonlinear) comple- 
mentarity algorithms. By the late 1990s, this activity 
reached the stage where the work could be distributed 
as something approaching commercial software. An 
overview of available software for complementarity
problems (mostly nonlinear) is available as [35].




Some Generalizations 


Both linear and nonlinear complementarity problems 
have been generalized in numerous ways. One of the 
earliest generalizations, given in [14] and [18], is the 
problem CP(K, f) of finding a vector x in the closed
convex cone K such that $f(x) \in K^*$ (the dual cone) and
$x^T f(x) = 0$. Through this formulation, a connection can
be made between complementarity problems and variational inequality problems, that is, problems VI(X, f)
wherein one seeks a vector $x^* \in X$ (a nonempty subset
of $\mathbb{R}^n$) such that

$$f(x^*)^T (y - x^*) \ge 0 \quad \text{for all } y \in X.$$

It was established in [18] that when X is a closed convex
cone, say K, with dual cone $K^*$, then CP(K, f) and VI(K, f) have exactly the same solutions (if any). See [15] for
connections with variational inequalities, etc.

In [29] the generalized complementarity problem 
CP(K, f) defined above is considered as an instance of 
a generalized equation, namely to find a vector $x \in \mathbb{R}^n$
such that

$$0 \in f(x) + \partial \psi_K(x),$$

where $\psi_K$ is the indicator function of the closed convex cone K and $\partial$ denotes the subdifferential operator
as used in convex analysis.

Among the diverse generalizations of the linear 
complementarity problem, the earliest appears in [30]. 
There, for given n x n matrices A and B and n-vector 
c, the authors considered the problem of finding n-vectors x and y such that

$$Ax + By = c, \quad x, y \ge 0 \quad \text{and} \quad x^T y = 0.$$


A different generalization was introduced in [4]. In this 
sort of problem, one has an affine mapping $f(x) = q + Nx$ where N is of order $\left(\sum_{j=1}^{k} p_j\right) \times n$, partitioned into k
blocks; the vectors q and $y = f(x)$ are partitioned conformably. Thus,

$$y^j = q^j + N^j x \quad \text{for } j = 1, \dots, k.$$


The problem is to find a solution of the system 


In recent years, many publications, e.g. [9] and [24],
have further investigated this vertical linear comple- 
mentarity problem (VLCP). Interest in the model which 
is at the heart of [30] and is now called the horizon- 
tal linear complementarity problem (HLCP) was revived 
in [38] where it is used as the conceptual framework 
for the convergence analysis of infeasible interior point 
methods. (The problem also comes up in [20].) In some 
cases, HLCPs can be reduced to ordinary LCPs. This 
subject is explored in [33] which gives an algorithm for 
doing this when it is possible. A further generalization 
called extended linear complementarity problem (ELCP) 
was introduced in [23] and subsequently developed in 
[11,12] and [32]. To this collection of LCP variants can 
be added the ELCP presented in [31]. The form of this 
model captures the previously mentioned HLCP, VLCP 
and ELCP. 


See also 
If one has two systems of linear relations, where each
relation is either a linear equation (or linear equality
relation) or a linear inequality relation (of type >, ≥, <,
≤ or ≠), and exactly one of the two systems has a solution, then one says that the two given systems are each
other's alternatives. A mathematical theorem stating that
two systems are alternative systems is called a theorem
of the alternative, or also a transposition theorem. Many
such theorems are known. The table lists ten results of
this type, with their inventors and dates. The table is
a modified version of tables of H. Greenberg [16] and
in [8]. In each case the alternative systems are labelled
by a and b, respectively.

Consider by way of example the two systems 4a and
4b in the table. The corresponding theorem of the alternative is known as Farkas’ lemma. Assume that 4a has
a solution x, so $Ax \le b$. Then we have for each nonnegative vector y that $y^TAx \le y^Tb$. Hence, if $y^TA = 0$
then we will have $y^Tb \ge 0$. Thus it follows that if 4a has
a solution then 4b does not have a solution. This is the
easy part of the proof of Farkas’ lemma. The proof of
the other implication is much harder. For a discussion
of several proof techniques, see → Farkas lemma.

In the above example we used that for $y \ge 0$ the inequality $y^TAx \le y^Tb$ is implied by the system $Ax \le b$.
Note that the implied inequality $y^TAx \le y^Tb$ is obtained
from the separate inequalities in $Ax \le b$ by combining them in a linear fashion. Fixing y, one easily understands that the implied inequality has no solution x
if and only if $y^TA = 0$ and $y^Tb < 0$. Together with $y \ge 0$ these are precisely the relations in the alternative system 4b. Thus, it may be concluded that Farkas’ lemma
can be restated by saying that the system $Ax \le b$ is feasible if and only if it does not imply (in a linear fashion)
the ‘contradiction’ $0^Tx < 0$. The ‘only if’ part is obvious: if the
system has an implied inequality $0^Tx < 0$ then it must be
inconsistent. But the ‘if’ part is a very deep result:
it states that if the system has no contradictory implied
inequality then it has a solution. The other theorems of
the alternative in the table admit a similar interpretation.

Ten pairs of alternative systems

1   J.B.J. Fourier (1826) [4]
    a   $Ax < 0,\ Bx \le 0,\ Cx = 0$
    b   $y^TA + v^TB + w^TC = 0,\ y \ge 0,\ y \ne 0,\ v \ge 0$
2   P. Gordan (1873) [7]
    a   $Ax > 0$
    b   $y^TA = 0,\ 0 \ne y \ge 0$
3   J. Farkas (1902) [3]
    a   $Ax = b,\ x \ge 0$
    b   $y^TA \ge 0,\ y^Tb < 0$
4   J. Farkas (1902) [3]
    a   $Ax \le b$
    b   $y^TA = 0,\ y^Tb < 0,\ y \ge 0$
5   E. Stiemke (1915) [13]
    a   $Ax = 0,\ x > 0$
    b   $y^TA \ge 0,\ y^TA \ne 0$
6   W.B. Carver (1912) [2]
    a   $Ax < b$
    b   $y^TA = 0,\ y^Tb \le 0,\ y \ge 0,\ y \ne 0$
7   T.S. Motzkin (1936) [10]
    a   $Ax < 0,\ Bx \le 0$
    b   $y^TA + v^TB = 0,\ y \ge 0,\ y \ne 0,\ v \ge 0$
8   J. Ville (1938) [15]
    a   $Ax > 0,\ x \ge 0$
    b   $y^TA \le 0,\ y \ge 0,\ y \ne 0$
9   A.W. Tucker (1956) [14]
    a   $Ax \ge 0,\ Ax \ne 0,\ Bx \ge 0,\ Cx = 0$
    b   $y^TA + v^TB + w^TC = 0,\ y > 0,\ v \ge 0$
10  D. Gale (1960) [5]
    a   $Ax \le b$
    b   $y^TA = 0,\ y^Tb = -1,\ y \ge 0$

The relevance of a theorem of the alternative is the
following. Given some system S of relations, the crucial question is whether the system has a solution or
not. Knowing the answer to this question one is able to
answer many other questions. For example, if one has
a linear optimization problem LO in the standard form

$$\min_x \{\, c^T x : Ax = b,\ x \ge 0 \,\},$$

a given real number z is a strict lower bound for the
optimal value of the problem if and only if the system

$$Ax = b, \quad c^T x \le z, \quad x \ge 0$$

has no solution, i.e. is infeasible. On the other hand,
a given real number z is an upper bound for the optimal
value of the problem if and only if the system

$$Ax = b, \quad c^T x \le z, \quad x \ge 0$$

has a solution, i.e. is feasible.

If a system S has a solution then this is easy to cer- 
tify, namely by giving a solution of the system. The solu- 
tion then serves as a certificate for the feasibility of S. If S 
is infeasible, however, it is more difficult to give an easy 
certificate. One is then faced with the problem of how 
to certify a negative statement. This is in general a very 
nontrivial problem that also occurs in many real-life situations. For example, when accused of murder, how
should one prove one's innocence? In circumstances like
these it may be impossible to find an easy-to-verify certificate for the negative statement ‘not guilty’. A practical solution is the rule ‘a person is innocent until his/her
guilt is certified’. Clearly, from the mathematical point 
of view this approach is unsatisfactory. 

Now suppose that there is an alternative system T 
and there exists a theorem of the alternative for S and 
T. Then we know that exactly one of the two systems 
has a solution. Therefore, S has a solution if and only if 
T has no solution. In that case, any solution of T pro- 
vides a certificate for the unsolvability of S. Thus it is 
clear that a theorem of the alternative provides an easy-to-verify certificate for the unsolvability of a system of
linear relations. 
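Checking such a certificate is purely mechanical, which is exactly why it is useful. The sketch below uses the hypothetical name `is_farkas_certificate`. It verifies that a proposed y solves system 4b of the table ($y \ge 0$, $y^TA = 0$, $y^Tb < 0$), which by Farkas’ lemma certifies that $Ax \le b$ has no solution. Finding such a y is a separate problem (for example, one can solve a linear program) and is not attempted here.

```python
from fractions import Fraction


def is_farkas_certificate(A, b, y):
    """Check that y solves y >= 0, y^T A = 0, y^T b < 0.

    Such a y certifies that Ax <= b has no solution.
    """
    m, n = len(A), len(A[0])
    if len(y) != m or any(yi < 0 for yi in y):
        return False
    y = [Fraction(yi) for yi in y]            # exact arithmetic for the checks
    if any(sum(y[i] * A[i][j] for i in range(m)) != 0 for j in range(n)):
        return False
    return sum(y[i] * b[i] for i in range(m)) < 0
```

For instance, the contradictory pair x ≤ 1, x ≥ 2 can be written as Ax ≤ b with A = [[1], [-1]] and b = (1, -2). The vector y = (1, 1) is a certificate: adding the two inequalities gives 0 ≤ -1.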

The proof of any theorem of the alternative consists 
of two parts. Assuming the existence of a solution of 
one system one needs to show that the other system 
is infeasible, and vice versa. It has been demonstrated 
above for Farkas’ lemma that one of the two implica- 
tions is easy to prove. This seems to be true for each the- 
orem of the alternative: in all cases one of the implica- 
tions is almost trivial, but the other implication is highly 
nontrivial and very hard to prove. On the other hand, 
having proved one theorem of the alternative the other 
theorems of the alternative easily follow. In this sense 
one might say that all the listed theorems of the alter- 
native are equivalent: accepting one of them to be true, 
the validity of each of the other theorems can be veri- 
fied easily. The situation resembles a number of cities 
on a high plateau. Travel between them is not too dif- 
ficult; the hard part is the initial ascent from the plains 
below [1]. 

It should be pointed out that Farkas’ lemma, or each
of the other theorems of the alternative, is equivalent
to the deepest result in linear optimization, namely
the duality theorem for linear optimization: this theorem can be easily derived from Farkas’ lemma, and vice
versa (cf. also → Linear programming). In fact, in many
textbooks on linear optimization the duality theorem is 
derived in this way [5,17], whereas in other textbooks 
the opposite occurs: the duality theorem is proved first 
and then Farkas’ lemma follows as a corollary [11]. This 
phenomenon is a consequence of a simple, and basic, 
logical principle that any duality theorem is actually 
equivalent to a theorem of the alternative, as has been 
shown in [9]. 

Both the Farkas’ lemma and the duality theorem for 
linear optimization can be derived from a more general 
result which states that for any skew-symmetric matrix
K (i.e., $K = -K^T$) there exists a vector x such that

$$Kx \ge 0, \quad x \ge 0, \quad x + Kx > 0.$$

This result is due to Tucker [14] who also derives 
Farkas’ lemma from it, whereas A.J. Goldman and 
Tucker [6] show how this result implies the duality 
theorem for linear optimization. For recent proofs, see 
[12]. 


See also 


> Farkas Lemma 

> Linear Programming 

> Motzkin Transposition Theorem 

> Theorems of the Alternative and Optimization 

> Tucker Homogeneous Systems of Linear Relations 
The linear ordering problem (LOP) has a wide range of
applications in several fields, such as scheduling, sports,
social sciences, and economics. The problem is combinatorial in nature and has been shown to be NP-hard [5]. Like
many other computationally hard problems, the linear
ordering problem has attracted researchers' attention
as a target for efficient solution procedures. A comprehensive treatment of the state-of-the-art approximation algorithms for solving the linear ordering problem is contained in [15]. The scope of this article is to introduce
the reader to this problem, providing its definition and
some of the algorithms proposed in the literature for solving it efficiently.


Problem Description 


The linear ordering problem (LOP) can be formulated 
as follows: Given a complete digraph $D_n = (V_n, E_n)$ on
n nodes and given arc weights $c(i, j)$ for each arc $(i, j) \in E_n$, find a spanning acyclic tournament in $D_n$ such that
the sum of the weights of its arcs is as large as possible. 

An equivalent mathematical formulation of LOP 
([11]) is the following: Given a matrix of weights $E = \{e_{ij}\}_{m \times m}$, find a permutation p of the columns (and
rows) in order to maximize the sum of the weights in
the upper triangle. Formally, the problem is to maximize

$$C_E(p) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} e_{p_i p_j}$$

where $p_i$ is the index of the column (and row) occupying the position i in the permutation.
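The objective $C_E(p)$ translates directly into code. The sketch below (hypothetical names, 0-based indices) evaluates it and finds an optimum by exhaustive search over all m! permutations. The exhaustive search is only a reference point for tiny instances, and it illustrates why the methods reviewed below are needed.

```python
from itertools import permutations


def lop_value(E, p):
    """C_E(p): sum of e_{p_i p_j} over all positions i < j (the upper triangle)."""
    m = len(p)
    return sum(E[p[i]][p[j]] for i in range(m - 1) for j in range(i + 1, m))


def lop_brute_force(E):
    """Exhaustive search over all m! permutations; only sensible for tiny m."""
    best = max(permutations(range(len(E))), key=lambda p: lop_value(E, p))
    return best, lop_value(E, best)
```

Take the made-up matrix E = [[0, 3, 1], [2, 0, 4], [5, 1, 0]]. The identity order has value 3 + 1 + 4 = 8. The best order is (1, 2, 0), with value 4 + 2 + 5 = 11.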

The best known among the applications of LOP oc- 
curs in economics. In fact, it is equivalent to the so- 
called triangulation problem for input-output tables. In 
this economical application, the economy (regional or 
national) is subdivided into sectors. An m x m input- 
output matrix is then created, whose entry (i, j) repre- 
sents the flow of money from the sector i to the sector 
j. The sectors have to be ordered so that suppliers tend
to come first, followed by customers. This goal can be
achieved by permuting the rows and the columns of the
built matrix so that the sum of entries above the diag- 
onal is maximized, which is exactly the objective of the 
linear ordering problem. 
Review of Exact and Approximation Algorithms


The pioneering heuristic method for solving LOP was
proposed by H.B. Chenery and T. Watanabe [3]. Their
method tries to obtain plausible rankings of the sectors of an input-output table in the triangulation problem by ranking first those sectors that have a small
share of inputs from other sectors and of outputs
to final demand. An extensive discussion of the
heuristics proposed until 1981 can be found in [16],
while more recent work has been done in [2,11]. In
[11] a heuristic algorithm based on the tabu search
methodology is proposed, incorporating strategies
for search intensification and diversification.
For search intensification M. Laguna and others ex- 
perimented with path relinking, a strategy proposed in 
connection with tabu search by F. Glover and Laguna 
[6] and still rarely used in actual implementations. In 
[2] an algorithm is presented implementing a scatter 
search strategy, which is a population-based method 
that has been shown to lead to promising outcomes 
for solving combinatorial and nonlinear difficult prob- 
lems. 

The development of exact algorithms for LOP can
be seen as connected to the development of methods for
solving general integer programming problems, since 
any such method can be slightly modified to solve the 
triangulation problem. Most of those exact algorithms 
belong either to the branch and bound family or to the 
linear programming methods. 


Branch and Bound Algorithms 


One of the earliest published computational results us- 
ing a branch and bound strategy is due to J.S. DeCani 
in 1972 [4]. He originally studied how to rank n objects 
on the basis of a number of paired comparisons. Suppose
k persons pairwise compare n objects according
to some criterion; a matrix $E = \{e_{ij}\}$ is built, where $e_{ij}$
is the number of persons that prefer object i to object
j. The problem is to find a linear ranking of the objects 
reflecting the outcome of the experiment as closely as 
possible. In the branch and bound strategy proposed by 
DeCani partial rankings are built up and each branch- 
ing operation in the tree corresponds to inserting a fur- 
ther object at some position in the partial ranking. At 
level n of the tree a complete ranking of the objects is 
found. The upper bounds are exploited in the usual way 


for backtracking and excluding parts of the tree from 
further consideration. 
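The insertion scheme just described can be sketched as follows. Objects are added in a fixed order, and each new object is inserted at every position of the current partial ranking. The upper bound is chosen for illustration and is not necessarily the one DeCani used [4]. It adds the weight of the pairs whose order is already fixed to $\max(e_{ij}, e_{ji})$ for every pair that still involves an unplaced object. This bound is valid but simple. The function name is hypothetical.

```python
def lop_branch_and_bound(E):
    """Insertion-based branch and bound for the LOP (maximization).

    Objects are added in the order 0, 1, ..., n-1; a node at level k is a ranking
    of objects 0..k-1, and branching inserts object k at every possible position.
    """
    n = len(E)
    best_value, best_rank = float("-inf"), None

    def value(rank):
        # Weight of the pairs whose relative order is already fixed.
        return sum(E[rank[a]][rank[b]]
                   for a in range(len(rank)) for b in range(a + 1, len(rank)))

    def search(rank):
        nonlocal best_value, best_rank
        k = len(rank)
        v = value(rank)
        if k == n:                          # level n: a complete ranking
            if v > best_value:
                best_value, best_rank = v, rank
            return
        # Optimistic bound: every pair (i, j), i < j, with j not yet placed (j >= k)
        # contributes at most max(e_ij, e_ji).
        bound = v + sum(max(E[i][j], E[j][i]) for j in range(k, n) for i in range(j))
        if bound <= best_value:
            return                          # prune: cannot beat the incumbent
        for pos in range(k + 1):
            search(rank[:pos] + [k] + rank[pos:])

    search([])
    return best_rank, best_value
```

On the 3×3 matrix E = [[0, 3, 1], [2, 0, 4], [5, 1, 0]], the sketch returns the ranking [1, 2, 0] with value 11.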

A further method for solving LOP is the lexico- 
graphic search algorithm proposed in [9,10]. It lexico- 
graphically enumerates all permutations of the n sec- 
tors by fixing at level k of the enumeration tree the kth 
position of the permutations. In more detail, if at level
k a node is generated, then the first k positions σ(1),
..., σ(k) are fixed. Based on this fixing, several of Helmstadter’s conditions can be tested. If one of them is violated, then there is no relatively optimal permutation having σ(1),
..., σ(k) in the first k positions. Therefore, the node currently under consideration can be ignored and backtracking is performed. By using this method all relatively optimal solutions are enumerated, since there
is no bounding according to objective function values.
At the end the best one among them is kept. Starting 
from lexicographic search, [8] proposed a lexicographic 
branch and bound scheme. 

Other authors have proposed branch and bound 
methods, such as [7,12], and [14]. 


Linear Programming Algorithms 


All linear programming approaches are based on the 
consideration that the triangulation problem can be 
formulated as a 0-1 integer programming problem us- 
ing the 3-dicycle inequalities. In [13] the LP relaxation 
using the tournament polytope P¢ is proposed and the 
corresponding full linear program is solved in its dual 
version. In [1] LP relaxation is used for solving schedul- 
ing problems with precedence constraints. It is easy to 
see that the scheduling problem of minimizing the to- 
tal weighted completion time of a set of processes on 
a single processor can be formulated as a linear order- 
ing problem. 
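For reference, the 0-1 formulation alluded to above can be written as follows. Here $x_{ij} = 1$ means that object i is ranked before object j, and the notation $E = \{e_{ij}\}$ is the one used earlier. This is the standard textbook form, not necessarily the exact variant of [13]. The equations make every 0-1 solution a tournament. The 3-dicycle inequalities forbid directed 3-cycles, and a tournament without directed 3-cycles is acyclic, that is, a linear order. The LP relaxation replaces $x_{ij} \in \{0,1\}$ by $0 \le x_{ij} \le 1$.

$$\begin{aligned} \max\ & \sum_{i \ne j} e_{ij} x_{ij} \\ \text{s.t.}\ & x_{ij} + x_{ji} = 1, && 1 \le i < j \le m,\\ & x_{ij} + x_{jk} + x_{ki} \le 2, && i, j, k \text{ distinct},\\ & x_{ij} \in \{0, 1\}, && i \ne j. \end{aligned}$$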

Other possibilities for solving linear ordering problems, at least in theory, are methods such as dynamic programming, or formulating the problem as a quadratic assignment problem ([10]).
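The dynamic programming approach can be made concrete with a standard subset recursion. This is a generic technique, not one taken from the cited work. Let f(S) be the best value of an ordering that places exactly the objects of S first. Appending object j after S adds $\sum_{i \in S} e_{ij}$, so $f(S \cup \{j\}) = \max_{j \in S \cup \{j\}} \big(f(S) + \sum_{i \in S} e_{ij}\big)$. The sketch below runs in $O(2^m m^2)$ time, which is exponential but far below m! for moderate m. The function name is hypothetical.

```python
def lop_dynamic_programming(E):
    """Exact LOP by dynamic programming over subsets, O(2^m m^2) time.

    best[S] is the largest weight of an ordering that places exactly the
    objects in the bit set S first; appending object j after S adds
    sum(E[i][j] for i in S).
    """
    m = len(E)
    best = [float("-inf")] * (1 << m)
    last = [-1] * (1 << m)
    best[0] = 0
    for S in range(1 << m):               # S | (1 << j) is always numerically larger
        for j in range(m):
            if S & (1 << j):
                continue
            T = S | (1 << j)
            gain = sum(E[i][j] for i in range(m) if S & (1 << i))
            if best[S] + gain > best[T]:
                best[T], last[T] = best[S] + gain, j
    order, S = [], (1 << m) - 1
    while S:                              # walk back through the stored last elements
        order.append(last[S])
        S &= ~(1 << last[S])
    return order[::-1], best[(1 << m) - 1]
```

On E = [[0, 3, 1], [2, 0, 4], [5, 1, 0]], the recursion returns the order [1, 2, 0] with value 11.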


See also 


> Assignment and Matching 

> Assignment Methods in Clustering 

> Bi-objective Assignment Problem 

> Communication Network Assignment Problem 
> Complexity Theory: Quadratic Programming 




> Feedback Set Problems 

> Frequency Assignment Problem 

> Generalized Assignment Problem 

> Graph Coloring 

> Graph Planarization 

> Greedy Randomized Adaptive Search Procedures 

> Maximum Partition Matching 

> Quadratic Assignment Problem 

> Quadratic Fractional Programming: Dinkelbach 
Method 

> Quadratic Knapsack 

> Quadratic Programming with Bound Constraints 

> Quadratic Programming Over an Ellipsoid 

> Quadratic Semi-assignment Problem 

> Standard Quadratic Optimization Problems: 
Algorithms 

> Standard Quadratic Optimization Problems: 
Applications 

> Standard Quadratic Optimization Problems: Theory 


References 


1. Boenchendorf K (1982) Reihenfolgenprobleme/Mean- 
flow-time sequencing. Math Systems in Economics. 
Athenäum-Hain-Scriptor-Hanstein, Königstein/Ts.

2. Campos V, Glover F, Laguna M, Marti R (1999) An experi- 
mental evaluation of a scatter search for the linear order- 
ing problem. Manuscript Apr 

3. Chenery HB, Watanabe T (1958) International comparisons 
of the structure of production. Econometrica 26(4):487- 
521 

4. DeCani JS (1972) A branch & bound algorithm for maximum likelihood paired comparison ranking. Biometrika
59:131-135 

5. Garey MR, Johnson DS (1979) Computers and intractabil- 
ity: A guide to the theory of NP-completeness. Freeman, 
New York 

6. Glover F, Laguna M (1997) Tabu search. Kluwer, Dordrecht 

7. Hellmich K (1970) Ökonomische Triangulierung. Heft 54.
Rechenzentrum Graz, Graz 

8. Kaas R (1981) A branch & bound algorithm for the acyclic
subgraph problem. Europ J Oper Res 8:355-362 

9. Korte B, Oberhofer W (1968) Zwei Algorithmen zur 
Lösung eines komplexen Reihenfolgeproblems. Unternehmensforschung 12:217-231

10. Korte B, Oberhofer W (1969) Zur Triangulation von Input- 
Output Matrizen. Jahrbuch f Nat Ök u Stat 182:398-433

11. Laguna M, Marti R, Campos V (1999) Intensification and diversification with elite tabu search solutions for the linear
ordering problem. Comput Oper Res 26:1217-1230 

12. Lenstra jr. HW (1973) The acyclic subgraph problem. Techn 
Report Math Centrum Amsterdam BW26 


13. Marcotorchino JF, Michaud P (1979) Optimisation en analyse ordinale des données. Masson, Paris

14. Poetsch G (1973) Lösungsverfahren zur Triangulation von
Input-Output Tabellen. Heft 79. Rechenzentrum Graz, Graz 

15. Reinelt G (1985) The linear ordering problem: Algorithms 
and applications. In: Hofmann HH, Wille R (eds) Res. and 
Exposition in Math., vol 8. Heldermann, Berlin 

16. Wessels H (1981) Computers and intractability: A guide 
to the theory of NP-completeness. Beiträge zur Strukturforschung, vol 63. Deutsches Inst. Wirtschaftsforschung,
Berlin 

17. Whitney H (1935) On the abstract properties of linear de- 
pendence. Amer J Math 57:509-533 


Linear Programming 


LP 


PANOS M. PARDALOS 

Center for Applied Optim., Department Industrial 
and Systems Engineering, University Florida, 
Gainesville, USA 


MSC2000: 90C05 


Article Outline 


Keywords 


Problem Description 
The Simplex Method 


See also 
References 


Keywords 


Linear programming; Basic solution; Simplex method; 
Pivoting; Nondegenerate 


Linear programming (LP) is a fundamental optimiza- 
tion problem in which a linear objective function is to 
be optimized subject to a set of linear constraints. Due 
to the wide applicability of linear programming models, 
an immense amount of work has appeared regarding 
theory and algorithms for LP, since G.B. Dantzig proposed the simplex algorithm in 1947. It is not surprising that in a recent survey of Fortune 500 companies,
85% of those responding said that they had used linear
programming. The history, theory, and applications of
linear programming may be found in [3]. Several books
have been published on the subject (see the references
section).


Problem Description 


Consider the linear programming problem (in standard 
form): 


$$\min\ c^T x \quad \text{s.t.}\quad Ax = b,\quad x \ge 0, \qquad (1)$$

where $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and A is an $m \times n$ matrix of rank
m (i.e. we do not have any redundant constraints). The
feasible domain

$$P = \{x \in \mathbb{R}^n : Ax = b,\ x \ge 0\}$$

is a polyhedron. We assume that (1) has a finite optimal
solution. Let B be a submatrix of A formed by m linearly independent columns. We may assume that $A = [B, N]$, i.e. the first m columns of A are linearly independent. Then the linear system $Bx_B = b$ has a unique solution. If $x = (x_B, 0)$, then $Ax = b$, and x is called
a basic solution. The components of x associated with
the columns of B are called basic variables. If one of the
basic variables in a basic solution is zero, that solution is
called a degenerate basic solution. A basic solution that
is feasible (i.e. $x \ge 0$) is called a basic feasible solution.

The following theorem identifies the special impor- 
tance of the basic feasible solutions. 


Theorem 1 Assume that P in (1) is nonempty. Then 
a feasible point $x \in P$ is a vertex of P if and only if x is
a basic feasible solution. 


Existence of basic feasible solutions is established by 
the following fundamental theorem of linear program- 
ming. 


Theorem 2 Given the linear programming problem (1), 

the following statements are true: 

1) If P is nonempty, there exists a basic feasible solution.

2) If (1) has an optimal solution, then there is an opti- 
mal basic feasible solution. 


Therefore, the linear programming problem can be 
solved by searching among its basic feasible solutions 
(i. e. vertices of P). Since there are at most

$$\binom{n}{m}$$

basic solutions, the above theorem gives a finite, but
very inefficient, algorithm. A more systematic search
among the basic feasible solutions is given by the simplex method, which was developed by Dantzig in
1947.
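The finite but inefficient algorithm implied by Theorem 2 can be written in a few lines. The sketch tries every choice of m columns, solves $Bx_B = b$ exactly, and keeps the cheapest feasible result. Like the text, it assumes that (1) has a finite optimum; on an unbounded problem it would just return the best vertex. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve(B, rhs):
    """Exact solve of the square system B u = rhs; None if B is singular."""
    m = len(B)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(B, rhs)]
    for c in range(m):
        rows = [r for r in range(c, m) if aug[r][c] != 0]
        if not rows:
            return None
        aug[c], aug[rows[0]] = aug[rows[0]], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(m):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[m] for row in aug]


def basic_feasible_solutions(A, b):
    """Yield (columns, x) for every basic feasible solution of Ax = b, x >= 0."""
    m, n = len(A), len(A[0])
    for cols in combinations(range(n), m):
        xB = solve([[A[i][j] for j in cols] for i in range(m)], b)
        if xB is None or any(v < 0 for v in xB):
            continue                       # singular B or infeasible basic solution
        x = [Fraction(0)] * n
        for j, v in zip(cols, xB):
            x[j] = v
        yield cols, x


def lp_by_enumeration(c, A, b):
    """Best basic feasible solution; valid when (1) has a finite optimum."""
    best = None
    for _, x in basic_feasible_solutions(A, b):
        value = sum(Fraction(cj) * xj for cj, xj in zip(c, x))
        if best is None or value < best[1]:
            best = (x, value)
    return best
```

As an example, minimize $-x_1 - x_2$ subject to $x_1 + 2x_2 + x_3 = 4$, $3x_1 + x_2 + x_4 = 6$, $x \ge 0$. Four of the six column choices give basic feasible solutions, and the best one is $x = (8/5, 6/5, 0, 0)$ with value $-14/5$.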


The Simplex Method 


The simplex method has a simple geometric motivation 
which is described by the following two phases. 


I | An initial vertex $x_0$ of P (basic feasible solution) is computed.

II | Starting from the vertex $x_0$, a sequence of
vertices $x_0, \dots, x_N$ is computed in such a way
that $x_{i+1}$ is adjacent to $x_i$, $i = 0, \dots, N-1$, and
such that $c^T x_{i+1} < c^T x_i$. The method terminates if either none of the edges adjacent to $x_N$
decreases the objective function (i.e., $x_N$ is
the solution) or an unbounded edge adjacent to $x_N$ is found that improves the objective
function (i.e. the problem is unbounded).


Each step of the simplex method, moving from one
vertex to an adjacent one, is called pivoting. The integer N gives the number of pivot steps in the simplex
method. Phase I can be solved in a similar way to Phase
II. In problems of the canonical form

$$\min\ c^T x \quad \text{s.t.}\quad Ax \le b,\quad x \ge 0,\quad b \ge 0, \qquad (2)$$

there is no need for Phase I, because an initial vertex
($x_0 = 0$, with the slack variables as basic variables) is at hand. We start by considering Phase II of
the simplex method, assuming that an initial vertex
(basic feasible solution) is available. Let $x_0$ be a basic
feasible solution with $x_{10}, \dots, x_{m0}$ its basic variables, and
let $B = \{A_{B(i)} : i = 1, \dots, m\}$ be the corresponding basis. If $A_j$
denotes the jth column of A ($A_j \notin B$), then

$$\sum_{i=1}^{m} x_{ij} A_{B(i)} = A_j. \qquad (3)$$

In addition,

$$\sum_{i=1}^{m} x_{i0} A_{B(i)} = b. \qquad (4)$$
Multiply (3) by $\theta > 0$ and subtract the result from (4) to
obtain:

$$\sum_{i=1}^{m} (x_{i0} - \theta x_{ij}) A_{B(i)} + \theta A_j = b. \qquad (5)$$


Assume that $x_0$ is nondegenerate. How much can we
increase $\theta$ and still have a feasible solution? We can increase $\theta$
until the first component of $(x_{i0} - \theta x_{ij})$ becomes zero,
or equivalently

$$\theta_0 = \min_i \left\{ \frac{x_{i0}}{x_{ij}} : x_{ij} > 0 \right\}. \qquad (6)$$

If $\theta_0 = x_{l0}/x_{lj}$, then column $A_{B(l)}$ leaves the basis and $A_j$
enters the basis.

If a tie occurs in (6), then the new solution is degenerate. In addition, if all $x_{ij} \le 0$, then we can move arbitrarily
far without becoming infeasible. In that case the problem is unbounded.
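The ratio test (6) and the update (7) are small enough to show on their own. In this sketch, `ratio_test` and `update_basic_values` are hypothetical names. Rows are indexed from 0, ties in (6) go to the smaller row index, and `None` signals the unbounded case in which every $x_{ij} \le 0$.

```python
from fractions import Fraction


def ratio_test(x0, col):
    """Minimum ratio test (6).

    x0  : values x_{i0} of the basic variables
    col : coefficients x_{ij} of the entering column A_j, from (3)
    Returns (l, theta0), or None when every x_{ij} <= 0 (unbounded direction).
    """
    candidates = [(Fraction(x0[i]) / col[i], i) for i in range(len(x0)) if col[i] > 0]
    if not candidates:
        return None
    theta0, l = min(candidates)          # ties broken by the smaller row index
    return l, theta0


def update_basic_values(x0, col, l, theta0):
    """New basic values (7): x_{i0} - theta0 * x_{ij} for i != l, and theta0 in row l."""
    return [theta0 if i == l else Fraction(x0[i]) - theta0 * col[i] for i in range(len(x0))]
```

Take basic values (4, 6) and entering column (1, 3). The ratios are 4 and 2, so row 1 leaves with $\theta_0 = 2$, and the new basic values are (2, 2).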

Define the new point $x_0'$ by

$$x_{i0}' = \begin{cases} x_{i0} - \theta_0 x_{ij}, & i \ne l,\\ \theta_0, & i = l, \end{cases} \qquad B'(i) = \begin{cases} B(i), & i \ne l,\\ j, & i = l. \end{cases} \qquad (7)$$

It is easy to see that the m columns $A_{B'(i)}$ are linearly
independent. Let

$$\sum_{i \ne l} a_i A_{B(i)} + a_l A_j = 0.$$

Using (3) we have:

$$\sum_{i \ne l} (a_i + a_l x_{ij}) A_{B(i)} + a_l x_{lj} A_{B(l)} = 0,$$

and by linear independence of the columns $A_{B(i)}$ we
have $a_l x_{lj} = 0$ and $a_i + a_l x_{ij} = 0$ for $i \ne l$. Since $x_{lj} > 0$, this gives $a_l = 0$, and hence $a_1 = \dots = a_m = 0$.

Hence, the new point $x_0'$, whose basic variables are given
by (7), is a new basic feasible solution. When the basic
feasible solution $x_0$ is degenerate, some of the basic variables are zero. Therefore more than $n - m$ of the
constraints $x_j \ge 0$ are satisfied as equations (are active),
and so $x_0$ satisfies more than n equations. From (6) it
follows that if $x_{i0} = 0$ and the corresponding $x_{ij} > 0$, then
$\theta_0 = 0$, and therefore we remain at the same vertex.

Note that when a basic feasible solution $x_0$ is degenerate, there can be an enormous number of bases associated with it. In fact, if $x_0$ has $k < m$ positive components,
then there may be as many as $\binom{n-k}{m-k}$ different bases. In
that case we may compute $x_0$ as many times as there
are bases, but the sets of variables that we label basic and
nonbasic are different.

The cost (value of the objective function) at a basic feasible solution $x_0$ with corresponding basis B is:

$$z_0 = \sum_{i=1}^{m} x_{i0}\, c_{B(i)}.$$

Suppose we bring column $A_j$ into the new basis. The
following economic interpretation can be used to select
the pivot column $A_j$: for every unit of the variable $x_j$
that enters the basis, an amount $x_{ij}$ of each of the variables $x_{B(i)}$ must leave. Hence, a unit increase in the variable $x_j$ results in a net change in the cost equal to:

$$\bar c_j = c_j - z_j$$

(relative cost of column j), where $z_j = \sum_{i=1}^{m} x_{ij} c_{B(i)}$. It is
profitable to bring column j into the basis exactly when
$\bar c_j < 0$. Choosing the most negative $\bar c_j$ corresponds to
a kind of steepest descent. However, many other selection criteria can be used (e.g., Bland's rule).

If all reduced costs satisfy $\bar c_j \ge 0$, then we are at an
optimal solution and the simplex method terminates.
Note that relations (3) can be expressed in matrix notation by:

$$BX = A \quad \text{or} \quad X = B^{-1}A,$$

that is, the matrix $X = (x_{ij})$ is obtained by diagonalizing
the basic columns of A. Then

$$z_j = \sum_{i=1}^{m} x_{ij} c_{B(i)} \quad \text{or} \quad z^T = c_B^T X = c_B^T B^{-1} A.$$

Suppose $\bar c = c - z \ge 0$. Let y be a feasible point. Then, since $y \ge 0$,

$$c^T y \ge z^T y = c_B^T B^{-1} A y = c_B^T B^{-1} b = c_B^T x_B = z_0,$$

and therefore $x_0$ is an optimal solution.
Under the assumption of nondegeneracy, with our
pivot selection we have $x_{l0} > 0$ (see (6)) and

$$z_0' = z_0 - \frac{x_{l0}}{x_{lj}}(z_j - c_j) < z_0 \qquad (z_j - c_j > 0).$$

Note that corresponding to any basis there is a unique
$z_0$, and since the cost strictly decreases at every pivot, we can never return to a previous basis.
Therefore, each iteration gives a different basis and the
simplex method terminates after $N \le \binom{n}{m}$ pivots.
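The steps above fit together into Phase II. The sketch below is a dense, exact-arithmetic version built on three assumptions. First, a feasible starting basis is supplied, for instance the slack columns of a canonical-form problem with $b \ge 0$. Second, A has full row rank. Third, the entering and leaving columns follow Bland's smallest-index rule rather than the most negative $\bar c_j$, because Bland's rule is known to prevent cycling on degenerate problems. The code solves with B from scratch at every iteration, which keeps it short but is not how production codes work. The names are hypothetical.

```python
from fractions import Fraction


def solve(B, rhs):
    """Exact Gauss-Jordan solve of the square system B u = rhs."""
    m = len(B)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(B, rhs)]
    for c in range(m):
        rows = [r for r in range(c, m) if aug[r][c] != 0]
        if not rows:
            raise ValueError("basis matrix is singular")
        aug[c], aug[rows[0]] = aug[rows[0]], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(m):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[m] for row in aug]


def simplex_phase2(c, A, b, basis, max_iter=1000):
    """Phase II of the simplex method for min c^T x, Ax = b, x >= 0.

    `basis` lists the column indices B(1), ..., B(m) of a feasible starting basis.
    Returns (status, x, value) with status "optimal" or "unbounded".
    """
    m, n = len(A), len(A[0])
    basis = list(basis)
    for _ in range(max_iter):
        B = [[A[i][k] for k in basis] for i in range(m)]
        xB = solve(B, b)                                         # x_{i0}, eq. (4)
        Bt = [[B[i][r] for i in range(m)] for r in range(m)]     # B transposed
        y = solve(Bt, [c[k] for k in basis])                     # y^T = c_B^T B^{-1}
        entering = None
        for j in range(n):                                       # Bland: smallest index
            if j not in basis:
                cbar = Fraction(c[j]) - sum(y[i] * A[i][j] for i in range(m))  # c_j - z_j
                if cbar < 0:
                    entering = j
                    break
        if entering is None:                                     # all reduced costs >= 0
            x = [Fraction(0)] * n
            for i, k in enumerate(basis):
                x[k] = xB[i]
            return "optimal", x, sum(Fraction(c[k]) * x[k] for k in range(n))
        d = solve(B, [A[i][entering] for i in range(m)])         # x_{ij}, eq. (3)
        rows = [i for i in range(m) if d[i] > 0]
        if not rows:                                             # all x_ij <= 0
            return "unbounded", None, None
        theta = min(xB[i] / d[i] for i in rows)                  # ratio test (6)
        l = min((i for i in rows if xB[i] / d[i] == theta), key=lambda i: basis[i])
        basis[l] = entering
    raise RuntimeError("iteration limit reached")
```

For example, minimize $-x_1 - x_2$ subject to $x_1 + 2x_2 + x_3 = 4$, $3x_1 + x_2 + x_4 = 6$, $x \ge 0$, starting from the slack basis $\{x_3, x_4\}$. The method pivots twice and stops at $x = (8/5, 6/5, 0, 0)$ with value $-14/5$.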
An enormous amount of research on interior point algorithms for linear programming (LP) has been con-
ducted since N.K. Karmarkar [8] announced his cele- 
brated projective algorithm in 1984. Interior point algo- 
rithms for LP are interesting for two different reasons. 
First, many interior point methods are polynomial time 
algorithms for LP. Consider a standard form problem: 


$$\text{(LP)} \qquad \min\ c^T x \quad \text{s.t.}\quad Ax = b,\quad x \ge 0,$$


where $A$ is an $m \times n$ matrix. For the purpose of characterizing the complexity of algorithms it is common to assume that the data of LP are integral. If $L$ is the number of bits required to encode the data, then an algorithm for LP is polynomial time if the number of operations required to solve LP is a polynomial function of $n$, $m$, and $L$. Throughout we use 'operations' to refer to arithmetic operations in infinite precision, although for an algorithm to be rigorously polynomial time in the rational model of computation the number of bits required to perform all computations should also be polynomially bounded. Karmarkar's projective algorithm solves LP in $O(nL)$ iterations and $O(n^4 L)$ total operations. This overall complexity is required to obtain a solution of LP whose objective is within a tolerance $2^{-O(L)}$ of optimality; an exact optimal solution can then be obtained using a 'rounding' procedure in $O(n^3)$ operations. Karmarkar also described a partial updating procedure that reduced the overall complexity of his algorithm to $O(n^{3.5} L)$ operations. The idea of partial updating is to allow for some error in the specification of the projection equations that are solved on each iteration of the algorithm.

Interior point algorithms are also interesting be- 
cause they perform well in practice. When the projec- 
tive algorithm was first announced Karmarkar made 
well-publicized claims that his algorithm was several 
times faster than the simplex method in solving large 
LP problems. It was eventually discovered that most of 
Karmarkar’s claims were actually for an implementa- 
tion of the affine scaling algorithm, a simplified version 
of Karmarkar’s algorithm that avoids the use of projec- 
tive transformations. Initial attempts to replicate Kar- 
markar’s results were mainly failures, but eventually it 
was convincingly established that interior point algo- 
rithms are highly competitive with the simplex method 
on large scale problems. 

The announcement of Karmarkar’s algorithm led to 
the development of a variety of different types of inte- 
rior point methods for LP. The simplest of these are 
affine scaling methods, which were independently devised by E. Barnes [2] and R.J. Vanderbei, M.J. Meketon, and B.A. Freedman [21]. It was eventually realized that in fact the affine scaling method was discovered by I.I. Dikin [3] in 1967. The affine scaling method is
not a polynomial time algorithm for LP, and it is now 
known that the algorithm may not even converge if 
the stepsize is too long [12]. Nevertheless its practical 
performance is often quite good, as indicated by Kar- 
markar’s early claims. 

Another type of interior point method, the path following algorithm, was discovered by J. Renegar [17]. Renegar's algorithm requires only $O(\sqrt{n}L)$ iterations to solve LP, as opposed to $O(nL)$ iterations for Karmarkar's algorithm. By adapting Karmarkar's partial updating technique to the path following framework, C.C. Gonzaga [5] and P.M. Vaidya [19] devised the first algorithms for LP with overall complexities of $O(n^3 L)$ operations. The iterates of path following algorithms lie in a small neighborhood of the central path or central trajectory, which is defined to be the set of minimizers of the logarithmic barrier function

$$f_\mu(x) = \frac{c^T x}{\mu} - \sum_{i=1}^{n} \ln(x_i)$$

over $\{x : Ax = b,\ x > 0\}$, for $\mu \in (0, \infty)$. Later C. Roos and J.-Ph. Vial [18], and Gonzaga [6] developed 'long step' path following algorithms. These algorithms are based on properties of the central path, but the iterates are not constrained to remain in a small neighborhood of the path. Long step path following algorithms are very closely related to the classical sequential unconstrained minimization technique (SUMT) of A.V. Fiacco and G.P. McCormick [4].
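To make the central path concrete, here is a sketch on a toy problem chosen for illustration (not from the article): minimize $x_1$ subject to $x_1 + x_2 = 1$, $x \ge 0$. Substituting $x_2 = 1 - x_1$ into $f_\mu$ and setting the derivative to zero gives a quadratic whose root in $(0, 1)$ is the central point for each $\mu$; the function names are hypothetical.

```python
import math

def central_point(mu):
    """Minimizer of f_mu(x) = x1/mu - ln x1 - ln x2 over {x1 + x2 = 1, x > 0}.

    With x2 = 1 - x1, f_mu'(x1) = 0 gives x1^2 - (1 + 2 mu) x1 + mu = 0; the
    root in (0, 1) is written in a cancellation-free form.
    """
    x1 = 2 * mu / ((1 + 2 * mu) + math.sqrt(1 + 4 * mu * mu))
    return x1, 1 - x1

def barrier_slope(x1, mu):
    # derivative of x1/mu - ln(x1) - ln(1 - x1) with respect to x1
    return 1 / mu - 1 / x1 + 1 / (1 - x1)

for mu in (100.0, 1.0, 0.01, 1e-6):
    x1, x2 = central_point(mu)
    print(f"mu={mu:g}  x=({x1:.6g}, {x2:.6g})  c^T x={x1:.3g}")
```

As $\mu$ grows the central point approaches the analytic center $(1/2, 1/2)$, and as $\mu \to 0$ it approaches the optimal vertex $(0, 1)$, with $x_1 \approx \mu$ for small $\mu$.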

A different class of interior point algorithms is based 
on Karmarkar’s use of a potential function, a surrogate 
for the original objective, to monitor the progress of his 
projective algorithm. Gonzaga [7] and Y. Ye [23] de- 
vised the first potential reduction algorithms. These al- 
gorithms are based on reducing a potential function but 
do not employ projective transformations. Ye's potential reduction algorithm requires $O(\sqrt{n}L)$ iterations, like path following algorithms, and provides an $O(n^3 L)$ algorithm for LP when implemented with partial updating.

All of the algorithms mentioned to this point are 
based on solving LP, or alternatively the dual problem: 


$$\text{(LD)}\quad \max\ b^T y \quad \text{s.t.}\ A^T y + s = c,\ s \ge 0.$$


Algorithms for solving LP typically generate feasible so- 
lutions to LD, and vice versa, but the algorithms are 
not symmetric in their treatment of the two problems. 
A different class of interior point methods, known as 
primal-dual algorithms, is completely symmetric in the 
variables $x$ and $s$. Primal-dual algorithms are based on applying Newton's method directly to the system of equations:

$$\text{PD}(\mu)\quad Ax = b,\quad A^T y + s = c,\quad x \circ s = \mu e,$$

where $e \in \mathbb{R}^n$ is the vector of ones, $\mu$ is a positive scalar, and $x \circ s$ is the vector whose $i$th component is $x_i s_i$. Solutions $x > 0$ and $s > 0$ to PD($\mu$) are exactly on the central paths for LP and LD, respectively.
Most primal-dual algorithms fit into the path following 
framework. The idea of a primal-dual path following al- 
gorithm was first suggested by N. Megiddo [13], and 
complete algorithms were first devised by R.D.C. Monteiro and I. Adler [15] and M. Kojima, S. Mizuno, and
Y. Yoshise [10]. It is widely believed that primal-dual 
methods are in practice the best performing interior 
point algorithms for LP. 

One advantage of the system PD($\mu$) is that Newton's method can be applied even when the current $x > 0$ and $s > 0$ are not feasible in LP and LD. This infeasible interior point (IIP) strategy was first employed in the OB1 code of I.J. Lustig, R.E. Marsten, and D.F. Shanno [11]. The solution to the Newton equations with $\mu = 0$ is referred to as the predictor, or primal-dual affine scaling direction, while the solution with $\mu = x^T s/n$, for the current solutions $x$ and $s$, is called the corrector, or centering direction. The primal-dual predictor-corrector algorithm alternates between the use of these two directions. One implementation of the IIP predictor-corrector strategy, due to S. Mehrotra [14], has worked particularly well in practice. Although primal-dual IIP algorithms were very successfully implemented, it proved to be quite difficult to characterize the convergence of these methods. The first such analyses, by Kojima, Megiddo, and Mizuno [9], and Y. Zhang [25], were followed by a large number of papers giving convergence/complexity results for various IIP algorithms. Ye, M.J. Todd, and Mizuno [24] devised a 'self-dual homogeneous' interior point method that has many of the practical features of IIP methods but at the same time has stronger convergence properties. An implementation of the homogeneous algorithm [22] exhibits excellent behavior, particularly when applied to infeasible or near-infeasible problems.
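The following is a minimal sketch of an infeasible primal-dual path-following method built on PD($\mu$). It is deliberately simpler than Mehrotra's predictor-corrector. It assumes NumPy, a fixed centering factor $\sigma = 0.1$ (so each step targets $\mu = \sigma x^T s/n$), and a 0.9 fraction-to-boundary rule. These parameters, the function names and the test LP are illustrative assumptions. Each iteration solves the Newton equations by eliminating $dx$ and $ds$ and solving the normal equations $A X S^{-1} A^T\, dy = \dots$.

```python
import numpy as np

def step_length(v, dv, frac=0.9):
    """Largest step in [0, 1] keeping v + t*dv > 0, damped by `frac`."""
    neg = dv < 0
    if not np.any(neg):
        return 1.0
    return min(1.0, frac * float(np.min(-v[neg] / dv[neg])))

def primal_dual(A, b, c, sigma=0.1, tol=1e-9, max_iter=200):
    A = np.asarray(A, float); b = np.asarray(b, float); c = np.asarray(c, float)
    m, n = A.shape
    x, s, y = np.ones(n), np.ones(n), np.zeros(m)   # interior but infeasible start
    for _ in range(max_iter):
        rp = b - A @ x                    # primal residual of Ax = b
        rd = c - A.T @ y - s              # dual residual of A^T y + s = c
        if max(np.linalg.norm(rp), np.linalg.norm(rd), x @ s) < tol:
            break
        mu = sigma * (x @ s) / n          # centering target, a fraction of x^T s / n
        rc = mu - x * s                   # residual of x o s = mu e
        d = x / s                         # diagonal of X S^{-1}
        M = (A * d) @ A.T                 # A X S^{-1} A^T
        dy = np.linalg.solve(M, rp + A @ (d * rd) - A @ (rc / s))
        ds = rd - A.T @ dy
        dx = (rc - x * ds) / s            # so that S dx + X ds = rc
        ap, ad = step_length(x, dx), step_length(s, ds)
        x = x + ap * dx
        y = y + ad * dy
        s = s + ad * ds
    return x, y, s

A = [[1, 1, 1, 0], [1, 0, 0, 1]]
b = [4, 3]
c = [-2, -1, 0, 0]
x, y, s = primal_dual(A, b, c)
print(np.round(x, 6), float(np.dot(c, x)))
```

Taking separate primal and dual step lengths is a common practical choice. Because the start $x = s = e$, $y = 0$ is infeasible, both residuals are driven to zero along with the gap $x^T s$.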


Many interior point algorithms for LP can be extended to more general optimization problems. Primal-dual algorithms generalize very naturally to the monotone linear complementarity problem (LCP; cf. ▶ Linear Complementarity Problem); in fact many papers on
primal-dual algorithms (for example [25]) are written 
in terms of the LCP. As a result these algorithms im- 
mediately provide interior point solution methods for 
convex quadratic programming (QP) problems. Inte- 
rior point algorithms can also be generalized to apply 
to quadratically constrained quadratic programming 
(QCQP), optimization over second order cone (SOC) 
constraints, and semidefinite programming (SDP); for 
details on these and other extensions see [16]. The application of interior point methods to SDP has particularly rich applications, as described in [1] and [20], and remains the topic of extensive research.
In his groundbreaking paper [6], N.K. Karmarkar described a new interior point method for linear programming (LP). As originally described by Karmarkar, his algorithm applies to an LP problem of the form:

$$\text{(KLP)}\quad \min\ c^T x \quad \text{s.t.}\ Ax = 0,\ x \in S,$$

where $x \in \mathbb{R}^n$, $A$ is an $m \times n$ matrix, and $S$ is the simplex $S = \{x \in \mathbb{R}^n : x \ge 0,\ e^T x = n\}$. Throughout $e$ denotes the vector with each component equal to one. It is assumed that $e$ is feasible in KLP, and that the optimal objective value in KLP is exactly zero. These assumptions may seem restrictive, but it is easy to show that a standard form LP problem:

$$\min\ c^T x \quad \text{s.t.}\ Ax = b,\ x \ge 0 \tag{1}$$

can be converted into a problem of the form KLP by
combining the problem with its dual, and minimizing 
the gap between the two problems. 

In addition to the special form of the LP problem, Karmarkar employed two novel ingredients in the specification of his algorithm. The first was the use of a projective transformation in the construction of the algorithm's iterative process. The algorithm is initialized at $x^0 = e$. For an iterate $x^k > 0$, $k \ge 0$, let $X^k$ be the diagonal matrix with $X^k_{ii} = x^k_i$, $i = 1, \dots, n$. On the $k$th iteration, the algorithm uses a projective change of coordinates $T_k: S \to S$,

$$T_k(x) = \frac{n (X^k)^{-1} x}{e^T (X^k)^{-1} x},$$

to map the point $x^k$ to $e$. Under the assumption that the optimal value in KLP is zero, KLP is equivalent to the transformed problem:

$$\min\ \bar c^T \bar x \quad \text{s.t.}\ \bar A \bar x = 0,\ \bar x \in S,$$

where $\bar x = T_k(x)$, $\bar c = X^k c$ and $\bar A = A X^k$. The algorithm then takes a projected gradient step in the transformed problem, and uses the inverse projective transformation to define the next iterate in the original coordinates:

$$x^{k+1} = T_k^{-1}\left(e - \alpha \frac{\bar c_p}{\|\bar c_p\|}\right), \tag{2}$$

where $\alpha$ is a positive steplength and $\bar c_p$ is the projection of $\bar c$ onto the nullspace of $\bar A$ and $e^T$.

Karmarkar's second innovation was the use of a potential function to monitor the algorithm's progress. Karmarkar's potential function is:

$$f(x) = n \ln(c^T x) - \sum_{i=1}^{n} \ln(x_i).$$


Karmarkar proved that on each iteration, the steplength $\alpha$ in (2) can be chosen so that $f(\cdot)$ is reduced by an absolute constant $\delta$. It is then easy to show that the iterates satisfy $c^T x^k \le e^{-k\delta/n} c^T x^0$ for all $k$. For any positive $L$, it follows that if $c^T x^0 \le 2^{O(L)}$, then the algorithm obtains an iterate $x^k$ having $c^T x^k \le 2^{-O(L)}$ in $k = O(nL)$ iterations, each requiring $O(n^3)$ operations. For a problem of the form KLP with integer data, it can be shown that if $c^T x^k \le 2^{-O(L)}$, where $L$ is the number of bits required to represent the problem, then an exact optimal solution can be obtained from $x^k$ using a 'rounding' procedure. These facts together imply that Karmarkar's algorithm is a polynomial time algorithm for linear programming, requiring $O(n^4 L)$ operations for a problem with $n$ variables and integer data of bit size $L$. Karmarkar also described a partial updating technique that reduces the total complexity of his algorithm to $O(n^{3.5} L)$ operations. Partial updating is based on using a scaling matrix $\tilde X^k$ which is an approximation of $X^k$, and only 'updating' components $\tilde X^k_{ii}$ which differ from $X^k_{ii}$ by more than a fixed factor.
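A small sketch of one projective iteration (2) repeated a fixed number of times, assuming NumPy. The constant steplength $\alpha = 0.25$, the iteration count, the toy KLP instance and the function names are illustrative assumptions; Karmarkar's analysis prescribes its own steplength, and the rounding step is omitted. The projection of $\bar c$ onto the nullspace of $P = [\bar A; e^T]$ is computed as $\bar c - P^T (P P^T)^{-1} P \bar c$.

```python
import numpy as np

def karmarkar(A, c, alpha=0.25, iters=60):
    """Projective iterations for KLP: min c^T x s.t. Ax = 0, x in S = {x >= 0, e^T x = n},
    assuming e is feasible and the optimal value is 0."""
    A = np.asarray(A, float); c = np.asarray(c, float)
    n = A.shape[1]
    e = np.ones(n)
    x = e.copy()                                   # x^0 = e
    for _ in range(iters):
        cbar = x * c                               # cbar = X^k c
        Abar = A * x                               # Abar = A X^k (columns scaled by x)
        P = np.vstack([Abar, e])
        cp = cbar - P.T @ np.linalg.solve(P @ P.T, P @ cbar)   # projection onto null(P)
        norm = np.linalg.norm(cp)
        if norm < 1e-15:
            break
        xbar = e - alpha * cp / norm               # step in transformed coordinates
        x = n * x * xbar / (x @ xbar)              # x^{k+1} = T_k^{-1}(xbar) = n X xbar / e^T X xbar
    return x

def potential(x, c):
    """Karmarkar's potential f(x) = n ln(c^T x) - sum_i ln(x_i)."""
    c = np.asarray(c, float)
    return len(x) * np.log(c @ x) - np.sum(np.log(x))

A = [[1, -1, 0]]          # x1 = x2
c = [0, 0, 1]             # min x3; optimum (1.5, 1.5, 0) with value 0
x = karmarkar(A, c)
print(x, potential(np.ones(3), c), potential(x, c))
```

Because $e^T \bar c_p = 0$ and $\alpha < 1$, each $\bar x$ stays strictly inside $S$. Since $\bar A \bar c_p = 0$ and $\bar A e = 0$, the constraint $Ax = 0$ is preserved exactly up to rounding.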

Karmarkar’s algorithm created a great deal of inter- 
est for two reasons. First, the algorithm was a polyno- 
mial time method for LP. Second, Karmarkar claimed 
that unlike the ellipsoid algorithm, the other well- 
known polynomial time method for LP, his method 
performed extremely well in practice. There was some 
controversy at the time regarding these claims, and 
eventually it was discovered that most of Karmarkar’s 
computational results were based on the affine scaling 
algorithm, a simplified version of his algorithm that 
avoids the use of projective transformations. In any case 
it soon became clear that the performance of interior 
point algorithms for LP could be highly competitive 
with the simplex method, the usual solution technique, 
on large problems. 

There is a great deal of research connected with 
Karmarkar’s algorithm. Several authors ([1,3,4,5,9]) 
showed that the special form of KLP was unnecessary, 
and instead the projective algorithm could be directly 
applied to a standard form problem (1). This ‘stan- 
dard form variant’ adds logic which maintains a lower 
bound on the unknown optimal value in (1). Later it 
was shown that the projective transformations could 
also be eliminated, giving rise to so-called potential reduction algorithms for LP. The best known potential reduction algorithm, due to Y. Ye [8], requires only $O(\sqrt{n}L)$ iterations, and with an adaptation of Karmarkar's partial updating technique has a total complexity of $O(n^3 L)$ operations. The survey articles [2] and
[7] give extensive references to research connected with 
Karmarkar’s algorithm, and related potential reduction 
methods. 
The problem of determining the worst-case behavior of
the simplex algorithm remained an outstanding open 
problem for more than two decades. In the beginning of 
the 1970s, V. Klee and G.J. Minty [9] solved this prob- 
lem by constructing linear examples on which an ex- 
ponential number of iterations is required before opti- 
mality occurs. In this article we present the Klee-Minty 
examples and show how they can be used to show expo- 
nential worst-case behavior for some well known pivot- 
ing rules. 


Introduction 


The problem of determining the worst-case behavior of 
the simplex algorithm remained an outstanding open 
problem for more than two decades. In the beginning 
of the 1970s, Klee and Minty in their classical paper 
[9] showed that the most commonly used pivoting rule, 
i.e., Dantzig’s largest coefficient pivoting rule [5], performs exponentially badly on some specially constructed linear problems, known today as Klee-Minty examples.
Later on, R.G. Jeroslow [8] showed similar behavior for 
the maximum improvement pivoting rule. He showed 
this result by slightly modifying Klee-Minty examples. 
The Klee-Minty examples have been used by several 
researchers to show exponential worst-case behavior 
for the great majority of the practical pivoting rules. D. Avis and V. Chvátal [1] and, independently, K.G. Murty
[10, p. 439] showed exponential behavior for Bland’s 
least index pivoting rule [2] and D. Goldfarb and W. Sit 
[7] for the steepest edge simplex method [5]. Recently, 
C. Roos [13] established exponential behavior for Terlaky’s criss-cross method [14] and K. Paparrizos [11] for
a number of pivoting rules some of which use past his- 
tory. Similar results have been derived by Paparrizos 
[12] for his dual exterior point algorithm and K. Dosios 
and Paparrizos [6] for a new primal dual pivoting rule 
[3]. 

In this paper we present the Klee-Minty examples 
and show some of their properties that are used in 
deriving complexity results of the simplex algorithm. 
These properties are then used to show exponential behavior for two pivoting rules: the least index and the largest coefficient pivoting rules.

The paper is self-contained. The next section describes a particular form of the simplex algorithm. The Klee-Minty examples and their properties are presented in Section 3. Section 4 is devoted to complexity results.


Simplex Algorithm 


In describing our results we find it convenient to use the 
dictionary form [4] of the simplex algorithm. We will 
see in the next section that this form exhibits some ad- 
vantages in describing the properties of the Klee-Minty 
examples. 

Consider the linear problem in standard form 


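$$\max\ z = c^T x \quad \text{s.t.}\ Ax = b,\ x \ge 0, \tag{1}$$

where $c, x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$ and superscript $T$ denotes transposition. Without loss of generality we may assume that $A$ is of full row rank, i.e., $\operatorname{rank}(A) = m$ ($m < n$).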

A basis for problem (1) is a set of indices $B \subseteq \{1, \dots, n\}$ containing exactly $m$ indices. The elements of $B$, the components of $c$ and $x$ and the columns of $A$ indexed by $B$ are called basic, while the remaining ones are called nonbasic. The set of nonbasic indices is denoted by $N = \{1, \dots, n\} \setminus B$. We also denote by $B$ ($N$) the submatrix of $A$ containing the columns indexed by $B$ ($N$). The components of a vector $x$ indexed by $B$ ($N$) are denoted by $x_B$ ($x_N$).

With this notation at hand the equality constraints of (1) are written in the form

$$B x_B + N x_N = b. \tag{2}$$

If $B$ is a nonsingular matrix we can set $x_N = 0$ and compute $x_B$ from (2). Then, we find $x_B = B^{-1} b$. The nonsingular matrix $B$ is called the basic matrix or basis. The solution $x_N = 0$ and $x_B = B^{-1} b$ is called a basic solution. If, in addition, $x_B = B^{-1} b \ge 0$, then $(x_B, x_N)$ is a basic feasible solution. Geometrically, a basic feasible solution of (1) corresponds to a vertex of the polyhedral feasible region.

If $B$ is nonsingular, we can express the basic variables $x_B$ as a function of the nonbasic variables $x_N$. We have from (2) that

$$x_B = -B^{-1} N x_N + B^{-1} b. \tag{3}$$

Using (3), the objective function of problem (1) is written in the form

$$\begin{aligned} z &= c_B^T x_B + c_N^T x_N \\ &= c_B^T(-B^{-1} N x_N + B^{-1} b) + c_N^T x_N \\ &= (-c_B^T B^{-1} N + c_N^T) x_N + c_B^T B^{-1} b. \end{aligned} \tag{4}$$

At every iteration the simplex algorithm constructs the system of equations (3) and (4).

Let the current feasible basis be $B$. The corresponding system of equations is written in the form

$$z = (-c_B^T B^{-1} N + c_N^T) x_N + c_B^T B^{-1} b, \qquad x_B = -B^{-1} N x_N + B^{-1} b. \tag{5}$$

We denote the coefficients of $x_N$ and the constant terms of (5) by $H$, i.e.,

$$H = \begin{pmatrix} c_N^T - c_B^T B^{-1} N & c_B^T B^{-1} b \\ -B^{-1} N & B^{-1} b \end{pmatrix}.$$


The top row of $H$, row zero, is devoted to the objective function. Sometimes we call it the cost row. The remaining rows are numbered $1, \dots, m$. The $i$th row, $1 \le i \le m$, corresponds to the basic variable $x_{B[i]}$, where $B[i]$ denotes the $i$th element of $B$. Similarly, the $j$th column of $H$, $1 \le j \le n - m$, corresponds to the nonbasic variable $x_{N[j]}$. The last column of $H$, column $n - m + 1$, corresponds to the constant terms. We denote the entries of $H$ by $h_{ij}$.

It is well known that if $h_{0j} \le 0$ for $j = 1, \dots, n - m$, then $(x_B, x_N)$ is an optimal solution to (1). In this case the algorithm terminates. Otherwise, a nonbasic variable $x_{N[q]} = x_l$ such that $h_{0q} > 0$ is chosen. Variable $x_l$ is called the entering variable. If the condition

$$h_{iq} \ge 0, \quad \text{for } i = 1, \dots, m,$$

holds, problem (1) is unbounded and the algorithm stops. Otherwise, the basic variable $x_{B[p]} = x_k$ is determined by the following minimum ratio test:

$$\frac{h_{p,n-m+1}}{-h_{pq}} = \min\left\{ \frac{h_{i,n-m+1}}{-h_{iq}} : 1 \le i \le m,\ h_{iq} < 0 \right\}.$$

The basic variable $x_k$ is called the leaving variable. Then, the entering variable $x_l$ takes the place of the leaving variable and vice versa, i.e., we set

$$B[p] \leftarrow N[q] \quad\text{and}\quad N[q] \leftarrow B[p].$$

Thus, a new basis $\hat B$ is constructed and the procedure is repeated. Let $\hat H$ be the tableau corresponding to the new basis $\hat B$. It is easily seen that

$$\hat h_{ij} = \begin{cases} \dfrac{1}{h_{pq}} & \text{if } i = p,\ j = q, \\[4pt] -\dfrac{h_{pj}}{h_{pq}} & \text{if } i = p,\ j \ne q, \\[4pt] \dfrac{h_{iq}}{h_{pq}} & \text{if } i \ne p,\ j = q, \\[4pt] h_{ij} - \dfrac{h_{iq} h_{pj}}{h_{pq}} & \text{otherwise}. \end{cases} \tag{6}$$
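The update (6) translates directly into code. In this sketch the tableau is a list of rows with row 0 as the cost row and the constants in the last column. Python column indices are 0-based, so the text's column $j$ is stored at index $j - 1$. The function name and the small example LP are illustrative, not from the article.

```python
def pivot(H, p, q):
    """Tableau after pivoting on h_pq, following update (6).

    H is a list of rows: row 0 is the cost row and the last column holds the
    constant terms. p (>= 1) is a row index; q is a 0-based column index.
    """
    hpq = H[p][q]
    new = []
    for i, row in enumerate(H):
        new_row = []
        for j, hij in enumerate(row):
            if i == p and j == q:
                value = 1 / hpq
            elif i == p:
                value = -hij / hpq
            elif j == q:
                value = hij / hpq
            else:
                value = hij - row[q] * H[p][j] / hpq
            new_row.append(value)
        new.append(new_row)
    return new

# Dictionary of: max z = 3 x1 + 2 x2,  x1 + x2 + x3 = 4,  x1 + x4 = 3  (x3, x4 basic)
H = [[3.0, 2.0, 0.0],      # z  = 3 x1 + 2 x2 + 0
     [-1.0, -1.0, 4.0],    # x3 = -x1 - x2 + 4
     [-1.0, 0.0, 3.0]]     # x4 = -x1 + 3
H1 = pivot(H, 2, 0)        # x1 enters, x4 leaves (ratio test: 3 < 4)
H2 = pivot(H1, 1, 1)       # x2 enters, x3 leaves
print(H2[0])               # [-1.0, -2.0, 11.0]: no positive cost coefficient, z = 11
```

Pivoting twice on the same position returns the original tableau, since the entering and leaving variables simply swap back.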


Klee-Minty Examples 


The Klee-Minty examples of order $n$ are the linear problems of the form

$$\begin{aligned} \max\ & \sum_{j=1}^{n} \varepsilon^{n-j} x_j \\ \text{s.t.}\ & x_1 \le 1, \\ & 2\sum_{j=1}^{i-1} \varepsilon^{i-j} x_j + x_i \le 1, \quad i = 2, \dots, n, \\ & x_j \ge 0, \quad j = 1, \dots, n, \end{aligned} \tag{7}$$

where $0 < \varepsilon < 1/3$. In this section we will show that the feasible region of (7) is a slightly perturbed cube of dimension $n$; see Fig. 1 and Fig. 2. The optimal solution is $(0, 0, \dots, 1) \in \mathbb{R}^n$ and the optimal value is 1. A cube of dimension $n$ has $2^n$ vertices. In the next section we will describe pivoting rules that force the simplex method to pass through all the vertices of the Klee-Minty examples. These pivoting rules require $2^n - 1$ iterations before optimality is reached and, hence, they are exponential.
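As a quick check of the "perturbed cube" picture, the sketch below computes, for every 0-1 sequence $a$, the point where $x_i = 0$ if $a_i = 0$ and constraint $i$ of (7) is tight if $a_i = 1$. It then verifies that all $2^n$ points are distinct and feasible. The value $\varepsilon = 0.25$ and the helper names are illustrative assumptions.

```python
from itertools import product

def km_vertex(a, eps):
    """Point of (7) for the 0-1 sequence a (a_i = 1: constraint i tight; a_i = 0: x_i = 0)."""
    x = []
    for i, ai in enumerate(a):            # 0-based i is constraint i + 1 of the text
        x.append(1 - 2 * sum(eps ** (i - j) * x[j] for j in range(i)) if ai else 0.0)
    return x

def is_feasible(x, eps, tol=1e-12):
    """Check all constraints of (7)."""
    return all(xi >= -tol for xi in x) and all(
        2 * sum(eps ** (i - j) * x[j] for j in range(i)) + x[i] <= 1 + tol
        for i in range(len(x)))

def objective(x, eps):
    n = len(x)
    return sum(eps ** (n - 1 - j) * xj for j, xj in enumerate(x))

eps = 0.25
verts = {a: km_vertex(a, eps) for a in product((0, 1), repeat=3)}
best = max(verts, key=lambda a: objective(verts[a], eps))
print(best, verts[best], objective(verts[best], eps))
```

The best of the eight points is $(0, 0, 1)$ with objective value 1, in agreement with the optimal solution stated above.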


Linear Programming: Klee-Minty Examples, Figure 1
Feasible region of Klee-Minty example of order n = 2

Linear Programming: Klee-Minty Examples, Figure 2
Feasible region of Klee-Minty example of order n = 3


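The standard form of linear problem (7) is

$$\begin{aligned} \max\ & \sum_{j=1}^{n} \varepsilon^{n-j} x_j \\ \text{s.t.}\ & x_1 + x_{n+1} = 1, \\ & 2\sum_{j=1}^{i-1} \varepsilon^{i-j} x_j + x_i + x_{n+i} = 1, \quad i = 2, \dots, n, \\ & x_j \ge 0, \quad j = 1, \dots, 2n, \end{aligned} \tag{8}$$

where $x_{n+i}$, $1 \le i \le n$, is the slack variable corresponding to the $i$th inequality constraint of problem (7).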

We will be interested in basic solutions of (8) such that, for each $j = 1, \dots, n$, either $x_j$ or $x_{n+j}$ is basic but not both. Such a basic solution is called distinguished. A tableau corresponding to a distinguished basis is called a distinguished tableau. In order to facilitate the presentation it is convenient to introduce the set $Q$ of all zero-one $n$-sequences $(a_1, \dots, a_n)$; a distinguished basis corresponds to the sequence with

$$a_j = \begin{cases} 0 & \text{if } x_j \text{ is nonbasic and } x_{n+j} \text{ is basic}, \\ 1 & \text{if } x_{n+j} \text{ is nonbasic and } x_j \text{ is basic}. \end{cases}$$

We denote the distinguished basis corresponding to the sequence $(0, \dots, 0)$ by $\bar B$ and the tableau corresponding to $\bar B$ by $\bar H$. We have $\bar B = \{n+1, \dots, 2n\}$. It is easily verified that

$$\bar h_{ij} = \begin{cases} \varepsilon^{n-j} & \text{if } i = 0,\ 1 \le j \le n, \\ -1 & \text{if } 1 \le i = j \le n, \\ 0 & \text{if } 1 \le i < j \le n, \\ -2\varepsilon^{i-j} & \text{if } i > j, \\ 1 & \text{if } i \ge 1 \text{ and } j = n+1. \end{cases} \tag{9}$$

Tableau $\bar H$ is sometimes called initial.

A distinguished tableau $H$ corresponding to $(a_1, \dots, a_n) \in Q$ is constructed by starting from $\bar H$ and pivoting only on elements $h_{pp}$ such that $a_p = 1$. Using this procedure and relations (6) and (9) we easily conclude that

$$h_{pp} = -1 \ \text{ for } p = 1, \dots, n, \qquad h_{ij} = 0 \ \text{ for } 1 \le i \le n-1,\ i < j \le n, \tag{10}$$

for each distinguished tableau $H$.


Lemma 1 Let $H$ be the tableau corresponding to an arbitrary distinguished basis $B$. Then

$$h_{ij} + h_{pj} h_{ip} = -h_{ij}, \quad j < p < i \le n, \tag{11}$$

$$h_{0j} + h_{pj} h_{0p} = -h_{0j}, \quad j < p \le n. \tag{12}$$

Proof It suffices to prove the following induction step. If the distinguished tableau $H$ satisfies (11) and (12) and a pivot operation is performed on $h_{rr}$, $1 \le r \le n$, resulting in tableau $\hat H$, then $\hat H$ satisfies (11) and (12) as well. Observe that relations (11) and (12) are satisfied by the initial tableau $\bar H$.

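So, assume that $H$ satisfies (11) and (12) and a pivot operation is performed on element $h_{rr} = -1$. Then, we have from (6) and (10)

$$\hat h_{ij} = \begin{cases} -1 & \text{if } i = j = r, \\ h_{rj} & \text{if } i = r,\ j \ne r, \\ -h_{ir} & \text{if } i \ne r,\ j = r, \\ h_{ij} + h_{ir} h_{rj} & \text{otherwise}. \end{cases} \tag{13}$$

Combining (13), (10) and the induction hypothesis we have

$$\hat h_{ij} = \begin{cases} -h_{ij} & \text{if } i = 0,\ j \le r, \\ -h_{ij} & \text{if } i > r,\ j \le r, \\ h_{i,n+1} + h_{r,n+1} h_{ir} & \text{if } i > r,\ j = n+1, \\ h_{0,n+1} + h_{r,n+1} h_{0r} & \text{if } i = 0,\ j = n+1, \\ h_{ij} & \text{otherwise}. \end{cases} \tag{14}$$

Let $j < p < i \le n$. There are two cases to be considered, $p \le r$ and $p > r$. If $p \le r$ and $i \le r$, or $p > r$ and $j > r$, the entries involved are unchanged and (11) holds trivially. Otherwise, from relations (14) we have, for $p \le r$,

$$\hat h_{ij} = -h_{ij}, \quad \hat h_{ip} = -h_{ip}, \quad \hat h_{pj} = h_{pj},$$

and for $p > r$

$$\hat h_{ij} = -h_{ij}, \quad \hat h_{ip} = h_{ip}, \quad \hat h_{pj} = -h_{pj}.$$

In both cases,

$$\hat h_{ij} + \hat h_{pj} \hat h_{ip} = -h_{ij} - h_{pj} h_{ip} = h_{ij} = -\hat h_{ij}.$$

This proves (11). The proof of (12) is similar.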


Lemma 1 shows that pivoting on element $h_{pp}$ of a distinguished tableau $H$ is very easily performed. Just change the signs of the entries $h_{ij}$ such that $i = 0$ and $j \le p$, or $i > p$ and $j \le p$, and set

$$\hat h_{i,n+1} = h_{i,n+1} + h_{ip} h_{p,n+1}$$

for $i = 0$ or $i > p$.

Figure 3 illustrates the entries of $H$ that change value when pivoting on $h_{pp}$. In particular, the entries in areas A and B just change sign.
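The sign-change shortcut can be checked numerically against the general update (6). The sketch below builds the initial tableau (9) for $n = 4$ and applies the same sequence of pivots on diagonal elements $h_{pp}$ in two ways: with the full formula (6) and with the shortcut. It then compares the results. The value $\varepsilon = 0.2$, the pivot sequence and all function names are illustrative assumptions; text column $j$ is stored at Python index $j - 1$.

```python
def km_initial_tableau(n, eps):
    """Initial tableau (9). Row i = 0..n; text column j = 1..n+1 is stored at index j-1."""
    H = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        H[0][j - 1] = eps ** (n - j)
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            H[i][j - 1] = -1.0 if i == j else -2 * eps ** (i - j)
        H[i][n] = 1.0                              # constant column j = n + 1
    return H

def pivot(H, p, q):
    """General update (6): pivot on row p, 0-based column q."""
    hpq = H[p][q]
    return [[(1 / hpq if j == q else -H[p][j] / hpq) if i == p
             else (H[i][q] / hpq if j == q else H[i][j] - H[i][q] * H[p][j] / hpq)
             for j in range(len(H[0]))]
            for i in range(len(H))]

def km_pivot(H, p):
    """Shortcut implied by Lemma 1: pivot on h_pp of a distinguished tableau."""
    n = len(H) - 1
    new = [row[:] for row in H]
    for i in [0] + list(range(p + 1, n + 1)):
        new[i][n] = H[i][n] + H[i][p - 1] * H[p][n]   # h_{i,n+1} + h_ip h_{p,n+1}
        for j in range(1, p + 1):                      # entries with j <= p change sign
            new[i][j - 1] = -H[i][j - 1]
    return new

n, eps = 4, 0.2
H_full = km_initial_tableau(n, eps)
H_fast = km_initial_tableau(n, eps)
for p in (1, 3, 2, 4, 1, 2):                           # an arbitrary sequence of pivots on h_pp
    H_full = pivot(H_full, p, p - 1)
    H_fast = km_pivot(H_fast, p)
max_diff = max(abs(u - v) for r1, r2 in zip(H_full, H_fast) for u, v in zip(r1, r2))
print(max_diff)
```

The two tableaux agree to rounding error, which is what Lemma 1 predicts for every distinguished tableau.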


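Linear Programming: Klee-Minty Examples, Figure 3
Entries of a distinguished tableau H that change value after pivoting on element $h_{pp}$

Theorem 2 Let $H$ be a distinguished tableau of problem (8) and $a = (a_1, \dots, a_n)$ be the corresponding $n$-sequence. Then the following relations hold. For $i = 1, \dots, n$ and $j = 1, \dots, n$ we have

$$h_{ij} = \begin{cases} -1 & \text{if } i = j, \\ 0 & \text{if } i < j, \end{cases} \tag{15}$$

while for $i > j$,

$$h_{ij} = \begin{cases} -2\varepsilon^{i-j} & \text{if } \sum_{k=j}^{i-1} a_k \text{ is even}, \\ 2\varepsilon^{i-j} & \text{if } \sum_{k=j}^{i-1} a_k \text{ is odd}. \end{cases} \tag{16}$$

For $i = 0$ and $j = 1, \dots, n$ we have

$$h_{0j} = \begin{cases} \varepsilon^{n-j} & \text{if } \sum_{k=j}^{n} a_k \text{ is even}, \\ -\varepsilon^{n-j} & \text{if } \sum_{k=j}^{n} a_k \text{ is odd}. \end{cases} \tag{17}$$

For $i = 1, \dots, n$ and $j = n+1$ we have

$$h_{i,n+1} = \begin{cases} 1, & i = 1, \\ 1 - \sum_{k=1}^{i-1} a_k h_{ik}, & 2 \le i \le n. \end{cases} \tag{18}$$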

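Proof The proof is by induction. We assume that the distinguished tableau $H$ satisfies (15)–(18), and show that the tableau $\hat H$ computed by pivoting on $h_{pp}$ satisfies (15)–(18) as well. Observe that the initial tableau $\bar H$ satisfies (15)–(18).

Let $\hat a = (\hat a_1, \dots, \hat a_n)$ be the sequence corresponding to tableau $\hat H$. Then

$$\hat a_j = \begin{cases} a_j, & j \ne p, \\ 1 - a_j, & j = p. \end{cases}$$

Proof of (15)–(16). Relations (15) have already been shown in (10). From Lemma 1 we have

$$\hat h_{ij} = h_{ij} \quad\text{and}\quad \sum_{k=j}^{i-1} \hat a_k = \sum_{k=j}^{i-1} a_k$$

for $i \le p$, or $i > p$ and $j > p$. For $i > p$ and $j \le p$ we have

$$\sum_{k=j}^{i-1} \hat a_k = \sum_{k=j}^{i-1} a_k + 1 - 2a_p.$$

Hence, if $\sum_{k=j}^{i-1} a_k$ is odd (even), $\sum_{k=j}^{i-1} \hat a_k$ is even (odd). Also, from Lemma 1 we have $\hat h_{ij} = -h_{ij}$. Hence, (16) holds in all cases.

Proof of (17). By Lemma 1, $\hat h_{0j} = -h_{0j}$ for $j \le p$ and $\hat h_{0j} = h_{0j}$ for $j > p$, while $\sum_{k=j}^{n} \hat a_k$ changes parity exactly when $j \le p$. Now the proof proceeds as the proof of (16).

Proof of (18). If $i \le p$ we have $\hat h_{i,n+1} = h_{i,n+1}$, and $a_k$ and $h_{ik}$, $k < i$, are unchanged. Hence, (18) holds trivially from the induction hypothesis. If $i > p$, then

$$\begin{aligned} \hat h_{i,n+1} &= h_{i,n+1} + h_{p,n+1} h_{ip} \\ &= 1 - \sum_{k=1}^{i-1} a_k h_{ik} + \left(1 - \sum_{k=1}^{p-1} a_k h_{pk}\right) h_{ip} \\ &= 1 - \sum_{k=1}^{p-1} a_k (h_{ik} + h_{pk} h_{ip}) + h_{ip} - a_p h_{ip} - \sum_{k=p+1}^{i-1} a_k h_{ik} \\ &= 1 - \sum_{k=1}^{p-1} \hat a_k \hat h_{ik} - (1 - a_p)\hat h_{ip} - \sum_{k=p+1}^{i-1} \hat a_k \hat h_{ik} \\ &= 1 - \sum_{k=1}^{i-1} \hat a_k \hat h_{ik}. \end{aligned}$$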

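Theorem 3 The feasible region of problem (8) is a slightly perturbed cube.

Proof It suffices to show that the feasible region has precisely $2^n$ vertices. We show that each distinguished tableau $H$ is feasible and all the adjacent tableaux are distinguished.

From Theorem 2 we have

$$h_{i,n+1} = 1 - \sum_{k=1}^{i-1} a_k h_{ik} \ge 1 - 2\varepsilon(1 + \varepsilon + \varepsilon^2 + \cdots) = 1 - \frac{2\varepsilon}{1 - \varepsilon} > 0,$$

since $\varepsilon < 1/3$. We show that if $x_{N[p]}$ is the entering variable, then $x_{B[p]}$ is the leaving variable. Let $h_{ip} < 0$ ($i > p$). We show that

$$\frac{h_{i,n+1}}{-h_{ip}} \ge \frac{h_{p,n+1}}{-h_{pp}} = h_{p,n+1}.$$

The last relation is equivalently written

$$-h_{ip}\left(1 - \sum_{k=1}^{p-1} a_k h_{pk}\right) \le 1 - \sum_{k=1}^{i-1} a_k h_{ik}.$$

Using Lemma 1 we get

$$-h_{ip} - 2\sum_{k=1}^{p-1} a_k h_{ik} \le 1 - \sum_{k=1}^{i-1} a_k h_{ik}$$

or

$$\begin{aligned} 0 &\le 1 + h_{ip} + \sum_{k=1}^{p-1} a_k h_{ik} - \sum_{k=p}^{i-1} a_k h_{ik} \\ &= 1 + \sum_{k=1}^{p-1} a_k h_{ik} + (1 - a_p) h_{ip} - \sum_{k=p+1}^{i-1} a_k h_{ik}. \end{aligned}$$

Since $|h_{ik}| = 2\varepsilon^{i-k}$ by (16), the right-hand side is at least $1 - 2\varepsilon(1 + \varepsilon + \varepsilon^2 + \cdots) > 0$, exactly as in the feasibility bound above, so the last relation holds.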


Applications 


Now, we are ready to show exponential behavior for some pivoting rules. Let $a \ne b$ be sequences of $Q$. We write $a \prec b$ if, for the largest index $j$ such that $a_j \ne b_j$, $\sum_{k=j}^{n} a_k$ is even and $\sum_{k=j}^{n} b_k$ is odd.

Let now $f(a)$ be the objective value at the vertex corresponding to $a \in Q$. It is easily seen that $f(a) < f(b)$ if $a \prec b$. The immediate successor of a sequence $a \in Q$ is the sequence $(a_1, \dots, a_{p-1}, 1 - a_p, a_{p+1}, \dots, a_n)$, where $p$ is the smallest index such that $\sum_{j=p}^{n} a_j$ is even.

Given a distinguished tableau $H$, a nonbasic variable $x_{N[q]}$ is called eligible if $h_{0q} > 0$.

A pivoting rule that forces the simplex algorithm to pass through all vertices of the Klee-Minty examples is the following. For ease of reference we call it the generic pivoting rule. Let $a \in Q$ be the sequence corresponding to $H$. The entering variable is $x_{N[p]}$, where $p$ is the smallest index such that $\sum_{k=p}^{n} a_k$ is even. From Theorem 2 we see that $h_{0p} = \varepsilon^{n-p} > 0$. Hence, $x_{N[p]}$ is eligible, and the generic pivoting rule requires $2^n - 1$ iterations on Klee-Minty examples of order $n$.
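The generic rule moves from each sequence to its immediate successor, because by Theorem 3 the leaving variable belongs to the same pair $p$. The sketch below follows this chain from $(0, \dots, 0)$ and evaluates the objective of (7) at each visited vertex. It assumes $\varepsilon = 0.25$ and $n = 5$, and all helper names are illustrative.

```python
def km_vertex(a, eps):
    """Vertex of (7) for the 0-1 sequence a (a_i = 1: constraint i tight; a_i = 0: x_i = 0)."""
    x = []
    for i, ai in enumerate(a):
        x.append(1 - 2 * sum(eps ** (i - j) * x[j] for j in range(i)) if ai else 0.0)
    return x

def objective(x, eps):
    n = len(x)
    return sum(eps ** (n - 1 - j) * xj for j, xj in enumerate(x))

def successor(a):
    """Immediate successor: flip a_p for the smallest p with a_p + ... + a_n even."""
    for p in range(len(a)):                   # 0-based p is index p + 1 of the text
        if sum(a[p:]) % 2 == 0:
            return a[:p] + (1 - a[p],) + a[p + 1:]
    return None                               # no eligible variable: optimal vertex

def generic_rule_path(n):
    path = [(0,) * n]
    while (nxt := successor(path[-1])) is not None:
        path.append(nxt)
    return path

path = generic_rule_path(5)
vals5 = [objective(km_vertex(a, 0.25), 0.25) for a in path]
print(len(path) - 1, "pivots;", path[-1], vals5[-1])
```

The chain visits all 32 sequences (31 pivots), the objective increases strictly at every step, and it ends at $(0, 0, 0, 0, 1)$ with value 1.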


Smallest Index Rule 


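In the smallest index rule, the entering variable is the eligible variable with the smallest index. We show that the smallest index rule, also called Bland's rule, performs exponentially on the slightly modified Klee-Minty examples

$$\begin{aligned} \max\ & \sum_{j=1}^{n} \varepsilon^{n-j} x_{2j-1} \\ \text{s.t.}\ & x_1 \le 1, \\ & 2\sum_{j=1}^{i-1} \varepsilon^{i-j} x_{2j-1} + x_{2i-1} \le 1, \quad i = 2, \dots, n, \\ & x_{2j-1} \ge 0, \quad j = 1, \dots, n. \end{aligned} \tag{19}$$

We introduce the slack variable $x_{2i}$ to the $i$th constraint of problem (19).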


Theorem 4 The least index pivoting rule performs exponentially on example (19).

Proof We show that the simplex algorithm employing the least index pivoting rule requires $2^n - 1$ iterations when applied to problem (19) and initialized with the basis corresponding to the sequence $(0, \dots, 0) \in Q$. Clearly, all the bases generated by the algorithm are distinguished, i.e., for each $i$ either $x_{2i-1}$ or $x_{2i}$ is basic but not both. Let $H$ be the current distinguished tableau corresponding to the sequence $a \in Q$. Let also $p$ be the smallest index such that $\sum_{k=p}^{n} a_k$ is even.

Then $h_{0p} > 0$ and, hence, $x_{N[p]}$ is eligible. Because of the indexing of the variables in problem (19), $N[p] = 2p$ or $2p - 1$. If $q$ is another index such that $h_{0q} > 0$ (i.e., $\sum_{k=q}^{n} a_k$ is even), then $q > p$ and, hence, $N[q] > N[p]$. Hence, the next basis corresponds to the immediate successor of $a \in Q$. This completes the proof.
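The argument in this proof can be replayed mechanically. In the sketch below, the eligible columns are read off (17), and the nonbasic variable of pair $q$ in (19) is $x_{2q-1}$ when $a_q = 0$ and the slack $x_{2q}$ when $a_q = 1$. Bland's choice is then the eligible column whose variable index is smallest. It assumes, as Theorem 3 shows, that each pivot swaps the pair of the entering column. The helper names are illustrative.

```python
def eligible_columns(a):
    """Columns q (1-based) with h_0q > 0, i.e. a_q + ... + a_n even, by (17)."""
    return [q for q in range(1, len(a) + 1) if sum(a[q - 1:]) % 2 == 0]

def nonbasic_index(a, q):
    """Nonbasic variable of pair q in (19): x_{2q-1} if a_q = 0, slack x_{2q} if a_q = 1."""
    return 2 * q if a[q - 1] else 2 * q - 1

def smallest_index_pivots(n):
    a = [0] * n
    pivots = 0
    while (cols := eligible_columns(a)):
        q = min(cols, key=lambda col: nonbasic_index(a, col))   # Bland: smallest variable index
        assert q == min(cols)                                   # same choice as the generic rule
        a[q - 1] = 1 - a[q - 1]                                 # pair q swaps (Theorem 3)
        pivots += 1
    return pivots

print([smallest_index_pivots(n) for n in range(1, 7)])
```

The printed counts are $2^n - 1$ for $n = 1, \dots, 6$.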
Largest Coefficient Rule


In the largest coefficient rule the entering variable $x_{N[p]}$ is chosen so that

$$h_{0p} = \max\{h_{0j} : h_{0j} > 0\}.$$

This rule solves problem (7) in one iteration when the initial basis corresponds to $(0, \dots, 0) \in Q$. We modify problem (7) as follows. We set

$$x_j = y_j \varepsilon^{2(j-1)} \tag{20}$$

and divide the $i$th constraint by $\varepsilon^{2(i-1)}$ and the objective function by $\varepsilon^{2(n-1)}$. Then, problem (7) is written in the equivalent form

$$\begin{aligned} \max\ & \sum_{j=1}^{n} \mu^{n-j} y_j \\ \text{s.t.}\ & y_1 \le 1, \\ & 2\sum_{j=1}^{i-1} \mu^{i-j} y_j + y_i \le \mu^{2(i-1)}, \quad i = 2, \dots, n, \\ & y_j \ge 0, \quad j = 1, \dots, n, \end{aligned} \tag{21}$$

where $\mu = 1/\varepsilon > 3$.


Theorem 5 The largest coefficient rule performs exponentially on problem (21).

Proof Problem (21) is a scaled version of problem (7). Let $y_{n+i}$ be the slack of constraint $i$. Then, all the results of the previous section, except those involving the right-hand side, hold true for problem (21) with $\varepsilon$ replaced by $\mu$. Because of relation (20), $x_j > 0$ if and only if $y_j > 0$. Hence, every distinguished basis of (21) is feasible. Now, it suffices to show that the generic and the largest coefficient rules coincide when applied to problem (21). This holds because $\mu > 1$: by (17) the positive entries of the cost row are $\mu^{n-j}$, so the largest of them belongs to the smallest eligible index.
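The contrast between (7) and (21) can be illustrated by simulating the largest coefficient rule on the cost row. This sketch assumes that the cost row of (21) is given by (17) with $\varepsilon$ replaced by $\mu$, following the change of variables (20). It also assumes that each pivot swaps the pair of the entering column (Theorem 3). The value $\varepsilon = 0.25$ and the function names are illustrative.

```python
def cost_row(a, base):
    """h_0j, j = 1..n, from (17) with eps replaced by `base`."""
    n = len(a)
    return [(-1) ** (sum(a[j - 1:]) % 2) * base ** (n - j) for j in range(1, n + 1)]

def largest_coefficient_pivots(n, base):
    """Pivots used by the largest coefficient rule from a = (0, ..., 0)."""
    a = [0] * n
    pivots = 0
    while True:
        h0 = cost_row(a, base)
        eligible = [j for j in range(1, n + 1) if h0[j - 1] > 0]
        if not eligible:
            return pivots
        q = max(eligible, key=lambda j: h0[j - 1])
        a[q - 1] = 1 - a[q - 1]      # x_{N[q]} enters and x_{B[q]} leaves (Theorem 3)
        pivots += 1

eps = 0.25
print(largest_coefficient_pivots(6, eps))       # problem (7): cost row uses eps^(n-j)
print(largest_coefficient_pivots(6, 1 / eps))   # problem (21): cost row uses mu^(n-j)
```

With base $\varepsilon < 1$ the largest entry belongs to column $n$ and one pivot reaches the optimum. With base $\mu > 1$ the rule always picks the smallest eligible column, which is the generic rule, and $2^6 - 1 = 63$ pivots are needed.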


See also 


▶ Criss-cross Pivoting Rules
▶ Least-index Anticycling Rules
▶ Lexicographic Pivoting Rules
▶ Linear Programming
Introduction


The G-group classification problem (discriminant prob- 
lem) seeks to classify members of some population into 
one of G predefined groups based on the value of a scoring function $f$ applied to a vector $x \in \mathbb{R}^p$ of observed
attributes. The scoring function is constructed using 
training samples drawn from each group. Of several 
criteria available for selecting a scoring function, ex- 
pected accuracy (measured either in terms of frequency 
of misclassification or average cost of misclassification) 
predominates. The scoring function f can be vector- 
valued, but when two groups are involved it is almost 
always scalar-valued, and scalar functions may be used 
even when there are more than two groups. 

As discussed in [8], statistical methods for con- 
structing scoring functions revolve around estimating, 
directly or indirectly, the density functions of the distri- 
butions of the various groups. In contrast, a number of 
approaches have been proposed that in essence ignore 
the underlying distributions and simply try to classify 
the training samples with maximal accuracy, hoping 
that this accuracy carries over to the larger population. 
The use of mathematical programming was suggested 
at least as early as 1965 by Mangasarian [11]; interest 
in it grew considerably with the publication of a pair of 
papers by Freed and Glover in 1981 [3,4], which led to 
parallel streams of research in algorithm development 
and algorithm analysis. 

Though nonlinear scoring functions can be con- 
structed, virtually all research into mathematical pro- 
gramming methods other than support vector ma- 
chines [1] restricts attention to linear functions. This 
is motivated largely by tractability of the mathemati- 
cal programming problems, but is bolstered by the fact 
that the Fisher linear discriminant function, the semi- 
nal statistically derived scoring function, is regarded as 
a good choice under a wide range of conditions. For the 
remainder of this article, we assume f to be linear. Di- 
rectly maximizing accuracy on the training samples dic- 


tates the use of a mixed integer program to choose the 
scoring function (» Mixed Integer Classification Prob- 
lems). The number of binary variables in such a formu- 
lation is proportional to the size of the training sam- 
ples, and so computation time grows in a nonpolyno- 
mial manner as the sample sizes increase. It is therefore 
natural that attention turned to more computation- 
ally efficient linear programming classification models 
(LPCMs). Erenguc and Koehler [2] give a thorough 
survey of the spectrum of mathematical programming 
classification models as of 1989, and Stam [14] provides
a somewhat more recent view of the field. Comparisons,
using both “real-world” data and Monte Carlo experiments,
of the accuracy of scoring functions produced by
mathematical programming models with that of
statistically derived functions have produced mixed
results [14], but there is evidence that LPCMs are more 
robust than statistical methods to large departures from 
normality in the population (such as populations with 
mixture distributions, discrete attributes, and outlier 
contamination). 


Models 


When G = 2 and f is linear and scalar-valued, clas- 
sification of x is based without loss of generality on 
whether f(x) < 0 or f(x) > 0. (If f(x) = 0, x can
be assigned to either group with equal plausibility. This 
should be treated as a classification failure.) Barring the 
degenerate case f = 0, the solution set to f(x) = 0 
forms a separating hyperplane. Ideally, though not of- 
ten in practice, each group resides within one of the 
half-spaces defined by that hyperplane. An early pre- 
cursor to linear programming models, the perceptron 
algorithm [12], constructs an appropriate linear classifier
in finite time when the samples are separable, but
does not converge if the samples are not separable.
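A minimal sketch of the classical perceptron update, written with the sign convention used here (group 1 on the side f(x) < 0, group 2 on the side f(x) > 0). The function name, the NumPy representation and the `max_epochs` cap are illustrative assumptions; the cap is needed because the loop would otherwise run forever on non-separable samples.

```python
import numpy as np

def perceptron(X1, X2, max_epochs=1000):
    """Perceptron for two groups: seek f(x) = w'x + w0 with f < 0 on group 1
    and f > 0 on group 2.  Returns (w, w0, converged)."""
    X = np.vstack([X1, X2]).astype(float)
    y = np.concatenate([-np.ones(len(X1)), np.ones(len(X2))])  # target signs
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            # f(x) = 0 is counted as a mistake (a classification failure).
            if label * (x @ w + w0) <= 0:
                w += label * x      # move the hyperplane toward x's correct side
                w0 += label
                mistakes += 1
        if mistakes == 0:           # a full pass without errors: samples separated
            return w, w0, True
    return w, w0, False             # cap reached; samples may be non-separable

w, w0, ok = perceptron([[0, 0], [1, 0]], [[3, 3], [4, 3]])
```

On the XOR pattern, which no line separates, the same function returns `converged = False` once the cap is reached.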

There being no way to count misclassifications in an 
optimization model without introducing integer vari- 
ables, LPCMs must employ a surrogate criterion. A va- 
riety of criteria have been tried, all revolving around 
measurements of the displacement of the sample points 
from the separating hyperplane. Let $f(x) = w'x + w_0$
for some non-null coefficient vector $w \in \mathbb{R}^p$ and some
scalar $w_0$. The Euclidean distance from x to the separating
hyperplane is easily shown to be $|f(x)| / \|w\|$. So the
value of the scoring function at each training observation
measures (to within a scalar multiple) how far the
observation falls from the separating hyperplane. That
distance is in turn identified as either an internal deviation
or an external deviation depending on whether the
observation falls in the correct or incorrect half-space.
Figure 1 illustrates both types of deviation.
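The bookkeeping above can be sketched in a few lines. The helper name is hypothetical, and treating a point with f(x) = 0 as an external deviation is our reading of the rule that such points count as classification failures.

```python
import numpy as np

def deviation(x, w, w0, group):
    """Distance |f(x)| / ||w|| from x to the hyperplane f(x) = w'x + w0 = 0,
    labelled 'internal' if x lies in its group's half-space and 'external'
    otherwise.  Group 1 belongs on the side f < 0, group 2 on f > 0; a point
    with f(x) = 0 is treated as a classification failure (external)."""
    f = float(np.dot(w, x) + w0)
    dist = abs(f) / np.linalg.norm(w)
    correct = f < 0 if group == 1 else f > 0
    return dist, "internal" if correct else "external"
```

For example, with w = (3, 4) and w0 = -5, the origin lies at distance 1 on the group 1 side, while the point (3, 4) lies at distance 4 on the group 2 side.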

The “hybrid” model of Glover et al. [6] is sufficiently 
flexible to capture the key features of most two-group 
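models. Let $X_g$ be an $N_g \times p$ matrix of training observations
from group g, and let $\mathbf{0}$ and $\mathbf{1}$ denote vectors of
appropriate dimension, all of whose components are 0
and 1 respectively. The core of the hybrid model, to be
expanded later, is:

$$\min \sum_{g=1}^{2} \left(\alpha_g \mathbf{1}'e_g - \beta_g \mathbf{1}'d_g + \gamma_g e_{g0} - \delta_g d_{g0}\right)$$

$$\text{s.t.}\quad X_1 w + w_0\mathbf{1} + d_1 - e_1 + d_{10}\mathbf{1} - e_{10}\mathbf{1} \le \mathbf{0}$$

$$X_2 w + w_0\mathbf{1} - d_2 + e_2 - d_{20}\mathbf{1} + e_{20}\mathbf{1} \ge \mathbf{0}$$

$$w, w_0 \text{ free};\quad d_g, e_g, d_{g0}, e_{g0} \ge 0.$$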


Variables $d_g$ and $e_g$ are intended to capture the internal
and external deviations respectively of individual observations
from group g, while $e_{g0}$ and $d_{g0}$ are intended
to capture the maximum (or minimum) external and
internal deviations respectively across the sample from
group g. (The original hybrid model had $d_{10} = d_{20}$ and
$e_{10} = e_{20}$, which is unnecessarily restrictive.) The intent
of Glover et al. in presenting the hybrid model was
to subsume a number of previously proposed models,
and so the hybrid model should be viewed as a framework.
When applied, not all of the deviation variables
need be present. For example, omission of $e_g$ and $d_g$
would yield a version of the “MMD” model [2], with $e_{g0}$
the worst external deviation of any group g observation
if any is misclassified (in which case $d_{g0} = 0$) and $d_{g0}$
the minimum internal deviation of any group g observation
if none is misclassified (in which case $e_{g0} = 0$).
On the other hand, omission of $e_{g0}$ and $d_{g0}$ results in
a variation of the “MSID” model [2], with the objective
function penalizing individual external deviations ($e_g$)
and rewarding individual internal deviations ($d_g$). The
nonnegative objective coefficients $\alpha_g, \beta_g, \gamma_g, \delta_g$ must
be chosen so that the penalties for external deviations
exceed the rewards for internal deviations ($\alpha_g > \beta_g$ and
$\gamma_g > \delta_g$); otherwise, the linear program may be unbounded,
since adding an equal amount to both $e_{gn}$ and $d_{gn}$ leaves
every constraint satisfied and, when the reward exceeds
the penalty, improves the objective value.
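As an illustration, the MSID variant (only $d_g$ and $e_g$ kept) can be posed directly as a linear program and handed to SciPy's `linprog` with the HiGHS solver. Several choices here are our assumptions, not part of the text: a single pair α > β divided by the group size, i.e. $\alpha_g = \alpha/N_g$ and $\beta_g = \beta/N_g$, and the coefficient bounds $-1 \le w \le +1$ that are discussed under Pathologies below. With $w_0$ left free, the per-group scaling is what keeps this sketch bounded when the group sizes differ.

```python
import numpy as np
from scipy.optimize import linprog

def msid_lp(X1, X2, alpha=2.0, beta=1.0, bound=1.0):
    """MSID-style variant of the hybrid model (d_g0, e_g0 omitted) with
    alpha_g = alpha/N_g, beta_g = beta/N_g and -bound <= w <= bound."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    (n1, p), n2 = X1.shape, X2.shape[0]
    # Variable layout: [w (p), w0, d1 (n1), e1 (n1), d2 (n2), e2 (n2)]
    c = np.concatenate([np.zeros(p + 1),
                        np.full(n1, -beta / n1), np.full(n1, alpha / n1),
                        np.full(n2, -beta / n2), np.full(n2, alpha / n2)])
    I1, I2 = np.eye(n1), np.eye(n2)
    # Group 1:  X1 w + w0 1 + d1 - e1 <= 0
    A1 = np.hstack([X1, np.ones((n1, 1)), I1, -I1, np.zeros((n1, 2 * n2))])
    # Group 2:  X2 w + w0 1 - d2 + e2 >= 0, multiplied by -1 to read "<= 0"
    A2 = np.hstack([-X2, -np.ones((n2, 1)), np.zeros((n2, 2 * n1)), I2, -I2])
    bounds = ([(-bound, bound)] * p + [(None, None)]
              + [(0, None)] * (2 * n1 + 2 * n2))
    res = linprog(c, A_ub=np.vstack([A1, A2]), b_ub=np.zeros(n1 + n2),
                  bounds=bounds, method="highs")
    return res.x[:p], res.x[p], res

w, w0, res = msid_lp([[0, 0], [1, 0], [0, 1]], [[3, 3], [4, 3], [3, 4]])
```

On this small separable data set the optimum is w = (1, 1) with objective value -6, while $w_0$ is not unique: any value in [-6, -1] is optimal, so different solvers may report different constant terms.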


Pathologies 


Due to their focus on minimizing error count, mixed 
integer classification models tend to be feasible (the 
trivial function f = 0 is often a feasible solution) and 
bounded (one cannot do better than zero misclassifica- 
tions). LPCMs, in contrast, tend to be “naturally” fea- 
sible but may require explicit bounding constraints. If 
the training samples are perfectly separable, a solution 
exists to the partial hybrid model with $e_{gn} = 0$ for all g
and n and $d_{gn} > 0$ for some g and n; any positive scalar
multiple of that solution is also feasible, and so the objective
value is unbounded below. One way to correct
this is to introduce bounds on the coefficients of the scoring
function, say

$$-\mathbf{1} \le w \le +\mathbf{1}.$$


Another potential problem has to do with what is vari- 
ously referred to as the “trivial” or “unacceptable” solu- 
tion, namely f = 0. Consider the partial hybrid model 
above. The trivial solution (all variables equal to zero) is 
certainly feasible, with objective value zero. Given the 
requirement that the objective coefficients of external 
deviation variables dominate those of internal deviation 
variables, any solution with a negative objective value 
must perfectly separate the training samples. Contra- 
positively, then, if the training samples cannot be sep- 
arated, the objective value cannot be less than zero, in 
which case the trivial solution is in fact optimal. This is 
undesirable: the trivial function does not classify anything.
The trick is to make the trivial solution suboptimal.
Some authors try to accomplish this by fixing the
constant term $w_0$ of the classification function at some
nonzero value (typically $w_0 = 1$). The trivial discriminant
function $w = 0$ with nonzero constant term now
misclassifies one group completely, and is unlikely to
be the model’s optimal solution even when the training
samples cannot be separated. There is the possibility,
however, that the best linear classifier has $w_0 = 0$, in
which case this approach dooms the model to finding
an inferior solution.

Other approaches include various attempts to make 
w = 0 infeasible, such as adding the constraint ||w|| = 
1. Unfortunately, trying to legislate the trivial solution 
out of existence results in a nonconvex feasible region, 
destroying the computational advantage of linear pro- 
gramming. Yet another strategy for weeding out trivial 
solutions is the introduction of a so-called normaliza- 
tion constraint. The normalization constraint proposed 
by Glover et al. for the hybrid model is

$$\sum_{g=1}^{2}\sum_{n=0}^{N_g} d_{gn} = 1.$$

Various pathologies have been connected to injudicious 
use of normalization constraints [9,10,13], including: 
unboundedness; trivial solutions; failure of the result- 
ing discriminant function to adapt properly to rescal- 
ing or translation of the data (the optimal discriminant 
function after scaling or translating the data should be 
a scaled or translated version of the previously optimal 
discriminator, and the accuracy should be unchanged); 
and failure to find a discriminant function with per- 
fect accuracy on the training samples when, in fact, they 
can be separated (which suggests that the discriminant 
function found will have suboptimal accuracy on the 
overall population). Indeed, Glover later changed the 
normalization of the hybrid model to [5] 


$$-N_2\, \mathbf{1}'X_1 w + N_1\, \mathbf{1}'X_2 w = 1$$


to avoid some of these pathologies. 
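The revised constraint involves the data only through group column sums, so its coefficient vector equals $N_1 N_2(\bar{x}_2 - \bar{x}_1)$, where $\bar{x}_g$ is the mean of group g. Translating every observation by a common vector cancels in this difference of means. A quick numerical check of that one property follows; the helper name is hypothetical, and the check says nothing about the other pathologies listed above.

```python
import numpy as np

def glover_normalization_row(X1, X2):
    """Coefficient vector r of the normalization r @ w = 1,
    where r = -N2 * 1'X1 + N1 * 1'X2 (column sums of each group)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    return -len(X2) * X1.sum(axis=0) + len(X1) * X2.sum(axis=0)

X1 = np.array([[0.0, 0.0], [1.0, 0.0]])
X2 = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
r = glover_normalization_row(X1, X2)
# r equals N1 * N2 * (mean of group 2 - mean of group 1) ...
r_means = len(X1) * len(X2) * (X2.mean(axis=0) - X1.mean(axis=0))
# ... so shifting every observation by the same vector leaves it unchanged.
shift = np.array([5.0, -2.0])
r_shifted = glover_normalization_row(X1 + shift, X2 + shift)
```

Here r = (17, 20); rescaling the data by a factor s rescales r by s, so the optimal w is rescaled by 1/s.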


Multiple Group Problems 


The use of a scalar-valued scoring function in an LPCM 
with G>2 groups requires the a priori imposition of 
both a specific ordering and prescribed interval widths 
on the scores of the groups. This being impractical,
attention turns to vector-valued functions. Whether using
methods based on statistics or mixed integer programming,
a common approach to the multiple group
problem is to develop a separate scoring function for 
each group, and assign observations to the group whose 
scoring function yields the largest value at that observation.
The linear programming analog would be to
reward amounts by which the score $f_i(x)$ of an observation
x from group i exceeds each $f_j(x)$, $j \ne i$
(or $\max_{j \ne i} f_j(x)$) and penalize differences in the opposite
direction. This induces a proliferation of deviation
variables (on the order of $(G-1)\sum_g N_g$). Other
approaches may construct discriminant functions for
all pairs of groups, or for each group versus all others,
and then use a “voting” procedure to classify observations [15].
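The "largest score wins" assignment rule can be sketched with NumPy. The matrix layout (row g of W holds $w_g$), zero-based group numbers and the name `assign_groups` are assumptions made for illustration.

```python
import numpy as np

def assign_groups(X, W, w0):
    """Assign each row x of X to the group g with the largest score
    f_g(x) = w_g'x + w_g0.  W is G x p (row g is w_g); w0 has length G.
    Groups are numbered from 0; exact ties go to the lowest index."""
    scores = np.asarray(X, float) @ np.asarray(W, float).T + np.asarray(w0, float)
    return np.argmax(scores, axis=1)

labels = assign_groups([[2, 0], [0, 2], [-1, -1]],
                       W=[[1, 0], [0, 1], [-1, -1]], w0=[0, 0, 0])
```

If two scoring functions are identical, `argmax` settles the tie arbitrarily (by index), which is the situation the tie-breaking functions described next are meant to handle.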

A good example of the use of a vector-valued scor- 
ing function is the work of Gochet et al. [7]. They begin 
with one scoring function per group, and in cases where 
two of those functions wind up identical, add additional 
functions to serve as tie-breakers. Their model is:

$$\min \sum_{g=1}^{G} \sum_{\substack{h=1 \\ h \ne g}}^{G} \mathbf{1}'e_{gh}$$

$$\text{s.t.}\quad X_g(w_g - w_h) + (w_{g0} - w_{h0})\mathbf{1} + e_{gh} - d_{gh} = \mathbf{0}$$

$$\sum_{g=1}^{G} \sum_{\substack{h=1 \\ h \ne g}}^{G} \mathbf{1}'(d_{gh} - e_{gh}) = q$$

$$w_g, w_{g0} \text{ free};\quad d_{gh}, e_{gh} \ge 0.$$

The scoring function corresponding to group g is
$f_g(x) = w_g'x + w_{g0}$. “Internal” and “external” deviations
now represent amounts by which the scores of observations
generated by the correct functions exceed or fall
short of their scores from functions belonging to other
groups. The first constraint is repeated for every pair
of groups $g, h = 1, \ldots, G$, $g \ne h$. The second constraint,
in which q is an arbitrary positive constant, is
a normalization constraint intended to render infeasible
both the trivial solution (all $w_g$ identical) and solutions
for which the total of the external deviations exceeds
that of the internal deviations. If $w_g = w_h$ and
$w_{g0} = w_{h0}$ for some $g \ne h$, the model is applied recursively
to the subsamples from only those groups (possibly
more than just g and h) that yielded identical scoring
functions. The additional functions generated are
used as tie-breakers.
Methods


The number of constraints in an LPCM approximately 
equals the number of training observations, while the 
number of variables can range from slightly more than 
the number of attributes to slightly more than the sum 
of the number of observations and the number of at- 
tributes, depending on which deviation variables are 
included in the model. In practice, the number of ob- 
servations will exceed the number of attributes; indeed, 
if the difference is not substantial, the model runs the 
risk of overfitting the scoring function (in the statis- 
tical sense). When the number of deviation variables 
is small, then, the LPCM tends to have considerably 
more constraints than variables, and a number of au- 
thors have suggested solving its dual linear program in- 
stead, to reduce the amount of computation. Improve- 
ments in both hardware and software have lessened the 
need for this, but it may still be useful when sample 
sizes reach the tens or hundreds of thousands (which 
can happen, for example, when rating consumer credit, 
and in some medical applications). 


See also 


> Deterministic and Probabilistic Optimization 
Models for Data Classification 

> Linear Programming 

> Mixed Integer Classification Problems 

> Statistical Classification: Optimization Approaches 
Let F be a field, whose elements are referred to as
scalars. A linear space V over F is a nonempty set on
which the operations of addition and scalar multiplication
are defined. That is, for any $x, y \in V$, we have
$x + y \in V$, and for any $x \in V$ and $\alpha \in F$ we have $\alpha x \in V$.
Furthermore, the following properties must be satisfied:

1) $x + y = y + x$, $\forall x, y \in V$.
2) $(x + y) + z = x + (y + z)$, $\forall x, y, z \in V$.
3) There exists an element $0 \in V$ such that $x + 0 = x$, $\forall x \in V$.
4) $\forall x \in V$, there exists $-x \in V$ such that $x + (-x) = 0$.
5) $\alpha(x + y) = \alpha x + \alpha y$, $\forall \alpha \in F$, $\forall x, y \in V$.
6) $(\alpha + \beta)x = \alpha x + \beta x$, $\forall \alpha, \beta \in F$, $\forall x \in V$.
7) $(\alpha\beta)x = \alpha(\beta x)$, $\forall \alpha, \beta \in F$, $\forall x \in V$.
8) $1x = x$, $\forall x \in V$.

The elements of V are called vectors, and V is also 
called a vector space. 
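To make properties 1)-8) concrete, they can be checked exhaustively on a small finite example. The choice $F = \mathrm{GF}(3)$ (integers modulo 3), $V = F^2$ and the helper names are assumptions for illustration; a brute-force check like this is possible only because both sets are finite.

```python
from itertools import product

P = 3                                   # F = GF(3): integers mod the prime 3
F = range(P)
V = list(product(F, repeat=2))          # V = F^2; vectors are pairs
ZERO = (0, 0)

def add(x, y):
    return tuple((a + b) % P for a, b in zip(x, y))

def smul(a, x):
    return tuple((a * b) % P for b in x)

def neg(x):
    return tuple((-b) % P for b in x)

def check_axioms():
    """Brute-force check of properties 1)-8) over all vectors and scalars."""
    return {
        1: all(add(x, y) == add(y, x) for x in V for y in V),
        2: all(add(add(x, y), z) == add(x, add(y, z))
               for x in V for y in V for z in V),
        3: all(add(x, ZERO) == x for x in V),
        4: all(add(x, neg(x)) == ZERO for x in V),
        5: all(smul(a, add(x, y)) == add(smul(a, x), smul(a, y))
               for a in F for x in V for y in V),
        6: all(smul((a + b) % P, x) == add(smul(a, x), smul(b, x))
               for a in F for b in F for x in V),
        7: all(smul((a * b) % P, x) == smul(a, smul(b, x))
               for a in F for b in F for x in V),
        8: all(smul(1, x) == x for x in V),
    }
```

Returning a dictionary keyed by property number makes it easy to see which property fails if, say, `smul` is replaced by a rule that is not a valid scalar multiplication.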


See also 


> Affine Sets and Functions 
> Linear Programming 
Stated in simplest terms, this article considers, in an
abstract mathematical framework, a curve fitting or
estimation problem where a given set of data points f is
approximated or estimated by an element from a set K so
that the estimate of f is least affected by perturbations
in f.

Let X be a normed linear space with norm $\|\cdot\|$ and
K be any (not necessarily convex) nonempty subset of
X. For any f in X, let

$$d(f, K) = \inf\{\|f - h\| : h \in K\} \qquad (1)$$

denote the shortest distance from f to K. Let also, for f
in X,

$$P(f) = P_K(f) = \{h \in K : \|f - h\| = d(f, K)\}.$$


The set-valued mapping P on X is called the metric pro- 
jection onto K. It is also called the nearest point map- 
ping, best approximation operator, proximity map, etc. 
If $P(f) \ne \emptyset$, then each element in it is called a best
approximation to (or a best estimate of) f from K. In practical
curve fitting or estimation problems, f represents
the given data and the set K is dictated by the underly- 
ing process that generates f. Because of random distur- 
bance or noise, f is in general not in K, and it is required 
to estimate f by an element of K. See [7,12] and other 
references given there for a discussion of such prob- 
lems and the use of various norms or distance func- 
tions in approximation. An approximation problem or 
a minimum distance problem such as (1) involves find- 
ing a best approximation, investigating its uniqueness 
and other properties, and developing algorithms for its 
computation. If $P(f) \ne \emptyset$ (respectively, P(f) is a singleton)
for each $f \in X$, then K is called proximinal (respectively,
Chebyshev).
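For a finite set K the infimum in (1) is a minimum, so every finite K is proximinal, but it need not be Chebyshev. A small sketch in $\mathbb{R}^n$ with the sup norm shows this; the function name and the tolerance used to collect ties are assumptions.

```python
import numpy as np

def metric_projection(f, K, tol=1e-12):
    """Return d(f, K) and P_K(f) for a finite set K of points in R^n,
    using the sup (uniform) norm ||v|| = max_i |v_i|."""
    f = np.asarray(f, float)
    dists = [np.max(np.abs(f - np.asarray(h, float))) for h in K]
    d = min(dists)
    return d, [h for h, dh in zip(K, dists) if dh <= d + tol]

d, best = metric_projection((1, 0), [(0, 0), (2, 0), (5, 5)])
```

Here d(f, K) = 1 and P(f) contains both (0, 0) and (2, 0), so this K is proximinal but not Chebyshev.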

If K is proximinal, then we define a selection operator,
or simply a selection, to be any (single-valued)
function T on X into K such that $T(f) \in P(f)$ for every
$f \in X$. If K is Chebyshev, then clearly T = P and T is
unique. A continuous selection operator is a selection T
which is continuous. There is a vast literature available
on the existence and properties of continuous selections,
including some survey papers. See, e.g., [1,2,3,6,8] and
other references given there. A more difficult problem is
finding a Lipschitzian selection operator (LSO), i.e., a
selection T which satisfies

$$\|T(f) - T(h)\| \le c(T)\,\|f - h\|, \quad \text{all } f, h \in X, \qquad (2)$$

where c(T) (a positive constant depending upon T) is 
the smallest value satisfying (2). An LSO T is called an 
optimal Lipschitzian selection operator (OLSO) if
$c(T) \le c(T')$ for all LSO T'. If the operator T in (2) is OLSO,
then (2) shows that the estimate T(f) of f is least sensi- 
tive to changes in the given data f. Consequently, T(f) 
is the most desirable estimate of f. The concept of an 
OLSO was introduced in [12], and the existence of an
LSO and an OLSO was investigated in [13,14,15,16,17]. If
X is a Hilbert space and $K \subset X$ is nonempty, closed and
convex, then K is Chebyshev. Then T, which maps f to
its unique best approximation, is an LSO, i.e., T satisfies
(2) with c(T) = 1. For a proof see [5, p. 100]. Since T
is unique, it is also trivially OLSO. For other spaces, the 
results are not so straightforward. 
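The Hilbert space case can be checked numerically in $\mathbb{R}^3$ with the Euclidean norm. We assume K is the closed convex box $[-1, 1]^3$, for which the Euclidean projection reduces to clipping each coordinate; the sampling scheme and function names are illustrative, and a random check is evidence, not a proof.

```python
import numpy as np

def project_box(f, lo, hi):
    """Euclidean projection of f onto the closed convex box {h : lo <= h <= hi};
    for a box it reduces to clipping each coordinate."""
    return np.clip(f, lo, hi)

def empirical_lipschitz(trials=2000, dim=3, seed=0):
    """Largest observed ratio ||T(f) - T(h)|| / ||f - h|| over random pairs."""
    rng = np.random.default_rng(seed)
    lo, hi = -np.ones(dim), np.ones(dim)
    worst = 0.0
    for _ in range(trials):
        f = rng.normal(scale=3.0, size=dim)
        h = rng.normal(scale=3.0, size=dim)
        num = np.linalg.norm(project_box(f, lo, hi) - project_box(h, lo, hi))
        worst = max(worst, num / np.linalg.norm(f - h))
    return worst
```

The observed ratio never exceeds 1, and it equals 1 whenever both points already lie in K, since T is then the identity on them; this is why c(T) = 1 cannot be improved.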

In this paper we present several results which iden- 
tify LSOs and OLSOs in approximation problems on 
the space of bounded or continuous functions. We il- 
lustrate these results by examples. 


Lipschitzian Selection Operators 


Let S be any set and B denote the Banach space of real
bounded functions f on S with the uniform norm
$\|f\| = \sup\{|f(s)| : s \in S\}$. Similarly, when S is topological,
denote by C = C(S) the space of real bounded and continuous
functions on S, again with the uniform norm $\|\cdot\|$.
Let X = B or C in what follows. We let $f \in X$, $K \subset X$ and
d(f, K) as above. We let d(f) = d(f, K) for convenience.
For f in X, define $K_f = \{k \in K : k \le f\}$ and
$K^f = \{k \in K : k \ge f\}$. Let

$$\underline{f}(s) = \sup\{k(s) : k \in K_f\}, \quad s \in S,$$

$$\bar{f}(s) = \inf\{k(s) : k \in K^f\}, \quad s \in S.$$

We state three conditions below; they are identical for
X = B or C.

1) If $k \in K$, then $k + c \in K$ for all real c.

2) If $f \in X$, then $\underline{f} \in K$.

3) If $f \in X$, then $\bar{f} \in K$.

If $\underline{f}$ and $\bar{f}$ are in K, then they are called the greatest
K-minorant and the smallest K-majorant of f, respectively.
Note that condition 2) (respectively, 3)) implies
that the pointwise maximum (respectively, minimum)
of any two functions in K is also in K. This can be easily
established by letting $f = \max\{f_1, f_2\}$ (respectively,
$f = \min\{f_1, f_2\}$) where $f_1, f_2 \in K$. We call $g \in K$ the maximal
(respectively, minimal) best approximation to $f \in X$ if
$g \ge g'$ (respectively, if $g \le g'$) for all best approximations
g' to f.


Theorem 1 Consider (1) with X = B or C, and any
$K \subset X$.

a) K need not be convex. Suppose that
conditions 1) and 2) hold for K. Then $d(f) = \|f - \underline{f}\|/2$
and $f' = \underline{f} + d(f)$ is the maximal best
approximation to f. Also $\|f' - h'\| \le 2\|f - h\|$ for
all $f, h \in X$. The operator T defined by $T(f) = f'$ is an
LSO with c(T) = 2.

b) K need not be convex. Suppose that
conditions 1) and 3) hold for K. Then a) holds with
$\underline{f}$ replaced by $\bar{f}$ and with $f' = \bar{f} - d(f)$, which is the
minimal best approximation to f.

c) Assume K is convex. Suppose that conditions 1), 2)
and 3) hold for K. Then a) and b) given above apply.
In addition, $d(f) = \|\bar{f} - \underline{f}\|/2$. A g in K is a best
approximation to f if and only if
$\bar{f} - d(f) \le g \le \underline{f} + d(f)$. Moreover, if
$f' = (\underline{f} + \bar{f})/2$, then f' is a best approximation to f and
$\|f' - h'\| \le \|f - h\|$ for all $f, h \in X$. The operator T
defined by $T(f) = f'$ is an OLSO with c(T) = 1.


The following theorem shows that the existence of 
a maximal (respectively, minimal) best approximation 
to (1) implies condition 2) (respectively, 3)). 


Theorem 2 Consider (1) with X = B or C, and any
$K \subset X$. Assume condition 1) holds for K. Assume that
the pointwise maximum (respectively, minimum) of two
functions in K is also in K. Then condition 2) (respectively,
3)) holds if the maximal (respectively, minimal)
best approximation to f exists. This best approximation
then equals $\underline{f} + d(f)$ (respectively, $\bar{f} - d(f)$).


The above theorems and the next one appear in [14,15]. 
Their proofs are available there. We now define another 
approximation problem, closely related to (1). Let 

$$\underline{d}(f) = d(f, K_f) = \inf\{\|f - h\| : h \in K_f\}. \qquad (3)$$

The problem is to find a $g \in \{h \in K_f : \|f - h\| = d(f, K_f)\}$,
called a best approximation to f from $K_f$.

Theorem 3 Consider (3) with X = B or C, and any
$K \subset X$, which is not necessarily convex.

a) Suppose that conditions 1) and 2) hold for K. Then
$\underline{f}$ is the maximal best approximation to f and
$\underline{d}(f) = \|f - \underline{f}\| = 2d(f)$. The operator T defined by
$T(f) = \underline{f}$ is the unique OLSO with c(T) = 1.

b) Assume condition 1) holds for K. Assume that the
pointwise maximum of two functions in K is also in
K. Then condition 2) holds if the maximal best
approximation to f exists. This best approximation then
equals $\underline{f}$.
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Examples and Applications 


Example 4 (Approximation by quasiconvex functions.)
Let $S \subset \mathbb{R}^n$ be nonempty convex and consider B = B(S).
For C = C(S) assume S is nonempty, compact and convex.
A function $h \in B$ is called quasiconvex if

$$h(\lambda s + (1 - \lambda)t) \le \max\{h(s), h(t)\} \quad \text{for all } s, t \in S,\ 0 \le \lambda \le 1. \qquad (4)$$

Equivalently, h in B is quasiconvex if one of the following
conditions holds [9,10]:

• $\{h \le c\}$ is convex for all real c;

• $\{h < c\}$ is convex for all real c.

Let K be the set of all quasiconvex functions in B. It is
easy to show that K and $K \cap C$ are closed cones which
are not convex and both satisfy condition 1) above (K is
a cone if $\lambda h \in K$ whenever $h \in K$ and $\lambda \ge 0$). The greatest
K-minorant of f is called the greatest quasiconvex
minorant of f. Using (4) it is easy to show that if $f \in B$
then such a minorant $\underline{f}$ exists in B. The next proposition
shows that if $f \in C$ then $\underline{f} \in C$.


Let $\Pi$ be the set of all convex subsets of S. Clearly,
$\emptyset, S \in \Pi$. For any $A \subset \mathbb{R}^n$, we denote by co(A) the convex
hull of A, i.e., the smallest convex set containing A.


Proposition 5 Let $f \in X$ and let

$$f^{\circ}(P) = \inf\{f(t) : t \in S \setminus P\}, \quad P \in \Pi,$$
$$\underline{f}(s) = \sup\{f^{\circ}(P) : P \in \Pi,\ s \in S \setminus P\}, \quad s \in S.$$

Then the following hold:

• If $f \in B$ (respectively, C) then $\underline{f} \in B$ (respectively, C)
and is quasiconvex. It is the greatest quasiconvex minorant
of f.

• An $h \in B$ is the greatest quasiconvex minorant of
$f \in B$ if and only if

$$\{h < c\} = \mathrm{co}\{f < c\} \quad \text{for all real } c. \qquad (5)$$

• An $h \in B$ is the greatest quasiconvex minorant of
$f \in C$ if and only if (5) holds or, equivalently,
$\{h \le c\} = \mathrm{co}\{f \le c\}$ for all real c.


This proposition and its proof appear in [15]. The
proposition shows that condition 2) holds for K and
$K \cap C$. Hence, Theorems 1a) and 3a) apply to X = B and
K, and also to X = C and $K \cap C$. In particular, Theorem
1a) shows that in each of these two cases the operator T
mapping f to $f' = \underline{f} + d(f)$ is an LSO with c(T) = 2. Now the
example given in [13, p. 332] shows that T is an OLSO.
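A one-dimensional sketch, under the assumption that S is replaced by an ordered grid of sample points, so that "convex subsets" become runs of consecutive points. In that setting $\mathrm{co}\{f < c\}$ is the smallest run containing $\{f < c\}$, and characterization (5) gives $\underline{f}(s) = \max(\min_{t \le s} f(t), \min_{u \ge s} f(u))$. The second function then applies Theorem 1a) in the sup norm. Both function names are hypothetical.

```python
import numpy as np

def quasiconvex_minorant_1d(f):
    """Greatest quasiconvex minorant of samples f on an ordered 1-D grid.

    s lies in co{f < c} iff some t <= s and some u >= s have f(t) < c and
    f(u) < c, so by (5): h(s) = max(min_{t <= s} f(t), min_{u >= s} f(u)).
    """
    f = np.asarray(f, float)
    left = np.minimum.accumulate(f)                # running min from the left
    right = np.minimum.accumulate(f[::-1])[::-1]   # running min from the right
    return np.maximum(left, right)

def lso_quasiconvex(f):
    """Theorem 1a) in the sup norm: d(f) = ||f - h||/2 and T(f) = h + d(f)."""
    h = quasiconvex_minorant_1d(f)
    d = float(np.max(np.asarray(f, float) - h)) / 2
    return h + d, d
```

For f = (3, 1, 2, 0, 4) the minorant is (3, 1, 1, 0, 4): the value 2 must drop to 1 because it sits between the smaller values 1 and 0. Then d(f) = 0.5, and T(f) = (3.5, 1.5, 1.5, 0.5, 4.5) is within 0.5 of f everywhere.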


Example 6 (Approximation by convex functions.) Let
$S \subset \mathbb{R}^n$ be nonempty convex and consider B = B(S).
A function $h \in B$ is called convex if
$h(\lambda s + (1 - \lambda)t) \le \lambda h(s) + (1 - \lambda)h(t)$ for all $s, t \in S$
and all $0 \le \lambda \le 1$.
Clearly, a convex function is quasiconvex. Let K be the
set of all convex functions in B. It is easy to show that K
is a closed convex cone and satisfies condition 1). The
greatest K-minorant of f is called the greatest convex
minorant of f. It follows at once from the definition of
a convex function that if $f \in B$ then such a minorant $\underline{f}$
exists in B. Condition 2) therefore holds for K. Hence,
Theorems 1a) and 3a) apply to X = B and K. In particular,
the LSO T of Theorem 1a) mapping f to $f' = \underline{f} + d(f)$
with c(T) = 2 can be shown to be an OLSO by using an example
as in [13, p. 334].


Now consider approximation of a continuous function
by continuous convex functions. For this case we let
$S \subset \mathbb{R}^n$ be a polytope, which is defined to be the convex hull
of finitely many points in $\mathbb{R}^n$. It is compact, convex
and locally simplicial [11]. Let $K \subset C = C(S)$ be the set
of continuous convex functions. It is easy to show that
K is a closed convex cone. Again condition 1) holds for
K. We assert that if $f \in C$, then $\underline{f}$ is convex and
continuous. This will establish that $\underline{f}$ is the greatest convex
minorant of f. To establish the assertion, note that $\underline{f}$ is
convex since it is the pointwise supremum of convex
functions. Since S is locally simplicial, [11, Corol. 17.2.1;
Thm. 10.2] show that $\underline{f}$ is continuous on S. Thus, condition
2) holds for K. Hence Theorems 1a) and 3a) apply
to X = C and K. In particular, the LSO T of Theorem
1a) mapping f to $f' = \underline{f} + d(f)$ with c(T) = 2 can be shown
to be an OLSO by using the same example as in the bounded
case above, since the sequence used in that example
consists of continuous functions [13].
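In one dimension the greatest convex minorant is easy to compute. We assume S = [x_0, x_m] (a polytope in $\mathbb{R}^1$) and f piecewise linear between given knots. Under those assumptions the greatest convex minorant is the lower convex hull of the knot points, and f minus it attains its maximum at a knot, so Theorem 1a) can be applied using knot values only. The monotone-chain hull and the function names are our choices, not taken from the text.

```python
import numpy as np

def convex_minorant(x, f):
    """Greatest convex minorant, evaluated at the knots, of the piecewise-linear
    interpolant of (x, f): the lower convex hull of the points (x_i, f_i).
    x must be strictly increasing."""
    x, f = np.asarray(x, float), np.asarray(f, float)
    hull = []                                   # indices of lower-hull vertices
    for i in range(len(x)):
        # Drop the last vertex while it is on or above the chord to point i.
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            cross = (x[k] - x[j]) * (f[i] - f[j]) - (f[k] - f[j]) * (x[i] - x[j])
            if cross > 0:                       # strict left turn: keep vertex k
                break
            hull.pop()
        hull.append(i)
    return np.interp(x, x[hull], f[hull])

def lso_convex(x, f):
    """Theorem 1a): d(f) = ||f - g||/2 with g the minorant; T(f) = g + d(f)."""
    g = convex_minorant(x, f)
    d = float(np.max(np.asarray(f, float) - g)) / 2
    return g + d, d
```

For knots x = (0, 1, 2, 3) and values f = (0, 2, 1, 3), the minorant takes the values (0, 0.5, 1, 3), so d(f) = 0.75 and T(f) is within 0.75 of f at every knot.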


Example 7 (Approximation by isotone functions.) Let S
be any set with partial order ≤. A partial order is a relation
≤ on S satisfying [4, p. 4]:

• reflexivity, i.e., $s \le s$ for all $s \in S$; and

• transitivity, i.e., if $s, t, v \in S$, and $s \le t$ and $t \le v$, then
$s \le v$.

A partial order is antisymmetric if $s \le t$ and $t \le s$ imply
s = t. We do not include this antisymmetry condition in
the partial order for the sake of generality. We consider
B = B(S) as before, and define a function k in B to be isotone
if $k(s) \le k(t)$ whenever $s, t \in S$ and $s \le t$. Let $K \subset B$ be
the set of all isotone functions. It is easy to see that K
is a closed convex cone. It is nonempty since the zero
function is in K. It is easy to verify that conditions 1), 2)
and 3) apply to K. Thus the greatest isotone minorant
$\underline{f}$ and the smallest isotone majorant $\bar{f}$ of an f in B exist.
Theorems 1c) and 3a) apply and we conclude that the
operator T of Theorem 1c), mapping f to $(\underline{f} + \bar{f})/2$, is
an OLSO with c(T) = 1 [15].


The next proposition gives explicit expressions for $\underline{f}$
and $\bar{f}$. We call a subset E of S a lower (respectively, upper)
set if whenever $t \in E$ and $v \le t$ (respectively, $t \le v$),
then $v \in E$. For s in S, let $L_s = \{t \in S : t \le s\}$ and
$U_s = \{t \in S : s \le t\}$. Then $L_s$ (respectively, $U_s$) is the smallest
lower (respectively, upper) set containing s, as may be
easily seen.

Proposition 8

$$\bar{f}(s) = \sup\{f(t) : t \in L_s\},$$
$$\underline{f}(s) = \inf\{f(t) : t \in U_s\}.$$


For a proof, see [15]. 
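On a finite set, Proposition 8 and the operator of Theorem 1c) can be computed directly. In the sketch below the ordered set is given as a list of points with a relation `leq(a, b)` meaning a ≤ b. The function names and the componentwise-order example are assumptions for illustration, and the sup and inf become max and min because the sets are finite.

```python
import numpy as np

def isotone_olso(points, f, leq):
    """Proposition 8 on a finite set with order leq(a, b) meaning a <= b.

    upper(s) = max f over L_s = {t : t <= s}   (smallest isotone majorant)
    lower(s) = min f over U_s = {t : s <= t}   (greatest isotone minorant)
    Returns lower, upper, T(f) = (lower + upper)/2 and d(f) = ||upper - lower||/2.
    """
    idx = range(len(points))
    upper = np.array([max(f[t] for t in idx if leq(points[t], points[s]))
                      for s in idx], float)
    lower = np.array([min(f[t] for t in idx if leq(points[s], points[t]))
                      for s in idx], float)
    return lower, upper, (lower + upper) / 2, float(np.max(upper - lower)) / 2

def componentwise(a, b):
    """Usual partial order on vectors."""
    return all(ai <= bi for ai, bi in zip(a, b))

pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
lower, upper, T, d = isotone_olso(pts, [2, 0, 1, 3], componentwise)
```

Here the majorant is (2, 2, 2, 3), the minorant is (0, 0, 1, 3), T(f) = (1, 1, 1.5, 3) is isotone, and its sup-norm distance to f equals d(f) = 1. Because the max and min constructions each move by at most $\|f - h\|$ when f is replaced by h, so does their average, which is the bound c(T) = 1.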

Now we consider an application to C. Define
$S = \times\{[a_i, b_i] : 1 \le i \le n\} \subset \mathbb{R}^n$, where $a_i < b_i$, and let ≤ be
the usual partial order on vectors. Let C = C(S) and let
K be the set of isotone functions in C. It is easy to verify
that K is a closed convex cone. Furthermore, if $f \in C$,
then $\underline{f}, \bar{f} \in C$. We conclude, as before, that Theorems
1c) and 3a) apply. Various generalizations of this problem
exist. See, for example, [12, Sect. 5], [15, Ex. 4.3],
and [17]. 

As was observed in [16], the dual cone of K plays 
an important role in duality and approximation from 
K. Some properties of the cone K of isotone functions 
on a finite partially ordered set S and its dual cone are 
obtained in [18]. 


See also 


> Convex Envelopes in Optimization Problems 
Discrete optimization problems are solved using a variety
of state space search techniques. The choice of technique
is influenced by factors such as availability of
heuristics and bounds, structure of the state space,
underlying machine architecture, availability of memory,
and optimality of the desired solution. The computational
requirements of these techniques necessitate the
use of large-scale parallelism to solve realistic problem
instances. In this chapter, we discuss parallel processing
issues relating to state space search.

Parallel platforms have evolved significantly over 
the past two decades. Symmetric multiprocessors 
(SMPs), tightly coupled message passing machines, and 
clusters of workstations and SMPs have emerged as the 
dominant platforms. From an algorithmic standpoint,
locality of data reference and load balancing are key to
effective utilization of all these platforms. However,
message latencies, hardware support
for shared address space and mutual exclusion, com- 
munication bandwidth, and granularity of parallelism 
all play important roles in determining suitable parallel 
formulations. A variety of metrics have also been de- 
veloped to evaluate the performance of these formula- 
tions. Due to the nondeterministic nature of the com- 
putation, traditional metrics such as parallel runtime 
and speedup are difficult to quantify analytically. The 
scalability metric, Isoefficiency, has been used with ex- 
cellent results for analytical modeling of parallel state 
space search. 

The state spaces associated with typical optimiza- 
tion problems can be fashioned in the form of either 
a graph or a tree. Exploiting concurrency in graphs is 
more difficult than in trees because of the need
for replication checking. The availability of heuristics
for best-first search imposes constraints on parallel ex- 
ploration of states in the state space. For the purpose 
of parallel processing, we can categorize search tech- 
niques loosely into three classes: depth-first tree search 
techniques (a tree search procedure in which the deepest 
of the current nodes is expanded at each step), best-first 
tree search techniques (a tree search procedure in which 
nodes are expanded based on a global (heuristic) mea- 
sure of how likely they are to lead to a solution), and 
graph search techniques (a search requiring additional 
computation for checking if a node has been encoun- 
tered before, since a node can be reached from multiple 
paths). Many variants of these basic schemes fall into 
each of these categories as well. 


Parallel Depth-First Tree Search 


Search techniques in this class include ordered depth- 
first search, iterative deepening A* (IDA*), and 
depth-first branch and bound (DFBB). In all of these
techniques, the key ingredient is the depth-first search (DFS)
of a state space (cost-bounded in the case of IDA* and
DFBB). DFS was among the first applications explored
on early parallel computers, because DFS is very amenable
to parallel processing: each subtree in the state space
can be explored independently of other subtrees. In
simple DFS, no exchange of information is required for
exploring different subtrees. It is therefore possible to
devise simple parallel formulations by assigning a distinct
subtree to each processor. However, the state space
associated with a problem instance can be highly
unstructured. Consequently, the work associated with subtrees
rooted at different nodes can be very different. There- 
fore, a naive assignment of a subtree rooted at a dis- 
tinct node to each processor can result in considerable 
idling overhead and poor parallel performance. The 
problem of designing efficient parallel DFS algorithms 
can be viewed in two steps: the partitioning problem 
and the assignment problem. The partitioning problem 
addresses the issue of breaking up a given search space 
into two subspaces. The assignment problem then maps 
subspaces to individual processors. 

There are essentially two techniques for partitioning 
a given search space: node splitting and stack splitting. 
In node splitting, the root node of a subtree is expanded
to generate a set of successor nodes. Each of these nodes
represents a distinct subspace. While node splitting is 
easy to understand and implement, it can result in 
search spaces of widely varying sizes. Since the objective 
of the assignment problem is to balance load while min- 
imizing work transfers, widely varying subtask sizes are 
not desirable. An alternative technique, called stack
splitting, attempts to partition a search space into two by
assigning some nodes at all levels leading up to the
specified node. Thus if the current node is at level 4, stack
splitting will split the stack by assigning some nodes at 
levels 1, 2, and 3 to each partition. In general, stack split- 
ting results in a more even partitioning of search spaces 
than node splitting. 
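A minimal sketch of stack splitting, assuming the DFS stack is stored as a list of levels, each holding the untried alternative nodes at that depth. The text only says that "some nodes at all levels" are given away. Handing over every other node at each level is one possible policy and is our assumption, as is the function name.

```python
def stack_split(stack):
    """Split a DFS stack between a donor and a recipient processor.

    stack[k] lists the untried alternatives at level k + 1 (root side first).
    Every level contributes alternate nodes to each partition, so both halves
    receive nodes near the root, which tend to head the larger subtrees.
    """
    donor = [level[0::2] for level in stack]
    recipient = [level[1::2] for level in stack]
    return donor, recipient

donor, recipient = stack_split([["a", "b", "c"], ["d", "e"], ["f"]])
```

By contrast, node splitting would hand out single successors of one node, whose subtrees may differ greatly in size.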

We can now formally state the assignment problem 
for parallel DFS as a mapping of subtasks to processors 
such that: 

• the work available at any processor can be partitioned
into independent work pieces as long as it is
more than some nondecomposable unit;

• the cost of splitting and transferring work to another
processor is not excessive (i.e., the cost associated
with transferring a piece of work is much less than
the computation cost associated with it);

• a reasonable work-splitting mechanism is available;
i.e., if work w at one processor is partitioned into two
parts $\psi w$ and $(1 - \psi)w$, then $1 - \alpha > \psi > \alpha$, where
$\alpha$ is an arbitrarily small constant;

• it is not possible (or is very difficult) to estimate the
size of total work at a given processor.

A number of mapping techniques have been proposed and analyzed in the literature [5,7,8,9,11,16]. These mapping techniques are initiated either by a processor with work (sender initiated: the processor with work initiates the work transfer) or by a processor looking for work (receiver initiated: an idle processor initiates the work transfer). In the global round robin (GRR) receiver initiated scheme, idle processors request work from other processors in a round-robin fashion, and a single (global) counter specifies the processor that must receive the next request for work. This ensures that work requests are uniformly distributed across all processors. However, this scheme suffers from contention at the processor holding the counter. Consequently, the performance of this scheme is poor beyond a certain number of processors. A message combining variant of this scheme (GRR-M) combines intermediate requests for the value of the global counter into a single request. This alleviates the contention and
performance bottleneck of the GRR scheme. In asynchronous round robin (ARR) balancing, each processor maintains its own local counter and uses it to select, in a round-robin manner, the next processor to query for work. While this scheme balances work requests in a local sense, these requests may become clustered in a global sense. In the random polling (RP) scheme, an idle processor sends a work request to a randomly selected target processor. In the near-neighbor (NN) load balancing scheme, an idle processor requests work from one of its immediate neighbors. This scheme has the drawback that localized hot spots may take a long time to even out.
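The receiver initiated target-selection rules can be contrasted in a small sequential Python sketch. `TargetSelector` is a hypothetical helper, and it models only the choice of whom to ask, not messaging or work transfer. For NN, a ring topology is assumed, so the neighbors of processor `me` are `me - 1` and `me + 1`. In every scheme, a processor is assumed never to ask itself.

```python
import random


class TargetSelector:
    """Hypothetical helper: whom an idle processor `me` asks for work (p >= 2)."""

    def __init__(self, p: int, scheme: str, seed: int = 0):
        if p < 2:
            raise ValueError("need at least two processors")
        self.p = p
        self.scheme = scheme
        self.global_counter = 0              # GRR: one counter shared by all
        self.local_counter = list(range(p))  # ARR: one private counter per processor
        self.rng = random.Random(seed)       # RP and NN: random choice

    def next_target(self, me: int) -> int:
        p = self.p
        if self.scheme == "GRR":
            target = self.global_counter
            if target == me:                 # never ask yourself
                target = (target + 1) % p
            self.global_counter = (target + 1) % p
        elif self.scheme == "ARR":
            target = (self.local_counter[me] + 1) % p
            if target == me:
                target = (target + 1) % p
            self.local_counter[me] = target
        elif self.scheme == "RP":
            t = self.rng.randrange(p - 1)    # uniform over the other p - 1 processors
            target = t if t < me else t + 1
        elif self.scheme == "NN":
            # assumed ring topology: the neighbors of `me` are me - 1 and me + 1
            target = (me + self.rng.choice((-1, 1))) % p
        else:
            raise ValueError(f"unknown scheme {self.scheme!r}")
        return target
```

In a real machine the GRR counter is a single shared object, which is the source of the contention described above. The ARR counters, by contrast, live on each processor.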

In sender initiated schemes a processor with work 
can give some of its work to a selected processor 
[6,16]. This class of schemes includes the master-slave 
(MS) and randomized allocation (RA) schemes. In the 
MS scheme, a processor, designated master, generates 
a fixed number of work pieces. These work-pieces are 
assigned to processors as they exhaust previously as- 
signed work. The master may itself become the bottle- 
neck when the number of processors is large. Multilevel 
master-slave algorithms have been used to alleviate this 
bottleneck. Randomized allocation schemes are sender 
initiated counterparts of RP schemes. In randomized 
allocation, a processor sends a part of its work to a ran- 
domly selected processor. 

The performance and scalability of these techniques are often dependent on the underlying architecture. Many of these techniques are, in principle, scalable, i.e.,
they result in linear speedup on increasing the number 
of processors p as long as the size of the search space 
grows fast enough with p. It is desirable that this re- 
quired rate of growth of problem size (also referred to 
as the iso-efficiency metric [10]) be as small as possible 
since it allows the use of a larger number of processors 
effectively for solving a given problem instance. In Ta- 
ble 1, we summarize the iso-efficiency functions of var- 
ious load balancing techniques. 
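The practical meaning of an iso-efficiency function can be shown with a small worked calculation. If the problem size must grow as $g(p)$ to hold efficiency fixed, then scaling from $p_0$ to $p_1$ processors requires the problem size to grow by the factor $g(p_1)/g(p_0)$. The sketch below compares two illustrative growth rates, $p \log p$ and $p^2 \log p$. The base of the logarithm cancels in the ratio.

```python
import math
from typing import Callable


def growth_factor(iso: Callable[[float], float], p_from: int, p_to: int) -> float:
    """Factor by which the problem size must grow, when scaling from p_from to
    p_to processors, to keep efficiency fixed if it must grow as iso(p)."""
    return iso(p_to) / iso(p_from)


def p_log_p(p: float) -> float:
    return p * math.log2(p)


def p2_log_p(p: float) -> float:
    return p * p * math.log2(p)


if __name__ == "__main__":
    print(growth_factor(p_log_p, 64, 1024))   # about 26.7
    print(growth_factor(p2_log_p, 64, 1024))  # about 426.7
```

Going from 64 to 1024 processors thus needs a problem roughly 27 times larger under $p \log p$ growth, but roughly 427 times larger under $p^2 \log p$ growth.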
Table 1: Scalability results of receiver initiated load balancing schemes for various architectures

| Scheme | Shared memory | Hypercube | Mesh (2D) | W/S cluster |
|---|---|---|---|---|
| ARR | $p^2 \log p$ | $p^2 \log^2 p$ | $p^{2.5} \log p$ | $p^3 \log p$ |
| NN | $p^2 \log p$ | $p^{\log(1+1/\alpha)} \log p$ | $k^{\sqrt{p}}$ | $p^3 \log p$ |
| GRR | $p^2 \log p$ | $p^2 \log p$ | $p^2 \log p$ | $p^2 \log p$ |
| GRR-M | $p \log p$ | $p \log^2 p$ | $p^{1.5} \log p$ | |
| RP | $p \log^2 p$ | $p \log^2 p$ | $p^{1.5} \log^2 p$ | $p^2 \log^2 p$ |
| Lower bound | $p$ | $p \log p$ | $p^{1.5}$ | $p^2$ |

IDA* and DFBB search techniques use this basic 
parallel DFS algorithm for searching state space. In 
IDA*, each processor has a copy of the global cost bound. Processors perform parallel DFS with this cost bound. At the end of each phase, the cost bound is updated using a single global operation. Some schemes that allow different processors to work with different cost bounds have also been explored. In this case, a solution cannot be deemed optimal until the search associated with all previous cost bounds has been completed. The DFBB technique uses a global current best solution to bound parallel DFS. Whenever a processor finds a better solution, it updates this global current best solution (using a broadcast on message passing machines and a lock-set on shared memory machines). DFBB and IDA* using these parallel DFS algorithms have been shown to yield excellent performance for various optimization problems [3,13,19].
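The shared-memory form of DFBB can be sketched in Python as a global incumbent updated under a lock. The problem (a small linear assignment problem), the one-worker-per-first-choice split and the names `Incumbent`, `dfbb` and `parallel_dfbb` are all hypothetical illustrations. Costs are assumed nonnegative, so the partial cost is a valid lower bound. Because of CPython's global interpreter lock, the threads show the locking protocol rather than real speedup.

```python
import threading
from typing import List, Optional, Tuple


class Incumbent:
    """Global current best solution, shared by all workers."""

    def __init__(self) -> None:
        self.cost = float("inf")
        self.assignment: Optional[Tuple[int, ...]] = None
        self._lock = threading.Lock()

    def offer(self, cost: float, assignment: List[int]) -> None:
        # Lock-set on update, as on a shared-memory machine.
        with self._lock:
            if cost < self.cost:
                self.cost = cost
                self.assignment = tuple(assignment)


def dfbb(cost, row, used, partial, partial_cost, best: Incumbent) -> None:
    """Depth-first branch-and-bound over row -> column assignments.
    Assumes nonnegative costs, so partial_cost is a lower bound."""
    if partial_cost >= best.cost:  # unlocked read: a stale (larger) value only weakens pruning
        return
    if row == len(cost):
        best.offer(partial_cost, partial)
        return
    for col in range(len(cost)):
        if not used[col]:
            used[col] = True
            partial.append(col)
            dfbb(cost, row + 1, used, partial, partial_cost + cost[row][col], best)
            partial.pop()
            used[col] = False


def parallel_dfbb(cost: List[List[float]]) -> Tuple[float, Optional[Tuple[int, ...]]]:
    n = len(cost)
    best = Incumbent()
    workers = []
    for col in range(n):  # one worker per subtree "row 0 -> column col"
        used = [False] * n
        used[col] = True
        t = threading.Thread(target=dfbb,
                             args=(cost, 1, used, [col], cost[0][col], best))
        workers.append(t)
        t.start()
    for t in workers:
        t.join()
    return best.cost, best.assignment
```

For example, `parallel_dfbb([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` returns `(5, (1, 0, 2))` regardless of thread timing. Pruning only discards subtrees that cannot beat the incumbent, so the optimum is always found.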

In many optimization problems, the successors of 
nodes tend to be strongly ordered. In such cases, naive 
parallel formulations that ignore this ordering infor- 
mation will perform poorly since they are likely to ex- 
pand a much larger subspace than those that explore 
nodes in the right order. Parallel DFS formulations for 
such spaces associate priorities with nodes. Nodes with 
largest depth and highest likelihood of yielding a solu- 
tion are assigned the highest priority. Parallel ordered 
DFS then expands these nodes in a prioritized fashion. 


Parallel Best-First Tree Search 


Best-first tree search algorithms rely on an open list (i.e., a list of unexplored configurations sorted on their quality) to sort available states on the basis of their heuristic solution estimate. If this heuristic solution estimate is guaranteed to be an underestimate (as is the case in the A* algorithm), it can be shown that the solution found by BFS is the optimal solution. The presence of a globally ordered open list makes it more difficult to parallelize BFS. In fact, at first glance, BFS may appear inherently serial, since a node with a higher estimated solution cost must be explored only after all nodes with lower costs have been explored. However, there may be multiple nodes with the best heuristic cost. If the number of such nodes is less than the number of available processors, then some of the nodes with poorer costs may also be explored. Since these nodes might never be explored by the serial algorithm, this may result in excess work by the parallel formulation, leading to deceleration anomalies. These speedup anomalies, which result from excess (or lesser) work done by the parallel formulations of state space search, are discussed later.

A simple parallel formulation of BFS uses a global 
open list. Each processor locks the list, extracts the best 
available node and unlocks the list. The node is ex- 
panded and heuristic estimates are determined for each 
successor. The open list is locked again and all suc- 
cessors are inserted into the open list. Note that since 
the state space is a tree, no replication checking is re- 
quired. The open list is typically maintained in the form 
of a global heap. The use of a global heap is a source of contention. If the time taken to lock, remove, and unlock the top element of the heap is $t_{access}$ and the time for expansion is $t_{exp}$, then the speedup of the formulation is bounded by $(t_{access} + t_{exp})/t_{access}$. A number of techniques have been developed to effectively reduce the access time [17]. These techniques support concurrent access to heaps stored in shared memory while maintaining strict insertion and deletion ordering. While these techniques increase the upper bound on possible speedup, the performance of these schemes is still bounded.
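A minimal Python sketch of the single global open list follows: a binary heap guarded by one lock, plus the speedup bound just stated. The class name `GlobalOpenList` and the tie-breaking counter are assumptions for illustration. Each expansion must pass through the lock once, which is where the bound $(t_{access} + t_{exp})/t_{access}$ comes from.

```python
import heapq
import threading
from typing import Iterable, Optional, Tuple


class GlobalOpenList:
    """Single shared open list: a binary heap guarded by one lock."""

    def __init__(self) -> None:
        self._heap = []
        self._lock = threading.Lock()
        self._tie = 0  # insertion counter, so equal estimates never compare states

    def insert_all(self, nodes: Iterable[Tuple[float, object]]) -> None:
        """Lock once and insert all successors (heuristic estimate, state)."""
        with self._lock:
            for f_value, state in nodes:
                heapq.heappush(self._heap, (f_value, self._tie, state))
                self._tie += 1

    def pop_best(self) -> Optional[Tuple[float, object]]:
        """Lock, remove the node with the best (smallest) estimate, unlock."""
        with self._lock:
            if not self._heap:
                return None
            f_value, _, state = heapq.heappop(self._heap)
            return f_value, state


def speedup_bound(t_access: float, t_exp: float) -> float:
    """Upper bound on speedup when every expansion needs one serialized access."""
    return (t_access + t_exp) / t_access
```

For instance, if a locked heap access costs one time unit and an expansion costs nine, `speedup_bound(1.0, 9.0)` is 10. No number of processors can exceed that.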

The contention associated with the global data 
structure can be alleviated by distributing the open list 
across processors. Now, instead of p processors shar- 
ing a single list, they operate on k distinct open lists. 
In the limiting case, each processor has its own open 
list. A simple parallel formulation based on this frame- 
work starts off with the initial state in one heap. As 
additional states are generated, they are shipped off to 
the other heaps. As nodes become available in other
heaps, processors start exploring associated state space 
using local BFS. While it is easy to keep all processors 
busy using this framework, it is possible that some of 
the processors may expand nodes with poor heuris- 
tic estimates that are never expanded by the serial for- 
mulation. To avoid this, we must ensure that all open 
lists have a share of the best globally available nodes. 
This is also referred to as quality equalization (the pro- 
cess of ensuring that all processors are working on re- 
gions of state-space of high quality). Since the quality of 
nodes evolves with time, quality equalization must be 
performed periodically. Several triggering mechanisms 
have been developed to integrate quality equalization 
with load balancing [1,19]. A simple triggering mecha- 
nism tracks the best node in the system. The best node 
in the local heap is compared to the best node in the 
system and if it is considerably worse, an equalization 
process is initiated. Alternatively, an equalization process may be initiated periodically. The movement of nodes between various heaps may itself follow a well-defined topology. Lists may be organized into rings,
shared blackboards, or hierarchical structures. These 
have been explored for several applications and archi- 
tectures. Speedups in excess of 950 have been demon- 
strated on 1024 processor hypercubes in the context of 
TSPs formulated as best-first tree search problems [2]. 
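The simple triggering mechanism and one way of performing quality equalization can be sketched in Python. This is a hypothetical illustration. Each processor's open list is a heap of `(estimate, state)` pairs. The trigger fires when some local best is worse than the global best by more than an assumed `slack`, or when a list is empty. Equalization then pools the best few nodes of every heap and deals them out round-robin, so each heap receives one of the globally best nodes.

```python
import heapq
from typing import List, Tuple

Node = Tuple[float, str]  # (heuristic estimate, state); smaller is better


def needs_equalization(heaps: List[List[Node]], slack: float) -> bool:
    """Trigger: some local best is much worse than the global best (or missing)."""
    tops = [h[0][0] for h in heaps if h]
    if not tops:
        return False
    global_best = min(tops)
    return any(not h or h[0][0] > global_best + slack for h in heaps)


def equalize(heaps: List[List[Node]], share: int = 1) -> None:
    """Give every heap a share of the globally best nodes (round-robin deal)."""
    k = share * len(heaps)  # the global top-k is guaranteed to be in the pool
    pooled: List[Node] = []
    for h in heaps:
        for _ in range(min(k, len(h))):
            pooled.append(heapq.heappop(h))
    pooled.sort()
    for i, node in enumerate(pooled):
        heapq.heappush(heaps[i % len(heaps)], node)
```

With local lists `[(1, 'a'), (2, 'b'), (3, 'c')]`, `[(10, 'x')]` and `[]`, the trigger fires for `slack=2`. After `equalize`, the three local bests are 1, 2 and 3.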


Searching State Space Graphs 


Searching state space graphs presents additional chal- 
lenges since we must check for replicated states during 
search. The simplest strategy for dealing with graphs is 
to unroll them into trees. The overhead of unrolling 
a graph into a tree may range from a constant to an 
exponential factor. If the overhead is a small constant 
factor, the resulting tree may be searched using parallel 
DFS or BFS based techniques. However, for most graph 
search problems, this is not a feasible solution. 

Graph search problems rely on a closed list (i.e. a list 
of all configurations that have been previously encoun- 
tered) that keeps track of all nodes that have already 
been explored. Closed lists are typically maintained as 
hash tables for searching. In a shared memory context, 
insertion of nodes into the closed list requires locking of 
the list. If there is a single lock associated with the entire list, the list must be locked approximately as many times as the total number of nodes expanded. This represents a serial bottleneck. The bottleneck can be alleviated by associating multiple locks with the closed list. Processors then lock only the relevant part of the closed list into which a node is being inserted.
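Associating multiple locks with the closed list amounts to lock striping. The Python sketch below (`StripedClosedList` is a hypothetical name) splits the hash table into stripes. Each stripe is protected by its own lock, so inserts of states that hash to different stripes do not serialize. The number of stripes is an arbitrary illustrative choice.

```python
import threading


class StripedClosedList:
    """Closed list split into stripes, each protected by its own lock."""

    def __init__(self, n_stripes: int = 16) -> None:
        self._sets = [set() for _ in range(n_stripes)]
        self._locks = [threading.Lock() for _ in range(n_stripes)]

    def _stripe(self, state) -> int:
        return hash(state) % len(self._sets)

    def insert_if_new(self, state) -> bool:
        """Insert `state` and return True if it was not yet closed;
        return False for a replicated state (the caller then discards it)."""
        i = self._stripe(state)
        with self._locks[i]:  # lock only the relevant part of the closed list
            if state in self._sets[i]:
                return False
            self._sets[i].add(state)
            return True

    def __len__(self) -> int:
        return sum(len(s) for s in self._sets)
```

When several threads generate the same states concurrently, exactly one `insert_if_new` call per distinct state returns True.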

Distributed memory versions of this parallel algo- 
rithm physically distribute the closed list across the pro- 
cessors. As nodes are generated, they are hashed to the 
appropriate processor that holds the respective part of 
the hash table. Search is performed locally at this pro- 
cessor and the node is explored further at this processor 
if required. This has two effects. First, if the hash function associated with the closed list is truly randomized, hashing has the effect of load balancing by randomized allocation. Second, since nodes are randomly allocated to processors, there is a probabilistic quality equalization for heuristic search techniques. These schemes have been studied by many researchers [14,15]. Assuming a perfectly random hash function, it has been shown that if the number of nodes originating at each processor grows as O(log p), then each processor will have an asymptotically equal number of nodes after the hash operation [15]. Since each node is associated with a communication, this puts constraints on the architecture bandwidth. Specifically, the bisection width of the underlying architecture must increase linearly with the number of processors for this formulation to be scalable.
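The mapping of generated nodes to the processors that own their part of the distributed closed list can be sketched in Python. As an assumption, a SHA-256 digest of the state's `repr` stands in for the "perfectly random" hash function. Every copy of a state goes to the same owner, so replication checking stays local, and the counts show the load balancing effect.

```python
import hashlib
from collections import Counter


def owner(state, p: int) -> int:
    """Processor holding `state`'s slot in a closed list spread over p processors.
    A SHA-256 digest stands in for a 'perfectly random' hash (an assumption)."""
    digest = hashlib.sha256(repr(state).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % p


def load_after_hashing(states, p: int) -> Counter:
    """Number of nodes each processor receives when every node goes to its owner."""
    return Counter(owner(s, p) for s in states)
```

Hashing 80,000 distinct states onto 8 processors gives each processor close to 10,000 nodes. Each of those nodes, however, costs one message, which is the bandwidth constraint noted above.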

A major drawback of graph search techniques such as BFS is that their memory requirement grows linearly with the search space. For large problems, this memory requirement becomes prohibitive. Many limited-
memory variants of heuristic search have been devel- 
oped. These techniques rely on retraction or delayed 
expansion of less promising nodes to reduce memory 
requirement. In the parallel processing context, retrac- 
tions lead to additional communication and indexing 
for parent-child relationships [4]. 


Anomalies in Parallel Search 


As we have seen above, it is possible for parallel formu- 
lations to do more or less work than the serial search 
algorithm. The ratio of nodes searched by the paral- 
lel and serial algorithms is called the search overhead 
factor (i.e. the ratio of excess work done by a parallel 
search formulation with respect to its serial formula- 
tion). A search overhead factor of greater than one in-
dicates a deceleration anomaly and less than one indi- 
cates an acceleration anomaly. An acceleration anomaly 
manifests itself in a speedup greater than p on p pro- 
cessors. It can be argued however that in these cases, 
the base sequential algorithm is suboptimal and a time- 
multiplexed serialization of the parallel algorithm is in 
fact a superior serial algorithm. 
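The relation between the search overhead factor and speedup can be made concrete with a small worked computation. The sketch assumes that every node costs the same to expand, that the $p$ processors stay perfectly balanced and that communication is free. The text does not make these assumptions. Under them, speedup equals $p$ divided by the search overhead factor, so a factor below one yields a speedup greater than $p$.

```python
def search_overhead_factor(nodes_parallel: int, nodes_serial: int) -> float:
    """Ratio of nodes expanded by the parallel and serial formulations."""
    return nodes_parallel / nodes_serial


def classify(factor: float) -> str:
    if factor > 1.0:
        return "deceleration anomaly"
    if factor < 1.0:
        return "acceleration anomaly"
    return "no anomaly"


def idealized_speedup(p: int, nodes_parallel: int, nodes_serial: int) -> float:
    """p / factor, assuming equal cost per node, perfect load balance and
    free communication (assumptions not made in the text)."""
    return p / search_overhead_factor(nodes_parallel, nodes_serial)
```

For example, 8 processors that together expand 1600 nodes where the serial search expands 2000 have a factor of 0.8, which is an acceleration anomaly, and an idealized speedup of 10.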

In DFS and related techniques, parallel formu- 
lations might detect solutions available close to the 
root on alternate branches, whereas serial formulations 
might search large parts of the tree to the left before 
reaching this node. Conversely, parallel formulations 
might also expand a larger number of nodes than the serial version. There are situations in which parallel DFS can have a search overhead factor of less than 1 on average, implying that the serial search algorithm is suboptimal in that situation. V. Kumar and V.N. Rao [18]
show that if no heuristic information is available to or- 
der the successors of a node, then on the average, the 
speedup obtained by parallel DFS is superlinear if the 
distribution of solutions is nonuniform. 

In BFS, the strength of the heuristic determines 
the search overhead factor. When strong heuristics are 
available, it is likely that expanding nodes with lower 
heuristic values will result in wasted effort. In general, it 
can be shown that for any given instance of BFS, there 
exists a number k such that expanding more than k 
nodes in parallel from a global open list leads to wasted 
computation [12]. This situation gets worse with dis- 
tributed open lists since expanded nodes have locally 
minimum heuristics that are not the best nodes across 
all open lists. In contrast, the search overhead factor can 
be less than one if there are multiple nodes with iden- 
tical heuristic estimates and one of the processors picks 
the right one. 


Applications of Parallel Search Techniques 


Parallel search techniques have been applied to a vari- 
ety of problems such as integer and mixed integer pro- 
gramming, and quadratic assignment for applications 
ranging from path planning and resource location to 
VLSI packaging. Quadratic assignment problems from the Nugent-Eschermann test suites with up to $4.8 \times 10^{10}$ nodes have been solved on parallel machines in days. Traveling salesman problems with thousands of cities and mixed integer programming problems with thousands of integer variables are within the reach of large scale parallel machines. While the use of parallelism increases the range of solvable problems, designing effective heuristic functions is critical. This has the effect of reducing the effective branching factor and thus inter-node concurrency. However, the computation of the heuristic can itself be performed in parallel. The use of intra-node parallelism in addition to inter-node parallelism has also been explored. While significant progress has been made in the effective use of parallelism in discrete optimization, with the development of new heuristic functions, opportunities for significant contributions abound.
In the classic unconstrained minimization problem, a continuously differentiable real-valued function f is given on a normed vector space X and the goal is to find points in X where the infimum of f is achieved or closely approximated. Descent methods for this problem start with some nonoptimal point $x^0$, search for a neighboring point $x^1$ where $f(x^1) < f(x^0)$, and so on ad infinitum. At each stage, the search is typically guided by a local model based on derivatives of f.

If f is convex and every local minimizer is therefore automatically a global minimizer, then well-designed descent methods can indeed generate minimizing sequences, i.e., sequences $\{x^k\}$ for which

$$\lim_{k \to \infty} f(x^k) = \inf_{x \in X} f(x). \qquad (1)$$

On the other hand, nonconvex cost functions can 
have multiple local minimizers and any of these may 
attract the iterates of the standard descent schemes. 
This behavior is examined here for a large class of 
gradient-related descent methods, and for local min- 
imizers that need not satisfy the usual nonsingular- 
ity hypotheses. In addition, the analytical formulation 
adopted yields nontrivial local convergence theorems in 
infinite-dimensional normed vector spaces X. Such the- 
orems are not without computational significance since 
they often help to explain emerging trends in algorithm
behavior for increasingly refined finite-dimensional ap- 
proximations to underlying infinite-dimensional opti- 
mization problems. 


Differentials and Gradients 


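In a general normed vector space X, the first (Fréchet) differential of f at a point x is a linear function $f'(x)\colon X \to \mathbb{R}^1$ that satisfies the following conditions:

$$|||f'(x)||| \overset{\text{def}}{=} \sup_{\|u\| = 1} |f'(x)u| < \infty, \qquad (2)$$

$$\lim_{\|d\| \to 0} \frac{|f(x+d) - f(x) - f'(x)d|}{\|d\|} = 0. \qquad (3)$$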
Since f'(x)d is linear in d, condition (2) holds if and only if f'(x)d is continuous in d. Condition (2) is automatically satisfied in any finite-dimensional space X. The remaining condition (3) asserts that $f(x) + f'(x)d$ asymptotically approximates $f(x+d)$ with an $o(\|d\|)$ error as d approaches zero. At most one linear function can satisfy these conditions in some norm on X. If conditions (2) and (3) do hold in the norm $\|\cdot\|$, then f is said to be (Fréchet) differentiable at x (relative to the norm $\|\cdot\|$). If f is differentiable near $x \in X$ and if

$$\lim_{\|y - x\| \to 0} |||f'(y) - f'(x)||| = 0, \qquad (4)$$

then f is continuously differentiable at x. Note that in finite-dimensional spaces, all norms are equivalent and conditions (2)-(4) hold in any norm if they hold in some norm. However, two norms on the same infinite-dimensional space need not be equivalent, and continuity and differentiability are therefore norm-dependent properties at this level of generality.

In the Euclidean space $X = \mathbb{R}^n$, f is continuously differentiable if and only if the partial derivatives of f are continuous; moreover, when f has continuous partial derivatives, f'(x) is specified by the familiar formula,

$$f'(x)d = \langle \nabla f(x), d \rangle, \qquad (5)$$

where $\langle \cdot, \cdot \rangle$ is the standard Euclidean inner product and $\nabla f(x)$ is the corresponding gradient of f at x, i.e.,

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$$

and

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x) \right).$$

When $\nabla f(\cdot)$ is continuous, conditions (2)-(4) can be proved for the linear function in (5) with a straightforward application of the chain rule, Cauchy's inequality and the one-dimensional mean value theorem. In addition, it can be shown that $d = \nabla f(x)$ is the unique solution of the equations,

$$\|d\| = |||f'(x)||| \qquad (6)$$

and

$$f'(x)d = |||f'(x)|||\,\|d\|, \qquad (7)$$

where $\|\cdot\|$ and $|||\cdot|||$ are induced by the Euclidean inner product on $\mathbb{R}^n$.
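Conditions (6)-(7) can be checked numerically in the Euclidean case. The sketch uses a hypothetical test function $f(x_1, x_2) = x_1^2 + 3x_1x_2$ at $x = (1, 2)$, where $\nabla f(x) = (8, 3)$. It estimates $|||f'(x)|||$ by sampling Euclidean unit vectors in $\mathbb{R}^2$ and compares the result with $\|\nabla f(x)\|$. It also approximates $f'(x)d$ by a central difference and compares it with $|||f'(x)|||\,\|d\|$. The sampling density and step size are arbitrary illustrative choices.

```python
import math


def f(x):
    # hypothetical test function f(x1, x2) = x1^2 + 3 x1 x2
    return x[0] ** 2 + 3 * x[0] * x[1]


def grad_f(x):
    # analytic partial derivatives of f
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


def directional_derivative(func, x, d, h=1e-6):
    """Central-difference estimate of f'(x)d."""
    xp = [xi + h * di for xi, di in zip(x, d)]
    xm = [xi - h * di for xi, di in zip(x, d)]
    return (func(xp) - func(xm)) / (2 * h)


def operator_norm_estimate(g, samples=3600):
    """Estimate |||f'(x)||| = sup of <g, u> over Euclidean unit vectors u in R^2."""
    angles = (2 * math.pi * k / samples for k in range(samples))
    return max(g[0] * math.cos(a) + g[1] * math.sin(a) for a in angles)


if __name__ == "__main__":
    x = [1.0, 2.0]
    d = grad_f(x)                       # candidate gradient vector [8.0, 3.0]
    dual = operator_norm_estimate(d)    # approximately |||f'(x)|||
    print(math.hypot(*d), dual)         # (6): both about 8.544
    print(directional_derivative(f, x, d), dual * math.hypot(*d))  # (7): both about 73
```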

The circumstances in the Euclidean space $\mathbb{R}^n$ suggest a natural extension of the gradient concept in general normed vector spaces X. Let f be differentiable at $x \in X$. Then any vector $d \in X$ that satisfies conditions (6)-(7) will be called a gradient vector for f at x. Note that the symbols $\|\cdot\|$ and $|||\cdot|||$ in (6)-(7) now signify the norm provided on X and the corresponding operator norm in (2). Depending on the space X, its norm $\|\cdot\|$ and the point x, conditions (6)-(7) may have no solutions for d, or a unique solution, or infinitely many solutions.

In any finite-dimensional space X, linear functions are continuous, the unit sphere $\{u \in X : \|u\| = 1\}$ is compact, the supremum in (2) is therefore attained at some unit vector u, and the existence of solutions d for (6)-(7) is consequently guaranteed. On the other hand, f may have infinitely many distinct gradients at a point x if the norm on X is not strictly convex. For example, if $X = \mathbb{R}^n$ and $\|x\| = \max_{1 \le i \le n} |x_i|$, then f'(x) is prescribed by (5), and

$$|||f'(x)||| = \sum_{i=1}^{n} \left| \frac{\partial f}{\partial x_i}(x) \right|.$$

Moreover, d is a gradient vector for f at x if and only if

$$d = |||f'(x)|||\, u$$

and

$$u_i \in \operatorname{sgn}\left( \frac{\partial f}{\partial x_i}(x) \right), \quad i = 1, \ldots, n,$$

where $\operatorname{sgn}(t) = \{-1\}$, $[-1, 1]$, or $\{1\}$ for $t < 0$, $t = 0$, and $t > 0$, respectively.
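The max-norm example can be checked directly. Given the partial derivatives of f at x (assumed known), the sketch builds a gradient vector $d = |||f'(x)|||\,u$ with $u_i \in \operatorname{sgn}(\partial f/\partial x_i)$. The hypothetical parameter `free` picks the value of $u_i$ in $[-1, 1]$ wherever a partial vanishes. A second function verifies (6)-(7) in the max-norm. Different `free` values give distinct vectors that all pass, which is the non-uniqueness described above.

```python
from typing import List


def max_norm_gradient(partials: List[float], free: float = 0.0) -> List[float]:
    """One gradient vector (6)-(7) for f at x when R^n carries the max-norm.

    partials: the partial derivatives of f at x (assumed known).
    free: hypothetical parameter choosing u_i in [-1, 1] where a partial vanishes.
    """
    if not -1.0 <= free <= 1.0:
        raise ValueError("free must lie in [-1, 1]")
    dual_norm = sum(abs(g) for g in partials)  # |||f'(x)||| is the l1 norm
    u = [1.0 if g > 0 else -1.0 if g < 0 else free for g in partials]
    return [dual_norm * ui for ui in u]


def check_gradient(partials: List[float], d: List[float], tol: float = 1e-12) -> bool:
    """True if d satisfies (6) and (7) for the max-norm on R^n."""
    dual_norm = sum(abs(g) for g in partials)
    norm_d = max(abs(di) for di in d)                  # max-norm of d
    slope = sum(g * di for g, di in zip(partials, d))  # f'(x)d via (5)
    return abs(norm_d - dual_norm) <= tol and abs(slope - dual_norm * norm_d) <= tol
```

With partials $(2, 0, -1)$, every choice of `free` in $[-1, 1]$ yields a valid gradient $(3, 3\,\text{free}, -3)$.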

The existence of gradients can also be proved in re- 
flexive infinite-dimensional spaces X where bounded 
linear functions are weakly continuous and closed unit 
balls are weakly compact. In nonreflexive spaces, conditions (6)-(7) may not have solutions d; however, in any normed vector space and for any fixed arbitrarily small $\nu \in (0, 1)$, the relaxed conditions,

$$\|d\| = |||f'(x)||| \qquad (8)$$

and

$$f'(x)d \ge (1-\nu)\, |||f'(x)|||\, \|d\|, \qquad (9)$$

always have solutions d. This follows easily from (2) and the meaning of sup: pick a unit vector u with $|f'(x)u| \ge (1-\nu)|||f'(x)|||$, replace u by $-u$ if necessary so that $f'(x)u \ge 0$, and put $d = |||f'(x)|||\,u$. The solutions of (8)-(9) will be called ν-approximate gradients of f at x. They occupy a central position in the present formulation of the subject algorithms.


Gradient-Related Descent Methods 


If $f'(x) = 0$, then x is called a stationary point of f. If $f'(x) \neq 0$, then x is not stationary and the set $\{d \in X : f'(x)d < 0\}$ is a nonempty open half-space. An element d in this half-space is called a descent vector since condition (3) immediately implies that $f(x + td) < f(x)$ when t is positive and sufficiently small. If d is a ν-approximate gradient at a nonstationary point x, then according to (8)-(9),

$$f'(x)(-d) \le -(1-\nu)\, |||f'(x)|||^2 < 0.$$

Hence $-d$ is a descent vector. In particular, if d is a gradient at a nonstationary point x, then $-d$ is a steepest descent vector in the sense that

$$f'(x)(-d) \le f'(x)v$$

for all $v \in X$ such that $\|v\| = \|d\|$.

Suppose that ν, $\mu_1$, and $\mu_2$ are fixed positive numbers, with $\nu \in (0, 1)$ and $\mu_2 \ge \mu_1 > 0$. At each $x \in X$, let $G^\nu(x)$ denote the nonempty set of ν-approximate gradients for f at x and let G(x) be a nonempty subset of the set of all multiples $\mu d$ with $\mu \in [\mu_1, \mu_2]$ and $d \in G^\nu(x)$, i.e.,

$$G^\nu(x) = \{ d \in X : d \text{ satisfies (8)-(9)} \}, \qquad (10)$$

$$\emptyset \neq G(x) \subseteq \bigcup_{\mu \in [\mu_1, \mu_2]} \mu\, G^\nu(x). \qquad (11)$$

The corresponding set-valued mapping G(·) is referred to here as a gradient-related set function with parameters ν, $\mu_1$ and $\mu_2$. In the present development, a gradient-related iterative descent method consists of a gradient-related set function G(·), a rule that selects a vector $d^k \in G(x^k)$ at each iterate $x^k$, and another rule that determines the steplength parameter $s^k \in (0, 1]$ in the recursion,

$$x^{k+1} = x^k - s^k d^k, \qquad (12)$$

once $d^k$ has been chosen. The sequences $\{x^k\}$ generated by this recursion are called gradient-related successive approximations. (For related formulations, see [3,4,10].) The convergence theorems described later in this article depend only on basic properties of gradient-related set functions and the steplengths $s^k$, hence the precise nature of the rule for selecting $d^k$ in $G(x^k)$ is not important here. This rule may refer to prior iterates $\{x^i\}_{i<k}$, or may even be random in nature. There are also many alternative steplength rules that achieve sufficient reductions in f at each iteration in (12) and move the successive approximations $x^k$ toward regions in the domain of f that are interesting in at least a local sense [3,4,10].


Descent Method Prototypes 


When gradients of f exist and f attains its infima on lines in X, the steepest descent and exact line minimization rules for d and s yield the prototype steepest descent method,

$$x^{k+1} = x^k - s^k d^k, \qquad (13)$$

where

$$s^k \in \arg\min_{t \ge 0} f(x^k - t d^k) \qquad (14)$$

and $d^k$ is any solution of (6)-(7) for $x = x^k$. Note that the actual reduction in f achievable on a steepest descent half-line $\{y \in X : \exists\, t \ge 0,\ y = x - td\}$ may be smaller than that attainable on other half-lines, since (10) merely refers to norm-dependent local directional rates of change for f at x. Thus the name of this method is somewhat misleading.
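The prototype (13)-(14) has a simple closed form in one special case. For a quadratic $f(x) = \tfrac12\langle x, Ax\rangle - \langle b, x\rangle$ with A symmetric positive definite and the Euclidean norm, the gradient is $d = Ax - b$ and the exact line minimizer is $t = \langle d, d\rangle / \langle d, Ad\rangle$. The quadratic and the Euclidean setting are assumptions made for illustration. In another norm the "steepest" direction, and hence the iterates, would change.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def steepest_descent_quadratic(A: Matrix, b: Vector, x: Vector,
                               iters: int = 200, tol: float = 1e-14) -> Vector:
    """Method (13)-(14) for f(x) = 0.5 <x, Ax> - <b, x>, A symmetric positive
    definite, Euclidean norm, so d^k = grad f(x^k) = A x^k - b."""
    for _ in range(iters):
        d = [ai - bi for ai, bi in zip(matvec(A, x), b)]  # gradient at x
        dd = dot(d, d)
        if dd <= tol * tol:                              # (numerically) stationary
            break
        s = dd / dot(d, matvec(A, d))  # exact minimizer of t -> f(x - t d), t >= 0
        x = [xi - s * di for xi, di in zip(x, d)]
    return x
```

With $A = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}$ and $b = (1, 1)$, the iterates started at the origin converge to the minimizer $(0.2, 0.4)$.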

Newtonian descent algorithms also amount to special gradient-related descent methods near a certain type of nonsingular local minimizer $x^*$. These schemes employ variants of the restricted line minimization steplength rule,

$$s^k \in \arg\min_{t \in (0,1]} f(x^k - t d^k), \qquad (15)$$

and replace the gradients $d^k$ in a steepest descent iteration by descent vectors that approximate the Newton increment,

$$d_N(x^k) \overset{\text{def}}{=} f''(x^k)^{-1} f'(x^k). \qquad (16)$$

Gradient-related descent vector approximations to $d_N(x^k)$ are generated in some neighborhood of $x^*$ by various quasi-Newton auxiliary recursions, provided that the following (interdependent) nonsingularity conditions hold:

i) f is twice continuously (Fréchet) differentiable at $x^*$;

ii) $f''(x^*)$ satisfies the coercivity condition

$$(f''(x^*)v)v \ge c\,\|v\|^2$$

for some $c > 0$ and all $v \in X$;

iii) a bounded inverse map $f''(x)^{-1}$ exists for all x sufficiently near $x^*$;

iv) $f''(\cdot)^{-1}$ is continuous at $x^*$.

Near a nonsingular local minimizer, the local con- 
vergence rates for Newtonian descent methods are gen- 
erally much faster than the steepest descent conver- 
gence rate [8,10]. On the other hand, near singular local minimizers the Newton increments $d_N(x^k)$ and their quasi-Newton approximations are typically not confined to the image sets $G(x^k)$ of some gradient-related set function G(·), and may actually be undefined on continuous manifolds in X containing $x^*$. Under these circumstances, the unmodified Newtonian scaling principles can degrade or even destroy local convergence. In
any case, the convergence properties of Newtonian de- 
scent methods near singular local minimizers x* are not 
well-understood, and are likely to depend on the higher 
order structure of the singularity at x*. 
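A one-dimensional Python sketch illustrates the contrast between nonsingular and singular minimizers. It applies pure Newton steps $x \leftarrow x - d_N(x)$ with $s = 1$ rather than the restricted line search (15). The two test functions are hypothetical. For $f(x) = x^2$, the minimizer is nonsingular ($f''(0) = 2$) and one step reaches it. For $f(x) = x^4$, we have $f''(0) = 0$, so conditions ii)-iv) fail at $x^* = 0$. Here $d_N(x) = x/3$, and each step only multiplies the iterate by $2/3$, giving linear rather than fast local convergence.

```python
from typing import Callable, List


def newton_iterates(fprime: Callable[[float], float],
                    fsecond: Callable[[float], float],
                    x0: float, steps: int) -> List[float]:
    """Pure Newton steps x <- x - d_N(x), d_N(x) = f''(x)^{-1} f'(x), with s = 1."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - fprime(x) / fsecond(x)
        xs.append(x)
    return xs


if __name__ == "__main__":
    # f(x) = x^2: nonsingular minimizer at 0; the first step lands on it.
    print(newton_iterates(lambda x: 2 * x, lambda x: 2.0, 1.0, 3))
    # f(x) = x^4: singular minimizer at 0; each step multiplies x by 2/3.
    print(newton_iterates(lambda x: 4 * x ** 3, lambda x: 12 * x ** 2, 1.0, 5))
```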


The Armijo Steplength Rule 


The line minimization steplength rules in (14) and (15) 
can be very effective in special circumstances; however, 
they are more often difficult or impossible to implement, and are not intrinsically ‘optimal’ in any general sense when coupled with standard descent direction rules based on local models of f. By their very nature, such schemes do not anticipate the effect of current search direction and steplength decisions on the reductions achievable in f in later stages of the calculation. Therefore, over many iterations, the exact line minimization rule may well produce smaller total reductions in f than other much simpler steplength rules that merely aim for local reductions in f that are ‘large enough’ compared with $|||f'(x)|||$ at each iteration. A. Goldstein and L. Armijo proposed the first practical steplength rules of this kind in [1,8,9] for steepest descent and Newtonian descent methods in $\mathbb{R}^n$. These rules and other related schemes described in [10] and [4] are easily adapted to general gradient-related iterations. The present development focuses on the local convergence properties of the simple Armijo rule described below; however, with minor modifications, the theorems set forth here extend readily to the Goldstein rule and other similar line search formulations.

Let G(·) be a gradient-related set function with parameters ν, $\mu_1$ and $\mu_2$. Fix β in (0, 1) and δ in (0, 1), and for each x in X and d in G(x) construct $s(x, d) \in (0, 1]$ with the Armijo steplength rule,

$$s(x,d) = \max t \qquad (17)$$

subject to

$$t \in \{1, \beta, \beta^2, \ldots\}$$

and

$$f(x) - f(x - td) \ge \delta\, t\, f'(x)d.$$

When x is not stationary and $-d$ is any descent vector, the rule (17) admits precisely one associated steplength $s(x, d) \in (0, 1]$. This is true because $\beta^k$ converges to zero as $k \to \infty$ and

$$f(x) - f(x - td) = \delta f'(x)td + (1-\delta) f'(x)td + o(t) \ge \delta f'(x)td$$

for t positive and sufficiently small, in view of (3). When x is stationary, (17) yields $s(x, d) = 1$ trivially for every vector d.
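A minimal Python sketch of the Armijo rule (17), driving the recursion (12), follows. The choice of direction is an assumption made for illustration: it uses the simplest member of a gradient-related set function, $d = \nabla f(x)$ in the Euclidean norm ($\mu = 1$, $\nu = 0$). The values β = 0.5 and δ = 0.3, the stopping tolerance and the quadratic test function are illustrative choices, not prescribed by the text.

```python
from typing import Callable, List

Vector = List[float]


def armijo_step(f: Callable[[Vector], float], x: Vector, d: Vector, slope: float,
                beta: float = 0.5, delta: float = 0.3, max_trials: int = 60) -> float:
    """s(x, d) from rule (17): the largest t in {1, beta, beta^2, ...} with
    f(x) - f(x - t d) >= delta * t * f'(x)d, where slope = f'(x)d > 0."""
    fx = f(x)
    t = 1.0
    for _ in range(max_trials):
        trial = [xi - t * di for xi, di in zip(x, d)]
        if fx - f(trial) >= delta * t * slope:
            return t
        t *= beta
    raise RuntimeError("no Armijo steplength found; is -d a descent vector?")


def gradient_related_descent(f, grad, x: Vector, tol: float = 1e-10,
                             max_iter: int = 10_000) -> Vector:
    """Recursion (12) with d^k = grad f(x^k) (Euclidean gradient: mu = 1, nu = 0)
    and s^k given by the Armijo rule."""
    for _ in range(max_iter):
        d = grad(x)
        slope = sum(di * di for di in d)  # f'(x)d = <grad f(x), d>
        if slope <= tol * tol:            # (numerically) stationary point
            break
        s = armijo_step(f, x, d, slope)
        x = [xi - s * di for xi, di in zip(x, d)]
    return x
```

For $f(x) = (x_1 - 1)^2 + 10(x_2 + 2)^2$ started at the origin, the first Armijo steplength is $\beta^4 = 0.0625$, and the iterates converge to $(1, -2)$.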


Fixed Points 


Descent methods based on gradient-related set functions and Armijo's rule generate sequences $\{x^k\}$ that satisfy

$$x^{k+1} \in T(x^k), \quad k = 0, 1, \ldots, \qquad (18)$$

where

$$T(x) \overset{\text{def}}{=} \{ y : \exists\, d \in G(x),\ y = x - s(x,d)d \}. \qquad (19)$$


The convergence theory outlined in the following 
sections addresses the behavior of all such Armijo 
gradient-related sequences near fixed points of the set-valued map $T(\cdot)\colon X \to 2^X$. The roots of this theory lie in Bertsekas' convergence proof for steepest descent iterates near nonsingular local minimizers in $\mathbb{R}^n$ [2], and
subsequent modifications of this proof strategy for gra- 
dient projection methods and singular local minimiz- 
ers in finite-dimensional or infinite-dimensional vector 
spaces with inner products [6,7]. For related nonlocal 
theories, see [10] and [4]. 

By definition, $x^*$ is a fixed point of T(·) if and only if

$$x^* \in T(x^*).$$

Since Armijo's rule produces nonzero steplengths s(x, d), it follows that $x^*$ is a fixed point of T(·) if and only if $x^*$ is a stationary point of f. More precisely,

Proposition 1 Let T(·) be an Armijo gradient-related iteration map in (19). Then for all $x \in X$,

$$x \in T(x) \iff T(x) = \{x\} \iff G(x) = \{0\} \iff f'(x) = 0. \qquad (20)$$

According to Proposition 1, any Armijo gradient-related sequence $\{x^k\}$ that intercepts a fixed point $x^*$ of T(·) must terminate in $x^*$. Conversely, if $\{x^k\}$ terminates in a vector $x^*$, then $x^*$ is a fixed point of T(·), and hence a stationary point of f. On the other hand, Armijo gradient-related sequences that merely pass near some stationary point $x^*$ may or may not converge to $x^*$.


Local Attractors: Necessary Conditions 


A vector $x^*$ is said to be a local attractor for an Armijo gradient-related iteration (18) if and only if there is a nonempty open ball,

$$B(x^*, \rho) = \{x \in X : \|x - x^*\| < \rho\}$$

with center $x^*$ and radius $\rho > 0$ such that every sequence $\{x^k\}$ which satisfies (18) and enters the ball $B(x^*, \rho)$ must converge to $x^*$, i.e.,

$$\exists\, l,\ x^l \in B(x^*, \rho) \implies \lim_{k \to \infty} \|x^k - x^*\| = 0. \qquad (21)$$

With Proposition 1 and another rudimentary result for gradient-related set functions and Armijo steplengths, it is readily shown that a local attractor must be a strict local minimizer of f and an isolated stationary point of f:

Proposition 2 Let $\nu \in (0, 1)$, $\mu_1 > 0$, and $\delta \in (0, 1)$ be fixed parameter values in the gradient-related set function G(·) and Armijo rule (17), and put $c_1 = \delta(1-\nu)\mu_1 > 0$. Then for all $x \in X$ and $d \in G(x)$,

$$f(x) - f(x - s(x,d)d) \ge c_1\, s(x,d)\, |||f'(x)|||^2. \qquad (22)$$

Corollary 3 Let T(·) be the Armijo gradient-related iteration map in (19). If $\{x^k\}$ is generated by the corresponding gradient-related iteration (18), then for all $k = 0, 1, \ldots$,

$$f(x^{k+1}) \le f(x^k) \qquad (23)$$

and

$$x^{k+1} \neq x^k \implies f(x^{k+1}) < f(x^k). \qquad (24)$$


Since f is continuous, the claimed necessary conditions 
for local attractors are now immediate consequences of 
Proposition 1 and Corollary 3. 


Theorem 4 A vector $x^*$ is a local attractor for an Armijo gradient-related iteration (18) only if $x^*$ is an isolated stationary point and a strict local minimizer of f, i.e., only if there is a nonempty open ball $B(x^*, \rho^*)$ that excludes every other stationary point $x \neq x^*$, and also excludes points $x \neq x^*$ at which $f(x) \le f(x^*)$.


The conclusion in Theorem 4 actually applies more generally to set-valued iteration maps T(·) prescribed by any steplength rule that guarantees the fixed-point characterization (20) and the descent property (23)-(24). On the other hand, related converse assertions are
tied more closely to special properties of the Armijo 
rule and its variants, and to certain local uniform 
growth conditions on f and |||f'(·)|||. If X is a finite-dimensional space, and x* is a strict local minimizer and an isolated stationary point, then the requisite uniform growth conditions automatically hold near x* and
the full converse of Theorem 4 can be proved. If X is an 
infinite-dimensional space, the growth conditions be- 
come hypotheses in a weaker but still nontrivial partial 
converse of Theorem 4. This is explained in greater de- 
tail below. 


Local Attractors: Sufficient Conditions 


If x* is a strict local minimizer of f, then for some ρ* > 0 and all x in the closed ball,

$$\bar B(x^*,\rho^*) = \{x \in X : \|x - x^*\| \le \rho^*\},$$

the quantity f(x) − f(x*) is strictly positive when x ≠ x*. In finite-dimensional spaces, it is possible to say more. If dim X < ∞, then for each t ∈ (0, ρ*] the corresponding closed annulus,

$$A(t,\rho^*) = \{x : t \le \|x - x^*\| \le \rho^*\}, \tag{25}$$

is compact. Since the function f(·) − f(x*) is continuous and positive in A(t, ρ*), it must attain a positive minimum value in this set, i.e.,

$$\alpha(t) \overset{\text{def}}{=} \min_{x \in A(t,\rho^*)} f(x) - f(x^*) > 0, \tag{26}$$

for each t ∈ (0, ρ*]. Put α(0) = 0 and note that for all t1, t2,

$$0 < t_1 < t_2 \le \rho^* \;\Longrightarrow\; A(t_1,\rho^*) \supset A(t_2,\rho^*) \;\Longrightarrow\; \alpha(t_1) \le \alpha(t_2).$$

This establishes the following uniform growth property for strict local minimizers in finite-dimensional spaces.


Lemma 5 Let X be a finite-dimensional normed vector space. If x* is a strict local minimizer for f, then there is a positive number ρ* and a positive definite nondecreasing real-valued function α(·) on [0, ρ*] such that,

$$f(x) - f(x^*) \ge \alpha(\|x - x^*\|), \tag{27}$$

for all x ∈ B̄(x*, ρ*).
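For a concrete feel for (26) and (27), the following sketch estimates α(t) by sampling the annulus (25) for the hypothetical function f(x1, x2) = x1² + x2⁴ on R² with the Euclidean norm, which has a strict (but singular) minimizer at x* = 0. The choice ρ* = 0.5 and the grid sizes are arbitrary. Because f increases along every ray from 0 and, on a circle of radius t ≤ 1/√2, is smallest at x = (0, ±t), the exact value here is α(t) = t⁴; the sampled minimum should agree with it and be positive and nondecreasing in t.

```python
import math

# Hypothetical example: strict minimizer x* = 0 of f(x1, x2) = x1**2 + x2**4.
def f(x1, x2):
    return x1 ** 2 + x2 ** 4


def alpha_estimate(t, rho_star=0.5, n_radii=50, n_angles=720):
    """Sampled minimum of f(x) - f(x*) over the annulus t <= ||x|| <= rho_star."""
    best = math.inf
    for i in range(n_radii + 1):
        r = t + (rho_star - t) * i / n_radii
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            best = min(best, f(r * math.cos(theta), r * math.sin(theta)))
    return best  # f(x*) = 0, so no offset is needed


ts = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
alphas = [alpha_estimate(t) for t in ts]
for t, a in zip(ts, alphas):
    print(f"t = {t:.2f}  sampled alpha = {a:.3e}  t**4 = {t ** 4:.3e}")
```

Sampling can only overestimate a minimum, so in general such an estimate is an upper bound on α(t); here the grid happens to contain the exact minimizers.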


In infinite-dimensional spaces, the uniform growth condition (27) need not hold at every strict local minimizer; however, when this condition is satisfied, the minimizer x* has a crucial stability property for gradient-related descent methods. More specifically, suppose that (27) holds and T(·) is an Armijo iteration map (19) with associated parameter μ2 > 0. Since descent directions cannot exist at a local minimizer, the vector x* must be a stationary point. Fix ε ∈ (0, ρ*] and note that since f'(·) is continuous and f'(x*) = 0, there is a t_ε ∈ (0, ε] for which

$$\|x - x^*\| + \mu_2 \lvert\lvert\lvert f'(x)\rvert\rvert\rvert < \varepsilon \tag{28}$$

for all x ∈ B(x*, t_ε). Now construct the corresponding set,

$$I(\varepsilon) = \{x \in B(x^*,\varepsilon) : f(x) - f(x^*) < \alpha(t_\varepsilon)\}. \tag{29}$$

Every x ∈ I(ε) actually lies in B(x*, t_ε), since otherwise t_ε ≤ ||x − x*|| < ε ≤ ρ* and (27) would give f(x) − f(x*) ≥ α(t_ε). By Proposition 2, the simple descent property,

$$f\bigl(x - s(x,d)\,d\bigr) \le f(x), \tag{30}$$

holds for all x and all d ∈ G(x); hence the restriction (28) and the properties of α(·) ensure that I(ε) is an invariant set for T(·), i.e., T(x) ⊂ I(ε) for all x ∈ I(ε). Moreover, since f is continuous, the minimizer x* is clearly an interior point of the set I(ε), and this proves the following stability lemma for Armijo gradient-related iterations (or indeed, any gradient-related method with the descent property (30)).


Lemma 6 Suppose that the uniform growth condition (27) holds near a local minimizer x* for f. Let T(·) be an Armijo gradient-related iteration map in (19). Then for every ε > 0 there is a corresponding ρ ∈ (0, ε] such that for all sequences {x^k} satisfying (18), and all indices l,

$$x^l \in B(x^*,\rho) \;\Longrightarrow\; \forall k \ge l:\; x^k \in B(x^*,\varepsilon). \tag{31}$$
According to Lemma 6, the uniform growth condition (27) guarantees that an Armijo gradient-related sequence {x^k} will remain in any specified, arbitrarily small open ball B(x*, ε) provided {x^k} enters a sufficiently small sub-ball of B(x*, ε). This property alone does not imply that {x^k} converges to x*; however, it is an essential ingredient in the local convergence proof outlined below. This proof requires two additional technical estimates for the Armijo rule and gradient-related set functions, a local uniform growth condition for |||f'(·)||| analogous to (27), and a local uniform continuity hypothesis on f'(·). The first two estimates are straightforward consequences of the Armijo rule and the one-dimensional mean value theorem. The last two requirements are automatically satisfied in finite-dimensional spaces, once again because closed bounded sets are compact in these spaces.
Proposition 7 Let ν ∈ (0, 1), μ2 > 0, β ∈ (0, 1), and δ ∈ (0, 1) be fixed parameter values in the gradient-related set function G(·) and Armijo rule (17), and put c2 = δ(1 − ν)μ2^{-1} > 0. Then for all x ∈ X and d ∈ G(x),

$$f(x) - f\bigl(x - s(x,d)\,d\bigr) \ge c_2\, s(x,d)^2\, \|d\|^2. \tag{32}$$

Moreover, if s(x, d) < 1 and c3 = (1 − δ)(1 − ν), then there is a vector ξ in the line segment joining x to x − β^{-1}s(x, d)d such that

$$\lvert\lvert\lvert f'(\xi) - f'(x)\rvert\rvert\rvert \ge c_3\, \lvert\lvert\lvert f'(x)\rvert\rvert\rvert \tag{33}$$

and

$$\|\xi - x\| \le \beta^{-1} s(x,d)\,\|d\|. \tag{34}$$


Lemma 8 Let X be a finite-dimensional normed vector space. If x* is an isolated stationary point for f, then there is a positive number ρ* and a positive definite nondecreasing real-valued function β(·) on [0, ρ*] such that,

$$\lvert\lvert\lvert f'(x)\rvert\rvert\rvert \ge \beta(\|x - x^*\|), \tag{35}$$

for all x ∈ B̄(x*, ρ*).


The proof of Lemma 8 is similar to the proof of 
Lemma 5. 

Now suppose that the growth conditions (27) and (35) both hold in the ball B̄(x*, ρ*), and that f'(·) is uniformly continuous in this ball. By Lemma 6, there is a positive number ρ ∈ (0, ρ*/2] such that every sequence {x^k} which satisfies (18) and enters the ball B(x*, ρ) thereafter remains in the larger ball B(x*, ρ*/2). But if {x^k} is eventually confined to the ball B(x*, ρ*/2), then the mean value theorem ensures that the nonincreasing real sequence {f(x^k)} is bounded below and therefore converges to some finite limit. In this case, the differences f(x^k) − f(x^{k+1}) converge to zero and Propositions 2 and 7 therefore yield

$$\lim_{k\to\infty} s(x^k,d^k)\, \lvert\lvert\lvert f'(x^k)\rvert\rvert\rvert^2 = 0 \tag{36}$$

and

$$\lim_{k\to\infty} s(x^k,d^k)\, \|d^k\| = 0, \tag{37}$$

where d^k ∈ G(x^k) and s(x^k, d^k) d^k = x^k − x^{k+1} for all k. It follows easily from the remainder of Proposition 7 and the growth condition (35) that

$$\lim_{k\to\infty} \lvert\lvert\lvert f'(x^k)\rvert\rvert\rvert = 0 \tag{38}$$

and therefore

$$\lim_{k\to\infty} \|x^k - x^*\| = 0. \tag{39}$$

To see that (38) must hold, construct the index sets ω = {k : s(x^k, d^k) = 1} and σ = {k : s(x^k, d^k) < 1}. If ω is an infinite set, then

$$\lim_{k\in\omega,\,k\to\infty} \lvert\lvert\lvert f'(x^k)\rvert\rvert\rvert = 0$$

by (36). On the other hand, if σ is an infinite set, then

$$\lim_{k\in\sigma,\,k\to\infty} \lvert\lvert\lvert f'(x^k)\rvert\rvert\rvert = 0$$

by (37), (33), (34), and the local uniform continuity of f'(·): along σ, (34) and (37) give ||ξ^k − x^k|| → 0 for the vectors ξ^k of Proposition 7, uniform continuity then gives |||f'(ξ^k) − f'(x^k)||| → 0, and (33) forces |||f'(x^k)||| → 0. This establishes (38) and proves the following local convergence results.


Theorem 9 If the uniform growth conditions (27) and (35) hold simultaneously in the closed ball B̄(x*, ρ*) for some ρ* > 0, and if f'(·) is uniformly continuous in B̄(x*, ρ*), then x* is a local attractor for Armijo gradient-related iterations (18).


Corollary 10 If X is a finite-dimensional normed vec- 
tor space and x* is a strict local minimizer and an iso- 
lated stationary point for f, then x* is a local attractor 
for Armijo gradient-related iterations (18). 


Nonsingular Attractors 


The nonsingularity conditions i) and ii) and Taylor's formula imply that in some neighborhood of x*, the objective function f is convex and satisfies the local growth condition (27) with

$$\alpha(t) = a t^2 \tag{40}$$

for some a > 0. But if f is locally convex near x*, then

$$0 \le f(x) - f(x^*) \le f'(x)(x - x^*) \le \lvert\lvert\lvert f'(x)\rvert\rvert\rvert\, \|x - x^*\| \tag{41}$$

near x*, and therefore (27) and (40) imply (35) with

$$\beta(t) = a t. \tag{42}$$

These observations and Theorem 9 immediately yield the following extension of the convergence result in [2] for steepest descent processes in R^n.

Corollary 11 Every nonsingular local minimizer x* is a local attractor for Armijo gradient-related iterations (18).
Singular Attractors and Local Convexity


The growth condition (27) alone does not imply local 
convexity of f, or condition (35), or the local attractor 
property. In fact, (27) can hold even if x* is the limit of 
some infinite sequence of local minimizers for f. This is 
readily demonstrated by the following simple function 
F: R^1 → R^1:


F(x) = x? | v3 sin (J - vans’) : (43) 


This function has a strict absolute minimizer at x* = 0, with

$$(\sqrt{2} - 1)\,x^2 \le F(x) \le (\sqrt{2} + 1)\,x^2$$

for all x ∈ R^1. However, F also has infinitely many (nonsingular) local minimizers,


So k — 8m) 
x~ = +exp | ————— 
m Pp 8/3 
for m = 1, 2,..., and these local minimizers accumulate 


at 0. Since each x_m is a stationary point and not an absolute minimizer, it follows that F is not convex in any
neighborhood of the absolute minimizer at x* = 0, that 
(35) cannot hold at x*, and that x* is not a local attractor for gradient-related descent processes. Evidently, x* = 0 is a singular minimizer for F; in fact, F''(x) does not exist at x = 0. (Apart from a minor alteration in one
of its constants, (43) is taken directly from [6, Example 
1.1]. The erroneous constant in [6] was kindly called to 
the author’s attention by D. Bertsekas.) 

The growth conditions (27) and (35) together still 
do not imply convexity of f near x*, and indeed f may 
not be convex in any neighborhood of a singular local 
attractor. This is shown by another function F: R^2 → R^1 from [6, Example 1.2], viz.

$$F(x) = x_1^2 - 1.98\,x_1\|x\|^2 + \|x\|^4, \tag{44}$$

where x = (x1, x2) and ||·|| is the Euclidean norm in R^2. This function has a singular absolute minimizer at x* = 0, and F(x) and |||F'(x)||| grow like ||x||⁴ and ||x||³, respectively, near 0 (indeed, completing the square gives F(x) = (x1 − 0.99||x||²)² + 0.0199||x||⁴). On the other hand, since every neighborhood of 0 contains points x where F'(x)(x − 0) is
negative, it follows that F is not convex (or even pseu- 
doconvex) near 0. Nevertheless, x* = 0 is a local attrac- 
tor for Armijo gradient-related iterations, according to 
Corollary 10. 
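Both claims about (44) can be checked numerically. The sketch below tests the quartic lower bound F(x) ≥ 0.0199||x||⁴ (from completing the square) on random points near 0, and, inside balls of shrinking radius, evaluates F'(x)(x − 0) at points on the curve x1 = 1.5||x||², where a short calculation gives F'(x)x = (2·1.5² − 5.94·1.5 + 4)||x||⁴ = −0.41||x||⁴ < 0. The sampling box, seed, and the particular curve are illustrative choices.

```python
import math
import random


def F(x1, x2):
    r2 = x1 * x1 + x2 * x2
    return x1 * x1 - 1.98 * x1 * r2 + r2 * r2


def grad_F(x1, x2):
    r2 = x1 * x1 + x2 * x2
    g1 = 2 * x1 - 1.98 * r2 - 3.96 * x1 * x1 + 4 * r2 * x1
    g2 = -3.96 * x1 * x2 + 4 * r2 * x2
    return g1, g2


# 1) Quartic growth: F(x) >= 0.0199 ||x||^4, checked on random points near 0.
rng = random.Random(0)
growth_ok = True
for _ in range(10000):
    x1, x2 = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
    r2 = x1 * x1 + x2 * x2
    if F(x1, x2) < 0.0199 * r2 * r2 - 1e-15:
        growth_ok = False

# 2) Non-pseudoconvexity: every ball around 0 contains x with F'(x)(x - 0) < 0.
negative_found = []
for eps in (1e-1, 1e-2, 1e-3):
    r = eps / 2
    x1 = 1.5 * r * r                 # a point on the curve x1 = 1.5 ||x||^2
    x2 = math.sqrt(r * r - x1 * x1)
    g1, g2 = grad_F(x1, x2)
    negative_found.append(g1 * x1 + g2 * x2 < 0)

print(growth_ok, negative_found)     # prints: True [True, True, True]
```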


Although f need not be convex near a singular lo- 
cal attractor x*, there are many instances where some 
sort of local convexity property is observed. (The function f(x) = x⁴ provides a simple illustration.) If the local pseudoconvexity condition,

$$\kappa\,\bigl(f(x) - f(x^*)\bigr) \le f'(x)(x - x^*), \tag{45}$$

is satisfied for some κ > 0 and all x in the ball B̄(x*, ρ*), then

$$\kappa\,\bigl(f(x) - f(x^*)\bigr) \le \lvert\lvert\lvert f'(x)\rvert\rvert\rvert\, \|x - x^*\|$$

near x*, and condition (35) follows at once from (27), with

$$\beta(t) = \kappa\,(\rho^*)^{-1}\,\alpha(t)$$

for all t ∈ [0, ρ*]. These considerations immediately
yield two additional corollaries of Theorem 9. 


Corollary 12 Suppose that the uniform growth condition (27) holds in the closed ball B̄(x*, ρ*) for some ρ* > 0. In addition, suppose that in B̄(x*, ρ*), f'(·) is uniformly continuous and f satisfies the pseudoconvexity condition (45). Then x* is a local attractor for Armijo gradient-related iterations (18).


Corollary 13 If X is a finite-dimensional normed vector 
space, if x* is a strict local minimizer for f, and if f satis- 
fies the pseudoconvexity condition (45), then x* is a local 
attractor for Armijo gradient-related iterations (18). 


Local Convexity and Convergence Rates 


A local version of the convergence rate proof strategy in [5] also works in the present setting when f'(·) is locally Lipschitz continuous and f satisfies the pseudoconvexity condition (45) and the growth condition (27) near x*. Under these circumstances, the worst-case convergence rate estimate,

$$f(x^k) - f(x^*) = O(k^{-1}), \tag{46}$$

can be proved for Armijo gradient-related sequences {x^k} that pass sufficiently near x*. More refined order estimates are possible if the first two hypotheses hold and

$$f(x) - f(x^*) \ge a\,\|x - x^*\|^r \tag{47}$$

for some a > 0 and r ∈ (1, ∞), and all x ∈ B̄(x*, ρ*). In such cases, it can be shown that

$$f(x^k) - f(x^*) = O\bigl(k^{-r/(r-2)}\bigr) \tag{48}$$

for r ∈ (2, ∞), and

$$\exists\,\lambda \in (0,1):\quad f(x^k) - f(x^*) = O(\lambda^k) \tag{49}$$

for r ∈ (1, 2]. (The latter estimate is comparable to the basic geometric convergence rate theorem for steepest descent iterates near nonsingular local minimizers [2].)
The proof strategy in [5] can also produce still more 
precise local convergence rate estimates that relate the 
constants implicit in the order estimates (48) and (49) 
to local Lipschitz constants for f’(-) and parameters in 
the gradient-related set functions G(-), the growth con- 
dition (47), the pseudoconvexity condition (45), and 
the Armijo steplength rule (17). 
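The contrast between the geometric estimate (49) and the sublinear estimate (48) is easy to observe. The sketch below assumes a backtracking Armijo rule with the steepest descent direction d = f'(x) in Euclidean R² (hypothetical parameters β = 0.5, δ = 0.1) and runs 1000 iterations on two illustrative functions: a quadratic with a nonsingular minimizer (r = 2), and f(x) = x1⁴ + x2⁴, which is convex (so (45) holds with κ = 1) and grows with r = 4. The quadratic gap collapses geometrically, while the quartic gap decays only like a power of k (roughly k^{-2} here).

```python
def armijo_descent(f, grad, x, iters, beta=0.5, delta=0.1):
    """Steepest descent x <- x - s f'(x) with a backtracking Armijo steplength s."""
    for _ in range(iters):
        g = grad(x)
        fx = f(x)
        slope = sum(gi * gi for gi in g)
        s = 1.0
        for _ in range(100):
            trial = [xi - s * gi for xi, gi in zip(x, g)]
            if fx - f(trial) >= delta * s * slope:
                break
            s *= beta
        else:
            return x  # no acceptable step found (only possible through rounding)
        x = trial
    return x


def quad(x):           # nonsingular minimizer at 0: growth exponent r = 2
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_quad(x):
    return [x[0], 10.0 * x[1]]


def quart(x):          # singular minimizer at 0: convex, growth exponent r = 4
    return x[0] ** 4 + x[1] ** 4


def grad_quart(x):
    return [4.0 * x[0] ** 3, 4.0 * x[1] ** 3]


x0 = [0.4, 0.2]
gap_quad = quad(armijo_descent(quad, grad_quad, x0, 1000))
gap_quart = quart(armijo_descent(quart, grad_quart, x0, 1000))
print(f"after 1000 steps: quadratic gap {gap_quad:.1e}, quartic gap {gap_quart:.1e}")
print(f"quartic gap * k**2 = {gap_quart * 1000 ** 2:.3f}")  # roughly constant if gap ~ k**-2
```

Since f(x*) = 0 for both functions, the printed values are the optimality gaps f(x^k) − f(x*).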

In the absence of local convexity assumptions, it is 
harder to establish analogous asymptotic convergence 
rate theorems; however, the analysis in [6] and [7] 
has established O(k~7) rate estimates for Hilbert space 
steepest descent iterations and a class of nonlinear func- 
tions f that contains the example (44). 


Concluding Remarks 


In a finite-dimensional space any two norms are equiv- 
alent and it can be seen that the gradient-related prop- 
erty and the local attractor property are therefore 
norm-invariant qualitative features of set-valued maps 
G(·): X → 2^X and local minimizers x*. On the other
hand, even in finite-dimensional spaces, the Lipschitz 
constants, growth rate constants, and gradient-related 
set function parameters in the present formulation 
are not norm-invariant, and this is reflected in norm- 
dependent convergence rates and norm-dependent size 
and shape parameters for the domains that are sent to 
a local attractor x* by gradient-related iterations. These 
facts have potentially important computational mani- 
festations when gradient-related methods are applied 
to large scale finite-dimensional problems that approxi- 
mate some limiting problem in an infinite-dimensional 
space. Note that infinite-dimensional spaces can sup- 
port multiple nonequivalent norms, and a set-valued 
function G(-) that is gradient-related in one norm 
need not be gradient-related relative to some other 


nonequivalent norm. Similarly, the local attractor prop- 
erty for a minimizer x*, and indeed local optimality it- 
self, are also typically norm-dependent at this level of 
generality. 


See also 


> Conjugate-gradient Methods 

> Large Scale Trust Region Problems 

> Nonlinear Least Squares: Newton-type Methods 
> Nonlinear Least Squares: Trust Region Methods 
Introduction


In the last few years, the need for an integrated lo- 
gistic system has become a primary objective of ev- 
ery company manager. Managers recognize that there 
is a strong relation between the location of facilities, 
the allocation of suppliers, vehicles, and customers to 
the facilities, and the design of routes around the facili- 
ties. In a location routing problem (LRP), the optimal 
number, the capacity, and the location of facilities are 
determined, and the optimal set of vehicle routes from 
each facility is also sought. 

In most location models, it is assumed that the cus- 
tomers are served directly from the facilities being lo- 
cated. Each customer is served on his or her own route. 
In many cases, however, customers are not served indi- 
vidually from the facilities. Rather, customers are con- 
solidated into routes that may contain many customers. 
One of the reasons for the added difficulty in solving 
these problems is that there are far more decisions that 
need to be made by the model. These decisions include: 
• How many facilities to locate,
• Where the facilities should be,
• Which customers to assign to which depots,
• Which customers to assign to which routes,
• In what order customers should be served on each route.


In the LRP, a number of facilities are located among 
candidate sites and delivery routes are established for 
a set of users in such a way that the total system cost 
is minimized. As Perl and Daskin [51] pointed out, 
LRPs involve three interrelated, fundamental decisions: 
where to locate facilities, how to allocate customers to 
facilities, and how to route vehicles to serve customers. 

The difference between the LRP and the classic ve- 
hicle routing problem is that not only routing must be 
designed but the optimal depot location must be si- 
multaneously determined as well. The main difference 
between the LRP and the classical location-allocation 


problem is that, once the facility is located, the for- 
mer requires a visitation of customers through tours 
while the latter assumes that the customer will be visited 
from the vehicle directly, and then the vehicle will re- 
turn to the facility without serving any other customer 
([47]). In general terms, the combined location routing 
model solves the joint problem of determining the op- 
timal number, capacity, and location of facilities serv- 
ing more than one customer and finding the optimal 
set of vehicle routes. In the LRP, the distribution cost 
is decreased due to the assignment of the customers to 
vehicles while the main objective is the design of the ap- 
propriate routes of the vehicles. 
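To make the interrelated decisions concrete, the following sketch solves a tiny, made-up instance exactly by enumeration. Every assignment of customers to candidate depots fixes which depots are opened (location) and who is served from where (allocation), and the cost of each open depot's tour is found by trying every visiting order (routing). The data, the single uncapacitated vehicle per depot, and all function names are hypothetical simplifications; brute force is only viable for a handful of customers. The last line contrasts the LRP cost with the direct-trip cost of the classical location-allocation view for the same assignment.

```python
import itertools
import math

# Hypothetical toy instance: depot -> (location, fixed opening cost); customer
# locations. Euclidean distances; one uncapacitated tour per open depot.
DEPOTS = {
    "A": ((0.0, 0.0), 4.0),
    "B": ((10.0, 0.0), 4.0),
    "C": ((5.0, 8.0), 6.0),
}
CUSTOMERS = [(1.0, 2.0), (2.0, -1.0), (9.0, 1.0), (11.0, 2.0), (5.0, 6.0)]


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def tour_length(depot_xy, stops):
    """Shortest closed tour depot -> stops -> depot (brute force; tiny sets only)."""
    if not stops:
        return 0.0
    best = math.inf
    for order in itertools.permutations(stops):
        path = [depot_xy, *order, depot_xy]
        best = min(best, sum(dist(a, b) for a, b in zip(path, path[1:])))
    return best


def lrp_cost(assignment):
    """Opening costs of used depots plus one optimal tour per used depot."""
    cost = 0.0
    for name, (xy, open_cost) in DEPOTS.items():
        stops = [c for c, dep in zip(CUSTOMERS, assignment) if dep == name]
        if stops:
            cost += open_cost + tour_length(xy, stops)
    return cost


def direct_trip_cost(assignment):
    """Location-allocation view: each customer gets its own out-and-back trip."""
    cost = sum(DEPOTS[name][1] for name in set(assignment))
    for c, name in zip(CUSTOMERS, assignment):
        cost += 2.0 * dist(DEPOTS[name][0], c)
    return cost


def solve_lrp():
    """Exact solution by enumerating every customer-to-depot assignment."""
    best_cost, best_assignment = math.inf, None
    for assignment in itertools.product(DEPOTS, repeat=len(CUSTOMERS)):
        cost = lrp_cost(assignment)
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


cost, assignment = solve_lrp()
print(f"open depots: {sorted(set(assignment))}, assignment: {assignment}")
print(f"LRP cost: {cost:.3f}, same assignment with direct trips: "
      f"{direct_trip_cost(assignment):.3f}")
```

By the triangle inequality a tour is never longer than the corresponding out-and-back trips, which is one way to see why consolidating customers into routes can lower distribution cost.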


Variants of the Location Routing Problem 


Laporte et al. [39] considered three variants of LRPs, in- 
cluding (1) capacity-constrained vehicle routing prob- 
lems, (2) cost-constrained vehicle routing problems, 
and (3) cost-constrained location routing problems. 
The authors examined multidepot, asymmetrical prob- 
lems and developed an optimal solution procedure that 
enables them to solve problems with up to 80 nodes. 
Chan et al. [11] solved a multidepot, multivehicle loca- 
tion routing problem with stochastically processed de- 
mands, which are defined as demands that are gener- 
ated upon completing site-specific service on their pre- 
decessors. Min et al. [47] synthesized the past research 
and suggested some future research directions for the 
LRP. An extended recent literature review is included 
in the survey paper published by Nagy and Salhi [48]. 
They proposed a classification scheme and looked at 
a number of problem variants. The most important ex- 
act and heuristic algorithms were presented and ana- 
lyzed in this survey paper. 


Exact Algorithms for the Solution 
of the Location Routing Problem 


A number of exact algorithms for the problem were
presented by Laporte et al. [38]. Applications and for- 
mulations and exact and approximation algorithms for 
LRPs under capacity and maximum cost restrictions 
are studied in the survey of Laporte [34]. Nonlinear 
programming exact algorithms for the solution of the 
LRP have been proposed in [20,61]. Dynamic program- 
ming exact algorithms for the solution of the LRP have 
been proposed in [5]. Integer programming exact algorithms for the solution of the LRP have been proposed in [35,37,46]. Mixed integer goal programming
exact algorithms for the solution of the LRP have been 
proposed in [65]. Two branching strategies have been 
proposed in [36]. An iterative exact procedure has been 
proposed in [9]. A branch-and-bound technique on the 
LP relaxation has been proposed in [17]. 


Heuristic Algorithms for the Solution 
of the Location Routing Problem 


The LRP is very difficult to solve using exact algorithms, especially when the number of customers or of candidate facility locations is large, because the problem is NP-hard, i.e., no polynomial-time algorithm for it is known. Madsen [43]
presented a survey of heuristic methods. Christofides 
and Eilon [16] were the first to consider the problem 
of locating a depot from which customers are served 
by tours rather than individual trips. They proposed an 
approximation algorithm for the solution of the problem. Watson-Gandy and Dohrn [63] proposed an algorithm where the problem is solved by transforming its location part into an ordinary location problem using the Christofides–Eilon approximation algorithm. The routing part of the algorithm is solved using the Clarke and Wright algorithm. Jacobsen and
Madsen [31] proposed three algorithms. The first is 
called a tree-tour heuristic. The second is called ALA- 
SAV and is a three-phase heuristic, where in the first 
phase a location-allocation problem is solved and in the 
second and third phases a Clarke and Wright heuris- 
tic is applied for solving the problem. Finally, the 
third proposed algorithm is called SAV-DROP and is 
a heuristic algorithm that combines the Clarke-Wright 
method and the DROP algorithm. A two-phase heuris- 
tic is presented in [4], where in the first phase the 
set of open plants is determined and a priori routes 
are considered, while in the second phase the routes 
are optimized. Other two-phase heuristics have been 
proposed in [7,12,13,30,33,42,49,50,58]. Cluster analy- 
sis algorithms are presented in [6,18,60]. Iterative ap- 
proaches have been proposed by [27,59]. Min ([46]) 
considered a two-level location-allocation problem of 
terminals to customer clusters and supply sources us- 
ing a hierarchical approach consisting of both exact and 


heuristic procedures. Insertion methods have been pro- 
posed in [15]. A partitioning heuristic algorithm is pro- 
posed in [35], and a sweep heuristic is proposed in [21]. 


Metaheuristic Algorithms for the Solution 
of the Location Routing Problem 


Several metaheuristic algorithms have been proposed 
for the solution of the LRP. In what follows, an ana- 
lytical presentation of these algorithms is given. 

• Tabu search (TS) was introduced by Glover [22,23]
as a general iterative metaheuristic for solving com- 
binatorial optimization problems. Computational 
experience has shown that TS is a well-established 
approximation technique that can compete with al- 
most all known techniques and that, by its flexibil- 
ity, can beat many classic procedures. It is a form of 
local neighbor search. Each solution S has an associated set of neighbors N(S). A solution S' ∈ N(S) can
be reached from S by an operation called a move. TS 
can be viewed as an iterative technique that explores 
a set of problem solutions by repeatedly making 
moves from one solution S to another solution S' located in the neighborhood N(S) of S [24]. TS moves
from a solution to its best admissible neighbor, even 
if this causes the objective function to deteriorate. 
To avoid cycling, solutions that have been recently 
explored are declared forbidden or tabu for a num- 
ber of iterations. The tabu status of a solution is 
overridden when certain criteria (aspiration criteria) 
are satisfied. Sometimes, intensification and diversi- 
fication strategies are used to improve the search. 
In the first case, the search is accentuated in the 
promising regions of the feasible domain. In the 
second case, an attempt is made to consider solu- 
tions in a broad area of the search space. Tuzun and 
Burke [62] proposed a two-phase tabu search archi- 
tecture for the solution of the LRP. TS algorithms for 
the LRP are also presented in [10,14,41,45,57]. 

• Simulated annealing (SA) [1,3,32] plays a special
role within local search for two reasons. First, SA ap- 
pears to be quite successful when applied to a broad 
range of practical problems. Second, some thresh- 
old accepting algorithms such as SA have a stochas- 
tic component, which facilitates a theoretical anal- 
ysis of their asymptotic convergence. SA [2] algo- 
rithms are stochastic algorithms that allow random 
uphill jumps in a controlled fashion in order to provide possible escapes from poor local optima. Gradually the probability of allowing the objective function
value to increase is lowered until no more transfor- 
mations are possible. SA owes its name to an anal- 
ogy with the annealing process in condensed-matter 
physics, where a solid is heated to a maximum tem- 
perature at which all particles of the solid randomly 
arrange themselves in the liquid phase, followed by 
cooling through careful and slow reduction of the 
temperature until the liquid is frozen with the par- 
ticles arranged in a highly structured lattice and 
minimal system energy. This ground state is reach- 
able only if the maximum temperature is sufficiently 
high and the cooling sufficiently slow. Otherwise 
a metastable state is reached. The metastable state 
is also reached with a process known as quenching, 
in which the temperature is instantaneously low- 
ered. Its predecessor is the so-called Metropolis fil- 
ter. Wu et al. [64] proposed an algorithm that di- 
vides the original problem into two subproblems, 
i.e., the location-allocation problem and the gen- 
eral vehicle routing problem, respectively. Each sub- 
problem is, then, solved in a sequential and iter- 
ative manner by the SA algorithm embedded in 
the general framework for the problem-solving pro- 
cedure. SA algorithms for the LRP are presented 
in [8,40,41]. 

• Greedy randomized adaptive search procedure
(GRASP) [56] is an iterative two-phase search 
method that has gained considerable popularity in 
combinatorial optimization. Each iteration consists 
of two phases, a construction phase and a local 
search procedure. In the construction phase, a ran- 
domized greedy function is used to build up an ini- 
tial solution. This randomized technique provides 
a feasible solution within each iteration. This so- 
lution is then exposed for improvement attempts 
in the local search phase. The final result is sim- 
ply the best solution found over all iterations. 
Prins et al. [52] proposed a GRASP with a path- 
relinking phase for the solution of the capacitated 
location routing problem. 

• Genetic algorithms (GAs) are search procedures
based on the mechanics of natural selection and 
natural genetics. The first GA was developed by 
John H. Holland in the 1960s to allow comput- 


ers to evolve solutions to difficult search and com- 
binatorial problems such as function optimization 
and machine learning [28]. Genetic algorithms offer 
a particularly attractive approach to problems like 
location routing problems since they are generally 
quite effective for the rapid global search of large, 
nonlinear, and poorly understood spaces. Moreover, 
GAs are very effective in solving large-scale prob- 
lems. GAs [25] mimic the evolution process in na- 
ture. They are based on an imitation of the biolog- 
ical process in which new and better populations 
among different species are developed during evo- 
lution. Thus, unlike most standard heuristics, GAs 
use information about a population of solutions, 
called individuals, when they search for better so- 
lutions. A GA is a stochastic iterative procedure that 
maintains the population size constant in each iter- 
ation, called a generation. Their basic operation is 
the mating of two solutions to form a new solution. 
To form a new population, a binary operator called 
a crossover and a unary operator called a mutation 
are applied [54,55]. Crossover takes two individuals, 
called parents, and produces two new individuals, 
called offspring, by swapping parts of the parents. 
Marinakis and Marinaki [44] proposed a bilevel GA 
for a real-life LRP. A new formulation based on 
bilevel programming was proposed. Based on the 
fact that in the LRP decisions are made at a strate- 
gic level and at an operational level, we formulate the 
problem in such a way that in the first level, the deci- 
sions of the strategic level are made, namely, the top 
manager finds the optimal location of the facilities, 
while in the second level, the operational-level de- 
cisions are made, namely, the operational manager 
finds the optimal routing of vehicles. Other evolu- 
tionary approaches for the solution of the LRP have 
been proposed in [29,53]. 

• Variable neighborhood search (VNS) is a metaheuristic for solving combinatorial optimization
problems whose basic idea is systematic change of 
a neighborhood within a local search [26]. VNS al- 
gorithms for the LRP are presented in [45]. 

• The ant colony optimization (ACO) metaheuristic
is a relatively new technique for solving combinato- 
rial optimization problems (COPs). Based strongly 
on the ant system (AS) metaheuristic developed by 
Dorigo, Maniezzo, and Colorni [19], ACO is derived 
from the foraging behavior of real ants in nature.
The main idea of ACO is to model the problem as 
the search for a minimum cost path in a graph. Arti- 
ficial ants walk through this graph looking for good 
paths. Each ant has a rather simple behavior so that 
it will typically only find rather poor-quality paths 
on its own. Better paths are found as the emergent 
result of the global cooperation among ants in the 
colony. An ACO algorithm consists of a number of 
cycles (iterations) of solution construction. During 
each iteration a number of ants (which is a parame- 
ter) construct complete solutions using heuristic in- 
formation and the collected experiences of previous 
groups of ants. These collected experiences are rep- 
resented by a digital analog of trail pheromone that 
is deposited on the constituent elements of a solu- 
tion. Small quantities are deposited during the con- 
struction phase while larger amounts are deposited 
at the end of each iteration in proportion to solution 
quality. Pheromone can be deposited on the com- 
ponents and/or the connections used in a solution 
depending on the problem. ACO algorithms for the 
LRP are presented in [8]. 
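As an illustration of the simulated annealing scheme described above (uphill moves accepted with the Metropolis probability exp(−Δ/T), with the temperature T lowered gradually), the sketch below anneals over customer-to-depot assignments of a tiny, made-up LRP instance, scoring each assignment by its depot opening costs plus a brute-force optimal tour per open depot. The instance, the move (reassign one customer to a random depot), the geometric cooling parameters, and all names are hypothetical choices; this is not the method of any of the cited papers.

```python
import itertools
import math
import random

# Hypothetical toy instance: depot -> (location, opening cost); customer locations.
DEPOTS = {"A": ((0.0, 0.0), 4.0), "B": ((10.0, 0.0), 4.0), "C": ((5.0, 8.0), 6.0)}
CUSTOMERS = [(1.0, 2.0), (2.0, -1.0), (9.0, 1.0), (11.0, 2.0), (5.0, 6.0)]


def tour_length(depot_xy, stops):
    """Shortest closed tour through the stops (brute force; tiny sets only)."""
    if not stops:
        return 0.0
    best = math.inf
    for order in itertools.permutations(stops):
        path = [depot_xy, *order, depot_xy]
        best = min(best, sum(math.dist(a, b) for a, b in zip(path, path[1:])))
    return best


def cost(assignment):
    total = 0.0
    for name, (xy, open_cost) in DEPOTS.items():
        stops = [c for c, dep in zip(CUSTOMERS, assignment) if dep == name]
        if stops:
            total += open_cost + tour_length(xy, stops)
    return total


def simulated_annealing(t0=5.0, cooling=0.95, moves_per_t=30, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    names = list(DEPOTS)
    current = [rng.choice(names) for _ in CUSTOMERS]
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            candidate = list(current)
            candidate[rng.randrange(len(candidate))] = rng.choice(names)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis acceptance: downhill always, uphill with prob. exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        t *= cooling  # geometric cooling schedule
    return best_cost, best


best_cost, best = simulated_annealing()
print(f"best cost found: {best_cost:.3f} with assignment {best}")
```

Keeping the best solution seen, rather than the final one, is a common safeguard, since the Metropolis rule can leave a good solution late in the run.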
A nonnegative function f: R^n → R^1 is called a logconcave (point) function if for every x, y ∈ R^n and 0 < λ < 1 we have the inequality

$$f(\lambda x + (1-\lambda)y) \ge [f(x)]^{\lambda}\,[f(y)]^{1-\lambda}.$$


A probability measure P defined on the Borel sets of R^n is called logconcave if for any Borel sets A, B ⊂ R^n and 0 < λ < 1 we have the inequality

$$P(\lambda A + (1-\lambda)B) \ge [P(A)]^{\lambda}\,[P(B)]^{1-\lambda}$$

provided that λA + (1 − λ)B is also a Borel set. If P is a logconcave measure in R^n and A ⊂ R^n is a convex set, then P(A + z) is a logconcave point function in R^n. In particular, the probability distribution function F(z) = P({x : x ≤ z}) = P({x : x ≤ 0} + z) of the probability measure P is a logconcave point function. If n = 1, then 1 − F(z) is also logconcave.
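The claim that F(z) and, for n = 1, 1 − F(z) are logconcave can be spot-checked for the standard normal distribution. The sketch below verifies on a grid that the second differences of log F and of log(1 − F) are negative, computing F(z) = ½ erfc(−z/√2) with `math.erfc` and 1 − F(z) = F(−z) by symmetry so that both tails stay accurate. The grid and range are arbitrary, and a grid check of this kind is evidence, not a proof.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function F(z), accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def max_second_difference(g, lo=-6.0, hi=6.0, h=0.01):
    """Largest g(z+h) - 2 g(z) + g(z-h) over interior grid points of [lo, hi]."""
    n = int(round((hi - lo) / h))
    zs = [lo + i * h for i in range(1, n)]
    return max(g(z + h) - 2.0 * g(z) + g(z - h) for z in zs)


def log_F(z):
    return math.log(norm_cdf(z))


def log_survival(z):
    return math.log(norm_cdf(-z))   # 1 - F(z) = F(-z) by symmetry


print(max_second_difference(log_F), max_second_difference(log_survival))
```

Negative second differences everywhere on the grid are what concavity of log F and log(1 − F) predicts.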


The basic theorem concerning logconcave measures [5,6] states that if the probability measure P is generated by a logconcave probability density function f, i.e.,

$$P(C) = \int_C f(x)\,dx$$

for every Borel set C ⊂ R^n, then P is a logconcave measure.

Examples of logconcave probability distributions
are the multivariate normal, the uniform (on a convex 
set) and for special parameter values the Wishart, the 
beta, the univariate and some multivariate gamma dis- 
tributions. 

A closely related theorem [5] states that if f: R^{n+m} → R^1 is a logconcave function, then

$$\int_{\mathbb{R}^m} f(x,y)\,dy$$

is a logconcave function in R^n. This implies that the convolution of two logconcave functions is also logconcave [3,5].

Logconcave probability distributions play an important role in probabilistic constrained stochastic programming problems. If the problem is:

$$\begin{aligned} \min\ & c^{\mathsf T}x \\ \text{s.t.}\ & P(Tx \ge \xi) \ge p, \\ & Ax = b,\quad x \ge 0, \end{aligned}$$

and the random vector ξ has a continuous distribution with logconcave probability density function, then the set of feasible solutions is convex, since P(Tx ≥ ξ) = F(Tx), where F is the distribution function of ξ, and a logconcave function composed with the linear map x ↦ Tx is again logconcave (for more general results see [6]). Moreover, if the problem is solved by a barrier function method with logarithmic penalty function, then the function to be minimized in each step is convex.

The basic theorem of logconcave measures has the following generalization [1,2]: If −1/n ≤ α ≤ ∞, 0 < λ < 1, and the probability density function f: R^n → R^1 satisfies (x, y ∈ R^n):

$$f(\lambda x + (1-\lambda)y) \ge \bigl[\lambda f^{\alpha}(x) + (1-\lambda) f^{\alpha}(y)\bigr]^{1/\alpha}, \tag{1}$$

then for any Borel sets A, B ⊂ R^n such that λA + (1 − λ)B is also a Borel set, we have

$$P(\lambda A + (1-\lambda)B) \ge \bigl\{\lambda [P(A)]^{\gamma} + (1-\lambda)[P(B)]^{\gamma}\bigr\}^{1/\gamma}, \tag{2}$$

where γ = α/(1 + nα). The cases α, γ = −∞, 0, ∞ are interpreted by continuity. Logconcavity corresponds to the case α = γ = 0. If f, P satisfy the above inequalities, then f is called an α-concave function and P a γ-concave probability measure. If α = −∞ (resp. γ = −∞), then f (resp. P) is called quasiconcave.

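A nonnegative function $f: \mathbb{R}^n \to \mathbb{R}^1$ is called logconvex in the convex set $D \subseteq \mathbb{R}^n$ if for every $x, y \in D$ and $0 < \lambda < 1$ we have the inequality

$$f(\lambda x + (1-\lambda) y) \le [f(x)]^{\lambda} [f(y)]^{1-\lambda}.$$

Similarly, the probability measure P defined on the Borel subsets of the convex set $D \subseteq \mathbb{R}^n$ is called logconvex if for any Borel sets $A, B \subseteq D$ and $0 < \lambda < 1$ we have the inequality

$$P(\lambda A + (1-\lambda) B) \le [P(A)]^{\lambda} [P(B)]^{1-\lambda}.$$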


It follows by Hölder's inequality that the sum of logconvex functions is also logconvex. This fact in turn implies that if f is logconvex in D, then the following function of the variable $t \in \mathbb{R}^n$ is logconvex for any fixed Borel set $C \subseteq \mathbb{R}^n$:

$$g(t) = \int_{C+t} f(x)\,dx$$

Here logconvexity means that $g(\lambda t_1 + (1-\lambda) t_2) \le [g(t_1)]^{\lambda} [g(t_2)]^{1-\lambda}$, provided that $C + t_1 \subseteq D$, $C + t_2 \subseteq D$ and $0 < \lambda < 1$.
The univariate discrete probability distribution $\{p_k : k \in \mathbb{Z}\}$ is called logconcave if for every k we have the inequality $p_k^2 \ge p_{k-1} p_{k+1}$. This inequality implies the following: if $k = \lambda i + (1-\lambda) j$, where i, j, k are integers and $0 < \lambda < 1$, then $p_k \ge p_i^{\lambda} p_j^{1-\lambda}$. Examples are the binomial, Poisson, hypergeometric and geometric distributions.

A classical theorem by M. Fekete [3] states that the 
convolution of two logconcave univariate discrete dis- 
tributions is also logconcave. 
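The univariate definition and Fekete's theorem can be checked numerically. The sketch below assumes that the sequences have no internal zeros. It tests $p_k^2 \ge p_{k-1} p_{k+1}$ for a binomial distribution, for a Poisson distribution truncated at k = 29, and for their convolution. The helper names `is_logconcave` and `convolve` and the parameter values are hypothetical choices made for this illustration.

```python
from math import comb, exp, factorial


def is_logconcave(p, tol=1e-12):
    # Univariate definition: p_k^2 >= p_{k-1} p_{k+1} for every interior k
    # (the sequence is assumed to have no internal zeros).
    return all(p[k] ** 2 >= p[k - 1] * p[k + 1] * (1 - tol) for k in range(1, len(p) - 1))


def convolve(p, q):
    # Distribution of the sum of two independent integer-valued random variables.
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


binomial = [comb(10, k) * 0.3**k * 0.7 ** (10 - k) for k in range(11)]
poisson = [exp(-4.0) * 4.0**k / factorial(k) for k in range(30)]  # truncated at k = 29
print(is_logconcave(binomial), is_logconcave(poisson), is_logconcave(convolve(binomial, poisson)))
# True True True
```

Truncating a logconcave sequence to an interval keeps it logconcave, so the truncation does not affect the check.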

Multivariate discrete logconcavity [2] is not a direct generalization of its univariate counterpart. The discrete probability distribution $\{P(x) : x \in \mathbb{Z}^m\}$ is said to be logconcave if there exists a convex function $g: \mathbb{R}^m \to \mathbb{R}$ such that

$$-\log P(x) = g(x) \quad \text{for } x \in \mathbb{Z}^m.$$

If P(x) = 0, then by definition $-\log P(x) = +\infty$.

The convolution theorem mentioned above for logconcave univariate distributions does not carry over to the multivariate case. However, the trinomial distribution

$$P(k_1, k_2) = \frac{n!}{k_1!\,k_2!\,(n - k_1 - k_2)!}\, p_1^{k_1} p_2^{k_2} (1 - p_1 - p_2)^{n - k_1 - k_2}$$

is known to be logconcave, and the convolution of trinomial distributions is also logconcave [5]. For the use of discrete logconcavity in stochastic programming consult [6].

Other definitions and results, concerning multivari- 
ate discrete logconcavity, can be found in [1,4]. 
Introduction


Turkay and Grossmann [7] proposed a logic version of the outer-approximation algorithm for MINLP by Duran and Grossmann [2] for solving a special class of generalized disjunctive programming (GDP) problems.
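The problem arises in the optimization of process networks and involves two-term disjunctions. The first term is activated when a unit or node is selected, and the second term forces a subset of the continuous variables to zero. The specific form of the GDP problem is as follows:

$$
\begin{aligned}
\min\ & Z = \sum_{k \in K} c_k + f(x) \\
\text{s.t.}\ & r(x) \le 0 \\
& \begin{bmatrix} Y_k \\ g_k(x) \le 0 \\ c_k = \gamma_k \end{bmatrix} \vee \begin{bmatrix} \neg Y_k \\ B^k x = 0 \\ c_k = 0 \end{bmatrix}, \quad k \in K \\
& \Omega(Y) = \text{True} \\
& x \in \mathbb{R}^n,\ c \in \mathbb{R}^m,\ Y \in \{\text{true}, \text{false}\}^m
\end{aligned}
\qquad \text{(GDP)}
$$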


Here $Y_k$ are the Boolean variables that decide whether the first or the second term in a disjunction $k \in K$ is true, and x are the continuous variables. The objective function involves the term f(x) for the continuous variables and the charges $c_k$, which depend on the discrete choices in each disjunction $k \in K$. The constraints $r(x) \le 0$ must hold regardless of the discrete choices.

In contrast, $g_k(x) \le 0$ are conditional constraints that must hold when $Y_k$ is true in the kth disjunction. Otherwise ($\neg Y_k$), a subset of the x variables is set to zero through a suitable definition of the matrix $B^k$. In particular, we define $B^k = [b_i^k]$ with rows $b_i^k = e_i^T$ if $x_i = 0$ and $b_i^k = 0^T$ if $x_i \ne 0$. In this way only a subset of the variables x is forced to zero (typically flows).

The cost variables $c_k$ correspond to the fixed charges; their value equals $\gamma_k$ if the Boolean variable $Y_k$ is true and is zero otherwise. $\Omega(Y) = \text{True}$ are logical relations for the Boolean variables expressed in propositional logic. For the derivation of the basic methods the functions are assumed to be convex, although in practical applications they are often nonconvex.


NLP and Master Subproblems 


Following the original algorithm [2], the logic-based 
outer-approximation algorithm consists of solving NLP 
subproblems and disjunctive or MILP master prob- 
lems. 


Logic-Based Outer Approximation 




As described in Turkay and Grossmann [7], for fixed values of the Boolean variables, $Y_{\hat k} = \text{true}$ and $Y_{\bar k} = \text{false}$, the corresponding NLP subproblem is as follows:


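$$
\begin{aligned}
\min\ & Z = \sum_{k \in K} c_k + f(x) \\
\text{s.t.}\ & r(x) \le 0 \\
& \left.\begin{aligned} g_k(x) &\le 0 \\ c_k &= \gamma_k \end{aligned}\right\} \quad \text{for } Y_k = \text{true},\ k \in K \\
& \left.\begin{aligned} B^k x &= 0 \\ c_k &= 0 \end{aligned}\right\} \quad \text{for } Y_k = \text{false},\ k \in K \\
& x \in \mathbb{R}^n,\ c_k \in \mathbb{R}^1
\end{aligned}
\qquad \text{(NLPD)}
$$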


Note that for every disjunction $k \in K$, only the constraints corresponding to the Boolean variable $Y_k$ that is true are imposed, which reduces the size of the problem. Also, the fixed charges $\gamma_k$ are applied only to these terms. Suppose NF subproblems (NLPD) are solved, generating sets of linearizations $l = 1, \ldots, NF$ for subsets of disjunction terms $L_k = \{l \mid Y_k^l = \text{true}\}$. One can then define the following disjunctive OA master problem:


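$$
\begin{aligned}
\min\ & Z = \sum_{k \in K} c_k + \alpha \\
\text{s.t.}\ & \alpha \ge f(x^l) + \nabla f(x^l)^T (x - x^l), \quad l = 1, \ldots, L \\
& r(x^l) + \nabla r(x^l)^T (x - x^l) \le 0, \quad l = 1, \ldots, L \\
& \begin{bmatrix} Y_k \\ g_k(x^l) + \nabla g_k(x^l)^T (x - x^l) \le 0,\ l \in L_k \\ c_k = \gamma_k \end{bmatrix} \vee \begin{bmatrix} \neg Y_k \\ B^k x = 0 \\ c_k = 0 \end{bmatrix}, \quad k \in K \\
& \Omega(Y) = \text{True} \\
& \alpha \in \mathbb{R},\ x \in \mathbb{R}^n,\ c \in \mathbb{R}^m,\ Y \in \{\text{true}, \text{false}\}^m
\end{aligned}
\qquad \text{(MGDP)}
$$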


Before applying the above master problem, it is necessary to solve several subproblems (NLPD) so as to produce at least one linear approximation of each of the terms in the disjunctions. As shown by Turkay and Grossmann [7], selecting the smallest number of subproblems amounts to solving a set covering problem, which is small and easy to solve. In a process flowsheet synthesis problem, the linearizations in (MGDP) can also be generated by starting with an initial flowsheet and suboptimizing the remaining subsystems.

The above problem (MGDP) can be solved by the methods described by Beaumont [1], Raman and Grossmann [6], and Hooker [4]. Turkay and Grossmann [7] have shown that using the convex hull representation of the disjunctions in (MGDP), and converting the logic relations $\Omega(Y)$ into the inequalities $Ay \le a$, leads to the following MILP problem:


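$$
\begin{aligned}
\min\ & Z = \sum_{k \in K} \gamma_k y_k + \alpha \\
\text{s.t.}\ & \alpha \ge f(x^l) + \nabla f(x^l)^T (x - x^l), \quad l = 1, \ldots, L \\
& r(x^l) + \nabla r(x^l)^T (x - x^l) \le 0, \quad l = 1, \ldots, L \\
& \nabla_{x_{Z_k}} g_k(x^l)^T x_{Z_k} + \nabla_{x_{N_k}} g_k(x^l)^T x^1_{N_k} \le \left[-g_k(x^l) + \nabla g_k(x^l)^T x^l\right] y_k, \quad l \in L_k,\ k \in K \\
& x_{N_k} = x^1_{N_k} + x^2_{N_k} \\
& 0 \le x^1_{N_k} \le x^U_{N_k}\, y_k \\
& 0 \le x^2_{N_k} \le x^U_{N_k}\,(1 - y_k) \\
& A y \le a \\
& x \in \mathbb{R}^n,\ x^1_{N_k} \ge 0,\ x^2_{N_k} \ge 0,\ y \in \{0, 1\}^m
\end{aligned}
\qquad \text{(MIPDF)}
$$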


Here the vector x is partitioned into the variables $(x_{Z_k}, x_{N_k})$ for each disjunction k according to the definition of the matrix $B^k$ (i.e., $x_{N_k}$ refers to the nonzero rows of this matrix). The logic-based outer-approximation algorithm is a generalization of the modeling/decomposition strategy of Kocis and Grossmann [5] for the synthesis of process flowsheets.


Steps of Algorithm 


Assuming feasible NLP subproblems, the steps of the proposed logic-based outer-approximation method are as follows:


Step 1: Model the problem in generalized disjunctive 
form as in (GDP). 

Step 2: Identify the NF subproblems to be solved either 
from inspection or from set covering problems. 
Logic-Based Outer Approximation, Figure 1
Process network example 


Step 3: Solve the NLP subproblems (NLPD) for the NF subproblems determined in step 2. The lowest-cost solution of these NLPs yields an upper bound, $Z_U$, for the problem.

Step 4: Linearize the objective function and constraints of the current NLP subproblem(s) and set up the MILP master problem (MIPDF). The solution of this problem gives the lower bound, $Z_L$, for the problem.

Step 5: If $|Z_U - Z_L| < \epsilon$, where ε is a tolerance, then stop; the solution with the current $Z_U$ is the optimal solution. Otherwise go to step 6.

Step 6: Solve the NLP subproblem (NLPD) obtained by fixing the Boolean variables predicted by the master problem. The objective function value of the solution is $Z_{NLP}$. If $Z_{NLP} < Z_U$, then set $Z_U = Z_{NLP}$.

Step 7: Compare the upper bound $Z_U$ with the lower bound $Z_L$. If $|Z_U - Z_L| < \epsilon$, then stop; the solution with the current $Z_U$ is the optimal solution. Otherwise go to step 4.
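The bounding logic of steps 3 to 7 can be sketched as follows. This is a minimal, hypothetical skeleton, not the LOGMIP implementation. The callables `solve_nlpd` and `solve_master` are placeholders for the NLP subproblem (NLPD) and the master problem (MIPDF). The history passed to the master is simplified to (Boolean assignment, objective value) pairs; a real master would be built from the linearization points $x^l$. The toy master enumerates three made-up configurations instead of solving an MILP.

```python
from collections.abc import Hashable
from typing import Callable, List, Optional, Tuple

Config = Hashable  # one fixed assignment of the Boolean variables Y
History = List[Tuple[Config, float]]


def logic_based_oa(
    initial: List[Config],
    solve_nlpd: Callable[[Config], float],
    solve_master: Callable[[History], Tuple[float, Config]],
    eps: float = 1e-6,
    max_iter: int = 100,
) -> Tuple[float, Optional[Config]]:
    history: History = []
    z_upper, best = float("inf"), None
    # Step 3: solve the initial NLP subproblems; the cheapest one gives Z_U.
    for y in initial:
        z = solve_nlpd(y)
        history.append((y, z))
        if z < z_upper:
            z_upper, best = z, y
    for _ in range(max_iter):
        # Step 4: the master problem built from all information so far gives Z_L.
        z_lower, y_next = solve_master(history)
        # Step 5: stop when the bounds agree within the tolerance.
        if abs(z_upper - z_lower) < eps:
            break
        # Step 6: solve the NLP subproblem for the Boolean values predicted by the master.
        z = solve_nlpd(y_next)
        history.append((y_next, z))
        if z < z_upper:
            z_upper, best = z, y_next
        # Step 7: compare the bounds again; otherwise return to step 4.
        if abs(z_upper - z_lower) < eps:
            break
    return z_upper, best


# Toy stand-ins (hypothetical data): exact NLP costs and valid lower estimates.
true_cost = {"A": 10.0, "B": 7.0, "C": 9.0}
under = {"A": 8.0, "B": 6.0, "C": 5.0}


def toy_master(history: History) -> Tuple[float, Config]:
    # Enumeration instead of an MILP: visited configurations use their exact
    # cost, unvisited ones a lower estimate, so the minimum is a valid Z_L.
    seen = dict(history)
    values = {y: seen.get(y, under[y]) for y in true_cost}
    y_best = min(values, key=values.get)
    return values[y_best], y_best


print(logic_based_oa(["A"], lambda y: true_cost[y], toy_master))  # (7.0, 'B')
```

On the toy data the loop visits A, then C, then B. It stops when the lower bound from the master reaches the best NLP value.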


It should be noted that one can also derive a logic- 
based version of Generalized Benders Decomposition 
as described in [7]. The logic outer-approximation al- 
gorithm described above has been implemented in 
the computer code LOGMIP by Vecchietti and Grossmann [8], which can be accessed from http://www.logmip.ceride.gov.ar


Example 


Consider the following (GDP) problem from [7] that 
deals with a simplified version of the synthesis of a pro- 
cess network shown in Fig. 1. 


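The GDP model is as follows:

1. Objective function:

$$
\begin{aligned}
\min Z ={} & c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 + c_8 + x_2 - 10x_3 + x_4 - 15x_5 - 40x_9 + 15x_{10} + 15x_{14} \\
& + 80x_{17} - 65x_{18} + 25x_{19} - 60x_{20} + 35x_{21} - 80x_{22} - 35x_{25} + 122
\end{aligned}
$$

2. Material balances at mixing/splitting points:

$$
\begin{aligned}
& x_1 - x_2 - x_4 = 0, && x_3 + x_5 - x_6 - x_{11} = 0, \\
& x_{13} - x_{19} - x_{21} = 0, && x_{17} - x_9 - x_{16} - x_{25} = 0, \\
& x_{11} - x_{12} - x_{15} = 0, && x_6 - x_7 - x_8 = 0, \\
& x_{23} - x_{20} - x_{22} = 0, && x_{23} - x_{14} - x_{24} = 0
\end{aligned}
$$

3. Specifications on the flows:

$$x_{10} - 0.8x_{17} \le 0, \qquad x_{10} - 0.4x_{17} \ge 0, \qquad x_{12} - 5x_{14} \le 0, \qquad x_{12} - 2x_{14} \ge 0$$

4. Disjunctions:

$$\text{Unit 1:}\ \begin{bmatrix} Y_1 \\ \exp(x_3) - 1 - x_2 \le 0 \\ c_1 = 5 \end{bmatrix} \vee \begin{bmatrix} \neg Y_1 \\ x_2 = x_3 = 0 \\ c_1 = 0 \end{bmatrix}$$

$$\text{Unit 2:}\ \begin{bmatrix} Y_2 \\ \exp(x_5/1.2) - 1 - x_4 \le 0 \\ c_2 = 8 \end{bmatrix} \vee \begin{bmatrix} \neg Y_2 \\ x_4 = x_5 = 0 \\ c_2 = 0 \end{bmatrix}$$

$$\text{Unit 3:}\ \begin{bmatrix} Y_3 \\ 1.5x_9 - x_8 + x_{10} = 0 \\ c_3 = 6 \end{bmatrix} \vee \begin{bmatrix} \neg Y_3 \\ x_8 = x_9 = x_{10} = 0 \\ c_3 = 0 \end{bmatrix}$$

$$\text{Unit 4:}\ \begin{bmatrix} Y_4 \\ 1.5(x_{12} + x_{14}) - x_{13} = 0 \\ c_4 = 10 \end{bmatrix} \vee \begin{bmatrix} \neg Y_4 \\ x_{12} = x_{13} = x_{14} = 0 \\ c_4 = 0 \end{bmatrix}$$

$$\text{Unit 5:}\ \begin{bmatrix} Y_5 \\ x_{15} - 2x_{16} = 0 \\ c_5 = 6 \end{bmatrix} \vee \begin{bmatrix} \neg Y_5 \\ x_{15} = x_{16} = 0 \\ c_5 = 0 \end{bmatrix}$$

$$\text{Unit 6:}\ \begin{bmatrix} Y_6 \\ \ldots \\ c_6 = 7 \end{bmatrix} \vee \begin{bmatrix} \neg Y_6 \\ x_{19} = x_{20} = 0 \\ c_6 = 0 \end{bmatrix}$$

$$\text{Unit 7:}\ \begin{bmatrix} Y_7 \\ \exp(x_{22}) - 1 - x_{21} \le 0 \\ c_7 = 4 \end{bmatrix} \vee \begin{bmatrix} \neg Y_7 \\ x_{21} = x_{22} = 0 \\ c_7 = 0 \end{bmatrix}$$

$$\text{Unit 8:}\ \begin{bmatrix} Y_8 \\ \exp(x_{18}) - 1 - x_{10} - x_{17} \le 0 \\ c_8 = 5 \end{bmatrix} \vee \begin{bmatrix} \neg Y_8 \\ x_{10} = x_{17} = x_{18} = 0 \\ c_8 = 0 \end{bmatrix}$$

5. Propositional logic, $\Omega(Y)$:

$$
\begin{aligned}
& Y_1 \Rightarrow Y_3 \vee Y_4 \vee Y_5, \qquad Y_2 \Rightarrow Y_3 \vee Y_4 \vee Y_5, \\
& Y_3 \Rightarrow Y_1 \vee Y_2, \qquad Y_3 \Rightarrow Y_8, \\
& Y_4 \Rightarrow Y_1 \vee Y_2, \qquad Y_4 \Rightarrow Y_6 \vee Y_7, \\
& Y_5 \Rightarrow Y_1 \vee Y_2, \qquad Y_5 \Rightarrow Y_8, \\
& Y_6 \Rightarrow Y_4, \qquad Y_7 \Rightarrow Y_4, \\
& Y_8 \Rightarrow Y_3 \vee Y_5 \vee (\neg Y_3 \wedge \neg Y_5)
\end{aligned}
$$

6. Specifications:

$$Y_1 \vee Y_2, \qquad Y_4 \vee Y_5, \qquad Y_6 \vee Y_7$$

7. Variables: $x_j, c_i \ge 0$, $Y_i \in \{\text{True}, \text{False}\}$, $i = 1, 2, \ldots, 8$, $j = 1, 2, \ldots, 25$.

Logic-Based Outer Approximation, Table 1
Progress of iterations

Subproblem    Objective value
NLPD1         73.277
NLPD2         103.584
NLPD3         113.789
MGDP          67.948
NLPD4         68.009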
Applying LOGMIP to solve this problem, and starting 
with three NLP subproblems at 


NLPD1: Y = Y3 = Y4 = Ys = Yg = True 
NLPD2: Y, = Y3 = Y4 = Y7 = Yg = True 
NLPD3: Y = Y4 = Yo = Y, = True


the predicted optimum solution is given by Z = 68.009. 
Table 1 shows the progress of the iterations. 
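The conversion of the logic relations Ω(Y) into linear inequalities $Ay \le a$, used to obtain (MIPDF), can be illustrated on the first proposition of the example. A standard translation of $Y_1 \Rightarrow Y_3 \vee Y_4 \vee Y_5$ for 0-1 variables is $y_1 \le y_3 + y_4 + y_5$. The sketch below assumes this translation and does not claim to reproduce the exact one used in [7]. It checks by enumeration that the clause and the inequality agree on all $2^8$ assignments. The helper names are hypothetical.

```python
from itertools import product


def implies_any(y, lhs, rhs):
    # Proposition Y_lhs => OR of Y_j for j in rhs (1-based indices, y is a 0-1 tuple).
    return (not y[lhs - 1]) or any(y[j - 1] for j in rhs)


def linear_form(y, lhs, rhs):
    # 0-1 translation of the same clause: y_lhs <= sum of y_j over rhs.
    return y[lhs - 1] <= sum(y[j - 1] for j in rhs)


# Compare the clause Y1 => Y3 v Y4 v Y5 with its inequality on every assignment of Y1..Y8.
agree = all(
    implies_any(y, 1, [3, 4, 5]) == linear_form(y, 1, [3, 4, 5])
    for y in product((0, 1), repeat=8)
)
print(agree)  # True
```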
Introduction


Let G(V, E) be a simple undirected graph with V = {1, 2, ..., n}. The adjacency matrix of G is the matrix $A_G = (a_{ij})_{n \times n}$, where $a_{ij} = 1$ if $(i, j) \in E$ and $a_{ij} = 0$ otherwise. The set of vertices adjacent to a vertex $i \in V$ is denoted by $N(i) = \{j \in V : (i, j) \in E\}$ and called the neighborhood of i. We also consider the complementary graph $\bar G(V, \bar E)$. It has the same vertex set V, and $(i, j) \in \bar E$ if and only if i and j are not adjacent in G.

An independent set S is a subset of V such that no two vertices of S are adjacent, i.e., $\forall i \in S: N(i) \cap S = \emptyset$. S is called a maximal independent set if every vertex $i \in V \setminus S$ has at least one adjacent vertex in S, i.e., $\forall i \in V \setminus S: N(i) \cap S \ne \emptyset$. Finally, S is called a maximum independent set if it has the largest cardinality among all independent sets of the graph. This cardinality is denoted by α(G) and called the independence (or stability) number of G.

In addition to maximum cardinality stable sets, we also consider maximum weight independent sets. Let $w = (w_1, w_2, \ldots, w_n)^T$ be a given vector of nonnegative vertex weights. A maximum weight independent set is an independent set $S \subseteq V$ with the largest weight $\alpha(G, w) = \max_S \sum_{i \in S} w_i$.

Similarly, a clique Q of the graph G is a subset of V such that any two vertices in it are adjacent, i.e., $\forall i \in Q: N(i) \cap Q = Q \setminus \{i\}$. The clique Q is called maximal if every vertex $i \in V \setminus Q$ is non-adjacent to at least one vertex in Q, i.e., $\forall i \in V \setminus Q: N(i) \cap Q \ne Q$. If Q has the largest cardinality among all cliques of the graph, it is called a maximum clique. The cardinality of a maximum clique is denoted by ω(G) and called the clique number of G. A maximum weight clique is a clique with the largest weight $\omega(G, w) = \max_Q \sum_{i \in Q} w_i$.

It is easy to see that independent sets of the graph G correspond to cliques of $\bar G$, and vice versa.
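This correspondence can be checked by brute force on a small graph. The sketch below computes α(C_5) and $\omega(\bar C_5)$ for the 5-cycle $C_5$ with vertices 0, …, 4. It uses exhaustive search, so it is practical only for very small n, and the helper names are hypothetical.

```python
from itertools import combinations


def complement(n, edges):
    # Edge list of the complementary graph on vertices 0..n-1.
    e = {frozenset(p) for p in edges}
    return [(i, j) for i, j in combinations(range(n), 2) if frozenset((i, j)) not in e]


def largest_subset(n, edges, want_edges):
    # Size of the largest vertex subset whose pairs are all edges (a clique, want_edges=True)
    # or all non-edges (an independent set, want_edges=False). Exhaustive search.
    e = {frozenset(p) for p in edges}
    for size in range(n, 0, -1):
        for s in combinations(range(n), size):
            if all((frozenset(p) in e) == want_edges for p in combinations(s, 2)):
                return size
    return 0


c5 = [(i, (i + 1) % 5) for i in range(5)]
alpha = largest_subset(5, c5, want_edges=False)
omega_bar = largest_subset(5, complement(5, c5), want_edges=True)
print(alpha, omega_bar)  # 2 2
```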

We denote by χ(G) the chromatic number of the graph G, i.e., the minimum number of colors needed to color the vertices so that no two adjacent vertices receive the same color. The number $\chi(\bar G)$ is the minimum number of cliques of G into which the vertex set V can be partitioned. It is also denoted by $\bar\chi(G)$ and called the clique partition number of G.

Next, for two graphs $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ we define their strong product $G_1 \cdot G_2$. Its vertex set is the Cartesian product $V_1 \times V_2$. Two distinct vertices (i, j) and (i′, j′) are adjacent if and only if either i = i′ or $(i, i') \in E_1$, and either j = j′ or $(j, j') \in E_2$. The strong product of k copies of G is denoted by $G^k$.


Formulation 


Lovasz Number as an Upper Bound 
of Shannon Capacity 


Let us consider the set V = {1, 2, ..., n} to be an alphabet in which adjacency means that two letters can be confused. Then any set of one-letter messages that cannot be confused with each other corresponds to an independent set of the graph, and vice versa. Furthermore, the maximum number of one-letter messages that cannot be confused with each other is α(G). The maximum number of k-letter messages that cannot be confused with each other is $\alpha(G^k)$. At least $\alpha(G)^k$ such k-letter messages exist, namely all k-letter words over a maximum independent set. Hence $\alpha(G^k) \ge \alpha(G)^k$, and therefore

$$\Theta(G) = \sup_k \sqrt[k]{\alpha(G^k)} = \lim_{k \to \infty} \sqrt[k]{\alpha(G^k)} \ge \alpha(G). \qquad (1)$$


The value Θ(G) is called the Shannon zero-error capacity of the graph G [14]. In general it is extremely hard to compute; at present Θ(G) is not even known for the graph $C_7$ (the cycle on 7 vertices).
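The strong product and the inequality $\alpha(G^k) \ge \alpha(G)^k$ can be illustrated on $C_5$. The sketch below builds $C_5 \cdot C_5$ using the strong-product adjacency rule: two distinct pairs are adjacent when each coordinate is equal or adjacent. It then computes the independence number by simple exact branching. For $C_5$ one has α(C_5) = 2 but $\alpha(C_5^2) = 5$, so (1) gives $\Theta(C_5) \ge \sqrt{5}$. The functions are hypothetical helpers written for this example.

```python
from itertools import product


def strong_product(n1, adj1, n2, adj2):
    # Adjacency sets of G1 . G2: distinct (i, j), (k, l) are adjacent iff
    # (i == k or i ~ k in G1) and (j == l or j ~ l in G2).
    verts = list(product(range(n1), range(n2)))

    def close(adj, a, b):
        return a == b or b in adj[a]

    return {
        v: {u for u in verts if u != v and close(adj1, v[0], u[0]) and close(adj2, v[1], u[1])}
        for v in verts
    }


def independence_number(adj):
    # Exact branching: either discard a vertex v, or take it and delete its closed neighborhood.
    def best(vs):
        if not vs:
            return 0
        v = next(iter(vs))
        return max(best(vs - {v}), 1 + best(vs - {v} - adj[v]))

    return best(frozenset(adj))


c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
g2 = strong_product(5, c5, 5, c5)
print(independence_number(c5), independence_number(g2))  # 2 5
```

The set {(i, 2i mod 5)} is one independent set of size 5 in $C_5^2$.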

Thus, the independence number α(G) gives a lower bound on Θ(G). In 1979, in his seminal paper [11], L. Lovász defined a new nontrivial upper bound on the Shannon zero-error capacity of a graph. This function was later named the Lovász number (or ϑ-function) of a graph.

First, define an orthonormal representation of the graph G as a system $(u_1, u_2, \ldots, u_n)$ of unit vectors in a Euclidean space such that $u_i$ is orthogonal to $u_j$ whenever vertices i and j are not adjacent. Such systems of vectors clearly exist; for example, any n orthonormal vectors in $\mathbb{R}^n$. The ϑ-function is defined as the following minimax value:

$$\vartheta(G) = \min_{c, \{u_i\}} \max_{i \in V} \frac{1}{(c^T u_i)^2}, \qquad (2)$$

where c ranges over unit vectors of the same dimension as the vectors $u_i$. Lovász called the vector c the handle of the representation.

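It can be shown that for a strong product of graphs, $\vartheta(G \cdot H) \le \vartheta(G)\,\vartheta(H)$. To show that $\alpha(G) \le \vartheta(G)$, observe the following. If S is a maximum independent set of G, then the vectors $u_i$, $i \in S$, are mutually orthogonal, so $1 = c^T c \ge \sum_{i \in S} (c^T u_i)^2 \ge \alpha(G)/\vartheta(G)$. It follows that $\Theta(G) \le \vartheta(G)$, since $\alpha(G^k) \le \vartheta(G^k) \le \vartheta(G)^k$.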
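For the 5-cycle $C_5$, Lovász's "umbrella" gives an explicit orthonormal representation in $\mathbb{R}^3$. Five unit vectors are arranged symmetrically around the handle c = (0, 0, 1). They are tilted so that the vectors of each non-adjacent pair (k and k + 2 mod 5) are orthogonal; the choice $(c^T u_k)^2 = \cos(\pi/5)/(1 + \cos(\pi/5))$ makes this orthogonality hold. The sketch below builds the representation and evaluates the objective of (2), which gives $\vartheta(C_5) \le \sqrt{5}$. Combined with $\alpha(C_5^2) = 5$, this is Lovász's result $\Theta(C_5) = \sqrt{5}$.

```python
import math

n = 5
cos_pi5 = math.cos(math.pi / 5)
h2 = cos_pi5 / (1 + cos_pi5)  # squared height (c^T u_k)^2 of every umbrella rib
s = math.sqrt(1 - h2)         # length of each rib's projection onto the plane
h = math.sqrt(h2)
u = [(s * math.cos(2 * math.pi * k / n), s * math.sin(2 * math.pi * k / n), h) for k in range(n)]
c = (0.0, 0.0, 1.0)           # the handle: the umbrella's axis


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Orthonormal representation of C5: unit vectors, and k, k + 2 (non-adjacent) orthogonal.
assert all(abs(dot(u[k], u[k]) - 1) < 1e-12 for k in range(n))
assert all(abs(dot(u[k], u[(k + 2) % n])) < 1e-12 for k in range(n))

bound = max(1 / dot(c, uk) ** 2 for uk in u)  # objective of (2) for this representation
print(bound, math.sqrt(5))  # both approximately 2.2360679...
```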

Similarly, we introduce the weighted ϑ-function:

$$\vartheta(G, w) = \min_{c, \{u_i\}} \max_{i \in V} \frac{w_i}{(c^T u_i)^2}, \qquad (3)$$

which gives the upper bound $\alpha(G, w) \le \vartheta(G, w)$.

Unlike Θ(G) and α(G, w), which are hard to compute, ϑ(G, w) can be computed to arbitrary precision in polynomial time, by either the ellipsoid method or an interior point method. This follows from its semidefinite programming formulation considered below (see also [7,9,13]). This makes the ϑ-function attractive for estimating these intractable numbers.


The Sandwich Theorem 


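Other equivalent formulations implying a number of interesting properties of ϑ(G) were established in [7] (see also the extensive survey [9]). To introduce them, let us define three specific convex sets in $\mathbb{R}^n$ associated with the graph:

$$
\begin{aligned}
\mathrm{STAB}(G) &= \operatorname{hull}\big(\{x \in \{0,1\}^n \mid x_j + x_k \le 1\ \ \forall (j, k) \in E\}\big), \\
\mathrm{TH}(G) &= \Big\{x \ge 0 \;\Big|\; \sum_{j \in V} (c^T u_j)^2 x_j \le 1\ \ \forall \text{ orthonormal representations } (u_j) \text{ of } G \text{ and } \|c\| = 1\Big\}, \\
\mathrm{QSTAB}(G) &= \Big\{x \ge 0 \;\Big|\; \sum_{j \in Q} x_j \le 1\ \ \forall \text{ cliques } Q \text{ of } G\Big\}.
\end{aligned}
$$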


Let $x^S \in \{0,1\}^n$ be the incidence vector of an independent set S, that is, $x^S_i = 1$ if $i \in S$ and $x^S_i = 0$ otherwise. Then for any orthonormal representation $(u_i)$ and unit vector c,

$$\sum_{j \in V} (c^T u_j)^2 x^S_j = \sum_{j \in S} (c^T u_j)^2 \le \|c\|^2 = 1.$$

Since TH(G) is convex, any $x \in \mathrm{STAB}(G)$ satisfies the constraints of TH(G). Now let Q be any clique of G. We can construct an orthonormal representation as follows. Let the vectors $(u_j)_{j \in V \setminus Q}$ be mutually orthogonal, and let each of them also be orthogonal to another unit vector c. Set all $(u_j)_{j \in Q}$ equal to c. If we consider the constraint $\sum_{j \in V} (c^T u_j)^2 x_j \le 1$ only over such orthonormal representations, we obtain the clique constraints defining QSTAB(G). Hence,

$$\mathrm{STAB}(G) \subseteq \mathrm{TH}(G) \subseteq \mathrm{QSTAB}(G). \qquad (4)$$
Obviously,

$$\alpha(G, w) = \max\{w^T x \mid x \in \mathrm{STAB}(G)\}. \qquad (5)$$

Let us also denote

$$\kappa(G, w) = \max\{w^T x \mid x \in \mathrm{QSTAB}(G)\}. \qquad (6)$$

We will prove that

$$\vartheta(G, w) = \max\{w^T x \mid x \in \mathrm{TH}(G)\} \qquad (7)$$

and hence conclude that

$$\alpha(G, w) \le \vartheta(G, w) \le \kappa(G, w). \qquad (8)$$


The double inequality (8) is the famous sandwich theorem.

Let us denote by $S_n$ the set of all n × n symmetric matrices, and by $S_n^+$ the set of all positive semidefinite n × n matrices:

$$S_n^+ = \{A \in S_n \mid x^T A x \ge 0\ \ \forall x \in \mathbb{R}^n\}.$$


We also denote by $z = (\sqrt{w_1}, \sqrt{w_2}, \ldots, \sqrt{w_n})^T$ the vector of square roots of the vertex weights. Consider the following functions of the graph and vertex weights:

$$\vartheta_2(G, w) = \min_{A \in S_n} \lambda_{\max}(A) \quad \text{s.t.}\ a_{ij} = \sqrt{w_i w_j}\ \ \forall (i, j) \notin E,$$

where $\lambda_{\max}(A)$ denotes the largest eigenvalue of A;

$$\vartheta_3(G, w) = \max_{X \in S_n^+} z^T X z \quad \text{s.t.}\ x_{ij} = 0\ \ \forall (i, j) \in E,\ \ \operatorname{tr}(X) = 1,$$

where $\operatorname{tr}(X) = \sum_{i=1}^n x_{ii}$ denotes the trace of the matrix X;

$$\vartheta_4(G, w) = \max \sum_{i \in V} (d^T v_i)^2 w_i,$$

where $(v_i)_{i \in V}$ ranges over all orthonormal representations of the complementary graph $\bar G$ and $\|d\| = 1$.
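The formulation $\vartheta_2$ can be evaluated by hand for the 5-cycle with unit weights. The sketch restricts A to symmetric circulant matrices, with 1 on the diagonal and on non-edges and a free value t on the edges. This restriction is an assumption of the sketch; by the cyclic symmetry of $C_5$ and the convexity of $\lambda_{\max}$, it loses nothing here. The eigenvalues of a symmetric circulant matrix with first row $(r_0, \ldots, r_{N-1})$ are $\sum_k r_k \cos(2\pi jk/N)$. Hence $\lambda_{\max}$ is a convex, piecewise-linear function of t and can be minimized by ternary search. The minimum should agree with Lovász's value $\vartheta(C_5) = \sqrt{5}$.

```python
import math

N = 5  # the cycle C5: vertex k is adjacent to k - 1 and k + 1 (mod 5)


def eigenvalues(t: float):
    # A is the symmetric circulant matrix with first row (1, t, 1, 1, t):
    # a_ij = sqrt(w_i w_j) = 1 on the diagonal and on non-edges (unit weights),
    # and the free entry t on the edges of C5.
    row = [1.0, t, 1.0, 1.0, t]
    # Eigenvalues of a symmetric circulant matrix: sum_k row[k] cos(2 pi j k / N).
    return [sum(row[k] * math.cos(2 * math.pi * j * k / N) for k in range(N)) for j in range(N)]


def lambda_max(t: float) -> float:
    return max(eigenvalues(t))


# lambda_max is convex in t (a maximum of affine functions), so ternary search applies.
lo, hi = -5.0, 5.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if lambda_max(m1) < lambda_max(m2):
        hi = m2
    else:
        lo = m1
t_best = (lo + hi) / 2
print(t_best, lambda_max(t_best))  # close to (sqrt(5) - 3) / 2 and sqrt(5) = 2.2360679...
```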


Theorem 1 


$$\vartheta(G, w) = \vartheta_2(G, w) = \vartheta_3(G, w) = \vartheta_4(G, w) = \max\{w^T x \mid x \in \mathrm{TH}(G)\}.$$


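Proof. First we show that $\vartheta(G, w) \le \vartheta_2(G, w)$. Consider a matrix $A \in S_n$ such that $a_{ij} = \sqrt{w_i w_j}$ for all $(i, j) \notin E$, and let $t = \lambda_{\max}(A)$. Then $tI - A \in S_n^+$, so there exists $X \in \mathbb{R}^{n \times n}$ such that $tI - A = X^T X$. Let $x_i \in \mathbb{R}^n$ be the i-th column of X. Then

$$x_i^T x_i = t - w_i \quad \forall i \in V$$

and

$$x_i^T x_j = -\sqrt{w_i w_j} \quad \forall i, j \text{ nonadjacent in } G.$$

Note that rank(X) < n, since the matrix $tI - A$ has a zero eigenvalue. Therefore there exists a unit vector $c \in \mathbb{R}^n$ orthogonal to all $x_i$, $i \in V$. Consider the vectors

$$u_i = \frac{\sqrt{w_i}\, c + x_i}{\sqrt{t}}, \quad i \in V.$$

It is easy to see that

$$u_i^T u_i = \frac{w_i\, c^T c + x_i^T x_i}{t} = \frac{w_i + t - w_i}{t} = 1,$$

and for any two nonadjacent vertices $i, j \in V$,

$$u_i^T u_j = \frac{\sqrt{w_i w_j}\, c^T c + x_i^T x_j}{t} = 0.$$

Hence the vectors $(u_i)$ form an orthonormal representation of G. Since $c^T u_i = \sqrt{w_i / t}$,

$$\vartheta(G, w) \le \max_{i \in V} \frac{w_i}{(c^T u_i)^2} = \max_{i \in V} \frac{w_i}{w_i / t} = t = \lambda_{\max}(A).$$

Since A was an arbitrary feasible matrix for $\vartheta_2$, this gives $\vartheta(G, w) \le \vartheta_2(G, w)$.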
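Now we show that $\vartheta_2(G, w) \le \vartheta_3(G, w)$. We have $z^T X z \le \vartheta_3 \operatorname{tr}(X)$ for any $X \in S_n^+$ such that $x_{ij} = 0$ for all $(i, j) \in E$. This inequality is equivalent to $(W - \vartheta_3 I) \bullet X \le 0$. Here $W = (\sqrt{w_i w_j})_{n \times n}$ and "•" denotes the Euclidean inner product in $\mathbb{R}^{n \times n}$, i.e., $A \bullet B = \sum_{i,j} a_{ij} b_{ij}$. From this one can infer that $\vartheta_3 I - W = D + A$ for some positive semidefinite matrix $D \in S_n^+$ and some symmetric matrix $A = (a_{ij}) \in S_n$ with $a_{ij} = 0$ whenever $(i, j) \notin E$. Then $\vartheta_3 I - W - A \in S_n^+$, and hence $\vartheta_3 \ge \lambda_{\max}(W + A) \ge \vartheta_2$.

Next, we show that $\vartheta_3(G, w) \le \vartheta_4(G, w)$. Let $X = (x_{ij}) \in \mathbb{R}^{n \times n}$ be an optimal matrix for the program defining $\vartheta_3$. Since $X \in S_n^+$, there exists a matrix $Y \in \mathbb{R}^{n \times n}$ such that $X = Y^T Y$. Let $x_i$ denote the i-th column of X and $y_i$ the i-th column of Y. Construct an orthonormal system of vectors $(v_i)_{i \in V}$ in $\mathbb{R}^n$ such that $v_i = y_i / \|y_i\|$ whenever $y_i \ne 0$. Since $y_i^T y_j = x_{ij} = 0$ for all $(i, j) \in E$, the system $(v_i)$ is an orthonormal representation of $\bar G$. Furthermore, $z^T Y^T Y z = z^T X z = \vartheta_3$, and hence $d = Yz / \sqrt{\vartheta_3}$ is a unit vector. Whenever $y_i \ne 0$,

$$d^T v_i = \frac{z^T Y^T y_i}{\sqrt{\vartheta_3}\, \|y_i\|} = \frac{z^T x_i}{\sqrt{\vartheta_3}\, \|y_i\|}.$$

Thus $\|y_i\|\, d^T v_i = z^T x_i / \sqrt{\vartheta_3}$ for all $i \in V$, and

$$\sum_{i \in V} \|y_i\| \sqrt{w_i}\, d^T v_i = \frac{1}{\sqrt{\vartheta_3}} \sum_{i \in V} \sqrt{w_i}\, z^T x_i = \frac{z^T X z}{\sqrt{\vartheta_3}} = \sqrt{\vartheta_3}.$$

Using the Cauchy-Schwarz inequality,

$$\vartheta_3 = \Big(\sum_{i \in V} \|y_i\| \sqrt{w_i}\, d^T v_i\Big)^2 \le \Big(\sum_{i \in V} \|y_i\|^2\Big)\Big(\sum_{i \in V} w_i (d^T v_i)^2\Big) = \operatorname{tr}(X) \sum_{i \in V} w_i (d^T v_i)^2 = \sum_{i \in V} w_i (d^T v_i)^2 \le \vartheta_4(G, w).$$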


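Next, we prove that $\vartheta_4(G, w) \le \max\{w^T x \mid x \in \mathrm{TH}(G)\}$. Let $(v_i)_{i \in V}$ be an optimal orthonormal representation of $\bar G$ for the program defining $\vartheta_4$, and let d be its handle. We show that the vector $((d^T v_i)^2)_{i \in V}$ belongs to TH(G). Consider some orthonormal representation $(u_i)_{i \in V}$ of G, $u_i \in \mathbb{R}^n$, and let $c \in \mathbb{R}^n$ be some unit vector. The matrices $u_i v_i^T \in \mathbb{R}^{n \times n}$ are mutually orthogonal and have unit norm with respect to the inner product "•", i.e.,

$$(u_i v_i^T) \bullet (u_j v_j^T) = (u_i^T u_j)(v_i^T v_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

Now we may conclude that

$$\sum_{i \in V} (c^T u_i)^2 (d^T v_i)^2 = \sum_{i \in V} \big((c d^T) \bullet (u_i v_i^T)\big)^2 \le (c d^T) \bullet (c d^T) = 1.$$

Hence $((d^T v_i)^2)_{i \in V} \in \mathrm{TH}(G)$ and

$$\vartheta_4(G, w) = \sum_{i \in V} w_i (d^T v_i)^2 \le \max\{w^T x \mid x \in \mathrm{TH}(G)\}.$$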


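The final step is to show that $\max\{w^T x \mid x \in \mathrm{TH}(G)\} \le \vartheta(G, w)$. Let $x^*$ be a vector maximizing $w^T x$ over TH(G). For any orthonormal representation $(u_i)_{i \in V}$ and unit vector c, we have

$$w^T x^* \le \Big(\max_{i \in V} \frac{w_i}{(c^T u_i)^2}\Big) \sum_{i \in V} (c^T u_i)^2 x_i^* \le \max_{i \in V} \frac{w_i}{(c^T u_i)^2}.$$

Minimizing the right-hand side over all orthonormal representations and handles gives $w^T x^* \le \vartheta(G, w)$.

The inequalities established above form a closed chain, $\vartheta \le \vartheta_2 \le \vartheta_3 \le \vartheta_4 \le \max\{w^T x \mid x \in \mathrm{TH}(G)\} \le \vartheta$, which can hold only if all of these quantities are equal. QED.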


Let κ(G) denote the value of κ(G, w) when all vertex weights equal 1. It is easy to see that $\kappa(G) \le \bar\chi(G)$. Indeed, take a minimum clique partition $\{Q_1, Q_2, \ldots, Q_k\}$ of G. Then $\bar\chi(G)$ equals the optimum value of the program

$$\max\Big\{\sum_{j \in V} x_j \;\Big|\; \sum_{j \in Q_i} x_j \le 1,\ i = 1, \ldots, k,\ x \ge 0\Big\}.$$

This program has the same objective as the program for κ(G), but its constraints are a subset of the constraints of QSTAB(G). We may therefore extend the sandwich inequality to

$$\alpha(G) \le \vartheta(G) \le \kappa(G) \le \bar\chi(G).$$


Omitting κ(G) and applying the inequalities to the complementary graph, we obtain

$$\omega(G) \le \vartheta(\bar G) \le \chi(G). \qquad (9)$$

This is another famous form of the sandwich theorem. It states that the polynomial-time computable number $\vartheta(\bar G)$ lies between two NP-hard numbers: the clique number and the chromatic number.


Lovasz Number as a Dual Bound 
of Quadratic Maximization 


Consider the following quadratic formulation of the maximum weight independent set problem:

$$\alpha(G, w) = \max\ w^T x \quad \text{s.t.}\ x_i x_j = 0\ \ \forall (i, j) \in E, \qquad x_i^2 - x_i = 0,\ i = 1, \ldots, n. \qquad (10)$$

N.Z. Shor showed that the optimal Lagrangian dual bound of program (10) is equal to ϑ(G, w) [15]. Consider a maximization problem with a quadratic objective subject to linear and quadratic constraints. Its dual bound can be computed by minimizing a convex nondifferentiable function over a parametric set of negative semidefinite symmetric matrices that depend linearly on the Lagrange multipliers.
Indeed, the Lagrangian of program (10) is

$$L(x, \lambda) = w^T x + \sum_{i=1}^n \lambda_i (x_i^2 - x_i) + \sum_{(i,j) \in E} \lambda_{ij} x_i x_j.$$

The considered optimization problem is equivalent to

$$\max_x \min_\lambda L(x, \lambda),$$

while the optimal Lagrangian dual bound is derived as

$$\min_\lambda \max_x L(x, \lambda).$$

It follows that in the dual problem, λ must always be chosen so that the quadratic form of $L(x, \lambda)$ is negative semidefinite with respect to x. Otherwise the inner maximization with respect to x gives infinity. The minimization with respect to λ then turns out to be a convex nondifferentiable problem [15].


Applications 
Perfect Graphs 


A graph is called perfect if, for all its vertex-induced subgraphs, the clique number equals the chromatic number. In this case the inequalities (9) become equalities:

$$\omega(G) = \vartheta(\bar G) = \chi(G).$$


So both the clique number and the chromatic number of a perfect graph can be computed in polynomial time by means of the ϑ-function. The ϑ-function can also be used to find an actual maximum clique of a perfect graph. A vertex i of a perfect graph G belongs to some maximum clique if and only if the clique number of the subgraph induced by its neighborhood N(i) equals ω(G) − 1. That clique number is obtained as ϑ of the complement of the subgraph. We can therefore build a maximum clique by repeatedly selecting a vertex that satisfies this condition and continuing with the subgraph induced by its neighborhood. This simple algorithm can be improved [1,17]. Coloring a perfect graph can also be performed in polynomial time (see, e.g., [7]).
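The vertex-selection procedure can be sketched as follows. The clique-number oracle is passed in as a parameter. In a perfect graph the oracle would be the polynomial-time value $\vartheta(\bar H)$ of the current subgraph H. To keep the sketch self-contained it uses a brute-force stand-in instead, which takes exponential time and suits small graphs only. With an exact oracle the procedure is correct for any graph; perfectness is what makes the oracle cheap. The helper names and the small chordal test graph (two triangles joined by an edge; chordal graphs are perfect) are hypothetical choices for this example.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Set

Graph = Dict[int, Set[int]]


def induced(g: Graph, vs: FrozenSet[int]) -> Graph:
    # Subgraph of g induced by the vertex set vs.
    return {v: g[v] & vs for v in vs}


def brute_force_clique_number(g: Graph) -> int:
    # Test-only stand-in for the theta-based oracle of a perfect graph.
    vs = list(g)
    for size in range(len(vs), 0, -1):
        for s in combinations(vs, size):
            if all(b in g[a] for a, b in combinations(s, 2)):
                return size
    return 0


def max_clique(g: Graph, omega: Callable[[Graph], int]) -> Set[int]:
    clique: Set[int] = set()
    current = g
    while current:
        target = omega(current)
        # Select a vertex whose neighborhood still contains a clique of size target - 1,
        # then continue inside that neighborhood.
        for v in current:
            nbhd = induced(current, frozenset(current[v]))
            if omega(nbhd) == target - 1:
                clique.add(v)
                current = nbhd
                break
    return clique


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
g: Graph = {v: set() for v in range(6)}
for a, b in edges:
    g[a].add(b)
    g[b].add(a)
print(sorted(max_clique(g, brute_force_clique_number)))  # [0, 1, 2]
```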

A graph is perfect if and only if its complementary graph is perfect [7,10]. Consequently, every vertex-induced subgraph of a perfect graph also has its independence number equal to its clique partition number. The strong perfect graph theorem, proved recently [5], states that a graph is perfect if and only if it contains neither an odd hole nor an odd antihole as a vertex-induced subgraph. A polynomial-time algorithm for recognizing perfect graphs was also derived from the strong perfect graph theorem [4].


Improving Upper Bounds 
for Independence Number 


It is worth considering how well ϑ(G) approximates the independence number α(G) for general graphs. Another question is whether this approximation can be improved while keeping polynomial-time computability. For random graphs, ϑ(G)/α(G) grows as $O(\sqrt{n}/\log n)$ [2,9], so ϑ(G) does not give a fixed approximation guarantee for α(G). Moreover, the maximum independent set problem is known to be hard to approximate (see, e.g., [8]).

There are, however, several approaches to formulating increasingly tight approximations of α(G) based on the ϑ-function:
- the "lift-and-project" method of Lovász and Schrijver [12];
- expressing α(G) as a copositive program and approximating it via semidefinite programming [6];
- improving the dual bound of (10) and bringing it closer to the optimum by generating superfluous quadratic constraints [15,16].

The first two approaches produce a sequence of semidefinite programs that increase in size and have nonincreasing optimum values. At some point the optimum value becomes equal to α(G). Unsurprisingly, in the general case the size of the program grows exponentially before it reaches α(G); otherwise P = NP. More surprisingly, any provable polynomial-time improvement of ϑ(G) would also imply P = NP [3]. Here an improvement means a polynomial-time computable graph function whose value is less than ϑ(G) whenever α(G) < ϑ(G). Hence, unless P = NP, neither method can deliver, in general, a value closer to α(G) than ϑ(G) before the program becomes exponentially large.


See also 


> Copositive Programming 
Background


The interval-Newton method is a tool for solving a sys- 
tem of nonlinear algebraic equations. It provides the 
capability to enclose all solutions of the equation sys- 
tem that occur within a specified search interval, and 
to do so with mathematical and computational cer- 
tainty. In the context of optimization, it is generally ap- 
plied to the deterministic global optimization of bound- 
constrained problems: 


$$\min_x\ \phi(x) \qquad (1)$$
$$\text{s.t.}\ x \in X^{(0)}. \qquad (2)$$


The objective φ(x) is in general a nonconvex function that may have multiple local minima. The interval vector (box) $X^{(0)}$ provides upper and lower bounds on each component of the decision-variable vector x. These bounds are assumed here to be wide enough that the global minimum of φ(x) lies in the interior of $X^{(0)}$. The stationarity condition ∇φ(x) = 0 can then be used in the search for the global minimum. The interval-Newton method can be applied to enclose all stationary points, one of which is the global minimizer.

If only the global minimizer is sought, interval-Newton is typically combined with a branch-and-bound scheme, so that not all stationary points need actually be found. In other applications, however, it may be useful to know all of the stationary points, and the interval-Newton approach provides this capability. Examples include transition state analysis [22,38] and the computation of phase equilibrium [13,37]. If the global minimum may lie on the boundary of $X^{(0)}$, the "peeling" process described by Kearfott [19] can be used. In this process, interval-Newton is applied to each of the lower-dimensional subspaces that make up the boundary of $X^{(0)}$. For more general constrained problems, the interval-Newton method can be
applied to the solution of the Karush-Kuhn-Tucker 
(KKT) conditions or the Fritz-John conditions. A thor- 
ough discussion of the use of the interval-Newton 
approach in global optimization has been given by 
Hansen and Walster [11]. In recent years, this approach 
has been used in a number of applications, includ- 
ing computation of fluid phase equilibrium from activ- 
ity coefficient models [27,36,37,40], cubic equation-of 
state models [5,12,13,14,40] and statistical associating 
fluid theory [39], computation of solid-fluid equilib- 
rium [35,41], parameter estimation using standard least 
squares [7,25] and error-in-variables [8,9], calculation 
of adsorption in nanoscale pores from a density func- 
tion theory model [26], transition state analysis [22] 
and determination of molecular structures [24]. 

A drawback to the interval-Newton approach, as 
well as to other approaches for deterministic global op- 
timization, is the potentially high computational cost. 
One way to improve the efficiency of the interval- 
Newton method is to more tightly bound the solution 
set of the linear interval equation system that is at the 
core of this approach. In this article, we discuss the solu- 
tion of this linear interval system and describe a bound- 
ing strategy [21,23] based on the use of linear program- 
ming (LP) techniques. Using this approach it is possi- 
ble to exactly (within round out) determine the desired 
bounds on the solution set of the linear interval system. 
By providing tight interval bounds on the solution set, 
the goal is to more quickly contract intervals that may 
contain stationary points, as well as to more quickly 
identify intervals that contain a unique stationary point 
or no stationary point, thus leading to an overall im- 
provement in computational efficiency. 


Methods 
Interval-Newton 


Several good introductions to interval computations 
are available [11,17,19,28]. A real interval X is defined 
as the set of real numbers lying between (and including) 
given upper and lower bounds; that is, $X = [a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}$. A real interval vector $X = (X_1, X_2, \ldots, X_n)^T$ has n real interval components and can be interpreted geometrically as an n-dimensional rectangle or box. In this context uppercase quantities are intervals and lowercase quantities are real numbers. Basic arithmetic operations with intervals are defined by $X \mathbin{\mathrm{op}} Y = \{x \mathbin{\mathrm{op}} y \mid x \in X, y \in Y\}$, where op ∈ {+, −, ×, ÷}. Interval versions of the elementary functions can be defined similarly. It should be
emphasized that, when machine computations with in- 
terval arithmetic operations are done, as in the proce- 
dures outlined below, the endpoints of an interval are 
computed with a directed (outward) rounding. That 
is, the lower endpoint is rounded down to the next 
machine-representable number and the upper end- 
point is rounded up to the next machine-representable 
number. In this way, through the use of interval, as op- 
posed to floating-point arithmetic, any potential round- 
ing error problems are avoided and rigorous enclosures 
are maintained. Implementations of interval arithmetic 
and elementary functions are readily available, and re- 
cent compilers from Sun Microsystems directly support 
interval arithmetic and an interval data type. 
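The outward-rounding idea can be sketched in Python. This is a minimal illustration, not a production interval library: instead of switching the hardware rounding mode, it computes each endpoint with ordinary round-to-nearest and then moves it one unit in the last place outward with `math.nextafter` (Python 3.9 or later), which gives a valid, though occasionally slightly wider, enclosure. The `Interval` class and its methods are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower endpoint exceeds upper endpoint")

    @staticmethod
    def _outward(lo, hi):
        # Python floats round to nearest, so push each computed endpoint one
        # ulp outward; the exact result of +, -, *, / then lies inside.
        return Interval(math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval._outward(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval._outward(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        q = (self.lo / other.lo, self.lo / other.hi,
             self.hi / other.lo, self.hi / other.hi)
        return Interval._outward(min(q), max(q))

    def __contains__(self, x):
        return self.lo <= x <= self.hi
```

For example, `Interval(0.1, 0.1) + Interval(0.2, 0.2)` is a narrow interval that contains the floating-point sum `0.1 + 0.2`, and dividing by an interval that contains zero raises an error instead of silently producing an unbounded result.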

For an arbitrary function f(x), the interval extension, F(X), encloses all values of f(x) for $x \in X$; that is, it encloses the range of f(x) over X. It is often computed by substituting the given interval X into the function f(x) and then evaluating the function using interval arithmetic. This so-called “natural” interval extension may be wider than the actual range of function values, though it always includes the actual range. For the case in which the function is a single-use expression, that is, an expression in which each variable occurs only once, natural interval arithmetic will always yield the true function range. For cases in which the function cannot be rearranged into such a single-use form, there are a variety of other approaches that can be used to try to tighten interval extensions [11,17,19,28,29].
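To illustrate the overestimation of the natural extension, consider the hypothetical function $f(x) = x^2 - 2x$ on $X = [0, 2]$, whose true range is $[-1, 0]$. The sketch below evaluates it in two algebraically equivalent forms with plain interval operations on `(lo, hi)` tuples; outward rounding is omitted for brevity, and the helper names are illustrative only.

```python
# Minimal interval operations on (lo, hi) tuples; outward rounding is omitted.
def isub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def imul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def isqr(a):
    """Range of t**2 over a: an elementary function, so t is used once."""
    lo, hi = a
    if lo >= 0.0:
        return (lo * lo, hi * hi)
    if hi <= 0.0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


X = (0.0, 2.0)
# Natural extension of f(x) = x*x - 2*x: x occurs three times.
natural = isub(imul(X, X), imul((2.0, 2.0), X))
# Same function written as the single-use expression (x - 1)**2 - 1.
single_use = isub(isqr(isub(X, (1.0, 1.0))), (1.0, 1.0))
print(natural, single_use)  # (-4.0, 4.0) (-1.0, 0.0)
```

The natural extension is four times wider than the true range because the occurrences of x are treated as independent; the single-use form, with a dedicated square function, recovers the range exactly.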

Of interest here is the interval-Newton method and its application to the stationarity condition $\nabla\phi(x) = 0$. Given an $n \times n$ nonlinear equation system $f(x) = \nabla\phi(x) = 0$ with a finite number of real roots in some initial interval, this technique provides the capability to find tight enclosures of all the roots of the system that lie within the given initial interval. An outline of the interval-Newton methodology is given here. More details are available elsewhere [11,19,34]. It should be emphasized that this technique is not equivalent to simply implementing the routine “point” Newton method in interval arithmetic.

Given some initial interval $X^{(0)}$, the interval-Newton algorithm is applied to a sequence of subintervals, which arises due to a bisection process. Consider a subinterval $X^{(k)}$ in the sequence. Before application of interval-Newton, measures are usually taken first to try to eliminate, or at least shrink, this subinterval. For example, one may apply a function range test. An interval extension $F(X^{(k)})$ of the function f(x) is calculated. If any component of the interval extension $F(X^{(k)})$ does not include zero, then the interval can be discarded, since no solution of f(x) = 0 can exist in this interval. The next subinterval in the sequence may then be considered. Otherwise, testing of $X^{(k)}$ continues. A variety of other interval-based techniques (e.g., constraint propagation) may also be applied to try to shrink $X^{(k)}$ before proceeding to the interval-Newton procedure.

In the interval-Newton procedure, the linear interval equation system

$$F'(X^{(k)})\,\bigl(N^{(k)} - x^{(k)}\bigr) = -f(x^{(k)}) \qquad (3)$$

is solved for a new interval $N^{(k)}$, where $F'(X^{(k)})$ is an interval extension of the Jacobian of f(x), and $x^{(k)}$ is an arbitrary point in $X^{(k)}$. It has been shown [11,19,28] that any root contained in $X^{(k)}$ is also contained in the image $N^{(k)}$. This implies that if the intersection between $X^{(k)}$ and $N^{(k)}$ is empty, then no root exists in $X^{(k)}$, and also suggests the iteration scheme $X^{(k+1)} = X^{(k)} \cap N^{(k)}$. In addition, it has also been shown [11,19,28] that, if $N^{(k)}$ is in the interior of $X^{(k)}$, then there is a unique root contained in $X^{(k)}$ and thus in $N^{(k)}$. Thus, after computation of $N^{(k)}$ from Eq. (3), there are three possibilities: (1) $X^{(k)} \cap N^{(k)} = \emptyset$, meaning there is no root in the current interval $X^{(k)}$ and it can be discarded; (2) $N^{(k)}$ is in the interior of $X^{(k)}$, meaning that there is exactly one root in the current interval $X^{(k)}$; (3) neither of the above, meaning that no conclusion can be drawn. In the last case, if $X^{(k)} \cap N^{(k)}$ is sufficiently smaller than $X^{(k)}$, then the interval-Newton test can be reapplied to the resulting intersection, $X^{(k+1)} = X^{(k)} \cap N^{(k)}$. Otherwise, the intersection $X^{(k)} \cap N^{(k)}$ is bisected, and the resulting two subintervals are added to the sequence of subintervals to be tested. If an interval containing a unique root has been identified, then this root can be tightly enclosed by continuing the interval-Newton iteration, which will converge quadratically to a desired tolerance (on the enclosure diameter).
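A one-dimensional version of the interval-Newton test can be sketched as follows. It is a simplified illustration under stated assumptions: f is scalar, the interval derivative `dF` is supplied by the caller, $x^{(k)}$ is taken as the midpoint, the case $0 \in F'(X^{(k)})$ (which needs extended interval division) is simply reported as inconclusive, and rounding errors are ignored, so the result is not rigorous. `newton_step` is a hypothetical name.

```python
def newton_step(f, dF, lo, hi):
    """One interval-Newton test on the scalar interval X = [lo, hi].

    f  : real function
    dF : interval extension of f', mapping (lo, hi) -> (lo, hi)
    Returns (outcome, box), where box is X ∩ N, or None if it is empty.
    """
    d_lo, d_hi = dF(lo, hi)
    if d_lo <= 0.0 <= d_hi:
        # 0 in F'(X): extended interval division would be needed here
        return "inconclusive", (lo, hi)
    x = 0.5 * (lo + hi)                     # the real point x^(k): midpoint of X
    fx = f(x)
    q = (fx / d_lo, fx / d_hi)              # f(x) / F'(X), with f(x) a real number
    n_lo, n_hi = x - max(q), x - min(q)     # N = x - f(x) / F'(X)
    if n_hi < lo or n_lo > hi:
        return "no root", None              # outcome (1): X ∩ N is empty
    if lo < n_lo and n_hi < hi:
        return "unique root", (n_lo, n_hi)  # outcome (2): N in the interior of X
    return "inconclusive", (max(lo, n_lo), min(hi, n_hi))  # outcome (3)
```

For $f(x) = x^2 - 2$ with `dF = lambda lo, hi: (2 * lo, 2 * hi)`, the test on $X = [1, 2]$ returns outcome (2) with $N = [1.375, 1.4375]$, which encloses $\sqrt{2}$; on $X = [2, 3]$ it returns outcome (1).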

At termination, when the subintervals in the sequence have all been tested, either all the real roots of
f(x) = 0 have been tightly enclosed, or it is determined
that no root exists. Applied to nonlinear equation solv- 
ing problems, this can be regarded as a type of branch- 
and-prune scheme on a binary tree. It should be em- 
phasized that the enclosure, existence, and uniqueness 
properties discussed above, which are the basis of the 
method, can be derived without making any strong as- 
sumptions about the function f(x) for which roots are 
sought. The function must have a finite number of roots 
over the search interval of interest; however, no spe- 
cial properties such as convexity or monotonicity are 
required, and f(x) may have transcendental terms. 
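Combining the range test, the interval-Newton test and bisection gives a small branch-and-prune solver. The sketch below is again one-dimensional and non-rigorous (no outward rounding, no extended division); the helper names, the stopping tolerance `tol`, and the rule "reapply the Newton test if the box at least halved, otherwise bisect" are assumptions chosen for illustration.

```python
def isqr(lo, hi):
    """Range of t**2 over [lo, hi] (rounding ignored)."""
    if lo >= 0.0:
        return lo * lo, hi * hi
    if hi <= 0.0:
        return hi * hi, lo * lo
    return 0.0, max(lo * lo, hi * hi)


def enclose_roots(f, F, dF, lo, hi, tol=1e-10):
    """Enclose all roots of a scalar f in [lo, hi] by branch and prune.

    F and dF are interval extensions of f and f', mapping (lo, hi) -> (lo, hi).
    Returns a list of (lo, hi, unique); unique=True means an interval-Newton
    test placed N in the interior of the box (exact arithmetic assumed).
    """
    stack, found = [(lo, hi, False)], []
    while stack:
        lo, hi, unique = stack.pop()
        if hi - lo < tol:                        # narrow enough: report it
            found.append((lo, hi, unique))
            continue
        r_lo, r_hi = F(lo, hi)
        if r_lo > 0.0 or r_hi < 0.0:             # range test: 0 not in F(X)
            continue
        d_lo, d_hi = dF(lo, hi)
        if not (d_lo <= 0.0 <= d_hi):            # interval-Newton test
            x = 0.5 * (lo + hi)
            fx = f(x)
            q = (fx / d_lo, fx / d_hi)
            n_lo, n_hi = x - max(q), x - min(q)
            if n_hi < lo or n_lo > hi:           # X ∩ N empty: discard X
                continue
            unique = unique or (lo < n_lo and n_hi < hi)
            new_lo, new_hi = max(lo, n_lo), min(hi, n_hi)
            if new_hi - new_lo <= 0.5 * (hi - lo):
                stack.append((new_lo, new_hi, unique))   # reapply to X ∩ N
                continue
            lo, hi = new_lo, new_hi              # too little progress: bisect
        mid = 0.5 * (lo + hi)
        stack += [(lo, mid, False), (mid, hi, False)]
    return found
```

Applied to $f(x) = x^2 - 2$ on $[-3, 3]$, with `F = lambda lo, hi: tuple(v - 2.0 for v in isqr(lo, hi))` and `dF = lambda lo, hi: (2.0 * lo, 2.0 * hi)`, the sketch returns two narrow enclosures, one around each of $\pm\sqrt{2}$, both flagged as unique.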


Solution of Linear Interval Equation Systems 


Clearly, the solution of the linear interval system given by Eq. (3) is essential to the interval-Newton approach. To see the issues involved in solving such a system, consider the general linear interval system Az = B, where the matrix A and the right-hand side vector B are interval-valued. The solution set S of this system is defined by $S = \{z \mid \tilde{A} z = \tilde{b},\ \tilde{A} \in A,\ \tilde{b} \in B\}$. However, in general this set is not an interval and may have a very complex polyhedral (in general nonconvex) geometry. Thus to “solve” the linear interval system, one instead seeks an interval Z containing S. Computing the interval hull (the tightest interval containing S) is NP-hard [33], but there are several methods for determining an interval Z that contains but overestimates S. Various interval-Newton methods differ in how they solve Eq. (3) for $N^{(k)}$ and thus in the tightness with which the solution set is enclosed. By obtaining bounds that are as tight as possible, the overall performance of the interval-Newton approach can be improved, since with a smaller $N^{(k)}$ the volume of $X^{(k)} \cap N^{(k)}$ is reduced, and it is also more likely that either $X^{(k)} \cap N^{(k)} = \emptyset$ or $N^{(k)}$ lies in the interior of $X^{(k)}$. Thus, intervals that may contain solutions of the nonlinear equation system are more quickly contracted, and intervals that contain no solution or that contain a unique solution may be more quickly identified, all of which leads to a likely reduction in the number of bisections needed.
Frequently, $N^{(k)}$ is computed component-wise using an interval Gauss-Seidel approach, preconditioned with an inverse-midpoint matrix. Though the inverse-midpoint preconditioner is a good general-purpose preconditioner, it is not always the most effective approach [18,19]. A hybrid preconditioning approach (HP/RP) [10], which combines a simple pivoting preconditioner with the standard inverse-midpoint scheme, has been shown to be significantly more efficient than the inverse-midpoint preconditioner alone on some applications. However, it still may not yield the tightest enclosure of the solution set, which, as noted above, is in general an NP-hard problem. Nevertheless, it is possible, using an LP-based strategy, to compute exact component-wise bounds on the solution set required in the context of the interval-Newton method, while avoiding exponential time complexity. This method is described next.
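The inverse-midpoint preconditioned interval Gauss-Seidel step can be sketched as follows for a general system Az = B with an initial box for z; in the interval-Newton setting, A is $F'(X^{(k)})$, B is $-f(x^{(k)})$ and the box is $X^{(k)} - x^{(k)}$. This is an illustration only: it uses NumPy, ignores rounding errors, skips any component whose preconditioned diagonal entry contains zero, and does not implement the HP/RP hybrid preconditioner. The function names are hypothetical.

```python
import numpy as np


def mr_product(Y, lo, hi):
    """Y @ [lo, hi] for a real matrix Y and an interval matrix or vector.

    Midpoint-radius form: Y [c - r, c + r] = [Y c - |Y| r, Y c + |Y| r],
    which is exact in real arithmetic (rounding is ignored in this sketch).
    """
    c, r = (lo + hi) / 2, (hi - lo) / 2
    m, s = Y @ c, np.abs(Y) @ r
    return m - s, m + s


def imul(a_lo, a_hi, b_lo, b_hi):
    p = (a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi)
    return min(p), max(p)


def gauss_seidel_sweep(A_lo, A_hi, b_lo, b_hi, z_lo, z_hi):
    """One preconditioned interval Gauss-Seidel sweep for A z = b, z in the box.

    Returns the contracted box (z_lo, z_hi), or None if it becomes empty.
    """
    Y = np.linalg.inv((A_lo + A_hi) / 2)        # inverse-midpoint preconditioner
    M_lo, M_hi = mr_product(Y, A_lo, A_hi)      # Y A
    c_lo, c_hi = mr_product(Y, b_lo, b_hi)      # Y b
    z_lo, z_hi = np.array(z_lo, dtype=float), np.array(z_hi, dtype=float)
    n = len(z_lo)
    for i in range(n):
        if M_lo[i, i] <= 0.0 <= M_hi[i, i]:
            continue  # extended division would be needed; leave z_i as is
        num_lo, num_hi = c_lo[i], c_hi[i]
        for j in range(n):
            if j != i:
                p_lo, p_hi = imul(M_lo[i, j], M_hi[i, j], z_lo[j], z_hi[j])
                num_lo, num_hi = num_lo - p_hi, num_hi - p_lo
        # divide by (Y A)_ii, which does not contain zero: multiply by its inverse
        q_lo, q_hi = imul(num_lo, num_hi, 1.0 / M_hi[i, i], 1.0 / M_lo[i, i])
        new_lo, new_hi = max(z_lo[i], q_lo), min(z_hi[i], q_hi)
        if new_lo > new_hi:
            return None                          # no solution in the box
        z_lo[i], z_hi[i] = new_lo, new_hi        # updated value is used at once
    return z_lo, z_hi
```

Because each updated component is used immediately in the following rows, a single sweep can contract the box noticeably even from a very loose starting box.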


LP Strategy for Interval-Newton Method 


Many types of methods have been proposed for bounding the solution set of a system of linear interval equations. One such method is based on the use of LP techniques [1,3,15,17]. Consider again the linear interval system Az = B. Oettli and Prager [31] showed that the solution set S is determined by the constraints:

$$|\check{A} z - \check{B}| \le \Delta A\, |z| + \Delta B, \qquad (4)$$

where $\check{A}$ is the component-wise midpoint matrix of the interval matrix A, $\Delta A$ is the component-wise half-width (radius) matrix of A, and similarly $\check{B}$ and $\Delta B$ are the midpoint and radius of B. Eq. (4) is not directly useful for computing bounds on the solution set because of the absolute value operation on the right-hand side. In general, the solution may lie in all $2^n$ orthants for an n-dimensional problem. In each orthant, each component of z keeps a constant sign, and thus the absolute value can be dropped. For a given orthant, define the diagonal matrix $D_z$ by


$$(D_z)_{jj} = \begin{cases} 1, & z_j \ge 0, \\ -1, & z_j < 0, \end{cases} \qquad j = 1, 2, \dots, n. \qquad (5)$$


Thus $|z| = D_z z$ and $z = D_z |z|$. Eq. (4) becomes:

$$|\check{A} z - \check{B}| \le \Delta A\, D_z z + \Delta B. \qquad (6)$$


This can be rearranged to the set of linear inequalities

$$(\check{A} - \Delta A\, D_z)\, z \le \overline{B}, \qquad -(\check{A} + \Delta A\, D_z)\, z \le -\underline{B}, \qquad (7)$$

where the underline and overline denote lower and upper interval bounds, respectively. To determine the tightest interval enclosing the solution set, one can then solve, in each orthant, the set of 2n optimization problems


$$\max_z\ z_j, \qquad j = 1, 2, \dots, n, \qquad (8)$$

$$\min_z\ z_j, \qquad j = 1, 2, \dots, n, \qquad (9)$$


each with the 2n linear inequality constraints given by Eq. (7). These can be solved using linear programming (LP) techniques. However, in general, there are $2^n$ orthants and so the solution time complexity will be exponential, as expected since this problem is known to be NP-hard.
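For small n, the orthant-by-orthant procedure can be sketched with SciPy's `linprog` (HiGHS backend). This is an illustration of Eqs. (5) to (9), not the LISS_LP implementation: it uses floating-point LP solutions without rigorous error bounds, assumes the interval matrix is regular so that every LP is bounded, and enumerates all $2^n$ orthants. The function name is hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def hull_by_orthants(A_lo, A_hi, b_lo, b_hi):
    """Interval hull of {z : A z = b, A in [A_lo, A_hi], b in [b_lo, b_hi]}.

    Solves Eqs. (8)-(9) subject to Eq. (7) in each of the 2^n orthants, so
    the cost grows exponentially with n. Returns (lo, hi); both stay at
    +inf / -inf if the solution set is empty.
    """
    A_c, A_r = (A_lo + A_hi) / 2, (A_hi - A_lo) / 2    # midpoint and radius
    n = A_c.shape[1]
    lo, hi = np.full(n, np.inf), np.full(n, -np.inf)
    for signs in itertools.product((1.0, -1.0), repeat=n):
        D = np.diag(signs)                               # D_z of Eq. (5)
        G = np.vstack([A_c - A_r @ D, -(A_c + A_r @ D)]) # Eq. (7)
        g = np.concatenate([b_hi, -b_lo])
        box = [(0.0, None) if s > 0 else (None, 0.0) for s in signs]
        for j in range(n):
            e_j = np.zeros(n)
            e_j[j] = 1.0
            for sense in (1.0, -1.0):                    # min z_j, then max z_j
                res = linprog(sense * e_j, A_ub=G, b_ub=g, bounds=box,
                              method="highs")
                if res.status == 2:                      # S misses this orthant
                    continue
                if res.status != 0:
                    raise RuntimeError(res.message)
                lo[j] = min(lo[j], res.x[j])
                hi[j] = max(hi[j], res.x[j])
    return lo, hi
```

For the interval system with $A = \begin{pmatrix}[2,4] & [-2,1]\\ [-1,2] & [2,4]\end{pmatrix}$ and $B = ([-2,2], [-2,2])^T$, whose solution set is nonconvex, the sketch returns the hull $[-4, 4] \times [-4, 4]$.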

In the context of the interval-Newton method, however, the exponential time complexity can be avoided. This is because only that part of the solution set of Eq. (3) that intersects $X^{(k)}$ needs to be found. Consider the choice of the real point $x^{(k)}$ in Eq. (3). Here $x^{(k)}$ is an arbitrary point in $X^{(k)}$, typically taken to be the midpoint. However, if $x^{(k)}$ is chosen to be a corner of $X^{(k)}$ instead, then the part of the solution set for $N^{(k)} - x^{(k)}$ of Eq. (3) that intersects $X^{(k)} - x^{(k)}$ lies in just one orthant, because every vector in $X^{(k)} - x^{(k)}$ then has components of fixed sign. Thus, in the context of interval-Newton, only 2n LP subproblems, each with 2n constraints, need to be solved. Furthermore, the LP subproblems have properties that can be exploited. First, all the 2n subproblems share the same constraints; that is, they all have the same feasible region. Thus, an initial feasible basis for the LP subproblems needs to be found only once. Second, the objective function of each subproblem consists of just one variable. This makes the problem much simpler since it is not necessary, as it is in the general case, to calculate the gain in objective value when choosing variables to enter and exit the basis.
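The single-orthant variant can be sketched in the same way. Here the right-hand side $-f(x^{(k)})$ is a real vector, the corner is chosen by the caller (the heuristic corner selection of [23] is not reproduced), and the variable bounds keep $z = N^{(k)} - x^{(k)}$ inside $X^{(k)} - x^{(k)}$, which fixes the orthant. Each LP is solved from scratch, so the shared feasible region and the warm starts exploited by LISS_LP are not used, and no rigorous error bounds are computed; the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def lp_newton_box(J_lo, J_hi, f_x, X_lo, X_hi, corner_at_lo):
    """Hull of the part of N - x inside X - x, for J (N - x) = -f(x).

    x is the corner of X selected by corner_at_lo (True -> lower endpoint).
    Returns (lo, hi) bounds on X ∩ N, or None if the LPs are infeasible,
    i.e. X ∩ N is empty and X contains no root.
    """
    x = np.where(corner_at_lo, X_lo, X_hi)
    D = np.diag(np.where(corner_at_lo, 1.0, -1.0))    # the single orthant of X - x
    J_c, J_r = (J_lo + J_hi) / 2, (J_hi - J_lo) / 2
    rhs = -np.asarray(f_x, dtype=float)                # B = -f(x) is a real vector
    G = np.vstack([J_c - J_r @ D, -(J_c + J_r @ D)])   # Eq. (7) with degenerate B
    g = np.concatenate([rhs, -rhs])
    bounds = list(zip(X_lo - x, X_hi - x))             # stay inside X - x
    n = len(x)
    lo, hi = np.empty(n), np.empty(n)
    for j in range(n):                                 # the 2n LP subproblems
        e_j = np.zeros(n)
        e_j[j] = 1.0
        r_min = linprog(e_j, A_ub=G, b_ub=g, bounds=bounds, method="highs")
        if r_min.status == 2:
            return None
        r_max = linprog(-e_j, A_ub=G, b_ub=g, bounds=bounds, method="highs")
        lo[j], hi[j] = x[j] + r_min.x[j], x[j] + r_max.x[j]
    return lo, hi
```

For the hypothetical system $f(x) = (x_1^2 + x_2^2 - 4,\ x_1 - x_2)$ on $X = [1, 2]^2$, taking the lower-left corner $x = (1, 1)$ gives $f(x) = (-2, 0)$ and $F'(X) = \begin{pmatrix}[2,4] & [2,4]\\ 1 & -1\end{pmatrix}$, and the sketch returns $X \cap N = [1.25, 1.5]^2$, which contains the root $(\sqrt{2}, \sqrt{2})$.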

Lin and Stadtherr [23] have implemented the ap- 
proach outlined above in the procedure LISS_LP (Lin- 
ear Interval System Solver by Linear Programming), 
and incorporated it in an interval-Newton method for 
global optimization. Two key aspects of the implemen- 
tation are: 
1. The choice of a corner of $X^{(k)}$ to be used in the LP problem. For this purpose, a heuristic approach [23] was developed that incorporates ideas from the pivoting preconditioner approach of Gau and Stadtherr [10].

2. Determination of rigorous error bounds on the solution of the LP problems. This is done using a procedure based on primal/dual relationships [16,23,30].

Complete details of the implementation are given by Lin and Stadtherr [23]. We turn next to some examples that demonstrate the performance of the LP-based strategy as implemented in LISS_LP.


Cases 


Lin and Stadtherr [21,23] have tested the performance 
of the LP-based interval-Newton strategy for global 
optimization on a variety of problems. In this sec- 
tion, we summarize the results on a group of param- 
eter estimation problems. Each parameter estimation 
case used was formulated using the error-in-variables 
approach, with complete details given by Gau and 
Stadtherr [8,9]. Comparisons are made to an interval 
Gauss-Seidel method with a hybrid preconditioning 
approach (HP/RP), which has been shown [10] to pro- 
vide a substantial improvement in computational per- 
formance relative to standard implementations of the 
interval-Newton approach. Comparisons are made in 
terms of the number of interval-Newton (I-N) tests re- 
quired, i. e., the number of times Eq. (3) must be solved, 
and in terms of the CPU time on a Sun Blade 1000 
Model 1600 workstation. On a current (early 2007) 
workstation, these CPU times would be approximately 
an order of magnitude less. 


Problem 1 


This problem [6,20] involves estimation of binary pa- 
rameters in the van Laar equation for activity co- 
efficients. These two parameters are estimated from 
vapor-liquid equilibrium data for the binary system of 
methanol and 1,2-dichloroethane. Computational per- 
formance results are shown in Table 1. When the LP- 
based strategy (LISS_LP) is applied, the number of I-N 
tests is substantially reduced relative to HP/RP, indicat- 
ing its effectiveness in reducing the number of intervals 
that must be tested. Essentially, by reducing the size of $N^{(k)}$, the LP approach is able to more quickly identify and discard intervals that do not contain a stationary point. However, the percent reduction in overall CPU time is less than the percent reduction in I-N tests. This occurs due to the overhead in solving the LP subproblems.

LP Strategy for Interval-Newton Method in Deterministic Global Optimization, Table 1
Computational performance of LP-based method (LISS_LP) and preconditioned interval Gauss-Seidel method (HP/RP) on a Sun Blade 1000 Model 1600

Problem | HP/RP I-N tests | LISS_LP I-N tests
1       | 303589          | 156182
2       | 220             | 81
3       | 9505            | 1258
4       | 144833          | 24817
5       | 55255           | 9757


Problem 2 


In this problem [4], the rating parameters are esti- 
mated for a steady-state heat exchanger network, which 
consists of four heat exchangers. The four parameters 
can be estimated from experimental measurements, in- 
cluding six flow measurements and thirteen tempera- 
ture measurements. In the version of the problem con- 
sidered here, 20 data points were considered, leading 
to an optimization problem involving 264 indepen- 
dent variables. Due to the large number of variables, 
sparse linear programming routines were implemented 
in LISS_LP for this problem. In this case, both I-N tests 
and CPU time are substantially reduced when the LP- 
based method is used, as shown in Table 1, indicating 
that the LP overhead is less significant on this relatively 
large problem. 


Problems 3 and 4 

Both of these problems involve the estimation of kinetic parameters for an irreversible, first-order reaction A → B. In Problem 3 [6,20], data from an adiabatic continuous-stirred-tank reactor (CSTR) is used, and in Problem 4 [2] data from an isothermal batch reactor is used. In both cases, the reaction rate constant k is given by an Arrhenius expression

$$k = \theta_1 \exp\left(-\frac{\theta_2}{T}\right), \qquad (10)$$

in which T is the temperature and two parameters, $\theta_1$ and $\theta_2$, must be determined from experimental measurements. Again, the
computational performance results (Table 1) show that 
use of the LP-based strategy substantially reduces the 
number of I-N tests required relative to HP/RP, but that 
a comparable reduction in CPU time is not achieved. 
For example, on Problem 4, the number of I-N tests is 
reduced by nearly a factor of 6, but there is only about 
a 15% reduction in CPU time. This reflects the fact that 
an I-N test performed using the LP method requires 
greater computational effort than an I-N test using the 
HP/RP approach. However, on problems studied by Lin 
and Stadtherr [21,23], this overhead was always offset 
by a large reduction in the number of I-N tests, result- 
ing in computational savings on all but very small prob- 
lems. 


Problem 5 


In this problem, parameters are estimated in a model of an isothermal pseudo-differential reactor for the catalytic hydrogenation of phenol on a palladium catalyst [32]. There are 28 experimental kinetic data points of the partial pressure of phenol ($P_1$), the partial pressure of hydrogen ($P_2$), and the initial reaction rate (r).
It is desired to fit this kinetic data to a semi-empirical 
model of the form 


6; 6205P, P2 


= fl 
" (1 + 6, P, + @P2)? _ 


where $\theta_1$, $\theta_2$ and $\theta_3$ are the parameters to be estimated. This global optimization problem has 59 independent variables. Due to the relatively large number of variables in this problem, a sparse linear programming routine was used in LISS_LP. As seen in Table 1, both I-N tests and CPU time are substantially reduced compared to HP/RP when the LP-based method is used. As in the case of Problem 2, on this relatively large problem the impact of the LP overhead appears to be less significant.


Conclusions 


In this article, we have described an LP-based strat- 
egy [21,23] for solving the linear interval equation 
system arising in the context of the interval-Newton 
approach for deterministic global optimization. The 
method can obtain tighter bounds on the solution set 
of the linear interval system than the preconditioned in- 
terval Gauss-Seidel approach, and thus leads to a large 


reduction in the number of subintervals that must be 
tested during the interval-Newton procedure. However, 
the difference between the overhead required to solve 
the LP subproblems and that required to perform the 
preconditioned Gauss-Seidel method may lead to rela- 
tively smaller or larger improvements in overall com- 
putational time, depending on the size of the prob- 
lem. With sparse linear algebra in the LP subproblems, 
the method can be successfully applied to deterministic 
global optimization problems involving over two hun- 
dred variables, providing a rigorous guarantee of global 
optimality. 
L-shaped Method for Two-stage Stochastic Programs with Recourse


A stochastic linear program with recourse (SLP) is a mathematical program of the form

$$\min\ c x + Q(x) \quad \text{s.t.}\quad Ax = b,\ x \ge 0,$$

where $Q(x) = E_\xi\, Q(x, \xi)$,

$$Q(x, \xi) = \min_{y}\ \{\, q(\xi)\, y(\xi) \mid W y(\xi) = h(\xi) - T(\xi)\, x,\ y(\xi) \ge 0 \,\},$$

and $E_\xi$ denotes the mathematical expectation with respect to ξ, x is an $(n_1 \times 1)$ decision vector, and for each ξ, y is $(n_2 \times 1)$. A is $(m_1 \times n_1)$ and for each ξ, h is $(m_2 \times 1)$. All other matrices and vectors have conformable dimensions. Transposes are omitted for simplicity. The random vector ξ is formed by the random components of q, h and T. $Q(x, \xi)$ is the second-stage value function for a given ξ and $Q(x)$ the expected value function or expected recourse.

In the case where random vectors are described by discrete distributions, $Q(x)$ is a piecewise linear convex function of x, so that classical decomposition techniques may apply. Let k = 1, ..., K index the possible realizations of ξ, let $p_k$ be their probabilities and $y_k$ be the corresponding second-stage decision variables. SLP is then equivalent to the extensive form

$$\text{(EF)}\qquad \begin{aligned} \min\ & c x + \sum_{k=1}^{K} p_k q_k y_k \\ \text{s.t.}\ & Ax = b, \\ & T_k x + W y_k = h_k, \quad k = 1, \dots, K, \\ & x,\ y_k \ge 0. \end{aligned}$$


This extensive form possesses a dual block-angular 
structure. It is thus well suited to application of Ben- 
ders decomposition, which in the case of stochastic pro- 
gramming is known as the L-shaped method. An abbre- 
viated presentation is as follows. It is restricted to the 
case where all second stage programs are bounded and 
feasible for any choice of first-stage decision. 
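The dual block-angular structure of (EF) can be seen by assembling it explicitly for a small hypothetical example and solving it as a single LP with SciPy's `linprog`. In this toy data (not from the article), x is an order quantity with unit cost c = 1, each of K = 4 equally likely demands $h_k \in \{1, 2, 3, 4\}$ is met through $y_k = (\text{shortage}, \text{surplus})$ with $W = (1, -1)$, $T_k = 1$ and $q = (4, 1)$, and the rows Ax = b are replaced by the simple bound $0 \le x \le 10$.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data (not from the article).
c = 1.0
q = np.array([4.0, 1.0])            # unit costs of shortage and surplus
W = np.array([[1.0, -1.0]])         # fixed recourse matrix
T = np.array([[1.0]])               # technology matrix, the same for every k
h = np.array([1.0, 2.0, 3.0, 4.0])  # right-hand sides h_k (one row per scenario)
p = np.full(4, 0.25)                # scenario probabilities p_k

K, (m2, n2), n1 = len(h), W.shape, T.shape[1]

# Cost vector [c, p_1 q, ..., p_K q] over the variables [x, y_1, ..., y_K]
cost = np.concatenate([[c], np.concatenate([pk * q for pk in p])])

# Dual block-angular matrix: T_k in the x column(s), W on the diagonal blocks
A_eq = np.zeros((K * m2, n1 + K * n2))
for k in range(K):
    r = slice(k * m2, (k + 1) * m2)
    A_eq[r, :n1] = T
    A_eq[r, n1 + k * n2 : n1 + (k + 1) * n2] = W

bounds = [(0.0, 10.0)] * n1 + [(0.0, None)] * (K * n2)
res = linprog(cost, A_eq=A_eq, b_eq=h, bounds=bounds, method="highs")
print(res.x[0], res.fun)  # first-stage decision and optimal expected cost
```

For this data the optimum is x = 3 with expected total cost 4.75. Because W has both a +1 and a -1 column, every second-stage problem is feasible and bounded, so the recourse is complete, as assumed in the presentation that follows.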


L-Shaped Method for Two-Stage Stochastic 
Program with Bounded, Complete Recourse 


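Consider the master linear program

$$\text{(MLP)}\qquad \begin{aligned} \min\ & c x + \theta \\ \text{s.t.}\ & Ax = b, \\ & E_j x + \theta \ge e_j, \quad j = 1, \dots, s, \\ & x \ge 0, \end{aligned}$$

with s optimality cuts (initially s = 0) and θ a lower bound on Q(x) (θ is omitted when s = 0).

Using the solution $x^s$, $\theta^s$ of (MLP) at iteration s, find optimal solutions to the K subproblems

$$\begin{aligned} \min\ & w = q_k y \\ \text{s.t.}\ & W y = h_k - T_k x^s, \\ & y \ge 0, \end{aligned}$$

with optimal simplex multipliers $\pi_k^s$, k = 1, ..., K. Define $E_{s+1} = \sum_{k=1}^{K} p_k \pi_k^s T_k$ and $e_{s+1} = \sum_{k=1}^{K} p_k \pi_k^s h_k$. If $e_{s+1} - E_{s+1} x^s \le \theta^s$, then stop, as $x^s$ is an optimal solution. Otherwise, add the optimality cut $E_{s+1} x + \theta \ge e_{s+1}$ to (MLP), set s = s + 1 and return to the master program.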
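The iteration above can be sketched in Python on the same hypothetical toy problem as before (complete recourse, so no feasibility cuts are needed). The simplex multipliers $\pi_k^s$ are taken from `res.eqlin.marginals`, which SciPy's HiGHS-based `linprog` (SciPy 1.7 or later) reports as the sensitivities of the optimal value to the equality right-hand sides; the bound $0 \le x \le 10$ again stands in for Ax = b, and the function name is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data: order x at unit cost c, then pay q = (shortage, surplus)
# costs once the demand h_k is revealed; complete recourse via W = [1, -1].
c = 1.0
q = np.array([4.0, 1.0])
W = np.array([[1.0, -1.0]])
T = np.array([[1.0]])
h = [np.array([d]) for d in (1.0, 2.0, 3.0, 4.0)]
p = [0.25] * 4
X_BOUNDS = (0.0, 10.0)   # stands in for Ax = b, x >= 0


def l_shaped(max_iter=100, tol=1e-9):
    cuts = []                                   # optimality cuts (E_j, e_j)
    for _ in range(max_iter):
        if cuts:                                # master (MLP) in (x, theta)
            A_ub = [[-E, -1.0] for E, _ in cuts]          # E x + theta >= e
            b_ub = [-e for _, e in cuts]
            m = linprog([c, 1.0], A_ub=A_ub, b_ub=b_ub,
                        bounds=[X_BOUNDS, (None, None)], method="highs")
            x, theta = m.x
        else:                                   # theta omitted when s = 0
            m = linprog([c], bounds=[X_BOUNDS], method="highs")
            x, theta = m.x[0], -np.inf
        E, e = 0.0, 0.0
        for pk, hk in zip(p, h):                # the K second-stage subproblems
            sub = linprog(q, A_eq=W, b_eq=hk - T[:, 0] * x,
                          bounds=[(0.0, None)] * W.shape[1], method="highs")
            pi = sub.eqlin.marginals            # optimal simplex multipliers
            E += pk * float(pi @ T[:, 0])
            e += pk * float(pi @ hk)
        if e - E * x <= theta + tol:            # optimality test
            return x, c * x + (e - E * x)       # e - E x equals Q(x) here
        cuts.append((E, e))
    raise RuntimeError("L-shaped method did not converge")
```

On this data the method stops after five master solves at x = 3 with objective 4.75, the same optimum as the extensive form. Each cut is tight at the current $x^s$, which is why $e_{s+1} - E_{s+1} x^s$ can be returned as $Q(x^s)$.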

Finite convergence of the L-shaped method is 
proved through classical convexity arguments. When 
the second stage does not have complete recourse, some 
first stage decisions may imply that no feasible recourse 
exists for some k. Then, a number of feasibility cuts 
are also needed. They are obtained through the opti- 
mal simplex multipliers of some phase-1 problem. Al- 
though these cuts should theoretically be generated for 
all realizations k = 1, ..., K, there are many situations 
where the search for feasibility cuts can be limited to 
a single selected second-stage problem [9].

The L-shaped method can be made more efficient by performing some bunching to obtain optimal multipliers for several realizations of ξ at once (see the experiments in [4]). Efficiency can sometimes be gained by sending disaggregated cuts (also called multicuts) instead of one fully aggregated cut at each iteration
[2]. Another way of improving the efficiency of the L-shaped method is to include a quadratic regularizing term in the first-stage objective function. This additional term is typically the square of the Euclidean distance between the decision x and the previous iterate point [8]. L-shaped methods have also been combined with statistical estimation, in particular to cope with continuous random variables (see [5] and » Discretely distributed stochastic programs: Descent directions and efficient points).

A number of alternatives to the L-shaped tech- 
niques have been proposed to solve SLP. One is to use 
the Lagrangian finite generation method, also known 
as scenario aggregation [7]. Another is to use interior-point techniques [1]. For a general presentation of
stochastic programming, see [3] or [6]. 


See also 
Abstract


Maritime transportation is a heavily utilized mode 
when large quantities of bulk products need to be trans- 
ported over long distances. Often, inventories exist at 
the loading and/or unloading ports of the sailing legs. 
When the ship operator has the responsibility for both the transportation by the fleet and the inventories at the ports, the underlying planning problem is a maritime inventory routing problem. Here we introduce the reader to various applications within maritime inventory routing and present some examples of research contributions. First we consider and present a mathematical model for the basic problem where a single product is transported, and call this problem the inventory ship routing problem. There are many extensions and variants of the problem. These include, among others, problems with inventories at only one end, variable production/consumption rates, multiple products, use of spot charters and problems that combine inventory routing with other planning aspects.
Maritime inventory routing problems are very complex 
and to the authors’ knowledge there exist no commer- 
cial optimization-based systems for the shipping indus- 
try yet. However, it is probably just a question of time 
before they become available. 


Introduction 


In order to survive in a tough global market, many 
companies have been forced to change their focus from 
competition between companies to competition be- 
tween supply chains. Supply chains of companies with 
foreign sources of raw materials or with overseas cus- 
tomers most often include maritime transportation. 
Supply chain management and optimization are active 
fields of research, and we can see applications in almost 
all industries. So far, such applications have seldom focused on maritime transportation, so there is great potential and need for research in the area.

A maritime inventory routing problem is defined here as a combined ship routing and scheduling problem and an inventory management problem. The basic inventory ship routing problem (ISRP) concerns the
transportation of a single product. The product is pro- 
duced and stored in inventories at given loading ports 
and is transported by sea to inventories at unload- 
ing or consumption ports. Inventory capacities are de- 
fined in all ports. Further, we assume that information 
about production and consumption rates is given in 
all ports. To transport the product between the given 
production and consumption ports, the planners con- 
trol a heterogeneous fleet of ships. The planning prob- 
lem is to find routes and schedules for the fleet that 
minimize the transportation costs without interrupting 
production or consumption at the storages. Depend- 
ing on the segment the fleet is operating in, the typical 
planning period spans from 1 to 2 weeks up to several 
months. 

Most ship scheduling problems studied in the liter- 
ature are so-called cargo routing problems [1]. In cargo 
routing problems, each cargo is specified by a given 
loading and unloading port. The quantity of the cargo 
is given and normally there exist time windows for 
loading and/or unloading. When planning routes and 
schedules, the shipping company either seeks to min- 
imize the transportation costs for carrying all con- 
tracted cargoes or in addition to maximize profit for 
optional spot cargoes that may be available. We refer 
to [4] for a survey on maritime cargo routing problems. 
Cargo routing problems differ from the ISRP in a number of ways. The number of calls at a given port
during the planning horizon is not predetermined in 
the ISRP, neither is the quantity to be loaded or un- 
loaded at each port call. There is also no predefined 
pickup and delivery pair in the ISRP. The combina- 
tion of the inventory management and the ship routing 
and scheduling makes the ISRP a very complex prob- 
lem. 

The inventory routing problem has been studied in the literature for a couple of decades. Dror and Ball [8] defined the problem as a distribution problem in which each customer maintains a local inventory of a product such as heating oil and consumes a certain amount of that product each day. Given a central supplier (depot), the objective is to minimize the annual delivery costs while attempting to ensure that no customer runs out of the product at any time. This asymmetry between the two types of inventory (production and consumption), with only one central supply node (depot), is often found in road-based inventory routing problems, but more seldom in maritime transportation (ISRP). In the road-based inventory routing problem, the amount unloaded at each customer is often small compared to the total capacity of the vehicle. This is also in contrast with the ISRP, where the ship is often fully loaded and unloaded.

The objective of this article is to introduce the 
reader to various real planning problems within mar- 
itime inventory routing. The purpose is not to give 
a comprehensive overview of such problems, but rather 
to present examples of applications and research in the 
area. 

The rest of the article is organized as follows: The 
first section defines the basic inventory ship routing 
problem and the underlying mathematical model. Ex- 
tensions of the basic ISRP are addressed next. Finally, 
concluding remarks and future research follow. 


The Basic ISRP 


In order to give an introduction to the various real plan- 
ning problems within maritime inventory routing, we 
will start with a basic ISRP. First we describe the planning problem. Then we present an arc-flow formulation of the problem. The final section is devoted to real applications of the basic ISRP.


Problem Description 


The products transported in maritime inventory rout- 
ing problems are usually bulk products, where large 
quantities are transported and there are inventories at 
both the loading and the unloading ports. In these 
problems, the ship operators have a twofold responsi- 
bility: transportation and inventory management at the 
production and consumption sites. In such planning 
situations, the routing and scheduling of the fleet have 
to be synchronized with the inventory management at 
both production and consumption sites. 

In the basic ISRP a single (homogeneous) prod- 
uct is transported. The product is produced at the 
sources, called loading ports, and consumed at the des- 
tinations, called unloading ports. Inventory storage ca- 
pacities are given in all ports in addition to the pro- 
duction or consumption rate of the product. Here, the 
rate is assumed constant during the planning horizon. 
Maritime Inventory Routing Problems, Figure 1
Inventory levels during a planning period for a production and a consumption port


The number of calls at a given port during the plan- 
ning period is not predetermined, nor is the quantity 
to be loaded or unloaded at each port call. Figure 1 
shows an example of a production/consumption and 
loading/unloading pattern. For both port types, the port 
is called at twice. However, the quantities loaded or 
unloaded differ at each call. The reason for this might 
be that the ports are visited by ships with different ca- 
pacities loading/unloading full loads, or due to partial 
loading/unloading. In loading ports, it is important to 
ensure that the inventory level is not above the maxi- 
mum inventory level when loading starts and not un- 
der the minimum inventory level when the loading has 
finished. In unloading ports, the opposite has to be 
ensured. The inventory level at the beginning of the 
planning period can be at any level, as indicated in 
Fig. 1. 
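The inventory conditions at a loading port can be checked with a short simulation. This is a hypothetical sketch, not part of the model presented below: production is assumed continuous at a constant rate, and loading is assumed to remove product faster than it is produced, so the inventory only needs to be checked against the maximum when each loading starts and against the minimum when it ends, plus once at the end of the horizon. All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Call:
    start: float      # time (days) at which loading starts
    end: float        # time at which loading finishes
    quantity: float   # amount loaded during the call


def check_loading_port(level0, rate, i_min, i_max, calls, horizon):
    """Return (True, "") if the inventory stays within bounds, else (False, reason)."""
    level, t = level0, 0.0
    for call in sorted(calls, key=lambda cl: cl.start):
        level += rate * (call.start - t)              # produce until loading starts
        if level > i_max:
            return False, f"above maximum when loading starts at t={call.start}"
        level += rate * (call.end - call.start) - call.quantity
        if level < i_min:
            return False, f"below minimum when loading ends at t={call.end}"
        t = call.end
    level += rate * (horizon - t)                     # production after the last call
    if level > i_max:
        return False, "above maximum at the end of the horizon"
    return True, ""
```

With an initial level of 50, production of 10 per day and bounds [20, 150], calls on days 5 to 6 (80 units) and 12 to 13 (60 units) keep the inventory feasible over 14 days, whereas a single call delayed to day 11 lets the inventory reach 160 before loading starts. An unloading port is the mirror image: consumption lowers the level between calls, so the minimum is checked when unloading starts and the maximum when it ends.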

Therefore, the planning problem is to design routes 
and schedules that minimize the transportation cost 
without interrupting production or consumption. We 
assume no inventory costs because the shipper owns 
both the producing sources and the consuming destina- 
tions. The ship operator controls a heterogeneous fleet 
of ships. We assume that partial loading and unloading 
is allowed, such that two ports of the same type (loading 
or unloading) may be called at in succession. The ship 
is not necessarily empty at the beginning of the plan- 
ning horizon, but might have some load onboard. The 
ship can be either at a port or at sea at the beginning of 
the planning horizon. Figure 2 shows a simplified illustration of the planning problem for a cement producer in Norway with two production factories and five consumption ports with inventories. The fleet consists of two ships. Each port can be called at several times during the planning period by the same ship or different ships.

Maritime Inventory Routing Problems, Figure 2
A simplified planning situation with seven ports and two ships


Mathematical Model 


The model of the ISRP will be presented in a compact and simplified way. In Sect. “Routing,” we describe the flow network and the objective function. Then, the conditions for the loading and unloading, the time aspects and the inventories are described in Sect. “Loading and Unloading”, “Scheduling” and “Inventory Management,” respectively. We base our notation and model on those of Christiansen et al. [3].

In the upcoming formulation, we have assumed that 
the ship may be partially loaded/unloaded, meaning 
that multiple cargoes may be onboard a ship simultane- 
ously. The model could have been simplified if we had 
assumed full loads only or sailing between different port 
types (from loading to unloading and vice versa). 


Routing In the mathematical description of the network each port is represented by an index $i$ and the set of ports is given by $\mathcal{N}$. Let $\mathcal{V}$, indexed by $v$, be the set of available ships to be routed and scheduled. Not all ships can visit all ports, and $\mathcal{N}_v = \{\text{feasible ports for ship } v\} \cup \{o(v), d(v)\}$ is the set of ports that can be visited by ship $v$. The terms $o(v)$ and $d(v)$ represent the artificial origin port and artificial destination port of ship $v$, respectively. Each port can be visited several times during the planning horizon, and $\mathcal{M}_i$ is the set of possible calls at port $i$, while $\mathcal{M}_{iv}$ is the set of calls at port $i$ that can be made by ship $v$. The port call number is represented by an index $m$, and $\bar{M}_i$ is the last possible call at port $i$ within the planning period. The set of nodes in the flow network represents the set of port calls, and each port call is specified by $(i,m)$, $i \in \mathcal{N}$, $m \in \mathcal{M}_i$. In addition, we specify flow networks for each ship $v$ with nodes $(i,m)$, $i \in \mathcal{N}_v$, $m \in \mathcal{M}_{iv}$. $\mathcal{A}_v$ contains all feasible arcs for ship $v$, which is a subset of $\{(i,m) : i \in \mathcal{N}_v, m \in \mathcal{M}_{iv}\} \times \{(j,n) : j \in \mathcal{N}_v, n \in \mathcal{M}_{jv}\}$. Finally, $C_{ijv}$ represents the variable costs for sailing between port $i$ and port $j$ with ship $v$. This includes port, channel and fuel costs.

In the network flow part of the formulation we use the following types of variables: the binary flow variable $x_{imjnv}$, $v \in \mathcal{V}$, $(i,m,j,n) \in \mathcal{A}_v$, equals 1 if ship $v$ sails from node $(i,m)$ directly to node $(j,n)$, and 0 otherwise; and the slack variable $w_{im}$, $i \in \mathcal{N}$, $m \in \mathcal{M}_i$, is equal to 1 if no ship takes port call $(i,m)$, and 0 otherwise. The routing formulation including the objective function is as follows:


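$$
\min \sum_{v \in \mathcal{V}} \sum_{(i,m,j,n) \in \mathcal{A}_v} C_{ijv}\, x_{imjnv} \tag{1}
$$

subject to

$$
\begin{aligned}
& \sum_{v \in \mathcal{V}} \sum_{j \in \mathcal{N}_v} \sum_{n \in \mathcal{M}_{jv}} x_{imjnv} + w_{im} = 1, && \forall i \in \mathcal{N}, m \in \mathcal{M}_i, && (2)\\
& \sum_{j \in \mathcal{N}_v} \sum_{n \in \mathcal{M}_{jv}} x_{o(v)1jnv} = 1, && \forall v \in \mathcal{V}, && (3)\\
& \sum_{i \in \mathcal{N}_v} \sum_{m \in \mathcal{M}_{iv}} x_{imjnv} - \sum_{i \in \mathcal{N}_v} \sum_{m \in \mathcal{M}_{iv}} x_{jnimv} = 0, && \forall v \in \mathcal{V}, j \in \mathcal{N}_v \setminus \{o(v), d(v)\}, n \in \mathcal{M}_{jv}, && (4)\\
& \sum_{i \in \mathcal{N}_v} \sum_{m \in \mathcal{M}_{iv}} x_{imd(v)1v} = 1, && \forall v \in \mathcal{V}, && (5)\\
& w_{im} - w_{i(m-1)} \ge 0, && \forall i \in \mathcal{N}, m \in \mathcal{M}_i \setminus \{1\}, && (6)\\
& x_{imjnv} \in \{0,1\}, && \forall v \in \mathcal{V}, (i,m,j,n) \in \mathcal{A}_v, && (7)\\
& w_{im} \in \{0,1\}, && \forall i \in \mathcal{N}, m \in \mathcal{M}_i. && (8)
\end{aligned}
$$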


The objective function (1) minimizes the total costs. 
Constraints (2) ensure that each port call is visited at 
most once. Constraints (3)-(5) describe the flow on the 
sailing route used by ship v. One or several of the calls in 
a specified port can be made by a dummy ship, and the 
highest call numbers will be assigned to dummy ships 
in constraints (6). These constraints reduce the number 
of symmetrical solutions in the solution approach. For 
the calls made by a dummy ship, we get artificial start- 
ing times and artificial inventory levels within the de- 
fined upper and lower limits. Finally, the formulation 
involves binary requirements (7) and (8) on the flow 
variables and port call slack variables, respectively. 
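As an illustration only, the following Python sketch evaluates a fixed set of ship routes against the routing part of the model: it sums the sailing costs of objective (1), derives the slack values $w_{im}$ from constraints (2), and checks the call ordering required by constraints (6). The function name `evaluate_routes`, the port labels and the dictionary layout are hypothetical choices for this sketch. It assumes each route already runs from $o(v)$ to $d(v)$, so the flow constraints (3) to (5) hold by construction, and that the calls at each port are numbered consecutively from 1.

```python
from typing import Dict, List, Tuple

Call = Tuple[str, int]  # a port call (i, m): port label i and call number m


def evaluate_routes(routes: Dict[str, List[Call]],
                    calls: Dict[str, List[int]],
                    cost: Dict[Tuple[str, str, str], float]
                    ) -> Tuple[float, Dict[Call, int]]:
    """Return the sailing cost of objective (1) and the slack values w_im.

    routes[v] lists the calls of ship v from o(v) to d(v); every consecutive
    pair of calls is an arc with x_imjnv = 1. cost[(i, j, v)] plays C_ijv.
    """
    total = 0.0
    visits: Dict[Call, int] = {}
    for v, route in routes.items():
        for (i, _), (j, _) in zip(route[:-1], route[1:]):
            total += cost[(i, j, v)]
        for call in route[1:-1]:  # skip the artificial o(v) and d(v)
            visits[call] = visits.get(call, 0) + 1

    w: Dict[Call, int] = {}
    for i, numbers in calls.items():
        for m in numbers:
            served = visits.get((i, m), 0)
            if served > 1:
                raise ValueError(f"call {(i, m)} served twice, violating (2)")
            w[(i, m)] = 1 - served  # constraint (2) then fixes w_im
        for m in numbers[1:]:
            # Constraint (6): once a call is left to a dummy ship,
            # every later call at the same port must be as well.
            if w[(i, m)] < w[(i, m - 1)]:
                raise ValueError(f"call {(i, m)} violates the ordering (6)")
    return total, w


routes = {"v1": [("o1", 1), ("P", 1), ("C", 1), ("d1", 1)]}
calls = {"P": [1, 2], "C": [1]}
cost = {("o1", "P", "v1"): 0.0, ("P", "C", "v1"): 10.0, ("C", "d1", "v1"): 0.0}
total, w = evaluate_routes(routes, calls, cost)
print(total, w[("P", 2)])  # the second call at P is left to a dummy ship
```

A route that served call ("P", 2) while leaving ("P", 1) to a dummy ship would be rejected by the ordering check, which mirrors how constraints (6) remove symmetric solutions.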


Loading and Unloading The capacity of ship $v$ is given by $V_{CAPv}$. Variable $l_{imv}$, $v \in \mathcal{V}$, $i \in \mathcal{N}_v \setminus \{d(v)\}$, $m \in \mathcal{M}_{iv}$, gives the total load onboard ship $v$ just after the service is completed at node $(i,m)$, while variable $q_{imv}$, $v \in \mathcal{V}$, $i \in \mathcal{N}_v \setminus \{d(v)\}$, $m \in \mathcal{M}_{iv}$, represents the quantity loaded or unloaded at port call $(i,m)$ when ship $v$ visits $(i,m)$. It is assumed that nothing is loaded or unloaded at the artificial origin $o(v)$: $q_{o(v)1v} = 0$. However, the ships may have cargo onboard, $L_{0v}$, at the beginning of the planning horizon: $l_{o(v)1v} = L_{0v}$. Further, the constant $I_i$ is equal to 1 if port $i$ is a loading port, $-1$ if port $i$ is an unloading port and 0 if port $i$ is $o(v)$ or $d(v)$. Constraints related to the quantity onboard a ship can be formulated as follows:


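$$
\begin{aligned}
& x_{imjnv}\,(l_{imv} + I_j\, q_{jnv} - l_{jnv}) = 0, && \forall v \in \mathcal{V}, (i,m,j,n) \in \mathcal{A}_v \mid j \ne d(v), && (9)\\
& q_{imv} \le l_{imv} \le \sum_{j \in \mathcal{N}_v} \sum_{n \in \mathcal{M}_{jv}} V_{CAPv}\, x_{imjnv}, && \forall v \in \mathcal{V}, i \in \mathcal{N}_v, m \in \mathcal{M}_{iv} \mid I_i = 1, && (10)\\
& 0 \le l_{imv} \le \sum_{j \in \mathcal{N}_v} \sum_{n \in \mathcal{M}_{jv}} V_{CAPv}\, x_{imjnv} - q_{imv}, && \forall v \in \mathcal{V}, i \in \mathcal{N}_v, m \in \mathcal{M}_{iv} \mid I_i = -1. && (11)
\end{aligned}
$$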


Constraints (9) give the relationship between the bi- 
nary flow variables and the ship load at each port call. 
Constraints (10) and (11) give the ship capacity inter- 
vals at the port calls for loading and unloading ports, 
respectively. 
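The load bookkeeping in constraints (9) to (11) can be traced along a single route with a few lines of Python. In the sketch below the function name `propagate_load` and the input layout are hypothetical; it assumes the route and the handled quantities $q_{imv}$ are already fixed, updates the load as in (9), and checks the capacity intervals (10) and (11) after each real port call.

```python
from typing import List, Tuple


def propagate_load(initial_load: float, capacity: float,
                   stops: List[Tuple[int, float]]) -> List[float]:
    """Load onboard just after each real port call on one ship's route.

    stops holds (I_i, q_imv) in visiting order, with I_i = 1 at a loading
    port and I_i = -1 at an unloading port; initial_load plays L_0v.
    """
    loads: List[float] = []
    load = initial_load  # l_o(v)1v = L_0v; nothing is handled at o(v)
    for sign, qty in stops:
        load += sign * qty  # constraint (9): l_jnv = l_imv + I_j q_jnv
        if sign == 1 and not qty <= load <= capacity:
            raise ValueError(f"loading call violates (10): load {load}")
        if sign == -1 and not 0 <= load <= capacity - qty:
            raise ValueError(f"unloading call violates (11): load {load}")
        loads.append(load)
    return loads


# Partial loading at two ports, then partial unloading at two ports.
loads = propagate_load(0.0, 100.0, [(1, 60.0), (1, 30.0), (-1, 50.0), (-1, 40.0)])
print(loads)
```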


Scheduling The time required to load or unload the ship may constitute a major part of the total time in many maritime transportation applications. It is therefore usual to calculate this as a function of the quantity loaded/unloaded. The time spent loading/unloading one unit of a cargo at port $i$ is given by $T_{Qi}$. The term $T_{Sijv}$ represents the sailing time from port $i$ to port $j$ with ship $v$. In some ports, there is a minimum required time, $T_{Bi}$, between the departure of one ship and the arrival of the next ship, due to a small port area or narrow channels from the port to the pilot station. The time variable $t_{im}$, defined for $i \in \mathcal{N}$, $m \in \mathcal{M}_i$ and for $i = o(v)$, $m = 1$, $v \in \mathcal{V}$, represents the time at which service begins at node $(i,m)$. It is assumed that the ship arrives at $o(v)$ at a given fixed time: $t_{o(v)1} = T_{0v}$. Finally, let $T$ denote the planning horizon. The scheduling constraints can now be written as follows:


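$$
\begin{aligned}
& x_{imjnv}\,(t_{im} + T_{Qi}\, q_{imv} + T_{Sijv} - t_{jn}) \le 0, && \forall v \in \mathcal{V}, (i,m,j,n) \in \mathcal{A}_v \mid j \ne d(v), && (12)\\
& t_{im} - t_{i(m-1)} - \sum_{v \in \mathcal{V}} T_{Qi}\, q_{i(m-1)v} + T_{Bi}\, w_{im} \ge T_{Bi}, && \forall i \in \mathcal{N}, m \in \mathcal{M}_i \setminus \{1\}. && (13)
\end{aligned}
$$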


Constraints (12) take into account the timing or scheduling on the route. Note that waiting at a port is allowed. Constraints (13) prevent service overlap in the ports and ensure the order of real calls at the same port: a ship must complete its service before the next ship starts its service at the same port. If port $i$ does not have constraints regarding the minimum time between departure of one ship and arrival of the next, $T_{Bi} = 0$. If port $i$ also allows the service of several ships simultaneously, constraints (13) will simply be $t_{im} - t_{i(m-1)} \ge 0$, to ensure the order of calls at the port.
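For a fixed route, constraints (12) only give lower bounds on the service start times, so the earliest feasible schedule of one ship can be obtained by a forward pass; any later start simply means waiting, which the model allows. The Python sketch below assumes hypothetical dictionaries holding $T_{Qi}$ and $T_{Sijv}$ (for one ship) and, in a separate hypothetical helper `separated`, checks constraint (13) for two consecutive real calls ($w_{im} = 0$) at the same port.

```python
from typing import Dict, List, Tuple


def earliest_start_times(t0: float, stops: List[Tuple[str, float]],
                         unit_time: Dict[str, float],
                         sail_time: Dict[Tuple[str, str], float]) -> List[float]:
    """Earliest service start t_im at each call of one ship's route.

    stops lists (port i, q_imv) in visiting order, beginning with the
    artificial origin o(v), where q = 0 and t_o(v)1 = t0 = T_0v.
    """
    times = [t0]
    for (i, q), (j, _) in zip(stops[:-1], stops[1:]):
        # Constraint (12) with x_imjnv = 1: t_jn >= t_im + T_Qi q_imv + T_Sijv.
        times.append(times[-1] + unit_time[i] * q + sail_time[(i, j)])
    return times


def separated(t_prev: float, q_prev: float, t_next: float,
              unit_time: float, min_gap: float) -> bool:
    """Constraint (13) for consecutive real calls (w_im = 0) at one port.

    q_prev is the total quantity handled at the previous call (all ships).
    """
    return t_next - t_prev - unit_time * q_prev >= min_gap


times = earliest_start_times(
    0.0, [("o", 0.0), ("A", 10.0), ("B", 5.0)],
    unit_time={"o": 0.0, "A": 0.5, "B": 0.5},
    sail_time={("o", "A"): 2.0, ("A", "B"): 3.0})
print(times)
```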


Inventory Management The levels of the inventory have to be within a given interval $[S_{MNi}, S_{MXi}]$ at each port $i$. The production rate $R_i$ is positive if port $i$ is producing the product, and negative if port $i$ is consuming the product. At the beginning of the planning horizon, the inventory level at each port $i$ is $S_{0i}$. Finally, $s_{im}$, $i \in \mathcal{N}$, $m \in \mathcal{M}_i$, represents the inventory level when service starts at port call $(i,m)$. The inventory constraints of the formulation become


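$$
\begin{aligned}
& s_{i1} - R_i\, t_{i1} = S_{0i}, && \forall i \in \mathcal{N}, && (14)\\
& s_{i(m-1)} - \sum_{v \in \mathcal{V}} I_i\, q_{i(m-1)v} + R_i\,(t_{im} - t_{i(m-1)}) - s_{im} = 0, && \forall i \in \mathcal{N}, m \in \mathcal{M}_i \setminus \{1\}, && (15)\\
& S_{MNi} \le s_{im} \le S_{MXi}, && \forall i \in \mathcal{N}, m \in \mathcal{M}_i, && (16)\\
& S_{MNi} \le s_{im} - \sum_{v \in \mathcal{V}} I_i\, q_{imv} + R_i\,(T - t_{im}) \le S_{MXi}, && \forall i \in \mathcal{N}, m = \bar{M}_i. && (17)
\end{aligned}
$$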


The inventory level at the first call at each port is cal- 
culated in constraints (14). From constraints (15), we 
find the inventory level at any port call (i,m) from the 
inventory level upon arrival at the port in the previous 
call (i,m-1), adjusted for the loaded/unloaded quantity 
at the port call and the production/consumption be- 
tween the two arrivals. The general inventory limit con- 
straints at each port call are given in (16). Constraints 
(17) ensure that the level of inventory at the end of the 
planning horizon is within its limits. It can easily be 
shown by substitution that constraints (17) ensure that 
the inventory at time T will be within the bounds even 
if ports are not visited at their last calls. 
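Given the call times and handled quantities at one port, constraints (14) and (15) determine the inventory at each call, after which (16) and (17) reduce to bound checks. A minimal Python sketch, assuming only the real calls at the port are passed in and using the hypothetical function name `port_inventory`:

```python
from typing import List, Tuple


def port_inventory(s0: float, rate: float, sign: int,
                   calls: List[Tuple[float, float]], horizon: float,
                   s_min: float, s_max: float) -> List[float]:
    """Inventory s_im when service starts at each real call at port i.

    calls holds (t_im, total quantity handled at the call) in call order;
    rate plays R_i, sign plays I_i, s0 plays S_0i, horizon plays T.
    """
    levels: List[float] = []
    for m, (t, q) in enumerate(calls):
        if m == 0:
            s = s0 + rate * t  # constraint (14): s_i1 = S_0i + R_i t_i1
        else:
            t_prev, q_prev = calls[m - 1]
            s = levels[-1] - sign * q_prev + rate * (t - t_prev)  # (15)
        if not s_min <= s <= s_max:  # constraint (16)
            raise ValueError(f"inventory {s} out of bounds at call {m + 1}")
        levels.append(s)
    t_last, q_last = calls[-1]
    s_end = levels[-1] - sign * q_last + rate * (horizon - t_last)
    if not s_min <= s_end <= s_max:  # constraint (17): level at time T
        raise ValueError(f"inventory {s_end} out of bounds at the horizon")
    return levels


# A producing (loading) port visited twice within a 10-day horizon.
levels = port_inventory(s0=50.0, rate=10.0, sign=1,
                        calls=[(3.0, 70.0), (8.0, 60.0)], horizon=10.0,
                        s_min=0.0, s_max=100.0)
print(levels)
```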


A Real Application An application that is close to the ISRP is a real ship planning problem for ammonia transportation. Norsk Hydro Agri (now named Yara) produces and consumes ammonia in its factories worldwide. The planners at the company are responsible for keeping the inventory levels within the predefined upper and lower limits at all Norsk Hydro Agri factories around the world where they produce and consume ammonia. This requires the planners to design routes and schedules for their fleet of heterogeneous ships transporting ammonia from production ports to consumption ports. The problem is described in detail in Christiansen [2]. The overall solution approach is based on column generation, with columns for both the ship routes and the inventory management sequences [5], where subproblems are solved by dynamic programming for each port and each ship [6]. Another solution approach to the same problem was developed by Flatberg et al. [9]. They used an iterative improvement heuristic combined with a linear programming (LP) solver: the heuristic solves the combinatorial problem of finding the ship routes, and an LP model is used to find the starting time of service at each call and the loading/unloading quantities.


Extensions of the ISRP 


Most of the real applications of maritime inventory 
routing problems have a more complex structure than 
the basic ISRP. We present here various extensions of 
the ISRP that are described in the literature or have 
been experienced in our research group. In many mar- 
itime applications, several of the extensions are com- 
bined. 


One Central Supplier or Consumer 


As mentioned in the introduction, the road-based in- 
ventory routing problem often has a vehicle routing 
problem (VRP) structure, where a central supplier (or 
depot) serves a set of customers with a local inventory 
and a consumption rate. We can imagine a lot of real 
planning problems with such a structure, for instance, 
in the gasoline business, delivering gasoline to gas sta- 
tions from a refinery or central storage. Milk collection 
at farms for transport to a dairy has the opposite struc- 
ture, where the customers are producers and the depot 
consumes the milk. 

In the maritime sector, we can also find this VRP structure for ship operators dealing with maritime inventory routing problems. The Norwegian oil company Statoil will start its production of natural gas from Snøhvit, Melkøya, north of Norway in 2008. Most of the gas will be cooled down and transported as liquefied natural gas (LNG) by LNG tankers. At the moment the planning problem concerns one source producing the gas and several consumption ports. Frich and Horgen [10] presented a mixed integer program (MIP) model of the planning problem where this special VRP structure is exploited.

Similar maritime inventory routing problems can 
be found, for instance, with the Arabian Gulf as the 
source for the transportation of both LNG and heavier 
oil products. 


Inventory Constraints in Either Production Ports 
or Consumption Ports 


For the ISRP, the inventory management is considered 
at both the loading and the unloading ports. However, 
many real planning problems concern the design of 
routes and schedules for a fleet of ships with inventory 
constraints at just one of the port types. There exist for 
instance ship operators engaged in vendor managed in- 
ventory (VMI) contracts. Here, the ship operator mon- 
itors its customers’ inventories and must ensure that 
these are kept within predefined limits. Often, the cus- 
tomers are concerned about inventories at only the un- 
loading ports, while the ship operator has entered into 
a contract to supply the product with given quantities 
and time windows from the loading ports. The oppo- 
site might also be the case, where the customers have 
inventories at only loading ports. Then, the ship oper- 
ator must also engage in contracts to deliver these vol- 
umes with given quantities and time windows. 


Variable Production or Consumption Rate 


The production and consumption rates are assumed constant for all port inventories during the planning period in the ISRP. However, for many real planning problems this assumption is too coarse, and production and consumption rates that vary from day to day have to be taken into account in the modeling. Including this aspect in the basic ISRP model would result in a more complicated model that is harder to solve.

A maritime inventory routing problem for the LNG business was considered by Grønhaug et al. [11]. Here the production of LNG at the liquefaction plants and the consumption of LNG at the regasification terminals have to be regarded as variable. To handle these complicating factors, a time-discretized model was developed and solved by a column generation approach.

Also Ronen [16] used a time discretized model with 
a variable production and consumption rate for an 
inventory routing problem for refinery products. The 
model focuses on the inventory and not the routing part 
of the problem, as the model solution suggests ship- 
ment sizes that are assumed to be an input for a cargo 
routing problem at a later stage. 


Multiple Products 


Here we extend the ISRP to the multiproduct case. In 
the ISRP several cargoes may be transported simulta- 
neously in one ship, but the product is assumed to be 
the same. This means that the product does not need to 
be transported in separate compartments onboard the
ship or stored in separate stores at the ports. 

The problem with multiple products is frequently 
encountered by chemical and oil product transport 
companies. Al-Khayyal and Hwang [1] gave a math- 
ematical formulation for such a problem where the 
products are assumed to require dedicated compart- 
ments in the ship. For this problem there exist inven- 
tory limits and production/consumption rates for each 
product in each port. Hwang [13] used a combined La- 
grangian relaxation and heuristic approach to solve test 
instances of the problem. 

The problem described in Ronen [16] also includes 
multiple products. Sometimes the stowage onboard the 
ship must also be considered in the inventory routing 
problem; see, for instance, Haugen and Lund [12] for 
the transportation of cement products. 


Use of Spot Charters 


In some cases the dedicated fleet of ships has insuf- 
ficient transportation capacity to provide continuous 
production at all sources and consumption at all desti- 
nations. In such a case some of the loads can be serviced 
by spot charters, which are ships chartered for a single 
voyage. 

The cement company described by Haugen and Lund [12] is faced with limited vessel capacity. In some periods the company makes use of spot charters, while in peak periods additional road-based transportation is necessary. In their solution approach, the consumption inventories are sorted according to their importance and to their location, that is, according to the cost of serving them with additional trucks. It is ensured that the inventories with highest priority are served by the fleet of ships.


Combined Inventory Routing and Cargo Routing 


The cargo routing problem was introduced in the intro- 
duction. For this problem, there exist predefined car- 
goes with specified quantities and time windows. The 
cargoes may be contracted or optional spot ones. Often 
the companies facing an ISRP trade cargoes with other 
operators in order to better utilize the fleet and to en- 
sure there is product balance at their own plants. 

In the real problem described by Christiansen [2], 
the shipper trades ammonia with other operators. 
These traded volumes are determined by negotiations. 
The ship operator undertakes to load or unload ammo- 
nia within a determined quantity interval and to arrive 
at a particular port within a given time window. For 
these external ports, no inventory management prob- 
lem exists. 

There also exist shipping companies that have VMI 
contracts with some customers, but apart from that are 
involved in ordinary cargo routing. This will give these 
shipping companies a combined inventory and cargo 
routing planning problem. 


Combining Inventory Routing 
with Other Planning Aspects 


The ISRP concerns parts of a supply chain and focuses 
on sea transportation and the inventories at both ends 
of the sailing leg. In many real planning situations, it 
is sensible to consider larger parts of the supply chain. 
Persson and Gothe-Lundgren [15] studied a planning 
problem that integrates both the shipment planning of 
petroleum products from refineries to depots and the 
production scheduling at the refineries. Shih [17] and 
Liu and Sherali [14] presented two other maritime sup- 
ply chain applications where coal is transported. 

Rather than considering a larger part of the supply chain, the ISRP may be combined with other planning aspects. In Sect. “Multiple Products,” we referred to the combined ISRP and stowage of different cement products in various compartments onboard the ships. See Haugen and Lund [12] for more information about the case and solution approach.


Concluding Remarks 


We have described the maritime inventory routing 
problem, which is a combined inventory management 
and a ship routing and scheduling problem. The so- 
called basic ISRP and several extensions to the ISRP 
were presented. In practice, planners are more often 
faced with extensions of the ISRP and also the exten- 
sions described in combination with each other. 

As far as we know, no generic commercial optimiza- 
tion-based decision support system exists for solving 
maritime inventory routing problems. However, the 
shipping industry is experiencing an increased need for 
such systems owing to extended planning responsibility 
and increased fleet sizes. We expect that such systems 
will be available on the market in the years to come. 

The basic VRP is computationally very hard. The 
maritime inventory routing problem is even more de- 
manding owing to the additional degrees of freedom. 
Many of the extensions discussed in this article have barely been touched on in the operations research community. Consequently, many research challenges remain, in the development of both exact and heuristic solution methods.

Maritime transportation is faced with higher un- 
certainty in its operations compared with many other 
modes of transportation. This is due to greater depen- 
dence on weather conditions and technology. For the 
maritime inventory routing problem, we have also un- 
certainties in the production and consumption at the 
inventories. The consideration of these uncertainty as- 
pects is another interesting topic of research. 
Introduction


Progress in digital data acquisition and storage technology has resulted in the growth of huge databases. This has occurred in a variety of scientific and engineering research applications [8] as well as in the medical domain [19,20]. Making sense of these rapidly growing massive data sets gave birth to a “new” scientific discipline often referred to as Data Mining. Defining a discipline is, however, always a controversial task. The following working definition of the area was recently proposed [9]: Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.

Clearly, the term data mining is often used as a synonym for the process of extracting useful information from databases. However, the overall knowledge discovery from databases (KDD) process is far more complicated and convoluted and involves a number of additional pre- and post-processing steps [6]. Therefore, in our definition data mining refers to the ensemble of new and existing specific algorithms for extracting structure from data [8]. The exact definition of the knowledge extraction process and the expected outcomes are very difficult to characterize. However, a number of specific tasks can be identified and, by and large, define the key subset of deliverables from a data mining activity. Two such critical activities are classification and clustering.

A number of variants for these tasks can be identi- 
fied and, furthermore, the specific structure of the data 
involved greatly impacts the methods and algorithms 
that are to be employed. Before we proceed with the ex- 
act definition of the tasks we need to provide working 
definitions of the nature and structure of the data. 


Basic Definitions 


For the purposes of our analysis we will assume that the data are expressed in the form of $n$-dimensional feature vectors $x \in X \subseteq \mathbb{R}^n$. Appropriate pre-processing of the data may be required to transform the data into this form. Although in many cases these transformations can be trivial, in other cases transforming the data into a “workable” form is a highly non-trivial task. The goal of data mining is to estimate an explicit, or implicit, function that maps points of the feature vector from the input space, $X \subseteq \mathbb{R}^n$, to an output space, $C$, given a finite sample. The concept of the finite sample is important because, in general, what we are given is a finite representative subset of the original space (training set) and we wish to make predictions on new elements of the set (testing set). The data mining tasks can thus be defined based on the nature of the mapping $C$ and the extent to which the training set is characterized.

If the predicted quantity is a categorical value and if we know the value that corresponds to each element of the training set, then the question becomes how to identify the mapping that connects the feature vector and the corresponding categorical value (class). This problem is known as the classification problem (supervised learning). If the class assignment is not known and we seek to (a) identify whether a small, yet unknown, number of classes exist, and (b) define the mapping assigning the features to classes, then we have a clustering problem (unsupervised learning).

A related problem associated with superfluous information in the feature vector is the so-called feature selection problem. This problem is closely related to over-fitting in regression. Having a minimal number of features leads to simpler models, better generalization and easier interpretation. One of the fundamental issues in data mining is therefore to identify the least number of features, a subset of the original set of features, that best addresses the two issues previously defined. The concept of parsimony (Occam’s razor) is often invoked to bias the search [1]: never do with more what can be done with fewer.

Although numerous methods exist for addressing these problems, they will not be reviewed here. Reviews of classification and clustering were recently presented in [8,9]. In this short introduction we will concentrate on solution methodologies based on reformulating the clustering and classification questions as optimization problems.


Mathematical Programming Formulations 


Classification and clustering, and for that matter most 
of the data mining tasks, are fundamentally optimiza- 
tion problems. Mathematical programming method- 
ologies formalize the problem definition and make use 
of recent advances in optimization theory and appli- 
cations for the efficient solution of the corresponding 
formulations. In fact, mathematical programming ap- 
proaches, particularly linear programming, have long 
been used in data mining tasks. 

The pioneering work presented in [13,14] demon- 
strated how to formulate the problem of constructing 
planes to separate linearly separable sets of points. 

In this summary we will follow the formalism put forth in [2], since it presented one of the most comprehensive approaches to this problem. One of the major advantages of a formulation based on mathematical programming is the ease of incorporating explicit problem-specific constraints. This will be discussed in greater detail later in this summary.


Classification 


As discussed earlier the main goal in classification is to 
predict a categorical variable (class) based on the values 
of the feature vector. The general families of methods 
for addressing this problem include [9]: 


i) Estimation of the conditional probability of observing class C given the feature vector x.

ii) Analysis of various proximity metrics, with the class assignment decided on the basis of proximity.

iii) Recursive input space partitioning to maximize a score of class purity (tree-based methods).


The two-class classification problem can be formulated as the search for a function that assigns a given input vector $x$ to one of two disjoint point sets $A$ and $B$. The data are represented in the form of matrices. Assuming that the set $A$ has $m$ elements and the set $B$ has $k$ elements, the matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{k \times n}$ describe the two sets, respectively. The discrimination is based on the derivation of the hyperplane

$$
P = \{x \mid x \in \mathbb{R}^n,\ x^T w = \gamma\}
$$

with normal $w$ and distance $|\gamma| / \|w\|_2$ from the origin. The optimization problem then becomes to determine $w$ and $\gamma$ such that the separating hyperplane $P$ defines two open half spaces

$$
\{x \mid x \in \mathbb{R}^n,\ x^T w > \gamma\}, \qquad \{x \mid x \in \mathbb{R}^n,\ x^T w < \gamma\}
$$

containing mostly points in $A$ and $B$, respectively. Unless the convex hulls of $A$ and $B$ are disjoint (i.e., the sets are linearly separable), the separation can only be satisfied within some error. Minimization of the average violations provides a possible approximation of the separating hyperplane [2]:
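$$
\min_{w,\gamma}\ \frac{1}{m}\,\|(-Aw + e\gamma + e)_+\|_1 + \frac{1}{k}\,\|(Bw - e\gamma + e)_+\|_1
$$

where $e$ is a vector of ones of appropriate dimension and $(\cdot)_+$ replaces negative components by zero.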
In [2] a number of linear programming reformulations are discussed, exploring the properties of the structure of the optimization problem. In particular, an effective robust linear programming (RLP) reformulation was suggested, making possible the solution of large-scale problems:

$$
\begin{aligned}
\min_{w,\gamma,y,z}\ & \frac{e^T y}{m} + \frac{e^T z}{k}\\
\text{s.t.}\ & -Aw + e\gamma + e \le y,\\
& Bw - e\gamma + e \le z,\\
& y, z \ge 0.
\end{aligned}
$$


In [17] it was demonstrated how the above formulation 
can be applied repeatedly to produce complex space 
partitions similar to those obtained by the application 
of standard decision tree methods such as C4.5 [21] or 
CART [4]. 
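Because the RLP is an ordinary linear program, any LP solver can handle it. The sketch below uses SciPy's `linprog` with the HiGHS method as an assumed solver choice, not one prescribed by [2]; the unknowns are stacked as $(w, \gamma, y, z)$, and the function name `rlp` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def rlp(A: np.ndarray, B: np.ndarray):
    """Solve the RLP; A should end up in x'w > gamma and B in x'w < gamma."""
    m, n = A.shape
    k = B.shape[0]
    # Unknowns stacked as [w (n), gamma (1), y (m), z (k)].
    c = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
    # -A w + e gamma + e <= y   rewritten as   -A w + e gamma - y <= -e
    rows_a = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
    # B w - e gamma + e <= z    rewritten as   B w - e gamma - z <= -e
    rows_b = np.hstack([B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)  # w, gamma free
    res = linprog(c, A_ub=np.vstack([rows_a, rows_b]), b_ub=-np.ones(m + k),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n], res.fun


A = np.array([[2.0, 2.0], [3.0, 1.0], [2.0, 3.0]])
B = np.array([[-1.0, -1.0], [0.0, -2.0], [-2.0, 0.0]])
w, gamma, violation = rlp(A, B)
print(w, gamma, violation)  # these toy sets are linearly separable
```

A new point $x$ is then assigned to $A$ if $x^T w > \gamma$ and to $B$ otherwise; for separable data such as the toy sets above the optimal average violation is zero.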


Clustering 


The goal of clustering is the segmentation of the raw data into groups that share a common, yet unknown, characteristic property. Similarity is therefore a key property in any clustering task. The difficulty arises from the fact that the process is unsupervised. That is, neither the property nor the expected number of groups (clusters) is known ahead of time. The search for the optimal number of clusters is parametric in nature, and the optimal point in an “error” vs. “number of clusters” curve is usually identified by a combined objective that appropriately weighs accuracy and number of clusters. Conceptually, a number of approaches can be developed for addressing clustering problems:


i) Distance-based methods, by far the most commonly used, that attempt to identify the best k-way partition of the data by minimizing the distance of the points assigned to each cluster from the center of that cluster.

ii) Model-based methods, which assume the functional form of a model that describes each of the clusters and then search for the best parameter fit that models each cluster by minimizing some appropriate likelihood measure.


There are two different types of clustering: (1) hard clustering; (2) fuzzy clustering. The former assigns a data point to exactly one cluster, while the latter assigns a data point to one or more clusters along with the likelihood of the data point belonging to each of those clusters.


The standard formulation of the hard clustering problem is:

$$
\min_{c^1,\dots,c^k}\ \sum_{i=1}^{m} \min_{l=1,\dots,k} \|x^i - c^l\|
$$

That is, given $m$ points $x^i$ in an $n$-dimensional space and a fixed number of clusters $k$, determine the cluster centers $c^l$ such that the sum of the distances of each point to a nearest cluster center is minimized. It was shown in [3] that this general nonconvex problem can be reformulated as the minimization of a bilinear function over a polyhedral set by introducing a selection variable $t_{il}$:

$$
\begin{aligned}
\min_{c,d,t}\ & \sum_{i=1}^{m} \sum_{l=1}^{k} t_{il}\,(e^T d_{il})\\
\text{s.t.}\ & -d_{il} \le x^i - c^l \le d_{il}, \quad i = 1,\dots,m,\ l = 1,\dots,k,\\
& \sum_{l=1}^{k} t_{il} = 1, \quad t_{il} \ge 0, \quad i = 1,\dots,m,\ l = 1,\dots,k.
\end{aligned}
$$

Here $d_{il}$ is a dummy variable used to bound the components of the difference $x^i - c^l$. In the above formulation the 1-norm is selected [3].
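The bilinear structure suggests a simple alternating scheme: with the centers fixed, the problem is linear in $t_{il}$ and is solved by assigning each point to its nearest center in the 1-norm; with the assignments fixed, it separates by cluster and coordinate, and the coordinatewise median is optimal. The following Python sketch implements this k-median style heuristic. It is an illustration built on these two observations, finds only a local solution, and is not necessarily the algorithm used in [3]; the function name `k_median` and the random initialization are our own choices.

```python
import numpy as np


def k_median(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Alternating heuristic for the 1-norm clustering problem above."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(iters):
        # dist[i, l] = ||x^i - c^l||_1, the optimal value of e'd_il.
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)  # t_il = 1 for the nearest center
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for l in range(k):
            members = X[labels == l]
            if len(members):  # keep the old center if a cluster empties
                centers[l] = np.median(members, axis=0)
    objective = np.abs(X - centers[labels]).sum()
    return centers, labels, objective


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels, objective = k_median(X, k=2)
print(labels, objective)
```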

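The fuzzy clustering problem can be formulated as follows [5]:

$$
\begin{aligned}
\min_{c,w}\ & \sum_{i=1}^{m} \sum_{l=1}^{k} w_{il}^2\, \|x^i - c^l\|^2\\
\text{s.t.}\ & \sum_{l=1}^{k} w_{il} = 1, \quad i = 1,\dots,m,\\
& w_{il} \ge 0, \quad i = 1,\dots,m,\ l = 1,\dots,k,
\end{aligned}
$$

where $x^i$, $i = 1,\dots,m$, is the location descriptor for the data point, $c^l$, $l = 1,\dots,k$, is the center of the cluster, and $w_{il}$ is the likelihood of data point $i$ being assigned to cluster $l$.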
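The fuzzy problem can likewise be attacked by alternating minimization, as in the familiar fuzzy c-means iteration: for fixed memberships the optimal centers are the $w_{il}^2$-weighted means of the points, and for fixed centers the Lagrangian optimality conditions give $w_{il}$ proportional to $1/\|x^i - c^l\|^2$, normalized so that each row sums to one. A minimal Python sketch of that scheme, in which the function name and the random start are hypothetical choices and only a local solution is guaranteed:

```python
import numpy as np


def fuzzy_cluster(X: np.ndarray, k: int, iters: int = 200,
                  tol: float = 1e-9, seed: int = 0):
    """Alternating minimization for the fuzzy clustering problem above."""
    rng = np.random.default_rng(seed)
    W = rng.random((len(X), k))
    W /= W.sum(axis=1, keepdims=True)  # rows satisfy sum_l w_il = 1
    for _ in range(iters):
        W2 = W ** 2
        # Fixed W: c^l = sum_i w_il^2 x^i / sum_i w_il^2.
        C = (W2.T @ X) / W2.sum(axis=0)[:, None]
        # Fixed C: w_il proportional to 1 / ||x^i - c^l||^2, rows normalized.
        D2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        inv = 1.0 / np.maximum(D2, 1e-12)  # guard a point sitting on a center
        W_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(W_new - W).max() < tol
        W = W_new
        if converged:
            break
    W2 = W ** 2  # centers consistent with the final memberships
    C = (W2.T @ X) / W2.sum(axis=0)[:, None]
    return C, W


X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
C, W = fuzzy_cluster(X, k=2)
print(np.sort(C.ravel()), W.round(3))
```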


Support Vector Machines 


This optimization formalism bears significant resemblance to the Support Vector Machines (SVM) framework [25]. SVMs incorporate the concept of structural risk minimization by determining a separating hyperplane that not only minimizes a quantity measuring the misclassification error but also maximizes the margin separating the two classes. This can be achieved by augmenting the objective of the RLP formulation presented earlier by an appropriately weighted measure of the separation between the two classes, as

$$
(1 - \lambda)(e^T y + e^T z) + \lambda \|w\|_2^2.
$$

In [6] the concept of SVM is extended by introducing proximal support vector machines, which classify points based on proximity to one of two parallel planes that are pushed as far apart as possible. Nonlinear transformations were also introduced in [6] to enable the derivation of nonlinear boundaries in classifiers.
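For the linear case, our reading of the proximal SVM of [6] (stated here as an assumption, since the formulation is not given in this article) replaces the inequality constraints of the RLP by equalities and penalizes $\gamma$ as well as $w$: with $D$ the diagonal matrix of $\pm 1$ labels and $\nu > 0$ a weight, it minimizes $\frac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \frac{1}{2}(w^T w + \gamma^2)$. Setting the gradient to zero gives the linear system $(I/\nu + E^T E)\,[w;\gamma] = E^T D e$ with $E = [A\ \ {-e}]$, which the Python sketch below solves with NumPy. The names `proximal_svm`, `predict` and `nu` are hypothetical.

```python
import numpy as np


def proximal_svm(A: np.ndarray, d: np.ndarray, nu: float = 1.0):
    """Linear proximal SVM in closed form (our reading of [6]).

    A holds one point per row, d the labels +1/-1 (the diagonal of D).
    Minimizes (nu/2)||e - D(Aw - e*gamma)||^2 + (1/2)(w'w + gamma^2).
    """
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])  # E = [A  -e]
    # Normal equations: (I/nu + E'E) [w; gamma] = E'De, and De = d.
    z = np.linalg.solve(np.eye(n + 1) / nu + E.T @ E, E.T @ d)
    return z[:n], z[n]


def predict(X: np.ndarray, w: np.ndarray, gamma: float) -> np.ndarray:
    """Sign of x'w - gamma, i.e. the side of the plane midway between the two."""
    return np.sign(X @ w - gamma)


A = np.array([[2.0, 2.0], [3.0, 1.0], [2.0, 3.0],
              [-1.0, -1.0], [0.0, -2.0], [-2.0, 0.0]])
d = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w, gamma = proximal_svm(A, d)
print(w, gamma, predict(A, w, gamma))
```

The design point is that a single linear system of size $n+1$ replaces the LP or QP solve, which is what makes this variant attractive for large numbers of points.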


Multi-Class Support Vector Machines 


Support vector machines were originally designed for binary classification. Extending them to multi-class problems is still an open research area [10].

The earliest multi-class implementation is one-against-all [22], which constructs $k$ SVM models, where $k$ is the number of classes. The $i$th SVM classifies the examples of class $i$ against all the samples in all other classes. Another alternative builds one-against-one [12] classifiers by building $k(k-1)/2$ models, each trained on data from two classes. The emphasis of current research is on novel methods for generating all the decision functions through the solution of a single, but much larger, optimization problem [10].
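The one-against-one decomposition is easiest to see in code. The Python sketch below builds the $k(k-1)/2$ pairwise models and classifies a new point by majority vote over the pairwise winners. To keep it self-contained, each binary model is a nearest-class-mean rule standing in for an SVM; that substitution, the voting rule and all names are assumptions of this sketch.

```python
from itertools import combinations

import numpy as np


def fit_one_against_one(X: np.ndarray, y: np.ndarray):
    """Train one binary model per pair of classes: k(k-1)/2 models.

    Each binary model is a nearest-class-mean rule standing in for an SVM.
    """
    return {(a, b): (X[y == a].mean(axis=0), X[y == b].mean(axis=0))
            for a, b in combinations(np.unique(y), 2)}


def predict_vote(models, x: np.ndarray):
    """Majority vote over the pairwise winners."""
    votes = {}
    for (a, b), (mean_a, mean_b) in models.items():
        winner = a if np.linalg.norm(x - mean_a) <= np.linalg.norm(x - mean_b) else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)


X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
models = fit_one_against_one(X, y)
label = predict_vote(models, np.array([9.5, 0.5]))
print(len(models), label)
```

With $k = 3$ both schemes need three models; for $k = 10$, one-against-one needs 45 pairwise models against 10 for one-against-all, although each pairwise model is trained on less data.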


Data Mining in the Presence of Constraints 


Prior knowledge about a system is often omitted in data mining applications because most algorithms do not have adequate provisions for explicitly incorporating such types of constraints. Prior knowledge can encode explicit and/or implicit relations among the features or model the existence of “obstacles” in the feature space [24].

One of the major advantages of a mathematical programming framework for performing data mining tasks is that prior knowledge can be incorporated in the definition of the various tasks in the form of (non)linear constraints. Efficient incorporation of prior knowledge in the form of nonlinear inequalities within the SVM framework was recently proposed by [15]. Reformulations of the original linear and nonlinear SVM classifiers to accommodate prior knowledge about the problem were presented in [7] in the context of approximation and in [16] in the context of classifiers.


Data Mining and Integer Optimization 


Data mining tasks involve, fundamentally, discrete de- 
cisions: 

• How many clusters are there?

• Which class does a record belong to?

• Which features are most informative?

• Which samples capture the essential information?

Implicit enumeration techniques such as branch-and-bound
were used early on to address the problem of
feature selection [18].
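Implicit enumeration can be illustrated with a generic branch-and-bound sketch for choosing d of n features. It is not necessarily the method of [18]. The sketch assumes a criterion J that is monotone, meaning that removing a feature never increases J. Under that assumption, a subtree whose root scores no better than the best subset found so far cannot contain a better subset and can be discarded. The function names and the criterion interface are hypothetical. An exhaustive search is included as a reference.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Optional, Tuple

Subset = FrozenSet[int]


def branch_and_bound_select(n: int, d: int,
                            J: Callable[[Subset], float]) -> Tuple[Optional[Subset], float]:
    """Pick d of the features 0..n-1 maximizing a monotone criterion J (0 <= d <= n)."""
    best_set: Optional[Subset] = None
    best_val = float("-inf")

    def search(current: Subset, next_removable: int) -> None:
        nonlocal best_set, best_val
        removable = [f for f in sorted(current) if f >= next_removable]
        # Dead end: too few removable features left to shrink the set to size d.
        if len(removable) < len(current) - d:
            return
        val = J(current)
        # Bound: every descendant is a subset of `current`, so by monotonicity
        # it cannot score above val; prune if that cannot beat the incumbent.
        if val <= best_val:
            return
        if len(current) == d:
            best_set, best_val = current, val
            return
        # Branch: remove features in increasing index order, so that each
        # d-subset is reached along exactly one path of the search tree.
        for f in removable:
            search(current - {f}, f + 1)

    search(frozenset(range(n)), 0)
    return best_set, best_val


def brute_force_select(n: int, d: int, J: Callable[[Subset], float]) -> Subset:
    """Exhaustive reference: evaluates every d-subset (exponential in n)."""
    return max((frozenset(c) for c in combinations(range(n), d)), key=J)


if __name__ == "__main__":
    weights = [3.0, 1.0, 4.0, 1.0, 5.0]  # hypothetical per-feature merits
    print(branch_and_bound_select(5, 2, lambda s: sum(weights[i] for i in s)))
```

The pruning is only valid when J really is monotone. For non-monotone criteria the search can discard the optimum.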

Algorithms inspired by mathematical programming for
addressing various data mining problems are now being
revisited and cast as integer optimization problems.
Representative formulations include feature selection
using Mixed-Integer Linear Programs [11], while in [23]
integer optimization models are used to address the
problems of classification and regression.


Research Challenges 


Numerous issues can of course be raised. However, we
would like to focus on three critical aspects:


i) Scalability and the curse of dimensionality. Data- 
bases are growing extremely fast and problems of 
practical interest are routinely composed of mil- 
lions of records and thousands of features. The 
computational complexity is therefore expected to 
grow beyond what is currently reasonable and 
tractable. Hardware advances alone will not address
this problem either, as the increase in computational
complexity outgrows the increase in computational
speed. The challenge is therefore two-fold: either
improve the algorithms and their implementation,
or explore sampling and dimensionality reduction
techniques.

ii) Noise and infrequent events. Noise and uncertainty
in the data are a given. Therefore, data mining
algorithms in general and mathematical programming
formulations in particular have to account for
the presence of noise. Issues from robustness and
uncertainty propagation have to be incorporated.
However, an interesting issue emerges: how do we
distinguish between noise and an infrequent, albeit
interesting, observation? This may in fact be a
question with no answer.


iii) Interpretation and visualization. The ultimate goal
of data mining is understanding the data and
developing actionable strategies based on the
conclusions. We need to improve not only the
interpretation of the derived models but also the
knowledge delivery methods based on them.
Optimization and mathematical programming need
to provide not just the optimal solution but also
some way of interpreting the implications of a
particular solution, including the quantification of
potentially crucial sensitivities.
Introduction


Supply chain management (SCM) is the integration 
of key business processes from end users through the 
original suppliers to the customers that provides prod- 
ucts, services and information that add value to all par- 
ties [13,14,17]. It is therefore concerned with the orga- 
nization, the planning and the qualitative and quanti- 
tative determination of material and information flows 
both in and between facilities (vendors, plants, sites and 
distribution centres) and between these and the final 
consumers. It is a set of important activities in all pro- 
ducing facilities and in many organizations [6]. 

For some restricted production problems, such as
determining an optimal control for a chemical plant,
suitable experimental designs can be enacted, such as
EVolutionary OPeration (EVOP) [4], Taguchi
methods [19], or more complex experimental designs such
as Latin squares, Graeco-Latin squares and block designs [21].

In general, verification procedures based on
experimental replication and design cannot be used in the
applied sciences, because non-reversible and unpredictable
changes in the environment occur [18]. The outcome of a
plan therefore cannot be attributed to the effect of the
decision taken rather than to an environmental change,
so there can be no evaluation of the relevance of a
formulated supply chain plan.

Thus methodologies more complex than those based
on experimental verification, intuition or anecdotal
evidence must be posited. The solution of any SCM
problem must be undertaken with respect to a set of
principles and procedures that ensure the formulation
of expectationally valid plans, i.e. plans for which
robust, valid and feasible policies are determined.

To enable management to formulate good SCM 
plans, the methodology proposed should be analysed 
for its logical consistency, its statistical correctness and 
its adequacy. Essentially, it must be shown that from
acceptable premises or axioms, by suitable deductions,
a policy is formulated (syntactic correctness). Since
this policy cannot be tested, but only applied, it must
also be shown that in many other historical derivations
the policies that were formulated by this methodology
turned out to be applicable (semantic adequacy).

A dynamic non-linear stochastic system formula- 
tion of an SCM model must be estimated and applied. 
Thus an optimization algorithm must be specified and 
solved which determines simultaneously the adequate 
functional form, its parameterization and the optimal 
control [6]. 


Definitions 


In this section some fundamental definitions will be 
given. 

A dynamical system is a precise mathematical ob- 
ject [16], and given the flows of the activities of the phe- 
nomenon, the input-output relationships must be de- 
termined by appropriate estimation methods. 

Not every relationship can be modelled by mathe- 
matical system theory, since a representation which is 
non-anticipatory is required [16], while the condition 
that the functionals be sufficiently smooth which was 
previously required may be waived. 

Dynamical systems have been defined at a high level 
of generality to refine concepts and perceive unity in 
a diversity of applications, and by appropriate mod- 
elling whole hierarchies of phenomena can be repre- 
sented as systems defined at different levels. 


Definition 1 ([16]) A dynamical system is a composite
mathematical object defined by the following axioms:

1. There is a given time set $T$, a state set $X$, a set of
input values $U$, a set of acceptable input functions
$\Omega = \{\omega: T \to U\}$, a set of output values $Y$ and a set
of output functions $\Gamma = \{\gamma: T \to Y\}$.

2. (Direction of time). $T$ is an ordered subset of the
reals.

3. The input space $\Omega$ satisfies the following conditions:
(a) (Non-triviality). $\Omega$ is non-empty.
(b) (Concatenation of inputs). An input segment
$\omega_{(t_1,t_2]}$ is $\omega \in \Omega$ restricted to $(t_1,t_2] \cap T$. If
$\omega, \omega' \in \Omega$ and $t_1 < t_2 < t_3$, there is an $\omega'' \in \Omega$
such that $\omega''_{(t_1,t_2]} = \omega_{(t_1,t_2]}$ and $\omega''_{(t_2,t_3]} = \omega'_{(t_2,t_3]}$.

4. There is a state transition function $\varphi: T \times T \times
X \times \Omega \to X$ whose value is the state
$x(t) = \varphi(t;\tau,x,\omega) \in X$ resulting at time $t \in T$ from the
initial state $x = x(\tau) \in X$ at the initial time $\tau \in T$
under the action of the input $\omega \in \Omega$. $\varphi$ has the
following properties:

(a) (Direction of time). $\varphi$ is defined for all $t \ge \tau$, but
not necessarily for all $t < \tau$.

(b) (Consistency). $\varphi(t;t,x,\omega) = x$ for all $t \in T$, all
$x \in X$ and all $\omega \in \Omega$.

(c) (Composition property). For any $t_1 < t_2 < t_3$
there results:

$$\varphi(t_3;t_1,x,\omega) = \varphi\big(t_3;t_2,\varphi(t_2;t_1,x,\omega),\omega\big)$$

for all $x \in X$ and all $\omega \in \Omega$.
(d) (Causality). If $\omega, \omega' \in \Omega$ and $\omega_{(\tau,t]} = \omega'_{(\tau,t]}$,
then $\varphi(t;\tau,x,\omega) = \varphi(t;\tau,x,\omega')$.

5. There is a given readout map $\eta: T \times X \to Y$
which defines the output $y(t) = \eta(t,x(t))$. The
map $(\tau,t] \to Y$ given by $\sigma \mapsto \eta(\sigma,\varphi(\sigma;\tau,x,\omega))$,
$\sigma \in (\tau,t]$, is an output segment, that is, the
restriction $\gamma_{(\tau,t]}$ of some $\gamma \in \Gamma$ to $(\tau,t]$.
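The axioms of Definition 1 can be made concrete with a hypothetical Python sketch. It builds a state transition function φ(t; τ, x, ω) on the discrete time set T = ℤ from a one-step map x_{k+1} = f(x_k, ω(k+1)). The one-step map and the sample input function are illustrative assumptions. The point is that direction of time (a), consistency (b), composition (c) and causality (d) then hold by construction, which the usage lines check numerically.

```python
from typing import Callable

State = float
InputFn = Callable[[int], float]  # omega: T -> U, with T the integers


def make_transition(f: Callable[[State, float], State]
                    ) -> Callable[[int, int, State, InputFn], State]:
    """Build phi(t; tau, x, omega) by iterating x_{k+1} = f(x_k, omega(k+1))."""
    def phi(t: int, tau: int, x: State, omega: InputFn) -> State:
        # (a) Direction of time: phi is only defined for t >= tau.
        if t < tau:
            raise ValueError("phi(t; tau, x, omega) requires t >= tau")
        state = x
        # Only omega restricted to (tau, t] is read, so the system is causal (d).
        for k in range(tau, t):
            state = f(state, omega(k + 1))
        return state
    return phi


if __name__ == "__main__":
    phi = make_transition(lambda x, u: 0.5 * x + u)  # hypothetical dynamics
    omega = lambda k: float(k % 3)                    # hypothetical input function
    x0 = 2.0
    assert phi(4, 4, x0, omega) == x0                 # (b) consistency
    direct = phi(9, 1, x0, omega)
    via_t2 = phi(9, 5, phi(5, 1, x0, omega), omega)
    assert abs(direct - via_t2) < 1e-12               # (c) composition property
    print(direct)
```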


The following mathematical structures in Definition 1
will be indicated by:

• The pair $(t, x)$, $t \in T$, $x \in X$, is called an event;

• The state transition function $\varphi(x_t, u_t)$ is called a
trajectory.

Phenomena may also be modelled through dynamical
systems in the input/output sense, which reflect an
experimental design or a simulative approach, long
applied in science.


Definition 2. A dynamical system in an input/output
sense is a composite mathematical object defined as
follows:

1. There are given sets $T$, $U$, $\Omega$, $Y$ and $\Gamma$ satisfying all
the properties required by Definition 1.

2. There is a set $A$ indexing a family of functions

$$F = \{f_\alpha: T \times \Omega \to Y,\ \alpha \in A\};$$

each member of $F$ is written explicitly as $f_\alpha(t,\omega) =
y(t)$, which is the output resulting at time $t$ from the
input $\omega$ under the experiment $\alpha$. Each $f_\alpha$ is called an
input/output function and has the following
properties:

(a) (Direction of time). There is a map $\iota: A \to T$
such that $f_\alpha(t,\omega)$ is defined for all $t \ge \iota(\alpha)$.
(b) (Causality). Let $\tau, t \in T$ and $\tau < t$. If $\omega, \omega' \in \Omega$
and $\omega_{(\tau,t]} = \omega'_{(\tau,t]}$, then $f_\alpha(t,\omega) = f_\alpha(t,\omega')$ for
all $\alpha$ such that $\tau = \iota(\alpha)$.
While the input/output approach may determine 
a family of functions, which generally vary over the time 
interval of realization and across instances, the state- 
space approach represents the trajectories in the way in- 
dicated, through a unique function. The latter approach 
is intuitively more appealing, especially in applications. 

The representations are equivalent. It is easy to 
transform a given system from a state space for- 
mulation into an input/output formulation and vice 
versa [2,16], so each may be used as convenience sug- 
gests. 

It cannot be assumed generally that a dynamical sys- 
tem satisfies the conditions of smoothness, nor that it 
will meet the necessary and sufficient conditions for an 
optimal control to exist. Thus in general, the dynamical 
systems to be dealt with may have an awkward struc- 
ture, but through the combined estimation and opti- 
mization approach a sufficiently good approximation 
may be obtained with the required characteristics [6]. 

A sufficiently general representation of a dynamical 
system may be formulated by applying Definition 1, re- 
calling the equivalence of an input/output system and 
a system in state form: 


$$x_{t+1} = \varphi(x_t, u_t), \tag{1}$$

$$y_t = \eta(x_t), \tag{2}$$

where $x_t \in X \subseteq \mathbb{R}^r$ may simply be taken as an $r$-dimensional
vector in a Euclidean space $X$, indicating the state
of the system at time $t$, $u_t \in U \subseteq \mathbb{R}^q$ may be taken as
a $q$-dimensional vector in a Euclidean subspace $U$ of
control variables and $y_t \in Y \subseteq \mathbb{R}^p$ is a $p$-dimensional
vector in a Euclidean space $Y$ of output variables, in line
with Definitions 1 and 2.
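Equations (1) and (2) can be read as a recipe for generating a trajectory and its outputs from an initial state and a control sequence. The sketch below assumes that φ and η are time-invariant and are passed in as Python callables. The two-state example, in which x[0] accumulates the control and x[1] accumulates x[0], is purely hypothetical and only serves to exercise the loop.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def simulate(phi: Callable[[Vector, Vector], Vector],
             eta: Callable[[Vector], Vector],
             x0: Vector,
             controls: Sequence[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Iterate x_{t+1} = phi(x_t, u_t), Eq. (1), and read out y_t = eta(x_t), Eq. (2)."""
    states = [list(x0)]
    outputs = [eta(states[0])]
    for u in controls:
        states.append(phi(states[-1], u))
        outputs.append(eta(states[-1]))
    return states, outputs


if __name__ == "__main__":
    # Hypothetical system with r = 2, q = 1, p = 1.
    phi = lambda x, u: [x[0] + u[0], x[1] + x[0]]
    eta = lambda x: [x[1]]
    xs, ys = simulate(phi, eta, [0.0, 0.0], [[1.0], [2.0], [3.0]])
    print(xs)  # [[0.0, 0.0], [1.0, 0.0], [3.0, 1.0], [6.0, 4.0]]
    print(ys)  # [[0.0], [0.0], [1.0], [4.0]]
```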

The definition of a dynamical system is based on
defining an intermediary set of states and a transition
function or a family of functions. Neither of these
constructions is unique, so if it is desired to represent
an SCM system by such structures, equivalence of the
possible structures must be shown.


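Definition 3 Given two states $x_{t_1}$ and $x_{t_2}$ belonging to
systems $S$ and $\hat S$, which may not be identical but have
a common input space $\Omega$ and output space $Y$, the two
states are said to be equivalent if and only if for all input
segments $\omega_{(t_0,t]} \in \Omega$ the response segment of $S$ starting
in state $x_{t_1}$ is identical with the response segment of $\hat S$
starting in state $x_{t_2}$; that is

$$x_{t_1} \simeq x_{t_2} \iff \eta\big(t,\varphi(t;t_0,x_{t_1},\omega_{(t_0,t]})\big) = \hat\eta\big(t,\hat\varphi(t;t_0,x_{t_2},\omega_{(t_0,t]})\big) \quad \forall t \in T,\ \forall \omega \in \Omega,\ x_{t_1} \in S,\ x_{t_2} \in \hat S. \tag{3}$$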


Systems $S$ and $\hat S$ may be two models of an SCM system
solved with different control policies, or they may be
various alternative models of the phenomenon.


Definition 4 A system is in reduced form if there are 
no distinct states in its state space which are equivalent 
to each other. 


Definition 5 Systems $S$ and $\hat S$ are equivalent ($S \simeq \hat S$) if
and only if to every state in the state space of $S$ there
corresponds an equivalent state in the state space of $\hat S$
and vice versa.
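For a system with finitely many states and input values, Definitions 3 to 5 can be checked mechanically. The hypothetical Python sketch below uses Moore-style partition refinement. States are first grouped by their output, and groups are then split until every member of a group moves into the same group under every input. The sketch makes three assumptions. First, the system is deterministic and time-invariant, with one-step transition `delta(x, u)` and readout `eta(x)`. Second, the response includes the output at the starting state, which slightly simplifies the $(t_0, t]$ segments of Definition 3. Third, two systems are compared on the disjoint union of their state spaces.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

State = Hashable


def equivalence_classes(states: Sequence[State], inputs: Sequence[Hashable],
                        delta: Callable[[State, Hashable], State],
                        eta: Callable[[State], Hashable]) -> List[Set[State]]:
    """Group states whose output responses agree for every finite input sequence."""
    # States with different outputs can never be equivalent.
    out_ids: Dict[Hashable, int] = {}
    block = {s: out_ids.setdefault(eta(s), len(out_ids)) for s in states}
    while True:
        # Signature: own block plus the block reached under each input value.
        sig_ids: Dict[Tuple[int, ...], int] = {}
        new_block = {}
        for s in states:
            sig = (block[s],) + tuple(block[delta(s, u)] for u in inputs)
            new_block[s] = sig_ids.setdefault(sig, len(sig_ids))
        # The new partition refines the old one, so equal block counts mean it is stable.
        stable = len(sig_ids) == len(set(block.values()))
        block = new_block
        if stable:
            break
    classes: Dict[int, Set[State]] = {}
    for s in states:
        classes.setdefault(block[s], set()).add(s)
    return list(classes.values())


def is_reduced(states, inputs, delta, eta) -> bool:
    """Definition 4: no two distinct states are equivalent."""
    return all(len(c) == 1 for c in equivalence_classes(states, inputs, delta, eta))


def systems_equivalent(sys_s, sys_hat, inputs) -> bool:
    """Definition 5 on the disjoint union of the state spaces;
    each system is a triple (states, delta, eta) over the same inputs."""
    (states_s, delta_s, eta_s), (states_h, delta_h, eta_h) = sys_s, sys_hat
    tagged = [("S", x) for x in states_s] + [("H", x) for x in states_h]

    def delta(t, u):
        tag, x = t
        return (tag, delta_s(x, u) if tag == "S" else delta_h(x, u))

    def eta(t):
        tag, x = t
        return eta_s(x) if tag == "S" else eta_h(x)

    # Equivalent iff every class contains states of both systems.
    return all({tag for tag, _ in c} == {"S", "H"}
               for c in equivalence_classes(tagged, inputs, delta, eta))


if __name__ == "__main__":
    # Hypothetical 4-state system: the state moves by the input modulo 4 and
    # only its parity is observed, so states 0/2 and 1/3 are equivalent.
    states, inputs = [0, 1, 2, 3], [0, 1]
    delta = lambda s, u: (s + u) % 4
    eta = lambda s: s % 2
    print(equivalence_classes(states, inputs, delta, eta))
    reduced = ([0, 1], lambda s, u: (s + u) % 2, lambda s: s)  # its reduced form
    print(systems_equivalent((states, delta, eta), reduced, inputs))
```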


Some important conditions are required to make the 
representation of the SCM adequate. 

The conditions of the system are:

• Reachability

• Controllability

• Observability

• Stability

These conditions are very important since they allow 
trajectories to be defined, the initial point of trajecto- 
ries to be determined and their stability properties to be 
derived. Moreover they can be applied at any moment 
in time to determine if the goals of the SCM are still at- 
tainable and at what cost. Reachability, controllability 
and stability are seldom formally examined and yet at 
every period exogenous events can arise to nullify even 
the best formulated plan, so these are important instru- 
ments for SCM [6]. 
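In the special case of a linear time-invariant model x_{t+1} = A x_t + B u_t, y_t = C x_t, reachability and observability reduce to the classical Kalman rank tests. This linear case is an assumption, since the SCM models discussed here are generally non-linear. The Python sketch below uses exact rational arithmetic to avoid tolerance issues, and the helper names are illustrative. For discrete-time systems, full rank of [B, AB, ..., A^{n-1}B] characterizes reachability. Controllability to the origin coincides with reachability when A is nonsingular.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(P: Sequence[Sequence[float]], Q: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def rank(M: Sequence[Sequence[float]]) -> int:
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def controllability_matrix(A: Matrix, B: Matrix) -> Matrix:
    """[B, AB, ..., A^{n-1} B] with the blocks placed side by side."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(matmul(A, blocks[-1]))
    return [[v for blk in blocks for v in blk[i]] for i in range(n)]


def observability_matrix(A: Matrix, C: Matrix) -> Matrix:
    """[C; CA; ...; CA^{n-1}] with the blocks stacked vertically."""
    n = len(A)
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(matmul(blocks[-1], A))
    return [list(row) for blk in blocks for row in blk]


def is_reachable(A: Matrix, B: Matrix) -> bool:
    return rank(controllability_matrix(A, B)) == len(A)


def is_observable(A: Matrix, C: Matrix) -> bool:
    return rank(observability_matrix(A, C)) == len(A)


if __name__ == "__main__":
    A = [[1, 1], [0, 1]]  # hypothetical double-integrator dynamics
    print(is_reachable(A, [[0], [1]]), is_observable(A, [[1, 0]]))
```

Stability of the linear case would additionally require all eigenvalues of A to lie strictly inside the unit circle. That test is not sketched here.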

An important property which distinguishes dynamical
systems from their counterparts derived in comparative
statics is the distinction between systems which
are simply equivalent and those which are multiply
equivalent [6]. This distinction is crucial if dynamical
systems are considered, while with comparative static
models the distinction does not apply. This is one of the
many reasons that one should insist on solving SCM
dynamic estimation problems with a data-driven
formulation [6].

The dynamical system representation of a SCM sys- 
tem permits one to verify its specification, whether the 
optimal control which determines the final event is 
reachable, if the system is controllable throughout the 
sequence of events comprising the trajectory, if the sys- 
tem is observable and finally if the given solution is 
stable, so that small perturbations will not give rise to 
explosive perturbations or to chaotic behaviour. In so 
doing crucial questions which are important to man- 
agement can be answered. 

If these conditions are not verified, this will suggest 
strategic changes to the SCM system or profound mod- 
ification of policies, aspects which are difficult to deter- 
mine in advance. 

Computationally, these aspects are handled by 
adding appropriate constraints in the mathematical 
program [6]. 


Formulation 


Consider the monitoring of a set of activities in time 
of a supply chain at a given level of aggregation, which 
may be at the department, plant or firm level, or a hi- 
erarchical system developed through all these organiza- 
tional structures. Although the accuracy of the repre- 
sentation may depend on the sampling strategy and the 
time interval, these aspects will not be considered here. 

Thus a given finite-dimensional estimation and op- 
timization problem will be considered which may well 
be non-linear and dynamic. 

Consider the data set of a phenomenon consisting
of measurements $(y^t, x^t, u^t)$ over $t = 1, 2, \dots, T$
periods, where it is assumed that $y^t \in \mathbb{R}^p$ is a $p$-dimensional
vector, while $x^t \in \mathbb{R}^r$ is an $r$-dimensional vector
of explanatory or state variables of the dynamic
process. Also, $u^t$ is a $q$-dimensional vector of
control variables. It is desired to determine functional
forms $\varphi: \mathbb{R}^{r+q} \to \mathbb{R}^r$ and $\eta: \mathbb{R}^r \to \mathbb{R}^p$ and a set of
suitable coefficients $\Theta \in \mathbb{R}^m$ such that:

$$\min J = \sum_{t=T+1}^{\bar T} c(x^t, u^t, y^t), \tag{4}$$

$$x^{t+1} = \varphi(x^t, u^t, y^t, w^t) \quad \forall t = T+1, \dots, \bar T - 1, \tag{5}$$

$$y^{t+1} = \eta(x^t, u^t, v^t) \quad \forall t = T+1, \dots, \bar T - 1, \tag{6}$$

where $T+1, \dots, \bar T$ is the forecast period and $w^t$ and $v^t$ are
stochastic processes also to be determined.

Equation (4) is the objective function for the supply
chain and (5) and (6) are the system equations in state
space formulation; a similar representation may be
adopted for the input-output formulation [16,20].

The system (4)-(6) could be estimated by a maximum
likelihood method so as to minimize the random
errors, indicated by $w^t \in \mathbb{R}^r$ and $v^t \in \mathbb{R}^p$, such that they
will have minimum variance and zero mean value, and
then on the quantified model the optimal control
problem could be solved, usually through an appropriate
optimization problem.

However, for this type of model with serially cor- 
related disturbances, which are also correlated with the 
control variables, its estimation will be biased and the 
necessary least-squares properties to ensure an asymp- 
totically correct estimate may only be fulfilled in ex- 
ceptional cases. Thus the two-stage approach, indicated 
above, is inappropriate [15]. 

It is important to apply a suitable data-driven statistical
method to determine the most appropriate functional form
and the most precise values of the parameters: when
implemented correctly with regard to an accurately
specified functional form, such a method will provide
estimates of parameters that satisfy the required
statistical properties [1,18].

Suppose that all the statistical properties that a given 
estimate must fulfil are set up as constraints to the maxi- 
mum likelihood problem to be solved; then the parame- 
ters are defined implicitly by this optimization problem, 
which can be inserted into the optimal control system 
for policy determination, so that statistically correct
estimates will always result. Thus the solution yielding the
best policy over the forecast period, which starts at period
T + 1, can be chosen by solving an optimization formulation
of this complex problem. By recursing on the specifications,
i.e. by changing the functional form, increasingly better
fits can be obtained. At each iteration, the best
combination of parameterization and policy is obtained.

The unknowns to be determined are the input
and output variables considered and the parameters
of the functional form specified in the current
iteration, indicated as $\Theta = \{\theta_1, \theta_2\} \subset \mathbb{R}^m$, respectively
for (5) and (6). Note that $m$ may be much larger than
$2r + q + p + 1$, the number of variables present in
each system, since the system is non-linear.

The mathematical program will be formulated with 
respect to the residual variables, but it is immediate that 
for a given functional form, the unknown parameters 
will be specified and thus the unknowns of the problem 
will also be defined and available. Thus the mathemat- 
ical program is fully specified for each functional form 
to be considered. 

Using the notation given above, the residual terms
are given from Eqs. (5) and (6) as:

$$w_i = \hat x_{i+1} - \varphi(\hat x_i, \hat u_i, \hat y_i; \theta_1), \quad i = 1, 2, \dots, N, \tag{7}$$

$$v_i = \hat y_{i+1} - \eta(\hat x_i, \hat u_i; \theta_2), \quad i = 1, 2, \dots, N, \tag{8}$$

where $\hat{\ }$, as usual, indicates the historical values of a
variable, and thus suitable values of $\theta_1$ and $\theta_2$ must be
determined by the mathematical program such that all the
constraints expressed in terms of $w_i, v_i$, $\forall i$, are satisfied.
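The residual definitions (7) and (8) amount to a single pass over the historical record. The sketch below assumes scalar variables and hypothetical functional forms φ and η that are linear in the parameters. Indices are 0-based, so N + 1 observations yield N residuals of each kind.

```python
from typing import Callable, List, Sequence, Tuple


def residuals(x_hat: Sequence[float], u_hat: Sequence[float], y_hat: Sequence[float],
              phi: Callable[[float, float, float, Sequence[float]], float],
              eta: Callable[[float, float, Sequence[float]], float],
              theta1: Sequence[float], theta2: Sequence[float]
              ) -> Tuple[List[float], List[float]]:
    """Residuals (7)-(8) for N + 1 historical observations indexed 0..N (scalar case)."""
    N = len(x_hat) - 1
    w = [x_hat[i + 1] - phi(x_hat[i], u_hat[i], y_hat[i], theta1) for i in range(N)]
    v = [y_hat[i + 1] - eta(x_hat[i], u_hat[i], theta2) for i in range(N)]
    return w, v


if __name__ == "__main__":
    # Hypothetical forms: x_{i+1} = th[0] x_i + th[1] u_i and y_{i+1} = th[0] x_i.
    phi = lambda x, u, y, th: th[0] * x + th[1] * u
    eta = lambda x, u, th: th[0] * x
    x_hat = [2.0, 2.0, 2.0, 2.25]
    u_hat = [1.0, 1.0, 1.0, 1.0]
    y_hat = [0.0, 6.0, 6.0, 6.0]
    w, v = residuals(x_hat, u_hat, y_hat, phi, eta, (0.5, 1.0), (3.0,))
    print(w, v)  # [0.0, 0.0, 0.25] [0.0, 0.0, 0.0]
```

In the constrained formulation, θ1 and θ2 are not fixed in advance as they are here. They are decision variables that the mathematical program chooses together with the controls.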


Methods and Applications 


Given an experimental data set obtained as a set of mea- 
surements of the operation of a phenomenon, it is de- 
sired to determine a suitable representation of it in the 
form of a model, so as to determine a suitable control 
law for the model which can then be extended to the 
phenomenon and thus obtain a better performance [3]. 

Except in some simple cases, the representation as- 
sumed by the model and the data that have been col- 
lected will condition the results obtainable by enacting 
the control law. For models that are non-linear in the 
parameters, the interaction between the estimation of 
these and the determination of an optimal control is 
much more complex than in the linear case, requiring the
solution of constrained optimization problems which 
will determine simultaneously the best estimates and 
the optimal control. 

Consider the availability of a given data set con- 
taining a number of sets of time series data or cross- 
sectional data. To determine from these data a suitable 
model, a functional form must be selected and a set of 
suitable parameters must be estimated which will satisfy 
all the conditions on the model and permit the
determination of a suitable set of control variables, which will
define an optimal control with respect to a predefined 
merit function. 

Thus from the data set it is desired to derive a suffi- 
ciently accurate model of the phenomenon, which can 
then be used in control and in prediction. 

Statistical estimation methods are important be- 
cause, when implemented correctly with regard to an 
accurately specified functional form, they will provide 
estimates of parameters that have the following proper- 
ties [1,18]: 

1. The parameter estimates are unbiased.
• As the size of the data set grows larger, the
estimated parameters tend to their true values.
2. The parameter estimates are consistent, which means
that they satisfy the following conditions:

• The estimated parameters are asymptotically
unbiased.

• The variance of the parameter estimates must tend
to zero as the size of the data set tends to infinity.

3. The parameter estimates are asymptotically efficient.

• The estimated parameters are consistent.

• The estimated parameters have smaller asymptotic
variance than any other consistent estimator.

4. The residuals have minimum variance, which is
ensured by the following factors:

• The variance of the residuals must be minimum.

• The residuals must be homoscedastic.

• The residuals must not be serially correlated.

5. The residuals are unbiased (have zero mean).
6. The residuals have a non-informative distribution
(usually, a Gaussian distribution).

• If the distribution of the residuals is informative,
the extra information could somehow be
obtained, reducing the variance of the residuals,
their bias etc., with the result that better estimates
are obtained.

In short, through correct implementation of statisti- 
cal estimation techniques the estimates are as close as 
possible to their true values, all the information that is 
available is applied and the uncertainty surrounding the 
estimates and the data fit is reduced to the maximum 
extent possible. Thus the estimates of the parameters, 
which satisfy all these conditions, are the ‘best’ possible 
in a ‘technical’ sense [1]. 
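Several of the properties listed above concern the residuals and can be monitored numerically. The hypothetical Python helper below computes simple sample summaries: the mean, the variance, the lag-1 autocorrelation, and a split-half variance ratio as a crude indicator of homoscedasticity. These are informal diagnostics chosen for illustration. They are neither formal statistical tests nor necessarily the exact conditions imposed by the method described here.

```python
import statistics
from typing import Dict, Sequence


def residual_diagnostics(res: Sequence[float]) -> Dict[str, float]:
    """Sample summaries related to properties 4 and 5 above."""
    if len(res) < 4:
        raise ValueError("need at least 4 residuals")
    mean = statistics.fmean(res)
    centred = [r - mean for r in res]
    ss = sum(c * c for c in centred)
    if ss == 0:
        raise ValueError("residuals are constant")
    half = len(res) // 2
    first = statistics.pvariance(res[:half])
    second = statistics.pvariance(res[half:])
    return {
        "mean": mean,                 # property 5: should be close to 0
        "variance": ss / len(res),    # property 4: to be made as small as possible
        # close to 0 when the residuals are not serially correlated
        "lag1_autocorr": sum(a * b for a, b in zip(centred, centred[1:])) / ss,
        # close to 1 when the residuals are homoscedastic
        "variance_ratio": second / first if first > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Alternating residuals: zero mean, but strongly negatively autocorrelated.
    print(residual_diagnostics([1.0, -1.0, 1.0, -1.0]))
```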


To ensure that all the statistical properties which the 
given estimates of the residuals must fulfil are satisfied 
at every iteration, instead of solving an unconstrained 
maximum likelihood or least-squares problem [15], the 
required statistical properties of the estimates are set 
up as constraints, together with the specification of the 
model of the phenomenon, and this global optimization 
problem is solved for all the undetermined variables. 

The parameters of this model to be estimated are 
defined implicitly through those constraints which de- 
fine the statistical conditions. On solving the global op- 
timization problem, the parameter estimates that result 
will be defined for the optimal control system for the 
policy determination so that statistically correct esti- 
mates will always result. 

The procedure adopted can be specified easily by us- 
ing the same notation as above and by adding an ad- 
ditional set of constraints which express the statistical 
conditions that must be satisfied by the estimates. 

Let

$$\psi(x_{i+1}, x_i, u_i, y_{i+1}, y_i, w_i, v_i; \theta_1, \theta_2) = 0, \quad i = 1, 2, \dots, N, N+1, \dots, T \tag{9}$$

be the set of conditions to be satisfied to obtain
estimates, if they exist, which satisfy the statistical
properties indicated above. Then the optimization problem to
be solved is:

$$\min J = \sum_{i=N+1}^{T} c(x_i, u_i, y_i), \tag{10}$$

$$x_{i+1} = \varphi(x_i, u_i, y_i, w_i; \theta_1), \quad i = 1, 2, \dots, T, \tag{11}$$

$$y_{i+1} = \eta(x_i, u_i, v_i; \theta_2), \quad i = 1, 2, \dots, T, \tag{12}$$

$$0 \le \psi(x_{i+1}, x_i, u_i, y_{i+1}, y_i, w_i, v_i; \theta_1, \theta_2), \quad i = 1, 2, \dots, T. \tag{13}$$
Thus the solution yielding the best policy can be cho- 
sen by solving an optimization formulation of this com- 
plex problem. By recursing on the specifications, i.e. by 
changing the functional form, and increasing the num- 
ber of independent variables considered, increasingly 
better fits can be obtained, with regard to both the his- 
torical data and the predicted optimal control policy. 
Models


Many models of industrial, extractive and financial ac- 
tivities require the integration of key processes, but the 
most essential aspect is to formulate precise informa- 
tion where it is most needed [11,12]. 

Many optimization models solve supply chain problems
satisfactorily, but apparently no model except this one
integrates information and the allocation of goods and
operations dynamically.

The algorithm described here instead solves the combined
problem, as has been shown elsewhere [6], while a
theory-driven modelling approach to the problem, using
models consisting of two stages (an identification stage and
an optimization stage), can be shown to be dominated
by this data-driven approach.

At present, this seems to be the only viable approach 
to solving such complex problems. 


Cases 


Some non-typical SCM problems are indicated here:
dynamic supply chain management problems for the
perforation of oil wells and for finance. Industrial SCM
models are given elsewhere [3,8,9,10].


Dynamic Supply Chain Management Problem 
for Extraction Activities 


The perforation of oil wells consists of a number of 
operations to drive the bit head lower and lower while 
ensuring normal functions on the equipment and the 
operations. To this end complex measurements are exe- 
cuted by software systems indicated as mudlogging sys- 
tems. These measurements are designed to assist the 
operator in controlling the perforation rate of the bit 
head by monitoring a number of crucial operations pe- 
riodically. 

The settings of some of these operations affect the
rate of perforation, and therefore it is considered
extremely useful to have measurements of these
variables and predictions over the next few periods
of the possible advancement of the bit head, or of
the rate of perforation, so as to enable an optimal
control of the process to be formulated [5].

It should be mentioned that periodically the drilling
process must be halted so that the boring can be lined
with suitable materials. Also, one of the most important
elements of the process is to keep circulating around
the bit-head assembly a concentration of mud lubricants,
indicated as mud, which gives the name to the
measuring process. Recall that all these flows and
operations occur in time, so it is considered crucial to
specify dynamic models, unless it is desired to determine
the steady-state rates of the eventual process.

In fact oil drilling processes can be considered as 
complex supply chain systems with many phases and 
many operations. 

The determination of optimal control policies in
processes for the extraction of oil from underground
requires that they be formulated as formal procedures,
which are syntactically correct and semantically
adequate, so as to permit management to make the
necessary investments, not on hearsay or clever promotional
activities, but on the basis of rational knowledge and 
confidence in the application. 

Figure 1 shows an optimal SCM plan compared to 
the actual historical plan implemented. The predicted 
trajectory is superimposed on the actual time path of 
the perforation process, thus respecting all the interrup- 
tions and periods of halting. 

In Table 1, six instances to determine optimal
controls are indicated, and each entry reflects the drilling
experience of the given well for that week with regard to
the given period. From the active perforation intervals
an initial period was selected randomly and the optimal
control was defined for the next 192 periods (8 h). On
average, the predicted attainable increment in depth
exceeded the actual one by more than 30%.

Mathematical Programming Methods in Supply Chain Management, Fig. 1
Example of drilling for oil: real-time path (continuous) and
optimal control path (dashed) for the well

Mathematical Programming Methods in Supply Chain Management, Table 1
Optimal predicted versus actual increment for 6 oil wells
over 192 periods (8 h) in metres

Well and week | Optimal increment | Real increment | % difference


Dynamic Supply Chain Management Problem 
for Finance 


The prediction of future quotations on stock exchange
indices is important and constitutes the basic
instrument for handling financial supply chain management
systems. A financial supply chain system must consider
many types of financial intermediaries, many types of 
stocks and stock indices and many types of operations. 
Further, there are many possibilities for managing the 
monetary holdings, so that a full SCM system is envis- 
aged as defined above [7]. 

Consider the Dow-Jones Industrial Average (DJIA) 
stock exchange index over a period of 3 years starting 
in April 2001, as shown in Fig. 2, where the continu- 
ous line indicates the actual quotations, week by week 
over the period, while the 1-week-ahead predictions are 
given by the dashed line. As can be easily seen, the two 
curves almost coincide, which implies that the predic- 
tions 1 week ahead are very good. 

In Table 2, by contrast, a period of 5 weeks is considered
from April 16, 2004 to May 14, 2004. The quotations 
are given every Friday evening at closing time, while the 
predictions are made on Fridays just after closing time. 
Thus on April 9 predictions were made for the next 5 
weeks, as indicated in the second row of the table. After 
closing on April 16, 2004, predictions were made for 4 
weeks only and are depicted in the third row of the table 
and so on for the subsequent weeks. Finally, in the last 
row the closing quotations for the week are given. 
Mathematical Programming Methods in Supply Chain Management, Fig. 2
Weekly time series of the Dow-Jones Industrial Average

Mathematical Programming Methods in Supply Chain Management, Table 2
Results for prediction of the Dow-Jones Industrial Average,
147 periods

Period | 16/4 | 23/4 | 30/4 | 7/5 | 14/5
9/4 | 8652.86 | 8568.44 | 9304.81 | 13306.5 | 9958.15
16/4 | | 8646.28 | 8552.54 | 11514.3 | 11000.3
23/4 | | | 8820.73 | 8806.51 | 4700.47
30/4 | | | | 8518.12 | 8361.19
7/5 | | | | | 8343.35
Index | 8712.88 | 8855.03 | 8538.03 | 8505.54 | 8432.25

Together with an appropriate portfolio model, this table
allows one to determine suitable financial policies for
formulating optimal financial SCM plans [7].


Conclusions 


Optimal dynamic SCM policies may be obtained by 
a correct application of statistical inference and mathe- 
matical programming techniques. 

It has been indicated that these policies are expecta- 
tionally valid, which implies that they are syntactically 
correct and semantically adequate. 

Computational evidence has been presented here, and
further evidence is indicated in the references.
Matrix completion problems are concerned with
determining whether partially specified matrices can be
completed to fully specified matrices satisfying certain 
prescribed properties. In this article we survey some 
results and provide references about these problems 
for the following matrix properties: positive semidefi- 
nite matrices, Euclidean distance matrices, completely 
positive matrices, contraction matrices, and matrices of 
given rank. We treat mainly optimization and combi- 
natorial aspects. 
Introduction


A partial matrix is a matrix whose entries are specified 
only on a subset of its positions; a completion of a par- 
tial matrix is simply a specification of the unspecified 
entries. Matrix completion problems are concerned with 
determining whether or not a completion of a partial 
matrix exists which satisfies some prescribed property. 
We consider here the following matrix properties: pos- 
itive (semi) definite matrices, distance matrices, com- 
pletely positive matrices, contraction matrices, and ma- 
trices of given rank; definitions are recalled below. 

In what follows, x*, A* denote the conjugate transpose (in the complex case) or transpose (in the real case) of vector x and matrix A. A square real symmetric or complex Hermitian matrix A is positive semidefinite (psd) if x*Ax ≥ 0 for all vectors x, and positive definite (pd) if x*Ax > 0 for all vectors x ≠ 0; we then write A ⪰ 0 (respectively, A ≻ 0). Equivalently, A is psd (respectively, pd) if and only if all its eigenvalues are nonnegative (respectively, positive); moreover, A is psd if and only if A = BB* for some matrix B. A matrix A is said to be completely positive if A = BB^T for some (entrywise) nonnegative matrix B. An n × n real symmetric matrix D = (d_ij) is a Euclidean distance matrix (abbreviated as distance matrix) if there exist vectors v_1, ..., v_n ∈ R^k (for some k ≥ 1) such that, for all i, j = 1, ..., n, d_ij is equal to the square of the Euclidean distance between v_i and v_j. Finally, a (rectangular) matrix A is a contraction matrix if all its singular values (that is, the square roots of the eigenvalues of A*A) are less than or equal to 1.
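For rational data, positive semidefiniteness can be tested exactly. The following minimal Python sketch is our own illustration (the name `is_psd` is hypothetical, and only real symmetric matrices are handled, not complex Hermitian ones): it performs symmetric Gaussian elimination with exact fractions, and a negative pivot, or a zero pivot whose row is not zero, certifies that the matrix is not psd.

```python
from fractions import Fraction


def is_psd(matrix):
    """Exactly test whether a real symmetric matrix with rational entries is
    positive semidefinite, using symmetric Gaussian elimination (LDL^T)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot < 0:
            return False  # a negative pivot certifies x^T A x < 0 for some x
        if pivot == 0:
            # In a psd matrix, a zero diagonal entry forces its row to vanish.
            if any(a[k][j] != 0 for j in range(k + 1, n)):
                return False
            continue
        # Replace the trailing block by its Schur complement w.r.t. the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return True


# [[1, 1], [1, 1]] and [[1, -1], [-1, 1]] are psd; [[1, 2], [2, 1]] is not.
print(is_psd([[1, 1], [1, 1]]), is_psd([[1, -1], [-1, 1]]), is_psd([[1, 2], [2, 1]]))
# prints: True True False
```

Exact arithmetic avoids the tolerance issues a floating-point eigenvalue test would have on singular psd matrices.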

The set of positions corresponding to the specified entries of a partial matrix A is known as the pattern of A. If A is an n × m partial matrix, its pattern can be represented by a bipartite graph with node bipartition [1, n] ∪ [1, m] having an edge between nodes i ∈ [1, n] and j ∈ [1, m] if and only if entry a_ij is specified.

When asking about existence of a psd completion of a partial n × n matrix A, it is commonly assumed that all diagonal entries of A are specified (which is no loss of generality if we ask for a pd completion); moreover, it can obviously be assumed that A is partial Hermitian, which means that entry a_ji is specified and equal to a_ij* whenever a_ij is specified. Hence, in this case, complete information about the pattern of A is given by the graph G = ([1, n], E) with node set [1, n] and whose edge set E consists of the pairs ij (1 ≤ i < j ≤ n) for which a_ij is a specified entry of A. The same holds when dealing with distance matrix completions (in which case diagonal entries can obviously be assumed to be equal to zero).

An important common feature of the above matrix properties is that they possess an 'inheritance structure'. Indeed, if a partial matrix A has a psd (pd, completely positive, distance matrix) completion, then every principal specified submatrix of A is psd (pd, completely positive, a distance matrix); similarly, if a partial matrix A admits a completion of rank ≤ k, then every specified submatrix of A has rank ≤ k. Hence, having a completion of a certain kind imposes certain 'obvious' necessary conditions. This leads to asking which patterns of specified entries ensure that, whenever the obvious necessary conditions are met, a completion of the desired type exists; this introduces a combinatorial aspect into matrix completion problems, as opposed to their analytical nature.

In this article we survey some results and provide 
references for the various matrix completion problems 
mentioned above, concerning optimization and com- 
binatorial aspects of the problems. See [32,47] for more 
detailed surveys on some of the topics treated here. 


Positive Semidefinite Completion Problem 


We consider here the following positive (semi)definite completion problem (PSD): Given a partial Hermitian matrix A = (a_ij)_{ij∈S} whose entries are specified on a subset S of the positions, determine whether A has a psd (or pd) completion; if yes, find such a completion. (Here, S is generally assumed to contain all diagonal positions.)

This problem is among the most studied matrix completion problems. This is due, in particular, to its many applications, e.g., in probability and statistics, systems engineering, geophysics, etc., and also to the fact that positive semidefiniteness is a basic property which is closely related to other matrix properties like being a contraction or distance matrix. Equivalently, (PSD) is the problem of testing feasibility of the following system (in the variable X = (x_ij)):

$$ X \succeq 0, \qquad x_{ij} = a_{ij} \quad (ij \in S). \qquad (1) $$
Therefore, (PSD) is an instance of the following semidefinite programming problem (P): Given Hermitian matrices A_1, ..., A_m and scalars b_1, ..., b_m, decide whether the following system is feasible:

$$ X \succeq 0, \qquad A_j \bullet X = b_j \quad (j = 1, \dots, m) \qquad (2) $$

(where A • X := Σ_{i,j=1}^{n} a_ij* x_ij for two Hermitian (n × n)-matrices A and X).

The exact complexity status of problems (PSD) and (P) is not known; in particular, it is not known whether they belong to the complexity class NP. However, it is shown in [60] that (P) is neither NP-complete nor co-NP-complete if NP ≠ co-NP. On the other hand, the semidefinite programming problem and, thus, problem (PSD) can be solved with an arbitrary precision in polynomial time. This can be done using the ellipsoid method (since one can test in polynomial time whether a rational matrix A is positive semidefinite and, if not, find a vector x such that x*Ax < 0; cf. [24]), or interior point methods (cf. [3,27,56]). There has been a growing interest in semidefinite programming in recent years (1994), due in particular to its successful application to the approximation of hard combinatorial optimization problems (cf. the survey [20]). This has prompted active research on developing interior point algorithms for solving semidefinite programming problems; the literature is quite large, see [64,65] for extensive information. Numerical tests are reported in [34], where an interior point algorithm is proposed for the approximate psd completion problem; it finds exact completions for random instances of size up to 110.

Moreover, it is shown in [59] that problem (P) can be solved in polynomial time (for rational input data A_j, b_j) if either the number m of constraints or the order n of the matrices X, A_j in (2) is fixed (cf. also [9]). Furthermore, under the same assumption, one can test in polynomial time the existence of an integer solution and find one if it exists [39].

Call a partial Hermitian matrix A partial psd (respectively, partial pd) if every principal specified submatrix of A is psd (respectively, pd). As mentioned in the Introduction, being partial psd (pd) is an obvious necessary condition for A to have a psd (pd) completion. In general, this condition is not sufficient; for instance, the partial matrix:


[4 × 4 partial matrix; entries illegible in the source]

(? indicates an unspecified entry) is partial psd, yet no psd completion exists; note that the pattern of A is a circuit of length 4. Call a graph chordal if it does not contain any circuit of length ≥ 4 as an induced subgraph; chordal graphs occur in particular in connection with the Gaussian elimination process for sparse pd matrices (cf. [21,61]). (An induced subgraph of a graph G = (V, E) is a subgraph of the form H = (U, F), where U ⊆ V and F := {ij ∈ E : i, j ∈ U}.) It is shown in [23] that every partial psd matrix with pattern G has a psd completion if and only if G is a chordal graph; the same holds for pd completions. This extends an earlier result from [16], which dealt with 'block-banded' partial matrices; in the Toeplitz case (all entries equal along a band), one finds the classical Carathéodory-Fejér theorem from function theory.
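Whether a pattern is chordal can be decided efficiently. The following is a minimal sketch, assuming graphs are given as Python dicts mapping each node to its set of neighbours. It uses maximum cardinality search (a standard technique due to Tarjan and Yannakakis, not described in this article) and then verifies that the reverse search order is a perfect elimination ordering, which holds exactly for chordal graphs. The function name `is_chordal` is our own.

```python
def is_chordal(adj):
    """Chordality test: run maximum cardinality search (MCS), then check that
    the reverse visiting order is a perfect elimination ordering.
    adj maps each vertex to the set of its neighbours."""
    position = {}                    # vertex -> step at which MCS visits it
    weight = {v: 0 for v in adj}     # number of already visited neighbours
    for step in range(len(adj)):
        candidates = [v for v in sorted(adj) if v not in position]
        v = max(candidates, key=lambda u: weight[u])  # ties: smallest label
        position[v] = step
        for u in adj[v]:
            if u not in position:
                weight[u] += 1
    # G is chordal iff, for every v, the neighbours visited before v form a clique.
    for v in adj:
        earlier = [u for u in adj[v] if position[u] < position[v]]
        for i, u in enumerate(earlier):
            for w in earlier[i + 1:]:
                if w not in adj[u]:
                    return False
    return True


c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}              # circuit of length 4
c4_chord = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}  # add the chord 13
print(is_chordal(c4), is_chordal(c4_chord))  # prints: False True
```

The quadratic candidate scan keeps the sketch short; a bucket-based MCS runs in linear time.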

The proof from [23] is constructive and can be 
turned into an algorithm with a polynomial running 
time [48]. Moreover, it is shown in [48] that (PSD) can 
be solved in polynomial time when restricted to par- 
tial rational matrices whose pattern is a graph having 
a fixed minimum fill-in; the minimum fill-in of a graph 
being the minimum number of edges needed to be 
added in order to obtain a chordal graph. This result 
is based on the above mentioned results from [39,59] 
concerning the polynomial time solvability of (integer) 
semidefinite programming with a fixed number m of 
linear constraints in (2). 

The result from [23] on psd completions of partial 
matrices with a chordal pattern has been generalized 
in various directions; for instance, considering gen- 
eral inertia possibilities for the completions ([17,35]), 
or considering completions with entries in a function 
ring [37]. 

If A is a partial matrix having a pd completion, then 
A has a unique pd completion with maximum deter- 
minant (this unique completion being characterized by 
the fact that its inverse has zero entries at all unspeci- 
fied positions of A) [23]. In the case when the pattern 
of A is chordal, explicit formulas for this maximum de- 
terminant are given in [7]. The paper [52] considers the 
more general problem of finding a maximum determi- 
nant psd completion satisfying some additional linear 
constraints. 

Further necessary conditions are known for the existence of psd completions. Namely, it is shown in [8] that if a partial matrix A = (a_ij) with pattern G and diagonal entries equal to 1 is completable to a psd matrix, then the associated vector x := (arccos(a_ij)/π)_{ij∈E} satisfies the inequalities:

$$ \sum_{e \in F} x_e - \sum_{e \in C \setminus F} x_e \le |F| - 1 \quad \text{for all } F \subseteq C,\ C \text{ circuit in } G,\ |F| \text{ odd}. \qquad (3) $$

Moreover, any partial matrix with pattern G satisfying (3) is completable to a psd matrix if and only if G does not contain a homeomorph of K_4 as an induced subgraph (such graphs G are also known as series-parallel graphs) [44]. (Here, K_4 denotes the complete graph on 4 nodes, and a homeomorph of K_4 is obtained by replacing the edges of K_4 by paths of arbitrary length.) The patterns G for which every partial psd matrix satisfying (3) has a psd completion are characterized in [6]; they are the graphs G which can be made chordal by adding a set of edges in such a way that no new clique of size 4 is created. Although (3) can be checked in polynomial time for rational x [5], the complexity of problem (PSD) for series-parallel graphs (or for the subclass of circuits) is not known. A strengthening of condition (3) (involving cuts in graphs) is formulated in [44].
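To see condition (3) at work, the following brute-force sketch checks all odd subsets F of a single circuit C; it is exponential in |C|, and the polynomial-time method of [5] is not reproduced. The example circuit is our own hypothetical choice: a unit-diagonal partial matrix on the 4-circuit 1-2-3-4-1 with a12 = a23 = a34 = 1 and a14 = -1. It is partial psd but has no psd completion, because in a psd completion with unit diagonal an entry equal to 1 forces the corresponding Gram vectors to coincide, so a14 would have to be 1. Accordingly, (3) fails for F = {14}.

```python
import math
from itertools import combinations


def violated_odd_subset(cycle_entries, tol=1e-12):
    """Brute-force check of inequalities (3) on one circuit C.
    cycle_entries lists a_e (with |a_e| <= 1) for the edges e of C, in order,
    for a partial matrix with unit diagonal. Returns an odd subset F of edge
    indices violating (3), or None if every inequality holds."""
    x = [math.acos(a) / math.pi for a in cycle_entries]
    edges = range(len(x))
    for size in range(1, len(x) + 1, 2):  # |F| odd
        for F in combinations(edges, size):
            lhs = sum(x[e] for e in F) - sum(x[e] for e in edges if e not in F)
            if lhs > size - 1 + tol:
                return F
    return None


# Hypothetical 4-circuit with a12 = a23 = a34 = 1 and a14 = -1 (edge index 3).
print(violated_odd_subset([1, 1, 1, -1]))  # prints: (3,)
print(violated_odd_subset([0, 0, 0, 0]))   # prints: None
```

The all-zero example is completable (by the identity matrix), so finding no violation there is consistent with (3) being a necessary condition.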

Another approach to problem (PSD) is considered in [1,28]; it is based on the study of the cone

$$ \mathcal{P}_G := \{ X = (x_{ij})_{i,j \in V} : X \succeq 0,\ x_{ij} = 0 \ \ \forall\, i \neq j,\ ij \notin E \} $$

associated to graph G = (V, E). Indeed, it is shown there that a partial matrix A with pattern G has a psd completion if and only if

$$ \sum_{i \in V} a_{ii} x_{ii} + \sum_{i \neq j,\ ij \in E} a_{ij} x_{ij} \ge 0 \quad \forall X \in \mathcal{P}_G. \qquad (4) $$

Obviously, it suffices to check (4) for all X extremal in P_G (i.e., X lying on an extremal ray of the cone P_G).

Define the order of G as the maximum rank of an extremal matrix in P_G. The graphs of order 1 are precisely the chordal graphs [1,58], and the graphs of order 2 have been characterized in [46]. One might reasonably expect that problem (PSD) is easier for graphs having a small order. This is indeed the case for graphs of order 1; the complexity of (PSD) remains, however, open for the graphs of order 2 (partial results are given in [48]).


Euclidean Distance Matrix Completion Problem 


We consider here the Euclidean distance matrix completion problem (abbreviated as distance matrix completion problem) (EDM): Given a graph G = (V = [1, n], E) and a real partial symmetric matrix A = (a_ij) with pattern G and with zero diagonal entries, determine whether A can be completed to a distance matrix; that is, whether there exist vectors v_1, ..., v_n ∈ R^k for some k ≥ 1 such that

$$ a_{ij} = \|v_i - v_j\|^2 \quad \text{for all } ij \in E \qquad (5) $$

(here, ‖v‖ = (Σ_{h=1}^{k} v_h^2)^{1/2} denotes the Euclidean norm of v ∈ R^k). The vectors v_1, ..., v_n are then said to form a realization of A. A variant of problem (EDM) is the graph realization problem (EDMk), obtained by letting the dimension k of the space where one searches for a realization of A be part of the input data.

Distance matrices are a central notion in the area of distance geometry; their study was initiated by A. Cayley in the 19th century and was continued in particular by K. Menger and I.J. Schoenberg in the 1930s. They are, in fact, closely related to psd matrices. The following basic connection was established in [63]. Given a symmetric (n × n)-matrix D = (d_ij)_{i,j=1}^{n} with zero diagonal entries, consider the symmetric ((n − 1) × (n − 1))-matrix X = (x_ij)_{i,j=1}^{n−1} defined by

$$ x_{ij} = \frac{1}{2}\left( d_{in} + d_{jn} - d_{ij} \right) \quad \text{for all } i, j = 1, \dots, n-1. \qquad (6) $$

Then, D is a distance matrix if and only if X is psd; moreover, D has a realization in the k-space if and only if X has rank ≤ k. Other characterizations are known for distance matrices. As the literature on this topic is quite large, see the monographs [11,13,14], where further references can be found.
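Relation (6) translates directly into code. The sketch below (function names hypothetical) builds X from D, taking the last point as origin, and then computes rank(X) by exact elimination, returning None when X is not psd. By the result of [63] quoted above, a returned value r means that D is a distance matrix with a realization in R^r.

```python
from fractions import Fraction


def schoenberg_matrix(d):
    """Relation (6): x_ij = (d_in + d_jn - d_ij) / 2 for i, j = 1..n-1."""
    n = len(d)
    return [[Fraction(d[i][n - 1] + d[j][n - 1] - d[i][j], 2) for j in range(n - 1)]
            for i in range(n - 1)]


def psd_rank(x):
    """Return rank(X) if X is psd and None otherwise (exact LDL^T elimination)."""
    a = [[Fraction(v) for v in row] for row in x]
    n, rank = len(a), 0
    for k in range(n):
        pivot = a[k][k]
        if pivot < 0:
            return None
        if pivot == 0:
            if any(a[k][j] != 0 for j in range(k + 1, n)):
                return None
            continue
        rank += 1  # each positive pivot contributes one to the rank
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return rank


# Squared distances between the corners (0,0), (1,0), (1,1), (0,1) of a unit square.
square = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(psd_rank(schoenberg_matrix(square)))  # prints: 2 (realizable in the plane)
# Distances 1, 1, 3 violate the triangle inequality: not a distance matrix.
print(psd_rank(schoenberg_matrix([[0, 1, 9], [1, 0, 1], [9, 1, 0]])))  # prints: None
```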

Problems (EDM) and (EDMk) have many important applications; for instance, to multidimensional scaling problems in statistics (cf. [49]) and to position-location problems, i.e., problem (EDMk), mostly in dimension k ≤ 3. A much studied instance of the latter problem is the molecular conformation problem in chemistry; indeed, nuclear magnetic resonance spectroscopy makes it possible to determine some pairwise interatomic distances, the question being then to reconstruct the global shape of the molecule from this partial information (cf. [13,41]).

In view of relation (6), problem (EDM) can be for- 
mulated as an instance of the semidefinite program- 
ming problem (P) and, therefore, it can be solved with 
an arbitrary precision in polynomial time. Exploiting 
this fact, some specific algorithms based on interior 
point methods are presented in [2] together with nu- 
merical tests. Moreover, problem (EDM) can be solved 
in polynomial time when restricted to partial rational 
matrices whose pattern is a chordal graph or, more gen- 
erally, a graph with fixed minimum fill-in [48]; as in the 
psd case, this follows from the fact (mentioned below) 
that partial matrices that are completable to a distance 
matrix admit a good characterization when their pat- 
tern is a chordal graph. 

While the exact complexity of problem (EDM) is not known, it has been shown in [62] that problem (EDMk) is NP-complete if k = 1 and NP-hard if k ≥ 2 (even when restricted to partial matrices with entries in {1, 2}). Finding ε-optimal solutions to the graph realization problem is also NP-hard for small ε ([53]). The graph realization problem (EDMk) has been much studied, in particular in dimension k ≤ 3, which is the case most relevant to applications. The problem can be formulated as a nonlinear global optimization problem: minimize f(v) such that v = (v_1, ..., v_n) ∈ R^{kn}, where the cost function f(·) can, for instance, be chosen as

$$ f(v) = \sum_{ij \in E} \left( \|v_i - v_j\|^2 - a_{ij} \right)^2. $$

Hence, f(·) is zero precisely when the v_i's provide a realization of the partial matrix A. This optimization problem is hard to solve (as it may have many local optima). Several algorithms have been proposed in the literature; see, in particular, [13,19,26,29,31,41,54,57]. They are based on general techniques for global optimization like tabu and pattern search [57], the continuation approach (which consists of transforming the original function f(·) into a smoother function having fewer local optimizers [53,54]), or divide-and-conquer strategies aiming to break the problem into a sequence of smaller or easier subproblems [13,29,31]. In [29,31], the basic step consists of finding principal submatrices having a unique realization, treating each of them separately and then trying to combine the solutions. Thus arises the problem of identifying principal submatrices having a unique realization, which turns out to be NP-hard [62]. However, several necessary conditions for uniqueness of realization are known, related to connectivity and generic rigidity properties of the graph pattern [30,67]. Generic rigidity of graphs can be characterized and recognized in polynomial time only in dimension k ≤ 2 ([42,51]) (cf. the survey [43] for more references).
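As an illustration of the optimization formulation (not of any of the cited algorithms), the sketch below evaluates the cost f(v) and runs plain gradient descent on it, using the gradient 4(‖v_i − v_j‖² − a_ij)(v_i − v_j) of each edge term with respect to v_i. Storing the partial matrix as a dict from edges (i, j) to a_ij, the step size, and the iteration count are all our own assumptions; being a local method, it may stop at a local minimizer with f > 0.

```python
def cost(points, edges):
    """f(v) = sum over ij in E of (||v_i - v_j||^2 - a_ij)^2."""
    total = 0.0
    for (i, j), a in edges.items():
        sq = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
        total += (sq - a) ** 2
    return total


def gradient_descent(points, edges, step=0.005, iters=2000):
    """Plain gradient descent on f; in general it only finds a local minimizer."""
    pts = [list(map(float, p)) for p in points]
    for _ in range(iters):
        grad = [[0.0] * len(p) for p in pts]
        for (i, j), a in edges.items():
            diff = [p - q for p, q in zip(pts[i], pts[j])]
            r = sum(t * t for t in diff) - a  # residual of edge ij
            for c, t in enumerate(diff):
                grad[i][c] += 4 * r * t       # derivative of r^2 w.r.t. v_i
                grad[j][c] -= 4 * r * t       # derivative of r^2 w.r.t. v_j
        pts = [[p - step * g for p, g in zip(pt, gr)] for pt, gr in zip(pts, grad)]
    return pts


edges = {(0, 1): 1, (1, 2): 1, (0, 2): 2}      # pattern: a triangle
print(cost([[0, 0], [1, 0], [1, 1]], edges))    # prints: 0.0 (an exact realization)
start = [[0.0, 0.0], [1.0, 0.0], [0.8, 1.3]]
print(cost(gradient_descent(start, edges), edges) < cost(start, edges))  # prints: True
```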

Call a partial matrix A a partial distance matrix if ev- 
ery specified principal submatrix of A is a distance ma- 
trix. Being a partial distance matrix is obviously a nec- 
essary condition for A to be completable to a distance 
matrix. It is shown in [4] that every partial distance ma- 
trix with pattern G is completable to a distance matrix 
if and only if G is a chordal graph; moreover, if all spec- 
ified principal submatrices of the partial matrix A have 
a realization in the k-space, then A admits a completion 
having a realization in the k-space. 

As noted in [33], if a partial matrix A with pattern G is completable to a distance matrix, then the associated vector x := (√a_ij)_{ij∈E} must satisfy the inequalities:

$$ x_e - \sum_{f \in C \setminus \{e\}} x_f \le 0 \quad \text{for all } e \in C,\ C \text{ circuit in } G. \qquad (7) $$

The graphs G for which every partial matrix (respectively, partial distance matrix) A with pattern G for which (7) holds is completable to a distance matrix are the graphs containing no homeomorph of K_4 as an induced subgraph [45] (respectively, the graphs that can be made chordal by adding edges in such a way that no new clique of size 4 is created [33]). Note the analogy with the corresponding results for the psd completion problem; some connections between the two problems (EDM) and (PSD) are exposed in [38,45].


Completion to Completely Positive 
and Contraction Matrices 


Call a matrix doubly nonnegative if it is psd and entrywise nonnegative. Every completely positive (cp, for short) matrix is obviously doubly nonnegative. The converse implication holds for matrices of order n ≤ 4 (cf. [22]) and for certain patterns of the nonzero entries in A (cf. [40]). The cp property is obviously inherited by principal submatrices; call a partial matrix A a partial cp matrix if every fully specified principal submatrix of A is cp. It is shown in [15] that every partial cp matrix with graph pattern G is completable to a cp matrix if and only if G is a so-called block-clique graph, that is, a chordal graph in which any two distinct maximal cliques overlap in at most one node or, equivalently, a chordal graph that does not contain an induced subgraph of the form:


Recall that an (n × m)-matrix A is a contraction matrix if all eigenvalues of A*A are less than or equal to 1 or, equivalently, if the matrix

$$ \tilde{A} := \begin{pmatrix} I_n & A \\ A^* & I_m \end{pmatrix} \qquad (8) $$

is positive semidefinite. Call a partial matrix A a partial contraction if all specified submatrices of A are contractions. As every submatrix of a contraction is again a contraction, an obvious necessary condition for a partial matrix A to be completable to a contraction matrix is that A be a partial contraction. Thus arises the question of characterizing the graph patterns G for which every partial contraction with pattern G can be completed to a contraction matrix.
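The equivalence with the block matrix of (8) gives an exact contraction test for rational real matrices. This sketch (hypothetical function names; real case only, so A* = A^T) assembles that block matrix and applies an exact LDL^T-based psd test, which is included so that the block runs on its own.

```python
from fractions import Fraction


def is_psd(matrix):
    """Exact psd test for a real symmetric rational matrix (LDL^T elimination)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot < 0:
            return False
        if pivot == 0:
            if any(a[k][j] != 0 for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return True


def is_contraction(a):
    """Real n x m matrix A is a contraction iff [[I_n, A], [A^T, I_m]] is psd."""
    n, m = len(a), len(a[0])
    # Start from the identity of order n + m, then fill in the off-diagonal blocks.
    block = [[Fraction(int(i == j)) for j in range(n + m)] for i in range(n + m)]
    for i in range(n):
        for j in range(m):
            block[i][n + j] = block[n + j][i] = Fraction(a[i][j])
    return is_psd(block)


half = Fraction(1, 2)
print(is_contraction([[half, half], [half, -half]]))  # prints: True (singular values 1/sqrt(2))
print(is_contraction([[1, 1]]))                       # prints: False (singular value sqrt(2))
```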

As we now deal with rectangular n × m partial matrices A, their pattern is the bipartite graph G with node set U ∪ V, where U, V index the rows and columns of A and the edges of G correspond to the specified entries of A. We may clearly assume that we are dealing with partial matrices whose pattern is a connected graph (as the partial matrices associated with the connected components can be handled separately). Below is an example of a partial matrix A which is a partial contraction, but which is not completable to a contraction matrix:


In fact, the graph pattern displayed in this example is, in a sense, present in every partial contraction which is not completable to a contraction. Namely, it is shown in [36] that the following assertions i)-iii) are equivalent for a connected bipartite graph G with node bipartition U ∪ V:

i) Every partial contraction with pattern G can be completed to a contraction;

ii) G does not contain an induced matching of size 2 (i.e., if e := uv, e' := u'v' are edges in G with u ≠ u' ∈ U, v ≠ v' ∈ V, then at least one of the pairs uv', u'v is an edge in G; that is, G is nonseparable in the terminology of [21]);

iii) The graph Ĝ obtained from G by adding all edges uu' (u ≠ u' ∈ U) and vv' (v ≠ v' ∈ V) is chordal.

(Note that the implication iii) ⇒ i) is a consequence of the result on psd completions from [23] mentioned in the Section on the positive semidefinite completion problem above, as Ĝ is the graph pattern of the matrix Ã defined in (8).)


Rank Completions 


In this section, we consider the problem of determining the possible ranks for the completions of a given partial matrix. For a partial matrix A, let mr(A) and MR(A) denote, respectively, the minimum and maximum possible ranks for a completion of A. If B, C are completions of A of respective ranks mr(A), MR(A), then changing B into C by changing one entry of B into the corresponding entry of C at a time makes it possible to construct completions realizing all ranks in the range [mr(A), MR(A)], since changing a single entry changes the rank by at most one. Hence, the question is to determine the two extreme values mr(A) and MR(A). As we see below, the value MR(A) can, in fact, be expressed in terms of ranks of fully specified submatrices of A, and it can be computed in polynomial time; this constitutes a generalization of the celebrated Frobenius-König theorem (corresponding to the case when the specified entries are equal to 0). On the other hand, determining mr(A) seems to be a much more difficult task.

We first deal with the problem of finding maximum rank completions. Let A be an n × m partial matrix with graph pattern G, i.e., G is the bipartite graph (U ∪ V, E) where U, V index respectively the rows and columns of A and the edges of G correspond to the specified entries of A, and let Ḡ denote the complementary bipartite graph whose edges correspond to the unspecified entries of A. Note that computing MR(A) amounts to computing the generic rank of A when viewing the unspecified entries of A as independent variables over the field containing the specified entries. For a subset X ⊆ U ∪ V, let A_X denote the submatrix of A with respective row and column index sets {i ∈ [1, n] : u_i ∉ X} and {j ∈ [1, m] : v_j ∉ X}. Call X a cover of Ḡ if every edge of Ḡ has at least one end node in X; that is, if A_X is a fully specified submatrix of A. Clearly, we have MR(A) ≤ rank(A_X) + |X|. In fact, the following equality holds:


$$ MR(A) = \min_{X \text{ cover of } \bar{G}} \left( \operatorname{rank}(A_X) + |X| \right), \qquad (9) $$

as shown in [12]. A determinantal version of the result was given in [25]. In the special case when all specified entries of A are equal to 0, MR(A) coincides with the maximum cardinality of a matching in Ḡ and, therefore, the minimax relation (9) reduces to the Frobenius-König theorem (cf. [50] for details on the latter result). Moreover, one can determine MR(A) and construct a maximum rank completion of A in polynomial time. This was shown in [55] by a reduction to matroid intersection and, more recently, in [18], where a simple greedy procedure is presented that solves the problem by perturbing an arbitrary completion.
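For small instances, the minimax formula (9) can be evaluated by brute force over all 2^(n+m) subsets X of the rows and columns. This exponential sketch is only meant to make (9) concrete; the polynomial algorithms of [55] and [18] are not reproduced, and marking unspecified entries with `None` is our own convention. In the second example all specified entries are 0, so MR(A) equals the size of a maximum matching in the complementary graph, in line with the Frobenius-König theorem.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a rational matrix given as a list of rows (Gauss-Jordan elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def max_completion_rank(a):
    """MR(A) from the minimax formula (9), by brute force over all subsets X of
    rows and columns. Unspecified entries of the n x m partial matrix are None."""
    n, m = len(a), len(a[0])
    nodes = [("u", i) for i in range(n)] + [("v", j) for j in range(m)]
    unspecified = [(i, j) for i in range(n) for j in range(m) if a[i][j] is None]
    best = n + m  # X = all rows and columns is always a cover
    for size in range(n + m + 1):
        for X in map(set, combinations(nodes, size)):
            # X covers the complementary graph iff A_X has no unspecified entry.
            if any(("u", i) not in X and ("v", j) not in X for i, j in unspecified):
                continue
            sub = [[a[i][j] for j in range(m) if ("v", j) not in X]
                   for i in range(n) if ("u", i) not in X]
            best = min(best, rank(sub) + size)
    return best


print(max_completion_rank([[1, 2], [3, None]]))           # prints: 2
print(max_completion_rank([[0, None], [None, None]]))     # prints: 2
print(max_completion_rank([[0, 0, None], [0, 0, None]]))  # prints: 1
```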

We now consider minimum rank completions. To start with, note that mr(A) may depend, in general, on the actual values of the specified entries of A (and not only on the ranks of the specified submatrices of A). Indeed, consider the partial matrix

$$ A = \begin{pmatrix} a & b & ? \\ ? & c & d \\ f & ? & e \end{pmatrix}, $$

where a, b, c, d, e, f ≠ 0. Then, mr(A) = 1 if ace = bdf and mr(A) = 2 otherwise, while all specified submatrices have rank 1 in both cases. Thus arises the question of identifying the bipartite graphs G for which mr(A) depends only on the ranks of the specified submatrices of A for every partial matrix A with pattern G; such graphs are called rank determined. The graph pattern of the above instance A is the circuit C_6. Hence, C_6 is not rank determined. Call a bipartite graph G bipartite chordal if it does not contain a circuit of length ≥ 6 as an induced subgraph. Then, if a bipartite graph is rank determined, it is necessarily bipartite chordal [12]. It is conjectured there that, conversely, every bipartite chordal graph is rank determined. The conjecture was shown to be true in [66] for the nonseparable bipartite graphs (i.e., the bipartite graphs containing no induced matching of size 2; they are obviously bipartite chordal). Note that a partial matrix A has a nonseparable pattern if and only if it has (up to row/column permutation) the following 'triangular' form:


Then, mr(A) can be explicitly formulated in terms of the ranks of the specified submatrices of A; in the simplest case, the formula for mr(A) reads:

$$ mr\begin{pmatrix} B & ? \\ C & D \end{pmatrix} = \operatorname{rank}\begin{pmatrix} B \\ C \end{pmatrix} + \operatorname{rank}\begin{pmatrix} C & D \end{pmatrix} - \operatorname{rank}(C). $$

It is shown in [12] that the above conjecture holds when the pattern G is a path, or when G is obtained by 'gluing' a collection of circuits of length 4 along a common edge.
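The C_6 example can be checked constructively. Assuming the entries are arranged as [[a, b, ?], [?, c, d], [f, ?, e]] (one arrangement of the circuit C_6 consistent with the condition ace = bdf; the layout is our assumption), a rank-one completion x y^T can be solved for entry by entry along the circuit, and the last specified entry e is matched exactly when ace = bdf. The function name is hypothetical.

```python
from fractions import Fraction


def rank_one_completion(a, b, c, d, e, f):
    """Partial matrix [[a, b, ?], [?, c, d], [f, ?, e]] with nonzero specified
    entries (pattern: the circuit C_6). Returns a rank-one completion x y^T
    if ace == bdf, and None otherwise."""
    a, b, c, d, e, f = map(Fraction, (a, b, c, d, e, f))
    if a * c * e != b * d * f:
        return None
    # Normalize x_1 = 1, then solve along the circuit using a, b, c, d and f.
    x1, y1, y2 = Fraction(1), a, b
    x2 = c / y2
    y3 = d / x2
    x3 = f / y1  # the remaining entry e = x3 * y3 holds because ace == bdf
    return [[xi * yj for yj in (y1, y2, y3)] for xi in (x1, x2, x3)]


M = rank_one_completion(1, 2, 3, 4, Fraction(40, 3), 5)
print(M)
print(rank_one_completion(1, 2, 3, 4, 1, 5))  # prints: None (ace != bdf)
```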


See also 


> Interior Point Methods for Semidefinite 
Programming 

> Semidefinite Programming and Determinant 
Maximization 
Matroids were defined in 1935 as a generalization of graphs and matrices. Starting from the 1950s they have attracted increasing interest, and the theoretical results obtained have been used for solving several difficult problems in various fields such as civil, electrical, and mechanical engineering, computer science, and mathematics. A comprehensive treatment of matroids cannot be contained in a few pages or even in only one book. Thus, the scope of this article is to introduce the reader to this theory, providing the definitions of some different types of matroids and their main properties.
Historical Overview


In 1935, H. Whitney in [38] studied linear dependence and its important applications in mathematics. A number of equivalent axiomatic systems for matroids are contained in his pioneering paper, which is considered the first scientific work on matroid theory.

In the 1950s and 1960s, starting from Whitney's ideas, W. Tutte in [25,26,27,28,29,30,31,32,33] built a considerable body of theory about the structural properties of matroids. The theory became popular in the 1960s, when J. Edmonds in [5,6,7,8,9,10,11] introduced matroid theory into combinatorial optimization. From 1965 on, a growing number of researchers became interested in matroids. In 1976, D.J.A. Welsh ([34]) published the first book on matroid theory. In the 1970s, 1980s, and 1990s, selected topics were covered by a large number of scientific publications, among them [1,2,3,12,13,15,17,18,20,21,23,24,35,36,37]. The reference [16] provides an excellent historical survey, while [21] is a good book for students.


Definition of a Matroid 


Matroids are combinatorial structures often treated together with the greedy technique, which yields optimal solutions when applied to simple problems defined on matroids.

In order to provide the definition of a general ma- 
troid, some notation and further definitions are needed. 


Definition 1 An ordered pair S = (E, I), where E = {e_1, ..., e_n} and I ⊆ 2^E, is an independent system (SI) if and only if

$$ \forall A, B \subseteq E: \quad B \subseteq A \in I \;\Rightarrow\; B \in I. \qquad (1) $$

E is also called the ground set.
Note that, provided I is nonempty, the empty set is necessarily a member of I.


Definition 2 The members of I are called independent 
sets. 


Definition 3 The members of D = 2^E \ I are called dependent sets.


Definition 4 The members of the set

$$ \mathcal{B} = \{ A \subseteq E : A \in I,\ \forall f \in E \setminus A: A \cup \{f\} \in D \} $$

are called maximal independent sets or bases.

In other words, a basis is an independent set which is maximal with respect to set inclusion.


Definition 5 The members of the set

$$ \mathcal{C} = \{ C \subseteq E : C \in D,\ \forall f \in C: C \setminus \{f\} \in I \} $$

are called minimal dependent sets or circuits. A 1-element circuit is a loop.


Definition 6 A matroid M is an independent system (E, I) such that if A, B ∈ I and |A| < |B|, then there is some element x ∈ B \ A such that A ∪ {x} ∈ I.

We say that M satisfies the exchange property.
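For small ground sets, Definitions 1 and 6 can be checked by brute force. In this sketch (function names hypothetical), a family of independent sets is a Python set of frozensets. The examples are the uniform matroid U_{2,3} (all subsets of {1, 2, 3} with at most two elements) and an independent system on {1, 2, 3} that violates the exchange property for A = {1}, B = {2, 3}.

```python
from itertools import combinations


def is_independent_system(family):
    """Property (1): the family is nonempty and closed under taking subsets."""
    return bool(family) and all(
        frozenset(sub) in family
        for A in family for k in range(len(A)) for sub in combinations(A, k))


def has_exchange_property(family):
    """Definition 6: if |A| < |B|, some x in B - A keeps A + {x} independent."""
    return all(any(A | {x} in family for x in B - A)
               for A in family for B in family if len(A) < len(B))


def is_matroid(family):
    return is_independent_system(family) and has_exchange_property(family)


u23 = {frozenset(s) for k in range(3) for s in combinations([1, 2, 3], k)}
bad = {frozenset(), frozenset({1}), frozenset({2}), frozenset({3}), frozenset({2, 3})}
print(is_matroid(u23), is_matroid(bad))  # prints: True False
```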


Many combinatorial optimization problems can be viewed as the problem of finding, within one of the sets defined above, a member with optimal objective function value.

The word matroid is due to Whitney. He studied 
matric matroids, in which the elements of E are the rows 
of a given matrix and a set of rows is independent if they 
are linearly independent in the usual sense. 

The following theorems express two equivalent ax- 
iomatic definitions of matroids in terms of bases and 
circuits. 


Theorem 7 A nonempty set B of subsets of E is the set of bases of a matroid M = (E, I) if and only if, for all B_1, B_2 ∈ B with B_1 ≠ B_2 and for all x ∈ B_1 \ B_2, there exists an element y ∈ B_2 \ B_1 such that

$$ (B_1 \cup \{y\}) \setminus \{x\} \in \mathcal{B}. $$


Theorem 8 A set C of subsets of E is the set of circuits of a matroid M = (E, I) if and only if the following two properties hold:

1) for all X ≠ Y ∈ C, X ⊄ Y;

2) for all X ≠ Y ∈ C and z ∈ X ∩ Y, there exists Z ∈ C such that Z ⊆ (X ∪ Y) \ {z}.


Other alternative axiomatic characterizations of a matroid need some further definitions.
Let M = (E, I) be a matroid.

Definition 9 For all A ⊆ E, let ρ: 2^E → N be the function such that

$$ \rho(A) = \max \{ |X| : X \subseteq A,\ X \in I \}. $$

ρ is called the rank function of M.
Note that the rank of M is equal to the rank ρ(E) of E, which is given by the cardinality of a maximal independent subset of E. The rank is always well defined, due to the following proposition.


Proposition 10 If A is a subset of E and X and Y are maximal independent subsets of A, then |X| = |Y|.

Proposition 10 states that, in a matroid M = (E, I), all maximal independent subsets of a given A ⊆ E have the same cardinality. Choosing A = E, the following corollary holds.


Corollary 11 The bases of any matroid have the same 
cardinality. 


Definition 12 A subset A of E is called a closed set of M if

$$ \rho(A \cup \{x\}) = \rho(A) + 1, \quad \forall x \in E \setminus A, $$

i.e., if it is not possible to add to A any element without increasing its rank.


Definition 13 The closure operator for M is the function σ: 2^E → 2^E such that, for all A ⊆ E, σ(A) is the closed set of minimum cardinality that contains A, i.e.,

$$ \sigma(A) = A \cup \{ x \in E \setminus A : \rho(A \cup \{x\}) = \rho(A) \}. $$


Definition 14 A subset A of E covers M if and only if it contains a basis of M, i.e.,

$$ \rho(A) = \rho(E). $$


With these further definitions at hand, the following theorems express three other equivalent axiomatic characterizations of a matroid, in terms of its rank function and of its closure operator.


Theorem 15 A function ρ: 2^E → N is the rank function of a matroid M = (E, I) if and only if, for all X ⊆ E and for all y, z ∈ E, the following three properties hold:

1) ρ(∅) = 0;

2) ρ(X) ≤ ρ(X ∪ {y}) ≤ ρ(X) + 1;

3) ρ(X ∪ {y}) = ρ(X ∪ {z}) = ρ(X) ⇒ ρ(X ∪ {y, z}) = ρ(X).


Theorem 16 A function ρ: 2^E → N is the rank function of a matroid M = (E, I) if and only if, for all X, Y ⊆ E, the following three properties hold:

1) 0 ≤ ρ(X) ≤ |X|;

2) X ⊆ Y ⇒ ρ(X) ≤ ρ(Y);

3) ρ(X ∪ Y) + ρ(X ∩ Y) ≤ ρ(X) + ρ(Y).

Note that the second property of Theorem 16 states that ρ is a monotone function, while the third property expresses its submodularity.


Theorem 17 A function σ: 2^E → 2^E is the closure operator of a matroid M = (E, I) if and only if, for all X, Y ⊆ E and for all x, y ∈ E, the following four properties hold:

1) X ⊆ σ(X);

2) Y ⊆ X ⇒ σ(Y) ⊆ σ(X);

3) σ(X) = σ(σ(X));

4) y ∉ σ(X), y ∈ σ(X ∪ {x}) ⇒ x ∈ σ(X ∪ {y}).


Definition 18 A matroid M = (E, I) is weighted if there is an associated weight function w that assigns a strictly positive weight w(x) to each element x ∈ E.

The weight function w extends to subsets A of E by summation:

$$ w(A) = \sum_{x \in A} w(x). $$
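The greedy technique mentioned above finds a maximum-weight basis of a weighted matroid when only an independence oracle is available; since the weights are strictly positive, a maximum-weight independent set is a basis. The sketch below assumes the graphic matroid (the forests of a graph, a standard example not defined in this article) as the concrete oracle, in which case greedy reduces to Kruskal's maximum spanning tree algorithm. All names are illustrative.

```python
def greedy_max_weight_basis(elements, weight, is_independent):
    """Scan elements by nonincreasing weight and keep an element whenever
    the chosen set stays independent (independence-oracle model)."""
    chosen = []
    for x in sorted(elements, key=weight, reverse=True):
        if is_independent(chosen + [x]):
            chosen.append(x)
    return chosen


def is_forest(edges):
    """Independence oracle of the graphic matroid: the edge set has no cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v already connected: this edge closes a cycle
        parent[ru] = rv
    return True


w = {("a", "b"): 4, ("b", "c"): 3, ("c", "a"): 5, ("c", "d"): 1, ("b", "d"): 2}
tree = greedy_max_weight_basis(w, w.get, is_forest)
print(tree, sum(w[e] for e in tree))  # prints: [('c', 'a'), ('a', 'b'), ('b', 'd')] 11
```

Re-checking the whole chosen set on each call keeps the oracle interface generic; a dedicated Kruskal implementation would maintain the union-find structure incrementally.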


Minor of Matroids: Restriction and Contraction 


A minor of a matroid M = (E, I) is a 'submatroid' obtained by deleting or contracting one or more elements of the ground set E.

A loop is an element y of a matroid such that {y} is not independent. Equivalently, y does not lie in any independent set and, in particular, in no maximal independent set.


Definition 19 Let M = (E, I) be a matroid. If an element x is not a loop, the matroid M/x, called a contraction of M, is defined as follows:

1) the ground set of M/x is E \ {x};

2) a set A is independent in M/x if and only if A ∪ {x} is independent in M.

The concept of matroid contraction can be dualized. To this end, an element y is called a coloop if it is contained in every basis of M.


Definition 20 Let M = (E, I) be a matroid. If an element x is not a coloop, the matroid M \ x, called a restriction of M, is defined as follows:

1) the ground set of M \ x is E \ {x};

2) a set A is independent in M \ x if and only if it is independent in M.


The above definitions have been given in terms of the restriction and contraction of a single element, but they can easily be extended to the restriction and contraction of a set $X$. The minors obtained will be denoted $M \setminus X$ and $M/X$, respectively.


Representability of Matroids 


One of the most common canonical examples of matroids is the vectorial matroid, whose ground set $E$ is a finite set of vectors from a vector space and whose independent sets are the linearly independent subsets of $E$. A matroid $M = (E, I)$ is representable over a field $F$ if there exist a vector space $V$ over $F$ and a finite set of vectors of $V$ such that $M$ is isomorphic to the vectorial matroid of that set. A binary matroid is a matroid representable over GF(2), while a ternary matroid is one representable over GF(3).
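For a binary matroid, independence reduces to linear independence over GF(2), which can be tested by Gaussian elimination. The sketch below assumes each vector is encoded as a Python integer whose bits are its coordinates; the function name `gf2_independent` and the labels `e1`, `e2`, `e3`, `e12` are illustrative.

```python
def gf2_independent(vectors):
    """True iff the GF(2) vectors (ints used as bit vectors) are linearly independent."""
    basis = {}                      # leading bit -> reduced basis vector with that lead
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:   # new pivot: v is independent of the previous vectors
                basis[lead] = v
                break
            v ^= basis[lead]        # eliminate the leading bit (addition over GF(2))
        else:
            return False            # v reduced to 0: linearly dependent (or a loop)
    return True

# A binary matroid on four labelled vectors of GF(2)^3.
vec = {'e1': 0b001, 'e2': 0b010, 'e3': 0b100, 'e12': 0b011}
indep = lambda A: gf2_independent(vec[a] for a in A)
print(indep({'e1', 'e2', 'e3'}), indep({'e1', 'e2', 'e12'}))  # True False
```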

In recent literature (as of 1999), the problem of classifying all the fields over which a given matroid is representable, and the inverse problem of characterizing all the matroids that are representable over a given field, have attracted growing interest. An important result for matroid representability is the following theorem.


Theorem 21 A matroid $M = (E, I)$ is representable over every field if and only if it is representable over GF(2) and over some field of characteristic other than two.


A matroid as in the previous theorem is called regular. 


Connectivity of Matroids 


Connectivity is an important concept in matroid the- 
ory. 


Definition 22 A matroid $M = (E, I)$ admits a $k$-separation if there exists a partition $(X, Y)$ of the ground set $E$ such that

1) $|X| \ge k$, $|Y| \ge k$;

2) $\rho(X) + \rho(Y) - \rho(E) \le k - 1$.

Definition 23 The smallest $k$ such that a matroid $M = (E, I)$ admits a $k$-separation is called the connectivity of $M$.


If the connectivity of $M$ is $k \ge 2$, $M$ is $n$-connected for every $n \le k$; if $k = 1$, $M$ is disconnected; if $M$ admits no $k$-separation for any integer $k$, $M$ has infinite connectivity.

An important result for matroid connectivity is the 
following theorem. 


Theorem 24 A matroid $M = (E, I)$ is disconnected if and only if there exists a partition $(X, Y)$ of the ground set $E$ into nonempty sets such that every circuit $C$ of $M$ is either a subset of $X$ or a subset of $Y$.


Examples of Matroids 


In this section some of the most popular types of ma- 
troids involved in combinatorial optimization will be 
described. 


Uniform Matroid 


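Let $E$ be a set of $n$ elements and let $I$ be the family of subsets $A$ of $E$ such that $|A| \le k$, where $k \le n$. Then $M = (E, I)$ is called the uniform matroid of rank $k$ and is denoted by $U_{k,n}$.

The sets of the bases and the circuits of $U_{k,n}$ are

$$\mathcal{B} = \{X \subseteq E : |X| = k\} \quad \text{and} \quad \mathcal{C} = \{X \subseteq E : |X| = k + 1\},$$

respectively. Moreover, for all $A \subseteq E$,

$$\rho(A) = \begin{cases} |A| & \text{if } |A| < k, \\ k & \text{otherwise,} \end{cases} \qquad \sigma(A) = \begin{cases} A & \text{if } |A| < k, \\ E & \text{otherwise.} \end{cases}$$
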
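The formulas for $U_{k,n}$ are simple enough to enumerate directly. This sketch (function names are illustrative) lists the bases and circuits and evaluates $\rho$ and $\sigma$ for $U_{2,4}$ on an assumed ground set $\{a, b, c, d\}$.

```python
from itertools import combinations

def uniform_rank(A, k):
    """rho(A) = |A| if |A| < k, else k."""
    return min(len(A), k)

def uniform_closure(A, E, k):
    """sigma(A) = A if |A| < k, else E."""
    return frozenset(A) if len(A) < k else frozenset(E)

def bases_and_circuits(E, k):
    """Bases: all k-subsets of E; circuits: all (k + 1)-subsets of E."""
    bases = [frozenset(c) for c in combinations(E, k)]
    circuits = [frozenset(c) for c in combinations(E, k + 1)]
    return bases, circuits

E = ['a', 'b', 'c', 'd']
B, C = bases_and_circuits(E, 2)                          # U_{2,4}
print(len(B), len(C))                                    # 6 4
print(uniform_closure({'a'}, E, 2) == {'a'})             # True
print(uniform_closure({'a', 'b'}, E, 2) == frozenset(E)) # True
```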
Graphic Matroid 


If $F$ is the set of forests of a graph $G = (V, E)$, $M = (E, F)$ is called a graphic matroid. The circuits of $M$ are the graph-theoretic circuits of $G$, while the rank of a subset $E_i$ of $E$ is given by

$$\rho(E_i) = |V| - c(E_i),$$

where $c(E_i)$ is the number of connected components of $G_i = (V, E_i)$.
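The rank formula $\rho(E_i) = |V| - c(E_i)$ can be evaluated with a union-find structure that counts connected components. A minimal sketch, with the function name `graphic_rank` chosen for illustration; an edge set is independent (a forest) exactly when its rank equals its size.

```python
def graphic_rank(vertices, edges):
    """rho(E_i) = |V| - c(E_i), counting components of (V, E_i) with union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge joins two components
            parent[ru] = rv
            components -= 1
    return len(parent) - components

print(graphic_rank([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3)]))  # 2: the triangle is a circuit
```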


Transversal Matroid 


Let $E$ be a finite set, $\mathcal{C} = \{S_1, \dots, S_m\}$ a collection of subsets of $E$, and let $T = \{e_1, \dots, e_t\} \subseteq E$.

$T$ is called a transversal of $\mathcal{C}$ if there exist distinct integers $j(1), \dots, j(t)$ such that $e_i \in S_{j(i)}$, $i = 1, \dots, t$. Let $I$ be the set of all transversals of $\mathcal{C}$; then $M = (E, I)$ is a transversal matroid.
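Testing whether $T$ is a transversal of $\mathcal{C}$ is a bipartite matching question: each element must be matched to a distinct set containing it. The sketch below uses the classical augmenting-path (Kuhn) matching method, which is a technique chosen here for illustration, not one prescribed by the text; `is_transversal` is a hypothetical name.

```python
def is_transversal(T, family):
    """True iff each element of T can be matched to a distinct set of `family`
    that contains it (augmenting-path bipartite matching)."""
    match = {}                                  # set index j -> element matched to S_j

    def try_assign(e, seen):
        for j, S in enumerate(family):
            if e in S and j not in seen:
                seen.add(j)
                # S_j is free, or its current element can be re-routed elsewhere
                if j not in match or try_assign(match[j], seen):
                    match[j] = e
                    return True
        return False

    return all(try_assign(e, set()) for e in T)

family = [{'a', 'b'}, {'b'}, {'c'}]
print(is_transversal({'a', 'b', 'c'}, family))   # True
print(is_transversal({'a', 'b'}, [{'a', 'b'}]))  # False: only one set available
```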
Partition Matroid


Let $E$ be a finite set, $\Pi = \{E_1, \dots, E_p\}$ a partition of $E$, that is, a collection of disjoint subsets of $E$ covering $E$, and $d_1, \dots, d_p$ nonnegative integers. A subset $A$ of $E$ is independent, i.e. $A \in I$, if and only if $|A \cap E_j| \le d_j$, $j = 1, \dots, p$. The system $M = (E, I)$ is a matroid, called a partition matroid.

An example of a partition matroid can be obtained by considering any digraph $G = (V, E)$ and partitioning the edges of $E$ according to which node is the head (or, alternatively, the tail) of each. Suppose that $d_j = 1$, $j = 1, \dots, p$; then a set $A$ of edges is independent if and only if no two edges of $A$ have the same head (respectively, the same tail).
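The independence test of a partition matroid only needs the block of each element and the capacities $d_j$. In this sketch (names illustrative), the digraph example above is modelled with arcs as `(tail, head)` pairs, grouped by head, and $d_j = 1$ for every node.

```python
from collections import Counter

def partition_independent(A, block_of, capacity):
    """A is independent iff |A ∩ E_j| <= d_j for every block E_j of the partition."""
    counts = Counter(block_of(e) for e in A)
    return all(n <= capacity(j) for j, n in counts.items())

head = lambda arc: arc[1]   # arcs are (tail, head); block = head node
one = lambda node: 1        # d_j = 1 for every node
print(partition_independent({('u', 'v'), ('u', 'w')}, head, one))  # True
print(partition_independent({('u', 'v'), ('w', 'v')}, head, one))  # False
```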


Dual Matroids 


Let $M = (E, I)$ be a matroid, and let $\mathcal{B}$ be its set of bases. The dual matroid $M^*$ is the matroid on the ground set $E$ whose bases are the complements of the bases of $M$. Thus, a set $A$ is independent in $M^*$ if and only if $A$ is disjoint from some basis of $M$. Note that $(M^*)^* = M$.
For a pair of matroids $(M, M^*)$ and their rank functions, the following propositions hold.


Proposition 25 Let $M = (E, I)$ be a matroid, and let $\rho$ be its rank function. Let $M^* = (E, I^*)$ be the dual matroid of $M$, with rank function $\rho^*$; then

$$\rho^*(A) = |A| + \rho(E \setminus A) - \rho(E)$$

for each $A \subseteq E$.
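Proposition 25 gives the dual rank function directly from $\rho$. As a small check, this sketch (the name `dual_rank` is illustrative) applies it to $U_{1,3}$; the result is the rank function $\min(|A|, 2)$ of $U_{2,3}$, consistent with the bases of the dual being the complements of the bases of $M$.

```python
def dual_rank(rank, E):
    """rho*(A) = |A| + rho(E minus A) - rho(E) (Proposition 25)."""
    E = frozenset(E)
    full = rank(E)
    return lambda A: len(A) + rank(E - frozenset(A)) - full

E = frozenset('abc')
rho = lambda A: min(len(A), 1)                 # rank function of U_{1,3}
rho_star = dual_rank(rho, E)
print([rho_star(s) for s in (set(), {'a'}, {'a', 'b'}, {'a', 'b', 'c'})])  # [0, 1, 2, 2]
```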


Proposition 26 Let $M^*$ be the dual of the matroid $M = (E, I)$, let $A$ be a subset of $E$ and let $\bar{A} = E \setminus A$. If $\rho$ and $\rho^*$ are the rank functions of $M$ and $M^*$ respectively, then

1) $|A| - \rho(A) = \rho^*(E) - \rho^*(\bar{A})$;

2) $\rho(E) - \rho(A) = |\bar{A}| - \rho^*(\bar{A})$.

Proposition 27 Let $M = (E, I)$ be a matroid; then

1) $x$ is a loop in $M$ if and only if $x$ is a coloop in $M^*$, and vice versa;

2) if $x$ is not a loop in $M$, then the dual of $M/x$ is the matroid $M^* \setminus x$;

3) if $x$ is not a coloop in $M$, then the dual of $M \setminus x$ is the matroid $M^*/x$.


As an example of the dual of a matroid, let us consider the vectorial matroid. Suppose that the vectors representing $M$ are the columns of an $m \times n$ matrix $A$ over $F$ and that these vectors span $F^m$. Thus, $A$ has rank $m$ and is the matrix of a linear transformation $T$ from $F^n$ onto $F^m$. Let $U$ be the kernel of $T$, and $B$ the matrix of a linear embedding of $U$ into $F^n$. Note that $B$ is an $n \times (n - m)$ matrix (whose columns form a basis of $U$) and has rank $n - m$. Moreover, the columns of the $(n - m) \times n$ matrix $B^T$ are indexed by the same set as the columns of $A$, and $AB = 0$. The matrix $B^T$ represents the dual matroid $M^*$ of the vectorial matroid $M$.
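A concrete instance of this construction is easiest when $A$ is in standard form $A = [I_m \mid D]$ (which can be reached by row operations and a reordering of columns when $A$ has rank $m$). Then $B = \begin{bmatrix} -D \\ I_{n-m} \end{bmatrix}$ spans the kernel, so $B^T = [-D^T \mid I_{n-m}]$ represents $M^*$. The sketch below assumes integer entries and uses illustrative helper names; it only verifies $AB = 0$.

```python
def identity(k):
    return [[int(i == j) for j in range(k)] for i in range(k)]

def standard_form_pair(D):
    """For A = [I_m | D], return (A, B^T) with B = [-D; I_{n-m}], so that A B = 0."""
    m, r = len(D), len(D[0])            # r = n - m
    Im, Ir = identity(m), identity(r)
    A = [Im[i] + D[i] for i in range(m)]
    Bt = [[-D[i][j] for i in range(m)] + Ir[j] for j in range(r)]
    return A, Bt

def times_transpose(X, Y):
    """Compute X @ Y^T for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in Y] for row in X]

A, Bt = standard_form_pair([[1, 2], [3, 4]])
print(Bt)                      # [[-1, -3, 1, 0], [-2, -4, 0, 1]]
print(times_transpose(A, Bt))  # [[0, 0], [0, 0]], i.e. A B = 0
```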


Greedy Algorithms on Weighted Matroids 


Many combinatorial problems for which the greedy technique gives an optimal solution can be formulated in terms of finding a maximum-weight independent subset in a weighted matroid. More precisely, we are given a weighted matroid $M = (E, I)$, and the objective is to find an independent set $A \in I$ such that $w(A)$ is maximized (such a set is called an optimal subset of $M$). Since the weight $w(x)$ of any element $x \in E$ is positive, a maximum-weight independent subset is always a maximal independent subset.

In the minimum spanning tree problem, for example, we are given a connected undirected graph $G = (V, E)$ and a length function $w$ such that $w(e)$ is the positive length of the edge $e$. The objective is to find an acyclic subset $T$ of $E$ that connects all of the vertices of $G$ and whose total length

$$w(T) = \sum_{e \in T} w(e)$$

is minimized. This is a classical combinatorial problem, and it can be formulated as the problem of finding an optimal subset of a matroid. Indeed, consider the weighted graphic matroid $M_G$ with weight function $w'$ such that $w'(e) = w_0 - w(e)$, where $w_0$ is larger than the maximum length of any edge. It can easily be seen that $w'(e) > 0$ for each $e \in E$ and that an optimal subset of $M_G$ is a spanning tree of minimum total length in the original graph $G$. In more detail, each maximal independent subset $A$ corresponds to a spanning tree, and since

$$w'(A) = (|V| - 1) \cdot w_0 - w(A)$$

for any maximal independent subset $A$, the independent subset that maximizes $w'(A)$ must minimize $w(A)$.

J.B. Kruskal in [14] and R.C. Prim in [22] proposed two greedy strategies for solving the minimum spanning tree problem efficiently. In the following, we report the pseudocode of a greedy algorithm that works for any weighted matroid. The algorithm GREEDY takes as input a matroid $M = (E, I)$ and a weight function $w$, and returns an optimal subset $A$.


Greedy(M, w)
  set A = ∅
  sort E[M] = {x_1, ..., x_t} into nonincreasing order by weight w
  FOR i = 1 TO t
    IF A ∪ {x_i} ∈ I[M]
      THEN set A = A ∪ {x_i}
  return(A)
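A direct Python transcription of GREEDY is sketched below, assuming the matroid is given by an independence oracle; the names `greedy` and `is_forest` and the small edge-length table are illustrative. Applied to the graphic matroid with $w'(e) = w_0 - w(e)$, scanning by nonincreasing $w'$ means scanning by nondecreasing length, so the procedure behaves like Kruskal's algorithm.

```python
def greedy(elements, weight, independent):
    """GREEDY(M, w): scan elements in nonincreasing order of weight and keep x
    whenever A ∪ {x} is still independent (checked by the oracle `independent`)."""
    A = set()
    for x in sorted(elements, key=weight, reverse=True):
        if independent(A | {x}):
            A.add(x)
    return A

def is_forest(edges):
    """Independence oracle of the graphic matroid: True iff the edge set is acyclic."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:            # u and v already connected: this edge closes a cycle
            return False
        parent[ru] = rv
    return True

# Edge lengths w(e); w'(e) = w0 - w(e) with w0 larger than every length.
length = {('a', 'b'): 1, ('b', 'c'): 2, ('a', 'c'): 3, ('c', 'd'): 4, ('b', 'd'): 5}
w0 = max(length.values()) + 1
tree = greedy(length, lambda e: w0 - length[e], is_forest)
print(sorted(tree), sum(length[e] for e in tree))
# [('a', 'b'), ('b', 'c'), ('c', 'd')] 7
```

Re-checking independence from scratch at every step keeps the sketch close to the pseudocode; an efficient implementation would maintain the union-find structure incrementally.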


Like any other greedy algorithm, GREEDY always makes the choice that looks best at the moment. It considers in turn each element $x_i$ of $E[M]$, whose elements are sorted into nonincreasing order by weight $w$, and immediately adds $x_i$ to the set $A$ under construction if $A \cup \{x_i\}$ is still independent. Note that the returned set $A$ is always independent: it is initialized to the empty set, which is independent by definition of a matroid, and at each iteration an element $x_i$ is added to $A$ only if $A$'s independence is preserved. $A$ is also an optimal subset of the matroid $M$; applied to the weighted graphic matroid $M_G$ above, it therefore yields a minimum spanning tree of the original graph $G$. To prove its optimality, it is enough to show that weighted matroids exhibit the two ingredients whose existence guarantees that a greedy strategy will solve the given problem optimally: the greedy-choice property and the optimal substructure property. The proof that matroids exhibit both these properties can be found in [4]. Generally speaking, proving the greedy-choice property consists of showing that a globally optimal solution can be obtained by making a locally optimal (greedy) choice. The proof examines a globally optimal solution and shows that this solution can be modified so that a greedy choice is made at the first step, and that this choice reduces the original problem to an equivalent problem of smaller size. By induction, it is proved that a greedy choice can be made at each step. Showing that a greedy choice reduces the original problem to a similar but smaller problem reduces the proof of correctness to demonstrating that an optimal solution must exhibit optimal substructure. A problem exhibits the optimal substructure property if an optimal solution to the problem contains within it optimal solutions to subproblems. This property underlies the applicability of greedy strategies as well as of dynamic programming algorithms.


See also 
Maximum constraint satisfaction problems (MAX-CSPs) generalize maximum satisfiability (MAX-SAT) to include cases where the variables are no longer restricted to binary (or Boolean) values.

MAX-CSP is NP-complete even in the special case 
of binary CSPs. Therefore designing procedures to 
compute upper bounds to the exact (unknown) opti- 
mum value (maximum number of satisfied constraints) 
is a relevant issue. Such bounds may be useful, in par- 
ticular, to provide estimates of the quality of solutions 
obtained from various heuristic approaches. 

This article describes a systematic way of computing 
upper bounds for large scale MAX-CSP instances such 
as those arising from the so-called radio link frequency 
assignment problem (RLFAP). After discussing the gen- 
eral relaxation principle and the basic procedure from 
which the bounds are derived, we present results of extensive computational experiments on a series of 90 instances of RLFAP including both real test problems and
randomly generated ‘realistic’ test problems (for sizes 
ranging from 396 variables and about 1700 constraints 
to 831 variables and about 4800 constraints). 

These results clearly indicate that the proposed ap- 
proach is practically useful to produce fairly accurate 
upper bounds for such large MAX-CSP problems. 
Introduction


Constraint satisfaction problems (CSPs) may be viewed 
as a generalization of satisfiability (SAT) to include 
cases where, instead of taking binary values only (0-1 
or true-false) the variables may take on a finite number 
(> 2) of given possible values. 

For an infeasible CSP, a relevant question, both the- 
oretically and practically, is to determine an assignment 
of values to variables such that the number of satisfied 
constraints is the largest possible. This is the so-called 
maximum constraint satisfaction problem (MAX-CSP), 
which generalizes in a natural way maximum satisfia- 
bility (MAX-SAT). 

Since MAX-2SAT is NP-complete (see e.g. [12, pp. 259-260]), even the subclass of MAX-CSP corresponding to binary CSPs (those problems with constraints involving pairs of variables only) is NP-complete. Therefore, for very large instances such as those arising from practical applications (e.g. the RLFAP discussed below), one can only hope for approximate solutions using some of the currently available heuristic approaches, such as simulated annealing, tabu search, genetic algorithms, or local search of various kinds.

However, for many applications, getting an approx- 
imate solution without any information about the qual- 
ity of this solution (e. g. measured by the difference be- 
tween the cost of this solution and the optimal cost) 
may be of little value. 

We address in this paper the problem of computing 
upper bounds to the optimum cost of MAX-CSP prob- 
lems from which estimates on the quality of heuristic 
solutions can be derived. 

The article is organized as follows. Basic defini- 
tions about CSPs and MAX-CSPs are recalled in the 
second section. Modeling the so-called radio link fre- 
quency assignment problem (RLFAP) in terms of CSP 
and MAX-CSP is addressed in the third section. Then 
we present a general class of relaxations for MAX-CSP 
problems and its specialization to the computation of 
MAX-CSP bounds for RLFAP. Finally results of exten- 
sive computational experiments carried out on series 
of both real test problems and realistic randomly gen- 
erated test problems are presented. To our knowledge, 
this is the first time extensive computational results of 
this kind are reported for such large scale MAX-CSP 
problems. 


CSP and MAX-CSP


A constraint satisfaction problem (CSP) is defined by specifying:
• a set of $n$ variables $x_1, \dots, x_n$;
• for each variable $x_i$, $i \in I = \{1, \dots, n\}$, the domain of $x_i$, i.e. the (finite) set $D_i$ of possible values for $x_i$;
• a set of $K$ constraints $\varphi_k$, $k = 1, \dots, K$. For each $k \in [1, K]$, constraint $\varphi_k$ is defined by its support set (i.e. the subset $S_k = \operatorname{supp}(\varphi_k)$ of indices of the variables involved in the constraint) and an oracle which, given any combination $x[S_k]$ of values for the variables in $S_k$, answers TRUE if $\varphi_k(x[S_k]) = \text{TRUE}$, i.e. if the combination is allowed, and FALSE otherwise. (For any $S \subseteq \{1, \dots, n\}$ and $x \in D_1 \times \cdots \times D_n$, $x[S]$ denotes the vector $x$ restricted to the components in $S$.)

Given a CSP specified as above, we define a free assignment as any $n$-tuple $x \in D = D_1 \times \cdots \times D_n$. A feasible assignment (or solution) is a free assignment such that $\varphi_k(x[S_k]) = \text{TRUE}$ for all $k = 1, \dots, K$.

For simplicity, we restrict here to the case where 
each variable takes scalar values only (i. e. real or integer 
values), but we note that more general CSPs may be de- 
fined with variables taking, for instance, vector values. 

The arity of a constraint $\varphi_k$ is the cardinality of its support set: $|S_k| = |\operatorname{supp}(\varphi_k)|$. A binary CSP is a constraint satisfaction problem in which $|\operatorname{supp}(\varphi_k)| \le 2$ for all $k = 1, \dots, K$.

The constraint hypergraph associated with a given CSP is the hypergraph having vertex set $I = \{1, \dots, n\}$ and edge set $\{S_1, \dots, S_K\}$. In the case of a binary CSP, this is a graph.

The two examples below are interesting special cases of the general definition and show the NP-completeness of arbitrary CSPs.


Example 1 (Satisfiability) SAT is easily recognized as a special case of CSP where $\forall i: D_i = \{\text{TRUE}, \text{FALSE}\}$ and where there is a constraint $\varphi_j$ corresponding to each clause $C_j$, with $\varphi_j(x) = \text{TRUE} \iff$ clause $C_j$ is satisfied under truth assignment $x$.


Example 2 (Hypergraph q-coloring; see [2, Chap. 19]) Let $q > 1$ be a given integer and $H = [V, E]$ a hypergraph with vertex set $V$ and edge set $E$. The problem is to assign one out of $q$ colors to each vertex of $H$ so that each edge of $H$ has vertices of different colors. Clearly, this may be formulated as a CSP where there is one variable $x_i$ for each $v_i \in V$, with domain $D_i = \{1, \dots, q\}$, and one constraint $\varphi_k$ for each edge $e_k = \{i_1, \dots, i_p\} \in E$ such that $\varphi_k(x_{i_1}, \dots, x_{i_p}) = \text{TRUE} \iff$ no two values in $\{x_{i_1}, \dots, x_{i_p}\}$ are equal. Note that when $H$ is a graph (i.e. $|e_k| = 2$ for all $e_k \in E$), the resulting CSP is a binary CSP.


For an infeasible CSP, one basic question is to determine a 'best possible' or 'least infeasible' assignment. If the criterion for quality (or degree of 'feasibility') of an assignment $x$ is taken to be the number $\sigma(x)$ of constraints satisfied under that assignment, we are led to the so-called MAX-CSP problem:

• Given: a CSP defined by its variables $x_1, \dots, x_n$, domains $D_1, \dots, D_n$, and constraints $\varphi_1, \dots, \varphi_K$.

• Find: $x \in D_1 \times \cdots \times D_n$ such that

$$\sigma(x) = \bigl|\{k \in [1, K] : \varphi_k(x[S_k]) = \text{TRUE}\}\bigr|$$

is maximized.
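On very small instances, $\sigma(x)$ can simply be maximized by enumerating every free assignment, which makes the definition concrete (and shows why this is hopeless at RLFAP scale). The sketch below is an assumption-laden illustration: constraints are represented as `(support, oracle)` pairs with 0-based variable indices, and the example is Example 2 with $q = 2$ on a triangle, which admits no proper 2-coloring.

```python
from itertools import product
from operator import ne

def max_csp(domains, constraints):
    """Exhaustive MAX-CSP. Each constraint is (support, oracle): the oracle receives
    the values x[i] for i in support and returns True iff the combination is allowed."""
    best_sigma, best_x = -1, None
    for x in product(*domains):                       # every free assignment
        sigma = sum(1 for S, phi in constraints if phi(*(x[i] for i in S)))
        if sigma > best_sigma:
            best_sigma, best_x = sigma, x
    return best_sigma, best_x

# Example 2 with q = 2 on a triangle: at most 2 of the 3 constraints can hold.
domains = [(1, 2)] * 3
constraints = [((0, 1), ne), ((1, 2), ne), ((0, 2), ne)]
print(max_csp(domains, constraints))   # (2, (1, 1, 2))
```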


Example 3 (MAX-SAT, MAX-2SAT) Clearly, MAX- 
SAT is a special case of MAX-CSP when the given CSP 
is a satisfiability problem. The associated decision prob- 
lem is NP-complete even for the special case of MAX- 
2SAT ([13]), showing that MAX-CSP is NP-complete 
even for binary CSPs. 


Heuristics for approximately solving the MAX-SAT problem have been proposed in [17,19,23,27]. A branch and bound algorithm for MAX-SAT based on probabilistic bounds is described in [3], with computational results for up to 100 binary variables and 1000 clauses. The branch and cut algorithm described in [20] presents computational results for general MAX-3SAT problems with up to 100 binary variables and 575 clauses. For a recent survey on SAT and MAX-SAT, see [9].

For more general MAX-CSP problems, many heuristic approaches have been investigated, such as tabu search ([4,7]), simulated annealing [5], and genetic algorithms [18]. Exact algorithms for random MAX-CSP problems were proposed in [11]. However, in the computational experiments reported, the sizes of the problems for which exact optimal solutions were found are rather small (144 variables with domains of cardinality 4 and 646 constraints for the largest problems solved in [11]).


MAX-CSP and the Radio Link Frequency 
Assignment Problem 


Operating large radio link telecommunication networks gives rise to the so-called radio link frequency assignment problem (RLFAP), which is to choose, for each transmission link, a specific operating frequency (among a given list of allowed values) while satisfying a list of noninterference constraints (most constraints usually involving pairs of links). A CSP formulation of RLFAP is as follows. With $n$ denoting the number of links, for each link $i = 1, \dots, n$, there is an associated variable $x_i$ representing the frequency to be assigned to link $i$. The domain $D_i$ of $x_i$ is the (finite) set of allowed frequencies for link $i$ (frequencies are expressed in Hz, kHz, MHz or any other specified unit).

Not every assignment $x \in S = D_1 \times \cdots \times D_n$ is allowed, because a number of constraints, called noninterference constraints, have to be satisfied.

We will only consider here the case of binary noninterference constraints (i.e. involving only pairs of links), which is relevant to many applications of interest (see e.g. [15,16]). For a given pair of links $i$ and $j$, two (exclusive) types of constraints are possible:

• equality constraints of the form

(E) $|x_i - x_j| = w_{ij}$;

• inequality constraints of the form

(I) $|x_i - x_j| \ge w_{ij}$.


The real number $w_{ij}$, which represents the requested slack or the minimum requested slack between the two assigned frequencies, will be called the weight of the constraint.

An instance of RLFAP is therefore specified by $n$ (number of links), a list of domains $D_1, \dots, D_n$ and a list of constraints, i.e. a list of quadruples of the form $(i, j, w_{ij}, T_{ij})$ where $i, j$ are the indices of the two links involved, $w_{ij}$ is the weight of the constraint, and $T_{ij}$ its type ((E) or (I)). The constraint graph associated with an instance of RLFAP is defined as the undirected graph $G$ with node set $\{1, \dots, n\}$ and with an edge $(i, j)$ for each constraint $(i, j, w_{ij}, T_{ij})$. We denote by $K$ the total number of constraints in an instance of RLFAP. Benchmarks of the RLFAP involving real instances with up to 916 variables and 5744 constraints have been made publicly available in the context of the European Project CALMA (see [8,15]). In those practical instances, the number of equality constraints (of type (E)) is never more than $n/2$, and assignments satisfying all of them can easily be found. We will denote by $S' \subseteq S = D_1 \times \cdots \times D_n$ the set of all such assignments. All assignments $x \in S \setminus S'$ must be disregarded because they are physically meaningless; therefore, from now on, we will only consider assignments in $S'$ as possible solutions to RLFAP.
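The quadruple representation of an RLFAP instance maps naturally onto a small record type. The sketch below is illustrative only: the `Constraint` class, the helpers `count_satisfied` and `in_s_prime`, and the frequencies are hypothetical. It evaluates $\sigma(x)$ and checks membership in $S'$ (all (E) constraints satisfied) for one assignment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    """One quadruple (i, j, w_ij, T_ij) of an RLFAP instance."""
    i: int
    j: int
    w: float
    kind: str  # 'E': |x_i - x_j| = w ; 'I': |x_i - x_j| >= w

    def satisfied(self, x):
        gap = abs(x[self.i] - x[self.j])
        return gap == self.w if self.kind == 'E' else gap >= self.w

def count_satisfied(x, constraints):
    """sigma(x): number of constraints satisfied by the frequency assignment x."""
    return sum(c.satisfied(x) for c in constraints)

def in_s_prime(x, constraints):
    """True iff x satisfies every equality constraint, i.e. x belongs to S'."""
    return all(c.satisfied(x) for c in constraints if c.kind == 'E')

x = {1: 100, 2: 110, 3: 104}            # hypothetical frequencies for links 1..3
cons = [Constraint(1, 2, 10, 'E'), Constraint(1, 3, 5, 'I'), Constraint(2, 3, 5, 'I')]
print(in_s_prime(x, cons), count_satisfied(x, cons))   # True 2
```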

An assignment in $S'$ which satisfies all constraints of type (I) will be called feasible. The feasibility version of RLFAP may therefore be stated as the following CSP:

• Given: an instance of RLFAP.

• Question: does there exist a feasible frequency assignment?

• Answer: yes or no and, if yes, output a feasible assignment $x$.

Efficient solution methods for RLFAP are of major in- 
terest to numerous practical applications in the context 
of civilian mobile communication networks as well as 
of military networks. Since the available spectrum is 
severely limited and the communication needs (traffic 
requirements) are continuously increasing, a high pro- 
portion of the instances of the RLFAP encountered in 
applications turn out to be infeasible. 

When faced with an instance which is either infeasi- 
ble or which is presumably infeasible (e. g. because run- 
ning a heuristic solution method just failed to produce 
a feasible solution) a key question for the practitioner 
becomes to determine a ‘best possible’ or ‘least infeasi- 
ble’ assignment. 

This leads to the ‘optimization version’ of the RL- 
FAP in the form of the following MAX-CSP: 

• Given: an instance of RLFAP with $n$ variables (links) and $K$ constraints.

• Question: determine $x^* \in S'$ such that $\sigma(x^*)$ (number of satisfied constraints) is maximized:

$$\sigma(x^*) = \max_{x \in S'} \{\sigma(x)\}.$$


In view of the NP-completeness of MAX-CSP for 
binary CSPs, guaranteed optimal solutions to the above 
for large scale instances (such as those of the CELAR 
benchmarks) cannot be reasonably expected from cur- 
rently available techniques in combinatorial optimiza- 
tion. A less ambitious, though practically relevant ob- 
jective, addressed in the following section, is to try and 
obtain good upper bounds to an optimal solution value. 


We note here that if an upper bound $\bar{\sigma}$ is found such that $\bar{\sigma} < K$, then we can deduce that the given RLFAP has no feasible solution. Thus, an interesting by-product of computing bounds will be to produce proofs of infeasibility of a given instance of RLFAP. Clearly, such information may be of considerable importance to the practitioner.


A General Class of Relaxations 
for Computing MAX-CSP Bounds 


MAX-CSP may be reformulated as the discrete optimization problem

$$\begin{aligned} \max\ & z = \sum_{k=1}^{K} y_k \\ \text{s.t. } & g_k(x) \ge y_k, \quad \forall k = 1, \dots, K, \qquad (1) \\ & y_k = 0 \text{ or } 1, \quad \forall k, \\ & x = (x_1, \dots, x_n) \in S'. \end{aligned}$$

In the above, for all $k = 1, \dots, K$, $g_k(x) \ge 1$ if $\varphi_k(x[S_k]) = \text{TRUE}$, and $g_k(x) < 1$ if $\varphi_k(x[S_k]) = \text{FALSE}$. Note that in the case of RLFAP, this specializes to $g_k(x) = |x_i - x_j| / w_k$, where $x_i$ and $x_j$ are the two variables involved in constraint $k$, and $w_k$ is the weight of constraint $k$.

A relaxation of an optimization problem such as (1) 
is obtained by replacing its solution set by a larger so- 
lution set. Clearly if the relaxed problem can be solved 
exactly (i.e. to guaranteed optimality) then its optimal 
objective function value is an upper bound (in case of 
maximization) to the optimum objective function value 
of the original problem. 

There exist a number of standard ways of relaxing an optimization problem such as (1), e.g. using Lagrangian relaxation (e.g. [10]) or considering the so-called continuous relaxation of some of the variables (e.g. relaxing the constraints on the $y_k$ variables in (1) to $0 \le y_k \le 1$). However, in our treatment of RLFAP, those standard relaxations have not been considered because they do not give rise to easily solvable relaxed problems. We therefore investigated a different approach according to the following general principle.

The relaxations we consider are based on the iden- 
tification of those parts of the constraint graph or hy- 
pergraph which are responsible for the infeasibility of 
the whole problem. Preliminary computational results obtained in [25] have shown that, at least for MAX-CSP problems derived from RLFAP, it is most often possible to identify in a given instance an infeasible induced subproblem of sufficiently reduced size to make the corresponding MAX-CSP bound computable in reasonable time.

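This suggests considering relaxations of (1) formed by subproblems induced by properly chosen subsets of constraints. Thus, if $K' \subseteq \mathcal{K} = \{1, \dots, K\}$ is the subset of constraints chosen, the induced relaxation considered is:

$$\begin{aligned} \max\ & z = \sum_{k=1}^{K} y_k \\ \text{s.t. } & g_k(x) \ge y_k, \quad \forall k \in K', \qquad (2) \\ & y_k = 0 \text{ or } 1, \quad \forall k = 1, \dots, K, \\ & x \in S'. \end{aligned}$$

Note that, in an optimal solution to (2), $k \in \mathcal{K} \setminus K' \Rightarrow y_k = 1$. Therefore $Z$, the optimum objective function value of (2), may be rewritten as

$$Z = K - |K'| + \bar{Z},$$

where $\bar{Z}$ is the optimum value of the problem

$$R[K']: \quad \begin{aligned} \max\ & \bar{z} = \sum_{k \in K'} y_k \\ \text{s.t. } & g_k(x) \ge y_k, \quad \forall k \in K', \\ & y_k = 0 \text{ or } 1, \quad \forall k \in K', \\ & x \in S'. \end{aligned}$$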


Clearly, the constraint graph or hypergraph $G'$ corresponding to a relaxation $R[K']$ is deduced from the constraint graph or hypergraph $G$ by deleting all edges associated with the constraints in $\mathcal{K} \setminus K'$. Also observe that if $G'$ has several distinct connected components, then the solution of $R[K']$ decomposes into independent subproblems, one for each connected component.

If the constraint graph or hypergraph G’ is of suffi- 
ciently small size, then it is possible to solve R[K’] ex- 
actly, and the optimum solution value obtained clearly 
leads to an upper bound to the optimum value of the 
original problem. When G’ is too large to get the ex- 
act optimal solution value of R[K’] then we will content 
ourselves with getting an upper bound to this exact op- 
timal value (see the procedure SOLVE.RELAX below). 


Clearly, any such upper bound still provides a valid up- 
per bound to the original problem. Of course, in the 
above approach, the quality of the bound derived from 
R[K’] essentially depends on how to select the subset 
K’. We now describe the selection procedure which has 
been used in our computational experiments. 


Building Relaxations 
for RLFAP Using Maximum Cliques 


We now specialize the general relaxation scheme de- 
scribed above to derive bounds for RLFAP. The pre- 
sentation below improves and extends our preliminary 
work in [25]. 

The basic idea of our selection procedure for choos- 
ing K’ C K is that, for RLFAP, infeasibility is more 
likely to occur on subsets of links which are all mutu- 
ally constrained, i.e. on subsets of links which induce 
a clique (complete subgraph) in the constraint graph. 
Since for RLFAP the constraint graphs arising from 
applications are always very sparse (less than 1% den- 
sity for the CELAR instances), it is known that finding 
a clique of maximum cardinality can be efficiently done 
even using simple approaches such as implicit enumer- 
ation. 

In [6] an efficient implicit enumeration based algo- 
rithm with good computational results for large sparse 
graphs up to 3000 vertices is described; however, it as- 
sumes very small maximum clique sizes (in the compu- 
tational results presented in [6], maximum clique sizes 
do not exceed 11, and the running times seem to in- 
crease extremely fast with this parameter). Unfortu- 
nately, in view of the fact that, for our large RLFAP 
instances, the maximum clique sizes turned out to be 
commonly in the range [12, 25], the above algorithm 
could not be used. 

We therefore worked out a different implementa- 
tion of the implicit enumeration technique which al- 
lowed us to find guaranteed maximum cliques for all 
the test problems treated within acceptable computing 
times (see results at the end of the paper). Using this maximum clique algorithm, the procedure for building a relaxation to MAX-CSP for RLFAP is as follows.

The heuristic solution method used in our experiments to implement step b1) is a variant of local search consisting in iteratively improving an initial starting solution; at each iteration, an exact tree search is carried out to find an optimal solution to a subproblem involving only a few variables. In our computational experiments, we observed that the impact of the quality of the heuristic solutions produced at step b1) on the quality of the relaxation obtained at the end of BUILD.RELAX was practically negligible (the main reason for this is that $\tilde{\sigma}$ is only used as a stopping criterion in the process of successive extraction of maximum cliques). The computational results shown below confirm that bounds of good average quality indeed result from the above construction.


a) Set: $G = [X, U] \leftarrow G$ (the initial constraint graph), $i \leftarrow 0$.
b) Current step:

   b1) Apply a heuristic algorithm to get a good approximate solution to MAX-CSP on $G$. Let $\tilde{\sigma}$ denote the number of constraints satisfied in this solution.
   IF $\tilde{\sigma} = |U|$, go to c) (end of the construction),
   ELSE set $i \leftarrow i + 1$.

   b2) Look for a maximum clique on $G$. Let $C_i$ be the clique obtained, with node set $N(C_i)$ and edge set $E(C_i)$.

   b3) Let $G'$ denote the subgraph of $G$ induced by $X \setminus N(C_i)$ (obtained from $G$ by deleting all edges having at least one endpoint in $N(C_i)$).
   Set $G \leftarrow G'$ and return to b).

c) IF $i = 0$, the problem is feasible and step b1) produces an assignment satisfying all the constraints. Terminate.
   ELSE the relaxation $R[K']$ obtained corresponds to the set $K'$ of all constraints in $\bigcup_{j=1}^{i} E(C_j)$.


Procedure BUILD.RELAX
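The clique-extraction loop of BUILD.RELAX can be sketched in a few lines for small graphs. This is not the authors' implementation: the maximum clique is found here by basic Bron-Kerbosch enumeration (a swapped-in technique, not the implicit enumeration code mentioned in the text), and `heuristic` is a hypothetical stand-in for step b1) that must return how many edge constraints some assignment satisfies. For the demonstration, the constraints are assumed to be 2-coloring constraints $x_u \ne x_v$, and the "heuristic" is an exact brute force.

```python
from itertools import product

def max_clique(adj):
    """Maximum clique via basic Bron-Kerbosch enumeration (fine for small graphs)."""
    best = set()

    def expand(R, P, X):
        nonlocal best
        if not P and not X:                  # R is a maximal clique
            if len(R) > len(best):
                best = set(R)
            return
        for v in list(P):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return best

def build_relax(nodes, edges, heuristic):
    """Sketch of BUILD.RELAX; returns K' as a set of edges (u, v)."""
    nodes, edges = set(nodes), set(edges)
    k_prime = set()
    while heuristic(nodes, edges) < len(edges):          # b1) stopping test
        adj = {v: set() for v in nodes}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        clique = max_clique(adj)                         # b2)
        k_prime |= {(u, v) for u, v in edges if u in clique and v in clique}
        nodes -= clique                                  # b3) subgraph induced by X \ N(C_i)
        edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return k_prime                                       # c) empty: everything satisfied

def two_coloring_heuristic(nodes, edges):
    """Toy stand-in for b1): exact brute force for constraints x_u != x_v, x in {0, 1}."""
    order = sorted(nodes)
    best = 0
    for colors in product((0, 1), repeat=len(order)):
        color = dict(zip(order, colors))
        best = max(best, sum(color[u] != color[v] for u, v in edges))
    return best

edges = {(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)}
print(sorted(build_relax(range(1, 6), edges, two_coloring_heuristic)))
# [(1, 2), (1, 3), (2, 3)]
```

Here the triangle is the only infeasible part: once its nodes are removed, the remaining edge $(4, 5)$ is satisfiable and the construction stops, so $K'$ consists of the three triangle constraints.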


Solving the Relaxed Problem R[K']


In order to solve the relaxed problem $R[K']$, we use a basic procedure called FIND.SOLUTION($R[K']$, $\theta$) which, for any integer value $\theta \in [1, |K'|]$, answers YES or NO depending on whether or not there exists a solution to $R[K']$ with objective function value $z \ge \theta$. In case of a YES answer, the procedure also exhibits the corresponding solution. We assume that this procedure is exact, i.e. always finds the right answer. Clearly, any value of $\theta$ leading to a NO answer produces an upper bound (namely $\theta - 1$) to the optimal solution value of $R[K']$.

The procedure SOLVE.RELAX(R[K’]) determines 
a decreasing sequence of upper bounds to the optimal 
value of R[K’] until either termination is obtained (at 
step c)) or the maximum computation time has been 
reached. 

In the former case, the exact optimum solution 
value to R[K’] is obtained; in the latter case, only an 
upper bound to this optimal value is produced. 


a) Initialization: Set $\theta \leftarrow |K'|$.
b) Current step:
   Apply FIND.SOLUTION($R[K']$, $\theta$).
   IF the answer is NO,
   THEN set $\theta \leftarrow \theta - 1$ and return to b).
   ELSE perform step c).
c) A YES answer has been obtained at step b): $\theta$ is the optimal solution value to $R[K']$. Terminate.


Procedure SOLVE.RELAX($R[K']$)


Suppose G′, the constraint graph of R[K′], has several distinct connected components corresponding to subsets of constraints K′_1, ..., K′_p. Then solving R[K′] decomposes into the solution of several smaller subproblems R[K′_1], ..., R[K′_p]. In the procedure SOLVE.RELAX, this decomposability may be exploited in various ways. In our implementation, this is done by organizing the computation into phases numbered t = 0, 1, .... The current upper bound value UB is initialized by UB ← |K′|. The current phase t consists in running the procedure FIND.SOLUTION on each of the subproblems R[K′_j], j = 1, ..., p, with the parameter θ = |K′_j| − t. Each time a NO answer is obtained, UB is updated by UB ← UB − 1. Clearly, with the above process, once a YES answer has been obtained for some subproblem R[K′_j] during phase t, this subproblem need not be considered at later phases t′ > t. The computation stops in one of two ways: at the end of the phase after which a YES answer has been obtained for every subproblem, or when a user-specified time limit has been reached.
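The phase-based use of decomposability can be sketched as follows. The same assumptions apply as before: binary constraints are stored with their scope, the oracle is brute force, and the helper names are hypothetical. The components of the constraint graph are found with a union-find structure, which is one convenient choice and not necessarily the one used in the original implementation.

```python
from itertools import product

def dist_at_least(i, j, d):
    """Hypothetical binary constraint |x_i - x_j| >= d, stored with its scope (i, j)."""
    return (i, j, lambda a, b: abs(a - b) >= d)

def split_components(n_vars, constraints):
    """Partition K' by connected component of the constraint graph (union-find)."""
    parent = list(range(n_vars))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]      # path halving
            a = parent[a]
        return a
    for i, j, _ in constraints:
        parent[find(i)] = find(j)
    groups = {}
    for c in constraints:
        groups.setdefault(find(c[0]), []).append(c)
    return list(groups.values())

def find_solution(sub, domain, theta):
    """Exact stand-in for FIND.SOLUTION on one component: brute force over its variables."""
    scope = sorted({v for i, j, _ in sub for v in (i, j)})
    pos = {v: k for k, v in enumerate(scope)}
    return any(sum(ok(x[pos[i]], x[pos[j]]) for i, j, ok in sub) >= theta
               for x in product(domain, repeat=len(scope)))

def phased_upper_bound(n_vars, domain, constraints, max_phases=None):
    """Phases t = 0, 1, ...: test theta = |K'_j| - t on each still-open component;
    every NO lowers UB by one, a YES closes the component."""
    comps = split_components(n_vars, constraints)
    ub = len(constraints)                       # UB <- |K'|
    open_ids = set(range(len(comps)))
    t = 0
    while open_ids and (max_phases is None or t < max_phases):
        for j in sorted(open_ids):
            if find_solution(comps[j], domain, len(comps[j]) - t):
                open_ids.discard(j)             # YES: not considered in later phases
            else:
                ub -= 1                         # NO: UB <- UB - 1
        t += 1
    return ub, not open_ids                     # (bound, bound is known to be exact)

# Two disjoint 2-frequency triangles (2 of 3 satisfiable each) plus one easy edge.
cons = ([dist_at_least(a, b, 1) for a, b in [(0, 1), (1, 2), (0, 2)]]
        + [dist_at_least(a, b, 1) for a, b in [(3, 4), (4, 5), (3, 5)]]
        + [dist_at_least(6, 7, 1)])
print(phased_upper_bound(8, [1, 2], cons))  # (5, True)
```

Stopping after a limited number of phases (the analogue of the time limit) still returns a valid bound. It is simply not known to be exact.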

The basic procedure FIND.SOLUTION has been implemented as a classical depth-first tree search process of the implicit enumeration type (achieved by means of a recursive C function). Getting the exact answer (YES or NO) is essential to the derivation of our bounds. The procedure FIND.SOLUTION is therefore run until full completion of the tree search, i.e., until all the nodes of the tree have been explored implicitly or explicitly.
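The text specifies only that FIND.SOLUTION is a depth-first implicit enumeration run to completion. The sketch below makes two assumptions for illustration: a fixed variable order, and one simple bounding rule. The rule prunes a node once the constraints already violated make z ≥ θ unreachable. Both choices and the helper names are assumptions, and Python recursion stands in for the recursive C function.

```python
def dist_at_least(i, j, d):
    """Hypothetical binary constraint |x_i - x_j| >= d with scope (i, j)."""
    return (i, j, lambda a, b: abs(a - b) >= d)

def find_solution_dfs(n_vars, domain, constraints, theta):
    """Depth-first implicit enumeration: variables are assigned in index order and a
    branch is pruned as soon as the constraints already violated make theta unreachable.
    Returns an assignment with at least theta satisfied constraints, or None."""
    # A constraint is decided when the later of its two variables gets a value.
    decided_at = [[] for _ in range(n_vars)]
    for i, j, ok in constraints:
        decided_at[max(i, j)].append((i, j, ok))
    total = len(constraints)
    x = [None] * n_vars

    def dfs(depth, violated):
        if total - violated < theta:            # bound: prune this subtree implicitly
            return False
        if depth == n_vars:                     # every constraint decided
            return True
        for v in domain:
            x[depth] = v
            newly = sum(not ok(x[i], x[j]) for i, j, ok in decided_at[depth])
            if dfs(depth + 1, violated + newly):
                return True
        x[depth] = None
        return False

    return list(x) if dfs(0, 0) else None

tri = [dist_at_least(0, 1, 1), dist_at_least(1, 2, 1), dist_at_least(0, 2, 1)]
print(find_solution_dfs(3, [1, 2], tri, 3))  # None
print(find_solution_dfs(3, [1, 2], tri, 2))  # [1, 1, 2]
```

A NO answer is only returned after the whole tree has been explored, explicitly or through pruning. This is the property the bounding scheme relies on.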


Computational Results 


In order to validate the above described approach, sys- 
tematic computational experiments have been carried 
out on two series of test problems. 

The first set was composed of 15 infeasible real problems which arose from actual network engineering studies. These studies were carried out on three distinct large radio link networks: one in the 2 GHz frequency range, one in the 2.5 GHz frequency range and one in the 4 GHz frequency range.

The second series concerned a set of 5 × 15 = 75 ‘realistic’ test problems generated by applying some random perturbation to the above 15 real problems. More precisely, each problem of the second series is generated from one problem of the first series. The weight w_ij of each inequality constraint of the form |x_i − x_j| ≥ w_ij is changed to w′_ij = w_ij × (α + βθ). Here θ is a pseudorandom number drawn from a uniform distribution on [0, 1], and α, β are chosen parameters. Of course, the pseudorandom drawing is assumed to be independent from one constraint to the next.


Maximum Constraint Satisfaction: Relaxations and Upper Bounds, Table 1

Prob.   n     K      NF    Relaxation
#                          #var.   #const.
1       680   2389    8    44      ?
2       680   3367   16    38      339
3       680   4103   24    84      671
4       680   2725    8    74      490
5       680   2576    8    46      311
6       680   2470    8    44      284
7       831   3451   16    16      113
8       831   4802   24    33      248
9       396   1792   12    70      375
10      396   1792   12    70      375
11      396   1792   12    70      375
12      396   1792   12    70      375
13      396   1792   12    70      375
14      396   1792   12    70      375
15      396   1792   12    70      375


Maximum Constraint Satisfaction: Relaxations and Upper Bounds, Table 2

Prob.   HS     Best upper bound obtained within
#              15 s    5 min   1 h
1       2376   2387    2385    2383
2       3358   3367    3366    3365
3       4090   4102    4098    4098
4       2700   2720    2713    2708
5       2559   ?       2569    2564
6       2457   2467    2464    2459
7       3440   3450    3450    3450
8       4781   4800    4800    4799
9       1762   1786    1780    1777
10      1759   1786    1780    1776
11      1761   1786    1780    1778
12      1764   1786    1780    1776
13      1761   1786    1780    1775
14      1757   1786    1780    1775
15      1764   1786    1783    1777
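The perturbation rule w′_ij = w_ij × (α + βθ) used to generate the second test series can be sketched as follows. The sketch assumes the weights are held in a dictionary keyed by constraint. It also assumes no rounding is applied to the perturbed values, which the text does not specify. `random.random()` draws from [0, 1), which differs from [0, 1] only on an event of probability zero.

```python
import random

def perturb_weights(weights, alpha, beta, seed=None):
    """Return a perturbed copy: w'_ij = w_ij * (alpha + beta * theta_ij), with theta_ij
    drawn uniformly on [0, 1) independently for each constraint (i, j).
    weights: dict mapping (i, j) -> w_ij. No rounding is applied (an assumption)."""
    rng = random.Random(seed)
    return {edge: w * (alpha + beta * rng.random()) for edge, w in weights.items()}

w = {(0, 1): 10, (1, 2): 20, (0, 2): 5}      # toy separation requirements
print(perturb_weights(w, 0.8, 0.4, seed=1))  # each value lies in [0.8 w, 1.2 w)
```

For the pairs (α, β) = (0.5, 1) and (0.8, 0.4), the expected multiplier α + β/2 equals 1. The perturbation therefore leaves the weights unchanged on average and only changes their spread.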

Table 1 presents the characteristics of the 15 real test 
problems treated, numbered 1 to 15 and provides for 
each problem: number of variables (n), number of con- 
straints (K), number of distinct frequencies used (NF) 
and the main characteristics of the relaxed subproblem 
obtained from the procedure BUILD.RELAX: number 
of variables #var, and number of constraints #const. 

Table 3 presents in a similar way the characteristics of the 5 × 15 = 75 test problems deduced from the previous ones by random perturbation. The 5 instances corresponding to each basic problem i are numbered i_1, ..., i_5. For each instance, the table displays the values of the parameters α and β used to generate it. It also gives the characteristics of the relaxed subproblem produced by BUILD.RELAX (number of variables, number of constraints).

The computation times taken to construct the relaxed subproblems (using BUILD.RELAX) for the problems of Tables 1 and 3 all lie between 5 and 35 minutes, with an average of about 12 minutes.

Table 2 shows the results obtained on the 15 real test problems of Table 1, and Table 4 shows the results for the 5 × 15 problems of Table 3. The computer used was a PC Pentium 166 workstation with 32 MB RAM. For each problem we provide two items:
- HS, the best heuristic solution value obtained (number of satisfied constraints);
- the best upper bounds obtained after 15 seconds, 5 minutes and 1 hour.

The results in Table 2 confirm that our approach is practical for consistently producing good bounds for real RLFAP instances within acceptable solution times.


Maximum Constraint Satisfaction: Relaxations and Upper Bounds, Table 3

Prob.  α     β     Relaxation        Prob.  α     β     Relaxation
#                  #var.  #const.    #                  #var.  #const.
1_1    0.5   1     42     ?          9_1    0.2   1.6   12     375
1_2    0.5   1     42     261        9_2    0.2   1.6   ?      66
1_3    0.8   0.4   42     261        9_3    0.2   1.6   48     66
1_4    0.8   0.4   42     261        9_4    0.2   1.6   24     264
1_5    0.8   0.4   42     261        9_5    0.2   1.6   36     ?
2_1    0.5   1     38     339        10_1   0.2   1.6   36     375
2_2    0.5   1     38     339        10_2   0.2   1.6   12     198
2_3    0.8   0.4   38     339        10_3   0.2   1.6   48     66
2_4    0.8   0.4   38     339        10_4   0.2   1.6   24     264
2_5    0.8   0.4   38     339        10_5   0.2   1.6   36     132
3_1    0.5   1     54     671        11_1   0.2   1.6   36     375
3_2    0.5   1     70     460        11_2   0.2   1.6   12     198
3_3    0.8   0.4   84     480        11_3   0.2   1.6   48     66
3_4    0.8   0.4   84     671        11_4   0.2   1.6   24     264
3_5    0.8   0.4   54     671        11_5   0.2   1.6   36     132
4_1    0.5   1     74     490        12_1   0.2   1.6   24     375
4_2    0.5   1     74     490        12_2   0.2   1.6   12     132
4_3    0.8   0.4   74     490        12_3   0.2   1.6   48     66
4_4    0.8   0.4   74     490        12_4   0.2   1.6   24     264
4_5    0.8   0.4   74     490        12_5   0.2   1.6   36     162
5_1    0.5   1     46     311        13_1   0.2   1.6   36     375
5_2    0.5   1     46     311        13_2   0.2   1.6   12     198
5_3    0.8   0.4   46     311        13_3   0.2   1.6   48     66
5_4    0.8   0.4   46     311        13_4   0.2   1.6   24     264
5_5    0.8   0.4   46     311        13_5   0.2   1.6   36     132
6_1    0.5   1     44     284        14_1   0.2   1.6   36     375
6_2    0.5   1     44     284        14_2   0.2   1.6   12     198
6_3    0.8   0.4   44     284        14_3   0.2   1.6   48     66
6_4    0.8   0.4   44     284        14_4   0.2   1.6   24     264
6_5    0.8   0.4   44     284        14_5   0.2   1.6   36     132
7_1    0.8   0.4   16     113        15_1   0.2   1.6   24     375
7_2    0.8   0.4   16     113        15_2   0.2   1.6   12     132
7_3    0.8   0.4   16     113        15_3   0.2   1.6   48     66
7_4    0.8   0.4   16     113        15_4   0.2   1.6   24     264
7_5    0.8   0.4   16     113        15_5   0.2   1.6   36     132
8_1    0.5   1     33     248
8_2    0.5   1     33     248
8_3    0.5   1     33     248
8_4    0.5   1     33     248
8_5    0.5   1     33     248
Maximum Constraint Satisfaction: Relaxations and Upper Bounds, Table 4

Prob.  HS     Best upper bound        Prob.  HS     Best upper bound
#             obtained within         #             obtained within
              15 s   5 min  1 h                     15 s   5 min  1 h
1_1    2376   2386   2383   2378      9_1    1779   1791   1791   1791
1_2    2376   2386   2383   2378      9_2    ?      1791   1790   1789
1_3    2376   2386   2383   2378      9_3    1774   1788   1788   1785
1_4    2376   2386   2383   2378      9_4    1777   1790   1789   1789
1_5    2376   2386   2383   2378      9_5    1779   1789   1787   1787
2_1    3358   3366   3365   3365      10_1   1780   1789   1788   1787
2_2    3358   3367   3365   3365      10_2   1780   1791   1790   1788
2_3    3358   3367   3366   3365      10_3   1776   1788   1787   1785
2_4    3358   3367   3365   3365      10_4   1778   1790   1789   1789
2_5    3358   3367   3366   3365      10_5   1777   1789   1788   1788
3_1    4081   4103   4101   4101      11_1   1783   1789   1789   1789
3_2    4081   4102   4101   4101      11_2   1780   1791   1789   1788
3_3    4086   4102   4098   4098      11_3   ?      1788   1788   1786
3_4    4086   4102   4098   4098      11_4   1780   1790   1789   1789
3_5    4088   4102   4101   4101      11_5   ?      1789   1788   1787
4_1    2700   2720   2713   2708      12_1   1779   1790   1790   1789
4_2    2700   2720   2713   2708      12_2   1780   1791   1790   1789
4_3    2700   2720   2713   2708      12_3   1777   1788   1788   1786
4_4    2700   2720   2713   2708      12_4   1780   1790   1789   1787
4_5    2700   2720   2713   2708      12_5   1778   1789   1788   1787
5_1    2559   2571   2569   2564      13_1   1782   1789   1789   1788
5_2    2559   ?      2569   2564      13_2   1777   1790   1789   1789
5_3    ?      ?      2569   2564      13_3   1776   1788   1787   1786
5_4    ?      2573   2569   2564      13_4   1779   1790   1789   1789
5_5    ?      2573   2569   2564      13_5   1777   1789   1788   1788
6_1    2457   2467   2464   2459      14_1   1782   1789   1789   1788
6_2    2457   2467   2464   2459      14_2   ?      1791   1789   1789
6_3    2457   2467   2464   2459      14_3   1775   1788   1787   1786
6_4    2457   2467   2464   2459      14_4   1779   1791   1789   1789
6_5    2457   2467   2464   2459      14_5   1776   1789   1788   1788
7_1    3438   3450   3450   3450      15_1   1780   1790   1790   1789
7_2    3437   3450   3450   3450      15_2   1779   1791   1789   1788
7_3    3421   3430   3430   3430      15_3   1777   1788   1788   1788
7_4    3414   3424   3424   3424      15_4   1781   1790   1789   1788
7_5    3436   3450   3450   3450      15_5   1780   1789   1788   1788
8_1    4780   4800   4800   4799
8_2    4783   4800   4800   4799
8_3    4778   4800   4800   4799
8_4    4781   4800   4800   4799
8_5    4781   4800   4800   4799

Tables 2 and 4 show that, for all the instances treated, the difference between the heuristic solution values HS and the best upper bounds obtained is always quite small. More precisely, the ratio R = (UB − HS)/UB is most often well below 1%. Problem 14 in Table 2 is the only one for which R > 1%. Since HS is only a lower bound, R is a pessimistic estimate of the relative difference between the best upper bound obtained and the optimal, unknown, solution value.
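The ratio R can be checked directly against Table 2. The short Python script below copies the HS values and the 1-hour upper bounds from that table. It confirms that Problem 14 is the only real instance with R > 1%, where R = (1775 − 1757)/1775 ≈ 1.01%.

```python
# (HS, best upper bound after 1 h) for the 15 real problems, copied from Table 2.
TABLE2 = {
    1: (2376, 2383), 2: (3358, 3365), 3: (4090, 4098), 4: (2700, 2708),
    5: (2559, 2564), 6: (2457, 2459), 7: (3440, 3450), 8: (4781, 4799),
    9: (1762, 1777), 10: (1759, 1776), 11: (1761, 1778), 12: (1764, 1776),
    13: (1761, 1775), 14: (1757, 1775), 15: (1764, 1777),
}

def gap_ratio(hs, ub):
    """R = (UB - HS) / UB: a pessimistic estimate of the relative gap to the optimum."""
    return (ub - hs) / ub

for prob, (hs, ub) in TABLE2.items():
    print(f"{prob:2d}  R = {100 * gap_ratio(hs, ub):.2f}%")

above_1pct = [p for p, (hs, ub) in TABLE2.items() if gap_ratio(hs, ub) > 0.01]
print(above_1pct)  # [14]
```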

Table 4 also shows that the results obtained are fairly stable, despite the magnitude of the perturbations applied to generate the corresponding 75 instances. Beyond practical applicability and efficiency, this indicates good stability and robustness in the behavior of our algorithms. To our knowledge, this is the first time a systematic way of deriving upper bounds for such large-scale MAX-CSP problems has been implemented and fully tested.

To conclude, let us mention that, in view of the 
results obtained, the techniques described here have 
been included in an industrial software tool for radio 
network engineering developed by the French MOD 
(DGA/CELAR). 


See also 


> Frequency Assignment Problem 
> Graph Coloring 
Introduction


The MAXIMUM CUT problem (MAX-CUT) is one of the 
simplest graph partitioning problems to conceptualize, 
and yet it is one of the most difficult combinatorial opti- 
mization problems to solve. The objective of MAX-CUT 
is to partition the set of vertices of a graph into two sub- 
sets, such that the sum of the weights of the edges hav- 
ing one endpoint in each of the subsets is maximum. 
This problem is known to be NP-complete [18,27]. It is interesting to note, however, that the inverse problem can be solved in polynomial time using network flow techniques [1]. That problem is finding a minimum cut in a graph with nonnegative edge weights. MAX-CUT is an important combinatorial problem with applications in many fields, including VLSI circuit design [9,32] and statistical physics [5]. For other applications, see [16,21]. For a detailed survey of MAX-CUT, the reader can refer to [33].


Organization 


In this paper, we introduce the MAXIMUM CUT problem and review several heuristic methods which have been applied to it. In Subsect. “C-GRASP Heuristic” we describe the implementation of a new heuristic based on optimizing a quadratic over a hypercube. The heuristic is designed under the C-GRASP (Continuous Greedy Randomized Adaptive Search Procedure) framework. Proposed by Hirsch, Pardalos, and Resende [23], C-GRASP is a new stochastic metaheuristic for continuous global optimization problems. Numerical results are presented and compared with other heuristics from the literature.


Idiosyncrasies 


We conclude this section by introducing the symbols and notations we will employ throughout this paper. Denote a graph G = (V, E) as a pair consisting of a set of vertices V and a set of edges E. Let the map w: E → ℝ be a weight function defined on the set of edges. We will denote an edge-weighted graph as a pair (G, w). An unweighted graph G = (V, E) can thus easily be treated as an edge-weighted graph (G, w) by defining the weight function as

$$ w_{ij} = \begin{cases} 1, & \text{if } (i,j) \in E, \\ 0, & \text{if } (i,j) \notin E. \end{cases} \tag{1} $$

We use the symbol “b := a” to mean “the expression a defines the (new) symbol b”. This can conveniently be extended, so that a statement like “(1 − ε)/2 := 7” means “define the symbol ε so that (1 − ε)/2 = 7 holds”. We will employ the typical symbol S^c to denote the complement of the set S. Further, A \ B denotes the set difference A ∩ B^c. The expression x ← y means that the value of the variable y is assigned to the variable x. Finally, |S| denotes the cardinality of a set S. We will use bold for words which we define, italics for emphasis, and SMALL CAPS for problem names. Any other locally used terms and symbols will be defined in the sections in which they appear.


Formulation 


Consider an undirected edge-weighted graph (G, w), where G = (V, E) is the graph and w is the weight function. A cut is defined as a partition of the vertex set into two disjoint subsets S and S̄ := V \ S. The weight of the cut (S, S̄) is given by

$$ W(S, \bar{S}) := \sum_{i \in S,\; j \in \bar{S}} w_{ij}. \tag{2} $$

For an edge-weighted graph (G, w), a maximum cut is a cut of maximum weight. Its weight is defined as

$$ MC(G, w) := \max_{S \subseteq V} W(S, V \setminus S). \tag{3} $$
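Equations (2) and (3) translate directly into code. The following Python sketch assumes the graph is given as a symmetric weight matrix with zero diagonal, as produced by Eq. (1) for unweighted graphs. It evaluates W(S, V \ S) and computes MC(G, w) by enumerating all subsets. The enumeration is exponential in n and is meant only to make the definitions concrete.

```python
from itertools import combinations

def cut_weight(w, S):
    """W(S, V \\ S) from Eq. (2): total weight of edges with one endpoint in S.
    w is a symmetric weight matrix (list of lists) with zero diagonal."""
    n = len(w)
    return sum(w[i][j] for i in S for j in range(n) if j not in S)

def max_cut_brute_force(w):
    """MC(G, w) from Eq. (3) by enumerating every S subset of V (tiny graphs only)."""
    n = len(w)
    best_val, best_S = 0, set()                 # S = {} gives the empty cut of weight 0
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            val = cut_weight(w, set(S))
            if val > best_val:
                best_val, best_S = val, set(S)
    return best_val, best_S

# 4-cycle 0-1-2-3-0 with unit weights, built from Eq. (1).
c4 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]
print(max_cut_brute_force(c4))  # (4, {0, 2})
```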
VSCV 


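We can formulate MAX-CUT as the following integer quadratic programming problem:

$$ \max \; \frac{1}{2} \sum_{1 \le i < j \le n} w_{ij} \, (1 - y_i y_j) \tag{4} $$

$$ \text{subject to: } \; y_i \in \{-1, 1\}, \quad \forall i \in V. \tag{5} $$

To see this, notice that each subset S := {i ∈ V : y_i = 1} ⊆ V induces a cut (S, S̄) with corresponding weight equal to

$$ W(S, \bar{S}) = \frac{1}{2} \sum_{1 \le i < j \le n} w_{ij} \, (1 - y_i y_j). \tag{6} $$

Indeed, if i and j lie on the same side of the cut, then y_i y_j = 1 and the term vanishes. If they lie on opposite sides, then y_i y_j = −1 and the term contributes (1/2) · 2w_ij = w_ij.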
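The equivalence between the ±1 formulation (4)-(5) and the cut weight can be checked exhaustively on a small example. The sketch below uses an arbitrary four-vertex weighted graph chosen for illustration. It compares Eq. (6) with a direct count of crossing edges for all 2^4 sign vectors.

```python
from itertools import product

def cut_from_spins(w, y):
    """Eq. (6): (1/2) * sum_{i<j} w_ij (1 - y_i y_j) for y in {-1, 1}^n."""
    n = len(w)
    return sum(w[i][j] * (1 - y[i] * y[j]) for i in range(n) for j in range(i + 1, n)) / 2

def cut_direct(w, y):
    """Weight of the edges whose endpoints get different signs, i.e. W(S, S_bar)."""
    n = len(w)
    return sum(w[i][j] for i in range(n) for j in range(i + 1, n) if y[i] != y[j])

# Illustrative weighted graph on 4 vertices (symmetric, zero diagonal).
W = [[0, 3, 1, 0],
     [3, 0, 2, 4],
     [1, 2, 0, 5],
     [0, 4, 5, 0]]
spins = list(product((-1, 1), repeat=4))
assert all(cut_from_spins(W, y) == cut_direct(W, y) for y in spins)
print(max(cut_from_spins(W, y) for y in spins))  # 13.0
```

The maximum is attained by S = {0, 3}. Only edge (1, 2), of weight 2, is left uncut, and it closes both triangles of the graph.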


An alternative formulation of MAX-CUT based on 
the optimization of a quadratic over the unit hypercube 
was given by Deza and Laurent in [12]. 


Theorem 1 Given a graph G = (V, E) with |V| = n, the optimal objective function value of the MAXIMUM CUT problem is given by

$$ \max_{x \in [0,1]^n} x^T W (e - x), \tag{7} $$

where W = [w_ij]_{i,j=1}^n is the matrix of edge weights and e := [1, 1, ..., 1]^T is the vector of all ones.
Proof 1 Let

$$ f(x) = x^T W (e - x) \tag{8} $$

denote the objective function from Eq. (7). To begin with, notice that the matrix W has a zero diagonal, i.e., w_ii = 0 for all i ∈ {1, 2, ..., n}. This implies that f(x) is linear with respect to each variable. With all other coordinates fixed, f is an affine function of x_i, so moving x_i to whichever endpoint (0 or 1) is better never decreases f. Applying this to each coordinate in turn shows that there always exists an optimal solution x* of (7) such that x* ∈ {0, 1}^n. Therefore, we have shown that

$$ \max_{x \in [0,1]^n} x^T W (e - x) = \max_{x \in \{0,1\}^n} x^T W (e - x). \tag{9} $$

The next step is to show that there is a bijection between binary vectors of length n and cuts in G. Consider any binary vector x̂ ∈ {0, 1}^n. Now suppose we partition the vertex set V into two disjoint subsets V_1 := {i | x̂_i = 0} and V_2 := {i | x̂_i = 1}. Then, evaluating the objective function and using the symmetry of W, we have

$$ f(\hat{x}) = \sum_{(i,j) \in V_1 \times V_2} w_{ij}, \tag{10} $$

which is equal to W(V_1, V_2), the value of the cut defined by the partition V = V_1 ∪ V_2 (see Eq. (2) above).

Conversely, consider any partition of V into two disjoint subsets V_1, V_2 ⊆ V, that is,

$$ V = V_1 \cup V_2 \quad \text{and} \quad V_1 \cap V_2 = \emptyset. $$

Now, we can construct the vector x̂ as follows:

$$ \hat{x}_i = \begin{cases} 1, & \text{if } i \in V_2, \\ 0, & \text{if } i \in V_1. \end{cases} \tag{11} $$

Once again, evaluating the objective function on x̂, we have

$$ f(\hat{x}) = \sum_{(i,j) \in V_1 \times V_2} w_{ij}. \tag{12} $$

Hence f(x̂) = W(V_1, V_2), and we have the result.¹ Thus, we have shown the bijection between binary n-vectors and cuts in G. In summary, we have

$$ \max_{x \in [0,1]^n} x^T W (e - x) = \max_{x \in \{0,1\}^n} x^T W (e - x) = \max_{V = V_1 \cup V_2,\; V_1 \cap V_2 = \emptyset} \; \sum_{(i,j) \in V_1 \times V_2} w_{ij}. \qquad \square $$

¹Notice that the result holds even if (without loss of generality) V_1 = V and V_2 = ∅. In this case, a cut induced by (V_1, V_2) will be a maximum cut if w_ij ≤ 0 for all i, j ∈ V.
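Both steps of the proof can be exercised numerically. The sketch below evaluates f(x) = x^T W(e − x) on an illustrative four-vertex graph. It first rounds an arbitrary fractional point to a binary one, coordinate by coordinate, without decreasing f (the first step of the proof). It then checks that the best binary value equals the maximum cut weight, which is 13 for this graph.

```python
import random
from itertools import product

def f(w, x):
    """f(x) = x^T W (e - x) = sum_{i,j} w_ij x_i (1 - x_j), Eq. (8)."""
    n = len(w)
    return sum(w[i][j] * x[i] * (1 - x[j]) for i in range(n) for j in range(n))

def best_binary(w):
    """Right-hand side of Eq. (9): maximum of f over {0,1}^n (= max cut by Theorem 1)."""
    return max(f(w, x) for x in product((0, 1), repeat=len(w)))

def round_to_vertex(w, x):
    """Constructive version of the first proof step: since w_ii = 0, f is affine in
    each coordinate, so setting x_i to the better of 0 and 1 never decreases f."""
    x = list(x)
    for i in range(len(x)):
        lo = x[:i] + [0] + x[i + 1:]
        hi = x[:i] + [1] + x[i + 1:]
        x = hi if f(w, hi) >= f(w, lo) else lo
    return x

W = [[0, 3, 1, 0],
     [3, 0, 2, 4],
     [1, 2, 0, 5],
     [0, 4, 5, 0]]
rng = random.Random(0)
x = [rng.random() for _ in range(4)]          # an arbitrary point of [0,1]^4
v = round_to_vertex(W, x)
assert set(v) <= {0, 1} and f(W, v) >= f(W, x) - 1e-9
print(best_binary(W))  # 13
```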


There are several classes of graphs for which MAX-CUT is solvable in polynomial time [25]. These include planar graphs [11], weakly bipartite graphs with nonnegative edge weights [20], and graphs without K_5 minors [4]. The general problem, however, is known to be APX-complete [31]. This implies that unless P = NP, MAX-CUT does not admit a polynomial time approximation scheme [30].


Methods 


The MAXIMUM CUT problem is one of the most well-studied discrete optimization problems [27]. Since the problem is NP-hard in general, a large body of research has applied heuristic techniques to it. Before we present the new heuristic approach, we review some of this prior work.


Review of Solution Approaches 


There have been many semidefinite and continuous relaxations based on this formulation, a line of work that goes back to Lovász [28]. In 1995, Goemans and Williamson [19] used a semidefinite relaxation to achieve an approximation ratio of 0.87856. This work is significant for two reasons. The first is, of course, the drastic improvement over 0.5, the best known approximation ratio for MAX-CUT, which had not been improved in over 20 years [36]. The second, and perhaps more significant, is that until 1995 research on approximation algorithms for nonlinear programming problems had not received much attention. Motivated by the work of Goemans and Williamson, semidefinite programming techniques were applied successfully to an assortment of combinatorial optimization problems [29]. Among others, they yielded the best known approximation algorithms for GRAPH COLORING [7,26], BETWEENNESS [10], MAXIMUM SATISFIABILITY [13,19], and MAXIMUM STABLE SET [2].

As noted in [16], interior point methods have proven to be very efficient for solving the semidefinite programming relaxation. This is because methods such as the one proposed by Benson, Ye, and Zhang in [6] exploit the combinatorial structure of the relaxed problem. Other algorithms based on the nonlinear semidefinite relaxation include the work of Helmberg and Rendl [22] and of Homer and Peinado [24].

The work of Burer et al. in [8] describes the im- 
plementation of a rank-2 relaxation heuristic dubbed 
circut. This software package was shown to com- 
pute better solutions than the randomized heuristic of 
Goemans and Williamson, in general [16]. In a re- 
cent paper dating from 2002, Festa, Pardalos, Resende, 
and Ribeiro [16] implement and test six random- 
ized heuristics for MAX-CUT. These include variants 
of Greedy Randomized Adaptive Search Procedures 
(GRASP), Variable Neighborhood Search, and path-re- 
linking algorithms [35]. Their efforts resulted in im- 
proving the best known solutions for several graphs 
and quickly producing solutions that compare favor- 
ably with the method of Goemans and Williamson [19] 
and circut [8]. For several sparse instances, the 
randomized heuristics presented in [16] outperformed 
circut. 

In [25], Butenko et al. derive a “worst-out” heuristic with an approximation ratio of at least 1/3, which they refer to as the edge contraction method. They also present a computational analysis of several greedy construction heuristics for MAX-CUT, based on variations of the 0.5-approximation algorithm of Sahni and Gonzalez [36]. We now move on to describe the implementation of a new heuristic for MAX-CUT based on the new metaheuristic Continuous GRASP [23].


C-GRASP Heuristic 


The Continuous Greedy Randomized Adaptive Search Procedure (C-GRASP) is a new metaheuristic for continuous global optimization [23]. The method is an extension of the widely known discrete optimization algorithm Greedy Randomized Adaptive Search Procedure (GRASP) [15]. Preliminary results are quite promising, indicating that C-GRASP is able to quickly converge to the global optimum on standard benchmark test functions.

The traditional GRASP is a two-phase procedure which generates solutions through the controlled use of random sampling, greedy selection, and local search. For a given problem Π, let F be the set of feasible solutions for Π. Each solution X ∈ F is composed of k discrete components a_1, ..., a_k. GRASP constructs a sequence {X_i}_i of solutions for Π such that each X_i ∈ F, and returns the best solution found after all iterations. The GRASP procedure can be described as in the pseudo-code provided in Fig. 1.

The construction phase receives three parameters:
- an instance of the problem G;
- a ranking function g: A(X) → ℝ, where A(X) is the domain of feasible components a_1, ..., a_k for a partial solution X;
- a parameter 0 < α < 1.

The construction phase begins with an empty partial solution X. Assuming that |A(X)| = k, the algorithm creates a list of the best ranked αk components in A(X) and returns a uniformly chosen element x from this list. The current partial solution is augmented to include x. The procedure is repeated until the solution is feasible, i.e., until X ∈ F.


procedure GRASP(MaxIter, RandomSeed)
  f* ← −∞
  X* ← ∅
  for i = 1 to MaxIter do
    X ← ConstructionSolution(G, g, X, α)
    X ← LocalSearch(X, N(X))
    if f(X) > f* then
      X* ← X
      f* ← f(X)
    end if
  end for
  return X*
end procedure GRASP


Maximum Cut Problem, MAX-CUT, Figure 1
GRASP for maximization


The intensification phase consists of a hill-climbing procedure. Given a solution X ∈ F, let N(X) be the set of solutions that can be found from X by changing one of the components a ∈ X. Then N(X) is called the neighborhood of X. The improvement algorithm consists of finding, at each step, the element X* such that

$$ X^* := \arg\max_{X' \in N(X)} f(X'), $$

where f: F → ℝ is the objective function of the problem. At the end of each step we make the assignment X ← X* if f(X*) > f(X). The algorithm will eventually reach a local optimum, i.e., a solution X* such that f(X*) ≥ f(X′) for all X′ ∈ N(X*). X* is returned as the best solution from the iteration, and the best solution from all iterations is returned as the overall GRASP solution. GRASP has been applied to many discrete problems with excellent results. For an annotated bibliography of GRASP applications, the reader is referred to the work of Festa and Resende in [17].
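Fig. 1 can be illustrated with one possible instantiation of GRASP for MAX-CUT in Python. The following choices are assumptions, not taken from the text. A candidate is a (vertex, side) pair, ranked by the cut weight it adds to the partial solution. The restricted list holds the best ⌈αk⌉ candidates. The local search is best-improvement hill climbing in the one-flip neighborhood, which moves a single vertex to the other side.

```python
import math
import random

def cut_value(w, side):
    n = len(w)
    return sum(w[i][j] for i in range(n) for j in range(i + 1, n) if side[i] != side[j])

def construct(w, alpha, rng):
    """Greedy randomized construction: rank (vertex, side) candidates by the cut weight
    they add to the partial solution and pick uniformly among the best ceil(alpha*k)."""
    n = len(w)
    side = [None] * n
    while None in side:
        cands = []
        for v in range(n):
            if side[v] is None:
                for s in (0, 1):
                    gain = sum(w[v][u] for u in range(n)
                               if side[u] is not None and side[u] != s)
                    cands.append((gain, v, s))
        cands.sort(reverse=True)
        _, v, s = rng.choice(cands[:max(1, math.ceil(alpha * len(cands)))])
        side[v] = s
    return side

def local_search(w, side):
    """Hill climbing in the one-flip neighborhood N(X): flip the vertex with the best
    positive gain until no flip improves the cut."""
    n = len(w)
    while True:
        # Flipping v gains its same-side weight and loses its cross weight.
        gains = [sum(w[v][u] for u in range(n) if u != v and side[u] == side[v])
                 - sum(w[v][u] for u in range(n) if side[u] != side[v]) for v in range(n)]
        v = max(range(n), key=gains.__getitem__)
        if gains[v] <= 0:
            return side
        side[v] = 1 - side[v]

def grasp_max_cut(w, max_iter=50, alpha=0.3, seed=0):
    """Multi-start loop of Fig. 1: keep the best locally optimal cut found."""
    rng = random.Random(seed)
    best_side, best_val = None, -math.inf
    for _ in range(max_iter):
        side = local_search(w, construct(w, alpha, rng))
        val = cut_value(w, side)
        if val > best_val:
            best_side, best_val = side[:], val
    return best_val, best_side

W = [[0, 3, 1, 0], [3, 0, 2, 4], [1, 2, 0, 5], [0, 4, 5, 0]]
best_val, best_side = grasp_max_cut(W, max_iter=20, alpha=0.3, seed=1)
print(best_val, best_side)
```

At a one-flip local optimum, every vertex has at least as much weight across the cut as on its own side. Each local search result therefore already has weight at least half the total edge weight.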

Like GRASP, the C-GRASP framework is a multi-start procedure consisting of a construction phase and a local search [14]. Specifically, C-GRASP is designed to solve continuous problems subject to box constraints. The feasible domain is given as the n-dimensional rectangle S := {x = (x_1, x_2, ..., x_n) ∈ ℝ^n : l ≤ x ≤ u}, where l, u ∈ ℝ^n are such that l_i < u_i for i = 1, 2, ..., n. Pseudo-code for the basic C-GRASP is provided in Fig. 2. Notice that the algorithm takes the following as input:
- the dimension n;
- lower and upper bounds l and u;
- the objective function f;
- the parameters MaxIters, MaxNumIterNoImprov, NumTimesToRun and MaxDirToTry;
- a number α ∈ (0, 1).

To begin with, the optimal objective function value f* is initialized to −∞. The procedure then enters the main body of the algorithm, the for loop in lines 2-21. The value NumTimesToRun is the total number of C-GRASP iterations that will be performed. Further initialization then takes place. The current solution x is initialized as a random point inside the hyperrectangle, generated by a function UnifRand([l, u)) which is uniform² onto [l, u). The parameter h, which controls the discretization of the search space, is set to 1. Next, the construction and local search phases are entered. In line 9, the new solution is compared to the current best solution. If the objective function value of the current solution dominates the incumbent, then the current solution replaces the incumbent and NumIterNoImprov is set to 0. This parameter controls when the discretization measure h is reduced. After a total of MaxNumIterNoImprov iterations in which no solution better than the current best is found, h is set to h/2 and the loop returns to line 6. By adjusting the value of h, the algorithm can first locate general areas of the search space which contain high quality solutions. It can then narrow down the search in those particular regions. The best solution found after a total of NumTimesToRun iterations is returned.

²This is the “typical” definition of a uniform distribution. That is, P: X → ℝ is uniform onto [A, B) if, for any subinterval I ⊆ [A, B), the measure of P⁻¹(I) is proportional to the length of I.


procedure C-GRASP(n, l, u, f(·), MaxIters, MaxNumIterNoImprov, NumTimesToRun, MaxDirToTry, α)
1   f* ← −∞
2   for j = 1 to NumTimesToRun do
3     x ← UnifRand([l, u))
4     h ← 1
5     NumIterNoImprov ← 0
6     for Iter = 1 to MaxIters do
7       x ← ConstructGreedyRandomized(x, f(·), n, h, l, u, α)
8       x ← LocalSearch(x, f(·), n, h, l, u, MaxDirToTry)
9       if f(x) > f* then
10        x* ← x
11        f* ← f(x)
12        NumIterNoImprov ← 0
13      else
14        NumIterNoImprov ← NumIterNoImprov + 1
15      end if
16      if NumIterNoImprov > MaxNumIterNoImprov then
17        h ← h/2
18        NumIterNoImprov ← 0
19      end if
20    end for
21  end for
22  return x*
end procedure C-GRASP


Maximum Cut Problem, MAX-CUT, Figure 2
C-GRASP pseudo-code adapted from [23]

The construction phase of the C-GRASP takes as input the randomly generated solution x ∈ S (see Fig. 2, line 3). Beginning with all coordinates unfixed, the method performs a line search on each unfixed coordinate direction of x, holding the other n − 1 directions constant. The objective function values resulting from the line search along each coordinate direction are stored in a vector, say V. An element v_i ∈ V is then selected uniformly at random from the largest (1 − α)·100% of the elements of V, and the corresponding coordinate direction i is fixed. This process repeats until all n coordinates of x have been fixed. The resulting solution is returned as the construction-phase solution of the current iteration. For a slightly more detailed explanation of this procedure, the reader is referred to [23].
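A minimal Python sketch of the construction phase as described above follows. It makes two assumptions. First, the line search along coordinate i evaluates f on the grid l_i, l_i + h, l_i + 2h, ... inside [l_i, u_i]. Second, the chosen coordinate is moved to its line-search optimum before being fixed. The exact rules of [23] may differ, for instance in how the restricted candidate list is built. The example objective is f(x) = x^T W(e − x) from Theorem 1, on an illustrative four-vertex graph.

```python
import math
import random

def line_search(f, x, i, h, l, u):
    """Maximize f along coordinate i over the grid l_i, l_i + h, ... <= u_i, other
    coordinates held fixed. Returns (best value, best coordinate value)."""
    steps = int(math.floor((u[i] - l[i]) / h + 1e-9))
    best_val, best_z = -math.inf, x[i]
    for k in range(steps + 1):
        y = list(x)
        y[i] = l[i] + k * h
        val = f(y)
        if val > best_val:
            best_val, best_z = val, y[i]
    return best_val, best_z

def construct_greedy_randomized(x, f, h, l, u, alpha, rng):
    """Line-search every unfixed coordinate, pick one uniformly among the top
    (1 - alpha)*100% by line-search value, move it to its line-search optimum and fix it;
    repeat until all coordinates are fixed."""
    x = list(x)
    unfixed = list(range(len(x)))
    while unfixed:
        results = {i: line_search(f, x, i, h, l, u) for i in unfixed}
        ranked = sorted(unfixed, key=lambda i: results[i][0], reverse=True)
        k = max(1, math.ceil((1 - alpha) * len(ranked)))
        i = rng.choice(ranked[:k])
        x[i] = results[i][1]
        unfixed.remove(i)
    return x

W = [[0, 3, 1, 0], [3, 0, 2, 4], [1, 2, 0, 5], [0, 4, 5, 0]]

def f_cut(x):
    """Objective of Theorem 1, f(x) = x^T W (e - x)."""
    n = len(x)
    return sum(W[i][j] * x[i] * (1 - x[j]) for i in range(n) for j in range(n))

rng = random.Random(1)
x0 = [rng.random() for _ in range(4)]
x1 = construct_greedy_randomized(x0, f_cut, 1.0, [0.0] * 4, [1.0] * 4, 0.5, rng)
print(x1, f_cut(x1))
```

With h = 1, the grid on [0, 1] is {0, 1}, so the construction returns a binary vector, i.e., a cut. Because f is affine in each coordinate, no step can decrease f.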

The local search phase simulates the role of calculating the gradient of the objective function f(·). Gradients are not used in C-GRASP because they are often difficult to compute and lead to slow computation times. Instead, the gradient is approximated as follows. Given the construction phase solution x, the local search generates a set of directions and determines in which direction (if any) the objective function improves.

The directions are calculated according to a bijective function T which maps the interval of integers [1, 3^n) ∩ ℤ onto their balanced ternary representation.
Maximum Cut Problem, MAX-CUT, Table 1
Parameters used for C-GRASP

NumTimesToRun = 20        MaxIters = 1000
MaxNumIterNoImprov = 1


Recall that n is the dimension of the problem under consideration, so that T: [1, 3^n) ∩ ℤ → {−1, 0, 1}^n. Clearly, the number of search directions grows exponentially with n. Therefore, only MaxDirToTry directions are generated³ and tested on the current solution. For each direction d, the point x̂ := x + hd is constructed and f(x̂) is computed. Recall that h is the parameter which controls the density of the search space discretization. If the constructed point x̂ ∈ S has a more favorable objective value than the current point x, then x̂ replaces x and the process continues. The phase terminates when a locally optimal point x* ∈ S is found. The point x* is said to be locally optimal if f(x*) ≥ f(x* + hd) for each of the MaxDirToTry directions d tested. Again, for a slightly more in-depth description of this procedure, the reader should see the paper by Hirsch et al. [23].
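The direction generator and the local search can be sketched in Python as follows. The mapping `ternary_direction` reads the base-3 digits of k, interpreting digit 2 as −1. This gives a simple bijection from [1, 3^n) ∩ ℤ onto the nonzero vectors of {−1, 0, 1}^n. It is assumed here for illustration and may not coincide with the exact map T of [23]. Directions are sampled uniformly at random, and the search moves to the first improving point x + hd that stays in the box.

```python
import random

def ternary_direction(k, n):
    """Map integer k in [1, 3^n) to a nonzero vector in {-1, 0, 1}^n: base-3 digits,
    least significant first, with digit 2 read as -1 (an assumed form of T)."""
    d = []
    for _ in range(n):
        k, r = divmod(k, 3)
        d.append(-1 if r == 2 else r)
    return d

def local_search(x, f, h, l, u, max_dir_to_try, rng):
    """Approximate-gradient local search: sample MaxDirToTry directions uniformly at
    random, move to x + h*d whenever it stays in the box and improves f, and stop when
    none of the sampled directions improves."""
    n = len(x)
    x, fx = list(x), f(x)
    improved = True
    while improved:
        improved = False
        for _ in range(max_dir_to_try):
            d = ternary_direction(rng.randrange(1, 3 ** n), n)
            y = [xi + h * di for xi, di in zip(x, d)]
            if all(li <= yi <= ui for li, yi, ui in zip(l, y, u)):
                fy = f(y)
                if fy > fx:
                    x, fx, improved = y, fy, True
                    break
    return x, fx

def concave(x):
    return -(x[0] - 2) ** 2 - (x[1] + 1) ** 2

rng = random.Random(0)
x_star, f_star = local_search([0, 0], concave, 1, [-3, -3], [3, 3], 500, rng)
print(x_star, f_star)
```

Sampling directions instead of enumerating all 3^n − 1 of them keeps the cost of each pass at MaxDirToTry function evaluations. The price is that local optimality is only certified with respect to the directions actually tried.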


Computational Results The proposed procedure was implemented in the C++ programming language and compiled using Microsoft® Visual C++ 6.0. It was tested on a PC equipped with a 1700 MHz Intel® Pentium® M processor and 1 GB of RAM, running Microsoft® Windows® XP. The C-GRASP parameters used are provided in Table 1. First, we tested the C-GRASP on 10 instances produced by the Balasundaram-Butenko problem generator in [3]. Though these problems are relatively small, they have proven quite formidable for the Multilevel Coordinate Search (MCS) black-box optimization algorithm. We also tested the C-GRASP on 12 instances from the TSPLIB [34] collection of test problems for the TRAVELING SALESMAN PROBLEM. These problems are also used as benchmark problems for testing MAX-CUT heuristics [19].

For further comparison, all instances were also tested with the rank-2 relaxation heuristic circut [8] and with a simple 2-exchange local search heuristic, outlined in the pseudo-code of Fig. 3. The method receives two inputs: a parameter MaxIter, indicating the maximum number of iterations to be performed, and G = (V, E), the instance of the problem. A maximum spanning tree of G is found using Kruskal's algorithm [1]. Due to its natural bipartite structure, the spanning tree provides a feasible solution: coloring the tree with two colors places every tree edge in the cut. A swap-based local improvement method is then applied to this solution in line 5. The local improvement works as follows. For all pairs of vertices (u, v) such that u ∈ S and v ∈ S̄, a swap is performed, that is, we place u in S̄ and v in S. If the objective function is improved, the swap is kept; otherwise, we undo the swap and examine the next (u, v) pair. The local search was tested on the same PC as the C-GRASP. The circut heuristic was compiled using Compaq® Visual Fortran on a PC equipped with a 3.60 GHz Intel® Xeon® processor and 3.0 GB of RAM, running Windows® XP.

³Uniformly at random.


procedure LocalSearch(G, MaxIter)
1   f* ← −∞
2   x* ← ∅
3   for j = 1 to MaxIter do
4     x ← KruskalMST(x, G)
5     x ← LocalImprove(x, G)
6     if f(x) > f* then
7       x* ← x
8       f* ← f(x)
9     end if
10  end for
11  return x*
end procedure LocalSearch


Maximum Cut Problem, MAX-CUT, Figure 3
The 2-exchange local search routine
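The 2-exchange local search of Fig. 3 can be sketched in Python as below. Edges are given as (weight, i, j) triples. Kruskal's algorithm, with edges taken in decreasing weight order, gives a maximum spanning tree, and a 2-coloring of that tree gives the starting cut. Two details are assumptions of this sketch: swap passes repeat until no swap helps, and only a single iteration runs rather than MaxIter of them. The text does not say what varies between iterations of the loop in Fig. 3.

```python
def kruskal_max_spanning_tree(n, edges):
    """Kruskal's algorithm with edges in decreasing weight order, giving a maximum
    spanning tree (a forest if G is disconnected). edges: list of (w, i, j)."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = []
    for w, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

def two_color(n, tree):
    """Trees are bipartite: a DFS 2-coloring puts every tree edge across the cut."""
    adj = [[] for _ in range(n)]
    for i, j in tree:
        adj[i].append(j)
        adj[j].append(i)
    side = [None] * n
    for s in range(n):
        if side[s] is None:
            side[s] = 0
            stack = [s]
            while stack:
                v = stack.pop()
                for u in adj[v]:
                    if side[u] is None:
                        side[u] = 1 - side[v]
                        stack.append(u)
    return side

def cut_value(edges, side):
    return sum(w for w, i, j in edges if side[i] != side[j])

def local_improve(n, edges, side):
    """2-exchange: for u in S (side 0) and v in S_bar (side 1), swap them and keep the
    swap only if the cut weight increases; repeat passes until no swap helps."""
    best = cut_value(edges, side)
    improved = True
    while improved:
        improved = False
        for u in range(n):
            for v in range(n):
                if side[u] == 0 and side[v] == 1:
                    side[u], side[v] = 1, 0
                    val = cut_value(edges, side)
                    if val > best:
                        best, improved = val, True
                    else:
                        side[u], side[v] = 0, 1          # undo the swap
    return side, best

def two_exchange_max_cut(n, edges):
    side = two_color(n, kruskal_max_spanning_tree(n, edges))
    return local_improve(n, edges, side)

edges = [(3, 0, 1), (1, 0, 2), (2, 1, 2), (4, 1, 3), (5, 2, 3)]
print(two_exchange_max_cut(4, edges))  # ([0, 1, 1, 0], 13)
```

A swap never changes |S|. The result is therefore a local optimum only with respect to equal-size exchanges, which is what the 2-exchange neighborhood describes.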

Table 2 provides computational results of the algorithms on the 10 Balasundaram-Butenko instances from [3]. The first three columns provide the instance name, the number of vertices and the optimal solution. The solutions from the heuristics are provided next. The solutions from the Multilevel Coordinate Search algorithm were taken from [3]. For all of these instances, the C-GRASP, circut and the local search each found their best solutions in a fraction of a second. Computing times were not listed for the MCS algorithm in [3]. Notice that the 2-exchange local search computed optimal solutions for each of these instances. It was followed closely by circut, which found optimal cuts for all but one problem. Among the continuous heuristics, the C-GRASP found optimal solutions for 5 of the 10 instances, while the MCS procedure produced optimal cuts for only 1 instance. For the 5 instances where C-GRASP produced suboptimal solutions, the average deviation from the optimum was 3.54%.


Maximum Cut Problem, MAX-CUT, Table 2
Comparative results from the Balasundaram-Butenko instances from [3]

Table 3 shows results of the C-GRASP, local search, and circut heuristics when applied to 12 instances from the TSPLIB collection of test problems for the TRAVELING SALESMAN problem [34]. The first two columns provide the instance name and the size of the vertex set |V|. Next, the solutions are provided along with the computing time required by the respective heuristic. Notice that for all 12 instances, the three heuristics found the same solutions. In terms of computation time, the simplest heuristic, the 2-exchange local search, appears to be the best performing of the three methods tested. The rank-2 relaxation algorithm circut is also very fast, requiring only 2.99 s on average to compute the solution. On the other hand, the C-GRASP method did not scale as well as the others: its solution time increases drastically once the number of vertices exceeds 48.

This is not particularly surprising. The slower computation of C-GRASP relative to the discrete heuristics is explained by the fact that C-GRASP is a black-box method, which takes into account no information about the problem other than the objective function. In contrast, the local search and circut specifically exploit the combinatorial structure of the underlying problem, which allows them to compute high-quality solutions quickly.


Maximum Cut Problem, MAX-CUT, Table 3
Comparative results from TRAVELING SALESMAN PROBLEM instances [34]

Instance    |V|   C-GRASP   Time (s)   Local search   Time (s)   circut    Time (s)
burma14      14       283                      283                    283
gr17         17     24986                    24986                  24986
bays29       29     53990                    53990                  53990
dantzig42    42     42638                    42638                  42638
gr48         48    320277                   320277                 320277
hk48         48    771712                   771712                 771712
gr96         96    105328                   105328                 105328
kroA100     100   5897368                  5897368                5897368
kroB100     100   5763020                  5763020                5763020
kroC100     100   5890745                  5890745                5890745
kroD100     100   5463250                  5463250                5463250
kroE100     100   5986587      69.64       5986587       0.03     5986587     2.500


Conclusions


In this paper, we implemented a new metaheuristic for the MAXIMUM CUT problem. In particular, we proposed the use of a continuous greedy randomized adaptive search procedure (C-GRASP) [23] for a continuous formulation of the problem. To our knowledge, this is the first application of C-GRASP to continuous formulations of discrete optimization problems. Numerical results indicate that the procedure is able to compute optimal solutions for problems of relatively small size. However, the method becomes inefficient on problems approaching 100 nodes. The main reason for this is that C-GRASP is a black-box method: it does not take advantage of any information about the problem structure, and its only input is a mechanism to compute the objective function. A natural extension of the work presented here is to enhance the C-GRASP framework to exploit the structure of the problem at hand. Using a priori information about the problem being considered, one could modify the algorithm to incorporate these properties, which would presumably reduce the required computation time.


See also 
Optimization


Maximum Entropy and Game Theory


Abstract


In decision making under uncertainty an important 
step is uncertainty quantification. Game theory has 
been traditionally used since it injects robustness into 
the decision process. Another popular framework is 
that of maximum entropy. The purpose of this article 
is to briefly explain the two solution concepts and point 
out situations in which they are identical. 


Background 


Consider the following optimization problem:

$$\inf_{x \in X} F(x), \qquad F(x) := \int_{\Omega} f(x,\omega)\, dP(\omega), \tag{1}$$


where $x$ is the decision vector and $\omega$ is a vector representing the random parameters, which are distributed according to the probability measure $P$. We are not concerned here with the exact properties that $f(\cdot,\cdot)$ must have for the problem to be well defined; the interested reader is referred to [2], where the properties of this type of problem are made more precise. Instead, we give two well-known and well-studied examples of this formulation. The first is the so-called two-stage recourse problem and the second is the chance-constrained formulation. The former can be formulated as in Eq. (1) with the following definition of the objective function:

$$f(x,\omega) := f_d(x_1) + f_u(x_1,\omega), \tag{2}$$

where

$$f_u(x_1,\omega) := \inf\left\{ f\big(x_1, x_2(\omega), \omega\big) \;\middle|\; x_2(\omega) \in X_2(\omega, x_1) \right\}. \tag{3}$$


In Eq. (2) the objective function is split into two parts: the deterministic part ($f_d$) and the uncertain part ($f_u$) of the problem. The decision to be taken is $x_1$. The full consequences of following a particular strategy are not known exactly, since the true cost depends on the solution of Eq. (3). The objective is therefore to find the decision $x_1$ that is best on average.
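To make Eqs. (1)-(3) concrete, the Python sketch below estimates $F(x)$ by a sample average over a finite list of scenarios and picks the first-stage decision that is best on average. It assumes a hypothetical newsvendor-style instance that is not from the text: a linear first-stage cost $f_d(x_1) = c\,x_1$ and a second stage that covers any shortfall $\omega - x_1$ at unit cost $q$, so the infimum in Eq. (3) has the closed form $q \max(\omega - x_1, 0)$. The names `F_hat`, `c`, `q` and the demand scenarios are illustrative assumptions.

```python
from statistics import mean

def f_d(x1: float, c: float) -> float:
    """Deterministic first-stage cost; assumed linear, c * x1."""
    return c * x1

def f_u(x1: float, omega: float, q: float) -> float:
    """Recourse cost of Eq. (3) for the assumed instance: the cheapest
    feasible second-stage decision is x2 = max(omega - x1, 0), bought at
    unit cost q, so the infimum has this closed form."""
    return q * max(omega - x1, 0.0)

def F_hat(x1: float, scenarios: list[float], c: float = 1.0, q: float = 3.0) -> float:
    """Sample-average estimate of F(x) in Eq. (1), with f = f_d + f_u as in
    Eq. (2) and P replaced by equally weighted scenarios."""
    return mean(f_d(x1, c) + f_u(x1, w, q) for w in scenarios)

if __name__ == "__main__":
    demand = [2.0, 4.0, 6.0, 8.0]
    # Pick the first-stage decision that is best on average over a small grid.
    best_value, best_x = min((F_hat(x, demand), x) for x in [0, 2, 4, 6, 8])
    print(best_x, best_value)
```

With $c = 1$ and $q = 3$, the best grid point is $x_1 = 6$, which is the smallest demand value at which the empirical distribution function reaches $(q - c)/q = 2/3$, as expected for this classical newsvendor structure.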

A different decision model is given by chance-constrained programming problems. These can be formulated as follows:

$$\inf_{x \in X} \left\{ f(x) \;\middle|\; \Pr\big(g(x,\omega) \ge 0\big) \ge \alpha \right\},$$

where we optimize an objective function subject to constraints that must be satisfied with probability at least $\alpha$, a prescribed threshold.
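For a fixed decision $x$, a chance constraint of this kind can be checked empirically by counting the scenarios in which $g(x,\omega) \ge 0$ holds. The Python sketch below does exactly that; the constraint function, the scenario list and the function names are hypothetical illustrations, and a finite sample only estimates the true probability.

```python
from typing import Callable, Sequence

def empirical_probability(g: Callable[[float, float], float], x: float,
                          scenarios: Sequence[float]) -> float:
    """Fraction of scenarios omega for which g(x, omega) >= 0."""
    hits = sum(1 for w in scenarios if g(x, w) >= 0)
    return hits / len(scenarios)

def satisfies_chance_constraint(g: Callable[[float, float], float], x: float,
                                scenarios: Sequence[float], alpha: float) -> bool:
    """Empirical version of Pr(g(x, omega) >= 0) >= alpha."""
    return empirical_probability(g, x, scenarios) >= alpha

if __name__ == "__main__":
    # Hypothetical constraint: capacity x must cover demand omega.
    g = lambda x, w: x - w
    demand = [2.0, 4.0, 6.0, 8.0]
    print(empirical_probability(g, 6.0, demand))
```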

These two models have been widely studied and have found many applications where traditional optimization is used. It is also evident that their usefulness depends on our ability to provide a reasonable description of the uncertainties.

In order to provide a description of the uncertainties, a technique based on moment matching can be used. Under this framework we assume that the decision maker cannot provide an exact description of the distribution but only knows some of its moments. The problem is to recover a meaningful probability measure given this knowledge: given a vector of functions $m(\omega) = [m_1(\omega), \dots, m_n(\omega)]$ and a vector of scalars $\mu = [\mu_1, \dots, \mu_n]$, find a $P$ such that:

$$\int_{\Omega} m_i(\omega)\, dP(\omega) = \mu_i, \quad i = 1, \dots, n, \qquad \int_{\Omega} dP(\omega) = 1, \qquad P(\omega) \ge 0 \ \text{a.e.}, \tag{4}$$

where $\Omega$ is a compact subset of $\mathbb{R}^n$. By $\mathcal{P}$ we denote the set of all finite signed measures defined on the $\sigma$-field $\mathcal{F}$ of $\Omega$. The vector $m(\omega)$ represents the (generalized) moments of the distribution. The problem in Eq. (4) is the so-called generalized Hausdorff moment problem. The aim is to recover a compactly supported distribution from a finite number of its general, not necessarily power, moments. This is a variation of the classical moment problem formulated by Stieltjes. In [4] one can find a comprehensive summary of the main results when $\Omega = [0,1]$. Prékopa [13] provides an excellent summary of results that are especially relevant in stochastic optimization problems.


Methods 


Solving optimization problems where the uncertainty 
is only known through its moments requires some kind 
of regularization in order to fix the probability measure 
with which the optimization is to be done. Two popular 
frameworks are game theoretic and maximum entropy 
approaches. Under the game theory framework one se- 
lects the distribution with the worst case realization of 
the uncertainties. When a maximum entropy solution 
is sought, one optimizes with respect to the distribution 
with the maximum uncertainty. 


Game Theory Approach 


When $P$ is unknown, or not known exactly, the decision maker assumes that if strategy $x$ is followed, then the consequences of following this strategy will be decided by some law of Nature. Motivated by the application-oriented requirement for robust decision making, we assume that Nature is antagonistic. If we decide to follow strategy $x$, then Nature will follow strategy $P^*$, the solution of the following optimization problem:

$$\Phi(x) = \sup_{P \in \mathcal{P}} \int_{\Omega} f(x,\omega)\, dP(\omega), \tag{5}$$

where $\Phi(x)$ represents the value (outcome) of the game if strategy $x$ is followed. Obviously $\Phi(x) \ge F(x)$ for a given $x \in X$ and for all $P \in \mathcal{P}$. Therefore, after we minimize Eq. (5) over $x$, we are guaranteed to attain a value that is at least as good as $\Phi(x)$ (i.e., a cost no larger than $\Phi(x)$), whatever strategy Nature decides to follow. The robustness property of the minimax strategy originates from this property. We thus reformulate Eq. (1) as follows:

$$\inf_{x \in X} \sup_{P \in \mathcal{P}} \int_{\Omega} f(x,\omega)\, dP(\omega). \tag{6}$$


This approach has its origins in game theory and has been used extensively in many areas of optimization. See, for example, [9] for an excellent introduction to game theory; applications of minimax, especially in economics and finance, can be found in [14]. Numerical algorithms to solve Eq. (6) have been proposed in [5,6]. The general idea of these algorithms is to solve the inner maximization problem using results for general Chebyshev inequalities [4,17]. However, these methods require several global optimization steps to be performed at each iteration in order to identify the support of the measure that maximizes Eq. (5). Moreover, it is usually assumed that the max-function is convex. Algorithms based on stochastic quasi-gradient methods were proposed in [4,17]. Recently, Shapiro et al. [15] proposed the use of a reference distribution in order to reformulate the minimax problem into a standard stochastic programming problem. Bertsimas et al. [1] and Lasserre [11] proposed a semidefinite formulation of the inner maximization problem in Eq. (6).


Maximum Entropy Formulation 


The formulation of the moment problem using the maximum entropy principle was initiated by Jaynes [8]. The derivations in this section are more or less standard (see, e.g., [12]). We will assume that a multivariate continuous density is postulated; discrete distributions share similar properties. Under these assumptions the maxent formulation is given by:

$$\inf_{p \in \mathcal{P}_c} Q(p,h) = \int_{\Omega} p(\omega) \ln \frac{p(\omega)}{h(\omega)}\, d\omega \quad \text{s.t.} \quad \int_{\Omega} m_i(\omega)\, p(\omega)\, d\omega = \mu_i, \quad i = 0, \dots, n, \tag{7}$$

where $\mathcal{P}_c$ denotes the restriction of $\mathcal{P}$ to all measures that are absolutely continuous with respect to $d\omega$; we write $\nu \ll \mu$ to mean that $\nu$ is absolutely continuous with respect to $\mu$. The function in the objective is the so-called Kullback-Leibler divergence (see, e.g., [3]), which serves as a kind of distance measure between $h(\omega)$ and $p(\omega)$; the former is a distribution that is assumed to be known. The objective is to find a p.d.f. with the prescribed moments that is as close to $h$ as possible. If such a function is not known, then we take $h(\omega) = 1$ (i.e., the uniform distribution) and the problem becomes the classical maximum entropy formulation. For normalized $p$ and $h$, the convex functional $Q(p,h)$ is nonnegative and is zero if and only if $p = h$ a.e. It is also worth mentioning that in general $Q(p,h) \ne Q(h,p)$, so $Q$ is not a metric in the strict sense. These properties of $Q(p,h)$ are well known; we refer the interested reader to [3] for more properties of the entropy function. We assume that $m_0(\omega) = \mu_0 = 1$, so that the constraint with $i = 0$ is the normalization condition. Note that considering general moments, as opposed to power moments, allows us to impose fractile constraints; this property is important in many applications.

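While the problem in Eq. (7) is a convex optimization problem, it is posed over an infinite-dimensional space of densities and cannot be handled directly by standard numerical algorithms. For this reason one considers the dual of Eq. (7). The Lagrangian associated with Eq. (7) is given by:

$$L(p,\lambda) = \int_{\Omega} p(\omega) \ln \frac{p(\omega)}{h(\omega)}\, d\omega + \sum_{i=0}^{n} \lambda_i \left( \int_{\Omega} m_i(\omega)\, p(\omega)\, d\omega - \mu_i \right).$$

The dual problem of Eq. (7) is given by:

$$\sup_{\lambda} D(\lambda) = \sup_{\lambda} \inf_{p \in \mathcal{P}_c} L(p,\lambda). \tag{8}$$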


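It is well known that the inner minimization in Eq. (8) can be done explicitly using functional derivatives [12,16]:

$$\begin{aligned} L(p + \delta p, \lambda) &= \int_{\Omega} \big(p(\omega) + \delta p(\omega)\big) \ln \left[ \frac{p(\omega)}{h(\omega)} \left( 1 + \frac{\delta p(\omega)}{p(\omega)} \right) \right] d\omega \\ &\quad + \sum_{i=0}^{n} \lambda_i \left( \int_{\Omega} m_i(\omega) \big(p(\omega) + \delta p(\omega)\big)\, d\omega - \mu_i \right) \\ &\approx L(p,\lambda) + \int_{\Omega} \left[ 1 + \ln \frac{p(\omega)}{h(\omega)} \right] \delta p(\omega)\, d\omega + \sum_{i=0}^{n} \int_{\Omega} \lambda_i m_i(\omega)\, \delta p(\omega)\, d\omega, \end{aligned}$$

where, to get the last (approximate) equality, we assumed that $\delta p$ is small, used the approximation $\ln(1 + \epsilon) \approx \epsilon$ (valid for small $\epsilon$) and ignored second-order terms. The stationary points of the Lagrangian must therefore satisfy:

$$p(\omega) = h(\omega) \exp\left( -1 - \sum_{i=0}^{n} \lambda_i m_i(\omega) \right). \tag{9}$$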


Using the normalization condition (recall that $m_0 = 1$) we have:

$$Z = \exp(1 + \lambda_0) = \int_{\Omega} h(\omega) \exp\left( -\sum_{i=1}^{n} \lambda_i m_i(\omega) \right) d\omega.$$

Using the equation above we can write Eq. (9) as follows:

$$p(\omega) = \frac{h(\omega)}{Z} \exp\left( -\sum_{i=1}^{n} \lambda_i m_i(\omega) \right). \tag{10}$$

Finally, using Eq. (10) and the normalization constraint in Eq. (8), we find the following explicit form for the dual problem:

$$\sup_{\lambda} D(\lambda), \qquad D(\lambda) = -\ln Z(\lambda) - \sum_{i=1}^{n} \lambda_i \mu_i. \tag{11}$$


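The dual formulation given above is more useful than the primal problem, since the dual problem is a finite-dimensional, unconstrained maximization of a concave function of $\lambda = (\lambda_1, \dots, \lambda_n)$ and is therefore amenable to conventional optimization algorithms.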
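As an illustration of Eqs. (10) and (11), the Python sketch below maximizes the dual numerically in the simplest possible setting. It assumes choices that are not from the text: $\Omega = [0,1]$, $h(\omega) = 1$, a single moment $m_1(\omega) = \omega$ with target $\mu_1$, and midpoint-rule quadrature; the name `maxent_one_moment` is hypothetical. Because $D'(\lambda_1) = \mathbb{E}_p[m_1] - \mu_1$ and $D''(\lambda_1) = -\operatorname{Var}_p[m_1] < 0$, a plain Newton iteration can be used.

```python
import math

def maxent_one_moment(mu: float, n_grid: int = 2000, tol: float = 1e-10, max_iter: int = 100):
    """Maximize D(lam) = -ln Z(lam) - lam * mu (Eq. (11)) for one moment
    m_1(w) = w on Omega = [0, 1] with h(w) = 1, using midpoint quadrature.
    Returns (lam, grid, p) where p holds the density values of Eq. (10)."""
    dw = 1.0 / n_grid
    grid = [(j + 0.5) * dw for j in range(n_grid)]
    lam = 0.0
    for _ in range(max_iter):
        weights = [math.exp(-lam * w) for w in grid]
        Z = sum(weights) * dw                          # Z(lam) = int exp(-lam w) dw
        p = [wt / Z for wt in weights]                 # Eq. (10) with h = 1
        mean = sum(pi * w for pi, w in zip(p, grid)) * dw
        var = sum(pi * (w - mean) ** 2 for pi, w in zip(p, grid)) * dw
        grad = mean - mu                               # D'(lam) = E_p[m_1] - mu_1
        if abs(grad) < tol:
            break
        lam += grad / var                              # Newton step, since D''(lam) = -Var_p[m_1]
    return lam, grid, p

if __name__ == "__main__":
    lam, grid, p = maxent_one_moment(0.3)
    print(lam)
```

For $\mu_1 = 0.3$ the result is a truncated exponential density on $[0,1]$ whose mean matches $\mu_1$; the target must lie strictly inside $(0,1)$ for a finite maximizer to exist. With several moments the same idea applies, but the Newton step then needs the covariance matrix of $m(\omega)$ under $p$.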


Relationships Between Game Theory 
and Maximum Entropy 


The minimax approach has proven to be a prudent method for problems where the nature of the uncertainty is not known exactly. We will approach the problem somewhat differently, by dispensing with the usual assumptions of convexity but allowing the decision maker to adopt mixed strategies. Such an approach (in the context of stochastic programming) has been described by Kolbin [10] but has not received much attention. The advantage of allowing mixed strategies is that the problem exhibits a saddle point. Topsøe [18] showed that if the decision problem has a specific structure (outlined below), then the solution of the maximum entropy problem and that of the zero-sum game are dual to each other. Recently, Grünwald et al. [7] have further developed this approach so that it can be applied to more general games. This generalization, however, has been achieved at the expense of defining more general entropy functionals, which do not, in general, lend themselves to numerical algorithms. We believe this relationship is very interesting and, under certain conditions, it can be used as an additional motivation for adopting the maximum entropy principle.

Let $\Omega$ be a compact subset of $\mathbb{R}^n$ and let $\mathcal{F}$ be the $\sigma$-field generated by $\Omega$. We will use $\omega$ to denote a random vector whose distribution is known to belong to a family $\mathcal{P}$. The following formulation of a stochastic programming problem, which is meaningless as stated because $\omega$ is not known when $x$ is chosen,

$$\inf_{x} f(x,\omega) \quad \text{s.t.} \quad g_i(x,\omega) \le 0, \quad i = 1, \dots, k,$$

can be placed into a pertinent form by formulating it as a two-person zero-sum game $G = (x, \omega, q)$. The first player is the Decision Maker (DM), who selects vectors $x \in X \subset \mathbb{R}^n$. The second player is Nature, which selects an event $\omega \in \mathcal{F}$ with probability $P(\omega)$; it is further assumed that the exact probability measure of Nature is only known to belong to a certain family $\mathcal{P}$. The function $q$ represents the outcome of the game given the strategies the two players decide to follow. Kolbin [10] suggested the following form:

$$q(x,\omega) = f(x,\omega) + \sum_{i=1}^{k} \beta_i\big(g_i(x,\omega)\big), \tag{12}$$

where $\beta_i(a)$ is a continuous non-decreasing penalty function that is 0 when $a \le 0$. An example of such a function is the max-penalty function $c_i \max\{g_i(x,\omega), 0\}$, where $c_i$ is a penalty parameter.
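A minimal Python sketch of the outcome function in Eq. (12) with the max-penalty choice $\beta_i(a) = c_i \max\{a, 0\}$ follows. Scalar $x$ and $\omega$, the example objective and constraint, and the name `penalized_outcome` are illustrative assumptions; the point is only that the penalty is inactive whenever every $g_i(x,\omega) \le 0$.

```python
from typing import Callable, Sequence

Func = Callable[[float, float], float]

def penalized_outcome(f: Func, gs: Sequence[Func], cs: Sequence[float],
                      x: float, omega: float) -> float:
    """q(x, omega) of Eq. (12) with the max-penalty beta_i(a) = c_i * max(a, 0)."""
    return f(x, omega) + sum(c * max(g(x, omega), 0.0) for g, c in zip(gs, cs))

if __name__ == "__main__":
    f = lambda x, w: x          # hypothetical cost of decision x
    g = lambda x, w: w - x      # hypothetical constraint g(x, w) <= 0: x must cover w
    for w in (2.0, 6.0):
        print(w, penalized_outcome(f, [g], [3.0], 4.0, w))
```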

The DM would like to minimize the outcome of the game given by Eq. (12), whereas Nature, being antagonistic, would like to maximize this quantity:

$$\inf_{x \in X} \sup_{P \in \mathcal{P}} H(x,P), \qquad H(x,P) = \int_{\Omega} q(x,\omega)\, dP(\omega). \tag{13}$$

For the game above to exhibit a saddle point, convexity assumptions need to be imposed on $q$. Many problems of interest do not have this property, and it is necessary to resort to mixed strategies for the DM. Using our assumption that $q$ is continuous, and the compactness of $\Omega$ and $X$, it can be shown [10] that if we allow the DM to follow mixed strategies, then the game in Eq. (13) has a saddle point, i.e.:

$$\inf_{K \in \mathcal{K}} \sup_{P \in \mathcal{P}} \int_{\Omega \times X} q(x,\omega)\, dP(\omega)\, dK(x) = \sup_{P \in \mathcal{P}} \inf_{K \in \mathcal{K}} \int_{\Omega \times X} q(x,\omega)\, dP(\omega)\, dK(x) = H(K^*, P^*), \tag{14}$$

where the set $\mathcal{K}$ represents the family of randomized strategies of the DM.

Assume that the DM selects a probability measure $K \in \mathcal{K}$ and Nature selects $P \in \mathcal{P}$, and that both have their support in $\Omega$ (or $X$). Moreover, assume that the objective function of the game has the following functional form:

$$H(K,P) = \int_{\Omega} -p(\omega) \ln k(\omega)\, d\omega, \tag{15}$$

where $p(\omega)$ and $k(\omega)$ are the Radon-Nikodym derivatives with respect to $d\omega$ of $P \in \mathcal{P}$ and $K \in \mathcal{K}$, respectively.
Topsøe [18] observed that under these conditions the maximum entropy solution and the minimax solution coincide. To see why this is the case, suppose that Nature selects a probability measure from the following family:

$$\mathcal{P}_c = \left\{ p = \frac{dP}{d\omega} \;\middle|\; \int_{\Omega} m_i(\omega)\, dP(\omega) = \mu_i, \ i = 1, \dots, n, \ P \ll d\omega \right\},$$

where $P \ll d\omega$ denotes that $P$ is absolutely continuous with respect to $d\omega$. $\mathcal{K}_c$, the family of admissible strategies for the DM, is defined in an analogous manner. If Nature adopts a maximum entropy distribution, then its strategy can be found by solving:

$$\sup_{p \in \mathcal{P}_c} M(p) = \int_{\Omega} -p(\omega) \ln p(\omega)\, d\omega.$$

The optimal solution is given by:

$$p^*(\omega) = \exp\left\{ -1 - \lambda_0^* - \sum_{i=1}^{n} \lambda_i^* m_i(\omega) \right\}.$$

The optimal strategy of the DM can then be obtained by solving:

$$\inf_{k \in \mathcal{K}_c} \int_{\Omega} -p^*(\omega) \ln k(\omega)\, d\omega.$$

Using the information inequality [3] we have:

$$\int_{\Omega} -p^*(\omega) \ln k(\omega)\, d\omega \ge \int_{\Omega} -p^*(\omega) \ln p^*(\omega)\, d\omega, \tag{16}$$

and the inequality holds with equality if and only if $k(\omega) = p^*(\omega)$ a.e. Consequently, the optimal strategy for the DM is the same as Nature's strategy.
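The information inequality (16) can be checked in a discrete analogue in which integrals become sums over finitely many outcomes. In the Python sketch below, which is an illustration rather than part of the original argument, `cross_entropy` plays the role of $H(K,P)$ in Eq. (15) and `entropy` that of $M(p)$; the probability vectors are hypothetical and are assumed to sum to 1, with $k_j > 0$ wherever $p_j > 0$.

```python
import math
from typing import Sequence

def cross_entropy(p: Sequence[float], k: Sequence[float]) -> float:
    """Discrete analogue of H(K, P) in Eq. (15): -sum_j p_j ln k_j.
    Terms with p_j = 0 contribute nothing; k_j must be > 0 wherever p_j > 0."""
    return -sum(pj * math.log(kj) for pj, kj in zip(p, k) if pj > 0)

def entropy(p: Sequence[float]) -> float:
    """Discrete analogue of M(p) = -sum_j p_j ln p_j."""
    return cross_entropy(p, p)

if __name__ == "__main__":
    p_star = [0.5, 0.3, 0.2]                       # hypothetical strategy of Nature
    for k in ([1/3, 1/3, 1/3], [0.6, 0.2, 0.2], p_star):
        # The gap equals KL(p_star || k) >= 0, which vanishes only at k = p_star.
        print(k, cross_entropy(p_star, k) - entropy(p_star))
```

Every strategy $k \ne p^*$ gives a strictly larger value, so the DM's best response to $p^*$ is $p^*$ itself, which is the discrete counterpart of the conclusion above.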

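Conversely, if the game has the functional form given in Eq. (15), then the minimax solution of the game in Eq. (14) is the same as the maximum entropy solution. Indeed, by using the information inequality we have:

$$\sup_{p \in \mathcal{P}_c} \int_{\Omega} -p(\omega) \ln k(\omega)\, d\omega \;\ge\; \int_{\Omega} -p^*(\omega) \ln k(\omega)\, d\omega \;\ge\; \int_{\Omega} -p^*(\omega) \ln p^*(\omega)\, d\omega = M(p^*),$$

where $p^*$ is the distribution of maximum entropy. From the above relationship it follows that:

$$M(p^*) \le \inf_{k \in \mathcal{K}_c} \sup_{p \in \mathcal{P}_c} \int_{\Omega} -p(\omega) \ln k(\omega)\, d\omega. \tag{17}$$

If we choose $k = p^*$ as the minimizer of the left-hand side of Eq. (14), then, since $-\ln p^*(\omega) = 1 + \lambda_0^* + \sum_{i=1}^{n} \lambda_i^* m_i(\omega)$ and every $p \in \mathcal{P}_c$ is a normalized density with the same moments $\mu_i$,

$$\sup_{p \in \mathcal{P}_c} \int_{\Omega} -p(\omega) \ln p^*(\omega)\, d\omega = 1 + \lambda_0^* + \sum_{i=1}^{n} \lambda_i^* \mu_i = M(p^*),$$

and it follows that:

$$\inf_{k \in \mathcal{K}_c} \sup_{p \in \mathcal{P}_c} \int_{\Omega} -p(\omega) \ln k(\omega)\, d\omega \le M(p^*). \tag{18}$$

Therefore $k = p^*$ is indeed the minimizer of the left-hand side of Eq. (14). From the well-known property of minimax problems:

$$M(p^*) = \inf_{k \in \mathcal{K}_c} \sup_{p \in \mathcal{P}_c} \int_{\Omega} -p(\omega) \ln k(\omega)\, d\omega \ge \sup_{p \in \mathcal{P}_c} \inf_{k \in \mathcal{K}_c} \int_{\Omega} -p(\omega) \ln k(\omega)\, d\omega,$$

and from

$$\sup_{p \in \mathcal{P}_c} \inf_{k \in \mathcal{K}_c} \int_{\Omega} -p(\omega) \ln k(\omega)\, d\omega = \inf_{k \in \mathcal{K}_c} \int_{\Omega} -p^*(\omega) \ln k(\omega)\, d\omega = M(p^*),$$

we conclude that the game has a saddle point at $p = k = p^*$.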

For games in the form of Eq. (15), the relationship between game theory and maximum entropy is most useful, both theoretically and practically. Games that are not in this form can still be approached via maximum entropy methods, but the entropy functional must then be given a more general form. Grünwald et al. [7] defined the generalized entropy function as:

$$M(P) = \inf_{K \in \mathcal{K}} \int_{\Omega \times X} q(x,\omega)\, dP(\omega)\, dK(x).$$

The maximum entropy problem becomes:

$$\sup_{P \in \mathcal{P}} M(P). \tag{19}$$

They showed that, using the generalized definition of entropy, one obtains the same results for both the game theory and the maximum entropy problems. Even though the formulation in Eq. (19) is very general, unfortunately there is no general way to solve it. However, the relationship between the two principles is worth further investigation.
Images can be used to characterize the underlying distribution of certain physical properties, such as density, shape, and brightness, of an object under investigation. In many applications where an image is required, only a finite number of observations and/or indirect measurements can be made. Image reconstruction is a procedure for processing the measurement data to construct an image of the object. This section introduces the basic concept of image reconstruction from projection data. Two types of entropy optimization models, namely, the finite-dimensional model and the vector-space model, and three classes of entropy optimization methodologies, namely, discretization methods, Banach-space methods (e.g., MENT) and Hilbert-space methods (e.g., the finite element method), are discussed. For more details about image reconstruction, the reader is referred to [2,7,13] and the references therein.

A very important scientific application of image reconstruction is in computerized tomography (CT) for medical diagnosis. Physicians need to know, for example, the location, shape, and size of a suspected tumor inside a patient’s brain in order to plan a suitable course of treatment. With computerized tomography, images of cross-sections of a human body can be constructed from data obtained by measuring the attenuation of X-rays along a large number of straight lines (or strips) through each cross-section. For ease of introduction, we illustrate the basic ideas of image reconstruction with the example of two-dimensional X-ray CT, with the understanding that the discussion can be generalized to higher-dimensional settings.

In this example, the distribution to be determined is that of the X-ray linear attenuation coefficient of human body tissues. The total attenuation of the X-ray beam between a source and a detector is approximately the integral of the linear attenuation coefficient along the line between the source and the detector. The unknown distribution of the X-ray linear attenuation coefficient is represented by a density function $f$ of two variables, which is zero outside a square region. This square region is usually referred to as the support of the image.

Two basic types of entropy optimization models, namely, the finite-dimensional model and the vector-space model, are commonly used to determine the density function $f$. The finite-dimensional models approximate the density values over the support of the image at a finite number of grid points, whereas the vector-space models approximate the density by a real-valued function over the entire scanning region. The latter models were motivated by the need to reconstruct the image from only a small number of available projections.

In the finite-dimensional models, the support of the density $f$ is represented by $n$ (user-specified) regularly spaced grid points, and the values of the density function $f$ at these points are denoted by $f = (f_1, \dots, f_n)$. Assume that $m$ projections are made and the measurement data $d = (d_1, \dots, d_m)$ are obtained.

The relationship between the unknown density values $f$ and the observed measurements $d$ can be approximated by a linear relation

$$d \approx Af, \tag{1}$$

where $A = [a_{ij}]$ is a projection matrix.

Note that the approximation sign in (1) reflects possible errors in modeling and measurement. Also note that, in the classical square pixel model, the image is discretized by partitioning its support into a finite number of equal-sized square regions (called pixels or cells) whose centers are those $n$ sample points. By assuming that the density function $f$ is constant in each pixel, i.e., $f = f_j$ throughout pixel $j$, the value of $a_{ij}$ in the projection matrix is simply the length of the intersection of the line corresponding to the $i$th projection with the pixel surrounding the $j$th sample point.
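To make the square pixel model concrete, the Python sketch below computes one row of $A$: for a ray given as a line segment crossing the image, entry $j$ is the length of its intersection with pixel $j$. The clipping step uses the Liang-Barsky line-clipping method, which is our choice here and is not prescribed by the text; the pixel layout (a `side` by `side` grid of square pixels of width `pixel`, numbered row by row from the origin) and the function names are illustrative assumptions.

```python
import math

def clip_length(x0, y0, x1, y1, xmin, xmax, ymin, ymax) -> float:
    """Length of the part of the segment (x0, y0)-(x1, y1) that lies in the box
    [xmin, xmax] x [ymin, ymax], computed with Liang-Barsky clipping."""
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0                       # parameter range of the clipped segment
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:                       # parallel to this boundary and outside it
                return 0.0
            continue
        t = q / p
        if p < 0:
            t0 = max(t0, t)                 # segment enters the box
        else:
            t1 = min(t1, t)                 # segment leaves the box
        if t0 > t1:
            return 0.0
    return (t1 - t0) * math.hypot(dx, dy)

def projection_row(segment, side: int, pixel: float = 1.0) -> list[float]:
    """Row i of A for a side x side image: entry j = r * side + c is the length of
    the ray inside pixel [c*pixel, (c+1)*pixel] x [r*pixel, (r+1)*pixel]."""
    x0, y0, x1, y1 = segment
    return [clip_length(x0, y0, x1, y1,
                        c * pixel, (c + 1) * pixel, r * pixel, (r + 1) * pixel)
            for r in range(side) for c in range(side)]

if __name__ == "__main__":
    A = [projection_row((0.0, 0.5, 2.0, 0.5), 2),   # horizontal ray through the bottom row
         projection_row((0.0, 0.0, 2.0, 2.0), 2)]   # diagonal ray
    f = [1.0, 2.0, 3.0, 4.0]                          # hypothetical pixel densities
    d = [sum(a * fj for a, fj in zip(row, f)) for row in A]
    print(A, d)
```

A ray running exactly along a pixel edge is counted in both neighbouring pixels by this sketch; a production code would add a tie-breaking rule or use an incremental traversal such as Siddon's method.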

Once the projection matrix $A$ is defined and the measurement $d$ is known, the problem is to find an $f$ satisfying (1). To cope with the errors mentioned above, G.T. Herman [6] suggested that (1) be replaced by an ‘interval constraint’ and that a nonnegativity constraint be added:

$$d - \epsilon \le Af \le d + \epsilon, \tag{2}$$

$$f \ge 0, \tag{3}$$

where $\epsilon = (\epsilon_1, \dots, \epsilon_m)^T$ is an $m$-vector of user-chosen tolerance levels. Note that (2) can be replaced by an equivalent system of inequalities

$$A'f \le d', \qquad A' = \begin{bmatrix} A \\ -A \end{bmatrix}, \quad d' = \begin{bmatrix} d + \epsilon \\ -(d - \epsilon) \end{bmatrix}, \tag{4}$$

with twice as many one-sided inequalities [2,6].

For such an image reconstruction model, we can adopt either the ‘feasibility approach’ to find a solution to (2) and (3) directly, or the ‘optimization approach’ to find a solution that is not only feasible in the above sense but also optimal with respect to a certain criterion. In the literature, at least three different types of optimization problems have been proposed, namely, the entropy maximization problem, the quadratic minimization problem, and the maximum likelihood problem.
The entropy optimization problem seeks to optimize an entropic objective function subject to (2) and (3) as follows.

Model 1:

$$\max \; -\sum_{j=1}^{n} f_j \ln f_j \quad \text{s.t.} \quad d - \epsilon \le Af \le d + \epsilon, \quad f \ge 0. \tag{5}$$

Some researchers proposed models in which the $f_j$'s are normalized so that $\sum_{j=1}^{n} f_j = 1$, and in which the projection matrix and the measurement data differ from those of Model 1; see, e.g., [4]. In this way, a solution that is consistent with the measurement data but remains maximally noncommittal can be found. Note that an optimal solution to such models can also be interpreted as the most probable solution that is consistent with the measurement data [3].

Other variations of Model 1 exist. Despite possible modeling and measurement errors, one common practice is to replace the approximate relation (1), and hence the inequalities (2) in Model 1 (5), by the system of equations $Af = d$.
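For this equality-constrained variant, one classical iterative scheme is the multiplicative algebraic reconstruction technique (MART); it is named here as an illustration and is not presented as the method of any particular reference above. The Python sketch below assumes $0 \le a_{ij} \le 1$, $d_i > 0$, no all-zero rows and a consistent system. Started from the constant vector, every iterate has the form $f_j = \exp(\sum_i \mu_i a_{ij})$, which matches the form of the entropy-maximizing solution when the constant vector lies in the row space of $A$, as in the test instance: a hypothetical $2 \times 2$ image probed by its row and column sums, whose maximum-entropy solution is the outer product of the marginals divided by the total.

```python
def mart(A, d, n_sweeps: int = 200) -> list[float]:
    """Multiplicative ART for A f = d, assuming 0 <= a_ij <= 1, d_i > 0,
    no all-zero rows and a consistent system. Each step updates
        f_j <- f_j * (d_i / <a_i, f>) ** a_ij,
    which pulls <a_i, f> toward d_i (exactly onto it when every a_ij is 0 or 1).
    Started from f = 1, every iterate has the form f_j = exp(sum_i mu_i a_ij)."""
    f = [1.0] * len(A[0])
    for _ in range(n_sweeps):
        for a_i, d_i in zip(A, d):
            proj = sum(a * fj for a, fj in zip(a_i, f))
            ratio = d_i / proj
            f = [fj * ratio ** a for a, fj in zip(a_i, f)]
    return f

if __name__ == "__main__":
    # Hypothetical 2 x 2 image [[f0, f1], [f2, f3]] probed by its row and column sums.
    A = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
    d = [3.0, 7.0, 4.0, 6.0]
    print(mart(A, d))
```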

A different version of the finite-dimensional entropy optimization model begins with the definition of an error vector $e = (e_1, \dots, e_m)^T$, where

$$e_i = d_i - \sum_{j=1}^{n} a_{ij} f_j, \qquad i = 1, \dots, m.$$

Assume that the errors $e_1, \dots, e_m$ exist due to imprecise measurement and are independent noise terms with zero mean and known variance $\sigma_i^2$. S.F. Burch et al. [1] observed that the strong law of large numbers implies that

$$Q(f) = \frac{1}{m} \sum_{i=1}^{m} \frac{\left( \sum_{j=1}^{n} a_{ij} f_j - d_i \right)^2}{\sigma_i^2} \to 1 \quad \text{as } m \to \infty.$$

Thus, if $m$ is sufficiently large, the following entropy optimization problem with a quadratic constraint can be useful:

Model 2:

$$\max \; -\sum_{j=1}^{n} f_j \ln f_j \quad \text{s.t.} \quad \frac{1}{m} (Af - d)^T S^2 (Af - d) = 1, \quad f_j \ge 0, \ j = 1, \dots, n,$$

where $S$ is a diagonal matrix with $1/\sigma_i$ as its $i$th diagonal element.
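The limit above can be illustrated by simulation. The Python sketch below evaluates $Q(f)$ at the true image for synthetic data; the random matrix, the noise levels in $[0.5, 1.5)$, the Gaussian noise model, the image `f_true` and the helper `synthetic_data` are all hypothetical assumptions. For large $m$ the value settles near 1, which is what motivates the equality constraint in Model 2.

```python
import random

def Q(A, f, d, sigma) -> float:
    """Normalized chi-square statistic (1/m) sum_i ((a_i . f - d_i) / sigma_i)^2."""
    total = 0.0
    for a_i, d_i, s_i in zip(A, d, sigma):
        r = sum(a * fj for a, fj in zip(a_i, f)) - d_i
        total += (r / s_i) ** 2
    return total / len(d)

def synthetic_data(m: int, f_true, seed: int = 0):
    """Hypothetical test data: random A, noise levels sigma_i in [0.5, 1.5),
    and d = A f_true + e with independent Gaussian e_i ~ N(0, sigma_i^2)."""
    rng = random.Random(seed)
    n = len(f_true)
    A = [[rng.random() for _ in range(n)] for _ in range(m)]
    sigma = [0.5 + rng.random() for _ in range(m)]
    d = [sum(a * fj for a, fj in zip(a_i, f_true)) + rng.gauss(0.0, s)
         for a_i, s in zip(A, sigma)]
    return A, d, sigma

if __name__ == "__main__":
    f_true = [1.0, 2.0, 0.5]
    for m in (10, 1000, 20000):
        A, d, sigma = synthetic_data(m, f_true)
        print(m, Q(A, f_true, d, sigma))
```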

Concerns such as the smoothing effect, nonunifor- 
mity, peakness, and exactness [14] of a constructed im- 
age can also be addressed in this model with proper 
modification of the objective functions and constraints. 
So far, we have used the square pixel model to illustrate 
the idea of entropy optimization for image reconstruc- 
tion. Other models exist [2]. 

For an introduction to the concept of Shannon’s entropy and the related entropy optimization principles, namely, the principle of maximum entropy and the principle of minimum cross-entropy, see Entropy optimization: Shannon measure of entropy and its properties. A large body of literature has been devoted to developing iterative methods for solving finite-dimensional entropy optimization problems with linear and/or quadratic constraints. For details and a unification of such methods, see [3].

The method currently employed in most CT sys- 
tems is the ‘filtered back-projection’ method, which is 
based on a finite-dimensional model. (See [5,10] for 
details.) Compared to the iterative methods for solv- 
ing entropy optimization problems, this method pro- 
vides speed, which enables reconstruction of the im- 
age while X-ray transmission data are being collected. 
Hence the time between scanning and obtaining re- 
constructed images is reduced. However, there are sit- 
uations where iterative methods produce compara- 
ble or better reconstructed images than the filtered 
back-projection method, e. g., in image reconstruction 
with few projections or in high-contrast image reconstruction. Ever-increasing computer speed and the accompanying reduction in cost may make iterative methods increasingly attractive for CT systems.

In many situations, e.g., conducting diagnostic experiments on plasma in magnetic confinement devices or laser target implosions with measurements on fusion reactor cores, only a few projections are available, e.g., fewer than 10. When the finite-dimensional entropy optimization model is applied in such cases, it tends to produce ‘streaking’ artifacts. This motivated the use of the vector-space model.

Take the two-dimensional X-ray CT problem as an example. Assume that the unknown density function $f(x,y)$ is continuous over a compact support $D$ such that

$$f(x,y) \ge 0 \quad \text{and} \quad \iint_D f(x,y)\, dx\, dy = 1. \tag{6}$$

G. Minerbo [9] defined the entropy of $f(x,y)$ as

$$\phi(f) = -\iint_D f(x,y) \ln\big(A\, f(x,y)\big)\, dx\, dy,$$

where $A$ is the area of $D$. Denote the set of continuous, nonnegative functions with compact support in $D$ by $C_+(D)$.
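Because $1/A$ is the uniform density on $D$, $\phi(f) = -\iint_D f \ln\big(f/(1/A)\big)\,dx\,dy$ is minus the Kullback-Leibler divergence of $f$ from the uniform density, so $\phi(f) \le 0$ with equality for the uniform density. The Python sketch below approximates $\phi(f)$ with the midpoint rule; the rectangular domain $D = [0, l_x] \times [0, l_y]$, the grid sizes and the name `minerbo_entropy` are assumptions for illustration.

```python
import math

def minerbo_entropy(f, lx: float = 1.0, ly: float = 1.0, nx: int = 200, ny: int = 200) -> float:
    """Midpoint-rule approximation of phi(f) = -iint_D f ln(A f) dx dy on the
    assumed rectangle D = [0, lx] x [0, ly], whose area is A = lx * ly.
    Uses the convention 0 ln 0 = 0; f is assumed to be a normalized density."""
    area = lx * ly
    hx, hy = lx / nx, ly / ny
    total = 0.0
    for i in range(nx):
        x = (i + 0.5) * hx
        for j in range(ny):
            y = (j + 0.5) * hy
            v = f(x, y)
            if v > 0:
                total += v * math.log(area * v)
    return -total * hx * hy

if __name__ == "__main__":
    print(minerbo_entropy(lambda x, y: 1.0))        # uniform density on [0, 1]^2
    print(minerbo_entropy(lambda x, y: 2.0 * x))    # a non-uniform density
```

For the density $2x$ on the unit square the exact value is $1/2 - \ln 2 < 0$, consistent with the bound above.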

The scanning area is partitioned into parallel strips, each of which is penetrated by an X-ray beam. Let $\theta_j$, $j = 1, \dots, J$, be the $J$ distinct projection angles with respect to the $x$-axis of the scanning area. Also let $M(j)$ be the number of parallel beams associated with the $j$th projection or view, and let $s_{j1} < \dots < s_{j,M(j)+1}$ be a set of abscissas for the $j$th view. The projection data are assumed to be in the form of the following ‘strip integrals’:

$$P_{jm}(f) = \int_{s_{jm}}^{s_{j,m+1}} \int_{-\infty}^{\infty} f(s\cos\theta_j - t\sin\theta_j,\; s\sin\theta_j + t\cos\theta_j)\, dt\, ds,$$

where $m = 1, \dots, M(j)$ and $j = 1, \dots, J$. It is assumed that, for $j = 1, \dots, J$,

$$\int_{-\infty}^{\infty} f(s\cos\theta_j - t\sin\theta_j,\; s\sin\theta_j + t\cos\theta_j)\, dt = 0 \quad \text{for } s < s_{j1} \text{ or } s > s_{j,M(j)+1}.$$

Let $G_{jm}$ denote the observed values of $P_{jm}(f)$ for $m = 1, \dots, M(j)$ and $j = 1, \dots, J$. Note that (6) implies $G_{jm} \ge 0$ and $\sum_{m=1}^{M(j)} G_{jm} = 1$ for each $j$.

Then the vector-space model results in the following optimization problem:

Model 3:

$$\sup_{f \in C_+(D)} \phi(f) \quad \text{s.t.} \quad P_{jm}(f) = G_{jm}, \quad m = 1, \dots, M(j); \ j = 1, \dots, J. \tag{7}$$


A finite-dimensional unconstrained dual problem can be derived by using the technique of Lagrange multipliers. An algorithm known as MENT [9] was proposed. It was shown that the solutions produced by MENT converge to a density function $f^*$ which satisfies the constraints (7) with $\phi(f^*) = \sup_{C_+(D)} \phi(f)$. However, the limiting density function $f^*$ is not continuous. In fact, as pointed out in [8], $f^*$ is piecewise constant and $f^* \notin C_+(D)$. When few projections are available and the object being scanned has a simple structure (or is close to circularly symmetric in density), some preliminary computational results indicated the potential of this approach.

Recognizing that the supremum of Model 3 is not attained by any function $f \in C_+(D)$, M. Klaus and R.T. Smith [8] defined an alternative formulation in a richer class of functions than $C_+(D)$. More precisely, they replaced $C_+(D)$ by $L^2_+(D)$, the set of all nonnegative square-integrable functions on $D$. Note that all piecewise-constant functions over $D$ are contained in $L^2_+(D)$. Also recognizing that measurements may be inconsistent or even flawed, they considered an optimization problem in which the objective function is the original entropy functional $\phi(f)$ minus a penalty term corresponding to the residual error in meeting the measurement constraints, and the constraint is that the maximizer lies in a weakly compact set that is determined by known physical information about the density function of the object to be scanned. A corresponding formulation becomes

Model 4:

$$\sup_{f \in \Omega} G(f) = \phi(f) - \gamma \sum_{j,m} \big[ G_{jm} - P_{jm}(f) \big]^2,$$

where $\gamma > 0$ is an adjustable penalty parameter and $\Omega$ is a convex and weakly (sequentially) compact set of nonnegative functions in $L^2_+(D)$, with compact support in $D$ and containing physical information known a priori about the object to be scanned, e.g., upper and lower bounds on the density function. (A set $\Omega$ of nonnegative functions in $L^2_+(D)$ is weakly (sequentially) compact if and only if every sequence in $\Omega$ has a weakly convergent subsequence whose weak limit lies in $\Omega$; a sequence $\{f_n(x,y)\}$ converges weakly to $f(x,y)$ if and only if the sequence $\{\langle f_n(x,y), g(x,y) \rangle\}$ converges to $\langle f(x,y), g(x,y) \rangle$ for every $g(x,y) \in L^2_+(D)$, where $\langle h_1, h_2 \rangle = \iint_D h_1(x,y)\, h_2(x,y)\, dx\, dy$ denotes the inner product of $h_1$ and $h_2$ in the space $L^2_+(D)$.)

With the aid of Hilbert space theory, it can be shown [8] that $G$ has a unique maximizer in $\Omega$ for any given data $G_{jm}$, $m = 1, \dots, M(j)$, $j = 1, \dots, J$.
Based on this alternative formulation, the density function $f(x, y)$ can be approximated by the finite element method [11]. For simplicity, assume that $D = [-1, 1] \times [-1, 1]$. First, we superimpose a fixed rectangular mesh on $D$, with uniform mesh size $h = 1/n$ in both the $x$ and $y$ directions. We also use the product of piecewise linear functions in $x$ and $y$ as the finite element space $S^h$. In this way, a basis for $S^h$ has the form

$$\psi_k(x, y) = \psi_i(x)\, \psi_j(y), \qquad k = 1, \dots, (2n+1)^2,$$

where

$$i = \left\lfloor \frac{k-1}{2n+1} \right\rfloor - n, \qquad j = k - (i+n)(2n+1) - n - 1,$$

and

$$\psi_j(t) = \begin{cases} 0 & \text{if } t < (j-1)h \text{ or } t > (j+1)h, \\[4pt] \dfrac{t - (j-1)h}{h} & \text{if } (j-1)h \le t \le jh, \\[4pt] \dfrac{(j+1)h - t}{h} & \text{if } jh \le t \le (j+1)h. \end{cases}$$
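The basis above is easy to evaluate numerically. The following Python sketch is only an illustration. The function names `hat`, `node_indices` and `psi` are hypothetical. It assumes the row-major ordering $k - 1 = (i + n)(2n + 1) + (j + n)$ of the $(2n+1)^2$ mesh nodes, with both $i$ and $j$ running from $-n$ to $n$.

```python
def hat(j, n, t):
    """Piecewise linear hat function psi_j(t) on the mesh t_j = j*h, h = 1/n."""
    h = 1.0 / n
    if t < (j - 1) * h or t > (j + 1) * h:
        return 0.0                       # outside the support [(j-1)h, (j+1)h]
    if t <= j * h:
        return (t - (j - 1) * h) / h     # rising half
    return ((j + 1) * h - t) / h         # falling half


def node_indices(k, n):
    """Map the global index k = 1..(2n+1)^2 to the node pair (i, j), each in -n..n."""
    m = 2 * n + 1
    i = (k - 1) // m - n
    j = k - (i + n) * m - n - 1
    return i, j


def psi(k, n, x, y):
    """Tensor-product basis function psi_k(x, y) = psi_i(x) * psi_j(y)."""
    i, j = node_indices(k, n)
    return hat(i, n, x) * hat(j, n, y)


if __name__ == "__main__":
    n = 2
    N = (2 * n + 1) ** 2
    # On D the basis functions sum to one (partition of unity) ...
    print(sum(psi(k, n, 0.3, -0.7) for k in range(1, N + 1)))
    # ... and psi_k equals one at its own node (i*h, j*h).
    i, j = node_indices(7, n)
    print(psi(7, n, i / n, j / n))
```

The partition-of-unity and nodal-interpolation properties checked here are what allow bounds on $\sum_k c_k \psi_k$ to be expressed as bounds on the coefficients $c_k$.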


It is reasonable to expect that, in practice, one 
should know a priori the minimum and maximum den- 
sities of the object being examined. Hence we focus on 
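a simple constraint set

$$\Omega = \bigl\{ f \in L^2(\mathbb{R}^2) : 0 \le a \le f \le b < \infty \ \text{a.e. in } D, \ f = 0 \ \text{a.e. in } \mathbb{R}^2 \setminus D \bigr\}.$$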


The density function $f(x, y)$ is then approximated in $S^h$ by

$$f(x, y) \approx \sum_{k=1}^{N} c_k\, \psi_k(x, y),$$

where $N = (2n+1)^2$. The coefficients $c_k$ are chosen as the optimal solution of the following finite-dimensional optimization problem:

$$\sup_{c \in \mathbb{R}^N} \; G\Bigl(\sum_{k=1}^{N} c_k \psi_k(x, y)\Bigr) \quad \text{s.t.} \quad 0 \le a \le \sum_{k=1}^{N} c_k \psi_k(x, y) \le b.$$


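This problem can be further reduced. The projections $P_{jm}$ are linear. The bilinear basis functions are nonnegative, sum to one on $D$, and equal one at their own mesh node and zero at all other nodes. Hence the bound on $\sum_k c_k \psi_k$ is equivalent to bounds on the individual coefficients, and the problem becomes

$$\sup_{c \in \mathbb{R}^N} \; S\Bigl(\sum_{k=1}^{N} c_k \psi_k(x, y)\Bigr) - \gamma \sum_{j,m} \Bigl[g_{jm} - \sum_{k=1}^{N} c_k P_{jm}\bigl(\psi_k(x, y)\bigr)\Bigr]^2 \quad \text{s.t.} \quad 0 \le a \le c_k \le b, \ k = 1, \dots, N.$$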


Preliminary computational results reported in [11, 
12] indicate some improvements of this alternative ap- 
proach over the MENT algorithm when the object un- 
der investigation does not have circular symmetry in 
density and has a high density area near the edge of the 
scanning region. 
The maximum flow problem seeks the maximum possible
flow in a capacitated network from a specified
source node s to a specified sink node t without exceed- 
ing the capacity of any arc. A closely related problem is 
the minimum cut problem, which is to find a set of arcs 
with the smallest total capacity whose removal separates 
node s and node t. The maximum flow and minimum 
cut problems arise in a variety of application settings 
as diverse as manufacturing, communication systems, 
distribution planning, matrix rounding, and schedul- 
ing. These problems also arise as subproblems in the 
solution of more difficult network optimization prob- 
lems. In this article, we study the maximum flow and 
minimum cut problems, briefly introducing the under- 
lying theory and algorithms, and presenting some ap- 
plications. See [2] for a wealth of additional material 
that amplifies on this discussion. 

Let $G = (N, A)$ be a directed network defined by a set $N$ of $n$ nodes and a set $A$ of $m$ directed arcs. We refer to nodes $i$ and $j$ as endpoints of arc $(i, j)$. A directed path $i_1 \to i_2 \to \cdots \to i_k$ is a set of arcs $(i_1, i_2), (i_2, i_3), \dots, (i_{k-1}, i_k)$. Each arc $(i, j)$ has an associated capacity $u_{ij}$ denoting the maximum amount of flow on this arc. We assume that each arc capacity $u_{ij}$ is an integer, and let $U = \max\{u_{ij} : (i, j) \in A\}$. The network has two distinguished nodes, a source node $s$ and a sink node $t$. To help in representing a network, we use the arc adjacency list $A(i)$ of node $i$, which is the set of arcs emanating from it, that is, $A(i) = \{(i, j) \in A : j \in N\}$.

The maximum flow problem is to find the maxi- 
mum flow from the source node s to the sink node t 
that satisfies the arc capacities and mass balance con- 
straints at all nodes. We can state the problem formally 
as follows. 


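$$\max \; v \tag{1}$$

subject to

$$\sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} = \begin{cases} v & \text{for } i = s, \\ 0 & \text{for } i \in N - \{s, t\}, \\ -v & \text{for } i = t, \end{cases} \tag{2}$$

$$0 \le x_{ij} \le u_{ij} \quad \text{for all } (i, j) \in A. \tag{3}$$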
We refer to a vector $x = \{x_{ij}\}$ satisfying (2) and (3) as a flow, and to the corresponding value of the scalar variable $v$ as the value of the flow. We refer to the constraints (2) as the mass balance constraints, and to the constraints (3) as the flow bound constraints.
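Constraints (2) and (3) translate directly into a feasibility check. In the Python sketch below, `flow_value` is a hypothetical helper and not part of any library. Arcs and flows are stored as dictionaries keyed by $(i, j)$. The function returns the flow value $v$ when $x$ is a flow and raises `ValueError` otherwise. The small network is our own example.

```python
from collections import defaultdict


def flow_value(u, x, s, t, tol=1e-9):
    """Return the value v of x if x satisfies (2) and (3); raise ValueError otherwise.

    u: dict mapping arc (i, j) -> capacity u_ij
    x: dict mapping arc (i, j) -> flow x_ij (missing arcs carry zero flow)
    """
    outflow_minus_inflow = defaultdict(float)
    for (i, j), u_ij in u.items():
        x_ij = x.get((i, j), 0)
        if x_ij < -tol or x_ij > u_ij + tol:            # flow bound constraints (3)
            raise ValueError(f"flow bound violated on arc {(i, j)}")
        outflow_minus_inflow[i] += x_ij
        outflow_minus_inflow[j] -= x_ij
    for i, imbalance in outflow_minus_inflow.items():    # mass balance constraints (2)
        if i not in (s, t) and abs(imbalance) > tol:
            raise ValueError(f"mass balance violated at node {i}")
    return outflow_minus_inflow[s]                       # equals v (and -v at node t)


# A small hypothetical network and a flow on it.
u = {(1, 2): 2, (1, 3): 4, (2, 3): 2, (2, 4): 1, (3, 4): 5}
x = {(1, 2): 2, (1, 3): 4, (2, 3): 1, (2, 4): 1, (3, 4): 5}
print(flow_value(u, x, s=1, t=4))
```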

In examining the maximum flow problem, we im- 
pose two assumptions: 

i) all arc capacities are integer; and 
ii) whenever the network contains arc (i, j), then it also 

contains arc (j, i). 

The second assumption is nonrestrictive since we allow 
arcs with zero capacity. 

Sometimes the flow vector $x$ might be required to satisfy lower bound constraints imposed upon the arc flows. That is, if $l_{ij} \ge 0$ specifies the lower bound on the flow on arc $(i, j) \in A$, we impose the condition $x_{ij} \ge l_{ij}$. We refer to this problem as the maximum flow problem with nonnegative lower bounds. It is possible to transform a maximum flow problem with nonnegative lower bounds into a maximum flow problem with zero lower bounds.

The minimum cut problem is a close relative of the maximum flow problem. A cut $[S, \bar S]$ partitions the node set $N$ into two subsets $S$ and $\bar S = N - S$. It consists of all arcs with one endpoint in $S$ and the other in $\bar S$. We refer to the arcs directed from $S$ to $\bar S$, denoted by $(S, \bar S)$, as forward arcs in the cut. The arcs directed from $\bar S$ to $S$, denoted by $(\bar S, S)$, are the backward arcs in the cut. The cut $[S, \bar S]$ is called an $s$-$t$ cut if $s \in S$ and $t \in \bar S$. We define the capacity of the cut $[S, \bar S]$, denoted $u[S, \bar S]$, as $\sum_{(i,j) \in (S, \bar S)} u_{ij}$. A minimum cut in $G$ is an $s$-$t$ cut of minimum capacity. We will show that any algorithm that determines a maximum flow in the network also determines a minimum cut in the network.
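For very small networks, the definition of $u[S, \bar S]$ can be checked by brute force. The Python sketch below uses hypothetical helper names. It enumerates every $s$-$t$ cut of a toy network of our own choosing and reports one of minimum capacity. The enumeration is exponential in $n$, so it is only an illustration.

```python
from itertools import combinations


def cut_capacity(u, S):
    """u[S, S-bar]: total capacity of the forward arcs (i, j) with i in S and j not in S."""
    return sum(u_ij for (i, j), u_ij in u.items() if i in S and j not in S)


def minimum_cut_brute_force(u, s, t):
    """Enumerate all s-t cuts (s in S, t not in S); exponential, toy sizes only."""
    nodes = {i for arc in u for i in arc}
    others = sorted(nodes - {s, t})
    best = None
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            S = {s, *subset}
            cap = cut_capacity(u, S)
            if best is None or cap < best[0]:
                best = (cap, S)
    return best


u = {(1, 2): 2, (1, 3): 4, (2, 3): 2, (2, 4): 1, (3, 4): 5}
print(minimum_cut_brute_force(u, s=1, t=4))
```

Only forward arcs contribute to the capacity, exactly as in the definition. Backward arcs are ignored.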

The remainder of this article is organized as fol- 
lows. To help in understanding the importance of the 
maximum flow problem, we begin by describing sev- 
eral applications. In the next section we present some 
preliminary results concerning flows and cuts. We next 
discuss two important classes of algorithms for solv- 
ing the maximum flow problem: augmenting path algo- 
rithms, and preflow-push algorithms. As described in 
the next section, augmenting path algorithms augment 
flow along directed paths from the source node to the 
sink node. The proof of the validity of the augmenting path algorithm yields the well-known max-flow min-cut theorem, which implies that the value of a maximum flow in a network equals the capacity of a minimum cut in the network. In the next section, we study
preflow-push algorithms that ‘flood’ the network so 
that some nodes have excesses and then incrementally 
‘relieve’ the flow from nodes with excesses by sending 
flow from excess nodes forward toward the sink node or 
backward toward the source node. In the final section, 
we study implications of the max-flow min-cut theorem 
and prove some max-min results in combinatorics. 

We would like to design maximum flow algorithms that are guaranteed to be efficient, in the sense that their worst-case running times grow slowly in some measure of the problem's size. Here the running time is the total number of multiplications, divisions, additions, subtractions, and comparisons performed in the worst case. We say that a maximum flow algorithm is an $O(n^3)$ algorithm, or has a worst-case complexity of $O(n^3)$, if it can solve any maximum flow problem using a number of computations that is asymptotically bounded by some constant times $n^3$. We say that an algorithm is a polynomial time algorithm if its worst-case running time
is bounded by a polynomial function of the input size 
parameters. For a maximum flow problem, the input 
size parameters are n, m, and log U (the number of bits 
needed to specify the largest arc capacity). We refer to 
a maximum flow algorithm as a pseudopolynomial time 
algorithm if its worst-case running time is bounded by 
a polynomial function of n, m, and U. For example, an 
algorithm with worst-case complexity of O(nm log U) 
is a polynomial time algorithm, but an algorithm with 
worst-case complexity of O(nmU) is a pseudopolyno- 
mial time algorithm. 


Applications 


The maximum flow problem arises in a variety of sit- 
uations and in several forms. Sometimes, it arises di- 
rectly in combinatorial applications that on the surface 
might not appear to be maximum flow problems at all; 
at other times, it occurs as a subproblem in the solu- 
tion of more difficult network optimization problems. 
In this section, we describe three applications of the 
maximum flow problem. 


Capacity of Physical Networks 


An oil company needs to ship oil from a refinery to a storage facility using the pipelines of its underlying distribution network. In this problem context, the refinery corresponds to a particular node $s$ in the distribution network, and the storage facility corresponds to another node $t$. The capacity of each arc is the maximum amount of oil per unit time that can flow along it. The value of a maximum $s$-$t$ flow determines the maximum flow rate from the source node $s$ to the sink node $t$. Similar applications arise in other settings, for example, determining the transmission capacity between two nodes of a telecommunications network.


Feasible Flow Problem 


The feasible flow problem consists of finding a flow satisfying the following constraints:

$$\sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} = b(i) \quad \text{for all } i \in N, \tag{4}$$

$$0 \le x_{ij} \le u_{ij} \quad \text{for all } (i, j) \in A. \tag{5}$$

We assume that $\sum_{i \in N} b(i) = 0$. The following distribution scenario illustrates how the feasible flow problem arises in practice. Suppose that merchandise available at several seaports is desired by other ports. We know the stock of merchandise available at the ‘supply’ ports, the amount required at the other ports, and the maximum quantity of merchandise that can be shipped on a particular sea route. We wish to know whether we can satisfy all of the demands by using the available supplies.

We can solve the feasible flow problem by solving a maximum flow problem defined on an augmented network, as follows. We introduce two new nodes, a source node $s$ and a sink node $t$. For each node $i$ with supply (that is, with $b(i) > 0$), we add an arc $(s, i)$ with capacity $b(i)$. For each node $i$ with demand (that is, with $b(i) < 0$), we add an arc $(i, t)$ with capacity $-b(i)$. We refer to the new network as the transformed network. We then solve a maximum flow problem from node $s$ to node $t$ in the transformed network. It is easy to show that the model (4)-(5) has a feasible solution if and only if the maximum flow saturates all the arcs emanating from the source node, that is, $x_{sj} = u_{sj}$ for all arcs $(s, j) \in A(s)$. Moreover, if each $b(i)$ and $u_{ij}$ is an integer, then model (4)-(5) always has an integer feasible solution whenever it has a feasible solution (see Theorem 3).
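The reduction just described can be sketched in Python as follows. This is a minimal illustration under our own assumptions:

- the maximum flow subroutine is a simple breadth-first augmenting path method;
- the added nodes get the hypothetical names `"s*"` and `"t*"`;
- flows on the original arcs are read off from the final residual capacities via $x_{ij} - x_{ji} = u_{ij} - r_{ij}$.

```python
from collections import deque


def max_flow(u, s, t):
    """Augmenting paths found by breadth-first search; returns (value, residual dict)."""
    r, adj = {}, {}
    for (i, j), u_ij in u.items():
        r[(i, j)] = r.get((i, j), 0) + u_ij
        r.setdefault((j, i), 0)              # reverse arc with zero capacity
    for i, j in r:
        adj.setdefault(i, []).append(j)
    value = 0
    while True:
        pred, queue = {s: None}, deque([s])
        while queue:
            i = queue.popleft()
            for j in adj.get(i, []):
                if r[(i, j)] > 0 and j not in pred:
                    pred[j] = i
                    queue.append(j)
        if t not in pred:
            return value, r
        path, j = [], t
        while pred[j] is not None:
            path.append((pred[j], j))
            j = pred[j]
        delta = min(r[a] for a in path)
        for i, j in path:
            r[(i, j)] -= delta
            r[(j, i)] += delta
        value += delta


def feasible_flow(b, u):
    """Return a flow satisfying (4)-(5), or None if none exists (assumes sum of b is 0)."""
    s, t = "s*", "t*"                        # hypothetical names for the new nodes
    aug = dict(u)
    for i, b_i in b.items():
        if b_i > 0:
            aug[(s, i)] = b_i                # supply arc (s, i)
        elif b_i < 0:
            aug[(i, t)] = -b_i               # demand arc (i, t)
    value, r = max_flow(aug, s, t)
    if value < sum(b_i for b_i in b.values() if b_i > 0):
        return None                          # some arc out of s is not saturated
    # x_ij - x_ji = u_ij - r_ij, so x_ij = max(0, u_ij - r_ij) is a valid choice.
    return {a: max(0, u_ij - r[a]) for a, u_ij in u.items()}
```

With integer supplies and capacities, every augmentation is integral, so the returned flow is integral as well.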

Sometimes in a feasible flow problem the arcs have nonnegative lower bounds. That is, the flow bound constraints are $l_{ij} \le x_{ij} \le u_{ij}$ instead of $0 \le x_{ij} \le u_{ij}$, for some constants $l_{ij} \ge 0$ for each $(i, j) \in A$. By substituting $y_{ij} = x_{ij} - l_{ij}$ for $x_{ij}$, we can transform this problem to the formulation (4)-(5). Then (5) reduces to $0 \le y_{ij} \le u_{ij} - l_{ij}$. Constraint (4) reduces to the same set of equations but with a different right-hand side vector $b'$, namely $b'(i) = b(i) - \sum_{\{j : (i,j) \in A\}} l_{ij} + \sum_{\{j : (j,i) \in A\}} l_{ji}$.
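The substitution $y_{ij} = x_{ij} - l_{ij}$ changes only the capacities and the supplies. Below is a minimal Python sketch that computes the new capacities $u_{ij} - l_{ij}$ and the right-hand side $b'$. The helper name and the example data are hypothetical. A feasible $y$ for the transformed data then gives the feasible flow $x_{ij} = y_{ij} + l_{ij}$.

```python
def remove_lower_bounds(b, bounds):
    """Transform l_ij <= x_ij <= u_ij into 0 <= y_ij <= u_ij - l_ij via y = x - l.

    b:      dict node -> supply/demand b(i)
    bounds: dict arc (i, j) -> (l_ij, u_ij)
    Returns (b_prime, capacities) for the equivalent problem (4)-(5) in y.
    """
    b_prime, capacities = dict(b), {}
    for (i, j), (l_ij, u_ij) in bounds.items():
        if not 0 <= l_ij <= u_ij:
            raise ValueError(f"inconsistent bounds on arc {(i, j)}")
        capacities[(i, j)] = u_ij - l_ij
        b_prime[i] = b_prime.get(i, 0) - l_ij   # l_ij units already leave node i
        b_prime[j] = b_prime.get(j, 0) + l_ij   # l_ij units already enter node j
    return b_prime, capacities


b = {1: 3, 2: 0, 3: -3}
bounds = {(1, 2): (1, 2), (2, 3): (0, 2), (1, 3): (0, 1)}
print(remove_lower_bounds(b, bounds))
```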


Matrix Rounding Problem 


This application is concerned with consistent rounding of the elements, the row sums, and the column sums of a matrix. We are given a $p \times q$ matrix of real numbers $D = \{d_{ij}\}$, with row sums $\alpha_i$ and column sums $\beta_j$. We can round any real number $d$ to the next smaller integer $\lfloor d \rfloor$ or to the next larger integer $\lceil d \rceil$, and the decision to round up or round down is entirely up to us. The matrix rounding problem requires that we round the matrix elements and the row and column sums of the matrix so that two conditions hold: the sum of the rounded elements in each row equals the rounded row sum, and the sum of the rounded elements in each column equals the rounded column sum. We refer to such a rounding as a consistent rounding. The matrix rounding problem arises in several application contexts, for example, the rounding of census data to disguise data on individuals.

Using a numerical example, we will show how to transform a matrix rounding problem into a maximum flow problem. Figure 1a) shows an instance of the matrix rounding problem, and Fig. 1b) gives the maximum flow network $G$ for this problem. The network $G$ contains a node $i$ corresponding to each row $i$ of the matrix $D$, a node $j$ corresponding to each column $j$ of $D$, a source node $s$, and a sink node $t$. The network contains:

- an arc $(i, j)$ corresponding to the $ij$th element of the matrix;
- an arc $(s, i)$ for each row $i$, representing the sum of row $i$;
- an arc $(j, t)$ for each column $j$, representing the sum of column $j$.

Each arc receives as lower and upper bounds the rounded-down and rounded-up values of the quantity it represents. For an arc $(i, j)$, for instance, we define its upper bound $u_{ij} = \lceil d_{ij} \rceil$ and lower bound $l_{ij} = \lfloor d_{ij} \rfloor$. Notice that the flow $x_{ij} = d_{ij}$, with the row and column sums as the flows on the arcs $(s, i)$ and $(j, t)$, is a real-valued feasible flow in the network. There is a one-to-one correspondence between the consistent roundings of the matrix and the feasible integer flows in the corresponding network. Therefore we can find a consistent rounding by solving a feasible flow problem on that network. The feasible flow algorithm will produce an integer feasible flow (because of Theorem 3), which corresponds to a consistent rounding.

Maximum Flow Problem, Figure 1
Network for the matrix rounding problem
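The construction of the rounding network can be sketched in Python. This illustration is our own, and the node labels `"s"`, `"t"`, `"r0"`, `"c0"`, … are hypothetical. Every arc gets the floor and ceiling of the quantity it represents as its lower and upper bounds. The helper also returns the real-valued flow $x_{ij} = d_{ij}$, so one can confirm that this flow respects those bounds and conserves flow at every row and column node.

```python
import math


def rounding_network(D):
    """Build arc bounds (floor, ceil) and the real-valued flow for matrix D."""
    p, q = len(D), len(D[0])
    alpha = [sum(row) for row in D]                             # row sums
    beta = [sum(D[i][j] for i in range(p)) for j in range(q)]   # column sums
    bounds, flow = {}, {}

    def add_arc(arc, value):
        bounds[arc] = (math.floor(value), math.ceil(value))
        flow[arc] = value

    for i in range(p):
        add_arc(("s", f"r{i}"), alpha[i])          # arc for the sum of row i
        for j in range(q):
            add_arc((f"r{i}", f"c{j}"), D[i][j])   # arc for element d_ij
    for j in range(q):
        add_arc((f"c{j}", "t"), beta[j])           # arc for the sum of column j
    return bounds, flow


D = [[1.5, 2.25], [0.5, 3.0]]
bounds, flow = rounding_network(D)
print(all(lo <= flow[a] <= hi for a, (lo, hi) in bounds.items()))
```

Finding an integer flow within these bounds, for example with a feasible flow routine after removing the lower bounds, is a separate step that this sketch does not perform.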


Preliminaries 


In this section, we discuss some elementary properties 
of flows and cuts. We will use these properties to prove 
the celebrated max-flow min-cut theorem and to estab- 
lish the correctness of the augmenting path algorithm 
described in the next section. 


Residual Network 


The concept of residual network plays a central role in the development of maximum flow algorithms. Given a flow $x$, the residual capacity $r_{ij}$ of any arc $(i, j) \in A$ is the maximum additional flow that can be sent from node $i$ to node $j$ using the arcs $(i, j)$ and $(j, i)$. (Recall the assumption from the first section that whenever the network contains arc $(i, j)$, it also contains the arc $(j, i)$.) The residual capacity $r_{ij}$ has two components:
i) $u_{ij} - x_{ij}$, the unused capacity of arc $(i, j)$;
ii) the current flow $x_{ji}$ on arc $(j, i)$, which we can cancel to increase the flow from node $i$ to node $j$.
Consequently, $r_{ij} = u_{ij} - x_{ij} + x_{ji}$. We refer to the network $G(x)$ consisting of the arcs with positive residual capacities as the residual network (with respect to the flow $x$). Figure 2 gives an example of a residual network.
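The formula $r_{ij} = u_{ij} - x_{ij} + x_{ji}$ gives the residual network directly. Below is a short Python sketch. The helper name and the data are hypothetical. By assumption ii), every arc's reverse is present in `u`, possibly with zero capacity.

```python
def residual_network(u, x):
    """Return {(i, j): r_ij} for the arcs of G(x), i.e. those with r_ij > 0."""
    residual = {}
    for (i, j), u_ij in u.items():
        r_ij = u_ij - x.get((i, j), 0) + x.get((j, i), 0)   # unused capacity + cancellable flow
        if r_ij > 0:
            residual[(i, j)] = r_ij
    return residual


# Arcs (2, 1) and (3, 2) are the zero-capacity reverse arcs required by assumption ii).
u = {(1, 2): 3, (2, 1): 0, (2, 3): 2, (3, 2): 0}
x = {(1, 2): 2, (2, 3): 2}
print(residual_network(u, x))
```

Arc $(2, 3)$ is saturated, so it drops out of $G(x)$. The zero-capacity arcs $(2, 1)$ and $(3, 2)$ enter $G(x)$ because flow on their reverses can be cancelled.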


Flow across an s — t-Cut 


Let $x$ be a flow in the network, and let $[S, \bar S]$ be any $s$-$t$ cut. Adding the mass balance constraints (2) for the nodes in $S$, we obtain the equation

$$v = \sum_{i \in S} \Biggl[\, \sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} \Biggr] = \sum_{(i,j) \in (S, \bar S)} x_{ij} - \sum_{(i,j) \in (\bar S, S)} x_{ij}. \tag{6}$$

The second equality uses the following fact. Whenever both nodes $p$ and $q$ belong to the node set $S$ and $(p, q) \in A$, the variable $x_{pq}$ in the first term within the bracket (for node $i = p$) cancels the variable $-x_{pq}$ in the second term within the bracket (for node $i = q$). The first expression on the right-hand side of (6) denotes the amount of flow from the nodes in $S$ to the nodes in $\bar S$. The second expression denotes the amount of flow returning from the nodes in $\bar S$ to the nodes in $S$. Therefore, the right-hand side denotes the total (net) flow across the cut, and (6) implies that the flow across any $s$-$t$ cut $[S, \bar S]$ equals $v$. Substituting $x_{ij} \le u_{ij}$ in the first expression of (6) and $x_{ij} \ge 0$ in the second expression yields $v \le \sum_{(i,j) \in (S, \bar S)} u_{ij} = u[S, \bar S]$. This implies that the value of any flow can never exceed the capacity of any cut in the network. We record this result formally for future reference.


Lemma 1 The value of any flow can never exceed the capacity of any cut in the network. Consequently, if the value of some flow $x$ equals the capacity of some cut $[S, \bar S]$, then $x$ is a maximum flow and the cut $[S, \bar S]$ is a minimum cut.

Maximum Flow Problem, Figure 2
Illustrating the construction of a residual network: a) the original network, with arc capacities and a flow x; b) the residual network


The max-flow min-cut theorem, to be proved in the next 
section, states that the value of some flow always equals 
the capacity of some cut. 
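Equation (6) and Lemma 1 can be checked numerically on a small example. The Python sketch below uses hypothetical helpers and a small network of our own choosing. It computes the net flow across every $s$-$t$ cut and compares it with the cut capacity.

```python
from itertools import combinations


def net_flow_across(x, S):
    """Right-hand side of (6): flow on (S, S-bar) minus flow on (S-bar, S)."""
    forward = sum(f for (i, j), f in x.items() if i in S and j not in S)
    backward = sum(f for (i, j), f in x.items() if i not in S and j in S)
    return forward - backward


def cut_capacity(u, S):
    """u[S, S-bar]: capacity of the forward arcs only."""
    return sum(c for (i, j), c in u.items() if i in S and j not in S)


u = {(1, 2): 2, (1, 3): 4, (2, 3): 2, (2, 4): 1, (3, 4): 5}
x = {(1, 2): 2, (1, 3): 4, (2, 3): 1, (2, 4): 1, (3, 4): 5}   # a flow of value v = 6
s, middle = 1, [2, 3]                                          # the sink t = 4 is never in S
for size in range(len(middle) + 1):
    for subset in combinations(middle, size):
        S = {s, *subset}
        print(sorted(S), net_flow_across(x, S), cut_capacity(u, S))
```

Every cut carries a net flow of 6, including $S = \{1, 3\}$, where one unit returns along the backward arc $(2, 3)$. The cuts with $S = \{1\}$ and $S = \{1, 2, 3\}$ have capacity exactly 6. By Lemma 1, this $x$ is therefore a maximum flow.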


Generic Augmenting Path Algorithm 


In this section, we describe one of the simplest and 
most intuitive algorithms for solving the maximum 
flow problem, an algorithm known as the augmenting 
path algorithm. 

Let x be a feasible flow in the network G, and let G(x) 
denote the residual network corresponding to the flow 
x. We refer to a directed path from the source to the sink 
in the residual network G(x) as an augmenting path. 
We define the residual capacity $\delta(P)$ of an augmenting path $P$ as the maximum amount of flow that can be sent along it, that is, $\delta(P) = \min\{r_{ij} : (i, j) \in P\}$. Since the residual capacity of each arc in the residual network is strictly positive, the residual capacity of an augmenting path is strictly positive. Therefore, we can always send a positive flow of $\delta$ units along it. Consequently, whenever the network contains an augmenting path, we can send additional flow from the source to the sink. (Sending an additional $\delta$ units of flow along an augmenting path decreases the residual capacity of each arc $(i, j)$ in the path by $\delta$ units.) The generic augmenting path algorithm is essentially based upon this simple observation.


The algorithm identifies augmenting paths in G(x) and 
augments flow on these paths until the network con- 
tains no such path. The algorithm below describes the 
generic augmenting path algorithm. 

We can identify an augmenting path P in G(x) by 
using a graph search algorithm. A graph search algo- 
rithm starts at node s and progressively finds all nodes 
that are reachable from the source node using directed 
paths. Most search algorithms run in time proportional 
to the number of arcs in the network, that is, O(m) time, 
and either identify an augmenting path or conclude that 
G(x) contains no augmenting path; the latter happens 
when the sink node is not reachable from the source 
node. 


BEGIN
  x := 0;
  WHILE G(x) contains a directed path from node s to node t DO
  BEGIN
    identify an augmenting path P from s to t;
    set δ := min{r_ij : (i, j) ∈ P};
    augment δ units of flow along P;
    update G(x);
  END;
END;
For each arc $(i, j) \in P$, augmenting $\delta$ units of flow along $P$ decreases $r_{ij}$ by $\delta$ units and increases $r_{ji}$ by $\delta$ units. When the algorithm terminates, the final residual capacities $r_{ij}$ specify a maximum (arc) flow in the following manner. Since $r_{ij} = u_{ij} - x_{ij} + x_{ji}$, the arc flows satisfy the equality $x_{ij} - x_{ji} = u_{ij} - r_{ij}$. If $u_{ij} > r_{ij}$, we can set $x_{ij} = u_{ij} - r_{ij}$ and $x_{ji} = 0$. Otherwise, we set $x_{ij} = 0$ and $x_{ji} = r_{ij} - u_{ij}$.
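The generic algorithm, together with the rule above for recovering arc flows from the final residual capacities, can be sketched in Python. This is a sketch under our own assumptions:

- the graph search is breadth-first search, although any search that finds an $s$-$t$ path in $G(x)$ would do;
- the function name is hypothetical;
- the test network is hypothetical: $u_{13} = 4$ and $u_{34} = 5$ match the numbers quoted for the worked example that follows, while the remaining capacities are our own choice.

```python
from collections import deque


def augmenting_path_max_flow(u, s, t):
    """Generic augmenting path algorithm with breadth-first search.

    u: dict arc (i, j) -> integer capacity u_ij.
    Returns (v, x, S): flow value, arc flows, and the source side S of a minimum cut.
    """
    r, adj = {}, {}
    for (i, j), u_ij in u.items():
        r[(i, j)] = r.get((i, j), 0) + u_ij
        r.setdefault((j, i), 0)                  # assumption ii): reverse arc present
    for i, j in r:
        adj.setdefault(i, []).append(j)
    v = 0
    while True:
        pred, queue = {s: None}, deque([s])      # search G(x): arcs with r_ij > 0
        while queue:
            i = queue.popleft()
            for j in adj.get(i, []):
                if r[(i, j)] > 0 and j not in pred:
                    pred[j] = i
                    queue.append(j)
        if t not in pred:
            break                                # no augmenting path remains
        path, j = [], t
        while pred[j] is not None:
            path.append((pred[j], j))
            j = pred[j]
        delta = min(r[a] for a in path)          # residual capacity delta(P)
        for i, j in path:
            r[(i, j)] -= delta
            r[(j, i)] += delta
        v += delta
    # x_ij - x_ji = u_ij - r_ij: take x_ij = u_ij - r_ij when positive, else 0.
    x = {(i, j): max(0, u_ij - r[(i, j)]) for (i, j), u_ij in u.items()}
    return v, x, set(pred)                       # set(pred) = nodes reachable in G(x)


u = {(1, 2): 2, (1, 3): 4, (2, 3): 2, (2, 4): 1, (3, 4): 5}
v, x, S = augmenting_path_max_flow(u, 1, 4)
print(v, x, S)
```

The set of nodes reached by the last, unsuccessful search is returned as $S$. This is the source side of the minimum cut discussed below. Choosing breadth-first search makes the sketch an instance of the shortest augmenting path rule mentioned later in this article.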

We use the maximum flow problem given in Fig. 3 to illustrate the algorithm. Fig. 3a) shows the residual network corresponding to the starting flow $x = 0$, which is identical to the original network. The residual network contains three augmenting paths: $1 \to 3 \to 4$, $1 \to 2 \to 4$, and $1 \to 2 \to 3 \to 4$. Suppose the algorithm selects the path $1 \to 3 \to 4$ for augmentation. The residual capacity of this path is $\delta = \min\{r_{13}, r_{34}\} = \min\{4, 5\} = 4$. This augmentation reduces the residual capacity of arc $(1, 3)$ to zero (thus we delete it from the residual network) and increases the residual capacity of arc $(3, 1)$ to 4 (so we add this arc to the residual network). The augmentation also decreases the residual capacity of arc $(3, 4)$ from 5 to 1, and increases the residual capacity of arc $(4, 3)$ from 0 to 4. Figure 3b) shows the residual network at this stage. In the second iteration, the algorithm selects the path $1 \to 2 \to 3 \to 4$ and augments 1 unit of flow; Fig. 3c) shows the residual network after the augmentation. In the third iteration, the algorithm augments one unit of flow along the path $1 \to 2 \to 4$. Figure 3d) shows the corresponding residual network. Now the residual network contains no augmenting path, and so the algorithm terminates.

Does the augmenting path algorithm always find a maximum flow? The algorithm terminates when the search algorithm fails to identify a directed path in $G(x)$ from node $s$ to node $t$, indicating that no such path exists (we prove later that the algorithm terminates finitely). At this stage, let $S$ denote the set of nodes in $N$ that are reachable in $G(x)$ from the source node using directed paths, and let $\bar S = N - S$. Clearly, $s \in S$ and $t \in \bar S$. The search algorithm cannot reach any node in $\bar S$, and it can reach each node in $S$. Hence $r_{ij} = 0$ for each $(i, j) \in (S, \bar S)$. Recall that $r_{ij} = (u_{ij} - x_{ij}) + x_{ji}$, $x_{ij} \le u_{ij}$, and $x_{ji} \ge 0$. If $r_{ij} = 0$, then $x_{ij} = u_{ij}$ and $x_{ji} = 0$. Substituting these flow values in expression (6) for every $(i, j) \in (S, \bar S)$, we find that $v = u[S, \bar S]$. Therefore, the value of the current flow $x$ equals the capacity of the cut. Lemma 1 implies that $x$ is a maximum flow and $[S, \bar S]$ is a minimum cut. This conclusion establishes the correctness of the generic augmenting path algorithm and, as a byproduct, proves the following max-flow min-cut theorem.


Theorem 2 The maximum value of the flow from 
a source node s to a sink node t in a capacitated network 
equals the minimum capacity among all $s$-$t$ cuts.


The proof of the max-flow min-cut theorem shows that when the augmenting path algorithm terminates, it also discovers a minimum cut $[S, \bar S]$, with $S$ defined as the set of all nodes reachable from the source node in the residual network corresponding to the maximum flow. For our previous numerical example, the algorithm finds the minimum cut $[S, \bar S]$ with $S = \{1\}$.

The augmenting path algorithm also establishes an- 
other important result, the integrality theorem: 


Theorem 3 If all arc capacities are integer, then the maximum flow problem always has an integer maximum flow.


This result follows from the facts that the initial (zero) 
flow is integer and all arc capacities are integer; con- 
sequently, all initial residual capacities will be inte- 
ger. Since subsequently all arc flows change by integer 
amounts (because residual capacities are integer), the 
residual capacities remain integer throughout the algo- 
rithm. Further, the final integer residual capacities de- 
termine an integer maximum flow. The integrality the- 
orem does not imply that every optimal solution of the 
maximum flow problem is integer. The maximum flow 
problem might have noninteger solutions and, most 
often, it has such solutions. The integrality theorem 
shows that the problem always has at least one integer 
optimal solution. 

Maximum Flow Problem, Figure 3
Illustrating the augmenting path algorithm: a) the residual network G(x) for x = 0; b) the residual network after augmenting four units along the path (1 → 3 → 4); c) the residual network after augmenting one unit along the path (1 → 2 → 3 → 4); d) the residual network after augmenting one unit along the path (1 → 2 → 4)

What is the worst-case running time of the algorithm? An augmenting path is a directed path in $G(x)$ from node $s$ to node $t$. We have seen earlier that each iteration of the algorithm requires $O(m)$ time. In each iteration, the algorithm augments a positive integer amount of flow from the source node to the sink node. To bound the number of iterations, we will determine a bound on the maximum flow value. By definition, $U$ denotes the largest arc capacity, and so the capacity of the cut $[\{s\}, N - \{s\}]$ is at most $nU$. Since the value of any flow can never exceed the capacity of any cut in the network, we obtain a bound of $nU$ on the maximum flow value and also on the number of iterations performed by the algorithm. Consequently, the running time of the algorithm is $O(nmU)$, which is a pseudopolynomial time bound. We summarize the preceding discussion with the following theorem.


Theorem 4 The generic augmenting path algorithm 
solves the maximum flow problem in O(nmU) time. 


The augmenting path algorithm is possibly the simplest algorithm for solving the maximum flow problem. Empirically, the algorithm performs reasonably well. However, the worst-case bound on the number of iterations is poor for large values of $U$. For example, if $U = 2^n$, the bound is exponential in the number of nodes. Moreover, as known examples show, the algorithm can indeed perform this many iterations. A second drawback of the augmenting path algorithm is that it might not terminate if the capacities are irrational. For some pathological instances of the maximum flow problem, the augmenting path algorithm does not terminate in a finite number of iterations. Although the successive flow values converge, they might converge to a value strictly less than the maximum flow value. (Note, however, that the max-flow min-cut theorem is valid even if arc capacities are irrational.) Therefore, if the augmenting path algorithm is to be guaranteed to be effective in all situations, it must select augmenting paths carefully.
Researchers have developed specific implementations of the generic augmenting path algorithm that overcome these drawbacks. Of these, the following three implementations are particularly noteworthy:

i) the maximum capacity augmenting path algorithm, which always augments flow along a path in the residual network with the maximum residual capacity and can be implemented to run in $O(m^2 \log U)$ time;

ii) the capacity scaling algorithm, which uses a scaling technique on arc capacities and can be implemented to run in $O(nm \log U)$ time;

iii) the shortest augmenting path algorithm, which augments flow along a shortest path (as measured by the number of arcs) in the residual network and runs in $O(n^2 m)$ time.

These algorithms are due to J. Edmonds and R.M. Karp [6], H.N. Gabow [9], and E.A. Dinic [5], respectively. L.R. Ford and D.R. Fulkerson [8] and P. Elias, A. Feinstein and C.E. Shannon [7] independently developed the basic augmenting path algorithm.


Generic Preflow-Push Algorithm 


Another class of algorithms for solving the maximum 
flow problem, known as preflow-push algorithms, is 
more decentralized than augmenting path algorithms. 
Augmenting path algorithms send flow by augment- 
ing along a path. This basic operation further decom- 
poses into the more elementary operation of sending 
flow along individual arcs. Sending a flow of $\delta$ units along a path of $k$ arcs decomposes into $k$ basic operations of sending a flow of $\delta$ units along each of the arcs
of the path. We shall refer to each of these basic opera- 
tions as a push. The preflow-push algorithms push flows 
on individual arcs instead of on augmenting paths. 

A path augmentation has one advantage over a sin- 
gle push: it maintains conservation of flow at all nodes. 
The preflow-push algorithms violate conservation of 
flow at all steps except at the very end, and instead 
maintain a ‘preflow’ at each iteration. A preflow is a vec- 
tor x satisfying the flow bound constraints and the fol- 
lowing relaxation of the mass balance constraints (2): 


$$\sum_{\{j : (j,i) \in A\}} x_{ji} - \sum_{\{j : (i,j) \in A\}} x_{ij} \ge 0 \quad \text{for all } i \in N - \{s, t\}. \tag{7}$$


Each element of a preflow vector is either a real number or equals $+\infty$. The preflow-push algorithms maintain a preflow at each intermediate stage. For a given preflow $x$, we define the excess for each node $i \in N - \{s, t\}$ as

$$e(i) = \sum_{\{j : (j,i) \in A\}} x_{ji} - \sum_{\{j : (i,j) \in A\}} x_{ij}.$$

We refer to a node with positive excess as an active 
node. We adopt the convention that the source and sink 
nodes are never active. In a preflow-push algorithm, the 
presence of an active node indicates that the solution is 
infeasible. Consequently, the basic operation in this al- 
gorithm is to select an active node i and try to remove 
the excess by pushing flow out of it. When we push flow 
out of an active node, we need to do it carefully. If we 
just push flow to an adjacent node in an arbitrary man- 
ner and the other nodes do the same, then it is conceiv- 
able that some nodes keep pushing flow among them- 
selves resulting in an infinite loop, which is not a de- 
sirable situation. Since ultimately we want to send the 
flow to the sink node, it seems reasonable for an active 
node to push flow to another node that is ‘closer’ to the 
sink. If all nodes maintain this rule, then the algorithm 
could never encounter an infinite loop. The concept of 
distance labels defined next allows us to implement this 
algorithmic strategy. 

The preflow-push algorithms maintain a distance 
label d(i) with each node in the network. The distance 
labels are nonnegative (finite) integers defined with re- 
spect to the residual network G(x). We say that distance 
labels are valid with respect to a flow x if they satisfy the 
following two conditions: 


$$d(t) = 0, \tag{8}$$

$$d(i) \le d(j) + 1 \quad \text{for every arc } (i, j) \text{ in the residual network } G(x). \tag{9}$$

We refer to the conditions (8) and (9) as the validity conditions. It is easy to demonstrate that $d(i)$ is a lower bound on the length of any directed path (as measured by number of arcs) from node $i$ to node $t$ in the residual network. It is thus a lower bound on the length of the shortest path between nodes $i$ and $t$. Let $i = i_1 \to i_2 \to \cdots \to i_k \to t$ be any path of length $k$ in the residual network from node $i$ to node $t$. The validity conditions (8), (9) imply that $d(i) = d(i_1) \le d(i_2) + 1$, $d(i_2) \le d(i_3) + 1$, …, $d(i_k) \le d(t) + 1 = 1$. Adding these inequalities shows that $d(i) \le k$ for any path of length $k$ in the residual network. Therefore any (shortest) path from node $i$ to node $t$ contains at least $d(i)$ arcs. We say that an arc $(i, j)$ in the residual network is admissible if it satisfies the condition $d(i) = d(j) + 1$; we refer to all other arcs as inadmissible.
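The validity conditions and the admissibility test are simple to state in code. In the Python sketch below, the helper names are hypothetical, `r` holds the residual capacities and `d` the distance labels. The example uses exact shortest-path distances to the sink in the zero-flow residual network of a small network of our own choosing. Such exact distances always satisfy (8) and (9).

```python
def labels_are_valid(r, d, t):
    """Check (8) d(t) = 0 and (9) d(i) <= d(j) + 1 on every arc with r_ij > 0."""
    return d[t] == 0 and all(d[i] <= d[j] + 1 for (i, j), r_ij in r.items() if r_ij > 0)


def admissible_arcs(r, d):
    """Arcs (i, j) of G(x) with d(i) = d(j) + 1."""
    return [(i, j) for (i, j), r_ij in r.items() if r_ij > 0 and d[i] == d[j] + 1]


r = {(1, 2): 2, (1, 3): 4, (2, 3): 2, (2, 4): 1, (3, 4): 5}   # G(x) for x = 0
d = {1: 2, 2: 1, 3: 1, 4: 0}                                   # exact distances to t = 4
print(labels_are_valid(r, d, t=4), admissible_arcs(r, d))
```

Raising $d(1)$ to 3 would violate (9) on arc $(1, 2)$. This reflects the lower-bound argument above: node 1 has a two-arc path to the sink.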

The basic operation in the preflow-push algorithm 
is to select an active node i and try to remove the ex- 
cess by pushing flow to a node with smaller distance 
label. (We will use the distance labels as estimates of the 
length of the shortest path to the sink node.) If node 
i has an admissible arc (i, j), then d(j) = d(i) — 1 and 
the algorithm sends flow on admissible arcs to relieve 
the node’s excess. If node i has no admissible arc, then 
the algorithm increases the distance label of node i so 
that node i has an admissible arc. The algorithm termi- 
nates when the network contains no active nodes, that 
is, excess resides only at the source and sink nodes. The 
next algorithm describes the generic preflow-push al- 
gorithm. 


BEGIN
  set x := 0 and d(j) := 0 for all j ∈ N;
  set x_sj := u_sj for each arc (s, j) ∈ A(s);
  d(s) := n;
  WHILE the residual network G(x) contains an active node DO
  BEGIN
    select an active node i;
    push/relabel(i);
  END;
END;

procedure push/relabel(i);
BEGIN
  IF the network contains an admissible arc (i, j)
  THEN push δ := min{e(i), r_ij} units of flow from node i to node j
  ELSE replace d(i) by min{d(j) + 1 : (i, j) ∈ A(i), r_ij > 0};
END;


The generic preflow-push algorithm 


The algorithm first saturates all arcs emanating 
from the source node; then each node adjacent to node 
$s$ has a positive excess, so that the algorithm can begin pushing flow from active nodes. The preprocessing operation saturates all the arcs emanating from node $s$, so none of these arcs remains in the residual network. Hence setting $d(s) = n$ satisfies the validity conditions (8), (9). But then,
since d(s) = n, and a distance label is a lower bound on 
the length of the shortest path from that node to node 
t, the residual network contains no directed path from 
s to t. The subsequent pushes maintain this property 
and drive the solution toward feasibility. Consequently, 
when there are no active nodes, the flow is a maximum 
flow. 

A push of $\delta$ units from node $i$ to node $j$ decreases both the excess $e(i)$ of node $i$ and the residual capacity $r_{ij}$ of arc $(i, j)$ by $\delta$ units, and increases both $e(j)$ and $r_{ji}$ by $\delta$ units. We say that a push of $\delta$ units of flow on an arc $(i, j)$ is saturating if $\delta = r_{ij}$ and nonsaturating otherwise.
A nonsaturating push at node i reduces e(i) to zero. We 
refer to the process of increasing the distance label of 
a node as a relabel operation. The purpose of the rela- 
bel operation is to create at least one admissible arc on 
which the algorithm can perform further pushes. 

It is instructive to visualize the generic preflow-push 
algorithm in terms of a physical network: arcs represent 
flexible water pipes, nodes represent joints, and the dis- 
tance function measures how far nodes are above the 
ground. In this network, we wish to send water from 
the source to the sink. We visualize flow in an admis- 
sible arc as water flowing downhill. Initially, we move 
the source node upward, and water flows to its neigh- 
bors. Although we would like water to flow downhill 
toward the sink, occasionally flow becomes trapped lo- 
cally at a node that has no downhill neighbors. At this 
point, we move the node upward, and again water flows 
downhill toward the sink. 

Eventually, no more flow can reach the sink. As we 
continue to move nodes upward, the remaining excess 
flow eventually flows back toward the source. The algorithm terminates when all the water has either flowed into the sink or flowed back to the source.

To illustrate the generic preflow-push algorithm, we use the example given in Fig. 4. Figure 4a) specifies the initial residual network. We first saturate the arcs emanating from the source node, node 1, and set d(1) = n = 4. Figure 4b) shows the residual graph at this stage. At this point, the network has two active nodes, nodes 2 and 3. Suppose that the algorithm selects node 2 for the push/relabel operation. Arc (2, 4) is the only admissible arc, and the algorithm performs a saturating push of value δ = min{e(2), r_24} = min{2, 1} = 1. Figure 4c) gives the residual network at this stage.

Suppose the algorithm again selects node 2. Since no admissible arc emanates from node 2, the algorithm performs a relabel operation. It gives node 2 a new distance label d(2) = min{d(3) + 1, d(1) + 1} = min{2, 5} = 2. The new residual network is the same as the one shown in Fig. 4c), except that d(2) = 2 instead of 1. Suppose this time the algorithm selects node 3. Arc (3, 4) is the only admissible arc emanating from node 3, and so the algorithm performs a nonsaturating push of value δ = min{e(3), r_34} = min{4, 5} = 4. Figure 4d) specifies the residual network at the end of this iteration. Repeating this process for a few more iterations, the algorithm determines a maximum flow.

Figure 4
Illustrating the preflow-push algorithm: a) the residual network G(x) for x = 0; b) the residual network after saturating arcs emanating from the source; c) the residual network after pushing flow on arc (2, 4); d) the residual network after pushing flow on arc (3, 4)

The analysis of the computational (worst-case) complexity of the generic preflow-push algorithm is somewhat complicated. Without examining the details, we can summarize the analysis as follows. It is possible to show that the preflow-push algorithm maintains valid distance labels at all steps of the algorithm and increases the distance label of any node at most 2n times. The algorithm performs O(nm) saturating pushes and O(n^2 m) nonsaturating pushes. The nonsaturating pushes are the limiting computational operation of the algorithm, and so it runs in O(n^2 m) time.

The preflow-push algorithm has several attractive 
features, particularly its flexibility and its potential for 
further improvements. Different rules for selecting ac- 
tive nodes for the push/relabel operations create many 
different versions of the generic algorithm, each with
different worst-case complexity. As we have noted, the 
bottleneck operation in the generic preflow-push algo- 
rithm is the number of nonsaturating pushes and many 
specific rules for examining active nodes can produce 
substantial reductions in the number of nonsaturating 
pushes. The following specific implementations of the 
generic preflow-push algorithms are noteworthy: 

i) the FIFO preflow-push algorithm examines the active nodes in first-in, first-out (FIFO) order and runs in O(n^3) time;

ii) the highest label preflow-push algorithm pushes flow from an active node with the highest value of a distance label and runs in O(n^2 m^{1/2}) time; and

iii) the excess-scaling algorithm uses the scaling of arc capacities to attain a time bound of O(nm + n^2 log U).

These algorithms are due to A.V. Goldberg and R.E. Tarjan [10], J. Cheriyan and S.N. Maheshwari [4], and R.K. Ahuja and J.B. Orlin [3], respectively. These preflow-push algorithms are more general, more powerful, and more flexible than augmenting path algorithms. The best preflow-push algorithms currently outperform the best augmenting path algorithms in theory as well as in practice (see, for example, [1]).


Combinatorial Implications 
of the Max-Flow Min-Cut Theorem 


The max-flow min-cut theorem has far-reaching consequences. It can be used to prove several important
results in combinatorics that appear to be difficult to 
prove using other means. We will illustrate the use of 
the max-flow min-cut theorem to prove two such im- 
portant results. 


Network Connectivity 


Given a directed network G = (N, A) and two specified nodes s and t, we are interested in the following two questions:

i) what is the maximum number of arc-disjoint (di- 
rected) paths from node s to node t; and 

ii) what is the minimum number of arcs that we should 
remove from the network so that it contains no di- 
rected paths from node s to node t. 

We will show that these two questions are closely related. The answer to the second question indicates how robust a network, for example a telecommunications network, is to the failure of its arcs.

In the network G, let us define the capacity of each arc as equal to one. Consider any feasible integral flow x of value v in the resulting unit capacity network. We can decompose the flow x into flows along v arc-disjoint directed paths from node s to node t, each path carrying a unit flow. Now consider any s-t cut $[S, \bar S]$ in the network. The capacity of this cut is $|(S, \bar S)|$, that is, it equals the number of forward arcs in the cut. Since each path joining nodes s and t contains at least one arc in the set $(S, \bar S)$, removing all the arcs in $(S, \bar S)$ disconnects all paths from node s to node t. Consequently, the network contains a disconnecting set of arcs of cardinality equal to the capacity of any s-t cut $[S, \bar S]$. Conversely, because the v paths share no arc, every disconnecting set must contain at least v arcs. The max-flow min-cut theorem immediately implies the following result:


Corollary 5 The maximum number of arc-disjoint 
paths from s to t in a directed network equals the min- 
imum number of arcs whose removal will disconnect all 
paths from node s to node t. 
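Corollary 5 can be checked numerically on small graphs. The following is a minimal Python sketch for illustration only. It counts arc-disjoint s-t paths as a maximum flow with unit arc capacities, using breadth-first augmenting paths instead of preflow-push purely for brevity. It then compares that count with a brute-force search for the smallest disconnecting arc set. The function names are hypothetical, and the brute force is practical only for a handful of arcs.

```python
from collections import deque
from itertools import combinations


def max_arc_disjoint_paths(arcs, s, t):
    """Number of arc-disjoint s-t paths = max flow with unit arc capacities."""
    r = {}
    for i, j in arcs:
        r.setdefault(i, {}).setdefault(j, 0)
        r[i][j] += 1                       # each arc has capacity one
        r.setdefault(j, {}).setdefault(i, 0)
    flow = 0
    while True:
        parent = {s: None}                 # BFS for a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            i = queue.popleft()
            for j, cap in r.get(i, {}).items():
                if cap > 0 and j not in parent:
                    parent[j] = i
                    queue.append(j)
        if t not in parent:
            return flow
        j = t                              # augment one unit along the path
        while parent[j] is not None:
            i = parent[j]
            r[i][j] -= 1
            r[j][i] += 1
            j = i
        flow += 1


def reachable(arcs, s, t):
    """True if the directed graph given by arcs has an s-t path."""
    adj = {}
    for i, j in arcs:
        adj.setdefault(i, []).append(j)
    seen, stack = {s}, [s]
    while stack:
        for j in adj.get(stack.pop(), []):
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return t in seen


def min_disconnecting_set(arcs, s, t):
    """Smallest number of arcs whose removal leaves no directed s-t path."""
    for k in range(len(arcs) + 1):
        for removed in combinations(range(len(arcs)), k):
            kept = [a for idx, a in enumerate(arcs) if idx not in removed]
            if not reachable(kept, s, t):
                return k
```

Consider the hypothetical graph with arcs (0,1), (0,2), (1,3), (2,3) and (1,2), with s = 0 and t = 3. Both functions return 2 for it, as Corollary 5 predicts.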


Matchings and Covers 


The max-flow min-cut theorem also implies a max-min result concerning matchings and node covers in a directed bipartite network $G = (N_1 \cup N_2, A)$, with arc set $A \subseteq N_1 \times N_2$. In the network G, a subset $M \subseteq A$ is a matching if no two arcs in M have an endpoint in common. A subset $C \subseteq N_1 \cup N_2$ is a node cover of G if every arc in A has at least one endpoint in the node set C. Suppose we create the network G′ from G by adding two new nodes s and t. We also add arcs (s, i) of capacity 1 for each $i \in N_1$ and arcs (j, t) of capacity 1 for each $j \in N_2$. All other arcs in G′ correspond to the arcs in G and have infinite capacity. It is possible to show that each matching of cardinality v defines a flow of value v in G′. Likewise, each s-t cut of finite capacity v induces a corresponding node cover with v nodes. Consequently, the max-flow min-cut theorem establishes the following result:


Corollary 6 In a bipartite network $G = (N_1 \cup N_2, A)$, the maximum cardinality of any matching equals the minimum cardinality of any node cover of G.
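Corollary 6 can likewise be illustrated on a small hypothetical bipartite graph. The following Python sketch works under stated assumptions. It finds a maximum matching by augmenting paths, which amounts to unit flow augmentations in the network G′ described above. It then reads a node cover off the minimum s-t cut of G′. The cut's source side S is the set of nodes reachable from s in the residual network. The finite-capacity cut arcs are (s, u) for u ∈ N_1 outside S and (v, t) for v ∈ N_2 inside S. Nodes are tagged "L" (for N_1) or "R" (for N_2) so that the two sides may reuse labels. All function names are hypothetical, and the brute-force cover search exists only to check the example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Adjacency lists for the arcs (i, j), i in N1, j in N2."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    return adj


def max_matching(left, adj):
    """Maximum matching by augmenting paths (unit augmentations in G')."""
    match_right = {}                       # node in N2 -> matched node in N1

    def augment(u, seen):
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                if v not in match_right or augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    for u in left:
        augment(u, set())
    return match_right


def cover_from_min_cut(left, adj, match_right):
    """Node cover read off the minimum s-t cut of G' (nodes tagged 'L'/'R')."""
    matched_left = set(match_right.values())
    # S-side: nodes reachable from s in the residual network of G'.
    reach_left = {u for u in left if u not in matched_left}
    reach_right = set()
    stack = list(reach_left)
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):           # infinite-capacity arc (u, v)
            if v not in reach_right:
                reach_right.add(v)
                w = match_right.get(v)     # residual reverse arc (v, w)
                if w is not None and w not in reach_left:
                    reach_left.add(w)
                    stack.append(w)
    # Finite-capacity cut arcs: (s, u) with u outside S and (v, t) with v inside S.
    return ({("L", u) for u in left if u not in reach_left}
            | {("R", v) for v in reach_right})


def min_cover_size(left, right, edges):
    """Brute-force minimum node cover, for checking small examples only."""
    nodes = [("L", u) for u in left] + [("R", v) for v in right]
    for k in range(len(nodes) + 1):
        for c in combinations(nodes, k):
            chosen = set(c)
            if all(("L", u) in chosen or ("R", v) in chosen for u, v in edges):
                return k
```

For N_1 = {1, 2, 3}, N_2 = {a, b, c} and arcs (1,a), (1,b), (2,a), (3,a), the maximum matching has two arcs. The cut-induced cover is {1, a}, which also has two nodes.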


These two examples illustrate important relationships 
between maximum flows, minimum cuts, and many 
other problems in the field of combinatorics. The max- 
imum flow problem is of interest because it provides 
a unifying tool for viewing many such results, because it
arises directly in many applications, and because it has 
been a rich arena for developing new results concerning 
the design and analysis of algorithms. 
Abstract


Maximum-likelihood detection is a generic NP-hard 
problem in digital communications which requires 
efficient solution in practice. Some existing quasi- 
maximum-likelihood detectors achieve polynomial 
complexity with significant bit-error-rate performance 
degradation (e.g. LMMSE Detector), while others ex- 
hibit near-maximum-likelihood bit-error-rate perfor- 
mance with exponential complexity (e.g. Sphere De- 
coder and its variants). We present an efficient subopti- 
mal detector based on a semidefinite relaxation, called 
SDR Detector, which enjoys near-maximum-likelihood 
bit-error-rate with worst-case polynomial complexity. 
SDR Detector can be implemented with recently developed Interior-Point methods for convex optimization problems. For large systems SDR Detector provides a constant factor approximation for the maximum-likelihood detection problem. In the high signal-to-noise ratio region SDR Detector can solve the maximum-likelihood detection problem exactly. Efficient implementations of SDR Detector empirically deliver a near-optimal bit-error-rate with a running time that scales well to large problems in any signal-to-noise ratio region.


Keywords and Phrases 


Maximum-likelihood detection; Multiple-input 
multiple-output systems; Multiuser detection; 
Semidefinite relaxation 


Introduction 


Maximum-Likelihood (ML) detection is a fundamental 
problem in digital communications. Under the mild as- 
sumption of equiprobable transmitted signals ML De- 
tector achieves the best Bit-Error-Rate (BER). In gen- 
eral, the ML detection problem is NP-hard due to the 
discrete nature of a signal constellation. Exhaustive search can be applied for small problem sizes; however, this strategy is not practical for large systems. Large communication systems often arise in schemes with efficient rate and diversity utilization, e.g. the systems based on Linear Dispersion Codes [6]. The suboptimal detectors developed to approximate ML Detector fall into two major categories:

• Accelerated versions of ML Detector with exponential complexity (e.g. versions of Sphere Decoder [3,16]),

• Polynomial complexity detectors with significant degradation in the BER performance (e.g. Linear Minimum Mean Square Error (LMMSE) Detector, Matched Filter, Decorrelator, etc.).

We focus on an alternative detector which is based 
on a semidefinite relaxation of the ML detection prob- 
lem. This detector, called SDR Detector hereafter, en- 
joys a worst-case polynomial complexity while deliver- 
ing a near-optimal BER performance. In the next sub- 
section we will introduce notations and a system model 
used throughout the text. 


Formulation 
System Model 


Consider a vector communication channel with n transmit and m receive antennas. In wireless communications, a Rayleigh fading model is widely used in scenarios where the line-of-sight signal component is significantly attenuated. A large body of research builds on this model, including fundamental theoretical results on channel capacity, diversity, and multiplexing gain. Define the fading coefficient from the ith transmit antenna to the kth receive antenna to be a zero-mean, unit-variance complex Gaussian variable $H_{ki}$. It has a Rayleigh distributed amplitude $|H_{ki}|$ and a uniformly distributed phase $\angle H_{ki}$. The coefficients $H_{ki}$ are assumed to be spatially and temporally independent and identically distributed (i.i.d.). The transmitted signals $s = [s_1, \dots, s_n]^T$ are drawn from a discrete n-dimensional complex set $\mathcal{C}^n$. The communication system operates at an average Signal-to-Noise Ratio (SNR) denoted by $\rho$. Noise samples at each receive antenna, $v_k$, $k = 1, \dots, m$, are modelled as i.i.d. $\mathcal{N}(0, 1)$ random variables. With these notations, a Rayleigh memoryless vector channel can be represented by:

$$ y = \sqrt{\rho/n}\, H s + v. \tag{1} $$

The coefficient $\sqrt{\rho/n}$ ensures that the expected value of SNR at each receive antenna is equal to $\rho$, independent of the problem dimension n. Channel model (1) is quite generic and can be used to describe other communication systems. One example is a synchronous CDMA multi-access channel, where n denotes the number of users in the system.

In the sequel, we assume that the receiver has perfect knowledge of the fading matrix H. In practice H is estimated by sending training signals that are known to the receiver. Given the vector of received signals y and the channel state H, the optimal detector computes an estimate of the transmitted signals that minimizes the probability of an erroneous decision. For equiprobable input signals the minimal error probability is achieved by ML Detector, given by:

$$ s_{ML} = \arg\max_{s \in \mathcal{C}^n} p(y \mid s, H), $$

where $p(\cdot \mid \cdot)$ is a conditional probability density function and $s_{ML}$ denotes the ML estimate of the transmitted signals. For Gaussian noise this optimization problem can be stated in the form of the Integer Least Squares (ILS) problem:

$$ s_{ML} = \arg\min_{s \in \mathcal{C}^n} \| y - \sqrt{\rho/n}\, H s \|^2. \tag{2} $$


In general, this optimization problem is NP-hard 
and the discrete constraint set $\mathcal{C}^n$ of dimension n is the
source of intractability. We are interested in an efficient 
polynomial time approximation algorithm for (2) with 
theoretical performance guarantees. In the next section 
we will briefly discuss common approaches to solving 
problem (2). 
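To make the source of intractability concrete, the following minimal Python sketch solves (2) by exhaustive search for the BPSK constellation $\mathcal{C} = \{-1, +1\}$. It assumes a real-valued version of model (1), with real Gaussian H and v, which simplifies the complex setting described above. It uses NumPy, and `ml_detect_bpsk` is a hypothetical name. The loop visits all $2^n$ candidates, so every additional transmit antenna doubles the work.

```python
import itertools

import numpy as np


def ml_detect_bpsk(y, H, rho):
    """Exhaustive-search solution of the ILS problem (2) over {-1, +1}^n."""
    m, n = H.shape
    scale = np.sqrt(rho / n)
    best_s, best_cost = None, np.inf
    for bits in itertools.product((-1.0, 1.0), repeat=n):   # all 2**n candidates
        s = np.array(bits)
        cost = np.sum((y - scale * H @ s) ** 2)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s, best_cost


# Hypothetical real-valued instance of model (1).
rng = np.random.default_rng(0)
n, m, rho = 6, 8, 100.0
H = rng.standard_normal((m, n))
s_true = rng.choice([-1.0, 1.0], size=n)
y = np.sqrt(rho / n) * H @ s_true + rng.standard_normal(m)
s_ml, cost = ml_detect_bpsk(y, H, rho)
```

Assume H has full column rank and there is no noise. Then the transmitted vector is the unique candidate with zero cost, which gives a simple correctness check.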


Connection with Unconstrained Optimization 


Several strategies have been developed to overcome the high computational complexity of ML Detector. Some detectors achieve polynomial complexity by relaxing the integer constraint in the ML detection problem (2), e.g. LMMSE Detector, Decorrelator, and Matched Filter [5]. From the perspective of optimization theory, these detectors can be treated jointly: they drop the discrete constraint in (2) and impose a penalty function instead. For the BPSK constellation the relaxed problem can be written as:

$$ \hat{s} = \arg\min_{s} \| y - \sqrt{\rho/n}\, H s \|^2 + \gamma \| s \|^2. \tag{3} $$

The modified optimization problem is usually followed by a rounding procedure, which projects the optimal solution of the relaxed problem onto the set $\mathcal{C}^n$. By selecting proper values for $\gamma$, we can specialize (3) to LMMSE Detector, Decorrelator, or Matched Filter. An appealing advantage of this approach is that the relaxed problem can be solved analytically:

$$ \hat{s} = \operatorname{sign}\left( \left( \tfrac{\rho}{n} H^T H + \gamma I \right)^{-1} \sqrt{\rho/n}\, H^T y \right). \tag{4} $$

This strategy achieves complexity O(n^3) while sacrificing the BER performance.
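The closed form (4) is short to sketch. The following Python function assumes the real-valued BPSK model with unit-energy symbols and unit-variance real noise. Under that assumption, $\gamma = 0$ gives Decorrelator and $\gamma = 1$ gives the LMMSE estimate, while a very large $\gamma$ approaches Matched Filter, whose output is $\operatorname{sign}(H^T y)$. These $\gamma$ values are the standard ones for this normalization; the text itself does not state them. The sketch uses a linear solve rather than an explicit matrix inverse, and `linear_detector` is a hypothetical name.

```python
import numpy as np


def linear_detector(y, H, rho, gamma):
    """Relaxed detector (4) for the real-valued BPSK model.

    Solves (rho/n H^T H + gamma I) s = sqrt(rho/n) H^T y, the stationarity
    condition of (3), then rounds each entry with sign().
    Returns (hard decisions, soft output).
    """
    m, n = H.shape
    A = (rho / n) * (H.T @ H) + gamma * np.eye(n)
    soft = np.linalg.solve(A, np.sqrt(rho / n) * (H.T @ y))
    return np.sign(soft), soft     # note: np.sign maps an exact 0 to 0
```

The $O(n^3)$ cost quoted above comes from the linear solve. Forming $H^T H$ adds $O(mn^2)$ operations.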

Another type of detector preserves the near-ML BER while reducing the high complexity of the exhaustive search. This line of work originates in [3,16] with the algorithm to find the shortest vector on a lattice, known as Sphere Decoder. The algorithm restricts the exhaustive search to an ellipsoid centered at the zero-forcing estimate of the transmitted signals:

$$ s_{ZF} = \sqrt{n/\rho}\, (H^T H)^{-1} H^T y. $$

Different variants of this approach use various strategies for selecting the radius and for ordering the points to be searched inside the ellipsoid. In the high SNR region and for small problem sizes, Sphere Decoder empirically demonstrates fast running times [7]. However, a thorough theoretical analysis [9,10] has shown that both the worst-case and the expected complexity of this algorithm are still exponential.


Semidefinite Relaxation Strategy 


We consider an alternative approach to solving (2), based on a convex relaxation of the ML detection problem. Convexity of an optimization problem is a good indicator of problem tractability. Efficient and powerful algorithms with complexity O(n^{3.5}) have recently been developed to solve convex optimization problems (e.g. Interior-Point methods). These algorithms make efficient use of theoretically computable stopping criteria, enjoy robustness, and offer a certificate of infeasibility when no solution exists. All these properties make convex optimization methods a primary tool in various fields of engineering.

There are several generic types of convex problems, the simplest one being a Linear Program (LP), i.e. an optimization problem with a linear objective function and linear constraints. The notion of an inequality constraint in an LP generalizes naturally to a so-called Linear Matrix Inequality (LMI). In an LP the inequality is meant componentwise. The LMI $X \succeq 0$ instead means that X belongs to the cone of symmetric positive semidefinite matrices, i.e. all eigenvalues of X are non-negative. This generalization leads to a generic class of Semi-Definite Programs (SDP), which can be written in the following standard form:

$$ \begin{aligned} \min\ & Q \bullet X \\ \text{s.t.}\ & A_k \bullet X = b_k, \quad k = 1, \dots, K, \\ & X \succeq 0, \end{aligned} \tag{5} $$

where $\bullet$ denotes the inner product in the matrix space: $Q \bullet X = \operatorname{Tr}(QX)$. The class of SDP problems (5) includes Linear Programs as well as Second Order Cone Programs as special cases. Remarkably, any problem (5) in this broad class can be solved to any prescribed accuracy in polynomial time. This makes SDP a valuable asset for solving engineering problems, including filter design, control, VLSI circuit layout design, etc. [2].

In addition to its use in numerical solvers, the SDP formulation (5) is widely used for the analysis and design of approximation algorithms for NP-hard problems. Traditional approaches relax an NP-hard problem to an LP, which can be easily solved in polynomial time. With the advent of Interior-Point methods for non-linear convex optimization problems, some approximation algorithms have been significantly improved [4]. Such advanced non-linear approximation algorithms use tighter relaxations, thereby preserving most of the structure of the original NP-hard problem. The class of SDP problems is a perfect candidate for the design of approximation algorithms, since the SDP form is quite generic. The solution to the original NP-hard problem is generated from the solution of the relaxed SDP problem by a randomized or deterministic rounding procedure. For example, as will be shown later, the ML detection problem can be formulated as

$$ \begin{aligned} f_{ML} := \min\ & Q \bullet X \\ \text{s.t.}\ & X_{ii} = 1, \quad i = 1, \dots, n+1, \\ & X \succeq 0, \\ & X \text{ is rank-1}. \end{aligned} \tag{6} $$


Relaxing the rank constraint of X reduces the problem to the standard SDP form (5):

$$ \begin{aligned} f_{SDP} := \min\ & Q \bullet X \\ \text{s.t.}\ & X_{ii} = 1, \quad i = 1, \dots, n+1, \\ & X \succeq 0. \end{aligned} \tag{7} $$


A subsequent rounding procedure uses the optimal solution $X_{opt}$ of this SDP problem to generate an estimate of the transmitted signals, whose objective value is denoted $f_{SDR}$.

Since SDR Detector outputs an estimate that belongs to the feasible set of the ML detection problem, the objective value $f_{SDR}$ of SDR Detector satisfies $f_{ML} \le f_{SDR}$. For a problem in minimization form, let $f_{opt}$ denote the optimal objective value of the NP-hard problem and $f_{apr}$ the value produced by an approximation algorithm. An approximation algorithm with ratio $c \ge 1$ guarantees a solution with objective value $f_{apr} \le c f_{opt}$. The quality of SDR Detector can be measured in terms of the approximation ratio c such that:

$$ f_{ML} \le f_{SDR} \le c\, f_{ML}, \quad c \ge 1, $$

where c is independent of problem size.

Relaxation (5) was first applied to combinatorial optimization in [4], where the authors relaxed the MAX-CUT problem to an SDP problem in the standard form (5). This strategy substantially improved the approximation ratio for the MAX-CUT problem, compared with the classical relaxation to an LP. Unfortunately, we cannot pursue this approach directly. The ML detection problem minimizes the objective for a positive semidefinite matrix Q, whereas the MAX-CUT formulation maximizes it. Moreover, the ML detection problem does not admit a constant factor approximation algorithm for the worst-case realizations of H and v. From the perspective of digital communications, however, we are interested in the average performance of SDR Detector over many channel and noise realizations. It turns out that SDR Detector admits a probabilistic approximation ratio for the random channel model (1). In the high SNR region, the detection error probability typically behaves as

$$ P_e \sim e^{-\psi(\rho)}, $$

where the function $\psi(\rho)$ varies for different detectors. For example, $\psi_{ML}(\rho) = O(\rho)$ for ML Detector, and $\psi_{LMMSE}(\rho) = O(\sqrt{\rho})$ for LMMSE Detector [5]. When a suboptimal detector is deployed instead of ML Detector, the incurred BER deterioration can be expressed in terms of the log-likelihood ratio:

$$ \frac{\log P_e(\mathrm{SDR})}{\log P_e(\mathrm{ML})} \approx \frac{\psi_{SDR}(\rho)}{\psi_{ML}(\rho)}. $$

Therefore, a bound on the approximation ratio $c(\rho)$ is an essential step in bounding the SNR gap between the two detectors. Before we proceed with the probabilistic analysis of the performance, let us consider the empirical BER performance of SDR Detector in numerical simulations for channel model (1).


Bit-Error-Rate Performance 


The detector based on a semidefinite relaxation (SDR) consists of two parts: a solver of relaxation (7) and a randomized rounding procedure. The SDP in (7) can be solved efficiently using Interior Point (IP) methods with complexity O(n^{3.5}). For this purpose we use the SeDuMi optimization toolbox for Matlab. The randomized rounding procedure projects the solution of the SDP (7) onto the original discrete constraint set; it is discussed in detail in the next section.

Maximum Likelihood Detection via Semidefinite Programming, Figure 1
Bit-Error-Rate as a function of Signal-to-Noise Ratio for different detectors

Figure 1 shows a comparison of the BER per- 
formance of the SeDuMi-based SDR Detector [13], 
LMMSE Detector, Matched Filter, Decorrelator, 
Nulling and Cancelling strategy, Sphere Decoder, and 
ML Detector. We observe a significant BER improve- 
ment of SDR Detector compared to other polynomial 
complexity detectors. Sphere Decoder with adjustable 
radius search [16] delivers the BER performance of ML 
Detector (with probability 1) with running time that 
scales exponentially [9] with problem size. 

In many real-time/embedded applications the detection latency is upper bounded, and premature decisions generally cause significant BER degradation. For simulation purposes, we suppose that an engineering system is designed with BPSK modulation, operates at SNR = 10 dB and allows a detection latency of 6.3 ms per bit. Figure 2 shows the BER performance of this system under this upper bound on the detection latency. The exponential complexity of Sphere Decoder reveals itself between dimensions 40 and 60. There the BER degrades rapidly, because the running time of Sphere Decoder exceeds the fixed detection time threshold for most channel realizations. At the same time, the running time of SDR Detector scales gracefully with problem size, and in most cases the detector completes detection in time. As a result, SDR Detector does not suffer any significant BER degradation even for large problem sizes. In fact, late detections by SDR Detector do not exceed 1% for any of the dimensions shown in Fig. 2. For other values of SNR and latency per bit we obtain essentially similar curves for both detectors. This behavior reflects the exponentially growing computational effort of Sphere Decoder and the comparatively modest computational power required by SDR Detector.

Maximum Likelihood Detection via Semidefinite Programming, Figure 2
BER degradation due to the limit on detection time (curves: Sphere Decoder and SeDuMi-based SDR Detector; horizontal axis: dimension n, from 30 to 70). Simulation parameters: BPSK modulation, SNR = 10 dB and time limit per bit = 6.3 ms

In the next section we will discuss the details of the 
SDP relaxation (11) and the randomized rounding pro- 
cedure. After that we present theoretical guarantees that 
substantiate the observed empirical behavior of SDR 
Detector. 


Method 


SDR Detector consists of two components: an SDP 
solver and a randomized rounding procedure. 


SDP Solver 


Transforming the original ML detection problem (2) into the standard SDP form (5) will help us localize the part of (2) that makes the problem NP-hard. We start by homogenizing the objective function:


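$$ \| y - \sqrt{\rho/n}\, H s \|^2 = \begin{bmatrix} s \\ 1 \end{bmatrix}^T \begin{bmatrix} (\rho/n)\, H^T H & -\sqrt{\rho/n}\, H^T y \\ -\sqrt{\rho/n}\, y^T H & \|y\|^2 \end{bmatrix} \begin{bmatrix} s \\ 1 \end{bmatrix} = \rho\, \operatorname{Tr}(Q x x^T), $$

where the matrix $Q \in \mathbb{R}^{(n+1)\times(n+1)}$ and the vector $x \in \mathbb{R}^{n+1}$ are defined as

$$ Q = \begin{bmatrix} (1/n)\, H^T H & -\sqrt{1/(n\rho)}\, H^T y \\ -\sqrt{1/(n\rho)}\, y^T H & \|y\|^2/\rho \end{bmatrix}, \qquad x = \begin{bmatrix} s \\ 1 \end{bmatrix}. \tag{8} $$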
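The homogenization leading to (8) can be checked numerically. The following Python sketch assumes the same real-valued BPSK setting as above. It builds Q as in (8) for a random instance. It then checks that $\rho\, x^T Q x$ equals the least-squares objective for $x = [s^T, 1]^T$ and that Q is positive semidefinite. `build_Q` is a hypothetical helper name.

```python
import numpy as np


def build_Q(y, H, rho):
    """Matrix Q of (8) for the real-valued model y = sqrt(rho/n) H s + v."""
    m, n = H.shape
    c = np.sqrt(1.0 / (n * rho))
    Q = np.empty((n + 1, n + 1))
    Q[:n, :n] = H.T @ H / n
    Q[:n, n] = -c * (H.T @ y)
    Q[n, :n] = -c * (y @ H)
    Q[n, n] = (y @ y) / rho
    return Q
```

The design mirrors (8) block by block. In a noise-free instance, $Qx = 0$ for the transmitted vector, which is the kernel property used later in the rounding procedure.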


Notice that matrix Q is composed of parameters that are known at the receiver. We linearize the objective function by introducing a variable matrix X to comply with the standard SDP form (5):

$$ \begin{aligned} f_{ML} := \min\ & \operatorname{Tr}(QX) \\ \text{s.t.}\ & X = x x^T, \\ & X_{ii} = 1, \quad i = 1, \dots, n+1. \end{aligned} \tag{9} $$

In this problem formulation we discarded the constraint $x_{n+1} = 1$ on the last entry of vector x, because the objective $\operatorname{Tr}(Q x x^T)$ is not sensitive to the sign of x. If $x_{n+1} = -1$, we output $-x$ as the solution to (9). The constraint $X = x x^T$ is equivalent to the set $\{X \succeq 0,\ \operatorname{rank}(X) = 1\}$, where the notation $X \succeq 0$ means that matrix X is symmetric positive semidefinite. This completes the transformation of the original ML detection problem over the BPSK constellation into the equivalent form stated in (6):

$$ \begin{aligned} f_{ML} := \min\ & \operatorname{Tr}(QX) \\ \text{s.t.}\ & X_{ii} = 1, \quad i = 1, \dots, n+1, \\ & X \succeq 0, \\ & X \text{ is rank-1}. \end{aligned} \tag{10} $$


The rank-1 constraint is the only non-convex constraint in (10), and it is what makes the problem intractable. SDR Detector relaxes the rank constraint and solves the following convex optimization problem:

$$ \begin{aligned} f_{SDP} := \min\ & \operatorname{Tr}(QX) \\ \text{s.t.}\ & X_{ii} = 1, \quad i = 1, \dots, n+1, \\ & X \succeq 0. \end{aligned} \tag{11} $$

To reveal the difference between this relaxation and the one in (3), we can go one step further. We relax the set of constraints $\{X_{ii} = 1,\ i = 1, \dots, n+1\}$ into $\{\operatorname{Tr}(X) = n+1\}$ while keeping the constraint $X \succeq 0$ intact. This further relaxed problem can be solved analytically and leads to the solution

$$ \hat{s} = \left( \tfrac{\rho}{n} H^T H \right)^{-1} \sqrt{\rho/n}\, H^T y, $$

which is exactly the soft output of Decorrelator (4) with $\gamma = 0$. The relaxation in (11) compares favorably with the relaxations in (3) because it requires fewer modifications of the ML problem. Its complexity O(n^{3.5}) is, however, higher than the O(n^3) of the detectors in (3).

Since we dropped the rank constraint in (11), a solution $X_{opt}$ of (11) is in general no longer rank-1. Hence we need to project $X_{opt}$ onto the feasible set of the original ML detection problem. Such a projection is usually done by a rounding procedure, which can be either deterministic, as in (4), or randomized [13]. It can also vary depending on the processing power available for the algorithm. In the next section we consider a randomized rounding procedure based on the principal eigenvector of matrix $X_{opt}$.


Randomized Rounding Procedure 


There are various rounding procedures that can be used to extract a rank-1 approximation of $X_{opt}$. Widely used approaches and their analysis can be found in [4,13,14]. For our purposes we consider the randomized strategy based on the principal eigenvector of matrix $X_{opt}$. Notice that in the noise-free case we have $v = 0$. The vector $x = [s^T, 1]^T$ built from the transmitted signals then belongs to the kernel of matrix Q defined in (8). The optimal objective value is 0 and is achieved by the vector of transmitted signals s. Thus, in the noise-free case, the optimal solution of problem (11) is a rank-1 matrix:

$$ X_{opt} = \begin{bmatrix} s \\ 1 \end{bmatrix} \begin{bmatrix} s^T & 1 \end{bmatrix}. $$

This structure of $X_{opt}$ in the noise-free case suggests that, in the high SNR region, the principal component of the eigen-decomposition carries the most reliable information on the transmitted signals. It turns out that $X_{opt}$ has a strong principal component even in the low SNR region, which justifies the randomized rounding procedure presented below:

• INPUT: Solution $X_{opt}$ of (11), and the number D of randomized rounding tries.

• OUTPUT: Quasi-ML estimate $s_{SDR}$ and the best achieved objective value $f_{SDR}$.

• RANDOMIZED ROUNDING PROCEDURE:

1. Take a spectral decomposition $X_{opt} = \sum_{i=1}^{n+1} \lambda_i u_i u_i^T$ and set $v_i = \sqrt{\lambda_i}\, u_i$, $i = 1, \dots, n+1$.

2. Pick the $v_i$ corresponding to the principal eigenvector, $v^{max} = \arg\max_{1 \le i \le n+1} \{\|v_i\|\}$.

3. For each entry $x_i$ define the Bernoulli distribution:

$$ \Pr\{x_i = +1\} = \frac{1 + v_i^{max}}{2}, \qquad \Pr\{x_i = -1\} = \frac{1 - v_i^{max}}{2}, \tag{12} $$

where $v_i^{max}$ denotes the ith entry of vector $v^{max}$. These are valid probabilities, since $X_{ii} = \sum_j \lambda_j (u_j)_i^2 = 1$ implies $|v_i^{max}| \le 1$.

4. Generate a fixed number D of i.i.d. (n+1)-dimensional vector samples $x_d$, $d = 1, \dots, D$, such that each entry $(x_d)_i$, $i = 1, \dots, n+1$, is drawn from distribution (12).

5. For all D samples, set $x_d := -x_d$ if the (n+1)-st entry of $x_d$ is equal to $-1$.

6. Pick $x_{SDR} := \arg\min_d x_d^T Q x_d$ and set the best achieved objective value $f_{SDR} := x_{SDR}^T Q x_{SDR}$.

7. Return $f_{SDR}$ and $s_{SDR}$, which is given by the vector $x_{SDR}$ with the last bit discarded.

This randomized rounding procedure is designed to ensure that the output $s_{SDR}$ equals the vector of transmitted signals with high probability. Whenever there is an error, the procedure selects $s_{SDR}$ so as to reduce the number of bits in error.
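Steps 1 to 7 translate almost line for line into NumPy. The sketch below assumes that an SDP solver has already produced $X_{opt}$; the text uses SeDuMi, and the sketch does not solve (11) itself. It takes the principal component from `numpy.linalg.eigh`, which returns eigenvalues in ascending order. It clips the probabilities in (12) to [0, 1] to absorb round-off, and ties in step 6 go to the first sample. `sdr_round` is a hypothetical name, and the random generator is passed in explicitly.

```python
import numpy as np


def sdr_round(X_opt, Q, D, rng):
    """Randomized rounding of an SDP solution X_opt of (11), steps 1-7."""
    # Steps 1-2: ||v_i|| = sqrt(lambda_i), so v_max belongs to the largest eigenvalue,
    # which numpy.linalg.eigh places last.
    lam, U = np.linalg.eigh(X_opt)
    v_max = np.sqrt(max(lam[-1], 0.0)) * U[:, -1]
    # Step 3: Pr{x_i = +1} = (1 + v_i)/2; clip guards against round-off outside [0, 1].
    p_plus = np.clip((1.0 + v_max) / 2.0, 0.0, 1.0)
    # Step 4: D i.i.d. samples, one per row.
    samples = np.where(rng.random((D, v_max.size)) < p_plus, 1.0, -1.0)
    # Step 5: flip each sample so that its (n+1)-st entry equals +1.
    samples *= samples[:, -1:]
    # Step 6: keep the sample with the smallest objective x^T Q x.
    costs = np.einsum("di,ij,dj->d", samples, Q, samples)
    best = int(np.argmin(costs))
    # Step 7: drop the last entry to obtain the signal estimate.
    return samples[best, :-1], costs[best]
```

In the noise-free case, $X_{opt}$ is the rank-1 matrix shown above, and (12) becomes deterministic. Every sample then equals $x = [s^T, 1]^T$, possibly up to a global sign that step 5 removes. The procedure therefore returns s with objective value 0.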


Cases 
Performance of SDR Detector 


Constant Factor Optimality of SDR Detector The 
core component of SDR Detector is an approximation 
algorithm based on the convex relaxation (11) of the 
original ML detection problem. In this section we an- 
alyze the approximation ratio of this algorithm. 

A technique pioneered in [4] is widely used in the optimization literature to derive constant factor optimality guarantees for SDP-based relaxations. Once the optimal solution $X_{opt}$ of problem (11) has been obtained, the randomized rounding procedure used in [4] defines the Gaussian distribution $\mathcal{N}(0, X_{opt})$ (compare with (12)). It then implements the n-dimensional $\operatorname{sign}(\cdot)$ operator with uniformly generated cutting hyperplanes:

• Generate D i.i.d. samples $\xi_1, \dots, \xi_D$ from the Gaussian distribution $\mathcal{N}(0, X_{opt})$.

• Let $x_i = \operatorname{sign}(\xi_i)$ and select the solution $x_{SDR}$ that achieves the minimum:

$$ f_{SDR} := x_{SDR}^T Q x_{SDR} = \min_i x_i^T Q x_i. $$
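The two bullets above can be sketched in a few lines, assuming $X_{opt}$ is already available. The sketch factors $X_{opt} = LL^T$ through an eigendecomposition, clipping tiny negative eigenvalues caused by round-off. It then draws $\xi = Lz$ with z standard normal. It maps $\operatorname{sign}(0)$ to $+1$, a convention the text does not specify. `hyperplane_round` is a hypothetical name.

```python
import numpy as np


def hyperplane_round(X_opt, Q, D, rng):
    """Rounding of [4]: x = sign(xi), xi ~ N(0, X_opt); keep the best of D samples."""
    lam, U = np.linalg.eigh(X_opt)
    L = U * np.sqrt(np.clip(lam, 0.0, None))       # X_opt = L @ L.T
    xi = rng.standard_normal((D, X_opt.shape[0])) @ L.T   # rows ~ N(0, X_opt)
    x = np.where(xi >= 0.0, 1.0, -1.0)             # sign(), mapping 0 to +1
    costs = np.einsum("di,ij,dj->d", x, Q, x)
    best = int(np.argmin(costs))
    return x[best], costs[best]
```

Unlike the eigenvector-based procedure, this rounding does not fix the sign of the last entry. A caller would flip the output if its (n+1)-st entry is $-1$.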
1 


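The best objective value $f_{SDR}$ achieved with this randomized rounding procedure can be bounded from above as follows [4]:

$$ \begin{aligned} E\{f_{SDR}\} &= E\{x_{SDR}^T Q x_{SDR}\} \\ &\le E\{x_i^T Q x_i\} \\ &= \operatorname{Tr}\left(Q\, E\{x_i x_i^T\}\right) \\ &= \frac{2}{\pi} \operatorname{Tr}\left(Q \arcsin(X_{opt})\right), \end{aligned} \tag{13} $$

where the inequality above holds in probability for sufficiently many samples D. The last equality follows, with arcsin applied entrywise, from the following fact. For any two jointly Gaussian scalar random variables $\xi_i$ and $\xi_j$, each drawn from $\mathcal{N}(0, 1)$, we have:

$$ E\{\operatorname{sign}(\xi_i) \operatorname{sign}(\xi_j)\} = \frac{2}{\pi} \arcsin\left(E\{\xi_i \xi_j\}\right). $$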
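The identity behind the last equality in (13) can be checked by simulation. The following Python sketch draws pairs of standard Gaussians with correlation t. It compares the empirical mean of $\operatorname{sign}(\xi_i)\operatorname{sign}(\xi_j)$ with $(2/\pi)\arcsin(t)$. The sample size and seed are arbitrary choices, and `sign_correlation` is a hypothetical name.

```python
import numpy as np


def sign_correlation(t, num_samples, rng):
    """Empirical E{sign(xi_i) sign(xi_j)} for standard Gaussians with correlation t."""
    cov = np.array([[1.0, t], [t, 1.0]])
    xi = rng.multivariate_normal(np.zeros(2), cov, size=num_samples)
    return np.mean(np.sign(xi[:, 0]) * np.sign(xi[:, 1]))


rng = np.random.default_rng(1)
for t in (-0.9, -0.3, 0.0, 0.5, 0.95):
    empirical = sign_correlation(t, 200_000, rng)
    predicted = 2.0 / np.pi * np.arcsin(t)
    print(f"t = {t:+.2f}: empirical {empirical:+.4f}, predicted {predicted:+.4f}")
```

With 200,000 samples, the Monte Carlo standard error is at most about 0.0022. The printed pairs should therefore agree to roughly two decimal places.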


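Consider the Taylor expansion of $\arcsin(Y)$, applied entrywise, whose coefficients beyond the linear term are all positive. Combined with the fact that Hadamard powers of a positive semidefinite matrix are positive semidefinite (the Schur product theorem), it shows the following. For any matrix Y such that $Y \succeq 0$ and $Y_{ii} = 1$, we have:

$$ \arcsin(Y) \succeq Y. \tag{14} $$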
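Inequality (14) can be spot-checked numerically. The following Python sketch generates a random correlation matrix Y, i.e. a positive semidefinite matrix with unit diagonal, as the Gram matrix of random unit vectors. It applies arcsin entrywise and confirms that the smallest eigenvalue of $\arcsin(Y) - Y$ is non-negative up to round-off. This is a sanity check on an arbitrary random construction, not a proof. The function names are hypothetical.

```python
import numpy as np


def random_correlation_matrix(n, rank, rng):
    """PSD matrix with unit diagonal: Gram matrix of random unit vectors."""
    V = rng.standard_normal((n, rank))
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit-norm rows give Y_ii = 1
    return V @ V.T


def min_eig_gap(Y):
    """Smallest eigenvalue of arcsin(Y) - Y, with arcsin applied entrywise."""
    Z = np.arcsin(np.clip(Y, -1.0, 1.0)) - Y        # clip absorbs round-off near +-1
    return np.linalg.eigvalsh((Z + Z.T) / 2).min()


rng = np.random.default_rng(2)
for rank in (1, 2, 5, 10):
    Y = random_correlation_matrix(10, rank, rng)
    print(rank, min_eig_gap(Y))
```

The rank-1 case is an instructive boundary. There every entry of Y is $\pm 1$, so $\arcsin(Y) = (\pi/2)Y$ and the gap matrix $(\pi/2 - 1)Y$ is singular but still positive semidefinite.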


Suppose that $Q \preceq 0$; then we have the following upper bound:

$$\operatorname{Tr}\big(Q \arcsin(X_{opt})\big) \le \operatorname{Tr}(Q X_{opt}), \qquad (15)$$

which allows us to bound $f_{SDR}$ within a constant factor of $f_{ML}$ (all quantities in this chain are nonpositive because $Q \preceq 0$):

$$f_{ML} \le \mathbb{E}\{f_{SDR}\} \overset{p}{\le} \frac{2}{\pi}\operatorname{Tr}(Q X_{opt}) = \frac{2}{\pi} f_{SDP} \le \frac{2}{\pi} f_{ML},$$

where the first inequality holds because the output of SDR Detector belongs to the feasible set of the ML problem (10), the second inequality follows from (13) combined with (15), the equality is the definition of $f_{SDP}$, and the last inequality holds because the SDP problem (11) is a relaxation of the ML problem (10).
Therefore, given $Q \preceq 0$, we obtain a $2/\pi$-approximation ratio for the algorithm. Unfortunately, for the ML detection problem the reverse inequality holds; by (8),

$$Q = \begin{bmatrix} \frac{1}{n} H^T H & -\frac{1}{\sqrt{n\rho}}\, H^T y \\ -\frac{1}{\sqrt{n\rho}}\, y^T H & \frac{\|y\|^2}{\rho} \end{bmatrix} \succeq 0 .$$

We can attempt to cure the problem with an inequality similar to (14) in the reverse direction, for some constant $c$:

$$\arcsin(Y) \preceq cY \quad \text{for all } Y \succeq 0 \text{ with } Y_{ii} = 1 .$$


For this inequality to hold, $c$ must grow linearly with the problem dimension $n$. Hence, in the limit $n \to \infty$ the constant $c$, together with the approximation ratio of the algorithm, grows unbounded. That is, we cannot obtain a constant factor approximation by applying the standard technique of [4] to the analysis of the SDP relaxation in (11).

The technique presented above applies to any negative semidefinite matrix $Q$; hence, in the context of suboptimal detection, it attempts to obtain a constant factor optimality for the worst-case channel realization. However, from the perspective of digital communications, we are interested in the average performance of SDR Detector over many channel realizations. Unlike the technique discussed above, a probabilistic analysis of the Karush-Kuhn-Tucker (KKT) optimality conditions of the semidefinite problem (11) allows us to claim a constant factor optimality for SDR Detector in probability [11].

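The optimal objective value $f_{SDR}$ achieved by SDR Detector is within a constant factor $c(\rho, \gamma)$ of the optimal ML objective value in probability:

$$\lim_{\substack{n, m \to \infty \\ m/n \to \gamma > 1}} \mathbb{P}\left[\frac{f_{SDR}}{f_{ML}} \le c(\rho, \gamma)\right] = 1, \qquad (16)$$

where $c(\rho, \gamma) = 1 + \dfrac{2(1+\sqrt{\gamma})\beta}{\rho^{\alpha} - 1}$ and the constants $\{\alpha, \beta\}$ are given by (17).

The statement implies that the log-likelihood ratio of SDR and ML Detectors is bounded in probability by a constant that depends only on the SNR $\rho$ and the limiting ratio $\gamma = m/n$, and not on the problem size.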


Performance of SDR Detector in High SNR Region
We have argued in Sect. “Randomized Rounding Procedure” that the selected randomized rounding procedure provides the optimal solution in the noise-free case. The optimality condition can be extended to the case of large finite SNR: for sufficiently high SNR, SDR Detector solves the ML detection problem in polynomial time.

For given system dimension $n$ and SNR $\rho$ (both finite), the solution $X_{opt}$ of the relaxed problem (11) is rank-1 if the channel matrix $H$ and noise $v$ realizations satisfy:

$$\lambda_{\min}(H^T H) > \ldots \qquad (18)$$

Since the random matrix $H^T H$ is full rank with probability 1, this claim can also be interpreted as follows: for any given $n$ there exists a sufficiently high (finite) SNR level such that (18) holds and $X_{opt}$ is rank-1. In general, if (18) does not hold, $X_{opt}$ may still be rank-1. Notice that if condition (18) is satisfied, the solution of the SDP problem (11) belongs to the feasible set of (10); thus, $X_{opt}$ is also the solution of the ML detection problem. Hence, under the specified conditions SDR Detector solves the original ML detection problem.

The asymptotic performance of SDR Detector for fixed problem size and $\rho \to \infty$ has been analyzed in [8], where it is shown that for Rayleigh fading $H$, SDR Detector achieves maximum diversity, i.e.,

$$\lim_{\rho \to \infty} \frac{\log \mathbb{P}\{s_{SDR} \ne s\}}{\log \rho} = \lim_{\rho \to \infty} \frac{\log \mathbb{P}\{s_{ML} \ne s\}}{\log \rho} = -\frac{n}{2}.$$

Simulation Results


In this section we compare the running time and the BER performance of various implementations of the detectors based on the semidefinite relaxation (11) with those of Sphere Decoder:

• SDP detector [13], implemented with the SeDuMi toolbox [15] for convex optimization problems.
• A detector based on a dual-scaling interior-point method (DSDP implementation [1]) and a dimension reduction strategy [12].
• SDR Detector [12], implemented with a dual-scaling interior-point method, a dimension reduction strategy, and warm start with a truncated version of Sphere Decoder.
• Sphere Decoder [16].

Figures 3 and 4 show the average running time and the BER performance achieved by the above detectors for problem size $n = 60$. Notice that the running time of the DSDP-based (SeDuMi-based) detector is insensitive to SNR, and its BER performance shows a 1-dB (2-dB) SNR loss. Sphere Decoder is faster than the semidefinite relaxation-based detectors in the high SNR regime but becomes significantly slower for SNR lower than 10 dB. SDR Detector matches the speed of Sphere Decoder in the high SNR region, matches the running time of the other semidefinite relaxation-based detectors in the low SNR regime, and enjoys near-ML BER performance.

Maximum Likelihood Detection via Semidefinite Programming, Figure 3
Running time comparison, n = 60 (curves: SDR Detector, DSDP with dimension reduction, Sphere Decoder, SeDuMi-based SDP Detector)

Maximum Likelihood Detection via Semidefinite Programming, Figure 4
Bit-error-rate comparison, n = 60 (curves include SDR Detector and DSDP with dimension reduction; horizontal axis: SNR, dB)
Figures 5 and 6 compare the average running time for large problems and in the low SNR region. The running time of the polynomial complexity detectors (SDR Detector, SeDuMi- and DSDP-based detectors) scales well in both regimes, remaining in the sub-second region, while the running time of Sphere Decoder deteriorates in both scenarios.

Maximum Likelihood Detection via Semidefinite Programming, Figure 5
Running time for large problems, ρ = 10 dB

Maximum Likelihood Detection via Semidefinite Programming, Figure 6
Running time in low SNR regime, n = 40


Conclusions 


We have considered the maximum likelihood detection problem. Among various quasi-ML detectors, SDR Detector offers near-optimal BER performance with worst-case polynomial complexity. We have analyzed the underlying structure of the SDP relaxation, which is the core of SDR Detector. For a given SNR, SDR Detector delivers a constant factor approximation of the log-likelihood ratio for the original ML detection problem in probability, where the constant factor is independent of the problem size. SDR Detector solves the ML detection problem exactly in the high SNR region. Numerical simulations of BER and running time empirically demonstrate the advantages of SDR Detector as compared to the computationally expensive ML Detector.
The maximum partition matching problem was introduced recently in the study of routing schemes on interconnection networks [2]. In this article, we study the basic properties of the problem. An efficient algorithm for the maximum partition matching problem is presented.
Definitions and Motivation


Let $S = \{C_1, \ldots, C_k\}$ be a collection of subsets of the universal set $U = \{1, \ldots, n\}$ such that $\bigcup_{i=1}^{k} C_i = U$ and $C_i \cap C_j = \emptyset$ for all $i \ne j$. A partition $(A, B)$ of $S$ pairs two elements $a$ and $b$ in $U$ if $a$ is contained in a subset in $A$ and $b$ is contained in a subset in $B$. A partition matching (of order $m$) of $S$ consists of two ordered subsets $L = \{a_1, \ldots, a_m\}$ and $R = \{b_1, \ldots, b_m\}$ of $m$ elements of $U$ (the subsets $L$ and $R$ need not be disjoint), together with a sequence of $m$ distinct partitions of $S$, $(A_1, B_1), \ldots, (A_m, B_m)$, such that for all $i = 1, \ldots, m$, the partition $(A_i, B_i)$ pairs the elements $a_i$ and $b_i$. The maximum partition matching problem is to construct a partition matching of order $m$ for a given collection $S$ with $m$ maximized.
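The definition can be made concrete with a small checker. In this sketch, which is an illustration and not part of the original article, a partition $(A, B)$ of $S$ is encoded as a $k$-bit integer `mask` whose bit $i$ is set exactly when the $i$-th set (0-based) belongs to $A$, and a candidate partition matching is a list of triples `(a, b, mask)`; the function name `is_partition_matching` is hypothetical.

```python
def is_partition_matching(collection, triples):
    """Check the definition of a partition matching.

    collection: list of pairwise disjoint lists (the sets C_i, 0-based here).
    triples: list of (a, b, mask); bit i of mask is 1 iff the i-th set is in A.
    """
    owner = {x: i for i, c in enumerate(collection) for x in c}  # element -> index of its set
    lefts = [a for a, _, _ in triples]
    rights = [b for _, b, _ in triples]
    masks = [m for _, _, m in triples]
    if len(set(lefts)) != len(lefts) or len(set(rights)) != len(rights):
        return False  # L and R are sets: no element is left- or right-paired twice
    if len(set(masks)) != len(masks):
        return False  # the m partitions must be distinct
    for a, b, m in triples:
        if not (m >> owner[a]) & 1:  # a must lie in a subset belonging to A
            return False
        if (m >> owner[b]) & 1:  # b must lie in a subset belonging to B
            return False
    return True


S = [[1], [2], [3]]
print(is_partition_matching(S, [(1, 2, 0b001), (2, 3, 0b010), (3, 1, 0b100)]))  # True
print(is_partition_matching(S, [(1, 2, 0b001), (2, 3, 0b001)]))  # False: partition reused
```

The first example is the cyclic matching $\pi[(1,2),(2,3),(3,1)]$ on $S = \{\{1\}, \{2\}, \{3\}\}$, which pairs every element on both sides.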

The maximum partition matching problem arises in connection with the parallel routing problem in interconnection networks, in particular in the study of the star networks [1], which are attractive alternatives to the popular hypercube networks. It can be shown that constructing an optimal parallel routing scheme in the star networks can be effectively reduced to the maximum partition matching problem. Readers interested in this connection are referred to [2] for a detailed discussion.

The maximum partition matching problem can be formulated in terms of the 3-dimensional matching problem as follows: given an instance $S = \{C_1, \ldots, C_k\}$ of the maximum partition matching problem, we construct an instance $M$ of the 3-dimensional matching problem such that a triple $(a, b, P)$ is contained in $M$ if and only if the partition $P$ of $S$ pairs the elements $a$ and $b$. However, since the number of partitions of the collection $S$ can be as large as $2^n$ and the 3-dimensional matching problem is NP-hard [4], this reduction does not suggest a polynomial time algorithm for the maximum partition matching problem.

In the rest of this article, we study the basic properties of the maximum partition matching problem and present an algorithm of running time $O(n^2 \log n)$ for the problem. We first introduce the terminology that will be used in our discussion.

Let $\pi = (L, R, (A_1, B_1), \ldots, (A_m, B_m))$ be a partition matching of the collection $S$, where $L = \{a_1, \ldots, a_m\}$ and $R = \{b_1, \ldots, b_m\}$. We will say that the partition $(A_i, B_i)$ left-pairs the element $a_i$ and right-pairs the element $b_i$. An element $a$ is said to be left-paired if it is in the set $L$; otherwise, the element $a$ is left-unpaired. Similarly we define right-paired and right-unpaired elements. The collections $A_i$ and $B_i$ are called the left-collection and right-collection of the partition $(A_i, B_i)$. The partition matching $\pi$ may also be written as $\pi[(a_1, b_1), \ldots, (a_m, b_m)]$ if the corresponding partitions are implied.

For the rest of this article, we assume that $U = \{1, \ldots, n\}$ and that $S = \{C_1, \ldots, C_k\}$ is a collection of pairwise disjoint subsets of $U$ such that $\bigcup_{i=1}^{k} C_i = U$.


Case I. Via Pre-Matching when ||S|| is Large 


A necessary condition for two ordered subsets $L = \{a_1, \ldots, a_m\}$ and $R = \{b_1, \ldots, b_m\}$ of $U$ to form a partition matching for the collection $S$ is that $a_i$ and $b_i$ belong to different subsets in the collection $S$, for all $i = 1, \ldots, m$. We say that the two ordered subsets $L$ and $R$ of $U$ form a pre-matching $\sigma = \{(a_i, b_i) : 1 \le i \le m\}$ if $a_i$ and $b_i$ do not belong to the same subset in the collection $S$, for all $i = 1, \ldots, m$. The pre-matching $\sigma$ is maximum if $m$ is the largest among all pre-matchings of $S$.

A maximum pre-matching can be constructed efficiently by the algorithm pre-matching given below, where we say that a set is singular if it consists of a single element. See [3] for a proof of the correctness of the algorithm.


Input: the collection $S = \{C_1, \ldots, C_k\}$ of subsets of $U$
Output: a maximum pre-matching $\sigma$ in $S$

1. $T = S$; $\sigma = \emptyset$;
2. WHILE $T$ contains more than one set but does not consist of exactly three singular sets
   DO
   2.1. pick two sets $C$ and $C'$ of largest cardinality in $T$;
   2.2. pick an element $a$ in $C$ and an element $b$ in $C'$;
   2.3. $\sigma = \sigma \cup \{(a, b), (b, a)\}$;
   2.4. $C = C - \{a\}$; $C' = C' - \{b\}$;
   2.5. if $C$ or $C'$ is empty now, delete it from $T$;
3. IF $T$ consists of exactly three singular sets $C_1 = \{a_1\}$, $C_2 = \{a_2\}$, and $C_3 = \{a_3\}$
   THEN $\sigma = \sigma \cup \{(a_1, a_2), (a_2, a_3), (a_3, a_1)\}$.

Algorithm pre-matching
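A direct Python transcription of the algorithm pre-matching may help. It is a sketch that follows the steps literally rather than efficiently (it re-sorts $T$ in every iteration, which is fine for small inputs but is not the organization analyzed in [3]); the list-of-lists representation of $S$ and the function name `pre_matching` are assumptions made for illustration.

```python
def pre_matching(collection):
    """Greedy maximum pre-matching, following algorithm pre-matching step by step.

    collection: list of pairwise disjoint, non-empty lists of elements.
    Returns sigma as a list of ordered pairs (a, b) with a and b in different sets.
    """

    def three_singular(T):
        return len(T) == 3 and all(len(c) == 1 for c in T)

    T = [list(c) for c in collection if c]  # step 1: working copy of S, sigma empty
    sigma = []
    while len(T) > 1 and not three_singular(T):  # step 2
        T.sort(key=len, reverse=True)  # step 2.1: two sets of largest cardinality first
        a, b = T[0].pop(), T[1].pop()  # steps 2.2 and 2.4: pick and remove a, b
        sigma += [(a, b), (b, a)]  # step 2.3
        T = [c for c in T if c]  # step 2.5: delete emptied sets
    if three_singular(T):  # step 3: close the three singletons into a cycle
        (a1,), (a2,), (a3,) = T
        sigma += [(a1, a2), (a2, a3), (a3, a1)]
    return sigma


print(pre_matching([[1, 2, 3], [4, 5], [6]]))
# [(3, 5), (5, 3), (2, 4), (4, 2), (1, 6), (6, 1)]
```

Each element enters $L$ and $R$ at most once, because every pair $(a, b)$ added in step 2.3 is accompanied by its reverse $(b, a)$ and both elements are removed from $T$.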
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In the following, we show that when the cardinality 
of the collection S is large enough, a maximum parti- 
tion matching of S can be constructed from the maxi- 
mum pre-matching o produced by the algorithm pre- 
matching. 

Suppose that the collection S consists of k subsets 
Ci,..., Cy and 2‘ > 4n. The pre-matching o contains at 
most n pairs. Let (a, b) be a pair in o and let C and C’ 
be two arbitrary subsets in S such that C contains a and 
C’ contains b. Note that the number of partitions (A, B) 
of § such that C is in A and C’ is in B is equal to 2'~? > 
n. Therefore, at least one such partition can be used to 
left-pair a and right-pair b. This observation results in 
the following theorem. 


Theorem 1 Let $S = \{C_1, \ldots, C_k\}$ be a collection of nonempty subsets of the universal set $U = \{1, \ldots, n\}$ such that $\bigcup_{i=1}^{k} C_i = U$ and $C_i \cap C_j = \emptyset$ for $i \ne j$. If $2^k > 4n$, then a maximum partition matching in $S$ can be constructed in time $O(n^2)$.


Proof Consider the following algorithm partition-matching-I.

Input: the collection $S = \{C_1, \ldots, C_k\}$ of subsets of $U$
Output: a partition matching $\pi$ in $S$

1. construct a maximum pre-matching $\sigma$ of $S$;
2. FOR each pair $(a, b)$ in $\sigma$ DO
   use an unused partition of $S$ to pair $a$ and $b$.

Algorithm partition-matching-I


Suppose the pre-matching $\sigma$ constructed in step 1 is $\sigma = \{(a_1, b_1), \ldots, (a_m, b_m)\}$. According to the above discussion, for each pair $(a_i, b_i)$ in $\sigma$, there is always an unused partition of $S$ that left-pairs $a_i$ and right-pairs $b_i$. Therefore, step 2 of the algorithm partition-matching-I is valid and constructs a partition matching $\pi$ for the collection $S$. Since each partition matching for $S$ induces a pre-matching in $S$ and $\sigma$ is a maximum pre-matching, we conclude that the partition matching $\pi$ is a maximum partition matching for the collection $S$.

By carefully organizing the elements in $U$ and the partitions of $S$, we can show that the algorithm partition-matching-I runs in time $O(n^2)$. See [3].
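Step 2 of partition-matching-I can be sketched as follows. This is an illustration only: it scans the $2^k$ partitions naively for each pair (so it does not achieve the $O(n^2)$ bound of Theorem 1), it encodes a partition as a $k$-bit mask whose bit $i$ is set when the $i$-th set (0-based) lies in the left-collection, and it expects a pre-matching $\sigma$ as input, for instance one produced by the algorithm pre-matching. The name `partition_matching_I` is hypothetical.

```python
def partition_matching_I(collection, sigma):
    """Give every pre-matched pair (a, b) its own unused partition (A, B).

    Partitions are k-bit masks: bit i set <=> the i-th set of the collection is in A.
    When 2**k > 4n there are 2**(k-2) > n candidate partitions per pair, so the
    search below never fails (see the counting argument before Theorem 1).
    """
    k = len(collection)
    owner = {x: i for i, c in enumerate(collection) for x in c}  # element -> set index
    used = set()
    matching = []
    for a, b in sigma:
        i, j = owner[a], owner[b]  # i != j because sigma is a pre-matching
        for mask in range(2 ** k):
            if mask not in used and (mask >> i) & 1 and not (mask >> j) & 1:
                used.add(mask)
                matching.append((a, b, mask))
                break
        else:
            raise ValueError(f"no unused partition left for pair {(a, b)}")
    return matching


S = [[1], [2], [3], [4], [5]]  # k = 5, n = 5, and 2**5 = 32 > 20
sigma = [(1, 2), (2, 1), (3, 4), (4, 5), (5, 3)]
for a, b, mask in partition_matching_I(S, sigma):
    print(a, b, format(mask, "05b"))
```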


Case II. Via Greedy Method when ||S|| is Small


Now we consider the case $2^k \le 4n$. Since the number $2^k$ of partitions of the collection $S$ is small, we can apply a greedy strategy that expands a current partition matching by trying to add each of the unused partitions to the partition matching. We show in this section that a careful use of this greedy method constructs a maximum partition matching for the given collection.

Suppose we have a partition matching $\pi = \pi[(a_1, b_1), \ldots, (a_h, b_h)]$ and want to expand it. The partitions of the collection $S$ can then be classified into two classes: $h$ of the partitions are used to pair the $h$ pairs $(a_i, b_i)$, $i = 1, \ldots, h$, and the remaining $2^k - h$ partitions are unused. Now if there is an unused partition $P = (A, B)$ such that there is a left-unpaired element $a$ in $A$ and a right-unpaired element $b$ in $B$, then we simply pair the element $a$ with the element $b$ using the partition $P$, thus expanding the partition matching $\pi$.
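The simple expansion step just described can be sketched on its own. The following Python illustration implements only this first branch of the greedy method, without any re-pairing of used partitions, so its result is a partition matching but not necessarily a maximum one; the bit-mask encoding of partitions (bit $i$ set when the $i$-th set, 0-based, is in $A$) and the name `greedy_pairing` are assumptions made for this sketch.

```python
def greedy_pairing(collection):
    """Pair a left-unpaired a in A with a right-unpaired b in B through unused partitions (A, B).

    No re-pairing of used partitions is attempted. A single pass over the 2**k
    partitions suffices for this rule: the sets of unpaired elements only shrink,
    so a partition that cannot be used now cannot become usable later.
    """
    k = len(collection)
    owner = {x: i for i, c in enumerate(collection) for x in c}  # element -> set index
    left_free, right_free = set(owner), set(owner)
    matching = []
    for mask in range(2 ** k):
        a = next((x for x in sorted(left_free) if (mask >> owner[x]) & 1), None)
        b = next((x for x in sorted(right_free) if not (mask >> owner[x]) & 1), None)
        if a is not None and b is not None:
            matching.append((a, b, mask))
            left_free.discard(a)
            right_free.discard(b)
    return matching


print(greedy_pairing([[1], [2], [3]]))  # [(1, 2, 1), (2, 1, 2)]
```

On $S = \{\{1\}, \{2\}, \{3\}\}$ the sketch stops after two pairs, although the cyclic matching $\pi[(1,2),(2,3),(3,1)]$ has three. The unused partition $(\{C_3\}, \{C_1, C_2\})$ has the left-unpaired element 3 but no right-unpaired element, while the used partition $(\{C_2\}, \{C_1, C_3\})$ that pairs $(2, 1)$ has the right-unpaired element 3 in its right-collection; re-pairing it as $(2, 3)$ frees element 1, and $(\{C_3\}, \{C_1, C_2\})$ can then pair $(3, 1)$.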

Now suppose that there is no such unused partition, i.e., for all unused partitions $(A, B)$, either $A$ contains no left-unpaired elements or $B$ contains no right-unpaired elements. This does not necessarily imply that the current partition matching is maximum. For example, suppose that $(A, B)$ is an unused partition such that there is a left-unpaired element $a$ in $A$ but no right-unpaired elements in $B$. Assume further that there is a used partition $(A', B')$ that pairs the elements $(a', b')$, such that the element $b'$ is in $B$ and there is a right-unpaired element $b$ in $B'$. Then we can let the partition $(A', B')$ pair the elements $(a', b)$, and then let the partition $(A, B)$ pair the elements $(a, b')$, thus expanding the partition matching $\pi$. The explanation of this process is that the used partitions have been incorrectly used to pair elements; thus, in order to construct a maximum partition matching, we must re-pair some of the elements. To further investigate this relation, we need to introduce some notation.

For a used partition P of S, we put an underline on 
a set in the left-collection (resp. the right-collection) of 
P to indicate that an element in the set is left-paired 
(resp. right-paired) by the partition P. The sets will be 
called the left-paired set and the right-paired set of the 
partition P, respectively. 


Definition 2 A used partition $P$ is directly left-reachable from a partition $P_1 = (A_1, B_1)$ if the left-paired set of $P$ is contained in $A_1$ (the partition $P_1$ can be either used or unused). The partition $P$ is directly right-reachable from a partition $P_2 = (A_2, B_2)$ if the right-paired set of $P$ is contained in $B_2$. A partition $P_s$ is left-reachable (resp. right-reachable) from a partition $P_1$ if there are partitions $P_2, \ldots, P_{s-1}$ such that $P_i$ is directly left-reachable (resp. directly right-reachable) from $P_{i-1}$, for all $i = 2, \ldots, s$.


The left-reachability and the right-reachability are tran- 
sitive relations. 
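Left-reachability can be computed by a breadth-first search over the partitions. The sketch below is an illustration under stated assumptions, not the data structure of [3]: partitions are bit masks (bit $i$ set when the $i$-th set, 0-based, is in the left-collection), the used partitions are given as a dictionary `left_paired_set` mapping each used partition to the 0-based index of its left-paired set, and the names `left_reachable` and `chain_to` are hypothetical. Right-reachability is symmetric, with the right-paired set tested against the right-collection.

```python
from collections import deque


def left_reachable(start, left_paired_set):
    """Breadth-first search for the used partitions left-reachable from `start`.

    start: mask of the starting partition (used or unused).
    left_paired_set: {mask of a used partition: index of its left-paired set}.
    Returns {partition: predecessor}, a BFS tree rooted at start.
    """
    parent = {}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for p, i in left_paired_set.items():
            # p is directly left-reachable from cur if its left-paired set lies in cur's A
            if p != start and p not in parent and (cur >> i) & 1:
                parent[p] = cur
                queue.append(p)
    return parent


def chain_to(parent, start, target):
    """Recover the chain P_1 = start, P_2, ..., P_s = target from the BFS tree."""
    chain = [target]
    while chain[-1] != start:
        chain.append(parent[chain[-1]])
    return chain[::-1]


# Three sets; used partitions 0b110 (left-paired set C_2) and 0b101 (left-paired set C_3).
used = {0b110: 1, 0b101: 2}
tree = left_reachable(0b010, used)
print([format(p, "03b") for p in chain_to(tree, 0b010, 0b101)])  # ['010', '110', '101']
```

Because breadth-first search returns a shortest chain, no $P_i$ on it is directly left-reachable from $P_{i-2}$, which is the extra assumption made in the chain justification that follows.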

Let $P_1 = (A_1, B_1)$ be an unused partition such that there are no left-unpaired elements in $A_1$, and let $P_s = (A_s, B_s)$ be a partition left-reachable from $P_1$ such that there is a left-unpaired element $a_s$ in $A_s$. We show how we can use a chain justification to make a left-unpaired element for the collection $A_1$.

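By the definition, there are used partitions $P_2, \ldots, P_s$ such that $P_i$ is directly left-reachable from $P_{i-1}$ for $i = 2, \ldots, s$. We can further assume that $P_i$ is not directly left-reachable from $P_{i-2}$ for $i = 3, \ldots, s$ (otherwise we simply delete the partition $P_{i-1}$ from the sequence). Thus, with the left-paired set of each used partition underlined, these partitions can be written as

$$\begin{aligned}
P_1 &= (\{C_1\} \cup A_1', B_1),\\
P_2 &= (\{\underline{C_1}, C_2\} \cup A_2', B_2),\\
P_3 &= (\{\underline{C_2}, C_3\} \cup A_3', B_3),\\
&\;\;\vdots\\
P_{s-1} &= (\{\underline{C_{s-2}}, C_{s-1}\} \cup A_{s-1}', B_{s-1}),\\
P_s &= (\{\underline{C_{s-1}}, C_s\} \cup A_s', B_s),
\end{aligned}$$

where $A_1', \ldots, A_s'$ are subcollections of $S$ without an underlined set.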

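We can assume that the left-unpaired element $a_s$ in $A_s = \{C_{s-1}, C_s\} \cup A_s'$ is in a nonunderlined set $C_s$ in $A_s$ (otherwise we consider the sequence $P_1, \ldots, P_{s-1}$ instead). We modify the partition sequence into

$$\begin{aligned}
P_1 &= (\{C_1\} \cup A_1', B_1),\\
P_2 &= (\{C_1, \underline{C_2}\} \cup A_2', B_2),\\
P_3 &= (\{C_2, \underline{C_3}\} \cup A_3', B_3),\\
&\;\;\vdots\\
P_{s-1} &= (\{C_{s-2}, \underline{C_{s-1}}\} \cup A_{s-1}', B_{s-1}),\\
P_s &= (\{C_{s-1}, \underline{C_s}\} \cup A_s', B_s).
\end{aligned}$$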

The interpretation is as follows: we use the partition $P_s$ to left-pair the left-unpaired element $a_s$ (the right-paired element in the right-collection $B_s$ is unchanged). Thus, the element $a_{s-1}$ in the set $C_{s-1}$ that the partition $P_s$ used to left-pair becomes left-unpaired. We then use the partition $P_{s-1}$ to left-pair the element $a_{s-1}$ and leave an element $a_{s-2}$ in the set $C_{s-2}$ left-unpaired, then we use the partition $P_{s-2}$ to left-pair $a_{s-2}$, etc. At the end, we use the partition $P_2$ to left-pair an element $a_2$ in the set $C_2$ and leave an element $a_1$ in the set $C_1$ left-unpaired. Therefore, this process makes an element in the left-collection $A_1 = \{C_1\} \cup A_1'$ of the partition $P_1$ left-unpaired.

The above process will be called a left-chain justification. Thus, given an unused partition $P_1 = (A_1, B_1)$ in which the left-collection $A_1$ has no left-unpaired elements, and given a used partition $P_s = (A_s, B_s)$ left-reachable from $P_1$ such that the left-collection $A_s$ of $P_s$ has a left-unpaired element, we can apply the left-chain justification, which keeps all used partitions in the partition matching $\pi$ and makes a left-unpaired element for the partition $P_1$. A process called right-chain justification for right-collections of the partitions can be described similarly.
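The bookkeeping of a left-chain justification amounts to shifting left-paired elements one step along the chain. In the sketch below, which is illustrative and not taken from [3], the chain $P_1, \ldots, P_s$ is given as a list (for example, one found by a reachability search), the dictionary `left_elem` maps each used partition $P_2, \ldots, P_s$ to the element it currently left-pairs, and the function name `left_chain_justification` is hypothetical; right-paired elements are not touched.

```python
def left_chain_justification(chain, left_elem, a_s):
    """Shift left-pairings along the chain P_1, ..., P_s.

    chain: [P_1, ..., P_s]; P_1 is the unused partition, P_2..P_s are used.
    left_elem: {P_i: element left-paired by P_i}; updated in place.
    a_s: left-unpaired element in the nonunderlined set C_s of P_s.
    Returns the element a_1 of C_1 that becomes left-unpaired; it lies in the
    left-collection of P_1.
    """
    released = a_s
    for p in reversed(chain[1:]):  # P_s, P_{s-1}, ..., P_2
        # p left-pairs the currently released element and releases the one it held
        released, left_elem[p] = left_elem[p], released
    return released


left_elem = {"P2": "a1", "P3": "a2"}
freed = left_chain_justification(["P1", "P2", "P3"], left_elem, "a3")
print(freed, left_elem)  # a1 {'P2': 'a2', 'P3': 'a3'}
```

The tuple assignment evaluates its right-hand side first, so each partition hands over its old element before taking the new one; the number of used partitions never changes.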

A greedy method based on the left-chain and right-chain justifications is presented in the following algorithm greedy-expanding.

Input: the collection $S = \{C_1, \ldots, C_k\}$ of subsets of $U$
Output: a partition matching $\pi_{exp}$ in $S$

1. $\pi_{exp} = \emptyset$;
2. repeat until no more changes
   IF there is an unused partition $P = (A, B)$ that has a left-unpaired element $a$ in $A$ and a right-unpaired element $b$ in $B$
   THEN pair the elements $(a, b)$ by the partition $P$ and add $P$ to the matching $\pi_{exp}$
   ELSE IF a left-chain justification or a right-chain justification (or both) is applicable to make an unused partition $P = (A, B)$ have a left-unpaired element in $A$ and a right-unpaired element in $B$
   THEN apply the left-chain justification and/or the right-chain justification

Algorithm greedy-expanding


In case $2^k \le 4n$, a careful organization of the elements and the partitions can make the running time of the algorithm greedy-expanding bounded by $O(n^2 \log n)$. Briefly speaking, we construct a graph $G$ of $2^k$ vertices in which each vertex represents a partition of $S$. The direct left- and right-reachabilities of partitions are given by the edges in the graph $G$, so that checking left- and right-reachabilities and performing left- and right-chain justifications can be done efficiently. Interested readers are referred to [3] for a detailed description.

After execution of the algorithm greedy-expanding, we obtain a partition matching $\pi_{exp}$. For each partition $P = (A, B)$ not included in $\pi_{exp}$, either $A$ has no left-unpaired elements and no used partition left-reachable from $P$ has a left-unpaired element in its left-collection, or $B$ has no right-unpaired elements and no used partition right-reachable from $P$ has a right-unpaired element in its right-collection.


Definition 3 Define $L_{free}$ to be the set of partitions $P$ not used by $\pi_{exp}$ such that the left-collection of $P$ has no left-unpaired elements and no used partition left-reachable from $P$ has a left-unpaired element in its left-collection, and define $R_{free}$ to be the set of partitions $P'$ not used by $\pi_{exp}$ such that the right-collection of $P'$ has no right-unpaired elements and no used partition right-reachable from $P'$ has a right-unpaired element in its right-collection.


According to the algorithm greedy-expanding, each partition not used by $\pi_{exp}$ is either in the set $L_{free}$ or in the set $R_{free}$. The sets $L_{free}$ and $R_{free}$ need not be disjoint.


Definition 4 Define $L_{reac}$ to be the set of partitions in $\pi_{exp}$ that are left-reachable from a partition in $L_{free}$, and define $R_{reac}$ to be the set of partitions in $\pi_{exp}$ that are right-reachable from a partition in $R_{free}$.


According to the definitions, if a used partition $P$ is in the set $L_{reac}$, then all elements in its left-collection are left-paired, and if a used partition $P$ is in the set $R_{reac}$, then all elements in its right-collection are right-paired.

We first show that if $L_{reac}$ and $R_{reac}$ are not disjoint, then we can construct a maximum partition matching from the partition matching $\pi_{exp}$ constructed by the algorithm greedy-expanding. For this, we need the following technical lemma.


Lemma 5 If the sets $L_{reac}$ and $R_{reac}$ contain a common partition and the partition matching $\pi_{exp}$ has fewer than $n$ pairs, then there is a set $C_0$ in $S$, $|C_0| \le n/2$, such that either all elements in each set $C \ne C_0$ are left-paired and every used partition whose left-paired set is not $C_0$ is contained in $L_{reac}$, or all elements in each set $C \ne C_0$ are right-paired and every used partition whose right-paired set is not $C_0$ is contained in $R_{reac}$.


For a proof, see [3]. 


Theorem 6 If $L_{reac}$ and $R_{reac}$ have a common partition, then the collection $S$ has a maximum partition matching of $n$ pairs, which can be constructed in linear time from the partition matching $\pi_{exp}$.


Proof If $\pi_{exp}$ has $n$ pairs, then $\pi_{exp}$ is already a maximum partition matching. Thus we assume that $\pi_{exp}$ has fewer than $n$ pairs. According to the above lemma, we can assume, without loss of generality, that all elements in each set $C_i$, $i = 2, \ldots, k$, are left-paired, and that every used partition whose left-paired set is not $C_1$ is in $L_{reac}$. Moreover, $|C_1| \le \sum_{i=2}^{k} |C_i|$.

Let $t = \sum_{i=2}^{k} |C_i|$ and $d = |C_1|$. Then we can assume that the partition matching $\pi_{exp}$ consists of the partitions

$$P_1, \ldots, P_t, P_{t+1}, \ldots, P_{t+h},$$

where $P_1, \ldots, P_t$ are used by $\pi_{exp}$ to left-pair the elements in $\bigcup_{i=2}^{k} C_i$, and $P_{t+1}, \ldots, P_{t+h}$ are used by $\pi_{exp}$ to left-pair the elements in $C_1$, $h < d$. Moreover, all partitions $P_1, \ldots, P_t$ are in the set $L_{reac}$. Since every element in the left-collection of a partition in $L_{reac}$ is left-paired, while $C_1$ contains left-unpaired elements ($h < d$), the set $C_1$ must be contained in the right-collection of each of the partitions $P_1, \ldots, P_t$.

We ignore the partitions $P_{t+1}, \ldots, P_{t+h}$, and use the partitions $P_1, \ldots, P_t$ to construct a maximum partition matching of $n$ pairs. Note that $\{P_1, \ldots, P_t\}$ also forms a partition matching in the collection $S$.

For a partition $(A, B)$ of $S$, we say that the partition $(B, A)$ is obtained by flipping the partition $(A, B)$. In the following algorithm partition-flipping, we show that a maximum partition matching of $n$ pairs can be constructed by flipping $d$ of the partitions $P_1, \ldots, P_t$.
Input: a partition matching $\{P_1, \ldots, P_t\}$ that left-pairs all elements in $\bigcup_{i=2}^{k} C_i$, $t = \sum_{i=2}^{k} |C_i|$, and the set $C_1$ is contained in the right-collection of each partition $P_i$, $i = 1, \ldots, t$, $d = |C_1| \le t$
Output: a maximum partition matching in $S$ with $n$ pairs.

1. if not all elements in the set $C_1$ are right-paired by $P_1, \ldots, P_t$, replace a proper number of right-paired elements in $\bigcup_{i=2}^{k} C_i$ by the right-unpaired elements in $C_1$ so that all elements in $C_1$ are right-paired by $P_1, \ldots, P_t$;
2. suppose that the partitions $P_1, \ldots, P_{t-d}$ right-pair $t - d$ elements $b_1, \ldots, b_{t-d}$ in $\bigcup_{i=2}^{k} C_i$, and that $P_{t-d+1}, \ldots, P_t$ right-pair the $d$ elements in $C_1$;
3. suppose that $P_{i_1}, \ldots, P_{i_{t-d}}$ are the $t - d$ partitions in $\{P_1, \ldots, P_t\}$ that left-pair the elements $b_1, \ldots, b_{t-d}$;
4. flip each of the $d$ partitions in $\{P_1, \ldots, P_t\} - \{P_{i_1}, \ldots, P_{i_{t-d}}\}$, and use the flipped partitions $P_1', \ldots, P_d'$ to left-pair the $d$ elements in $C_1$. The right-paired element of each $P_j'$ is the left-paired element before the flipping;
5. $\{P_1, \ldots, P_t, P_1', \ldots, P_d'\}$ is a partition matching of $n$ pairs.

Algorithm partition-flipping


Step 1 of the algorithm is always possible: since $C_1$ is contained in the right-collection of each partition $P_i$, $i = 1, \ldots, t$, and $t \ge d$, for each right-unpaired element $b$ in $C_1$ we can always pick a partition $P_i$ that right-pairs an element in $\bigcup_{j=2}^{k} C_j$ and let $P_i$ right-pair the element $b$. We keep doing this replacement until all $d$ elements in $C_1$ get right-paired. At this point, the number of partitions in $\{P_1, \ldots, P_t\}$ that right-pair elements in $\bigcup_{j=2}^{k} C_j$ is exactly $t - d$. Step 3 is always possible since the partitions $P_1, \ldots, P_t$ left-pair all elements in $\bigcup_{j=2}^{k} C_j$.

Now we verify that the constructed sequence $\{P_1, \ldots, P_t, P_1', \ldots, P_d'\}$ is a partition matching in $S$. No two partitions $P_i$ and $P_j$ can be identical since $\{P_1, \ldots, P_t\}$ is supposed to be a partition matching in $S$. No two partitions $P_i'$ and $P_j'$ can be identical since they are obtained by flipping two different partitions in $\{P_1, \ldots, P_t\}$. No partition $P_i$ is identical to a partition $P_j'$ because $P_i$ has $C_1$ in its right-collection while $P_j'$ has $C_1$ in its left-collection. Therefore, the partitions $P_1, \ldots, P_t, P_1', \ldots, P_d'$ are all distinct.

Each of the partitions $P_1, \ldots, P_t$ left-pairs an element in $\bigcup_{i=2}^{k} C_i$, and each of the partitions $P_1', \ldots, P_d'$ left-pairs an element in $C_1$. Thus, all elements in the universal set $U$ get left-paired in $\{P_1, \ldots, P_t, P_1', \ldots, P_d'\}$.

Finally, the partitions $P_1, \ldots, P_t$ right-pair all elements in $C_1$ and the elements $b_1, \ldots, b_{t-d}$ in $\bigcup_{i=2}^{k} C_i$. By our selection of the partitions, the partitions $P_1', \ldots, P_d'$ precisely right-pair all the elements in $\bigcup_{i=2}^{k} C_i - \{b_1, \ldots, b_{t-d}\}$. Thus, all elements in $U$ also get right-paired in $\{P_1, \ldots, P_t, P_1', \ldots, P_d'\}$.

We conclude that the constructed sequence $\{P_1, \ldots, P_t, P_1', \ldots, P_d'\}$ is a maximum partition matching in the collection $S$. The running time of the algorithm partition-flipping is obviously linear.
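Steps 1 to 5 of partition-flipping can be sketched in Python as follows. This is an illustration under the hypotheses of the algorithm, not a verified implementation from [3]: partitions are $k$-bit masks (bit $i$ set when the $i$-th set, 0-based, lies in the left-collection), $C_1$ has index 0, each input partition is a triple `(mask, left, right)`, flipping is complementing the mask, and the name `partition_flipping` is hypothetical.

```python
def partition_flipping(k, owner, pairs):
    """Extend {P_1, ..., P_t} to a partition matching with n pairs by flipping d partitions.

    k: number of sets; owner: {element: 0-based set index}, with C_1 = index 0.
    pairs: list of (mask, left, right), one per P_i. Assumed, as in the algorithm:
    bit 0 is clear in every mask (C_1 in every right-collection), the left elements
    are exactly the elements outside C_1, and d = |C_1| <= t.
    """
    full = (1 << k) - 1
    c1 = sorted(x for x, i in owner.items() if i == 0)
    pairs = [list(p) for p in pairs]
    # Step 1: partitions right-pairing outside C_1 take over the right-unpaired C_1 elements.
    right_now = {p[2] for p in pairs}
    missing = [x for x in c1 if x not in right_now]
    for p in pairs:
        if not missing:
            break
        if owner[p[2]] != 0:
            p[2] = missing.pop()
    # Step 2: b_1, ..., b_{t-d} are the right-paired elements outside C_1.
    bs = {p[2] for p in pairs if owner[p[2]] != 0}
    # Steps 3-4: flip the d partitions that do not left-pair any b_j; each flipped
    # partition left-pairs an element of C_1 and right-pairs its old left element.
    to_flip = [p for p in pairs if p[1] not in bs]
    flipped = [(p[0] ^ full, a, p[1]) for p, a in zip(to_flip, c1)]
    # Step 5: the t original partitions plus the d flipped ones.
    return [tuple(p) for p in pairs] + flipped


owner = {1: 0, 2: 1, 3: 2}  # C_1 = {1}, C_2 = {2}, C_3 = {3}
result = partition_flipping(3, owner, [(0b010, 2, 3), (0b100, 3, 2)])
print(result)  # [(2, 2, 1), (4, 3, 2), (3, 1, 3)]
```

In the example, $t = 2$ and $d = 1$: step 1 lets the first partition right-pair element 1 instead of 3, and the second partition, which does not left-pair the remaining $b_1 = 2$, is flipped so that it left-pairs 1 and right-pairs 3.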


Now we consider the case when the sets $L_{reac}$ and $R_{reac}$ have no common partitions.


Theorem 7 If $L_{reac}$ and $R_{reac}$ have no common partitions, then the partition matching $\pi_{exp}$ is a maximum partition matching.


Proof Let $W_{other}$ be the set of used partitions in $\pi_{exp}$ that belong to neither $L_{reac}$ nor $R_{reac}$. Then $L_{free} \cup R_{free} \cup L_{reac} \cup R_{reac} \cup W_{other}$ is the set of all partitions of the collection $S$, and $L_{reac} \cup R_{reac} \cup W_{other}$ is the set of partitions contained in the partition matching $\pi_{exp}$. Since the sets $L_{reac}$, $R_{reac}$, and $W_{other}$ are pairwise disjoint, the number of partitions in $\pi_{exp}$ is precisely $|L_{reac}| + |R_{reac}| + |W_{other}|$.

Now consider the set $W_L = L_{free} \cup L_{reac}$. Let $U_L$ be the set of elements that appear in the left-collection of a partition in $W_L$. We have

• Every $P \in L_{reac}$ left-pairs an element in $U_L$;
• Every element in $U_L$ is left-paired;
• If an element $a$ in $U_L$ is left-paired by a partition $P$, then $P \in L_{reac}$.

Therefore, the partitions in $L_{reac}$ precisely left-pair the elements in $U_L$. This gives $|L_{reac}| = |U_L|$. Since there are only $|U_L|$ elements that appear in the left-collections of partitions in $L_{free} \cup L_{reac}$, we conclude that the partitions in $W_L = L_{free} \cup L_{reac}$ can be used to left-pair at most $|U_L| = |L_{reac}|$ elements in any partition matching in $S$.
Similarly, the partitions in the set $W_R = R_{free} \cup R_{reac}$ can be used to right-pair at most $|R_{reac}|$ elements in any partition matching in $S$.

Therefore, any partition matching in the collection $S$ can include at most $|L_{reac}|$ partitions in the set $W_L$, at most $|R_{reac}|$ partitions in the set $W_R$, and at most all partitions in the set $W_{other}$. Consequently, a maximum partition matching in $S$ consists of at most $|L_{reac}| + |R_{reac}| + |W_{other}|$ partitions. Since the partition matching $\pi_{exp}$ constructed by the algorithm greedy-expanding contains exactly this many partitions, $\pi_{exp}$ is a maximum partition matching in the collection $S$.


Now it is clear how the maximum partition matching 
problem is solved. 


Theorem 8 The maximum partition matching problem is solvable in time $O(n^2 \log n)$.


Proof Suppose that we are given a collection $S = \{C_1, \ldots, C_k\}$ of pairwise disjoint subsets of $U = \{1, \ldots, n\}$.

In case $2^k > 4n$, we can call the algorithm partition-matching-I to construct a maximum partition matching in time $O(n^2)$.

In case $2^k \le 4n$, we first call the algorithm greedy-expanding to construct a partition matching $\pi_{exp}$ and compute the sets $L_{reac}$ and $R_{reac}$. If $L_{reac}$ and $R_{reac}$ have no common partition, then according to Theorem 7, $\pi_{exp}$ is already a maximum partition matching. Otherwise, we call the algorithm partition-flipping to construct a maximum partition matching. All these steps can be done in time $O(n^2 \log n)$. A detailed analysis of this algorithm can be found in [3].


See also 


> Assignment and Matching 

> Assignment Methods in Clustering 

> Bi-objective Assignment Problem 

> Communication Network Assignment Problem 
> Frequency Assignment Problem 

> Quadratic Assignment Problem 
In the maximum satisfiability (MAX-SAT) problem one is given a Boolean formula in conjunctive normal form, i.e., as a conjunction of clauses, each clause being a disjunction. The task is to find an assignment of truth values to the variables that satisfies the maximum number of clauses.

Let $n$ be the number of variables and $m$ the number of clauses, so that a formula has the following form:

$$\bigwedge_{1 \le i \le m} \Big( \bigvee_{1 \le k \le |C_i|} l_{ik} \Big),$$

where $|C_i|$ is the number of literals in clause $C_i$ and $l_{ik}$ is a literal, i.e., a propositional variable $u_j$ or its negation $\bar{u}_j$, for $1 \le j \le n$. The set of clauses in the formula is denoted by $\mathcal{C}$. If one associates a weight $w_i$ to each clause $C_i$, one obtains the weighted MAX-SAT problem, denoted as MAX W-SAT: one is to determine the assignment of truth values to the $n$ variables that maximizes the sum of the weights of the satisfied clauses. In the literature one often considers problems with different numbers $k$ of literals per clause, defined as MAX-k-SAT, or MAX W-k-SAT in the weighted case. In some papers MAX-k-SAT instances contain up to $k$ literals per clause, while in other papers they contain exactly $k$ literals per clause. We consider the second option unless otherwise stated.
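For very small instances, the MAX W-SAT objective can be evaluated by exhaustive search, which makes the definition concrete (but takes $2^n$ steps and is not a practical method). The sketch below assumes a DIMACS-style clause encoding, with the integer $j$ standing for $u_j$ and $-j$ for $\bar{u}_j$; the function name `max_w_sat_brute_force` is hypothetical.

```python
from itertools import product


def max_w_sat_brute_force(n, clauses, weights):
    """Return (best total weight, best assignment) by trying all 2**n assignments."""
    best_value, best_assignment = -1, None
    for assignment in product((False, True), repeat=n):
        # literal j > 0 is satisfied when u_j is true, literal -j when u_j is false
        value = sum(
            w
            for clause, w in zip(clauses, weights)
            if any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        )
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


clauses = [[1, 2], [-1], [-2], [1, -2]]  # (u1 or u2), not u1, not u2, (u1 or not u2)
weights = [3, 2, 2, 1]
print(max_w_sat_brute_force(2, clauses, weights))  # (6, (True, False))
```

With all weights equal to 1 the same function solves unweighted MAX-SAT; on this formula at most three of the four clauses can be satisfied simultaneously.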

MAX-SAT is of considerable interest not only from the theoretical side but also from the practical one. On one hand, the decision version SAT was the first example of an NP-complete problem [16]; moreover, MAX-SAT and related variants play an important role in the characterization of different approximation classes like APX and PTAS [5]. On the other hand, many issues in mathematical logic and artificial intelligence can be expressed in the form of satisfiability or some of its variants, like constraint satisfaction. Some exemplary problems are consistency in expert system knowledge bases [46], integrity constraints in databases [4,23], approaches to inductive inference [35,40], and asynchronous circuit synthesis [32]. An extensive review of algorithms for MAX-SAT appeared in [9].

In 1960 M. Davis and H. Putnam [19] began the investigation of useful strategies for handling resolution in the satisfiability problem. M. Davis, G. Logemann and D. Loveland [18] avoided the memory explosion of the original DP algorithm by replacing the resolution rule with the splitting rule. A recent review of advanced techniques for resolution and splitting is presented in [31].
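The splitting rule can be illustrated with a minimal recursive procedure in the spirit of the Davis-Logemann-Loveland method: propagate one-literal clauses, and otherwise branch on a literal, trying both truth values. This is a sketch, not the algorithm of [18] as published; the signed-integer clause encoding (+j for $u_j$, -j for $\bar{u}_j$), the choice of the first literal of the first clause as branching literal, and the omission of the pure-literal rule are simplifying assumptions.

```python
from typing import Dict, List, Optional

def simplify(clauses: List[List[int]], lit: int) -> Optional[List[List[int]]]:
    """Make `lit` true: drop satisfied clauses and delete the opposite literal.
    Returns None if some clause becomes empty (a conflict)."""
    out = []
    for c in clauses:
        if lit in c:
            continue                          # clause satisfied, drop it
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None                       # clause falsified
        out.append(reduced)
    return out

def dpll(clauses: List[List[int]],
         model: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a satisfying (partial) assignment {var: value}, or None if unsatisfiable."""
    model = dict(model or {})
    # Unit propagation: a one-literal clause forces the value of its literal.
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        model[abs(lit)] = lit > 0
        clauses = simplify(clauses, lit)
        if clauses is None:
            return None
    if not clauses:
        return model                          # every clause is satisfied
    # Splitting rule: try lit = true, then lit = false.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            result = dpll(reduced, {**model, abs(choice): choice > 0})
            if result is not None:
                return result
    return None

print(dpll([[1, 2], [-1, 2], [-2]]))          # None: the formula is unsatisfiable
```

Variables that are never assigned in the returned model may take either value.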

The MAX W-SAT problem has a natural integer linear programming formulation. Let $y_j = 1$ if the Boolean variable $u_j$ is ‘true’ and $y_j = 0$ if it is ‘false’, and let the Boolean variable $z_i = 1$ if clause $C_i$ is satisfied, $z_i = 0$ otherwise. The integer linear program is:


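$$\max \sum_{i=1}^{m} w_i z_i$$

subject to the constraints:

$$\sum_{j \in U_i^+} y_j + \sum_{j \in U_i^-} (1 - y_j) \ge z_i, \qquad i = 1, \dots, m,$$

$$y_j \in \{0, 1\}, \qquad j = 1, \dots, n,$$

$$z_i \in \{0, 1\}, \qquad i = 1, \dots, m,$$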


where $U_i^+$ and $U_i^-$ denote the sets of indices of variables that appear unnegated and negated in clause $C_i$, respectively. If one neglects the objective function and sets all $z_i$ variables to 1, one obtains an integer programming feasibility problem associated with the SAT problem [11].
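For a fixed 0-1 vector y, the best feasible choice of each $z_i$ in this program is 1 exactly when the left-hand side of its constraint is at least 1, so the optimum of the integer program coincides with the optimum of MAX W-SAT. The Python sketch below checks this on a tiny instance by enumerating all y; it only makes the formulation concrete (enumeration is exponential in n and is not how the cited methods proceed). The 0-based index lists `pos` and `neg` stand for $U_i^+$ and $U_i^-$ and are illustrative names.

```python
from itertools import product
from typing import List, Sequence, Tuple

def ilp_objective(pos: List[List[int]], neg: List[List[int]], w: List[float],
                  y: Sequence[int]) -> float:
    """Objective for a fixed y, with each z_i at its largest feasible 0-1 value."""
    total = 0.0
    for i, w_i in enumerate(w):
        lhs = sum(y[j] for j in pos[i]) + sum(1 - y[j] for j in neg[i])
        z_i = 1 if lhs >= 1 else 0            # the constraint requires lhs >= z_i
        total += w_i * z_i
    return total

def solve_by_enumeration(n: int, pos: List[List[int]], neg: List[List[int]],
                         w: List[float]) -> Tuple[float, Tuple[int, ...]]:
    """Optimum over all y in {0,1}^n (exponential; tiny instances only)."""
    return max((ilp_objective(pos, neg, w, y), y) for y in product((0, 1), repeat=n))

# (u1 or u2), (not u1 or u3), (not u2 or not u3), with 0-based variable indices
pos = [[0, 1], [2], []]
neg = [[], [0], [1, 2]]
w = [3.0, 2.0, 1.0]
print(solve_by_enumeration(3, pos, neg, w))   # (6.0, (1, 0, 1))
```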

The integer linear programming formulation of MAX-SAT suggests that this problem could be solved by a branch and bound method (cf. also » Integer programming: Branch and bound methods). One applicable method uses Chvátal cuts. In [35] it is shown that the resolvents in the propositional calculus correspond to certain cutting planes in the integer programming model of inference problems.

Linear programming relaxations of integer linear programming formulations of MAX-SAT have been used to obtain upper bounds in [27,33,55]. A linear programming and rounding approach for MAX-2-SAT is presented in [13]. A method for strengthening the generalized set covering formulation is presented in [47], where Lagrangian multipliers guide the generation of cutting planes.

The first approximation algorithms with a ‘guaranteed’ quality of approximation [5] were proposed by D.S. Johnson [38]; both use greedy construction strategies. The original paper [38] demonstrated a performance ratio of 1/2 for both of them. In detail, let k be the minimum number of variables occurring in any clause of the formula, m(x, y) the number of clauses satisfied by the feasible solution y on instance x, and m*(x) the maximum number of clauses that can be satisfied.

For any integer $k \ge 1$, the first algorithm achieves a feasible solution y of an instance x such that

$$\frac{m(x, y)}{m^*(x)} \ge 1 - \frac{1}{k+1},$$

while the second algorithm obtains

$$\frac{m(x, y)}{m^*(x)} \ge 1 - \frac{1}{2^k}.$$


Recently (1997) it has been proved [12] that the second algorithm reaches a performance ratio of 2/3. There are formulas for which the second algorithm finds a truth assignment with ratio exactly 2/3; therefore this bound cannot be improved [12].
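The following Python sketch follows one common description of Johnson's second greedy algorithm. Each remaining clause C carries weight $2^{-|C|}$, where |C| counts its still-unassigned literals. Variables are fixed in index order to the value that satisfies the larger total weight, and a clause that loses a literal doubles its weight. The variable order, the tie-breaking rule (true on ties) and the signed-integer clause encoding (+j for $u_j$, -j for $\bar{u}_j$) are assumptions for illustration.

```python
from typing import List

def johnson_greedy(n: int, clauses: List[List[int]]) -> List[bool]:
    """Greedy assignment for u_1..u_n using clause weights 2^(-|C|)."""
    active = [(list(c), 2.0 ** -len(c)) for c in clauses]   # (clause, weight)
    assignment = []
    for j in range(1, n + 1):
        w_true = sum(w for c, w in active if j in c)
        w_false = sum(w for c, w in active if -j in c)
        value = w_true >= w_false                 # ties broken towards 'true'
        assignment.append(value)
        true_lit = j if value else -j
        remaining = []
        for c, w in active:
            if true_lit in c:
                continue                          # clause satisfied: remove it
            if -true_lit in c:
                c = [l for l in c if l != -true_lit]
                if not c:
                    continue                      # clause now false for good
                w *= 2.0                          # one literal fewer: weight doubles
            remaining.append((c, w))
        active = remaining
    return assignment

print(johnson_greedy(3, [[1, 2], [-1, 3], [-2, -3]]))   # [True, False, True]
```

If every clause falsified along the way is counted with weight 1, the total weight never increases, because the chosen value satisfies at least as much weight as it doubles. Hence at most $\sum_i 2^{-|C_i|} \le m\,2^{-k}$ clauses end up unsatisfied, which is the source of the $1 - 1/2^k$ guarantee stated above.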

One of the most interesting approaches in the design of new algorithms is the use of randomization. During the computation, random bits are generated and used to influence the course of the algorithm. In many cases randomization makes it possible to obtain better (expected) performance or to simplify the construction of the algorithm. Two randomized algorithms that achieve a performance ratio of 3/4 have been proposed in [27] and [55]. Moreover, it is possible to derandomize these algorithms, that is, to obtain deterministic algorithms that preserve the same bound 3/4 for every instance. The approximation ratio 3/4 can be slightly improved [28]. T. Asano [2] (following [3]) has improved the bound to 0.77. For the restricted case of MAX-2-SAT, one can obtain a more substantial improvement (performance ratio 0.931) with the technique in [21]. If one considers only satisfiable MAX W-SAT instances, L. Trevisan [54] obtains a 0.8 approximation factor, while H. Karloff and U. Zwick [41] claim a 0.875 performance ratio for satisfiable instances of MAX W-3-SAT. A strong negative result about approximability can be found in [36]: unless P = NP, MAX W-SAT cannot be approximated in polynomial time within a performance ratio greater than 7/8.

MAX-SAT is among the problems for which local search has been very successful: in practice, local search and its variations are the only efficient and effective methods for addressing large and complex real-world instances. Different variations of local search with randomness techniques have been proposed for SAT and MAX-SAT starting from the late 1980s, see for example [30,52], motivated by previous applications of ‘min-conflicts’ heuristics in the area of artificial intelligence [44].

The general scheme is based on generating a starting point in the set of admissible solutions and trying to improve it through the application of basic moves. The search space is given by all possible truth assignments. Let us consider the elementary changes to the current assignment obtained by changing a single truth value. The definitions are as follows.

Let $\mathcal{U}$ be the discrete search space, $\mathcal{U} = \{0, 1\}^n$, and let f be the number of satisfied clauses. In addition, let $U^{(t)} \in \mathcal{U}$ be the current configuration along the search trajectory at iteration t, and $N(U^{(t)})$ the neighborhood of point $U^{(t)}$, obtained by applying a set of basic moves $\mu_i$ ($1 \le i \le n$), where $\mu_i$ complements the ith bit $u_i$ of the string:

$$\mu_i(u_1, u_2, \dots, u_i, \dots, u_n) = (u_1, u_2, \dots, 1 - u_i, \dots, u_n),$$

$$N(U^{(t)}) = \{ U \in \mathcal{U} : U = \mu_i U^{(t)},\ i = 1, \dots, n \}.$$
The version of local search that we consider starts from a random initial configuration $U^{(0)} \in \mathcal{U}$ and generates a search trajectory as follows:

$$V = \text{BESTNEIGHBOR}(N(U^{(t)})), \qquad (1)$$

$$U^{(t+1)} = \begin{cases} V & \text{if } f(V) > f(U^{(t)}), \\ U^{(t)} & \text{if } f(V) \le f(U^{(t)}), \end{cases} \qquad (2)$$

where BESTNEIGHBOR selects $V \in N(U^{(t)})$ with the best f value, ties being broken randomly. V in turn becomes the new current configuration if f improves. Other versions are satisfied with an improving (or nonworsening) neighbor, not necessarily the best one. Clearly, local search stops as soon as the first locally optimal point is encountered, when no improving moves are available, see (2). Let us define LS* as a modification of this local search (LS) in which a specified number of iterations are executed and the candidate move obtained by BESTNEIGHBOR is always accepted, even if the f value remains equal or worsens.
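The local search (1)-(2) and its LS* variant can be sketched in Python as follows. Configurations are stored as 0/1 lists and f is recomputed from scratch for each neighbor, whereas practical implementations update f incrementally. The function names, the signed-integer clause encoding (+j for $u_j$, -j for $\bar{u}_j$), and the choice to return the best configuration visited by LS* are assumptions of this sketch.

```python
import random
from typing import List, Optional, Sequence

Clauses = Sequence[Sequence[int]]

def f(clauses: Clauses, u: Sequence[int]) -> int:
    """Number of satisfied clauses; u[j-1] in {0, 1} is the value of u_j."""
    return sum(any((u[abs(l) - 1] == 1) == (l > 0) for l in c) for c in clauses)

def best_neighbor(clauses: Clauses, u: List[int], rng: random.Random) -> List[int]:
    """BESTNEIGHBOR: apply every move mu_i, keep a best one, break ties randomly."""
    scored = []
    for i in range(len(u)):
        v = list(u)
        v[i] = 1 - v[i]                       # mu_i complements the i-th bit
        scored.append((f(clauses, v), v))
    best = max(score for score, _ in scored)
    return rng.choice([v for score, v in scored if score == best])

def local_search(clauses: Clauses, n: int, rng: random.Random,
                 iterations: Optional[int] = None) -> List[int]:
    """LS if iterations is None (stop at the first local optimum, rule (2));
    LS* otherwise (always accept the BESTNEIGHBOR move for `iterations` steps)."""
    u = [rng.randint(0, 1) for _ in range(n)]
    if iterations is None:
        while True:
            v = best_neighbor(clauses, u, rng)
            if f(clauses, v) <= f(clauses, u):
                return u                      # no improving move: local optimum
            u = v
    best = list(u)
    for _ in range(iterations):
        u = best_neighbor(clauses, u, rng)    # accepted even if f stays equal or worsens
        if f(clauses, u) > f(clauses, best):
            best = list(u)                    # assumption: report the best point visited
    return best
```

LS terminates because f strictly increases at every accepted move and is bounded by m.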

Several properties of the number of clauses satisfied at a local optimum have been proved. Let m* be the best value and k the minimum number of literals contained in the problem clauses. Let $m_{loc}$ be the number of satisfied clauses at a local optimum of any instance of MAX-SAT with at least k literals per clause. Then $m_{loc}$ satisfies the following bound [34]:

$$m_{loc} \ge \frac{k}{k+1}\, m,$$

and the bound is sharp. Therefore, since $m^* \le m$, if $m_{loc}$ is the number of satisfied clauses at a local optimum, then:

$$m_{loc} \ge \frac{k}{k+1}\, m^*. \qquad (3)$$


State-of-the-art heuristics for MAX-SAT are obtained by complementing local search with schemes that are capable of producing better approximations beyond the locally optimal points. In some cases, these schemes generate a sequence of points in the set of admissible solutions in a way that is fixed before the search starts. An example is given by multiple runs of local search starting from different random points. Such an algorithm does not take into account the history of the previous phase of the search when the next points are generated. The term ‘memory-less’ denotes this lack of feedback from the search history.

In addition to the cited multiple-run local search, these techniques are based on Markov processes (simulated annealing; cf. also » Simulated annealing methods in protein folding), ‘plateau’ search and ‘random noise’ strategies, or combinations of randomized constructions and local search. The use of a Markov process to generate a stochastic search trajectory is adopted, for example, in [53].

The GSAT algorithm was proposed in [52] as a model-finding procedure, i.e., to find an interpretation of the variables under which the formula comes out ‘true’. GSAT consists of multiple runs of LS*, each run consisting of a number of iterations that is typically proportional to the problem dimension n. An empirical analysis of GSAT is presented in [24,25]. Different ‘noise’ strategies to escape from attraction basins are added to GSAT in [50,51].
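A compact Python sketch of the GSAT scheme follows. Each try restarts from a random assignment and performs a number of LS*-style flips proportional to n, always taking a best flip (fewest unsatisfied clauses, ties broken at random) even when it does not improve. The parameter names `max_tries` and `flips_factor` are hypothetical, the clause encoding (+j for $u_j$, -j for $\bar{u}_j$) is assumed, and the noise strategies of [50,51] are not included.

```python
import random
from typing import List, Optional, Sequence

def unsat_count(clauses: Sequence[Sequence[int]], u: Sequence[int]) -> int:
    """Number of clauses with no true literal; u[j-1] in {0, 1} is u_j."""
    return sum(not any((u[abs(l) - 1] == 1) == (l > 0) for l in c) for c in clauses)

def gsat(clauses: Sequence[Sequence[int]], n: int, max_tries: int,
         flips_factor: int, seed: int = 0) -> Optional[List[int]]:
    """Return a satisfying 0/1 assignment, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        u = [rng.randint(0, 1) for _ in range(n)]    # fresh random start
        for _ in range(flips_factor * n):             # run length proportional to n
            if unsat_count(clauses, u) == 0:
                return u                               # model found
            scores = []
            for i in range(n):
                u[i] ^= 1                              # tentatively flip bit i
                scores.append(unsat_count(clauses, u))
                u[i] ^= 1                              # undo the flip
            best = min(scores)
            i = rng.choice([i for i, s in enumerate(scores) if s == best])
            u[i] ^= 1                                  # accepted even if not improving
        if unsat_count(clauses, u) == 0:
            return u
    return None
```

A return value of None does not prove unsatisfiability; GSAT is an incomplete procedure.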

A hybrid algorithm that combines a randomized greedy construction phase to generate initial candidate solutions, followed by a local improvement phase, is the GRASP scheme proposed in [48] for the SAT problem and generalized to the MAX W-SAT problem in [49]. GRASP is an iterative process, with each iteration consisting of two phases: a construction phase and a local search phase.
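A minimal Python sketch of GRASP for MAX W-SAT is given below, assuming a common value-based restricted candidate list (RCL). At each construction step every (variable, value) pair is scored by the weight it newly satisfies; pairs scoring at least best - alpha*(best - worst) enter the RCL, and one of them is picked at random. The constructed assignment is then improved by single-flip local search. The greedy function, the parameter `alpha`, the iteration count and the clause encoding (+j for $u_j$, -j for $\bar{u}_j$) are illustrative assumptions and need not match the choices made in [48,49].

```python
import random

def weight_sat(clauses, weights, u):
    """Weight of clauses with a true literal; u[j] is True, False or None (unset)."""
    return sum(w for c, w in zip(clauses, weights)
               if any(u[abs(l) - 1] == (l > 0) for l in c))

def construct(clauses, weights, n, alpha, rng):
    """Randomized greedy construction with a restricted candidate list."""
    u = [None] * n
    for _ in range(n):
        base = weight_sat(clauses, weights, u)
        cands = []
        for j in range(n):
            if u[j] is None:
                for val in (True, False):
                    u[j] = val
                    cands.append((weight_sat(clauses, weights, u) - base, j, val))
                    u[j] = None
        hi = max(g for g, _, _ in cands)
        lo = min(g for g, _, _ in cands)
        rcl = [(j, val) for g, j, val in cands if g >= hi - alpha * (hi - lo)]
        j, val = rng.choice(rcl)
        u[j] = val
    return u

def improve(clauses, weights, u):
    """First-improvement local search with single flips."""
    improved = True
    while improved:
        improved = False
        for j in range(len(u)):
            before = weight_sat(clauses, weights, u)
            u[j] = not u[j]
            if weight_sat(clauses, weights, u) > before:
                improved = True
            else:
                u[j] = not u[j]               # undo a non-improving flip
    return u

def grasp(clauses, weights, n, iterations=20, alpha=0.3, seed=0):
    """Repeat construction + local search; keep the best assignment found."""
    rng = random.Random(seed)
    best, best_val = None, float("-inf")
    for _ in range(iterations):
        u = improve(clauses, weights, construct(clauses, weights, n, alpha, rng))
        val = weight_sat(clauses, weights, u)
        if val > best_val:
            best, best_val = list(u), val
    return best, best_val
```

With alpha = 0 the construction is purely greedy; with alpha = 1 every candidate pair is eligible.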

Different history-sensitive heuristics have been proposed to continue local search schemes beyond local optimality. These schemes aim at intensifying the search in promising regions and at diversifying the search into uncharted territories by using the information collected from the previous phase (the history) of the search. Because of the internal feedback mechanism, some algorithm parameters can be modified and tuned in an on-line manner, to reflect the characteristics of the task to be solved and the local properties of the configuration space in the neighborhood of the current point. This on-line tuning contrasts with the off-line tuning of an algorithm, where some parameters or choices are determined for a given problem in a preliminary phase and remain fixed when the algorithm runs on a specific instance.


Tabu search is a history-sensitive heuristic proposed by F. Glover [26] and, independently, by P. Hansen and B. Jaumard, who used the term ‘SAMD’ (steepest ascent mildest descent) and applied it to the MAX-SAT problem in [34]. The main mechanism by which the history influences the search in tabu search is that, at a given iteration, some neighbors are prohibited and only a nonempty subset $N_A(U^{(t)}) \subset N(U^{(t)})$ of them is allowed. The general way of generating the search trajectory that we consider is given by:

$$N_A(U^{(t)}) = \text{allow}(N(U^{(t)}), U^{(0)}, \dots, U^{(t)}), \qquad (4)$$

$$U^{(t+1)} = \text{BESTNEIGHBOR}(N_A(U^{(t)})). \qquad (5)$$


The set-valued function allow selects a nonempty subset of $N(U^{(t)})$ in a manner that depends on the entire previous history of the search $U^{(0)}, \dots, U^{(t)}$. A specialized tabu search heuristic is used in [37] to speed up the search for a solution (if the problem is satisfiable) as part of a branch and bound algorithm for SAT that adopts both a relaxation and a decomposition scheme based on polynomially solvable instances, i.e., 2-SAT and Horn-SAT.

Different methods to generate prohibitions produce discrete dynamical systems with qualitatively different search trajectories. In particular, prohibitions based on a list of moves lead to a faster escape from a locally optimal point than prohibitions based on a list of visited configurations [6]. In detail, the function allow can be specified by introducing a prohibition parameter T (also called list size) that determines how long a move will remain prohibited after its execution. The fixed tabu search algorithm is obtained by fixing T throughout the search [26]. A neighbor is allowed if and only if it is obtained from the current point by applying a move that has not been used during the last T iterations. Formally, if $LU(\mu)$ is the last usage time of move $\mu$ ($LU(\mu) = -\infty$ at the beginning):

$$N_A(U^{(t)}) = \{ U = \mu U^{(t)} : LU(\mu) < (t - T) \}.$$
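The fixed tabu search rule can be sketched in Python by keeping the last usage time $LU(\mu_i)$ of each flip move. A move is allowed at iteration t only if $LU(\mu_i) < t - T$, and the best allowed neighbor is accepted even when f worsens. The stopping rule (a fixed number of iterations), the random tie-breaking, the clause encoding (+j for $u_j$, -j for $\bar{u}_j$) and the function names are illustrative assumptions. With n moves and T < n, at most T moves are prohibited at any time, so at least one move is always allowed.

```python
import random
from typing import List, Sequence, Tuple

def f(clauses: Sequence[Sequence[int]], u: Sequence[bool]) -> int:
    """Number of satisfied clauses; u[j-1] is the truth value of u_j."""
    return sum(any(u[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

def fixed_tabu_search(clauses: Sequence[Sequence[int]], n: int, T: int,
                      iterations: int, seed: int = 0) -> Tuple[List[bool], int]:
    """Return the best assignment visited and its f value."""
    rng = random.Random(seed)
    u = [rng.random() < 0.5 for _ in range(n)]
    last_used = [float("-inf")] * n          # LU(mu_i) = -infinity at the beginning
    best, best_f = list(u), f(clauses, u)
    for t in range(iterations):
        allowed = [i for i in range(n) if last_used[i] < t - T]   # N_A(U^(t))
        if not allowed:                      # cannot happen when T < n
            break
        scores = {}
        for i in allowed:
            u[i] = not u[i]                  # try move mu_i
            scores[i] = f(clauses, u)
            u[i] = not u[i]                  # undo it
        top = max(scores.values())
        i = rng.choice([i for i, s in scores.items() if s == top])
        u[i] = not u[i]                      # accepted even if f does not improve
        last_used[i] = t
        if top > best_f:
            best, best_f = list(u), top
    return best, best_f
```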


The reactive tabu search algorithm of [10] defines simple rules to determine the prohibition parameter by reacting to the repetition of previously visited configurations. One has a repetition if $U^{(t+R)} = U^{(t)}$ for $R \ge 1$. The prohibition period T depends on the iteration t, and a reaction equation is added to the dynamical system:

$$T^{(t)} = \text{react}(T^{(t-1)}, U^{(0)}, \dots, U^{(t)}).$$
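A possible react() rule in the spirit of [10] is sketched below in Python. Visited configurations are stored in a dictionary (a hash table) together with their last visit time. When a configuration repeats, T is increased; when no repetition has occurred for a while, T is decreased. The factors 1.1 and 0.9, the `quiet_period` of 50 iterations, the class name and the bounds $1 \le T \le n-2$ are assumptions for illustration, not the values prescribed in [10].

```python
import math
from typing import Dict, Sequence, Tuple

class ReactiveProhibition:
    """Hypothetical react() rule for the prohibition period T."""

    def __init__(self, n: int, increase: float = 1.1, decrease: float = 0.9,
                 quiet_period: int = 50) -> None:
        self.n = n
        self.T = 1.0
        self.increase = increase
        self.decrease = decrease
        self.quiet_period = quiet_period
        self.last_visit: Dict[Tuple[int, ...], int] = {}   # visited U -> last time
        self.last_change = 0

    def react(self, t: int, u: Sequence[int]) -> int:
        """Update T after reaching configuration u at iteration t; return int(T)."""
        key = tuple(u)
        if key in self.last_visit:                      # repetition U^(t) = U^(t-R)
            self.T = max(self.T * self.increase, self.T + 1)
            self.last_change = t
        elif t - self.last_change > self.quiet_period:  # long time without repetitions
            self.T = self.T * self.decrease
            self.last_change = t
        self.last_visit[key] = t
        self.T = min(max(self.T, 1.0), self.n - 2.0)    # keep T within [1, n-2]
        return math.floor(self.T)
```

The returned value would play the role of T in the fixed tabu rule $LU(\mu) < t - T$ at the next iteration.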


An algorithm that combines local search and nonoblivious local search [8], the use of prohibitions, and a reactive scheme to determine the prohibition parameter is the Hamming-reactive tabu search algorithm proposed in [7], which also contains a detailed experimental analysis.

Given the hardness of the problem and its relevance to applications in different fields, the emphasis on the experimental analysis of algorithms for the MAX-SAT problem has been growing in recent years (as of 2000).

In some cases the experimental comparisons have been executed in the framework of ‘challenges’, with support for the electronic collection and distribution of software, problem generators and test instances. An example is the Second DIMACS Algorithm Implementation Challenge on cliques, coloring and satisfiability, whose results have been published in [39]. Practical and industrial MAX-SAT problems and benchmarks, with significant case studies, are also presented in [20]. Some basic problem models that are considered both in theoretical and in experimental studies of MAX-SAT algorithms are described in [31].

Different algorithms require different amounts of effort, measured by the number of elementary steps or by CPU time, when solving different kinds of instances. For example, [45] finds that some distributions used in past experiments are of little interest because the generated formulas are almost always very easy to satisfy. It also reports that one can generate very hard instances of k-SAT for $k \ge 3$. In addition, it reports the following observed behavior for random fixed-length 3-SAT formulas: if r is the ratio of clauses to variables ($r = m/n$), almost all formulas are satisfiable if $r < 4$, and almost all formulas are unsatisfiable if $r > 4.5$. A rapid transition seems to appear for $r \approx 4.2$, the same point where the computational complexity of solving the generated instances is maximized; see [17,42] for reviews of experimental results.
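The following Python sketch reproduces this kind of experiment on a very small scale. It generates random fixed-length 3-SAT formulas with m = rn clauses (three distinct variables per clause, random signs) and decides satisfiability by exhaustive enumeration. With n this small the transition is much less sharp than in the cited studies, and the chosen values of n, r and the number of trials are arbitrary. The clause encoding (+j for $u_j$, -j for $\bar{u}_j$) and function names are illustrative.

```python
import random
from itertools import product

def random_ksat(n, m, k, rng):
    """Fixed-length model: each clause has k distinct variables with random signs."""
    return [[v if rng.random() < 0.5 else -v for v in rng.sample(range(1, n + 1), k)]
            for _ in range(m)]

def satisfiable(clauses, n):
    """Exhaustive check over all 2^n assignments (tiny n only)."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product((False, True), repeat=n))

def fraction_satisfiable(n, r, trials=10, k=3, seed=0):
    """Fraction of `trials` random formulas with round(r*n) clauses that are satisfiable."""
    rng = random.Random(seed)
    m = round(r * n)
    return sum(satisfiable(random_ksat(n, m, k, rng), n) for _ in range(trials)) / trials

for r in (2.0, 4.2, 7.0):
    print(r, fraction_satisfiable(n=10, r=r))
```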

Let $\kappa$ be the least real number such that, if r is larger than $\kappa$, then the probability of $\mathcal{C}$ being satisfiable converges to 0 as n tends to infinity. A notable result, found independently by many people, including [22] and [14], is that

$$\kappa \le \log_{8/7} 2 \approx 5.191.$$
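The bound can be obtained by a first-moment argument, sketched here under the fixed-length model in which each clause has three distinct variables with independently random signs; this is not claimed to reproduce the exact presentation of [22] or [14]. A fixed assignment falsifies a random clause with probability 1/8, independently for each of the m = rn clauses. If X denotes the number of satisfying assignments of the formula, Markov's inequality gives

$$\Pr[\text{formula satisfiable}] = \Pr[X \ge 1] \le \mathbb{E}[X] = 2^n \left(\tfrac{7}{8}\right)^{rn} = \left(2 \left(\tfrac{7}{8}\right)^{r}\right)^{n},$$

which tends to 0 whenever $2 (7/8)^r < 1$, that is, whenever $r > \ln 2 / \ln(8/7) = \log_{8/7} 2 \approx 5.191$.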


A series of theoretical analyses aim at approximat- 
ing the unsatisfiability threshold of random formulas 
[1,15,29,43]. 


See also 


> Greedy Randomized Adaptive Search Procedures 
> Integer Programming 
Introduction


In multiproduct and multipurpose batch plants, different products can be manufactured via the same or a similar sequence of operations by sharing available pieces of equipment, intermediate materials, and other production resources. Such plants are ideally suited to manufacturing products that are produced in small quantities or for which the production recipe or the customer demand pattern is likely to change. The inherent operational flexibility of this type of plant provides the opportunity for increased savings through the realization of an efficient production schedule, which can reduce inventories, production and transition costs, and production shortfalls.

The problem of production scheduling and planning for multiproduct and multipurpose batch plants has received a considerable amount of attention during the last two decades. Extensive reviews have been written by Reklaitis [10], Pantelides [9], Shah [11] and more recently by Floudas and Lin [4,5]. Most of the work in the area of multiproduct batch plants has dealt with either the long-term planning problem or the short-term scheduling problem. Both planning and scheduling deal with the allocation of available resources over time to perform a set of tasks required to manufacture one or more products. However, long-term planning problems deal with longer time horizons (e.g., several months or years) and are focused on higher level decisions such as the timing and location of additional facilities and levels of production. In contrast, short-term scheduling models address shorter time horizons (e.g., several days) and are focused on determining the detailed sequencing of various operational tasks. Medium-term scheduling, which involves medium time horizons (e.g., several weeks) and still aims to determine detailed production schedules, can result in very large-scale problems and has received much less attention in the literature.

For medium-term scheduling, relatively little work has been presented in the literature. Medium-term scheduling can be quite computationally complex, so mathematical programming techniques are commonly used to solve it. The most widely employed strategy for overcoming the computational difficulty is based on the idea of decomposition. The decomposition approach divides a large and complex problem, which may be computationally expensive or even intractable when formulated and solved directly as a single MILP model, into smaller subproblems, which can be solved much more efficiently. A wide variety of decomposition approaches have been proposed in the literature. In addition to decomposition techniques developed for general forms of MILP problems, various approaches that exploit the characteristics of specific process scheduling problems have also been proposed. In most cases, the decomposition approaches lead only to suboptimal solutions; however, they substantially reduce the problem complexity and the solution time, making MILP-based techniques applicable to large, real-world problems.

In this chapter, we propose an enhanced State-Task Network MILP model for the medium-term production scheduling of a multipurpose, multiproduct industrial batch plant. The proposed approach extends the work of Ierapetritou and Floudas [6] and Lin et al. [8] to consider a large-scale production facility and to account for various storage policies (unlimited intermediate storage, UIS; no intermediate storage, NIS; zero-wait, ZW), variable batch sizes and processing times, batch mixing and splitting, sequence-dependent changeover times, intermediate due dates, products used as raw materials, and several modes of operation. The methodology consists of decomposing the whole scheduling period into successive short horizons of a few days. A decomposition model is implemented to determine each short horizon and the corresponding products to be included. Then, a novel continuous-time formulation for short-term scheduling of batch processes with multiple intermediate due dates is applied to each selected short horizon, leading to a large-scale mixed-integer linear programming (MILP) problem. The scheduling model includes over 80 pieces of equipment and can take into account the processing recipes of hundreds of different products. Several characteristics of the production plant are incorporated into the scheduling model, and actual plant data are used to model all parameters.


Problem Statement 


In the multiproduct batch plant investigated, there are several different types of operations (or tasks), termed operation type 1 to operation type 6. The plant has many different types of units, and over 80 are modeled explicitly. Hundreds of different products can be produced, and for each of them one of the processing recipes shown in Fig. 1, or a slight variation of it, is applied. The recipes are represented in the form of a State-Task Network (STN), in which a state node is denoted by a circle and a task node by a rectangle. The STN representation depicts the flow of material through the various tasks in the production facility to produce different types of final products; it does not represent the actual connectivity of equipment in the plant.

For the first type of STN shown in Fig. 1, raw materials (or state F) are fed into a type 1 unit and undergo operation type 1 to produce an intermediate (or state I1). This intermediate then undergoes operation type 3 in a type 3 unit to produce another intermediate (or state I2). This second intermediate is then sent to a type 4b unit before the resulting intermediate material (or state I3) is sent to a type 6 unit to undergo an operation type 6 task that produces a final product (or state P). The information on which units are suitable for each product is given. All the units are operated in batch mode with the exception of the type 5 and 6 units, which operate in continuous mode. The capacity limits of the type 1, type 2, and type 3 units vary from one product to another, while the capacity limits of the type 4a, 4b, 5 and 6 units are the same for all suitable products. The processing time or processing rate of each task in the suitable units is also specified. In addition, some products require other products as their raw materials, creating very complicated state-task networks.

The time horizon considered for production scheduling is a few weeks or longer. Customer orders are fixed throughout the time horizon with specified amounts and due dates. There is no limitation on external raw materials, and we apply the zero-wait storage condition or a limited intermediate storage capacity for all materials, based on actual plant data. Two categories of products are produced, category 1 and category 2.

The sixth STN shown in Fig. 1 represents a special type of product, denoted as a campaign product. For this type of product, raw materials are fed into up to three type 1 units and undergo operation type 1 to produce an intermediate, or state I1. This intermediate is then sent to one of two type 4a units before being processed in the type 5 unit, which is a continuous unit. Finally, the intermediate material (or state I3) is sent to a type 6 unit, producing a final campaign product (or state P). Because product changeovers in the type 5 unit can be undesirable, there was a need to introduce the ability to fix campaigns for continuous production of a single product in the type 5 unit, called campaign mode production.

Medium-Term Scheduling of Batch Processes, Figure 1
State-task network (STN) representation of plant


Formulation 


The overall methodology for solving the medium-range production scheduling problem is to decompose the large and complex problem into smaller short-term scheduling subproblems in successive time horizons [8]. The flowchart for this rolling horizon approach is shown in Fig. 2. The first step is to input the relevant data into the formulation. Then, if necessary, campaign mode production is determined. Next, the overall medium-term scheduling problem is considered. A decomposition model is formulated and solved to determine the current time horizon and the corresponding products that should be included in the current subproblem. According to the solution of the decomposition model, a short-term scheduling model is formulated using the information on customer orders, inventory levels, and processing recipes. The resulting MILP problem is a large-scale, complex problem that requires a large computational effort for its solution. When a satisfactory solution is determined, the relevant data are output and the next time horizon is considered. The above procedure is applied iteratively in an automatic fashion until the whole time horizon under consideration has been scheduled.
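The control flow of the rolling horizon approach can be sketched in Python as a loop over successive short horizons. In this sketch `choose_horizon` is a hypothetical stand-in for the decomposition model of [8]: it merely extends the horizon day by day while the number of orders due stays within a complexity limit. The short-term MILP is delegated to a caller-supplied function, and campaign determination and the hand-over of inventories between horizons are omitted. All names and the toy data are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class Order:
    product: str
    due_day: int
    amount: float

def choose_horizon(orders: List[Order], start: int, end: int, max_orders: int) -> int:
    """Hypothetical decomposition rule: extend the horizon while the number of
    orders due in [start, day] stays within max_orders. Never returns less than
    `start`, so the rolling loop always makes progress."""
    last = start
    for day in range(start, end + 1):
        if sum(start <= o.due_day <= day for o in orders) > max_orders:
            break
        last = day
    return last

def rolling_horizon(orders: List[Order], start: int, end: int, max_orders: int,
                    solve_subproblem: Callable[[List[Order]], Any]
                    ) -> List[Tuple[Tuple[int, int], Any]]:
    """Schedule [start, end] as successive short horizons; each short-term
    subproblem (an MILP in the text) is handled by solve_subproblem."""
    schedules = []
    while start <= end:
        last = choose_horizon(orders, start, end, max_orders)
        due = [o for o in orders if start <= o.due_day <= last]
        schedules.append(((start, last), solve_subproblem(due)))
        start = last + 1      # a full implementation would also carry inventories forward
    return schedules

orders = [Order("A", 1, 10), Order("B", 2, 5), Order("A", 3, 8),
          Order("C", 3, 4), Order("B", 6, 2)]
print(rolling_horizon(orders, 1, 7, 2, lambda due: sorted(o.product for o in due)))
# [((1, 2), ['A', 'B']), ((3, 5), ['A', 'C']), ((6, 7), ['B'])]
```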

Note that the decomposition model determines how 
many days and products to consider in the shorter 
scheduling horizon subject to an upper limit on the 
complexity of the resulting mathematical model. Prod- 
ucts are selected for the scheduling horizon if there is an 
order for the product, if the product has an order within 
a set amount of time into the future, if the product is 
used as a raw material for another product which is in- 
cluded, if the product was still processing in the previ- 
ous scheduling horizon, or if the product is a campaign 
product and is included in a campaign for the current 
horizon. 
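The product-selection rules listed above translate directly into a small Python function. The data layout (a `Product` record holding its order due days, the products it serves as raw material for, and two flags) and the `lookahead` window used for the "set amount of time into the future" rule are hypothetical. The raw-material rule is applied repeatedly so that chains of raw materials are also picked up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Product:
    name: str
    order_days: List[int] = field(default_factory=list)       # due days of its orders
    raw_material_of: List[str] = field(default_factory=list)  # products that consume it
    still_processing: bool = False   # still running at the end of the previous horizon
    in_campaign: bool = False        # campaign product in the current horizon's campaign

def select_products(products: Dict[str, Product], start: int, end: int,
                    lookahead: int) -> Set[str]:
    """Apply the selection rules for the scheduling horizon [start, end] (days)."""
    selected = {p.name for p in products.values()
                if any(start <= d <= end + lookahead for d in p.order_days)
                or p.still_processing or p.in_campaign}
    # Raw-material rule, repeated until no further product is added.
    changed = True
    while changed:
        changed = False
        for p in products.values():
            if p.name not in selected and any(q in selected for q in p.raw_material_of):
                selected.add(p.name)
                changed = True
    return selected
```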
Medium-Term Scheduling of Batch Processes, Figure 2
Flowchart of the rolling horizon approach 


Models 


A key component of the rolling horizon approach is the determination of the time horizon and the products that should be included in each short-term scheduling subproblem. We extend the two-level decomposition formulation of Lin et al. [8], which partitions the entire scheduling horizon into shorter subhorizons by taking into account the trade-off between demand satisfaction, unit utilization, and model complexity. In the first level, the number of days in the time horizon and the main products that should be included are determined. In the second level, additional products are added to the horizon so that each of the first-stage units, or type 1 units, is fully utilized.


Short-Term Scheduling Model 


Once the decomposition model has determined the 
days in the time horizon and the products to be in- 
cluded, a novel continuous-time formulation for short- 
term scheduling with multiple intermediate due dates is 
applied to determine the detailed production schedule. 
This formulation is based on the models of Floudas and 
coworkers [6,7,8] and is expanded and enhanced in this 
work to take into account specific aspects of the prob- 
lem under consideration. The proposed short-term 
scheduling formulation requires the following indices, 
sets, parameters and variables: 


Indices:

d  days;
i  processing tasks;
j  units;
k  orders;
n  event points representing the beginning of a task;
s  states;

Sets:

$D$  days in the overall scheduling horizon;
$D^{in}$  days in the current scheduling horizon;
$I$  processing tasks;
$I_j$  tasks which can be performed in unit (j);
$I_k$  tasks which process order (k);
$I_s^c$  tasks which consume state (s);
$I_s^p$  tasks which produce state (s);
$I^{in}$  tasks which are included in the current scheduling horizon;
$I^{5}$  tasks which are used to determine the type 5 unit campaign;
$I^{6}$  tasks which are used to perform operation type 6 for category 1 products;
$J$  units;
$J_i$  units which are suitable for performing task (i);
$J^{p}$  units which are suitable for performing only processing tasks, i.e., operation type 1, 2, 3, and 5 tasks;
$J^{1}$  units which are suitable for performing only operation type 1 tasks;
$J^{4}$  units which are suitable for performing only operation type 4a and 4b tasks;
$J^{5}$  units which are used to determine the type 5 unit campaign;
$J^{6}$  units which are suitable for performing only operation type 6 tasks;
$K$  orders;
$K_i$  orders which are processed by task (i);
$K_s$  orders which produce state (s);
$K^{in}$  orders which are included in the current scheduling horizon;
$N$  event points within the time horizon;
$S$  states;
$S_k$  states which are used to satisfy order (k);
$S^{cat1}$  states which are category 1 final products;
$S^{cat2}$  states which are category 2 final products;
$S^{lim}$  states which have minimum or maximum storage limitations;
$S^{f}$  states which are final products, after operation type 6;
$S^{i}$  states which are intermediate products, before operation type 6;
$S^{in}$  states which are included in the current scheduling horizon;
$S^{p}$  states which are either final or intermediate products;
$S^{pr}$  states which are products and are used as raw materials for other products;
$S^{NIS}$  states which have no intermediate storage;
$S^{5}$  states which are used to determine the type 5 unit campaign;
$S^{UIS}$  states which have unlimited intermediate storage;
$S^{r}$  states which are external raw materials;
Parameters:

$B^{max}_s$  the maximum suitable batch size used to produce product state (s);
$B^{min}_s$  the minimum suitable batch size used to produce product state (s);
$C$  a large constant (e.g., 10000);
$cap^{max}_{ij}$  maximum capacity for task (i) in unit (j);
$cap^{min}_{ij}$  minimum capacity for task (i) in unit (j);
$dem_s$  demand for state (s) in the current scheduling horizon;
$dem^{raw}_s$  demand for raw material product state (s);
$dem^{tot}_s$  total demand for state (s) in the overall horizon;
$due_{ksd}$  due date of order (k) for state (s) on day (d);
$ExtraTime_i$  amount of time needed for an operation type 3 task after processing task (i);
$FixedTime_{ij}$  constant term of processing time for task (i) in unit (j);
$H$  time horizon;
$mintasks$  the minimum number of tasks that must occur in the first-stage processing units;
$N^{max}$  the maximum number of event points in the scheduling horizon;
$praw_{s,s'}$  0-1 parameter to relate final product (s) to its raw material product (s');
$price_s$  price of state (s);
$prior_s$  priority of product state (s);
$prior^{raw}_s$  priority of raw material state (s);
$RateCT_{ij}$  variable term of processing time for task (i) in unit (j);
$r_{ksd}$  amount of order (k) for state (s) on day (d);
$start_j$  the time at which unit (j) first becomes available in the current scheduling horizon;
$stcap^{max}_s$  maximum capacity for storage of state (s);
$stcap^{min}_s$  minimum capacity for storage of state (s);
$\alpha$  coefficient for the demand satisfaction of individual orders term;
$\beta$  coefficient for the due date satisfaction of individual orders term;
$\gamma$  coefficient for the overall demand satisfaction slack variable term;
$\delta$  coefficient for the minimum inventory requirement in dedicated units term;
$\varepsilon$  coefficient for the artificial demands on raw material states term;
$\zeta$  coefficient for the minimizing of binary variables term;
$\eta$  coefficient for the minimizing of active start times term;
$\theta$  a small constant (e.g., 0.01);
$\rho^c_{is}$  proportion of state (s) consumed by task (i);
$\rho^p_{is}$  proportion of state (s) produced by task (i);
$\tau_{ii'}$  sequence-dependent setup time between tasks (i) and (i');
$\varphi$  coefficient for the satisfaction of orders term;
$\omega$  coefficient for the overall production term;


Continuous Variables:

$B(i,j,n)$  amount of material undertaking task (i) in unit (j) at event point (n);
$D(s,n)$  amount of state (s) delivered at event point (n);
$Df(s,n)$  amount of state (s) delivered after the last event point;
$kD(k,s,n)$  amount of state (s) delivered at event point (n) for order (k);
$kDf(k,s,n)$  amount of state (s) delivered after the last event point for order (k);
$sla1(k,s,d)$  amount of state (s) due on day (d) for order (k) that is not delivered;
$sla2(k,s,d)$  amount of state (s) due on day (d) for order (k) that is over delivered;
$slcap(s,n)$  amount of state (s) that is deficient in its dedicated storage unit at event point (n);
$sll(s)$  amount of state (s) due in the current time horizon but not made;
$sll^{raw}(s)$  amount of raw material product state (s) artificially due in the current time horizon but not made;
$slorder(k)$  0-1 variable indicating if order (k) was met;
$slt1(k,s,d)$  amount of time state (s) due on day (d) for order (k) is late;
$slt2(k,s,d)$  amount of time state (s) due on day (d) for order (k) is early;
$ST(s,n)$  amount of state (s) at event point (n);
$STF(s)$  final amount of state (s) at the end of the current time horizon;
$ST0(s)$  initial amount of state (s) at the beginning of the current time horizon;
$T^f(i,j,n)$  time at which task (i) finishes in unit (j) at event point (n);
$T^s(i,j,n)$  time at which task (i) starts in unit (j) at event point (n);
$tot(s)$  total amount of state (s) made in the current time horizon;
$tt(i,j,n)$  starting time of the active task (i) in unit (j) at event point (n);

Binary Variables: 

wv(i,j,n) assigns the beginning of task (i) in unit (/) 
at event point (); 

yi, k,n) assigns the delivery of order (k) through 


task (i) at event point (1); 


On the basis of this notation, the mathematical 
model for the short-term scheduling of an industrial 
batch plant with intermediate due dates involves the 
following constraints: 


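$$\sum_{i \in I_j} wv(i,j,n) \le 1, \qquad \forall j \in J,\ n \in N,\ n \le N^{max} \tag{1}$$

$$cap^{min}_{ij}\, wv(i,j,n) \le B(i,j,n) \le cap^{max}_{ij}\, wv(i,j,n), \qquad \forall i \in I,\ j \in J_i,\ n \in N,\ n \le N^{max} \tag{2}$$

$$ST(s,n) = 0, \qquad \forall s \text{ with no intermediate storage},\ n \in N,\ n \le N^{max} \tag{3}$$

$$ST(s,n) \ge stcap^{min}_s - slcap(s,n), \qquad \forall s \text{ with a dedicated storage tank},\ n \in N,\ n \le N^{max} \tag{4}$$

$$ST(s,n) \le stcap^{max}_s, \qquad \forall s \text{ with a dedicated storage tank},\ n \in N,\ n \le N^{max} \tag{5}$$

$$ST(s,n) = ST(s,n-1) - D(s,n) + \sum_{i \in I^p_s} \rho^p_{si} \sum_{j \in J_i} B(i,j,n-1) - \sum_{i \in I^c_s} \rho^c_{si} \sum_{j \in J_i} B(i,j,n), \qquad \forall s \in S,\ n \in N,\ 1 < n \le N^{max} \tag{6}$$

$$ST(s,n) = STO(s) - D(s,n) - \sum_{i \in I^c_s} \rho^c_{si} \sum_{j \in J_i} B(i,j,n), \qquad \forall s \in S,\ n = 1 \tag{7}$$

$$STF(s) = ST(s,n) - Df(s,n) + \sum_{i \in I^p_s} \rho^p_{si} \sum_{j \in J_i} B(i,j,n), \qquad \forall s \in S,\ n = N^{max} \tag{8}$$

$$T^f(i,j,n) = T^s(i,j,n) + FixedTime_{ij}\, wv(i,j,n) + RateCT_{ij}\, B(i,j,n), \qquad \forall i \text{ processing or operation type 6 task},\ j \in J_i,\ n \in N,\ n \le N^{max} \tag{9}$$

$$T^f(i,j,n) \ge T^s(i,j,n), \qquad \forall i \text{ operation type 4a or 4b task},\ j \in J_i,\ n \in N,\ n \le N^{max} \tag{10}$$

$$T^f(i,j,n) = H, \qquad \forall i \text{ in units processing a nonstorable state},\ j \in J_i,\ n = N^{max} \tag{11}$$

$$T^s(i,j,n+1) \ge T^f(i,j,n) + ExtraTime_i\, wv(i,j,n), \qquad \forall i \in I,\ j \in J_i,\ n \in N,\ n < N^{max} \tag{12}$$

$$T^s(i,j,n+1) \ge T^f(i',j,n) + (\tau_{i'i} + ExtraTime_{i'})\, wv(i',j,n) - H\,[1 - wv(i',j,n)], \qquad \forall j \in J,\ i, i' \in I_j,\ i \ne i',\ n \in N,\ n < N^{max} \tag{13}$$

$$T^s(i,j,n+1) \ge T^f(i',j',n) - H\,[1 - wv(i',j',n)], \qquad \forall s \in S,\ i \in I^c_s,\ i' \in I^p_s,\ j \in J_i,\ j' \in J_{i'},\ j \ne j',\ n \in N,\ n < N^{max} \tag{14}$$

$$T^s(i,j,n+1) \le T^f(i',j',n) + H\,[2 - wv(i',j',n) - wv(i,j,n+1)], \qquad \forall s \text{ subject to zero wait},\ i \in I^c_s,\ i' \in I^p_s,\ j \in J_i,\ j' \in J_{i'},\ j \ne j',\ n \in N,\ n < N^{max} \tag{15}$$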


The allocation constraints in (1) express the requirement
that for each unit (j) and at each event point (n), only one
of the tasks that can be performed in the unit (i.e., $i \in I_j$)
should take place. The capacity constraints in (2) express
the requirement for the batch size of a task (i) processing
in a unit (j) at event point (n), B(i,j,n), to be greater than
the minimum amount of material, $cap^{min}_{ij}$, and less than
the maximum amount of material, $cap^{max}_{ij}$, that can be
processed by task (i) in unit (j). The storage constraints
in (3) enforce that those states with no intermediate storage
have to be consumed by some processing task or storage task
immediately after they are produced. Constraints (4) represent
the minimum required storage for state (s) in a dedicated
storage tank, where this amount can be violated, if necessary,
by an amount slcap(s,n) which is penalized in the objective
function. Constraints (5) represent the maximum available
storage capacity for state (s) based on the maximum storage
capacity of the dedicated storage tank. According to the
material balance constraints in (6), the amount of material
of state (s) at event point (n) is equal to that at event
point (n − 1), increased by any amounts produced at event
point (n − 1), decreased by any amounts consumed at event
point (n), and decreased by the amount required by the market
at event point (n), D(s,n). Constraints (7) and (8) represent
the material balance on state (s) at the first and last event
points, respectively. The duration constraints in (9) represent
the relationship between the starting and finishing times of
task (i) in unit (j) at event point (n) for all processing
tasks and all operation type 6 tasks, where $FixedTime_{ij}$ are
the fixed processing times for batch tasks (zero for continuous
tasks) and $RateCT_{ij}$ are the inverse processing rates for
continuous tasks (zero for batch tasks). Constraints (10) also
represent the relationship between the starting and finishing
times of task (i) in unit (j) at event point (n), but for
operation type 4a and 4b tasks. They do not impose exact
durations for tasks in these units but only enforce that all
tasks must end after they start. Constraints (11) are written
only for tasks in units which are processing a nonstorable
state and enforce that task (i) taking place at the last event
point ($N^{max}$) must finish at the end of the horizon.
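To make the material balances (6)-(8) concrete, the following minimal sketch evaluates them for a fixed toy schedule rather than solving the MILP. Given batch sizes B(i,j,n), the proportions of each state produced and consumed by each task (called `rho_p` and `rho_c` in the code), deliveries D(s,n) and Df(s,n), and initial inventories STO(s), it rolls the inventory ST(s,n) forward one event point at a time. Material produced at event point (n − 1) becomes available at (n). The task, unit and state names and all numbers are hypothetical.

```python
from collections import defaultdict

def roll_inventory(states, n_max, sto, batches, rho_p, rho_c, deliveries):
    """Evaluate the material balances (6)-(8) for a fixed toy schedule.

    batches[(i, j, n)] -> batch size B(i,j,n)
    rho_p[(s, i)]      -> proportion of state s produced by task i
    rho_c[(s, i)]      -> proportion of state s consumed by task i
    deliveries[(s, n)] -> D(s,n); the key (s, "f") holds Df(s, N^max)
    Returns ST[(s, n)] for n = 1..n_max and STF[s].
    """
    produced = defaultdict(float)  # sum_i rho_p[s,i] * sum_j B(i,j,n)
    consumed = defaultdict(float)  # sum_i rho_c[s,i] * sum_j B(i,j,n)
    for (i, _j, n), b in batches.items():
        for s in states:
            produced[(s, n)] += rho_p.get((s, i), 0.0) * b
            consumed[(s, n)] += rho_c.get((s, i), 0.0) * b

    st, stf = {}, {}
    for s in states:
        for n in range(1, n_max + 1):
            if n == 1:
                prev = sto.get(s, 0.0)                        # (7): start from STO(s)
            else:
                prev = st[(s, n - 1)] + produced[(s, n - 1)]  # (6): output of n-1
            st[(s, n)] = prev - deliveries.get((s, n), 0.0) - consumed[(s, n)]
        # (8): output of the last event point is added, late deliveries removed
        stf[s] = st[(s, n_max)] - deliveries.get((s, "f"), 0.0) + produced[(s, n_max)]
    return st, stf

# Hypothetical instance: task "react" in unit "R1" turns state A into state B.
st, stf = roll_inventory(
    states=["A", "B"], n_max=3, sto={"A": 100.0},
    batches={("react", "R1", 1): 40.0, ("react", "R1", 2): 30.0},
    rho_p={("B", "react"): 1.0}, rho_c={("A", "react"): 1.0},
    deliveries={("B", 3): 50.0, ("B", "f"): 10.0},
)
print(st, stf)
```

If any ST(s,n) came out negative, the toy schedule would violate the nonnegativity of inventory. In the full model, the solver rules this out when it chooses B(i,j,n).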

The sequence constraints in (12) state that task (i)
starting at event point (n + 1) should start after the end
of the same task performed in the same unit (j), which
has finished at the previous event point (n), where extra
time is added after task (i) at event point (n), if necessary.
The constraints in (13) are written for tasks (i) and (i')
that are performed in the same unit (j) at event points
(n + 1) and (n), respectively. If both tasks take place in
the same unit, task (i) can start only after task (i') has
finished and the setup time between them has elapsed; if
task (i') is not active, the big-M term with H relaxes the
constraint. The third set of sequence constraints in (14)
relates tasks (i) and (i') which are performed in different
units (j) and (j') but take place consecutively according to
the production recipe. The zero-wait constraints in (15) are
written for different tasks (i) and (i') that take place
consecutively with the intermediate state (s) having no
possible intermediate storage and thus subject to the
zero-wait condition.
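Constraint (13) uses a big-M construction, and its effect can be checked numerically. Here we read (13) as T^s(i,j,n+1) ≥ T^f(i',j,n) + (setup + extra time)·wv(i',j,n) − H·[1 − wv(i',j,n)]. The helper below is a minimal sketch of that reading. Its function name, argument names and the 48 h horizon are illustrative assumptions.

```python
def seq_same_unit_ok(ts_next, tf_prev, setup, extra, wv_prev, horizon):
    """Check T^s(i,j,n+1) >= T^f(i',j,n) + (setup + extra)*wv - H*(1 - wv)."""
    rhs = tf_prev + (setup + extra) * wv_prev - horizon * (1 - wv_prev)
    return ts_next >= rhs

H = 48.0  # hypothetical horizon length in hours

# Task i' active at event point n, finishing at t = 10 h; setup 2 h, extra 0.5 h.
print(seq_same_unit_ok(12.5, 10.0, 2.0, 0.5, 1, H))  # gap respected
print(seq_same_unit_ok(11.0, 10.0, 2.0, 0.5, 1, H))  # task i starts too early
# Task i' inactive (wv = 0): the right-hand side drops to tf_prev - H <= 0.
print(seq_same_unit_ok(0.0, 10.0, 2.0, 0.5, 0, H))
```

Because every finishing time is bounded by H, setting wv(i',j,n) = 0 pushes the right-hand side to at most zero. The constraint is then never binding, which is why H serves as the big-M constant.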


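$$\sum_{i \in I_k} \sum_{\substack{n \in N\\ n \le N^{max}}} y(i,k,n) + slorder(k) \ge 1, \qquad \forall k \in K^{in} \tag{16}$$

$$\sum_{i \in I_k} \sum_{\substack{n \in N\\ n \le N^{max}}} y(i,k,n) \le \left\lceil \frac{rk(k,s,d)}{B^{min}} \right\rceil, \qquad \forall k \in K^{in},\ s \in S^{in} \cap S^{cat1},\ d \in D^{in},\ rk(k,s,d) > 0 \tag{17}$$

$$\sum_{k \in K^{in} \cap K_i} \sum_{j \in J_i} suit_{ij}\, y(i,k,n) \ge \sum_{j \in J_i} wv(i,j,n), \qquad \forall i \text{ operation type 6 task},\ n \in N,\ n \le N^{max} \tag{18}$$

$$\sum_{k \in K^{in} \cap K_i} y(i,k,n) \le \sum_{j \in J_i} wv(i,j,n), \qquad \forall i \text{ operation type 6 task},\ n \in N,\ n \le N^{max} \tag{19}$$

$$kD(k,s,n+1) + kDf(k,s,n+1) \ge \sum_{j \in J_i} B(i,j,n) - C\,[1 - y(i,k,n)], \qquad \forall s \in S^{in} \cap S^{cat1},\ k \in K^{in},\ i \in I_k,\ n \in N,\ n \le N^{max} \tag{20}$$

$$D(s,n) = \sum_{k \in K^{in}} kD(k,s,n), \qquad \forall s \in S^{in} \cap S^{cat1},\ n \in N,\ n \le N^{max} \tag{21}$$

$$Df(s,n) = \sum_{k \in K^{in}} kDf(k,s,n), \qquad \forall s \in S^{in} \cap S^{cat1},\ n = N^{max} \tag{22}$$

$$\sum_{\substack{n \in N\\ n \le N^{max}}} [kD(k,s,n+1) + kDf(k,s,n+1)] + sla1(k,s,d) \ge rk(k,s,d), \qquad \forall s \in S^{in} \cap S^{cat1},\ k \in K^{in},\ d \in D^{in},\ rk(k,s,d) > 0 \tag{23}$$

$$\sum_{\substack{n \in N\\ n \le N^{max}}} [kD(k,s,n+1) + kDf(k,s,n+1)] + STF(s) - sla2(k,s,d) \le rk(k,s,d), \qquad \forall s \in S^{in} \cap S^{cat1},\ k \in K^{in},\ d \in D^{in},\ rk(k,s,d) > 0 \tag{24}$$

$$T^f(i,j,n) - slt1(k,s,d,n) \le duek(k,s,d) + H\,[2 - wv(i,j,n) - y(i,k,n)], \qquad \forall s \in S^{in} \cap S^{cat1},\ k \in K^{in},\ i \in I_k,\ j \in J_i,\ n \in N,\ n \le N^{max},\ d \in D^{in},\ rk(k,s,d) > 0 \tag{25}$$

$$T^f(i,j,n) + slt2(k,s,d,n) \ge duek(k,s,d) - 24 - H\,[2 - wv(i,j,n) - y(i,k,n)], \qquad \forall s \in S^{in} \cap S^{cat1},\ k \in K^{in},\ i \in I_k,\ j \in J_i,\ n \in N,\ n \le N^{max},\ d \in D^{in},\ rk(k,s,d) > 0 \tag{26}$$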


The order satisfaction constraints in (16)-(26) are
written to ensure that all orders for category 1 products
are met on time and with the required amount. Both under-
and overproduction as well as early and late production are
represented with slack variables that are penalized in the
objective function. Note that these constraints can be
modified to represent different requirements for production,
if desired. Constraints (16) try to ensure that each order (k)
is met at least once with an operation type 6 task (i), where
task (i) is suitable for order (k) if $i \in I_k$ and is an
operation type 6 task for a category 1 product. Similarly,
constraints (17) enforce the condition that each order (k)
for category 1 product state (s) on day (d) can be met with
at most $\lceil rk(k,s,d)/B^{min} \rceil$ tasks. Constraints (18)
and (19) link the delivery of order (k) through task (i) at
event point (n) to the beginning of task (i) in any suitable
unit (j) at event point (n), so that every category 1
operation type 6 task must be linked to at least one order
delivery and vice versa. Thus, constraint (18) enforces that
if a binary variable is activated for operation type 6 task
(i), then at least one order delivery must be activated.
Similarly, constraint (19) ensures that if no binary variables
are activated for operation type 6 task (i) at event point
(n), then no delivery variables can be activated. Constraints
(20) relate the individual order delivery variables to the
batch size of the operation type 6 task used to satisfy the
order. If an order (k) is met by task (i) at event point (n)
(i.e., y(i,k,n) = 1), then at least one operation type 6 task
is active for task (i) at event point (n), and thus at least
one B(i,j,n) variable is greater than zero. Constraints (21)
and (22) relate the individual order delivery variables to
the overall delivery variables used in the material balance
constraints.

Constraints (23) and (24) determine the under- and
overproduction, respectively, of order (k) for state (s) on
day (d). Constraints (23) try to force the individual order
delivery variables to exceed the amount due for order (k)
(i.e., rk(k,s,d)), where slack variables sla1(k,s,d) are
activated in the case of underproduction. Similarly,
constraints (24) try to force the individual order delivery
variables plus any amount of the product state left at the
end of the horizon not to exceed the amount due for order
(k), where slack variables sla2(k,s,d) are activated in the
case of overproduction. Constraints (25) and (26) determine
the late and early production, respectively, of order (k)
for state (s) on day (d). Constraints (25) try to force the
finishing time of task (i) used to satisfy order (k) at event
point (n) to be less than the due date of order (k), where
slack variables slt1(k,s,d,n) are activated in the case of
late production. Similarly, constraints (26) try to force the
finishing time of task (i) used to fulfill order (k) at event
point (n) to be greater than the beginning of the day (d) on
which the order is due (i.e., duek(k,s,d) − 24). Otherwise,
slack variables slt2(k,s,d,n) are activated, indicating
early production.
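The slacks are penalized in the objective, so at an optimum each one takes the smallest value its constraint allows. The sketch below computes those minimal values for a single order (k, s, d). It compares the delivered amount with the amount due for underproduction. For overproduction it adds the end-of-horizon inventory to the delivered amount. It compares the finishing time with the due time for lateness, and with the start of the due day, 24 h earlier, for earliness. It assumes the delivering task is active, so the big-M terms vanish. The function name and numbers are illustrative.

```python
def order_slacks(delivered, stf, due_amount, due_time, finish_time):
    """Smallest slack values satisfying (23)-(26) for one order (k, s, d).

    Assumes the delivering task is active (wv = y = 1), so the big-M
    terms in (25)-(26) are zero. Times are in hours from the start of
    the horizon; the due day begins 24 h before due_time.
    """
    sla1 = max(0.0, due_amount - delivered)           # underproduction
    sla2 = max(0.0, delivered + stf - due_amount)     # overproduction
    slt1 = max(0.0, finish_time - due_time)           # late
    slt2 = max(0.0, (due_time - 24.0) - finish_time)  # early
    return sla1, sla2, slt1, slt2

# Hypothetical order: 80 units due at t = 72 h; 70 units are delivered by a
# task finishing at t = 40 h, and 5 units remain in inventory at horizon end.
print(order_slacks(70.0, 5.0, 80.0, 72.0, 40.0))
```

In this example the order is 10 units short and finishes 8 h before its due day begins. The early-production slack is therefore nonzero even though the order is not late.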


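$$tot(s) = STF(s) + \sum_{\substack{n \in N\\ n \le N^{max}}} [D(s,n) + Df(s,n)], \qquad \forall s \in S^{in} \tag{27}$$

$$\sum_{\substack{n \in N\\ n \le N^{max}}} [D(s,n) + Df(s,n)] + sll(s) \ge dem_s, \qquad \forall s \in S^{in} \cap S^{cat1} \tag{28}$$

$$tot(s) + sll(s) \ge dem_s, \qquad \forall s \in S^{in} \cap S^{cat2} \tag{29}$$

$$\sum_{i \in I^{in} \cap I^p_s} \sum_{j \in J_i} \sum_{\substack{n \in N\\ n \le N^{max}}} B(i,j,n) + sll^{*}(s) \ge dem^{*}_s, \qquad \forall s \in S^{in} \text{ used as a raw material for a final product state } s',\ dem_{s'} > 0,\ dem^{*}_s > 0 \tag{30}$$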


Constraints (27)-(29) are used to determine the
overall underproduction for both category 1 and 2
products in the current time horizon. First, constraints
(27) determine the total production for all product states
(s) (i.e., tot(s)) in the current horizon. Then, constraints
(28) sum the overall delivery variables for category 1
products and activate the slack variables sll(s) if the sum
falls short of the demand for category 1 product state (s).
Similarly, constraints (29) calculate the amount of
underproduction (i.e., sll(s)) for category 2 product state
(s) based on its overall demand in the time horizon. The
slack variable sll(s) is then penalized in the objective
function, where category 1 and 2 products can be penalized
with different weights. Constraints (30) determine the
amount of underproduction for intermediate product states
(s) that are needed as raw materials for final product
states (s').
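Constraints (27)-(29) can be evaluated for a single product state in a few lines. The sketch below assumes (27) sets total production to final inventory plus all deliveries. It measures category 1 shortfall against deliveries only (28) and category 2 shortfall against total production (29). It returns the smallest slack the constraints allow. The function name and numbers are hypothetical.

```python
def underproduction(category, dem, stf, deliveries):
    """Return (tot, sll) for one product state, following (27)-(29).

    deliveries: per-event-point amounts D(s,n) + Df(s,n).
    Category 1 slack is measured against deliveries only, as in (28);
    category 2 slack against total production tot(s), as in (29).
    """
    delivered = sum(deliveries)
    tot = stf + delivered                     # (27)
    basis = delivered if category == 1 else tot
    sll = max(0.0, dem - basis)               # smallest feasible slack
    return tot, sll

print(underproduction(1, dem=100.0, stf=15.0, deliveries=[30.0, 50.0]))
print(underproduction(2, dem=100.0, stf=15.0, deliveries=[30.0, 50.0]))
```

With the same production, the category 1 state is 20 units short, because its 15 units of unsold final inventory do not count as delivered. The category 2 state is only 5 units short.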

The bound constraints are used to impose lower and 
upper bounds on the continuous variables including 
slack variables. They are also used to fix some binary 
and continuous variables to be zero when necessary. 


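$$
\begin{aligned}
& T^s(i,j,n) \ge start_j,\ \ T^f(i,j,n) \ge start_j, && \forall i \in I,\ j \in J_i,\ n \in N\\
& T^s(i,j,n) \le H,\ \ T^f(i,j,n) \le H, && \forall i \in I,\ j \in J_i,\ n \in N\\
& STO(s) = 0, && \forall s \notin S^{\ldots}\\
& STF(s) \le dem^{\ldots}_s, && \forall s \in S^{\ldots}\\
& tot(s) \le dem^{\ldots}_s, && \forall s \in S^{\ldots}\\
& D(s,n) = Df(s,n) = 0, && \forall s \notin S^{P} \text{ or } n \in N,\ n > N^{max}\\
& D(s,n),\ Df(s,n) \le \sum_{d \in D^{in}} \sum_{k \in K^{in}} rk(k,s,d), && \forall s \in S^{P},\ n \in N,\ n \le N^{max}\\
& kD(k,s,n) = kDf(k,s,n) = 0, && \forall k \notin K^{in} \text{ or } s \notin S_k \text{ or } n \in N,\ n > N^{max}\\
& kD(k,s,n),\ kDf(k,s,n) \le \sum_{d \in D^{in}} rk(k,s,d), && \forall s \in S_k,\ n \in N,\ n \le N^{max}\\
& slcap(s,n) \le stcap^{min}_s, && \forall s \text{ with a dedicated storage tank}\\
& sla1(k,s,d) = sla2(k,s,d) = 0, && \forall k \notin K^{in} \text{ or } s \notin S_k \text{ or } d \notin D^{in} \text{ or } rk(k,s,d) = 0\\
& sla1(k,s,d) \le rk(k,s,d), && \forall k \in K^{in},\ s \in S_k,\ d \in D^{in}\\
& sla2(k,s,d) \le dem^{\ldots}_s, && \forall k \in K^{in},\ s \in S_k,\ d \in D^{in}\\
& slt1(k,s,d,n) = slt2(k,s,d,n) = 0, && \forall k \notin K^{in} \text{ or } s \notin S_k \text{ or } d \notin D^{in} \text{ or } rk(k,s,d) = 0 \text{ or } n \in N,\ n > N^{max}\\
& slt1(k,s,d,n) \le H - duek(k,s,d), && \forall k \in K^{in},\ s \in S_k,\ d \in D^{in},\ n \in N,\ n \le N^{max}\\
& slt2(k,s,d,n) \le duek(k,s,d), && \forall k \in K^{in},\ s \in S_k,\ d \in D^{in},\ n \in N,\ n \le N^{max}\\
& sll(s) \le dem_s, && \forall s \in S^{P}\\
& sll^{*}(s) \le dem^{*}_s, && \forall s \in S^{in}\\
& wv(i,j,n) = B(i,j,n) = 0, && \forall i \notin I^{in} \text{ or } j \notin J_i \text{ or } n \in N,\ n > N^{max}\\
& D(s,n) = Df(s,n) = 0, && \forall s \notin S^{in} \text{ or } n \in N,\ n > N^{max}
\end{aligned}
\tag{31}
$$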


There are several different objective functions that can 
be employed with a general short-term scheduling 
problem. In this work, we maximize the sale of final 
products while penalizing several other terms includ- 
ing the slack variables introduced previously. The over- 
all objective function is as follows: 


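$$
\begin{aligned}
\max\ & \omega \sum_{s \in S^{in}} price_s\, tot(s) - \lambda \sum_{i \in I^{in}} \sum_{j \in J_i} \sum_{\substack{n \in N\\ n \le N^{max}}} tt(i,j,n)\\
& - \kappa \Big[ \sum_{i \in I^{in}} \sum_{j \in J_i} \sum_{\substack{n \in N\\ n \le N^{max}}} wv(i,j,n) + \sum_{k \in K^{in}} \sum_{i \in I_k} \sum_{\substack{n \in N\\ n \le N^{max}}} y(i,k,n) \Big]\\
& - \gamma \sum_{s} prior_s\, sll(s) - \theta \sum_{k \in K^{in}} slorder(k)\\
& - \alpha \sum_{k \in K^{in}} \sum_{s \in S^{cat1}} \sum_{d \in D^{in}} \big[ sla1(k,s,d) + \mu\, sla2(k,s,d) \big]\\
& - \beta \sum_{k \in K^{in}} \sum_{s \in S^{cat1}} \sum_{d \in D^{in}} \sum_{\substack{n \in N\\ n \le N^{max}}} \big[ slt1(k,s,d,n) + \mu\, slt2(k,s,d,n) \big]\\
& - \eta \sum_{s} prior^{*}_s\, sll^{*}(s) - \delta \sum_{s} \sum_{\substack{n \in N\\ n \le N^{max}}} slcap(s,n)
\end{aligned}
\tag{32}
$$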


where each of the coefficients is used to balance the relative
weight of each term in the overall objective function. The
first term is the maximization of the value of the final
products and is the main term of the objective function. The
second term seeks to minimize the sum of the starting times
of all active processing tasks. This is done to encourage all
tasks to start as early as possible in the scheduling horizon.
Note that this results in a bilinear term, which can be
replaced with an equivalent linear term and set of constraints
[3]. The third term seeks to minimize the number of active
binary variables in the final production schedule. The fourth
term seeks to minimize the slack variable that is activated
when product state (s) does not meet its overall demand for
the time horizon. Coefficient $prior_s$ allows different weights
to be assigned to different product states. The fifth term
minimizes the number of category 1 orders (k) that are not
filled in the time horizon. The sixth term minimizes the
amount of over- and underproduction of orders for category 1
products in the time horizon, where the coefficient μ allows
over- and underproduction to be penalized by different
amounts. The seventh term seeks to minimize the amount of
early and late production of orders for category 1 products
due in the time horizon, where the coefficient μ allows early
and late production to be penalized to different degrees. The
eighth term minimizes the slack variables activated when
insufficient raw material state (s) is produced during the
time horizon, where $prior^{*}_s$ allows different states to be
penalized by different amounts. The ninth, and final, term
seeks to minimize the slack variables activated when
insufficient intermediate state (s) is stored in its dedicated
storage tank at each event point. Typical values for each of
the coefficients are as follows: ω = 1, λ = 1, κ = 10,
γ = 1000, θ = 1000, α = 2000, β = 500, μ = 0.01, η = 50,
δ = 10.
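The sketch below evaluates (32) from aggregated term values, to show how the weights trade off against one another. It assumes the typical coefficient values above pair with the terms in the order the text describes them: production value 1, start times 1, binary variables 10, demand shortfall 1000, unfilled orders 1000, amount mismatch 2000, timing mismatch 500, raw-material shortfall 50, storage shortfall 10. It also assumes a relative weight of 0.01 on over- and early production. That pairing, the dictionary keys, and the term values in the example are our own illustration.

```python
# Our reading of the typical coefficient values; the pairing of values to
# terms is an assumption based on the order in which the terms are described.
WEIGHTS = {
    "production_value": 1.0,     # value of final products
    "start_times": 1.0,          # sum of start times of active tasks
    "binaries": 10.0,            # number of active binary variables
    "demand_shortfall": 1000.0,  # prioritized sll(s)
    "unfilled_orders": 1000.0,   # slorder(k)
    "amount_mismatch": 2000.0,   # sla1 + mu * sla2
    "timing_mismatch": 500.0,    # slt1 + mu * slt2
    "raw_shortfall": 50.0,       # prioritized sll*(s)
    "storage_shortfall": 10.0,   # slcap(s,n)
}
MU = 0.01  # relative weight of overproduction vs. under, early vs. late

def objective(t, w=WEIGHTS, mu=MU):
    """Evaluate the weighted objective (32) from aggregated term values t."""
    return (w["production_value"] * t["value"]
            - w["start_times"] * t["start_times"]
            - w["binaries"] * t["binaries"]
            - w["demand_shortfall"] * t["weighted_sll"]
            - w["unfilled_orders"] * t["unfilled_orders"]
            - w["amount_mismatch"] * (t["under"] + mu * t["over"])
            - w["timing_mismatch"] * (t["late"] + mu * t["early"])
            - w["raw_shortfall"] * t["weighted_raw_sll"]
            - w["storage_shortfall"] * t["slcap"])

# Hypothetical aggregated values for one short-term horizon.
terms = {"value": 50000.0, "start_times": 1200.0, "binaries": 300.0,
         "weighted_sll": 2.0, "unfilled_orders": 1.0, "under": 0.0,
         "over": 100.0, "late": 0.0, "early": 10.0,
         "weighted_raw_sll": 0.0, "slcap": 0.0}
print(objective(terms))
```

Under these weights, a single unit of unmet demand costs as much as 1000 units of production value. This is one reason the objective values reported per horizon can become strongly negative.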


Cases 


In this section, an example problem is presented to 
demonstrate the effectiveness of the rolling horizon 
framework. The example utilizes the proposed frame- 
work to determine the medium range production 
schedule of an industrial batch plant for a two-week 
time period which satisfies customer orders for vari- 
ous products distributed throughout the time period. 
The example is implemented with GAMS 2.50 [1] and 
solved using CPLEX 9.0 [2] on a 3.20 GHz Linux
workstation. The dual simplex method is used with 
best-bound search and strong branching. A relative op- 
timality tolerance equal to 0.001% was used as the ter- 
mination criterion along with a three hour time limit 
and an integer solution limit of 40. 
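The termination rule above combines three criteria: the relative gap, the time limit and the integer solution limit. As a hedged sketch, the code below assumes the common relative-gap definition |bound − incumbent| / (ε + |incumbent|). The exact formula used by a given solver version may differ, and the function names and the numbers in the example are illustrative.

```python
def relative_gap(best_bound, incumbent, eps=1e-10):
    """Relative MIP gap |bound - incumbent| / (eps + |incumbent|)."""
    return abs(best_bound - incumbent) / (eps + abs(incumbent))

TOL = 0.001 / 100  # 0.001% expressed as a fraction, i.e. 1e-5

def should_stop(best_bound, incumbent, elapsed_s, n_solutions,
                tol=TOL, time_limit_s=3 * 3600, solution_limit=40):
    """Stop on the gap tolerance, the three-hour limit, or 40 integer solutions."""
    return (relative_gap(best_bound, incumbent) <= tol
            or elapsed_s >= time_limit_s
            or n_solutions >= solution_limit)

# Maximization: bound 1000.005 vs. incumbent 1000.0 gives a gap of 5e-6.
print(should_stop(1000.005, 1000.0, elapsed_s=600, n_solutions=5))
```

A tolerance of 0.001% is tight for models of this size. In practice, most horizons are therefore more likely to stop on the time limit, which is consistent with every horizon in Table 2 running for three hours.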

The distribution of demands for the entire two- 
week time period is shown in Fig. 3 where the amounts 
are shown in relative terms. There are two categories 
of products, category 1 and 2, and a total of 67 dif- 
ferent products have demands. There are two different 
campaign products that can be scheduled for campaign 
mode production and an additional eight intermediate 
products are used to make final products, even though 
they do not have demands. It is assumed that no final 
products are available at the beginning of the time hori- 
zon although some intermediate materials are available. 
Also, we assume no limitation on external raw materials 
and the zero-wait condition is applied to all intermediate
materials unless they are used as raw materials for
other final products. In this case, unlimited intermediate
storage is allowed. Note that finite intermediate
storage is effectively modeled for those intermediates 
that have a dedicated storage task with a given capac- 
ity limit. In addition, there are two types of connections 
made between each consecutive short-term scheduling 
horizon in the rolling horizon framework: the initial 
available time for each unit and the inventory of inter- 
mediate materials. 


Case 1: Nominal Run 
without Campaign Mode Production 


The example problem considers the production 
scheduling of an industrial batch plant where no type 5 
unit campaign is imposed. Instead, demands for both 
campaign products are created throughout the time 
horizon with a total demand for each product equal to 
the production that would be imposed by a campaign. 
The total time period is 19 days, from D0 to D18. The
rolling horizon framework decomposes the time hori- 
zon into 8 individual subhorizons, each with its own 
products and demands. The results of the decomposi- 
tion for each time horizon can be seen in Table 1. 
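The rolling horizon loop can be sketched as follows. Subhorizons are solved in sequence, and each passes two pieces of state to the next: the time at which each unit becomes available and the inventory of intermediate materials. The windows below are the eight subhorizons of case 1 as listed in Table 2. `solve_short_term` is a hypothetical placeholder for building and solving the short-term model, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Carryover:
    """State passed between consecutive short-term horizons."""
    unit_available: dict = field(default_factory=dict)  # unit -> hour
    inventory: dict = field(default_factory=dict)       # state -> amount

def solve_short_term(days, carry):
    """Hypothetical stand-in for the short-term MILP of one subhorizon.

    A real implementation would build and solve the model for `days`,
    taking each unit's start time from carry.unit_available and the
    initial inventories STO(s) from carry.inventory. Here every unit
    simply becomes free at the end of the window, and inventory passes
    through unchanged.
    """
    end_hour = 24.0 * (days[-1] + 1)
    return Carryover({u: end_hour for u in carry.unit_available},
                     dict(carry.inventory))

def rolling_horizon(windows, carry):
    """Solve the subhorizons in sequence, chaining the carry-over state."""
    history = []
    for days in windows:
        carry = solve_short_term(days, carry)
        history.append((days[0], days[-1], carry))
    return history

# The eight subhorizons of case 1: D0-D2, D3-D4, ..., D13-D14, D15-D18.
WINDOWS = [range(0, 3), range(3, 5), range(5, 7), range(7, 9),
           range(9, 11), range(11, 13), range(13, 15), range(15, 19)]

history = rolling_horizon(WINDOWS, Carryover({"R1": 0.0}, {"int1": 5.0}))
for first, last, carry in history:
    print(f"D{first}-D{last}", carry.unit_available, carry.inventory)
```

Each subproblem only sees its own window. Decisions made early on are therefore fixed when later windows are solved, which is the usual trade-off of rolling horizon decomposition against a single monolithic model.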

The final production schedule for the entire time 
period can be seen in Figs. 4 and 5, where the processing
units (operation type 1, 2, 3, and 5) are shown in
the first figure and the other units (operation type 4a, 
4b, and 6) are shown in the second. Each short-term 
scheduling horizon is represented with a different color 
beginning with black for the first horizon, red for 
the second horizon, green for the third horizon, etc. 
The model and solution statistics for each short-term
scheduling horizon can be seen in Table 2, where each
horizon runs for the time limit of three hours.

Medium-Term Scheduling of Batch Processes, Table 1
Decomposition results for case 1
Additional Products

Medium-Term Scheduling of Batch Processes, Fig. 4
Overall production schedule for processing units for case 1

Medium-Term Scheduling of Batch Processes, Table 2
Model and solution statistics for case 1

Days       Objective Function   Continuous Variables   Constraints
D0-D2           14,001.69              33,064             187,833
D3-D4            4135.24               24,923             125,374
D5-D6         −105,854.81              32,621             258,852
D7-D8           −5496.19               32,167             255,696
D9-D10                                 27,613             175,939
D11-D12                                32,637             272,802
D13-D14        −19,401.39              32,955             282,632
D15-D18        −37,054.00              46,827             321,162

The total demand for the entire 14-day period is
2323.545 and the total production is 2744.005, where
51.674 of the demands are not met. The production
schedules obtained satisfy demands for almost all the
products, though some due dates are relaxed, and also
produce 18.10% more material than the demands require.
Many of the processing units are not fully utilized, as
shown in Table 3, indicating the potential for even more
production in the given time period. Also, note that the
processing units become more idle towards the end of the
overall time horizon. This is because no demands are
specified for the days following day D14, i.e., days D15
to D18. Additional demands at the end of the overall time
horizon or in the following days would generate a more
heavily utilized production schedule.

Medium-Term Scheduling of Batch Processes, Table 3
Unit utilization statistics for case 1

Time Used (h)   Time Left (h)   Percent Utilized
                                     21.49%
                                     74.78%
                                     72.15%
                                     80.92%
                                     62.06%
                                     88.16%
                                     89.47%
                                     61.62%
                                     70.61%
                                     70.66%
                                     68.46%
                                     38.82%
                                     44.08%
                                     79.39%


Conclusions


In this paper, a unit-specific event-based continuous-time
formulation is presented for the medium-term production
scheduling of a large-scale, multipurpose industrial batch
plant. The proposed formulation takes into account a large
number of processing recipes and units and incorporates
several features including various storage policies (UIS,
NIS, ZW), variable batch sizes and processing times, batch
mixing and splitting, sequence-dependent changeover times,
intermediate due dates, products used as raw materials, and
several modes of operation. The scheduling horizon is
several weeks; however, longer time periods can be
addressed with the proposed framework. A key feature of the
proposed formulation is the use of a decomposition model to
split the overall scheduling horizon into smaller
subhorizons which are scheduled in a sequential fashion.
Also, new constraints are added to the short-term scheduling
model in order to model the delivery of orders at
intermediate due dates. The effectiveness of the proposed
approach is demonstrated with an industrial case study.
Results indicate that the rolling horizon approach is
effective at solving large-scale, medium-term production
scheduling problems.
Introduction


The vehicle routing problem (VRP) or the capacitated 
vehicle routing problem (CVRP) is often described as 
a problem in which vehicles based at a central depot 
are required to visit geographically dispersed customers 
in order to fulfill known customer demands. Let
$G = (V, E)$ be a graph where $V = \{i_0, i_1, i_2, \ldots, i_n\}$
is the vertex set ($i_l = i_0$ refers to the depot and the
customers are indexed $i_l = i_1, \ldots, i_n$) and
$E = \{(i_l, i_{l_1}) : i_l, i_{l_1} \in V\}$ is the edge set.
Each customer must be assigned to exactly one of the $k$
vehicles and the total size of deliveries for customers
assigned to each vehicle must not exceed the vehicle
capacity ($Q_k$). If the vehicles are homogeneous, the
capacity for all vehicles is equal and denoted by $Q$.
A demand $q_{i_l}$ and a service time $st_{i_l}$ are associated
with each customer node $i_l$. The travel cost between
customers $i_l$ and $i_{l_1}$ is $c_{i_l i_{l_1}}$. The problem is to
construct a low cost, feasible set of routes - one for each 
vehicle. A route is a sequence of locations that a vehi- 
cle must visit along with the indication of the service it 
provides. The vehicle must start and finish its tour at 
the depot. The most important variants of the vehicle 
routing problem can be found in [12,13,39,54,84]. 
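The definition above can be made concrete with a small feasibility and cost check for a candidate CVRP solution. Each route starts and ends at the depot, every customer is served exactly once, and no route exceeds the capacity Q of a homogeneous fleet. As assumptions for this sketch, travel cost is Euclidean distance between node coordinates, the depot is node 0, and service times are ignored. The function names and data are hypothetical.

```python
import math

def route_cost(route, coords):
    """Cost of one route that starts and ends at the depot (node 0)."""
    path = [0] + list(route) + [0]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

def is_feasible(routes, demand, capacity, n_customers):
    """Every customer 1..n visited exactly once; no vehicle overloaded."""
    visited = [c for r in routes for c in r]
    if sorted(visited) != list(range(1, n_customers + 1)):
        return False
    return all(sum(demand[c] for c in r) <= capacity for r in routes)

coords = {0: (0, 0), 1: (0, 3), 2: (4, 0), 3: (4, 3)}  # node 0 is the depot
demand = {1: 4, 2: 3, 3: 5}
routes = [[1, 3], [2]]                                   # one list per vehicle
print(is_feasible(routes, demand, capacity=10, n_customers=3))
print(sum(route_cost(r, coords) for r in routes))
```

Merging all three customers into one route would lower travel cost here, but the load of 12 would exceed Q = 10. This interaction between routing cost and capacity is what makes the CVRP harder than a plain traveling salesman problem.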

The vehicle routing problem was first introduced by 
Dantzig and Ramser [21]. As it is an NP-hard problem,
instances with a large number of customers cannot be
solved to optimality within reasonable time.
Due to the general inefficiency of the exact methods and 
their inability to solve large scale VRP instances, a large 
number of approximation techniques have been pro- 
posed. These techniques are classified into two main 
categories, the classical heuristics that were developed 
mostly between 1960 and 1990 and the metaheuristics 
that were developed in the last fifteen years. 

In the 1960s and 1970s the first attempts to solve 
the vehicle routing problem focused on route build- 
ing, route improvement and two-phase heuristics. In 
the 1980s a number of mathematical programming 
procedures were proposed for the solution of the 
problem. The most important of them can be found 
in [6,18,19,22,28,29,33,62,88]. 


Metaheuristic Algorithms 
for the Vehicle Routing Problem 


Over the last fifteen years, an increasing number of
metaheuristic algorithms have been proposed. Simulated
annealing, genetic algorithms, neural networks, tabu
search, ant algorithms, together with a number of hybrid
techniques, are the main categories of metaheuristic
procedures. These algorithms have the ability to escape
from local optima. Surveys of metaheuristic algorithms
have been published in [27,31,32,49,50,79].

A number of metaheuristic algorithms have been 
proposed for the solution of the Capacitated Vehicle 
Routing Problem. The most important algorithms pub- 
lished for each metaheuristic algorithm are given in the 
following: 

• Simulated Annealing (SA) [1,3,47,72] plays a special
role within local search for two reasons. First, SA methods
appear to be quite successful when applied
to a broad range of practical problems. Second, 
some threshold accepting algorithms such as SA 
have a stochastic component, which facilitates a the- 
oretical analysis of their asymptotic convergence. 
Simulated Annealing [2] is a stochastic algorithm 
that allows random uphill jumps in a controlled 
fashion in order to provide possible escapes from 
poor local optima. Gradually the probability allow- 
ing the objective function value to increase is low- 
ered until no more transformations are possible. 
Simulated Annealing owes its name to an anal- 
ogy with the annealing process in condensed mat- 
ter physics, where a solid is heated to a maximum 
temperature at which all particles of the solid ran- 
domly arrange themselves in the liquid phase, fol- 
lowed by cooling through careful and slow reduc- 
tion of the temperature until the liquid is frozen 
with the particles arranged in a highly structured lat- 
tice and minimal system energy. This ground state is 
reachable only if the maximum temperature is suffi- 
ciently high and the cooling sufficiently slow. Other- 
wise a meta-stable state is reached. The meta- 
stable state is also reached with a process known 
as quenching, in which the temperature is instan- 
taneously lowered. Its predecessor is the so-called 
Metropolis filter. Simulated Annealing algorithms 
for the VRP are presented in [14,31,63]. 

• The Threshold Accepting Method is a modification
of Simulated Annealing, which together with
record to record travel [25,26] are known as Deter- 
ministic Annealing methods. These methods leave 
out the stochastic element in accepting worse solutions
by introducing a deterministic threshold, denoted by
$Th_m > 0$, and accept a worse solution if
$\Delta = c(S') - c(S) < Th_m$, where $c$ is the cost of the
solution. This is the move acceptance criterion and
the subscript m is an iteration index. Dueck and 
Scheurer [26] were the first to propose the Thre- 
shold Accepting Method for the VRP. Tarantilis 
et al. [81,82] proposed two very efficient algorithms 
belonging to this class: the Backtracking Adaptive 
Threshold Accepting (BATA) and the List-Based 
Threshold Accepting (LBTA). Other deterministic
annealing methods were proposed by Golden et al. [40]
(the Record-to-Record Travel Method) and by Li et
al. [51].

• Tabu search (TS) was introduced by Glover [34,35]
as a general iterative metaheuristic for solving com- 
binatorial optimization problems. Computational 
experience has shown that TS is a well established 
approximation technique, which can compete with 
almost all known techniques and which, by its flexibility,
can beat many classic procedures. It is a form of local
neighborhood search. Each solution S has an associated
set of neighbors N(S). A solution S' ∈ N(S)
can be reached from S by an operation called a move. 
TS can be viewed as an iterative technique which 
explores a set of problem solutions, by repeatedly 
making moves from one solution S to another so- 
lution S’ located in the neighborhood N(S) of S [37]. 
TS moves from a solution to its best admissible 
neighbor, even if this causes the objective func- 
tion to deteriorate. To avoid cycling, solutions that 
have been recently explored are declared forbidden 
or tabu for a number of iterations. The tabu sta- 
tus of a solution is overridden when certain crite- 
ria (aspiration criteria) are satisfied. Sometimes, in- 
tensification and diversification strategies are used to 
improve the search. In the first case, the search is 
accentuated in the promising regions of the feasi- 
ble domain. In the second case, an attempt is made 
to consider solutions in a broad area of the search 
space. Tabu Search algorithms for the VRP are presented
in [7,9,20,30,63,70,71,77,85,89,90].

• Genetic Algorithms (GAs) are search procedures
based on the mechanics of natural selection and nat- 
ural genetics. The first GA was developed by John H. 
Holland in the 1960s to allow computers to evolve 
solutions to difficult search and combinatorial problems,
such as function optimization and machine
learning [44]. Genetic algorithms offer a particu- 
larly attractive approach for problems like vehicle 
routing problem since they are generally quite effec- 
tive for rapid global search of large, non-linear and 
poorly understood spaces. Moreover, genetic algo- 
rithms are very effective in solving large-scale prob- 
lems. Genetic algorithms [38,72] mimic the evolu- 
tion process in nature. GAs are based on an imita- 
tion of the biological process in which new and bet- 
ter populations among different species are devel- 
oped during evolution. Thus, unlike most standard 
heuristics, GAs use information about a population 
of solutions, called individuals, when they search for 
better solutions. A GA is a stochastic iterative proce- 
dure that maintains the population size constant in 
each iteration, called a generation. Their basic oper- 
ation is the mating of two solutions in order to form 
a new solution. To form a new population, a bi- 
nary operator called crossover, and a unary opera- 
tor, called mutation, are applied [65,66]. Crossover 
takes two individuals, called parents, and produces 
two new individuals, called offspring, by swapping
parts of the parents. Genetic Algorithms for the VRP 
are presented in [4,5,8,11,45,56,53,60,64]. 

• Greedy Randomized Adaptive Search Procedure
(GRASP) [73] is an iterative two-phase search method
which has gained considerable popularity in com- 
binatorial optimization. Each iteration consists of 
two phases, a construction phase and a local search 
procedure. In the construction phase, a randomized 
greedy function is used to build up an initial solu- 
tion. This randomized technique provides a feasi- 
ble solution within each iteration. This solution is 
then subjected to improvement attempts in the local
search phase. The final result is simply the best so- 
lution found over all iterations. Greedy Randomized 
Adaptive Search Procedure algorithms for the VRP 
are presented in [17,42,55]. 

• The use of Artificial Neural Networks to find
good solutions to combinatorial optimization prob- 
lems has recently caught some attention. A neural 
network consists of a network [76] of elementary 
nodes (neurons) that are linked through weighted 
connections. The nodes represent computational 
units, which are capable of performing a simple 
computation, consisting of a summation of the 


weighted inputs, followed by the addition of a con- 
stant called the threshold or bias, and the applica- 
tion of a nonlinear response (activation) function. 
The result of the computation of a unit constitutes 
its output. This output is used as an input for the 
nodes to which it is linked through an outgoing con- 
nection. The overall task of the network is to achieve 
a certain network configuration, for instance a re- 
quired input-output relation, by means of the col- 
lective computation of the nodes. This process is of- 
ten called self-organization. Neural Networks algo- 
rithm for the VRP are presented in [61,83]. 
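The computation performed by a single unit can be written in a few lines. In this sketch the function names and the choice of `tanh` or a logistic sigmoid as activation function are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs, plus the bias (threshold), passed through a nonlinearity."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(s)

# 0.4*1.0 - 0.2*0.5 + 0.1 = 0.4, so y = sigmoid(0.4)
y = neuron_output([1.0, 0.5], [0.4, -0.2], bias=0.1, activation=sigmoid)
```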

The Ant Colony Optimization (ACO) metaheuris- 
tic is a relatively new technique for solving com- 
binatorial optimization problems (COPs). Based 
strongly on the Ant System (AS) metaheuristic de- 
veloped by Dorigo, Maniezzo and Colorni [24], ant 
colony optimization is derived from the foraging 
behaviour of real ants in nature. The main idea of 
ACO is to model the problem as the search for 
a minimum cost path in a graph. Artificial ants walk 
through this graph, looking for good paths. Each ant 
has a rather simple behavior so that it will typically 
only find rather poor-quality paths on its own. Bet- 
ter paths are found as the emergent result of the 
global cooperation among ants in the colony. An 
ACO algorithm consists of a number of cycles (it- 
erations) of solution construction. During each it- 
eration a number of ants (which is a parameter) 
construct complete solutions using heuristic infor- 
mation and the collected experiences of previous 
groups of ants. These collected experiences are rep- 
resented by a digital analogue of trail pheromone 
which is deposited on the constituent elements of 
a solution. Small quantities are deposited during 
the construction phase while larger amounts are de- 
posited at the end of each iteration in proportion 
to solution quality. Pheromone can be deposited 
on the components and/or the connections used in 
a solution depending on the problem. Ant Colony 
Optimization algorithms for the VRP are presented 
in [10,15,16,23,57,67,68,69]. 
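A compact sketch of the cycle just described, applied to a small symmetric TSP as a simplified stand-in for the VRP. It follows the classical Ant System pattern: each ant picks the next city with probability proportional to pheromone^alpha * (1/distance)^beta, pheromone evaporates at rate `rho`, and each ant deposits `q / L` on the edges of its tour of length L at the end of the iteration. The small deposits during construction mentioned above are omitted, and the function names and all parameter values are assumptions.

```python
import math
import random

def tour_length(tour, d):
    """Length of a closed tour under distance matrix d."""
    n = len(tour)
    return sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))

def ant_colony_tsp(d, n_ants=10, n_iter=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(d)
    tau = [[1.0] * n for _ in range(n)]  # pheromone trail on every edge
    best_tour, best_len = None, math.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            # each ant builds a complete tour from a random start city
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # attractiveness = pheromone^alpha * (heuristic 1/distance)^beta
                w = [tau[i][j] ** alpha * (1.0 / d[i][j]) ** beta for j in cand]
                j = rng.choices(cand, weights=w)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour, tour_length(tour, d)))
        for row in tau:                  # evaporation on all edges
            for j in range(n):
                row[j] *= 1.0 - rho
        for tour, length in tours:       # shorter tours deposit more pheromone
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len

pts = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0.5)]
d = [[math.dist(p, r) for r in pts] for p in pts]
tour, length = ant_colony_tsp(d)
```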

Path Relinking This approach generates new solutions by exploring trajectories that connect high-quality solutions: starting from one of these solutions, called the starting solution, it generates a path in the neighborhood space that leads towards the other solution, called the target solution [36]. Two metaheuristic algorithms that use the path relinking strategy have been proposed, the first as part of a Tabu Search metaheuristic [43] and the second as part of a Particle Swarm Optimization metaheuristic [52].
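A minimal sketch of path relinking on bit vectors: starting from `start`, each step copies one attribute (bit) of `target` into the current solution, choosing greedily the copy with the lowest cost, and the best solution met on the path is returned. The bit-vector representation, the greedy step rule and the function name are assumptions; in VRP applications the moves would act on routes, and the endpoints are often excluded from the returned candidates.

```python
def path_relinking(start, target, cost):
    """Greedy walk from `start` to `target` over bit vectors; returns the best point seen."""
    current = list(start)
    best, best_cost = list(current), cost(current)
    remaining = {i for i in range(len(start)) if start[i] != target[i]}
    while remaining:
        # evaluate every move that copies one more attribute from the target
        scored = []
        for i in sorted(remaining):
            trial = list(current)
            trial[i] = target[i]
            scored.append((cost(trial), i))
        c, i = min(scored)            # best move toward the target
        current[i] = target[i]
        remaining.remove(i)
        if c < best_cost:
            best, best_cost = list(current), c
    return best, best_cost

def cost(x):
    return (sum(x) - 2) ** 2      # toy cost: best when exactly two bits are set

best, c = path_relinking([0, 0, 0, 0], [1, 1, 1, 1], cost)
```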

Guided Local Search (GLS), originally proposed by Voudouris and Tsang [86,87], is a general optimization technique suitable for a wide range of combinatorial optimization problems. The main focus is on the exploitation of problem- and search-related information to effectively guide local search heuristics in the vast search spaces of NP-hard optimization problems. To this end, GLS augments the objective function of the problem to be minimized with a set of penalty terms, which are dynamically manipulated during the search process, and passes this augmented function, instead of the original one, to the local search procedure for minimization. The penalty terms confine the local search and focus its attention on promising regions of the search space. Local search is called iteratively: each time it gets caught in a local minimum, the penalties are modified and local search is called again to minimize the modified cost function. Guided Local Search algorithms for the VRP are presented in [58,59].
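The penalty mechanism can be sketched for a small TSP, with tour edges as the penalized features. The sketch assumes the usual GLS utility rule util(e) = c_e / (1 + p_e) for the features of the current local minimum, a fixed penalty weight `lam` (an assumption; it is often scaled to the cost of the first local optimum), and a plain first-improvement 2-opt as the local search. All names are hypothetical.

```python
import itertools
import math
import random

def tour_edges(tour):
    """Edges of a closed tour as sorted pairs (the features penalized by GLS)."""
    n = len(tour)
    return [tuple(sorted((tour[k], tour[(k + 1) % n]))) for k in range(n)]

def guided_local_search(d, tour, lam=0.3, n_rounds=30):
    n = len(tour)
    pen = {}  # penalty counter p_e for each edge feature e

    def length(t):
        return sum(d[a][b] for a, b in tour_edges(t))

    def augmented(t):
        # g(t) = f(t) + lam * (sum of penalties of the features present in t)
        return length(t) + lam * sum(pen.get(e, 0) for e in tour_edges(t))

    def two_opt(t):
        # first-improvement 2-opt descent on the augmented cost g
        improved = True
        while improved:
            improved = False
            for i, j in itertools.combinations(range(n), 2):
                if j - i < 2:
                    continue
                cand = t[:i + 1] + t[i + 1:j + 1][::-1] + t[j + 1:]
                if augmented(cand) < augmented(t) - 1e-12:
                    t, improved = cand, True
        return t

    best = list(tour)
    for _ in range(n_rounds):
        tour = two_opt(list(tour))
        if length(tour) < length(best):      # the true cost decides the incumbent
            best = list(tour)
        # penalize the feature(s) of maximum utility c_e / (1 + p_e)
        util = {e: d[e[0]][e[1]] / (1 + pen.get(e, 0)) for e in tour_edges(tour)}
        top = max(util.values())
        for e, u in util.items():
            if u == top:
                pen[e] = pen.get(e, 0) + 1
    return best, length(best)

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(8)]
d = [[math.dist(p, r) for r in pts] for p in pts]
start = list(range(8))
best, best_len = guided_local_search(d, start)
```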

Particle Swarm Optimization (PSO) is a population-based swarm intelligence algorithm. It was originally proposed by Kennedy and Eberhart as a simulation of the social behavior of organisms such as bird flocks and fish schools [46]. PSO uses the physical movements of the individuals in the swarm and has a flexible and well-balanced mechanism for enhancing and adapting its global and local exploration abilities. The first PSO algorithm for the Vehicle Routing Problem was proposed in [52].
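A minimal sketch of the standard global-best PSO for continuous minimization. The velocity update with inertia weight `w` and acceleration coefficients `c1`, `c2` is the common textbook form (the inertia term is a later refinement of the original proposal); the parameter values, the clamping to the search box and the function names are assumptions.

```python
import random

def pso(f, dim, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]                    # each particle's best position
    pbest_val = [f(p) for p in x]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm
    for _ in range(n_iter):
        for k in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive (own best) + social (swarm best) components
                v[k][j] = (w * v[k][j]
                           + c1 * r1 * (pbest[k][j] - x[k][j])
                           + c2 * r2 * (gbest[j] - x[k][j]))
                x[k][j] = min(max(x[k][j] + v[k][j], lo), hi)  # stay inside bounds
            val = f(x[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = x[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[k][:], val
    return gbest, gbest_val

def sphere(z):
    return sum(t * t for t in z)

best, val = pso(sphere, dim=3, bounds=(-5.0, 5.0))
```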

One of the most interesting developments that have occurred in the area of TS in recent years is the concept of Adaptive Memory developed by Rochat and Taillard [74]. It is used mostly in TS, but its applicability is not limited to this type of metaheuristic. An adaptive memory is a pool of good solutions that is dynamically updated throughout the search process. Periodically, some elements of these solutions are extracted from the pool and combined differently to produce new good solutions. Several efficient algorithms based on the concept of Adaptive Memory have been proposed [74,78,79,80].
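The core of the adaptive memory idea, extracting elements of pooled solutions and recombining them, can be sketched for the VRP with routes as the extracted elements. This is a simplified illustration loosely following the scheme described above, with assumptions throughout: routes are drawn from the pool with a rank-based bias toward better solutions, routes that overlap already covered customers are discarded, and the leftover customers would be routed by some other, unspecified heuristic. Function names and the weighting are hypothetical.

```python
import random

def build_from_memory(pool, customers, rng):
    """Assemble a partial VRP solution from routes stored in the adaptive memory.

    pool: list of (cost, routes) pairs, each route a tuple of customer ids.
    Returns the chosen routes and the set of customers they leave uncovered.
    """
    ranked = sorted(pool, key=lambda item: item[0])
    # rank-based weights: better solutions are more likely to donate routes
    weights = [len(ranked) - r for r in range(len(ranked))]
    candidates = [(route, w) for (_, routes), w in zip(ranked, weights) for route in routes]
    chosen, covered = [], set()
    while candidates:
        routes, ws = zip(*candidates)
        route = rng.choices(routes, weights=ws)[0]
        chosen.append(route)
        covered |= set(route)
        # drop every remaining route that shares a customer with the chosen ones
        candidates = [(r, w) for r, w in candidates if covered.isdisjoint(r)]
    return chosen, set(customers) - covered

def update_memory(pool, solution, cost, max_size):
    """Insert a new solution and keep only the `max_size` best ones."""
    pool.append((cost, solution))
    pool.sort(key=lambda item: item[0])
    del pool[max_size:]

pool = [(10.0, [(1, 2), (3, 4)]), (12.0, [(1, 3), (2, 4, 5)]), (15.0, [(5,), (1, 2, 3, 4)])]
routes, leftover = build_from_memory(pool, range(1, 6), random.Random(0))
```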
Variable Neighborhood Search (VNS) is a metaheuristic for solving combinatorial optimization problems whose basic idea is the systematic change of neighborhood within a local search [41]. Variable Neighborhood Search algorithms for the VRP are presented in [48].

Introduction


Many decision problems in various areas such as busi- 
ness, engineering, economics, and science, including 
those in manufacturing, location, routing, and schedul- 
ing, may be formulated as optimization problems. Ow- 
ing to the complexity of many of these optimization 
problems, particularly those of large sizes encountered 
in most practical settings, exact algorithms often per- 
form very poorly, in some cases taking days or more 
to find moderately decent, let alone optimal, solutions 
even to fairly small instances. As a result, heuristic algorithms are clearly preferable in practical applications.

As an extension of simple heuristics, a large num- 
ber of local search approaches have been developed to 
improve given feasible solutions. The main drawback 
of these approaches is their inability to continue the 
search upon becoming trapped in local optima. This 
leads to consideration of techniques for guiding known 
heuristics to overcome local optimality. Following this theme, metaheuristics have become one of the most important classes of approaches for solving optimization problems.
They support managers in decision-making with robust 
tools that provide high-quality solutions to important 
applications in reasonable time horizons. 

We describe metaheuristics mainly from an oper- 
ations research perspective. Earlier survey papers on 
metaheuristics include those of Blum and Roli [14] and Voß [95]. Here we occasionally rely on the latter. The general concepts have not become obsolete, and most changes consist of updates to the most recent references. A handbook on metaheuristics is available describing a great variety of concepts by various authors in a comprehensive manner [44].

Definitions


The basic concept of heuristic search as an aid to prob- 
lem solving was first introduced in [76]. A heuristic is 
a technique (consisting of a rule or a set of rules) which 
seeks (and hopefully finds) good solutions at a reason- 
able computational cost. A heuristic is approximate in 
the sense that it provides (hopefully) a good solution for 
relatively little effort, but it does not guarantee optimal- 
ity. 

Heuristics provide simple means of indicating 
which among several alternatives seems to be best. That 
is, “Heuristics are criteria, methods, or principles for 
deciding which among several alternative courses of 
action promises to be the most effective in order to 
achieve some goal. They represent compromises be- 
tween two requirements: the need to make such crite- 
ria simple and, at the same time, the desire to see them 
discriminate correctly between good and bad choices. 
A heuristic may be a rule of thumb that is used to guide 
one’s action” [73]. 

Greedy heuristics are simple iterative approaches 
available for any kind of (e. g., combinatorial) optimiza- 
tion problem. A good characterization is their myopic 
behavior. A greedy heuristic starts with a given feasible 
or infeasible solution. In each iteration there are a num- 
ber of alternative choices (moves) that can be made to 
transform the solution. From these alternatives, which consist of fixing (or changing) one or more variables, a greedy choice is made, i.e., the best alternative according to a given measure is chosen, and this is repeated until no such transformations are possible any longer.
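A greedy construction heuristic in its simplest form is the nearest-neighbor rule for the traveling salesman problem, sketched below under the assumption of a distance matrix `d`; each step fixes the next city myopically, and the decision is never revisited. The function name is hypothetical.

```python
def nearest_neighbor_tour(d, start=0):
    """Greedy construction: always move to the closest unvisited city."""
    n = len(d)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # myopic best choice; sorting makes tie-breaking deterministic
        nxt = min(sorted(unvisited), key=lambda j: d[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

pos = [0, 3, 1, 2]                        # cities on a line
d = [[abs(a - b) for b in pos] for a in pos]
tour = nearest_neighbor_tour(d)           # visits positions 0, 1, 2, 3
```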

Usually, a greedy construction heuristic starts with 
an incomplete solution and completes it stepwise. Sav- 
ings and dual algorithms follow the same iterative 
scheme: dual heuristics change an infeasible low-cost 
solution until reaching feasibility; savings algorithms 
start with a high-cost solution and realize the highest 
savings as long as possible. Moreover, in all three cases, once an element has been chosen, this decision is (usually) kept and not reversed for the rest of the algorithm.
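The savings idea is best known from the Clarke and Wright heuristic for the capacitated VRP. The sketch below is a common parallel version, given as an illustration: it starts from one depot-customer-depot route per customer (a high-cost solution) and merges routes in order of decreasing savings s(i, j) = d(0, i) + d(0, j) - d(i, j), subject to capacity, never undoing a merge. Node 0 as the depot and the function name are assumptions.

```python
import math

def clarke_wright(d, demand, capacity):
    """Parallel savings heuristic for the capacitated VRP; node 0 is the depot."""
    n = len(d)
    routes = {i: [i] for i in range(1, n)}   # route id -> customer sequence
    route_of = {i: i for i in range(1, n)}   # customer -> route id
    load = {i: demand[i] for i in range(1, n)}
    savings = sorted(((d[0][i] + d[0][j] - d[i][j], i, j)
                      for i in range(1, n) for j in range(i + 1, n)), reverse=True)
    for s, i, j in savings:
        if s <= 0:
            break                             # no further positive savings
        ri, rj = route_of[i], route_of[j]
        if ri == rj or load[ri] + load[rj] > capacity:
            continue
        a, b = routes[ri], routes[rj]
        # i and j must be route ends so that the merge links them directly
        if a[-1] == i and b[0] == j:
            merged = a + b
        elif a[0] == i and b[-1] == j:
            merged = b + a
        elif a[-1] == i and b[-1] == j:
            merged = a + b[::-1]
        elif a[0] == i and b[0] == j:
            merged = a[::-1] + b
        else:
            continue
        routes[ri] = merged
        load[ri] += load[rj]
        for c in b:
            route_of[c] = ri
        del routes[rj], load[rj]
    return list(routes.values())

pts = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]   # depot first
d = [[math.dist(p, q) for q in pts] for p in pts]
routes = clarke_wright(d, demand=[0, 1, 1, 1, 1], capacity=2)
```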

As each alternative has to be measured, in general 
we may define some sort of heuristic measure (provid- 
ing, e. g., some priority values or some ranking infor- 
mation) which is iteratively followed until a complete 
solution is built. Usually this heuristic measure is ap- 
plied in a greedy fashion. 


For heuristics we usually distinguish between finding initial feasible solutions and improving them. Accordingly, we first discuss local search before characterizing metaheuristics.


Local Search 


The basic principle of local search is to successively 
alter solutions locally. Related transformations are de- 
fined by neighborhoods which for a given solution in- 
clude all solutions that can be reached by one move. 
That is, neighborhood search usually is assumed to 
correspond to the process of iteratively moving from 
one solution to another one by performing some 
sort of operation. More formally, each solution of 
a problem has an associated set of neighbors called 
its neighborhood, i.e., solutions that can be obtained 
by a single operation called transformation or move. 
Common transformations are, e.g., to add or drop some problem-specific individual components. Other options are to exchange two components simultaneously, or to swap them. Furthermore, components may be shifted from a certain position into other positions. All components involved within a specific move are called its elements or attributes.

Moves must be evaluated by some heuristic mea- 
sure to guide the search. Often one uses the implied 
change of the objective function value, which may pro- 
vide reasonable information about the (local) advan- 
tage of moves. Following a greedy strategy, steepest de- 
scent (SD) corresponds to selecting and performing in 
each iteration the best move until the search stops at 
a local optimum. In this sense, savings algorithms correspond to SD.
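A minimal sketch of steepest descent for the TSP using the 2-opt neighborhood, in which a move removes two tour edges and reconnects the tour by reversing the segment between them. The neighborhood choice and the function names are illustrative assumptions.

```python
import math
import random

def tour_length(tour, d):
    n = len(tour)
    return sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))

def steepest_descent_2opt(tour, d):
    """Apply the best improving 2-opt move until none exists (a local optimum)."""
    tour = list(tour)
    n = len(tour)
    while True:
        best_delta, best_move = -1e-12, None       # require a strict improvement
        for i in range(n - 1):
            for j in range(i + 2, n if i > 0 else n - 1):
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                # length change when edges (a,b),(c,e) become (a,c),(b,e)
                delta = d[a][c] + d[b][e] - d[a][b] - d[c][e]
                if delta < best_delta:
                    best_delta, best_move = delta, (i, j)
        if best_move is None:
            return tour
        i, j = best_move
        tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])

rng = random.Random(2)
pts = [(rng.random(), rng.random()) for _ in range(10)]
d = [[math.dist(p, q) for q in pts] for p in pts]
start = list(range(10))
local_opt = steepest_descent_2opt(start, d)
```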

As the solution quality of local optima may be 
unsatisfactory, we need mechanisms which guide the 
search to overcome local optimality. A simple strategy 
called iterated local search is to iterate/restart the local 
search process after a local optimum has been obtained, 
which requires some perturbation scheme to generate 
a new initial solution (e. g., performing some random 
moves). Of course, more structured ways to overcome 
local optimality may be advantageous. 
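The restart-with-perturbation scheme can be sketched on a toy binary problem. The objective f(x) = sum of Q[i][j] x_i x_j with random coefficients, the single-flip descent, the kick of three random flips and the accept-if-not-worse rule are all assumptions chosen for brevity; the function names are hypothetical.

```python
import random

def make_qubo(n, seed):
    """Hypothetical test objective: random coefficients of f(x) = sum Q[i][j] x_i x_j."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def f(x, Q):
    n = len(x)
    return sum(Q[i][j] for i in range(n) if x[i] for j in range(n) if x[j])

def local_search(x, Q):
    """First-improvement descent over single-bit flips; returns a local optimum."""
    x, fx = list(x), f(x, Q)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1
            fy = f(x, Q)
            if fy < fx - 1e-12:
                fx, improved = fy, True   # keep the improving flip
            else:
                x[i] ^= 1                 # undo the flip
    return x, fx

def iterated_local_search(Q, n_iter=50, kick=3, seed=0):
    rng = random.Random(seed)
    n = len(Q)
    best, best_f = local_search([rng.randint(0, 1) for _ in range(n)], Q)
    current, current_f = best, best_f
    for _ in range(n_iter):
        y = list(current)
        for i in rng.sample(range(n), kick):   # perturbation: a few random flips
            y[i] ^= 1
        y, fy = local_search(y, Q)             # restart local search from there
        if fy <= current_f:                    # acceptance: not worse than current
            current, current_f = y, fy
            if fy < best_f:
                best, best_f = y, fy
    return best, best_f

Q = make_qubo(12, seed=3)
x, fx = iterated_local_search(Q)
```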

A general survey of local search can be found in [1] 
and the references from [2]. A simple template is pro- 
vided in [90]. 
Variable ways of handling neighborhoods, which date back to the 1970s (see Lin and Kernighan [66]), are still a topic within local search. Consider an arbitrary neighborhood structure $N$, which defines for any solution $s$ a set of neighbor solutions $N_1(s)$ as a neighborhood of depth $d = 1$. In a straightforward way, a neighborhood $N_{d+1}(s)$ of depth $d+1$ is defined as the set

$$N_{d+1}(s) = N_d(s) \cup \{\, s' \mid \exists\, s'' \in N_d(s) : s' \in N_1(s'') \,\}.$$

In general, a large $d$ might be unreasonable, as the neighborhood size may grow exponentially. However, depths of two or three may be appropriate. Furthermore, temporarily increasing the neighborhood depth has been found to be a reasonable mechanism to overcome basins of attraction, e.g., when a large number of neighbors with equal quality exist.
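The recursive definition of $N_{d+1}(s)$ translates directly into code. The sketch uses bit strings with the one-bit-flip neighborhood as $N_1$, an assumption for illustration; note that under this definition $s$ itself belongs to $N_2(s)$, and that the size grows quickly with $d$.

```python
def flip_neighbors(s):
    """N_1(s): all bit strings at Hamming distance 1 from s."""
    return {s[:i] + ('1' if s[i] == '0' else '0') + s[i + 1:] for i in range(len(s))}

def neighborhood(s, depth):
    """N_d(s), built as N_{d+1}(s) = N_d(s) | {s' in N_1(s'') for some s'' in N_d(s)}."""
    nd = flip_neighbors(s)
    for _ in range(depth - 1):
        nd = nd | {t for u in nd for t in flip_neighbors(u)}
    return nd

sizes = [len(neighborhood("0000", k)) for k in (1, 2, 3)]   # 4, 11, 15
```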

Large-scale neighborhoods have become an impor- 
tant topic (see, e.g., [5] for a survey), especially when 
efficient ways are at hand for exploring them. Related 
research can also be found under various names; see, 
e. g., [75] for the idea of ejection chains. 

Stochastic local search is essentially local search as described so far, enhanced by randomizing choices. That is, a stochastic local search algorithm is
a local search algorithm making use of randomized 
choices in generating or selecting candidate solutions 
for given instances of optimization problems. Random- 
ness may be used for search initialization as well as the 
computation of search steps. A comprehensive treat- 
ment of stochastic local search is given in [58]. 


Metaheuristics 


The formal definition of metaheuristics draws on a variety of definitions from different authors, building on [39]. Basically, a metaheuristic is a top-level strategy that guides an underlying heuristic solving a given problem. In that sense we distinguish between a guiding process and an application process. The guiding process decides upon possible (local) moves and forwards its decision to the application process, which then executes the move chosen. In addition, the application process provides information for the guiding process (depending on the requirements of the respective metaheuristic), such as the recomputed set of possible moves.

According to [43], “metaheuristics in their modern 
forms are based on a variety of interpretations of what 
constitutes intelligent search”, where the term “intelli- 
gent search” has been made prominent by Pearl [73] 
(regarding heuristics in an artificial intelligence con- 
text; see also [92] regarding an operations research con- 
text). In that sense we may also consider the following 
definition: “A metaheuristic is an iterative generation 
process which guides a subordinate heuristic by com- 
bining intelligently different concepts for exploring and 
exploiting the search spaces using learning strategies to 
structure information in order to find efficiently near- 
optimal solutions” [72]. 

To summarize, the following definition seems to be most appropriate: “A metaheuristic is an iterative master process that guides and modifies the operations of subordinate heuristics to efficiently produce high-quality solutions. It may manipulate a complete (or incomplete) single solution or a collection of solutions at each iteration. The subordinate heuristics may be high (or low) level procedures, or a simple local search, or just a construction method. The family of metaheuristics includes, but is not limited to, adaptive memory procedures, tabu search, ant systems, greedy randomized adaptive search, variable neighborhood search, evolutionary methods, genetic algorithms, scatter search, neural networks, simulated annealing, and their hybrids” (p. ix in [97]).

Metaheuristics, Figure 1: Simplified metaheuristics inheritance tree (nodes include heuristic search, heuristic measure, evolutionary algorithm, steepest descent, simulated annealing, tabu search, the pilot method, SA (Johnson et al.), static TS and reactive TS).

We describe the ingredients and basic concepts of 
various metaheuristic strategies like tabu search (TS), 
simulated annealing (SA), and scatter search. This is 
based on a simplified view of a possible inheritance tree 
for heuristic search methods, illustrating the relation- 
ships between some of the most important methods 
discussed below, as shown in Fig. 1. 

We also emphasize advances including the impor- 
tant incorporation of exact methods into intelligent 
search. Furthermore, general frames are sketched that 
may subsume various approaches within the meta- 
heuristics field. 


Metaheuristic Methods 


We survey the basic concepts of some of the most 
important metaheuristics. We shall see that adaptive 
processes originating from different settings such as 
psychology (“learning”), biology (“evolution”), physics 
(“annealing”), and neurology (“nerve impulses”) have 
served as interesting starting points. 


Simple Local Search Based Metaheuristics 


To improve the efficiency of greedy heuristics, one may 
apply generic strategies to be used alone or in combina- 
tion with each other, namely, changing the definition 
of alternative choices, look ahead evaluation, candidate 
lists, and randomized selection criteria bound up with 
repetition, as well as combinations with local search or 
other methods. 


Greedy Randomized Adaptive Search Replacing the greedy choice criterion by a random strategy, one can run the algorithm several times and obtain a large number of different solutions. A combination of best and random choice seems to be appropriate: we define a candidate list as a list consisting of a number of (best, i.e., first best, second best, third best, etc.) alternatives. Out of this list one alternative is chosen randomly. The length of the candidate list is given either as an absolute value, a percentage of all feasible alternatives, or implicitly by defining an allowed quality gap (to the best alternative), which also may be an absolute value or a percentage.

Replicating a search procedure to determine a local 
optimum multiple times with different starting points 
has been given the acronym GRASP and investigated 
with respect to different applications. A comprehensive 
survey of GRASP and its applications is given in [32]. 
It should be noted that GRASP goes back to older ap- 
proaches [52], which is frequently overlooked in many 
applications. The different initial solutions or starting 
points are found by a greedy procedure incorporating 
a probabilistic component. That is, given a candidate 
list to choose from, GRASP randomly chooses one of 
the best candidates from this list in a greedy fashion, 
but not necessarily the best possible choice. 

The underlying principle is to investigate many 
good starting points through the greedy procedure and 
thereby to increase the possibility of finding a good lo- 
cal optimum on at least one replication. The method 
is said to be adaptive as the greedy function takes into 
account previous decisions when performing the next 
choice. 
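A minimal GRASP sketch for the TSP, assuming a nearest-neighbor style greedy function: the candidate list contains every city whose distance is within a quality gap `alpha` of the best choice (alpha = 0 is pure greedy, alpha = 1 purely random), one of them is picked at random, and each constructed tour is improved by 2-opt. The instance, parameters and helper names are assumptions.

```python
import math
import random

def tour_length(t, d):
    return sum(d[t[k]][t[(k + 1) % len(t)]] for k in range(len(t)))

def randomized_greedy(d, alpha, rng):
    """Construction phase: random choice from a quality-gap candidate list."""
    n = len(d)
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        last = tour[-1]
        cand = sorted(unvisited)
        costs = [d[last][j] for j in cand]   # greedy values depend on earlier choices
        lo, hi = min(costs), max(costs)
        rcl = [j for j, c in zip(cand, costs) if c <= lo + alpha * (hi - lo)]
        nxt = rng.choice(rcl)
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(t, d):
    """Local search phase: first-improvement 2-opt."""
    t, n, improved = list(t), len(t), True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n if i > 0 else n - 1):
                a, b, c, e = t[i], t[i + 1], t[j], t[(j + 1) % n]
                if d[a][c] + d[b][e] < d[a][b] + d[c][e] - 1e-12:
                    t[i + 1:j + 1] = reversed(t[i + 1:j + 1])
                    improved = True
    return t

def grasp(d, n_iter=20, alpha=0.3, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):                  # independent replications
        t = two_opt(randomized_greedy(d, alpha, rng), d)
        if best is None or tour_length(t, d) < tour_length(best, d):
            best = t
    return best

rng = random.Random(5)
pts = [(rng.random(), rng.random()) for _ in range(9)]
d = [[math.dist(p, q) for q in pts] for p in pts]
best = grasp(d)
```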


The Pilot Method Building on a simple greedy algo- 
rithm such as, e.g., a construction heuristic, the pilot 
method [29,30] is a metaheuristic not necessarily based 
on a local search in combination with an improvement 
procedure. It primarily looks ahead for each possible lo- 
cal choice (by computing a so-called “pilot” solution), 
memorizing the best result, and performing the respec- 
tive move. (Very similar ideas have been investigated 
under the name rollout method [13].) One may ap- 
ply this strategy by successively performing a greedy 
heuristic for all possible local steps (i.e., starting with 
all incomplete solutions resulting from adding some 
not yet included element at some position to the cur- 
rent incomplete solution). The look ahead mechanism 
of the pilot method is related to increased neighbor- 
hood depths as the pilot method exploits the evaluation 
of neighbors at larger depths to guide the neighbor selection at depth one.

In most applications, it is reasonable to restrict 
the pilot process to some evaluation depth. That is, 
the method is performed up to an incomplete solu- 
tion (e. g., partial assignment) based on this evaluation 
depth and is then completed by continuing with a conventional heuristic. For a recent study applying the pilot method to several combinatorial optimization problems with very good results, see [96]. Additional applications can be found, e.g., in [18,68].
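The look-ahead can be sketched for the TSP with the nearest-neighbor rule as the pilot heuristic, an assumption made for illustration: for every city that could be appended next, the partial tour is completed greedily, and only the step leading to the best pilot solution is committed. This variant evaluates to full depth; restricting the evaluation depth, as suggested above, is left out. Function names are hypothetical.

```python
import math
import random

def tour_length(t, d):
    return sum(d[t[k]][t[(k + 1) % len(t)]] for k in range(len(t)))

def greedy_complete(partial, d):
    """Complete a partial tour with the nearest-neighbor rule (the pilot heuristic)."""
    tour = list(partial)
    unvisited = set(range(len(d))) - set(tour)
    while unvisited:
        last = tour[-1]
        nxt = min(sorted(unvisited), key=lambda j: d[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def pilot_method(d, start=0):
    partial = [start]
    while len(partial) < len(d):
        best_city, best_len = None, math.inf
        for c in sorted(set(range(len(d))) - set(partial)):
            pilot = greedy_complete(partial + [c], d)   # look ahead to a full solution
            length = tour_length(pilot, d)
            if length < best_len:
                best_city, best_len = c, length
        partial.append(best_city)                        # commit only one step
    return partial

rng = random.Random(4)
pts = [(rng.random(), rng.random()) for _ in range(9)]
d = [[math.dist(p, q) for q in pts] for p in pts]
tour = pilot_method(d)
```

Because the greedy continuation of the current best pilot solution is always among the candidates evaluated at the next step, this sketch never returns a tour longer than the plain nearest-neighbor tour from the same start city.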


Variable Neighborhood Search The basic idea of variable neighborhood search (VNS) is to change the neighborhood during the search in a systematic way. VNS usually explores increasingly distant neighborhoods of a given solution, and jumps from this solution to a new one if and only if an improvement has been made. In this way, favorable characteristics of incumbent solutions, e.g., that many variables are already at their appropriate value, are often kept and used to obtain promising neighboring solutions.

Moreover, a local search routine is applied repeat- 
edly to get from these neighboring solutions to local op- 
tima. This routine may also use several neighborhoods. 
Therefore, to construct different neighborhood structures and to perform a systematic search, one needs a way of measuring the distance between any two solutions, i.e., one needs to supply the solution space with some metric (or quasi-metric) and then induce neighborhoods from it. For an excellent treatment of various aspects of VNS see [51].
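A basic VNS sketch on a toy binary problem, assuming that $N_k(x)$ consists of the points at Hamming distance $k$, so that the neighborhoods are induced by a metric as described above. Shaking draws a random point of $N_k$, local search descends from it, and the search either moves and restarts with $k = 1$ or tries the next, larger neighborhood. The objective, parameters and names are illustrative assumptions.

```python
import random

def make_qubo(n, seed):
    """Hypothetical test objective: random coefficients of f(x) = sum Q[i][j] x_i x_j."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def f(x, Q):
    n = len(x)
    return sum(Q[i][j] for i in range(n) if x[i] for j in range(n) if x[j])

def local_search(x, Q):
    """First-improvement descent over single-bit flips."""
    x, fx = list(x), f(x, Q)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1
            fy = f(x, Q)
            if fy < fx - 1e-12:
                fx, improved = fy, True
            else:
                x[i] ^= 1
    return x, fx

def vns(Q, k_max=4, n_iter=30, seed=0):
    rng = random.Random(seed)
    n = len(Q)
    x, fx = local_search([rng.randint(0, 1) for _ in range(n)], Q)
    for _ in range(n_iter):
        k = 1
        while k <= k_max:
            y = list(x)
            for i in rng.sample(range(n), k):   # shaking: random point in N_k(x)
                y[i] ^= 1
            y, fy = local_search(y, Q)
            if fy < fx - 1e-12:
                x, fx, k = y, fy, 1              # improvement: move, restart with N_1
            else:
                k += 1                           # otherwise try a larger neighborhood
    return x, fx

Q = make_qubo(12, seed=7)
x, fx = vns(Q)
```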


Simulated Annealing 


Simulated annealing (SA) extends basic local search by 
allowing moves to inferior solutions [26,64]. A basic 
SA algorithm may be described as follows: Successively, 
a candidate move is randomly selected; this move is ac- 
cepted if it leads to a solution with an improved objec- 
tive function value compared to the current solution, 
otherwise, the move is accepted with a probability depending on the deterioration $\Delta$ of the objective function value. The acceptance probability is computed as $e^{-\Delta/T}$, using a temperature $T$ as a control parameter.
Usually, T is reduced over time for diversification at an 
earlier stage of the search and to intensify later. 
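A minimal SA sketch on a toy binary problem, using the acceptance probability $e^{-\Delta/T}$ described above. The geometric cooling schedule (T multiplied by `cooling` after a fixed number of trials), the starting temperature, the objective and the names are assumptions, not part of the method's definition.

```python
import math
import random

def make_qubo(n, seed):
    """Hypothetical test objective: random coefficients of f(x) = sum Q[i][j] x_i x_j."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def f(x, Q):
    n = len(x)
    return sum(Q[i][j] for i in range(n) if x[i] for j in range(n) if x[j])

def simulated_annealing(Q, t0=2.0, cooling=0.95, steps_per_t=50, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x, Q)
    best, best_f = list(x), fx
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            i = rng.randrange(n)          # randomly selected candidate move: flip bit i
            x[i] ^= 1
            fy = f(x, Q)
            delta = fy - fx
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                fx = fy                   # accept the move
                if fx < best_f:
                    best, best_f = list(x), fx
            else:
                x[i] ^= 1                 # reject: undo the flip
        t *= cooling                      # high T diversifies, low T intensifies
    return best, best_f

Q = make_qubo(12, seed=3)
best, best_f = simulated_annealing(Q)
```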


Various authors have described robust implementations of this general SA approach [60,62]. An interesting variant of SA is to strategically reheat the process, i.e., to use a nonmonotonic acceptance function.

Threshold accepting [28] is a modification (or simplification) of SA that accepts every move leading to a new solution which is “not much worse” than the current one, i.e., whose deterioration does not exceed a certain threshold that is reduced along with the temperature.
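Threshold accepting replaces the random acceptance test of SA by a deterministic one. In this sketch the decreasing threshold sequence, the number of trials per threshold, the toy objective and the names are assumptions.

```python
import random

def make_qubo(n, seed):
    """Hypothetical test objective: random coefficients of f(x) = sum Q[i][j] x_i x_j."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def f(x, Q):
    n = len(x)
    return sum(Q[i][j] for i in range(n) if x[i] for j in range(n) if x[j])

def threshold_accepting(Q, thresholds=(1.0, 0.5, 0.25, 0.1, 0.0), steps=200, seed=0):
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x, Q)
    best, best_f = list(x), fx
    for th in thresholds:                 # decreasing threshold sequence
        for _ in range(steps):
            i = rng.randrange(n)
            x[i] ^= 1
            fy = f(x, Q)
            if fy - fx <= th:             # accept if the solution is "not much worse"
                fx = fy
                if fx < best_f:
                    best, best_f = list(x), fx
            else:
                x[i] ^= 1                 # reject: undo the flip
    return best, best_f

Q = make_qubo(12, seed=3)
best, best_f = threshold_accepting(Q)
```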


Tabu Search 


The basic paradigm of tabu search (TS) is to use infor- 
mation about the search history to guide local search 
approaches to overcome local optimality (see [43] for 
a survey on TS). In general, this is done by a dynamic 
transformation of the local neighborhood. Based on 
some sort of memory, certain moves may be forbidden; 
we say they are set tabu. As for SA, the search may lead 
to performing deteriorating moves when no improving 
moves exist or when all improving moves of the current 
neighborhood are set tabu. At each iteration a best ad- 
missible neighbor may be selected. A neighbor, or a cor- 
responding move, is called admissible if it is not tabu 
or if an aspiration criterion is fulfilled. An aspiration criterion is a rule to override a possibly unreasonable tabu status of a move. For example, a move
that leads to a neighbor with a better objective function 
value than encountered so far should be considered as 
admissible. 

We briefly describe some TS methods that differ es- 
pecially in the way in which tabu criteria are defined, 
taking into consideration the information about the 
search history (performed moves, traversed solutions). 

The most commonly used TS method is based on 
a recency-based memory that stores moves, or attributes 
characterizing respective moves, of the recent past 
(static TS). The basic idea of such approaches is to pro- 
hibit an appropriately defined inversion of performed 
moves for a given period. For example, one may store 
the solution attributes that have been created by a per- 
formed move in a tabu list. To obtain the current tabu 
status of a move to a neighbor, one may check whether 
(or how many of) the solution attributes that would be 
destroyed by this move are contained in the tabu list. 
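A sketch of static TS on a toy binary problem, assuming that a move flips one bit and that the flipped bit index is the attribute stored in the tabu memory, so flipping it back is forbidden for `tenure` iterations unless the aspiration criterion (a new best value) applies. The best admissible move is taken even if it worsens the objective. Objective, parameters and names are assumptions.

```python
import math
import random

def make_qubo(n, seed):
    """Hypothetical test objective: random coefficients of f(x) = sum Q[i][j] x_i x_j."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def f(x, Q):
    n = len(x)
    return sum(Q[i][j] for i in range(n) if x[i] for j in range(n) if x[j])

def tabu_search(Q, n_iter=100, tenure=3, seed=0):
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x, Q)
    best, best_f = list(x), fx
    tabu_until = [0] * n                  # iteration until which flipping bit i is tabu
    for it in range(1, n_iter + 1):
        move, move_f = None, math.inf
        for i in range(n):
            x[i] ^= 1
            fy = f(x, Q)
            x[i] ^= 1
            # admissible if not tabu, or if the aspiration criterion (new best) holds
            if (tabu_until[i] < it or fy < best_f) and fy < move_f:
                move, move_f = i, fy
        if move is None:                  # every move tabu and none aspirates
            continue
        x[move] ^= 1                      # best admissible move, even if worsening
        fx = move_f
        tabu_until[move] = it + tenure    # forbid reversing it for `tenure` iterations
        if fx < best_f:
            best, best_f = list(x), fx
    return best, best_f

Q = make_qubo(12, seed=3)
best, best_f = tabu_search(Q)
```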

Strict TS embodies the idea of preventing cycling to 
formerly traversed solutions. The goal is to provide necessity and sufficiency with respect to the idea of not
revisiting any solution. Accordingly, a move is classi- 
fied as tabu if and only if it leads to a neighbor that 
has already been visited during the previous search. 
There are two primary mechanisms to accomplish the 
tabu criterion: First, we may exploit logical interde- 
pendencies between the sequence of moves performed 
throughout the search process, as realized by, e.g., the 
reverse elimination method and the cancellation se- 
quence method [40,94]. Second, we may store infor- 
mation about all solutions visited so far. This may be 
carried out either exactly or, for reasons of efficiency, 
approximately (e. g., by using hash codes). 
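A sketch of strict TS that stores all visited solutions exactly, in a Python set of tuples (which Python hashes internally); a move is tabu if and only if it leads to a solution visited before. The toy objective and names are assumptions, and exact storage is only practical for small searches.

```python
import math
import random

def make_qubo(n, seed):
    """Hypothetical test objective: random coefficients of f(x) = sum Q[i][j] x_i x_j."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def f(x, Q):
    n = len(x)
    return sum(Q[i][j] for i in range(n) if x[i] for j in range(n) if x[j])

def strict_tabu_search(Q, n_iter=100, seed=0):
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x, Q)
    best, best_f = list(x), fx
    visited = {tuple(x)}                  # exact trajectory memory
    for _ in range(n_iter):
        move, move_f = None, math.inf
        for i in range(n):
            x[i] ^= 1
            if tuple(x) not in visited:   # tabu iff the neighbor was visited before
                fy = f(x, Q)
                if fy < move_f:
                    move, move_f = i, fy
            x[i] ^= 1
        if move is None:                  # all neighbors already visited
            break
        x[move] ^= 1
        fx = move_f
        visited.add(tuple(x))
        if fx < best_f:
            best, best_f = list(x), fx
    return best, best_f, len(visited)

Q = make_qubo(12, seed=3)
best, best_f, n_visited = strict_tabu_search(Q)
```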

Reactive TS aims at the automatic adaptation of the 
tabu list length of static TS [12]. The idea is to in- 
crease the tabu list length when the tabu memory in- 
dicates that the search is revisiting formerly traversed 
solutions. A possible specification can be described as 
follows: Starting with a tabu list length $l$ of 1, it is increased every time a solution has been repeated. If there
has been no repetition for some iterations, we decrease 
it appropriately. To accomplish the detection of a rep- 
etition of a solution, one may apply a trajectory-based 
memory using hash codes as for strict TS. 
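A simplified reactive TS sketch: the tabu list length starts at 1, grows by one whenever the trajectory memory detects a repeated solution, and shrinks by one after `decrease_after` iterations without repetition. The increments, the bounds, the toy objective and the names are assumptions, and the escape mechanism for diversification is omitted.

```python
import math
import random

def make_qubo(n, seed):
    """Hypothetical test objective: random coefficients of f(x) = sum Q[i][j] x_i x_j."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def f(x, Q):
    n = len(x)
    return sum(Q[i][j] for i in range(n) if x[i] for j in range(n) if x[j])

def reactive_tabu_search(Q, n_iter=200, decrease_after=20, seed=0):
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x, Q)
    best, best_f = list(x), fx
    length = 1                            # tabu list length l, adapted on the fly
    tabu_until = [0] * n
    seen = {tuple(x)}                     # trajectory memory (tuples are hashed)
    last_change = 0
    for it in range(1, n_iter + 1):
        move, move_f = None, math.inf
        for i in range(n):
            x[i] ^= 1
            fy = f(x, Q)
            x[i] ^= 1
            if (tabu_until[i] < it or fy < best_f) and fy < move_f:
                move, move_f = i, fy
        if move is None:
            continue
        x[move] ^= 1
        fx = move_f
        tabu_until[move] = it + length
        key = tuple(x)
        if key in seen:                   # repetition detected: lengthen the list
            length = min(length + 1, n - 1)
            last_change = it
        else:
            seen.add(key)
            if it - last_change > decrease_after and length > 1:
                length -= 1               # no repetition for a while: shorten it
                last_change = it
        if fx < best_f:
            best, best_f = list(x), fx
    return best, best_f, length

Q = make_qubo(12, seed=3)
best, best_f, final_length = reactive_tabu_search(Q)
```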

For reactive TS [12], it is appropriate to include 
means for diversifying moves whenever the tabu mem- 
ory indicates that we are trapped in a certain region 
of the search space. As a trigger mechanism one may 
use, e. g., the combination of at least two solutions each 
having been traversed three times. A very simple escape 
strategy is to perform a number of random moves (de- 
pending on the average of the number of iterations be- 
tween solution repetitions); more advanced strategies 
may take into account some long-term memory infor- 
mation (like the frequencies of appearance of specific 
solution attributes in the search history). 

Of course there is a great variety of additional ingredients that may make TS work successfully, e.g., restricting the number of neighbor solutions to be evaluated (using candidate list strategies).


Evolutionary Algorithms 


Evolutionary algorithms comprise a great variety of dif- 
ferent concepts and paradigms, including genetic algo- 
rithms (GAs) [45,56], evolutionary strategies [55,83], 
evolutionary programs [36], scatter search [38,41], and memetic algorithms [71]. For surveys and references on evolutionary algorithms see also [9,37,69,78].

GAs are a class of adaptive search procedures based 
on principles derived from the dynamics of natural 
population genetics. One of the most crucial ideas 
for a successful implementation of a GA is the rep- 
resentation of an underlying problem by a suitable 
scheme. A GA starts, e.g., with a randomly created 
initial population of artificial creatures (strings), a set 
of solutions. These strings in whole and in part are 
the base set for all subsequent populations. They are 
copied and information is exchanged between the 
strings in order to find new solutions of the underly- 
ing problem. The mechanisms of a simple GA essen- 
tially consist of copying strings and exchanging par- 
tial strings. A simple GA uses three operators which 
are named according to the corresponding biological 
mechanisms: reproduction, crossover, and mutation. 
Performing an operator may depend on a fitness func- 
tion or its value (fitness), respectively. As some sort 
of heuristic measure, this function defines a means 
of measurement for the profit or the quality of the 
coded solution for the underlying problem and often 
depends on the objective function of the given prob- 
lem. 
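A simple GA with the three operators named above can be sketched on bit strings. Roulette-wheel (fitness-proportionate) reproduction, one-point crossover, bit-flip mutation, all parameter values, and the OneMax fitness (number of ones) in the usage line are assumptions for illustration; the fitness must be non-negative for the roulette wheel, and the population size should be even.

```python
import random

def simple_ga(fitness, n_bits, pop_size=30, n_gen=60, p_cross=0.8, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(n_gen):
        fit = [fitness(s) for s in pop]
        # reproduction: fitness-proportionate copying of strings
        # (a tiny constant keeps the weights positive if all fitnesses are 0)
        parents = rng.choices(pop, weights=[v + 1e-9 for v in fit], k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < p_cross:           # crossover: exchange partial strings
                cut = rng.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += [a[:], b[:]]
        # mutation: occasional bit flips protect against premature loss of information
        pop = [[1 - g if rng.random() < p_mut else g for g in s] for s in nxt]
        best = max(pop + [best], key=fitness)    # remember the best string seen
    return best

best = simple_ga(sum, n_bits=20)   # OneMax: fitness = number of ones
```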

GAs are closely related to evolutionary strategies. 
Whereas the mutation operator in a GA serves to pro- 
tect the search from premature loss of information, 
evolution strategies may incorporate some sort of lo- 
cal search procedure (such as SD) with self-adapting 
parameters involved in the procedure. On a simpli- 
fied scale many algorithms may be coined evolutionary 
once they are reduced to the following frame [54]: 

1. Generate an initial population of individuals.

2. While no stopping condition is met, repeat steps 3 and 4:

3. Co-operation.

4. Self-adaptation.

Self-adaptation refers to the fact that individuals (solu- 
tions) evolve independently while co-operation refers 
to an information exchange among individuals. 

Scatter search ideas established a link between early 
ideas from various sides - evolutionary strategies, TS, 
and GAs. As an evolutionary approach, scatter search 
originated from strategies for creating composite deci- 
sion rules and surrogate constraints [38]. Scatter search 
is designed to operate on a set of points, called ref- 
erence points, that constitute good solutions obtained 
from previous solution efforts. The approach systematically generates linear combinations of the reference
points to create new points, each of which is mapped 
into an associated point that yields integer values for 
discrete variables. Scatter search contrasts with other 
evolutionary procedures, such as GAs, by providing 
unifying principles for joining solutions based on gen- 
eralized path constructions in Euclidean space and by 
utilizing strategic designs where other approaches re- 
sort to randomization. For a very comprehensive treat- 
ment of scatter search see [65]. 
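The combination step can be sketched for an integer problem: pairs of reference points are combined linearly, including extrapolations beyond either point, each result is mapped to a nearby integer point by rounding and clipping to bounds, and the reference set keeps the best points found. The weights, bounds, random initial pool, objective and names are assumptions; a full scatter search also includes diversification and improvement methods not shown here.

```python
import itertools
import random

def combine(a, b, weights=(-0.5, 0.5, 1.5)):
    """Linear combinations a + w (b - a) of two reference points (w < 0 or w > 1 extrapolate)."""
    return [[ai + w * (bi - ai) for ai, bi in zip(a, b)] for w in weights]

def to_integer(p, lo, hi):
    """Map a combined point to an integer point inside the bounds."""
    return tuple(min(max(round(v), lo), hi) for v in p)

def scatter_search(f, dim, lo, hi, ref_size=5, n_iter=10, seed=0):
    rng = random.Random(seed)
    pool = {tuple(rng.randint(lo, hi) for _ in range(dim)) for _ in range(30)}
    ref = sorted(pool, key=f)[:ref_size]          # reference set: best points so far
    for _ in range(n_iter):
        new = {to_integer(p, lo, hi)
               for a, b in itertools.combinations(ref, 2)
               for p in combine(a, b)}
        merged = sorted(set(ref) | new, key=f)[:ref_size]
        if merged == ref:                          # reference set unchanged: stop
            break
        ref = merged
    return ref

def sq_dist_to_3(x):
    return sum((v - 3) ** 2 for v in x)

ref = scatter_search(sq_dist_to_3, dim=4, lo=-10, hi=10)
```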


Swarm Intelligence 


Swarm intelligence is a relatively novel discipline inter- 
ested in the study of self-organizing processes in nature 
and human artifacts [15,63]. While researchers in ethology and animal behavior have proposed many models to explain various aspects of social insect behavior
such as self-organization and shape formation, algo- 
rithms inspired by these models have been proposed to 
solve optimization problems. Successful examples are 
the so-called ant system or ant colony paradigm, the bee 
system, and swarm robotics, where the focus is on ap- 
plying swarm intelligence techniques to the control of 
large groups of cooperating autonomous robots. 

The ant system is a dynamic optimization process 
reflecting the natural interaction between ants search- 
ing for food [23]. The ants’ ways are influenced by two 
different kinds of search criteria. The first one is the lo- 
cal visibility of food, i.e., the attractiveness of food in 
each ant’s neighborhood. Additionally, each ant’s way 
through its food space is affected by the other ants’ trails 
as indicators for possibly good directions. The inten- 
sity of trails itself is time-dependent: With time passing, 
parts of the trails are diminishing, while the intensity 
may increase by new and fresh trails. With the quan- 
tities of these trails changing dynamically, an autocat- 
alytic optimization process is started forcing the ants’ 
search into most promising regions. This process of in- 
teractive learning can easily be modeled for most kinds 
of optimization problems by using simultaneously and 
interactively processed search trajectories. 

A comprehensive treatment of the ant system 
paradigm can be found in [24]. To achieve enhanced 
performance of the ant system it is useful to hybridize 
it at least with a local search component. 


Miscellaneous 


Target analysis may be viewed as a general learning ap- 
proach. Given a problem we first explore a set of sample 
instances and an extensive effort is made to obtain a so- 
lution which is optimal or close to optimality. The best 
solutions obtained provide targets to be sought within 
the next part of the approach. For instance, a TS algorithm may re-solve the problems with the aim of finding the right choices that lead to the already known solution (or as close to it as possible). This may
give some information on how to choose parameters for 
other problem instances. 

A different method in this context is path relinking 
(PR), which provides a useful means of intensification 
and diversification. Here new solutions are generated 
by exploring search trajectories that combine elite solu- 
tions, i.e., solutions that have proven to be better than 
others throughout the search. For references on target 
analysis and PR see [43]. 
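The idea of generating new solutions on a trajectory between two elite solutions can be sketched as follows. This is a minimal illustration assuming 0/1-encoded solutions and a greedy rule that, at each step, adopts the single attribute of the guiding solution giving the best objective value; the toy objective is hypothetical, and PR as described in [43] admits many other designs.

```python
def path_relink(initial, guiding, f):
    """Walk from `initial` to `guiding` (0/1 lists), adopting one differing
    attribute per step (greedy choice by f, minimized). Returns the best
    solution met on the path together with its value."""
    current = list(initial)
    best, best_val = list(current), f(current)
    diff = [i for i, (a, b) in enumerate(zip(initial, guiding)) if a != b]
    while diff:
        candidates = []
        for i in diff:                       # every move that gets closer to guiding
            s = list(current)
            s[i] = guiding[i]
            candidates.append((f(s), i, s))
        val, i, current = min(candidates)    # ties broken by the index i
        diff.remove(i)
        if val < best_val:
            best, best_val = list(current), val
    return best, best_val


def f(s):
    """Hypothetical objective to minimize."""
    return (sum(s) - 2) ** 2 + s[0]


best, val = path_relink([0, 0, 0, 0], [1, 1, 1, 1], f)
```

Here both endpoints are poor (values 4 and 5), while an intermediate solution on the path, [0, 1, 1, 0], reaches value 0; this is the intensification effect PR aims at.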

Recalling local search based on data perturbation, the noising method may also be related to the following approach. Given an initial feasible solution, the method performs some data perturbation [87] in order to change the values taken by the objective function of the problem to be solved. On the perturbed data a local search may be performed (e. g., following an SD approach). The amount of data perturbation (the noise added) is successively reduced until it reaches zero. The noising method is applied, e.g., in [19] to the clique partitioning problem.
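A minimal sketch of the noising method follows. The test problem (number partitioning), the uniform noise, and the linear noise schedule are assumptions chosen for illustration; the perturbed data are only used to guide the steepest descent, while solutions are always judged on the original data.

```python
import random


def imbalance(x, data):
    """Number partitioning objective: |sum of set 1 - sum of set 0|."""
    return abs(sum(d if xi else -d for xi, d in zip(x, data)))


def steepest_descent(x, data):
    """SD: apply the best single-bit flip until no flip improves."""
    x = list(x)
    while True:
        best_i, best_val = None, imbalance(x, data)
        for i in range(len(x)):
            x[i] ^= 1
            val = imbalance(x, data)
            x[i] ^= 1
            if val < best_val:
                best_i, best_val = i, val
        if best_i is None:
            return x
        x[best_i] ^= 1


def noising_method(data, x0, r_max=2.0, rounds=20, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    best, best_val = list(x), imbalance(x, data)
    for k in range(rounds + 1):
        r = r_max * (1 - k / rounds)                    # noise shrinks linearly to 0
        noisy = [d + rng.uniform(-r, r) for d in data]  # perturbed data
        x = steepest_descent(x, noisy)                  # local search on perturbed data
        val = imbalance(x, data)                        # judged on the original data
        if val < best_val:
            best, best_val = list(x), val
    return best, best_val


data = [3, 1, 1, 2, 2, 1]
best, best_val = noising_method(data, [0] * len(data))
```

In the last round the noise is zero, so the method ends with a plain steepest descent on the unperturbed data.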

The key issue in designing parallel algorithms is 
to decompose the execution of the various ingredients 
of a procedure into processes executable by parallel 
processors. In contrast to ant systems or GAs, meta- 
heuristics like TS or SA, at first glance, have an in- 
trinsic sequential nature owing to the idea of perform- 
ing the neighborhood search from one solution to the 
next. However, some effort has been undertaken to de- 
fine templates for parallel local search [20,90,91,93]. 
A comprehensive treatment with successful applica- 
tions is provided in [6]. The discussion of parallel meta- 
heuristics has also led to interesting hybrids such as the 
combination of a population of individual processes, 
agents, in a cooperative and competitive nature (see, 
e. g., the discussion of memetic algorithms in [71]) with 
TS. 
Neural networks may be considered as metaheuristics, although we have not considered them here; see [85] for a comprehensive survey of these techniques for combinatorial optimization. Conversely, one may use metaheuristics to speed up the learning process of artificial neural networks; see [7] for a comprehensive consideration.

Furthermore, recent efforts on problems with mul- 
tiple objectives and corresponding metaheuristic ap- 
proaches can be found in [61]. See [82] for some ideas 
regarding GAs and fuzzy multiobjective optimization. 


General Frames 


An important avenue of metaheuristics research refers to general frames (to explain the behavior of and the relationship between various methods) as well as the development of software systems incorporating metaheuristics (possibly in combination with other methods). Among other aspects, this takes into consideration that in metaheuristics it has very often been appropriate to incorporate a certain means of diversification vs. intensification to lead the search into new regions of the search space. This requires a meaningful mechanism to detect situations when the search might be trapped in a certain area of the solution space. Therefore, within intelligent search the exploration of memory plays a most important role.


Adaptive Memory Programming 


Adaptive memory programming (AMP) denotes a general approach (or even a way of thinking) within heuristic search that focuses on exploiting a collection of memory components [42,89]. An AMP process iteratively constructs (new) solutions based on the exploitation of some memory, especially when combined with learning mechanisms supporting the collection and use of the memory. Based on the idea of initializing the memory and then iteratively generating new solutions (utilizing the given memory) while updating the memory based on the search, we may subsume several of the above-described metaheuristics as AMP approaches. This also includes exploiting provisional solutions that are improved by a local search approach.

The performance as well as the efficiency of 
a heuristic scheme strongly depends on its ability to 
use AMP techniques providing flexible and variable 


strategies for types of problems (or special instances 
of a given problem type) where standard methods fail. 
Such AMP techniques could be, e. g., dynamic handling 
of operational restrictions, dynamic move selection for- 
mulas, and flexible function evaluations. 

Consider, as an example, adaptive memory within 
TS concepts. Realizing AMP principles depends on 
which specific TS application is used. For example, the 
reverse elimination method observes logical interde- 
pendencies between moves and infers corresponding 
tabu restrictions, and therefore makes fuller use of AMP 
than simple static approaches do. 

To discuss the use of AMP in intelligent agent 
systems, one may use the simple model of ant sys- 
tems as an illustrative starting point. Ant systems are 
based on combining local search criteria with infor- 
mation derived from the trails. This follows the AMP 
requirement for using flexible (dynamic) move selec- 
tion rules (formulas). However, the basic ant system 
exhibits some structural inefficiencies when viewed 
from the perspective of general intelligent agent sys- 
tems, as no distinction is made between successful and 
less successful agents, no time-dependent distinction 
is made, and there is no explicit handling of restric- 
tions providing protection against cycling and duplica- 
tion. Furthermore, there are possible conflicts between 
the information held in the adaptive memory (diverging 
trails). 


A Pool Template 


In [48] a pool template (PT) is proposed, as shown in Fig. 2. The following notation is used. A pool of p ≥ 1 solutions is denoted by P. Its input and output transfer is managed by two functions which are called IF and OF, respectively. S is a set of solutions with cardinality s ≥ 1. A solution combination method (procedure SCM) constructs a solution from a given set S, and IM is an improvement method.

Depending on the method used, in step 1 a pool is either completely (or partially) built by a (randomized) diversification generator or filled with a single solution which has been provided, e. g., by a simple greedy approach. Note that a crucial parameter that deserves careful elaboration is the cardinality p of the pool. The main loop, executed until a termination criterion holds, consists of steps 2-5. Step 2 is the call of the output function which selects a set of solutions, S, from the pool. Depending on the kind of method represented in the PT, these solutions may be assembled (step 3) into a working solution S' which is the starting point for the improvement phase of step 4. The outcome of the improvement phase, S'', is then evaluated by means of the input function, which possibly feeds the new solution into the pool. Note that the post-optimization procedure in step 6 is optional. It may be a straightforward greedy improvement procedure if used for single-solution heuristics or a pool method on its own. As an example we quote a sequential pool method, the TS with PR in [11]. Here a PR phase is added after the pool has been initialized by a TS. A parallel pool method, on the other hand, uses a pool of solutions while it is constructed by the guiding process (e.g., a GA or scatter search).

1. Initialize P by an external procedure
WHILE termination = FALSE DO BEGIN
   2. S := OF(P)
   3. IF s > 1 THEN S' := SCM(S)
      ELSE S' := S
   4. S'' := IM(S')
   5. P := IF(S'')
END
6. Apply a post-optimizing procedure to P


Metaheuristics, Figure 2
Pool template

Several heuristic and metaheuristic paradigms, 
whether they are obviously pool-oriented or not, can be 
summarized under the common PT frame. We provide 
the following examples: 

a) Local search/SD: PT with p = s = 1.

b) SA: p = 2, s = 1, incorporating its probabilistic acceptance criterion in IM. (It should be noted that p = 2 and s = 1 seems unusual at first glance. For SA we always have a current solution in the pool for which one or more neighbors are evaluated, and possibly a neighbor is found which replaces the current solution. Furthermore, at all iterations throughout the search the best solution found so far is stored too (even if no real interaction between those two stored solutions takes place). The same is also valid for a simple TS. As for local search the current solution corresponds to the best solution of the specific search, we have p = 1.)

c) Standard TS: p = 2, s = 1, incorporating adaptive memory in IM.

d) GAs: p > 1 and s > 1 with population mechanism (crossover, reproduction, and mutation) in SCM of step 3 and without the use of step 4.

e) Scatter search: p > 1 and s > 1 with subset generation in OF of step 2, linear combination of elite solutions by means of SCM in step 3, e. g., a TS for procedure IM, and a reference set update method in IF of step 5.

f) PR (as a parallel pool method): p > 1 and s = 2 with a PR neighborhood in SCM. Optional use of step 4.
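The pool template of Fig. 2 can be expressed as a higher-order function whose arguments are the procedures OF, SCM, IM, and IF. In this sketch the Python names are hypothetical, the input function is assumed to receive the current pool together with S'', and termination is assumed to be a fixed iteration budget. It is instantiated as case a), local search/SD with p = s = 1, on a toy integer objective.

```python
def pool_template(init, output_fn, combine, improve, input_fn, n_iter, post=None):
    """Sketch of the pool template (PT); IF is assumed to see the pool and S''."""
    pool = init()                                    # step 1: initialize P
    for _ in range(n_iter):                          # termination: iteration budget (assumed)
        selected = output_fn(pool)                   # step 2: S := OF(P)
        if len(selected) > 1:
            working = combine(selected)              # step 3: S' := SCM(S)
        else:
            working = selected[0]                    #         S' := S (s = 1)
        improved = improve(working)                  # step 4: S'' := IM(S')
        pool = input_fn(pool, improved)              # step 5: P := IF(S'')
    return post(pool) if post is not None else pool  # step 6 (optional)


def f(x):
    """Toy objective to minimize over the integers (hypothetical)."""
    return (x - 7) ** 2


def sd_step(x):
    """IM for case a): one steepest-descent move within {x - 1, x, x + 1}."""
    return min([x - 1, x, x + 1], key=f)


# Case a) local search/SD: p = s = 1, so SCM is never called.
pool = pool_template(
    init=lambda: [0],
    output_fn=lambda P: list(P),
    combine=None,
    improve=sd_step,
    input_fn=lambda P, x: [x] if f(x) < f(P[0]) else P,
    n_iter=20,
)
```

The other cases of the list change only the arguments: for instance, a GA would pass an OF returning several parents, a crossover-and-mutation SCM, and an identity IM.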


Partial Optimization Metaheuristic 
Under Special Intensification Conditions 


A natural way to solve large optimization problems is to decompose them into independent subproblems that are solved with an appropriate procedure. However, such approaches may lead to solutions of moderate quality since the subproblems might have been created in a somewhat arbitrary fashion. Of course, it is not easy to find an appropriate way to decompose a problem a priori. The basic idea of POPMUSIC is to locally optimize subparts of a solution, a posteriori, once a solution to the problem is available. These local optimizations are repeated until a local optimum is found. Therefore, POPMUSIC may be viewed as a local search working with a special, large neighborhood. While the acronym POPMUSIC was given by Taillard and Voß [88], other metaheuristics may be incorporated into the same framework too [84].

For large optimization problems, it is often possible to see the solutions as composed of parts (or chunks [102], cf. the term “vocabulary building”). Considering the vehicle routing problem, a part may be a tour (or even a customer). Suppose that a solution can be represented as a set of parts. Moreover, some parts are more closely related to certain other parts, so a corresponding heuristic relatedness measure can be defined between two parts. The central idea of POPMUSIC is to select a so-called seed part and a set P of parts that are most closely related to the seed part to form a subproblem.

Then it is possible to state a local search optimization frame that consists of trying to improve all subproblems that can be defined, until the solution contains no subproblem that can be improved. In the POPMUSIC frame of [88], P corresponds precisely to seed parts that have been used to define subproblems that have been unsuccessfully optimized. Once P contains all the parts of the complete solution, all subproblems have been examined without success and the process stops.
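The stopping rule just described can be sketched as follows. The sketch assumes that a solution is a list of integer-valued parts, that relatedness is simply index distance, and that each subproblem of r parts is solved exhaustively; all of these, as well as the toy objective, are hypothetical choices. The set `unsuccessful` plays the role of the set of seed parts whose subproblems were optimized without success.

```python
import itertools


def popmusic(x, f, related, r, domain):
    """POPMUSIC sketch (minimization). x: list of parts; related(i): part indices
    ordered by relatedness to seed i; r: parts per subproblem; domain: values a
    part may take. Subproblems are solved exhaustively (an assumption)."""
    x = list(x)
    n = len(x)
    unsuccessful = set()          # seeds whose subproblem could not be improved
    seed = -1
    while len(unsuccessful) < n:  # stop once every part has failed as a seed
        seed = (seed + 1) % n
        if seed in unsuccessful:
            continue
        sub = related(seed)[:r]   # seed part plus its most related parts
        best_val, best_assign = f(x), None
        for assign in itertools.product(domain, repeat=len(sub)):
            y = list(x)
            for j, v in zip(sub, assign):
                y[j] = v
            val = f(y)
            if val < best_val:
                best_val, best_assign = val, assign
        if best_assign is None:
            unsuccessful.add(seed)
        else:
            for j, v in zip(sub, best_assign):
                x[j] = v
            unsuccessful.clear()  # solution changed: every seed is eligible again
    return x


t = [0, 3, 0, 3, 0, 3]            # hypothetical target values


def f(x):
    fit = sum((a - b) ** 2 for a, b in zip(x, t))
    smooth = sum((x[i] - x[i + 1]) ** 2 for i in range(len(x) - 1))
    return fit + smooth


def related(i):                   # hypothetical relatedness: index distance
    return sorted(range(len(t)), key=lambda j: abs(j - i))


x = popmusic([0] * len(t), f, related, r=3, domain=range(4))
```

Since every part has failed as a seed on the final solution, no change of a single part (nor of any r related parts) can improve it: the result is a local optimum with respect to this large neighborhood.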

Basically, the technique is a gradient method that 
starts from a given initial solution and stops in a local 
optimum relative to a large neighborhood structure. 
To summarize, both POPMUSIC as well as AMP may 
serve as a general frame encompassing various other 
approaches. 


Hybrids with Exact Methods 


Often a new idea or a new paradigm in metaheuristics is claimed by its inventor to be the idea, while others consider it useless at first. However, once it has been hybridized, things begin to fly. Especially in population-based metaheuristics, many researchers have followed this trend. That is, we now see many hybrid approaches where the successful ingredients of various metaheuristics have been combined. The term “hybridization”, however, goes further, as it also refers to combining metaheuristics with exact methods.

Traditionally, the structure of neighborhoods is de- 
termined by local transformations or moves. This usu- 
ally refers to relatively small homogeneous neighbor- 
hoods. Different types of moves have been used in the 
construction of very large and diverse neighborhoods. 
In contrast, as a hybrid one may deploy neighborhoods 
that are method-based. By this we mean that the basic 
structure of a neighborhood is determined by the needs 
and requirements of a given (say, exact) optimization 
method used to search the neighborhood. That is, given 
an incumbent solution one may define the neighbor- 
hood so that an exact method can be efficiently used 
rather than defining a neighborhood and trying to find 
an appropriate method to explore it. This approach was called the corridor method by Sniedovich and Voß [86], as it literally defines a neighborhood as a sufficiently sized corridor around a given solution so that a given exact method behaves well. Iteratively, the corridor is moved through the search space for exploration.
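A minimal corridor-method sketch for 0/1 vectors is given below. Here the corridor is assumed to be the set of solutions within Hamming distance `width` of the incumbent, and plain enumeration stands in for "a given exact method"; the knapsack data are hypothetical.

```python
from itertools import combinations


def corridor_search(x, f, width, max_moves=100):
    """Corridor-method sketch for 0/1 vectors (minimization)."""
    x = list(x)
    for _ in range(max_moves):
        best, best_val = None, f(x)
        for r in range(1, width + 1):            # exhaustive search of the corridor
            for idx in combinations(range(len(x)), r):
                y = [1 - v if i in idx else v for i, v in enumerate(x)]
                val = f(y)
                if val < best_val:
                    best, best_val = y, val
        if best is None:                         # no better solution inside the corridor
            return x
        x = best                                 # move the corridor
    return x


values, weights, capacity = [6, 5, 5], [4, 3, 3], 6   # toy 0/1 knapsack (hypothetical)


def f(y):
    """Negative packed value; infeasible packings get a large penalty."""
    if sum(w for w, yi in zip(weights, y) if yi) > capacity:
        return 10 ** 6
    return -sum(v for v, yi in zip(values, y) if yi)


narrow = corridor_search([1, 0, 0], f, width=1)
wide = corridor_search([1, 0, 0], f, width=3)
```

With width 1 the search stays at [1, 0, 0] (value 6), while the wider corridor reaches [0, 1, 1] (value 10). With only three items, width 3 already covers the whole space; in practice the width is chosen so that the exact method remains efficient.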


Constraint programming (CP) is a paradigm for rep- 
resenting and solving a wide variety of problems ex- 
pressed by means of variables, their domains, and con- 
straints on the variables. Usually CP models are solved 
using depth-first search and branch and bound. Nat- 
urally, these concepts can be complemented by local 
search concepts and metaheuristics. This idea is fol- 
lowed by several authors; see [21] for TS and guided 
local search hybrids. Commonalities with the POPMU- 
SIC approach can be deduced from [74]. 

Of course, the treatment of this topic is by no means 
complete and various ideas have been developed. One 
idea is to transform a greedy heuristic into a search al- 
gorithm by branching only in a few (i.e., limited num- 
ber) cases when the choice criterion of the heuristic ob- 
serves some borderline case or where the choice is least 
compelling, respectively. This approach may be called 
limited discrepancy search [17,53]. 
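The idea can be sketched as a recursive search that follows the greedy heuristic's first choice at each decision and may deviate from it (a "discrepancy") at most a given number of times. This is a sketch in the spirit of limited discrepancy search; the callbacks `options` and `cost` and the toy instance (a heuristic that always prefers 0 and a hypothetical target assignment) are assumptions, and the classical method additionally iterates over increasing discrepancy limits.

```python
def lds(options, cost, depth, max_discrepancies):
    """options(prefix) lists the alternatives at the next decision, best-first by
    the greedy heuristic; taking any but the first costs one discrepancy."""
    best = {"sol": None, "cost": float("inf")}

    def probe(prefix, budget):
        if len(prefix) == depth:
            c = cost(prefix)
            if c < best["cost"]:
                best["sol"], best["cost"] = list(prefix), c
            return
        for rank, choice in enumerate(options(prefix)):
            if rank > 0 and budget == 0:
                break                       # no discrepancies left: follow the heuristic
            probe(prefix + [choice], budget - (1 if rank > 0 else 0))

    probe([], max_discrepancies)
    return best["sol"], best["cost"]


target = [0, 1, 0]                          # hypothetical optimum the greedy rule misses


def options(prefix):                        # greedy heuristic always prefers 0
    return [0, 1]


def cost(sol):
    return sum(a != b for a, b in zip(sol, target))


greedy = lds(options, cost, depth=3, max_discrepancies=0)
one_discrepancy = lds(options, cost, depth=3, max_discrepancies=1)
```

With no discrepancies the search reproduces the greedy solution [0, 0, 0]; allowing a single discrepancy already finds [0, 1, 0].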

Independent from the CP concept, one may inves- 
tigate hybrids of branch and bound and metaheuristics, 
e.g., for deciding upon branching variables or search 
paths to be followed within a branch and bound tree 
(see [103] for an application of reactive TS). Here we 
may also use the term “cooperative solver.” Somewhat 
related is the local branching concept for solving mixed 
integer programs (MIP), which seeks to explore neigh- 
borhoods defined through (invalid) linear cuts. The 
neighborhoods are searched by means of a general pur- 
pose MIP solver [35]. 
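For a MIP with binary variables, the local branching neighborhood around an incumbent $\bar x$ is commonly written as follows; the support $\bar S$ and the radius $k$ are symbols introduced here for illustration:

$$ \Delta(x, \bar x) = \sum_{j \in \bar S} (1 - x_j) + \sum_{j \notin \bar S} x_j \le k, \qquad \bar S = \{ j : \bar x_j = 1 \}. $$

$\Delta(x, \bar x)$ counts the binary variables whose value differs from $\bar x$, so the cut restricts the general purpose MIP solver to a Hamming ball of radius $k$ around the incumbent. It is “invalid” in the sense that it may cut off optimal solutions of the original problem; the complementary branch $\Delta(x, \bar x) \ge k + 1$ keeps them available.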

Accordingly, one of the current research issues refers to exploiting mathematical programming (MP) techniques in a (meta)heuristic framework or, correspondingly, granting to MP approaches the cross-problem robustness and computation time effectiveness which characterize metaheuristics. The discriminating landmark is some form of exploitation of an MP formulation, e.g., by means of MIP. In this respect various efforts have been made towards developing strategies for making a heuristic sequence of roundings to obtain feasible solutions for problems represented by means of appropriate MIP [3,34].


Optimization Software Libraries 


Besides some well-known approaches for reusable software in the field of exact optimization (e.g., CPLEX or ABACUS; see http://www.ilog.com and http://www.informatik.uni-koeln.de/abacus), some ready-to-use and well-documented component libraries in the field of local search based heuristics and metaheuristics have been developed; see especially the contributions in [98].

The most successful approaches documented in 
the literature are the heuristic optimization frame- 
work HOTFRAME of [33] and EASYLOCAL++ of [22]. 
HOTFRAME, for example, is implemented in C++ and provides adaptable components incorporating different metaheuristics, together with an architectural description of the collaboration among these components and problem-specific complements. Typical application-specific concepts are treated as objects or classes:
problems, solutions, neighbors, solution attributes, and 
move attributes. On the other hand, metaheuristic con- 
cepts such as the different methods described above and 
their building blocks such as tabu criteria or diversifi- 
cation strategies are also treated as objects. HOTFRAME 
uses genericity as the primary mechanism to make these 
objects adaptable. That is, common behavior of meta- 
heuristics is factored out and grouped in generic classes, 
applying static type variation. Metaheuristics template 
classes are parameterized by aspects such as solution 
spaces and neighborhood structures. 


Applications 


Applications of metaheuristics are almost uncountable 
and appear in various journals (e. g., Journal of Heuris- 
tics), books, and technical reports every day. A helpful 
source for a subset of successful applications may be 
special issues of journals or compilations such as [25, 
77,79,97], just to mention some. 

Specialized conferences like the Metaheuristics In- 
ternational Conference are devoted to the topic [25, 
59,72,80,81,97] and even more general conferences re- 
veal that metaheuristics have become part of neces- 
sary prerequisites for successfully solving optimization 
problems [46]. Moreover, ready-to-use systems such as 
class libraries and frameworks have been developed, al- 
though they are usually restricted to application by the 
knowledgeable user. 

Specialized applications also reveal research needs, 
e. g., in dynamic environments. One example refers to 
the application of metaheuristics for online optimiza- 
tion [49]. 


Conclusions 


Over the last few decades metaheuristics have become a substantial part of the optimization stockroom, with various applications in science and, even more importantly, in practice. Metaheuristics have been considered in textbooks, e. g., in operations research, and a wealth of monographs [27,43,70,92] is available. Most important in our view are general frames. AMP, an intelligent interplay of intensification and diversification (such as ideas from POPMUSIC), and the connection to powerful exact algorithms as subroutines for manageable subproblems are avenues to be followed.

From a theoretical point of view, the use of most metaheuristics has not yet been fully justified. While convergence results regarding solution quality exist for most metaheuristics once appropriate probabilistic assumptions are made [8,31,50], these turn out not to be very helpful in practice, as usually a disproportionate computation time is required to achieve these results (usually convergence is achieved for a computation time tending to infinity, with a few exceptions, e.g., the reverse elimination method within TS or the pilot method, where optimality can be achieved with a finite, but exponential number of steps in the worst case). Furthermore, we have to admit that, theoretically, one may argue that none of the metaheuristics described is on average better than any other; there is no free lunch [101]. Basically, this leaves the choice of a best possible heuristic or related ingredients to the ingenuity of the user/researcher. Some researchers relate the term “hyperheuristics” to the question of which (heuristic) method among a given set of methods to choose for a given problem [16].

Moreover, despite the widespread success of vari- 
ous metaheuristics, researchers occasionally still have 
a poor understanding of many key theoretical aspects 
of these algorithms, including models of the high-level 
run-time dynamics and identification of search space 
features that influence problem difficulty [99]. More- 
over, fitness landscape evaluations are considered to be 
in their infancy too. 

From an empirical standpoint it would be most interesting to know which algorithms perform best under various criteria for different classes of problems. Unfortunately, this theme is out of reach as long as we do not have any well-accepted standards regarding the testing and comparison of different methods.

While most papers on metaheuristics claim to pro- 
vide “high-quality” results based on some sort of mea- 
sure, we still believe that there is a great deal of room 
for improvement in testing existing as well as new ap- 
proaches from an empirical point of view [10,57,67]. 
In a dynamic research process numerical results pro- 
vide the basis for systematically developing efficient al- 
gorithms. The essential conclusions of finished research 
and development processes should always be substan- 
tiated (i.e., empirically and, if necessary, statistically 
proven) by numerical results based on an appropriate 
empirical test cycle. Furthermore, even when excellent 
numerical results are obtained, it may still be possible to 
compare with a simple random restart procedure and 
obtain better results in some cases [47]. However, this 
comparison is usually neglected. 

Usually the ways of preparing, performing, and presenting experiments and their results are significantly different. The lack of a generally accepted standard for testing and reporting on the testing, or at least a corresponding guideline for designing experiments, unfortunately implies the following observation: Some results can be used only in a restricted way, e. g., because relevant data are missing, wrong environmental settings are used, or results are simply glossed over. In the worst case, insufficiently prepared experiments provide results that are unfit for further use, i.e., any generalized conclusion is out of reach. Future algorithm research needs to provide effective methods for analyzing the performance of, e. g., heuristics in a more scientifically founded way (see [4,100] for some steps in this direction).

A final aspect that deserves special consideration is the use of information within different metaheuristics. While the AMP frame provides a very good entry into this area, the topic still offers an interesting opportunity to link artificial intelligence with operations research concepts.
Nicholas Constantine Metropolis was born in Chicago on June 11, 1915 and died on October 17, 1999 in Los Alamos. At Los Alamos, Metropolis was the main driving force behind the development of the MANIAC series of electronic computers. He was the first to code a problem for the ENIAC in 1945-1946 (together with S. Frankel), a task which consumed approximately 1,000,000 IBM punched cards.

Metropolis received his PhD in physics from the 
University of Chicago in 1941. He went to Los Alamos 
in 1943 as a member of the initial staff of fifty scientists 
of the Manhattan Project. He spent his entire career 
at Los Alamos, except for two periods (1946-1948 and 
1957-1965), during which he was professor of Physics 
at the University of Chicago. 

Metropolis is best known for the development (jointly with S. Ulam and J. von Neumann) of the Monte-Carlo method. The Monte-Carlo method provides approximate solutions to a variety of mathematical problems by performing statistical sampling experiments on a computer. However, the real use of Monte-Carlo methods as a research tool stems from work on the atomic bomb during the Second World War. This work involved a direct simulation of the probabilistic problems concerned with random neutron diffusion in fissile material. Metropolis and his collaborators obtained Monte-Carlo estimates for the eigenvalues of the Schrödinger equation.

In 1953, Metropolis co-authored the first paper on the technique that came to be known as simulated annealing [3,8]. Simulated annealing is a method for solving optimization problems. The name of the algorithm derives from an analogy between the solution of optimization problems and the simulation of the annealing of solids. Annealing refers to a process of cooling material slowly until it reaches a stable state.

Metropolis also made several early contributions to 
the use of computers in the exploration of nonlinear 
dynamics. In the Sixties and Seventies he collaborated 
with G.-C. Rota and others on significance arithmetic. 
Another contribution of Metropolis to numerical anal- 
ysis is an early paper on the use of Chebyshev’s iterative 
method for solving large scale linear systems [1]. 
Minimax is a principle of optimal choice (of some parameters or functions). If applied, this principle requires finding extremal values of some max-type function. Since the operation of taking the pointwise maximum (of a finite or infinite number of functions) generates, in general, a nonsmooth function, it is important to study the properties of such a function. Fortunately, although a max-function is in general not differentiable, in many cases it is still directionally differentiable. Directional differentiability provides a tool for formulating necessary (and sometimes sufficient) conditions for a minimum or maximum and for constructing numerical algorithms.

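Recall that a function $f\colon \mathbb{R}^n \to \mathbb{R}$ is called Hadamard directionally differentiable (H.d.d.) at a point $x \in \mathbb{R}^n$ if for any $g \in \mathbb{R}^n$ there exists the finite limit

$$ f'_H(x, g) = \lim_{[\alpha, g'] \to [+0, g]} \frac{f(x + \alpha g') - f(x)}{\alpha}. $$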

A function $f\colon \mathbb{R}^n \to \mathbb{R}$ is called Dini directionally differentiable (D.d.d.) at a point $x \in \mathbb{R}^n$ if for any $g \in \mathbb{R}^n$ there exists the finite limit

$$ f'_D(x, g) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha g) - f(x)}{\alpha}. $$

If $f$ is H.d.d., then it is D.d.d. as well and $f'_H(x, g) = f'_D(x, g)$.
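Let $\Omega \subset \mathbb{R}^n$ be a convex compact set, $x \in \Omega$. The cone

$$ N_x(\Omega) = \{ v \in \mathbb{R}^n : (v, x) = p_\Omega(v) \} $$

is called normal to $\Omega$ at $x$. Here

$$ p_\Omega(v) = \max_{y \in \Omega} (v, y) $$

is the support function of $\Omega$.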


A max-function 


Let

$$ f(x) = \max_{y \in G} \varphi(x, y), \qquad (1) $$

where $\varphi\colon S \times G \to \mathbb{R}$ is continuous jointly in $x, y$ on $S \times G$ and continuously differentiable in $x$ there, $S \subset \mathbb{R}^n$ is an open set, and $G$ is a compact set of some space. Under the conditions stated, the function $f$ is continuous on $S$.


Proposition 1 The function $f$ is H.d.d. at any point $x \in S$ and

$$ f'_H(x, g) = \max_{y \in R(x)} (\varphi'_x(x, y), g) = \max_{v \in \partial f(x)} (v, g), \qquad (2) $$

where

$$ R(x) = \{ y \in G : f(x) = \varphi(x, y) \}, $$

$\varphi'_x(x, y)$ is the gradient of $\varphi$ with respect to $x$ for a fixed $y$, $(a, b)$ is the scalar product of vectors $a$ and $b$, and

$$ \partial f(x) = \operatorname{co} \{ \varphi'_x(x, y) : y \in R(x) \} \subset \mathbb{R}^n. $$

The set $\partial f(x)$ is called the subdifferential of $f$ at $x$. It is convex and compact. The mapping $\partial f$ is, in general, discontinuous.
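Formula (2) can be checked numerically for a finite set $G$. In this sketch $G = \{0, 1, 2\}$ and the three functions $\varphi(\cdot, y)$ with their gradients are hypothetical; at $x_0 = 0$ only the first two are active, their gradients are $\pm e_1$, so $f$ is not differentiable there and (2) gives $f'_H(x_0, g) = |g_1|$.

```python
def f_prime_formula(phis, grads, x, g, tol=1e-12):
    """Formula (2) for a finite G: max of (phi'_x(x, y), g) over the
    active set R(x) = {y : phi(x, y) = f(x)} (tol absorbs rounding)."""
    values = [phi(x) for phi in phis]
    fx = max(values)
    active = [i for i, v in enumerate(values) if fx - v <= tol]
    return max(sum(a * b for a, b in zip(grads[i](x), g)) for i in active)


# Hypothetical example: phi(x, y) for y = 0, 1, 2 and their gradients in x.
phis = [lambda x: x[0] + x[1] ** 2, lambda x: -x[0], lambda x: x[1] - 1.0]
grads = [lambda x: [1.0, 2.0 * x[1]], lambda x: [-1.0, 0.0], lambda x: [0.0, 1.0]]


def f(x):
    return max(phi(x) for phi in phis)


x0, g, alpha = [0.0, 0.0], [-0.4, 0.7], 1e-6
exact = f_prime_formula(phis, grads, x0, g)
numeric = (f([xi + alpha * gi for xi, gi in zip(x0, g)]) - f(x0)) / alpha
```

The difference quotient with a small step agrees with the value 0.4 given by (2), although neither active function alone has that directional derivative.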


Remark 2 It turns out that a convex function can also be represented in the form (1) with $\varphi$ being affine in $x$. For this special (convex) case the set $\partial f(x)$ is

$$ \partial f(x) = \{ v \in \mathbb{R}^n : f(z) - f(x) \ge (v, z - x) \ \forall z \in S \}. $$


The discovery of the directional differentiability of 
max-functions ([1,2,6]) and convex functions [10] was 
a breakthrough and led to the development of minimax 
theory and convex analysis ([4,9,10]). 
A Maximum Function
with Dependent Constraints


Let $X \subset \mathbb{R}^n$, $Y \subset \mathbb{R}^m$ be open sets and let

$$ f(x) = \max_{y \in a(x)} \varphi(x, y), \qquad (3) $$

where $a(x)$ is a multivalued mapping with compact images, and $\varphi\colon X \times Y \to \mathbb{R}$ is Hadamard differentiable as a function of two variables, i.e., there exists the limit

$$ \varphi'([x, y], [g, v]) = \lim_{[\alpha, g', v'] \to [+0, g, v]} \frac{1}{\alpha} \left[ \varphi(x + \alpha g', y + \alpha v') - \varphi(x, y) \right]. $$

Then $\varphi$ is continuous and $\varphi'$ is continuous as a function of the direction $[g, v]$.

The function $f$ is called a maximum function with dependent constraints. Such functions are of great importance and have been widely studied (see [3,5,7,8]). To illustrate the results, let us formulate one of them [5, Thm. I, 6.3].


Proposition 3 Let the mapping $a$ be closed and bounded, let its images be convex and compact, and let the support function $\psi(x, l) = \max_{v \in a(x)} (v, l)$ be uniformly differentiable with respect to the parameter $l$. Let, further, $x \in X$ and let $\varphi$ be concave in some convex neighborhood of the set $\{[x, y] : y \in R(x)\}$ (where $R(x) = \{ y \in a(x) : \varphi(x, y) = f(x) \}$). Then $f$ (see (3)) is H.d.d. and

$$ f'(x, g) = \sup_{y \in R(x)} \min_{[l_1, l_2] \in V(x, y)} \left[ (l_1, g) + \psi'(x, l_2; g) \right], \qquad (4) $$

where

$$ V(x, y) = \{ l = [l_1, l_2] \in \overline{\partial} \varphi(x, y) : l_2 \in N_{x,y} \}, $$

$\overline{\partial} \varphi(x, y)$ is the superdifferential of $\varphi$ at the point $[x, y]$, and $N_{x,y}$ is the cone normal to $a(x)$ at $y$.


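Recall that if a function $F\colon \mathbb{R}^s \to \mathbb{R}$ is concave, $Z \subset \mathbb{R}^s$ is open, and $z \in Z$, then the set

$$ \overline{\partial} F(z) = \{ v \in \mathbb{R}^s : F(w) - F(z) \le (v, w - z) \ \forall w \in Z \} $$

is called the superdifferential of $F$ at $z \in Z$. It is convex and compact.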


A Maxmin Function 


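Let $\varphi(x, y, z)\colon S \times G_1 \times G_2 \to \mathbb{R}$ be continuous jointly in all variables, let $S \subset \mathbb{R}^n$ be an open set, and let $G_1 \subset \mathbb{R}^m$, $G_2 \subset \mathbb{R}^p$ be compact. Put

$$ f(x) = \max_{y \in G_1} \min_{z \in G_2} \varphi(x, y, z). \qquad (5) $$

The function $f$ is continuous on $S$. Let

$$ \Phi(x, y) = \min_{z \in G_2} \varphi(x, y, z), \qquad R(x) = \{ y \in G_1 : \Phi(x, y) = f(x) \}, $$
$$ Q(x, y) = \{ z \in G_2 : \varphi(x, y, z) = \Phi(x, y) \}. $$

Fix $x \in S$ and let $D_\varepsilon(x)$ ($\varepsilon > 0$) be an $\varepsilon$-neighborhood of the set $\{x\} \times R(x) \times \bigcup_{y \in R(x)} Q(x, y)$. Assume that the derivatives

$$ \frac{\partial \varphi}{\partial x}, \quad \frac{\partial \varphi}{\partial y}, \quad \frac{\partial^2 \varphi}{\partial x^2}, \quad \frac{\partial^2 \varphi}{\partial x \, \partial y}, \quad \frac{\partial^2 \varphi}{\partial y^2} $$

exist and are continuous jointly in all variables on $D_\varepsilon(x)$, and that

$$ \left( \frac{\partial^2 \varphi(x, y, z)}{\partial y^2} v, v \right) \le 0 \qquad \forall [x, y, z] \in D_\varepsilon(x), \ v \in \mathbb{R}^m. $$

Assume also that $G_1$ is convex. Let $y \in G_1$. Put

$$ \gamma(y) = \{ v = \lambda (y' - y) : \lambda > 0, \ y' \in G_1 \}, \qquad \Gamma(y) = \operatorname{cl} \gamma(y). $$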
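Proposition 4 [3, Thm. 5.2] Under the above assumptions the function $f$ (see (5)) is Hadamard directionally differentiable and

$$ f'_H(x, g) = \sup_{y \in R(x)} \sup_{v \in \Gamma(y)} \min_{z \in Q(x, y)} \left[ \left( \frac{\partial \varphi(x, y, z)}{\partial x}, g \right) + \left( \frac{\partial \varphi(x, y, z)}{\partial y}, v \right) \right]. $$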


Remark 5 More sophisticated results on the directional 
differentiability of max- and maxmin functions can be 
found, e. g., in [8]. 


Higher-Order Directional Derivatives 


The results above are related to first-order directional derivatives. Using these derivatives, it is possible to construct the following first-order expansion:

$$ f(x + \alpha g) = f(x) + \alpha f'(x, g) + o_{x,g}(\alpha), \qquad (6) $$

where $f'$ is either $f'_H$ or $f'_D$.

In some cases it is possible to get ‘higher-order’ ex- 
pansions. 

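Let

$$ f(x) = \max_{i \in I} f_i(x), \qquad (7) $$

where $I = 1:N$, $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, and the $f_i$'s are continuous and continuously differentiable up to the $l$th order on an open set $S \subset \mathbb{R}^n$. Fix $x \in S$. Then for sufficiently small $\alpha > 0$

$$ f_i(x + \alpha g) = f_i(x) + \sum_{k=1}^{l} \frac{\alpha^k}{k!} f_i^{(k)}(x, g) + o_i(g, \alpha^l), \qquad (8) $$

where

$$ f_i^{(k)}(x, g) = \sum_{j_1, \dots, j_k \in 1:n} \frac{\partial^k f_i(x)}{\partial x_{j_1} \cdots \partial x_{j_k}} \, g_{j_1} \cdots g_{j_k}, \qquad \frac{o_i(g, \alpha^l)}{\alpha^l} \xrightarrow[\alpha \downarrow 0]{} 0 \qquad (9) $$

uniformly with respect to $g$, $\|g\| = 1$.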
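Let us use the following notation:

$$ f_i^{(0)}(x, g) = f_i(x) \quad \forall i \in I, \qquad R_0(x, g) = I, $$
$$ R_k(x, g) = \Big\{ i \in R_{k-1}(x, g) : f_i^{(k-1)}(x, g) = \max_{j \in R_{k-1}(x, g)} f_j^{(k-1)}(x, g) \Big\}, \qquad k = 1, 2, \dots $$

Clearly

$$ R_0(x, g) \supset R_1(x, g) \supset R_2(x, g) \supset \cdots. $$

Note that $R_0(x, g)$ does not depend on $x$ and $g$, and $R_1(x, g)$ does not depend on $g$.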


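Proposition 6 [3, Thm. 9.1] The following expansion holds:

$$ f(x + \alpha g) = f(x) + \sum_{k=1}^{l} \frac{\alpha^k}{k!} f^{(k)}(x, g) + o(g, \alpha^l) \qquad \forall g \in \mathbb{R}^n, \qquad (10) $$

where

$$ f^{(k)}(x, g) = \max_{i \in R_k(x, g)} f_i^{(k)}(x, g), \qquad \frac{o(g, \alpha^l)}{\alpha^l} \xrightarrow[\alpha \downarrow 0]{} 0 \qquad (11) $$

uniformly with respect to $g$, $\|g\| = 1$.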


The value $d^k f(x)/dg^k = f^{(k)}(x, g)$ is called the $k$th derivative of $f$ at $x$ in a direction $g$.
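The nested sets $R_k$ can be traced on a one-variable example with two hypothetical functions, $f_1(x) = x$ and $f_2(x) = x + x^2$, at $x = 0$. Both are active and their first-order terms tie, so only $R_2$ decides which second-order term enters $f^{(2)}$. In this sketch each function is described by a callback returning $f_i^{(k)}(x, g)$ for $k = 0, 1, 2$ (an assumed interface), and a small tolerance stands in for exact equality.

```python
import math


def max_fn_derivs(derivs, x, g, l, tol=1e-12):
    """derivs[i](k, x, g) returns f_i^{(k)}(x, g), with k = 0 giving f_i(x).
    Returns [f^{(1)}(x, g), ..., f^{(l)}(x, g)] built from the nested sets R_k."""
    R = list(range(len(derivs)))                          # R_0 = I
    out = []
    for k in range(1, l + 1):
        prev = [derivs[i](k - 1, x, g) for i in R]
        top = max(prev)
        R = [i for i, v in zip(R, prev) if top - v <= tol]  # R_k
        out.append(max(derivs[i](k, x, g) for i in R))     # f^{(k)}(x, g)
    return out


def d1(k, x, g):
    """f_1(x) = x: value, first and second directional derivative."""
    return (x, g, 0.0)[k]


def d2(k, x, g):
    """f_2(x) = x + x**2."""
    return (x + x * x, (1.0 + 2.0 * x) * g, 2.0 * g * g)[k]


x, g, alpha = 0.0, 1.5, 0.01
fk = max_fn_derivs([d1, d2], x, g, l=2)
taylor = max(d1(0, x, g), d2(0, x, g)) + sum(
    alpha ** k / math.factorial(k) * fk[k - 1] for k in (1, 2)
)
actual = max(x + alpha * g, (x + alpha * g) + (x + alpha * g) ** 2)
```

Here $f^{(1)}(0, g) = g$ and $f^{(2)}(0, g) = 2g^2$, and expansion (10) with $l = 2$ reproduces $f(\alpha g)$.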


Remark 7 The mapping $R_1(x, g)$ is not continuous in $x$, while the mappings $R_k(x, g)$ ($k \ge 2$) are not continuous in $x$ as well as in $g$. Therefore the functions $f^{(k)}(x, g)$ in (11) are not continuous in $x$ and (if $k \ge 2$) in $g$ and, as a result, expansion (6) is also not ‘stable’ in $x$.


To overcome this difficulty we shall employ another 
tool. 


Hypodifferentiability of a Max Function 


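Let us again consider the case where f is defined by (7). It follows from (8) that, for Δ = (Δ_1, ..., Δ_n) ∈ R^n,

$$f(x+\Delta) = \max_{i\in I}\Big[ f_i(x) + \sum_{k=1}^{l} \frac{1}{k!} \sum_{j_1,\dots,j_k=1}^{n} \frac{\partial^k f_i(x)}{\partial x_{j_1}\cdots\partial x_{j_k}}\, \Delta_{j_1}\cdots\Delta_{j_k} \Big] + o(\|\Delta\|^l). \tag{12}$$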
Let us use the notation (see (9))

$$f_{ik}(x,\Delta) = \frac{1}{k!}\, f_i^{(k)}(x,\Delta) = A_{ik}\Delta^k.$$

The function f_{ik}(x, Δ) is a kth order form in the coordinates Δ_1, ..., Δ_n, A_{ik} being the set of coefficients of this form. Then (12) can be rewritten as


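$$f(x+\Delta) = \max_{i\in I}\Big[ f_i(x) + \sum_{k=1}^{l} A_{ik}\Delta^k \Big] + o(\|\Delta\|^l) = f(x) + \max_{A\in d^l f(x)} \Big[ A_0 + \sum_{k=1}^{l} A_k\Delta^k \Big] + o(\|\Delta\|^l), \tag{13}$$

where

$$d^l f(x) = \operatorname{co}\big\{ A_i = (A_{i0}, A_{i1}, \dots, A_{il}) : i \in I \big\}, \qquad A_{i0} = f_i(x) - f(x),$$

$$A = (A_0, \dots, A_l), \quad A_0 \in \mathbb{R}, \quad A_1 \in \mathbb{R}^n, \quad A_k,\ A_{ik} \in \mathbb{R}^{\overbrace{n\times\cdots\times n}^{k\ \text{times}}}.$$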
Here, R^{n×⋯×n} (k times) is the space of kth order real forms; e.g., R^{n×n} is the space of real (n × n)-matrices. The set d^l f(x) is called the lth order hypodifferential of f at x. It is a convex compact subset of the space

$$\mathbb{R}\times\mathbb{R}^n\times\cdots\times\mathbb{R}^{\overbrace{n\times\cdots\times n}^{l\ \text{times}}}.$$

The mapping d^l f is continuous in x.
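For n = 1 the forms A_k Δ^k reduce to scalars times powers of Δ, which makes the hypodifferential easy to tabulate. The sketch below is a hypothetical illustration with f = max{sin, cos}, x = 0.7 and l = 2; it stores d^l f(x) through its generators A_i rather than the whole convex hull, relying on the fact that the bracket in (13) is linear in A, so its maximum over the hull is attained at a generator.

```python
from math import cos, factorial, sin

def hypodifferential_vertices(values, derivs):
    """Generators A_i = (A_i0, A_i1, ..., A_il) of d^l f(x) for n = 1.
    values[i] = f_i(x); derivs[i][k - 1] = kth derivative of f_i at x."""
    f_x = max(values)
    return [[v - f_x] + [d / factorial(k) for k, d in enumerate(ds, start=1)]
            for v, ds in zip(values, derivs)]

def hypodifferential_model(vertices, f_x, delta):
    """f(x) + max over A in d^l f(x) of [A_0 + sum_k A_k * delta^k], cf. (13).
    The bracket is linear in A, so the max over the convex hull
    is attained at one of the generators A_i."""
    return f_x + max(sum(a * delta ** k for k, a in enumerate(A)) for A in vertices)

x = 0.7
vals = [sin(x), cos(x)]
ders = [[cos(x), -sin(x)], [-sin(x), -cos(x)]]     # first and second derivatives (l = 2)
verts = hypodifferential_vertices(vals, ders)
approx = hypodifferential_model(verts, max(vals), 1e-2)
exact = max(sin(x + 1e-2), cos(x + 1e-2))
```

The generator of the active function (here cos) has A_i0 = 0, and the model differs from f(x + Δ) only by a term of order Δ³. Unlike the index sets R_k, the generators depend continuously on x.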


Remark 8 Expansion (13) can be extended to the case where f is given by (1) and φ is l times continuously differentiable in x.


Max functions represent a special case of the class of 
quasidifferentiable functions (see [5]). 


See also 


> Bilevel Linear Programming: Complexity, 
Equivalence to Minmax, Concave Programs 

> Bilevel Optimization: Feasibility Test and Flexibility 
Index 

> Minimax Theorems 

> Nondifferentiable Optimization: Minimax Problems 

> Stochastic Programming: Minimax Approach 

> Stochastic Quasigradient Methods in Minimax 
Problems 
Minimax Game Tree Searching


With the introduction of computers came the interest in having machines play games. Programming a computer so that it could play, for example, chess was seen as giving it some kind of intelligence. Starting in the mid-fifties, a theory of how to play two-player zero-sum perfect-information games, like chess or go, was developed. This theory is essentially based on traversing a tree called the minimax or game tree. An edge in the tree represents a move by either of the players, and a node a configuration of the game.

Two major algorithms have emerged to compute the best sequence of moves in such a minimax tree. On the one hand, there is the alpha-beta algorithm, suggested around 1956 by J. McCarthy and first published in [27]. On the other hand, G.C. Stockman [29] introduced the SSS* algorithm. Both methods try to minimize the number of nodes explored in the game tree using special traversal strategies and cut conditions.


Minimax Trees 


A two-player zero-sum perfect-information game, also called a minimax game, is a game which involves exactly two players who alternately make moves. No information is hidden from the adversary. No coins are tossed, that is, the game is completely deterministic, and there is perfect symmetry in the quality of the moves allowed. Go, checkers and chess are such minimax games, whereas backgammon (the outcome of a die determines the moves available) or card games (cards are hidden from the adversary) are not.

A minimax tree or game tree is a tree where each node represents a state of the game and each edge a possible move. Nodes are alternately labeled ‘max’ and ‘min’, representing either player’s turn. A node having no descendants represents a final outcome of the game. The goal of a game is to find a winning sequence of moves, given that the opponent always plays his best move.

The quality of a node t in the minimax game tree, representing a configuration, is given by its value e(t). The value e(t), also called the minimax value, is defined recursively as

$$e(t) = \begin{cases} f(t) & \text{if } t \text{ is a leaf node},\\ \max_{s\in \mathrm{sons}(t)} e(s) & \text{if } t \text{ is labeled `max'},\\ \min_{s\in \mathrm{sons}(t)} e(s) & \text{if } t \text{ is labeled `min'}. \end{cases}$$


If the considered minimax tree represents a complete game, that is, all possible board configurations, the function f may be defined as follows:

$$f(t) = \begin{cases} +1 & \text{if } t \text{ leads to a winning position},\\ 0 & \text{if } t \text{ leads to a tie position},\\ -1 & \text{if } t \text{ leads to a losing position}; \end{cases}$$

otherwise f(t) represents an evaluation of the quality of a board position.

The relation between minimax trees and games is detailed in the following table.

Minimax tree notion            | Minimax game notion
-------------------------------+----------------------------------------------
Minimax tree                   | All board configurations
Node in the tree               | Board configuration
Edge from ‘max’ to ‘min’ node  | Move by player ‘max’
Edge from ‘min’ to ‘max’ node  | Move by player ‘min’
Node value                     | Quality of board position
Leaf node                      | Outcome of a game
Solution path                  | Sequence of moves leading to the best outcome


Sequential Minimax Game Tree Algorithms 


Let t be a node of a minimax tree. Then the function first_son(t) returns the first son node s_1 of t, and next_son(s_i, t) returns the (i + 1)th son of node t. The function no_more_sons(s, t) returns true if s is the last son of t; otherwise it returns false. The ordering of the sons introduced by these functions is arbitrary; in practice it is given by some heuristic function. The function father(t) returns the father node of t, is_leave(t) returns whether or not t is a leaf node, and node_type(t) returns the type of node t.
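The helper functions above can be realized, for instance, on an explicit tree with father pointers. In the Python sketch below the `Node` class and `build()` are hypothetical scaffolding; the helper names follow the text, and sons keep the order in which they are given rather than a heuristic order. `minimax()` evaluates the value e(t) with these helpers in the depth-first, left-to-right order of the minimax algorithm described next, so every node is visited exactly once.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)        # identity comparison, so equal-valued siblings stay distinct
class Node:
    kind: str                                   # 'max' or 'min'
    value: Optional[float] = None               # f(t), used only at leaves
    sons: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

def first_son(t): return t.sons[0]
def next_son(s, t): return t.sons[t.sons.index(s) + 1]
def no_more_sons(s, t): return t.sons[-1] is s
def father(t): return t.parent
def is_leave(t): return not t.sons
def node_type(t): return t.kind

def build(spec, kind="max", parent=None):
    """Build a tree from nested lists; levels alternate between 'max' and 'min'."""
    t = Node(kind, parent=parent)
    if isinstance(spec, list):
        son_kind = "min" if kind == "max" else "max"
        t.sons = [build(s, son_kind, t) for s in spec]
    else:
        t.value = spec
    return t

def minimax(t):
    """Minimax value e(t), written only with the helper functions above."""
    if is_leave(t):
        return t.value
    s = first_son(t)
    best = minimax(s)
    while not no_more_sons(s, t):
        s = next_son(s, t)
        v = minimax(s)
        best = max(best, v) if node_type(t) == "max" else min(best, v)
    return best

root = build([[3, 12, 8], [2, 4, 6], [14, 5, 2]])
```

Here `minimax(root)` yields 3, the maximum of the three ‘min’ values 3, 2 and 2.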


Minimax Algorithm 


The most basic minimax algorithm is simply called the minimax algorithm. It systematically traverses the complete minimax tree in a depth-first, left-to-right fashion. All nodes are visited exactly once.


Alpha-Beta Algorithm 


The first nontrivial algorithm introduced to compute the minimax value of a game tree was the alpha-beta algorithm. According to D. Knuth and R. Moore, McCarthy’s comments at the Dartmouth summer research conference on artificial intelligence led to the use of alpha-beta pruning in game-playing programs since the late 1950s. The first published discussion of an algorithm for minimax tree pruning appeared in 1958 (see [11, p. 56]). Two early extensive studies of the algorithm may be found in [18] and [27].

The idea behind the alpha-beta algorithm is to traverse the minimax tree in a depth-first, left-to-right fashion while pruning subtrees that cannot influence the minimax value of the tree. The conditions used to prune subtrees are called cut conditions. The idea behind the suggested cut conditions is to associate with each node a lower and an upper bound, called the α and β bounds. The bounds of a node are passed to its sons and tightened during the execution of the algorithm. It is easy to see that if the lower bound of a node t of type ‘max’ reaches or exceeds its upper bound, then all not yet visited sons of node t can be pruned, and similarly for nodes of type ‘min’.


FUNCTION AlphaBeta(n, α, β) IS
BEGIN
  IF is_leave(n) THEN RETURN f(n)
  s ← first_son(n)
  IF node_type(n) = max THEN
    LOOP
      α ← max{α, AlphaBeta(s, α, β)}
      IF α ≥ β THEN RETURN β
      EXIT LOOP WHEN no_more_sons(s, n)
      s ← next_son(s, n)
    END LOOP
    RETURN α
  ELSE
    LOOP
      β ← min{β, AlphaBeta(s, α, β)}
      IF β ≤ α THEN RETURN α
      EXIT LOOP WHEN no_more_sons(s, n)
      s ← next_son(s, n)
    END LOOP
    RETURN β
  END IF
END AlphaBeta


Pseudocode for the alpha-beta algorithm 


The above pseudocode describes the alpha-beta algorithm; it has been proved in [18] that the alpha-beta algorithm correctly calculates the minimax value of a tree.

The minimax value of a tree T is computed as follows:

e(root(T)) ← AlphaBeta(root(T), −∞, +∞).
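A direct Python transcription of the pseudocode, as a sketch: trees are nested lists whose leaves are the values f(n), the root is assumed to be a ‘max’ node, and the optional `counter` (an addition for illustration) records how many leaves are evaluated.

```python
import math

def alpha_beta(node, alpha, beta, maximizing=True, counter=None):
    """Fail-hard alpha-beta on a nested-list tree, following the pseudocode.
    Leaves are numbers f(n); inner nodes are lists of sons, left to right.
    If given, counter[0] is incremented once per leaf evaluation."""
    if not isinstance(node, list):
        if counter is not None:
            counter[0] += 1
        return node
    if maximizing:
        for son in node:
            alpha = max(alpha, alpha_beta(son, alpha, beta, False, counter))
            if alpha >= beta:          # cut condition: remaining sons are pruned
                return beta
        return alpha
    for son in node:
        beta = min(beta, alpha_beta(son, alpha, beta, True, counter))
        if beta <= alpha:              # cut condition for a 'min' node
            return alpha
    return beta

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
leaves = [0]
value = alpha_beta(tree, -math.inf, math.inf, True, leaves)
```

On this 3 × 3 tree the call returns 3 after evaluating 7 of the 9 leaves: once the first ‘min’ son has produced α = 3, the leaf 2 in the middle subtree already satisfies the cut condition, so the leaves 4 and 6 are skipped.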


Optimal State Space Search Algorithm SSS*


The SSS* algorithm was introduced by Stockman in 1979 [29]. It originates not in game playing but in systematic pattern recognition. The algorithm was first analyzed and criticized in [26].

The idea behind the SSS* algorithm is to use a tree traversal strategy that is better than the depth-first, left-to-right strategy of the alpha-beta algorithm. The criterion used to order the nodes yet to be visited is an upper bound of their value. Nodes are stored in nonincreasing order of their upper bound in a list called ‘open’.

The SSS* algorithm first traverses the minimax tree from top to bottom. Nodes whose sons have not yet been visited and which cannot yet be pruned are marked ‘live’. Nodes marked ‘solved’ have already been visited once and therefore have their best upper bound associated with them.

The operation purge(t, open) removes from the open list all nodes of which node t is an ancestor. Because the nodes in the open list are sorted in nonincreasing order of their associated upper bound, this pruning operation only eliminates nodes that need no further consideration.

The SSS* algorithm is described by the following pseudocode.


FUNCTION SSS* IS
BEGIN
  open ← ∅
  insert(root, live, +∞, open)
  LOOP
    (s, t, m) ← remove(open)
    IF s = root AND t = solved THEN RETURN m
    (Apply the Γ operator to node s)
  END LOOP
END SSS*


Pseudocode for the SSS* algorithm
The operator Γ(s) is applied to each node s extracted from the ‘open’ list; remove(open) always extracts the entry with the highest upper bound m.

It is possible to define a dual version of SSS*, which may be called SSS*-dual, in which the computation of upper bounds is replaced by the computation of lower bounds. The SSS*-dual algorithm has been suggested in [21].

Stockman has shown that if the SSS* algorithm explores a node, then this node is also explored by the alpha-beta algorithm. In fact, the alpha-beta algorithm loses efficiency (in the number of nodes visited) against the SSS* algorithm when the value of the minimax tree is found towards the right of the tree. If the SSS* algorithm is applied to win-lose trees, then it visits exactly the same nodes in the same order as the alpha-beta algorithm would.


(Apply the Γ operator to node s) =
  IF t = live AND node_type(s) = max AND NOT is_leave(s) THEN
    s' ← first_son(s)
    LOOP
      insert(s', live, m, open)
      EXIT LOOP WHEN no_more_sons(s', s)
      s' ← next_son(s', s)
    END LOOP
  END IF
  IF t = live AND node_type(s) = min AND NOT is_leave(s) THEN
    insert(first_son(s), live, m, open)
  END IF
  IF t = live AND is_leave(s) THEN
    insert(s, solved, min{f(s), m}, open)
  END IF
  IF t = solved AND node_type(s) = max AND NOT no_more_sons(s, father(s)) THEN
    insert(next_son(s, father(s)), live, m, open)
  END IF
  IF t = solved AND node_type(s) = max AND no_more_sons(s, father(s)) THEN
    insert(father(s), solved, m, open)
  END IF
  IF t = solved AND node_type(s) = min THEN
    insert(father(s), solved, m, open)
    purge(father(s), open)
  END IF
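The SSS* loop and the Γ operator can be sketched in Python as follows. Several details are assumptions not fixed by the text: the tree is a nested list with a ‘max’ root, nodes are identified by their paths (tuples of son indices) from the root, the open list is a binary heap, and ties between equal upper bounds are broken in favor of the leftmost node. `purge` simply filters the heap, which keeps the sketch short at the cost of efficiency.

```python
import heapq
import math

def sss_star(tree):
    """SSS* on a nested-list tree whose root is a MAX node.
    A node is identified by its path (tuple of son indices) from the root."""
    def subtree(path):
        node = tree
        for i in path:
            node = node[i]
        return node

    def is_leaf(path):
        return not isinstance(subtree(path), list)

    def is_max(path):
        return len(path) % 2 == 0            # levels alternate, starting with MAX

    open_list = []   # heap of (-m, path, status): largest m first, ties leftmost first

    def insert(path, status, m):
        heapq.heappush(open_list, (-m, path, status))

    def purge(ancestor):
        """Remove every entry lying strictly below `ancestor`."""
        k = len(ancestor)
        open_list[:] = [e for e in open_list
                        if not (len(e[1]) > k and e[1][:k] == ancestor)]
        heapq.heapify(open_list)

    insert((), "live", math.inf)
    while True:
        neg_m, s, t = heapq.heappop(open_list)
        m = -neg_m
        if s == () and t == "solved":
            return m
        # Gamma operator
        if t == "live" and is_leaf(s):
            insert(s, "solved", min(subtree(s), m))
        elif t == "live" and is_max(s):
            for j in range(len(subtree(s))):      # all sons of a MAX node
                insert(s + (j,), "live", m)
        elif t == "live":
            insert(s + (0,), "live", m)           # only the first son of a MIN node
        elif is_max(s):                           # solved MAX node; its father is MIN
            parent, j = s[:-1], s[-1]
            if j + 1 < len(subtree(parent)):
                insert(parent + (j + 1,), "live", m)
            else:
                insert(parent, "solved", m)
        else:                                     # solved MIN node; its father is MAX
            parent = s[:-1]
            purge(parent)
            insert(parent, "solved", m)
```

On the tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]] this sketch returns 3, the same value as alpha-beta; on [[[3, 5], [6, 9]], [[1, 2], [0, -1]]] it returns 5 without ever evaluating the leaf 9.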


SCOUT: Minimax Algorithm 
of Theoretical Interest 


In the previous sections, we have described the most common minimax algorithms. While trying to show the optimality of the alpha-beta algorithm, J. Pearl [23] introduced the SCOUT algorithm. His idea was to show that the SCOUT algorithm is dominated by the alpha-beta algorithm and to prove that SCOUT achieves an optimal performance. But counterexamples showed that the alpha-beta algorithm does not dominate the SCOUT algorithm, because the conservative testing approach of SCOUT may sometimes cut off nodes that would have been explored by the alpha-beta algorithm.

The SCOUT algorithm itself recursively computes the value of the first of its sons. Then it tests whether the value of the first son is better than the values of the other sons. In case of a negative result, the son that failed the test is completely evaluated by recursively calling SCOUT.
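A compact sketch of SCOUT in Python, following the description above: the first son is evaluated exactly, and every other son is only tested against the current value, with a full evaluation when the test fails. Trees are nested lists with a ‘max’ root; the boolean test procedure with a `strict` flag is one common formulation, assumed here rather than taken from this article.

```python
def scout_test(node, v, maximizing, strict):
    """Is value(node) > v (strict) or >= v (not strict)?  Only a bound is needed,
    so any()/all() stop at the first son that settles the answer."""
    if not isinstance(node, list):
        return node > v if strict else node >= v
    if maximizing:
        return any(scout_test(son, v, False, strict) for son in node)
    return all(scout_test(son, v, True, strict) for son in node)

def scout(node, maximizing=True):
    """SCOUT: evaluate the first son exactly, then only test the other sons."""
    if not isinstance(node, list):
        return node
    v = scout(node[0], not maximizing)
    for son in node[1:]:
        if maximizing and scout_test(son, v, False, strict=True):
            v = scout(son, False)        # son may exceed v: evaluate it completely
        elif not maximizing and not scout_test(son, v, True, strict=False):
            v = scout(son, True)         # value(son) < v at a MIN node: evaluate it
    return v
```

The tests touch only as many leaves as needed to decide a bound, which is where SCOUT can occasionally cut off nodes that alpha-beta would explore.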

Although the SCOUT algorithm is mostly of theoretical interest, there are some problem instances where it outperforms all other minimax algorithms. A last advantage of SCOUT over one of its major competitors, the SSS* algorithm, is that its storage requirements are similar to those of the alpha-beta algorithm.


GSEARCH: Generalized Game 
Tree Search Algorithm 


In 1986, T. Ibaraki [16] proposed a generalization of the previously known algorithms to compute the minimax value of a game tree. His idea was to use a branch-and-bound-like approach. Nodes of the considered tree which have not yet been evaluated are stored in a list which is ordered according to a given criterion. Different orderings give different traversal strategies. A lower and an upper bound are associated with each node. These bounds generalize the α and β values found in the alpha-beta algorithm.

Finally, Ibaraki showed how the algorithm GSEARCH is related to other minimax algorithms like alpha-beta or SSS*, and proved that his algorithm always surpasses the alpha-beta algorithm.
SSS-2: Recursive State Space Search Algorithm


The SSS-2 algorithm has been proposed by W. Pijls and A. de Bruin [24]. It is based on the idea of computing an upper bound for the root node and then repeatedly transforming this upper bound into a tighter one. They have shown that the SSS-2 algorithm expands exactly the same nodes as those to which the SSS* algorithm applies the Γ operator.


Some Variations On The Subject 


Computing the minimax value of a game tree may be seen as propagating the solution value from a leaf node through the whole tree up to the root node. While moving closer to the root node, more and more useless subtrees are eliminated, as we have already stated for the alpha-beta algorithm. The better the α and β bounds, the more subtrees may be pruned. If, for instance, one knows that the minimax value will, with high probability, be found in the interval ]a, b[, then it may be worth calling the alpha-beta algorithm as

e ← AlphaBeta(root(T), a, b).

If, indeed, the minimax value e(root(T)) belongs to ]a, b[, then the algorithm will correctly return that value. If the minimax value does not belong to ]a, b[, then the value returned will be either a or b, depending on whether the minimax value belongs to ]−∞, a] or [b, +∞[. We then say that the alpha-beta algorithm failed low, respectively high. In the case where the algorithm failed low, the call

e ← AlphaBeta(root(T), −∞, a + 1)

will return the correct value. But it would also be possible to reiterate this procedure on a narrower interval ]a', a + 1[, for some a' < a.

The technique of limiting the interval in which the solution may be found is called aspiration search. If the minimax value belongs to the specified interval, then many more cut conditions are satisfied, and the tree actually traversed is much smaller than the one traversed by the alpha-beta algorithm without initial alpha and beta bounds.
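Aspiration search is easy to express on top of a fail-hard alpha-beta. The Python sketch below assumes nested-list trees with a ‘max’ root and re-searches with the windows ]−∞, a + 1[ or ]b − 1, +∞[ after a low or high failure; since a failure means e(root(T)) ≤ a or e(root(T)) ≥ b, these windows are guaranteed to contain the value.

```python
import math

def alpha_beta(node, alpha, beta, maximizing=True):
    """Fail-hard alpha-beta on a nested-list tree (leaves are numbers)."""
    if not isinstance(node, list):
        return node
    for son in node:
        v = alpha_beta(son, alpha, beta, not maximizing)
        if maximizing:
            alpha = max(alpha, v)
        else:
            beta = min(beta, v)
        if alpha >= beta:                        # cut condition
            return beta if maximizing else alpha
    return alpha if maximizing else beta

def aspiration_search(tree, a, b):
    """Search with the window ]a, b[ first and re-search only if it fails."""
    v = alpha_beta(tree, a, b)
    if v <= a:                                   # failed low: e(root) lies in ]-inf, a]
        return alpha_beta(tree, -math.inf, a + 1)
    if v >= b:                                   # failed high: e(root) lies in [b, +inf[
        return alpha_beta(tree, b - 1, math.inf)
    return v
```

For the tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]], whose value is 3, the window ]4, 10[ fails low (the first call returns 4) and ]−5, 2[ fails high (it returns 2); in both cases the re-search recovers 3, while ]0, 10[ succeeds in a single pass.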

Furthermore, it is interesting to note that aspiration search is at the basis of a technique called iterative deepening, which is used in many game-playing programs.

I. Althöfer [5] suggested an incremental negamax algorithm which uses estimates of all nodes in the minimax tree, rather than only those of the leaf nodes, to determine the value of the root node. This algorithm is useful when dealing with erroneous leaf evaluation functions. Under the assumption of independently occurring and sufficiently small errors, the proposed algorithm is shown to have error probabilities that decrease exponentially with the depth of the tree.

R.L. Rivest [25] proposed an algorithm for searching minimax trees based on the idea of approximating the min and the max operators by generalized mean-value operators. The approximation is used to guide the selection of the next leaf node to expand, since it allows one to select efficiently the leaf node upon whose value the minimax value most highly depends. B.W. Ballard [6] proposed a similar algorithm with one additional type of node, which he calls chance nodes; the value of a chance node is a, possibly weighted, average of the values of its sons.
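Rivest’s idea rests on the generalized (power) mean M_p(a) = ((1/n) Σ a_i^p)^{1/p}, which tends to max a_i as p → +∞ and to min a_i as p → −∞. The Python sketch below only illustrates this operator for positive values, not Rivest’s leaf-selection procedure; the rescaling by the largest or smallest value is an implementation detail added here to avoid floating-point overflow.

```python
def generalized_mean(values, p):
    """Power mean M_p(a) = ((1/n) * sum a_i^p)^(1/p) of positive values, p != 0.
    M_p tends to max(a) as p -> +inf and to min(a) as p -> -inf.
    Values are rescaled by max (p > 0) or min (p < 0) so that every term
    (a_i / scale)^p is at most 1 and cannot overflow."""
    if p == 0 or min(values) <= 0:
        raise ValueError("need p != 0 and positive values")
    scale = max(values) if p > 0 else min(values)
    mean = sum((a / scale) ** p for a in values) / len(values)
    return scale * mean ** (1.0 / p)
```

For the values [1, 2, 3], p = 1 gives the arithmetic mean 2, while p = 1000 and p = −1000 give approximately 3 and 1. Unlike max and min, M_p is differentiable in each a_i, which is what lets the approximation measure how strongly the root value depends on each leaf.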

Conspiracy numbers have been introduced by D.A. McAllester in [22] as a measure of the accuracy of the minimax value of an incomplete tree. They measure the number of leaf nodes whose value must change in order to change the minimax value of the root node by a given amount.
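Conspiracy numbers can be computed bottom-up. The sketch below is a hedged illustration assuming a nested-list tree with a ‘max’ root: it counts the minimum number of leaves that must change for the value of a node to become at least v, or at most v. A leaf contributes 0 or 1. To raise a ‘max’ node, one son suffices (minimum over sons), while a ‘min’ node needs all its sons raised (sum); lowering works the other way round.

```python
def conspiracy_up(node, v, maximizing=True):
    """Minimum number of leaves that must change to make value(node) >= v."""
    if not isinstance(node, list):
        return 0 if node >= v else 1
    counts = [conspiracy_up(s, v, not maximizing) for s in node]
    return min(counts) if maximizing else sum(counts)

def conspiracy_down(node, v, maximizing=True):
    """Minimum number of leaves that must change to make value(node) <= v."""
    if not isinstance(node, list):
        return 0 if node <= v else 1
    counts = [conspiracy_down(s, v, not maximizing) for s in node]
    return sum(counts) if maximizing else min(counts)
```

For [[3, 12, 8], [2, 4, 6], [14, 5, 2]] (value 3), a single leaf (the 3, or the 2 in the last subtree) must change to lift the root to 5, whereas lifting it to 13 needs two leaves.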


Parallel Minimax Tree Algorithms 


Parallelizing the minimax algorithm is trivial over uniform trees. Even on irregular trees, the parallelization remains easy. The only additional problem arises from the fact that the size of the subtrees to explore may now vary, so different processors will be assigned problems of varying computational volume. All that is then needed to achieve excellent speedups is a load-balancing scheme, that is, a mechanism by means of which processors may, during run-time, exchange problems so as to keep all processors busy all the time.

The parallelization of the alpha-beta and the SSS* algorithms is much more interesting than that of the more theoretical minimax algorithm. There exist basically two approaches to parallelizing the alpha-beta algorithm. In the first approach, which was one of the first techniques used, all processors explore the entire tree but using different search intervals. This approach is at the basis of the algorithm called parallel aspiration search by G. Baudet [7]. The second approach consists in exploring different parts of the minimax tree simultaneously.


A Simple Way to Parallelize the Exploration 
of Minimax Trees 


Exploring a minimax tree in parallel can very simply be achieved by generating the sons of the root node, then their sons, and so on, up to the point where there are as many nodes waiting to be explored as there are processors. At this point, each processor explores the subtree rooted at one of these nodes, using any given sequential minimax algorithm. When all processors have completed their exploration, the solution for the entire tree is computed from the partial results obtained by each of the processors.
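The simple scheme above can be sketched with Python’s `concurrent.futures`. Assumptions: nested-list trees with a ‘max’ root, a thread pool standing in for the processors (in CPython a `ProcessPoolExecutor` would be needed for real CPU parallelism), and plain minimax as the sequential algorithm. When more nodes than processors are generated, the pool simply queues the extra subtrees.

```python
from concurrent.futures import ThreadPoolExecutor

def minimax(node, maximizing):
    """Sequential minimax on a nested-list tree (leaves are numbers)."""
    if not isinstance(node, list):
        return node
    values = [minimax(s, not maximizing) for s in node]
    return max(values) if maximizing else min(values)

def frontier(tree, processors):
    """Expand level by level from the root until at least `processors` nodes
    wait to be explored (leaves met on the way are carried along)."""
    level = [((), tree)]
    while len(level) < processors and any(isinstance(n, list) for _, n in level):
        nxt = []
        for path, n in level:
            if isinstance(n, list):
                nxt.extend((path + (i,), s) for i, s in enumerate(n))
            else:
                nxt.append((path, n))
        level = nxt
    return level

def parallel_minimax(tree, processors):
    work = frontier(tree, processors)
    with ThreadPoolExecutor(max_workers=processors) as pool:
        futures = {path: pool.submit(minimax, n, len(path) % 2 == 0)
                   for path, n in work}
        partial = {path: f.result() for path, f in futures.items()}

    def combine(node, path):
        """Back up the partial results through the expanded top of the tree."""
        if path in partial:
            return partial[path]
        values = [combine(s, path + (i,)) for i, s in enumerate(node)]
        return max(values) if len(path) % 2 == 0 else min(values)

    return combine(tree, ())
```

Because every subtree is searched with plain minimax and no bounds are shared, the result equals the sequential minimax value regardless of how the work is split.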

In practice, the sons of a node may be ordered in such a way that any son has a probability of yielding the locally optimal path that is no smaller than the corresponding probabilities of its right neighbors. The probability of finding the optimum in the subtree rooted at a given son then always decreases when traversing the sons in left-to-right order. Such ordering information is generally available in game-playing programs, the ordering function being a heuristic function based on knowledge of the game to be played.


A Mandatory Work First Algorithm 


R. Hewett and G. Krishnamurthy [15] proposed an algorithm that achieves an efficiency of roughly 50% for a number of processors in the range of 2 to 25. The algorithm maintains two lists, called ‘open’ and ‘closed’, and a tree called ‘cut’. The ‘open’ list contains all the nodes yet to be explored, ordered with respect to how the nodes have been reached; the ‘closed’ list contains the expanded nodes not yet pruned; and the ‘cut’ tree contains the pruned nodes. The ‘open’ list initially contains only the root node. All processors fetch nodes from the ‘open’ list and process them if they cannot be discarded, that is, if none of their ancestors is in the ‘cut’ tree. Leaf nodes are evaluated and their result is returned to the parent, which may update its value and check for possible pruning by traversing the ‘cut’ tree up to the root node, applying the usual alpha and beta cutoffs. If the selected node is not a leaf node, it is expanded: its sons are inserted into the ‘open’ list and the node itself into the ‘closed’ list.

S.G. Akl et al. [1,2] proposed an algorithm that uses the same approach for exploring the minimax tree. Their priority function is computed as

$$p(n_i) = p(\mathrm{father}(n_i)) - (b_{n_i} + 1 - i)\cdot 10^{\,h-f-1},$$

where n_i is the ith son of node father(n_i), b_{n_i} the branching of node father(n_i), h the search depth (the maximal depth of the minimax tree) and f the depth of node father(n_i) in the minimax tree.

K. Almquist et al. [3] also developed an algorithm based on the idea of having two categories of unexplored nodes, which are ordered according to a given priority function. Furthermore, they add to this concept parallel aspiration search as well as a novel scheduling algorithm.

In the same direction, V.-D. Cung and C. Roucairol [9] have proposed a shared-memory parallel minimax algorithm which distinguishes between critical and noncritical nodes. In their algorithm one processor is assigned to each node.

In the algorithm by I.R. Steinberg and M. Solomon [28], which is also a mandatory-work-first algorithm, the list containing the speculative work, or noncritical nodes, is dynamically ordered.


Aspiration Search 


The parallel algorithm called aspiration search was introduced by Baudet in 1978 [7]. In this algorithm the search interval ]−∞, +∞[ used by the sequential alpha-beta algorithm is divided into a certain number of subintervals that cover the entire range ]−∞, +∞[. Every processor then explores the entire minimax tree using one subinterval, different processors being assigned different intervals. Any processor searching an interval ]a_i, a_{i+1}] may either fail low or fail high; the principle is the same as in the sequential version of the algorithm. Exactly one processor will neither fail low nor fail high, and the value computed by this processor is the value of the minimax tree.
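A sketch of Baudet’s scheme in Python, assuming integer leaf values so that the half-open interval ]a_j, a_{j+1}] can be searched with the open alpha-beta window ]a_j, a_{j+1} + 1[. The boundaries `cuts` and the use of threads are illustrative assumptions; each worker runs an independent fail-hard alpha-beta, and no information is exchanged.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def alpha_beta(node, alpha, beta, maximizing=True):
    """Fail-hard alpha-beta on a nested-list tree (leaves are numbers)."""
    if not isinstance(node, list):
        return node
    for son in node:
        v = alpha_beta(son, alpha, beta, not maximizing)
        if maximizing:
            alpha = max(alpha, v)
        else:
            beta = min(beta, v)
        if alpha >= beta:
            return beta if maximizing else alpha
    return alpha if maximizing else beta

def parallel_aspiration(tree, cuts):
    """cuts: increasing integer boundaries a_1 < ... < a_{p-1}.  Worker j searches
    ]a_j, a_{j+1}] with a_0 = -inf and a_p = +inf, via the window ]a_j, a_{j+1} + 1[."""
    bounds = [-math.inf] + list(cuts) + [math.inf]
    windows = list(zip(bounds[:-1], bounds[1:]))
    with ThreadPoolExecutor(max_workers=len(windows)) as pool:
        results = list(pool.map(lambda w: alpha_beta(tree, w[0], w[1] + 1), windows))
    for (lo, hi), v in zip(windows, results):
        if lo < v <= hi:          # this worker neither failed low nor failed high
            return v
    raise RuntimeError("no window succeeded (non-integer leaf values?)")
```

With the tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]] and cuts [0, 5], the first worker fails high, the third fails low, and the second returns the exact value 3.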

The implementation of the aspiration search algorithm is really simple. Furthermore, no information exchange is needed between processors. If the nodes in the minimax tree to be explored are ordered in such a way that the alpha-beta algorithm has to explore the whole tree, then the speedup obtained by using the aspiration search algorithm is maximal. But when the aspiration search algorithm is applied to randomly generated trees, Baudet has shown that the speedup is limited to about six, independently of the number of processors used.


Tree-Splitting Algorithm 


Among the early parallel minimax algorithms is the tree-splitting algorithm by R.A. Finkel and J.P. Fishburn [14]. This algorithm is based on the idea of viewing the available processors as a tree of processors. Each processor, except for those representing leaves in the processor tree, has a fixed number p_b of son or slave processors. During the execution of the algorithm, a nonleaf processor associated with a node n in the minimax tree spawns the exploration of the sons s_i of n to its p_b slaves. As soon as one slave returns, the next unexplored son s_j is spawned to that slave, or the current value is returned to the father processor if the cut condition is satisfied. If all the sons of a node have been spawned to its slaves, the father processor waits for the results of all its slaves. Leaf processors simply compute the value of their associated node using the sequential alpha-beta algorithm.

An important advantage of the tree-splitting algorithm over other, more elaborate algorithms is that it may be implemented as simply on a shared-memory parallel machine as on a distributed-memory parallel machine.

The tree-splitting algorithm has been implemented and its execution has been simulated. On a simulated machine with 27 processors, in which each processor has three slave sons associated, the average speedup was 5.31 for trees of depth eight and a branching factor of three.


PVSPLIT: Principal Variation Splitting Algorithm 


The PVSPLIT algorithm was proposed by T.A. Marsland and M.S. Campbell [19] and is by far the most often implemented algorithm, especially in chess-playing programs. The algorithm is based on the structure of the sequential alpha-beta algorithm. The idea is to first explore sequentially a path from the root node to its leftmost leaf. This path is called the principal variation path, and its traversal is done to obtain alpha and beta bounds. If the minimax tree to explore is ordered best first (the best son of every node is its first son), then the explored principal variation path represents the solution path. In a second phase, for each level of the minimax tree, all the sons yet to be visited are explored in parallel using the bounds computed during the principal variation path computation and the traversal of the lower levels of the minimax tree.

The PVSPLIT algorithm is completely described by 
the following pseudocode using the negamax notation. 

The PVSPLIT algorithm has been implemented in [20] on a network of Sun workstations. A speedup of 3.06 has been measured on 4 processors when traversing minimax trees representing real chess games. The main problem of the PVSPLIT algorithm is that, during the second phase, the subtrees explored in parallel are not necessarily of the same size.

The PVSPLIT algorithm is most efficient when the iterative deepening technique is used, because with each iteration it is increasingly likely that the first move tried, that is, the one on the principal variation path, is the best one.


FUNCTION PVSplit(n, α, β) IS
BEGIN
  IF is_leave(n) THEN RETURN f(n)
  s ← first_son(n)
  α ← max{α, −PVSplit(s, −β, −α)}
  IF α ≥ β THEN RETURN α
  FOR s' ∈ sons(n) − {s} LOOP IN PARALLEL
    (wait until a slave node is idle)
    v_i ← −TreeSplit(s', −β, −α)
    IF v_i > α THEN
      α ← v_i
      (Update the bounds according to α on all slaves)
    END IF
    IF α ≥ β THEN
      (Terminate all slave processors)
      RETURN α
    END IF
  END LOOP
  RETURN α
END PVSplit


Pseudocode for the PVSPLIT algorithm 
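A simplified Python sketch of PVSPLIT in negamax form, where leaf values are given from the viewpoint of the player to move at that leaf. Two simplifications are assumptions of this sketch rather than part of the algorithm above: the sons searched in parallel use a sequential alpha-beta (`negamax_ab`) instead of TreeSplit, and the bounds of running workers are not updated when α improves.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def negamax_ab(node, alpha, beta):
    """Sequential fail-hard alpha-beta in negamax form (leaves are numbers)."""
    if not isinstance(node, list):
        return node
    for son in node:
        alpha = max(alpha, -negamax_ab(son, -beta, -alpha))
        if alpha >= beta:
            return beta
    return alpha

def pv_split(node, alpha, beta, workers=4):
    if not isinstance(node, list):
        return node
    # Phase 1: the principal variation (leftmost son) is searched recursively first.
    alpha = max(alpha, -pv_split(node[0], -beta, -alpha, workers))
    if alpha >= beta:
        return alpha
    # Phase 2: the remaining sons are searched in parallel with the bound found so far.
    bound = alpha
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda s: -negamax_ab(s, -beta, -bound), node[1:]))
    for v in values:
        alpha = max(alpha, v)
        if alpha >= beta:
            return alpha
    return alpha
```

For the tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]], whose leaves lie at even depth (so the ‘max’ player is to move there and the values need no negation), `pv_split(tree, -math.inf, math.inf)` returns the minimax value 3.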


Synchronized Distributed State Space Search 


A completely different approach to parallelizing the SSS* algorithm has been taken by C.G. Diderich and M. Gengler [10]. The proposed algorithm is called synchronized distributed state space search (SDSSS). It alternates computation and synchronization phases. The algorithm has been designed for a distributed-memory multiprocessor machine. Each processor manages its own local ‘open’ list of unvisited nodes.

The synchronization phase may be subdivided into three major parts. First, the processors exchange information about which nodes can be removed from the local ‘open’ lists. This corresponds to each processor sending the nodes for which the ‘purge’ operation may be applied by all the other processors. Next, all the processors agree on the globally lowest upper bound m* for which nodes exist in some of the ‘open’ lists. Finally, all the nodes having the same upper bound m* are evenly distributed among all the processors. This operation concludes the synchronization phase.

The computation phase of the SDSSS algorithm may be described by the following pseudocode.


(Computation phase) =
  WHILE (there exists a node in the open list having an upper bound of m*) LOOP
    (s, t, m*) ← remove(open)
    IF s = root AND t = solved THEN
      BROADCAST ‘the solution has been found’
      RETURN m*
    END IF
    (Apply the Γ operator to node s)
  END LOOP


Pseudocode for the computation phase of the SDSSS algorithm


Experiments executing the SDSSS algorithm on an 
Intel iPSC/2 parallel machine have been conducted. 
Speedups of up to 11.4 have been measured for 32 pro- 
cessors. 


Distributed Game Tree Search Algorithm 


R. Feldmann [12] parallelized the alpha-beta algorithm for massively parallel distributed-memory machines. Different subtrees are searched in parallel by different processors. The allocation of processors to subtrees is done by imposing certain conditions on the nodes which are to be selectable. The authors introduce the concept of younger brother waits. This concept essentially says that if the subtree rooted at s_1, where s_1 is the first son node of a node n, is not yet evaluated, then the other sons s_2, s_3, ... of node n are not selectable. Younger brothers may only be considered after their elder brothers, which has as a consequence that the value of the elder brothers may be used to give a tight search window to the younger brothers.

This concept is nevertheless not sufficient to achieve the same good search window as the alpha-beta algorithm achieves. Indeed, when node s_1 has been computed, the younger brothers may all be explored in parallel using the value of node s_1. Thus node s_2 has the same search window as it would have in the sequential alpha-beta algorithm, but this is no longer true for s_i, where i ≥ 3. Indeed, if nodes s_2 and s_3 are processed in parallel, they only know the value of node s_1, while in the sequential alpha-beta algorithm node s_3 would have known the values of both s_1 and s_2. This fact forces the parallel algorithm to provide an information dissemination protocol.

If nodes s_2 and s_3 are evaluated on processors P and P', and processor P finishes its work before P', producing a better value than node s_1 did, then processor P will inform processor P' of this value, allowing it to continue with better information on the rest of its subtree, or to terminate its work if the new value allows P' to conclude that its computation has become useless. The load distribution is realized by means of a dynamic load-balancing scheme, where idle processors ask other processors for work.

Speedups as high as 100 have been obtained on a 256-processor machine. In [13], a speedup of 344 on a 1024-transputer network interconnected as a grid and a speedup of 142 on a 256-processor transputer network with de Bruijn interconnection have been reported.




Parallel Minimax Algorithm with Linear Speedup 


In 1988, Althöfer [4] proved that it is possible to develop a parallel minimax algorithm which achieves linear speedup in the average case. Under the assumption that all minimax trees are binary win-loss trees, he exhibited such a parallel minimax algorithm.

M. Böhm and E. Speckenmeyer [8] also suggested an algorithm which uses the same basic ideas as Althöfer. Their algorithm is more general in the sense that it only needs to know the distribution of the leaf values and is independent of the branching of the tree explored.

In 1989, R.M. Karp and Y. Zhang [17] proved that 
it is possible to obtain linear speedup on every instance 
of a random uniform minimax tree if the number of 
processors is close to the height of the tree. 


See also 


> Bottleneck Steiner Tree Problems 
> Directed Tree Networks 
> Shortest Path Tree Algorithms 
We suppose that X and Y are nonempty sets and f: X × Y → R. A minimax theorem is a theorem that asserts that, under certain conditions,

$$\inf_{Y}\sup_{X} f = \sup_{X}\inf_{Y} f,$$

that is to say,

$$\inf_{y\in Y}\sup_{x\in X} f(x,y) = \sup_{x\in X}\inf_{y\in Y} f(x,y).$$

The purpose of this article is to give the reader the flavor of the different kinds of minimax theorems, and of the techniques that have been used to prove them. This is a very large area, and it would be impossible to touch on all the work that has been done in it in the space at our disposal. We have chosen to give the historical roots of the subject, and then go directly to the most recent results. The reader who is interested in a more complete narrative can refer to the 1974 survey article [35] by E.B. Yanovskaya, the 1981 survey article [8] by A. Irle and the 1995 survey article [31] by S. Simons.


Von Neumann’s Results 


In his investigation of games of strategy, J. von Neu- 
mann realized that, even though a two-person zero- 
sum game did not necessarily have a solution in pure 
strategies, it did have to have one in mixed strategies. 
Here is a statement of that seminal result ([19], trans- 
lated into English in [21]): 


Theorem 1 (1928) Let A be an m × n matrix, and X
and Y be the sets of nonnegative row and column vectors
with unit sum. Then

$$\min_{y \in Y} \max_{x \in X} xAy = \max_{x \in X} \min_{y \in Y} xAy.$$
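Theorem 1 can be checked numerically on small games. The Python sketch below (the function names are hypothetical, and the grid search is an illustration rather than a general solver) compares the pure-strategy values $\max_i \min_j a_{ij}$ and $\min_j \max_i a_{ij}$ of a 2 × 2 matrix with the corresponding mixed-strategy values. For a fixed mixed strategy of one player, the other player's best reply can be taken to be pure, so only pure replies are scanned.

```python
from fractions import Fraction

def pure_values(A):
    """Pure-strategy values (max_i min_j a_ij, min_j max_i a_ij) of a matrix game."""
    lower = max(min(row) for row in A)
    upper = min(max(row[j] for row in A) for j in range(len(A[0])))
    return lower, upper

def mixed_values_2x2(A, steps=100):
    """Grid-search approximation of max_x min_y xAy and min_y max_x xAy for a 2x2 game.

    x = (p, 1 - p) and y = (q, 1 - q).  For a fixed mixed strategy of one player,
    the other player's best reply can be taken pure, so only pure replies are scanned.
    """
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    lower = max(min(p * A[0][j] + (1 - p) * A[1][j] for j in range(2)) for p in grid)
    upper = min(max(q * A[i][0] + (1 - q) * A[i][1] for i in range(2)) for q in grid)
    return lower, upper

matching_pennies = [[1, -1], [-1, 1]]
print(pure_values(matching_pennies))       # (-1, 1): no pure saddle point
print(mixed_values_2x2(matching_pennies))  # (Fraction(0, 1), Fraction(0, 1))
```

For matching pennies the two pure values differ, while the mixed values coincide at 0, as Theorem 1 predicts. The grid result is exact here only because the optimal strategy p = 1/2 lies on the grid.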


Despite the fact that the statement of this result is quite 
elementary, the proof was quite sophisticated, and de- 
pended on an extremely ingenious induction argument. 
Nine years later, in [20], von Neumann showed that the 
bilinear character of Theorem 1 was not needed when 
he extended it as follows, using Brouwer’s fixed point 
theorem: 


Theorem 2 (1937) Let X and Y be nonempty compact,
convex subsets of Euclidean spaces, and $f\colon X \times Y \to \mathbb{R}$ be
jointly continuous. Suppose that f is quasiconcave on X
and quasiconvex on Y (see below). Then

$$\min_Y \max_X f = \max_X \min_Y f.$$

When we say that f is quasiconcave on X, we mean that

• for all $y \in Y$ and $\lambda \in \mathbb{R}$, $GT(\lambda, y)$ is convex,

and when we say that f is quasiconvex on Y, we mean
that

• for all $x \in X$ and $\lambda \in \mathbb{R}$, $LE(x, \lambda)$ is convex.

Here, $GT(\lambda, y)$ and $LE(x, \lambda)$ are ‘level sets’ associated
with the function f. Specifically,

$$GT(\lambda, y) := \{x \in X : f(x, y) > \lambda\}$$

and

$$LE(x, \lambda) := \{y \in Y : f(x, y) \le \lambda\}.$$
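A crude grid search on [0, 1] × [0, 1] illustrates the role of the quasiconcavity and quasiconvexity hypotheses of Theorem 2. The two test functions below are hypothetical examples chosen for this sketch. The first is concave in x and convex in y, with a saddle point at (0.48, 0.36). The second, $(x - y)^2$, is convex in x, so it is not quasiconcave on X.

```python
def grid_minimax(f, steps=100):
    """Grid approximations of (min_y max_x f, max_x min_y f) on [0, 1] x [0, 1]."""
    pts = [k / steps for k in range(steps + 1)]
    min_max = min(max(f(x, y) for x in pts) for y in pts)
    max_min = max(min(f(x, y) for y in pts) for x in pts)
    return min_max, max_min

def saddle(x, y):
    # Concave in x and convex in y, so Theorem 2 applies on [0, 1] x [0, 1].
    return -(x - 0.3) ** 2 + (y - 0.6) ** 2 + x * y

def distance_sq(x, y):
    # Convex (not quasiconcave) in x, so the hypotheses of Theorem 2 fail.
    return (x - y) ** 2

print(grid_minimax(saddle))       # two equal values, about 0.198
print(grid_minimax(distance_sq))  # (0.25, 0.0): a genuine gap
```

The first function yields two equal values. The second has $\min_y \max_x = 1/4$ (attained at y = 1/2) but $\max_x \min_y = 0$.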


In 1941, S. Kakutani [10] analyzed von Neumann’s 
proof and, as a result, discovered the fixed point theo- 
rem that bears his name. 


Infinite-Dimensional Results for Convex Sets 


The first infinite-dimensional minimax theorem was 
proved in 1952 by K. Fan ([1]), who generalized Theo- 
rem 2 to the case when X and Y are compact, convex 
subsets of infinite-dimensional locally convex spaces, 
and the quasiconcave and quasiconvex conditions are 
somewhat relaxed. The result in this general line that 
has the simplest statement is that of M. Sion, who 
proved the following ([33]): 
Theorem 3 (1958) Let X be a convex subset of a linear topological space, Y be a compact convex subset of a linear topological space, and $f\colon X \times Y \to \mathbb{R}$ be upper semicontinuous on X and lower semicontinuous on Y. Suppose that f is quasiconcave on X and quasiconvex on Y. Then

$$\min_Y \sup_X f = \sup_X \min_Y f.$$


When we say that f is ‘upper semicontinuous on X’ and ‘lower semicontinuous on Y’ we mean that, for all $y \in Y$, the map $x \mapsto f(x, y)$ is upper semicontinuous and, for all $x \in X$, the map $y \mapsto f(x, y)$ is lower semicontinuous. The importance of Sion’s weakening of continuity to semicontinuity was that it indicated that many kinds of minimax problems have equivalent formulations in terms of subsets of X and Y, and led to Fan’s 1972 work ([4]) on sets with convex sections and minimax inequalities, which has since found many applications in economic theory. Like Theorem 2, all these results relied ultimately on Brouwer’s fixed point theorem (or the related Knaster-Kuratowski-Mazurkiewicz lemma (KKM lemma) on closed subsets of a finite-dimensional simplex).


Functional-Analytic Minimax Theorems 


The first person to take minimax theorems out of the 
context of convex subsets of vector spaces, and their 
proofs (other than that of the matrix case discussed in 
Theorem 1) out of the context of fixed point theorems 
was Fan in 1953 ([2]). We present here a generalization 
of Fan’s result due to H. König ([15]). König’s proof
depended on the Mazur-Orlicz version of the Hahn- 
Banach theorem (see Theorem 5 below). 


Theorem 4 (1968) Let X be a nonempty set and Y be a nonempty compact topological space. Let $f\colon X \times Y \to \mathbb{R}$ be lower semicontinuous on Y. Suppose that:

• for all $x_1, x_2 \in X$, there exists $x_3 \in X$ such that
$$f(x_3, \cdot) \ge \frac{f(x_1, \cdot) + f(x_2, \cdot)}{2} \quad \text{on } Y;$$

• for all $y_1, y_2 \in Y$, there exists $y_3 \in Y$ such that
$$f(\cdot, y_3) \le \frac{f(\cdot, y_1) + f(\cdot, y_2)}{2} \quad \text{on } X.$$

Then

$$\min_Y \sup_X f = \sup_X \min_Y f.$$


We give here the statement of the Mazur-Orlicz version
of the Hahn-Banach theorem, since it is a very useful
result and it is not as well known as it deserves to be.


Theorem 5 (Mazur-Orlicz theorem) Let S be a sublinear functional on a real vector space E, and C be a nonempty convex subset of E. Then there exists a linear functional L on E such that $L \le S$ on E and $\inf_C L = \inf_C S$.


See [16,22] and [23] for applications of the Mazur- 
Orlicz theorem and the related ‘sandwich theorem’ to 
measure theory, Hardy algebra theory and the theory 
of flows in infinite networks. 

The kind of minimax theorem discussed in this sec- 
tion (where X is not topologized) has turned out to be 
extremely useful in functional analysis, in particular in 
convex analysis and also in the theory of monotone op- 
erators on a Banach space. (See [32] for more details of 
these kinds of applications.) 


Minimax Theorems that Depend 
on Connectedness 


It was believed for some time that proofs of minimax 
theorems required either the fixed point machinery of 
algebraic topology, or the functional-analytic machin- 
ery of convexity. However, in 1959, W.-T. Wu proved 
the first minimax theorem in which the conditions of 
convexity were totally replaced by conditions related 
to connectedness. This line of research was continued
by H. Tuy, L.L. Stachó, M.A. Geraghty with B.-L. Lin,
and J. Kindler with R. Trost, whose results were all subsumed by a family of general topological minimax theorems established by König in [17]. Here is a typical result from [17]. In order to simplify the statements of this and some of our later results, we shall write $f_* := \sup_X \inf_Y f$; $f_*$ is the ‘lower value’ of f. If $\lambda \in \mathbb{R}$, $V \subset Y$ and $W \subset X$, we write $GT(\lambda, V) := \bigcap_{y \in V} GT(\lambda, y)$ and $LE(W, \lambda) := \bigcap_{x \in W} LE(x, \lambda)$.


Theorem 6 (1992) Let X be a connected topological space, Y be a compact topological space, and $f\colon X \times Y \to \mathbb{R}$ be upper semicontinuous on X and lower semicontinuous on Y. Let $\Lambda$ be a nonempty subset of $(f_*, \infty)$ such that $\inf \Lambda = f_*$, and suppose that, for all $\lambda \in \Lambda$, for all nonempty subsets V of Y, and for all nonempty finite subsets W of X,

$$GT(\lambda, V) \text{ is connected in } X,$$

and

$$LE(W, \lambda) \text{ is connected in } Y.$$

Then

$$\min_Y \sup_X f = \sup_X \min_Y f.$$


Mixed Minimax Theorems 


In [34], F. Terkelsen proved the first mixed minimax 
theorem. We describe Terkelsen’s result as ‘mixed’ since 
one of the conditions in it is taken from Theorem 4, and 
the other from Theorem 6: 


Theorem 7 (1972) Let X be a nonempty set and Y be a nonempty compact topological space. Let $f\colon X \times Y \to \mathbb{R}$ be lower semicontinuous on Y. Suppose that,

• for all $x_1, x_2 \in X$ there exists $x_3 \in X$ such that
$$f(x_3, \cdot) \ge \frac{f(x_1, \cdot) + f(x_2, \cdot)}{2} \quad \text{on } Y.$$

Suppose also that, for all $\lambda \in \mathbb{R}$ and for all nonempty finite subsets W of X,

$$LE(W, \lambda) \text{ is connected in } Y.$$

Then

$$\min_Y \sup_X f = \sup_X \min_Y f.$$


A Metaminimax Theorem 


It was believed for some time that Brouwer’s fixed point theorem or the Knaster-Kuratowski-Mazurkiewicz lemma was required in order to prove Sion’s theorem, Theorem 3. However, in 1966, M.A. Ghouila-Houri ([7]) proved Theorem 3 using a simple combinatorial property of convex sets in finite-dimensional space. This was probably the first indication of the breakdown of the classification of minimax theorems as either of ‘topological’ or ‘functional-analytic’ type. Further indication of this breakdown was provided by Terkelsen’s result, Theorem 7, and the subsequent 1982 results of I. Joó and Stachó ([9]), the 1985 and 1986 results of Geraghty and Lin ([5] and [6]), and the 1989 results of H. Komiya ([18]).

Kindler ([11]) was the first to realize (in 1990) that some abstract concept akin to connectedness might be involved in minimax theorems, even when the topological condition of connectedness was not explicitly assumed. This idea was pursued by Simons with the introduction in 1992 of the concept of pseudoconnectedness, which we will now describe. We say that sets $H_0$ and $H_1$ are joined by a set H if

$$H \subset H_0 \cup H_1, \quad H \cap H_0 \ne \emptyset \quad \text{and} \quad H \cap H_1 \ne \emptyset.$$

We say that a family $\mathcal{H}$ of sets is pseudoconnected if

$$H_0, H_1, H \in \mathcal{H} \text{ and } H_0 \text{ and } H_1 \text{ joined by } H \;\Longrightarrow\; H_0 \cap H_1 \ne \emptyset.$$
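The definition of pseudoconnectedness is purely set-theoretic, so for a finite family of finite sets it can be checked directly. The Python sketch below (the helper names are hypothetical) tests every triple $H_0, H_1, H$ drawn from the family.

```python
from itertools import product

def joined(h0, h1, h):
    """H0 and H1 are joined by H if H is contained in H0 | H1 and meets both."""
    return h <= (h0 | h1) and bool(h & h0) and bool(h & h1)

def pseudoconnected(family):
    """Every pair of members joined by a third member must intersect."""
    return all(h0 & h1
               for h0, h1, h in product(family, repeat=3)
               if joined(h0, h1, h))

print(pseudoconnected([frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({2, 3, 4})]))  # True
print(pseudoconnected([frozenset({1}), frozenset({2}), frozenset({1, 2})]))              # False
```

The second family fails because {1, 2} joins the disjoint members {1} and {2}.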


Any family of closed connected subsets of a topological 
space is pseudoconnected. So also is any family of open 
connected subsets. However, pseudoconnectedness can 
be defined in the absence of any topological structure 
and, as we shall see in Theorem 8, is closely related to 
minimax theorems. Theorem 8 is the improvement of the result of [29] due to König (see [30]). We shall say that a subset W of X is good if

• W is finite; and

• for all $x \in X$, $LE(x, f_*) \cap LE(W, f_*) \ne \emptyset$.


Theorem 8 (1995) Let Y be a topological space, and $\Lambda$ be a nonempty subset of $\mathbb{R}$ such that $\inf \Lambda = f_*$. Suppose that, for all $\lambda \in \Lambda$ and for all good subsets W of X,

• for all $x \in X$, $LE(x, \lambda)$ is closed and compact; $\{LE(x, \lambda) \cap LE(W, \lambda)\}_{x \in X}$ is pseudoconnected; and

• for all $x_0, x_1 \in X$, there exists $x \in X$ such that $LE(x_0, \lambda)$ and $LE(x_1, \lambda)$ are joined by $LE(x, \lambda) \cap LE(W, \lambda)$.

Then

$$\min_Y \sup_X f = \sup_X \min_Y f.$$
Theorem 8 is proved by induction on the cardinality of the good subset W. Given the obvious topological motivation behind the concept of pseudoconnectedness, it is hardly surprising that Theorem 8 implies Theorem 6. What is more unexpected is that Theorem 8 implies Theorems 4 and 7 also. We prefer to describe Theorem 8 as a metaminimax theorem rather than a minimax theorem, since it is frequently harder to prove that the conditions of Theorem 8 are satisfied in any particular case than it is to prove Theorem 8 itself. So Theorem 8 is really a device for obtaining minimax theorems rather than a minimax theorem in its own right.

More recent work by Kindler ([12,13] and [14]) on 
abstract intersection theorems has been at the interface 
between minimax theory and abstract set theory. 


Minimax Theorems and Weak Compactness 


There are close connections between minimax theorems and weak compactness. The following ‘converse
minimax theorem’ was proved by Simons in [25]; this 
result also shows that there are limitations on the ex- 
tent to which one can totally remove the assumption of 
compactness from minimax theorems. 


Theorem 9 (1971) Suppose that X is a nonempty bounded, convex, complete subset of a locally convex space E with dual space $E^*$, and

$$\inf_{y \in Y} \sup_{x \in X} \langle x, y \rangle = \sup_{x \in X} \inf_{y \in Y} \langle x, y \rangle$$

whenever Y is a nonempty convex, equicontinuous subset of $E^*$. Then X is weakly compact.


No compactness is assumed in the following, much 
harder, result (see [26]): 


Theorem 10 (1972) If X is a nonempty bounded, convex subset of a locally convex space E such that every element of the dual space $E^*$ attains its supremum on X, and Y is any nonempty convex equicontinuous subset of $E^*$, then

$$\inf_{y \in Y} \sup_{x \in X} \langle x, y \rangle = \sup_{x \in X} \inf_{y \in Y} \langle x, y \rangle.$$


If one now combines the results of Theorems 9 and 10, one can obtain a proof of the ‘sup theorem’ of R.C. James, one of the most beautiful results in functional analysis:


Theorem 11 (James sup theorem) If C is a nonempty bounded closed convex subset of E, then C is $w(E, E^*)$-compact if and only if, for all $x^* \in E^*$, there exists $x \in C$ such that $\langle x, x^* \rangle = \sup_C x^*$, that is, every element of $E^*$ attains its supremum on C.


James’s theorem is not easy; the standard proof can be
found in the paper [24] by J.D. Pryce.

See [31] for more details of the connections between 
minimax theorems and weak compactness. 


Minimax Inequalities for Two or More Functions 


Motivated by Nash equilibrium and the theory of non- 
cooperative games, Fan generalized Theorem 2 to the 
case of more than one function. In particular, he proved 
in [3] the following two-function minimax inequality 
(since the compactness of X is not needed, this result 
can in fact be strengthened to include Sion’s theorem, 
Theorem 3, by taking g = f): 


Theorem 12 (1964) Let X and Y be nonempty compact, convex subsets of topological vector spaces and $f, g\colon X \times Y \to \mathbb{R}$. Suppose that f is lower semicontinuous on Y and quasiconcave on X, g is upper semicontinuous on X and quasiconvex on Y, and

$$f \le g \quad \text{on } X \times Y.$$

Then

$$\min_Y \sup_X f \le \sup_X \inf_Y g.$$


Fan (unpublished) and Simons (see [27]) generalized 
König’s theorem, Theorem 4, with the following two-function minimax inequality:


Theorem 13 (1981) Let X be a nonempty set, Y be a compact topological space and $f, g\colon X \times Y \to \mathbb{R}$. Suppose that f is lower semicontinuous on Y, and

• for all $y_1, y_2 \in Y$ there exists $y_3 \in Y$ such that
$$f(\cdot, y_3) \le \frac{f(\cdot, y_1) + f(\cdot, y_2)}{2} \quad \text{on } X;$$

• for all $x_1, x_2 \in X$ there exists $x_3 \in X$ such that
$$g(x_3, \cdot) \ge \frac{g(x_1, \cdot) + g(x_2, \cdot)}{2} \quad \text{on } Y;$$

and

• $f \le g$ on $X \times Y$.

Then

$$\min_Y \sup_X f \le \sup_X \inf_Y g.$$


Theorems 12 and 13 both unify the theory of mini- 
max theorems and the theory of variational inequali- 
ties. The curious feature about these two results is that 
they have ‘opposite geometric pictures’. This question
is discussed in [27] and [28]. The relationship between 
Theorem 12 and Brouwer’s fixed point theorem is quite 
interesting. As we have already pointed out, Sion’s the- 
orem, Theorem 3, can be proved in an elementary fash- 
ion without recourse to fixed point related concepts. 
On the other hand, Theorem 12 can, in fact, be used 
to prove Tychonoff’s fixed point theorem, which is itself 
a generalization of Brouwer’s fixed point theorem. (See 
[3] for more details of this.) 

A number of authors have proved minimax inequal- 
ities for more than two functions. See [31] for more de- 
tails of these results. 


Coincidence Theorems 


A coincidence theorem is a theorem that asserts that if $S\colon X \to 2^Y$ and $T\colon Y \to 2^X$ have nonempty values and satisfy certain other conditions, then there exist $x_0 \in X$ and $y_0 \in Y$ such that $y_0 \in Sx_0$ and $x_0 \in Ty_0$. The connection with minimax theorems is as follows: Suppose that $\inf_Y \sup_X f \ne \sup_X \inf_Y f$. Since the inequality $\sup_X \inf_Y f \le \inf_Y \sup_X f$ always holds, there exists $\lambda \in \mathbb{R}$ such that

$$\sup_X \inf_Y f < \lambda < \inf_Y \sup_X f.$$

Hence,

• for all $x \in X$ there exists $y \in Y$ such that $f(x, y) < \lambda$; and

• for all $y \in Y$ there exists $x \in X$ such that $f(x, y) > \lambda$.

Define $S\colon X \to 2^Y$ and $T\colon Y \to 2^X$ by

$$Sx := \{y \in Y : f(x, y) < \lambda\} \ne \emptyset$$

and

$$Ty := \{x \in X : f(x, y) > \lambda\} \ne \emptyset.$$


If S and T were to satisfy a coincidence theorem, then we would have $x_0 \in X$ and $y_0 \in Y$ such that

$$f(x_0, y_0) < \lambda \quad \text{and} \quad f(x_0, y_0) > \lambda,$$

which is clearly impossible. Thus this coincidence theorem would imply that

$$\inf_Y \sup_X f = \sup_X \inf_Y f.$$
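The coincidence argument can be traced on a finite example. In the sketch below, X and Y are the row and column indices of a matrix game in pure strategies and f(x, y) = A[x][y]; the helper name is hypothetical and not taken from the source. For any λ strictly between the lower and upper values, S and T are built as described, and the search for a pair $(x_0, y_0)$ with $y_0 \in Sx_0$ and $x_0 \in Ty_0$ necessarily comes back empty.

```python
def coincidences(A, lam):
    """Pure-strategy version of the argument: X = rows, Y = columns, f(x, y) = A[x][y]."""
    rows, cols = range(len(A)), range(len(A[0]))
    lower = max(min(A[x][y] for y in cols) for x in rows)   # sup_X inf_Y f
    upper = min(max(A[x][y] for x in rows) for y in cols)   # inf_Y sup_X f
    if not lower < lam < upper:
        raise ValueError("lam must lie strictly between the lower and upper values")
    S = {x: {y for y in cols if A[x][y] < lam} for x in rows}   # Sx
    T = {y: {x for x in rows if A[x][y] > lam} for y in cols}   # Ty
    assert all(S.values()) and all(T.values())  # nonempty values, as in the text
    # A coincidence would be a pair (x0, y0) with y0 in S x0 and x0 in T y0.
    return [(x0, y0) for x0 in rows for y0 in S[x0] if x0 in T[y0]]

print(coincidences([[1, -1], [-1, 1]], 0))  # []
```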


The coincidence theorems known in algebraic topology 
consequently give rise to corresponding minimax theo- 
rems. There is a very extensive literature about coinci- 
dence theorems. See [31] for more details about this. 
The minimum concave transportation problem (MCTP)
concerns the least cost method of carrying flow on a bi- 
partite network in which the marginal cost for an arc is 
a nonincreasing function of the flow on that arc. A bi- 
partite network contains source nodes and sink nodes, 
but no transshipment (i.e., intermediate) nodes. The 
MCTP can be formulated as 


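$$\min \sum_{(i,j) \in A} \phi_{ij}(x_{ij}) \qquad (1)$$
subject to:
$$\sum_{j \in N} x_{ij} = s_i, \quad \forall i \in M, \qquad (2)$$
$$\sum_{i \in M} x_{ij} = d_j, \quad \forall j \in N, \qquad (3)$$
$$x_{ij} \ge 0, \quad \forall (i,j) \in A, \qquad (4)$$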
where M is the set of source nodes; N is the set of sink nodes; $s_i$ is the supply at source node i; $d_j$ is the demand at sink node j; $A = \{(i, j) : i \in M, j \in N\}$ is the (directed) arc set; $x_{ij}$ is the flow carried on arc (i, j); and $\phi_{ij}(x_{ij})$ is the concave cost function for arc (i, j). Objective function (1) minimizes total costs; constraints (2) balance flow at the source nodes; and constraints (3) balance flow at the sink nodes. If $\sum_{i \in M} s_i$ is less (greater) than $\sum_{j \in N} d_j$, then a dummy source (sink) node can be added to set M (N).
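For very small balanced instances, formulation (1)-(4) can be solved by enumerating integral flows. The Python sketch below is only a sanity check, and its function name and data are hypothetical. It relies on two facts: a concave function attains its minimum over a polytope at a vertex, and the vertices of a transportation polytope with integral supplies and demands are integral.

```python
from itertools import product

def brute_force_mctp(supply, demand, cost):
    """Enumerate integral flows of a tiny balanced instance of (1)-(4).

    cost(i, j, x) is the concave arc cost phi_ij(x). Returns (min_cost, flow matrix).
    """
    assert sum(supply) == sum(demand), "add a dummy source or sink first"
    m, n = len(supply), len(demand)
    arcs = [(i, j) for i in range(m) for j in range(n)]
    ranges = [range(min(supply[i], demand[j]) + 1) for i, j in arcs]
    best = None
    for values in product(*ranges):
        x = dict(zip(arcs, values))
        if any(sum(x[i, j] for j in range(n)) != supply[i] for i in range(m)):
            continue  # constraint (2) violated
        if any(sum(x[i, j] for i in range(m)) != demand[j] for j in range(n)):
            continue  # constraint (3) violated
        total = sum(cost(i, j, x[i, j]) for i, j in arcs)
        if best is None or total < best[0]:
            best = (total, [[x[i, j] for j in range(n)] for i in range(m)])
    return best

unit = [[1, 2], [2, 1]]  # hypothetical per-unit costs

def setup_cost(i, j, x):
    # Concave on x >= 0: zero when the arc is unused, setup 5 plus a linear term otherwise.
    return 0 if x == 0 else 5 + unit[i][j] * x

print(brute_force_mctp([3, 2], [2, 3], setup_cost))  # (20, [[0, 3], [2, 0]])
```

With these data the flow that uses all four arcs costs 28, the worst of the three feasible integral flows. This illustrates why concave costs push the optimum toward extreme points.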

MCTPs arise naturally in distribution problems in- 
volving shipments sent directly from supply points to 
demand points in which the transportation costs ex- 
hibit economies of scale [21]. However, the MCTP is 
not limited to this class of problems. Specifically, any 
network flow problem with arc cost functions that are 
not concave can be converted to a network flow prob- 
lem on an expanded network whose arc cost functions 
are all concave [16]. Then, the expanded network can 
be converted to a bipartite network by replacing each 
transshipment node with a source node and a sink 
node. Arc flow capacities can be removed by adding 
additional source nodes, one for each capacitated arc 
[19,23]. 

The fixed charge transportation problem (FCTP) is a type of MCTP in which the cost function $\phi_{ij}(x_{ij})$ for each arc $(i, j) \in A$ is of the form

$$\phi_{ij}(x_{ij}) = \begin{cases} 0 & \text{if } x_{ij} = 0, \\ f_{ij} + g_{ij} x_{ij} & \text{if } x_{ij} > 0, \end{cases} \qquad (5)$$

where $f_{ij}$ and $g_{ij}$ are coefficients with $f_{ij} \ge 0$. FCTPs are commonly used to model network flow problems involving setup costs [9]. Furthermore, a variety of combinatorial problems can be converted to FCTPs. For instance, consider the 0-1 knapsack problem (KP). The KP is formulated as


$$\max \sum_{k=1}^{n} c_k y_k \qquad (6)$$
subject to:
$$\sum_{k=1}^{n} a_k y_k \le b, \qquad (7)$$
$$y_k \in \{0, 1\}, \quad \text{for } k = 1, \dots, n, \qquad (8)$$


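with $a_k > 0$ and $c_k > 0$ for $k = 1, \dots, n$. The KP can be converted to a FCTP with two source nodes and n + 1 sink nodes. Define $a_{n+1} = b$ and $c_{n+1} = 0$. Then, the network is specified as $M = \{1, 2\}$, $N = \{1, \dots, n+1\}$, $s_1 = b$, $s_2 = \sum_{k=1}^{n} a_k$, and $d_j = a_j$ for $j = 1, \dots, n+1$; and the cost function is of the form of (5) where, for each arc $(i, j) \in A$, the coefficients $f_{ij}$ and $g_{ij}$ are given by

$$f_{ij} = \begin{cases} \sum_{k=1}^{n} c_k & \text{if } j = 1, \dots, n, \\ 0 & \text{if } j = n+1, \end{cases} \qquad (9)$$

$$g_{ij} = \begin{cases} -c_j / a_j & \text{if } i = 1, \\ 0 & \text{if } i = 2. \end{cases} \qquad (10)$$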
For $j = 1, \dots, n$, sink node j has two incoming arcs, exactly one of which will have nonzero flow in the optimal solution to the FCTP. If $x_{1j} > 0$ in the FCTP, then $y_j = 1$ in the KP. If $x_{2j} > 0$ in the FCTP, then $y_j = 0$ in the KP.
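The conversion can be checked by brute force on a tiny knapsack instance. The sketch below assumes the coefficient reading $f_{ij} = \sum_k c_k$ for sinks $j \le n$ and $f_{i,n+1} = 0$, with $g_{1j} = -c_j/a_j$ and $g_{2j} = 0$. Function names and data are hypothetical. Sources 1 and 2 are indexed 0 and 1, and sinks 1, ..., n+1 are indexed 0, ..., n. Under this reading, an item sink costs $\sum_k c_k - c_j$ when supplied by source 1 and $\sum_k c_k$ when supplied by source 2. The FCTP optimum should therefore equal $n \sum_k c_k$ minus the KP optimum.

```python
from fractions import Fraction
from itertools import product

def knapsack_max(a, c, b):
    """Brute-force optimum of the 0-1 knapsack problem (6)-(8)."""
    n = len(a)
    return max(sum(c[k] for k in range(n) if y[k])
               for y in product((0, 1), repeat=n)
               if sum(a[k] for k in range(n) if y[k]) <= b)

def kp_as_fctp_min(a, c, b):
    """Optimal cost of the FCTP built from a KP instance, by enumerating integral flows."""
    n = len(a)
    a_ext = list(a) + [b]        # a_{n+1} = b, so d_j = a_ext[j]
    c_ext = list(c) + [0]        # c_{n+1} = 0
    big = sum(c)                 # fixed charge on arcs into sinks 1..n

    def arc_cost(i, j, x):       # i = 0: source 1, i = 1: source 2
        if x == 0:
            return 0
        f = big if j < n else 0
        g = -Fraction(c_ext[j], a_ext[j]) if i == 0 else 0
        return f + g * x

    best = None
    # Source 1 (supply b) ships x1[j] to sink j; source 2 covers the rest of d_j.
    for x1 in product(*(range(d + 1) for d in a_ext)):
        if sum(x1) != b:
            continue
        total = sum(arc_cost(0, j, x1[j]) + arc_cost(1, j, a_ext[j] - x1[j])
                    for j in range(n + 1))
        best = total if best is None else min(best, total)
    return best

a, c, b = [2, 3, 4], [3, 4, 5], 5    # hypothetical instance
print(knapsack_max(a, c, b))         # 7 (items 1 and 2)
print(kp_as_fctp_min(a, c, b) == len(a) * sum(c) - knapsack_max(a, c, b))  # True
```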

One consequence of this result is that any integer 
programming problem with integer coefficients can (in 
principle) be formulated and solved as a FCTP by first 
converting the integer program to a KP [10]. 

Exact solution methods for the MCTP are predominantly branch and bound enumeration procedures [2,3,4,6,8,11,12,15]. Binary partitioning is used for the FCTP; and interval partitioning is used for the MCTP with arbitrary concave arc cost functions. Finite convergence of the method was shown by R.M. Soland [22]. Over the current flow interval of an arc, the convex envelope of the concave cost function $\phi_{ij}(x_{ij})$ is an affine function. Hence, a subproblem in the branch and bound procedure can be solved efficiently as a linear transportation problem (LTP) [1]. Fathoming techniques (such as ‘up and down penalties’ and ‘capacity improvement’) based on post-optimality analysis of the LTP facilitate the branch and bound procedure for the MCTP [2,3,18,20]. The LTP is also used in approximate solution methods for the MCTP which rely on successive linearizations of the concave cost function $\phi_{ij}(x_{ij})$ [5,13,14].
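The affine envelope can be written out explicitly. As a worked sketch, suppose the current subproblem restricts arc (i, j) to an interval $[l, u]$ with $l < u$; the symbols l and u are introduced here for illustration. Because $\phi_{ij}$ is concave, every chord lies below its graph, and its convex envelope on the interval is the chord through the endpoints:

$$\breve{\phi}_{ij}(x) = \phi_{ij}(l) + \frac{\phi_{ij}(u) - \phi_{ij}(l)}{u - l}\,(x - l), \qquad l \le x \le u.$$

For the fixed charge cost (5) on $[0, u]$ with $u > 0$, we have $\phi_{ij}(0) = 0$ and $\phi_{ij}(u) = f_{ij} + g_{ij} u$, so

$$\breve{\phi}_{ij}(x) = \Bigl(g_{ij} + \frac{f_{ij}}{u}\Bigr) x, \qquad 0 \le x \le u.$$

Replacing each arc cost by its envelope gives an LTP with unit costs $g_{ij} + f_{ij}/u$. Since $\breve{\phi}_{ij} \le \phi_{ij}$, the optimal value of this LTP is a lower bound for the subproblem.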

Test problems for the MCTP are given in [7,8,12,17,20].
The minimum cost flow problem seeks a least cost ship-
ment of a commodity through a network to satisfy de- 
mands at certain nodes by available supplies at other 
nodes. This problem has many, varied applications: the 
distribution of a product from manufacturing plants to 
warehouses, or from warehouses to retailers; the flow of 
raw material and intermediate goods through various 
machining stations in a production line; the routing of 
automobiles through an urban street network; and the 
routing of calls through the telephone system. The min- 
imum cost flow problem also has many less direct appli- 
cations. In this article, we briefly introduce the theory, 
algorithms and applications of the minimum cost flow 
problem. [1] contains much additional material on this 
topic. 

Let G = (N, A) be a directed network defined by a set N of n nodes and a set A of m directed arcs. Each arc $(i, j) \in A$ has an associated cost $c_{ij}$ that denotes the cost per unit flow on that arc. We assume that the flow cost varies linearly with the amount of flow. Each arc $(i, j) \in A$ has an associated capacity $u_{ij}$ denoting the maximum amount that can flow on this arc, and a lower bound $l_{ij}$ that denotes the minimum amount that must flow on the arc. We assume that the capacity and flow lower bound for each arc (i, j) are integers. We associate with each node $i \in N$ an integer b(i) representing its supply/demand. If b(i) > 0, node i is a supply node; if b(i) < 0, then node i is a demand node with a demand of −b(i); and if b(i) = 0, then node i is a transshipment node. We assume that $\sum_{i \in N} b(i) = 0$. The decision variables $x_{ij}$ are arc flows defined for each arc $(i, j) \in A$.

The minimum cost flow problem is an optimization 
model formulated as follows: 


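$$\text{Minimize} \sum_{(i,j) \in A} c_{ij} x_{ij} \qquad (1)$$
subject to
$$\sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} = b(i), \quad \text{for all } i \in N, \qquad (2)$$
$$l_{ij} \le x_{ij} \le u_{ij}, \quad \text{for all } (i,j) \in A. \qquad (3)$$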


We refer to the constraints (2) as the mass balance con- 
straints. For a fixed node i, the first term in the con- 
straint (2) represents the total outflow of node i and the 
second term represents the total inflow of node i. The 
mass balance constraints state that outflow minus in- 
flow must equal the supply/demand of each node. The 
flow must also satisfy the lower bound and capacity 
constraints (3), which we refer to as flow bound con- 
straints. 
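Objective (1) and constraints (2)-(3) translate directly into a feasibility check. The Python sketch below uses a hypothetical function name and data layout, in which `arcs` maps each (i, j) to $(c_{ij}, l_{ij}, u_{ij})$ and `b` maps each node to b(i). It verifies a candidate flow and returns its cost.

```python
def check_flow(nodes, arcs, b, x):
    """Return the cost (1) of flow x, or raise ValueError if (2) or (3) fails."""
    for (i, j), (c, l, u) in arcs.items():
        if not l <= x[i, j] <= u:
            raise ValueError(f"flow bound constraint (3) violated on arc {(i, j)}")
    for i in nodes:
        outflow = sum(x[a] for a in arcs if a[0] == i)
        inflow = sum(x[a] for a in arcs if a[1] == i)
        if outflow - inflow != b[i]:
            raise ValueError(f"mass balance constraint (2) violated at node {i}")
    return sum(c * x[a] for a, (c, _l, _u) in arcs.items())

arcs = {(1, 2): (2, 0, 4), (2, 3): (1, 0, 3), (1, 3): (5, 0, 2)}  # hypothetical data
b = {1: 4, 2: 0, 3: -4}
x = {(1, 2): 3, (2, 3): 3, (1, 3): 1}
print(check_flow([1, 2, 3], arcs, b, x))  # 14
```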

This article is organized as follows. To help in un- 
derstanding the applicability of the minimum cost flow 
problem, we begin in Section 2 by describing several 
applications. In Section 3, we present preliminary ma- 
terial needed in the subsequent sections. We next dis- 
cuss algorithms for the minimum cost flow problem, 
describing the cycle-canceling algorithm in Section 4 
and the successive shortest path algorithm in Section 5. 
The cycle-canceling algorithm identifies negative cost 
cycles in the network and augments flows along them. 
The successive shortest path algorithm augments flow 
along shortest cost augmenting paths from the supply 
nodes to the demand nodes. In Section 6, we describe 
the network simplex algorithm. 


Applications 


Minimum cost flow problems arise in almost all in- 
dustries, including agriculture, communications, de- 
fense, education, energy, health care, manufacturing, 
medicine, retailing, and transportation. Indeed, mini- 
mum cost flow problems are pervasive in practice. In 
this section, by considering a few selected applications 
that arise in distribution systems planning, capacity 
planning, and vehicle routing, we give a passing glimpse 
of these applications. 


Distribution Problems 


A large class of network flow problems center around 
distribution applications. One core model is often de- 
scribed in terms of shipments from plants to ware- 
houses (or, alternatively, from warehouses to retailers). 
Suppose a firm has p plants with known supplies and q 
warehouses with known demands. It wishes to identify 
a flow that satisfies the demands at the warehouses from 
the available supplies at the plants and that minimizes 
its shipping costs. This problem is a well-known spe-
cial case of the minimum cost flow problem, known as 
the transportation problem. We next describe in more 
detail a slight generalization of this model that also in- 
corporates manufacturing costs at the plants. 

A car manufacturer has several manufacturing 
plants and produces several car models at each plant 
that it then ships to geographically dispersed retail cen- 
ters throughout the country. Each retail center requests 
a specific number of cars of each model. The firm must 
determine the production plan of each model at each 
plant and a shipping pattern that satisfies the demand 
of each retail center while minimizing the overall cost 
of production and transportation. 

We describe this formulation through an example. 
Figure 1 illustrates a situation with two manufacturing
plants, two retailers, and three car models. This model 
has four types of nodes: 

i) plant nodes, representing various plants; 

ii) plant/model nodes, corresponding to each model 
made at a plant; 

iii) retailer/model nodes, corresponding to the models 
required by each retailer; and 

iv) retailer nodes corresponding to each retailer. 

The network contains three types of arcs: 

i) production arcs; 

ii) transportation arcs; and 

iii) demand arcs. 

The production arcs connect a plant node to a plant/model node; the cost of this arc is the cost of producing the model at that plant. We might place lower and upper bounds on production arcs to control for the minimum and maximum production of each particular car model at the plants. Transportation arcs connect plant/model nodes to retailer/model nodes; the cost of any such arc is the total cost of shipping one car from the manufacturing plant to the retail center. The transportation arcs might have lower or upper bounds imposed upon their flows to model contractual agreements with shippers or capacities imposed upon any distribution channel. Finally, demand arcs connect retailer/model nodes to the retailer nodes. These arcs have zero costs and positive lower bounds that equal the demand of that model at that retail center.

The production and shipping schedules for the automobile company correspond in a one-to-one fashion with the feasible flows in this network model. Consequently, a minimum cost flow provides an optimal production and shipping schedule.


Airplane Hopping Problem 


A small commuter airline uses a plane, with a capacity 
to carry at most p passengers, on a ‘hopping flight’ as 
shown in Fig. 2a). The hopping flight visits the cities 
1, ..., n, in a fixed sequence. The plane can pick up 
passengers at any node and drop them off at any other 
node. Let $b_{ij}$ denote the number of passengers available
at node i who want to go to node j, and let $f_{ij}$ denote
the fare per passenger from node i to node j. The airline 
would like to determine the number of passengers that 
the plane should carry between the various origins to 
destinations in order to maximize the total fare per trip 
while never exceeding the plane’s capacity. 

Figure 2b) shows a minimum cost flow formulation 
of this hopping plane flight problem. The network con- 
tains data for only those arcs with nonzero costs and 
with finite capacities: any arc listed without an associ- 
ated cost has a zero cost; any arc listed without an as- 
sociated capacity has an infinite capacity. Consider, for 
example, node 1. Three types of passengers are avail- 
able at node 1: those whose destination is node 2, node 
3 or node 4. We represent these three types of passengers in a new derived network by the nodes 1 - 2, 1 - 3 and 1 - 4 with supplies $b_{12}$, $b_{13}$ and $b_{14}$. A passenger available at any such node, say 1 - 3, could board the plane at its origin node represented by flowing through the arc (1 - 3, 1) and incurring a cost of $-f_{13}$ units (or profit of $f_{13}$ units). Or, the passenger might never board
the plane, which we represent by the flow through the 
arc (1 - 3, 3). It is easy to establish a one-to-one corre- 
spondence between feasible flows in Fig. 2b) and feasi- 
ble loading of the plane with passengers. Consequently, 
a minimum cost flow in Fig. 2b) will prescribe a most 
profitable loading of the plane. 


Directed Chinese Postman Problem 


The directed Chinese postman problem is a generic routing problem that can be stated as follows. In a directed network G = (N, A) in which each arc (i, j) has an associated cost $c_{ij}$, we wish to identify a walk of minimum cost that starts at some node (the post office), visits each arc of the network at least once, and returns to the starting point (see the next Section for the definition of a walk). This problem has become known as the Chinese postman problem because a Chinese mathematician, K. Mei-Ko, first discussed it. The Chinese postman problem arises in other settings as well; for instance, patrolling streets by police, routing street sweepers and household refuse collection vehicles, fuel oil delivery to households, and spraying roads with sand during snowstorms. The directed Chinese postman problem assumes that all arcs are directed, that is, the postal carrier can traverse an arc in only one direction (like one-way streets).

Minimum Cost Flow Problem, Figure 1
Formulating the production-distribution problem

In the directed Chinese postman problem, we are 
interested in a closed (directed) walk that traverses each 
arc of the network at least once. The network might not 
contain any such walk. It is easy to show that a network contains a desired walk if and only if the network is strongly connected, that is, every node in the network is reachable from every other node via a directed path. Simple graph search algorithms are able to determine whether the network is strongly connected, and we shall therefore assume that the network is strongly connected.
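One such graph search, sketched below in Python with hypothetical names, runs a breadth-first search from an arbitrary node s in G and again in the graph with every arc reversed. The network is strongly connected exactly when both searches reach every node: the first confirms that every node can be reached from s, the second that s can be reached from every node.

```python
from collections import deque

def reachable(start, adj):
    """Nodes reachable from start by breadth-first search over adjacency lists adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in adj.get(i, ()):
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def is_strongly_connected(nodes, arcs):
    nodes = list(nodes)
    if not nodes:
        return True
    fwd, rev = {}, {}
    for i, j in arcs:
        fwd.setdefault(i, []).append(j)   # arc (i, j) in G
        rev.setdefault(j, []).append(i)   # arc (j, i) in reversed G
    s = nodes[0]
    return reachable(s, fwd) == set(nodes) == reachable(s, rev)

print(is_strongly_connected([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))  # True
print(is_strongly_connected([1, 2, 3], [(1, 2), (2, 3)]))          # False
```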

In an optimal walk, a postal carrier might traverse arcs more than once. The minimum length walk minimizes the sum of lengths of the repeated arcs. Let $x_{ij}$ denote the number of times the postal carrier traverses arc (i, j) in a walk. Any carrier walk must satisfy the following conditions:

$$\sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} = 0, \quad \text{for all } i \in N, \qquad (4)$$
$$x_{ij} \ge 1, \quad \text{for all } (i,j) \in A. \qquad (5)$$
Formulation of the hopping plane flight problem as a minimum cost flow problem


The constraints (4) state that the carrier enters 
a node the same number of times that he or she leaves 
it. The constraints (5) state that the carrier must visit 
each arc at least once. Any solution x satisfying the system (4)-(5) defines a carrier’s walk. We can construct a walk in the following manner. Given a flow $x_{ij}$, we replace each arc (i, j) with $x_{ij}$ copies of the arc, each arc carrying a unit flow. In the resulting network, say G′ = (N, A′), each node has the same number of outgoing arcs as it has incoming arcs. It is possible to decompose this network into at most m/2 arc-disjoint directed cycles (by walking along an arc (i, j) from some node i with $x_{ij} > 0$, leaving a node each time we enter it until we repeat a node). We can connect these cycles together to form a closed walk of the carrier.
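The cycle-splicing construction is essentially Hierholzer's algorithm for Euler circuits, named here as a known technique rather than as the source's own procedure. The sketch below uses hypothetical names and assumes that x satisfies (4)-(5) on a strongly connected network. It expands each arc into $x_{ij}$ unit copies and returns the node sequence of a closed walk that uses every copy exactly once.

```python
from collections import defaultdict

def carrier_walk(x, start):
    """Closed walk from a solution x of (4)-(5); x maps arc (i, j) -> x_ij."""
    out = defaultdict(list)
    for (i, j), k in x.items():
        out[i].extend([j] * k)          # x_ij parallel unit copies of arc (i, j)
    stack, walk = [start], []
    while stack:
        i = stack[-1]
        if out[i]:
            stack.append(out[i].pop())  # follow an unused copy, extending the current cycle
        else:
            walk.append(stack.pop())    # no unused copies left at i: back up, splicing cycles
    return walk[::-1]                   # node sequence; first node == last node

x = {(1, 2): 2, (2, 3): 1, (3, 1): 1, (2, 1): 1}  # hypothetical balanced solution
print(carrier_walk(x, 1))  # [1, 2, 1, 2, 3, 1]
```

The length of the resulting walk is $\sum c_{ij} x_{ij}$, because each arc (i, j) appears exactly $x_{ij}$ times.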

The preceding discussion shows that the solution 
x defined by a feasible walk for the carrier satisfies 
conditions (4)-(5), and, conversely, every feasible so- 
lution of system (4)-(5) defines a walk of the postman. 
The length of a walk defined by the solution x equals $\sum_{(i,j) \in A} c_{ij} x_{ij}$. This problem is an instance of the minimum cost flow problem.


Preliminaries 


In this Section, we discuss some preliminary material 
required in the following sections. 


Assumptions 


We consider the minimum cost flow problem subject to the following six assumptions:

1) $l_{ij} = 0$ for each $(i, j) \in A$;

2) all data (cost, supply/demand, and capacity) are in- 
tegral; 

3) all arc costs are nonnegative; 

4) for any pair of nodes i and j, the network does not 
contain both the arcs (i, j) and (j, i); 

5) the minimum cost flow problem has a feasible solu- 
tion; and 

6) the network contains a directed path of sufficiently 
large capacity between every pair of nodes. 

It is possible to show that none of these assumptions, except 2), restricts the generality of our development. We impose them just to simplify our discussion.
Graph Notation


We use standard graph notation. A directed graph G = (N, A) consists of a set N of nodes and a set A of arcs. A directed arc (i, j) has two endpoints, i and j. An arc (i, j) is incident to nodes i and j. The arc (i, j) is an outgoing arc of node i and an incoming arc of node j. A walk in a directed graph G = (N, A) is a sequence of nodes and arcs $i_1, a_1, i_2, a_2, \dots, i_r$ satisfying the property that for all $1 \le k \le r-1$, either $a_k = (i_k, i_{k+1}) \in A$ or $a_k = (i_{k+1}, i_k) \in A$. We sometimes refer to a walk as a sequence of arcs (or nodes) without any explicit mention of the nodes (or arcs). A directed walk is an oriented version of the walk in the sense that for any two consecutive nodes $i_k$ and $i_{k+1}$ on the walk, $a_k = (i_k, i_{k+1}) \in A$. A path is a walk without any repetition of nodes, and a directed path is a directed walk without any repetition of nodes. A cycle is a path $i_1, i_2, \dots, i_r$ together with the arc $(i_r, i_1)$ or $(i_1, i_r)$. A directed cycle is a directed path $i_1, i_2, \dots, i_r$ together with the arc $(i_r, i_1)$. A spanning tree of a directed graph G is a subgraph G′ = (N, A′) with $A' \subseteq A$ that is connected (that is, contains a path between every pair of nodes) and contains no cycle.


Residual Network 


The algorithms described in this article rely on the concept of a residual network G(x) corresponding to a flow x. For each arc $(i, j) \in A$, the residual network contains two arcs (i, j) and (j, i). The arc (i, j) has cost $c_{ij}$ and residual capacity $r_{ij} = u_{ij} - x_{ij}$, and the arc (j, i) has cost $c_{ji} = -c_{ij}$ and residual capacity $r_{ji} = x_{ij}$. The residual network consists of arcs with positive residual capacity. If $(i, j) \in A$, then sending flow on arc (j, i) in G(x) corresponds to decreasing flow on arc (i, j); for this reason, the cost of arc (j, i) is the negative of the cost of arc (i, j). These conventions show how to determine the residual network G(x) corresponding to any flow x. We can also determine a flow x from the residual network G(x) as follows. If $r_{ij} > 0$, then using the definition of residual capacities and Assumption 4), we set $x_{ij} = u_{ij} - r_{ij}$ if $(i, j) \in A$, and $x_{ji} = r_{ij}$ otherwise. We define the cost of a directed cycle W in the residual network G(x) as $\sum_{(i,j)\in W} c_{ij}$.
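The residual network conventions above translate directly into code. This is a minimal Python sketch. It assumes arcs are stored in a dictionary mapping (i, j) to the pair $(c_{ij}, u_{ij})$ and flows in a dictionary mapping (i, j) to $x_{ij}$. All function names are hypothetical. `flow_from_residual` uses the rule $x_{ij} = u_{ij} - r_{ij}$ and treats an arc (i, j) that is absent from G(x) as saturated ($r_{ij} = 0$).

```python
def residual_network(arcs, x):
    """Residual network G(x) as a dict {(i, j): (cost, residual capacity)}.

    arcs maps each arc (i, j) in A to (c_ij, u_ij); x maps each arc to x_ij.
    Only arcs with positive residual capacity are kept. Assumption 4) (no
    pair of opposite arcs in A) guarantees that forward and reverse arcs
    never overwrite one another.
    """
    residual = {}
    for (i, j), (cost, cap) in arcs.items():
        if cap - x[i, j] > 0:
            residual[i, j] = (cost, cap - x[i, j])   # r_ij = u_ij - x_ij
        if x[i, j] > 0:
            residual[j, i] = (-cost, x[i, j])        # c_ji = -c_ij, r_ji = x_ij
    return residual


def flow_from_residual(arcs, residual):
    """Recover x from G(x): x_ij = u_ij - r_ij, with r_ij = 0 if (i, j) is absent."""
    return {
        (i, j): cap - residual.get((i, j), (cost, 0))[1]
        for (i, j), (cost, cap) in arcs.items()
    }


def cycle_cost(residual, cycle):
    """Cost of a directed cycle given as a node list [i1, i2, ..., i1]."""
    return sum(residual[a, b][0] for a, b in zip(cycle, cycle[1:]))
```

For instance, with hypothetical arcs {(1, 2): (4, 3), (2, 3): (1, 2)} and flow {(1, 2): 1, (2, 3): 2}, the saturated arc (2, 3) contributes only its reverse arc (3, 2), and converting the residual network back yields the original flow.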


Order Notation 


In our discussion, we will use some well-known notation from the field of complexity theory. We say that an algorithm for a problem P is an $O(n^3)$ algorithm, or has a worst-case complexity of $O(n^3)$, if it is possible to solve any instance of P using a number of computations that is asymptotically bounded by some constant times the term $n^3$. We refer to an algorithm as a polynomial time algorithm if its worst-case running time is bounded by a polynomial function of the input size parameters, which, for a minimum cost flow problem, are n, m, log C (the number of bits needed to specify the largest arc cost), and log U (the number of bits needed to specify the largest arc capacity). A polynomial time algorithm is either a strongly polynomial time algorithm (when the complexity terms involve only n and m, but not log C or log U) or a weakly polynomial time algorithm (when the complexity terms include log C or log U or both). We say that an algorithm is a pseudopolynomial time algorithm if its worst-case running time is bounded by a polynomial function of n, m, C and U. For example, an algorithm with worst-case complexity of $O(nm^2 \log n)$ is a strongly polynomial time algorithm, an algorithm with worst-case complexity $O(nm^2 \log U)$ is a weakly polynomial time algorithm, and an algorithm with worst-case complexity of $O(n^2 mU)$ is a pseudopolynomial time algorithm.


Cycle-Canceling Algorithm 


In this Section, we describe the cycle-canceling algorithm, one of the more popular algorithms for solving the minimum cost flow problem. The algorithm sends flows (called augmenting flows) along directed cycles with negative cost (called negative cycles). The algorithm rests upon the following negative cycle optimality condition.


Theorem 1 (Negative cycle optimality condition) A 
feasible solution x* is an optimal solution of the mini- 
mum cost flow problem if and only if the residual net- 
work G(x*) contains no negative cost (directed) cycle. 


It is easy to see the necessity of these conditions. If the 
residual network G(x*) contains a negative cycle (that 
is, a negative cost directed cycle), then by augmenting 
positive flow along this cycle, we can decrease the cost 
of the flow. Conversely, it is possible to show that if the 
residual network G(x*) does not contain any negative 
cost cycle, then x* must be an optimal flow. 

The negative cycle optimality condition suggests one simple algorithmic approach for solving the minimum cost flow problem, which we call the cycle-canceling algorithm. This algorithm maintains a feasible solution and at every iteration improves the objective function value. The algorithm first establishes a feasible flow x in the network by solving a related (and easily solved) problem known as the maximum flow problem. Then it iteratively finds negative cycles in the residual network and augments flows on these cycles. The algorithm terminates when the residual network contains no negative cost directed cycle. Theorem 1 implies that when the algorithm terminates, it has found a minimum cost flow. Figure 3 specifies this generic version of the cycle-canceling algorithm.

BEGIN
  establish a feasible flow x in the network;
  WHILE G(x) contains a negative cycle DO
  BEGIN
    identify a negative cycle W;
    δ := min{r_ij : (i, j) ∈ W};
    augment δ units of flow in the cycle W and update G(x);
  END;
END

Minimum Cost Flow Problem, Figure 3
Cycle-canceling algorithm
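The generic algorithm of Figure 3 can be sketched in Python as follows. This sketch assumes that the caller supplies a feasible starting flow. The article obtains that flow from a maximum flow computation, which is not reproduced here. Negative cycles are identified with a Bellman-Ford style label-correcting pass, which is one common choice. All function names are hypothetical. Cycles are canceled in whatever order the label-correcting pass happens to find them.

```python
def residual_network(arcs, x):
    """G(x) as {(i, j): (cost, residual capacity)}, positive capacities only."""
    residual = {}
    for (i, j), (cost, cap) in arcs.items():
        if cap - x[i, j] > 0:
            residual[i, j] = (cost, cap - x[i, j])
        if x[i, j] > 0:
            residual[j, i] = (-cost, x[i, j])
    return residual


def find_negative_cycle(nodes, residual):
    """Return a negative-cost directed cycle of G(x) as a list of arcs, or None.

    Bellman-Ford style label correcting with every label started at 0, which
    acts like an artificial source joined to all nodes by zero-cost arcs.
    """
    dist = {v: 0 for v in nodes}
    pred = {v: None for v in nodes}
    last = None
    for _ in range(len(nodes)):
        last = None
        for (i, j), (cost, _cap) in residual.items():
            if dist[i] + cost < dist[j]:
                dist[j] = dist[i] + cost
                pred[j] = i
                last = j
        if last is None:
            return None                    # labels converged: no negative cycle
    v = last
    for _ in range(len(nodes)):            # step back n times to reach the cycle
        v = pred[v]
    cycle, u = [], v
    while True:
        cycle.append((pred[u], u))
        u = pred[u]
        if u == v:
            return cycle[::-1]


def cycle_canceling(nodes, arcs, x):
    """Cancel negative cycles in G(x), starting from a feasible flow x."""
    x = dict(x)
    while True:
        residual = residual_network(arcs, x)
        cycle = find_negative_cycle(nodes, residual)
        if cycle is None:
            return x                       # Theorem 1: x is now optimal
        delta = min(residual[a][1] for a in cycle)   # delta = min r_ij on W
        for i, j in cycle:
            if (i, j) in arcs:
                x[i, j] += delta           # forward residual arc: more flow
            else:
                x[j, i] -= delta           # reverse residual arc: less flow
```

On a hypothetical four-node network where a feasible flow first sends 2 units along an expensive path 1 - 3 - 4 (cost 3 per arc) while a cheaper path 1 - 2 - 4 (cost 1 per arc) is idle, a single cancellation of the cycle 1 - 2 - 4 - 3 - 1 moves both units onto the cheaper path.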

The numerical example shown in Fig. 4a) illustrates 
the cycle-canceling algorithm. This figure shows the arc 


costs and the starting feasible flow in the network. Each 
arc in the network has a capacity of 2 units. Figure 4b) 
shows the residual network corresponding to the ini- 
tial flow. We do not show the residual capacities of the 
arcs in Fig. 4b) since they are implicit in the network 
structure. If the residual network contains both arcs (i, 
j) and (j, i) for any pair i and j of nodes, then both have 
residual capacity equal to 1; and if the residual network 
contains only one arc, then its capacity is 2 (this ob- 
servation uses the fact that each arc capacity equals 2). 
The residual network shown in Fig. 4b) contains a negative cycle 1 - 3 - 2 - 1 with cost −3. By augmenting a unit of flow along this cycle, we obtain the residual network shown in Fig. 4c). The residual network shown in Fig. 4c) contains a negative cycle 6 - 4 - 5 - 6 with cost −4. We augment a unit of flow along this cycle, producing the residual network shown in Fig. 4d), which contains no negative cycle. Given the optimal residual network, we can determine the optimal flow using the method described in the previous Section.

A byproduct of the cycle-canceling algorithm is the 
following important result. 


Theorem 2 (Integrality property) If all arc capacities and supplies/demands of nodes are integer, then the minimum cost flow problem always has an integer minimum cost flow.


Minimum Cost Flow Problem, Figure 4
Illustration of the cycle-canceling algorithm. a) the original network with flow x and arc costs; b) the residual network G(x); c) the residual network after augmenting a unit of flow along the cycle 2 - 1 - 3 - 2; d) the residual network after augmenting a unit of flow along the cycle 4 - 5 - 6 - 4
This result follows from the fact that for problems with integer arc capacities and integer node supplies/demands, the cycle-canceling algorithm starts with an integer solution (which is provided by the maximum flow algorithm used to obtain the initial feasible flow) and at each iteration augments flow by an integral amount.

What is the worst-case computational requirement (complexity) of the cycle-canceling algorithm? The algorithm must repeatedly identify negative cycles in the residual network. We can identify a negative cycle in the residual network in $O(nm)$ time using a shortest path label-correcting algorithm [1]. How many times must the generic cycle-canceling algorithm perform this computation? For the minimum cost flow problem, $mCU$ is an upper bound on the initial flow cost (since $c_{ij} \le C$ and $x_{ij} \le U$ for all $(i, j) \in A$) and $-mCU$ is a lower bound on the optimal flow cost (since $c_{ij} \ge -C$ and $x_{ij} \le U$ for all $(i, j) \in A$). Any iteration of the cycle-canceling algorithm changes the objective function value by an amount $\delta \sum_{(i,j)\in W} c_{ij}$, which is strictly negative. Since we have assumed that the problem has integral data, each iteration decreases the objective function value by at least one unit; hence the algorithm terminates within $O(mCU)$ iterations and runs in $O(nm^2 CU)$ time, which is a pseudopolynomial running time.
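The counting argument can be written out as a short derivation. This sketch uses the article's notation and assumes integral data (Assumption 2)). Under that assumption each augmentation amount δ is a positive integer, and each negative cycle W has an integer cost of at most −1:

$$
\begin{aligned}
\Delta z &= \delta \sum_{(i,j)\in W} c_{ij} \;\le\; -\delta \;\le\; -1,\\
\#\text{iterations} &\le \frac{mCU - (-mCU)}{1} = 2mCU = O(mCU),\\
\text{running time} &= O(mCU)\cdot O(nm) = O(nm^2 CU).
\end{aligned}
$$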

The generic version of the cycle-canceling algo- 
rithm does not specify the order for selecting nega- 
tive cycles from the network. Different rules for select- 
ing negative cycles produce different versions of the al- 
gorithm, each with different worst-case and theoreti- 
cal behavior. Two versions of the cycle-canceling algo- 
rithm are polynomial time implementations: 

i) a version that augments flow in arc-disjoint negative cycles with the maximum improvement [2]; and

ii) a version that augments flow along a negative cycle with minimum mean cost, where the mean cost of a cycle is its average cost per arc [4].


Successive Shortest Path Algorithm 


The cycle-canceling algorithm maintains feasibility of the solution at every step and attempts to achieve optimality. In contrast, the successive shortest path algorithm maintains optimality of the solution at every step (that is, the condition that the residual network G(x) contains no negative cost cycle) and strives to attain feasibility. It maintains a solution x, called a pseudoflow (see below), that is nonnegative and satisfies the arcs' flow capacity restrictions, but may violate the mass balance constraints of the nodes. At each step, the algorithm selects a node k with excess supply (i.e., supply not yet sent to some demand node) and a node l with unfulfilled demand, and sends flow from node k to node l along a shortest path in the residual network. The algorithm terminates when the current solution satisfies all the mass balance constraints.

To be more precise, a pseudoflow is a vector x satisfying only the capacity and nonnegativity constraints; it need not satisfy the mass balance constraints. For any pseudoflow x, we define the imbalance of node i as

$$e(i) = b(i) + \sum_{\{j:(j,i)\in A\}} x_{ji} - \sum_{\{j:(i,j)\in A\}} x_{ij} \quad \text{for all } i \in N. \qquad (6)$$

If e(i) > 0 for some node i, then we refer to e(i) as the excess of node i; if e(i) < 0, then we refer to −e(i) as the node's deficit. We refer to a node i with e(i) = 0 as balanced. Let E and D denote the sets of excess and deficit nodes in the network. Notice that $\sum_{i\in N} e(i) = \sum_{i\in N} b(i) = 0$, which implies that $\sum_{i\in E} e(i) = -\sum_{i\in D} e(i)$.
Consequently, if the network contains an excess node, 
then it must also contain a deficit node. The residual 
network corresponding to a pseudoflow is defined in 
the same way that we define the residual network for 
a flow. The successive shortest path algorithm uses the 
following result. 


Theorem 3 (Shortest augmenting path theorem) 
Suppose a pseudoflow (or a flow) x satisfies the optimal- 
ity conditions and we obtain x’ from x by sending flow 
along a shortest path from node k to some other node l in
the residual network, then x’ also satisfies the optimality 
conditions. 


To prove this Theorem, we would show that if the residual network G(x) contains no negative cycle, then augmenting flow along any shortest path does not introduce any negative cycle (we will not establish this result in this discussion). Figure 5 gives a formal description of the successive shortest path algorithm.

BEGIN
  x := 0;
  e(i) := b(i) for all i ∈ N;
  initialize the sets E and D;
  WHILE E ≠ ∅ DO
  BEGIN
    select a node k ∈ E and a node l ∈ D;
    identify a shortest path P in G(x) from node k to node l;
    δ := min[e(k), −e(l), min{r_ij : (i, j) ∈ P}];
    augment δ units of flow along the path P and update x, G(x), E and D;
  END
END

Minimum Cost Flow Problem, Figure 5
Successive shortest path algorithm

The numerical example shown in Fig. 6a) illustrates the successive shortest path algorithm. The algorithm starts with x = 0, and at this value of flow, the residual network is identical to the starting network. Just as we observed in Fig. 4, whenever the residual network contains both the arcs (i, j) and (j, i), the residual capacity of each arc is 1. If the residual network contains only one arc, (i, j) or (j, i), then its residual capacity is 2 units. For this problem, E = {1} and D = {6}. In the residual network shown in Fig. 6a), the shortest path from node 1 to node 6 is 1 - 2 - 4 - 6 with cost equal to 9. The residual capacity of this path equals 2. Augmenting two units of flow along this path produces the residual network shown in Fig. 6b), and the next shortest path from node 1 to node 6 is 1 - 3 - 5 - 6 with cost equal to 10. The residual capacity of this path is 2, and we augment two units of flow on it. At this point, E = D = ∅, and the current solution solves the minimum cost flow problem.
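The loop of Figure 5 can be sketched in Python as follows. This is a minimal sketch under the article's assumptions (integral data and nonnegative arc costs), with hypothetical function names. Shortest paths are computed with a Bellman-Ford style label-correcting pass because G(x) may contain reverse arcs with negative costs. Figure 5 leaves open which excess node k and deficit node l to pair. As an illustrative assumption, the code takes the first excess node and the nearest reachable deficit node.

```python
import math


def residual_network(arcs, x):
    """G(x) as {(i, j): (cost, residual capacity)}, positive capacities only."""
    residual = {}
    for (i, j), (cost, cap) in arcs.items():
        if cap - x[i, j] > 0:
            residual[i, j] = (cost, cap - x[i, j])
        if x[i, j] > 0:
            residual[j, i] = (-cost, x[i, j])
    return residual


def shortest_paths(nodes, residual, source):
    """Label-correcting (Bellman-Ford) distances and predecessors from source.

    G(x) may contain negative arc costs but, by Theorem 3, no negative cycle.
    """
    dist = {v: math.inf for v in nodes}
    dist[source] = 0
    pred = {}
    for _ in range(len(nodes) - 1):
        for (i, j), (cost, _cap) in residual.items():
            if dist[i] + cost < dist[j]:
                dist[j] = dist[i] + cost
                pred[j] = i
    return dist, pred


def successive_shortest_path(nodes, arcs, b):
    """Minimum cost flow by successive shortest paths, starting from x = 0.

    arcs maps (i, j) to (c_ij, u_ij) with c_ij >= 0 (Assumption 3); b maps
    each node to its supply (> 0) or demand (< 0).
    """
    x = {a: 0 for a in arcs}
    e = dict(b)                                    # e(i) = b(i) since x = 0
    while any(e[i] > 0 for i in nodes):            # while E is nonempty
        node_k = next(i for i in nodes if e[i] > 0)       # excess node k
        residual = residual_network(arcs, x)
        dist, pred = shortest_paths(nodes, residual, node_k)
        reachable = [i for i in nodes if e[i] < 0 and dist[i] < math.inf]
        if not reachable:
            raise ValueError("remaining excess cannot reach any deficit node")
        node_l = min(reachable, key=dist.get)             # deficit node l
        path, v = [], node_l
        while v != node_k:                         # trace P back from l to k
            path.append((pred[v], v))
            v = pred[v]
        delta = min([e[node_k], -e[node_l]] + [residual[a][1] for a in path])
        for i, j in path:
            if (i, j) in arcs:
                x[i, j] += delta
            else:
                x[j, i] -= delta
        e[node_k] -= delta
        e[node_l] += delta
    return x
```

On a hypothetical network with supply 3 at node 1, demand 3 at node 4, a cheap route 1 - 2 - 4 of capacity 2 and an expensive route 1 - 3 - 4, the sketch first fills the cheap route with 2 units and then sends the remaining unit along the expensive route.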

To show that the algorithm correctly solves the min- 
imum cost flow problem, we argue as follows. The algo- 
rithm starts with a flow x = 0 and the residual network 
G(x) is identical to the original network. Assumption 3) 
implies that all arc costs are nonnegative. Consequently, 
the residual network G(x) contains no negative cycle 
and so the flow vector x satisfies the negative cycle op- 
timality conditions. Since the algorithm augments flow 
along a shortest path from excess nodes to deficit nodes, 
Theorem 3 implies that the pseudoflow maintained by 
the algorithm always satisfies the optimality conditions. 
Eventually, node excesses and deficits become zero; at 
this point, the solution maintained by the algorithm is 
an optimal flow. 

What is the worst-case complexity of this algorithm? In each iteration, the algorithm reduces the excess of some node by at least one unit (all data are integral). Consequently, if U is an upper bound on the largest supply of any node, then the algorithm would terminate in at most nU iterations. We can determine a shortest path in G(x) in $O(nm)$ time using a label-correcting shortest path algorithm [1]. Consequently, the running time of the successive shortest path algorithm is $O(n^2 mU)$.


Minimum Cost Flow Problem, Figure 6
Illustration of the successive shortest path algorithm. a) the residual network corresponding to x = 0; b) the residual network after augmenting 2 units of flow along the path 1 - 2 - 4 - 6; c) the residual network after augmenting 2 units of flow along the path 1 - 3 - 5 - 6
Minimum Cost Flow Problem, Figure 7
Computing flows for a spanning tree 


The successive shortest path algorithm requires 
pseudopolynomial time to solve the minimum cost flow 
problem since it is polynomial in n, m and the largest 
supply U. This algorithm is, however, polynomial time 
for some special cases of the minimum cost flow prob- 
lem (such as the assignment problem for which U = 
1). Researchers have developed weakly polynomial time 
and strongly polynomial time versions of the successive 
shortest path algorithm; some notable implementations 
are due to [3] and [5]. 


Network Simplex Algorithm 


The network simplex algorithm for solving the mini- 
mum cost flow problem is an adaptation of the well- 
known simplex method for general linear programs. 
Because the minimum cost flow problem is a highly 
structured linear programming problem, when applied 
to it, the computations of the simplex method become 
considerably streamlined. In fact, we need not explic- 
itly maintain the matrix representation (known as the 
simplex tableau) of the linear program and can per- 
form all of the computations directly on the network. 
Rather than presenting the network simplex algorithm 
as a special case of the linear programming simplex 
method, we will develop it as a special case of the cycle- 
canceling algorithm described above. The primary ad- 
vantage of our approach is that it permits the network 
simplex algorithm to be understood without relying on 
linear programming theory. 

The network simplex algorithm maintains solutions 
called spanning tree solutions. A spanning tree solution 
partitions the arc set A into three subsets: 

i) T, the arcs in the spanning tree; 
ii) L, the nontree arcs whose flows are restricted to 
value zero; 


iii) U, the nontree arcs whose flow values are restricted 
in value to the arcs’ flow capacities. 

We refer to the triple (T, L, U) as a spanning tree structure. Each spanning tree structure (T, L, U) has a unique solution that satisfies the mass balance constraints (2). To determine this solution, we set $x_{ij} = 0$ for all arcs $(i, j) \in L$, $x_{ij} = u_{ij}$ for all arcs $(i, j) \in U$, and then solve the mass balance equations (2) to determine the flow values for arcs in T.

To show that the flows on spanning tree arcs are unique, we use a numerical example. Consider the spanning tree T shown in Fig. 7a). Assume that U = ∅, that is, all nontree arcs are at their lower bounds. Consider the leaf node 4 (a leaf node is a node with exactly one arc incident to it). Node 4 has a supply of 5 units and has only one arc (4, 2) incident to it. Consequently, arc (4, 2) must carry 5 units of flow. So we set $x_{42} = 5$, add 5 units to b(2) (because it receives 5 units of flow sent from node 4), and delete arc (4, 2) from the tree. We now have a tree with one fewer node and next select another leaf node, node 5, with a supply of 5 units and the single arc (5, 2) incident to it. We set $x_{52} = 5$, again add 5 units to b(2), and delete the arc (5, 2) from the tree. Now node 2 becomes a leaf node with modified supply/demand of b(2) = −10, implying that node 2 has an unfulfilled demand of 10 units. Node 2 has exactly one incoming arc (1, 2), and to meet the demand of 10 units of node 2, we must send 10 units of flow on this arc. We set $x_{12} = 10$, subtract 10 units from b(1) (since node 1 sends 10 units), and delete the arc (1, 2) from the tree. We repeat this process until we have identified the flow on all arcs in the tree. Figure 7b) shows the corresponding flow. Our discussion assumed that U is empty. If U were nonempty, we would first set $x_{ij} = u_{ij}$, add $u_{ij}$ to b(j), and subtract $u_{ij}$ from b(i) for each arc $(i, j) \in U$, and then apply the preceding method.
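The leaf-by-leaf procedure can be written as a short Python function. This is a sketch that assumes the following input formats: the tree is a list of arcs, supplies and demands are a dictionary b (positive for supply, negative for demand, summing to zero), and the arcs of U are an optional dictionary of capacities. The function name `tree_flows` is hypothetical. Arcs in L carry zero flow, so they need no special handling.

```python
from collections import defaultdict


def tree_flows(tree_arcs, b, upper=None):
    """Flows on the tree arcs of a spanning tree structure (T, L, U).

    tree_arcs lists the arcs (i, j) of T; b maps nodes to supplies (> 0) and
    demands (< 0); upper maps each arc of U to u_ij.  Arcs in L carry zero
    flow and therefore leave b unchanged.
    """
    b = dict(b)
    for (i, j), cap in (upper or {}).items():     # x_ij = u_ij for arcs in U
        b[i] -= cap
        b[j] += cap
    incident = defaultdict(set)
    for arc in tree_arcs:
        incident[arc[0]].add(arc)
        incident[arc[1]].add(arc)
    leaves = [v for v in incident if len(incident[v]) == 1]
    x = {}
    while leaves:
        v = leaves.pop()
        if len(incident[v]) != 1:                 # last node: nothing left
            continue
        (arc,) = incident[v]
        i, j = arc
        if v == i:                                # leaf is the tail: ship b(v) out
            x[arc] = b[v]
            other = j
        else:                                     # leaf is the head: cover -b(v)
            x[arc] = -b[v]
            other = i
        b[other] += b[v]                          # pass the leaf's imbalance on
        b[v] = 0
        incident[v].discard(arc)
        incident[other].discard(arc)
        if len(incident[other]) == 1:
            leaves.append(other)
    return x
```

For instance, with invented data (not those of Fig. 7), `tree_flows([(1, 2), (4, 2), (5, 2), (2, 3)], {1: 10, 2: -10, 3: -10, 4: 5, 5: 5})` assigns 5 units to each of (4, 2) and (5, 2) and 10 units to each of (1, 2) and (2, 3), following the same steps as the text.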
Minimum Cost Flow Problem, Figure 8
Computing node potentials for a spanning tree 


We say a spanning tree structure is feasible if its as- 
sociated spanning tree solution satisfies all of the arcs’ 
flow bounds. We refer to a spanning tree structure as 
optimal if its associated spanning tree solution is an op- 
timal solution of the minimum cost flow problem. We 
will now derive the optimality conditions for a span- 
ning tree structure (T, L, U). 

The network simplex algorithm augments flow along negative cycles. To identify negative cycles quickly, we use the concept of node potentials. We define node potentials $\pi(i)$ so that the reduced cost of any arc in the spanning tree T is zero; that is, $c^\pi_{ij} = c_{ij} - \pi(i) + \pi(j) = 0$ for each $(i, j) \in T$. With the help of an example, we show how to compute the vector of node potentials. Consider the spanning tree shown in Fig. 8a) with arc costs as shown. The vector $\pi$ has n variables and must satisfy n − 1 equations, one for each arc in the spanning tree. Therefore, we can assign one potential value arbitrarily. We assume that $\pi(1) = 0$. Consider arc (1, 2) incident to node 1. The condition $c^\pi_{12} = c_{12} - \pi(1) + \pi(2) = 0$ yields $\pi(2) = -5$. We next consider arcs incident to node 2. Using the condition $c^\pi_{52} = c_{52} - \pi(5) + \pi(2) = 0$, we see that $\pi(5) = -3$, and the condition $c^\pi_{32} = c_{32} - \pi(3) + \pi(2) = 0$ shows that $\pi(3) = -2$. We repeat this process until we have identified the potentials of all nodes in the tree T. Figure 8b) shows the corresponding node potentials.
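The potential computation is a traversal of the tree outward from the root. This is a minimal Python sketch that assumes the tree arcs and their costs are given as a dictionary. The function name `node_potentials` is hypothetical. Crossing a tree arc in its own direction subtracts its cost from the potential, and crossing it against its direction adds the cost, so that every tree arc ends up with zero reduced cost.

```python
from collections import defaultdict, deque


def node_potentials(tree_costs, root):
    """Potentials pi with c_ij - pi(i) + pi(j) = 0 on every tree arc.

    tree_costs maps each tree arc (i, j) to its cost c_ij; pi(root) = 0.
    """
    adjacent = defaultdict(list)
    for (i, j), cost in tree_costs.items():
        adjacent[i].append((j, cost, True))     # cross (i, j) forward
        adjacent[j].append((i, cost, False))    # cross (i, j) backward
    pi = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w, cost, forward in adjacent[v]:
            if w not in pi:
                # forward arc (v, w):  c - pi(v) + pi(w) = 0  =>  pi(w) = pi(v) - c
                # backward arc (w, v): c - pi(w) + pi(v) = 0  =>  pi(w) = pi(v) + c
                pi[w] = pi[v] - cost if forward else pi[v] + cost
                queue.append(w)
    return pi
```

The costs $c_{12} = 5$, $c_{52} = 2$ and $c_{32} = 3$ are back-solved from the potentials computed in the text. The remaining arcs of Fig. 8 are not reproduced. With these costs, `node_potentials({(1, 2): 5, (5, 2): 2, (3, 2): 3}, root=1)` returns `{1: 0, 2: -5, 5: -3, 3: -2}`, matching the values derived above.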

Consider any nontree arc (k, l). Adding this arc to the tree T creates a unique cycle, which we denote as $W_{kl}$. We refer to $W_{kl}$ as the fundamental cycle induced by the nontree arc (k, l). If $(k, l) \in L$, then we define the orientation of the fundamental cycle as in the direction of (k, l), and if $(k, l) \in U$, then we define the orientation opposite to that of (k, l). In other words, we define the orientation of the cycle in the direction of flow change permitted by the arc (k, l). We let $c(W_{kl})$ denote the change in the cost if we send one unit of flow on the cycle $W_{kl}$ along its orientation. (Notice that because of flow bounds, we might not always be able to send flow along the cycle $W_{kl}$.) Let $\overline{W}_{kl}$ denote the set of forward arcs in $W_{kl}$ (that is, those whose direction agrees with the orientation of the cycle), and let $\underline{W}_{kl}$ denote the set of backward arcs in $W_{kl}$ (that is, those whose direction is opposite to the orientation of the cycle). Then, if we send one unit of flow along $W_{kl}$, the flow on arcs in $\overline{W}_{kl}$ increases by one unit and the flow on arcs in $\underline{W}_{kl}$ decreases by one unit. Therefore,


$$c(W_{kl}) = \sum_{(i,j)\in\overline{W}_{kl}} c_{ij} - \sum_{(i,j)\in\underline{W}_{kl}} c_{ij}.$$

Let $c^\pi(W_{kl})$ denote the change in the reduced costs if we send one unit of flow in the cycle $W_{kl}$ along its orientation, that is,

$$c^\pi(W_{kl}) = \sum_{(i,j)\in\overline{W}_{kl}} c^\pi_{ij} - \sum_{(i,j)\in\underline{W}_{kl}} c^\pi_{ij}.$$

It is easy to show that $c^\pi(W_{kl}) = c(W_{kl})$. This result follows from the fact that when we substitute $c^\pi_{ij} = c_{ij} - \pi(i) + \pi(j)$ and add the reduced costs around any cycle, the node potentials cancel one another. Next notice that the manner in which we defined node potentials ensures that each arc in the fundamental cycle $W_{kl}$, except the arc (k, l), has zero reduced cost. Consequently, if arc $(k, l) \in L$, then

$$c(W_{kl}) = c^\pi(W_{kl}) = c^\pi_{kl},$$

and if arc $(k, l) \in U$, then

$$c(W_{kl}) = c^\pi(W_{kl}) = -c^\pi_{kl}.$$
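The cancellation claim can be checked term by term. In this sketch, written in the article's notation, the oriented cycle $W_{kl}$ visits the nodes $i_1, i_2, \dots, i_r$ and returns to $i_{r+1} = i_1$. A forward arc is traversed as $(i_s, i_{s+1})$ and a backward arc as $(i_{s+1}, i_s)$:

$$
\begin{aligned}
\text{forward arc } (i_s, i_{s+1}):&\quad +c^\pi_{i_s i_{s+1}} = c_{i_s i_{s+1}} + \bigl(\pi(i_{s+1}) - \pi(i_s)\bigr),\\
\text{backward arc } (i_{s+1}, i_s):&\quad -c^\pi_{i_{s+1} i_s} = -c_{i_{s+1} i_s} + \bigl(\pi(i_{s+1}) - \pi(i_s)\bigr),\\
c^\pi(W_{kl}) &= c(W_{kl}) + \sum_{s=1}^{r} \bigl(\pi(i_{s+1}) - \pi(i_s)\bigr) = c(W_{kl}) + \pi(i_{r+1}) - \pi(i_1) = c(W_{kl}).
\end{aligned}
$$

In both cases the potential terms contribute $\pi(i_{s+1}) - \pi(i_s)$, and these differences telescope to zero around the cycle.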


This observation and the negative cycle optimality condition (Theorem 1) imply that for a spanning tree solution to be optimal, it must satisfy the following necessary conditions:

$$c^\pi_{ij} \ge 0 \quad \text{for every arc } (i, j) \in L, \qquad (7)$$

$$c^\pi_{ij} \le 0 \quad \text{for every arc } (i, j) \in U. \qquad (8)$$


It is possible to show that these conditions are also 
sufficient for optimality; that is, if any spanning tree so- 
lution satisfies the conditions (7)—(8), then it solves the 
minimum cost flow problem. 
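Checking conditions (7) and (8) only requires the reduced costs of the nontree arcs. This Python sketch assumes that costs and potentials are stored in dictionaries and that L and U are given as collections of arcs. The function names are hypothetical. Returning the most violated arc first is one possible pivot rule, chosen here for illustration only.

```python
def reduced_cost(costs, pi, arc):
    """c^pi_ij = c_ij - pi(i) + pi(j)."""
    i, j = arc
    return costs[arc] - pi[i] + pi[j]


def violating_arcs(costs, pi, lower, upper):
    """Nontree arcs violating (7) (arcs in L) or (8) (arcs in U).

    Returns (arc, reduced cost) pairs sorted by decreasing |c^pi_ij|; an
    empty list means the spanning tree structure satisfies (7) and (8).
    """
    bad = [(a, reduced_cost(costs, pi, a)) for a in lower
           if reduced_cost(costs, pi, a) < 0]      # (7) requires c^pi >= 0
    bad += [(a, reduced_cost(costs, pi, a)) for a in upper
            if reduced_cost(costs, pi, a) > 0]     # (8) requires c^pi <= 0
    return sorted(bad, key=lambda item: abs(item[1]), reverse=True)
```

With hypothetical potentials {1: 0, 2: -5, 3: -2, 5: -3} and costs {(1, 3): 0, (2, 5): 1, (3, 5): 2}, taking L = {(1, 3), (2, 5)} and U = {(3, 5)} flags (1, 3) with reduced cost −2 and (3, 5) with reduced cost 1, while (2, 5) satisfies (7).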

We now have all the necessary ingredients to describe the network simplex algorithm. The algorithm maintains a feasible spanning tree structure at each iteration, which it successively transforms into an improved spanning tree structure until the solution becomes optimal. The algorithm first obtains an initial spanning tree structure. If an initial spanning tree structure is not easily available, then we could use the following method to construct one: for each node i ≠ 1 with b(i) ≥ 0, we connect node i to node 1 with an (artificial) arc of sufficiently large cost and large capacity; and for each node i with b(i) < 0, we connect node 1 to node i with an (artificial) arc of sufficiently large cost and capacity. These arcs define the initial tree T, all arcs in A define the set L, and U = ∅. Since these artificial arcs have large costs, subsequent iterations will drive the flow on these arcs to zero.

Given a spanning tree structure (T, L, U), we first check whether it satisfies the optimality conditions (7) and (8). If yes, we stop; otherwise, we select an arc $(k, l) \in L$ or $(k, l) \in U$ violating its optimality condition as an entering arc to be added to the tree T, obtain the fundamental cycle $W_{kl}$ induced by this arc, and augment the maximum possible flow in the cycle $W_{kl}$ without violating the flow bounds of the tree arcs. At this value of augmentation, the flow on some tree arc, say arc (p, q), reaches its lower or upper bound; we select this arc as an arc to leave the spanning tree T, adding it to L or U depending upon its flow value. We next add arc (k, l) to T, giving us a new spanning tree structure. We repeat this process until the spanning tree structure satisfies the optimality conditions. Figure 9 specifies the essential steps of the algorithm.

BEGIN
  determine an initial feasible tree structure (T, L, U);
  let x be the flow and let π be the corresponding node potentials;
  WHILE (some nontree arc violates its optimality condition) DO
  BEGIN
    select an entering arc (k, l) violating its optimality condition;
    add arc (k, l) to the spanning tree T, thus forming a unique cycle W_kl;
    augment the maximum possible flow δ in the cycle W_kl and
      identify a leaving arc (p, q) that reaches its lower or upper flow bound;
    update the flow x, the spanning tree structure (T, L, U) and the potentials π;
  END;
END

Minimum Cost Flow Problem, Figure 9
The network simplex algorithm

To illustrate the network simplex algorithm, we use the numerical example shown in Fig. 10a). Figure 10b) shows a feasible spanning tree solution for the problem. For this solution, T = {(1, 2), (1, 3), (2, 4), (2, 5), (5, 6)}, L = {(2, 3), (5, 4)} and U = {(3, 5), (4, 6)}. We next compute $c^\pi_{35} = 1$, so arc (3, 5) violates condition (8). We introduce the arc (3, 5) into the tree, creating a cycle. Since (3, 5) is at its upper bound, the orientation of the cycle is opposite to that of (3, 5). The arcs (1, 2) and (2, 5) are forward arcs in the cycle and arcs (3, 5) and (1, 3) are backward arcs. The maximum changes in flow permitted by the arcs (3, 5), (1, 3), (1, 2), and (2, 5) without violating their upper and lower bounds are, respectively, 3, 3, 2, and 1 units. Thus, we augment 1 unit of flow along the cycle. The augmentation increases the flow on arcs (1, 2) and (2, 5) by one unit and decreases the flow on arcs (1, 3) and (3, 5) by one unit. Arc (2, 5) reaches its upper bound and we select it as the leaving arc. We update the spanning tree structure; Fig. 10c) shows the new spanning tree T and the new node potentials. The sets L and U become L = {(2, 3), (5, 4)} and U = {(2, 5), (4, 6)}. In the next iteration, we select arc (4, 6) since this arc violates its optimality condition. We augment one unit of flow along the cycle 6 - 4 - 2 - 1 - 3 - 5 - 6 and arc (3, 5) leaves the spanning tree. Figure 10d) shows the next spanning tree and the updated node potentials. All nontree arcs satisfy the optimality conditions and the algorithm terminates with an optimal solution of the minimum cost flow problem.

Minimum Cost Flow Problem, Figure 10
Numerical example for the network simplex algorithm

The network simplex algorithm can select any non- 
tree arc that violates its optimality condition as an en- 
tering arc. Many different rules, called pivot rules, are 
possible for choosing the entering arc, and these rules 
have different empirical and theoretical behavior. [1] 
describes some popular pivot rules. We call the process of moving from one spanning tree structure to another a pivot operation. By choosing the right data structures for representing the tree T, it is possible to perform a pivot operation in O(m) time.

To determine the number of iterations performed by the network simplex algorithm, we distinguish two cases. We refer to a pivot operation as nondegenerate if it augments a positive amount of flow in the cycle $W_{kl}$ (that is, δ > 0), and degenerate otherwise (that is, δ = 0). During a nondegenerate pivot, the cost of the spanning tree solution decreases by $|c^\pi_{kl}|\,\delta$. When combined with the integrality of data assumption (Assumption 2) above), this result yields a pseudopolynomial bound on the number of nondegenerate iterations. However, degenerate pivots do not decrease the cost of flow and so are difficult to bound. There are methods to bound the number of degenerate pivots. Obtaining a polynomial bound on the number of iterations remained an open problem for quite some time; [6] suggested an implementation of the network simplex algorithm that runs in polynomial time. In any event, the empirical performance of the network simplex algorithm is very attractive. Empirically, it is one of the fastest known algorithms for solving the minimum cost flow problem.
The location-allocation problem may be stated in the following general way: Given the location or distribution of a set of customers (which could be probabilistic) and their associated demands for a given product or service, determine the optimal locations for a number of service facilities and the allocation of their products or services to the customers, so as to minimize total (expected) location and transportation costs. This problem finds a variety of applications involving the location of warehouses, distribution centers, service and production facilities, and emergency service facilities. In the last section we consider the development of an offshore oil field as a real-world application of the location-allocation problem. This problem involves the location of the oil platforms and the allocation of the oil wells to platforms.

It was shown in [25] that the joint location- 
allocation problem is NP-hard even with all the demand 
points located along a straight line. In the next sec- 
tion alternative location-allocation models will be pre- 
sented based on different objectives and the incorpo- 
ration of consumer behavior, price elasticity and sys- 
tem dynamics within the location-allocation decision 
framework. 


Location-allocation Models 


In developing location-allocation models, different alternative objectives can be examined. One possibility is to follow the approach in [5] and minimize the number of centers required to serve the population. This objective is appropriate when the demand is exogenously fixed. A more general objective is to maximize demand by optimally locating the centers, as proposed in [10]. Demand maximization requires the incorporation of price elasticity, representing the dependence of customer preference on the distance from the center. The cost of establishing the centers can also be incorporated in the model, as proposed in [13]. An alternative objective, which reflects customer preference for the nearest center, is the minimization of an aggregate weighted distance; this is called the median location-allocation problem.

The simplest type of location-allocation problem is 
the Weber problem, as posed in [9], which involves lo- 
cating a production center so as to minimize aggre- 
gate weighted distance from the different raw mate- 
rial sources. The extension of the Weber problem is 
the p-median location-allocation problem, which in- 
volves the optimal location of a set of p uncapaci- 
tated centers to minimize the total weighted distance 
between them and n demand locations. Here, each 
source is assumed to have infinite capacity. In continu- 
ous space, the p-median problem can be formulated as 
follows: 


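$$\min\; C = \sum_{i=1}^{n}\sum_{j=1}^{p} O_i\, c_{ij}\, X_{ij}$$

$$\text{s.t.}\quad \sum_{j=1}^{p} X_{ij} = 1, \quad i = 1,\dots,n,$$

$$X_{ij} \in \{0,1\}, \quad i = 1,\dots,n,\; j = 1,\dots,p,$$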

where $O_i$ is the quantity demanded at location i, whose coordinates are $(x_i, y_i)$, and $X_{ij}$ is the binary variable that is assigned the value 1 if demand point i is allocated to center j and zero otherwise. The above formulation allocates the consumers to their nearest center while ensuring that only one center will serve each customer. This, however, can lead to disproportionally sized facilities. In the more realistic situation where the capacities of the facilities are limited to supplies $s_1, \dots, s_n$ for the facilities $i = 1, \dots, n$, the location-allocation problem takes the following form [24]:


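$$\min\; C = \sum_{i=1}^{n}\sum_{j=1}^{p} w_{ij}\, c_{ij}$$

$$\text{s.t.}\quad \sum_{j=1}^{p} w_{ij} = s_i, \quad i = 1,\dots,n,$$

$$\sum_{i=1}^{n} w_{ij} = O_j, \quad j = 1,\dots,p,$$

$$w_{ij} \ge 0, \quad i = 1,\dots,n,\; j = 1,\dots,p,$$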


where $w_{ij}$ is the amount shipped from facility $i$, located at $(x_i, y_i)$, to destination $j$, whose demand is $d_j$. In the above formulations the distance (or the generalized transport cost, which is assumed to be proportional to distance) between a demand point and a supply point is represented by $c_{ij}$, computed with either the Euclidean metric

$$c_{ij} = \sqrt{(x_i - \alpha_j)^2 + (y_i - \beta_j)^2}$$

or the rectilinear metric

$$c_{ij} = |x_i - \alpha_j| + |y_i - \beta_j|,$$

where $(\alpha_j, \beta_j)$ are the coordinates of point $j$.

The rectilinear metric is appropriate when transportation occurs along a grid of city streets (Manhattan norm) or along the aisles of a shop floor [8].
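As a small illustration of the two metrics and of the nearest-center allocation implied by the p-median model, the following Python sketch evaluates the objective $\sum_i O_i c_{ij}$ for a fixed set of centers. The function names and the toy data are hypothetical and chosen only for illustration; ties are broken in favour of the lowest center index.

```python
import math

def euclidean(p, q):
    """Euclidean distance between p = (x, y) and q = (alpha, beta)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def rectilinear(p, q):
    """Rectilinear (Manhattan) distance |x - alpha| + |y - beta|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def p_median_cost(demand_pts, O, centers, dist=euclidean):
    """Allocate each demand point to its nearest center (X_ij = 1) and
    return (sum_i O_i * c_ij, allocation)."""
    total, alloc = 0.0, []
    for pt, Oi in zip(demand_pts, O):
        costs = [dist(pt, c) for c in centers]
        j = min(range(len(centers)), key=costs.__getitem__)
        alloc.append(j)
        total += Oi * costs[j]
    return total, alloc

demand = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
O = [1.0, 2.0, 1.0, 5.0]
centers = [(0.0, 0.0), (10.0, 10.0)]
print(p_median_cost(demand, O, centers))               # Euclidean
print(p_median_cost(demand, O, centers, rectilinear))  # rectilinear
```

With these data both metrics give the same allocation; with other data the choice of metric can change which center is nearest.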

The aforementioned location-allocation models are based on the assumption that consumers always prefer the nearest center to obtain service. In reality, however, several empirical studies reported in the literature [11] show that for many services consumers choose among several facilities rather than always patronizing the nearest one. The travel patterns of the consumers, for example, can produce a variety of allocations that differ from the nearest-center rule. To accommodate such behavior, a spatial-interaction model is incorporated within the uncapacitated p-median location-allocation model in the following manner:


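$$\min\; \frac{1}{\beta}\sum_{j}\sum_{i} S_{ij}\log S_{ij} + \sum_{j}\sum_{i} S_{ij}c_{ij}$$

$$\text{s.t.}\quad \sum_{j} S_{ij} = O_i,\quad i = 1,\dots,n,$$

$$\sum_{j} Y_j = p,$$

$$S_{ij} \le O_i Y_j,\quad i = 1,\dots,n,\; j = 1,\dots,p,$$

$$Y_j \in \{0,1\},\quad j = 1,\dots,p,$$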


where the decision variable $Y_j$ takes the value one if a facility is located at site $j$ and zero otherwise;

$$S_{ij} = A_i O_i Y_j \exp(-\beta c_{ij})$$

defines the interaction between demand location $i$ and facility $j$; and


$$A_i = \frac{1}{\sum_{r} Y_r \exp(-\beta c_{ir})},\qquad i = 1,\dots,n,$$


which ensures that the sum of all outflows from origin $i$ adds up to the amount of demand at that location; $\beta$ is either calibrated to match known interaction data or defined exogenously. The following relationship between the original p-median model and the spatial-interaction model is shown in [17]. The value of the optimal objective function at the solution of the p-median problem is given by


$$\sum_{i}\sum_{j} O_i X_{ij} c_{ij},$$


where $X_{ij}$ allocates demand to the nearest of the $p$ available centers. Turning to the spatial-interaction model, as the impedance parameter $\beta$ increases, the factor

$$\frac{Y_j \exp(-\beta c_{ij})}{\sum_{r} Y_r \exp(-\beta c_{ir})}$$

of $S_{ij}$ tends to $X_{ij}$, where $X_{ij} = 1$ if the travel time from $i$ to $j$ is smaller than the travel time from $i$ to any other open facility and zero otherwise. Therefore $S_{ij}$ tends to $O_i X_{ij}$, and the model allocates the demand to the nearest facility, as in the original p-median problem.
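The limiting behavior can be checked numerically. The sketch below, using a hypothetical helper `interactions` and made-up data, computes $S_{ij} = A_i O_i Y_j \exp(-\beta c_{ij})$ for increasing $\beta$. For numerical stability it subtracts the smallest open-facility cost $\min_r c_{ir}$ from every exponent; this factor cancels between numerator and denominator and leaves $S_{ij}$ unchanged.

```python
import math

def interactions(O, Y, c, beta):
    """Return S[i][j] = A_i * O_i * Y_j * exp(-beta * c[i][j]), where
    A_i = 1 / sum_r Y_r * exp(-beta * c[i][r]) makes each row sum to O_i."""
    S = []
    for i, Oi in enumerate(O):
        # Shift exponents by the cheapest open facility; the shift cancels in the ratio.
        cmin = min(c[i][r] for r in range(len(Y)) if Y[r])
        weights = [Y[j] * math.exp(-beta * (c[i][j] - cmin)) for j in range(len(Y))]
        total = sum(weights)
        S.append([Oi * w / total for w in weights])
    return S

O = [10.0, 20.0]              # demand at origins i = 0, 1
Y = [1, 1, 0]                 # facilities 0 and 1 open, facility 2 closed
c = [[1.0, 2.0, 0.5],         # travel costs c[i][j]
     [3.0, 1.0, 0.1]]

for beta in (0.0, 1.0, 5.0, 50.0):
    S = interactions(O, Y, c, beta)
    print(beta, [[round(s, 3) for s in row] for row in S])
```

At $\beta = 0$ each origin's demand is split evenly over the open facilities; by $\beta = 50$ essentially all of $O_i$ goes to the nearest open facility, while the closed facility 2 receives nothing even though it is the cheapest to reach.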

All the models mentioned above consider the static location-allocation problem, in which all activities take place at a single instant. These formulations are sufficient if neither the level nor the location of demand changes over time. An important factor in any location-allocation problem, however, is the dynamics of the system, which involves demand changes over time. In particular, in a competitive environment an optimal center location could become undesirable as new competing centers develop. Potential directions include the literature on decision making under uncertainty [12]. A.J. Scott [18] proposed a general framework for integrating the spatial and discrete temporal dimensions in location-allocation models. He proposed a modification of the location-allocation model so as to minimize an aggregate weighted transport cost over $T$ time periods, during which the number $n_t$, the level $O_{it}$ and the location $(x_{it}, y_{it})$ of the demand points change. If the locations differ greatly, the center is likely to relocate at some time, and the costs of relocation are included in the model. It was assumed that when a center relocates it incurs a fixed cost $w$. Based on these ideas, the formulation proposed for the uncapacitated location-allocation problem has the following form:


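$$\min\; a + \sum_{i=1}^{n_1} O_{i1}c_{i1} + \sum_{t=2}^{T}\Big(w\,\lambda_t + \sum_{i=1}^{n_t} O_{it}c_{it}\Big)$$

$$\text{s.t.}\quad \lambda_t \in \{0,1\},\quad t = 2,\dots,T.$$

Here $\lambda_t = 1$ if the center is relocated in period $t$ and 0 otherwise, and $c_{it}$ is the distance between demand point $i$ and the center in period $t$.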


The subscript $t$ refers to the different time periods, and $a$ is the cost of establishing the center in the first time period. The problem as formulated above is to locate in the first period one center that takes future variations into account. Extending this model allows a truly dynamic model to be replaced by a series of static problems, as proposed in [3]. This leads to a multilayer approach, where the objective is to locate each period's facility sequentially, given the previous periods' facility locations, so as to minimize the present period's cost. This strategy is appropriate whenever the periods are sufficiently long, or under uncertainty regarding future data or decisions. An alternative approach, proposed in [24], is a discounted present worth strategy, which is appropriate whenever the foregoing conditions do not hold. In this case the facilities are located one per period and the decisions are made in a rolling horizon framework.


Solution Approaches 


For the uncapacitated location-allocation problem using the Euclidean metric for the distances between each facility and the different demand points, R.F. Love and H. Juel [15] showed that the problem is equivalent to a concave minimization problem, for which they used several heuristic procedures. For capacitated problems in which the costs are proportional to $\ell_p^q$, that is, to the $q$th power of $\ell_p$ distances, where $p \ge 1$ and $q \ge 1$ are integers, M. Avriel [1] developed a geometric programming approach. H.D. Sherali and C.M. Shetty [22] proposed a polar cutting plane algorithm for the case $p = q = 1$. For the case $p = q = 2$, Sherali and C.H. Tuncbilek [23] proposed a branch and bound algorithm (cf. → MINLP: Branch and bound methods; → MINLP: Branch and bound global optimization algorithm) that uses a specialized tight linear programming representation to calculate strong lower bounds via a Lagrangian relaxation scheme. They exploit the special structure of the transportation constraints to derive a partitioning scheme. Additional cut-set inequalities are also incorporated to preserve partial solutions.

For the uncapacitated location-allocation model using the rectilinear distance metric, Love and J.G. Morris [16] developed an exact two-stage algorithm. R.E. Kuenne and R.M. Soland [14] developed a branch and bound algorithm based on a constructive assignment of customers to sources. The capacitated problem has been addressed in [19,21], which exploit the discrete equivalence of the capacitated location-allocation problem. In particular, [8] and [26] showed that

a) the optimal values of $x_i$ and $y_i$ for each $i$ can be taken to satisfy $x_i = \alpha_j$ for some $j$ and $y_i = \beta_j$ for some $j$, which means that the rectilinear distance location problem always has an optimal solution with the sources located at the grid points of the vertical and horizontal lines drawn through the existing customer locations; and

b) the optimal source locations lie in the convex hull of the existing facility locations.
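The two properties suggest a direct way to build the candidate sites $k = 1, \dots, K$: form every intersection of a vertical and a horizontal line through the customer locations and keep those that lie in the convex hull of the customers. The following Python sketch does this using Andrew's monotone chain algorithm for the hull. The function names and the toy data are hypothetical, and the sketch assumes the customers are not all collinear.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_hull(hull, q):
    """True if q is inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[k], hull[(k + 1) % n], q) >= 0 for k in range(n))

def candidate_sites(customers):
    """Grid points (alpha_j, beta_j') through customer coordinates, kept if in the hull."""
    xs = sorted({a for a, _ in customers})
    ys = sorted({b for _, b in customers})
    hull = convex_hull(customers)
    return [(x, y) for x in xs for y in ys if in_hull(hull, (x, y))]

triangle = [(0, 0), (4, 0), (0, 4)]
print(candidate_sites(triangle))   # [(0, 0), (0, 4), (4, 0)]; (4, 4) lies outside the hull
```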
Based on these ideas, and denoting by $k = 1, \dots, K$ the intersection grid points that also belong to the convex hull of the existing facility locations, [21] introduced the binary decision variables $z_{ik}$, which take the value 1 if source $i$ is located at point $k$ and zero otherwise. This leads to the following discrete location-allocation problem:


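$$\min\; \sum_{i=1}^{n}\sum_{j=1}^{p}\sum_{k=1}^{K} c_{ijk}\, w_{ij}\, z_{ik}$$

$$\text{s.t.}\quad \sum_{k=1}^{K} z_{ik} = 1,\quad i = 1,\dots,n,$$

$$\sum_{j=1}^{p} w_{ij} = s_i,\quad i = 1,\dots,n,$$

$$\sum_{i=1}^{n} w_{ij} = d_j,\quad j = 1,\dots,p,$$

$$w_{ij} \ge 0,\quad i = 1,\dots,n,\; j = 1,\dots,p,$$

$$z_{ik} \in \{0,1\},\quad i = 1,\dots,n,\; k = 1,\dots,K,$$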


where $c_{ijk} = c_{ij}\,\big[\,|\bar\alpha_k - \alpha_j| + |\bar\beta_k - \beta_j|\,\big]$, with $(\bar\alpha_k, \bar\beta_k)$ the coordinates of grid point $k$ and $c_{ij}$ here playing the role of a cost per unit of flow and unit distance. The above model is a mixed integer bilinear programming problem. See [19] for a related version of this discrete-site location-allocation problem involving a one-to-one assignment restriction and fixed charges. See [20] for the solution of the problem as a bilinear programming problem: the binary variables $z$ can be treated as continuous nonnegative variables, because the problem structure preserves the binariness of $z$ at optimality. However, [21] shows that it is more useful to exploit the binary nature of the $z$ variables for the efficient solution of the above model. Before giving more details of this branch and bound based approach, we mention the very widely used heuristic proposed in [4]. This so-called alternating procedure exploits the fundamental concepts of the location-allocation problem: it simply alternates between allocating demand to centers and relocating the centers until some convergence criterion is met. For the uncapacitated p-median problem, the alternating procedure involves iterating through the following equations:


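$$\alpha_j = \frac{\sum_{i=1}^{n} O_i X_{ij}\, x_i / c_{ij}}{\sum_{i=1}^{n} O_i X_{ij} / c_{ij}},\qquad
\beta_j = \frac{\sum_{i=1}^{n} O_i X_{ij}\, y_i / c_{ij}}{\sum_{i=1}^{n} O_i X_{ij} / c_{ij}},$$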


which are obtained by differentiating the objective function with respect to the center coordinates $\alpha_j$ and $\beta_j$ and setting the partial derivatives to zero; since $c_{ij}$ itself depends on $(\alpha_j, \beta_j)$, the equations are solved by fixed-point iteration. The major drawback of this procedure is that it does not guarantee global optimality. This is a real concern, because the spatial configurations of a local optimum and the global optimum may be very different. As a rule, repeated runs from numerous starting points should be undertaken, although there is no guarantee that the best solution found is the global optimum. Note, however, that the procedure applies to all the different models of the location-allocation problem.
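A minimal sketch of the alternating procedure for the uncapacitated Euclidean p-median problem follows, assuming the location step uses the fixed-point equations above (the Weiszfeld iteration). The function names, tolerances and toy data are hypothetical. The guard that stops the iteration when a center lands exactly on a demand point is a simplification, not a proper treatment of that degenerate case.

```python
import math

def weiszfeld(points, weights, start, iters=500, tol=1e-10):
    """Locate one center by iterating alpha = sum(O x / c) / sum(O / c), likewise beta."""
    a, b = start
    for _ in range(iters):
        num_a = num_b = den = 0.0
        for (x, y), w in zip(points, weights):
            c = math.hypot(x - a, y - b)
            if c < tol:                      # simplification: stop on a demand point
                return a, b
            num_a += w * x / c
            num_b += w * y / c
            den += w / c
        a_new, b_new = num_a / den, num_b / den
        if math.hypot(a_new - a, b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b

def total_cost(points, weights, centers, alloc):
    """Weighted distance sum_i O_i * c_ij for a given allocation."""
    return sum(w * math.dist(pt, centers[j]) for pt, w, j in zip(points, weights, alloc))

def alternate(points, weights, centers, rounds=100, tol=1e-8):
    """Alternate nearest-center allocation and Weiszfeld relocation until no center moves."""
    centers = list(centers)
    for _ in range(rounds):
        # Allocation step: X_ij = 1 for the nearest center j.
        alloc = [min(range(len(centers)), key=lambda j: math.dist(pt, centers[j]))
                 for pt in points]
        # Location step: one single-facility problem per center.
        new_centers = []
        for j, c in enumerate(centers):
            members = [k for k, a in enumerate(alloc) if a == j]
            if not members:
                new_centers.append(c)        # keep an empty center where it is
                continue
            new_centers.append(weiszfeld([points[k] for k in members],
                                         [weights[k] for k in members], c))
        moved = max(math.dist(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if moved < tol:
            break
    return centers, alloc, total_cost(points, weights, centers, alloc)

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
O = [1, 1, 1, 1, 1, 1]
start = [(0.5, 0.5), (5.0, 5.0)]
centers, alloc, cost = alternate(pts, O, start)
print(alloc, [tuple(round(v, 3) for v in c) for c in centers], round(cost, 3))
```

Each run only reaches a local optimum; in line with the text, one would call `alternate` from several starting configurations and keep the cheapest result.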

Returning to the approach proposed in [21] for the rectilinear capacitated location-allocation problem, the following linear reformulation of the problem is used:
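$$\min\; \sum_{i=1}^{n}\sum_{j=1}^{p}\sum_{k=1}^{K} c_{ijk}\, x_{ijk}$$

$$\text{s.t.}\quad \sum_{k=1}^{K} x_{ijk} - w_{ij} = 0,\quad \forall (i,j),$$

$$\sum_{j=1}^{p} x_{ijk} - s_i z_{ik} = 0,\quad \forall (i,k),$$

$$-x_{ijk} + u_{ij} z_{ik} \ge 0,\quad \forall (i,j,k),$$

$$\sum_{k=1}^{K} z_{ik} = 1,\quad \forall i,$$

$$\sum_{i=1}^{n} w_{ij} = d_j,\quad \forall j,$$

$$w_{ij} \ge 0,\ \forall (i,j),\qquad z_{ik} \in \{0,1\},\ \forall (i,k),\qquad x_{ijk} \ge 0,\ \forall (i,j,k),$$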


where $x_{ijk}$ stands for the product $w_{ij} z_{ik}$ and $u_{ij} = \min\{s_i, d_j\}$. The above model is a mixed integer linear programming problem, to which a specialized branch and bound algorithm is applied, based on tight lower bounds derived from a suitable Lagrangian dual formulation.

Briefly, for location-allocation problems with embedded spatial-interaction equations, dual-based exact methods [17] and heuristic approaches [2] have been developed.


Application: Development of Offshore Oil Fields 


In this section a real-world application of the location-allocation problem is presented: the minimum-cost development of offshore oil fields [6]. The facilities to be located are the platforms, and the demands to be allocated are the oil wells. Initially, the locations of the production wells of an oil field are decided, each well being specified by two map coordinates and a depth coordinate. The drilling is performed directionally from fixed platforms. The cost of drilling depends on the length and angle of the well from the platform. The platform cost depends on the water depth and on the number of wells to be drilled from the platform. Consequently, for a large number of wells (25 to 300), an optimization problem arises: to find the number, size and location of the platforms and the allocation of wells to platforms so as to minimize the sum of platform and drilling costs.

In order to formulate this problem the following indices, parameters and variables are introduced. Let $m$ denote the number of wells and $i$ the well index, and $n$ the number of platforms and $j$ the platform index. The binary variable $z_{ij}$ takes the value 1 if well $i$ is allocated to platform $j$ and 0 otherwise; $S_j$ is the capacity of platform $j$, i.e., the number of wells drilled from it; $(a_i, b_i)$ are the location coordinates of well $i$ and $(x_j, y_j)$ the location of platform $j$; $d_{ij} = \sqrt{(x_j - a_i)^2 + (y_j - b_i)^2}$ is the horizontal Euclidean distance between well $i$ and platform $j$; $g(d_{ij})$ is the drilling cost function, which depends on the distance $d_{ij}$; and $P(S_j, x_j, y_j)$ is the platform cost, a function of the platform size $S_j$ and its location. Based on this notation, the location-allocation problem can be formulated as follows:


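$$\min\; \sum_{i=1}^{m}\sum_{j=1}^{n} z_{ij}\, g(d_{ij}) + \sum_{j=1}^{n} P(S_j, x_j, y_j)$$

$$\text{s.t.}\quad \sum_{j=1}^{n} z_{ij} = 1,\quad \forall i,$$

$$\sum_{i=1}^{m} z_{ij} = S_j,\quad \forall j,$$

$$z_{ij} \in \{0,1\},\quad \forall (i,j),$$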


where the first set of constraints guarantees that each well is assigned to exactly one platform, and the second set guarantees that exactly $S_j$ wells are assigned to platform $j$. Note that $n$ is fixed in the problem and is usually small, of the order of 3 to 5. The nature of the problem depends upon the forms of the drilling cost function and the platform cost function. The approach taken in [6] is the alternating location-allocation method presented in the previous section. For this specific problem the approach involves the following steps:
a) given fixed platform locations find a minimum cost 
allocation of wells to platforms; 
b) given fixed allocation of wells to platforms find the 
minimum total cost location for each platform. 
The procedure alternates between steps a) and b) until convergence is achieved. The convergence criterion is the following: the solution of step a) generates a set of $n$ subproblems, one for each platform, and the solution of these subproblems relocates the platforms. The iterations continue until no further changes occur. As mentioned above, the solution obtained from this procedure is a local optimum, in the sense that for the given assignment of wells to platforms it cannot be improved by changing the locations, and for the given locations it cannot be improved by altering the assignment of wells to platforms. The mathematical formulation of problem a), the allocation subproblem, is the following:


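$$\min\; \sum_{i=1}^{m}\sum_{j=1}^{n} z_{ij}\, g(d_{ij}) + \sum_{j=1}^{n} P(S_j)$$

$$\text{s.t.}\quad \sum_{j=1}^{n} z_{ij} = 1,\quad \forall i,$$

$$\sum_{i=1}^{m} z_{ij} = S_j,\quad \forall j,$$

$$z_{ij} \in \{0,1\},\quad \forall (i,j);$$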

note that the platform cost now depends only on $S_j$, since the locations of the platforms are known. The solution procedure for this problem depends on the form of the platform cost $P(S_j)$. Five different forms are discussed in [6]:

1) Single fixed cost with no capacity constraints: $P(S_j) = a_j$. In this case the total platform cost is fixed, and the optimal allocation assigns each well to the platform with the smallest drilling cost, i.e., to the closest platform when $g$ increases with distance.

2) Single fixed cost with capacity constraints: $P(S_j) = a_j$, and capacity constraints are introduced as inequalities $\sum_{i=1}^{m} z_{ij} \le S_j$, $\forall j$. In this case the problem corresponds to a linear programming model.

3) Linear platform cost: $P(S_j) = a_j + b_j S_j$. Since $\sum_j b_j S_j = \sum_i \sum_j b_j z_{ij}$, the transformation $c'_{ij} = c_{ij} + b_j$, where $c_{ij} = g(d_{ij})$ is the drilling cost of well $i$ from platform $j$, reduces the problem to the form of case 1).

4) Piecewise linear platform cost. In this case the problem has the structure of a transshipment problem, which can be solved by network flow techniques.

5) Step function: $P(S_j) = \sum_{k=1}^{K_j} r_j^k z_j^k$, where $K_j$ is the number of different platform sizes available, $r_j^k$ is the cost of the $k$th size of platform $j$, and $z_j^k$ is a binary variable selecting that size. The problem in this case is a mixed integer linear programming problem.
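Case 3) can be checked on a tiny instance. The Python sketch below, with hypothetical function names and made-up costs, folds $b_j$ into the drilling cost as $c'_{ij} = c_{ij} + b_j$, assigns each well to the platform with the smallest adjusted cost, and compares the result with brute-force enumeration of all assignments. As in the model, all $n$ platforms are assumed to be built, so every fixed cost $a_j$ is paid.

```python
import itertools

def allocate_linear_cost(c, b):
    """Case 3): assign each well i to the platform j minimizing c[i][j] + b[j]."""
    alloc = []
    for row in c:
        adjusted = [cij + bj for cij, bj in zip(row, b)]
        alloc.append(min(range(len(b)), key=adjusted.__getitem__))
    return alloc

def total_cost(c, a, b, alloc):
    """Drilling cost plus platform cost P(S_j) = a_j + b_j * S_j for every platform."""
    S = [0] * len(a)
    for j in alloc:
        S[j] += 1
    drilling = sum(c[i][j] for i, j in enumerate(alloc))
    return drilling + sum(a[j] + b[j] * S[j] for j in range(len(a)))

c = [[3.0, 5.0], [4.0, 2.0], [6.0, 6.5], [1.0, 9.0]]   # c[i][j] = g(d_ij)
a = [100.0, 80.0]                                       # fixed platform costs
b = [2.0, 0.0]                                          # cost per well drilled

alloc = allocate_linear_cost(c, b)
best = min(itertools.product(range(len(a)), repeat=len(c)),
           key=lambda z: total_cost(c, a, b, z))
print(alloc, total_cost(c, a, b, alloc), total_cost(c, a, b, best))
```

The greedy assignment and the enumerated optimum have the same cost; enumeration grows as $n^m$ and is only a check, whereas the greedy rule scales linearly in the number of wells.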

The mathematical formulation of problem b), the location problem, is the following. Let $A_j$ be the set of indices of the wells assigned to platform $j$, so that $z_{ij} = 1$ for $i \in A_j$ and $z_{ij} = 0$ otherwise; the problem for platform $j$ then takes the form

$$\min_{x_j,\,y_j}\; \sum_{i \in A_j} g(d_{ij}) + P(x_j, y_j).$$


Note that the platform cost is now a function of platform location only, since the size is known. Since the drilling cost function is convex, if the platform cost is also convex then the problem is the minimization of a convex function, which can be achieved with a local minimization algorithm. If the platform cost is nonconvex, however, global optimality cannot be guaranteed and global optimization techniques should be considered [7].

Finally, M.D. Devine and W.G. Lesso [6] applied the aforementioned procedure to two test problems, one involving 60 wells and 7 platforms and a second involving 102 wells and 3 platforms. In both cases they reported large economic savings in the field development.

Pooling and blending are inherent in many manufacturing plants with limited tankage available to store the intermediate streams produced by various processes. Also, chemical products often need to be transported as a mixture, either in a pipeline, a tank car or a tanker. In each case, blended or pooled streams are then used in further downstream processing. In modeling these processes, it is necessary to model not only product flows but also the properties of intermediate streams. The presence of these pools can introduce nonlinearities and nonconvexities in the model of the process, resulting in difficult problems with multiple local optima.

MINLP: Applications in Blending and Pooling Problems, Figure 1
General pooling and blending problem

Given a set of components $i$, a set of products $j$, a set of pools $l$ and a set of qualities $k$, let $x_{il}$ be the amount of component $i$ allocated to pool $l$, $y_{lj}$ the amount going from pool $l$ to product $j$, $z_{ij}$ the amount of component $i$ going directly to product $j$, and $p_{lk}$ the level of quality $k$ in pool $l$. Furthermore, let $A_i$, $D_j$ and $S_l$ be upper bounds on component availabilities, product demands and pool sizes, respectively; let $C_{ik}$ be the level of quality $k$ in component $i$, $P_{jk}$ the upper bound on quality $k$ in product $j$, $c_i$ the unit price of component $i$ and $d_j$ the unit price of product $j$. The general pooling and blending model can then be written as [1]:


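$$\max\; \sum_{j} d_j\Big(\sum_{l} y_{lj} + \sum_{i} z_{ij}\Big) - \sum_{i} c_i\Big(\sum_{l} x_{il} + \sum_{j} z_{ij}\Big)$$

$$\text{s.t.}\quad \sum_{l} x_{il} + \sum_{j} z_{ij} \le A_i,\quad \forall i,$$

$$\sum_{l} y_{lj} + \sum_{i} z_{ij} \le D_j,\quad \forall j,$$

$$\sum_{i} x_{il} \le S_l,\quad \forall l,$$

$$\sum_{i} x_{il} - \sum_{j} y_{lj} = 0,\quad \forall l,$$

$$-\sum_{i} C_{ik}\, x_{il} + p_{lk}\sum_{j} y_{lj} = 0,\quad \forall (l,k),$$

$$\sum_{l} (p_{lk} - P_{jk})\, y_{lj} + \sum_{i} (C_{ik} - P_{jk})\, z_{ij} \le 0,\quad \forall (j,k),$$

$$x_{il},\, y_{lj},\, z_{ij},\, p_{lk} \ge 0.$$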


The first two sets of constraints ensure that the amounts of components used and products made do not exceed the respective availabilities or demands. The third and fourth sets of constraints are material balance constraints around each pool, which ensure that there is no accumulation or overflow of material in the pools. The fifth set of constraints relates the quality of each pool to the quality of the components going into the pool (here the qualities are assumed to blend linearly, that is, the pool quality is a flow-weighted average of the component qualities). Finally, the sixth set of constraints ensures that any upper bound specifications on product qualities are met. These last two sets of constraints are bilinear and can cause significant difficulties in solving these models.

The general blending problem has a formulation similar to the one above, except that the pools need not be present; the components can be blended directly to make various products. It should also be noted that various other formulations are possible, involving multiple time periods, tanks and inventories for components and products, and costs for pooling. Moreover, not all components need to go through all pools. One example of a simplified pooling model, due to C.A. Haverly [8,9], is given in Fig. 2, where three components with varying sulfur contents are to be blended to form two products. There is a maximum sulfur restriction on each product. The components have unit costs of 6, 13 and 10, respectively, while the products have unit prices of 9 and 15, respectively. The mathematical model for the problem consists of writing mass and sulfur balances for the various streams, and can be formulated as


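$$\max\; 9(y_{11} + z_{31}) + 15(y_{12} + z_{32}) - 6x_{11} - 13x_{21} - 10(z_{31} + z_{32})$$

$$\text{s.t.}\quad x_{11} + x_{21} - y_{11} - y_{12} = 0,$$

$$p\,y_{11} + 2z_{31} - 2.5(y_{11} + z_{31}) \le 0,$$

$$p\,y_{12} + 2z_{32} - 1.5(y_{12} + z_{32}) \le 0,$$

$$p\,(y_{11} + y_{12}) - 3x_{11} - x_{21} = 0,$$

$$y_{11} + z_{31} \le 100,$$

$$y_{12} + z_{32} \le 200.$$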


The variable $p$ represents the sulfur content of the pool (and hence of $y_{11}$ and $y_{12}$) and is determined as the flow-weighted average of the sulfur contents of $x_{11}$ and $x_{21}$ (3% and 1%, respectively, as the last equality shows).
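Because the Haverly model is small, the two solutions discussed below can be checked directly. The following Python sketch evaluates the profit and lists any violated constraints for a candidate point. The dictionary keys mirror the variable names of the model, and the helper names are hypothetical.

```python
def haverly_profit(v):
    """Objective of the Haverly model."""
    return (9 * (v["y11"] + v["z31"]) + 15 * (v["y12"] + v["z32"])
            - 6 * v["x11"] - 13 * v["x21"] - 10 * (v["z31"] + v["z32"]))

def haverly_violations(v, tol=1e-9):
    """Names of the constraints of the Haverly model that v violates."""
    p = v["p"]
    checks = {
        "pool mass balance": abs(v["x11"] + v["x21"] - v["y11"] - v["y12"]) <= tol,
        "product 1 sulfur": p * v["y11"] + 2 * v["z31"] - 2.5 * (v["y11"] + v["z31"]) <= tol,
        "product 2 sulfur": p * v["y12"] + 2 * v["z32"] - 1.5 * (v["y12"] + v["z32"]) <= tol,
        "pool sulfur balance": abs(p * (v["y11"] + v["y12"]) - 3 * v["x11"] - v["x21"]) <= tol,
        "product 1 demand": v["y11"] + v["z31"] <= 100 + tol,
        "product 2 demand": v["y12"] + v["z32"] <= 200 + tol,
        "nonnegativity": all(val >= -tol for val in v.values()),
    }
    return [name for name, ok in checks.items() if not ok]

def point(**nonzero):
    """All flows default to zero; pass p and the nonzero flows."""
    v = dict(x11=0.0, x21=0.0, y11=0.0, y12=0.0, z31=0.0, z32=0.0, p=0.0)
    v.update(nonzero)
    return v

local_opt = point(p=2.5, x11=75, x21=25, y11=100)
global_opt = point(p=1.5, x11=50, x21=150, y12=200)
for v in (local_opt, global_opt):
    print(haverly_profit(v), haverly_violations(v))   # 125.0 [] then 750.0 []
```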


Characteristics of Pooling and Blending Problems 
Multiple Solutions 


The presence of the nonconvex constraints needed to define pool and product qualities often results in multiple local solutions in these models. For example, consider the optimal solution of the Haverly pooling problem as a function of the pool quality $p$, as shown in Fig. 3. It can be seen that the problem has three solutions:

1) a local maximum of 125 at $p = 2.5$ with $x_{11} = 75$, $x_{21} = 25$, $y_{11} = 100$ and all other variables zero;

2) a saddle point region with $1 < p < 2$, all flows zero and a profit of zero; and

3) a global maximum of 750 at $p = 1.5$ with $x_{11} = 50$, $x_{21} = 150$, $y_{12} = 200$ and all other variables zero.

MINLP: Applications in Blending and Pooling Problems, Figure 2
Haverly pooling problem

MINLP: Applications in Blending and Pooling Problems, Figure 3
Optimal solution to Haverly pooling problem

It is not uncommon for a large pooling problem to have many dozens of local optima, with objective function values differing by small amounts but with the flow and quality variables taking vastly different values.




Nonlinear Blending 


For the sake of simplicity, it is often assumed in formulating these models that the qualities to be tracked blend linearly by volume or weight of each component. In practice, however, this is rarely the case. For example, one of the properties commonly tracked in refinery blends is the Reid vapor pressure (RVP), which measures the volatility of a blend. The most commonly used blending rule for RVP is the Chevron method:

$$\Big(\sum_i x_i\Big) R^{1.25} = \sum_i x_i\, r_i^{1.25},$$

where $r_i$ is the RVP of component $i$, $x_i$ is its volume, and $R$ is the RVP of the blend. Including such a nonlinear equation in the model can make it difficult to solve. Fortunately, this can be avoided by introducing a blending index, defined as

$$\hat R = R^{1.25}.$$

Then all specifications on the blend RVP can be converted using the same index. For example, if there is a lower bound $R^L$ on the blend RVP, then using the blending index results in the constraints

$$\Big(\sum_i x_i\Big)\hat R = \sum_i x_i\, \hat r_i,\qquad \hat R \ge (R^L)^{1.25},$$

where $\hat r_i = r_i^{1.25}$ is the blending index of component $i$ (equivalently, $\sum_i x_i \hat r_i \ge (R^L)^{1.25}\sum_i x_i$, which is linear in the volumes).


In some cases, the properties (such as octane number or 
pour point) can require complex blending rules which 
cannot be simplified using the blending index, and the 
full nonlinear blending equation must be included in 
the model as is. 
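A small numerical illustration of the Chevron rule and of the index-based specification follows. The Python sketch below (hypothetical function names, made-up RVP values) computes the blend RVP from component volumes and checks a lower-bound specification in the linear form $\sum_i x_i \hat r_i \ge (R^L)^{1.25}\sum_i x_i$. The two checks agree because $t \mapsto t^{1.25}$ is increasing for $t \ge 0$.

```python
def blend_rvp(volumes, rvps, exponent=1.25):
    """Chevron rule: (sum x_i) R^1.25 = sum x_i r_i^1.25, solved for R."""
    index = sum(x * r ** exponent for x, r in zip(volumes, rvps)) / sum(volumes)
    return index ** (1.0 / exponent)

def meets_lower_rvp(volumes, rvps, r_low, exponent=1.25):
    """Lower-bound specification written with blending indices; linear in the volumes."""
    lhs = sum(x * r ** exponent for x, r in zip(volumes, rvps))
    return lhs >= r_low ** exponent * sum(volumes)

vols, rvps = [1.0, 1.0], [8.0, 12.0]
R = blend_rvp(vols, rvps)
print(round(R, 3))   # slightly above the volume average of 10
for r_low in (9.0, 10.0, 10.2, 11.0):
    print(r_low, meets_lower_rvp(vols, rvps, r_low), R >= r_low)
```

Because $t^{1.25}$ is convex, the blend RVP is never below the volume-weighted average of the component RVPs, which is why the linear-blending assumption underestimates it here.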


Single versus Multiperiod Models 


Since components are pooled or blended in plants on a regular basis, it is often advantageous to model these processes using multiple periods. With multiperiod models, it is possible to accumulate material in the pools or blend tanks, thereby facilitating the allocation of stocks ahead of time in anticipation of a future lifting of a valuable product. This requires the model to incorporate inventories (carry-over stock) in each tank or pool, resulting in more complex models. It is important to note that the periods need not be of the same duration. Often, the results of multiperiod models are implemented only for the first period, with the results for future periods being used for planning purposes. Therefore, initial periods are typically of shorter duration (say, a day each), while later periods might be as long as a month. In this way, the same multiperiod model can be used as an operating tool for the present and a planning tool for the future.

Another important consideration in multiperiod models is the disposition of stocks at the end of the final period. If the final inventories are included simply as variables, the optimal solution will almost always set them to zero. In practice, however, this is unrealistic, since running down stocks is not desired. This can be dealt with in several ways:

a) set the final inventory levels to reasonable values (say, the same as the inventory levels at the beginning of the first period);

b) assign a value to the final inventory, so that the model can decide whether it is worthwhile to produce stock to sell at the end of the final period.


Logical Constraints and MINLP Formulations 


It is often necessary to impose additional logical constraints that dictate how various components are to be blended in relation to each other. Modeling such constraints often requires the addition of integer variables, as discussed below.

a) If a component is used in a particular blend, then it must be present in at least a certain amount. This arises from the fact that it is usually not practical to blend in infinitesimally small quantities.

If $x$ represents the volume of such a component, then introducing a new binary variable $\delta \in \{0,1\}$ and the constraints

$$x - M\delta \le 0,\qquad x - m\delta \ge 0$$

is sufficient to ensure that this condition is satisfied. Here $M$ is a sufficiently large number, while $m$ represents the threshold value below which a component should not be blended in.

b) Each product can have at most $k$ components in its blend. This is typically imposed by limitations on how many streams can be physically blended in a reasonable amount of time. Again, introducing new binary variables and the constraints

$$x_i - M\delta_i \le 0,\quad i = 1,\dots,n,\qquad \delta_1 + \dots + \delta_n \le k,\qquad (\delta_1,\dots,\delta_n) \in \{0,1\}^n$$

ensures that this condition is met.
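c) If component A is to be present in the blend, then component B must also be present:

$$x_A - M\delta_A \le 0,\qquad x_B - m\delta_B \ge 0,\qquad \delta_B \ge \delta_A.$$

If $x_A > 0$, the first constraint forces $\delta_A = 1$, hence $\delta_B = 1$ and $x_B \ge m > 0$.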


Each of these logical constraints results in a mixed integer nonlinear programming (MINLP) model (cf. also → Mixed integer nonlinear programming). To date (2000), such models have not been used extensively in the practical solution of these problems in industry.
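The big-M encodings can be sanity-checked by enumeration. The Python sketch below applies the constraints of a) to every component and adds those of b) and c). For a given nonnegative vector of volumes, it searches all binary vectors $\delta$ and compares the outcome with a direct statement of the logical rules. The values of $m$, $M$, $k$ and the component pair are hypothetical, and $M$ is assumed to be a genuine upper bound on every volume.

```python
from itertools import product

def encoded_feasible(x, m, M, k, implies):
    """True if some binary delta satisfies the big-M constraints of a), b) and c)."""
    for delta in product((0, 1), repeat=len(x)):
        semicontinuous = all(xi - M * d <= 0 and xi - m * d >= 0
                             for xi, d in zip(x, delta))          # a) for every component
        cardinality = sum(delta) <= k                                # b)
        implication = all(delta[b] >= delta[a] for a, b in implies)  # c) A used => B used
        if semicontinuous and cardinality and implication:
            return True
    return False

def logical_feasible(x, m, M, k, implies):
    """The same rules stated directly on the volumes."""
    used = [xi > 0 for xi in x]
    return (all(xi == 0 or m <= xi <= M for xi in x)
            and sum(used) <= k
            and all(used[b] for a, b in implies if used[a]))

m, M, k = 1.0, 10.0, 2
implies = [(0, 1)]          # if component 0 is used, component 1 must be used too
for x in [(0, 0, 0), (5, 5, 0), (5, 0, 5), (0.5, 5, 0), (5, 5, 5)]:
    print(x, encoded_feasible(x, m, M, k, implies), logical_feasible(x, m, M, k, implies))
```

Enumeration is exponential in the number of components and serves only as a check of the formulation; in an actual model the $\delta$ variables are handled by the MINLP solver.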


Complexity of Models 


With the various options of single versus multiperiod and linear versus nonlinear blending, the models for pooling and blending can vary significantly in complexity. This is shown pictorially in Fig. 4.


Solution Methods 


Pooling problems can be solved using a variety of solution algorithms. These can be broadly classified as local and global solution methods.


Local Optimization Approaches 


Traditionally, pooling and blending problems have been solved using various recursion and successive linear programming (SLP) techniques. The first published approach for solving the pooling problem was due to Haverly [8], who proposed the following recursion approach for solving the problem given in Fig. 2:

1) Start with a guess for the pool quality $p$.

2) Solve the remaining linear problem for all other variables.

3) Calculate a new value for $p$ from the solution in 2).


Unfortunately, this rather simple recursion will converge to a suboptimal solution regardless of the starting value for $p$. This can be partially addressed by using a 'distributed recursion' approach, in which an additional recursion coefficient $f$ and two additional 'correction vectors' are introduced, modifying the inequalities in the model as follows:

$$p\,y_{11} + 2z_{31} - 2.5(y_{11} + z_{31}) + f\,(\text{over} - \text{under}) \le 0,$$

$$p\,y_{12} + 2z_{32} - 1.5(y_{12} + z_{32}) + (1 - f)\,(\text{over} - \text{under}) \le 0.$$


This formulation distributes the error made in estimating the pool quality between the two pool destinations. Recursing on both $p$ and $f$ makes it more likely that the optimal solution will be identified.

SLP algorithms solve nonlinear models through a sequence of linear programs (LPs), each of which is a linearized version of the model around some base point. These methods replace nonlinear constraints of the form

$$g(x) \le 0,\qquad h(x) = 0$$

with the linearizations

$$g(x^k) + \nabla g(x^k)^\top (x - x^k) \le 0,\qquad h(x^k) + \nabla h(x^k)^\top (x - x^k) = 0$$

around a base point $x^k$ at the $k$th iteration. The linearized problems can be solved using standard LP methods, and the solution of each is used to provide the next base point $x^{k+1}$. As long as there is an improvement both in the objective function value and in the feasibility of the original constraints, these methods can be shown to converge to a local optimum. They work well for largely linear problems and have therefore found widespread use in the refining industry for solving pooling, blending and general refinery planning problems [4,11]. However, when there are nonlinear blending constraints, the linearization in SLP is often a poor approximation of the original problem, leading to poor convergence rates and long solution times.
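To see why the linearization can be poor, consider the bilinear pool-quality term $p\,y$ of the Haverly model, with $y = y_{11} + y_{12}$. Its first-order expansion around a base point $(p^k, y^k)$ differs from $p\,y$ by exactly $(p - p^k)(y - y^k)$, so the error grows with the distance from the base point. The Python sketch below (hypothetical helper names) builds the linearization generically from a value and a gradient and checks this identity.

```python
def linearize(h, grad_h, xk):
    """Return x -> h(xk) + grad_h(xk) . (x - xk), the SLP model of h around xk."""
    hk, gk = h(xk), grad_h(xk)
    return lambda x: hk + sum(g * (xi - xki) for g, xi, xki in zip(gk, x, xk))

h = lambda v: v[0] * v[1]                 # bilinear term p * y, with v = (p, y)
grad_h = lambda v: (v[1], v[0])           # (dh/dp, dh/dy)

base = (2.5, 100.0)                       # base point: pool quality and pool outflow
h_lin = linearize(h, grad_h, base)
for v in [(2.5, 100.0), (2.4, 110.0), (1.5, 200.0)]:
    error = h(v) - h_lin(v)
    print(v, h(v), h_lin(v), error)       # error equals (p - pk) * (y - yk)
```

Linearized at the local solution ($p = 2.5$, $y = 100$), the model overestimates $p\,y$ at the global solution ($p = 1.5$, $y = 200$) by 100, i.e., by a third of its true value of 300.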

Pooling and blending problems can also be solved using other nonlinear programming (NLP) methods, such as generalized reduced gradient, successive quadratic programming or penalty function methods. In general, these methods have not found wide acceptance for these problems, mainly because of difficulties with convergence and stability.

MINLP: Applications in Blending and Pooling Problems, Figure 4
Types of pooling problems


Global Optimization Approaches 


The recursive, SLP and conventional NLP techniques all suffer from the drawback that the solution found is highly dependent on the starting point, and in general they cannot guarantee convergence to the global solution. In the last dozen years, numerous approaches have been proposed for the solution of quadratically constrained optimization problems (such as the pooling/blending problem). Surveys of these algorithms can be found in [10,12]. These approaches can generally be classified as either decomposition-based or branch and bound algorithms.

One of the common approaches to dealing with the nonconvexities in the pooling problem is to replace the bilinear terms with linear terms over a convex envelope [2]. For any bilinear term $p \cdot y$,

$$(p - p^L)(y - y^L) \ge 0,$$

$$(p - p^U)(y - y^U) \ge 0,$$

$$(p - p^L)(y - y^U) \le 0,$$

$$(p - p^U)(y - y^L) \le 0,$$

where $[p^L, p^U]$ and $[y^L, y^U]$ define the ranges of the variables $p$ and $y$. Expanding each product and replacing $p \cdot y$ by a new variable $w$ turns these into linear inequalities; for example, the first one gives $w \ge y^L p + p^L y - p^L y^L$. This allows the term $p \cdot y$ to be replaced by a set of linear inequalities in the model, resulting in a linearized problem that provides an upper bound on the global solution of the original (maximization) problem. After solving this problem, the rectangle defined by the bounds on $p$ and $y$ can be subdivided into smaller rectangles, and a new linearized problem can be solved over each of these subrectangles. By successively subdividing these rectangles, the upper bound can be made to approach the global solution asymptotically. See [7] for the solution of several pooling problems using this approach.
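The envelope and the effect of subdivision can be illustrated numerically. The Python sketch below (hypothetical function names, box chosen for illustration) returns the interval for $w = p \cdot y$ implied by the four linear inequalities at a given point. The gap between the two bounds is largest at the center of the box, where it equals $(p^U - p^L)(y^U - y^L)/2$, so halving both ranges divides it by four.

```python
def envelope_bounds(p, y, pL, pU, yL, yU):
    """Interval for w = p*y allowed by the four linear envelope inequalities."""
    lower = max(yL * p + pL * y - pL * yL,      # from (p - pL)(y - yL) >= 0
                yU * p + pU * y - pU * yU)      # from (p - pU)(y - yU) >= 0
    upper = min(yU * p + pL * y - pL * yU,      # from (p - pL)(y - yU) <= 0
                yL * p + pU * y - pU * yL)      # from (p - pU)(y - yL) <= 0
    return lower, upper

def center_gap(pL, pU, yL, yU):
    """Width of the envelope interval at the center of the box."""
    lo, hi = envelope_bounds((pL + pU) / 2, (yL + yU) / 2, pL, pU, yL, yU)
    return hi - lo

print(center_gap(1.0, 3.0, 0.0, 200.0))   # whole box: 200.0
print(center_gap(1.0, 2.0, 0.0, 100.0))   # one of four subrectangles: 50.0
```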

Note that the pooling problem is a partially linear problem; that is, it can be formulated as

$$\min_{x,\,p}\; c^\top x \quad \text{s.t.}\quad A(p)\,x \le b, \tag{1}$$

where $p$ represents the pool qualities and $x$ all component flow rates. For such problems, decomposition approaches provide a natural solution mechanism. For a fixed value of $p$, this problem is linear, and its solution provides an upper bound on the global solution. The solution of this linear problem (called the 'primal' problem) can be used to generate a Lagrange function of the form


$$L(x, p) = c^\top x + \lambda^\top \big(A(p)\,x - b\big),$$

where $\lambda$ represents the multipliers (marginal values) of the constraints of the primal problem. Then the 'dual' problem

$$\min_{x,\,p,\,\mu_L}\; \mu_L \quad \text{s.t.}\quad \mu_L \ge L(x, p) \tag{2}$$

provides a lower bound on the global solution. Problem (2) contains bilinear terms of the form $A(p)x$, which can be underestimated in a variety of ways. C.A. Floudas and V. Visweswaran [5,6] developed the GOP algorithm based on this approach. By alternating between the primal problem and a series of relaxed dual problems (developed by successively partitioning the feasible region), the GOP algorithm guarantees convergence to the global solution. In [13,14], they show that it is possible to develop properties that reduce the number of relaxed dual problems that need to be solved, thus speeding up the overall algorithm. They also report the solution of numerous pooling and blending problems using this approach.

Instead of fixing p for the primal problem, it is pos- 
sible to solve (1) directly using local optimization tech- 
niques. For example, nonsmooth optimization tech- 
niques can be effective in finding local solutions to these 
problems [1]. The dual problem can also be solved this 
way, with the region for p being refined by partition- 
ing. See [1] for the solution of several pooling problems 
using this approach. 

It is important to note that these global optimiza- 
tion approaches (and others) for solving the pooling 
problem can be computationally intensive. Invariably, 
a large number of subproblems need to be solved be- 
fore convergence to a global solution can be guaranteed. 
Because the subproblems are usually of the same struc- 
ture, varying only slightly in the data for the problems, 
they can be solved in parallel. See [3] for an implemen- 
tation of a distributed parallel version of the GOP al- 
gorithm and a successful application to solve pooling 
problems of medium size. 


Applications 


The most common application of pooling and blending 
models is in the refining and petrochemical industries. 
Crude oil from various sources is often brought into 
the refinery and stored in common tanks before being 
processed downstream. Similarly, intermediate streams 
from various refinery processes (alkylation, reforming,
cracking) are usually sent to common pools from which
finished products such as gasoline and diesel oil are 
made. In both cases, it is important to know various 
qualities of the stream coming out of the pool (such as 
chemical compositions like sulfur or physical proper- 
ties such as vapor pressure). 

In addition to refinery processes, blending is a feature of various other manufacturing processes. These include
• agriculture, where blending livestock feeds or fertilizers at minimum cost is very important;
• mining, where different ores are often mixed to achieve a desired quality;
• various aspects of food manufacturing; and
• pulp and paper, involving blending of raw materials used to produce paper.
In the development of a process, the steady-state design aspects and dynamic operability issues are usually handled sequentially. First, the design engineers develop and synthesize the structure of the flowsheet and determine the operating parameters and steady-state operating conditions. Then, the control engineer takes the fixed design and develops a control system to maintain the system at the desired specifications. During the first step, the dynamic operation of the process is generally not considered, and in the second step, changes to the flowsheet and operating conditions generally cannot be made.

Process design seeks to determine the arrangement 
of processing units that will convert the given raw ma- 
terials into the desired products. The idea is to develop 
a process flowsheet from the large number of possible 
design alternatives. Numerous process design methods 
and techniques exist for determining the best process 
flowsheet and operating conditions. This best design is
determined by optimizing an economic criterion, and
the quality of the design is judged by its economic value.
Hence, the process is designed to operate at steady state 
and issues relating to the process dynamics, operability, 
and controllability are usually not considered. 

Once the process has been designed, the plans are 
handed over to the process control engineer whose task 
is to ensure the stable dynamic performance of the pro- 
cess. The control engineer is concerned with develop- 
ing a control system which maintains the operation of 
the process at the desired steady state in the presence
of ever-changing external influences. Issues such as
disturbances, uncertainty, and changes in production rates
must be addressed so as to maintain product quality 
and safe operation. By addressing the design and con- 
trol sequentially, the inherent connection between the 
two is neglected. For instance, the steady-state design of 
a process may appear to produce great economic profits. However, unfavorable dynamic operation may lead to a product which does not meet the required specifications. This may result in an economic loss due to disposal or reworking costs. Thus, a process design with good controllability aspects may have better economic value than an economically optimal steady-state design
when the dynamic operation is considered. This trade- 
off between the steady state design and the dynamic 
controllability motivates the treatment of the issues si- 
multaneously. 

There are additional incentives for employing a si- 
multaneous approach. Due to economic and environ- 
mental reasons, the recent trend in process design has 
been towards more highly integrated processes in terms
of both material and energy flows. Processes are also 
required to operate under much tighter operating con- 
ditions due to environmental and safety issues. Both of 
these lead to designs with increased dynamic interac- 
tions and processes which are generally more difficult 
to control. Thus, the dynamic operation of the process 
must be considered at the early stages of the design. 

A systematic method for analyzing the interaction 
of design and control requires quantitative controllabil- 
ity measures of the process. Such measures have been 
derived to quantify certain qualitative concepts about 
the controllability of the process such as inversion, in- 
teraction effects, and directionality problems. A com- 
mon measure for controllability is the integral squared 
error (ISE) between outputs and their desired levels. Al- 
though it is easy to measure, it is not of direct interest in 
practice. Other performance criteria such as maximum 
deviation of output variables, maximum magnitude of 
control variables, or time to return to steady state can 
also be used. 
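A minimal sketch of the ISE computation, assuming the usual definition $\text{ISE} = \int (z(t) - z^*)^2\,dt$ evaluated on sampled output data with the trapezoidal rule. The function name `ise` and the first-order test response are hypothetical.

```python
import math


def ise(times, outputs, setpoint):
    """Integral squared error of sampled outputs about a set-point (trapezoidal rule)."""
    if len(times) != len(outputs) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for k in range(1, len(times)):
        e0 = outputs[k - 1] - setpoint
        e1 = outputs[k] - setpoint
        total += 0.5 * (e0 * e0 + e1 * e1) * (times[k] - times[k - 1])
    return total


# Hypothetical first-order response z(t) = 1 - exp(-t) to a unit set-point change
ts = [0.01 * k for k in range(2001)]      # 0 to 20 time units
zs = [1.0 - math.exp(-t) for t in ts]
print(round(ise(ts, zs, 1.0), 3))         # prints 0.5 (exact ISE over [0, inf) is 0.5)
```

The single number hides where the error occurred, which is one reason the ISE alone may not reflect different dynamic characteristics of a process.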

Most of the work in the development of control- 
lability measures has focused on linear dynamic mod- 
els. The control objective is the robust performance of 
the process without any restrictions on the controller 
structure [15]. One such measure is the structured singular value, $\mu$, which indicates the performance in the presence of uncertainty. The condition number, $\gamma$, has been developed as an indicator of closed-loop sensitivity to model error, while the disturbance condition number, $\gamma_d$, indicates the sensitivity of the process to disturbances. The relative gain array (RGA), $\Lambda$, is used as an indicator of the relationship between control error and set-point changes, while the closed-loop disturbance
gain (CLDG) is used to measure the relation between 
control error and disturbances. These measures have 
been used extensively in applications for controllabil- 
ity assessment; however, they can be misleading. While 
these indicators give ideas as to the closed loop perfor- 
mance of the process, their impact on the economics of 
the process is not clear. 
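For a square steady-state gain matrix $G$, the RGA and the condition numbers can be computed in a few lines. The NumPy sketch below assumes the common definitions $\Lambda = G \circ (G^{-1})^T$ (elementwise product), $\gamma = \bar\sigma(G)/\underline\sigma(G)$, and, for a single disturbance direction $g_d$, $\gamma_d = \bar\sigma(G)\,\|G^{-1}g_d\|_2/\|g_d\|_2$. The gain matrix and disturbance vectors are hypothetical.

```python
import numpy as np


def relative_gain_array(G):
    """RGA: elementwise product of G and the transpose of its inverse."""
    G = np.asarray(G, dtype=float)
    return G * np.linalg.inv(G).T


def condition_number(G):
    """Ratio of the largest to the smallest singular value of G."""
    s = np.linalg.svd(np.asarray(G, dtype=float), compute_uv=False)
    return s[0] / s[-1]          # singular values come back in descending order


def disturbance_condition_number(G, gd):
    """gamma_d = sigma_max(G) * ||G^{-1} gd||_2 / ||gd||_2 for one disturbance gd."""
    G = np.asarray(G, dtype=float)
    gd = np.asarray(gd, dtype=float)
    x = np.linalg.solve(G, gd)   # input needed to reject gd at steady state
    return np.linalg.norm(G, 2) * np.linalg.norm(x) / np.linalg.norm(gd)


# Hypothetical 2x2 steady-state gain matrix G(0)
G = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(relative_gain_array(G))
print(condition_number(G))
print(disturbance_condition_number(G, [1.0, 1.0]),
      disturbance_condition_number(G, [1.0, -1.0]))
```

For this symmetric matrix, $\gamma = 3$, the diagonal relative gains are 4/3, and $\gamma_d$ ranges from 1 (disturbance along the strong direction) to $\gamma$ (disturbance along the weak direction). None of these numbers says anything directly about cost, which is the gap the following sections address.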


Previous Work 


In comparison to the amount of research on controllability measures, relatively little work has been
done on methods for systematically determining the
trade-offs between steady-state economics and dynamic 
controllability. Although economics continues to be the 
driving force in the design of a process, there is no 
straightforward method for evaluating the economics 
of the dynamic operation of the process. Several meth- 
ods have been proposed to address these issues. M. 
Morari and J.D. Perkins [14] discuss the concept of con- 
trollability and emphasize that the design of a control 
system for a process is part of the overall design of the 
process. Noting that a great amount of effort has been 
placed on the assessment of controllability, particularly 
for linear dynamic models, they indicate that very lit- 
tle has been published on algorithmic approaches for 
determination of process designs where economics and 
controllability are traded off systematically. 

In order to deal with controllability issues on an economic level, a back-off method was presented in [18] to determine the economic impact of disturbances on the system. The basic idea is to determine the optimal steady-state operating point such that feasible operation is maintained with respect to all constraints in the presence of uncertainties and disturbances. This operating point is compared to the optimal steady-state operating point determined in the absence of disturbances. The economic penalty incurred by backing away from the disturbance-free operating point to the feasible operating point can be determined, and thus the cost of the disturbance can be evaluated. This concept is illustrated in Fig. 1. Point A indicates the nominal steady-state design, and point B is the back-off point, which corresponds to the design that will not violate the constraints $h_1$ and $h_2$ in the presence of uncertainties and disturbances.
Figure 1
Illustration of the back-off approach 
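The back-off idea can be sketched for a hypothetical linear plant with a single throughput variable $x$, a single disturbance $d$ with $|d| \le d_{\max}$, and constraints of the form $a x + g_d d \le h$. Taking the worst case over the disturbance interval tightens each bound by $|g_d|\,d_{\max}$; the profit lost in moving from the nominal optimum (point A) to the backed-off optimum (point B) is the economic cost of the disturbance. The plant data, price and function name are assumptions for illustration, not the formulation of [18].

```python
def max_throughput(constraints, d_max=0.0):
    """Largest throughput x >= 0 with a*x + gd*d <= h for every |d| <= d_max.

    Each constraint is a tuple (a, gd, h) with a > 0.  The worst case of
    gd*d over the disturbance interval is |gd|*d_max, so each bound h is
    tightened ("backed off") by that amount.
    """
    x = min((h - abs(gd) * d_max) / a for a, gd, h in constraints)
    if x < 0:
        raise ValueError("no feasible operating point for this disturbance range")
    return x


# Hypothetical linear plant: two constraints h1, h2 and one disturbance d.
price = 10.0                        # profit per unit of throughput (assumed)
constraints = [(1.0, 0.5, 8.0),     # h1: x + 0.5 d <= 8
               (2.0, 2.0, 18.0)]    # h2: 2 x + 2 d <= 18

x_nominal = max_throughput(constraints)              # point A: no disturbance
x_backoff = max_throughput(constraints, d_max=1.0)   # point B: |d| <= 1
penalty = price * (x_nominal - x_backoff)
print(x_nominal, x_backoff, penalty)                 # 8.0 7.5 5.0
```

Here the nominal optimum lies on $h_1$ at $x = 8$; backing off by the worst-case disturbance moves it to $x = 7.5$, a penalty of 5 profit units. Because only the worst case is used, the sketch also shows why back-off designs tend to be conservative.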


The method is further developed in [17], where the 
control structure selection problem is analyzed. Perfect 
control assumptions are used along with a linearized 
model to formulate a mixed integer linear program 
(MILP) where the integer variables indicate the pairings 
between the manipulated and controlled variables. The 
back-off approach incorporates the dynamic operation
of the process into the design, but it only ensures the 
feasible operation of the process and does not directly 
address controllability aspects. 

An approach for determining process designs which 
are both steady-state and operationally optimal was 
presented in [2]. The controllability of potential designs 
is evaluated along with their economic performance 
by incorporating a model predictive control algorithm 
into the process design optimization algorithm. This 
coordinated approach uses an objective function which 
is a weighted sum of economic and controllability mea- 
sures. 

A multi-objective approach was proposed in [9,10] 
to simultaneously consider both controllability and 
economic aspects of the design. This approach incorpo- 
rates both design and control aspects into a process syn- 
thesis framework where the trade-offs between various 
open-loop controllability measures and the economics 
of the process can be observed. The problem is formu- 
lated as a mixed integer nonlinear program (MINLP), 
where integer variables are utilized for structural alternatives in the process flowsheet. Through the application of multi-objective techniques, a process design which is both economic and controllable is determined.

A screening approach was proposed in [4], where 
the variability in the product quality is used to com- 
pare different steady-state process designs. The dy- 
namic controllability is measured economically by cal- 
culating the amount of material produced that is off- 
specification and on-specification. The on-specification 
material leads to profits while the off-spec material re- 
sults in costs for reworking or disposal. 

A back-off technique was also developed in [1] for the design of steady-state and open-loop dynamic processes. Both uncertainties and disturbances are considered in determining the amount of back-off. Because back-off approaches address feasible operation but not controllability aspects, [5] introduces a recovery factor, defined as the ratio of the penalty recovered with control to the penalty with no control. This ratio is then used to rank different control strategies.

The advantage of the back-off approaches is that 
they determine the cost increase associated with mov- 
ing to the back-off position which is attributed to the 
uncertainties and disturbances. A limitation of this ap- 
proach is that it can lead to rather conservative designs 
since the worst-case uncertainty scenario is considered. 
Although the probability of the worst-case uncertainty 
occurring may not be high, this is the basis for the final 
design. Also, the method has not been applied to the 
design/synthesis problem. A fixed design is considered 
and then the back-off is considered as a modification of 
this design. 

The optimal design of dynamic systems under un- 
certainty was addressed in [13]. Flexibility aspects as 
well as the control design were considered simultane- 
ously with the process design. The algorithm is used to 
find the economic optimum which satisfies all of the 
constraints for a given set of uncertainties and distur- 
bances when the control system is included. 

S. Walsh and Perkins [23] outline the use of opti- 
mization as a tool for the design/control problem. They 
note that the advances in computational hardware and 
optimization tools have made it possible to solve the 
complex problems that arise in design/control. Their 
assessment focuses on the control structure selection 
problem where the economic cost of a disturbance is
balanced against the performance of the controller. 

The increasing importance of design and control issues has led to more and more discussion on the topic.
One contribution to the area has been [11]. The funda- 
mental design and control concepts are described and 
several quantitative examples are given which illustrate 
the interaction of design and control. 

Most of the previous work does not address syn- 
thesis issues and does not treat the problem quanti- 
tatively. Two methods employ the optimization ap- 
proach in process synthesis to arrive at mathematical 
programming formulations which are solved to deter- 
mine the trade-offs between the steady-state design and 
dynamic controllability. The first method [9,10] uses 
steady state linear controllability measures while the 
second method [20] uses full nonlinear dynamic mod- 
els of the process. 


Process Synthesis 


Mathematical programming has been found to be 
a very useful tool for process synthesis. Its application 
in analyzing the interaction of design and control has 
followed directly along the process synthesis methodol- 
ogy. 

The goal in process synthesis is to determine the structure and operating conditions of the process flowsheet.
The optimization approach to the synthesis problem in- 
volves three steps: 

1) The representation of process design alternatives of 
interest through a process superstructure. 

2) The mathematical modeling of the superstructure. 

3) The algorithmic development of a solution procedure to extract the optimal process flowsheet from the superstructure, and the solution of the optimization problem.

The key aspect is the postulation of a superstructure which contains all possible design alternatives of interest. The superstructure must be sufficiently rich so as to include the numerous design possibilities, yet succinct enough to eliminate redundancies and reduce complexities.

The mathematical model is characterized by the variables and equations used in the model. Continuous variables are used to represent flowrates, compositions, temperatures, etc. Binary variables are used to
represent structural alternatives such as the existence of
process units. The modeling of steady-state processes 
leads to algebraic equations and constraints and re- 
sults in an MINLP. When dynamic models are to be 
used, the continuous variables are partitioned into dy- 
namic state variables, control variables, and time invari- 
ant variables, and the resulting formulation is classified 
as a mixed integer optimal control problem (MIOCP). 


Steady-State Modeling Approach 


This approach was outlined in [9,10] and follows the 
optimization approach for process synthesis. A system- 
atic procedure is presented for incorporating open-loop 
steady-state controllability measures into the process 
synthesis problem. The problem is formulated mathe- 
matically as a MINLP and a multi-objective optimiza- 
tion problem is solved to quantitatively determine the 
best-compromise solution among the economic and 
control objectives. The ε-constraint method is used to
determine the noninferior solution set where one objec- 
tive can be improved only at the expense of another, 
and the best-compromise solution is determined using 
a cutting plane algorithm. 

In order to apply the process synthesis approach, 
the controllability measure must be expressed as a func- 
tion of the unknown design parameters. Steady-state 
controllability measures are used to simplify the prob- 
lem and reduce implementation difficulties that arise 
when considering controllability measures as functions 
of frequency. The steady-state gains of the process can 
be written in an analytical form thus allowing for an al- 
gebraic representation. 

The starting point for the controllability analysis is 
the linear multiple input/multiple output system writ- 
ten in the Laplace domain as 


$$z(s) = G(s)\,u(s) + G_d(s)\,d(s),$$

where $z$ are the output variables, $u$ are the control variables, $d$ are the disturbances, $G(s)$ is the process transfer function matrix, and $G_d(s)$ is the disturbance transfer function matrix.

Closed-loop control can be considered by express- 
ing the control variable u(s) as 


$$u(s) = G_c(s)\,\big(z^*(s) - z(s)\big),$$

where $G_c(s)$ is the controller transfer function and $z^*$ is the desired set-point. This requires that the form of the controller transfer function be known, as well as the method for calculating its parameters. Since this causes problems in the formulation of the optimization problem, controllability is viewed as a property inherent to the process and independent of the particular control system design. The analysis thus considers only open-loop controllability measures, which depend only on the process itself.

Since both the process design and controllability 
measures can be expressed as functions of the unknown 
design parameters, the synthesis problem can be ex- 
pressed as a multi-objective MINLP: 


$$
\begin{aligned}
\min\;& J(x,y) \\
\text{s.t. }\; & h(x,y) = 0 \\
& g(x,y) \le 0 \\
& \eta = \eta(x,y) \\
& x \in X \subseteq \mathbb{R}^p \\
& y \in \{0,1\}^q
\end{aligned}
$$

In this formulation, $J$ is a vector of objectives which includes both the economic objectives and the controllability objectives. The expressions $h$ and $g$ represent material and energy balances, thermodynamic relations, and other constraints. The controllability measures are included in the formulation as $\eta$, expressed as functions of $x$ and $y$. The variables in this problem are partitioned as continuous $x$ and binary $y$.

The problem is posed with multiple objectives rep- 
resenting the competing economic and open-loop con- 
trollability measures. Different techniques have been 
developed in order to assess the trade-offs among the 
objectives quantitatively. In this approach, the nonin- 
ferior solution set is generated to determine the set of 
solutions in which one objective can be improved only 
at the expense of the other(s). The noninferior solution 
set for a two objective problem is visually depicted in 
Fig. 2. 

This noninferior solution set is generated using an ε-constraint method, in which one objective is optimized and the others are included as constraints bounded above by a parameter ε. The problem is thereby reduced to a single objective optimization problem, which is solved iteratively for varying values of ε to generate the noninferior solution set.
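The ε-constraint loop can be sketched with a brute-force search over a finite set of candidate designs, which stands in here for the MINLP solver. The toy objectives $f_1(x) = x^2$ and $f_2(x) = (x-2)^2$ and the function names are hypothetical; the slopes computed at the end are only local trade-off ratios between neighboring points, not the cutting plane algorithm used to select the best-compromise solution.

```python
def eps_constraint(f1, f2, candidates, eps_values):
    """epsilon-constraint method by brute force over a finite candidate set.

    For each eps, minimize f1(x) subject to f2(x) <= eps and record
    (eps, x, f1(x), f2(x)).  Values of eps with no feasible candidate are skipped.
    """
    front = []
    for eps in eps_values:
        feasible = [x for x in candidates if f2(x) <= eps]
        if not feasible:
            continue
        best = min(feasible, key=f1)
        front.append((eps, best, f1(best), f2(best)))
    return front


def tradeoff_slopes(front):
    """Slopes d(f1)/d(f2) between consecutive points with distinct f2 values."""
    return [(b[2] - a[2]) / (b[3] - a[3])
            for a, b in zip(front, front[1:]) if b[3] != a[3]]


def f1(x):
    return x ** 2            # hypothetical economic objective


def f2(x):
    return (x - 2.0) ** 2    # hypothetical controllability objective


xs = [k / 500 for k in range(1001)]   # candidate designs on [0, 2]
front = eps_constraint(f1, f2, xs, [4.0, 2.25, 1.0, 0.25, 0.0])
for eps, x, v1, v2 in front:
    print(f"eps={eps:5.2f}  x={x:.3f}  f1={v1:.4f}  f2={v2:.4f}")
print(tradeoff_slopes(front))
```

Each tighter bound ε on $f_2$ forces a larger $f_1$, and every slope is negative, as expected for a noninferior set: one objective improves only at the expense of the other.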

Figure 2
Noninferior solution set for a problem with two objectives

By reducing the problem to a single objective problem, MINLP optimization techniques can be applied to solve the problem. These MINLP techniques include generalized Benders decomposition (GBD) [7,19], outer approximation (OA) [3], outer approximation with equality relaxation (OA/ER) [8], and outer approximation with equality relaxation and augmented penalty (OA/ER/AP) [22]. These are discussed in detail in [6].

Once the noninferior solution set is determined, the 
best compromise solution is determined by applying 
a cutting plane algorithm. The trade-offs among the ob- 
jectives are quantitatively assessed using weight factors 
which come from the slope of the noninferior solution 
set. 


Dynamic Modeling Approach 


The major limitation of the above approach is that it does not consider the dynamic behavior of the process. The dynamic modeling approach considers the full dynamic model of the process and a dynamic controllability measure. An optimization approach is applied which involves a dynamic optimization problem.

One of the initial difficulties with this method is 
defining a controllability measure for nonlinear dy- 
namic systems. As in the previous method, the control- 
lability measure must be capable of being expressed as 
a function of the unknown design parameters. One possible choice for the controllability measure is the integral square error (ISE). The benefit of this measure is that it is easy to calculate and does reflect the dynamics of the process, albeit only in the outputs of the process. One downside of this measure is that there is no one-to-one correspondence between the control structure and the ISE measure. Thus, different dynamic characteristics of the process may not be reflected in the ISE.

The superstructure is the same as in the previ- 
ous approach, but a dynamic model is used instead of 
a steady-state model. The dynamic modeling of the su- 
perstructure leads to a problem that includes differen- 
tial and algebraic equations (DAEs) and the formula- 
tion is a multi-objective MIOCP. New algorithmic tech- 
niques must be developed for the solution of the formu- 
lation. 

The general formulation for the multi-objective 
MIOCP is as follows: 


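$$
\begin{aligned}
\min\;& J\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y\big) \\
\text{s.t. }\; & f_1\big(\dot z_1(t), z_1(t), z_2(t), u(t), x, y, t\big) = 0 \\
& f_2\big(z_1(t), z_2(t), u(t), x, y, t\big) = 0 \\
& z_1(t_0) = z_1^0 \\
& z_2(t_0) = z_2^0 \\
& h'\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y\big) = 0 \\
& g'\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y\big) \le 0 \\
& h''(x, y) = 0 \\
& g''(x, y) \le 0 \\
& x \in X \subseteq \mathbb{R}^p \\
& y \in \{0, 1\}^q \\
& t_i \in [t_0, t_N], \quad i = 0, \dots, N
\end{aligned}
\qquad (1)
$$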


Here, $z_1(t)$ is a vector of $n$ dynamic variables whose time derivatives, $\dot z_1(t)$, appear explicitly; $z_2(t)$ is a vector of $m$ dynamic variables whose time derivatives do not appear explicitly; $x$ is a vector of $p$ time invariant continuous variables; $y$ is a vector of $q$ binary variables; and $u(t)$ is a vector of $r$ control variables. Time $t$ is the independent variable for the DAE system, where $t_0$ is the fixed initial time, $t_i$ are time instances, and $t_N$ is the final time. The DAE system is represented by $f_1$, the $n$ differential equations, and $f_2$, the $m$ dynamic algebraic equations. The constraints $h'$ and $g'$ are point constraints, where $t_i$ represents the time instance at which the constraint is enforced, and $h''$ and $g''$ are general constraints. The objective functions for the economic and controllability measures are represented by the vector $J$.

The initial condition for the above system is determined by specifying $n$ of the $2n + m$ variables $\dot z_1(t_0)$, $z_1(t_0)$, $z_2(t_0)$. For DAE systems with index 0 or 1, the remaining $n + m$ values can be determined. In this work, DAE systems of index 0 or 1 are considered, and the initial conditions for $z_1(t)$ and $z_2(t)$ are $z_1^0$ and $z_2^0$, respectively.

Note that in this general formulation, the $y$ variables appear in the DAE system as well as in the point constraints and general constraints. This has implications for the solution strategy.

As in the steady-state approach, an ε-constraint method is applied to address the multi-objective nature of the problem, reducing it to the iterative solution of single objective MIOCPs.


MIOCP Solution Algorithm 


The strategy for solving the MIOCP is to apply iterative decomposition strategies similar to existing MINLP algorithms, with extensions for handling the DAE system. The algorithm developed for the solution of the MIOCP closely parallels existing algorithms for MINLP optimization (GBD, OA, OA/ER, OA/ER/AP). In the general case, the presence of the $y$ variables in the DAE system prohibits the use of outer approximation and its variants. For the special cases where the $y$ variables do not appear in the DAEs and participate in a linear and separable fashion, outer approximation and its variants can be applied to the problem. The GBD algorithm can be applied to the solution of the general problem, and the algorithmic development closely follows that of GBD.

The GBD algorithm is an iterative procedure which 
generates upper and lower bounds on the solution of 
the MINLP formulation. The upper bound results from 
the solution of an NLP primal problem and the lower 
bound from an MILP master problem. The bounds on 
the solution converge in a finite number of iterations 
to yield the solution to the MINLP model. A similar 
methodology is applied to the MIOCP problem, but the
forms of the primal and master problems have to be al- 
tered. 
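The bounding logic of GBD can be illustrated on a hypothetical toy MINLP with one continuous variable $x$ and one binary $y$: minimize $(x-2)^2 + 3y$ subject to $x - 1 - 4y \le 0$. In this sketch the primal NLP is solved in closed form, and the master problem, which here only has to minimize the largest Lagrangian cut over $y \in \{0, 1\}$, is solved by enumeration rather than by an MILP solver. Infeasible primal problems, which do not arise in this toy, are not handled; all names are illustrative.

```python
def primal(y):
    """Primal NLP for fixed binary y, solved in closed form for this toy:

        min_x (x - 2)^2 + 3*y   s.t.   g(x, y) = x - 1 - 4*y <= 0

    Returns (objective, x*, mu*), where mu* is the KKT multiplier of g.
    """
    x = min(2.0, 1.0 + 4.0 * y)   # unconstrained minimizer is x = 2
    mu = 2.0 * (2.0 - x)          # stationarity 2(x - 2) + mu = 0; zero if g inactive
    return (x - 2.0) ** 2 + 3.0 * y, x, mu


def lagrangian(y, x, mu):
    """Lagrange function L(x, y, mu), used as a Benders cut in y."""
    return (x - 2.0) ** 2 + 3.0 * y + mu * (x - 1.0 - 4.0 * y)


def gbd(y0, max_iter=10, tol=1e-9):
    upper, lower = float("inf"), -float("inf")
    cuts, best, y = [], None, y0
    for _ in range(max_iter):
        obj, x, mu = primal(y)                    # primal gives an upper bound
        if obj < upper:
            upper, best = obj, (x, y)
        cuts.append((x, mu))

        # Master: min over binary y of max_k L(x_k, y, mu_k), by enumeration.
        def master_value(yy):
            return max(lagrangian(yy, xk, mk) for xk, mk in cuts)

        y = min((0, 1), key=master_value)
        lower = master_value(y)                   # master gives a lower bound
        if upper - lower <= tol:
            break
    return best, lower, upper


print(gbd(y0=1))   # ((1.0, 0), 1.0, 1.0)
```

Starting from $y = 1$, the first primal gives an upper bound of 3 and the master a lower bound of 0; the second iteration closes the gap at the optimum $x = 1$, $y = 0$.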


Primal Problem 


The primal problem is obtained by fixing the y variables 
which leads to an optimal control problem. For fixed 
values of y = y*, the MIOCP has the following form: 


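$$
\begin{aligned}
\min\;& J\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y^*\big) \\
\text{s.t. }\; & f_1\big(\dot z_1(t), z_1(t), z_2(t), u(t), x, y^*, t\big) = 0 \\
& f_2\big(z_1(t), z_2(t), u(t), x, y^*, t\big) = 0 \\
& z_1(t_0) = z_1^0 \\
& z_2(t_0) = z_2^0 \\
& h'\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y^*\big) = 0 \\
& g'\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y^*\big) \le 0 \\
& h''(x, y^*) = 0 \\
& g''(x, y^*) \le 0 \\
& x \in X \subseteq \mathbb{R}^p \\
& t_i \in [t_0, t_N]
\end{aligned}
\qquad (2)
$$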


The solution of this optimal control problem can be 
handled in several ways: complete discretization, so- 
lution of the necessary conditions, dynamic program- 
ming, and control parameterization. This work focuses 
on the control parameterization techniques which pa- 
rameterize only the control variables u(t) in terms of 
time invariant parameters. At each step of the optimiza- 
tion procedure, the DAEs are solved for given values 
of the decision variables and a feasible path for z(t) is 
obtained. This solution is used to evaluate the objec- 
tive function and remaining constraints. The control 
parameterization can either be open loop as described 
in [21] or closed-loop such as that described in [17] and 
[16] which also includes the control structure selection. 

The basic idea behind the control parameterization 
is to express the control variables u(t) as functions of 
time invariant parameters. This parameterization can 
be done in terms of the independent variable t (open 
loop): 


$$u(t) = \phi(w, t).$$


Alternatively, the parameterization can be done in 
terms of the state variables z(t) (closed-loop): 


$$u(t) = \psi\big(w, z_1(t), z_2(t)\big).$$


In both cases, $w$ are the time invariant control parameters. The set of time invariant parameters, $x$, is now expanded to include the control parameters:

$$x = \{x, w\}.$$

The set of DAEs ($f$) is expanded to include the parameterization functions

$$f(\cdot) = \{f(\cdot), \phi(\cdot), \psi(\cdot)\},$$

and the control variables are converted to dynamic state variables:

$$z = \{z, u\}.$$
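A sketch of open-loop control parameterization, assuming a piecewise-constant $\phi(w, t)$ on equal intervals and a hypothetical scalar model $dz/dt = -a z + u(t)$ in which $a$ plays the role of a design variable. The control parameters $w$ are appended to the design variables, so the integrator and the optimizer only see the flat vector $x = \{x, w\}$; the model and function names are illustrative assumptions.

```python
import math


def phi(w, t, t0, tf):
    """Piecewise-constant open-loop parameterization u(t) = phi(w, t).

    [t0, tf] is split into len(w) equal intervals; u equals w[j] on interval j.
    """
    n = len(w)
    j = min(int((t - t0) / (tf - t0) * n), n - 1)
    return w[j]


def simulate(x_aug, t0=0.0, tf=2.0, z0=0.0):
    """Integrate dz/dt = -a*z + u(t), z(t0) = z0, for x_aug = [a, w_1, ..., w_N].

    On each interval u is constant, so the linear ODE is solved exactly:
    z_end = u/a + (z_start - u/a) * exp(-a*dt).  Requires a > 0.
    Returns z at the interval boundaries.
    """
    a = x_aug[0]                  # hypothetical design parameter
    w = x_aug[1:]                 # time invariant control parameters
    dt = (tf - t0) / len(w)
    z, traj = z0, [z0]
    for j in range(len(w)):
        u = phi(w, t0 + (j + 0.5) * dt, t0, tf)   # control on interval j
        z_ss = u / a
        z = z_ss + (z - z_ss) * math.exp(-a * dt)
        traj.append(z)
    return traj


x_aug = [1.0] + [1.0, 1.0, 0.5, 0.5]   # {x, w}: design a, then four control levels
print(simulate(x_aug)[-1])
```

An optimizer would now treat all five entries of `x_aug` as ordinary time invariant decision variables, which is what removes $u(t)$ from the problem.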


Through the application of the control parameter- 
ization, the control variables are effectively removed 
from the problem and the following problem results: 


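$$
\begin{aligned}
\min\;& J\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^*\big) \\
\text{s.t. }\; & f_1\big(\dot z_1(t), z_1(t), z_2(t), x, y^*, t\big) = 0 \\
& f_2\big(z_1(t), z_2(t), x, y^*, t\big) = 0 \\
& z_1(t_0) = z_1^0 \\
& z_2(t_0) = z_2^0 \\
& h'\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^*\big) = 0 \\
& g'\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^*\big) \le 0 \\
& h''(x, y^*) = 0 \\
& g''(x, y^*) \le 0 \\
& x \in X \subseteq \mathbb{R}^p \\
& t_i \in [t_0, t_N], \quad i = 0, \dots, N
\end{aligned}
\qquad (3)
$$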


This problem is a nonlinear program with differential and algebraic constraints (NLP/DAE). It is solved using a parametric method in which the DAE system is solved as a function of the $x$ variables. The solution of the DAE system is achieved through an integration routine which returns the values of the $z$ variables at the time instances, $z(t_i)$, along with their sensitivities with respect to the parameters, $dz/dx(t_i)$. The resulting problem is an NLP optimization over the space of $x$ variables which has the form:

$$
\begin{aligned}
\min_{x}\;& J\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^*\big) \\
\text{s.t. }\; & h'\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^*\big) = 0 \\
& g'\big(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^*\big) \le 0 \\
& h''(x, y^*) = 0 \\
& g''(x, y^*) \le 0 \\
& x \in X \\
& t_i \in [t_0, t_N], \quad i = 0, \dots, N
\end{aligned}
\qquad (4)
$$

where the variables $\dot z_1(t_i)$, $z_1(t_i)$, and $z_2(t_i)$ are determined through the solution of the DAE system by integration:

$$
\begin{aligned}
& f_1\big(\dot z_1(t), z_1(t), z_2(t), x, y^*, t\big) = 0, \\
& f_2\big(z_1(t), z_2(t), x, y^*, t\big) = 0, \\
& z_1(t_0) = z_1^0, \\
& z_2(t_0) = z_2^0.
\end{aligned}
\qquad (5)
$$

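The functions $J(\cdot)$, $g'(\cdot)$, and $h'(\cdot)$ are functions of $z(t_i)$, which are implicit functions of the $x$ variables through the integration of the DAE system. For the solution of the NLP, the objective and constraint evaluations, along with their gradients with respect to $x$, are required. These are evaluated directly for the constraints $g''(x)$ and $h''(x)$. However, for the functions $J(\cdot)$, $g'(\cdot)$, and $h'(\cdot)$, the values $z(t_i)$ and the gradients $dz/dx(t_i)$, as returned from the integration, are used. The functions $J(\cdot)$, $g'(\cdot)$, and $h'(\cdot)$ are evaluated directly, and the gradients $dJ/dx$, $dg'/dx$, and $dh'/dx$ are evaluated by using the chain rule:

$$
\begin{aligned}
\frac{dJ}{dx} &= \left(\frac{\partial J}{\partial z}\right)\left(\frac{dz}{dx}\right) + \left(\frac{\partial J}{\partial x}\right), \\
\frac{dg'}{dx} &= \left(\frac{\partial g'}{\partial z}\right)\left(\frac{dz}{dx}\right) + \left(\frac{\partial g'}{\partial x}\right), \\
\frac{dh'}{dx} &= \left(\frac{\partial h'}{\partial z}\right)\left(\frac{dz}{dx}\right) + \left(\frac{\partial h'}{\partial x}\right).
\end{aligned}
\qquad (6)
$$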


Standard gradient based optimization techniques 
can be applied to solve this problem as an NLP. The so- 
lution of this problem provides values of the x variables 
and trajectories for z(t). 
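The sensitivity-plus-chain-rule evaluation in (6) can be checked on a hypothetical one-state model $dz/dt = -x z$, $z(0) = 1$, with objective $J = z(t_f)^2 + 0.1 x^2$ and $t_f = 1$. The forward sensitivity $s = dz/dx$ satisfies $ds/dt = -x s - z$ and is integrated alongside $z$ with a fixed-step fourth-order Runge-Kutta method; this is an assumption for the sketch, since practical codes use DAE integrators with built-in sensitivity support. The chain-rule gradient is compared with a central finite difference and with the analytic value $-2e^{-1} + 0.1$ at $x = 0.5$.

```python
import math


def integrate_with_sensitivity(x, tf=1.0, steps=200):
    """RK4 on dz/dt = -x*z, z(0) = 1, together with s = dz/dx.

    The sensitivity obeys ds/dt = (df/dz) s + df/dx = -x*s - z, s(0) = 0,
    and is integrated with the same steps as z.  Returns (z(tf), s(tf)).
    """
    def rhs(z, s):
        return -x * z, -x * s - z

    h = tf / steps
    z, s = 1.0, 0.0
    for _ in range(steps):
        k1 = rhs(z, s)
        k2 = rhs(z + 0.5 * h * k1[0], s + 0.5 * h * k1[1])
        k3 = rhs(z + 0.5 * h * k2[0], s + 0.5 * h * k2[1])
        k4 = rhs(z + h * k3[0], s + h * k3[1])
        z += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        s += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return z, s


def objective_and_gradient(x):
    """J = z(tf)^2 + 0.1 x^2 and dJ/dx = (dJ/dz)(dz/dx) + (explicit dJ/dx)."""
    z, s = integrate_with_sensitivity(x)
    J = z ** 2 + 0.1 * x ** 2
    return J, 2 * z * s + 0.2 * x


J, g = objective_and_gradient(0.5)
step = 1e-5
fd = (objective_and_gradient(0.5 + step)[0]
      - objective_and_gradient(0.5 - step)[0]) / (2 * step)
print(g, fd, -2 * math.exp(-1) + 0.1)   # the three values agree to several digits
```

Integrating the sensitivity with the same method as the state keeps the gradient consistent with the computed trajectory, which is what a gradient-based NLP solver needs.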

The master problem is formulated using dual information and the solution of the primal problem. Provided that the $y$ variables participate linearly, the problem is an MILP whose solution provides a lower bound and the $y$ variables for the next primal problem. Dual information is required from all of the constraints, including the DAEs, whose dual variables, or adjoint variables, are dynamic. The constraints and their corresponding dual variables are listed in Table 1.

Table 1
Constraints and their corresponding dual variables

Constraint | Dual variable
$f_1$ | $\nu_1(t)$
$f_2$ | $\nu_2(t)$
$g'$ | $\mu'$
$h'$ | $\lambda'$
$g''$ | $\mu''$
$h''$ | $\lambda''$

The dual variables $\mu'$, $\lambda'$, $\mu''$, and $\lambda''$ are generally obtained from the solution technique for the primal problem. Dual information from the DAE system is obtained by solving the adjoint problem for the DAE system, which has the following formulation:


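$$
\begin{aligned}
\bar\nu_1^T &= \nu_1^T \frac{\partial f_1}{\partial \dot z_1} \\
\dot{\bar\nu}_1^T &= \nu_1^T \frac{\partial f_1}{\partial z_1} + \nu_2^T \frac{\partial f_2}{\partial z_1} \\
0 &= \nu_1^T \frac{\partial f_1}{\partial z_2} + \nu_2^T \frac{\partial f_2}{\partial z_2}
\end{aligned}
\qquad (7)
$$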


This is a set of DAEs in which $\partial f_1/\partial \dot z_1$, $\partial f_1/\partial z_1$, $\partial f_2/\partial z_1$, $\partial f_1/\partial z_2$, and $\partial f_2/\partial z_2$ are known functions of time obtained from the solution of the primal problem. The variables $\nu_1(t)$ and $\nu_2(t)$ are the adjoint variables, and the solution of this problem is a backward integration in time with the following final time conditions:


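$$\left[\mu'^T \frac{\partial g'}{\partial z_1} + \lambda'^T \frac{\partial h'}{\partial z_1} + \nu_1^T \frac{\partial f_1}{\partial \dot z_1}\right]_{t = t_N} = 0.$$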


Thus, the Lagrange multipliers for the end-time con- 
straints are used as the final time conditions for the ad- 
joint problem and are not included in the master prob- 
lem formulation. 

Fig. 3: Superstructure for reactor-separator-recycle system
Fig. 4: Noninferior solution set for the reactor-separator-recycle system
Fig. 5: Dynamic responses of product compositions for three designs

The master problem is formulated using the solution of the primal problem, $x^k$ and $z^k(t)$, along with the dual information, $\mu^{\prime\prime k}$, $\lambda^{\prime\prime k}$, and $\nu^k(t)$. The master problem has the following form:


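$$
\begin{aligned}
\min_{y,\,\mu_b}\quad & \mu_b\\
\text{s.t.}\quad & \mu_b \ge J(x^k, y) + \int_{t_0}^{t_N} \nu_1^k(t)^T f_1\big(\dot z_1^k(t), z_1^k(t), z_2^k(t), x^k, y, t\big)\,dt\\
& \qquad + \int_{t_0}^{t_N} \nu_2^k(t)^T f_2\big(z_1^k(t), z_2^k(t), x^k, y, t\big)\,dt + (\mu^{\prime\prime k})^T g''(x^k, y) + (\lambda^{\prime\prime k})^T h''(x^k, y), \quad k \in K_{\text{feas}},\\
& 0 \ge \int_{t_0}^{t_N} \nu_1^k(t)^T f_1\big(\dot z_1^k(t), z_1^k(t), z_2^k(t), x^k, y, t\big)\,dt\\
& \qquad + \int_{t_0}^{t_N} \nu_2^k(t)^T f_2\big(z_1^k(t), z_2^k(t), x^k, y, t\big)\,dt + (\mu^{\prime\prime k})^T g''(x^k, y) + (\lambda^{\prime\prime k})^T h''(x^k, y), \quad k \in K_{\text{infeas}},\\
& y \in \{0, 1\}^q.
\end{aligned} \tag{8}
$$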


The integral terms can be evaluated because the profiles $z^k(t)$ and $\nu^k(t)$ are both fixed and known. Note that this formulation places no restriction on whether or not the $y$ variables participate in the DAE system.


Example: Reactor-Separator-Recycle System 


The example problem considered here is the design of a process involving a reaction step, a separation step, and a recycle loop. Fresh feed containing A and B flows into an isothermal reactor where the first-order irreversible reaction A → B takes place. The product from the reactor is sent to a distillation column, where the unreacted A is separated from the product B and sent back to the reactor. The superstructure is shown in Fig. 3.

The model equations for the reactor (CSTR) and 
the separator (ideal binary distillation column) can be 
found in [12]. The specific problem design follows the 
work in [10]. 

For this problem, the single output is the prod- 
uct composition. The bottoms (product) composition is 
controlled by the vapor boil-up and the distillate com- 
position is controlled by the reflux rate. Since only the 
product composition is specified, the distillate composition set-point is free and left to be determined through the optimization.

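The cost function includes the capital costs of the column and reactor and the utility costs:

$$
\begin{aligned}
\text{cost}_{\text{reactor}} &= 17639\, D_r^{1.066} (2 D_r)^{0.802},\\
\text{cost}_{\text{column}} &= 6299\, D_c^{1.066} (2.4 N_T)^{0.802} + 548.8\, D_c^{1.55} N_T,\\
\text{cost}_{\text{exchangers}} &= 193023\, V_s^{0.65},\\
\text{cost}_{\text{utilities}} &= 72420\, V_s,\\
\text{cost}_{\text{total}} &= \frac{\text{cost}_{\text{reactor}} + \text{cost}_{\text{column}} + \text{cost}_{\text{exchangers}}}{\beta_{\text{pay}}} + \beta_{\text{tax}}\,[\text{cost}_{\text{utilities}}].
\end{aligned}
$$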

The controllability measure is the time-weighted ISE for the product composition:

$$
\text{ISE} = \int_{t_0}^{t_N} t\,\big(x_p(t) - x_p^{sp}\big)^2\,dt.
$$
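The time-weighted ISE can be evaluated numerically once the composition trajectory of a design is known. The Python sketch below is illustrative only. It assumes the product composition is available at discrete sample times (for example, from the DAE integration of the primal problem) and applies the trapezoidal rule. The function name and argument layout are hypothetical.

```python
from typing import Sequence


def time_weighted_ise(t: Sequence[float], x: Sequence[float], x_sp: float) -> float:
    """Trapezoidal approximation of the integral of t * (x(t) - x_sp)**2 dt.

    t    -- increasing sample times (assumed output of a DAE integrator)
    x    -- product composition at those times
    x_sp -- composition set-point
    """
    if len(t) != len(x) or len(t) < 2:
        raise ValueError("t and x must have the same length (at least 2)")
    total = 0.0
    for k in range(len(t) - 1):
        # integrand t * e(t)^2 at both ends of the sampling interval
        f0 = t[k] * (x[k] - x_sp) ** 2
        f1 = t[k + 1] * (x[k + 1] - x_sp) ** 2
        total += 0.5 * (t[k + 1] - t[k]) * (f0 + f1)
    return total
```

Weighting by t penalizes errors that persist late in the response more heavily than the ordinary ISE does.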

The noninferior solution set is shown in Fig. 4, and Table 2 lists the solution information for three of the designs in the noninferior solution set. The dynamic profiles for these three designs are shown in Fig. 5.

All of the designs in the noninferior solution set are 
strippers. Since the feed enters at the top of the column, 
there is no reflux and thus no control loop for the dis- 
tillate composition. The controllability of the process is 
increased by increasing the size of the reactor and de- 
creasing the size of the column. The most controllable 
design has a large reactor and a single flash unit. 
Table 2
Solution information for three designs

Solution | A | B | C
Cost ($) | 489,000 | 534,000 | 736,000
Capital ($) | 321,000 | 364,000 | 726,000
Utility ($) | 168,000 | 170,000 | 10,000
ISE | 0.0160 | 0.00379 | 0.0011
Trays | 19 | 8 | 1
Feed | 19 | 8 | 1
$V_r$ (kmol) | 2057.9 | 3601.2 | 15000
$V$ (kmol/hr) | 138.94 | 141.25 | 85.473
$K_y$ | 90.94 | 80.68 | 87.40
$t_y$ (hr) | 0.295 | 0.0898 | 0.0156


See also 

> Chemical Process Planning 

> Control Vector Iteration 

> Duality in Optimal Control with First Order 
Differential Equations 

> Dynamic Programming: Continuous-time Optimal 
Control 

> Dynamic Programming and Newton’s Method in 
Unconstrained Optimal Control 

> Dynamic Programming: Optimal Control 
Applications 

> Extended Cutting Plane Algorithm 

> Generalized Benders Decomposition 

> Generalized Outer Approximation 

> Hamilton-Jacobi-Bellman Equation 

> Infinite Horizon Control and Dynamic Games 

> MINLP: Application in Facility Location-allocation 

> MINLP: Applications in Blending and Pooling 
Problems 

> MINLP: Branch and Bound Global Optimization 
Algorithm 

> MINLP: Branch and Bound Methods 

> MINLP: Design and Scheduling of Batch Processes 

> MINLP: Generalized Cross Decomposition 

> MINLP: Global Optimization with αBB

> MINLP: Heat Exchanger Network Synthesis 

> MINLP: Logic-based Methods 

> MINLP: Outer Approximation Algorithm 

> MINLP: Reactive Distillation Column Synthesis 

> Mixed Integer Linear Programming: Mass and Heat 
Exchanger Networks 

> Mixed Integer Nonlinear Programming 

> Multi-objective Optimization: Interaction of Design 
and Control 

> Optimal Control of a Flexible Arm 

> Robust Control 

> Robust Control: Schur Stability of Polytopes of 
Polynomials 

> Semi-infinite Programming and Control Problems 

> Sequential Quadratic Programming: Interior Point 
Methods for Distributed Optimal Control Problems 

> Suboptimal Control 
A wide range of nonlinear optimization problems involve integer or discrete variables in addition to continuous ones. These problems are called mixed integer nonlinear programming (MINLP) problems. Integer variables correspond to logical decisions describing whether certain actions do or do not take place, or modeling the sequence in which those decisions take place. The nonlinear nature of MINLP models may arise from:


• nonlinear relations in the integer domain only;
• nonlinear relations in the continuous domain only;
• nonlinear relations in the joint domain, i.e., products of continuous and binary/integer variables.
The general mathematical formulation of MINLP problems can be stated as follows:


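$$
\begin{aligned}
\min_{x,y}\quad & f(x, y)\\
\text{s.t.}\quad & h(x, y) = 0\\
& g(x, y) \le 0\\
& x \in X \subseteq \mathbb{R}^n\\
& y \in Y \ \text{(integer)}.
\end{aligned}
$$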


Here, $x$ represents a vector of $n$ continuous variables and $y$ is a vector of integer variables. The functions $f(x, y)$, $h(x, y)$, and $g(x, y)$ represent the objective function, the equality constraints, and the inequality constraints, respectively. Note that every problem of this form can be transformed into one in which all integer variables have been replaced by binary (0-1) variables. This is possible because every integer $y$ with $y^L \le y \le y^U$ can be expressed through 0-1 variables $z = (z_1, \dots, z_N)$ as:


$$
y = y^L + z_1 + 2 z_2 + 4 z_3 + \dots + 2^{N-1} z_N, \qquad
N = 1 + \mathrm{INT}\left[\frac{\log(y^U - y^L)}{\log 2}\right].
$$
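The binary expansion can be checked with a small Python sketch. The helper names are hypothetical. For a positive integer d, `int.bit_length` equals 1 + INT(log d / log 2) exactly, so it is used here in place of the floating-point logarithm ratio, which can round badly.

```python
def n_binaries(y_lo: int, y_up: int) -> int:
    """N = 1 + INT(log(y_up - y_lo) / log 2) for integers y_up > y_lo."""
    if y_up <= y_lo:
        raise ValueError("need y_up > y_lo")
    # bit_length(d) == 1 + floor(log2(d)) for d >= 1, in exact integer arithmetic
    return (y_up - y_lo).bit_length()


def decode(y_lo: int, z: list) -> int:
    """y = y_lo + z_1 + 2 z_2 + 4 z_3 + ... + 2^(N-1) z_N (z_1 stored first)."""
    return y_lo + sum(zj << j for j, zj in enumerate(z))


def encode(y: int, y_lo: int, n: int) -> list:
    """0-1 vector z of length n whose expansion reproduces the integer y."""
    d = y - y_lo
    return [(d >> j) & 1 for j in range(n)]


# An integer with 2 <= y <= 10 needs N = 1 + INT(log 8 / log 2) = 4 binaries.
N = n_binaries(2, 10)
assert all(decode(2, encode(y, 2, N)) == y for y in range(2, 11))
```

With N binaries the expansion can reach values up to $y^L + 2^N - 1$, which may exceed $y^U$ (here 17 > 10). The bound $y \le y^U$ would therefore typically still be imposed as a constraint on the expansion.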


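Therefore, any MINLP problem can be written as:

$$
\begin{aligned}
\min_{x,y}\quad & f(x, y)\\
\text{s.t.}\quad & h(x, y) = 0\\
& g(x, y) \le 0\\
& x \in X \subseteq \mathbb{R}^n\\
& y \in Y = \{0, 1\}^q.
\end{aligned}
$$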


In the analysis of MINLP problems, two issues are of paramount importance:

• the combinatorial explosion of computational requirements as the number of binary variables increases;
• the NP-hard nature of the problem of determining the global minimum solution of general nonconvex MINLP problems.

A complexity analysis of the former is presented in [16], 
while the complexity of determining global minimum 
solutions of MINLPs is discussed in [15]. 

Various methods exist for identifying a locally optimal solution of MINLP problems. These are discussed in great detail in [9] and in a recent thorough review paper [6], which gives a comprehensive account of the various approaches to the solution of mixed integer nonlinear optimization problems.

The main objective in a general branch and bound algorithm is to enumerate the alternatives without examining all 0-1 combinations of the binary variables. A key element in such an enumeration is the representation of alternatives via a binary tree. The basic ideas in a branch and bound algorithm are the following. First, a reasonable effort is made to solve the original problem, for instance by considering its continuous relaxation. If the relaxation does not yield an integer-feasible solution (one in which the binary variables take 0-1 values at the optimal point), then the root node is separated into two candidate subproblems, which are subsequently solved. The separation aims at creating simpler instances of the original problem. This process of generating candidate subproblems is repeated until the problem is solved, which is why branch and bound algorithms are also known as divide-and-conquer methods. A basic principle common to all branch and bound algorithms is that the subproblems are solved to generate valid lower bounds on the original MINLP through its relaxation to a continuous problem. In the case of MINLP, the relaxation is a nonlinear programming problem (NLP). In general this NLP is nonconvex and must be solved to global optimality to provide a valid lower bound. If the NLP relaxation yields an integer solution, then this solution provides a valid upper bound. The generation of the sequence of valid upper and lower bounds is called the bounding step. Subproblems are created by forcing some of the binary variables to take on a value of 0 or 1; this is known as the branching step. Nodes in the tree are pruned when the corresponding valid lower bound exceeds the valid upper bound; this stage is known as the fathoming step.

The selection of the branching node, the choice of the branching variable, and the generation of the lower bound are crucial steps. Their importance becomes even more pronounced when addressing nonconvex MINLP problems. There are two basic strategies for selecting the branching node, depending on whether the branch and bound algorithm is designed with a depth-first or a breadth-first approach. In the former, the last node created is selected for branching; in the latter, the node that generated the best lower bound is selected. It is not clear which strategy is best, and often the one that minimizes the computational requirements is selected [13]. Another alternative is to select nodes based on the deviation of the solution from integrality [12]. The most common strategy for selecting a branching variable is to choose the variable whose value at the solution of some relaxed problem is farthest from an integer, i.e., the most fractional variable [17]. Reference [12] also proposes a method based on pseudocosts, which quantify the effect of the binary variables and essentially assign priorities to the branching variables.

Finally, one of the most important computational steps is the generation of the lower bound, in other words the solution of the relaxed problem. The effectiveness of a branch and bound algorithm depends on the quality of the lower bounds it generates. A nonlinear, nonconvex NLP is solved at every node of the branch and bound tree. Two issues are important. First, the lower bound must be valid: the relaxation at a particular node must underestimate the solution of the original problem for this node. Second, the lower bounds must be tight, so as to enhance the fathoming step. The key difficulty with nonconvex MINLPs is that the relaxation solved at each node is itself a nonconvex NLP that has to be solved to global optimality. Variants of the branch and bound algorithm lead to the correct solution for problems that are convex in the x and relaxed y-space [18]. In all other cases, global optimization algorithms have to be employed to generate valid lower bounds.
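The two node-selection rules differ only in how the list of open nodes is ordered. The Python sketch below uses hypothetical class and method names. It stores (lower bound, node) pairs either in a stack (depth-first: the last node created is branched on next) or in a binary heap keyed on the lower bound (the best-lower-bound choice that the text associates with the breadth-first approach). It also prunes nodes whose lower bound exceeds the incumbent upper bound.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class OpenNodes:
    """Open nodes of a branch and bound tree (illustrative sketch)."""

    def __init__(self, strategy: str = "best") -> None:
        if strategy not in ("depth", "best"):
            raise ValueError("strategy must be 'depth' or 'best'")
        self.strategy = strategy
        self._items: List[Tuple[float, int, Any]] = []
        self._counter = itertools.count()  # tie-breaker, so nodes are never compared

    def push(self, lower_bound: float, node: Any) -> None:
        item = (lower_bound, next(self._counter), node)
        if self.strategy == "depth":
            self._items.append(item)             # stack: last created, first selected
        else:
            heapq.heappush(self._items, item)    # heap: smallest lower bound first

    def pop(self) -> Tuple[float, Any]:
        if self.strategy == "depth":
            lb, _, node = self._items.pop()
        else:
            lb, _, node = heapq.heappop(self._items)
        return lb, node

    def prune(self, upper_bound: float) -> int:
        """Fathom nodes whose lower bound exceeds the upper bound; return how many."""
        kept = [item for item in self._items if item[0] <= upper_bound]
        removed = len(self._items) - len(kept)
        self._items = kept
        if self.strategy == "best":
            heapq.heapify(self._items)           # restore the heap property
        return removed

    def __len__(self) -> int:
        return len(self._items)


for strategy in ("depth", "best"):
    nodes = OpenNodes(strategy)
    for lb, name in [(3.0, "a"), (1.0, "b"), (2.0, "c")]:
        nodes.push(lb, name)
    print(strategy, nodes.pop())
# depth (2.0, 'c')
# best (1.0, 'b')
```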

In [19] the scope of branch and bound algorithms was extended to problems for which valid convex underestimating NLPs can be constructed as convex relaxations. These included bilinear and separable problems, for which convex underestimators can be built [14]. A number of very useful tests were proposed to accelerate the reduction of the solution space, namely:

1) Optimality-based range reduction tests: For the first set of tests, an upper bound $U$ on the nonconvex MINLP must be computed, and a convex lower bounding NLP must be solved to obtain a lower bound $L$. Suppose a bound constraint for variable $x_i$, with $x_i^L \le x_i \le x_i^U$, is active at the solution of the convex NLP and has multiplier $\lambda_i^* > 0$. Then the bounds on $x_i$ can be updated as follows:

   a) If $x_i - x_i^U = 0$ at the solution of the convex NLP and $\kappa_i = x_i^U - (U - L)/\lambda_i^*$ is such that $\kappa_i > x_i^L$, then $x_i^L = \kappa_i$.

   b) If $x_i - x_i^L = 0$ at the solution of the convex NLP and $\kappa_i = x_i^L + (U - L)/\lambda_i^*$ is such that $\kappa_i < x_i^U$, then $x_i^U = \kappa_i$.

   If neither bound constraint is active at the solution of the convex NLP for some variable $x_i$, the problem can be solved by setting $x_i = x_i^U$ or $x_i = x_i^L$. Tests similar to those presented above are then used to update the bounds on $x_i$.

2) Feasibility-based range reduction tests: In addition to ensuring that tight bounds are available for the variables, the constraint underestimators are used to generate new constraints for the problem. Consider the constraint $g_j(x, y) \le 0$. Suppose its underestimating function satisfies $\breve{g}_j(x, y) = 0$ at the solution of the convex NLP and its multiplier is $\mu_j^* > 0$. Then the constraint

   $$\breve{g}_j(x, y) \ge -\frac{U - L}{\mu_j^*}$$

   can be included in subsequent problems.
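Both tests reduce to a few arithmetic operations once $U$, $L$, and the multipliers of the convex NLP are known. The following is a minimal Python sketch. The function names are hypothetical, and the solver outputs (solution value, multipliers) are assumed to be passed in as plain numbers.

```python
def optimality_based_reduction(x_lo: float, x_up: float, x_sol: float,
                               lam: float, U: float, L: float,
                               tol: float = 1e-9) -> tuple:
    """Tighten [x_lo, x_up] for one variable (tests a and b in the text).

    x_sol -- value of the variable at the solution of the convex NLP
    lam   -- multiplier of the active bound constraint
    U, L  -- upper bound on the nonconvex MINLP, lower bound from the convex NLP
    """
    if lam <= tol:
        return x_lo, x_up                      # no positive multiplier: no reduction
    if abs(x_sol - x_up) <= tol:               # a) upper bound active
        kappa = x_up - (U - L) / lam
        if kappa > x_lo:
            x_lo = kappa
    elif abs(x_sol - x_lo) <= tol:             # b) lower bound active
        kappa = x_lo + (U - L) / lam
        if kappa < x_up:
            x_up = kappa
    return x_lo, x_up


def feasibility_cut_rhs(mu: float, U: float, L: float) -> float:
    """Right-hand side r of the new constraint g_hat_j(x, y) >= r = -(U - L) / mu."""
    return -(U - L) / mu


# Upper bound active with multiplier 2, U = 5, L = 3: kappa = 10 - 2/2 = 9.
print(optimality_based_reduction(0.0, 10.0, 10.0, 2.0, 5.0, 3.0))   # (9.0, 10.0)
```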

A global optimization branch and bound algorithm has been proposed in [20]. It can be applied to problems in which the objective and constraints are functions involving any combination of binary arithmetic operations (addition, subtraction, multiplication, and division) and functions that are either concave over the entire solution space (such as ln) or convex over this domain (such as exp).

The algorithm starts with an automatic reformu- 
lation of the original nonlinear problem into a prob- 
lem that involves only linear, bilinear, linear fractional, 
simple exponentiation, univariate concave and univari- 
ate convex terms. This is achieved through the intro- 
duction of new constraints and variables. The reformu- 
lated problem is then solved to global optimality using 
a branch and bound approach. Its special structure al- 
lows the construction of a convex relaxation at each 
node of the tree. The integer variables can be handled 
in two ways during the generation of the convex lower 
bounding problem. The integrality condition on the 
variables can be relaxed to yield a convex NLP which 
can then be solved globally. Alternatively, the integer 
variables can be treated directly and the convex lower 
bounding MINLP can be solved using a branch and 
bound algorithm as described earlier. This second approach is more computationally intensive but is likely to result in tighter lower bounds on the global optimum solution. Several methods have been suggested for obtaining an upper bound on the optimum solution. The MINLP can be transformed into an equivalent nonconvex NLP by relaxing the integer variables. For example, a variable $y \in \{0, 1\}$ can be replaced by a continuous variable $z \in [0, 1]$ by including the constraint $z - z \cdot z = 0$. The nonconvex NLP is then solved locally to provide an upper bound. Finally, the discrete variables could be fixed to some arbitrary value and the nonconvex NLP solved locally.

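In [1], the SMIN-αBB algorithm was proposed, which is designed to address the following class of problems to global optimality:

$$
\begin{aligned}
\min_{x,y}\quad & f(x) + x^T A_0 y + c_0^T y\\
\text{s.t.}\quad & h(x) + x^T A_1 y + c_1^T y = 0\\
& g(x) + x^T A_2 y + c_2^T y \le 0\\
& x \in X \subseteq \mathbb{R}^n\\
& y \in Y \ \text{(integer)},
\end{aligned}
$$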


where $c_0$, $c_1$, and $c_2$ are constant vectors, $A_0$, $A_1$, and $A_2$ are constant matrices, and $f(x)$, $h(x)$, and $g(x)$ are functions with continuous second-order derivatives. The solution strategy is an extension of the αBB algorithm for twice-differentiable NLPs [4,5,7]. It is based on the generation of two converging sequences of upper and lower bounds on the global optimum solution. A rigorous underestimation and convexification strategy for functions with continuous second-order derivatives allows the construction of a lower bounding MINLP problem with convex functions in the continuous variables. If no mixed-bilinear terms are present ($A_i = 0$ for all $i$), the resulting MINLP can be solved to global optimality using the outer approximation algorithm (OA) [8]. Otherwise, the generalized Benders decomposition (GBD)
can be used, [10], or the Glover transformations [11] 
can be applied to remove these bilinearities and per- 
mit the use of the OA algorithm. This convex MINLP 
provides a valid lower bound on the original MINLP. 
An upper bound on the problem can be obtained by 
applying the OA algorithm or the GBD to find a lo- 
cal solution. This bound generation strategy is incorpo- 
rated within a branch and bound scheme: a lower and 
upper bound on the global solution are first obtained 
for the entire solution space. Subsequently, the domain 


is subdivided by branching on a binary or a continu- 
ous variable, thus creating new nodes for which upper 
and lower bounds can be computed. At each iteration, 
the node with the lowest lower bound is selected for 
branching. If the lower bounding MINLP for a node is 
infeasible or if its lower bound is greater than the best 
upper bound, this node is fathomed. The algorithm is 
terminated when the best lower and upper bounds are within a pre-specified tolerance of each other.

Before presenting the algorithmic procedure, an 
overview of the underestimation and convexification 
strategy is given, and some of the options available 
within the algorithm are discussed. 

In order to transform the MINLP problem of the 
form just described into a convex problem which can 
be solved to global optimality with the OA or GBD 
algorithm, the functions f(x), h(x) and g(x) must be 
convexified. The underestimation and convexification strategy used in the αBB algorithm has previously been described in detail [3,4,5]. Its main features are outlined here.

In order to construct as tight an underestimator as 
possible, the nonconvex functions are decomposed into 
a sum of convex, bilinear, univariate concave and gen- 
eral nonconvex terms. The overall function underes- 
timator can then be built by summing up the convex 
underestimators for all terms, according to their type. 
In particular, a new variable is introduced to replace 
each bilinear term, and is bounded by its convex enve- 
lope. The univariate concave terms are linearized. For 
each nonconvex term $nt(x)$ with Hessian matrix $H_{nt}(x)$, a convex underestimator $L(x)$ is defined as

$$
L(x) = nt(x) - \sum_i \alpha_i\,(x_i^U - x_i)(x_i - x_i^L), \tag{1}
$$

where $x_i^U$ and $x_i^L$ are the upper and lower bounds on variable $x_i$, respectively. The $\alpha$ parameters are nonnegative scalars chosen so that $H_{nt}(x) + 2\,\mathrm{diag}(\alpha_i)$ is positive semidefinite over the domain $[x^L, x^U]$. The rigorous computation of the $\alpha$ parameters using interval Hessian matrices is described in [3,4,5].
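A minimal Python sketch of the two convexification ingredients described above follows. The first is the αBB underestimator of equation (1), with α supplied by the caller. In the text α comes from interval Hessian analysis; here a known lower bound on the Hessian eigenvalues is assumed instead. The second is the standard McCormick convex and concave envelopes of a bilinear term x·y over a box. All function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def uniform_alpha(eig_lower_bound: float) -> float:
    """Smallest alpha >= 0 with H + 2*alpha*I PSD, assuming lambda_min(H) >= eig_lower_bound."""
    return max(0.0, -0.5 * eig_lower_bound)


def abb_underestimator(nt: Callable[[Vector], float], alpha: Vector,
                       lo: Vector, up: Vector) -> Callable[[Vector], float]:
    """L(x) = nt(x) - sum_i alpha_i (x_i^U - x_i)(x_i - x_i^L), as in equation (1)."""
    def L(x: Vector) -> float:
        # each product is >= 0 inside the box, so L(x) <= nt(x) there
        return nt(x) - sum(a * (u - xi) * (xi - l)
                           for a, xi, l, u in zip(alpha, x, lo, up))
    return L


def mccormick(x: float, y: float, xl: float, xu: float,
              yl: float, yu: float) -> Tuple[float, float]:
    """Convex (lower) and concave (upper) envelopes of w = x*y on [xl, xu] x [yl, yu]."""
    lower = max(xl * y + yl * x - xl * yl, xu * y + yu * x - xu * yu)
    upper = min(xu * y + yl * x - xu * yl, xl * y + yu * x - xl * yu)
    return lower, upper


# nt(x) = x^3 on [-1, 2]: nt'' = 6x >= -6, so alpha = 3 and
# L(x) = x^3 - 3(2 - x)(x + 1), whose second derivative 6x + 6 is >= 0 on the box.
alpha = uniform_alpha(-6.0)
L = abb_underestimator(lambda x: x[0] ** 3, [alpha], [-1.0], [2.0])
print(alpha, L([0.0]))   # 3.0 -6.0
```

The underestimator coincides with nt(x) at the bounds and its gap shrinks as the box shrinks. This is why the text recomputes it at every node.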

The underestimators are updated at each node of the branch and bound tree, because their quality strongly depends on the bounds on the variables. An unusual feature of the SMIN-αBB algorithm is the strategy used to select branching variables. It follows a hybrid approach in which branching may occur on both the integer and the continuous variables, in order to fully exploit the structure of the problem being solved. After the node with the lowest lower bound has been identified for branching, the type of branching variable is determined according to one of the following two criteria:

1) Branch on the binary variables first. 

2) Solve a continuous relaxation of the nonconvex MINLP locally. Branch on a binary variable with a low degree of fractionality at the solution. If there is no such variable, branch on a continuous variable.
The first criterion results in the creation of an integer 
tree for the first q levels of the branch and bound tree, 
where q is the number of binary variables. At the low- 
est level of this integer tree, each node corresponds to 
a nonconvex NLP and the lower and upper bounding 
problems at subsequent levels of the tree are NLP prob- 
lems. The efficiency of this strategy lies in the minimiza- 
tion of the number of MINLPs that need to be solved. 
The combinatorial nature of the problem and its non- 
convexities are handled sequentially. If branching oc- 
curs on a binary variable, the selection of that variable 
can be done randomly or by solving a relaxation of the 
nonconvex MINLP and choosing the most fractional 
variable at the solution. 

The second criterion selects a binary variable for 
branching only if it appears that the two newly 
created nodes will have significantly different lower 
bounds. Thus, if a variable is close to integrality at the
solution of the relaxed problem, forcing it to take on 
a fixed value may lead to the infeasibility of one of the 
nodes or the generation of a high value for a lower 
bound, and therefore the fathoming of a branch of the 
tree. If no binary variable is close to integrality, a con- 
tinuous variable is selected for branching. 

A number of rules have been developed for the se- 
lection of a continuous branching variable. Their aim 
is to determine which variable is responsible for the 
largest separation distances between the convex underestimating functions and the original nonconvex functions. These rules are described in [2]. Variable bound updates performed before the generation of the convex MINLP have been found to greatly enhance the speed of convergence of the αBB algorithm for continuous problems [2]. For continuous variables, the bounds are updated by minimizing or maximizing the chosen variable subject to the convexified constraints. Despite its computational cost, this procedure often leads to significant improvements in the quality of the underestimators, and hence to a noticeable reduction in the number of iterations required.

In addition to the update of continuous variable 
bounds, the SMIN-αBB algorithm also relies on binary
variable bound updates. Through simple computations, 
an entire branch of the branch and bound tree may 
be eliminated when a binary variable is found to be 
restricted to 0 or 1. The bound update procedure for 
a given binary variable is as follows: 

1) Set the variable to be updated to one of its bounds, $y = y_B$.

2) Perform interval evaluations of all the constraints in the nonconvex MINLP, using the bounds on the solution space for the current node.

3) If any of the constraints is found infeasible, fix the variable to $y = 1 - y_B$.

4) If both bounds have been tested, repeat this procedure for the next variable to be updated. Otherwise, try the second bound.
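The four steps above can be sketched in Python with a minimal interval type. The `Interval` class, the constraint representation (Python functions evaluated on a dictionary of intervals), and all names are hypothetical. Only `+`, `-`, and `*` are implemented, which is enough for the small example. A practical implementation would need interval enclosures for every operation appearing in the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        o = as_interval(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __neg__(self):
        return Interval(-self.hi, -self.lo)

    def __sub__(self, other):
        return self + (-as_interval(other))

    def __rsub__(self, other):
        return as_interval(other) + (-self)

    def __mul__(self, other):
        o = as_interval(other)
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    __rmul__ = __mul__


def as_interval(v) -> Interval:
    return v if isinstance(v, Interval) else Interval(float(v), float(v))


Box = Dict[str, Interval]
Constraint = Callable[[Box], Interval]


def infeasible(box: Box, ineqs: Iterable[Constraint], eqs: Iterable[Constraint]) -> bool:
    """Interval test: some g <= 0 or h = 0 cannot hold anywhere in the box."""
    if any(g(box).lo > 0.0 for g in ineqs):
        return True
    return any(h(box).lo > 0.0 or h(box).hi < 0.0 for h in eqs)


def update_binary_bounds(box: Box, binaries: Iterable[str],
                         ineqs: Iterable[Constraint] = (),
                         eqs: Iterable[Constraint] = ()) -> Optional[Box]:
    """Return the box with binaries fixed where possible, or None if the node is infeasible."""
    box = dict(box)
    ineqs, eqs = list(ineqs), list(eqs)
    for name in binaries:
        for y_b in (0.0, 1.0):
            if not box[name].lo <= y_b <= box[name].hi:
                continue                                  # value already excluded
            trial = {**box, name: Interval(y_b, y_b)}     # step 1: y = y_B
            if infeasible(trial, ineqs, eqs):             # step 2: interval evaluation
                other = 1.0 - y_b                         # step 3: fix y = 1 - y_B
                if not box[name].lo <= other <= box[name].hi:
                    return None                           # both values infeasible
                box[name] = Interval(other, other)
    return box                                            # step 4: all variables tested


def g(b: Box) -> Interval:
    return b["x"] - 4 * b["y"]          # constraint x - 4y <= 0


# With x in [1, 3], y = 0 makes g >= 1 > 0 everywhere, so y is fixed to 1.
print(update_binary_bounds({"x": Interval(1.0, 3.0), "y": Interval(0.0, 1.0)}, ["y"], [g])["y"])
```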

In [1], the GMIN-αBB algorithm, which operates within a classical branch and bound framework, was proposed. Its main difference from similar branch and bound algorithms [12,17] is its ability to identify the global optimum solution of a much larger class of problems of the form

$$
\begin{aligned}
\min_{x,y}\quad & f(x, y)\\
\text{s.t.}\quad & h(x, y) = 0\\
& g(x, y) \le 0\\
& x \in X \subseteq \mathbb{R}^n\\
& y \in \mathbb{N}^q,
\end{aligned}
$$

where $\mathbb{N}$ is the set of nonnegative integers. The only condition imposed on the functions $f(x, y)$, $g(x, y)$, and $h(x, y)$ is that their continuous relaxations possess continuous second-order derivatives. This increased applicability results from the use of the αBB global optimization algorithm for continuous twice-differentiable NLPs [4,5,7].

At each node of the branch and bound tree, the nonconvex MINLP is relaxed to give a nonconvex NLP, which is then solved with the αBB algorithm. This allows the identification of rigorously valid lower bounds and therefore ensures convergence to the global optimum. In general, it is not necessary to let the αBB algorithm run to completion, because each of its iterations generates a lower bound on the global solution of the NLP being solved. A strategy of early termination reduces the computational requirements of each node of the binary branch and bound tree and leads to faster overall convergence.

The GMIN-αBB algorithm selects the node with the
lowest lower bound for branching at every iteration. 
The branching variable selection strategy combines sev- 
eral approaches: branching priorities can be specified 
for some of the integer variables. When no variable has 
a priority greater than all other variables, the solution of 
the continuous relaxation is used to identify either the 
most fractional variable or the least fractional variable 
for branching. 
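The branching-variable rule described above can be sketched as follows. This is a hypothetical Python helper, not the published GMIN-αBB implementation. Two choices are assumptions of the sketch: only variables with fractional relaxed values are candidates, and variables without a listed priority are treated as priority 0.

```python
import math
from typing import Dict, Optional, Sequence


def fractionality(v: float) -> float:
    """Distance from v to the nearest integer."""
    return min(v - math.floor(v), math.ceil(v) - v)


def select_branching_variable(y_relaxed: Sequence[float],
                              priorities: Optional[Dict[int, int]] = None,
                              rule: str = "most",
                              tol: float = 1e-6) -> Optional[int]:
    """Index of the integer variable to branch on, or None if y_relaxed is integral."""
    if rule not in ("most", "least"):
        raise ValueError("rule must be 'most' or 'least'")
    candidates = [i for i, v in enumerate(y_relaxed) if fractionality(v) > tol]
    if not candidates:
        return None
    if priorities:
        top = max(priorities.get(i, 0) for i in candidates)
        leaders = [i for i in candidates if priorities.get(i, 0) == top]
        if len(leaders) == 1:          # one variable outranks all the others
            return leaders[0]
    # no unique priority winner: use the continuous relaxation instead
    pick = max if rule == "most" else min
    return pick(candidates, key=lambda i: fractionality(y_relaxed[i]))


y_bar = [0.5, 0.9, 2.0, 1.3]
print(select_branching_variable(y_bar))                      # 0 (most fractional)
print(select_branching_variable(y_bar, rule="least"))        # 1
print(select_branching_variable(y_bar, priorities={3: 5}))   # 3
```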

Other strategies have been implemented to ensure 
a satisfactory convergence rate. In particular, bound 
updates on the integer variables can be performed at 
each level of the branch and bound tree. These can be 
carried out through the use of interval analysis. An in- 
teger variable, y*, is fixed at its lower (or upper) bound 
and the range of the constraints is evaluated with in- 
terval arithmetic, using the bounds on all other vari- 
ables. If the range of any constraint is such that this 
constraint is violated, the lower (or upper) bound on 
variable y* can be increased (or decreased) by one. An- 
other strategy for bound updates is to relax the integer 
variables, to convexify and underestimate the noncon- 
vex constraints and to minimize (or maximize) a vari- 
able y* in this convexified feasible region. The resulting 
lower (or upper) bound on relaxed variable y* can then 
be rounded up (or down) to the nearest integer to pro- 
vide an updated bound for y*. 
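Both bound-update strategies can be sketched in Python. For the first, a hypothetical helper encloses a linear constraint over a box; a general model would need interval enclosures of its nonlinear terms. This sketch repeats the one-step increment until the constraint is no longer provably violated, rather than stopping after a single step, which is an assumption. For the second strategy, the helper simply rounds bounds obtained from the convexified problem, which is assumed to have been solved elsewhere. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Box = Dict[str, Tuple[float, float]]
Enclosure = Callable[[Box], Tuple[float, float]]   # range of g(x, y) over a box


def linear_enclosure(coeffs: Dict[str, float], const: float) -> Enclosure:
    """Exact range of g = const + sum_j a_j * v_j over a box."""
    def enclose(box: Box) -> Tuple[float, float]:
        lo = hi = const
        for name, a in coeffs.items():
            l, u = box[name]
            lo += min(a * l, a * u)
            hi += max(a * l, a * u)
        return lo, hi
    return enclose


def provably_violated(box: Box, constraints: List[Enclosure]) -> bool:
    """Some g <= 0 is violated everywhere in the box (its range lies above zero)."""
    return any(g(box)[0] > 0.0 for g in constraints)


def tighten_integer(name: str, box: Box, constraints: List[Enclosure]) -> Box:
    """Raise the lower and lower the upper bound of integer `name` one unit at a time."""
    box = dict(box)
    lo, hi = box[name]
    while lo <= hi and provably_violated({**box, name: (lo, lo)}, constraints):
        lo += 1
    while hi >= lo and provably_violated({**box, name: (hi, hi)}, constraints):
        hi -= 1
    box[name] = (lo, hi)   # lo > hi means no feasible integer value remains
    return box


def round_relaxed_bounds(relaxed_lo: float, relaxed_hi: float,
                         tol: float = 1e-9) -> Tuple[int, int]:
    """Second strategy: round the min/max of the relaxed y* inward to integers."""
    return math.ceil(relaxed_lo - tol), math.floor(relaxed_hi + tol)


# g = 3 - x - y <= 0 with x in [0, 1]: y = 0 and y = 1 are impossible, so y >= 2.
g = linear_enclosure({"x": -1.0, "y": -1.0}, 3.0)
print(tighten_integer("y", {"x": (0.0, 1.0), "y": (0, 10)}, [g])["y"])   # (2, 10)
print(round_relaxed_bounds(1.2, 4.7))                                    # (2, 4)
```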
A general mixed integer nonlinear programming problem (MINLP) can be written as


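$$
\begin{aligned}
\min\quad & f(x, y)\\
\text{s.t.}\quad & h(x, y) = 0\\
& g(x, y) \le 0\\
& x \in \mathbb{R}^n\\
& y \in \mathbb{Z}^m.
\end{aligned} \tag{MINLP}
$$

Here $x$ is a vector of $n$ continuous variables and $y$ is a vector of $m$ integer variables. In many cases, the integer variables $y$ are restricted to the values 0 and 1.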
Such variables are called binary variables. The function 
f is a scalar valued objective function, while the vec- 
tor functions h and g express linear or nonlinear con- 
straints. Problems of this form have a wide variety of 
applications, in areas as diverse as IR spectroscopy [6], 
finance [3], chemical process synthesis [9], topological 
design of transportation networks [12], and marketing 
[10]. 

The earliest work on branch and bound algorithms 
for mixed integer linear programming dates back to the 
early 1960s [7,13,15]. Although the possibility of apply- 
ing branch and bound methods to mixed integer non- 
linear programming problems was apparent from the 
beginning, actual work on such problems did not be- 
gin until later. Early papers on branch and bound al- 
gorithms for mixed integer nonlinear programming in- 
clude [11,14]. 

A branch and bound algorithm for solving 
(MINLP) requires the following data structures. The 
algorithm maintains a list L of unsolved subproblems. 
The algorithm also maintains a record of the best in- 
teger solution that has been found. This solution, (x*, 
y*), is called the incumbent solution. The incumbent 
solution provides an upper bound, ub, on the objective 
value of an optimal solution to (MINLP). 

The basic branch and bound procedure is as follows. 
1) Initialize: Create the list L with (MINLP) as the initial subproblem. If a good integer solution is known, then initialize x*, y*, and ub to this solution. If there is no incumbent solution, then initialize ub to +∞.

2) Select: Select an unsolved subproblem, S, from the 
list L. If L is empty, then stop: If there is an incum- 
bent solution, then that solution is optimal; If there 
is no incumbent solution, then (MINLP) is infeasi- 
ble. 

3) Solve: Relax the integrality constraints in S and 
solve the resulting nonlinear programming relaxation. Obtain a solution $\bar{x}$, $\bar{y}$, and a lower bound, lb,
on the optimal value of the subproblem. 

4) Fathom: If the relaxed subproblem was infeasible, 
then S will clearly not yield a better solution to 
(MINLP) than the incumbent solution. Similarly, if 
lb > ub, then the current subproblem cannot yield
a better solution to (MINLP) than the incumbent 
solution. Remove S from L, and return to step 2. 

5) Integer Solution: If $\bar{y}$ is integer, then a new incumbent integer solution has been obtained. Update x*,
y*, and ub. Remove S from L and return to step 2. 

6) Branch: At least one of the integer variables $y_k$ takes on a fractional value $\bar{y}_k$ in the solution to the current subproblem. Create a new subproblem, $S_1$, by adding the constraint

$$y_k \le \lfloor \bar{y}_k \rfloor.$$

Create a second new subproblem, $S_2$, by adding the constraint

$$y_k \ge \lceil \bar{y}_k \rceil.$$

Remove S from L, add $S_1$ and $S_2$ to L, and return to step 2.
The following example demonstrates how the branch 
and bound algorithm solves a simple (MINLP): 


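$$
\begin{aligned}
\min\quad & (y_1 - 1/4)^2 + (y_2 - 1/4)^2 + y_3^2\\
\text{s.t.}\quad & -2y_1 + 2y_2 \le 1\\
& y \ \text{binary}.
\end{aligned}
$$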


The optimal solution to the initial nonlinear programming relaxation is $y = (1/4, 1/4, 0)$, with an objective value of $z = 0$. Both $y_1$ and $y_2$ take on fractional values in this solution, so it is necessary to select a branching variable. The algorithm arbitrarily selects $y_1$ as the branching variable and creates two new subproblems in which $y_1$ is fixed at 0 or 1. In the subproblem with $y_1$ fixed at 0, the optimal solution is $y = (0, 1/4, 0)$, with $z = 1/16$. Since the optimal value of $y_2$ is fractional, the algorithm again creates two new subproblems, with $y_2$ fixed at 0 and 1. The optimal solution to the subproblem with $y_1 = 0$ and $y_2 = 0$ is $y = (0, 0, 0)$, with $z = 1/8$. This establishes an incumbent integer solution. The subproblem with $y_1 = 0$ and $y_2 = 1$ is infeasible and can be eliminated from consideration. The subproblem with $y_1 = 1$ has an optimal solution $y = (1, 1/4, 0)$ with objective value $z = 9/16$. Since 9/16 is larger than the objective value of the incumbent solution, this subproblem can also be eliminated from consideration. Thus the optimal solution to the example problem is $y^* = (0, 0, 0)$, with objective value $z^* = 1/8$.

Figure: Branch and bound tree for a sample problem

Since each subproblem S creates at most two new 
subproblems, the set of subproblems considered by the 
branch and bound algorithm can be represented as 
a binary tree. The above figure shows the branch and 
bound tree for the example problem. 
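The procedure can be reproduced on the example problem with a short Python sketch. Two simplifications are assumptions of this sketch, not part of the method. First, nodes are selected depth-first (last created, first solved). Second, each NLP relaxation is solved exactly by exploiting the example's structure: a separable convex quadratic with bound constraints and one linear inequality. Its KKT conditions give $y(\lambda) = \mathrm{clip}(c - \lambda a/2,\ \ell,\ u)$, with the multiplier $\lambda \ge 0$ found by bisection. A general (MINLP) would call an NLP solver in step 3 instead. Ties in fractionality are broken by the lowest index, which reproduces the choice of $y_1$ in the text.

```python
import math
from typing import List, Optional, Tuple

# Example: min (y1-1/4)^2 + (y2-1/4)^2 + y3^2  s.t.  -2 y1 + 2 y2 <= 1,  y binary
C = (0.25, 0.25, 0.0)    # centres of the separable quadratic objective
A = (-2.0, 2.0, 0.0)     # coefficients of the linear constraint A.y <= B
B = 1.0
TOL = 1e-9


def objective(y: List[float]) -> float:
    return sum((yi - ci) ** 2 for yi, ci in zip(y, C))


def solve_relaxation(lo: List[float], hi: List[float]) -> Optional[List[float]]:
    """Step 3: minimise the objective over lo <= y <= hi, A.y <= B (None if infeasible)."""
    def y_of(lam: float) -> List[float]:
        return [min(max(c - lam * a / 2.0, l), h) for c, a, l, h in zip(C, A, lo, hi)]

    def lhs(y: List[float]) -> float:
        return sum(a * yi for a, yi in zip(A, y))

    # smallest value of A.y over the box; if it exceeds B the subproblem is infeasible
    if sum(a * (l if a > 0 else h) for a, l, h in zip(A, lo, hi)) > B + TOL:
        return None
    if lhs(y_of(0.0)) <= B + TOL:
        return y_of(0.0)               # constraint inactive: multiplier is zero
    lam_lo, lam_hi = 0.0, 1.0
    while lhs(y_of(lam_hi)) > B + TOL:
        lam_hi *= 2.0
    for _ in range(100):               # A.y(lam) is nonincreasing in lam: bisect
        mid = 0.5 * (lam_lo + lam_hi)
        if lhs(y_of(mid)) > B:
            lam_lo = mid
        else:
            lam_hi = mid
    return y_of(lam_hi)


def branch_and_bound(n: int = 3) -> Tuple[Optional[List[int]], float]:
    incumbent, ub = None, math.inf                     # step 1: no incumbent, ub = +inf
    nodes = [([0.0] * n, [1.0] * n)]                   # list L holds (lower, upper) bounds
    while nodes:                                       # step 2: select (depth-first)
        lo, hi = nodes.pop()
        y = solve_relaxation(lo, hi)                   # step 3: solve the relaxation
        if y is None:
            continue                                   # step 4: fathom, infeasible
        lb = objective(y)
        if lb >= ub:
            continue                                   # step 4: fathom, cannot improve
        frac = [min(v - math.floor(v), math.ceil(v) - v) for v in y]
        k = max(range(n), key=lambda i: frac[i])       # most fractional, lowest index
        if frac[k] <= TOL:
            incumbent, ub = [round(v) for v in y], lb  # step 5: new incumbent
            continue
        down_hi = list(hi)
        down_hi[k] = float(math.floor(y[k]))           # S1: y_k <= floor(y_k)
        up_lo = list(lo)
        up_lo[k] = float(math.ceil(y[k]))              # S2: y_k >= ceil(y_k)
        nodes.append((up_lo, list(hi)))                # step 6: branch
        nodes.append((list(lo), down_hi))              # the y_k = 0 branch is solved first
    return incumbent, ub


print(branch_and_bound())   # ([0, 0, 0], 0.125)
```

The nodes are visited in the order described above: the root, then $y_1 = 0$, then $(y_1, y_2) = (0, 0)$ (incumbent, $z = 1/8$), then the infeasible $(0, 1)$ node, and finally $y_1 = 1$, which is fathomed by bound.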

There are a number of important issues in the im- 
plementation of a branch and bound algorithm for 
(MINLP). 

The first important issue is how to solve the non- 
linear programming relaxations of the subproblems in 
step 3. If the objective function f and the constraint 
functions g are convex, while the constraint functions h are linear, then the nonlinear programming subproblems in step 3 are convex and thus relatively easy to
solve. A variety of methods have been used to solve 
these subproblems including generalized reduced gra- 
dient (GRG) methods [11], sequential quadratic pro- 
gramming (SQP) [4], active set methods for quadratic 
programming [8], and interior point methods [16]. 

However, if the nonlinear programming subprob- 
lems are nonconvex, then it can be extremely diffi- 
cult to solve the nonlinear programming relaxation of 
S or even obtain a lower bound on the optimal ob- 
jective function value. For some specialized classes of 
nonconvex optimization problems, including indefinite 
quadratic programming, bilinear programming, and 
fractional linear programming, convex functions which 
underestimate the nonconvex objective function are 
known. These convex underestimators are widely used 
in branch and bound algorithms for nonconvex nonlinear
programming problems. Branch and bound techniques
for nonconvex continuous optimization problems can
also be used within a branch and bound algorithm for
nonconvex mixed integer nonlinear programming
problems. For instance, the BARON system uses this
approach to solve a variety of nonconvex mixed integer
nonlinear programming problems [17,18]. The same approach
is used in the GMIN-αBB algorithm to solve
nonconvex 0-1 mixed integer nonlinear programming
problems with twice differentiable objective and
constraint functions [1].

The choice of the next subproblem to be solved in 
step 2 can have a significant influence on the perfor- 
mance of the branch and bound algorithm. In mixed 
integer linear programming, a variety of heuristics are 
employed to select the next subproblem [2]. One popular
heuristic used in branch and bound algorithms for
MILP is the 'best bound' rule, in which the subproblem
with the smallest lower bound is selected. The best
bound rule is widely used within branch and bound
algorithms for (MINLP) [4,11,18].

In step 6, several variables with fractional values may
be candidates for the branching variable. A
simple approach is to select the variable whose value is
furthest from an integer [4,11]. In mixed integer
linear programming, the increase in the objective
function that would result from forcing a variable
to an integer value is often estimated. These estimates,
called 'pseudocosts' or 'penalties', are used to select the
branching variable. Penalties have also been used in
branch and bound algorithms for mixed integer nonlinear
programming problems [11,18].
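The node selection and branching rules discussed above can be illustrated with a small sketch. It assumes a hypothetical separable convex objective, $\min \sum_i (y_i - c_i)^2$, with integer bounds on every $y_i$. This objective was chosen only because its continuous relaxation can be solved in closed form by clipping; a real MINLP code would call an NLP solver in step 3. Nodes are selected by the best bound rule, the branching variable is the one whose relaxation value is furthest from an integer, and a node is fathomed when its lower bound is not below the incumbent value.

```python
import heapq
import math


def solve_relaxation(c, lo, hi):
    """Continuous relaxation of min sum_i (y_i - c_i)^2 s.t. lo <= y <= hi.
    The problem is separable and convex, so clipping c_i to [lo_i, hi_i] is optimal."""
    y = [min(max(ci, l), h) for ci, l, h in zip(c, lo, hi)]
    z = sum((yi - ci) ** 2 for yi, ci in zip(y, c))
    return y, z


def branch_and_bound(c, lo, hi, tol=1e-9):
    """Best-bound branch and bound for min sum_i (y_i - c_i)^2 with y integer
    in [lo, hi]; lo and hi are assumed to be integer bounds."""
    best_y, best_z = None, math.inf              # incumbent solution and its value
    _, root_z = solve_relaxation(c, lo, hi)
    heap = [(root_z, 0, list(lo), list(hi))]     # (lower bound, tie-breaker, bounds)
    counter = 1
    while heap:
        bound, _, lo_s, hi_s = heapq.heappop(heap)     # best-bound node selection
        if bound >= best_z:
            continue                                    # fathomed by bound
        y, z = solve_relaxation(c, lo_s, hi_s)
        frac = [abs(v - round(v)) for v in y]
        k = max(range(len(y)), key=lambda i: frac[i])   # most fractional variable
        if frac[k] <= tol:
            best_y, best_z = [round(v) for v in y], z   # integral: new incumbent
            continue
        # Branch on y_k: one child with y_k <= floor(y_k), one with y_k >= ceil(y_k).
        for new_lo, new_hi in ((lo_s[k], math.floor(y[k])),
                               (math.ceil(y[k]), hi_s[k])):
            lo_c, hi_c = list(lo_s), list(hi_s)
            lo_c[k], hi_c[k] = new_lo, new_hi
            _, child_z = solve_relaxation(c, lo_c, hi_c)
            heapq.heappush(heap, (child_z, counter, lo_c, hi_c))
            counter += 1
    return best_y, best_z


y_opt, z_opt = branch_and_bound([0.4, 1.7, 2.5], [0, 0, 0], [3, 3, 3])
```

For $c = (0.4, 1.7, 2.5)$ and bounds $0 \le y_i \le 3$, the search returns $y = (0, 2, 2)$ with $z \approx 0.5$. The equally good point $(0, 2, 3)$ is fathomed because its bound does not improve on the incumbent.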

The performance of the branch and bound algo- 
rithm can be improved by computing lower bounds 
on the optimal value of a subproblem without actu- 
ally solving the subproblem. In [8], lower bounds on 
the optimal objective value of a subproblem are derived 
from an optimal dual solution to the subproblem’s par- 
ent problem. If this lower bound is larger than the ob- 
jective value of the incumbent solution, then the sub- 
problem can be eliminated from consideration. In [4], 
Lagrangian duality is used to compute lower bounds 
during the solution of a subproblem. When the lower 
bound exceeds the value of the incumbent solution, the 
current subproblem can be discarded. 

Another way to improve the performance of 
a branch and bound algorithm for (MINLP) is to 
tighten the formulation of the nonlinear programming 
subproblems before solving them. In the BARON pack- 
age, dual information from the solution to a nonlinear 
programming subproblem is used to restrict the ranges 
of variables and constraints in the children of the sub- 
problem [17,18]. 

In branch and cut approaches, constraints called 
cutting planes are added to the nonlinear programming 
subproblems [3,19]. These additional constraints are 
selected so that they reduce the size of the feasible re- 
gion of nonlinear programming subproblems without 
eliminating any integer solutions from consideration. 
This tightens the formulations of the subproblems and 
thus increases the probability that a subproblem can be 
fathomed by bound. Furthermore, the use of cutting 
planes can make it more likely that an integer solution 
will be obtained early in the branch and bound pro- 
cess. A variety of cutting planes developed for use in 
branch and cut algorithms for integer linear program- 
ming have been adapted for use in branch and cut al- 
gorithms for nonlinear integer programming. These in- 
clude mixed integer rounding cuts [3], knapsack cuts [3], 
intersection cuts [3], and lift-and-project cuts [19]. 

To date, little work has been done to compare 
the performance of branch and bound methods for 
(MINLP) with other approaches such as outer ap- 
proximation and generalized Benders decomposition. B. 
Borchers and J.E. Mitchell (1997) compared an experimental
branch and bound code with a commercially available
outer approximation code on a number of test
problems [5]. This study found that the
branch and bound code and outer approximation code 
were roughly comparable in speed and robustness. R. 
Fletcher and S. Leyffer (1998) compared the perfor- 
mance of their branch and bound code for mixed in- 
teger convex quadratic programming problems with 
their implementations of outer approximation, gener- 
alized Benders decomposition, and an algorithm that 
combines branch and bound and outer approximation 
approaches [8]. Fletcher and Leyffer found that their 
branch and bound solver was consistently faster than 
the other codes by about an order of magnitude. 

The design of batch processes has been a major area
of research for the past several decades. In conjunc- 
tion with the design of batch plants, many different ap- 
proaches have been proposed for the determination of 
an optimal schedule for the plant. It has been recog- 
nized for some time that in order to increase the effi- 
ciency of batch processes, the two tasks of design and 
scheduling should be considered simultaneously. 

The problem is to design a batch process consisting 
of M processing steps, in which N products are made, 
where all materials follow the same path through the 
process. This is commonly known as a multiproduct 
batch plant, or a flow-shop. 


There are two predominant methods for formulat- 
ing the batch process design and scheduling problem. 
The first is a continuous-time formulation in which the 
scheduling information is incorporated through a plan- 
ning horizon constraint. This problem can be formu- 
lated as a NLP or MINLP depending on whether the 
number of parallel units is fixed or variable. The solu- 
tion of this problem does not give the actual schedule, 
but does guarantee that a feasible schedule exists. A sep- 
arate problem, typically a MILP, must be solved to find 
the actual schedule. 

The second method for formulating the batch pro- 
cess design and scheduling problem is based on a state- 
task-network (STN) representation. In this approach, 
the planning horizon is discretized into time steps. Each 
task must be assigned to both a unit and a time slot. The 
formulation results in a large MINLP whose solution 
provides both the plant design and the actual sched- 
ule. 


Continuous-Time Formulations 


The early work of [10] was based on the single product 
campaign (SPC) scheduling policy. In a single product 
campaign, all batches of one product are processed one 
after the other, followed by all of the batches of the next 
product, and so on. 

In this approach, the scheduling information is
incorporated by way of a planning horizon constraint.
This constraint requires that all products be
completed before the planning horizon, H, is reached. In
a single product campaign, the time between batches
of product i is determined by the maximum processing time,
$t_{ij}$, over all of the stages j,

$$t_{Li} = \max_j \, t_{ij},$$

where $t_{Li}$ is the 'limiting' time for product i. The
planning horizon constraint can be written as the sum, over
all products, of the limiting time multiplied by the
number of batches of each product,

$$\sum_i \frac{Q_i}{B_i} \, t_{Li} \le H,$$

where $Q_i$ is the total production of i and $B_i$ is the batch
size for i. Because $Q_i$ and $B_i$ are variables, this results in
an NLP.

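As a quick numerical check of the SPC planning horizon constraint, the sketch below computes the limiting times and the remaining slack for hypothetical data (two products, three stages). As in the constraint, the number of batches $Q_i/B_i$ is treated as a continuous quantity.

```python
def limiting_times(t):
    """t[i][j]: processing time of product i in stage j; returns t_Li = max_j t_ij."""
    return [max(row) for row in t]


def spc_horizon_slack(Q, B, t, H):
    """Slack H - sum_i (Q_i / B_i) * t_Li of the SPC planning horizon constraint.
    A non-negative value means the batch sizes B fit within the horizon H."""
    t_L = limiting_times(t)
    return H - sum(q / b * tl for q, b, tl in zip(Q, B, t_L))


t = [[2.0, 4.0, 3.0],   # hypothetical times of product 1 in stages 1..3 (hours)
     [5.0, 1.0, 2.0]]   # product 2
slack = spc_horizon_slack(Q=[100.0, 60.0], B=[10.0, 20.0], t=t, H=60.0)
print(limiting_times(t), slack)   # [4.0, 5.0] 5.0
```

With these numbers, $t_L = (4, 5)$, and the products use $10 \cdot 4 + 3 \cdot 5 = 55$ of the 60 available hours.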
In [4] the authors formulated the batch process design
and scheduling problem as a MINLP. Their model
was based on the SPC model of [10]. In this problem, 
more than one piece of equipment per stage is available 
for use in parallel. Rather than solve the MINLP rigor- 
ously, they relaxed the number of units per stage to be 
continuous and solved the resulting NLP. [5] formu- 
lated the MINLP using binary 0-1 variables and solved 
it with an outer approximation method. In addition to 
the combinatorial nature of the problem due to integer 
variables, the solution of the problem is complicated by 
the nonconvex form of the planning horizon constraint. 

[2] developed extensions of the SPC formulation 
to allow more efficient utilization of the batch process 
equipment. They considered two mixed-product cam- 
paign (MPC) scheduling policies, 

i) the unlimited intermediate storage (UIS) policy; and 
ii) the zero-wait (ZW) policy. 

As its name implies, a mixed-product campaign allows
batches of different products to be interleaved. For
example, an SPC schedule for three batches each of two
products A and B would be AAABBB, while an MPC
schedule could be ABABAB. In the zero-wait policy,
when a product has completed processing in one stage,
it must immediately begin processing in the next stage.
Conversely, the UIS policy allows a product to be stored
for a period of time before beginning the next processing
step. [7] showed that for
the case of zero cleanup times, the UIS policy is the 
most efficient mixed-product campaign policy, while 
the ZW policy is the most conservative. [2] incorpo- 
rated the new scheduling policies into the batch process 
design problem by considering the characteristic cycle 
time for each policy. The cycle time becomes the ba- 
sis upon which the planning horizon constraint is im- 
posed. 
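The gain from mixed-product campaigns can be illustrated numerically. For UIS with zero cleanup times and no parallel units, the sketch assumes that over a long campaign the required time is set by the most heavily loaded stage, $\max_j \sum_i n_i t_{ij}$. SPC, by contrast, requires $\sum_i n_i \max_j t_{ij}$. A maximum of sums never exceeds the sum of the maxima, so under this assumption the UIS requirement is never larger. The data are hypothetical, and start-up and wind-down effects are ignored.

```python
def spc_time(n, t):
    """Time used by single product campaigns: sum_i n_i * max_j t_ij."""
    return sum(ni * max(row) for ni, row in zip(n, t))


def uis_time(n, t):
    """Time used by mixed-product campaigns under UIS with zero cleanup,
    approximated by the bottleneck stage load: max_j sum_i n_i * t_ij."""
    n_stages = len(t[0])
    return max(sum(ni * row[j] for ni, row in zip(n, t)) for j in range(n_stages))


t = [[2.0, 4.0, 3.0], [5.0, 1.0, 2.0]]   # t[i][j], hypothetical hours
n = [10, 3]                               # number of batches of each product
print(spc_time(n, t), uis_time(n, t))     # 55.0 43.0
```

Here SPC needs 55 hours, whereas under UIS the bottleneck (stage 2) needs only 43.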

[3] used the batch design formulation with mixed- 
product campaign schedules to formulate the batch 
synthesis, design and scheduling problem. In this for- 
mulation the number of stages, M, in the batch process 
is not fixed. Instead, each product is required to un- 
dergo the same sequence, T, of processing tasks. Units 
that can each perform one of the tasks are given, and
in addition, ‘superunits’ are postulated that can com- 
bine two or more tasks. The problem is to assign tasks 
to units, size the units, and determine the number of 
parallel units in the batch process. 


Problem Formulation 


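1) Binary variables

$$YEX_j = \begin{cases} 1 & \text{if unit } j \text{ exists,} \\ 0 & \text{otherwise,} \end{cases}$$

$$YC_{cj} = \begin{cases} 1 & \text{if unit } j \text{ contains } c \text{ parallel units,} \\ 0 & \text{otherwise,} \end{cases}$$

$$Y_{tj} = \begin{cases} 1 & \text{if task } t \text{ is assigned to unit } j, \\ 0 & \text{otherwise,} \end{cases}$$

$$YF_{tj} = \begin{cases} 1 & \text{if } t \text{ is the first task processed in unit } j, \\ 0 & \text{otherwise.} \end{cases}$$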


2) Design constraints

- The task volume requirement, $W_{it}$, depends on the batch
size, $B_i$, of each product and the size factor, $S_{it}$, for
each product in each task:

$$W_{it} = B_i \, S_{it}.$$

- The volume, $V_j$, of a processing unit j must be large
enough to accommodate task t if task t is assigned
to unit j ($Y_{tj} = 1$):

$$V_j \ge W_{it} - W^{U} (1 - Y_{tj}),$$

where $W^{U}$ is an upper bound on the task volumes.

- The processing time, $pt_{ij}$, of each product in
each unit is given by the corresponding time factor,
$t_{it}$, of each product in task t if task t is assigned
to unit j ($Y_{tj} = 1$):

$$pt_{ij} = \sum_t t_{it} \, Y_{tj}.$$

- The number of batches, $n_i$, multiplied by the
batch size must satisfy the production requirement,
$Q_i$, for each product:

$$n_i B_i = Q_i.$$

3) Parallel equipment constraints
- The number of parallel units in each stage j is
determined by the binary variables $YC_{cj}$ multiplied
by the number c:

$$N_j = \sum_c c \, YC_{cj}.$$
c 
4) Scheduling constraint

- For the UIS policy with zero cleanup times, the
planning horizon constraint derived by [2] is
used:

$$\sum_i n_i \, pt_{ij} \le H \, N_j.$$

5) Logical constraints
- If a stage j exists, then at least one processing task
must be assigned to it:

$$\sum_t Y_{tj} \ge YEX_j.$$

- If a stage j does not exist, no tasks can be
assigned to it:

$$Y_{tj} \le YEX_j.$$

- If a stage j exists, then one of the tasks assigned
to it must be the first task assigned to stage j:

$$\sum_t YF_{tj} = YEX_j.$$

- There cannot be more than one first task
assigned to each stage:

$$\sum_t YF_{tj} \le 1.$$

- A task can be the first task assigned to a stage
only if the task is among those assigned to the
stage:

$$YF_{tj} \le Y_{tj}.$$

- No task that occurs before the first task assigned
to stage j can be among those assigned to the
stage:

$$Y_{t'j} \le 1 - YF_{tj} \quad \text{for } t' < t.$$

- If multiple tasks are assigned to a unit, they must
be consecutive tasks:

$$Y_{tj} \le YF_{tj} + Y_{t-1,j}.$$

- Exactly one of the binary variables that determine
the number of parallel units in stage j must
be active:

$$\sum_c YC_{cj} = 1.$$


6) Objective function
- The objective is to minimize the cost of the plant.
[3] used a fixed-charge cost for each unit, $\gamma_j$, plus
a nonlinear cost function on the size of the unit:

$$\text{Cost} = \sum_j N_j \left[ \gamma_j + \alpha_j V_j^{\beta_j} \right].$$


This formulation is a MINLP where all binary variables 
participate linearly and separably. However, it is a non- 
convex problem due to the cost function, and the bi- 
linear terms in the batch size constraints and the plan- 
ning horizon constraints. [3] used the outer approxima- 
tion method implemented in DICOPT ([11]) to solve 
a number of example problems. Due to the nonconvex- 
ities in the formulation, there is no guarantee of global 
optimality with the outer approximation method, but 
they report good results for the examples presented in 
the paper. 
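A candidate solution of this formulation can be checked with a few lines of code. The sketch below is hypothetical. `check_unit` tests the logical constraints for a single unit, taking $Y_{tj} \le YEX_j$ as the form of the rule that a missing stage receives no tasks. `evaluate_design` computes:

- the unit volumes from $V_j \ge B_i S_{it}$ over the tasks assigned to j;
- the processing times $pt_{ij} = \sum_t t_{it} Y_{tj}$;
- the numbers of batches $n_i = Q_i / B_i$;
- the UIS horizon test $\sum_i n_i \, pt_{ij} \le H N_j$;
- an assumed cost $\sum_j N_j (\gamma_j + \alpha_j V_j^{\beta_j})$.

All data are invented for illustration.

```python
def check_unit(Y, YF, YEX):
    """Check the logical constraints for one unit j.
    Y[t], YF[t] in {0, 1} for tasks t = 0..T-1 in processing order; YEX in {0, 1}."""
    T = range(len(Y))
    return (sum(Y) >= YEX                                        # existing unit gets a task
            and all(Y[t] <= YEX for t in T)                      # missing unit gets none
            and sum(YF) == YEX                                   # one first task if it exists
            and sum(YF) <= 1
            and all(YF[t] <= Y[t] for t in T)                    # first task is assigned
            and all(Y[tp] <= 1 - YF[t] for t in T for tp in range(t))   # none before it
            and all(Y[t] <= YF[t] + (Y[t - 1] if t > 0 else 0) for t in T))  # consecutive


def evaluate_design(assign, N, B, Q, S, tau, H, gamma, alpha, beta):
    """assign[j]: tasks performed by unit j; N[j]: parallel units;
    B[i], Q[i]: batch size and production requirement of product i;
    S[i][t], tau[i][t]: size and time factors. Returns (feasible, V, cost)."""
    products = range(len(B))
    n = [Q[i] / B[i] for i in products]                         # n_i B_i = Q_i
    V, feasible = {}, True
    for j, tasks in assign.items():
        V[j] = max(B[i] * S[i][t] for i in products for t in tasks)   # V_j >= W_it
        pt = [sum(tau[i][t] for t in tasks) for i in products]        # pt_ij
        if sum(n[i] * pt[i] for i in products) > H * N[j]:           # UIS horizon
            feasible = False
    cost = sum(N[j] * (gamma[j] + alpha[j] * V[j] ** beta[j]) for j in assign)
    return feasible, V, cost


print(check_unit(Y=[0, 1, 1, 0], YF=[0, 1, 0, 0], YEX=1))   # True
print(check_unit(Y=[1, 0, 1, 0], YF=[1, 0, 0, 0], YEX=1))   # False (not consecutive)
feasible, V, cost = evaluate_design(
    assign={"A": [0], "B": [1, 2]}, N={"A": 2, "B": 2},
    B=[50.0, 40.0], Q=[1000.0, 800.0],
    S=[[2.0, 3.0, 1.0], [1.5, 2.0, 2.0]], tau=[[2.0, 1.0, 1.0], [3.0, 2.0, 1.0]],
    H=60.0, gamma={"A": 10.0, "B": 10.0}, alpha={"A": 2.0, "B": 2.0},
    beta={"A": 0.5, "B": 0.5})
```

Unit "A" and superunit "B" each use 100 of the $H N_j = 120$ available unit-hours, so this design is feasible, with volumes of 100 and 150.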

Two examples are briefly discussed to illustrate the 
proposed approach for multiproduct batch plants with 
a variety of scheduling policies. The first example con- 
sists of three products with four processing tasks and 
five potential units and superunits. The MINLP formu- 
lation with the SPC policy contains 33 binary variables 
and 54 continuous variables. With the ZW policy, the 
number of binary variables drops to 8, with 98 contin- 
uous variables. For the UIS policy, the formulation has 
33 binary variables with 51 continuous variables. 

The second example is larger and contains 6 prod- 
ucts with 7 potential units and superunits. The SPC pol- 
icy formulation contains 46 binary variables and 101 
continuous variables. The MINLP formulation for the 
ZW policy has 11 binary and 374 continuous variables. 
The UIS policy formulation has 46 binary and 95 con- 
tinuous variables. In all cases the examples were solved 
in less than 50 minutes using GAMS/DICOPT++ on a
MicroVAX II.


Discrete-Time Formulations 


A.P.F.D. Barbosa-Póvoa and S. Macchietto, [1], proposed
an MILP formulation for the problem of
optimal batch plant design that simultaneously
optimizes the production schedule. They based their
formulation on

a) an extended state-task-network (mSTN) representation
of the batch plant; and
b) the discrete time representation using uniform time
discretization.

In the STN representation, proposed in [6], all the
materials are represented as states processed through
a set of processing steps ('tasks'). To incorporate
connectivity constraints, the extended state-task-network
(mSTN) is proposed. It includes the alternative
design configurations that arise from all permitted
equipment and connection allocations. A single campaign is
assumed, with a cyclic schedule of cycle time T repeated
over a planning horizon H. A cycle represents a
sequence of operations involving the production of all
products and the utilization of all resources. The
operational characteristics, such as the allocation of
equipment to tasks, batch sizes, task timings, transport of
material and storage profiles, are identical in each cycle.

The mathematical formulation they proposed involves:

• allocation constraints for the assignment of the tasks
to the units;
• capacity constraints expressing the limiting equipment
capability;
• connectivity constraints for determining the connection
of different units;
• dedicated storage constraints;
• mass balances;
• production requirement constraints;
• an objective function, which is chosen to be either
the minimization of the capital cost or the
maximization of plant profit.

The main variables of the formulation are: 

a) binary structural variables representing the existence
of equipment;

b) binary allocation variables for the assignment of 
a task to a unit at the beginning of a time period; 

c) continuous variables representing the capacity of 
a unit; 

d) continuous variables corresponding to the batch 
size of a task to a unit at each time period; 

e) amount of material delivered and received at each 
time period; 

f) the amount of material transferred at each time
period; and

g) the amount of material stored at each time period. 

The proposed formulation corresponds to a mixed integer
linear programming (MILP) problem, since the authors
used linear cost functions to express the capital cost of
equipment and time discretization to represent time.


Three examples were solved illustrating: 

a) the effect of limited connectivity and connection 
cost in the optimal design; 

b) the advantages of considering simultaneously the 
plant design and plant connectivity rather than op- 
timizing first the equipment sizes and then optimiz- 
ing plant connectivity. 

In later work, Barbosa-Póvoa and C.C. Pantelides,
[1], proposed a new mathematical formulation for the
optimization of batch plant design considering detailed
operation characteristics (i.e., short-term scheduling).
This formulation also uses a uniform time
discretization; the only difference lies in the plant
representation. The resource-task network (RTN) plant
representation, [9], was used, which corresponds to
a more general and uniform description of all available
production resources. Otherwise, the new formulation
shares the main characteristics of the previously
presented one, with the same basic variables and
constraints.

Both formulations share the limitations of the dis- 
crete time formulations, which are that: 

i) they correspond to an approximation of the time 
horizon; and 

ii) they result in an unnecessary increase of the number 
of binary variables in particular, and in the overall 
size of the mathematical model. 
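Limitation (ii) is easy to quantify. The sketch below counts only the task-to-unit allocation binaries of a uniform-time model, one per task, suitable unit and time interval. Structural and connectivity binaries are ignored, and the task/unit data are hypothetical. Halving the time step doubles the count.

```python
import math


def allocation_binaries(suitable_units, horizon, step):
    """Number of allocation binaries W[i, j, t] in a uniform time grid:
    one per (task i, suitable unit j, interval t). Other binaries are not counted."""
    n_intervals = math.ceil(horizon / step)
    return sum(len(units) for units in suitable_units) * n_intervals


tasks = [[1, 2], [1, 2], [3], [3, 4], [4]]   # hypothetical suitable units per task
for step in (1.0, 0.5, 0.25):
    print(step, allocation_binaries(tasks, horizon=24.0, step=step))
# 1.0 192
# 0.5 384
# 0.25 768
```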

A continuous-time formulation was proposed in [12], 
based on the STN representation and the scheduling 
formulation proposed in [13]. It gives rise to a mixed in- 
teger nonlinear programming problem which is solved 
using a stochastic MINLP optimizer based on an evolu- 
tionary algorithm (EA) with simulated annealing (SA) 
presented in [12]. The method is based on a guided 
stochastic generation of alternative vectors of decision 
variables, which explore promising areas of the search 
space through selection, crossover, and mutation oper- 
ations applied to individuals in a population of solu- 
tion candidates. It can be used to deal with nonconvex, 
nondifferentiable functions although it has no guaran- 
tee of convergence to even a local optimal solution. The 
proposed formulation involves the following basic vari- 
ables: 

• Main design variables, representing the discrete
decisions of selecting a unit (j), $E_j$, or a storage tank (s), $E_s$,
and the continuous decisions on the capacity
of a unit, storage tank or utility, $V_j$, $V_s$ and $U_u$, respectively.
• Main operation variables, corresponding to the
discrete decision of allocating task (i) to unit (j) at
time $T_l$, $W_{ijl}$, and the decision of assigning task (i)
to unit (j) between starting time $T_l$ and end time
$T_{l'}$, $X_{ijll'}$; and the continuous variables: the time of event (l),
$T_l$, and the batch size, the processing time and the utility
requirement of task (i) allocated to unit (j) starting at
$T_l$, $B_{ijl}$, $\tau_{ijl}$ and $U^u_{ijl}$, respectively.

Based on these variables, the proposed formulation
involves:

1) Processing task models:

$$U^u_{ijl} = \alpha^u_{ij} + \beta^u_{ij} B_{ijl} \quad \forall i, j, l, u,$$

expressing the consumption/generation of utilities
as a function of batch size;

$$\tau_{ijl} = \alpha_{ij} + \beta_{ij} B_{ijl} + \sum_u \gamma^u_{ij} U^u_{ijl} + \cdots \quad \forall i, j, l,$$

expressing the dependence of the processing time, $\tau_{ijl}$,
on the batch size, $B_{ijl}$, the utilities, $U^u_{ijl}$, and unit
availability.

2) Batch size constraints:

$$\phi^{\min}_{ij} V_j W_{ijl} \le B_{ijl} \le \phi^{\max}_{ij} V_j W_{ijl},$$

imposing the minimum and maximum capability
of unit (j) when task (i) is performed.
3) Timing constraints:

$$W_{ijl} \, (\tau_{ijl} + T_l) = \sum_{l' > l} X_{ijll'} \, T_{l'},$$

which establish the relationship between the processing
time, $\tau_{ijl}$, and the time of event (l), $T_l$;

$$0 \le T_1 \le T_2 \le \cdots \le T_{l_{\max}} \le H,$$

expressing the monotonic increase in event times.
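4) Allocation constraints:

$$0 \le \sum_{i \in I_j} \sum_{l'' \le l} W_{ijl''} - \sum_{i \in I_j} \sum_{l' \le l} \sum_{l'' < l'} X_{ijl''l'} \le E_j,$$

$$\sum_{i \in I_j} \sum_{l \le l_{\max}} W_{ijl} = \sum_{i \in I_j} \sum_{l < l' \le l_{\max}} X_{ijll'},$$

$$W_{ijl} = \sum_{l' > l} X_{ijll'},$$

expressing the relationship between the $W_{ijl}$ and $X_{ijll'}$
operation variables, [13].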


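5) Material balances written for state s at event time
$T_{l'}$:

$$C_{sl'} = C_{s,l'-1} + \sum_{i \in \bar I_s} \bar\alpha_{is} \sum_{j \in J_i} \sum_{l < l'} B_{ijl} X_{ijll'} - \sum_{i \in I_s} \alpha_{is} \sum_{j \in J_i} B_{ijl'},$$

$$0 \le C_{sl'} \le V_{s0} + V_s.$$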


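6) Utility constraints written for utility (u) at event
time $T_{l'}$:

$$U_{ul'} = U_{u,l'-1} + \sum_{i \in I_u} \sum_{j \in J_i} \sum_{l < l'} U^u_{ijl} X_{ijll'} - \sum_{i \in I_u} \sum_{j \in J_i} U^u_{ijl'} W_{ijl'},$$

$$0 \le U_{ul'} \le U_u,$$

$$U^{e}_u = \sum_{i \in I_u} \sum_{j \in J_i} \sum_l U^u_{ijl} \, \tau_{ijl} \, W_{ijl}.$$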


7) Availability constraints written for unit (j) at event
time $T_{l'}$:
Sve =D ASv at Wi — D0 BE Wir 
i€l; i€l; 
a a 
ASy Sova Wi. 


i€]j 


8) Existence constraints:

$$\sum_{i \in I_j} W_{ijl} \le E_j,$$

$$V^{\min}_j E_j \le V_j \le V^{\max}_j E_j,$$

$$V^{\min}_s E_s \le V_s \le V^{\max}_s E_s,$$

which express logical restrictions on the size of a
production unit or storage tank, applying only if that unit
or tank is present in the optimal design.
9) Production constraints:

$$C_{s l_{\max}} \ge R_s,$$

expressing the requirement to produce at least as
much as the market demands for state (s).
10) Objective function:

$$\text{Profit} = \sum_{s \in S_p} p_s C_{s l_{\max}} + \sum_{s \in S_I} p_s \left( C_{s l_{\max}} - C_{s0} \right) - \sum_{s \in S_f} p_s C_{s0} - \sum_u c_u U^{e}_u;$$

the first two terms represent the revenue from
product and intermediate state production,
respectively, whereas the last two terms express the cost
of raw materials and utilities, respectively;

$$\text{Cost} = \sum_j \left( \alpha_j E_j + \beta_j V_j^{\gamma_j} \right) + \sum_s \left( \alpha_s E_s + \beta_s V_s^{\gamma_s} \right);$$

the first term represents the cost of installing
production unit (j), whereas the second term
corresponds to the cost of storage tank (s);

$$\text{Objective} = \text{Cost} - \text{Profit}.$$


The above formulation corresponds to an MINLP problem
with decision variables $W_{ijl}$, $X_{ijll'}$, $B_{ijl}$, $U^u_{ijl}$ and $T_l$,
which correspond to plant operation, and $E_j$, $E_s$, $V_j$, $V_s$, $U_u$,
which represent design decisions. Nonconvexities appear
in the timing constraints, material balances and utility
constraints as bilinear products of binary and continuous
variables, and in the objective function as power terms
of the type $V_j^{\gamma_j}$ and $V_s^{\gamma_s}$. The authors proposed an
evolutionary algorithm (EA) with simulated annealing
(SA), [12], to solve this problem. They used simulated
annealing to compensate for the poor local search
ability of the EA. They also proposed an encoding procedure
that reduces the number of constraints
and variables by up to 50%. In particular, they exploited
the mathematical structure of the problem in the
following sense. If $W_{ijl} = 1$ and $X_{ijll'} = 1$, unit j exists
and executes an operation k which starts at $ST^j_k = l$ and finishes
at $FT^j_k = l'$, involving task $TS^j_k = i$ with batch size
$BS^j_k = B_{ijl}$ and utility usage $U^j_k = U^u_{ijl}$. They therefore proposed
to replace $W_{ijl}$, $X_{ijll'}$, $B_{ijl}$ and $U^u_{ijl}$ by the operation
sequence of tasks in units: task sequence $TS^j = (i_1, \ldots, i_{n_j})$,
task batch sizes $BS^j = (B_1, \ldots, B_{n_j})$, task utility
usage $U^j = (U_1, \ldots, U_{n_j})$, start times $ST^j = (l_1, \ldots, l_{n_j})$
and finish times $FT^j = (l'_1, \ldots, l'_{n_j})$. In this way the
decision variables become $(E_j, V_j, E_s, V_s, U_u, T_l, TS^j,
BS^j, U^j, ST^j, FT^j)$. The algorithm starts with an initial
guess and evolves a number of candidate instances
for these variables. The allocation and capacity
constraints are automatically satisfied by each candidate
solution, and the $T_l$ are chosen so that the timing constraints
are also satisfied. Two examples are presented to
illustrate the applicability of the proposed approach to
batch design problems involving detailed scheduling
constraints. Linear and nonlinear task processing
times and unit cost models are considered for both
examples. For the first example, with linear functions
for processing times and unit cost models, the
results are compared with a discrete time formulation,
[8], and the proposed approach is found to outperform it both in the
number of variables, which is expected since the
formulation is based on a continuous time description,
and in the computational requirement for the solution of
the model. With nonlinear models for processing
times and unit costs, the resulting model for
a problem with 4 production units, 4 storage tanks, 5
tasks and 4 states involves 62 integer and 34 continuous
variables and 122 constraints. This example was
the largest presented in this work and required
considerable computational effort: 7849.23 CPU seconds on
a SUN ULTRAstation-1.
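The encoding idea can be illustrated by decoding one unit's operation sequence back into the W and X variables. The function names and data layout below are hypothetical, not the authors' implementation. The unit index j is left implicit, event indices are integers, and operations are assumed to be listed in time order. Each operation contributes exactly one $W_{ijl}$ and one $X_{ijll'}$ with $l' > l$, and operations do not overlap. As a result, $W_{ijl} = \sum_{l'>l} X_{ijll'}$ holds by construction, and at most one operation is in progress at any event, which is what the allocation constraints enforce.

```python
def decode_unit_sequence(TS, ST, FT, BS):
    """Decode one unit's operation sequence (unit index j left implicit).
    TS[k]: task of operation k; ST[k], FT[k]: start and finish event indices;
    BS[k]: batch size. Returns dicts W[(i, l)], X[(i, l, l2)] and B[(i, l)]."""
    W, X, B = {}, {}, {}
    prev_finish = 0
    for i, l, l2, b in zip(TS, ST, FT, BS):
        if not prev_finish <= l < l2:
            raise ValueError("operations must be ordered and must not overlap")
        W[(i, l)] = 1          # task i starts in this unit at event l
        X[(i, l, l2)] = 1      # ... and occupies the unit until event l2 > l
        B[(i, l)] = b
        prev_finish = l2
    return W, X, B


def max_in_progress(W, X, n_events):
    """Max over events l of (operations started by l) - (operations finished by l);
    the allocation constraints bound this quantity by E_j (= 1 for an existing unit)."""
    return max(
        sum(v for (_, s), v in W.items() if s <= l)
        - sum(v for (_, _, e), v in X.items() if e <= l)
        for l in range(n_events))


W, X, B = decode_unit_sequence(TS=[1, 3], ST=[0, 2], FT=[2, 3], BS=[80.0, 50.0])
print(max_in_progress(W, X, n_events=4))   # 1
```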

Decomposition methods, such as the classical Benders
decomposition (cf. Generalized Benders decomposition),
[1], and Dantzig-Wolfe decomposition, [3], have
been used to solve many different large structured opti- 
mization problems, by decomposing them with the help 
of relaxation of constraints or fixation of variables. The 
success of such an approach depends very much on the 
structure of the problem. In some cases these methods 
are very efficient, but in other cases they are not com- 
petitive with other techniques. 

However, the simple elegance of these basic princi- 
ples has inspired many researchers to propose modifi- 
cations of the basic methods, mostly aimed at improv- 
ing the efficiency of the methods, but also aimed at ex- 
tending the applicability of the approaches. 

Dantzig-Wolfe decomposition, originally for linear 
programming problems, [3], has been extended to con- 
vex nonlinear programming problems, [2], under sev- 
eral names, for example generalized linear program- 
ming. We will here simply use the term ‘nonlinear 
Dantzig-Wolfe decomposition’. 

Benders decomposition, originally for linear mixed
integer programming problems, [1], has been extended 
to partly convex nonlinear programming problems, [5], 
under the name ‘generalized Benders decomposition’. 

On the other hand, among the numerous sugges- 
tions for modifications to increase the efficiency, there 
is one which in a way shares the simplicity and clear 
principle of the basic methods, namely cross decom- 
position, [11]. Usually described as a combination of 
Benders decomposition and Dantzig-Wolfe decompo- 
sition, simultaneously using the two methods in an it- 
erative manner, the method borrows its basic conver- 
gence properties from these two methods. However, 
one can also view cross decomposition as the more gen- 
eral method, and Benders and Dantzig-Wolfe decom- 
position as modifications of cross decomposition, ob- 
tained by excluding one of the subproblems and one of 
the master problems. 

Cross decomposition was originally developed for 
linear mixed integer programming problems, [11], but 
the approach is more general and not restricted to such 
problems. The first application of cross decomposition 
was to the capacitated facility location problem, [12], 
and produced a solution method which is recognized 
as one of the most efficient existing methods for that 
problem. However, another early application was to the 
stochastic transportation problem (a convex problem 
with linear parts), [10]. 

Here we will describe ‘generalized cross decompo- 
sition’, which was first proposed in [6], and more thor- 
oughly treated in [7]. The generalization of the proce- 
dure, parallel to that in [5] for generalized Benders de- 
composition, enables the solving of nonlinear program- 
ming problems with convex parts, for example nonlin- 
ear mixed integer programming problems, see for ex- 
ample [4]. 


The Problem 


Consider the following general optimization problem. 


$$\text{(P)} \quad \begin{aligned} v^* = \min \; & f(x, y) \\ \text{s.t.} \; & G_1(x, y) \le 0, \\ & G_2(x, y) \le 0, \\ & x \in X, \\ & y \in Y, \end{aligned}$$

where X and Y are compact, nonempty sets. Assume
that X is convex and that f, $G_1$ and $G_2$ are proper convex
functions in x for any fixed $y \in Y$, i.e. that the problem
is convex in x. Also assume that f, $G_1$ and $G_2$ are
bounded and Lipschitzian on (X, Y). Note that we do
not assume any convexity in the y-variables. An
important case is when Y is a (finite) set of integers.

Furthermore, we assume the following (as was done
in [5] for generalized Benders decomposition). It must
be possible to optimize the Lagrangian functions with
respect to x 'essentially independently' of y (called
property P by A.M. Geoffrion). We therefore assume
that functions $q_1$, $q_2$, $q_3$ and $q_4$ exist such that

$$f(x, y) + u_1^T G_1(x, y) + u_2^T G_2(x, y) = q_1(q_3(x, u), y, u), \quad \forall x, y, u,$$

and

$$\tilde u_1^T G_1(x, y) + \tilde u_2^T G_2(x, y) = q_2(q_4(x, \tilde u), y, \tilde u), \quad \forall x, y, \tilde u,$$

where $q_3$ and $q_4$ are scalar functions, $q_1$ and $q_2$ are increasing
in their first argument, and $\tilde u$ is assumed to belong
to the set of all possible nonnegative, normalized
directions $C = \{\tilde u \ge 0 : e^T \tilde u = 1\}$, where e is a vector
of ones. Since f, $G_1$ and $G_2$ are convex in x and
bounded and Lipschitzian on (X, Y), the same applies
to $q_3$ for any fixed $u \ge 0$, and to $q_4$ for any fixed
$\tilde u \in C$.

The optimal solution of (P) is denoted by $(x^*, y^*)$.
We will also mention the case when (P) is convex, i.e.
where f, $G_1$ and $G_2$ are convex functions (in y too) and
Y is a convex set. Lagrangian duality can be used to get
a dual solution (the optimal Lagrange multipliers),
denoted by $u^* = (u_1^*, u_2^*)$.

For convenience, we introduce the following
notation:

$$L(x, y, u) = f(x, y) + u_1^T G_1(x, y) + u_2^T G_2(x, y),$$

$$\tilde L(x, y, \tilde u) = \tilde u_1^T G_1(x, y) + \tilde u_2^T G_2(x, y),$$

$$L_1(x, y, u_1) = f(x, y) + u_1^T G_1(x, y),$$

$$\tilde L_1(x, y, \tilde u_1) = \tilde u_1^T G_1(x, y).$$


The Primal Master Problem 


Using the primal structure of (P), we can rewrite it as

$$v^* = \min_{y \in V} h(y),$$

where, $\forall y \in V$,

$$h(y) = \min_{x \in X} \; f(x, y) \quad \text{s.t.} \quad G_1(x, y) \le 0, \; G_2(x, y) \le 0,$$

and

$$V = \{ y \in Y : \exists x \in X : G_1(x, y) \le 0, \; G_2(x, y) \le 0 \}.$$

The problem is convex in x, so we can use Lagrangian
duality to get, $\forall y \in V$,

$$h(y) = \max_{u \ge 0} \min_{x \in X} L(x, y, u).$$

A similar expression can be obtained for V:

$$V = \Big\{ y \in Y : \max_{\tilde u \in C} \min_{x \in X} \tilde L(x, y, \tilde u) \le 0 \Big\}.$$

The full primal master problem is given below:

$$\begin{aligned} v^* = \min \; & q \\ \text{s.t.} \; & q \ge \min_{x \in X} L(x, y, u), \quad \forall u \ge 0, \\ & 0 \ge \min_{x \in X} \tilde L(x, y, \tilde u), \quad \forall \tilde u \in C, \\ & y \in Y. \end{aligned}$$

This problem has an infinite number of constraints,
one for each nonnegative dual point and one for each
nonnegative dual direction. Each constraint contains an
optimization problem (minimization with respect to x),
which in theory would have to be solved for all $y \in Y$ before
the main problem, $\min_{y \in Y} h(y)$, can be solved. However,
we have

$$\min_{x \in X} L(x, y, u) = q_1\Big( \min_{x \in X} q_3(x, u), \, y, \, u \Big)$$

and

$$\min_{x \in X} \tilde L(x, y, \tilde u) = q_2\Big( \min_{x \in X} q_4(x, \tilde u), \, y, \, \tilde u \Big).$$

Since $q_3$ and $q_4$ are proper, convex, bounded and
Lipschitzian on X, and X is compact and convex, the
optima in x (for fixed u and $\tilde u$) will be attained. Since $q_1$ and
$q_2$ are increasing in their first argument, the minimization
in x can be performed on $q_3$ and $q_4$ instead, so
the value of y does not influence the result of this
minimization. The minimization over x can therefore be
performed once (for any y), and the result then holds
for all $y \in Y$.

The relaxed primal master problem contains only
a finite number of cuts (with index sets $P_u$ and $R_u$),
which give an approximate description of h(y) and
V, and an optimal objective function value $v_{PM} \le v^*$.
Since the part of the problem that is described by the
constraints is convex in x, $v_{PM}$ will converge
asymptotically towards $v^*$ as the sets of constraints grow.
The constraints can now be expressed as

$$q \ge q_1\Big( \min_{x \in X} q_3(x, u^{(k)}), \, y, \, u^{(k)} \Big), \quad \forall k \in P_u,$$

$$0 \ge q_2\Big( \min_{x \in X} q_4(x, \tilde u^{(k)}), \, y, \, \tilde u^{(k)} \Big), \quad \forall k \in R_u.$$

The minimization in x can now be performed independently
in each constraint, since the other arguments of
$q_3$ and $q_4$, namely $u^{(k)}$ and $\tilde u^{(k)}$, are fixed. Since the minima
are attained, we use the notation $x^{(k)}$, $\forall k \in P_u$, and $\tilde x^{(k)}$,
$\forall k \in R_u$, for the minimizers of $q_3$ and $q_4$. Inserting this,
we obtain the final form of the relaxed primal master
problem:

$$\text{(PM)} \quad \begin{aligned} v_{PM} = \min \; & q \\ \text{s.t.} \; & q \ge L(x^{(k)}, y, u^{(k)}), \quad \forall k \in P_u, \\ & 0 \ge \tilde L(\tilde x^{(k)}, y, \tilde u^{(k)}), \quad \forall k \in R_u, \\ & y \in Y. \end{aligned}$$

The constraints in the first set are called value cuts,
and those in the second set are called feasibility cuts.

The Dual Master Problem

Using Lagrangian duality on (P) yields a relaxation
and a lower bound, $v_D$, on $v^*$:

$$v_D = \max_{u_1 \ge 0} g(u_1),$$
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where, Vu, > 0, 


gui) = min L(x, y, u1) 
s.t. Go(x, y) <0 
xEex 
yey. 


This leads to a dual master problem, which is a convexification of the problem. If (P) is not convex, a duality gap might occur. We denote the subset of the solutions that are included by $(x^{(k)}, y^{(k)})$, $\forall k \in P_X$, and obtain the restricted dual master problem as

$$
\text{(DM)} \qquad
\begin{aligned}
v_{DM} = \max \;& q \\
\text{s.t. } & q \le L_1(x^{(k)}, y^{(k)}, u_1), && \forall k \in P_X, \\
& u_1 \ge 0.
\end{aligned}
$$
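When Y is a finite discrete set (the case discussed further below), the relaxed primal master problem (PM) can be solved by plain enumeration. The sketch below illustrates this and is not the method of the original article. It assumes each stored cut is available as a Python callable of y: a value cut returns $L(x^{(k)}, y, u^{(k)})$ and a feasibility cut returns $\bar L(\bar x^{(k)}, y, \bar u^{(k)})$. The function name `solve_relaxed_pm` and the example cuts are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Optional, Sequence, Tuple

Cut = Callable[[Hashable], float]

def solve_relaxed_pm(
    Y: Iterable[Hashable],
    value_cuts: Sequence[Cut],
    feas_cuts: Sequence[Cut],
) -> Tuple[float, Optional[Hashable]]:
    """Solve (PM) over a finite Y by enumeration; return (v_PM, minimizing y).

    Returns (inf, None) if every y is cut off, and (-inf, y) if there are
    no value cuts yet (the relaxation is then unbounded).
    """
    best_q, best_y = float("inf"), None
    for y in Y:
        # Feasibility cuts: 0 >= Lbar(xbar^(k), y, ubar^(k)) for every k in R_U.
        if any(cut(y) > 0 for cut in feas_cuts):
            continue
        # Value cuts: q >= L(x^(k), y, u^(k)); the least such q is their maximum.
        q = max((cut(y) for cut in value_cuts), default=float("-inf"))
        if q < best_q:
            best_q, best_y = q, y
    return best_q, best_y

# Hypothetical cuts on Y = {0, 1, 2, 3}: two value cuts and one feasibility cut.
value_cuts = [lambda y: 5 - y, lambda y: 1 + y]
feas_cuts = [lambda y: y - 2.5]          # cuts off y = 3
print(solve_relaxed_pm(range(4), value_cuts, feas_cuts))  # (3, 2)
```

Each new subproblem solution adds one callable to one of the lists, so the enumerated lower envelope tightens monotonically. This matches the statement that $v_{PM}$ approaches $v^*$ as the cut sets grow.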
The Subproblems 


The primal subproblem is a convex problem in x, obtained by fixing y to $\bar y$:

$$
\text{(PS)} \qquad
h(\bar y) = \min \; f(x, \bar y) \quad \text{s.t.} \quad G_1(x, \bar y) \le 0, \;\; G_2(x, \bar y) \le 0, \;\; x \in X.
$$

A solution to (PS) is assumed to consist of both a primal solution, $x^{(k)}$, and a dual solution, $(u_1^{(k)}, u_2^{(k)})$. Due to the convexity we can use Lagrangian duality without creating a duality gap:

$$
\text{(PSL)} \qquad h(\bar y) = \sup_{u \ge 0} \; \min_{x \in X} \; L(x, \bar y, u).
$$

If (PS) is infeasible, (PSL) will be unbounded in u, and a solution is represented by a direction, $\bar u^{(k)}$. A valid cut for the primal master problem also requires a corresponding primal solution, $\bar x^{(k)}$, obtained by solving

$$
\min_{x \in X} \; \bar L(x, \bar y, \bar u^{(k)}).
$$

(Note that $\bar x^{(k)}$ is not feasible in (PS).)


The dual subproblem is the following (nonconvex) problem, obtained by relaxing the first set of constraints in (P) and fixing the Lagrange multipliers $u_1$ to $\bar u_1$:

$$
\text{(DS)} \qquad
g(\bar u_1) = \min \; L_1(x, y, \bar u_1) \quad \text{s.t.} \quad G_2(x, y) \le 0, \;\; x \in X, \;\; y \in Y.
$$

To handle unbounded dual solutions, $\bar u_1$, we can use the following subproblem:

$$
\text{(UDS)} \qquad
\bar g(\bar u_1) = \min \; \bar L_1(x, y, \bar u_1) \quad \text{s.t.} \quad G_2(x, y) \le 0, \;\; x \in X, \;\; y \in Y.
$$

(UDS) does not produce a bound on $v^*$, but if $\bar g(\bar u_1) < 0$ it yields a dual cut that will eliminate $\bar u_1$.


The Cross Decomposition Algorithm 


In the subproblem phase of the cross decomposition method we iterate between the primal subproblem (PS) and the dual subproblem (DS) (or (UDS)).

The primal subproblem, (PS), supplies an upper bound, $h(\bar y)$, on $v^*$, and $\bar u_1$ for the dual subproblem. The dual subproblem, (DS), supplies a lower bound, $g(\bar u_1)$, on $v^*$, and $\bar y$ for the primal subproblem. If (PS) has an unbounded solution, $\bar u_1$, we use (UDS) (instead of (DS)) to get $\bar y$.

Unfortunately, the lack of controllability for the im- 
portant parts of the solutions, y and u, which occurs 
unless the problem is strictly convex, implies that this 
procedure alone cannot be expected to converge to the 
optimal solution. 

We therefore need to use the master problems to ensure convergence. (PM) or (DM) can be solved with all the constraints generated by the subproblem solutions. We have all the known results for generalized Benders or nonlinear Dantzig-Wolfe decomposition to fall back on, so this technique is well known. After the solution of one master problem, the subproblem phase is reentered. (We do not switch to Benders or Dantzig-Wolfe decomposition completely.)
MINLP: Generalized Cross Decomposition, Figure 1 (diagram: dual subproblem, primal subproblem, master problem)


We will later describe convergence tests that tell us exactly when to use a master problem. The existence of such convergence tests is a very important aspect of cross decomposition. Before going any further, let us give a short statement of the cross decomposition algorithm.

Let us denote the convergence test in step 3 (before (PS)) by CTP and the convergence test in step 6 (before (DS)) by CTD. The optimality tests (steps 2 and 5) are included in the convergence tests, and the decision about where to go is based on the results of both tests. The algorithm is pictured in Fig. 1.


0  Get a starting $\bar u_1$.

1  Solve (DS) (or (UDS)).

2  IF optimal, go to 8.

3  IF not convergence, go to 7A (or 7B).

4  Solve (PS).

5  IF optimal, go to 8.

6  IF not convergence, go to 7B (or 7A). ELSE go to 1.

7A Solve (PM). Go to 4.

7B Solve (DM). Go to 1.

8  Stop. The solution from (PS) is optimal.
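The control flow of steps 0 to 8 can be sketched in Python as a small state machine. This is a hypothetical illustration. The subproblem solvers, master problems and tests are passed in as callables, and names such as `solve_ds` and `is_optimal` are made up here. A failed CTP always leads to (PM). A failed CTD leads to (DM), or to (PM) when `use_dm` is false, which is one of the options the step list allows.

```python
from typing import Callable, List

def cross_decomposition(
    solve_ds: Callable[[], None],
    solve_ps: Callable[[], None],
    ctp: Callable[[], bool],
    ctd: Callable[[], bool],
    is_optimal: Callable[[], bool],
    solve_pm: Callable[[], None],
    solve_dm: Callable[[], None],
    use_dm: bool = True,
    max_steps: int = 10_000,
) -> List[str]:
    """Run steps 1-8 and return the sequence of problems solved.

    Step 0 (choosing a starting u1) is left to the caller. ctp()/ctd()
    return True when the convergence test does NOT fail.
    """
    trace: List[str] = []
    step = "1"
    for _ in range(max_steps):
        if step == "1":                       # step 1: solve (DS) or (UDS)
            solve_ds()
            trace.append("DS")
            if is_optimal():                  # step 2
                step = "8"
            else:                             # step 3: CTP before (PS)
                step = "4" if ctp() else "7A"
        elif step == "4":                     # step 4: solve (PS)
            solve_ps()
            trace.append("PS")
            if is_optimal():                  # step 5
                step = "8"
            elif ctd():                       # step 6: CTD before (DS)
                step = "1"
            else:
                step = "7B" if use_dm else "7A"
        elif step == "7A":                    # primal master problem
            solve_pm()
            trace.append("PM")
            step = "4"
        elif step == "7B":                    # dual master problem
            solve_dm()
            trace.append("DM")
            step = "1"
        else:                                 # step 8: stop
            trace.append("STOP")
            return trace
    raise RuntimeError("step limit reached without convergence")

# Scripted test outcomes (hypothetical) just to exercise the branching logic.
ctp_out = iter([True, False, True])
ctd_out = iter([False, True])
opt_out = iter([False, False, False, False, False, True])
trace = cross_decomposition(
    solve_ds=lambda: None, solve_ps=lambda: None,
    ctp=lambda: next(ctp_out), ctd=lambda: next(ctd_out),
    is_optimal=lambda: next(opt_out),
    solve_pm=lambda: None, solve_dm=lambda: None)
print(trace)  # ['DS', 'PS', 'DM', 'DS', 'PM', 'PS', 'DS', 'PS', 'STOP']
```

In a real implementation the callables would share the current $\bar y$, $\bar u_1$, bounds and cut sets through some common state object.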


We can start with either one of the subproblems, so a good primal starting solution can also be utilized.

If CTP indicates that (PS) will not give further convergence, we use (PM). If CTD indicates failure of convergence for (DS), we can use (DM) (which, however, gives certain convergence only if (P) is convex). After (PM) we go to (PS) and after (DM) we go to (DS), in order to make use of the output of the master problems. In the general nonconvex case, it is not necessary to use (DM). It is even possible to omit the convergence tests CTD if only (PM) is used.


The Convergence Tests 


Returning to the question of convergence in the subproblem phase, we make the following definitions of ε-improvements.

'ε-bound-improvement' is an improvement of at least ε of the upper or lower bound.

'ε-cut-improvement' is the generation of a new, so far unknown cut that is at least ε better (i.e. has a value at least ε higher or lower) than all known cuts at some point.

Discussing linear mixed integer problems, as in [11], one can let ε = 0. In such a case we simply omit ε from the above notation.

Cut-improvement thus means that a new cut will be included in one of the restricted master problems and that the description of the functions h(y) or $g(u_1)$ or the set V is refined. By 'improvement' we will, in the rest of this paper, mean bound-improvement and/or cut-improvement. When using unbounded solutions as input no finite bounds are obtained, so bound-improvement cannot appear. Also, a cut giving a cut-improvement can be a value cut or a feasibility cut, i.e. generated by output in the form of unbounded as well as bounded solutions.

Let us by primal cut-improvement denote generation of a primal cut (for (PM)) and by dual cut-improvement denote generation of a dual cut (for (DM)). We also use the notation 'primal' or 'dual bound-improvement' to indicate which of the two subproblems gave the improvement, i.e. primal bound-improvement means that $h(\bar y) < \bar v$ and dual bound-improvement means that $g(\bar u_1) > \underline v$. ($\bar v$ is the least upper bound known and $\underline v$ the largest lower bound known.)

The convergence tests are originally formulated to give the answers to the following questions.

• Can $\bar y$ give a bound-improvement in (PS)?

• Can $\bar u_1$ give a bound-improvement in (DS)?

Testing extreme rays, $\bar u_1$, for convergence, we note that the subproblem (UDS) cannot give bound-improvement. We call the test of unbounded solutions CTDU.

We now give the convergence tests, CT, with strict inequalities, following [11]:

CTP: If $L(x^{(k)}, \bar y, u^{(k)}) < \bar v$, $\forall k \in P_U$, and $\bar L(\bar x^{(k)}, \bar y, \bar u^{(k)}) < 0$, $\forall k \in R_U$, then $\bar y$ will give primal improvement. If not, use a master problem.

CTD: If $L_1(x^{(k)}, y^{(k)}, \bar u_1) > \underline v$, $\forall k \in P_X$, then $\bar u_1$ will give dual improvement. If not, use a master problem.

CTDU: If $\bar L_1(x^{(k)}, y^{(k)}, \bar u_1) > 0$, $\forall k \in P_X$, then $\bar u_1$ will give dual cut-improvement. If not, use a master problem.


We call CTD and the first part of CTP value conver- 
gence tests and CTDU and the second part of CTP fea- 
sibility convergence tests. This conforms to the notation 
of value and feasibility cuts in the master problems. 

One can show that the convergence tests CTP and 
CTD are necessary for bound-improvement and suffi- 
cient for cut- or bound-improvement, see [7]. The con- 
vergence tests CTDU are sufficient for cut-improve- 
ment. 

However, there can be an infinite number of primal and/or dual improvements, so one cannot be certain that CT will fail within a finite number of steps. For this reason it is necessary to consider ε-improvements.

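We need the following ε-convergence tests, CTε:

CTPε: If $L(x^{(k)}, \bar y, u^{(k)}) < \bar v - \varepsilon$, $\forall k \in P_U$, and $\bar L(\bar x^{(k)}, \bar y, \bar u^{(k)}) < -\varepsilon$, $\forall k \in R_U$, then $\bar y$ will give primal ε-improvement. If not, use a master problem.

CTDε: If $L_1(x^{(k)}, y^{(k)}, \bar u_1) > \underline v + \varepsilon$, $\forall k \in P_X$, then $\bar u_1$ will give dual ε-improvement. If not, use a master problem.

CTDUε: If $\bar L_1(x^{(k)}, y^{(k)}, \bar u_1) \ge \varepsilon$, $\forall k \in P_X$, then $\bar u_1$ will give dual ε-cut-improvement. If not, use a master problem.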
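The tests CT and CTε only compare stored cut values at the current candidate with the best known bounds, so they are cheap to evaluate. The sketch below assumes that the cut values at $\bar y$ (or $\bar u_1$) have already been computed and are passed in as lists. The function names are hypothetical. Setting `eps=0` gives the strict tests CTP, CTD and CTDU as stated earlier.

```python
from typing import Sequence

def ctp(value_vals: Sequence[float], feas_vals: Sequence[float],
        v_upper: float, eps: float = 0.0) -> bool:
    """CTP (eps = 0) or CTP-eps evaluated at the candidate y-bar.

    value_vals[k] = L(x^(k), ybar, u^(k)),          k in P_U
    feas_vals[k]  = Lbar(xbar^(k), ybar, ubar^(k)), k in R_U
    True means ybar passes; False means "use a master problem".
    """
    return (all(v < v_upper - eps for v in value_vals)
            and all(v < -eps for v in feas_vals))

def ctd(dual_vals: Sequence[float], v_lower: float, eps: float = 0.0) -> bool:
    """CTD / CTD-eps: dual_vals[k] = L_1(x^(k), y^(k), ubar_1), k in P_X."""
    return all(v > v_lower + eps for v in dual_vals)

def ctdu(dual_dir_vals: Sequence[float], eps: float = 0.0) -> bool:
    """CTDU / CTDU-eps: dual_dir_vals[k] = Lbar_1(x^(k), y^(k), ubar_1), k in P_X."""
    if eps == 0.0:
        return all(v > 0.0 for v in dual_dir_vals)   # strict test CTDU
    return all(v >= eps for v in dual_dir_vals)      # CTDU-eps

print(ctp([3.0, 4.5], [-0.2], v_upper=5.0))            # True
print(ctp([3.0, 4.5], [-0.2], v_upper=5.0, eps=1.0))   # False, since 4.5 >= 4.0
```

The second call shows why a strictly positive ε matters for finiteness. A candidate that improves on the cuts by less than ε no longer counts, so the algorithm is sent to a master problem instead of taking arbitrarily small steps.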


The ε-value convergence tests correspond to the value cuts of the master problems, and the ε used corresponds directly to a change of ε in the bounds (ε-bound-improvement). The ε-feasibility convergence tests, on the other hand, correspond to feasibility cuts of the master problems, and the ε used corresponds to the 'infeasibility' it gives some previously feasible points, which is what we call ε-cut-improvement for feasibility cuts. While these ε-tests are sufficient for ε-improvement, they are not necessary. To prove necessity would require an inverse Lipschitz assumption, namely that for points a certain distance apart, the value of a function (the feasibility cut) should differ by at least a certain amount. The following result is proved in [7].

The ε-value convergence tests of CTPε, the feasibility convergence tests of CTP and the ε-convergence tests CTDε are necessary for ε-bound-improvement. The ε-convergence tests CTε are sufficient for ε-bound- or ε-cut-improvement, in the sense that they are sufficient for one of the following.

I) ε-bound-improvement.

II) ε-cut-improvement.

III) $\varepsilon_1$-bound-improvement and $\varepsilon_2$-cut-improvement, where $\varepsilon_1 + \varepsilon_2 = \varepsilon$.

Now it is possible to verify finiteness of the convergence tests. A formal proof for this can be found in [7]. The following reasoning is used.

When the bounded set Y is completely described with an accuracy better than ε by either value cuts or feasibility cuts, the ε-convergence tests will fail (if not earlier). Each time the ε-convergence tests do not fail, we will get improvement according to one of the three cases mentioned above.

A finite number of ε-bound-improvements is obviously sufficient to decrease the finite distance between $\bar v$ and $v^*$ to less than ε. After an ε-cut-improvement, the new cut describes h(y) with an accuracy better than ε in the area around $\bar y$ where $h(y) < L(x^{(k)}, y, u^{(k)}) + \varepsilon$. Due to the Lipschitzian property of the functions f, $G_1$ and $G_2$, there is a least distance, δ, proportional to ε, from $\bar y$ to any point y violating this inequality, and the ε-convergence tests will fail for any point with a distance to $\bar y$ less than δ. The bounded set V can be completely covered by a finite number of such areas.

In the third case, an $\varepsilon_1$-bound-improvement together with an $\varepsilon_2$-cut-improvement, where $\varepsilon_1 + \varepsilon_2 = \varepsilon$, we can ignore the smaller of $\varepsilon_1$ and $\varepsilon_2$, leaving us with the other one, which is greater than or equal to ε/2. This yields one of the two cases above, so, exchanging ε for ε/2, finiteness is still assured.

For unbounded solutions to (PS), any y satisfying $\bar L(\bar x^{(k)}, y, \bar u^{(k)}) > -\varepsilon$ will make the ε-convergence tests fail, and because of the Lipschitzian property of $G_1$ and $G_2$, there is a least distance, δ (proportional to ε), from $\bar y$ to any y not making the ε-convergence tests fail. Thus an area of a certain least size is made 'infeasible', and the bounded set $Y \setminus V$ can be covered by a finite set of such areas. Thus CTPε will fail within a finite number of steps.

Note that it is enough that CTPε fails. To obtain finiteness we do not need to use CTDε, even if it might be useful in practice. We cannot show that CTDε will fail within a finite number of steps. Dual ε-bound-improvement can only occur a finite number of times, but dual ε-cut-improvement can occur an infinite number of times, since the area to be covered by the cuts is the nonnegative orthant of $u_1$.

We therefore require that (PM) is used regularly. (One could even skip (DM) completely.) The following is our main result.

Theorem 1 The generalized cross decomposition algorithm equipped with ε-convergence tests CTε finds an ε-optimal solution to (P) in a finite number of steps, if the generalized Benders decomposition algorithm does.


All the results for generalized Benders decomposition can be directly used for generalized cross decomposition, especially the following two.

In [5] it is shown that generalized Benders decomposition has finite exact convergence if Y is a finite discrete set. The worst case is solving the primal subproblem with each possible $y \in Y$, which will give a perfect description of h(y) and V on Y.

Therefore we know that if Y is a finite discrete set, the generalized cross decomposition algorithm will solve (P) exactly in a finite number of steps.

It is also shown in [5] that generalized Benders decomposition terminates in a finite number of steps at an ε-optimal solution, i.e. where $\bar v - \underline v < \varepsilon$ for any given ε > 0, if the set of interesting $(u_1, u_2)$-solutions (possible optimal solutions to the primal subproblem) is bounded and $Y \subseteq V$. This makes the primal feasibility cuts (and the corresponding convergence tests) unnecessary. So for generalized cross decomposition, we know the following.

If h(y) is bounded from above for all $y \in Y$, i.e. (PS) has a feasible solution for every $y \in Y$, then the cross decomposition algorithm (without (UDS) and the ε-feasibility convergence tests of CTε) will yield finite ε-convergence, i.e. yield $\bar v - \underline v < \varepsilon$ in a finite number of steps, for any given ε > 0.

If $Y \not\subseteq V$, one might get asymptotic convergence of the feasibility cuts, i.e. solutions getting closer and closer to the feasible set but never actually becoming feasible. If one is reluctant to base a stopping criterion on ε-feasible solutions, one could use penalty functions, which transform feasibility cuts into value cuts and give better possibilities of handling cases where $Y \not\subseteq V$. One could also use artificial variables for this purpose. As for nonlinear penalty function techniques, one should not forget the Lipschitzian assumption made.

The practical motivation behind cross decomposition is to replace the hard primal master problem with the easier dual subproblem to the largest possible extent. Therefore the theoretical result that generalized cross decomposition equipped with ε-convergence tests does not have asymptotically weaker convergence than generalized Benders decomposition is quite satisfactory.

Finally, one might mention that these approaches have also been applied to pure (not mixed) integer programming problems in [8] (nonlinear) and [9] (linear). In such cases, various duality gaps appear, and exact solution is not possible. However, the approach may be useful for obtaining good bounds on the objective function value, which can then be used in branch and bound methods.
The αBB global optimization algorithm for continuous twice-differentiable NLPs (cf. αBB algorithm) [2,4,5,6,8,18] can be used to design global optimization algorithms for mixed integer nonconvex problems [1,3,7]. One such algorithm, the special structure mixed integer αBB algorithm (SMIN-αBB), is designed to address the class of MINLPs in which all the integer variables are binary variables that participate in linear or mixed-bilinear terms, and in which the nonconvex functions in the continuous variables have continuous second order derivatives. This algorithm is an extension of the αBB algorithm, and branching is performed on both the continuous and the binary variables. A second algorithm, the general structure mixed integer αBB algorithm (GMIN-αBB), guarantees convergence to the global optimum of a much broader class of problems. The integer variables may participate in the problem in a very general way, provided that the continuous relaxation of the MINLP is C² continuous. This article describes both algorithms.


The SMIN-αBB Algorithm


The SMIN-αBB algorithm [1,3,7] guarantees finite ε-convergence to the global solution of MINLPs belonging to the class

$$
\begin{aligned}
\min_{x, y} \;& f(x) + x^\top A_f \, y + c_f^\top y \\
\text{s.t. } & g_i(x) + x^\top A_{g,i} \, y + c_{g,i}^\top y \le 0, && i = 1, \dots, m, \\
& h_i(x) + x^\top A_{h,i} \, y + c_{h,i}^\top y = 0, && i = 1, \dots, p, \\
& x \in [x^L, x^U], \\
& y \in \{0, 1\}^q,
\end{aligned}
\tag{1}
$$

where f(x), g(x) and h(x) are continuous, twice-differentiable functions, m is the number of inequality constraints, p is the number of equality constraints, q is the dimension of the binary variable vector, $A_f$, $A_{g,i}$ and $A_{h,i}$ are $n \times q$ matrices, and $c_f$, $c_{g,i}$ and $c_{h,i}$ are q-dimensional vectors.

The main features of any branch and bound algorithm are the strategy used to generate valid lower and upper bounds for the problem and the selection criteria for the branching node and the branching variable. Optionally, a procedure to tighten the variable bounds may be considered. Each one of these issues is examined in the context of the SMIN-αBB algorithm.


Generation of Valid Upper and Lower Bounds 


A local solution of the nonconvex MINLP (1) using one 
of the algorithms described in [13] constitutes a va- 
lid upper bound on the global optimum solution of 
that problem. The generalized Benders decomposition 
(GBD) [10,14] or a standard MINLP branch and bound 
algorithm (B&B) [9,11,15,19,20] may be used to ob- 
tain such a solution. When there are no mixed-bilinear 
terms, the outer approximation with equality relaxation 
(OA/ER) [12,16] may also be used. Alternatively, the bi- 
nary variables may be fixed to a combination of 0 and 1 
values and the resulting nonconvex NLP may be solved 
locally. 

A relaxed problem that can be solved to global optimality must be constructed from problem (1) in order to obtain a valid lower bound. The class of MINLPs in which the continuous functions f(x), $g_i(x)$ and $h_i(x)$ are convex can be solved to global optimality using the GBD or B&B algorithms and, when there are no mixed-bilinear terms, the OA/ER algorithm. To identify a guaranteed lower bound on the solution of the problem, it therefore suffices to construct convex underestimators for the nonconvex functions f(x), $g_i(x)$ and $h_i(x)$, and to solve the resulting problem with one of these algorithms. The rigorous convexification/relaxation strategy used in the αBB algorithm for nonconvex continuous problems [2,4,5,6] allows the construction of the desired lower bounding MINLP. This scheme is based on a decomposition of the functions into a sum of terms with special mathematical structure, such as linear, convex, bilinear, trilinear, fractional, fractional trilinear, univariate concave and general nonconvex terms. A different convex relaxation technique is then applied to each class of term. The fact that a sum of convex functions is itself a convex function is then used to construct overall function underestimators and arrive at a convex lower bounding MINLP.
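Two of the term-wise relaxations can be illustrated concretely. The sketch below uses the standard McCormick underestimator for a bilinear term and the αBB quadratic underestimator $f(x) + \alpha (x^L - x)(x^U - x)$ for a general nonconvex univariate term, with $\alpha \ge \max\{0, -\tfrac12 \min f''\}$ on the interval. The function names and the test function sin(x) are illustrative choices, not taken from this article.

```python
import math
from typing import Callable, Tuple

def mccormick_under(x1: float, x2: float,
                    b1: Tuple[float, float], b2: Tuple[float, float]) -> float:
    """Convex (McCormick) underestimator of the bilinear term x1*x2 on b1 x b2."""
    (l1, u1), (l2, u2) = b1, b2
    # Follows from (x1 - l1)(x2 - l2) >= 0 and (u1 - x1)(u2 - x2) >= 0.
    return max(l2 * x1 + l1 * x2 - l1 * l2,
               u2 * x1 + u1 * x2 - u1 * u2)

def alpha_bb_under(f: Callable[[float], float], x: float,
                   lo: float, hi: float, alpha: float) -> float:
    """alphaBB underestimator f(x) + alpha*(lo - x)*(hi - x) of a univariate term.

    The quadratic is <= 0 on [lo, hi], so the result never exceeds f(x); it is
    convex on [lo, hi] when alpha >= max(0, -0.5 * min f'').
    """
    return f(x) + alpha * (lo - x) * (hi - x)

# f(x) = sin(x) on [0, 2*pi]: f'' = -sin(x) >= -1, so alpha = 0.5 suffices.
lo, hi, alpha = 0.0, 2 * math.pi, 0.5
grid = [lo + i * (hi - lo) / 200 for i in range(201)]
under = [alpha_bb_under(math.sin, x, lo, hi, alpha) for x in grid]
```

The αBB underestimator coincides with f at both ends of the interval, and its maximum gap $\alpha (x^U - x^L)^2 / 4$ shrinks quadratically as the interval is bisected. This is why branching on continuous variables tightens the lower bounds.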


Selection of Branching Node 


A list of the lower bounds on all the nodes that have not 
yet been explored during the branch and bound pro- 
cedure is maintained. A number of approaches can be 
used to select the next branching node, such as depth- 
first, breadth-first or smallest lower bound first. Since 
the purpose of the algorithm is to identify the global so- 
lution of the problem, all promising regions, that is, all 
regions for which the lower bound is less than or equal 
to the best upper bound on the solution, must be explored. The strategy that usually minimizes the number of nodes to be examined, and therefore the CPU requirements of the algorithm, is used to choose the next branching node in the SMIN-αBB algorithm. Thus, the node with the smallest lower bound is selected.


Selection of Branching Variable 


Several strategies can be used to select the next vari- 
able to be branched on. If a continuous variable is judi- 
ciously chosen, the partition results in an improvement 
of the lower bound on the problem through a tighten- 
ing of the convex relaxation of the nonconvex contin- 
uous functions. Binary variables have an indirect effect 
on the quality of the convex underestimators as they influence the range of values that the continuous variables can take on.

A first branching variable selection scheme exploits 
the direct relationship between the range of the con- 
tinuous variables and the quality of the lower bounds 
and therefore branches only on these variables. One of 
the rules available for the αBB algorithm [2] is used for the selection. These are based on the size of the variable ranges, on a measure of the quality of the underestimator for each term, or on a measure of each variable's overall contribution to the quality of the underestimators.

A second approach aims to first tackle the combi- 
natorial aspects of the problem by branching only on 
binary variables for the first q levels of the branch and 
bound tree, where q is the number of binary variables. 
The nonconvexities are dealt with on subsequent levels 
of the tree, by branching on the continuous variables. 
The specific binary variable used for branching is cho- 
sen randomly or from a priority assigned on the basis of 
its effect on the structure of the problem. In particular, 
the binary variables that influence the bounds on the 
greatest number of variables are given the highest pri- 
orities. Once all the binary variables have been fixed, the 
problems that must be considered are continuous non- 
convex and convex problems for the upper and lower 
bound respectively. The bounding of the nodes below 
level q is therefore less computationally intensive than 
above that level. 

A third approach also involves branching on the continuous and binary variables, although the choice is no longer based on the level in the tree. To increase the impact of binary variable branching on the quality of the lower bound, such a variable is selected when a continuous relaxation of the problem indicates that the two child nodes will have significantly different lower bounds, and that one of them may even be infeasible. Thus, if one of the binary variables is close to 0 or 1 at a local solution of the continuous relaxation, it is branched on. The degree of closeness is an arbitrary parameter, which can typically be set to 0.1 or 0.2. If no 'almost-integer' binary variable is found, a continuous variable is selected for branching. In general, this hybrid strategy results in a faster improvement in the lower bounds than the second approach. It is, however, more computationally intensive, because a continuous relaxation must be solved before selecting a branching variable and a larger number of MINLP nodes may be encountered during the branch and bound search.
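The hybrid rule of the third approach can be sketched as follows. `select_branching_variable` is a hypothetical helper. It assumes that the values of the still-unfixed binary variables at a local solution of the continuous relaxation are available, and it uses the closeness parameter from the text (default 0.1). When no binary variable qualifies, it falls back on the continuous variable with the widest range, which is only one of the range-based options mentioned above.

```python
from typing import Dict, Optional, Tuple

def select_branching_variable(
    y_relaxed: Dict[str, float],
    x_ranges: Dict[str, Tuple[float, float]],
    closeness: float = 0.1,
) -> Tuple[str, str]:
    """Return ("binary", name) for an almost-integer binary, else ("continuous", name)."""
    best: Optional[str] = None
    best_dist = closeness
    for name, val in y_relaxed.items():
        dist = min(val, 1.0 - val)          # distance of the relaxed value to 0 or 1
        if dist <= closeness and (best is None or dist < best_dist):
            best, best_dist = name, dist
    if best is not None:
        return "binary", best
    # Fallback (assumption): widest continuous range.
    widest = max(x_ranges, key=lambda n: x_ranges[n][1] - x_ranges[n][0])
    return "continuous", widest

print(select_branching_variable({"y1": 0.93, "y2": 0.5}, {"x1": (0, 20), "v1": (0, 10)}))
# ('binary', 'y1')
print(select_branching_variable({"y1": 0.6, "y2": 0.4}, {"x1": (0, 20), "v1": (0, 10)}))
# ('continuous', 'x1')
```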


Variable Bound Updates 


The tightening of variable bounds is a very important step because of its impact on the quality of the underestimators. For continuous variables, the strategies developed for the αBB algorithm may be used [2]. For the SMIN-αBB algorithm, they rely on the solution of several convex MINLPs in the optimization-based approach, or on the iterative interval evaluation of the constraints in the interval-based approach. In this latter case, the binary variables are relaxed during the interval computation.


PROCEDURE binary variable bound update()
    Consider R = {(x, y) ∈ F : y_j = 0};
    Test interval feasibility of R;
    IF infeasible, set y_j^L = 1;
    Consider R = {(x, y) ∈ F : y_j = 1};
    Test interval feasibility of R;
    IF infeasible,
        IF y_j^L = 1, RETURN(infeasible node);
        ELSE, set y_j^U = 0;
    RETURN(new bounds y_j^L and y_j^U);
END binary variable bound update;

Procedure for binary variable bound updates
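The interval test in this procedure can be illustrated for the special case in which every constraint defining F is linear, so that the interval evaluation reduces to summing coefficient-times-bound terms. This is a simplifying assumption. In the procedure itself, interval arithmetic is applied to the full nonconvex constraints with the binaries relaxed. The constraint encoding and the function names below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Dict[str, Tuple[float, float]]
# Hypothetical linear constraint format: (coefficients, constant, sense), meaning
# sum(coef[v] * v) + constant <= 0 (sense "le") or == 0 (sense "eq").
LinCon = Tuple[Dict[str, float], float, str]

def interval_range(coefs: Dict[str, float], const: float, box: Box) -> Tuple[float, float]:
    """Interval enclosure of sum(coef * var) + const over the box."""
    lo = hi = const
    for var, a in coefs.items():
        l, u = box[var]
        lo += min(a * l, a * u)
        hi += max(a * l, a * u)
    return lo, hi

def interval_feasible(cons: List[LinCon], box: Box) -> bool:
    """False only if some constraint provably cannot hold anywhere in the box."""
    for coefs, const, sense in cons:
        lo, hi = interval_range(coefs, const, box)
        if lo > 0 or (sense == "eq" and hi < 0):
            return False
    return True

def binary_bound_update(cons: List[LinCon], box: Box, yj: str) -> Optional[Box]:
    """Updated box for binary yj, or None if the node is infeasible."""
    yl, yu = box[yj]
    if not interval_feasible(cons, {**box, yj: (0.0, 0.0)}):   # R with y_j = 0
        yl = 1.0
    if not interval_feasible(cons, {**box, yj: (1.0, 1.0)}):   # R with y_j = 1
        if yl == 1.0:
            return None                                       # infeasible node
        yu = 0.0
    return {**box, yj: (yl, yu)}

# After branching on y1 = 1, the constraint y1 + y2 = 1 fixes y2 to zero.
cons = [({"y1": 1.0, "y2": 1.0}, -1.0, "eq")]
print(binary_bound_update(cons, {"y1": (1.0, 1.0), "y2": (0.0, 1.0)}, "y2"))
# {'y1': (1.0, 1.0), 'y2': (0.0, 0.0)}
```

Because interval evaluation only overestimates the range of a constraint, the test can miss some infeasible regions, but it never wrongly declares a feasible region infeasible. Bound updates made this way are therefore always valid.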


In the case of binary variables, successful bound updates are beneficial in two ways. First, they indirectly lead to the construction of tighter underestimators as they affect the continuous variable bounds. Second, they allow a binary variable to be fixed and therefore decrease the number of combinations that potentially need to be explored. An interval-based strategy can be used to carry out binary variable bound updates. Given the current upper bound $\bar f$ on the global optimum solution, the feasible region F is defined by the constraints appearing in the nonconvex problem, a new constraint $f(x) + x^\top A_f y + c_f^\top y \le \bar f$ and the box $(x, y) \in [x^L, x^U] \times [y^L, y^U]$. Consider a variable $y_j \in \{0, 1\}$ whose bounds are being updated. The procedure above is used.


Algorithmic Procedure 


The algorithmic procedure for the SMIN-αBB algorithm is as follows:


PROCEDURE SMIN-αBB algorithm()
    Decompose functions in problem;
    Set tolerance ε;
    Set f* = f^0 = −∞ and f̄ = f̄^0 = +∞;
    Initialize list of lower bounds {f^0};
    DO f̄ − f* > ε
        Select node k with smallest lower bound, f^k, from list of lower bounds;
        Set f* = f^k;
        (Optional) Update binary and continuous variable bounds;
        Select binary or continuous branching variable;
        Partition to create new nodes;
        DO for each new node i
            Generate convex lower bounding MINLP;
            Find solution f^i of convex lower bounding MINLP;
            IF infeasible or f^i > f̄ + ε
                Fathom node;
            ELSE
                Add f^i to list of lower bounds;
                Find a solution f̄^i of nonconvex MINLP;
                IF f̄^i < f̄ THEN set f̄ = f̄^i;
        OD;
    OD;
    RETURN(f̄ and variable values at corresponding node);
END SMIN-αBB algorithm;

Pseudocode for the SMIN-αBB algorithm
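The loop above can be rendered as a best-first branch and bound in Python. This is a minimal sketch under stated assumptions. The lower bounding step, the local upper bounding step and the partitioning step are user-supplied callables with hypothetical names, and the optional bound update is omitted. The toy usage replaces the convex lower bounding MINLP with a simple Lipschitz bound on a one-dimensional problem.

```python
import heapq
import itertools
from typing import Any, Callable, List, Optional, Tuple

def best_first_bb(
    root: Any,
    lower_bound: Callable[[Any], Optional[float]],          # None: relaxation infeasible
    upper_bound: Callable[[Any], Optional[Tuple[float, Any]]],  # None: no local solution
    branch: Callable[[Any], List[Any]],
    eps: float = 1e-6,
    max_iter: int = 100_000,
) -> Tuple[float, Any]:
    tie = itertools.count()                     # tie-breaker so nodes are never compared
    heap = [(float("-inf"), next(tie), root)]   # list of lower bounds, f^0 = -inf
    f_up, point = float("inf"), None            # f-bar = +inf
    for _ in range(max_iter):
        if not heap or f_up - heap[0][0] <= eps:
            break                               # f-bar - f* <= eps: converged
        _, _, node = heapq.heappop(heap)        # node with the smallest lower bound
        for child in branch(node):              # partition to create new nodes
            f_i = lower_bound(child)            # lower bounding problem
            if f_i is None or f_i > f_up + eps:
                continue                        # fathom node
            heapq.heappush(heap, (f_i, next(tie), child))
            found = upper_bound(child)          # local solution of the nonconvex problem
            if found is not None and found[0] < f_up:
                f_up, point = found
    return f_up, point

# Toy usage: minimize (x - 0.3)^2 on [0, 1], nodes are intervals (a, b).
def f(x: float) -> float:
    return (x - 0.3) ** 2

K = 1.4  # Lipschitz constant of f on [0, 1]

def lb(node):
    a, b = node
    return f((a + b) / 2) - K * (b - a) / 2

def ub(node):
    m = (node[0] + node[1]) / 2
    return f(m), m

def split(node):
    a, b = node
    m = (a + b) / 2
    return [(a, m), (m, b)]

value, x = best_first_bb((0.0, 1.0), lb, ub, split, eps=1e-4)
```

The heap realizes the smallest-lower-bound-first node selection described earlier. The stopping test compares the incumbent with the smallest lower bound still in the list, so the returned value lies within ε of the global optimum whenever the loop ends by convergence rather than by `max_iter`.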


In order to illustrate the algorithmic procedure, a small example proposed in [17] is used. It is a simple design problem in which one of two reactors must be chosen to produce a given product at the lowest possible cost. It involves two binary variables, one for each reactor, and seven continuous variables. The formulation is:

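$$
\begin{aligned}
\min \;& 7.5 y_1 + 5.5 y_2 + 7 v_1 + 6 v_2 + 5 x \\
\text{s.t. } & z_1 - 0.9 \left(1 - e^{-0.5 v_1}\right) x_1 = 0 \\
& z_2 - 0.8 \left(1 - e^{-0.4 v_2}\right) x_2 = 0 \\
& x_1 + x_2 - x = 0 \\
& z_1 + z_2 = 10 \\
& v_1 - 10 y_1 \le 0 \\
& v_2 - 10 y_2 \le 0 \\
& x_1 - 20 y_1 \le 0 \\
& x_2 - 20 y_2 \le 0 \\
& y_1 + y_2 = 1 \\
& 0 \le x_1, x_2 \le 20; \quad 0 \le z_1, z_2 \le 30 \\
& 0 \le v_1, v_2 \le 10; \quad 0 \le x \le 20 \\
& (y_1, y_2) \in \{0, 1\}^2
\end{aligned}
$$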
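The reported optimum can be checked independently of the branch and bound run. This check is not part of SMIN-αBB. Once $y_1, y_2$ are fixed, the constraints $v_i \le 10 y_i$ and $x_i \le 20 y_i$ switch off the unused reactor, so the chosen reactor must deliver $z_i = 10$. Eliminating $x_i$ through the yield equation leaves a one-dimensional problem in $v_i$ that is convex for $v_i > 0$ and can be solved by golden-section search. The helper names below are hypothetical.

```python
import math
from typing import Callable, Tuple

def golden_min(phi: Callable[[float], float], a: float, b: float,
               tol: float = 1e-10) -> float:
    """Golden-section search for the minimizer of a unimodal function on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c                      # keep [a, d]; old c becomes new d
            c = b - r * (b - a)
        else:
            a, c = c, d                      # keep [c, b]; old d becomes new c
            d = a + r * (b - a)
    return (a + b) / 2

def branch_cost(fixed: float, v_coef: float, eff: float,
                rate: float) -> Tuple[float, float, float]:
    """Optimal cost when only one reactor is used and it must deliver z = 10.

    Feed: x = 10 / (eff * (1 - exp(-rate * v))), which must satisfy x <= 20.
    Cost: fixed + v_coef * v + 5 * x.
    """
    def feed(v: float) -> float:
        return 10.0 / (eff * (1.0 - math.exp(-rate * v)))

    def cost(v: float) -> float:
        return fixed + v_coef * v + 5.0 * feed(v)

    v_min = -math.log(1.0 - 10.0 / (20.0 * eff)) / rate   # v at which x = 20
    v = golden_min(cost, v_min, 10.0)                      # upper bound v <= 10
    return cost(v), v, feed(v)

c1, v1, x1 = branch_cost(7.5, 7.0, 0.9, 0.5)   # y1 = 1, y2 = 0
c2, v2, x2 = branch_cost(5.5, 6.0, 0.8, 0.4)   # y1 = 0, y2 = 1
print(round(c1, 2), round(c2, 2))              # 99.24 107.38
```

The $y_1 = 1$ branch gives a cost of about 99.24 (with $v_1 \approx 3.51$ and $x_1 \approx 13.43$), consistent with the global solution of 99.2 reported below. The $y_2 = 1$ branch costs about 107.38.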


Because of the linear participation of the binary variables, the SMIN-αBB algorithm is well-suited to solve this nonconvex MINLP. It identifies the global solution of 99.2 after nine iterations, when bound updates are performed at every iteration and branching takes place on the binary variables first. Branching variable selection takes place randomly for the binary variables and according to the term measures for the continuous variables. At the global solution, the binary variable values are $y_1 = 1$ and $y_2 = 0$. The steps of the algorithm are shown in Fig. 1. The boldface numbers next to the nodes indicate the order in which the nodes were explored. The lower bound, computed by solving a convex relaxation of the nonconvex problem, is indicated inside each node, and the branching variable selected for the node is also specified. The domain to which this branching variable is restricted is displayed along each branch. A black node indicates that the lower bounding problem was found infeasible, and a shaded node is fathomed because its lower bound is greater than the current upper bound on the solution.

At the first node, the initial lower bound is 11.4 and an upper bound of 99.2 is found. The binary variable $y_1$ is selected as a branching variable. The region $y_1 = 0$ is infeasible and can therefore be fathomed (black node), while an improved lower bound is found for $y_1 = 1$. This latter region is therefore chosen for exploration at the second iteration. Variable bound updates reveal that $y_2 = 1$ is infeasible, so that $y_2$ can be fixed to zero. Branching on the continuous variables may now begin. The first selected variable is $x_1$, and the regions $0 \le x_1 \le 10$ and $10 \le x_1 \le 20$ are created. Since the left region has the lowest lower bound (36.4), it is examined at iteration 3. Variable bound updates show that this region is in fact infeasible, and it is therefore eliminated without further processing. The algorithm proceeds to node 4, for which $v_1$ is selected as a branching variable. The right region, $5 \le v_1 \le 10$, is fathomed since it has a lower bound greater than 99.2. The algorithm progresses along the branch and bound tree until, at iteration 9, two nodes are left open with lower bounds of 99.2. This is within the accuracy required for this run, so the procedure is terminated. One more iteration would reveal that the only global optimum lies in the right child of node 9.

MINLP: Global Optimization with αBB, Figure 1
SMIN-αBB branch and bound tree

The SMIN-αBB algorithm is especially effective for chemical process synthesis problems such as distillation network or heat exchanger network synthesis [1,3].


The GMIN-αBB Algorithm


The GMIN-αBB algorithm is designed to address the broad class of problems represented by

$$
\begin{aligned}
\min_{x, y} \;& f(x, y) \\
\text{s.t. } & g(x, y) \le 0 \\
& h(x, y) = 0 \\
& x \in [x^L, x^U] \\
& y \in [y^L, y^U] \cap \mathbb{N}^q
\end{aligned}
\tag{2}
$$

where f(x, y), g(x, y) and h(x, y) are functions whose continuous relaxation is twice continuously differentiable.

The GMIN-αBB algorithm [2,3,7] extends the applicability of the standard branch and bound approaches for MINLPs [9,11,13,15,19,20] by making use of the αBB algorithm. The most crucial characteristics of the algorithm are the branching strategy, the derivation of a valid lower bound on problem (2), and the variable bound update strategies.


Branching Variable Selection 


Branching in the GMIN-αBB algorithm is carried out on the integer variables only. When it is a bisection, the partition takes place either at the midpoint of the range of the selected variable or at the value of that variable at the solution of the lower bounding problem. It is also possible to branch on more than one variable at a given node, or to perform k-section on one of the variables. More than two child nodes may be created from a parent node when the structure of the problem is such that the bounds on a small fraction of the integer variables affect the bounds on many of the other variables in the problem. As in the SMIN-αBB algorithm, an integer variable is chosen randomly or according to branching priorities. An additional rule consists of selecting the most or least fractional variable at the solution of a continuous relaxation of the problem.
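The two bisection options and the most-fractional rule can be sketched for single integer variables. The helpers are hypothetical. Clamping the split point so that both children stay non-empty when the relaxed value lies at a bound is an added assumption.

```python
import math
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]

def bisect_integer(rng: Range, at: Optional[float] = None) -> List[Range]:
    """Split the integer range [lo, hi] into two children.

    The split is at the midpoint of the range or, if `at` is given, at the
    value of the variable in the solution of the lower bounding problem.
    """
    lo, hi = rng
    if lo >= hi:
        raise ValueError("range already contains a single integer")
    point = (lo + hi) / 2 if at is None else at
    left_hi = min(max(math.floor(point), lo), hi - 1)   # keep both children non-empty
    return [(lo, left_hi), (left_hi + 1, hi)]

def most_fractional(relaxed: Dict[str, float]) -> Optional[str]:
    """Variable whose relaxed value is farthest from an integer (None if all integral)."""
    frac = {n: abs(v - round(v)) for n, v in relaxed.items()}
    name = max(frac, key=frac.get, default=None)
    return name if name is not None and frac[name] > 0 else None

print(bisect_integer((0, 10)))           # [(0, 5), (6, 10)]
print(bisect_integer((0, 10), at=3.7))   # [(0, 3), (4, 10)]
print(most_fractional({"y1": 2.0, "y2": 3.45, "y3": 0.9}))  # y2
```

The least-fractional variant is obtained by taking the minimum over the nonzero distances instead of the maximum.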


Generation of a Valid Lower Bound 


A guaranteed lower bound on the global solution of 
the current node of the branch and bound tree is ob- 
tained by solving a continuous relaxation of the non- 
convex MINLP at that node. When the integer variables 




that have not yet been fixed are allowed to vary continuously between their bounds, the problem becomes a nonconvex NLP. The validity of the lower bound can only be ensured if the global solution of this nonconvex NLP is identified or if a lower bound on this solution is found. On the other hand, when all integer variables have been fixed to integer values at a node, no additional partitioning of this node can take place, and the global optimum solution of the nonconvex NLP is required to guarantee convergence of the GMIN-αBB algorithm.

Based on these conditions, the αBB algorithm can be used as a subroutine to generate valid lower bounds:

• If at least one integer variable can be relaxed at the current node, either run the αBB algorithm for a few iterations to obtain a valid lower bound on the global solution of the continuous relaxation, or run it to completion to obtain the global solution of the continuous relaxation.

• Otherwise, run the αBB algorithm to completion to obtain the global solution for the current node.

This strategy makes use of the convergence characteristics of the αBB algorithm to improve the performance of the GMIN-αBB algorithm. The rate of improvement of the lower bound on the global solution of a nonconvex NLP is usually very high at early iterations and then gradually tapers off. At later stages of an αBB run, the computationally expensive reduction of the gap between the bounds on the solution of the continuous relaxation does not raise the lower bound enough to affect the performance of the GMIN-αBB algorithm, and can therefore be bypassed.


Generation of a Valid Upper Bound 


Because the branch and bound tree is finite, it is not necessary to generate an upper bound on the nonconvex MINLP at each node in order to guarantee convergence of the GMIN-αBB algorithm. In the worst case, the integer variables are fixed at every node of the last level of the tree, and the solutions of the corresponding NLPs provide the upper bounds needed to identify the global optimum solution. However, upper bounds play a significant role in improving the convergence rate of the algorithm. They allow the fathoming of nodes whose lower bound is greater than the smallest upper bound, thereby reducing the final size of the branch and bound tree.

An upper bound on the solution of a given node can be obtained in several ways:

• If the solution of the continuous relaxation is integer-feasible, that is, all the relaxed integer variables have integer values at the solution, then this solution is both a lower and an upper bound on the current node.

• If the αBB algorithm was run for only a few iterations and the relaxed integer variables are integer at the lower bound, they can be fixed to these integer values. The resulting nonconvex NLP can then be solved locally to yield an upper bound on the solution of the node.

• Finally, a set of integer values satisfying the integer constraints can be used to construct a nonconvex NLP whose local solutions are upper bounds on the current node solution.


Variable Bound Updates 


If the bounds on the integer variables at any given node can be tightened, the solution space can be significantly reduced because of the combinatorial nature of the problem. Allocating computational resources to this task is therefore potentially worthwhile. An optimization-based approach or an interval-based approach may be used to update the variable bounds. These approaches are similar to those developed for the αBB algorithm, but they take advantage of the integrality of the variables. In the optimization approach, the lower or upper bound on variable y_i is improved by first relaxing the integer variables and then solving the convex NLP

$$\begin{aligned}
\min_{x,y,w} \;\text{or}\; \max_{x,y,w} \quad & y_i \\
\text{s.t.} \quad & \mathcal{L}(x,y,w) \le f^{*} \\
& C(x,y,w) \\
& x \in [x^{L}, x^{U}],\; y \in [y^{L}, y^{U}],\; w \in [w^{L}, w^{U}]
\end{aligned}$$

Here ℒ(x, y, w) denotes the convex underestimator of the objective function and f* denotes the current best upper bound on the global optimum solution. C(x, y, w) denotes the set of convexified constraints, and w is the set of new variables introduced during the convexification/relaxation procedure. Finally, let y_i* be the optimal value of the corresponding NLP. The improved lower or upper bound is then obtained by setting y_i^L = ⌈y_i*⌉ or y_i^U = ⌊y_i*⌋.
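The rounding step can be sketched as below. The sketch assumes that the optimal values of the minimization and maximization NLPs are already available as floats. The tolerance `tol` is an added assumption that absorbs solver round-off (for example, a returned value of 2.0000000001). It is not part of the original description.

```python
import math


def round_integer_bounds(y_min: float, y_max: float, tol: float = 1e-7):
    """Integer bounds on y_i from the optimal values of the min/max NLPs.

    The lower bound is rounded up and the upper bound rounded down, because
    y_i must take an integer value. A result with lower > upper means that no
    integer value of y_i is compatible with the convex relaxation.
    """
    lower = math.ceil(y_min - tol)   # tol guards against 2.0000000001 -> 3
    upper = math.floor(y_max + tol)  # tol guards against 2.9999999999 -> 2
    return lower, upper
```

For example, optimal values of 1.3 and 3.7 tighten the bounds of y_i to [2, 3]. An empty result such as (2, 1) would allow the node to be discarded.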
In the interval-based approach, an iterative procedure is followed. It is based on an interval test that provides sufficient conditions for the infeasibility of the original constraints and the 'bound improvement constraint' f(x, y) ≤ f*, given the relaxed region (x, y) ∈ [x^L, x^U] × [y^L, y^U]. This set of constraints defines a region denoted by F. The procedure to improve the lower (upper) bound on variable y_i is as follows:


PROCEDURE interval-based bound update()
  Set initial bounds L = y_i^L and U = y_i^U;
  Set iteration counter k = 0;
  Set maximum number of iterations K;
  DO k < K and L ≠ U
    Compute 'midpoint' M = ⌊(U + L)/2⌋;
    Set left region {(x, y) ∈ F : y_i ∈ [L, M]};
    Set right region {(x, y) ∈ F : y_i ∈ [M, U]};
    Test interval feasibility of left (right) region;
    IF feasible,
      Set U = M (L = M);
    ELSE
      Test interval feasibility of right (left) region;
      IF feasible,
        Set L = M (U = M);
      ELSE
        IF k = 0,
          RETURN(infeasible node);
        ELSE
          Set L = U (U = L);
          Set U = y_i^U (L = y_i^L);
    Set k = k + 1;
  OD;
  RETURN(y_i^L = L (y_i^U = U));
END interval-based bound update;


Interval-based bound update procedure 
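The lower-bound variant of this procedure can be sketched in Python. The interval test is abstracted as a hypothetical callback `may_be_feasible(a, b)`. It must return False only when the region of F with a ≤ y_i ≤ b is guaranteed to be infeasible.

The sketch makes one deliberate choice that the text does not specify. Because y_i is integer, the right region is taken as [M + 1, U] and the corresponding update is L = M + 1. This keeps the bisection from stalling when U = L + 1.

```python
from typing import Callable, Optional


def interval_lower_bound_update(y_lo: int, y_hi: int,
                                may_be_feasible: Callable[[int, int], bool],
                                max_iter: int = 20) -> Optional[int]:
    """Tighten the lower bound of an integer variable y_i by interval bisection.

    Returns the improved lower bound, or None if the node is proven infeasible.
    """
    L, U = y_lo, y_hi
    k = 0
    while k < max_iter and L < U:
        M = (L + U) // 2                    # integer midpoint (floor)
        if may_be_feasible(L, M):           # left region survives: search lower
            U = M
        elif may_be_feasible(M + 1, U):     # only the right region survives
            L = M + 1                       # assumption: integer split at M + 1
        elif k == 0:
            return None                     # whole range excluded at once
        else:
            # No feasible integer in [L, U]: move past it and restore the
            # original upper limit of the search.
            L, U = U + 1, y_hi
            if L > y_hi:
                return None                 # every value in the range excluded
        k += 1
    return L
```

For instance, suppose the test accepts an interval only if it intersects [5, 8]. Calling the sketch with bounds 0 and 10 then tightens the lower bound to 5.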


The variable bound tightening is performed before calling the αBB algorithm to obtain a lower bound on the solution of the current node. In many cases, variable bound updates are also used during an αBB run to improve the quality of the generated lower bounds. Although the αBB algorithm treats the y variables as continuous, the bound update strategy within the αBB algorithm may be modified to account for the true nature of these variables. A larger reduction in the solution space can be achieved by adopting one of the integer bound update strategies described here for the relaxed y variables. This more stringent approach leads to a lower bound that is not necessarily a valid lower bound on the continuous relaxation, but is always a lower bound on the global solution of the nonconvex MINLP.

The overall algorithmic procedure for the GMIN-αBB algorithm is shown below:


PROCEDURE GMIN-αBB algorithm()
  Set tolerance ε;
  Set f* = f^0 = −∞ and f̄ = f̄^0 = +∞;
  Initialize list of lower bounds {f^0};
  DO f̄ − f* > ε
    Select node k with smallest lower bound, f^k, from list of lower bounds;
    Set f* = f^k;
    (Optional) Update y variable bounds;
    Select integer branching variable(s);
    Create new nodes by branching;
    DO for each new node i
      Obtain lower bound f^i on node i:
        IF all integer variables are fixed,
          Find global solution f^i of nonconvex NLP with αBB algorithm;
        ELSE
          Relax integer variables;
          Run αBB algorithm to completion or for a few iterations to get f^i;
          (Optional) Use integer bound updates on y variables;
      IF f^i > f̄, THEN Fathom node;
      ELSE
        Add f^i to list of lower bounds;
        (Optional) Obtain upper bound f̄^i on nonconvex MINLP;
        IF f̄^i < f̄ THEN Set f̄ = f̄^i;
    OD;
  OD;
  RETURN(f̄ and variable values at corresponding node);
END GMIN-αBB algorithm;


Pseudocode for the GMIN-αBB algorithm
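The best-first search and fathoming logic of such a procedure can be mimicked on a toy scale, as in the sketch below. It is not the GMIN-αBB implementation. It rests on the following assumptions:

• all integer variables are binary;
• a user-supplied `lower_bound` callback stands in for the αBB relaxation;
• branching follows a fixed variable order instead of priorities or fractionality;
• upper bounds come only from leaf nodes, where all variables are fixed. The text notes that this suffices for convergence.

All names are hypothetical.

```python
import heapq
import itertools
from typing import Callable, Dict, Optional, Tuple

Fixing = Dict[int, int]  # index of a binary variable -> fixed value (0 or 1)


def best_first_bnb(n: int, lower_bound: Callable[[Fixing], float],
                   eps: float = 1e-6) -> Tuple[float, Optional[Fixing]]:
    """Best-first branch and bound over n binary variables.

    lower_bound(fixed) must return a valid lower bound for the node where the
    variables in `fixed` are fixed and the others are relaxed. When all n
    variables are fixed it must return the node's global optimum, which then
    also serves as an upper bound.
    """
    f_upper, incumbent = float("inf"), None
    tie = itertools.count()                      # keeps heap entries comparable
    heap = [(lower_bound({}), next(tie), {})]
    while heap:
        f_lower, _, fixed = heapq.heappop(heap)  # node with smallest lower bound
        if f_upper - f_lower <= eps:
            break                                # bounds have converged
        var = len(fixed)                         # branch on the next free binary
        for value in (0, 1):
            child = {**fixed, var: value}
            f_child = lower_bound(child)
            if f_child >= f_upper:
                continue                         # fathom: cannot beat incumbent
            if len(child) == n:                  # leaf: exact value, new incumbent
                f_upper, incumbent = f_child, child
            else:
                heapq.heappush(heap, (f_child, next(tie), child))
    return f_upper, incumbent
```

Consider a toy instance with three binaries and f(y) = (y_1 + 2y_2 + 3y_3 − 4)^2. As the lower bound, take the minimum of (s − 4)^2 over the range of s = y_1 + 2y_2 + 3y_3 permitted by the fixed variables. The search then returns the optimal value 0 with y_1 = 1, y_2 = 0, y_3 = 1 (indices 0, 1, 2 in the code).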
MINLP: Global Optimization with αBB, Figure 2
GMIN-αBB branch and bound tree


The algorithmic procedure for the GMIN-αBB algorithm is illustrated using the same example as for the SMIN-αBB algorithm. The branch and bound tree is shown in Fig. 2, using the same notation as previously.

At the first node, the continuous relaxation of the nonconvex MINLP is solved for 10 αBB iterations to yield a lower bound of 60. No upper bound is found. Next, the binary variable y_2 is chosen for branching, and the continuous relaxation of the problem with y_2 = 0 is solved. A lower bound of 92.2 is found as the global solution to this nonconvex NLP. This solution is also integer-feasible and therefore provides an upper bound of 92.2 on the global optimum solution of the nonconvex MINLP. The region y_2 = 1 is then examined, and the global solution of the NLP is found to be 101.7 after 10 αBB iterations. Since this exceeds the upper bound of 92.2, the node can be fathomed and the procedure terminated.

The GMIN-αBB algorithm has been used to solve nonconvex MINLPs involving nonconvex terms in the integer variables and some mixed nonconvex terms. Branching priorities combined with variable bound updates and a small number of αBB iterations for relaxed nodes allow the global optimum solution to be identified. This requires exploring only a small fraction of the maximum number of nodes, with small CPU requirements. In particular, the algorithm has been used on a pump network synthesis problem [2,3]. Some nonconvex integer problems have also been tackled by the same approach. For instance, the minimization of trim loss, a problem taken from the paper cutting industry, has been addressed for medium order sizes [3].


Conclusions 


The αBB algorithm for nonconvex NLPs can be incorporated within more general frameworks to address broad classes of nonconvex MINLPs. One extension of the algorithm is the SMIN-αBB algorithm. It identifies the global optimum solution of problems in which binary variables participate in linear or mixed-bilinear terms and continuous variables appear in twice continuously differentiable functions. The partitioning of the solution space takes place in both the continuous and binary domains.

The GMIN-αBB algorithm is designed to locate the global optimum solution of problems involving integer and continuous variables in functions whose continuous relaxation is twice continuously differentiable. It is similar to traditional branch and bound algorithms for mixed integer problems in two respects: branching occurs on the integer variables only, and a continuous relaxation of the problem is constructed during the bounding step. It uses the αBB algorithm for the efficient and rigorous generation of lower bounds. Both algorithms are widely applicable and have been successfully tested on a variety of medium-size nonconvex MINLPs.
Heat exchanger network (HEN) synthesis problems arise in chemical process design when the heat released by hot process streams is used to satisfy the demands of cold process streams. These problems have been the subject of an intensive research effort, and over 400 publications have been written in the area. See [7,8,9] for reviews of the area, and [1,3] for detailed analysis of HEN synthesis.

The discovery by T. Umeda et al. [11] of a thermodynamic pinch point that limits heat integration in a heat exchanger network led to much of this research effort. They showed that setting a minimum temperature approach, ΔT_min, places a lower bound on the utility consumption in a heat exchanger network and decomposes the network into independent subnetworks. This enables the heat exchanger network synthesis problem to be decomposed into four subproblems:

• the first subproblem finds the appropriate minimum temperature approach;
• the second minimizes the utility consumption;
• the third finds the minimum number of matches and identifies the matches and their heat duties;
• the fourth finds and optimizes the actual network structure.

See [5] for a systematic scheme for solving these 
problems sequentially. First, the utility consumption is 
minimized using the linear programming (LP) trans- 
shipment model approach of [10]. Second, a set of pro- 
cess matches and their heat duties that minimize the 
total number of units is found with the mixed integer 
linear programming (MILP) strategy of [10]. Then, the 
network structure is found [5] by optimizing a super- 
structure that contains all possible network configura- 
tions embedded within it using a nonlinear program- 
ming (NLP) problem. When there is more than one 
combination of matches and heat duties that satisfies 
the minimum unit criterion, the best combination is 
found by exhaustive enumeration. The minimum tem- 
perature approach is optimized with a golden section 
search that solves all three of these optimization prob- 
lems at each iteration. 


In the late 1980s it was found [4,12] that better network designs could be obtained by solving some of the heat exchanger network design subproblems simultaneously. C.A. Floudas and A.R. Ciric [4] combined the MILP stream matching problem with the NLP superstructure optimization problem formulated in [5]. This created a mixed integer nonlinear programming (MINLP) problem that avoided the exhaustive search through all combinations of matches that minimize the number of units. In 1990, they [2] formulated the entire heat exchanger network design problem as a MINLP. The solution of this problem yields the optimal temperature approach, utility level, process matches, heat duties, and network structure. This eliminates the need for a golden section search for the optimum minimum temperature approach.

T.F. Yee and I.E. Grossmann [12] used a smaller superstructure proposed in [6], which embodies a sequential-parallel network structure, to formulate an alternative MINLP for heat exchanger network synthesis. The solution of this MINLP yielded the utility consumption, matches, network structure and heat exchanger areas.


Problem Statement 


This article explores two mixed integer nonlinear programming problems in heat exchanger network synthesis: combined match-network optimization, and heat exchanger network synthesis without decomposition. The synthesis without decomposition problem can be stated as follows.

Given:

1) A set of hot process streams and hot utilities i ∈ H, their inlet and outlet temperatures T^{I,i}, T^{O,i}, and heat capacity flow rates F^i;

2) A set of cold process streams and cold utilities j ∈ C, their inlet and outlet temperatures T^{I,j}, T^{O,j}, and heat capacity flow rates F^j; and

3) Overall heat transfer coefficients U_ij.

Determine:

A) The stream matches (ij), the heat duty Q_ij of match (ij), and the heat exchanger area A_ij of match (ij);

B) the piping structure for each stream in the network; and

C) the temperature and flowrate within each pipe of the network.
MINLP: Heat Exchanger Network Synthesis, Figure 1
A superstructure for one hot stream exchanging heat with two cold streams


In the match-network problem, one is also given

• the level of each utility; and

• a minimum temperature approach ΔT_min.

These problems can be solved using mixed integer nonlinear programming. The development and application of these approaches are described in more detail below.


Heat Exchanger Network Superstructures 


Mixed integer nonlinear programming approaches to 
these problems begin with a superstructure that con- 
tains many alternative designs embedded within it. Two 
superstructures are particularly interesting. 

Figure 1 shows a superstructure of a hot stream, above the thermodynamic pinch point, that may exchange heat with two cold streams [5]. The stream can be piped in series, in parallel, and in split-mix-bypass configurations, as shown in Fig. 2. As we shall see, this richness leads to nonconvex constraints in the MINLP. The first network superstructure is created by constructing similar structures for every other stream above the pinch point.

In this subnetwork, streams H1 and C1, and all other pairs of hot and cold streams, can exchange heat no more than once. H1 and C1 may exchange heat again in the subnetwork below the thermodynamic pinch point. The thermodynamic pinch point has partitioned the temperature range into two intervals, and in each interval, individual process streams can exchange heat only once.

One could increase the number of times two streams can exchange heat by partitioning the temperature range further. This is the basic strategy behind the second superstructure [6,12], shown in Fig. 3. Here, the temperature range has been partitioned into many intervals, or stages. Within any particular stage, each hot stream may exchange heat with each cold stream, and multiple intervals allow any particular match to take place many times in the network. Unlike the first superstructure, each stream in each stage is piped in a parallel configuration, and the inlet and outlet temperature of each parallel line is fixed by the temperature interval. Series piping structures arise when a stream exchanges heat only once per interval. The superstructure does not contain split-mix-bypass or series-parallel structures. In exchange, as we shall see, the nonconvex constraints that arise from the first superstructure have been eliminated.


Mathematical Models for HEN Synthesis using MINLPs


MINLP models of heat exchanger network synthesis arise when the process stream matches are selected while simultaneously optimizing the heat exchanger network. The former is a discrete decision modeled with integer variables; the latter is a nonlinear optimization problem. In this article, we refer to this as the match-network problem. In heat exchanger network synthesis without decomposition, MINLPs may also be used to formulate an optimization problem that simultaneously minimizes the utility consumption, selects the stream matches, and optimizes the network layout.
MINLP: Heat Exchanger Network Synthesis, Figure 2
Stream piping configurations embedded in the superstructure shown in Fig. 1 


Match-Network Problem

The MINLP model of the match-network problem has three components:

• a transshipment model [10] that identifies feasible process stream matches and their heat duties;
• a superstructure model of all possible network structures;
• an objective function.

MINLP: Heat Exchanger Network Synthesis, Figure 3
Two-stage superstructure

The transshipment model partitions the temperature range into t = 1, ..., T temperature intervals, using the inlet and outlet temperatures of the streams and the temperature interval approach temperature (TIAT). Hot streams release heat into the temperature intervals. There it either flows to the cold streams in the same interval or cascades down to the next colder interval. The binary variable Y_ij denotes the existence of a match between hot stream i and cold stream j. The heat loads are q_ijt and Q_ij, and the heat residuals are R_it. The model is composed of the following constraints:


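$$\sum_{j \in C_t} q_{ijt} + R_{it} - R_{i,t-1} = Q^{H}_{it}, \quad i \in H,\ t = 1, \dots, T,$$

$$\sum_{i \in H_t} q_{ijt} = Q^{C}_{jt}, \quad j \in C,\ t = 1, \dots, T,$$

$$\sum_{t=1}^{T} q_{ijt} = Q_{ij}, \quad i \in H,\ j \in C,$$

$$Q_{ij} - U\,Y_{ij} \le 0, \quad i \in H,\ j \in C,$$

$$\sum_{i \in H} \sum_{j \in C} Y_{ij} \le N_{\max}.$$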


The first two constraints in the transshipment model are the energy balances for each temperature interval. The total heat load in a match is given by the third constraint. The fourth constraint bounds the heat load using the binary variable Y_ij and a large fixed constant U. The last constraint puts an upper bound on the number of existing matches, which is the maximum number of units.
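The interval energy balances and heat residuals of the transshipment model have a well-known hand-calculation counterpart, the problem-table heat cascade. The sketch below uses that technique to compute minimum hot and cold utility loads for a single hot and a single cold utility. It is swapped in for illustration and is not the LP or MILP of [10]. Hot-stream temperatures are shifted down by an approach temperature `dt_min`, which plays the role of the TIAT. The stream tuples and the function name are hypothetical.

```python
from typing import List, Tuple

Stream = Tuple[float, float, float]  # (inlet temperature, outlet temperature, FCp)


def min_utilities(hot: List[Stream], cold: List[Stream], dt_min: float):
    """Minimum hot and cold utility loads from a problem-table heat cascade."""
    # Shift hot streams down so heat may pass to the same or colder intervals.
    shifted_hot = [(t_in - dt_min, t_out - dt_min, fcp) for t_in, t_out, fcp in hot]
    bounds = sorted({t for t_in, t_out, _ in shifted_hot + cold for t in (t_in, t_out)},
                    reverse=True)
    residual, min_residual = 0.0, 0.0
    for upper, lower in zip(bounds, bounds[1:]):
        surplus = 0.0
        for t_in, t_out, fcp in shifted_hot:   # hot stream spans the interval
            if t_in >= upper and t_out <= lower:
                surplus += fcp * (upper - lower)
        for t_in, t_out, fcp in cold:          # cold stream spans the interval
            if t_out >= upper and t_in <= lower:
                surplus -= fcp * (upper - lower)
        residual += surplus                     # heat cascaded to the next interval
        min_residual = min(min_residual, residual)
    hot_utility = -min_residual                 # keeps every residual non-negative
    cold_utility = residual + hot_utility       # heat leaving the bottom interval
    return hot_utility, cold_utility
```

Take the stream data of Table 1 below and an approach temperature of 10 °C (an illustrative value, not one used in the article). The cascade then gives a minimum hot utility of 3620 kW and a minimum cold utility of 160 kW. Their difference, 3460 kW, equals the net heat deficit of the process streams.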

The second part of the match-network synthesis model is the hyperstructure topology model. It consists of:

• mass and energy balances for the mixers and splitters;
• feasibility constraints;
• utility load constraints;
• bounds on the heat-capacity flow rates.

Mass balances for the splitters at the inlet of the superstructure:

$$\sum_{k'} f^{I,k}_{k'} = F^{k}, \quad k \in HCT.$$

Here, HCT is the set of all process streams and utilities. Mass balances for the mixers at the inlets of the exchangers:

$$f^{I,k}_{k'} + \sum_{k''} f^{B,k}_{k''k'} - f^{E,k}_{k'} = 0, \quad k, k' \in HCT.$$

Mass balances for the splitters at the outlets of the exchangers:

$$f^{O,k}_{k'} + \sum_{k''} f^{B,k}_{k'k''} - f^{E,k}_{k'} = 0, \quad k, k' \in HCT.$$

Energy balances for the mixers at the inlets of the exchangers:

$$T^{I,k} f^{I,k}_{k'} + \sum_{k''} t^{O,k}_{k''} f^{B,k}_{k''k'} - t^{I,k}_{k'} f^{E,k}_{k'} = 0, \quad k, k' \in HCT.$$

Energy balances over the heat exchangers:

$$Q_{ij} - f^{E,i}_{j}\left(t^{I,i}_{j} - t^{O,i}_{j}\right) = 0, \qquad Q_{ij} - f^{E,j}_{i}\left(t^{O,j}_{i} - t^{I,j}_{i}\right) = 0, \quad i \in H,\ j \in C.$$

The minimum temperature approach between a hot stream and a cold stream:

$$t^{I,i}_{j} - t^{O,j}_{i} \ge \Delta T_{\min}, \qquad t^{O,i}_{j} - t^{I,j}_{i} \ge \Delta T_{\min}.$$

Logical relations between the heat-capacity flow rates and the existence of a match:

$$f^{E,i}_{j} - F^{i} Y_{ij} \le 0, \qquad f^{E,j}_{i} - F^{j} Y_{ij} \le 0.$$

Lower bounds on the heat-capacity flow rates through the exchanger:

$$f^{E,i}_{j} - \frac{Q_{ij}}{\Delta T_{ij,\max}} \ge 0, \qquad f^{E,j}_{i} - \frac{Q_{ij}}{\Delta T_{ij,\max}} \ge 0,$$

where ΔT_{ij,max} equals T^{I,i} − T^{I,j}. Lastly, the objective function minimizes the total investment cost:
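$$\min \sum_{i \in H} \sum_{j \in C} c_{ij} \left[ \frac{Q_{ij}}{U_{ij}\, \Delta T_{\mathrm{lm},ij}} \right]^{\beta} Y_{ij}$$

Here Q_ij/(U_ij ΔT_lm,ij) is the area of exchanger (ij), ΔT_lm,ij is its log-mean temperature difference computed from the exchanger inlet and outlet temperatures, and c_ij and β are cost coefficients.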
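The area term of the objective can be evaluated as in the sketch below. It assumes countercurrent flow, so the two terminal temperature differences are those that appear in the minimum-approach constraints. The defaults c = 1200 and β = 0.6 are taken from the cost law of the example in Table 1. The function names are hypothetical.

```python
import math


def lmtd(dt1: float, dt2: float) -> float:
    """Log-mean temperature difference of the two terminal differences."""
    if dt1 <= 0 or dt2 <= 0:
        raise ValueError("terminal temperature differences must be positive")
    if math.isclose(dt1, dt2):
        return dt1  # limit of the formula when dt1 == dt2
    return (dt1 - dt2) / math.log(dt1 / dt2)


def exchanger_cost(q, u, t_hot_in, t_hot_out, t_cold_in, t_cold_out,
                   c=1200.0, beta=0.6):
    """Area Q / (U * LMTD) and capital cost c * A**beta of one exchanger."""
    dt1 = t_hot_in - t_cold_out   # hot end of a countercurrent exchanger
    dt2 = t_hot_out - t_cold_in   # cold end
    area = q / (u * lmtd(dt1, dt2))
    return area, c * area ** beta
```

For example, a 100 kW duty with U = 1 kW/(m² °C), a hot side cooled from 100 °C to 80 °C and a cold side heated from 60 °C to 80 °C has equal terminal differences of 20 °C. The resulting area is 5 m².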


The model is a mixed integer nonlinear programming (MINLP) problem, because the objective function and the energy balances are nonlinear and the decision variables Y_ij are binary. The energy balances are bilinear, which creates a nonconvex feasible region.


MINLP: Heat Exchanger Network Synthesis, Table 1
Stream data for example problem

Stream   T_in (°C)   T_out (°C)   FC_p (kW/°C)
H1       500         320          6
H2       480         380          4
H3       460         360          6
H4       380         360          20
H5       380         320          12
C1       290         660          18
F        700         700          -
CW       300         320          -

U = 1.0 kW/(m² °C)
Annual cost = 1200 A^0.6 for all exchangers
C_s = 140 $/kW
C_CW = 10 $/kW


MINLP: Heat Exchanger Network Synthesis, Table 2
Match data for example problem; pseudo-pinch method [2]

Match    Q (kW)      A (m²)
H1-C1     948.454    79.391
H1-CW     131.546     6.280
H2-C1     400.000    29.057
H3-C1     600.000    57.488
H4-C1     400.000    14.880
H5-C1     720.000    25.509
S-C1     3591.546    32.112


Heat Exchanger Network Synthesis Without Decomposition


MINLP models that optimize utility consumption as well as process matches, heat duties, and network configurations can also be formulated. See [2] and [12] for pseudopinch approaches that set the TIAT to a small value and let heat flow across the pinch. Alternatively, a strict decomposition at the pinch can be maintained by letting the TIAT vary and using integer variables to model the changing structure of the temperature cascade.


Example 1 These techniques are demonstrated with a problem given in both [12] and [2]. The problem consists of five hot streams, one cold stream, one hot utility (steam), and one cold utility (cooling water). The stream data are given in Table 1.

Ciric and Floudas [2] used the pseudopinch method with TIAT = 1 °C and ΔT_min = 0.5 °C, and allowed the heat recovery approach temperature (HRAT) to vary between 1 °C and 30 °C. They formulated the problem as a MINLP problem and solved it using the generalized Benders decomposition algorithm. The optimal network configuration is pictured in Fig. 4. The network consumes 3592.4 kW of steam and 1312.4 kW of cooling water, and the HRAT is 8.42 °C. The annual cost of the network is $571,080. The match data of this solution are given in Table 2.

Yee and Grossmann [12] used the same problem to demonstrate the simultaneous optimization approach. The problem is again formulated as a MINLP problem. The optimal network configuration is given in Fig. 5. The annual cost of this network is $576,640, and the HRAT is 13.1 °C. The match data of this network are given in Table 3.

MINLP: Heat Exchanger Network Synthesis, Figure 4
Optimal network configuration for example problem; pseudopinch [2]

MINLP: Heat Exchanger Network Synthesis, Figure 5
Optimal network configuration for example problem; simultaneous approach [12]
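The reported annual cost of the pseudo-pinch design can be checked against Tables 1 and 2 with a few lines of Python. The check assumes that the damaged cooling-water price in Table 1 reads 10 $/kW. It also assumes that every exchanger, including the steam heater, follows the capital cost law 1200 A^0.6. Under these assumptions, the total comes to within a few tens of dollars of the reported $571,080.

```python
# Match data from Table 2 (pseudo-pinch design): duty Q (kW), area A (m^2).
MATCHES = {
    "H1-C1": (948.454, 79.391),
    "H1-CW": (131.546, 6.280),
    "H2-C1": (400.000, 29.057),
    "H3-C1": (600.000, 57.488),
    "H4-C1": (400.000, 14.880),
    "H5-C1": (720.000, 25.509),
    "S-C1": (3591.546, 32.112),
}
C_STEAM = 140.0  # $/kW, Table 1
C_CW = 10.0      # $/kW, assumed reading of the damaged Table 1 entry


def annual_cost(matches):
    """Capital cost 1200*A**0.6 per exchanger plus steam and cooling-water costs."""
    capital = sum(1200.0 * area ** 0.6 for _, area in matches.values())
    steam = sum(q for name, (q, _) in matches.items() if name.startswith("S-"))
    water = sum(q for name, (q, _) in matches.items() if name.endswith("-CW"))
    return capital + C_STEAM * steam + C_CW * water


if __name__ == "__main__":
    print(f"annual cost: ${annual_cost(MATCHES):,.0f}")
```

The same data also satisfy the overall energy balance. Steam minus cooling water gives 3591.546 − 131.546 = 3460 kW, which equals the cold-stream demand (6660 kW) minus the hot-stream supply (3200 kW).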


Conclusions 


Mixed integer nonlinear programming offers a powerful approach to heat exchanger network synthesis. Using these techniques, stream matching, the combinatorial component of heat exchanger network synthesis, can be performed while simultaneously minimizing the utility consumption and selecting the cost-optimal heat exchanger network configuration. Merging these tasks leads to more cost-effective stream matches and lower exchanger costs.

MINLP: Heat Exchanger Network Synthesis, Table 3
Match data for example problem; simultaneous approach [12]

Match    Q (kW)    A (m²)
S-C1     3676.4    32.6
H1-C1     863.6    64.1
H2-C1     400.0    …
H3-C1     600.0    47.0
H4-C1     400.0    13.8
H1-CW     216.4    …
H5-C1     720.0    18.4
There has been an increasing trend toward representing linear and nonlinear discrete optimization problems by models consisting of algebraic constraints, logic disjunctions and logic relations ([1,7,8]). For instance, a mixed integer program can be formulated as a generalized disjunctive program, as has been shown in [5]:


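$$\begin{aligned}
\min\ & Z = \sum_{k} c_k + f(x) \\
\text{s.t.}\ & g(x) \le 0 \\
& \bigvee_{i \in D_k} \begin{bmatrix} Y_{ik} \\ h_{ik}(x) \le 0 \\ c_k = \gamma_{ik} \end{bmatrix}, \quad k \in SD \\
& \Omega(Y) = \text{true} \\
& x \in \mathbb{R}^n,\ c \in \mathbb{R}^m,\ Y \in \{\text{true}, \text{false}\}^m
\end{aligned} \qquad \text{(DP1)}$$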


In this formulation, Y_ik are the boolean variables that establish whether a given term in a disjunction is true (h_ik(x) ≤ 0). Ω(Y) are logical relations, assumed to be in the form of propositional logic involving only the boolean variables. The Y_ik act as auxiliary variables that control the part of the feasible space in which the continuous variables, x, lie. The variables c_k represent fixed charges, which are set to a value γ_ik if the corresponding term of the disjunction is true. Finally, the logical conditions, Ω(Y), express relationships between the disjunctive sets. In the context of optimal synthesis of process networks, the disjunctions in (DP1) typically arise for each unit i in the following form:


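$$\begin{bmatrix} Y_i \\ h_i(x) \le 0 \\ c_i = \gamma_i \end{bmatrix} \vee \begin{bmatrix} \neg Y_i \\ B^{i} x = 0 \\ c_i = 0 \end{bmatrix}, \quad i \in I \qquad (1)$$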


If the unit is selected (Y_i), the inequalities h_i apply and a fixed cost γ_i is incurred. Otherwise (¬Y_i), there is no fixed cost, and the matrix B^i sets a subset of the x variables to zero. An important advantage of this modeling framework is that there is no need to introduce artificial parameters for the 'big-M' constraints that are normally used in MINLP to model disjunctions.

M. Turkay and I.E. Grossmann [9] proposed a logic version of the outer approximation (OA) algorithm for MINLP [3] for solving problem (DP1). In their setting, the disjunctions are given as in equation (1) and all the functions are assumed to be convex. The algorithm consists of solving a sequence of NLP subproblems and master problems, described below.

For fixed values of the boolean variables, Y_îk = true and Y_ik = false for i ≠ î, the corresponding NLP subproblem is as follows:


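$$\begin{aligned}
\min\ & Z = \sum_{k} c_k + f(x) \\
\text{s.t.}\ & g(x) \le 0 \\
& \left.\begin{aligned} & h_{ik}(x) \le 0 \\ & c_k = \gamma_{ik} \end{aligned}\right\} \text{ for } Y_{ik} = \text{true} \\
& B^{ik} x = 0 \quad \text{ for } Y_{ik} = \text{false} \\
& k \in SD,\quad x \in \mathbb{R}^n,\ c_k \in \mathbb{R}^1
\end{aligned} \qquad \text{(NLPD)}$$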


For every disjunction k ∈ SD, only the constraints corresponding to the boolean variable Y_ik that is true are imposed, and the fixed charges γ_ik are applied only to these terms. Assume that K subproblems (NLPD) are solved. These generate sets of linearizations l = 1, ..., K for subsets of disjunction terms L(ik) = {l : Y^l_ik = true}. One can then define the following disjunctive OA master problem:


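$$\begin{aligned}
\min\ & Z = \sum_{k} c_k + \alpha \\
\text{s.t.}\ & \alpha \ge f(x^l) + \nabla f(x^l)^{T}(x - x^l) \\
& g(x^l) + \nabla g(x^l)^{T}(x - x^l) \le 0 \qquad l = 1, \dots, K \\
& \bigvee_{i \in D_k} \begin{bmatrix} Y_{ik} \\ h_{ik}(x^l) + \nabla h_{ik}(x^l)^{T}(x - x^l) \le 0,\ l \in L(ik) \\ c_k = \gamma_{ik} \end{bmatrix}, \quad k \in SD \\
& \Omega(Y) = \text{true} \\
& \alpha \in \mathbb{R},\ x \in \mathbb{R}^n,\ c \in \mathbb{R}^m,\ Y \in \{\text{true}, \text{false}\}^m
\end{aligned} \qquad \text{(MDP1)}$$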
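The linearizations that populate (MDP1) can be illustrated with a small pure-Python sketch. For a convex f, every cut f(x^l) + ∇f(x^l)^T (x − x^l) underestimates f. The largest cut at a point is therefore a valid lower estimate, and this is the value that the variable α collects in the master problem. The function names and the quadratic test function are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def oa_cut(f: Callable[[Vector], float],
           grad: Callable[[Vector], List[float]],
           x_l: Vector) -> Callable[[Vector], float]:
    """Linearization of f at x_l: x -> f(x_l) + grad f(x_l)^T (x - x_l)."""
    f_l, g_l = f(x_l), grad(x_l)

    def cut(x: Vector) -> float:
        return f_l + sum(gi * (xi - xli) for gi, xi, xli in zip(g_l, x, x_l))

    return cut


def alpha_lower_estimate(cuts: List[Callable[[Vector], float]], x: Vector) -> float:
    """Smallest alpha satisfying alpha >= cut(x) for every accumulated cut."""
    return max(cut(x) for cut in cuts)


if __name__ == "__main__":
    def f(x):
        return x[0] ** 2 + x[1] ** 2            # convex test function

    def grad(x):
        return [2.0 * x[0], 2.0 * x[1]]

    cuts = [oa_cut(f, grad, p) for p in ([1.0, 0.0], [-1.0, 1.0])]
    point = [0.5, 0.5]
    print(alpha_lower_estimate(cuts, point), f(point))  # estimate <= true value
```

Each cut is exact at its own linearization point and below f elsewhere. Adding cuts from new subproblems can only raise the estimate toward f.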


It should be noted that before applying the above 
master problem it is necessary to solve various subprob- 
lems (NLPD) so as to produce at least one linear ap- 
proximation of each of the terms in the disjunctions. 
Selecting the smallest number of subproblems amounts 
to the solution of a set covering problem, which is of 
small size and easy to solve [9]. 
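The selection of initial subproblems can be sketched as a tiny exact set-covering search. As an assumption of this sketch, each candidate subproblem is represented by the set of disjunctive terms it sets to true. The exhaustive search is only sensible for the small instances the text refers to; larger ones would be handed to an integer programming solver. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set


def min_subproblem_cover(candidates: Dict[str, Set[Hashable]],
                         terms: Set[Hashable]) -> List[str]:
    """Return a smallest list of candidate subproblems covering all terms.

    candidates maps a subproblem label to the disjunctive terms Y_ik that are
    true in it; every term must be linearized at least once.
    """
    labels = sorted(candidates)
    for size in range(1, len(labels) + 1):          # try small covers first
        for combo in combinations(labels, size):
            covered = set().union(*(candidates[c] for c in combo))
            if terms <= covered:
                return list(combo)
    raise ValueError("no combination of candidates covers every term")


if __name__ == "__main__":
    # Two disjunctions (k = 1, 2), each with two terms (i = 1, 2).
    terms = {"Y11", "Y21", "Y12", "Y22"}
    candidates = {
        "A": {"Y11", "Y12"},
        "B": {"Y21", "Y22"},
        "C": {"Y11", "Y22"},
        "D": {"Y21", "Y12"},
    }
    print(min_subproblem_cover(candidates, terms))
```

In this example no single subproblem covers all four terms, so two subproblems are the minimum. Here either {A, B} or {C, D} would do, and the search returns the first pair it finds.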

Problem (MDP1) can be solved by the methods described in [1] and [7]. For the case of process networks, Turkay and Grossmann [9] have also shown the following. If the convex hull representation of the disjunctions in (1) is used in (MDP1), then assuming B^i = I and converting the logic relations Ω(Y) into the inequalities Ay ≤ a leads to the MILP problem


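$$\begin{aligned}
\min\ & Z = c^{T} y + \alpha \\
\text{s.t.}\ & \alpha \ge f(x^{\ell}) + \nabla f(x^{\ell})^{T}(x - x^{\ell}) \\
& g(x^{\ell}) + \nabla g(x^{\ell})^{T}(x - x^{\ell}) \le 0 \qquad \ell = 1, \dots, L \\
& \nabla_{x_{Z_i}} h_i(x^{\ell})^{T} x_{Z_i} + \nabla_{x_{N_i}} h_i(x^{\ell})^{T} x^{1}_{N_i} \le \left[ -h_i(x^{\ell}) + \nabla_{x} h_i(x^{\ell})^{T} x^{\ell} \right] y_i, \quad \ell \in K_i,\ i \in I \\
& x_{N_i} = x^{1}_{N_i} + x^{2}_{N_i} \\
& 0 \le x^{1}_{N_i} \le x^{U}_{N_i}\, y_i \\
& 0 \le x^{2}_{N_i} \le x^{U}_{N_i}\, (1 - y_i) \\
& A y \le a \\
& x \in \mathbb{R}^n,\ x^{1}_{N_i} \ge 0,\ x^{2}_{N_i} \ge 0,\ y \in \{0, 1\}^m
\end{aligned} \qquad \text{(MIPDF)}$$

In (MIPDF), the vector x is partitioned into the variables x_{Z_i} and x_{N_i} for each disjunction i, according to the definition of the matrix B^i. The linearization set is given by K_i = {ℓ : Y^ℓ_i = true, ℓ = 1, ..., L}. This reflects the fact that only a subset of the inequalities was enforced for a given subproblem ℓ. The logic-based outer approximation algorithm represents a generalization of the modeling/decomposition strategy [5] for the synthesis of process flowsheets.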

Turkay and Grossmann [9] have also shown that a logic-based generalized Benders method [4] cannot be derived in the same way as the OA algorithm. However, one can exploit the following property for MINLP problems: performing one Benders iteration [2] on the MILP master problem of the OA algorithm is equivalent to generating a generalized Benders cut. Therefore, a logic-based version of the generalized Benders method consists of performing one Benders iteration on the MILP master problem (MIPDF). Slacks can also be introduced to (MDP1) and to (MIPDF) to reduce the effect of nonconvexities, as in the augmented-penalty MILP master problem [10].

Finally, it should be noted that S. Lee and Gross- 
mann [6] have developed a new branch and bound 
method and a MINLP reformulation that is based on 
the convex hull of each of the disjunctions in (DP1) 
with nonlinear inequalities. 
Mass integration in the form of mass exchanger networks (MEN) appears in the chemical industries as an economic alternative in waste treatment, feed preparation, product separation, recovery of valuable materials, etc. A MEN involves a set of rich streams, from which one or more components are removed by means of lean streams (mass separating agents). The removal takes place in mass transfer operations that do not require energy (constant pressure and temperature).
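The basic bookkeeping behind every potential match is a component balance: the mass of a component removed from the rich stream equals the mass picked up by the lean stream. This is stated formally in equation (1) below. The sketch applies it to one exchanger and one component. Flows are assumed constant through the exchanger, and the function name is hypothetical.

```python
def lean_outlet_fraction(g: float, y_in: float, y_out: float,
                         l: float, x_in: float) -> float:
    """Lean-stream outlet mole fraction for one exchanger and one component.

    g, l: rich and lean flows through the exchanger (assumed constant);
    y_in, y_out: rich-stream inlet/outlet fractions; x_in: lean inlet fraction.
    """
    if l <= 0:
        raise ValueError("lean flow must be positive")
    transferred = g * (y_in - y_out)      # mass removed from the rich stream
    return x_in + transferred / l         # the same mass enters the lean stream


if __name__ == "__main__":
    x_out = lean_outlet_fraction(g=2.0, y_in=0.10, y_out=0.02, l=4.0, x_in=0.01)
    print(round(x_out, 6))
```

For example, a rich flow of 2 whose fraction drops from 0.10 to 0.02 transfers 0.16 units. A lean flow of 4 entering at 0.01 therefore leaves at 0.05.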

The MEN synthesis/design problem is posed as a combinatorial problem involving discrete and continuous decisions (e.g. the mass exchange operations/matches and the unit sizes, respectively), both of which affect the overall mass integration cost.

When the mass transfer operations can take place at different temperatures, heat integration of the rich and lean streams is also considered within a combined mass and heat exchanger network (MHEN) synthesis problem.

Figure: Mass exchange network with rich streams R = {i | i = 1, ..., N_R} and lean streams S = {j | j = 1, ..., N_S}

In isothermal MEN synthesis, the simultaneous op- 
timization of the mass exchange operations, the mass 
separating agent flows and the network configuration 
has been formulated by K.P. Papalexandri, E.N. Pis- 
tikopoulos and C.A. Floudas [9] as an MINLP problem 
based on: 

a) the MEN superstructure of synthesis/design alterna- 
tives; 

b) modeling of mass exchange in each mass exchanger; 
and, 

c) minimization of a total annualized network cost. 

Details are given below.


MEN Superstructure 


The MEN superstructure for a given set of rich and lean streams includes all possible mass exchange operations (mass exchange matches) between the network streams in all possible network configurations. Its main features are:

• Each potential match between a rich and a lean stream corresponds to a potential mass exchanger (one-to-one correspondence).

• Multiple mass exchange matches between two streams may be considered (i.e. streams integrated at different points in the network). This increases the number of MEN structures considered and the combinatorial complexity of the synthesis problem. Note that this is not the same as an a priori decomposition of the network into separable subnetworks.

• Each stream entering the network is split towards all its potential mass exchanger units.

• After each mass exchanger, a splitter is considered for each stream. There the stream is split towards its final mixer and all the other potential stream exchangers.

• Prior to each potential mass exchanger, a mixer is considered for each participating stream. There the flow from the initial splitter and the connecting (bypass) flows from all the other exchangers of the stream are merged into the flow towards the exchanger.

• A mixer is considered at the network outlet of each stream, where flows from all the potential stream exchangers are merged into the outlet flow.


[Figure: Lean stream superstructure]


For example, for a rich stream i and its mth and m'th possible exchangers with lean streams j and j', respectively, we have Fig. 2.

In Fig. 2, c = 1, ..., C are the transferable components. All possible configurations of the two exchangers ((ijm) and (ij'm') in series or in parallel) result from 'deleting' appropriate connecting streams. Stream deletion corresponds to zero stream flows (e.g. g^I_{ij'm'} = g^O_{ijm} = 0 and g^B_{ij'jm'm} = 0 results in the exchangers in series).

For a lean stream j and its mth and m'th exchangers 
with rich streams i and i’, we have Fig. 3. 

The MEN superstructure is described by mass bal- 
ances for the overall streams and each transferable com- 
ponent at the exchangers, splitters and mixers of the su- 
perstructure: 


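$$ g^{E}_{ijm}\,(y^{I}_{ijmc} - y^{O}_{ijmc}) = M_{ijmc}, \qquad l^{E}_{ijm}\,(x^{O}_{ijmc} - x^{I}_{ijmc}) = M_{ijmc}, \qquad i \in R,\ j \in S,\ m = 1,\dots,M,\ c = 1,\dots,C, \tag{1} $$

$$ \sum_{j \in S,\,m} g^{I}_{ijm} - G_i = 0,\ \ i \in R, \qquad \sum_{i \in R,\,m} l^{I}_{ijm} - L_j = 0,\ \ j \in S, \tag{2} $$

$$ g^{O}_{ijm} + \sum_{j' \in S,\,m'} g^{B}_{ijj'mm'} - g^{E}_{ijm} = 0, \qquad l^{O}_{ijm} + \sum_{i' \in R,\,m'} l^{B}_{ii'jmm'} - l^{E}_{ijm} = 0, \qquad i \in R,\ j \in S,\ m = 1,\dots,M, \tag{3} $$

$$ g^{I}_{ijm} + \sum_{j' \in S,\,m'} g^{B}_{ij'jm'm} - g^{E}_{ijm} = 0, \qquad l^{I}_{ijm} + \sum_{i' \in R,\,m'} l^{B}_{i'ijm'm} - l^{E}_{ijm} = 0, \qquad i \in R,\ j \in S,\ m = 1,\dots,M, \tag{4} $$

$$ g^{I}_{ijm}\,y^{s}_{ic} + \sum_{j' \in S,\,m'} g^{B}_{ij'jm'm}\,y^{O}_{ij'm'c} - g^{E}_{ijm}\,y^{I}_{ijmc} = 0, \qquad l^{I}_{ijm}\,x^{s}_{jc} + \sum_{i' \in R,\,m'} l^{B}_{i'ijm'm}\,x^{O}_{i'jm'c} - l^{E}_{ijm}\,x^{I}_{ijmc} = 0, \qquad i \in R,\ j \in S,\ c = 1,\dots,C,\ m = 1,\dots,M, \tag{5} $$

$$ \sum_{j \in S,\,m} g^{O}_{ijm}\,y^{O}_{ijmc} - G_i\,y^{t}_{ic} = 0,\ \ i \in R,\ c = 1,\dots,C, \qquad \sum_{i \in R,\,m} l^{O}_{ijm}\,x^{O}_{ijmc} - L_j\,x^{t}_{jc} = 0,\ \ j \in S,\ c = 1,\dots,C, \tag{6} $$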

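where the inlet, outlet, exchanger and exchanger-connecting flows of the rich and lean streams (g^I, g^O, g^E, g^B and l^I, l^O, l^E, l^B, respectively) and the intermediate compositions of components (molar fractions x^I, x^O, y^I, y^O) are illustrated in the corresponding superstructure figures; G_i and L_j are the total flows of rich stream i and lean stream j, and y^s, x^s (y^t, x^t) are their supply (target) compositions.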
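To make the role of the balances concrete, the following minimal sketch evaluates (1) to (6) for a single rich stream and a single transferable component, exchanger by exchanger. It assumes that connecting flows only run from exchangers listed earlier to exchangers listed later (no recycle) and that every listed exchanger carries a positive flow; the function and argument names are hypothetical.

```python
def rich_stream_outlet(G, y_s, order, g_init, g_conn, load):
    """Single rich stream, single component: evaluate balances (1)-(6) in flow order.

    order  : exchanger names, connecting flows must point forward in this list
    g_init : flow from the initial splitter to each exchanger (g^I)
    g_conn : {(from, to): flow} exchanger-connecting flows (g^B)
    load   : mass exchange load M of each exchanger
    """
    tol = 1e-12
    # (2) initial splitter: the whole stream G is distributed to the exchangers
    assert abs(sum(g_init.values()) - G) < tol
    g_E, y_in, y_out, g_out = {}, {}, {}, {}
    for e in order:
        inflows = [(f, src) for (src, dst), f in g_conn.items() if dst == e]
        # (4) mixer before the exchanger: total exchanger flow g^E
        g_E[e] = g_init[e] + sum(f for f, _ in inflows)
        # (5) component balance at that mixer gives the inlet composition y^I
        y_in[e] = (g_init[e] * y_s + sum(f * y_out[src] for f, src in inflows)) / g_E[e]
        # (1) exchanger balance g^E (y^I - y^O) = M gives the outlet composition y^O
        y_out[e] = y_in[e] - load[e] / g_E[e]
        # (3) splitter after the exchanger: flow not sent on goes to the final mixer
        g_out[e] = g_E[e] - sum(f for (src, _), f in g_conn.items() if src == e)
        assert g_out[e] >= -tol, "connecting flows exceed exchanger flow"
    # (6) final mixer: outlet composition of the stream
    y_t = sum(g_out[e] * y_out[e] for e in order) / G
    return y_t, y_in


loads = {"A": 0.03, "B": 0.05}
series = rich_stream_outlet(1.0, 0.1, ["A", "B"], {"A": 1.0, "B": 0.0},
                           {("A", "B"): 1.0}, loads)
parallel = rich_stream_outlet(1.0, 0.1, ["A", "B"], {"A": 0.5, "B": 0.5}, {}, loads)
```

With the same loads, the series and parallel arrangements reach the same outlet composition (0.02), but the inlet composition of exchanger B differs (0.07 in series, 0.1 in parallel). It is exactly these intermediate compositions that enter the driving-force constraints of each exchanger.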


Modeling Mass Exchange 


The existence of each potential mass exchanger in the network is denoted by a binary variable

$$ E_{ijm} = \begin{cases} 1, & \text{when the } m\text{th exchanger between streams } i \text{ and } j \text{ exists}, \\ 0, & \text{otherwise}, \end{cases} $$

and defined by

$$ g^{E}_{ijm} - E_{ijm}U \le 0, \qquad l^{E}_{ijm} - E_{ijm}U \le 0, \qquad M_{ijmc} - E_{ijm}U \le 0, \tag{7} $$

where M_{ijmc} is the mass exchange load of component c in mass exchanger (ijm), and U is a large positive number.

In each potential mass exchanger a component c is transferred from the rich to the lean stream when the rich composition is greater than the equilibrium composition with respect to the lean stream:

$$ y_c > f(x_c), $$

where f(x_c) is the mass transfer equilibrium relation, which may also account for reactive mass transfer.

Feasibility of mass transfer is ensured by imposing the above constraint at the inlet and outlet of the streams, i.e. (for counter-current flows):

$$ -y^{I}_{ijmc} + f(x^{O}_{ijmc}) + \varepsilon_{ijc} - (1 - E_{ijm})U \le 0, \qquad -y^{O}_{ijmc} + f(x^{I}_{ijmc}) + \varepsilon_{ijc} - (1 - E_{ijm})U \le 0, \tag{8} $$

where ε_{ijc} is a minimum composition difference that is required for feasible mass exchange in a unit of finite size (e.g. imposed by mechanical constraints). When f(x_c) is not convex, the constraints in (8) cannot guarantee feasible mass transfer throughout the exchanger. In this case f(x_c) can be approximated by a set of convex functions, and feasible mass transfer can be ensured by imposing the constraints in (8) also at the intermediate exchanger points that define the convex parts. Note that the mass-transfer feasibility (driving-force) constraints in (8) are activated only when the corresponding exchanger exists (E_{ijm} = 1).
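The big-M logic of (7) and (8) can be checked pointwise for a candidate exchanger. The sketch below is a hedged illustration for one component, assuming a linear equilibrium relation f(x) = m x + b; the names, default ε and default U are hypothetical choices, not values from the source.

```python
def exchanger_constraints_ok(E, g_E, l_E, M, y_in, y_out, x_in, x_out,
                             m=1.0, b=0.0, eps=0.005, U=1e3):
    """Check (7) and (8) for one exchanger and one component (E is 0 or 1)."""
    def f(x):
        # assumed linear equilibrium relation y* = m x + b
        return m * x + b

    # (7): flows and load may be nonzero only if the exchanger exists
    logic = g_E - E * U <= 0 and l_E - E * U <= 0 and M - E * U <= 0
    # (8): rich inlet faces the lean outlet, rich outlet faces the lean inlet
    drive_hot_end = -y_in + f(x_out) + eps - (1 - E) * U <= 0
    drive_cold_end = -y_out + f(x_in) + eps - (1 - E) * U <= 0
    return logic and drive_hot_end and drive_cold_end
```

With E = 1, a lean outlet composition too close to equilibrium with the rich inlet violates (8); with E = 0 the same compositions are accepted (the constraint is relaxed by U), but any nonzero exchanger flow then violates (7).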

The size of each potential mass exchanger (number of mass transfer stages N^{st}, etc.) is calculated as a function of the variable mass transfer through appropriate design equations (e.g. the Kremser equation for perforated-plate columns):

$$ N^{st}_{ijm} = N^{st}\left(g^{E}_{ijm},\, l^{E}_{ijm},\, y^{I}_{ijmc},\, y^{O}_{ijmc},\, x^{I}_{ijmc},\, x^{O}_{ijmc}\right). \tag{9} $$
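As an illustration of a design equation of type (9), the sketch below uses one common closed form of the Kremser equation. It assumes dilute streams, constant flows G (rich) and L (lean) and a linear equilibrium relation y = m x + b; the function name is hypothetical.

```python
import math


def kremser_stages(y_in, y_out, x_in, G, L, m, b=0.0):
    """Theoretical stages of a counter-current exchanger, linear equilibrium y* = m x + b."""
    y_star_in = m * x_in + b           # rich composition in equilibrium with the entering lean stream
    ratio = (y_in - y_star_in) / (y_out - y_star_in)
    A = L / (m * G)                    # absorption factor
    if abs(A - 1.0) < 1e-12:
        # limiting case A = 1: N = (y_in - y_out) / (y_out - y*_in)
        return ratio - 1.0
    return math.log((1.0 - 1.0 / A) * ratio + 1.0 / A) / math.log(A)
```

For example, reducing a rich stream from 0.1 to 0.01 with a pure solvent (x_in = 0, m = 1) needs about 2.46 stages at L/G = 2 and 9 stages at L/G = 1. Trading solvent flow against stage count in this way is the capital/operating cost trade-off that (P1) optimizes.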


Minimizing Network Cost 


The total network cost comprises

• the annualized capital cost of the mass exchangers, which may be discontinuous (involve a fixed-charge cost factor), and

• the annualized operating cost, i.e. the cost of the mass separating agents.

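Consequently, the MEN MINLP synthesis model is formulated as follows:

$$ \text{(P1)} \quad \min\ \sum_{i,j,m} C^{cap}_{ijm} + \sum_{j} C^{op}_{j} $$

such that (1) to (9) and

$$ g^{I}_{ijm},\ g^{O}_{ijm},\ g^{E}_{ijm},\ g^{B}_{ijj'mm'},\ l^{I}_{ijm},\ l^{O}_{ijm},\ l^{E}_{ijm},\ l^{B}_{ii'jmm'},\ M_{ijmc},\ y^{I}_{ijmc},\ y^{O}_{ijmc},\ x^{I}_{ijmc},\ x^{O}_{ijmc} \ge 0, \qquad E_{ijm} \in \{0, 1\}, \qquad i, i' \in R,\ j, j' \in S, $$

where C^{cap}_{ijm} stands for the annualized capital cost of mass exchanger (ijm) (including any fixed charge associated with E_{ijm}) and C^{op}_j for the annualized cost of mass separating agent j.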

(P1) is a nonconvex MINLP problem, and global optimization methods are required to guarantee globally optimal solutions.

The main advantage of the simultaneous MEN synthesis model (P1), as opposed to the sequential MEN synthesis method, is that the trade-off between the capital and operating costs is systematically considered. Also,

• (P1) derives the optimal network with respect to all the transferable components, considering the mass transfer of each component separately within the calculated mass-transfer stages of each exchanger.

• Forbidden mass exchange matches, limited mass exchange and/or forbidden exchanger connections can be explicitly considered in (P1).

• Variable target compositions are handled straightforwardly.

When the mass exchange matches and mass exchange loads are fixed (e.g. when these are determined within a sequential MEN synthesis framework), (P1) reduces to an NLP and can be solved to derive a network configuration and unit sizes with minimum capital cost.

Extending the concept of cost optimality of the mass exchanger network, two special cases have been studied:

• MEN and regeneration networks.
When regenerating agents are available for some (or all) lean streams, the total mass integration cost also involves the regeneration cost. The regeneration network can be considered simultaneously within the MINLP MEN synthesis model [9], accounting for all the regeneration alternatives of the lean streams and employing binary variables to denote the existence of the regenerating exchangers. In this case, the mass separating agents behave as lean streams in the mass exchangers of the main MEN and as rich streams in the regenerating mass exchangers. The regeneration network is not necessarily separable from the main MEN, as a lean stream may be partly regenerated before being used as a separating agent in another mass exchanger. Thus, the lean stream superstructures involve all the possible interconnections between the exchangers of the main MEN and the regenerating exchangers. For example, for a lean stream j and its mth and m'th exchangers with rich stream i and regenerant k, we have Fig. 4.
The overall superstructure of mass exchange and regeneration alternatives also involves the superstructures of the regenerating agents, which have variable flows, while the overall network cost includes both the main MEN cost and the regeneration cost (capital and operating cost).

• Flexible mass exchange networks.
The ability of an MEN to accommodate variations in the rich stream flows and inlet compositions efficiently affects cost optimality. A multiperiod MINLP MEN synthesis model has been suggested in [7] to derive mass exchange networks that are flexible enough to accommodate different mass integration requirements in an optimal manner. In the multiperiod MINLP model, a weighted operating cost is optimized simultaneously with the capital cost for mass exchangers that can operate feasibly under the different conditions. The MEN superstructure is extended to include control variables that enhance flexibility (such as exchanger-bypassing streams and overall bypass streams, which are penalized accordingly).

[Figure: Regenerating stream superstructure]
When the alternative mass transfer operations take place at different and/or variable temperatures, heat integration between the network streams can be considered simultaneously within a combined MEN and HEN synthesis problem [7]. The available rich and lean streams define hot, cold or hot-and-cold streams in the heat integration problem, depending on whether their supply and target temperatures are above or below the mass exchange temperatures. Thus, their heat exchange alternatives include both hot- and cold-side matching. Inlet and outlet temperatures and compositions in mass and heat exchangers are variables. The combined mass and heat exchanger superstructure involves all the possible mass and heat exchangers of a stream and all the possible interconnections between them, Fig. 6.
The combined MEN and HEN superstructure is described by

• mass balances at the superstructure splitters (i.e. the initial stream splitters and the splitters after each side of the possible mass and heat exchangers), similar to (2) and (3), and considering all the connecting flows;

• mass balances for overall flows and transferable components at the superstructure mixers (i.e. the final stream mixers and the mixers prior to each side of the potential mass and heat exchangers), similar to (4), (5) and (6), and considering all the connecting flows;

• energy balances at the superstructure mixers;

• mass balances at the mass exchangers, similar to (1); and

• energy balances at the heat exchangers.

The MHEN synthesis model also involves

• binary variables to denote the existence of mass and heat exchangers, and their definition (mixed integer constraints),

• driving force constraints for mass exchange (8) at the potential mass exchangers, and for heat exchange at the potential heat exchangers (based on ΔT_min),

• design equations for the potential mass and heat exchangers, and

• a total annualized network cost,

and is formulated as a (nonconvex) MINLP.

The simultaneous MHEN synthesis model systematically addresses the trade-off between capital and operating cost of mass and heat integration. The MEN and HEN are not assumed separable. Thus, better integration can be achieved, since a stream may be partly heated for a particular mass exchange operation and then heated further for final purification.

[Figure: Combined MEN and HEN superstructure]


In the simple case when the temperatures of the mass exchange operations are given or can be prepostulated, the rich and lean streams define hot (or cold) streams before participating in mass exchangers and cold (or hot) streams afterwards [11].

The final mass and heat exchanger network structure results from the flows of the superstructure substreams. Alternatively, the use of binary variables to denote the existence of exchanger connections has been suggested in [7]. Although this increases the combinatorial complexity of the MINLP synthesis model, it allows for:

i) explicit piping cost considerations,
ii) structural constraints to be easily modeled, and
iii) the solution of simple NLP subproblems within a decomposition-based MINLP solution method.

Mass exchange networks have been introduced as an end-of-pipe treatment alternative. However, the extent of mass recovery and the corresponding cost are closely related to the reactive and mixing operations in a process. A. Lakshmanan and L.T. Biegler [6] have suggested an MINLP model for the synthesis of optimal reactor networks in which the thermodynamic feasibility of mass integration and its implications are taken into account simultaneously, applying the first and second thermodynamic laws for mass exchange, i.e.

• total mass balance for the mass integrated streams (resulting process and available rich and lean streams):

$$ \sum_{i \in R} G_i\,(y^{s}_{ic} - y^{t}_{ic}) = \sum_{j \in S} L_j\,(x^{t}_{jc} - x^{s}_{jc}), \tag{10} $$

and

• feasibility of mass exchange above (and below) each candidate mass exchange pinch:

$$ \left(\begin{array}{c}\text{mass lost by all the rich streams}\\ \text{below each pinch point candidate}\end{array}\right) - \left(\begin{array}{c}\text{mass gained by all the lean streams}\\ \text{below each pinch point candidate}\end{array}\right) \le 0. \tag{11} $$


Note that the thermodynamic feasibility requirements 
in (11) involve nondifferentiable terms if inlet and 
outlet compositions are variables (position of streams 
with respect to candidate pinch points). These can be 
handled either employing differentiable approximation 
functions [6], or introducing binary variables [2,3,5]. 
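For fixed stream data, (10) and (11) can be checked by a simple composition-interval calculation. The sketch below assumes a single transferable component, a linear equilibrium y = m x + b shifted by a minimum composition difference eps, and candidate pinch points at the supply and target compositions of all streams mapped onto the rich-stream scale. It is a plain evaluation for given compositions, not the differentiable approximations of [6] or the binary reformulations of [2,3,5]; all names are hypothetical.

```python
def overall_balance_gap(rich, lean):
    """Left minus right side of (10) for one component.

    rich: [(G, y_s, y_t), ...], lean: [(L, x_s, x_t), ...]
    """
    return (sum(G * (ys - yt) for G, ys, yt in rich)
            - sum(L * (xt - xs) for L, xs, xt in lean))


def pinch_violations(rich, lean, m=1.0, b=0.0, eps=0.0, tol=1e-12):
    """Evaluate (11) at every candidate pinch; return [(y_pinch, excess), ...] where it fails."""
    def to_rich_scale(x):          # lean composition -> corresponding rich composition
        return m * x + b + eps

    def to_lean_scale(y):          # inverse mapping
        return (y - b - eps) / m

    candidates = sorted({y for _, ys, yt in rich for y in (ys, yt)}
                        | {to_rich_scale(x) for _, xs, xt in lean for x in (xs, xt)})
    violations = []
    for yp in candidates:
        # mass lost by all rich streams below the candidate pinch
        lost = sum(G * max(0.0, min(ys, yp) - yt) for G, ys, yt in rich)
        # mass gained by all lean streams below the same candidate
        xp = to_lean_scale(yp)
        gained = sum(L * max(0.0, min(xt, xp) - xs) for L, xs, xt in lean)
        if lost - gained > tol:
            violations.append((yp, lost - gained))
    return violations


rich = [(1.0, 0.10, 0.02)]
ok_lean = [(1.0, 0.00, 0.08)]
shifted_lean = [(1.0, 0.05, 0.13)]
```

Both lean streams satisfy the overall balance (10), but only the first passes (11). With the shifted lean stream, the mass the rich stream loses between y = 0.02 and 0.05 has no lean capacity at a low enough composition, which the check reports at the candidate pinch y = 0.05.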
The main assumption in MEN synthesis is that mass transfer operations are isothermal. In the general case these can be followed (or caused) by heat transfer, as in distillation. Assuming constant counter-current molar flows, M.J. Bagajewicz and V. Manousiouthakis showed in [1] that distillation columns can be handled as pure mass transfer operations and derived targets for energy consumption and separation of a 'key' component, employing the first and second thermodynamic laws in (10) and (11), within an MINLP-based MHEN sequential synthesis framework. The problem of energy-induced separations has been addressed by M.M. El-Halwagi, B.K. Srinivas and R.F. Dunn in [4], translating the energy-based separation tasks into simple energy-requiring operations (heating and cooling tasks) and deriving targets for energy consumption and the corresponding mass recovery, based on thermodynamic feasibility constraints.

Extending the concept of mass exchange to nonisothermal mass transfer operations, Papalexandri and Pistikopoulos introduced a mass/heat transfer module [8], where mass is transferred between different
phases or reacting species if that is thermodynamically 
feasible, i. e. if that decreases the total Gibbs free energy 
of the system. Mass and energy balances, taking into 
account possible reactions, and mass-transfer driving- 
force constraints based on total Gibbs free energy are 
employed to model the mass/heat transfer module as 
an aggregate of differential mass and energy transfer 
phenomena. Considering a superstructure of mass/ 
heat and heat exchange modules in a process and all 
possible interconnections between them, process syn- 
thesis tasks can be formulated as mass/heat and heat 
exchange superstructure MINLP problems, where bi- 
nary variables are employed to denote the existence 
of mass/heat and heat exchangers. Then, process op- 
erations (conventional and/or hybrid) and networks 
are derived as combinations of mass/heat and heat 
exchange phenomena [8,10]. 
The outer approximation algorithm (OA algorithm) ([1,2,9]) addresses mixed integer nonlinear programs of the form:

$$ \text{(P)} \quad \min\ Z = f(x, y) \quad \text{s.t.}\ \ g_j(x, y) \le 0,\ j \in J, \quad x \in X,\ y \in Y, $$


where f(·), g(·) are convex, differentiable functions, J is the index set of inequalities, and x and y are the continuous and discrete variables, respectively. The set X is commonly assumed to be a convex compact set, e.g. X = {x : x ∈ R^n, Dx ≤ d, x^L ≤ x ≤ x^U}; the discrete set Y corresponds to a polyhedral set of integer points, Y = {y : y ∈ Z^m, Ay ≤ a}, and in most cases is restricted to 0-1 values, y ∈ {0, 1}^m. In most applications of interest the objective and constraint functions f(·), g(·) are linear in y (e.g. fixed cost charges and logic constraints).

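The OA algorithm is based on the following theorem [1]:

Theorem 1 Problem (P) and the following mixed-integer linear program (MILP) master problem (M-OA) have the same optimal solution (x*, y*):

$$ \text{(M-OA)} \quad \min\ Z_L = \alpha $$

such that

$$ \alpha \ge f(x^k, y^k) + \nabla f(x^k, y^k)^{T} \begin{bmatrix} x - x^k \\ y - y^k \end{bmatrix}, $$
$$ g_j(x^k, y^k) + \nabla g_j(x^k, y^k)^{T} \begin{bmatrix} x - x^k \\ y - y^k \end{bmatrix} \le 0, \quad j \in J,\ k \in K^*, $$
$$ x \in X,\ y \in Y, $$

where

$$ K^* = \left\{ k : (x^k, y^k) \text{ is the optimal solution to (NLP1) for all feasible } y^k \in Y \right\}, $$
$$ \text{(NLP1)} \quad \min\ Z_U^k = f(x, y^k) \quad \text{s.t.}\ \ g_j(x, y^k) \le 0,\ j \in J, \quad x \in X, $$

where Z_U^k is an upper bound to the optimum of problem (P).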


Note that since the functions f(x, y) and g(x, y) are convex, the linearizations in (M-OA) correspond to outer approximations of the nonlinear feasible region in problem (P). Also, since the master problem (M-OA) requires the solution of (NLP1) for all feasible discrete values y^k, the following MILP relaxation is considered, assuming that the solutions of K NLP subproblems are available:


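$$ \text{(RM-OA)} \quad \min\ Z_L^K = \alpha $$

such that

$$ \alpha \ge f(x^k, y^k) + \nabla f(x^k, y^k)^{T} \begin{bmatrix} x - x^k \\ y - y^k \end{bmatrix}, $$
$$ g_j(x^k, y^k) + \nabla g_j(x^k, y^k)^{T} \begin{bmatrix} x - x^k \\ y - y^k \end{bmatrix} \le 0, \quad j \in J,\ k = 1, \dots, K, $$
$$ x \in X,\ y \in Y. $$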


Given the assumption on convexity of the functions f(x, y) and g(x, y), the following property can easily be established:

Property 2. The solution of problem (RM-OA) corresponds to a lower bound to the solution of problem (P).


Note that since function linearizations are accumulated as iterations k proceed, the master problems (RM-OA) yield a nondecreasing sequence of lower bounds, Z_L^1 ≤ Z_L^2 ≤ ... ≤ Z_L^K.
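The reasoning behind Property 2 and the monotone sequence of bounds can be spelled out in a few lines. The derivation below is a standard convexity argument, sketched here for illustration rather than quoted from [1]:

```latex
% Convexity: each linearization underestimates the function it linearizes
f(x,y) \ge f(x^k,y^k) + \nabla f(x^k,y^k)^T \begin{bmatrix} x - x^k \\ y - y^k \end{bmatrix},
\qquad
g_j(x,y) \ge g_j(x^k,y^k) + \nabla g_j(x^k,y^k)^T \begin{bmatrix} x - x^k \\ y - y^k \end{bmatrix}.
% Any (x, y) feasible in (P) has g_j(x, y) <= 0, so it satisfies every linearized
% constraint, and alpha = f(x, y) satisfies every objective cut. Taking (x*, y*):
Z_L^K \le f(x^*, y^*) = Z^*.
% The cuts of iteration K+1 are added to those of iteration K, so the feasible
% set F_K of the master problem can only shrink:
\mathcal{F}_{K+1} \subseteq \mathcal{F}_K \;\Longrightarrow\; Z_L^K \le Z_L^{K+1}.
```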

The OA algorithm as proposed by M.A. Duran and I.E. Grossmann [1] consists of performing a cycle of major iterations, k = 1, ..., K, in which (NLP1) is solved for the corresponding y^k, and the relaxed MILP master problem (RM-OA) is updated and solved with the corresponding function linearizations at the point (x^k, y^k). The (NLP1) subproblems yield an upper bound that is used to define the best current solution, UB^K = min_k (Z_U^k). The cycle of iterations is continued until this upper bound and the lower bound of the relaxed master problem are within a specified tolerance.
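The cycle of major iterations can be traced on a small example. The sketch below is a minimal, self-contained illustration on a hypothetical toy convex MINLP (not taken from this article) with a nonlinear objective and a single linear constraint. Because the constraint is linear, it is kept exactly in the master problem and only objective cuts are needed. (NLP1) is solved in closed form, and the MILP master (RM-OA) is replaced by enumeration over y with an exact one-dimensional minimization of the piecewise-linear cut envelope; a real implementation would call NLP and MILP solvers instead.

```python
import itertools
import math

# Hypothetical toy convex MINLP (illustration only):
#   min f(x, y) = (x - 3)^2 + 2*y1 + y2
#   s.t. x <= 1 + 1.5*y1 + y2,  0 <= x <= 4,  y in {0, 1}^2


def f(x, y):
    return (x - 3.0) ** 2 + 2.0 * y[0] + y[1]


def grad_f(x, y):
    """Return (df/dx, (df/dy1, df/dy2))."""
    return 2.0 * (x - 3.0), (2.0, 1.0)


def x_upper(y):
    # The linear constraint combined with the bound x <= 4.
    return min(4.0, 1.0 + 1.5 * y[0] + y[1])


def solve_nlp(y):
    """(NLP1) for fixed y: minimize (x - 3)^2 over [0, x_upper(y)] in closed form."""
    x = min(3.0, x_upper(y))  # x_upper(y) >= 1, so the lower bound 0 is never active
    return x, f(x, y)


def solve_master(cuts):
    """(RM-OA) by enumerating y (a stand-in for an MILP solver).

    For fixed y each cut  alpha >= f_k + gx_k (x - x_k) + gy_k.(y - y_k)  is affine in x;
    the minimum of their pointwise max over an interval lies at an endpoint or at an
    intersection of two cuts.
    """
    best = (math.inf, None, None)
    for y in itertools.product((0, 1), repeat=2):
        lo, hi = 0.0, x_upper(y)
        lines = [(gx, fk - gx * xk + sum(g * (a - b) for g, a, b in zip(gy, y, yk)))
                 for xk, yk, fk, gx, gy in cuts]
        cand = [lo, hi]
        for (a1, b1), (a2, b2) in itertools.combinations(lines, 2):
            if a1 != a2 and lo <= (b2 - b1) / (a1 - a2) <= hi:
                cand.append((b2 - b1) / (a1 - a2))
        for x in cand:
            alpha = max(a * x + b for a, b in lines)
            if alpha < best[0]:
                best = (alpha, y, x)
    return best


def outer_approximation(y0, tol=1e-9, max_iter=20):
    y, ub, best, cuts, lower_bounds = y0, math.inf, None, [], []
    for k in range(1, max_iter + 1):
        xk, zk = solve_nlp(y)             # (NLP1): upper bound Z_U^k
        if zk < ub:
            ub, best = zk, (xk, y)        # UB^K = min_k Z_U^k
        gx, gy = grad_f(xk, y)
        cuts.append((xk, y, zk, gx, gy))  # accumulate linearizations at (x^k, y^k)
        lb, y, _ = solve_master(cuts)     # (RM-OA): lower bound Z_L^K and next y
        lower_bounds.append(lb)
        if lb >= ub - tol:                # bounds within tolerance: stop
            break
    return best, ub, lower_bounds, k
```

Starting from y = (1, 1), this sketch produces the lower bounds 0, 1, 2 and stops after three major iterations at y = (0, 1), x = 2, Z = 2, which agrees with enumerating all four values of y. The lower bounds are nondecreasing, as Property 2 and the accumulation of cuts predict.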

It should be noted that, for the case when problem (NLP1) has no feasible solution, there are two major ways to handle this problem. The more general option is to consider the solution of the feasibility problem

$$ \text{(NLFP)} \quad \min\ u \quad \text{s.t.}\ \ g_j(x, y^k) \le u,\ j \in J, \quad x \in X,\ u \in \mathbb{R}^1. $$


R. Fletcher and S. Leyffer [2] have shown that for 
infeasible NLP subproblems, if the linearization at the 
solution of problem (NLFP) is included, this will guar- 
antee convergence to the optimal solution. 

For the case when the discrete set Y is given by 0-1 values in problem (P), the other option to ensure convergence of the OA algorithm without solving the feasibility subproblems (NLFP) is to introduce the following integer cut, whose objective is to make infeasible the choice of the previous 0-1 values generated at the K previous iterations [1]:

$$ \text{(ICUT)} \quad \sum_{i \in B^k} y_i - \sum_{i \in N^k} y_i \le |B^k| - 1, \quad k = 1, \dots, K, $$

where B^k = {i | y_i^k = 1}, N^k = {i | y_i^k = 0}, k = 1, ..., K. This cut becomes very weak as the dimensionality of the 0-1 variables increases. However, it has the useful feature of ensuring that new 0-1 values are generated at each major iteration. In this way the algorithm will not return to a previous integer point when convergence is achieved. Using the above integer cut, the termination takes place as soon as Z_L^K ≥ UB^K.
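The effect of (ICUT) is easy to verify by enumeration. The small sketch below builds the cut for one 0-1 point and confirms that it removes that point and no other; the helper names are hypothetical.

```python
from itertools import product


def integer_cut(y_k):
    """(ICUT) for a 0-1 point y_k: sum_{B^k} y_i - sum_{N^k} y_i <= |B^k| - 1."""
    coeffs = [1 if v == 1 else -1 for v in y_k]   # +1 on B^k, -1 on N^k
    rhs = sum(y_k) - 1                            # |B^k| - 1
    return coeffs, rhs


def satisfies(cut, y):
    coeffs, rhs = cut
    return sum(a * v for a, v in zip(coeffs, y)) <= rhs


cut = integer_cut((0, 1, 0))
```

At y^k the left-hand side equals |B^k|. Any other 0-1 point either drops an element of B^k or adds an element of N^k, and in both cases the left-hand side falls by at least one. This is why the cut is exact for the point it targets but says nothing about its neighbours, which explains its weakness in higher dimensions.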

The OA method generally requires relatively few cy- 
cles or major iterations. One reason for this behavior is 
given by the following property: 


Property 3. The OA algorithm trivially converges in 
one iteration if f(x, y) and g(x, y) are linear. 


The proof simply follows from the fact that if f(x, y) and 
g(x, y) are linear in x and y the MILP master problem 
(RM-OA) is identical to the original problem (P). 

It is also important to note that the MILP master problem need not be solved to optimality. In fact, given the upper bound UB^K and a tolerance ε, it is sufficient to generate the new (y^K, x^K) by solving

$$ \text{(M-OAF)} \quad \min\ Z_L^K = 0\,\alpha $$

such that

$$ \alpha \le UB^K - \varepsilon, $$
$$ \alpha \ge f(x^k, y^k) + \nabla f(x^k, y^k)^{T} \begin{bmatrix} x - x^k \\ y - y^k \end{bmatrix}, $$
$$ g_j(x^k, y^k) + \nabla g_j(x^k, y^k)^{T} \begin{bmatrix} x - x^k \\ y - y^k \end{bmatrix} \le 0, \quad j \in J,\ k = 1, \dots, K, $$
$$ x \in X,\ y \in Y. $$

While in (M-OA) the interpretation of the new point y^K is that it represents the best integer solution to the approximating master problem, in (M-OAF) it represents an integer solution whose lower bounding objective does not exceed the current upper bound UB^K; in other words, it is a feasible solution to (M-OA) with an objective below the current estimate. Note that in this case the OA iterations are terminated when (M-OAF) is infeasible.

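Another interesting point about the OA algorithm is the relationship of its master problem with that of the generalized Benders decomposition (GBD) method [3], which is given by:

$$ \text{(RM-GBD)} \quad \min\ Z_L^K = \alpha $$

such that

$$ \alpha \ge f(x^k, y^k) + \nabla_y f(x^k, y^k)^{T} (y - y^k) + (\mu^k)^{T} \left[ g(x^k, y^k) + \nabla_y g(x^k, y^k)^{T} (y - y^k) \right], \quad k \in KFS, $$
$$ (\lambda^k)^{T} \left[ g(x^k, y^k) + \nabla_y g(x^k, y^k)^{T} (y - y^k) \right] \le 0, \quad k \in KIS, $$
$$ y \in Y,\ \alpha \in \mathbb{R}^1, $$

where KFS is the set of feasible subproblems (NLP1), KIS is the set of infeasible subproblems, whose solution is given by (NLFP), and μ^k, λ^k are the Lagrange multipliers of (NLP1) and (NLFP), respectively. Also, |KFS ∪ KIS| = K. The following property holds between the two methods [1]: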


Property 4 Given the same set of K subproblems, the lower bounds predicted by the relaxed master problem (RM-OA) are greater than or equal to the ones predicted by the relaxed master problem (RM-GBD).


The proof follows from the fact that the Lagrangian and feasibility cuts in (RM-GBD) are surrogates of the outer approximations in the master problem (M-OA). Given the fact that the lower bounds of GBD are generally weaker, this method commonly requires a larger number of cycles or major iterations. As the number of 0-1 variables increases, this difference becomes more pronounced. This is to be expected, since only one new cut is generated per iteration. Therefore, user-supplied constraints must often be added to the master problem to strengthen the bounds. As for the OA algorithm, the trade-off is that while it generally predicts stronger lower bounds than GBD, the computational cost of solving the master problem (M-OA) is greater, since the number of constraints added per iteration is equal to the number of nonlinear constraints plus the nonlinear objective.

MINLP: Outer Approximation Algorithm, Table 1
Summary of computational results

Method   Subproblems   Master problems   LPs solved
BB       5 (NLP1)      -                 -
OA       3 (NLP2)      3 (M-MIP)         19 LPs
GBD      4 (NLP2)      4 (M-GBD)         10 LPs
ECP      -             5 (M-MIP)         18 LPs

MINLP: Outer Approximation Algorithm, Figure 1
Progress of iterations of OA and GBD for MINLP in MIP-EX

The OA algorithm is also closely related to the extended cutting plane (ECP) method by T. Westerlund and F. Pettersson [8]. The main difference is that in the ECP method no NLP subproblem is solved; linearization simply takes place over the predicted continuous points from the MILP master problem, which in turn will normally only include linearizations of the most violated constraints.

Extensions of the OA algorithm [4] include the LP/NLP based branch and bound [6], which avoids the complete solution of the MILP master problem (M-OA) at each major iteration. The method starts by solving an initial NLP subproblem, which is linearized as in (M-OA). The basic idea then consists of performing an LP-based branch and bound method for (M-OA) in which NLP subproblems (NLP1) are solved at those nodes in which feasible integer solutions are found. By updating the representation of the master problem in the current open nodes of the tree with the addition of the corresponding linearizations, the need to restart the tree search is avoided. Another important extension has been the method by Fletcher and Leyffer [2], who included a quadratic approximation based on the Hessian of the Lagrangian in the master problem (M-OAF) in order to capture nonlinearities in the 0-1 variables. Note that in this case the optimal solution of the mixed integer quadratic program (MIQP), Z_L^K, does not predict valid lower bounds, and hence the constraint α ≤ UB^K - ε is added, with which the search is terminated when no feasible solution can be found in the MIQP master.

Finally, in order to handle equations in problem (P), G.R. Kocis and Grossmann [5] proposed the equality relaxation strategy, in which linearizations of equations are converted into inequalities for the MIP master problem according to the sign of the Lagrange multipliers of the corresponding NLP subproblem. J. Viswanathan and Grossmann [7] further proposed adding slack variables to this MILP master problem, together with an augmented penalty function. Since the bounding properties do not apply in this generally nonconvex case, the algorithm was modified so as to start with the NLP relaxation of problem (P). If no integer solution is found, iterations between the MILP and NLP subproblems take place until there is no improvement in the objective function. This idea was implemented in the commercial code DICOPT, which can also be switched to the original OA algorithm if the user knows that the functions f(x, y) and g(x, y) are convex.
Example 5 In order to illustrate the performance of the OA algorithm, a simple numerical MINLP example is considered:

$$ \begin{aligned} \text{(MIP-EX)} \quad \min\ & Z = y_1 + 1.5\,y_2 + 0.5\,y_3 + x_1^2 + x_2^2 \\ \text{s.t.}\ & (x_1 - 2)^2 - x_2 \le 0, \\ & x_1 - 2\,y_1 \ge 0, \\ & x_1 - x_2 - 4\,(1 - y_2) \le 0, \\ & x_1 - (1 - y_1) \ge 0, \\ & x_2 - y_2 \ge 0, \\ & x_1 + x_2 \ge 3\,y_3, \\ & y_1 + y_2 + y_3 \ge 1, \\ & 0 \le x_1 \le 4,\ \ 0 \le x_2 \le 4, \\ & y_1, y_2, y_3 \in \{0, 1\}. \end{aligned} $$


The optimum solution to this problem corresponds to y1 = 0, y2 = 1, y3 = 0, x1 = 1, x2 = 1, Z = 3.5. Figure 1 shows the progress of the iterations of the OA and GBD algorithms with the starting point y1 = y2 = y3 = 1. As can be seen, the lower bounds predicted by the OA algorithm are considerably stronger than those predicted by GBD. In particular, at iteration 1 the lower bound of OA is 1.0, while that of GBD is -23.5. Nevertheless, since this is a very small problem, GBD requires only one more iteration than OA (4 versus 3). It is interesting to note that the NLP relaxation of this problem is 2.53, which is significantly lower than the optimal mixed integer solution. Also, as can be seen in Table 1, an NLP-based branch and bound method requires the solution of 5 NLP subproblems, while the ECP method requires 5 successive MILP problems.
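The stated optimum of (MIP-EX) can be cross-checked by brute force, which also makes the model's constraints concrete. The sketch below is not the OA algorithm. It evaluates all eight 0-1 combinations on a uniform (x1, x2) grid of step 0.05 over [0, 4]^2. A grid search in general only yields an upper bound on the continuous optimum, but here the reported optimum x1 = x2 = 1 lies on the grid; the function names are hypothetical.

```python
import itertools


def feasible(x1, x2, y1, y2, y3, tol=1e-9):
    """Constraints of (MIP-EX)."""
    return ((x1 - 2) ** 2 - x2 <= tol
            and x1 - 2 * y1 >= -tol
            and x1 - x2 - 4 * (1 - y2) <= tol
            and x1 - (1 - y1) >= -tol
            and x2 - y2 >= -tol
            and x1 + x2 >= 3 * y3 - tol
            and y1 + y2 + y3 >= 1)


def objective(x1, x2, y1, y2, y3):
    return y1 + 1.5 * y2 + 0.5 * y3 + x1 ** 2 + x2 ** 2


def grid_search(step_inv=20):
    """Evaluate every 0-1 combination on a uniform (x1, x2) grid over [0, 4]^2."""
    grid = [i / step_inv for i in range(4 * step_inv + 1)]
    best = None
    for y in itertools.product((0, 1), repeat=3):
        for x1 in grid:
            for x2 in grid:
                if feasible(x1, x2, *y):
                    z = objective(x1, x2, *y)
                    if best is None or z < best[0]:
                        best = (z, y, (x1, x2))
    return best
```

The search returns Z = 3.5 at y = (0, 1, 0), (x1, x2) = (1, 1), in agreement with the optimum given above.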
Reactive distillation (RD) occurs when a reaction takes
place in the liquid holdup on the trays, in the reboiler, 
or in the condenser of a distillation column. Reactive 
distillation can increase the conversion of equilibrium 
limited reactions by continuously separating products 
and reactants, improve the selectivity in some kineti- 
cally limited reaction systems, and separate azeotropic 
and isomeric mixtures by converting one species into 
another that is easy to remove. It can also create a natu- 
ral heat integration that uses an exothermic heat of re- 
action to create vapor boilup in a distillation column, 
and reduce capital costs by completing several processing steps in a single vessel. Reactive distillation is used commercially to produce methyl tert-butyl ether [13], esters including methyl acetate [1], and nylon 6,6 [9]. It has also been proposed for hydrolysis reactions [7], ethylene glycol synthesis [11], and cumene production [12]. See [7] for a review of the area.

As a result of increasing interest in the reactive distillation technique, systematic reactive distillation design methods have gained much importance. See [2,3,4,5] for the application of residue curve maps, a powerful tool for visualizing distillation problems, to reactive distillation. In [7] this work was extended by including kinetic effects when the Damköhler number is fixed. In [14] the synthesis of reactive distillation with multiple reactions is studied.

Reactive distillation poses a challenging problem for optimization-based design techniques. Unlike in conventional distillation, holdup volume is an important design variable in reactive distillation, since the reaction generally takes place in the liquid on the tray. The constant molar overflow assumption of conventional distillation design is not valid unless the reaction is thermally neutral and stoichiometrically balanced. For an optimal solution one should take into account that the feed to the column may be distributed. Thus the feed distribution, in addition to the holdup volume, liquid and vapor flows, composition and temperature profiles, number of trays and feed location(s), becomes a major variable of an optimization problem that searches for the minimum of a cost function. The constraints of this optimization problem are material and energy balances, vapor-liquid equilibria, mole fraction summations, kinetic and thermodynamic relationships, and logical relationships between the variables. The resulting optimization model is a mixed integer nonlinear programming problem, since it involves the optimum number of trays and feed tray locations, which are integer variables. The cost function and the material and energy balances cause the nonlinearity of the problem.

There are two approaches to RD design via MINLP 
methods. One addresses reactive distillation through 
heat and mass exchanger networks [10], and the other 
addresses it through distillation column superstruc- 
tures [6,8]. 


Problem Statement 


The general reactive distillation column synthesis problem can be stated formally as follows. Given:

• the chemical species, i = 1, ..., I, involved in the distillation; desired products, i ∈ P, and their production rates P_i;

• the set of chemical reactions, j = 1, ..., J;

• rate expressions r_j or an equilibrium constant K_j for each reaction j;

• heat of vaporization and vapor-liquid equilibrium data;

• cost of downstream separations;

• cost c_s and composition x_is of all feedstocks, s = 1, ..., S;

• the cost of the column as a function of the number of trays and the internal vapor flow rate, C(V, N);

• the form of the catalyst.

Determine:

• the optimum number of trays;

• the trays where reactions take place;

• the holdup on each tray where a kinetically limited reaction takes place;



• the reflux ratio;

• the condenser and reboiler duties; and

• the feed location(s),

such that the total cost is minimized while producing the required amount of product.


Distillation Based Superstructure Approaches 


One approach to MINLP-based reactive distillation column design uses a superstructure that contains many different alternative designs embedded within it. Two different superstructures have been proposed; they differ in their treatment of the liquid reflux and vapor boilup, and in their heat management. See [6] for a structure that varies the number of trays and always recycles the liquid reflux to the top tray and the vapor boilup to the bottom tray (Fig. 1). More recently (1997), Z.H. Gumus and A.R. Ciric [8] modified the superstructure presented in [15], recycling vapor boilup and liquid reflux to each tray by adding a decanter to the distillate stream and side heaters and coolers to each tray (Fig. 2). In both of these superstructures, the number of trays may vary between 1 and some upper bound K. Each feed stream is split, and a portion is sent to each tray in the superstructure. In kinetically limited reactions, the holdup volume may vary, and, in reaction systems with a solid catalyst, some trays will have reaction while others do not.


MINLP: Reactive Distillation Column Synthesis, Figure 1 
Superstructure for optimum feed location(s) and number of 
trays [6] 


MINLP: Reactive Distillation Column Synthesis, Figure 2 
Tray-by-tray superstructure of [8] 


The structure shown in Fig. 1 is appropriate for reactive distillation processes with a single liquid phase and kinetically limited reactions that are catalyzed with a solid catalyst. Representing the existence of each tray k with an integer variable Y_k leads to a mixed integer nonlinear programming problem whose solution extracts a design with the number of trays, feed tray locations, reactive trays, holdup volumes, reflux ratio and boilup ratio that minimize the total cost. Assuming vapor-liquid equilibrium on each tray, no reaction in the vapor phase, a homogeneous liquid phase, negligible enthalpy of liquid streams and a constant heat of vaporization leads to the MINLP shown below [6]:


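$$
\begin{aligned}
\min Z ={}& c_0 + \sum_s c_s \sum_k F_{s,k} + c_R Q_B + c_C Q_C + c_T D^{1.55} \sum_k (\cdots) \\
& + c_{SH} D^{1.066} \sum_k \Big[ H_0 + \sum_{k' \le k} (\cdots) \Big]^{0.802} (Y_k - Y_{k+1})
\end{aligned}
$$

subject to

$$
\begin{aligned}
&\sum_s X_{i,s} F_{s,1} - L_1 x_{i,1} (1 - \beta) + L_2 x_{i,2} - V_1 K_{i,1} x_{i,1} + \sum_j \nu_{i,j} \xi_{j,1} = 0 && (1)\\
&\Big[ \sum_s X_{i,s} F_{s,k} + V_{k-1} K_{i,k-1} x_{i,k-1} + L_{k+1} x_{i,k+1} - L_k x_{i,k} - V_k K_{i,k} x_{i,k} + \sum_j \nu_{i,j} \xi_{j,k} \Big] Y_k = 0, \quad k = 2, \dots, K && (2)\\
&\Big[ \lambda V_{k-1} - \lambda V_k - \sum_j \Delta H_j \xi_{j,k} \Big] Y_k = 0 && (3)\\
&\mathrm{Dist} = \sum_k (V_k - L_{k+1}) (Y_k - Y_{k+1}) && (4)\\
&B_i = (1 - \beta) L_1 x_{i,1} && (5)\\
&B_i = P \quad (i = \text{product}) && (6)\\
&\Big[ \sum_i x_{i,k} - 1 \Big] Y_k = 0 && (7)\\
&\Big[ \sum_i K_{i,k} x_{i,k} - 1 \Big] Y_k = 0 && (8)\\
&\sum_k (Y_k - Y_{k+1}) = 1 && (9)\\
&x_{d,i} - K_{i,k} x_{i,k} - 1 + Y_k - Y_{k+1} \le 0 && (10)\\
&x_{i,k+1} - x_{d,i} - 1 + Y_k - Y_{k+1} \le 0 && (11)\\
&\xi_{j,k} = W_k f_j(x_{i,k}, T_k) && (12)\\
&K_{i,k} = K_{i,k}(x_{i,k}, T_k) && (13)\\
&V_k - F_{\max} Y_k \le 0 && (14)\\
&\sum_s F_{s,k} - F_{\max} Y_k \le 0 && (15)\\
&L_k - F_{\max} Y_k \le 0 && (16)\\
&W_k - W_{\max} Y_k \le 0 && (17)\\
&Q_B = \beta \lambda L_1 && (18)\\
&Q_C = \lambda \sum_k V_k (Y_k - Y_{k+1}) && (19)\\
&\cdots && (20)\\
&D \ge D_{\min} && (21)\\
&Y_{k+1} \le Y_k && (22)
\end{aligned}
$$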


In this model, constraints (1) and (2) are the component balances of species i over the bottom tray and the remaining trays k; constraint (3) is the energy balance around tray k. The distillate flow is found with constraint (4), as the difference between the vapor flow leaving the top tray and the liquid flow entering it. Note that the term $Y_k - Y_{k+1}$ is nonzero only for the top tray and zero for all others. Constraint (5) calculates the bottoms flow rate and constraint (6) specifies the production rate. Summation equations for the mole fractions are given in constraints (7) and (8). Constraints (9)-(11) identify the top tray and set the distillate and liquid reflux composition equal to the composition of the vapor leaving the top tray. Reaction rates are given in constraint (12), and the vapor-liquid equilibrium constant is found by constraint (13). Constraints (14)-(17) ensure that when $Y_k$ equals zero and tray k does not exist, the flows onto and off the tray are zero. Constraints (18) and (19) calculate the reboiler and condenser duties, while constraints (20) and (21) find the column diameter. The last constraint ensures that tray k + 1 does not exist if tray k does not exist.
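The role of the binary tray-existence variables in constraints (4), (9) and (22) can be illustrated outside the optimizer. The Python sketch below is not part of the model in [6]; it only evaluates these constraints for a fixed, hypothetical vector of $Y_k$ values and made-up flows, using the list layout described in its docstrings (function names are hypothetical).

```python
from typing import List, Sequence


def top_tray(Y: Sequence[int]) -> int:
    """1-based index k of the top tray, i.e. the tray with Y_k - Y_{k+1} = 1.

    Y[k - 1] is the existence binary Y_k of candidate tray k (tray 1 = bottom);
    Y_{K+1} is taken as 0.
    """
    # Constraint (22): tray k + 1 may exist only if tray k exists.
    if any(Y[k + 1] > Y[k] for k in range(len(Y) - 1)):
        raise ValueError("Y violates Y_{k+1} <= Y_k")
    padded = list(Y) + [0]
    diffs = [padded[k] - padded[k + 1] for k in range(len(Y))]
    # Constraint (9): exactly one difference Y_k - Y_{k+1} equals 1.
    if sum(diffs) != 1:
        raise ValueError("no tray exists")
    return diffs.index(1) + 1


def distillate(V: Sequence[float], L: Sequence[float], Y: Sequence[int]) -> float:
    """Constraint (4): Dist = sum_k (V_k - L_{k+1}) (Y_k - Y_{k+1}).

    V[k - 1] is the vapor leaving tray k and L[k - 1] the liquid leaving tray k;
    L has K + 1 entries so that L_{K+1} (liquid entering the topmost candidate) exists.
    """
    padded: List[int] = list(Y) + [0]
    return sum((V[k] - L[k + 1]) * (padded[k] - padded[k + 1]) for k in range(len(Y)))
```

With $Y = (1, 1, 1, 0, 0)$ only the third tray has $Y_k - Y_{k+1} = 1$, so the sum in (4) collapses to $V_3 - L_4$; with the hypothetical flows $V_3 = 11$ and $L_4 = 7$ the distillate is 4. The same collapse is what makes (19) pick out the vapor leaving the top tray.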

In [6] this technique is demonstrated with the syn- 
thesis of a reactive distillation column that makes ethy- 
lene glycol from ethylene oxide and water. The main 
reaction is 


$\mathrm{C_2H_4O + H_2O \rightarrow C_2H_6O_2}$.


Further reaction of ethylene glycol gives the unde- 
sired byproduct diethylene glycol: 


$\mathrm{C_2H_4O + C_2H_6O_2 \rightarrow C_4H_{10}O_3}$.


Ethylene glycol (EG) is produced by reactive distillation because the large volatility difference between the product and the reactants allows EG to be removed continuously from the reaction zone, and because absorbing the heat of reaction in the separation reduces costs.

The problem is solved using the reaction, physical property and cost data given in Table 1. The production rate is taken as 25 kmol/h of ethylene glycol. When the problem is solved without specifying the number of feed trays or their locations, the solution obtained using GAMS is a 10-tray distillation column with a total annualized cost of $15.69 × 10^6/yr. The reaction zone is above tray 4, and the feed is distributed to each tray in the reaction zone. When the problem is slightly modified by adding constraints on the number of feed trays, the solution changes to a 10-tray column with a total annualized cost of $15.73 × 10^6/yr. The reaction zone is between trays 4 and 10; water is fed to tray 10 while ethylene oxide enters the column at tray 4. Both columns reach the same selectivity. Fig. 3 shows the solutions, and the column specifications are given in Table 2.


MINLP: Reactive Distillation Column Synthesis, Table 1
Ethylene glycol system; reaction, physical property and cost data

Reaction   Rate (mol/cm³·s)                        ΔH (kJ/mol)
1          3.15 × 10^… exp[…] x_EO · x_H₂O         -80
2          6.3 × 10^… exp[…] x_EO · x_EG           -13.1

Component   K for P = 1 atm
EO          …
H₂O         221.2 exp{6.31 […]}
EG          …
DEG         47 exp{10.42 […]}

EO feedstock: $43.7/kmol
Water feedstock: $21.9/kmol
Downstream separation: $0.15/kmol H₂O in effluent
c_SH = $222/yr
c_T = $15.7/yr
c_R = $146.8/kW·yr
c_C = $24.5/kW·yr
c_0 = $10,000/yr


Heat and Mass Exchange Networks 


In this approach, process units are defined as combinations of heat and mass exchanger blocks, and the alternatives for the synthesis are explored simultaneously in a superstructure. A reactive distillation column can be described as a combination of mass/heat exchanger units with a condenser and a reboiler [10]. Heat and mass transfer takes place between the contacting vapor and liquid phases and from reactants to products. Multiple feeds and products, as well as side heating and cooling tasks, can be included in the description in the form of multiple mass and heat exchanger blocks between liquid and vapor streams. Each stream is defined by its phase and its quality. The quality indicator describes how lean or rich a stream is in the different components. Heat and mass transfer occurs between vapor and liquid streams of the same quality or between liquid and liquid (reactant and product) streams. For example, consider the reaction


$A + B \rightarrow C + D$.


Then there are liquid and vapor streams LABCD 
and VABCD in general notation. The streams lean 
in a component, for example in A, have that letter 
in parentheses, e.g. L(A)BCD or V(A)BCD. All pos- 
sibilities of such streams, i.e. LAB(CD), VAB(CD), 
L(ABC)D, V(ABC)D, L(AB)C(D), V(AB)C(D), etc., 
and all the possible matches between them are con- 
sidered within the structure. The possible matches are 
liquid-vapor matches of the same stream and all liquid- 
liquid matches. 
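The enumeration of stream qualities for the reaction $A + B \rightarrow C + D$ can be sketched in Python. The sketch assumes that a stream may be lean in any proper subset of the components and that each run of lean components is written inside one pair of parentheses, following the labels above; the function names are hypothetical, and none of the engineering screening of matches is applied.

```python
from itertools import combinations
from typing import List, Set, Tuple


def label(components: str, lean: Set[str]) -> str:
    """Write a stream quality, wrapping each run of lean components in parentheses."""
    out: List[str] = []
    run = ""
    for c in components:
        if c in lean:
            run += c
        else:
            if run:
                out.append(f"({run})")
                run = ""
            out.append(c)
    if run:
        out.append(f"({run})")
    return "".join(out)


def stream_qualities(components: str) -> List[str]:
    """Qualities lean in every proper subset of the components (assumption: never lean in all)."""
    n = len(components)
    return [label(components, set(s)) for r in range(n) for s in combinations(components, r)]


def candidate_matches(components: str) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Liquid-vapor matches of the same quality, and all liquid-liquid pairs."""
    q = stream_qualities(components)
    lv = [("L" + s, "V" + s) for s in q]
    ll = [("L" + a, "L" + b) for a, b in combinations(q, 2)]
    return lv, ll
```

For four components this gives 15 qualities (LABCD, L(A)BCD, ..., L(ABC)D), hence 15 liquid-vapor and 105 liquid-liquid candidate matches before screening, which illustrates why the screening step discussed next matters.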

This model describes exchangers with simple mass 
and energy balances and constraints defining phase and 
feasibility. Mass and heat generated or consumed by 
chemical reactions are included in the balances. Mass 
transfer is driven by a minimum concentration ap- 
proach while a minimum temperature approach is the 
driving force for heat transfer. Concentration and tem- 
perature approach constraints are considered at each 
end of the exchanger. Equilibrium can be represented 
by a zero concentration approach, which means no 
driving force for mass transfer. 

In the synthesis framework for an optimal process network, one should start by constructing the stream sets containing all the initial, intermediate and final process streams. The key is the availability of physical and chemical property information on the streams. When the information is not sufficient to identify the individual streams, especially the intermediate streams, a general set of one vapor and one liquid stream is constructed, each containing all components involved in the process. The second step is to list all possible stream matches. Engineering knowledge plays an important role in this step: one should be careful not to list redundant or meaningless stream matches, since these only make the problem more complex, and knowledge about the system is the key in this screening stage. Developing the mass/heat exchange network superstructure is the next step in the framework; all possible interconnections between the stream splitters and mixers should be taken into consideration. The last step is the optimization of the superstructure. Usually, the objective function of the optimization problem is a cost function. If the cost function includes only operating cost, which depends on raw material and utility consumption, the objective function can easily be formulated from the superstructure. If, however, capital investment costs are included in the objective function, the formulation is not straightforward, since process unit specifications are not considered in the superstructure. In this case, capital cost must be approximated using cost functions that take operating conditions into account; for example, separation difficulty can be used in evaluating the capital cost of a distillation tray.


MINLP: Reactive Distillation Column Synthesis, Table 2
Column specifications for ethylene glycol production

Feed type     Diam. (m)   Height (m)   Boilup ratio   Reboiler duty (MW)   Condenser duty (MW)
Distributed   1.3         …            0.958          6.7                  7.31
Two-feed      1.3         …            0.96           6.9                  …


Mass/heat exchange network representation of a multifeed reactive distillation column


Steps of the synthesis framework


K.P. Papalexandri and E.N. Pistikopoulos [10] used the production of ethylene glycol from ethylene oxide and water to demonstrate this approach. The reactions involved in this production were given above, and the physical property, cost and reaction data are the same as in Table 1. The difference from the example problem studied in [6] is the objective, which here is the minimization of operating cost only. The set of streams includes the intermediate streams L{EO, H₂O, EG, DEG} and V{EO, H₂O, EG, DEG} and the product streams L(EG) and L(DEG). Five liquid-liquid and 15 liquid-vapor mass/heat exchange matches are considered. Representing each match with a binary variable and considering all possible interactions between units, the problem is formulated as a mixed integer nonlinear programming problem with the objective of minimizing operating cost, which includes raw material, purification and utility costs. The optimal reactive distillation column obtained is pictured in Fig. 6. The column has two reaction zones and multiple feeds, and the operating cost is 1.17 × 10° $/yr.


Conclusions 


This article discussed MINLP applications in reactive distillation design. Two main approaches were studied: a distillation-based superstructure approach that uses a rigorous tray-by-tray model of reactive distillation, and a heat and mass exchanger network superstructure approach that represents reactive distillation processes as combinations of several mass/heat exchangers with a condenser and a reboiler. Examples were included to demonstrate both approaches.
MINLP: Trim-loss Problem


The trim-loss problem is one of the most demanding
optimization problems in the paper-converting indus- 
try. It appears when an order specified by a customer is 
to be satisfied by cutting out a set of product reels from 
a wider raw paper reel. 

The products in the order are characterized by 
width and quality. In a paper-converting mill the raw 
paper can be printed, coated and cut. In a typical paper- 
converting mill, there may be hundreds of different 
products to be produced. In the trim-loss problem, width is the most important property, and the main task is to determine the cutting patterns that minimize waste production, i.e. the trim loss.

In the optimization problem, the number of cutting patterns needed, the layout of each cutting pattern and the number of times each cutting pattern is repeated all have to be determined simultaneously.

The customer widths and the raw paper widths are often more or less independent of each other. This makes it combinatorially very demanding to produce a cutting plan that minimizes the trim loss. Even though the trim-loss problem is in its basic form an integer problem, it has often been solved by linear programming (LP) methods [3] or by heuristic algorithms [4]. A good survey of widely used solution methods for trim-loss and assortment problems is given in [7].


MINLP: Trim-loss Problem, Figure 1
The cutting procedure

When an LP approach is used to solve an integer problem, the biggest difficulty is to convert the continuous solution into one in which the integer variables take integer values. Rounding methods are heuristic [8] and often fail to give the optimal integer solution, even though the solution obtained may be fairly good.


Problem Formulation 


The trim-loss problem is a bilinear nonconvex integer 
nonlinear programming (INLP) problem. The appear- 
ance of a cutting pattern needs to be determined by in- 
teger variables and the bilinearity comes from the de- 
mand constraints. 

A cutting pattern tells how many times a certain product is cut out from the raw paper. Let a cutting pattern have the index j and a product the index i. Assume a customer demand with I different products, and further assume that the maximum allowed number of different cutting patterns is J. Further, let $m_j$ be the number of times cutting pattern j is repeated and $n_{ij}$ the number of times product i appears in cutting pattern j. If the demand for product i is denoted by $n_{i,\mathrm{order}}$, the demand constraints can be written as

$$ n_{i,\mathrm{order}} - \sum_{j=1}^{J} m_j \cdot n_{ij} \le 0, \quad i = 1, \dots, I, \qquad m_j,\ n_{ij} \in \mathbb{Z}_+ . \qquad (1) $$
The negative bilinear terms make the problem nonconvex. Both variables in each term are integer variables, and consequently the problem is a bilinear integer optimization problem. It is not possible to replace the variables $n_{ij}$ with continuous variables, because this would violate the product specification. In theory it is possible to replace $m_j$ with a continuous variable, but this may easily violate the desired product reel length and diameter requirements. Therefore, in the following study both $m_j$ and $n_{ij}$ are kept as integers.

Since raw paper reels of the same width are often glued together to form a continuous raw paper reel, the problem can be simplified by omitting the raw paper length and assuming that all pattern lengths are equal.

Besides the demand constraint, certain constraints are needed to keep the problem feasible. Let the width of product i be denoted by $b_i$ and the width of the raw paper used for cutting pattern j by $B_{j,\max}$. The trim-loss width cannot exceed, for instance, 200 mm owing to the machinery; this limit is denoted by $\Delta_j$. Furthermore, the maximum number of products that can be cut out from a pattern often has a physical restriction: the outgoing product reels have to form an angle large enough that the reels do not stick together, yet with too large an angle between the outermost reels the paper may be torn off. Let this upper limit be $N_{j,\max}$.

Besides the total number of patterns, the number of pattern changes is also of interest in the optimization, because the machinery normally needs to be stopped for a knife change, which causes a production stop. Therefore, let the binary variable $y_j$ be 1 if cutting pattern j exists and 0 otherwise. The sum of the $y_j$ variables then indicates how many different cutting patterns are needed to satisfy the production, and the sum of the $m_j$ indicates the total number of patterns, which is related to the running metres of the raw material.

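Now the basic formulation can be written in mathematical form. The objective is to minimize the total number of patterns and the number of pattern changes:

$$
\begin{aligned}
\min\ & \sum_{j=1}^{J} \left( C_1 \cdot m_j + C_2 \cdot y_j \right) && (2)\\
\text{subject to}\ & \sum_{i=1}^{I} b_i \cdot n_{ij} - B_{j,\max} \le 0, && (3)\\
& -\sum_{i=1}^{I} b_i \cdot n_{ij} + B_{j,\max} - \Delta_j \le 0, && (4)\\
& \sum_{i=1}^{I} n_{ij} - N_{j,\max} \le 0, && (5)\\
& y_j - m_j \le 0, && (6)\\
& m_j - M_j \cdot y_j \le 0, \quad j = 1, \dots, J, && (7)\\
& n_{i,\mathrm{order}} - \sum_{j=1}^{J} m_j \cdot n_{ij} \le 0, \quad i = 1, \dots, I, && (8)\\
& m_j,\ n_{ij} \in \mathbb{Z}_+, \quad y_j \in \{0, 1\}.
\end{aligned}
$$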


The parameter $M_j$ gives the upper bound for the corresponding $m_j$ variable. When an objective as in (2) is used, constraint (6) becomes irrelevant. The width constraints are given in (3)-(4), and constraint (5) restricts the number of cuts in a pattern. The binary variables $y_j$ are defined in (6)-(7).

The functionality of the variables is demonstrated in the following figure, where the raw-paper width is $B_{j,\max}$. Note that the pattern length may typically be, e.g., 6500 m.

The last constraint, the demand constraint (8), is an 
integer bilinear constraint where both variables in bi- 
linear terms are pure integers. This makes the problem 
a nonconvex MINLP problem where the nonconvexity 
appears in the integer variables. 
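As an illustration of how the pieces of (2)-(8) fit together, the following Python sketch evaluates a given cutting plan: it reports which of the constraints (3)-(8) are violated and computes the objective (2). The function name is hypothetical, the instance used to exercise it is made up (it is not taken from the article), and the weights $C_1$ and $C_2$ are illustrative choices.

```python
from typing import List, Sequence, Tuple


def evaluate_plan(
    b: Sequence[int],            # b_i, product widths
    n_order: Sequence[int],      # n_{i,order}
    m: Sequence[int],            # m_j, pattern multiples
    n: Sequence[Sequence[int]],  # n[j][i] = n_{ij}, reels of product i in pattern j
    B_max: Sequence[int],        # B_{j,max}
    delta: Sequence[int],        # Delta_j
    N_max: Sequence[int],        # N_{j,max}
    M: Sequence[int],            # M_j, upper bounds on m_j
    C1: float,
    C2: float,
) -> Tuple[float, List[str]]:
    """Return the objective (2) and a list of violated constraints among (3)-(8)."""
    I, J = len(b), len(m)
    # Smallest y_j compatible with (6)-(7): y_j = 1 exactly when pattern j is used.
    y = [1 if m[j] > 0 else 0 for j in range(J)]
    violations: List[str] = []
    for j in range(J):
        width = sum(b[i] * n[j][i] for i in range(I))
        if width > B_max[j]:                                   # (3)
            violations.append(f"(3) pattern {j + 1}: width {width} > {B_max[j]}")
        if B_max[j] - width > delta[j]:                        # (4)
            violations.append(f"(4) pattern {j + 1}: trim loss {B_max[j] - width}")
        if sum(n[j]) > N_max[j]:                               # (5)
            violations.append(f"(5) pattern {j + 1}: {sum(n[j])} reels")
        if m[j] > M[j] * y[j]:                                 # (7)
            violations.append(f"(7) pattern {j + 1}: m_j = {m[j]} > M_j")
    for i in range(I):
        produced = sum(m[j] * n[j][i] for j in range(J))
        if produced < n_order[i]:                              # (8)
            violations.append(f"(8) product {i + 1}: {produced} < {n_order[i]}")
    objective = sum(C1 * m[j] + C2 * y[j] for j in range(J))  # (2)
    return objective, violations
```

For a two-product instance with $b = (500, 700)$ mm, $B_{j,\max} = 2000$ mm, $\Delta_j = 200$ mm, patterns $(4, 0)$ and $(1, 2)$ and multiples $m = (1, 2)$, no constraint is violated and the objective with $C_1 = 1$, $C_2 = 0.5$ is 4.0; lowering $m_2$ to 1 leaves both demand constraints (8) unsatisfied.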

There are very few methods available that are capable of solving similar nonconvex MINLP problems. Some heuristic methods, such as simulated annealing [9], can find the global optimal solution only in the limit of infinite time, and algorithmic methods have not been proven to converge for this type of problem. Only recently (1999) have some advances been reported in [1] and [11].

However, it is fully possible to transform the trim- 
loss problem into convex or linear form and use some 
established MINLP or MILP solver to solve the result- 
ing problem to global optimality. Some linear trans- 
formations are presented in [6] and methods to trans- 
form the nonconvex problem into a convex form can be 
found in [10] and [5]. 


Linear Transformations 


As can be seen from (2)-(8), all constraints but the last 
demand constraint are linear. This means that the prob- 
lem should be fairly well bounded already by the linear 
part of the problem and thus a linear formulation strat- 
egy seems to be fully possible. 

However, this linear transformation requires new variables and constraints that may complicate the problem. A standard approach is to rewrite one of the integer variables in the bilinear term using binary variables:


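$$ m_j = \sum_{k=1}^{K} 2^{k-1} \cdot \beta_{jk}, \qquad m_j \in \mathbb{R},\ \beta_{jk} \in \{0, 1\}. \qquad (9) $$

K is the number of binary variables needed. By defining $L_{ij}$ to be the upper bound for the respective $n_{ij}$ variables and introducing new slack variables $s_{ijk}$, the following constraints create the necessary link between the $n_{ij}$ and $s_{ijk}$ variables:

$$
\begin{aligned}
& s_{ijk} - n_{ij} \le 0, && (10)\\
& -s_{ijk} + n_{ij} - L_{ij} \cdot (1 - \beta_{jk}) \le 0, && (11)\\
& s_{ijk} - L_{ij} \cdot \beta_{jk} \le 0. && (12)
\end{aligned}
$$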

Using the above constraints, the bilinear demand constraint can be written in linear form:

$$ n_{i,\mathrm{order}} - \sum_{j=1}^{J} \sum_{k=1}^{K} 2^{k-1} \cdot s_{ijk} \le 0. \qquad (13) $$
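That (10)-(12) pin $s_{ijk}$ to the product $n_{ij} \cdot \beta_{jk}$, so that (13) reproduces $m_j \cdot n_{ij}$ exactly, can be checked exhaustively on a small range. The Python sketch below assumes, as is usual for such slack variables, that $s_{ijk} \ge 0$ and that $0 \le n_{ij} \le L_{ij}$; the helper names are hypothetical.

```python
from typing import List, Tuple


def s_interval(n: int, beta: int, L: int) -> Tuple[int, int]:
    """Feasible range of s_ijk under (10)-(12), assuming s_ijk >= 0."""
    lower = max(0, n - L * (1 - beta))   # from (11) and s_ijk >= 0
    upper = min(n, L * beta)             # from (10) and (12)
    return lower, upper


def binary_digits(m: int, K: int) -> List[int]:
    """beta_jk for k = 1..K, so that m = sum_k 2^(k-1) beta_jk as in (9)."""
    return [(m >> (k - 1)) & 1 for k in range(1, K + 1)]


def linearized_product(m: int, n: int, K: int, L: int) -> int:
    """The sum over k of 2^(k-1) s_ijk in (13) when each s_ijk is fixed by (10)-(12)."""
    total = 0
    for k, beta in enumerate(binary_digits(m, K), start=1):
        lo, hi = s_interval(n, beta, L)
        assert lo == hi, "(10)-(12) should leave exactly one feasible value"
        total += 2 ** (k - 1) * lo
    return total


# Exhaustive check on a small range: K = 4 binaries (m in 0..15), n in 0..L.
K, L = 4, 5
assert all(linearized_product(m, n, K, L) == m * n
           for m in range(2 ** K) for n in range(L + 1))
```

For example, $m_j = 13$ is encoded with $K = 4$ binaries as $\beta = (1, 0, 1, 1)$, and with $n_{ij} = 3$ the linearized sum in (13) evaluates to 39.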


The $m_j$ could also be represented by special ordered sets (SOS), where at most one of the binary variables is allowed to be nonzero:

$$ m_j = \sum_{k=1}^{K} k \cdot \beta_{jk}, \qquad \sum_{k=1}^{K} \beta_{jk} \le 1. $$


It should be noted that this kind of transformation may enlarge the integrality gap unless, for instance, the $n_{ij}$ variables in (3)-(5) are replaced with the corresponding variables $s_{ijk}$.

The same transformation can be modified such that $n_{ij}$ is replaced by a binary representation and $m_j$ is defined through the slack variables $s_{ijk}$.


(14) 
The integer variables


Parameterization Methods 


Beyond the linear transformation, the problem can be written in linear form by simply parameterizing one of the variables in the bilinear term. This method, however, guarantees global optimality only if all possible combinations have been considered. The strategy may be good for smaller problems, but it may also generate far too many integer variables when solving larger trim-loss problems.

It is quite easy to generate all possible combinations of $n_{ij}$ variables satisfying constraints (3)-(5). This strategy results in a problem where these constraints can be removed and where the $n_{ij}$ in the resulting linear demand constraint are fixed parameters $n'_{ij}$:

$$ n_{i,\mathrm{order}} - \sum_{j} m_j \cdot n'_{ij} \le 0, \qquad m_j \in \mathbb{Z}_+ . \qquad (15) $$
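Generating all combinations that satisfy (3)-(5) is a simple enumeration. The Python sketch below is a brute-force illustration, not the procedure used in the article; it applies the product widths, the 2200 mm raw paper width, the 100 mm trim-loss limit and the five-reel limit of the numerical example given later in this article, and the function name is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def feasible_patterns(b: Sequence[int], B_max: int, delta: int, N_max: int) -> List[Tuple[int, ...]]:
    """All vectors n = (n_1, ..., n_I) satisfying (3)-(5) for one raw-paper width."""
    patterns = []
    # No n_i can exceed N_max, so each coordinate ranges over 0..N_max.
    for n in product(range(N_max + 1), repeat=len(b)):
        if sum(n) > N_max:                      # (5)
            continue
        width = sum(bi * ni for bi, ni in zip(b, n))
        if B_max - delta <= width <= B_max:     # (4) and (3)
            patterns.append(n)
    return patterns


b = [330, 360, 380, 430, 490, 530]   # product widths b_i (mm) of the example order
pats = feasible_patterns(b, B_max=2200, delta=100, N_max=5)
```

Each returned tuple is one candidate pattern, that is, one column of fixed values $n'_{ij}$ in (15); both patterns of the optimal solution reported for the example, $(1, 2, 0, 0, 0, 2)$ and $(0, 0, 2, 1, 2, 0)$, are among them.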
The same type of parameterization strategy may also be applied to the other variable $m_j$, but in this case it may be more difficult to define the exact values of the parameters. One strategy is to use the upper bounds $M_j$, or to define all the $m_j$ variables to be equal to one and make sure that a sufficient number of $m_j$ variables is considered.

Another alternative is to combine the parameterization and transformation methods so that a suitable number of parameterized variables is combined with original variables. This strategy may be very efficient, but it often requires information that is difficult to obtain for a larger problem without any knowledge of the solution.


Convex Transformations 


In the previous sections a number of methods were presented by which the nonconvex problem can be transformed or parameterized into linear form. The main drawback of the linear transformation strategy is the large number of extra constraints and continuous variables. The parameterization strategy results in a formulation with few constraints but many extra integer variables.

In the following, a number of convexification methods are presented. Generally, the convex formulations need fewer extra constraints and continuous variables than the linear strategies and, unlike the parameterization methods, no extra integer variables. Thus, the convex transformations could be expected to result in formulations that are easier to solve, especially for larger-scale orders. This creates an interesting problem, where the integer search space is reduced at the expense of more complex nonlinear functions, which could, in principle, be used as benchmarks for the performance of MINLP algorithms.

The basic principle of the convex transformation is first to expand the bilinear term in the demand constraint:

$$ m_j \cdot n_{ij} = (m_j + t)(n_{ij} + t) - t \cdot (m_j + n_{ij}) - t^2. \qquad (16) $$

In the following text, the translation constant t = 1 is used for simplicity. The second step is to substitute the expanded bilinear term into the original demand constraint:

$$ n_{i,\mathrm{order}} - \sum_{j=1}^{J} (m_j + 1)(n_{ij} + 1) + \sum_{j=1}^{J} (m_j + n_{ij}) + J \le 0. \qquad (17) $$


It should be noted that the transformations that follow need to consider the whole problem, not only individual functions, which makes the transformation techniques more demanding. A transformation of a single function may cause linear constraints to become nonlinear if one is unaware of this fact.


Exponential Transformation 


The demand constraint is originally a negative bilinear constraint, whereas the exponential transformation can only be applied to a positive bilinear constraint. Therefore, one of the variables in the bilinear term needs to be substituted with its reversed value:


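$$ r_{ij} = N_{j,\max} - n_{ij}, \qquad (18) $$

and the demand constraint is modified to

$$ n_{i,\mathrm{order}} - \sum_{j=1}^{J} m_j \cdot N_{j,\max} + \sum_{j=1}^{J} m_j \cdot r_{ij} \le 0. \qquad (19) $$

Now the exponential transformation can be applied. The transformation is of the form

$$ m_j + 1 = e^{M_j}, \qquad r_{ij} + 1 = e^{R_{ij}}, \qquad (20) $$

and the variables are defined as

$$
\begin{aligned}
& m_j = \sum_{l=1}^{L_j} \beta_{jl} \cdot l, && (21)\\
& M_j = \sum_{l=1}^{L_j} \beta_{jl} \cdot \ln(l + 1), && (22)\\
& r_{ij} = \sum_{k=1}^{K_j} \beta_{ijk} \cdot k, && (23)\\
& R_{ij} = \sum_{k=1}^{K_j} \beta_{ijk} \cdot \ln(k + 1), && (24)\\
& \sum_{l=1}^{L_j} \beta_{jl} \le 1, \qquad \sum_{k=1}^{K_j} \beta_{ijk} \le 1, && (25)\\
& \beta_{jl},\ \beta_{ijk} \in \{0, 1\}, \qquad M_j,\ R_{ij} \in \mathbb{R}.
\end{aligned}
$$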


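When these definitions are combined, the demand constraint can be written in convex form:

$$ n_{i,\mathrm{order}} - J + \sum_{j=1}^{J} e^{M_j + R_{ij}} - \sum_{j=1}^{J} \Big[ (N_{j,\max} + 1) \cdot \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_j} \beta_{ijk} \cdot k \Big] \le 0. \qquad (26) $$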


This transformation can also be achieved in a slightly different way, but that strategy also requires updating some of the constraints in (3)-(7).
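The exponential transformation is exact at the integer points selected by the special ordered sets (25): there $e^{M_j + R_{ij}} = (m_j + 1)(r_{ij} + 1)$, and (26) takes the same value as the original demand constraint $n_{i,\mathrm{order}} - \sum_j m_j n_{ij}$. A short Python check of this for one product i follows; the helper names are hypothetical, and $L_j = K_j = N_{j,\max}$ is an assumption chosen so that every $m_j$ and $r_{ij}$ in the test range can be encoded.

```python
import math
from typing import List, Sequence, Tuple


def sos(value: int, size: int) -> List[int]:
    """One-hot SOS encoding with sum(beta) <= 1; all zeros encodes the value 0."""
    assert 0 <= value <= size
    return [1 if l == value else 0 for l in range(1, size + 1)]


def exp_vars(m: int, r: int, L: int, K: int) -> Tuple[float, float]:
    """M_j and R_ij from (22) and (24) for given m_j and r_ij."""
    M = sum(beta * math.log(l + 1) for l, beta in enumerate(sos(m, L), start=1))
    R = sum(beta * math.log(k + 1) for k, beta in enumerate(sos(r, K), start=1))
    return M, R


def demand_lhs_26(n_order: int, m: Sequence[int], n: Sequence[int],
                  N_max: int, L: int, K: int) -> float:
    """Left-hand side of (26) for one product i, with r_ij = N_max - n_ij from (18)."""
    total = float(n_order - len(m))
    for mj, nij in zip(m, n):
        r = N_max - nij
        M, R = exp_vars(mj, r, L, K)
        # sum_l beta_jl * l equals m_j and sum_k beta_ijk * k equals r_ij.
        total += math.exp(M + R) - ((N_max + 1) * mj + r)
    return total
```

Because $e^{M_j + R_{ij}}$ is the exponential of a linear function of the binaries, the transformed constraint is convex, while the remaining terms are linear.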


Square-Root Transformation 


This transformation is almost equivalent to the previous one. A main difference is that it can be applied directly to the negative bilinear constraint, and thus no $r_{ij}$ variables need to be defined. Constraint (21) is valid, but constraint (23) needs to be modified to

$$ n_{ij} = \sum_{k=1}^{K_j} \beta_{ijk} \cdot k. \qquad (27) $$

Note that the constraints in (25) are valid. The transformation is of the form

$$ m_j + 1 = \sqrt{M_j}, \qquad n_{ij} + 1 = \sqrt{N_{ij}}. \qquad (28) $$

The transformation variables $M_j$ and $N_{ij}$ are defined as

$$
\begin{aligned}
& M_j = 1 + \sum_{l=1}^{L_j} \beta_{jl} \cdot l (l + 2), && (29)\\
& N_{ij} = 1 + \sum_{k=1}^{K_j} \beta_{ijk} \cdot k (k + 2), && (30)\\
& \beta_{jl},\ \beta_{ijk} \in \{0, 1\},
\end{aligned}
$$


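and the resulting convex demand constraint is

$$ n_{i,\mathrm{order}} + J - \sum_{j=1}^{J} \sqrt{M_j \cdot N_{ij}} + \sum_{j=1}^{J} \Big[ \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_j} \beta_{ijk} \cdot k \Big] \le 0, \qquad (31) $$

with $M_j,\ N_{ij} \in \mathbb{R}$.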
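A short Python check confirms that $\sqrt{M_j N_{ij}} = (m_j + 1)(n_{ij} + 1)$ under the SOS encodings (21) and (27) with definitions (29)-(30), so that (31) agrees with the original demand constraint at every encoded point. The helper names are hypothetical, and the SOS sizes $L$ and $K$ are chosen just large enough for the test range.

```python
import math
from typing import Sequence, Tuple


def sqrt_vars(m: int, n: int, L: int, K: int) -> Tuple[int, int]:
    """M_j and N_ij from (29)-(30), using the SOS encodings (21) and (27)."""
    assert 0 <= m <= L and 0 <= n <= K
    beta_m = [1 if l == m else 0 for l in range(1, L + 1)]
    beta_n = [1 if k == n else 0 for k in range(1, K + 1)]
    M = 1 + sum(b * l * (l + 2) for l, b in enumerate(beta_m, start=1))
    N = 1 + sum(b * k * (k + 2) for k, b in enumerate(beta_n, start=1))
    return M, N


def demand_lhs_31(n_order: int, m: Sequence[int], n: Sequence[int], L: int, K: int) -> float:
    """Left-hand side of (31) for one product i."""
    total = float(n_order + len(m))
    for mj, nij in zip(m, n):
        M, N = sqrt_vars(mj, nij, L, K)
        # The bracketed sums in (31) reduce to m_j + n_ij under the SOS encoding.
        total += -math.sqrt(M * N) + (mj + nij)
    return total
```

Since $l(l + 2) = (l + 1)^2 - 1$, the definitions (29)-(30) simply make $M_j$ and $N_{ij}$ the squares of $m_j + 1$ and $n_{ij} + 1$.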
I=1 k=1 
Logarithmic and Square-Root Transformation


The square-root and the logarithmic functions can be combined, resulting in a third convex transformation. It is directly applicable to the negative bilinear function, and the transformation can be written as

$$ m_j + 1 = \sqrt{M_j}, \qquad n_{ij} + 1 = \ln N_{ij}. \qquad (32) $$

The $m_j$, $n_{ij}$ and $M_j$ variables are defined as in the square-root transformation, and $N_{ij}$ is defined as

$$ N_{ij} = e + \sum_{k=1}^{K_j} \beta_{ijk} \cdot \left( e^{k+1} - e \right), \qquad (33) $$

and the following convex demand constraint is obtained:

$$ n_{i,\mathrm{order}} + J - \sum_{j=1}^{J} \sqrt{M_j} \cdot \ln N_{ij} + \sum_{j=1}^{J} \Big[ \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_j} \beta_{ijk} \cdot k \Big] \le 0. \qquad (34) $$


It can be seen from (34) that the only difference from the former transformation is the third term of the demand constraint.
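One way to see why (33) starts at $e$ rather than at 1: for $f(M, N) = \sqrt{M} \ln N$ the Hessian determinant simplifies to $(\ln N - 1)/(4 M N^2)$, which is nonnegative exactly when $N \ge e$. Together with $\partial^2 f / \partial M^2 < 0$ this makes $f$ concave there, so $-\sqrt{M_j} \ln N_{ij}$ in (34) is convex. The Python sketch below (hypothetical helper names) checks this formula numerically and verifies that $\ln N_{ij} = n_{ij} + 1$ under the encoding (33).

```python
import math


def hessian_det(M: float, N: float) -> float:
    """Determinant of the Hessian of f(M, N) = sqrt(M) * ln(N), from analytic second derivatives."""
    f_MM = -0.25 * M ** -1.5 * math.log(N)
    f_NN = -math.sqrt(M) / N ** 2
    f_MN = 0.5 / (math.sqrt(M) * N)
    return f_MM * f_NN - f_MN ** 2      # equals (ln N - 1) / (4 M N^2)


def n_var(n: int, K: int) -> float:
    """N_ij from (33) for the SOS encoding of n_ij (all binaries zero when n_ij = 0)."""
    assert 0 <= n <= K
    return math.e + sum(math.exp(k + 1) - math.e for k in range(1, K + 1) if k == n)
```

At $N = 2 < e$ the determinant is negative (the function is not concave there), while at $N = e^2$ it is positive; this is consistent with the lower value $N_{ij} = e$ obtained in (33) when $n_{ij} = 0$.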


Inverted Transformation 


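The following transformation can be applied to a positive bilinear constraint. Thus the same definition of $r_{ij}$ as for the exponential transformation has to be made. The transformation has the form

$$ m_j + 1 = \frac{1}{M_j}, \qquad r_{ij} + 1 = \frac{1}{R_{ij}}. \qquad (35) $$

The definitions of the transformation variables follow:

$$
\begin{aligned}
& M_j = 1 + \sum_{l=1}^{L_j} \beta_{jl} \cdot \Big( \frac{1}{l + 1} - 1 \Big), && (36)\\
& R_{ij} = 1 + \sum_{k=1}^{K_j} \beta_{ijk} \cdot \Big( \frac{1}{k + 1} - 1 \Big). && (37)
\end{aligned}
$$

The demand constraint is obtained in exactly the same way as before:

$$ n_{i,\mathrm{order}} - J + \sum_{j=1}^{J} \frac{1}{M_j \cdot R_{ij}} - \sum_{j=1}^{J} \Big[ (N_{j,\max} + 1) \cdot \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_j} \beta_{ijk} \cdot k \Big] \le 0. \qquad (38) $$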
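For the inverted transformation, $1/(M_j R_{ij}) = (m_j + 1)(r_{ij} + 1)$ at every point selected by the SOS constraints, and $1/(MR)$ is convex for $M, R > 0$: its Hessian determinant is $3/(M^4 R^4) > 0$ and its diagonal entries are positive. A Python sketch checking both facts with definitions (36)-(37) follows; the helper names are hypothetical, and $L_j = K_j = N_{j,\max}$ is assumed for the test range.

```python
import math
from typing import Sequence


def inv_var(v: int, size: int) -> float:
    """M_j from (36) (or R_ij from (37)); equals 1 / (v + 1) under the SOS encoding."""
    assert 0 <= v <= size
    return 1 + sum(1 / (l + 1) - 1 for l in range(1, size + 1) if l == v)


def demand_lhs_38(n_order: int, m: Sequence[int], n: Sequence[int],
                  N_max: int, L: int, K: int) -> float:
    """Left-hand side of (38) for one product i, with r_ij = N_max - n_ij."""
    total = float(n_order - len(m))
    for mj, nij in zip(m, n):
        r = N_max - nij
        total += 1 / (inv_var(mj, L) * inv_var(r, K)) - ((N_max + 1) * mj + r)
    return total


def hessian_det_inv(M: float, R: float) -> float:
    """Determinant of the Hessian of 1 / (M R)."""
    f_MM, f_RR, f_MR = 2 / (M ** 3 * R), 2 / (M * R ** 3), 1 / (M ** 2 * R ** 2)
    return f_MM * f_RR - f_MR ** 2      # equals 3 / (M^4 R^4)
```

Expanding $(m_j + 1)(r_{ij} + 1)$ and substituting $r_{ij} = N_{j,\max} - n_{ij}$ shows why the bracket in (38) carries the factor $N_{j,\max} + 1$: the left-hand side collapses back to $n_{i,\mathrm{order}} - \sum_j m_j n_{ij}$.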


Modified Square-Root Transformation 


As the last transformation, a modification of the previously presented square-root transformation is introduced. In cases where the variable $m_j$ may take large values, it may be more efficient to use another type of binary representation:

$$ m_j = \sum_{l=1}^{L'_j} 2^{l-1} \cdot \beta_{jl}, \qquad (39) $$

where $L'_j = \lfloor \log_2(m_{j,\max}) \rfloor + 1$ if $m_{j,\max}$ is the upper bound for the respective $m_j$ variable. This modification reduces the required number of binary variables, but the transformation variable $M_j$ needs to be redefined. The definition also requires additional slack variables and constraints. In the following, the square-root transformation is used:

$$
\begin{aligned}
& M_j = 1 + \sum_{l=1}^{L'_j} \left( 2^{l} + 4^{l-1} \right) \cdot \beta_{jl} + \sum_{\substack{l, m = 1 \\ m < l}}^{L'_j} 2^{l+m-1} \cdot s_{jlm}, && (40)\\
& -s_{jlm} - 1 + \beta_{jl} + \beta_{jm} \le 0, && (41)\\
& 2 s_{jlm} - \beta_{jl} - \beta_{jm} \le 0, \qquad l, m = 1, \dots, L'_j,\ m < l. && (42)
\end{aligned}
$$


By adding the extra constraints and defining $N_{ij}$ as in the square-root strategy, the demand constraint can be written in convex form as follows:

$$ n_{i,\mathrm{order}} + J - \sum_{j=1}^{J} \sqrt{M_j \cdot N_{ij}} + \sum_{j=1}^{J} \Big[ \sum_{l=1}^{L'_j} \beta_{jl} \cdot 2^{l-1} + \sum_{k=1}^{K_j} \beta_{ijk} \cdot k \Big] \le 0. \qquad (43) $$
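The redefined $M_j$ in (40) is simply $(m_j + 1)^2$ expanded over the binary digits of $m_j$, with $s_{jlm}$ standing for the product $\beta_{jl} \beta_{jm}$. The Python sketch below checks this expansion and also checks that (41)-(42) admit only $s_{jlm} = \beta_{jl} \beta_{jm}$; for the latter it assumes that $s_{jlm}$ is restricted to the values 0 and 1, which the text does not state explicitly. Helper names are hypothetical, and the second pair index is called `k` in the code to avoid a clash with $m_j$.

```python
from itertools import product
from typing import List


def bits(m: int, Lp: int) -> List[int]:
    """beta_jl for l = 1..L'_j, with m_j = sum 2^(l-1) beta_jl as in (39)."""
    return [(m >> (l - 1)) & 1 for l in range(1, Lp + 1)]


def M_var(m: int, Lp: int) -> int:
    """M_j from (40), with each s_jlk taken as beta_jl * beta_jk (pairs k < l)."""
    beta = bits(m, Lp)
    linear = sum((2 ** l + 4 ** (l - 1)) * beta[l - 1] for l in range(1, Lp + 1))
    cross = sum(2 ** (l + k - 1) * beta[l - 1] * beta[k - 1]
                for l in range(1, Lp + 1) for k in range(1, l))
    return 1 + linear + cross


def feasible_s(bl: int, bk: int) -> List[int]:
    """0-1 values of s_jlk allowed by (41)-(42) (assumes s_jlk is a 0-1 variable)."""
    return [s for s in (0, 1) if -s - 1 + bl + bk <= 0 and 2 * s - bl - bk <= 0]
```

Note also that $L'_j = \lfloor \log_2 m_{j,\max} \rfloor + 1$ is the bit length of $m_{j,\max}$: for $m_{j,\max} = 14$ four binaries suffice instead of the 14 needed by the SOS encoding (21).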


Five methods for transforming the originally nonconvex trim-loss problem into convex form have been discussed. Three of them are directly applicable to a negative bilinear function, while for the other two the demand constraint first has to be changed into a positive bilinear constraint.


Example: A Numerical Problem 


In this last section a numerical example is solved with 
all of the presented methods. To improve the perfor- 
mance of the solution procedure some extra linear con- 
straints need to be defined. They are, however, not spec- 
ified here. 

In the following example order, an upper limit $n_{i,\max}$ on the number of each product that may be produced has also been defined. Here, the maximal possible overproduction of any product is 2. This limit is somewhat unnatural and is therefore not used as a constraint. However, upper bounds of this type make it possible to reduce the combinatorial space efficiently.


$i$   $b_i$ (mm)   $n_{i,\mathrm{order}}$   $n_{i,\max}$
1     330          8                        10
2     360          16                       18
3     380          12                       14
4     430          7                        9
5     490          14                       16
6     530          16                       18


Example order


The example demand is a mid-size customer order with a total weight of 27.5 tons. Some important parameters need to be defined before optimization. A raw paper width of 2200 mm is chosen, and a maximal trim loss of 100 mm is tolerated. At most 5 products may be cut out from a cutting pattern. Among the following parameters, $M_j$ refers to the upper bound of the respective $m_j$ variable and $N_i$ to the upper bound of the $n_{ij}$ variables. Note that, since the raw paper width is the same for every pattern, the latter upper bound is independent of the index j.


I = 6, J = 6, $N_{j,\max}$ = 5
$M_j$ = {14, 12, 8, 7, 4, 2}
$B_{j,\max}$ = 2200 mm
$\Delta_j$ = 100 mm


The problem parameters


The parameter $M_{\min}$ is the lower bound for the sum of the variables $m_j$. This bound can easily be calculated in advance and significantly enhances the optimization performance.
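The article does not state how $M_{\min}$ was obtained. One plausible way, sketched below in Python as an assumption, uses two simple bounds: each repetition of a pattern delivers at most $B_{j,\max}$ mm of product width and at most $N_{j,\max}$ reels, so $\sum_j m_j$ must cover both the total ordered width and the total number of ordered reels.

```python
import math

b = [330, 360, 380, 430, 490, 530]   # b_i (mm), example order
n_order = [8, 16, 12, 7, 14, 16]     # n_{i,order}
B_max, N_max = 2200, 5               # raw paper width (mm), max reels per pattern

# Each pattern repetition yields at most B_max mm of product width ...
width_bound = math.ceil(sum(bi * ni for bi, ni in zip(b, n_order)) / B_max)
# ... and at most N_max product reels.
count_bound = math.ceil(sum(n_order) / N_max)
M_min = max(width_bound, count_bound)
```

For the example order both bounds evaluate to 15 (31 310 mm of product width against 2200 mm per pattern, and 73 reels against five per pattern), and the reported optimum with $m_1 + m_2 = 8 + 7$ attains this bound.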

Before the optimization results are presented, it should be pointed out that they are not comparable. The main purpose of showing the numerical results is to demonstrate that the strategies presented above are fully usable and result in formulations that can be solved quite efficiently. The transformation strategies can be applied directly to any problem where the bilinear terms contain integer variables.

The methods are divided into three groups, of which the linear transformation and the parameterization strategies result in MILP formulations. The third group, the convex transformation strategies, produces MINLP formulations, which in this case have been solved using the extended cutting plane (ECP) algorithm by T. Westerlund and F. Pettersson [12].

In the parameterization strategies the problem is redefined by parameterizing certain variables, which means that the resulting problem has already been partly solved. This is, however, not always a benefit, especially in problems where a huge number of parameters increases the integer search space for other variables.
The strategies are numbered as follows:

1-2. linear transformation strategies
3-4. parameterization strategies
5. exponential transformation
6. square-root transformation
7. logarithmic and square-root transformation
8. inverted transformation
9. modified square-root transformation

The strategies enlarge the problem both in terms of variables and constraints. The numbers of variables and constraints are given below. All the constraints are linear except in the convex transformation strategies, where six of the constraints are nonlinear. Strategies 1-4 are linear formulations, of which 3-4 use the parameterization strategy to overcome the bilinearity. Strategies 5-9 are convex transformations. The column with combinations gives the number of unconstrained discrete variable combinations as a function of the number of binary variables. This is more informative than the number of variables alone.


Strategy   Constraints   Variables (I/B/C)   Comb.
1          408           36/23/120           …
2          366           6/88/144            …
3          59            51/51/-             …
4          201           282/47/-            …
5          199           -/169/84            …
6          199           -/169/84            …
7          185           -/169/84            …
8          185           -/169/84            …
9          225           -/208/84            …


The MILP problems 1-4 were solved with CPLEX 5.0 using default settings, and the MINLP problems 5-9 were solved by 'mittlp', an ECP application written by H. Skrifvars. The optimization was done on a Pentium Pro 200 MHz running the Linux operating system.

The optimization results can be seen in the following table.

Strategy   Nodes (MILP)   ECP iter. (MINLP)   CPU time (s)
1          265            -                   7.6
2          51             -                   0.51
3          2174           -                   …
4          265            -                   …
5          -              4                   8.6
6          -              7                   66.6
7          -              9                   138.6
8          -              10                  736.4
9          -              6                   49.9


The optimal result has two cutting patterns with the widths $B_1 = 2110$ mm and $B_2 = 2170$ mm and multiples $m_1 = 8$, $m_2 = 7$. The layouts of the patterns are given by the following variables: $n_{1,1} = 1$, $n_{2,1} = 2$, $n_{6,1} = 2$, $n_{3,2} = 2$, $n_{4,2} = 1$, $n_{5,2} = 2$.


Conclusions 


The study above is not a fair comparison. Experience has shown that the performance order is highly dependent on the specific problem. To determine which of the methods is, on average, the most efficient, tens of problems of different sizes would need to be solved. However, the study illustrates that it is fully possible to apply the transformation methods to a well-explored real industrial problem.

In the present study the trim-loss problem was used 
as an example case but the transformation methods are 
general and can be applied to any problem with similar 
type of bilinear constraints. 
Notation

$i$ | product index
$j$ | cutting pattern index
(symbol not legible) | number of products in the order
(symbol not legible) | number of possible cutting patterns
$m_j$ | number of times pattern $j$ is used
$n_{i,j}$ | number of product $i$ in pattern $j$
$r_{i,j}$ | reversed value of $n_{i,j}$
$N_{i,\mathrm{order}}$ | number of product $i$ ordered
$b_i$ | width of product $i$
$B_{j,\max}$ | width of raw paper of pattern $j$
$\Delta_j$ | max. trim-loss width
$N_{j,\max}$ | max. number of products in pattern $j$
$y_j$ | binary variable that is one if $m_j > 0$
(symbol not legible) | cost coefficients
$M_j$ | upper bound / transformation variable
$\beta_{j,1}, \beta_{j,k}$ | binary variables for defining $m_j$
$L_i$ | upper bound
$S_{i,k}$ | slack variable for linear transformations
$\beta_{i,k}$ | binary variables for defining $n_{i,j}$ or $r_{i,j}$
(symbol not legible) | fixed $n_{i,j}$ values
(symbol not legible) | translation constant
(symbol not legible) | transformation variable
(symbol not legible) | transformation variable
(symbol not legible) | indices of binary variables
$L_j, K_j$ | number of binary variables needed
Relaxation

> Integer Linear Complementary Problem 

> Integer Programming 
> Integer Programming Duality

> Integer Programming: Lagrangian Relaxation 

> LCP: Pardalos—Rosen Mixed Integer Formulation 

> Mixed Integer Classification Problems 

> Multi-objective Integer Linear Programming 

> Multi-objective Mixed Integer Programming 

> Multiparametric Mixed Integer Linear 
Programming 
> Set Covering, Packing and Partitioning Problems

> Simplicial Pivoting Algorithms for Integer 
Programming 
> Stochastic Integer Programs
Introduction/Background


In the past two decades, biologists have sequenced an increasing number of complete genomes for various species. To reveal the secrets of life hidden in enormous genome data, the mechanisms that govern gene expression are continuously researched and discussed. Gene transcription, a primary gateway to gene function, is controlled by a complex regulatory mechanism. In this mechanism, many specific regulatory proteins bind to local regions upstream of a gene, called transcription factor binding sites (TFBS) or motifs, to control gene expression. Therefore, the discrimination of TFBSs is an essential task for genome function analysis. Finding TFBSs is challenging because motifs are mostly independent of orientation and of position relative to the transcription start point, and usually show some degree of ambiguity. Experimental methods such as DNA microarrays (DeRisi et al., 1997; Lockhart et al., 1996) and SAGE (Velculescu et al., 2000) are capable of precisely elucidating motifs, but they are too laborious and time-consuming for analyzing enormous genome data. More and more computer-based methods, such as enumeration methods, probability models and heuristics, are being developed to help motif finding. The modeling of in silico motif finding has two parts: a scoring function and an algorithm. The simplest scoring function sums the number of base matches in a regulatory region; it generally needs a predefined shared pattern for accuracy. Another scoring criterion is the position-specific scoring matrix (PSSM) or its variant, information content (IC; Schneider et al., 1986) [44]. Although they require more computation, PSSM and IC are the most popular scoring functions, owing to their pattern-free property.
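The two kinds of scoring function mentioned above can be sketched as follows. This is a minimal Python illustration, not the scoring used by any of the cited tools: `match_count` counts base matches against a given consensus, and `information_content` computes the total information content of a set of aligned sites in bits, assuming (for this sketch only) a uniform background frequency of 1/4 per base. Both function names are hypothetical.

```python
import math
from collections import Counter


def match_count(consensus: str, site: str) -> int:
    """Number of positions at which `site` agrees with `consensus`."""
    return sum(1 for c, s in zip(consensus, site) if c == s)


def information_content(sites: list) -> float:
    """Total information content (bits) of equal-length aligned sites.

    Assumes a uniform background probability of 0.25 for each base, so a
    fully conserved column contributes log2(4) = 2 bits.
    """
    total = 0.0
    for column in zip(*sites):            # iterate over alignment columns
        counts = Counter(column)
        n = len(column)
        for count in counts.values():
            f = count / n                 # observed base frequency in the column
            total += f * math.log2(f / 0.25)
    return total


print(match_count("TGTGA", "TGTAA"))      # 4
print(information_content(["ACGT", "ACGT", "ACGA"]))
```

The match count needs a consensus to compare against, whereas the information content is computed from the aligned sites alone, which is the pattern-free property referred to above.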

Current motif finding algorithms can generally be 
categorized as the probabilistic approaches and the 
deterministic approaches. Popular probabilistic algo- 
rithms are the expectation maximization [22], Gibbs 
sampling [21] and hidden Markov model (HMM). 
These are used to develop various sample-driven tools 
like MEME [3], CONSENSUS [17], AlignACE [19], 
ANN-spec [54], BioProspector [24], MotifSampler 
[48], GLAM [13], The Improbizer [1], QuickScore [38], 
SesiMCMC [11] and TFBSfinder [51]. 

Deterministic methods differ considerably from one another. A representative one is the consensus-based approach [45], which tests all $4^m$ patterns of width $m$ and guarantees an optimal solution, but is very time-consuming and impractical for large $m$ [33,49]. Many heuristics have been developed to prune the huge search space, including testing only the substrings in the sequences [15,26], specifying a shared pattern to restrict the locations of mismatches [5,7,38,41], constructing suffix trees with a fixed number of mismatches [30,31] and clustering approaches [6,23,34].

The methods for determining a consensus pattern can be split into two parts. The first part is the model describing the shared pattern, and the second part is the algorithm identifying the optimal consensus sequence according to its shared pattern. This study belongs to the second part. A consensus-based motif finding problem is as follows: given a set of sequences known to contain binding sites for a common factor, but without knowing where the sites are, discover the location of the site in each sequence [45].
Ecker et al. (2002) utilized optimization techniques to reformulate the maximum likelihood approach for motif finding problems. They adopted a probabilistic model and formulated a well-designed nonlinear model with reference to the expectation maximization algorithm of Lawrence and Reilly [22]. Their method, however, occasionally finds only a feasible solution or a local optimum, which means the best solution may not be found. Additionally, further structural features of the target motif cannot be embedded conveniently in their model.


Definitions 


This study introduces a linear programming method for solving a motif finding problem to reach the globally optimal consensus sequence. Two examples, searching for CRP-binding sites and for FNR-binding sites in the Escherichia coli genome, are used to illustrate the proposed method. The motif finding problem is first formulated as a nonlinear mixed 0-1 program for the alignment of DNA sequences; each of the four bases is coded with two binary variables, and a matching score is designed. This nonlinear mixed 0-1 program is then converted into a linear mixed 0-1 program by linearization techniques. Owing to some special features of the binary relationships, this linear 0-1 program includes $2m$ binary variables, where $m$ is the number of active letters in the consensus. First, this makes the number of binary variables independent of the number of sequences and the size of each sequence, so the proposed method is computationally efficient for motif finding problems with large data sizes. Secondly, the proposed method is guaranteed to find the global optimum instead of a local optimum. Thirdly, many kinds of specific features of the target motif can be formulated as logical constraints and embedded into the linear program.

An example of searching for CRP-binding sites, as discussed in Stormo et al. [44] and Ecker et al. (2002), is as follows. Given eighteen letter sequences, each 105 positions long, where each position contains a letter from the set {A, T, C, G}, find a consensus sequence of length 16 with the pattern

$$L_1 L_2 L_3 L_4 L_5 * * * * * * L_6 L_7 L_8 L_9 L_{10}$$

where $L_i \in \{A, T, C, G\}$ and the *'s mark the positions of ignored letters. Restated, the problem is to specify
(i) the $L_i$'s of the consensus sequence pattern, and
(ii) the location of the site in each given sequence that best fits the consensus sequence.


Formulation 


This study first formulates a motif finding problem as a nonlinear mixed 0-1 program, which is then converted into a linear mixed 0-1 program using linearization techniques. To reduce the computational burden, many 0-1 variables in this linear mixed 0-1 program can be treated as continuous variables by an all-or-nothing assignment technique, which greatly improves the computational efficiency of the program.

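Here we use the example data in [44], as listed in the Appendix, to describe the proposed method. First, we represent the data in the Appendix as an 18 × 105 data matrix $D$:

$$D = \begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,105} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,105} \\ \vdots & \vdots & & \vdots \\ b_{18,1} & b_{18,2} & \cdots & b_{18,105} \end{pmatrix} \qquad (1)$$

where $b_{l,p}$ is the letter in position $p$ of sequence $l$.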

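Recall the example discussed in the previous section: the consensus sequence we want to find has 16 positions (ten $L_i$'s and six ignored letters). A sequence therefore has $105 - 16 + 1 = 90$ candidate sites, each contributing ten active letters, so an 18 × 900 data matrix $D'$ is generated from $D$:

$$D' = \begin{pmatrix} d^1_{1,1} & \cdots & d^{10}_{1,1} & \cdots & d^1_{1,90} & \cdots & d^{10}_{1,90} \\ d^1_{2,1} & \cdots & d^{10}_{2,1} & \cdots & d^1_{2,90} & \cdots & d^{10}_{2,90} \\ \vdots & & & & & & \vdots \\ d^1_{18,1} & \cdots & d^{10}_{18,1} & \cdots & d^1_{18,90} & \cdots & d^{10}_{18,90} \end{pmatrix} \qquad (2)$$

where

$$d^i_{l,s} = \begin{cases} b_{l,\,i+s-1} & \text{for } i = 1,2,\dots,5 \\ b_{l,\,i+s+5} & \text{for } i = 6,7,\dots,10 \end{cases}$$

and $s = 1, \dots, 90$ is the starting position of each candidate site.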
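The construction of the candidate sites $d^i_{l,s}$ can be sketched in a few lines of Python. This is an illustrative helper (the name `candidate_sites` and its parameters are hypothetical, not part of the original software), using 0-based string indices internally. Applied to the first sequence of Fig. 1, it yields the three 10-letter rows of $D'$ for that sequence.

```python
def candidate_sites(seq: str, half: int = 5, gap: int = 6) -> list:
    """Active letters d^1..d^10 of every candidate site of `seq`.

    A site starting at 1-based position s takes letters s..s+4 (i = 1..5)
    and s+11..s+15 (i = 6..10); the six letters in between are ignored.
    """
    width = 2 * half + gap                       # 16 positions per site
    sites = []
    for start in range(len(seq) - width + 1):    # start = s - 1 (0-based)
        window = seq[start:start + width]
        sites.append(window[:half] + window[half + gap:])
    return sites


print(candidate_sites("AAGACTGTTTTTTTGATC"))
# ['AAGACTTTGA', 'AGACTTTGAT', 'GACTGTGATC']
print(len(candidate_sites("A" * 105)))  # 90 candidate sites per sequence
```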
Mixed 0-1 Linear Programming Approach for DNA Transcription Element Identification, Table 1
Base code in the determined consensus sequence

Base | $u_i$ | $v_i$
A | 0 | 0
T | 1 | 1
C | 0 | 1
G | 1 | 0


For $L_i \in \{A, T, C, G\}$, two binary variables $u_i$ and $v_i$ can be used to express $L_i$, an element of the consensus sequence, as shown in Table 1.

Let $a_i$, $t_i$, $c_i$ and $g_i$ be indicators that equal 1 when $L_i$ is A, T, C or G, respectively. Table 1 then implies the following conditions:

$$a_i = (1-u_i)(1-v_i), \qquad t_i = u_i v_i, \qquad c_i = (1-u_i)\,v_i, \qquad g_i = u_i(1-v_i) \qquad (3)$$
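As a quick sanity check of (3), the following Python sketch evaluates the four products for every $(u_i, v_i)$ pair of Table 1 and confirms that exactly the indicator of the coded base equals 1. The dictionary `CODE` simply transcribes Table 1; the function name `indicators` is illustrative.

```python
from itertools import product

# Table 1: base -> (u_i, v_i)
CODE = {"A": (0, 0), "T": (1, 1), "C": (0, 1), "G": (1, 0)}


def indicators(u: int, v: int) -> dict:
    """Evaluate conditions (3) for one position i."""
    return {
        "A": (1 - u) * (1 - v),
        "T": u * v,
        "C": (1 - u) * v,
        "G": u * (1 - v),
    }


for base, (u, v) in CODE.items():
    values = indicators(u, v)
    # exactly one indicator is 1, and it is the one for the coded base
    assert values[base] == 1 and sum(values.values()) == 1

# every binary pair codes some base, so the coding is a bijection
assert set(CODE.values()) == set(product((0, 1), repeat=2))
```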


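Now let $\mathrm{Score}_l$ be the degree of fit of sequence $l$ to the consensus sequence, specified as

$$\mathrm{Score}_l = \sum_{s=1}^{90} z_{l,s}\left(\theta^1_{l,s} + \theta^2_{l,s} + \cdots + \theta^{10}_{l,s}\right) \qquad (4)$$

where $\theta^i_{l,s}$ is the indicator, given by (6), associated with the letter $d^i_{l,s}$ of candidate site $s$ extracted from $D'$. The constraints associated with (4) are:

(i)
$$\sum_{s=1}^{90} z_{l,s} = 1, \qquad z_{l,s} \in \{0,1\} \text{ for all } l \text{ and } s \qquad (5)$$

(ii)
$$\theta^i_{l,s} = \begin{cases} a_i & \text{if } d^i_{l,s} = A \\ t_i & \text{if } d^i_{l,s} = T \\ c_i & \text{if } d^i_{l,s} = C \\ g_i & \text{if } d^i_{l,s} = G \end{cases} \qquad (6)$$

Clearly, $0 \le \mathrm{Score}_l \le 10$, and the objective is to maximize the total sum of $\mathrm{Score}_l$.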


Methods/Applications 


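For instance, for the sample data in Fig. 1:

$$\begin{aligned} \mathrm{Score}_1 ={}& z_{1,1}(a_1 + a_2 + g_3 + a_4 + c_5 + t_6 + t_7 + t_8 + g_9 + a_{10}) \\ &+ z_{1,2}(a_1 + g_2 + a_3 + c_4 + t_5 + t_6 + t_7 + g_8 + a_9 + t_{10}) \\ &+ z_{1,3}(g_1 + a_2 + c_3 + t_4 + g_5 + t_6 + g_7 + a_8 + t_9 + c_{10}) \end{aligned} \qquad (7)$$

$$\begin{aligned} \mathrm{Score}_2 ={}& z_{2,1}(g_1 + a_2 + t_3 + t_4 + a_5 + c_6 + g_7 + g_8 + c_9 + g_{10}) \\ &+ z_{2,2}(a_1 + t_2 + t_3 + a_4 + t_5 + g_6 + g_7 + c_8 + g_9 + t_{10}) \\ &+ z_{2,3}(t_1 + t_2 + a_3 + t_4 + t_5 + g_6 + c_7 + g_8 + t_9 + c_{10}) \end{aligned} \qquad (8)$$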


All $z_{l,s}$ in (4) are binary variables. Equation (5) implies that for a sequence $l$, only one site is chosen to contribute to $\mathrm{Score}_l$. Suppose the $k$th site is selected; then $z_{l,k} = 1$ and $z_{l,s} = 0$ for all $s \in \{1,2,\dots,90\}$, $s \ne k$. Since a huge number of $z_{l,s}$ (i.e., $|l| \cdot |s|$) are involved, treating them as binary variables would cause a heavy computational burden. Therefore the $z_{l,s}$ should be resolved as continuous variables rather than binary variables, which is justified by the following proposition:


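Proposition 1 (All or nothing assignment) Let $z_{l,s} \ge 0$ be continuous variables instead of binary variables. If there is a $k \in \{1,2,\dots,90\}$ such that

$$\sum_i \theta^i_{l,k} = \max\Big\{\sum_i \theta^i_{l,s} : s = 1,2,\dots,90\Big\},$$

then assigning $z_{l,k} = 1$ and $z_{l,s} = 0$ for all $s \ne k$, $s \in \{1,2,\dots,90\}$, maximizes the value of $\mathrm{Score}_l$.

Proof Since $\sum_s z_{l,s} = 1$ and $z_{l,s} \ge 0$, $\mathrm{Score}_l$ is a convex combination of the site scores, so

$$\max \sum_s \Big(z_{l,s} \sum_i \theta^i_{l,s}\Big) = \max\Big\{\sum_i \theta^i_{l,s} : s = 1,2,\dots,90\Big\} = \sum_i \theta^i_{l,k}. \qquad \square$$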
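Proposition 1 can be illustrated numerically. The Python sketch below (function names are hypothetical) computes $\sum_i \theta^i_{l,s}$ for each candidate site as the number of letters matching a fixed consensus, places all weight on a best site, and checks that a fractional assignment satisfying $\sum_s z_{l,s} = 1$, $z_{l,s} \ge 0$ cannot do better. The sites are those of the first sequence in Fig. 1.

```python
def site_scores(consensus: str, sites: list) -> list:
    """sum_i theta^i_{l,s} for each site s: number of letters equal to the consensus."""
    return [sum(c == d for c, d in zip(consensus, site)) for site in sites]


def all_or_nothing(consensus: str, sites: list):
    """Put z_{l,k} = 1 on a best-scoring site k and 0 elsewhere (Proposition 1)."""
    scores = site_scores(consensus, sites)
    k = max(range(len(scores)), key=scores.__getitem__)
    z = [1 if s == k else 0 for s in range(len(scores))]
    return z, scores[k]


sites_seq1 = ["AAGACTTTGA", "AGACTTTGAT", "GACTGTGATC"]
z, best = all_or_nothing("AGACTTTGAT", sites_seq1)
print(z, best)  # [0, 1, 0] 10

# Any convex combination of the site scores is at most `best`.
fractional = [0.2, 0.5, 0.3]
scores = site_scores("AGACTTTGAT", sites_seq1)
assert sum(w * sc for w, sc in zip(fractional, scores)) <= best
```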
Remark 1 The objective function $\sum_l \mathrm{Score}_l$ can be rewritten as

$$f = \sum_{i=1}^{10}\Big( a_i \sum_{(l,s)\in SA_i} z_{l,s} + t_i \sum_{(l,s)\in ST_i} z_{l,s} + c_i \sum_{(l,s)\in SC_i} z_{l,s} + g_i \sum_{(l,s)\in SG_i} z_{l,s}\Big) \qquad (9)$$

where $SA_i = \{(l,s) \mid d^i_{l,s} = A\}$, $ST_i = \{(l,s) \mid d^i_{l,s} = T\}$, $SC_i = \{(l,s) \mid d^i_{l,s} = C\}$ and $SG_i = \{(l,s) \mid d^i_{l,s} = G\}$ for $i = 1,2,\dots,10$.

Thus $SA_i$ (or $ST_i$, $SC_i$, $SG_i$) is the set of pairs $(l,s)$ for which the product term $z_{l,s}a_i$ (or $z_{l,s}t_i$, $z_{l,s}c_i$, $z_{l,s}g_i$, respectively) appears on the right-hand side of (4), because $\theta^i_{l,s}$ is determined by $d^i_{l,s}$ through (6).

For instance, the sum of $\mathrm{Score}_1$ and $\mathrm{Score}_2$ in (7) and (8) becomes

$$\mathrm{Score}_1 + \mathrm{Score}_2 = a_1(z_{1,1} + z_{1,2} + z_{2,2}) + \cdots + a_{10} z_{1,1} + \cdots + g_1(z_{1,3} + z_{2,1}) + \cdots + g_{10} z_{2,1} \qquad (10)$$
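The index sets of Remark 1 can be built mechanically from the candidate sites. In this illustrative Python sketch (the function name is hypothetical), `sets[(base, i)]` plays the role of $SA_i$, $ST_i$, $SC_i$ or $SG_i$; on the two sequences of Fig. 1 it reproduces the coefficients of $a_1$ and $g_1$ in (10).

```python
from collections import defaultdict


def coefficient_sets(site_lists: list) -> dict:
    """Map (base, i) to the set of (l, s) pairs whose site has `base` at position i.

    l, s and i are 1-based, as in the text; site_lists[l-1][s-1] is the
    10-letter string d^1_{l,s} ... d^10_{l,s}.
    """
    sets = defaultdict(set)
    for l, sites in enumerate(site_lists, start=1):
        for s, site in enumerate(sites, start=1):
            for i, base in enumerate(site, start=1):
                sets[(base, i)].add((l, s))
    return sets


fig1 = [
    ["AAGACTTTGA", "AGACTTTGAT", "GACTGTGATC"],   # sequence l = 1
    ["GATTACGGCG", "ATTATGGCGT", "TTATTGCGTC"],   # sequence l = 2
]
sets = coefficient_sets(fig1)
print(sorted(sets[("A", 1)]))  # [(1, 1), (1, 2), (2, 2)], i.e. a_1(z_{1,1} + z_{1,2} + z_{2,2})
print(sorted(sets[("G", 1)]))  # [(1, 3), (2, 1)], i.e. g_1(z_{1,3} + z_{2,1})
```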


Some logical constraints can be conveniently expressed with binary variables. For instance, the requirement that a CRP dimer binds a symmetrical site means that, for $i = 1,\dots,5$,

if $L_i = A$ then $L_{11-i} = T$, and
if $L_i = C$ then $L_{11-i} = G$ (and vice versa).

Such a logical structure can be conveniently formulated with the following constraints:

$$u_i + u_{11-i} = 1, \qquad v_i + v_{11-i} = 1, \qquad i = 1,2,3,4,5 \qquad (11)$$

where $u_i, v_i, u_{11-i}, v_{11-i} \in \{0,1\}$. With reference to Table 1, clearly (i) if $L_i = A$ (i.e., $u_i = 0$ and $v_i = 0$) then $L_{11-i} = T$ (i.e., $u_{11-i} = 1$ and $v_{11-i} = 1$), and vice versa; (ii) if $L_i = C$ (i.e., $u_i = 0$ and $v_i = 1$) then $L_{11-i} = G$ (i.e., $u_{11-i} = 1$ and $v_{11-i} = 0$), and vice versa.
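The equivalence between constraint (11) and reverse-complement symmetry can be checked exhaustively on short words. The Python sketch below (helper names are hypothetical) tests (11) using the $(u, v)$ codes of Table 1 and compares the result with a direct reverse-complement test.

```python
from itertools import product

CODE = {"A": (0, 0), "T": (1, 1), "C": (0, 1), "G": (1, 0)}   # Table 1
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def satisfies_11(consensus: str) -> bool:
    """Check u_i + u_{n+1-i} = 1 and v_i + v_{n+1-i} = 1 over the first half of positions."""
    n = len(consensus)
    for i in range(n // 2):
        u1, v1 = CODE[consensus[i]]
        u2, v2 = CODE[consensus[n - 1 - i]]
        if u1 + u2 != 1 or v1 + v2 != 1:
            return False
    return True


def is_reverse_complement(consensus: str) -> bool:
    return consensus == "".join(COMPLEMENT[b] for b in reversed(consensus))


# For n = 10 this is exactly (11) with pairs (i, 11 - i), i = 1..5.
print(satisfies_11("TGTGATCACA"))   # True
# Exhaustive check on all 4^6 words of length 6:
assert all(satisfies_11(w) == is_reverse_complement(w)
           for w in map("".join, product("ATCG", repeat=6)))
```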

Models 


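A motif finding problem can be formulated as a nonlinear mixed 0-1 program based on these constraints:

Program 1 (Nonlinear Mixed 0-1 Program)

$$\max \sum_{l=1}^{18} \mathrm{Score}_l = \sum_{i=1}^{10}\Big( a_i \sum_{(l,s)\in SA_i} z_{l,s} + t_i \sum_{(l,s)\in ST_i} z_{l,s} + c_i \sum_{(l,s)\in SC_i} z_{l,s} + g_i \sum_{(l,s)\in SG_i} z_{l,s}\Big) \qquad (12)$$

subject to

$$\sum_{s=1}^{90} z_{l,s} = 1, \qquad z_{l,s} \ge 0 \text{ for all } l, s$$

Conservative constraints, for $i = 1,2,\dots,10$:
$$a_i = (1-u_i)(1-v_i), \qquad t_i = u_i v_i, \qquad c_i = (1-u_i)\,v_i, \qquad g_i = u_i(1-v_i)$$

Logical constraints, for $i = 1,2,\dots,5$:
$$u_i + u_{11-i} = 1, \qquad v_i + v_{11-i} = 1$$

$$u_i, v_i \in \{0,1\} \text{ for } i = 1,\dots,5; \qquad 0 \le u_i, v_i \le 1 \text{ for } i = 6,\dots,10; \qquad 0 \le a_i, t_i, c_i, g_i \le 1 \text{ for } i = 1,\dots,10.$$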


This program solves for $\{a_i, t_i, c_i, g_i\}$, $i = 1,2,\dots,10$, so as to maximize the total degree of fit of the 18 given sequences to the consensus sequence, subject to possible logical constraints. A very important feature of Program 1 is that the $z_{l,s}$ can be treated as continuous variables rather than binary variables, which improves the computational efficiency dramatically. We can ensure that all $z_{l,s}$ found still have binary values, as discussed in the next section.
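For very small instances, the optimum that Program 1 seeks can be cross-checked by brute force. The Python sketch below is not the mixed 0-1 programming approach of the text. It enumerates the $4^5$ consensus sequences allowed by the logical constraints (11), where the second half is the reverse complement of the first, scores each one with the all-or-nothing rule of Proposition 1, and keeps the best. The function name is hypothetical. Because the enumeration grows exponentially with the number of free letters, it is only useful as a check on toy data such as Fig. 1.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def best_symmetric_consensus(site_lists: list, half: int = 5):
    """Exhaustively maximize sum_l max_s (matches) over symmetric consensus sequences."""
    best = None
    for left in product("ATCG", repeat=half):
        # logical constraints (11): L_{11-i} is the complement of L_i
        right = "".join(COMPLEMENT[b] for b in reversed(left))
        consensus = "".join(left) + right
        # Proposition 1: each sequence contributes the score of its best site
        total = sum(
            max(sum(c == d for c, d in zip(consensus, site)) for site in sites)
            for sites in site_lists
        )
        if best is None or total > best[1]:
            best = (consensus, total)
    return best


fig1 = [
    ["AAGACTTTGA", "AGACTTTGAT", "GACTGTGATC"],
    ["GATTACGGCG", "ATTATGGCGT", "TTATTGCGTC"],
]
print(best_symmetric_consensus(fig1))
```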


Linearization of Program 1 Program 1 is a nonlinear mixed 0-1 program in which $q_i \sum z_{l,s}$ for $q_i \in \{a_i, t_i, c_i, g_i\}$ and $u_i v_i$ are product terms. These product terms can be linearized directly by the following propositions:

Proposition 2 The product term $A_i = q_i \sum_{(l,s)} z_{l,s}$, where $A_i$ is to be maximized and $q_i \in \{0,1\}$, can be linearized as follows:

$$\sum_{(l,s)} z_{l,s} + M(q_i - 1) \le A_i \le \sum_{(l,s)} z_{l,s}, \qquad 0 \le A_i \le M q_i \qquad (13)$$

where $M$ is a big constant larger than or equal to the number of sequences.

Proof If $q_i = 1$ then $A_i = \sum_{(l,s)} z_{l,s}$; otherwise $A_i = 0$. $\square$
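Why (13) is exact can be seen by computing the interval it leaves for $A_i$. In this Python sketch (the helper name is hypothetical), `sum_z` stands for $\sum z_{l,s}$ over the relevant index set. Each sequence contributes at most 1 to that sum, so it never exceeds the number of sequences, which is why $M$ must be at least that large.

```python
def feasible_interval(q: int, sum_z: float, M: float):
    """Lower and upper bounds that (13) imposes on A_i for fixed q_i and sum_z."""
    lower = max(sum_z + M * (q - 1), 0.0)   # from sum_z + M(q - 1) <= A_i and A_i >= 0
    upper = min(sum_z, M * q)               # from A_i <= sum_z and A_i <= M q_i
    return lower, upper


M = 18  # number of sequences in the CRP example
for q in (0, 1):
    for sum_z in (0.0, 0.5, 3.0, 18.0):
        lo, hi = feasible_interval(q, sum_z, M)
        assert lo == hi == q * sum_z        # A_i is forced to equal q_i * sum_z

# With M too small the constraints become infeasible when q_i = 0 (lower > upper):
print(feasible_interval(0, 3.0, 2))  # (1.0, 0)
```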


Proposition 3 The product term $w_i = u_i v_i$, where $u_i, v_i \in \{0,1\}$, can be linearized as follows:

$$w_i \le u_i, \qquad w_i \le v_i, \qquad w_i \ge 0, \qquad w_i \ge u_i + v_i - 1 \qquad (14)$$
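Proposition 3 is the standard linearization of a product of two binary variables. An exhaustive check over the four binary pairs (Python, hypothetical helper name) shows that (14) pins $w_i$ to $u_i v_i$.

```python
from itertools import product


def w_bounds(u: int, v: int):
    """Range of w_i permitted by (14): w <= u, w <= v, w >= 0, w >= u + v - 1."""
    return max(0, u + v - 1), min(u, v)


for u, v in product((0, 1), repeat=2):
    lo, hi = w_bounds(u, v)
    assert lo == hi == u * v     # the interval collapses to the single value u*v
    print(u, v, lo)
```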


Denote $Z(a_i) = a_i \sum_{(l,s)\in SA_i} z_{l,s}$, $Z(t_i) = t_i \sum_{(l,s)\in ST_i} z_{l,s}$, $Z(c_i) = c_i \sum_{(l,s)\in SC_i} z_{l,s}$ and $Z(g_i) = g_i \sum_{(l,s)\in SG_i} z_{l,s}$. Program 1 is then linearized into Program 2 based on Proposition 2 and Proposition 3.


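Program 2 (Linear Mixed 0-1 Program)

$$\max \sum_{l=1}^{18} \mathrm{Score}_l = \sum_{i=1}^{10}\big(Z(a_i) + Z(t_i) + Z(c_i) + Z(g_i)\big) \qquad (15)$$

subject to

$$\sum_{s=1}^{90} z_{l,s} = 1, \qquad z_{l,s} \ge 0 \text{ for all } l, s$$

Conservative constraints, for $i = 1,2,\dots,10$:
$$a_i = 1 - u_i - v_i + w_i, \qquad t_i = w_i, \qquad c_i = v_i - w_i, \qquad g_i = u_i - w_i$$

Logical constraints, for $i = 1,2,\dots,5$:
$$u_i + u_{11-i} = 1, \qquad v_i + v_{11-i} = 1$$

Constraints for linearizing product terms, for $i = 1,2,\dots,10$:
$$\sum_{(l,s)\in SA_i} z_{l,s} + M(a_i - 1) \le Z(a_i) \le \sum_{(l,s)\in SA_i} z_{l,s}, \qquad 0 \le Z(a_i) \le M a_i$$
$$\sum_{(l,s)\in ST_i} z_{l,s} + M(t_i - 1) \le Z(t_i) \le \sum_{(l,s)\in ST_i} z_{l,s}, \qquad 0 \le Z(t_i) \le M t_i$$
$$\sum_{(l,s)\in SC_i} z_{l,s} + M(c_i - 1) \le Z(c_i) \le \sum_{(l,s)\in SC_i} z_{l,s}, \qquad 0 \le Z(c_i) \le M c_i$$
$$\sum_{(l,s)\in SG_i} z_{l,s} + M(g_i - 1) \le Z(g_i) \le \sum_{(l,s)\in SG_i} z_{l,s}, \qquad 0 \le Z(g_i) \le M g_i$$

$$u_i, v_i \in \{0,1\} \text{ for } i = 1,\dots,5; \qquad 0 \le u_i, v_i \le 1 \text{ for } i = 6,\dots,10; \qquad 0 \le a_i, t_i, c_i, g_i \le 1 \text{ for } i = 1,\dots,10$$

The $z_{l,s}$ are treated as non-negative continuous variables for $l = 1,2,\dots,18$ and $s = 1,2,\dots,90$, and $M$ can be any value greater than or equal to 18.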


In Program 2, since $u_i$ and $v_i$ are binary variables, $a_i$, $t_i$, $c_i$ and $g_i$ take binary values by (3). Although the $z_{l,s}$ are treated as continuous variables, their values should still be 0 or 1, because an optimal solution of a linear program can be found at a vertex of the feasible region satisfying $\sum_s z_{l,s} = 1$ for all $l$.

Consider the following proposition. 


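Proposition 4 Let the optimal solution of Program 2 be $x^* = (z^*, u^*, v^*)$ with $\sum_s z_{l,s} = 1$. Assume that a sequence $l$ contains sites $s_1, s_2, \dots, s_k$ such that $0 < z^*_{l,s_j} < 1$ for $j = 1,2,\dots,k$. Then

$$\sum_i \theta^i_{l,s_1} = \sum_i \theta^i_{l,s_2} = \cdots = \sum_i \theta^i_{l,s_k},$$

where the $\theta^i_{l,s}$ are specified in (6).

Proof Since $\sum_s z_{l,s} = 1$, if $s_p, s_q \in \{s_1, s_2, \dots, s_k\}$ with $\sum_i \theta^i_{l,s_p} > \sum_i \theta^i_{l,s_q}$, then maximizing $\mathrm{Score}_l = \sum_s z_{l,s} \sum_i \theta^i_{l,s}$ requires $z^*_{l,s_q} = 0$. This contradicts $0 < z^*_{l,s_q} < 1$; therefore $\sum_i \theta^i_{l,s_1} = \sum_i \theta^i_{l,s_2} = \cdots = \sum_i \theta^i_{l,s_k}$. $\square$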

After solving Program 2, we obtain the globally optimal solution "TGTGA******TCACA" with objective value 147. The nonzero $z_{l,s}$ values indicate the starting positions of the binding sites in the 18 sequences:

$$z_{1,64} = z_{2,58} = z_{3,79} = z_{4,66} = z_{5,53} = z_{6,63} = z_{7,27} = z_{8,42} = z_{9,12} = z_{10,17} = z_{11,64} = z_{12,44} = z_{13,51} = z_{14,74} = z_{15,20} = z_{16,56} = z_{17,87} = z_{18,81} = 1$$

All other $z_{l,s}$ have value 0.

In Program 2 the total number of 0-1 variables is $2m$ and the total number of continuous variables is $20m + |l| \cdot |s|$, where $|l|$ is the number of sequences and $|s|$ the number of candidate sites per sequence. Since the number of 0-1 variables does not depend on $|l|$ or $|s|$, a motif finding problem with many long sequences can be solved effectively.


Suboptimal Consensus Sequences Program 2 finds the exact global optimum. Sometimes the second-best and third-best solutions may also be useful. With the proposed method, a complete ranked set of consensus sequences can be found conveniently by adding extra constraints. For instance, the second-best solution of Program 2 can be obtained by solving the following program:

$$\max \sum_{l=1}^{18} \mathrm{Score}_l \qquad (16)$$

subject to
(i) the same constraints as in Program 2;
(ii) $t_1 + g_2 + t_3 + g_4 + a_5 + t_6 + c_7 + a_8 + c_9 + a_{10} \le 9$ (new constraint).


a
AAGACTGTTTTTTTGATC
GATTATTTGCACGGCGTC

b
l=1, s=1: AAGAC|TGTTTT|TTTGA|TC
l=1, s=2: A|AGACT|GTTTTT|TTGAT|C
l=1, s=3: AA|GACTG|TTTTTT|TGATC
l=2, s=1: GATTA|TTTGCA|CGGCG|TC
l=2, s=2: G|ATTAT|TTGCAC|GGCGT|C
l=2, s=3: GA|TTATT|TGCACG|GCGTC

c
AAGACTTTGA  AGACTTTGAT  GACTGTGATC
GATTACGGCG  ATTATGGCGT  TTATTGCGTC

Mixed 0-1 Linear Programming Approach for DNA Transcription Element Identification, Figure 1
A small example of finding a consensus sequence: a the two sequences to be compared; b schematic representation of the candidate sites; c the associated D′ matrix


The new constraint forces the program to find a solution different from that of Program 2. The second-best consensus sequence found is "TTTGA******TCAAA" with score 129. Similarly, another solution can be found by adding the following constraint to (16):

$$t_1 + t_2 + t_3 + g_4 + a_5 + t_6 + c_7 + a_8 + a_9 + a_{10} \le 9$$

The third-best consensus sequence found is "AAATT******AATTT", also with score 129.
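The idea behind (16), excluding each solution already found with a cut requiring that the sum of its ten letter indicators be at most 9, can be imitated on toy data by brute force. The Python sketch below is illustrative only: it enumerates candidates instead of solving Program 2. After each round it adds the previous optimum to a cut list and re-optimizes over the symmetric consensus sequences that violate no cut. Function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def total_score(consensus: str, site_lists: list) -> int:
    """Sum over sequences of the best site score (all-or-nothing assignment)."""
    return sum(max(sum(c == d for c, d in zip(consensus, site)) for site in sites)
               for sites in site_lists)


def ranked_consensus(site_lists: list, how_many: int, half: int = 5) -> list:
    """Best, second-best, ... symmetric consensus sequences via no-good cuts."""
    cuts = []       # previously found consensus sequences
    ranking = []
    for _ in range(how_many):
        best = None
        for left in product("ATCG", repeat=half):
            consensus = "".join(left) + "".join(COMPLEMENT[b] for b in reversed(left))
            # cut like (16): at most len-1 letter indicators of an old solution may be 1
            if any(sum(a == b for a, b in zip(consensus, old)) > len(old) - 1
                   for old in cuts):
                continue
            score = total_score(consensus, site_lists)
            if best is None or score > best[1]:
                best = (consensus, score)
        cuts.append(best[0])
        ranking.append(best)
    return ranking


fig1 = [
    ["AAGACTTTGA", "AGACTTTGAT", "GACTGTGATC"],
    ["GATTACGGCG", "ATTATGGCGT", "TTATTGCGTC"],
]
for consensus, score in ranked_consensus(fig1, 3):
    print(consensus, score)
```

Each cut removes exactly one consensus sequence, so successive rounds return distinct sequences with non-increasing scores, mirroring how (16) and its extension produce the second- and third-best solutions.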


Extension to Finding Unknown Binding Sites A more complicated motif finding problem is to search for a consensus sequence with an uncertain pattern format, where the number of ignored letters between the two half sites is unknown. An example is to find a consensus sequence of length $2 \cdot 5 + k$ with the pattern

$$L_1 L_2 L_3 L_4 L_5 \underbrace{* \cdots *}_{k} L_6 L_7 L_8 L_9 L_{10}$$

where $k$, the number of *'s, is an unknown integer between 0 and 10.

Program 2 can be modified slightly to treat this type of motif finding problem. First, we expand $D$ in (1) into $D'$:

$$D' = [\,D'(0)\;\; D'(1)\;\; D'(2)\;\; \cdots\;\; D'(10)\,]$$
a Computational time versus sequence length (18 sequences)

Sequence length | Solving time (mm:ss)
105 | 1:39
210 | 1:21
315 | 1:44
420 | 1:43
525 | 1:48
630 | 1:54
735 | 1:48
840 | 1:56
945 | 1:59
1050 | 2:04

b Computational time versus number of sequences

Number of sequences | Solving time (mm:ss)
9 | 0:30
18 | 1:39
27 | 3:21
36 | 4:32
45 | 6:15
54 | 6:01
63 | 8:16
72 | 10:29
81 | 10:01
90 | 9:37

c Computational time versus number of independent positions

m (independent positions) | Solving time (h:mm:ss)
2 | 0:00:01
3 | 0:00:03
4 | 0:00:21
5 | 0:01:23
6 | 0:03:38
7 | 0:05:18
8 | 0:08:25
9 | 0:15:52
10 | 0:53:27
11 | 2:33:20

Mixed 0-1 Linear Programming Approach for DNA Transcription Element Identification, Figure 2
The relationship between computational time and various factors involved in a consensus-based motif finding problem. The figure shows the computational time of solving Program 2 for a various sequence lengths; b various numbers of sequences; and c various numbers of independent positions.
$\bar{k}$ | $|k|$ | Common site | Score | Computational time (mm:ss)
0 | 1 | TGTTT (0) AAACA | 126 | 4:51
2 | 3 | TGAAA (2) TTTCA | 129 | 12:32
4 | 5 | GTGAA (4) TTCAC | 134 | 19:46
6 | 7 | TGTGA (6) TCACA | 147 | 24:28
8 | 9 | TGTGA (6) TCACA | 147 | 25:49
10 | 11 | TGTGA (6) TCACA | 147 | 32:35

Mixed 0-1 Linear Programming Approach for DNA Transcription Element Identification, Figure 3
Computational time of Program 3 with various numbers of possible k's ($|k|$). The number enclosed in parentheses in the common site is the solution of $k$.
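in which

$$D'(k) = \begin{pmatrix} d^1_{1,1,k} & \cdots & d^{10}_{1,1,k} & \cdots & d^1_{1,96-k,k} & \cdots & d^{10}_{1,96-k,k} \\ d^1_{2,1,k} & \cdots & d^{10}_{2,1,k} & \cdots & d^1_{2,96-k,k} & \cdots & d^{10}_{2,96-k,k} \\ \vdots & & & & & & \vdots \\ d^1_{18,1,k} & \cdots & d^{10}_{18,1,k} & \cdots & d^1_{18,96-k,k} & \cdots & d^{10}_{18,96-k,k} \end{pmatrix}$$

where $k \in \{0,1,\dots,10\}$,

$$d^i_{l,s,k} = \begin{cases} b_{l,\,i+s-1} & \text{for } i = 1,2,3,4,5 \\ b_{l,\,i+s+k-1} & \text{for } i = 6,7,8,9,10 \end{cases}$$

and $\theta^i_{l,s,k} = a_i$, $t_i$, $c_i$ or $g_i$ when $d^i_{l,s,k}$ = 'A', 'T', 'C' or 'G', respectively.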
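The sites entering $D'(k)$ can be generated in the same way as in the fixed-gap case, now with a variable gap. In this illustrative Python sketch (the function name is hypothetical), `k` is the number of ignored letters. The 1-based formula $d^i_{l,s,k} = b_{l,i+s+k-1}$ for $i = 6,\dots,10$ becomes a 0-based slice, and a 105-letter sequence yields $96 - k$ sites for each $k$.

```python
def candidate_sites_with_gap(seq: str, k: int, half: int = 5) -> list:
    """Active letters of every site with k ignored letters between the half sites."""
    width = 2 * half + k
    return [
        seq[start:start + half]                      # d^1 .. d^5 = b_{i+s-1}
        + seq[start + half + k:start + width]        # d^6 .. d^10 = b_{i+s+k-1}
        for start in range(len(seq) - width + 1)     # start = s - 1 (0-based)
    ]


counts = [len(candidate_sites_with_gap("A" * 105, k)) for k in range(11)]
print(counts)       # [96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86]
print(sum(counts))  # 1001 candidate sites per sequence over k = 0..10
```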


The cases with k larger than 10 are not considered since 
they are relatively rare. A linear mixed 0-1 program for 
solving this example is formulated below: 


Program 3

$$\max \sum_{i=1}^{10}\big(Z(a_i) + Z(t_i) + Z(c_i) + Z(g_i)\big)$$

subject to

(i)
$$\sum_{k=0}^{10} \sum_{s=1}^{96-k} z_{l,s,k} = 1, \qquad z_{l,s,k} \ge 0 \text{ for all } l, s, k$$

(ii)
$$\sum_s z_{1,s,k} = \sum_s z_{2,s,k} = \cdots = \sum_s z_{18,s,k} \qquad \text{for } k \in \{0,1,\dots,10\}$$

(iii) the same conservative and logical constraints as in Program 2;

(iv) the same constraints for linearizing product terms as in Program 2, but with $z_{l,s}$ replaced by $z_{l,s,k}$.


Constraints (i) and (ii) ensure that when a specific $k$ is chosen, $\sum_s z_{l,s,k} = 1$ and $\sum_s z_{l,s,k'} = 0$ for $k' \ne k$.
Mixed 0-1 Linear Programming Approach for DNA Transcription Element Identification, Table 2
FNR binding sites found by Program 3

Common site: TTGAT----ATCAA

Operon | Length | Predicted site | Position | Score | Site seq. listed in RegulonDB* | RegulonDB position
narK | 338 | ATGAT----ATCAA | -86 | 9 | actatgGGTAATGATIAAATIAATCAAITGATagataa | -79.5
 | | TTGAT----ATCAA | -48 | 10 | atcttaTCGTITTGATITTACATCAAATTGccttta | -41.5
ansB | 345 | TTGTT----GTCAA | -48 | 8 | acgttgTAAATTGTTITAACIGTCAAATTTcccata | -41.5
 | | TTGTA----TCCAA | -81 | 6 | gcctctAACTITTGTAIGATCITCCAAAATAtattca | -74.5
 | | TTTAT----TTTAA | -123 | 7 | |
narG | 525 | TTGAT----ATCAA | -55 | 10 | ctcittgATICGTTIAATCAAITTCCCACGCTGtttcag | -41.5
dmsA | 325 | TTGAT----AACAA | -48 | 9 | ctlttgaTIACCGAACAAITAATTACTCCTCacttac | -33
frd | 781 | TTCAG----ATCCA | -37 | 7 | AAAAATCGATCTCGTCAAATITTcaglacttlatcca | -47
 | | TTAAT----TTCAG | -98 | 7 | |
nirB | 262 | TTGAT----ATCAA | -48 | 10 | aaaggtGAATITTGATITTACATCAAITAAGcggggt | -41.5
sodA | 284 | TTGAT----ATTTT | -42 | (not legible) | agtacgGCAITTGATIAATCATTTTICAATAtcattt | -34
fnr** | 96 | TTGAC----ATCAA | (not legible) | 9 | atgttaAAATTGACAAATATCAAITTTACGgcttga | 1
 | | | | | ccttaaCAACTTAAGGGTTTTCAAATAGatagac | -103.5
(cyoA) | 599 | CTTCT----ATCAA | -113 | 7 | N/A | N/A
 | | TTGTT----TTCAC | -198 | 7 | |
(icdA) | 290 | ATGAC----AACAA | 16 | 7 | N/A | N/A
 | | TTGCT----AGCAT | 73 | (not legible) | |
(sdhC) | 708 | TTGAT----AATAA | -330 | 8 | N/A | N/A
(ulaA) | 346 | TCAAT----ATCAA | -278 | 8 | N/A | N/A
 | | TTGGT----ATTAA | -257 | 8 | |

* For visualizing the comparison, the uppercase letters represent the binding site listed in RegulonDB; the letter in bold face is the center of the site sequence; and the boxed letters represent the exact binding site obtained by Program 3.
** The second site listed in RegulonDB is not contained in the sequence data from GenBank, which is only 96 bases long.


Cases 
Finding CRP Binding Sites with a Given Pattern 


Several experiments were conducted, using the example in the Appendix, to analyze the effect of sequence length and number of sequences on the computational time. All examples were solved with LINGO [40], a widely used optimization software package, on a personal computer with a 2.0 GHz Pentium 4 CPU. A software package named "Global Site Seer" was developed based on Program 2 for finding DNA motifs; it is available from http://www.iim.nctu.edu.tw/~cjfu/gss.htm.

Figure 2 illustrates the experimental results for analyzing the time complexity. Figure 2a shows the computational time for various sequence lengths, with the number of sequences fixed at 18. The computational time changes only slightly (between 1:21 and 2:04, mm:ss) even when the sequence length is increased from 105 to 1050. Figure 2b shows the computational time for various numbers of sequences; the solving time is roughly proportional to the number of sequences. The proposed model is therefore quite promising for finding DNA motifs in datasets with long sequences and a large number of sequences. Figure 2c shows that the computational time rises exponentially as the number of independent positions increases.

Using Program 3 to search for CRP-binding sites, we obtain the globally optimal solution "TGTGA******TCACA" with score 147, exactly the same solution as found by Program 2. The second-best solution is "GTGAA****TTCAC" with score 134.

The relationship between the computational time and the number of possible k's (i.e., $|k|$) is approximately linear, as shown by the experimental results in Fig. 3. The number of ignored letters $k$ lies between 0 and $\bar{k}$, the upper bound of $k$, and thus $|k| = \bar{k} + 1$ in this experiment.
Finding FNR-binding Sites with an Ambiguous Shared Pattern Program 3 is also applied to an example of searching for binding sites of the fumarate and nitrate reduction regulatory protein (FNR) in E. coli. Both CRP and FNR belong to the CRP/FNR helix-turn-helix transcription factor superfamily [47]. The sequence data, taken from GenBank, contain 12 DNA sequences with lengths varying from 96 to 781. Owing to the dimer structure of the binding protein, the consensus sequence in this example also has a constraint of inverse symmetry. The RegulonDB database [18] lists the known regulatory binding sites for eight of these twelve sequences, while the exact positions for the other four sequences are not yet listed. Solving this example by Program 3, we obtained the globally optimal consensus sequence "TTGAT****ATCAA" with score 107, which is the same consensus sequence as indicated by [47]. Table 2 shows the result, including the consensus sequence and the predicted binding sites for all 12 sequences. Some sites downstream of the transcription start (i.e., with positive indices) are also listed because there are a few known cases in which regulatory sites appear within transcription units [47]. The proposed method has found some sites that are not listed in RegulonDB but have higher scores than those listed there (e.g., the third solution in the ansB row of Table 2). The best predicted sites in the four undetermined sequences are also listed in Table 2.


Conclusions 


This study proposes a linear mixed 0-1 programming approach for finding DNA motifs. Compared with the widely used maximum likelihood methods, the proposed method can reach a global optimum rather than a local optimum or merely a feasible solution. Additionally, by utilizing binary variables, some logical constraints can be embedded into the models. It is also straightforward to find the second-best, third-best and further consensus sequences. Since the number of binary variables is fully independent of the number of sequences and the length of a sequence, the proposed method can treat motif finding problems with many long sequences. For finding motifs with many independent positions in an acceptable time, this study also proposes a method for distributed computing.


The proposed method can also be conveniently ex- 
tended to treat more complicated motif finding prob- 
lems. In this study an extension of the linear program is 
designed to find DNA motifs with an unknown number 
of ignored letters between the two half sites. The result 
of searching for FNR-binding sites shows that the extended model can find not only the locations of known binding sites listed in the RegulonDB database but also sites not yet delimited.
Introduction


The G-group classification problem, also known as the G-group discriminant problem, involves a population partitioned into G distinct (and predefined) groups. The objective is to construct a scalar- or vector-valued scoring function f on $\mathbb{R}^p$ so that the group to which a population member with observed attributes $x \in \mathbb{R}^p$ belongs can be determined, with the best possible accuracy, from its score $f(x)$. The scoring function f is usually restricted to a particular class (most commonly, linear). By a wide margin, most studies have focused on the two-group case. Construction of f is based on training samples from the various groups. The most reasonable criterion for choosing f may be expected misclassification cost, but many studies make the simplifying assumptions that all misclassifications are equally expensive and that groups are represented in the training samples in proportion to their prior probability of being encountered, in which case the criterion reduces

Mixed Integer Classification Problems, Figure 1
Optimal linear classifier (marked points are misclassified)
to minimizing the number of misclassifications in the combined training samples. Figure 1 illustrates an optimal choice of linear classifier f in a two-group problem.

Classical discriminant analysis relies on distribu- 
tional assumptions. In the two-group case with nor- 
mally distributed attributes, the scalar-valued discrimi- 
nant function that minimizes expected misclassification 
cost is known to be linear if the two groups have identi- 
cal covariance structures and quadratic if not. In both 
cases, direct estimation of f is straightforward. Non- 
parametric approaches, making no distributional as- 
sumptions, have utilized an eclectic assortment of tech- 
niques, among them neural networks, metaheuristics, 
and mathematical programming. Although some con- 
sideration has been given to nonlinear programming 
methods, the bulk of the work involving mathematical 
programming has utilized either linear or mixed integer linear programming models, or support vector machines (quadratic programs) [8]. See [10] and the article Linear Programming Models for Classification (elsewhere in this volume) for an overview of the subject.

When f is linear, the problem of minimizing the 
number of misclassifications is a special case of the 
slightly more general problem of dropping the small- 
est (or least costly) set of constraints necessary to ren- 
der an inconsistent set of linear inequalities consis- 
tent. This problem crops up in a variety of contexts, 
including pattern recognition [18], machine learn- 
ing/data mining [5] and the analysis of infeasible lin- 
ear programs [6]. Thus methods from those areas may 
be applicable to discriminant problems. For instance, 
Soltysik and Yarnold [15] applied the algorithm of 
Warmack and Gonzalez [18] to the two-group linear 
discriminant problem. 


Formulation 


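The following is a typical mixed integer programming model for the two-group case, using a scalar linear discriminant function:

$$
\begin{aligned}
\min\ & \sum_{g=1}^{2} \frac{\pi_g C_g}{N_g} \sum_{n=1}^{N_g} z_{gn} \\
\text{s.t.}\ & X_1 w + w_0 \mathbf{1} - M z_1 \le \mathbf{0} \\
& X_2 w + w_0 \mathbf{1} + M z_2 \ge \mathbf{0} \\
& w, w_0 \text{ free};\ z_g \in \{0,1\}^{N_g}
\end{aligned}
\qquad (1)
$$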


Matrix $X_g$ is an $N_g \times p$ training sample from group g, while $\pi_g$ and $C_g$ are respectively the prior probability of group g and the cost of misclassifying a member of that group. M is a sufficiently large positive constant, and $\mathbf{0}$ and $\mathbf{1}$ denote vectors all of whose entries are 0 or 1, respectively. The discriminant function $f(x) = w^\top x + w_0$ is intended to produce negative scores for members of the first group and positive scores for members of the second group. The binary indicator variable $z_{gn}$ takes value 1 if the nth training observation from group g is classified incorrectly and 0 if it is classified correctly.
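For a fixed discriminant $(w, w_0)$ the indicators $z_{gn}$ and the objective value of (1) follow directly from the scores, which is a convenient way to see what the model is optimizing. The Python sketch below evaluates them; it is not a solver, and the function name and toy data are hypothetical. A score of exactly zero counts as correct for both groups, mirroring the non-strict inequalities in (1).

```python
def misclassification_cost(X1, X2, w, w0, priors, costs):
    """Evaluate the objective of model (1) at a fixed discriminant (w, w0).

    X1, X2: lists of attribute vectors for groups 1 and 2.
    priors, costs: (pi_1, pi_2) and (C_1, C_2).
    Returns the objective value and the implied indicators z_1, z_2.
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + w0

    # Group 1 should score <= 0 and group 2 should score >= 0.
    z1 = [1 if score(x) > 0 else 0 for x in X1]
    z2 = [1 if score(x) < 0 else 0 for x in X2]
    value = ((priors[0] * costs[0] / len(X1)) * sum(z1)
             + (priors[1] * costs[1] / len(X2)) * sum(z2))
    return value, z1, z2


# Hypothetical data: the point (3, 3) in group 1 scores +3 and is misclassified.
value, z1, z2 = misclassification_cost(
    X1=[[0, 0], [1, 0], [3, 3]], X2=[[2, 2], [3, 2]],
    w=[1, 1], w0=-3, priors=(0.5, 0.5), costs=(1.0, 1.0))
print(z1, z2)  # [0, 0, 1] [0, 0]
```

The MIP searches over $(w, w_0)$ for the smallest such value; the sketch only scores one candidate.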

The discriminant function is linear as written, but various nonlinear functions can be generated by embedding the attribute space $\mathbb{R}^p$ in a higher-dimensional space. Support vector machines are particularly adept at this. Polynomial functions, for instance, are easily accommodated in (1) by expanding the sample matrices to include powers and products of attributes.
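As an illustration of such an expansion, the sketch below appends all degree-2 products to each attribute vector. A linear discriminant over the expanded vectors is a quadratic function of the original attributes, so (1) itself is unchanged apart from p growing to p + p(p+1)/2. The function name is hypothetical.

```python
from itertools import combinations_with_replacement


def quadratic_expansion(X):
    """Append every product x_a * x_b with a <= b (squares and cross terms) to each row."""
    p = len(X[0])
    pairs = list(combinations_with_replacement(range(p), 2))
    return [list(row) + [row[a] * row[b] for a, b in pairs] for row in X]


print(quadratic_expansion([[2, 3]]))  # [[2, 3, 4, 6, 9]]
```

Higher degrees follow the same pattern with longer index combinations, at the cost of many more coefficient variables.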

A score of zero results in an ambiguous classification. Some authors deal with this by changing the first two constraints of (1) to

$$
\begin{aligned}
X_1 w + w_0 \mathbf{1} - M z_1 &\le -\varepsilon \mathbf{1} \\
X_2 w + w_0 \mathbf{1} + M z_2 &\ge \varepsilon \mathbf{1},
\end{aligned}
$$

where ε is a small positive constant. This formulation is nearly as general, although it is mathematically possible that infelicitous choices of ε and M could rule out an otherwise desirable solution.

Problem (1) is known to be NP-hard [1]. At the 
same time, using the finite VC-dimension of linear clas- 
sifiers [16,17], it can be shown that the error rate of 
the solution to (1) converges in probability to the op- 
timal error rate as sample size grows [2]. Assuming 
availability of sufficient data, a key question is whether 
the problem remains tractable when the training sample is large enough to provide a suitably accurate solution. There are grounds for (cautious) optimism, in that progress in hardware, software and algorithms advances the boundaries of what is tractable, while for a given problem instance the sample size needed for accuracy is static.

While there will often be a unique best choice of training observations to misclassify (i.e., unique optimal values of $z_1$ and $z_2$), there commonly will be infinitely many choices for the coefficients $w, w_0$ of a discriminant function that misclassifies those observations only. To select from among those coefficient solutions, authors often introduce additional terms in the objective function. As an example, Bajgier and Hill [4] used a formulation similar to the following:


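$$
\begin{aligned}
\min\ & \sum_{g=1}^{2} \frac{\pi_g}{N_g} \sum_{n=1}^{N_g} \left[ C_g z_{gn} + \varepsilon_1 d^-_{gn} - \varepsilon_2 d^+_{gn} \right] \\
\text{s.t.}\ & X_1 w + w_0 \mathbf{1} + d^+_1 - d^-_1 - M z_1 \le \mathbf{0} \\
& X_2 w + w_0 \mathbf{1} - d^+_2 + d^-_2 + M z_2 \ge \mathbf{0} \\
& w, w_0 \text{ free};\ d^+_g, d^-_g \ge \mathbf{0};\ z_g \in \{0,1\}^{N_g}
\end{aligned}
$$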


The deviation variables $d^+_{gn}$ and $d^-_{gn}$ measure the amount by which each score falls on the correct and incorrect side of the zero cutoff, respectively. The objective function rewards the former and penalizes the latter, using small positive objective coefficients $\varepsilon_1$ and $\varepsilon_2$ to prevent improvements in these terms from inducing unnecessary misclassifications.

The motivation for formulation (1) is simple: if 
the training samples are representative of the overall 
population, the discriminant function that minimizes 
misclassification costs on the training samples should 
come close to minimizing expected misclassification 
cost on the overall population. Models like (1) tend to 
be computationally expensive, however. As is typical 
with mixed integer programming models, computation 
time increases modestly with the number of attributes 
(p) but more dramatically with the number of zero-one 
variables ($N_1 + N_2$, the combined sample size). Moreover, the constant M must be chosen large enough that the best choice of w and $w_0$ is not rendered infeasible
by a misclassified observation with score larger than 
M in magnitude; but the larger M is, the weaker the 
bounds in a branch-and-bound solution of the prob- 
lem, and thus the longer the solution time. Codato and 
Fischetti [7] reported success using a form of Benders 
cut to eliminate M. 
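The requirement on M can be made concrete. For a fixed $(w, w_0)$, a misclassified group-1 row needs score $- M \le 0$ and a misclassified group-2 row needs score $+ M \ge 0$, so M must be at least the magnitude of every misclassified score. The sketch below (hypothetical helper and data) computes that threshold: any smaller M makes this discriminant infeasible, while any larger M only loosens the relaxation bounds.

```python
def smallest_feasible_M(X1, X2, w, w0):
    """Smallest big-M for which (w, w0), with its implied z, satisfies (1).

    Group 1 rows need score - M*z <= 0 and group 2 rows need score + M*z >= 0,
    so every misclassified observation forces M >= |score|.
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + w0

    violations = [score(x) for x in X1 if score(x) > 0]
    violations += [-score(x) for x in X2 if score(x) < 0]
    return max(violations, default=0)


# Hypothetical data: (3, 3) in group 1 scores +3, (0, 1) in group 2 scores -2.
print(smallest_feasible_M([[0, 0], [3, 3]], [[2, 2], [0, 1]], w=[1, 1], w0=-3))  # 3
```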

In the special case where all attribute variables are 
discrete, it is likely that some observation vectors will 
appear more than once in the training samples. When 
that occurs, the number of zero-one variables can be re- 
duced from one per observation to one per distinguish- 
able observation, yielding a variation of (1) in which the 
objective function is replaced with 


$$\min \sum_{g=1}^{2} \frac{\pi_g C_g}{N_g} \sum_{k=1}^{K_g} N_{gk} z_{gk}.$$

In this formulation [3], $K_g$ is the number of distinct attribute vectors x in the training sample from group g, $N_{gk}$ is the number of repetitions of the kth distinct observation from group g, and the matrices $X_g$ contain only one copy of each such observation.
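The aggregation step can be sketched as follows (hypothetical function name). It produces the reduced rows of $X_g$ and the multiplicities $N_{gk}$; with p binary attributes, $K_g$ can never exceed $2^p$ however large $N_g$ is, which is what makes the reduction worthwhile.

```python
from collections import Counter


def collapse_duplicates(X):
    """Distinct attribute vectors of one group and their repetition counts N_gk.

    Returns (distinct, counts) in order of first appearance, so the reduced
    matrix X_g holds one copy of each vector.
    """
    counts = Counter(tuple(row) for row in X)
    distinct = list(counts)
    return distinct, [counts[v] for v in distinct]


distinct, n_gk = collapse_duplicates([[0, 1], [1, 1], [0, 1], [0, 1]])
print(distinct, n_gk)  # [(0, 1), (1, 1)] [3, 1]
```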


Multiple Groups 


When G > 2 groups are involved, the problem becomes 
considerably more complicated. In a practical applica- 
tion with multiple groups, it is plausible that misclassi- 
fication costs would depend not only on the group to 
which a misclassified point belonged but also the one 
into which it was classified. Thus an appropriate objective function might look like

$$\min \sum_{g=1}^{G} \frac{\pi_g}{N_g} \sum_{n=1}^{N_g} \sum_{h \ne g} C_{gh} z_{gnh},$$

where $C_{gh}$ is the cost of classifying a point from group g into group h and $z_{gnh}$ is 1 if the nth observation of group g is classified into group h and 0 otherwise. This represents a substantial escalation of the number of indicator variables. As a consequence, most research on the multiple group problem assumes that misclassification costs depend only on the correct group.

Few models, and fewer computational results, 
have been published for the multiple group problem. 
Gehrlein [9] presented one of the earliest scalar-valued 
mixed integer models for the case G > 2. The range 
of his discriminant function is partitioned into sepa- 
rate intervals corresponding to the groups. His model, 
adapted to the preceding notation, is 


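$$
\begin{aligned}
\min\ & \sum_{g=1}^{G} \frac{\pi_g C_g}{N_g} \mathbf{1}^\top z_g \\
\text{s.t.}\ & X_g w + w_0 \mathbf{1} - M z_g - U_g \mathbf{1} \le \mathbf{0} \\
& X_g w + w_0 \mathbf{1} + M z_g - L_g \mathbf{1} \ge \mathbf{0} \\
& U_g - L_g \ge 0 \\
& L_h - U_g + M y_{hg} \ge \varepsilon \\
& y_{gh} + y_{hg} = 1 \\
& w, w_0, L, U \text{ free};\ z_g \in \{0,1\}^{N_g};\ y \in \{0,1\}^{G(G-1)}
\end{aligned}
$$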


The first three constraints are repeated for g = 1, ..., G, while the next two are repeated for all pairs g, h = 1, ..., G such that g ≠ h. Observations are classified into group g if their scores fall in the interval $[L_g, U_g]$. Variable $y_{gh} = 1$ if the scoring interval for group g precedes that for group h. Parameter ε > 0 dictates a minimum separation between intervals.
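Once the model has been solved, classification only requires locating a score among the intervals. The sketch below (hypothetical names and interval values) applies that rule and checks the ε-separation between consecutive intervals; a score that falls in a gap between intervals is left unassigned (None).

```python
def classify_by_interval(score, intervals):
    """Group g whose scoring interval [L_g, U_g] contains the score, or None if it falls in a gap."""
    for g, (lower, upper) in intervals.items():
        if lower <= score <= upper:
            return g
    return None


def intervals_separated(intervals, eps):
    """Check that consecutive intervals, ordered along the line, are at least eps apart."""
    ordered = sorted(intervals.values())
    return all(nxt[0] - cur[1] >= eps for cur, nxt in zip(ordered, ordered[1:]))


# Hypothetical intervals for G = 3 groups.
intervals = {1: (-5.0, -1.0), 2: (0.0, 2.0), 3: (3.0, 6.0)}
print(classify_by_interval(1.5, intervals), classify_by_interval(-0.5, intervals))  # 2 None
print(intervals_separated(intervals, eps=0.5))  # True
```

The ordering of the intervals along the line is exactly what the binary variables $y_{gh}$ encode in the model.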

Using a single scalar-valued discriminant function 
with G > 2 groups is restrictive; it assumes that the 
groups project onto some line in an orderly manner. 
In [9], Gehrlein also suggested a model using a vector-valued discriminant function f(·) of dimension G. Observation x would be classified into the group corresponding to the largest component of f(x). The model
increases the number of coefficient variables and the 
number of constraints but not the number of 0-1 vari- 
ables, the primary determinant of execution time. The 
model is: 


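$$
\begin{aligned}
\min\ & \sum_{g=1}^{G} \frac{\pi_g C_g}{N_g} \sum_{n=1}^{N_g} z_{gn} \\
\text{s.t.}\ & X_g w_g + w_{g0} \mathbf{1} - X_g w_h - w_{h0} \mathbf{1} + M z_g \ge \varepsilon \mathbf{1} \\
& w_g, w_{g0} \text{ free};\ z_g \in \{0,1\}^{N_g}
\end{aligned}
$$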


Here $w_g^\top x + w_{g0}$ is the gth component of f(x) and ε > 0 is the minimum acceptable difference between the correct component of the scoring function and the largest incorrect component. The sole constraint is repeated once for each pair g, h = 1, ..., G such that g ≠ h.
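The classification rule and the margin condition of this model can be sketched as follows; names and coefficients are hypothetical, and groups are indexed from 0 for convenience. `meets_margin` returns True exactly when the model's constraints hold for an observation with $z_{gn} = 0$.

```python
def component_scores(x, W, w0):
    """Components w_g'x + w_g0 of the vector-valued scoring function."""
    return [sum(a * b for a, b in zip(wg, x)) + bg for wg, bg in zip(W, w0)]


def classify_vector(x, W, w0):
    """Index of the group with the largest component of f(x)."""
    scores = component_scores(x, W, w0)
    return max(range(len(scores)), key=scores.__getitem__)


def meets_margin(x, g, W, w0, eps):
    """True if component g exceeds every other component by at least eps."""
    scores = component_scores(x, W, w0)
    return all(scores[g] - scores[h] >= eps for h in range(len(scores)) if h != g)


# Hypothetical coefficients for G = 3 groups and p = 2 attributes.
W, w0 = [[1, 0], [0, 1], [-1, -1]], [0, 0, 0]
print(classify_vector([2, 1], W, w0))           # 0
print(meets_margin([2, 1], 0, W, w0, eps=0.5))  # True
print(meets_margin([2, 1], 0, W, w0, eps=1.5))  # False
```

An observation can be classified correctly by the argmax rule yet still require $z_{gn} = 1$ in the model if its winning margin is below ε.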


Methods 


Advances in computer hardware, optimization software and algorithms for the mixed integer classification problem have allowed progressively larger training samples to be employed: where Koehler and Erenguc [11] were restricted to combined training samples of 100 in 1990 (on a mainframe), Rubin [14] was able to handle over 600 observations in 1997 (on a personal computer). Nonetheless, a variety of heuristics
have been developed to find near optimal solutions to 
the problem. Several revolve around this property of 
the problem: if the training samples can be classified 
with perfect accuracy by a linear function, then problem (1) can be solved as a linear program, with the $z_{gn}$ deleted, to obtain a discriminant function. Deletion of the $z_{gn}$ reduces the objective function to a constant 0. Although this is perfectly acceptable, heuristics may
substitute an objective function from one of the linear 
programming classification models, to encourage the 
chosen discriminant function to separate scores of the 
two groups as much as possible. This often also neces- 
sitates inclusion of a normalization constraint, to keep 
the resulting linear program from being unbounded. 
Alternatively, (1) may be solved heuristically to deter- 
mine which training observations to misclassify, and 
then a linear programming model using the remaining 
observations may be employed to select the final dis- 
criminant function. 

The BPMM heuristic of [11] solves the linear pro- 
gram dual to a relaxation of the mixed integer prob- 
lem, notes which observations would be misclassified 
by the resulting discriminant function, and then solves 
the dual of each linear relaxation obtainable by delet- 
ing one of those observations. Solving the dual problem 
tends to be more efficient than solving the primal, since 
there will typically be more observations than attributes 
($N_1 + N_2 > p$). The heuristics presented in [14] also
operate on the dual of the linear relaxation of the mixed 
integer problem, restricting basis entry to force certain 
dual variables to take value zero (equivalent to relaxing 
the corresponding primal constraints, thus allowing the 
associated observations to be misclassified). 

As noted earlier, comparatively few computational 
studies involve mixed integer models for multiple 
groups. Pavur proposed a sequential mixed integer 
method to handle multiple groups [12], constructing 
a vector-valued scoring function from a sequence of 
scalar functions. An initial mixed integer model sim- 
ilar to Gehrlein’s is solved to obtain the first scalar 
function. Thereafter, a sequence of similar mixed in- 
teger models is solved, with each model bearing addi- 
tional constraints compelling the scores produced by 
the next scoring function to have sample covariance 
zero with the scores of each of the preceding functions. 
The covariance constraints impose a sort of probabilis- 
tic “orthogonality” on the dimensions of the composite 
(vector-valued) scoring function. 
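The covariance condition can be evaluated directly. The sketch below (hypothetical names and data) computes the sample covariance between the scores of an earlier scoring function and two candidate successors. In Pavur's method the condition is a constraint inside each mixed integer model; here it is only evaluated. Because scores are linear in $(w, w_0)$ and $w_0$ cancels against the mean, requiring zero covariance with a fixed earlier score vector is a linear constraint on the new coefficients.

```python
def scores(X, w, w0):
    """Scores x'w + w0 of every row of X under one scalar scoring function."""
    return [sum(a * b for a, b in zip(row, w)) + w0 for row in X]


def sample_covariance(a, b):
    """Unbiased sample covariance of two equal-length score lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


# Hypothetical sample, an earlier function x_1 and two candidate successors.
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
s1 = scores(X, [1, 0], 0)   # earlier function: x_1
s2 = scores(X, [0, 1], 0)   # candidate: x_2 (zero covariance, admissible)
s3 = scores(X, [1, 1], 0)   # candidate: x_1 + x_2 (covariance 2/3, excluded)
print(sample_covariance(s1, s2), sample_covariance(s1, s3))
```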


See also 


> Deterministic and Probabilistic Optimization 
Models for Data Classification 
> Integer Programming 




> Linear Programming Models for Classification 
> Optimization in Boolean Classification Problems 
> Statistical Classification: Optimization Approaches 


References 


1. Amaldi E, Kann V (1995) The complexity and approximability of finding maximum feasible subsystems of linear relations. Theor Comput Sci 147:181-210

2. Asparouhov O, Rubin PA (2004) Oscillation heuristics for 
the two-group classification problem. J Classif 21:255-277 

3. Asparoukhov OK, Stam A (1997) Mathematical program- 
ming formulations for two-group classification with binary 
variables. Ann Oper Res 74:89-112 

4. Bajgier SM, Hill AV (1982) An experimental comparison of 
statistical and linear programming approaches to the dis- 
criminant problem. Decis Sci 13:604-618 

5. Bennett KP, Bredensteiner EJ (1997) A parametric optimization method for machine learning. INFORMS J Comput 9(3):311-318

6. Chinneck JW (2001) Fast heuristics for the maximum fea- 
sible subsystem problem. INFORMS J Comput 13(3):210- 
223 

7. Codato G, Fischetti M (2006) Combinatorial Benders’ cuts 
for mixed-integer linear programming. Oper Res 54(4): 
756-766 

8. Cortes C, Vapnik V (1995) Support-vector networks. Mach 
Learn 20(3):273-297 

9. Gehrlein WV (1986) General mathematical programming 
formulations for the statistical classification problem. Oper 
Res Lett 5:299-304 

10. Hand DJ (1997) Construction and assessment of classifica- 
tion rules. Wiley, Chichester 

11. Koehler GJ, Erenguc SS (1990) Minimizing misclassifica- 
tions in linear discriminant analysis. Decis Sci 21:63-85 

12. Pavur R (1997) Dimensionality representation of linear dis- 
criminant function space for the multiple-group problem: 
An MIP approach. Ann Oper Res 74:37-50 

13. Rubin PA (1990) Heuristic solution procedures for a mixed- 
integer programming discriminant model. Manag Decis 
Econ 11:255-266 

14. Rubin PA (1997) Solving mixed integer classification prob- 
lems by decomposition. Ann Oper Res 74:51-64 

15. Soltysik R, Yarnold P (1994) The Warmack-Gonzalez algo- 
rithm for linear two-category multivariable optimal dis- 
criminant analysis. Comput Oper Res 21:735-745 

16. Vapnik V (1999) An overview of statistical learning theory. 
IEEE Trans Neural Netw 10:988-999 

17. Vapnik V, Chervonenkis A (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theor Probab Appl 16:264-280

18. Warmack R, Gonzalez R (1973) An algorithm for the op- 
timal solution of linear inequalities and its application to 
pattern recognition. IEEE Trans Comput C22:1065-1075 


Mixed Integer Linear Programming: 
Heat Exchanger Network Synthesis 


KEMAL SAHIN, KORHAN GURSOY, AMY CIRIC 
Department Chemical Engineering, 
University Cincinnati, Cincinnati, USA 


MSC2000: 90C90 


Article Outline 


Keywords 
Using MILP Models 
to Find the Minimum Number of Units 
Conclusions 
See also 
References 


Keywords 
MILP; HEN synthesis; Transshipment model 


Heat exchanger networks use the waste heat released by 
hot process streams to heat the cold process streams of 
a chemical manufacturing plant, reducing utility costs 
by as much as 80%. Heat exchanger network synthesis 
has been an active area of process research ever since 
the energy crisis of the 1970s, and over 400 research papers have been published in the area. See [1,2,4,5,6] for recent reviews.

In 1979, T. Umeda et al. [8] discovered a thermo- 
dynamic pinch point that limits the energy savings of 
a heat exchanger network, establishes minimum util- 
ity levels, and partitions the heat exchanger network 
into two independent subnetworks. This discovery rev- 
olutionized heat exchanger network synthesis: with it, 
designers could compute utility levels a priori, then 
seek the heat exchanger network structure that uses the 
minimum utility consumption while also minimizing 
the total investment cost. This remaining problem re- 
quires matching the hot utilities and process streams 
that release heat with the cold process streams and util- 
ities that require heat, choosing the network structure 
of each stream, and designing the individual heat ex- 
changer networks. In general, this is a mixed integer 
nonlinear programming problem (MINLP), but can be 
decomposed into two smaller problems by first select- 
ing the matches between hot and cold process streams 
and utilities by minimizing the total number of units,
then optimizing the network structure. The first prob- 
lem is a mixed integer linear programming problem 
that will be discussed in detail here. 


Using MILP Models 
to Find the Minimum Number of Units 


Stated formally, the minimum-units problem is:
Given
1) a set of hot process streams and utilities $i \in H$, and for each hot stream i:
   a) the inlet and outlet temperatures $T_i^{in}$ and $T_i^{out}$;
   b) either the heat capacity flow rate $FCp_i$ or the heat duty $Q_i$;
2) a set of cold process streams $j \in C$, and for each cold stream j:
   a) the inlet and outlet temperatures $T_j^{in}$ and $T_j^{out}$;
   b) either the heat capacity flow rate $FCp_j$ or the heat duty $Q_j$;
3) the minimum temperature difference between hot and cold streams exchanging heat, $\Delta T_{min}$;
identify a set of stream matches (i, j) and their heat duties $Q_{ij}$ that
a) meet the heating and cooling needs of each stream; and
b) minimize the total number of matches.

S.A. Papoulias and I.E. Grossmann [7] formulated
this as a mixed integer programming problem using 
a transshipment model, by making an analogy between 
heat exchanger networks and transportation networks. 
In the transshipment analogy, hot process streams, the 
sources of heat, are similar to manufacturing plants, the 
sources of goods, while cold process streams, the heat 
sinks, are akin to stores and shopping malls, the sinks 
of manufactured goods. 

The analogy is not perfect, as heat only flows from 
a high temperature to a lower one, in obedience to the 
second law of thermodynamics. Partitioning the tem- 
perature range of the heat exchanger network into in- 
tervals can capture this heat flow pattern. Each interval 
sends excess, or residual, heat to the interval below it, 
just as excess manufactured goods are sent to a discount 
warehouse. 

The hot side of this temperature cascade is created by ordering the hot inlet temperatures $T_i^{in}$ and the shifted cold inlet temperatures $T_j^{in} + \Delta T_{min}$ from the highest to the lowest value, creating t = 1, ..., TI temperature intervals. Temperatures on the cold side of the cascade equal the temperatures on the hot side minus $\Delta T_{min}$.

Mixed Integer Linear Programming: Heat Exchanger Network Synthesis, Table 1
Stream data. Q_CW = 8395.2 kW, ΔT_min = 10°C

Stream   T^in (°C)   T^out (°C)   FCp (kW/K)
H1       159         77           228.5
H2       159         88           20.4
H3       159         90           53.8
C1       26          127          93.3
C2       118         149          196.1
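Hot stream i releases $Q^H_{it}$ units of heat to temperature interval t, which spans the hot-side temperatures $T_{t-1} > T_t$. $Q^H_{it}$ is equal to

$$
Q^H_{it} = \begin{cases}
FCp_i\,(T_{t-1} - T_t) & \text{if } T_i^{in} \ge T_{t-1} \text{ and } T_i^{out} \le T_t,\\
FCp_i\,(T_{t-1} - T_i^{out}) & \text{if } T_i^{in} \ge T_{t-1} \text{ and } T_t < T_i^{out} < T_{t-1},\\
0 & \text{otherwise.}
\end{cases}
$$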


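Cold stream j absorbs $Q^C_{jt}$ units of heat from temperature interval t. $Q^C_{jt}$ equals

$$
Q^C_{jt} = \begin{cases}
FCp_j\,(T_{t-1} - T_t) & \text{if } T_j^{in} \le T_t - \Delta T_{min} \text{ and } T_j^{out} \ge T_{t-1} - \Delta T_{min},\\
FCp_j\,(T_j^{out} - T_t + \Delta T_{min}) & \text{if } T_j^{in} \le T_t - \Delta T_{min} \text{ and } T_t - \Delta T_{min} < T_j^{out} < T_{t-1} - \Delta T_{min},\\
0 & \text{otherwise.}
\end{cases}
$$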
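The case analysis above amounts to multiplying FCp by the part of the stream's temperature range that falls inside each interval, because by construction every inlet temperature is an interval boundary. The Python sketch below uses that overlap form. It is an illustration, not code from the article, and the function names are hypothetical. Applied to the Table 1 data, it reproduces the H2 and C1 entries of Tables 2 and 3.

```python
def cascade_boundaries(hot_inlets, cold_inlets, dt_min):
    """Hot-side interval boundaries T_0 > T_1 > ... > T_TI.

    Built from the hot inlet temperatures and the cold inlet temperatures
    shifted up by dt_min, sorted from highest to lowest without duplicates.
    """
    return sorted(set(hot_inlets) | {t + dt_min for t in cold_inlets}, reverse=True)


def _loads(high, low, fcp, bounds):
    """FCp times the length of [low, high] that falls inside each interval of bounds."""
    return [fcp * max(0.0, min(high, top) - max(low, bottom))
            for top, bottom in zip(bounds, bounds[1:])]


def hot_loads(t_in, t_out, fcp, bounds):
    """Q^H_it for a hot stream cooled from t_in to t_out."""
    return _loads(t_in, t_out, fcp, bounds)


def cold_loads(t_in, t_out, fcp, bounds, dt_min):
    """Q^C_jt for a cold stream heated from t_in to t_out, on the cold-side scale."""
    return _loads(t_out, t_in, fcp, [t - dt_min for t in bounds])


# Example 1 (Table 1): three hot inlets at 159 C, cold inlets at 26 C and 118 C.
bounds = cascade_boundaries([159, 159, 159], [26, 118], 10)
print(bounds)                                 # [159, 128, 36]
print(hot_loads(159, 88, 20.4, bounds))       # H2: about [632.4, 816.0]
print(cold_loads(26, 127, 93.3, bounds, 10))  # C1: about [839.7, 8583.6]
```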


Any excess heat sent to interval t from hot stream i cascades down to interval t+1 through the residual flow $R_{it}$. Process utilities may be treated as process streams, or may be placed at the top or bottom of the cascade.

This transshipment model of heat flow leads to the 
following mixed integer linear programming problem: 


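$$\min \sum_{i,j} y_{ij}$$

subject to

$$
\begin{aligned}
& R_{it} - R_{i,t-1} + \sum_{j} q_{ijt} = Q^H_{it}, && i = 1,\dots,H,\ t = 1,\dots,TI, && (1)\\
& \sum_{i} q_{ijt} = Q^C_{jt}, && j = 1,\dots,C,\ t = 1,\dots,TI, && (2)\\
& Q_{ij} = \sum_{t \in TI} q_{ijt}, && i = 1,\dots,H,\ j = 1,\dots,C, && (3)\\
& Q_{ij} \le U_{ij}\, y_{ij}, && i = 1,\dots,H,\ j = 1,\dots,C, && (4)\\
& q_{ijt} \ge 0,\ R_{it} \ge 0, && i = 1,\dots,H,\ j = 1,\dots,C,\ t = 1,\dots,TI, && (5)\\
& R_{i0} = R_{i,TI} = 0, && i = 1,\dots,H, && (6)\\
& y_{ij} \in \{0, 1\}, && i = 1,\dots,H,\ j = 1,\dots,C. && (7)
\end{aligned}
$$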


In this formulation, $y_{ij}$ is a binary variable which is one if a match between hot process stream i and cold process stream j occurs, and zero otherwise; $q_{ijt}$ is the amount of heat exchanged between hot stream i and cold stream j in temperature interval t; $R_{it}$ is the residual heat flow associated with hot stream i that cascades down from temperature interval t to temperature interval t+1; and $Q_{ij}$ is the heat duty of match (i, j). The overall objective function minimizes the total number of units. Constraint (1) is the energy balance for hot stream i around temperature interval t, and constraint (2) is the energy balance for cold stream j around temperature interval t. Constraint (3) gives the overall heat duty of match (i, j). Constraint (4) forces this heat duty to zero when match (i, j) does not exist. The nonnegativity constraints prevent heat from flowing from a low temperature to a higher one. Note that, by (6), the residual heat flows into the first temperature interval and out of the last temperature interval are zero when there are no utilities above or below the cascade. The objective function and the constraints are linear, and the formulation involves both continuous and integer variables, making this a mixed integer linear programming problem.


Lower bounds on the solution of this problem are given by linear programming problems where some integer variables are fixed to either zero or one and the remainder are treated as continuous variables. The accuracy of these bounds depends upon the parameters $U_{ij}$ in the fourth constraint. When these parameters are very large, the lower bounds will be quite far from the solution of the MILP.

The smallest acceptable value of $U_{ij}$ is the minimum of the cooling requirements of stream i and the heating requirements of stream j:

$$U_{ij} = \min\left( \sum_{t \in TI} Q^H_{it},\ \sum_{t \in TI} Q^C_{jt} \right).$$
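The bound $U_{ij}$ is simple to tabulate from the interval loads. The sketch below (hypothetical function name) does so for Example 1, using the loads of Tables 2 and 3, with H1's second-interval load taken as 11653.5 kW (228.5 kW/K × 51 K), the value consistent with the match duties in Table 4.

```python
def match_upper_bounds(hot, cold):
    """U_ij = min(sum_t Q^H_it, sum_t Q^C_jt) for every hot/cold pair."""
    return {(i, j): min(sum(qh), sum(qc))
            for i, qh in hot.items() for j, qc in cold.items()}


# Interval loads of Example 1 in kW (first and second temperature interval).
hot = {"H1": [7083.5, 11653.5], "H2": [632.4, 816.0], "H3": [1667.8, 2044.4]}
cold = {"C1": [839.7, 8583.6], "C2": [6079.1, 0.0], "CW": [0.0, 8395.2]}
U = match_upper_bounds(hot, cold)
print(round(U["H1", "C1"], 1), round(U["H2", "CW"], 1))  # 9423.3 1448.4
```

Using these minimal values rather than an arbitrary large constant is what keeps the LP lower bounds close to the MILP optimum.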


Example 1 This example is from [3] and features three hot streams, two cold streams, and a cold utility. Table 1 gives the inlet and outlet stream temperatures and the heat capacity flow rates of each process stream, and the cooling water duty.

Temperatures on the hot side of the cascade are 
159°C, 128°C, and 36°C, while temperatures on the cold 
side are 149°C, 118°C and 26°C. There are two temper- 
ature intervals. Table 2 gives the heat released from hot 
streams to the temperature intervals, while Table 3 gives 
the heat absorbed by the cold streams from the temper- 
ature intervals. 
Mixed Integer Linear Programming: Heat Exchanger Network Synthesis, Table 2
Q^H_it, heat released from hot stream i to temperature interval t (kW)

Stream   TI-1     TI-2
H1       7083.5   11653.5
H2       632.4    816.0
H3       1667.8   2044.4
Mixed Integer Linear Programming: Heat Exchanger Network Synthesis, Table 3
Q^C_jt, heat absorbed by cold stream j from temperature interval t (kW)

Stream   TI-1     TI-2
C1       839.7    8583.6
C2       6079.1   0
CW       0        8395.2
Mixed Integer Linear Programming: Heat Exchanger Network Synthesis, Table 4
Four solutions which satisfy the minimum number of matches (heat duties Q_ij in kW; "-" = match not used)

Match    Solution 1   Solution 2   Solution 3   Solution 4
H1-C1    9423.3       7974.9       5711.1       4262.7
H1-C2    6079.1       6079.1       6079.1       6079.1
H1-CW    3234.6       4683.0       6946.8       8395.2
H2-C1    -            1448.4       -            1448.4
H2-CW    1448.4       -            1448.4       -
H3-C1    -            -            3712.2       3712.2
H3-CW    3712.2       3712.2       -            -
In this example, the minimum number of units is 5, and there are four solutions to this MILP that meet this minimum (cf. Table 4).
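For a problem this small, the MILP result can be cross-checked by enumeration. The sketch below is an illustration, not the solution method of [7]. For each candidate set of matches, it checks whether the heat cascade can route all heat released to the cold streams when residual flows may only move downward and exchanges are allowed only for selected matches. That check is a maximum-flow problem, solved here with a textbook Edmonds-Karp routine. Loads are the interval values of Tables 2 and 3 in tenths of a kW, so the arithmetic is exact, and H1's second-interval load is taken as 11653.5 kW, the value the Table 4 duties imply. The enumeration finds a minimum of 5 units and exactly the four match sets of Table 4.

```python
from collections import defaultdict, deque
from itertools import combinations

# Interval heat loads in tenths of a kW (first, second interval).
HOT = {"H1": [70835, 116535], "H2": [6324, 8160], "H3": [16678, 20444]}
COLD = {"C1": [8397, 85836], "C2": [60791, 0], "CW": [0, 83952]}
INF = float("inf")


def max_flow(cap, source, sink):
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Bottleneck capacity along the path found by breadth-first search.
        push, v = INF, sink
        while parent[v] is not None:
            push = min(push, cap[parent[v]][v])
            v = parent[v]
        # Update residual capacities in both directions.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= push
            cap[v][u] = cap[v].get(u, 0) + push
            v = u
        total += push


def feasible(matches):
    """True if every cold load can be met using only the given (hot, cold) matches."""
    cap = defaultdict(dict)
    for i, loads in HOT.items():
        for t, q in enumerate(loads):
            cap["src"][(i, t)] = q                # heat released by i in interval t
            if t + 1 < len(loads):
                cap[(i, t)][(i, t + 1)] = INF     # residual R_it cascades downward
            for j in COLD:
                if (i, j) in matches:
                    cap[(i, t)][(j, t)] = INF     # exchange q_ijt allowed
    for j, loads in COLD.items():
        for t, q in enumerate(loads):
            cap[(j, t)]["snk"] = q                # heat absorbed by j in interval t
    demand = sum(map(sum, COLD.values()))
    return max_flow(cap, "src", "snk") == demand


def minimum_units():
    """Smallest number of matches admitting a feasible cascade, and all such match sets."""
    pairs = [(i, j) for i in HOT for j in COLD]
    for k in range(1, len(pairs) + 1):
        found = [set(c) for c in combinations(pairs, k) if feasible(set(c))]
        if found:
            return k, found
    return None, []


k, solutions = minimum_units()
print(k, len(solutions))  # 5 4
```

Enumeration grows exponentially with the number of candidate matches, which is why the MILP formulation is used in practice.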


Conclusions 


Mixed integer linear programs are used in heat ex- 
changer network synthesis to identify the minimum 
number of units, and a set of matches and their heat 
loads meeting the minimum. These MILPs are based 
upon a transshipment model of heat flow. 
Mixed Integer Linear Programming: Mass and Heat Exchanger Networks

Separation networks involving mass transfer operations that do not require energy (e.g. absorption, liquid-liquid extraction, ion exchange) are characterized as mass exchange networks (MEN). These appear in the chemical industries mostly in waste treatment, but also in feed preparation, product separation, recovery of valuable materials, etc. A mass exchanger, in this context, is any countercurrent, direct-contact mass transfer unit where one or more components are transferred at constant temperature and pressure from one process stream, characterized as the rich stream, to another process or utility stream, characterized as the lean stream. Mass integration aims at the purification of the rich streams and the recovery of valuable or hazardous materials at minimum total cost (investment and operating cost of auxiliary streams). In the special case where the mass transfer operations take place at the same temperature, or heating/cooling requirements are negligible, the integration problem is limited to the synthesis of a mass exchanger network (MEN) only. When mass exchange operations at different temperature levels are encountered, mass and heat exchanger networks (MHEN) may be considered simultaneously.
MEN synthesis involves a set of rich streams, in terms of one or more components, $R = \{i : i = 1, \dots, N_R\}$, with known flowrates $G_i$ and inlet and outlet compositions $y^s_{ic}$, $y^t_{ic}$ (exact values or bounds) for the components of interest c, and a set of process or auxiliary lean streams (mass separating agents, MSAs), $S = \{j : j = 1, \dots, N_S\}$, with known cost and inlet and outlet compositions $x^s_{jc}$, $x^t_{jc}$ for the same components (exact values or bounds), as shown in Fig. 1.

The synthesis problem refers to the selection of the 
appropriate lean streams and their flowrates $L_j$, the
mass exchange operations (mass exchange matches), the 
mass transfer load for each separator and its required 
size, and the configuration of the overall network. 

Mass transfer in each mass exchanger is governed by the first and second thermodynamic laws, as is heat transfer in heat exchangers. Mass transfer of a component c from a rich to a lean stream is feasible if the composition of c in the rich phase is greater than the equilibrium composition with respect to the lean phase:

$$y_c \ge f(x_c) + \varepsilon, \qquad (1)$$

where $f(x_c)$ is the equilibrium relation and ε is a minimum composition difference that ensures feasible mass transfer in a separator of finite size, in analogy to $\Delta T_{min}$ in heat exchangers. This analogy led to the development of synthesis methods for mass exchanger networks employing mixed integer optimization techniques, similar to those for heat exchanger networks (cf. Mixed Integer Linear Programming: Mass and Heat Exchanger Networks; MINLP: Mass and Heat Exchanger Networks), which are categorized into sequential synthesis and simultaneous synthesis methods.

Mixed Integer Linear Programming: Mass and Heat Exchanger Networks, Figure 1
Mass exchange network with rich streams R = {i | i = 1, ..., N_R} (flowrates G_i) and lean streams S = {j | j = 1, ..., N_S} (flowrates L_j)

The sequential MEN synthesis method, introduced in [3] and [4], involves the following steps:

1) Minimum cost of mass separating agents (minimum utility problem), to determine the optimal flows of the mass separating agents.
2) Minimum number of mass exchanger units, for fixed MSA flows, to determine the mass exchange matches.
3) Network configuration and separator sizes for fixed mass exchange operations.

The first two synthesis steps involve the solution of linear and mixed integer linear problems.

A useful tool of the sequential MEN synthesis method is the composition interval diagram (CID), where the thermodynamic feasibility of mass transfer is explored by mapping the rich and the lean streams onto equivalent composition scales derived from the mass transfer feasibility requirement (1). In general, the equivalent composition scales and the minimum composition difference ε are defined for each component of interest and each pair of rich and lean streams. In the simple case of a single component, where mass transfer is independent of the presence of other components in the rich streams, the CID is constructed as illustrated in Fig. 2.

Feasible rich-to-lean mass transfer is guaranteed 
within a composition interval when the equilibrium re- 
lation f(x.) is convex within the interval. When f(x,) is 
convex in the whole composition range, only inlet com- 
positions are required to construct the CID [8]. 
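For a single component with a linear equilibrium relation, constructing the CID reduces to mapping every lean composition onto the rich scale and sorting the levels. The sketch below assumes a hypothetical linear equilibrium f(x) = m x + b with m > 0 and takes relation (1) at equality, so lean composition x sits at rich composition m x + b + ε. Stream data and names are illustrative. It uses both supply and target compositions, which is the general case; for an f that is convex over the whole range, inlet compositions suffice.

```python
def cid_boundaries(rich, lean, m, b, eps, tol=1e-12):
    """Composition interval boundaries on the rich scale y, highest first.

    rich: (y_supply, y_target) per rich stream; lean: (x_supply, x_target) per lean stream.
    Assumes a linear equilibrium f(x) = m*x + b, so lean composition x sits at
    y = m*x + b + eps on the rich scale (relation (1) taken at equality).
    """
    ys = [y for pair in rich for y in pair]
    ys += [m * x + b + eps for pair in lean for x in pair]
    bounds = []
    for y in sorted(ys, reverse=True):
        if not bounds or bounds[-1] - y > tol:   # merge coincident levels
            bounds.append(y)
    return bounds


def lean_scale(y, m, b, eps):
    """Lean composition that lies on the same CID level as rich composition y."""
    return (y - b - eps) / m


# Hypothetical data: one rich stream 0.10 -> 0.02, one lean stream 0.0 -> 0.03.
levels = cid_boundaries([(0.10, 0.02)], [(0.0, 0.03)], m=2.0, b=0.0, eps=0.01)
print([round(y, 4) for y in levels])                              # [0.1, 0.07, 0.02, 0.01]
print([round(lean_scale(y, 2.0, 0.0, 0.01), 4) for y in levels])  # [0.045, 0.03, 0.005, 0.0]
```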

The minimum cost of mass separating agents is found employing a transshipment model, where the components of interest are the transferred commodities, the rich and the lean streams are considered as sources and sinks, respectively, and the composition intervals define the intermediate nodes [4]. The model involves mass balances around the composition intervals (intermediate nodes):

$$
\begin{aligned}
\min\ & \sum_{j} c_j L_j \\
\text{s.t.}\ & \delta_{k-1} + \sum_{i \in R_k} W^R_{ik} = \delta_k + \sum_{j \in S_k} W^S_{jk}, \quad k = 1, \dots, N_{int} \\
& 0 \le L_j \le L_j^{u}, \quad j \in S \\
& \delta_0 = \delta_{N_{int}} = 0 \\
& \delta_k \ge 0, \quad k = 1, \dots, N_{int} - 1
\end{aligned}
\qquad (\text{TP1})
$$
where

• $R_k$ is the set of rich streams present in interval k;
• $S_k$ is the set of lean streams present in interval k;
• $N_{int}$ is the number of composition intervals;
• $W^R_{ik}$ is the mass exchange load of rich stream i in interval k,
  $$W^R_{ik} = G_i\,\bigl(y_k - \max(y_{k+1}, y^t_i)\bigr);$$
• $W^S_{jk}$ is the mass exchange load of lean stream j in interval k,
  $$W^S_{jk} = L_j\,\bigl(\min(x^t_j, x_{jk}) - x_{j,k+1}\bigr);$$
• $\delta_k$ is the residual mass exchange load in interval k.

Mixed Integer Linear Programming: Mass and Heat Exchanger Networks, Figure 2
Composition interval diagram
Problem (TP1) results in the optimal flows of the mass 
separating agents and the identification of the pinch 
points, i.e. the thermodynamic bottlenecks in mass 
transfer. The pinch points are defined by zero residual 
flows and divide the mass exchange network into sub- 
networks. Mass transfer between different subnetworks 
(i.e. across the pinch) increases the cost of mass sepa- 
rating agents. 
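For fixed lean flows, the balances of (TP1) determine the residuals one interval at a time, which makes the role of the pinch easy to see. The sketch below (hypothetical names, with interval loads already summed over streams) computes $\delta_k$ from the top interval down. It checks $\delta_k \ge 0$ and $\delta_{N_{int}} = 0$, and it reports as pinch points the interior intervals whose residual is zero.

```python
def residual_cascade(rich_loads, lean_loads, tol=1e-9):
    """Residuals delta_k of (TP1) for fixed lean flows, top interval first.

    rich_loads[k-1] = sum_i WR_ik and lean_loads[k-1] = sum_j WS_jk for k = 1..N_int.
    Balance: delta_k = delta_{k-1} + rich load - lean load, with delta_0 = 0.
    """
    residuals, delta = [], 0.0
    for r, s in zip(rich_loads, lean_loads):
        delta += r - s
        residuals.append(delta)
    feasible = all(d >= -tol for d in residuals) and abs(residuals[-1]) <= tol
    # Interior boundaries (k = 1..N_int - 1) with zero residual are pinch points.
    pinches = [k for k, d in enumerate(residuals[:-1], start=1) if abs(d) <= tol]
    return residuals, feasible, pinches


print(residual_cascade([3, 5, 2], [1, 7, 2]))  # ([2.0, 0.0, 0.0], True, [2])
print(residual_cascade([1, 5], [3, 3]))        # ([-2.0, 0.0], False, [])
```

In the second call, the lean streams would have to receive mass in the top interval that is only available lower down, which the nonnegativity of $\delta_k$ forbids.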

An assumption in (TP1) is that molar flows of the 
rich and the lean streams are constant. If significant 
flowrate variations take place, compositions and mass 
exchange loads are calculated based on nontransferable 
components. 

The following cases are distinguished:

• Fixed inlet and outlet compositions. Then (TP1) is an LP problem. When multiple components are considered, the CID is defined for all the components of interest and (TP1) corresponds to the multicommodity transshipment model. The pinch points are then determined by the component that requires the largest MSA flows.
• Variable outlet compositions. Then the mass exchange loads of the rich and lean streams in their final intervals (defined by the upper and lower bounds on their outlet compositions) are variables. Problem (TP1) can still be solved as an LP [9], considering the variable mass exchange loads explicitly in the model.

Variable inlet compositions usually require flexible mass exchange networks to accommodate the variations and define a different problem. For a single component it has been shown that the minimum MSA cost corresponds to the lower bounds of the inlet compositions [8].

For nonconvex equilibrium relations, (TP1) cannot 
guarantee feasible mass transfer throughout the com- 
position range, while the predicted MSA cost is a lower 
bound to the actual minimum one. B.K. Srinivas and 
M.M. El-Halwagi suggested in [14] an iterative proce- 
dure to calculate the minimum required MSA cost, that 
involves two major steps: 

i) a ‘feasibility problem’, where ‘critical’ composition 
levels are identified and included in the CID (non- 
convex NLP step, that requires global optimization 
methods), and 

ii) (TP1) with updated intervals, which calculates in- 
creasing lower bounds to the minimum MSA cost. 

Instead of target outlet compositions for the rich streams, it may be of interest to remove a certain total mass load of pollutants. Then (TP1) is solved with variable rich outlets and a fixed total mass exchange load [10]:

$$M_c = \sum_i G_i\,(y^s_{ic} - y^t_{ic}).$$

The minimum-utility-cost problem has been alternatively formulated as an LP or MINLP problem, based on total mass balances and the following property:

$$\Big(\text{mass lost by all the rich streams below each pinch point candidate}\Big) - \Big(\text{mass gained by all the lean streams below each pinch point candidate}\Big) \le 0 \qquad (2)$$


and employing binary variables to denote the relative 
position of variable outlet compositions with respect to 
each pinch point candidate in the CID [5,6,8,9]. 
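As a rough illustration, property (2) might be checked for a single component as in the Python sketch below. It assumes that each pinch point candidate is given as a pair of compositions, one on the rich scale and one on the lean scale, and that each stream exchanges mass in proportion to its flow over its own composition range; the function names and the tuple layout are hypothetical, not part of the cited formulations.

```python
def mass_below(flow, c_low, c_high, c_pinch):
    """Mass exchanged by one stream in the part of its range [c_low, c_high]
    that lies below the pinch candidate c_pinch (on the stream's own scale)."""
    top = min(c_high, c_pinch)
    return flow * max(0.0, top - c_low)


def satisfies_property_2(rich, lean, candidates, tol=1e-9):
    """Check (mass lost by rich) - (mass gained by lean) <= 0 below every candidate.

    rich:       list of (G_i, y_out, y_in) with y_in > y_out
    lean:       list of (L_j, x_in, x_out) with x_out > x_in
    candidates: list of (y_p, x_p), each pinch candidate on both scales
    """
    for y_p, x_p in candidates:
        lost = sum(mass_below(G, y_out, y_in, y_p) for G, y_out, y_in in rich)
        gained = sum(mass_below(L, x_in, x_out, x_p) for L, x_in, x_out in lean)
        if lost - gained > tol:
            return False  # lean streams cannot absorb the rich load below y_p
    return True


rich = [(2.0, 0.01, 0.05)]  # hypothetical rich stream data
lean = [(1.0, 0.0, 0.1)]    # hypothetical lean stream data
print(satisfies_property_2(rich, lean, [(0.03, 0.06)]))
```

A full model would add the equilibrium mapping between the two composition scales and treat outlet compositions as variables, as in [5,6,8,9]; the sketch only evaluates the inequality for fixed data.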

The minimum number of mass exchange opera- 
tions (units) for fixed MSA cost is determined in each 
subnetwork in a second step, in an attempt to minimize 
the fixed cost of the separators. The minimum number 
of mass exchangers is found employing the expanded 
transshipment model, where the existence of a mass ex- 
change match-separator in a subnetwork is denoted by 


a binary variable:

$$E_{ijm} = \begin{cases} 1, & \text{when streams } i, j \text{ exchange mass in subnetwork } m, \\ 0, & \text{otherwise.} \end{cases}$$
For a single component, the minimum number of mass exchanger units is given by the following MILP problem [4]:

$$
\begin{aligned}
\text{(TP2)}\quad \min\ & \sum_{m \in M} \sum_{i \in R_m} \sum_{j \in S_m} E_{ijm} \\
\text{s.t.}\ & \delta_{ik} - \delta_{i,k-1} + \sum_{j \in S_{mk}} M_{ijk} = W^R_{ik}, && k \in I_m,\ i \in R_{mk},\ m \in M, \\
& \sum_{i \in R_{mk}} M_{ijk} = W^S_{jk}, && k \in I_m,\ j \in S_{mk},\ m \in M, \\
& \sum_{k \in I_m} M_{ijk} - E_{ijm} U_{ijm} \le 0, && i \in R_m,\ j \in S_m,\ m \in M, \\
& \delta_{ik} \ge 0, && k \in I_m,\ i \in R_m, \\
& M_{ijk} \ge 0, && k \in I_m,\ i \in R_{mk},\ j \in S_{mk}, \\
& E_{ijm} \in \{0, 1\}, && i \in R_m,\ j \in S_m,\ m \in M,
\end{aligned}
$$

where
• $R_m$ is the set of rich streams present in subnetwork m,
• $S_m$ is the set of lean streams present in subnetwork m,
• $I_m$ is the set of intervals in subnetwork m,
• $R_{mk}$ is the set of rich streams present in interval k of subnetwork m, or above,
• $S_{mk}$ is the set of lean streams present in interval k of subnetwork m,
• $W^R_{ik}$ is the mass exchange load of rich stream i in interval k,
• $\delta_{ik}$ is the residual mass exchange load of rich stream i in interval k,
• $W^S_{jk}$ is the mass exchange load of lean stream j in interval k, as determined by (TP1),
• $M_{ijk}$ is the mass exchange load between i and j in interval k,
• $U_{ijm}$ is an upper bound on the possible mass exchange load between i and j in subnetwork m:

$$U_{ijm} = \min\Big\{ \sum_{k \in I_m} W^R_{ik},\ \sum_{k \in I_m} W^S_{jk} \Big\}.$$
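The interval loads entering (TP2) and the bound $U_{ijm}$ could be computed as in the following sketch. It assumes that a stream's load in an interval is its flow times the overlap between its composition range and that interval, measured on the stream's own composition scale, and that interval boundaries are listed from the highest to the lowest composition as in a CID; the helper names are hypothetical.

```python
def interval_loads(flow, c_low, c_high, boundaries):
    """Load of one stream in each interval [boundaries[k+1], boundaries[k]].

    boundaries: interval limits on this stream's composition scale,
    ordered from the highest to the lowest composition.
    """
    loads = []
    for hi, lo in zip(boundaries, boundaries[1:]):
        overlap = min(hi, c_high) - max(lo, c_low)
        loads.append(flow * max(0.0, overlap))  # zero if the stream is absent
    return loads


def match_upper_bound(rich_loads, lean_loads):
    """U_ijm = min(sum_k W^R_ik, sum_k W^S_jk) over the intervals of subnetwork m."""
    return min(sum(rich_loads), sum(lean_loads))


w_rich = interval_loads(2.0, 0.01, 0.05, [0.05, 0.03, 0.01])  # hypothetical data
w_lean = interval_loads(1.0, 0.0, 0.1, [0.1, 0.05, 0.0])      # hypothetical data
print(match_upper_bound(w_rich, w_lean))
```

A tight $U_{ijm}$ keeps the big-M constraint $\sum_k M_{ijk} - E_{ijm} U_{ijm} \le 0$ from weakening the LP relaxation of (TP2).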


Srinivas and El-Halwagi have shown [14] that, when the equilibrium relations around a pinch point are not convex, a mass exchanger can straddle the pinch and still be thermodynamically feasible. To account for such cases, exchangers across the pinch points can be considered by introducing extra binary variables:

$$
\begin{aligned}
& I_{ijp} \le M_{ijp}, \\
& I_{ij,p+1} \le M_{ij,p+1}, \\
& I_{ijp} + I_{ij,p+1} \ge 2 B_{ijp}, \\
& I_{ijp},\ I_{ij,p+1} \in \{0, 1\}, \qquad B_{ijp} \in \{0, 1\},
\end{aligned}
$$


where
• $I_{ijp}$ denotes that streams i and j exchange mass at the interval directly above pinch point p,
• $I_{ij,p+1}$ denotes that streams i and j exchange mass at the interval directly below pinch point p,
• $B_{ijp}$ denotes the existence of an exchanger between streams i and j across the pinch p.

Then the number of required units to minimize is given by:

$$\sum_{m} \sum_{i \in R_m} \sum_{j \in S_m} E_{ijm} - \sum_{i,j} \sum_{p} B_{ijp}.$$

Note that the $I_{ijp}$ variables can be relaxed to continuous ones, due to the total unimodularity of the model with respect to these variables:

$$0 \le I_{ijp},\ I_{ij,p+1} \le 1.$$


Problem (TP2) may not have a unique solution. 
Alternative combinations of mass exchange matches, 
featuring the minimum MSA cost, may be generated 
by solving (TP2) iteratively and including integer cuts. 
These do not necessarily correspond to networks of the 
same overall cost. 

The expanded transshipment model can also be em- 
ployed to determine the minimum MSA cost, consider- 
ing variable mass loads for the lean streams. Then, for- 
bidden or restricted mass exchange operations can be 
explicitly accounted for. 


Although (TP2) does not determine the network structure, stream splitting and exchanger connectivity may be guided by the resulting mass exchange load distribution in each composition interval [4]. The actual network configuration is found in a subsequent step, employing heuristic methods [3,5] or superstructure methods (NLP models).

Special cases of mass exchange networks have been 
studied: 

• MEN and regeneration networks [5,11]. The regeneration of mass separating agents by auxiliary streams can be considered simultaneously with the main MEN, in another mass exchanger network, where the MSAs behave as the rich streams. In this case, the CID is extended to include the equivalent composition scales of the regenerating agents. The inlet and outlet compositions of the lean streams in the main MEN are, in general, variables.

• Reactive mass exchange networks [6,11,14]. Rich-to-lean mass transfer may involve interphase mass transfer and chemical reaction in the lean phase, at constant temperature. Mass exchange operations of this kind are handled by deriving the equilibrium relations from the chemical equilibrium.

The main advantage of the sequential synthesis method for mass exchange networks is that simple optimization models are solved. However, unless the MSA cost is dominant, important trade-offs between operating and capital cost are not exploited, because synthesis decisions are fixed from one step to the next, and overall cost optimality cannot be guaranteed. Furthermore, the minimum composition difference ε that defines the mass recovery levels in (TP1) and (TP2) is, in general, an optimization variable for each mass exchanger separately. In the sequential synthesis method it is fixed arbitrarily to a possibly conservative value for the construction of the CID. El-Halwagi and V. Manousiouthakis [4] suggested a two-level optimization procedure to select a unique ε for all mass exchange operations, based on the impact of ε on the final MEN cost, which still does not exploit the overall cost trade-offs.

When isothermal mass exchange operations take place at different temperature levels, the operating and overall mass integration costs are affected by the heating and cooling requirements of the system. Energy integration between the rich and lean streams can be considered within a mass and heat exchanger network synthesis problem (MHEN) to reduce the total cost. The overall problem is addressed by combining MEN and HEN synthesis tools. The optimal temperature of mass exchange is defined for each pair of rich and lean streams by the equilibrium relations that limit mass transfer,

$$y_i = K_{ij}(T)\, x_j,$$

where $K_{ij}(T)$ is a known function of temperature.

Mixed Integer Linear Programming: Mass and Heat Exchanger Networks, Figure 3

In the sequential synthesis framework, the overall minimum operating cost for the network (cost of mass separating agents and heating/cooling utilities) may be calculated from a combined mass and heat transshipment model. Each stream is considered to consist of substreams of the same inlet and outlet composition and temperature, each of which participates in isothermal mass exchange operations at a different temperature. Srinivas and El-Halwagi proved [13] that, for a monotonic dependence of the equilibrium constant on temperature, the overall utility cost of the combined MHEN is independent of such a stream decomposition, see Fig. 3.

Although the mass exchange temperatures ($T_1, \ldots, T_N$) are variables, their relative position with respect to the inlet and outlet stream temperatures (greater or less) can be prepostulated. Thus, the rich and lean substreams define hot (or cold) streams before their mass exchange operations and cold (or hot) streams afterwards, cf. Fig. 4.

A CID is constructed, similarly to the simple MEN case, involving the several substreams with variable flows and, thus, variable mass loads in each composition interval. Mass exchange is permitted only between substreams of the same temperature. A temperature interval diagram, TID, is also constructed, involving the hot and cold substreams and the available heating and cooling utilities, with variable heat loads per interval due to the variable substream flows. In order to avoid discrete decisions (i.e. presence or absence of streams in temperature intervals with variable limits), the temperature range for each mass transfer operation is discretized and a substream is associated with each candidate temperature [13].

Mixed Integer Linear Programming: Mass and Heat Exchanger Networks, Figure 4
Rich substream (hot stream before and cold stream after its mass exchange operation)

The minimum utility cost is found from the solution of the combined LP transshipment model, which, for a single component, is as follows:

$$
\begin{aligned}
\text{(TP3)}\quad \min\ & \sum_{j \in S} c_j L_j + \sum_{n \in TI} \sum_{h \in HU_n} c_h\, QHU_{hn} + \sum_{n \in TI} \sum_{c \in CU_n} c_c\, QCU_{cn} \\
\text{s.t.}\ & \delta_{l_i k} - \delta_{l_i, k-1} + \sum_{j \in S} \sum_{l_j \in SS_{jk}} M_{l_i l_j k} = WR_{l_i k}, && k \in CI,\ i \in R,\ l_i \in RS_{ik}, \\
& \sum_{i \in R} \sum_{l_i \in RS_{ik}} M_{l_i l_j k} = WS_{l_j k}, && k \in CI,\ j \in S,\ l_j \in SS_{jk}, \\
& \sigma_{l_s n} - \sigma_{l_s, n-1} + \sum_{s' \in R \cup S} \sum_{l_{s'} \in CS_{s'n}} Q_{l_s l_{s'} n} + \sum_{c \in CU_n} QC_{l_s c n} = QS_{l_s n}, && n \in TI,\ s \in R \cup S,\ l_s \in HS_{sn}, \\
& \sigma_{hn} - \sigma_{h, n-1} + \sum_{s \in R \cup S} \sum_{l_s \in CS_{sn}} QH_{h l_s n} = QHU_{hn}, && n \in TI,\ h \in HU_n, \\
& \sum_{s \in R \cup S} \sum_{l_s \in HS_{sn}} Q_{l_s l_{s'} n} + \sum_{h \in HU_n} QH_{h l_{s'} n} = QS_{l_{s'} n}, && n \in TI,\ s' \in R \cup S,\ l_{s'} \in CS_{s'n}, \\
& \sum_{s \in R \cup S} \sum_{l_s \in HS_{sn}} QC_{l_s c n} = QCU_{cn}, && n \in TI,\ c \in CU_n, \\
& \delta_{l_i k} \ge 0, && k \in CI,\ i \in R,\ l_i \in RS_{ik}, \\
& \sigma_{l_s n} \ge 0, && n \in TI,\ s \in R \cup S,\ l_s \in HS_{sn}, \\
& \sigma_{hn} \ge 0, && n \in TI,\ h \in HU_n, \\
& M_{l_i l_j k} \ge 0, && k \in CI,\ i \in R,\ j \in S,\ l_i \in RS_{ik},\ l_j \in SS_{jk}, \\
& M_{l_i l_j k} = 0, && k \in CI,\ (l_i, l_j) \in FM, \\
& \delta_{l_i, 0} = \delta_{l_i, N_{CI}} = 0, && i \in R,\ l_i \in RS_i, \\
& \sigma_{l_s, 0} = \sigma_{l_s, N_{TI}} = 0, && s \in R \cup S,\ l_s \in HS_s, \\
& \sigma_{h, 0} = \sigma_{h, N_{TI}} = 0, && h \in HU,
\end{aligned}
$$

where
• CI is the set of composition intervals k,
• TI is the set of temperature intervals n,
• $RS_{ik}$ is the set of substreams $l_i$ of rich stream i, of variable flow $G_{l_i}$ such that $\sum_{l_i} G_{l_i} = G_i$, present in interval k, or above,
• $SS_{jk}$ is the set of substreams $l_j$ of lean stream j, of variable flow $L_{l_j}$ such that $\sum_{l_j} L_{l_j} = L_j$, present in interval k,
• $HS_{sn}$ is the set of hot substreams of stream s, present in interval n, or above,
• $CS_{sn}$ is the set of cold substreams of stream s, present in interval n,
• $HU_n$ is the set of hot utilities, present in interval n, or above,
• $CU_n$ is the set of cold utilities, present in interval n,
• $WR_{l_i k}$ is the mass exchange load of substream $l_i$ in interval k, $WR_{l_i k} = G_{l_i} \left( y_k - \max(y_{k+1}, y_i^t) \right)$,
• $WS_{l_j k}$ is the mass exchange load of substream $l_j$ in interval k, $WS_{l_j k} = L_{l_j} \left( \min(x_j^t, x_{jk}) - x_{j,k+1} \right)$,
• $\delta_{l_i k}$ is the residual mass load of substream $l_i$ in interval k,
• $M_{l_i l_j k}$ is the mass exchange load between $l_i$ and $l_j$ in interval k,
• FM is the set of mass exchanging substreams that are at different temperatures,
• $QS_{l_s n}$ is the heat load of substream $l_s$ in interval n,
• $Q_{l_s l_{s'} n}$ is the heat exchange load between $l_s$ and $l_{s'}$ in interval n,
• $\sigma_{l_s n}$ is the residual heat load of hot substream $l_s$ in interval n,
• $QHU_{hn}$ is the heat load of hot utility h in interval n,
• $QCU_{cn}$ is the heat load of cold utility c in interval n,
• $QH_{h l_s n}$ is the heat exchange load between hot utility h and $l_s$ in interval n,
• $\sigma_{hn}$ is the residual heat load of hot utility h in interval n,
• $QC_{l_s c n}$ is the heat exchange load between $l_s$ and cold utility c in interval n.
Problem (TP3) results in the minimum utility cost and the corresponding flows of separating agents and heating/cooling utility streams, the optimal decomposition of each stream into substreams of fixed mass exchange temperature, and the mass and heat exchange pinch points and corresponding subnetworks.

The minimum operating cost of the combined 
MHEN can alternatively be found applying the first and 
second thermodynamic laws (property in (2)) on the 
composition and temperature interval diagrams [13]. 

The minimum number of mass and heat exchangers 
is determined in a second step through the expanded 
MILP transshipment model, separately in each mass 
and heat exchanger subnetwork. The final network con- 
figurations and unit sizes are determined in a final step, 
applying heuristic rules or superstructure models. 

Additional disadvantages of the sequential MHEN 
synthesis method, compared to the synthesis of simple 
MEN, are that: 

i) the mass and heat exchange networks are assumed 
separable and 

ii) the intermediate mass exchange temperatures are decided in the first step; this prevents full exploitation of the mass/heat integration trade-offs, as the capital cost implications of such decisions are not accounted for.

Modeling concepts from the sequential mass and heat exchanger network synthesis methods, employing LP and MILP optimization models, have been extended to explore distillation networks [1], pervaporation systems [12] and other energy-requiring separation networks [2,7].
Introduction


The global optimization of classes of mixed integer 
nonlinear bilevel optimization problems is addressed. 
For problems where the integer variables participate in both the inner and the outer problems, the outer level may involve general mixed-integer nonlinear functions.
The inner level may involve functions that are mixed- 
integer nonlinear in outer variables, linear, polyno- 
mial, or multilinear in inner integer variables and linear 
in inner continuous variables. The technique is based 
on reformulating the mixed-integer inner problem as 
continuous by its convex hull representation [11,12] 
and solving the resulting nonlinear bilevel optimization 
problem by a novel deterministic global optimization 
framework. 


Formulation 


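The general mixed-integer nonlinear Bilevel Programming Problem (BLP) formulation is:

$$
\begin{aligned}
\min_{x}\ & F(x, y) \\
\text{s.t.}\ & G(x, y) \le 0, \\
& H(x, y) = 0, \\
& \min_{y}\ f(x, y) \\
& \quad \text{s.t.}\ g(x, y) \le 0, \\
& \quad \phantom{\text{s.t.}\ } h(x, y) = 0, \\
& x_1, \ldots, x_i,\ y_1, \ldots, y_j \in \mathbb{R}, \\
& x_{i+1}, \ldots, x_{n_1},\ y_{j+1}, \ldots, y_{n_2} \in \mathbb{Z}^{+},
\end{aligned}
\qquad (1)
$$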


where x is a vector of outer problem variables, of which i are continuous and $n_1 - i$ are integer, y is a vector of inner problem variables, of which j are continuous and $n_2 - j$ are integer, F(x, y) is the outer objective function, H(x, y) are outer equality constraints, G(x, y) are outer inequality constraints, f(x, y) is the inner objective function, h(x, y) are inner equality constraints, and g(x, y) are inner inequality constraints. The applications of BLP are many and diverse [4,6,7]; when these problems involve discrete decisions in addition to continuous ones, mixed-integer BLP models arise.


Classes 


The nonlinear mixed integer BLP can be classified into four different categories, depending on the existence of integer variables in the outer or the inner problems:

(I) Integer Upper, Continuous Lower BLP;
(II) Purely Integer BLP;
(III) Continuous Upper, Integer Lower BLP;
(IV) Mixed-Integer Upper and Lower BLP.
Mixed Integer Nonlinear Bilevel Programming: Deterministic Global Optimization, Figure 1
Algorithm flowsheet for type II, III, IV BLPs


The existence of both integer and nonlinear terms in the above problem classes requires special solution techniques. The specific mathematical structure of the mixed integer nonlinear BLP is of great importance in developing corresponding solution strategies. Problems of Type I can be addressed with existing BLP solution approaches. For problems of Type II, enumeration methods can be applied. BLPs of Type III and IV, however, are the most difficult to solve.


BLPs with Inner Integer Variables 


The conventional solution method of the continuous 
BLP is to transform it into a single level problem by re- 
placing the inner problem with the set of equations that 
define its Karush-Kuhn-Tucker (KKT) optimality con- 
ditions. However, the KKT optimality conditions use 
gradient information, so the conventional approach is 
not applicable when integer inner variables exist. Fur- 
thermore, if the integrality constraint is relaxed on the 
inner integer variables, the solution of this relaxed BLP 
does not provide a valid lower bound on the solution of 
the mixed-integer BLP [9]. Note that even if the optimal 


solution of the relaxed BLP is integral in y, this may not 
be a globally optimal solution of the original BLP [9]. 
Thus, the conventional KKT-based methods inherently 
fail in locating the global optimum. 

The BLP with inner mixed-integer variables can be transformed into an equivalent BLP with inner mixed-binary (0-1) variables as follows. Every inner problem integer variable $y_j$, with lower and upper bounds $y_j^L \le y_j \le y_j^U$, is converted into a set of binary variables using the formula [2]:

$$y_j = y_j^L + z_{j1} + 2 z_{j2} + 4 z_{j3} + \cdots + 2^{N_j - 1} z_{jN_j}, \qquad (2)$$

where $z_j$ is a vector of (0-1) variables and $N_j$ is the minimum number of (0-1) variables needed:

$$N_j = 1 + \mathrm{INT}\left( \frac{\log\left( y_j^U - y_j^L \right)}{\log(2)} \right), \qquad (3)$$

such that INT truncates its real argument to an integer.
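Equations (2) and (3) can be sketched in Python as follows. For an integer range $R = y_j^U - y_j^L \ge 1$, the value $1 + \mathrm{INT}(\log R / \log 2)$ equals the bit length of R, which the sketch uses to avoid floating-point rounding in the logarithm ratio; the function names are illustrative.

```python
def num_binaries(y_lo, y_hi):
    """N_j of Eq. (3): 1 + INT(log(y_hi - y_lo) / log 2), assuming y_hi > y_lo.

    For an integer range R >= 1 this equals R.bit_length(), computed exactly.
    """
    return (y_hi - y_lo).bit_length()


def encode(y, y_lo, n_bits):
    """Binary variables z_j1, ..., z_jN with y = y_lo + sum_k 2**(k-1) z_jk (Eq. (2))."""
    offset = y - y_lo
    return [(offset >> k) & 1 for k in range(n_bits)]


def decode(z, y_lo):
    """Inverse of encode: evaluate the right-hand side of Eq. (2)."""
    return y_lo + sum(bit << k for k, bit in enumerate(z))


n = num_binaries(3, 13)       # a range of 10 needs 4 binary variables
print(n, encode(11, 3, n))    # 11 = 3 + 0*1 + 0*2 + 0*4 + 1*8
```

Because $N_j$ bits can represent values up to $y_j^L + 2^{N_j} - 1$, which may exceed $y_j^U$, the bound $y_j \le y_j^U$ still has to be imposed on the binary expansion.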

The KKT optimality conditions are applicable to the BLP with mixed-binary y only when the following property is satisfied [8]:


Property 1 If the inner problem constraint set, Yq, de- 
fines a vertex polyhedral convex hull and all the vertices 
of the convex hull lie in Yqy, then the optimal inner prob- 
lem integer solution is equivalent to its linear program- 
ming relaxation. As a result, the Karush-Kuhn-Tucker, 
KKT, conditions of relaxed inner linear problem are nec- 
essary and sufficient to define the optimal inner problem 
integer solution. 


The property is also satisfied when outer variables ex- 
ist in the inner problem constraints, such that the inner 
problem vertex polyhedral convex envelope is defined 
parametrically in x. Hence, the integer solution of the 
inner problem lies at a vertex of the inner solution set 
and the KKT optimality conditions locate the true opti- 
mal solution [8]. 

Here, a global optimization procedure is presented for BLPs of Type II, III and IV that is based on a reformulation/linearization scheme combined with a global optimization framework. The idea is that if the inner problem constraint set has a vertex polyhedral convex envelope, then Property 1 is satisfied and the mixed-integer inner problem can be converted into a continuous problem of equivalent form. The application of the reformulation/linearization technique results in the convex hull representation for several classes of inner problems.


Reformulation/Linearization 


The mixed-binary inner problem constraint set is transformed into the continuous domain by converting it into a polynomial programming problem and then relinearizing it into an extended linear problem by a method based on [11]. First, a polynomial factor is defined as follows:

$$F_{n_y}(J_1, J_2) = \Big( \prod_{i \in J_1} y_i \Big) \Big( \prod_{i \in J_2} (1 - y_i) \Big), \qquad J_1, J_2 \subseteq N_y = \{1, \ldots, n_y\}, \quad J_1 \cap J_2 = \emptyset, \quad |J_1 \cup J_2| = n_y. \qquad (4)$$


Using this polynomial factor, the convex hull of the inner problem feasible set is obtained. If the inner optimization problem is linear, then the two-step process is as follows:

Step 1 Reformulation. Multiply every constraint, including $0 \le y \le 1$, by every factor defined as above and use the relationship $y_j^2 = y_j$, $j = 1, \ldots, n_y$, to linearize terms polynomial in y. Include in the inner problem constraint set the nonnegativities on all possible factors of degree $n_y$ (i.e. $F_{n_y}(J_1, J_2) \ge 0$ for all $(J_1, J_2)$ of order $n_y$).

Step 2 Linearization. Linearize the inner constraints that are multilinear in y, such as $\prod_{j \in J} y_j$, by substituting a variable $z_J$ for each set J with $|J| \ge 2$, with the elements of J in increasing order (i.e. a new variable $z_{ij}$ is introduced to substitute for a bilinear term, $y_i y_j = z_{ij}$, and further substitution is performed for multilinear terms). At constant x, the resulting inner constraint set describes a polytope with all vertices defined by binary values and characterizes the convex hull of feasible solutions for any inner problem that is linear or polynomial binary in the y-variables.
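To make the two steps concrete, the sketch below represents a multilinear polynomial in binary y as a dictionary from index sets to coefficients, so that the rule $y_j^2 = y_j$ turns monomial multiplication into a set union. It multiplies a constraint by each factor $\prod_{i \in J_1} y_i \prod_{i \in J_2} (1 - y_i)$ with $J_1, J_2$ partitioning $\{1, \ldots, n_y\}$, and then renames every monomial of degree two or more as a new variable $z_J$. This is an illustrative implementation written for this entry, not the code of [11]; all names are hypothetical.

```python
from itertools import product


def poly_mul(p, q):
    """Multiply multilinear polynomials {frozenset_of_indices: coefficient};
    since y_j**2 = y_j for binary y_j, multiplying monomials is a set union."""
    out = {}
    for (mp, cp), (mq, cq) in product(p.items(), q.items()):
        key = mp | mq
        out[key] = out.get(key, 0) + cp * cq
    return {m: c for m, c in out.items() if c != 0}


def factor(J1, J2):
    """prod_{i in J1} y_i * prod_{i in J2} (1 - y_i) as a polynomial."""
    p = {frozenset(): 1}
    for i in J1:
        p = poly_mul(p, {frozenset([i]): 1})
    for i in J2:
        p = poly_mul(p, {frozenset(): 1, frozenset([i]): -1})
    return p


def degree_n_factors(n):
    """All factors whose index sets J1, J2 partition {1, ..., n}."""
    for bits in product((0, 1), repeat=n):
        J1 = [i + 1 for i, b in enumerate(bits) if b]
        J2 = [i + 1 for i, b in enumerate(bits) if not b]
        yield J1, J2, factor(J1, J2)


def linearize(p):
    """Step 2: replace each monomial prod_{j in J} y_j with |J| >= 2 by z_J."""
    lin = {}
    for m, c in p.items():
        if not m:
            name = "1"
        elif len(m) == 1:
            name = f"y{next(iter(m))}"
        else:
            name = "z" + "".join(str(i) for i in sorted(m))
        lin[name] = c
    return lin


# Inner constraint of the illustrative example in this entry: y1 + y2 - 1 >= 0.
CONSTRAINT = {frozenset([1]): 1, frozenset([2]): 1, frozenset(): -1}

for J1, J2, F in degree_n_factors(2):
    print(J1, J2, linearize(poly_mul(F, CONSTRAINT)), ">= 0")
```

Representing monomials as sets keeps the reduction $y_j^2 = y_j$ implicit, so no symbolic algebra package is needed for the binary case.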

If the inner optimization problem is mixed-binary linear or polynomial, the problem constraints are again multiplied by $n_y$-degree polynomial factors composed of the $n_y$ binary variables and their complements, and the resulting nonlinear problem is linearized by a substitution of new variables. Additional nonlinear terms arise from the multiplication of the $n_y$-degree polynomial factors with the inner problem linear continuous terms in y, which are also linearized through a redefinition [8,12]. This transformation is applicable when, in the mixed-binary inner problem, the continuous y satisfy $0 \le y \le 1$. Note that there are no such restrictions on the outer problem x-variables in either the inner or the outer problem.


Inner Problem KKT Conditions 
and Complementarity 


After reformulation/relinearization, the inner problem is replaced by the set of equations that define its necessary and sufficient KKT optimality conditions:

$$
\begin{aligned}
& h'_i(x, y^*) = 0, && i \in I, \\
& \nabla_{y} f'(x, y^*) + \sum_{j \in J} \lambda_j \nabla_{y} g'_j(x, y^*) + \sum_{i \in I} \mu_i \nabla_{y} h'_i(x, y^*) = 0, && \\
& g'_j(x, y^*) + s_j = 0, && j \in J, \\
& \lambda_j s_j = 0, && j \in J, \quad \text{(CS)} \\
& \lambda_j,\ s_j \ge 0, && j \in J,
\end{aligned}
\qquad (5)
$$

where f', h' and g' are the reformulated inner objective, equality and inequality constraints, λ and μ are the Lagrange multipliers of the inner inequality and equality constraints, and s are the slack variables associated with the complementarity constraints.


Active Set Strategy 


The complementarity condition constraints (CS) involve discrete decisions on the choice of the inner problem active constraint set. The set changes when at least one inequality function and its Lagrange multiplier are both equal to zero. This imposes a major difficulty in the solution of the transformed problem. To overcome this difficulty, the Active Set Strategy [5,8] is employed, such that the complementarity constraints are reformulated as:

$$
\begin{aligned}
& \lambda_j - U Y_j \le 0, && j \in J, \\
& s_j - U (1 - Y_j) \le 0, && j \in J, \\
& \lambda_j,\ s_j \ge 0, && j \in J, \\
& Y_j \in \{0, 1\}, && j \in J,
\end{aligned}
$$

where U is an upper bound on the slack variables s and Y are the additional binary variables introduced. If constraint j is active, $Y_j = 1$; if it is inactive, $Y_j = 0$. Note that the integer variable set now includes the binary variables Y in addition to the outer problem integer variables.
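The effect of this big-M reformulation can be checked on a single constraint j: when $\lambda_j$ and $s_j$ are both at most U, a binary $Y_j$ satisfying the reformulated constraints exists exactly when $\lambda_j s_j = 0$. A small Python sketch, with illustrative function names:

```python
def active_set_feasible(lam, s, Y, U):
    """Big-M active set constraints for one inner inequality j:
    lambda_j - U*Y_j <= 0, s_j - U*(1 - Y_j) <= 0, lambda_j >= 0, s_j >= 0."""
    return lam - U * Y <= 0 and s - U * (1 - Y) <= 0 and lam >= 0 and s >= 0


def admissible_Y(lam, s, U):
    """Values of the binary Y_j compatible with the given multiplier and slack."""
    return [Y for Y in (0, 1) if active_set_feasible(lam, s, Y, U)]


print(admissible_Y(2.0, 0.0, 10.0))  # active constraint: only Y_j = 1 fits
print(admissible_Y(2.0, 3.0, 10.0))  # complementarity violated: no Y_j fits
```

If U is chosen smaller than the true range of $\lambda_j$ or $s_j$, valid KKT points are cut off, so U should be a genuine bound.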


Transformed BLPP Global Optimization 


The problem that results after the reformulated/linearized inner problem is replaced with its KKT optimality conditions and the active set strategy is applied can still have nonlinear terms due to the complementarity and stationarity conditions. Further, nonlinear terms in the outer problem variables may exist in either the inner or the outer problem constraints. Hence, the resulting problem is a mixed-integer (nonlinear) optimization problem and should be solved by a global optimization procedure. If the integer variables are all binary and only appear in linear or mixed-bilinear terms, the Special structure MINLP-αBB (SMIN-αBB) [1,3] approach is employed. If the outer integer variables are not restricted to binary and/or participate in nonlinear terms, the General structure MINLP-αBB (GMIN-αBB) [1,3] approach is employed. The steps of the proposed framework are given below.


Global Optimization Algorithm 


Step 1 Establish variable bounds by solving the problems

$$y^L = \min\ y, \qquad y^U = -\min\,(-y), \qquad \text{s.t. the inner problem constraint set,}$$

to obtain simple lower and upper bounds on y.
Step 2 If the inner integer variables are general (non-binary) integers, convert them into a set of binary variables by Eq. (2) and Eq. (3).
Step 3 Obtain the vertex polyhedral convex envelope of the inner problem feasible region via the reformulation/linearization [11]. The inner problem is now linear in both inner binary and continuous variables and parametric in the outer problem variables, x.
Step 4 Replace the inner problem with the set of equations that define its necessary and sufficient KKT optimality conditions. The resulting problem is single level.
Step 5 Solve the resulting single level optimization problem to global optimality. At the beginning of this step the inner integer variables are all separable, linear and binary. If the final problem is a Mixed Integer Linear Problem (MILP), then use CPLEX. Notice that the problem will be an MILP only for the simplest cases. If there are nonlinear continuous variables but the integer variables are all binary, linear and separable, use the SMIN-αBB [1,3] global optimization procedure. If the outer problem has nonlinear integer terms, then use the GMIN-αBB [1,3] global optimization procedure.


Illustrative Example 


The following problem [10] cannot be solved to global optimality using the deterministic solution approaches for integer bilevel programming problems currently available in the literature.


min — ey + 4x? 
gin? Xo} Viy2 


— (—x} + 3xfx2) (1 — yi) ya — (2x3 — x1) (1 — 2) 
s.t. min — (x1x3 + 8x3 — 14x} — 5x1) yy 

— (=x, x3 + 5x1 x2 + 4x2) (1 — yi) yo 

— 8x1 yi(1 — yo) 
S.t. yy + y2 21 

0<x, < 10 

0<x,<10 

yiy2 € (0,17, xen. 


(7) 


Steps 1-2 Variable bounds are already given in this problem, with $0 \le x \le 10$, and $y_1$ and $y_2$ are defined as binary.

Step 3 Determination of the vertex polyhedral convex hull: the inner problem has $n_y = 2$ binary variables, so the second-degree factors $y_1 y_2$, $y_1 (1 - y_2)$, $(1 - y_1) y_2$ and $(1 - y_1)(1 - y_2)$ multiply the inner problem constraint $y_1 + y_2 - 1 \ge 0$ and result in:

$$
\begin{aligned}
& y_1 y_2 \ge 0, \\
& y_1 + y_2 - y_1 y_2 - 1 \ge 0.
\end{aligned}
\qquad (8)
$$

(The products with $y_1 (1 - y_2)$ and $(1 - y_1) y_2$ vanish identically once $y_j^2 = y_j$ is used.)

Linearization: Assign a new variable to the bilinear term, $z_{12} = y_1 y_2$, which leads to the additional constraints (the nonnegativities of the four factors):

$$
\begin{aligned}
& z_{12} \ge 0, \\
& y_1 - z_{12} \ge 0, \\
& y_2 - z_{12} \ge 0, \\
& -y_1 - y_2 + z_{12} \ge -1.
\end{aligned}
\qquad (9)
$$

From Eqs. (8) and (9), $y_1 + y_2 - z_{12} - 1 = 0$. Substituting this expression for $z_{12}$ into Eq. (9), a linear relaxation of the inner problem constraint set leads to the original set of constraints:

$$y_1 + y_2 - 1 \ge 0, \qquad 1 - y_l \ge 0, \quad l = 1, 2.$$

Hence, the continuous relaxation of the original problem constraints defines the convex hull and no additional constraints are necessary. The inner problem is continuous and linear in $y_1$ and $y_2$, and parametric in the outer problem variables x.
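The convex hull claim for this example can be checked numerically. Enumerating the vertices of the polytope defined by $y_1 + y_2 - 1 \ge 0$, $1 - y_1 \ge 0$ and $1 - y_2 \ge 0$ (intersections of constraint pairs that satisfy all constraints) should return only binary points, namely the binary solutions of $y_1 + y_2 \ge 1$. A Python sketch using exact rational arithmetic, with illustrative names:

```python
from fractions import Fraction
from itertools import combinations

# Each entry ((a1, a2), b) encodes the constraint a1*y1 + a2*y2 >= b.
CONSTRAINTS = [((1, 1), 1),     # y1 + y2 - 1 >= 0
               ((-1, 0), -1),   # 1 - y1 >= 0
               ((0, -1), -1)]   # 1 - y2 >= 0


def vertices(cons):
    """Vertices of a bounded 2-D polytope given as a list of >= constraints."""
    verts = set()
    for (a, b), (c, d) in combinations(cons, 2):
        det = a[0] * c[1] - a[1] * c[0]
        if det == 0:
            continue  # parallel constraints do not meet in a single point
        # Cramer's rule for a.y = b, c.y = d
        y1 = Fraction(b * c[1] - a[1] * d, det)
        y2 = Fraction(a[0] * d - b * c[0], det)
        if all(p[0] * y1 + p[1] * y2 >= q for p, q in cons):
            verts.add((y1, y2))
    return verts


print(sorted(vertices(CONSTRAINTS)))
```

Pairwise intersection is adequate only for small, bounded two-variable systems like this one; it is meant as a check on the example, not as a general vertex enumeration method.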

Step 4 Replace the relaxed inner problem with the 
equivalent set of equations that define its necessary and 
sufficient KKT optimality conditions: 


17 2 
min (Zain - 4x} - x) nt (Zeiss — 2a, - ) 


7 2 3 

*y2 (-Zatm + 2x, + x2 + n) 

0<x, < 10 

0<x,< 10 

_ xs _ 8x3 + 14x? + 5x) — x 1X5 + 5x 1X2 + 4x2 

—A, +A, =0 

— Ay — x[x} — 8x} + 14x7 + 13x, +A; =0 

—y-yot1+s5, =0 

yt =1 

y2 +53 =1 

A4,- UY <0 

s; +UY<U 

Y € {0,1}; 51,41, 91, y2 = 05 

x1,%2 EM; yi,y2 <1. 

(11) 

Step 5 The single level problem constraint contains the 
following nonlinear terms: x7x2y1, X31» X7X2Y2 XZ V2 
X1V25 XPX2, X27, X27, X1xZ, X1X2, x7xF that should be 


underestimated. All integer variables are binary, linear and separable. Solve the resulting single level problem to global optimality using SMIN-αBB [1,3]. The global optimal solution reported in [10] is at $(x_1^*, x_2^*, y_1^*, y_2^*) = (6.038, 2.957, 0, 1)$. We identify a lower global solution at (0, 10, 1, 1). Note that solving this problem by enumeration methods could be labor intensive due to the presence of continuous variables.


Conclusions 


The global optimization framework addresses the solu- 
tion of several classes of mixed integer nonlinear bilevel 
optimization problems. The outer problem may be 
mixed-integer nonlinear in both inner and outer vari- 
ables; the inner problem may be mixed-integer nonlin- 
ear in outer variables, linear, polynomial or multilinear 
in inner integer variables and linear in inner continu- 
ous variables. This is based on the reformulation of the 
mixed-integer inner problem feasible space to gener- 
ate its convex hull, where the vertices correspond to bi- 
nary solutions. This allows the equivalence of the inner 
optimization problem to the set of equations that de- 
fine its KKT optimality conditions, with which it is re- 
placed. The resulting single level optimization problem 
is solved to global optimality. This is arguably the first 
deterministic global optimization technique that can 
solve several classes of mixed-integer nonlinear bilevel 
optimization problems. Note that if the central decision 
maker wants to locate the second-best inner or outer 
integer solutions, simple integer cuts [2] can be added 
prior to applying the relevant solution strategy. 
Introduction


The mixed-integer nonlinear programming (MINLP) approach has been widely used during the last two decades to model and solve process synthesis problems in the chemical engineering field, within the superstructure framework that always involves discrete and continuous variables [3,4,5,6,10]. Recently, the successful employment of the branch-and-cut method for 0-1 integer programming [7,8] and 0-1 mixed-integer linear programming [1,2] has spurred great interest in its application to 0-1 mixed-integer nonlinear optimization, owing to the significant progress of interior point algorithms for convex optimization problems. Stubbs and Mehrotra [9] generalized the lift-and-project cut, or disjunctive cut, for 0-1 integer or mixed-integer linear programming proposed in [1,2,7,8], and extended their method into a branch-and-cut algorithm for the 0-1 mixed-integer nonlinear optimization problem. The disjunctive cutting plane presented by Stubbs and Mehrotra [9] is obtained by solving a convex projection problem, so it is computationally expensive. In [11], a valid disjunctive cutting plane for mixed-integer nonlinear optimization problems was constructed by solving a linear programming problem, implemented in an algorithmic package named MINO, i.e., Mixed-Integer Nonlinear Optimizer.


Formulation 


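The general 0-1 mixed-integer nonlinear optimization problem can be formulated as

$$
\text{(P)}\qquad
\begin{aligned}
\min_{x, y}\ & d^{T} x \\
\text{s.t.}\ & A x + G y \le b, \\
& g_i(x, y) \le 0, \quad i = 1, \ldots, l, \\
& x \in \mathbb{R}^{n},\ y \in \{0, 1\}^{q},
\end{aligned}
$$

where the constant vectors and matrices are defined as $d \in \mathbb{R}^{n}$, $A \in \mathbb{R}^{m \times n}$, $G \in \mathbb{R}^{m \times q}$, $b \in \mathbb{R}^{m}$. Let the feasible region of the standard continuous relaxation of problem (P) be defined as

$$C = \left\{ (x, y) \in \mathbb{R}^{n+q} :\ A x + G y \le b,\ \ g_i(x, y) \le 0,\ i = 1, \ldots, l,\ \ 0 \le y \le 1 \right\}.$$

Hence, the feasible set of (P) can be formulated as

$$C_0 = \left\{ (x, y) \in C :\ y_j \in \{0, 1\},\ j = 1, \ldots, q \right\}.$$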
At a generic step of the branch-and-cut algorithm, let $(\bar{x}, \bar{y})$ be a solution to the current NLP relaxation of (P). If any of the components of the binary variables $\bar{y}$ are not in {0, 1}, we can add a valid inequality to the current feasible set such that this inequality is violated by $(\bar{x}, \bar{y})$. We denote by C the family of inequalities that describe the current feasible set and the newly incorporated inequalities. We denote by $F_0, F_1 \subseteq \{1, \ldots, q\}$ the sets of binary variables that have been fixed at 0 and 1, respectively. Let

$$K(C, F_0, F_1) = \left\{ (x, y) \in C :\ y_j = 0 \text{ for } j \in F_0,\ \ y_j = 1 \text{ for } j \in F_1 \right\},$$

and let NLP$(C, F_0, F_1)$ denote the nonlinear program

$$\min_{x, y}\ d^{T} x \quad \text{s.t.}\ (x, y) \in K(C, F_0, F_1).$$

The active nodes of the enumeration tree are represented by a list S of ordered pairs $(F_0, F_1)$. Let UBD represent the current upper bound, i.e., the value of the best-known solution to problem (P).


Branch-And-Cut Procedure 


Input of $d, n, q, A, G, b, g_i\ (i = 1,\dots,l)$:

(1) Initialization. Set $S = \{(F_0 = \emptyset, F_1 = \emptyset)\}$, let $C$ consist of the nonlinear programming relaxation of (P), and set $\mathrm{UBD} = \infty$.

(2) Node Selection. If $S = \emptyset$, stop. Otherwise, choose an ordered pair $(F_0, F_1) \in S$ and remove it from $S$.

(3) Lower Bounding Step. Solve the nonlinear program NLP$(C, F_0, F_1)$. If the problem is infeasible, go to Step 2. Otherwise, let $(\bar{x}, \bar{y})$ denote its optimal solution. If $d^{T}\bar{x} \ge \mathrm{UBD}$, go to Step 2. If $\bar{y}_j \in \{0,1\}$, $j = 1,\dots,q$, let $(x^*, y^*) = (\bar{x}, \bar{y})$, $\mathrm{UBD} = d^{T}\bar{x}$, and go to Step 2.

(4) Branching versus cutting decision. Should cutting planes be generated? If yes, go to Step 5, else go to Step 6.

(5) Cut generation. Generate a cutting plane $\alpha^{T}x + \beta^{T}y \le \gamma$ valid for (P) but violated by $(\bar{x}, \bar{y})$. Add the cut to $C$ and go to Step 3.

(6) Branching Step. Pick an index $j \in \{1,\dots,q\}$ such that $0 < \bar{y}_j < 1$. Generate the subproblems corresponding to $(F_0 \cup \{j\}, F_1)$ and $(F_0, F_1 \cup \{j\})$, and add them to the node set $S$. Go to Step 2.


When the algorithm terminates, if $\mathrm{UBD} < \infty$, $(x^*, y^*)$ is an optimal solution to (P); otherwise (P) is infeasible.
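The node loop of Steps 1-6 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in MINO: the hypothetical oracle `solve_relaxation(F0, F1)` stands in for NLP$(C, F_0, F_1)$ and returns `None` when the node is infeasible or a pair (objective value, y-values) otherwise, and the optional hypothetical callback `add_cut` plays the role of Steps 4-5, returning `True` when it has tightened the relaxation. So that the sketch runs without an NLP solver, the usage example swaps in a fractional 0-1 knapsack LP, which can be solved greedily, as the relaxation.

```python
import math


def branch_and_cut(q, solve_relaxation, add_cut=None, tol=1e-9):
    """Node loop of the branch-and-cut procedure (minimization).

    solve_relaxation(F0, F1) -> None if infeasible, else (objective, y_values).
    add_cut(F0, F1, y) -> True if a cut was added to the relaxation (hypothetical).
    """
    S = [(frozenset(), frozenset())]        # Step 1: root node, nothing fixed
    ubd, best_y = math.inf, None
    while S:                                # Step 2: stop when S is empty
        F0, F1 = S.pop()                    # depth-first node selection
        while True:
            sol = solve_relaxation(F0, F1)  # Step 3: lower bounding
            if sol is None:
                break                       # infeasible node
            obj, y = sol
            if obj >= ubd - tol:
                break                       # pruned by the upper bound
            frac = [j for j in range(q) if tol < y[j] < 1 - tol]
            if not frac:                    # integral: new incumbent
                ubd, best_y = obj, [round(v) for v in y]
                break
            if add_cut is not None and add_cut(F0, F1, y):
                continue                    # Steps 4-5: cut added, re-solve
            j = frac[0]                     # Step 6: branch on a fractional y_j
            S.append((F0 | {j}, F1))
            S.append((F0, F1 | {j}))
            break
    return ubd, best_y                      # ubd == inf means infeasible


def knapsack_relaxation(values, weights, capacity):
    """Toy stand-in: LP relaxation of max sum(v*y) s.t. sum(w*y) <= capacity."""
    def solve(F0, F1):
        y = [0.0] * len(values)
        room = capacity
        for j in F1:
            y[j] = 1.0
            room -= weights[j]
        if room < 0:
            return None
        free = [j for j in range(len(values)) if j not in F0 and j not in F1]
        free.sort(key=lambda j: values[j] / weights[j], reverse=True)
        for j in free:
            if room <= 1e-12:
                break
            take = min(1.0, room / weights[j])
            y[j] = take
            room -= take * weights[j]
        # negate the value so that the loop above minimizes
        return -sum(v * t for v, t in zip(values, y)), y
    return solve


print(branch_and_cut(3, knapsack_relaxation([60, 100, 120], [10, 20, 30], 50)))
```

A LIFO stack is only one possible node selection rule, since the procedure leaves Step 2 open; and a real `add_cut` must eventually return `False` at a node, otherwise the inner loop would not terminate.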


Linear Approximation of NLP Relaxation 


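The continuous relaxation of problem (P) at some node in an enumeration tree can be described by

$$
\begin{aligned}
\min_{x,y}\ & d^{T}x \\
\text{s.t. } & Ax + Gy \le b, \\
& g_i(x,y) \le 0, \quad i = 1,\dots,l, \\
& y_j = 0, \quad j \in F_0, \\
& y_j = 1, \quad j \in F_1, \\
& (x,y) \in \mathbb{R}^{n+q},
\end{aligned}
\tag{NLP}
$$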


where the reformulated linear constraint set consists of the original linear constraint set and the upper and lower bound constraints for the binary variables, so that $A \in \mathbb{R}^{(m+2q)\times n}$, $G \in \mathbb{R}^{(m+2q)\times q}$, and $b \in \mathbb{R}^{m+2q}$. Assume that the above continuous NLP is feasible and has a finite minimum at $(\bar{x}, \bar{y})$; otherwise the node can be fathomed. A linear approximation of the above NLP at $(\bar{x}, \bar{y})$ is given by


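$$
\begin{aligned}
\min_{x,y}\ & d^{T}x \\
\text{s.t. } & Ax + Gy \le b, \\
& g_i(\bar{x},\bar{y}) + \nabla g_i(\bar{x},\bar{y})^{T}\begin{pmatrix} x-\bar{x} \\ y-\bar{y} \end{pmatrix} \le 0, \quad i = 1,\dots,l, \\
& y_j = 0, \quad j \in F_0, \\
& y_j = 1, \quad j \in F_1, \\
& (x,y) \in \mathbb{R}^{n+q},
\end{aligned}
\tag{LP}
$$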


where the original convex and differentiable functions are replaced by their first-order Taylor approximations at $(\bar{x}, \bar{y})$. Accordingly, an MILP problem corresponding to the MINLP problem at the current node can be described by


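$$
\begin{aligned}
\min_{x,y}\ & d^{T}x \\
\text{s.t. } & Ax + Gy \le b, \\
& g_i(\bar{x},\bar{y}) + \nabla g_i(\bar{x},\bar{y})^{T}\begin{pmatrix} x-\bar{x} \\ y-\bar{y} \end{pmatrix} \le 0, \quad i = 1,\dots,l, \\
& y_j = 0, \quad j \in F_0, \\
& y_j = 1, \quad j \in F_1, \\
& x \in \mathbb{R}^{n},\ y \in \{0,1\}^{q},
\end{aligned}
\tag{MILP}
$$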
[11] proved that if the above NLP attains its optimum at $(\bar{x}, \bar{y})$, then $(\bar{x}, \bar{y})$ is also an optimal solution to the aforementioned LP. The geometric interpretation of this linear approximation is presented in Fig. 1. Since each $g_i$ is convex, its first-order Taylor approximation never exceeds $g_i$, so every point satisfying $g_i(x,y) \le 0$ also satisfies the linearized constraint; hence the mixed-integer set is expanded by the linear approximation.
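The containment argument can be checked numerically. The sketch below builds the first-order constraint $g(\bar z) + \nabla g(\bar z)^{T}(z - \bar z) \le 0$ in the form $a^{T}z \le \text{rhs}$ and verifies it on sampled points of the nonlinear feasible set. The constraint $g(x,y) = x^2 + y^2 - 1$ and the linearization point $(0.6, 0.8)$ are assumptions chosen for illustration; plain Python lists are used so that no solver or array library is needed.

```python
def linearize(g, grad_g, z_bar):
    """First-order cut of a convex constraint g(z) <= 0 at z_bar.

    Returns (a, rhs) such that g(z_bar) + grad_g(z_bar).(z - z_bar) <= 0
    is rewritten as a.z <= rhs.
    """
    a = list(grad_g(z_bar))
    rhs = sum(ai * zi for ai, zi in zip(a, z_bar)) - g(z_bar)
    return a, rhs


def g(z):                  # illustrative convex constraint: the unit disk
    x, y = z
    return x * x + y * y - 1.0


def grad_g(z):
    x, y = z
    return (2.0 * x, 2.0 * y)


a, rhs = linearize(g, grad_g, (0.6, 0.8))   # tangent line 1.2x + 1.6y <= 2

# Every sampled point of the disk also satisfies the linearized constraint.
grid = [(i / 10, j / 10) for i in range(-10, 11) for j in range(-10, 11)]
inside = [z for z in grid if g(z) <= 0]
assert all(a[0] * x + a[1] * y <= rhs + 1e-12 for x, y in inside)
```

Because a convex function lies above each of its tangent planes, the assertion cannot fail on the disk, while a point such as $(1, 1)$ shows that the linearized set is strictly larger.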


Disjunctive Cut Generation 


For problem (P), it is attractive to construct the lift-and-project cut from the approximating LP instead of the NLP, while the cut can still cut away the fractional point $(\bar{x}, \bar{y})$. Such a cut can be derived by imposing the 0-1 integrality condition on a binary variable $y_j$ with $0 < \bar{y}_j < 1$. In Fig. 1, the short dashed line represents the cut generated directly from the convex hull of the mixed-integer convex set, as presented in [9], and the long dashed line stands for the cut generated in [11] based on the linear approximation. For cut generation, the node sets $F_0$ and $F_1$ can be expanded to include additional binary variables whose optimal values in the NLP at the current node are 0 or 1. We then redefine these two sets as $F_0 = \{i : \bar{y}_i = 0\}$ and $F_1 = \{i : \bar{y}_i = 1\}$. It is not difficult to verify that the above NLP and LP problems have the same optimal solutions if the original node sets are replaced by the expanded ones. Let the feasible region of the above LP be defined as


$$
K = \{(x,y) \in \mathbb{R}^{n+q} : \tilde{A}x + \tilde{G}y \le \tilde{b},\ y_i = 0,\ i \in F_0,\ y_i = 1,\ i \in F_1\},
$$


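where $\tilde{A} \in \mathbb{R}^{(m+2q+l)\times n}$, $\tilde{G} \in \mathbb{R}^{(m+2q+l)\times q}$, and $\tilde{b} \in \mathbb{R}^{m+2q+l}$; i.e., the newly reformulated linear constraint set consists of the linear approximation constraints in addition to the original ones, with $g = (g_1,\dots,g_l)^{T}$:

$$
\tilde{A}x + \tilde{G}y \le \tilde{b}
\iff
\begin{cases}
Ax + Gy \le b, \\
\nabla_x g(\bar{x},\bar{y})^{T} x + \nabla_y g(\bar{x},\bar{y})^{T} y \le \nabla g(\bar{x},\bar{y})^{T}\begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} - g(\bar{x},\bar{y}).
\end{cases}
$$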


It should be noted that the node sets have already been changed to the expanded ones in the above formulation. If we impose the integrality condition on a binary variable $y_j$ for which $0 < \bar{y}_j < 1$, the disjunctive cut can be obtained by choosing a valid inequality for

$$
P_j(K) = \operatorname{conv}\left(K \cap \{(x,y) \in \mathbb{R}^{n+q} : y_j \in \{0,1\}\}\right).
$$


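The convex hull of this union set can be further described by its disjunctive form as

$$
P_j(K) = \operatorname{conv}\Big( \big(K \cap \{(x,y) \in \mathbb{R}^{n+q} : y_j \le 0\}\big) \cup \big(K \cap \{(x,y) \in \mathbb{R}^{n+q} : -y_j \le -1\}\big) \Big).
$$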
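A standard way to certify that an inequality $\alpha^{T}z \le \gamma$ is valid for a disjunctive hull such as $P_j(K)$ is to derive it, with nonnegative multipliers, from each term of the disjunction separately (Balas's disjunctive programming characterization). The sketch below does this for $K = \{z : Az \le b\}$ and the terms $z_j \le 0$ and $-z_j \le -1$. It illustrates that general certificate and is not the specific cut generation LP of [11]. The two-dimensional instance, with $z = (x, y)$ and the fractional vertex $(1, 1/2)$, is an assumption chosen so that the multipliers can be found by hand; exact rational arithmetic avoids round-off in the equality check.

```python
from fractions import Fraction as Fr


def combine(rows, rhs, mult):
    """Nonnegative combination sum_k mult[k] * (rows[k] . z <= rhs[k])."""
    coef = [sum(m * r[i] for m, r in zip(mult, rows)) for i in range(len(rows[0]))]
    return coef, sum(m * b for m, b in zip(mult, rhs))


def disjunctive_cut(A, b, j, u, u0, v, v0):
    """Cut valid for conv((K and z_j <= 0) or (K and -z_j <= -1)), K = {Az <= b}.

    u, u0 multiply the rows of the first term; v, v0 those of the second.
    """
    if any(m < 0 for m in [*u, u0, *v, v0]):
        raise ValueError("multipliers must be nonnegative")
    a1, g1 = combine(A, b, u)
    a1[j] += u0                      # add u0 * (z_j <= 0)
    a2, g2 = combine(A, b, v)
    a2[j] -= v0                      # add v0 * (-z_j <= -1)
    g2 -= v0
    if a1 != a2:
        raise ValueError("the two terms do not give the same left-hand side")
    return a1, max(g1, g2)           # valid for both terms, hence for the hull


# K: 0 <= y <= 1, x >= 0, x - y <= 1/2, x + y <= 3/2, with z = (x, y) and j = 1.
A = [[0, 1], [0, -1], [1, -1], [1, 1], [-1, 0]]
b = [1, 0, Fr(1, 2), Fr(3, 2), 0]
alpha, gamma = disjunctive_cut(A, b, 1, [0, 0, 1, 0, 0], 1, [0, 0, 0, 1, 0], 1)
print(alpha, gamma)                  # the cut x <= 1/2
```

The resulting cut $x \le 1/2$ holds at all four integer-feasible corners of $K$ but is violated by the fractional vertex $(1, 1/2)$ of the relaxation.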


Let $F = \{1,\dots,q\} \setminus (F_0 \cup F_1)$ denote the set of free variables at node $(F_0, F_1)$, and let the vector of these free variables be $y^F = y \setminus \{y_i : i \in F_0 \cup F_1\}$. The columns of matrix $\tilde{G}$ corresponding to the fixed binary variables can be removed from the constraint set by defining $\tilde{G}^F = \tilde{G} \setminus \{\tilde{G}_i : i \in F_0 \cup F_1\}$, and the right-hand side can be calculated accordingly as $\tilde{b}^F = \tilde{b} - \sum_{i \in F_1} \tilde{G}_i$. Finally, the rows in the matrices $\tilde{A}$ and $\tilde{G}$ and in the vector $\tilde{b}$ that correspond to the upper and lower bounds of the fixed binary variables are removed. After these operations we can assume, without loss of generality, that $F_1 = \emptyset$: if $F_1 \neq \emptyset$, each variable $y_i$, $i \in F_1$, can be complemented by $1 - y_i$, which amounts to replacing the column $\tilde{G}_i$ and the right-hand side with $-\tilde{G}_i$ and $\tilde{b} - \tilde{G}_i$, respectively. The reduced LP constraint set after removing the fixed binary variables becomes

$$
\tilde{A}x + \tilde{G}^F y^F \le \tilde{b}^F
\iff
\begin{cases}
Ax + \sum_{i \in F} G_i y_i \le b - \sum_{i \in F_1} G_i, \\
\nabla_x g(\bar{x},\bar{y})^{T} x + \sum_{i \in F} \nabla_{y_i} g(\bar{x},\bar{y})\, y_i \le \nabla g(\bar{x},\bar{y})^{T}\begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} - g(\bar{x},\bar{y}) - \sum_{i \in F_1} \nabla_{y_i} g(\bar{x},\bar{y}), \\
-y_i \le 0, \quad i \in F, \\
y_i \le 1, \quad i \in F,
\end{cases}
$$

where $\tilde{A} \in \mathbb{R}^{(m+2|F|+l)\times n}$, $\tilde{G}^F \in \mathbb{R}^{(m+2|F|+l)\times |F|}$, and $\tilde{b}^F \in \mathbb{R}^{m+2|F|+l}$. Then, the feasible region of the above LP can be reformulated as

$$
K^F = \{(x, y^F) \in \mathbb{R}^{n+|F|} : \tilde{A}x + \tilde{G}^F y^F \le \tilde{b}^F\}.
$$

Mixed-Integer Nonlinear Optimization: A Disjunctive Cutting Plane Approach, Figure 1
Linear approximation of the mixed-integer nonlinear optimization problem, where the mixed-integer convex set is described by a continuous variable and a binary variable
[Plot: the constraint $g_i(x,y) \le 0$, its linearization, the cut proposed in [9], and the cut proposed in [10].]


Let $(\bar{x}, \bar{y})$ be the optimal solution obtained when solving NLP$(C, F_0, F_1)$. First, we assume that $(\bar{x}, \bar{y})$ is not feasible for problem (P), and let $j$ be the index of a binary variable such that $0 < \bar{y}_j < 1$. In [11], the following disjunctive cut generation linear program is given, where $N = \{1,\dots,n\}$:
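$$
\begin{aligned}
\min\ & \sum_{i \in N} z_i + \sum_{i \in F} w_i \\
\text{s.t. } & x = u_0 + u_1, \quad y^F = v_0 + v_1, \\
& \tilde{A}u_0 + \tilde{G}^F v_0 - \tilde{b}^F \lambda_0 \le 0, \\
& v_{0,j} \le 0, \\
& \tilde{A}u_1 + \tilde{G}^F v_1 - \tilde{b}^F \lambda_1 \le 0, \\
& -v_{1,j} \le -\lambda_1, \\
& \lambda_0 + \lambda_1 = 1, \\
& -z + x \le \bar{x}, \quad -z - x \le -\bar{x}, \\
& -w + y^F \le \bar{y}^F, \quad -w - y^F \le -\bar{y}^F, \\
& \lambda_0, \lambda_1 \ge 0, \quad x, u_0, u_1, z \in \mathbb{R}^{n}, \quad y^F, v_0, v_1, w \in \mathbb{R}^{|F|}.
\end{aligned}
\tag{LP(F)}
$$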

This linear program has $4n + 4|F| + 2$ variables and $2m + 3n + 7|F| + 2l + 3$ equality or inequality constraints. After solving it, we obtain its solution, denoted by $(\hat{x}, \hat{y}^F, \hat{u}_0, \hat{v}_0, \hat{u}_1, \hat{v}_1, \hat{z}, \hat{w}, \hat{\lambda}_0, \hat{\lambda}_1)$, as well as the dual multipliers. Denote by $\pi_x, \pi_y^F$ the multipliers for the equality constraints, by $\sigma_0, \sigma_1$ those for the disjunctive inequality constraints, by $\mu_0, \mu_1$ those for the original inequality constraints, and by $\epsilon^{+}, \epsilon^{-}, \zeta^{+}, \zeta^{-}$ those for the additional constraints in LP(F). The cut generated in Theorem 2, i.e., $\alpha^{T}x + \beta^{T}y^F \le \gamma$, can be expressed through the dual multipliers and the primal solution of LP(F) as $\pi_x^{T}x + (\pi_y^F)^{T}y^F \le \pi_x^{T}\hat{x} + (\pi_y^F)^{T}\hat{y}^F$. Since the feasible set of problem (P) at node $(F_0, F_1)$ is contained in the feasible set of the MILP at that node, the inequality $\alpha^{T}x + \beta^{T}y^F \le \gamma$ is valid and proper for problem (P) at the current node $(F_0, F_1)$ and at its descendants in which the variables in $F_0 \cup F_1$ remain fixed.
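The variable and constraint counts quoted for LP(F) can be checked block by block. The sketch below is our reading of LP(F): it assumes the reduced system has $m + 2|F| + l$ rows, as stated for $\tilde{A}$, $\tilde{G}^F$, $\tilde{b}^F$, and that the sign conditions $\lambda_0, \lambda_1 \ge 0$ are not counted as constraints, which is the convention that reproduces the stated total. The function name is hypothetical.

```python
def lp_f_size(n, m, l, nF):
    """(variables, constraints) of LP(F); nF = |F| free binary variables."""
    variables = {
        "x, u0, u1, z (each in R^n)": 4 * n,
        "y^F, v0, v1, w (each in R^|F|)": 4 * nF,
        "lambda0, lambda1": 2,
    }
    rows = m + 2 * nF + l                  # rows of the reduced system
    constraints = {
        "x = u0 + u1": n,
        "y^F = v0 + v1": nF,
        "two copies of the reduced system": 2 * rows,
        "v0_j <= 0 and -v1_j <= -lambda1": 2,
        "lambda0 + lambda1 = 1": 1,
        "-z + x <= x_bar and -z - x <= -x_bar": 2 * n,
        "-w + y^F <= y_bar^F and -w - y^F <= -y_bar^F": 2 * nF,
    }
    return sum(variables.values()), sum(constraints.values())


n, m, l, nF = 5, 4, 2, 3
print(lp_f_size(n, m, l, nF))  # equals (4n + 4|F| + 2, 2m + 3n + 7|F| + 2l + 3)
```

Summing the constraint blocks gives $n + |F| + 2(m + 2|F| + l) + 3 + 2n + 2|F|$, which simplifies to the stated $2m + 3n + 7|F| + 2l + 3$.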


Cut Lifting 


An important advantage of the cut generated by the lift-and-project technique is that the dual multipliers obtained along with the solution $(\hat{x}, \hat{y}^F)$ of LP(F) can be used to calculate closed-form expressions for the coefficients $\beta_i$ of the binary variables in the index set $F_0 \cup F_1$. First, we lift the inequality obtained at the current node into the complemented original space of the MILP problem, that is,

$$
\{(x,y) \in \mathbb{R}^{n+q} : \tilde{A}x + \tilde{G}y \le \tilde{b},\ y \in \{0,1\}^{q}\}.
$$

Let $j \in \{1,\dots,q\}$ be an index such that $0 < \bar{y}_j < 1$, and consider the inequality $\alpha^{T}x + \beta^{T}y \le \gamma$ generated over the complemented original space of the MILP problem; this amounts to solving the linear program LP(Q), with $Q = \{1,\dots,q\}$:


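$$
\begin{aligned}
\min\ & \sum_{i \in N} z_i + \sum_{i \in Q} w_i \\
\text{s.t. } & x = u_0 + u_1, \quad y = v_0 + v_1, \\
& \tilde{A}u_0 + \tilde{G} v_0 - \tilde{b} \lambda_0 \le 0, \\
& v_{0,j} \le 0, \\
& \tilde{A}u_1 + \tilde{G} v_1 - \tilde{b} \lambda_1 \le 0, \\
& -v_{1,j} \le -\lambda_1, \\
& \lambda_0 + \lambda_1 = 1, \\
& -z + x \le \bar{x}, \quad -z - x \le -\bar{x}, \\
& -w + y \le \bar{y}, \quad -w - y \le -\bar{y}, \\
& \lambda_0, \lambda_1 \ge 0, \quad x, u_0, u_1, z \in \mathbb{R}^{n}, \quad y, v_0, v_1, w \in \mathbb{R}^{q}.
\end{aligned}
\tag{LP(Q)}
$$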


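Note that this linear program has more variables and constraints than LP(F). However, the optimal solution of LP(Q) can be obtained from the solution of LP(F) and its multipliers. Let $(\hat{x}, \hat{y}, \hat{u}_0, \hat{v}_0, \hat{u}_1, \hat{v}_1, \hat{z}, \hat{w}, \hat{\lambda}_0, \hat{\lambda}_1)$ be the solution of LP(Q). It can be constructed from the solution of LP(F) (denoted here with a superscript $F$) by padding the vectors indexed by the binary variables with zeros for the fixed variables:

$$
\hat{x} = \hat{x}^F,\ \hat{y} = (\hat{y}^F, 0),\ \hat{u}_0 = \hat{u}_0^F,\ \hat{v}_0 = (\hat{v}_0^F, 0),\ \hat{u}_1 = \hat{u}_1^F,\ \hat{v}_1 = (\hat{v}_1^F, 0),\ \hat{\lambda}_0 = \hat{\lambda}_0^F,\ \hat{\lambda}_1 = \hat{\lambda}_1^F,\ \hat{z} = \hat{z}^F,\ \hat{w} = (\hat{w}^F, 0).
$$

Then the corresponding dual multipliers of LP(Q), which also solve the dual linear program DLP(Q), can be constructed from those of DLP(F): the multipliers $\pi_x$ and $\pi_{y,i}$, $i \in F$, are carried over unchanged; the multipliers $\pi_{y,i}$ of the fixed variables $i \in F_0 \cup F_1$ are given in closed form as the minimum of two expressions in the column $\tilde{G}_i$ and the multipliers of the two disjunctive systems; and the remaining multiplier vectors are padded with zeros for the added rows. The inequality $\alpha^{T}x + \beta^{T}y \le \gamma$, described by $\pi_x^{T}x + \pi_y^{T}y \le \pi_x^{T}\hat{x} + \pi_y^{T}\hat{y}$, is valid for the entire enumeration tree and cuts away $(\bar{x}, \bar{y})$.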


Conclusion 


A branch-and-cut algorithm is introduced in this section to solve 0-1 mixed-integer nonlinear optimization problems, in which disjunctive cuts are generated and incorporated into an enumeration process. The lift-and-project cut generation is performed via linear programming, as opposed to the convex nonlinear approach used in [9]. This approach has the advantage of making cut generation computationally cheaper and of overcoming nondifferentiability problems.
A wide range of nonlinear optimization problems involve integer or discrete variables in addition to continuous variables. These classes of optimization problems arise from a variety of applications and are denoted as mixed integer nonlinear programming (MINLP) problems.

The integer variables can be used to model, for in- 
stance, sequences of events, alternative candidates, ex- 
istence or non-existence of units (in their zero-one rep- 
resentation), while discrete variables can model, for in- 
stance, different equipment sizes. The continuous vari- 
ables are used to model the input-output and inter- 
action relationships among individual units/operations 
and different interconnected systems. 

The nonlinear nature of these mixed integer opti- 
mization problems may arise from: 

i) nonlinear relations in the integer domain exclu- 
sively (e.g., products of binary variables in the 
quadratic assignment model); 

ii) nonlinear relations in the continuous domain only 
(e.g., complex nonlinear input-output model in 
a distillation column or reactor unit); 

iii) nonlinear relations in the joint integer-continuous domain (e.g., products of continuous and binary variables in the scheduling/planning of batch processes and retrofit of heat recovery systems).

The book [88] studies mixed integer linear optimization and combinatorial optimization, while [40] studies mixed integer nonlinear optimization problems.

The coupling of the integer domain and the continuous domain with their associated nonlinearities makes the class of MINLP problems very challenging from the theoretical, algorithmic, and computational points of view. Mixed integer nonlinear optimization problems are encountered in a variety of applications in all branches of engineering and applied science, applied mathematics, and operations research. These represent very important and active research areas that include:

• process synthesis
- heat exchanger networks
- retrofit of heat recovery systems
- distillation sequencing
- mass exchange networks
- reactor-based systems
- reactor-separator-recycle systems
- utility systems
- total process systems
- metabolic engineering
• process design
- reactive distillation
- design of dynamic systems
- plant layout
- environmental design
• process synthesis and design under uncertainty
- uncertainty analysis
- dynamic systems
- batch plant design
• molecular design
- solvent selection
- design of polymers and refrigerants
- property prediction under uncertainty
• interaction of design, synthesis and control
- steady state operation
- dynamic operation
• process operations
- scheduling of multiproduct plants
- design and retrofit of multiproduct plants
- synthesis, design and scheduling of multipurpose plants
- planning under uncertainty
• facility location and allocation
• facility planning and scheduling
• topology of transportation networks
The applications in the area of process synthesis in 
chemical engineering include: 
i) the synthesis of grassroot heat recovery networks 
[24,25,43,138,139,140]; 
ii) the retrofit of heat exchanger systems [25,95]; 
iii) the synthesis of distillation-based separation sys- 
tems [8,9,90,102,104,131]; 
iv) the synthesis of mass exchange networks [54,99]; 
v) the synthesis of complex reactor networks [71,73,74,119];
vi) the synthesis of reactor-separator-recycle systems [72];
vii) the synthesis of utility systems [65];
viii) the synthesis of total process systems [28,29,68,69,75,76,98]; and
ix) the analysis and synthesis of metabolic pathways 
[30,58,59,107]. 
Reviews of the mixed integer nonlinear optimization 
frameworks and applications in Process Synthesis are 
provided in [40,49,50], and [7], while algorithmic ad- 
vances for logic and global optimization in Process Syn- 
thesis are reviewed in [44]. 
The MINLP applications in the area of process de- 
sign include: 
i) reactive distillation processes [26]; 
ii) design of dynamic systems [11,14,117,118]; 
iii) plant layout systems [47,105]; and 
iv) environmentally benign systems [27,123]. 
The MINLP applications in the area of process synthesis 
and design under uncertainty include: 
i) deterministic and stochastic uncertainty analysis 
[153651]; 
ii) design of dynamic systems under uncertainty 
[31,85]; and 
iii) design of batch processes under uncertainty 
[57,63,108,109]. 
In the area of molecular design, the MINLP applications 
include: 
i) the computer-aided molecular design aspects of se- 
lecting the best solvents [91]; 
ii) design of polymers and refrigerants [21,22,23,35, 
80,111,126]; and 
iii) property prediction under uncertainty [81]. 


The MINLP applications in the area of interaction of design, synthesis and control include:
i) studies under steady state operation of chemical processes [78,79,96,97]; and
ii) studies under dynamic operation [85,806,118].

Applications of MINLP approaches have also emerged in the area of process operations and include:
i) short term scheduling of batch and semicontinuous processes [85,143];
ii) the design of multiproduct plants [17,18,53];
iii) the synthesis, design and scheduling of multipurpose plants [13,36,37,93,94,116,127,128,132,133,137]; and
iv) planning under uncertainty [62,63,64,77,106].

Reviews of the advances in the design, scheduling and planning of batch plants can be found in [52,113], while a collection of recent contributions can be found in the proceedings of the 1998 FOCAPO meeting.

MINLP applications have also received significant attention in other engineering disciplines. These include

i) the facility location in a multi-attribute space [45]; 

ii) the optimal unit allocation in an electric power sys- 
tem [16]; 

iii) the facility planning of an electric power generation 
[19,114]; 

iv) the chip layout and compaction [32]; 

v) the topology optimization of transportation net- 
works [60]; and 

vi) the optimal scheduling of thermal generating units 
[48]. 


Mathematical Description 


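The general algebraic MINLP formulation can be stated as:

$$
\begin{aligned}
\min_{x,y}\ & f(x,y) \\
\text{s.t. } & h(x,y) = 0, \\
& g(x,y) \le 0, \\
& x \in X \subseteq \mathbb{R}^{n}, \\
& y \in Y \ \text{integer}.
\end{aligned}
\tag{1}
$$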


Here $x$ represents a vector of $n$ continuous variables (e.g., flows, pressures, compositions, temperatures, sizes of units), and $y$ is a vector of integer variables (e.g., alternative solvents or materials); $h(x,y) = 0$ denotes the $m$ equality constraints (e.g., mass, energy balances, equilibrium relationships); $g(x,y) \le 0$ denotes the $p$ inequality constraints (e.g., specifications on purity of distillation products, environmental regulations, feasibility constraints in heat recovery systems, logical constraints); and $f(x,y)$ is the objective function (e.g., annualized total cost, profit, thermodynamic criteria).


Remark 1 The integer variables $y$ with given lower and upper bounds

$$
y^{L} \le y \le y^{U}
$$

can be expressed through 0-1 (i.e., binary) variables, denoted as $z$, by the following formula:

$$
y = y^{L} + z_1 + 2z_2 + 4z_3 + \cdots + 2^{N-1} z_N,
$$

where $N$ is the minimum number of 0-1 variables needed. This minimum number is given by

$$
N = 1 + \mathrm{INT}\left(\frac{\log(y^{U} - y^{L})}{\log 2}\right),
$$

where the INT function truncates its real argument to an integer value.
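Remark 1 translates directly into code. The sketch below (the function names are illustrative) computes $N$ with Python's `int.bit_length()`, which equals $1 + \mathrm{INT}(\log(y^U - y^L)/\log 2)$ whenever $y^U - y^L \ge 1$ but avoids floating-point round-off in the logarithm ratio, and then encodes and decodes an integer value.

```python
def num_binaries(y_lo: int, y_hi: int) -> int:
    """Minimum N such that y_lo + z_1 + 2 z_2 + ... + 2^(N-1) z_N covers [y_lo, y_hi]."""
    span = y_hi - y_lo
    if span < 1:
        raise ValueError("need y_hi > y_lo")
    return span.bit_length()              # = 1 + floor(log2(span)) for span >= 1


def encode(y: int, y_lo: int, n: int) -> list:
    """0-1 digits [z_1, ..., z_N] with y = y_lo + sum_k 2^(k-1) z_k."""
    d = y - y_lo
    if not 0 <= d < 2 ** n:
        raise ValueError("y is outside the representable range")
    return [(d >> k) & 1 for k in range(n)]


def decode(z: list, y_lo: int) -> int:
    return y_lo + sum(zk << k for k, zk in enumerate(z))


N = num_binaries(2, 9)                    # span 7 needs N = 3 binaries
print(N, encode(5, 2, N))                 # 3 [1, 1, 0]
```

Since $N$ binaries can represent values up to $y^L + 2^N - 1$, which may exceed $y^U$ (for example, $y^L = 0$ and $y^U = 8$ give $N = 4$ and a largest representable value of 15), the bound $y \le y^U$ still has to be imposed on the expansion.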


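Then, formulation (1) can be written in terms of 0-1 variables:

$$
\begin{aligned}
\min_{x,y}\ & f(x,y) \\
\text{s.t. } & h(x,y) = 0, \\
& g(x,y) \le 0, \\
& x \in X \subseteq \mathbb{R}^{n}, \\
& y \in \{0,1\}^{q},
\end{aligned}
\tag{2}
$$

where $y$ now is a vector of $q$ 0-1 variables (e.g., existence of a process unit ($y_i = 1$) or nonexistence ($y_i = 0$)).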


Challenges in MINLP 


Dealing with mixed integer nonlinear optimization models of the form (1) or (2) presents two major challenges. These difficulties are associated with the nature of the problem, namely, the combinatorial domain (y-domain) and the continuous domain (x-domain).

As the number of binary variables y in (2) increases, one is faced with a large combinatorial problem, and complexity analysis results characterize MINLP problems as NP-complete [88]. At the same time, due to the nonlinearities, MINLP problems are in general nonconvex, which implies the potential existence of multiple local solutions. The determination of a global solution of nonconvex MINLP problems is also NP-hard, since even the global optimization of constrained nonlinear programming problems can be NP-hard [100], and even quadratic problems with one negative eigenvalue are NP-hard [101]. An excellent book on complexity issues for nonlinear optimization is [129].

Despite the aforementioned discouraging results 
from complexity analysis, which are worst-case results, 
significant progress has been achieved in the MINLP 
area from the theoretical, algorithmic, and computa- 
tional perspective. As a result, several algorithms have 
been proposed for convex and nonconvex MINLP 
models, their convergence properties have been inves- 
tigated, and a large number of applications now exist 
that cross the boundaries of several disciplines. In the 
sequel, we will discuss these developments. 


Overview of Local Optimization Approaches 
for Convex MINLP Models 


A representative collection of local MINLP algorithms developed for solving convex MINLP models of the form (1) or restricted classes of (2) includes the following:

1) generalized Benders decomposition, GBD, [42,46, 
103]; 

2) outer approximation, OA, [34]; 

3) outer approximation with equality relaxation, 
OA/ER, [67]; 

4) outer approximation with equality relaxation and 
augmented penalty, OA/ER/AP, [131]; 

5) generalized outer approximation, GOA, [38]; 

6) generalized cross decomposition, GCD, [61]; 

7) branch and bound, BB, [15,20,39,55,92,110]; 

8) feasibility approach, FA, [82]; 

9) extended cutting plane, ECP, [134,135]; 

10) logic-based approaches, [124,130]. 

In the pioneering work [46] on generalized Benders decomposition, GBD, two sequences of updated upper (nonincreasing) and lower (nondecreasing) bounds are created that converge within $\epsilon$ in a finite number of iterations. The upper bounds correspond to solving subproblems in the x variables by fixing the y variables, while the lower bounds are based on duality theory.
The outer approximation, OA, addresses problems with nonlinear inequalities and creates sequences of upper and lower bounds as GBD does, but it has the distinct feature of using primal information, that is, the solution of the upper bound problems, to linearize the objective and constraints around that point. The lower bounds in OA are based upon the accumulation of the linearized objective function and constraints around the generated primal solution points.

The OA/ER algorithm extends the OA to handle 
nonlinear equality constraints by relaxing them into in- 
equalities according to the sign of their associated mul- 
tipliers. 

The OA/ER/AP algorithm introduces an aug- 
mented penalty function in the lower bound subprob- 
lems of the OA/ER approach. 

The generalized outer approximation, GOA, ex- 
tends the OA to the MINLP problems that the GBD ad- 
dresses and introduces exact penalty functions. 

The generalized cross decomposition, GCD, simul- 
taneously utilizes primal and dual information by ex- 
ploiting the advantages of Dantzig-Wolfe and general- 
ized Benders decomposition. 

An overview of these local MINLP algorithms, together with an extensive treatment of the theory, algorithms, and applications of GBD, OA, OA/ER, OA/ER/AP, GOA, and GCD, can be found in [40].

The branch and bound, BB, approaches start by 
solving the continuous relaxation of the MINLP and 
subsequently perform an implicit enumeration where 
a subset of the 0-1 variables is fixed at each node. The 
lower bound corresponds to the NLP solution at each 
node and it is used to expand on the node with the 
lowest lower bound or it is used to eliminate nodes 
if the lower bound exceeds the current upper bound. 
If the continuous relaxation of the MINLP (an NLP in most cases, with the exception of the algorithm of [110], where an LP problem is obtained) has a 0-1 solution for the y variables, then the BB algorithm terminates at that node. By a similar argument, if a tight NLP relaxation results at the first node of the tree, then the number of nodes that need to be eliminated can be low. However, loose NLP relaxations may result
in having a large number of NLP subproblems to be 
solved. The algorithm terminates when the lowest lower 
bound is within a prespecified tolerance of the best up- 
per bound. 


The feasibility approach, FA, rounds the relaxed 
NLP solution to an integer solution with the least lo- 
cal degradation by successively forcing the superba- 
sic variables to become nonbasic based on the reduced 
cost information. The premise of this approach is that 
the problems to be treated are sufficiently large so that 
techniques requiring the solution of several NLP relax- 
ations, such as the branch and bound approach, have 
prohibitively large costs. The approach therefore accounts for the presence of the integer variables in the formulation and solves the mixed integer problem directly.
This is achieved by fixing most of the integer variables 
to one of their bounds (the nonbasic variables) and al- 
lowing the remaining small subset (the basic variables) 
to take discrete values in order to identify feasible so- 
lutions. After each iteration, the reduced costs of the 
variables in the nonbasic set are computed to measure 
their effect on the objective function. If a change causes 
the objective function to decrease, the appropriate vari- 
ables are removed from the nonbasic set and allowed 
to vary for the next iteration. When no more improve- 
ment in the objective function is possible, the algorithm 
is terminated. This strategy leads to the identification of 
a local solution. 

The cutting plane algorithm proposed in [66] for 
NLP problems has been extended to MINLPs [134,135]. 
The ECP algorithm relies on the linearization of one of 
the nonlinear constraints at each iteration and the so- 
lution of the increasingly tight MILP made up of these 
linearizations. The solution of the MILP problem pro- 
vides a new point on which to base the choice of the 
constraint to be linearized for the next iteration of the 
algorithm. The ECP does not require the solution of any NLP problems for the generation of an upper bound, so all information about the nonlinear functions enters only through linearizations. As a result, a large number of linearizations may be required to approximate highly nonlinear problems, and the algorithm does not perform well in such cases.
Due to the use of linearizations, convergence to the 
global optimum solution is guaranteed only for prob- 
lems involving inequality constraints which are convex 
in the x and relaxed y-space. 
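The ECP loop can be sketched on an assumed toy instance: minimize $-x - 2y$ subject to the convex constraint $x^2 + y^2 - 5 \le 0$, with $x \in [0, 3]$ continuous and $y \in \{0, 1, 2, 3\}$. Each iteration solves the MILP made up of the linearizations collected so far, linearizes the most violated nonlinear constraint at the MILP solution (the selection rule is an assumption of this sketch), and stops once the violation is below a tolerance. Because the toy has a single continuous variable, the MILP is solved by enumerating $y$ and reading off the best $x$ from an interval; a real implementation would call an MILP solver instead. The function names are hypothetical.

```python
def solve_milp(cuts, c, d, x_box, y_values):
    """min c*x + d*y s.t. a*x + b*y <= r for (a, b, r) in cuts, x in x_box, y in y_values."""
    best = None
    for y in y_values:
        lo, hi = x_box
        feasible = True
        for a, b, r in cuts:
            if a > 0:
                hi = min(hi, (r - b * y) / a)
            elif a < 0:
                lo = max(lo, (r - b * y) / a)
            elif b * y > r:
                feasible = False
        if not feasible or lo > hi:
            continue
        x = lo if c >= 0 else hi            # linear objective: pick an endpoint
        if best is None or c * x + d * y < best[0]:
            best = (c * x + d * y, x, y)
    return best


def ecp(constraints, c, d, x_box, y_values, tol=1e-8, max_iter=100):
    """constraints: list of (g, grad) pairs, each g convex and required g(x, y) <= 0."""
    cuts = []
    for _ in range(max_iter):
        sol = solve_milp(cuts, c, d, x_box, y_values)
        if sol is None:
            return None                      # the outer approximation is infeasible
        _, x, y = sol
        g, grad = max(constraints, key=lambda gc: gc[0](x, y))
        viol = g(x, y)
        if viol <= tol:
            return x, y                      # all nonlinear constraints satisfied
        gx, gy = grad(x, y)
        # g(xk, yk) + gx*(x - xk) + gy*(y - yk) <= 0, rearranged as gx*x + gy*y <= r
        cuts.append((gx, gy, gx * x + gy * y - viol))
    raise RuntimeError("ECP did not converge within max_iter")


circle = (lambda x, y: x * x + y * y - 5.0, lambda x, y: (2.0 * x, 2.0 * y))
print(ecp([circle], c=-1.0, d=-2.0, x_box=(0.0, 3.0), y_values=range(4)))
```

On this instance the first MILPs pick $y = 3$ until a cut excludes it; after that, the cuts at $y = 2$ behave like Newton steps on $x^2 - 1$, so $x$ decreases toward 1 from above and the loop stops at a point with $y = 2$ and $x$ close to 1.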

An alternative to the direct solution of the MINLP 
problem was proposed by [124]. Their approach stems 
from the work of [70] on a modeling/decomposition 
strategy which avoids the zero-flows generated by the 
nonexistence of a unit in a process network. The first 
stage of the algorithm is the reformulation of the 
MINLP into a generalized disjunctive program. A vector of Boolean variables indicates the status of each disjunction (True or False), and these variables are associated with the
alternatives. The set of disjunctions allows the repre- 
sentation of several alternatives. A set of logical rela- 
tionships between the Boolean variables is introduced. 
Instead of resorting to binary variables within a single 
model, the disjunctions are used to generate a different 
model for each alternative. Since all continuous vari- 
ables associated with the nonexisting alternatives are 
set to zero, this representation helps to reduce the size 
of the problems to be solved. Two algorithms are sug- 
gested by [124]. They are logic-based variants of the 
outer approximation and generalized Benders decom- 
position. [130] introduced LOGMIP, a computer code 
for disjunctive programming and MINLP problems, 
and studied modeling alternatives and process synthe- 
sis applications. 


Overview of Global Optimization Approaches 
for Nonconvex MINLP Models 


In the previous section we discussed local MINLP algorithms, which are applicable to convex MINLP models. While identification of the global solution can be guaranteed for convex problems, often only a local solution is obtained for nonconvex problems. The recent book [41] discusses the theoretical, algorithmic, and applications-oriented advances in the global optimization of mixed integer nonlinear models. A number of global MINLP algorithms that have been developed to address different types of nonconvex MINLPs are presented in this section. These include:

1) Branch and reduce approach, [115]; 

2) interval analysis based approach, [125]; 

3) extended cutting plane approach, [135,136]; 

4) reformulation/spatial branch and bound approach, 
[121,122]; 

5) hybrid branch and bound and outer approximation 
approach, [141,142]; 

6) The SMIN-αBB approach, [2,4];

7) The GMIN-αBB approach, [2,4].

In the sequel, we will briefly discuss approaches 1)-7).


Branch and Reduce Algorithm 


[115] extended the scope of branch and bound algorithms to problems for which valid convex underestimating NLPs can be constructed for the nonconvex relaxations. The range of application of the proposed algorithm encompasses bilinear problems and separable problems involving functions for which convex underestimators can be built [10,83]. Because the nonconvex NLPs must be underestimated at each node, convergence can only be achieved if the continuous variables are branched on. A number of tests are suggested to accelerate the reduction of the solution space. They are summarized in the following.


Optimality Based Range Reduction Tests 


For the first set of tests, an upper bound $U$ on the nonconvex MINLP must be computed and a convex lower bounding NLP must be solved to obtain a lower bound $L$. If a bound constraint for variable $x_j$, with $x_j^L \le x_j \le x_j^U$, is active at the solution of the convex NLP and has multiplier $\lambda_j^* > 0$, the bounds on $x_j$ can be updated as follows:

1) If $x_j - x_j^U = 0$ at the solution of the convex NLP and $\kappa_j = x_j^U - (U - L)/\lambda_j^*$ is such that $\kappa_j > x_j^L$, then $x_j^L = \kappa_j$.

2) If $x_j - x_j^L = 0$ at the solution of the convex NLP and $\kappa_j = x_j^L + (U - L)/\lambda_j^*$ is such that $\kappa_j < x_j^U$, then $x_j^U = \kappa_j$.

If neither bound constraint is active at the solution of the convex NLP for some variable $x_j$, the problem can be solved by setting $x_j = x_j^U$ or $x_j = x_j^L$. Tests similar to those presented above are then used to update the bounds on $x_j$.
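The two optimality-based updates can be written as a small bound-tightening routine. This is a sketch only: it assumes the convex relaxation has already been solved and supplies, for a variable whose bound constraint is active, which bound is active and its multiplier $\lambda_j^*$; the function name and its arguments are hypothetical.

```python
def tighten_bounds(x_lo, x_hi, active, lam, U, L):
    """Optimality-based range reduction for one variable x_j.

    active: "upper" if x_j = x_j^U at the convex NLP solution, "lower" if x_j = x_j^L.
    lam:    multiplier lambda_j^* of that active bound constraint.
    U, L:   upper bound on the MINLP and lower bound from the convex NLP.
    """
    if lam <= 0:
        return x_lo, x_hi                   # the tests require a positive multiplier
    if active == "upper":
        kappa = x_hi - (U - L) / lam
        if kappa > x_lo:
            x_lo = kappa                    # rule 1: raise the lower bound
    elif active == "lower":
        kappa = x_lo + (U - L) / lam
        if kappa < x_hi:
            x_hi = kappa                    # rule 2: lower the upper bound
    return x_lo, x_hi


print(tighten_bounds(0.0, 10.0, "upper", lam=0.5, U=5.0, L=3.0))  # (6.0, 10.0)
```

A large multiplier and a small gap $U - L$ remove more of the range; a small multiplier can push $\kappa_j$ past the opposite bound, in which case the bounds are left unchanged.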


Feasibility Based Range Reduction Tests 


In addition to ensuring that tight bounds are available 
for the variables, the underestimators of the constraints 
are used to generate new constraints for the problem. 
Consider the constraint $g_i(x,y) \le 0$. If its convex underestimator is active (equal to zero) at the solution of the convex NLP and its multiplier is $\mu_i^* > 0$, the constraint

$$
g_i(x,y) \ge -\frac{U - L}{\mu_i^*}
$$

can be included in subsequent problems.
The branch and reduce algorithm has been tested 
on a set of small problems. 
Interval Analysis Based Approach


An approach based on interval analysis was proposed 
by [125] to solve to global optimality problems with 
a twice-differentiable objective function and once- 
differentiable constraints. Interval arithmetic allows the 
computation of guaranteed ranges for these functions 
[87,89,112]. The approach relies on the same concepts 
of successive partitioning of the domain and bound- 
ing of the objective function, while the branching takes 
place on the discrete and continuous variables. The 
main difference with the branch and bound algorithms 
is that bounds on the problem solution in a given do- 
main are not obtained through optimization. Instead, 
they are based on the range of the objective function 
in the domain under consideration, as computed with 
interval arithmetic. As a consequence, these bounds 
may be quite loose and efficient fathoming techniques 
are required in order to enhance convergence. [125] 
suggested node fathoming tests and branching strate- 
gies which are outlined in the sequel. Convergence 
is declared when best upper and lower bounds are 
within a prespecified tolerance and when the width of 
the corresponding region is below a prespecified toler- 
ance. 


Node Fathoming Tests 


The upper-bound test is a classical criterion used in 
all branch and bound schemes: If the lower bound for 
a node is greater than the best upper bound for the 
MINLP, the node can be fathomed. 

The infeasibility test is also used by all branch and bound algorithms. However, the identification of infeasibility using interval arithmetic differs from its identification using optimization schemes. An inequality constraint $g_i(x,y) \le 0$ is declared infeasible if its interval inclusion over the current domain is positive, that is, if the lower end of the inclusion is greater than zero. If a constraint is found to be infeasible, the current node is fathomed.

The monotonicity test is used in interval-based approaches. If a region is feasible, the monotonicity properties of the objective function can be tested. For this
purpose, the inclusions of the gradients of the objec- 
tive with respect to each variable are evaluated. If all the 
gradients have a constant sign for the current region, 
the objective function is monotonic and only one point 
needs to be retained from the current node. 


The nonconvexity test is used to test the existence
of a solution (local or global) within a region. If such
a point exists, the Hessian matrix of the objective function
at this point must be positive semidefinite. Since
nonnegative diagonal elements are necessary for positive
semidefiniteness, a region can be eliminated when the
interval inclusion of at least one diagonal element of
its interval Hessian matrix is strictly negative.
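The four interval-based node fathoming tests above can be sketched in Python as follows. This is a minimal illustration, assuming that the interval inclusions of the objective, of each constraint $g_j$, of each gradient component and of each diagonal Hessian element over the current region have already been computed by some interval arithmetic routine (not shown); the `Interval` class and all function names are hypothetical. The nonconvexity check is coded as the usual necessary-condition test: a region is discarded when some diagonal element of the interval Hessian is negative throughout the region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] produced by interval arithmetic."""
    lo: float
    hi: float


def upper_bound_test(obj: Interval, best_upper: float) -> bool:
    # Fathom if the node's lower bound exceeds the best upper bound found so far.
    return obj.lo > best_upper


def infeasibility_test(constraints: list[Interval]) -> bool:
    # g_j(x, y) <= 0 cannot hold anywhere in the region if its inclusion is strictly positive.
    return any(g.lo > 0.0 for g in constraints)


def monotonicity_test(gradient: list[Interval]) -> bool:
    # The objective is monotonic when every gradient component keeps one sign over the region.
    return all(d.lo >= 0.0 or d.hi <= 0.0 for d in gradient)


def nonconvexity_test(hessian_diag: list[Interval]) -> bool:
    # A PSD Hessian needs nonnegative diagonal entries; fathom if one is negative throughout.
    return any(h.hi < 0.0 for h in hessian_diag)
```

For example, `upper_bound_test(Interval(5.0, 9.0), 4.0)` returns `True`, so a node whose objective inclusion starts at 5 is discarded when the incumbent is 4.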

[125] suggested two additional tests to accelerate the
fathoming process. The first, called the lower bound
test, requires the computation of a valid lower bound
on the objective function at a node through a method
other than interval arithmetic. If the best upper bound
is less than this lower bound, the region can be eliminated.
The second test, the distrust region method, aims to 
help the algorithm identify infeasible regions so that 
they can be removed from consideration. Based on the 
knowledge of an infeasible point, interval arithmetic is 
used to identify an infeasible hypercube centered on 
that point. 


Branching Strategies 


The variable with the widest range is selected for
branching. It can be a continuous or a discrete variable.
In order to determine where to split the chosen variable,
a relaxation of the MINLP is solved locally.

• Continuous Branching Variable: If the optimal
value of the continuous branching variable, $x^*$, is
equal to one of the variable bounds, branch at the
midpoint of the interval. Otherwise, branch at
$x^* - \beta$, where $\beta$ is a very small scalar.

• Discrete Branching Variable: If the optimal value of
the discrete branching variable, $y^*$, is equal to the
upper bound on the variable, define a region with
$y = y^*$ and one with $y^L \le y \le y^* - 1$, where $y^L$
is the lower bound on $y$. Otherwise, create two regions
$y^L \le y \le \mathrm{int}(y^*)$ and $\mathrm{int}(y^*) + 1 \le y \le y^U$,
where $y^U$ is the upper bound on $y$.
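The two branching rules can be written as small helper functions. This is a sketch under stated assumptions: the relaxed optimum ($x^*$ or $y^*$) is taken as given, `beta` plays the role of the small scalar $\beta$, and $\mathrm{int}(y^*)$ is implemented with `math.floor`, which coincides with truncation for the nonnegative values assumed here. The function names are hypothetical.

```python
import math


def split_continuous(lo, hi, x_star, beta=1e-6):
    """Children of [lo, hi] for a continuous branching variable with relaxed optimum x_star."""
    if x_star == lo or x_star == hi:
        cut = 0.5 * (lo + hi)   # optimum at a bound: bisect the interval
    else:
        cut = x_star - beta     # otherwise split just below x*
    return (lo, cut), (cut, hi)


def split_discrete(y_lo, y_hi, y_star):
    """Children of {y_lo, ..., y_hi} for an integer branching variable with relaxed optimum y_star."""
    if y_star == y_hi:
        return (y_hi, y_hi), (y_lo, y_hi - 1)   # y = y* and y_lo <= y <= y* - 1
    k = math.floor(y_star)                     # int(y*) for nonnegative y*
    return (y_lo, k), (k + 1, y_hi)
```

For instance, `split_discrete(0, 5, 2.4)` yields the regions `(0, 2)` and `(3, 5)`.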

This algorithm has been tested on a small example
problem and a molecular design problem [125].


Extended Cutting Plane for Pseudoconvex MINLPs 


The use of the ECP algorithm for nonconvex MINLP
problems was suggested in [135], using a modified
algorithmic procedure as described in [136]. The main
changes occur in the generation of new constraints for
the MILP at each iteration (Step 4). In addition to the
construction of the linear function $l_k(x, y)$ at iteration
$k$, the following steps are taken:

1) Remove all constraints for which $l_i(x^k, y^k) > g_i(x^k, y^k)$.
These correspond to linearizations that did not
underestimate the corresponding nonlinear constraint
at all points, owing to the presence of nonconvexities.

2) Replace all constraints for which $l_i(x^k, y^k) = g_i(x^k, y^k) = 0$
by their linearization around $(x^k, y^k)$.

3) If constraint $i$ is such that $g_i(x^k, y^k) > 0$, add its
linearization around $(x^k, y^k)$.
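The three cut-management rules can be illustrated on continuous variables only. In this hypothetical sketch, each cut is a callable $l_i(x)$ built from a first-order Taylor expansion of $g_i$ around the point where it was generated, `constraints` maps a constraint index to a pair of callables (value, gradient), and a small tolerance `tol` stands in for the exact equalities of rule 2; the integer variables $y$ would be handled in the same way. The names `linearize` and `update_cuts` are assumptions for illustration.

```python
def linearize(g, grad, xk):
    """First-order cut l(x) = g(xk) + grad(xk)^T (x - xk)."""
    xk = tuple(xk)
    gk, dk = g(xk), grad(xk)
    return lambda x: gk + sum(d * (a - b) for d, a, b in zip(dk, x, xk))


def update_cuts(cuts, x, constraints, tol=1e-9):
    """Apply the three rules at the current MILP point x.

    cuts: list of (i, l) pairs; constraints: dict i -> (g_i, grad_g_i).
    """
    new_cuts = []
    for i, cut in cuts:
        g, grad = constraints[i]
        gx, lx = g(x), cut(x)
        if lx > gx + tol:
            continue                                     # rule 1: cut overestimates g_i
        if abs(lx) <= tol and abs(gx) <= tol:
            new_cuts.append((i, linearize(g, grad, x)))  # rule 2: relinearize at x
        else:
            new_cuts.append((i, cut))
    for i, (g, grad) in constraints.items():
        if g(x) > tol:
            new_cuts.append((i, linearize(g, grad, x)))  # rule 3: violated constraint
    return new_cuts
```

With the concave constraint $g(x) = 1 - x^2$, the cut generated at $x = 0$ equals 1 everywhere, so at $x = 2$ it overestimates $g = -3$ and rule 1 removes it.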

The convergence criterion is also modified. In addition
to the test used in Step 3, the following two conditions
must be met:

1) $(x^k - x^{k-1})^T (x^k - x^{k-1}) \le \delta$, a prespecified
tolerance.

2) $y^k - y^{k-1} = 0$.
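A minimal sketch of the modified stopping rule, assuming the outcome of the original Step 3 test is supplied as a Boolean (`base_test_passed`), since that test is not reproduced here; the function name is hypothetical.

```python
def ecp_converged(x_k, x_prev, y_k, y_prev, delta, base_test_passed):
    """Modified ECP stopping rule: Step 3 test plus the two extra conditions."""
    step_sq = sum((a - b) ** 2 for a, b in zip(x_k, x_prev))  # (x^k - x^{k-1})^T (x^k - x^{k-1})
    same_y = list(y_k) == list(y_prev)                         # y^k - y^{k-1} = 0
    return base_test_passed and step_sq <= delta and same_y
```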

The ECP algorithm for pseudoconvex MINLPs has
been used to address a trim loss problem arising in the
paper industry [136]. A comparative study between the
outer approximation, the generalized Benders decomposition
and the extended cutting plane algorithm for
convex MINLPs was presented in [120].


Reformulation/Spatial Branch 
and Bound Algorithm 


A global optimization algorithm of the branch and 
bound type was proposed in [121]. It can be applied 
to problems in which the objective and constraints are 
functions involving any combination of binary arith- 
metic operations (addition, subtraction, multiplication 
and division) and functions that are either concave over 
the entire solution space (such as ln) or convex over this
domain (such as exp). 

The algorithm starts with an automatic reformu- 
lation of the original nonlinear problem into a prob- 
lem that involves only linear, bilinear, linear fractional, 
simple exponentiation, univariate concave and univari- 
ate convex terms. This is achieved through the intro- 
duction of new constraints and variables. The reformu- 
lated problem is then solved to global optimality us- 
ing a branch and bound approach. Its special struc- 
ture allows the construction of a convex relaxation at 
each node of the tree. It should be noted that, due to
the introduction of many new constraints and variables,
the size of the convex relaxation of the reformulated
problem increases substantially, even for modest size
problems. The integer variables can be handled in two 
ways during the generation of the convex lower bound- 
ing problem. The integrality condition on the variables 
can be relaxed to yield a convex NLP which can then 
be solved globally. Alternatively, the integer variables 
can be treated directly and the convex lower bounding 
MINLP can be solved using a branch and bound algo- 
rithm. This second approach is more computationally 
intensive but is likely to result in tighter lower bounds 
on the global optimum solution. 

In order to obtain an upper bound on the optimum 
solution, a local MINLP algorithm can be used. Alter- 
natively, the MINLP can be transformed to an equiva- 
lent nonconvex NLP by relaxing the integer variables. 
For example, a variable $y \in \{0, 1\}$ can be replaced by
a continuous variable $z \in [0, 1]$ by including the
constraint $z - z \cdot z = 0$.
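A quick numerical check illustrates why the constraint $z - z \cdot z = 0$ recovers the binary condition while making the relaxed problem nonconvex: on $[0, 1]$ it is satisfied only at $z = 0$ and $z = 1$, and the midpoint of these two feasible points violates it. The grid below is an arbitrary illustrative choice.

```python
def residual(z):
    """Left-hand side of the relaxation constraint z - z*z = 0."""
    return z - z * z


grid = [k / 10 for k in range(11)]                 # 0.0, 0.1, ..., 1.0
feasible = [z for z in grid if abs(residual(z)) < 1e-12]
print(feasible)        # [0.0, 1.0]
print(residual(0.5))   # 0.25: the midpoint of two feasible points is infeasible
```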

This algorithm has been applied to reactor selection, 
distillation column design, nuclear waste blending, heat 
exchanger network design and multilevel pump config- 
uration problems. 


Hybrid Branch and Bound 
and Outer Approximation 


[142] proposed a global optimization MINLP approach 
for the synthesis of heat exchanger networks without 
stream splitting. This approach is a hybrid branch and 
bound with outer approximation. It is based on two 
alternative convex underestimators for the heat trans- 
fer area. The first type of these convex underestima- 
tors along with the variable bounds and techniques 
for the bound contraction are based on a thermody- 
namic analysis. The second type is based on a relax- 
ation and transformation so as to employ specific un- 
derestimation schemes. These convex underestimators 
result in a convex MINLP that is solved using the 
Outer Approximation approach and which provides 
valid lower bounds on the global solution. This ap- 
proach has been applied to five heat exchanger network
examples based on the MINLP model of [138],
which has linear constraints and a nonconvex objective
function.

[141] introduced a deterministic branch and con- 
tract approach for structured process systems that have 
univariate concave, bilinear and linear fractional terms. 
They proposed properties of the contraction operation
and studied their effect on several applications. 


The SMIN-αBB Algorithm


The SMIN-αBB global optimization algorithm, proposed
by [2], is designed to solve to global optimality
mathematical models where the binary/integer variables
appear linearly and hence separably from the continuous
variables and/or appear in at most bilinear
terms, while nonlinear terms in the continuous variables
appear separably from the binary/integer variables.
These mathematical models become:


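$$\begin{aligned} \min_{x,y}\ & f(x) + x^T A_0 y + c_0^T y \\ \text{s.t.}\ & h(x) + x^T A_1 y + c_1^T y = 0 \\ & g(x) + x^T A_2 y + c_2^T y \le 0 \\ & x \in X \subseteq \mathbb{R}^n \\ & y \in Y \ \text{integer}, \end{aligned} \qquad (3)$$

where $c_0^T$, $c_1^T$ and $c_2^T$ are constant vectors, $A_0$, $A_1$
and $A_2$ are constant matrices, and $f(x)$, $h(x)$ and $g(x)$ are
functions with continuous second-order derivatives.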

The theoretical, algorithmic and computational
studies of the SMIN-αBB algorithm are presented in
detail in [41].


The GMIN-αBB Algorithm


The GMIN-αBB global optimization algorithm proposed
in [2] operates within a branch and bound
framework. The main difference from the algorithms of
[56,92] and [20] is its ability to identify the global optimum
solution of a much larger class of problems of the
form

$$\begin{aligned} \min_{x,y}\ & f(x, y) \\ \text{s.t.}\ & h(x, y) = 0 \\ & g(x, y) \le 0 \\ & x \in X \subseteq \mathbb{R}^n \\ & y \in \mathbb{N}^q, \end{aligned}$$

where $\mathbb{N}$ is the set of nonnegative integers and the only
condition imposed on the functions $f(x, y)$, $g(x, y)$ and
$h(x, y)$ is that their continuous relaxations possess
continuous second-order derivatives. This increased
applicability results from the use of the αBB global
optimization algorithm for continuous twice-differentiable
NLPs [3,5,6,12].

The theoretical, algorithmic and computational
studies of the GMIN-αBB algorithm are presented in
detail in [41].
Abstract


This Chapter presents a novel, mixed integer nonlinear 
programming (MINLP) model for the well scheduling 
problem, where the nonlinear behavior of the reservoir, 
wells, pipelines and surface facilities has been incor- 
porated into the mathematical formulation. The well 
scheduling problem is formulated as a snapshot opti- 
mization problem with an objective function that ex- 
presses the maximization of an economic index. Dis- 
crete decisions here include the operational status of the 
wells, the allocation of wells to manifolds or separators
and the allocation of surface flowlines to separators.
Continuous decisions include the well oil flowrates and
the allocation of gas to gas lift wells.

A three-step solution strategy is proposed for the 
solution of this problem, where logic based relations 
and piecewise linear approximations of oil field wells 
are integrated in the MINLP formulation. The model 
is solved following an Outer Approximation (OA) class 
algorithm. A number of examples are presented to il- 
lustrate the performance and business value of the pro- 
posed strategy; a remarkable increase in oil production 
of up to 10% is demonstrated, compared to results ob- 
tained via widespread heuristic methods. A further in- 
crease of 2.9% can be achieved by dynamic optimiza- 
tion based on explicit consideration of the multiphase 
flow within the reservoirs of a particular oil field. 


Introduction 


In an era of globalized business operations, large and 
small oil and gas producers alike strive to foster prof- 
itability by improving the agility of exploration endeav- 
ors and the efficiency of oil production, storage and 
transport operations [7]. Consequently, they all face 
acute challenges: ever-increasing international produc- 
tion, intensified global competition, price volatility, op- 
erational cost reduction policies, aggressive financial 
goals (market share, revenue, cash flow and profitabil- 
ity) and strict environmental constraints (offshore ex- 
traction, low sulphur): all these necessitate a high level 
of oilfield modeling accuracy, so as to maximize recov- 
ery from certified reserves. Straightforward translation 
of all considerations to explicit mathematical objectives
and constraints can yield optimal oilfield network de- 
sign, planning and operation policies. Therefore, the 
foregoing goals and constraints should be explicitly in- 
corporated and easily revised if the generality of pro- 
duction optimization algorithms is to be preserved. 
This Chapter provides a summary of a new, efficient 
MINLP optimization formulation for well schedul- 
ing, and a novel strategy towards integration of equa- 
tion-oriented process modeling and multiphase reser- 
voir computational fluid dynamics (CFD), in order to 
include the dynamic behavior of reservoirs into oil and 
gas production models. 

The problem of fuel production optimization sub- 
ject to explicit oilfield constraints has attracted sig- 
nificant attention, documented in many petroleum 
engineering publications. A comprehensive literature 
review by Kosmidis [18] classifies previous algorithms 
in 3 broad categories (simulation, heuristic, and math- 
ematical programming methods) and underlines that 
most are applied either to simple pipeline networks of
modest size, relying on heuristic rules of limited
applicability, or are suitable only for special structures.
Reducing the computational burden (by focusing on naturally
flowing wells or gas-lift wells only, or by reducing the
number of discrete well-network connectivity variables)
is a crucial underlying pattern.

Dynamic oil and gas production systems simula- 
tion and optimization is a research trend which has 
the clear potential to meet the foregoing challenges of 
the international oil and gas industry and assist pro- 
ducers in achieving business goals and energy needs. 
Previous work [8,19,20,23] has successfully addressed
research challenges in this field, using appropriate
simplifying correlations [25] for two-phase flow of oil and
gas in production wells and pipelines. A series of as- 
sumptions are routinely adopted to achieve manageable 
computational complexity: the fundamental one is the 
steady-state assumption for the reservoir model, based 
on the enormous timescale difference between different 
spatial levels (oil and gas reservoir dynamics evolve on
the order of weeks, the respective timescales of pipeline
networks are on the order of minutes, and the production
optimization horizon is on the order of days). The
decoupling of reservoir simulation from surface facili- 
ties optimization is based on these timescale differences 
among production elements [2,25]. While the surface 
and pipeline facilities are in principle no different from 


those found in any petrochemical plant, sub-surface 
elements (reservoirs, wells) induce complexity which 
must be addressed via a systematic strategy that has not 
been hitherto proposed. 

In some petroleum fields, such as the Prudhoe 
Bay [22], a production well can be connected to dif- 
ferent manifolds that lead to different separators. In 
such fields, switching a well from one manifold to an- 
other could be an effective way to increase oil produc- 
tion and/or reduce production cost by making optimal 
use of the existing resources such as the capacity of 
separators [9]. However, for best results, the well con- 
nection must be optimized simultaneously with the well 
oil rate and gas lift rate. The corresponding optimization
problem is known as the well scheduling problem.


Problem Statement 


The well scheduling problem in integrated oil and gas 
production can be stated as follows: given are (i) a set 
of wells, which could be closed (shut in) or connected 
to manifolds or separators, (ii) a set of flowlines which 
could be connected to separators. The goal is to deter- 
mine: (i) the operational status of the wells, i.e. closed 
or open, (ii) the connection of wells to manifolds or sep- 
arators, (iii) the connection of flow lines to separators, 
(iv) the well oil flowrates and (v) the allocation of gas
to gas lift wells, which maximize the net revenue (oil 
sales minus the cost of gas compression), while satisfy- 
ing physical laws and operational constraints such as: 
(i) a well bore model, (ii) mass, energy and momentum 
balances throughout the production network, (iii) up- 
per and lower well oil rate constraints and minimum 
pressure constraints at the inlet and outlet of the flow- 
lines, (iv) maximum oil, gas and water capacity con- 
straints in the separators, (v) an upper bound on gas 
lift availability and (vi) a maximum number of well 
switches. The first step in order to determine the op- 
timal well configuration that maximizes the revenue 
is to develop a suitable superstructure that includes 
all the possible pipeline network configurations. Such 
a production network superstructure is shown in Fig. 1 
and includes the reservoir (R), the wells (W), the man- 
ifolds (M), and the separators (S) nodes as well as the 
potential connection of wells to manifolds or separa- 
tors and flowlines to separators. Two types of wells are 
considered: (i) type A wells that can be connected only 
to manifolds, and (ii) type B wells that can be connected
to separators. It must be noted that a feasible produc- 
tion network should satisfy the following requirements: 
(i) a type A well should be either shut in or else con- 
nected to one manifold, (ii) a type B well should be 
either shut in or else connected to one separator, and 
(iii) a manifold flowline must be connected to one sep- 
arator. 


Optimization Model 


This section presents the MINLP optimization model
for the well scheduling problem, based on the following
assumptions:

• the system is under steady-state conditions,

• a homogeneous slip model is applied to determine
the pressure drop in the pipelines,

• the temperature of the reservoir is known,

• the operating pressures of the separators are constant, and

• the thermodynamic description of the fluid is based
on the black oil model.

For the development of the MINLP optimization
model, the following sets, variables and parameters are
defined:


Sets

$I$ set of wells
$I_A$ set of wells of type A
$I_B$ set of wells of type B
$M$ set of manifolds
$S$ set of separators


Indices

$i$, $i_A$, $i_B$ well in set $I$, $I_A$, $I_B$, respectively
$m$ manifold in set $M$
$s$ separator in set $S$


Binary decision variables

$y_i$ = 1 if well i is open; 0 otherwise
$y_{i,m}$ = 1 if well i is connected to manifold m; 0 otherwise
$y_{i,s}$ = 1 if well i is connected to separator s; 0 otherwise
$y_{m,s}$ = 1 if manifold m is connected to separator s; 0 otherwise


Continuous variables

$q_{p,i}$ flowrate in stock tank conditions of phase p from well i
$q_{p,i,m}$ flowrate in stock tank conditions of phase p from well i to manifold m
$q_{p,m,s}$ flowrate in stock tank conditions of phase p from manifold m to separator s
$q_{p,s}$ flowrate in stock tank conditions of phase p in separator s
$P_i^m$ pressure of well i at the manifold level
$P_m$ manifold pressure
$H_i^m$ total enthalpy of well i at the manifold level
$H_m$ manifold enthalpy


The proposed model includes the following elements: 
(i) the well bore model, (ii) the mass, momentum and 
energy balances in well, manifold and separator nodes, 
(iii) the network logic constraints, (iv) the well and 
flowline momentum and energy balances, (v) the max- 
imum number of allowable well switches, and (vi) the 
objective function. 


Wellbore Model 


The wellbore model describes the multiphase fluid flow 
from the reservoir to the wellbore and comprises the 
following equations: 


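$$q_{o,i} = PI_i\,(P_{r,i} - P_i^{wf}), \qquad \forall i \in I \qquad (1)$$
$$q_{g,i}^{f} = f_g(q_{o,i}), \qquad \forall i \in I \qquad (2)$$
$$q_{w,i} = f_w(q_{o,i}), \qquad \forall i \in I \qquad (3)$$
$$T_i = T_r, \qquad \forall i \in I \qquad (4)$$
$$H_i = f_H(P_i^{wf}, T_i, q_{o,i}, q_{w,i}, q_{g,i}), \qquad \forall i \in I \qquad (5)$$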

where $P_i^{wf}$ is the bottomhole pressure and $q_{g,i}^{f}$ is the
formation gas flowrate in stock tank conditions. Equations
(2) and (3) can be nonlinear in order to model
the case of gas and water coning wells. These nonlinear
relations are generated either by using Addington's
correlations [1] or by repeatedly solving a well coning
model for different values of the well oil rate $q_{o,i}$, in
order to calculate the corresponding water ($q_{w,i}$) and gas
($q_{g,i}^{f}$) flowrates. In the latter case, Eqs. (2) and (3) are
constructed via curve fitting to the data series $(q_{o,i}, q_{g,i}^{f})$
and $(q_{o,i}, q_{w,i})$, respectively.

Mixed Integer Optimization in Well Scheduling, Figure 1
Production network superstructure for the well scheduling problem

For naturally flowing wells, the total gas flowrate is given by:


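$$q_{g,i} = q_{g,i}^{f}, \qquad \forall i \in D = \{i \in I \mid \text{natural flow}\} \qquad (6)$$

while for gas lift wells the total gas flowrate is equal to:

$$q_{g,i} = q_{g,i}^{f} + q_{g,i}^{inj}, \qquad \forall i \in F = \{i \in I \mid \text{gas lift}\}. \qquad (7)$$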
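The wellbore relations (1)-(3) and the total gas balances (6)-(7) can be evaluated for a single well as shown below. This is a sketch only: the functions standing in for $f_g$ and $f_w$ are placeholder linear ratios chosen for illustration (in the model they may be nonlinear coning correlations or curve fits), the numbers are arbitrary, and `well_rates` is a hypothetical name.

```python
def well_rates(pi, p_res, p_wf, f_gas, f_water, q_inj=0.0):
    """Return (oil, total gas, water) rates of one well in stock tank conditions."""
    q_oil = pi * (p_res - p_wf)   # Eq. (1): productivity index times drawdown
    q_gas_form = f_gas(q_oil)     # Eq. (2): formation gas
    q_water = f_water(q_oil)      # Eq. (3): water
    q_gas = q_gas_form + q_inj    # Eq. (6) when q_inj == 0, Eq. (7) for gas lift
    return q_oil, q_gas, q_water


# Placeholder ratios standing in for f_g and f_w (illustrative only).
natural = well_rates(2.0, 200.0, 150.0, lambda q: 0.5 * q, lambda q: 0.25 * q)
gas_lift = well_rates(2.0, 200.0, 150.0, lambda q: 0.5 * q, lambda q: 0.25 * q, q_inj=30.0)
print(natural)    # (100.0, 50.0, 25.0)
print(gas_lift)   # (100.0, 80.0, 25.0)
```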


Mass, Momentum and Energy Balances in Well, 
Manifold and Separator Nodes 


A well node of type A can be modeled as a splitter
$i \in I_A$, which consists of an inlet stream that represents
the fluid flow from the reservoir and a set of outlet
streams that represent the potential connections of
the well to manifolds, as shown in Fig. 2. The mass balances
around the splitter for each phase are given by
the following relations:

$$q_{p,i} = \sum_{m \in M} q_{p,i,m}, \qquad \forall p \in \{o, w, g\},\ i \in I_A. \qquad (8)$$


Similarly, the mass balances around a well node of
type B are given by:

$$q_{p,i} = \sum_{s \in S} q_{p,i,s}, \qquad \forall p \in \{o, w, g\},\ i \in I_B. \qquad (9)$$


Mixed Integer Optimization in Well Scheduling, Figure 2
Splitter node

There is also an upper and a lower bound on the well
oil flowrates. The upper bound is enforced to prevent
sand production [4], while the lower bound is imposed
to ensure stable flow [27]:

$$y_i\, q_{o,i}^{L} \le q_{o,i} \le y_i\, q_{o,i}^{U}, \qquad \forall i \in I. \qquad (10)$$

Equation (10) states that if well i is open ($y_i = 1$), then
the well oil flowrate $q_{o,i}$ is constrained by an upper and
a lower bound, while if well i is shut in ($y_i = 0$), then
the well oil flowrate $q_{o,i}$ is zero.
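Constraint (10) makes each well oil rate a semicontinuous variable: it is either zero or lies between its bounds. A minimal Python check of a candidate pair (rate, status), with a hypothetical function name and illustrative bounds:

```python
def oil_rate_feasible(q_oil, is_open, q_lo, q_hi):
    """Eq. (10): is_open * q_lo <= q_oil <= is_open * q_hi, with is_open in {0, 1}."""
    return is_open * q_lo <= q_oil <= is_open * q_hi


print(oil_rate_feasible(50.0, 1, 20.0, 100.0))  # True: open well within its bounds
print(oil_rate_feasible(10.0, 1, 20.0, 100.0))  # False: below the stable-flow minimum
print(oil_rate_feasible(0.0, 0, 20.0, 100.0))   # True: shut-in well produces nothing
print(oil_rate_feasible(5.0, 0, 20.0, 100.0))   # False: shut-in well cannot flow
```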


Manifold Node A manifold node is shown in Fig. 3
and performs two tasks: (i) mixing and (ii) splitting.
The mass balance of the mixer for each phase is given
by:

$$\sum_{i \in I_A} q_{p,i,m} = q_{p,m}, \qquad \forall p \in \{o, w, g\},\ m \in M. \qquad (11)$$

The splitter allocates the manifold fluid $q_{p,m}$ to one
separator s. The mass balances for each phase around the
splitter are given by:

$$q_{p,m} = \sum_{s \in S} q_{p,m,s}, \qquad \forall p,\ m \in M. \qquad (12)$$
Mixed Integer Optimization in Well Scheduling, Figure 3
Manifold node (mixer followed by splitter)


All wells that are connected to manifold m must operate
at the same pressure ($P_m$):

$$L^{P}(1 - y_{i,m}) \le P_i^{m} - P_m \le U^{P}(1 - y_{i,m}), \qquad \forall i \in I_A,\ m \in M \qquad (13)$$

where $P_i^m$ is the pressure of well i at manifold level m,
and $L^P$, $U^P$ are the corresponding lower and upper
bounds, respectively. Moreover, if manifold m is connected
to separator s, then its inlet pressure must not be
lower than the separator pressure:

$$P_s\, y_{m,s} \le P_m, \qquad \forall m \in M,\ s \in S. \qquad (14)$$


The pressure of the flowline at the separator level, $P_{m,s}$,
is equal to the separator pressure:

$$P_{m,s} = P_s. \qquad (15)$$

The energy balance in the manifold is given by:

$$\sum_{i \in I_A} H_i^{m} = H_m, \qquad \forall m \in M \qquad (16)$$

where $H_i^m$ is the enthalpy of well i at manifold level m.
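The big-M logic of constraints (13) and (14) can be checked for one well, manifold and separator as follows. This sketch assumes that the lower bound $L^P$ on the pressure difference is negative and the upper bound $U^P$ positive, so that the constraint is relaxed when the well is not connected; all names and numbers are hypothetical.

```python
def manifold_pressure_ok(p_well, p_man, y_im, l_p, u_p, p_sep, y_ms):
    """Check Eqs. (13)-(14) for one well i, manifold m and separator s."""
    diff = p_well - p_man
    eq13 = l_p * (1 - y_im) <= diff <= u_p * (1 - y_im)  # equal pressures if connected
    eq14 = p_sep * y_ms <= p_man                          # manifold not below separator
    return eq13 and eq14
```

For example, a connected well at 35 bar feeding a manifold at 30 bar violates (13), whereas the same pressures are allowed when the well is routed elsewhere (`y_im = 0`).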


Separator Node Each separator s has a set of inlet
streams coming from the flowlines and type B wells, as
shown in Fig. 4. The mass balances for each phase are
given by the following relation:

$$\sum_{m \in M} q_{p,m,s} + \sum_{i \in I_B} q_{p,i,s} = q_{p,s}, \qquad \forall p,\ s \in S \qquad (17)$$

while the separator capacity constraints must also be
satisfied:

$$q_{p,s} \le C_{p,s}, \qquad \forall p,\ s \in S. \qquad (18)$$


Mixed Integer Optimization in Well Scheduling, Figure 4
Separation node


Finally, the total amount of gas available for gas lift
is restricted by the compressor capacity ($C_c$):

$$\sum_{i} q_{g,i}^{inj} \le C_c. \qquad (19)$$


Network Logic Constraints 
A well of type A could either be shut in, or else connected
to one manifold:

$$\sum_{m \in M} y_{i,m} = y_i, \qquad \forall i \in I_A \qquad (20)$$

$$y_{i,m} \le y_i, \qquad \forall i \in I_A,\ m \in M. \qquad (21)$$

The integer Eq. (20) states that if the well is open
($y_i = 1$) then it should be connected to one manifold,
while Eq. (21) states that if the well is shut in ($y_i = 0$)
then all binary variables $y_{i,m}$, which represent the
connection of well i to manifold m, are zero.

Similarly, a well of type B could either be shut in, or
else connected to one separator:

$$\sum_{s \in S} y_{i,s} = y_i, \qquad \forall i \in I_B \qquad (22)$$

$$y_{i,s} \le y_i, \qquad \forall i \in I_B,\ s \in S. \qquad (23)$$

Furthermore, it is also necessary to enforce the condition
that each manifold flowline is connected to one
separator:

$$\sum_{s \in S} y_{m,s} = 1, \qquad \forall m \in M. \qquad (24)$$
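A feasibility check of a candidate 0/1 network structure against the logic constraints (20)-(24) might look like the sketch below. Binary variables are stored in dictionaries keyed by index tuples, with missing entries treated as zero; this data layout and the function name are assumptions made for illustration.

```python
def network_logic_ok(y, y_im, y_is, y_ms, wells_a, wells_b, manifolds, separators):
    """Check Eqs. (20)-(24) for a candidate 0/1 assignment."""
    for i in wells_a:
        if sum(y_im.get((i, m), 0) for m in manifolds) != y[i]:   # (20) one manifold if open
            return False
        if any(y_im.get((i, m), 0) > y[i] for m in manifolds):    # (21) none if shut in
            return False
    for i in wells_b:
        if sum(y_is.get((i, s), 0) for s in separators) != y[i]:  # (22) one separator if open
            return False
        if any(y_is.get((i, s), 0) > y[i] for s in separators):   # (23) none if shut in
            return False
    # (24) every manifold flowline goes to exactly one separator
    return all(sum(y_ms.get((m, s), 0) for s in separators) == 1 for m in manifolds)
```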
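Moreover, if the connection of well i to manifold m
or separator s does not exist, then its corresponding
flowrates and enthalpies must be zero:

$$0 \le H_i^{m} \le H_i^{U}\, y_{i,m}, \qquad \forall i \in I_A,\ m \in M \qquad (25)$$

$$0 \le q_{p,i,m} \le q_{p,i}^{U}\, y_{i,m}, \qquad \forall p,\ i \in I_A,\ m \in M \qquad (26a)$$

$$0 \le q_{p,i,s} \le q_{p,i}^{U}\, y_{i,s}, \qquad \forall p,\ i \in I_B,\ s \in S \qquad (26b)$$

where $H_i^U$ and $q_{p,i}^U$ denote upper bounds on the corresponding
enthalpy and flowrates.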


Well and Flowline Momentum and Energy Balances 


1. Naturally flowing wells of type A. Kosmidis [18] discusses
how naturally flowing wells of type A can be
accurately approximated by piecewise linear functions:


gm = ni, a , Wiel, (27a) 
j 
Pm =)" ni jPt,, Vie Is (27b) 
j 
Hy = ? Yo Aine He 5c , Wiel, (27c) 
j k 


Yo,i = cy Mise ca , Wiel, (27d) 
jk 


Goi < Go, Viels (27e) 


0,i 


yy Aga Viel, (27f) 
j ok 


ij = > ie » Ek = Y Aik , 
k j 


bin = Anite » Viely 
j 


(27g) 


Nig > €ik> F220, SOS, Viel,. (27h) 


It must be noted that if well i is shut in ($y_i = 0$),
then all continuous variables in constraint (27) are
set equal to zero, as can be observed from constraint
(27f). The piecewise linear approximation of
the well model is constructed in a pre-processing
step by discretizing:

(i) the manifold pressure between the valid lower
($L^P$) and upper ($U^P$) bounds, and
(ii) the well oil rate in the interval $[q_{o,i}^{L}, q_{o,i}^{U}]$.

The lower bound ($L^P$) is equal to the lowest operating
pressure of the separators, while the upper
bound ($U^P$) must be greater than the highest operating
pressure of the separators.
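The kind of piecewise linear interpolation used in these pre-processing approximations can be sketched as follows. This is an illustration, not the authors' construction: it assumes a rectangular grid in manifold pressure and well oil rate, splits each grid cell into two triangles, and returns convex-combination weights $\lambda_{j,k}$ on the three vertices of the triangle containing the point. Because the weights sit on one triangle of one cell, their row, column and diagonal sums each have at most two adjacent nonzero entries, which is the adjacency pattern that SOS-type constraints enforce. For a shut-in well all weights would simply be zero. The function names are hypothetical.

```python
import bisect


def _cell(grid, v):
    """Index of the grid interval containing v (clamped to the first/last interval)."""
    j = bisect.bisect_right(grid, v) - 1
    return min(max(j, 0), len(grid) - 2)


def triangle_weights(p_grid, q_grid, p, q):
    """Weights {(j, k): lambda_jk} that reproduce (p, q) on one triangle of one cell."""
    j, k = _cell(p_grid, p), _cell(q_grid, q)
    tp = (p - p_grid[j]) / (p_grid[j + 1] - p_grid[j])
    tq = (q - q_grid[k]) / (q_grid[k + 1] - q_grid[k])
    if tp >= tq:   # lower triangle: (j, k), (j+1, k), (j+1, k+1)
        w = {(j, k): 1 - tp, (j + 1, k): tp - tq, (j + 1, k + 1): tq}
    else:          # upper triangle: (j, k), (j, k+1), (j+1, k+1)
        w = {(j, k): 1 - tq, (j, k + 1): tq - tp, (j + 1, k + 1): tp}
    return {jk: lam for jk, lam in w.items() if lam > 0}


def interpolate(weights, table):
    """Piecewise linear value: sum over (j, k) of lambda_jk * table[j][k]."""
    return sum(lam * table[j][k] for (j, k), lam in weights.items())
```

The interpolation is exact for any function that is linear in $(p, q)$, which is a convenient sanity check for a tabulated well model.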


2. Naturally flowing wells of type B. For the case of naturally
flowing wells of type B, the oil flowrate $q_{o,i,s}$
of well i in separator s is given by:

$$q_{o,i,s} \le q_{o,i,s}^{d,\max}\, y_{i,s}, \qquad \forall i \in I_B,\ s \in S \qquad (28)$$

where $q_{o,i,s}^{d,\max}$ is calculated in a pre-processing step
for each fixed-pressure separator s by setting the
choke fully open.


3. Gas lift wells of type A. These can be accurately
approximated by the following set of mixed integer
linear relations:


Go0,i = > Se i , Vie Ip (29a) 
j eok 


7 joe as 

=D Aaiih. Viele — 9b) 
jek 

p™ = > ae Viel, (29c) 
jk 

H™ = Oy aes: Wiels (29d) 
jek 


wee ae Vie lz (29e) 
jek 


Vij = errr: ; Ei k = So dij ; 
j k 


Cit = > Hi,j,j+t, Vi € Ip 
j 


(29f) 


Ni,j > €i,k > Si2 = O(SOS), Vie Ip. (29g) 


These relations are constructed in a pre-processing
step by discretizing:

(iii) the manifold pressure in the interval $[L^P, U^P]$,
and
(iv) the well gas injection rate in the interval
$[0, q_{g,i}^{inj,\max}]$, where $q_{g,i}^{inj,\max}$ is the gas injection
rate at the upper bound pressure ($U^P$) beyond which
the well oil flowrate decreases despite the increase
in gas injection rate.
4. Gas lift wells of type B. These are connected to a
fixed-pressure separator and can be accurately
approximated as follows:


Go,i,s = pao ter eee , Vie Ip, s€ S (30a) 
j 


- 433 
Ge = DD cr ene , Vielgp,s€ES (30b) 
j 


ae ae Vielgp,seS (30c) 


J 


hese 0, SOS, (30d) 
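For a gas lift well connected to a fixed-pressure separator, the approximation reduces to one dimension (oil rate against gas injection rate). A minimal sketch of the corresponding SOS2-style interpolation, assuming sorted breakpoints and tabulated oil rates; the function names and the data are hypothetical, and the data only mimic the decline of oil rate at high injection rates mentioned above.

```python
import bisect


def sos2_weights(breakpoints, x):
    """Weights lambda_j >= 0 summing to 1, at most two adjacent nonzero, with sum lambda_j*b_j = x."""
    if not breakpoints[0] <= x <= breakpoints[-1]:
        raise ValueError("x outside the discretized interval")
    j = min(bisect.bisect_right(breakpoints, x) - 1, len(breakpoints) - 2)
    t = (x - breakpoints[j]) / (breakpoints[j + 1] - breakpoints[j])
    return {j: 1 - t, j + 1: t}


def oil_rate(breakpoints, oil_at_bp, q_inj):
    """Piecewise linear oil rate at gas injection rate q_inj."""
    w = sos2_weights(breakpoints, q_inj)
    return sum(lam * oil_at_bp[j] for j, lam in w.items())


gas_bp = [0.0, 1.0, 2.0, 3.0]           # injection rate breakpoints (arbitrary units)
oil_bp = [100.0, 150.0, 170.0, 160.0]   # oil rate falls again at the last breakpoint
print(oil_rate(gas_bp, oil_bp, 1.5))    # 160.0
```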


Flowline Momentum Balance The momentum balance
in the manifold flowlines is given by:

$$P_{m,s} - f_{fl}(P_m, H_m, q_{o,m}, q_{g,m}, q_{w,m}) = 0, \qquad \forall m \in M,\ s \in S \qquad (31)$$

where $P_{m,s}$ is the pressure of flowline m at separator
level s.


Remark During construction of the piecewise linear
approximations or calculation of $q_{o,i,s}^{d,\max}$, it is possible
to identify naturally flowing wells of type A or type B
which are unable to flow towards certain manifolds
or separators. To exclude these infeasible connections,
the following logic constraints are incorporated in the
mathematical formulation:

$$y_{i,m} \le 1 - y_{m,s}, \qquad \forall i \in I_A \qquad (32)$$

$$y_{i,s} \le 0, \qquad \forall i \in I_B. \qquad (33)$$

Constraint (32) states that if flowline m is connected to
separator s ($y_{m,s} = 1$), then well i cannot be connected
to manifold m.


Maximum Number of Well Switches 


There is an upper bound on the number of well switches 
(for wells of both types A and B) that can be performed 
within a day. This is an operational constraint and is 
applied to avoid large flow variations that may eventually
lead to a surface facility shutdown. To consider
and model this requirement, the following binary
variables and parameters are introduced:


Binary variables

$c_i^{1}$ = 1 if well i is open and was closed on the previous day; 0 otherwise
$c_i^{2}$ = 1 if well i is closed and was open on the previous day; 0 otherwise
$c_{i,m}$ = 1 if well i of type A is connected to a new manifold m on this day; 0 otherwise
$c_{i,s}$ = 1 if well i of type B is connected to a new separator s on this day; 0 otherwise
$c_{m,s}$ = 1 if flowline m is connected to a new separator s on this day; 0 otherwise


Parameters

$NC_A^{\max}$ maximum number of switches for the wells of type A
$NC_B^{\max}$ maximum number of switches for the wells of type B
$y^{0}$ binary parameters representing the well structure of the previous day


One switch is accounted for in the following cases:

(i) A well i was closed and is currently open. This case
is modeled by incorporating the following constraint
in the formulation:

$$y_i - y_i^{0} \le c_i^{1}, \qquad \forall i \in I. \qquad (34)$$

Thus, if well i is open ($y_i = 1$) while it was previously
closed ($y_i^0 = 0$), then one well switch is
accounted for by forcing the binary variable $c_i^1$ to
be 1.

(ii) A well i was open and is currently closed. To incorporate
this well switch in the formulation, the
following constraint is used:

$$y_i^{0} - y_i \le c_i^{2}, \qquad \forall i \in I. \qquad (35)$$
(iii) A well of type A switches manifold. If well i is
currently connected to manifold m ($y_{i,m} = 1$) and
was previously connected to manifold $m' \ne m$
(which implies $y_{i,m}^0 = 0$), then there is one well
switch, which is modeled by the following constraint
in the formulation:

$$y_{i,m} - y_{i,m}^{0} \le c_{i,m}, \qquad \forall i \in B,\ m \in M \qquad (36)$$

where the set $B = \{i \in I_A \mid y_i^0 \ne 0\}$ is the set of
wells of type A that were open during the previous
day. Constraint (36) is applicable only to wells
of type A that were previously open ($y_i^0 \ne 0$), to
avoid double counting a well switch; the case of
a well that was closed ($y_i^0 = 0$) and is currently
open ($y_i = 1$) is covered by constraint (34).

(iv) A well of type B switches to a new separator. If well i
is currently connected to separator s ($y_{i,s} = 1$)
and was previously connected to separator $s' \ne s$
($y_{i,s}^0 = 0$), then there is one well switch, which is
modeled by the following constraint in the formulation:

$$y_{i,s} - y_{i,s}^{0} \le c_{i,s}, \qquad \forall i \in C,\ s \in S \qquad (37)$$

where the set $C = \{i \in I_B \mid y_i^0 \ne 0\}$ is the set of
wells of type B that were open during the previous
day.

(v) A manifold flowline switches to a new separator. If
a manifold flowline m is currently connected to
a separator s ($y_{m,s} = 1$) and was previously connected
to separator $s' \ne s$ ($y_{m,s}^0 = 0$), then there is
one switch, which is accounted for by forcing the
binary variable $c_{m,s}$ to be 1:

$$y_{m,s} - y_{m,s}^{0} \le c_{m,s}, \qquad \forall m \in M,\ s \in S. \qquad (38)$$


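The sum of switches for the wells of type A and B
must not exceed an upper bound:

$$\sum_{i \in I_A} (c_i^{1} + c_i^{2}) + \sum_{i \in B} \sum_{m \in M} c_{i,m} + \sum_{m \in M} \sum_{s \in S} c_{m,s} \le NC_A^{\max} \qquad (39)$$

$$\sum_{i \in I_B} (c_i^{1} + c_i^{2}) + \sum_{i \in C} \sum_{s \in S} c_{i,s} \le NC_B^{\max}. \qquad (40)$$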
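Given the previous day's structure and a candidate structure for the current day, the switches defined in cases (i)-(v) can be counted directly and compared with the two limits. The sketch below uses a hypothetical data layout (a dict of well statuses plus sets of active well-manifold, well-separator and manifold-separator connections) and, following constraints (39)-(40), charges manifold flowline switches to the type A total.

```python
def count_switches(prev, curr, wells_a, wells_b):
    """Return (type A switches, type B switches) between two daily structures."""
    def opened(i):   # case (i), Eq. (34)
        return max(0, curr["open"][i] - prev["open"][i])

    def closed(i):   # case (ii), Eq. (35)
        return max(0, prev["open"][i] - curr["open"][i])

    new_im = curr["im"] - prev["im"]   # connections active today but not yesterday
    new_is = curr["is"] - prev["is"]
    new_ms = curr["ms"] - prev["ms"]
    was_open = {i for i, v in prev["open"].items() if v}   # wells in sets B and C
    n_a = (sum(opened(i) + closed(i) for i in wells_a)
           + sum(1 for (i, _m) in new_im if i in was_open)     # case (iii), Eq. (36)
           + len(new_ms))                                      # case (v), Eq. (38)
    n_b = (sum(opened(i) + closed(i) for i in wells_b)
           + sum(1 for (i, _s) in new_is if i in was_open))    # case (iv), Eq. (37)
    return n_a, n_b


prev = {"open": {"A1": 1, "A2": 0, "B1": 1}, "im": {("A1", "M1")},
        "is": {("B1", "S1")}, "ms": {("M1", "S1"), ("M2", "S1")}}
curr = {"open": {"A1": 1, "A2": 1, "B1": 1}, "im": {("A1", "M2"), ("A2", "M1")},
        "is": {("B1", "S2")}, "ms": {("M1", "S1"), ("M2", "S2")}}
print(count_switches(prev, curr, ["A1", "A2"], ["B1"]))   # (3, 1)
```

Here well A2 is reopened (counted once by case (i) only, not again as a manifold switch), A1 moves from M1 to M2, flowline M2 moves to S2, and B1 moves from S1 to S2.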


Objective Function 


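The objective function is the maximization of daily revenue:

$$\max\ w_o \sum_{i \in I} q_{o,i} - w_g \sum_{i \in I} q_{g,i}^{inj} \qquad (41)$$

where $w_o$ and $w_g$ weight the oil sales and the cost of gas
compression, respectively.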
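Evaluating objective (41) for a candidate operating point is a one-liner; in the sketch below the function name, the oil price and gas compression cost weights, the well names and the rates are hypothetical values in consistent units.

```python
def daily_revenue(q_oil, q_inj, w_o, w_g):
    """Objective (41): oil sales minus gas compression cost."""
    return w_o * sum(q_oil.values()) - w_g * sum(q_inj.values())


print(daily_revenue({"A1": 100.0, "B1": 50.0}, {"A1": 20.0}, 60.0, 0.5))   # 8990.0
```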


The control variables are: 

(i) the well operational status (open or closed),
(ii) the well connections to manifolds and separators,
(iii) the flowline connections to separators,
(iv) the well oil flowrates, and
(v) the gas injection rates into gas lift wells.


An MINLP Formulation for the Well Scheduling Problem

By defining the vectors $x = [P, H]$, $q_p = [q_o, q_g, q_w, q_g^{inj}]$,
$y = [y_i, y_{i,m}, y_{m,s}, y_{i,s}]$, $c = [c_i^1, c_i^2, c_{i,m}, c_{i,s}, c_{m,s}]$,
$\omega = [\lambda_{i,j,k}, \xi_{i,k}, \zeta_{i,t}]$ and $y^{s}$ (the vector of binary
variables that are used to impose the adjacency condition
on SOS-type variables), the mathematical programming
formulation (P) for the well scheduling problem can be
concisely expressed as:


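$$P: \quad \max\ W(q_{o,i}, q_{g,i}^{inj}) \qquad (42)$$

subject to

$$M_1(x_i, q_{p,i}) = 0 \qquad (43)$$
$$M_2(q_{p,i}, q_{p,i,m}, q_{p,i,s}, q_{p,s}, x_i, x_{i,m}, y) \le 0 \qquad (44)$$
$$M_3(q_{p,i,m}, q_{p,i,s}, y) \le 0 \qquad (45)$$
$$M_4(q_{o,i}, q_{g,i}^{inj}, q_{o,i,s}, q_{g,i,s}^{inj}, x_{i,m}, \omega, y^{s}, y) = 0 \qquad (46)$$
$$M_5(x_m, q_{p,m}) = 0 \qquad (47)$$
$$M_6(y, c) \le 0. \qquad (48)$$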


The correspondence between the equations of model (P)
and the detailed model is as follows. Equation (42) is
equivalent to the linear objective function (41). Equation
(43) represents the nonlinear wellbore model,
Eqs. (1)-(7). Equation (44) represents the mixed integer
linear mass, momentum and energy balances in
the well, manifold and separator nodes and is equivalent
to Eqs. (8)-(19). Equation (45) represents the mixed
integer linear network logic constraints and is equivalent
to Eqs. (20)-(26). Equation (46) represents the
piecewise linear well approximations and is equivalent
to Eqs. (27)-(30). Equation (47) represents the nonlinear
momentum balance in the flowlines and is equivalent
to Eq. (31). Finally, Eq. (48) represents the integer logic
relations associated with the ability of naturally flowing
wells to flow and with the relevant well switches, and is
equivalent to Eqs. (32)-(40).

The mathematical programming formulation P in- 
cludes: (i) binary variables, and (ii) nonlinear equa- 
tions. Therefore, it belongs to the class of mixed inte- 
ger nonlinear programming (MINLP) problems. There 
are two categories of binary variables: those associated with the structure of the production network and the well switches (y^s, c), and those used to impose the adjacency condition on SOS-type variables (y*). Moreover, the number of nonlinear equations equals the number of coning wells plus the number of flowlines.

The most popular methods for solving MINLP 
problems are those that proceed by solving a sequence of nonlinear programming (NLP) and mixed integer linear programming (MILP) problems. These include Generalized Benders decomposition (GBD, Geoffrion [3]) and Outer Approximation (OA, Kocis and Grossmann [5]). The main disadvantage of GBD is that it may require a large number of major iterations between the NLP subproblem and the MILP master problem. The major advantage of OA
is that it typically requires fewer iterations to achieve 
a solution, since its MILP master problem contains 
more information than the GBD formulation. Con- 
versely, because the OA master problem is richer, it is 
also more time-consuming to solve. A detailed review 
of the various MINLP algorithms has been published 
by Floudas [10]. 

This Chapter considers an approach based on 
Outer Approximation (OA), since it typically requires 
fewer iterations when compared to other MINLP tech- 
niques. Also, its modified version (Outer Approximation/Augmented Penalty (OA/AP), (Viswanathan and
Grossmann, 1990)) has been found to be capable of 
handling mild nonconvexities present in the MINLP 
problems. 


Optimization Strategy 


The first NLP subproblem of the OA/AP algorithm involves solving an optimization problem in which the structure of the pipeline network is that of the previous day. The lth NLP subproblem (l > 1) involves fixing the discrete decisions y^s and c to a given set of values (y^s, c)^l. Therefore, there is no need to introduce the logic constraints (45) and (48), and hence the NLP subproblem (P) is equivalent to the well operation and gas lift allocation problem. It must be noted that the solution of the NLP subproblem provides a lower bound on the solution of the MINLP problem, since the binary variables y^s and c are fixed to values that are not necessarily optimal.

The master problem is formulated from the linearization of the nonlinear constraints (43) and (47) at the solution points of the subproblems (l = 1, ..., L) and their relaxation to inequalities using the sign of the Lagrange multipliers [17]. It is therefore an MILP problem. The master problem provides (i) an upper bound on the MINLP problem and (ii) a new set of binary variables y^s and c. The master MILP problem is as follows:


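$$PM: \quad \max \; W(q_{o,i}, q_{g,i}^{inj}) - (w_p^l)^T p^l - (w_q^l)^T q^l \qquad (49)$$

subject to

$$T^l \left[ \nabla_{x_i} m_1(x_i^l, q_{p,i}^l) \;\; \nabla_{q_{p,i}} m_1(x_i^l, q_{p,i}^l) \right] \begin{bmatrix} x_i - x_i^l \\ q_{p,i} - q_{p,i}^l \end{bmatrix} \le p^l, \quad l = 1, \ldots, L \qquad (50)$$
$$m_2(q_{p,i}, q_{p,i,m}, q_{p,i,s}, q_{p,s}, x_i, x_m, x_s, y) \le 0 \qquad (51)$$
$$m_3(q_{p,i,m}, q_{p,i,s}, y) \le 0 \qquad (52)$$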


M4(qo,is Ig Fouiess Tors Te i sXims PY") = 0 (83) 
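$$T^l \left[ \nabla_{x_m} m_5(x_m^l, q_{p,m}^l) \;\; \nabla_{q_{p,m}} m_5(x_m^l, q_{p,m}^l) \right] \begin{bmatrix} x_m - x_m^l \\ q_{p,m} - q_{p,m}^l \end{bmatrix} \le q^l, \quad l = 1, \ldots, L \qquad (54)$$
$$m_6(y^s, c) \le 0 \qquad (55)$$
$$\sum_{n \in G^l} y_n^s - \sum_{n \in NG^l} y_n^s \le |G^l| - 1, \quad l = 1, \ldots, L \qquad (56)$$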


where w_p^l and w_q^l are vectors whose dimensions equal the number of equations in (50) and (54), respectively. Each element of these vectors is a positive scalar greater than the absolute value of the Lagrange multiplier ν_j associated with the jth constraint in (50) and (54) at the lth iteration. Moreover, T^l is a diagonal matrix whose elements are defined as follows:
(57) 


Furthermore, p^l and q^l are vectors whose elements are positive slack variables associated with each of the constraints in Eqs. (50) and (54), respectively. Finally, constraint (56) is known as an integer cut and is applied to ensure that any pipeline configuration that has already been considered is not selected again. The notation |G^l| denotes the cardinality of the set G^l, whose elements are all the structural binary variables y_n^s that have a value of 1 at the lth iteration, while NG^l is the set of structural binary variables that have a value of zero at the lth iteration.
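Cut (56) removes exactly one previously visited 0-1 structure. The sketch below builds its coefficients and checks, by enumerating a hypothetical three-variable structure vector, that only the visited configuration is excluded. The function names are illustrative and not taken from the original work.

```python
from itertools import product


def integer_cut(y_prev):
    """Build integer cut (56) for a 0-1 structural vector visited at iteration l.

    Returns (coeffs, rhs) for: sum_{n in G^l} y_n - sum_{n in NG^l} y_n <= |G^l| - 1,
    where G^l holds the indices with y_n = 1 and NG^l those with y_n = 0.
    """
    if any(v not in (0, 1) for v in y_prev):
        raise ValueError("structural variables must be binary")
    coeffs = [1 if v == 1 else -1 for v in y_prev]
    rhs = sum(y_prev) - 1  # |G^l| - 1
    return coeffs, rhs


def satisfies(cut, y):
    """True if the 0-1 vector y satisfies the cut."""
    coeffs, rhs = cut
    return sum(a * v for a, v in zip(coeffs, y)) <= rhs


# Cut generated after evaluating the structure (1, 1, 1), e.g. all three wells open.
cut = integer_cut([1, 1, 1])
excluded = [y for y in product((0, 1), repeat=3) if not satisfies(cut, y)]
print(excluded)  # [(1, 1, 1)]
```

The left-hand side of the cut reaches |G^l| only when every variable in G^l is 1 and every variable in NG^l is 0, which is why a single configuration is removed.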

The MINLP algorithm terminates when the best lower bound from the NLP subproblems (max_l LB^l) and the current upper bound from the MILP master problem (UB^l) are within a prespecified tolerance ε:

$$\frac{\left| \max_l LB^l - UB^l \right|}{UB^l} \le \varepsilon \qquad (58)$$

or when the MILP problem is integer-infeasible. The optimal solution is the one given by the best NLP subproblem.
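The stopping test can be written as a small function. This is a sketch, assuming a relative gap measured against the current upper bound as in (58); the guard for an infinite or zero upper bound is an added assumption. The numbers in the usage line are a pair of bounds listed in Table 5 for Example 2a.

```python
import math


def oa_converged(lower_bounds, upper_bound, eps, master_infeasible=False):
    """Stopping test in the spirit of (58).

    lower_bounds      -- LB^l from all NLP subproblems solved so far
    upper_bound       -- current UB from the MILP master problem
    master_infeasible -- True if the master MILP was integer-infeasible
    """
    if master_infeasible:
        return True
    if math.isinf(upper_bound) or upper_bound == 0:
        return False  # no finite bound yet, or relative gap undefined
    best_lb = max(lower_bounds)
    return abs(best_lb - upper_bound) / abs(upper_bound) <= eps


# Bounds listed in Table 5 for Example 2a: LB = 28910, UB = 31580 STB/day.
print(oa_converged([28910.0], 31580.0, eps=0.01))  # False (gap is about 8.5%)
```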

However, the solution of the MILP problem on the 
full space of both structural and interpolation binary 
variables is computationally intensive, since the num- 
ber of interpolation binary variables becomes very large 
as the number of wells increases. For instance, a prob- 
lem with 10 gas lift wells involves about 300 interpola- 
tion binary variables. This motivates the need to refor- 
mulate the MILP problem (PM), so as to involve only 
structural and switching binary variables y° and c. As 
mentioned, the master MILP problem is constructed 
from (i) linearization of the nonlinear constraints, and 
(ii) relaxation of the nonlinear equality constraints us- 
ing the sign of Lagrange multipliers. Fortunately, in- 
formation for both is available from the solution of 
the NLP subproblem. Consider for instance the case of 
a gas lift well of type A (29), where the subscript i has 


been dropped for simplicity. At the optimal point, three adjacent μ coefficients are active (Williams, 1990). Without loss of generality, the active triplet is assumed to be (μ_{j,k}, μ_{j+1,k}, μ_{j,k+1}). Then the gas lift model (29) can be written as:


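$$q_o^d = \mu_{j,k}\, q_{o,j,k}^d + \mu_{j+1,k}\, q_{o,j+1,k}^d + \mu_{j,k+1}\, q_{o,j,k+1}^d \qquad (59a)$$
$$q_g^{d,inj} = \mu_{j,k}\, q_{g,k}^{d,inj} + \mu_{j+1,k}\, q_{g,k}^{d,inj} + \mu_{j,k+1}\, q_{g,k+1}^{d,inj} \qquad (59b)$$
$$P^{d,m} = \mu_{j,k}\, P_j^{d,m} + \mu_{j+1,k}\, P_{j+1}^{d,m} + \mu_{j,k+1}\, P_j^{d,m} \qquad (59c)$$
$$\mu_{j,k} + \mu_{j+1,k} + \mu_{j,k+1} = 1 \qquad (59d)$$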


By substituting Eq. (59d) into (59b) and (59c), μ_{j+1,k} and μ_{j,k+1} are given by:


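$$\mu_{j+1,k} = \frac{P^{d,m} - P_j^{d,m}}{P_{j+1}^{d,m} - P_j^{d,m}} \qquad (60a)$$
$$\mu_{j,k+1} = \frac{q_g^{d,inj} - q_{g,k}^{d,inj}}{q_{g,k+1}^{d,inj} - q_{g,k}^{d,inj}} \qquad (60b)$$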


Substituting Eq. (60a) and (60b) into (59a), the follow- 
ing equation is obtained: 


$$q_o^d = q_{o,j,k}^d + \frac{q_{o,j+1,k}^d - q_{o,j,k}^d}{P_{j+1}^{d,m} - P_j^{d,m}} \left( P^{d,m} - P_j^{d,m} \right) + \frac{q_{o,j,k+1}^d - q_{o,j,k}^d}{q_{g,k+1}^{d,inj} - q_{g,k}^{d,inj}} \left( q_g^{d,inj} - q_{g,k}^{d,inj} \right) \qquad (61)$$


Equation (61) is the linearization of the nonlinear gas lift well model, where the derivatives are calculated by forward finite difference formulae. If Eq. (61) replaces Eq. (59), a new NLP subproblem is obtained; then, by applying the KKT conditions to both NLP subproblems, it can be shown that the Lagrange multiplier of Eq. (59a) is equal to the Lagrange multiplier of Eq. (61). Consequently, both the active triplet of μ's and the corresponding Lagrange multiplier are obtained from the solution of the NLP subproblem, and the new MILP master problem (PM') is formulated as:

$$PM': \quad \max \; W(q_{o,i}, q_{g,i}^{inj}) - (w_p^l)^T p^l - (w_q^l)^T q^l - (w_t^l)^T t^l \qquad (62)$$
The MILP problem (PM') involves only structural (y^s) and switching (c) binary variables. Figure 5 depicts the linearization of the nonlinear gas lift model, according to the foregoing analysis.
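Equations (59)-(61) can be checked numerically: inside an active triangle of the interpolation grid, the convex combination (59a), with weights from (60a), (60b) and (59d), gives the same oil rate as the linearization (61). The grid values below are hypothetical, and the function names are illustrative.

```python
import math


def triplet_weights(p, qg, p_j, p_j1, qg_k, qg_k1):
    """Weights of the active triplet (j,k), (j+1,k), (j,k+1) from Eqs. (60a), (60b), (59d)."""
    mu_j1k = (p - p_j) / (p_j1 - p_j)        # (60a)
    mu_jk1 = (qg - qg_k) / (qg_k1 - qg_k)    # (60b)
    mu_jk = 1.0 - mu_j1k - mu_jk1            # (59d)
    return mu_jk, mu_j1k, mu_jk1


def linearized_oil_rate(p, qg, p_j, p_j1, qg_k, qg_k1, qo_jk, qo_j1k, qo_jk1):
    """Eq. (61): forward-difference linearization of the gas lift well model."""
    return (qo_jk
            + (qo_j1k - qo_jk) / (p_j1 - p_j) * (p - p_j)
            + (qo_jk1 - qo_jk) / (qg_k1 - qg_k) * (qg - qg_k))


# Hypothetical grid cell: manifold pressure 300/350 psia, lift gas 1000/1500 MSCF/day,
# and oil rates (STB/day) at the three active grid points.
grid = dict(p_j=300.0, p_j1=350.0, qg_k=1000.0, qg_k1=1500.0)
qo = dict(qo_jk=2000.0, qo_j1k=1800.0, qo_jk1=2300.0)
p, qg = 320.0, 1100.0

mu = triplet_weights(p, qg, **grid)
q_convex = mu[0] * qo["qo_jk"] + mu[1] * qo["qo_j1k"] + mu[2] * qo["qo_jk1"]  # (59a)
q_linear = linearized_oil_rate(p, qg, **grid, **qo)                          # (61)
print([round(m, 3) for m in mu], round(q_convex, 6), round(q_linear, 6))
# [0.4, 0.4, 0.2] 1980.0 1980.0
```

The agreement holds only while all three weights are nonnegative, i.e. while the operating point stays inside the active triangle.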


Solution Procedure 


Based on the foregoing sections, the steps of the 
proposed MINLP optimization strategy for the well 
scheduling problem are formally presented as follows: 
(1) Pre-processing step 

1. The reservoir information (productivity index, 
GOR and WOR) is updated, using a reservoir 
simulator. 

2. For each naturally flowing well, the manifold pressure and the well oil rate are discretized between a lower and an upper bound. Then, the well model is simulated for each pair of discrete points, and the momentum and energy balances are approximated with piecewise linear functions.

3. For each gas lift well, the manifold pressure 
and the gas injection rate are discretized be- 
tween a lower and an upper bound. Then, the 
well model is simulated for each pair of discrete 
points, and the momentum and energy balances 
are approximated with piecewise linear func- 
tions. 

4. If a naturally flowing well cannot flow towards 
a separator, then the corresponding logic con- 
straint is incorporated into the formulation. 

5. If Vertical Flowing Tables are used, then the ap- 
proximation of momentum and energy balances 
in the wells is simpler: there is no need for well 
simulation using each pair of discrete points, 
and simple interpolation calculations are used 
to approximate the momentum and energy bal- 
ances in the wells. 

(2) Processing step 

This step involves the solution of the MINLP problem:

1. Set the iteration counter to l = 0 and the upper bound to UB^0 = +∞.

2. Solve the NLP subproblem as a sequence of MILP problems, following the algorithm described by Kosmidis [18], to obtain a lower bound (LB^l).

3. Add linearizations and integer cuts cumulatively, solve the MILP master problem (PM'), and update the upper bound (UB^l).

4. If (UB^l - max_l LB^l)/LB^l ≤ ε or the MILP problem is integer-infeasible, then STOP: the optimal structure is the one that corresponds to the best lower bound max_l LB^l. Else, set l = l + 1 and go to Step 2.

(3) Post-processing step 

For each well, fix the manifold pressure and the well oil flowrate and perform a well simulation (based on the system of well equations) to calculate the precise well choke settings.
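The processing step can be sketched as a generic outer-approximation loop. This is a skeleton under stated assumptions, not the implementation used in the chapter: solve_nlp and solve_master are hypothetical callables standing in for the NLP subproblem and the master MILP (PM'), and the gap test divides by the best lower bound found so far. The toy usage replaces both solvers with a lookup table of made-up values.

```python
import math
from typing import Callable, List, Optional, Tuple

Structure = Tuple[int, ...]
History = List[Tuple[Structure, float]]


def oa_loop(initial: Structure,
            solve_nlp: Callable[[Structure], float],
            solve_master: Callable[[History], Optional[Tuple[Structure, float]]],
            eps: float = 1e-3,
            max_iter: int = 50) -> Tuple[Structure, float]:
    """Skeleton of the processing step (Steps 1-4); solvers are supplied by the caller."""
    history: History = []                  # structures visited so far with their LB^l
    structure, ub = initial, math.inf      # Step 1: l = 0, UB^0 = +inf
    best_structure, best_lb = initial, -math.inf
    for _ in range(max_iter):
        lb = solve_nlp(structure)          # Step 2: NLP subproblem gives LB^l
        history.append((structure, lb))
        if lb > best_lb:
            best_structure, best_lb = structure, lb
        master = solve_master(history)     # Step 3: master MILP with cumulative cuts
        if master is None:                 # Step 4: integer-infeasible master, stop
            break
        structure, ub = master             # new structure and upper bound UB^l
        denom = abs(best_lb) if best_lb != 0 else 1.0
        if (ub - best_lb) / denom <= eps:  # Step 4: relative gap test
            break
    return best_structure, best_lb


# Toy stand-ins (hypothetical): a lookup table replaces the NLP subproblem, and the
# "master" returns the unvisited structure with the largest value plus a fixed margin
# as an optimistic bound, or None once every structure has been visited.
VALUES = {(1, 1): 100.0, (0, 1): 110.0, (1, 0): 90.0, (0, 0): 0.0}


def toy_nlp(s):
    return VALUES[s]


def toy_master(history):
    visited = {s for s, _ in history}
    candidates = [s for s in VALUES if s not in visited]
    if not candidates:
        return None
    s = max(candidates, key=VALUES.get)
    return s, VALUES[s] + 5.0


print(oa_loop((1, 1), toy_nlp, toy_master, eps=0.01))  # ((0, 1), 110.0)
```

Keeping the solvers as callables separates the bookkeeping of bounds and cuts from the modeling layer (GAMS in the chapter), which is where the actual NLP and MILP formulations live.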


Example Problems 


This section illustrates the performance of the proposed MINLP algorithm on several example problems.
Mixed Integer Optimization in Well Scheduling, Figure 5
Linearization of a gas lift well model 


The first example is a small production network which 
involves three wells connected to a manifold, and it is 
used to illustrate the economic impact of incorporat- 
ing discrete decisions in the well scheduling problem. 
In the second example, the proposed MINLP optimiza- 
tion strategy is applied to a field consisting of 3 sepa- 
rators, 2 manifolds and 11 naturally flowing wells. To 
evaluate the economic benefits of the proposed MINLP 
optimization strategy, heuristic rules are also applied 
to the same problem for comparison. Finally, the pro- 
posed method is applied to an oil field, which consists 
of 22 (both naturally flowing and gas lift) wells. 


Example 1 


The mathematical formulation and the solution procedure developed in this Chapter have been applied to the production network presented in Fig. 6. The well characteristics and the separator pressures and capacities are given in Tables 1 and 2, respectively. The problem has been formulated as an MINLP problem, where binary variables are used to model the operational status (open or closed) of each well. The MILP problems have been implemented in GAMS [5] and solved using CPLEX® as the MIP solver. The problem involves 3 binary variables, 26 interpolation binary variables and 81 constraints. Initially, the manifold pressure and the well oil flowrate are discretized to construct a piecewise linear approximation of the well model. Then, the initial structure (all wells tied to the manifold) has been evaluated by solving the corresponding NLP subproblem, yielding LB^1 = 12010 STB/day. The master MILP problem is then formulated and solved: its solution generates a new production network structure, in which well 1 is shut in. The new structure has then been evaluated in the NLP subproblem, and a new lower bound equal to LB^2 = 12104.2 STB/day has been determined. The algorithm terminates, since the MILP master problem is found to be integer-infeasible. Therefore, the optimal structure involves only wells 2 and 3 connected to the manifold, with their chokes fully open.

A typical heuristic rule for maximization of oil production states that the well chokes must be fully open. The application of this heuristic rule to this particular production network yields an oil production level of 11929.2 STB/day. The results from the application of heuristic rules and the proposed strategy are summarized in Table 3 and suggest that: (i) these heuristic rules may lead to suboptimal solutions, and (ii) an increase in oil production of 175 STB/day is observed when the proposed formal
Mixed Integer Optimization in Well Scheduling, Table 1
Well characteristics for a three-well production network (illustrative example)


Reservoir / pipeline parameters 
Reservoir pressure (psia) 
Productivity index (STB/psia day) 
GOR (SCF/STB) 


Well1 Well2 Well3 




Flowline 


5100 |1900 | 1600 


wc 0.93 0.165 |0.15 
8000 |6000 | 7000 | 22000 ft 


Vertical length (ft) 
Horizontal length (ft) 
Diameter (in) 


6000 


Roughness 


0.0001 


Flowrate upper bound (STB/day) 


10000 


Flowrate lower bound (STB/day) 


Well 3 


Mixed Integer Optimization in Well Scheduling, Figure 6 
Production network structure for Example 1 (illustrative example)


Mixed Integer Optimization in Well Scheduling, Table 2 
Surface facilities: separator capacities for Example 1 (illustrative example)


Pressure (psia) 
Oil Capacity (STB/day) 


Gas Capacity (MSCF/day) 
Water Capacity (STB/day) 


MINLP optimization technique is applied to the well 
scheduling problem. The above result can be explained 
by considering the interaction of wells that share a com- 
mon flowline. This particular three-well network prob- 
lem has a well with a very high water cut (well 1), as can 




Mixed Integer Optimization in Well Scheduling, Table 3 
Comparison of structure and oil production results: heuristics vs. optimization

Structure (y1, y2, y3)    Objective function (STB/day)
(1, 1, 1)                 11929.2 (Heuristics)
(0, 1, 1)                 12104.2 (Optimization)


be seen from Table 1: this results in increased back pressure in the manifold flowline, which restricts oil production from wells 2 and 3. Shutting in well 1 reduces the pressure drop in the flowline, and the resulting increase in production from wells 2 and 3 more than compensates for the oil lost by shutting in well 1.
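The figures in Table 3 can be checked directly: the 175 STB/day gain reported in the text is the difference between the two objective values, which amounts to roughly 1.5% of the heuristic production level.

```python
heuristic = 11929.2   # all chokes fully open, (y1, y2, y3) = (1, 1, 1), Table 3
optimized = 12104.2   # well 1 shut in, (y1, y2, y3) = (0, 1, 1), Table 3

gain = optimized - heuristic
print(round(gain, 1), round(100.0 * gain / heuristic, 2))  # 175.0 1.47
```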


Example 2 


The proposed MINLP optimization strategy is also ap- 
plied to an oil field that comprises 11 naturally flowing 
wells, 2 manifolds and 3 separators: this production 
network is depicted in Fig. 7. Two types of wells are 
considered: (i) type A wells, designated as TBO1, TBO2,
TBO4, TBO5, TBO7, TBO8D and TB10, and (ii) type B
wells, designated as A11, A13, A15 and A18. All these
are naturally flowing wells and their well oil flowrate 
upper bounds are given in Table 4. The surface facil- 
ities consist of a high (HP), an intermediate (IP) and 
a low (LP) pressure separator, and the respective oper- 
ating pressures and capacities are summarized in Ta- 
ble 5. Two case studies are considered: the first is a gas 
coning oil field, while the second is a water coning oil 
field. The well bore model is generated from a reservoir 
Mixed Integer Optimization in Well Scheduling, Figure 7
Production network structure for Example 2a (initial pipeline configuration) 


Mixed Integer Optimization in Well Scheduling, Table 4 
Maximum flowrate values for wells 


2300 STB/day 7000 STB/day 4100 STB/day 


4300 STB/day TBO8D | 1000 STB/day 4200 STB/day 


2500 STB/day 7000 STB/day 1800 STB/day 


7500 STB/day 1200 STB/day 


Mixed Integer Optimization in Well Scheduling, Table 5 
Operating pressures and capacities of separators for Example 2a (gas coning) 


HP separator IP separator LP separator 
Capacity Optimal Capacity Optimal Capacity Optimal 
Pressure (psia) 


Oil (STB/day) 12541.9 


Gas (MMSCF/day) 24000 


Water (STB/day) 1236.5 


Total oil production 


NLP (LB) 28910 


MILP (UB) 31580 
Mixed Integer Optimization in Well Scheduling, Table 6
Optimal well flowrates by MINLP optimization (Example 2a) 


Well    Qo (STB/day)    Qg (MSCF/day)    Qw (STB/day)


Freoad|shutin | +t Sid 
Freoz | 632006 | zaasera | 00 | 


TBO7 | 6647.67 12543.21 109.594 


5814.3 
4100 
1291.18 
1800 
208.552 


8229.61 
9151.44 
3930.83 
4917.72 

650.76 


2392.47 
71.62 
1553.64 
432.51 
3.213 


simulator using a coning model. Details about this sec- 
ond example (oil flowrate as a function of bottomhole 
pressure, GOR and WC for both cases) are presented by 
Kosmidis [18]. 


Example 2a (gas coning problem) The initial structure of the production network is shown in Fig. 7,
and five (5) well interconnection changes are allowed 
for wells of type A and type B. The MINLP opti- 
mization problem involves 89 binary variables, 260 in- 
terpolation binary variables, 924 continuous variables, 
1082 constraints and the objective is the maximiza- 
tion of oil production. The optimization requires 5 
OA/AP iterations and the total oil production is 
29317.2 STB/day; the optimal production network 
structure is presented in Fig. 8. Table 5 summarizes the 
amount of oil, gas and water in the separators and the 
convergence history of the MINLP algorithm; the in- 
dividual well fluid flowrates are reported in Table 6. 
A remarkable observation is that the gas capacity of 
all separators is fully utilized at the optimal operat- 
ing point, as can be observed from the results of Ta- 
ble 5. 


Example 2b (water coning problem) This problem is 
again solved following the proposed MINLP optimiza- 
tion strategy. The initial structure of the field is pre- 
sented in Fig. 8; the maximum number of allowable 
well interconnection changes is seven (7) for wells of 
type A and type B. The MINLP problem converges in 
6 OA/AP iterations and the optimal structure is depicted in Fig. 9. Table 7 presents the amount of oil, gas and water in the separators and the convergence history of the MINLP problem, while the well fluid flowrates are reported in Table 8. The results of Table 7 suggest that the production bottleneck of the oil field is the water capacity of the separators, and the proposed MINLP optimization method manages to allocate and operate the wells in such a way that the available water capacity is almost fully utilized. The manifold flowline that is connected to the HP separator in the initial structure (Fig. 7) is reallocated to the IP separator, since the latter has a larger water capacity than the HP separator (Table 7).


Heuristic Rules vs. Optimization Examples 2a 
and 2b are also both solved with heuristic rules, by 
applying the following procedure: 

STEP 0. Consider an initial pipeline structure identical 
to that of the previous day. 

STEP 1. Set the chokes fully open and solve the corre- 
sponding production network problem. 

STEP 2. If some of the resulting well flowrates from Step 1 violate their upper bounds, then choke back these wells until the respective upper bounds are satisfied.

STEP 3. The following two heuristic rules are applied sequentially (one well at a time):


(i) Choke back the well according to the following heuristic rule: if gas and/or water capacity constraints are violated, then choke back the well with the highest GOR and/or WC, respectively, until the capacity constraints are satisfied. Then either terminate or go to Step 3 (ii).

(ii) Allocate high GOR wells to the HP separator and 
high WC wells to the LP separator, and go back to 
Step 1. 
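Rule (i) of Step 3 can be sketched for the gas-capacity case. The sketch assumes, purely for illustration, that each well's gas rate equals its oil rate times a constant GOR and that choking back one well affects no other well; these simplifications are exactly the system interactions the heuristic neglects. Well names and data are hypothetical.

```python
def choke_back_for_gas(wells, gas_capacity):
    """Rule 3(i) for a gas capacity violation: choke back the highest-GOR wells first.

    wells        -- dict: name -> (oil rate STB/day, GOR SCF/STB)
    gas_capacity -- separator gas capacity (MSCF/day)
    Assumes gas rate = GOR * oil rate for each well (constant GOR, no network effects).
    """
    rates = {name: q for name, (q, _) in wells.items()}

    def total_gas():
        return sum(rates[n] * wells[n][1] / 1000.0 for n in rates)  # MSCF/day

    for name in sorted(wells, key=lambda n: wells[n][1], reverse=True):
        excess = total_gas() - gas_capacity
        if excess <= 0:
            break
        gor = wells[name][1]
        cut = min(rates[name], excess * 1000.0 / gor)  # STB/day removed from this well
        rates[name] -= cut
    return rates, total_gas()


# Hypothetical wells: name -> (oil rate, GOR)
wells = {"W1": (1000.0, 3000.0), "W2": (2000.0, 1000.0), "W3": (1500.0, 500.0)}
rates, gas = choke_back_for_gas(wells, gas_capacity=2000.0)
print(rates, gas)  # {'W1': 0.0, 'W2': 1250.0, 'W3': 1500.0} 2000.0
```

In this example the highest-GOR well is shut in entirely before the next well is touched, regardless of how much oil is lost, which illustrates why such rules can be suboptimal.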


It must be noted that: (i) the heuristic rules are applied sequentially, and (ii) the termination criterion is based on the operator's judgment. The results from the application of heuristic rules are obtained by repeatedly applying the procedure described above until the maximum number of allowable interconnection changes is reached. The production network structures resulting from the application of heuristic rules in Examples 2a and 2b are depicted in Figs. 10 and 11, respectively. Tables 9 and 10 summarize the results derived from both the MINLP optimization and the heuristic strategies. The comparison clearly demonstrates the economic benefits of the proposed MINLP optimization strategy, which in both examples achieves increases in oil production of up to 10%. Several factors explain these superior results: (i) the simplistic nature of heuristic rules, which consider only the individual well GOR and WC and neglect other parameters (e.g. productivity index, pipeline length and diameter), (ii) heuristic strategies do not directly account for system interactions, which become important when wells share a common flowline, and (iii) heuristic methods often have ad hoc or unclear termination criteria.


Mixed Integer Optimization in Well Scheduling, Figure 8
Optimal production network structure by MINLP optimization (Example 2a)


Mixed Integer Optimization in Well Scheduling, Table 7
Optimal surface separator capacities by MINLP optimization (Example 2b)

HP separator    IP separator    LP separator
Capacity Optimal Capacity Optimal Capacity Optimal
Pressure (psia) 1235
Oil (STB/day)    15000    5900    10000    9714.5    10000    9684.4
14069.2 | 18000 |18000 |
Gas (MMSCF/day) | 24000
Water (STB/day)
460 165
18000 | 18000 18000
Total oil production
NLP (LB)
MILP (UB)
Mixed Integer Optimization in Well Scheduling, Figure 9
Optimal production network structure by MINLP optimization (Example 2b)


Mixed Integer Optimization in Well Scheduling, Table 8 
Optimal well flowrates by MINLP optimization (Example 2b) 


Well    Qo (STB/day)    Qg (MSCF/day)    Qw (STB/day)


Freoa [shutin | +t 


TBO7 | 6864.4 12937.2 1893.7 


1691.2 
4100 
Shut in 
1800 
1158.8 


Integration of Reservoir Multiphase Flow Simulation and Optimization


Dynamic simulation and optimization of oil and gas production systems is a research trend with the potential to meet the challenges faced by the international oil and gas industry, as has already been demonstrated in a wide variety of publications in the open literature. The multiphase flow in reservoirs and wells governs fuel transport and production, but it is mostly handled by algebraic approximations in modern optimization applications: true reservoir state variable profiles (initial/boundary conditions) are generally not known. Nevertheless, oil reservoirs, wells, pipelines, manifolds and surface facilities are all equally important elements of a spatially and temporally distributed complex system, and the potential contribution of CFD methods has not been fully explored so far, even though it is generally recognized that computing accurate reservoir and well state variable profiles can be extremely useful for optimization. This section discusses a strategy for interfacing reservoir simulation (ECLIPSE®) with equation-oriented process optimization (gPROMS®) and presents a relevant application [13].

The complex multiphase flow in oil production 
fields is of paramount importance. Despite intensive 
Mixed Integer Optimization in Well Scheduling, Table 9
Comparison of results: MINLP optimization vs. heuristics (Example 2a) 


Example 2a (high GORs)    Capacity Qo (STB/day)    Optimization Qo (STB/day)    Heuristics Qo (STB/day)


9584.15 9004.296 
7191.18 7191.186 
12541.894 12321.138 
Total 29317.2 28516.6 
Benefit (STB/day) 800.5 (+2.3%) 
Mixed Integer Optimization in Well Scheduling, Figure 10
Heuristic production network structure (Example 2a) 


Mixed Integer Optimization in Well Scheduling, Table 10 
Comparison of results: MINLP optimization vs. heuristics (Example 2b) 


Example 2b (high WCs)    Capacity Qo (STB/day)    Optimization Qo (STB/day)    Heuristics Qo (STB/day)


10000 9684.4 7424.407 
10000 9714.5 9311.058 


Benefit (STB/day)    2663.3 (+11.8%)
Mixed Integer Optimization in Well Scheduling, Figure 11
Heuristic production network structure (Example 2b) 


experimentation and extensive CFD simulations to- 
wards improved understanding of flow and phase dis- 
tribution, commercial optimization applications have 
not benefited adequately from accurate sub-surface 
multiphase CFD modeling, and knowledge from field 
data is not readily implementable in commercial soft- 
ware. Model integration can enable the employment 
of two-phase reservoir CFD simulation, towards en- 
hanced oil or gas production from depleted or gas-rich 
reserves, respectively. 

The concept of integrated modeling and optimiza- 
tion of oil and gas production treats oil reservoirs, 
wells and surface facilities as a single (albeit multiscale) 
system, and focuses on computing accurate reservoir 
state variable profiles (as initial/boundary conditions). 
The upper-level optimization can thus benefit from 
the low-level reservoir simulation of oil and gas flow, 
yielding flow control settings and production resource 
allocations. The components of this system are tightly interconnected, and well operation, allocation of wells to headers and manifolds, gas lift allocation and control of unstable gas lift wells are only some of the problems that can be addressed via this unified framework.
Figure 12 presents the concept of integrated modeling 
of oil and gas production systems. 


Literature Review and Challenges for Integrated Modeling and Optimization


A number of scientific publications address modeling 
and simulation of oil extraction: they either focus on ac- 
curate reservoir simulation, without optimization con- 
siderations [15,22], or on optimal well planning and 
operations, with reduced [8,23,29,32] or absent [28,30] 
reservoir models. A recent paper [16] is the only one that considers a three-dimensional field topology (without additional flow constraints) for well placement optimization. Computational Fluid Dynamics (CFD) is a powerful technology, suitable for studying the dynamic behavior of reservoirs for efficient field operation [2]. The MINLP formulation for oilfield production optimization of Kosmidis [19] uses detailed well
Mixed Integer Optimization in Well Scheduling, Figure 12
Integrated modeling concept for oil and gas production systems optimization: illustration of the hierarchy of levels and all production circuit elements


models and serves as a starting point in the case examined in this section. Therein, the nonlinear reservoir behavior, the multiphase flow in pipelines, and surface capacity constraints are all considered (multiphase flow is handled by DAE systems, which in turn comprise ODEs for the flow equations and algebraic equations for the physical properties). The model uses a degrees-of-freedom analysis and well bounding, but, most importantly, it approximates each well model with piecewise linear functions (via data preprocessing).

Here, explicit reservoir flow simulation via a dynamic reservoir simulator (ECLIPSE®) is combined with an equation-oriented process optimizer (gPROMS®), towards integrated modeling and optimization of a literature problem [13]. An asynchronous approach is employed: the first step is the calculation of state variable profiles from a detailed description of the production system (reservoir) via ECLIPSE®. This is possible by rigorously simulating the multiphase flow within the reservoir, with real-world physical properties (whose extraction is laborious [7]). These dynamic state variable profiles (pressure, oil, gas and water saturation, flows) are much more accurate than piecewise linear approximations [18], and they serve as initial conditions for the higher-level dynamic optimization model (within gPROMS®). Crucially, these profiles constitute major sources of uncertainty in simplified models. By considering the evolution of the oil and gas pressure drop within the reservoir and along the wells, one can solve single-period or multi-period dynamic optimization problems that yield superior optima, because piecewise linear pressure underestimation is avoided. While integrating the different levels (sub-surface elements and surface facilities, Fig. 12) is vital, the interfacing of CFD simulation with MINLP optimization is pursued here in an asynchronous fashion, given the computational burden of nesting CFD within MINLP.

The concept of integrated modeling and optimiza- 
tion is illustrated in Fig. 13. 


Problem Definition and Model Formulation 


Mixed Integer Optimization in Well Scheduling, Figure 13
Integrated modeling and optimization of oil and gas production systems: illustration of the explicit consideration of multiphase flow within reservoirs and wells (panels: top-level dynamic optimization of the surface and well system; bottom-level simulation of the well and reservoir system; extraction of accurate 1D boundary conditions for use in gPROMS)


Dynamic CFD modeling for explicit multiphase flow simulation in reservoirs and wells comprises a large number of conservation laws and constitutive equations for closure: Table 1 presents only the most important ones, which are implemented in ECLIPSE®. The black-oil model [25] is adopted in this study to manage complexity. More complicated compositional models are widely applied [2], accounting explicitly for the concentrations of different hydrocarbon real- or pseudo-species. A black-oil model allows for multiphase simulation with only three phases (oil, water, gas):

Multiphase flow CFD model equations (Nomencla- 
ture [19]): 
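Oil:

$$\nabla \cdot \left[ \frac{k\, k_{ro}}{\mu_o B_o} \nabla \left( P_o + \rho_o g h \right) \right] + q_o = \frac{\partial}{\partial t} \left( \frac{\phi S_o}{B_o} \right) \qquad (70)$$

Water:

$$\nabla \cdot \left[ \frac{k\, k_{rw}}{\mu_w B_w} \nabla \left( P_w + \rho_w g h \right) \right] + q_w = \frac{\partial}{\partial t} \left( \frac{\phi S_w}{B_w} \right) \qquad (71)$$

Gas:

$$\nabla \cdot \left[ \frac{k\, k_{rg}}{\mu_g B_g} \nabla \left( P_g + \rho_g g h \right) + R_s \frac{k\, k_{ro}}{\mu_o B_o} \nabla \left( P_o + \rho_o g h \right) \right] + q_g = \frac{\partial}{\partial t} \left[ \phi \left( \frac{S_g}{B_g} + R_s \frac{S_o}{B_o} \right) \right] \qquad (72)$$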
Total pressure gradient: 
dP w(x)S 
oF = gpm (x) sin(o) — 2) (73) 
Capillary pressure (oil/gas):
$$P_{cog}(S_o, S_g) = P_g - P_o \qquad (74)$$
Capillary pressure (oil/water):
$$P_{cow}(S_o, S_w) = P_o - P_w \qquad (75)$$
Multiphase mixture saturation:
$$S_o + S_w + S_g = 1 \qquad (76)$$
Mixed Integer Optimization in Well Scheduling, Figure 14
Temporal evolution of pressure, oil saturation and gas/oil ratio in an oilfield: the gradual depletion of oil in reservoirs is explicitly considered for optimization (t: yr)


Multiphase mixture density:
$$\rho_m(x) = \rho_l(x) E_l(x) + \rho_g(x) E_g(x)$$
Multiphase mixture viscosity:
$$\mu_m(x) = \mu_l(x) E_l(x) + \mu_g(x) E_g(x)$$
Multiphase mixture superficial velocity:
$$U_m(x) = \frac{\rho_l(x) U_{sl}(x) + \rho_g(x) U_{sg}(x)}{\rho_m(x)}$$
Multiphase mixture holdup closure:
$$E_g(x) + E_l(x) = 1 \qquad (77)$$
Drift flux model (gas holdup):
$$E_g = F_d(U_{sl}, U_{sg}, \text{mixture properties}) \qquad (78)$$
Choke model (for well & valve i): 
(79) 


qui = fcldi, Pi(x3,), Px), Cis Vg,is Vw,i) 


(80) 


(81) 


(82) 
Mixed Integer Optimization in Well Scheduling, Table 11
Oil production optimization by explicit CFD simulation boundary conditions

Example 2a, Kosmidis et al. [ ]    Total capacity    Via performance indices    With explicit reservoir simulation
Oil production (STB/day)           35000             29317                      30193.7 (+2.9%)
Gas production (MSCF/day)          60000             60000                      60000
Water production (STB/day)         14000             11294.3                    11720.1 (+3.8%)


Choke setting (for well & valve i):
$$c_i=\max\left(c_c,P_i(x_i^{out}),P_i(x_i^{in})\right) \qquad (83)$$
Performance (flow vs. pressure):
$$q_{j,i}=f_j\left(P_{wf,j,i}\right),\quad \forall i\in I,\ \forall j\in\{o,w,g\} \qquad (84)$$


Reduced (1D) multiphase flow balances were solved using a fully implicit formulation and Newton's method, but only for the wells and not for the reservoir [18]. The present section uses: (a) explicit 3D multiphase flow simulation of the reservoir and wells, (b) elimination of Eq. (84) (the performance relations and their preprocessing become obsolete once CFD is used), and (c) CFD profiles as initial conditions (in an asynchronous fashion) for dynamic optimization. The MINLP optimization objective (maximize oil production) and model structure are adopted from the literature [20] via a gPROMS®-SLP implementation. Adopting an SQP strategy can increase robustness, but also computational complexity.


Reservoir Multiphase Flow Simulation Results 


Dynamic multiphase flow simulation results (ECLIPSE®) are presented in Fig. 14.


Oil Production Optimization Results 


Dynamic optimization via explicit CFD simulation of 
a particular oil field problem can improve on results 
from MINLP optimization: the comparison is pre- 
sented in Table 11. 


Conclusions 


A novel MINLP optimization formulation for the well 
scheduling problem has been proposed in this Chapter: 
the optimal connectivity of wells to manifolds and sep- 
arators is treated simultaneously with the optimal well 
operation and gas lift allocation. The algorithm avoids 
examining infeasible connections of wells to manifolds 
or separators by incorporating appropriate integer cuts 


in the formulation: these, along with the incorporation of operational logic constraints on the maximum number of well switches, lead to satisfactory computational performance: convergence has been achieved in fewer than 6 iterations in all cases examined.
The business value of the new MINLP formulation has 
been investigated by comparing the proposed method 
with established heuristic rules, and an increase of up 
to 10% in oil production has been observed for the cases 
studied [18]. 

The combination of dynamic multiphase CFD sim- 
ulation and MINLP optimization has the potential to 
yield improved solutions towards efficiently maximiz- 
ing oil production. This Chapter also addresses inte- 
grated oilfield modeling and optimization, treating the 
oil reservoirs, wells and surface facilities as a com- 
bined system: most importantly, it stresses the ben- 
efit of computing accurate state variable profiles for 
reservoirs via CFD. Explicit CFD simulations via a dynamic reservoir simulator (ECLIPSE®, Schlumberger) are combined with equation-oriented process optimization software (gPROMS®, PSE): the key idea is to
use reduced-order copies of CFD profiles for dynamic 
optimization. The literature problem solved shows that 
explicit use of CFD results in optimization yields im- 
proved optima at additional cost (CPU cost and cost 
for efficient separation of the additional water; the per- 
centage difference is due to accurate reservoir simu- 
lation). These can also be evaluated systematically for 
larger case studies under various conditions [14]. 
Background
Mixed-Integer Programming 


Mixed-Integer Programming (MIP) [5] emerged in the 
mid 1950s as an extension of Linear Programming (LP) 
to include both integer and continuous variables. It 
was developed to address a variety of problems (facil- 
ity location, scheduling, design of plants and networks, 
etc.) where discrete decisions needed to be made. There 
are two main algorithms used to solve MIP models: 
branch-and-bound [5,31] and cutting planes. When 
the two solution methods are combined we have the 
branch-and-cut algorithm, where cutting planes are 
added until either an integral solution is found or it be- 
comes impossible or too expensive to find another cut- 
ting plane. In the latter case, a traditional branch op- 
eration is performed and the search for cutting planes 
continues for the subproblems. Balas developed an al- 
gorithm for 0-1 problems to obtain dual bounds and 
check primal feasibility [3]. The idea of cutting planes 
was originally proposed by Gomory in [17], and a cut- 
ting plane algorithm was presented by Gomory in [18]. 
A general procedure for bounded programs was pro- 
posed by Chvatal in [13]. 
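The interplay of bounding, pruning and branching described above can be sketched on a toy 0-1 knapsack. The Python below is a minimal LP-based branch-and-bound, assuming a hypothetical knapsack instance (not from the text) with positive weights, whose LP relaxation is solved greedily by value/weight ratio. No cutting planes are generated, so it illustrates only the branch-and-bound half of branch-and-cut.

```python
def lp_relaxation(values, weights, capacity, fixed):
    """LP relaxation of a 0-1 knapsack with some variables fixed.

    fixed[i] is 1, 0 or None (free); weights are assumed positive.
    Returns None if the items fixed to 1 exceed the capacity, otherwise
    (bound, index of the fractional item, or None if the LP optimum is integral).
    """
    bound, cap = 0.0, capacity
    for i, f in enumerate(fixed):
        if f == 1:
            bound += values[i]
            cap -= weights[i]
    if cap < 0:
        return None
    free = [i for i, f in enumerate(fixed) if f is None]
    free.sort(key=lambda i: values[i] / weights[i], reverse=True)  # greedy is LP-optimal
    for i in free:
        if weights[i] <= cap:
            bound += values[i]
            cap -= weights[i]
        else:
            return bound + values[i] * cap / weights[i], i
    return bound, None


def branch_and_bound(values, weights, capacity):
    n = len(values)
    best_value, best_x = 0.0, [0] * n          # the empty knapsack is feasible
    stack = [[None] * n]                        # root node: nothing fixed
    while stack:
        fixed = stack.pop()
        relax = lp_relaxation(values, weights, capacity, fixed)
        if relax is None:
            continue                            # infeasible node: prune
        bound, frac = relax
        if bound <= best_value:
            continue                            # cannot beat the incumbent: prune
        if frac is None:                        # LP optimum is integral: new incumbent
            best_value = bound
            best_x = [1 if f in (1, None) else 0 for f in fixed]
            continue
        for v in (0, 1):                        # branch on the fractional variable
            child = list(fixed)
            child[frac] = v
            stack.append(child)
    return best_value, best_x


if __name__ == "__main__":
    print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))
```

For this instance the search should end with value 220, selecting the second and third items; most nodes are discarded by the bound test rather than explored.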

The results of Edmonds and Fulkerson in the late 
1960s led several authors to propose other, specific 
types of cutting planes: cover inequalities [4,5], flow 
cover inequalities [4], and GUB constraints [51]. Due to the incorporation of these theoretical results, the efficiency of commercial solvers has been greatly enhanced during the last decade. Advances in preprocessing, more sophisticated branching and node selection rules, as well as the use of primal heuristics, have also contributed to the improvement of MIP solvers.
Special techniques have also been used extensively for 
the solution of MIP problems, when the set of con- 
straints exhibits a special structure. The most popular of 
these schemes are Benders decomposition [9] and Lagrangean relaxation [16,19]. More information on MIP can be found in [38] and [52], while an exposition of recent progress in solution techniques for MIP models can be found in [30].


Constraint Programming 


Constraint Programming [24,47] is a relatively new 
modeling and solution paradigm that was originally de- 
veloped to solve feasibility problems, but it has been 
extended to solve optimization problems as well. Con- 
straint Programming (CP) has emerged as a very in- 
teresting sub-field of logic programming that aims at 
combining the declarative aspect of logic programming 
and constraint solving in an efficient problem solv- 
ing environment [29]. Optimization problems in Con- 
straint Programming are solved as Constraint Satisfac- 
tion Problems (CSP), where we have a set of variables, 
a set of possible values for each variable (domain) and 
a set of constraints among the variables. Constraints are 
solved with methods and advanced techniques originat- 
ing in various areas, from Artificial Intelligence, Oper- 
ations Research and Discrete Mathematics. The com- 
putation domains handled by CP solvers are quite di- 
verse, including Boolean algebra, linear programming, 
finite domains, and list and set handling. Successful in- 
dustrial applications were implemented with CP solvers 
over finite domains in production planning, schedul- 
ing and resource applications [44]. Finite domain con- 
straints are expressed over variables, which range over 
a finite set of possible values. Constraints may be arith- 
metic, symbolic or global constraints [1] that have been 
developed to efficiently model and solve complex prob- 
lems. A CP program is usually structured as follows: 
(1) declaration of decision variables, (2) constraints and 
(3) the enumeration/optimization. The question to be answered is: is there an assignment of values to variables that satisfies all constraints? Constraint Programming is very expressive, as continuous, integer and boolean variables are permitted and, moreover,
variables can be indexed by other variables. A CP prob- 
lem can be seen as a network of constraints. As soon as 
some information becomes available at some points in 
this network, constraints are invoked to check consis- 
tency and to remove inconsistent values by applying ef- 
ficient handling methods. The new domain reductions 
are propagated through the network. The solution of 
CP models is based on performing constraint propaga- 
tion at each node by reducing the domains of the vari- 
ables. If an empty domain is found the node is pruned. 
Branching is performed whenever the domain of an integer, binary or boolean variable has more than one element, or when the bounds of the domain of a continuous variable do not lie within a tolerance. Whenever
a solution is found, or a domain of a variable is reduced, 
new constraints are added. The search terminates when 
no further nodes must be examined. 
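The propagate-then-branch search just described can be sketched in a few lines of Python. This is a minimal finite-domain solver, assuming only binary constraints given as Python predicates; propagation is a simple arc-consistency fixpoint, whereas real CP solvers use specialized propagators for each constraint type. The n-queens model is an illustrative example, not taken from the text.

```python
def revise(domains, a, b, rel):
    """Keep only the values of variable a that have a supporting value in b."""
    supported = {v for v in domains[a] if any(rel(v, w) for w in domains[b])}
    changed = supported != domains[a]
    domains[a] = supported
    return changed


def propagate(domains, constraints):
    """Reduce domains until nothing changes; False if some domain becomes empty."""
    changed = True
    while changed:
        changed = False
        for x, y, pred in constraints:
            changed |= revise(domains, x, y, pred)
            changed |= revise(domains, y, x, lambda v, w, p=pred: p(w, v))
            if not domains[x] or not domains[y]:
                return False
    return True


def solve(domains, constraints):
    domains = {v: set(d) for v, d in domains.items()}   # work on a copy
    if not propagate(domains, constraints):
        return None                                      # empty domain: prune node
    if all(len(d) == 1 for d in domains.values()):
        return {v: next(iter(d)) for v, d in domains.items()}
    # Branch on a variable with more than one value (smallest domain first)
    var = min((v for v in domains if len(domains[v]) > 1), key=lambda v: len(domains[v]))
    for value in sorted(domains[var]):
        child = dict(domains)
        child[var] = {value}
        result = solve(child, constraints)
        if result is not None:
            return result
    return None


def queens(n):
    """n-queens: variable i is the column of the queen placed in row i."""
    doms = {i: set(range(n)) for i in range(n)}
    cons = [(i, j, lambda a, b, d=j - i: a != b and abs(a - b) != d)
            for i in range(n) for j in range(i + 1, n)]
    return doms, cons


if __name__ == "__main__":
    print(solve(*queens(4)))
```

Branching on the smallest domain is the common "first-fail" heuristic. The 3-queens instance, which has no solution, is rejected once every branch ends in an empty domain.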

The effectiveness of CP depends on the propaga- 
tion mechanism behind the constraints. Thus, even 
though many constructs and constraints are avail- 
able, not all of them have efficient propagation mech- 
anisms. For some problems, such as scheduling, prop- 
agation mechanisms have been proven to be very ef- 
fective. Some of the most common propagation rules 
for scheduling are the “time-table” constraint [32], 
the “disjunctive-constraint” propagation [6,45], the 
“edge-finding” [12,39] and the “not-first, not-last” [7]. 
Constraint-based scheduling algorithms can be found
in [8]. General information on CP can be found 
in [24,27,36,47]. 


Methods 


Several authors have compared MIP and CP based ap- 
proaches for solving a variety of problems [21,26], and 
the main findings are as follows: 

• MIP based techniques are very efficient when the LP relaxation is tight and the models have a structure that can be effectively exploited.

• CP based techniques are better for highly constrained discrete optimization problems.

Since the two approaches appear to have complementary strengths, several researchers have proposed models that integrate the two paradigms in order to solve difficult problems that are not effectively solved by either of them. The integration between MIP and CP can be achieved in two ways [26,48]:

1. By combining MIP and CP constraints into one hybrid model. In this case a hybrid algorithm that integrates constraint propagation with linear programming in a single search tree is also needed for the solution of the model (e.g. see [21,42]).

2. By decomposing the original problem into two subproblems: one MIP and one CP subproblem. Each model is solved separately, and information obtained while solving one subproblem is used for the solution of the other subproblem [11,28].

Bockmayr and Kasper [10] have presented a unifying framework, called Branch and Infer, which can be used for the development of various integration schemes. Hooker et al. [25] have proposed a new modeling paradigm to perform efficient integration of MIP
and CP techniques. In general, it is not clear whether 
an integration strategy performs better than a stan- 
dalone MIP or CP approach, especially when the prob- 
lem at hand is solved effectively by one of the two ap- 
proaches. For some problems, however, the integration 
of the two approaches has led to significant compu- 
tational improvements. Common integration schemes 
include the derivation of cuts for MIP formulations 
using CP techniques, the use of CP to accelerate col- 
umn generation, and the use of CP local search to solve 
MIP scheduling problems. Integration schemes are de- 
scribed in [21,23,26,27,37], and [48]. 

MIP/CP hybrid schemes are particularly successful for scheduling problems that often arise in the manufacturing, chemical and food industries, in transportation industries and in computing environments. To solve
a scheduling problem one has to (i) allocate limited re- 
sources to tasks, and (ii) sequence the tasks allocated to 
a single resource. We will refer to the first set of deci- 
sions as the assignment problem, and the second set of 
decisions as the sequencing problem. 

While heuristic methods are widely used, rigorous 
optimization methods have also been studied. To solve 
some hard scheduling problems to optimality, several 
authors have proposed MIP/CP hybrid schemes that 
exploit the complementary strengths of Mathematical 
and Constraint Programming. The main idea behind 
these approaches is to solve a relaxed MIP model to 
determine the allocation of machines to tasks, and use 
CP to check the feasibility of a given assignment and 
to generate cuts that are added in the relaxed MIP 
model. Thus, the complementary strengths of the two 
methods are combined: Mathematical Programming is 
used for optimization (i.e. identify potentially good 
assignments) and Constraint Programming to check 
feasibility. 


Applications 


A scheduling problem that has been widely studied us- 
ing hybrid schemes is the Multi-Machine Assignment 
Scheduling Problem (MMASP) with Release and Due 
Times. In this problem a set $I$ of $N$ jobs has to be processed on a set $J$ of $M$ machines; the processing of job $i\in I=\{1,\dots,N\}$ on any machine $j\in J=\{1,\dots,M\}$ must start after its release time $r_i$ and must be completed before its due time $d_i$; the processing time and processing cost of job $i\in I$ on machine $j\in J$ are $p_{ij}$ and $c_{ij}$, respectively. The objective is to minimize the total processing cost. The MMASP was first studied by Hooker et al. [22] in a hybrid optimization framework.

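A MIP model (M) for the MMASP consists of constraints (1)-(6):

$$\min Z=\sum_{i\in I}\sum_{j\in J}c_{ij}x_{ij} \qquad (1)$$
$$\sum_{j\in J}x_{ij}=1\quad\forall i\in I \qquad (2)$$
$$s_i\ge r_i\quad\forall i\in I \qquad (3)$$
$$s_i+\sum_{j\in J}p_{ij}x_{ij}\le d_i\quad\forall i\in I \qquad (4)$$
$$y_{ii'}+y_{i'i}\ge x_{ij}+x_{i'j}-1\quad\forall j\in J,\ \forall i\in I,\ \forall i'\in I\mid i<i' \qquad (5)$$
$$s_i+\sum_{j\in J}p_{ij}x_{ij}\le s_{i'}+M\left(1-y_{ii'}\right)\quad\forall i\in I,\ \forall i'\in I\mid i\ne i' \qquad (6)$$

where binary $x_{ij}$ is 1 if job $i$ is assigned to machine $j$, binary $y_{ii'}$ is 1 if job $i$ is scheduled before job $i'$ on the same machine, and $s_i$ is the start time of processing of job $i$.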

Constraint (2) ensures that each job is processed on exactly one machine. Constraints (3) and (4) restrict each job to start after its release time and finish before its due time, respectively. Constraint (5) imposes the condition that if both jobs $i$ and $i'$ are assigned to the same machine $j$ (i.e. $x_{ij}+x_{i'j}-1=1$), then jobs $i$ and $i'$ must be sequenced (i.e. $y_{ii'}=1$ or $y_{i'i}=1$). Constraint (6) is a big-M sequencing constraint that is active when $y_{ii'}$ is 1.

Hooker et al. [22] and Jain and Grossmann [28] showed that model (M) is not efficient, due to the poor LP relaxation caused by the big-M sequencing constraint (6). Furthermore, they showed that standalone CP models are not efficient either, due to the large number of different assignments. To overcome this, the authors proposed a scheme where an IP master problem and a CP subproblem are solved iteratively. The IP master problem is a relaxation of model (M) and is used to determine an assignment. The CP subproblem is used to check the feasibility of the current assignment: if it is infeasible, integer cuts are added and the IP master problem is re-solved; if it is feasible, the subproblem gives a feasible sequence and the algorithm terminates. The IP master problem consists of constraints (1)-(2), (7) and the integer cuts that are added at each iteration. Constraint (7) is used to eliminate infeasible assignments:


$$\sum_{i\in I}p_{ij}x_{ij}\le\max_{i\in I}\{d_i\}-\min_{i\in I}\{r_i\}\quad\forall j\in J \qquad (7)$$
The IP master problem does not include the sequencing binary variables $y_{ii'}$ or the big-M constraint (6); it is therefore solved quickly and, at iteration $k$, yields a complete job-machine assignment $x^k$. The CP subproblem is then used to check whether the current assignment $x^k$ is feasible. At each iteration $k$, the set $I_j^k$ of jobs assigned to machine $j\in J$, the processing time $p_i^k$ of each job, and the domain $D_i^k$ for the start time of job $i$ (i.e. $s_i\in D_i^k$) are given by (8)-(10), respectively:

$$I_j^k=\{i\in I\mid x_{ij}^k=1\}\quad\forall j\in J \qquad (8)$$
$$p_i^k=\sum_{j\in J}p_{ij}x_{ij}^k\quad\forall i\in I \qquad (9)$$
$$D_i^k=\left[r_i,\ d_i-p_i^k\right]\quad\forall i\in I \qquad (10)$$


Thus, the CP subproblem reduces to $|J|$ independent one-machine problems, and for each of these problems we try to find a sequence of the jobs in $I_j^k$ that satisfies constraint (10) and the non-overlapping of jobs assigned to machine $j$ (see (5) and (6)). This problem can be solved using the global constraint cumulative [1] and various propagation techniques (time-table, disjunctive, edge-finding, etc.).


$$\text{cumulative}\left(\{(s_i,d_i,r_i)\}_{i\in O},\ l,\ e\right) \qquad (11)$$


The basic version of cumulative (see detailed examples in [2]) takes 3 arguments. Argument 1 is the set of operations $O$, where each operation is characterized by three parameters, which can be either domain variables or values: the starting time $s_i$, the duration $d_i$, and the amount $r_i$ of some resource used by the operation. The second argument $l$ is the upper bound on the resource consumption. The third argument $e$ is the completion time. In this case, Eq. (11) can be written as follows:


$$\text{cumulative}\left(\{(s_i,p_i^k,1)\}_{i\in I_j^k},\ 1,\ \max_{i\in I_j^k}\{d_i\}\right)\quad\forall j\in J \qquad (12)$$


The global cumulative constraint is satisfied if the following conditions hold:

$$\sum_{i\in O:\ s_i\le t<s_i+d_i}r_i\le l\quad\forall t\qquad\text{AND}\qquad\max_{i\in O}\left(s_i+d_i\right)\le e \qquad (13)$$
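To make conditions (13) concrete, the Python check below tests a fixed set of operations against them. It is a sketch under the assumption of integer start times and durations, so the resource profile only needs to be evaluated at unit time steps; a CP solver would instead propagate these conditions on variable domains rather than check one fixed assignment. The function name is hypothetical.

```python
def cumulative_satisfied(ops, limit, end):
    """Check conditions (13) for fixed integer data.

    ops: list of (start, duration, demand) triples, i.e. (s_i, d_i, r_i);
    limit: resource capacity l; end: completion bound e.
    """
    if not ops:
        return True
    finish = max(s + d for s, d, _ in ops)
    if finish > end:                                          # max(s_i + d_i) <= e
        return False
    for t in range(min(s for s, _, _ in ops), finish):
        load = sum(r for s, d, r in ops if s <= t < s + d)   # operations running at t
        if load > limit:
            return False
    return True


if __name__ == "__main__":
    print(cumulative_satisfied([(0, 2, 1), (2, 3, 1)], 1, 5))
```

With unit demands and a limit of 1, as in (12), the check reduces to verifying that no two jobs on the same machine overlap and that all of them finish by the largest due time.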


If there is no sequence for machine $j$ that satisfies constraint (12), then the current assignment $x^k$ is infeasible. For every infeasible one-machine problem we add the following integer cut to the cut pool of the master problem:

$$\sum_{i\in I_j^k}x_{ij}\le\left|I_j^k\right|-1 \qquad (14)$$


If the IP master problem is solved to optimality, the 
lower bound provided by the optimal solution Z* of 
the IP is non-decreasing, and the first feasible assign- 
ment is the assignment that yields the optimal solution 
with a minimum assignment cost. A schematic of the 
proposed algorithm is given in Fig. 1. The hybrid itera- 
tive approach was shown to be considerably faster than 
standalone MIP and CP models. 
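The iterative scheme can be illustrated on a tiny instance. The Python sketch below assumes problems small enough for exhaustive enumeration: the master problem is solved by enumerating all assignments that satisfy (2), (7) and the accumulated cuts (14), and the CP subproblem is replaced by a brute-force search over job orders on each machine. These stand-ins are for illustration only; they replace, rather than reproduce, the IP and CP solvers used by Jain and Grossmann [28], and all names are hypothetical.

```python
from itertools import permutations, product


def one_machine_feasible(jobs, r, d, p):
    """Stand-in for the CP subproblem: can `jobs` be sequenced on one machine
    with every job inside its window? For a fixed order, starting each job as
    early as possible is enough, so all orders are tried."""
    for order in permutations(jobs):
        t, ok = 0, True
        for i in order:
            start = max(t, r[i])            # s_i >= r_i and after the previous job
            if start + p[i] > d[i]:          # s_i must lie in D_i = [r_i, d_i - p_i]
                ok = False
                break
            t = start + p[i]
        if ok:
            return True
    return False


def solve_mmasp(c, p, r, d):
    """Master/subproblem loop; c[i][j] and p[i][j] are cost and time of job i on machine j."""
    n, m = len(c), len(c[0])
    horizon = max(d) - min(r)
    cuts = []                                # (machine j, job set S): not all of S on j, cf. (14)
    while True:
        best = None
        # "Master problem": exhaustive search over assignments, each job on one machine (2)
        for assign in product(range(m), repeat=n):
            if not all(sum(p[i][j] for i in range(n) if assign[i] == j) <= horizon
                       for j in range(m)):  # knapsack constraint (7)
                continue
            if any(all(assign[i] == j for i in S) for j, S in cuts):
                continue                     # violates an integer cut
            cost = sum(c[i][assign[i]] for i in range(n))
            if best is None or cost < best[0]:
                best = (cost, assign)
        if best is None:
            return None                      # no assignment survives: infeasible
        cost, assign = best
        new_cuts = []
        for j in range(m):
            jobs = [i for i in range(n) if assign[i] == j]    # I_j^k, (8)
            p_k = {i: p[i][j] for i in jobs}                  # p_i^k, (9)
            if not one_machine_feasible(jobs, r, d, p_k):
                new_cuts.append((j, frozenset(jobs)))
        if not new_cuts:
            return cost, assign              # first feasible assignment is optimal
        cuts.extend(new_cuts)


if __name__ == "__main__":
    print(solve_mmasp([[1, 5]] * 3, [[2, 2]] * 3, [0, 0, 0], [2, 2, 6]))
```

Each cut removes only assignments that place a provably infeasible job set on one machine (any superset is infeasible as well), and the master is solved to optimality, so the first assignment passing every one-machine check is optimal, in line with the argument above.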

The above hybrid decomposition can also be imple- 
mented in a branch-and-cut framework (B&C), where 
the IP master problem is not solved to optimality before 
adding cuts. In the B&C framework, cuts are added ei- 
ther at a (possibly suboptimal) integer solution to the 
master problem or a partially feasible node, i.e. a node 
with integer assignments for a subset of machines. 

Bockmayr and Pisaruk [11] proposed a hybrid 
branch-and-cut scheme where the master problem is 
solved using an IP solver and the CP solver is called 
at a node of the tree, in order to generate integer cuts. 
The advantage of this method is that the IP model is 
not solved from scratch every time an integer solu- 
tion (i.e. an assignment) is found. Furthermore, the au- 
thors were able to obtain cuts that are stronger than 
the ones proposed by Jain and Grossmann [28]. They 
were also able to generate cuts from fractional LP solutions of the IP model. The computational performance of the proposed hybrid branch-and-cut approach is better than that of the iterative IP/CP approach. Vazacopoulos and Verma [49] proposed disjunctive and preemptive cuts to forbid infeasible assignments a priori, and developed two hybrid MIP/CP algorithms for the MMASP. Sadykov and Wolsey [43] studied several
hybrid approaches and developed two IP/CP hybrid 
schemes that appear to be better than those previously 
proposed. In the first, the authors were able to develop 
two classes of tightening inequalities, in the space of $x_{ij}$
variables, which exclude many infeasible assignments 
and thus lead to smaller trees. The tightening inequal- 
ities are knapsack constraints, similar to constraint (7), 
but for subsets of set I. They also proposed a column 
generation algorithm using the tightening inequalities. 
While the MMASP has been extensively studied 
due to its simple structure, hybrid schemes have also 
been developed for more complex scheduling prob- 
lems. Harjunkoski and Grossmann [20], Timpe [46] 
and Constantino [14] presented hybrid schemes for 
complex chemical plants. Maravelias and Gross- 
mann [33,34] proposed a general framework for in- 
tegrating Mathematical and Constraint Programming 
methods for the solution of scheduling problems, while 
Maravelias [35] proposed the integration of MIP meth- 
ods with heuristic algorithms. Hybrid methods that 
combine Mathematical and Constraint Programming 
have also been applied to transportation, inventory 
management and resource allocation problems. 


Conclusions 


While the computational efficiency of MIP/CP meth- 
ods varies significantly, there is evidence that for some 
classes of problems they outperform existing methods. 
In general, if the structure of the problem at hand is 
exploited by efficient preprocessing and the genera- 
tion of strong cuts, it is expected that hybrid schemes 
will be more effective because they combine the 
complementary strengths of two solution techniques. 
The computational performance of hybrid methods relies on (i) the quality of the decomposition, (ii) the solution efficiency of the two subproblems, and (iii) the number of subproblems that must be solved to prove optimality. Ideally, the original problem should be decomposed/reformulated into a tight MIP subproblem that is easily solved and yields potentially good feasible solutions, and a feasibility CP subproblem that is used to check feasibility and generate cuts.

[Flowchart: Solve IP master problem, Eqs. (1), (2), (7), (14), yielding Z*, x* → Define subproblem via Eqs. (8)-(10) → Solve CP subproblem, i.e. |J| one-machine problems, Eq. (12) → Subproblem feasible? Yes: optimal solution. No: add integer cut(s), Eq. (14), and re-solve the master problem]
Mixed Integer Programming/Constraint Programming Hybrid Methods, Figure 1
Iterative hybrid IP/CP scheme of Jain and Grossmann [28]


In particular, MIP/CP methods have been shown to be very effective in tackling scheduling problems where both assignment and sequencing decisions have to be made. The key idea in these methods is the decomposition of the original problem into two subproblems: Mathematical Programming is used for the assignment of tasks to resources, while Constraint Programming is used for the sequencing of tasks on resources.
Abstract


This chapter presents model based controllers for two drug delivery systems: (i) surgery under anesthesia and (ii) insulin delivery for type 1 diabetes. For anesthesia, a compartmental model is presented and then used for deriving a model predictive controller for the simultaneous control of mean arterial pressure (MAP), cardiac output (CO) and hypnosis. For type 1 diabetes, parametric control techniques are used to obtain the insulin delivery rate as an explicit function of the state of the patient. This reduces the implementation of the model based controller to function evaluations that can be carried out on portable computational hardware.


Introduction 


Drug delivery systems aim to provide effective therapy 
by minimizing side effects, reducing deviations from 
the desired state of the patient and increasing patient 
compliance and safety. Automation of a drug delivery 
system relies on the mathematical model of the pa- 
tient that can take into account the pharmacokinetic 
and pharmacodynamic effects of the drugs on various 
organs of the body. To reduce the complexity of the mathematical model, some of the organs are lumped and then represented as interconnected compartments.
This reduction in complexity is quite important es- 
pecially for models that are used for controlling the 
amount of drugs to be infused. In this chapter, mod- 
els and advanced model based controllers for two drug 
delivery systems are presented. In Sect. “Surgery Un- 
der Anesthesia”, the first system which is concerned 
with the delivery of anesthetics for patients undergoing 
surgery is discussed. A compartmental model is presented that considers a choice of three drugs, isoflurane, dopamine and sodium nitroprusside, and therefore allows simultaneous control of mean arterial pressure, cardiac output and hypnosis. This model is then used for designing a model predictive controller, and the performance of the controller is tested for its set-point tracking capabilities. In Sect. “Blood Glucose Control for Type 1 Diabetes”, a model based parametric controller for the regulation of the blood glucose concentration in people with type 1 diabetes is derived. The key advantage of this controller is that the optimal drug infusion rate is obtained as an explicit function of the state of the patient and therefore requires only simple function evaluations for its implementation. Concluding remarks are presented in Sect. “Concluding Remarks”.


Surgery Under Anesthesia 


Anesthesia is defined as the absence or loss of sensation. In order to provide safe and adequate anesthesia, the anesthesiologist must guarantee analgesia, provide hypnosis and muscle relaxation, and maintain the vital functions of the patient. Anesthesiologists administer anesthetics and monitor a wide range of vital functions, such as mean arterial pressure (MAP), heart rate and cardiac output (CO). These vital functions need to be monitored and maintained within tolerable operating ranges by infusing various drugs and/or intravenous fluids, as shown in Fig. 1. Automation of anesthesia is desirable as it will give the anesthesiologist more time and flexibility to focus on critical issues and to monitor the conditions that cannot be easily measured, and will improve overall patient safety. Also, the cost of the drugs will be reduced and less time will be spent in the postoperative care unit. There is a significant amount of research in the area of developing models and control strategies for anesthesia [10,14,15,17].

Model Based Control for Drug Delivery Systems, Figure 1
Anesthesia control system (adapted from [6])

Gentilini et al. [6] proposed a model for the regulation of MAP and hypnosis with isoflurane. It was observed that controlling both MAP and hypnosis simultaneously with isoflurane was difficult. Yu et al. [16] proposed a model for regulating MAP and CO using dopamine (DP) and sodium nitroprusside (SNP), but the control of hypnosis was not considered.

In the next section, a compartmental model is pre- 
sented, which allows the simultaneous regulation of 
the MAP and the unconsciousness of the patients. The 
model is characterized by: (i) pharmacokinetics for the uptake and distribution of the drugs, (ii) pharmacodynamics, which describes the effect of the drugs on the vital functions, and (iii) baroreflex for the reaction of the central nervous system to changes in the blood pressure. The model involves a choice of three drugs, isoflurane, DP and SNP. This combination of drugs allows simultaneous regulation of MAP and hypnosis.


Modeling Anesthesia 


The model is based on the distribution of isoflurane in 
the human body [15]. It consists of five compartments 
organized as shown in Fig. 2. 

The compartments 1-5 represent lungs, vessel rich 
organs (e.g. liver), muscles, other organs and tissues 
and fat tissues respectively. 

The distribution of the drugs occurs from the central compartment to the peripheral compartments by the arteries and from the peripheral to the central compartment by the veins. The first compartment in Fig. 2 is the central compartment, and the heart can be considered to belong to it, whereas compartments 2-5 are the peripheral compartments.

Model Based Control for Drug Delivery Systems, Figure 2
Compartmental model


Pharmacokinetic Modeling The uptake of isoflurane in the central compartment via the respiratory system is modeled as:

$$V\frac{dC_{insp}}{dt}=Q_{in}C_{in}-\left(Q_{in}-\Delta Q\right)C_{insp}-f_R\left(V_T-\Delta\right)\left(C_{insp}-C_{out}\right),$$


where $C_{insp}$ is the concentration of isoflurane inspired by the patient (g/ml), $C_{in}$ is the concentration of isoflurane in the inlet stream (g/ml), $C_{out}$ is the concentration of isoflurane in the outlet stream (g/ml), $Q_{in}$ is the inlet flow rate (ml/min), $\Delta Q$ represents the losses (ml/min), $V$ is the volume of the respiratory system (l), $f_R$ is the respiratory frequency (1/min), $V_T$ is the tidal volume (l) and $\Delta$ is the physiological dead space (ml). For the central compartment, the concentration of isoflurane is given by:

$$V_1\frac{dC_1}{dt}=\sum_{i=2}^{5}Q_i\left(\frac{C_i}{R_i}-C_1\right)+f_R\left(V_T-\Delta\right)\left(C_{insp}-C_1\right),$$

where $C_i$ is the concentration of the drug in compartment $i$ (g/ml), $R_i$ is the partition coefficient between blood and tissues in compartment $i$, and $Q_i$ is the blood flow in compartment $i$ (ml/min). The concentration of DP and SNP in the central compartment is modeled as follows:


where $C_{inf}$ is the concentration of the drug infused (g/min), $V_i$ is the volume of compartment $i$ (ml) and $T_{1/2}$ is the half-life of the drug (min). Isoflurane is eliminated by exhalation and by metabolism in the liver (the 2nd compartment) as follows:


where $k_2$ is the rate of elimination of isoflurane in the 2nd compartment (min$^{-1}$). The distribution of isoflurane in compartments 3 to 5 is given by:


The natural decay of DP and SNP in the body, for compartments 2 to 5, is given by:

$$V_i\frac{dC_i}{dt}=Q_i\left(C_1-C_i\right)-\frac{1}{T_{1/2}}C_iV_i,\quad i=2,\dots,5$$


Pharmacodynamic Modeling The effect of DP and SNP on two of the heart's characteristic parameters, maximum elastance ($E_{max}$) and systemic resistance ($R_{sys}$), is given by:

$$E_{max}=E_{max,0}\left(1+Eff_{DP-E_{max}}\right),$$
$$R_{sys}=R_{sys,0}\left(1-Eff_{DP-R_{sys}}-Eff_{SNP-R_{sys}}\right),$$
$$\frac{dEff}{dt}=k_1C^N\left(Eff_{max}-Eff\right)-k_2\,Eff,$$


where $Eff$ is the measure of the effect of a drug on the parameters of interest, $R_{sys}$ is the systemic resistance (mmHg/(ml/min)), $E_{max}$ is the maximum elastance (mmHg/ml), $E_{max,0}$ is the nominal maximum elastance, $R_{sys,0}$ is the nominal systemic resistance, $Eff_{DP-E_{max}}$ is the effect of DP on $E_{max}$, $Eff_{DP-R_{sys}}$ is the effect of DP on $R_{sys}$, $Eff_{SNP-R_{sys}}$ is the effect of SNP on $R_{sys}$, $k_1$ and $k_2$ are rate constants and $N$ is the non-linearity constant. MAP can then be expressed as a function of $E_{max}$ and $R_{sys}$ as:


$$\frac{MAP^2}{R_{sys}^2}+2K^2\,MAP-2K^2V_{lv}E_{max}=0,\qquad K=\frac{A_{aorta}A_{lv}}{\sqrt{\rho\left(A_{lv}^2-A_{aorta}^2\right)}}$$

where MAP is the mean arterial pressure (mmHg), $A_{aorta}$ is the cross-sectional area of the aorta (cm²), $A_{lv}$ is the cross-sectional area of the left ventricle (cm²), $V_{lv}$ is the mean volume of the left ventricle (ml) and $\rho$ is the blood density (g/ml). Isoflurane affects MAP as follows:

$$MAP=\frac{Q}{\sum_{i=2}^{5}g_{i,0}\left(1+b_iC_i\right)},$$


where $g_{i,0}$ are the baseline conductivities (ml/(min·mmHg)) and $b_i$ is the variation coefficient of conductivity (ml/g). There is experimental evidence that a transportation delay exists between the lungs and the site of effect of isoflurane on the unconsciousness of the patient. In order to model this, an effect compartment is linked to the central compartment. The concentration of isoflurane within this compartment is related to that in the central compartment by:


where $C_e$ is the concentration of isoflurane in the effect compartment (g/ml), and $k_{e0}$ is the kinetic constant of the effect compartment (min$^{-1}$). The action of isoflurane can then be expressed as follows:


$$\Delta BIS=\Delta BIS_{max}\frac{C_e^{\gamma}}{C_e^{\gamma}+EC_{50}^{\gamma}},\qquad\Delta BIS=BIS-BIS_0,\qquad\Delta BIS_{max}=BIS_{max}-BIS_0,$$


where $BIS_0$ is the baseline value of BIS (assumed to be 100), $BIS_{max}$ is the value of BIS at maximum drug effect (assumed to be 0), $EC_{50}$ is the patient's sensitivity to the drug and $\gamma$ is the measure of the degree of non-linearity.
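The sigmoid (Hill-type) relation between the effect-site concentration and BIS can be evaluated directly. In the Python sketch below the defaults $BIS_0 = 100$ and $BIS_{max} = 0$ follow the text, while the $EC_{50}$ and $\gamma$ values used in any call are placeholders rather than patient data; the function name is hypothetical.

```python
def bis_from_effect_concentration(c_e, ec50, gamma, bis0=100.0, bis_max=0.0):
    """BIS from the effect-site isoflurane concentration C_e via the sigmoid relation.

    bis0: baseline BIS (100 in the text); bis_max: BIS at maximum drug effect (0 in the text).
    """
    if c_e < 0 or ec50 <= 0:
        raise ValueError("C_e must be non-negative and EC50 positive")
    delta_bis_max = bis_max - bis0                       # Delta BIS_max = BIS_max - BIS_0
    fraction = c_e**gamma / (c_e**gamma + ec50**gamma)   # Hill term, between 0 and 1
    return bis0 + delta_bis_max * fraction               # BIS = BIS_0 + Delta BIS


if __name__ == "__main__":
    # Placeholder parameters: EC50 = 1.0 and gamma = 2.0 in arbitrary units
    for c_e in (0.0, 0.5, 1.0, 2.0):
        print(c_e, bis_from_effect_concentration(c_e, 1.0, 2.0))
```

At $C_e = EC_{50}$ half of the maximum BIS change is reached, and a larger $\gamma$ makes the transition around $EC_{50}$ steeper; this non-linearity is one reason the controller design works with a linearized model.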


Baroreflex Baroreflex is obtained from a set of transfer functions relating the mean arterial pressure to the maximum elastance and the systemic resistance, and is given by:

$$bf_c=\frac{e^{c(MAP-MAP_0)}}{1+e^{c(MAP-MAP_0)}},$$

where $c$ is an empirical constant (1/mmHg, so that the exponent is dimensionless).
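The baroreflex factor is a logistic function of the deviation of MAP from $MAP_0$. The Python sketch below evaluates it in a numerically stable way; the values of $c$ and $MAP_0$ passed to it are hypothetical, and $c$ is assumed to have units of 1/mmHg so that the exponent is dimensionless.

```python
import math


def baroreflex_factor(map_mmhg, map0_mmhg, c):
    """Logistic baroreflex factor bf_c = e^z / (1 + e^z), with z = c (MAP - MAP_0)."""
    z = c * (map_mmhg - map0_mmhg)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))   # same value, avoids overflow for large z
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    # Hypothetical values: MAP_0 = 70 mmHg, c = 0.1 1/mmHg
    for map_value in (50.0, 70.0, 90.0):
        print(map_value, baroreflex_factor(map_value, 70.0, 0.1))
```

The factor equals 0.5 at $MAP = MAP_0$ and saturates at 0 and 1 for large deviations, which bounds the reflex response.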


Control of Anesthesia 


The model presented in the previous section was validated by carrying out a number of dynamic simulations for different drug dosages and disturbances using gPROMS [7]. For designing controllers, this model was linearized at the nominal values of the inputs (0.6% vol. of isoflurane, 2 μg/kg/min of DP and 4 μg/kg/min of SNP) and outputs (57.38 mmHg of MAP, 61.1 BIS and 1.21 l/min of CO) to obtain a state-space model of the following form:


$$x_{t+1} = A x_t + B u_t, \qquad y_t = C x_t + D u_t, \qquad (1)$$

subject to the following constraints:

$$x_{\min} \le x_t \le x_{\max}, \qquad y_{\min} \le y_t \le y_{\max}, \qquad u_{\min} \le u_t \le u_{\max}, \qquad (2)$$


where x_t ∈ ℝ^n, y_t ∈ ℝ^l and u_t ∈ ℝ^m are the state, output and input vectors, respectively, and the subscripts min and max denote lower and upper bounds, respectively.
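To make the roles of (1) and (2) concrete, the sketch below propagates the linear model for a given input sequence and checks the bounds componentwise. It is a plain-Python illustration (nested lists instead of a linear-algebra library); the function names and the scalar test system are hypothetical and unrelated to the 23-state anesthesia model.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(u: Sequence[float], v: Sequence[float]) -> List[float]:
    return [a + b for a, b in zip(u, v)]


def simulate(A: Matrix, B: Matrix, C: Matrix, D: Matrix,
             x0: Sequence[float], inputs: Sequence[Sequence[float]]
             ) -> Tuple[List[List[float]], List[List[float]]]:
    """Propagate x_{t+1} = A x_t + B u_t and y_t = C x_t + D u_t, as in model (1)."""
    x = list(x0)
    states, outputs = [list(x0)], []
    for u in inputs:
        outputs.append(add(matvec(C, x), matvec(D, u)))
        x = add(matvec(A, x), matvec(B, u))
        states.append(x)
    return states, outputs


def within_bounds(v: Sequence[float], lower: Sequence[float],
                  upper: Sequence[float]) -> bool:
    """Componentwise check lower <= v <= upper, as in the constraints (2)."""
    return all(lo <= vi <= hi for vi, lo, hi in zip(v, lower, upper))


states, outputs = simulate([[0.5]], [[1.0]], [[1.0]], [[0.0]], [0.0], [[1.0]] * 3)
print(states)                                    # [[0.0], [1.0], [1.5], [1.75]]
print(within_bounds(states[-1], [0.0], [2.0]))   # True
```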


The model predictive control (MPC) [5] problem can then be posed as the following optimization problem:


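$$
\begin{aligned}
\min_{U}\; J(U, x(t)) ={}& x_{t+N_y|t}^{T} P\, x_{t+N_y|t} + \sum_{k=0}^{N_y-1} \left( x_{t+k|t}^{T} Q\, x_{t+k|t} + u_{t+k}^{T} R\, u_{t+k} \right) \\
\text{s.t.}\;& x_{\min} \le x_{t+k|t} \le x_{\max}, && k = 1, \dots, N_c \\
& y_{\min} \le y_{t+k|t} \le y_{\max}, && k = 1, \dots, N_c \\
& u_{\min} \le u_{t+k} \le u_{\max}, && k = 0, 1, \dots, N_c \\
& x_{t+k+1|t} = A x_{t+k|t} + B u_{t+k}, && k \ge 0 \\
& y_{t+k|t} = C x_{t+k|t} + D u_{t+k}, && k \ge 0 \\
& u_{t+k} = K x_{t+k|t}, && N_u \le k < N_y
\end{aligned}
\qquad (3)
$$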


where U = [u_t^T, ..., u_{t+N_u−1}^T]^T; Q and R are constant, symmetric and positive definite matrices; P is given by the solution of the Riccati or Lyapunov equation; N_y, N_u and N_c are the prediction, control and constraint horizons, respectively; and the superscript T denotes the transpose. Problem (3) is solved at each time t for the current state x_t, which yields the predicted states x_{t+1|t}, ..., x_{t+N_y|t} at times t+1, ..., t+N_y and the corresponding control actions u_t, ..., u_{t+N_u−1}.


Results 


The model for anesthesia consists of 23 states, 3 outputs and 3 inputs. This state-space form of the model is then used to design a model predictive controller with the MATLAB Model Predictive Control Toolbox™ [11]. For designing the MPC controller, the input constraints 0 ≤ DP ≤ 7 μg/kg/min, 0 ≤ SNP ≤ 10 μg/kg/min and 0 ≤ isoflurane ≤ 5 % vol., and the output constraints 40 ≤ MAP ≤ 150 mmHg, 40 ≤ BIS ≤ 65 and 1 ≤ CO ≤ 6.5 l/min are used. A prediction horizon of 5, a control horizon of 3 and a sampling time of 0.5 minutes are considered. A set point of [20, −10, 1]^T, expressed as a deviation from the nominal values of the output variables, is given, and the performance of the controller is shown in Fig. 3. The MPC tracks the set point quite well. The MPC was also tested with the model reduced to 15 states, and its performance was again very good. From these results it can be inferred that model based control technology provides a promising platform for the automation of anesthesia.
Model Based Control for Drug Delivery Systems, Figure 3
MPC performance for anesthesia 


Note that MPC solves a quadratic program at regular time intervals. In the next section, a parametric programming approach for the control of blood glucose in type 1 diabetes is presented that does not require repeatedly solving quadratic programs.


Blood Glucose Control for Type 1 Diabetes 


Diabetes is a disease that affects the body's ability to regulate glucose. In type 1 diabetes, the pancreas produces insufficient insulin, and exogenous insulin must be infused at an appropriate rate to keep blood sugar levels within the range of 60–120 mg/dl [2]. If insulin is supplied in excess, the blood glucose level can fall well below normal (< 60 mg/dl), a condition known as hypoglycemia. If, on the other hand, insulin is not supplied in sufficient amounts, the blood glucose level rises above normal (> 120 mg/dl), a condition known as hyperglycemia. Both hypo- and hyperglycemia can be harmful to an individual's health. Hence, it is very important to keep the blood glucose level within a reasonable range [9,12]. In the following sections, advanced model based controllers for regulating the blood glucose concentration in type 1 diabetes are presented.
Model Based Control for Drug Delivery Systems, Figure 4
Schematic representation of the Bergman model (diagram labels: plasma glucose G(t); exercise, meals D(t))


Model for Type 1 Diabetes 


The Bergman model [1] is used in this study; it is a 'minimal' model comprising three equations that describe the dynamics of the system. A schematic representation of the model is shown in Fig. 4. The modeling equations are:

$$\frac{dG}{dt} = -P_1 G - X\,(G + G_B) + D(t) \qquad (4)$$

$$\frac{dI}{dt} = -n\,(I + I_B) + \frac{U(t)}{V_1} \qquad (5)$$

$$\frac{dX}{dt} = -P_2 X + P_3 I \qquad (6)$$


The states in this model are: G, the plasma glucose concentration (mg/dl) relative to its basal value; I, the plasma insulin concentration (mU/l) relative to its basal value; and X, which is proportional to I in the remote compartment (min⁻¹). The inputs are: D(t), the meal glucose disturbance (mg/dl/min); U(t), the manipulated insulin infusion rate (mU/min); and G_B, I_B, the nominal values of the glucose and insulin concentrations (81 mg/dl; 15 mU/l). The parameter values for a type 1 diabetic patient are: P_1 = 0 min⁻¹, P_2 = 0.025 min⁻¹, P_3 = 0.000013 l/(mU·min²), V_1 = 12 l and n = 5/54 min⁻¹ [4].
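Equations (4)–(6) with the parameter values above can be coded directly. The sketch below (plain Python; the function and constant names are hypothetical) evaluates the right-hand side, computes the infusion rate that keeps the insulin deviation at zero, U_s = n I_B V_1, and takes one explicit Euler step. The Euler step is included only for illustration.

```python
# Parameter values quoted in the text (type 1 diabetic patient)
P1, P2, P3 = 0.0, 0.025, 0.000013  # min^-1, min^-1, l/(mU*min^2)
V1 = 12.0                          # l
N_RATE = 5.0 / 54.0                # the text's n, min^-1
G_B, I_B = 81.0, 15.0              # basal glucose (mg/dl) and insulin (mU/l)


def bergman_rhs(G: float, I: float, X: float, U: float, D: float):
    """Time derivatives (4)-(6); G and I are deviations from the basal values."""
    dG = -P1 * G - X * (G + G_B) + D
    dI = -N_RATE * (I + I_B) + U / V1
    dX = -P2 * X + P3 * I
    return dG, dI, dX


# Infusion that holds I at its basal value: dI/dt = 0 with I = 0 gives U = n * I_B * V1.
U_S = N_RATE * I_B * V1


def euler_step(state, U: float, D: float, dt: float):
    """One explicit Euler step of (4)-(6) (illustration only)."""
    dG, dI, dX = bergman_rhs(*state, U, D)
    return tuple(s + dt * d for s, d in zip(state, (dG, dI, dX)))


print(round(U_S, 5))  # 16.66667
```

The printed value matches the steady-state infusion U_s = 16.66667 mU/min used for the linearization, and at G = I = X = 0 with U = U_s all three derivatives vanish.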

The model (4)–(6) is linearized about the steady-state values G_s = 81 mg/dl, I_s = 15 mU/l, X_s = 0 and U_s = 16.66667 mU/min to obtain a state-space model of the form x_{t+1} = A x_t + B u_t + B_d d_t, where the term d_t represents the input disturbance (glucose meal). The sampling time is 5 minutes, which is reasonable for current glucose sensor technology. The discrete state-space matrices A, B, C and B_d are as follows:


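$$
A = \begin{bmatrix} 1 & -0.000604 & -21.1506 \\ 0 & 0.6294 & 0 \\ 0 & 0.00004875 & 0.8825 \end{bmatrix}, \quad
B = \begin{bmatrix} -0.000088 \\ 0.3335 \\ 0.0000112 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}, \quad
B_d = \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix}
$$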


The constraints imposed are 60 ≤ G + G_s ≤ 180 and 0 ≤ U + U_s ≤ 100.
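Two entries of A and one entry of B can be cross-checked by hand. Assuming a zero-order hold on U over the 5-minute sampling interval (an assumption; the text does not name the discretization method), the linearized insulin equation dI/dt = −nI + (U − U_s)/V_1 is decoupled from G and X, so its discrete coefficients are e^(−nT) and (1 − e^(−nT))/(nV_1), while X contributes the diagonal entry e^(−P_2 T). The sketch below reproduces 0.6294, 0.3335 and 0.8825.

```python
import math

T = 5.0                # sampling time (min), as in the text
N_RATE = 5.0 / 54.0    # n (min^-1)
P2 = 0.025             # P_2 (min^-1)
V1 = 12.0              # V_1 (l)

# Zero-order-hold discretization of the scalar equations (assumed method)
a_II = math.exp(-N_RATE * T)                          # A, row 2 / column 2
b_I = (1.0 - math.exp(-N_RATE * T)) / (N_RATE * V1)   # B, row 2
a_XX = math.exp(-P2 * T)                              # A, row 3 / column 3

print(f"{a_II:.4f} {b_I:.4f} {a_XX:.4f}")  # 0.6294 0.3335 0.8825
```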


Parametric Controller 


Parametric programming can be used in the MPC framework to obtain U as a function of x_t, by treating U as the optimization variables and x_t as parameters, as described next [3,13]. For simplicity of presentation, assume that N_y = N_u = N_c; the theory presented is, however, also valid when N_y, N_u and N_c are not equal. The equalities in formulation (3) are eliminated by making the following substitution:


$$x_{t+k|t} = A^{k} x_t + \sum_{j=0}^{k-1} A^{j} B\, u_{t+k-1-j} \qquad (7)$$


to obtain the following Quadratic Program (QP): 


$$\min_{U}\; \frac{1}{2} U^{T} H U + x_t^{T} F U + \frac{1}{2} x_t^{T} Y x_t \quad \text{s.t.} \quad G U \le W + E x_t, \qquad (8)$$


where U = [u_t^T, ..., u_{t+N_u−1}^T]^T ∈ ℝ^s is the vector of optimization variables, s = m N_u, H is a constant, symmetric and positive definite matrix, and H, F, Y, G, W, E are obtained from Q, R and (1) and (2).
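For a scalar system the construction of H, F and Y from (7) can be written out in a few lines. The sketch below is a hypothetical illustration, assuming N_y = N_u = N, scalars a, b, q, r, p in place of A, B, Q, R, P, and no constraints or terminal controller; it builds the condensed matrices and checks that ½UᵀHU + x_tᵀFU + ½x_tᵀYx_t reproduces the cost of (3) obtained by simulating the model step by step. Without active constraints the minimizer is U* = −H⁻¹Fᵀx_t, which corresponds to z = 0 in the change of variables used for the mp-QP.

```python
def condensed_qp(a, b, q, r, p, N):
    """Scalar version of (7)-(8): J = 0.5*U'HU + x*F.U + 0.5*Y*x**2 (hypothetical helper)."""
    phi = [a**k for k in range(N + 1)]                       # A^k x_t term of (7)
    gamma = [[a**(k - 1 - j) * b if j < k else 0.0 for j in range(N)]
             for k in range(N + 1)]                          # effect of u_j on x_k
    w = [q] * N + [p]                                        # Q on stages, P at the end
    H = [[2.0 * (sum(w[k] * gamma[k][i] * gamma[k][j] for k in range(N + 1))
                 + (r if i == j else 0.0)) for j in range(N)] for i in range(N)]
    F = [2.0 * sum(w[k] * phi[k] * gamma[k][j] for k in range(N + 1)) for j in range(N)]
    Y = 2.0 * sum(w[k] * phi[k] ** 2 for k in range(N + 1))
    return H, F, Y


def condensed_cost(H, F, Y, x, U):
    """Evaluate the condensed objective of (8) for a scalar state x."""
    n = len(U)
    quad = sum(U[i] * H[i][j] * U[j] for i in range(n) for j in range(n))
    lin = x * sum(f * u for f, u in zip(F, U))
    return 0.5 * quad + lin + 0.5 * Y * x * x


def simulated_cost(a, b, q, r, p, x, U):
    """Same cost evaluated by stepping x_{k+1} = a x_k + b u_k."""
    J = 0.0
    for u in U:
        J += q * x * x + r * u * u
        x = a * x + b * u
    return J + p * x * x


H, F, Y = condensed_qp(1.0, 1.0, 1.0, 1.0, 1.0, 1)
print(H, F, Y)                  # [[4.0]] [2.0] 4.0
print(-F[0] * 1.0 / H[0][0])    # -0.5: unconstrained optimal u_t for x_t = 1
```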

The QP problem in (8) can now be reformulated as 
a multi-parametric quadratic program (mp-QP): 


$$V_z(x) = \min_{z}\; \frac{1}{2} z^{T} H z \quad \text{s.t.} \quad G z \le W + S x_t, \qquad (9)$$

where z = U + H^{-1} F^{T} x_t, z ∈ ℝ^s, and S = E + G H^{-1} F^{T}.


This mp-QP is solved by treating z as the vector of optimization variables and x_t as the vector of parameters to obtain z as an explicit function of x_t. U is then obtained as an explicit function of x_t by using U = z − H^{-1} F^{T} x_t.


Results 


A prediction horizon N_y = 5 and a Q/R ratio of 1000 are considered for deriving the control law; this results in a partition of the state space into 31 polyhedral regions. These regions are known as critical regions (CRs). Associated with each CR is a control law that is an affine function of the state of the patient. For example, one of the CRs is given by the following state inequalities:


—-5<s7=25 

$$
\begin{aligned}
0.0478972\,G - 0.0002712\,I - X &\le 0.104055 \\
0.0261386\,G - 0.0004641\,I - X &\le 0.0576751 \\
-0.00808846\,G + 0.00119685\,I + X &\le 0 \\
-0.00660123\,G + 0.00130239\,I + X &\le 0 \\
0.00609435\,G - 0.00134362\,I - X &\le 0
\end{aligned}
\qquad (10)
$$


where the insulin infusion rate for the next five time intervals, as a function of the state variables, is given as follows:

$$
\begin{aligned}
U(1) &= 30.139\,G - 0.44597\,I - 3726.2\,X \\
U(2) &= 24.874\,G - 0.40326\,I - 3280.4\,X \\
U(3) &= 20.16\,G - 0.35946\,I - 2842.8\,X \\
U(4) &= 16.002\,G - 0.31571\,I - 2424.1\,X \\
U(5) &= 0
\end{aligned}
\qquad (11)
$$


The complete partition of the state space into CRs for G = 80 mg/dl is shown in Fig. 5. The performance of the parametric controller for a 50 mg meal disturbance [8] is shown in Figs. 6 and 7. The corresponding trajectory of the state variables is also shown in Fig. 5.

A model based parametric controller of the form given in (10) and (11) can be stored and implemented on simple computational hardware and can therefore provide effective therapy at low online computational cost.
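Online, the explicit controller only has to locate the critical region that contains the current state and evaluate the associated affine law. The sketch below stores the region (10) and the law (11) as coefficient tables; the data layout and function names are hypothetical. Two assumptions are made: the first inequality of (10) is not legible in the source and is left out, and G, I, X and U are taken as deviations from the steady state, as in the linearized model. The other 30 regions are not listed in the text, so a state outside (10) simply returns None here.

```python
from typing import List, Optional, Tuple

# Region (10): rows (coef_G, coef_I, coef_X, rhs) meaning coef . (G, I, X) <= rhs.
# The first inequality of (10) is omitted (not legible in the source).
REGION_10 = [
    (0.0478972, -0.0002712, -1.0, 0.104055),
    (0.0261386, -0.0004641, -1.0, 0.0576751),
    (-0.00808846, 0.00119685, 1.0, 0.0),
    (-0.00660123, 0.00130239, 1.0, 0.0),
    (0.00609435, -0.00134362, -1.0, 0.0),
]
# Law (11): gains (k_G, k_I, k_X) for U(1), ..., U(5)
LAW_11 = [
    (30.139, -0.44597, -3726.2),
    (24.874, -0.40326, -3280.4),
    (20.16, -0.35946, -2842.8),
    (16.002, -0.31571, -2424.1),
    (0.0, 0.0, 0.0),
]
CRITICAL_REGIONS = [(REGION_10, LAW_11)]  # only the region quoted in the text


def in_region(region, state: Tuple[float, float, float], tol: float = 1e-12) -> bool:
    G, I, X = state
    return all(a * G + b * I + c * X <= rhs + tol for a, b, c, rhs in region)


def explicit_control(state: Tuple[float, float, float]) -> Optional[List[float]]:
    """Return U(1..5) from the first critical region containing the state."""
    for region, law in CRITICAL_REGIONS:
        if in_region(region, state):
            G, I, X = state
            return [kG * G + kI * I + kX * X for kG, kI, kX in law]
    return None  # state lies in a region not listed here


print(explicit_control((0.0, 0.0, 0.0)))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Only comparisons and multiply-adds are needed online, which is what makes this form attractive for simple hardware.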
Model Based Control for Drug Delivery Systems, Figure 5
Critical regions for type 1 diabetes
Model Based Control for Drug Delivery Systems, Figure 6
Glucose concentration vs. time 


Concluding Remarks 


Automation of drug delivery systems aims at reducing patient inconvenience by providing better and personalized healthcare. The automation can be achieved by developing detailed models and by deriving advanced controllers that take into account the model as well as the constraints on state and control variables. In this chapter, a compartmental model incorporating pharmacokinetic and pharmacodynamic aspects for the delivery of anesthetic agents has been presented. This model was then used to derive a model predictive controller. For type 1 diabetes, the implementation of advanced model based controllers on simple computational hardware was demonstrated by deriving the insulin delivery rate as an explicit function of the state of the patient. The developments presented in this chapter highlight the importance of modeling and control techniques for biomedical systems.


Model Based Control for Drug Delivery Systems, Figure 7
Insulin infusion vs. time


See also 


> Nondifferentiable Optimization: Parametric 
Programming 
Introduction


We define difficult optimization problems as problems that cannot be solved to optimality, or to any guaranteed bound, by any standard solver within a reasonable time limit. The problem class we have in mind is mixed-integer programming (MIP). Optimization, and especially MIP, is often appropriate and frequently used to model real-world optimization problems. Since the beginnings of the field in the 1950s, models have become larger and more complicated.

A reasonable general framework is the class of mixed-integer nonlinear programming (MINLP) problems. They are specified by the augmented vector x_⊕^T = x^T ⊕ y^T, established by the vectors x^T = (x_1, ..., x_{n_c}) and y^T = (y_1, ..., y_{n_d}) of n_c continuous and n_d discrete variables, an objective function f(x, y), n_e equality constraints h(x, y), and n_i inequality constraints g(x, y). The problem


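$$
\min f(x, y) \quad \text{s.t.} \quad
\begin{cases}
h(x, y) = 0, & h: X \times U \to \mathbb{R}^{n_e}, \quad x \in X \subseteq \mathbb{R}^{n_c}, \\
g(x, y) \ge 0, & g: X \times U \to \mathbb{R}^{n_i}, \quad y \in U \subseteq \mathbb{Z}^{n_d},
\end{cases}
\qquad (1)
$$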


is called a mixed-integer nonlinear programming (MINLP) problem if at least one of the functions f(x, y), g(x, y), or h(x, y) is nonlinear. The vector inequality g(x, y) ≥ 0 is to be read componentwise. Any vector x_⊕ satisfying the constraints of (1) is called a feasible point of (1). Any feasible point whose objective function value is less than or equal to that of all other feasible points is called an optimal solution. From this definition it follows that the problem might not have a unique optimal solution.

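Depending on the functions f(x, y), g(x, y), and h(x, y) in (1), we get the following structured problems:

Type of optimization             | f(x, y)                  | h(x, y)
Linear programming (LP)          | c^T x                    | Ax − b
Quadratic programming (QP)       | x^T Q x + c^T x          | Ax − b
Mixed-integer LP (MILP)          | c^T x_⊕                  | Ax_⊕ − b
Mixed-integer QP (MIQP)          | x_⊕^T Q x_⊕ + c^T x_⊕    | Ax_⊕ − b
Mixed-integer NLP (MINLP)        | general (nonlinear)      | general (nonlinear)

with a matrix A of m rows and n columns, i.e., A ∈ M(m × n, ℝ), b ∈ ℝ^m, c ∈ ℝ^n, and n = n_c + n_d.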
Real-world problems lead much more frequently to 
LP and MILP than to NLP or MINLP problems. QP 
refers to quadratic programming problems. They have 
a quadratic objective function but only linear con- 
straints. QP and MIQP problems often occur in appli- 
cations of the financial services industry. 

While LP problems as described in [31] or [1] can be solved relatively easily (the number of iterations, and thus the effort to solve LP problems with m constraints, grows approximately linearly in m), the computational complexity of MILP and MINLP grows exponentially with n_d but depends strongly on the structure of the problem. Numerical methods to solve NLP problems work iteratively, and the computational problems are related to questions of convergence, getting stuck in bad local optima, and the availability of good initial solutions. Global optimization techniques can be applied to both NLP and MINLP problems, and their complexity increases exponentially in the number of all variables entering nonlinearly into the model.

While the word optimization, in nontechnical or colloquial language, is often used in the sense of improving, the mathematical optimization community sticks to the original meaning of the word, related to finding the best value either globally or at least in a local neighborhood. For an algorithm to be considered a (mathematical, strict, or exact) optimization algorithm in the mathematical optimization community, there is consensus that it must compute feasible points proven globally (or locally) optimal for linear (nonlinear) optimization problems. Note that this is a definition of a mathematical optimization algorithm and not a statement that computing a local optimum is sufficient for nonlinear optimization problems. In the context of mixed-integer linear problems, an optimization algorithm [12] and [13] is expected to compute a proven optimal solution, or to generate feasible points and, for a maximization problem, to derive a reasonably tight, nontrivial upper bound. The quality of such bounds is quantified by the integrality gap, the difference between the upper and lower bounds. What one considers to be a good-quality solution depends on the problem, the purpose of the model, and the accuracy of the data. A few percent, say 2 to 3 %, might be acceptable for the example discussed by Kallrath (2007, Encyclopedia: Planning). However, discussions based on percentage gaps become complicated when the objective function includes penalty terms containing coefficients without a strict economic interpretation. In such cases scaling is problematic. Goal programming, as discussed in ([23], p. 294), might help in such situations to avoid penalty terms in the model: the problem is first solved with respect to the highest-priority goal, then one is concerned with the next-level goal, and so on.
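The integrality gap is easy to compute once a feasible point and a bound are available. The helper below is a hypothetical sketch for a maximization problem: the incumbent (best feasible objective value) is a lower bound, the bound from a relaxation is an upper bound, and the relative gap is normalized by the absolute incumbent value, which is only one of several conventions used by solvers.

```python
import math
from typing import Tuple


def integrality_gap(incumbent: float, upper_bound: float) -> Tuple[float, float]:
    """Absolute and relative gap for a maximization problem (hypothetical helper)."""
    if upper_bound < incumbent:
        raise ValueError("for maximization the bound cannot lie below a feasible value")
    absolute = upper_bound - incumbent
    # Normalizing by |incumbent| is one common convention; it is undefined at 0.
    relative = absolute / abs(incumbent) if incumbent != 0 else math.inf
    return absolute, relative


abs_gap, rel_gap = integrality_gap(incumbent=98.0, upper_bound=100.0)
print(abs_gap, f"{rel_gap:.2%}")  # 2.0 2.04%
```

With penalty terms lacking an economic interpretation, as noted above, such a relative figure can be misleading even when it is computed correctly.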

For practical purposes it is also relevant to observe that solving mixed-integer linear problems and the problem of finding appropriate bounds is often NP-complete, which makes these problems hard to solve. A consequence of this structural property is that these problems scale badly: if the problem can be solved to optimality for a given instance, this might no longer be so if the size is increased slightly. While tailor-made optimization algorithms such as column generation and branch-and-price techniques can often cope with this situation for individual problems, it is very difficult for standard software to do so.

We define difficult optimization problems as problems that cannot be solved to optimality or within a reasonable integrality gap by any standard MIP solver within a reasonable time limit. Problem structure, size, or both could lead to such behavior. Problems in this class are typically MIP or nonconvex optimization problems. In many cases, however, they can be solved if they are treated individually and we resort to the art of modeling.

The art of modeling includes choosing the right 
level of detail implemented in the model. On the one 
hand, this needs to satisfy the expectations of the owner 
of the real-world problem. On the other hand, we are 
limited by the available computational resources. We 
give reasons why strict optimality or at least safe bounds 
are essential when dealing with real-world problems 
and why we do not accept methods that do not generate 
both upper and lower bounds. 

Mapping the reality also forces us to discuss 
whether deterministic optimization is sufficient or 
whether we need to resort to optimization under un- 
certainty. Another issue is to check whether one objec- 
tive function suffices or whether multiple-criterion op- 
timization techniques need to be applied. 

Instead of solving such difficult problems directly 
as, for example, a standalone MILP problem, we dis- 
cuss how problems can be solved equivalently by solv- 
ing a sequence of models. 

Efficient approaches are as follows:

• Column generation with a master and subproblem structure,

• Branch-and-price,

• Exploiting a decomposition structure with a rolling time horizon,

• Exploiting auxiliary problems to generate safe bounds for the original problem, which then makes the original problem more tractable,

• Exhaustion approaches,

• Hybrid methods, i.e., constructive heuristics and local search on subsets of the difficult discrete variables, leaving the remaining variables and constraints in tractable MILP or MINLP problems that can be solved.

We illustrate various ideas using real-world planning, scheduling, and cutting-stock problems.


Models and the Art of Modeling 


We are here concerned with two aspects of modeling and models. The first one is to obtain a reasonable representation of the reality and to map it onto a mathematical model, i.e., an optimization problem in the form of (1). The second one is to reformulate the model or problem into equivalent forms that are numerically tractable.


Models The terms modeling and model building are derived from the word model. Its etymological roots are the Latin word modellus (scale, [diminutive of modus, measure]) and what was to become, in the 16th century, the new word modello. Nowadays, in a scientific context the term is used to refer to a simplified, abstract, or well-structured part of the reality one is interested in. The idea itself and the associated concept are, however, much older. Classical geometry, and especially Pythagoras around 600 B.C., distinguished between wheel and circle and between field and rectangle. Around A.D. 1100 a wooden model of the later Speyer cathedral was produced; the model served to build the real cathedral. Astrolabes and celestial globes have been used as models to visualize the movement of the moon, planets, and stars on the celestial sphere and to compute the times of their rising and setting. Until the 19th century mechanical models were understood as pictures of reality. Following the principles of classical mechanics, the key idea was to reduce all phenomena to the movement of small particles. Nowadays, in physics and other mathematical sciences one talks about models if

• For reasons of simplification, one restricts oneself to certain aspects of the problem (example: if we consider the movement of the planets, in a first approximation the planets are treated as point masses);

• For reasons of didactic presentation, one develops a simplified picture of a more complicated reality (example: the planetary model is used to explain the situation inside atoms);

• One uses the properties in one area to study the situation in an analogous problem.

A model is referred to as a mathematical model of a process or a problem if it contains typical mathematical objects (variables, terms, relations). Thus, a (mathematical) model represents a real-world problem in the language of mathematics using mathematical symbols, variables, equations, inequalities, and other relations.

It is very important when building a model to define and state precisely the purpose of the model. In science, we often encounter epistemological arguments. In engineering, a model might be used to construct machines. In operations research and optimization, models are often used to support strategic or operative decisions. All models enable us to
• Learn and understand situations that do not allow easy access (very slow or fast processes, processes involving a very small or very large region);
• Avoid difficult, expensive, or dangerous experiments; and
• Analyze case studies and what-if-when scenarios.
Tailored optimization models can be used to support decisions (that is, the overall purpose of the model). It is essential to have a clear objective describing what a good decision is. The optimization model should produce, for instance, optimal solutions in the following sense:
• To avoid unwanted byproducts as much as possible,
• To minimize costs, or
• To maximize profit, earnings before interest and taxes (EBIT), or contribution margin.
The purpose of a model may change over time. To solve a real-world problem by mathematical optimization, we first need to represent our problem by a mathematical model, that is, a set of mathematical relationships (e.g., equalities, inequalities, logical conditions) representing an abstraction of our real-world problem. This translation is part of the model-building phase (which is part of the whole modeling process) and is not trivial at all, because there is no such thing as an exact model. Each model is an acceptable candidate as long as it fulfills its purpose and approximates the real world accurately enough. Usually, a model in mathematical optimization consists of four key objects:
• Data, also called the constants of a model;
• Variables (continuous, semicontinuous, binary, integer), also called decision variables;
• Constraints (equalities, inequalities), also called restrictions; and
• Objective function (sometimes even several of them).


The data may represent costs or demands, fixed operating conditions of a reactor, capacities of plants, and so on. The variables represent the degrees of freedom, i.e., what we want to decide: how much of a certain product is to be produced, whether a depot is closed or not, or how much material we will store in the inventory for later use. Classical optimization (calculus, variational calculus, optimal control) treats those cases in which the variables represent continuous degrees of freedom, e.g., the temperature in a chemical reactor or the amount of a product to be produced. Mixed-integer optimization involves variables restricted to integer values, for example counts (numbers of containers, ships), decisions (yes-no), or logical relations (if product A is produced, then product B also needs to be produced). The constraints can be a wide range of mathematical relationships: algebraic, analytic, differential, or integral. They may represent mass balances, quality relations, capacity limits, and so on. The objective function expresses our goal: minimize costs, maximize utilization rate, minimize waste, and so on. Mathematical models for optimization usually lead to structured problems such as:
• Linear programming (LP) problems,
• Mixed-integer linear programming (MILP) problems,
• Quadratic (QP) and mixed-integer quadratic programming (MIQP) problems,
• Nonlinear programming (NLP) problems, and
• Mixed-integer nonlinear programming (MINLP) problems.
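The four key objects described above (data, variables, constraints, objective function) can be seen side by side in a deliberately tiny production-planning MILP. Everything in the sketch is hypothetical: two products A and B, two resources with made-up capacities, and exhaustive enumeration in place of a MILP solver, which is only viable because the instance is so small.

```python
from itertools import product

# Data (constants of the model); hypothetical numbers
PROFIT = {"A": 3, "B": 2}                     # profit per unit
USAGE = {"machine": {"A": 1, "B": 1},         # resource use per unit
         "labour": {"A": 1, "B": 3}}
CAPACITY = {"machine": 4, "labour": 6}


def feasible(plan):
    """Constraints: resource usage must not exceed capacity."""
    return all(sum(USAGE[r][p] * plan[p] for p in plan) <= CAPACITY[r]
               for r in CAPACITY)


def objective(plan):
    """Objective function: total profit (to be maximized)."""
    return sum(PROFIT[p] * plan[p] for p in plan)


def solve_by_enumeration(max_units=10):
    """Variables: nonnegative integer quantities, one per product."""
    best = None
    for quantities in product(range(max_units + 1), repeat=len(PROFIT)):
        plan = dict(zip(PROFIT, quantities))
        if feasible(plan) and (best is None or objective(plan) > best[0]):
            best = (objective(plan), plan)
    return best


print(solve_by_enumeration())  # (12, {'A': 4, 'B': 0})
```

In a real model the same four objects would be passed to a MILP solver rather than enumerated, since enumeration grows exponentially with the number of integer variables.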


The Art of Modeling How do we get from a given problem to its mathematical representation? This is a difficult, nonunique process. It is a compromise between the degree of detail required to model a problem and the complexity that remains tractable. However, simplifications should not only be seen as an unavoidable evil. They can be useful for developing understanding or can serve as a platform for discussion with the client, as the following three examples show.

1. At the beginning of the modeling process it can be useful to start with a “down-scaled” version to develop a feeling for the structure and dependencies of the model. This enables a constructive dialog between the modeler and the client. A vehicle fleet with 100 vehicles and 12 depots could be analyzed with only 10 vehicles and 2 depots to let the model world and the real world find each other in a sequence of discussions.

2. In partial models or submodels the modeler can develop a deep understanding of certain aspects of the problem, which can be relevant to solving the whole problem.

3. Some aspects of the real-world problem could be too complicated to model completely or exactly. During the modeling process it can be clarified, using a smaller version, whether partial aspects of the model could be neglected or whether they are essential.

In any case it is essential that the simplifications be well understood and documented.


Tricks of the Trade for Monolithic Models 


Using state-of-the-art commercial solvers, e.g., XPressMP [XPressMP is by Dash Optimization, http://www.dashoptimization.com] or CPLEX [CPLEX is by ILOG, http://www.ilog.com], MILP problems can be solved quite efficiently. In the case of MINLP and using global optimization techniques, the solution efficiency depends strongly on the individual problem and the model formulation. However, as stressed in [21], for both MILP and MINLP problems it is recommended that the full mathematical structure of a problem be exploited, that appropriate reformulations of models be made, and that problem-specific valid inequalities or cuts be used. Software packages may also differ with respect to their presolving capabilities, default strategies for the branch-and-bound algorithm, cut generation within the branch-and-cut algorithm, and, last but not least, diagnosing and tracing infeasibilities, which is an important issue in practice.

Here we collect a list of recommended tricks that help to improve the solution procedure of monolithic MIP problems, i.e., standalone models that are solved by one call to a MILP or MINLP solver. Among them are:

• Use bounds instead of constraints if the dual values are not necessarily required.

• Apply one's own presolving techniques. Consider, for instance, a set of inequalities

$$B_{ijk}\,\delta_{ijk} \le A_{ijk} \quad \forall \{i, j, k\} \qquad (2)$$

on binary variables δ_{ijk}. They can be replaced by the bounds

$$\delta_{ijk} = 0 \quad \forall \{(i, j, k) \mid A_{ijk} < B_{ijk}\}$$

or, if one does not trust the strict inequality < in a modeling language, by the bounds

$$\delta_{ijk} = 0 \quad \forall \{(i, j, k) \mid A_{ijk} \le B_{ijk} - \varepsilon\},$$

where ε > 0 is a small number. If A_{ijk} ≥ B_{ijk}, then (2) is redundant. Note that, because we have three indices, the number of inequalities can be very large.

• Exploit the presolving techniques embedded in the solver; cf. [28].

• Exploit or eliminate symmetry: sometimes, symmetry can lead to degenerate scenarios. There are situations, for instance in scheduling, where orders can be allocated to identical production units. Another example is the capacity design problem of a set of production units to be added to a production network. In that case, symmetry can be broken by requiring that the capacities of the units be sorted in descending order, i.e., c_k ≥ c_{k+1}. The authors of [29] exploit symmetry in order allocation for stock cutting in the paper industry; this is a very enjoyable paper to read.

• Use special types of variables for which tailor-made branching rules exist (this applies to semicontinuous and partial-integer variables as well as special ordered sets).

• Experiment with the various strategies offered by commercial solvers for the branch-and-bound algorithm.

• Experiment with the cut generation within the commercial branch-and-cut algorithm, among them Gomory cuts, knapsack cuts, or flow cuts; cf. [28].
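• Construct one's own valid inequalities for certain substructures of the problems at hand. Those inequalities may be added a priori to a model, and in the extreme case they would describe the complete convex hull. As an example we consider the mixed-integer inequality

$$x \le C\lambda, \quad 0 \le x \le X, \quad x \in \mathbb{R}_{+}, \quad \lambda \in \mathbb{N}, \qquad (3)$$

which has the valid inequality

$$x \le X - G\,(K - \lambda), \quad \text{where} \quad K := \left\lceil \frac{X}{C} \right\rceil \quad \text{and} \quad G := X - C\,(K - 1). \qquad (4)$$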
The valid inequality (4) is more useful the more K and X/C deviate. A special case that often arises is λ ∈ {0, 1}. Another example, taken from [39], p. 129, is

$$A_1\alpha_1 + A_2\alpha_2 \le B + x, \quad x \in \mathbb{R}_{+}, \quad \alpha_1, \alpha_2 \in \mathbb{N}, \qquad (5)$$

which for B ∉ ℕ leads to the valid inequality

$$\lfloor A_1 \rfloor \alpha_1 + \left( \lfloor A_2 \rfloor + \frac{f_2 - f}{1 - f} \right) \alpha_2 \le \lfloor B \rfloor + \frac{x}{1 - f}, \qquad (6)$$

where the following abbreviations are used:

$$f := B - \lfloor B \rfloor, \qquad f_1 := A_1 - \lfloor A_1 \rfloor, \qquad f_2 := A_2 - \lfloor A_2 \rfloor. \qquad (7)$$


The dynamic counterpart of valid inequalities added 
a priori to a model leads to cutting-plane algorithms 
that avoid adding a large number of inequalities 
a priori to the model (note, this can be equivalent 
to finding the complete convex hull). Instead, only 
those useful in the vicinity of the optimal solution 
are added dynamically. For the topics of valid in- 
equalities and cutting-plane algorithms the reader is 
referred to books by Nemhauser and Wolsey [30], 
Wolsey [39], and Pochet and Wolsey [32]. 

• Try disaggregation in MINLP problems. Global optimization techniques are often based on convex underestimators. Univariate functions can be treated more easily than multivariate terms. Therefore, it helps to represent bilinear or multilinear terms by their disaggregated equivalents. As an example we consider x_1x_2 with given lower and upper bounds X_i^- and X_i^+ for x_i, i = 1, 2. Wherever we encounter x_1x_2 in our model we can replace it by


$$x_1 x_2 = \frac{1}{2}\left( x_{12}^2 - x_1^2 - x_2^2 \right) \quad \text{and} \quad x_{12} = x_1 + x_2.$$


The auxiliary variable x_{12} is subject to the bounds X_{12}^- = X_1^- + X_2^- and X_{12}^+ = X_1^+ + X_2^+.

This formulation has another advantage: it allows us to construct easily a relaxed problem, which can be used to derive a useful lower bound. Imagine a problem P with the inequality

$$x_1 x_2 \le A. \qquad (8)$$

Then

$$x_{12}^2 - X_1^{+} x_1 - X_2^{+} x_2 \le 2A \qquad (9)$$

is a relaxation of P, as each point (x_1, x_2) satisfying (8) also fulfills (9); for nonnegative x_i this follows from x_i^2 ≤ X_i^+ x_i on [0, X_i^+]. Note that an alternative disaggregation avoiding an additional variable is given by

$$x_1 x_2 = \frac{1}{4}\left[ (x_1 + x_2)^2 - (x_1 - x_2)^2 \right].$$
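The valid inequality (4) can be checked mechanically on small instances. The sketch below (hypothetical helper names; exact rational arithmetic via Python's fractions module) computes K and G, verifies by enumeration over λ that every feasible point of (3) satisfies (4), and shows that the cut removes the fractional LP-relaxation vertex λ = X/C, x = X whenever X/C is not an integer.

```python
import math
from fractions import Fraction


def cut_parameters(X, C):
    """K := ceil(X/C) and G := X - C(K-1) from (4), in exact arithmetic."""
    X, C = Fraction(X), Fraction(C)
    K = math.ceil(X / C)
    return K, X - C * (K - 1)


def cut_rhs(X, C, lam):
    """Right-hand side of the valid inequality x <= X - G(K - lam)."""
    K, G = cut_parameters(X, C)
    return Fraction(X) - G * (K - Fraction(lam))


def cut_is_valid(X, C, lam_max=None):
    """Enumeration check that every feasible point of (3) satisfies (4).

    For each integer lam the largest feasible x is min(X, C*lam); since the
    cut bounds x from above, checking that largest x suffices.
    """
    K, _ = cut_parameters(X, C)
    lam_max = K + 2 if lam_max is None else lam_max
    return all(min(Fraction(X), Fraction(C) * lam) <= cut_rhs(X, C, lam)
               for lam in range(lam_max + 1))


K, G = cut_parameters(10, 4)
print(K, G)                             # 3 2
print(cut_is_valid(10, 4))              # True
print(cut_rhs(10, 4, Fraction(10, 4)))  # 9, so the vertex (lam, x) = (2.5, 10) is cut off
```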


However, all of the creative attempts listed above may 
not suffice to solve the MIP using one monolithic 
model. That is when we should start looking at solv- 
ing the problem by a sequence of problems. We have to 
keep in mind that to solve a MIP problem we need to 
derive tight lower and upper bounds with the gap be- 
tween them approaching zero. 


Decomposition Techniques 


Decomposition techniques decompose a problem into a set of smaller problems that can be solved in sequence or in any combination. Ideally, the approach can still compute the global optimum. There are standardized techniques such as Benders decomposition (cf. Floudas [9], Chap. 6), but often one should exploit the structure of an optimization problem to construct tailor-made decompositions. This is outlined in the following subsections.


Column Generation 


In linear programming parlance, the term column usually refers to variables. In the context of column-generation techniques it has a wider meaning and stands for any kind of object involved in an optimization
problem. In vehicle routing problems a column might, 
for instance, represent a subset of orders assigned to 
a vehicle. In network flow problems a column might 
represent a feasible path through the network. Finally, 
in cutting-stock problems [10,11] a column represents 
a pattern to be cut. 
The basic idea of column generation is to decompose a given problem into a master problem and a subproblem. Problems that might otherwise be nonlinear can be solved completely by solving only linear problems. The critical issue is to generate master and subproblems that can both be solved efficiently. One of the most famous examples is the elegant column-generation approach of Gilmore and Gomory [10] for computing the minimal number of rolls needed to satisfy a requested demand for smaller-sized rolls. This problem, if formulated as one monolithic problem, leads to a MINLP problem with a large number of integer variables. In simple cases, such as those described by Schrage ([35], Sect. 11.7), it is possible to generate all columns explicitly, even within a modeling language. Often the decomposition has a natural interpretation. If not all columns can be generated, the columns are added dynamically to the problem. Barnhart et al. [2] give a good overview of such techniques. A more recent review focusing on selected topics of column generation is [25]. In the context of vehicle routing problems, feasible tours, which form the columns, are generated as needed by solving a shortest-path problem with time windows and capacity constraints using dynamic programming [7].

More generally, column-generation techniques are used to solve well-structured MILP problems involving a huge number of variables, i.e., columns, say, several hundred thousand or millions. Such problems lead to large LP problems if the integrality constraints of the integer variables are relaxed. If the LP problem contains so many variables (columns) that it cannot be solved with a direct LP solver (revised simplex, interior point method), one starts solving this so-called master problem with a small subset of variables, yielding the restricted master problem. After the restricted master problem has been solved, a pricing problem is solved to identify new variables. This step corresponds to the identification of a nonbasic variable to be taken into the basis of the simplex algorithm, which explains the term column generation. The restricted master problem is then solved again with the enlarged set of variables. The method terminates when the pricing problem cannot identify any new variables with favorable reduced cost. The simplest version of column generation is found in the Dantzig-Wolfe decomposition [6].

Gilmore and Gomory [10,11] were the first to generalize the idea of dynamic column generation to an integer programming (IP) problem: the cutting-stock problem. In this case, the pricing problem, i.e., the subproblem, is an IP problem itself, and one refers to this as a column-generation algorithm. This problem is special as the columns generated when solving the relaxed master problem are sufficient to get the optimal integer feasible solution of the overall problem. In general this is not so. If not only the subproblem but also the master problem involves integer variables, then the column-generation part is embedded into a branch-and-bound method; this is called branch-and-price. Thus, branch-and-price is integer programming with column generation. Note that new columns are generated during the branching process; hence the name branch-and-price.


Column Generation in Cutting-Stock Problems This section describes the mathematical model for minimizing the number of rolls or the trimloss and illustrates the idea of column generation.


Indices used in this model:

$p \in \mathcal{P} := \{p_1, \ldots, p_{N^P}\}$ for cutting patterns (formats).

Either the patterns are directly generated according to a complete enumeration or they are generated by column generation.

$i \in \mathcal{I} := \{i_1, \ldots, i_{N^I}\}$ given orders or widths.


Input Data We arrange the relevant input data here:

$B$ [L] width of the rolls (raw-material rolls)
$D_i$ [–] number of orders for width $i$
$W_i$ [L] width of order type $i$


Integer Variables used in the different model variants:

$\mu_p \in \mathbb{N}_0 := \{0,1,2,3,\ldots\}$ [–] indicates how often pattern $p$ is used.

If cutting pattern $p$ is not used, then we have $\mu_p = 0$.

$\alpha_{ip} \in \mathbb{N}_0$ [–] indicates how often width $i$ is contained in pattern $p$.

This variable can take values between 0 and $D_i$, depending on the order situation.




Model The model contains a suitable objective function

$$\min f(\alpha_{ip}, \mu_p)$$

as well as the constraint (fulfillment of the demand)

$$\sum_p \alpha_{ip}\mu_p = D_i, \quad \forall i \tag{10}$$

and the integrality constraints

$$\alpha_{ip} \in \mathbb{N}_0,\ \forall \{i,p\}; \qquad \mu_p \in \mathbb{N}_0,\ \forall \{p\}. \tag{11}$$


General Structure of the Problem In this form, because of the products $\alpha_{ip}\mu_p$ in (10), it is a mixed-integer nonlinear optimization problem (MINLP). This problem class is difficult in itself. More serious is the fact that we may easily encounter several million variables $\alpha_{ip}$. Therefore the problem cannot be solved in this form.


Solution Method The idea of dynamic column generation is to let a master problem decide, for a predefined set of patterns, how often every pattern is used, and to compute suitable input data for a subproblem. In this subproblem new patterns are calculated.

The master problem solves for the multiplicities of existing patterns and has the form

$$\min \sum_p \mu_p$$

with the demand-fulfillment inequality (note that it is allowed to produce more than requested)

$$\sum_p \alpha_{ip}\mu_p \ge D_i, \quad \forall i \tag{12}$$

and the integrality constraints

$$\mu_p \in \mathbb{N}_0, \quad \forall \{p\}. \tag{13}$$


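The subproblem generates new patterns. Structurally it is a knapsack problem with objective function

$$\min_{\alpha} \; 1 - \sum_i P_i \alpha_i ,$$

where $P_i$ are the dual values (pricing information) of the LP relaxation of the master problem associated with (12), and $\alpha_i$ is an integer variable specifying how often width $i$ occurs in the new pattern. The subproblem is therefore also called the pricing problem: its objective is the reduced cost of the new pattern, and only a pattern with negative reduced cost can improve the master problem. We add the knapsack constraint with respect to the width of the rolls

$$\sum_i W_i \alpha_i \le B \tag{14}$$

and the integrality constraints

$$\alpha_i \in \mathbb{N}_0, \quad \forall \{i\}. \tag{15}$$

In some cases, $\alpha_i$ could additionally be bounded by the number, $K$, of knives.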


Implementation Issues The critical issues in this method, in which we alternate between solving the master problem and the subproblem, are the initialization of the procedure (a feasible starting point is to have one requested width in each initial pattern, but this is not necessarily a good one), excluding the regeneration of existing patterns by applying integer cuts, and the termination.
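The alternation between master problem and pricing problem can be sketched for small instances. The following is a minimal illustration under several assumptions, not the implementation of Gilmore and Gomory: it handles only the LP relaxation of the master problem (12), without (13); instead of calling an LP solver it solves the dual of the restricted master problem, $\max \sum_i D_i P_i$ subject to $\sum_i \alpha_{ip} P_i \le 1$ for every pattern $p$ and $P \ge 0$, with a small exact simplex (Bland's rule, exact fractions), so the dual values $P_i$ are available directly; the pricing knapsack (14)–(15) is solved by dynamic programming, which assumes positive integer widths not exceeding $B$; and the initial patterns contain one width each, repeated as often as it fits on the roll. All function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Tuple


def simplex_max(c, A, b) -> Tuple[Fraction, List[Fraction]]:
    """Maximize c.x s.t. A x <= b, x >= 0, for b >= 0 (the origin is feasible).

    Dense tableau simplex with exact fractions and Bland's anti-cycling rule;
    assumes the LP is bounded. Returns (optimal value, x).
    """
    m, n = len(A), len(c)
    T = [[Fraction(v) for v in A[r]]
         + [Fraction(int(r == k)) for k in range(m)]         # slack columns
         + [Fraction(b[r])] for r in range(m)]
    z = [Fraction(-v) for v in c] + [Fraction(0)] * (m + 1)  # objective row
    basis = [n + r for r in range(m)]
    while True:
        enter = next((j for j in range(n + m) if z[j] < 0), None)
        if enter is None:                                     # optimal tableau
            break
        rows = [r for r in range(m) if T[r][enter] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        leave = min(rows, key=lambda r: (T[r][-1] / T[r][enter], basis[r]))
        piv = T[leave][enter]
        T[leave] = [v / piv for v in T[leave]]
        for r in range(m):
            if r != leave and T[r][enter] != 0:
                f = T[r][enter]
                T[r] = [a - f * p for a, p in zip(T[r], T[leave])]
        f = z[enter]
        z = [a - f * p for a, p in zip(z, T[leave])]
        basis[leave] = enter
    x = [Fraction(0)] * (n + m)
    for r, j in enumerate(basis):
        x[j] = T[r][-1]
    return z[-1], x[:n]


def price(duals, widths: List[int], roll_width: int) -> Tuple[Fraction, List[int]]:
    """Pricing knapsack (14)-(15) by dynamic programming over the roll width:
    maximize sum_i P_i a_i subject to sum_i W_i a_i <= B, a_i integer >= 0."""
    best = [Fraction(0)] * (roll_width + 1)
    last: List[Optional[int]] = [None] * (roll_width + 1)  # None: one unit unused
    for cap in range(1, roll_width + 1):
        best[cap] = best[cap - 1]
        for i, w in enumerate(widths):
            if w <= cap and best[cap - w] + duals[i] > best[cap]:
                best[cap], last[cap] = best[cap - w] + duals[i], i
    pattern, cap = [0] * len(widths), roll_width
    while cap > 0:                      # walk back through the DP decisions
        i = last[cap]
        if i is None:
            cap -= 1
        else:
            pattern[i] += 1
            cap -= widths[i]
    return best[roll_width], pattern


def cutting_stock_lp(widths: List[int], demands: List[int], roll_width: int):
    """Column generation for the LP relaxation of the master problem (12)."""
    n = len(widths)
    # initial homogeneous patterns: one width each, repeated as often as it fits
    patterns = [[roll_width // widths[i] if k == i else 0 for k in range(n)]
                for i in range(n)]
    while True:
        # dual of the restricted master: max sum D_i P_i s.t. sum_i a_ip P_i <= 1
        value, duals = simplex_max(demands, patterns, [1] * len(patterns))
        best, pattern = price(duals, widths, roll_width)
        if best <= 1:              # reduced cost 1 - sum P_i a_i >= 0 everywhere
            return value, patterns
        patterns.append(pattern)   # improving column enlarges the restricted master
```

For example, `cutting_stock_lp([6, 4], [3, 5], 10)` starts from the patterns (1, 0) and (0, 2), the pricing step adds the mixed pattern (1, 1), and the LP value 4 is returned. Integer multiplicities would still require rounding or branching, i.e., branch-and-price.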


Column Enumeration 


Column enumeration is a special variant of column generation and is applicable when a small number of columns is sufficient. This is, for instance, the case in real-world cutting-stock problems when it is known that the optimal solution has only a small amount of trimloss. This usually eliminates most of the patterns. Column enumeration naturally leads to column-selection or partitioning models. A collection of illustrative examples contained in ([35], Sect. 11.7) covers several problems of grouping, matching, covering, partitioning, and packing in which a set of given objects has to be grouped into subsets to maximize or minimize some objective function. Despite the limitations with respect to the number of columns, column enumeration has some advantages:

• No pricing problem,

• Easily applied to MIP problems,

• Column enumeration is much easier to implement.
In the online version of the vehicle routing problem described in [22] it is possible to generate the complete set, $\mathcal{C}_r$, of all columns, i.e., subsets of orders $i \in \mathcal{O}$, $r = |\mathcal{O}|$, assigned to a fleet of $n$ vehicles, $v \in \mathcal{V}$. Let $\mathcal{C}_r$ be the union of the sets $\mathcal{C}_{rv}$, i.e., $\mathcal{C}_r = \bigcup_{v=1,\ldots,n} \mathcal{C}_{rv}$ with $C_r = |\mathcal{C}_{rv}| = 2^r$, where $\mathcal{C}_{rv}$ contains the subsets of orders assigned to vehicle $v$. Note that $\mathcal{C}_{rv}$ contains all subsets containing 1, 2, ..., or $r$ orders assigned to vehicle $v$. The relevant steps of the algorithm are:

1. Explicitly generate all columns $\mathcal{C}_{rv}$, followed by a simple feasibility test w.r.t. the availability of the cars.

2. Solve the routing-scheduling problem for all columns $\mathcal{C}_{rv}$ using a tailor-made branch-and-bound approach (the optimal objective function values, $Z(\tau_c)$ or $Z(\tau_{cv})$, respectively, and the associated routing-scheduling plan are stored).

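3. Solve the partitioning model:

$$\min \sum_{c=1}^{C_r} \sum_{v=1}^{N^V} Z(\tau_{cv})\,\gamma_{cv} \tag{16}$$

subject to

$$\sum_{c=1}^{C_r} \sum_{v=1}^{N^V} I_i(\tau_c)\,\gamma_{cv} = 1, \quad \forall i = 1,\ldots,r, \tag{17}$$

which ensures that each order is contained exactly once (here $I_i(\tau_c) = 1$ if order $i$ belongs to column $c$, and 0 otherwise), the inequality

$$\sum_{c=1}^{C_r} \gamma_{cv} \le 1, \quad \forall v \in \mathcal{V}, \tag{18}$$

ensuring that at most one column can exist for each vehicle, and the integrality conditions

$$\gamma_{cv} \in \{0,1\}, \quad \forall c = 1,\ldots,C_r,\ \forall v \in \mathcal{V}. \tag{19}$$

Note that not all combinations of index pairs $\{c, v\}$ exist; each $c$ corresponds to exactly one $v$, and vice versa. This formulation allows us to find optimal solutions with the defined columns for a smaller number of vehicles. The objective function and the partitioning constraints are just modified by substituting

$$\sum_{v=1 \mid v \in \mathcal{V}}^{N^V} \quad \longrightarrow \quad \sum_{v=1 \mid v \in \mathcal{V}_u}^{N^V},$$

i.e., by the equations

$$\sum_{c=1}^{C_r} \sum_{v=1 \mid v \in \mathcal{V}_u}^{N^V} I_i(\tau_c)\,\gamma_{cv} = 1, \quad \forall i = 1,\ldots,r,$$

and the inequality

$$\sum_{c=1}^{C_r} \gamma_{cv} \le 1, \quad \forall v \in \mathcal{V}_u,$$

where $\mathcal{V}_u \subset \mathcal{V}$ is a subset of the set $\mathcal{V}$ of all vehicles. Alternatively, if it is not prespecified which vehicles should be used but it is only required that not more than $N^V$ vehicles be used, then the inequality

$$\sum_{c=1}^{C_r} \sum_{v=1 \mid v \in \mathcal{V}}^{N^V} \gamma_{cv} \le N^V \tag{20}$$

is imposed.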


4. Reconstruct the complete solution from the stored optimal solutions for the individual columns.
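For very small instances, the partitioning model (16)–(19) can even be solved exactly without a MIP solver. The sketch below is an illustration under stated assumptions: columns are encoded as bitmasks over the $r$ orders; the hypothetical function `cost(mask, v)` stands for the stored value $Z(\tau_{cv})$ from step 2 and may return infinity for infeasible columns; an empty column means that vehicle $v$ is not used and costs nothing; and a dynamic program over vehicles and subsets replaces the MIP solver. Like step 1, it touches all $2^r$ columns per vehicle and is therefore only practical for small $r$.

```python
import math
from typing import Callable


def solve_partitioning(r: int, n: int, cost: Callable[[int, int], float]) -> float:
    """Minimum total cost of assigning each of r orders to exactly one of n
    vehicles, with at most one column (subset of orders) per vehicle.

    cost(mask, v) is the (hypothetical) stored optimal cost of serving the
    orders in bitmask `mask` with vehicle v; it may return math.inf.
    """
    full = (1 << r) - 1
    f = [0.0] + [math.inf] * full      # f[mask]: best cost covering `mask` so far
    for v in range(n):
        g = f[:]                       # option: vehicle v stays unused
        for mask in range(full + 1):
            if f[mask] == math.inf:
                continue
            rest = full & ~mask        # orders not yet covered
            sub = rest
            while sub:                 # every nonempty column disjoint from `mask`
                g[mask | sub] = min(g[mask | sub], f[mask] + cost(sub, v))
                sub = (sub - 1) & rest
        f = g                          # each vehicle is used at most once, cf. (18)
    return f[full]                     # every order covered exactly once, cf. (17)
```

With $r = 3$ orders, two vehicles, and the made-up cost $1 + |S|^2$ for a used vehicle serving order set $S$, the best plan splits the orders 2 + 1 at total cost 7; with a single vehicle the cost is 10.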


Branch-and-Price 


Branch-and-price (often coupled with branch-and-cut) 
refers to a tailor-made algorithm exploiting the decom- 
position structure of the problem to be solved. This ef- 
ficient method for solving MIP problems with column 
generation has been well described by Barnhart et al. [2] 
and has been covered by Savelsbergh [34] in the first 
edition of the Encyclopedia of Optimization. Here, we 
give a list of more recent successful applications in var- 
ious fields. 

Cutting stock: [3,38] 

Engine routing and industrial in-plant railroads: [26] 
Network design: [16] 

Lot sizing: [38] 

Scheduling (staff planning): [8] 

Scheduling of switching engines: [24] 

Supply chain optimization (pulp industry): [5] 
Vehicle routing: [7,15] 


Rolling Time Decomposition 


The overall methodology for solving the medium- 
range production scheduling problem is to decom- 
pose the large and complex problem into smaller 
short-term scheduling subproblems in successive time 
horizons, i.e. we decompose according to time. 
Large-scale industrial problems have been solved by 
Janak et al. [18,19]. A decomposition model is formulated and solved to determine the current horizon and corresponding products that should be included in the
current subproblem. According to the solution of the 
decomposition model, a short-term scheduling model 
is formulated using the information on customer or- 
ders, inventory levels, and processing recipes. The re- 
sulting MILP problem is a large-scale complex problem 
that requires a large computational effort for its solu- 
tion. When a satisfactory solution is determined, the 
relevant data are output and the next time horizon is 
considered. The above procedure is applied iteratively 
in an automatic fashion until the whole scheduling pe- 
riod under consideration is finished. 

Note that the decomposition model determines au- 
tomatically how many days and products to consider in 
the small scheduling horizon subject to an upper limit 
on the complexity of the resulting mathematical model. 


An Exhaustion Method 


This method combines aspects of constructive heuristics and of exact model solving. We illustrate the exhaustion method with the cutting-stock problem described in Sect. “Column Generation in Cutting-Stock Problems”; assigning orders in a scheduling problem would be another example. The elegant column-generation approach by Gilmore and Gomory [10] is known for producing minimal-trimloss solutions with many patterns. Often this corresponds to setup changes on the machine and is therefore not desirable. A solution with a minimal number of patterns minimizes the machine setup costs of the cutter. Minimizing trimloss and the number of patterns simultaneously is possible only for small cases with a few orders, exploiting the MILP model by Johnston and Salinlija [20]. This model contains two conflicting objective functions; therefore one could resort to goal programming. Alternatively, we could produce several parameterized solutions leading to different numbers of rolls to be used and patterns to be cut, from which the user would extract the preferred one.

As the table above indicates, we compute tight lower 
bounds on both trimloss and the number of patterns. 
Even for up to 50 feasible orders, near-optimal solu- 
tions are constructed in less than a minute. 

Note that it would be possible to use the branch- 
and-price algorithm described in [38] or [3] to solve 
the one-dimensional cutting-stock problem with min- 
imal numbers of patterns. However, these methods are 


not easy to implement. Therefore, we use the following 

approaches, which are much easier to program: 

• V1: Direct usage of the model by Johnston and Salinlija [20] for a small number, say, $N^I < 14$, of orders and $D^{\max} < 10$. In a preprocessing step we compute valid inequalities as well as tight lower and upper bounds on the variables.

• V2: Exhaustion procedure in which we successively generate new patterns with maximal multiplicities. This method is parameterized by the permissible percentage waste $W^{\max}$, $1 \le W^{\max} \le 99$. After a few patterns have been generated with this parameterization, it could happen that it is not possible to generate any more patterns with the waste restriction. In this case the remaining unsatisfied orders are handled by V1 without the $W^{\max}$ restriction.


Indices and Sets 


In this model we use the indices listed in Johnston and Salinlija [20]:

$i \in \mathcal{I} := \{i_1, \ldots, i_{N^I}\}$ denotes the set of widths.

$j \in \mathcal{J} := \{j_1, \ldots, j_{N^J}\}$ denotes the patterns; $N^J \le N^I$. The patterns are generated by V1, or dynamically by maximizing the multiplicities of a used pattern.

$k \in \mathcal{K} := \{k_1, \ldots, k_{N^K}\}$ denotes the multiplicity index to indicate how often a width is used in a pattern. The multiplicity index can be restricted by the ratio of the width of the orders and the width of the given rolls.


Variables
The following integer or binary variables are used:

$r_j \in \mathbb{N}_0$ [–] specifies the multiplicity of pattern $j$, i.e., how often pattern $j$ is used. The multiplicity can vary between 0 and $D^{\max} := \max_i\{D_i\}$. If pattern $j$ is not used, we have $r_j = 0$ and $p_j = 0$.

$p_j \in \{0,1\}$ [–] indicates whether pattern $j$ is used at all.

$a_{ijk} \in \mathbb{N}_0$ [–] specifies how often width $i$ occurs in pattern $j$. This width-multiplicity variable can take all values between 0 and $D_i$.

$x_{ijk} \in \{0,1\}$ [–] indicates whether width $i$ appears in pattern $j$ at level $k$. Note that $x_{ijk} = 0$ implies $a_{ijk} = 0$.

[Table: results of successive runs of the exhaustion method, listing for each run the number of rolls, the number of patterns, the output file (pat00.out to pat06.out), a flag, $W^{\max}$, and a comment marking the lower bounds on the minimal number of patterns and on the minimal number of rolls and the runs attaining the minimal number of rolls]

The best solution found contains 7 patterns!
The solution with minimal trimloss contains 30 rolls!
Improvement in the lower bound of patterns: 6!
Solutions with 6 patterns are minimal w.r.t. the number of patterns.
A new solution was found with only 6 patterns and 36 rolls (output file patnew.out, flag 0, $W^{\max}$ = 99).


The Idea of the Exhaustion Method 


In each iteration $m$ we generate at most two or three new patterns by maximizing the multiplicities of these patterns, allowing no more than a maximum waste, $W^{\max}$. The solution generated in iteration $m$ is preserved in iteration $m+1$ by fixing the appropriate variables. If the problem turns out to be infeasible (this may happen if $W^{\max}$ turns out to be too restrictive), then we switch to a model variant in which we minimize the number of patterns subject to satisfying the remaining unsatisfied orders.
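The exhaustion idea can be illustrated for small instances without a MIP solver. The sketch below is a simplified greedy variant and not the MILP model of [20]: it enumerates all patterns explicitly, fixes in each iteration one pattern with maximal multiplicity (without overproduction) among those whose waste does not exceed $W^{\max}$ percent, reduces the remaining demands accordingly, and drops the waste limit when no such pattern exists, mirroring the fallback described above. Positive integer widths not exceeding the roll width and all function names are assumptions.

```python
from typing import Iterator, List, Tuple


def patterns(widths: List[int], limits: List[int], roll_width: int) -> Iterator[List[int]]:
    """Yield every nonempty pattern a with a[i] <= limits[i] that fits on one roll."""
    def rec(i: int, left: int) -> Iterator[List[int]]:
        if i == len(widths):
            yield []
            return
        for k in range(min(limits[i], left // widths[i]) + 1):
            for tail in rec(i + 1, left - k * widths[i]):
                yield [k] + tail
    for p in rec(0, roll_width):
        if any(p):
            yield p


def exhaustion(widths: List[int], demands: List[int], roll_width: int,
               wmax: float) -> List[Tuple[List[int], int]]:
    """Greedily fix patterns with maximal multiplicity and waste <= wmax percent."""
    remaining = list(demands)
    plan: List[Tuple[List[int], int]] = []
    while any(remaining):
        best = None
        for limit in (wmax, 100):              # second pass: no waste restriction
            for p in patterns(widths, remaining, roll_width):
                used = sum(w * a for w, a in zip(widths, p))
                if 100 * (roll_width - used) > limit * roll_width:
                    continue                   # trimloss above the permitted waste
                mult = min(remaining[i] // a for i, a in enumerate(p) if a)
                if best is None or (mult, used) > best[0]:
                    best = ((mult, used), p)   # prefer multiplicity, then less waste
            if best is not None:
                break
        (mult, _), p = best
        plan.append((p, mult))                 # fix the pattern, like fixing variables
        remaining = [d - mult * a for d, a in zip(remaining, p)]
    return plan
```

For widths 3 and 5 with demands 4 and 2 on rolls of width 11 and $W^{\max} = 20$, the sketch fixes the waste-free pattern (2, 1) with multiplicity 2 and is done after one iteration.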

The model is based on the inequalities (2,3,5,6,7,8,9) in [20], but we add a few additional ones or modify the existing ones. We exploit two objective functions: maximizing the multiplicities of the patterns generated,

$$\max \sum_{j=1}^{\pi^u} r_j ,$$

where $\pi^u$ specifies the maximal number of patterns ($\pi^u$ could be taken from the solution of the column-generation approach, for instance), or minimizing the number of patterns generated,

$$\min \sum_{j=1}^{\pi^u} p_j .$$

The model is completed by the integrality conditions

$$r_j,\ a_{ijk} \in \{0,1,2,3,\ldots\} \tag{21}$$

$$p_j,\ x_{ijk},\ y_{ik} \in \{0,1\}. \tag{22}$$


The model is applied several times with $a_{ijk} \le D_i$, where $D_i$ is the number of remaining orders of width $i$. In particular, the model has to fulfill the relationships

$$k > D_i \;\Longrightarrow\; a_{ijk} = 0,\ x_{ijk} = 0$$

and

$$a_{ijk} \le \left\lfloor \frac{D_i}{k} \right\rfloor \quad \text{or} \quad a_{ijk} \le \left\lfloor \frac{D_i + S_i}{k} \right\rfloor ,$$

where $S_i$ denotes the permissible overproduction.

The constructive method described so far provides an improved upper bound, $\pi^u$, on the number of patterns.

Computing Lower Bounds

To compute a lower bound we apply two methods. The first method is to solve a bin-packing problem, which is equivalent to minimizing the number of rolls in the original cutting-stock problem described in Sect. “Column Generation in Cutting-Stock Problems” for equal demands $D_i = 1$; since every width must appear in at least one pattern, this minimal number of rolls cannot exceed the number of patterns of any feasible solution. If solved with the column-generation approach, this method is fast and cheap, but the lower bound, $\pi^l$, is often weak. The second method is to exploit the upper bound, $\pi^u$, on the number of patterns obtained and to call the exact model as in V1. It is impressive how quickly the commercial solvers CPLEX and XpressMP improve the lower bound $\pi^l$.

For most examples with up to 50 orders we obtain $\pi^u - \pi^l \le 2$, but in many cases $\pi^u - \pi^l = 1$ or even $\pi^u = \pi^l$.
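The first lower-bounding method can be illustrated for a handful of widths. Setting every demand to $D_i = 1$ turns the problem into bin packing, and the minimal number of rolls needed to cut each width once bounds the number of patterns from below. The sketch below swaps the column-generation approach mentioned in the text for an exact dynamic program over subsets of widths; this is a different technique, practical only for roughly twenty widths or fewer, and it assumes integer widths not exceeding the roll width. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def min_rolls_each_width_once(widths: List[int], roll_width: int) -> int:
    """Exact bin packing by DP over subsets: minimal number of rolls of width
    roll_width needed to cut each width exactly once (a lower bound pi^l)."""
    n = len(widths)
    if n == 0:
        return 0
    # best[mask] = (rolls opened, load of the last roll), lexicographically minimal
    best: List[Optional[Tuple[int, int]]] = [None] * (1 << n)
    best[0] = (1, 0)
    for mask in range(1 << n):
        if best[mask] is None:
            continue
        rolls, load = best[mask]
        for i, w in enumerate(widths):
            if mask & (1 << i):
                continue                       # width i already cut
            # put width i on the current roll if it fits, else open a new roll
            cand = (rolls, load + w) if load + w <= roll_width else (rolls + 1, w)
            nxt = mask | (1 << i)
            if best[nxt] is None or cand < best[nxt]:
                best[nxt] = cand
    return best[-1][0]
```

For widths 6, 4, 5, 5, and 3 on rolls of width 10, the total width 23 already forces at least three rolls, and the pairing 6 + 4, 5 + 5, 3 attains it, so at least three patterns are needed.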


Primal Feasible Solutions and Hybrid Methods 


We define hybrid methods as methods based on any 
combination of exact MIP methods with constructive 
heuristics, local search, metaheuristics, or constraint 
programming that produces primal feasible solutions. 
Dive-and-fix, near-integer-fix, and fix-and-relax are such hybrid methods. They are user-developed heuristics exploiting the problem structure. At their core they use a declarative model solved, for instance, by CPLEX or XpressMP.

In constructive heuristics we exploit the structure of the problem and compute a feasible point. Once we have a feasible point we can derive safe bounds on the optimum and assign initial values to the critical discrete variables, which could be exploited by the GAMS/CPLEX mipstart option. Feasible points can sometimes be generated by appropriate sequences of relaxed models. For instance, in a scheduling problem P with due times one might relax these due times, obtaining the relaxed model R. The optimal solution, or even any feasible point of R, is a feasible point of P if the due times are modeled with appropriate unbounded slack variables.

Constructive heuristics can also be established by systematic approaches of fixing critical discrete variables. Such approaches are dive-and-fix and relax-and-fix. In dive-and-fix the LP relaxation of an integer problem is solved, followed by fixing a subset of fractional variables to suitable bounds. Near-integer-fix is a variant of dive-and-fix that fixes variables with fractional values to the nearest integer point. Note that these heuristics are subject to the risk of becoming infeasible.

Infeasibility is less likely in relax-and-fix. In relax-and-fix, following Pochet and Wolsey ([32], p. 109), we suppose that the binary variables $\delta$ of a MIP problem P can be partitioned into $R$ disjoint sets $S^1, \ldots, S^R$ of decreasing importance. Within these subsets, sets $U^r$ with $U^r \subseteq \bigcup_{u=r+1}^{R} S^u$ for $r = 1, \ldots, R-1$ can be chosen to allow for somewhat more generality. Based on these partitions, $R$ MIP problems, denoted $P^r$ with $1 \le r \le R$, are solved to find a heuristic solution to P. For instance, in a production planning problem, $S^1$ might be all the $\delta$ variables associated with time periods in $\{1, \ldots, t_1\}$, $S^{u+1}$ those associated with periods in $\{t_u + 1, \ldots, t_{u+1}\}$, whereas $U^r$ would be the $\delta$ variables associated with the periods in some set $\{t_r + 1, \ldots, u_r\}$.

In the first problem, $P^1$, one only imposes the integrality of the important variables in $S^1 \cup U^1$ and relaxes the integrality on all the other variables in $S$. As $P^1$ is a relaxation of P, for a minimization problem the solution of $P^1$ provides a lower bound of P. The solution values, $\delta^1$, of the discrete variables in $S^1$ are kept fixed when solving $P^2$. This continues, and in the subsequent $P^r$, for $2 \le r \le R$, we additionally fix the values of the $\delta$ variables with index in $S^{r-1}$ at their optimal values from $P^{r-1}$ and add the integrality restriction for the variables in $S^r \cup U^r$.

Either $P^r$ is infeasible for some $r \in \{1, \ldots, R\}$, and the heuristic has failed, or else $(x^R, \delta^R)$ is a relax-and-fix solution. To avoid infeasibilities one might apply a smoothed form of this heuristic that allows for some overlap of $U^{r-1}$ and $U^r$. Additional free binary variables in horizon $r-1$ allow one to link the current horizon $r$ with the previous one. Usually this suffices to ensure feasibility. Relax-and-fix comes in various flavors exploiting time-decomposition or time-partitioning structures. Other decompositions, for instance by plants, products, or customers, are possible as well.
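For a minimization problem $\min f(x,\delta)$ with binary variables $\delta$, the subproblem solved in iteration $r$ of relax-and-fix can be summarized as follows. This is only a restatement of the scheme in formulas; $\hat{\delta}$, denoting the values fixed in earlier iterations, is notation introduced here for the summary:

$$P^r:\quad \min\; f(x,\delta)\quad\text{s.t.}\quad
\begin{cases}
(x,\delta) \text{ satisfies the constraints of P with } \delta \in [0,1]^{|S|},\\
\delta_j = \hat{\delta}_j, & j \in S^1 \cup \cdots \cup S^{r-1},\\
\delta_j \in \{0,1\}, & j \in S^r \cup U^r .
\end{cases}$$

For $r = 1$ no variables are fixed, so $P^1$ is a relaxation of P and its optimal value is a lower bound. For $r = R$ (with $U^R = \emptyset$) every binary variable is either fixed or required to be integral, so any feasible solution of $P^R$ is feasible for P and yields an upper bound.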

A local search can be used to improve the solution obtained by the relax-and-fix heuristic. The main idea is to repeatedly solve the subproblem on a small number of binary variables, reoptimizing, for instance, the production of some products. The binary variables for re-solving could be chosen randomly or by a metaheuristic such as simulated annealing. All binary variables related to them are released; the others are fixed to their previous best values.

Another class of MIP hybrid methods is established by algorithms that combine a MIP solver with another algorithmic method. A hybrid method obtained by the combination of mixed-integer and constraint logic programming strategies has been developed and applied by Harjunkoski et al. [14] as well as Jain and Grossmann [17] for solving scheduling and combinatorial optimization problems. Timpe [37] solved mixed planning and scheduling problems with mixed MILP branch-and-bound and constraint programming. Maravelias and Grossmann [27] proposed a hybrid/decomposition algorithm for the short-term
scheduling of batch plants, and Roe et al. [33] pre- 
sented a hybrid MILP/CLP algorithm for multipur- 
pose batch process scheduling in which MILP is used 
to solve an aggregated planning problem while CP is 
used to solve a sequencing problem. Other hybrid al- 
gorithms combine evolutionary and mathematical pro- 
gramming methods; see, for instance, the heuristics by 
Till et al. [36] for stochastic scheduling problems and by 
Borisovsky et al. [4] for supply management problems. 

Finally, one should not forget to add some algorith- 
mic component that, for the minimization problem at 
hand, would generate some reasonable bounds to be 
provided in addition to the hybrid method. The hy- 
brid methods discussed above provide upper bounds 
by constructing feasible points. In favorable cases, the MIP part of the hybrid solver provides lower bounds. In other cases, lower bounds can be derived from auxiliary problems, which are relaxations of the original problem and are easier to solve.


Summary 


If a given MIP problem cannot be solved by an available 
MIP solver exploiting all its internal presolving tech- 
niques, one might reformulate the problem and obtain 
an equivalent or closely related representation of real- 
ity. Another approach is to construct MIP solutions and 
bounds by solving a sequence of models. Alternatively, 
individual tailor-made exact decomposition techniques 
could help as well as primal heuristics such as relax-and-fix or local search techniques on top of a MIP model.


Modeling Languages in Optimization: A New Paradigm


In this paper, modeling languages are identified as a new computer language paradigm, and their application to representing optimization problems is illustrated by examples.

Programming languages can be classified into three paradigms: imperative, functional, and logic programming [14]. The imperative programming paradigm is closely related to the physical way in which a (von Neumann) computer works: given a set of memory locations, a program is a sequence of well-defined instructions for retrieving, storing, and transforming the content of these locations. The functional paradigm of computation is based on the evaluation of functions. Every program can be viewed as a function which translates an input into a unique output. Functions are first-class values, that is, they must be viewed as values themselves. The computational model is based on the λ-calculus invented by A. Church (1936) as a mathematical formalism for expressing the concept of a computation. The paradigm of logic programming is based on the insight that a computation can be viewed as a kind of (constructive) proof. Hence, a program is a notation for writing logical statements together with specified algorithms for implementing inference rules.

All three programming paradigms concentrate on 
problem representation as a computation, that is, the 
problem is stated in a way that describes the process of 
solving it. The computation on how to solve a problem 
‘is’ its representation. One may call such a notational 
system an algorithmic language. 


Definition 1 An algorithmic language describes (explicitly or implicitly) the computation of solving a problem, that is, “how” a problem can be processed using a machine. The computation consists of a sequence of well-defined instructions which can be executed in a finite time by a Turing machine. The information of a problem which is captured by an algorithmic language is called algorithmic knowledge of the problem.


Algorithmic knowledge to describe a problem is so common in our everyday life (one only needs to look at cookbooks or technical maintenance manuals) that one may ask whether the human brain is ‘predisposed’ to present a problem preferably by describing its solution recipe.

However, there exists at least one different way to capture knowledge about a problem: the method which describes ‘what’ the problem is by defining its properties, rather than saying ‘how’ to solve it. Mathematically, this can be expressed by a set $\{x \in X : R(x)\}$, where $X$ is a continuous or discrete state space and $R(x)$ is a Boolean relation defining the properties or the constraints of the problem; $x$ is called the variable(s). A notational system that represents a problem in this way is called a declarative language.


Definition 2 A declarative language describes the problem as a set using mathematical variables and constraints defined over a given state space. This space can be finite or infinite, countable or uncountable. The information of a problem which is captured by a declarative language is called declarative knowledge of the problem.


The declarative representation, in general, does not give 
any indication on how to solve the problem. It only 
states what the problem is. Of course, there exists a trivial algorithm to solve a declaratively stated problem, which is to enumerate the state space and to check whether a given $x \in X$ violates the constraint $R(x)$. The
algorithm breaks down, however, whenever the state 
space is infinite. But even if the state space is finite, it 
is - for most nontrivial problems - so large that a full 
enumeration is practically impossible. 
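The trivial enumeration algorithm is easy to write down for a finite state space. The sketch below represents $R(x)$ as a Python predicate and filters $X$ exactly as the set $\{x \in X : R(x)\}$ suggests; the toy problem (Pythagorean triples with entries up to 20) is chosen only for illustration and the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def solve_by_enumeration(space: Iterable[T], relation: Callable[[T], bool]) -> List[T]:
    """Return {x in X : R(x)} by testing every element of a finite state space X."""
    return [x for x in space if relation(x)]


def is_pythagorean(t: Tuple[int, int, int]) -> bool:
    """R(x) for the toy problem: x = (a, b, c) with a < b and a^2 + b^2 = c^2."""
    a, b, c = t
    return a < b and a * a + b * b == c * c


# State space X: all triples with entries 1..20 (8000 elements, checked one by one).
triples = solve_by_enumeration(product(range(1, 21), repeat=3), is_pythagorean)
```

The effort grows with $|X|$: raising the bound from 20 to 2000 would already mean checking $8 \cdot 10^9$ triples, which illustrates why full enumeration breaks down for nontrivial problems.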

Algorithmic and declarative representations are two 
fundamentally different kinds of modeling and rep- 
resenting knowledge. Declarative knowledge answers 
the question ‘what is?’, whereas algorithmic knowledge 
asks ‘how to?’ [4]. An algorithm gives an exact recipe 
of how to solve a problem. A mathematical model, i.e. 
its declarative representation, on the other hand, (only) 
defines the problem as a subspace of the state space. No 
algorithm is given to find all or a single element of the 
feasible subspace. 


Why Declarative Representation 


The question arises, therefore, why one should present a problem declaratively, since one must solve it anyway and, hence, represent it as an algorithm. The reasons are, first of all, conciseness, insight, and documentation. Many problems can be represented declaratively in a very concise way, while the representation of their computation is long and complex. Concise formulations also favor insight into a problem. Furthermore, in many scientific papers a problem is stated in a declarative way using mathematical equations and inequalities for documentational purposes. This gives a clear statement of the problem and is an efficient way to communicate it to other scientists. However, documentation is
by no means limited to human beings. One can imag- 
ine declarative languages implemented on a computer 
like algorithmic languages, which are parsed and inter- 
preted by a compiler. In this way, an interpretative sys- 
tem can analyse the structure of a declarative program, 
can pretty-print it on a printer or a screen, can classify 
it, or symbolically transform it in order to view it as a di- 
agram or in another textual form. 

Of course, the most interesting question is whether 
the declarative way of representing a problem could be 
of any help in solving the problem. 

Indeed, for certain classes of problems the computation can be obtained directly from a declarative formulation. This is true for all recursive definitions. A classical example is the algorithm of Euclid to find the greatest common divisor (gcd) of two integers. One can prove that

$$\gcd(a, b) = \begin{cases} \gcd(b,\ a \bmod b), & b > 0, \\ a, & b = 0, \end{cases}$$

which is clearly a declarative statement of the problem. In Scheme, a functional language, this formula can be implemented directly as a function in the following way:

; Euclid's algorithm as a direct transcription of the recursive definition
(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (remainder a b))))


Similar formulations can be given for any other lan- 
guage which includes recursion as a basic control struc- 
ture. This class of problems is surprisingly rich. The 
whole paradigm of dynamic programming can be sub- 
sumed under this class. 

A class of problems of a very different kind are linear programs, which can be represented declaratively in the following way:

$$\{\min cx : Ax \ge b\}$$


From this formulation, in contrast to the class of recursive definitions, nothing can be deduced that would be useful in solving the problem. However, there exist well-known methods, for example the simplex method, which solve almost all instances very efficiently. Hence, to make the declarative formulation of a linear program useful for solving it, one only needs to translate it into a form that the simplex algorithm accepts as input. The translation from the declarative formulation $\{\min cx : Ax \ge b\}$ to such an input form can be automated. This concept can be extended to nonlinear and discrete problems.
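The automated translation can be illustrated with a toy version that is far simpler than any real modeling-language compiler. In the hypothetical sketch below, a model is given declaratively as an objective and a list of constraints, each a dictionary from variable names to coefficients together with a sense and a right-hand side, and `to_matrix_form` produces the vectors and matrix $(c, A, b)$ of $\{\min cx : Ax \ge b\}$ that a simplex code could take as input. Constraints written with ≤ are negated, and equations are split into two inequalities.

```python
from typing import Dict, List, Tuple

LinExpr = Dict[str, float]                 # variable name -> coefficient
Constraint = Tuple[LinExpr, str, float]    # (expression, sense, right-hand side)


def to_matrix_form(objective: LinExpr, constraints: List[Constraint]):
    """Translate min objective s.t. constraints into (names, c, A, b) with A x >= b."""
    names = sorted(set(objective).union(*(expr for expr, _, _ in constraints)))
    column = {name: j for j, name in enumerate(names)}
    c = [objective.get(name, 0.0) for name in names]
    A: List[List[float]] = []
    b: List[float] = []
    for expr, sense, rhs in constraints:
        if sense not in (">=", "<=", "=="):
            raise ValueError(f"unknown constraint sense {sense!r}")
        row = [0.0] * len(names)
        for name, coef in expr.items():
            row[column[name]] = coef
        if sense in (">=", "=="):
            A.append(row)
            b.append(rhs)
        if sense in ("<=", "=="):            # a.x <= r  becomes  -a.x >= -r
            A.append([-v for v in row])
            b.append(-rhs)
    return names, c, A, b
```

For instance, $\min 2x + 3y$ subject to $x + y \ge 4$ and $x \le 3$ becomes $c = (2, 3)$, $A = \begin{pmatrix} 1 & 1 \\ -1 & 0 \end{pmatrix}$, $b = (4, -3)$. The declarative model text stays untouched; only this generated matrix form depends on the solver's input conventions.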


Algebraic Modeling Languages 


The idea of stating the mathematical problem declaratively and translating it into an ‘algorithmic’ form by a standard procedure led to a new language paradigm that emerged mainly in the operations research community at the end of the 1980s: the algebraic modeling languages (AIMMS [1], AMPL [7], GAMS [2], LINGO [18], LPL [12], and others). These languages are becoming increasingly popular even outside the operations research community. Algebraic modeling languages represent a problem in a purely declarative way, although most of them include computational facilities to manipulate the data as well as certain control structures.

One of their strengths is the complete separation of the problem formulation, as a declarative model, from finding a solution, which is supposed to be computed by an external program called a solver. This allows the modeler not only to separate the two main tasks of model formulation and model solution, but also to switch easily between several solvers. This is an invaluable benefit for many difficult problems, since it is not uncommon that one model instance can be solved using one method while another instance is solvable only with a different method. Another advantage of such languages is the clear separation between the model structure, which contains only parameters (placeholders for data) but no data, and the model instance, in which the parameters are replaced by a specific data set. This leads to a natural separation between model formulation and the gathering of data stored in databases. Hence, the main features of these algebraic modeling languages are:

• purely declarative representation of the problem;

• clear separation between formulation and solution;

• clear separation between model structure and model data.
It is, however, naive to think that one only needs to formulate a problem in a concise declarative form and to link it somehow to a solver in order to solve it. First of all, the ‘linking process’ is not as straightforward as it initially seems. Second, there may be no solver that can solve the problem at hand efficiently. One only needs to look at Fermat's last theorem, which can be stated declaratively as $\{a, b, c, n \in \mathbb{N}^*: a^n + b^n = c^n,\ a, b, c \ge 1,\ n > 2\}$, to convince oneself of this fact. Even worse, one can state a problem declaratively for which no solver can exist. This is already true for the rather limited declarative language of first-order logic, for which there is no algorithm that decides, in general, whether a formula is true or false (see [5]).

In this sense, efforts are currently under way in the design of such languages that focus on flexibly linking the declarative formulation to a specific solver, in order to make this paradigm of purely declarative formulation more powerful. This language-solver interface problem has different aspects, and research goes in many directions. A main effort is to integrate symbolic model transformation rules into the declarative language in order to generate formulations that are more useful for a solver. AMPL, for example, automatically detects partially separable structure and computes second derivatives [8]. This information is also handed over to a nonlinear solver. LPL, to cite a very different undertaking, has integrated a set of rules to translate logical constraints symbolically into 0-1 constraints [11]. Doing this in an intelligent way is far from easy, because the resulting 0-1 formulation should be as sharp as possible. This translation is useful for large mathematical models that must be extended by a few logical conditions. For many applications the original model remains straightforward, while the transformed one is complicated but still relatively easy to solve (examples were given in [11]). Even if the resulting formulation cannot be solved efficiently, the modeler can gain more insight into the structure of the model from such a symbolic translation procedure, and eventually modify the original formulation.
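The kind of rule involved can be sketched with a few textbook linearizations of logical statements over 0-1 variables. This is not LPL's actual rule set, only an illustration under that assumption, and all names below are hypothetical. Each statement is mapped to linear inequalities, and a brute-force check over all 0-1 points confirms that the inequalities accept exactly the assignments that make the statement true. Whether a formulation is also sharp, i.e., has a tight LP relaxation, is a separate question that this check does not address.

```python
from itertools import product

# Hypothetical rule table (not LPL's): each logical statement over 0-1 variables
# x, y, z is mapped to linear inequalities (coef, rhs) meaning sum(coef[v] * v) <= rhs.
RULES = {
    "x -> y": [({"x": 1, "y": -1}, 0)],
    "x or y": [({"x": -1, "y": -1}, -1)],
    "(x and y) -> z": [({"x": 1, "y": 1, "z": -1}, 1)],
    "z <-> (x and y)": [({"z": 1, "x": -1}, 0),
                        ({"z": 1, "y": -1}, 0),
                        ({"x": 1, "y": 1, "z": -1}, 1)],
}

# The intended truth value of each statement, used to verify the rules.
LOGIC = {
    "x -> y": lambda x, y, z: (not x) or bool(y),
    "x or y": lambda x, y, z: bool(x or y),
    "(x and y) -> z": lambda x, y, z: (not (x and y)) or bool(z),
    "z <-> (x and y)": lambda x, y, z: bool(z) == bool(x and y),
}


def satisfies(ineqs, point):
    """True if the 0-1 point satisfies every inequality sum(coef*v) <= rhs."""
    return all(sum(c * point[v] for v, c in coef.items()) <= rhs for coef, rhs in ineqs)


def check_rule(name):
    """Brute force over {0,1}^3: the inequalities must accept exactly the true assignments."""
    for x, y, z in product((0, 1), repeat=3):
        point = {"x": x, "y": y, "z": z}
        if satisfies(RULES[name], point) != LOGIC[name](x, y, z):
            return False
    return True


print({name: check_rule(name) for name in RULES})
```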


Second Generation Modeling Languages 


Another research activity, currently under way, goes in the direction of extending the algebraic modeling languages so that they can also express algorithmic knowledge.

This is necessary because, even if one could link a purely declarative language to any solver, it remains doubtful whether this can be done efficiently in all cases. Furthermore, for many problems a declarative formulation is not useful: the algorithmic way is more straightforward and easier to understand. For still other problems, a mixture of declarative and algorithmic knowledge leads to a superior formulation, in terms of both understandability and efficiency (examples given below confirm these findings).

Therefore, AIMMS integrates control structures and procedure definitions. GAMS, AMPL and LPL also allow the modeler to write algorithms powerful enough to solve models repeatedly.

A theoretical effort was undertaken in [10] to specify a modeling language that allows the modeler (or the programmer) to combine algorithmic and declarative knowledge within the same language framework without intermingling them. The overall syntax structure of a model (or a program) in this framework is as follows:


MODEL ModelName 

(declarative part of the model) 
BEGIN 

(algorithmic part of the model) 
END ModelName. 


Declarative and algorithmic knowledge are clearly sep- 
arated. Either part can be empty, meaning that the 
problem is represented in a purely declarative or in 
a purely algorithmic form. The declarative part consists 
of the basic building blocks of declarative knowledge: 
variables, parameters, constraints, model checking fa- 
cilities, and sets (that is a way to ‘multiply’ basic build- 
ing blocks). This part may also contain ‘ordinary decla- 
rations’ of an algorithmic language (e. g., type and func- 
tion declarations). Furthermore, one can declare whole 
models within this part, leading to nested model struc- 
tures, which is very useful in decomposing a complex 
problem into smaller parts. The algorithmic part, on 
the other hand, consists of all control structures which 
make the language Turing complete. One may imagine his or her favorite programming language being implemented in this part. A language that combines declarative and algorithmic knowledge in this way is called a modeling language.


Definition 3 A modeling language is a notational sys- 
tem which allows one to combine (not to merge) declar- 
ative and algorithmic knowledge in the same language 
framework. The content captured by such a notation is 
called a model. 


Such a language framework is very flexible. Purely 
declarative models are linked to external solvers to be 
solved; purely algorithmic models are programs, that is 
algorithms + data structures, in the ordinary sense. 


Modeling Language 
and Constraint Logic Programming 


Merging declarative and algorithmic knowledge is not 
new, although it is not very common in language de- 
sign. The only existing language paradigm doing it is 
constraint logic programming (CLP), a refinement of 
logic programming [13]. There are, however, impor- 
tant differences between the CLP paradigm and the 
paradigm of modeling language as defined above. 

1) In CLP the algorithmic part - normally a search 
mechanism - is behind the scene and the compu- 
tation is intrinsically coupled with the declarative 
language itself. This could be a strength because the 
programmer does not have to be aware of how the 
computation is taking place, he or she only writes 
the rules in a descriptive, that is declarative, way 
and triggers the computation by a request. In reality, 
however, it is an important drawback, because, for most nontrivial problems, the programmer ‘must’ be aware of how the computation is taking place.
Therefore, to guide the computation in CLP, the 
declarative program is interspersed with additional 
rules which have nothing to do with the description 
of the original problem. In a modeling language, the 
user either links the declarative part to an external 
solver or writes the solver within the language. In 
either case, both parts are strictly separated. Why is 
this separation so important? Because it allows the 
modeler to ‘plug in’ different solvers without touch- 
ing the overall model formulation. 

2) The second difference is that the modeling language paradigm leads automatically to modular design. This is probably the hottest topic in software engineering: building components. Software engineering teaches us that a complex structure can only be managed efficiently by breaking it down into many relatively independent components. The CLP approach is more likely to lead to programs that are difficult to survey, debug and maintain, because such considerations are entirely absent within the CLP paradigm.

3) On the other hand, the CLP community has developed methods for specific classes of combinatorial problems that seem to be superior to other methods, because they rely on propagation, simplification of constraints, and various consistency techniques. In this sense, CLP solvers could be used and linked with modeling languages. Such a project is currently under way between the AMPL language and the ILOG solver [6,17].

Hence, while the representation of models is probably best done in the language framework of modeling languages, the solution process can take place in a CLP solver for certain problems.


Modeling Examples 


Five modeling examples are chosen from very different 
problem domains to illustrate the highlights of the pre- 
sented paradigm of modeling language. The first two 
examples show that certain problems are best formu- 
lated using algorithmic knowledge, the next two exam- 
ples show the power of a declarative formulation, and 
a last example indicates that mixing both paradigms is 
sometimes more advantageous. 


Sorting 


Sorting is a problem that is preferably expressed in an algorithmic way. Declaratively, the problem could be formulated as follows: find a permutation $\pi$ such that $A_{\pi(i)} \le A_{\pi(i+1)}$ for all $i \in \{1, \dots, n-1\}$, where $A_1, \dots, A_n$ is an array of objects on which an order is defined. It is difficult to imagine a ‘solver’ that could solve this problem as efficiently as the best known sorting algorithms, such as Quicksort, whose implementation is straightforward.

The reason why the sorting problem is best formulated as an algorithm is probably that the state space (all $n!$ permutations) grows at least exponentially in the number of items, whereas the best algorithms have a complexity of only $O(n \log n)$.
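The contrast can be made tangible with a deliberately naive sketch in Python (the function names are hypothetical). It sets a ‘solver’ that simply searches the permutations for one satisfying the declarative condition next to the built-in `sorted`, which in CPython uses Timsort (not Quicksort) with $O(n \log n)$ comparisons. The first is usable only for a handful of items, since it may inspect all $n!$ permutations.

```python
from itertools import permutations


def sort_declaratively(a):
    """Generate-and-test: return the first arrangement of a whose items satisfy
    the declarative condition a[p(i)] <= a[p(i+1)] for all i (worst case n! tries)."""
    n = len(a)
    for p in permutations(range(n)):
        if all(a[p[i]] <= a[p[i + 1]] for i in range(n - 1)):
            return [a[k] for k in p]
    raise AssertionError("unreachable: some permutation is always sorted")


def sort_algorithmically(a):
    """Ordinary O(n log n) library sort."""
    return sorted(a)


data = [5, 3, 8, 1, 9, 2]
print(sort_declaratively(data) == sort_algorithmically(data))  # True
```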
The n-Queens Problem


The n-queens problem is to place n queens on an n × n chessboard in such a way that no two queens attack each other. This problem can be formulated declaratively as follows: $\{x_i, x_j \in \{1, \dots, n\}: x_i \ne x_j,\ x_i + i \ne x_j + j,\ x_i - i \ne x_j - j \text{ for all } i < j\}$, where $x_i$ is the column position of the $i$th queen (i.e., the queen in row $i$).

Using the LPL [12] formulation: 


MODEL nQueens;
PARAMETER n; SET i ALIAS j ::= {1,...,n};
DISTINCT VARIABLE x{i}[1,...,n];
CONSTRAINT S{i, j : i < j}:
x[i]+i <> x[j]+j AND x[i]-i <> x[j]-j;
END


the author was able to solve problems for n < 8 us- 
ing a general MIP solver. The problem is automatically 
translated into a 0-1 problem by LPL. Replacing the 
MIP-solver by a tabu search heuristic, problems with 
n < 50 were solvable within the LPL framework. Using 
the constraint language OZ [19] problems of n < 200 
are efficiently solvable using techniques of propagation 
and variable domain reductions. However, the success 
of all these methods seems limited compared with the best that can be attained. In [20,21], R. Sosič and J. Gu presented a polynomial-time local search heuristic that can solve problems of n < 3 000 000 in less than one minute. The presented algorithm is very simple. For the n-queens problem, the conclusion seems to be that an algorithmic formulation is advantageous.
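For comparison, here is a small local-search sketch in Python in the spirit of such conflict-minimization heuristics. It is not Sosič and Gu's algorithm, and the names `is_solution`, `collisions` and `solve_queens` are hypothetical. Queens are kept as a permutation, so the row and column constraints hold by construction. Pairs of queens are swapped while this reduces the number of diagonal conflicts, and the search restarts from a new random permutation when it gets stuck. `is_solution` checks the declarative constraints directly.

```python
import random
from collections import Counter


def is_solution(cols):
    """Check the declarative constraints: cols[i] is the column of the queen in row i."""
    n = len(cols)
    if sorted(cols) != list(range(n)):  # columns pairwise distinct and in range
        return False
    return all(cols[i] + i != cols[j] + j and cols[i] - i != cols[j] - j
               for i in range(n) for j in range(i + 1, n))


def collisions(cols):
    """Surplus queens summed over all diagonals (0 iff no diagonal conflicts)."""
    n = len(cols)
    diag1 = Counter(cols[i] + i for i in range(n))
    diag2 = Counter(cols[i] - i for i in range(n))
    return sum(c - 1 for c in diag1.values()) + sum(c - 1 for c in diag2.values())


def solve_queens(n, seed=0, max_restarts=1000):
    """Swap-based local search with random restarts; returns a solution or None."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        cols = list(range(n))
        rng.shuffle(cols)
        current = collisions(cols)
        improved = True
        while improved and current > 0:
            improved = False
            for i in range(n):
                for j in range(i + 1, n):
                    cols[i], cols[j] = cols[j], cols[i]
                    new = collisions(cols)
                    if new < current:
                        current, improved = new, True  # keep the improving swap
                    else:
                        cols[i], cols[j] = cols[j], cols[i]  # undo it
        if current == 0:
            return cols
    return None


print(solve_queens(8))
```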


A Two-Person Game 


Two players each choose a positive number at random and note it on a piece of paper. They then compare them. If both numbers are equal, neither player gets a payoff. If the difference between the two numbers is one, the player who has chosen the higher number obtains the sum of both; otherwise, the player who has chosen the smaller number obtains the sum of both. What is the optimal strategy for a player, i.e., which numbers should be chosen with what frequencies to obtain the maximal payoff? This problem was presented in [9] and is a typical two-person zero-sum game. In LPL, it can be formulated as follows:


MODEL Game ‘finite two-person zero-sum game’;
SET i ALIAS j := /1 : 50/;
PARAMETER p{i, j} := IF(j > i, IF(j = i+1,
-i-j, i+j), IF(j < i, -p[j, i], 0));
VARIABLE x{i};
CONSTRAINT R : SUM{i} x[i] = 1;
MAXIMIZE gain: MIN{j}(SUM{i} p[j, i] * x[i]);
END Game.


This is a very compact way to formulate the problem declaratively, and it is difficult to imagine how this could be achieved using algorithmic knowledge alone. It is also an efficient way to state the problem, because large instances can be solved by a linear programming solver: LPL automatically transforms the model into a linear program. (Incidentally, the problem has an interesting solution: each player should only choose numbers smaller than six.)
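A few lines of Python make the payoff rules and the objective of the model explicit. The names `payoff_matrix` and `guaranteed_gain` are hypothetical, and the index orientation is this sketch's own assumption, which may differ from that of the LPL model. Here p[i][j] is the payoff to the player who picks i when the opponent picks j, so a mixed strategy x guarantees $\min_j \sum_i p_{ij} x_i$, and maximizing that quantity is the job of the LP solver. The LP solve itself is left to a real LP solver and is not shown.

```python
def payoff_matrix(n=50):
    """p[i][j]: payoff to the player choosing i when the opponent chooses j, 1 <= i, j <= n.
    Row and column 0 are unused padding so that indices match the numbers."""
    p = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) == 1:
                # difference of one: the higher number wins the sum
                p[i][j] = (i + j) if i > j else -(i + j)
            elif abs(i - j) > 1:
                # larger difference: the smaller number wins the sum
                p[i][j] = (i + j) if i < j else -(i + j)
    return p


def guaranteed_gain(p, x):
    """Worst-case expected payoff of the mixed strategy x (x[i] = probability of choosing i)."""
    n = len(p) - 1
    return min(sum(p[i][j] * x[i] for i in range(1, n + 1)) for j in range(1, n + 1))


p = payoff_matrix(50)
pure_one = [0] * 51
pure_one[1] = 1
print(guaranteed_gain(p, pure_one))  # -3: always choosing 1 loses 3 against a 2
```

Because the game is symmetric (p is antisymmetric), no strategy can guarantee a positive gain; the best one can achieve is a guaranteed gain of zero.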


Equal Circles in a Square 


The problem is to find the maximum diameter of n 
equal mutually disjoint circles packed inside a unit 
square. 

In LPL, this problem can be compactly formulated 
as follows: 


MODEL circles ‘pack equal circles in a square’;
PARAMETER n ‘number of circles’;
SET i ALIAS j = 1,...,n;
VARIABLE
t ‘diameter of the circles’;
x{i}[0, 1] ‘x-position of the center’;
y{i}[0, 1] ‘y-position of the center’;
CONSTRAINT
R{i, j : i < j} ‘circles must be disjoint’:
(x[i] - x[j])^2 + (y[i] - y[j])^2 >= t^2;
MAXIMIZE obj ‘maximize diameter’: t;
END


C.D. Maranas et al. [15] obtained the best known solutions for all n < 30 and, for n = 15, an even better one, using an equivalent formulation in GAMS linked to MINOS [16], a well-known nonlinear solver.


The (Fractional) Cutting-Stock Problem 


Paper is manufactured in rolls of width B. Each customer $w$ of a set $W$ orders $d_w$ rolls of width $b_w$. Rolls can be cut in many ways: every subset $P' \subseteq W$ such that $\sum_{j \in P'} y_j b_j \le B$ is a possible cut pattern, where $y_j$ is a positive integer. The question is how the initial rolls of width B should be cut, that is, which patterns should be used, in order to minimize the overall paper waste. A straightforward formulation of this problem is to enumerate all patterns, each giving a variable, and then to minimize the total number of rolls cut while fulfilling the demands. The resulting model is a linear program that, for realistic instances, is far too large to be solved directly.

A well-known method in operations research for solving such problems is column generation (see [3] for details): a small instance with only a few patterns is solved, and a rewarding column (a pattern) is added to the problem. The new problem is then solved again. This process is repeated until no rewarding pattern can be found. To find a rewarding pattern, another problem, a knapsack problem, must be solved.

The problem can be formulated partially by algorithmic and partially by declarative knowledge. It consists of two declaratively formulated problems (a linear program and a knapsack problem), which are both solved repeatedly. In pseudocode, the algorithmic knowledge could be formulated as follows:


SOLVE the small cutting-stock problem 

SOLVE the knapsack problem 

WHILE a rewarding pattern was found DO 
add pattern to the cutting-stock problem 
SOLVE the cutting-stock problem again 
SOLVE the knapsack problem again 

ENDWHILE 


The two models (the cutting-stock problem and the 
knapsack problem) can be formulated declaratively. In 
the proposed framework of modeling language, the 
complete problem can now be expressed as in the pro- 
gram below. 


MODEL CuttingStock; 
MODEL Knapsack(i, w, p, K, x, obj); 
SET i;
PARAMETER w{i}; p{i}; K;
INTEGER VARIABLE x{i};
CONSTRAINT R: SUM{i} w * x <= K;
MAXIMIZE obj: SUM{i} p * x; 
END Knapsack. 
SET 
w ‘rolls ordered’; p ‘possible patterns’; 
PARAMETER 
a{w, p} ‘pattern table’; 
d{w} ‘demands’; 
b{w} ‘widths of ordered rolls’; 
B ‘initial width’; 
INTEGER y{w} ‘new added pattern’; 
C ‘contribution of a cut’; 
VARIABLE 
X{p} ‘number of rolls cut according to p’; 
CONSTRAINT 
Dem{w}: SUM{p} a * X >= d;
MINIMIZE obj: SUM{p} X; 
BEGIN 
SOLVE; 
SOLVE Knapsack(w, b, Dem.dual, B, y, C); 
WHILE (C > 1) DO 
p := p + {pattern + str(#p)};
a[w, #p] := y[w];
SOLVE; 
SOLVE Knapsack(w, b, Dem.dual, B, y, C); 
END; 
END CuttingStock. 


This formulation has several remarkable properties: 

1) It is short and readable. The declarative part consists of the (small) linear cutting-stock problem; it also contains, as a submodel, a knapsack problem. The algorithmic part implements the column generation method. Both parts are entirely separated.

2) It is a complete formulation, except for the data. No other code is needed; both models can be solved using a standard MIP solver (since the knapsack problem is generally small).

3) It has a modular structure. The knapsack problem is an independent component with its own name space; there is no interference with the surrounding model. It could even be declared outside the cutting-stock problem.

4) The cutting-stock problem is only one problem of a large class of relevant problems which are solved using column generation or, alternatively, row-cut generation.
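The pricing step of the column-generation loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LPL implementation. The widths and the roll width B are taken to be positive integers, and the dual values of the demand constraints (Dem.dual in the model) are assumed to be given. The hypothetical function `price_pattern` solves the integer knapsack $\max \sum_w \pi_w y_w$ subject to $\sum_w b_w y_w \le B$ by dynamic programming. As in the WHILE condition of the model, a pattern is rewarding when its value exceeds 1.

```python
def price_pattern(widths, duals, B):
    """Unbounded integer knapsack by dynamic programming (positive integer widths).

    best[c] is the largest dual value of a pattern whose total width is at most c;
    last[c] records the width class added last to reach it (None = empty pattern)."""
    best = [0.0] * (B + 1)
    last = [None] * (B + 1)
    for c in range(1, B + 1):
        for k, (w, v) in enumerate(zip(widths, duals)):
            if w <= c and best[c - w] + v > best[c]:
                best[c] = best[c - w] + v
                last[c] = k
    # Walk back through the recorded choices to recover the pattern y.
    y = [0] * len(widths)
    c = B
    while last[c] is not None:
        k = last[c]
        y[k] += 1
        c -= widths[k]
    return y, best[B]


y, value = price_pattern([3, 5, 7], [0.3, 0.5, 0.8], 10)
print(y, value > 1)  # [1, 0, 1] True
```

Here the pattern "one roll of width 3 plus one of width 7" has dual value 1.1 > 1, so it would be added to the cutting-stock problem.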


Conclusion 


It has been shown that certain problems are best formulated as algorithms, others in a declarative way, and still others need both paradigms to be stated concisely. Computer science has made many algorithmic languages available; they can be contrasted with the algebraic modeling languages, which are purely declarative. A language that combines both paradigms, called a modeling language, was defined in this paper, and examples were given showing clear advantages of doing so. It is more powerful than either paradigm on its own.

However, the integration of algorithmic and declar- 
ative knowledge cannot be done in an arbitrary way. 
The language design must follow certain criteria well- 
known in computer science. The main criteria are: reli- 
ability and transparency. Reliability can be achieved by 
a unique notation to code models, that is, by a modeling 
language, and by various checking mechanisms (type 
checking, unit checking, data integrity checking and 
others). Transparency can be obtained by flexible decomposition techniques, such as modular structure as well as access and protection mechanisms for these structures, which are well-known techniques in language design and software engineering.

Solving relevant optimization problems efficiently on present-day desktop machines requires not only fast machines and sophisticated solvers, but also formulation techniques that allow the modeler to communicate the model easily and to build it in a readable and maintainable way.


See also 


> Continuous Global Optimization: Models, 
Algorithms and Software 

> Large Scale Unconstrained Optimization 

> Optimization Software 
Introduction


This article presents a general overview of some of the 
most recent approaches for solving the molecular dis- 
tance geometry problem, namely, the ABBIE algorithm, 
the Global Continuation Algorithm, d.c. optimization 
algorithms, the geometric build-up algorithm, and the 
BP algorithm. 

The determination of the three-dimensional struc- 
ture of a molecule, especially in the protein folding 
framework, is one of the most important problems in 
computational biology. That structure is very important because it is associated with the chemical and biological properties of the molecule [7,11,46]. Basically, this
problem can be tackled in two ways: experimentally, via 
nuclear magnetic resonance (NMR) spectroscopy and 
X-ray crystallography [8], or theoretically, through po- 
tential energy minimization [19]. 

The Molecular Distance Geometry Problem (MDGP) arises in NMR analysis. This experimental technique provides a set of inter-atomic distances $d_{ij}$ for certain pairs of atoms $(i,j)$ of a given protein [23,24,33,56,57]. The MDGP can be formulated as follows:


Given a set S of atom pairs (i,j) on a set of m atoms and distances $d_{ij}$ defined over S, find positions $x_1, \dots, x_m \in \mathbb{R}^3$ of the atoms in the molecule such that

$$\|x_i - x_j\| = d_{ij}, \quad \forall (i,j) \in S. \qquad (1)$$


When the distances between all pairs of atoms of a molecule are given, a unique three-dimensional structure (up to translations, rotations and reflections) can be determined by a linear-time algorithm [16]. However, because of errors in the given distances, a solution may not exist or may not be unique. In addition, because of the large scale of the problems that arise in practice, the MDGP becomes very hard to solve in general. Saxe [51] showed that the MDGP is NP-complete even in one spatial dimension.

The exact MDGP can be naturally formulated as a nonlinear global minimization problem, where the objective function is given by

$$f(x_1, \dots, x_m) = \sum_{(i,j) \in S} \left(\|x_i - x_j\|^2 - d_{ij}^2\right)^2. \qquad (2)$$

This function is everywhere infinitely differentiable and has an exponential number of local minimizers. Assuming that all the distances are correctly given, $x \in \mathbb{R}^{3m}$ solves the problem if and only if $f(x) = 0$.
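Evaluating (2) is straightforward. The following Python sketch computes f for a given set of positions; the function name `mdgp_objective` and the representation of S and the $d_{ij}$ as a dictionary with 0-based atom indices are assumptions for illustration.

```python
def mdgp_objective(x, dist):
    """f(x) = sum over (i, j) in S of (||x_i - x_j||^2 - d_ij^2)^2, as in (2).

    x    -- list of 3-D positions (tuples of floats), indexed from 0
    dist -- dict mapping each pair (i, j) in S to the given distance d_ij
    """
    total = 0.0
    for (i, j), d in dist.items():
        sq = sum((a - b) ** 2 for a, b in zip(x[i], x[j]))  # squared distance
        total += (sq - d * d) ** 2
    return total


# A 3-4-5 right triangle: exact distances give f = 0.
x = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(mdgp_objective(x, {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}))  # 0.0
```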

Formulations (1) and (2) correspond to the exact MDGP. Since experimental errors may prevent solution existence (e.g., when the triangle inequality

$$d_{ij} \le d_{ik} + d_{kj}$$

is violated for atoms i, j, k), we sometimes consider an ε-optimum solution of (1), i.e., a solution $x_1, \dots, x_m$ satisfying

$$\big|\, \|x_i - x_j\| - d_{ij} \,\big| \le \varepsilon, \quad \forall (i,j) \in S. \qquad (3)$$

Moré and Wu [41] showed that even obtaining such an ε-optimum solution is NP-hard for ε small enough.

In practice, it is often possible to obtain only lower and upper bounds on the distances [4]. Hence a more practical definition of the MDGP is to find positions $x_1, \dots, x_m \in \mathbb{R}^3$ such that

$$l_{ij} \le \|x_i - x_j\| \le u_{ij}, \quad \forall (i,j) \in S, \qquad (4)$$

where $l_{ij}$ and $u_{ij}$ are lower and upper bounds on the distance constraints, respectively.
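Conditions (3) and (4) are easy to test for a candidate structure. Here is a minimal Python sketch with hypothetical function names, again assuming S is stored as a dictionary keyed by 0-based atom pairs. It uses `math.dist`, available from Python 3.8.

```python
import math


def is_eps_optimal(x, dist, eps):
    """Condition (3): | ||x_i - x_j|| - d_ij | <= eps for every (i, j) in S."""
    return all(abs(math.dist(x[i], x[j]) - d) <= eps for (i, j), d in dist.items())


def satisfies_bounds(x, bounds):
    """Condition (4): l_ij <= ||x_i - x_j|| <= u_ij; bounds maps (i, j) -> (l_ij, u_ij)."""
    return all(lo <= math.dist(x[i], x[j]) <= hi for (i, j), (lo, hi) in bounds.items())


x = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(is_eps_optimal(x, {(0, 1): 3.05}, 0.1))                         # True
print(satisfies_bounds(x, {(0, 1): (2.5, 3.5), (0, 2): (4.5, 5.0)}))  # False: 4 < 4.5
```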
The MDGP is a particular case of a more general problem, called the distance geometry problem [6,13,14,15], which is intimately related to the Euclidean distance matrix completion problem [1,28,38].

Several methods have been developed to solve the 
MDGP, including the EMBED algorithm by Crippen 
and Havel [12,25], the alternating projection algorithm 
by Glunt et al. [20], spectral gradient methods by
Glunt et al. [21,22], the multi-scaling algorithm by 
Trosset et al. [29,52], a stochastic/perturbation algo- 
rithm by Zou, Byrd, and Schnabel [58], variable neigh- 
borhood search-based algorithms by Liberti, Lavor, and 
Maculan [35,39], the ABBIE algorithm by Hendrick- 
son [26,27], the Global Continuation Algorithm by 
Moré and Wu [41,42,43,44,45], the d.c. optimization 
algorithms by An and Tao [2,3], the geometric build- 
up algorithm by Dong, Wu, and Wu [16,17,54], and 
the BP algorithm by Lavor, Liberti, and Maculan [37]. 
Two completely different approaches for solving the 
MDGP are given in [34] (based on quantum compu- 
tation) and [53] (based on algebraic geometry). 

The wireless sensor network positioning problem is closely related to the MDGP, the main difference being the presence of fixed anchor points with known positions: results derived for this problem can often be applied to the MDGP. Amongst the most notable, [18] shows that the MDGP associated with a trilateration graph (a graph with an order on the vertices such that each vertex is adjacent to the 4 preceding vertices) can be solved in polynomial time; [40] provides a detailed study of semidefinite programming (SDP) relaxations applied to distance geometry problems.


ABBIE Algorithm 


In [26,27], Hendrickson describes an approach to the exact MDGP that replaces the large optimization problem given by (2) with a sequence of smaller ones. He exploits some combinatorial structure inherent in the MDGP, which allows him to develop a divide-and-conquer algorithm based on a graph-theoretic viewpoint.

If the atoms and the distances are considered as nodes and edges of a graph, respectively, the MDGP can be described by a distance graph, and the solution to the problem is an embedding of the distance graph in a Euclidean space. When some of the atoms can be moved without violating any distance constraints, there may be many embeddings; the graph is then called flexible, and otherwise rigid.

A graph can have a unique embedding only if, for example, it is rigid and has no partial reflections. These necessary conditions can be used to find subgraphs that are candidates for having unique embeddings. The problem can then be solved by decomposing the graph into such subgraphs, in which the minimization problems associated with the function (2) are solved. The solutions found for the subgraphs can then be combined into a solution for the whole graph.

This approach to the MDGP has been implemented 
in a code named ABBIE and tested on simulated data 
provided by the bovine pancreatic ribonuclease A, 
a typical small protein consisting of 124 amino acids, 
whose three-dimensional structure is known [47]. The 
data set consists of all distances between pairs of atoms in the same amino acid, along with 1167 additional distances corresponding to pairs of hydrogen atoms that were within 3.5 Å of each other. Fragments of the protein consisting of the first 20, 40, 60, 80 and 100 amino acids were used, as well as the full protein, with two sets of distance constraints for each size, corresponding to the largest unique subgraphs and the reduced graphs. These problems have from 63 up to 777 atoms.


Global Continuation Algorithm 


In [43], Moré and Wu formulated the exact MDGP in terms of finding the global minimum of a function similar to (2),

$$f(x_1, \dots, x_m) = \sum_{(i,j) \in S} w_{ij} \left(\|x_i - x_j\|^2 - d_{ij}^2\right)^2, \qquad (5)$$

where $w_{ij}$ are positive weights ($w_{ij} = 1$ was used in the numerical results).

Following the ideas described in [55], Moré and Wu 
proposed an algorithm, called Global Continuation Al- 
gorithm, based on a continuation approach for global 
optimization. The idea is to gradually transform the function (5) into a smoother function with fewer local minimizers, apply an optimization algorithm to the transformed function, and trace its minimizers back to the original function. For other works based on continuation approaches, see [9,10,30,31,32,49].
The transformed function $\langle f \rangle_\lambda$, called the Gaussian transform of a function $f: \mathbb{R}^n \to \mathbb{R}$, is defined by

$$\langle f \rangle_\lambda(x) = \frac{1}{\pi^{n/2}\lambda^n} \int_{\mathbb{R}^n} f(y) \exp\left(-\frac{\|y - x\|^2}{\lambda^2}\right) dy, \qquad (6)$$

where the parameter λ controls the degree of smoothing. The value $\langle f \rangle_\lambda(x)$ is a weighted average of f in a neighborhood of x, where the size of the neighborhood decreases as λ decreases: as λ → 0, the average is carried out on the singleton set {x}, thus recovering the original function in the limit. Smoother functions are obtained as λ increases.
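The smoothing effect of (6) can be checked numerically in one dimension. The sketch below is in Python; `gaussian_transform_1d` is a hypothetical name, and the midpoint rule on $[x - 8\lambda, x + 8\lambda]$ is our own choice of quadrature. It approximates $\langle f \rangle_\lambda(x)$. For $f(y) = \cos(ky)$ the transform has the closed form $\cos(kx)\exp(-k^2\lambda^2/4)$, so an oscillating term is damped more strongly as λ grows. This is how the transform removes local minimizers.

```python
import math


def gaussian_transform_1d(f, x, lam, num=4000, width=8.0):
    """Approximate <f>_lam(x) = 1/(sqrt(pi)*lam) * integral f(y) exp(-(y-x)^2/lam^2) dy
    with the midpoint rule on [x - width*lam, x + width*lam]; the Gaussian weight is
    negligible outside this interval."""
    a = x - width * lam
    h = 2.0 * width * lam / num
    total = 0.0
    for k in range(num):
        y = a + (k + 0.5) * h
        total += f(y) * math.exp(-((y - x) / lam) ** 2)
    return total * h / (math.sqrt(math.pi) * lam)


x0, k = 0.3, 5.0
for lam in (0.1, 0.5, 1.0):
    approx = gaussian_transform_1d(lambda y: math.cos(k * y), x0, lam)
    exact = math.cos(k * x0) * math.exp(-(k * lam) ** 2 / 4)
    print(lam, round(approx, 6), round(exact, 6))
```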

This approach to the MDGP has been implemented and tested on two artificial models of problems, in which the molecule has $m = s^3$ atoms located in the three-dimensional lattice

$$\{(i_1, i_2, i_3): 0 \le i_1 \le s-1,\ 0 \le i_2 \le s-1,\ 0 \le i_3 \le s-1\}$$

for an integer $s \ge 1$. In the numerical results, m = 27, 64, 125 and 216 were considered.

In the first model, the ordering of the atoms is specified by letting i be the atom at the position $(i_1, i_2, i_3)$,

$$i = 1 + i_1 + s\,i_2 + s^2 i_3,$$

and the set S of atom pairs whose distances are known is given by

$$S = \{(i,j): |i - j| \le r\}, \qquad (7)$$

where $r = s^2$. In the second model, the set S is specified by

$$S = \{(i,j): \|x_i - x_j\| \le \sqrt{r}\}, \qquad (8)$$

where $x_i = (i_1, i_2, i_3)$ and $r = s^2$. For both models, s is considered in the interval $3 \le s \le 6$.

In (7), S includes all nearby atoms, while in (8), S includes some nearby atoms and some relatively distant atoms.
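The two test families are easy to generate. Here is a Python sketch; the function names are hypothetical, atoms are numbered $1, \dots, m$ by the ordering given above, and pairs are stored with $i < j$.

```python
import itertools


def lattice(s):
    """Map atom index i = 1 + i1 + s*i2 + s^2*i3 to its lattice point (i1, i2, i3)."""
    return {1 + i1 + s * i2 + s * s * i3: (i1, i2, i3)
            for i1, i2, i3 in itertools.product(range(s), repeat=3)}


def pairs_model1(s):
    """S from (7): index difference |i - j| <= r with r = s^2."""
    m, r = s ** 3, s ** 2
    return {(i, j) for i in range(1, m + 1) for j in range(i + 1, min(i + r, m) + 1)}


def pairs_model2(s):
    """S from (8): Euclidean distance ||x_i - x_j|| <= sqrt(r) with r = s^2
    (compared in squared form to avoid rounding)."""
    coords, r = lattice(s), s ** 2
    return {(i, j) for i, j in itertools.combinations(sorted(coords), 2)
            if sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) <= r}


print(len(lattice(3)), len(pairs_model1(3)))  # 27 198
```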

It was shown that the Global Continuation Algo- 
rithm usually finds a solution from any given starting 
point, whereas the local minimization algorithm used 
in the multistart methods is unreliable as a method for 
determining global solutions. It was also shown that the continuation approach determines a global solution with less computational effort than is required by the multistart approach.


D.C. Optimization Algorithms 


In [2,3], An and Tao proposed an approach for solving the exact MDGP based on d.c. (difference of convex functions) optimization algorithms. They worked in $M_{m,3}(\mathbb{R})$, the space of real matrices of order m × 3, where for $X \in M_{m,3}(\mathbb{R})$, $X_i$ (resp., $X^i$) is its ith row (resp., ith column). By identifying a set of positions of atoms $x_1, \dots, x_m$ with the matrix X, $X_i^T = x_i$ for $i = 1, \dots, m$, they expressed the MDGP by

$$0 = \min\left\{ \sigma(X) := \frac{1}{2} \sum_{(i,j) \in S,\ i<j} w_{ij}\, \theta_{ij}(X) : X \in M_{m,3}(\mathbb{R}) \right\}, \qquad (9)$$

where $w_{ij} > 0$ for $i \ne j$ and $w_{ii} = 0$ for all i. The pairwise potential $\theta_{ij}: M_{m,3}(\mathbb{R}) \to \mathbb{R}$ is defined for problem (1) by either

$$\theta_{ij}(X) = \left(d_{ij}^2 - \|X_i - X_j\|^2\right)^2 \qquad (10)$$

or

$$\theta_{ij}(X) = \left(d_{ij} - \|X_i - X_j\|\right)^2, \qquad (11)$$

and for problem (4) by

$$\theta_{ij}(X) = \left[\min\left\{\frac{\|X_i - X_j\|^2 - l_{ij}^2}{l_{ij}^2},\ 0\right\}\right]^2 + \left[\max\left\{\frac{\|X_i - X_j\|^2 - u_{ij}^2}{u_{ij}^2},\ 0\right\}\right]^2. \qquad (12)$$

Similarly to (2), X is a solution if and only if it is a global minimizer of problem (9) and σ(X) = 0.

While problem (9) with $\theta_{ij}$ given by (11) or (12) is a nondifferentiable optimization problem, it is a d.c. optimization problem.

An and Tao demonstrated that d.c. algorithms can be adapted to develop efficient algorithms for solving large-scale exact MDGPs. They proposed various versions of d.c. algorithms based on different formulations of the problem. Due to their local character, global optimality cannot be guaranteed for a general d.c. problem. However, the fact that global optimality can be obtained with a suitable starting point motivated them to investigate a technique for computing good starting points for the d.c. algorithms in the solution of (9), with $\theta_{ij}$ defined by (11).
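For the bound-constrained case, the potential (12) and the objective σ of (9) can be sketched in Python as follows. The names are hypothetical, atom indices are 0-based, weights default to 1, and the rows of X are given as 3-tuples. The objective is zero exactly when all bounds in (4) hold, which gives a quick sanity check. The d.c. decomposition and the DCA iterations themselves are not shown.

```python
def theta_bounds(xi, xj, lo, hi):
    """Potential (12): min{(t - lo^2)/lo^2, 0}^2 + max{(t - hi^2)/hi^2, 0}^2,
    where t = ||xi - xj||^2; zero exactly when lo <= ||xi - xj|| <= hi (lo > 0)."""
    t = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return min((t - lo ** 2) / lo ** 2, 0.0) ** 2 + max((t - hi ** 2) / hi ** 2, 0.0) ** 2


def sigma(X, bounds, w=None):
    """sigma(X) = 1/2 * sum of w_ij * theta_ij(X) over the pairs (i, j) in bounds, as in (9)."""
    total = 0.0
    for (i, j), (lo, hi) in bounds.items():
        weight = 1.0 if w is None else w[(i, j)]
        total += weight * theta_bounds(X[i], X[j], lo, hi)
    return 0.5 * total


X = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(sigma(X, {(0, 1): (2.5, 3.5), (1, 2): (4.9, 5.1)}))  # 0.0: all bounds hold
```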

The algorithms have been tested on three sets of 
data: the artificial data from Moré and Wu [43] (with 
up to 4096 atoms), 16 proteins in the PDB [5] (from 
146 up to 4189 atoms), and the data from Hendrick- 
son [27] (from 63 up to 777 atoms). Using these data, 
they showed that the d.c. algorithms can efficiently 
solve large-scale exact MDGPs. 


Geometric Build-up Algorithm 


In [17], Dong and Wu proposed solving the exact MDGP with an algorithm, called the geometric build-up algorithm, based on a geometric relationship between the coordinates and the distances associated with the atoms of a molecule. It is assumed that the coordinates of at least four atoms can be determined. These atoms are marked as fixed, and the remaining ones are non-fixed. The coordinates of a non-fixed atom $a$ can be calculated from the coordinates of four non-coplanar fixed atoms whose distances to $a$ are all known. If four such atoms are found, atom $a$ changes its status to fixed. More specifically, let $b_1, b_2, b_3, b_4$ be the four fixed atoms whose Cartesian coordinates are already known. Now suppose that the Euclidean distances between atom $a$ and the atoms $b_1, b_2, b_3, b_4$, namely $d_{a,b_i}$ for $i = 1,2,3,4$, are known. That is,


$$\|a - b_1\| = d_{a,b_1},$$
$$\|a - b_2\| = d_{a,b_2},$$
$$\|a - b_3\| = d_{a,b_3},$$
$$\|a - b_4\| = d_{a,b_4}.$$


Squaring both sides of these equations, we have: 


$$\|a\|^2 - 2a^T b_1 + \|b_1\|^2 = d_{a,b_1}^2,$$
$$\|a\|^2 - 2a^T b_2 + \|b_2\|^2 = d_{a,b_2}^2,$$
$$\|a\|^2 - 2a^T b_3 + \|b_3\|^2 = d_{a,b_3}^2,$$
$$\|a\|^2 - 2a^T b_4 + \|b_4\|^2 = d_{a,b_4}^2.$$


Subtracting one of these equations from the others yields a linear system that can be used to determine the coordinates of atom $a$. For example, subtracting the first equation from the others, we obtain


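$$A x = b, \qquad (13)$$

where

$$A = -2\begin{bmatrix} (b_1 - b_2)^T \\ (b_1 - b_3)^T \\ (b_1 - b_4)^T \end{bmatrix}, \qquad x = a,$$

and

$$b = \begin{bmatrix} (d_{a,b_1}^2 - d_{a,b_2}^2) - (\|b_1\|^2 - \|b_2\|^2) \\ (d_{a,b_1}^2 - d_{a,b_3}^2) - (\|b_1\|^2 - \|b_3\|^2) \\ (d_{a,b_1}^2 - d_{a,b_4}^2) - (\|b_1\|^2 - \|b_4\|^2) \end{bmatrix}.$$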


Since $b_1, b_2, b_3, b_4$ are non-coplanar atoms, the system (13) has a unique solution. If the exact distances between all pairs of atoms are given, this approach can determine the coordinates of all atoms of the molecule in linear time [16].
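The linear system (13) can be solved directly. The sketch below assumes NumPy, passes the fixed atoms $b_1,\ldots,b_4$ as the rows of a $4\times 3$ array, and uses a hypothetical function name `place_atom`. It builds $A$ and $b$ exactly as in (13).

```python
import numpy as np


def place_atom(B, d):
    """Solve A x = b of (13) for the coordinates of atom a.

    B: (4, 3) array whose rows are the fixed, non-coplanar atoms b_1, ..., b_4.
    d: length-4 array of the distances d_{a,b_1}, ..., d_{a,b_4}.
    """
    B = np.asarray(B, dtype=float)
    d = np.asarray(d, dtype=float)
    A = -2.0 * (B[0] - B[1:])                  # rows -2 (b_1 - b_i)^T, i = 2, 3, 4
    sq = np.sum(B**2, axis=1)                  # ||b_i||^2
    rhs = (d[0]**2 - d[1:]**2) - (sq[0] - sq[1:])
    return np.linalg.solve(A, rhs)             # unique since b_1..b_4 are non-coplanar
```

With exact distances, the recovered position matches $a$ up to rounding error. With perturbed distances, the error can carry over into every atom that is later placed using $a$ as a fixed atom.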

Dong and Wu implemented such an algorithm, but found that it is very sensitive to the numerical errors introduced in calculating the coordinates of the atoms. In [54], Wu and Wu proposed an updated geometric build-up algorithm and showed that, in this algorithm, the accumulation of errors in calculating the coordinates of the atoms can be controlled and prevented. They tested the algorithm on a set of problems generated from the known structures of 10 proteins downloaded from the PDB [5], with problems ranging from 404 up to 4201 atoms.


BP Algorithm 


In [37], Lavor, Liberti, and Maculan propose an algorithm, called branch-and-prune (BP), based on a discrete formulation of the exact MDGP. They observe that the particular structure of proteins makes it possible to formulate the MDGP applied to protein backbones as a discrete search problem. They formalize this by introducing the discretizable molecular distance geometry problem (DMDGP), which consists of a certain subset of MDGP instances (to which most protein backbones belong) for which a discrete formulation can be supplied. This approach requires that the bond lengths and bond angles are known, as well as the distances between atoms separated by three consecutive bonds.

In order to describe the backbone of a protein with $m$ atoms, the bond lengths $d_{i-1,i}$, for $i = 2,\ldots,m$, and the bond angles $\theta_{i-2,i}$, for $i = 3,\ldots,m$, are not enough. It is also necessary to consider the torsion angles $\omega_{i-3,i}$, for $i = 4,\ldots,m$, which are the angles between the normals to the planes defined by the atoms $i-3, i-2, i-1$ and $i-2, i-1, i$.

It is known [48] that, given all the bond lengths $d_{1,2},\ldots,d_{m-1,m}$, bond angles $\theta_{1,3},\ldots,\theta_{m-2,m}$, and torsion angles $\omega_{1,4},\ldots,\omega_{m-3,m}$ of a molecule with $m$ atoms, the Cartesian coordinates $(x_{i1}, x_{i2}, x_{i3})$ of each atom $i$ in the molecule can be obtained using the following formulae:


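$$\begin{bmatrix} x_{i1} \\ x_{i2} \\ x_{i3} \\ 1 \end{bmatrix} = B_1 B_2 \cdots B_i \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \qquad \forall i = 1,\ldots,m,$$

where

$$B_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad B_2 = \begin{bmatrix} -1 & 0 & 0 & -d_{1,2} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

$$B_3 = \begin{bmatrix} -\cos\theta_{1,3} & -\sin\theta_{1,3} & 0 & -d_{2,3}\cos\theta_{1,3} \\ \sin\theta_{1,3} & -\cos\theta_{1,3} & 0 & d_{2,3}\sin\theta_{1,3} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

and

$$B_i = \begin{bmatrix} -\cos\theta_{i-2,i} & -\sin\theta_{i-2,i} & 0 & -d_{i-1,i}\cos\theta_{i-2,i} \\ \sin\theta_{i-2,i}\cos\omega_{i-3,i} & -\cos\theta_{i-2,i}\cos\omega_{i-3,i} & -\sin\omega_{i-3,i} & d_{i-1,i}\sin\theta_{i-2,i}\cos\omega_{i-3,i} \\ \sin\theta_{i-2,i}\sin\omega_{i-3,i} & -\cos\theta_{i-2,i}\sin\omega_{i-3,i} & \cos\omega_{i-3,i} & d_{i-1,i}\sin\theta_{i-2,i}\sin\omega_{i-3,i} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

for $i = 4,\ldots,m$.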


Since all the bond lengths and bond angles are assumed to be given in the instance, the Cartesian coordinates of all atoms of a molecule can be completely determined by using the values of $\cos\omega_{i-3,i}$ and $\sin\omega_{i-3,i}$, for $i = 4,\ldots,m$.
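The product formula can be checked numerically. In the NumPy sketch below (function names hypothetical), atoms are indexed from 0, so `b_matrix(i, ...)` builds the matrix $B_{i+1}$ of the text. Here `d[i]` is the bond length between atoms $i-1$ and $i$, `theta[i]` is the bond angle at atom $i-1$, and `omega[i]` is the torsion angle of atoms $i-3,\ldots,i$. Unused leading entries are ignored.

```python
import numpy as np


def b_matrix(i, d, theta, omega):
    """Matrix B_{i+1} of the text for the 0-based atom i."""
    if i == 0:
        return np.eye(4)                                   # B_1
    if i == 1:
        return np.array([[-1.0, 0.0, 0.0, -d[1]],          # B_2
                         [0.0, 1.0, 0.0, 0.0],
                         [0.0, 0.0, -1.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])
    ct, st = np.cos(theta[i]), np.sin(theta[i])
    # B_3 is the general B_i with cos(omega) = 1 and sin(omega) = 0
    cw, sw = (1.0, 0.0) if i == 2 else (np.cos(omega[i]), np.sin(omega[i]))
    return np.array([[-ct, -st, 0.0, -d[i] * ct],
                     [st * cw, -ct * cw, -sw, d[i] * st * cw],
                     [st * sw, -ct * sw, cw, d[i] * st * sw],
                     [0.0, 0.0, 0.0, 1.0]])


def backbone_coordinates(d, theta, omega):
    """Cartesian coordinates (m x 3) from bond lengths, bond angles and torsion angles."""
    m = len(d)
    X = np.zeros((m, 3))
    P = np.eye(4)
    for i in range(m):
        P = P @ b_matrix(i, d, theta, omega)   # P = B_1 B_2 ... B_{i+1}
        X[i] = P[:3, 3]                        # P applied to (0, 0, 0, 1)^T is its last column
    return X
```

Comparing the consecutive distances, bond angles and torsion cosines of the output with the inputs gives a quick consistency check of the matrices.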


For instances of the DMDGP class, for all $i = 4,\ldots,m$, the value of $\cos\omega_{i-3,i}$ can be computed by the formula

$$\cos\omega_{i-3,i} = \frac{a}{b}, \qquad (14)$$

where

$$a = d_{i-3,i-2}^2 + d_{i-2,i}^2 - 2\,d_{i-3,i-2}\,d_{i-2,i}\cos\theta_{i-2,i}\cos\theta_{i-1,i+1} - d_{i-3,i}^2$$

and

$$b = 2\,d_{i-3,i-2}\,d_{i-2,i}\sin\theta_{i-2,i}\sin\theta_{i-1,i+1},$$


which is just a rearrangement of the cosine law for torsion angles [50] (p. 278), and all the values in expression (14) are given in the instance. Since $\sin\omega_{i-3,i} = \pm\sqrt{1 - \cos^2\omega_{i-3,i}}$, this makes it possible to express the position of the $i$-th atom in terms of the preceding three up to two choices, giving $2^{m-3}$ possible conformations, which characterizes the discretization of the problem.

The idea of the BP algorithm is that at each step the $i$th atom can be placed in two possible positions. However, either or both of these positions may be infeasible with respect to some constraints. The search branches on all atomic positions that are feasible with respect to all constraints. By contrast, if a position is not feasible, the search is pruned at that point.
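Below is a compact sketch of the branch-and-prune idea. It reuses the matrices $B_i$ (again 0-based) and takes $\cos\omega_{i-3,i}$ as input, for instance from (14). For each atom it tries both signs of $\sin\omega_{i-3,i} = \pm\sqrt{1 - \cos^2\omega_{i-3,i}}$, and it prunes a branch as soon as a known distance is violated by more than a tolerance. The names, the distance dictionary and the tolerance are illustrative assumptions, not the implementation of [37].

```python
import numpy as np


def b_matrix(i, d, theta, cw, sw):
    """B_{i+1} of the text for 0-based atom i, given cos and sin of its torsion angle."""
    if i == 0:
        return np.eye(4)
    if i == 1:
        return np.array([[-1.0, 0, 0, -d[1]], [0, 1.0, 0, 0], [0, 0, -1.0, 0], [0, 0, 0, 1.0]])
    ct, st = np.cos(theta[i]), np.sin(theta[i])
    return np.array([[-ct, -st, 0.0, -d[i] * ct],
                     [st * cw, -ct * cw, -sw, d[i] * st * cw],
                     [st * sw, -ct * sw, cw, d[i] * st * sw],
                     [0.0, 0.0, 0.0, 1.0]])


def branch_and_prune(d, theta, cos_omega, prune_dist, tol=1e-6):
    """Enumerate embeddings of a DMDGP instance.

    cos_omega[i]: cos of the torsion angle for 0-based atom i >= 3.
    prune_dist:   dict mapping pairs (j, i), j < i, to known distances.
    """
    m = len(d)
    by_atom = {i: [(j, dij) for (j, k), dij in prune_dist.items() if k == i] for i in range(m)}
    solutions = []
    X = np.zeros((m, 3))

    def search(i, P):
        if i == m:
            solutions.append(X.copy())
            return
        cw = float(np.clip(cos_omega[i], -1.0, 1.0))
        sw = np.sqrt(1.0 - cw * cw)
        for s in ((sw, -sw) if sw > 0 else (0.0,)):   # the two candidate positions of atom i
            Pi = P @ b_matrix(i, d, theta, cw, s)
            X[i] = Pi[:3, 3]
            if all(abs(np.linalg.norm(X[i] - X[j]) - dij) <= tol for j, dij in by_atom[i]):
                search(i + 1, Pi)                     # feasible position: branch
            # otherwise this subtree is pruned

    P = np.eye(4)
    for i in range(min(3, m)):                        # the first three atoms are fixed
        P = P @ b_matrix(i, d, theta, 1.0, 0.0)
        X[i] = P[:3, 3]
    search(3, P)
    return solutions
```

Because the first three atoms lie in a plane, flipping all signs of $\sin\omega$ reflects an embedding through that plane. With exact data, each solution is therefore normally returned together with its mirror image.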

The algorithm has been tested on the artificial data 
from Moré and Wu [43] (with up to 216 atoms) and on 
the artificial data from Lavor [36] (a selection from 10 
up to 100 atoms). 


Conclusion 


This article surveys some of the methods for solving the Molecular Distance Geometry Problem, with particular reference to five existing algorithms: the ABBIE algorithm, the global continuation algorithm, the d.c. optimization algorithms, the geometric build-up algorithm, and the BP algorithm.
An important class of difficult global minimization problems arises as an essential feature of molecular structure calculations. The determination of a stable molecular structure can often be formulated in terms of calculating the global (or approximate global) minimum of a potential energy function (see [6]). Computing the global minimum of this function is very difficult because it typically has a very large number of local minima, which may grow exponentially with molecule size.

One such application is the well-known protein folding problem. It is widely accepted that the folded state of a protein is completely dependent on the one-dimensional linear sequence (i.e., ‘primary’ sequence) of amino acids from which the protein is constructed: external factors, such as enzymes, present at the time of folding have no effect on the final, or native, state of the protein. This led to the formulation of the protein folding problem: given a known primary sequence of amino acids, what would be its native, or folded, state in three-dimensional space?

Several successful predictions of folded protein 
structures have been made and announced before the 
experimental structures were known (see [3,9]). While 
most of these have been made with a blend of a hu- 
man expert’s abilities and computer assistance, fully au- 
tomated methods have shown promise for producing 
previously unattainable accuracy [2]. 

These machine-based prediction strategies attempt
to lessen the reliance on experts by developing a com- 
pletely computational method. Such approaches are 
generally based on two assumptions. First, that there 
exists a potential energy function for the protein; and 
second that the folded state corresponds to the struc- 
ture with the lowest potential energy (minimum of the 
potential energy function) and is thus in a state of ther- 
modynamic equilibrium. This view is supported by in 
vitro observations that proteins can successfully refold 
from a variety of denatured states. Evolutionary the- 
ory also supports a folded state at a global energy min- 
imum. Protein sequences have evolved under pressure 
to perform certain functions, which for most known oc- 
currences requires a stable, unique, and compact struc- 
ture. Unless specifically required for a certain function, 
there was no biochemical need for proteins to hide their 
global minimum behind a large kinetic energy barrier. 
While kinetic blocks may occur, they should be limited to special proteins developed for certain functions (see [1]).


Molecular Model 


Unfortunately, finding the ‘true’ energy function of a molecular structure, if one even exists, is virtually impossible. For example, with proteins ranging in size up to 1,053 amino acids (a collagen found in tendons), exhaustive conformational searches will never be tractable. Practical search strategies for the protein folding problem currently require a simplified, yet sufficiently realistic, molecular model with an associated potential energy function representing the dominant forces involved in protein folding [4]. In one such simplified model, each residue in the primary sequence of a protein is characterized by its backbone components $\mathrm{NH}-\mathrm{C}_\alpha\mathrm{H}-\mathrm{C}'\mathrm{O}$ and one of 20 possible amino acid sidechains attached to the central $\mathrm{C}_\alpha$ atom. The three-dimensional structure of the chain is determined by internal molecular coordinates consisting of bond lengths $l$, bond angles $\theta$, sidechain torsion angles $\chi$, and the backbone dihedral angles $\phi$, $\psi$, and $\omega$. Fortunately, these $10r - 6$ parameters (for an $r$-residue structure) do not all vary independently. Some of them ($7r - 4$ of them) are regarded as fixed, since they are found to vary only within a very small neighborhood of an experimentally determined value. Among these are the $3r - 1$ backbone bond lengths $l$, the $3r - 2$ backbone bond angles $\theta$, and the $r - 1$ peptide bond dihedral angles $\omega$ (fixed in the trans conformation). This leaves only the $r$ sidechain torsion angles $\chi$ and the $r - 1$ backbone dihedral angle pairs $(\phi, \psi)$. In the reduced representation model presented here, the sidechain angles $\chi$ are also fixed, since sidechains are treated as united atoms (see below) with their respective torsion angles $\chi$ fixed at an ‘average’ value taken from the Brookhaven Protein Databank. Remaining are the $r - 1$ backbone dihedral angle pairs. These also are not completely independent; they are severely constrained by known chemical data (the Ramachandran plot) for each of the 20 amino acid residues. Furthermore, since the atoms from one $\mathrm{C}_\alpha$ to the next $\mathrm{C}_\alpha$ along the backbone can be grouped into rigid planar peptide units, no extra parameters are required to express the three-dimensional position of the attached O and H peptide atoms. Hence, these bond lengths and bond angles are also known and fixed.
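The parameter counts quoted above can be checked with a short tally:

$$(3r-1) + (3r-2) + (r-1) + r + 2(r-1) = 10r - 6,$$
$$(3r-1) + (3r-2) + (r-1) = 7r - 4, \qquad (10r - 6) - (7r - 4) = 3r - 2 = r + 2(r-1).$$

The five terms of the first line count, in order, the backbone bond lengths $l$, the backbone bond angles $\theta$, the peptide dihedral angles $\omega$, the sidechain torsion angles $\chi$, and the backbone dihedral angles $(\phi, \psi)$. For example, $r = 3$ gives 24 parameters, 17 of them fixed, leaving the 3 angles $\chi$ and 2 pairs $(\phi, \psi)$.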

Another key element of this simplified polypeptide model is that each sidechain is classified as either hydrophobic or polar and is represented by only a single ‘virtual’ center-of-mass atom. Since each sidechain is represented by only the single center-of-mass ‘virtual atom’ $C_s$, no extra parameters are needed to define the position of each sidechain with respect to the backbone main chain. The twenty amino acids are thus classified into two groups, hydrophobic and polar, according to the scale given by S. Miyazawa and R.L. Jernigan [7].

Corresponding to this simplified polypeptide model is a simple energy function. This function includes four components. The first is a contact energy term favoring pairwise hydrophobic residues. The second is a contact term favoring hydrogen bond formation between donor NH and acceptor $\mathrm{C}'{=}\mathrm{O}$ pairs. The third is a steric repulsive term that rejects any conformation that would permit unreasonably small interatomic distances. The fourth is a main chain torsional term that allows only certain preset values for the backbone dihedral angle pairs $(\phi, \psi)$. The residues in this model come in only two forms, hydrophobic and polar, and the hydrophobic monomers exhibit a strong pairwise attraction. The lowest free energy state therefore involves those conformations with the greatest number of hydrophobic ‘contacts’ [4] and intrastrand hydrogen bonds. Simplified potential functions have been used successfully in [10,11], and [12]. Here we use a simple modification of the energy function from [11].


The Convex Global Underestimator 


One practical means for finding the global minimum 
of the polypeptide’s potential energy function is to use 
a convex global underestimator to localize the search in 
the region of the global minimum. The idea is to fit all 
known local minima with a convex function which un- 
derestimates all of them, but which differs from them by 
the minimum possible amount in the discrete L1 norm. 
The minimum of this underestimator is used to predict 
the global minimum for the function, allowing a more 
localized conformer search to be performed based on 
the predicted minimum. 

More precisely, given an $r$-residue structure with $n = 2r - 2$ backbone dihedral angles, denote a conformation of this simplified model by $\phi \in \mathbb{R}^n$ and the corresponding simplified potential energy function value by $F(\phi)$. Assume that $k \ge 2n + 1$ local minimum conformations $\phi^{(j)}$, for $j = 1,\ldots,k$, have been computed. A convex quadratic underestimating function $U(\phi)$ is then fitted to these local minima so that it underestimates all of them and normally interpolates $F(\phi^{(j)})$ at $2n + 1$ points. This is accomplished by determining the coefficients in the function $U(\phi)$ so that

$$\delta_j = F(\phi^{(j)}) - U(\phi^{(j)}) \ge 0, \qquad j = 1,\ldots,k, \qquad (1)$$

and $\sum_{j=1}^{k}\delta_j$ is minimized. That is, the difference between $F(\phi)$ and $U(\phi)$ is minimized in the discrete $L_1$ norm over the set of $k$ local minima $\phi^{(j)}$, $j = 1,\ldots,k$. Of course, this ‘underestimator’ only underestimates known local minima. The specific underestimating function $U(\phi)$ used in this convex global underestimator (CGU) method is given by


$$U(\phi) = c_0 + \sum_{i=1}^{n}\left(c_i\phi_i + \tfrac{1}{2}d_i\phi_i^2\right). \qquad (2)$$


Note that $c_i$ and $d_i$ appear linearly in the constraints of (1) for each local minimum $\phi^{(j)}$. Convexity of this quadratic function is guaranteed by requiring that $d_i \ge 0$ for $i = 1,\ldots,n$. Other linear combinations of convex functions could also be used, but this quadratic function is the simplest.

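Additionally, in order to guarantee that $U(\phi)$ attains its global minimum $U_{\min}$ in the hyperrectangle $H_\phi = \{\phi : 0 \le \underline{\phi}_i \le \phi_i \le \overline{\phi}_i \le 2\pi\}$, an additional set of constraints is imposed on the coefficients of $U(\phi)$:

$$c_i + \underline{\phi}_i d_i \le 0, \qquad c_i + \overline{\phi}_i d_i \ge 0, \qquad i = 1,\ldots,n. \qquad (3)$$

Note that the satisfaction of (3) implies that $c_i \le 0$ and $d_i \ge 0$ for $i = 1,\ldots,n$. When $d_i > 0$, it also implies that $-c_i/d_i \in [\underline{\phi}_i, \overline{\phi}_i]$.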

The unknown coefficients $c_i$, $i = 0,\ldots,n$, and $d_i$, $i = 1,\ldots,n$, can be determined by a linear program which may be considered to be in the dual form. For reasons of efficiency, the equivalent primal of this problem is actually solved, as described below. The solution to this primal linear program provides an optimal dual vector, which immediately gives the underestimating function coefficients $c_i$ and $d_i$. Since the convex quadratic function $U(\phi)$ gives a global approximation to the local minima of $F(\phi)$, its easily computed global minimum function value $U_{\min}$ is a good candidate for an approximation to the global minimum of the correct energy function $F(\phi)$.

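An efficient linear programming formulation and solution satisfying (1)-(3) will now be summarized. Let $f_j = F(\phi^{(j)})$, for $j = 1,\ldots,k$, and let $f \in \mathbb{R}^k$ be the vector with elements $f_j$. Also let $\omega^{(j)} \in \mathbb{R}^n$ be the vector with elements $\tfrac{1}{2}(\phi_i^{(j)})^2$, $i = 1,\ldots,n$, and let $e_k \in \mathbb{R}^k$ be the vector of ones. Now define the following two matrices $\Phi \in \mathbb{R}^{(n+1)\times k}$ and $\Omega \in \mathbb{R}^{n\times k}$:

$$\Phi = \begin{bmatrix} 1 & \cdots & 1 \\ \phi^{(1)} & \cdots & \phi^{(k)} \end{bmatrix}, \qquad \Omega = \begin{bmatrix} \omega^{(1)} & \cdots & \omega^{(k)} \end{bmatrix}. \qquad (4)$$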

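Finally, let $c \in \mathbb{R}^{n+1}$, $d \in \mathbb{R}^n$, and $\delta \in \mathbb{R}^k$ be the vectors with elements $c_i$, $d_i$, and $\delta_j$, respectively. Then (1)-(3) can be restated as the linear program (with free variables $c$, $d$, and $\delta$):

$$\min\; e_k^T\delta \quad \text{such that} \quad \begin{bmatrix} \Phi^T & \Omega^T & I_k \\ -\Phi^T & -\Omega^T & 0 \\ I'_n & \overline{D} & 0 \\ -I'_n & -\underline{D} & 0 \end{bmatrix} \begin{bmatrix} c \\ d \\ \delta \end{bmatrix} \ge \begin{bmatrix} f \\ -f \\ 0 \\ 0 \end{bmatrix}, \qquad (5)$$

where $\overline{D} = \operatorname{diag}(\overline{\phi}_1,\ldots,\overline{\phi}_n)$, $\underline{D} = \operatorname{diag}(\underline{\phi}_1,\ldots,\underline{\phi}_n)$, $I_k$ is the identity matrix of order $k$, and $I'_n$ is the $n \times (n+1)$ ‘augmented’ matrix $(0 : I_n)$, where $I_n$ is the identity matrix of order $n$.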
Since the matrix in (5) has more rows than columns ($2(k + n)$ rows and $k + 2n + 1$ columns, where $k \ge 2n + 1$), it is computationally more efficient to consider it as a dual problem and to solve the equivalent primal. After some simple transformations, this primal problem reduces to:


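$$\min\; f^T y_1 - f^T e_k \quad \text{s.t.} \quad \begin{bmatrix} \Phi & -I_n'^T & I_n'^T \\ \Omega & -\overline{D} & \underline{D} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} \Phi e_k \\ \Omega e_k \end{bmatrix}, \qquad y_1, y_2, y_3 \ge 0, \qquad (6)$$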


which has only $2n + 1$ rows and $k + 2n \ge 4n + 1$ columns, and the obvious initial feasible solution $y_1 = e_k$ and $y_2 = y_3 = 0$. Furthermore, the first row of $\Phi$ consists of ones and the first row of $I_n'^T$ is zero, so the first of the $2n + 1$ constraints in (6) in fact requires that $e_k^T y_1 = k$. Hence the objective function $f^T y_1 - f^T e_k$ is also bounded below, and so this primal linear program always has an optimal solution. This optimal solution gives the values of $c$, $d$, and $\delta$ via the dual vectors, and also determines which values of $f$ are interpolated by the underestimating function $U(\phi)$. That is, the basic columns in the optimal solution to (6) correspond to the conformations $\phi^{(j)}$ for which $F(\phi^{(j)}) = U(\phi^{(j)})$.


Note that once an optimal solution to (6) has been obtained, the addition of new local minima is very easy. It is done by simply adding new columns to $\Phi$ and $\Omega$, and therefore to the constraint matrix in (6). The number of primal rows remains fixed at $2n + 1$, independent of the number $k$ of local minima.

The convex quadratic underestimating function $U(\phi)$ determined by the values $c \in \mathbb{R}^{n+1}$ and $d \in \mathbb{R}^n$ now provides a global approximation to the local minima of $F(\phi)$. Its easily computed global minimum point $\phi_{\min}$ is given by $(\phi_{\min})_i = -c_i/d_i$, $i = 1,\ldots,n$, with corresponding function value $U_{\min} = c_0 - \sum_{i=1}^{n} c_i^2/(2d_i)$. The value $U_{\min}$ is a good candidate for an approximation to the global minimum of the correct energy function $F(\phi)$. Therefore $\phi_{\min}$ can be used as an initial starting point around which additional configurations (i.e., local minima) should be generated. These local minima are added to the constraint matrix in (6) and the process is repeated. Before each iteration of this process, it is necessary to reduce the volume of the hyperrectangle $H_\phi$ over which the new configurations are produced, so that a tighter fit of $U(\phi)$ to the local minima ‘near’ $\phi_{\min}$ is constructed.
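The following Python sketch fits the underestimator with SciPy's `linprog` (HiGHS backend). For simplicity it solves (1)-(3) directly in the variables $c$, $d$, $\delta$, using the equality $U(\phi^{(j)}) + \delta_j = f_j$ with $\delta_j \ge 0$, rather than the more efficient primal (6). The function names are hypothetical, and `cgu_minimum` assumes every $d_i > 0$.

```python
import numpy as np
from scipy.optimize import linprog


def fit_cgu(phis, f, lower, upper):
    """Fit U(phi) = c_0 + sum_i (c_i phi_i + d_i phi_i^2 / 2) below the points (phi^(j), f_j).

    phis: (k, n) local-minimum conformations; f: (k,) energies F(phi^(j));
    lower, upper: (n,) bounds of the hyperrectangle H_phi.  Returns (c, d, delta).
    """
    phis = np.asarray(phis, dtype=float)
    f = np.asarray(f, dtype=float)
    k, n = phis.shape
    Phi_T = np.hstack([np.ones((k, 1)), phis])        # row j: (1, phi^(j)^T), i.e. Phi^T
    Omega_T = 0.5 * phis**2                           # row j: omega^(j)^T, i.e. Omega^T
    # variables x = (c_0..c_n, d_1..d_n, delta_1..delta_k); objective e_k^T delta
    obj = np.concatenate([np.zeros(2 * n + 1), np.ones(k)])
    A_eq = np.hstack([Phi_T, Omega_T, np.eye(k)])     # U(phi^(j)) + delta_j = f_j, cf. (1)
    Ip = np.hstack([np.zeros((n, 1)), np.eye(n)])     # I'_n = (0 : I_n)
    A_ub = np.vstack([                                # constraints (3) written as "<= 0"
        np.hstack([Ip, np.diag(lower), np.zeros((n, k))]),    # c_i + lower_i d_i <= 0
        np.hstack([-Ip, -np.diag(upper), np.zeros((n, k))]),  # -(c_i + upper_i d_i) <= 0
    ])
    bounds = [(None, None)] * (2 * n + 1) + [(0, None)] * k   # delta_j >= 0
    res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=f,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    x = res.x
    return x[:n + 1], x[n + 1:2 * n + 1], x[2 * n + 1:]


def cgu_minimum(c, d):
    """(phi_min)_i = -c_i/d_i and U_min = c_0 - sum_i c_i^2/(2 d_i); assumes d > 0."""
    phi_min = -c[1:] / d
    u_min = c[0] - np.sum(c[1:] ** 2 / (2.0 * d))
    return phi_min, u_min
```

In this sketch, adding local minima means appending rows to `phis` and `f` and refitting. A production code would presumably warm-start the primal (6) instead, as the text describes.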

The rate and method by which the hyperrectangle 
size is decreased, and the number of additional local 
minima computed at each iteration must be determined 
by computational testing. But clearly the method de- 
pends most heavily on computing local minima quickly 
and on solving the resulting linear program efficiently 
to determine the approximating function $U(\phi)$ over the
current hyperrectangle. 

If $E_c$ is a cutoff energy, then one means for decreasing the size of the hyperrectangle $H_\phi$ at any step is to let $H_\phi = \{\phi : U(\phi) \le E_c\}$. To get the bounds of $H_\phi$, consider $U(\phi) \le E_c$, where $U(\phi)$ is given by (2). Then limiting $\phi_k$ requires that

$$\sum_{i=1}^{n}\left(c_i\phi_i + \tfrac{1}{2}d_i\phi_i^2\right) \le E_c - c_0. \qquad (7)$$


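As before, the minimum value of $U(\phi)$ is attained when $\phi_i = -c_i/d_i$, $i = 1,\ldots,n$, where each term $c_i\phi_i + \tfrac{1}{2}d_i\phi_i^2$ takes the value $-c_i^2/(2d_i)$. Assigning this minimizing value to each $\phi_i$ except $\phi_k$ then results in

$$c_k\phi_k + \tfrac{1}{2}d_k\phi_k^2 \le E_c - c_0 + \frac{1}{2}\sum_{i \ne k}\frac{c_i^2}{d_i} \equiv \beta_k. \qquad (8)$$

The lower and upper bounds on $\phi_k$, $k = 1,\ldots,n$, are given by the roots of the quadratic equation

$$c_k\phi_k + \tfrac{1}{2}d_k\phi_k^2 = \beta_k. \qquad (9)$$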


Hence, these bounds can be used to define the new hyperrectangle $H_\phi$ in which to generate new configurations.
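The bounds from (8) and (9) have a closed form. Equation (9) is the quadratic $\tfrac{1}{2}d_k\phi_k^2 + c_k\phi_k - \beta_k = 0$, whose roots are $(-c_k \pm \sqrt{c_k^2 + 2d_k\beta_k})/d_k$. The sketch below assumes $d_k > 0$ and $E_c \ge U_{\min}$; the name `cgu_box` is hypothetical.

```python
import numpy as np


def cgu_box(c, d, E_c):
    """Axis bounds of {phi : U(phi) <= E_c} from the roots of (9).

    c: (n+1,) with c[0] = c_0; d: (n,), all d_i > 0 assumed; E_c >= U_min assumed.
    """
    c = np.asarray(c, dtype=float)
    d = np.asarray(d, dtype=float)
    ci = c[1:]
    s = ci**2 / (2.0 * d)                  # minus the minimum of c_i phi_i + d_i phi_i^2 / 2
    beta = E_c - c[0] + (s.sum() - s)      # beta_k of (8): the sum runs over i != k
    disc = ci**2 + 2.0 * d * beta          # discriminant of d_k/2 phi^2 + c_k phi - beta_k
    if np.any(disc < -1e-12):
        raise ValueError("E_c < U_min: the sublevel set is empty")
    root = np.sqrt(np.maximum(disc, 0.0))
    return (-ci - root) / d, (-ci + root) / d
```

Setting $E_c = U_{\min}$ collapses the box to $\phi_{\min}$. The bounds may also extend outside $[0, 2\pi]$. In practice one would presumably intersect them with the current hyperrectangle, which is an assumption of this sketch rather than a step stated in the text.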

Clearly, if $E_c$ is reduced, the size of $H_\phi$ is also reduced. At every iteration, the predicted global minimum value $U_{\min}$ satisfies $U_{\min} \le F(\phi^*)$, where $\phi^*$ is the known local minimum conformation with the lowest energy. Therefore, $E_c = F(\phi^*)$ is often a good choice. If at least one improved point $\phi$, with $F(\phi) < F(\phi^*)$, is obtained in each iteration, then the search domain $H_\phi$ will strictly decrease at each iteration, and may decrease substantially in some iterations.


The CGU Algorithm 


Based on the preceding description, a general method for computing the global, or near-global, energy minimum of the potential energy function $F(\phi)$ can now be described.

1) Compute $k \ge 2n + 1$ distinct local minima $\phi^{(j)}$, for $j = 1,\ldots,k$, of the function $F(\phi)$.

2) Compute the convex quadratic underestimator function given in (2) by solving the linear program given in (6). The optimal solution to this linear program gives the values of $c$ and $d$ via the dual vectors.

3) Compute the predicted global minimum point $\phi_{\min}$ given by $(\phi_{\min})_i = -c_i/d_i$, $i = 1,\ldots,n$, with corresponding function value $U_{\min} = c_0 - \sum_{i=1}^{n} c_i^2/(2d_i)$.

4) If $\phi_{\min} = \phi^*$, where $\phi^* = \arg\min\{F(\phi^{(j)}) : j = 1, 2, \ldots\}$ is the best local minimum found so far, then stop and report $\phi^*$ as the approximate global minimum conformation.

5) Reduce the volume of the hyperrectangle $H_\phi$ over which the new configurations will be produced, and remove from $\Phi$ and $\Omega$ all columns that correspond to conformations excluded from $H_\phi$.

6) Use $\phi_{\min}$ as an initial starting point around which additional local minima $\phi^{(j)}$ of $F(\phi)$ (restricted to the hyperrectangle $H_\phi$) are generated. Add these new local minimum conformations as columns to the matrices $\Phi$ and $\Omega$.

7) Return to step 2.
The number of new local minima to be generated in step 6 is unspecified, since there is currently no theory to guide this choice. In general, a value exceeding $2n + 1$ would be required for the construction of another convex quadratic underestimator in the next iteration (step 2). In addition, the means by which the volume of the hyperrectangle $H_\phi$ is reduced in step 5 may vary. One could use the two roots of (9) to define the new bounds of $H_\phi$. Another method would be simply to use $H_\phi = \{\phi : (\phi_{\min})_i - \delta_i \le \phi_i \le (\phi_{\min})_i + \delta_i\}$, where $\delta_i = |(\phi_{\min})_i - (\phi^*)_i|$, $i = 1,\ldots,n$.

For complete details of the CGU method and its 
computational results, see [5,8]. 
Introduction


The role of convexity in optimization theory has increased significantly over the last few decades. Nevertheless, a wide variety of global optimization problems encountered in applications involve nonconvex models. For this reason, developing solution methods for specially structured nonconvex problems has become one of the most active research areas in recent years. Although these problems are difficult by nature, promising progress has been achieved for some special mathematical structures. Among the solution methods developed for these special structures, this article presents monotonic optimization, first proposed by Tuy [9].

Problems of optimizing monotonic functions of $n$ variables under monotonic constraints arise in the mathematical modeling of a broad range of real-world systems, including those in economics and engineering. The inherent difficulties of these problems can be reduced by a number of principles derived from their monotonicity properties. For example, in general nonconvex problems, a solution that is known to be feasible, or even locally optimal, provides no information about global optimality, and the search must continue over the entire feasible space. In contrast, for an increasing objective function to be minimized, a feasible solution $z$ excludes the cone $z + \mathbb{R}^n_+$ from the search, since no point in that cone has a smaller objective value. Similarly, if $g(x)$ in a constraint $g(x) \le 0$ is increasing, then once $z$ is known to be infeasible for this constraint, the whole cone $z + \mathbb{R}^n_+$ can be discarded from further consideration. This kind of information obviously restricts the search space and may result in more efficient solution methods.
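The two pruning rules can be illustrated on a small integer grid. The sketch below uses hypothetical names and test functions. It minimizes an increasing $f$ subject to an increasing constraint $g(x) \le 0$, plus, to make the problem nontrivial, an extra increasing constraint $h(x) \ge 1$ that is only checked, never used for pruning. Every point lying in a cone $z + \mathbb{R}^n_+$ already discarded by either rule is skipped without being evaluated.

```python
import itertools


def grid_minimize_with_pruning(f, g, h, n, N):
    """Minimize an increasing f over x in {0, ..., N-1}^n subject to g(x) <= 0 and h(x) >= 1,
    with g and h increasing.  Returns (best value, best point, number of points evaluated)."""
    cones = []                                  # apexes z of discarded cones z + R^n_+
    best, best_x, evaluated = float("inf"), None, 0
    for x in itertools.product(range(N), repeat=n):
        if any(all(xi >= zi for xi, zi in zip(x, z)) for z in cones):
            continue                            # inside a discarded cone: never evaluated
        evaluated += 1
        if g(x) > 0:                            # infeasible for increasing g, so is x + R^n_+
            cones.append(x)
        elif h(x) >= 1:                         # feasible: nothing in x + R^n_+ is strictly better
            cones.append(x)
            if f(x) < best:
                best, best_x = f(x), x
    return best, best_x, evaluated
```

The returned value equals the brute-force minimum. Points in a cone discarded by the second rule cannot be feasible, and points in a cone discarded by the first rule cannot be strictly better than the feasible apex.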

To formally present the general framework of the monotonic optimization problem, consider two vectors $x, x' \in \mathbb{R}^n$. We say $x' \ge x$ ($x'$ dominates $x$) if $x'_i \ge x_i$ for all $i = 1,\ldots,n$. We say $x' > x$ ($x'$ strictly dominates $x$) if $x'_i > x_i$ for all $i = 1,\ldots,n$. Let $\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x \ge 0\}$ and $\mathbb{R}^n_{++} = \{x \in \mathbb{R}^n \mid x > 0\}$. If $a, b \in \mathbb{R}^n$ and $a \le b$, we define the box $[a, b]$ as the set of all $x \in \mathbb{R}^n$ such that $a \le x \le b$. Similarly, let $[a, b) = \{x \mid a \le x < b\}$ and $(a, b] = \{x \mid a < x \le b\}$. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called increasing on a box $[a, b] \subset \mathbb{R}^n$ if $f(x) \le f(x')$ for $a \le x \le x' \le b$. A function $f$ is called decreasing if $-f$ is increasing. Any increasing or decreasing function is referred to as monotonic. It can easily be shown that the pointwise supremum of a bounded-above family of increasing functions and the pointwise infimum of a bounded-below family of increasing functions are increasing.
In monotonic optimization, the following problem is considered:

$$\begin{aligned} &\text{maximize (minimize)} && f(x) \\ &\text{subject to} && g_i(x) \le 1, \quad i = 1,\ldots,m_1, \\ &&& h_j(x) \ge 1, \quad j = 1,\ldots,m_2, \\ &&& x \in \mathbb{R}^n_+, \end{aligned} \qquad (1)$$

in which $f(x)$, $g_i(x)$, and $h_j(x)$ are increasing functions on $\mathbb{R}^n_+$. A more general definition of this problem is presented in Sect. “Normal Sets and Polyblocks”. Intuitively, $f(x)$ may be a cost function (a profit function for the maximization problem), the $g_i(x)$ may express resource availability constraints, and the $h_j(x)$ may be a family of utility functions that must each reach at least a given goal.

The remainder of this article is organized as follows. We first describe the theory of normal sets and polyblocks in Sect. “Normal Sets and Polyblocks”. Monotonic optimization algorithms are presented in Sect. “Solution Method”. Section “Generalizations” contains two generalizations of monotonic optimization. Different classes of applications to which monotonic optimization has been adapted are discussed in Sect. 5, and finally conclusions are drawn in Sect. “Conclusions”.


Normal Sets and Polyblocks 


The theory of normal sets and polyblocks is the underlying principle of monotonic optimization. This section presents the definitions, together with the main concepts and properties, to help the reader understand the upcoming algorithms. For more details and proofs see [5,9,10].


Normal Sets 


A set $G \subset \mathbb{R}^n_+$ is called normal if, for any two points $x, x' \in \mathbb{R}^n_+$ such that $x \le x'$, $x' \in G$ implies $x \in G$. Given any set $D \subset \mathbb{R}^n_+$, the set $N[D]$, which is called the normal hull of $D$, is the smallest normal set containing $D$. In other words, $N[D]$ can be interpreted as the intersection of all normal sets that contain $D$. The intersection and the union of a family of normal sets are normal. If a normal set contains a point $u \in \mathbb{R}^n_{++}$, we say it has a nonempty interior. Suppose that $g(x)$ is an increasing function over $\mathbb{R}^n_+$. Define the level set of $g(x)$ as the set $G = \{x \in \mathbb{R}^n_+ \mid g(x) \le 1\}$. It can be shown that the level set of an increasing function is a normal set, and it is closed if the function is lower semicontinuous.

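Define $I(x) = \{i \mid x_i = 0\}$, $K_x = \{x' \in \mathbb{R}^n_+ \mid x'_i > x_i\ \forall i \notin I(x)\}$, and $\tilde{K}_x = \{x' \in \mathbb{R}^n_+ \mid x' > x\}$. Then a point $y \in \mathbb{R}^n_+$ is called an upper boundary point of a bounded normal set $G$ if $y \in \operatorname{cl} G$ while $K_y \subset \mathbb{R}^n_+ \setminus G$. The set of upper boundary points of $G$ is called the upper boundary of $G$ and is denoted by $\partial^+ G$.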

For a compact normal set $G \subset [0, b]$ with nonempty interior and for every point $z \in \mathbb{R}^n_+ \setminus \{0\}$, the half line from $0$ through $z$ meets $\partial^+ G$ at a unique point denoted by $\pi_G(z)$, which is defined as $\pi_G(z) = \lambda z$, $\lambda = \max\{\alpha > 0 \mid \alpha z \in G\}$.

A set $H \subset \mathbb{R}^n_+$ is called a reverse normal set (also known as conormal) if $x' \ge x$ and $x \in H$ imply $x' \in H$. A reverse normal set in a box $[0, b]$ is defined as a set $H \subset \mathbb{R}^n$ for which $0 \le x \le x' \le b$ and $x \in H$ imply $x' \in H$. As before, $rN[D]$ is the smallest reverse normal set containing $D \subset \mathbb{R}^n_+$ and is called the reverse normal hull of $D$. Define $H = \{x \in \mathbb{R}^n_+ \mid h(x) \ge 1\}$ for the increasing function $h(x)$. Then it can be shown that $H$ is reverse normal, and it is closed if $h(x)$ is upper semicontinuous.

A point $y \in \mathbb{R}^n_+$ is called a lower boundary point of a reverse normal set $H$ if $y \in \operatorname{cl} H$ and $x \notin H$ for all $x < y$. The set of lower boundary points of $H$ is called the lower boundary of $H$ and is denoted by $\partial^- H$.

For a closed reverse normal set $H$ with $b \in \operatorname{int} H$ and every point $z \in [0, b] \setminus H$, the half line from $b$ through $z$ meets $\partial^- H$ at a unique point $\rho_H(z)$, which is defined as $\rho_H(z) = b + \mu(z - b)$, $\mu = \max\{\alpha > 0 \mid b + \alpha(z - b) \in H\}$.
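Both boundary points can be computed from a membership oracle alone. When $G$ is normal, $\{\alpha \ge 0 : \alpha z \in G\}$ is an interval $[0, \lambda]$. When $H$ is reverse normal, $\{\alpha \in [0, 1] : b + \alpha(z - b) \in H\}$ is an interval $[0, \mu]$. The bisection sketch below assumes such oracles are available, and the function names are hypothetical.

```python
import numpy as np


def pi_G(in_G, z, iters=60):
    """Upper boundary point pi_G(z) = lam * z, lam = max{a > 0 : a z in G}, for a compact
    normal set G with nonempty interior given by a membership oracle; requires z != 0."""
    z = np.asarray(z, dtype=float)
    lo, hi = 0.0, 1.0
    while in_G(hi * z):           # grow until the ray leaves G (G is bounded)
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):        # invariant: lo * z in G, hi * z not in G
        mid = 0.5 * (lo + hi)
        if in_G(mid * z):
            lo = mid
        else:
            hi = mid
    return lo * z


def rho_H(in_H, b, z, iters=60):
    """Lower boundary point rho_H(z) = b + mu (z - b), mu = max{a > 0 : b + a(z - b) in H},
    for a closed reverse normal set H with b in int H and z in [0, b] outside H."""
    b = np.asarray(b, dtype=float)
    z = np.asarray(z, dtype=float)
    lo, hi = 0.0, 1.0             # a = 0 gives b (in H); a = 1 gives z (not in H)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_H(b + mid * (z - b)):
            lo = mid
        else:
            hi = mid
    return b + lo * (z - b)
```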

Now consider the set of constraints imposed by the increasing functions $g_i(x)$ and $h_j(x)$ in problem (1). The feasible space characterized by these sets of constraints can properly be represented by normal sets and reverse normal sets. Define the sets $G, H \subset \mathbb{R}^n_+$ as $G = \{x \in \mathbb{R}^n_+ \mid g_i(x) \le 1\ \forall i = 1,\ldots,m_1\}$ and $H = \{x \in \mathbb{R}^n_+ \mid h_j(x) \ge 1\ \forall j = 1,\ldots,m_2\}$. Then, by the basic properties of normal and reverse normal sets described above, $G$ is the intersection of a finite number of normal sets and is therefore normal. Similarly, $H$ is the intersection of a finite number of reverse normal sets and is therefore reverse normal. Now we can redefine the fundamental problem of monotonic optimization, also called the canonical monotonic optimization problem, as optimizing a monotonic function on the intersection of a family of normal and reverse normal sets as follows:


$$\text{maximize (minimize)} \quad f(x) \quad \text{subject to} \quad x \in G \cap H, \qquad (2)$$

in which $G \subset [0, b] \subset \mathbb{R}^n_+$ is a compact normal set, $H$ is a closed reverse normal set, and $f(x)$ is an increasing function on $[0, b]$. Tuy [9] proved that if $G$ has a nonempty interior (if $b \in \operatorname{int} H$), then the maximum (minimum) of $f(x)$ over $G \cap H$, if it exists, is attained on $\partial^+ G \cap H$ ($G \cap \partial^- H$). On the basis of this essential result, it can be shown that for every compact set $D \subset \mathbb{R}^n_+$, $\max\{f(x) \mid x \in D\} = \max\{f(x) \mid x \in N[D]\}$. Analogously, for the minimization version of the objective function and for any set $E \subset \mathbb{R}^n_+$, we have $\min\{f(x) \mid x \in E\} = \min\{f(x) \mid x \in rN[E]\}$.

It is worth mentioning that the minimization prob- 
lem can be converted to the maximization case by mak- 
ing a simple set of transformations. So it can be either 
transformed to the maximization problem or treated 
separately. 


Polyblocks 


The role of polyblocks in monotonic optimization is the same as that of polytopes in convex optimization. Just as a polytope is the convex hull of finitely many points in $\mathbb{R}^n$, a polyblock is the normal hull of finitely many points in $\mathbb{R}^n_+$. A set $P \subset \mathbb{R}^n_+$ is a polyblock in $[a, b] \subset \mathbb{R}^n_+$ if it is the union of a finite number of boxes $[a, z]$, $z \in T \subset [a, b]$. The set $T$ is called the vertex set of the polyblock. We call the vertex $z \in T$ a proper vertex if $z \notin [0, z']$ for all $z' \in T \setminus \{z\}$. In other words, removing $z$ from $T$ changes the polyblock: the polyblock generated by $T \setminus \{z\}$ is not equal to $P$. A vertex which is not proper is called an improper vertex. A polyblock can be defined by the set of its proper vertices.

A polyblock is a closed normal set, and the intersection of a set of polyblocks is again a polyblock. Now suppose that $x \in [a, b]$ and consider the set $P = [a, b] \setminus (x, b]$. Then it is easy to verify that $P$ is a polyblock with vertices $z^i = b + (x_i - b_i)e^i$, $i = 1,\ldots,n$, in which $e^i$ is the $i$th unit vector. That is, $z^i$ is obtained from $b$ by replacing its $i$th component with $x_i$. Using this property, we can approximate an arbitrary compact normal set $\Omega \subset \mathbb{R}^n_+$ (with any desired accuracy) by a nested sequence of polyblock approximations. At each iteration, a point $x \notin \Omega$ is found. From it, a new polyblock is constructed that is a subset of the previous polyblock but still contains $\Omega$.
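The vertex formula for $[a, b] \setminus (x, b]$ is easy to check by sampling. In the sketch below (hypothetical names), a point lies in a polyblock with vertex set $Z$ exactly when it lies in some box $[a, z]$, $z \in Z$.

```python
import numpy as np


def box_minus_upper_box_vertices(x, b):
    """Vertices z^i = b + (x_i - b_i) e^i of the polyblock [a, b] minus (x, b]."""
    x = np.asarray(x, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    Z = np.tile(b, (n, 1))
    Z[np.arange(n), np.arange(n)] = x        # row i: b with its ith entry replaced by x_i
    return Z


def in_polyblock(y, a, Z):
    """y lies in the union of the boxes [a, z], z a row of Z."""
    y = np.asarray(y, dtype=float)
    return bool(np.all(y >= a) and np.any(np.all(y <= Z, axis=1)))
```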

To present the main idea of the polyblock approx- 
imation method in monotonic optimization, we need 
one more result on optimizing an increasing function 
over a polyblock. Tuy [9] proved that the increasing 
function f(x) achieves its maximum over a polyblock 
at a proper vertex. 

Now consider the problem of maximizing the increasing function $f(x)$ over an arbitrary compact set $\Omega \subset \mathbb{R}^n_+$. As mentioned before, we can replace $\Omega$ by its normal hull, so without loss of generality we assume that $\Omega$ is normal. The idea is to construct a nested sequence of polyblock outer approximations $P_1 \supset P_2 \supset \cdots \supset \Omega$ in such a way that $\max\{f(x) \mid x \in P_k\} \searrow \max\{f(x) \mid x \in \Omega\}$.

At iteration $k$, assume $z^k$ is the proper vertex of $P_k$ which maximizes $f(x)$, i.e., $z^k = \arg\max\{f(z) \mid z \in T_k\}$, where $T_k$ is the set of proper vertices of $P_k$. Then, if $z^k$ is feasible in $\Omega$, the initial feasible space, it also solves the problem. Otherwise, we look for a new polyblock $P_{k+1} \subset P_k \setminus \{z^k\}$ which still contains $\Omega$ as a subset.

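To obtain $P_{k+1}$ from $P_k$, the box $[0, z^k]$ is replaced by
$[0, z^k] \setminus K_{x^k}$, in which $x^k$ is defined as $\pi(z^k)$. Mathematically,
$$P_{k+1} = \left([0, z^k] \setminus K_{x^k}\right) \cup \bigcup_{z \in T_k \setminus \{z^k\}} [0, z],$$
which clearly satisfies the desired property $\Omega \subset P_{k+1} \subset P_k \setminus \{z^k\}$.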

The vertex set of the resulting polyblock $P_{k+1}$, denoted by $V_{k+1}$,
contains the proper vertices of $P_k$ excluding $z^k$, together with a set of $n$
new vertices $z^{k,1}, z^{k,2}, \dots, z^{k,n}$ defined as
$z^{k,i} = z^k + (x^k_i - z^k_i)e^i$. This result follows directly from the property
of polyblocks mentioned earlier concerning the vertices of $[a, b] \setminus (x, b]$.
Finally, the proper vertex set $T_{k+1}$ of $P_{k+1}$ is obtained from $V_{k+1}$ by
removing its improper vertices [9,10].

A set $P \subset \mathbb{R}^n$ is called a reverse polyblock in $[0, b]$ if it is the
union of a finite number of boxes $[z, b]$, $z \in T$, $T \subset [0, b]$. The set $T$
is called the vertex set of the reverse polyblock. As before, $z$ is a proper vertex
if removing it from $T$ yields a reverse polyblock different from $P$. A reverse
polyblock can be defined by the set of its proper vertices. An increasing function
$f(x)$ achieves its minimum over a reverse polyblock at a proper vertex. Results
similar to those for polyblocks can be developed for reverse polyblocks in the
very same way. For more details see [9,10].


Solution Method 


Consider problem (2) (in the maximization form) as discussed in Sect. “Normal
Sets and Polyblocks” with the additional assumptions that $f(x)$ is semicontinuous
on $H$ and $G \cap H \subset \mathbb{R}^n_{++}$. The latter assumption implies the
existence of a vector $a$ such that $0 < a \le x$, $\forall x \in G \cap H$. Let
$H_a = \{x \in H \mid x \ge a\}$. For a given tolerance $\varepsilon > 0$, a solution
$\bar{x}$ is called $\varepsilon$-optimal if $f(\bar{x}) \ge f(x) - \varepsilon$,
$\forall x \in G \cap H$. We attempt to design an algorithm which is capable of
finding an $\varepsilon$-optimal solution for any given $\varepsilon$.

Obviously, $b \in H$, because otherwise the problem is infeasible. Let
$P_1 = [0, b]$ be the initial polyblock and $T_1 = \{b\}$ its corresponding proper
vertex set. If we apply the polyblock approximation method described in Sect.
“Normal Sets and Polyblocks” to this problem, then at each iteration $k$, $P_k$ and
its proper vertex set $T_k$ are obtained from the previous iteration. Note that
every vertex $z \in T_k \setminus H_a$ can be removed, since the box $[0, z]$ then
contains no point of $H_a$ and hence no feasible solution. Also suppose that
$f(\bar{x}^k)$ is the best objective value found so far. Then any vertex $z$ for
which $f(z) \le f(\bar{x}^k) + \varepsilon$ is discarded, because no point of the box
$[0, z]$ can improve on $\bar{x}^k$ by more than $\varepsilon$. These two rules can be
applied at each iteration to refine the proper vertex set $T_k$ and delete some of
the vertices from further consideration.

If $T_k = \emptyset$ at some iteration $k$, there is no solution $x$ with
$f(x) > f(\bar{x}^k) + \varepsilon$. So $\bar{x}^k$, the best solution found so far, is
$\varepsilon$-optimal and the procedure terminates. Otherwise, let
$z^k = \arg\max\{f(z) \mid z \in T_k\}$. If $z^k$ is feasible in $G \cap H$, it solves
the problem. Since $z^k \in H$ always holds, $z^k$ is feasible if it belongs to $G$
and infeasible otherwise. In the case of infeasibility, we find $x^k = \pi(z^k)$ and
construct the polyblock $P_{k+1}$ as described in Sect. “Normal Sets and
Polyblocks”, which excludes $z^k$ while still containing a global optimal solution
of the problem. This procedure is repeated until the termination criteria are
satisfied or the problem is known to be infeasible. This procedure, first proposed
by Tuy [9], is called the polyblock algorithm. Tuy [9] discussed the convergence of
this method and showed that as $k \to \infty$, the sequence $\bar{x}^k$ converges to a
global optimal solution of the problem.
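A compact Python sketch of the polyblock algorithm described above follows. It rests on assumptions introduced for illustration only: $G = \{x \in [0, b] : g(x) \le 0\}$ and $H = \{x : h(x) \ge 0\}$ with $f$, $g$, $h$ increasing, $g(a) < 0$, and $\pi(z)$ computed by bisection as the last point of $G$ on the segment from $a$ to $z$. It is a teaching sketch with a fixed iteration cap, not Tuy's implementation, and all names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]
Fn = Callable[[Vec], float]


def leq(u: Sequence[float], v: Sequence[float]) -> bool:
    return all(ui <= vi for ui, vi in zip(u, v))


def proper_vertices(V: List[Vec]) -> List[Vec]:
    unique = list(dict.fromkeys(V))
    return [z for z in unique if not any(w != z and leq(z, w) for w in unique)]


def boundary_point(g: Fn, a: Vec, z: Vec, iters: int = 60) -> Vec:
    """pi(z): last point of G = {g <= 0} on the segment [a, z]; needs g(a) <= 0 < g(z)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(tuple(ai + mid * (zi - ai) for ai, zi in zip(a, z))) <= 0:
            lo = mid
        else:
            hi = mid
    return tuple(ai + lo * (zi - ai) for ai, zi in zip(a, z))


def polyblock_max(f: Fn, g: Fn, h: Fn, a: Vec, b: Vec,
                  eps: float = 1e-2, max_iter: int = 10_000):
    T: List[Vec] = [tuple(b)]                      # P_1 = [0, b], T_1 = {b}
    best_x: Optional[Vec] = None
    best_val = -math.inf
    for _ in range(max_iter):
        # Rule 1: drop z outside H_a.  Rule 2: drop z with f(z) <= best_val + eps.
        T = [z for z in T if leq(a, z) and h(z) >= 0 and f(z) > best_val + eps]
        if not T:
            break                                  # incumbent is eps-optimal (None: infeasible)
        zk = max(T, key=f)
        if g(zk) <= 0:
            return zk, f(zk)                       # z^k lies in G and in H: optimal
        xk = boundary_point(g, a, zk)              # x^k = pi(z^k) lies in G
        if h(xk) >= 0 and f(xk) > best_val:
            best_x, best_val = xk, f(xk)           # feasible point: update incumbent
        n = len(zk)
        new = [tuple(xk[i] if j == i else zk[j] for j in range(n)) for i in range(n)]
        T = proper_vertices([z for z in T if z != zk] + new)   # T_{k+1}
    return best_x, best_val


# Toy instance (assumed): max x1 + x2 s.t. x1^2 + x2^2 <= 1, min(x1, x2) >= 0.05.
def f(x):
    return x[0] + x[1]


def g(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0


def h(x):
    return min(x) - 0.05


x_best, v_best = polyblock_max(f, g, h, a=(0.05, 0.05), b=(1.0, 1.0))
```

On this toy instance the first cut already lands on the optimum $(1/\sqrt{2}, 1/\sqrt{2})$, because the segment from $a$ to $b$ is the diagonal; the remaining iterations only prune vertices until $T_k$ is empty.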

Now consider the minimization case of problem (2) in Sect. “Normal Sets and
Polyblocks” with the additional assumptions that $f(x)$ is semicontinuous on $G$
and there exists a vector $c$ such that $0 < c < b$ and $0 \le x \le c$,
$\forall x \in G \cap H$. To solve this problem, the reverse polyblock (copolyblock)
algorithm was devised [9]; it constructs a nested sequence of reverse polyblock
outer approximations of $G \cap H$ (or of a subset of $G \cap H$ that is guaranteed
to contain at least one optimal solution).

The polyblock approximation algorithm works properly for relatively small
dimension $n$, typically $n \le 10$. However, the algorithm converges slowly as it
gets closer to the global optimal solution and needs a large number of iterations
even for $n$ as small as 5. Tuy et al. [12] identified two main reasons for this
drawback. First, the speed of convergence depends on how the current polyblock is
constructed from the previous one. Obviously, we prefer to remove a larger portion
of the previous polyblock, giving a smaller search space and faster convergence.
This goal is achieved by employing more complex rules for constructing the
polyblocks, which impose some additional computational effort. The second source
of slowness is how the algorithm selects the point $x^k$ in each iteration. These
points are basically derived from the monotonicity properties of the problem,
although the problem may also possess some convexity that could be exploited to
speed up the algorithm.

Tuy and Al-Khayyal [11] introduced the concepts of reduced box and reduced
polyblock. This involves tightening the box over which an upper bound of $f(x)$ is
sought, in such a way that the reduced box still contains an optimal solution of
the problem. Based on this, a new procedure is developed to produce tighter
polyblocks. They also redefined the proper vertex set of polyblocks in the
algorithm and suggested that, instead of selecting $x^k$ as the last point of $G$ on
the halfline from $a$ through $z^k$, as the original algorithm does, a more elaborate
rule can be used that exploits some of the convexity properties of the problem.
This is done by solving a convex relaxation of the problem
$\max\{f(x) \mid x \in G \cap H,\ x \in [a, z^k]\}$, which gives an upper bound of
$f(x)$ over the feasible solutions $x$ in the box $[a, z^k]$. Similar ideas were
applied to the reverse polyblock algorithm as well. Using these two modifications,
they developed new algorithms, namely the revised polyblock algorithm and the
revised reverse polyblock (copolyblock) algorithm, and discussed their
convergence properties.

Most outer approximation procedures, including the polyblock algorithm, encounter
storage and numerical problems when solving high-dimensional problems.
Branch-and-bound strategies can be used to tackle these difficulties. Bounding is
performed on the basis of the polyblock approximation. As before, monotonicity
cuts and convex relaxation can be combined to enhance the quality of the bounds in
the corresponding portion of the feasible space. In this branch-and-bound approach,
branching is performed by partitioning the feasible space into cones that pairwise
have no common interior point. The rationale for conical rather than rectangular
partitioning is that the optimal solution of the monotonic optimization problem,
as discussed before, is always achieved on the upper boundary of the feasible
normal set. Using conical partitioning is more efficient and less expensive in
terms of computational time.

The algorithm starts with the initial cone $\mathbb{R}^n_+$ and partitions it into
subcones. For each of these subcones, an upper bound for the objective function over
the feasible solutions contained in it is derived. Cones which are known not to
contain an optimal solution are fathomed, the remaining ones are subdivided again,
and the procedure is repeated until the termination criteria are satisfied. Among
the remaining cones, the one with the maximal bound is the first candidate for
branching. This algorithm, suggested by Tuy and Al-Khayyal [11], is called the
conical algorithm.

For those problems having partial monotonicity 
and partial convexity, this branch-and-bound scheme 
can be extended to devise a more general method. In 
this method, branching is performed on the nonconvex 
variables and bounds are computed by Lagrangian or 
convex relaxation [6]. 

To further exploit the monotonic structure of the problem, reduction cuts are
combined with the original monotonicity cuts, and a more efficient method is
developed [13]. This yields branch-and-cut algorithms that solve monotonic
optimization problems by systematic use of these cuts.


Finally, it is worth mentioning that a new concept, the essential
$\varepsilon$-optimal solution, can be applied to monotonic optimization problems.
The advantage of the method developed on the basis of this concept is that it finds
an approximate optimal solution which is more appropriate and more stable than the
one found by the $\varepsilon$-optimal method. For details see [8].


Generalizations 


The basic approach used in monotonic optimization can be further generalized to
cover a wider class of general nonconvex optimization problems. Among these
generalizations, optimization of the difference of monotonic functions and
discrete monotonic optimization are presented here.


Optimization of the Difference 
of Monotonic Functions 


The underlying idea of monotonic optimization can be extended to deal with problems
involving differences of monotonic functions. A function $f : \mathbb{R}^n_+ \to \mathbb{R}$
is said to be a difference of monotonic functions if it is representable as the
difference of two increasing functions $f_1 : \mathbb{R}^n_+ \to \mathbb{R}$ and
$f_2 : \mathbb{R}^n_+ \to \mathbb{R}$. Similar to functions representable as the
difference of convex functions, the class of differences of monotonic functions is
a linear space: any linear combination of differences of monotonic functions is
again a difference of monotonic functions. Likewise, as for differences of convex
functions, the pointwise minimum and pointwise maximum of a finite family of
differences of monotonic functions is still a difference of monotonic functions.
Obviously, any polynomial function can be represented on $\mathbb{R}^n_+$ as the
difference of two increasing functions: the first one includes all terms having
positive coefficients and the second one includes all terms having negative
coefficients (with their signs reversed).
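The splitting of a polynomial into two increasing parts is mechanical. In the sketch below (illustrative; the dictionary representation of a polynomial and the function names are our own choices), monomials with positive coefficients go to $f_1$ and the negated negative ones to $f_2$, so that $p = f_1 - f_2$ with both parts increasing on $\mathbb{R}^n_+$.

```python
from typing import Dict, Tuple

Poly = Dict[Tuple[int, ...], float]  # exponent tuple -> coefficient


def split_increasing(p: Poly) -> Tuple[Poly, Poly]:
    """Return (f1, f2) with p = f1 - f2, both having nonnegative coefficients."""
    f1 = {e: c for e, c in p.items() if c > 0}
    f2 = {e: -c for e, c in p.items() if c < 0}
    return f1, f2


def evaluate(p: Poly, x: Tuple[float, ...]) -> float:
    total = 0.0
    for exps, coef in p.items():
        term = coef
        for xi, k in zip(x, exps):
            term *= xi ** k
        total += term
    return total


# p(x, y) = 3 x^2 y - 2 x y^2 + y - 5
p = {(2, 1): 3.0, (1, 2): -2.0, (0, 1): 1.0, (0, 0): -5.0}
f1, f2 = split_increasing(p)  # f1 = 3 x^2 y + y,  f2 = 2 x y^2 + 5
```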
Consider the problem:

$$\begin{aligned} &\text{maximize (minimize)} && f(x) - g(x) \\ &\text{subject to} && x \in G \cap H, \end{aligned} \tag{3}$$


in which $G$ and $H$ are as before and $f(x)$ and $g(x)$ are increasing functions on
$[0, b]$. Tuy [9] extended the original polyblock algorithm to solve this problem.
By introducing $t$ as the difference between $g(b)$ and $g(x)$ for $x \in [0, b]$, and
noting that $t$ is always nonnegative because $g$ is increasing, we rewrite the model
(maximization case) as $\max\{f(x) + t - g(b) \mid x \in G \cap H,\ t = g(b) - g(x)\}$.
Now $g(b)$ is a constant and can be removed from the objective function. In the
resulting problem, $\max\{f(x) + t \mid x \in G \cap H,\ 0 \le t \le g(b) - g(x)\}$,
consider the set of constraints. (The equality for $t$ can be relaxed to an
inequality because the objective increases with $t$, so $t = g(b) - g(x)$ at any
optimum.) By increasing the dimension of the problem by one, the feasible space can
be represented as $D \cap E$ with
$$D = \{(x, t) \mid x \in G,\ t + g(x) \le g(b),\ 0 \le t \le g(b) - g(0)\},$$
$$E = \{(x, t) \mid x \in H,\ 0 \le t \le g(b) - g(0)\}.$$
It is easy to verify that $D$ is a normal set and $E$ is a reverse normal set in the
box $[0, b] \times [0, g(b) - g(0)]$. Also, the function $F(x, t) = f(x) + t$ is
increasing on $[0, b] \times [0, g(b) - g(0)]$. So problem (3) is reduced to problem
(2) in Sect. “Normal Sets and Polyblocks” and can be treated by the original
polyblock algorithm. The additional cost incurred by the presence of the
difference of monotonic functions is that the dimension of the problem increases
by one.
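The effect of the extra variable $t$ can be seen on a small one-dimensional instance. The sets and functions below are hypothetical choices made for illustration: $G = [0, 0.8]$, $H = [0.2, 1]$, $f(x) = x^3$, $g(x) = 2x^2$ on $[0, 1]$. On a grid, the lifted optimum $\max F - g(b)$ agrees with $\max\{f - g\}$ up to the spacing of the $t$-grid, and random spot checks are consistent with $D$ being normal and $E$ reverse normal.

```python
import random

# Hypothetical 1-D instance on the box [0, b], b = 1 (illustrative assumption):
# G = [0, 0.8] is normal, H = [0.2, 1] is reverse normal, f and g are increasing.
b = 1.0


def f(x):
    return x ** 3


def g(x):
    return 2.0 * x ** 2


def in_G(x):
    return 0.0 <= x <= 0.8


def in_H(x):
    return 0.2 <= x <= b


t_max = g(b) - g(0.0)  # range of the new variable t


def in_D(x, t):
    return in_G(x) and t + g(x) <= g(b) and 0.0 <= t <= t_max


def in_E(x, t):
    return in_H(x) and 0.0 <= t <= t_max


# Direct problem (maximization case of (3)) on a grid of x values.
xs = [i / 200 for i in range(201)]
direct = max(f(x) - g(x) for x in xs if in_G(x) and in_H(x))

# Lifted problem: maximize F(x, t) = f(x) + t over D ∩ E, then subtract the constant g(b).
ts = [j / 200 * t_max for j in range(201)]
lifted = max(f(x) + t for x in xs for t in ts if in_D(x, t) and in_E(x, t)) - g(b)

# Spot-check that D is normal and E is reverse normal inside the box.
rng = random.Random(1)
normal_ok = reverse_ok = True
for _ in range(2000):
    x, t = rng.uniform(0.0, b), rng.uniform(0.0, t_max)
    if in_D(x, t):
        normal_ok &= in_D(rng.uniform(0.0, x), rng.uniform(0.0, t))
    if in_E(x, t):
        reverse_ok &= in_E(min(b, rng.uniform(x, b)), min(t_max, rng.uniform(t, t_max)))
```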

For the minimization case of problem (3), a similar 
transformation can be applied to convert this problem 
to the minimization case of problem (2). 

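To make the problem even more general, suppose that all constraints are also
differences of monotonic functions. Specifically, consider the problem:

$$\begin{aligned} &\text{maximize (minimize)} && f_1(x) - f_2(x) \\ &\text{subject to} && g_i(x) - h_i(x) \le 0, \quad i = 1, \dots, m, \\ &&& x \in \Omega \subset [0, b] \subset \mathbb{R}^n_+, \end{aligned} \tag{4}$$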


in which $f_1(x)$, $f_2(x)$, $g_i(x)$, and $h_i(x)$ are increasing functions and
$\Omega$ is a normal set. By the above argument, we can first apply a suitable
transformation to convert the objective function into an increasing function, so
without loss of generality we assume that $f_2(x) \equiv 0$. Now consider the set of
$m$ constraints, which can be rewritten as $\max_i\{g_i(x) - h_i(x)\} \le 0$. Since the
pointwise maximum of a finite family of differences of monotonic functions is still
a difference of monotonic functions, the region defined by these constraints can be
represented by $g(x) - h(x) \le 0$, where both $g(x)$ and $h(x)$ are increasing. By
introducing a new variable $t \ge 0$ and assuming $g(b) \ge 0$ (this assumption is
not restrictive), this region is fully described by the following two constraints:
$g(x) + t \le g(b)$ and $h(x) + t \ge g(b)$. The first constraint gives the upper
bound $g(b) - g(0)$ for $t$.

Finally, the problem reduces to (maximization case)
$$\max\{f_1(x) \mid g(x) + t \le g(b),\ h(x) + t \ge g(b),\ x \in \Omega,\ 0 \le t \le g(b) - g(0)\}.$$
This problem has the same form as problem (2), with
$G = \{(x, t) \mid x \in \Omega,\ g(x) + t \le g(b),\ 0 \le t \le g(b) - g(0)\}$, which is a
subset of the box $[0, b] \times [0, g(b) - g(0)]$, and
$H = \{(x, t) \mid h(x) + t \ge g(b)\}$, defined in $\mathbb{R}^{n+1}$.
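The equivalence between $g(x) - h(x) \le 0$ and the pair of constraints in $(x, t)$ can be spot-checked numerically. In the following sketch, $g$ and $h$ are hypothetical increasing functions on $[0, 1]^2$ (with $g(b) = 2 \ge 0$), and the lifted test uses the witness $t = g(b) - g(x)$; as noted in the code comment, if any $t$ works, this one does too.

```python
import random


# Hypothetical increasing functions on [0, b] = [0, 1]^2 (illustrative assumption).
def g(x):
    return x[0] ** 2 + x[1]


def h(x):
    return 2.0 * x[0] * x[1] + 0.5


b = (1.0, 1.0)
zero = (0.0, 0.0)
TOL = 1e-12  # absorbs rounding in g(x) + (g(b) - g(x)) == g(b)


def original_feasible(x):
    return g(x) - h(x) <= 0.0


def lifted_feasible(x):
    # If some t satisfies both constraints, then g(x) <= g(b) - t <= h(x), so the
    # witness t = g(b) - g(x) satisfies them too; t >= 0 and t <= g(b) - g(0)
    # because g is increasing and 0 <= x <= b.
    t = g(b) - g(x)
    return (0.0 <= t <= g(b) - g(zero) + TOL
            and g(x) + t <= g(b) + TOL
            and h(x) + t >= g(b) - TOL)


rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(10_000)]
mismatches = sum(original_feasible(x) != lifted_feasible(x) for x in samples)
```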

Increasing the dimension of the problem is the main 
drawback of the above mentioned approach. Tuy and 
Al-Khayyal [11] presented a direct approach for the 
difference of monotonic functions optimization prob- 
lem requiring no additional dimension. This method 
is referred to as the branch-reduce-and-bound (BRB) 
algorithm. As the name of the algorithm suggests, it 
contains three main steps, which are branching upon 
nonconvex variables, reducing any partition set before 
bounding, and bounding over each partition set. 

The branching phase is performed by rectangular subdivision: every box is divided
into two subboxes by a hyperplane. The reduction phase consists of a set of
operations by which the box $[p, q]$ is tightened without losing any feasible
solution; this is called a proper reduction of $[p, q]$. This step takes advantage of
the monotonicity properties of the problem and increases the rate of convergence of
the algorithm. In the bounding phase, for a properly reduced box $[p, q]$, an upper
bound $\beta$ is obtained such that
$$\beta \ge \max\{f_1(x) - f_2(x) \mid g_i(x) - h_i(x) \le 0,\ i = 1, \dots, m,\ x \in [p, q]\}.$$
As mentioned before, stronger bounds are obtained by a sequence of polyblock
approximations or by combining monotonicity with any convexity present in the
problem. Furthermore, more elaborate methods can be applied to improve the quality
of the bounds in the bounding phase.


Discrete Monotonic Optimization 


Monotonic optimization problems containing additional discrete constraints are
called discrete monotonic optimization problems. Specifically, given a finite set
$S$ of points in the box $[a, b]$, the constraint $x \in S$ is added to the model, so
the problem can be represented as $\max\{f(x) \mid x \in G \cap H \cap S\}$ (all the
assumptions are as in problem (2)).

The original polyblock algorithm is not practical for these problems. Since the
polyblock algorithm in general converges only in the limit, it cannot be relied upon
to produce the exact optimal solution in a finite number of iterations. However,
with suitable modifications, the algorithm can be used to obtain the exact optimal
solution of the problem in a finite number of steps [1,14]. In the new method,
monotonicity cuts are adjusted by a special procedure to cope with the discrete
requirements. This adjustment consists in updating the vertex of the monotonicity
cut by pushing it deeper inside the polyblock to obtain a tighter space, while
leaving unaffected all discrete points that have not been proven nonoptimal.

The algorithm first constructs the normal hull of $G \cap S$, denoted by $\tilde{G}$,
and then tries to solve the problem $\max\{f(x) \mid x \in \tilde{G} \cap H\}$ in
continuous space. This method is called the discrete polyblock algorithm. For
large-scale instances, a similar BRB algorithm was developed by Tuy et al. [14].


Applications 


Although monotonic optimization is a new approach in 
global optimization and there is not a broad literature 
on its applications, it can be applied to numerous prob- 
lems. In most of these applications, first some transfor- 
mations are performed and the problems are reformu- 
lated in the proper way. Then monotonic optimization 
is applied and other approaches are employed to en- 
hance the quality of the bounds. Some of these appli- 
cations are briefly introduced in this section. 

Polynomial programming: The problem of min- 
imizing or maximizing a polynomial function under 
a set of polynomial constraints, which is encountered 
in a multitude of applications, is called polynomial pro- 
gramming. Tuy [9] reformulated this problem as a dif- 
ference of monotonic functions problem which can be 
solved by the methods described before. Tuy [7] pro- 
posed a robust solution approach for polynomial pro- 
gramming based on a monotonic optimization scheme. 
He developed a BRB procedure to tackle the polynomial 
optimization problems of higher dimensions. 

Polynomial optimization contains nonconvex quadratic programming as a special
case, so every polynomial optimization method can be applied to solve this
important class of problems [4,16].


Fractional programming: In fractional program- 
ming, we are dealing with functions which are repre- 
sented by ratios of other functions. Phuong and Tuy [3] 
considered a generalized linear fractional programming 
problem. In this problem, the objective function con- 
sists of an arbitrary continuous increasing function of 
m linear fractional functions and the feasible set is the 
polytope D. Linear fractional functions are defined as 
the ratio of two linear affine functions. They proposed 
a new unified approach which reformulates the prob- 
lem and solves it as a monotonic optimization prob- 
lem. 

Tuy [17] considered a more general class of frac- 
tional programming problems which is optimizing 
a polynomial fractional function (the ratio of two poly- 
nomial functions) under polynomial constraints. His 
method to solve the problem is again based on re- 
formulating the problem as a monotonic optimization 
problem. A branch-and-bound scheme was presented 
for problems of higher ranks. Clearly, polynomial pro- 
gramming is a special case of this class of problems. 

Multiplicative programming: Multiplicative pro- 
gramming problems are optimization problems con- 
taining products of a number of convex or concave 
functions in the objective function or constraints. 
Tuy [9] showed that these classes of problems are es- 
sentially monotonic optimization problems. Tuy and 
Nghia [15] devised a new approach based on the re- 
verse polyblock approximation method for a broad 
class of problems including generalized linear multi- 
plicative and linear fractional programming as special 
cases. 

For more applications, including Lipschitz opti- 
mization, optimization under network constraints, the 
Fekete points problem, and the Lennard-Jones potential 
energy function, see [9]. 


Conclusions 


We have discussed the recently developed theory of monotonic optimization as well
as its generalizations and applications. This novel scheme, which is capable of
solving a wide range of nonconvex problems, is based on a polyblock outer
approximation procedure.

The approach that monotonic optimization uses to deal with optimization problems
is analogous to convex optimization in several respects. Just as convex sets are
approximated by polyhedra, normal sets, defined as the lower level sets of
increasing functions, can be approximated by polyblocks in monotonic optimization.
And just as the difference of convex functions plays an essential role in convex
analysis (because any continuous function on a compact convex set can be
approximated arbitrarily well by a difference of two convex functions), optimization
problems representable as the difference of monotonic functions can be treated in
monotonic optimization.

The performance of this method can be significantly 
improved by incorporating some other techniques like 
convex relaxation to exploit other properties present in 
the problem. In high dimensions, branch-and-bound
or branch-and-cut extensions of the algorithm can be
applied to overcome storage difficulties and increase the
convergence speed.

We review uses of Monte-Carlo simulated annealing
in the protein folding problem. We will discuss the 
strategy for tackling the protein folding problem based 
on all-atom models. Our approach consists of two ele- 
ments: the inclusion of accurate solvent effects and the 
development of powerful simulation algorithms that 
can avoid getting trapped in states of energy local min- 
ima. For the former, we discuss several models varying
in nature from crude (distance-dependent dielectric
function) to rigorous (reference interaction site model).
For the latter, we show the effectiveness of Monte-Carlo
simulated annealing.


Introduction 


Proteins under their native physiological conditions 
spontaneously fold into unique three-dimensional 
structures (tertiary structures) in the time scale of mil- 
liseconds to minutes. Although protein structures ap- 
pear to be dependent on various environmental fac- 
tors within the cell where they are synthesized, it 
was inferred by experiments ‘in vitro’ that the three- 
dimensional structure of a protein is determined solely 
by its amino-acid sequence information [12]. Hence, 
it has been hoped that once the correct Hamiltonian
of the system is given, one can predict the native protein
tertiary structure from first principles by computer
simulations. However, this has yet to be accomplished.
There are two reasons for the difficulty. One
reason is that the inclusion of accurate solvent effects 
is nontrivial, because the number of solvent molecules 
that have to be considered is very large. The other rea- 
son for the difficulty comes from the fact that the num- 
ber of possible conformations for each protein is as- 
tronomically large [30,60]. Simulations by conventional 
methods such as Monte-Carlo or molecular dynamics 
algorithms in the canonical ensemble will necessarily be
trapped in one of many local-minimum states in the
energy function. In this article, we will discuss a possible
strategy to alleviate these difficulties. The outline of
the article is as follows. In Sect. “Energy Functions of 
Protein Systems” we summarize the energy functions 
of protein systems that we used in our simulations. In 
Sect. “Methods” we briefly review our simulation meth- 
ods. In Sect. “Results” we present the results of our pro- 
tein folding simulations. Section “Conclusions” is de- 
voted to conclusions. 


Energy Functions of Protein Systems 


The energy function for the protein systems is given by the sum of two terms: the
conformational energy $E_P$ for the protein molecule itself and the solvation free
energy $E_S$ for the interaction of protein with the surrounding solvent. The
conformational energy function $E_P$ (in kcal/mol) for the protein molecule that we
used is one of the standard ones. Namely, it is given by the sum of the electrostatic
term $E_C$, the 12-6 Lennard-Jones term $E_{LJ}$, and the hydrogen-bond term $E_{HB}$
for all pairs of atoms in the molecule, together with the torsion term $E_{tor}$ for
all torsion angles:

$$\begin{aligned}
E_P &= E_C + E_{LJ} + E_{HB} + E_{tor}, \\
E_C &= \sum_{(i,j)} \frac{332\, q_i q_j}{\varepsilon\, r_{ij}}, \\
E_{LJ} &= \sum_{(i,j)} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right), \\
E_{HB} &= \sum_{(i,j)} \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right), \\
E_{tor} &= \sum_{i} U_i \left( 1 \pm \cos(n_i \chi_i) \right).
\end{aligned} \tag{1}$$

Here, $r_{ij}$ is the distance (in Å) between atoms $i$ and $j$, $\varepsilon$ is the
dielectric constant, and $\chi_i$ is the torsion angle for the chemical bond $i$. Each
atom is expressed by a point at its center of mass, and the partial charge $q_i$ (in
units of electronic charges) is assumed to be concentrated at that point. The factor
332 in $E_C$ is a constant to express energy in units of kcal/mol. These parameters
in the energy function as well as the molecular geometry were adopted from ECEPP/2
[37,41,57]. The computer code KONF90 [23,46] was used for all the Monte-Carlo
simulations. For gas phase simulations, we set the dielectric constant $\varepsilon$
equal to 2. The peptide-bond dihedral angles $\omega$ were fixed at the value 180° for
simplicity. So, the remaining dihedral angles $\phi$ and $\psi$ in the main chain and
$\chi$ in the side chains constitute the variables to be updated in the simulations.
One Monte-Carlo (MC) sweep consists of updating all these angles once with
Metropolis evaluation [36] for each update.
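For concreteness, the functional form of Eq. (1) can be evaluated as follows. This is a minimal sketch under stated assumptions, not KONF90: the ECEPP/2 parameter tables are not reproduced, the caller decides which atom pairs receive the 12-6 and which the 12-10 term, $E_C$ is summed over all pairs as written in (1), and a sign $s_i = \pm 1$ selects the sign in $E_{tor}$. The function name and its arguments are hypothetical.

```python
import math


def conformational_energy(coords, charges, lj_pairs, hb_pairs, torsions, eps=2.0):
    """Return (E_C, E_LJ, E_HB, E_tor) in kcal/mol for the functional form of Eq. (1).

    coords   : list of (x, y, z) atom positions in Angstrom
    charges  : partial charges q_i in units of the electronic charge
    lj_pairs : {(i, j): (A_ij, B_ij)} pairs that receive the 12-6 term
    hb_pairs : {(i, j): (C_ij, D_ij)} pairs that receive the 12-10 term
    torsions : list of (U_i, n_i, chi_i, s_i), chi_i in radians, s_i = +1 or -1
    """
    def r(i, j):
        return math.dist(coords[i], coords[j])

    n = len(coords)
    e_c = sum(332.0 * charges[i] * charges[j] / (eps * r(i, j))
              for i in range(n) for j in range(i + 1, n))
    e_lj = sum(A / r(i, j) ** 12 - B / r(i, j) ** 6 for (i, j), (A, B) in lj_pairs.items())
    e_hb = sum(C / r(i, j) ** 12 - D / r(i, j) ** 10 for (i, j), (C, D) in hb_pairs.items())
    e_tor = sum(U * (1.0 + s * math.cos(k * chi)) for U, k, chi, s in torsions)
    return e_c, e_lj, e_hb, e_tor


# Two atoms 2 Angstrom apart, toy parameters (not ECEPP/2 values).
terms = conformational_energy([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [0.5, -0.5],
                              {(0, 1): (4096.0, 128.0)}, {}, [(1.0, 3, 0.0, 1)])
```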

Solvation free energy of interactions between a so- 
lute molecule and solvent molecules, in general, can 
be divided into three contributions: hydrophobic term 
that corresponds to the work required to create a cav- 
ity of the shape of the solute molecule in solution 
(the term ‘hydrophobic’ used in this article is differ- 
ent from a more standard one; see [11] for clarification 
on various definitions), the electrostatic term (includ- 
ing the hydrogen-bond energy) between solute and sol- 
vent molecules, and the Lennard-Jones term between 
solute and solvent molecules. 

One of the simplest ways to represent solvent effects is by the sigmoidal,
distance-dependent dielectric function [20,54]. The explicit form of the function we
used is given by [43]
$$\varepsilon(r) = D - \frac{D - 2}{2}\left[(sr)^2 + 2sr + 2\right] e^{-sr}, \tag{2}$$
which is a slight modification of the one used in [9]. Here, we use $s = 0.3$ and
$D = 78$. The function approaches 2 (the value inside a protein) as the distance $r$
goes to zero and 78 (the value for bulk water) as $r$ goes to infinity. The
distance-dependent dielectric function is simple and computationally only slightly
more demanding than the gas-phase case. However, it accounts only for the
electrostatic interactions. Other solvent contributions are hydrophobic interactions
and Lennard-Jones interactions between protein and solvent.
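The limits of Eq. (2) are easy to confirm numerically. In this sketch, $r$ is taken in Å and $s$ in Å$^{-1}$ (our assumption about units), and `screened_coulomb` is a hypothetical helper that assumes $\varepsilon(r_{ij})$ simply replaces the constant $\varepsilon$ in the pair term of $E_C$.

```python
import math


def sigmoidal_dielectric(r, s=0.3, D=78.0):
    """Eq. (2): eps(r) = D - (D - 2)/2 * [(s r)^2 + 2 s r + 2] * exp(-s r)."""
    sr = s * r
    return D - 0.5 * (D - 2.0) * (sr * sr + 2.0 * sr + 2.0) * math.exp(-sr)


def screened_coulomb(qi, qj, r):
    """Pair term 332 q_i q_j / (eps(r) r) in kcal/mol, with r in Angstrom (r > 0)."""
    return 332.0 * qi * qj / (sigmoidal_dielectric(r) * r)


print(sigmoidal_dielectric(0.0), sigmoidal_dielectric(5.0), sigmoidal_dielectric(50.0))
```

Since $d\varepsilon/dr = \tfrac{D-2}{2}\, s\,(sr)^2 e^{-sr} \ge 0$, the dielectric function increases monotonically from 2 to $D$.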

Another commonly used term that represents solvent contributions is the term
proportional to the solvent-accessible surface area of the protein molecule. The
solvation free energy $E_S$ in this approximation is given by
$$E_S = \sum_i \sigma_i A_i, \tag{3}$$
where $A_i$ is the solvent-accessible surface area of the $i$th functional group, and
$\sigma_i$ is the corresponding proportionality constant. There are several versions
of the set of proportionality constants and functional groups. Five parameter sets
were compared for systems of peptides and a small protein, and we found that the
parameter sets of [52,59] are valid ones [33]. The term in (3) includes all
the contributions from solvent (namely, hydrophobic, 
electrostatic, and Lennard-Jones interactions), and it is 
therefore more accurate than the distance-dependent 
dielectric function. It is, however, an empirical repre- 
sentation, and its validity has to be eventually tested 
with a rigorous solvation theory. 

The most widely-used and rigorous method of in- 
clusion of solvent effects is probably the one that deals 
with the explicit solvent molecules with all-atom rep- 
resentations. Many molecular dynamics simulations of 
protein systems now directly include these explicit sol- 
vent molecules (for a review, see, for instance, [4]). An- 
other rigorous method is based on the statistical me- 
chanical theory of liquid and solution and is called 
the reference interaction site model (RISM) [7,21]. The 
RISM integral equation for solute-solvent (p-s) correlation functions in Fourier
$k$-space is given by
$$h^{ps} = \omega^{pp} c^{ps} \left(\omega^{ss} + \rho h^{ss}\right), \tag{4}$$
where $h^{ps}$ and $h^{ss}$ are the matrices of the solute-solvent and the
solvent-solvent total correlation functions, respectively, $c^{ps}$ is the matrix of
the solute-solvent direct correlation functions, $\omega^{pp}$ and $\omega^{ss}$ are the
intramolecular correlation matrices for solute and solvent, respectively, and $\rho$
is the number density matrix of the solvent. The solvation free energy is given by
$$E_S = 4\pi\rho k_B T \int_0^\infty r^2 F(r)\, dr, \tag{5}$$
where $F(r)$ is defined by
$$F(r) = \sum_{a,b} \left[ \frac{1}{2} h_{ab}^{ps}(r)^2 - c_{ab}^{ps}(r) - \frac{1}{2} h_{ab}^{ps}(r)\, c_{ab}^{ps}(r) \right]. \tag{6}$$
Here, the summation indices $a$ and $b$ run over the solute and the solvent sites,
respectively. A robust and fast algorithm for solving the RISM equations was
recently (as of 1999) developed [24], which made folding simulations of peptides
feasible [25].
Although this method is computationally much more 
time-consuming than the first two methods (terms with 
distance-dependent dielectric function and those pro- 
portional to surface area), it gives the most accurate 
representation of the solvation free energy. 
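Once the correlation functions are available on a radial grid, Eqs. (5) and (6) reduce to a one-dimensional quadrature. The sketch below is illustrative only: it does not solve the RISM equation (4), it treats the solvent density as a single scalar, it truncates the integral at the last grid point, and it uses the trapezoidal rule. The function name and data layout are hypothetical.

```python
import math


def rism_solvation_energy(r, h, c, rho, kT):
    """Evaluate Eqs. (5)-(6) by the trapezoidal rule on a grid r[0] < ... < r[-1].

    h, c : dicts mapping a site pair (a, b) to lists of h_ab(r_k) and c_ab(r_k)
    rho  : solvent number density (a single scalar here, an assumption)
    kT   : k_B T in the desired energy unit
    The integral is truncated at r[-1], where h and c are assumed to have decayed.
    """
    F = [sum(0.5 * h[ab][k] ** 2 - c[ab][k] - 0.5 * h[ab][k] * c[ab][k] for ab in h)
         for k in range(len(r))]
    integrand = [rk * rk * Fk for rk, Fk in zip(r, F)]
    integral = sum(0.5 * (r[k + 1] - r[k]) * (integrand[k] + integrand[k + 1])
                   for k in range(len(r) - 1))
    return 4.0 * math.pi * rho * kT * integral


# Toy check: h = 0 and c = -1 on [0, 1] give F(r) = 1, so E_S -> 4*pi*rho*kT/3.
r = [k / 1000 for k in range(1001)]
h = {("a", "b"): [0.0] * len(r)}
c = {("a", "b"): [-1.0] * len(r)}
E = rism_solvation_energy(r, h, c, rho=1.0, kT=1.0)
```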


Methods 


Once the appropriate energy function of the protein 
system is given, we have to employ a simulation method 
that does not get trapped in states of energy local min- 
ima. We have been advocating the use of Monte-Carlo 
simulated annealing [27]. 

In the regular canonical ensemble with a given inverse temperature
$\beta = 1/k_B T$, the probability weight of each state with energy $E$ is given by
the Boltzmann factor:
$$W_B(E) = \exp(-\beta E). \tag{7}$$
The probability distribution in energy is then given by
$$P_B(T, E) \propto n(E)\, W_B(E), \tag{8}$$
where $n(E)$ is the number of states with energy $E$. Since the number of states $n(E)$
is an increasing function of energy and the Boltzmann factor $W_B(E)$ decreases
exponentially with $E$, the probability distribution $P_B(T, E)$ has a bell-like shape
in general. When the temperature is high, $\beta$ is small, and $W_B(E)$ decreases
slowly with $E$. So, $P_B(T, E)$ has a wide bell shape. On the other hand, at low
temperature $\beta$ is large, and $W_B(E)$ decreases rapidly with $E$. So, $P_B(T, E)$
has a narrow bell shape (and in the limit $T \to 0$ K,
$P_B(T, E) \propto \delta(E - E_{GS})$, where $E_{GS}$ is the global-minimum energy).
However, it is very difficult to obtain canonical distribu- 
tions at low temperatures with conventional simulation 
methods. This is because the thermal fluctuations at low 
temperatures are small and the simulation will certainly 
get trapped in states of energy local minima. Simulated 
annealing [27] is based on the process of crystal mak- 
ing. Namely, by starting a simulation at a sufficiently 
high temperature (much above the melting tempera- 
ture), one lowers the temperature gradually during the 
simulation until it reaches the global-minimum-energy 
state (crystal). If the rate of temperature decrease is suf- 
ficiently slow so that thermal equilibrium may be main- 
tained throughout the simulation, only the state with 
the global energy minimum is obtained (when the fi- 
nal temperature is 0 K). However, if the temperature 
decrease is rapid (quenching), the simulation will get 
trapped in a state of energy local minimum in the vicin- 
ity of the initial state. 
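The narrowing of $P_B(T, E)$ with decreasing temperature can be illustrated with a toy density of states. Here we assume, purely for illustration, $n(E) = E^m$ with $m = 20$. For this choice $P_B(T, E)$ is a gamma distribution with mean $(m+1)k_B T$ and standard deviation $\sqrt{m+1}\,k_B T$, so its width shrinks linearly with $T$. The sketch computes both moments numerically on an energy grid (in kcal/mol); the function name is hypothetical.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def energy_distribution_stats(T, m=20, e_max=200.0, n_grid=20001):
    """Mean and standard deviation of P(E) proportional to E**m * exp(-E / (K_B T))."""
    beta = 1.0 / (K_B * T)
    de = e_max / (n_grid - 1)
    es = [k * de for k in range(n_grid)]
    # Work with logarithms and shift by the maximum to avoid overflow/underflow.
    logw = [m * math.log(e) - beta * e if e > 0.0 else -math.inf for e in es]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    z = sum(w)
    mean = sum(e * wi for e, wi in zip(es, w)) / z
    var = sum((e - mean) ** 2 * wi for e, wi in zip(es, w)) / z
    return mean, math.sqrt(var)


hot_mean, hot_std = energy_distribution_stats(1000.0)   # wide bell shape
cold_mean, cold_std = energy_distribution_stats(50.0)   # narrow bell shape
```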

Simulated annealing was first successfully used to 
predict the global-minimum-energy conformations of 
polypeptides and proteins [22,61,63] and to refine pro- 
tein structures from X-ray and NMR data [5,42] almost 
a decade ago. Since then this method has been exten- 
sively used in the protein folding and structure refine- 
ment problems (for reviews, see [45,62]). Our group has 
been testing the effectiveness of the method mainly in 
oligopeptide systems. The procedure of our approach is 
as follows. While the initial conformations in the pro- 
tein simulations are usually taken from the structures 
inferred by the experiments, our initial conformations 
are randomly generated. Each Monte-Carlo sweep up- 
dates every dihedral angle (in both the main chain and 
side chains) once. Our annealing schedule is as follows: 
The temperature is lowered exponentially from T_I = 1000 K to T_F = 250 K (the final temperature T_F was sometimes set equal to 100 K, 50 K, or 1 K) [23,46]. The temperature for the nth MC sweep is given by

$$ T_n = T_I \, \gamma^{\,n-1}, \qquad (9) $$

where γ is a constant which is determined by T_I, T_F, and the total number of MC sweeps of the run. Each run consists of 10^4 to 10^5 MC sweeps, and we usually made 10 to 20 runs from different initial conformations.
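The article does not state how γ is computed. Assuming the schedule is pinned so that the first sweep runs at T_I and the last (Nth) sweep at T_F, (9) gives T_N = T_I γ^(N−1) = T_F, i.e. γ = (T_F/T_I)^(1/(N−1)). A minimal sketch of that assumption (the function name is hypothetical):

```python
def annealing_schedule(t_initial, t_final, n_sweeps):
    """Return (gamma, [T_1, ..., T_N]) with T_n = T_I * gamma**(n - 1), Eq. (9).

    Assumption: gamma is fixed by T_1 = T_I and T_N = T_F, i.e.
    gamma = (T_F / T_I) ** (1 / (N - 1)).
    """
    if n_sweeps < 2:
        raise ValueError("need at least two MC sweeps")
    gamma = (t_final / t_initial) ** (1.0 / (n_sweeps - 1))
    temps = [t_initial * gamma ** (n - 1) for n in range(1, n_sweeps + 1)]
    return gamma, temps

gamma, temps = annealing_schedule(1000.0, 250.0, 10_000)
print(f"gamma = {gamma:.8f}; T_1 = {temps[0]:.1f} K; T_N = {temps[-1]:.1f} K")
```

For T_I = 1000 K, T_F = 250 K and N = 10,000 sweeps, γ is slightly below one (about 0.99986), so the temperature falls by roughly 0.014% per sweep.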


Results 


We now present the results of our simulations based on 
Monte-Carlo simulated annealing. All the simulations 
were started from randomly-generated conformations. 

The first example is Met-enkephalin. This brain neuropeptide consists of 5 amino acids with the amino-acid sequence Tyr-Gly-Gly-Phe-Met. Because it is one of the smallest peptides that have biological functions, it has served as a benchmark for testing new simulation methods. The global-minimum conformation of this peptide for the ECEPP/2 energy function in gas phase (ε = 2) is known [31,49]. For the KONF90 realization of the ECEPP/2 energy, the peptide is essentially in the ground state for E_P < −11 kcal/mol [15,49], and the lowest value is −12.2 kcal/mol [16,17].

In Fig. 1, we show the ‘time series’ of the total conformational energy E_P (in (1)) obtained by conventional canonical Monte-Carlo simulations at T = 1000, 300, and 50 K.

The thermal fluctuations for the run at T = 50 K in Fig. 1c are very small, and this run has apparently gotten trapped in states of energy local minima (because the average energy at 50 K is about −11 kcal/mol [15,16]). In Fig. 2 we display the time series of energy obtained by a Monte-Carlo simulated annealing simulation.

This run reaches the global-minimum region (E_P < −11 kcal/mol) as the temperature is decreased during the simulation from 1000 K to 50 K.

We have up to now presented results in gas phase (ε = 2). In Fig. 3 we compare the superposed structures of the lowest-energy conformations from 8 Monte-Carlo simulated annealing runs in gas phase, simple-repulsive solvent, and water (the latter two solvation contributions were calculated by the RISM theory) [26] with those of 5 structures inferred from NMR experiments ([13, Fig. 2]). The figures were created with RasMol [55].


Monte-Carlo Simulated Annealing in Protein Folding 


[Figure 1 plot: panels a, b, c; horizontal axis MC Sweep, 0 to 200,000]

Monte-Carlo Simulated Annealing in Protein Folding, Figure 1

Time series of energy E_P (kcal/mol) of Met-enkephalin from conventional canonical Monte-Carlo runs at T = 1000 K (a), 300 K (b), and 50 K (c)

Monte-Carlo Simulated Annealing in Protein Folding, Figure 2

Time series of energy E_P (kcal/mol) of Met-enkephalin from a Monte-Carlo simulated annealing run


We see a striking similarity between the simulation results in water (Fig. 3c) and those of the NMR experiments (Fig. 3d). The simulation results in Fig. 3 are from the same number of MC sweeps. It seems that the presence of water speeds up the convergence of the backbone structures, in the sense that fewer MC sweeps are required for convergence [26].

The solvation free energy based on the RISM theory is very accurate, but it is also computationally very demanding. We are currently trying to solve this problem by making the algorithm more efficient and robust [24]. Hereafter, we discuss how well other solvation theories can still describe the effects of solvent in the prediction of three-dimensional structures of oligopeptides and small proteins.

The next systems we discuss are homo-oligomers 10 amino acids long. From the structural database of X-ray experiments on protein structures [8] and from CD experiments [6], it is known that certain amino acids have a greater tendency toward α-helix formation than others. For instance, alanine is a helix former and glycine is a helix breaker, while phenylalanine has an intermediate helix-forming tendency. We have performed 20 Monte-Carlo simulated annealing runs of 10,000 MC sweeps in gas phase (ε = 2) with each of (Ala)₁₀, (Leu)₁₀, (Met)₁₀, (Phe)₁₀, (Ile)₁₀, (Val)₁₀, and (Gly)₁₀ [44]. These amino acids are nonpolar, so we can avoid the complications of electrostatic and hydrogen-bond interactions of side chains with each other, with the main chain, and with the solvent.

Monte-Carlo Simulated Annealing in Protein Folding, Figure 3

Superposition of the eight conformations of Met-enkephalin obtained as the lowest-energy structures by Monte-Carlo simulated annealing in gas phase (a), simple-repulsive solvent (b), and water (c) together with superposition of five conformations deduced from the NMR experiment (d)

In order to analyze how much α-helix formation is obtained in the simulations, we first define the α-helix state of a residue. We consider a residue to be in the α-helix state when its dihedral angles (φ, ψ) fall in the range (−60 ± 45°, −50 ± 45°) (Definition I) [23,46]. The length ℓ of a helical segment is then defined as the number of successive residues that are in the α-helix state. The number n of helical residues in a conformation is defined as the sum of ℓ over all helical segments in the conformation. Note that ℓ = 3 corresponds to roughly one turn of α-helix. We therefore consider a conformation helical if it has a segment with helix length ℓ ≥ 3.
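The definitions of the α-helix state, the segment length ℓ, and the number n of helical residues translate directly into code. The sketch below is illustrative only: function names and the sample conformation are hypothetical, angles are in degrees, and angular differences are wrapped into [−180°, 180°) so that windows near ±180° are handled consistently. Following the text literally, n sums ℓ over all helical segments, including segments shorter than three residues.

```python
def wrapped_diff(a, b):
    """Signed angular difference a - b mapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

# Definition I window: (phi, psi) = (-60 +- 45 deg, -50 +- 45 deg).
HELIX_CENTER = (-60.0, -50.0)
HELIX_HALF_WIDTH = (45.0, 45.0)

def in_helix_state(phi, psi):
    return (abs(wrapped_diff(phi, HELIX_CENTER[0])) <= HELIX_HALF_WIDTH[0]
            and abs(wrapped_diff(psi, HELIX_CENTER[1])) <= HELIX_HALF_WIDTH[1])

def helical_segments(dihedrals):
    """Lengths l of maximal runs of successive residues in the alpha-helix state."""
    lengths, run = [], 0
    for phi, psi in dihedrals:
        if in_helix_state(phi, psi):
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths

def helix_summary(dihedrals):
    """Return (segment lengths l, n = sum of l, helical? i.e. some l >= 3)."""
    segments = helical_segments(dihedrals)
    return segments, sum(segments), any(length >= 3 for length in segments)

# Hypothetical 10-residue conformation: (phi, psi) per residue, in degrees.
conformation = [(-150, 150), (-65, -40), (-70, -35), (-60, -45), (-55, -50),
                (-150, 160), (-62, -41), (80, 10), (-65, -45), (-70, -30)]
print(helix_summary(conformation))  # ([4, 1, 2], 7, True)
```

Residues 2–5, 7, and 9–10 of the sample fall inside the Definition I window, giving segments ℓ = 4, 1, 2 and n = 7; the conformation counts as helical because one segment has ℓ ≥ 3.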

The average values of the dihedral angles φ and ψ for the helical segments based on Definition I (with helix length ℓ ≥ 3) are −70° and −37°, respectively, and the standard deviation is about 10° for the ECEPP/2 energy function [44,46]. Hence, for detailed analyses of the data we adopt a more stringent criterion for the α-helix state (Definition II): the range is (φ, ψ) = (−70 ± 20°, −37 ± 20°) [44].

We likewise consider a residue to be in the β-strand state when its dihedral angles (φ, ψ) fall in the range (−130 ± 50°, 135 ± 45°) [44]. The β-strand length m is then defined as the number of successive residues that are in the β-strand state. We consider a conformation β-stranded if it has a segment with β-strand length m ≥ 3.
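The stricter Definition II window and the β-strand window can be handled the same way. In this sketch (hypothetical names and sample data, angles in degrees) each window is stored as a centre and half-width for (φ, ψ), runs of consecutive in-window residues are counted with `itertools.groupby`, and a conformation is labelled helical if some run has ℓ ≥ 3 and β-stranded if some run has m ≥ 3.

```python
from itertools import groupby

def wrapped_diff(a, b):
    """Signed angular difference a - b mapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

# (centre, half-width) for (phi, psi) in degrees, as given in the text.
WINDOWS = {
    "helix_def_II": ((-70.0, -37.0), (20.0, 20.0)),
    "strand": ((-130.0, 135.0), (50.0, 45.0)),
}

def state_runs(dihedrals, window):
    """Lengths of maximal runs of successive residues inside `window`."""
    (c_phi, c_psi), (w_phi, w_psi) = WINDOWS[window]
    flags = [abs(wrapped_diff(phi, c_phi)) <= w_phi
             and abs(wrapped_diff(psi, c_psi)) <= w_psi
             for phi, psi in dihedrals]
    return [len(list(group)) for inside, group in groupby(flags) if inside]

def classify(dihedrals):
    """(helical, beta_stranded): some run with l >= 3 (Definition II) / m >= 3."""
    helical = any(length >= 3 for length in state_runs(dihedrals, "helix_def_II"))
    stranded = any(length >= 3 for length in state_runs(dihedrals, "strand"))
    return helical, stranded

# Hypothetical conformation: 4 extended, 2 helical-region, 4 extended residues.
hairpin = [(-120, 130)] * 4 + [(-70, -40)] * 2 + [(-140, 150)] * 4
print(state_runs(hairpin, "strand"), state_runs(hairpin, "helix_def_II"))  # [4, 4] [2]
print(classify(hairpin))  # (False, True)
```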

In Table 1 we summarize the α-helix formation in the 20 Monte-Carlo simulated annealing runs [44]. The results are for Definition II of the α-helix state.

We see that (Met)₁₀, (Ala)₁₀, and (Leu)₁₀ gave many helical conformations: 15, 9, and 9 (out of 20), respectively. In particular, (Met)₁₀ and (Ala)₁₀ produced long helices, some conformations being almost entirely helical (ℓ ≥ 8). On the other hand, (Val)₁₀, (Ile)₁₀, and (Gly)₁₀ gave few helical conformations: 2, 2, and 1 (out of 20), respectively. For these homo-oligomers we obtained not only fewer helices but also shorter helices than for the above three. Finally, the results for (Phe)₁₀ indicate that Phe has a helix-forming tendency intermediate between these two groups. We thus have the following rank order of helix-forming tendency for the seven amino acids [44]:


Met > Ala > Leu > Phe > Val > Ile > Gly. (10) 
Monte-Carlo Simulated Annealing in Protein Folding, Table 1

α-Helix formation in homo-oligomers from 20 Monte-Carlo simulated annealing runs

ℓ        (Met)₁₀   (Ala)₁₀   (Leu)₁₀   (Phe)₁₀   (Val)₁₀   (Ile)₁₀   (Gly)₁₀
3           1         0         4         1         0         2         1
4           2         0         2         2         2         0         0
5           0         1         1         1         0         0         0
6           2         3         2         1         0         0         0
7           2         1         0         0         0         0         0
8           7         4         0         0         0         0         0
9           1         0         0         0         0         0         0
10          0         0         0         0         0         0         0
Total     15/20      9/20      9/20      5/20      2/20      2/20      1/20


This can be compared with the experimentally deter- 
mined helix propensities [6,8]. Our rank order (10) is 
in good agreement with the experimental data. 

We then analyzed the relation between helix-forming tendency and energy. We found that the difference ΔE = E_NH − E_H between the minimum energies of nonhelical (NH) and helical (H) conformations is large for homo-oligomers with high helix-forming tendency (9.7, 10.2, and 21.5 kcal/mol for (Met)₁₀, (Ala)₁₀, and (Leu)₁₀, respectively) and small for those with low helix-forming tendency (0.5, 1.6, and −3.2 kcal/mol for (Val)₁₀, (Ile)₁₀, and (Gly)₁₀, respectively). Moreover, we found that the large ΔE for the former homo-oligomers is caused by the Lennard-Jones term ΔE_LJ (13.3, 8.0, and 17.5 kcal/mol for (Met)₁₀, (Ala)₁₀, and (Leu)₁₀, respectively). Hence, we conjecture that the differences in helix-forming tendencies are determined by the following factors [44]. A helical conformation is in general energetically favored because of the Lennard-Jones term E_LJ. For amino acids with low helix-forming tendency other than Gly, however, the steric hindrance of the side chains raises E_LJ of helical conformations, so that the difference ΔE_LJ between nonhelical and helical conformations is reduced significantly. The small ΔE_LJ for these amino acids can easily be overcome by entropic effects, and their helix-forming tendencies are therefore small. Note that such amino acids (Val and Ile here) have two large side-chain branches at Cβ, while helix-forming amino acids such as Met and Leu have only one branch at Cβ and Ala has a small side chain.

We now study the β-strand-forming tendencies of these seven homo-oligomers. In Table 2 we summarize the β-strand formation in 20 Monte-Carlo simulated annealing runs [44].

The implications of the results are not as obvious as in the α-helix case. This is presumably because a short, isolated β-strand is not very stable by itself, since hydrogen bonds between β-strands are needed to stabilize it. However, we can still give a rough estimate of the rank order of strand-forming tendency for the seven amino acids [44]:


Val > Ile > Phe > Leu > Ala > Met > Gly. (11) 


Here, we considered Val as more strand-forming than 
Ile, since the longer the strand segment is, the harder it 
is to form by simulation. Our rank order (11) is again 
in good agreement with the experimental data [8]. 

By comparing (11) with (10), we find that the helix- 
forming group is the strand-breaking group and vice 
versa, except for Gly. Gly is both helix and strand break- 
ing. This reflects the fact that Gly, having no side chain, 
has a much larger (backbone) conformational space 
than other amino acids. 

The helix-coil transitions of homo-oligomer sys- 
tems were further analyzed by multicanonical algo- 
rithms [3] in [47,48]. The obtained results gave quan- 
titative support to those by Monte-Carlo simulated an- 
nealing described above [44]. 

We have so far studied peptides with nonpolar 
amino acids each of which is electrically neutral as 
a whole. We now discuss the helix-forming tendencies 
of peptides with polar amino acids where side chains 
are charged by protonation or deprotonation. One ex- 
ample is the C-peptide, residues 1-13 of ribonuclease A. 
Monte-Carlo Simulated Annealing in Protein Folding, Table 2

β-Strand formation in homo-oligomers from 20 Monte-Carlo simulated annealing runs

m        (Met)₁₀   (Ala)₁₀   (Leu)₁₀   (Phe)₁₀   (Val)₁₀   (Ile)₁₀   (Gly)₁₀
3           0         0         2         5         1         7         0
4           0         0         0         1         0         4         0
5           0         0         0         0         2         1         0
6           0         0         0         0         1         0         0
7           0         0         0         0         0         0         0
8           0         0         0         0         1         0         0
9           0         0         0         0         0         0         0
10          0         0         0         0         0         0         0
Total      0/20      0/20      2/20      6/20      5/20     12/20      0/20


It is known from the X-ray diffraction data of the whole enzyme that the segment from Ala-4 to Gln-11 exhibits a nearly 3-turn α-helix [58,64]. CD [56] and NMR [53] experiments also found that the isolated C-peptide has significant α-helix formation in aqueous solution at temperatures near 0 °C.

Furthermore, the CD experiment on the isolated C-peptide showed that the side-chain charges of residues Glu-2⁻ and His-12⁺ enhance the stability of the α-helix, while the charges of the other side chains do not [56]. The NMR experiment [53] on the isolated C-peptide further observed the formation of the characteristic salt bridge between Glu-2⁻ and Arg-10⁺ that exists in the native structure determined by X-ray experiments on the whole protein [58,64].

In order to test whether our simulations can repro- 
duce these experimental results, we made 20 Monte- 
Carlo simulated annealing runs of 10,000 MC sweeps 
with several C-peptide analogues [23,46]. The amino- 
acid sequences of four of the analogues are listed in Ta- 
ble 3. 

The simulations were performed in gas phase (ε = 2). The temperature was decreased exponentially from 1000 K to 250 K for each run. As usual, all the simulations were started from random conformations.

In Table 4 we summarize the helix formation of all the runs [46]. Here, the numbers of conformations with segments of helix length ℓ ≥ 3 are given, with Definition I of the α-helix state. From this table one sees that an α-helix was hardly formed for Peptide IV, where Glu-2 and His-12 are neutral, while many helical conformations were obtained for the other peptides. This is in accord with the experimental results that the charges of Glu-2⁻ and His-12⁺ are necessary for α-helix stability [56].

Monte-Carlo Simulated Annealing in Protein Folding, Table 3

Amino-acid sequences of the peptide analogues of C-peptide studied by Monte-Carlo simulated annealing

Peptide    I    II    III    IV
Sequence
Glu

Peptides II and III had conformations with the longest α-helix (ℓ = 7). These conformations turned out to have the lowest energy in the 20 simulation runs for each peptide. They both exhibit an α-helix from Ala-5 to Gln-11, while the structure from the X-ray data has an α-helix from Ala-4 to Gln-11. These three conformations are compared in Fig. 4.

As can be seen in Fig. 4, the agreement of the backbone structures is striking, but the side-chain structures are not very similar. In particular, while the X-ray [58,64] and NMR [53] experiments imply the formation of the salt bridge between the side chains of Glu-2⁻ and Arg-10⁺, the lowest-energy conformations of Peptides II and III obtained from the simulations do not have this salt bridge.

Monte-Carlo Simulated Annealing in Protein Folding, Figure 4

The lowest-energy conformations of Peptide II (a) and Peptide III (b) of C-peptide analogues obtained from 20 Monte-Carlo simulated annealing runs in gas phase, and the corresponding X-ray structure (c)

Monte-Carlo Simulated Annealing in Protein Folding, Table 4

α-Helix formation in C-peptide analogues from 20 Monte-Carlo simulated annealing runs

ℓ \ Peptide    I       II      III     IV
3              4       2       3       1
4              3       2       3       0
5              1       1       0       0
6              0       1       0       0
7              0       1       1       0
Total         8/20    7/20    7/20    1/20

The disagreement is presumably caused by the lack of solvent in our simulations. We have therefore made multicanonical Monte-Carlo simulations of Peptide II with the inclusion of solvent effects through the distance-dependent dielectric function (see (2)) [18,19]. It was found that the lowest-energy conformation obtained has an α-helix from Ala-4 to Gln-11 and does have the characteristic salt bridge between Glu-2⁻ and Arg-10⁺ [18,19].

A similar dependence of α-helix stability on side-chain charges was observed in Monte-Carlo simulated annealing runs of a 17-residue synthetic peptide [43]. The pH difference in the experimental conditions was represented by the corresponding difference in the charge assignment of the side chains, and the simulations by Monte-Carlo simulated annealing with the distance-dependent dielectric function reproduced the experimental results (stable α-helix formation at low pH and low helix content at high pH) [43].

Considering our simulation results on homo-oligomers of nonpolar amino acids, C-peptide, and the synthetic peptide, we conjecture that the helix-forming tendencies of oligopeptide systems are controlled by the following factors [43]. An α-helix structure is generally favored energetically (especially by the Lennard-Jones term). When side chains are uncharged, the steric hindrance of side chains is the key factor for the difference in helix-forming tendency. When some of the side chains are charged, however, these charges play an important role in helix stability in addition to the above factor: some charges enhance helix stability, while others reduce it.

We have up to now discussed α-helix formation in our simulations of oligopeptide systems. We have also studied β-sheet formation by Monte-Carlo simulated annealing [38,39,51]. The peptide that we studied is the fragment corresponding to residues 16-36 of bovine pancreatic trypsin inhibitor (BPTI) and has the amino-acid sequence Ala¹⁶-Arg⁺-Ile-Ile-Arg⁺-Tyr-Phe-Tyr-Asn-Ala-Lys⁺-Ala-Gly-Leu-Cys-Gln-Thr-Phe-Val-Tyr-Gly³⁶. An antiparallel β-sheet structure in residues 18-35 is observed in X-ray crystallographic data of the whole protein [10].

We first performed 20 Monte-Carlo simulated annealing runs of 10,000 MC sweeps in gas phase (ε = 2) with the same protocol as in the previous simulations [38]. Namely, the temperature was decreased exponentially from 1000 K to 250 K for each run, and all the simulations were started from random conformations. The present simulation differs from the previous ones only in the amino-acid sequence.

The most notable feature of the obtained results is that α-helices, which were the dominant motif in previous simulations of C-peptide and other peptides, are absent in the present simulation. Most of the conformations obtained consist of stretched strands and a ‘turn’ which connects them. The lowest-energy structure indeed exhibits an antiparallel β-sheet [38].

We next made 10 Monte-Carlo simulated annealing runs of 100,000 MC sweeps for BPTI(16-36) with two dielectric functions: ε = 2 and the sigmoidal, distance-dependent dielectric function of (2) [39]. The results with ε = 2 reproduced our previous results: most of the obtained conformations have β-strand structures, and no extended α-helix is observed. Those with the sigmoidal dielectric function, on the other hand, indicated formation of α-helices. One of the low-energy conformations, for instance, exhibited about a four-turn α-helix from Ala-16 to Gly-28 [39]. This presents an example in which a peptide with the same amino-acid sequence can form both α-helix and β-sheet structures, depending on its electrostatic environment.

NMR experiments suggest that this peptide actually forms a β-sheet structure [40]. The representation of the solvent by the sigmoidal dielectric function (which gave α-helices instead) is therefore not sufficient. Hence, the same peptide fragment, BPTI(16-36), was further studied by Monte-Carlo simulated annealing in aqueous solution represented by the solvent-accessible surface area term of (3) [51]. Twenty simulation runs of 100,000 MC sweeps were made. It was indeed found that the lowest-energy structure obtained has a β-sheet structure (actually, a type II′ β-turn) at the very location suggested by the NMR experiments [40]. This structure and that deduced from the X-ray experiments [10] are compared in Fig. 5. The figures were created with Molscript [29] and Raster3D [2,35].

Monte-Carlo Simulated Annealing in Protein Folding, Figure 5

The structure of BPTI(16-36) deduced from X-ray experiments (a) and the lowest-energy conformation of BPTI(16-36) obtained from 20 Monte-Carlo simulated annealing runs in aqueous solution represented by solvent-accessible surface area (b)

Although both conformations are β-sheet structures, there are important differences between the two: the positions and types of the turns are different. Since the X-ray structure is taken from experiments on the whole BPTI molecule, it does not have to agree with that of the isolated BPTI(16-36) fragment. It was found [51] that the simulated results in Fig. 5b are in remarkable agreement with those of the NMR experiments on the isolated fragment [40].

We have so far dealt with peptides with a small number of amino acids (up to 21) and simple secondary structural elements: a single α-helix or β-sheet. Native proteins usually have more than one secondary structural element. We now discuss our attempts at first-principles tertiary structure prediction of larger and more complicated systems.

The first example is the fragment corresponding to residues 1-34 of human parathyroid hormone (PTH). An NMR experiment on PTH(1-34) suggested the existence of two α-helices, around residues Ser-3 to His-9 and Ser-17 to Leu-28 [28]. Another NMR experiment on a slightly longer fragment, PTH(1-37), in aqueous solution also suggested the existence of the two helices [32]. One of the determined structures, for instance, has α-helices in residues Gln-6 to His-9 and Ser-17 to Lys-27 [32].

For PTH(1-34) we performed 20 Monte-Carlo simulated annealing runs of 10,000 MC sweeps in gas phase (ε = 2) with the same protocol as in the previous simulations [50]. Many of the 20 final conformations obtained exhibited α-helix structures (especially in the N-terminal region). In Fig. 6 we show the lowest-energy conformation of PTH(1-34) [50].

This conformation indeed has two α-helices, around residues Val-2 to Asn-10 (Helix 1) and Met-18 to Glu-22 (Helix 2), which are essentially the same locations as suggested by experiment [28], although Helix 2 is somewhat shorter (5 residues long) than the corresponding one (12 residues long) in the experimental data.

A slightly larger peptide fragment, PTH(1-37), was also studied by Monte-Carlo simulated annealing [34] for comparison with the results of the recent NMR experiment in aqueous solution [32]. Ten simulation runs of 100,000 MC sweeps were made in gas phase (ε = 2) and in aqueous solution represented by the terms proportional to the solvent-accessible surface area (see (3)). Although the results are preliminary, the simulations in gas phase did not produce two helices this time, in contrast to the previous work [50], where a short second helix was observed, as discussed in the previous paragraph. The lowest-energy conformation has an α-helix from Val-2 to Asn-10. The simulations in aqueous solution, on the other hand, did produce the two α-helices. The lowest-energy conformation obtained has α-helices from Gln-6 to His-9 and from Gly-12 to Glu-22. Note that the second helix is now more extended than the first one, in agreement with experiments. This structure, together with one of the NMR structures [32], is shown in Fig. 7. The figures were again created with Molscript [29] and Raster3D [2,35].

Monte-Carlo Simulated Annealing in Protein Folding, Figure 6

Lowest-energy conformation of PTH(1-34) obtained from 20 Monte-Carlo simulated annealing runs in gas phase

Generalized-ensemble simulations of PTH(1-37) 
are now in progress in order to obtain more quantita- 
tive information such as average helicity as a function 
of residue number, etc. 

The second example of a more complicated system is the immunoglobulin-binding domain of streptococcal protein G. This protein is composed of 56 amino acids, and the structure determined by an NMR experiment [14] and an X-ray diffraction experiment [1] has an α-helix and a β-sheet. The α-helix extends from residue Ala-23 to residue Asp-36. The β-sheet is made of four β-strands: from Met-1 to Gly-9, from Leu-12 to Ala-20, from Glu-42 to Asp-46, and from Lys-50 to Glu-56. This structure is shown in Fig. 8a. The figures in Fig. 8 were again created with Molscript [29] and Raster3D [2,35].

Monte-Carlo Simulated Annealing in Protein Folding, Figure 7

A structure of PTH(1-37) deduced from NMR experiments (a) and the lowest-energy conformation of PTH(1-37) obtained from 10 Monte-Carlo simulated annealing runs in aqueous solution represented by solvent-accessible surface area (b)

We have performed eight Monte-Carlo simulated annealing runs of 50,000 to 400,000 MC sweeps with the sigmoidal, distance-dependent dielectric function of (2). The lowest-energy conformation obtained so far has four α-helices and no β-sheet, in disagreement with the X-ray structure. This structure is shown in Fig. 8b.

The disagreement of the lowest-energy structure obtained so far (Fig. 8b) with the X-ray structure (Fig. 8a) is presumably caused by the poor representation of the solvent effects. As can be seen in Fig. 8a, the X-ray structure has both an interior, where a well-defined hydrophobic core is formed, and an exterior, where it is exposed to the solvent. The distance-dependent dielectric function, which mimics the solvent effects only in electrostatic interactions, is therefore not sufficient to represent the effects of the solvent here.

Monte-Carlo Simulated Annealing in Protein Folding, Figure 8

A structure of protein G deduced from an X-ray experiment (a) and the lowest-energy conformation of protein G obtained from Monte-Carlo simulated annealing runs with the distance-dependent dielectric function (b)


Conclusions 


In this article we have reviewed theoretical aspects of 
the protein folding problem. Our strategy in tackling 
this problem consists of two elements: 1) inclusion of 
accurate solvent effects, and 2) development of power- 
ful simulation algorithms that can avoid getting trapped 
in states of energy local minima. 

We have shown the effectiveness of Monte-Carlo simulated annealing by showing that direct folding of α-helix and β-sheet structures from randomly generated initial conformations is possible.

As for the solvent effects, we considered several methods: a distance-dependent dielectric function, a term proportional to the solvent-accessible surface area, and the reference interaction site model (RISM). These methods range from crude but computationally inexpensive (distance-dependent dielectric function) to accurate but computationally demanding (RISM theory). In the present article, we have shown that the inclusion of some solvent effects is very important for a successful prediction of the tertiary structures of small peptides and proteins.
Many important real-world problems contain stochastic elements and require optimization. Stochastic programming and simulation-based optimization are two approaches used to address this issue. We do not explicitly discuss other related areas, including stochastic control, stochastic dynamic programming, and Markov decision processes. We consider a stochastic optimization problem of the form

$$ \text{(SP)} \qquad z^* = \min_{x \in X} \; \mathbb{E} f(x, \xi), $$

where x is a vector of decision variables with deterministic feasible region X ⊆ ℝ^n, ξ is a random vector, and f is a real-valued function with finite expectation, E f(x, ξ), for all x ∈ X. We use x* to denote an optimal solution to (SP). Note that the decision x must be made prior to observing the realization of ξ.

A wide variety of types of problems can be expressed 
as (SP) depending on the definitions of f and X. Two 
of the most commonly-used approaches are rooted in 
mathematical programming and in discrete-event sim- 
ulation modeling. 

In a two-stage stochastic linear program with recourse [6,14], X is a polyhedral set and f is defined as the optimal value of a linear program, given x and ξ, i.e.,

$$ f(x, \xi) = cx + \min_{y \ge 0} \; qy \quad \text{s.t.} \quad Wy = Tx + h. \qquad (1) $$


Here, ξ is the vector of random elements from h, q, T, and W. A prototypical problem of this nature is a capacity allocation model under uncertain demand and/or capacity availabilities: x is a strategic decision allocating resources, while y represents an operational recourse decision that is made after observing the demand and availabilities. Example applications of this type include capacity expansion planning in an electric power system [16] and in a telecommunications network [61]. The two-stage model generalizes to a more dynamic, multistage model (see, e.g., [10]) in which decisions are made, and random events unfold, over time. For multistage applications in asset-liability management see [13], and in hydro-electric scheduling see [39].
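As a concrete, deliberately small instance of (1), consider a single capacity decision x with unit cost c and random demand ξ (all names and cost values hypothetical). The recourse variables are y = (y_s, y_e) ≥ 0, the shortfall and the unused capacity, with y_s − y_e = ξ − x, i.e. W = (1, −1), T = −1, h = ξ. When q_s + q_e ≥ 0 this LP has the closed-form solution y_s = max(ξ − x, 0), y_e = max(x − ξ, 0), which the sketch uses instead of calling an LP solver.

```python
def recourse_value(x, demand, q_short, q_excess):
    """Optimal value of the second-stage LP in (1) for a one-product example:

        min  q_short * y_s + q_excess * y_e
        s.t. y_s - y_e = demand - x,   y_s, y_e >= 0,

    i.e. Wy = Tx + h with W = [1, -1], T = -1, h = demand. If
    q_short + q_excess >= 0 the LP is bounded and is solved by
    y_s = max(demand - x, 0), y_e = max(x - demand, 0).
    """
    if q_short + q_excess < 0:
        raise ValueError("recourse LP is unbounded when q_short + q_excess < 0")
    shortfall = max(demand - x, 0.0)
    unused = max(x - demand, 0.0)
    return q_short * shortfall + q_excess * unused

def f(x, demand, c=1.0, q_short=4.0, q_excess=0.0):
    """f(x, xi) = c x + Q(x, xi) for one demand realization (hypothetical costs)."""
    return c * x + recourse_value(x, demand, q_short, q_excess)

print(f(10.0, 14.0), f(10.0, 7.0))  # 26.0 10.0
```

For x = 10, a demand of 14 costs 10 + 4·4 = 26, while a demand of 7 costs only the capacity charge of 10. Realistic models with many constraints would evaluate the recourse LP with a solver rather than in closed form.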

In the context of a simulation model, f(x, ξ) could represent a performance measure under a design specified by x. For example, f(x, ξ) might represent the number of hours in a workday that a critical machine is blocked in a queueing network model of a manufacturing system in which buffer sizes are determined by x. In another application, E.L. Plambeck et al. [53] allocate constrained processing rates to unreliable machines with buffers in a fluid serial queueing network in order to maximize steady-state throughput. In nonterminating simulations, the expectation in E f(x, ξ) is typically with respect to a steady-state distribution.

Note that E f(x, ξ) can capture objectives not usually thought of as a ‘mean’. For example, if c represents random rates of return and x investment amounts, we might want to maximize the probability of exceeding a return threshold, T. We can write P(cx > T) = E I(cx > T), where I(·) is the indicator function that takes value one if its argument is true and zero otherwise. For more on probability maximization models (and generalizations of (SP) in which X contains probabilistic constraints) see [54]. See [45] for a discussion of risk modeling in stochastic optimization.
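The identity P(cx > T) = E I(cx > T) means a probability objective can be estimated by averaging 0/1 indicators over Monte-Carlo draws of c. The sketch below assumes a hypothetical two-asset model with independent normal returns; any sampler that returns a return vector could be substituted for `normal_returns`.

```python
import random

def prob_exceed(x, threshold, sample_returns, n_samples=100_000, seed=1):
    """Monte-Carlo estimate of P(c x > T) = E I(c x > T).

    sample_returns(rng) draws one random return vector c (hypothetical model);
    x holds the investment amounts. Returns (estimate, standard error).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        c = sample_returns(rng)
        portfolio_return = sum(ci * xi for ci, xi in zip(c, x))
        hits += portfolio_return > threshold  # indicator I(cx > T) as 0/1
    p_hat = hits / n_samples
    std_err = (p_hat * (1.0 - p_hat) / n_samples) ** 0.5
    return p_hat, std_err

def normal_returns(rng):
    # Illustrative only: independent N(0.05, 0.1^2) and N(0.08, 0.2^2) returns.
    return [rng.gauss(0.05, 0.1), rng.gauss(0.08, 0.2)]

p, se = prob_exceed([0.5, 0.5], threshold=0.0, sample_returns=normal_returns)
print(f"P(cx > 0) is approximately {p:.3f} +/- {1.96 * se:.3f}")
```

Under these assumed parameters cx is normal with mean 0.065 and standard deviation about 0.112, so the exact probability is about 0.72, which the estimate should match to within a few standard errors.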

A more general model than (SP) allows the distribution of ξ to depend on x. Some simple types of dependence can effectively be captured in (SP) via modeling tricks, such as x scaling random elements of T in (1). General dependencies, however, are difficult to handle. For work on decision-dependent distributions when there are a finite number of possibilities see [26,40].

Regardless of whether it is defined as the expected value of a mathematical program or as a long-run average performance measure of a discrete-event simulation model, it is usually impossible to calculate E f(x, ξ) exactly, even for a fixed value of x. When the dimension of the random vector ξ is relatively low, one approach is to obtain deterministic approximations of E f(x, ξ) using numerical quadrature or related ideas. In stochastic programming, this corresponds to generating and refining bounds on E f(x, ξ) within a sequential approximation algorithm [20,24,43]. For problems in which ξ is of moderate-to-high dimension and is continuous or has a large number of realizations, Monte-Carlo simulation is widely regarded as the method of choice for estimating E f(x, ξ). It is therefore not surprising that Monte-Carlo techniques play a fundamental role in solving (SP).
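A minimal sketch of Monte-Carlo estimation of E f(x, ξ) at a fixed x: draw ξ¹, …, ξᴺ i.i.d., average f(x, ξⁱ), and attach an approximate 95% confidence interval from the central limit theorem. The cost function and the exponential demand distribution in the usage example are hypothetical; for them E f(10, ξ) = 10 + 40e^(−1) ≈ 24.72 can be computed exactly and used as a check.

```python
import math
import random
import statistics

def mc_estimate(f, x, sample_xi, n=10_000, seed=0):
    """Sample-mean estimate of E f(x, xi) with an approximate 95% CI half-width.

    xi^1, ..., xi^n are drawn i.i.d. by sample_xi(rng); the half-width uses the
    normal approximation 1.96 * s / sqrt(n) from the central limit theorem.
    """
    rng = random.Random(seed)
    values = [f(x, sample_xi(rng)) for _ in range(n)]
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width

def cost(x, xi):
    # Hypothetical cost: capacity charge x plus a shortage penalty of 4 per unit.
    return x + 4.0 * max(xi - x, 0.0)

mean, hw = mc_estimate(cost, 10.0, lambda rng: rng.expovariate(0.1), n=40_000)
exact = 10.0 + 40.0 * math.exp(-1.0)  # for demand ~ Exponential(mean 10)
print(f"estimate {mean:.2f} +/- {hw:.2f}; exact {exact:.2f}")
```

The half-width shrinks like 1/√N, which is why the sample sizes needed for tight estimates grow quickly; this motivates the variance reduction techniques listed below.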

In recent years (1999), considerable progress has 
been made in solving realistically-sized problems with 
a significant number of stochastic parameters and de- 
cision variables. The telecommunications model con- 
sidered in [61] has 86 random point-to-point demand 
pairs and 89 links on which capacity may be installed. 
In [53] queueing networks with up to 50 nodes are 
studied. Each node represents a machine with random 
failures and has a decision variable denoting its as- 
signed cycle time. [53] also solves a stochastic PERT 
(program evaluation and review technique) problem 
with 70 nodes and 110 stochastic arcs. The arcs model 
the times required to complete activities and a deci- 
sion variable associated with each arc influences (pa- 
rameterizes) the distribution of the random activity du- 
ration. These problems contain objectives with high- 
dimensional expectations and all were solved using 
Monte-Carlo methods. 

In this article we discuss: 

i) several types of Monte-Carlo-based solution proce- 
dures that can be used for solving (SP); 

ii) methods for testing the quality of a candidate solution x ∈ X;

iii) variance reduction techniques used in stochastic 
optimization; and 
iv) theoretical justification for using sampling. 


Solution Procedures 


Monte-Carlo methods for approximately solving 
stochastic optimization problems can typically be clas- 
sified on the basis of whether the sampling is external 
to, or internal to, the optimization algorithm. Solu- 
tion procedures of both types are driven by estimates of 
objective function values and/or gradients. Before turning to solution procedures we briefly
discuss gradient estimation.

In stochastic programming, gradient (or subgradient) estimates of Ef(x, ξ) are typically
available via duality. In simulation-based optimization, the primary methods for obtaining
gradient estimates are finite differences, the likelihood ratio (LR) method (also called the
score function method) [29,57], and infinitesimal perturbation analysis (IPA) [27,35].
Finite-difference approximations require minimal structure, needing only estimates of
Ef(x, ξ); however, they result in solution procedures that can converge slowly. The LR method
is more widely applicable than IPA, but when both apply, the IPA approach tends to produce
estimators with lower variance. See, for example, [28] for a discussion of these issues.
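A toy comparison shows how the two estimators differ. The setting is an assumption made for illustration and is not taken from [27,29,35,57]. Let ξ = x + Z with Z ~ N(0, 1) and cost h(ξ) = ξ², so Ef = x² + 1 and the exact gradient is 2x.

- IPA differentiates along the sample path, giving 2(x + Z).
- LR (score function) multiplies the cost by the score of the N(x, 1) density, giving ξ²(ξ − x).

Both estimators are unbiased here. The sketch compares their sample variances.

```python
import random
from statistics import fmean, variance


def gradient_samples(x, n, seed=0):
    """Return (IPA, LR) samples of d/dx E[h(xi)], xi = x + Z, Z ~ N(0, 1), h(s) = s**2.

    The exact gradient is 2x because E[h(xi)] = x**2 + 1.
    """
    rng = random.Random(seed)
    ipa, lr = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        xi = x + z
        ipa.append(2.0 * xi)    # IPA: pathwise derivative h'(xi) * d(xi)/dx, with d(xi)/dx = 1
        lr.append(xi * xi * z)  # LR: h(xi) times the score of N(x, 1), which is xi - x = z
    return ipa, lr


ipa, lr = gradient_samples(x=1.5, n=50_000)
summary = {
    "IPA": (fmean(ipa), variance(ipa)),  # mean near 3, variance near 4
    "LR": (fmean(lr), variance(lr)),     # mean near 3, variance near 51.6
}
```

In this example the IPA variance is 4. The LR variance is x⁴ + 14x² + 15, about 51.6 at x = 1.5. This matches the remark that IPA tends to give lower-variance estimators when both methods apply.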

In the simplest form of ‘external sampling’ (also called ‘sample-path optimization’ [55] and the
‘stochastic counterpart’ method [57]) we generate independent and identically distributed
(i.i.d.) replicates ξ¹, ξ², …, ξⁿ from the distribution of ξ and form the approximating problem

$$(\mathrm{SP}_n)\qquad z_n^* = \min_{x \in X} \frac{1}{n} \sum_{i=1}^{n} f(x, \xi^i).$$


Even when it is possible to construct (SPₙ) using i.i.d. variates, it may be preferable to use
another sampling scheme in order to reduce the variance of the resulting estimators. Moreover,
in nonterminating simulation models, generating i.i.d. replicates from a stationary distribution
is often impossible (for exceptions see recent work on exact sampling, e.g., [3,22]). Under
appropriate conditions, however, we may run the simulation for a length n and replace the
objective function in (SPₙ) with a consistent estimate of the desired long-run average
performance measure.

After constructing an instance of (SPₙ) we employ a (deterministic) optimization algorithm to
obtain a solution xₙ*. In the case of stochastic linear programming, (SPₙ) is a large-scale
linear program. The cutting plane algorithm of R.M. Van Slyke and R.J-B. Wets [64], its variant
with a quadratic proximal term [58], and its multistage version [7,9] are powerful tools for
solving such problems. A cutting plane algorithm with a proximal term and IPA-based gradients is
used in an external sampling method for solving the queueing network problem in [53]. See [8]
for a recent survey of computational methods for stochastic programming instances of (SPₙ).
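Here is a minimal sketch of external sampling. It assumes a newsvendor cost f(x, ξ) = c·x − p·min(x, ξ) with hypothetical prices c = 1 and p = 3, and exponentially distributed demand with mean 10. None of this comes from the article. The sample-average objective of (SPₙ) is convex and piecewise linear, with kinks at the sampled demands. A brute-force search over x = 0 and the samples is therefore exact, and it stands in for the deterministic optimization algorithm.

```python
import random


def newsvendor_cost(x, demand, c=1.0, p=3.0):
    """Hypothetical f(x, xi): buy x units at unit cost c, sell min(x, xi) at price p."""
    return c * x - p * min(x, demand)


def solve_saa(samples, c=1.0, p=3.0):
    """Solve (SP_n): minimize (1/n) * sum_i f(x, xi^i) over x >= 0.

    The objective is convex and piecewise linear with breakpoints at the
    sampled demands, so an optimum lies at 0 or at one of the samples.
    """
    n = len(samples)
    best_x, best_val = None, float("inf")
    for x in [0.0] + list(samples):
        val = sum(newsvendor_cost(x, d, c, p) for d in samples) / n
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


rng = random.Random(1)
samples = [rng.expovariate(1 / 10.0) for _ in range(500)]  # demand with mean 10
x_n, z_n = solve_saa(samples)
```

Under these assumptions the solution of (SP) is the (p − c)/p = 2/3 quantile of the demand distribution, namely 10 ln 3 ≈ 10.99. So xₙ* should approach that value as n grows.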

Intuitively, we might expect solutions of (SPₙ) to more accurately approximate solutions of (SP)
as n increases. We discuss results supporting this in Sect. “Theoretical Justification for
Sampling”. In addition, after having solved (SPₙ) to obtain xₙ*, it would be desirable to know
whether n was ‘large enough’. More generally, we would like to be able to test the quality of a
candidate solution (such as xₙ*). This is discussed in the next section.

We now turn to solution procedures based on inter- 
nal sampling. These algorithms adapt deterministic op- 
timization algorithms by replacing exact function and 
gradient evaluations with Monte-Carlo estimates. The 
sampling is internal because new observations of ξ are
generated on an as-needed basis at each iteration of the 
algorithm. We briefly discuss stochastic adaptations of 
steepest descent and cutting plane methods. 

A deterministic steepest descent algorithm for (SP) forms iterates {xᵏ} using the recursion

$$x^{k+1} = \Pi_X\left[x^k - \rho^k \nabla Ef(x^k, \xi)\right],$$

where Π_X performs a projection onto X and {ρᵏ} are step lengths. It is usually impossible to
calculate ∇Ef(x, ξ) exactly, so it must be estimated. Stochastic approximation (SA) and
stochastic quasigradient (SQG) algorithms are stochastic variants of a steepest descent search.
The Kiefer-Wolfowitz SA method uses unbiased estimates of Ef(x, ξ) to form finite-difference
approximations of the gradient. The Robbins-Monro SA procedure requires unbiased estimates of
∇Ef(x, ξ). SQG methods do not require that Ef(x, ξ) be differentiable and work under more
general assumptions concerning the estimates of (sub)gradients of Ef(x, ξ). In particular, the
estimates need not be unbiased, but the bias must effectively shrink to zero as the algorithm
proceeds. For convergence properties of SA methods see [49] and for SQG procedures see [23].
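The projected recursion can be sketched as a Robbins-Monro procedure. The sketch rests on simplifying assumptions that are not from the article:

- X = [lo, hi] is an interval, so Π_X is clipping.
- f(x, ξ) = (x − ξ)² with ξ ~ N(3, 1).
- The unbiased gradient estimate is 2(x − ξ).
- The step lengths are ρᵏ = a/k, which satisfy Σρᵏ = ∞ and Σ(ρᵏ)² < ∞.

```python
import random


def projected_sa(x0, lo, hi, iterations, a=0.5, seed=0):
    """Projected stochastic approximation for min E[(x - xi)^2] over [lo, hi].

    Each iteration draws one xi ~ N(3, 1), forms the unbiased gradient
    estimate 2 * (x - xi), steps with length rho_k = a / k, and projects
    back onto [lo, hi] by clipping.
    """
    rng = random.Random(seed)
    x = x0
    for k in range(1, iterations + 1):
        xi = rng.gauss(3.0, 1.0)
        grad_estimate = 2.0 * (x - xi)
        rho = a / k
        x = min(hi, max(lo, x - rho * grad_estimate))  # Pi_X: clip to [lo, hi]
    return x


x_sa = projected_sa(x0=9.0, lo=0.0, hi=10.0, iterations=20_000)
```

With a = 0.5 the iterates reduce to a running average of the observations, so x_sa should be close to 3. If lo = 4 instead, the constraint binds and the projection keeps the iterates near 4.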

Cutting plane methods are applicable when Ef(x, ξ) is convex. The iterates {xᵏ} are found by
solving a sequence of optimization problems of the form

$$\min_{x \in X}\ \max_{\ell = 1, \dots, L} \left[ Ef(x^\ell, \xi) + \nabla Ef(x^\ell, \xi)\,(x - x^\ell) \right],$$

where L grows as the algorithm proceeds. At each iteration a first order Taylor approximation of
Ef(x, ξ), i.e., a cutting plane, is computed at the current iterate xᵏ and is used to refine the
piecewise-linear outer approximation of Ef(x, ξ). The key idea is that this approximation need
only be accurate in the neighborhood of an optimal solution. For stochastic linear programs,
G.B. Dantzig, P.W. Glynn [15], and G. Infanger [37,38] and J.L. Higle and S. Sen [32,34] have
developed Monte-Carlo-based cutting plane methods by using statistical estimates for the cut
intercepts and gradients. Dantzig, Glynn, and Infanger use separate streams of observations of ξ
to estimate each cut. The stochastic decomposition algorithm of Higle and Sen uses common random
number streams to calculate each cut and employs an updating procedure to ensure that the
statistical cuts are asymptotically valid (i.e., lie below Ef(x, ξ)). Relative to SA and SQG
methods, cutting plane procedures avoid potentially difficult projections and, in practice, have
a reputation for converging more quickly, particularly when X is high dimensional.
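Here is a one-dimensional sketch of a Kelley-style cutting plane loop with sampled cuts. The assumptions are illustrative and not taken from [15,32,34,37,38]:

- X = [lo, hi] and f(x, ξ) = (x − ξ)² with ξ ~ N(3, 1).
- Each cut uses a fresh batch of m observations, in the spirit of separate streams, to estimate the intercept Ef(xˡ, ξ) and the slope ∇Ef(xˡ, ξ).

The master problem minimizes the maximum of the cuts over X. It is solved by checking the interval endpoints and all pairwise intersections of cuts. This suffices because the upper envelope is convex and piecewise linear. The statistical cuts are only approximately valid, so the final iterate is only an approximate minimizer.

```python
import random


def solve_master(cuts, lo, hi):
    """Minimize max_l (alpha_l + beta_l * x) over the interval [lo, hi].

    The upper envelope of affine cuts is convex and piecewise linear, so its
    minimum is attained at an endpoint or where two cuts intersect.
    """
    candidates = [lo, hi]
    for i, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[i + 1:]:
            if b1 != b2:
                x = (a2 - a1) / (b1 - b2)
                if lo <= x <= hi:
                    candidates.append(x)

    def envelope(x):
        return max(a + b * x for a, b in cuts)

    return min(candidates, key=envelope)


def sampled_cutting_plane(lo, hi, iterations, m, seed=0):
    """Cutting planes for min E[(x - xi)^2], xi ~ N(3, 1), over [lo, hi],
    with each cut estimated from a fresh batch of m observations."""
    rng = random.Random(seed)
    x = lo
    cuts = []
    for _ in range(iterations):
        batch = [rng.gauss(3.0, 1.0) for _ in range(m)]
        value = sum((x - xi) ** 2 for xi in batch) / m    # estimate of Ef(x, xi)
        slope = sum(2.0 * (x - xi) for xi in batch) / m   # estimate of its derivative
        cuts.append((value - slope * x, slope))           # cut: value + slope * (y - x)
        x = solve_master(cuts, lo, hi)
    return x


x_cp = sampled_cutting_plane(lo=0.0, hi=10.0, iterations=25, m=5000)
```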

Grid search and optimization of metamodels are 
two common approaches to optimizing system per- 
formance in discrete-event simulation models. In grid 
search, X is replaced by a ‘grid’ of points Xₘ = {x¹, …, xᵐ} and sample-mean estimates

$$\bar f_n(x) = \frac{1}{n} \sum_{i=1}^{n} f(x, \xi^i)$$

are formed at each x ∈ Xₘ. (SP) is then approximately solved by minimizing f̄ₙ(x) over x ∈ Xₘ,
with the associated minimizer taken as the approximate solution. Grid search is attractive
because it requires minimal structure, but in implementing this procedure, we must exercise care
in selecting m and n.
With independent sampling at each grid point, K.B. En- 
sor and Glynn [21] consider the rate at which n must 
grow relative to m in order to achieve consistency and 
they also discuss the method’s limiting behavior when 
the rate of growth is at (and slower than) the critical 
rate. 
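Here is a minimal grid-search sketch with independent sampling at each grid point. The quadratic cost (x − ξ)² with ξ ~ N(3, 1), the grid, and the sample size n are illustrative assumptions.

```python
import random


def grid_search(grid, n, sample_cost, seed=0):
    """Estimate Ef(x, xi) at each grid point from an independent sample of
    size n; return the grid point with the smallest sample mean and all estimates."""
    rng = random.Random(seed)
    estimates = {}
    for x in grid:
        estimates[x] = sum(sample_cost(x, rng) for _ in range(n)) / n
    best = min(estimates, key=estimates.get)
    return best, estimates


def quadratic_cost(x, rng):
    """Hypothetical cost f(x, xi) = (x - xi)^2 with xi ~ N(3, 1)."""
    return (x - rng.gauss(3.0, 1.0)) ** 2


best_x, estimates = grid_search(grid=[0, 1, 2, 3, 4, 5, 6], n=2000, sample_cost=quadratic_cost)
```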

A metamodel can be used to approximate a more complex simulation model which, in turn, is an
approximation of the real system. In such a metamodel, estimates of Ef(x, ξ) are formed at each
point in a set specified by an experimental design, and the parameters of the postulated
response surface are fit to these observed values. The resulting function is then optimized
with respect to x. For more on metamodels see, e.g.,
[11,47]. The review in [25] includes optimization using 
response surfaces, and metamodeling has also been ap- 
plied in stochastic programming [5]. 
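A metamodel can be sketched by fitting a quadratic response surface b₀ + b₁x + b₂x² to simulated estimates at the design points by least squares, then minimizing the fitted surface. The following are assumptions: the design, the noise model (the same hypothetical cost (x − ξ)² with ξ ~ N(3, 1)), and the quadratic form. The helper `fit_quadratic` is a hypothetical name. It solves the 3 × 3 normal equations by Gaussian elimination.

```python
import random


def fit_quadratic(xs, ys):
    """Least-squares fit y ~ b0 + b1*x + b2*x^2 via the 3x3 normal equations."""
    # Normal equations M b = r with M[j][k] = sum x^(j+k) and r[j] = sum y * x^j.
    M = [[sum(x ** (j + k) for x in xs) for k in range(3)] for j in range(3)]
    r = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, 3):
            factor = M[row][col] / M[col][col]
            for k in range(col, 3):
                M[row][k] -= factor * M[col][k]
            r[row] -= factor * r[col]
    b = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):  # back substitution
        b[row] = (r[row] - sum(M[row][k] * b[k] for k in range(row + 1, 3))) / M[row][row]
    return b


rng = random.Random(0)
design = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical experimental design
n = 2000
responses = [sum((x - rng.gauss(3.0, 1.0)) ** 2 for _ in range(n)) / n for x in design]
b0, b1, b2 = fit_quadratic(design, responses)
x_meta = -b1 / (2.0 * b2)  # minimizer of the fitted surface (valid when b2 > 0)
```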

The grid-search and metamodel approaches are 
classified as external sampling procedures if the proce- 
dure is executed once. However, it may be desirable to 
refine the grid (or the region covered by the experimen- 
tal design) in the neighborhood of promising values of 
x and repeat the methodology. When it is adaptively re- 
peated in this fashion the procedure is classified as an 
internal sampling method. 

We have not explicitly discussed approaches for 
when X is discrete. These range from methods for se- 
lecting the best design in simulation to those for solv- 
ing stochastic integer programming models. Finally, 
sampling-based procedures for multistage stochastic 
programs have been proposed in [17]. 


Establishing Solution Quality 


Establishing solution quality is a key concept when us- 
ing an approximation scheme to solve an optimization 
problem. When applying Monte-Carlo techniques to 
(SP), the best we can expect are probabilistic quality 
statements. In the context of external sampling, there 
has been significant work on studying the behavior of 
solutions to (SPₙ) for large sample sizes (see the last section). There are analogous
convergence results for algorithms based on internal sampling. Such results take
a number of forms but perhaps the most fundamen- 
tal is to show that limit points of the sequence of so- 
lutions are, say, almost surely optimal to (SP). Next, it 
is desirable to have a statement regarding the rate of 
convergence and an associated asymptotic distribution. 
These consistency and limiting distribution results are 
aimed at justifying sampling-based methods and may 
be viewed as establishing solution quality. However, 
the approach discussed in this section centers on the 
question: Given a candidate solution x ∈ X, what can
be said regarding its quality? Because candidate solu- 
tions may be obtained by internal or external sampling 
schemes or via another, heuristic, method, procedures 
that can directly test the quality of x, regardless of its 
origin, are very attractive. 

One natural way of defining solution quality is by the optimality gap, Ef(x, ξ) − z*. An optimal
solution has an optimality gap of zero, but in our setting we hope to make probabilistic
statements such as

$$P\{Ef(x, \xi) - z^* \le \varepsilon\} \ge \alpha, \tag{2}$$

where ε is a random confidence interval width and α is a confidence level, e.g., α = 0.95.
Unfortunately, exact confidence intervals such as (2) can be difficult to obtain even in
relatively simple statistical settings, so we attempt to construct approximate confidence
intervals

$$P\{Ef(x, \xi) - z^* \le \varepsilon\} \approx \alpha. \tag{3}$$

To form a confidence interval (3) for Ef(x, ξ) − z*, we estimate the mean of a gap random
variable Gₙ = Uₙ − Lₙ. It is expressed as the difference between upper and lower bound
estimators and satisfies EGₙ ≥ Ef(x, ξ) − z*.

In many problems it is relatively straightforward to estimate the performance of a suboptimal
decision x via simulation. For example, the standard sample mean estimator

$$U_n = \frac{1}{n} \sum_{i=1}^{n} f(x, \xi^i)$$

provides an unbiased estimate of the expected cost of using decision x, i.e., Ef(x, ξ).

To construct a confidence interval for the optimality gap we also want an estimate of z*.
However, unbiased estimates of z* are difficult to obtain, so an estimator Lₙ that satisfies
ELₙ ≤ z* is used. In [51] it is shown that if the objective in (SPₙ) is an unbiased estimate of
Ef(x, ξ), then Ezₙ* ≤ z*, i.e., zₙ* is one possible lower bound estimator Lₙ. Higle and Sen [33]
perform a Lagrangian relaxation of a reformulation of (SPₙ) which uses explicit
‘nonanticipativity’ constraints. The resulting lower bound is weaker in expectation than zₙ*, but
it has the computational advantage that the optimization problem separates by scenario.

Once observations of Gₙ can be formed, we can appeal to the batch means method and use the
central limit theorem [51], or a nonparametric approach [31,33], to construct approximate
confidence intervals (3). Another approach to examining solution quality is to test the null
hypothesis that the (generalized) Karush-Kuhn-Tucker (KKT) optimality conditions are satisfied;
see [63]. Higle and Sen [31] also consider the KKT conditions but use them to derive bounds on the
optimality gap.
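Here is a minimal sketch of a batch approach in the spirit of [51]. The labeled assumptions are the candidate x̂, the newsvendor cost (hypothetical prices c = 1 and p = 3 with exponential demand of mean 10), the batch size, and the number of batches.

Each batch uses the same observations (common random numbers) for both Uₙ = (1/n)Σf(x̂, ξⁱ) and Lₙ = zₙ*. As a result, every gap observation Gₙ = Uₙ − Lₙ is nonnegative. The one-sided bound uses a normal quantile as an approximation. A Student t quantile is the more usual choice when there are few batches.

```python
import random
from statistics import NormalDist, fmean, stdev


def cost(x, demand, c=1.0, p=3.0):
    """Hypothetical newsvendor cost f(x, xi) = c*x - p*min(x, xi)."""
    return c * x - p * min(x, demand)


def saa_value(samples):
    """z_n^*: optimal value of the sample-average problem over x >= 0.

    The objective is convex and piecewise linear with kinks at the samples,
    so its minimum is attained at 0 or at one of the sampled demands.
    """
    n = len(samples)
    return min(sum(cost(x, d) for d in samples) / n for x in [0.0] + samples)


def gap_confidence_bound(x_hat, batches, n, alpha=0.95, seed=0):
    """Return the gap observations, their mean, and an approximate one-sided
    upper confidence bound on Ef(x_hat, xi) - z* (normal approximation)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(batches):
        samples = [rng.expovariate(0.1) for _ in range(n)]  # demand with mean 10
        upper = sum(cost(x_hat, d) for d in samples) / n      # U_n at x_hat
        lower = saa_value(samples)                            # L_n = z_n^*, same samples
        gaps.append(upper - lower)                            # G_n = U_n - L_n >= 0
    mean_gap = fmean(gaps)
    half_width = NormalDist().inv_cdf(alpha) * stdev(gaps) / batches ** 0.5
    return gaps, mean_gap, mean_gap + half_width


gaps, mean_gap, eps = gap_confidence_bound(x_hat=8.0, batches=20, n=200)
```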


Variance Reduction Techniques 


When applying the ‘crude’ Monte-Carlo method to estimate Ef(x, ξ) for fixed x, we use the
standard sample mean estimator based on i.i.d. terms,

$$\frac{1}{n} \sum_{i=1}^{n} f(x, \xi^i).$$

The error associated with this estimate is proportional to

$$\frac{\sqrt{\operatorname{var} f(x, \xi)}}{\sqrt{n}}. \tag{4}$$


This error can be decreased by increasing the sample 
size. However, obtaining an additional digit of accuracy 
requires increasing the sample size by a factor of 100. 
If f is defined as the optimal value of a mathematical 
program or as the performance measure of a simula- 
tion model, increasing the number of evaluations of f in 
this fashion can be prohibitively expensive. Variance re- 
duction techniques (VRTs) effectively decrease the nu- 
merator in (4) instead of increasing the denominator. 
Many problems for which crude Monte-Carlo would 
yield useless results are instead made computationally 
tractable via VRTs. As described in Sect. “Solution Procedures”, sampling is also used to
estimate ∇Ef(x, ξ), but for simplicity we primarily restrict our attention to VRTs for
estimating Ef(x, ξ).

Some VRTs, including control variates (CVs) and importance sampling (IS), exploit special
structures of f(x, ξ). Suppose that we have Γₓ(ξ), with known mean μ_Γ, which is believed to
approximate (be positively correlated with) f(x, ξ).

In CVs we attempt to ‘subtract out’ variation by generating observations of
[f(x, ξ) − Γₓ(ξ)] + μ_Γ, which has the same expectation as f(x, ξ). It is common to incorporate a
multiplicative factor with the control variate Γₓ(ξ), and it is also possible to use multiple
controls. In CVs, observations of ξ are generated from its original distribution.

In IS we attempt to reduce variance by generating observations of μ_Γ [f(x, ξ)/Γₓ(ξ)]. Here the
expected value of the ratio is not the ratio of expectations, so a change of measure induced by
Γₓ is required to yield an unbiased estimate. Under the new IS distribution, we are more likely to
sample ξ where Γₓ(ξ) is large, i.e., scenarios that our approximation function predicts have high
cost. In an IS scheme for
stochastic linear programs, [15,37] use an approximation function that is separable in the
components of ξ, while [48] utilizes a piecewise-linear approximation.
See [12] for the solution of a stochastic optimization 
problem to price American-style financial options us- 
ing the simpler European option as a control variate. 
These papers report significant variance reduction in 
computational results. 
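Here is a small control-variate sketch under illustrative assumptions: ξ ~ U(0, 1), the 'cost' f(ξ) = e^ξ (so the target is e − 1), and the control Γ(ξ) = 1 + ξ with known mean μ_Γ = 1.5. The code compares the sample variance of crude observations f(ξ) with that of the control-variate observations [f(ξ) − Γ(ξ)] + μ_Γ. It does not use the optional multiplicative factor.

```python
import math
import random
from statistics import fmean, variance


def crude_and_cv(n, seed=0):
    """Return crude observations f(xi) and control-variate observations
    f(xi) - gamma(xi) + mu_gamma for f(xi) = exp(xi), xi ~ U(0, 1)."""
    rng = random.Random(seed)
    mu_gamma = 1.5                       # known mean of the control 1 + xi
    crude, cv = [], []
    for _ in range(n):
        xi = rng.random()                # xi ~ U(0, 1)
        f = math.exp(xi)
        gamma = 1.0 + xi                 # control, positively correlated with f
        crude.append(f)
        cv.append(f - gamma + mu_gamma)  # same expectation as f
    return crude, cv


crude, cv = crude_and_cv(10_000)
print(fmean(crude), variance(crude))  # variance near 0.242
print(fmean(cv), variance(cv))        # variance near 0.044
```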

Other VRTs exploit correlation structures in the solution methodology. Common random numbers
(CRNs)
are often used in simulation when comparing the per- 
formance of two systems. The use of CRNs has been 
suggested in a stochastic approximation method with 
finite differences where the same stream is used for the 
forward and backward point estimates [50]. The upper 
and lower bounds used to determine solution quality 
(see the previous section) may be viewed as two ‘sys- 
tems’ and the use of CRNs in estimating their difference 
has been advocated in [34,51]. In order to reduce the error in the resulting response surface,
various methods have been proposed for generating the streams of observations of ξ at each
point in the experimental design.
The Schruben-Margolin scheme [59] uses a mixture of 
CRNs and antithetic variates and an extension [65] also 
incorporates CVs. 
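The use of CRNs in a finite-difference gradient estimate can be shown with a toy setup. It is an assumption, not taken from [50]: the cost is f(x, ξ) = (x − ξ)² with ξ ~ N(3, 1), and central differences use step h. The forward and backward points use either one shared observation of ξ or two independent ones.

```python
import random
from statistics import variance


def central_difference_samples(x, h, n, common, seed=0):
    """Central-difference gradient estimates of E[(x - xi)^2], xi ~ N(3, 1).

    If common is True, the forward and backward points share the same
    observation of xi (common random numbers); otherwise they use
    independent observations.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n):
        xi_plus = rng.gauss(3.0, 1.0)
        xi_minus = xi_plus if common else rng.gauss(3.0, 1.0)
        forward = (x + h - xi_plus) ** 2
        backward = (x - h - xi_minus) ** 2
        estimates.append((forward - backward) / (2.0 * h))
    return estimates


crn = central_difference_samples(x=1.0, h=0.1, n=5000, common=True)
ind = central_difference_samples(x=1.0, h=0.1, n=5000, common=False)
```

With a shared observation the difference quotient reduces exactly to 2(x − ξ). Its variance is 4 and does not depend on h. With independent observations the variance grows like 1/h².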

Another group of VRTs attempts to spread the sampled observations more regularly over the
support of ξ. Such techniques include stratified sampling and Latin hypercube sampling as well as
quasi-Monte-Carlo techniques in which the sequence of observations is deterministic. Empirical
results in [30] for two-
stage stochastic linear programming compare the vari- 
ance reduction obtained by stratified sampling, anti- 
thetic variates, IS, and CVs and suggest that a CV pro- 
cedure performs relatively well, particularly on high- 
variance problems. 
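Here is a one-dimensional stratified sampling sketch under illustrative assumptions. The quantity estimated is E e^U for U ~ U(0, 1), whose exact value is e − 1. Each of n equal-width strata receives exactly one uniform draw. Repeating both estimators many times shows how much stratification reduces their spread.

```python
import math
import random
from statistics import variance


def crude_estimate(n, rng):
    """Crude Monte-Carlo estimate of E[exp(U)] from n i.i.d. uniforms."""
    return sum(math.exp(rng.random()) for _ in range(n)) / n


def stratified_estimate(n, rng):
    """One uniform draw in each stratum [i/n, (i+1)/n), i = 0, ..., n-1."""
    return sum(math.exp((i + rng.random()) / n) for i in range(n)) / n


rng = random.Random(0)
crude_runs = [crude_estimate(100, rng) for _ in range(300)]
strat_runs = [stratified_estimate(100, rng) for _ in range(300)]
```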


Theoretical Justification for Sampling 


In Sect. “Solution Procedures” we formed an approximating problem for external sampling
procedures by using the sample mean estimator of Ef(x, ξ). Here we redefine (SPₙ) as

$$(\mathrm{SP}_n)\qquad z_n^* = \min_{x \in X} E_n f(x, \xi),$$

with xₙ* again denoting an optimal solution. In (SP) the expected value operator E is with
respect to the ‘true’ probability measure P, while in (SPₙ), Eₙ is with respect to a measure Pₙ
that is a statistical estimate of P. If Monte-Carlo methods are used to generate i.i.d.
replicates from P, then Pₙ is the associated (random) empirical measure.

Since zₙ* is an estimator of z* and xₙ* an estimator of an optimal solution to (SP), it is
natural to study the behavior of these estimators for large sample sizes. For example, under
what conditions do we obtain consistency
and what can be said concerning rates of convergence? 
Positive answers to such questions provide theoretical 
justification for employing external Monte-Carlo sam- 
pling techniques to solve (SP). 

In general, (SPₙ) and (SP) may have multiple optimal solutions, so we cannot expect {xₙ*} to
converge. Instead, establishing consistency of xₙ* amounts to showing that the accumulation
points of the sequence are almost surely optimal to (SP). If, for example, the samples are
i.i.d., then by the strong law of large numbers we have Eₙ f(x, ξ) → Ef(x, ξ), a.s., for all x.
Unfortunately, this does not ensure that {xₙ*} has accumulation points that are optimal to (SP)
and that zₙ* → z*, a.s. [4].

The notion of epiconvergence plays a fundamental role in establishing consistency results for
xₙ* and zₙ*; see [4]. A sequence of functions {φₙ} is said to epi-converge to φ (written
φₙ →ᵉᵖⁱ φ) if the epigraphs of φₙ, {(x, β): β ≥ φₙ(x)}, converge to that of φ. Epiconvergence
is weaker than classical uniform convergence. P. Kall [41] provides an excellent review of
various types of convergence, their relations, and their implications for approximations of
optimization models. Epiconvergence is a valuable property because of the following result:


Theorem 1 Suppose φₙ →ᵉᵖⁱ φ. If x̄ is an accumulation point of {xₙ}, where
xₙ ∈ argmin φₙ(x), then x̄ ∈ argmin φ(x).


Constrained optimization is captured in this result because φₙ and φ are defined to be
extended-real-valued functions that take value +∞ at infeasible points. The sequence of
optimizers {xₙ*} may have no accumulation points. This potential difficulty is avoided if the
feasible region X is compact (i.e., closed and bounded).

Because of the implications of epiconvergence, there is considerable interest in determining
sufficient conditions on f, Pₙ, and P under which Eₙ f(x, ξ) →ᵉᵖⁱ Ef(x, ξ), a.s. Note that
because {Pₙ} are random measures, the epiconvergence of the approximating functions is with
probability one (also called epiconsistency). Under this hypothesis the accumulation points of
{xₙ*} are almost surely optimal to (SP); see [19].


Sufficient conditions for achieving Eₙ f(x, ξ) →ᵉᵖⁱ Ef(x, ξ), a.s., are examined in
[19,42,55], and [56]. Roughly speaking, we will obtain epiconsistency under three conditions:
f is sufficiently smooth, Pₙ converges weakly to P with probability one, and the tails of the
distributions are well-behaved relative to f. See [2,60] for results when f is discontinuous.

For two-stage stochastic programming in which the recourse matrix W in (1) is deterministic and
Pₙ is the empirical measure, [46] contains consistency results under modest assumptions. We note
that it is possible to develop consistency results using other (stronger) types of convergence
of Eₙ f(x, ξ) to Ef(x, ξ); see, for example, [52].

There is a large literature on consistency, stability, and rates of convergence for solutions of
(SPₙ). Much of this work may be viewed as generalizing earlier results on constrained maximum
likelihood estimation in [1] and [36]. Under restrictive assumptions, asymptotic normality for
√n(zₙ* − z*) and √n(xₙ* − x*) may be obtained, e.g., [19]. However, when inequality constraints
in X play a nontrivial role we cannot, in general, expect to obtain limiting distributions that
are normal [18,44,62]. See [44] for a limiting distribution for √n(xₙ* − x*) that is the
solution of a (random) quadratic program.


See also 


▶ Monte-Carlo Simulated Annealing in Protein Folding
Motzkin’s transposition theorem (MTT) [1] is a so-called theorem of the alternative (cf.
▶ Linear Optimization: Theorems of the Alternative). It deals with the question whether or not a
given system of linear inequalities has a solution. In the most general case such a system has
the form

$$(\mathrm{S})\qquad Ax \ge a, \quad Bx > b,$$

where A and B are matrices of size m × n and p × n, respectively. Here Ax ≥ a contains the
‘larger than or equal’ inequalities and Bx > b the ‘larger than’ inequalities. Note that
inequalities of the opposite type (‘smaller than or equal’ or ‘smaller than’) can be turned into
the appropriate form by multiplying them by −1.

The Motzkin transposition theorem states that the system (S) has no solution if and only if at
least one of the systems (T1) and (T2) has a solution, where the latter systems are given by

$$(\mathrm{T1})\qquad y^T A + v^T B = 0, \quad y^T a + v^T b > 0, \quad y \ge 0, \quad v \ge 0,$$

and

$$(\mathrm{T2})\qquad y^T A + v^T B = 0, \quad y^T a + v^T b \ge 0, \quad y \ge 0, \quad v \ge 0, \quad v \ne 0,$$

respectively.


In other words, when one has a solution of (T1) or 
of (T2) this solution is a certificate for the fact that the 
given system (S) is infeasible, i. e., has no solution. 
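Checking such a certificate needs only exact arithmetic. The sketch below is illustrative and its helper names are hypothetical. It tests whether given nonnegative multipliers y and v satisfy (T1) or (T2). Using integers or `fractions.Fraction` avoids floating-point round-off in the equality and sign tests.

```python
from fractions import Fraction


def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))


def combine(y, A, v, B, n):
    """Row vector y^T A + v^T B of length n (A or B may have no rows)."""
    return [dot(y, [row[j] for row in A]) + dot(v, [row[j] for row in B])
            for j in range(n)]


def mtt_certificate(A, a, B, b, y, v, n):
    """Return 'T1' or 'T2' if (y, v) certifies that (S): Ax >= a, Bx > b
    is infeasible, else None. All data should be exact (ints or Fractions)."""
    if any(t < 0 for t in y) or any(t < 0 for t in v):
        return None
    if any(t != 0 for t in combine(y, A, v, B, n)):
        return None
    rhs = dot(y, a) + dot(v, b)
    if rhs > 0:
        return "T1"
    if rhs >= 0 and any(t != 0 for t in v):
        return "T2"
    return None


# x >= 1/2 (written 2x >= 1) together with x < 1/2 (written -x > -1/2) is infeasible:
print(mtt_certificate([[2]], [1], [[-1]], [Fraction(-1, 2)], [Fraction(1, 2)], [1], n=1))  # T2
```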

Two very useful principles follow from the theorem.


Theorem 1 (Principle A) The system (S) is infeasible if and only if one can combine the
inequalities in (S) in a linear fashion (i.e., multiply each inequality by a nonnegative number
and add the results) to get the contradictory inequality 0 > 0 (or 0 ≥ 1).


To see that this is exactly what the MTT says, let y and v denote nonnegative vectors of
appropriate sizes. Then the inequality

$$(y^T A + v^T B)\,x \ge y^T a + v^T b \tag{1}$$

is a consequence of the inequalities in (S). If the vector v is not the zero vector, then the
stronger inequality

$$(y^T A + v^T B)\,x > y^T a + v^T b \tag{2}$$

is also a consequence of (S). The inequalities (1) and (2) certainly have solutions if
yᵀA + vᵀB ≠ 0. But if yᵀA + vᵀB = 0, then (1) yields a contradiction if yᵀa + vᵀb > 0, and (2)
yields one if yᵀa + vᵀb ≥ 0. The first case occurs if (T1) has a solution and the second case if
(T2) has a solution.
The second principle is:


Theorem 2 (Principle B) If (S) is feasible, then a linear 
inequality is a consequence of the inequalities in (S) if 
and only if it can be obtained by combining, in a linear 
fashion, the inequalities in (S) and the trivial inequality 
0 > −1.


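This principle can be understood in a similar way. If (S) is feasible, then cᵀx ≥ z is an implied
inequality if and only if

$$Ax \ge a,\ Bx > b \;\Longrightarrow\; c^T x \ge z,$$

which is equivalent to the system

$$Ax \ge a, \quad Bx > b, \quad -c^T x > -z$$

being infeasible. By Principle A this happens if and only if there exist nonnegative vectors y
and v and a nonnegative scalar λ such that the combined inequality

$$(y^T A + v^T B - \lambda c^T)\,x \ge y^T a + v^T b - \lambda z$$

(which is strict whenever v ≠ 0 or λ > 0) is contradictory. Since (S) is feasible, no
contradiction can be derived from the inequalities of (S) alone, so we must have λ > 0. The
combined inequality is then strict. It is contradictory exactly when yᵀA + vᵀB − λcᵀ = 0 and
yᵀa + vᵀb − λz ≥ 0. Without loss of generality we may assume λ = 1. Then cᵀ = yᵀA + vᵀB and
z ≤ yᵀa + vᵀb. So cᵀx ≥ z is obtained by adding (yᵀa + vᵀb − z) times the trivial inequality
0 > −1 to the combination (yᵀA + vᵀB)x ≥ yᵀa + vᵀb. This proves the claim.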

The above principles are highly nontrivial and very deep. Consider, e.g., the following system
of 4 inequalities with two variables u, v:

$$-1 \le u \le 1, \qquad -1 \le v \le 1.$$

From these inequalities it follows that

$$u^2 + v^2 \le 2,$$

which in turn implies, by the Cauchy inequality, the inequality u + v ≤ 2:

$$u + v = 1 \cdot u + 1 \cdot v \le \sqrt{1^2 + 1^2}\,\sqrt{u^2 + v^2} \le \sqrt{2} \cdot \sqrt{2} = 2.$$

The concluding inequality is linear and is a consequence of the original system, but the above
derivation is ‘highly nonlinear’. It is not at all clear a priori why the same inequality can
also be obtained from the given system in a linear manner, as stated by Principle B. Of course,
it can: it suffices to add the inequalities u ≤ 1 and v ≤ 1.
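The linear derivation can be checked mechanically. In this sketch the function name `implies` is hypothetical. The box constraints are written in the '≥' form of (S), with the strict part Bx > b empty, and the target u + v ≤ 2 is written as −u − v ≥ −2. The multipliers y = (0, 1, 0, 1) select the inequalities −u ≥ −1 and −v ≥ −1, whose sum is the target. The test z ≤ yᵀa accounts for adding a nonnegative multiple of the trivial inequality 0 > −1.

```python
def implies(A, a, c, z, y):
    """Return True if multipliers y >= 0 show that c^T x >= z follows from
    Ax >= a, i.e. c^T = y^T A and z <= y^T a (no strict inequalities here)."""
    if any(t < 0 for t in y):
        return False
    combo = [sum(yi * row[j] for yi, row in zip(y, A)) for j in range(len(c))]
    return combo == list(c) and z <= sum(yi * ai for yi, ai in zip(y, a))


# -1 <= u <= 1 and -1 <= v <= 1 as Ax >= a with x = (u, v).
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
a = [-1, -1, -1, -1]
print(implies(A, a, c=[-1, -1], z=-2, y=[0, 1, 0, 1]))  # True
```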


The MTT is one of the deepest results in the part of mathematics dealing with linear inequalities
and, in fact, is logically equivalent to other deep results in this discipline. For example, it
is equivalent to the duality theorem for linear optimization (cf. ▶ Linear Programming). To
demonstrate this, consider the linear optimization problem


$$(\mathrm{P})\qquad \min\{c^T x : Ax \ge b\}.$$

Let z* denote the optimal value of (P), where we take z* = −∞ if (P) is unbounded and z* = +∞ if
(P) is infeasible. Now, a real z is a lower bound on the optimal value of (P) if and only if
cᵀx ≥ z is a consequence of Ax ≥ b. Equivalently, this holds if and only if the system of linear
inequalities

$$(\mathrm{S}_z)\qquad Ax \ge b, \quad -c^T x > -z$$


has no solutions. By the MTT this is the case if and only if at least one of the systems

$$(\mathrm{T1}_z)\qquad y^T A - y_0 c^T = 0, \quad y^T b - y_0 z > 0, \quad y \ge 0, \quad y_0 \ge 0$$

and

$$(\mathrm{T2}_z)\qquad y^T A - y_0 c^T = 0, \quad y^T b - y_0 z \ge 0, \quad y \ge 0, \quad y_0 > 0$$

has a solution. A solution of (T1_z) with y₀ > 0 also solves (T2_z), so we may restrict (T1_z) to
y₀ = 0. Also, since the system (T2_z) is homogeneous, without loss of generality we may take
y₀ = 1. Thus z is a lower bound on the optimal value of (P) if and only if one of the following
two systems has a solution:

$$(\mathrm{T1}'_z)\qquad y^T A = 0, \quad y^T b > 0, \quad y \ge 0$$

and

$$(\mathrm{T2}'_z)\qquad y^T A = c^T, \quad y^T b \ge z, \quad y \ge 0.$$

Observe that z does not appear in (T1′_z). Therefore, if this system has a solution, then every
real z is a lower bound on the optimal value of (P), and this occurs if and only if (P) is
infeasible. Assuming that (P) is feasible, it follows that z is a lower bound on the optimal
value of (P) if and only if the system (T2′_z) has a solution. Given a solution y of (T2′_z), any z
satisfying yᵀb ≥ z is a lower bound, and the largest lower bound provided in this way is yᵀb.
Hence, the largest possible lower bound on the optimal value of (P) is the optimal value of the
problem

$$(\mathrm{D})\qquad \max\{b^T y : y^T A = c^T,\ y \ge 0\}.$$

If the problem (P) is unbounded, i.e., if there is no lower bound on the optimal value of (P),
then the problem (D) must be infeasible. Otherwise the optimal value of (D) must coincide with
the optimal value of (P).

The problem (D) is called the dual problem of the primal problem (P). The above findings can be
summarized as follows:


if one of the two problems (P) and (D) is unbounded, then the other is infeasible; if both
problems are feasible, then they both have an optimal solution and the optimal values are the
same.


This is the duality theorem for linear optimization. Note that one other case may occur, namely
that both problems are infeasible. It was shown above that (P) is infeasible if and only if
(T1′_z) has a solution, so


the primal problem (P) is infeasible if and only if there exists a dual ray y, i.e., a vector y
such that

$$y^T A = 0, \quad y^T b > 0, \quad y \ge 0. \tag{3}$$

In fact, the latter statement is equivalent to the statement that (3) and Ax ≥ b are alternative
systems. This is the special case of the MTT occurring when B is vacuous, and it is known as
Farkas’ lemma. (See ▶ Linear Optimization: Theorems of the Alternative and ▶ Farkas Lemma.) In
just the same way it can be derived from a variant of Farkas’ lemma that:


the dual problem (D) is infeasible if and only if there exists a primal ray x, i.e., a vector x
such that

$$Ax \ge 0, \quad c^T x < 0. \tag{4}$$
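The duality statements can be checked on a small instance chosen for illustration. By weak duality, cᵀx = yᵀAx ≥ yᵀb for every primal feasible x and dual feasible y. A feasible pair with equal objective values is therefore optimal for both (P) and (D). The helper `is_optimal_pair` and the instance are hypothetical. Exact fractions keep the equality tests reliable.

```python
from fractions import Fraction as F


def is_optimal_pair(A, b, c, x, y):
    """Check Ax >= b, y^T A = c^T with y >= 0, and c^T x = b^T y.

    By weak duality these conditions certify that x solves (P) and y solves (D).
    Returns (certified, c^T x, b^T y).
    """
    m, n = len(A), len(c)
    primal_ok = all(sum(A[i][j] * x[j] for j in range(n)) >= b[i] for i in range(m))
    dual_ok = all(t >= 0 for t in y) and all(
        sum(y[i] * A[i][j] for i in range(m)) == c[j] for j in range(n))
    cx = sum(cj * xj for cj, xj in zip(c, x))
    by = sum(bi * yi for bi, yi in zip(b, y))
    return (primal_ok and dual_ok and cx == by), cx, by


# min x1 + x2  subject to  x1 + 2 x2 >= 2,  2 x1 + x2 >= 2,  x1 >= 0,  x2 >= 0
A = [[1, 2], [2, 1], [1, 0], [0, 1]]
b = [2, 2, 0, 0]
c = [1, 1]
ok, cx, by = is_optimal_pair(A, b, c, x=[F(2, 3), F(2, 3)], y=[F(1, 3), F(1, 3), 0, 0])
```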


It has been shown above that the MTT implies the duality theorem for linear optimization. The
converse is also true: assuming the duality theorem for linear optimization, the MTT can easily
be proved, so the two results are logically equivalent. This goes in two steps. Assuming the
duality theorem for linear optimization, one first derives Farkas’ lemma and then shows that the
MTT follows. To derive Farkas’ lemma, consider the problem

$$\min\{0^T x : Ax \ge b\}.$$

Clearly, the system Ax ≥ b has a solution if and only if the optimal value of this problem is
zero. By the duality theorem this holds if and only if the optimal value of the dual problem

$$\max\{b^T y : y^T A = 0,\ y \ge 0\}$$

is also zero. This holds if and only if

$$y^T A = 0,\ y \ge 0 \;\Longrightarrow\; b^T y \le 0,$$

which is true if and only if the system

$$y^T A = 0, \quad y \ge 0, \quad b^T y > 0$$

has no solution, proving Farkas’ lemma.
To prove the MTT, one derives from Farkas’ lemma that the ‘weaker’ system

$$(\mathrm{S1})\qquad Ax \ge a, \quad Bx \ge b$$

is infeasible if and only if the system (T1) has a solution. If (S1) is feasible, then one easily
verifies that (S) has no solution if and only if the optimal value of the problem

$$(\mathrm{P1})\qquad \min\{v : Ax \ge a,\ Bx + v e \ge b\}$$

is a nonnegative real. Here e denotes the all-one vector. Since (P1) is feasible, by the duality
theorem this happens if and only if the optimal value of the dual problem

$$(\mathrm{D1})\qquad \max\{a^T y + b^T v : y^T A + v^T B = 0,\ e^T v = 1,\ y \ge 0,\ v \ge 0\}$$

is a nonnegative real. Finally, this occurs if and only if (T2) has a solution. Thus the MTT is
logically equivalent to the duality theorem for linear optimization.
So far the issue of how to prove the MTT has not been touched. One possible approach is to prove
the duality theorem for linear optimization and then derive the MTT in the way described above.
This approach is now quite popular in textbooks; for a recent example see, e.g., [2]. The easiest
way to give a direct proof is to first prove Farkas’ lemma and then derive the MTT from this
lemma. The latter step uses the easily verified statement that (S) has no solution if and only if
the system

$$Ax - ta \ge 0, \quad Bx - tb - se \ge 0, \quad t - s \ge 0, \quad s > 0$$

has no solution. Application of a suitable variant of Farkas’ lemma to this system yields the
MTT. Farkas’ lemma and its proof have a rich history; for a nice and detailed survey one might
consult [3].


See also 


▶ Farkas Lemma
▶ Linear Optimization: Theorems of the Alternative
▶ Linear Programming
▶ Minimum Concave Transportation Problems
▶ Multi-index Transportation Problems
▶ Stochastic Transportation and Location Problems
▶ Tucker Homogeneous Systems of Linear Relations
Introduction


Data classification is a supervised learning strategy that analyzes the organization and
categorization of data in distinct classes [14]. Classification methods generally use a training
set in which all objects are already associated with known class labels. The data
classification algorithm works on this set by using input attributes and builds a model to
classify new objects. In other words, the algorithm predicts the values of an output attribute,
which is categorical [4]. Data classification is an important problem with applications in a
diverse set of areas, including finance [6,14], health care [14], sports [14], engineering
[10,14], science [10] and bioinformatics.

A broad range of methods exists for the data classification problem, including Decision Tree Induction [14], Bayesian Classifiers [14], Neural Networks (NN) [10], Support Vector Machines (SVM) [10] and Mathematical Programming (MP) [1]. An overview of classification methods is given by Weiss and Kulikowski [21]. A neural network is a data structure that attempts to simulate the behavior of neurons in a biological brain [14]. A major shortcoming of the neural network approach is the lack of explanation of the constructed model. Other important handicaps of neural network-based methods are the possibility of a nonconvergent solution due to a wrong choice of initial weights and the possibility of a non-optimal solution due to local minima. The SVM approach operates by finding a hypersurface that splits the classes so that the distance between the hypersurface and the nearest points in the groups is as large as possible [19]. The main goal is to generate a separating hypersurface that maximizes the margin and produces good generalization ability [10]. In recent years, SVM has been considered one of the most efficient methods for two-class classification problems [5]. However, the SVM method has two important drawbacks for multi-class classification problems: a combination of SVMs has to be used to solve a multi-class problem, and approximation algorithms are needed to reduce the computational time of SVM when learning from large-scale data.

There have been numerous attempts to solve classification problems using mathematical programming. A survey of classification methods using mathematical programming is published by Joachimsthaler and Stam [11]. The mathematical programming approach to data classification was first introduced in the early 1980s, and numerous mathematical programming models have appeared in the literature since then. As a complement to these, Erenguc and Koehler provide a comprehensive review [7]. Many distinct mathematical programming methods with different objective functions have been developed. These include minimizing the maximum exterior deviation, minimizing the weighted sum of exterior deviations, minimizing a measure of exterior deviations while maximizing a measure of interior deviations, minimizing the number of misclassifications, and minimizing a generalized distance measure. Most of these methods model data classification as linear programming (LP) problems that optimize a distance function. In contrast to the LP formulations, mixed-integer linear programming (MILP) problems that minimize the misclassifications on the design data set have also been widely studied [7]. Mathematical programming methods have certain advantages over parametric ones. For instance, they are free from parametric assumptions and from weights to be adjusted. Moreover, varied objectives and more complex problem formulations can easily be accommodated. On the other hand, solutions without any discriminating power, unbounded solutions and excessive computational effort are some of the problems of mathematical programming-based methods. Koehler [12] surveys the potential problems in mathematical programming formulations. There have been several attempts to formulate data classification problems as MILP problems [2,8,13,15]. Since MILP methods suffer from computational difficulties, the efforts have mainly focused on efficient solutions for two-group supervised classification problems. Although a multi-class data classification problem can be solved by means of several two-group problems, such approaches also have drawbacks, including computational complexity that results in long computational times [16].


MILP Formulation 


The objective in data classification is to assign data points that are described by several attributes to a predefined number of classes. Using hyper-boxes to define the boundaries of the sets that include all or some of the points in each set, as shown in Fig. 1, can be very accurate on multi-class problems. If necessary, more than one hyper-box can be used to represent a class, as also shown in Fig. 1: if the classes indicated by the square and circle data points were each represented by a single hyper-box, the boundaries of these hyper-boxes would overlap. Thus, two boxes are constructed in order to eliminate this overlap. A very important consideration in using hyper-boxes is the number of boxes used to define a class. If the total number of hyper-boxes equals the number of classes, the data classification is very efficient. On the other hand, if a class has as many hyper-boxes as it has data points, the data classification is inefficient.
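The basic geometric test behind this idea is whether a point lies inside a hyper-box, that is, within the lower and upper bound of every attribute. The following minimal Python sketch assumes each box is stored as a list of (lower, upper) pairs, one per attribute; the names `in_box` and `boxes_per_class` are illustrative only and not part of the original method.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one (lower, upper) pair per attribute.
Box = List[Tuple[float, float]]


def in_box(point: Sequence[float], box: Box) -> bool:
    """Return True if the point lies within the bounds of every attribute."""
    return all(lo <= a <= up for a, (lo, up) in zip(point, box))


def boxes_per_class(box_class: Dict[str, str]) -> Dict[str, int]:
    """Count the hyper-boxes used for each class (one per class is the most compact case)."""
    counts: Dict[str, int] = {}
    for cls in box_class.values():
        counts[cls] = counts.get(cls, 0) + 1
    return counts


box = [(0.0, 2.0), (1.0, 3.0)]
print(in_box((1.0, 2.5), box))  # True
print(in_box((2.5, 2.0), box))  # False
print(boxes_per_class({"B1": "Class1", "B2": "Class1", "B3": "Class2"}))
```

Storing the bounds per attribute keeps the membership test a single pass over the attributes, which is also how the bounds are indexed in the MILP model.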


Multi-Class Data Classification via Mixed-Integer Optimization, Figure 1
Schematic representation of multi-class data classification using hyper-boxes
The data classification problem based on this idea is built in two parts: training and testing. The goals of the training part are to determine the characteristics of the data points that belong to a certain class and to differentiate them from the data points that belong to other classes. Thus, the boundaries of the classes are formed by constructing hyper-boxes in the training step. After the distinguishing characteristics of the classes are determined, the effectiveness of the classification must be tested: the predictive accuracy of the developed model is evaluated on a test data set during the testing part.


Training Problem Formulation 


The training part is performed on a training data set composed of a number of instances $i$. The data points are represented by the parameter $a_{im}$, which denotes the value of attribute $m$ for instance $i$. The class $k$ to which data point $i$ belongs is given by the set $D_{ik}$. Each existing hyper-box $l$ encloses a number of data points belonging to class $k$. The bounds $n$ (lower, upper) of each hyper-box are determined by solving the training problem.

Given these parameters and sets, the following variables are sufficient to model the data classification problem with hyper-boxes. The binary variable $yb_l$ indicates whether box $l$ is used or not. The position (inside or outside) of data point $i$ with regard to box $l$ is represented by $ypb_{il}$. The assigned class $k$ of box $l$ and of data point $i$ is symbolized by $ybc_{lk}$ and $ypc_{ik}$, respectively. If data point $i$ is within bound $n$ with respect to attribute $m$ of box $l$, the binary variable $ypbn_{ilmn}$ takes the value 1, otherwise 0. Similarly, $ypbm_{ilm}$ indicates whether data point $i$ is within the bounds of attribute $m$ of box $l$ or not. Finally, $yp_{ik}$ indicates the misclassification of data points. Two continuous variables are required to define the boundaries of the hyper-boxes: $X_{lmn}$ models bound $n$ of box $l$ on attribute $m$, and, correspondingly, $XD_{lkmn}$ models bound $n$ of box $l$ of class $k$ on attribute $m$.

The following MILP problem models the training part of the data classification method using hyper-boxes:


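$$\min z = \sum_i \sum_k yp_{ik} + c \sum_l yb_l \qquad (1)$$

subject to

$$XD_{lkmn} \le a_{im}\, ypb_{il} \qquad \forall i,k,l,m,n \mid n = lo \qquad (2)$$
$$XD_{lkmn} \ge a_{im}\, ypb_{il} \qquad \forall i,k,l,m,n \mid n = up \qquad (3)$$
$$XD_{lkmn} \le Q\, ybc_{lk} \qquad \forall k,l,m,n \qquad (4)$$
$$\sum_k XD_{lkmn} = X_{lmn} \qquad \forall l,m,n \qquad (5)$$
$$ypbn_{ilmn} \ge (1/Q)\,(X_{lmn} - a_{im}) \qquad \forall i,l,m,n \mid n = up \qquad (6)$$
$$ypbn_{ilmn} \ge (1/Q)\,(a_{im} - X_{lmn}) \qquad \forall i,l,m,n \mid n = lo \qquad (7)$$
$$\sum_l ypb_{il} = 1 \qquad \forall i \qquad (8)$$
$$\sum_k ypc_{ik} = 1 \qquad \forall i \qquad (9)$$
$$\sum_l ypb_{il} = \sum_k ypc_{ik} \qquad \forall i \qquad (10)$$
$$\sum_k ybc_{lk} \le yb_l \qquad \forall l \qquad (11)$$
$$ybc_{lk} - \sum_i ypb_{il} \le 0 \qquad \forall l,k \qquad (12)$$
$$ybc_{lk} - \sum_i ypc_{ik} \le 0 \qquad \forall l,k \qquad (13)$$
$$\sum_n ypbn_{ilmn} - ypbm_{ilm} \le N - 1 \qquad \forall i,l,m \qquad (14)$$
$$\sum_m ypbm_{ilm} - ypb_{il} \le M - 1 \qquad \forall i,l \qquad (15)$$
$$ypc_{ik} - yp_{ik} \le 0 \qquad \forall i,k \notin D_{ik} \qquad (16)$$
$$X_{lmn},\ XD_{lkmn} \ge 0;\qquad yb_l,\ ybc_{lk},\ ypb_{il},\ ypc_{ik},\ ypbn_{ilmn},\ ypbm_{ilm},\ yp_{ik} \in \{0,1\} \qquad (17)$$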


The objective function of the MILP problem (Eq. (1)) minimizes the misclassifications in the data set with the minimum number of hyper-boxes. To eliminate the unnecessary use of hyper-boxes, the existence of each box is penalized with a small scalar $c$ in the objective function. The lower and upper bounds of the boxes are given in Eqs. (2) and (3), respectively; they are determined by the data points that are enclosed within the hyper-box. Eq. (4) ensures that the bounds of a hyper-box can take nonzero values only if this hyper-box is assigned to a class. Eq. (5) relates the two continuous variables that represent the bounds of the hyper-boxes. The position of a data point with respect to the bounds on attribute $m$ of a hyper-box is given in Eqs. (6) and (7): the binary variable $ypbn_{ilmn}$ helps to identify whether data point $i$ is within hyper-box $l$, and two constraints, one for the lower bound and one for the upper bound, are needed for this purpose. Since these constraints relate continuous and binary variables, an arbitrarily large parameter $Q$ is included in them. Eqs. (8) and (9) state that every data point must be assigned to a single hyper-box $l$ and a single class $k$, respectively. Eq. (10) links Eqs. (8) and (9), indicating that if there is a data point in class $k$, then there must be a hyper-box $l$ to represent class $k$, and vice versa. The existence of a hyper-box implies the assignment of that hyper-box to a class, as shown in Eq. (11). If a class is represented by a hyper-box, there must be at least one data point within that hyper-box, as in Eq. (12). In the same manner, if a hyper-box represents a class, there must be at least one data point within that class, as given in Eq. (13). Eq. (14) represents the condition of a data point being within the bounds of a box in attribute $m$, where $N$ is the number of bounds per attribute (lower and upper). If a data point is within the bounds of all $M$ attributes of a box, then it must be in the box, as shown in Eq. (15). When a data point is assigned to a class that it is not a member of, a penalty applies, as indicated in Eq. (16). Finally, the last constraint, Eq. (17), states the non-negativity and integrality of the decision variables. Using this MILP formulation, a training set can be studied and the bounds of the classes determined for a data classification problem.
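To make the roles of the objective and of the bound constraints concrete, the following Python sketch evaluates a candidate training solution rather than solving the MILP, which would require an MILP solver. It assumes each point has already been assigned to a box and each box to a class. The bounds are then taken as the tightest values compatible with Eqs. (2) and (3), namely the per-attribute minimum and maximum of the enclosed points, and the objective follows Eq. (1). The function names are hypothetical, and the sketch does not check constraints (14) and (15), which additionally force a point lying inside a box to be assigned to it.

```python
from typing import Dict, List, Sequence, Tuple


def box_bounds(points: List[Sequence[float]]) -> List[Tuple[float, float]]:
    """Tightest (lower, upper) bounds per attribute enclosing the given points."""
    return [(min(col), max(col)) for col in zip(*points)]


def objective(assign_box: Dict[int, int], box_class: Dict[int, str],
              true_class: Dict[int, str], c: float) -> float:
    """Eq. (1): number of misclassified points plus c times the number of boxes used."""
    misclassified = sum(1 for i, l in assign_box.items()
                        if box_class[l] != true_class[i])
    boxes_used = len(set(assign_box.values()))
    return misclassified + c * boxes_used


points = {0: (1.0, 2.0), 1: (3.0, 1.0), 2: (2.0, 4.0)}
assign_box = {0: 0, 1: 0, 2: 1}          # point -> box
box_class = {0: "A", 1: "B"}             # box -> class
true_class = {0: "A", 1: "A", 2: "A"}    # point -> known class
print(box_bounds([points[i] for i in (0, 1)]))  # [(1.0, 3.0), (1.0, 2.0)]
print(objective(assign_box, box_class, true_class, c=0.01))
```

In this toy case point 2 sits in a box labeled B although it belongs to class A, so the objective is one misclassification plus the small penalty for two boxes; a smaller $c$ makes the box count matter only as a tie-breaker.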


Testing Problem Formulation 


The testing problem for multi-class data classification using hyper-boxes is straightforward. When a new data point whose class membership is not known arrives, it must be assigned to one of the classes. There are three possibilities for a new data point when determining its class:

i. the new data point is within the boundaries of a single hyper-box,

ii. the new data point is within the boundaries of more than one hyper-box,

iii. the new data point is not enclosed in any of the hyper-boxes determined in the training problem.
When the first possibility is realized, the classification is made by directly assigning the new data point to the class represented by the hyper-box that encloses it. A new data point can also fall within the boundaries of more than one hyper-box, because eliminating the shared areas between the constructed hyper-boxes would introduce new constraints into the training problem that make it computationally very difficult to solve. In that case, the data point is assigned to the classes of all the hyper-boxes that enclose it, and the ratio of the number of correct classes to the total number of classes assigned to that data point determines its contribution to the accuracy of the model. When the third possibility applies, assigning the new data point to a class requires some analysis. If the data point is within the lower and upper bounds of all but one of the attributes (say $m'$) defining the box, then the shortest distance between the new point and the hyper-box is calculated as the minimum distance between the new data point and the hyper-planes defining the hyper-box. Since the minimum distance to a hyper-plane is measured along its normal, the minimum distance between the new data point $j$ and the hyper-box is calculated using Eq. (18):

$$\min_{l,m,n} \left| a_{jm} - X_{lmn} \right| \qquad (18)$$
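A minimal Python sketch of Eq. (18) for a single box follows. It assumes, as the text does for this case, that the point violates the bounds of exactly one attribute $m'$, and it therefore takes the minimum over the two bounds of that attribute only, since the distance to the box is measured along the normal of the hyper-planes of $m'$. The helper names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # (lower, upper) bound per attribute


def outside_attributes(point: Sequence[float], box: Box) -> List[int]:
    """Indices m of the attributes whose bounds do not enclose the point."""
    return [m for m, (a, (lo, up)) in enumerate(zip(point, box))
            if not lo <= a <= up]


def hyperplane_distance(point: Sequence[float], box: Box) -> float:
    """Eq. (18) for one box when exactly one attribute m' is out of bounds."""
    out = outside_attributes(point, box)
    if len(out) != 1:
        raise ValueError("Eq. (18) is used when exactly one attribute is out of bounds")
    m = out[0]
    lo, up = box[m]
    # Distance along the normal of the bounding hyper-planes X = lo and X = up.
    return min(abs(point[m] - lo), abs(point[m] - up))


print(hyperplane_distance((1.0, 3.5), [(0.0, 2.0), (0.0, 2.0)]))  # 1.5
```

Taking the minimum of this value over all boxes $l$ for which the point is out of bounds in a single attribute gives the closest box in this case.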


When the data point is within the bounds of at most $M-2$ attributes, the smallest distance between the point and the hyper-box is obtained by calculating the minimum distance between the new point and the edges of the hyper-box. An edge is a finite segment consisting of the points of a line that lie between two extreme points of the hyper-box. The data point $j$ is represented by the vector $\vec{A}_j$, which is composed of the values $a_{jm}$, and $\vec{P0}_{lmn}$ and $\vec{P1}_{lmn}$ are the vector forms of the two extreme points of an edge. The minimum distance between the new data point $j$ and one of the segments of the hyper-box determined by two extreme points is calculated using Eq. (25), where $(\cdot)$ indicates the dot product of the vectors in Eqs. (21) and (22), and $Pb_{jlmn,m'}$ denotes component $m'$ of $\vec{Pb}_{jlmn}$.

$$\vec{W}_{jlmn} = \vec{A}_j - \vec{P0}_{lmn} \qquad (19)$$
$$\vec{V}_{jlmn} = \vec{P1}_{lmn} - \vec{P0}_{lmn} \qquad (20)$$
$$c1_{jlmn} = \vec{W}_{jlmn} \cdot \vec{V}_{jlmn} \qquad (21)$$
$$c2_{jlmn} = \vec{V}_{jlmn} \cdot \vec{V}_{jlmn} \qquad (22)$$
$$b_{jlmn} = \frac{c1_{jlmn}}{c2_{jlmn}} \qquad (23)$$
$$\vec{Pb}_{jlmn} = \vec{P0}_{lmn} + b_{jlmn}\, \vec{V}_{jlmn} \qquad (24)$$
$$\min_{l,m,n} \sqrt{\sum_{m'} \bigl(a_{jm'} - Pb_{jlmn,m'}\bigr)^2} \qquad (25)$$
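Eqs. (19) to (24) are the usual projection of a point onto the line through two extreme points. The Python sketch below follows them step by step; as an added assumption not stated in the text, it clamps $b_{jlmn}$ to $[0, 1]$ so that the projected point stays on the finite edge rather than on the infinite line. The function name is hypothetical.

```python
import math
from typing import Sequence


def segment_distance(a: Sequence[float], p0: Sequence[float],
                     p1: Sequence[float]) -> float:
    """Distance from point A_j to the edge between extreme points P0 and P1."""
    w = [ai - qi for ai, qi in zip(a, p0)]          # Eq. (19): W = A_j - P0
    v = [ri - qi for ri, qi in zip(p1, p0)]         # Eq. (20): V = P1 - P0
    c1 = sum(wi * vi for wi, vi in zip(w, v))       # Eq. (21): W . V
    c2 = sum(vi * vi for vi in v)                   # Eq. (22): V . V
    b = c1 / c2 if c2 > 0 else 0.0                  # Eq. (23)
    b = min(max(b, 0.0), 1.0)                       # assumption: stay on the edge
    pb = [qi + b * vi for qi, vi in zip(p0, v)]     # Eq. (24): Pb = P0 + b V
    return math.dist(a, pb)                         # Euclidean norm in Eq. (25)


print(segment_distance((1.0, 1.0), (0.0, 0.0), (2.0, 0.0)))  # 1.0
print(segment_distance((3.0, 4.0), (0.0, 0.0), (2.0, 0.0)))  # approximately 4.1231, i.e. sqrt(17)
```

In the second call the unclamped projection parameter would be 1.5, beyond the end point $P1$; clamping makes the distance the one to $P1$ itself.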
When the data point is not within the lower and upper bounds of any attribute defining the box, the shortest distance between the new point and the hyper-box is calculated as the minimum distance between the new data point and the extreme points of the hyper-box. The minimum distance between the new data point $j$ and one of the extreme points of the hyper-box is calculated using Eq. (26):

$$\min_{l,n} \sqrt{\sum_m \left(a_{jm} - X_{lmn}\right)^2} \qquad (26)$$

The following algorithm assigns a new data point $j$ with attribute values $a_{jm}$ to a class $k$:


Step 0: Initialize $inAtt(l) = 0$ for every hyper-box $l$.
Step 1: For each $l$ and $m$, if

$$X_{lmn} \le a_{jm} \le X_{lmn'} \qquad n = lo,\ n' = up \qquad (27)$$

set $inAtt(l) = inAtt(l) + 1$.
Step 2: If $inAtt(l) = M$ for some hyper-box $l$, go to Step 3. Otherwise ($inAtt(l) \le M - 1$ for every $l$), go to Step 4.
Step 3: Assign the new data point to class $k$ where $ybc_{lk}$ is equal to 1 for the hyper-box $l$ found in Step 2. Stop.
Step 4: Calculate the minimum given by Eq. (18) and set it as $min1(l)$. Calculate the minimum given by Eq. (25) and set it as $min2(l)$. Calculate the minimum given by Eq. (26) and set it as $min3(l)$. Select the minimum among $min1(l)$, $min2(l)$ and $min3(l)$ to determine the hyper-box $l$ that is closest to the new data point $j$. Assign the new data point to class $k$ where $ybc_{lk}$ is equal to 1 for this hyper-box $l$. Stop.
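The assignment procedure can be sketched in Python as follows. Steps 0 to 3 are implemented as written: $inAtt(l)$ counts the attributes whose bounds enclose the point, and every box with $inAtt(l) = M$ contributes its class, which also covers the second possibility (several enclosing boxes). For Step 4 the sketch swaps in a different technique: instead of computing $min1(l)$, $min2(l)$ and $min3(l)$ separately, it uses the Euclidean distance from the point to the nearest point of each box, obtained by clamping each coordinate to the box bounds. In two dimensions this coincides with Eq. (18) when exactly one attribute is out of bounds and with the distance to the nearest corner when both are. All names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = List[Tuple[float, float]]  # (lower, upper) bound per attribute


def count_inside(point: Sequence[float], box: Box) -> int:
    """Steps 0-1: inAtt(l), the number of attributes whose bounds enclose the point."""
    return sum(1 for a, (lo, up) in zip(point, box) if lo <= a <= up)


def box_distance(point: Sequence[float], box: Box) -> float:
    """Distance to the nearest point of the box (each coordinate clamped to its bounds)."""
    return math.sqrt(sum(max(lo - a, 0.0, a - up) ** 2
                         for a, (lo, up) in zip(point, box)))


def assign(point: Sequence[float], boxes: Dict[str, Box],
           box_class: Dict[str, str]) -> List[str]:
    """Return the class(es) assigned to a new data point."""
    M = len(point)
    # Steps 2-3: boxes with inAtt(l) = M enclose the point.
    enclosing = [l for l, box in boxes.items() if count_inside(point, box) == M]
    if enclosing:
        return sorted({box_class[l] for l in enclosing})
    # Step 4 (substituted distance): the closest box determines the class.
    closest = min(boxes, key=lambda l: box_distance(point, boxes[l]))
    return [box_class[closest]]


boxes = {"B1": [(0.0, 2.0), (0.0, 2.0)],
         "B2": [(3.0, 5.0), (0.0, 2.0)],
         "B3": [(1.5, 4.0), (1.0, 3.0)]}
box_class = {"B1": "Class1", "B2": "Class2", "B3": "Class3"}
print(assign((1.0, 0.5), boxes, box_class))  # ['Class1']
print(assign((1.8, 1.5), boxes, box_class))  # ['Class1', 'Class3']
print(assign((6.0, 1.0), boxes, box_class))  # ['Class2']
```

The second call illustrates the overlap case: the point lies in both B1 and B3, so both classes are returned and the point would receive partial credit in the accuracy count.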


Application 


We applied the mathematical programming method to a set of data points from 4 different classes, given in Fig. 2. The data points are described by two attributes, 1 and 2.

There are 20 data points in total; 16 of them were used for training and 4 for testing. The training problem classified the data into 4 classes using 5 hyper-boxes, as shown in Fig. 3. Interestingly, Class1 requires two hyper-boxes, while each of the other classes is represented by a single hyper-box. Class1 needs two hyper-boxes because a single hyper-box for this class would include one of the data points that belong to Class3. To eliminate this inconsistency in the training data set, the method added one more box for Class1.

Multi-Class Data Classification via Mixed-Integer Optimization, Figure 2
Data points in the illustrative example and their graphical representation

Multi-Class Data Classification via Mixed-Integer Optimization, Figure 3
Hyper-boxes that classify the data points in the illustrative example

After the training is successfully completed, the test data points are assigned to the hyper-boxes. The assignment of test data point B to Class2 is straightforward, since it is included in the hyper-box that classifies Class2 (i.e., $inAtt(l) = M$ for this data point). The test data point in Class1 is assigned to one of the hyper-boxes that classify Class1. Similarly, the test data point in Class3 is assigned to the hyper-box that classifies Class3. Since these two test data points are within the bounds of one of the two attributes, the minimum distance is calculated along the normal to the closest hyper-plane. The test data point that belongs to Class4 is assigned to its correct class because the closest extreme point belongs to the hyper-box that classifies Class4. This extreme point of hyper-box 5 is given by $(X_{5,1,lo}, X_{5,2,lo})$. The test problem thus classified the test data points with 100% accuracy, as shown in Fig. 3.

This illustrative example was also tested with different data classification models from the literature in order to compare the results and measure the performance of the proposed model. Table 1 shows the examined models and their outcomes for this small illustrative example.

Multi-Class Data Classification via Mixed-Integer Optimization, Table 1
Comparison of different classification models for the illustrative example

Classification Model                      Misclassified Sample(s)   Prediction Accuracy
Neural Networks (a)                       A                         75%
Support Vector Machines (b)               D                         75%
Bayesian Classifier (c)                   C                         75%
K-nearest Neighbor Classifier             A                         75%
Statistical Regression Classifiers (c)    C                         75%
Decision Tree Classifier (c)              A, C                      50%
MILP approach                             none                      100%

(a) iDA implementation in MS Excel [9]; (b) SVM implementation in Matlab [3]; (c) WEKA [20]

Neural Networks, Support Vector Machines, Bayesian, K-nearest Neighbor and Statistical Regression classifiers each have only one misclassified instance, which leads to an accuracy of 75%, as shown in Table 1. Neural Networks and the K-nearest Neighbor classifier predict the class of test sample A as Class3. The Support Vector Machine method misclassifies test sample D and assigns it to Class1, while the Bayesian and Statistical Regression classifiers classify test sample C as belonging to Class2. The Decision Tree classifier gives the lowest accuracy (50%), with two misclassifications: samples A and C are classified as Class3 and Class2, respectively. In contrast, the MILP approach classifies all of the test samples correctly and achieves 100% accuracy. As a result, the MILP approach performs better than the other data classification methods listed in Table 1 for the illustrative example. The accuracy of the MILP approach has also been tested on the IRIS data set and on a protein folding type data set; the results indicate that the MILP approach has better accuracy than other methods on these data sets [17,18].
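The accuracy figures in Table 1 are simple ratios over the four test samples: one misclassification gives 3/4 = 75% and two give 2/4 = 50%. The sketch below computes this and also applies the rule stated for the testing problem, under which a point enclosed by several hyper-boxes contributes the share of its assigned classes that are correct. The function names and the example labels are hypothetical.

```python
from typing import List


def point_credit(assigned: List[str], true_class: str) -> float:
    """Share of the classes assigned to a test point that are correct."""
    return sum(1 for k in assigned if k == true_class) / len(assigned)


def accuracy(assignments: List[List[str]], truth: List[str]) -> float:
    """Prediction accuracy in percent over a test set."""
    total = sum(point_credit(a, t) for a, t in zip(assignments, truth))
    return 100.0 * total / len(truth)


truth = ["c1", "c2", "c3", "c4"]
print(accuracy([["c1"], ["c2"], ["c2"], ["c4"]], truth))  # 75.0
print(accuracy([["c3"], ["c2"], ["c2"], ["c4"]], truth))  # 50.0
print(accuracy([["c1", "c3"]], ["c1"]))                   # 50.0
```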


Conclusion 


The multi-class data classification problem can be modeled very effectively as an MILP problem. One of the most important characteristics of the MILP approach is that it allows hyper-boxes to be used to define the boundaries of the classes that enclose all or some of the points in each set. In other words, if necessary, more than one hyper-box is constructed for a specific class during training. Moreover, well-constructed class boundaries avoid misclassifications in the training set and indirectly improve the accuracy of the model. The model does not require knowledge of the underlying distribution of the training data set and learns from the training set in a reasonable time. With only one parameter ($c$, the penalty parameter used to minimize the total number of hyper-boxes), the suggested model is simple and very effective. Furthermore, the proposed model can be used for both binary and multi-class data classification problems without any modifications; hence, its performance does not depend on class-related changes.
Linear multicommodity flow problems (MCF) are linear programs (LPs) that can be characterized by a set of commodities and an underlying network. A commodity is a good that must be transported from one or more origin nodes to one or more destination nodes in the network. In practice these commodities might be telephone calls in a telecommunications network, packages in a distribution network, or airplanes in an airline flight network. Each commodity has a unique set of characteristics, and the commodities are not interchangeable; that is, demand for one commodity cannot be satisfied with another commodity. The objective of the MCF problem is to flow the commodities through the network at minimum cost without exceeding arc capacities. A comprehensive survey of linear multicommodity flow models and solution procedures is presented in [2].

Integer multicommodity flow (IMCF) problems are a constrained version of the linear multicommodity flow problem in which the flow of a commodity (specified in this case by an origin-destination pair) may use only one path from origin to destination.

MCF and IMCF problems are prevalent in a number of application contexts, including transportation, communication and production.


MCF Example Applications 


• Routing vehicles in traffic networks (dynamic traffic assignment). This involves determining minimum delay routes for vehicles from their origins to their respective destinations over the traffic network. The allowable congestion levels determine the arc capacities. Alternatively, there are no capacities, but the cost on an arc is a function of the amount of flow on the arc. In the former case the objective function is linear, while in the latter it is nonlinear.

• Distribution systems planning. In this problem there are different products (or commodities) produced at several plants with known production capacities. Each commodity has a certain demand in each customer zone. The demand is satisfied by shipping via regional distribution centers with finite storage capacities. A.M. Geoffrion and G.W. Graves [28] model this problem of routing the commodities from the manufacturing plants to the customer zones through the distribution centers as an MCF problem.

• Import and export models. One of the factors that may affect exports is the handling capacity at ports. D. Barnett, J. Binkley and B. McCarl [8] use an MCF model to analyze the effect of US port capacities on the export of wheat, corn and soybeans.

• Optimization of freight operations. T. Crainic, J.A. Ferland and J.M. Rousseau [20] develop an MCF-based routing and scheduling optimization model that considers the planning issues of the railroad industry. More recently, H.N. Newton [48] and C. Barnhart, H. Jin and P.H. Vance [13] study the railroad blocking problem using multicommodity-based formulations.

• Freight assignment in the less-than-truckload (LTL) industry. An LTL carrier has to consolidate many shipments to make economic use of its vehicles. This requires the establishment of a large number of terminals to sort freight. Trucking companies use forecasted demands to define routes for each vehicle to carry freight to and from the terminals. Once the routes are fixed, the problem is to deliver all the shipments with minimum total service time or cost. This problem is formulated as an MCF problem in [17] and [24].

• Express shipment delivery. D. Kim [40] models the shipment delivery problem faced by express carriers such as Federal Express, the United States Postal Service and United Parcel Service as an MCF problem on a network in space and time.

• Routing messages in a telecommunications or computer network. The network consists of transmission lines, and each message request is a commodity. The problem is to route the messages from their origins to their respective destinations at minimum cost. T.L. Magnanti et al. [42] and others provide MCF-based formulations for this problem.




• Long-term hydro-generation optimization. The task in this case is to determine the amount of hydro-generation at a reservoir in each time interval that minimizes the expected cost of power generation over a period of time divided into several intervals. N. Nabonna [47] showed that this problem can be modeled as an MCF problem with inflows given as probability density functions.

• Forest management. For each planning period, forest managers have to make decisions concerning the land areas to be harvested, the volume of timber to be harvested from these areas, the land areas to be developed for recreation, and the road network to be built and maintained to support both the timber haulers and recreationists. This problem has been formulated as an MCF problem in [33].

• Street planning. L.R. Foulds [26] introduced this problem and modeled it as an MCF problem. The objective is to identify a set of two-way streets such that making these streets one-way minimizes the total congestion cost in the network.

• Spatial price equilibrium (SPE) problem. This problem requires modeling consumer flows within a general network. The SPE problem determines the optimum levels of production and consumption at each market, and the optimal flows satisfy the equilibrium property. R.S. Segall [59] models and solves the SPE problem as an MCF problem.

For a more comprehensive description of MCF applications, see [2,37,57].


IMCF Example Applications 


• Airline fleet assignment. Given a timetable of flight arrivals and departures, the expected demand on the flights and a set of aircraft, the objective is to find a minimum cost assignment of aircraft to the flights. This problem has been extensively studied in [1,31].

• Airline crew scheduling. This problem deals with the minimum cost scheduling of crews. Factors such as limits on working hours and Federal Aviation Administration regulations must be taken into account while solving the problem. For an in-depth study see [5,14].

• Airline maintenance routing problems require that individual aircraft be routed such that maintenance requirements are satisfied and each flight is assigned to exactly one aircraft. This problem has been studied in [10,19,25].

• Bandwidth packing problems require that bandwidth be allocated in telecommunications networks to maximize total revenue. The demands, or calls, on the networks are the commodities, and the objective is to route the calls from their origins to their destinations. In the case of video teleconferencing, since call splitting is not allowed, each call must be routed on exactly one network path. This IMCF problem is described in [49].

• Package flow problems, such as those arising in express package delivery operations, require that shipments, each with a specific origin and destination, be routed over a transportation network. Each set of packages with a common origin-destination pair can be considered as a commodity and, to facilitate operations and ensure customer satisfaction, often must be assigned to a single network path. These problems are cast as IMCF problems in [12].


Formulations 


Multicommodity flow problems can be modeled in a number of ways, depending on how one defines a commodity. There are three major options: a commodity may originate at a subset of nodes in the network and be destined for another subset of nodes, it may originate at a single node and be destined for a subset of the nodes, or it may originate at a single node and be destined for a single node. K.L. Jones et al. [34] present models for each of these cases. In the interest of space, we only consider models for the last case; the other cases can be modeled using variants of the models presented here.

We present two different formulations of the MCF problem: the node-arc or conventional formulation, and the path or column generation formulation. The MCF is defined over the network $G$ comprised of node set $N$ and arc set $A$. MCF contains decision variables $x_{ij}^k$, where $x_{ij}^k$ is the fraction of the total quantity (denoted $q^k$) of commodity $k$ assigned to arc $ij$. In the IMCF problem these variables are restricted to be binary. The cost of assigning commodity $k$ in its entirety to arc $ij$ equals $q^k$ times the unit flow cost for arc $ij$, denoted $c_{ij}^k$. Arc $ij$ has capacity $d_{ij}$, for all $ij \in A$. Node $i$ has supply of commodity $k$, denoted $b_i^k$, equal to 1 if $i$ is the origin node for $k$, equal to $-1$ if $i$ is the destination node for $k$, and equal to 0 otherwise.

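The node-arc MCF formulation is:

$$\text{minimize} \quad \sum_{k \in K} \sum_{ij \in A} c_{ij}^k\, q^k\, x_{ij}^k \qquad (1)$$

such that

$$\sum_{ij \in A} x_{ij}^k - \sum_{ji \in A} x_{ji}^k = b_i^k \qquad \forall i \in N,\ \forall k \in K \qquad (2)$$
$$\sum_{k \in K} q^k\, x_{ij}^k \le d_{ij} \qquad \forall ij \in A \qquad (3)$$
$$x_{ij}^k \ge 0 \qquad \forall ij \in A,\ \forall k \in K \qquad (4)$$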


Note that, without loss of generality, we model the arc flow variables $x_{ij}^k$ as taking values between 0 and 1. To do this, we scale the demand for each commodity to 1 and adjust the coefficients in the objective function (1) and in constraints (3) accordingly. Also note the block-angular structure of this model: the flow conservation constraints (2) form nonoverlapping blocks, one for each commodity, and only the arc capacity constraints (3) link the values of the flow variables of different commodities.
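To illustrate the node-arc model, the Python sketch below checks whether given scaled arc flows $x_{ij}^k$ satisfy the conservation constraints (2), the bundle constraints (3) and nonnegativity (4), and evaluates objective (1). It assumes each commodity has a single origin and a single destination, as in the case considered here; the data layout and function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Arc = Tuple[str, str]


def is_feasible(x: Dict[str, Dict[Arc, float]], q: Dict[str, float],
                od: Dict[str, Tuple[str, str]], cap: Dict[Arc, float],
                nodes: Iterable[str], tol: float = 1e-9) -> bool:
    node_list = list(nodes)
    for k, flows in x.items():
        origin, dest = od[k]
        for i in node_list:
            # Supply b_i^k: 1 at the origin, -1 at the destination, 0 elsewhere.
            b = 1.0 if i == origin else (-1.0 if i == dest else 0.0)
            out_flow = sum(v for (u, _), v in flows.items() if u == i)
            in_flow = sum(v for (_, w), v in flows.items() if w == i)
            if abs(out_flow - in_flow - b) > tol:      # Eq. (2)
                return False
    for arc, d in cap.items():                         # Eq. (3): bundle constraint
        if sum(q[k] * x[k].get(arc, 0.0) for k in x) > d + tol:
            return False
    return all(v >= -tol for f in x.values() for v in f.values())  # Eq. (4)


def total_cost(x: Dict[str, Dict[Arc, float]], q: Dict[str, float],
               c: Dict[str, Dict[Arc, float]]) -> float:
    """Objective (1): sum over k and ij of c_ij^k q^k x_ij^k."""
    return sum(c[k][arc] * q[k] * v for k, f in x.items() for arc, v in f.items())


x = {"k1": {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}}
q = {"k1": 2.0}
od = {"k1": ("s", "t")}
cap = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "t"): 1.0}
c = {"k1": {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "t"): 1.0}}
print(is_feasible(x, q, od, cap, ["s", "a", "t"]))  # True
print(total_cost(x, q, c))                          # 3.0
```

Because the conservation check loops over one commodity at a time while only the capacity check mixes commodities, the code mirrors the block-angular structure described above.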

In contrast, the path-based or column generation MCF formulation has fewer constraints and far more variables. Again, the underlying network $G$ is comprised of node set $N$ and arc set $A$, with $q^k$ representing the quantity of commodity $k$. $P(k)$ represents the set of all origin-destination paths in $G$ for $k$, for all $k \in K$. In the column generation model, the decision variables are denoted $y_p^k$, where $y_p^k$ is the fraction of the total flow of commodity $k$ assigned to path $p \in P(k)$; in the IMCF problem they are binary. The cost of assigning commodity $k$ in its entirety to path $p$ equals $q^k$ times the unit flow cost for path $p$, denoted $c_p^k$, where $c_p^k$ is the sum of the $c_{ij}^k$ costs for all arcs $ij$ contained in path $p$. As before, arc $ij$ has capacity $d_{ij}$, for all $ij \in A$. Finally, $\delta_{ij}^p$ is equal to 1 if arc $ij$ is contained in path $p \in P(k)$, for all $k \in K$, and is equal to 0 otherwise.

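The path or column generation formulation is then:

$$\text{minimize} \quad \sum_{k \in K} \sum_{p \in P(k)} c_p^k\, q^k\, y_p^k \qquad (5)$$

such that

$$\sum_{k \in K} \sum_{p \in P(k)} q^k\, y_p^k\, \delta_{ij}^p \le d_{ij} \qquad \forall ij \in A \qquad (6)$$
$$\sum_{p \in P(k)} y_p^k = 1 \qquad \forall k \in K \qquad (7)$$
$$y_p^k \ge 0 \qquad \forall p \in P(k),\ \forall k \in K \qquad (8)$$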
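The link between the two formulations can be shown with a short Python sketch: a path solution $y_p^k$ for one commodity maps to arc flows through $x_{ij}^k = \sum_{p \in P(k)} \delta_{ij}^p\, y_p^k$, and the path cost $c_p^k$ is the sum of the arc costs along $p$. As an assumption, paths are represented by their node sequences; the names are hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str]
Path = Tuple[str, ...]  # node sequence from origin to destination


def arcs_of(path: Path) -> List[Arc]:
    """Arcs ij with delta_ij^p = 1 for this path."""
    return list(zip(path[:-1], path[1:]))


def path_cost(path: Path, c: Dict[Arc, float]) -> float:
    """c_p^k: sum of the unit costs c_ij^k over the arcs of the path."""
    return sum(c[arc] for arc in arcs_of(path))


def path_to_arc_flow(y: Dict[Path, float]) -> Dict[Arc, float]:
    """x_ij^k = sum over paths p of delta_ij^p * y_p^k, for a single commodity."""
    x: Dict[Arc, float] = {}
    for path, share in y.items():
        for arc in arcs_of(path):
            x[arc] = x.get(arc, 0.0) + share
    return x


y = {("s", "a", "t"): 0.5, ("s", "t"): 0.5}   # satisfies Eq. (7): shares sum to 1
c = {("s", "a"): 1.0, ("a", "t"): 2.0, ("s", "t"): 4.0}
print(path_cost(("s", "a", "t"), c))           # 3.0
print(path_to_arc_flow(y))
```

The mapping shows why the path model needs only the capacity rows (6) and the convexity rows (7): flow conservation holds automatically for any combination of origin-destination paths.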


LP Solution Methods 


Comprehensive surveys of the available multicommod- 
ity network flow solution techniques are provided in 
[6,37]. Descriptions of these approaches are also pro- 
vided in [2,38]. 

Price-directive decomposition techniques use the 
path-based MCF model. To limit the number of vari- 
ables considered in finding an optimal solution, col- 
umn generation techniques are used. Further details of 
price-directive decomposition and column generation 
are provided in [18,22,41,45,61]. 

Resource-directive decomposition techniques at- 
tempt to solve MCF problems by allocating arc capac- 
ity by commodity and solving the resulting decoupled 
minimum cost flow problems for each commodity. Ad- 
ditional descriptions of this technique can be found 
in [27,30,35,37,39,41,52,60,61]. 

Computational comparisons of the performance of 
price- and resource-directive decomposition methods 
can be found in [3,4]. A. Ali, R.V. Helgason, J.L. Ken- 
nington, and H. Lall [4] report that specialized de- 
composition codes can be expected to run from three 
to ten times faster than a general linear program- 
ming package. Furthermore, A.A. Assad [7] reports 
that resource-directive algorithms converge quickly for 
small problems but are outperformed by the price- 
directive method for larger MCF problems. 

G. Saviozzi [56] uses subgradient techniques on the 
Lagrangian relaxation of the bundle constraints and 
proposes a method of arriving at an advanced start- 
ing basis for the minimum cost multicommodity flow 
problem. 

Partitioning methods specialize the simplex method 
by partitioning the current basis to exploit the un- 
derlying network structure. Experiences with primal 
partitioning techniques have been reported in [24,32,36,43,51,53,54,55], among others. J.B. Rosen [53] develops a partitioning strategy for angular problems. J.K.
Hartman and L.S. Lasdon [32] develop a generalized 
upper bounding algorithm for multicommodity net- 
work flow problems in which the special structure of 
the MCF problem is exploited. Their primal partition- 
ing procedure, a specialization of the generalized up- 
per bounding procedure developed by G.B. Dantzig and 
R.M. Van Slyke [21], involves the determination at each 
iteration of the inverse of a basis containing only one 
row for each saturated arc. Similarly, C.J. McCallum 
[44] developed a generalized upper bounding algorithm 
for a communications network planning problem. All 
of these procedures exploit the block-diagonal problem 
structure and perform all steps of the simplex method 
on a reduced working basis of dimension m, where m 
represents the size of set A. 

Interior point methods and parallel computing 
techniques have also been applied to MCF problems. 
Interior point methods provide polynomial time algo- 
rithms for the MCF problems. The best time bound is 
due to P.M. Vaidya [62]. G.L. Schultz and R.R. Meyer 
[58] provide an interior point method with massive 
parallel computing to solve multicommodity flow prob- 
lems. 

New heuristic procedures developed for MCF problems include the primal and dual-ascent heuristics described in [17] and [9], respectively. A. Gersht and A. Shulman [29] use a barrier-penalty method to find nearly optimal solutions for multicommodity problems, while R. Schneur [62] describes a scaling algorithm to determine nearly feasible MCF solutions.

Recently, price-directive decomposition or column generation approaches, such as those presented in [2,11,23,34], have been the most extensively used methods for solving large versions of the linear MCF problem. The general idea of column generation is that optimal solutions to large LPs can be obtained without explicitly including all columns (i.e., variables) in the constraint matrix (called the Master Problem or MP). In fact, only a very small subset of all columns will be in an optimal solution, and all other (nonbasic) columns can be ignored. In a minimization problem, this implies that all columns with positive reduced cost can be ignored. The multicommodity flow column generation strategy, then, is:


0) RMP Construction. Include a subset of columns in 
a restricted MP, called the Restricted Master Prob- 
lem, or RMP; 

1) RMP Solution. Solve the RMP LP; 

2) Pricing Problem Solution. Use the dual variables 
obtained in solving the RMP to solve the pricing 
problem. The pricing problem either identifies one 
or more columns with negative reduced cost (i.e. 
columns that price out) or determines that no such 
column exists. 

3) Optimality Test. If one or more columns price out, 
add the columns (or a subset of them) to the RMP 
and return to Step 1; otherwise stop, the MP is 
solved. 
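
The four steps can be summarized as a generic loop. In this sketch, `solve_rmp` and `price` are hypothetical caller-supplied functions standing in for an LP solver and for the pricing problem (for multicommodity flows, one shortest path problem per commodity); the loop itself only encodes the control flow of Steps 0 to 3 and is not tied to any particular solver.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Column = Hashable


def column_generation(initial_columns: Iterable[Column],
                      solve_rmp: Callable[[Set[Column]], Tuple[float, dict]],
                      price: Callable[[dict], List[Column]],
                      max_iter: int = 1000) -> Tuple[float, Set[Column], int]:
    """Run Steps 0-3 and return (RMP objective, final columns, iterations)."""
    columns = set(initial_columns)          # Step 0: build the RMP
    for iteration in range(1, max_iter + 1):
        value, duals = solve_rmp(columns)   # Step 1: solve the RMP LP
        # Step 2: pricing; keep only columns that are not already in the RMP.
        new_columns = [c for c in price(duals) if c not in columns]
        if not new_columns:                 # Step 3: nothing prices out, MP solved
            return value, columns, iteration
        columns.update(new_columns)         # Step 3: add columns, back to Step 1
    raise RuntimeError("no convergence within max_iter iterations")


# Usage with toy stand-ins: the "pricing" returns a new column until 3 exist.
def toy_rmp(cols):
    return float(len(cols)), {"n": len(cols)}


def toy_price(duals):
    return [duals["n"]] if duals["n"] < 3 else []


print(column_generation({0}, toy_rmp, toy_price))  # (3.0, {0, 1, 2}, 3)
```

Filtering out columns that are already present is a small safeguard against numerical tolerance issues; an exact pricing routine would not return them, since columns in the RMP have nonnegative reduced cost at its optimum.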

For any RMP in Step 1, let $-\pi_{ij}$, with $\pi_{ij} \ge 0$, represent the dual variables associated with constraints (6), and let $\sigma^k$ represent the unrestricted dual variables associated with constraints (7). Since $c_p^k$ can be represented as $\sum_{ij \in A} c_{ij}^k \delta_{ij}^p$, the reduced cost of column p for commodity k, denoted $\bar{c}_p^k$, is:

$$\bar{c}_p^k = \sum_{ij \in A} q^k \left( c_{ij}^k + \pi_{ij} \right) \delta_{ij}^p - \sigma^k, \qquad \forall p \in P(k),\ \forall k \in K. \qquad (9)$$


For each RMP solution generated in Step 1, the pricing problem in Step 2 can be solved efficiently. By (9), $\bar{c}_p^k$ equals $q^k$ times the length of path p under arc costs $c_{ij}^k + \pi_{ij}$, minus $\sigma^k$. Columns that price out can therefore be identified by solving one shortest path problem for each commodity k ∈ K over a network with arc costs equal to $c_{ij}^k + \pi_{ij}$, for each ij ∈ A. Let $p^k$ represent a resulting shortest path for commodity k. Then, if for all k ∈ K,

$$\bar{c}_{p^k}^k \ge 0,$$

the MP is solved. Otherwise, the MP is not solved and, for each k ∈ K with

$$\bar{c}_{p^k}^k < 0,$$

path $p^k \in P(k)$ is added to the RMP in Step 3.
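
A minimal Python sketch of this pricing step, under stated assumptions: the duals are supplied as a dictionary `pi` of the values $\pi_{ij}$ appearing in (9) and a dictionary `sigma` of the $\sigma^k$; commodities are given as (origin, destination, $q^k$) triples; and Bellman-Ford is used so that the sketch does not rely on the modified arc costs $c_{ij}^k + \pi_{ij}$ being nonnegative (with nonnegative costs, Dijkstra's algorithm would be the usual faster choice). All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Arc = Tuple[str, str]
Path = Tuple[str, ...]


def bellman_ford_path(nodes: Iterable[str], arc_cost: Dict[Arc, float],
                      s: str, t: str) -> Tuple[float, Optional[Path]]:
    """Shortest s-t path; assumes no negative-cost cycle is reachable from s."""
    nodes = list(nodes)
    dist = {v: math.inf for v in nodes}
    pred: Dict[str, Optional[str]] = {v: None for v in nodes}
    dist[s] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for (i, j), c in arc_cost.items():
            if dist[i] + c < dist[j]:
                dist[j], pred[j] = dist[i] + c, i
                changed = True
        if not changed:
            break
    if math.isinf(dist[t]):
        return math.inf, None
    path = [t]
    while path[-1] != s:
        path.append(pred[path[-1]])
    return dist[t], tuple(reversed(path))


def price_commodities(nodes, commodities, cost, pi, sigma, tol=1e-9):
    """Return {k: (p^k, reduced cost (9))} for every commodity whose path prices out.

    commodities: {k: (origin, destination, q_k)}; cost: {k: {arc: c_ij^k}};
    pi: {arc: pi_ij}; sigma: {k: sigma^k}.
    """
    priced_out = {}
    for k, (origin, dest, qk) in commodities.items():
        modified = {a: c + pi.get(a, 0.0) for a, c in cost[k].items()}
        length, path = bellman_ford_path(nodes, modified, origin, dest)
        if path is None:
            continue
        reduced_cost = qk * length - sigma[k]   # (9) evaluated at p^k
        if reduced_cost < -tol:                 # the column prices out
            priced_out[k] = (path, reduced_cost)
    return priced_out


# Usage: a dual penalty on arc (s, a) makes the direct arc (s, t) the shortest path.
nodes = ["s", "a", "t"]
cost = {"k1": {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "t"): 3.0}}
pi = {("s", "a"): 2.0}
print(price_commodities(nodes, {"k1": ("s", "t", 8.0)}, cost, pi, {"k1": 30.0}))
# {'k1': (('s', 't'), -6.0)}
```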


IP Solution Methods 


The ability to solve large MCF LPs enables the solution of large IMCF problems. Successful approaches for solving large IMCF problems use the path-based or column generation formulation of the problem. Column generation IPs can be solved to optimality using a procedure known as branch and price, detailed in [15,23,64]. Branch and price, a generalization of branch and bound with LP relaxations, allows column generation to be applied at each node of the branch and bound tree. Branching occurs when no columns price out to enter the basis and the LP solution does not satisfy the integrality conditions.

Applying a standard branch and bound procedure to the final restricted master problem with its existing columns will not guarantee an optimal (or feasible) solution. After a branching decision modifies the RMP, there may exist a column for the MP that prices out favorably but is not present in the RMP. Therefore, to find an optimal solution, we must maintain the ability to solve the pricing problem after branching. The importance of generating columns after the initial LP has been solved is demonstrated for airline crew scheduling applications in [63]. Using just the columns generated to solve the initial LP relaxation, the authors were unable to find even feasible IP solutions; they were, however, able to find quality solutions with a branch and price approach in which additional columns were generated whenever the LP bound at a node exceeded a preset IP target objective value.

The difficulty of performing column generation with branch and bound is that conventional integer programming branching on variables may not be effective, because fixing variables can destroy the structure of the pricing problem. For the multicommodity flow application, a branching rule is needed that ensures that the pricing problem for the LP with the branching decisions included can be solved efficiently with a shortest path procedure. To illustrate, consider branching based on variable dichotomy, in which one branch forces commodity k to be assigned to path p, i.e., $y_p^k = 1$, and the other branch does not allow commodity k to use path p, i.e., $y_p^k = 0$. The first branch is easy to enforce, since no additional paths need to be generated once k is assigned to path p. The latter branch, however, cannot be enforced if the pricing problem is solved as a shortest path problem: there is no guarantee that the solution to the shortest path problem is not path p. In fact, it is likely that the shortest path for k is indeed path p. As a result, to enforce the branching decision, the pricing problem must be solved using a next shortest path procedure. In general, for a subproblem involving a set of such branching decisions, the pricing problem must be solved using a kth shortest path procedure.
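
One simple way to realize such a next (or kth) shortest path pricing step is to enumerate simple paths in nondecreasing cost order and return the first one that the branching decisions have not excluded. The sketch below uses a best-first enumeration with a priority queue. This is an illustrative choice, not the procedure of the cited works: it is exponential in the worst case and requires nonnegative arc costs, and specialized k-shortest-path algorithms (for example, Yen's algorithm) would normally be preferred on larger networks. Function names are hypothetical.

```python
import heapq
from itertools import count
from typing import Collection, Dict, Iterator, List, Optional, Tuple

Arc = Tuple[str, str]
Path = Tuple[str, ...]


def paths_by_cost(arc_cost: Dict[Arc, float], s: str, t: str) -> Iterator[Tuple[float, Path]]:
    """Yield simple s-t paths in nondecreasing cost (arc costs must be nonnegative)."""
    succ: Dict[str, List[Tuple[str, float]]] = {}
    for (i, j), c in arc_cost.items():
        succ.setdefault(i, []).append((j, c))
    tiebreak = count()  # unique counter so the heap never compares two paths
    heap = [(0.0, next(tiebreak), (s,))]
    while heap:
        cost, _, path = heapq.heappop(heap)
        if path[-1] == t:
            yield cost, path
            continue
        for j, c in succ.get(path[-1], []):
            if j not in path:  # extend only to unvisited nodes: simple paths
                heapq.heappush(heap, (cost + c, next(tiebreak), path + (j,)))


def best_allowed_path(arc_cost: Dict[Arc, float], s: str, t: str,
                      forbidden: Collection[Path]) -> Optional[Tuple[float, Path]]:
    """Cheapest s-t path not forbidden by branching decisions (y_p^k = 0 branches)."""
    for cost, path in paths_by_cost(arc_cost, s, t):
        if path not in forbidden:
            return cost, path
    return None


# Usage: forbidding the shortest path s-a-t returns the next shortest path.
arcs = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "t"): 3.0, ("s", "b"): 2.0, ("b", "t"): 2.0}
print(best_allowed_path(arcs, "s", "t", forbidden={("s", "a", "t")}))  # (3.0, ('s', 't'))
```

Because every partial path left in the queue costs at least as much as the one just popped, complete paths come out in nondecreasing cost order, so the first allowed path returned is the cheapest one permitted at that node of the tree.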

The key to developing a branch and price proce- 
dure is to identify a branching rule that eliminates 
the current fractional solution without compromising 
the tractability of the pricing problem. In general, J. Desrosiers et al. [23] argue that this can be achieved by basing branching rules on variables in the original formulation, and not on variables in the column generation formulation. This means that branching rules should be based on the arc flow variables $x_{ij}^k$ from the node-arc formulation of the problem. Barnhart et al. [15] develop
branching rules for a number of different master prob- 
lem structures. They also survey specialized algorithms 
that have appeared in the literature for a broad range of 
applications. 
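
As a sketch of how such a rule can be evaluated from a column generation solution, the code below recovers the arc flows $x_{ij}^k = \sum_{p \in P(k)} y_p^k \delta_{ij}^p$ of one commodity from its path variables and selects a fractional arc to branch on. Choosing the arc whose flow is closest to 0.5 is an assumed, commonly used heuristic rather than a rule prescribed by [15] or [23], and the function names are hypothetical. A branch that sets $x_{ij}^k = 0$ only removes arc ij from commodity k's pricing network, so the pricing problem remains a shortest path problem.

```python
from typing import Dict, Optional, Tuple

Arc = Tuple[str, str]
Path = Tuple[str, ...]


def arc_flows(y: Dict[Path, float]) -> Dict[Arc, float]:
    """x_ij^k = sum over paths p of y_p^k * delta_ij^p, for a single commodity k."""
    x: Dict[Arc, float] = {}
    for path, frac in y.items():
        for arc in zip(path[:-1], path[1:]):
            x[arc] = x.get(arc, 0.0) + frac
    return x


def most_fractional_arc(y: Dict[Path, float], tol: float = 1e-6) -> Optional[Tuple[Arc, float]]:
    """Return the arc with flow strictly between 0 and 1 that is closest to 0.5, if any."""
    best: Optional[Tuple[Arc, float]] = None
    for arc, value in arc_flows(y).items():
        if tol < value < 1 - tol and (best is None or abs(value - 0.5) < abs(best[1] - 0.5)):
            best = (arc, value)
    return best


# Usage: commodity k split 75/25 over two paths that share arc (s, a).
y_k = {("s", "a", "t"): 0.75, ("s", "a", "b", "t"): 0.25}
print(arc_flows(y_k))            # arc (s, a) carries the full flow 1.0
print(most_fractional_arc(y_k))  # (('a', 't'), 0.75)
```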

M. Parker and J. Ryan [49] present a branch and 
price algorithm for the bandwidth packing problem, in
which the objective is to choose which of a set of com- 
modities to send in order to maximize revenue. They 
use a path-based formulation. Their branching scheme 
selects a fractional path and creates a number of new 
subproblems equal to the length of the path (measured 
in the number of arcs it contains) plus one. On one 
branch, the path is fixed into the solution and on each 
other branch, one of the arcs on the path is forbidden. 
To limit time spent searching the tree they use a dy- 
namic optimality tolerance. They report the solution of 
14 problems with as many as 93 commodities on net- 
works with up to 29 nodes and 42 arcs. All but two of 
the instances are solved to within 95% of optimality. 
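
The branching scheme just described can be sketched as follows. Each child node records either the fractional path that is fixed into the solution or the single arc of that path that is forbidden. The `Branch` record and its fields are hypothetical, and how each node's restrictions are passed to the pricing problem is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Arc = Tuple[str, str]
Path = Tuple[str, ...]


@dataclass(frozen=True)
class Branch:
    commodity: str
    fixed_path: Optional[Path] = None    # this path is fixed into the solution
    forbidden_arc: Optional[Arc] = None  # this arc may not be used


def path_branching_children(commodity: str, path: Path) -> List[Branch]:
    """One child fixing the fractional path, plus one child per arc forbidding it,
    i.e. (number of arcs on the path) + 1 children in total."""
    arcs = list(zip(path[:-1], path[1:]))
    children = [Branch(commodity, fixed_path=path)]
    children += [Branch(commodity, forbidden_arc=a) for a in arcs]
    return children


# Usage: a fractional path with 3 arcs yields 4 subproblems.
for child in path_branching_children("k1", ("s", "a", "b", "t")):
    print(child)
```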

K. Ziarati et al. [16] consider the problem of as- 
signing railway locomotives to trains. They model the 
problem as an integer multicommodity flow problem 
with side constraints and solve it using a Dantzig-Wolfe
decomposition technique, where subproblems are for- 
mulated as constrained or unconstrained shortest path 
problems. 

P. Raghavan and C.D. Thompson [50] illustrate the 
use of randomized algorithms to solve some integer 
multicommodity flow problems. They use randomized 
rounding procedures that give provably good solutions 
in the sense that they have a very high probability of 
being close to optimality. 
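
The basic idea of randomized rounding can be sketched in a few lines. Starting from a fractional path solution $y_p^k$ of the LP relaxation, each commodity independently selects one of its paths with probability equal to $y_p^k$. This is a simplified illustration: the specific procedures and probabilistic guarantees of [50] are not reproduced here, and the function name is hypothetical.

```python
import random
from typing import Dict, Hashable, Optional, Tuple

Path = Tuple[str, ...]


def randomized_rounding(y: Dict[Hashable, Dict[Path, float]],
                        rng: Optional[random.Random] = None) -> Dict[Hashable, Path]:
    """Assign each commodity k to a single path, drawn with probabilities y_p^k."""
    rng = rng or random.Random()
    assignment = {}
    for k, fractions in y.items():
        paths = list(fractions)
        weights = [fractions[p] for p in paths]  # the LP values sum to 1 by (7)
        assignment[k] = rng.choices(paths, weights=weights, k=1)[0]
    return assignment


# Usage: k1 is split evenly between two paths, k2 already uses a single path.
y = {"k1": {("s", "a", "t"): 0.5, ("s", "t"): 0.5}, "k2": {("s", "t"): 1.0}}
print(randomized_rounding(y, random.Random(42)))
```

Each commodity ends up on exactly one path, so constraint (7) holds in integer form, while the capacity constraints (6) can be violated by an unlucky draw.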
Barnhart et al. [12] present a branch and price and cut algorithm for general IMCF problems where each commodity is represented by an origin-destination pair and flow volume. Branch and cut, another variant of branch and bound, allows valid inequalities, or cuts, to be added throughout the branch and bound tree. Branch and price and cut combines column and row generation to yield very strong LP relaxations at nodes of the branch and bound tree.


See also 


> Auction Algorithms 

> Communication Network Assignment Problem 
> Directed Tree Networks 

> Dynamic Traffic Networks 

> Equilibrium Networks 

> Evacuation Networks 

> Generalized Networks 

> Maximum Flow Problem 

> Minimum Cost Flow Problem 

> Network Design Problems 

> Network Location: Covering Problems 

> Nonconvex Network Flow Problems 

> Nonoriented Multicommodity Flow Problems 
> Piecewise Linear Network Flow Problems 

> Shortest Path Tree Algorithms 

> Steiner Tree Problems 

> Stochastic Network Problems: Massively Parallel 
> Survivable Networks
> Traffic Network Equilibrium


Introduction


Financial statement fraud (FSF) and earnings manipulation have attracted the attention of market participants, academics, and regulators all over the world, especially in recent years following the collapse of Enron. During these years there have also been several cases of financial statement fraud that went undetected by the auditors.

Using normal audit procedures, the detection of falsified financial statements is a difficult task [18,73]. There are numerous reasons for these difficulties, such as a shortage of knowledge concerning the characteristics of management fraud, the efforts of managers to deceive auditors, and difficulties in collecting, analyzing, and synthesizing large quantities of data from several different sources.

Models of audit reporting have several uses (i.e., prediction, determination, bankruptcy), as described by Dopuch et al. [26]. For example, they can provide a benchmark representing the probability that an auditor would issue a modified audit report on a given company. Furthermore, these models can be an important component of an auditing system that enables users to take preventive or corrective actions [30,57,80].

Most of the earlier studies of FSF have used discrete choice models in which the dependent variable was dichotomous. Mutchler [66] and Levitan and Knoblett [60] used discriminant analysis; Dopuch et al. [26] and Lennox [59] used probit models; Keasey et al. [48], Bell and Tabor [9], Monroe and Teh [65], Louwers [62], DeFond et al. [25], Citron and Taffler [17], Menon and Schwartz [63], and Spathis [80] used logit models; Krishnan [55] used an ordered probit model; Spathis et al. [79,81] and Pasiouras et al. [71] used multicriteria decision aid (UTADIS) and multivariate statistical techniques (e.g., discriminant and logit analysis); Gaganis and Pasiouras [35] used discriminant and logit models; Gaganis et al. [36] used probabilistic neural network models; Gaganis et al. [33] used nearest neighbor models; Fanning et al. [31] and Fanning and Cogger [30] used artificial neural networks; and Doumpos et al. [27] used support vector machines.

In the present study, a multicriteria approach was followed through the application of the nonparametric Multi-group Hierarchical DIScrimination (MHDIS) method, with the aim of developing a sorting model to detect the firms that issue FSF in Greece. The MHDIS model was compared with logit analysis in order to test its efficiency against a benchmark that has been commonly used in previous studies. MHDIS is not based on statistical assumptions, which often cause problems during the application of statistical methods (logit and probit analysis); furthermore, it can easily incorporate qualitative data.

Although there have recently been a few attempts to develop models to detect falsified financial statements in Greece [15,51,79,81], the present study differs from them in several respects. First, we used a more recent and larger dataset than previous studies, which contains more detailed information. The data correspond to the fiscal years 2001-2004 and cover 398 companies. Spathis et al. [79,81] and Kirkos et al. [51] examined the same random sample of 76 manufacturing companies covering the period 1997-1999, and Caramanis and Spathis [15] examined a sample of 182 companies. Second, we examined both listed and unlisted companies from the manufacturing, trade, and services sectors, in contrast to Spathis et al. [79,81] and Kirkos et al. [51], who examined only listed manufacturing companies, and Caramanis and Spathis [15], who considered only listed companies. Third, we used out-of-time and out-of-sample testing samples. When evaluating the classification ability of a model, it is important to ensure that it has not been over-fitted to the training (estimation) dataset. As Stein [83] mentions, “a model without sufficient validation may only be a hypothesis”. Previous research has shown that when classification models are used to reclassify the observations of the training sample, the classification accuracies are biased upward. Thus, it is necessary to classify a set of observations that were not used during the development of the model, using some kind of testing sample.

The rest of the paper is organized as follows. Section “Sample” describes the sample used in this study. Section “Method” describes the methodology. Section “Empirical Results” presents the empirical results, and Sect. “Conclusions and Further Research” discusses the concluding remarks and suggests some possible future research directions.


Sample 


The data used in the study consisted of financial statement information (i.e., balance sheet, income statement, auditors’ opinions, and the notes to financial statements) of a sample of companies obtained from the ICAP¹ database and the Athens Stock Exchange (ASE). Our analysis was restricted to Greek limited (société anonyme) and limited liability companies, which are obliged by law to have their financial statements audited, and we focused on the period between 2001 and 2004.

We obtained 199 qualified cases, which were distributed over various sectors². The next step was to select unqualified firms, for which we used a pair-matching method by sector. Matching of firms is common practice when conducting classification studies in auditing, as well as in other areas of finance such as bankruptcy or acquisitions prediction (e.g., [11,34,50,58,71]). There are two primary reasons for following this procedure, which is known as choice-based sampling. The first is the lower cost of collecting data in comparison with an unmatched sample [6,46,90]. The second, and most important, is that a choice-based sample provides greater information content than a random sample [19,45,69]. Hence, our sample consisted of the same number of qualified and unqualified cases.
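
A minimal sketch of pair-matching by sector follows. It assumes that each firm is represented as a hypothetical (firm identifier, sector) pair and that the matched unqualified firm is drawn at random, without replacement, from the same sector; the text does not state how a match was chosen within a sector, so that step is an assumption.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Firm = Tuple[str, str]  # (firm identifier, sector)


def pair_match_by_sector(qualified: List[Firm], unqualified_pool: List[Firm],
                         seed: int = 0) -> List[Tuple[str, str]]:
    """Pair each qualified firm with a distinct unqualified firm from the same sector."""
    rng = random.Random(seed)
    pool: Dict[str, List[str]] = defaultdict(list)
    for firm, sector in unqualified_pool:
        pool[sector].append(firm)
    pairs = []
    for firm, sector in qualified:
        candidates = pool[sector]
        if not candidates:
            raise ValueError(f"no unqualified firm left in sector {sector!r}")
        match = candidates.pop(rng.randrange(len(candidates)))  # without replacement
        pairs.append((firm, match))
    return pairs


# Usage with hypothetical firm identifiers.
qualified = [("Q1", "trade"), ("Q2", "trade"), ("Q3", "services")]
pool = [("U1", "trade"), ("U2", "trade"), ("U3", "services"), ("U4", "manufacturing")]
print(pair_match_by_sector(qualified, pool))
```

The result has one unqualified firm per qualified firm, which gives the balanced sample (equal numbers of qualified and unqualified cases) described above.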

Most of the previous studies concerning the development of models to replicate (or predict) auditors’ opinions used training and testing samples from the same period, or re-sampling techniques such as jackknife and bootstrap (e.g., [57,79,81]). However, as Espahbodi and Espahbodi [29] point out, the real test of a classification model and its practical usefulness is its ability to classify objects correctly in the future. The main reason, as stated by Barnes [5], is that, because of inflation, technological change, and other factors such as accounting policies, it is not reasonable to expect financial ratios to be stable over time. To account for this population drift, in the present study we split our sample of 398 companies into two distinct samples. The training sample, used for the development of the models, consisted of 234 companies and covered the period 2001-2003. The validation sample consisted of the remaining 164 companies and used data from 2004.
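
The out-of-time, out-of-sample design can be expressed as a simple filter on firm-year records. In this sketch, observations from 2001-2003 form the training sample, those from 2004 form the validation sample, and a check confirms that no company appears in both. The record layout (`firm` and `year` keys) and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]  # e.g. {"firm": "A", "year": 2002, ...}


def out_of_time_split(records: List[Record],
                      train_years=range(2001, 2004),
                      test_year: int = 2004) -> Tuple[List[Record], List[Record]]:
    """Training: fiscal years 2001-2003; validation: 2004; no firm in both samples."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    shared = {r["firm"] for r in train} & {r["firm"] for r in test}
    if shared:
        raise ValueError(f"firms present in both samples: {sorted(shared)}")
    return train, test


# Usage with three hypothetical firm-year records.
data = [{"firm": "A", "year": 2001}, {"firm": "B", "year": 2003}, {"firm": "C", "year": 2004}]
train, test = out_of_time_split(data)
print(len(train), len(test))  # 2 1
```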


¹ ICAP is the largest company providing Business Information and Consulting Services in Greece.

² The sample for this study consists of 164 manufacturing, 122 trade and 110 services companies.
Financial Variables


Similar to previous studies, we used financial ratios as the indicators of FSF. One of the problems with the selection of suitable ratios is that not only are there many financial variables that could be potential candidates for inclusion in the model, but previous studies have also, in general, failed to select variables that reflect theoretical models of FSF. We therefore selected variables that were regarded as important in previous studies, such as Albrecht and Romney [1], Palmrose [70], Dopuch et al. [26], Loebbecke et al. [61], Green [38], Stice [84], Davia et al. [23], Bell et al. [10], Schilit [75], Arens and Loebbecke [3], Beasley [7], Bologna et al. [12], Krishnan and Krishnan [54], Green and Choi [39], Hoffman [42], Hollman and Patton [43], Zimbelman [89], Laitinen and Laitinen [57], Spathis et al. [79,81], Spathis [80], Doumpos et al. [27], Gaganis et al. [33,36], Gaganis and Pasiouras [35], and Pasiouras et al. [71]. Table 1 presents a full list of the variables considered in this study. There are 28 financial variables covering all aspects of the performance of the selected companies, such as liquidity, leverage, profitability, managerial activity, and annual changes in basic accounts [20].

An examination of previous research indicates that the number of predictor variables ranges between 6 and 20. Most of these studies selected the effective independent variables using a statistical method, in an attempt to reduce the number of independent variables and the impact of potential multicollinearity. There is, however, little relevant theory about the selection of independent variables for nonlinear methods. From a practical point of view, a model that considers a large number of variables is difficult for the auditor to apply on a daily basis, because any application of the model requires that the auditor collect all the necessary data, which increases the time and cost of data collection and management [79]. In the present study, we used a combination of two statistical analysis methods to examine whether there was an association between our variables and auditors’ opinions, and hence to select our final set.

First, we used the Kruskal-Wallis nonparametric test to examine the differences between qualified and unqualified companies. Table 1 presents the results of the Kruskal-Wallis test for the training sample. Only three variables (365*Stock/Cost of Sales, Logarithm of Debt, and Inventories/Total Assets) were not statistically significant at the 10% level. We then reduced the number of variables to a manageable size using factor analysis.

This approach can be used to uncover the latent 
structure of a set of variables by reducing the attribute 
space from a larger number of variables to a smaller 
number of factors. The factor loadings were then used 
to select a limited set of financial variables. 

Finally, seven financial variables were selected, 
each being the variable with the highest loading in 
each factor. These were: Receivable/Sales; Current As- 
sets/Current Liabilities; Current Assets/Total Assets; 
Cash/Total Assets; Profit before tax/Total Assets; In- 
ventories/Total Assets; and Annual Change in Sales. 
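
The two selection stages can be sketched in Python. The first function computes the Kruskal-Wallis statistic. With $N$ pooled observations and rank sum $R_j$ over the $n_j$ observations of group $j$, $H = \frac{12}{N(N+1)} \sum_j R_j^2 / n_j - 3(N+1)$, which is divided by $1 - \sum (t^3 - t)/(N^3 - N)$ over groups of $t$ tied values. Under the null hypothesis, $H$ is approximately chi-square distributed with one degree of freedom less than the number of groups; for the two groups compared here, the 10%, 5%, and 1% critical values are about 2.71, 3.84, and 6.63. The second function applies the rule of keeping, for each factor, the variable with the highest loading (interpreted here as the highest absolute loading), assuming a loadings matrix has already been obtained from a factor analysis. Function names and the example loadings are hypothetical and are not results of this study; `scipy.stats.kruskal` offers a library implementation of the same tie-corrected test.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    data = [v for g in groups for v in g]
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    ranks = average_ranks(data)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes: Dict[float, int] = {}
    for v in data:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction


def highest_loading_per_factor(loadings: Dict[str, List[float]]) -> List[str]:
    """For each factor (column), return the variable with the largest absolute loading."""
    n_factors = len(next(iter(loadings.values())))
    selected = []
    for f in range(n_factors):
        selected.append(max(loadings, key=lambda var: abs(loadings[var][f])))
    return selected


# Usage: a toy two-group comparison and hypothetical loadings on two factors.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6]))  # about 3.857, i.e. 27/7
example = {
    "CA/CL": [0.91, 0.10],
    "WC/TA": [0.85, 0.20],
    "ROA":   [0.05, -0.88],
    "GP/TA": [0.12, 0.60],
}
print(highest_loading_per_factor(example))  # ['CA/CL', 'ROA']
```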


Non-financial Variables 


In addition to the seven financial variables discussed 
above, six non-financial ones were also used. Dop- 
uch et al. [26] presented a predictive model of audit 
opinion qualifications in which the variables with great- 
est predictive power were categorical ones. 

Previous studies mainly dealt with the construction of bankruptcy models for forming going-concern audit opinions (e.g., [44,52,53]). Prior research (e.g., [17,30,41,61,74,79,84]) suggests that financial distress is very important in the issuing of an audit qualification. Although most of the previous studies used Altman’s z-score [2] or the credit risk assessment of a rating agency [71] as a proxy for default, such an approach may not be appropriate in our case. The reason is that Altman’s z-score was developed for a particular industry (i.e., manufacturing), under different economic conditions (i.e., in the 1960s), and for a specific country (i.e., the USA). In the present study, we used a score (UTADISCR) estimated from the UTADIS bankruptcy prediction model of Zopounidis et al. [92]. We anticipated that the use of this measure, which indicates the likelihood of default of Greek firms over the 12 months following the date of its calculation, might provide more accurate results.

Spathis [80] found that the audit qualification decision was positively associated with company litigation. In this study, the client litigation variable was coded as zero if a company had litigation in the year preceding the audit opinion and as one otherwise. Skinner [77] considered companies as having litigation in the following cases: (a) a lawsuit had been filed in a Greek court; (b) there had been an allegation of common stock price fraud; or (c) there had been an allegation of stock exchange violation under Greek law.

Multicriteria Decision Support Methodologies for Auditing Decisions, Table 1
Descriptive statistics

Variable            Unqualified          Qualified            Kruskal-Wallis
                    Mean      St.Dev     Mean      St.Dev
INVREC / TA         0.58
REC / SAL           0.31      0.18       0.52      0.59        16.84 *
CA / CL             1.52      0.60       1.00      0.69        66.07 *
CA / TA             0.75      0.22       0.65      0.22        14.48 *
CASH / CL           0.22      0.28       0.08      0.18        26.59 *
CASH / TA                                                      13.85 *
ROA                                                           157.80 *
PRBT / FA                                                     149.09 *
INV / SAL                                                       6.28 **
SAL / TA                                                       22.70 *
TD / TA                                                        66.26 *
PRBT / CL                                                     157.60 *
CL / TA                                                        40.98 *
WC / TA                                                        80.45 *
EBIT MARGIN
GP / SAL                                                       72.76 *
GP / TA                                                        83.78 *
(CA - ST) / CL                                                 65.87 *
365 * AREC / SAL                                               21.81 *
365 * ST / CS                                                   0.41
SAL / EQ                                                       31.89 *
SAL / TD                                                       78.28 *
365 * AP / SAL                                                 65.98 *
TACH
SALCH
LOGTA
LOGDEPT                                                         0.05
INV / TA
UTADISCR

Notes: INVREC / TA: (Inventories + Receivable) / Total Assets; REC / SAL: Receivable / Sales; CA / CL: Current Assets / Current Liabilities; CA / TA: Current Assets / Total Assets; CASH / CL: Cash / Current Liabilities; CASH / TA: Cash / Total Assets; ROA: Profit before tax × 100 / Total Assets; PRBT / FA: Profit (Loss) before tax × 100 / Fixed Assets; INV / SAL: Inventories / Sales; SAL / TA: Sales / Total Assets; TD / TA: Total Debt / Total Assets; PRBT / CL: Profit (Loss) before tax × 100 / Current Liabilities; CL / TA: Current Liabilities / Total Assets; WC / TA: Working Capital / Total Assets; EBIT Margin: Profits before interest and taxes × 100 / Turnover; GP / SAL: Gross Profit / Sales; GP / TA: Gross Profit / Total Assets; (CA - ST) / CL: (Current Assets - Stock) / Current Liabilities; 365 * AREC / SAL: 365 * Accounts Receivable / Sales; 365 * ST / CS: 365 * Stock / Cost of Sales; SAL / EQ: Sales / Equity; SAL / TD: Sales / Total Debt; 365 * AP / SAL: 365 * Accounts Payable / Sales; TACH: (Total Assets in year t - Total Assets in year t-1) × 100 / Total Assets in year t-1; SALCH: (Sales in year t - Sales in year t-1) × 100 / Sales in year t-1; LOGTA: Logarithm of Total Assets; LOGDEPT: Logarithm of Debt; INV / TA: Inventories / Total Assets. The Kruskal-Wallis test indicates whether there are statistically significant differences between the two groups. * Significant at the 1% level, ** Significant at the 5% level, *** Significant at the 10% level.

The most consistent result in all previous research has been that auditor size can explain the supply of a higher level of audit quality, defined as the joint probability of detecting and reporting material financial errors (e.g., [4,8,22,24,25,28,47,48,64,72]). The evidence concerning the relationship between audit firm size and the audit report is mixed. Whereas Warren [88] did find a significant association between large audit firms and qualified audit reports, Shank and Murdock [76] found otherwise.

Previous studies also examined whether or not the auditor is one of the Big Four (namely PricewaterhouseCoopers, Deloitte and Touche, KPMG, and Ernst and Young) [35]. We use a dummy variable set to zero (Domestic = 0) if the auditor was one of the domestic audit firms and one if the auditor was one of the foreign audit firms in Greece (Foreign = 1). In Greece, the auditing profession was liberalized in 1992 by enabling legislation [37]; see Caramanis [13,14]. Competition between the local Greek and foreign audit firms has increased since 1992. Nowadays, Greek audit firms audit a greater percentage of companies than foreign audit firms do. It is possible that smaller companies avoid paying the premium price levied by the large audit firms, since Krishnan et al. [54] found that smaller firms in the US are less likely to be audited by the “Big Four” firms. Furthermore, it is possible that the partners of domestic audit firms are more likely to develop close personal relationships with the directors of Greek client companies. On the other hand, there is a chance that the domestic audit firms will be more familiar with the ‘small acceptable standards of control’ in Greece.

Various papers examine the relationship between the audit opinion before and after a change of auditors (auditor switching). Chow and Rice [16], Craswell [21], Gul et al. [40], and Krishnan et al. [56] found a significant positive association between qualified opinions and subsequent auditor switching. As Nieves [68] points out, two effects may obscure the influence of the audit report in motivating a change: (a) many auditor changes may be unrelated to the audit opinion; and (b) the reasons for auditor changes are an internal state that is not directly observable. We tested the importance of auditor changes for the detection of FSF over a 3-year period, which included the first year of the financial statements and auditors’ opinions and the 2 years before this first year. We used a dummy variable (Prior 2 year auditor) that takes a value of one (yes = 1) if the auditor had been retained and a value of zero if the firm had switched auditors (no = 0).

Finally, we used two other variables, LOSS and STOCK. LOSS is an indicator variable whose value is zero if an auditee experienced a loss in the year of the audit opinion and one (profit) otherwise. Spathis [80] found a significant difference between qualified and non-qualified audit reports for this variable. STOCK is a dummy variable that takes a value of zero (yes = 0) for companies listed on the Athens Stock Exchange and one for unlisted companies (no = 1). Ireland [46] reported that whether a company is listed or unlisted may influence the auditor’s independence. Listed companies may be subject to greater supervision and training by the stock exchange authority. Furthermore, as Ireland [46] points out, large companies are more likely to have good accounting systems and internal controls, thus reducing disagreements and limitations on scope, while, at the same time, auditors are more likely to waive earnings management attempts (resulting in misstatements) in large clients, even after controlling for the materiality of such attempts [67].
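
The 0/1 coding of the categorical variables described above can be collected in one place. In this sketch, the `FirmYear` record, its field names, and the keys LITIGATION, AUDITOR, and PRIOR2YEAR_AUDITOR are hypothetical labels chosen for illustration; only LOSS and STOCK are names used in the text. The coding directions follow the text (for example, litigation = 0 and a loss = 0). The sixth non-financial variable, the UTADISCR score, is continuous and needs no coding.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FirmYear:
    had_litigation: bool    # litigation in the year preceding the audit opinion
    foreign_auditor: bool   # auditor is one of the foreign audit firms in Greece
    auditor_retained: bool  # same auditor over the 3-year window (no switch)
    made_loss: bool         # loss in the year of the audit opinion
    listed_on_ase: bool     # listed on the Athens Stock Exchange


def encode_dummies(f: FirmYear) -> Dict[str, int]:
    """0/1 coding of the categorical non-financial variables, as described in the text."""
    return {
        "LITIGATION": 0 if f.had_litigation else 1,            # litigation = 0, otherwise 1
        "AUDITOR": 1 if f.foreign_auditor else 0,              # Domestic = 0, Foreign = 1
        "PRIOR2YEAR_AUDITOR": 1 if f.auditor_retained else 0,  # retained = 1, switched = 0
        "LOSS": 0 if f.made_loss else 1,                       # loss = 0, profit = 1
        "STOCK": 0 if f.listed_on_ase else 1,                  # listed = 0, unlisted = 1
    }


# Usage: an unlisted firm with litigation, a domestic auditor kept for 3 years, and a loss.
print(encode_dummies(FirmYear(True, False, True, True, False)))
# {'LITIGATION': 0, 'AUDITOR': 0, 'PRIOR2YEAR_AUDITOR': 1, 'LOSS': 0, 'STOCK': 1}
```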


Method 


Multicriteria decision making (MCDM) provides the methodological basis for the combination of qualitative and quantitative data. Multicriteria decision aid (MCDA) has been applied in finance as a sophisticated tool to improve decision-making in the turbulent and complex financial environment that exists nowadays. Spronk et al. [82] thoroughly investigated the application of this technique in the financial field. In the present study we used the MHDIS method [91].

The problem considered in this case study falls within the classification problematic, which in general involves the assignment of a finite set of alternatives $A = \{a_1, a_2, \ldots, a_n\}$ to a set of q ordered classes $C_1 \succ C_2 \succ \cdots \succ C_q$. Each alternative was evaluated along a set of m criteria $g_1, g_2, \ldots, g_m$. In the present case study, the alternatives were the companies in the sample, the criteria corresponded to the seven financial variables and the six non-financial variables, and there were two classes: the unqualified financial statements (class $C_1$) and the qualified financial statements (class $C_2$).

MHDIS distinguishes the groups progressively, starting by discriminating the first group from the others and then proceeding to discriminate among the alternatives belonging to the remaining groups. To accomplish this task, two additive utility functions are developed in each of the $q-1$ steps, where $q$ is the number of groups. The first function, $U_k(a)$, describes the alternatives of group $C_k$, and the second function, $U_{\sim k}(a)$, describes the remaining alternatives, which are classified in the lower groups $C_{k+1}, \ldots, C_q$:


$$
U_k(a) = \sum_{i=1}^{m} p_{ki}\, u_{ki}(g_i) \quad \text{and} \quad U_{\sim k}(a) = \sum_{i=1}^{m} p_{\sim k i}\, u_{\sim k i}(g_i), \qquad k = 1, 2, \ldots, q-1.
$$


The corresponding marginal utility functions for each criterion $g_i$ are denoted $u_{ki}(g_i)$ and $u_{\sim ki}(g_i)$; they are normalized between 0 and 1, while the criteria weights $p_{ki}$ and $p_{\sim ki}$ of each utility function sum to 1. As mentioned above, the model is developed in $q-1$ steps. In the first step, the method develops a pair of additive utility functions, $U_1(a)$ and $U_{\sim 1}(a)$, to discriminate between the alternatives of group $C_1$ and the alternatives of the other groups $C_2, \ldots, C_q$. On the basis of these functional forms, the rule for classifying any alternative takes the following form:

If $U_1(a) > U_{\sim 1}(a)$, then $a$ belongs in $C_1$.

Else, if $U_1(a) < U_{\sim 1}(a)$, then $a$ belongs in $(C_2, C_3, \ldots, C_q)$.
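To make the stage rule concrete, the sketch below evaluates a pair of additive utility functions with piecewise-linear marginal utilities and applies the first-stage rule. It is a minimal illustration under stated assumptions, not the MHDIS estimation itself: the breakpoints, marginal shapes and weights are invented, the function names are hypothetical, and sending ties ($U_1(a) = U_{\sim 1}(a)$) to the lower groups is an assumption because the rule above leaves that case open.

```python
from bisect import bisect_right

def piecewise_linear(breakpoints, values):
    """Marginal utility obtained by linear interpolation between breakpoints.

    breakpoints: increasing criterion values; values: utilities at them.
    Outside the breakpoint range the end values are used.
    """
    def u(g):
        if g <= breakpoints[0]:
            return values[0]
        if g >= breakpoints[-1]:
            return values[-1]
        j = bisect_right(breakpoints, g) - 1  # index of the segment containing g
        t = (g - breakpoints[j]) / (breakpoints[j + 1] - breakpoints[j])
        return values[j] + t * (values[j + 1] - values[j])
    return u

def additive_utility(weights, marginals, alternative):
    """U(a) = sum_i p_i * u_i(g_i(a)); the weights are assumed to sum to 1."""
    return sum(p * u(g) for p, u, g in zip(weights, marginals, alternative))

def stage_rule(u_k, u_not_k):
    """Stage-k rule: 'C_k' if U_k(a) > U_~k(a), otherwise 'lower groups'.

    Ties go to the lower groups here (an assumption, not stated in the text).
    """
    return "C_k" if u_k > u_not_k else "lower groups"

# Hypothetical stage-1 model with two criteria (ROA in %, CA/CL).
u1_marginals = [piecewise_linear([-10, 0, 10], [0.0, 0.4, 1.0]),
                piecewise_linear([0, 1, 3], [0.0, 0.5, 1.0])]
unot1_marginals = [piecewise_linear([-10, 0, 10], [1.0, 0.6, 0.0]),
                   piecewise_linear([0, 1, 3], [1.0, 0.5, 0.0])]
firm = (5.0, 2.0)
u1 = additive_utility([0.6, 0.4], u1_marginals, firm)        # about 0.72
unot1 = additive_utility([0.5, 0.5], unot1_marginals, firm)  # about 0.275
print(stage_rule(u1, unot1))  # C_k
```

In MHDIS proper, the weights and marginal utilities would be the output of the mathematical programs described below rather than chosen by hand.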

The alternatives that are found to belong in class $C_1$ (correctly or incorrectly) are excluded from further analysis. In the next step, another pair of utility functions, $U_2(a)$ and $U_{\sim 2}(a)$, is developed to discriminate between the alternatives of group $C_2$ and the alternatives of groups $C_3, \ldots, C_q$. As in step 1, the alternatives that are found to belong in group $C_2$ are excluded from further analysis. This procedure is repeated up to the last stage ($q-1$), when all groups have been considered. The weights of the criteria in the utility functions, as well as the marginal utility functions, are estimated through mathematical programming techniques. More specifically, at each stage of the hierarchical discrimination procedure, two linear programs and one mixed-integer program are solved to estimate the two additive utility functions optimally and to minimize the classification error. Further details of the mathematical programming formulations used in MHDIS can be found in Zopounidis and Doumpos [91].
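The hierarchical procedure can be sketched as a loop over the $q-1$ stages, stopping at the first stage whose rule assigns the alternative to $C_k$. This minimal sketch assumes the utility functions of each stage have already been estimated and are passed in as callables; the function name and the toy utilities are hypothetical, and strict inequality mirrors the rule in the text.

```python
def mhdis_classify(alternative, stages):
    """Hierarchical MHDIS-style assignment to one of q ordered groups.

    stages: list of q-1 pairs (U_k, U_not_k) of already-estimated utility
    functions (callables). Returns the 1-based index of the assigned group.
    """
    for k, (u_k, u_not_k) in enumerate(stages, start=1):
        if u_k(alternative) > u_not_k(alternative):
            return k  # assigned to C_k and excluded from later stages
    return len(stages) + 1  # passed every stage: lowest group C_q

# Hypothetical three-group example (q = 3, so two stages) with toy utilities.
stages = [
    (lambda a: a[0], lambda a: 0.5),  # stage 1: C_1 vs (C_2, C_3)
    (lambda a: a[1], lambda a: 0.5),  # stage 2: C_2 vs C_3
]
print([mhdis_classify(a, stages) for a in [(0.9, 0.1), (0.1, 0.9), (0.1, 0.1)]])
# [1, 2, 3]
```

With two groups, as in this case study, `stages` holds a single pair and the loop reduces to the one-stage rule.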


Empirical Results 


Table 1 presents descriptive statistics that indicate the magnitude of the differences in the independent variables between the qualified and unqualified reports over the period 2001-2003. A comparison of the mean value of UTADISCR for the qualified and unqualified companies in the training sample shows that the former had a lower average value, and the difference was statistically significant at the 1% level. This suggests that many companies that had manipulated their financial statements were in financial distress [30,36,78]. Statistically significant differences at the 1% level between the two groups in the training sample were also found for two other variables, namely Inventories/Sales and Sales Annual Change. Thus, companies with weaker sales growth are more likely to receive a qualified report than other companies in Greece.

Furthermore, the variable Profits before tax/Total Assets (ROA) had lower means for the qualified companies, which is consistent with most previous studies indicating that firms receiving qualified opinions are less profitable [7,61,78,81,86].

Table 2 illustrates the contribution of each of the financial and non-financial criteria to our auditing models. As our study involved two groups (unqualified and qualified), the hierarchical discrimination process of MHDIS consisted of only one stage, during which two additive utility functions were developed. The utility function $U_1$ characterizes the unqualified companies, whereas the utility function $U_{\sim 1}$ characterizes those that were qualified.

ROA emerges as one of the most important criteria in most cases. Particularly in the case of unqualified firms, it has a weight as high as 44.75% in the financial model and 28.09% in the non-financial one. Similar findings were observed in previous studies [7,61,71,78,81,86].

Multicriteria Decision Support Methodologies for Auditing Decisions, Table 2
Average weights for the criteria in the 2 models

Criterion | MHDIS financial: U1 | MHDIS financial: U~1 | MHDIS non-financial: U1 | MHDIS non-financial: U~1
REC/SAL | 2.28% | 3.82% | 0.02% | 0.02%
CA/CL | 7.17% | 25.18% | 3.99% | 16.41%
CA/TA | 21.49% | 5.02% | 15.04% | 10.14%
CASH/TA | 4.39% | 11.23% | 16.87% | 1.11%
ROA | 44.75% | 25.71% | 28.09% | 11.57%
INV/SAL | 12.54% | 13.74% | 10.55% | 0.02%
SALCH | 7.38% | 15.30% | 1.50% | 3.94%
Profit or Loss in the year | | | 0.00% | 0.00%
Stock Exchange | | | 0.00% | 17.49%
Auditor | | | 0.00% | 0.00%
Prior 2 years Auditor | | | 9.93% | 0.00%
Litigation | | | 0.00% | 22.37%
UTADISCR | | | 14.01% | 16.94%

Notes: REC/SAL: Receivables/Sales; CA/CL: Current Assets/Current Liabilities; CA/TA: Current Assets/Total Assets; CASH/TA: Cash/Total Assets; ROA: Profit before Tax × 100/Total Assets; INV/SAL: Inventories/Sales; SALCH: (Sales in year t − Sales in year t−1) × 100/Sales in year t−1.
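The ratio definitions in the notes to Table 2 translate directly into arithmetic. The sketch below is illustrative only: the function name, argument names and firm figures are hypothetical, and INV/SAL is computed as inventories over sales, following the variable name and the Inventories/Sales variable discussed with Table 1.

```python
def auditing_ratios(receivables, sales, sales_prev, current_assets,
                    current_liabilities, total_assets, cash, inventories,
                    profit_before_tax):
    """Financial ratios named in Table 2 (ROA and SALCH in percent)."""
    return {
        "REC/SAL": receivables / sales,
        "CA/CL": current_assets / current_liabilities,
        "CA/TA": current_assets / total_assets,
        "CASH/TA": cash / total_assets,
        "ROA": profit_before_tax * 100 / total_assets,
        "INV/SAL": inventories / sales,  # assumed: inventories / sales
        "SALCH": (sales - sales_prev) * 100 / sales_prev,
    }

# Hypothetical firm (all amounts in the same currency unit).
r = auditing_ratios(receivables=200, sales=1000, sales_prev=800,
                    current_assets=500, current_liabilities=250,
                    total_assets=2000, cash=100, inventories=150,
                    profit_before_tax=80)
print(r["ROA"], r["SALCH"], r["CA/CL"])  # 4.0 25.0 2.0
```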

The most important criteria characterizing the qualified firms in the financial model are ROA and Current Assets/Current Liabilities (CA/CL), followed by the annual sales change (SALCH), with weights of 25.71, 25.18 and 15.30%, respectively. Pasiouras et al. [71] also found ROA to be statistically significant at the 1% level and one of the most important criteria in their models. In addition, Ireland [46] reported that high liquidity (CA/CL) might increase the likelihood of a qualified audit opinion, as assets may have been overstated.

In the non-financial model, litigation, stock exchange listing (STOCK), UTADISCR and CA/CL were the most important criteria for qualified companies, with weights of 22.37, 17.49, 16.94 and 16.41%, respectively. These results are similar to those of previous studies. In particular, Spathis [80] found that litigation and financial distress were among the most important variables, and Spathis et al. [79] found CA/CL to be among the most important factors.

Multicriteria Decision Support Methodologies for Auditing Decisions, Table 3
Classification results (accuracies in %) for the MHDIS and Logit models

Model | Unqualified | Qualified | Average
Panel A: Financial Variables
Training (2001-2003)
MHDIS | 94.87 | 95.73 | 95.30
LA | 95.7 | 95.7 | 95.7
Holdout (2004)
MHDIS | 80.49 | 87.80 | 84.15
LA | | 86.6 | 82.3
Panel B: Non-financial Variables
Training (2001-2003)
MHDIS | 98.29 | 98.29 | 98.29
LA | 97.43 | 95.73 | 96.58
Holdout (2004)
MHDIS | 84.15 | 91.46 | 87.81

Table 3 presents the classification results obtained from the financial and non-financial models. The classification ability of the models was further tested using out-of-time and out-of-sample companies. The results indicated that the MHDIS models developed with the selected variables were able to provide a satisfactory distinction between qualified and unqualified statements.

For the financial MHDIS model, the overall correct classification rates at the training and holdout stages were 95.3 and 84.15%, respectively. The differences between the financial and the non-financial MHDIS models were significant. Overall, the non-financial MHDIS model provided higher overall classification accuracy in both the training (92.3%) and holdout samples (87.81%). This means that the inclusion of non-financial variables in the model yielded a more accurate distinction between qualified and unqualified companies than the use of financial variables alone.

For benchmarking purposes, we developed additional models with logit analysis (LA), using the same input variables. In the case of the financial model, the classification accuracy in the training sample was 78%, and the corresponding figure for the holdout sample was 74.3%. In the case of the non-financial model, the classification accuracies were 82.3 and 84.15% in the training and holdout samples, respectively. MHDIS was therefore more accurate than LA in both the training and holdout stages (for both the financial and non-financial models).

MHDIS achieves more balanced results in terms of 
type I and type II errors in the holdout sample. Whereas 
Bell and Tabory [9] reported that type II errors are more 
costly than type I errors, Kida [49] argued that type I 
errors might result in: (a) a company changing its au- 
dit firm (switching), which means loss in audit firm 
revenue; (b) a lawsuit by a client against the account- 
ing firm; (c) a negative effect on the auditor’s repu- 
tation in the business community; (d) a deterioration 
in relations with the client; or (e) the so-called self-fulfilling prophecy, whereby the qualification itself jeopardizes the client's survival, which in turn increases the probability of that outcome.
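The link between group accuracies, the Average column of Table 3 and the type I/type II error rates can be shown with a small calculation. The sketch assumes that a type I error means classifying an unqualified company as qualified (the reading that matches Kida's list of consequences) and that the 164 holdout companies split evenly into 82 per group, which is consistent with the reported percentages. The counts 66 and 72 are back-calculated from the MHDIS financial holdout row, not reported in the text, and the function name is hypothetical. With these inputs the function reproduces 80.49, 87.80 and 84.15%, i.e. the Average column appears to be the mean of the two group accuracies.

```python
def holdout_summary(correct_unq, n_unq, correct_q, n_q):
    """Group accuracies, their mean and error rates, all in percent."""
    acc_unq = 100 * correct_unq / n_unq
    acc_q = 100 * correct_q / n_q
    return {
        "unqualified": acc_unq,
        "qualified": acc_q,
        "average": (acc_unq + acc_q) / 2,
        "type_I": 100 - acc_unq,  # assumed: unqualified firm predicted qualified
        "type_II": 100 - acc_q,   # assumed: qualified firm predicted unqualified
    }

s = holdout_summary(66, 82, 72, 82)  # back-calculated, hypothetical counts
print({name: round(value, 2) for name, value in s.items()})
```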


Conclusions and Further Research 


This study investigated the extent to which MHDIS 
models based on financial and non-financial variables 
could predict auditors’ decisions to issue qualified opin- 
ions in the Greek market. 

The sample consisted of 199 companies operating in the Greek manufacturing, trade and service sectors with FSF between 2001 and 2004, matched by industry and total assets with 199 non-FSF ones, yielding a total of 398 companies. We used out-of-time and out-of-sample testing samples to evaluate the classification ability of the models and to ensure that they were not overfitted to the training dataset. The sample was split into a training dataset of 234 companies using data from the period 2001-2003 and a validation dataset of 164 companies using data from the year 2004.

Seven financial and six non-financial variables, representing all dimensions of companies’ performance, were selected for inclusion in the models developed through the MHDIS approach. The results indicated that ROA, CA/CL and Current Assets/Total Assets (CA/TA) were the most important criteria for the financial model. In addition, litigation, stock exchange listing, UTADISCR and CA/CL were the most important criteria for the non-financial model. Furthermore, the non-financial MHDIS model provided higher overall classification accuracy, indicating that the inclusion of non-financial variables resulted in a more accurate distinction between qualified and unqualified companies.

By using such models, auditors can simultaneously screen a large number of firms and direct their attention to the ones that are more likely to contain misstatements, saving time and money. These models can also be used by policy-makers in an attempt to curb tax evasion (for example, tax evasion through the filing of fraudulent tax declarations in Italy has been estimated at between 3 and 10% of GNP [87]). In addition, these models can help investors, managers, banks and other companies identify ‘red flags’.

The current research could be extended in several directions. First, future research could include additional variables such as managers’ experience, market characteristics (e.g., industry concentration, industry growth), audit and non-audit fees, subsidiaries, and stock prices. Second, companies could be classified into more specific groups. Third, the inclusion of data from a longer time period could allow the consideration of industry and macroeconomic effects. Finally, future research could be directed towards the comparison and integration of alternative or additional classification techniques, such as neural networks, rough sets, expert systems, support vector machines, and others. The integration of the models through ensemble techniques such as bagging and boosting could also be examined.
Introduction/Background


Over the last 35 years there have been several stud- 
ies which have attempted to develop classification 
models to predict takeover targets in various coun- 
tries and regions of the world, such as the USA [7,8, 
10,11,12,15,23,30,34,37], the UK [2,3,4,5,13,28,36], 
Canada [9,20,29], Greece [31,38,39], and more recently 
the EU [26,27] and Asia [25]. This is not surprising, 
since the prediction of acquisitions can be of major in- 
terest to stockholders, investors, creditors, and gener- 
ally anyone who has established a relationship with the 
acquired firm [35]. 

Most of these studies have used multivariate statistical and econometric techniques such as discriminant analysis (DA) and logit analysis. Only more recently have the parametric nature, statistical assumptions and restrictions of those approaches led researchers to apply alternative techniques such as artificial neural networks (ANN) [10], rough sets (RS) [31], the recursive partitioning algorithm [15], support vector machines [26], and nearest neighbors [26].

A few recent studies have also used multicriteria 
decision aid (MCDA, which is the designation usually 
used in Europe, or multiple criteria decision making, 
MCDM, which is the one usually used in the USA) tech- 
niques [13,25,26,27,36,39] which over the last few years 
have gained significant recognition among researchers 
and have been employed in several studies in bank- 
ing, finance, accounting, and management. For exam- 
ple, Steuer and Na [33] identified 256 applications that 
combine MCDM and finance. One of the characteristics of these techniques is that they are well suited for analyzing complex decision problems that involve multiple and usually conflicting criteria and/or goals. They can therefore prove particularly useful in the prediction of acquisitions, since there is often not a single reason but a number of reasons that lead management to decide to merge with or acquire another firm. A further advantage of MCDA techniques is that, unlike the traditional techniques (see Barniv and McDonald [6] for a summary of the problems related to the use of discriminant, logit, and probit analysis), they make no assumptions about the normality of the variables or the group dispersion matrices (as DA does), and they are not sensitive to multicollinearity (as logit analysis is). In this paper we first present a brief review of the studies that have applied MCDA in the prediction of acquisition targets (Sect. “Methods/Applications”). Then, in Sect. “Formulation”, we outline one of the MCDA techniques, namely, utilités additives discriminantes (UTADIS), which is used for the development of our classification models. Sect. “Cases” describes a case study and presents the results. Finally, Sect. “Conclusions” concludes our paper.


Methods/Applications 


Zopounidis and Doumpos [39] were the first to pro- 
pose the use of MCDA in the prediction of acquisi- 
tion targets. They developed a classification model with 
the multigroup hierarchical discrimination (MHDIS) 
method using a sample of 30 acquired and 30 nonac- 
quired Greek firms and ten financial ratios covering 
various aspects of a firm’s financial condition. Data 
from 1 year prior to the acquisition (year −1) were used for the development of the model, while data from 2 (year −2) and 3 (year −3) years before the acquisition were used to test its discrimination ability. The model classifies correctly 58.33 and 61.67% of the firms for years 2 and 3 prior to the acquisition, respectively. The authors argue that this poor classification could be attributed to the difficulty of predicting acquisition targets in general, and not necessarily to the inability of the proposed approach as a discrimination method. To test the proposed technique further, its classification accuracy was compared with that of DA and UTADIS. The correct classification accuracy obtained using the proposed method is better for all years than that obtained using DA. Compared with the UTADIS method, the classification accuracy of the proposed approach is significantly higher for year −1, the same for year −2, and slightly higher for year −3. On the basis of these results the authors conclude that the iterative binary segmentation procedure provides results that are at least comparable to those of UTADIS and outperforms DA.

Tartari et al. [35] also used UTADIS in their study 
along with linear DA (LDA), probabilistic neural net- 
works (PNN), and RS in an attempt to examine whether 
the integration of different methods using a stacked 
generalization approach could result in higher classi- 
fication accuracies. Their sample consisted of 48 UK 
firms, selected from 19 industries/sectors, acquired 
during 2001, and 48 nonacquired firms matched by 
principal business activity, asset size, sales volume, 
and number of employees. Twenty-three financial ra- 
tios measuring profitability, liquidity and solvency, and 
managerial performance were initially calculated for 
each firm for up to 3 years prior to the acquisition 
(1998-2000); however, they finally used a set of nine ra- 
tios, selected on the basis of factor analysis. Their exer- 
cise consisted of two stages. First, UTADIS, LDA, PNN, 
and RS were used to develop individual models. The 
most recent year (i. e., 2000) was used as a training sam- 
ple, while data from the other two years (i.e., 1998 and 
1999) were used to test the generalizing performance of 
the proposed integration approach. An eightfold cross- 
validation approach was employed to develop the base 
models using the four methods. The classifications of 
the firms obtained were then used as a training sample 
for the development of a stacked generalization model. 
Finally, the stacked model was developed with the UTADIS method, which combines (at a metalevel) the group assignments of all four methods considered in the analysis. The use of other methods to develop the combined model was also examined; nevertheless, the results were inferior to those obtained with UTADIS. The stacked model performs better (in terms of overall accuracy) than any of the four methods upon which it is based, throughout all the years of the analysis. Furthermore, the results indicate that the stacked model provides significant reductions in the overall error rate compared with LDA, RS, and UTADIS, although the reductions relative to PNN were less significant.
In another UK study, Doumpos et al. [13] compared the classification ability of UTADIS with that of models developed using DA, logistic regression (LR), and ANN. The sample included 76 UK firms acquired during 2000-2002, matched by industry and size with 76 nonacquired firms. Twenty-nine financial ratios, representing profitability, efficiency, activity, financial leverage, liquidity, and growth, were initial candidates for model development; however, the authors finally selected six variables on the basis of a t test and correlation analysis. The UTADIS model was first developed using data drawn from the most recent year prior to the acquisition (i.e., year −1). The model was then applied to data from 2 and 3 years prior to the acquisition (years −2 and −3), with average accuracies of 74.34 and 78.95%, respectively. These accuracies are higher than those obtained by both DA and LR, and are comparable to or better than those of ANN when tested using data from years −2 and −3.

Pasiouras et al. [26] used both UTADIS and 
MHDIS, among several other classification techniques, 
to develop models specifically designed for the EU 
banking industry. They developed several models on 
the basis of equal and unequal training samples from 
the period 1998-2000, using both raw and country- 
adjusted variables. The models were tested in equal 
and unequal datasets from a future period (2001-2002). 
They also developed models that combine the pre- 
dictions of the individual models developed in the 
first stage, using two integration techniques, namely, 
stacked generalization and majority voting. Their re- 
sults were mixed and depended on the form of the vari- 
ables used, the datasets, and the evaluation measure 
considered. Hence, they concluded that there is no clear 
winner technique that dominates all the others under 
all circumstances. However, UTADIS appears several 
times as one of the best techniques. Furthermore, the 
stacked model developed through UTADIS also per- 
forms relatively well. 
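Majority voting, one of the two integration techniques mentioned above, can be illustrated with a minimal sketch. The labels, the three hypothetical base models and the tie-breaking rule are assumptions for illustration, not details reported by Pasiouras et al. [26].

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model class labels by majority vote.

    predictions: one list of labels per base model, all of equal length.
    Ties go to the tied label predicted by the earliest model in the list
    (a hypothetical tie-breaking rule).
    """
    combined = []
    for labels in zip(*predictions):  # labels given to one firm by every model
        counts = Counter(labels)
        top = max(counts.values())
        combined.append(next(lab for lab in labels if counts[lab] == top))
    return combined

# Hypothetical predictions (1 = acquired, 0 = nonacquired) from three models.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
print(majority_vote([model_a, model_b, model_c]))  # [1, 1, 1, 0]
```

Stacked generalization differs in that the base models' outputs become inputs to a second-level classifier (UTADIS in the studies above) instead of being counted.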

Pasiouras et al. [27] also focused on the EU banking 
sector, but differentiated their study in two ways from 
that of Pasiouras et al. [26]. First, they considered an 
additional MCDA technique, namely, PAIRCLAS, that 
was applied for the first time in the prediction of acqui- 
sitions. Second, they followed a tenfold cross-validation 
resampling procedure for the development and eval- 


uation of the models. Their sample consisted of 168 
banks acquired between 1998 and 2002 matched with 
168 nonacquired banks. MHDIS achieved the highest 
overall accuracy in the validation dataset, with 68% of 
the acquired and 63.3% of the nonacquired banks clas- 
sified correctly (implying an overall classification rate 
of 65.7%). PAIRCLAS also achieves marginally better 
classification accuracies than UTADIS, and its ability to 
classify correctly the nonacquired banks (75%) is even 
higher than that of MHDIS (72.2%). 
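A tenfold cross-validation split of the kind referred to above can be sketched as follows. This is a plain (non-stratified) split with a fixed random seed; whether the folds in the study were stratified by group is not stated here, and the function name is hypothetical. The sample size of 336 corresponds to the 168 acquired plus 168 nonacquired banks.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for a plain k-fold split of n cases."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k nearly equal folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test

# 168 acquired + 168 nonacquired banks = 336 cases; each case is tested once.
print(sorted({len(test) for _, test in kfold_indices(336)}))  # [33, 34]
```

In practice each fold's model is fitted on `train`, evaluated on `test`, and the ten accuracies are averaged.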

In another study, Pasiouras et al. [25] concentrated 
on the Asian banking sector. They used a sample of 52 
targets and 47 acquirers that were involved in acquisi- 
tions in nine Asian banking markets during 1998-2004 
and matched them by country and time with an equal 
number of banks not involved in acquisitions. The 
models were developed and validated through a tenfold 
cross-validation approach using UTADIS and MHDIS. 
In each case three versions of the model were devel- 
oped. The first one distinguished between acquired 
and noninvolved banks. The second one distinguished 
between acquirers and noninvolved banks. The last one was a three-outcome model that simultaneously distinguished between targets, acquirers, and noninvolved banks. For comparison purposes they also developed models through DA. The results indicate that the MCDA models are more accurate than those developed through DA. Furthermore, in all cases the models are more accurate in distinguishing between acquirers and noninvolved banks than between targets and noninvolved banks. Finally, the models with a binary outcome achieve higher accuracies than those that simultaneously distinguish between acquirers, targets, and noninvolved banks.


Formulation 


The problem considered in the present study is a classification problem that in general involves the assignment of a set of $m$ alternatives $A = \{a_1, a_2, \ldots, a_m\}$, evaluated along a set of $n$ criteria $g_1, g_2, \ldots, g_n$, to a set of $q$ classes $C_1, C_2, \ldots, C_q$. In the case of acquisitions, the alternatives are the firms in the sample, the criteria can correspond to financial and nonfinancial variables, and there are usually two classes: the nonacquired firms (class $C_1$) and the acquired firms (class $C_2$). Hence, in what follows we consider the simple two-class case, while details on the multiclass case can be found in Doumpos and Zopounidis [14] and Zopounidis and Doumpos [39].

The UTADIS approach, used in the present study, involves the development of an additive utility function that is used to score the firms and decide upon their classification. The utility function has the following general form:

$$
U(a) = \sum_{i=1}^{n} w_i\, u'_i(g_i) \in [0, 1], \tag{1}
$$

where $w_i$ is the weight of criterion $g_i$ (the criteria weights sum to 1) and $u'_i(g_i)$ is the corresponding marginal utility function normalized between 0 and 1. The marginal utility functions provide a mechanism for decomposing the aggregate result (global utility) into individual assessments at the criterion level. To avoid estimating both the criteria weights and the marginal utility functions separately, it is possible to use the transformation $u_i(g_i) = w_i u'_i(g_i)$. Since $u'_i(g_i)$ is normalized between 0 and 1, it follows that $u_i(g_i)$ ranges in the interval $[0, w_i]$. In this way, the additive utility function is simplified to the following form:

$$
U(a) = \sum_{i=1}^{n} u_i(g_i) \in [0, 1]. \tag{2}
$$


The utility function provides an aggregate score $U(a)$ for each firm along all criteria. In the case of acquisition prediction, this score provides the basis for determining whether the firm should be classified in the group of nonacquired firms or in the group of acquired ones. The classification rule in this case is the following ($C_1$ and $C_2$ denote the groups of nonacquired and acquired firms, respectively, while $u_1$ is a cutoff utility point defined on the global utility scale, i.e., between 0 and 1):

$$
\begin{aligned}
U(a) \ge u_1 &\;\Rightarrow\; a \in C_1, \\
U(a) < u_1 &\;\Rightarrow\; a \in C_2.
\end{aligned} \tag{3}
$$
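Equations (2) and (3) can be illustrated by scoring two firms with an already-estimated model. In this minimal sketch each marginal utility $u_i$ is piecewise linear and rises from 0 to its weight $w_i$ (here 0.7 and 0.3, so the global utility stays in $[0, 1]$); the criteria are assumed to have increasing preference, and the breakpoints, weights, cutoff and function names are all hypothetical.

```python
from bisect import bisect_right

def marginal(breakpoints, values):
    """Piecewise-linear u_i(g_i) = w_i * u'_i(g_i); values rise from 0 to w_i."""
    def u(g):
        if g <= breakpoints[0]:
            return values[0]
        if g >= breakpoints[-1]:
            return values[-1]
        j = bisect_right(breakpoints, g) - 1  # segment containing g
        t = (g - breakpoints[j]) / (breakpoints[j + 1] - breakpoints[j])
        return values[j] + t * (values[j + 1] - values[j])
    return u

def utadis_score(marginals, firm):
    """Global utility U(a) = sum_i u_i(g_i), as in Eq. (2)."""
    return sum(u(g) for u, g in zip(marginals, firm))

def utadis_assign(score, cutoff):
    """Rule (3): group 1 (nonacquired) if U(a) >= u_1, else group 2 (acquired)."""
    return 1 if score >= cutoff else 2

# Hypothetical model: criterion weights w_1 = 0.7 and w_2 = 0.3, cutoff u_1 = 0.5.
marginals = [marginal([0, 1, 2], [0.0, 0.5, 0.7]),
             marginal([0, 10], [0.0, 0.3])]
for firm in [(1.5, 5.0), (0.2, 1.0)]:
    score = utadis_score(marginals, firm)
    print(firm, round(score, 2), utadis_assign(score, cutoff=0.5))
# (1.5, 5.0) 0.75 1
# (0.2, 1.0) 0.13 2
```

Working directly with $u_i$ rather than $w_i$ and $u'_i$ keeps the score linear in the unknowns, which is what allows the estimation step to be posed as a linear program.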


The estimation of the additive value function and the 
cutoff threshold is performed using linear program- 
ming techniques so that the sum of all violations of 
the classification rule (3) for all the firms in the train- 
ing sample is minimized. A detailed description and 
derivation of this mathematical programming formu- 
lation can be found in Doumpos and Zopounidis [14]. 
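The text defers the linear program to [14]. For orientation only, one commonly cited basic form of the UTADIS program is sketched below; the formulation used in the case study may differ. It assumes that each marginal utility $u_i$ is piecewise linear on $s_i$ subintervals with nonnegative increments $w_{ij}$, that all criteria have increasing preference, and that $\delta > 0$ (a separation margin) and $\varepsilon > 0$ (a lower bound on the cutoff) are small user-chosen constants. The violations $\sigma^{+}(a)$ and $\sigma^{-}(a)$ measure how far a misclassified firm lies on the wrong side of the cutoff, and their sum is the objective described above.

$$
\begin{aligned}
\min \quad & \sum_{a \in C_1} \sigma^{+}(a) + \sum_{a \in C_2} \sigma^{-}(a) \\
\text{s.t.} \quad & U(a) - u_1 + \sigma^{+}(a) \ge 0, && \forall a \in C_1, \\
& U(a) - u_1 - \sigma^{-}(a) \le -\delta, && \forall a \in C_2, \\
& \sum_{i=1}^{n} \sum_{j=1}^{s_i} w_{ij} = 1, \\
& u_1 \ge \varepsilon, \quad w_{ij} \ge 0, \quad \sigma^{+}(a),\ \sigma^{-}(a) \ge 0,
\end{aligned}
$$

where, at the breakpoints $g_i^0 < g_i^1 < \cdots < g_i^{s_i}$, $u_i(g_i^0) = 0$ and $u_i(g_i^j) = \sum_{l=1}^{j} w_{il}$, and $u_i(g_i(a))$ is obtained by linear interpolation between breakpoints, so that $U(a) = \sum_{i=1}^{n} u_i(g_i(a))$ is linear in the unknowns $w_{ij}$.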


Cases 


In this section, our method is illustrated by a case study from the work of Pasiouras et al. [24]. The dataset considered in the study consists of 76 firms acquired between 2000 and 2002 and 76 nonacquired firms, which operate in the manufacturing, construction, and mining-quarrying-extraction industries in the UK. The sample was constructed as follows. The acquired firms were first identified in the Hemscott M&As database, and the financial data were collected from the Financial Analysis Made Easy (FAME) database of Bureau van Dijk. After screening for data availability in FAME, 59 manufacturing, six construction, five production and six mining-quarrying-extraction firms had complete financial data for the 3 years prior to the acquisition and were included in the sample.

Although the year of acquisition is not the same for all firms in the sample, all were treated as acquired in the “zero” year, considered as the year of reference. The years of activity prior to “zero” are coded as “year −1” (1 year prior), “year −2” (2 years prior), and “year −3” (3 years prior).

After the sample described above had been ob- 
tained, nonacquired firms were chosen to match the ac- 
quired firms. The firms were matched by industry and 
size (total assets) and financial data for the nonacquired 
companies were taken from the same calendar years as 
for the corresponding acquired companies. 

Barnes [5] mentions that the problem for the analyst who attempts to forecast targets is simply a matter of identifying the best predictive (i.e., explanatory) variables. Unfortunately, financial theories offer little guidance in selecting specific variables among the numerous ones regarded as potential candidates in model development. Given the large number of possible ratios, it is important to reduce the list of ratios that enter the final model selection process. Hence, a question that emerges when attempting to select accounting ratios for empirical research is which ones, among the hundreds available, should be used. There is, however, no easy way to determine how many ratios a particular model should contain. With too few, the model will not capture all the relevant information. With too many, the model will overfit the training sample but underperform in a holdout sample, and will most likely have onerous data input requirements [21]. As Hamer [17] points out, the variable set should be constructed on the basis of (1) minimizing the cost of data collection and (2) maximizing model applicability. Huberty [18] suggests three variable screening techniques that could be used: logical screening (e.g., financial theory and human judgment), statistical screening (e.g., tests of differences between two group means, such as the t test), and dimension reduction (e.g., factor analysis). In the present study we follow the last approach, as in Stevens [34], Barnes [2], Kira and Morin [20], Zanakis and Zopounidis [38], and Tartari et al. [35]. A total of 25 variables are initially considered on the basis of data availability and previous studies, covering several aspects of firms’ performance such as profitability, efficiency, activity, financial leverage, liquidity, and growth. Factor analysis is then used to reduce the number of variables to a smaller number of factors that are linear combinations of the initial variables. The analysis results in the extraction of seven factors with eigenvalues higher than 1. The variable with the highest loading is selected from each of the seven components for inclusion in the classification models. Consequently, we use the following seven variables:
X1: Current assets/current liabilities, 
X2: Total liabilities/shareholders’ equity, 
X3: Annual change of total assets, 
X4: Annual change of current liabilities, 
X5: Profits before taxes/total assets, 
X6: Sales/stock, 
X7: Sales/debtors.
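The selection rule described above (retain factors with eigenvalues above 1, then take the highest-loading variable from each) can be sketched as follows. The eigenvalues and loadings would come from a factor analysis package; the numbers here are invented for illustration, the function name is hypothetical, and comparing absolute loadings is an assumption.

```python
def select_by_factors(names, loadings, eigenvalues, min_eigenvalue=1.0):
    """Pick one representative variable per retained factor.

    names: variable names; loadings[i][j]: loading of variable i on factor j;
    eigenvalues[j]: eigenvalue of factor j. Factors with an eigenvalue above
    min_eigenvalue are retained, and for each the variable with the largest
    absolute loading is chosen. (A variable winning two factors is not handled.)
    """
    retained = [j for j, ev in enumerate(eigenvalues) if ev > min_eigenvalue]
    chosen = []
    for j in retained:
        best = max(range(len(names)), key=lambda i: abs(loadings[i][j]))
        chosen.append(names[best])
    return chosen

# Hypothetical output of a factor analysis on four ratios.
names = ["CA/CL", "TL/SE", "ROA", "Sales/stock"]
loadings = [[0.85, 0.10, 0.20],
            [-0.78, 0.20, 0.05],
            [0.15, 0.90, 0.10],
            [0.05, 0.30, 0.80]]
eigenvalues = [2.1, 1.3, 0.6]
print(select_by_factors(names, loadings, eigenvalues))  # ['CA/CL', 'ROA']
```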

X1 is an indicator of liquidity that has been used in
many previous studies [2,12,20,38]. The views about 
liquidity are somewhat mixed. It is possible that firms 
with excess liquidity are more likely to be acquired be- 
cause of their good short-term financial position and 
the availability of cash or near-cash assets [36]. In 
this case, there is also an opportunity for the acquir- 
ers to finance the acquisition with the target’s own re- 
sources [32]. On the other hand, it can be argued that 
a firm in need of funds to finance its working cap- 
ital requirements is likely to be an acquisition target 
because the acquirer, after the acquisition, expects to 
bring additional funds into the firm to improve its liq- 
uidity [29]. 

X2 measures financial leverage and has been used as a leverage proxy in Rege [29], Palepu [23], and Kim and Arbel [19], among others. According to the financial leverage hypothesis, the likelihood of being acquired decreases as company debt increases. There are two reasons why firms with lower preexisting levels of debt are considered attractive acquisition targets. The first is that the low debt ratio of the target decreases the probability of future default of the joint firm, while at the same time increasing the debt capacity of the new firm. The second is that when a firm has extremely low debt ratios, its value may not be maximized, and low leverage can be seen as a sign of inefficient management.

X3 and X4 measure annual changes in two basic elements of the firms (total assets and current liabilities, respectively). Firms whose growth rates, as measured by X3, are relatively high can experience problems because their management and/or structure will not be able to deal with and sustain exceptional growth. It is therefore possible that a firm constrained in this way will become an acquisition target of a firm with surplus resources or management available to help [14]. Furthermore, a firm with high levels of growth might be acquired by firms that want to take advantage of this increase in assets and boost their own growth. Turning to X4, exceptional increases may indicate that the firm has problems in meeting its short-term liabilities and can therefore be acquired to avoid insolvency.

Variables X5, X6, and X7 are related to the inefficient
management hypothesis. This hypothesis argues that if the
managers of a firm fail to maximize its market value, then
the firm is likely to be an acquisition target and its
inefficient managers will be replaced. Thus, these takeovers
are motivated by a belief that the acquiring firm’s
management can manage the target’s resources better. This
view is supported by two specific arguments. First, the firm
might be poorly run by its current management, partly
because the objectives of the management are at variance
with those of the shareholders. In this case, the takeover
threat can serve as a control mechanism limiting the degree
of variance between management’s pursuit of growth and
shareholders’ desire for wealth maximization. A merger may
not be the only way to improve management, but if
disappointed shareholders cannot accomplish a change in
management that will increase the value of their investment
within the firm, either because it is too costly or too slow,
then a merger may be a simpler and more practical way of
achieving their desired goals. Second, the acquirer may
simply have better management experience than the target.
There are always firms with unexploited opportunities to cut
costs and increase sales and earnings, and that makes them
natural candidates for acquisition by other firms with
better management [1]. Therefore, if the management of the
acquirer is more efficient than the management of the target
firm, a gain could result through a merger if the management
of the target is replaced.

Multicriteria Methods for Mergers and Acquisitions, Table 1
Weights of the variables (percent) in the utilités additives
discriminantes (UTADIS) model (averages over the ten
replications)

Year −1   Year −2   Year −3

Table 1 presents the contribution of the seven crite- 
ria in the UTADIS model. To ensure the proper devel- 
opment and validation of the models, we follow a ten- 
fold cross-validation. Hence, the total sample of 152 
firms is randomly split into ten mutually exclusive sub- 
samples (folds) of approximately equal size. Then ten 
models are developed, using each fold in turn for vali- 
dation and the remaining folds for training. Therefore, 
in each of the ten replications, the training sample
consists of 137 firms and the validation sample of 15 firms. The
figures presented are the averages over the ten replica- 
tions. 
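The tenfold cross-validation scheme can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `k_fold_indices` and `train_validation_splits` and the fixed random seed are hypothetical, and fitting the UTADIS model on each training sample is omitted.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Randomly split indices 0..n_items-1 into k mutually exclusive folds
    whose sizes differ by at most one."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    return [indices[f::k] for f in range(k)]


def train_validation_splits(n_items, k=10, seed=0):
    """Yield (training, validation) index lists, using each fold once for validation."""
    folds = k_fold_indices(n_items, k, seed)
    for f, validation in enumerate(folds):
        training = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield training, validation


# 152 firms, as in the study; the model-fitting step is not shown.
for training, validation in train_validation_splits(152):
    print(len(training), len(validation))
```

Because 152 is not a multiple of ten, eight folds hold 15 firms (leaving 137 for training) and two hold 16 (leaving 136), which is what "approximately equal size" allows for.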

X2 (total liabilities/shareholders’ equity) appears to
be the most important criterion in all 3 years, with an
average weight that ranges between 23.28% (year −1) and
34.04% (year −3). The profitability and efficiency
indicators (X5, X6, X7) also appear to be important in
classifying firms into the two groups, with average
weights between 11.31 and 25.08%. X1 (i.e., the current
assets ratio) carries a weight above 10% in years −1 and
−3, but it is considerably reduced to 2.45% in year −2.
Finally, X3, which corresponds to the annual growth of
the firm in terms of total assets, is the least important
criterion in all years.


Multicriteria Methods for Mergers and Acquisitions, Table 2
Classification accuracies in percent (averages over ten
replications)

Classification accuracies of the UTADIS model in the
development stage

              Acquired firms   Nonacquired firms   Overall accuracy
Year −1       80.1             71.8                …
Year −2       81.9             71.1                76.5
Year −3       81.1             70.3                …

Classification accuracies in the validation stage

                       Acquired firms   Nonacquired firms   Overall accuracy
Year −1   UTADIS       76.2             …                   …
          DA           77.9             …                   …
Year −2   UTADIS       75.3             …                   …
          DA           …                …                   …
Year −3   UTADIS       77.3             …                   …
          DA           …                …                   …


By comparing the score U(a) of each firm with the 
cutoff threshold that was calculated through the esti- 
mation of the UTADIS model and rule (3), we can de- 
cide whether a firm can be classified as acquired or not 
acquired. Table 2 presents the classification results ob- 
tained by UTADIS during the development and vali- 
dation process. In Table 2 we also present the classifi- 
cation results obtained by DA, used for benchmarking 
purposes. 
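The cutoff comparison and the accuracy figures reported in Table 2 can be illustrated with a short Python sketch. Rule (3) is not reproduced in this excerpt, so the sketch assumes that a firm whose score U(a) is at least the cutoff is labelled "acquired"; the function names and the five-firm data are hypothetical.

```python
def classify(scores, cutoff):
    # Assumption: higher scores indicate the acquired group (rule (3) not shown here).
    return ["acquired" if u >= cutoff else "nonacquired" for u in scores]


def accuracies(predicted, actual):
    """Per-group and overall classification accuracy, in percent."""
    result = {}
    for group in ("acquired", "nonacquired"):
        # Predictions made for firms that truly belong to this group.
        labels = [p for p, a in zip(predicted, actual) if a == group]
        correct = sum(1 for p in labels if p == group)
        result[group] = 100.0 * correct / len(labels) if labels else float("nan")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    result["overall"] = 100.0 * hits / len(actual)
    return result


# Hypothetical scores and true labels for five firms.
scores = [0.9, 0.6, 0.4, 0.55, 0.1]
actual = ["acquired", "acquired", "acquired", "nonacquired", "nonacquired"]
print(accuracies(classify(scores, cutoff=0.5), actual))
```

When both groups contain the same number of firms, the overall accuracy equals the mean of the two group accuracies; the Year −2 development figures are consistent with this, since 81.9 and 71.1 average to 76.5.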

The overall classification accuracy of the UTADIS 
model during the development stage is around 75%. 
Furthermore, the model appears to be quite robust, 
with classifications that do not deviate significantly 
from one year to another. Unsurprisingly, consistent 
with previous studies, the classification accuracy
decreases in the validation stage; however, the decrease is
relatively small and the overall classification accuracy
is now around 70%. It should be mentioned that while
our model misclassifies around 30% of the firms in the
validation dataset, this is not uncommon for studies on
the prediction of acquisition targets.

Other studies that used resampling techniques obtained
similar results. Bartley and Boardman [7] reported a
classification accuracy of 64%, while in a later study [8]
they obtained classification accuracies between 69.9 and
79.9%. Similarly, the classification accuracy in the study
of Kira and Morin [20] was 66.17%. The study of
Pasiouras et al. [27], which focused on the EU banking
industry, also reported classification accuracies between
61.6 and 65.7%. As Barnes [14] notes, perfect prediction
models are difficult to develop even for bankruptcy
prediction, where failing firms have clearly inferior or
abnormal performance compared with healthy firms. The
problem with the identification of acquisition targets is
not only that there are potentially many reasons for
acquisitions, but also that managers do not always act in
a manner which maximizes shareholders’ returns, owing to
hubris or agency motives.

While the comparison of the results obtained in the 
current study with those of previous studies gives
a first indication of the performance of the model,
a direct comparison is not appropriate because of
differences in the datasets [16,21], the industry under
investigation, the methods used to validate the models, and
so on. Hence, the comparison of the UTADIS model 
with the one developed with DA using exactly the same 
dataset, variables, and development and validation pro- 
cedures might provide a more accurate indication of 
the efficiency (in terms of classification accuracy) of the 
MCDA model. Looking at the results in Table 2, we see 
that UTADIS clearly achieves higher classification ac- 
curacies than DA. Furthermore, while the classification 
accuracies of DA decrease as we move back in time, the 
accuracies of UTADIS remain quite robust, even when 
we use data from 3 years prior to the acquisition.
Finally, with the exception of acquired firms in year −1,
UTADIS outperforms DA in correctly classifying firms of
both groups (i.e., acquired and nonacquired).


Conclusions 


In this paper we first discussed why MCDA could be 
useful in the prediction of acquisition targets and pro- 
vided a review of relevant studies. Then, we presented 
the UTADIS technique and its application to a dataset
of acquired and nonacquired UK firms. 

The application indicates that UTADIS not only
outperforms a model developed by DA, but also achieves
quite robust results, even as the data used move further
away from the period of the event.


Future applications of MCDA in the area of acqui- 
sitions prediction could focus on the incorporation of 
nonfinancial and qualitative data (e. g. managers’ expe- 
rience, managers’ educational background) in the anal- 
ysis. Although this has been mentioned in the literature 
in the past [22,38], there is still a lack of studies that 
use such variables in the analysis, usually owing to data 
availability. MCDA techniques, like UTADIS, can eas- 
ily incorporate qualitative data, and it would be there- 
fore interesting to perform such an exercise. Further- 
more, it would also be worthwhile to investigate the 
classification of firms in more than two groups (e.g., 
acquired, acquirers, noninvolved) as in the study of
Pasiouras et al. [25]. While the results of the latter study
were not promising, the study focused on the bank- 
ing industry, which is a special case. Hence, results 
from nonfinancial sectors might lead to different con- 
clusions. MCDA techniques, like MHDIS, which was 
developed with the multigroup discrimination in mind, 
might be useful in such applications. 
Decision making problems, according to their nature,
the policy of the decision maker, and the overall ob- 
jective of the decision may require the choice of an al- 
ternative solution, the ranking of the alternatives from 
the best to the worst ones or the sorting of the alterna- 
tives in predefined homogeneous classes [30]. For in- 
stance, a decision regarding the location of a new power 
plant can be considered as a choice problem, since the 
objective is to select the most appropriate location ac- 
cording to environmental, social and investment crite- 
ria. On the other hand, an evaluation of the efficiency of 
the different units of a firm can be considered as a rank- 
ing problem, since the objective is to estimate the rela- 
tive performance of each unit compared to the others. 
Finally, a credit granting decision is a sorting problem: 
a credit application can be accepted, rejected or submit- 
ted for further consideration, according to the business 
and personal profile of the applicant. In fact, a wide
variety of decision problems, including financial and 
investment decisions, environmental decisions, medi- 
cal decisions, etc., are better formulated and studied 
through the sorting approach. 

The sorting problem, generally stated, involves the 
assignment of a set of observations (objects, alterna- 
tives) described over a set of attributes or criteria into 
predefined homogeneous classes. This type of problem
can also be referred to as the ‘discrimination’ problem
or the ‘classification’ problem. Although any of these 
three terms can be used to describe the general
objective of the problem (i.e., the assignment of
observations into groups), they actually refer to two slightly
different situations: the discrimination or classification 
problem refers to the assignment of observations into 
classes which are not necessarily ordered. On the other 
hand, sorting refers to the problem in which the obser- 
vations should be classified into classes which are or- 
dered from the best to the worst ones. For instance, in 
medical diagnosis the classification of patients accord- 
ing to their symptoms into several possible diseases is 
a discrimination (classification) problem, since it is im- 
possible to establish a preference ordering between the 
diseases. On the contrary, the evaluation of bankruptcy 
risk is a sorting problem, since the non-bankrupt firms 
are preferred to the bankrupt ones. In this paper the 
terms ‘discrimination’, ‘classification’, and ‘sorting’ will 
be used without distinction to refer to the general prob- 
lem of assigning observations, objects or alternatives 
into classes. 

The major practical interest of the sorting problem has
motivated researchers to develop an arsenal of methods
for studying such problems, with the aim of developing
quantitative models that achieve the highest possible
classification accuracy and predictive ability. In 1936,
R.A. Fisher [8] was the first to
propose a framework for studying classification prob- 
lems taking into account their multidimensional na- 
ture. The linear discriminant analysis (LDA) that Fisher 
proposed has been used for decades as the main classifi- 
cation technique and it is still being used at least as a ref- 
erence point for comparing the performance of new 
techniques that are developed. C. Smith in 1947 [34] 
extended Fisher’s linear discriminant analysis propos- 
ing quadratic discriminant analysis (QDA) in order to 
overcome the restrictive assumption underlying LDA 
that groups have equal dispersion matrices. Later on, 
several other statistical classification approaches were
proposed. Among them, logit and probit analysis are the
most widely used techniques for overcoming
the multivariate normality assumption of discriminant 
analysis (both linear and quadratic). Although these 
techniques overcome most of the statistical restrictions 
imposed in discriminant analysis, their parameters are 
difficult to explain, especially in multigroup discrimi- 
nant problems. 

The continuous advances in other fields including 
operations research and artificial intelligence led many 
scientists and researchers to exploit the new capabilities
of these fields in developing more efficient classification
techniques. Among the attempts made one can
mention neural networks, machine learning, fuzzy sets 
as well as multicriteria decision aid (MCDA). This ar- 
ticle will focus on MCDA and its application in the 
study of classification problems with or without or- 
dered classes. MCDA provides an arsenal of powerful 
and efficient nonparametric classification methods and 
approaches, which are free of statistical assumptions 
and restrictions, while furthermore they are able to in- 
corporate the decision maker’s preferences in a flexible 
and realistic way. 

The remainder of the article is organized as fol- 
lows. Section 2 provides a review of MCDA sorting ap- 
proaches and techniques, outlining their basic charac- 
teristics, concepts and limitations. In section 3, a new 
MCDA sorting method is described and its operation is 
depicted through a simple illustrative example. Finally, 
section 4 concludes the paper and outlines some pos- 
sible future research directions concerning the applica- 
tion of MCDA in sorting problems. 


Multicriteria Sorting Methods 


The MCDA methods which have been proposed for 
the study of sorting problems can be distinguished ei- 
ther according to the approach from which they are 
originated (multi-objective/goal programming, multi- 
attribute utility theory, outranking relations, preference 
disaggregation), or according to the type of problem 
that they address (ordered or non-ordered classes). The 
review presented in this section will distinguish the 
methods according to their origin, but at the same
time the type of problems that they address will also be
discussed. 


Goal Programming Approaches 


The work of A. Charnes and W.W. Cooper [4] set the
foundations of goal/multi-objective programming, but
it can also be considered as one of the pioneering
studies in the field of MCDA in general. Since then, both
multi-objective and goal programming constitute two
major fields of interest from the theoretical and practical
points of view in the MCDA and operations research
communities. In particular, during the 1960s and the
1970s, goal programming approaches were used to elicit
attribute weights in multiple criteria ranking decision
problems [15,27,35,36]. N. Freed and F. Glover [9] were
among the first to investigate the potential of goal
programming techniques in the discriminant problem.
Their aim was to develop a linear
discriminant model so that the minimum distance of 
the score of each alternative from a predefined cut-off 
point is maximized (maximize the minimum distance- 
MMD). To develop this model, they proposed the fol- 
lowing goal programming formulation: 


\begin{aligned}
\max\;& d\\
\text{s.t.}\;& \sum_i w_i x_{ij} + d \le c, && \forall j \in \text{Group 1},\\
& \sum_i w_i x_{ij} - d \ge c, && \forall j \in \text{Group 2},
\end{aligned}

where w_i is the weight of attribute i, x_{ij} is the evaluation
of alternative j on attribute i, and c is the cut-off score
(w_i and d are unrestricted in sign). Soon after proposing
this model, the same authors proposed a variety of
similar goal programming formulations incorporating 
several other discrimination criteria, such as the sum of 
deviations (optimize the sum of deviations-OSD), the 
sum of interior deviations (minimize the sum of in- 
terior deviations-MSID) and the maximum deviation 
[10]. 
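To make the MMD objective concrete, the Python sketch below evaluates, for a fixed weight vector w and cut-off c, the largest d that satisfies both groups of constraints. It does not solve the linear program itself (that would require an LP solver); the function names and the two-attribute data are hypothetical.

```python
def discriminant_scores(weights, alternatives):
    """Linear score sum_i w_i * x_ij for each alternative j."""
    return [sum(w * x for w, x in zip(weights, alt)) for alt in alternatives]


def mmd_distance(weights, cutoff, group1, group2):
    """Largest d satisfying the MMD constraints for a fixed (w, c).

    Group 1 needs score + d <= c and group 2 needs score - d >= c, so d is the
    smallest gap between a score and the cut-off, measured on the correct side.
    A negative value means at least one alternative lies on the wrong side.
    """
    gaps = [cutoff - s for s in discriminant_scores(weights, group1)]
    gaps += [s - cutoff for s in discriminant_scores(weights, group2)]
    return min(gaps)


# Hypothetical two-attribute data.
group1 = [[1.0, 1.0], [2.0, 1.0]]
group2 = [[4.0, 3.0], [5.0, 2.0]]
print(mmd_distance([1.0, 1.0], 5.0, group1, group2))   # d = 2.0
print(mmd_distance([2.0, 2.0], 10.0, group1, group2))  # d = 4.0
```

Doubling both w and c doubles d, so whenever the groups can be separated the objective can be increased without limit unless w is normalized in some way; this is one route to the unbounded solutions discussed in the literature.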

These two studies attracted the interest of sev- 
eral operational researchers and management scien- 
tists. S.M. Bajgier and A.V. Hill [2] proposed a new 
goal programming approach in order to minimize the 
number of misclassifications using a mixed integer pro- 
gramming formulation (MIP) and conducted a first ex- 
perimental study to compare the MMD model, the OSD 
model, and their MIP formulation with LDA. They 
concluded that the goal programming formulations are 
generally superior to LDA, except for the case of mod- 
erate to low overlap between groups and equal disper- 
sion matrices, where LDA outperforms all the exam- 
ined goal programming formulations. 

The performance of goal programming approaches 
compared to statistical techniques was an issue that 
several researchers tried to investigate using mainly 
experimental data sets. Freed and Glover [11] compared
MMD, MSID, OSD and LDA and concluded that although
the presence of outliers poses a greater problem for the
two simpler goal programming formulations (MMD and
MSID) than for LDA, the goal programming approaches
generally outperform LDA. E.A. Joachimsthaler and
A. Stam [18] compared the LDA, QDA, logistic regression
and OSD procedures and concluded that these
methodologies produce similar results, although the
misclassification rates for LDA and QDA tended to
increase with highly kurtotic data and increased
dispersion heterogeneity. C.A.
Markowski and E.P. Markowski [22] examined the in- 
fluence of qualitative attributes on the discriminating 
performance of MMD and LDA. Although the incorpo- 
ration of qualitative attributes in LDA violates the nor- 
mality assumption, the experimental study of the au- 
thors showed that the incorporation of qualitative vari- 
ables improved the performance of LDA, while on the 
other hand MMD did not appear to be particularly well- 
suited for use with qualitative variables. In another ex- 
perimental study conducted by P.A. Rubin [32], QDA 
outperformed 15 goal programming approaches, lead- 
ing the author to indicate that ‘if LP models are to be 
considered seriously as an alternative to conventional 
procedures, they must be shown to outperform QDA 
under plausible conditions, presumably involving non- 
Gaussian data’. These experimental studies clearly indi- 
cate the confusion concerning the discriminating per- 
formance of the goal programming formulations as op- 
posed to well known multivariate statistical techniques. 
Apart from this issue, research in the field of goal
programming approaches for discriminant problems has
also focused on the theoretical drawbacks that were
often encountered. Markowski and Markowski [23] were the
first to identify two major drawbacks of the goal pro- 
gramming formulations (MMD and OSD) proposed by 
Freed and Glover [9,10]. More specifically, they proved 
that if each quadrant contains at least one case from 
the second group, unacceptable solutions will result in 
MMD (all coefficients in the discriminant function are 
zeros which leads all the observations to be classified 
in the same group), while furthermore they showed 
that the solutions (discriminant functions) obtained 
through the MMD and the OSD models are not stable 
when the data are transformed (when there is a shift 
from the origin). Apart from these two problems, many
goal programming formulations were found to suffer 
from two additional theoretical shortcomings [29]: 

a) they produce unbounded solutions, and 

b) they produce improper solutions. 

A solution is considered unbounded if the objective
function can be increased or decreased without limit,
in which case the discrimination rule (function) may be
meaningless, whereas a solution is improper if all
observations fall on the classification hyperplane.

To overcome these problems new goal program- 
ming formulations were proposed, including hybrid 
models [12,13], nonlinear programming formulations 
[37], as well as several mixed integer programming for- 
mulations [1,3,5,20,33,38,39]. 

In the light of this review of goal programming ap- 
proaches for discriminant problems it is possible to 
identify the following three characteristics of the re- 
search in this field: 

1) The majority of the proposed models aim at devel- 
oping a linear discrimination rule (function). The 
extension of the models to develop a nonlinear dis- 
criminant function leads to nonlinear programming 
formulations which are generally computationally 
intensive and difficult to solve. Among the few al- 
ternative approaches is the MSM method (mul- 
tisurface method) proposed by O.L. Mangasarian 
[21] that leads to the construction of a piecewise 
linear discrimination surface between two groups 
(see also [26] for a revision of the method using 
multi-objective programming and fuzzy mathemat- 
ical programming techniques). 

2) Little research has been done on extending the existing
framework to the multigroup discriminant
problem. E.-U. Choo and W.C. Wedley [5], W. Go- 
chet et al. [14], as well as J.M. Wilson [39] applied 
goal programming approaches in multigroup dis- 
criminant problems, but generally most of the stud- 
ies in this field were focused on two-group discrim- 
ination trying to extend the original goal program- 
ming models of Freed and Glover [9,10] in order to 
achieve higher classification accuracy and predict- 
ing ability. 

3) The models based on the goal programming ap- 
proach can be applied in any classification problem 
with or without ordered classes. 


Outranking Relations Approaches 


In contrast to the goal programming approaches, out- 
ranking relations procedures study the classification 
problem on a completely different basis. The aim of 
such procedures is not to develop a discriminant func- 
tion (linear or nonlinear), but instead their aim is to 
model the decision maker’s preferences and develop
a global preference model which can be used to as- 
sign the alternatives (observations) into the predefined 
classes. To achieve the classification of the alternatives 
some reference profiles are determined which can be 
considered as representative examples of each class. 
Through the comparison of each alternative with these 
reference profiles the classification of the alternatives is 
accomplished. 

A representative example of an MCDA sorting method
based on the outranking relations approach is the
ELECTRE TRI method proposed by W. Yu [40]. The
aim of ELECTRE TRI is to provide a sorting of the
alternatives under consideration into two or more ordered
categories. In order to define the categories, ELECTRE
TRI uses some reference alternatives (reference profiles)
r_i, i = 1, …, k − 1, which can be considered as
fictitious alternatives different from the alternatives
under consideration. The profile r_i is the theoretical
limit between the categories C_i and C_{i+1} (C_{i+1} is
preferred to C_i), and r_i is strictly better than r_{i−1} on
each criterion. To provide a sorting of the alternatives into
categories, ELECTRE TRI compares each alternative
with the profiles.

For an alternative a and a profile r_i, the concordance
index c_j(a, r_i) is calculated. This index expresses the
strength of the affirmation ‘alternative a is at least as
good as profile r_i on criterion j’. In order to compare
the alternative to a reference profile on the basis of more
than one criterion, a global concordance index C(a, r_i) is
calculated. This index expresses the strength of the
affirmation ‘a is at least as good as r_i according to all
criteria’. Setting w_j as the weight of criterion j, C(a, r_i)
is constructed as the weighted average of all c_j(a, r_i).

In contrast to the concordance index, the discordance
index D_j(a, r_i) expresses the strength of the opposition
to the affirmation ‘alternative a is at least as good as
profile r_i according to criterion j’. The calculation of the
discordance index is based on the definition of a veto
threshold v_j(r_i) for criterion j and the profile r_i. The
veto threshold v_j(r_i) defines the minimum difference
between the values of the profile r_i and alternative a on
criterion j beyond which the two are considered to have
totally different preference according to that criterion.

Let F(a, r_i) be the set consisting of all criteria for
which the discordance index value is greater than the
value of the global concordance index. For each
affirmation of the type ‘alternative a outranks profile r_i
according to all criteria’, the credibility index σ(a, r_i) is
calculated. If F(a, r_i) is empty, then σ(a, r_i) = C(a, r_i);
otherwise the credibility index is calculated as follows:

\sigma(a, r_i) = C(a, r_i) \prod_{j \in F(a, r_i)} \frac{1 - D_j(a, r_i)}{1 - C(a, r_i)}
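The concordance, discordance and credibility indices can be illustrated with a short Python sketch. The text does not give formulas for the per-criterion indices c_j and D_j, so the sketch assumes the piecewise-linear forms commonly used with ELECTRE TRI, with an indifference threshold q_j, a preference threshold p_j and a veto threshold v_j (q_j < p_j < v_j) on criteria to be maximized; the function names are hypothetical. The global concordance and credibility indices follow the definitions above.

```python
def partial_concordance(ga, gr, q, p):
    """c_j(a, r): 1 if a is at most q worse than r, 0 if at least p worse,
    linear in between (assumed standard form, criterion to be maximized)."""
    diff = gr - ga  # how much worse a is than the profile on this criterion
    if diff <= q:
        return 1.0
    if diff >= p:
        return 0.0
    return (p - diff) / (p - q)


def partial_discordance(ga, gr, p, v):
    """D_j(a, r): 0 if a is at most p worse than r, 1 if at least v worse (veto)."""
    diff = gr - ga
    if diff <= p:
        return 0.0
    if diff >= v:
        return 1.0
    return (diff - p) / (v - p)


def credibility(a, r, weights, q, p, v):
    """sigma(a, r) = C(a, r) * prod over F of (1 - D_j) / (1 - C)."""
    n = len(a)
    c = [partial_concordance(a[j], r[j], q[j], p[j]) for j in range(n)]
    d = [partial_discordance(a[j], r[j], p[j], v[j]) for j in range(n)]
    # Global concordance: weighted average of the partial indices.
    big_c = sum(w * cj for w, cj in zip(weights, c)) / sum(weights)
    sigma = big_c
    for dj in d:
        if dj > big_c:  # criterion j belongs to F(a, r); here big_c < 1
            sigma *= (1.0 - dj) / (1.0 - big_c)
    return sigma


# Hypothetical two-criterion example: a weakens on criterion 2.
print(credibility([10, 4], [8, 9], weights=[1, 1], q=[1, 1], p=[2, 2], v=[6, 6]))
```

In this example C(a, r) = 0.5 and D_2 = 0.75 exceeds it, so the credibility drops to 0.25; with a difference of 7 on criterion 2 the veto threshold is reached, D_2 = 1, and the credibility becomes 0.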

If the value of the credibility index of the affirmation
‘alternative a outranks profile r_i according to all criteria’
exceeds a predefined cut-off value λ, then the proposition
‘a outranks r_i’ can be considered to be valid. Denoting the
outranking relation as S, the preference (P), indifference
(I) and incomparability (R) relations between alternative a
and profile r_i can be defined as follows:

• a I r_i if and only if a S r_i and r_i S a;

• a P r_i if and only if a S r_i and not r_i S a;

• r_i P a if and only if not a S r_i and r_i S a;

• a R r_i if and only if not a S r_i and not r_i S a.

According to these relations, two sorting procedures are
applied: the pessimistic and the optimistic one. The
sorting procedure starts by comparing alternative a to
the worst profile r_1 and, in the case where a P r_1, a is
compared to the second profile r_2, etc., until one of the
following two situations appears:

i) a P r_i and either r_{i+1} P a or a I r_{i+1};

ii) a P r_i and a R r_{i+1}, …, a R r_{i+k}, r_{i+k+1} P a.

If situation i) appears, then alternative a is assigned to
category i + 1 by both the pessimistic and the optimistic
procedures. If situation ii) appears, then a is assigned to
category i + 1 by the pessimistic procedure and to
category i + k + 1 by the optimistic procedure.
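The relations and the assignment rule can be sketched in Python as follows. The sketch assumes that "a outranks r_i" holds when the credibility index is at least λ (whether the comparison is strict is a convention choice), labels the relations with hypothetical strings, and numbers categories from 1 (worst) upward. Patterns the text does not cover, such as an incomparability followed by a preference for a, are handled here simply by stopping at the first relation that is not an incomparability.

```python
def relation(sigma_a_r, sigma_r_a, cutoff):
    """Relation between alternative a and a profile r from the two credibility indices."""
    a_outranks_r = sigma_a_r >= cutoff
    r_outranks_a = sigma_r_a >= cutoff
    if a_outranks_r and r_outranks_a:
        return "I"    # indifference
    if a_outranks_r:
        return "aPr"  # a preferred to the profile
    if r_outranks_a:
        return "rPa"  # profile preferred to a
    return "R"        # incomparability


def assign(relations):
    """Return (pessimistic, optimistic) category numbers, 1 = worst category.

    relations[i] is the relation between a and profile r_{i+1}, worst profile first.
    """
    n = len(relations)
    i = 0
    while i < n and relations[i] == "aPr":  # a P r_1, ..., a P r_i
        i += 1
    pessimistic = i + 1
    j = i
    while j < n and relations[j] == "R":    # a R r_{i+1}, ..., a R r_{i+k}
        j += 1
    optimistic = j + 1
    return pessimistic, optimistic


# Situation ii) with i = 1 and k = 2: a P r_1, a R r_2, a R r_3, r_4 P a.
print(assign(["aPr", "R", "R", "rPa"]))  # (2, 4)
```

In situation i) the second loop stops immediately, so both procedures return category i + 1; a run of k incomparabilities moves only the optimistic assignment up by k categories.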

It is clear that the ELECTRE TRI method is a powerful
tool for analyzing the decision maker’s preferences
in sorting problems involving multiple criteria where
the classes are ordered. However, the major drawback
of the method is the significant amount of information
that it requires from the decision maker (weights
of the criteria, preference and indifference thresholds,
veto thresholds, etc.). This problem can be overcome
using decision instances (assignment examples) as
proposed in [25].

Other MCDA sorting methods based on the outranking
relations approach have been proposed in [24] (N-TOMIC
method), in [31], and through the PROMETHEE method
as modified in [19]. Furthermore,
P. Perny [28] extended the existing framework of the
sorting methods based on the outranking relations
approach to the case in which the groups are not
ordered. More specifically, he proposed the construction
of a fuzzy outranking relation in order to estimate the 
membership of each alternative for each group, and 
suggested two assignment procedures: 

a) filtering by strict preference (the assignment rule 
consists of testing whether an alternative is pre- 
ferred or not to a reference profile reflecting the 
lower limit of a group), and 

b) filtering by indifference (the assignment rule con- 
sists of testing whether an alternative is indifferent 
or not to a reference profile representing a prototype 
of a group). 

Overall, the main characteristics of sorting methods
based on the outranking relations approach of MCDA 
include their application to both sorting (ordered 
classes) as well as discrimination (non ordered classes) 
problems, and the significant amount of information
that they require from the decision maker.


Preference Disaggregation Approaches 


The preference disaggregation approach refers to the 
analysis (disaggregation) of the global preferences of 
the decision maker to deduce the relative importance 
of the evaluation criteria, using ordinal regression tech- 
niques based mainly on linear programming formula- 
tions. 

In contrast to the outranking relations approach, the
global preference model of the decision maker is not 
constructed through a direct interrogation procedure 
between the decision analyst and the decision maker. 
Instead, decision instances (e. g. past decisions) are used 
in order to analyze the decision policy of the decision 
maker, to specify his/her preferences and construct the 
corresponding global preference model as consistently 
as possible. 

A well known preference disaggregation method is 
the UTA method (UTilités Additives) proposed in [17]. 
Given a predefined ranking of a reference set of alterna- 
tives, the aim of the UTA method is to construct a set of 
additive utility functions which are as consistent as
possible with the pre-ordering of the alternatives (and
consequently with the decision maker’s preferences). The
form of the additive utility function is the following:

U(g) = \sum_j u_j(g_j),

where U(g) denotes the global utility of an alternative
described over a vector of criteria g, while u_j(g_j) is the
partial or marginal utility of an alternative on criterion
g_j.
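An additive model with piecewise-linear marginal utilities can be sketched in Python as below. The breakpoints and utility values are hypothetical; they are chosen so that each marginal utility is nondecreasing and the marginal utilities sum to 1 at the best values, a normalization commonly used with UTA-type models but not stated in the text.

```python
from bisect import bisect_right


def marginal_utility(value, breakpoints, utilities):
    """Piecewise-linear u_j: interpolate between (breakpoint, utility) pairs.

    Values outside the breakpoint range are clamped to the end points.
    """
    if value <= breakpoints[0]:
        return utilities[0]
    if value >= breakpoints[-1]:
        return utilities[-1]
    k = bisect_right(breakpoints, value) - 1  # segment [breakpoints[k], breakpoints[k+1]]
    x0, x1 = breakpoints[k], breakpoints[k + 1]
    y0, y1 = utilities[k], utilities[k + 1]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


def global_utility(g, marginals):
    """U(g) = sum_j u_j(g_j); marginals[j] is a (breakpoints, utilities) pair."""
    return sum(marginal_utility(gj, bp, ut) for gj, (bp, ut) in zip(g, marginals))


# Hypothetical marginal utilities for two criteria.
criterion1 = ([0.0, 50.0, 100.0], [0.0, 0.3, 0.4])
criterion2 = ([0.0, 1.0], [0.0, 0.6])
print(global_utility([50.0, 0.5], [criterion1, criterion2]))
```

The kink at 50 on the first criterion is what makes the global model nonlinear in the criteria even though it is additive.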

Besides the study of ranking problems, the
methodological framework of the preference
disaggregation approach using the UTA method is also
applicable to sorting problems. The UTADIS method
(UTilités Additives DIScriminantes) [6,16,17,42] is
a representative example. In the UTADIS method, the
sorting of the alternatives is accomplished by comparing
the global utility (score) of each alternative a, denoted as
U(a), with some thresholds (u_1, …, u_{q−1}) which
distinguish the classes C_1, …, C_q (the classes are ordered,
so that C_1 is the class of the best alternatives and C_q is
the class of the worst alternatives):

U(a) \ge u_1 \;\Rightarrow\; a \in C_1
u_2 \le U(a) < u_1 \;\Rightarrow\; a \in C_2
\vdots
u_k \le U(a) < u_{k-1} \;\Rightarrow\; a \in C_k
\vdots
U(a) < u_{q-1} \;\Rightarrow\; a \in C_q
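Given the scores and the thresholds, the UTADIS assignment rule reduces to a scan from the best class downwards. The Python sketch below assumes that the thresholds are supplied in decreasing order u_1 > u_2 > … > u_{q−1} and that classes are numbered 1 (best) to q; the function name is hypothetical.

```python
def utadis_class(score, thresholds):
    """Class index k (1 = best) such that u_k <= U(a) < u_{k-1}.

    thresholds = [u_1, u_2, ..., u_{q-1}] in decreasing order.
    """
    for k, u in enumerate(thresholds, start=1):
        if score >= u:
            return k
    # Below every threshold: the worst class C_q.
    return len(thresholds) + 1


# Three classes separated by u_1 = 0.7 and u_2 = 0.4.
print([utadis_class(s, [0.7, 0.4]) for s in (0.8, 0.7, 0.5, 0.4, 0.1)])
```

A score equal to a threshold goes to the better class, matching the non-strict inequality U(a) ≥ u_k in the rule.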


The objective of the UTADIS method is to estimate 
an additive utility function and the utility thresholds 
in order to minimize the classification error. The
classification error is measured through two error
functions denoted as σ+(a) and σ−(a), representing the
deviations of a misclassified alternative from the utility
threshold. The estimation of both the additive utility 
model and the utility thresholds is achieved through 
linear programming techniques [6,42]. 
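For fixed scores and thresholds, the two error functions can be evaluated directly, which shows what the linear program penalizes. The text does not spell out their definitions, so this sketch assumes that σ+(a) measures how far U(a) falls below the lower threshold of the class to which a actually belongs and σ−(a) how far it rises above the upper threshold. The optional `delta` parameter is a hypothetical small constant that makes the strict inequality U(a) < u_{k−1} enforceable. In UTADIS itself these errors are decision variables of the linear program, not quantities computed afterwards.

```python
def classification_errors(score, k, thresholds, delta=0.0):
    """Return (sigma_plus, sigma_minus) for an alternative of class k (1 = best).

    thresholds = [u_1, ..., u_{q-1}] in decreasing order.
    """
    q = len(thresholds) + 1
    # Class k occupies [u_k, u_{k-1}); the best and worst classes are open-ended.
    lower = thresholds[k - 1] if k < q else float("-inf")
    upper = thresholds[k - 2] if k > 1 else float("inf")
    sigma_plus = max(0.0, lower - score)           # score too low for its class
    sigma_minus = max(0.0, score - upper + delta)  # score too high for its class
    return sigma_plus, sigma_minus


# A best-class alternative scored 0.5 falls 0.2 short of u_1 = 0.7.
print(classification_errors(0.5, 1, [0.7, 0.4]))
```

Correctly classified alternatives get (0.0, 0.0), so summing these errors over the reference set gives the total misclassification magnitude that the basic UTADIS formulation minimizes.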

See [7] and [41] for three variants of the UTADIS 
method to improve the classification accuracy of the 
obtained additive utility models as well as their
predictive ability. The first variant (UTADIS I), in
addition to the classification errors, also incorporates
the distances of the correctly classified alternatives from
the utility thresholds, which have to be maximized. The
second variant (UTADIS II) is based on a mixed integer
programming formulation minimizing the number of
misclassifications instead of their magnitude, while the
third variant (UTADIS III) combines UTADIS I and II,
and its aim is to minimize the number of
misclassifications and maximize the distances of the
correctly classified alternatives from the utility thresholds.

Overall, the main characteristics of the application of
the preference disaggregation approach in the study of
sorting problems can be summarized in the following
three aspects.

1) The information that is required is minimal, since, 
similarly to the goal programming approaches, only 
a predefined classification of a reference set of alter- 
natives is required. 

2) The preference disaggregation approach is focused 
only on decision problems where the classes are or- 
dered, since it is assumed that there is a strict pref- 
erence relation between the classes. 

3) The classification/sorting models which are devel- 
oped have a nonlinear form, since the marginal util- 
ities of the evaluation criteria are piecewise linear 
and consequently the global utility model is also 
nonlinear, in contrast to the linear discriminant 
models used in the goal programming approaches. 
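Aspect 3) can be made concrete with a small sketch of a piecewise-linear marginal utility and the resulting additive global utility. The breakpoints and values below are hypothetical, and linear interpolation between breakpoints is assumed; the point is only that a sum of such functions is nonlinear in the criterion values, unlike a linear discriminant score.

```python
from bisect import bisect_right
from typing import Sequence


def marginal_utility(x: float, breaks: Sequence[float], values: Sequence[float]) -> float:
    """Piecewise-linear marginal utility through the points (breaks[j], values[j]).

    breaks must be strictly increasing; x outside [breaks[0], breaks[-1]] is
    clamped to the end values.
    """
    if x <= breaks[0]:
        return values[0]
    if x >= breaks[-1]:
        return values[-1]
    j = bisect_right(breaks, x) - 1            # breaks[j] <= x < breaks[j + 1]
    frac = (x - breaks[j]) / (breaks[j + 1] - breaks[j])
    return values[j] + frac * (values[j + 1] - values[j])


def global_utility(evaluations: Sequence[float],
                   breaks: Sequence[Sequence[float]],
                   values: Sequence[Sequence[float]]) -> float:
    """Additive model U(a) = sum_i u_i(g_i(a)), one marginal utility per criterion."""
    return sum(marginal_utility(x, b, v) for x, b, v in zip(evaluations, breaks, values))


# Hypothetical concave marginal utility on a 0-100 scale
breaks = [0.0, 50.0, 100.0]
values = [0.0, 0.4, 0.5]
print(marginal_utility(50.0, breaks, values))    # 0.4, not the chord value 0.25
print(global_utility([25.0, 75.0], [breaks, breaks], [values, values]))
```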


A Multigroup Hierarchical Discrimination Method 


In this section a new method is presented for the study of discrimination problems with two or more ordered groups (multigroup discrimination). The proposed method is called M.H.DIS (Multigroup Hierarchical DIScrimination) and differs from most of the aforementioned MCDA approaches in two major aspects.

1) It employs a hierarchical discrimination approach: the method does not aim at developing an overall global preference model (discriminant function) that characterizes all the observations (alternatives or objects). Instead, the method distinguishes the groups progressively, starting by discriminating the first group (best alternatives) from all the others, and then proceeding to the discrimination between the objects that belong to the other groups.

2) It accommodates three different discrimination criteria in a very flexible and efficient way. The most common discrimination criterion in the previous approaches is the minimization of the classification error, which is measured as the deviations of the scores of the misclassified alternatives from some cut-off points. However, such an objective does not necessarily yield the optimal classification rule. For instance, suppose that in a discrimination problem three alternatives are misclassified with the following deviations from the cut-off point: [0.25, 0.25, 0.25], so that the total classification error to be minimized is 0.75. This classification result is not optimal, since a classification result [0, 0, 0.75] yields the same value for the overall classification error (0.75), but there is only one misclassified alternative instead of three. Several mixed integer programming formulations have been proposed to confront this issue, but their application in real world problems is hindered by the significant amount of time required to solve such problems. M.H.DIS employs an efficient mixed integer programming (MIP) formulation for minimizing the number of misclassifications, once the minimization of the classification error has been achieved. Furthermore, M.H.DIS also considers a third criterion in order to achieve the highest possible discrimination. These three discrimination criteria have been used in previous studies separately, or in hybrid models [12,13], but they have never been used through a sequential procedure. In M.H.DIS, by contrast, the classification error is minimized first. Then, considering only the misclassified alternatives, M.H.DIS tries to ‘re-arrange’ their classification errors in order to minimize the number of misclassifications, and finally the maximum discrimination between the alternatives is attempted.
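The numerical argument in item 2) can be checked directly. The sketch below, with hypothetical function names, compares the two objectives on the deviation vectors used in the text: the total error cannot tell them apart, while the count of misclassified alternatives can.

```python
from typing import Sequence


def total_error(deviations: Sequence[float]) -> float:
    """Objective of an error-minimizing LP: sum of deviations from the cut-off."""
    return sum(deviations)


def misclassification_count(deviations: Sequence[float], tol: float = 1e-12) -> int:
    """Objective of a MIP such as LP2: number of alternatives with a positive deviation."""
    return sum(1 for d in deviations if d > tol)


spread = [0.25, 0.25, 0.25]        # three misclassified alternatives
concentrated = [0.0, 0.0, 0.75]    # one misclassified alternative

for name, devs in (("spread", spread), ("concentrated", concentrated)):
    print(name, total_error(devs), misclassification_count(devs))
# spread 0.75 3
# concentrated 0.75 1
```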


Model Formulation 


Let $A = \{a_1, \ldots, a_n\}$ be a set of $n$ alternatives which should be classified into $q$ ordered classes $C_1, \ldots, C_q$ ($C_1$ is preferred to $C_2$, $C_2$ is preferred to $C_3$, etc.). Each alternative is described (evaluated) along a set $G = \{g_1, \ldots, g_m\}$ of $m$ evaluation criteria. The evaluation of each alternative $a$ on criterion $g_i$ is denoted as $g_i(a)$. According to the set $A$ of alternatives, $p_i$ different values for each criterion $g_i$ can be distinguished. These $p_i$ values are rank-ordered from the smallest value $g_i^1$ to the largest value $g_i^{p_i}$. Furthermore, among the set of criteria it is possible to distinguish two subsets: a subset $G_1$ consisting of $m_1$ criteria for which higher values indicate higher preference, and a second subset $G_2$ consisting of $m_2$ criteria for which the decision maker’s preference is a decreasing function of the criterion’s scale. For instance, in an investment decision problem $G_1$ may include criteria related to the return of an investment project (projects with higher return are preferred), while $G_2$ may include criteria related to the risk of the investment (projects with lower risk are preferred).


The Hierarchical Discrimination Process 


The method proceeds progressively in the classification of the alternatives into the predefined classes, starting from class $C_1$ (best alternatives). Initially, the aim is to identify which alternatives belong in class $C_1$. The alternatives which are found to belong in class $C_1$ (either correctly or incorrectly) are excluded from further consideration. In a second stage the objective is to identify which alternatives belong in class $C_2$. The alternatives which are found to belong in this class (either correctly or incorrectly) are excluded from further consideration, and the same procedure continues until all alternatives have been classified in the predefined classes.
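The stage-by-stage exclusion just described can be sketched as a simple loop. This is an illustration only: each stage is represented by a hypothetical predicate answering "does $a$ belong to $C_k$?", which in M.H.DIS would be the comparison of the two utility functions estimated at that stage.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def hierarchical_sort(alternatives: Iterable[Hashable],
                      stage_tests: List[Callable[[Hashable], bool]]) -> Dict[Hashable, int]:
    """Assign each alternative to one of q = len(stage_tests) + 1 ordered classes."""
    remaining = list(alternatives)
    assignment: Dict[Hashable, int] = {}
    for k, belongs_to_ck in enumerate(stage_tests, start=1):
        still_open = []
        for a in remaining:
            if belongs_to_ck(a):
                assignment[a] = k      # assigned to C_k (rightly or wrongly), then dropped
            else:
                still_open.append(a)
        remaining = still_open
    q = len(stage_tests) + 1
    for a in remaining:
        assignment[a] = q              # whatever is left falls into the worst class C_q
    return assignment


# Hypothetical stage tests based on a single score, for q = 3 classes
score = {"x": 0.9, "y": 0.5, "z": 0.2}
tests = [lambda a: score[a] >= 0.7, lambda a: score[a] >= 0.4]
print(hierarchical_sort(score, tests))  # {'x': 1, 'y': 2, 'z': 3}
```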

Throughout this hierarchical classification procedure, it is assumed that the decision maker’s preferences are monotone functions (increasing or decreasing) on the criteria’s scale. This assumption implies that in the case of a criterion $g_i \in G_1$, as the evaluation of an alternative on this criterion increases, the decision of classifying this alternative into a higher (better) class becomes more favorable than a decision of classifying it into a lower (worse) class. For instance, in the credit granting problem, as the profitability of a firm increases, the credit analyst will be more inclined to classify the firm as a healthy firm rather than as a risky one. A similar implication, with the direction reversed, holds for each criterion $g_i \in G_2$.

This preference relation between the several possible decisions of classifying a specific alternative $a$ into one of the predefined classes imposes the following general classification rule:

The decision concerning the classification of an alternative $a$ into one of the predefined classes should be made in such a way that the utility (value) of such a decision for the decision maker is maximized.


The utility of a decision concerning the classification of an alternative $a$ into group $C_j$ can be expressed in the form of an additive utility function:

$$U^{C_j}(a) = \sum_{i=1}^{m} u_i^{C_j}[g_i(a)] \in [0, 1],$$


where $u_i^{C_j}[g_i(a)]$ denotes the marginal (partial) utility of the decision concerning the classification of an alternative $a$ into group $C_j$ according to criterion $g_i$. If $g_i \in G_1$, then $u_i^{C_j}(g_i)$ will be an increasing function on the criterion’s scale. On the contrary, for the same criterion $g_i \in G_1$ the marginal utility regarding the classification of an alternative into a lower (worse) class $C_k$ ($k > j$) will be a decreasing function on the criterion’s scale. For instance, consider once again the credit granting problem: since healthy firms are generally characterized by high profitability, the marginal utility for a profitability criterion for the group of healthy firms will be an increasing function, indicating that as profitability increases the preference for the decision to classify a firm in the group of healthy firms is also increasing. On the other hand, for the group of risky firms the marginal utility will be a decreasing function of the criterion’s (profitability) values, indicating that as profitability increases the preference for the decision to classify a firm in the group of risky firms is decreasing.

Consequently, at each stage of the hierarchical classification procedure that was described above, two utility functions are constructed. The first one corresponds to the utility of a decision concerning the classification of an alternative $a$ into class $C_k$ (denoted as $U^{C_k}(a)$), while the second one corresponds to the utility of a decision concerning the nonclassification of an alternative $a$ into class $C_k$ (denoted as $U^{\sim C_k}(a)$). Based on these two utility functions, the aforementioned general classification rule can be expressed as follows:


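$$
\begin{aligned}
&\text{if } U^{C_k}(a) > U^{\sim C_k}(a), \text{ then } a \in C_k,\\
&\text{if } U^{C_k}(a) < U^{\sim C_k}(a), \text{ then } a \notin C_k.
\end{aligned}
\qquad (1)
$$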
Multicriteria Sorting Methods, Figure 1
The hierarchical classification procedure 


Following this rule, the overall hierarchical discrimina- 
tion procedure is presented in Fig. 1. 


Estimation of Utility Functions 


According to the hierarchical discrimination procedure which was described above, to achieve the classification of the alternatives in $q$ classes, the number of utility functions which must be estimated is $2(q-1)$. The estimation of these utility functions in M.H.DIS is accomplished through linear programming techniques. More specifically, at each stage of the hierarchical discrimination procedure, two linear programs and one mixed integer program are solved to estimate ‘optimally’ the two utility functions.


LP1: Minimizing the Overall Classification Error 


According to the classification rule (1), to achieve the correct classification of an alternative $a \in C_k$ at stage $k$ (cf. Fig. 1), the estimated utility functions should satisfy the following constraint:

$$U^{C_k}(a) > U^{\sim C_k}(a).$$

Since in linear programming it is not possible to use strict inequality constraints, a small positive real number $s$ may be used as follows:

$$U^{C_k}(a) - U^{\sim C_k}(a) \ge s.$$


If for an alternative $a \in C_k$ the classification rule at stage $k$ yields $U^{C_k}(a) < U^{\sim C_k}(a)$, then this alternative is misclassified, since according to the rule it would be classified in one of the lower classes (the specific classification of the alternative will be determined in the next stages of the hierarchical discrimination process). The classification error in this case is:

$$e(a) = U^{\sim C_k}(a) - U^{C_k}(a) + s.$$


Similarly, to achieve the correct classification of an alternative $b \notin C_k$ at stage $k$, the estimated utility functions should satisfy the following constraint:

$$U^{\sim C_k}(b) - U^{C_k}(b) \ge s.$$

If this constraint is not satisfied for an alternative $b \notin C_k$ at stage $k$, then this alternative would be classified in class $C_k$, and the classification error in this case is $e(b) = U^{C_k}(b) - U^{\sim C_k}(b) + s$.

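Moreover, to achieve the monotonicity of the marginal utilities, the following constraints are imposed:

$$
\left.
\begin{aligned}
u_i^{C_k}(g_i^1) &= 0\\
u_i^{\sim C_k}(g_i^{p_i}) &= 0\\
u_i^{C_k}(g_i^{j+1}) &> u_i^{C_k}(g_i^{j})\\
u_i^{\sim C_k}(g_i^{j+1}) &< u_i^{\sim C_k}(g_i^{j})
\end{aligned}
\right\}
\quad \text{if } g_i \in G_1, \qquad (2)
$$

$$
\left.
\begin{aligned}
u_i^{C_k}(g_i^{p_i}) &= 0\\
u_i^{\sim C_k}(g_i^1) &= 0\\
u_i^{C_k}(g_i^{j+1}) &< u_i^{C_k}(g_i^{j})\\
u_i^{\sim C_k}(g_i^{j+1}) &> u_i^{\sim C_k}(g_i^{j})
\end{aligned}
\right\}
\quad \text{if } g_i \in G_2, \qquad (3)
$$

where $g_i^j$ and $g_i^{j+1}$ are two consecutive values of criterion $g_i$ ($g_i^{j+1} > g_i^j$ for all $g_i \in G$). These constraints can be simplified by setting:

$$
\text{if } g_i \in G_1:\quad
\begin{cases}
w_{i,j,j+1}^{C_k} = u_i^{C_k}(g_i^{j+1}) - u_i^{C_k}(g_i^{j})\\
w_{i,j,j+1}^{\sim C_k} = u_i^{\sim C_k}(g_i^{j}) - u_i^{\sim C_k}(g_i^{j+1})
\end{cases}
\qquad (4)
$$

$$
\text{if } g_i \in G_2:\quad
\begin{cases}
w_{i,j,j+1}^{C_k} = u_i^{C_k}(g_i^{j}) - u_i^{C_k}(g_i^{j+1})\\
w_{i,j,j+1}^{\sim C_k} = u_i^{\sim C_k}(g_i^{j+1}) - u_i^{\sim C_k}(g_i^{j})
\end{cases}
\qquad (5)
$$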


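The marginal utility of criterion $g_i$ at point $g_i^j$ can then be calculated through the following formulas (stated for $g_i \in G_1$; for $g_i \in G_2$ the roles of the two sums are interchanged):

$$
u_i^{C_k}(g_i^j) = \sum_{l=1}^{j-1} w_{i,l,l+1}^{C_k},
\qquad
u_i^{\sim C_k}(g_i^j) = \sum_{l=j}^{p_i-1} w_{i,l,l+1}^{\sim C_k}.
\qquad (6)
$$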
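Relation (6) amounts to prefix sums of the increments $w^{C_k}$ and suffix sums of the decrements $w^{\sim C_k}$. A minimal sketch for a single criterion $g_i \in G_1$, with hypothetical step values, follows; both returned lists are indexed by the rank $j = 1, \ldots, p_i$ of the criterion value (position $j-1$ in Python).

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def marginal_utilities(w_c: Sequence[float],
                       w_not_c: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Marginal utilities at g_i^1, ..., g_i^{p_i} from the step variables, as in (6).

    w_c[l-1]     plays the role of w^{C_k}_{i,l,l+1}
    w_not_c[l-1] plays the role of w^{~C_k}_{i,l,l+1}
    Assumes g_i is in G_1 (higher values preferred).
    """
    # u^{C_k}(g_i^j) = sum_{l=1}^{j-1} w^{C_k}_{i,l,l+1}: prefix sums, starting at 0
    u_c = [0.0] + list(accumulate(w_c))
    # u^{~C_k}(g_i^j) = sum_{l=j}^{p_i-1} w^{~C_k}_{i,l,l+1}: suffix sums, ending at 0
    u_not_c = list(accumulate(reversed(w_not_c)))[::-1] + [0.0]
    return u_c, u_not_c


# Hypothetical step variables for a criterion with p_i = 4 distinct values
u_c, u_not_c = marginal_utilities([0.1, 0.2, 0.3], [0.3, 0.2, 0.1])
print(u_c)      # increasing, starts at 0
print(u_not_c)  # decreasing, ends at 0
```

The constraints $w \ge t$ then guarantee strict monotonicity automatically, which is why (2) and (3) can be replaced by simple lower bounds on the $w$ variables.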


Using these transformations, constraints (2) and (3) can be rewritten as follows (a small positive number $t$ is used to ensure the strict inequality):

$$
w_{i,j,j+1}^{C_k} \ge t, \qquad w_{i,j,j+1}^{\sim C_k} \ge t, \qquad \forall g_i \in G,\ j = 1, \ldots, p_i - 1.
$$


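Consequently, the initial linear program (LP1) to be solved can be formulated as follows:

$$
\begin{aligned}
\min\ & F = \sum_{a \in A} e(a)\\
\text{s.t. } & U^{C_k}(a) - U^{\sim C_k}(a) + e(a) \ge s, && \forall a \in C_k,\\
& U^{\sim C_k}(b) - U^{C_k}(b) + e(b) \ge s, && \forall b \notin C_k,\\
& w_{i,j,j+1}^{C_k} \ge t,\quad w_{i,j,j+1}^{\sim C_k} \ge t, && \forall i, j,\\
& \sum_{i} \sum_{j} w_{i,j,j+1}^{C_k} = 1,\quad \sum_{i} \sum_{j} w_{i,j,j+1}^{\sim C_k} = 1,\\
& e(a) \ge 0,\quad s, t > 0.
\end{aligned}
$$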
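For fixed utility functions, the smallest error satisfying each LP1 constraint is $e(a) = \max\{0, s - \text{margin}\}$, where the margin is $U^{C_k}(a) - U^{\sim C_k}(a)$ for $a \in C_k$ and $U^{\sim C_k}(b) - U^{C_k}(b)$ for $b \notin C_k$. The sketch below evaluates this for hypothetical utilities; it does not solve LP1 itself, which would require an LP solver to choose the $w$ variables as well.

```python
from typing import Dict, Hashable, Set


def lp1_errors(u_c: Dict[Hashable, float], u_not_c: Dict[Hashable, float],
               in_ck: Set[Hashable], s: float = 0.001) -> Dict[Hashable, float]:
    """Smallest e(a) >= 0 satisfying the LP1 constraints for fixed utilities."""
    errors = {}
    for a in u_c:
        if a in in_ck:
            margin = u_c[a] - u_not_c[a]      # should be >= s for a in C_k
        else:
            margin = u_not_c[a] - u_c[a]      # should be >= s for b not in C_k
        errors[a] = max(0.0, s - margin)
    return errors


# Hypothetical utilities at some stage k; x and y truly belong to C_k, z does not
u_c = {"x": 0.6, "y": 0.3, "z": 0.5}
u_not_c = {"x": 0.2, "y": 0.4, "z": 0.45}
e = lp1_errors(u_c, u_not_c, in_ck={"x", "y"})
F = sum(e.values())  # value of the LP1 objective for these fixed utilities
print(e, F)
```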


LP2: Minimizing the Number of Misclassifications 


If after the solution of (LP1) there exist some alternatives $a \in A$ for which $e(a) > 0$, then obviously these alternatives are misclassified. However, as has already been illustrated during the discussion of the main characteristics of M.H.DIS, it may be possible to achieve a ‘re-arrangement’ of the classification errors which may lead to a reduction of the number of misclassifications.

In M.H.DIS this is achieved through a mixed integer programming (MIP) formulation. However, since MIP formulations are difficult to solve, especially in cases where the number of integer or binary variables is large, the MIP formulation used in M.H.DIS considers only the misclassifications that occurred when solving (LP1), while retaining all the correct classifications. Let $C$ be the set of alternatives which have been correctly classified after solving (LP1), and $M$ be the set of misclassified alternatives for which $e(a) > 0$. The MIP formulation used in M.H.DIS is the following (LP2):


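$$
\begin{aligned}
\min\ & F = \sum_{a \in M} I(a)\\
\text{s.t. } & U^{C_k}(a) - U^{\sim C_k}(a) \ge s, && \forall a \in C_k \cap C,\\
& U^{\sim C_k}(b) - U^{C_k}(b) \ge s, && \forall b \notin C_k,\ b \in C,\\
& U^{C_k}(a) - U^{\sim C_k}(a) + I(a) \ge s, && \forall a \in C_k \cap M,\\
& U^{\sim C_k}(b) - U^{C_k}(b) + I(b) \ge s, && \forall b \notin C_k,\ b \in M,\\
& w_{i,j,j+1}^{C_k} \ge t,\quad w_{i,j,j+1}^{\sim C_k} \ge t, && \forall i, j,\\
& \sum_{i} \sum_{j} w_{i,j,j+1}^{C_k} = 1,\quad \sum_{i} \sum_{j} w_{i,j,j+1}^{\sim C_k} = 1,\\
& I(a) \in \{0, 1\} \text{ integer}, && \forall a \in M.
\end{aligned}
$$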


The first set of constraints is used to ensure that all the correct classifications achieved by solving (LP1) are retained. The second set of constraints is used only for the alternatives which were misclassified by (LP1). Their meaning is similar to the constraints in LP1, with the only difference being the transformation of the continuous variables $e(a)$ of LP1 (classification errors) into integer variables $I(a)$ which indicate whether an alternative is misclassified or not. The meaning of the final two constraints has already been illustrated in the discussion of the LP1 formulation. The objective of LP2 is to minimize the number of misclassifications that occurred in the solution of LP1.


LP3: Maximizing the Minimum Distance 


Solving LP1 and LP2, the ‘optimal’ classification of the alternatives has been achieved, where the term ‘optimal’ refers to the minimization of the number of misclassified alternatives. However, the correct classification of some alternatives may have been ‘marginal’; that is, although they are correctly classified, their global utilities according to the two utility functions developed may have been very close. The objective of LP3 is to maximize the minimum difference between the global utilities of the correctly classified alternatives achieved according to the two utility functions.

Similarly to LP2, let $C$ be the set of alternatives which have been correctly classified after solving LP1 and LP2, and $M$ be the set of misclassified alternatives. LP3 can be formulated as follows:

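$$
\begin{aligned}
\max\ & d\\
\text{s.t. } & U^{C_k}(a) - U^{\sim C_k}(a) - d \ge s, && \forall a \in C_k \cap C,\\
& U^{\sim C_k}(b) - U^{C_k}(b) - d \ge s, && \forall b \notin C_k,\ b \in C,\\
& U^{C_k}(a) - U^{\sim C_k}(a) \le s, && \forall a \in C_k \cap M,\\
& U^{\sim C_k}(b) - U^{C_k}(b) \le s, && \forall b \notin C_k,\ b \in M,\\
& w_{i,j,j+1}^{C_k} \ge t,\quad w_{i,j,j+1}^{\sim C_k} \ge t, && \forall i, j,\\
& \sum_{i} \sum_{j} w_{i,j,j+1}^{C_k} = 1,\quad \sum_{i} \sum_{j} w_{i,j,j+1}^{\sim C_k} = 1,\\
& d \ge 0.
\end{aligned}
$$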


The first set of constraints involves only the cor- 
rectly classified alternatives. In these constraints d rep- 
resents the minimum absolute difference between the 
global utilities of each alternative in the two utility func- 
tions. The second set of constraints involves the mis- 
classified alternatives and it is used to ensure that they 
will be retained as misclassified. 


An Illustrative Example 


To illustrate the application of the method, consider a simple example consisting of six alternatives evaluated along three evaluation criteria [25] for which higher values are preferred. The alternatives must be classified in three ordered classes. Table 1 illustrates the evaluation of the alternatives on the criteria as well as the predefined classification.


Distinguishing Between $C_1$ and $\{C_2, C_3\}$


In the first stage of the hierarchical discrimination procedure, the aim is to distinguish the alternatives belonging in class $C_1$ from the alternatives belonging in classes $C_2$ and $C_3$. To achieve this classification, two utility functions are developed, denoted as $U^{C_1}(a)$ and $U^{\sim C_1}(a)$.

Multicriteria Sorting Methods, Table 1
Data of the illustrative example (Source: [25])

Alternative   $g_1$   $g_2$    $g_3$     Class
$a_1$         70      64.75    46.25     $C_1$
$a_2$         61      62       60        $C_1$
$a_3$         40      50       37        $C_2$
$a_4$         66      40       23.125    $C_2$
$a_5$         20      20       20        $C_3$
$a_6$         15      15       30        $C_3$

The utility of the decision of classifying the alternative $a_1$ in class $C_1$ can be expressed as follows:

$$U^{C_1}(a_1) = u_1^{C_1}(70) + u_2^{C_1}(64.75) + u_3^{C_1}(46.25). \qquad (7)$$

On the other hand, if $a_1$ is classified in one of the classes $C_2$ or $C_3$, then the utility of the decision maker will be:

$$U^{\sim C_1}(a_1) = u_1^{\sim C_1}(70) + u_2^{\sim C_1}(64.75) + u_3^{\sim C_1}(46.25),$$

which, in terms of the step variables $w$ introduced below through relation (4), reduces to $U^{\sim C_1}(a_1) = w_{3,56}^{\sim C_1}$.


Since for all criteria higher values are preferred, it is possible to define the following rank-order on each criterion’s scale ($p_1 = p_2 = p_3 = 6$):

$$
\begin{aligned}
g_1&:\ g_1^1 = 15 < 20 < 40 < 61 < 66 < 70 = g_1^{6};\\
g_2&:\ g_2^1 = 15 < 20 < 40 < 50 < 62 < 64.75 = g_2^{6};\\
g_3&:\ g_3^1 = 20 < 23.125 < 30 < 37 < 46.25 < 60 = g_3^{6}.
\end{aligned}
$$
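The rank orders above can be reproduced from the data of Table 1 by collecting the distinct values of each criterion and sorting them. A minimal sketch follows; the dictionary layout and the function name are illustrative.

```python
from typing import Dict, List, Sequence

# Evaluations of the six alternatives of Table 1 on (g1, g2, g3)
TABLE1: Dict[str, Sequence[float]] = {
    "a1": (70, 64.75, 46.25), "a2": (61, 62, 60), "a3": (40, 50, 37),
    "a4": (66, 40, 23.125), "a5": (20, 20, 20), "a6": (15, 15, 30),
}


def rank_orders(evaluations: Dict[str, Sequence[float]]) -> List[List[float]]:
    """For each criterion g_i, the distinct values g_i^1 < ... < g_i^{p_i}."""
    m = len(next(iter(evaluations.values())))
    return [sorted({ev[i] for ev in evaluations.values()}) for i in range(m)]


levels = rank_orders(TABLE1)
print([len(v) for v in levels])  # [6, 6, 6], i.e. p_1 = p_2 = p_3 = 6
```

Applied to $a_3, \ldots, a_6$ only, the same function yields the four-value rank orders used in the second stage.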


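According to relation (4), the following transformations are then applied (criterion $g_1$):

$$
\begin{aligned}
w_{1,12}^{C_1} &= u_1^{C_1}(20) - u_1^{C_1}(15),\\
w_{1,23}^{C_1} &= u_1^{C_1}(40) - u_1^{C_1}(20),\\
w_{1,34}^{C_1} &= u_1^{C_1}(61) - u_1^{C_1}(40),\\
w_{1,45}^{C_1} &= u_1^{C_1}(66) - u_1^{C_1}(61),\\
w_{1,56}^{C_1} &= u_1^{C_1}(70) - u_1^{C_1}(66).
\end{aligned}
$$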


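The same transformations are also applied to criteria $g_2$ and $g_3$. Then, according to (6), relation (7) can be rewritten in the following way:

$$
\begin{aligned}
U^{C_1}(a_1) ={}& (w_{1,12}^{C_1} + w_{1,23}^{C_1} + w_{1,34}^{C_1} + w_{1,45}^{C_1} + w_{1,56}^{C_1})\\
&+ (w_{2,12}^{C_1} + w_{2,23}^{C_1} + w_{2,34}^{C_1} + w_{2,45}^{C_1} + w_{2,56}^{C_1})\\
&+ (w_{3,12}^{C_1} + w_{3,23}^{C_1} + w_{3,34}^{C_1} + w_{3,45}^{C_1}).
\end{aligned}
$$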


Following the same methodology, the utilities con- 
cerning the classification of the rest of the alternatives 
are also formulated. 

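• Alternative $a_2$:

$$U^{C_1}(a_2) = u_1^{C_1}(61) + u_2^{C_1}(62) + u_3^{C_1}(60)$$
$$\Rightarrow U^{C_1}(a_2) = (w_{1,12}^{C_1} + w_{1,23}^{C_1} + w_{1,34}^{C_1}) + (w_{2,12}^{C_1} + w_{2,23}^{C_1} + w_{2,34}^{C_1} + w_{2,45}^{C_1}) + (w_{3,12}^{C_1} + w_{3,23}^{C_1} + w_{3,34}^{C_1} + w_{3,45}^{C_1} + w_{3,56}^{C_1}),$$
$$U^{\sim C_1}(a_2) = u_1^{\sim C_1}(61) + u_2^{\sim C_1}(62) + u_3^{\sim C_1}(60)$$
$$\Rightarrow U^{\sim C_1}(a_2) = (w_{1,45}^{\sim C_1} + w_{1,56}^{\sim C_1}) + (w_{2,56}^{\sim C_1}).$$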
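• Alternative $a_3$:

$$U^{C_1}(a_3) = u_1^{C_1}(40) + u_2^{C_1}(50) + u_3^{C_1}(37)$$
$$\Rightarrow U^{C_1}(a_3) = (w_{1,12}^{C_1} + w_{1,23}^{C_1}) + (w_{2,12}^{C_1} + w_{2,23}^{C_1} + w_{2,34}^{C_1}) + (w_{3,12}^{C_1} + w_{3,23}^{C_1} + w_{3,34}^{C_1}),$$
$$U^{\sim C_1}(a_3) = u_1^{\sim C_1}(40) + u_2^{\sim C_1}(50) + u_3^{\sim C_1}(37)$$
$$\Rightarrow U^{\sim C_1}(a_3) = (w_{1,34}^{\sim C_1} + w_{1,45}^{\sim C_1} + w_{1,56}^{\sim C_1}) + (w_{2,45}^{\sim C_1} + w_{2,56}^{\sim C_1}) + (w_{3,45}^{\sim C_1} + w_{3,56}^{\sim C_1}).$$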


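• Alternative $a_4$:

$$U^{C_1}(a_4) = u_1^{C_1}(66) + u_2^{C_1}(40) + u_3^{C_1}(23.125)$$
$$\Rightarrow U^{C_1}(a_4) = (w_{1,12}^{C_1} + w_{1,23}^{C_1} + w_{1,34}^{C_1} + w_{1,45}^{C_1}) + (w_{2,12}^{C_1} + w_{2,23}^{C_1}) + (w_{3,12}^{C_1}),$$
$$U^{\sim C_1}(a_4) = u_1^{\sim C_1}(66) + u_2^{\sim C_1}(40) + u_3^{\sim C_1}(23.125)$$
$$\Rightarrow U^{\sim C_1}(a_4) = (w_{1,56}^{\sim C_1}) + (w_{2,34}^{\sim C_1} + w_{2,45}^{\sim C_1} + w_{2,56}^{\sim C_1}) + (w_{3,23}^{\sim C_1} + w_{3,34}^{\sim C_1} + w_{3,45}^{\sim C_1} + w_{3,56}^{\sim C_1}).$$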


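• Alternative $a_5$:

$$U^{C_1}(a_5) = u_1^{C_1}(20) + u_2^{C_1}(20) + u_3^{C_1}(20)$$
$$\Rightarrow U^{C_1}(a_5) = (w_{1,12}^{C_1}) + (w_{2,12}^{C_1}),$$
$$U^{\sim C_1}(a_5) = u_1^{\sim C_1}(20) + u_2^{\sim C_1}(20) + u_3^{\sim C_1}(20)$$
$$\Rightarrow U^{\sim C_1}(a_5) = (w_{1,23}^{\sim C_1} + w_{1,34}^{\sim C_1} + w_{1,45}^{\sim C_1} + w_{1,56}^{\sim C_1}) + (w_{2,23}^{\sim C_1} + w_{2,34}^{\sim C_1} + w_{2,45}^{\sim C_1} + w_{2,56}^{\sim C_1}) + (w_{3,12}^{\sim C_1} + w_{3,23}^{\sim C_1} + w_{3,34}^{\sim C_1} + w_{3,45}^{\sim C_1} + w_{3,56}^{\sim C_1}).$$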


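• Alternative $a_6$:

$$U^{C_1}(a_6) = u_1^{C_1}(15) + u_2^{C_1}(15) + u_3^{C_1}(30)$$
$$\Rightarrow U^{C_1}(a_6) = (w_{3,12}^{C_1} + w_{3,23}^{C_1}),$$
$$U^{\sim C_1}(a_6) = u_1^{\sim C_1}(15) + u_2^{\sim C_1}(15) + u_3^{\sim C_1}(30)$$
$$\Rightarrow U^{\sim C_1}(a_6) = (w_{1,12}^{\sim C_1} + w_{1,23}^{\sim C_1} + w_{1,34}^{\sim C_1} + w_{1,45}^{\sim C_1} + w_{1,56}^{\sim C_1}) + (w_{2,12}^{\sim C_1} + w_{2,23}^{\sim C_1} + w_{2,34}^{\sim C_1} + w_{2,45}^{\sim C_1} + w_{2,56}^{\sim C_1}) + (w_{3,34}^{\sim C_1} + w_{3,45}^{\sim C_1} + w_{3,56}^{\sim C_1}).$$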
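The expressions above follow a single pattern: if $g_i(a) = g_i^j$, then $U^{C_1}(a)$ collects the first $j-1$ step variables of criterion $g_i$ and $U^{\sim C_1}(a)$ the remaining ones. The sketch below, with hypothetical names and pairs $(i, l)$ standing for $w_{i,l,l+1}$, generates these index sets from Table 1 and the stage-1 rank orders.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (i, l) stands for the step variable w_{i,l,l+1}

TABLE1: Dict[str, Sequence[float]] = {
    "a1": (70, 64.75, 46.25), "a2": (61, 62, 60), "a3": (40, 50, 37),
    "a4": (66, 40, 23.125), "a5": (20, 20, 20), "a6": (15, 15, 30),
}
LEVELS = [[15, 20, 40, 61, 66, 70],
          [15, 20, 40, 50, 62, 64.75],
          [20, 23.125, 30, 37, 46.25, 60]]


def utility_terms(evaluation: Sequence[float],
                  levels: List[List[float]]) -> Tuple[List[Pair], List[Pair]]:
    """Step variables entering U^{C}(a) and U^{~C}(a), following relation (6)."""
    plus: List[Pair] = []
    minus: List[Pair] = []
    for i, (x, lv) in enumerate(zip(evaluation, levels), start=1):
        j = lv.index(x) + 1                           # 1-based rank: x = g_i^j
        plus += [(i, l) for l in range(1, j)]         # l = 1, ..., j - 1
        minus += [(i, l) for l in range(j, len(lv))]  # l = j, ..., p_i - 1
    return plus, minus


for name, evaluation in TABLE1.items():
    plus, minus = utility_terms(evaluation, LEVELS)
    print(name, "U^C1:", plus, "U^~C1:", minus)
```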


According to these expressions of the global utility of the decision to classify an alternative into class $C_1$ or into one of the classes $C_2$ and $C_3$, the LP1 formulation is used to minimize the classification error ($s = 0.001$, $t = 0.0001$).


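$$
\begin{aligned}
\min\ & F = e(a_1) + e(a_2) + e(a_3) + e(a_4) + e(a_5) + e(a_6)\\
\text{s.t. } & U^{C_1}(a_1) - U^{\sim C_1}(a_1) + e(a_1) \ge 0.001\\
& U^{C_1}(a_2) - U^{\sim C_1}(a_2) + e(a_2) \ge 0.001\\
& U^{\sim C_1}(a_3) - U^{C_1}(a_3) + e(a_3) \ge 0.001\\
& U^{\sim C_1}(a_4) - U^{C_1}(a_4) + e(a_4) \ge 0.001\\
& U^{\sim C_1}(a_5) - U^{C_1}(a_5) + e(a_5) \ge 0.001\\
& U^{\sim C_1}(a_6) - U^{C_1}(a_6) + e(a_6) \ge 0.001\\
& w_{i,j,j+1}^{C_1} \ge 0.0001,\quad w_{i,j,j+1}^{\sim C_1} \ge 0.0001, \quad \forall i = 1, 2, 3,\ \forall j = 1, \ldots, 5\\
& \sum_{i=1}^{3} \sum_{j=1}^{5} w_{i,j,j+1}^{C_1} = 1,\quad \sum_{i=1}^{3} \sum_{j=1}^{5} w_{i,j,j+1}^{\sim C_1} = 1\\
& e(a_1), e(a_2), e(a_3), e(a_4), e(a_5), e(a_6) \ge 0.
\end{aligned}
$$

The obtained solution is presented in Table 2. According to this solution, the marginal utilities are calculated as follows.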
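Multicriteria Sorting Methods, Table 2
Results obtained through the solution of LP1

Variable      $w^{C_1}$   $w^{\sim C_1}$
$w_{1,12}$    0.00010     0.03708
$w_{1,23}$    0.00010     0.03708
$w_{1,34}$    0.09872     0.07406
$w_{1,45}$    0.00010     0.03708
$w_{1,56}$    0.09872     0.07406
$w_{2,12}$    0.00010     0.03708
$w_{2,23}$    0.00010     0.03708
$w_{2,34}$    0.09872     0.07406
$w_{2,45}$    0.13570     0.11104
$w_{2,56}$    0.09872     0.07406
$w_{3,12}$    0.00010     0.03708
$w_{3,23}$    0.09872     0.07406
$w_{3,34}$    0.09872     0.07406
$w_{3,45}$    0.13570     0.11104
$w_{3,56}$    0.13570     0.11104

• Criterion $g_1$:

$$
\begin{aligned}
u_1^{C_1}(15) &= 0,\\
u_1^{\sim C_1}(15) &= w_{1,12}^{\sim C_1} + w_{1,23}^{\sim C_1} + w_{1,34}^{\sim C_1} + w_{1,45}^{\sim C_1} + w_{1,56}^{\sim C_1} = 0.25937,\\
u_1^{C_1}(20) &= w_{1,12}^{C_1} = 0.0001,\\
u_1^{\sim C_1}(20) &= w_{1,23}^{\sim C_1} + w_{1,34}^{\sim C_1} + w_{1,45}^{\sim C_1} + w_{1,56}^{\sim C_1} = 0.22229,\\
u_1^{C_1}(40) &= w_{1,12}^{C_1} + w_{1,23}^{C_1} = 0.0002,\\
u_1^{\sim C_1}(40) &= w_{1,34}^{\sim C_1} + w_{1,45}^{\sim C_1} + w_{1,56}^{\sim C_1} = 0.18521,\\
u_1^{C_1}(61) &= w_{1,12}^{C_1} + w_{1,23}^{C_1} + w_{1,34}^{C_1} = 0.09892,\\
u_1^{\sim C_1}(61) &= w_{1,45}^{\sim C_1} + w_{1,56}^{\sim C_1} = 0.11114,\\
u_1^{C_1}(66) &= w_{1,12}^{C_1} + w_{1,23}^{C_1} + w_{1,34}^{C_1} + w_{1,45}^{C_1} = 0.09902,\\
u_1^{\sim C_1}(66) &= w_{1,56}^{\sim C_1} = 0.07406,\\
u_1^{C_1}(70) &= w_{1,12}^{C_1} + w_{1,23}^{C_1} + w_{1,34}^{C_1} + w_{1,45}^{C_1} + w_{1,56}^{C_1} = 0.19773,\\
u_1^{\sim C_1}(70) &= 0;
\end{aligned}
$$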


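• Criterion $g_2$:

$$
\begin{aligned}
u_2^{C_1}(15) &= 0,\\
u_2^{\sim C_1}(15) &= w_{2,12}^{\sim C_1} + w_{2,23}^{\sim C_1} + w_{2,34}^{\sim C_1} + w_{2,45}^{\sim C_1} + w_{2,56}^{\sim C_1} = 0.33333,\\
u_2^{C_1}(20) &= w_{2,12}^{C_1} = 0.0001,\\
u_2^{\sim C_1}(20) &= w_{2,23}^{\sim C_1} + w_{2,34}^{\sim C_1} + w_{2,45}^{\sim C_1} + w_{2,56}^{\sim C_1} = 0.29625,\\
u_2^{C_1}(40) &= w_{2,12}^{C_1} + w_{2,23}^{C_1} = 0.0002,\\
u_2^{\sim C_1}(40) &= w_{2,34}^{\sim C_1} + w_{2,45}^{\sim C_1} + w_{2,56}^{\sim C_1} = 0.25917,\\
u_2^{C_1}(50) &= w_{2,12}^{C_1} + w_{2,23}^{C_1} + w_{2,34}^{C_1} = 0.09892,\\
u_2^{\sim C_1}(50) &= w_{2,45}^{\sim C_1} + w_{2,56}^{\sim C_1} = 0.18511,\\
u_2^{C_1}(62) &= w_{2,12}^{C_1} + w_{2,23}^{C_1} + w_{2,34}^{C_1} + w_{2,45}^{C_1} = 0.23462,\\
u_2^{\sim C_1}(62) &= w_{2,56}^{\sim C_1} = 0.07406,\\
u_2^{C_1}(64.75) &= w_{2,12}^{C_1} + w_{2,23}^{C_1} + w_{2,34}^{C_1} + w_{2,45}^{C_1} + w_{2,56}^{C_1} = 0.33333,\\
u_2^{\sim C_1}(64.75) &= 0;
\end{aligned}
$$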


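• Criterion $g_3$:

$$
\begin{aligned}
u_3^{C_1}(20) &= 0,\\
u_3^{\sim C_1}(20) &= w_{3,12}^{\sim C_1} + w_{3,23}^{\sim C_1} + w_{3,34}^{\sim C_1} + w_{3,45}^{\sim C_1} + w_{3,56}^{\sim C_1} = 0.40730,\\
u_3^{C_1}(23.125) &= w_{3,12}^{C_1} = 0.0001,\\
u_3^{\sim C_1}(23.125) &= w_{3,23}^{\sim C_1} + w_{3,34}^{\sim C_1} + w_{3,45}^{\sim C_1} + w_{3,56}^{\sim C_1} = 0.37021,\\
u_3^{C_1}(30) &= w_{3,12}^{C_1} + w_{3,23}^{C_1} = 0.09882,\\
u_3^{\sim C_1}(30) &= w_{3,34}^{\sim C_1} + w_{3,45}^{\sim C_1} + w_{3,56}^{\sim C_1} = 0.29615,\\
u_3^{C_1}(37) &= w_{3,12}^{C_1} + w_{3,23}^{C_1} + w_{3,34}^{C_1} = 0.19753,\\
u_3^{\sim C_1}(37) &= w_{3,45}^{\sim C_1} + w_{3,56}^{\sim C_1} = 0.22209,\\
u_3^{C_1}(46.25) &= w_{3,12}^{C_1} + w_{3,23}^{C_1} + w_{3,34}^{C_1} + w_{3,45}^{C_1} = 0.33323,\\
u_3^{\sim C_1}(46.25) &= w_{3,56}^{\sim C_1} = 0.11104,\\
u_3^{C_1}(60) &= w_{3,12}^{C_1} + w_{3,23}^{C_1} + w_{3,34}^{C_1} + w_{3,45}^{C_1} + w_{3,56}^{C_1} = 0.46893,\\
u_3^{\sim C_1}(60) &= 0.
\end{aligned}
$$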


Multicriteria Sorting Methods, Table 3
Global utilities obtained through the solution of LP1 (stage 1)

Alternative   $U^{C_1}(a)$   $U^{\sim C_1}(a)$
$a_1$         0.8643         0.1110
$a_2$         0.8025         0.1852
$a_3$         0.2967         0.5924
$a_4$         0.0993         0.7034
$a_5$         0.0002         0.9258
$a_6$         0.0988         0.8889


According to these marginal utilities, the global utilities are calculated based on the expressions that have already been presented. Table 3 illustrates the obtained global utilities according to the two utility functions that were developed.

It is clear that $a_1$ and $a_2$ are classified in class $C_1$, since the global utility of a decision concerning the classification of these two alternatives in class $C_1$ is greater than the utility concerning their classification in classes $C_2$ or $C_3$. Similarly, alternatives $a_3$, $a_4$, $a_5$ and $a_6$ are not classified in class $C_1$; instead, they belong in one of the classes $C_2$ or $C_3$ (their specific classification will be determined in the next stage of the hierarchical discrimination process).
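As a consistency check, the global utilities of Table 3 can be recomputed from the step variables of Table 2 through relation (6), and classification rule (1) can then be applied. In the sketch below the Table 2 values are entered as they appear in that table; because they are rounded to five decimals, the recomputed utilities may differ from Table 3 in the fourth decimal.

```python
from typing import Sequence, Tuple

# Table 2: step variables w_{i,12}, ..., w_{i,56} for criteria i = 1, 2, 3
W_C1 = [[0.00010, 0.00010, 0.09872, 0.00010, 0.09872],
        [0.00010, 0.00010, 0.09872, 0.13570, 0.09872],
        [0.00010, 0.09872, 0.09872, 0.13570, 0.13570]]
W_NOT_C1 = [[0.03708, 0.03708, 0.07406, 0.03708, 0.07406],
            [0.03708, 0.03708, 0.07406, 0.11104, 0.07406],
            [0.03708, 0.07406, 0.07406, 0.11104, 0.11104]]
# Stage-1 rank orders g_i^1 < ... < g_i^6 and the evaluations of Table 1
LEVELS = [[15, 20, 40, 61, 66, 70],
          [15, 20, 40, 50, 62, 64.75],
          [20, 23.125, 30, 37, 46.25, 60]]
TABLE1 = {"a1": (70, 64.75, 46.25), "a2": (61, 62, 60), "a3": (40, 50, 37),
          "a4": (66, 40, 23.125), "a5": (20, 20, 20), "a6": (15, 15, 30)}


def global_utilities(evaluation: Sequence[float]) -> Tuple[float, float]:
    """Return (U^{C_1}(a), U^{~C_1}(a)) using relation (6)."""
    u_c = u_not_c = 0.0
    for x, levels, w_c, w_n in zip(evaluation, LEVELS, W_C1, W_NOT_C1):
        j = levels.index(x)       # 0-based rank, i.e. x = g_i^{j+1}
        u_c += sum(w_c[:j])       # w_{i,1,2} + ... + w_{i,j,j+1}
        u_not_c += sum(w_n[j:])   # w_{i,j+1,j+2} + ... + w_{i,5,6}
    return u_c, u_not_c


for name, evaluation in TABLE1.items():
    u_c, u_not_c = global_utilities(evaluation)
    # Classification rule (1): a is assigned to C_1 when U^{C_1}(a) > U^{~C_1}(a)
    print(f"{name}: U^C1={u_c:.4f}  U^~C1={u_not_c:.4f}  in C1: {u_c > u_not_c}")
```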

Since the correct discrimination between the alternatives belonging in class $C_1$ and the alternatives not belonging in this class has been achieved through LP1, it is not necessary to proceed to LP2 (minimization of the number of misclassifications). Hence, the procedure continues with the formulation and solution of LP3 in order to achieve the highest possible discrimination:


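$$
\begin{aligned}
\max\ & d\\
\text{s.t. } & U^{C_1}(a_1) - U^{\sim C_1}(a_1) - d \ge 0.001\\
& U^{C_1}(a_2) - U^{\sim C_1}(a_2) - d \ge 0.001\\
& U^{\sim C_1}(a_3) - U^{C_1}(a_3) - d \ge 0.001\\
& U^{\sim C_1}(a_4) - U^{C_1}(a_4) - d \ge 0.001\\
& U^{\sim C_1}(a_5) - U^{C_1}(a_5) - d \ge 0.001\\
& U^{\sim C_1}(a_6) - U^{C_1}(a_6) - d \ge 0.001\\
& w_{i,j,j+1}^{C_1} \ge 0.0001,\quad w_{i,j,j+1}^{\sim C_1} \ge 0.0001, \quad \forall i = 1, 2, 3,\ \forall j = 1, \ldots, 5\\
& \sum_{i=1}^{3} \sum_{j=1}^{5} w_{i,j,j+1}^{C_1} = 1,\quad \sum_{i=1}^{3} \sum_{j=1}^{5} w_{i,j,j+1}^{\sim C_1} = 1\\
& d \ge 0.
\end{aligned}
$$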


According to the obtained solution and following the same procedure for calculating the marginal utilities, the global utilities of Table 4 are obtained. Obviously, this new solution provides a better discrimination of the alternatives, compared to the initial solution obtained by LP1.

Multicriteria Sorting Methods, Table 4
Global utilities obtained through the solution of LP3 (stage 1)

Alternative   $U^{C_1}(a)$   $U^{\sim C_1}(a)$
$a_1$         0.9985         0.0001
$a_2$         0.9987         0.0003
$a_3$         0.0008         0.9992
$a_4$         0.0009         0.9993
$a_5$         0.0002         0.9998
$a_6$         0.0002         0.9998
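The improvement brought by LP3 can be quantified by the smallest margin in favour of the correct decision, which is the quantity LP3 pushes up (at the optimum, $d$ equals this minimum minus $s$). The sketch below computes it from the rounded values of Tables 3 and 4; the function name is illustrative.

```python
from typing import Dict, Set, Tuple

# (U^{C_1}(a), U^{~C_1}(a)) from Table 3 (LP1) and Table 4 (LP3), stage 1
TABLE3 = {"a1": (0.8643, 0.1110), "a2": (0.8025, 0.1852), "a3": (0.2967, 0.5924),
          "a4": (0.0993, 0.7034), "a5": (0.0002, 0.9258), "a6": (0.0988, 0.8889)}
TABLE4 = {"a1": (0.9985, 0.0001), "a2": (0.9987, 0.0003), "a3": (0.0008, 0.9992),
          "a4": (0.0009, 0.9993), "a5": (0.0002, 0.9998), "a6": (0.0002, 0.9998)}
IN_C1 = {"a1", "a2"}


def min_margin(table: Dict[str, Tuple[float, float]], in_class: Set[str]) -> float:
    """Smallest gap between the utility of the correct decision and the other one."""
    return min(u_c - u_n if a in in_class else u_n - u_c
               for a, (u_c, u_n) in table.items())


print(round(min_margin(TABLE3, IN_C1), 4))  # 0.2957 (attained by a3)
print(round(min_margin(TABLE4, IN_C1), 4))  # 0.9984
```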


Distinguishing Between $C_2$ and $C_3$


After the solution of LP3, the first stage of the hierarchical discrimination process is completed, with the correct classification of $a_1$ and $a_2$ in class $C_1$. Consequently, these two alternatives are excluded from further consideration. In the second stage, the aim is to determine the specific classification of the alternatives $a_3$, $a_4$, $a_5$ and $a_6$. The following rank-order is defined on the scale of the three evaluation criteria ($p_1 = p_2 = p_3 = 4$):


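$$
\begin{aligned}
g_1&:\ g_1^1 = 15 < 20 < 40 < 66 = g_1^{4};\\
g_2&:\ g_2^1 = 15 < 20 < 40 < 50 = g_2^{4};\\
g_3&:\ g_3^1 = 20 < 23.125 < 30 < 37 = g_3^{4}.
\end{aligned}
$$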


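Then, following the procedure illustrated in the previous stage, the variables $w_{i,j,j+1}^{C_2}$ and $w_{i,j,j+1}^{\sim C_2}$ are formulated, and the new form of the LP1 problem is the following ($s = 0.001$, $t = 0.0001$):

$$
\begin{aligned}
\min\ & F = e(a_3) + e(a_4) + e(a_5) + e(a_6)\\
\text{s.t. } & U^{C_2}(a_3) - U^{\sim C_2}(a_3) + e(a_3) \ge 0.001\\
& U^{C_2}(a_4) - U^{\sim C_2}(a_4) + e(a_4) \ge 0.001\\
& U^{\sim C_2}(a_5) - U^{C_2}(a_5) + e(a_5) \ge 0.001\\
& U^{\sim C_2}(a_6) - U^{C_2}(a_6) + e(a_6) \ge 0.001\\
& w_{i,j,j+1}^{C_2} \ge 0.0001,\quad w_{i,j,j+1}^{\sim C_2} \ge 0.0001, \quad \forall i = 1, 2, 3,\ \forall j = 1, 2, 3\\
& \sum_{i=1}^{3} \sum_{j=1}^{3} w_{i,j,j+1}^{C_2} = 1,\quad \sum_{i=1}^{3} \sum_{j=1}^{3} w_{i,j,j+1}^{\sim C_2} = 1\\
& e(a_3), e(a_4), e(a_5), e(a_6) \ge 0.
\end{aligned}
$$

Multicriteria Sorting Methods, Table 5
Global utilities obtained through the solution of LP1 (stage 2)

Alternative   $U^{C_2}(a)$   $U^{\sim C_2}(a)$
$a_3$         0.8944         0.1000
$a_4$         0.7333         0.2501
$a_5$         0.2111         0.8000
$a_6$         0.1612         0.7500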


Table 5 presents the global utilities of the alterna- 
tives according to the solution obtained by LP1 in this 
second stage. 

The alternatives are correctly classified in their original classes, and therefore it is not necessary to proceed with LP2 (similarly to the first stage). Instead, the method proceeds to solve LP3 to achieve a better discrimination of the alternatives.


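$$
\begin{aligned}
\max\ & d\\
\text{s.t. } & U^{C_2}(a_3) - U^{\sim C_2}(a_3) - d \ge 0.001\\
& U^{C_2}(a_4) - U^{\sim C_2}(a_4) - d \ge 0.001\\
& U^{\sim C_2}(a_5) - U^{C_2}(a_5) - d \ge 0.001\\
& U^{\sim C_2}(a_6) - U^{C_2}(a_6) - d \ge 0.001\\
& w_{i,j,j+1}^{C_2} \ge 0.0001,\quad w_{i,j,j+1}^{\sim C_2} \ge 0.0001, \quad \forall i = 1, 2, 3,\ \forall j = 1, 2, 3\\
& \sum_{i=1}^{3} \sum_{j=1}^{3} w_{i,j,j+1}^{C_2} = 1,\quad \sum_{i=1}^{3} \sum_{j=1}^{3} w_{i,j,j+1}^{\sim C_2} = 1\\
& d \ge 0.
\end{aligned}
$$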


Table 6 presents the global utilities calculated according to the solution of LP3.

Multicriteria Sorting Methods, Table 6
Global utilities obtained through the solution of LP3 (stage 2)

Alternative   $U^{C_2}(a)$   $U^{\sim C_2}(a)$
$a_3$         0.9999         0.0005
$a_4$         0.9997         0.0003
$a_5$         0.0002         0.9996
$a_6$         0.0005         0.7949

At this point the hierarchical discrimination procedure ends, since all the alternatives have been classified into the three predefined classes. Moreover, this classification is correct. In particular, in stage 1 $a_1$ and $a_2$ have been correctly classified in class $C_1$, while in stage 2 $a_3$ and $a_4$ have been correctly classified in class $C_2$, and $a_5$ and $a_6$ have been classified into the final class $C_3$ (cf. Table 6).
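Putting the two stages together, the hierarchical procedure of Fig. 1 with rule (1) at each stage reproduces the predefined classification of Table 1. The sketch below uses the LP3 utilities of Tables 4 and 6; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Utilities = Dict[str, Tuple[float, float]]

# (U^{C_k}(a), U^{~C_k}(a)) from LP3: Table 4 (stage 1, k = 1) and Table 6 (stage 2, k = 2)
STAGE1: Utilities = {"a1": (0.9985, 0.0001), "a2": (0.9987, 0.0003),
                     "a3": (0.0008, 0.9992), "a4": (0.0009, 0.9993),
                     "a5": (0.0002, 0.9998), "a6": (0.0002, 0.9998)}
STAGE2: Utilities = {"a3": (0.9999, 0.0005), "a4": (0.9997, 0.0003),
                     "a5": (0.0002, 0.9996), "a6": (0.0005, 0.7949)}
PREDEFINED = {"a1": 1, "a2": 1, "a3": 2, "a4": 2, "a5": 3, "a6": 3}  # Table 1


def hierarchical_assignment(stages: List[Utilities]) -> Dict[str, int]:
    """Apply rule (1) stage by stage; alternatives never accepted go to the last class."""
    assignment: Dict[str, int] = {}
    for k, utilities in enumerate(stages, start=1):
        for a, (u_ck, u_not_ck) in utilities.items():
            if a not in assignment and u_ck > u_not_ck:
                assignment[a] = k        # a is placed in C_k and excluded afterwards
    for a in stages[0]:
        assignment.setdefault(a, len(stages) + 1)
    return assignment


result = hierarchical_assignment([STAGE1, STAGE2])
print(result == PREDEFINED)  # True
```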


Concluding Remarks and Future Perspectives 


The focal point of interest in this article was the application of MCDA in the study of sorting or, more generally, discrimination (classification) problems. Such types of problems have major practical interest in several fields including finance, environmental and energy policy and planning, marketing, medical diagnosis, robotics (pattern recognition), etc. Multivariate statistical classification techniques have been used for decades to study such problems. However, their inability to provide a realistic and flexible approach to support real world decision making in situations where classification is required led operational researchers, management scientists as well as practitioners towards the exploitation of the recent advances in the fields of operations research, management science, and artificial intelligence.

Among these ‘alternative’ approaches for the study of classification problems, MCDA provides an arsenal of tools and methods to develop classification (sorting) models within a realistic and flexible context. This article outlined the main MCDA classification techniques, both in terms of the specific type of classification problems that they address (ordered or non-ordered classes) and in terms of the MCDA approach that they employ (goal programming, outranking relations, preference disaggregation).

Furthermore, a new MCDA approach has been proposed. The M.H.DIS method extends the common two-group classification framework through a hierarchical multigroup discrimination procedure, taking into account three main discrimination criteria through a sequential process. In this way the classification problem is studied globally, in order to achieve the highest possible classification accuracy. Besides the illustrative example used in this paper, the M.H.DIS method has already been used in several financial classification problems, including the evaluation of bankruptcy risk, portfolio selection and management, the evaluation of bank branch efficiency, the assessment of country risk, company mergers and acquisitions, etc. [43], providing very encouraging results compared to well known statistical techniques (discriminant analysis, logit and probit analysis) and MCDA preference disaggregation techniques (family of UTADIS methods).

An interesting further research direction would be 
the exploration of a possible combination of M.H.DIS 
with artificial intelligence techniques such as fuzzy sets, 
in order to consider the fuzziness which may exist on 
the evaluation of alternatives on each evaluation crite- 
rion, or on the classification of the alternatives. 
Introduction


The multidimensional assignment problem (MAP) can be viewed as a higher-dimensional extension of the linear assignment problem (LAP). The LAP is often explained as assigning each person in a group a specific job, so that each job is done by exactly one person and each person is assigned exactly one job. The MAP generalizes this two-dimensional (people, jobs) LAP by allowing additional dimensions (space, time, etc.). Hence, the previous example of scheduling people to jobs can be extended to scheduling people to jobs at various time intervals in different locations, so that each specific parameter (say, a time interval) is coupled with its own unique three other parameters (person, job, location), and none of them appears in any other assignment (of a person, a job, a time slot and a location). Such a modified assignment problem is an example of a MAP in four dimensions.

Obviously, the LAP is a special case of the MAP in 
two dimensions. On the other hand, the MAP (some- 
times referred to as multi-index assignment problem) 
is a special case of the multi-index transportation prob- 
lem, just like the LAP is a particular instance of the 
more general transportation problem. 

Interestingly, a broader class of multidimensional transportation problems was originally considered about a decade before the LAP was first given its multidimensional generalization. In fact, a three-dimensional case of the multi-index transportation problem was first introduced by Schell in 1955 [33], and later studied by Haley [19] in 1963. The MAP was initially presented by Pierskalla [26] in 1966, who first extended the LAP to its three-dimensional case and then (in 1968) gave a general formulation of the MAP in $n$ dimensions [27].

Despite the fact that the LAP can be solved in polynomial time, the MAP of dimensionality $d \ge 3$ is known to be NP-hard in general (this follows from a reduction from the matching problem in three dimensions) [16]. In fact, the size of the MAP increases extremely fast with the number of dimensions. To be more precise, the number of feasible solutions, $(n!)^{d-1}$ for n elements per dimension, grows as a product of factorials. As a result of the inherent complexity of the problem, only small to medium-sized instances of the MAP can be solved routinely at the moment. Most of the exact and heuristic algorithms developed for this problem are enumerative in nature and/or utilize some form of local neighborhood search. Although many real-life applications of the MAP, including the data association problem in target tracking, require solving general problems of dimension higher than three, most of the proposed solution methods deal with the widely studied three-dimensional versions, such as the axial 3-MAP and the planar 3-MAP.
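To make this growth concrete, the following Python sketch (the article itself contains no code; the function names are hypothetical) counts feasible d-dimensional assignments with n elements per dimension in two ways. The first uses the closed form $(n!)^{d-1}$, which holds because each of the last d − 1 coordinates is a permutation of the n values. The second enumerates, for tiny instances only, all n-element sets of d-tuples in which every coordinate takes each value exactly once.

```python
from itertools import combinations, product
from math import factorial


def num_feasible(n: int, d: int) -> int:
    """Closed form: one permutation of the n values for each of the last d - 1 coordinates."""
    return factorial(n) ** (d - 1)


def count_by_enumeration(n: int, d: int) -> int:
    """Count n-element sets of d-tuples whose coordinates each take every value once.

    Only usable for very small n and d: it inspects C(n**d, n) candidate sets.
    """
    tuples = list(product(range(n), repeat=d))
    count = 0
    for chosen in combinations(tuples, n):
        if all(len({t[k] for t in chosen}) == n for k in range(d)):
            count += 1
    return count


if __name__ == "__main__":
    for n, d in [(2, 3), (3, 3), (2, 4)]:
        print(n, d, num_feasible(n, d), count_by_enumeration(n, d))
    print(num_feasible(10, 3))  # (10!)^2, already more than 10**13
```

Even a modest instance with n = 10 and d = 3 has about $1.3 \times 10^{13}$ feasible solutions, which is why plain enumeration is hopeless beyond toy sizes.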


Formulation 


Several alternative formulations of the MAP have been 
given since Pierskalla introduced it as a 0-1 integer pro- 
gramming problem as follows. 

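Given a finite sequence of positive integers $1 \le p_1 \le p_2 \le \cdots \le p_d$, we want to

$$
\begin{aligned}
\text{minimize}\quad & \sum_{i_1=1}^{p_1}\cdots\sum_{i_d=1}^{p_d} c_{i_1\cdots i_d}\, x_{i_1\cdots i_d}\\
\text{subject to}\quad & \sum_{i_2=1}^{p_2}\cdots\sum_{i_d=1}^{p_d} x_{i_1\cdots i_d} = 1, && 1\le i_1\le p_1,\\
& \sum_{i_1=1}^{p_1}\cdots\sum_{i_{k-1}=1}^{p_{k-1}}\ \sum_{i_{k+1}=1}^{p_{k+1}}\cdots\sum_{i_d=1}^{p_d} x_{i_1\cdots i_d} \le 1, && 1\le i_k\le p_k,\ 2\le k\le d-1,\\
& \sum_{i_1=1}^{p_1}\cdots\sum_{i_{d-1}=1}^{p_{d-1}} x_{i_1\cdots i_d} \le 1, && 1\le i_d\le p_d,\\
& x_{i_1\cdots i_d}\in\{0,1\}, && 1\le i_k\le p_k,\ 1\le k\le d,
\end{aligned}
\tag{1}
$$

where $c_{i_1\cdots i_d}$ are the cost coefficients.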

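By introducing dummy variables, we can assume without loss of generality that $p_1 = \cdots = p_d = n$; then the d-dimensional assignment problem can be reformulated as follows:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{i_1=1}^{n}\cdots\sum_{i_d=1}^{n} c_{i_1\cdots i_d}\, x_{i_1\cdots i_d}\\
\text{subject to}\quad & \sum_{i_2=1}^{n}\cdots\sum_{i_d=1}^{n} x_{i_1\cdots i_d} = 1, && 1\le i_1\le n,\\
& \sum_{i_1=1}^{n}\cdots\sum_{i_{k-1}=1}^{n}\ \sum_{i_{k+1}=1}^{n}\cdots\sum_{i_d=1}^{n} x_{i_1\cdots i_d} = 1, && 1\le i_k\le n,\ 2\le k\le d-1,\\
& \sum_{i_1=1}^{n}\cdots\sum_{i_{d-1}=1}^{n} x_{i_1\cdots i_d} = 1, && 1\le i_d\le n,\\
& x_{i_1\cdots i_d}\in\{0,1\}, && 1\le i_k\le n,\ 1\le k\le d.
\end{aligned}
\tag{2}
$$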


The MAP (2) also has an interesting interpretation as a problem of combinatorial optimization:

Given a d-dimensional cubic matrix, one must permute its indices along d − 1 of its dimensions (the analogue of permuting the rows and columns of a square matrix) so that the sum of the diagonal elements is minimum. In other words, this is an equivalent characterization of (2) in terms of d − 1 permutations $\pi_1, \pi_2, \dots, \pi_{d-1}$ of the set $\{1, 2, \dots, n\}$:

$$
\text{minimize}\quad \sum_{i=1}^{n} c_{i\,\pi_1(i)\cdots\pi_{d-1}(i)}
\qquad \text{subject to}\quad \pi_1, \pi_2, \dots, \pi_{d-1} \in \Pi^n,
\tag{3}
$$

where $\Pi^n$ is the set of all permutations of $\{1, 2, \dots, n\}$.
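A direct reading of formulation (3) gives an exhaustive solver for very small instances. The Python sketch below is illustrative only (the name `solve_map_bruteforce` and the dictionary cost representation are assumptions); it uses 0-based indices and enumerates all $(n!)^{d-1}$ choices of the permutations $\pi_1, \dots, \pi_{d-1}$.

```python
from itertools import permutations, product


def solve_map_bruteforce(cost, n, d):
    """Exhaustively solve formulation (3) for tiny instances.

    cost maps each 0-based d-tuple (i_1, ..., i_d) to its cost.
    Returns (optimal value, tuple of the d - 1 optimal permutations).
    """
    perms = list(permutations(range(n)))
    best_val, best_perms = None, None
    for combo in product(perms, repeat=d - 1):
        # Diagonal entry i is the tuple (i, pi_1(i), ..., pi_{d-1}(i)).
        val = sum(cost[(i,) + tuple(pi[i] for pi in combo)] for i in range(n))
        if best_val is None or val < best_val:
            best_val, best_perms = val, combo
    return best_val, best_perms


if __name__ == "__main__":
    n, d = 3, 3
    planted = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # a zero-cost assignment
    cost = {t: (0 if t in planted else 1) for t in product(range(n), repeat=d)}
    print(solve_map_bruteforce(cost, n, d))  # → (0, ((1, 2, 0), (2, 0, 1)))
```

Because the first index is kept in its natural order, each feasible assignment is visited exactly once.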

Spieksma [34] gives an alternative compact formulation of the MAP as follows:

Given d sets $A_1, A_2, \dots, A_d$, each of size n, let $A = \bigotimes_{i=1}^{d} A_i = A_1 \times A_2 \times \cdots \times A_d$. In other words, A is the set of all d-tuples $a = (a(1), a(2), \dots, a(d))$. Let $x_a$ denote a variable for each $a \in A$. Then, given assignment costs $c_a$ for all $a \in A$, the objective function is written as $\sum_{a \in A} c_a x_a$.

Given a positive integer k such that $1 \le k \le d-1$, let Q denote the set of all $(d-k)$-element subsets of $\{1, 2, \dots, d\}$. Each subset F from Q corresponds to the set of “fixed” indices. Given such F, let $A_F = \bigotimes_{f \in F} A_f$. Next, given some $g \in A_F$, let $A(F, g) = \{a \in A \mid a(f) = g(f),\ \forall f \in F\}$ denote the set of all d-tuples that coincide with g on the set F of “fixed” indices. Then the multi-index assignment problem can be written as:

$$
\begin{aligned}
\text{maximize/minimize}\quad & \sum_{a \in A} c_a x_a\\
\text{subject to}\quad & \sum_{a \in A(F, g)} x_a = 1 && \text{for all } g \in A_F,\ F \in Q,\\
& x_a \in \{0, 1\} && \text{for all } a \in A.
\end{aligned}
\tag{4}
$$
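The role of k in formulation (4) can be checked mechanically. The Python sketch below (function name hypothetical; each $A_i$ is taken to be {0, ..., n−1}) tests whether a set of tuples with $x_a = 1$ satisfies every equality constraint of (4). Since F fixes d − k coordinates and the sum runs over the remaining k, for d = 3 the value k = 2 yields the axial constraints and k = 1 the planar ones discussed in the next section.

```python
from itertools import combinations, product


def satisfies_formulation_4(chosen, n, d, k):
    """Check the equality constraints of (4) for the set of tuples with x_a = 1.

    F ranges over the (d - k)-element subsets of coordinate positions and g over
    all value assignments to those positions; each A(F, g) must contain
    exactly one chosen tuple.
    """
    chosen = set(chosen)
    for F in combinations(range(d), d - k):
        for g in product(range(n), repeat=d - k):
            hits = sum(1 for a in chosen if all(a[f] == v for f, v in zip(F, g)))
            if hits != 1:
                return False
    return True


if __name__ == "__main__":
    axial = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}                       # n tuples
    latin = {(i, j, (i + j) % 3) for i in range(3) for j in range(3)}  # n^2 tuples
    print(satisfies_formulation_4(axial, 3, 3, k=2))  # → True
    print(satisfies_formulation_4(latin, 3, 3, k=1))  # → True
```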


Similarly to the formulation of the linear assignment problem by means of a bipartite graph, the MAP can also be stated in graph-theoretic terms as follows [7]:

Given a complete d-partite graph $G = (V_1, V_2, \dots, V_d; E)$, where $V_i$, $|V_i| = n$, $i \in \{1, 2, \dots, d\}$, denote mutually disjoint vertex sets and E is the set of edges in the graph, a subset of the vertex set $V = \bigcup_{i=1}^{d} V_i$ is said to be a clique if it meets every set $V_i$ in exactly one vertex. A d-dimensional assignment is a partition of V into n pairwise disjoint cliques. Given a real-valued cost function c defined on the set of cliques of the d-partite graph G, the d-dimensional assignment problem asks for a d-dimensional assignment that minimizes the total cost c of its cliques.


Cases 


A special case of the MAP based on the graph-theoretic formulation was considered by Bandelt et al. [6]. In this case the cost of a clique is a function of elementary costs defined on the edges of the d-partite graph, whereas the general graph formulation of the MAP allows the cost function to be defined arbitrarily on the set of cliques. In particular, clique costs can be decomposed into edge costs in several ways: a sum cost (i.e., the sum of the lengths of all the edges in a given clique), a tour cost (i.e., the minimum cost of a traveling salesman tour through the clique), a star cost (i.e., the minimum length of a spanning star of the clique), and a tree cost (i.e., the minimum cost of a spanning tree of the clique). Using such decomposed costs, one can bound the worst-case ratio between the cost of the solution found by a simple heuristic and the cost of the optimal solution. Specifically, Crama and Spieksma [10] considered a three-dimensional assignment problem in which the edge lengths of the underlying three-partite graph satisfy the triangle inequality, and the cost of each triple is that of the triangle formed by its three vertices (one from each of the three vertex sets). When the triangle cost is defined as its length (i.e., the sum of the lengths of all its sides), there exists a heuristic that gives a feasible solution within a factor of 3/2 of the optimum. This bound decreases to 4/3 when the triangle cost is defined as the sum of its two shortest sides.
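The four decomposable clique costs can be computed from the edge lengths of a single clique, as in the Python sketch below (the function name and the matrix input are assumptions; the tour is found by brute force, which is fine for the d vertices of one clique). For a triangle, the sum and tour costs both equal the perimeter, while the star and tree costs both equal the sum of the two shortest sides. These are exactly the two triangle costs in the Crama and Spieksma result.

```python
from itertools import permutations


def clique_costs(dist):
    """Return the sum, star, tree and tour costs of one clique.

    dist is a symmetric matrix of edge lengths among the clique's m vertices
    (one vertex from each part of the d-partite graph, so m = d).
    """
    m = len(dist)
    sum_cost = sum(dist[u][v] for u in range(m) for v in range(u + 1, m))
    # Star: choose the centre that minimizes the total length of its spokes.
    star_cost = min(sum(dist[c][v] for v in range(m) if v != c) for c in range(m))
    # Tree: minimum spanning tree grown with Prim's algorithm from vertex 0.
    in_tree, tree_cost = {0}, 0
    while len(in_tree) < m:
        w, nxt = min((dist[u][v], v) for u in in_tree for v in range(m) if v not in in_tree)
        in_tree.add(nxt)
        tree_cost += w
    # Tour: shortest Hamiltonian cycle, enumerating orders of vertices 1..m-1.
    tour_cost = min(
        sum(dist[a][b] for a, b in zip((0,) + order, order + (0,)))
        for order in permutations(range(1, m))
    )
    return {"sum": sum_cost, "star": star_cost, "tree": tree_cost, "tour": tour_cost}


if __name__ == "__main__":
    # Triangle with sides 3, 4, 5; vertex 1 joins the sides of length 3 and 4.
    tri = [[0, 3, 5], [3, 0, 4], [5, 4, 0]]
    print(clique_costs(tri))  # → {'sum': 12, 'star': 7, 'tree': 7, 'tour': 12}
```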

As mentioned earlier, owing to the exponential in- 
crease in the size of the problem with an increase in the 
number of parameters, it becomes computationally dif- 
ficult to solve MAP instances of higher dimensionality. 
As a result, most solution methods for the MAP are constructed for three-dimensional versions of the problem. Two important types of the three-dimensional assignment problem are the axial three-dimensional assignment problem and the planar three-dimensional assignment problem. The distinction between the two types lies in their constraints and can easily be explained using the following simple geometric interpretation [7].

Let each solution be represented by a three-dimensional 0-1 array of size $n \times n \times n$. To visualize such an array of zeros and ones, let us fix a vertex and draw lines, or axes, along the three dimensions. Next, we partition each axis into n intervals. This partition splits the array into $n^3$ cells, each containing either a 0 or a 1. Given an axis, say j, each of the n intervals on j has a corresponding two-dimensional level surface that consists of $n \times n$ cells and passes through that interval of j. In other words, the interval partition of each axis divides the three-dimensional solution array into n two-dimensional surfaces, or “slices”, one for each interval on the axis. The constraints imposed in the axial case guarantee that, for each axis and each of its intervals, the $n \times n$ cells in the slice through that interval sum to 1. Equivalently, the cells that project onto any given axial interval sum to 1. This explains the name “axial.”

In contrast, the constraints of the planar MAP deal with the three planes formed by each possible pair of axes. For example, consider the plane formed by axes j and k. Using the above partition, this plane is divided into $n \times n$ squares. For each square on the plane there is a corresponding stack of cells along the i axis, and each cell in the stack is projected onto its square $(j^*, k^*)$ along axis i. The planar constraints require that, for each plane and every square, the cells in the associated stack sum to 1.
Integer programming formulations for each type of three-index MAP are given below.
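Given a set of $n^3$ cost coefficients $c_{ijk}$, the axial three-dimensional assignment problem is defined as follows:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} c_{ijk}\, x_{ijk}\\
\text{subject to}\quad & \sum_{j=1}^{n}\sum_{k=1}^{n} x_{ijk} = 1, && 1\le i\le n,\\
& \sum_{i=1}^{n}\sum_{k=1}^{n} x_{ijk} = 1, && 1\le j\le n,\\
& \sum_{i=1}^{n}\sum_{j=1}^{n} x_{ijk} = 1, && 1\le k\le n,\\
& x_{ijk} \in \{0, 1\}, && 1\le i,j,k\le n.
\end{aligned}
\tag{5}
$$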


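Given $n^3$ cost coefficients $c_{ijk}$, the planar three-dimensional assignment problem can be written in the following fashion:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} c_{ijk}\, x_{ijk}\\
\text{subject to}\quad & \sum_{i=1}^{n} x_{ijk} = 1, && 1\le j,k\le n,\\
& \sum_{j=1}^{n} x_{ijk} = 1, && 1\le i,k\le n,\\
& \sum_{k=1}^{n} x_{ijk} = 1, && 1\le i,j\le n,\\
& x_{ijk} \in \{0, 1\}, && 1\le i,j,k\le n.
\end{aligned}
\tag{6}
$$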


The axial three-index MAP given by (5) can also be formulated as a combinatorial optimization problem over two permutations σ and π of $\{1, \dots, n\}$:

$$
\text{minimize}\quad \sum_{i=1}^{n} c_{i\,\sigma(i)\,\pi(i)}
\qquad \text{subject to}\quad \sigma, \pi \in \Pi^n.
\tag{7}
$$

Note that the planar three-dimensional assignment 
problem has a different combinatorial interpretation in 
terms of Latin squares of order n. 
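The Latin square connection can be made explicit: a Latin square L of order n (each symbol exactly once in every row and column) defines a feasible solution of the planar problem (6) by setting $x_{ijk} = 1$ exactly when $L_{ij} = k$. The Python sketch below (hypothetical function names, 0-based symbols) builds this array and checks the three families of line-sum constraints.

```python
def latin_square_to_planar(L):
    """Map a square L with entries 0..n-1 to x with x[i][j][k] = 1 iff L[i][j] == k."""
    n = len(L)
    x = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x[i][j][L[i][j]] = 1
    return x


def is_planar_feasible(x):
    """Check the three families of line-sum constraints of the planar problem (6)."""
    n = len(x)
    r = range(n)
    return (all(sum(x[i][j][k] for i in r) == 1 for j in r for k in r)
            and all(sum(x[i][j][k] for j in r) == 1 for i in r for k in r)
            and all(sum(x[i][j][k] for k in r) == 1 for i in r for j in r))


if __name__ == "__main__":
    n = 4
    cyclic = [[(i + j) % n for j in range(n)] for i in range(n)]  # a Latin square
    print(is_planar_feasible(latin_square_to_planar(cyclic)))            # → True
    print(is_planar_feasible(latin_square_to_planar([[0, 0], [1, 1]])))  # → False
```

The last family holds automatically because each cell holds one symbol; the other two encode the row and column conditions of a Latin square.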

Although both the axial and the planar three-dimensional assignment problems (just as the general MAP) are NP-hard in general, there exist a number of polynomially solvable special cases. In particular, when the cost coefficients form a so-called Monge array [8], the MAP is solved by taking all d − 1 permutations to be the identity permutation of $\{1, 2, \dots, n\}$. Another polynomially solvable case is the axial three-dimensional assignment problem in which each cost coefficient is a product of nonnegative index factors, $c_{ijk} = p_i q_j r_k$, and the objective function is maximized [9].
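Both special cases can be checked against brute force on small axial instances. The Python sketch below is illustrative (the names and the test arrays are assumptions, indices are 0-based). It tests the three-dimensional Monge condition $c[x \wedge y] + c[x \vee y] \le c[x] + c[y]$, where ∧ and ∨ are componentwise minimum and maximum. For the product case it uses the rearrangement inequality for nonnegative sequences: sorting p, q and r in the same order maximizes $\sum_i p_i q_{\sigma(i)} r_{\pi(i)}$.

```python
from itertools import permutations, product


def best_axial_value(c, maximize=False):
    """Brute-force optimum of formulation (7) for a small n x n x n cost array."""
    n = len(c)
    values = (sum(c[i][s[i]][t[i]] for i in range(n))
              for s in permutations(range(n)) for t in permutations(range(n)))
    return max(values) if maximize else min(values)


def is_monge(c):
    """Check c[x ^ y] + c[x v y] <= c[x] + c[y] for all index triples x, y."""
    n = len(c)
    cells = list(product(range(n), repeat=3))

    def at(t):
        return c[t[0]][t[1]][t[2]]

    for x in cells:
        for y in cells:
            lo = tuple(map(min, x, y))
            hi = tuple(map(max, x, y))
            if at(lo) + at(hi) > at(x) + at(y):
                return False
    return True


def max_product_cost(p, q, r):
    """Maximize sum_i p_i q_sigma(i) r_pi(i) for nonnegative p, q, r by sorting."""
    return sum(a * b * g for a, b, g in zip(sorted(p), sorted(q), sorted(r)))


if __name__ == "__main__":
    n = 4
    monge = [[[-(i * j + j * k + i * k) for k in range(n)] for j in range(n)] for i in range(n)]
    identity_value = sum(monge[i][i][i] for i in range(n))
    print(is_monge(monge), identity_value, best_axial_value(monge))  # → True -42 -42
    p, q, r = [3, 1, 2], [2, 5, 1], [4, 0, 3]
    prod_cost = [[[p[i] * q[j] * r[k] for k in range(3)] for j in range(3)] for i in range(3)]
    print(max_product_cost(p, q, r), best_axial_value(prod_cost, maximize=True))  # → 72 72
```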


Methods 


All known exact methods for solving this problem, which is NP-hard in general, are enumerative in nature, and, as a result of the inherent complexity of the problem, such methods are too slow for many practical applications of the MAP.
Hence, researchers often use heuristic approaches to 
find suboptimal solutions of different MAPs. In fact, 
one of the earliest solution methods for the MAP was 
a suboptimal method of trisubstitution proposed by 
Pierskalla [26] in 1966 to solve a three-dimensional as- 
signment problem. Later Frieze and Yadegar [15] devel- 
oped a suboptimal procedure for the three-index MAP 
using Lagrangian relaxation. Their technique utilized 
information contained in the relaxed solution to re- 
cover a feasible solution. The key advantage of the La- 
grangian relaxation approach is that it allows for com- 
puting both upper and lower bounds on the optimum 
solution, and therefore this method can be employed to 
evaluate solution quality. Consequently, the Lagrangian 
relaxation technique was widely used to propose nu- 
merous modifications of the original three-dimensional 
method by extending it to the general multidimen- 
sional case [12,28,29]. For example, one of such algo- 
rithms presented by Poore and Robertson [29] in 1997 
works by relaxing a d-dimensional assignment prob- 
lem to a two-dimensional problem, then maximizing 
with regard to the relaxed Lagrangian multipliers, and 
next formulating the recovery procedure as a (d — 1)- 
dimensional problem. These three steps are repeated 
successively until the recovery procedure can be for- 
mulated as a two-dimensional problem, which is solved 
optimally in polynomial time, and the algorithm termi- 
nates. 

Most exact methods for solving the MAP are devised primarily for its three-dimensional case. One of the earliest exact approaches for the axial three-dimensional assignment problem was suggested by Pierskalla [27] in 1968. His approach enumerates all feasible solutions using a tree structure and applies the branch and bound method as follows. For a given node of the tree of feasible solutions, a lower bound is calculated from the corresponding dual subproblem before proceeding further along the branches leaving the node. If the lower bound is greater than the cost of the best solution found so far, the outgoing branches are eliminated, since no better solution can be obtained along them. Otherwise, when the lower bound is less than the best known cost, the search continues from this node, because it might still be possible to improve the solution in that direction. Although this branch and bound algorithm can easily be generalized to the multidimensional case, it is too slow to work effectively for the general MAP.
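The pruning logic described above can be sketched for the permutation form (7) of the axial problem. In this Python sketch (hypothetical names), level i of the tree fixes the pair $(\sigma(i), \pi(i))$. As a plainly labeled simplification, the lower bound is not Pierskalla's dual-based bound. It adds, for every unassigned i, its cheapest cost over the still-unused j and k values, which is valid because every completion must pay at least that much.

```python
import math


def axial_3ap_branch_and_bound(c):
    """Depth-first branch and bound for formulation (7) of the axial 3-index MAP.

    The bound used here is a simple relaxation, not Pierskalla's dual bound.
    Returns (optimal value, sigma, pi) with 0-based permutations.
    """
    n = len(c)
    best = {"value": math.inf, "sigma": None, "pi": None}

    def lower_bound(i, used_j, used_k):
        # Each remaining row picks its own cheapest free (j, k), ignoring conflicts.
        return sum(
            min(c[ii][j][k]
                for j in range(n) if j not in used_j
                for k in range(n) if k not in used_k)
            for ii in range(i, n)
        )

    def branch(i, cost, sigma, pi, used_j, used_k):
        if i == n:
            if cost < best["value"]:
                best.update(value=cost, sigma=tuple(sigma), pi=tuple(pi))
            return
        # Prune: no completion of this node can beat the incumbent.
        if cost + lower_bound(i, used_j, used_k) >= best["value"]:
            return
        for j in range(n):
            if j in used_j:
                continue
            for k in range(n):
                if k in used_k:
                    continue
                sigma.append(j)
                pi.append(k)
                used_j.add(j)
                used_k.add(k)
                branch(i + 1, cost + c[i][j][k], sigma, pi, used_j, used_k)
                sigma.pop()
                pi.pop()
                used_j.discard(j)
                used_k.discard(k)

    branch(0, 0, [], [], set(), set())
    return best["value"], best["sigma"], best["pi"]


if __name__ == "__main__":
    n = 4
    c = [[[(7 * i + 3 * j + 5 * k + i * j * k) % 10 for k in range(n)]
          for j in range(n)] for i in range(n)]
    print(axial_3ap_branch_and_bound(c))
```

A stronger bound (for example, one obtained from a Lagrangian or LP relaxation) prunes more nodes at a higher cost per node. That trade-off is what distinguishes the published branch and bound schemes.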

Since Pierskalla introduced his branch and bound procedure for the axial three-dimensional assignment problem, many other branch and bound based approaches have been developed. Most of them branch the current problem into two subproblems by setting one variable $x_{ijk} = 0$ or $x_{ijk} = 1$, so that each subproblem is smaller than its parent. In contrast, a branch and bound scheme proposed by Balas and Saltzman [5] permits fixing several variables at once at each branching node by incorporating a special branching strategy that takes advantage of the problem structure.

The planar three-index MAP can also be solved us- 
ing variations of branch and bound. One of the first ap- 
plications of this method to the planar case was given 
by Vlach [35] in 1967. The algorithm obtains lower 
bounds by means of row and column reductions that 
are similar to the ones in the axial case. A method 
for solving the planar three-dimensional assignment 
problem based on a clever combination of branch and 
bound with a relaxation heuristic and Lagrangian relax- 
ation was developed by Magos and Miliotis [24]. The 
upper bounds are calculated by first applying the re- 
laxation heuristic and then decomposing the remain- 
ing problem into n linear sum assignment problems. 
The lower bounds are computed by either a heuristic or 
a Lagrangian relaxation depending on the current prob- 
lem. 

The method introduced by Hansen and Kaufman [20] for solving the axial three-dimensional assignment problem employs a primal-dual method comparable to the well-known Hungarian method for the LAP.

There have been a number of investigations of the convex hull of feasible solutions of the three-dimensional assignment problem. Euler et al. [14] examined
the polyhedral structure of the solution polytope for 
the planar three-index MAP through its connection to 
Latin squares. Euler [13] also studied the axial poly- 
tope by investigating the role of odd cycles for a class of 
facets of the polytope. The structure of the axial three- 
index assignment polytope was also analyzed by Balas 
et al. [3,4,32]. They developed linear-time separation al- 
gorithms for different classes of facets induced by spe- 
cific cliques, and then constructed a polyhedral proce- 
dure for solving the axial three-index MAP. 

Clemons et al. [11] applied a simulated annealing 
algorithm for solving the MAP. Several local neigh- 
borhood search procedures were implemented for the 
MAP. Greedy randomized adaptive search procedures 
(GRASP) were applied by Murphey et al. [25] for solv- 
ing the general MAP and later by Lidstrom et al. [22] 
and by Aiex et al. [1] for finding solutions of the axial 
three-dimensional assignment problem. A tabu search heuristic was employed by Magos [23] to obtain suboptimal solutions of the planar three-index assignment problem.
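Many of these heuristics explore a neighborhood of the current solution. The Python sketch below shows one simple neighborhood for the axial problem in its permutation form (7). It is a hypothetical illustration, not the neighborhood used in any of the cited papers. A move swaps $\sigma(i_1)$ and $\sigma(i_2)$, or $\pi(i_1)$ and $\pi(i_2)$; only rows $i_1$ and $i_2$ change cost, so each move is evaluated in constant time. Improving moves are applied until none remains, that is, until a local minimum is reached.

```python
def axial_cost(c, sigma, pi):
    """Objective of formulation (7) for permutations sigma and pi."""
    return sum(c[i][sigma[i]][pi[i]] for i in range(len(c)))


def two_exchange(c, sigma, pi):
    """First-improvement local search with pairwise swaps in sigma or pi."""
    sigma, pi = list(sigma), list(pi)  # work on copies
    n = len(c)

    def pair_cost(i1, i2):
        return c[i1][sigma[i1]][pi[i1]] + c[i2][sigma[i2]][pi[i2]]

    improved = True
    while improved:
        improved = False
        for i1 in range(n):
            for i2 in range(i1 + 1, n):
                for perm in (sigma, pi):
                    before = pair_cost(i1, i2)
                    perm[i1], perm[i2] = perm[i2], perm[i1]
                    if pair_cost(i1, i2) < before:
                        improved = True
                    else:
                        perm[i1], perm[i2] = perm[i2], perm[i1]  # undo the move
    return sigma, pi


if __name__ == "__main__":
    n = 5
    c = [[[(3 * i + 7 * j + 11 * k + i * j + 2 * j * k) % 13 for k in range(n)]
          for j in range(n)] for i in range(n)]
    start = list(range(n))
    s, p = two_exchange(c, start, start)
    print(axial_cost(c, start, start), "->", axial_cost(c, s, p))
```

Metaheuristics such as simulated annealing or tabu search add rules for accepting non-improving moves on top of a neighborhood like this one, which lets them escape the local minima where this plain descent stops.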

Grundel and Pardalos [18] developed a test prob- 
lem generator for testing exact and suboptimal solution 
methods for the axial MAP. Several recent studies in- 
vestigated various asymptotic properties of the MAPs 
with randomly generated assignment cost coefficients. 
In particular, Grundel et al. [17] established the lower 
and upper bounds for the expected number of local 
minima of the MAPs with random costs. 


Applications 


The MAP can be used to solve various real-life prob- 
lems arising in such important areas as capital invest- 
ment, dynamic facility location, and satellite launch- 
ing [30]. Other applications of the MAP include circuit 
board assembly and production planning of goods, 
which can be modeled using the axial three-dimensional assignment problem [34]. The planar three-dimensional assignment problem has also found many interesting applications, for instance, school timetables and experimental design [21], as well as modeling of satellite launching [2].

Furthermore, it was shown that such a complex 
problem as tracking elementary particles can be inves- 
tigated using the five-dimensional assignment problem 
as a mathematical model [31]. By solving this com- 
plex case of the MAP, one can reconstruct the paths 
of charged elementary particles produced by the Large 
Electron-Positron Collider. 

Many important applications of the general MAP 
arise in data association, resource allocation, air traf- 
fic control, surveillance, etc. In particular, Poore [28] 
has shown that the data association problem arising in 
a large class of multiple target tracking and sensor fu- 
sion problems can be formulated as a MAP by parti- 
tioning the set of observations into false reports and 
tracks, and then maximizing the likelihood of selecting 
the true partition. 


See also 


> Assignment and Matching 
> Integer Programming: Branch and Bound Methods 
> Multi-index Transportation Problems 
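The multidimensional knapsack problem (MKP) can be formulated as:

$$
\begin{aligned}
\max\quad & \sum_{j=1}^{n} p_j x_j\\
\text{s.t.}\quad & \sum_{j=1}^{n} r_{ij} x_j \le b_i, && i = 1, \dots, m,\\
& x_j \in \{0, 1\}, && j = 1, \dots, n,
\end{aligned}
\tag{1}
$$

where $b_i > 0$, $i = 1, \dots, m$, and $r_{ij} \ge 0$, $i = 1, \dots, m$, $j = 1, \dots, n$.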
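For very small instances, formulation (1) can be solved by enumerating all $2^n$ zero-one vectors. The Python sketch below (hypothetical names; the four-item, two-constraint instance is made up for illustration) is useful mainly as a reference against which heuristics can be checked.

```python
from itertools import product


def mkp_value(p, r, b, x):
    """Objective value of x for (1), or None if x violates a knapsack constraint."""
    if any(sum(r[i][j] * x[j] for j in range(len(p))) > b[i] for i in range(len(b))):
        return None
    return sum(pj * xj for pj, xj in zip(p, x))


def mkp_bruteforce(p, r, b):
    """Enumerate all 2**n zero-one vectors; x = 0 is always feasible because b_i > 0."""
    best_val, best_x = 0, (0,) * len(p)
    for x in product((0, 1), repeat=len(p)):
        val = mkp_value(p, r, b, x)
        if val is not None and val > best_val:
            best_val, best_x = val, x
    return best_val, best_x


if __name__ == "__main__":
    p = [10, 7, 5, 3]                  # profits p_j
    r = [[4, 3, 2, 1], [1, 4, 3, 2]]   # resource use r_ij, m = 2
    b = [6, 6]                         # capacities b_i
    print(mkp_bruteforce(p, r, b))     # → (15, (1, 0, 1, 0))
```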

Each of the m constraints in (1) is called a knapsack 
constraint, so the MKP is also called the m-dimensional 
knapsack problem. 

Other names given to this problem in the literature 
are the multiconstraint knapsack problem, the multi- 
knapsack problem and the multiple knapsack problem. 
Some authors also include the term ‘zero-one’ in their 
name for the problem, e. g., the multidimensional zero- 
one knapsack problem. Historically the majority of au- 
thors have used the name multidimensional knapsack 
problem and so we also use that phrase to refer to the 
problem. The special case corresponding to m = 2 is 
known as the bidimensional knapsack problem or the 
bi-knapsack problem. 

Many practical problems can be formulated as an MKP, for example, the capital budgeting problem, where project j has profit $p_j$ and consumes $r_{ij}$ units of resource i. The goal is to find a subset of the n projects
such that the total profit is maximised and all resource 
constraints are satisfied. Other applications of the MKP 
include allocating processors and databases in a dis- 
tributed computer system [24], project selection and 
cargo loading [53], and cutting-stock problems [26]. 

The MKP can be regarded as a general statement of 
any zero-one integer programming problem with non- 
negative coefficients. Indeed much of the early work on 
the MKP (e.g., [32,35,52,59]) viewed the problem in 
this way. 

Most of the research on knapsack problems deals with the much simpler single constraint version ($m = 1$). For the single constraint case the problem is not strongly NP-hard, and effective approximation algorithms have been developed for obtaining near-optimal solutions. A good review of the single constraint knapsack problem and its associated exact and heuristic algorithms is given by S. Martello and P. Toth [42].

Below we give a very brief overview of the literature 
relating to the MKP. A more detailed literature review 
can be found in [10]. 


Exact Algorithms 


There have been relatively few exact algorithms pre- 
sented in the literature. 

W. Shih [53] presented a branch and bound algo- 
rithm (cf. also » Integer programming: Branch and 
bound methods) for the MKP with an upper bound ob- 
tained by computing the objective function value asso- 
ciated with the optimal fractional solution for each of 
the m single constraint knapsack problems separately 
and selecting the minimum objective function value 
among those as the upper bound. 
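Shih's bound can be sketched as follows. For each constraint i, the LP relaxation of the single knapsack $\max \sum_j p_j x_j$, $\sum_j r_{ij} x_j \le b_i$, $0 \le x_j \le 1$ is solved greedily in decreasing order of $p_j / r_{ij}$ (Dantzig's rule). The minimum of these m values is an upper bound for (1), because every feasible MKP solution is feasible for each relaxation. The Python code below assumes $p_j \ge 0$, and its function names are hypothetical.

```python
def single_knapsack_lp_bound(p, w, cap):
    """LP optimum of max p.x s.t. w.x <= cap, 0 <= x <= 1 (Dantzig's greedy rule)."""
    total = 0.0
    items = []
    for pj, wj in zip(p, w):
        if wj == 0:
            total += pj  # consumes none of this resource: take it fully
        else:
            items.append((pj / wj, pj, wj))
    items.sort(reverse=True)  # decreasing profit/weight ratio
    for _, pj, wj in items:
        if wj <= cap:
            total += pj
            cap -= wj
        else:
            total += pj * cap / wj  # fractional last item
            break
    return total


def shih_upper_bound(p, r, b):
    """Minimum over the m single-constraint LP bounds."""
    return min(single_knapsack_lp_bound(p, r[i], b[i]) for i in range(len(b)))


if __name__ == "__main__":
    p = [10, 7, 5, 3]
    r = [[4, 3, 2, 1], [1, 4, 3, 2]]
    b = [6, 6]
    print(shih_upper_bound(p, r, b))  # → 15.5 (the integer optimum of this instance is 15)
```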

Another branch and bound algorithm was pre- 
sented in [25] with various relaxations of the problem, 
including Lagrangian, surrogate and composite relax- 
ations being used to compute bounds. Y. Crama and 
J.B. Mazzola [11] showed that although the bounds 
derived from these relaxations are stronger than the 
bounds obtained from the linear programming (LP) re- 
laxation, the improvement in the bound that can be re- 
alized using these relaxations is limited. 


Statistical/Asymptotic Analysis 


There have been a few papers considering a statisti- 
cal/asymptotic analysis of the MKP. 

An asymptotic analysis was presented by K.E. Schilling [51], who computed the asymptotic ($n \to \infty$ with m fixed) objective function value for the MKP where the $r_{ij}$'s and $p_j$'s were uniformly (and independently) distributed over the unit interval and where $b_i = 1$. K. Szkatula [54] generalized that analysis to the case where $b_i \ne 1$ (see also [55]).

A statistical analysis was conducted by J.F. Fontanari [18], who investigated the dependence of the objective function on $b_i$ and on m, in the case when $p_j = 1$ and the $r_{ij}$'s were uniformly distributed over the unit interval.


Early Heuristic Algorithms 


Early heuristic algorithms for the MKP were typically 
based upon simple constructive heuristics. 

S.H. Zanakis [59] gave detailed results comparing 
three algorithms from [32,35] and [52]. R. Loulou and 
E. Michaelides [40] presented a greedy-like method 
based on Toyoda’s primal heuristic [57]. Primal heuristics start with a zero solution, after which a succession of variables are assigned the value one, according to a given rule, as long as the solution remains feasible.
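The generic primal scheme can be written with the ordering rule left as a parameter. The Python sketch below is not Toyoda's specific rule. The `score` argument is a hypothetical stand-in; the usage example ranks items by profit per unit of capacity-normalized total resource use, $p_j / \sum_i (r_{ij}/b_i)$, which assumes every item uses some resource.

```python
def primal_greedy(p, r, b, score):
    """Generic primal heuristic for (1): start at x = 0 and set x_j = 1 in
    decreasing order of score(j) whenever the solution stays feasible."""
    n, m = len(p), len(b)
    x = [0] * n
    load = [0] * m  # resource consumption of the current solution
    for j in sorted(range(n), key=score, reverse=True):
        if all(load[i] + r[i][j] <= b[i] for i in range(m)):
            x[j] = 1
            for i in range(m):
                load[i] += r[i][j]
    return x


if __name__ == "__main__":
    p = [10, 7, 5, 3]
    r = [[4, 3, 2, 1], [1, 4, 3, 2]]
    b = [6, 6]

    def ratio(j):  # hypothetical ordering rule
        return p[j] / sum(r[i][j] / b[i] for i in range(len(b)))

    print(primal_greedy(p, r, b, ratio))
    print(primal_greedy(p, r, b, lambda j: p[j]))  # → [1, 0, 1, 0]
```

The result is always feasible but need not be optimal; the quality depends entirely on the ordering rule, which is where the published primal heuristics differ.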


Bound Based Heuristics 


Bound based heuristics make use of an upper bound on 
the optimal solution to the MKP. 

M.J. Magazine and O. Oguz [41] presented a heuris- 
tic algorithm that combines the ideas of S. Senju and 
Toyoda’s dual heuristic [52] with Everett’s generalized 
Lagrange multiplier approach [17]. Dual heuristics start with the all-ones solution; variables are then successively set to zero according to heuristic rules until a feasible solution is obtained. Their algorithm computes an approximate solution and uses the generated multipliers to obtain an upper bound.

H. Pirkul [45] presented a heuristic algorithm which 
makes use of surrogate duality. The m knapsack constraints were transformed into a single knapsack constraint using surrogate multipliers. A feasible solution was obtained by packing this single knapsack in decreasing order of profit/weight ratios. These ratios were defined as $p_j / \sum_{i=1}^{m} \omega_i r_{ij}$, where $\omega_i$ is the surrogate multiplier for constraint i. Surrogate multipliers were determined using a descent procedure.

J.S. Lee and M. Guignard [36] presented a heuris- 
tic that combined Toyoda’s primal heuristic [57] with 
variable fixing, LP and a complementing procedure 
from [6]. 

A. Volgenant and J.A. Zoon [58] extended the 
heuristic in [41] in two ways: 

1) in each step, several multiplier values, rather than just one, are computed simultaneously; and

2) at the end of the procedure the upper bound is 
sharpened by changing some multiplier values. 

A. Freville and G. Plateau [21] presented an efficient preprocessing algorithm for the MKP, based on [20], which provided sharp lower and upper bounds on the optimal value, and also a tighter equivalent representation by reducing the continuous feasible set and by eliminating constraints and variables.

They also [22] presented a heuristic for the bidi- 
mensional knapsack problem which includes problem 
reduction, a bound based upon surrogate relaxation 
and partial enumeration. 


Tabu Search Heuristics 


Tabu search (TS) heuristics are based on tabu search 
concepts (see [1,29,46]). 

F. Dammeyer and S. Voß [12] presented a TS heuristic based on reverse elimination. R. Aboudi and K. Jörnsten [2] combined TS with the pivot and complement heuristic [6] in a heuristic that they applied to
the MKP (see also [39]). R. Battiti and G. Tecchiolli [7] 
presented a heuristic based on reactive TS (essentially 
TS but with the length of the tabu list varied over the 
course of the algorithm). 

F. Glover and G.A. Kochenberger [28] presented 
a TS heuristic with a flexible memory structure that in- 
tegrates recency and frequency information keyed to 
‘critical events’ in the search process. Their method was 
enhanced by a strategic oscillation scheme that alter- 
nates between constructive (current solution feasible) 
and destructive (current solution infeasible) phases. See 
also [30]. 

A. Løkketangen and Glover [37] presented a heuristic based on probabilistic TS (essentially TS but with the
acceptance/rejection of a potential move controlled by 
a probabilistic process). They also [38] presented a TS 
heuristic designed to solve general zero-one mixed in- 
teger programming problems which they applied to the 
MKP. 


Genetic Algorithm Heuristics 


Genetic algorithm (GA) heuristics are based on genetic 
algorithm concepts (see [1,8,43,46]). 

In the GA of [34] infeasible solutions were allowed 
to participate in the search and a simple fitness function 
which uses a graded penalty term was used. In [56] sim- 
ple heuristic operators based on local search algorithms 
were used, and a hybrid algorithm based on combining 
a GA with a TS heuristic was suggested. 

In [48,49] a GA was presented where parent selection is not unrestricted (as in a standard GA) but is restricted to be between ‘neighboring’ solutions. Infeasible solutions were penalized as in [34]. An adaptive threshold acceptance schedule (motivated by [14,15]) for child acceptance was used.

In the GA of [33] only feasible solutions were allowed. P.C. Chu and J.E. Beasley [10] presented a GA based upon a simple repair operator to ensure that all solutions were feasible.
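A repair operator of this kind can be sketched in two phases. The items are ranked by a pseudo-utility $u_j$ (higher is better). A DROP phase then removes selected items in increasing order of $u_j$ until all constraints hold, and an ADD phase inserts unselected items in decreasing order of $u_j$ whenever they still fit. The Python code below is a sketch in the spirit of Chu and Beasley; the utility values in the example are hypothetical numbers, not the LP-dual-based utilities of the original method.

```python
def repair(x, r, b, u):
    """DROP/ADD repair of a zero-one vector x for the MKP (1).

    Assumes r[i][j] >= 0 and b[i] > 0, so the DROP phase always ends feasible
    (in the worst case with every item removed).
    """
    n, m = len(x), len(b)
    x = list(x)
    load = [sum(r[i][j] * x[j] for j in range(n)) for i in range(m)]
    order = sorted(range(n), key=lambda j: u[j])  # increasing pseudo-utility

    for j in order:  # DROP phase
        if all(load[i] <= b[i] for i in range(m)):
            break
        if x[j]:
            x[j] = 0
            for i in range(m):
                load[i] -= r[i][j]

    for j in reversed(order):  # ADD phase
        if not x[j] and all(load[i] + r[i][j] <= b[i] for i in range(m)):
            x[j] = 1
            for i in range(m):
                load[i] += r[i][j]
    return x


if __name__ == "__main__":
    r = [[4, 3, 2, 1], [1, 4, 3, 2]]
    b = [6, 6]
    u = [12, 6, 6.5, 5.5]               # hypothetical pseudo-utilities
    print(repair([1, 1, 1, 1], r, b, u))  # → [1, 0, 1, 0]
```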


Analysed Heuristics 


Analysed heuristics have some theoretical underlying 
analysis relating to their worst-case or probabilistic per- 
formance. 

A.M. Frieze and M.R.B. Clarke [23] described 
a polynomial approximation scheme based on the use 
of the dual simplex algorithm for LP, and analysed the 
asymptotic properties of a particular random model. 

In [47] a class of generalized greedy algorithms is 
proposed in which items are selected according to decreasing ratios of their $p_j$'s and a weighted sum of their $r_{ij}$'s. These heuristics were subjected to both a worst-case, and a probabilistic, performance analysis.

I. Averbakh [5] investigated the properties of several 
dual characteristics of the MKP for different probabilis- 
tic models. He also presented a fast statistically efficient 
approximate algorithm with linear running time com- 
plexity for problems with random coefficients. 


Other Heuristics 


G.E. Fox and G.D. Scudder [19] presented a heuristic based on starting from setting all variables to zero (one) and successively choosing variables to set to one (zero).
See [13] for a heuristic based upon simulated anneal- 
ing (SA). See [27] for a heuristic based on ghost image 
processes. S. Hanafi and others [31] presented a simple 
multistage algorithm within which a number of differ- 
ent local search procedures (such as greedy, SA, thresh- 
old accepting [14,15] and noising [9]) can be used. They 
also presented two TS heuristics. 


Multiple-Choice Problems 


One problem that is related to the MKP is the multidimensional multiple-choice knapsack problem (MMKP). Suppose that $\{1, \dots, n\}$ is divided into K sets $S_k$, $k = 1, \dots, K$, which are mutually exclusive, $S_k \cap S_l = \emptyset$ for all $k \ne l$, and exhaustive, $\bigcup_{k=1}^{K} S_k = \{1, \dots, n\}$. If we then add to the formulation of the MKP given previously the constraint

$$
\sum_{j \in S_k} x_j = 1, \qquad k = 1, \dots, K,
\tag{2}
$$

we obtain the MMKP. Equation (2) ensures that exactly one variable is chosen from each of the sets $S_k$, $k = 1, \dots, K$.
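Constraint (2) turns the search space into one choice per set $S_k$, so a small MMKP can be enumerated with a Cartesian product of the groups. The Python sketch below (hypothetical name; 0-based item indices; the groups and data are made up for illustration) returns `(None, None)` when no choice satisfies the knapsack constraints.

```python
from itertools import product


def mmkp_bruteforce(p, r, b, groups):
    """Choose exactly one item from each group (constraint (2)) subject to the
    knapsack constraints of (1); returns (best value, chosen items)."""
    best_val, best_choice = None, None
    for choice in product(*groups):
        if all(sum(r[i][j] for j in choice) <= b[i] for i in range(len(b))):
            val = sum(p[j] for j in choice)
            if best_val is None or val > best_val:
                best_val, best_choice = val, choice
    return best_val, best_choice


if __name__ == "__main__":
    p = [10, 7, 5, 3]
    r = [[4, 3, 2, 1], [1, 4, 3, 2]]
    b = [6, 6]
    groups = [[0, 1], [2, 3]]  # S_1 = {0, 1}, S_2 = {2, 3}
    print(mmkp_bruteforce(p, r, b, groups))  # → (15, (0, 2))
```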

See [44] for a heuristic for MMKP based on the 
MKP heuristic of Magazine and Oguz [41]. 

The special case of the MMKP corresponding to 
m = 1 is known as the multiple-choice knapsack 
problem (MCKP) and its LP relaxation as the linear 
multiple-choice knapsack problem (LMCKP). Work on 
MCKP includes [16], which presented a hybrid dy- 
namic programming tree search algorithm incorpo- 
rating a Lagrangian relaxation bound; [4], which pre- 
sented a heuristic based upon SA; and [3], which pre- 
sented a tree search algorithm incorporating a La- 
grangian relaxation bound. For work on LMCKP see 
[50]. Earlier work on MCKP and LMCKP is cited in [3, 
4,16,50]. 


See also 


> Integer Programming 
> Quadratic Knapsack 
Modern large scale vehicle design (aircraft, ships, automobiles, mass transit) requires the interaction of multiple disciplines, traditionally processed in a sequential
order. Multidisciplinary optimization (MDO), a formal 
methodology for the integration of these disciplines, is 
evolving toward methods capable of replacing the tradi- 
tional sequential methodology of vehicle design by con- 
current algorithms, with both an overall gain in prod- 
uct performance and a decrease in design time. The 
obstacles to MDO becoming a production methodol- 
ogy, in the same sense as quality control, are numerous 
and formidable. In aircraft design, for instance, typi- 
cal disciplines involved would be aerodynamics, struc- 
tures, thermodynamics, controls, propulsion, manufac- 
ture, and economics. Detailed analyses in each of these 
disciplines could involve tens to hundreds of subrou- 
tines and tens of thousands of lines of code. Managing 
the software libraries and data alone is a daunting task. 

Codes from different disciplines typically are grossly 
incompatible, but even within disciplines, data struc- 
tures and solution representations may be incompat- 
ible, requiring ‘translation’ routines or recoding. This 
incompatibility is particularly acute when stand-alone 
packages with interactive interfaces are involved. Most 
disciplinary codes, designed years ago for small serial 
computers, are very ill-suited to modern parallel archi- 
tectures, even with a coarse grained approach. 

Detailed, highly accurate disciplinary analyses are 
very expensive, sometimes requiring hours on a supercomputer, even when run in parallel. The import of
this is that, regardless of the dimension of the design 
space, it can be sampled for accurate function values at 
only a relatively small number of points. Other obsta- 
cles to achieving true MDO include model verification, 
noisy function values, and flawed parallel optimization 
methodologies. 


Almost every conceivable strategy for MDO has 
been proposed. A good recent summary of hierarchi- 
cal approaches can be found in [4], and [9] pioneered 
nonhierarchical or concurrent approaches. The basic 
idea of concurrent methods, and a particular variant 
known as concurrent subspace optimization (CSSO), is 
to simultaneously and independently optimize each of 
the disciplines (or ‘contributing analyses’, as they are 
called), and then perform a global coordination that 
brings the entire system closer to a globally feasible 
and optimal point. Collaborative optimization differs 
from CSSO in how the global coordination is managed. 
An excellent discussion of these approaches is in the 
proceedings [2]. While concurrent methods are intu- 
itively appealing and naturally parallelizable, they are 
not guaranteed to converge [8]. 

Trust region model management [1] is a rigorous approach to MDO that shows promise, and aspects of CSSO, when combined with an extended Lagrangian and response surface approximations, can lead to a provably convergent MDO method (J.F. Rodriguez, J.E. Renaud and L.T. Watson [6]). A noteworthy aspect of the Rodriguez method [6] is that the convergence proof covers variable fidelity data, which is crucial in practice.

In a taxonomy of MDO approaches, one distinction would be between hierarchic and nonhierarchic approaches. Another distinction is whether parallelism is achieved between disciplines (concurrent disciplinary computation) or within disciplines (multipoint, response surface, local/global computation). If response surface approximations are used, two prevalent approximation methods are classical least squares and DACE (Design and Analysis of Computer Experiments).

S. Burgee, A.A. Giunta, V. Balabanov, B. Grossman, W.H. Mason, R. Narducci, R.T. Haftka, and Watson [3] have a detailed discussion of the multipoint, classical least squares approach to response surface construction, and of the use of parallelism within disciplines (the pipelined MDO paradigm of Burgee is also provably convergent). The tack of this approach is to use classical design of experiments theory, regression statistics, and low order polynomial approximation models.

The DACE [7] model posits that the output of 
a computer analysis program is 


$$Y(x) = \beta + Z(x),$$

where Z(x) is a zero mean stationary Gaussian process.
(This is clearly a fiction, since computer output is deterministic. The issue is whether the model has predictive power.) Using Bayesian statistics, the best unbiased predictor is

$$\hat{Y}(x) = \hat{\beta} + r(x,S)^{T} R^{-1}\left(Y_S - \mathbf{1}\hat{\beta}\right),$$

where S is a set of observation sites, $Y_S$ is the vector of observations at S, $r(x,S)$ is the vector of correlations of x with the sites S, R is the correlation matrix between the sites S, $\mathbf{1}$ is the vector of ones, and $\hat{\beta}$ is the estimate of the mean. Some parametrized functional form for the correlation is assumed, and then these correlation parameters and $\hat{\beta}$ are computed as maximum likelihood estimates.
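The predictor can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the DACE software of [7]: the correlation is assumed to be Gaussian, $\exp(-\theta\|x-s\|^2)$, with a fixed $\theta$ instead of a maximum likelihood estimate, and $\hat\beta$ is the generalized least-squares estimate $\mathbf{1}^T R^{-1} Y_S / \mathbf{1}^T R^{-1}\mathbf{1}$, which is the maximum likelihood estimate of the mean once $\theta$ is fixed. The helper names (`solve`, `corr`, `dace_predictor`) are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def corr(x, s, theta):
    """Assumed Gaussian correlation between the points x and s."""
    return math.exp(-theta * sum((xi - si) ** 2 for xi, si in zip(x, s)))


def dace_predictor(sites, ys, theta=1.0):
    """Return x -> beta_hat + r(x, S)^T R^{-1} (Y_S - 1 * beta_hat)."""
    R = [[corr(a, b, theta) for b in sites] for a in sites]
    # GLS estimate of the mean, (1^T R^{-1} Y_S) / (1^T R^{-1} 1); R is symmetric.
    beta = sum(solve(R, ys)) / sum(solve(R, [1.0] * len(sites)))
    weights = solve(R, [y - beta for y in ys])  # R^{-1} (Y_S - 1 * beta_hat)

    def predict(x):
        r = [corr(x, s, theta) for s in sites]
        return beta + sum(ri * wi for ri, wi in zip(r, weights))

    return predict


sites = [(0.0,), (0.5,), (1.0,)]
ys = [1.0, 2.0, 0.5]
y_hat = dace_predictor(sites, ys, theta=1.0)
```

Because $r(s_j, S)$ is the jth column of R, the predictor reproduces the observations exactly at the sites, that is, it interpolates the deterministic computer output; far away from all sites the correlations vanish and the prediction reverts to $\hat\beta$.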

DACE models are more flexible than polynomial models, but with sparse data in high dimensions neither DACE nor polynomial models have much predictive power. To appreciate the problem, observe that a cube in 30 dimensions has $2^{30} \approx 10^{9}$ vertices, and even evaluating an algebraic formula at each vertex requires supercomputer power.


MDO Paradigm Example 


As an illustration, an MDO paradigm for aircraft design
is presented here. The MDO algorithm is a repeat loop, 
with a nominal design as its starting point, approximate 
optimal designs as loop iterates, and an optimal design 
as its ending point (see Fig. 1). At the start of each loop, 
aerodynamic shape and mission variables are obtained 
from either the nominal starting design or the inter- 
mediate approximate optimal design. These shape and 
mission variables are then used in the parallel simple 
aerodynamic and structural analyses. 

The simple aerodynamic analyses are performed on 
a regular grid of points in the design space. Simple 
aerodynamic calculations evaluate the (aerodynamic) 
feasibility of each grid point using tolerances on the 
constraints and move limits on the objective function, 
eliminating grossly infeasible points, and generating an 
approximation domain. The simple structural analyses 
use the aerodynamic shape and mission variables in ba- 
sic weight equations to calculate approximate weights 
needed by the objective function and constraints, fur- 
ther refining the approximation domain. 

Using the relatively abundant data from the simple analyses, regression analysis and analysis of variance are used to identify less important terms in the polynomial response surface models. Once the less important terms are eliminated, the structure of the reduced-term polynomial regression models is known, and can be used later in the generation of response surface approximations of the optimal weight and necessary aerodynamic quantities over the approximation domain.

Multidisciplinary Design Optimization, Figure 1
MDO paradigm

A genetic algorithm (GA; cf. » Genetic Algorithms) 
is used to find sets of approximate D-optimal design 
points in the approximation domain obtained from the 
parallel simple analyses. The structure of a response 
surface model is embodied in the regression matrix X, 
which defines the GA merit function $|X^{T}X|$ (maximized
by a set of points called D-optimal). These D-optimal 
design points are input to the detailed aerodynamic 
analysis code, which performs detailed analyses at each 
of the D-optimal design points in parallel. The analyses 
result in accurate aerodynamic quantities, such as wave 
drag and other drag components, and accurate aerody- 
namic loads. 
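The D-optimality criterion can be illustrated on a toy problem. The sketch below is a stand-in, not the GA of the MDO paradigm: it assumes a full quadratic model in a single design variable (regression row $1, x, x^2$) and a small candidate grid, and it replaces the genetic algorithm by exhaustive search over all point subsets, which is only feasible for tiny problems. All function names are hypothetical.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-14:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return result


def quadratic_row(x):
    """Assumed regression row for a full quadratic model in one variable."""
    return [1.0, x, x * x]


def d_criterion(points, row=quadratic_row):
    """Merit function |X^T X|, with one regression row of X per design point."""
    X = [row(p) for p in points]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    return det(XtX)


def best_design(candidates, n_points, row=quadratic_row):
    """Exhaustive search over all n_points-subsets (a stand-in for the GA)."""
    return max(combinations(candidates, n_points), key=lambda pts: d_criterion(pts, row))


grid = [i / 4 for i in range(-4, 5)]  # candidate points -1.0, -0.75, ..., 1.0
design = best_design(grid, 3)
```

For this model the search returns the points −1, 0 and 1, the classical D-optimal design for a quadratic on [−1, 1], with $|X^TX| = 4$. With realistic numbers of variables and terms the number of subsets explodes, which is why a heuristic such as a GA is used instead.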


The accurate aerodynamic quantities are used to
generate reduced-term polynomial response surface 
models for each of the expensive quantities (such as 
wave drag). An aerodynamic load calculated in the 
detailed aerodynamic analyses is used in a detailed 
structural optimization to calculate an accurate optimal 
weight for that particular aerodynamic load. This struc- 
tural optimization is done (in parallel) for each aero- 
dynamic load generated in the detailed aerodynamic 
analyses. The accurate optimal weights calculated in the 
structural optimization are used to generate a reduced- 
term polynomial response surface model for the opti- 
mal weight. 

All the response surface models are then used in 
a configuration optimization to generate an approxi- 
mate optimal design, which will be used as the starting 
design for the next iteration of the MDO loop. The grid 
spacing may possibly be refined for the simple analy- 
ses. When some convergence criterion is satisfied, the 
MDO loop exits with an optimal design. 

Note that the source of parallelism in the present 
MDO paradigm is the multipoint approximations within 
each discipline, where the disciplines are visited sequen- 
tially in a pipeline. This contrasts sharply with CSSO 
MDO paradigms, where the source of the parallelism is 
processing the disciplines in parallel. 


See also 


> Bilevel Programming: Applications in Engineering 

> Design Optimization in Computational Fluid 
Dynamics 

> Interval Analysis: Application to Chemical 
Engineering Design Problems 

> Multilevel Methods for Optimal Design 

> Optimal Design of Composite Structures 

> Optimal Design in Nonlinear Optics 

> Structural Optimization: History 
Multifacility and Restricted Location Problems


In location planning one is typically concerned with finding a good location for one or several new facilities with respect to a given set of existing facilities (clients). The two most common models in planar location theory are the Weber problem, where the average (weighted) distance of the new to the existing facilities is taken into account, and the Weber-Rawls problem, where the maximum (weighted) distance of the new to the existing facilities is taken into account.

More precisely, one is given a finite set $Ex = \{Ex_1, \dots, Ex_M\}$ of existing facilities (represented by their geographical coordinates) in the plane $\mathbb{R}^2$ and distance functions $d_m$ assigned to each existing facility $m \in \mathcal{M} := \{1, \dots, M\}$. The set of locations for the N new facilities one is looking for is denoted $X = \{X_1, \dots, X_N\}$. The distance between the new facilities is measured by a common distance d. Additionally, a value $w_{mn}$ is assigned to each pair $(Ex_m, X_n)$, for $m \in \mathcal{M}$, $n \in \mathcal{N} := \{1, \dots, N\}$, and a value $v_{rs}$ is assigned to each pair $(X_r, X_s)$, for $r, s \in \mathcal{N}$, $s > r$, reflecting the level of interaction.

With these definitions the multifacility Weber objective function can be written as

$$f(X_1, \dots, X_N) = \sum_{m\in\mathcal{M}} \sum_{n\in\mathcal{N}} w_{mn}\, d_m(Ex_m, X_n) + \sum_{\substack{r,s\in\mathcal{N}\\ s>r}} v_{rs}\, d(X_r, X_s)$$

and the multifacility Weber-Rawls objective function can be written as

$$g(X_1, \dots, X_N) = \max\Big\{ \max_{m\in\mathcal{M},\, n\in\mathcal{N}} w_{mn}\, d_m(Ex_m, X_n),\ \max_{\substack{r,s\in\mathcal{N}\\ s>r}} v_{rs}\, d(X_r, X_s) \Big\}.$$
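Both objective functions translate directly into code. The sketch below is illustrative only: the distances are passed in as Python callables, with the Euclidean distance `math.dist` as an assumed default for every $d_m$ and for d, the weights are nested lists `w[m][n]` and `v[r][s]` with 0-based indices, and only the entries with $s > r$ are used.

```python
import math
from itertools import combinations


def weber(X, Ex, w, v, d_m=None, d=math.dist):
    """Multifacility Weber objective f(X_1, ..., X_N)."""
    d_m = d_m or [math.dist] * len(Ex)
    clients = sum(w[m][n] * d_m[m](Ex[m], X[n])
                  for m in range(len(Ex)) for n in range(len(X)))
    # combinations yields index pairs (r, s) with s > r, as in the second sum.
    mutual = sum(v[r][s] * d(X[r], X[s]) for r, s in combinations(range(len(X)), 2))
    return clients + mutual


def weber_rawls(X, Ex, w, v, d_m=None, d=math.dist):
    """Multifacility Weber-Rawls objective g(X_1, ..., X_N)."""
    d_m = d_m or [math.dist] * len(Ex)
    terms = [w[m][n] * d_m[m](Ex[m], X[n])
             for m in range(len(Ex)) for n in range(len(X))]
    terms += [v[r][s] * d(X[r], X[s]) for r, s in combinations(range(len(X)), 2)]
    return max(terms)


Ex = [(0, 0), (4, 0)]   # existing facilities
X = [(0, 3), (4, 3)]    # tentative locations of the new facilities
w = [[1, 0], [0, 1]]    # Ex_1 interacts only with X_1, Ex_2 only with X_2
v = [[0, 2], [0, 0]]    # v_12 = 2
```

In this small instance `weber(X, Ex, w, v)` evaluates to 3 + 3 + 2 · 4 = 14, while `weber_rawls(X, Ex, w, v)` returns the largest single weighted distance, 8.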


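In the corresponding optimization problems we may additionally assume a feasible region $\mathcal{F}$, and we look for

$$\min_{\{X_1, \dots, X_N\} \subseteq \mathcal{F}} f(X_1, \dots, X_N) \quad \text{and} \quad \min_{\{X_1, \dots, X_N\} \subseteq \mathcal{F}} g(X_1, \dots, X_N).$$

In the first part of this survey it is assumed that $\mathcal{F} = \mathbb{R}^2$, whereas $\mathcal{F}$ will be a restricted set later on.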

The models above implicitly assume that the new facilities can be distinguished, that the amount of interaction between each new and existing facility is known, and that the new facilities have mutual communication. Note that problems without communication between the new facilities can be separated into N independent 1-facility problems, which can be easily solved by suitable algorithms. Also, in many applications we want to locate a number of indistinguishable facilities to serve the overall demand. This implies that we are not only locating facilities, but also allocating existing facilities (clients) to the new ones. This variation of the problem is called the multiWeber or multiWeber-Rawls problem, and the objective functions can be written as


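$$F(X) = \sum_{m\in\mathcal{M}} w_m\, d_m(Ex_m, \{X_1, \dots, X_N\})$$

and

$$G(X) = \max_{m\in\mathcal{M}} \{ w_m\, d_m(Ex_m, \{X_1, \dots, X_N\}) \},$$

respectively, where $d_m(Ex_m, \{X_1, \dots, X_N\}) := \min_{Y \in \{X_1, \dots, X_N\}} d_m(Ex_m, Y)$.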
In order to discuss solution methods, suitable types of distance functions $d_m$, $m \in \mathcal{M}$, are specified next. Let B be a compact convex set in the plane containing the origin in its interior and let Y be a point in the plane. The gauge of Y (with respect to B) is then defined as

$$\gamma_B(Y) := \inf\{\lambda > 0 : Y \in \lambda B\}.$$

This definition dates back to [25]. The distance from $Ex_m$ to Y induced by $\gamma_{B_m}$ is

$$d_m(Ex_m, Y) := \gamma_{B_m}(Y - Ex_m) \quad \text{for } m \in \mathcal{M}.$$
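For polytopes the gauge has a simple closed form. If B is a polytope with the origin in its interior, the bipolar theorem gives $B = \{y : \langle e^\circ, y \rangle \le 1 \text{ for all } e^\circ \in \mathrm{Ext}(B^\circ)\}$, and therefore $\gamma_B(Y) = \max_{e^\circ \in \mathrm{Ext}(B^\circ)} \langle e^\circ, Y \rangle$; this is also how the polar sets enter the linear programs for polyhedral gauges. The Python sketch below represents B only through the extreme points of its polar; the function names and the three example unit balls are illustrative assumptions.

```python
def gauge(Y, polar_ext):
    """gamma_B(Y) = max of <e, Y> over e in Ext(B°), for a polytope B with 0 in its interior."""
    return max(e[0] * Y[0] + e[1] * Y[1] for e in polar_ext)


def gauge_distance(Ex_m, Y, polar_ext):
    """d_m(Ex_m, Y) = gamma_{B_m}(Y - Ex_m)."""
    return gauge((Y[0] - Ex_m[0], Y[1] - Ex_m[1]), polar_ext)


L1_POLAR = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # B = l1 unit ball (a diamond)
LINF_POLAR = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # B = l_infinity unit ball (a square)
TRIANGLE_POLAR = [(1, 0), (0, 1), (-1, -1)]       # B = {y1 <= 1, y2 <= 1, y1 + y2 >= -1}
```

For example, `gauge_distance((1, 1), (4, -3), L1_POLAR)` evaluates to 7 and the same call with `LINF_POLAR` to 4. The triangular gauge shows that gauges need not be symmetric: `gauge((2, 2), TRIANGLE_POLAR)` is 2, but `gauge((-2, -2), TRIANGLE_POLAR)` is 4.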

In the case where all $B_m$ are convex polytopes with extreme points $\mathrm{Ext}(B_m) := \{e^m_1, \dots, e^m_{G_m}\}$, we can define halflines $l^m_i$ starting at $Ex_m$ in the direction of $e^m_i$. For the 1-facility Weber problem it was proved in [6] that there always exists an optimal solution in the set of intersection points of the halflines $l^m_i$ for $i = 1, \dots, G_m$ and $m \in \mathcal{M}$. This result carries over to multifacility Weber problems when each $B_m$ has no more than 4 extreme points [24]. For more than 4 extreme points it does not hold in general (see [24] for a counterexample).
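The discretization result yields a simple exact method for small 1-facility instances: enumerate all intersection points of the halflines and keep the best one. The sketch below assumes $l_1$ distances for every existing facility, so that $\mathrm{Ext}(B_m)$ consists of the four unit vectors $(\pm 1, 0)$, $(0, \pm 1)$ and the halflines are $Ex_m + t\,e^m_i$, $t \ge 0$. The existing facilities themselves, where their own halflines meet, are included as candidates. It is a brute-force illustration (quadratic in the number of halflines) with hypothetical names, not one of the algorithms of the cited references.

```python
def halfline_intersection(p, d, q, e, eps=1e-12):
    """Intersection of the halflines p + t*d and q + s*e (t, s >= 0), or None."""
    det = e[0] * d[1] - d[0] * e[1]  # determinant of the 2x2 system t*d - s*e = q - p
    if abs(det) < eps:
        return None  # parallel halflines; any overlap ends at existing facilities
    rx, ry = q[0] - p[0], q[1] - p[1]
    t = (e[0] * ry - e[1] * rx) / det
    s = (d[0] * ry - d[1] * rx) / det
    if t < -eps or s < -eps:
        return None  # the lines meet, but not on both halflines
    return (p[0] + t * d[0], p[1] + t * d[1])


L1_DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # Ext(B_m) for the l1 unit ball


def candidate_points(facilities, directions=L1_DIRECTIONS):
    """Existing facilities plus all pairwise intersections of the halflines l^m_i."""
    halflines = [(ex, dvec) for ex in facilities for dvec in directions]
    cands = {tuple(map(float, ex)) for ex in facilities}
    for i in range(len(halflines)):
        for j in range(i + 1, len(halflines)):
            pt = halfline_intersection(*halflines[i], *halflines[j])
            if pt is not None:
                cands.add((round(pt[0], 9), round(pt[1], 9)))
    return cands


def weber_l1(x, facilities, weights):
    """1-facility Weber objective with l1 distances."""
    return sum(wt * (abs(x[0] - a[0]) + abs(x[1] - a[1])) for a, wt in zip(facilities, weights))


facilities = [(0, 0), (2, 0), (1, 3)]
weights = [1, 1, 1]
best = min(candidate_points(facilities), key=lambda x: weber_l1(x, facilities, weights))
```

Here the best candidate is (1, 0) with objective value 5, which agrees with the coordinatewise medians that solve the $l_1$ Weber problem directly.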

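In the case where all $B_m$ are polytopes we can give linear programming formulations for the multifacility Weber as well as the multifacility Weber-Rawls problem [34], using $B_m^\circ$, the polar set of $B_m$, $m \in \mathcal{M}$:

$$\begin{aligned}
\min\ & \sum_{m\in\mathcal{M}} \sum_{n\in\mathcal{N}} w_{mn} z_{mn} + \sum_{\substack{r,s\in\mathcal{N}\\ s>r}} v_{rs} z_{rs} \\
\text{s.t.}\ & \langle X_n - Ex_m, e^\circ \rangle \le z_{mn} && \forall m \in \mathcal{M},\ n \in \mathcal{N},\ e^\circ \in \mathrm{Ext}(B_m^\circ), \\
& \langle X_s - X_r, e^\circ \rangle \le z_{rs} && \forall r, s \in \mathcal{N},\ s > r,\ e^\circ \in \mathrm{Ext}(B^\circ),
\end{aligned}$$

$$\begin{aligned}
\min\ & z \\
\text{s.t.}\ & w_{mn} \langle X_n - Ex_m, e^\circ \rangle \le z && \forall m \in \mathcal{M},\ n \in \mathcal{N},\ e^\circ \in \mathrm{Ext}(B_m^\circ), \\
& v_{rs} \langle X_s - X_r, e^\circ \rangle \le z && \forall r, s \in \mathcal{N},\ s > r,\ e^\circ \in \mathrm{Ext}(B^\circ).
\end{aligned}$$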

Even without polyhedral structure we still have a convex optimization problem, for which several solution techniques are available (see [11,12,21,32] and references therein).

In the case where we also have to deal with the allocation problem, we can still apply discretization results from the 1-facility case. However, the allocation part makes the problem NP-hard (see [22,23]; cf. also » Complexity Theory; » Complexity Classes in Optimization). Nevertheless, constructs from computational geometry (e.g. Voronoi diagrams; cf. also » Voronoi Diagrams in Facility Location) can be used to tackle the allocation part efficiently and allow iterative heuristics that generally produce satisfactory results (see [2,30]).
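The text does not prescribe a particular heuristic. A classical iterative scheme of this kind is Cooper's alternating location-allocation method, sketched below as an assumption: each client is allocated to its nearest new facility (that is, to the Voronoi cell of the facility), and each new facility is then relocated by solving the 1-facility Weber problem for its cell. With $l_1$ distances, assumed here for simplicity, that 1-facility problem is solved exactly by coordinatewise weighted medians. Like all such heuristics it finds only a local optimum that depends on the starting locations; the function names are hypothetical.

```python
def l1(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total weight."""
    half = sum(weights) / 2
    acc = 0.0
    for value, weight in sorted(zip(values, weights)):
        acc += weight
        if acc >= half:
            return value
    raise ValueError("weights must be nonnegative with a positive sum")


def multiweber_cost(X, Ex, w):
    """F(X) = sum over m of w_m * min over n of d(Ex_m, X_n), here with l1 distances."""
    return sum(wm * min(l1(e, x) for x in X) for e, wm in zip(Ex, w))


def alternate(Ex, w, X, max_iter=100):
    """Alternate allocation (nearest facility) and location (weighted medians)."""
    X = [tuple(x) for x in X]
    for _ in range(max_iter):
        cells = [[] for _ in X]
        for m, e in enumerate(Ex):
            nearest = min(range(len(X)), key=lambda k: l1(e, X[k]))  # ties: lowest index
            cells[nearest].append(m)
        new_X = []
        for k, cell in enumerate(cells):
            if not cell:
                new_X.append(X[k])  # empty cell: leave this facility where it is
                continue
            ws = [w[m] for m in cell]
            new_X.append((weighted_median([Ex[m][0] for m in cell], ws),
                          weighted_median([Ex[m][1] for m in cell], ws)))
        if new_X == X:
            break  # no facility moved: a local optimum of the alternation
        X = new_X
    return X, multiweber_cost(X, Ex, w)


clients = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
w = [1] * len(clients)
X, cost = alternate(clients, w, [(0, 0), (1, 0)])
```

Started with both new facilities near the first cluster, the sketch moves one of them to (10, 10) after the first allocation step and stops at X = [(0, 0), (10, 10)] with F(X) = 4.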

Further extensions are possible and have already been investigated, including location with attraction and repulsion, hub location, etc. (see [32] for further references).

A problem common to all forms of multi-(facility) location problems is that in an optimal solution the locations of different new facilities may coincide with each other or with existing facilities. This raises at least two issues:

• A priori detection of coincidences, which results in a reduction of the dimension of the problem and allows the exploitation of differentiability, is discussed in [7,20,31].

• If coincidence is excluded, the theory of restricted location can be used, which is discussed next.

So far, the set $\mathcal{F}$ for placing new facilities was the whole plane $\mathbb{R}^2$. Now, the feasible set $\mathcal{F} = \mathbb{R}^2 \setminus \operatorname{int}(\mathcal{R})$ is considered, where $\mathcal{R} \subset \mathbb{R}^2$ is the restricting set, assumed to be connected in $\mathbb{R}^2$. This problem is more complicated than the unrestricted one, since $\mathcal{F}$ is in general not convex. But from a practical point of view it is a necessary extension of the classical location model, since forbidden regions appear everywhere: nature reserves, lakes, exclusion of coincidence in multifacility problems, etc. These problems are called restricted location problems and have been developed in [1,12,14,15] and [26]. In the following we exclude the trivial case and assume that none of the optimal solutions of the unrestricted problem is a feasible solution of the restricted one.

If the objective function h of the location problem is convex, it can be shown that an optimal solution of the restricted problem can be found on the boundary of $\mathcal{R}$. Therefore, level curves

$$L_{=}(z) := \{X \in \mathbb{R}^2 : h(X) = z\}$$

and level sets

$$L_{\le}(z) := \{X \in \mathbb{R}^2 : h(X) \le z\}$$

can be used to reformulate the restricted location problem as

$$\min\{z : L_{=}(z) \cap \partial\mathcal{R} \neq \emptyset \text{ and } L_{\le}(z) \subseteq \mathcal{R}\}.$$


A resulting search algorithm was formulated in [11], 
but proved to be inefficient in practical applications. 

An efficient approach, originally presented in [12,14,15], identifies a finite dominating set (FDS) on the boundary $\partial\mathcal{R}$, i.e. a finite set of locations on $\partial\mathcal{R}$ which contains an optimal solution. Using this discretization, problems with gauge distances and a convex forbidden region can be solved by considering as FDS the intersection points of the halflines $l^m_i$ and the boundary of $\mathcal{R}$ (see [15,26,28] and the illustration in the following figure).
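For a disk-shaped forbidden region the FDS is easy to generate, because each halfline meets the circle in at most two points. The sketch below assumes $l_1$ distances (so the halflines run parallel to the coordinate axes), a disk $\mathcal{R}$ given by its center and radius, and instance data for which the unrestricted optimum lies strictly inside $\mathcal{R}$, as the nontriviality assumption above requires. It illustrates the discretization idea only and is not the algorithm of [12,14,15]; all names are hypothetical.

```python
import math


def l1(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def weber_cost(x, Ex, w):
    return sum(wm * l1(x, e) for e, wm in zip(Ex, w))


def halfline_circle(p, d, center, radius):
    """Points p + t*d with t >= 0 that lie on the circle |y - center| = radius."""
    fx, fy = p[0] - center[0], p[1] - center[1]
    a = d[0] ** 2 + d[1] ** 2
    b = 2 * (fx * d[0] + fy * d[1])
    c = fx ** 2 + fy ** 2 - radius ** 2
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # the line misses the circle
    roots = {(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)}
    return [(p[0] + t * d[0], p[1] + t * d[1]) for t in roots if t >= 0]


L1_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # extreme points of the l1 unit ball


def restricted_fds(Ex, center, radius, directions=L1_DIRECTIONS):
    """Intersections of all halflines l^m_i with the boundary of the disk."""
    return [pt for e in Ex for d in directions for pt in halfline_circle(e, d, center, radius)]


Ex = [(-3, 0), (3, 0), (0, 3), (0, -3)]
w = [1, 1, 1, 1]
center, radius = (0.0, 0.0), 2.0
best = min(restricted_fds(Ex, center, radius), key=lambda x: weber_cost(x, Ex, w))
```

In this instance the unrestricted optimum is the origin, inside the disk of radius 2. The best FDS point lies on the circle with objective value 16, and sampling the circle densely finds nothing better.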

The discretization also works for restricted center problems [16] and can be extended to nonconvex forbidden regions (see [15,26]) and also to the case of attraction and repulsion (negative weights are allowed), see [29]. The concept of forbidden regions has been successfully applied to a problem in PCB assembly, where the bins holding the parts to be inserted into the PCB have to be stored [10]. Of course, the PCB itself is forbidden for placing a bin. A solution approach that also addresses space requirements in a multifacility setting can be found in [9,15]. A more general case, where the new facility is a line, has been considered in [33]. Algorithms for multifacility problems with forbidden regions can be found in [8,15,27].

Multifacility and Restricted Location Problems, Figure 1
Example of a restricted location problem with 4 existing facilities and an elliptic forbidden region

Another type of restricted location problem is one where not only placement but also trespassing of regions is forbidden. These problems are called barrier location problems. The corresponding models are mathematically challenging, since the distance functions (and thus also the objective functions) are no longer convex. [17] considers Euclidean distances and one circle as forbidden region. [1] and [4] develop heuristics for
$l_p$ distances and barriers that are closed polygons. [19] and [3] obtain discretization results for $l_1$ distances and arbitrarily shaped barriers by showing an equivalence of the barrier problem to a network location problem. In the more general context of gauge distances, an FDS is given in [13] for median problems and in [5] for center problems. Finally, [18] considers barrier problems where the distance is an arbitrary norm and the barrier consists of a line with finitely many passages.
An ordinary transportation problem has variables with
two indices, typically corresponding to sources (or ori- 
gins, or supply points) and destinations (or demand 
points). A multi-index transportation problem (MITP) 
has variables with three or more indices, correspond- 
ing to as many different types of points or resources 
or other factors. Multi-index transportation problems 
were considered by T. Motzkin [22] in 1952; an appli- 
cation involving the distribution of different types of 
soap was presented by E. Schell [35] in 1955. MITPs are 
also known as multidimensional transportation prob- 
lems [4]. There are several versions and special cases of 
MITPs: 
• The number k of dimensions may be fixed to a small value; the resulting MITP is called a k-index transportation problem, kITP. Quite naturally, the best studied cases are the three-index transportation problems (3ITPs), also known as three-dimensional, or 3D, transportation problems.

• The type of constraints is determined by an integer m with 0 < m < k, defining m-fold kITPs (called symmetric MITPs in [16]; see also [41, Chapt. 8]). The most common cases are axial MITPs, when m = k − 1, and planar MITPs, when m = 1; see below for details.

• Integer solutions may or may not be required. Integrality requirements, which give rise to integer MITPs, may be necessary since MITPs lack the integrality property enjoyed by ordinary transportation problems (but see [22] for an exception).

• Unit right-hand sides, in conjunction with integrality requirements, give rise to multi-index assignment problems (MIAPs). (Some authors use this term for integer MITPs with integer right-hand sides; the present terminology, consistent with that for ordinary assignment and transportation problems, seems preferable.) MIAPs are hard to solve: the 3IAP is already NP-hard by reduction from the 3-dimensional matching problem [17]. Even worse [6]: no polynomial time algorithm for the 3IAP can achieve a constant performance ratio, unless P = NP.

• The objective function is usually a simple linear combination of the variables, normally a total cost to be minimized as in equation (1) below. Alternatives, not considered in this article, may include bottleneck objectives [11,36], more general nonlinear objectives such as in [34], or multicriteria problems [38].

• There may be additional constraints, such as upper bounds on the variables (capacitated MITPs), variables fixed to the value zero (MITPs with forbidden cells), or constraints on certain partial sums of variables (MITPs with generalized capacity constraints).

MITPs with linear objectives and without integrality restrictions are linear programming problems with a special structure. The most extensively studied integer MITPs are three-index assignment problems (3IAPs); see also Three-index Assignment Problem.


Formulations 


The following compact notation [31,34] avoids multiple summations and multiple layers of subindices. Let $k \ge 3$ denote the number of dimensions or indices, and $K = \{1, \dots, k\}$. For $i \in K$ let $A_i$ denote the set of values of the ith index. Let $A = \prod_{i\in K} A_i = A_1 \times \dots \times A_k$ denote the Cartesian product of these index sets, that is, the set of all joint indices (k-tuples) $a = (a(1), \dots, a(k))$ with $a(i) \in A_i$ for all $i \in K$. One variable $x_a$ is associated with each joint index $a \in A$. Thus, for example, in a 3ITP with index sets I, J and L, the variable $x_a$ stands for $x_{ij\ell}$ when the joint index is $a = (i, j, \ell)$.

Given unit costs $c_a \in \mathbb{R}$ for all $a \in A$, a linear objective function is

$$\min \sum_{a\in A} c_a x_a \qquad (1)$$

and the variables are usually restricted to be nonnegative:

$$x_a \ge 0 \quad \text{for all } a \in A. \qquad (2)$$


Given the integer m with 0 < m < k, the demand constraints of the m-fold kITP are defined as follows. Let $\binom{K}{k-m}$ denote the set of all (k − m)-element subsets of K; an $F \in \binom{K}{k-m}$ is interpreted as a set of k − m ‘fixed indices’. Given such an F and a (k − m)-tuple $g \in A_F = \prod_{f\in F} A_f$ of ‘fixed values’, let

$$A(F,g) = \{a \in A : a(f) = g(f),\ \forall f \in F\}$$

be the set of k-tuples which coincide with g on the fixed indices. The m-fold demand constraints are

$$\sum_{a\in A(F,g)} x_a = d_{F,g} \quad \text{for all } F \in \binom{K}{k-m},\ g \in A_F, \qquad (3)$$

where the right-hand sides $d_{F,g}$ are given positive demands associated with the values g for fixed index subset F. These ‘demands’ may also denote supplies or capacities when the indices represent sources or some other resource type. When some of these resources are in excess, the equality in constraints (3) may be replaced with inequalities. Problem (1)-(3) is a kITP. Adding the integrality restrictions

$$x_a \in \mathbb{N} \quad \text{for all } a \in A \qquad (4)$$

yields an integer MITP.
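The index bookkeeping in (3) is easy to get wrong, so a small sketch may help. The hypothetical helper below uses 0-based index values instead of $A_i = \{1, \dots, |A_i|\}$, stores x as a dictionary from joint indices to values, and returns the left-hand side of every m-fold demand constraint, keyed by the pair (F, g). It makes one pass over A and adds each $x_a$ to the constraint of every F, with fixed values $g = (a(f))_{f\in F}$.

```python
from itertools import combinations, product


def m_fold_sums(x, sizes, m):
    """Left-hand sides of the m-fold demand constraints (3).

    x maps joint indices a (0-based k-tuples) to x_a; missing entries count as 0.
    sizes[i] = |A_i|. Returns {(F, g): sum of x_a over a in A(F, g)}.
    """
    k = len(sizes)
    subsets = list(combinations(range(k), k - m))  # all F with |F| = k - m
    sums = {(F, g): 0
            for F in subsets
            for g in product(*(range(sizes[f]) for f in F))}
    for a in product(*(range(s) for s in sizes)):
        for F in subsets:
            g = tuple(a[f] for f in F)  # the fixed values of a on the indices in F
            sums[(F, g)] += x.get(a, 0)
    return sums
```

For a 2 × 2 × 2 array of ones, the axial case m = 2 yields 6 constraints with left-hand side 4 each, and the planar case m = 1 yields 12 constraints with left-hand side 2 each, matching the counts $\sum_i |A_i|$ and $\sum_i \prod_{j \ne i} |A_j|$.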
As mentioned above, the most common cases are m = k − 1, defining axial MITPs, and m = 1, defining planar MITPs. For the axial problems, the notation may be simplified by letting $d_{ig} = d_{F,g}$ when $F = \{i\}$. Note that each variable $x_a$ appears in the same number k of axial and planar demand constraints; however, there are only $\sum_{i\in K} |A_i|$ axial constraints, versus $\sum_{i\in K} \prod_{j\in K\setminus\{i\}} |A_j|$ planar constraints. Of course, it is possible to combine demand constraints with different values of m, so as to formulate different types of restrictions (e.g., see [5] and [16]).

Reductions between MITPs are presented in [16], where it is shown in particular that an m-fold kITP can be reduced to a 1-fold kITP for any m (with 0 < m < k), thereby generalizing a result in [14]. Thus, an algorithm that solves planar kITPs is in principle capable of solving m-fold kITPs for any m (with 0 < m < k).

Notice that any MITP with arbitrary right-hand sides can be transformed to an MITP with all right-hand sides equal to 1. This is a (pseudopolynomial) transformation and simply involves replacing a resource with a supply of q units by q unit-supply resources. There seems to be little advantage in doing so, except perhaps in converting an integer MITP into one with 0-1 variables.

Another issue is the existence of feasible solutions. For an axial MITP the requirement of equal total demands $\sum_{g\in A_i} d_{ig} = \sum_{g\in A_j} d_{jg}$ for all $i, j \in K$ is a necessary and sufficient condition for the existence of feasible solutions. Feasibility conditions are more complicated for nonaxial problems; see [40] for a review of results for planar problems. See also [41, Chapt. 8] for properties of polytopes associated with (integer) MITPs, including issues of degeneracy.


Applications 
Transportation and Logistics 


MITPs are used to model transportation problems that may involve different goods; such resources as vehicles, crews and specialized equipment; and other factors such as alternative routes or transshipment points. Thus index sets $A_1$ and $A_2$ may represent destinations and sources, respectively, and the other sets $A_3, A_4, \dots$ these additional factors. The type of ‘demand’ constraints used will reflect the availability of these factors and their interactions. Thus, for example, an axial demand constraint (3) with right-hand side $d_{3i}$ will be used for a vehicle type $i \in A_3$ of which $d_{3i}$ units are globally available (at identical cost) to all sources and destinations, while a constraint with F = {2, 3} will be used if there are $d_{F,g}$ vehicles of type g(3) available at the different sources g(2).

Interesting cases arise when each resource or factor $\ell \in A_i$ corresponds to a point $P_{i,\ell}$ in a metric space, i.e., a set with a distance δ, and the unit costs $c_a$ are ‘decomposable’ as defined below. Each joint index $a \in A$ may be interpreted as a cluster of points among which transportation and other activities are conducted. The unit cost $c_a$ reflects the within-cluster transportation costs associated with these activities; it is decomposable if it can be expressed as a function of the distances between pairs of points in the cluster a. Examples include the diameter $\max_{i,j} \delta(P_{i,a(i)}, P_{j,a(j)})$, when all these activities are performed simultaneously; the sum costs $\sum_{i,j} \delta(P_{i,a(i)}, P_{j,a(j)})$, when all activities are performed sequentially; and the Hamiltonian path or path costs, when all points $P_{i,a(i)}$ in the cluster have to be visited in a shortest sequence.
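The three decomposable cost functions can be written down directly. The sketch assumes Euclidean points in the plane, with `P[i][l]` holding the point $P_{i,\ell}$ (0-based), and it takes the sum cost over unordered pairs i < j (summing over ordered pairs would simply double it). The path cost is computed by brute force over all visiting orders, which is only sensible for small k; the function names are hypothetical.

```python
import math
from itertools import combinations, permutations


def cluster(a, P):
    """Points P_{i, a(i)} of the cluster defined by the joint index a (0-based)."""
    return [P[i][ai] for i, ai in enumerate(a)]


def diameter_cost(a, P):
    """Largest pairwise distance in the cluster (activities performed simultaneously)."""
    return max(math.dist(p, q) for p, q in combinations(cluster(a, P), 2))


def sum_cost(a, P):
    """Sum of distances over unordered pairs i < j (activities performed sequentially)."""
    return sum(math.dist(p, q) for p, q in combinations(cluster(a, P), 2))


def path_cost(a, P):
    """Length of a shortest Hamiltonian path through the cluster (brute force)."""
    pts = cluster(a, P)
    return min(sum(math.dist(o[t], o[t + 1]) for t in range(len(o) - 1))
               for o in permutations(pts))


P = [[(0, 0), (5, 5)], [(1, 0)], [(3, 0)]]  # P[i][l] is the point of value l of index i
a = (0, 0, 0)
```

For the cluster (0, 0), (1, 0), (3, 0) the diameter is 3, the sum cost is 1 + 3 + 2 = 6, and the shortest path visits the points from left to right with length 3.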

Other interesting cases arise when one of the indices denotes time. A simple dynamic location problem [27] may be modeled as an axial kITP, where index set $A_1$ may denote the set of facilities (say, warehouses) to be located, $A_2$ that of candidate locations, and $A_3$ that of time periods. The costs $c_a$ may include discounted construction and operating costs of these facilities. See [38] and [33] for other applications of this type.


Timetabling 


Other problems involving time, and which can be formulated as MITPs, arise in timetabling or staffing applications. To illustrate, consider the following generic situation. Given are N employees (index i), each of whom can be assigned to one of M tasks (index j) during each of T time periods (index k). Moreover, for each pair consisting of a task and a time period, a number $r_{jk}$ is given denoting the number of employees required for task j in period k. Also, a number $r_{ij}$ is given denoting the number of periods that task j requires employee i. An employee can only be assigned to one task during each time period. Finally, there is a cost coefficient $c_{ijk}$ which gives the cost of employee i performing task j in period k. This problem is called the multiperiod assignment problem in [21] (see also the references contained therein). To model this as a planar 3ITP, let $A_1$ be the set of employees, $A_2$ the set of tasks, $A_3$ the set of time periods,

$$d_{F,g} = \begin{cases} r_{jk} & \text{for } F = \{2,3\},\ \forall g = (j,k), \\ 1 & \text{for } F = \{1,3\},\ \forall g = (i,k), \\ r_{ij} & \text{for } F = \{1,2\},\ \forall g = (i,j), \end{cases}$$

and require the decision variables to be in {0, 1}. A special case arises when $r_{jk} = 1$ for all j, k and N = M. The polyhedral structure of the resulting planar 3ITP is investigated in [7]. Other references dealing with timetabling problems formulated as MITPs are [10,15] and [12].


Multitarget Tracking 


Consider the following (idealized) situation. N objects move along straight lines in the plane. At each of T time instants a scan has been made, and the approximate position of each object is observed and recorded. From such a scan it is not possible to deduce which object generated which observation. Also, a small error may be associated with each observation. A track is defined as a T-tuple of observations, one from each scan. For each possible track a cost is computed based on a least squares criterion associated with the observations in the track. The problem is now to identify N tracks while minimizing the sum of the costs of these tracks. This problem is called the data-association problem in [25]. It can be modeled as an axial integer TIAP as follows: let $A_i$ be the set of observations in scan i, i = 1, ..., T, and let $d_{ig} = 1$, i = 1, ..., T, g = 1, ..., N. Not surprisingly, this problem is NP-hard already for T = 3 (see [37]; notice, however, that this does not follow from the NP-hardness of 3IAP, due to the structure present in the cost coefficients in the objective function of multitarget tracking problems). Other references dealing with target tracking problems formulated as axial MIAPs are [23] and [24]; see also [20].
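A toy instance makes the model concrete. The sketch below is an assumption-laden illustration, not a method from the cited references: the track cost is taken to be the sum of squared residuals of straight-line least-squares fits of the x and y coordinates against the scan time (scan i at time i), one plausible reading of the least squares criterion above, and the axial TIAP is solved by exhaustive search over all assignments. The function names are hypothetical.

```python
from itertools import permutations, product


def line_fit_sse(values, times):
    """Sum of squared residuals of the least-squares line through (times, values)."""
    n = len(times)
    t_bar, v_bar = sum(times) / n, sum(values) / n
    stt = sum((t - t_bar) ** 2 for t in times)
    slope = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values)) / stt
    return sum((v - v_bar - slope * (t - t_bar)) ** 2 for t, v in zip(times, values))


def track_cost(track):
    """Assumed track cost: straight-line fits of x(t) and y(t), scan i taken at time i."""
    times = list(range(len(track)))
    return (line_fit_sse([p[0] for p in track], times)
            + line_fit_sse([p[1] for p in track], times))


def best_association(scans):
    """Exhaustive axial TIAP: N tracks, each using exactly one observation per scan."""
    n = len(scans[0])
    best_cost, best_tracks = float("inf"), None
    # Keep the order of the first scan fixed; permute the observations of every later scan.
    for perms in product(permutations(range(n)), repeat=len(scans) - 1):
        tracks = [(scans[0][g],) + tuple(scans[i + 1][perm[g]] for i, perm in enumerate(perms))
                  for g in range(n)]
        cost = sum(track_cost(tr) for tr in tracks)
        if cost < best_cost:
            best_cost, best_tracks = cost, tracks
    return best_cost, best_tracks


scans = [[(0, 0), (0, 5)],   # scan 1
         [(1, 4), (1, 0)],   # scan 2 (association unknown)
         [(2, 0), (2, 3)]]   # scan 3
cost, tracks = best_association(scans)
```

The search recovers the two straight tracks (0, 0), (1, 0), (2, 0) and (0, 5), (1, 4), (2, 3) with total cost 0. It needs $(N!)^{T-1}$ evaluations of the total cost, which is hopeless beyond toy sizes.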


Tables with Given Marginals 


Other statistical applications of MITPs require finding multidimensional tables with given sums across rows or higher-dimensional planes, as specified in constraints (3). The right-hand sides $d_{F,g}$ of such constraints are often known as marginals. In a simple application [3] arising in the integration of surveys and controlled selection, each index set represents a population from which a sample is to be drawn. A (joint) sample is a k-tuple, one from each population. The marginals are specified marginal probability distributions over each population, giving rise to axial demand constraints. Given sample costs $c_a$, the problem is to find a joint probability distribution, defined by $(x_a)$, of all the samples, consistent with these marginal distributions and of minimum expected cost (1).

In contrast, problems of updating input-output matrices (see [34] and references therein) typically have nonlinear objectives. In such problems, given are a k-dimensional array B of data (for example, past input-output coefficients) and arrays d of marginals (for example, forecast aggregate coefficients) with appropriate dimensions. The problem is to determine values $x_a$, the updated array entries, satisfying the demand constraints corresponding to the given marginals, and such that the resulting updated array $X = (x_a)$ differs as little as possible from the given array B, as specified by an appropriate (nonlinear) objective function. A (nonlinear) MITP arises when the values $x_a$ are constrained to be nonnegative, a natural requirement in many contexts.


Other Applications 


Further applications include an axial integer 3ITP model for planning the
launching of weather satellites [27], and an axial integer 
5IAP arising in routing meshes in circuit design [9]. 


Solution Methods 


As noted above, MITPs are linear programming prob- 
lems with a special structure. There are several propos- 
als for extensions of LP (transportation) algorithms to 
MITPs (e. g., [4,13] for 3ITPs and [1] for a 4ITP). 

As also mentioned earlier, integer MITPs are hard
to solve. Exact algorithms have been proposed for the 
axial integer 3IAP (see Three-index Assignment Prob- 
lem) and for the planar integer 3IAP (see [39] and 
[19]). Other exact approaches for integer MITPs rely 
on structure that is present in the particular application 
considered (see, e. g., [12]). 

Several methods have been proposed to obtain good approximate solutions to integer MITPs. In [21], results are reported for a rounding heuristic on some medium-sized planar integer 3ITPs. A tabu search algorithm for this problem is described in [18]. Heuristic solution approaches based on Lagrangian relaxation are proposed in [26,28] and [29] for multitarget tracking problems.

One major difficulty with these exact or approximate solution methods may be the sheer size of MITP formulations; if, for example, all $|A_i| = n$, then an m-fold kITP has $n^k$ variables and $\binom{k}{k-m} n^{k-m}$ constraints. In contrast, the two approaches sketched below yield feasible solutions to axial MITPs much more quickly than simply writing down all the cost coefficients. In particular, these algorithms only produce the nonzero variables $x_a$ and their values; all other variables are zero in the solution. In addition, this solution is integral if all demands are integral. Of course, the effectiveness of these methods relies on some assumptions on the cost coefficients $c_a$, assumptions which are verified in several applications.


A Greedy Algorithm for Axial MITPs 


The greedy algorithm below (a multi-index extension of the North-West corner rule) finds a feasible solution to axial MITPs in $O(k \sum_{i\in K} |A_i|)$ time, which is (for fixed k) linear in the size of the demand data $d_{ig}$. This solution is in fact optimal if the cost coefficients are known to satisfy a ‘Monge’ property [3,31,32] defined below. (For k = 3, this greedy algorithm is already described in [4] to obtain a basic feasible solution.)

Consider the axial kITP with equality constraints (3) and assume that each $A_i = \{1, \dots, |A_i|\}$. Recalling that the demands are denoted $d_{ig}$, assume that $\sum_{g\in A_i} d_{ig} = \sum_{g\in A_1} d_{1g}$ for all $i \in K$, a necessary and sufficient condition for the problem to be feasible.


PROCEDURE greedy MITP algorithm
  WHILE (Σ_{g∈A_i} d_{ig} > 0 for all i ∈ K) DO
    FOR i ∈ K DO let a(i) = min{g ∈ A_i : d_{ig} > 0};
    let Δ = min{d_{i,a(i)} : i ∈ K};
    let x_a = Δ, where a = (a(1), ..., a(k));
    FOR i ∈ K DO let d_{i,a(i)} = d_{i,a(i)} − Δ;
  RETURN x
END


A greedy algorithm for axial MITPs 
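The procedure above can be sketched in Python as follows. This is a minimal sketch under our own conventions, not a reference implementation: index values are numbered from 0, the demands are passed as nested lists, and the name `greedy_mitp` is hypothetical. It keeps the pointer a(i) for each index and only ever moves it forward, which is where the running time stated above comes from.

```python
def greedy_mitp(demands, tol=1e-12):
    """Multi-index North-West corner rule for an axial MITP (sketch).

    demands[i][g] is the demand d_{i,g} of value g of index i, with values
    numbered from 0 here (the text numbers them 1..|A_i|). Returns a dict
    mapping index tuples a = (a(1), ..., a(k)) to the nonzero values x_a.
    """
    d = [list(row) for row in demands]          # residual demands (a copy)
    k = len(d)
    totals = [sum(row) for row in d]
    # Equal totals over all indices: necessary and sufficient for feasibility.
    if any(abs(t - totals[0]) > tol for t in totals):
        raise ValueError("infeasible: total demands differ across indices")

    a = [0] * k                                  # a(i) = first g with d_{i,g} > 0

    def advance(i):
        # Move a(i) past exhausted values; pointers never move backwards.
        while a[i] < len(d[i]) and d[i][a[i]] <= tol:
            a[i] += 1

    for i in range(k):
        advance(i)

    x = {}
    # Loop while every index still has positive residual demand.
    while all(a[i] < len(d[i]) for i in range(k)):
        delta = min(d[i][a[i]] for i in range(k))
        x[tuple(a)] = delta                      # x_a = Delta
        for i in range(k):
            d[i][a[i]] -= delta
            advance(i)                           # recompute a(i) incrementally
    return x


print(greedy_mitp([[2, 3], [1, 4], [5]]))
```

For the small 3-index instance in the last line, the greedy solution has three nonzero cells, and its marginals reproduce every demand.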


A Monge Property 


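The join $a \vee b$ and meet $a \wedge b$ of $a, b \in A$ are

$$(a \vee b)_i = \max\{a(i), b(i)\}, \qquad (a \wedge b)_i = \min\{a(i), b(i)\} \quad \text{for all } i \in K.$$

The cost coefficients $(c_a)$ satisfy the Monge property if

$$c_{a\vee b} + c_{a\wedge b} \le c_a + c_b \quad \text{for all } a, b \in A.$$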


Note that this is just the submodularity of the function
$c: A \to \mathbb{R}$ defined on the product lattice A, see [3,31,32].
These references show that the above greedy algorithm 
returns an optimal solution for all feasible demands if 
and only if the cost function satisfies the Monge prop- 
erty. The latter two references also extend the greedy 
algorithm 
i) to the case of forbidden cells when the nonforbidden 
cells form a sublattice of A; and 
ii) so that it returns an optimal dual solution. 
They also show that optimizing a linear function over 
a submodular polyhedron is a special case of the dual
problem. It is shown in [32] that the primal problems 
are equivalent to the ‘submodular linear programs on 
forests’ of [8]. 
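For small instances the Monge property can be checked directly from its definition. The brute-force checker below is a sketch for experimentation only; the name `is_monge` and the callable cost interface are our own assumptions. It compares $c_{a\vee b} + c_{a\wedge b}$ with $c_a + c_b$ for every pair of cells, so it needs $O(|A|^2)$ cost evaluations.

```python
from itertools import product


def is_monge(cost, sizes, tol=1e-12):
    """Check c_{a v b} + c_{a ^ b} <= c_a + c_b for all a, b in A.

    cost  -- callable mapping an index tuple a (0-based values) to c_a
    sizes -- (|A_1|, ..., |A_k|); A is the product lattice of the ranges
    """
    cells = list(product(*(range(n) for n in sizes)))
    for a in cells:
        for b in cells:
            join = tuple(max(p, q) for p, q in zip(a, b))   # a v b
            meet = tuple(min(p, q) for p, q in zip(a, b))   # a ^ b
            if cost(join) + cost(meet) > cost(a) + cost(b) + tol:
                return False
    return True


# Distance between points at integer positions a(1), a(2) on one line.
print(is_monge(lambda a: abs(a[0] - a[1]), (3, 3)))
# A product cost is supermodular, not submodular.
print(is_monge(lambda a: a[0] * a[1], (2, 2)))
```

As expected from the remark on points located on a common line, the distance cost passes the test. The product cost fails already for a = (0, 1), b = (1, 0).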

Cost functions c with the Monge property include 
typical decomposable costs (as defined above) when all 
the points are located on the same line or on parallel lines
(one line for each factor type A;). For these problems, 
the greedy algorithm above amounts to a ‘left to right 
sweep’ across the points. 


Hub Heuristics for Axial MITPs 


The basic idea ([30], extending earlier work on axial 
3IAPs [6] and MIAPs [2] with decomposable costs)
is to solve a small number of ordinary transportation 
problems and to expand their solutions into a feasible 
solution to the original MITP. For a large collection of 
decomposable costs arising from applications, the ob- 
jective value of this feasible solution is provably within 
a constant factor of the optimum. 

Given an index h, called the hub, determine, for each
index $i \ne h$, a feasible solution $y^{(i)}$ to the ordinary
transportation problem defined by supplies $(d_{ij})_{j\in A_i}$ and
demands $(d_{hg})_{g\in A_h}$. The Expand procedure below then takes as
inputs these solutions $y = (y^{(i)})_{i\ne h}$ and expands them
into a feasible solution $x^{(h)}$ to the axial MITP. Its
running time is $O(|A_h| \sum_{i\ne h} |A_i|)$.
PROCEDURE Expand(h, y)
  FOR g := 1 TO |A_h| DO
    q := 0;
    a(i) := 1 for i ∈ K \ {h};
    WHILE (q < d_{h,g}) DO
      let ℓ be such that
        y^{(ℓ)}_{a(ℓ),g} = min{ y^{(i)}_{a(i),g} : i ∈ K \ {h} };
      x^{(h)}_a := y^{(ℓ)}_{a(ℓ),g}   (where a(h) = g);
      y^{(r)}_{a(r),g} := y^{(r)}_{a(r),g} − x^{(h)}_a for all r ∈ K \ {h};
      a(ℓ) := a(ℓ) + 1;
      q := q + x^{(h)}_a;
  RETURN x^{(h)}
END


The Expand procedure for axial MITPs 
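The Expand procedure can be sketched in Python as follows. The sketch makes several assumptions of its own. Index values are 0-based. `y[i][j][g]` holds the amount shipped from value j of index i to value g of the hub in a feasible transportation solution. The data are exact (for example integers), so that the residuals reach zero exactly. The name `expand` is hypothetical. For each hub value g, it merges the k − 1 columns $y^{(i)}_{\cdot,g}$ with a North-West-corner sweep, exactly as in the loop above.

```python
def expand(h, sizes, y):
    """Sketch of the Expand procedure (0-based index values).

    h     -- hub index
    sizes -- sizes[i] = |A_i|
    y     -- y[i][j][g] for i != h: amount shipped from value j of index i
             to value g of the hub (feasible (i, h) transportation solution)
    Returns x^(h) as a dict {index tuple a: x_a} with a[h] = g.
    """
    k = len(sizes)
    others = [i for i in range(k) if i != h]
    x = {}
    for g in range(sizes[h]):
        # Residual column g of every y^(i); copies leave the input untouched.
        rem = {i: [y[i][j][g] for j in range(sizes[i])] for i in others}
        target = sum(rem[others[0]])             # equals d_{h,g}
        a = {i: 0 for i in others}
        q = 0
        while q < target:
            # l attains the smallest residual among the current entries.
            l = min(others, key=lambda i: rem[i][a[i]])
            amount = rem[l][a[l]]
            if amount > 0:                       # zero entries are just skipped
                cell = [0] * k
                cell[h] = g
                for i in others:
                    cell[i] = a[i]
                x[tuple(cell)] = amount
                for r in others:
                    rem[r][a[r]] -= amount
                q += amount
            a[l] += 1                            # entry a(l) is now exhausted
    return x


# Hub 0 with demands (3, 2); index 1 has (1, 4) and index 2 has (4, 1).
y = {1: [[1, 0], [2, 2]], 2: [[3, 1], [0, 1]]}
print(expand(0, [2, 2, 2], y))
```

The returned cells satisfy all three sets of demands, as the test checks.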


In the hub heuristics for decomposable costs, the
ordinary transportation problems use as cost coefficients
the distances $\delta(P_{i,j}, P_{h,g})$ between the corresponding
points $P_{i,j}$ and $P_{h,g}$ in the metric space. The expanded
MITP solution $x^{(h)}$ would be optimal if the cost function
were that of the star with center h, namely if
$c_a = \sum_{i\ne h} \delta(P_{i,a(i)}, P_{h,a(h)})$. The triangle-inequality
property of the distance δ allows one to bound the cost
penalty incurred by using this h-star cost function instead of
the actual decomposable cost function.

In the single-hub heuristic, one chooses a hub $h \in K$,
solves these $k-1$ transportation problems, inputs their
solutions $y^{(i)}$ to Expand, and simply outputs the
resulting MITP solution $x^{(h)}$. If the distance δ satisfies the
triangle inequality, the cost of this solution is no more
than $k-1$ times the optimal cost in the worst case,
for many common decomposable cost functions. The
multiple-hub heuristic is an obvious extension whereby
one performs the single-hub heuristic k times, once for
each $h \in K$, and retains the best solution. This amounts
to solving $\binom{k}{2}$ ordinary transportation problems. Under
the same assumptions as above and for many common
decomposable cost functions, the cost of the resulting
solution is less than twice the optimum cost in the worst
case.
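The count of transportation problems in the multiple-hub heuristic follows from a short calculation. Each of the k hubs needs $k-1$ problems, one for each other index. The problem for the pair $\{i, h\}$ has the same supplies and demands (with their roles swapped) and, because δ is a metric and hence symmetric, the same costs. So it only needs to be solved once, and its transposed solution can be reused:

$$\underbrace{k}_{\text{hubs}} \cdot \underbrace{(k-1)}_{\text{problems per hub}} = k(k-1) \ \text{ordered pairs}, \qquad \frac{k(k-1)}{2} = \binom{k}{2} \ \text{distinct problems}.$$

For example, with $k = 4$ the single-hub heuristic solves 3 problems, while the multiple-hub heuristic solves $\binom{4}{2} = 6$ rather than 12.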


See also 


> Generalized Assignment Problem 
> Stochastic Transportation and Location Problems 
Multilevel, or hierarchical, programming problems
(MLP) are constrained optimization programs in which 
subsets of the solution set are themselves solution sets 
of other, lower-level optimization programs. Several 
general MLP problem statements exist. They differ 
from one another in the specifics of optimization vari- 
able distribution among the levels and the definition of 
the objectives and constraints at particular levels. 
Given a set of objectives $\{f_i\}_{i=1,\dots,M}$ with $f_i: \mathbb{R}^n \to \mathbb{R}$
and a vector of variables $x \in \mathbb{R}^n$, partitioned into subsets
$x = (x_1, \dots, x_M)$ for some integer M denoting the
number of subsystems, a prototypical form of MLP may be
stated as follows:

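$$
\begin{aligned}
\min_{x_1 \in S_1}\ & f_1(x)\\
\text{s.t. }\ & x_2 \in \arg\min_{x_2 \in S_2} \{f_2(x)\}\\
& \quad \vdots\\
& x_M \in \arg\min_{x_M \in S_M} \{f_M(x)\},
\end{aligned}
$$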

where the optimization problem at each level i controls
its own subset of variables $x_i$, while the other subsets
of variables $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_M$ serve as parameters.
The constraint set for each level is $S_i = \{x : h_i(x) = 0,\ g_i(x) \ge 0\}$
with $h_i: \mathbb{R}^n \to \mathbb{R}^{m_{e_i}}$ and $g_i: \mathbb{R}^n \to \mathbb{R}^{m_{s_i}}$ for
some integers $m_{e_i}$, $m_{s_i}$.

This form of MLP, inspired by the work of H.
Stackelberg [92], can be viewed as an M-player
Stackelberg game [18,84]. Its interpretation is that of M
autonomous players or decision makers seeking to 
minimize their (possibly constrained) objective func- 
tions while manipulating subsets of decision or design 
variables disjoint from those of other decision makers. 
The higher-level problems are implicit in the variables 
of the lower-level problems. This formulation has been 
studied widely in the bilevel case. See, for example, [15] 
and the references therein. In general, all problem
levels except the outermost one may contain a number of
concurrent optimization problems.

A related variant of the problem, known as the gen- 
eralized bilevel programming problem, represents the 
reaction of the lower-level problem to decisions made 
by the upper-level problem via a solution of an equilib- 
rium problem stated as a variational inequality: 


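$$
\begin{aligned}
\min_{x\in X,\ y\in Y(x)}\ & f_1(x,y)\\
\text{s.t. }\ & \langle f_2(x,y),\, y - z\rangle \le 0 \quad \text{for all } z \in Y(x),
\end{aligned}
$$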


where the upper-level domain X is such that the lower- 
level domain Y(x) is not empty. This formulation 
was introduced by P. Marcotte in [63] and studied in 
[45,64], and [71]. 
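One way to relate this formulation to the conventional bilevel problem discussed later in this article rests on an assumption of our own. Suppose the lower-level mapping is a partial gradient, $f_2(x,y) = \nabla_y \varphi(x,y)$ for some φ that is convex and differentiable in y, and suppose $Y(x)$ is convex. Then the variational inequality constraint is exactly the first-order optimality condition of a convex lower-level problem:

$$\langle \nabla_y \varphi(x,y),\, y - z\rangle \le 0 \ \ \text{for all } z \in Y(x) \iff y \in \arg\min_{z \in Y(x)} \varphi(x,z).$$

In this case the generalized bilevel problem reduces to an ordinary bilevel program with a convex lower level. The variational inequality form also covers equilibrium problems that have no such potential φ.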

Multilevel problems may be partitioned into two 
classes with respect to another criterion [100]. In one 
of the classes, upper-level optimization problems de- 
pend on the corresponding lower-level ones through 


the optimal value functions (or the marginal functions) 
of the lower-level problems. An optimal value function 
represents the value of a lower-level objective function 
at a solution of that lower-level problem. In the other 
class, upper-level problems depend on the corresponding
lower-level problems through the actual optimal solutions
of the latter. Examples of both types of formulation in
engineering design optimization are given later in this
article.

Multilevel programming problems arise in numer- 
ous applications where the structure of the applica- 
tion involves hierarchical decision making or where 
the sheer size and complexity of the problem neces- 
sitates partitioning of the system and processing the 
subsystems in a hierarchical fashion. Information on 
applications of multilevel optimization in such varied 
areas as power systems, water resource systems, ur- 
ban traffic systems, and river pollution control can be 
found in [36,50,51,52,62,69,70,85], and many other ref- 
erences. The use of multilevel algorithms in engineer- 
ing control is well documented, for instance, in [46] 
and [57]. 

The broad area of multidisciplinary design optimization
(MDO), a term that denotes a large set of research
subjects and practical techniques for the design
of complex coupled engineering systems, is particularly
amenable to the use of multilevel methods, due
to the extreme computational expense and the
organizational complexity of the field. For instance, the
design of aircraft involves aerodynamics, structural
analysis, control, weights, propulsion, and cost, to list a few
disciplines. The complexity and expense of each dis- 
cipline have assured that most disciplines have devel- 
oped into vast, autonomous fields of study, so that prac- 
tically feasible optimization methods that involve the 
contributing disciplines must take into account such an 
autonomy and the hierarchical organization. Maintain- 
ing disciplinary autonomy while accounting for inter- 
disciplinary subsystem couplings and allowing for inte- 
grated system optimization with respect to system and 
interdisciplinary objectives is one of the tasks of MDO. 
Overviews of multidisciplinary optimization may be 
found in [6] and [90]. 

Practitioners of engineering have been using
multilevel methods, in some form, since optimization
algorithms made their appearance in engineering
problems. The seminal works [60,65] and [98] contributed
to a systematic development and understanding of
hierarchical optimization. Multilevel methods have been
studied extensively in application to multidisciplinary 
design ([16,17,22,96,97]) and single-discipline design 
areas that give rise to large problems, such as structural 
optimization (e. g., [74,87,91]). Engineering multilevel 
optimization has always had a strong connection with 
multi-objective optimization (e. g., [53]). 


Problem Formulation 


The procedure of formulating an engineering design
problem as a multilevel or a bilevel problem is difficult
and depends on the complexity and size of the
problem. The general components in formulating a
multilevel optimization problem are as follows:

• The original problem is studied to determine its
structure. Structure is of paramount importance in 
deciding to adopt a particular formulation. For in- 
stance, most formulations assume that the problem 
subsystems share only a relatively small number of 
variables, i. e., that the bandwidth of interdisciplinary 
coupling is relatively small. 

• The problem is partitioned into a system (or
upper-level) problem and subsystem (or lower-level)
problems. Decisions are made on inclusions of particular
variables and constraints into the system and sub- 
systems. Decisions are also made on the form of the 
system and subsystem objectives. 

• Finally, algorithms are selected for solving the
system and subsystem optimization problems. One
must distinguish a formulation of the problem from 
the algorithm used to solve that formulation. While 
some of the multilevel formulations can be eas- 
ily shown to be mathematically equivalent to the 
original problem with respect to solution sets, they 
may not be equivalent with respect to other at- 
tributes, such as constraint qualifications and opti- 
mality conditions. Hence the numerical properties 
of algorithms applied to different formulations vary 
widely [8,9,10]. 

Problem decomposition constitutes a special area of 
study. In general, decomposition techniques take ad- 
vantage of the problem structure and depend on the 
strength and bandwidth of couplings among the sub- 
systems. Separable and partially separable problems are 
particularly amenable to decomposition. 


Two types of decomposition may be considered 
in design optimization. Coarse-grained decomposi- 
tion with respect to disciplines presents no difficulty, 
because the design problem initially consists of au- 
tonomous parts. The difficulty at this level of problem 
formulation is in integration or synthesis. However, in 
realistic applications, even though the coarse-grained 
decomposition is frequently obvious, the complexity of 
the problem requires that a dependence analysis be per- 
formed in order to determine the most advantageous 
arrangement or sequencing of the disciplinary subsys- 
tems in the optimization procedure. Automatic tech- 
niques based on graph-theoretic foundations may be 
found in [78] and [79], for instance. 

Finer-grained decomposition within a particular 
discipline may be addressed by a multitude of tech- 
niques for decomposition of mathematical programs. 
Extensive references on decomposition in general 
mathematical programming, beginning with [19] and 
[31], and extended in [49] and many others, can be 
found in [42] and [43]. Further references to decompo- 
sition techniques aimed specifically at design problems 
can be found in [95]. 

General multilevel programming presents an ex- 
ceedingly difficult problem, and many multilevel for- 
mulations and algorithms of engineering design rely 
more on heuristics than on theoretically substantiated 
foundations. There are exceptions, such as
those in [12,29,68] and [75]. While many engineering
multilevel approaches have enjoyed success when ap- 
plied to specific problems, insufficient analytical foun- 
dation and the difficulty of the problem usually mean 
that the approaches are not robust, and extensive ‘fine- 
tuning’ of heuristic parameters is required for each new 
problem or instance of a problem. Hence, recent years 
have seen renewed interest in systematic, analytically 
substantiated approaches to MLP. Many such develop- 
ments have taken place in bilevel optimization. 


Bilevel Optimization 


Although bilevel optimization problems (BLP) form 
the simplest case of multilevel optimization, they are 
very difficult to solve and constitute a fertile research 
area. A survey of the field can be found in [28]. A large 
bibliography with an emphasis on theoretical develop- 
ments is also provided in [94]. 
The conventional general bilevel problem may be
posed as follows: 


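$$
\begin{aligned}
\min_{x\in X}\ & f_1(x,y)\\
\text{s.t. }\ & h_1(x,y) = 0,\\
& g_1(x,y) \ge 0,
\end{aligned}
$$

where y solves, for fixed x,

$$
\begin{aligned}
\min_{y\in Y}\ & f_2(x,y)\\
\text{s.t. }\ & h_2(x,y) = 0,\\
& g_2(x,y) \ge 0.
\end{aligned}
$$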
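To make the nested structure concrete, the sketch below solves a tiny bilevel problem by brute force over finite grids. For each upper-level candidate x it computes a lower-level reaction y*(x), then minimizes $f_1(x, y^*(x))$. This is purely illustrative and not one of the methods surveyed here. The functions, the grids and the name `solve_bilevel_grid` are hypothetical. The constraints $h_1, g_1, h_2, g_2$ are omitted. Lower-level ties are broken by the first minimizer found, whereas in general optimistic and pessimistic tie-breaking rules lead to different problems.

```python
def solve_bilevel_grid(f1, f2, xs, ys):
    """Brute-force bilevel solve over finite grids (illustration only).

    For each x in xs the lower-level reaction y*(x) is a minimizer of
    f2(x, .) over ys (first one on ties); the upper level then minimizes
    f1(x, y*(x)) over xs. Returns (x, y*(x), f1 value).
    """
    best = None
    for x in xs:
        y_star = min(ys, key=lambda y: f2(x, y))   # lower-level reaction
        val = f1(x, y_star)
        if best is None or val < best[2]:
            best = (x, y_star, val)
    return best


# Hypothetical example: the follower tracks the leader (y*(x) = x),
# while the leader would like x near 1 and y near 2.
xs = [i / 100 for i in range(0, 301)]
ys = [j / 100 for j in range(-100, 401)]
x, y, v = solve_bilevel_grid(lambda x, y: (x - 1) ** 2 + (y - 2) ** 2,
                             lambda x, y: (y - x) ** 2, xs, ys)
print(x, y, v)
```

The bilevel solution is x = y = 1.5 with upper-level value 0.5. Minimizing $f_1$ jointly over (x, y) would instead give (1, 2) with value 0, which the follower's reaction does not permit.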


The cases of linear and convex problem functions have 
been studied widely. A popular class of methods for the 
linear bilevel problem (extreme point algorithms) com- 
putes global solutions by enumerating extreme points 
of the lower-level feasible set (e. g., [27]). Convex bilevel 
problems are often solved by branch and bound meth- 
ods (e. g., [15]). A survey of methods for linear and con- 
vex bilevel programming can be found in [11]. 

The considerably more difficult case of nonlinear 
and nonconvex problem functions has inspired much 
research activity as well but has, to date, led to few 
computationally successful algorithms. The existing ap- 
proaches to nonlinear bilevel optimization can be clas- 
sified into several categories. 


Penalty-Based Methods 


This category uses penalty methods. In some algo- 
rithms (e.g., [1]), a barrier function penalizes the 
lower-level objective. In double-penalty methods, both 
the lower-level problem and the upper-level problem 
are approximated by sequences of unconstrained opti- 
mization problems [56,61,64]. Single or double-penalty 
methods are, in general, expected to converge slowly, 
especially for highly nonlinear problems. Thus using 
these methods for the usually large and nonlinear de- 
sign optimization problems may be difficult. 


KKT-Based Methods 


The algorithms of this category convert the bilevel 
problem into a nonconvex, single-level optimization 
problem by using the Karush–Kuhn–Tucker conditions
(KKT conditions) of the lower-level problem
as constraints on the upper-level problem
[14,15,20,37,44]. If the lower-level problem is convex, the KKT
formulation is equivalent to the original formulation 
[14]. However, even in this case, the KKT conditions 
on the lower-level problem include the complementary
slackness condition as a constraint. The form of
the complementarity condition makes the single-level 
problem difficult to solve. The KKT formulation suffers 
from an additional difficulty. Namely, it is well known 
from the study of the sensitivity and stability of non- 
linear programming (e. g., [40]) that even if the lower- 
level problem behaves exceedingly well in that it satis- 
fies such stringent assumptions as strong second order 
sufficiency and regularity as a constraint qualification, 
the feasible set of the single-level problem will generally 
not be differentiable with respect to x. Hence, the per- 
formance of gradient-based solvers on the transformed 
problem may be adversely affected. 
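For reference, the single-level problem produced by the KKT approach can be written out for the conventional bilevel problem stated earlier. This is the standard construction, written under our own assumptions. The lower-level set Y is taken to be the whole y-space (or its description is folded into $h_2$ and $g_2$). $J_y$ denotes a Jacobian with respect to y. The lower-level Lagrangian is taken as $f_2 - \lambda^{\mathsf T} h_2 - \mu^{\mathsf T} g_2$.

$$
\begin{aligned}
\min_{x,\,y,\,\lambda,\,\mu}\ & f_1(x,y)\\
\text{s.t. }\ & h_1(x,y) = 0,\quad g_1(x,y) \ge 0,\\
& \nabla_y f_2(x,y) - J_y h_2(x,y)^{\mathsf T}\lambda - J_y g_2(x,y)^{\mathsf T}\mu = 0,\\
& h_2(x,y) = 0,\quad g_2(x,y) \ge 0,\quad \mu \ge 0,\\
& \mu^{\mathsf T} g_2(x,y) = 0.
\end{aligned}
$$

The last line is the complementarity condition mentioned above. Together with $g_2 \ge 0$ and $\mu \ge 0$, it forces $\mu_j\, g_{2,j}(x,y) = 0$ for every j, so the feasible set is a union of pieces, one for each choice of active lower-level constraints. It is known that standard constraint qualifications such as the Mangasarian–Fromovitz condition fail at every feasible point of problems with this structure, which is one way to understand the difficulties described above.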


Descent-Based Methods 


Another category of algorithms is based on solving 
subproblems that result in descent for the upper-level 
problem with gradient information of the lower-level 
problem used in a number of ways [34,39,59,83]. 

The remainder of the article will be devoted to 
a more detailed description of two specific approaches 
to nonlinear, nonconvex problems that arose from the 
need to solve engineering design problems. One ap- 
proach is a bilevel formulation, the other is an algo- 
rithm for solving multilevel formulations. 


Examples: Collaborative Optimization 


Collaborative optimization (CO) is a general approach 
to solving multidisciplinary design optimization prob- 
lems by formulating them as nonlinear bilevel pro- 
grams of special structure. CO comprises a number of 
methods. Its antecedents can be traced to earlier hier- 
archical approaches, as in [60] and [98]. The underly- 
ing idea of CO appeared in [13,80,81,82,88] and [96,97]. 
The approach has recently received attention under the 
name of collaborative optimization [22,23,86,93]. 
Given that MDO problems are naturally partitioned 
into subsystems along disciplinary lines, CO suggests 
an intuitively attractive way to formulate the optimiza- 
tion problem so that the autonomy of the disciplinary 
subsystem computations is preserved. However, the ap- 
proach presents a problem that is difficult to solve by
means of conventional nonlinear programming soft- 
ware [7,58]. The analytical and computational aspects 
of CO were addressed in [9], of which the following 
discussion is an abstract. As a complete description of 
CO is lengthy, only an abbreviated version is consid- 
ered here. 

It is assumed that the original system is composed of
a number, say M, of interdependent but autonomous
systems, each of which is described by a disciplinary
analysis $A_i$, $i = 1, \dots, M$, expressed in the form

$$A_i(x_i, y_i(x_i)) = 0,$$

where, given a vector of disciplinary design variables
$x_i$, the analysis (frequently represented by a numerical
differential equation solver or simulator) is performed
to yield the vector of state variables or responses $y_i(x_i)$.
The sets of disciplinary variables $x_i$ are not necessarily
disjoint. The disciplinary constraints are usually
represented by inequalities

$$c_i(x_i, y_i(x_i)) \ge 0.$$


Once the system objective and variables and the subsys- 
tem constraints and variables are identified, the bilevel 
problem is formed as follows: 

The constraints of the system problem comprise 
the ‘consistency’ (or ‘coupling’ or ‘matching’) condi- 
tions that are used to drive the discrepancy among the 
inputs and outputs shared by the subsystems to zero. 
The values of the constraints are computed by solving 
the subsystem optimization problems, and the num- 
ber of consistency constraints is related to the number 
of subsystems and variables shared among the subsys- 
tems. The form of the consistency constraints deter- 
mines a particular implementation of CO. 

Let ξ and η represent system-level variables
corresponding to inputs and outputs of subsystems,
respectively. Then, given M subsystems, the abbreviated
system program is

$$\min_{\xi,\eta}\ F(\xi,\eta) \quad \text{s.t. } G(\xi,\eta) = 0, \tag{1}$$

where

$$G(\xi,\eta) = \begin{pmatrix} g_1(\xi,\eta)\\ \vdots\\ g_M(\xi,\eta) \end{pmatrix}$$

is the set of system consistency constraints obtained by
solving lower-level subproblems, each of which is of the
form

$$\min_{x_i}\ \tfrac12\left[\|\xi - x_i\|^2 + \|\eta - y_i(x_i)\|^2\right] \quad \text{s.t. } c_i(x_i, y_i(x_i)) \ge 0, \tag{2}$$


where i is the number of the subsystem. Thus, the ob- 
jective of a subsystem optimization problem is always 
to minimize the discrepancy between the shared vari- 
ables of the subsystems, in a least squares sense, sub- 
ject to satisfying the disciplinary constraints, which do 
not depend explicitly on the system variables passed 
down to the subsystems as parameters. The subsystems 
remain feasible during optimization, while interdisci- 
plinary feasibility is gradually attained at the system 
level via the consistency constraints. Maintaining disci- 
plinary feasibility is extremely important from the de- 
sign perspective. 

The problem now consists of a set of decoupled sub- 
problems that can be solved independently and in par- 
allel. 

One instance of the system-level consistency condi- 
tions gives rise to the form in which CO is usually pre- 
sented: namely, the consistency condition is intended 
to drive to zero the value function of the subproblem 
(2). That is,

$$g_i(\xi,\eta) = \tfrac12\left[\|\xi - x_i^*\|^2 + \|\eta - y_i(x_i^*)\|^2\right], \tag{3}$$

where $x_i^*$ solves the subsystem optimization problem.

Another instance of system-level consistency conditions
matches the system-level variables with their
subsystem counterparts computed in subproblem (2):

$$g_i(\xi,\eta) = \begin{pmatrix} \xi - x_i^*\\ \eta - y_i(x_i^*) \end{pmatrix}. \tag{4}$$


The behavior of optimization algorithms applied to the 
original and CO formulations will differ greatly, as the 
formulations are not equivalent with respect to con- 
straint qualifications or optimality conditions. 

In general, value functions are not differentiable, 
and this may cause difficulties for optimization algo- 
rithms applied to the system-level problem. However, 
under a number of strong assumptions, the constraints 
are locally differentiable and can usually be computed. 

Derivatives of the system-level constraints with respect
to the system-level design variables are the
sensitivities of the minima or the solutions of the
subsystem-level optimization problems to parameters. The area of
sensitivity in nonlinear programming has been studied 
extensively. Relevant results can be found in [40] and 
[41]. In particular, under the assumptions of sufficient 
smoothness, second order sufficiency, regularity as
constraint qualification, and strict complementary
slackness, the basic sensitivity theorem (BST) proves the
existence of a unique, local, continuously differentiable
solution-multiplier triple for the perturbed problem. 
Moreover, locally, the set of active constraints remains 
unchanged and regularity and strict complementarity 
hold, allowing one to compute derivatives locally. In 
fact, under a number of assumptions, stronger state- 
ments can be made about the differentiability of the 
value function [30,77]. 

Under the conditions of BST, local first order 
derivatives of the consistency constraints (3) have a par- 
ticularly simple form because, in the case of CO, the 
constraints of the lower-level problems do not depend 
on parameters. On the other hand, the first order sen- 
sitivities of solutions of the lower-level problem that 
form the derivatives of the consistency constraints (4), 
while of closed form, are expensive to compute and 
involve second order derivatives of the subsystem La- 
grangians. 

There is another feature of the CO formulation with 
compatibility constraints (3) that will cause difficul- 
ties for nonlinear programming algorithms applied to 
the system-level problem: Lagrange multipliers will al- 
most never exist for the equality constrained system 
level problem, with all the ensuing consequences. The 
nonexistence of Lagrange multipliers is due to the de- 
scription of the feasible region that causes the Jaco- 
bian of the system-level constraints to vanish at a so- 
lution. The formulation with compatibility constraints 
(4) aims to address this problem. However, the com- 
putation of derivatives for this formulation is clearly 
expensive, as it not only involves solving a system of 
equations, but also requires the computation of second 
order information for the subsystems. The difficulties 
are addressed in detail in [9]. 
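The vanishing-Jacobian difficulty of the value-function form (3) can be seen on a deliberately tiny example. The sketch below assumes a hypothetical subsystem with no outputs, so that η plays no role, and with disciplinary constraints reduced to simple bounds lo ≤ $x_i$ ≤ hi. Subproblem (2) then amounts to projecting ξ onto a box, and $g_i(\xi) = \tfrac12\,\mathrm{dist}(\xi, \text{box})^2$. Because the constraints do not depend on ξ, the gradient is simply $\xi - x_i^*$. The function names are our own.

```python
def project_box(z, lo, hi):
    """Lower-level solution of the toy subproblem: nearest point of [lo, hi]."""
    return [min(max(zi, l), u) for zi, l, u in zip(z, lo, hi)]


def consistency(xi, lo, hi):
    """Value-function consistency constraint (3) for a hypothetical subsystem.

    Assumptions (not from the text): no subsystem outputs (eta dropped) and
    bound constraints lo <= x_i <= hi, so subproblem (2) is a projection.
    Returns g_i(xi) and its gradient xi - x_i^*.
    """
    x_star = project_box(xi, lo, hi)
    value = 0.5 * sum((a - b) ** 2 for a, b in zip(xi, x_star))
    # The constraints do not depend on xi, so the gradient is xi - x_i^*.
    grad = [a - b for a, b in zip(xi, x_star)]
    return value, grad


lo, hi = [0.0, 0.0], [1.0, 1.0]
print(consistency([2.0, 0.5], lo, hi))   # (0.5, [1.0, 0.0]): inconsistent xi
print(consistency([0.5, 0.5], lo, hi))   # (0.0, [0.0, 0.0]): consistent xi
```

Every consistent ξ has $g_i(\xi) = 0$ and $\nabla g_i(\xi) = 0$. A KKT point of the system problem would then need $\nabla F = 0$ on its own, which generally fails, so no Lagrange multipliers exist.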

In summary, CO is an appealing approach to de- 
sign optimization; however, the bilevel nature of the 
problem formulation will cause difficulties for con- 
ventional nonlinear programming algorithms applied 
to the system-level problem. Variations, special algo- 


rithms for solution, and alternatives can be found in, 
e.g. [33,54,55]. 


Example: MAESTRO, 
a Class of Multilevel Algorithms 


As mentioned earlier, most multilevel formulations and 
algorithms for engineering design problems assume 
that the bandwidth of coupling among the subsystems 
comprised by the multilevel system is small. While 
many problems may be stated in this way, it is becom- 
ing increasingly important to consider problems with 
large bandwidth of coupling where, to use an MDO 
expression, ‘everything affects everything else’. MAE- 
STRO (a class of multilevel algorithms for constrained 
optimization; [2]) is intended for solving large non- 
linear programming problems with arbitrary couplings 
among the naturally occurring subsystems, i. e., a par- 
ticular instance of MDO problems with a single ob- 
jective. The class was extended in, e. g., [5] to include 
a large class of steps for the nonlinear programming 
problem and in [3,4] to incorporate general nonlin- 
ear objectives. The class makes no assumptions on the 
structure of the problem, such as convexity or separa- 
bility. 

The algorithms of the class are based on trust re- 
gion methodology (see, e. g., [35,38,67]) and are proven 
to converge under reasonable assumptions. 

The idea of the MAESTRO algorithms is to attain 
sequential predicted sufficient decrease conditions for all 
the constrained objectives, and is a direct extension of 
the multilevel ideas for the equality constrained opti- 
mization problem. The approach can be summarized 
as follows. Given an initial approximation to the so- 
lution of the multilevel problem, the trial step for the 
multilevel problem is computed as a sum of a sequence 
of substeps, each of which predicts sufficient (or opti- 
mal) decrease in the quadratic model of the objective of 
a given subproblem, subject to maintaining predicted 
decrease in the models of the previous objectives. For 
instance, in the case of the unconstrained bilevel prob- 
lem, the trial step for the bilevel problem is a sum of two 
substeps. The first substep is computed to predict suffi- 
cient decrease, via the quadratic model of the innermost 
objective f2, for the subproblem of approximately opti- 
mizing 


$$m_2(s) = f_2(x_c) + \nabla f_2(x_c)^{\mathsf T} s + \tfrac12\, s^{\mathsf T} H_c\, s,$$

in the trust region of size $\delta_2$, to produce the substep $s_2$,
where $x_c$ is the current approximation to the solution
and $H_c$ is the current approximation to the Hessian of
$f_2$. The second substep $s_1$ would then approximately
minimize the quadratic model of the outermost objective $f_1$,
constructed at $x_c + s_2$, in the trust region of size $\delta_1$,
subject to constraints that enforce the preservation of
the predicted sufficient or optimal decrease for $f_2$. The
total trial step is evaluated by using the merit function 
designed to account for the sequential processing of the 
objectives. The algorithm is shown to converge to crit- 
ical points of the bilevel or multilevel problem. Thus, 
the essential difference between this approach and the 
classical approaches to bilevel optimization is that in- 
stead of starting from the optimality conditions for the 
bilevel or multilevel problem, the approach attempts to 
obtain decrease on the sequence of subproblem mod- 
els, while preserving predicted decrease for the previ- 
ously processed subproblems, and to measure progress 
via the use of an appropriate merit function with rig- 
orously updated penalty parameters. It is important to 
emphasize that the merit function is used only to eval- 
uate the steps, and not to compute them. 
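The text leaves open how each substep is computed. One standard way to obtain predicted sufficient decrease for a quadratic model $m(s) = f + g^{\mathsf T}s + \tfrac12 s^{\mathsf T}Hs$ in a trust region of radius δ is the Cauchy point, the minimizer of the model along $-g$ within the region. The sketch below computes it. It is offered only as an example of how the first (innermost) substep could be produced, not as the MAESTRO substep, and it ignores the second substep's requirement to preserve the decrease already predicted for $f_2$. The names `cauchy_step` and `model_decrease` are hypothetical.

```python
import math


def cauchy_step(g, H, delta):
    """Cauchy-point step for m(s) = f + g^T s + 0.5 s^T H s, ||s|| <= delta."""
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    if gnorm == 0.0:
        return [0.0] * len(g)                 # stationary model: no step
    Hg = [sum(hij * gj for hij, gj in zip(row, g)) for row in H]
    gHg = sum(gi * hgi for gi, hgi in zip(g, Hg))
    if gHg <= 0.0:
        tau = 1.0                             # nonpositive curvature: boundary
    else:
        tau = min(gnorm ** 3 / (delta * gHg), 1.0)
    return [-tau * delta * gi / gnorm for gi in g]


def model_decrease(g, H, s):
    """Predicted decrease m(0) - m(s) = -(g^T s + 0.5 s^T H s)."""
    Hs = [sum(hij * sj for hij, sj in zip(row, s)) for row in H]
    return -(sum(gi * si for gi, si in zip(g, s))
             + 0.5 * sum(si * hsi for si, hsi in zip(s, Hs)))


H = [[1.0, 0.0], [0.0, 1.0]]
s = cauchy_step([1.0, 0.0], H, 0.5)           # region is active: s = [-0.5, 0]
print(s, model_decrease([1.0, 0.0], H, s))
```

The Cauchy point is known to satisfy $m(0) - m(s) \ge \tfrac12 \|g\| \min(\delta, \|g\|/\|H\|)$, the kind of sufficient-decrease bound on which trust region convergence proofs rely.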

The ongoing work is concerned with practical im- 
plementation issues and applications to engineering de- 
sign problems. 


Summary 


Multilevel optimization has been an active research 
field, both in applied mathematics and in engineer- 
ing design. Many open questions remain, in particu- 
lar, in the area of practical computational algorithms 
for bilevel and multilevel problems. Overviews of some 
recent developments can be found in [66]. 

Understanding the behavior of specific, nonlinear 
programming algorithms applied to the system-level 
problem of the bilevel or multilevel formulations is
an interesting and difficult area of inquiry, and
would benefit from the techniques of nonsmooth anal- 
ysis and optimization [32,36,47], unconventional no- 
tions of constraint qualifications [24,25], and optimality 
[99,100]. 

To facilitate research and testing in the area of algo- 
rithms, one may find automatic bilevel and multilevel 
problem generators, as well as other sources of multi- 
level problems, described in [26,72,73]. 
Multilevel optimization methods were first developed
in the period after 1960. The main aim was to
facilitate the optimization of large scale systems in in- 
dustrial processes and to solve trajectory determination 
and prediction problems using trajectory decomposi- 
tion techniques. The reader may refer in this respect to 
the corresponding articles [3] and [26] and to the ref- 
erences given there but also to the books [27] and [12]. 
More recent works on this subject have been published 
in [4,14]. It should be mentioned that certain sources 
concerning the ideas of multilevel optimization may be 
found in well-known treatises of calculus of variations 
and theoretical mechanics, cf. e.g. [5,10]. Indeed, the 
well-known procedure of variational methods in
Mechanics of ‘frozen’ variables or constraints is closely
related to the ideas of multilevel optimization.
Also the well-known iterative methods of H. Cross and 
G. Kani of linear structural analysis used after 1940
and before the development of computer codes based 
on the finite element method (FEM), for the calculation 
of framed structures, are nothing but a formulation in
the ‘language’ of structural analysis of a multilevel op- 
timization algorithm for the minimization problem of 
the complementary energy of the structure, expressed 
in terms of the bending moments of the beam and col- 
umn connections. 
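The idea of freezing part of the unknowns while minimizing an energy over the rest can be illustrated on a quadratic energy $E(u) = \tfrac12 u^{\mathsf T}Ku - f^{\mathsf T}u$, with K assumed symmetric positive definite for this sketch. Alternating exact minimization over two blocks of unknowns, each time holding the other block frozen, is block Gauss–Seidel iteration, and it converges to the minimizer, that is, to the solution of $Ku = f$. This is a generic illustration, not a reconstruction of the Cross or Kani procedures or of Panagiotopoulos's method. `solve2` and `block_minimize` are hypothetical helper names, and each block is limited to two unknowns to keep the code short.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


def block_minimize(K, f, blocks, sweeps=100):
    """Minimize E(u) = 0.5 u^T K u - f^T u by alternating over blocks.

    Each subproblem minimizes E over one block (two unknowns) while all
    other unknowns stay frozen; K is assumed symmetric positive definite.
    """
    n = len(f)
    u = [0.0] * n
    for _ in range(sweeps):
        for blk in blocks:
            # Coupling from the frozen unknowns moves to the right-hand side.
            rhs = [f[i] - sum(K[i][j] * u[j] for j in range(n) if j not in blk)
                   for i in blk]
            sub = [[K[i][j] for j in blk] for i in blk]
            for i, val in zip(blk, solve2(sub, rhs)):
                u[i] = val
    return u


K = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
f = [1.0, 2.0, 3.0, 4.0]
print(block_minimize(K, f, [(0, 1), (2, 3)]))
```

Each block solve is a small, independent subproblem, and the interaction between the subproblems enters only through the right-hand side. Repeating the sweeps lets that interaction settle, which echoes the iterative mutual interaction of subproblems described in this article.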

Among the pioneers in the application of the multi- 
level optimization methods in mechanics and especially 
concerning the calculation of structures involving in- 
equality constraints was P.D. Panagiotopoulos [19,20]. 
The idea was the following: Most mechanical problems 
can be expressed as the minimum problems of an ap- 
propriately formulated energy function. The decompo- 
sition of this initial optimization problem into smaller 
subproblems corresponds to the energetic decomposi- 
tion of the initial mechanical problem into smaller fic- 
titious subproblems. The mutual interaction of these 
subproblems yields, after an iterative procedure, the 
solution of the initial problem. The aforementioned 
method leads to the following three main applications 
of the multilevel optimization techniques in the frame- 
work of Mechanics and more generally in engineering 
sciences. 

a) Calculation of large structures. 

b) Validation of the simplifying assumptions used for 
the calculation of complex structures. Accuracy test- 
ing. 

c) Accuracy improvement of simplified models used 
for the estimation of the behavior of complex struc- 
tures. 

Note that in the above, the term ‘structure’ can be
replaced with the term ‘system’, meaning systems whose
behavior is characterized by the solution of a minimax
problem.

Since most of the multilevel techniques developed
in the early sixties for trajectory determination
problems in space science are also applicable to
stationarity problems, and since it has recently been
proved that in dynamic problems involving impact
phenomena the action functional is stationary
[22,23], there is a further application of the
multilevel optimization methods:


d) Calculation of the dynamic behavior of structures 
involving impact effects. 

To the aforementioned applications the following classical
one can be added:

e) Solution of optimal control problems (minimum weight
or cost, maximum strength) in dynamic structural
analysis.

This article deals mainly with static systems. Concerning
applications d) and e), the reader is referred to
[12,27] in connection with [22,23]. In dynamic problems,
methods analogous to those for static problems can be
developed.

The classical decomposition techniques which are
applied to optimization problems (cf. in this respect
also [20, pp. 355ff]) have been extended and can also
be applied to substationarity problems [25], i.e. to
problems of the type

$$0 \in \partial f(x),$$

where f is a nonconvex nonsmooth energy function
and ∂ denotes the generalized gradient of F.H. Clarke
[7], as extended by R.T. Rockafellar [25] to
non-Lipschitzian functionals. In this case the variational
inequalities of the convex energy problems are replaced
by hemivariational inequalities (cf. e.g. [8,17,20,21]),
and instead of a global minimum of the convex potential
or complementary energy functional, the local
minima and maxima are sought, and among them the
global minimum as well. Several numerical methods have
been developed for the treatment of hemivariational
inequalities (cf. e.g. [21]); among them, the two methods
described in [15] are extensions of the multilevel
optimization methods to substationarity problems.

It should also be noted that most domain
decomposition methods are special cases of the multilevel
optimization algorithms, as follows easily if one
considers the energy functionals corresponding to the
partial differential equations studied. The domain
decomposition then leads to energy functionals which have
to be minimized on the decomposed parts of the domain.

Finally, it should be mentioned that fractal geometries
in optimization problems arising in mechanics are
treated by means of appropriate multilevel transformations
of the problem, as will be shown below. Clearly,
an optimization problem with many variables
cannot always be decomposed directly into independent
optimization subproblems. The aim of multilevel
optimization is to define, for a given optimization
problem, appropriate mutually independent
subproblems. Each of these, when solved independently,
yields the optimum of the overall problem after an
iterative procedure which is called the second-level
controller. The decomposition into subproblems is achieved by
choosing some variables, called coordinating variables,
which are freely manipulated by the second-level
controller in such a way that the subproblems (the first
level of the problem) have solutions which in fact yield the
optimum of the initial problem, i.e. of the problem before
its decomposition into subproblems. Here the ideas of [3]
are closely followed.

There are several different methods of transforming
a given constrained optimization problem into a multilevel
optimization problem. All these methods are basically
combinations of two methods: the feasible decomposition
method, or model coordination method, and the
nonfeasible decomposition method, or goal coordination
method.

Let us consider the problem

$$\min_{x,u}\ \Pi(x,u) \quad \text{s.t.}\quad f(x,u) = 0,\quad R(x,u) \ge 0, \tag{1}$$

where x is a vector in $E^n$, u is a vector in $E^m$, f is an n
vector of $C^2$ functions, Π is a twice continuously
differentiable ($C^2$) function, and R is an r vector of $C^2$
functions. To decompose, coordinating variables s may be
substituted not only for single variables but also for
functions g(x, u), so that Π is split into mutually disjoint
parts and the f and R equations contain no common x, u,
or s variables between the subproblems. Thus the
following problem results:


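$$\Pi(x,u,s) = \sum_{i=1}^{N} \Pi^{(i)}\big(x^{(i)},u^{(i)},s^{(i)}\big),$$

$$f^{(i)}\big(x^{(i)},u^{(i)},s^{(i)}\big) = 0, \quad i=1,\dots,N,$$

$$R^{(i)}\big(x^{(i)},u^{(i)},s^{(i)}\big) \ge 0, \quad i=1,\dots,N.$$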


The superscript (i) refers to the ith subproblem or
subsystem, which must be optimized. For example, in a
control problem x denotes the state, u denotes the control
and $x^{(1)}$ is the state vector of the first subsystem. The
coupling equations must also be added:

$$s^{(i)} = g^{(i)}\big(x^{(j)},u^{(j)}\big) \quad \text{for all } j \ne i.$$
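The Lagrangian of the new problem reads

$$\Pi(x,u,s;\lambda,\mu,\rho) = \sum_{i=1}^{N}\Pi^{(i)} + \sum_{i=1}^{N}\lambda^{(i)T}f^{(i)} + \sum_{i=1}^{N}\mu^{(i)T}\big(R^{(i)}-\sigma^{(i)}\big) + \sum_{i=1}^{N}\rho^{(i)T}\big(g^{(i)}-s^{(i)}\big), \tag{2}$$

where $\sigma^{(i)} \ge 0$ are additional slack variables such that
$R^{(i)} - \sigma^{(i)} = 0$. Π is immediately separable into N
individual subsystems, except for its last term.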

In the method of nonfeasible decomposition it is
assumed that ρ has a known value. The term $\rho^{(i)T}s^{(i)}$ is
put in the ith subsystem, and all of the $\rho^{(i)T}g^{(i)}(x,u)$
terms associated with the jth variables are put in the jth
subsystem. In the feasible decomposition method, on the
other hand, it is assumed that s has a known value, and
all of the $\rho^{(i)T}[g^{(i)}(x,u)-s^{(i)}]$ terms associated with
the jth variables are put in the jth subsystem. In both
cases the optimization problem is separable and each
subsystem can be optimized independently. Equation (2)
is rewritten in more compact form as


$$\Pi(x,v;\lambda,\mu,\rho) = \Pi(x,v) + \lambda^T f(x,v) + \mu^T\big[R(x,v)-\sigma\big] + \rho^T h(x,v), \tag{3}$$

where σ ≥ 0, v represents u and s, and h(x, v) denotes all
g − s; ρ is a Lagrange multiplier vector of the same
dimension as g, μ is an r vector including all Lagrange
multipliers associated with R, and λ is an n vector
including all Lagrange multipliers associated with f.

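The Kuhn–Tucker theory of nonlinear programming [9]
implies that if Π(x, v) has a critical point at $(x^0, v^0)$
such that the constraint equations in (1) are satisfied,
and if the rank of

$$\begin{pmatrix} \partial f/\partial x & \partial f/\partial v \\ \partial R/\partial x & \partial R/\partial v \\ \partial h/\partial x & \partial h/\partial v \end{pmatrix}$$

is full and equals the rank of

$$\begin{pmatrix} \partial \Pi/\partial x & \partial \Pi/\partial v \\ \partial f/\partial x & \partial f/\partial v \\ \partial R/\partial x & \partial R/\partial v \\ \partial h/\partial x & \partial h/\partial v \end{pmatrix} \tag{4}$$

at $(x^0, v^0)$, then a unique set of Lagrange multipliers $\lambda^0$,
$\mu^0$ and $\rho^0$ exists at the critical point. The necessary
conditions for a critical point (local minimum) are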


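$$\frac{\partial \Pi}{\partial x} = \frac{\partial \Pi}{\partial v} = 0, \quad \mu_i R_i = 0, \quad R \ge 0, \quad \mu \le 0, \tag{5}$$

$$\frac{\partial \Pi}{\partial \lambda} = f = 0, \quad \frac{\partial \Pi}{\partial \rho} = h = 0. \tag{6}$$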


If Π(x, v) is convex, if $f_i(x,v)$ and $h_i(x,v)$ are convex
for $\lambda^0$ and $\rho^0$ positive, or if $f_i(x,v)$, $h_i(x,v)$, $R_i(x,v)$ are
concave for $\lambda^0$, $\rho^0$, $\mu^0$ negative, and the above necessary
conditions are satisfied, then $\Pi(x^0, v^0)$ is the absolute
minimum of (1) and Π has a global saddle point at
$(x^0, v^0)$; that is,

$$\Pi(x,v;\lambda^0,\mu^0,\rho^0) \ge \Pi(x^0,v^0;\lambda^0,\mu^0,\rho^0) \ge \Pi(x^0,v^0;\lambda,\mu,\rho)$$

for all x, v, λ, μ and ρ. These conditions can be relaxed
to local convexity and concavity, in which case only a
local minimum and a local saddle point are assured.

The nonfeasible gradient controller of L.S. Lasdon
and J.D. Schoeffler [11] has the following form. Given
(1), suppose that
a) Π has a global saddle point at $(x^0, v^0; \lambda^0, \mu^0, \rho^0)$;
and
b) for any given ρ, a finite constrained (unique) minimum
(constrained by f and R) exists.
Then the iterative procedure

$$\rho^{i+1} = \rho^{i} + \Delta\rho, \qquad \Delta\rho = +k\,h(x^*, v^*), \qquad k > 0,$$

where $(x^*, v^*)$ is the first-level minimum for $\rho = \rho^i$,
will converge to $\rho^0$ and the absolute minimum of (1).
A local saddle point can replace a); the initial guess for ρ
must then lie within this saddle region, and the algorithm
leads only to a local minimum. This Lasdon gradient
controller can be considered a variant of the modified
Arrow–Hurwicz gradient method of K. Arrow, L. Hurwicz
and H. Uzawa [1].
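The nonfeasible gradient controller can be sketched in a few lines of Python. The example is a hypothetical toy problem, not taken from the article: subsystem 1 minimizes ½(x₁ − 3)² + ½(x₁ − s)² over (x₁, s), subsystem 2 minimizes ½(x₂ − 1)² over x₂, and the coupling is s = g(x₂) = x₂. With the Lagrangian term ρ(g − s) as in (3), each first-level problem has a closed-form minimizer for fixed ρ, and the second level applies Δρ = +k h. The function names and the step k = 0.2 are illustrative assumptions.

```python
from typing import Callable, Dict, Tuple

FirstLevel = Callable[[float], Tuple[Dict[str, float], float]]


def toy_first_level(rho: float) -> Tuple[Dict[str, float], float]:
    """Solve both (hypothetical) subsystems for a fixed multiplier rho."""
    # Subsystem 1: min over (x1, s) of 1/2 (x1 - 3)^2 + 1/2 (x1 - s)^2 - rho * s
    x1 = 3.0 + rho
    s = 3.0 + 2.0 * rho
    # Subsystem 2: min over x2 of 1/2 (x2 - 1)^2 + rho * g(x2), with g(x2) = x2
    x2 = 1.0 - rho
    h = x2 - s  # coupling residual h = g - s
    return {"x1": x1, "s": s, "x2": x2}, h


def nonfeasible_controller(first_level: FirstLevel, rho0: float, k: float,
                           tol: float = 1e-10, max_iter: int = 1000):
    """Second level: rho^{i+1} = rho^i + k * h(x*, v*) until h vanishes."""
    rho = rho0
    for iteration in range(max_iter):
        solution, h = first_level(rho)
        if abs(h) < tol:
            return rho, solution, iteration
        rho += k * h
    raise RuntimeError("second-level controller did not converge")


rho, solution, iterations = nonfeasible_controller(toy_first_level, rho0=0.0, k=0.2)
print(rho, solution)
```

For this toy problem the coupled optimum is x₁ = 7/3, x₂ = s = 5/3 with ρ⁰ = −2/3; the residual is h = −2 − 3ρ, so the second-level iteration contracts by the factor 1 − 3k and any 0 < k < 2/3 converges here.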

The feasible gradient controller of C.B. Brosilow et
al. [6] has the following form. Given (1), suppose that
a) a finite minimum exists at $(x^0, v^0)$; and
b) all the conditions of (5) and (6) are fulfilled except
for $\partial\Pi/\partial s = 0$ (where v denotes all s and u).
Then the iterative procedure

$$s^{i+1} = s^{i} + \Delta s$$

will converge to $s^0$ and the minimum of (1).
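A minimal sketch of the feasible controller, on a hypothetical toy problem that is not from the article: minimize ½(x₁ − 3)² + ½(x₁ − x₂)² + ½(x₂ − 1)² with the coordinating variable s standing for x₂. At the first level s is fixed, subsystem 1 minimizes ½(x₁ − 3)² + ½(x₁ − s)² over x₁, and subsystem 2 is forced to satisfy x₂ = s, so every first-level iterate is feasible. The second level moves s along −∂Π/∂s; the gradient is assembled from the subsystem optima (envelope theorem), and the step κ = 0.5 and all names are assumptions.

```python
from typing import Tuple


def toy_subsystems(s: float) -> Tuple[float, float, float]:
    """First level for a fixed coordinating variable s (hypothetical toy problem)."""
    # Subsystem 1: min over x1 of 1/2 (x1 - 3)^2 + 1/2 (x1 - s)^2
    x1 = (3.0 + s) / 2.0
    # Subsystem 2: the coupling x2 = s is imposed, so x2 is fixed by s
    x2 = s
    # dPi*/ds: envelope theorem for subsystem 1 plus derivative of 1/2 (x2 - 1)^2
    grad = -(x1 - s) + (x2 - 1.0)
    return x1, x2, grad


def feasible_controller(s0: float, kappa: float, tol: float = 1e-10,
                        max_iter: int = 1000) -> Tuple[float, float, float]:
    """Second level: s^{i+1} = s^i + Delta s, with Delta s = -kappa * dPi*/ds."""
    s = s0
    for _ in range(max_iter):
        x1, x2, grad = toy_subsystems(s)
        if abs(grad) < tol:
            return s, x1, x2
        s -= kappa * grad
    raise RuntimeError("second-level controller did not converge")


s, x1, x2 = feasible_controller(s0=0.0, kappa=0.5)
print(s, x1, x2)
```

Here ∂Π*/∂s = 1.5 s − 2.5, so the controller converges to s⁰ = 5/3 (x₁ = 7/3), which is the minimum of the coupled toy problem.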

The choice of κ is important for the gradient
calculations. At the second level of the feasible
method we may write ([3, p. 142])

$$d\Pi^* = \frac{\partial \Pi}{\partial s}\,\Delta s = -\kappa\,\frac{\partial \Pi}{\partial s}\left(\frac{\partial \Pi}{\partial s}\right)^{T}, \qquad \Delta s = -\kappa\left(\frac{\partial \Pi}{\partial s}\right)^{T}, \qquad \kappa > 0.$$

An estimate of the expected improvement is written as
$-\alpha\Pi^*$, α > 0, where α is usually about 10%. Then

$$\kappa = \frac{\alpha\,\Pi^*}{(\partial\Pi/\partial s)(\partial\Pi/\partial s)^{T}}. \tag{7}$$

In the case of nonfeasible decomposition a similar
equation may be obtained [3]:

$$\kappa = \frac{\alpha\,\Pi^*}{h^T h}. \tag{8}$$

Since ∂Π/∂s and h vanish at the optimum, Δs and Δρ
become singular there if (7) and (8) are used,
respectively; therefore these values of Δs and Δρ are
not appropriate for obtaining exact solutions.
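Equations (7) and (8) are simple enough to evaluate directly. The helper below is a hypothetical illustration (its name and the default α = 0.1 are assumptions): it returns κ from the current value Π* and either the row vector ∂Π/∂s (feasible case) or the residual h (nonfeasible case), and it raises an error when that vector vanishes, which is the singular situation at the optimum.

```python
from typing import Sequence


def kappa_from_expected_improvement(pi_star: float, direction: Sequence[float],
                                    alpha: float = 0.1) -> float:
    """kappa = alpha * Pi* / (d d^T), with d = dPi/ds (eq. 7) or d = h (eq. 8)."""
    norm_sq = sum(component * component for component in direction)
    if norm_sq == 0.0:
        # At the optimum d = 0, so (7)/(8) would give an unbounded step.
        raise ZeroDivisionError("direction vanished: kappa is singular at the optimum")
    return alpha * pi_star / norm_sq


kappa = kappa_from_expected_improvement(pi_star=10.0, direction=[3.0, 4.0])
# First-order predicted change: kappa * |d|^2 equals alpha * Pi* = 1.0
print(kappa, kappa * 25.0)
```

The design choice is that κ is rescaled at every second-level step so that the first-order predicted change always equals the requested fraction α of Π*, which is why the step blows up as the gradient tends to zero.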

There is also the possibility of applying a Newton–Raphson
controller at the second level, for both the feasible
and the nonfeasible method (cf. in this context [3, p. 173]).
For instance, examining (5) and (6), it is obvious that in
the nonfeasible decomposition method the only necessary
condition not satisfied by the subsystems is h = g − s = 0.
Thus the task of the Newton–Raphson method is to solve
h = 0 iteratively at the second level.
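A Newton–Raphson second level replaces the fixed gradient step by a Newton step on h(ρ) = 0. The sketch below is illustrative only: the toy subsystems (subsystem 1 minimizes x⁴/4 with coupling function g(x) = x, subsystem 2 minimizes ½(s − 2)²), the central-difference derivative and the tolerances are all assumptions, since in practice dh/dρ is rarely available in closed form. The coupled problem min x⁴/4 + ½(x − 2)² has its minimum at x = s = 1, with ρ⁰ = −1.

```python
import math
from typing import Tuple


def cbrt(value: float) -> float:
    """Real cube root, valid for negative arguments too."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)


def toy_first_level(rho: float) -> Tuple[float, float, float]:
    """Solve the two (hypothetical) subsystems for fixed rho; return x, s and h = g - s."""
    x = -cbrt(rho)   # subsystem 1: d/dx [x^4/4 + rho*x] = x^3 + rho = 0
    s = 2.0 + rho    # subsystem 2: d/ds [(s-2)^2/2 - rho*s] = s - 2 - rho = 0
    return x, s, x - s


def newton_controller(rho0: float, tol: float = 1e-12, eps: float = 1e-6,
                      max_iter: int = 50) -> Tuple[float, float, float]:
    """Second level: solve h(rho) = 0 by Newton-Raphson with a central-difference slope."""
    rho = rho0
    for _ in range(max_iter):
        x, s, h = toy_first_level(rho)
        if abs(h) < tol:
            return rho, x, s
        slope = (toy_first_level(rho + eps)[2] - toy_first_level(rho - eps)[2]) / (2.0 * eps)
        rho -= h / slope
    raise RuntimeError("Newton-Raphson controller did not converge")


rho, x, s = newton_controller(rho0=-0.5)
print(rho, x, s)
```

Compared with the gradient controller, no step constant has to be tuned, at the price of one extra pair of first-level solves per iteration for the derivative estimate.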

Note that the main characteristic of the aforementioned
methods, i.e. the decomposition into subsystems and the
separable optimization, applies also to nonsmooth convex
or nonconvex optimization problems.


Large Cable Structures 


Here a possibility offered in structural analysis by the
multilevel optimization algorithms is presented. Certain
subproblems do not contain inequalities, i.e. they are
bilateral, and thus they can be treated by the available
classical FEM programs (i.e. programs based only on
equalities).

In the majority of cable structures the number of ca- 
bles and nodes is large, and so an optimization problem 
with a large number of unknowns and constraints must 
be solved. Here, a multilevel optimization technique 
suitable for the solution of this kind of optimization 
problem is proposed. The initial optimization problem 
is decomposed into a number of subproblems. In the 
‘first level’ of the calculation, each subproblem is opti- 
mized separately, and in the ‘second level’ the solutions 
of these subproblems are combined to yield the overall 
optimum. 

It is interesting to note that some of these sub- 
problems constitute minimization problems without 
inequality constraints (corresponding to classical bilat- 
eral structures), and the algorithms for their numerical 
treatment are much faster. The initial problem is de- 
composed into two subproblems: the first involves only 
the displacement terms and corresponds to a structure 
resulting from the given one by considering that all 
the cables act as bars (capable of having compressive 
forces), and the second, including only the slackness 
terms, corresponds to a hypothetical slack structure. In 
order to perform the decomposition, the potential en- 
ergy of the structure is written in the form 


$$\Pi(u,v) = \Pi'(u) + \Pi''(v) + u^T G K_0 v, \tag{9}$$

where

$$\Pi'(u) = \tfrac{1}{2}u^T K u - u^T\big(G K_0 e_0 + p\big) \tag{10}$$

and

$$\Pi''(v) = \tfrac{1}{2}v^T K_0 v - v^T K_0 e_0. \tag{11}$$

In the above equations u, v, p, $e_0$ are the displacement,
slackness, loading and initial strain vectors respectively,
$K_0$ is the natural stiffness matrix, K is the stiffness
matrix of the assembled structure and G is the equilibrium
matrix. Introducing the variable w, the minimization
problem (9) takes the form

$$\min\ \Pi(u,v,w) = \Pi'(u) + \Pi''(v) + u^T G K_0 w \quad \text{s.t.}\quad v = w.$$

The Lagrangian of this problem is

$$\Pi_L(u,v,w) = \Pi(u,v,w) + \rho^T(v-w),$$

where ρ is the vector of Lagrange multipliers. The
decomposition can be performed by means of two
methods: the nonfeasible gradient controller method of
Lasdon and Schoeffler and the feasible gradient
controller method of Brosilow, Lasdon and Pearson [11].
In the nonfeasible gradient controller method the value
of ρ is assumed to be constant in the first level, say
$\rho_1$, and the minimization problem decomposes into the
two subproblems

$$\min_{u,w}\ \big\{\Pi'(u) + u^T G K_0 w - \rho_1^T w\big\}$$

and

$$\min_{v}\ \big\{\Pi''(v) + \rho_1^T v :\ v \ge 0\big\}.$$

After performing the optimization, values of u, v
and w, say $u_1$, $v_1$ and $w_1$, result. In general $v_1 \ne w_1$.
The task of the second level is to estimate a new
value of ρ, say $\rho_2$, by means of the equation

$$\rho_2 = \rho_1 + \kappa\,(v_1 - w_1), \qquad \kappa > 0,$$

where κ is a properly chosen constant (see, e.g., [11]),
and to transmit this value to the first level. The
optimization is performed again, new values $u_2$, $v_2$ and $w_2$
result, and so on, until the differences $v_i - w_i$ become
negligible. The algorithm converges in a finite number of
steps, provided that the minima exist [11].

In the feasible gradient controller method, the value
of w is taken as constant in the first level, say $w_1$, and
thus the initial problem decomposes into the two
subproblems

$$\min_{u}\ \big\{\Pi'(u) + u^T G K_0 w_1\big\}$$

and

$$\min_{v,\rho}\ \big\{\Pi''(v) + \rho^T(v - w_1) :\ v \ge 0\big\}.$$

As a result of the optimization, the values of u, v and ρ,
say $u_1$, $v_1$ and $\rho_1$, are calculated. By means of the second
level a new value of w, say $w_2$, is estimated and
transmitted to the first level. This value is given by the
equation

$$w_2 = w_1 - \kappa\left(\frac{d\Pi}{dw}\right)_{w=w_1}, \qquad \kappa > 0,$$

where κ is a properly chosen constant (see, e.g., [11]).
The optimization yields a new set of values $u_2$, $v_2$ and $\rho_2$,
and the procedure is continued until the difference
between consecutive values of the vector w becomes
sufficiently small.

For numerical applications the reader is referred
to [20].


Large Elastoplastic Structures 


We consider here the holonomic plasticity model [13]
(the extension to nonholonomic plasticity problems is
straightforward), described by the following equations:

$$e_E = F_0 s,$$

$$e = e_0 + e_E + e_P,$$

$$e_P = N\lambda, \qquad \phi = N^T s - k,$$

$$\lambda \ge 0, \qquad \phi \le 0, \qquad \phi^T\lambda = 0,$$

where $F_0$ is the natural flexibility matrix of the structure,
s the stress vector, e the respective strain vector consisting
of three parts, the initial strain $e_0$, the elastic strain $e_E$
and the plastic strain $e_P$, λ the vector of plastic multipliers,
φ the vector of yield functions, N the matrix of the gradients
of the yield functions with respect to the stresses and k
a vector of positive constants. The potential energy of
the structure is written in the form


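$$\Pi(u,\lambda) = \Pi'(u) + \Pi''(\lambda) - u^T G K_0 N\lambda,$$

where

$$\Pi'(u) = \tfrac{1}{2}u^T K u - e_0^T K_0 G^T u - p^T u,$$

$$\Pi''(\lambda) = \tfrac{1}{2}\lambda^T N^T K_0 N\lambda + e_0^T K_0 N\lambda + k^T\lambda.$$

Again, K is the stiffness matrix of the structure and $K_0$
is the inverse of $F_0$.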

The solution of the problem can be obtained by
minimizing the potential energy of the structure:

$$\min\ \{\Pi(u,\lambda) :\ \lambda \ge 0\}. \tag{12}$$

By introducing a new variable w, (12) takes the form

$$\min\ \{\Pi(u,\lambda,w) = \Pi'(u) + \Pi''(\lambda) - u^T G K_0 N w :\ w = \lambda,\ \lambda \ge 0\}. \tag{13}$$


As in the previous section, the decomposition can be
performed by either of the two methods, the feasible or
the nonfeasible gradient controller. For the sake of
brevity only the nonfeasible gradient method will be
shown here. The Lagrangian of (13) is first considered,

$$\Pi_L(u,\lambda,w) = \Pi(u,\lambda,w) + \rho^T(\lambda - w),$$

and the minimization problem is decomposed into the
following two subproblems:

$$\min_{u,w}\ \{\Pi'(u) - u^T G K_0 N w - \rho^T w\} \tag{14}$$

and

$$\min_{\lambda}\ \{\Pi''(\lambda) + \rho^T\lambda :\ \lambda \ge 0\}. \tag{15}$$

In the first step the value of ρ is assumed to be
constant (say $\rho_1$), and (14) and (15) yield the values
$u_1$, $\lambda_1$ and $w_1$. In general $\lambda_1 \ne w_1$. Then the
second-level controller estimates the new value of ρ
from the equation

$$\rho_2 = \rho_1 + \kappa(\lambda_1 - w_1), \qquad \kappa > 0,$$

and transmits it to the first level; the procedure is
continued until the differences $\lambda_i - w_i$ become
appropriately small.

The same procedure can be applied also to holonomic
models including hardening and to nonholonomic
plasticity models [13].
Validation and Improvements
of Simplified Models 


In mechanics and the engineering sciences, as well as in
economics, simplified models are often considered for
the treatment of complicated problems, e.g. the
calculation of stresses in complex structures. In
these models it is assumed that certain quantities do
not considerably influence the solution of the problem.
By means of multilevel decomposition, a method
which permits the validation of these models and the
improvement of their accuracy can be developed. This
idea is explained in the sequel.

A. Consider a large structure involving also some
cables, and assume that, owing to the pretension of the
cables, the structure is calculated as if the cables were
rods, i.e. ignoring the fact that a cable may become
slack and then carries zero stress. Then in
equations (9)-(11) v = 0, and the solution of the minimum
problem is obtained by solving an unconstrained
minimization problem, i.e. by a linear system solver.
In order to check whether the solution of the simplified
model is close to the solution of the initial problem,
in which some cables, say r of them, may become slack,
i.e. $v_i > 0$, i = 1, ..., r, it is enough to verify whether
the second-level controller, which gives a value of the
slackness of the cables, causes a significant change in
the solution of the first-level problem which corresponds
to the simplified structure. The algorithm also
offers an improvement of the solution of the simplified
model.

B. Here the investigation of the mutual influence of
two subsystems is presented. Consider two substructures
connected together, for instance a cylindrical shell
with a hemispherical shell covering one end of the
cylinder. The solution of the whole linear elastic
structural compound minimizes, for a given external
loading, the potential (or the complementary) energy of
the whole structure. Let $x_1$ (respectively, $x_2$) be the
variables of the cylindrical (respectively, the hemispherical)
shell and let z be the variables at the contact line,
which are common to both structures. In order to
decompose the potential energy into two minimum
problems, one containing the unknowns of the cylindrical
shell and the other those of the hemispherical shell, the
common variables of the cylindrical (respectively,
hemispherical) shell are denoted by $z_1$ (respectively, $z_2$),
and thus the initial problem

$$\min_{x_1,x_2,z}\ \{\Pi(x_1,x_2,z) = \Pi_1(x_1,z) + \Pi_2(x_2,z)\}$$

is written as

$$\min_{x_1,x_2,z_1,z_2}\ \{\Pi_1(x_1,z_1) + \Pi_2(x_2,z_2) :\ z_1 - z_2 = 0\}.$$

Here $\Pi_1$ (respectively, $\Pi_2$) denotes the potential or the
complementary energy of the cylindrical (respectively,
the hemispherical) shell. Thus the nonfeasible controller
method can be used to test how the difference $z_1 - z_2$
influences the solution of the problem. The procedure
is similar in the case of elastoplastic structures, with
the difference that the minimum is constrained by
inequalities.

The above procedure may find applications in estimating
the influence of saddles on pipelines, of rigidity
rings on long tubes, etc.

C. Note that in all the above cases the Lagrange
multipliers have a precise meaning: they correspond, in
the sense of energy, to the chosen coordinating variables,
i.e., if the coordinating variables are stresses
(respectively, strains) or forces (respectively, displacements),
then the coordinating Lagrange multipliers are strains
(respectively, stresses) or displacements (respectively,
forces). Thus the feasible and the nonfeasible decomposition
methods have a precise mechanical meaning.
In the nonfeasible method the Lagrange multipliers,
i.e. the strains (respectively, the stresses), are controlled,
while in the feasible method the coordinating variables,
i.e. the stresses (respectively, the strains) of the links
between the two substructures, are controlled, in order
to achieve the equilibrium position of the whole structure.

D. Some of the resulting substructures may have
a known analytical solution. This facilitates the
calculation and may be used as a test of the accuracy
of the solution obtained via a numerical technique,
e.g. an FEM model. The procedure is described in [24].

E. The multilevel decomposition method can also be used
as an estimator of the sensitivity of the final solution
to small changes of the system to be optimized [24].
This method may be used, for example, to estimate
how a partial change in a structure influences the stress
and strain field of the structure without solving the
structure twice.


Decomposition Algorithms 
for Nonconvex Minimization Problems 


In unilateral contact problems with friction,
Panagiotopoulos proposed in 1975 an algorithm [18], later
called the PANA-algorithm, for the decomposition of
quasivariational inequality problems into two classical
variational inequality problems which are equivalent
to two minimization problems. Analogous decomposition
methods, based on a fixed point procedure analogous
to that of [18], can be applied today to the treatment
of much more complicated problems involving nonconvex
energy functions. This section is devoted to the study of
multilevel decomposition algorithms for problems belonging
to the general framework of substationarity problems.

It is known that the equilibrium of an elastic body
Ω in adhesive contact with a support Γ is governed by
the following problem [17,21]: find u ∈ V such as to
satisfy the hemivariational inequality

$$a(u, v-u) + \int_{\Gamma} j_N^0(u_N, v_N - u_N)\,d\Gamma + \int_{\Gamma} j_T^0(u_T, v_T - u_T)\,d\Gamma \ge (f, v-u), \quad \forall v \in V. \tag{16}$$

Here u, v are the displacement fields, f are all the
applied forces, (f, v), usually an $L^2$ inner product,
is the work of the applied forces, a(u, v) is the bilinear
form of the elastic strain energy, which is usually coercive,
and $j_N$ (respectively, $j_T$) denote the nonconvex, locally
Lipschitz, generally nonsmooth energy density functions of
the adhesive forces in the normal (respectively, the tangential)
direction to the interface Γ. It is assumed that the
normal adhesive action is independent of the tangential
adhesive action. Moreover, $j_N^0$, $j_T^0$ denote the directional
derivatives in the sense of Clarke [7], and $u_N$, $v_N$
(respectively, $u_T$, $v_T$) denote the normal (respectively,
tangential) components of the displacements with respect to Γ.
The solution of the above problem can be obtained in
most cases of practical interest (cf. [21]), under certain
mild hypotheses which guarantee this equivalence, by
solving the substationarity problem

$$0 \in \partial\Pi(u) = \partial\Big\{\tfrac{1}{2}a(u,u) + \int_{\Gamma} j_N(u_N)\,d\Gamma + \int_{\Gamma} j_T(u_T)\,d\Gamma - (f, u)\Big\},$$

where ∂ denotes the generalized gradient of Clarke.

In engineering problems the nonconvex superpotentials
(cf. e.g. [16]) $j_N$ and $j_T$ are not independent:
$j_N$ (respectively, $j_T$) depends on the vector $S_T$
(respectively, $S_N$), where $S_T$, $S_N$ are the reactions
corresponding to $u_T$, $u_N$ respectively. In this case a
hemivariational inequality cannot be formulated. In order
to solve this problem numerically one may apply the
following procedure. In the first step it is assumed that
$S_N$ is given, say $S_N^{(0)}$, and the problem ($S_N^{(0)}$ enters with
its work into (f, u))

$$0 \in \partial\Big\{\tfrac{1}{2}a(u,u) + \int_{\Gamma} j_T\big(S_N^{(0)}, u_T\big)\,d\Gamma - (f, u)\Big\} \tag{17}$$

is solved. This problem yields a value of $S_T$, say
$S_T^{(1)}$. Then the problem ($S_T^{(1)}$ enters with its work into (f, u))

$$0 \in \partial\Big\{\tfrac{1}{2}a(u,u) + \int_{\Gamma} j_N\big(S_T^{(1)}, u_N\big)\,d\Gamma - (f, u)\Big\} \tag{18}$$

is solved, yielding a new value of $S_N$, say $S_N^{(1)}$, and so
on, until the differences $\|S_N^{(i)} - S_N^{(i-1)}\|$ and
$\|S_T^{(i)} - S_T^{(i-1)}\|$ at each point of the discretized
interface Γ become appropriately small. Here ‖·‖ denotes
the Euclidean norm, because the values are checked
pointwise. The first (respectively, second) problem, with
$j_N = 0$ (respectively, with $j_T = 0$), corresponds to the
first level (respectively, to the second level). Applications
of the above procedure can be found in [15,20,21].
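The alternation between (17) and (18) is a fixed-point iteration on the interface reactions. In the sketch below the two level solvers are replaced by hypothetical pointwise reaction maps (simple affine stand-ins, chosen only so that the iteration is contractive), because actually solving (17) and (18) requires a nonsmooth FEM solver; the stopping test mirrors the pointwise check on both reaction differences described above. All names and numbers are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Reactions = List[float]  # one scalar reaction per node of the discretized interface


def alternate_levels(solve_tangential: Callable[[Reactions], Reactions],
                     solve_normal: Callable[[Reactions], Reactions],
                     s_n0: Reactions, tol: float = 1e-10,
                     max_iter: int = 200) -> Tuple[Reactions, Reactions, int]:
    """Alternate problem (17) (given S_N) and problem (18) (given S_T) until both settle."""
    s_n = list(s_n0)
    s_t = solve_tangential(s_n)               # (17) with S_N^(0) gives S_T^(1)
    for iteration in range(1, max_iter + 1):
        s_n_new = solve_normal(s_t)           # (18) with S_T^(i) gives S_N^(i)
        s_t_new = solve_tangential(s_n_new)   # (17) with S_N^(i) gives S_T^(i+1)
        dn = max(abs(a - b) for a, b in zip(s_n_new, s_n))
        dt = max(abs(a - b) for a, b in zip(s_t_new, s_t))
        s_n, s_t = s_n_new, s_t_new
        if dn < tol and dt < tol:             # pointwise check at every interface node
            return s_n, s_t, iteration
    raise RuntimeError("level alternation did not converge")


# Hypothetical stand-ins for the two level solvers (affine, contractive when composed).
def tangential_level(s_n: Reactions) -> Reactions:
    return [0.3 * value + 1.0 for value in s_n]


def normal_level(s_t: Reactions) -> Reactions:
    return [2.0 - 0.5 * value for value in s_t]


s_n, s_t, iterations = alternate_levels(tangential_level, normal_level, [0.0, 1.0, 5.0])
print(s_n, s_t, iterations)
```

With these stand-ins the composed map S_N ↦ 1.5 − 0.15 S_N is a contraction, so every node converges to S_N = 1.5/1.15 and S_T = 1.6/1.15; with genuine nonmonotone interface laws convergence is not guaranteed in general and has to be checked.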


Structures with Fractal Interfaces 


In this section attention is focused on the fractal
geometry of interfaces whose behavior is modeled
by means of an appropriate nonmonotone contact and
friction mechanism. The interfaces of fractal geometry
are analyzed here as a sequence of classical interface
subproblems. These classical subproblems result from
considering the fractal interface as the unique
‘fixed point’ of a given iterated function system (IFS),
which consists of N contractive mappings $w_i: \mathbb{R}^2 \to \mathbb{R}^2$
with contractivity factors $0 < s_i < 1$, i = 1, ..., N [2].
According to this procedure, a fractal set A is the ‘fixed
point’ of a transformation W, i.e.

$$A = W(A) = \bigcup_{i=1}^{N} W_i(A),$$

where $W_i$ is defined by

$$W_i(B) = \{w_i(x):\ x \in B\}, \quad \forall B \in H(\mathbb{R}^2).$$

Generally a fractal set A is given by the relation

$$A = \lim_{n\to\infty} W^{n}(B), \quad \forall B \in H(\mathbb{R}^2),$$

where $H(\mathbb{R}^2)$ is the space of all nonempty compact subsets
of $\mathbb{R}^2$. Thus each level corresponds to a classical geometry
approximating the fractal geometry. Within each
level a new optimization problem is solved with the new
data. Thus the multilevel character of the optimization
problem results from the necessity to take the fractal
geometry into account.
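The limit A = lim Wⁿ(B) can be illustrated with the simplest IFS, the middle-thirds Cantor set on the real line. This is a one-dimensional stand-in for the planar interfaces discussed here, chosen only because its iterates are easy to check: w₁(x) = x/3, w₂(x) = x/3 + 2/3, starting from B = [0, 1]. Each iterate Wⁿ(B) is a finite union of intervals, i.e. a classical geometry at level n. The helper names are hypothetical, and exact rational arithmetic is used so the intervals can be compared exactly.

```python
from fractions import Fraction
from typing import List, Tuple

Interval = Tuple[Fraction, Fraction]
AffineMap = Tuple[Fraction, Fraction]  # x -> scale * x + shift, with 0 < scale < 1


def apply_hutchinson(maps: List[AffineMap], intervals: List[Interval]) -> List[Interval]:
    """One application of W(B) = union of w_i(B); endpoints stay ordered since scale > 0."""
    images = [(scale * a + shift, scale * b + shift)
              for (a, b) in intervals for (scale, shift) in maps]
    return sorted(images)


def approximate_attractor(maps: List[AffineMap], start: Interval, n: int) -> List[Interval]:
    """Return W^n(B) for B = start: the level-n classical approximation of the fractal."""
    current = [start]
    for _ in range(n):
        current = apply_hutchinson(maps, current)
    return current


cantor_maps = [(Fraction(1, 3), Fraction(0)), (Fraction(1, 3), Fraction(2, 3))]
level_2 = approximate_attractor(cantor_maps, (Fraction(0), Fraction(1)), 2)
print(level_2)
```

Level n consists of 2ⁿ intervals of length 3⁻ⁿ, so its total length (2/3)ⁿ tends to zero as the iterates approach the attractor; in the mechanical setting each such level would carry its own classical contact problem.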

In the sequel a linear elastic structure occupying
a subset Ω of $\mathbb{R}^2$ is considered. In its undeformed state
the structure has a boundary Γ which is decomposed
into two mutually disjoint parts $\Gamma_U$ and $\Gamma_F$. It is
assumed that on $\Gamma_U$ (respectively, $\Gamma_F$) the displacements
(respectively, the tractions) are given. In the structure
Ω some cracks with interfaces Φ of fractal type are
formed. Such cracks in brittle materials frequently
propagate along one or more irregular paths. In this case
the fracture system may be considered to be a cluster of
branches propagating in such a way that new branches
at step n + 1 are successively created from a former
branch at step n. In other words, the fracture
system can be modeled by an IFS procedure. Regarding
the boundary conditions on Φ, it is assumed that
nonmonotone, possibly multivalued laws describe the
behavior of each interface in the normal and tangential
directions. More specifically, it is assumed that the
following boundary conditions hold:

$$-S_N \in \partial j_N(u_N, x), \qquad -S_T \in \partial j_T(u_T, x).$$

Then, according to the previous section, an equilibrium
position of Ω is characterized by the hemivariational
inequality (16).

In this case, where the fractured body Ω with fractal
interfaces Φ is studied, Γ must be replaced by Φ in (16).
As mentioned above, Φ is the fixed point of a given
transformation denoted by W, i.e.

$$\Phi = W(\Phi), \qquad \Phi^{(n+1)} = W\big(\Phi^{(n)}\big), \qquad \Phi^{(n)} \to \Phi.$$

Thus, for each approximation $\Phi^{(n)}$ of the fractal interface
Φ a structure $\Omega^{(n)}$ must be solved. Since $\Phi^{(n)}$ is an
interface set with classical geometry, the solutions $u^{(n)}$
and $\sigma^{(n)}$ (the corresponding displacement and stress
fields) are obtained using numerical procedures for the
solution of (17) and (18). This procedure is repeated
several times with increasing n; in the limit n → ∞,
$u^{(n)}$ and $\sigma^{(n)}$ give the solution of the fractal interface
problem.
It is well known that, on the one hand, combinatorial
optimization (CO) provides a powerful tool to formulate
and model many optimization problems and, on the
other hand, a multi-objective (MO) approach is often
a realistic and efficient way to treat many real-world
applications. Nevertheless, until recently, multi-objective
combinatorial optimization (MOCO) did not receive
much attention in spite of its potential applications.
One of the reasons is probably the specific difficulties
of MOCO models. We can distinguish three main
difficulties. The first two are the same as those
existing for the multi-objective integer linear programming
(MOILP) problem (cf. » Multi-objective Integer Linear
Programming), i.e.
• the number of efficient solutions may be very large;
• the nonconvex character of the feasible set requires
devising specific techniques to generate the so-called
‘nonsupported’ efficient solutions (cf. » Multi-objective
Integer Linear Programming).
A particular single-objective CO problem is characterized by
some specific features, generally a special form of the
constraints; the existing methods for such problems use
these features to define efficient ways of obtaining an
optimal solution. For MOCO problems it appears
interesting to do the same to obtain the set of
efficient solutions. Consequently, and contrary to what
is often done in MOLP and MOILP methods, a third
difficulty is to elaborate methods that avoid introducing
additional constraints, so that the particular form of the
constraints is preserved throughout the procedure.
The general form of a MOCO problem is

$$(\mathrm{P})\qquad \text{“min”}\ z_k(X) = c_k X,\quad k = 1,\dots,K, \qquad X \in S = D \cap B^n,\quad X\ (n\times 1),\quad B = \{0,1\},$$

and D is a specific polytope characterizing the CO problem:
assignment problem, knapsack problem, traveling
salesman problem, etc.

There exist several surveys on MOCO; some are
devoted to specific problems (i.e., to a particular form
of D): the shortest path problem [8], transportation
networks [2], and the scheduling problem [6,7]; the survey
[9] is more general, examining successively the literature
on MO assignment problems, knapsack problems,
network flow problems, traveling salesman problems,
location problems and set covering problems.


In the present article we focus on the existing
methodologies for MOCO. First we examine how to
determine the set E(P) of all the efficient solutions,
distinguishing three approaches: direct methods,
two-phase methods and heuristic methods. Subsequently
we analyse interactive approaches to generate
a ‘good compromise’ satisfying the decision maker.


Generation of E(P) 
Direct Methods 


The first idea is to make intensive use of the classical
methods existing in the literature for the single-objective
problem (P) to determine E(P). Of course, each time a
feasible solution is obtained, the K values $z_k(X)$ are
calculated and compared with the list $\hat E(P)$ containing all
the feasible solutions already obtained that are not
dominated by another generated feasible solution.
Clearly, $\hat E(P)$, called the set of potential efficient solutions,
plays the role of the so-called ‘incumbent solution’ in
single-objective methods. At each step $\hat E(P)$ is updated,
and at the end of the procedure $\hat E(P) = E(P)$. Such an
extension of single-objective methods is especially designed
for enumerative procedures based on a branch and bound
approach. Unfortunately, in an MO framework a node of the
branch and bound tree is fathomed less often than in the
single-objective case, so that such an MO procedure is
naturally less efficient.
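Maintaining the list Ê(P) of potential efficient values is the multi-objective analogue of updating an incumbent. A minimal sketch for the ‘min’ form of (P), working on criteria vectors only (a design choice of this sketch: solutions with identical criteria vectors are stored once); the function names are hypothetical.

```python
from typing import List, Tuple

Criteria = Tuple[float, ...]


def dominates(a: Criteria, b: Criteria) -> bool:
    """For a 'min' problem: a is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_potential_efficient(e_hat: List[Criteria], z: Criteria) -> List[Criteria]:
    """Insert the criteria vector z of a newly generated feasible solution into E_hat(P)."""
    if any(e == z or dominates(e, z) for e in e_hat):
        return e_hat                     # z brings nothing new
    return [e for e in e_hat if not dominates(z, e)] + [z]


e_hat: List[Criteria] = []
for z in [(3, 5), (4, 4), (2, 6), (3, 4), (5, 5)]:
    e_hat = update_potential_efficient(e_hat, z)
print(e_hat)
```

In this run (3, 4) removes both (3, 5) and (4, 4) when it arrives, and (5, 5) is rejected, leaving (2, 6) and (3, 4) as the potential efficient values.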

We describe below an example of such a direct
method, extending the well-known Martello–Toth
procedure, for the multi-objective knapsack problem
formulated as

$$\max\ z_k(X) = \sum_{j=1}^{n} c_j^{(k)} x_j,\quad k = 1,\dots,K, \qquad \sum_{j=1}^{n} w_j x_j \le W, \qquad x_j \in \{0,1\}.$$

The following typical definitions are used (k = 1, ..., K):

• $O_k$: the order of the variables according to decreasing values of $c_j^{(k)}/w_j$;

• $r_j^{(k)}$: the rank of variable j in order $O_k$;

• $\bar O$: the order of the variables according to increasing values of $\sum_{k=1}^{K} r_j^{(k)}$.

We assume that the variables are indexed according to the
ordinal preference $\bar O$.
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At any node of the branch and bound tree, variables 
are set to 0 or 1; let By and B, denote the index sets of 
variables assigned to the values 0 and 1, respectively. Let 
F be the index set of free variables which always follow, 
in the order ©, those belonging to B, U Bo. If i — 1 is 
the last index of fixed variables, we have B; U Bo = { 1, 
voi rl} F={i...,n}. 
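As an illustration of the definitions above, the orders $O_k$, the ranks $r_j^{(k)}$ and the preference order $\bar{O}$ could be computed as in the following Python sketch. It assumes positive weights and 0-based indices; ties are broken by the original index (Python's `sorted` is stable), a detail the text does not specify. The function name is hypothetical.

```python
def preference_order(c, w):
    """Return the item indices (0-based) in the preference order O-bar.

    c[k][j] is the profit of item j in objective k and w[j] > 0 its weight.
    For each k, O_k sorts items by decreasing c[k][j] / w[j]; the rank r_j^(k)
    is the position of j in O_k (1 = first). O-bar sorts items by increasing
    sum of ranks; ties keep the original index order (sorted() is stable).
    """
    n = len(w)
    rank_sum = [0] * n
    for ck in c:
        order_k = sorted(range(n), key=lambda j: -ck[j] / w[j])  # O_k
        for rank, j in enumerate(order_k, start=1):
            rank_sum[j] += rank
    return sorted(range(n), key=lambda j: rank_sum[j])  # O-bar
```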

Initially, i = 1. Let

• $\bar{W} = W - \sum_{j \in B_1} w_j \ge 0$ be the leftover capacity of the knapsack;
• $\underline{z} = (\underline{z}_k)$, with $\underline{z}_k = \sum_{j \in B_1} c_j^{(k)}$, be the vector of criteria values obtained with the variables already fixed. $\hat{E}(P)$ contains the nondominated feasible values $\underline{z}$ and is updated at each new step. Initially, $\underline{z}_k = 0$ for all k, and $\hat{E}(P) = \emptyset$;
• $\bar{Z} = (\bar{Z}_k)$ be the vector whose components are upper bounds of the feasible values of each objective at the node considered. These upper bounds are evaluated separately, for instance as in the Martello-Toth method. Initially, $\bar{Z}_k = \infty$ for all k.

A node is fathomed in the following two situations:

i) if $\{j \in F : w_j \le \bar{W}\} = \emptyset$; or
ii) if $\bar{Z}$ is dominated by some $z^* \in \hat{E}(P)$.

When the node is fathomed, the backtracking procedure is performed: a new node is built by setting to zero the variable corresponding to the last index in $B_1$. Let t be this index:

$$B_1 \leftarrow B_1 \setminus \{t\}, \qquad B_0 \leftarrow (B_0 \cap \{1, \dots, t-1\}) \cup \{t\}, \qquad F \leftarrow \{t+1, \dots, n\}.$$
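When the node is not fathomed, a new node of the branch and bound tree is built for the next iteration as follows:

• Define s as the index of the variable such that
$$s = \max \Big\{ l \in F : \sum_{j=i}^{l} w_j \le \bar{W} \Big\}.$$
If $w_i > \bar{W}$, set $s = i - 1$.
• If $s \ge i$:
$$B_1 \leftarrow B_1 \cup \{i, \dots, s\}, \qquad B_0 \leftarrow B_0, \qquad F \leftarrow F \setminus \{i, \dots, s\}.$$
• If $s = i - 1$:
$$B_1 \leftarrow B_1 \cup \{r\}, \qquad B_0 \leftarrow B_0 \cup \{i, \dots, r-1\}, \qquad F \leftarrow F \setminus \{i, \dots, r\},$$
with $r = \min\{j \in F : w_j \le \bar{W}\}$.

The procedure stops when the initial node is fathomed, and then $\hat{E}(P) = E(P)$. An illustration is given in [10].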
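The following Python sketch enumerates E(P) for a small multi-objective knapsack instance using the two fathoming tests described above. It is a simplified, recursive depth-first variant, not the exact iterative procedure of the source: it branches on one variable at a time ($x_i = 1$ first, then $x_i = 0$) instead of fixing the block $\{i, \dots, s\}$ at once, and, as an assumption, it uses the LP-relaxation (Dantzig) bound of each objective in place of the Martello-Toth bound. Positive weights and profits are assumed, and all names are hypothetical.

```python
def dantzig_bound(profits, w, free, cap):
    """LP-relaxation upper bound of a single-objective knapsack over the free items."""
    bound = 0.0
    for j in sorted(free, key=lambda j: -profits[j] / w[j]):
        if w[j] <= cap:
            bound += profits[j]
            cap -= w[j]
        else:
            bound += profits[j] * cap / w[j]  # fractional critical item
            break
    return bound


def mo_knapsack(c, w, W):
    """Efficient solutions of max (sum_j c[k][j] x_j)_k s.t. sum_j w[j] x_j <= W,
    x binary. Returns a sorted list of (chosen item indices, criterion vector)."""
    n, K = len(w), len(c)
    pe = []  # potential efficient set E^(P)

    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and a != b

    def visit(i, chosen, cap, z):
        free = range(i, n)  # F = {i, ..., n-1}; cap plays the role of W-bar
        # Fathoming test i): no free variable fits in the leftover capacity,
        # so z is the criterion vector of a complete feasible solution.
        if not any(w[j] <= cap for j in free):
            if not any(dominates(other, z) for _, other in pe):
                pe[:] = [(s, o) for s, o in pe if not dominates(z, o)]
                pe.append((tuple(chosen), z))
            return
        # Fathoming test ii): the upper-bound vector Z-bar is dominated.
        ub = tuple(z[k] + dantzig_bound(c[k], w, free, cap) for k in range(K))
        if any(dominates(other, ub) for _, other in pe):
            return
        if w[i] <= cap:  # branch x_i = 1 first ...
            visit(i + 1, chosen + [i], cap - w[i],
                  tuple(z[k] + c[k][i] for k in range(K)))
        visit(i + 1, chosen, cap, z)  # ... then x_i = 0

    visit(0, [], W, (0,) * K)
    return sorted(pe)
```

For instance, `mo_knapsack([[6, 3, 1], [1, 3, 6]], [1, 1, 1], 1)` returns the three single-item solutions, each of which is efficient. Pruning with ii) is safe because every solution in the subtree has a criterion vector bounded by $\bar{Z}$, hence is dominated by the stored $z^*$.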


Two-Phase Method 


Such an approach is particularly well suited to bi-objective MOCO problems. The first phase consists in determining the set SE(P) of supported efficient solutions (see ▶ Multi-objective Integer Linear Programming). Let $S \cup S'$ be the list of supported efficient solutions already generated; S is initialized with the two efficient optimal solutions of objectives $z_1$ and $z_2$, respectively. The solutions of S are ordered by increasing value of criterion 1; let $X^r$ and $X^s$ be two consecutive solutions in S, thus with $z_{1r} < z_{1s}$ and $z_{2r} > z_{2s}$, where $z_{kl} = z_k(X^l)$. The following single-criterion problem is considered:

$$(P_\lambda) \qquad \min_{X \in S = D \cap B^n} z_\lambda(X) = \lambda_1 z_1(X) + \lambda_2 z_2(X), \qquad \lambda_1 \ge 0, \ \lambda_2 \ge 0.$$

This problem is optimized with a classical single-objective CO algorithm for the values $\lambda_1 = z_{2r} - z_{2s}$ and $\lambda_2 = z_{1s} - z_{1r}$. With these values $z_\lambda(X^r) = z_\lambda(X^s)$, so that in the objective space the level lines of $z_\lambda$ are parallel to the line defined by $Z_r$ and $Z_s$. Let $\{X^t : t = 1, \dots, T\}$ be the set of optimal solutions obtained in this manner and $\{Z_t : t = 1, \dots, T\}$ their images in the objective space. There are two possible cases:

• $\{Z_r, Z_s\} \cap \{Z_t : t = 1, \dots, T\} = \emptyset$: the solutions $X^t$ are new supported efficient solutions. $X^1$ and, if $T > 1$, $X^T$ are put in S and, if $T > 2$, $X^2, \dots, X^{T-1}$ are put in S′. At further steps it will be necessary to consider the pairs $(X^r, X^1)$ and $(X^T, X^s)$.
• $\{Z_r, Z_s\} \subset \{Z_t : t = 1, \dots, T\}$: the solutions $\{X^t : t = 1, \dots, T\} \setminus \{X^r, X^s\}$ are new supported efficient solutions giving the same optimal value of $z_\lambda(X)$ as $X^r$ and $X^s$; we put them in list S′.

Multi-objective Combinatorial Optimization, Figure 1: SE(P) = S ∪ S′ (• solutions in S, + solutions in S′)

This first phase is continued until all pairs $(X^r, X^s)$ of S have been examined without extension of S. Finally, we obtain SE(P) = S ∪ S′, as illustrated in Fig. 1.
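The first phase can be sketched as a dichotomic search. In this hypothetical Python illustration, the feasible solutions are given by an explicit list of their integer criterion vectors, both objectives are minimized, enumeration replaces the classical single-objective CO algorithm, and solutions sharing the same image are represented once. The optimal images are sorted by increasing $z_1$, so `opt[0]` and `opt[-1]` play the roles of $X^1$ and $X^T$.

```python
def phase_one(points):
    """First phase (dichotomic search) for a bi-objective minimization problem.

    points: iterable of integer criterion vectors (z1, z2) of the feasible
    solutions. Returns the sets (S, S_prime) of supported images."""
    points = set(points)

    def solve(l1, l2):
        # All optimal images of min l1*z1 + l2*z2, sorted by increasing z1.
        best = min(l1 * a + l2 * b for a, b in points)
        return sorted(p for p in points if l1 * p[0] + l2 * p[1] == best)

    # S starts with the (lexicographic) efficient optima of z1 and of z2.
    first = min(points, key=lambda p: (p[0], p[1]))
    last = min(points, key=lambda p: (p[1], p[0]))
    S, S_prime = {first, last}, set()
    pairs = [(first, last)] if first != last else []
    while pairs:
        r, s = pairs.pop()
        l1, l2 = r[1] - s[1], s[0] - r[0]  # lambda_1, lambda_2 of the text
        opt = solve(l1, l2)
        if r in opt:
            # Z_r and Z_s are optimal: the remaining optima go to S'.
            S_prime.update(p for p in opt if p not in (r, s))
        else:
            # New supported images: the extreme ones extend S, the others go to S'.
            S.update((opt[0], opt[-1]))
            S_prime.update(opt[1:-1])
            pairs += [(r, opt[0]), (opt[-1], s)]
    return S, S_prime
```

For the images (1, 9), (2, 6), (3, 5), (4, 4), (6, 3), (9, 1) plus some dominated points, this returns S = {(1, 9), (2, 6), (4, 4), (9, 1)} and S′ = {(3, 5)}; the efficient image (6, 3) lies above the segment joining (4, 4) and (9, 1) and is left to the second phase.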

The purpose of the second phase is to generate the set NSE(P) = E(P) \ SE(P) of nonsupported efficient solutions. Each nonsupported efficient solution has its image inside the triangle $\Delta Z_r Z_s$ determined by two successive solutions $X^r$ and $X^s$ of SE(P) (see Fig. 1), so each of the |SE(P)| − 1 triangles $\Delta Z_r Z_s$ is analysed in turn. This phase is more difficult to manage and depends on the particular MOCO problem analysed; in general, it is achieved partly by using a classical single-objective CO method. An example of such a second phase is given in ▶ Bi-objective Assignment Problem and in [14] for the bi-objective knapsack problem.


Heuristic Methods 


As pointed out in [9,10,14], it is unrealistic to extend the exact methods described above to MOCO problems with more than two criteria or more than a few hundred variables, because these methods are too time-consuming. Because metaheuristics such as simulated annealing (SA), tabu search (TS) and genetic algorithms (GA) provide, for single-objective problems, excellent solutions in a reasonable time, it appeared logical to try to adapt these metaheuristics to a multi-objective framework.


The seminal work in this direction is the 1993 Ph.D. thesis of E.L. Ulungu, which gave rise to the so-called MOSA method to approximate E(P) (see, in particular, [11]). After this pioneering study, this direction has been taken up by other research teams: P. Czyzak and A. Jaszkiewicz [3] proposed another way to adapt simulated annealing to a MOCO problem; independently, [4,5] and [1] did the same with tabu search, the latter also combining tabu search and genetic algorithms; genetic algorithms are also used in [13].

The principal idea of the MOSA method can be summarized briefly. One begins with an initial iterate $X_0$ and initializes the set PE of potentially efficient points to contain just $X_0$. One then samples a point Y in the neighborhood of the current iterate. But instead of accepting Y if it is better than the current iterate on an objective, we now accept it if it is not dominated by any of the points currently in the set PE. If it is not dominated, we make Y the current iterate, add it to PE, and throw out any points in PE that are dominated by Y. On the other hand, if Y is dominated, we still make it the current iterate with some probability. In this way, as we move the iterate through the space, we simultaneously build up a set PE of potentially efficient points. The only complicated aspect of this scheme is the method for computing the acceptance probability for Y when it is dominated by a point in PE. The MOSA method is described in detail in [11] and in ▶ Bi-objective Assignment Problem.
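A bare-bones Python sketch of the loop just described follows, assuming maximization, hashable solutions and a user-supplied neighborhood function. The acceptance probability for a dominated Y is exactly where the actual MOSA method is more elaborate (see [11]); here, as a stated simplification, it is the usual simulated-annealing rule $\exp(-\Delta/T)$ applied to a weighted-sum deterioration $\Delta$ with fixed weights, with an assumed geometric cooling schedule. All names and parameter values are hypothetical.

```python
import math
import random


def dominates(a, b):
    """True if vector a dominates b when all objectives are maximized."""
    return all(x >= y for x, y in zip(a, b)) and a != b


def mosa_sketch(x0, objectives, neighbor, weights,
                n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Return a dict {solution: criterion vector} approximating E(P)."""
    rng = random.Random(seed)

    def evaluate(x):
        return tuple(f(x) for f in objectives)

    current = x0
    pe = {x0: evaluate(x0)}  # set PE of potentially efficient points
    temp = t0
    for _ in range(n_iter):
        y = neighbor(current, rng)
        zy = evaluate(y)
        if not any(dominates(z, zy) for z in pe.values()):
            # Y is not dominated: keep it and drop the points it dominates.
            pe = {x: z for x, z in pe.items() if not dominates(zy, z)}
            pe[y] = zy
            current = y
        else:
            # Simplified acceptance rule (assumption): weighted-sum deterioration.
            zc = evaluate(current)
            delta = sum(w * (a - b) for w, a, b in zip(weights, zc, zy))
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current = y
        temp *= cooling
    return pe
```

Whatever acceptance rule is used, PE only ever holds mutually nondominated points, which is the property the loop above is built around.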


Interactive Determination of a Good Compromise 


The general idea of interactive methods is described in ▶ Multi-objective Integer Linear Programming. Two types of methods can be distinguished; we treat them in the following subsections.


Goal Programming 


As pointed out in [9], this methodology is often used by American researchers to treat case studies. The general idea of the goal programming method is to introduce for each objective k deviation variables $d_k^+$ and $d_k^-$, by excess and by default respectively, with respect to an a priori goal $g_k$, so that goal constraints are defined. If priorities expressed by weights $p_k$ are given, this results in a single-objective problem $(P_g)$ defined by the global weighted deviation function:

$$(P_g) \qquad \min \sum_{k=1}^{K} p_k \left(d_k^+ + d_k^-\right) \quad \text{s.t.} \quad z_k(X) + d_k^- - d_k^+ = g_k, \quad d_k^+, d_k^- \ge 0, \ \forall k, \qquad X \in S = D \cap B^n.$$

When a solution is obtained, the decision maker can modify the values of the goals $g_k$ before a new iteration is performed. One drawback is that the additional goal constraints destroy the particular structure of the initial CO problem, so that general ILP software must be used to solve problem $(P_g)$.


Interactive Two-Phase Methods and MOSA Method 


The two-phase methodology described above can easily be adapted to build a good compromise interactively. At each step of the first phase, the decision maker can indicate which pair $(X^r, X^s)$ he prefers, so that only a small subset of SE(P) is generated, in the direction given by the decision maker; in the second phase, only one (or a few) triangles $\Delta Z_r Z_s$ are analysed to verify whether they contain a more satisfying nonsupported efficient solution. In the same spirit, an interactive MOSA method can be designed (see also [12]): the decision maker gives some goals $g_k$, and only the solutions satisfying $z_k(X) \le g_k$ are put in the list of potential efficient solutions. When this list contains an a priori fixed number of solutions, the decision maker indicates which one is preferred and modifies the goals $g_k$ in a more restrictive sense before the search with MOSA continues.

An example of such an interactive procedure is given in [12] for a real case study.

Introduction


A number of optimization problems are actually multiobjective optimization problems (MOPs), in which the objectives conflict. As a result, there is usually no single solution that optimizes all objectives simultaneously, and a number of techniques have been developed to find a compromise solution to MOPs. The reader is referred to the recent book by Miettinen [16] for the theory and algorithms of MOPs. Fractional programming problems (FPPs) arise in many applied areas such as portfolio selection, stock cutting, game theory, and numerous decision problems in management science. Many approaches to FPPs have been explored in considerable detail; see, for example, Avriel et al. [3], Craven [5], Schaible [24,25], Schaible and Ibaraki [26] and Stancu-Minasian [27,28].

In this paper, we consider the following multiobjective fractional programming problem:

$$\text{(MFP)} \qquad \min \ \frac{f(x)}{g(x)} = \left( \frac{f_1(x)}{g_1(x)}, \frac{f_2(x)}{g_2(x)}, \dots, \frac{f_p(x)}{g_p(x)} \right)^{T} \quad \text{s.t.} \quad h(x) \le 0, \ x \in X,$$

where $X \subset \mathbb{R}^n$ is an open set, $f_i, g_i$ (i = 1, 2, ..., p) are real-valued functions defined on X, and h is an m-dimensional vector-valued function defined on X. Suppose that $f_i(x) \ge 0$ and $g_i(x) > 0$ for all $x \in X$ and i = 1, 2, ..., p. Moreover, let $f_i, g_i$ (i = 1, 2, ..., p) and $h_j$ (j = 1, 2, ..., m) be continuously differentiable over X, and denote the gradients of $f_i$, $g_i$ and $h_j$ at x by $\nabla f_i(x)$, $\nabla g_i(x)$ and $\nabla h_j(x)$, respectively.

If the parameter p in problem (MFP) is equal to 1, then (MFP) reduces to the following single-objective fractional programming problem:

$$\text{(FP)} \qquad \min \ \frac{f(x)}{g(x)} \quad \text{s.t.} \quad h(x) \le 0, \ x \in X,$$

where $X \subset \mathbb{R}^n$ is an open set, f and g are real-valued functions defined on X, h is an m-dimensional vector-valued function defined on X, and $f(x) \ge 0$ and $g(x) > 0$ for all $x \in X$. Moreover, assume that f(x), g(x) and $h_j(x)$ (j = 1, 2, ..., m) are continuously differentiable over X.


Multi-objective Fractional Programming Problems 




Khan and Hanson [10], and Reddy and Mukher- 
jee [21] considered the optimality conditions and du- 
ality for (FP) with respect to the following generalized 
concepts of convexity, respectively. 


Definition 1 [6] Let f be a real function defined on an open set $X \subset \mathbb{R}^n$ and differentiable at $x_0$. Given a mapping $\eta: X \times X \to \mathbb{R}^n$, the function f is said to be invex at $x_0$ with respect to η if, for all $x \in X$, the following inequality holds:

$$f(x) - f(x_0) \ge \nabla f(x_0)^{T} \eta(x, x_0).$$


Definition 2 [7] Let f be a real function defined on an open set $X \subset \mathbb{R}^n$ and differentiable at $x_0$. Given a real number ρ, a mapping $\eta: X \times X \to \mathbb{R}^n$ and a scalar function $d: X \times X \to \mathbb{R}$, the function f is said to be ρ-invex at $x_0$ with respect to η and d if, for all $x \in X$, the following inequality holds:

$$f(x) - f(x_0) \ge \nabla f(x_0)^{T} \eta(x, x_0) + \rho d^2(x, x_0).$$


The authors of [10,21] imposed the corresponding generalized convexity on the numerator and the denominator of the objective function of problem (FP) individually, and then derived some optimality conditions and duality results. How to extend these methods to the multiobjective case is still an open problem [21].

As far as the multiobjective fractional problem (MFP) is concerned, Jeyakumar and Mond [8] introduced the concept of v-invexity as follows.

Definition 3 Let $f: X \to \mathbb{R}^p$ be a real vector function defined on an open set $X \subset \mathbb{R}^n$, each component of f being differentiable at $x_0$. The function f is said to be v-invex at $x_0 \in X$ if there exist a mapping $\eta: X \times X \to \mathbb{R}^n$ and functions $\alpha_i: X \times X \to \mathbb{R}_+ \setminus \{0\}$ (i = 1, 2, ..., p) such that, for all $x \in X$,

$$f_i(x) - f_i(x_0) \ge \alpha_i(x, x_0) \nabla f_i(x_0)^{T} \eta(x, x_0), \quad i = 1, 2, \dots, p.$$


Jeyakumar and Mond [8] obtained some weak effi- 
ciency conditions and duality results for a nonconvex 
multiobjective fractional programming problem via the 
concept of v-invexity, v-pseudoinvexity and v-quasiin- 
vexity. 

Motivated by various concepts of generalized convexity, Liang et al. [12] introduced a unified formulation of generalized convexity, called (F, α, ρ, d)-convexity, and obtained corresponding optimality conditions and duality results for the single-objective fractional problem (FP). In this paper, we extend the methods adopted for (FP) in [12] to the multiobjective problem (MFP).


Definition 4 A function $F: \mathbb{R}^n \to \mathbb{R}$ is said to be sublinear if for any $a_1, a_2 \in \mathbb{R}^n$,

$$F(a_1 + a_2) \le F(a_1) + F(a_2), \tag{1}$$

and for any $r \in \mathbb{R}_+$, $a \in \mathbb{R}^n$,

$$F(ra) = rF(a). \tag{2}$$

Note that the concept of a sublinear function was given in Preda [20]. Here, a sublinear function is defined simply as a function that is subadditive and positively homogeneous, which avoids the extraneous symbols used in Preda [20]. Taking r = 0 in (2) gives F(0) = 0.

Based upon the concept of a sublinear function, we recall the unified formulation of generalized convexity, i.e., (F, α, ρ, d)-convexity, introduced in [12].

Definition 5 Given an open set $X \subset \mathbb{R}^n$, a number $\rho \in \mathbb{R}$, and two functions $\alpha: X \times X \to \mathbb{R}_+ \setminus \{0\}$ and $d: X \times X \to \mathbb{R}$, a differentiable function f over X is said to be (F, α, ρ, d)-convex at $x_0 \in X$ if for any $x \in X$, $F(x, x_0; \cdot): \mathbb{R}^n \to \mathbb{R}$ is sublinear and f(x) satisfies the following condition:

$$f(x) - f(x_0) \ge F\big(x, x_0; \alpha(x, x_0) \nabla f(x_0)\big) + \rho d^2(x, x_0). \tag{3}$$

The function f is said to be (F, α, ρ, d)-convex over X if it is (F, α, ρ, d)-convex at every $x_0 \in X$; f is said to be strongly (F, α, ρ, d)-convex or (F, α)-convex if ρ > 0 or ρ = 0, respectively.
Definition 5 includes the following special cases:
(i) If $\alpha(x, x_0) = 1$ for all $x, x_0 \in X$, then (F, α, ρ, d)-convexity is (F, ρ)-convexity [20].
(ii) If $F(x, x_0; \alpha(x, x_0) \nabla f(x_0)) = \nabla f(x_0)^{T} \eta(x, x_0)$ for a certain mapping $\eta: X \times X \to \mathbb{R}^n$, then (F, α, ρ, d)-convexity is the ρ-invexity defined in [7].
(iii) If ρ = 0 or $d(x, x_0) = 0$ for all $x, x_0 \in X$, and $F(x, x_0; \alpha(x, x_0) \nabla f(x_0)) = \nabla f(x_0)^{T} \eta(x, x_0)$ for a certain mapping $\eta: X \times X \to \mathbb{R}^n$, then (F, α, ρ, d)-convexity reduces to invexity [6].
In the following, ρ, α and d are referred to as the parameters of (F, α, ρ, d)-convexity. Furthermore, we adopt the following conventions.

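Let $\mathbb{R}^n_+$ denote the nonnegative orthant of $\mathbb{R}^n$ and $x^T$ denote the transpose of the vector $x \in \mathbb{R}^n$. For any two vectors $x = (x_1, x_2, \dots, x_n)^T$ and $y = (y_1, y_2, \dots, y_n)^T \in \mathbb{R}^n$, we write:

$x = y$ if and only if $x_i = y_i$, i = 1, 2, ..., n;
$x \le y$ if and only if $x_i \le y_i$, i = 1, 2, ..., n, but $x \ne y$;
$x < y$ if and only if $x_i < y_i$, i = 1, 2, ..., n;
$x \nleq y$ if and only if $x \le y$ does not hold, i.e., $y_i < x_i$ for at least one i, or $x = y$.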


A solution of problem (MFP) is referred to as an efficient (Pareto optimal) solution, defined as follows.

Definition 6 A feasible solution $x_0 \in X$ of (MFP) is called an efficient solution of (MFP) if there exists no other feasible solution $x \in X$ such that

$$\frac{f(x)}{g(x)} \le \frac{f(x_0)}{g(x_0)}.$$
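Definition 6 can be illustrated on a finite list of candidate points. The following Python sketch is hypothetical: it assumes the candidates are already known to be feasible (the constraint $h(x) \le 0$ is not checked) and simply keeps those whose ratio vectors are not dominated, in the sense of the relation $\le$ defined above, by another candidate.

```python
def ratio_vector(x, f, g):
    """Criterion vector (f_1(x)/g_1(x), ..., f_p(x)/g_p(x)) of (MFP)."""
    return tuple(fi(x) / gi(x) for fi, gi in zip(f, g))


def efficient_points(candidates, f, g):
    """Candidates x0 for which no other candidate x has f(x)/g(x) <= f(x0)/g(x0),
    where <= means componentwise <= with at least one strict inequality."""
    vals = [ratio_vector(x, f, g) for x in candidates]

    def leq_ne(a, b):
        return all(p <= q for p, q in zip(a, b)) and a != b

    return [x for x, vx in zip(candidates, vals)
            if not any(leq_ne(vy, vx) for vy in vals)]
```

For example, with $f_1(x) = x^2 + 1$, $g_1(x) = x + 1$, $f_2(x) = (x - 2)^2 + 1$, $g_2(x) = 3 - x$ and candidates 0, 1, 2, only x = 1 is kept, since its ratio vector (1, 1) dominates (1, 5/3) and (5/3, 1).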


In [14], Maeda gave a constraint qualification, called the generalized Guignard constraint qualification (GGCQ), under which he derived the following Kuhn-Tucker type necessary conditions for a feasible solution $x_0$ to be an efficient solution of problem (MFP):

If $x_0$ is an efficient solution of (MFP) and (GGCQ) holds at $x_0$ [14], then there exist $\tau = (\tau_1, \tau_2, \dots, \tau_p)^T \in \mathbb{R}^p_+$, $\tau > 0$, $\sum_{i=1}^{p} \tau_i = 1$, and $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)^T \in \mathbb{R}^m_+$ such that

$$\sum_{i=1}^{p} \tau_i \nabla\left(\frac{f_i}{g_i}\right)(x_0) + \sum_{j=1}^{m} \lambda_j \nabla h_j(x_0) = 0, \qquad \lambda_j h_j(x_0) = 0, \quad j = 1, 2, \dots, m.$$
This paper is organized as follows. In Sect. "Efficiency Conditions", efficiency conditions for the multiobjective fractional problem (MFP) involving (F, α, ρ, d)-convexity are presented. The duality properties of problem (MFP) are studied in Sect. "Duality", including several duals for (MFP) and some weak and strong duality theorems. Concluding remarks are given in the last section.


Efficiency Conditions 


First, we present a lemma which indicates that (F, α, ρ, d)-convexity is preserved under division.

Lemma 1 Let $X \subset \mathbb{R}^n$ be an open set. Assume that p, q are real-valued differentiable functions defined on X and $p(x) \ge 0$, $q(x) > 0$ for all $x \in X$. If p and −q are (F, α, ρ, d)-convex at $x_0 \in X$, then p/q is $(F, \bar\alpha, \bar\rho, \bar d)$-convex at $x_0$, where

$$\bar\alpha(x, x_0) = \frac{\alpha(x, x_0)\, q(x_0)}{q(x)}, \qquad \bar\rho = \rho \left(1 + \frac{p(x_0)}{q(x_0)}\right), \qquad \bar d(x, x_0) = \frac{d(x, x_0)}{q^{1/2}(x)}.$$
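Lemma 1 can be checked numerically in its simplest special case. Taking $F(x, x_0; a) = a(x - x_0)$ on the real line, $\alpha \equiv 1$ and $\rho = 0$, (F, 1, 0, d)-convexity of p and −q just means that p is convex and q is concave; Lemma 1 then asserts $p(x)/q(x) - p(x_0)/q(x_0) \ge \bar\alpha(x, x_0)\,(p/q)'(x_0)(x - x_0)$ with $\bar\alpha(x, x_0) = q(x_0)/q(x)$. The Python sketch below, with a hypothetical function name and the illustrative pair $p(x) = x^2 + 1$, $q(x) = 4 - x^2$ on (−2, 2), tests this inequality on a grid of points.

```python
def lemma1_holds(p, dp, q, dq, x0, xs, tol=1e-9):
    """Check, on the points xs, the inequality of Lemma 1 in the scalar special
    case F(x, x0; a) = a * (x - x0), alpha = 1, rho = 0:

        p(x)/q(x) - p(x0)/q(x0) >= (q(x0)/q(x)) * (p/q)'(x0) * (x - x0).

    dp and dq are the derivatives of p and q; tol absorbs rounding errors.
    """
    p0, q0 = p(x0), q(x0)
    slope = (dp(x0) * q0 - p0 * dq(x0)) / q0 ** 2  # derivative of p/q at x0
    return all(p(x) / q(x) - p0 / q0 >= (q0 / q(x)) * slope * (x - x0) - tol
               for x in xs)
```

The check mirrors the proof: the two convexity inequalities for p and −q are combined with the factor $p(x_0)/q(x_0) \ge 0$ and divided by $q(x) > 0$.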
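In the following, we present some sufficient efficiency conditions for (MFP) under appropriate (F, α, ρ, d)-convexity assumptions.

Theorem 1 Let $x_0$ be a feasible solution of (MFP). Suppose that there exist $\tau = (\tau_1, \tau_2, \dots, \tau_p)^T \in \mathbb{R}^p_+$, $\tau > 0$, $\sum_{i=1}^{p} \tau_i = 1$, and $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)^T \in \mathbb{R}^m_+$ such that

$$\sum_{i=1}^{p} \tau_i \nabla\left(\frac{f_i}{g_i}\right)(x_0) + \sum_{j=1}^{m} \lambda_j \nabla h_j(x_0) = 0, \tag{4}$$
$$\lambda_j h_j(x_0) = 0, \quad j = 1, 2, \dots, m. \tag{5}$$

If $f_i$ and $-g_i$ (i = 1, 2, ..., p) are $(F, \alpha_i, \rho_i, d_i)$-convex at $x_0$, $h_j$ (j = 1, 2, ..., m) is $(F, \beta_j, \xi_j, c_j)$-convex at $x_0$, and

$$\sum_{i=1}^{p} \tau_i \frac{\bar\rho_i\, \bar d_i^2(x, x_0)}{\bar\alpha_i(x, x_0)} + \sum_{j=1}^{m} \lambda_j \frac{\xi_j\, c_j^2(x, x_0)}{\beta_j(x, x_0)} \ge 0, \tag{6}$$

where $\bar\alpha_i(x, x_0) = \alpha_i(x, x_0) g_i(x_0)/g_i(x)$, $\bar\rho_i = \rho_i \big(1 + f_i(x_0)/g_i(x_0)\big)$ and $\bar d_i(x, x_0) = d_i(x, x_0)/g_i^{1/2}(x)$, then $x_0$ is a global efficient solution of (MFP).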


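Corollary 1 Let $x_0$ be a feasible solution of (MFP). Suppose that there exist $\tau = (\tau_1, \tau_2, \dots, \tau_p)^T \in \mathbb{R}^p_+$, $\tau > 0$, $\sum_{i=1}^{p} \tau_i = 1$, and $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)^T \in \mathbb{R}^m_+$ such that

$$\sum_{i=1}^{p} \tau_i \nabla\left(\frac{f_i}{g_i}\right)(x_0) + \sum_{j=1}^{m} \lambda_j \nabla h_j(x_0) = 0, \qquad \lambda_j h_j(x_0) = 0, \quad j = 1, 2, \dots, m.$$

If $f_i$ and $-g_i$ (i = 1, 2, ..., p) are strongly $(F, \alpha_i, \rho_i, d_i)$-convex (or $(F, \alpha_i)$-convex) at $x_0$, and $h_j$ (j = 1, 2, ..., m) is strongly $(F, \beta_j, \xi_j, c_j)$-convex (or $(F, \beta_j)$-convex) at $x_0$, then $x_0$ is a global efficient solution of (MFP). Indeed, with $\rho_i \ge 0$ and $\xi_j \ge 0$, every term of (6) is nonnegative.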


For i = 1, 2, ..., p, if $g_i(x) = 1$ for all $x \in X$, then $f_i(x)$ need not be nonnegative; if, in addition, the functions involved are assumed to be invex, ρ-invex with respect to $\eta: X \times X \to \mathbb{R}^n$ and $d: X \times X \to \mathbb{R}$, (F, ρ)-convex, or generalized (F, ρ)-convex, respectively, then we obtain the corresponding results presented in [1,2,9].

Next, we consider a special case of (MFP) in which the fractional objective functions have the same denominator: for i = 1, 2, ..., p, let $g_i(x) = g(x)$ in (MFP). The efficiency property of this special (MFP) can be obtained in the same way as Theorem 1, so we state the following theorem.

Theorem 2 Let $x_0$ be a feasible solution of (MFP). Suppose that there exist $\tau = (\tau_1, \tau_2, \dots, \tau_p)^T \in \mathbb{R}^p_+$, $\tau > 0$, $\sum_{i=1}^{p} \tau_i = 1$, and $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)^T \in \mathbb{R}^m_+$ such that

$$\sum_{i=1}^{p} \tau_i \nabla\left(\frac{f_i}{g}\right)(x_0) + \sum_{j=1}^{m} \lambda_j \nabla h_j(x_0) = 0, \qquad \lambda_j h_j(x_0) = 0, \quad j = 1, 2, \dots, m.$$

If −g is (F, α, ρ, d)-convex at $x_0$, $f_i$ (i = 1, 2, ..., p) is $(F, \alpha, \rho_i, d)$-convex at $x_0$, $h_j$ (j = 1, 2, ..., m) is $(F, \bar\alpha, \xi_j, \bar d)$-convex at $x_0$, and $\sum_{i=1}^{p} \tau_i \bar\rho_i + \sum_{j=1}^{m} \lambda_j \xi_j \ge 0$, where $\bar\alpha(x, x_0) = \alpha(x, x_0) g(x_0)/g(x)$, $\bar\rho_i = \rho_i + \rho f_i(x_0)/g(x_0)$ and $\bar d(x, x_0) = d(x, x_0)/g^{1/2}(x)$, then $x_0$ is a global efficient solution of (MFP).


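Finally, we present an equivalent formulation of problem (MFP). Let $G(x) = \prod_{i=1}^{p} g_i(x)$ and $G_i(x) = \prod_{j=1, j \ne i}^{p} g_j(x)$ (i = 1, 2, ..., p). Since $f_i(x)/g_i(x) = G_i(x) f_i(x)/G(x)$, (MFP) can be written in the following form:

$$(\text{MFP}') \qquad \min \left( \frac{G_1(x) f_1(x)}{G(x)}, \frac{G_2(x) f_2(x)}{G(x)}, \dots, \frac{G_p(x) f_p(x)}{G(x)} \right) \quad \text{s.t.} \quad h(x) \le 0, \ x \in X.$$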
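The identity $f_i(x)/g_i(x) = G_i(x) f_i(x)/G(x)$ behind this reformulation can be checked numerically. The following Python sketch (hypothetical function name; the test functions are arbitrary choices with positive denominators) computes the components of the reformulated objective with a common denominator.

```python
import math


def common_denominator_objectives(fs, gs, x):
    """Return the components G_i(x) f_i(x) / G(x) of the reformulated objective,
    with G = product of all g_j and G_i = product of the g_j with j != i."""
    gvals = [g(x) for g in gs]
    G = math.prod(gvals)
    return [math.prod(gvals[:i] + gvals[i + 1:]) * f(x) / G
            for i, f in enumerate(fs)]
```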


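By Theorem 2, we have the following corollary.

Corollary 2 Let $x_0$ be a feasible solution of (MFP). Suppose that there exist $\tau = (\tau_1, \tau_2, \dots, \tau_p)^T \in \mathbb{R}^p_+$, $\tau > 0$, $\sum_{i=1}^{p} \tau_i = 1$, and $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)^T \in \mathbb{R}^m_+$ such that

$$\sum_{i=1}^{p} \tau_i \nabla\left(\frac{f_i}{g_i}\right)(x_0) + \sum_{j=1}^{m} \lambda_j \nabla h_j(x_0) = 0, \qquad \lambda_j h_j(x_0) = 0, \quad j = 1, 2, \dots, m.$$

If −G is (F, α, ρ, d)-convex at $x_0$, $G_i f_i$ (i = 1, 2, ..., p) is $(F, \alpha, \rho_i, d)$-convex at $x_0$, $h_j$ (j = 1, 2, ..., m) is $(F, \bar\alpha, \xi_j, \bar d)$-convex at $x_0$, and $\sum_{i=1}^{p} \tau_i \bar\rho_i + \sum_{j=1}^{m} \lambda_j \xi_j \ge 0$, where $\bar\rho_i = \rho_i + \rho f_i(x_0)/g_i(x_0)$, $\bar\alpha(x, x_0) = \alpha(x, x_0) G(x_0)/G(x)$ and $\bar d(x, x_0) = d(x, x_0)/G^{1/2}(x)$, then $x_0$ is a global efficient solution of (MFP).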


Under the assumptions of Theorem 2 or Corollary 2, if $\rho \ge \rho_i$ for all i = 1, 2, ..., p, and $\bar\rho_i$ is replaced by $\rho_i\big(1 + f_i(x_0)/g(x_0)\big)$ or $\rho_i\big(1 + f_i(x_0)/g_i(x_0)\big)$, respectively, then the corresponding results still hold.


Duality 


There are many types of duals for a given mathematical programming problem. Two well-known duals are the Wolfe type dual [29] and the Mond-Weir type dual [17]. Recently, the mixed (or general type) dual has been considered for various optimization problems [1,2,11,13,18,19,20,30,31,32]; it includes the Wolfe type dual and the Mond-Weir type dual as special cases. In the sequel, the generalized Mond-Weir dual is discussed first, and then three other types of duals, based on (F, α, ρ, d)-convexity, are presented for problem (MFP).

Let M = {1, 2, ..., m} and let $M_0, M_1, \dots, M_q$ be a partition of M, i.e., $\bigcup_{k=0}^{q} M_k = M$ and $M_k \cap M_l = \emptyset$ for $k \ne l$. The generalized Mond-Weir dual of (MFP) is as follows:

$$\max \ \frac{f(u)}{g(u)} + \lambda_{M_0}^{T} h_{M_0}(u)\, e$$
$$\text{s.t.} \quad \sum_{i=1}^{p} \tau_i \nabla\left(\frac{f_i}{g_i}\right)(u) + \sum_{j=1}^{m} \lambda_j \nabla h_j(u) = 0,$$
$$\lambda_{M_k}^{T} h_{M_k}(u) \ge 0, \quad k = 1, 2, \dots, q,$$
$$\tau = (\tau_1, \tau_2, \dots, \tau_p)^T \in \mathbb{R}^p_+, \quad \tau > 0, \quad \sum_{i=1}^{p} \tau_i = 1,$$
$$\lambda_{M_k} \in \mathbb{R}^{|M_k|}_+, \quad k = 0, 1, 2, \dots, q, \qquad u \in X,$$

where $e = (1, 1, \dots, 1)^T \in \mathbb{R}^p$ and $\lambda_{M_k}$ denotes the column vector whose components have subscripts belonging to $M_k$. In particular, if $M_0 = M$ and $M_k = \emptyset$, k = 1, 2, ..., q, then the above dual becomes the Wolfe type dual; if $M_0 = \emptyset$, q = 1 and $M_1 = M$, the Mond-Weir type dual is obtained. Since the Wolfe type dual is unsuitable for single-objective fractional programming problems [15,22,23], the duals with $M_0 \ne \emptyset$ are certainly unsuitable for (MFP). For the generalized Mond-Weir type dual, we therefore only consider the case $M_0 = \emptyset$, $M_1 = M$, i.e., the Mond-Weir dual.


Mond-Weir Dual 


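The Mond-Weir dual of problem (MFP) has the following form:

$$\text{(MFD1)} \qquad \max \ \frac{f(u)}{g(u)} = \left(\frac{f_1(u)}{g_1(u)}, \frac{f_2(u)}{g_2(u)}, \dots, \frac{f_p(u)}{g_p(u)}\right)^T$$
$$\text{s.t.} \quad \sum_{i=1}^{p} \tau_i \nabla\left(\frac{f_i}{g_i}\right)(u) + \sum_{j=1}^{m} \lambda_j \nabla h_j(u) = 0, \qquad \lambda^T h(u) \ge 0,$$
$$\tau = (\tau_1, \tau_2, \dots, \tau_p)^T \in \mathbb{R}^p_+, \quad \tau > 0, \quad \sum_{i=1}^{p} \tau_i = 1, \qquad \lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)^T \in \mathbb{R}^m_+, \quad u \in X.$$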


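Theorem 3 (Weak Duality) Assume that x is a feasible solution of (MFP) and (u, τ, λ) is a feasible solution of (MFD1). If $f_i$ and $-g_i$ (i = 1, 2, ..., p) are $(F, \alpha_i, \rho_i, d_i)$-convex at u, $h_j$ (j = 1, 2, ..., m) is $(F, \beta, \xi_j, c_j)$-convex at u, and the inequality

$$\sum_{i=1}^{p} \tau_i \frac{\bar\rho_i\, \bar d_i^2(x, u)}{\bar\alpha_i(x, u)} + \sum_{j=1}^{m} \lambda_j \frac{\xi_j\, c_j^2(x, u)}{\beta(x, u)} \ge 0 \tag{7}$$

holds, where $\bar\alpha_i(x, u) = \alpha_i(x, u) g_i(u)/g_i(x)$, $\bar\rho_i = \rho_i\big(1 + f_i(u)/g_i(u)\big)$ and $\bar d_i(x, u) = d_i(x, u)/g_i^{1/2}(x)$, then we have

$$\frac{f(x)}{g(x)} \nleq \frac{f(u)}{g(u)}.$$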


Corollary 3 (Weak Duality) Assume that x is a feasible solution of (MFP) and (u, τ, λ) is a feasible solution of (MFD1). If $f_i$ and $-g_i$ (i = 1, 2, ..., p) are strongly $(F, \alpha_i, \rho_i, d_i)$-convex (or $(F, \alpha_i)$-convex) at u, and $h_j$ (j = 1, 2, ..., m) is strongly $(F, \beta, \xi_j, c_j)$-convex (or $(F, \beta)$-convex) at u, then

$$\frac{f(x)}{g(x)} \nleq \frac{f(u)}{g(u)}.$$


Theorem 4 (Strong Duality) Assume that $\bar x$ is an efficient solution of (MFP) and the constraint qualification (GGCQ) holds at $\bar x$ [14]. Then there exists $(\bar\tau, \bar\lambda) \in \mathbb{R}^p \times \mathbb{R}^m$ such that $(\bar x, \bar\tau, \bar\lambda)$ is a feasible solution of (MFD1), and the objective function values of (MFP) and (MFD1) at the corresponding points are equal. If the generalized convexity assumptions and inequality (7) of Theorem 3 are also satisfied, then $(\bar x, \bar\tau, \bar\lambda)$ is an efficient solution of (MFD1).


Schaible Dual 


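In this subsection, we consider the following extended form of the Schaible dual for (MFP) [22,23]:

$$\text{(MFD2)} \qquad \max \ \lambda = (\lambda_1, \lambda_2, \dots, \lambda_p)^T$$
$$\text{s.t.} \quad \sum_{i=1}^{p} \tau_i \nabla \big(f_i(u) - \lambda_i g_i(u)\big) + \sum_{j=1}^{m} v_j \nabla h_j(u) = 0,$$
$$f_i(u) - \lambda_i g_i(u) \ge 0, \quad i = 1, 2, \dots, p, \qquad v^T h(u) \ge 0,$$
$$\tau > 0, \quad \sum_{i=1}^{p} \tau_i = 1, \qquad \lambda \in \mathbb{R}^p_+, \quad \tau \in \mathbb{R}^p, \quad v \in \mathbb{R}^m_+, \quad u \in X.$$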


Theorem 5 (Weak Duality) Assume that x is a feasible solution of (MFP) and (u, τ, λ, v) is a feasible solution of (MFD2). Suppose that one of the following holds:

• (I) $f_i$ and $-g_i$ (i = 1, 2, ..., p) are $(F, \alpha_i, \rho_i, d_i)$-convex at u, $h_j$ (j = 1, 2, ..., m) is $(F, \beta, \xi_j, c_j)$-convex at u, and

$$\sum_{i=1}^{p} \tau_i \frac{\rho_i (1 + \lambda_i)\, d_i^2(x, u)}{\alpha_i(x, u)} + \sum_{j=1}^{m} v_j \frac{\xi_j\, c_j^2(x, u)}{\beta(x, u)} \ge 0; \tag{8}$$

• (II) $f_i$ and $-g_i$ (i = 1, 2, ..., p) are $(F, \alpha, \rho_i, d)$-convex at u, $h_j$ (j = 1, 2, ..., m) is $(F, \alpha, \xi_j, d)$-convex at u, and the vectors τ, λ, v satisfy

$$\sum_{i=1}^{p} \tau_i \rho_i (1 + \lambda_i) + \sum_{j=1}^{m} v_j \xi_j \ge 0. \tag{9}$$

Then $f(x)/g(x) \nleq \lambda$.


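Theorem 6 (Strong Duality) Assume that $\bar x$ is an efficient solution of (MFP) and the constraint qualification (GGCQ) holds at $\bar x$ [14]. Then there exist $\bar\tau \in \mathbb{R}^p_+$, $\bar\lambda \in \mathbb{R}^p_+$ and $\bar v \in \mathbb{R}^m_+$ such that $(\bar x, \bar\tau, \bar\lambda, \bar v)$ is a feasible solution of (MFD2) and $\bar\lambda = f(\bar x)/g(\bar x)$. Furthermore, if all the assumptions of Theorem 5 are satisfied, then $(\bar x, \bar\tau, \bar\lambda, \bar v)$ is an efficient solution of (MFD2).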


Extended Bector Type Dual 


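For a single-objective fractional programming problem, Bector [4] used the positivity of the denominator to transform the inequality constraints and add them to the objective by Lagrange multipliers, thereby establishing a kind of dual. Since the denominators in (MFP) need not be the same, we use the equivalent form (MFP′) of (MFP), with the common denominator G(x), to establish the following dual, called the extended Bector type dual of (MFP):

$$\text{(MFD3)} \qquad \max \left( \frac{G_1(u) f_1(u) + v_{M_0}^T h_{M_0}(u)}{G(u)}, \dots, \frac{G_p(u) f_p(u) + v_{M_0}^T h_{M_0}(u)}{G(u)} \right)$$
$$\text{s.t.} \quad \sum_{i=1}^{p} \tau_i \nabla \left( \frac{G_i(u) f_i(u) + v_{M_0}^T h_{M_0}(u)}{G(u)} \right) + \sum_{k=1}^{q} \nabla \big(v_{M_k}^T h_{M_k}(u)\big) = 0,$$
$$v_{M_k}^T h_{M_k}(u) \ge 0, \quad k = 1, 2, \dots, q,$$
$$G_i(u) f_i(u) + v_{M_0}^T h_{M_0}(u) \ge 0, \quad i = 1, 2, \dots, p,$$
$$\sum_{i=1}^{p} \tau_i = 1, \quad \tau > 0, \qquad u \in X, \quad v_{M_k} \in \mathbb{R}^{|M_k|}_+, \quad k = 0, 1, 2, \dots, q.$$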


Theorem 7 (Weak Duality) Let x be a feasible solution of (MFP) and (u, τ, v) be a feasible solution of (MFD3). Assume that −G is (F, α, ρ, d)-convex at u, $G_i f_i$ (i = 1, ..., p) is $(F, \alpha, \rho_i, d)$-convex at u, and $h_j$ (j = 1, ..., m) is $(F, \alpha, \xi_j, d)$-convex at u. If $\rho = \sum_{i=1}^{p} \rho_i$ and the following inequality holds:

$$\sum_{i=1}^{p} \tau_i \left( \rho_i + \rho\, \frac{G_i(u) f_i(u) + v_{M_0}^T h_{M_0}(u)}{G(u)} \right) + \sum_{j \in M_0} v_j \xi_j + G(u) \sum_{k=1}^{q} \sum_{j \in M_k} v_j \xi_j \ge 0, \tag{10}$$

then we have

$$\frac{f(x)}{g(x)} \nleq \frac{\mathbf{G}(u) f(u) + v_{M_0}^T h_{M_0}(u)\, e}{G(u)},$$

where $\mathbf{G}(u) = \mathrm{diag}\{G_1(u), \dots, G_p(u)\}$ and each component of $e \in \mathbb{R}^p$ is equal to 1.


Theorem 8 (Strong Duality) Assume that $\bar x$ is an efficient solution of (MFP) and the constraint qualification (GGCQ) holds at $\bar x$ [14]. Then there exists $(\bar\tau, \bar v)$ such that $(\bar x, \bar\tau, \bar v)$ is a feasible solution of (MFD3), and the objective function values of (MFP) and (MFD3) at $\bar x$ and $(\bar x, \bar\tau, \bar v)$, respectively, are equal. If the assumptions and conditions of Theorem 7 are also satisfied, then $(\bar x, \bar\tau, \bar v)$ is an efficient solution of (MFD3).


Concluding Remarks 


In this paper, the unified formulation of generalized convexity defined in [12] is adopted, which includes many other generalized convexity concepts in optimization theory as special cases. This concept of generalized convexity is suitable for analysing the efficiency conditions and duality of multiobjective fractional programming problems. Efficiency conditions and duality results for a class of multiobjective fractional programming problems are presented. We extend the methods adopted for single-objective fractional programming problems in [10,12,21] to the case of multiple fractional objectives. We also present the extended Bector type dual by using an equivalent formulation of the primal problem. Note that we only consider (MFP) from the viewpoint of efficient solutions in this paper; the methods used here can be extended to the study of (MFP) from the viewpoint of weakly efficient solutions.

From the 1970s onwards, multi-objective linear programming (MOLP) methods with continuous solutions have been developed [8]. However, it is well known that discrete variables are unavoidable in the linear programming modeling of many applications, for instance, to represent an investment choice, a production level, etc.

The mathematical structure is then integer linear programming (ILP), associated with MOLP, giving a MOILP problem. Unfortunately, MOILP cannot be solved by simply combining ILP and MOLP methods, because it has its own specific difficulties.

The problem (P) considered is defined as

$$\text{(P)} \qquad \text{'max'} \ z_k(X) = \sum_{j=1}^{n} c_j^{(k)} x_j, \quad k = 1, \dots, K, \qquad X \in D = \{X \in \mathbb{R}^n : TX \le d, \ X \ge 0, \ x_j \text{ integer}, \ j \in J\},$$

with T (m × n), d (m × 1), X (n × 1) and $J \subset \{1, \dots, n\}$.


If we denote $LD = \{X : TX \le d, \ X \ge 0\}$, problem (LP) is the linear relaxation of problem (P):

$$\text{(LP)} \qquad \text{'max'} \ z_k(X), \quad k = 1, \dots, K, \qquad X \in LD.$$

A solution $X^*$ in D (or LD) is said to be efficient for problem (P) (or (LP)) if there does not exist any other solution X in D (or LD) such that $z_k(X) \ge z_k(X^*)$, k = 1, ..., K, with at least one strict inequality.

Let E(·) denote the set of all efficient solutions of problem (·). It is well known (see [8]) that E(LP) may be characterized by the optimal solutions of the single-objective parametrized problem

$$(\text{LP}_\lambda) \qquad \max_{X \in LD} \sum_{k=1}^{K} \lambda_k z_k(X), \qquad \lambda_k > 0 \ \forall k, \quad \sum_{k=1}^{K} \lambda_k = 1.$$

This fundamental principle, often called Geoffrion's theorem, is no longer valid in the presence of discrete variables because the set D is not convex. The set of optimal solutions of problem $(P_\lambda)$, defined as problem $(\text{LP}_\lambda)$ in which LD is replaced by D, is only a subset SE(P) of E(P); the solutions in SE(P) are called supported efficient solutions, while the solutions belonging to NSE(P) = E(P) \ SE(P) are called nonsupported efficient solutions.

The breakdown of Geoffrion's theorem for problem (P) can be illustrated by the following simple example:

$$K = 2, \quad z_1(X) = 6x_1 + 3x_2 + x_3, \quad z_2(X) = x_1 + 3x_2 + 6x_3, \quad D = \{X : x_1 + x_2 + x_3 \le 1, \ x_j \in \{0, 1\}\}.$$

For this problem,

$$E(P) = \{(1, 0, 0); (0, 1, 0); (0, 0, 1)\},$$

while NSE(P) = {(0, 1, 0)}: the image (3, 3) of (0, 1, 0) lies below the segment joining (6, 1) and (1, 6), whose midpoint is (3.5, 3.5), so no positive weighting makes (0, 1, 0) optimal.

Nevertheless, V.J. Bowman [1] has given a theoretical characterization of E(P). Setting

$$M_k = \max_{X \in D} z_k(X), \qquad \hat Z_k = M_k + \epsilon_k \ \text{with} \ \epsilon_k > 0, \qquad \rho > 0,$$

E(P) is characterized by the optimal solutions of the problem

$$(P'_\lambda) \qquad \min_{X \in D} \left( \max_{k} \lambda_k \big(\hat Z_k - z_k(X)\big) + \rho \sum_{k=1}^{K} \big(\hat Z_k - z_k(X)\big) \right),$$

which consists of minimizing the augmented weighted Tchebycheff distance between $z(X)$ and $\hat Z$.
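The example and Bowman's characterization can be checked numerically with the following Python sketch. It is an illustration under stated assumptions: E(P) is found by brute-force enumeration, supported solutions are approximated by scanning $\lambda_1$ on a grid of 99 values (with exact rational arithmetic via `fractions.Fraction`), and the Tchebycheff problem uses the hypothetical choices $\epsilon_k = 1$ and $\rho = 1/100$.

```python
from fractions import Fraction
from itertools import product

# Example data from the text (K = 2, binary variables, x1 + x2 + x3 <= 1).
D = [x for x in product((0, 1), repeat=3) if sum(x) <= 1]
Z = {x: (6 * x[0] + 3 * x[1] + x[2], x[0] + 3 * x[1] + 6 * x[2]) for x in D}


def dominates(a, b):
    return all(p >= q for p, q in zip(a, b)) and a != b


# E(P): feasible solutions whose image is not dominated.
E = {x for x in D if not any(dominates(Z[y], Z[x]) for y in D)}

# Supported solutions: optimal for some weighted sum, lambda_1 scanned on a grid.
SE = set()
for i in range(1, 100):
    l1 = Fraction(i, 100)
    value = {x: l1 * Z[x][0] + (1 - l1) * Z[x][1] for x in D}
    best = max(value.values())
    SE |= {x for x in D if value[x] == best}
NSE = E - SE


def tchebycheff_optimum(lam, eps=1, rho=Fraction(1, 100)):
    """Optimal solutions of Bowman's augmented weighted Tchebycheff problem."""
    ref = [max(Z[x][k] for x in D) + eps for k in range(2)]  # Z^_k = M_k + eps

    def dist(x):
        gaps = [ref[k] - Z[x][k] for k in range(2)]
        return max(lam[k] * gaps[k] for k in range(2)) + rho * sum(gaps)

    best = min(dist(x) for x in D)
    return [x for x in D if dist(x) == best]


print(sorted(E), sorted(NSE))
print(tchebycheff_optimum((Fraction(1, 2), Fraction(1, 2))))
```

With $\lambda = (1/2, 1/2)$ the Tchebycheff problem returns the nonsupported solution (0, 1, 0), which no weighted sum can produce.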

Note that another characterization of E(P) is given in [2] for the particular case of binary variables.
Two types of problems can be analysed:

• Generating E(P) explicitly. Several methods have been proposed; they are reviewed in [10]. Below we present two of them, which appear general, characteristic and efficient.

• Determining interactively with the decision maker a 'best compromise' in E(P) according to the preferences of the decision maker. Some of the existing approaches are reviewed in [11]; below we describe three of these interactive methods.


Generation of E(P) 
Klein-Hannan Method 


See [5]. This is an iterative procedure for sequentially 
generating the complete set of efficient solutions for 
problem (P) (we suppose that the coefficients ae are 
integers); it consists in solving a sequence of progres- 
sively more constrained single objective ILP problems 
and can be implemented through use of any ILP algo- 
rithm. 
e (Initialization: step 0) An objective function / € {1, 
...» K} is chosen arbitrarily and the following single 
objective ILP problem is considered: 


(Po) max z)(X). 
XeD 


Let E(Po) be the set of all optimal solutions of (Po) 
and let Eo(P) be the set of solutions defined as Ey (P) 
= E(Po) N E(P). Thus, Eo(P) is the subset of non- 
dominated solutions in E(P9). 


• (Step $j$, $j \ge 1$) The efficient solutions generated at the previous steps are denoted by $X^r$, $r = 1, \dots, R$, i.e. $\bigcup_{i=0}^{j-1} E_i(P) = \{X^r : r = 1, \dots, R\}$. In this $j$th step, the following problem is solved:

$$(P_j)\qquad \max_{X \in D} z_l(X) \quad \text{s.t.} \quad \bigcap_{r=1}^{R} \left( \bigcup_{\substack{k=1\\ k \ne l}}^{K} \bigl\{ z_k(X) \ge z_k(X^r) + 1 \bigr\} \right).$$

The new set of constraints represents the requirement that a solution to $(P_j)$ be better on some objective $k \ne l$ than each efficient solution $X^r$ generated during the previous steps; an example of implementation of these constraints is given in [5]. The set of solutions $E_j(P)$ is then defined as $E_j(P) = E(P_j) \cap E(P)$, where $E(P_j)$ is the set of all optimal solutions of $(P_j)$. The procedure continues until, at some iteration $J$, the problem $(P_J)$ becomes infeasible; at this time $E(P) = \bigcup_{j=0}^{J-1} E_j(P)$.
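The following Python sketch mirrors the Klein-Hannan steps on an enumerated toy problem. It is an illustration only: each $(P_j)$ is solved by scanning the enumerated feasible set instead of calling an ILP solver, the data are hypothetical, and $E_j(P)$ is computed as the nondominated subset of $E(P_j)$. That subset equals $E(P_j) \cap E(P)$, because any solution dominating a member of $E(P_j)$ also satisfies the constraints of $(P_j)$ and is optimal for it.

```python
from itertools import product


def enumerate_feasible(C, T, d):
    """Map every feasible binary X (T X <= d) to its integer objective vector."""
    z = {}
    for X in product((0, 1), repeat=len(C[0])):
        if all(sum(t * x for t, x in zip(row, X)) <= di for row, di in zip(T, d)):
            z[X] = tuple(sum(c * x for c, x in zip(ck, X)) for ck in C)
    return z


def nondominated(sols, z):
    """Members of sols whose objective vector no other member dominates."""
    def dom(a, b):
        return all(x >= y for x, y in zip(a, b)) and a != b
    return [X for X in sols if not any(dom(z[Y], z[X]) for Y in sols)]


def klein_hannan(z, l=0):
    """Generate E(P) by solving the progressively constrained problems (P_j)."""
    K = len(next(iter(z.values())))
    found = []                      # X^r, r = 1, ..., R
    while True:
        # Feasible set of (P_j): beat every X^r by at least 1 on some k != l
        # (integer objective coefficients are assumed, as in the text).
        Dj = [X for X in z
              if all(any(z[X][k] >= z[Xr][k] + 1 for k in range(K) if k != l)
                     for Xr in found)]
        if not Dj:                  # (P_J) is infeasible: stop
            return found
        best = max(z[X][l] for X in Dj)
        optimal = [X for X in Dj if z[X][l] == best]    # E(P_j)
        found.extend(nondominated(optimal, z))          # E_j(P)


# Hypothetical toy data: 2 objectives, 4 binary variables, one knapsack row.
z = enumerate_feasible([[5, 3, 1, 4], [1, 4, 5, 2]], [[2, 3, 2, 3]], [6])
print([z[X] for X in klein_hannan(z)])   # [(9, 3), (8, 5), (7, 6), (5, 7), (4, 9)]
```

Each step adds at least one solution that violates its own constraint in the next step, so the loop terminates on any finite feasible set.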


Kiziltan-Yucaoglu Method 


See [4]. This is a direct adaptation to a multi-objective 
framework of the well-known Balas algorithm for the 
ILP problem with binary variables. 

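At node $S^r$ of the branch and bound scheme, the following problem is considered:

$$\text{‘max’} \sum_{j \in F^r} c_j x_j + \sum_{j \in B^r} c_j \quad \text{s.t.} \quad \sum_{j \in F^r} t_j x_j \le d^r, \qquad x_j \in \{0, 1\},\ j \in F^r,$$

where
- $B^r$ is the index set of variables assigned the value one;
- $F^r$ is the index set of free variables;
- $d^r = d - \sum_{j \in B^r} t_j$;
- $t_j$ is the $j$th column of $T$;
- $c_j$ is the vector of components $c_j^k$, $k = 1, \dots, K$.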
The node $S^r$ is called feasible when $d^r \ge 0$ and infeasible otherwise. The three basic rules of the branch and bound algorithm are:
• (bounding rule) A lower and an upper bound vector, $\underline{Z}^r$ and $\overline{Z}^r$ respectively, are defined as

$$\underline{Z}^r = \sum_{j \in B^r} c_j, \qquad \overline{Z}^r = \underline{Z}^r + Y^r,$$

where $Y^r = \sum_{j \in F^r} \max\{0, c_j\}$, the maximum being taken componentwise. The vector $\underline{Z}^r$ is added to a list E of existing lower bounds if $\underline{Z}^r$ is not dominated by any of the existing vectors of E. At the same time, any vector of E dominated by $\underline{Z}^r$ is discarded.

• (fathoming rules) In the multi-objective case, the feasibility of a node is no longer a sufficient condition for fathoming it. The three general fathoming conditions are:

- $\overline{Z}^r$ is dominated by some vector of E;

- the node $S^r$ is feasible and $\overline{Z}^r = \underline{Z}^r$;

- the node $S^r$ is infeasible and $\sum_{j \in F^r} \min(0, t_{ij}) > d_i^r$ for some $i = 1, \dots, m$.

The usual backtracking rules are applied. 

• (branching rule) A variable $x_l$, $l \in F^r$, is selected to be the branching variable.

- If the node $S^r$ is feasible, $l \in \{j \in F^r : c_j \not\le 0\}$.

- Otherwise, index $l$ is selected by the minimum infeasibility criterion:

$$\min_{j \in F^r} \sum_{i=1}^{m} \max\bigl(0, -d_i^r + t_{ij}\bigr).$$

When the explicit enumeration is complete, E(P) = E. 
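A compact recursive Python sketch of this implicit enumeration follows, for constraints $TX \le d$ with binary X. Two details are assumptions not fixed by the description above: the lower bound $\underline{Z}^r$ is offered to the list E only at feasible nodes (where it is the value of an actual solution), and ties in the branching rule are broken by the smallest index. The toy data are hypothetical.

```python
def dominates(a, b):
    """a dominates b (maximization): a >= b componentwise and a != b."""
    return all(x >= y for x, y in zip(a, b)) and a != b


def kiziltan_yucaoglu(C, T, d):
    K, n, m = len(C), len(C[0]), len(T)
    E = []                                   # nondominated lower-bound vectors

    def offer(Zlow):
        # Add Zlow unless a vector of E dominates or equals it,
        # and discard every vector of E that Zlow dominates.
        if any(e == Zlow or dominates(e, Zlow) for e in E):
            return
        E[:] = [e for e in E if not dominates(Zlow, e)] + [Zlow]

    def node(B, F):
        dr = [d[i] - sum(T[i][j] for j in B) for i in range(m)]          # d^r
        Zlow = tuple(sum(C[k][j] for j in B) for k in range(K))          # lower bound
        Zup = tuple(Zlow[k] + sum(max(0, C[k][j]) for j in F) for k in range(K))
        feasible = all(x >= 0 for x in dr)
        if feasible:
            offer(Zlow)
        # Fathoming rules.
        if any(dominates(e, Zup) for e in E):
            return
        if feasible and Zup == Zlow:
            return
        if not feasible and any(sum(min(0, T[i][j]) for j in F) > dr[i]
                                for i in range(m)):
            return
        # Branching rule.
        if feasible:
            cand = [j for j in F if any(C[k][j] > 0 for k in range(K))]
            if not cand:                     # defensive: already fathomed above
                return
            l = cand[0]
        else:                                # minimum infeasibility criterion
            l = min(F, key=lambda j: sum(max(0, T[i][j] - dr[i]) for i in range(m)))
        rest = [j for j in F if j != l]
        node(B + [l], rest)                  # x_l = 1
        node(B, rest)                        # x_l = 0

    node([], list(range(n)))
    return sorted(E)


print(kiziltan_yucaoglu([[5, 3, 1, 4], [1, 4, 5, 2]], [[2, 3, 2, 3]], [6]))
```

An infeasible node with no free variables is always fathomed by the third test (the empty sum 0 exceeds the negative slack), so `min(F, ...)` is never called on an empty list.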


Interactive Methods 


Such methods are particularly important for solving multi-objective applications. The general idea is to determine progressively a good compromise solution integrating the preferences of the decision maker.

The dialog with the decision maker consists of a succession of ‘calculation phases’ managed by the model and ‘information phases’ managed by the decision maker.

At each calculation phase, one or several new efficient solutions are determined, taking into account the information given by the decision maker at the preceding information phase. At each information phase, a few easy questions are asked of the decision maker to collect information about his/her preferences with regard to the new solutions.


Gonzalez-Reeves-Franz Algorithm


See [3]. In this method a set E of K efficient solutions is selected and updated in each algorithm step according to the decision maker's preferences. At the end of the procedure, E will contain the most preferred solutions.

The method is divided into two stages: the first one considers the supported efficient solutions, while the second one deals with the nonsupported efficient solutions.

• (Stage 1): Determination of the best supported efficient solutions. E is initialized with K optimal solutions of the K single objective ILP problems. Let us denote by Z the K corresponding points of the solutions of E in the objective space. At each iteration, a linear direction of search G(X) is built: G(X) is the inverse mapping into the decision space of the hyperplane defined by the points of Z in the objective space. A new supported efficient solution $X^*$ is determined by solving the single objective ILP problem $\max_{X \in D} G(X)$, and $Z^*$ is the corresponding point in the objective space. Then:

- if $Z^* \notin Z$ and the decision maker prefers solution $X^*$ to at least one solution of E: the least preferred solution is replaced in E by $X^*$ and a new iteration is performed;

- if $Z^* \notin Z$ and $X^*$ is not preferred to any solution in E: E is not modified and the second stage is initiated;

- if $Z^* \in Z$: Z defines a face of the efficient surface and the second stage is initiated.

• (Stage 2): Introduction of the best nonsupported solutions. We will not give details about this second stage (see [3] or [10]); let us just say that it is performed in the same spirit but considering the single objective problem

$$\max_{X \in D} G(X) \quad \text{s.t.} \quad G(X) \le \bar{G} - \varepsilon, \ \text{ with } \varepsilon > 0,$$

where $\bar{G}$ is the optimal value obtained for the last function G(X) considered.
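A minimal Python sketch of Stage 1 for the bi-objective case (K = 2) follows. Several points are assumptions rather than part of the published method: G(X) is taken as λ·z(X) with λ normal to the line through the two points of Z, the decision maker is replaced by a hypothetical `utility` function, ties in $\max_{X \in D} G(X)$ are broken in favour of points already in Z, and the problem is solved by enumeration on hypothetical toy data. Stage 2 is not sketched.

```python
from itertools import product


def enumerate_feasible(C, T, d):
    """Map every feasible binary X (T X <= d) to its objective vector z(X)."""
    z = {}
    for X in product((0, 1), repeat=len(C[0])):
        if all(sum(t * x for t, x in zip(row, X)) <= di for row, di in zip(T, d)):
            z[X] = tuple(sum(c * x for c, x in zip(ck, X)) for ck in C)
    return z


def stage1(z, utility, max_iter=20):
    """Stage 1 for K = 2; `utility` is a hypothetical stand-in for the decision maker."""
    # E starts with the optimal solutions of the two single objective problems.
    E = [max(z, key=lambda X: z[X][0]), max(z, key=lambda X: z[X][1])]
    for _ in range(max_iter):
        a, b = sorted((z[X] for X in E), reverse=True)   # a has the larger z_1
        lam = (b[1] - a[1], a[0] - b[0])                 # normal to the line through a, b
        Z = {z[X] for X in E}
        # Solve max_{X in D} G(X), G(X) = lam . z(X); ties favour points of Z.
        Xstar = max(z, key=lambda X: (lam[0] * z[X][0] + lam[1] * z[X][1], z[X] in Z))
        if z[Xstar] in Z:
            return E, "face of the efficient surface"
        worst = min(E, key=lambda X: utility(z[X]))
        if utility(z[Xstar]) <= utility(z[worst]):
            return E, "X* not preferred"
        E[E.index(worst)] = Xstar                        # replace least preferred solution
    return E, "iteration limit"


z = enumerate_feasible([[5, 3, 1, 4], [1, 4, 5, 2]], [[2, 3, 2, 3]], [6])
E, reason = stage1(z, utility=min)      # simulated DM prefers balanced vectors
print(sorted(z[X] for X in E), reason)
```

Comparing $X^*$ only with the least preferred member of E is enough: if it is not preferred to that one, it is not preferred to any solution of E.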


Steuer-Choo Method 


See [9]. Several interactive approaches for MOLP problems can also be applied to MOILP; among them, we mention only the Steuer-Choo method, which is a very general procedure based on problem $(P_\lambda)$ defined in the introduction.

The first iteration uses a widely dispersed group of weighting vectors λ to sample the set of efficient solutions. The sample is obtained by solving problem $(P_\lambda)$ for each of the λ values in the set. Then the decision maker is asked to identify the most preferred solution $X^{(1)}$ among the sample. At iteration j, a more refined grid of weighting vectors λ is used to sample the set of efficient solutions in the neighborhood of the point $z_k(X^{(j-1)})$ (k = 1, ..., K) in the objective space. Again the sample is obtained by solving several problems $(P_\lambda)$ and the most preferred solution $X^{(j)}$ is selected. The procedure continues using increasingly finer sampling until the solution is deemed to be acceptable.
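The sampling loop can be sketched in Python for K = 2 as follows. It is a hypothetical illustration: the decision maker is simulated by a `utility` function, each $(P_\lambda)$ is solved by enumeration, and the rule used to re-centre the finer grid (weights proportional to $1/(Z_k - z_k)$ at the chosen point, so that its Tchebychev contour passes diagonally through it) and the shrink factor are assumptions of this sketch.

```python
from itertools import product


def enumerate_feasible(C, T, d):
    """Map every feasible binary X (T X <= d) to its objective vector z(X)."""
    z = {}
    for X in product((0, 1), repeat=len(C[0])):
        if all(sum(t * x for t, x in zip(row, X)) <= di for row, di in zip(T, d)):
            z[X] = tuple(sum(c * x for c, x in zip(ck, X)) for ck in C)
    return z


def tchebychev(zX, Z, lam, rho):
    """Augmented weighted Tchebychev distance between z(X) and Z."""
    gaps = [Zk - zk for Zk, zk in zip(Z, zX)]
    return max(l * g for l, g in zip(lam, gaps)) + rho * sum(gaps)


def steuer_choo(z, utility, iterations=3, samples=5, shrink=0.5, eps=1.0, rho=1e-3):
    """K = 2 sketch; `utility` simulates the decision maker's choices."""
    Z = [max(v[k] for v in z.values()) + eps for k in range(2)]
    lo, hi = 0.0, 1.0                 # range for lambda_1 (lambda_2 = 1 - lambda_1)
    best = None
    for _ in range(iterations):
        grid = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
        # One problem (P_lambda) per weighting vector of the grid.
        sample = {min(z, key=lambda X: tchebychev(z[X], Z, (l1, 1 - l1), rho))
                  for l1 in grid}
        best = max(sample, key=lambda X: utility(z[X]))   # most preferred solution
        # Narrower grid centred on lambda_k proportional to 1 / (Z_k - z_k(best)).
        g = [Zk - zk for Zk, zk in zip(Z, z[best])]
        centre = (1 / g[0]) / (1 / g[0] + 1 / g[1])
        half = (hi - lo) * shrink / 2
        lo, hi = max(0.0, centre - half), min(1.0, centre + half)
    return best


z = enumerate_feasible([[5, 3, 1, 4], [1, 4, 5, 2]], [[2, 3, 2, 3]], [6])
best = steuer_choo(z, utility=min)    # simulated DM: prefers balanced objective vectors
print(z[best])
```

Because $Z_k = M_k + \varepsilon$ with ε > 0, every gap is positive and the re-centring step never divides by zero.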


The MOMIX Method 


(See [6].) The main characteristic of this method is the use of an interactive branch and bound concept, initially introduced in [7], to design the interactive phase.

• (First compromise): The following minimax optimization, with m = 1, is performed to determine the compromise $X^{(1)}$:

$$(P^{(m)})\qquad \min\ \delta \quad \text{s.t.} \quad \Pi_k^{(m)} \bigl(M_k^{(m)} - z_k(X)\bigr) \le \delta \ \ \forall k, \qquad X \in D^{(m)},$$

where
- $D^{(1)} = D$;
- $[m_k^{(m)}, M_k^{(m)}]$ are the variation intervals of the criteria k, provided by the pay-off table (see [8]);
- $\Pi_k^{(m)}$ are certain normalizing weights taking into account these variation intervals (see [8]).

Remark 1 If the optimal solution is not unique, an augmented weighted Tchebychev distance is required in order to obtain an efficient first solution.
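A small Python sketch of the first compromise on an enumerated toy problem follows. The normalizing weights are an assumption here ($\Pi_k = 1/(M_k - m_k)$, one common choice; the method itself refers to [8]), the pay-off table is built from one optimal solution per criterion, and, following Remark 1, a tiny augmentation term ρ breaks ties. The solution names and objective vectors are hypothetical.

```python
def payoff_bounds(z, D):
    """Variation intervals [m_k, M_k] read from the pay-off table of the solutions in D."""
    K = len(z[D[0]])
    opt = [max(D, key=lambda X: z[X][k]) for k in range(K)]   # one optimum per criterion
    M = [z[opt[k]][k] for k in range(K)]
    m = [min(z[X][k] for X in opt) for k in range(K)]
    return m, M


def compromise(z, D, rho=1e-6):
    """Solve (P^(m)): min delta s.t. Pi_k (M_k - z_k(X)) <= delta, X in D."""
    m, M = payoff_bounds(z, D)
    Pi = [1 / (Mk - mk) if Mk > mk else 1.0 for mk, Mk in zip(m, M)]   # assumed weights

    def delta(X):
        gaps = [Mk - zk for Mk, zk in zip(M, z[X])]
        return max(p * g for p, g in zip(Pi, gaps)) + rho * sum(gaps)

    return min(D, key=delta)


# Hypothetical toy data: solution name -> (z_1, z_2).
z = {"a": (9, 3), "b": (8, 5), "c": (7, 6), "d": (6, 6),
     "e": (5, 7), "f": (4, 9), "g": (3, 4)}
print(compromise(z, list(z)))   # c
```

Here the pay-off table gives $M = (9, 9)$ and $m = (4, 3)$, and solution c = (7, 6) has the smallest normalized worst deviation, max(2/5, 3/6) = 0.5.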


• (Interactive phases): These are integrated in an interactive branch and bound tree; a first step (a depth-first progression in the tree) leads to the determination of a first good compromise; the second step (a backtracking procedure) confirms the degree of satisfaction achieved by the decision maker or finds a better compromise if necessary.

- (Depth-first progression): For $m \ge 1$, at the $m$th iteration let

1) $X^{(m)}$ be the $m$th compromise;

2) $z_k^{(m)}$ be the corresponding values of the criteria;

3) $[m_k^{(m)}, M_k^{(m)}]$ be the variation intervals of the criteria; and

4) $\Pi_k^{(m)}$ be the weights of the criteria.

The decision maker has to choose, at this $m$th iteration, the criterion $l_m(1) \in \{1, \dots, K\}$ he/she is willing to improve in priority. Then a new constraint is introduced so that the feasible set becomes

$$D^{(m+1)} = D^{(m)} \cap \bigl\{X : z_{l_m(1)}(X) > z_{l_m(1)}^{(m)}\bigr\}.$$

Further, the variation intervals $[m_k^{(m+1)}, M_k^{(m+1)}]$ and the weights $\Pi_k^{(m+1)}$ are updated on the new feasible set $D^{(m+1)}$. The new compromise $X^{(m+1)}$ is obtained by solving the problem $(P^{(m+1)})$.

Different tests allow this first step to be terminated. The node (m+1) is fathomed if one of the following conditions is verified:

a) $D^{(m+1)} = \emptyset$;

b) $M_k^{(m+1)} - m_k^{(m+1)} < \varepsilon_k$, $\forall k$;

c) the vector Z of the incumbent values (values of the criteria for the best compromise already determined) is preferred to the new ideal point (of components $M_k^{(m+1)}$).

The first step of the procedure is stopped if either more than q successive iterations do not bring an improvement of the incumbent point Z or more than Q iterations have been performed. Note that the parameters $\varepsilon_k$, q and Q are fixed in agreement with the decision maker.

- (Backtracking procedure): It can be hoped that the appropriate choice of the criterion $z_{l_m(1)}$, at each level m of the depth-first progression, has been made so that at the end of the first step a good compromise has been found. Nevertheless, it is worth examining some other parts of the tree to confirm the satisfaction of the decision maker. The complete tree is generated in the following manner: at each level, K subnodes are introduced by successively adding the constraints:

$$\begin{aligned}
&z_{l_m(1)}(X) > z_{l_m(1)}^{(m)};\\
&z_{l_m(2)}(X) > z_{l_m(2)}^{(m)} \ \text{ and } \ z_{l_m(1)}(X) \le z_{l_m(1)}^{(m)};\\
&\qquad\vdots\\
&z_{l_m(K)}(X) > z_{l_m(K)}^{(m)} \ \text{ and } \ z_{l_m(k)}(X) \le z_{l_m(k)}^{(m)} \ \text{ for all } k = 1, \dots, K-1,
\end{aligned}$$

where $l_m(k) \in \{1, \dots, K\}$ is the $k$th objective that the decision maker wants to improve at the $m$th level of the branch and bound tree.

At each level m, the criteria are thus ordered according to the priorities of the decision maker with regard to the compromise $X^{(m)}$.

The usual backtracking procedure is applied; yet it seems unnecessary to explore the whole tree. Indeed, the subnodes $k > K'$ of each branching correspond to a simultaneous relaxation of those criteria $l_m(k)$, $k \le K'$, that the decision maker wants to improve in priority!

Therefore, the subnodes $k > K' = 2$ or 3, for instance, almost certainly do not bring any improved solutions.

The fathoming tests and the stopping tests are again applied in this second step.
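The depth-first progression can be sketched in the same way. This is a hypothetical Python illustration: `choose` simulates the decision maker's choice of $l_m(1)$, the weights $\Pi_k = 1/(M_k - m_k)$ are an assumption, test c) (which needs the decision maker's judgement) is omitted, and only the fathoming tests a) and b) and the iteration limit Q are applied.

```python
def payoff_bounds(z, D):
    """Variation intervals [m_k, M_k] read from the pay-off table of the solutions in D."""
    K = len(z[D[0]])
    opt = [max(D, key=lambda X: z[X][k]) for k in range(K)]
    M = [z[opt[k]][k] for k in range(K)]
    m = [min(z[X][k] for X in opt) for k in range(K)]
    return m, M


def compromise(z, D, rho=1e-6):
    """(P^(m)) solved by enumeration, with an augmentation term to break ties."""
    m, M = payoff_bounds(z, D)
    Pi = [1 / (Mk - mk) if Mk > mk else 1.0 for mk, Mk in zip(m, M)]   # assumed weights

    def delta(X):
        gaps = [Mk - zk for Mk, zk in zip(M, z[X])]
        return max(p * g for p, g in zip(Pi, gaps)) + rho * sum(gaps)

    return min(D, key=delta)


def depth_first(z, choose, eps, Q=10):
    """Return the successive compromises z(X^(1)), z(X^(2)), ..."""
    D = list(z)                                  # D^(1) = D
    history = []
    for _ in range(Q):                           # at most Q iterations
        X = compromise(z, D)                     # X^(m) solves (P^(m))
        history.append(z[X])
        l = choose(z[X])                         # criterion l_m(1) to improve
        D = [Y for Y in D if z[Y][l] > z[X][l]]  # D^(m+1)
        if not D:                                # test a): D^(m+1) is empty
            break
        m, M = payoff_bounds(z, D)
        if all(Mk - mk < e for mk, Mk, e in zip(m, M, eps)):   # test b)
            break
    return history


z = {"a": (9, 3), "b": (8, 5), "c": (7, 6), "d": (6, 6),
     "e": (5, 7), "f": (4, 9), "g": (3, 4)}
# Simulated DM: always asks to improve the currently worse criterion.
print(depth_first(z, choose=lambda v: v.index(min(v)), eps=(0.5, 0.5)))
```

With these toy data the first compromise is (7, 6); improving $z_2$ leaves (5, 7) and (4, 9), the next compromise is (4, 9), and improving $z_1$ then leaves a single solution whose variation intervals have zero width, so test b) stops the progression.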
A multi-objective (multicriteria) mixed integer programming (MOMIP) problem is a mathematical programming problem that considers more than one objective function and in which some but not all of the variables are constrained to be integer valued. The integer variables can either be binary or take on general integer values. The problem may be stated as follows:

$$\max\ z_1 = f_1(x), \quad \dots, \quad \max\ z_k = f_k(x) \qquad \text{s.t.} \ x \in X,$$

where $X \subset \mathbb{R}^n$ denotes the nonconvex set of feasible solutions defined by a set of functional constraints, $x \ge 0$ and $x_j$ integer, $j \in J \subset \{1, \dots, n\}$. It is assumed that X is compact (closed and bounded) and nonempty.

Although a MOMIP problem may be nonlinear, models with linear constraints and linear objective functions have been considered more often. In a multi-objective mixed integer linear programming (MOMILP) problem, the functional constraints can be defined as $Ax \le b$, and the objective functions as $f_i(x) = c_i x$, $i = 1, \dots, k$, where A is an $m \times n$ matrix, b is an m-dimensional column vector and $c_i$, $i = 1, \dots, k$, are n-dimensional row vectors.

Multi-objective mixed integer programming is very 
useful for many areas of application such as commu- 
nication, transportation and location, among others. 
Integer variables are required in a real-world model 
whenever it is sought to incorporate discrete phenom- 
ena; for instance, investment choices, production lev- 
els, fixed charges, logical conditions or disjunctive con- 
straints. However, research on MOMIP has been rather 
limited. Concerning multi-objective mathematical programming, most research efforts have so far been devoted to linear programming with continuous variables (MOLP). The introduction of discrete phenomena into multi-objective models leads to all-integer or mixed integer problems that are more difficult to tackle. They cannot be handled by most MOLP approaches because the feasible set is no longer convex. Also, there
are multi-objective approaches designed for all-integer 
problems that do not apply to the mixed integer case. 
Therefore, even for the linear case, techniques for deal- 
ing with multi-objective mixed integer programming 
involve more than the combination of MOLP with 
multi-objective integer programming techniques. 
Efficiency and Nondominance


The concept of efficiency (or nondominance) in MOMIP is defined as usual for multi-objective mathematical programming: a solution $x \in X$ is efficient if and only if there does not exist another $x' \in X$ such that $f_i(x') \ge f_i(x)$ for all $i \in \{1, \dots, k\}$ and $f_i(x') > f_i(x)$ for at least one $i$. A solution $x \in X$ is weakly efficient if and only if there does not exist another $x' \in X$ such that $f_i(x') > f_i(x)$ for all $i \in \{1, \dots, k\}$.
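Both definitions can be checked directly on a finite list of criterion points. This Python sketch (maximization, hypothetical points) separates nondominated points from points that are only weakly nondominated:

```python
def dominates(a, b):
    """a dominates b: a_i >= b_i for all i and a_i > b_i for at least one i."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def strictly_dominates(a, b):
    """a_i > b_i for all i."""
    return all(x > y for x, y in zip(a, b))


def nondominated(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weakly_nondominated(points):
    return [p for p in points if not any(strictly_dominates(q, p) for q in points)]


# Hypothetical criterion points (z_1, z_2).
points = [(0, 10), (4, 6), (6, 2), (8, 2), (3, 3)]
print(nondominated(points))         # [(0, 10), (4, 6), (8, 2)]
print(weakly_nondominated(points))  # [(0, 10), (4, 6), (6, 2), (8, 2)]
```

The point (6, 2) is dominated by (8, 2), yet no point is strictly better in both criteria, so it is weakly nondominated only.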

Let $Z \subset \mathbb{R}^k$ be the image of the feasible region X in the criterion (objective function) space. A criterion point $z \in Z$ corresponding to a (weakly) efficient solution $x \in X$ is called (weakly) nondominated. The designations ‘efficient’, ‘nondominated’ and ‘Pareto optimal’ are often used as synonyms.


Supported and Unsupported 
Nondominated Solutions 


Since the feasible region is nonconvex, unsupported nondominated points/solutions may exist in a MOMIP problem. A nondominated point $z \in Z$ is unsupported if it is dominated by a convex combination (which does not belong to Z) of other nondominated criterion points (belonging to Z). In Fig. 1 the line segment from A to B plus D is the set of supported nondominated criterion points. The line segment from C to D excluding C and D is the set of unsupported nondominated criterion points. Note that convex combinations of B and D dominate the line segment from C to D, excluding D. C is a weakly nondominated point.

Multi-objective Mixed Integer Programming, Figure 1: Nondominated criterion points of a MOMILP problem (criterion space $z_1$, $z_2$; points A, B, C, D)


Characterization of the Nondominated Set 


Unlike MOLP, the nondominated (or efficient) set of MOMIP problems cannot be fully determined by parameterizing on λ the weighted-sums program

$$(P_\lambda)\qquad \max \left\{ \sum_{i=1}^{k} \lambda_i f_i(x) : x \in X \right\},$$

where $\lambda \in \Lambda = \bigl\{\lambda \in \mathbb{R}^k : \lambda_i > 0\ \forall i,\ \sum_{i=1}^{k} \lambda_i = 1\bigr\}$.

The unsupported nondominated solutions cannot be reached even if the complete parameterization on λ is attempted.
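The claim can be illustrated numerically. In the following Python sketch the nondominated points are hypothetical; (5, 7) is unsupported because the midpoint (6, 7) of (8, 5) and (4, 9) dominates it, and sweeping λ over a grid of strictly positive weights never selects it:

```python
def weighted_sum_optima(points, steps=20):
    """Points returned by max sum_i lambda_i z_i for lambda_1 = 1/steps, ..., (steps-1)/steps."""
    reached = set()
    for i in range(1, steps):
        lam = (i / steps, 1 - i / steps)          # lambda_i > 0, lambda_1 + lambda_2 = 1
        reached.add(max(points, key=lambda p: lam[0] * p[0] + lam[1] * p[1]))
    return reached


nondominated = [(9, 3), (8, 5), (7, 6), (5, 7), (4, 9)]
reached = weighted_sum_optima(nondominated)
print(sorted(reached))
print((5, 7) in reached)   # False: the unsupported point is never optimal
```

The point (7, 6) lies on the segment joining (8, 5) and (4, 9): it is optimal only for λ = (0.5, 0.5), where it ties with both endpoints, so whether the sweep reports it depends on how ties are broken.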

Researchers in multi-objective mathematical programming recognized this fact early and stated other characterizations of the nondominated set that fit MOMIP and, in particular, MOMILP problems. Basically, two main characterizations are defined. One consists of introducing additional constraints into the weighted-sums program. Generally, these constraints impose bounds on the objective function values. This form of characterization may be regarded as a particularization of the general characterization provided by R.M. Soland [13]. The other is based on the Tchebycheff theory, whose theoretical foundation originated with V.J. Bowman [3]. More details about these characterizations and about how they are used to compute nondominated solutions will be given later. Although they provide very important theoretical results, the characterizations of the nondominated set do not offer an explicit means of providing decision support for MOMIP problems. However, some authors have developed decision support methods for these problems.


Interactive Versus Noninteractive Methods 


Methods may be either noninteractive (in general, generating methods designed to find the whole or a subset of the nondominated solutions) or interactive (characterized by phases of human intervention alternated with phases of computation). Generating methods for MOMIP problems usually require an excessive amount of computational resources, both in processing time and storage capacity. Even specialized generating algorithms developed just for bi-objective problems, which profit from graphical representations in the criterion space, tend to be inadequate for dealing with large problems. Nevertheless, the distinction between interactive and generating methods is not always clear. Some approaches attempt to find a representative subset of the nondominated set (generating methods according to the above definition) and could easily be embodied in an interactive framework. The bi-objective method of R. Solanki [14] may be regarded as an example of such an approach.

Taking into account the difficulties mentioned above and the large number of nondominated solutions in many problems, special attention will be paid to interactive methods. First of all, a short remark is made about the major paradigms followed by the authors of interactive methods. Some authors assume that the decision maker's (DM) preferences can be represented by an implicit utility function. The interactive process consists of building a protocol of interaction aimed at discovering the optimum (or an approximation of it) of that implicit utility function. Convergence to this optimum requires that the DM's responses given throughout the interactive process contain no contradictions.

In contrast with implicit utility function approaches, the open communication approaches are based on a progressive and selective learning of the nondominated set. The terminology of open communication is inspired by the concept of open exchange defined by P. Feyerabend [6]. Such multi-objective approaches are not intended to converge to any ‘best’ compromise solution but to help the DM avoid searching for nondominated solutions he/she is not at all interested in. There are no irrevocable decisions during the whole process, and the DM is always allowed to go ‘backwards’ at a later interaction. So, at each interaction, the DM is only asked to give some indications of the direction the search for nondominated solutions must follow, or occasionally to introduce additional constraints. The process only finishes when the DM considers that he/she has gained sufficient insight into the nondominated solution set. Using the terminology of B. Roy [12], ‘convergence’ must give way to ‘creation’. The interactive process is a constructive process, not the search for something ‘pre-existent’.

Although we personally prefer the open communication methods, we will include in the next section a tentative classification of both, drawing out some differences and similarities between them. We adopt this perspective because this question is not specific to mixed integer programming, and arguments for or against each approach, besides being subjective, are the same as in other multi-objective programming fields. Furthermore, since MOMIP is still in its early stages, no behavioral studies exist addressing the use of procedures within this context.

As we have mentioned before, research on MOMIP has been rather scarce in comparison with other fields of multi-objective mathematical programming, notably MOLP. We will mention herein some well-known methods specially designed for MOMIP or far more generally applicable.


Computing Processes and Their Use 
in Interactive Methods 


Weighted-Sums Programs 
with Additional Constraints 


The introduction of bounds on the objective function values into the weighted-sums program $(P_\lambda)$ enables this program to also compute unsupported nondominated solutions:

$$(P_{\lambda,g})\qquad \max \left\{ \sum_{i=1}^{k} \lambda_i f_i(x) : x \in X,\ f(x) \ge g \right\},$$

where $f(x) = (f_1(x), \dots, f_k(x))$, $\lambda \in \Lambda$ and g is a vector of objective bounds. Besides the fact that every solution obtained by $(P_{\lambda,g})$ is nondominated, for any particular nondominated solution there always exists a $g \in \mathbb{R}^k$ such that $(P_{\lambda,g})$ yields it. Other types of additional constraints can also be used.
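A small Python sketch solves $(P_{\lambda,g})$ by enumeration over hypothetical objective vectors. With the bound vector g = (5, 7) it returns the unsupported point (5, 7), which no weighted sum alone can select, because (5, 7) is dominated by the midpoint (6, 7) of (8, 5) and (4, 9):

```python
def solve_P_lambda_g(points, lam, g):
    """max sum_i lam_i z_i subject to z >= g (componentwise); None if infeasible."""
    admissible = [p for p in points if all(pi >= gi for pi, gi in zip(p, g))]
    if not admissible:
        return None
    return max(admissible, key=lambda p: sum(l * pi for l, pi in zip(lam, p)))


# Hypothetical objective vectors of all feasible solutions (dominated ones included).
points = [(0, 0), (5, 1), (3, 4), (1, 5), (4, 2), (8, 5), (6, 6),
          (9, 3), (4, 9), (7, 6), (5, 7)]
print(solve_P_lambda_g(points, (0.5, 0.5), (5, 7)))    # (5, 7)
print(solve_P_lambda_g(points, (0.5, 0.5), (5, 6)))    # (7, 6)
print(solve_P_lambda_g(points, (0.5, 0.5), (9, 9)))    # None: bounds too tight
```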

A scalarizing program that consists of the weighted-sums program combined with additional constraints is used for computing nondominated solutions in the interactive branch and bound method of B. Villarreal et al. [18]. The additional constraints are bounds imposed on integer variables by the branching process. This method, which is devoted to MOMILP problems, was later improved in [8] and [11]. Starting with the application of the well-known (MOLP) Zionts-Wallenius procedure to the linear relaxation of the MOMILP problem, the method then employs a branch and bound phase until an integer solution that satisfies the DM is achieved. An implicit utility function is assumed, and the DM's preferences are assessed using pairwise evaluations of decision alternatives and trade-off analysis. In light of the DM's underlying utility function, decisions are successively made on whether to apply the Zionts-Wallenius procedure again to the linear relaxation of a candidate multi-objective subproblem or to continue branching by appending a constraint on a variable.

Another method that uses particular forms of $(P_{\lambda,g})$ to compute nondominated solutions is due to Y. Aksoy [1]. This is an interactive method for bicriterion mixed integer programs that employs a branch and bound scheme to divide the subset of nondominated solutions considered at each node into two disjoint subsets. The branching process seeks to bisect the range of nondominated values for $z_2$ at the node under consideration, checking whether a nondominated point exists whose value for $z_2$ is in the middle of the range. If no such solution exists, that subset is divided using the two nondominated points whose values for $z_2$ are the closest (one above and the other below) to the middle value. These nondominated solutions are obtained by solving $(P_{\lambda,g})$, optimizing one objective function and bounding the other. The interactive process requires the DM to make pairwise comparisons in order to determine the branching node and to adjust the incumbent solution to the preferred nondominated solution. It is assumed that the DM's preferences are consistent, transitive and invariant over the process, which aims to optimize the DM's implicit utility function.

C. Ferreira et al. [5] proposed a decision support system for bicriterion mixed integer programs. The interactive process follows an open communication protocol asking the DM to specify bounds for the objective function values. These bounds are input into $(P_{\lambda,g})$, defining subregions in which the search for nondominated solutions is carried on. Some objective space regions are progressively eliminated either by dominance or by infeasibility.


Tchebycheff and Achievement Scalarizing Programs 


Bowman [3] proved that the parameterization on w of $\min_{x \in X} \|\bar{f} - f(x)\|_w$ generates the nondominated set, where $w_i > 0$ for all i, $\sum_{i=1}^{k} w_i = 1$, $\bar{f}$ is a criterion point such that $\bar{f} > f(x)$ for all $x \in X$, and $\|\bar{f} - f(x)\|_w$ denotes the w-weighted Tchebycheff metric, that is, $\max_{1 \le i \le k} \{w_i |\bar{f}_i - f_i(x)|\}$. This scalarizing program is equivalent to

$$(T_w)\qquad \min\ \alpha \quad \text{s.t.} \quad w_i\bigl(\bar{f}_i - f_i(x)\bigr) \le \alpha, \ \ i = 1, \dots, k, \qquad x \in X,\ \alpha \ge 0.$$

$(T_w)$ may yield weakly nondominated solutions (for instance, point C in Fig. 1). If the objective function of $(T_w)$ is replaced by $\alpha - \rho \sum_{i=1}^{k} f_i(x)$, with ρ a small positive value, all the solutions returned by this augmented weighted Tchebycheff program are nondominated. R.E. Steuer and E.-U. Choo [16] proved that, for the finite-discrete and polyhedral feasible region cases, there always exists a ρ small enough to reach the whole nondominated set.
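The difference between $(T_w)$ and its augmented version shows up on a hypothetical set of criterion points in which C = (6, 9) is only weakly nondominated (D = (8, 9) is at least as good in both criteria). The Python sketch below solves both programs by enumeration; the reference point $\bar{f}$ = (10, 11), the weights and ρ are illustrative choices:

```python
def tchebycheff(points, fbar, w, rho=0.0):
    """min alpha - rho * sum_i f_i, alpha = max_i w_i (fbar_i - f_i); rho = 0 gives (T_w)."""
    def objective(p):
        alpha = max(wi * (fb - fi) for wi, fb, fi in zip(w, fbar, p))
        return alpha - rho * sum(p)
    return min(points, key=objective)    # min() keeps the first of tied points


C, D, B = (6, 9), (8, 9), (9, 5)
points = [C, D, B]
fbar, w = (10, 11), (0.1, 0.9)
print(tchebycheff(points, fbar, w))             # (6, 9): weakly nondominated
print(tchebycheff(points, fbar, w, rho=1e-3))   # (8, 9): nondominated
```

Without augmentation, C and D tie at α = 1.8 and the tie is resolved arbitrarily (here in favour of C); the term $-\rho \sum_i f_i$ favours D.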

Concerning the MOMIP case, although there may be portions of the nondominated set that the program is unable to compute even for very small ρ (for example, the line segment from C to C' in Fig. 2, for a given ρ), this characterization is still usable in practice. Note that ρ can be set so small that the DM is unable to discriminate between those solutions and a nearby weakly nondominated solution (this corresponds to C' getting closer to C in Fig. 2).

In [16] and [15] a lexicographic weighted Tchebycheff program is proposed for the nonlinear and infinite-discrete feasible region cases to overcome this drawback of the augmented weighted Tchebycheff program. The lexicographic approach can also be applied to the mixed integer (linear) case. However, it is more difficult to implement since two stages of optimization are employed. At the first stage only α is minimized. When the first stage results in alternative optima, a second stage is required. It consists of minimizing $-\sum_{i=1}^{k} f_i(x)$ over the solutions that minimize α, in order to eliminate the weakly nondominated solutions.
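A two-stage (lexicographic) version can be sketched in Python by enumeration over a hypothetical set of criterion points in which (6, 9) is only weakly nondominated. Exact rational weights (`fractions.Fraction`) are used so that the alternative optima of the first stage are detected without a floating-point tolerance.

```python
from fractions import Fraction


def lexicographic_tchebycheff(points, fbar, w):
    """Stage 1: minimize alpha; stage 2: minimize -sum_i f_i over the stage-1 optima."""
    def alpha(p):
        return max(wi * (fb - fi) for wi, fb, fi in zip(w, fbar, p))
    best_alpha = min(alpha(p) for p in points)
    stage1 = [p for p in points if alpha(p) == best_alpha]   # alternative optima
    return min(stage1, key=lambda p: -sum(p))


points = [(6, 9), (8, 9), (9, 5)]       # (6, 9) is only weakly nondominated
w = (Fraction(1, 10), Fraction(9, 10))
print(lexicographic_tchebycheff(points, (10, 11), w))   # (8, 9)
```

Unlike the augmented form, no ρ has to be chosen: the second stage is only a tie-breaker among the first-stage optima.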

Besides $(T_w)$ (in either the augmented or the lexicographic form), there are other similar approaches that also allow the nondominated set of multi-objective mixed integer programs to be characterized.

Multi-objective Mixed Integer Programming, Figure 2: Illustration of the augmented weighted Tchebycheff metric

An approach of this type consists of discarding the w-vector (or fixing it) and varying $\bar{f}$, the criterion reference point that represents the DM's aspiration levels. This scalarizing program can be denoted by $(T_{\bar{f}})$. There always exist reference points satisfying $\bar{f} > f(x)$ for all $x \in X$ such that $(T_{\bar{f}})$ produces a particular nondominated solution $z = f(x)$. The variation of $\bar{f}$ can be done according to a vector direction θ, leading to $(T_{\bar{f}+\theta})$: the reference points are thus projected onto the nondominated set. Reference points that do not satisfy the condition $\bar{f} > f(x)$ for all $x \in X$ may also be considered, provided that the α variable is defined without sign restriction. This corresponds to the minimization of a distance from z to the reference point if the latter is not attainable, and to the maximization of such a distance if the reference point is attainable. If reference or aspiration levels are used as controlling parameters, the (weighted) Tchebycheff metric changes its form of dependence on the controlling parameters and should be interpreted as an achievement function [9].
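A minimal Python sketch of $(T_{\bar{f}})$ with α unrestricted in sign, solved by enumeration over hypothetical objective vectors, shows the sign change: α is positive for an unattainable reference point and non-positive for an attainable one. The augmented form (small ρ) is used, and w = (1, 1) is an illustrative choice.

```python
def achievement(points, fbar, w, rho=1e-3):
    """Solve (T_fbar) by enumeration with alpha free in sign; returns (point, alpha)."""
    def alpha(p):
        return max(wi * (fb - fi) for wi, fb, fi in zip(w, fbar, p))
    best = min(points, key=lambda p: alpha(p) - rho * sum(p))   # augmented form
    return best, alpha(best)


points = [(0, 0), (5, 1), (3, 4), (1, 5), (4, 2), (8, 5), (6, 6),
          (9, 3), (4, 9), (7, 6), (5, 7)]
print(achievement(points, (10, 10), (1, 1)))   # ((7, 6), 4): unattainable reference point
print(achievement(points, (5, 5), (1, 1)))     # ((7, 6), -1): attainable reference point
```

In both cases the dominated point (6, 6) ties with (7, 6) on α, and the augmentation term selects the nondominated one.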

Like $(T_w)$, the simplest form of $(T_{\bar{f}})$ may produce weakly nondominated solutions. The augmented form is a good substitute in practice, and the lexicographic approach guarantees that all nondominated solutions can be reached. In what follows, let $(T_\cdot)$ denote either the simplest, the augmented or the lexicographic form.

Scalarizing programs $(T_w)$, $(T_{\bar{f}})$ and their extensions or slightly different formulations are used to generate nondominated solutions in several (interactive) methods proposed in the literature, namely in the following ones.

Steuer and Choo [16] proposed a general purpose interactive multi-objective programming method that assumes an implicit DM's utility function without any special restriction on its shape. The strategy of the interactive procedure is to sample a series of progressively smaller subsets of nondominated solutions. At each interaction, the DM selects his/her preferred solution from a sample of nondominated solutions obtained from $(T_w)$ with several w-vectors and the ideal criterion point in the role of $\bar{f}$. The solution preferred by the DM provides information to tighten the set of w-vectors for the next interaction. The procedure terminates when a nondominated criterion point sufficiently close to the optimal criterion point of the underlying utility function is found.

Solanki's method [14], which is designed for bi-objective mixed integer linear programs, is an adaptation of the noninferior set estimation (NISE) method developed by J.L. Cohon for bi-objective linear programs. It seeks to generate a representative subset of nondominated solutions by combining the key features of NISE with weighted Tchebycheff scalarizing programs. At each iteration, a new nondominated solution, say $z^3$, is computed by solving $(T_w)$ for specific w and $\bar{f}$, ensuring that $z^3$ belongs to the region between a pair of previously determined nondominated criterion points, say $(z^1, z^2)$. This pair is then replaced by $(z^1, z^3)$ and $(z^3, z^2)$. The approximation of the nondominated surface is progressively improved, thus decreasing the ‘errors’ associated with the approximate representation of the pairs. This ‘error’ is measured by the largest range of the two objectives for the points forming the pair. The algorithm finishes when the maximum ‘error’ is lower than a predefined maximum allowable ‘error’.
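The pair-splitting logic can be sketched in Python as below. This is a hypothetical illustration, not Solanki's exact formulation: $(T_w)$ is applied by enumeration to a precomputed list of nondominated points, the candidates are restricted explicitly to the box between the two points of the pair, and the pair's local ideal point with weights inversely proportional to the two ranges stands in for the "specific w and $\bar{f}$" of the method.

```python
def nondominated(points):
    def dom(a, b):
        return all(x >= y for x, y in zip(a, b)) and a != b
    return [p for p in points if not any(dom(q, p) for q in points)]


def solanki_sketch(points, tol):
    """Representative nondominated subset for two objectives (maximization)."""
    nd = nondominated(points)
    a = max(nd, key=lambda p: (p[0], p[1]))      # best point for z_1
    b = max(nd, key=lambda p: (p[1], p[0]))      # best point for z_2
    found = {a, b}
    pairs = [(a, b)] if a != b else []
    while pairs:
        a, b = pairs.pop()                        # a has the larger z_1, b the larger z_2
        error = max(a[0] - b[0], b[1] - a[1])     # largest range of the two objectives
        if error <= tol:
            continue
        between = [p for p in nd if b[0] < p[0] < a[0] and a[1] < p[1] < b[1]]
        if not between:
            continue
        fbar = (a[0], b[1])                       # local ideal point of the pair
        w = (1 / (a[0] - b[0]), 1 / (b[1] - a[1]))
        c = min(between, key=lambda p: max(w[0] * (fbar[0] - p[0]),
                                           w[1] * (fbar[1] - p[1])))   # (T_w) on the box
        found.add(c)
        pairs += [(a, c), (c, b)]                 # replace (z1, z2) by (z1, z3), (z3, z2)
    return sorted(found)


points = [(0, 0), (5, 1), (3, 4), (1, 5), (4, 2), (8, 5), (6, 6),
          (9, 3), (4, 9), (7, 6), (5, 7)]
print(solanki_sketch(points, tol=3))   # coarse representation
print(solanki_sketch(points, tol=1))   # finer representation
```

A larger allowable ‘error’ stops the splitting earlier: with tol = 3 only (9, 3), (7, 6) and (4, 9) are generated, while tol = 1 recovers all five nondominated points of this toy set.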

Another interactive method capable of solving MOMIP problems was developed by A. Durso [4]. This method employs a branching scheme considering progressively smaller portions of the nondominated set by imposing lower bounds on the criterion values. At each interaction, the k nondominated solutions that define the (quasi-)ideal criterion point for each new node are calculated. The DM is then asked to select the node for branching by choosing the preferred ideal point. The branching process begins by solving an equally weighted augmented Tchebycheff program to determine a ‘centralized’ nondominated point for the subset of the node under exploration. Once the DM chooses the most preferred of the k + 1 nondominated points already known for this node, say $\bar{z}$, up to k new nodes (children) are created. Each child inherits its parent's bounding constraints and uses $\bar{z}$ to further restrict one of them. Thus, the ith child restricts the ith criterion by imposing $f_i(x) \ge \bar{z}_i + \delta$, with δ small positive. This approach may be regarded as an open communication procedure that terminates when the DM is satisfied with the incumbent solution (the preferred nondominated solution obtained so far).
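The creation of child nodes and of their quasi-ideal points can be sketched in Python as follows. In the method itself each component of a node's ideal point comes from solving an optimization problem over that node; here, as a simplifying assumption, it is read off a hypothetical precomputed list of nondominated points, and δ = 1 is an illustrative value.

```python
NEG_INF = float("-inf")


def quasi_ideal(points, lower):
    """Componentwise maximum over the points meeting the node's lower bounds (None if empty)."""
    inside = [p for p in points if all(pi >= li for pi, li in zip(p, lower))]
    if not inside:
        return None
    return tuple(max(p[i] for p in inside) for i in range(len(lower)))


def children(lower, zbar, delta):
    """Child i inherits `lower` and additionally imposes f_i(x) >= zbar_i + delta."""
    kids = []
    for i in range(len(lower)):
        child = list(lower)
        child[i] = max(child[i], zbar[i] + delta)
        kids.append(tuple(child))
    return kids


nondominated = [(9, 3), (8, 5), (7, 6), (5, 7), (4, 9)]
root = (NEG_INF, NEG_INF)                     # no bounds at the root node
for child in children(root, zbar=(7, 6), delta=1):
    print(child, quasi_ideal(nondominated, child))
```

A child whose bounds leave no point (`quasi_ideal` returns None) would simply not be created, which is why the text speaks of up to k new nodes.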

M.J. Alves and J. Climaco [2] proposed a MOMILP 
open communication interactive approach. It combines 
the Tchebycheff theory with the traditional branch and 
bound technique for solving single-objective mixed in- 
teger programs. At each interaction, the DM speci- 
fies either a reference point f, which is input in (Ty) 
to compute a nondominated solution via branch and 
bound, or just selects an objective function, say fj, 
he/she wants to improve with respect to the previous 
nondominated solution. In the latter case, the refer- 
ence point is automatically adjusted by increasing the 
jth component of f keeping the others equal, in order 
to produce new nondominated solutions (directional 
search) more suited to the DM’s preferences. This in- 
volves an iterative process of sensitivity analysis and 
operations to update the branch and bound tree. The 
sensitivity analysis takes advantage of the special be- 
havior of the parametric scalarizing program (T7, 9). 
It returns a value 6; > 0 such that the structure of the 
previous branch and bound tree remains unchanged 
for variations in f ; up to re + 6). Therefore, refer- 
ence points f+0=(f,,...,f;+9;,...,f;) with 
6; < 6; lead to nondominated solutions that may be 
obtained in a straightforward way. If the DM wishes to 
continue the search in the same direction, a slight in- 
crease over 6;, say 0; + €, is first considered. In this 
case, the previous sensitivity analysis also returns the 
best candidate node, i. e., an ancestor of the node that 
will produce the next nondominated solution. The pre- 
vious branch and bound tree is thus used to proceed to 
the next computations. Since further branching is usu- 
ally required, an attempt is made to simplify the tree 
before enlarging it. The underlying idea is to avoid an 
evergrowing tree. This simplification means cutting off 
parts of the tree linked by branching constraints no 


longer active. In sum, this approach brings together 
sensitivity analysis phases meant to adjust the refer- 
ence point and simplification/branching operations of 
the search tree to compute nondominated solutions. 
This process is repeated as long as the DM wishes to 
continue the directional search or if the reference point 
has not been adjusted enough to yield a nondominated 
solution different from the previous one (a situation 
that occurs more often in all-integer programs than 
in mixed integer models). Computational experiments 
have shown that this multi-objective approach succeeds 
in performing directional searches. The computing times
of the phases that use simplification/branching operations
have been significantly reduced by this strategy.

Some researchers have developed other methods for 
multi-objective integer programming that are also ap- 
plicable to the mixed integer case. Good examples of 
such approaches are those in [10,17] and [7]. In our 
opinion, they all are open communication procedures 
that share some key features, namely the concept of 
projecting a reference direction onto the nondominated 
surface (although this procedure is used in different 
ways) and the type of information required about the 
DM’s preferences. This information lies fundamentally 
in the specification of aspiration levels for the objec- 
tive function values (reference points). Some of these 
approaches are continuous/integer ([7,10]) working al- 
most all the time with nondominated continuous solu- 
tions of the linear relaxation of the problem. Whenever 
the DM finds a satisfactory continuous solution, an in- 
teger nondominated solution close to it is then com- 
puted. 


Conclusions and Future Developments 


Most methods developed so far for MOMIP problems 
require an excessive amount of computational effort, 
or require too much cognitive load from the DM, or 
only address bi-objective problems. In addition, com- 
putational experience with real-world applications is 
lacking. Although interesting or promising approaches 
have been developed, further research efforts must be 
made in order to build effective interactive methods 
able to handle real-sized problems. 


See also 


> Branch and Price: Integer Programming with 
Column Generation 



> Decomposition Techniques for MILP: Lagrangian 
Relaxation 

> Graph Coloring 

> Integer Linear Complementary Problem 

> Integer Programming 

> Integer Programming: Algebraic Methods 

> Integer Programming: Branch and Bound 
Methods 

> Integer Programming: Branch and Cut Algorithms 

> Integer Programming: Cutting Plane Algorithms 

> Integer Programming Duality 

> Integer Programming: Lagrangian Relaxation 

> LCP: Pardalos—Rosen Mixed Integer Formulation 

> Mixed Integer Classification Problems 

> Multi-objective Integer Linear Programming 

> Multiparametric Mixed Integer Linear 
Programming 

> Parametric Mixed Integer Nonlinear Optimization 

> Set Covering, Packing and Partitioning Problems 

> Simplicial Pivoting Algorithms for Integer 
Programming 

> Stochastic Integer Programming: Continuity, 
Stability, Rates of Convergence 

> Stochastic Integer Programs 

> Time-dependent Traveling Salesman Problem 


References 


1. Aksoy Y (1990) An interactive branch-and-bound algo- 
rithm for bicriterion nonconvex/mixed integer program- 
ming. Naval Res Logist 37:403-417 

2. Alves MJ, Climaco J (2000) An interactive reference point 
approach for multiobjective mixed-integer programming 
using branch-and-bound. Europ J Oper Res 124(3):478-494

3. Bowman VJ (1976) On the relationship of the Tchebycheff 
norm and the efficient frontier of multiple-criteria objec- 
tives. In: Thiriez H, Zionts S (eds) Multiple Criteria Deci- 
sion Making. Lecture notes Economics and Math Systems. 
Springer, Berlin, pp 76-86 

4. Durso A (1992) An interactive combined branch-and- 
bound/Tchebycheff algorithm for multiple criteria opti- 
mization. In: Goicoechea A, Duckstein L, Zionts S (eds) 
Multiple Criteria Decision Making, Proc 9th Internat Conf. 
Springer, Berlin, pp 107-122 

5. Ferreira C, Santos BS, Captivo ME, Climaco J, Silva CC (1996) 
Multiobjective location of unwelcome or central facilities 
involving environmental aspects: A prototype of a deci- 
sion support system. Belgian J Oper Res Statist Comput Sci 
36(2-3):159-172 


6. Feyerabend P (1975) Against method. Verso, London 
7. Karaivanova J, Korhonen P, Narula S, Wallenius J, Vassilev
V (1995) A reference direction approach to multiple objective
integer linear programming. Europ J Oper Res 81:176-187
8. Karwan MH, Zionts S, Villarreal B, Ramesh R (1985) An im- 
proved interactive multicriteria integer programming al- 
gorithm. In: Haimes Y, Chankong V (eds) Decision Mak- 
ing with Multiple Objectives. Lecture notes Economics and 
Math Systems. Springer, Berlin, pp 261-271 
9. Lewandowski A, Wierzbicki A (1988) Aspiration based decision
analysis and support. Part I: Theoretical and methodological
backgrounds. WP-88-03, Internat Inst Appl Systems
Anal (IIASA), Austria
10. Narula SC, Vassilev V (1994) An interactive algorithm for 
solving multiple objective integer linear programming 
problems. Europ J Oper Res 79:443-450 
11. Ramesh R, Zionts S, Karwan MH (1986) A class of practical 
interactive branch and bound algorithms for multicriteria 
integer programming. Europ J Oper Res 26:161-172 
12. Roy B (1987) Meaning and validity of interactive proce- 
dures as tools for decision making. Europ J Oper Res 
31:297-303 
13. Soland RM (1979) Multicriteria optimization: A general 
characterization of efficient solutions. Decision Sci 10:26-38
14. Solanki R (1991) Generating the noninferior set in mixed 
integer biobjective linear programs: An application to a lo- 
cation problem. Comput Oper Res 18(1):1-15 
15. Steuer R (1986) Multiple criteria optimization: Theory, com- 
putation and application. Wiley, New York 
16. Steuer R, Choo E-U (1983) An interactive weighted Tcheby- 
cheff procedure for multiple objective programming. Math 
Program 26:326-344 
17. Vassilev V, Narula SC (1993) A reference direction algo- 
rithm for solving multiple objective integer linear pro- 
gramming problems. J Oper Res Soc 44(12):1201-1209 
18. Villarreal B, Karwan MH, Zionts S (1980) An interactive
branch and bound procedure for multicriterion integer lin- 
ear programming. In: Fandel G, Gal T (eds) Multiple Crite- 
ria Decision Making: Theory and Application. Lecture notes 
Economics and Math Systems. Springer, Berlin, pp 448-467


Multi-objective Optimization 
and Decision Support Systems 


SERPIL SAYIN 
Koç University, Istanbul, Turkey


MSC2000: 90B50, 90C29, 65K05, 90C05, 91B06 



Article Outline 


Keywords 

Traditional Classification 
Multi-Objective Linear Programming 
Working in the Outcome Space 
Reflections on Optimization Trends 
Nonlinear and Integer Problems 
Applications 

A Related Optimization Problem 
Trends 

See also 

References 


Keywords 


Multiple criteria decision making; Vector 
optimization; Efficient solution; Decision support 


Multiple criteria decision making (MCDM) refers to the 
explicit incorporation of more than one evaluation criterion
into a decision problem. MCDM has been a very
active field of research roughly since the 1970s. Al- 
though boundaries might be fuzzy and overlapping, 
multicriteria decision analysis (studying the problem of 
identifying the ‘most-preferred’ among a finite discrete 
set of alternatives), multi-attribute utility theory (using 
utility functions explicitly to model a decision maker’s 
preferences) and multi-objective optimization (model- 
ing the decision problem within a mathematical pro- 
gramming framework) have emerged as major fields of 
interest under MCDM. For more information on the 
general field of MCDM, see [21]. 

Multi-objective mathematical programming provides
a flexible modeling framework that allows for simultaneous
optimization of more than one objective
function over a feasible set. Mathematically, the multi-objective
optimization problem can be expressed as:

$$\text{(MOO)}\qquad \max_{x}\ f(x) \quad \text{s.t.}\quad x \in X,$$

where $X \subseteq \mathbb{R}^n$ is the set of feasible alternatives and
$f = (f_1, \ldots, f_p)\colon \mathbb{R}^n \to \mathbb{R}^p$, $p \ge 2$, is a vector-valued function.
Note that X can be any set, continuous or discrete, expressed
through constraints, and the objective function
f can be of any form.

The increased flexibility provided by (MOO) also 
raises the question of what constitutes a solution to it. 


The usual definition of optimality no longer applies, as each
objective function may yield a different optimal
solution. Therefore, solving the (MOO) problem
is about studying the inherent trade-offs among conflicting
objectives. Efficient solutions are the ones that
carry the relevant trade-off information. An $x^0 \in \mathbb{R}^n$
is called an efficient solution for the (MOO) problem
if $x^0 \in X$ and there exists no $x \in X$ such that $f(x) \ge f(x^0)$
with strict inequality holding for at least one component.
The set of all efficient solutions of the (MOO)
problem is usually denoted by $X_E$. By this
definition, the most-preferred solution of the decision
maker should belong to $X_E$, since solutions that are not
efficient (the dominated ones) can be improved upon in
at least one objective without worsening the others.
Since $X_E$ is usually a large set, confining the most-preferred
solution to $X_E$ does not immediately identify the
most-preferred solution. In particular, the
difficulty of defining and obtaining the most-preferred
solution, the one that the decision maker would identify
as the solution to the decision-making problem, and
the need for the inevitable involvement of the decision
maker in the solution procedure have resulted in very
different solution approaches to the (MOO) problem.
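When X is finite and small, the definition of efficiency can be applied literally by comparing outcome vectors pairwise. The Python sketch below assumes that all p objectives are maximized, as in (MOO); the names `dominates` and `efficient_set` are illustrative only, and the brute-force test costs $O(|X|^2)$ comparisons, so it does not scale to the continuous case discussed above.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if outcome a dominates outcome b when every objective is maximized."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))


def efficient_set(X: Sequence[Vector], f: Callable[[Vector], Vector]) -> List[Vector]:
    """Return the efficient alternatives of a finite feasible set X (brute force)."""
    outcomes = [f(x) for x in X]
    # x is efficient if no feasible outcome dominates f(x); f(x) never dominates itself.
    return [x for x, fx in zip(X, outcomes)
            if not any(dominates(fy, fx) for fy in outcomes)]


# Example with f equal to the identity, so alternatives and outcomes coincide.
X = [(1, 5), (2, 4), (2, 2), (3, 1), (1, 1)]
print(efficient_set(X, lambda x: x))  # [(1, 5), (2, 4), (3, 1)]
```

Here (2, 2) is dominated by (2, 4) and (1, 1) by (2, 2), so only three alternatives remain in $X_E$.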


Traditional Classification 


The timing of the involvement of the decision maker 
in the solution procedure has been a crucial factor that 
distinguishes among various approaches to the (MOO) 
problem [13]. A priori methods, methods that use prior 
articulation of preferences, ask the decision maker to 
specify preference information prior to the application 
of an optimization routine. The elicitation of preference 
information can be directed towards deriving a util- 
ity function that describes the decision maker’s pref- 
erences [14], or as in goal programming [7] and com- 
promise programming [23], a standard model can be 
imposed upon the decision maker. As these methods 
reduce the (MOO) problem to a single-objective opti- 
mization problem and they aspire to find a single solu- 
tion to it, they have received considerable recognition 
although their assumptions are usually restrictive. 
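To make the idea of reducing (MOO) to a single-objective problem concrete, the sketch below uses one common textbook form of compromise programming, assumed here rather than taken from [23]: choose the outcome closest to the ideal point (the best value of each objective) in a weighted $L_p$ distance. It works on a finite list of outcome vectors with all objectives maximized; the function name and the example weights are hypothetical.

```python
from typing import Sequence, Tuple


def compromise_choice(outcomes: Sequence[Tuple[float, ...]],
                      weights: Sequence[float], p: float = 2.0) -> int:
    """Index of the outcome closest to the ideal point in a weighted L_p distance.

    All objectives are assumed to be maximized; p = float("inf") gives the
    weighted Chebyshev (max) distance.
    """
    m = len(weights)
    ideal = [max(y[i] for y in outcomes) for i in range(m)]  # best value per objective

    def distance(y: Tuple[float, ...]) -> float:
        gaps = [w * (z - yi) for w, z, yi in zip(weights, ideal, y)]  # all gaps >= 0
        if p == float("inf"):
            return max(gaps)
        return sum(g ** p for g in gaps) ** (1.0 / p)

    return min(range(len(outcomes)), key=lambda k: distance(outcomes[k]))


Y = [(1, 5), (2, 4), (3, 1)]
print(compromise_choice(Y, weights=(1, 1)))  # 1
print(compromise_choice(Y, weights=(5, 1)))  # 2
```

Changing the weights changes the answer, which is exactly the sense in which the decision maker's preferences must be fixed a priori.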

The interactive methods require the interaction of 
the decision maker with the computer while solv- 
ing a particular (MOO) problem. Usually, the idea is 
to construct a model that proposes solutions to the 
(MOO) problem based on some initial input. The decision
maker is then invited to reply to the solution by
providing additional preference information. The in- 
teraction between the computer program and the de- 
cision maker continues until a satisfying solution is ob- 
tained. 

Interactive methods are important in more than one 
way. First, they have introduced the means for practi- 
cally solving a (MOO) problem [12]. Second, they help 
a decision maker learn about the inherent trade-offs of 
a problem during the solution process [5]. Third, the 
idea underlying the interactive methods constitutes the 
major motivation behind the contemporary decision 
support systems. Although interactive algorithms have 
encountered a certain level of acceptance from practi- 
tioners [1,20], they are not without disadvantages. They 
usually rely too much on the information provided by 
the decision maker, are not able to provide a global look 
at Xz, and thus at the trade-offs inherent in a prob- 
lem, and they focus on finding a single solution whereas 
a number of solutions may be compatible with the deci- 
sion maker’s preferences. Moreover, their information 
requests may be overwhelming for the decision maker. 
It has been argued that interactive methods need to
address behavioral aspects of decision making [16],
improve their interface with the decision maker [15] and
broaden their model base [10]. Although they
do not address all of these issues, some of the interactive
(MOO) algorithms have already evolved into
decision support systems that provide a friendly envi- 
ronment for modeling as well as problem solving [17]. 
It can be expected that more decision support systems 
to solve problem (MOO) will appear in the near future. 

Perhaps the most straightforward way of approaching
the (MOO) problem is that of vector optimization
methods. Also referred to as posterior methods,
these methods rely only on the assumption that
the decision maker prefers more to less in each objective
function of (MOO); hence they propose identifying
all of the efficient solutions of (MOO) and presenting
them to the decision maker, who then identifies
the most-preferred solution. Along with theoretical
findings [2,11], several vector optimization methods
have been proposed; however, these methods have
generally not gained practical recognition. Their failure
in practice can be explained by their heavy
computational requirements. Perhaps a more important
factor is the difficulty of presenting the efficient set
in a ‘legible’ way to the decision maker. Furthermore,
as the efficient set is usually continuous when the
feasible region is, the decision maker is left with the
daunting task of identifying the most-preferred solution.


Multi-Objective Linear Programming 


When (MOO) has linear objective functions and a poly- 
hedral feasible set, the resulting problem is called a mul- 
tiple objective linear programming (MOLP) problem. 
The MOLP problem has mathematical features that
make it easier to characterize and obtain the efficient set
than in the more general case. More specifically, it
has been shown that the efficient set of the MOLP problem
consists of a collection of efficient faces of the feasible
region. As a face of a polyhedron can be characterized
in a number of ways, for instance as the convex
hull of its extreme points if it is compact, as the optimal
solution set of a particular optimization problem, or as
a polyhedron itself, it becomes possible to obtain and
present the efficient set [9,18,22].

Yet the computational effort increases with prob- 
lem size, and the (MOO) problem cannot be considered 
truly solved at this stage without some mechanism that 
helps the decision maker identify the most-preferred 
solution in this huge and hard-to-explore set. Most of 
the vector optimization methods have concentrated on 
finding the set of efficient extreme points of the multi- 
ple objective linear programming problem. These are 
usually methods that rely on simplex-like procedures 
or parametric searches that incorporate book-keeping 
mechanisms based on the fact that the set of efficient 
extreme points is connected. A well-known procedure
that computes all efficient extreme points of (MOLP) is
ADBASE, which was developed by R.E. Steuer [19].


Example 1 Consider the MOLP problem [18]:

$$\begin{aligned}
\max\ & x_1,\ x_2,\ x_3\\
\text{s.t.}\ & 2x_1 + 3x_2 + 4x_3 \le 12\\
& 4x_1 + x_2 + x_3 \le 8\\
& x_1, x_2, x_3 \ge 0.
\end{aligned} \tag{1}$$

The efficient set is the union of the two shaded efficient
faces $E_1$ and $E_2$. There are 5 efficient extreme points:
$e_1 = (0, 0, 3)$, $e_2 = (10/7, 0, 16/7)$, $e_3 = (12/10, 32/10, 0)$,
$e_4 = (0, 4, 0)$, $e_5 = (2, 0, 0)$. If X denotes the feasible region,
the face marked $E_1$ can be characterized as the polyhedron
obtained by forcing the first constraint in (1) to equality in
the definition of X. It can also be defined as the convex
hull of its four extreme points $e_1, e_2, e_3, e_4$. Finally, it is
the optimal solution set of the optimization problem

$$\max\ \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 \quad \text{s.t.}\quad x \in X$$

for $(\lambda_1, \lambda_2, \lambda_3) = (2, 3, 4)$ and its positive multiples.
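The numbers in Example 1 can be checked with a short exact computation. The Python sketch below enumerates the extreme points of X by solving every 3 × 3 subsystem of tight constraints with Cramer's rule (exact `Fraction` arithmetic) and keeping the feasible solutions. It then maximizes $\lambda^\top x$ over those vertices: for $\lambda = (2, 3, 4)$ the maximizers are the vertices of $E_1$, and, by the analogous argument, the weights $(4, 1, 1)$ of the second constraint pick out the vertices of the other efficient face. Because both weight vectors are strictly positive, every maximizer is efficient. Brute-force vertex enumeration is an illustrative choice for this tiny problem, not the method of [18].

```python
from fractions import Fraction
from itertools import combinations

# Constraints of Example 1 written as a . x <= b; the last three rows encode x_j >= 0.
A = [[2, 3, 4], [4, 1, 1], [-1, 0, 0], [0, -1, 0], [0, 0, -1]]
b = [12, 8, 0, 0, 0]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(rows, rhs):
    """Solve a 3x3 linear system exactly by Cramer's rule; None if singular."""
    d = det3(rows)
    if d == 0:
        return None
    return tuple(Fraction(det3([r[:j] + [rhs[i]] + r[j + 1:] for i, r in enumerate(rows)]), d)
                 for j in range(3))


def vertices():
    """Extreme points of X: feasible points where 3 independent constraints are tight."""
    pts = set()
    for idx in combinations(range(len(A)), 3):
        x = solve3([A[i] for i in idx], [b[i] for i in idx])
        if x is not None and all(sum(a * xi for a, xi in zip(A[k], x)) <= b[k]
                                 for k in range(len(A))):
            pts.add(x)
    return pts


def optimal_vertices(lam):
    """Vertices maximizing lam . x over X, i.e. the extreme points of the optimal face."""
    pts = vertices()
    value = {x: sum(l * xi for l, xi in zip(lam, x)) for x in pts}
    best = max(value.values())
    return best, {x for x in pts if value[x] == best}


best1, face1 = optimal_vertices((2, 3, 4))  # 12, attained at e1, e2, e3, e4
best2, face2 = optimal_vertices((4, 1, 1))  # 8, attained at e2, e3, e5
print(best1, sorted(face1))
print(best2, sorted(face2))
```

The union of the two optimal vertex sets is exactly $\{e_1, \ldots, e_5\}$; the sixth vertex of X, the origin, is dominated.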


In large problems, the set of efficient extreme points 
may still contain too many points to be studied by 
the decision maker. Moreover, extreme efficient points 
may not carry the trade-off information well, since some
portions of the efficient set may end up being overemphasized
whereas other regions are largely missed.
Indeed, there is no reason for a decision maker to be 
solely interested in extreme point efficient solutions. 
The attractiveness of efficient extreme points mostly lies 
in their mathematical properties. With this motivation, 
a method that applies to a general set of (MOO) prob- 
lems has been suggested to find globally-representative 
subsets of the efficient set [6]. 


Working in the Outcome Space 


The outcome set $Y = \{ y \in \mathbb{R}^p : y = f(x) \text{ for some } x \in X \}$
helps redefine an equivalent problem to (MOO) in the
p-dimensional outcome space:

$$\text{(MOOO)}\qquad \max\ y \quad \text{s.t.}\quad y \in Y.$$


As the number of objectives p is usually much less than 
the number of variables n, the structure of Y is simpler 
than that of X [4,8]. The ability to work directly with 
(MOOO) thus has the potential of providing significant 
computational benefits that vector optimization algo- 
rithms have tried to realize [3]. 


Reflections on Optimization Trends 


As a field within the general field of optimization, 
multi-objective optimization is naturally affected by the 
trends that become dominant in optimization. Consequently,
interior point methods, genetic algorithms and
neural networks have all been applied to the (MOO) problem
in various ways. As some difficult (MOO) problems
cannot yet be solved in practice, new
developments in the general field of optimization have
the potential to solve them.


Nonlinear and Integer Problems 


Most of the algorithms proposed to solve problem 
(MOO) concentrate on the fully linear case. In general, 
when nonlinearities are introduced, the efficient solu- 
tions and the efficient set become difficult to character- 
ize. There are some algorithms that allow for nonlin- 
earities in the objective functions, and in the constraints 
that define the feasible region, but usually in a conserva- 
tive way so as to retain some computational tractability. 
Similarly, the multiple objective integer programming 
problem is a very difficult one to solve due to the addi- 
tional complications related to integrality. 


Applications 


Along with what one can call ‘case studies’, certain 
applications that are more generic than a case study 
but more specific than problem (MOO) itself have ap- 
peared. Typical examples include, but are not limited 
to, bicriteria network optimization problems, bicriteria 
knapsack problems, and multicriteria scheduling prob- 
lems. Since usually these are problems that naturally 
involve multiple criteria, the methods developed for 
these problems have practical implications. Most of the 
methods developed can be categorized under a priori 
methods. A typical approach is to form a weighted com- 
bination of the objective functions. Recently, interactive 
and vector optimization approaches that deal with sim- 
ilar problems have also appeared. 
A Related Optimization Problem


A related problem is the problem of optimizing a function
$g\colon \mathbb{R}^n \to \mathbb{R}$ over the efficient set $X_E$. This can be
a difficult global optimization problem, depending on
the properties of the objective function g. The problem
is motivated in different ways. In some settings, a function
that is to serve as a pseudo utility function is available;
optimizing this pseudo utility function over the efficient
set then, in a sense, corresponds to solving problem (MOO)
itself. In addition, when g is one of the objective functions,
solving this problem provides the range of values that the
objective function takes over the efficient set. This information
is valuable for a decision maker who is trying to make
assessments to solve a problem, and it is used in some of
the interactive algorithms. The difficulty of the problem
has also resulted in heuristic solution approaches.
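For a finite outcome set, the range of each objective over the efficient set can be obtained by brute force, which shows why this range differs from the range over the whole feasible set. The Python sketch below assumes maximization and a small list of outcome vectors; the function names are hypothetical. For continuous problems, minimizing an objective over $X_E$ is the hard global optimization problem described above, and enumeration of this kind is not available.

```python
from typing import List, Sequence, Tuple

Outcome = Tuple[float, ...]


def dominates(a: Outcome, b: Outcome) -> bool:
    """Maximization dominance: a >= b componentwise with at least one strict inequality."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def ranges_over_efficient_set(outcomes: Sequence[Outcome]) -> List[Tuple[float, float]]:
    """(minimum, maximum) of each objective over the nondominated outcomes."""
    eff = [y for y in outcomes if not any(dominates(z, y) for z in outcomes)]
    return [(min(y[i] for y in eff), max(y[i] for y in eff))
            for i in range(len(outcomes[0]))]


Y = [(1, 5), (2, 4), (3, 2), (0, 0), (2, 1)]
print(ranges_over_efficient_set(Y))  # [(1, 3), (2, 5)]
```

Over all of Y both objectives would bottom out at 0, but over the efficient outcomes the minima are 1 and 2, which is the tighter information a decision maker actually needs.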


Trends 


The advances in information technology heavily affect the field
of multiple criteria decision making. Faster
computers and parallel processing make it possible
to solve, in reasonable time, optimization problems that
would have been deemed impractical in the past. Improved
graphical capabilities make it feasible to provide
sophisticated user interfaces that involve the decision
maker in the problem solving process more actively and
reliably. Developments in the World Wide Web
present many opportunities to explore for individual
and group decision support. At this point in time, there
is still a need to solve the (MOO) problem in a rigorous,
user-friendly and creative way. Decision support
systems that involve the decision maker in both
modeling and problem solving seem to be the most
practical way of solving (MOO) problems. Vector
optimization approaches can also benefit from
a decision support framework in their effort to help the
decision maker identify a most-preferred solution.
Traditionally, process design and process control are
treated sequentially. Dynamics are not considered during
the design phase, and flowsheet changes cannot be
made during the control phase. The problem with this
approach is that the two are inherently connected, as the
design of the process affects its controllability. Thus, the
steady state design and the dynamic operability issues
should be treated simultaneously. Analyzing the interaction
of design and control addresses the issue of quantitatively
determining the trade-offs between the steady
state economics and the dynamic controllability.

The interaction of design and control problem is to
determine the process flowsheet that is both economically
optimal and controllable. There are different
methods for addressing this problem. One common ap- 
proach is to use overdesign where, once the economic 
steady state design is determined, surge tanks are added 
or equipment sizes are increased in order to handle any 
dynamic problems which may arise. This overdesign is 
usually based on heuristic rules and will likely move 
the design away from its economic optimum. There 
is no guarantee that the measures taken will even im- 
prove the controllability of the process. Other meth- 
ods may examine the dynamic operation of several de- 
signs to determine which has the best controllability as- 
pects. 

There are very few methods which address the inter- 
action of design and control in a quantitative manner. 




Multi-objective Optimization: Interaction of Design and Control 


The interaction of design and control can be addressed 
through a process synthesis approach involving opti- 
mization. This approach involves the representation of 
design alternatives through a process superstructure, 
the mathematical modeling of the superstructure, and 
the development of an algorithm to extract the opti- 
mal flowsheet from the superstructure. The simultane- 
ous optimization of the design and control of the pro- 
cess is handled through multiple objectives represent- 
ing the steady state economics and dynamic controlla- 
bility. This naturally leads to a multi-objective frame- 
work. 


Multi-objective Optimization 


In any decision making process, the goal is to reach the 
best compromise solution among a number of compet- 
ing objectives. Many examples of competing objectives 
exist in the field of engineering. For example, in the de- 
sign of a process, one may have to consider safety and 
operational issues as well as economic issues. A decision 
making process is necessary when the most economic 
design is not the safest or most operable. 

The best compromise solution depends on the rela- 
tive importance of the conflicting objectives. This rela- 
tive importance is not easily determined and is usually 
a subjective decision. The one responsible for making 
this decision is the decision maker (DM) whose choice 
can be based on a number of factors. Since subjective 
measures and decisions do not translate well into mathematics,
a quantitative way of determining the trade-offs
and relative importance among the objectives
is necessary for a multi-objective optimization frame- 
work. 


Multi-objective Framework 
for the Interaction of Design and Control 


In analyzing the interaction of design and control, the 
objectives that are considered measure the steady state 
economics and the dynamic controllability of the pro- 
cess. The optimization approach in process synthesis 
serves as the basis for the multi-objective framework for 
the interaction of design and control. The procedure in- 
volves four steps: 

1) Process representation; 

2) Mathematical modeling; 


3) Generation of noninferior solution set (determine 
trade-offs); 

4) Best-compromise examination. 

The first step is the representation of all the possible de- 
sign alternatives through a process superstructure. In 
this step, all the units and possible connections of inter- 
est are incorporated into the superstructure such that 
all designs of interest are included as a subset of the su- 
perstructure. 

Next, a mathematical model is developed for the
superstructure as well as for the objective functions.
The mathematical formulation is determined by the
structure of the process flowsheet and must include all
information needed to evaluate the objective functions.
The objective functions must measure the economics of
the process as well as its controllability. Since the
economic objective is determined by steady state operation
and the controllability objective by dynamic operation,
the mathematical model must contain both steady state
and dynamic information. The mathematical formulation
involves both continuous and discrete variables, where the
discrete variables indicate the existence of units and
connections within the flowsheet.


Multi-objective Optimization: Interaction of Design and Control, Figure 1
Noninferior solution set for a problem with two objectives
Once the model has been formulated, an algorithm
is developed and used to determine the quantitative
trade-offs among the competing objectives. Individually,
each objective can be optimized, but together they
are in conflict. This means that there is a set of solutions
where one objective can be improved only at the
expense of the other objectives. This set of solutions is
called the noninferior solution set, which is depicted
for a two-objective problem in Fig. 1. This solution
set is also referred to as the nondominated or Pareto
optimal set, and the surface of noninferior solutions
implicitly defines a function $G(J)$.

Using the information about the trade-offs among 
the competing objectives, a strategy for determining 
the best compromise solution is developed. This strat- 
egy is based on information from the DM and depends 
on the relative weights given to the objectives. These 
weights are varied systematically to locate the solution 
which the DM prefers the most. How to determine 
these weights is one of the more interesting aspects of 
the problem. 

Note that the multi-objective problem can be re- 
duced if some of the objectives (presumably those with 
very low weights) need not be optimized but simply 
brought to a satisfactory level. In this case, these ob- 
jectives can be incorporated into the problem as con- 
straints. 


General Mathematical Formulation 


The mathematical model is a multi-objective mixed in- 
teger nonlinear programming problem which has the 
following form: 


$$\begin{aligned}
\text{OPTIMIZE}\ & J(x,y)\\
\text{s.t.}\ & h(x,y) = 0\\
& g(x,y) \le 0\\
& x \in \mathbb{R}^n\\
& y \in \{0,1\}^q.
\end{aligned} \tag{1}$$


In this formulation, J is a vector of objectives which in- 
cludes the economic and controllability objectives. The 
expressions h and g represent material and energy bal- 
ances, thermodynamic relations, and other constraints. 
The controllability measures are included in the formu- 
lation as 7. The variables in this problem are partitioned 
as continuous x and binary y. 


Solution of the MOP 


One way to address the solution of the MOP is to for- 
mulate it using a utility function U which implicitly re- 
lates the multiple objectives in terms of some common 
basis: 


$$\begin{aligned}
\min\ & U[J(x,y)]\\
\text{s.t.}\ & h(x,y) = 0\\
& g(x,y) \le 0\\
& x \in \mathbb{R}^n\\
& y \in \{0,1\}^q.
\end{aligned} \tag{2}$$


By introducing the utility function, the vector optimiza- 
tion problem has been reduced to a scalar optimiza- 
tion problem and MINLP techniques can be applied 
to solve the problem. These MINLP techniques in- 
clude generalized Benders decomposition (GBD) [4,14], 
outer approximation (OA) [2], outer approximation 
with equality relaxation (OA/ER) [8], and outer approx- 
imation with equality relaxation and augmented penalty 
(OA/ER/AP) [16]. These methods are discussed in de- 
tail in [3]. 

With the definition of the noninferior solution set, 
the optimization problem can be formulated as 


$$\min\ U[J(x,y)] \quad \text{s.t.}\quad G(J) = 0. \tag{3}$$


The challenging aspect of the problem is determin- 
ing the explicit form of the utility function. One possi- 
ble form of the utility function is a weighted linear sum 
of the objectives: 


$$U[J(x,y)] = \sum_{i \in I} w_i J_i(x,y),$$

where I is the set of objective functions and $w_i$ are the
weights for the objective functions, whose values are
determined by the DM. The difficulty that arises is that the
utility function is generally not known. It is, however, 
assumed to be convex and continuously differentiable. 

The issues surrounding the solution of the multi- 
objective optimization problem are determining the 
noninferior solution set, determining the utility func- 
tion based on information from the DM, and determin- 
ing the best-compromise solution. 

Different techniques have been developed in order
to assess the trade-offs among the objectives quantitatively.
See [7] for a tutorial in multi-objective optimization.
A review is also available in [17]. Many of the
fundamental aspects of multi-objective optimization can
be found in [1].


Noninferior Solution Sets 


The noninferior solution set can be determined in 
a number of ways. One approach is to formulate the
problem as

$$\begin{aligned}
\min\ & \sum_{i \in I} w_i J_i(x,y)\\
\text{s.t.}\ & h(x,y) = 0\\
& g(x,y) \le 0\\
& x \in \mathbb{R}^n\\
& y \in \{0,1\}^q
\end{aligned} \tag{4}$$

where the weights $w_i$ are selected such that $w_i > 0$ for
all i and $\sum_{i \in I} w_i = 1$. Through a suitable choice of the
weights, the noninferior solution set can be found. This
approach can miss some points in the noninferior solution
set if the solution region is nonconvex. In order to
address this problem, a weighted norm can be used as
follows:


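$$\begin{aligned}
\min\ & \Bigl[\sum_{i \in I} \bigl(w_i J_i(x,y)\bigr)^p\Bigr]^{1/p}\\
\text{s.t.}\ & h(x,y) = 0\\
& g(x,y) \le 0\\
& x \in \mathbb{R}^n\\
& y \in \{0,1\}^q.
\end{aligned} \tag{5}$$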


By increasing p, the curvature of the supporting
function is increased and more noninferior points
can be found. In the extreme case $p = \infty$, all the noninferior
points can be located. Using the $\infty$-norm, the
problem becomes


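$$\begin{aligned}
\min\ & \max_{i \in I}\ w_i J_i(x,y)\\
\text{s.t.}\ & h(x,y) = 0\\
& g(x,y) \le 0\\
& x \in \mathbb{R}^n\\
& y \in \{0,1\}^q.
\end{aligned} \tag{6}$$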


The advantage of this formulation is that the weights
have a physical meaning for the DM. If the DM knows
the desired values of each objective for a given noninferior
point, the weights can be set to the reciprocals of
these values. The resulting noninferior solution will be
the one closest to the point with the values specified by
the DM. The disadvantage of this formulation is that it
can be difficult to solve.
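The contrast between the weighted sum (4) and the min-max form (6) can be seen on a tiny finite example. The Python sketch below assumes two objectives to be minimized and three hypothetical noninferior points, where (6, 6) lies above the straight line joining (0, 10) and (10, 0), so the set is nonconvex. It sweeps $w_1$ over a grid with $w_2 = 1 - w_1$ and records which point each scalarization selects; the Chebyshev distance is measured from the origin, which here coincides with the ideal point. All function names are illustrative.

```python
from typing import Callable, Sequence, Set, Tuple

Point = Tuple[float, float]  # (J1, J2), both to be minimized


def weighted_sum_pick(points: Sequence[Point], w: Tuple[float, float]) -> Point:
    """Minimizer of w1*J1 + w2*J2 over a finite set, as in (4)."""
    return min(points, key=lambda J: w[0] * J[0] + w[1] * J[1])


def chebyshev_pick(points: Sequence[Point], w: Tuple[float, float]) -> Point:
    """Minimizer of max(w1*J1, w2*J2) over a finite set, as in (6)."""
    return min(points, key=lambda J: max(w[0] * J[0], w[1] * J[1]))


def sweep(points: Sequence[Point], pick: Callable, steps: int = 99) -> Set[Point]:
    """Collect the points selected for weights w1 = k/(steps+1), w2 = 1 - w1."""
    found = set()
    for k in range(1, steps + 1):
        w1 = k / (steps + 1)
        found.add(pick(points, (w1, 1.0 - w1)))
    return found


# Hypothetical noninferior points; (6, 6) lies above the segment joining the other two.
front = [(0.0, 10.0), (6.0, 6.0), (10.0, 0.0)]
print(sorted(sweep(front, weighted_sum_pick)))  # [(0.0, 10.0), (10.0, 0.0)]
print(sorted(sweep(front, chebyshev_pick)))     # [(0.0, 10.0), (6.0, 6.0), (10.0, 0.0)]
```

With $w_1 + w_2 = 1$ the weighted sum at (6, 6) always equals 6, while the better of the two end points scores at most 5, so (6, 6) is never selected. The reciprocal-weight rule also works here: `chebyshev_pick(front, (1/6, 1/6))` returns (6.0, 6.0). In general the min-max form can also return weakly noninferior points, which this finite example does not exhibit.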

Another way to determine the noninferior solution
set is through the ε-constraint method [6]. In this approach,
all objectives but one are incorporated into the
problem as constraints, each bounded above by a value
$\varepsilon_i$. This results in the following formulation:


min J\(x,y) 

st. Jil y) < &i, 
h(x,y) =O 
g(x,y) <O 
x € R? 
y € {0, 1}4. 


i=2,...,4, 


(7) 


By varying the values of $\varepsilon_i$, the points of the noninferior
solution set can be found.
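A minimal sketch of the ε-constraint sweep for two objectives follows. It assumes, for illustration only, that the feasible region is replaced by a hypothetical finite list of attainable objective vectors (J_1, J_2), both minimized, so that each instance of (7) is solved by enumeration. Breaking ties in J_1 by J_2 is an added assumption that keeps weakly noninferior points out of the result.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def epsilon_constraint(points: List[Point], eps: float) -> Optional[Point]:
    """Solve (7) by enumeration: minimize J_1 subject to J_2 <= eps.

    Ties in J_1 are broken by J_2 so that only noninferior points are returned.
    """
    feasible = [J for J in points if J[1] <= eps]
    if not feasible:
        return None
    return min(feasible, key=lambda J: (J[0], J[1]))


def dominates(a: Point, b: Point) -> bool:
    """a dominates b (minimization): a <= b componentwise and a != b."""
    return all(ai <= bi for ai, bi in zip(a, b)) and a != b


# Hypothetical attainable outcomes; (3.0, 3.0) is dominated by (2.5, 2.5).
points = [(0.0, 4.0), (1.0, 3.5), (2.5, 2.5), (3.0, 3.0), (4.0, 0.0)]

# Vary eps over the attainable J_2 values to trace the noninferior set.
eps_values = sorted({J[1] for J in points})
front = {epsilon_constraint(points, e) for e in eps_values}

# Brute-force check: keep the points that no other point dominates.
brute_force = {J for J in points if not any(dominates(K, J) for K in points)}
print(sorted(front))  # [(0.0, 4.0), (1.0, 3.5), (2.5, 2.5), (4.0, 0.0)]
```

Unlike the weighted sum, this sweep also recovers noninferior points lying on nonconvex parts of the set, since each ε_i simply cuts the outcome set rather than supporting it with a hyperplane.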


Choosing the Best-Compromise Solution 


To this point, the focus has been on determining the 
noninferior solution set. Only one of the points can be 
chosen as the best solution for the problem, and the task 
of the DM is to determine this point. Once the noninfe- 
rior solution set is determined, it is presented to the DM 
who will choose the solution point he prefers. The selec- 
tion of this point is based on the relative importance of 
the objectives in the eyes of the decision maker. 

Instead of assigning arbitrary weights to the various objectives, a systematic approach can be applied which
uses the trade-off information in the noninferior solution set. The slope of the noninferior solution set at any
point reveals how much one objective will be improved
at the expense of another objective. This information is
used in an interactive, iterative cutting plane algorithm
to determine the best-compromise solution.


Cutting Plane Algorithm 


The cutting plane algorithm described in [11] is based
on [5] and [10]. Marginal rates of substitution were
used to solve problems of the form (2), where $U$ is unknown, convex, and continuously differentiable. $U$ is also
assumed to increase with each objective, so the partial derivatives of $U$ with respect to
each of the arguments in the objective space are positive. This is expressed mathematically as

$$ \frac{\partial U(J)}{\partial J_i} > 0 . $$

Thus, a decrease in $J_i$ will lead to a decrease in $U$. In
the interactive scheme, the DM is asked for the positive
trade-off weights, $w_i^k$, at a given solution $k$. Each weight
is defined as the ratio of the change in the utility function with respect to one objective to the change
in the utility function with respect to a reference objective, taken here as $J_1$. This is
expressed mathematically as

$$ w_i^k = \frac{\partial U(J^k) / \partial J_i}{\partial U(J^k) / \partial J_1}, $$

where $J^k = [J_1(x^k, y^k), J_2(x^k, y^k), \ldots]$ is the vector of objective values at iteration $k$. A line search along
a feasible direction of steepest descent locates an improved solution for the next iteration.
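In the method these ratios are supplied by the DM, who cannot in general write $U$ down. To make the definition concrete, the sketch below assumes a hypothetical utility $U(J) = J_1^2 + 3 J_2$ and estimates the partial derivatives by central differences. The function names and step size are illustrative assumptions, not part of the cited method.

```python
from typing import Callable, List, Sequence


def trade_off_weights(U: Callable[[Sequence[float]], float],
                      J: Sequence[float], h: float = 1e-6) -> List[float]:
    """Estimate w_i = (dU/dJ_i) / (dU/dJ_1) at J by central differences."""
    grads = []
    for i in range(len(J)):
        up, down = list(J), list(J)
        up[i] += h
        down[i] -= h
        grads.append((U(up) - U(down)) / (2.0 * h))
    # Normalize by the reference objective J_1, as in the definition above.
    return [g / grads[0] for g in grads]


def utility(J: Sequence[float]) -> float:
    """Hypothetical convex utility, increasing in both objectives for J_1 > 0."""
    return J[0] ** 2 + 3.0 * J[1]


w = trade_off_weights(utility, [2.0, 1.0])
print(w)  # approximately [1.0, 0.75]
```

At $J = (2, 1)$ the exact partial derivatives are 4 and 3, so $w_2 = 3/4$: a unit increase in $J_2$ costs as much utility as a 0.75 increase in $J_1$.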

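By exploiting the fact that the utility function is convex, cutting planes can be introduced to restrict the
search to improving directions [10]. Since $U$ is convex, any objective vector $J$ that improves on $J^k$ satisfies

$$ 0 > U(J) - U(J^k) \ge \nabla_J U(J^k)^{\mathsf T} (J - J^k), $$

so improving directions can be identified by solving

$$
\begin{aligned}
\min \quad & \nabla_J U(J^k)^{\mathsf T} \bigl( J(x, y) - J^k \bigr) \\
\text{s.t.} \quad & h(x, y) = 0 \\
& g(x, y) \le 0 \\
& x \in \mathbb{R}^p \\
& y \in \{0, 1\}^q
\end{aligned}
\tag{8}
$$

This involves the linearization in the objective space
around the point $J^k$. Since $J = J^k$ gives the value zero, the optimal value is never positive. If it
is zero, no improving point exists and $J^k$ is the optimal solution.
If it is negative, then the resulting direction leads to an improvement in the objective space.
This minimization can be performed over a number
of points $k = 1, \ldots, K$ to find a direction which improves all of them. Cutting planes in the objective space
are formed to find new values of the objectives which
improve the utility function according to the trade-off
weights, $\nabla U$, which the DM provides. At each iteration
of the algorithm, the following problem must be solved:

$$
\begin{aligned}
\min \quad & z \\
\text{s.t.} \quad & z \ge \sum_{i \in I} w_i^k \bigl[ J_i(x, y) - J_i(x^k, y^k) \bigr], \quad k = 1, \ldots, K \\
& h(x, y) = 0 \\
& g(x, y) \le 0 \\
& x \in \mathbb{R}^p \\
& y \in \{0, 1\}^q
\end{aligned}
\tag{9}
$$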


The steps of the cutting plane algorithm are the following:

1) Determine the initial solution point ($k = 1$) and the values of all the objective functions. Assign the values of the weights $w_i^k$.
2) Solve (9) to find new values of $x$ and $y$. Determine the values of the objective functions for the new values of $x$ and $y$.
3) IF the optimal value of (9) is zero, THEN go to Step 4; ELSE set $k = k + 1$, update the values $x^k$, $y^k$, and $J^k$, generate new weights, and go to Step 2.
4) Terminate with $x^k$ and $y^k$ as the best-compromise solution.

Cutting plane algorithm


This algorithm requires the DM to provide only trade- 
off weights at each iteration. These weights can be es- 
timated by knowledge of the relative importance of the 
objectives or by information from the noninferior solu- 
tion set. 
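The loop in Steps 1 to 4 can be sketched on a toy problem. This is a minimal sketch under several assumptions. The constraints h and g are replaced by a hypothetical finite list of attainable objective vectors, so (9) is solved by enumeration. The DM is simulated by a hidden convex utility `U` whose gradient supplies the trade-off weights. The line search mentioned above is omitted. Without the line search, successive iterates need not improve U. By the convexity inequality behind (8), however, any candidate better than every visited point would give a negative value in (9). So when the optimal value of (9) reaches zero, the best visited point is optimal over the candidates. The sketch therefore returns all visited points and lets the simulated DM pick among them.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def trade_off_weights(grad: Sequence[float]) -> List[float]:
    """w_i^k = (dU/dJ_i) / (dU/dJ_1): the DM's trade-off answer at iterate k."""
    return [g / grad[0] for g in grad]


def cut_value(J: Point, iterates: List[Point], weights: List[List[float]]) -> float:
    """Smallest z allowed by the K cuts of (9) at the objective vector J."""
    return max(
        sum(wi * (Ji - Jki) for wi, Ji, Jki in zip(w, J, Jk))
        for w, Jk in zip(weights, iterates)
    )


def cutting_plane(candidates: List[Point], start: Point,
                  grad_U: Callable[[Point], Sequence[float]],
                  tol: float = 1e-12) -> List[Point]:
    """Run Steps 1-4 with problem (9) solved by enumeration; return all iterates."""
    iterates = [start]                                   # Step 1 (k = 1)
    weights = [trade_off_weights(grad_U(start))]
    while True:
        # Step 2: solve (9) over the finite candidate list.
        z, J_new = min((cut_value(J, iterates, weights), J) for J in candidates)
        if z >= -tol:                                    # Step 3: optimal value is zero
            return iterates                              # Step 4
        iterates.append(J_new)                           # k = k + 1
        weights.append(trade_off_weights(grad_U(J_new)))


# Hypothetical DM: a hidden convex utility (to be minimized) and its gradient.
def U(J: Point) -> float:
    return J[0] ** 2 + J[1] ** 2


def grad_U(J: Point) -> Tuple[float, float]:
    return (2.0 * J[0], 2.0 * J[1])


candidates = [(1.0, 9.0), (3.0, 4.0), (5.0, 2.0), (8.0, 1.0), (6.0, 6.0)]
visited = cutting_plane(candidates, (8.0, 1.0), grad_U)
best = min(visited, key=U)   # the simulated DM picks its preferred visited point
print(visited)  # [(8.0, 1.0), (1.0, 9.0), (3.0, 4.0), (5.0, 2.0)]
print(best)     # (3.0, 4.0)
```

Each new iterate has a strictly negative cut value, whereas every visited point has a cut value of at least zero. A point is therefore never visited twice, and the enumeration terminates on a finite candidate list.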


Multi-objective Optimization 
in the Interaction of Design and Control 


The interaction of design and control has been recog- 
nized as a multi-objective problem by many researchers 
as the objectives representing the steady-state economic 
design and dynamic controllability are regarded as non- 
commensurable. One of the first challenges in this 
problem is determining a suitable controllability objec- 
tive. The choice of the controllability objective will dic- 
tate the required elements of the mathematical formu- 
lation of the problem. 
One of the early works which addressed the multi-
objective nature of the interaction of design and con- 
trol was that of [9]. A given set of alternative steady- 
state designs was assumed to be known. Bounds on the 
dynamic measures of the designs were determined and 
used to screen designs and determine the noninferior 
solution set. No method was provided for determining 
the best-compromise solution. 

In the work of [13], singular value decomposition 
is used to determine dynamic operability measures. 
The controllability is formulated through the lineariza- 
tion of the model and is given in terms of the sin- 
gular values of the transfer function. This modeling 
leads to an infinite-dimensional problem as all frequen- 
cies must be considered for the controllability measure. 
For the multi-objective optimization, the ε-constraint
method was used to determine the noninferior solution set. The scalar optimization was addressed by approximating the infinite-dimensional problem and using a gradient-based algorithm to solve the optimization problem and determine the operating parameters
for the process.

The previous methods did not take into account 
that the structure of the process flowsheet as well as 
the design parameters determine its inherent control- 
lability. In order to consider structural alternatives in 
the process flowsheet such as the existence of units in 
the flowsheet, discrete variables are used in the pro- 
cess modeling. This aspect of the process design was 
considered by [11,12] in the interaction of design and 
control by using the optimization approach to process 
synthesis. In this approach, the structure of the pro- 
cess flowsheet and the design parameters are consid- 
ered simultaneously with the dynamic controllability 
of the process. The controllability measures employed 
were the open-loop linear controllability measures (sin- 
gular value, condition number, relative gain array). The 
noninferior solution set was determined using the ε-constraint method, and the best-compromise solution
was found using the cutting plane method described
above.

Further development of the above technique was 
addressed by [15] where nonlinear dynamic mod- 
els were considered. The problem was formulated as 
a multi-objective mixed integer optimal control problem. 
The multi-objective problem was again solved using the
ε-constraint method. The mixed integer optimal control problem was solved by extending the methods for
solving mixed integer nonlinear optimization problems to handle
dynamic systems.


Conclusions 


Analyzing the interaction of design and control leads to 
a multi-objective optimization problem. The key issue 
in solving this problem is quantitatively determining 
the trade-offs between the steady-state economics and 
the dynamic controllability. By using multi-objective 
optimization techniques, these characteristics of the 
process can be traded off in a systematic manner. 

By following the optimization approach to process 
synthesis, a mathematical framework can be developed. 
This involves developing a superstructure of design al- 
ternatives and effective mathematical models for the 
different criteria. The algorithmic procedure for solv- 
ing the multi-objective problem involves the successive 
solution of scalar optimization problems to determine 
the noninferior solution set. The final step in the ap- 
proach is to determine the best-compromise solution 
from those in the noninferior solution set. 
Multi-objective Optimization: Interactive Methods for Preference Value Functions


The multi-objective optimization (multiple criteria de- 
cision making) problem is the problem of choosing 
a most preferred solution when two or more incom- 
mensurate, conflicting objective functions (criteria) are 
to be simultaneously maximized. Interest in multi- 
objective optimization has risen sharply during the past 
30 years. There are at least three reasons for this. First, 
and most importantly, is the increasing recognition that 
most applied problems in both the private and pub- 
lic sectors involve multiple objectives rather than one 
objective. Second, a variety of solution algorithms for 
multi-objective optimization are now available. Finally, 
the enormous improvements in the speed and stor- 
age of computers make it practical to apply these al- 
gorithms to the solution of realistically-sized problem 
applications. 

Formally, the statement of the multi-objective opti- 
mization problem of interest here is 


$$ \text{VMAX } f(x) = [f_1(x), \ldots, f_p(x)], \quad \text{s.t. } x \in X. \tag{V} $$

Here, $p \ge 2$, $X$ is a nonempty subset of $\mathbb{R}^n$, each $f_j$, $j = 1, \ldots, p$, is a real-valued function defined on $X$ or on some
suitable set containing $X$, and VMAX indicates that, in
some unspecified sense, we are to “vector maximize” the
vector $f(x)$ of objective functions (criteria) over $X$. The
set $X$ is called the set of decision alternatives or the decision set, and $\{f(x) \in \mathbb{R}^p : x \in X\}$ is called the outcome
set.

There are a large number of diverse solution algo- 
rithms for problem (V). All are intended to help the 
decision maker (DM) find a most preferred solution to 
the problem. In the majority of these algorithms, the 
notion of efficiency plays an indispensable role. An effi- 
cient (nondominated, noninferior, Pareto optimal) solu- 
tion for problem (V) is a solution x € X such that there 
exists no other solution x € X that satisfies f(x) > f(x) 
and f(x) # f(x). Let Xp denote the set of efficient so- 
lutions for problem (V). Notice that if x € Xz, then 
there is no other feasible solution for problem (V) that 
achieves at least as large a value as x in each criterion of 
the problem and a strictly larger value than x in at least 
one criterion of the problem. 
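When $X$ is finite, the definition translates directly into a pairwise test on outcomes. The sketch below is illustrative. The decision set and the two criteria, $f_1(x) = x$ and $f_2(x) = 4x - x^2$, are hypothetical choices, and the quadratic number of comparisons is only practical for small enumerated sets.

```python
from typing import Callable, List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if outcome a dominates outcome b in (V): a >= b componentwise, a != b."""
    return all(ai >= bi for ai, bi in zip(a, b)) and list(a) != list(b)


def efficient_set(X: List[int], f: Callable[[int], Tuple[int, int]]) -> List[int]:
    """Return X_E for a finite decision set X by pairwise comparison of outcomes."""
    outcomes = {x: f(x) for x in X}
    return [x for x in X
            if not any(dominates(outcomes[z], outcomes[x]) for z in X)]


def f(x: int) -> Tuple[int, int]:
    """Hypothetical criteria: f_1(x) = x and f_2(x) = 4x - x^2, both maximized."""
    return (x, 4 * x - x * x)


X = [0, 1, 2, 3, 4]
print(efficient_set(X, f))  # [2, 3, 4]
```

The outcomes are (0, 0), (1, 3), (2, 4), (3, 3) and (4, 0). The alternative 1 is dominated by 2, and 0 is dominated by 1, so $X_E = \{2, 3, 4\}$.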

In the great majority of instances of problem (V),
the preference value function (value function) $v$ of the
DM is unknown. This is a function $v: \mathbb{R}^p \to \mathbb{R}$ that maps
the outcomes of problem (V) to real numbers in such
a way that, for any two outcomes $y^1$ and $y^2$, the DM
prefers $y^1$ to $y^2$ if and only if $v(y^1) > v(y^2)$. Although
$v$ is unknown, what is known is that for each objective function $f_j$, the DM prefers more of $f_j$ to less of $f_j$.
Mathematically, this means that $v$ is coordinatewise increasing, i.e., that whenever $\hat{z}, z \in \mathbb{R}^p$ satisfy $\hat{z} \ge z$ and
$\hat{z}_j > z_j$ for some $j = 1, \ldots, p$, then $v(\hat{z}) > v(z)$. It is easy
to show that when $v$ is coordinatewise increasing, any
maximizer $x^*$ of $v[f(x)]$ over $X$ must satisfy $x^* \in X_E$: if some $x \in X$ dominated $x^*$, then $v[f(x)] > v[f(x^*)]$, contradicting the maximality of $x^*$. In
other words, as long as the DM prefers more to less, the
search for a most preferred solution to problem (V) can
be confined to $X_E$. This is one of the key reasons that
the concept of efficiency is so important to the majority
of the algorithms for problem (V).
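The argument above can be checked numerically on a small example. The sketch below reuses a hypothetical finite decision set with criteria $f_1(x) = x$ and $f_2(x) = 4x - x^2$. It maximizes several hypothetical value functions over $X$. The first three are coordinatewise increasing, and their maximizers all land in $X_E$. The last one is not increasing, and its maximizer is a dominated alternative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Outcome = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a >= b componentwise and a != b (both criteria maximized)."""
    return all(ai >= bi for ai, bi in zip(a, b)) and list(a) != list(b)


def efficient_set(X: List[int], f: Callable[[int], Outcome]) -> List[int]:
    """X_E of a finite decision set by pairwise comparison of outcomes."""
    return [x for x in X if not any(dominates(f(z), f(x)) for z in X)]


def f(x: int) -> Outcome:
    """Hypothetical criteria f_1(x) = x, f_2(x) = 4x - x^2."""
    return (x, 4 * x - x * x)


X = [0, 1, 2, 3, 4]
X_E = efficient_set(X, f)

# Hypothetical coordinatewise increasing value functions.
value_functions: Dict[str, Callable[[Outcome], float]] = {
    "sum": lambda y: y[0] + y[1],
    "min_plus": lambda y: min(y) + 0.01 * (y[0] + y[1]),
    "weighted": lambda y: 10 * y[0] + y[1],
}
choices = {name: max(X, key=lambda x, v=v: v(f(x))) for name, v in value_functions.items()}
print(choices)  # {'sum': 2, 'min_plus': 3, 'weighted': 4}


# A value function that is NOT increasing: it peaks at the dominated outcome (1, 3).
def v_bad(y: Outcome) -> float:
    return -((y[0] - 1) ** 2) - (y[1] - 3) ** 2


x_bad = max(X, key=lambda x: v_bad(f(x)))
print(x_bad, x_bad in X_E)  # 1 False
```

The last case shows that the "more is preferred to less" assumption matters. Without it, the search cannot safely be confined to $X_E$.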

The interactive methods constitute one of the most 
popular categories of algorithms for solving problem 
(V). An interactive method for problem (V) consists of 
a sequence of DM-computer interactions designed to 
create a sequence of decision alternatives that termi- 
nates with a most preferred solution to the problem. In 
a majority of cases, the generated alternatives are efficient. Each iteration of the interactive process consists
of three steps. First, an initial solution is found with the 
aid of the computer. Typically, this solution is found 
by solving a single-objective optimization problem that 
generates either an efficient point or, at worst, a feasible 
point. Next, the DM is asked to react to the generated 
point by answering one or more questions involving his 
preferences for it. Last, based upon the answers given, 
the computer generates a new point, typically by mod- 
ifying parameters in the single-objective optimization 
problem. This process continues until either the com- 
puter or the DM identifies a most preferred solution. 
The value function v of the DM is never needed and, in 
fact, is assumed to be unavailable. 

There are several advantages to using interactive 
methods as compared to other categories of methods 
for problem (V). For instance, the preference infor- 
mation asked of the DM at each iteration is not dif- 
ficult to supply. Furthermore, the DM thereby learns 
about his value function, which is often initially vague 
or mostly unknown. As the search continues, the DM 
also learns about the decision or efficient decision al- 
ternatives available and the trade-offs in the objective 
functions across these decision alternatives. The op- 
timizations required of the computer are also usually 
not difficult to perform. Finally, because the DM is
highly involved in the process, his confidence in the 
most preferred solution that is eventually found is en- 
hanced. 

A frequent criticism of the interactive methods is 
that, in practice, the work required of the DM during 
the iterations seems to be burdensome for him in many 
cases. This may cause the DM to prematurely termi- 
nate the search so that a most preferred solution is not 
found. 

There are literally hundreds of interactive algo- 
rithms for problem (V). Many are limited to cases 
where problem (V) is a multiple objective linear pro- 
gramming problem. Others apply when problem (V) 
is a multiple objective convex, nonlinear programming 
problem, a multiple objective integer programming 
problem, or some other type of multiple objective op- 
timization problem. Instead of examining these algo- 
rithms individually, we will describe them by groups 
according to the characteristics that they possess. 

One of the key characteristics of the interactive al- 
gorithms concerns the type of information required of 
the DM at each iteration. For instance, at each itera- 
tion, the DM may be asked to intuitively assign or re- 
assign weights to the criteria according to his current 
assessment of their relative importance. R.E. Steuer [13] 
has shown some important stumbling blocks to this ap- 
proach, however. Other algorithms may instead elicit 
relaxation quantities from the DM. In these cases, the 
DM is asked how much he would be willing to relax 
the level of one objective function in order to obtain 
possible improvements in the levels of other objective 
functions. Some of the oldest interactive algorithms use 
this approach [1,9]. Still other types of algorithms ask 
the DM various types of trade-off questions. The trade- 
off questions are designed to obtain an estimate of the 
gradient of the value function of the DM at the current 
solution. This approach is also relatively old, but diffi- 
cult for the DM to accomplish [5,14]. Finally, a num- 
ber of algorithms call for the DM to make paired com- 
parisons at each iteration. In a paired comparison, the 
DM is given two solutions to compare and must give 
his preference for one or the other. Usually, the DM 
can accomplish this. But when the two solutions are 
quite similar, difficulties can arise [15]. In addition, al- 
gorithms that use paired comparisons can sometimes 
call for excessive numbers of these comparisons [12]. 
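The comparison burden can be illustrated with the simplest possible scheme, sequential elimination over a list of efficient outcomes. It asks exactly n − 1 paired comparisons for n alternatives. This is an illustrative sketch, not one of the published algorithms cited above. The DM is simulated with a hypothetical linear value function, and its answers are assumed to be consistent and transitive, which is exactly the assumption questioned in the discussion of inconsistency below.

```python
from typing import Callable, List, Tuple

Outcome = Tuple[float, float]


def most_preferred_by_pairs(alternatives: List[Outcome],
                            prefers: Callable[[Outcome, Outcome], bool]) -> Tuple[Outcome, int]:
    """Keep an incumbent and ask the DM to compare it with each other alternative.

    Returns the surviving alternative and the number of paired comparisons asked.
    Assumes the DM's answers are transitive (consistent).
    """
    incumbent = alternatives[0]
    comparisons = 0
    for challenger in alternatives[1:]:
        comparisons += 1
        if prefers(challenger, incumbent):
            incumbent = challenger
    return incumbent, comparisons


# Simulated DM with a hypothetical linear value function over two criteria.
def v(y: Outcome) -> float:
    return 0.3 * y[0] + 0.7 * y[1]


def prefers(a: Outcome, b: Outcome) -> bool:
    return v(a) > v(b)


efficient_outcomes = [(10.0, 0.0), (8.0, 5.0), (5.0, 8.0), (0.0, 10.0)]
print(most_preferred_by_pairs(efficient_outcomes, prefers))  # ((5.0, 8.0), 3)
```

The last comparison, between (5, 8) with value 7.1 and (0, 10) with value 7.0, is the kind of close call that a real DM may find hard to answer.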


A second dimension where the interactive algorithms differ is in the approach used to explore the feasible region $X$ or the efficient set $X_E$. Some algorithms
use feasible direction methods [2]. In these algorithms,
at each iteration, the direction to move from the most recently found point
and the distance to move along that
direction are determined with the aid of the DM. By
moving along the direction by the specified amount,
the next solution point is found. In many algorithms,
all such points are efficient. In another group of algorithms, feasible region reduction is used to explore $X$ or
$X_E$. As points in $X$ or in $X_E$ are examined in these methods, portions of $X$ are removed, usually via linear cuts.
Another set of algorithms uses weighting space reduction. In these algorithms, a weighted sum of $f_j$, $j = 1, \ldots, p$, is maximized at each iteration, thereby yielding
a point in $X_E$. Based upon the DM’s responses to these
maximizations, portions of the weighting space are removed. Eventually, the portion of the weighting space
remaining is so small that the DM can pick out the set
of weights associated with a most preferred solution.

Other approaches used to explore $X$ or $X_E$ include
the trade-off cutting plane method [10], Lagrange multi- 
plier methods, visual interactive methods (see, e. g. [7]), 
and the branch and bound method [8], among oth- 
ers. For further reading concerning these methods, see 
[3,4,6,11,12,13]. 

Another way to group the interactive algorithms for 
problem (V) is according to whether or not they han- 
dle inconsistencies in the DM’s preference responses. 
As human beings, DMs are prone to giving preference
responses over the course of the solution procedure that 
imply inconsistencies such as asymmetries or intransi- 
tivities of preference. Some algorithms take no account 
of these possible inconsistencies and have been criti- 
cized for this [12]. Others attempt to reduce inconsis- 
tency by either minimizing the DM’s cognitive burden 
or by incorporating tests for inconsistency that are used 
as the interactive solution process proceeds. 

W.S. Shin and A. Ravindran [12] have compared
various classes of interactive algorithms according to four criteria that are important in practice. These
criteria are the DM’s cognitive burden, the ease with
which the single-objective optimizations called for can
be used, implemented and solved, the handling of inconsistency, and the overall quality of the solution process and the answers obtained. Although preliminary,
these comparisons seem to show the relative superiority of the weighting space reduction and other criterion
weight space search methods, and of the visual interactive methods. Readers should note, however, that the
rankings in the study were obtained subjectively by the
authors [7].

For further general reading on interactive methods, 
see [2,3,4,6,11,12,13,14]. 
As is well known, duality in mathematical programming is based on the property that any closed convex set
can also be represented as the intersection of the closed
half-spaces containing it. Let the multi-objective optimization
problem to be considered here be given by


$$ (P) \qquad \min\ f(x) = (f_1(x), \ldots, f_p(x)) \quad \text{over } x \in X, $$

where

$$ X = \{ x \in X' : g_i(x) \leqq 0,\ i = 1, \ldots, m \}, \qquad X' \subset \mathbb{R}^n. $$


Note here that vector inequalities are commonly used:
for any $n$-vectors $a$ and $b$, $a > b$ means $a_i > b_i$ $(i = 1, \ldots, n)$. Also, $a \geqq b$ means $a_i \geqq b_i$ $(i = 1, \ldots, n)$. On the
other hand, $a \ge b$ means $a \geqq b$ but $a \ne b$. Hereafter,
vector inequalities such as $g(x) \leqq 0$ will be used instead
of $g_i(x) \leqq 0$ $(i = 1, \ldots, m)$.

Defining a dual problem (D) in some appropriate 
way associated with the problem (P), our aim is to show 
the property min(P) = max(D). Here min(P) denotes
the set of efficient points of the problem (P) in the objective function space $\mathbb{R}^p$, and similarly max(D) denotes the set of efficient points
of the dual problem (D).

Unlike in ordinary mathematical programming, the
optimal values of the primal problem (and of the dual
problem) are not necessarily determined uniquely in
multi-objective optimization. Hence, several kinds of formulations of the dual problem have been
developed in order to obtain the desirable property min(P) = max(D).
Regarding Lagrange duality, three typical dualizations
can be seen: in linear cases, in nonlinear cases, and in geometric approaches [6].


Linear Cases 


The first result on duality for multi-objective optimization seems to be the one given in [1] for linear cases. This is
formulated as a matrix optimization including vector optimization as a special case. Although there have
been several related works, probably the most attractive
one is given in [2], because it is formulated as a natural
extension of traditional linear programming: let $A$ be
an $m \times n$ matrix, $C$ a $p \times n$ matrix, and $b$ an $m$-vector.
Then the primal problem (P) in linear cases is formulated as


$$
(P_L) \qquad
\begin{aligned}
\min \quad & Cx \\
\text{s.t.} \quad & Ax \geqq b \\
& x \geqq 0
\end{aligned}
$$


Associated with $(P_L)$, H. Isermann [2] defined the
dual problem as


max Ab 
(D;)) {st AAZC 
A=0. 


Here, the multiplier $\Lambda \geqq 0$ is a $p \times m$ matrix whose
elements are all nonnegative.

Then Isermann’s duality is given by

i) $\Lambda b \not\geq Cx$ for all feasible $x$ and $\Lambda$.

ii) Suppose that $\bar{\Lambda} b = C\bar{x}$ for some feasible $\bar{x}$ and
some feasible $\bar{\Lambda}$. Then $\bar{\Lambda}$ is an efficient solution to
$(D_I)$ and $\bar{x}$ is an efficient solution to $(P_L)$.

iii) $\min(P_L) = \max(D_I)$.


Nonlinear Cases 


The most natural dualization in nonlinear multi-objective optimization seems to be the one given in
[10].
Consider the problem (P), and assume the following:
i) $X'$ is a nonempty compact convex set.
ii) $f$ is continuous, and $f(X) + \mathbb{R}^p_+$ is convex in $\mathbb{R}^p$.
iii) $g_i$ $(i = 1, \ldots, m)$ are continuous and convex.
Under these assumptions, it can be readily shown that
for every $u \in \mathbb{R}^m$, both sets $X(u) = \{x \in X' : g(x) \leqq u\}$
and $Y(u) = f[X(u)] = \{y \in \mathbb{R}^p : y = f(x),\ x \in X',\ g(x) \leqq u\}$ are compact and convex.

The primal problem (P) can be embedded as $(P_0)$ in
a family of perturbed problems $(P_u)$ given by

$$ (P_u) \qquad \min\ Y(u). $$

Defining $\Gamma = \{u \in \mathbb{R}^m : X(u) \ne \emptyset\}$, the set $\Gamma$ is convex. Now, in a similar fashion to ordinary mathematical programming, the perturbation map can be defined by

$$ W(u) = \min \{ f(x) : x \in X',\ g(x) \leqq u \}. $$
It is known that for every $u \in \Gamma$, $W(u) + \mathbb{R}^p_+$ is convex and

$$ W(u) + \mathbb{R}^p_+ = Y(u) + \mathbb{R}^p_+ . $$

In addition, the map $W$ is monotone and convex on $\Gamma$.
Now, define the vector-valued Lagrangian function
with a $p \times m$ matrix multiplier $\Lambda$ as

$$ L(x, \Lambda) = f(x) + \Lambda g(x). $$


Associated with this definition, the dual map can be
defined as

$$ \Phi(\Lambda) = \min \Omega(\Lambda), $$

where

$$ \Omega(\Lambda) = \{ L(x, \Lambda) : x \in X' \}. $$

Under this terminology, the dual problem associated
with the primal problem (P) can be given by

$$ (D_{TS}) \qquad \max \bigcup_{\Lambda \in \mathcal{L}} \Phi(\Lambda). $$


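It can be shown that $\Phi$ is a concave point-to-set map
on $\mathcal{L}$, namely, for $\Lambda^1, \Lambda^2 \in \mathcal{L}$ and $0 \le \alpha \le 1$,

$$ \Phi(\alpha \Lambda^1 + (1 - \alpha)\Lambda^2) \subset \alpha \Phi(\Lambda^1) + (1 - \alpha)\Phi(\Lambda^2) + \mathbb{R}^p_+ , $$

and $\Phi(\Lambda) + \mathbb{R}^p_+$ is a convex set in $\mathbb{R}^p$ for each $\Lambda \in \mathcal{L}$.
Here $\mathcal{L}$ is the set of all $p \times m$ matrices whose components are all positive.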

T. Tanino and Y. Sawaragi [10] presented the following as duality in multi-objective optimization:

Theorem 1
i) For any $x \in X$ and $y \in \Phi(\Lambda)$,
$$ y \not\geq f(x). $$
ii) Suppose that $\hat{x} \in X$, $\hat{\Lambda} \in \mathcal{L}$ and $f(\hat{x}) \in \Phi(\hat{\Lambda})$. Then
$\hat{y} = f(\hat{x})$ is an efficient point to the primal problem
(P) and also to the dual problem $(D_{TS})$.
iii) Suppose that all efficient solutions to (P) are
proper and that Slater’s constraint qualification is
satisfied. Then
$$ \min(P) \subset \max(D_{TS}). $$
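Part i) of Theorem 1 relies only on $\Lambda$ having nonnegative entries and $x$ being feasible. For such $x$, $f(x) + \Lambda g(x) \leqq f(x)$, so no minimal element $y$ of $\Omega(\Lambda)$ can satisfy $y \ge f(x)$. The sketch below checks this numerically. It assumes that $X'$ is replaced by a finite sample, which is not convex, so only part i), not ii) or iii), is illustrated. The functions `f`, `g` and the multipliers are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, ...]


def leqq(a: Sequence[float], b: Sequence[float]) -> bool:
    """a ≦ b: a_i <= b_i for every i."""
    return all(ai <= bi for ai, bi in zip(a, b))


def leq_ne(a: Sequence[float], b: Sequence[float]) -> bool:
    """a ≤ b: a ≦ b and a != b."""
    return leqq(a, b) and tuple(a) != tuple(b)


def minimal(points: List[Vec]) -> List[Vec]:
    """Min of a finite set: elements y with no other element z ≤ y."""
    return [y for y in points if not any(leq_ne(z, y) for z in points)]


def lagrangian(f: Callable[[float], Vec], g: Callable[[float], Vec],
               Lam: Sequence[Sequence[float]], x: float) -> Vec:
    """L(x, Λ) = f(x) + Λ g(x) for a p x m matrix Λ given as nested lists."""
    fx, gx = f(x), g(x)
    return tuple(fx[i] + sum(Lam[i][j] * gx[j] for j in range(len(gx)))
                 for i in range(len(fx)))


def weak_duality_holds(f, g, X_prime: List[float], Lam) -> bool:
    """Check Theorem 1 i): y ≱ f(x) for feasible x and y in Φ(Λ) = min Ω(Λ)."""
    Phi = minimal([lagrangian(f, g, Lam, x) for x in X_prime])
    feasible = [x for x in X_prime if all(gj <= 0 for gj in g(x))]
    # y ≥ f(x) would mean f(x) ≤ y; weak duality says this never happens.
    return all(not leq_ne(f(x), y) for x in feasible for y in Phi)


def f(x: float) -> Vec:
    return (x, (x - 2.0) ** 2)


def g(x: float) -> Vec:
    return (1.0 - x,)            # single constraint g(x) ≦ 0, i.e. x >= 1


X_prime = [0.5 * i for i in range(7)]          # finite stand-in for X'
multipliers = [[[0.5], [0.5]], [[2.0], [0.1]], [[0.1], [3.0]]]
print([weak_duality_holds(f, g, X_prime, Lam) for Lam in multipliers])  # [True, True, True]
```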


Remark 2 The above theorem is not complete in the
sense that the relation min(P) = max(D) is not established; only the inclusion in iii) is obtained.
Regarding conjugate duality, there have been reports
presenting w-min(P) = w-max(D) (see, e.g., [4] and
[9]). Several studies based on geometric considerations
have been made for deriving the relation min(P) =
max(D) using a vector-valued Lagrangian. These results are
stated below.


Geometric Duality 


Geometric considerations are made in [3], based on the
supporting hyperplanes for epi W, and in [5], based on
the supporting conical varieties for epi W, which is denoted by $G$ here.
Define

$$ G = \{ (u, y) \in \mathbb{R}^m \times \mathbb{R}^p : y \geqq f(x),\ u \geqq g(x) \ \text{for some } x \in X' \}, $$

$$ Y_G = \{ y : (0, y) \in G,\ 0 \in \mathbb{R}^m,\ y \in \mathbb{R}^p \}. $$


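Associated with the primal problem (P), we consider
the following two kinds of dual problems:

$$ (D_N) \qquad \max \bigcup_{\Lambda \in \mathcal{L}} Y_S(\Lambda), $$

where

$$ Y_S(\Lambda) = \{ y \in \mathbb{R}^p : f(x) + \Lambda g(x) \not\leq y,\ \forall x \in X' \}, $$

and

$$ (D_J) \qquad \max \bigcup_{\substack{\mu > 0 \\ \lambda \geqq 0}} Y_H^-(\lambda, \mu), $$

where

$$ Y_H^-(\lambda, \mu) = \{ y \in \mathbb{R}^p : \langle \mu, f(x) \rangle + \langle \lambda, g(x) \rangle \geqq \langle \mu, y \rangle,\ \forall x \in X' \}. $$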
Theorem 3

i) For any feasible $x$ in (P) and for any feasible $y$ in $(D_N)$
or $(D_J)$,
$$ y \not\geq f(x). $$

ii) Assume that $G$ is closed, that there exists at least
one efficient solution to the primal problem, and that
these solutions are all proper. Then, under
Slater’s constraint qualification, the following
holds:
$$ \min(P) = \max(D_N) = \max(D_J). $$


Remark 4 In the above duality, we assumed that the
convex set $G$ is closed and that Slater’s constraint qualification is satisfied, which seem relatively restrictive. Instead of these conditions, J. Jahn [3] assumed that $Y_G$ is
closed together with a normality condition.

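Define

$$ A_{G(\mu)} = \{ \alpha : (0, \alpha) \in G(\mu),\ 0 \in \mathbb{R}^m,\ \alpha \in \mathbb{R} \}, $$

$$ Y_G = \{ y : (0, y) \in G,\ 0 \in \mathbb{R}^m,\ y \in \mathbb{R}^p \}. $$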


Definition 5 The primal problem (P) is said to be J-normal if for every $\mu > 0$
$$ \operatorname{cl} A_{G(\mu)} = A_{\operatorname{cl} G(\mu)}. $$
The primal problem (P) is said to be J-stable if it is
J-normal and for an arbitrary $\mu > 0$ the problem
$$ \sup_{\lambda \geqq 0}\ \inf_{x \in X'}\ \langle \mu, f(x) \rangle + \langle \lambda, g(x) \rangle $$
has at least one solution.


On the other hand, J.W. Nieuwenhuis [7] suggested another normality condition:

Definition 6 The primal problem (P) is said to be N-normal if
$$ \operatorname{cl} Y_G = Y_{\operatorname{cl} G}. $$

Lemma 7 Slater’s constraint qualification ($\exists x \in X'$ with $g(x) < 0$) yields J-stability and N-normality.


Theorem 8 Suppose that $Y_G$ is closed, $\min(P) \ne \emptyset$,
and the efficient solutions to (P) are all proper. Then, under the condition of J-stability,
$$ \min(P) = \max(D_N) = \max(D_J). $$


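Duality for Weak Efficiency
Define

$$ Y_{S'}(\Lambda) = \{ y \in \mathbb{R}^p : f(x) + \Lambda g(x) \not< y,\ \forall x \in X' \}. $$

Theorem 9 Suppose that $Y_G$ is a nonempty subset of
$\mathbb{R}^p$ and $Y_G + \mathbb{R}^p_+$ is bounded. Then, under the condition
of N-normality,

$$ \text{w-min}\ \operatorname{cl} Y_G = \text{w-max}\ \operatorname{cl} \bigcup_{\Lambda \in \mathcal{L}} Y_{S'}(\Lambda) = \text{w-max}\ \operatorname{cl} \bigcup_{\substack{\mu \in \mathbb{R}^p_+ \setminus \{0\} \\ \lambda \geqq 0}} Y_H^-(\lambda, \mu). $$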


Remark 10 As can be readily seen, by defining $\inf A$,
for a set $A \subset \mathbb{R}^p$, as essentially $\min \operatorname{cl}(A + \mathbb{R}^p_+)$ and similarly $\sup A$ as essentially $\max \operatorname{cl}(A - \mathbb{R}^p_+)$, we can have
$\inf(P) = \sup(D_{TS}) = \sup(D_N) = \sup(D_J)$ under some
appropriate stability condition [9].
Multi-objective Optimization: Pareto Optimal Solutions, Properties

The multi-objective optimization (multiple criteria decision making) problem is the problem of choosing
a most preferred solution when two or more incommensurate, conflicting objective functions (criteria) are
to be simultaneously maximized. A central difficulty in
such problems is that, unlike in single objective maximization problems, there is no obvious or simple way to
define the concept of a most preferred solution. Nevertheless, because applications of multi-objective optimization abound, there has been great interest during the past 30 years in seeking appropriate definitions for a most preferred solution and in developing
algorithms that aid the decision maker (DM) to find
such a solution. These applications are in a wide variety
of areas, including, for example, production planning,
finance, environmental conservation, academic planning, nutrition planning, advertising, facility location,
auditing, blending techniques, transportation planning,
and scheduling, to name just a few.

There are several alternate mathematical formulations of the multi-objective optimization problem [13].
For purposes of modeling the deterministic multiple
objective optimization problems found in management
science/operations research, however, the most popular
form of the problem is denoted

$$ \text{VMAX } [f_1(x), \ldots, f_p(x)] \quad \text{s.t. } x \in X. \tag{V} $$

Here, $p \ge 2$, $X$ is a nonempty subset of $\mathbb{R}^n$, each $f_j$, $j = 1, \ldots, p$, is a real-valued function defined on $X$ or on a suitable set containing $X$, and VMAX indicates that we are
to, in some as-yet unspecified sense, “vector maximize”
the vector

$$ f(x) = [f_1(x), \ldots, f_p(x)] $$

of objective functions (criteria) over $X$. The set $X$ is
called the set of alternatives or the decision set.

Of all of the solution concepts proposed for helping
the DM find a most preferred solution for problem (V),
the concept of efficiency has proven to be of overriding importance. An efficient (Pareto optimal, noninferior, nondominated) solution for problem (V) is a point
$\bar{x} \in X$ such that there exists no other point $x \in X$ that
satisfies $f(x) \ge f(\bar{x})$ and $f(x) \ne f(\bar{x})$. Letting $X_E$ denote the set of all efficient points for problem (V), we
see that whenever $\bar{x} \in X_E$, there is no other feasible
point that does at least as well as $\bar{x}$ in all of the criteria
for problem (V) and strictly better in at least one criterion. A point $\bar{x} \in X$ is called dominated when for some
other point $x \in X$, $f(x) \ge f(\bar{x})$ and, for at least one
$j = 1, \ldots, p$, $f_j(x) > f_j(\bar{x})$. Thus, we have the alternate
definition for efficiency that states that a point $\bar{x}$ is an efficient solution for problem (V) when $\bar{x} \in X$ and there
are no other points in $X$ that dominate $\bar{x}$.

One of the reasons for the fundamental importance 
of the efficiency concept is that it has proven to be 
highly useful in a variety of algorithms for problem 
(V). Among these algorithms are the satisficing meth- 
ods, compromise programming, most interactive meth- 
ods, and the vector maximization method. The latter 
method, for instance, seeks to generate either all of $X_E$
or key parts of $X_E$. The generated set is shown to the
DM. Then, based upon the DM’s internal utility (or 
value) function, the DM chooses from the generated set 
a most preferred solution. For details concerning these 
methods for problem (V), see [7,10,12,13,14]. 

In some cases, it is useful to consider a slightly 
relaxed concept of efficiency called weak efficiency. 
A point $\bar{x} \in X$ is called a weakly efficient (weakly Pareto
optimal, weakly noninferior, weakly nondominated) solution for problem (V) when there is no other point $x \in X$ such that $f(x) > f(\bar{x})$, i.e., such that $f_j(x) > f_j(\bar{x})$ for every $j = 1, \ldots, p$. Let $X_{WE}$ denote the set of
all weakly efficient points for problem (V). Notice that
$X_E$ is a subset of $X_{WE}$. In some cases of problem (V),
such as when the objective functions are ratios of linear
functions, it is easier to analyze and generate points in
$X_{WE}$ than points in $X_E$.
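The two definitions differ only in the dominance test: efficiency rules out any outcome that is at least as good everywhere and different, while weak efficiency rules out only outcomes that are strictly better in every criterion. The sketch below compares them on a hypothetical finite list of outcomes $f(x)$. Both criteria are maximized, and each alternative is identified with its outcome for brevity.

```python
from typing import List, Sequence, Tuple

Outcome = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a >= b componentwise and a != b (the efficiency test)."""
    return all(ai >= bi for ai, bi in zip(a, b)) and tuple(a) != tuple(b)


def strictly_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a > b in every component (the weak-efficiency test)."""
    return all(ai > bi for ai, bi in zip(a, b))


def efficient(outcomes: List[Outcome]) -> List[Outcome]:
    return [y for y in outcomes if not any(dominates(z, y) for z in outcomes)]


def weakly_efficient(outcomes: List[Outcome]) -> List[Outcome]:
    return [y for y in outcomes if not any(strictly_dominates(z, y) for z in outcomes)]


# Hypothetical outcomes f(x) of five decision alternatives (both criteria maximized).
outcomes = [(1.0, 3.0), (2.0, 3.0), (3.0, 1.0), (3.0, 0.0), (0.0, 0.0)]
X_E = efficient(outcomes)
X_WE = weakly_efficient(outcomes)
print(X_E)   # [(2.0, 3.0), (3.0, 1.0)]
print(X_WE)  # [(1.0, 3.0), (2.0, 3.0), (3.0, 1.0), (3.0, 0.0)]
```

Here (1, 3) and (3, 0) are weakly efficient but not efficient. Each ties the best value in one criterion while being beaten in the other, which the strict test cannot detect.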

Let $U$ represent a utility function defined on the
space $\mathbb{R}^p$ of the objective functions of problem (V). Suppose that $U$ is coordinatewise increasing, i.e., that whenever $\hat{z}, z \in \mathbb{R}^p$ satisfy $\hat{z} \ge z$ and $\hat{z}_j > z_j$ for some $j = 1, \ldots, p$, then $U(\hat{z}) > U(z)$. Suppose that $x^*$ is an optimal
solution to the single objective problem

$$ \text{(S)} \qquad \max\ U[f_1(x), \ldots, f_p(x)] \quad \text{s.t. } x \in X. $$

Then $x^*$ must be an efficient solution for problem (V)
(cf. [11]).

The property in the previous paragraph explains 
to a great extent why the concept of efficiency is of 
such fundamental value. The assumption that the util- 
ity function U in the above paragraph is coordinatewise 
increasing implies that in problem (S), for each $j = 1, \ldots, p$, more of $f_j$ is preferred to less of $f_j$. Thus, if we imagine that $U$ is the utility (or value) function of the DM
over the objective function space of problem (V), then
the previous paragraph implies that whenever the DM
prefers more to less in each objective function of problem (V), any point that maximizes the DM’s utility for
$f(x)$ over $X$ must be an efficient point in problem (V).
In short, as long as we know that the DM prefers more
to less, we can confine the search for a most preferred
solution to $X_E$. Although the utility function of the DM
is generally not actually available, in virtually all applications the DM does, indeed, prefer more to less in each
objective function of problem (V). Thus, in essentially
all cases, any most preferred solution for problem (V)
will be found in $X_E$.

Because of the central importance of efficiency, 
a great deal of effort has been made by researchers to 
delineate the properties of the efficient points and of the 
efficient set for problem (V). In what follows, we shall 
briefly highlight some of the most important of these 
properties. 

Consider the single-objective optimization problem

$$ (W)\qquad \max \sum_{j=1}^{p} w_j f_j(x) \quad \text{s.t. } x \in X. $$

Here, $w_j$, $j = 1, \dots, p$, are parameters, which are often thought of as weights associated with the objective functions $f_j$, $j = 1, \dots, p$, of problem (V). A number of so-called scalarization properties for efficient points of problem (V) are expressed in terms of problem (W).
To present some of these, another efficiency concept, called proper efficiency, is needed. A point $x^0$ is said to be a properly efficient solution for problem (V) when $x^0 \in X_E$ and, for some sufficiently large number M, whenever $f_i(x) > f_i(x^0)$ for some $i = 1, \dots, p$ and some $x \in X$, there exists some $j = 1, \dots, p$ such that $f_j(x) < f_j(x^0)$ and

$$ \frac{f_i(x) - f_i(x^0)}{f_j(x^0) - f_j(x)} \le M. $$

In words, for each properly efficient solution of problem (V), for each criterion, the possible marginal gains in that criterion relative to the losses in the criteria that have losses cannot all be unbounded from above. Let $X_{PrE}$ denote the set of properly efficient solutions for problem (V), and let $w^T = (w_1, \dots, w_p)$. Then some key scalarization properties are as follows.

1) If x is the unique optimal solution to problem (W) for some $w \ge 0$, $w \ne 0$, then $x \in X_E$.

2) If x is an optimal solution to problem (W) for some $w \ge 0$, $w \ne 0$, then $x \in X_{WE}$.

3) Assume that for each $j = 1, \dots, p$, $f_j$ is a concave function on the convex set X. Then $x \in X_{PrE}$ if and only if x is an optimal solution to problem (W) for some $w > 0$ (i.e., $w_j > 0$ for all j).

4) Under the assumptions in property 3), $x \in X_{WE}$ if and only if x is an optimal solution to problem (W) for some $w \ge 0$, $w \ne 0$.

5) Under the assumptions of property 3), if $x \in X_E$ but $x \notin X_{PrE}$, then there exists a $w \ge 0$, $w \ne 0$, with $w_j = 0$ for at least one $j = 1, \dots, p$, such that x is an optimal solution to problem (W).

6) If each $f_j$, $j = 1, \dots, p$, is a linear function and X is a polyhedron, then $X_{PrE} = X_E$.

The scalarization properties can be used for various purposes, including the generation of points in $X_E$, $X_{WE}$ and $X_{PrE}$. For instance, when each $f_j$, $j = 1, \dots, p$, is a linear function and X is a polyhedron, properties 3) and 6) imply that points in $X_E$, including, at least potentially, all of $X_E$, can be generated by solving problem (W) as the parameter $w > 0$ is varied. Under the assumptions of property 3), the same process will generate points in $X_{PrE}$, including, at least potentially, all of $X_{PrE}$. However, from properties 3)-5), it is apparent that no such simple process for generating $X_E$ exists, even under the assumptions of property 3). This is another motivation for the proper efficiency concept.
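The generation process can be sketched for a small linear example. The sketch assumes, for illustration only, that X is a polytope given by the list of its vertices and that $f(x) = Cx$. A linear objective over a polytope attains its maximum at a vertex, so each problem (W) is solved here by scoring the vertices instead of calling an LP solver. Only strictly positive weights are used, in line with properties 3) and 6). The function name is hypothetical.

```python
import numpy as np


def weighted_sum_vertices(C, V, weights):
    """Solve (W) max_x w^T C x over X = conv(rows of V) for each weight vector w.

    A linear objective over a polytope attains its maximum at a vertex, so it is
    enough to score the vertices. Returns the sorted indices of vertices that
    were optimal for at least one weight vector (ties go to the lowest index).
    """
    F = V @ C.T                      # row i holds f(v_i) = C v_i
    found = set()
    for w in weights:
        found.add(int(np.argmax(F @ w)))
    return sorted(found)


C = np.eye(2)                                            # f_1(x) = x_1, f_2(x) = x_2
V = np.array([[0.0, 0.0], [4.0, 0.0], [3.0, 2.0], [0.0, 3.0]])
weights = [np.array([t, 1.0 - t]) for t in np.linspace(0.05, 0.95, 19)]
print(weighted_sum_vertices(C, V, weights))  # [1, 2, 3]
```

With this weight grid the three vertices (4, 0), (3, 2) and (0, 3) on the upper-right boundary are found, and the dominated vertex (0, 0) is never optimal. Whole efficient edges are optimal only for isolated weight vectors, so a finite grid of weights typically recovers the efficient vertices rather than all of $X_E$.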

Another important issue in efficiency concerns testing. One may want to test a given point for efficiency in problem (V), and one may want to test whether $X_E$ and $X_{PrE}$ are empty or not. We will present several of the properties of efficiency that provide some of the theory for these tests. These properties all utilize the single-objective problem

$$ (T)\qquad \max \sum_{j=1}^{p} f_j(x) \quad \text{s.t. } f_j(x) \ge f_j(x^0),\ j = 1, \dots, p,\quad x \in X. $$

Here, $x^0$ is an arbitrary element of $\mathbb{R}^n$. The properties are as follows.

7) The point $x^0 \in \mathbb{R}^n$ belongs to $X_E$ if and only if $x^0$ is an optimal solution to problem (T).

8) Suppose that $x^0 \in X$ in problem (T), and that problem (T) has no finite maximum value. Then $X_{PrE} = \emptyset$ [1].

9) Suppose that the assumptions of property 3) hold, that $x^0 \in X$ in problem (T), and that problem (T) has no finite maximum value. Then, if the set

$$ Z = \{ z \in \mathbb{R}^p : z \le f(x) \text{ for some } x \in X \} $$

is closed, $X_E = \emptyset$.

10) Assume that each $f_j$, $j = 1, \dots, p$, is a linear function and that X is a polyhedron. Suppose that $x^0 \in X$ in problem (T), and that problem (T) has no finite maximum value. Then $X_E = \emptyset$.

11) Any optimal solution to problem (T) belongs to $X_E$.

Notice from these properties that solving problem (T) is a useful tool both for testing a point for efficiency and for investigating whether $X_E$ and $X_{PrE}$ are empty or not. In the case of testing a point $x^0$ for efficiency, property 7) shows that problem (T) can be used to obtain a definitive answer, i.e., using property 7), we will always detect whether or not $x^0 \in X_E$. Furthermore, when property 7) shows that $x^0 \notin X_E$, but problem (T) has an optimal solution $x^*$, then, by property 11), $x^* \in X_E$. Notice also that in this case, $x^*$ dominates $x^0$.
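The test based on property 7) can be sketched as follows. To keep the example self-contained, X is again assumed to be a finite set described by the objective vectors of its points. For a polyhedral X one would solve (T) with an LP solver instead. The function is a hypothetical helper: it solves (T) by enumeration and reports whether $x^0$ is efficient, together with an optimal solution of (T), which is efficient by property 11).

```python
import numpy as np


def efficiency_test(F, i0, tol=1e-12):
    """Solve problem (T) over a finite X whose objective vectors are the rows of F.

    Returns (is_efficient, best): is_efficient tells whether x^0 = X[i0] is in X_E
    (property 7); best is the index of an optimal solution of (T), in X_E by property 11.
    """
    feasible = np.all(F >= F[i0], axis=1)              # constraints f_j(x) >= f_j(x^0)
    sums = np.where(feasible, F.sum(axis=1), -np.inf)  # objective of (T); -inf if infeasible
    best = int(np.argmax(sums))
    # x^0 is always feasible for (T); it is optimal iff no feasible point has a larger sum
    is_efficient = bool(sums[best] <= F[i0].sum() + tol)
    return is_efficient, best


F = np.array([[3, 1], [1, 3], [2, 2], [3, 0], [1, 1]])
print(efficiency_test(F, 3))  # (False, 0)
print(efficiency_test(F, 2))  # (True, 2)
```

Here the point with objective vector (3, 0) is not efficient, and (T) returns the point (3, 1), which dominates it. The point (2, 2) is efficient because it is the only feasible point of its own problem (T).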

In the case of investigating whether or not $X_E$ and $X_{PrE}$ are empty, however, definitive answers cannot usually be obtained by using these properties. This is because none of the properties addresses the issue of whether or not $X_E$ and $X_{PrE}$ are empty when, instead of having an optimal solution or having no finite maximum value, problem (T) has a finite but unattained maximum value. The one case where the properties can be used to definitively detect whether or not $X_E$ and $X_{PrE}$ are empty is the case where the objective functions of problem (V) are all linear and X is a polyhedron. In that case, problem (T) is a linear program, so it cannot have a finite but unattained maximum value. Therefore, properties 7), 10) and 11) can be used to detect whether or not $X_E = X_{PrE}$ is empty in such cases.

One of the main computational challenges in generating all or parts of $X_E$ or $X_{WE}$ for the DM to consider is that both $X_E$ and $X_{WE}$ are, except for trivial cases, nonconvex sets. Although some researchers have suggested ways to mitigate this problem [5], it generally remains a major stumbling block for algorithm development. In many common cases, however, $X_E$ or $X_{WE}$ possesses a property that is weaker than convexity but still useful, and upon which algorithms can be based. This property is called connectedness. In particular, a set $Z \subseteq \mathbb{R}^n$ is connected if, whenever A and B are nonempty subsets of $\mathbb{R}^n$ such that A has no points in common with the closure of B, and B has no points in common with the closure of A, $Z \ne A \cup B$. Some common cases of problem (V) where $X_E$ or $X_{WE}$ is connected are given in the following properties.

12) Assume that for each $j = 1, \dots, p$, $f_j$ is a quasiconcave function on X, and that X is a compact convex set. Then $X_{WE}$ is connected.

13) Assume that for each $j = 1, \dots, p$, $f_j$ is a concave function on $\mathbb{R}^n$, and that X is a compact convex set. Then $X_E$ is connected.

Recall that a concave function on a convex set is also quasiconcave on the set. Therefore, from property 12), it follows that $X_{WE}$ is connected when each objective function in problem (V) is a concave function on X, and X is a compact, convex set.

There are a variety of other properties of efficient 
points and of the efficient set for problem (V). These in- 
clude, for instance, density properties, stability-related 
properties, the domination property [2,3,8], and com- 
plete efficiency-related properties [4,6]. For further 
reading, see [5,7,9,10,12,13,14]. 
In this article we will describe some results for sensitivity analysis and parametric programming for linear models. The solution approach that is described here is based upon the extension of the simplex algorithm for linear programs (LP) [3,5]. Here we mention some references ([1,2,6,7,8,9,10,11,12,13,14,15,16,18,19,20,21] and [17]); however, [3] is recommended for an extensive list of references and [4] for a historical outline of parametric linear programming.

We will consider right-hand side (RHS) multiparametric linear programming problems, where uncertain parameters are assumed to be bounded in a convex region. The solution algorithm is based upon characterizing the given initial convex region by a number of nonoverlapping smaller convex regions and obtaining optimal solutions associated with each of these regions.

Multiparametric Linear Programming, Figure 1
Definition of critical regions

The basic assumptions for the application of the algorithm are:

• The given region must be finite and connected.

• One should be able to characterize at least one (smaller) region.

• One should be able to identify all regions that are adjacent to a given region.

Consider the following multiparametric linear programming problem, when parameters are present on the right-hand side of the constraints:

$$ z(\theta) = \min_x\; c^T x \quad \text{s.t. } Ax = b + F\theta,\ x \ge 0,\ x \in \mathbb{R}^n,\ \theta \in \mathbb{R}^s, \tag{1} $$

where x is a vector of continuous variables; A and F are constant matrices, and c and b are constant vectors of appropriate dimensions; θ is a vector of uncertain parameters, such that for each $\theta \in K$, $K \subseteq \mathbb{R}^s$, (1) has a finite optimal solution, and has no optimal solution for $\theta \in \mathbb{R}^s \setminus K$. Further, consider the following restriction on θ: $\theta \in \Theta$, $\Theta = \{\theta : G\theta \le g\}$, where G is a constant matrix and g is a constant vector; see Fig. 1 for a graphical interpretation for the two-parameter case, where θ is bounded in the region given by PQRST.

The simplex tableau associated with (1) is given as follows:

$$ Yx - {}^{p}F\theta = x_B, \qquad z + {}^{p}z^T x - f_{n+1}^T\theta = z^{(p)}, $$

where

$$ Y = B^{-1}A,\quad {}^{p}F = B^{-1}F,\quad x_B = B^{-1}b,\quad z = c^T x,\quad {}^{p}z^T = c_B^T Y - c^T,\quad f_{n+1}^T = c_B^T\,{}^{p}F,\quad z^{(p)} = c_B^T x_B, \tag{2} $$

where p denotes the index set of the basic variables and B is the corresponding basis matrix. The (critical) region within which the above (optimal) tableau is valid can then be derived as follows. The critical region, CR, where an optimal solution, $z^{(p)}(\theta) = c_B^T x_B(\theta)$, preserves its optimality, is given by the initial conditions on θ:

$$ G\theta \le g \tag{3} $$
together with the conditions of primal feasibility. (The optimality conditions on the reduced costs ${}^{p}z$ do not depend on θ, since θ appears only in the right-hand side.) The conditions of primal feasibility are derived as follows. The basis B is said to be primal feasible if the condition

$$ B^{-1}b(\theta) = x_B(\theta) \ge 0, \tag{4} $$

where $b(\theta) = b + F\theta$ and $x_B(\theta) = x_B + {}^{p}F\theta$, is satisfied. Then using (2) and (4), the condition of primal feasibility is given by:

$$ -{}^{p}F\theta \le x_B. \tag{5} $$
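For a given optimal basis, conditions (3) and (5) and the affine value function $z^{(p)}(\theta) = z^{(p)} + f_{n+1}^T\theta$ can be assembled directly from the tableau quantities in (2). The sketch below is a minimal numpy version. It assumes that the basis, given as a list of column indices, is known and that its basis matrix is nonsingular. It checks optimality through the reduced costs, which do not depend on θ, and returns the stacked inequalities without removing redundant ones. The function name and the small example LP are hypothetical.

```python
import numpy as np


def critical_region(A, b, c, F, basis, G, g, tol=1e-9):
    """Critical region and value function of one optimal basis of problem (1),
    min c^T x s.t. A x = b + F theta, x >= 0.

    Returns (H, h, z0, zgrad): CR = {theta : H theta <= h} stacks (5) and (3),
    and z(theta) = z0 + zgrad @ theta inside CR. Redundant rows are not removed.
    """
    B_inv = np.linalg.inv(A[:, basis])
    reduced = c - c[basis] @ B_inv @ A         # reduced costs: independent of theta
    if np.any(reduced < -tol):
        raise ValueError("basis is not optimal (negative reduced cost)")
    pF = B_inv @ F                              # ^pF = B^{-1} F
    xB = B_inv @ b                              # x_B = B^{-1} b
    H = np.vstack([-pF, G])                     # (5): -^pF theta <= x_B ; (3): G theta <= g
    h = np.concatenate([xB, g])
    z0 = float(c[basis] @ xB)                   # z^(p) = c_B^T x_B
    zgrad = c[basis] @ pF                       # f_{n+1}^T = c_B^T ^pF
    return H, h, z0, zgrad


# min x1 + 2 x2  s.t.  x1 + x2 = 2 + theta1,  x1 + x3 = 1 + theta2,  x >= 0
A = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])
c = np.array([1.0, 2.0, 0.0])
F = np.eye(2)
G = np.vstack([np.eye(2), -np.eye(2)])          # Theta: -0.5 <= theta_i <= 0.5
g = np.full(4, 0.5)
H, h, z0, zgrad = critical_region(A, b, c, F, [0, 1], G, g)
print(z0, zgrad)
```

For this example the basis {x1, x2} gives $x_1 = 1 + \theta_2$ and $x_2 = 1 + \theta_1 - \theta_2$, so the first two rows of H encode $-\theta_2 \le 1$ and $-\theta_1 + \theta_2 \le 1$, and the value function is $z(\theta) = 3 + 2\theta_1 - \theta_2$.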


Thus, the critical region corresponding to p is given by (5) and (3). For illustration purposes, say in Fig. 1, the initial region of θ (condition (3)) is given by PQRST and the condition of primal feasibility is given by UVWX (condition (5)); then $CR^1$ is the corresponding critical region. Note that $CR^1$ is obtained by removing the redundant constraints PT, QR and RS. In order to devise a procedure to obtain all the critical regions ($CR^1$ and $CR^2$), and the optimal solutions associated with them, we first state the following:

• Two optimal bases are said to be neighbors if
  - there exists some $\theta^* \in K$ such that both bases are optimal, and
  - it is possible to pass from one basis to the other by one dual step.

• The critical regions associated with two different optimal bases are said to be neighbors if their corresponding bases are neighbors.

• Two neighboring critical regions lie in opposite half-spaces of the hyperplane containing their common boundary.

• The optimal value function, z(θ), is continuous and convex; see Fig. 2 for a graphical interpretation for the case of two parameters.

Based upon the above statements, the solution algorithm for identifying all the critical regions can now be described. The algorithm consists of two major parts. In the first part, an initial feasible solution is obtained and the critical region which corresponds to the initial solution is characterized. The second part then starts with this critical region and identifies all the regions and corresponding optimal solutions. The major steps of the algorithm are as follows:

1) Find a feasible solution:
  - Solve (1) by treating θ as a free variable to obtain θ*. If no feasible solution exists, stop; (1) is infeasible.
  - Fix θ = θ* and solve (1) to obtain an initial basis B and the corresponding critical region.

2) Find all optimal solutions:
  - Construct two lists V and W, where V consists of those optimal bases whose neighboring bases have been identified, and W consists of those bases whose neighbors have not yet been identified (initially, V is empty and W contains only the initial basis B).
  - Select any basis from W, identify all its neighboring bases, and move the selected basis from W to V. From all the identified bases, insert in W those bases which are neither in V nor in W. The optimal solutions (and corresponding critical regions) are determined by moving from the basis to its neighbors by one dual step.
  - Repeat the procedure until W = ∅.

Multiparametric Linear Programming, Figure 2
z(θ) is a continuous and convex function of θ
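The list bookkeeping of step 2 can be sketched independently of how neighbors are computed. In the sketch below, `neighbors` is a hypothetical user-supplied function that returns the optimal bases reachable from a basis by one dual step (that computation is not shown), and bases are represented as hashable tuples of column indices. W is kept as a queue, and a basis is recorded in V once its neighbors have been identified.

```python
from collections import deque


def explore_bases(initial_basis, neighbors):
    """Enumerate all optimal bases reachable from initial_basis via the V/W lists.

    V: bases whose neighbors have been identified.
    W: bases whose neighbors have not yet been identified.
    """
    V = []
    W = deque([initial_basis])
    in_V_or_W = {initial_basis}
    while W:                                  # repeat until W is empty
        basis = W.popleft()
        for nb in neighbors(basis):           # one dual step away from basis
            if nb not in in_V_or_W:           # insert only bases in neither V nor W
                in_V_or_W.add(nb)
                W.append(nb)
        V.append(basis)                       # basis moves from W to V
    return V


# Toy adjacency standing in for the dual-step neighbor computation
adjacency = {(0, 1): [(0, 2)], (0, 2): [(0, 1), (1, 2)], (1, 2): [(0, 2)]}
print(explore_bases((0, 1), lambda basis: adjacency[basis]))  # [(0, 1), (0, 2), (1, 2)]
```

Any basis reachable through a chain of neighbors is visited exactly once, which matches the stopping rule W = ∅.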


See also 


> Bounds and Solution Vector Estimates for Parametric NLPs
> Global Optimization in Multiplicative Programming
> Linear Programming
> Multiparametric Mixed Integer Linear Programming
> Multiplicative Programming
> Nondifferentiable Optimization: Parametric Programming
> Parametric Global Optimization: Sensitivity
> Parametric Linear Programming: Cost Simplex Algorithm
> Parametric Mixed Integer Nonlinear Optimization
> Parametric Optimization: Embeddings, Path Following and Singularities
> Selfdual Parametric Method for Linear Programs




References 


1. Gal T (1992) Putting the LP survey into perspective. OR/MS Today 19(6):93
2. Gal T (1992) Weakly redundant constraints and their impact on postoptimal analysis in linear programming. Europ J Oper Res 60:315-336
3. Gal T (1995) Postoptimal analyses, parametric programming, and related topics. de Gruyter, Berlin
4. Gal T (1997) Advances in sensitivity analysis and parametric programming. Kluwer, Dordrecht
5. Gal T, Nedoma J (1972) Multiparametric linear programming. Managem Sci 18:406-422
6. Granot D, Granot F, Johnson EL (1982) Duality and pricing in multiple right-hand choice linear programming problems. Math Oper Res 7:545-556
7. Greenberg HJ (1986) An analysis of degeneracy. Naval Res Logist Quart 33:635-655
8. Greenberg HJ (1993) How to analyze the results of linear programs - Part 1: Preliminaries. Interfaces 23(4):56-67
9. Greenberg HJ (1993) How to analyze the results of linear programs - Part 2: Price Interpretation. Interfaces 23(5):97-114
10. Greenberg HJ (1993) How to analyze the results of linear programs - Part 3: Infeasibility Diagnosis. Interfaces 23(6):120-139
11. Greenberg HJ (1994) How to analyze the results of linear programs - Part 4: Forcing Structures. Interfaces 24(1):121-130
12. Greenberg HJ (1994) The use of optimal partition in linear programming solution for postoptimal analysis. Oper Res Lett 15:179-185
13. Greenberg HJ (1996) The ANALYZE rulebase for supporting LP analysis. Ann Oper Res 65:91-126
14. Hansen PM, Labbé M, Wendell RE (1989) Sensitivity analysis in multiple objective linear programming: the tolerance approach. Europ J Oper Res 38(1):63-69
15. Magnanti TL, Orlin JB (1988) Parametric linear programming and anti-cycling pivoting rules. Math Program 41:317-325
16. Murty K (1980) Computational complexity of parametric linear programming. Math Program 19:213-219
17. Roos C, Terlaky T, Vial J-Ph (1997) Theory and algorithms for linear optimization, an interior point approach. Wiley, New York
18. Wang H-F, Huang C-S (1993) Multiparametric analysis of the maximum tolerance in a linear programming problem. Europ J Oper Res 67(1):75-87
19. Ward JE, Wendell RE (1990) Approaches to sensitivity analysis in linear programming. Ann Oper Res 27:3-38
20. Wendell RE (1985) The tolerance approach to sensitivity analysis in linear programming. Managem Sci 31:564-578
21. Wendell RE (1997) Linear programming 3: The tolerance approach. In: Gal T, Greenberg HJ (eds) Advances in Sensitivity Analysis and Parametric Programming. Kluwer, Dordrecht


————— 
Multiparametric Mixed Integer
Linear Programming


VIVEK DUA, EFSTRATIOS N. PISTIKOPOULOS 
Imperial College, London, UK 


MSC2000: 90C31, 90C11 


Article Outline 


Keywords 

Mixed Integer Linear Programming Problems 
Involving a Single Uncertain Parameter 
in Objective Function Coefficients 

See also 

References 


Keywords 


Parametric bounds; Branch and bound; Comparison of 
parametric solutions 


In this article we describe theoretical and algorithmic developments in the field of parametric programming for linear models involving 0-1 integer variables. We will consider two cases of the problem: single parametric (when a single uncertain parameter is present) and multiparametric (when more than one uncertain parameter is present in the model). For the case when a single uncertain parameter is present, solution approaches are based upon

a) enumeration [11,12,13];

b) cutting planes [6]; and

c) branch and bound techniques [8,10].

For the multiparametric case, the solution algorithm that has been proposed is based upon branch and bound fundamentals [1,2]. While most of the work on single parametric problems has been reviewed in the two excellent papers [5] and [7], and has been borrowed here for the sake of completeness, the work on multiparametric problems, the focus of this article, is quite recent and is described in detail. It may be mentioned that while solution approaches for the single parametric case are available for uncertainty in objective function coefficients or in the right-hand side of constraints, for the case of more than one uncertain parameter the solution approach is available only for the right-hand side case.

Next we will describe solution approaches for

a) single parametric mixed integer linear programs for objective function coefficients parametrization; and

b) single parametric pure integer programs when the uncertain parameter is present on the right-hand side of the constraints.

These illustrate some concepts which are based upon some basic observations. For other solution approaches, see the literature cited above. Finally we will present a solution approach for right-hand side multiparametric mixed integer linear programs.


Mixed Integer Linear Programming Problems 
Involving a Single Uncertain Parameter 
in Objective Function Coefficients 


These can be stated as follows:

$$ z(\phi) = \min_{x,y}\; (c + c'\phi)^T x + d^T y \quad \text{s.t. } Ax + Ey \le b,\ x \in \mathbb{R}^n,\ y \in \{0,1\}^l,\ \phi_{\min} \le \phi \le \phi_{\max}, \tag{1} $$


where x is a vector of continuous variables; y is the vector of 0-1 integer variables; φ is a scalar uncertain parameter bounded between its lower and upper bounds $\phi_{\min}$ and $\phi_{\max}$, respectively; A is an $(m \times n)$ matrix; E is an $(m \times l)$ matrix; c, c′, d and b are vectors of appropriate dimensions. The solution procedure for (1) is based upon the following two features of the formulation in (1). The first feature is that, since the uncertain parameter is present in the objective function only, the feasible region of (1) remains constant for all fixed values of φ in $[\phi_{\min}, \phi_{\max}]$. The second feature is that the optimal value of (1) for $\phi_{\min} \le \phi \le \phi_{\max}$ is piecewise linear, continuous, and concave on its finite domain, being the pointwise minimum of objective values that are each linear in φ. The solution is then approached by deriving valid upper and lower bounds, using the concavity property of the objective function value, and sharpening these bounds until they converge to the same value, as described next. Solving (1) for φ fixed at its endpoints $\phi_{\min}$ and $\phi_{\max}$ gives upper bounds AB and BC, respectively (see Fig. 1): the solution found at each endpoint remains feasible for every φ, so its objective value, which is linear in φ, bounds the optimal value from above. By concavity, a linear interpolation, AC, between the endpoints provides a lower bound to the solution. The region ABC within which the solution will lie is then reduced by solving (1) at $\phi_{\text{int}}$, the intersection point of the two upper bounds AB and BC. This results (see Fig. 2) in two smaller regions, ADE and EFC, within which the solution will exist. This procedure is continued until the difference between the upper and lower bounds becomes zero.

Multiparametric Mixed Integer Linear Programming, Figure 1
Derivation of bounds (UB = upper bound; LB = lower bound)

Multiparametric Mixed Integer Linear Programming, Figure 2
Sharpening of bounds
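The bound-sharpening procedure can be sketched as a short recursion. The sketch assumes a hypothetical oracle `solve(phi)` that stands for solving (1) at a fixed φ. It returns the cost of the optimal solution as a line $a + b\phi$, with $a = c^T x + d^T y$ and $b = c'^T x$. Because the feasible region does not depend on φ, that line is a valid upper bound over the whole interval. Each call intersects the two upper bounds, solves at $\phi_{\text{int}}$, and either closes the gap or splits the interval. For a self-contained demonstration the oracle simply picks the best of three fixed lines; in practice it would be a MILP solve.

```python
def parametric_profile(solve, phi_min, phi_max, tol=1e-9):
    """Trace the concave piecewise-linear z(phi) of problem (1) by bound sharpening.

    solve(phi) must return the optimal solution at fixed phi as a line (a, b),
    i.e. objective value a + b*phi. Returns [(phi_start, phi_end, (a, b)), ...].
    """
    pieces = []

    def value(line, phi):
        return line[0] + line[1] * phi

    def refine(pl, ll, pr, lr):
        if abs(ll[1] - lr[1]) <= tol:
            # Equal slopes: both endpoint solutions trace the same line on [pl, pr]
            pieces.append((pl, pr, ll))
            return
        # phi_int: intersection of the two upper bounds (lines AB and BC)
        phi_int = (lr[0] - ll[0]) / (ll[1] - lr[1])
        lm = solve(phi_int)
        if value(lm, phi_int) >= value(ll, phi_int) - tol:
            # Nothing beats the upper bounds at phi_int, so by concavity
            # z(phi) = min(ll, lr) on [pl, pr]: the bounds have converged
            pieces.append((pl, phi_int, ll))
            pieces.append((phi_int, pr, lr))
        else:
            # A better solution was found: split into two smaller regions
            refine(pl, ll, phi_int, lm)
            refine(phi_int, lm, pr, lr)

    refine(phi_min, solve(phi_min), phi_max, solve(phi_max))
    merged = []                       # join neighboring pieces that share a line
    for piece in pieces:
        if merged and merged[-1][2] == piece[2]:
            merged[-1] = (merged[-1][0], piece[1], piece[2])
        else:
            merged.append(piece)
    return merged


# Toy oracle: three feasible solutions, each contributing the line a + b*phi
lines = [(0.0, 1.0), (1.0, 0.5), (5.0, -1.0)]


def solve(phi):
    return min(lines, key=lambda line: line[0] + line[1] * phi)


for start, end, line in parametric_profile(solve, 0.0, 4.0):
    print(round(start, 3), round(end, 3), line)
```

For the toy data the profile has breakpoints at φ = 2 and φ = 8/3, where the optimal solution changes. Each split introduces a solution that is strictly better at the intersection point, so with finitely many candidate solutions the recursion terminates.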

Integer programming problems involving a single uncertain parameter on the right-hand side of the constraints can be stated as follows:

$$ z(\theta) = \min_y\; d^T y \quad \text{s.t. } Ey \le b + r\theta,\ \theta_{\min} \le \theta \le \theta_{\max},\ y \in \{0,1\}^l, \tag{2} $$


where r is a scalar constant and θ is a scalar uncertain parameter bounded between $\theta_{\min}$ and $\theta_{\max}$, respectively. For the special case of (2) when r > 0, it may be noted that as θ is increased from $\theta_{\min}$ to $\theta_{\max}$, the feasible region will enlarge, and hence the objective function value will decrease or remain the same, i.e., $z(\theta_i) \ge z(\theta_{i+1})$ for $\theta_i \le \theta_{i+1}$. Further, since only integer variables are present in (2), a solution will remain optimal for some interval of θ and then suddenly another solution will become optimal, and remain so for the next interval (see Fig. 3). The problem thus reduces to solving (2) at the end point $\theta_{\max}$ and then, moving toward $\theta_{\min}$, finding the point $\theta_i$ at which the current solution becomes infeasible; since the feasible region only shrinks as θ decreases, the current solution stays optimal until that point. Solving (2) at $\theta_i - \epsilon$ will give another integer solution. This procedure is continued until we hit the other end point, $\theta_{\min}$.

Multiparametric Mixed Integer Linear Programming, Figure 3
Step function nature of objective function value
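The step-function behavior can be traced with a small brute-force sketch. It assumes, for illustration, that r > 0 and that the integer program is small enough to be solved by enumerating all $y \in \{0,1\}^l$. It sweeps θ downward from $\theta_{\max}$. Since the feasible region only shrinks as θ decreases, the current optimal solution y stays optimal while it remains feasible, that is, for $\theta \ge \max_k (Ey - b)_k / r$. The function names and the step ε are hypothetical choices.

```python
import itertools

import numpy as np


def solve_ip(d, E, b, r, theta):
    """Brute-force solution of (2) at fixed theta: min d^T y s.t. E y <= b + r*theta, y binary."""
    best = None
    for bits in itertools.product((0, 1), repeat=len(d)):
        y = np.array(bits)
        if np.all(E @ y <= b + r * theta + 1e-12):
            value = float(d @ y)
            if best is None or value < best[0]:
                best = (value, y)
    return best                       # None if (2) is infeasible at this theta


def step_profile(d, E, b, r, theta_min, theta_max, eps=1e-6):
    """Step function z(theta) of (2) for r > 0, swept downward from theta_max.

    Returns [(theta_low, theta_high, value, y), ...] ordered from right to left.
    """
    pieces = []
    theta = theta_max
    while theta >= theta_min:
        sol = solve_ip(d, E, b, r, theta)
        if sol is None:
            break                     # infeasible here, hence for every smaller theta
        value, y = sol
        # y stays feasible (and therefore optimal) for theta >= max_k (E y - b)_k / r
        theta_low = max(float(np.max((E @ y - b) / r)), theta_min)
        pieces.append((theta_low, theta, value, tuple(int(v) for v in y)))
        theta = theta_low - eps       # step just past the point where y becomes infeasible
    return pieces


d = np.array([-3.0, -2.0])
E = np.array([[2.0, 1.0]])
b = np.array([1.0])
for piece in step_profile(d, E, b, r=1.0, theta_min=0.0, theta_max=3.0):
    print(piece)
```

For this toy instance z(θ) is -5 on [2, 3], -3 on [1, 2) and -2 on [0, 1): a step function that is nonincreasing in θ, as described above.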

Consider a multiparametric mixed integer linear programming problem (mp-MILP) of the following form:

$$ z(\theta) = \min_{x,y}\; c^T x + d^T y \quad \text{s.t. } Ax + Ey \le b + F\theta,\ G\theta \le g,\ x \in \mathbb{R}^n,\ y \in \{0,1\}^l,\ \theta \in \mathbb{R}^s, \tag{3} $$

where θ is a vector of uncertain parameters; F is an $(m \times s)$ matrix, G is an $(r \times s)$ matrix, and g is a constant vector. Solving (3) implies obtaining the optimal solution to (3) for every θ that lies in $\Theta = \{\theta : G\theta \le g,\ \theta \in \mathbb{R}^s\}$. The algorithm for the solution of (3) proposed in [1] is based upon simultaneously using the concepts of

• the branch and bound method for solving mixed integer linear programming (MILP) problems (see, e.g., [9]); and

• the simplex algorithm for solving multiparametric linear programming (mp-LP) problems [4].

While a solution of (3) by relaxing the integrality con- 
dition on y (at the root node) represents a paramet- 
ric lower bound, a solution where all the y variables 
are fixed (e. g., at a terminal node) represents a para- 
metric upper bound. The algorithm proceeds from the 
root node (lower bound) towards terminal nodes (up- 
per bound) by fixing y variables at the intermediate 
nodes. The complete enumeration of the tree is avoided 
by fathoming those intermediate nodes which guaran- 
tee a suboptimal solution. 

At the root node, by relaxing the integrality condition on y, i.e., considering y as a continuous variable bounded between 0 and 1, (3) is transformed to an mp-LP of the following form:

$$ z(\theta) = \min_{x,y}\; c^T x + d^T y \quad \text{s.t. } Ax + Ey \le b + F\theta,\ 0 \le y \le 1,\ x \in \mathbb{R}^n,\ \theta \in \mathbb{R}^s. \tag{4} $$

The solution of (4), given by linear parametric profiles, $z(\theta)^i$, valid in their corresponding critical regions, $CR^i$, represents a parametric lower bound.

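Similarly, at a node where all y are fixed, $y = \bar{y}$, (3) is transformed to an mp-LP of the following form:

$$ \bar{z}(\theta) = \min_{x}\; c^T x + d^T \bar{y} \quad \text{s.t. } Ax + E\bar{y} \le b + F\theta,\ \bar{y} \in \{0,1\}^l,\ x \in \mathbb{R}^n,\ \theta \in \mathbb{R}^s. \tag{5} $$

The solution of (5), $\bar{z}(\theta)^i$, valid in its corresponding critical regions, $\overline{CR}^i$, represents a parametric upper bound.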

Multiparametric Mixed Integer Linear Programming, Figure 4
Redundant constraints

Starting from the root node, some of the y variables are systematically fixed (to 0 and 1) to generate intermediate nodes of the branch and bound tree. At an intermediate node, where some y are fixed and some are relaxed, an mp-LP of the following form is formulated:

$$ \tilde{z}(\theta) = \min_{x, y_k}\; c^T x + d_j^T \bar{y}_j + d_k^T y_k \quad \text{s.t. } Ax + E_j \bar{y}_j + E_k y_k \le b + F\theta,\ G\theta \le g,\ \bar{y}_j \in \{0,1\},\ 0 \le y_k \le 1,\ x \in \mathbb{R}^n,\ \theta \in \mathbb{R}^s, \tag{6} $$


where the subscripts j and k correspond to y that are fixed and y that are free, respectively. The solution at an intermediate node, $\tilde{z}(\theta)^i$, valid in its corresponding critical regions, $\widetilde{CR}^i$, is then analyzed to decide whether or not to explore subnodes of this intermediate node, by using the following fathoming criteria. A given space in any node can be discarded if one of the following holds:

• (infeasibility criterion) Problem (6) is infeasible in the given space.

• (integrality criterion) An integer solution is found in the given space.

• (dominance criterion) The solution of the node is greater than the current upper bound in the same space.

If all the regions of a node are discarded, the node can be fathomed. While the first two fathoming criteria (infeasibility and integrality) are easy to apply, in order to apply the third one (dominance criterion) we need a comparison procedure, which is described next.

Multiparametric Mixed Integer Linear Programming, Figure 5
Definition of $CR^{int}$; Case 1

Multiparametric Mixed Integer Linear Programming, Figure 6
Definition of $CR^{int}$; Case 2

The comparison procedure consists of two steps. In the first step, a region, $CR^{int} = \widetilde{CR} \cap \overline{CR}$, where both the solution of the intermediate node and the current upper bound are valid, is defined. This is achieved by removing the redundant constraints from the set of constraints which define $\widetilde{CR}$ and $\overline{CR}$ (for a procedure to eliminate redundant constraints see [3]); a graphical interpretation of redundant constraints is given in Fig. 4, where $C_1$ is a strongly redundant constraint and $C_2$ is a weakly redundant constraint.

The results of this redundancy test, which belong to one of the following four cases, are then analyzed as follows:

• (case 1; Fig. 5) All constraints from $\widetilde{CR}$ are redundant. This implies that $\widetilde{CR} \supseteq \overline{CR}$, and therefore $CR^{int} = \overline{CR}$.

• (case 2; Fig. 6) All constraints from $\overline{CR}$ are redundant. This implies that $\overline{CR} \supseteq \widetilde{CR}$, and therefore $CR^{int} = \widetilde{CR}$.

• (case 3; Fig. 7) Constraints from both regions are nonredundant. This implies that the two spaces intersect each other, and $CR^{int}$ is given by the space delimited by the nonredundant constraints.

• (case 4; Fig. 8) The problem is infeasible. This implies that the two spaces are apart from each other and $CR^{int} = \emptyset$.

Multiparametric Mixed Integer Linear Programming, Figure 7
Definition of $CR^{int}$; Case 3

Multiparametric Mixed Integer Linear Programming, Figure 8
Definition of $CR^{int}$; Case 4

Once $CR^{int}$ has been defined, the second step is to compare $\tilde{z}$ to $\bar{z}$, so as to find which of the two is lower. This is achieved by defining a new constraint

$$ z^{dif}(\theta) = \tilde{z}(\theta) - \bar{z}(\theta) \le 0 $$

and checking for redundancy of this constraint in $CR^{int}$. This redundancy test results in the following three cases:

• (case 1; Fig. 9) The new constraint is redundant. This implies that $\tilde{z}(\theta) \le \bar{z}(\theta)$, and therefore the space must be kept for further analysis.

• (case 2; Fig. 10) The problem is infeasible. This implies that $\tilde{z}(\theta) > \bar{z}(\theta)$, and therefore the space can be discarded from further analysis.

• (case 3; Fig. 11) The new constraint is nonredundant. This implies that $\tilde{z}(\theta) \le \bar{z}(\theta)$ in ABCD (the part of $CR^{int}$ where the new constraint holds), and therefore the rest of the space can be discarded from further analysis.

Multiparametric Mixed Integer Linear Programming, Figure 9
Compare $\tilde{z}(\theta) : \bar{z}(\theta)$; Case 1

Multiparametric Mixed Integer Linear Programming, Figure 10
Compare $\tilde{z}(\theta) : \bar{z}(\theta)$; Case 2

Multiparametric Mixed Integer Linear Programming, Figure 11
Compare $\tilde{z}(\theta) : \bar{z}(\theta)$; Case 3
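The two-step comparison can be sketched for small parameter dimensions. Instead of the redundancy-elimination procedure of [3], this sketch makes two simplifying assumptions, and its helper names are hypothetical. First, $CR^{int}$ is formed simply by stacking the constraints of both regions; removing redundant rows would only simplify its description. Second, the regions are bounded polytopes and $z^{dif}$ is affine, so the sign of $z^{dif}$ is decided from its values at the vertices of $CR^{int}$, found by brute-force enumeration. The returned labels correspond to case 4 of the first step and cases 1-3 of the second step.

```python
import itertools

import numpy as np


def polytope_vertices(A, b, tol=1e-9):
    """Vertices of the bounded polytope {theta : A theta <= b}, by brute force:
    solve every s x s subsystem of constraints taken as equalities and keep
    the solutions that satisfy all constraints. Only sensible for small s."""
    s = A.shape[1]
    found = []
    for rows in itertools.combinations(range(A.shape[0]), s):
        M = A[list(rows)]
        if abs(np.linalg.det(M)) < tol:
            continue
        v = np.linalg.solve(M, b[list(rows)])
        if np.all(A @ v <= b + tol):
            found.append(v)
    return found


def compare_node(A_node, b_node, A_ub, b_ub, z_node, z_ub, tol=1e-9):
    """Compare an intermediate-node profile with the current upper bound.

    Regions are {theta : A theta <= b}; z_node and z_ub are affine profiles
    given as (constant, gradient). Returns "empty", "keep", "discard" or "split".
    """
    # Step 1: CR_int is the intersection, i.e. both constraint sets stacked
    A = np.vstack([A_node, A_ub])
    b = np.concatenate([b_node, b_ub])
    V = polytope_vertices(A, b, tol)
    if not V:
        return "empty"                       # case 4: the regions do not intersect
    # Step 2: z_dif = z_node - z_ub is affine, so its extremes over CR_int are at vertices
    c0 = z_node[0] - z_ub[0]
    grad = np.asarray(z_node[1], dtype=float) - np.asarray(z_ub[1], dtype=float)
    dif = [c0 + grad @ v for v in V]
    if max(dif) <= tol:
        return "keep"                        # z_dif <= 0 is redundant: keep the space
    if min(dif) > tol:
        return "discard"                     # z_dif <= 0 is infeasible: node is dominated
    return "split"                           # keep only the part where z_dif <= 0


box = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b_node = np.array([2.0, 0.0, 2.0, 0.0])      # CR_node: 0 <= theta1 <= 2, 0 <= theta2 <= 2
b_ub = np.array([3.0, -1.0, 2.0, 0.0])       # CR_ub:   1 <= theta1 <= 3, 0 <= theta2 <= 2
z_node = (0.0, [1.0, 0.0])                   # z_node(theta) = theta1
print(compare_node(box, b_node, box, b_ub, z_node, (1.5, [0.0, 0.0])))  # split
```

Here $z^{dif}(\theta) = \theta_1 - 1.5$ changes sign inside $CR^{int} = [1, 2] \times [0, 2]$, so the sketch reports case 3. For more than a few parameters, an LP-based redundancy test is preferable to vertex enumeration.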

Based upon the above theoretical framework, the steps of the algorithm can be summarized as follows:

1 Set an upper bound of $\bar{z}(\theta) = \infty$.
2 Solve the fully relaxed problem (4). IF an integer solution is found in a critical region, THEN update the upper bound and discard the region from further analysis.
3 Fix one of the y variables to 0 and 1 to create two new nodes. IF no new nodes can be generated, THEN stop.
4 Solve the resulting problem (6). IF the problem is infeasible, THEN go back to Step 3, ELSE compare the solution to the current upper bound.
5 IF all regions from a node have been analyzed, THEN go to Step 3.
Motivation


Proteins are arguably the most complex molecules in 
nature. This complexity arises from an intricate bal- 
ance of intra- and inter-molecular interactions that de- 
fine the native three-dimensional structure of the sys- 
tem, and subsequently its biological functionality. The 
underlying goal of protein folding research is to under- 
stand the formation of these native tertiary structures. 
Genetic engineering can be used to produce proteins 
with specific amino acid sequences. The next step in- 
volves developing the link between the primary protein 
sequence and the native structure. The ability to pre- 
dict the folding of proteins promises to have important 
practical and theoretical ramifications, especially in the 
areas of medicinal and biophysical chemistry. 


Experimental studies have shown that proteins, un- 
der native physiological conditions, spontaneously re- 
fold to their unique, native structure after denaturation. 
This implies that the formation of the native structure 
is controlled primarily by the amino acid sequence. Ac- 
cording to Anfinsen’s hypothesis the native structure is 
in a state of thermodynamic equilibrium correspond- 
ing to the conformation with the lowest free energy. 
Through mathematical modeling of protein interaction 
energies, the protein folding problem can be addressed 
as a conformational search for the global minimum en- 
ergy. 

There exist two fundamental problems associated with protein folding in the context of a conformational search. The first is the ability to correctly model protein interactions using detailed mathematical equations. The second is associated with searching the highly nonconvex energy hypersurface that describes a given protein. This complexity, coupled with an exponential growth in the number of local minima as the size of the protein increases, has become known as the multiple minima problem. There is an obvious need for the development of efficient global optimization techniques. An efficient method which has been successfully applied to detailed atomistic models of protein folding is the αBB [1,2,3,17] global optimization algorithm.


Mathematical Description 


Proteins are essentially polymer chains composed of a predefined set of amino acid residues in which neighboring residues are linked by peptide bonds. Naturally occurring proteins consist of only 20 different amino acid residues, and the form of the side chain R (e.g., methyl, butyl, benzoic, etc.) defines the differences between these constituent groups. The chemical structure of a generic protein is illustrated in Fig. 1. The repeating unit $-\mathrm{N}\mathrm{C}^{\alpha}\mathrm{C}'-$ defines the backbone of the protein. The protein also possesses amino and carboxyl end groups, denoted by $E_{amino}$ and $E_{carboxyl}$, respectively.

Multiple Minima Problem in Protein Folding: αBB Global Optimization Approach, Figure 1
Generic primary protein structure

The geometry of a protein can be fully described by assigning a three-dimensional coordinate vector $r_i = (x_i, y_i, z_i)^T$ to each atom i. These $r_i$ specify the position of each atom in the protein molecule. The bond vector between two atoms (i, j) connected with a covalent bond is defined as:


Xj — Xi 
rig = | Vi Vi 
Zi — Zi 


The corresponding bond length is then equal to the Eu- 
clidean distance between these two atoms: 


brisl = (&) — 2)? + (3 — yi)” + (Gj - 21)" 
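As an illustration, the bond vector and bond length formulas translate directly into code. The following is a minimal Python sketch; the function names `bond_vector` and `bond_length` and the sample coordinates are hypothetical and chosen only for this example.

```python
import math


def bond_vector(r_i, r_j):
    """Bond vector r_ij = r_j - r_i, with r_i and r_j given as (x, y, z) tuples."""
    return tuple(b - a for a, b in zip(r_i, r_j))


def bond_length(r_i, r_j):
    """Bond length |r_ij|: the Euclidean distance between atoms i and j."""
    return math.sqrt(sum(c * c for c in bond_vector(r_i, r_j)))


# Hypothetical coordinates of two covalently bonded atoms.
print(bond_vector((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # (1.0, 2.0, 2.0)
print(bond_length((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # 3.0
```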


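A covalent bond angle, $\theta_{ijk}$, formed by the two adjacent bond vectors $\mathbf{r}_{ij}$ and $\mathbf{r}_{jk}$, can be computed by the following formulas:

$$\cos\theta_{ijk} = \frac{\mathbf{r}_{ij}\cdot\mathbf{r}_{jk}}{|\mathbf{r}_{ij}|\,|\mathbf{r}_{jk}|}, \qquad \sin\theta_{ijk} = \frac{|\mathbf{r}_{ij}\times\mathbf{r}_{jk}|}{|\mathbf{r}_{ij}|\,|\mathbf{r}_{jk}|}$$

Here, $\mathbf{r}_{ij}\cdot\mathbf{r}_{jk}$ is the dot product of the bond vectors $\mathbf{r}_{ij}$ and $\mathbf{r}_{jk}$, and $\mathbf{r}_{ij}\times\mathbf{r}_{jk}$ is their cross product.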
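A minimal Python sketch of the bond angle formulas follows. It evaluates both the cosine and the sine expressions and combines them with `math.atan2`, which is better conditioned than `acos` alone near 0 and $\pi$. The helper names are hypothetical. Note that, as defined above, $\theta_{ijk}$ is the angle between the two bond vectors; the interior valence angle at atom $j$ is $\pi - \theta_{ijk}$.

```python
import math


def sub(a, b):
    """Componentwise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def bond_angle(r_i, r_j, r_k):
    """Angle theta_ijk between r_ij = r_j - r_i and r_jk = r_k - r_j (radians).

    The interior valence angle at atom j equals pi minus this value.
    """
    r_ij, r_jk = sub(r_j, r_i), sub(r_k, r_j)
    denom = norm(r_ij) * norm(r_jk)
    cos_t = dot(r_ij, r_jk) / denom          # cosine formula
    sin_t = norm(cross(r_ij, r_jk)) / denom  # sine formula (always >= 0)
    return math.atan2(sin_t, cos_t)


# Hypothetical atoms forming a right angle at atom j.
theta = bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(round(math.degrees(theta), 6))  # 90.0
```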

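The dihedral angle $\omega_{ijkl}$ measures the relative orientation of two adjacent covalent angles $\theta_{ijk}$ and $\theta_{jkl}$. This angle is defined as the angle between the normals to the planes defined by atoms $i, j, k$ and $j, k, l$, respectively, and can be calculated from the following relations:

$$\cos\omega_{ijkl} = \frac{(\mathbf{r}_{ij}\times\mathbf{r}_{jk})\cdot(\mathbf{r}_{jk}\times\mathbf{r}_{kl})}{|\mathbf{r}_{ij}\times\mathbf{r}_{jk}|\,|\mathbf{r}_{jk}\times\mathbf{r}_{kl}|}$$

$$\sin\omega_{ijkl} = \frac{\big[(\mathbf{r}_{ij}\times\mathbf{r}_{jk})\times(\mathbf{r}_{jk}\times\mathbf{r}_{kl})\big]\cdot\mathbf{r}_{jk}}{|\mathbf{r}_{ij}\times\mathbf{r}_{jk}|\,|\mathbf{r}_{jk}\times\mathbf{r}_{kl}|\,|\mathbf{r}_{jk}|}$$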
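The two dihedral relations can be sketched in Python as follows. The normals $\mathbf{r}_{ij}\times\mathbf{r}_{jk}$ and $\mathbf{r}_{jk}\times\mathbf{r}_{kl}$ give the cosine, and the sine relation supplies the sign, so `atan2` returns a signed angle in $(-\pi, \pi]$. The function names and test coordinates are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def dihedral(r_i, r_j, r_k, r_l):
    """Signed dihedral angle omega_ijkl (radians) from the cos/sin relations."""
    r_ij, r_jk, r_kl = sub(r_j, r_i), sub(r_k, r_j), sub(r_l, r_k)
    n1 = cross(r_ij, r_jk)  # normal to the plane of atoms i, j, k
    n2 = cross(r_jk, r_kl)  # normal to the plane of atoms j, k, l
    denom = norm(n1) * norm(n2)
    cos_w = dot(n1, n2) / denom
    sin_w = dot(cross(n1, n2), r_jk) / (denom * norm(r_jk))
    return math.atan2(sin_w, cos_w)


# Hypothetical geometries: a trans arrangement and a 90 degree twist.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0)))
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```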
Multiple Minima Problem in Protein Folding: αBB Global Optimization Approach, Figure 2
Illustration of dihedral angle


An alternative to specifying the coordinate vector 
for all atoms in a protein molecule is to set bond 
lengths, covalent bond angles and independent dihe- 
dral angles. A common approximation is to assume 
rigid bond lengths and bond angles so that the dihedral 
angles can be used to fully characterize the shape of the 
protein molecule. 

The names of the dihedral angles of a protein chain follow a standard nomenclature. The dihedral angle between the normals of the planes formed by atoms $\mathrm{C}'_{i-1}\mathrm{N}_i\mathrm{C}^{\alpha}_i$ and $\mathrm{N}_i\mathrm{C}^{\alpha}_i\mathrm{C}'_i$, respectively, is called $\phi_i$, where $i-1$ and $i$ are two adjacent amino acid residues. The angle defined by the planes $\mathrm{N}_i\mathrm{C}^{\alpha}_i\mathrm{C}'_i$ and $\mathrm{C}^{\alpha}_i\mathrm{C}'_i\mathrm{N}_{i+1}$, respectively, is called $\psi_i$, where $i$ and $i+1$ are two adjacent amino acid residues. Also, $\omega_i$ is the dihedral angle defined by the planes $\mathrm{C}^{\alpha}_i\mathrm{C}'_i\mathrm{N}_{i+1}$ and $\mathrm{C}'_i\mathrm{N}_{i+1}\mathrm{C}^{\alpha}_{i+1}$.

Multiple Minima Problem in Protein Folding: αBB Global Optimization Approach, Figure 3
Dihedral angle conventions
The letter $\chi$ is used to denote the dihedral angles which are associated with the side groups $R_i$. Finally, the letter $\theta$ is used to name the dihedral angles associated with the two end groups. These conventions are illustrated in Fig. 3.


Potential Energy Modeling 


A number of empirically based molecular mechanics models have been developed for protein systems, including AMBER [24], CHARMM [7], ECEPP/3 [19], GROMOS [11] and MM3 [4]. These models, also known as force fields, are typically expressed as summations of several potential energy components, with the mathematical form of individual energy terms based on the phenomenological nature of that term. A general total potential energy equation should include terms for bond stretching ($E_{\mathrm{bond}}$), angle bending ($E_{\mathrm{angle}}$), torsion ($E_{\mathrm{tor}}$) and nonbonded ($E_{\mathrm{nb}}$) interactions:

$$E_{\mathrm{potential}} = E_{\mathrm{bond}} + E_{\mathrm{angle}} + E_{\mathrm{tor}} + E_{\mathrm{nb}}$$


When rigid body approximations are employed, bond 
stretching and angle bending energies can be neglected. 
For these force fields, torsion angles define a set of inde- 
pendent variables that effectively describe any protein 
conformation. This approximately reduces the number 
of variables by a factor of 3 over those force fields that 
use a Cartesian coordinate system to describe flexible 
molecular geometries. 

One example of a rigid body atomistic level potential energy model is the ECEPP/3 force field. In this case, the nonbonded energy terms, $E_{\mathrm{nb}}$, include electrostatic, $E_{\mathrm{elec}}$, van der Waals, $E_{\mathrm{vdW}}$, and hydrogen bonding, $E_{\mathrm{hbond}}$, interactions. These energies are calculated for those atoms that are separated by at least three covalent bonds; that is, the atoms possess at least a 1-4 relationship. Electrostatic energies, $E_{\mathrm{elec}}$, are calculated as Coulombic interactions between atomic point charges:

$$E_{\mathrm{elec}} = \frac{Q_i Q_j}{\varepsilon R_{ij}}$$

Here, $Q_i$ and $Q_j$ represent the two point charges, while $R_{ij}$ equals the distance between these two points. The $\varepsilon$ term describes the dielectric nature of the protein environment.
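The pairwise electrostatic sum, including the rule that only pairs in a 1-4 or more distant relationship contribute, can be sketched in Python as shown below. This is a minimal illustration, not the ECEPP/3 implementation: bond separations are found by breadth-first search over a bond list, $\varepsilon$ is a single constant, and the names `bond_separation`, `electrostatic_energy` and the parameter `coulomb_const` are hypothetical. With `coulomb_const=1.0` the code matches the formula as written; with charges in units of $e$, distances in angstroms and energies in kcal/mol, a conversion constant of about 332 is conventionally used instead.

```python
import math
from collections import deque


def bond_separation(n_atoms, bonds, start):
    """Number of covalent bonds on the shortest path from `start` to each atom."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = [math.inf] * n_atoms
    dist[start] = 0
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if dist[b] == math.inf:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def electrostatic_energy(coords, charges, bonds, epsilon, coulomb_const=1.0):
    """Sum of Q_i Q_j / (epsilon R_ij) over atom pairs at least 1-4 apart."""
    n = len(coords)
    energy = 0.0
    for i in range(n):
        sep = bond_separation(n, bonds, i)
        for j in range(i + 1, n):
            if sep[j] >= 3:  # skip 1-2 and 1-3 pairs
                r_ij = math.dist(coords[i], coords[j])
                energy += coulomb_const * charges[i] * charges[j] / (epsilon * r_ij)
    return energy


# Hypothetical four-atom chain: only the 1-4 pair (atoms 0 and 3) contributes.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(electrostatic_energy(coords, [1.0, 0.0, 0.0, -1.0], [(0, 1), (1, 2), (2, 3)], 1.0))
```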

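General nonbonded van der Waals interactions, $E_{\mathrm{vdW}}$, are modeled using a 6-12 Lennard-Jones potential energy term, which consists of a repulsion and an attraction term:

$$E_{\mathrm{vdW}} = \varepsilon_{ij}\left[\left(\frac{R^0_{ij}}{R_{ij}}\right)^{12} - 2\left(\frac{R^0_{ij}}{R_{ij}}\right)^{6}\right]$$

The energy minimum for a given atomic pair is described by the potential depth, $\varepsilon_{ij}$, and position, $R^0_{ij}$. For those atomic pairs that may form a hydrogen bond, the 6-12 potential energy term is replaced by a modified 10-12 Lennard-Jones type term:

$$E_{\mathrm{hbond}} = \varepsilon_{ij}\left[5\left(\frac{R^0_{ij}}{R_{ij}}\right)^{12} - 6\left(\frac{R^0_{ij}}{R_{ij}}\right)^{10}\right]$$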
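Both Lennard-Jones type terms can be written as short Python functions. In the depth/position form used above, each has its minimum value $-\varepsilon_{ij}$ exactly at $R_{ij} = R^0_{ij}$, which the sketch below checks numerically. The function names and the pair parameters are hypothetical.

```python
def lennard_jones_6_12(r, eps, r0):
    """6-12 term eps * [(r0/r)^12 - 2 (r0/r)^6]; minimum value -eps at r = r0."""
    x = r0 / r
    return eps * (x ** 12 - 2.0 * x ** 6)


def hbond_10_12(r, eps, r0):
    """10-12 term eps * [5 (r0/r)^12 - 6 (r0/r)^10]; minimum value -eps at r = r0."""
    x = r0 / r
    return eps * (5.0 * x ** 12 - 6.0 * x ** 10)


# Hypothetical pair parameters: depth 0.2 and minimum position 3.5 (arbitrary units).
for r in (3.0, 3.5, 4.0):
    print(r, lennard_jones_6_12(r, 0.2, 3.5), hbond_10_12(r, 0.2, 3.5))
```

The hydrogen bond form is narrower around $R^0_{ij}$ because its attractive term decays as $R^{-10}$ rather than $R^{-6}$.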


Finally, corrective torsional energies, $E_{\mathrm{tor}}$, which are represented by a three-term Fourier series expansion, are also added:

$$E_{\mathrm{tor}} = \frac{E_1}{2}(1 - \cos\phi) + \frac{E_2}{2}(1 - \cos 2\phi) + \frac{E_3}{2}(1 - \cos 3\phi).$$
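The three-term torsional correction is straightforward to evaluate; a minimal Python sketch follows. The function name and the barrier values $E_1, E_2, E_3$ used in the example are hypothetical, and $\phi$ is taken in radians.

```python
import math


def torsion_energy(phi, e1, e2, e3):
    """E_tor = E1/2 (1 - cos phi) + E2/2 (1 - cos 2phi) + E3/2 (1 - cos 3phi)."""
    return (0.5 * e1 * (1.0 - math.cos(phi))
            + 0.5 * e2 * (1.0 - math.cos(2.0 * phi))
            + 0.5 * e3 * (1.0 - math.cos(3.0 * phi)))


# Hypothetical barrier heights; each term vanishes at phi = 0.
for deg in (0, 60, 120, 180):
    print(deg, torsion_energy(math.radians(deg), 1.0, 2.0, 3.0))
```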


Each term can be interpreted physically. The onefold ($\cos\phi$) symmetry term accounts for those nonbonded interactions not included in the general nonbonded terms. The twofold ($\cos 2\phi$) symmetry term is related to the interactions of orbitals, while the threefold ($\cos 3\phi$) symmetry term describes steric contributions.

Other specific potential energy terms may also be 
added to the general energy equation depending on the 
exact protein sequence. For example, the formation of 
disulfide bridges can be enforced by adding a penalty 
term to constrain the values of particular atomic dis- 
tances. Correction terms have also been used to ad- 
just conformational energies according to the configu- 
rations of proline and hydroxyproline residues. 


Solvation Energy Modeling 


In general, the energetic description of a protein must also include solvation effects. A theoretically simple approach would be to explicitly surround the peptide with solvent molecules and compute potential energy contributions for intra- and intermolecular interactions. These explicit calculations tend to greatly increase the computational cost of the simulation. In addition, solvent configurations are not rigid, so these calculations must consider an average solvent-peptide configuration, which is typically generated by a number of Monte-Carlo (MC) or molecular dynamics (MD) simulations [14]. Therefore, most simulations of this type are limited to restricted conformational searches.

An alternative way to account for average solvent effects is to use implicit solvation models. One complication involves the solvent's influence on electrostatic interaction energies because of the implicit relationship between dielectric effects and solvation. A simple solution has been to modify the representation of the dielectric term. A rigorous treatment of electrostatic interactions, however, involves the solution of the Poisson-Boltzmann equation.

Other simple and computationally feasible implicit 
solvation models are based on empirical representa- 
tions of the solvation energy. In these cases, the sol- 
vation energy of each functional group is related to 
the interaction of the solvent with a hydration shell 
for the particular group. The individual terms are then 
summed together to provide a total solvation energy for 
the system. These solvation contributions can be described by the following general equation:

$$E_{\mathrm{solv}} = \sum_{i=1}^{N} S_i\,\sigma_i.$$

Typically, $S_i$ represents either the solvent-accessible surface area, $A_i$, or the solvent-accessible volume of the hydration layer, $VHS_i$, for functional group $i$, and $\sigma_i$ is an empirically derived free energy density parameter.
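The empirical solvation sum is a simple weighted sum over functional groups. The sketch below is a minimal Python version; the function name and the sample values of $S_i$ and $\sigma_i$ are hypothetical, and it assumes both lists are given in mutually consistent units (for example, areas and energy per unit area).

```python
def solvation_energy(S, sigma):
    """E_solv = sum_i S_i * sigma_i over the N functional groups."""
    if len(S) != len(sigma):
        raise ValueError("S and sigma must have one entry per functional group")
    return sum(s_i * sigma_i for s_i, sigma_i in zip(S, sigma))


# Hypothetical accessible areas and free energy densities for two groups.
print(solvation_energy([10.0, 20.0], [0.5, -0.25]))  # 0.0
```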

A number of algorithms have been developed for calculating solvent-accessible surface areas [8,9,22]. Although several of these are relatively efficient, the appearance of discontinuities has been one complication in considering solvent-accessible surface areas. In addition, a large number of parameterization strategies (JRF, OONS, WE, etc.) have been used to derive appropriate $\sigma_i$ parameters [21,23,25]. In the case of the JRF parameter set, discontinuities can be avoided because the solvent-accessible solvation energies are only included at local minimum conformations [23]. This is because the parameters were derived from low energy solvated configurations of actual tetrapeptides.

Several methods have also been developed for calculating the hydration volumes and corresponding free energy parameters [6,12]. A recent and computationally inexpensive method, RRIGS, is based on a Gaussian approximation for the volume of a hydration layer [6]. This method also inherently avoids numerical problems associated with possible discontinuities, so that the solvation energy contributions can easily be added at every step of local minimizations.


Problem Formulation 


For protein folding, the energy minimization problem can be formulated as a nonconvex, nonlinear global optimization problem in which the energy, $E$, must be globally minimized with respect to the dihedral angles of the protein:

$$\begin{aligned}
\min\ & E\big(\phi_i, \psi_i, \omega_i, \chi_i^k, \theta^N_j, \theta^C_j\big)\\
\text{subject to}\ & -\pi \le \phi_i \le \pi, \quad -\pi \le \psi_i \le \pi, \quad -\pi \le \omega_i \le \pi,\\
& -\pi \le \chi_i^k \le \pi, \quad -\pi \le \theta^N_j \le \pi, \quad -\pi \le \theta^C_j \le \pi.
\end{aligned}$$

The index $i = 1, \dots, N_{\mathrm{res}}$ runs over the residues, where $N_{\mathrm{res}}$ is the number of residues in the protein. In addition, $k = 1, \dots, K^i$ indexes the dihedral angles in the side chain of the $i$th residue, where $K^i$ is their number, and $j = 1, \dots, J^N$ and $j = 1, \dots, J^C$ are the indices of the dihedral angles of the amino and carboxyl end groups, respectively. The energy, $E$, represents the total potential energy function, $E_{\mathrm{potential}}$, plus the free energy of solvation, $E_{\mathrm{solv}}$. In most cases, this is the exact formulation; that is, energetic and gradient contributions can be added at each step of the minimization. However, in the case of solvent-accessible surface area hydration using the JRF parameters, the potential energy function is minimized before adding the hydration energy contributions. In other words, gradient contributions from solvation are not considered.

Even after reducing this optimization problem to 
a function of internal variables, the multidimensional 
surface that describes the energy function possesses an 
astronomically large number of local minima. In addi- 
tion, evaluation of the energy, especially with the addi- 
tion of solvation, is computationally expensive, which 
makes even local minimization slow. A large number of techniques have been developed to search this nonconvex conformational space. Many methods employ stochastic search procedures, while others rely on simplifications of the potential model and/or mathematical transformations. In addition, the use of statistical and/or heuristic conformational information is often required. In general, the major limitation is that there is no guarantee of convergence to the global minimum energy structure. A number of recent reviews have focused on global optimization issues for these systems [10,20].

The αBB global optimization approach has been
extremely effective in identifying global minimum en- 
ergy conformations of peptides described by detailed 
atomistic models. The development of this determin- 
istic branch and bound method was motivated by the 
need for an algorithm that could guarantee conver- 
gence to the global minimum of nonlinear optimiza- 
tion problems with twice-differentiable functions. The 
application of this algorithm to the minimization of po- 
tential energy functions was first introduced for micro- 
clusters [16]. The algorithm has also been shown to be 
successful for isolated [5,15], as well as solvated peptide 
systems [13]. 


Global Minimization Using αBB


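The αBB global optimization algorithm effectively brackets the global minimum solution by developing converging sequences of lower and upper bounds. These bounds are refined by iteratively partitioning the initial domain. Upper bounds on the global minimum are obtained by local minimizations of the original energy function, $E$. Lower bounds belong to the set of solutions of the convex lower bounding functions, which are constructed by augmenting $E$ with the addition of separable quadratic terms. By using $\phi_i^L, \psi_i^L, \omega_i^L, \chi_i^{k,L}, \theta_j^{N,L}, \theta_j^{C,L}$ and $\phi_i^U, \psi_i^U, \omega_i^U, \chi_i^{k,U}, \theta_j^{N,U}, \theta_j^{C,U}$ to refer to lower and upper bounds on the corresponding dihedral angles, the lower bounding function, $L$, of the energy hypersurface can be expressed in the following manner:

$$\begin{aligned}
L = E &+ \sum_{i=1}^{N_{\mathrm{res}}} \alpha_{\phi,i}\,(\phi_i^L - \phi_i)(\phi_i^U - \phi_i)
+ \sum_{i=1}^{N_{\mathrm{res}}} \alpha_{\psi,i}\,(\psi_i^L - \psi_i)(\psi_i^U - \psi_i)\\
&+ \sum_{i=1}^{N_{\mathrm{res}}} \alpha_{\omega,i}\,(\omega_i^L - \omega_i)(\omega_i^U - \omega_i)
+ \sum_{i=1}^{N_{\mathrm{res}}}\sum_{k=1}^{K^i} \alpha_{\chi,i,k}\,(\chi_i^{k,L} - \chi_i^k)(\chi_i^{k,U} - \chi_i^k)\\
&+ \sum_{j=1}^{J^N} \alpha_{\theta^N,j}\,(\theta_j^{N,L} - \theta_j^N)(\theta_j^{N,U} - \theta_j^N)
+ \sum_{j=1}^{J^C} \alpha_{\theta^C,j}\,(\theta_j^{C,L} - \theta_j^C)(\theta_j^{C,U} - \theta_j^C)
\end{aligned}$$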

The $\alpha$ parameters are nonnegative and must be greater than or equal to negative one-half of the minimum eigenvalue of the Hessian of $E$ over the defined domain. The overall effect of these terms is to overpower the nonconvexities of the original terms by adding the value of $2\alpha$ to the eigenvalues of the Hessian of $E$. The convex lower bounding functions, $L$, possess a number of important properties which guarantee global convergence [18]:

i) L is a valid underestimator of E;

ii) L matches E at all corner points of the current box 
constraints; 

iii) L is convex in the current box constraints; 

iv) the maximum separation between L and E is 
bounded. This property ensures that feasibility and 
convergence tolerances can be reached for a finite 
size partition element; 

v) the underestimators L constructed over supersets 
of the current set are always less tight than the un- 
derestimator constructed over the current box con- 
straints for every point within the current box con- 
straints. 
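To make properties i) to iii) concrete, the sketch below builds $L$ for a hypothetical two-variable stand-in for $E$, namely $E(x) = \cos x_1 + \cos x_2$. Its Hessian is $\mathrm{diag}(-\cos x_1, -\cos x_2)$, whose eigenvalues are never below $-1$, so $\alpha = 1/2$ for both variables satisfies the condition on $\alpha$ over any box. The code then checks underestimation on a sample grid, agreement at the box corners, and nonnegativity of the Hessian of $L$. The test function, box and helper names are illustrative assumptions, not part of the protein model.

```python
import itertools
import math


def E(x):
    """Hypothetical nonconvex stand-in for the energy: E(x) = cos(x1) + cos(x2)."""
    return math.cos(x[0]) + math.cos(x[1])


# Hessian of E is diag(-cos x1, -cos x2) with eigenvalues >= -1 everywhere,
# so alpha >= -1/2 * (-1) = 1/2 is valid on any box.
ALPHA = (0.5, 0.5)


def alpha_bb_underestimator(f, alpha, lower, upper):
    """Return L(x) = f(x) + sum_i alpha_i (x_i^L - x_i)(x_i^U - x_i)."""
    def L(x):
        return f(x) + sum(a * (lo - xi) * (hi - xi)
                          for a, lo, hi, xi in zip(alpha, lower, upper, x))
    return L


lower, upper = (-1.0, 0.5), (2.0, 3.0)
L = alpha_bb_underestimator(E, ALPHA, lower, upper)

# i) L <= E inside the box, since each added quadratic term is <= 0 there.
grid = [[lo + (hi - lo) * t / 10 for t in range(11)] for lo, hi in zip(lower, upper)]
assert all(L(p) <= E(p) + 1e-12 for p in itertools.product(*grid))
# ii) L equals E at every corner of the box.
assert all(abs(L(c) - E(c)) < 1e-12 for c in itertools.product(*zip(lower, upper)))
# iii) The Hessian of L is diag(-cos x_i + 2 alpha_i), which is >= 0 here.
assert all(-math.cos(v) + 2 * a >= 0 for a, g in zip(ALPHA, grid) for v in g)
print("properties i)-iii) hold on the sample grid")
```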

Once solutions for the upper and lower bounding problems have been established, the next step is to modify the problem for the next iteration. This is accomplished by successively partitioning the initial domain into smaller subdomains. One obvious strategy is to subdivide the original hyper-rectangle by bisecting the longest dimension. In order to ensure nondecreasing lower bounds, the hyper-rectangle to be bisected is chosen by selecting the region which contains the infimum of the minima of lower bounds. A nonincreasing sequence for the upper bound is found by solving the nonconvex problem locally and selecting it to be the minimum over all the previously recorded upper bounds. If the minimum of L for any hyper-rectangle is greater than the current upper bound, this hyper-rectangle can be discarded because the global minimum cannot be within this subdomain (fathoming step).

The computational requirement of the αBB algorithm depends on the number of variables (global variables) on which branching occurs. Therefore, these global variables need to be chosen carefully. Qualitatively, the branching variables should correspond to those variables which substantially influence the nonconvexity of the surface and the location of the global minimum. In terms of the protein folding problem, it is generally accepted that the backbone dihedral angles ($\phi$ and $\psi$) are the most influential variables. Therefore, in larger problems, the global variable set should include only the set of $\phi$ and $\psi$ variables. In this case, the dihedral angles associated with the peptide bond ($\omega$) and the side chains ($\chi$) are treated as local variables.


Algorithmic Description 


The basic steps of the algorithm are as follows:

1) The initial best upper bound is set to an arbitrarily large value. The original domain is partitioned along one of the global variable dimensions.

2) A convex function L is constructed in each hyper-rectangle and minimized using a local nonlinear solver, with function calls to potential and solvation models. If a solution is greater than the best upper bound, the entire subregion can be fathomed; otherwise the solution is stored.

3) The local minima for L are used as initial starting points for local minimizations of the upper bounding function E in each hyper-rectangle. In solving the upper bounding problems, all variable bounds are expanded to the $[-\pi, \pi]$ domain. These solutions are upper bounds on the global minimum solution in each hyper-rectangle.

4) The current best upper bound is updated to be the minimum of those thus far stored. If a new upper bound (from step 3) is selected, a separate module is called to ensure that the absolute value of each component of the objective function gradient vector is below a specified tolerance (in kcal/mol/deg). The second derivative matrix is also evaluated to verify that the upper bound solution is a local minimum.
5) The hyper-rectangle with the current minimum 
value for L is selected and partitioned along one of 
the global variables. 

6) If the best upper and lower bounds are within an $\varepsilon$ tolerance, the program terminates; otherwise it returns to Step 2.
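The loop of steps 1 to 6 can be illustrated on a one-dimensional problem. The Python sketch below is a simplified stand-in, not the protein implementation: it minimizes the hypothetical test function $f(x) = \sin x + \sin(10x/3)$ on $[2.7, 7.5]$, uses one global $\alpha$ derived from the bound $f'' \ge -1 - 100/9$, minimizes each convex $L$ by ternary search, and (as a simplification of step 3) takes the upper bound as $f$ evaluated at the minimizer of $L$ instead of running a separate local minimization. Boxes are kept in a heap keyed by their lower bound, which implements the selection rule of step 5. All names are hypothetical.

```python
import heapq
import math


def f(x):
    """Hypothetical 1-D nonconvex test function standing in for E."""
    return math.sin(x) + math.sin(10.0 * x / 3.0)


# f''(x) = -sin(x) - (100/9) sin(10x/3) >= -1 - 100/9 on any interval.
ALPHA = max(0.0, -0.5 * (-1.0 - 100.0 / 9.0))


def lower_bounding(x, lo, hi):
    """Convex alphaBB underestimator L of f on the box [lo, hi]."""
    return f(x) + ALPHA * (lo - x) * (hi - x)


def minimize_convex(g, lo, hi, tol=1e-9):
    """Ternary search; valid because g is convex (hence unimodal) on [lo, hi]."""
    while hi - lo > tol:
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, g(x)


def alpha_bb(lo, hi, eps=1e-4, max_iter=10_000):
    best_x, best_f = None, math.inf  # step 1: arbitrarily large upper bound
    boxes = []                       # heap of (lower bound, box lo, box hi)

    def bound(a, b):
        nonlocal best_x, best_f
        x, lb = minimize_convex(lambda t: lower_bounding(t, a, b), a, b)  # step 2
        fx = f(x)                    # step 3 (simplified): upper bound at x
        if fx < best_f:              # step 4: keep the best upper bound
            best_x, best_f = x, fx
        if lb <= best_f:             # fathoming: drop boxes with lb > best_f
            heapq.heappush(boxes, (lb, a, b))

    bound(lo, hi)
    lb = -math.inf
    for _ in range(max_iter):
        if not boxes:
            break
        lb, a, b = heapq.heappop(boxes)  # step 5: box with smallest L minimum
        if best_f - lb <= eps:           # step 6: bounds within eps, stop
            break
        mid = 0.5 * (a + b)              # bisect the box
        bound(a, mid)
        bound(mid, b)
    return best_x, best_f, lb


x_best, f_best, lower = alpha_bb(2.7, 7.5)
print(f"x = {x_best:.4f}, f = {f_best:.5f}, lower bound = {lower:.5f}")
```

For this benchmark the known global minimum on $[2.7, 7.5]$ is $f \approx -1.8996$ at $x \approx 5.1457$, so the returned values should land close to it, with the final lower bound within `eps` of the upper bound.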

A novel approach has also been proposed for the initialization of the αBB algorithm [5]. Specifically, an analysis of 98 proteins from the Brookhaven X-ray data bank was used to develop dihedral angle distributions in the form of histograms from $-\pi$ to $\pi$ for each dihedral angle of each of the naturally occurring amino acids. Using this information, a set of reduced domains can be defined for every dihedral angle of every residue in the peptide sequence. Overall initialization domains correspond to the Cartesian products of all the subdomains of individual residues in the protein. This approach maintains the guarantee of global optimality over the considered search space of the reduced domains, and is deterministic in those subdomains that possess convex underestimators. In addition, all variable bounds are expanded to $[-\pi, \pi]$ when solving the upper bounding problem. Therefore, although the initial point of an upper bounding minimization is restricted to the search space of the corresponding lower bounding problem, the solution may lie outside the original subdomain.


Example 1  Met-enkephalin (H-Tyr-Gly-Gly-Phe-Met-OH) is an endogenous opioid pentapeptide found in the human brain, pituitary, and peripheral tissues. Its biological function involves a large variety of physiological processes, most notably the endogenous response to pain. The peptide consists of 24 dihedral angles and a total of 75 atoms, and has played the role of a benchmark molecular conformation problem. The energy hypersurface is extremely complex, with the number of local minima estimated on the order of $10^{11}$. The unsolvated global minimum energy conformation, which is efficiently located using the αBB algorithm, has been shown to exhibit a type II′ β-bend along the N-C′ peptidic bond of $\mathrm{Gly}^3$ and $\mathrm{Phe}^4$ [5], as shown in Fig. 4. The algorithm has also successfully predicted global minimum energy structures of met-enkephalin using both solvent-accessible surface area (JRF) and volume of hydration (RRIGS) models [13]. In both cases, extended structures were identified, which qualitatively agree with experimental results. However, differences in the role of nonbonded energies and the side chain conformations have been identified. The global minimum energy conformations of the surface area and volume of hydration models are shown in Fig. 5 and Fig. 6, respectively.

Multiple Minima Problem in Protein Folding: αBB Global Optimization Approach, Figure 4
Global minimum energy structure of unsolvated met-enkephalin

Multiple Minima Problem in Protein Folding: αBB Global Optimization Approach, Figure 5
Global minimum energy structure of met-enkephalin using area based hydration

Multiple Minima Problem in Protein Folding: αBB Global Optimization Approach, Figure 6
Global minimum energy structure of met-enkephalin using volume based hydration


See also


> Adaptive Simulated Annealing and its Application
to Protein Folding


> Genetic Algorithms 

> Global Optimization in Lennard-Jones and Morse 
Clusters 

> Global Optimization in Protein Folding 

> Molecular Structure Determination: Convex Global 
Underestimation 

> Monte-Carlo Simulated Annealing in Protein 
Folding 

> Packet Annealing 

> Phase Problem in X-ray Crystallography: Shake and 
Bake Approach 

> Protein Folding: Generalized-ensemble Algorithms 

> Simulated Annealing 

> Simulated Annealing Methods in Protein Folding 
Dynamic programming has been an area of active research since its introduction by R. Bellman [1]. More recently, with the recognition that many applied optimization problems require more than one objective, the study of multicriteria optimization has become a growing area of research. Included in this area of multicriteria optimization is the study of multiple objective dynamic programming (MODP). MODP was first used in place of multiple objective linear programming (MOLP) where the latter was not applicable, such as in problems with discrete variables. Many of the techniques used are extensions of classical dynamic programming. The following is a discussion of some of the research that has been developed in the area of MODP.

Using multiple objective dynamic programming to 
find the ‘shortest’ path through a network with con- 
stant costs is one of the more straightforward uses of 
MODP. Work has been done on both forward and 
backward MODP in this area. First, we consider a general network containing a set of nodes $N = \{1, \dots, n\}$ and a set of arcs $A = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \dots\} \subseteq N \times N$, which indicates connections between nodes. Each arc $(i, j)$ has an associated cost vector, $c_{ij} = (c_{ij1}, \dots, c_{ijm}) \in \mathbb{R}^m$. A path from node $i_0$ to $i_p$ is the sequence of arcs $P = \{(i_0, i_1), \dots, (i_{p-1}, i_p)\}$, where the first node of each arc is the same as the terminal node of the preceding arc and each node in the path is unique. Let $\Pi_i$ be the set of all paths from node 1 to node $i$. The cost to traverse a path $p$ in $\Pi_i$ is $[c(p)]_r = \sum_{a \in p} [c_a]_r$, $r = 1, \dots, m$. A path $p$ in $\Pi_i$ is nondominated if there is no other path $p^*$ in $\Pi_i$ with $[c(p^*)]_r \le [c(p)]_r$ for $r = 1, \dots, m$ and $[c(p^*)]_r < [c(p)]_r$ for some $r \in \{1, \dots, m\}$.
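The dominance test in this definition, and the filtering of a set of cost vectors down to its nondominated members, can be sketched in Python as follows. The names `dominates` and `eff` are hypothetical, and this simple pairwise filter is only one possible way of computing the nondominated set; the algorithms below each use their own operator for this step.

```python
def dominates(a, b):
    """True if a dominates b: a_r <= b_r for all r and a_r < b_r for some r."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eff(vectors):
    """Nondominated vectors of a set (duplicates removed, first-seen order kept)."""
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


# Hypothetical path costs with m = 2 criteria.
costs = [(1, 5), (2, 2), (3, 3), (5, 1), (2, 2)]
print(eff(costs))  # [(1, 5), (2, 2), (5, 1)]
```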
0 | Set $k = 1$.
1 | Evaluate $S_i^k$ for all nodes $i$ using $S_i^k = \{c_{ij} + S_j^{k-1}\}$.
2 | If $k < N$, set $k = k + 1$ and return to step 1; otherwise go to step 3.
3 | For each nondominated solution at each node determined in step 1 and for each $r$, $r = 1, \dots, m$, compute $\sum_{n=1}^{N} c_{r, i_n, i_{n-1}}$, where $i_n$ is the originating node at stage $n$ and $I_n$ is the set of nodes that can be reached from node $n$.
4 | Given weights $W \in \mathbb{R}^m_+$, compute the MINSUM as
$$\min \sum_{r=1}^{m} w_r \sum_{n=1}^{N} c_{r, i_n, i_{n-1}}.$$


H.G. Daellenbach and C.A. DeKluyver [5] gave one of the earliest algorithms for backward MODP with constant costs, which finds nondominated paths from all nodes to the destination node. Their method is basically an extension of the principle of optimality to a multicriteria context. They state a principle of Pareto optimality of MODP: 'A nondominated policy has the property that regardless of how the process entered a given state, the remaining decisions must belong to a nondominated subpolicy.' Let $S_i^k$ be the nondominated vector of objective values for a node $i$ exactly $k$ links from its destination, $t$. The algorithm is then as given above.

The resulting $S_i^k$ vectors give nondominated solutions for the network, but not necessarily all of them. They solve an example in which the weights are not specified.

A few years later, R. Hartley [6] proposed a similar algorithm that also uses backward MODP to find all Pareto paths from all nodes in the network to a specified node $t$. The algorithm is as follows. Let $V_0(i) = \{(\infty, \dots, \infty)\}$ for $i \ne t$, and let $V_k(t) = \{(0, \dots, 0)\}$ for $k = 0, 1, \dots$. Then

$$V_k(i) = \mathrm{eff}\Big[\bigcup\, \{c_{ij} + V_{k-1}(j) : j \in \Gamma(i)\}\Big]$$

for $i \in N \setminus \{t\}$ and $k = 1, 2, \dots$, where $\Gamma(i)$ is the set of nodes $j$ such that $(i, j) \in A$. The 'eff' operator finds all nondominated vectors in the set. The associated paths must be handled separately.
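A minimal Python sketch of Hartley's backward recursion follows, storing cost vectors only (as the text notes, paths must be tracked separately). Assumptions, all hypothetical: arcs are given as a dictionary `(i, j) -> cost tuple`, an empty list stands in for $\{(\infty, \dots, \infty)\}$, `eff` is a simple pairwise filter, and the recursion is run $n - 1$ times since a simple path has at most $n - 1$ arcs.

```python
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eff(vectors):
    """Keep the nondominated vectors (the 'eff' operator)."""
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def hartley_backward(nodes, arcs, t, m):
    """V[i]: nondominated cost vectors of paths from node i to destination t."""
    succ = {i: [j for (a, j) in arcs if a == i] for i in nodes}  # Gamma(i)
    V = {i: [] for i in nodes}        # empty list stands for {(inf, ..., inf)}
    V[t] = [(0,) * m]                 # V_k(t) = {(0, ..., 0)}
    for _ in range(len(nodes) - 1):   # k = 1, ..., n - 1
        new_V = {t: [(0,) * m]}
        for i in nodes:
            if i != t:
                new_V[i] = eff([add(arcs[(i, j)], v) for j in succ[i] for v in V[j]])
        V = new_V                     # V_k is built only from V_{k-1}
    return V


# Hypothetical network with two cost criteria and destination t = 4.
arcs = {(1, 2): (1, 4), (1, 3): (3, 1), (2, 3): (1, 1), (2, 4): (1, 1), (3, 4): (1, 1)}
V = hartley_backward([1, 2, 3, 4], arcs, 4, 2)
print(V[1])  # [(2, 5), (4, 2)]
```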

H.W. Corley and I.D. Moon [4] used forward MODP to find all nondominated paths from a specified node to all other nodes in a network with multiple constant costs. They assumed that the network contains no loops and that $c_{ij} \ne (0, \dots, 0)$ for any $(i, j) \in A$. Letting $G_i^{(k)}$ be the set of vector costs of all Pareto paths from node 1 to node $i$ containing $k$ or fewer arcs, the algorithm is as follows:

1 | Set $c_{ii} = (0, \dots, 0)$, $i = 1, \dots, n$, and $c_{ij} = (\infty, \dots, \infty)$, $i \ne j$, if no arc exists from $i$ to $j$. Set $k = 1$ and let $G_i^{(1)} = \{c_{1i}\}$, $i = 1, \dots, n$.
2 | For $i = 1, \dots, n$, set $G_i^{(k+1)} = \mathrm{Vmin} \bigcup_{j=1}^{n} \{c_{ji} + g_j : g_j \in G_j^{(k)}\}$.
3 | If $G_i^{(k+1)} = G_i^{(k)}$, $i = 1, \dots, n$, stop; otherwise go to step 4.
4 | If $k = n - 1$, stop. Else, set $k = k + 1$ and go to step 2.

Vmin is an operation that computes the vector costs 
of all nondominated paths in a set of vector costs. An 
algorithm for Vmin is given in their paper. 
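The forward recursion of steps 1 to 4 can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: the pairwise filter `eff` stands in for their Vmin procedure, an empty list plays the role of $c_{1i} = (\infty, \dots, \infty)$, the $j = i$ term with $c_{ii} = 0$ is represented by carrying $G_i^{(k)}$ forward, and the network is hypothetical.

```python
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eff(vectors):
    """Simple nondominated filter standing in for Vmin."""
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def corley_moon(n, arcs, m):
    """G[i]: nondominated cost vectors of paths from node 1 to node i (nodes 1..n)."""
    zero = (0,) * m
    G = {i: [zero] if i == 1 else ([arcs[(1, i)]] if (1, i) in arcs else [])
         for i in range(1, n + 1)}                     # step 1: G_i^(1)
    for _ in range(1, n - 1):                          # k = 1, ..., n - 2
        new_G = {}
        for i in range(1, n + 1):                      # step 2
            candidates = list(G[i])                    # j = i term, c_ii = 0
            for (j, head), c in arcs.items():
                if head == i:
                    candidates.extend(add(g, c) for g in G[j])
            new_G[i] = eff(candidates)
        if all(set(new_G[i]) == set(G[i]) for i in G):  # step 3: no change
            break
        G = new_G                                      # step 4: k = k + 1
    return G


# Hypothetical loop-free network with two cost criteria.
arcs = {(1, 2): (1, 4), (1, 3): (3, 1), (2, 3): (1, 1), (2, 4): (1, 1), (3, 4): (1, 1)}
print(corley_moon(4, arcs, 2)[4])  # [(2, 5), (4, 2)]
```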
Multiple Objective Dynamic Programming, Table 1

Multiple Objective Dynamic Programming, Figure 1


The following example uses the Corley-Moon algorithm to solve a dynamic routing problem for the network in Fig. 1. Table 1 gives the results of the algorithm. The resulting Pareto optimal paths from node 1 to node 6 are $\{(1, 2), (2, 5), (5, 6)\}$ and $\{(1, 3), (3, 5), (5, 6)\}$.

Using multiple objective dynamic programming to find the shortest path through a network with time-dependent costs is considerably more complicated than MODP with constant costs. The monotonicity assumptions necessary for the principle of optimality in dynamic programming can easily be violated when dealing with time-dependent costs: reaching a node later may be less costly than reaching it earlier. M.M. Kostreva and M.M. Wiecek [7] extended the work done by K.L. Cooke and E. Halsey [3] on dynamic programming with one time-dependent cost (travel time) to dynamic programming with multiple time-dependent costs. This method uses backward dynamic programming on a discrete time grid to find all nondominated paths from every node in the network to the destination node.

Assume the discrete time grid $S_T = \{t_0, \dots, t_0 + T\}$, $t_0 > 0$, and the cost functions $[c_{ij}(t)]_r > 0$, $(i, j) \in A$, for all $t \in S_T$. $T$ is the upper bound on the total time to travel from any node in the network to the destination node, $N_d$. Also assume that $[c_{ij}(t)]_1$ is the time to travel from node $i$ to node $j$ when the arrival time at node $i$ is $t$. For all $i \in N \setminus \{N_d\}$ and all $t \in S_T$, define $\{[F_i(t)]\}$ as the set of nondominated vectors associated with the paths that leave node $i$ at time $t$ and reach node $N_d$, and define $\{[F_i(t)^{(k)}]\}$ as the set of nondominated vectors associated with the paths that leave node $i$ at time $t$ and reach node $N_d$ in at most $k + 1$ links before time $t_0 + T$, where $k = 0, 1, \dots$. The following is the principle of optimality used for this algorithm: 'A nondominated path $p$, leaving node $i$ at time $t \in S_T$ and reaching node $N_d$ at or before time $t_0 + T$, has the property that for each node $j$ lying on this path, a subpath $p_j$ that leaves node $j$ at time $t_j \in S_T$, $t_j > t$, and arrives at node $N_d$ at or before time $t_0 + T$, is nondominated.' The algorithm is as follows:


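1 | Find a time grid of discrete values $S_T = \{t_0, \dots, t_0 + T\}$, $t_0 > 0$, and compute $[c_{ij}(t)]$ for all $t \in S_T$ and all $(i, j) \in A$.

2 | Modify $[c_{ij}(t)]$ for all $t \in S_T$ and all $(i, j) \in A$ as follows:
$$[c_{ij}(t)]' = \begin{cases} [c_{ij}(t)] & \text{if } t + [c_{ij}(t)]_1 \le t_0 + T,\\ \infty & \text{if } t + [c_{ij}(t)]_1 > t_0 + T. \end{cases}$$

3 | Find the initial array $\{[F_i(t)^{(0)}]\}$, $i = 1, \dots, N$, for all $t \in S_T$, where $\{[F_{N_d}(t)^{(0)}]\} = \{0\}$ and $\{[F_i(t)^{(0)}]\} = [c_{iN_d}(t)]'$ for $i \in N \setminus \{N_d\}$.

4 | Find the arrays $\{[F_i(t)^{(k)}]\}$, $i = 1, \dots, N$, for all $t \in S_T$, for $k = 1, 2, \dots$ as follows:
$$\{[F_i(t)^{(k)}]\} = \mathrm{VMIN}_{j}\Big\{[c_{ij}(t)]' + \{[F_j(t + [c_{ij}(t)]_1)^{(k-1)}]\}\Big\}, \quad i \in N \setminus \{N_d\},$$
$$\{[F_{N_d}(t)^{(k)}]\} = \{0\}.$$

5 | The sequence of sets $\{[F_i(t_0)^{(k)}]\}$, $k = 1, 2, \dots$, converges to the set $\{[F_i(t_0)]\}$, the set of nondominated vectors associated with the paths that leave node $i$ at time $t_0$ and reach node $N_d$.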
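A minimal Python sketch of steps 2 to 5 is shown below. Assumptions, all hypothetical: each arc carries a Python function `t -> cost tuple` whose first component is an integer travel time (so arrival times stay on the grid), `None` stands for the infinite modified cost of step 2, an empty list means no feasible path, and a simple pairwise filter plays the role of VMIN. The example network shows the time dependence: leaving node 1 at $t = 1$ the route via node 2 is nondominated, but leaving at $t = 2$ it is dominated because arc (2, 3) becomes slow.

```python
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eff(vectors):
    """Nondominated filter standing in for VMIN."""
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def kostreva_wiecek_grid(nodes, arcs, dest, t0, T, m):
    """F[(i, t)]: nondominated cost vectors of paths leaving i at t and reaching dest."""
    grid = range(t0, t0 + T + 1)
    zero = (0,) * m

    def cost(i, j, t):  # step 2: arrivals after t0 + T get infinite cost (None)
        c = arcs[(i, j)](t)
        return c if t + c[0] <= t0 + T else None

    F = {}
    for i in nodes:     # step 3: initial array F^(0)
        for t in grid:
            c = cost(i, dest, t) if (i, dest) in arcs else None
            F[(i, t)] = [zero] if i == dest else ([c] if c is not None else [])
    for _ in range(len(nodes)):                        # step 4: k = 1, 2, ...
        new_F = {}
        for i in nodes:
            for t in grid:
                if i == dest:
                    new_F[(i, t)] = [zero]
                    continue
                candidates = []
                for (a, j) in arcs:
                    c = cost(i, j, t) if a == i else None
                    if c is not None:
                        candidates.extend(add(c, v) for v in F[(j, t + c[0])])
                new_F[(i, t)] = eff(candidates)
        if all(set(new_F[key]) == set(F[key]) for key in F):  # step 5
            break
        F = new_F
    return F


# Hypothetical network: (travel time, second cost); arc (2, 3) slows down at t >= 3.
arcs = {(1, 2): lambda t: (1, 3),
        (2, 3): lambda t: (1, 1) if t < 3 else (4, 1),
        (1, 3): lambda t: (5, 1)}
F = kostreva_wiecek_grid([1, 2, 3], arcs, dest=3, t0=1, T=19, m=2)
print(F[(1, 1)], F[(1, 2)])  # [(2, 4), (5, 1)] [(5, 1)]
```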


The following example uses Algorithm One [7] to solve a dynamic routing problem for the network in Fig. 2. A grid of discrete values of time $S_T = \{1, 2, \dots, 20\}$ for $t_0 = 1$ is established.

Table 2 shows the initial array and the two subsequent arrays. Thus, the sets $\{\mathrm{Eff}(F_i(t_0))\}$ of all nondominated paths that leave nodes 1, 2, and 3 at time $t_0 = 1$ are $\{(1, 2), (2, 3), (3, 4)\}$, $\{(2, 3), (3, 4)\}$, and $\{(3, 4)\}$, respectively.

Multiple Objective Dynamic Programming, Figure 2

Kostreva and Wiecek [7] also developed an algorithm which uses forward dynamic programming to find all nondominated paths from an origin node to every other node in the network without using a time grid. Here, $t$ is a continuous variable, $t > 0$, and $[c_{ij}(t)]_r > 0$. An assumption must be made about the cost functions so that the principle of optimality holds for these networks: for any arc $(i, j) \in A$ and all $t_1, t_2 > 0$, if $t_1 < t_2$, then:

a) $t_1 + [c_{ij}(t_1)]_1 \le t_2 + [c_{ij}(t_2)]_1$, and

b) $[c_{ij}(t_1)]_r \le [c_{ij}(t_2)]_r$ for all $r \in \{2, \dots, m\}$.

Cost functions that are monotone increasing with respect to time satisfy this assumption.


1 | Find the initial vectors $\{[G_j^{(1)}]\}$, $j = 1, \dots, N$, where $\{[G_1^{(1)}]\} = \{0\}$ and $\{[G_j^{(1)}]\} = [c_{1j}(0)]$, $j = 2, \dots, N$.

2 | Calculate the vectors $\{[G_j^{(k+1)}]\}$, $j = 1, \dots, N$, for $k = 1, 2, \dots$ as follows:
$$\{[G_j^{(k+1)}]\} = \mathrm{VMIN}\Big\{[g_i^{(k)u}] + \big[c_{ij}(t_i^{(k)u})\big] : i = 1, \dots, N,\ u = 1, \dots, N_i^{(k)}\Big\}, \quad j = 2, \dots, N,$$
$$\{[G_1^{(k+1)}]\} = \{0\}.$$

3 | $\{[G_j^{(k)}]\}$, $k = 1, 2, \dots$, converges to $\{[G_j]\}$, the set of vector costs of all nondominated paths which leave the origin node at time $t = 0$ and lead to node $j$.


Assume that node 1 is the origin node. For nodes $j = 2, \dots, N$, let $[g_j^{(k)u}]$ be the vector cost of the nondominated path $u$ which has at most $k$ links, leaves the origin node at time $t = 0$ and leads to node $j$, and let $t_j^{(k)u}$ be the arrival time of this path at node $j$. Also, let $\{[G_j^{(k)}]\} = \{[g_j^{(k)u}] : u = 1, \dots, N_j^{(k)}\}$ be the set of vector costs of all nondominated paths which have at most $k$ links, leave the origin node at time $t = 0$ and lead to node $j$, where $N_j^{(k)}$ is the number of such nondominated paths. Let $\{[G_j]\}$ be the set of vector costs of all nondominated paths which leave the origin node at time $t = 0$ and lead to node $j$. The algorithm is as listed above.
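The forward recursion can be sketched in Python without any time grid. The sketch assumes, hypothetically, that arcs map `(i, j)` to a function of the arrival time at node $i$, that the first cost component is travel time, and that there is no waiting at nodes, so a path leaving node 1 at $t = 0$ arrives at its end node at the time given by the first component of its accumulated cost. A pairwise filter stands in for VMIN, and the example arc costs satisfy conditions a) and b).

```python
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eff(vectors):
    """Nondominated filter standing in for VMIN."""
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def kostreva_wiecek_forward(n, arcs, m, max_iter=None):
    """G[j]: nondominated cost vectors of paths leaving node 1 at t = 0 and reaching j."""
    zero = (0.0,) * m
    G = {j: [] for j in range(1, n + 1)}
    G[1] = [zero]                                      # {[G_1]} = {0}
    for _ in range(max_iter or n):
        new_G = {1: [zero]}
        for j in range(2, n + 1):
            # g[0] is the arrival time at node i of the partial path with cost g.
            new_G[j] = eff([add(g, c(g[0]))
                            for (i, head), c in arcs.items() if head == j
                            for g in G[i]])
        if all(set(new_G[j]) == set(G[j]) for j in G):  # sets have converged
            break
        G = new_G
    return G


# Hypothetical network; arc (2, 3) gets slower the later it is entered.
arcs = {(1, 2): lambda t: (1.0, 3.0),
        (2, 3): lambda t: (1.0 + t, 1.0),
        (1, 3): lambda t: (5.0, 1.0)}
print(kostreva_wiecek_forward(3, arcs, 2)[3])  # [(3.0, 4.0), (5.0, 1.0)]
```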

Multiple Objective Dynamic Programming, Table 2
Sequence of arrays

Another way to get around the monotonicity assumption of dynamic programming is to use generalized dynamic programming techniques. See [2] for a way to use generalized DP with a multicriteria preference function. Basically, generalized DP uses a weaker principle of optimality than Bellman's famous version [1]. Generalized DP finds partial solutions that may lead to optimal solutions even though locally they are not optimal according to the preference function.

In [2] generalized DP is applied to the multicriteria best path problem. Assuming node 1 to be the origin and node $N$ to be the destination, let $\Pi$ be the set of all paths in the network. Let

$$P(j) = \{p \in \Pi : i_1 = 1,\ i_n = j\}$$

be the set of all paths from the origin to node $j$. Let

$$X(j) = \{p \in \Pi : i_1 = j,\ i_n = N\}$$

be the set of all paths from node $j$ to the destination node. The vector cost along each arc is called an arc length vector, $l_{ij} = (l_{ij1}, \dots, l_{ijm}) \in \mathbb{R}^m$. A path length function $z: \Pi \to \mathbb{R}^m$ assigns a path length vector to every path $p \in \Pi$, where $\circ$ is a binary operator on $\mathbb{R}^m$:

$$z(p) = l_{i_1 i_2} \circ \cdots \circ l_{i_{n-1} i_n}.$$


Thus, each different objective can have a different bi- 
nary operation. For example, distance would have an 
additive binary operator and probabilities would have 
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a multiplicative binary operator. Let 


$$Z(j) = \{z(p) : p \in P(j)\}$$

be the set of all length vectors of all paths from the origin to node j. A multicriteria preference function $u\colon \mathbb{R}^m \to \mathbb{R}$ is defined on the set of path length vectors. The objective is to maximize this preference function. The monotonicity assumption says that for all $z, z' \in Z(j)$,

$$u(z) \le u(z') \implies u(z \circ l_{jk}) \le u(z' \circ l_{jk})$$

for all $j, k \in S$ such that $(j, k) \in T$. Unfortunately, with multiobjective problems this assumption is easily violated.
Generalized DP tries to get around this monotonicity assumption by defining local preference relations $\rho_j \subseteq Z(j) \times Z(j)$: for $z, z' \in Z(j)$, $z \,\rho_j\, z'$ implies that any subpath from the origin to node j whose length is z cannot be used in a path to produce a better overall path from the origin to the destination node than using the subpath from the origin to node j whose length is z'. So subpath length vector z' is locally preferred even though subpath length vector z may be globally preferred, u(z') < u(z). Formally, for $z, z' \in Z(j)$, $z \,\rho_j\, z'$ if and only if there exists $p' \in X(j)$ such that $u(z \circ z(p)) \le u(z' \circ z(p'))$ for all $p \in X(j)$. These local preference relations are used to form the weak principle of optimality: an optimal path must be composed of subpaths that can be part of an optimal path.
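To make the remark that monotonicity "is easily violated" concrete, here is a minimal sketch with a hypothetical two-criteria preference function `u` that favors balanced path length vectors, using an additive operator ∘. The vectors `z`, `z_prime` and the arc vector are made-up values chosen only to exhibit a violation.

```python
def extend(z, l):
    # Componentwise additive binary operator (e.g. distances): z ∘ l.
    return tuple(a + b for a, b in zip(z, l))


def u(z):
    # Hypothetical preference function to be maximized:
    # it prefers path length vectors whose two components are balanced.
    return -abs(z[0] - z[1])


def monotone_on(z, z_prime, l):
    """Check u(z) <= u(z') => u(z ∘ l) <= u(z' ∘ l) for one pair and one arc."""
    if u(z) <= u(z_prime):
        return u(extend(z, l)) <= u(extend(z_prime, l))
    return True  # premise false, so the implication holds trivially


z, z_prime, arc = (1, 3), (2, 2), (2, 0)
print(u(z), u(z_prime))                            # -2 0
print(u(extend(z, arc)), u(extend(z_prime, arc)))  # 0 -2
print(monotone_on(z, z_prime, arc))                # False
```

Here z' is preferred at node j, but after appending the same arc the order flips, so discarding z at node j (as conventional DP would) could lose the best overall path.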

Unfortunately, in order to obtain these preference relations one would have to complete all paths from every node in the network. Since this is too computationally intensive, the preference relations are relaxed to the refining local preference relations $\prec_j$, where $z \prec_j z'$ implies $z \,\rho_j\, z'$. Using $\prec_j$ avoids having to find the entire relation $\rho_j$; the price is that a larger set of maximal path length vectors will be kept by using $\prec_j$ than if $\rho_j$ were used. A maximal path length vector is a vector for which there does not exist another vector at that state that is strictly more preferred. Let


$$\operatorname{max}(X, \rho) = \{x \in X : \nexists\, x' \in X \text{ such that } x \,\rho\, x' \text{ and not } x' \,\rho\, x\}.$$

The following are the equations of generalized DP:

$$f(1) = \{z_1\},$$

$$f(j) = \operatorname{max}\Big(\bigcup_{(i,j) \in T} f(i) \circ l_{ij},\ \prec_j\Big), \quad j = 2, \dots, N,$$

where $f(i) \circ l_{ij} = \{z \circ l_{ij} : z \in f(i)\}$.

When the monotonicity assumption is satisfied, the $\prec_j$ relation can be replaced with the multicriteria preference function u, thus reducing to the conventional DP problem. However, when the monotonicity assumption does not hold, the $\prec_j$ relation must be defined by trying to exploit any special structure of each individual problem. Also, using dynamic programming to find the entire Pareto optimal set can be seen as another special case of generalized DP, where $z_k \ge z'_k$ for all $k = 1, \dots, m$ implies $z \prec_j z'$ (assuming minimization of each criterion).
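The Pareto special case of the generalized DP recursion can be sketched as follows. This is a minimal illustration, not the algorithm of [2]: it assumes additive arc length vectors, an acyclic network processed in a given topological order, the zero vector as the starting label $z_1$, and componentwise minimization as the relation $\prec_j$. The network data is hypothetical.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


def dominates(a: Vec, b: Vec) -> bool:
    """a dominates b under minimization: a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def maximal(vectors: List[Vec]) -> List[Vec]:
    """Keep the vectors that no other vector strictly beats (duplicates merged)."""
    unique = list(dict.fromkeys(vectors))
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


def pareto_labels(arcs: Dict[Tuple[int, int], Vec],
                  order: List[int]) -> Dict[int, List[Vec]]:
    """f(origin) = {0}; f(j) = maximal(union over arcs (i, j) of f(i) + l_ij).

    Assumes an acyclic network and `order` a topological order starting at the origin.
    """
    m = len(next(iter(arcs.values())))
    f: Dict[int, List[Vec]] = {order[0]: [tuple([0.0] * m)]}
    for j in order[1:]:
        candidates = [
            tuple(a + b for a, b in zip(z, l))
            for (i, k), l in arcs.items() if k == j and i in f
            for z in f[i]
        ]
        f[j] = maximal(candidates)
    return f


# Hypothetical 4-node network with two criteria per arc (both minimized).
arcs = {(1, 2): (1, 5), (1, 3): (4, 1), (2, 3): (1, 1), (2, 4): (6, 1), (3, 4): (1, 1)}
labels = pareto_labels(arcs, [1, 2, 3, 4])
print(labels[4])  # [(5.0, 2.0), (3.0, 7.0)]
```

At node 4 the label (7, 6) arriving via arc (2, 4) is discarded because (5, 2) dominates it; the two surviving labels are mutually nondominated, which is why the whole set, not a single value, is carried from node to node.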

The subject of multiple objective dynamic programming has developed into a viable body of knowledge capable of providing solutions to applied problems in which trade-offs among objectives are important. Among the multiple objective techniques, it is distinctive in its ability to provide the entire Pareto optimal set. To gain such an advantage, one must be willing to perform computationally intensive operations on large sets of vectors.
This article gives a brief introduction to multiple objective programming support. We will give an overview of basic concepts, formulations, and principles of solving multiple objective programming problems. Solving those problems requires the intervention of a decision-maker. That is why behavioral assumptions play an important role in multiple objective programming. The assumptions made determine which kind of support is given to a decision-maker. We will demonstrate how a free search type approach can be used to solve multiple objective programming problems.


Introduction 


Before we can consider the concept of multiple objective programming support (MOPS), we first have to explain the concept of multiple criteria decision making (MCDM). Although definitions vary, most researchers working in the field might accept the following general definition: Multiple Criteria Decision Making (MCDM) refers to the solving of decision and planning problems involving multiple (generally conflicting) criteria. ‘Solving’ means that a decision-maker (DM) will choose one ‘reasonable’ alternative from among a set of available ones. It is also meaningful to assume that the choice is irrevocable. For an MCDM problem it is typical that no unique solution exists. Therefore, finding a solution to an MCDM problem requires the intervention of a decision-maker (DM). In MCDM, the word ‘reasonable’ is replaced by the words ‘efficient/nondominated’, which will be defined later on.

Actually, the above definition is a strongly simplified description of the whole (multiple criteria) decision making process. In practice, MCDM problems are often not so well structured that they can be considered simply as choice problems. Before a decision problem is ready to be ‘solved’, the following questions require a lot of preliminary work: How to structure the problem? How to find essential criteria? How to handle uncertainty? These questions are by no means outside the interest area of MCDM researchers. The outranking method by B. Roy [17] and the AHP (the analytic hierarchy process) developed by T.L. Saaty [18] are examples of MCDM methods in which a lot of effort is devoted to problem structuring. Both methods are well known and widely used in practice. In both methods, the presence of multiple criteria is an essential feature, but the structuring of a problem is an even more important part of the solution process.

Multiple Objective Programming Support, Figure 1
A variable, criterion, and value space

When the term ‘support’ is used in connection with MCDM, we may adopt a broad perspective and use the term to refer to all research associated with the relationship between the problem and the decision-maker. In this article we take a narrower perspective and focus on a very essential supporting problem in multiple criteria decision making: How to assist a DM in finding the ‘best’ solution from among a set of available ‘reasonable’ alternatives, when the alternatives are evaluated by using several criteria? Available alternatives are assumed to be defined explicitly or implicitly by means of a mathematical model. The term multiple objective programming usually refers to dealing with this kind of model.

The following considerations are general in the sense that usually it is not necessary to specify how the alternatives are defined. It is enough to assume that they belong to a set Q. However, in Figs. 1 and 2 and in the numerical example we consider a multiple objective linear programming model in which all constraints and objectives are defined using linear functions.

The article consists of seven sections. In Sect. “A Multiple Objective Programming Problem”, we give a brief introduction to some foundations of multiple objective programming. How to generate potential ‘reasonable’ solutions for a DM’s evaluation is considered in Sect. “Generating Nondominated Solutions”, and in Sect. “Solving Multiple Objective Problems”, we review general principles for solving a multiple objective programming problem. In Sect. “Example of a Decision Support System: VIG”, the multiple criteria decision support system VIG is introduced, and a numerical example is solved in Sect. “Numerical Illustrations”. Concluding remarks are given in Sect. “Conclusion”.

Multiple Objective Programming Support, Figure 2
Illustrating the projection of a feasible and an infeasible aspiration level point onto the nondominated surface


A Multiple Objective Programming Problem 


A multiple objective programming (MOP) problem in a so-called criterion space can be defined as follows:

$$\text{“max” } q \quad \text{s.t. } q \in Q, \tag{1}$$

where the set $Q \subset \mathbb{R}^k$ is a so-called feasible region in a k-dimensional criterion space $\mathbb{R}^k$. The set Q is of special interest. Most considerations in multiple objective programming are made in a criterion space.

The set Q may be convex or nonconvex, bounded or unbounded, precisely known or unknown, consist of a finite or an infinite number of alternatives, etc. When Q consists of a finite number of elements which are explicitly known at the beginning of the solution process, we have an important class of problems which may be called, e.g., (multiple criteria) evaluation problems. Sometimes those problems are referred to as discrete multiple criteria problems or selection problems (for a survey see, for example, [16]).

When the number of alternatives in Q is infinite and not countable, the alternatives are usually defined using a mathematical model formulation, and the problem is called continuous. In this case we say that the alternatives are only implicitly known. This kind of problem is referred to as a multiple criteria design problem (the terms ‘evaluation’ and ‘design’ are adopted from A. Arbel) or a continuous multiple criteria problem. In this case, the set Q is not specified directly, but by means of decision variables, as is usually done in single objective optimization problems:

$$\max\ q = f(x) = (f_1(x), \dots, f_k(x)) \quad \text{s.t. } x \in X, \tag{2}$$

where $X \subset \mathbb{R}^n$ is a feasible set and $f\colon \mathbb{R}^n \to \mathbb{R}^k$. The space $\mathbb{R}^n$ is called a variable space (see Fig. 1). The functions $f_i$, $i = 1, \dots, k$, are objective functions. The feasible region Q can now be written as $Q = \{q : q = f(x),\ x \in X\}$.
The MOP problem rarely has a unique solution, i.e. an optimal solution that simultaneously maximizes all objectives. Conceptually, the multiple objective mathematical programming problem may be regarded as a value (utility) function maximization program:

$$\max\ v(q) \quad \text{s.t. } q \in Q, \tag{3}$$

where v is a real-valued function which is strictly increasing in the criterion space and defined at least in the feasible region Q. It maps the feasible region into a one-dimensional value space (see Fig. 1). The function v specifies the DM’s preference structure over the feasible region. However, the key assumption in multiple objective programming is that v is unknown. Generally, if the value function is estimated explicitly, the system is considered to be in the MAUT category (MAUT stands for multiple attribute utility theory; see, for example, [7]) and can then be solved without any interaction of the DM. Typically, MAUT problems are not even classified under the MCDM category. If the value function is implicit (assumed to exist but otherwise unknown) or no assumption about the value function is made, the system is usually classified under MCDM [2] or MOP.

Solutions of the MOP problem are all those alternatives which can be the solutions of some value function $v\colon Q \to \mathbb{R}$. Those solutions are called efficient or nondominated depending on the space where the alternatives are considered. The term nondominated is used in the criterion space and efficient in the variable space. (Some researchers use the term efficient to refer to both efficient and nondominated solutions without making any distinction.) Any choice from among the set of efficient (nondominated) solutions is an acceptable and ‘reasonable’ solution, as long as we have no additional information about the DM’s preference structure.
Nondominated solutions are defined as follows:


Definition 1 In (1), $q^* \in Q$ is nondominated if and only if there does not exist another $q \in Q$ such that $q \ge q^*$ and $q \ne q^*$.

Definition 2 In (1), $q^* \in Q$ is weakly nondominated if and only if there does not exist another $q \in Q$ such that $q > q^*$.

Correspondingly, efficient solutions are defined as follows:

Definition 3 In (2), $x^* \in X$ is efficient if and only if there does not exist another $x \in X$ such that $f(x) \ge f(x^*)$ and $f(x) \ne f(x^*)$.

Definition 4 In (2), $x^* \in X$ is weakly efficient if and only if there does not exist another $x \in X$ such that $f(x) > f(x^*)$.
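For a finite set Q (an evaluation problem), Definitions 1 and 2 translate directly into pairwise comparisons. The sketch below assumes maximization of every criterion, as in (1); the set `Q` is a hypothetical example.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def dominates(q: Sequence[float], p: Sequence[float]) -> bool:
    """q >= p componentwise and q != p (Definition 1, maximization)."""
    return all(a >= b for a, b in zip(q, p)) and tuple(q) != tuple(p)


def strictly_better(q: Sequence[float], p: Sequence[float]) -> bool:
    """q > p in every component (Definition 2)."""
    return all(a > b for a, b in zip(q, p))


def nondominated(Q: List[Vec]) -> List[Vec]:
    return [p for p in Q if not any(dominates(q, p) for q in Q)]


def weakly_nondominated(Q: List[Vec]) -> List[Vec]:
    return [p for p in Q if not any(strictly_better(q, p) for q in Q)]


Q = [(0, 3), (2, 3), (4, 2), (1, 1)]  # hypothetical finite criterion set
print(nondominated(Q))         # [(2, 3), (4, 2)]
print(weakly_nondominated(Q))  # [(0, 3), (2, 3), (4, 2)]
```

The point (0, 3) shows the difference: (2, 3) is at least as good in both criteria and better in one, so (0, 3) is dominated, yet no point is strictly better in both criteria, so it is still weakly nondominated.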


The final (‘best’) solution $q \in Q$ of problem (1) is called the most preferred solution. It is a solution preferred by the DM to all other solutions. At the conceptual level, we may think of it as the solution maximizing an (unknown) value function in problem (3). How to find it? That is the problem we now proceed to consider.
Unfortunately, the above characterization of the most preferred solution is not very operational, because no system can enable the DM to simultaneously compare the final solution to all other solutions in order to check whether it really is the most preferred one. It is equally difficult to maximize a function we do not know. Some desirable properties of a good system are, for example, that it convinces the DM that the final solution is the most preferred one, that it does not require too much of the DM’s time to find the final solution, and that it gives sufficiently reliable information about the alternatives. Even if it is impossible to say which system provides the best support for a DM for his multiple criteria problem, all proper systems have to be able to recognize, generate, and operate with nondominated solutions. Generating nondominated solutions for the DM’s evaluation is thus one key issue in multiple objective programming. In the next section, we will consider some principles.


Generating Nondominated Solutions 


Despite many variations among different methods of 
generating nondominated solutions, the ultimate prin- 
ciple is the same in all methods: a single objective opti- 
mization problem is solved to generate a new solution 
or solutions. The objective function of this single ob- 
jective problem may be called a scalarizing function, ac- 
cording to [25]. It typically has the original objectives 
and a set of parameters as its arguments. The form of 
the scalarizing function as well as what parameters are 
used depends on the assumptions made concerning the 
DM’s preference structure and behavior. 

Two classes of parameters are widely used in multi- 
ple objective optimization: 
1) weighting coefficients for objective functions; and
2) reference/aspiration/reservation levels for objective function values.
Based on those parameters, there exist several ways to 
specify a scalarizing function. An important require- 
ment is that this function completely characterizes the 
set of nondominated solutions: 


for each parameter value, all solution vectors 
are nondominated, and for each nondominated 
criterion vector, there is at least one parameter 
value, which produces that specific criterion vec- 
tor as a solution 


(see, for theoretical considerations, e. g. [26]). 


A Linear Scalarizing Function 


A classic method to generate nondominated solutions is to use the weighted sums of objective functions, i.e. to use the following linear scalarizing function:

$$\max\ \{\lambda^{T} f(x) : x \in X\}. \tag{4}$$

If λ > 0, then the solution vector x of (4) is efficient, but if we allow λ ≥ 0, then the solution vector is only weakly efficient (see, e.g., [21, p. 215; 221]). Using the parameter set Λ = {λ : λ > 0} in the weighted-sums linear program, we can completely characterize the efficient set provided the constraint set is convex. However, Λ is an open set, which causes difficulties in a mathematical optimization problem. If we use cl(Λ) = {λ : λ ≥ 0} instead, the efficiency of x can no longer be guaranteed: it is surely weakly efficient, but not necessarily efficient. When the weighted sums are used to specify a scalarizing function in multiple objective linear programming (MOLP) problems, the optimal solution corresponding to nonextreme points of X is never unique. The set of optimal solutions always contains at least one extreme point, or the solution is unbounded. In early methods, a common feature was to operate with weight vectors $\lambda \in \mathbb{R}^k$, limiting considerations to efficient extreme points (see, e.g., [29]).
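The effect of allowing zero weights in (4) can be seen on a small finite set of criterion vectors standing in for Q (hypothetical data, maximization). This sketch simply enumerates Q instead of solving a linear program.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def weighted_sum_maximizers(Q: List[Vec], lam: Sequence[float]) -> List[Vec]:
    """All q in Q that maximize lam^T q, i.e. problem (4) over a finite Q."""
    values = [sum(l * c for l, c in zip(lam, q)) for q in Q]
    best = max(values)
    return [q for q, v in zip(Q, values) if v == best]


Q = [(0, 3), (2, 3), (4, 2)]  # hypothetical finite criterion set
print(weighted_sum_maximizers(Q, (1, 1)))  # [(4, 2)]
print(weighted_sum_maximizers(Q, (1, 3)))  # [(2, 3)]
print(weighted_sum_maximizers(Q, (0, 1)))  # [(0, 3), (2, 3)]
```

With strictly positive weights each maximizer is nondominated; with λ = (0, 1) the dominated but weakly nondominated point (0, 3) ties for the optimum, matching the statement that λ ≥ 0 only guarantees weak efficiency.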


A Chebyshev-Type Scalarizing Function 


Currently, most solution methods are based on the use of a so-called Chebyshev-type scalarizing function first proposed by A. Wierzbicki [25]. We will refer to this function by the term achievement (scalarizing) function. The achievement (scalarizing) function projects any given (feasible or infeasible) point $g \in \mathbb{R}^k$ onto the set of nondominated solutions. Point g is called a reference point, and its components represent the desired values of the objective functions. These values are called aspiration levels.
The simplest form of achievement function is:

$$s(g, q, w) = \max_{i=1,\dots,k} \left[ \frac{g_i - q_i}{w_i} \right], \tag{5}$$

where $w \in \mathbb{R}^k$, $w > 0$, is a (given) vector of weights, $g \in \mathbb{R}^k$, and $q \in Q = \{f(x) : x \in X\}$. By minimizing s(g, q, w) subject to $q \in Q$ we find a weakly nondominated solution vector $q^*$ (see, e.g., [25,26]). However, if the solution is unique, then $q^*$ is nondominated. If $g \in \mathbb{R}^k$ is feasible, then $q^* \in Q$ and $q^* \ge g$. To guarantee that only nondominated (instead of weakly nondominated) solutions will be generated, more complicated forms of the achievement function have to be used, for example:

$$s(g, q, w, \rho) = \max_{i=1,\dots,k} \left[ \frac{g_i - q_i}{w_i} \right] + \rho \sum_{i=1}^{k} (g_i - q_i), \tag{6}$$

where ρ > 0. In practice, we cannot operate with a definition ‘any positive value’; we have to use a prespecified value for ρ. Another way is to use a lexicographic formulation [10].
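The role of the augmentation term in (6) can be illustrated on the two end points of the segment from (0, 3) to (2, 3) in the example of Fig. 2, where (0, 3) is only weakly nondominated. In this sketch the reference point g = (1, 4), the unit weights, and ρ = 0.01 are hypothetical choices, and minimization is done by enumeration over these two points only.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def s5(g: Sequence[float], q: Sequence[float], w: Sequence[float]) -> float:
    """Achievement function (5): max_i (g_i - q_i) / w_i."""
    return max((gi - qi) / wi for gi, qi, wi in zip(g, q, w))


def s6(g, q, w, rho: float) -> float:
    """Augmented achievement function (6): (5) plus rho * sum_i (g_i - q_i)."""
    return s5(g, q, w) + rho * sum(gi - qi for gi, qi in zip(g, q))


def minimizers(Q: List[Vec], score) -> List[Vec]:
    values = [score(q) for q in Q]
    best = min(values)
    return [q for q, v in zip(Q, values) if v == best]


g, w = (1.0, 4.0), (1.0, 1.0)   # hypothetical reference point and weights
Q = [(0.0, 3.0), (2.0, 3.0)]    # end points of the weakly nondominated segment
print(minimizers(Q, lambda q: s5(g, q, w)))        # [(0.0, 3.0), (2.0, 3.0)]
print(minimizers(Q, lambda q: s6(g, q, w, 0.01)))  # [(2.0, 3.0)]
print(s5((2.0, 1.0), (4.0, 2.0), (2.0, 1.0)))      # -1.0
```

With ρ = 0 both points attain the same value, so a solver might return the dominated one; the small positive ρ breaks the tie in favor of (2, 3). The last line reproduces the value −1 quoted for the aspiration point (2, 1) and weights (2, 1) at (4, 2).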
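Applying the scalarizing function (6) is easy, because, given $g \in \mathbb{R}^k$, the minimum of s(g, q, w, ρ) is found by solving the following LP problem:

$$\begin{aligned} \min\ & \varepsilon + \rho \sum_{i=1}^{k} (g_i - q_i) \\ \text{s.t. } & x \in X, \\ & \varepsilon \ge \frac{g_i - q_i}{w_i}, \quad i = 1, \dots, k. \end{aligned} \tag{7}$$

Problem (7) can be further written as:

$$\begin{aligned} \min\ & \varepsilon + \rho \sum_{i=1}^{k} (g_i - q_i) \\ \text{s.t. } & x \in X, \\ & q + \varepsilon w - z = g, \\ & z \ge 0. \end{aligned} \tag{8}$$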


To illustrate the use of the achievement scalarizing function, consider a two-criteria problem with a feasible region having four extreme points (0, 0), (0, 3), (2, 3), (8, 0), as shown in Fig. 2. In Fig. 2, the thick solid lines describe the indifference curves when ρ = 0 in the achievement scalarizing function. The thin dotted lines stand for the case ρ > 0. Note that the line from (2, 3) to (8, 0) is nondominated and the line from (0, 3) to (2, 3) (excluding the point (2, 3)) is weakly nondominated, but dominated. Let us assume that the DM first specifies a feasible aspiration level point $g^1 = (2, 1)$. Using a weight vector $w = (2, 1)^T$, the minimum value of the achievement scalarizing function (−1) is reached at the point $v^1 = (4, 2)$ (cf. Fig. 2). Correspondingly, if an aspiration level point is infeasible, say $g^2 = (8, 2)$, then the minimum of the achievement scalarizing function (+1) is reached at the point $v^2 = (6, 1)$. When a feasible point dominates an aspiration level point, the value of the achievement scalarizing function is always negative; otherwise it is nonnegative. It is zero if an aspiration level point is weakly nondominated.
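The two projections of Fig. 2 can be reproduced numerically. This is a sketch specific to this example, not a general solver: it takes ρ = 0, writes the region through the inequalities q₁ ≥ 0, q₂ ≥ 0, q₂ ≤ 3, q₁ + 2q₂ ≤ 8 (which follow from the four extreme points), and replaces the LP (7) with a bisection on the value s. The bisection is valid here because the region is downward closed within the nonnegative quadrant; the search bounds and iteration count are arbitrary choices.

```python
from typing import Sequence, Tuple


def in_region(q: Sequence[float], eps: float = 1e-12) -> bool:
    """Fig. 2 region: convex hull of (0,0), (0,3), (2,3), (8,0)."""
    q1, q2 = q
    return q1 >= -eps and q2 >= -eps and q2 <= 3 + eps and q1 + 2 * q2 <= 8 + eps


def project(g: Sequence[float], w: Sequence[float],
            lo: float = -100.0, hi: float = 100.0,
            iters: int = 200) -> Tuple[float, Tuple[float, ...]]:
    """Minimize s(g, q, w) of (5) over the region by bisection on its value s.

    For a value s we need some feasible q with q_i >= g_i - s * w_i. Because this
    region is downward closed within the nonnegative quadrant, it suffices to
    test the point max(g - s * w, 0) itself.
    """
    def target(s: float) -> Tuple[float, ...]:
        return tuple(max(gi - s * wi, 0.0) for gi, wi in zip(g, w))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if in_region(target(mid)):
            hi = mid   # value mid is attainable: search lower
        else:
            lo = mid   # not attainable: search higher
    return hi, target(hi)


w = (2.0, 1.0)
print(project((2.0, 1.0), w))  # approximately (-1.0, (4.0, 2.0)): feasible g1
print(project((8.0, 2.0), w))  # approximately (1.0, (6.0, 1.0)): infeasible g2
```

The sign of the returned value matches the text: negative when the aspiration point is dominated by a feasible point, positive when it is infeasible.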


Solving Multiple Objective Problems 


Several dozen procedures and computer implementations have been developed from the 1970s onwards to address both multiple criteria evaluation and design problems. Multiple objective decision procedures always require the intervention of a DM at some stage in the solution process. A popular way to involve the DM in the solution process is to use an interactive approach.

The specifics of these procedures vary, but they have several common characteristics. For example, at each iteration, a solution, or a set of solutions, is generated for the DM’s examination. As a result of the examination, the DM inputs information in the form of trade-offs, pairwise comparisons, aspiration levels, etc. (see [20] for a more detailed discussion). The responses are used to generate a presumably improved solution. The ultimate goal is to find the most preferred solution of the DM. The search technique and termination rule used depend heavily on the underlying assumptions postulated about the behavior of the DM and the way in which these assumptions are implemented. In MCDM research there is a growing interest in the behavioral realism of such assumptions.

Based on the role that the value function (3) is sup- 
posed to play in the analysis, we can classify the as- 
sumptions into three categories: 

1) Assume the existence of a value function v, and as- 
sess it explicitly. 

2) Assume the existence of a stable value function v, 
but do not attempt to assess it explicitly. Make as- 
sumptions of the general functional form of the 
value function. 

3) Do not assume the existence of a stable value func- 
tion v, either explicitly, or implicitly. 

The first assumption is adopted in multi-attribute utility or decision analysis (see, e.g., [7]). Interactive software implementing such approaches on personal computers exists.

The second assumption was a basic paradigm used in interactive multiple criteria approaches in the 1970s. A classical example is the GDF method [3]. The DM’s responses to specific questions were used to guide the solution process towards an ‘optimal’ or ‘most preferred’ solution (in theory), assuming that the DM behaves according to some specific (but unknown) underlying value function (for surveys see, e.g., [5,20,21], and [24]). Interactive software implementing such systems has often been developed by the authors of the above procedures for experimental purposes.
The approaches based on the assumption of ‘no stable value/utility function’ typically operate with a DM’s aspiration levels regarding the objectives on the feasible region. The aspiration levels are projected via minimizing so-called achievement scalarizing functions (6) [23,25]. No specific behavioral assumptions, e.g. transitivity, are necessary.

In essence, this approach seeks to help the DM search the set of efficient solutions more or less freely. Interactive software implementing such systems has been developed, e.g. ADBASE [22], DIDAS [14], VIG [8], and VIMDA [9]. For an excellent review of several interactive multiple criteria procedures, see [21]. Other well-known books that provide a deeper background and additional references, especially in the field of multiple objective optimization, include [1,4,5,6,19,27] and [28].

Multiple objective linear programming (MOLP) is 
the most commonly studied problem in multiple crite- 
ria decision making (MCDM). Most solution methods 
are developed for this problem. 


Example of a Decision Support System: VIG 


Today, many systems use aspiration level projections, 
where the projection is performed using Chebyshev- 
type achievement scalarizing functions as explained 
above. These functions can be controlled either by vary- 
ing weights (keeping aspiration levels fixed) or by vary- 
ing the aspiration levels (keeping weights fixed). Instead 
of aspiration levels, some algorithms ask the DM to specify the reservation levels for the criteria (see, e.g., [15]).

An achievement scalarizing function projects one 
aspiration (reservation) level point at a time onto the 
nondominated frontier. By parametrizing the function, 
it is possible to project the whole vector onto the non- 
dominated frontier as originally proposed by [11]. The 
vector to be projected is called a reference direction vector, and the method is correspondingly called the reference direction method. When a direction is projected onto the
nondominated frontier, a curve traversing across the 
nondominated frontier is obtained. Then an interactive 
line search is performed along this curve. The idea en- 
ables the DM to make a continuous search on the non- 
dominated frontier. The corresponding mathematical 
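model is a simple modification of the original model (8) developed for projecting one point:

$$\begin{aligned} \min\ & \varepsilon + \rho \sum_{i=1}^{k} (g_i - q_i) \\ \text{s.t. } & x \in X, \\ & q + \varepsilon w - z = g + t r, \quad z \ge 0, \end{aligned} \tag{9}$$

where $t\colon 0 \to \infty$ and $r \in \mathbb{R}^k$ is a reference direction. In the original approach, a reference direction was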
specified as a vector starting from the current solution 
and passing through the aspiration levels. The DM was 
asked to give aspiration levels for criteria. 
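The curve produced by (9) can be traced on the Fig. 2 example by projecting the moving reference point g + tr for a few values of t. This sketch assumes ρ = 0, a hypothetical start g = (2, 1), a hypothetical direction r = (1, 0) (improve criterion 1), and weights w = (2, 1); it uses the same example-specific bisection in place of the LP, so it is not a general implementation of the reference direction method.

```python
from typing import List, Sequence, Tuple


def in_region(q: Sequence[float], eps: float = 1e-12) -> bool:
    """Fig. 2 region: convex hull of (0,0), (0,3), (2,3), (8,0)."""
    q1, q2 = q
    return q1 >= -eps and q2 >= -eps and q2 <= 3 + eps and q1 + 2 * q2 <= 8 + eps


def project(g: Sequence[float], w: Sequence[float], iters: int = 200) -> Tuple[float, ...]:
    """Project g onto the frontier with (5) (rho = 0), by bisection on s."""
    lo, hi = -100.0, 100.0

    def target(s: float) -> Tuple[float, ...]:
        return tuple(max(gi - s * wi, 0.0) for gi, wi in zip(g, w))

    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if in_region(target(mid)) else (mid, hi)
    return target(hi)


def reference_direction_curve(g, r, w, ts) -> List[Tuple[float, ...]]:
    """Project the moving reference point g + t * r for each t in ts."""
    return [project(tuple(gi + t * ri for gi, ri in zip(g, r)), w) for t in ts]


# Hypothetical start g = (2, 1), direction r = (1, 0), weights w = (2, 1).
for point in reference_direction_curve((2.0, 1.0), (1.0, 0.0), (2.0, 1.0), [0.0, 4.0, 8.0]):
    print(tuple(round(c, 6) for c in point))
# (4.0, 2.0), then (6.0, 1.0), then (8.0, 0.0)
```

As t grows, the projected points slide along the nondominated edge from (4, 2) towards (8, 0), which is the kind of curve on which the interactive line search is performed.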

The original reference direction approach has been further developed in many directions. First, [12] improved upon the original procedure by making the specification of a reference direction dynamic. The dynamic version was called Pareto race. In Pareto race, the DM can freely move in any direction he/she likes on the nondominated frontier, and no restrictive assumptions concerning the DM’s behavior are made. Furthermore, the objectives and constraints are presented in a uniform manner. Thus, their roles can also be changed during the search process. The method and its implementation are called Pareto race. The whole software package containing Pareto race is called VIG.

In Pareto race, a reference direction r is determined by the system on the basis of preference information received from the DM. By pressing number keys corresponding to the ordinal numbers of the objectives, the DM expresses which objectives he/she would like to improve and how strongly. In this way he/she implicitly specifies a reference direction. Figure 3 shows the Pareto race interface for the search, embedded in the VIG software [8].

[Pareto Race screen: Goal 1: Crit.Mat 1; Goal 2: Crit.Mat 2; Goal 3: Product 3; Goal 4: Profit. Keys: num: Turn; Bar: Accelerator; F1: Gears (B); F2: Gears (F); F3: Fix; F4: Relax; F5: Brakes; F10: Exit]

Multiple Objective Programming Support, Figure 3

Thus Pareto race is a visual, dynamic, search proce- 
dure for exploring the nondominated frontier of a mul- 
tiple objective linear programming problem. The user 
sees the objective function values on a display in nu- 
meric form and as bar graphs, as he/she travels along 
the nondominated frontier. The keyboard controls in- 
clude an accelerator, gears, brakes, and a steering mech- 
anism. The search on the nondominated frontier is like 
driving a car. The DM can, e. g., increase/decrease the 
speed, make a turn and brake at any moment he/she 
likes. 

To implement those features, Pareto race uses certain control mechanisms, which are controlled by the following keys:

• (SPACE) BAR, an ‘accelerator’: Proceed in the current direction at constant speed.
• F1, ‘gears (backward)’: Increase speed in the backward direction.
• F2, ‘gears (forward)’: Increase speed in the forward direction.
• F3, ‘fix’: Use the current value of objective i as the worst acceptable value.
• F4, ‘relax’: Relax the ‘bound’ determined with key F3.
• F5, ‘brakes’: Reduce speed.
• F10, ‘exit’.
• num, ‘turn’: Change the direction of motion by increasing the component of the reference direction corresponding to the goal’s ordinal number i ∈ [1, k] pressed by the DM.

An example of the Pareto race screen is given in 
Fig. 3. The screen is associated with the numerical ex- 
ample described in the next section. 

Pareto race does not specify restrictive behavioral 
assumptions for a DM. He/she is free to make a search 
on the nondominated surface, until he/she believes that 
the solution found is his/her most preferred one. 

Pareto race is only suitable for solving moderate-size problems. When the size of the problem becomes large, computing time makes the interactive mode inconvenient. To solve large-scale problems, [13] proposed a method based on Pareto race. An (interactive) free search is performed to find the most preferred direction. Based on this direction, a nondominated curve can be generated in batch mode if desired.


Numerical Illustrations 


For illustrative purposes, we will consider the following 
production planning problem, where a decision maker 
(DM) tries to find the ‘best’ product-mix for three prod- 
ucts: Product 1, Product 2, and Product 3. The produc- 
tion of these products requires the use of one machine 
(mach. hours), man-power (man hours), and two crit- 
ical materials (crit. mat. 1 and crit. mat. 2). Selling the 
products results in profit (profit). Assume that the DM 
describes his/her decision problem as follows: 


Of course, I would like to make as much profit as 
possible. Because it is difficult and quite expen- 
sive to obtain critical materials, I would like to 
use them as little as possible, but never more than 
I have presently in storage (96 units of each). 
Only one machine is used to produce the prod- 
ucts. It operates without any problems for at least 
9 hours. The length of the regular working day 
is 10 hours. People are willing to work overtime 
which is costly and they are tired the next day. 
Therefore, if possible, I would like to avoid it. Fi- 
nally, product 3 is very important to a major cus- 
tomer, and I cannot totally exclude it from the 
production plan. 


Traditional single objective programming considers the problem as a profit maximization problem. The other ‘requirements’ are taken as constraints. Multiple objective programming takes a ‘softer’ perspective. We may, for instance, consider the problem as a four-objective problem. The DM would like to make as much profit as possible, but simultaneously, he/she would like to use the two critical materials as little as possible and, in addition, to maximize the production of product 3. Machine hours and man hours are considered as constraints, but during the search process the roles of constraints and objectives may also be changed, if necessary.

Multiple Objective Programming Support, Table 1
The coefficient matrix of the production planning problem

                 Prod. 1   Prod. 2   Prod. 3
mach. hours        1.5       1         1.6
man hours          1         2         1
crit. mat. 1       9        19.5       7.5
crit. mat. 2       7        20         9
profit             4         5         3

Multiple Objective Programming Support, Table 2
A sample of solutions for the multiple criteria problem

                        I        II       III      IV
Objectives:
crit. mat. 1          91.46    94.50    93.79    90.00
crit. mat. 2          85.44    88.00    89.15    84.62
profit                30.27    31.00    30.42    29.82
product 3              0.23     0.00     0.50     0.44
Constraints:
mach. hours            9.00     9.00     9.00     9.00
man hours               …      10.00    10.00     9.62
Decision Variables:
product 1              3.88     4.00     3.45      …
product 2              2.81     3.00     3.03     2.74
product 3              0.23     0.00     0.50     0.44

We assume that the problem can be modeled as an MOLP model. The coefficient matrix of the problem is given in Table 1.

Thus, we have the following multiple objective linear programming model:

$$\begin{aligned}
\text{crit. mat. 1:}\quad & 9P_1 + 19.5P_2 + 7.5P_3 \to \min \\
\text{crit. mat. 2:}\quad & 7P_1 + 20P_2 + 9P_3 \to \min \\
\text{profit:}\quad & 4P_1 + 5P_2 + 3P_3 \to \max \\
\text{product 3:}\quad & P_3 \to \max \\
\text{subject to:}\quad & \\
\text{mach. hours:}\quad & 1.5P_1 + P_2 + 1.6P_3 \le 9 \\
\text{man hours:}\quad & P_1 + 2P_2 + P_3 \le 10
\end{aligned}$$


The problem has no unique solution. Pareto race (see Fig. 3), or any other software developed for multiple objective programming, enables a DM to search for nondominated solutions. Which solution he/she will choose as the final one depends entirely on his/her own preferences. Actually, all sample solutions in Table 2 except solution II are somehow consistent with his/her statement above. In solution II, product 3 is excluded from the production plan.
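The model can be checked against the sample solutions by evaluating each row of Table 1 at the decision variables listed in Table 2. This is a minimal sketch: the tolerance of 0.02 is an assumption meant to absorb the two-decimal rounding of the table entries, nonnegativity of P₁, P₂, P₃ is assumed (the model above does not list it), and only solutions I to III are used because one decision variable of solution IV is not legible.

```python
from typing import Dict, Sequence

# Table 1 coefficients; columns are Prod. 1, Prod. 2, Prod. 3.
COEF: Dict[str, Sequence[float]] = {
    "crit. mat. 1": (9.0, 19.5, 7.5),
    "crit. mat. 2": (7.0, 20.0, 9.0),
    "profit": (4.0, 5.0, 3.0),
    "mach. hours": (1.5, 1.0, 1.6),
    "man hours": (1.0, 2.0, 1.0),
}
LIMITS = {"mach. hours": 9.0, "man hours": 10.0}


def evaluate(p: Sequence[float]) -> Dict[str, float]:
    """Evaluate every row of Table 1 at the plan p = (P1, P2, P3)."""
    values = {name: sum(c * x for c, x in zip(row, p)) for name, row in COEF.items()}
    values["product 3"] = p[2]
    return values


def feasible(p: Sequence[float], tol: float = 0.02) -> bool:
    """Check the two constraints, allowing for the rounding of the table entries.

    Nonnegativity of P1, P2, P3 is assumed here; the model above does not list it.
    """
    v = evaluate(p)
    return all(x >= 0 for x in p) and all(v[k] <= lim + tol for k, lim in LIMITS.items())


# Decision variables of solutions I-III from Table 2.
plans = {"I": (3.88, 2.81, 0.23), "II": (4.0, 3.0, 0.0), "III": (3.45, 3.03, 0.50)}
for label, plan in plans.items():
    v = evaluate(plan)
    print(label, feasible(plan), {k: round(x, 2) for k, x in v.items()})
```

Solution II reproduces its objective and constraint values exactly (94.5, 88, 31, 9 and 10); for I and III the recomputed values differ from Table 2 only in the second decimal, as expected from rounded decision variables.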


Conclusion 


In this article, we have provided an overview of multiple objective programming support. The emphasis was on how to find the most preferred alternative from among a set of reasonable (nondominated) alternatives. This kind of approach is unique to multiple criteria decision making. We have left other features, such as structuring the problem and finding relevant criteria, outside the scope of this presentation. They are important, but they are also relevant to the design of any decision support system.
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Multiplicative programming refers to a class of opti- 
mization problems containing products of real-valued 
functions in the objective and/or in the constraints. 
A product of convex functions is called a convex mul- 
tiplicative function; similar definitions hold for con- 
cave and linear multiplicative functions. Multiplica- 
tive functions appear in various areas, including mi- 
croeconomics [4], VLSI chip design [10] and modu- 
lar design [2]. Especially in multiple objective decision 
making, they play important roles [3]. A typical ex- 
ample is a bond portfolio optimization studied in [7], 
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where a number of performance indices such as aver- 
age coupon rate, average terminal yields and average 
length of maturity associated with a portfolio (a bundle 
of assets) are to be optimized (either minimized or max- 
imized) subject to a number of constraints. One handy 
approach to simultaneously optimizing multiple objec- 
tives without a common scale is to optimize the geo- 
metric mean, or equivalently the product of these ob- 
jectives. Thus, we are led to consider a multiplicative 
programming problem. 

The simplest subclass of multiplicative programming problems is a linear multiplicative program, which is a quadratic program of minimizing a product of two affine functions $c_i^\top x + c_{i0}$, $i = 1, 2$, over a polytope $D \subset \mathbb{R}^n$:

$$\min\; f(x) = (c_1^\top x + c_{10})(c_2^\top x + c_{20}) \quad \text{s.t. } x \in D. \qquad (1)$$


This problem was first studied by K. Swarup [13] many years ago, but attracted little attention until the late 1980s, when intensive research was undertaken [8,12,14]. In general, the objective function $f$ is indefinite: it is quasiconcave on a region where the signs of $c_i^\top x + c_{i0}$, $i = 1, 2$, are the same, but quasiconvex on a region where the signs are different [1,8]. Therefore, to solve (1), we need to solve a quasiconcave minimization problem:

$$\min\; f(x) \quad \text{s.t. } x \in D \cap S, \qquad (2)$$

and a quasiconcave maximization problem:

$$\max\; f(x) \quad \text{s.t. } x \in D \cap S, \qquad (3)$$

where $S = \{x \in \mathbb{R}^n : c_i^\top x + c_{i0} \ge 0,\ i = 1, 2\}$. While (2)
belongs to multi-extremal global optimization [6] and is known to be NP-hard [11] (cf. also Complexity Classes in Optimization; Computational Complexity Theory), problem (3) can be solved using a standard convex minimization technique, because maximizing $f(x)$ amounts to minimizing the convex function $-\log(c_1^\top x + c_{10}) - \log(c_2^\top x + c_{20})$. For the same reason as (3), certain linear programs with additional linear multiplicative constraints, e.g., the modular design problem with constraints $x_j y_j \ge b_j$ [2], can be handled within the framework of convex programming if $x_j, y_j > 0$.
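As an illustration of the convex route for (3), the sketch below maximizes the product of two positive affine functions over a polytope by running the Frank-Wolfe (conditional gradient) method on the concave function $\log(c_1^\top x + c_{10}) + \log(c_2^\top x + c_{20})$. Frank-Wolfe is simply one standard technique chosen for this example; the text does not prescribe a particular method. The function name and the assumption that $D$ is given by its list of vertices are hypothetical, and both affine factors are assumed positive at every vertex, hence on all of $D$.

```python
import math


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def maximize_affine_product(c1, c10, c2, c20, vertices, iters=200, gap_tol=1e-9):
    """Frank-Wolfe sketch for max (c1.x + c10)(c2.x + c20) over D = conv(vertices).

    Works on phi(x) = log(c1.x + c10) + log(c2.x + c20), which is concave where
    both factors are positive; assumes they are positive at every vertex.
    """
    def phi(x):
        return math.log(_dot(c1, x) + c10) + math.log(_dot(c2, x) + c20)

    def grad(x):
        g1, g2 = _dot(c1, x) + c10, _dot(c2, x) + c20
        return [a / g1 + b / g2 for a, b in zip(c1, c2)]

    x = [float(v) for v in vertices[0]]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        g = grad(x)
        s = max(vertices, key=lambda v: _dot(g, v))  # linear oracle: best vertex
        d = [si - xi for si, xi in zip(s, x)]
        if _dot(g, d) <= gap_tol:                    # Frank-Wolfe gap ~ 0: stop
            break
        lo, hi = 0.0, 1.0                            # golden-section search on [0, 1]
        for _ in range(100):
            a = hi - ratio * (hi - lo)
            b = lo + ratio * (hi - lo)
            if phi([xi + a * di for xi, di in zip(x, d)]) < phi([xi + b * di for xi, di in zip(x, d)]):
                lo = a
            else:
                hi = b
        t = 0.5 * (lo + hi)
        x = [xi + t * di for xi, di in zip(x, d)]
    return x, math.exp(phi(x))


x_opt, value = maximize_affine_product((1.0, 0.0), 1.0, (0.0, 1.0), 1.0,
                                       [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)])
print(x_opt, value)
```

In the usage example $D = \mathrm{conv}\{(0,0), (2,0), (0,2)\}$ and $f(x) = (x_1 + 1)(x_2 + 1)$. The maximum, 4 at $(1, 1)$, lies in the middle of an edge rather than at a vertex, which is why a simple vertex enumeration would not suffice here.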

A generalization of (1) is a convex multiplicative program, which minimizes a product of several convex functions $f_i(x)$, $i = 1, \dots, p$, over a compact convex set $D \subset \mathbb{R}^n$:

$$\min\; f(x) = \prod_{i=1}^{p} f_i(x) \quad \text{s.t. } x \in D. \qquad (4)$$


In most existing solution methods for (4), the convex functions $f_i$ are assumed to be nonnegative-valued on $D$. When $f_i(x') = 0$ for some $i$ and some $x' \in D$, the minimum value of (4) is zero, and $x'$ is a globally optimal solution. We may therefore assume for each $i$ that $f_i(x) > 0$ for all $x \in D$. If $f$ is a concave multiplicative function instead of a convex one, the problem is equivalent to a concave minimization problem, because $\log f(x) = \sum_{i=1}^{p} \log f_i(x)$ is concave. The convex multiplicative program (4) itself can also be transformed into a concave minimization problem (cf. Concave Programming), even though $f$ is not a concave function. For example, introducing additional variables $y_i$, $i = 1, \dots, p$, we have an equivalent problem:


$$\min\; \sum_{i=1}^{p} \log y_i \quad \text{s.t. } x \in D,\quad f_i(x) \le y_i,\ i = 1, \dots, p,\quad y \ge 0. \qquad (5)$$
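The equivalence of (4) and (5) rests on a short argument, sketched here under the standing assumption $f_i(x) > 0$ on $D$. Because $\log$ is increasing, for fixed $x$ the best choice in (5) is $y_i = f_i(x)$, so minimizing over $y$ first gives

$$\min_{y:\ f_i(x) \le y_i} \sum_{i=1}^{p} \log y_i = \sum_{i=1}^{p} \log f_i(x) = \log \prod_{i=1}^{p} f_i(x).$$

Hence (4) and (5) have the same minimizers in $x$, with optimal values related by $\log$. The objective of (5) is concave in $(x, y)$, and its feasible set is convex because each $f_i$ is convex, so (5) is indeed a concave minimization problem.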


The number $p$ of functions $f_i$ is often very small in comparison with the dimension $n$ of $x$, e.g., five or so in applications to multiple objective optimization. Owing to this low-rank nonconvexity [9], problem (5) can be solved far more efficiently than a general concave minimization problem of the same size.

In addition to (1) and (4), a number of studies address problems with generalized convex multiplicative functions of the forms $f(x) = \prod_{i=1}^{p} f_i(x) + g(x)$ and $f(x) = \sum_{i=1}^{p} f_{2i-1}(x) f_{2i}(x) + g(x)$, where the $f_j$ and $g$ are convex functions. These are all nonconvex minimization problems, each of which may have an enormous number of local minima. Nevertheless, algorithms developed in the 1990s can locate a globally optimal solution in a reasonable amount of time by exploiting special structures of $f$ such as low-rank nonconvexity. A comprehensive review of these algorithms is given by H. Konno and T. Kuno in [5].
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Introduction 


In this contribution, we consider Multi-Quadratic Programming (MQP) problems, where the objective function is a quadratic function and the feasible region is defined by a finite set of quadratic and linear constraints. They can be formulated as follows:

$$\min\; x^\top Q x + c^\top x \quad \text{s.t. } x^\top A_j x + B_j x \le b_j,\ j = 1, \dots, m,\quad x \ge 0, \qquad (1)$$

where $A_j$ is an $(n \times n)$ matrix corresponding to the $j$th quadratic constraint, and $B_j$ is the $j$th row of the $(m \times n)$ matrix $B$.
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MQP plays an important role in modeling many 
diverse problems. The MQP encompasses many other 
optimization problems since it provides a much im- 
proved model compared to the simpler linear relaxation 
of a problem. Indeed, linear mixed 0-1, fractional, bi- 
linear, bilevel, generalized linear complementarity, and 
many more programming problems are or can easily be 
reformulated as special cases of MQP. However, there are theoretical and practical difficulties in solving such problems. While very large linear models can be solved efficiently, MQP problems are in general NP-hard and numerically intractable. Even the problem of finding a feasible solution is NP-hard, because MQP is a generalization of the linear complementarity problem [29]. The nonlinear constraints in MQP define a feasible region which is in general neither convex nor connected. Moreover, even if the feasible region is a polyhedron, optimizing the quadratic objective function is strongly NP-hard, since the disjoint bilinear programming problem arises as a special case. Therefore, finding a finite and exact algorithm that solves large MQP problems is impractical. Even for the convex case (when $Q$ and $A_j$ are positive semidefinite), there are very few algorithms for solving MQP problems. However, the
MQP constitutes an important part of mathematical 
programming problems, arising in various practical ap- 
plications including facility location, production plan- 
ning, VLSI chip design, optimal design of water distri- 
bution networks, and most problems in chemical engi- 
neering design. 

The MQP was first introduced in the seminal paper 
of Kuhn and Tucker [31]. Later on, the case of MQP 
with a single quadratic constraint in the problem was 
discussed in [55,56]. The first general approach for solv- 
ing MQP problems was proposed in [12], where the 
following two Lagrange functions for MQP are considered:

$$L_1(x, \mu) = x^\top Q x + c^\top x + \sum_{j=1}^{m} \mu_j \left(x^\top A_j x + B_j x - b_j\right),$$

$$L_2(x, \mu, \lambda) = L_1(x, \mu) - \lambda^\top x,$$

where $\mu$ and $\lambda$ are the multipliers for the quadratic and bound constraints, respectively. A cutting plane algorithm was applied to solve this problem; that is, the algorithm solves a sequence of linear master problems that minimize a piecewise linear function constructed from the Lagrange functions for constant $x$, and a primal problem with either an unconstrained quadratic function (using $L_2(x, \mu, \lambda)$) or a quadratic function over the nonnegative orthant (using $L_1(x, \mu)$) [21].


Multi-Quadratic Integer Program 


In this contribution we consider a multi-quadratic integer programming (MQIP) problem with binary (zero-one) variables. This problem is a special case of MQP. Recently, multi-quadratic zero-one programming problems were proved equivalent to mixed-integer programming problems [16]. In that work, a quadratic zero-one program was first proved equivalent to a mixed integer programming problem; the result was then extended to the multi-quadratic programming case.

Throughout this article, we consider a multi-quadratic zero-one programming problem of the following form:

$$P_1:\ \min f(x) = x^\top A x \quad \text{s.t. } Bx \ge b,\ x^\top C x \ge \alpha,\ x \in \{0,1\}^n,\ \alpha \text{ is a constant}.$$

Notice that $x^\top C x \ge \alpha$ essentially represents the same quadratic constraints as $x^\top A_j x + B_j x \le b_j$ in problem (1), due to the binary variables' property $x_i x_i = x_i$.
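The remark about $x_i x_i = x_i$ can be spelled out as a one-line derivation (a sketch for a single constraint $j$; $C_j$ and $\alpha_j$ follow the notation $x^\top C_j x \ge \alpha_j$ used for the summary of linearization schemes). For $x \in \{0,1\}^n$, every linear term can be moved onto the diagonal of a quadratic form:

$$B_j x = \sum_{i=1}^{n} B_{ji} x_i = \sum_{i=1}^{n} B_{ji} x_i^2 = x^\top \operatorname{diag}(B_j)\, x,$$

so that

$$x^\top A_j x + B_j x \le b_j \iff x^\top C_j x \ge \alpha_j, \qquad C_j = -\bigl(A_j + \operatorname{diag}(B_j)\bigr),\quad \alpha_j = -b_j.$$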


Applications 
Bilinear Problem 


Each n-dimensional MQP problem can easily be transformed into a 2n-dimensional bilinear problem, and a strategy for reducing the necessary dimension of the resulting bilinear program has also been proposed [7,28]. Conversely, bilinear optimization problems are simply a special instance of MQP. Pooling problems in petrochemistry, the modular design problem introduced in [17], in particular the multiple modular design problem [7,18] or the more general modularization of product sub-assemblies [46], and special classes of structured stochastic games [20] are only some examples of the wide range of applications of bilinear programming problems. Another large class of optimization problems consists of problems with linear or quadratic functions additionally involving Boolean variables (i.e., variables $x_i \in \mathbb{R}$ with the constraint $x_i \in \{0,1\}$, which can be written as the quadratic equation $x_i(1 - x_i) = 0$). Another widely explored problem is that of packing $n \in \mathbb{N}$ equal circles in a square, which can be transformed into an MQP problem: one looks for the maximum radius $r$ of $n$ non-overlapping circles contained in the unit square. This problem is equivalent to an MQP problem with a linear objective function and concave quadratic constraints.
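The claim that an n-dimensional MQP can be written as a 2n-dimensional bilinear problem can be illustrated with the standard copy-variable construction. This is a sketch only; the dimension-reduction strategies of [7,28] are not reproduced here. Introduce $y \in \mathbb{R}^n$ with $y = x$ in problem (1):

$$\min_{x, y}\; x^\top Q y + c^\top x \quad \text{s.t. } x^\top A_j y + B_j x \le b_j,\ j = 1, \dots, m,\quad x - y = 0,\quad x \ge 0.$$

Every quadratic term is now bilinear in $(x, y)$, and the added constraints $x - y = 0$ are linear. The result is a bilinear problem in $2n$ variables with the same optimal value as (1).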


Minimax Problem 


A related class of global optimization problems are 
minimax location problems [42], which also lead to 
quadratic constraints. Production planning and portfolio optimization are examples where so-called chance constrained linear programs occur. These problems look similar to linear programs, except that the matrix describing the linear constraints is stochastic rather than deterministic. Under certain restrictive assumptions it is possible to transform
these stochastic constraints to deterministic quadratic 
constraints [42], such that in general a problem of type 
MQP is obtained. In [8] it is shown that nonconvex 
MQP problems can be used for the examination of spe- 
cial instances of nonlinear bilevel programming prob- 
lems. Other applications of MQP include the fuel mix- 
ture problem encountered in the oil industry [43] and 
also placement and layout problems in integrated cir- 
cuit design [9,10]. 


Mixed Integer Problem 


As described in the previous section, an MQP problem can easily be linearized to a mixed integer zero-one problem with the same number of zero-one variables. In theory and practice, the linearization technique proposed in [16] has been shown to be superior to other conventional linearization techniques. In medical applications, multi-quadratic zero-one problems were used to model epileptic brains for electrode selection problems. Basically, multi-quadratic zero-one problems were solved to identify the location (electrode) sites of the brain that can detect seizure precursors
(predict seizures) [30,34,36]. In order to operate in 
real time, multi-quadratic zero-one problems were lin- 
earized to a mixed integer zero-one problem, which is 
much faster to solve in practice. 


Hence there are many applications of MQP. Whether MQP is applicable in practice, for example to problems arising from integer programming, depends on the numerical efficiency of the solution method used. Up to now, only a few methods for solving the general case of MQP considered here have been proposed in the literature, and most of them derive from methods developed for other, more general problem classes. In the next section we discuss some of these solution techniques.


Solution Techniques 


Many different techniques have been proposed for solving this type of problem; most of them are of branch-and-bound type or are based on some form of linearization [4,25,26,27,37,38,39,57]. A disadvantage of the standard linearization technique is that it introduces an additional variable for each product $x_i x_j$, so that the number of new variables is $O(n^2)$, where $n$ is the number of initial 0-1 variables [4,25,26,57]. The method proposed in [16] needs only $O(kn)$ additional continuous variables, where $k$ is the number of quadratic constraints, and the number of initial 0-1 variables remains the same. A branch-and-bound algorithm for solving MQP problems (and other more general problems) when the objective function is separable and the constraint set is linear was introduced in [19]. The method involves solving bounding convex envelope approximating problems over successive partitions of the feasible region. This method was later extended to deal with nonconvex constraints, but it generates a number of infeasible solutions and does not, in general, converge in a finite number of iterations [53]. An algorithm for the solution of linear problems with an additional reverse convex constraint was proposed in [15]. The algorithm involves partitioning the feasible region into subsets contained in cones originating at an infeasible vertex of the polytope formed by the linear constraints, while ensuring that an interior point of the feasible region is contained in each partition. Later on, an algorithm for the solution of problems with concave objective functions and separable quadratic constraints was proposed in [8]. The algorithm uses a piecewise linear approximation of the quadratic constraints and solves an MQP problem as a mixed 0-1 linear problem. This algorithm is similar to the solution approaches for concave quadratic problems [40] and for indefinite quadratic problems [35]. During the last decade, several authors have studied special cases of MQP, and many extensions of MQP have been discussed in the literature. The problem of minimizing an indefinite quadratic objective subject to two-sided indefinite quadratic constraints was discussed in [54]; under suitable assumptions, the authors derived necessary and sufficient optimality conditions and gave some conditions for the existence of solutions of this nonconvex program. While several methods have been suggested for solving MQP problems, numerical solutions of the general problem are still rarely available in the literature. Using a double duality argument, under suitable assumptions, the MQP is proved to be equivalent to a convex program [6]. In addition, a problem with a concave quadratic function is proved to be equivalent to a minimax convex problem, and thus can be solved in polynomial time via interior-point methods. This property no longer holds when $Q$ is indefinite [6].


Linear Forms of MQIP 


As mentioned above, MQP problems are closely related to mixed integer zero-one problems through linearization schemes, which have been explored for decades. Although the existing linearization schemes were originally developed for QP rather than MQP, they can easily be applied to MQP, since the quadratic constraints in MQP can be reformulated with the same technique used to linearize the quadratic objective. This section gives a brief view of the major linearization schemes and their application to MQP problems.

Whatever the specific reformulation, the underlying idea is the same: each quadratic product $x_i x_j$ is replaced by an additional variable. Existing linearization schemes were developed in four phases.

The prototype linearization technique arose in the 1960s and was proposed by Watters [57] and Zangwill [58] (see also Fortet [22,23]). This approach introduces additional binary variables $w_{ij}$ to replace the products $x_i x_j$, together with the additional constraints $x_i + x_j - w_{ij} \le 1$ and $x_i + x_j \ge 2 w_{ij}$, $\forall i, j$, which guarantee a correct replacement. With this approach, the MQP $P_1$ is transformed into the following form:

$$\begin{aligned}
\mathrm{MIP}_1:\ \min\ & f(x, w) = \sum_i a_{ii} x_i + \sum_i \sum_{j>i} (a_{ij} + a_{ji}) w_{ij}\\
\text{s.t. } & Bx \ge b,\\
& \sum_i c_{ii} x_i + \sum_i \sum_{j>i} (c_{ij} + c_{ji}) w_{ij} \ge \alpha,\\
& x_i + x_j - w_{ij} \le 1,\quad x_i + x_j \ge 2 w_{ij},\quad \forall i, j,\\
& x \in \{0,1\}^n,\quad w_{ij} \in \{0,1\},
\end{aligned}$$

where $\alpha$ is a constant and $A = (a_{ij})$, $C = (c_{ij})$.


In this formulation, the quadratic products $x_i x_j$ in the objective and constraints, $x^\top A x$ and $x^\top C x$, are both replaced by the additional binary variables $w_{ij}$, and the formulation is kept consistent with the original $P_1$ by the additional constraints $x_i + x_j - w_{ij} \le 1$ and $x_i + x_j \ge 2 w_{ij}$, $\forall i, j$. Following this seminal work, Glover and Woolsey [25] provided more concise zero-one linear programming formulations, in which reformulation rules are given under different conditions to reduce the numbers of additional constraints and additional variables.
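The role of the two prototype constraints can be checked by brute force: for every pair of binary values they admit exactly one binary $w_{ij}$, namely $x_i x_j$, so the linearized objective of $\mathrm{MIP}_1$ coincides with $x^\top A x$ on $\{0,1\}^n$. The sketch below enumerates this for a small hypothetical matrix $A$; the function names are illustrative only.

```python
from itertools import product


def allowed_w(xi, xj):
    """Binary w values satisfying x_i + x_j - w <= 1 and x_i + x_j >= 2w."""
    return [w for w in (0, 1) if xi + xj - w <= 1 and xi + xj >= 2 * w]


def quad_form(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def linearized_value(A, x):
    """sum_i a_ii x_i + sum_{i<j} (a_ij + a_ji) w_ij, with w_ij forced by the constraints."""
    n = len(x)
    total = sum(A[i][i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            (w,) = allowed_w(x[i], x[j])  # raises unless exactly one value is feasible
            total += (A[i][j] + A[j][i]) * w
    return total


A = [[2, -1, 3], [0, -4, 1], [5, 2, 1]]  # hypothetical, deliberately nonsymmetric
for x in product((0, 1), repeat=3):
    assert linearized_value(A, x) == quad_form(A, x)
```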

In the second phase of development of linearization techniques, researchers recognized that the additional binary variables $w_{ij}$ in $\mathrm{MIP}_1$ could be relaxed to continuous ones. Such linearization schemes include the models developed in Glover [24], Glover and Woolsey [26], and Rhys [45]. One scheme closely related to the linearization prototype was provided in Glover and Woolsey [26]; it introduces additional cut constraints $x_i \ge w_{ij}$ and $x_j \ge w_{ij}$, $\forall i, j$, which force the additional continuous variables $w_{ij}$ to take binary values. However, this technique doubles the number of additional constraints and thereby enlarges the size of the original MQP problem. A straightforward aggregation of the constraints $x_i \ge x_i x_j$, $\forall j$, led to a further improvement of this technique in [26], which uses the more concise constraints $(n - i) x_i \ge \sum_{j>i} w_{ij}$, $\forall i$, to force the additional variables to be binary, with somewhat fewer constraints. Applying this linearization technique, the MQP $P_1$ has the following representation:


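$$\begin{aligned}
\mathrm{MIP}_2:\ \min\ & f(x, w) = \sum_i a_{ii} x_i + \sum_i \sum_{j>i} (a_{ij} + a_{ji}) w_{ij}\\
\text{s.t. } & Bx \ge b,\\
& \sum_i c_{ii} x_i + \sum_i \sum_{j>i} (c_{ij} + c_{ji}) w_{ij} \ge \alpha,\\
& x_i + x_j - w_{ij} \le 1,\quad \forall i, j,\\
& (n - i) x_i \ge \sum_{j>i} w_{ij},\quad \forall i,\\
& x \in \{0,1\}^n,\quad w_{ij} \ge 0,
\end{aligned}$$

where $\alpha$ is a constant and $A = (a_{ij})$, $C = (c_{ij})$.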


The main difference between $\mathrm{MIP}_1$ and $\mathrm{MIP}_2$ is the continuity of $w_{ij}$ and the smaller number of additional constraints.

Beyond the linearization technique in [26], Glover took note of Petersen's work [41], in which the cross products in the model are handled through their upper and lower bounds. Following this idea, Glover [24] proposed a linearization technique that introduces a different kind of continuous variable $w_i$. In his scheme, the additional continuous variables are defined by $w_i = x_i \sum_j a_{ij} x_j$, where the $x_i$ are the binary variables of the original model and the $a_{ij}$ are the quadratic coefficients of the objective function. Further, define lower and upper bounds of $\sum_j a_{ij} x_j$ by $A_i^- = \sum_j \{a_{ij} \mid a_{ij} < 0\}$ and $A_i^+ = \sum_j \{a_{ij} \mid a_{ij} > 0\}$, respectively. Taking the cross products and binary variables into consideration, the additional inequalities $A_i^+ x_i \ge w_i \ge A_i^- x_i$ and $\sum_j a_{ij} x_j - A_i^-(1 - x_i) \ge w_i \ge \sum_j a_{ij} x_j - A_i^+(1 - x_i)$ ensure equivalence with the original QP model. Applying this linearization technique, the MQP $P_1$ takes a different structure:


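$$\begin{aligned}
\mathrm{MIP}_3:\ \min\ & f(w) = \sum_i w_i\\
\text{s.t. } & Bx \ge b,\\
& A_i^+ x_i \ge w_i \ge A_i^- x_i,\quad \forall i,\\
& \sum_j a_{ij} x_j - A_i^-(1 - x_i) \ge w_i \ge \sum_j a_{ij} x_j - A_i^+(1 - x_i),\quad \forall i,\\
& \sum_i v_i \ge \alpha,\\
& C_i^+ x_i \ge v_i \ge C_i^- x_i,\quad \forall i,\\
& \sum_j c_{ij} x_j - C_i^-(1 - x_i) \ge v_i \ge \sum_j c_{ij} x_j - C_i^+(1 - x_i),\quad \forall i,\\
& x \in \{0,1\}^n,
\end{aligned}$$

where $\alpha$ is a constant.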


Notice that in $\mathrm{MIP}_3$ the quadratic constraint $x^\top C x \ge \alpha$ is replaced by the series of inequalities $\sum_i v_i \ge \alpha$, $C_i^+ x_i \ge v_i \ge C_i^- x_i$ and $\sum_j c_{ij} x_j - C_i^-(1 - x_i) \ge v_i \ge \sum_j c_{ij} x_j - C_i^+(1 - x_i)$, $\forall i$, which follow the definitions in [24]: $v_i = x_i \sum_j c_{ij} x_j$, $C_i^- = \sum_j \{c_{ij} \mid c_{ij} < 0\}$ and $C_i^+ = \sum_j \{c_{ij} \mid c_{ij} > 0\}$. Compared with $\mathrm{MIP}_1$ and $\mathrm{MIP}_2$, the most important improvement of this linearization technique is that the numbers of additional variables and constraints are reduced from $O(n^2)$ to $O(n)$. Some recent papers [1,2] proposed further improved linearization techniques based on the strategy of Glover's technique, either providing more concise formulations or generating tighter upper and lower bounds.
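Glover's four inequalities can be checked the same way as the prototype constraints. For binary $x$ they pin $w_i$ to the single value $x_i \sum_j a_{ij} x_j$. If $x_i = 1$, the second pair collapses to $w_i = \sum_j a_{ij} x_j$. If $x_i = 0$, the first pair forces $w_i = 0$. Consequently $\sum_i w_i = x^\top A x$ with only $n$ extra variables. The sketch below computes the interval allowed for $w_i$ by enumeration over a small hypothetical matrix; the function name is illustrative only.

```python
from itertools import product


def glover_interval(A, x, i):
    """Interval [lo, hi] for w_i implied by Glover's inequalities at binary x."""
    row = A[i]
    a_minus = sum(a for a in row if a < 0)    # A_i^-
    a_plus = sum(a for a in row if a > 0)     # A_i^+
    s = sum(a * xj for a, xj in zip(row, x))  # sum_j a_ij x_j
    lo = max(a_minus * x[i], s - a_plus * (1 - x[i]))
    hi = min(a_plus * x[i], s - a_minus * (1 - x[i]))
    return lo, hi


A = [[2, -1, 3], [0, -4, 1], [5, 2, 1]]  # hypothetical data
for x in product((0, 1), repeat=3):
    for i in range(3):
        lo, hi = glover_interval(A, x, i)
        target = x[i] * sum(a * xj for a, xj in zip(A[i], x))
        assert lo == hi == target  # w_i is forced to x_i * sum_j a_ij x_j
```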

In the third phase, linearization techniques were developed with the tightness of the reformulation, rather than its size, as the primary concern. One typical technique in this category is the well-known Sherali-Adams Reformulation-Linearization Technique (RLT for short) [3,4,5,50,51,52], which has a wide range of applications. The development milestones of RLT can be found in a recent memorial paper written by Sherali [47]; interested readers can follow this paper for the development details of the linearization scheme.

Practical applications of linearization techniques in the early 1980s (e.g., [13], for the notoriously difficult quadratic assignment problem) showed that linearization techniques can be computationally inefficient even when the resulting problem is small. This experience led some researchers to seek LP structures with tighter bounds, which offer better computational efficiency, rather than a smaller problem size. The linearization technique in [3] provides a structure with tighter bounds for zero-one QP. The transformation not only replaces the cross products $x_i x_j$ in the model but also reconstructs the constraints to obtain tightness. The example given in [3] not only includes the additional constraints and continuous variables, but also reconstructs the linear constraints by multiplying them by $x_j$ and $1 - x_j$, respectively. Applying this linearization technique to MQP, $P_1$ is transformed as follows:


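$$\begin{aligned}
\mathrm{MIP}_4:\ \min\ & f(x, w) = \sum_i a_{ii} x_i + \sum_i \sum_{j>i} (a_{ij} + a_{ji}) w_{ij}\\
\text{s.t. } & \sum_i B_{ki} x_i - b_k \ge \sum_{i<j} B_{ki} w_{ij} + \sum_{i>j} B_{ki} w_{ji} + (B_{kj} - b_k) x_j \ge 0,\quad \forall j, k,\\
& \sum_i c_{ii} x_i + \sum_i \sum_{j>i} (c_{ij} + c_{ji}) w_{ij} \ge \alpha,\\
& x_i + x_j - w_{ij} \le 1,\quad x_i \ge w_{ij},\quad x_j \ge w_{ij},\quad \forall i, j,\\
& x \in \{0,1\}^n,\quad w_{ij} \ge 0,
\end{aligned}$$

where $\alpha$ is a constant and $A = (a_{ij})$, $C = (c_{ij})$, $B = (B_{ki})$, $b = (b_k)$.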


Notice that the linear constraints $Bx \ge b$ are reconstructed by multiplying them by $x_j$ and $1 - x_j$, which yields much more complicated but tighter representations. The authors also provided a rigorous proof that this construction is tighter than the linearization provided by Glover [24]. In addition, this formulation uses the inequalities $x_i \ge w_{ij}$ and $x_j \ge w_{ij}$ instead of $(n - i) x_i \ge \sum_{j>i} w_{ij}$, since the latter would weaken the model's tightness, as pointed out by the authors.
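The reconstruction step can be written out for a single row $k$ of $Bx \ge b$. This is a sketch of the standard RLT product step, using $x_j^2 = x_j$ for binary $x_j$. Multiplying $\sum_i B_{ki} x_i - b_k \ge 0$ by $x_j \ge 0$ and by $1 - x_j \ge 0$ gives

$$x_j\Bigl(\sum_i B_{ki} x_i - b_k\Bigr) = \sum_{i<j} B_{ki} x_i x_j + \sum_{i>j} B_{ki} x_i x_j + (B_{kj} - b_k)\, x_j \ge 0,$$

$$(1 - x_j)\Bigl(\sum_i B_{ki} x_i - b_k\Bigr) = \Bigl(\sum_i B_{ki} x_i - b_k\Bigr) - x_j\Bigl(\sum_i B_{ki} x_i - b_k\Bigr) \ge 0.$$

Replacing $x_i x_j$ by $w_{ij}$ for $i < j$ and by $w_{ji}$ for $i > j$ then produces the two-sided linear constraints used in $\mathrm{MIP}_4$:

$$0 \le \sum_{i<j} B_{ki} w_{ij} + \sum_{i>j} B_{ki} w_{ji} + (B_{kj} - b_k)\, x_j \le \sum_i B_{ki} x_i - b_k,\quad \forall j, k.$$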

Using this idea of multiplying the constraints defining the feasible set by $x_j$ and $1 - x_j$, Adams and Sherali [4] provided a linearization strategy for more general MIPs with cross products between continuous and binary variables. Comparisons were also provided between the RLT strategy and the linearization techniques of Watters [57], Zangwill [58], Petersen [41], Glover and Woolsey [25,26], and Glover [24]. Along this direction, Sherali and Adams [49,50,51] generated a hierarchy of relaxations for zero-one polynomial problems. This relaxation strategy generalizes the idea in [3] by introducing a selected set of degree-$d$ polynomial terms or factors, where $d$ is an integer not exceeding the number $n$ of binary variables. As the authors showed, multiplying the feasible set by degree-$d$ polynomial terms yields an equivalent reformulation for each $d = 1, \dots, n$, which enforces the binary restrictions on the original $x$ variables. These papers also proved that, when $d = n$, the resulting linear system characterizes the convex hull of feasible solutions and is therefore at least as tight as any other linearization technique.

The most recent development, the final phase of linearization techniques, was proposed by Chaovalitwongse et al. [16]. The authors took the dual variables into account and proposed a new linearization technique based on the KKT optimality conditions. Their approach was originally developed for MQP, and it can readily be applied to zero-one QP problems. The transformation of the MQP $P_1$ using this linearization strategy can be shown as follows.


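$$\begin{aligned}
\mathrm{MIP}_5:\ \min\ & g(s, x) = e^\top s - M e^\top x\\
\text{s.t. } & Ax - y - s + Me = 0,\quad Bx \ge b,\\
& y \le 2M(e - x),\\
& Cx - z + M'e \ge 0,\\
& e^\top z - M' e^\top x \ge \alpha,\quad z \le 2M' x,\\
& x \in \{0,1\}^n,\quad y_i, s_i, z_i \ge 0,
\end{aligned}$$

where $M' = \|C\|_\infty$, $M = \|A\|_\infty$, and $e$ is the vector of all ones.

Note that the additional variables $s$, $y$ and $z$ are introduced from the Lagrangian function of MQP.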

Theorem 1 $P_1$ has an optimal solution $x^0$ if and only if there exist $y^0$, $s^0$, $z^0$ such that $(x^0, y^0, s^0, z^0)$ is an optimal solution of $\mathrm{MIP}_5$.


Proof 1 See [16].


To summarize the linearization schemes presented here, we provide a table aggregating the numbers of additional variables and constraints for these techniques as a brief comparison. Assume that an MQP has $k$ linear constraints $B_j x \ge b_j$, $j = 1, \dots, k$, and $m$ quadratic constraints $x^\top C_j x \ge \alpha_j$, $j = 1, \dots, m$, and that the number of binary variables satisfies $n > k$ and $n > m$. Then the numbers of additional variables and constraints of the linearized forms obtained with the different techniques are as follows:


[Table: numbers of additional constraints, additional variables, total constraints and total variables for the models P_1, MIP_1, MIP_2, MIP_3, MIP_4 and MIP_5]

Note that problem size is not the only determinant of computational efficiency. As pointed out in [47], the tightness of a linearization scheme may significantly affect its effectiveness for MQP.

In terms of solution methods, there are many studies in the literature dealing with the MQP. Most of them apply semidefinite programming to solve the problem. Specifically, these approaches include special branch-and-bound methods [9,10,32,44], branch-and-cut [11], lift-and-project [11], and state-of-the-art interior point methods [14,33]. Some of them have been implemented in commercial software packages, e.g., the solvers BARON and CPLEX in GAMS.
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Introduction 


Many problems in optimization involve multiple length 
and time scales. Perhaps the most commonly stud- 
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ied problems in this area are configurational molecu- 
lar modeling problems such as phase transitions in wax 
formation, the structure of Lennard-Jones (LJ) clusters, 
and protein folding. This class of optimization problems is generally characterized by the presence of a very large number of stationary points (i.e., minima, first-order saddles, second-order saddles, etc.) that give the appearance of roughness at the small length scale and quite different geometric structure at the large length scale. A good example of the disparity between length scales is described in Onuchic et al. [28], who illustrate the geometry of the protein free energy landscape, showing many stationary points (roughness or frustration) at the small length scale and a funnel-shaped geometry at the large length scale. This is the multi-scale description we adopt in this article. It is often acknowledged that finding all stationary points on these multi-scale objective function surfaces is all but impossible [8] for many problems of practical interest, and in most cases it is unnecessary. What is of primary interest from a computational chemistry perspective is finding the relatively few important stationary points that describe important physical phenomena, without finding everything else. These important stationary points include global minima, strong local minima and important transition states that describe rate-limiting behavior.

There are many deterministic and/or stochastic 
methods that can be used to solve multi-scale global 
optimization problems. See, for example, [1,2,3,4,5,6,7, 
10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27, 
29,30,31,32,33,34,35,36]. These methodologies run the gamut from branch and bound methods, to homotopy-continuation and interval methods, to simulated annealing and genetic algorithms, to terrain and funneling methods, to specialized techniques.

In this chapter, a deterministic terrain/funneling 
approach for multi-scale global optimization is de- 
scribed. This approach considers two distinct length 
scales and assumes that the geometry of the larger 
length scale is funnel shaped. Terrain methods are used 
to explore the objective function landscape at small 
length scales and gather point-wise and average gradi- 
ent and curvature information while funneling meth- 
ods are used to make large-scale, monotonically de- 
creasing moves at the large length scale and ‘funnel’ it- 
erates to the global minimizer. 


Formulation 


The formulation of the problem under consideration is straightforward. It is to find the

$$\operatorname{global\,min}\; f(z) \quad \text{subject to } z \le c(z) \qquad (1)$$

where $f = f(z)$ is a $C^2$ objective function defined on $\mathbb{R}^n$ subject to bounds on the variables, $c(z)$, and where $z$ are the optimization variables. It is assumed that $f$ has two distinct length scales: a small length scale of considerable roughness and a large length scale where $f$ has non-quadratic behavior. For the discussions that follow, it is convenient to denote the gradient of $f$ by $g = g(z)$ and the Hessian matrix of $f$ by $h = h(z)$.

Generally, formulations based on second-order 
Taylor series expansion are adequate to describe be- 
havior at the small length scale, and methods based on 
quadratic approximations of f are well known. How- 
ever, since it is assumed that the behavior of the ob- 
jective function, f, is non-quadratic at the large length 
scale, funneling methods are used to build approxima- 
tions to the large-scale geometry of f using the funnel 
function given by

$$F(z) = F_0 - \Gamma \exp[-q(z)] \qquad (2)$$

where $q(z) = \tfrac{1}{2} z^\top A z + b^\top z + c$, and where $\Gamma > 0$, $F_0$ and $c$ are scalar parameters, $b$ is an $n$-dimensional vector, and $A$ is an $n \times n$ symmetric matrix. The functional
form of Eq. (2) is interesting because it is non-convex, 
has a unique global minimum when A is positive defi- 
nite, and contains certain inherent self-scaling charac- 
teristics. Figure 1 gives an illustration of the funnel ge- 
ometries that can be represented by the functionality of 
Eq. (2). 

The funnels at the top and at the bottom left of Fig. 1 each have a unique minimum because A is positive definite, while the one at the bottom right has two minima since A is indefinite.
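The unique-minimum property can be seen directly from Eq. (2). Because $\Gamma > 0$, $F = F_0 - \Gamma e^{-q}$ is a strictly increasing function of $q$, so $F$ and $q$ share their minimizers. Moreover, $\nabla F = \Gamma e^{-q(z)} (Az + b)$ vanishes only where $Az + b = 0$. For positive definite $A$ the global minimizer is therefore $z^* = -A^{-1} b$. Far from $z^*$, $q \to \infty$ and $F$ flattens out toward $F_0$, which is one reason the funnel is non-convex. The 2-D sketch below illustrates this with hypothetical parameter values; the function names are illustrative and not taken from any terrain/funneling code.

```python
import math


def funnel(z, A, b, c, F0, Gamma):
    """F(z) = F0 - Gamma * exp(-q(z)) with q(z) = 0.5 z^T A z + b^T z + c (2-D)."""
    Az = (A[0][0] * z[0] + A[0][1] * z[1], A[1][0] * z[0] + A[1][1] * z[1])
    q = 0.5 * (z[0] * Az[0] + z[1] * Az[1]) + b[0] * z[0] + b[1] * z[1] + c
    return F0 - Gamma * math.exp(-q)


def funnel_center(A, b):
    """Stationary point of q (and of F): solve A z = -b by Cramer's rule (2x2, nonsingular A)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((-b[0] * A[1][1] + A[0][1] * b[1]) / det,
            (-A[0][0] * b[1] + A[1][0] * b[0]) / det)


A = ((2.0, 0.5), (0.5, 1.0))   # symmetric positive definite (hypothetical data)
b, c, F0, Gamma = (1.0, -1.0), 0.0, 1.0, 2.0
z_star = funnel_center(A, b)
print(z_star, funnel(z_star, A, b, c, F0, Gamma), funnel((100.0, 100.0), A, b, c, F0, Gamma))
```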


Methods 


In this sub-section, the global terrain method, a fun- 
neling algorithm, and a multi-scale global optimization 
method are described. 
Multi-Scale Global Optimization Using Terrain/Funneling Methods, Figure 1
Various Funnel Geometries (panels: funnel cloud, weak funnel, fold, overpass)


A Global Terrain Method for Optimization 
at the Small Length Scale 


Terrain methods [18,19,20] have been described in detail
in a separate chapter in this encyclopedia and are briefly
summarized here for continuity of presentation. Terrain
methods are used to locate sets of stationary and singular
points of the objective function. They do this by following
valleys and moving up and down the landscapes of $g^{T}g$
and f. Key among the equations used in terrain methods is
the characterization of valleys as solutions, V, to a
sequence of general nonlinearly constrained optimization
problems


$$ V = \{ \min\ g^{T} h^{T} h\, g \ \text{ such that } \ g^{T} g = L, \ \text{for all } L \in \mathcal{A} \} \qquad (3) $$


where g and h are defined as before, where L is any given
value (or level) of the least-squares objective function
$g^{T}g$, and where $\mathcal{A}$ is some collection of contours.
Terrain methods require

1) Reliable downhill equation solving
2) Reliable and efficient computation of singular points
3) Efficient uphill movement consisting of predictor-
corrector calculations
4) Reliable and efficient eigenvalue-eigenvector
computations
5) Effective bookkeeping
6) A termination criterion to decide when the
computations have finished.

In this chapter, a terrain methodology is used to find 
sets of stationary and singular points and to determine 
average gradient and average curvature (or Hessian ma- 
trix) information along a given terrain path. Average 
gradient and curvature information is calculated from 
the mean value theorem using the following equations. 


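$$ \langle g \rangle = \frac{1}{a} \int_{0}^{a} g[z(\alpha)]\, d\alpha \qquad (4) $$

$$ \langle h \rangle = \frac{1}{a} \int_{0}^{a} h[z(\alpha)]\, d\alpha \qquad (5) $$

where a is some relevant length of the smooth terrain
path z(α) connecting any set of stationary and singular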
points. It is important for the reader to understand that 
it is the set of stationary and singular points as well as 
average gradient and Hessian matrix information that 
are communicated from the small length scale to the 
large length scale. This will be discussed again a little 
later in this chapter. 
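Equations (4) and (5) are ordinary path averages, so in practice they reduce to a quadrature over the points a terrain calculation produces. The sketch below is a minimal illustration under stated assumptions: the path is available as a callable z(α) on [0, a], the integrand returns a list of floats (a gradient, or a Hessian flattened row by row), and the composite trapezoidal rule stands in for whatever quadrature the terrain code actually uses. The names `path_average`, `path` and `num_intervals` are hypothetical.

```python
def path_average(func, path, a, num_intervals=1000):
    """Approximate (1/a) * integral_0^a func(path(alpha)) d(alpha), as in Eqs. (4)-(5).

    `func` returns a list of floats (a gradient, or a Hessian flattened row by row);
    `path` maps the path parameter alpha in [0, a] to a point z(alpha).
    """
    step = a / num_intervals
    acc = None
    for i in range(num_intervals + 1):
        # Composite trapezoidal rule: weight 1/2 at the two end points, 1 in between.
        weight = 0.5 if i in (0, num_intervals) else 1.0
        values = func(path(i * step))
        if acc is None:
            acc = [0.0] * len(values)
        for j, v in enumerate(values):
            acc[j] += weight * v
    # Multiply by the step to get the integral, then divide by the path length a.
    return [step * s / a for s in acc]
```

For f(z) = z₁² + 3z₂ along the straight segment from (0, 0) to (3, 4), so that a = 5, the averaged gradient is (3, 3); its component along the path, 4.2, equals (f(end) - f(start))/a = 21/5, as the mean value theorem requires.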
A Funneling Method for Optimization
at the Large Length Scale 


To build iterative global funnel approximations of the 
objective function we match function, gradient, and 
second derivative information of the true objective 
function, f, g and h, with the function, gradient, and 
Hessian matrix information, F, G and H respectively, of 
the funnel function at various points, where G = G(z) 
and H = H(z) are given by 


$$ G(z) = \Gamma \exp[-q(z)]\,[Az + b] \qquad (6) $$


$$ H(z) = \Gamma \exp[-q(z)]\,[A - (Az + b)(Az + b)^{T}] \qquad (7) $$


Note that if f(z) is used in place of F(z) in Eq. (2), then 
it follows that 


$$ \gamma(z) = F_0 - f(z) = \Gamma \exp[-q(z)] \qquad (8) $$


where γ > 0 is a scaling factor that depends on a single
numerical measurement, f(z), and the scalar parameter,
$F_0$. Moreover, replacing G(z) with g(z) and H(z) with
h(z) in Eqs. (6) and (7), respectively, gives the
equations


$$ Az + b = g(z)/\gamma \qquad (9) $$


$$ A = [\gamma h + g g^{T}]/\gamma^{2} \qquad (10) $$


Equations (8), (9), and (10) show that it is possible to
estimate A and b from values of f(z), g(z), and h(z)
using interpolation formulae at two or more iterates.
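The step from Eqs. (6) to (8) to Eqs. (9) and (10) is short but easy to miss. A worked version, using only the definitions above, divides out the common factor γ(z) = Γ exp[-q(z)]:

```latex
\begin{aligned}
g(z) &= \Gamma e^{-q(z)}\,(Az+b) = \gamma\,(Az+b)
  &&\Longrightarrow\quad Az+b = g/\gamma, \\
h(z) &= \gamma\,\bigl[A - (Az+b)(Az+b)^{T}\bigr]
      = \gamma A - \frac{g\,g^{T}}{\gamma}
  &&\Longrightarrow\quad A = \frac{\gamma h + g\,g^{T}}{\gamma^{2}}.
\end{aligned}
```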


Interpolating Formulae Let $z_k$ be any value of the
unknown optimization variables with corresponding
objective function, gradient and Hessian matrix values
$f_k$, $g_k$ and $h_k$, respectively. Also let $z_{k+1}$ be some
other arbitrary, but not necessarily nearby or successive,
iterate with corresponding function, gradient and Hessian
matrix values $f_{k+1}$, $g_{k+1}$ and $h_{k+1}$. Writing Eq. (8)
for $z_k$ and $z_{k+1}$ and then subtracting the latter from
the former eliminates $F_0$ and gives


$$ \gamma_{k+1} - \gamma_k = f_k - f_{k+1} \qquad (11) $$


Repeating the same algebra using Eq. (10) yields 


$$ \gamma_k^{2}\,[\gamma_{k+1} h_{k+1} + g_{k+1} g_{k+1}^{T}] - \gamma_{k+1}^{2}\,[\gamma_k h_k + g_k g_k^{T}] = 0 \qquad (12) $$


Equation (12) forms a set of n(n + 1)/2 nonlinear
equations in the two unknowns $\gamma_k$ and $\gamma_{k+1}$ when
the symmetry of the associated matrices is taken into
account. This together with Eq. (11) gives a total of
[1 + n(n + 1)/2] nonlinear equations. For n = 1, there
are two equations and two unknowns. When n > 1 there
are more equations than unknown variables. However,
irrespective of this, two equations for which $\gamma_k$ and
$\gamma_{k+1} > 0$ can be determined using the Routh
criterion.


Estimating Funnel Parameters Calculated values of
$\gamma_k$ and $\gamma_{k+1}$ can be used to determine the matrix A
from Eq. (10), using gradient and Hessian matrix
information at either $z_k$ or $z_{k+1}$. Following this, the
parameter b can be computed by simply rearranging
Eq. (9) to give


$$ b = g(z)/\gamma - Az \qquad (13) $$


while $F_0$ can be calculated from $F_0 = f(z) + \gamma$. Like A,
the values of b and $F_0$ can be determined using function
and gradient values at either $z_k$ or $z_{k+1}$.


Finding the Funnel Minimum It is then straightforward
to estimate the unique global minimizer of the funnel
approximation, say y, by simply solving


$$ Ay = -b \qquad (14) $$


Note that Eq. (10) shows that the matrix A is generated
from a rank-one, positive semi-definite correction to h(z),
that the sign definiteness of A can be controlled by the
parameter γ, and that γ also appears to play the role of
a self-scaling factor.
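Once a scaling factor is known, Eqs. (10), (13), (14) and $F_0 = f(z) + \gamma$ are direct formulas. The following is a minimal sketch, not the author's code: it takes f, g and h at one point z together with an already computed γ (for instance from step 2a of the algorithm below), and it uses plain Gaussian elimination with partial pivoting for Eq. (14). The names `estimate_funnel` and `solve_linear` are hypothetical, and a singular A simply raises `ZeroDivisionError`.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        # Bring the row with the largest pivot magnitude into position.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def estimate_funnel(z, f, g, h, gamma):
    """Funnel parameters from f, g, h at one point z, given the scaling factor gamma."""
    n = len(z)
    # Eq. (10): A = (gamma * h + g g^T) / gamma^2
    A = [[(gamma * h[i][j] + g[i] * g[j]) / gamma ** 2 for j in range(n)] for i in range(n)]
    # Eq. (13): b = g / gamma - A z
    b = [g[i] / gamma - sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
    # Eq. (8) rearranged: F0 = f + gamma
    F0 = f + gamma
    # Eq. (14): the funnel minimizer y solves A y = -b
    y = solve_linear(A, [-bi for bi in b])
    return A, b, F0, y
```

On an exact two-dimensional funnel built from known A, b, Γ and $F_0$, feeding in f, g and h at any single point together with γ = $F_0$ - f recovers A, b and $F_0$ to rounding error, which makes a convenient unit test for an implementation.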


Communication Between Length Scales One of the 
keys to success in any multi-scale global optimization 
methodology is the communication between length 
scales. In the terrain/funneling approach to multi-scale 
global optimization, small-scale calculations communi- 
cate average gradient and Hessian matrix information 
at two distinct points to the large length scale. The large 
length scale optimizations, on the other hand, com- 
municate an estimate of the values of the optimization 
variables at a converged minimum of the funnel ap- 
proximation to the small length scale and identify the 
next region on the objective function surface on which 
small-scale optimizations should be conducted. 
A Multi-Scale Global Optimization Method


The details of a multi-scale global optimization
methodology based on the terrain and funneling
methods are as follows.

1 Perform two sets of small-scale optimization
calculations using the terrain methodology starting from
two different points on the objective function surface.
Calculate average gradient and average Hessian
information along the resulting terrain paths. Thus at
the kth funnel iteration the following information is
available: $z_k$, $f_k$, $g_k$ and $h_k$, and $z_{k+1}$, $f_{k+1}$,
$g_{k+1}$ and $h_{k+1}$, such that $f_{k+1} < f_k$.

2 Conduct iterative large-scale optimization calculations
with the funneling methodology, initialized using the
objective function, average gradient and average Hessian
information from the small-scale optimization
calculations, to find a funnel minimum that also
corresponds to a stationary point on the true objective
function surface. To do this,

a) Solve Eqs. (11) and (12) for $\gamma_k$ and $\gamma_{k+1}$.

b) Using $\gamma_{k+1}$, calculate A and b from Eqs. (10)
and (13), respectively.

c) Determine an estimate of the funnel minimum, y,
from Eq. (14).

d) Evaluate f(y), g(y) and h(y).

e) Test f(y) against $f_{k+1}$. If $f(y) < f_{k+1}$, then go
to step 2f for the next funnel iteration. Else set
$\gamma_{k+1} = \gamma_{k+1}/2$ and return to step 2a.

f) Set $z_{k+1} = y$, $f_{k+1} = f(y)$, $g_{k+1} = g(y)$, and
$h_{k+1} = h(y)$.

g) If $\|g(z_{k+1})\| < \varepsilon$, set $y^{*} = y$ and go to step 3;
else go to step 2a.

3 Conduct a new set of small-scale terrain calculations
using the funnel minimum from step 2. Calculate
average gradient and average Hessian information
along the resulting terrain path such that new values
of $z_{k+1}$, $f_{k+1}$, $g_{k+1}$ and $h_{k+1}$ satisfy the condition
$f_{k+1} < f_k$.

4 Repeat step 2 using the new small-scale information
and $z_k$, $f_k$, $g_k$ and $h_k$ from step 1.

5 Repeat steps 2 and 3 until there is no further decrease
in the objective function.

Here we describe step 2 of the multi-scale algorithm in
more detail. The most effective way to determine the
scaling factors in step 2a is to rearrange Eq. (11) for
$\gamma_{k+1}$ in terms of $\gamma_k$ and then substitute the resulting
expression into Eq. (12). This gives a cubic polynomial
equation in $\gamma_k$ and shows that there are three possible
values of $\gamma_k$ and thus three possible sets of scaling
factors $(\gamma_k, \gamma_{k+1})$. Using an equation solver like
Newton's method, it is easy to find one solution for $\gamma_k$.
The other two values of $\gamma_k$ can be determined by
deflating the cubic equation to a quadratic equation and
using the quadratic formula. The correct value of $\gamma_k$ is
the smallest real-valued $\gamma_k > 0$ such that $\gamma_{k+1} > \gamma_k$,
where $\gamma_{k+1} = f_k - f_{k+1} + \gamma_k$. Step 2b is
straightforward, and step 2c requires the solution of a
system of linear equations. Step 2d evaluates the actual
function, gradient, and Hessian matrix at the funnel
iterate y. Step 2e ensures monotonically decreasing
objective function values by halving $\gamma_{k+1}$ until
$f(y) < f_{k+1}$, while step 2f replaces the information
associated with $z_{k+1}$ with that for the funnel iterate y.
Finally, step 2g checks the norm of the gradient of the
objective function and terminates the funnel iterations
once that norm falls below the specified tolerance. Note
that any point, $y^{*}$, that satisfies the convergence
condition in step 2g is simultaneously a stationary point
of f(z) and a minimizer of the funnel function F(z).
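For n = 1, step 2a can be sketched as follows. This minimal sketch, with hypothetical function names, substitutes Eq. (11) into Eq. (12), solves the resulting cubic in $\gamma_k$, and keeps the smallest positive root. For robustness it swaps a bisection search, bracketed by the Cauchy bound on the root magnitudes, in for the Newton solver mentioned above, then deflates to a quadratic as the text describes. For n > 1 one would first have to choose which entries of Eq. (12) to use, which this sketch does not attempt.

```python
import math


def _quadratic_real_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 (also handles the degenerate linear case)."""
    if a == 0.0:
        return [] if b == 0.0 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    sq = math.sqrt(disc)
    return [(-b + sq) / (2.0 * a), (-b - sq) / (2.0 * a)]


def _cubic_real_roots(c3, c2, c1, c0):
    """Real roots of c3*x^3 + c2*x^2 + c1*x + c0: one by bisection, the rest by deflation."""
    if c3 == 0.0:
        return _quadratic_real_roots(c2, c1, c0)
    p = lambda x: ((c3 * x + c2) * x + c1) * x + c0
    r = 1.0 + max(abs(c2), abs(c1), abs(c0)) / abs(c3)  # Cauchy bound on |root|
    lo, hi = -r, r
    if p(lo) > 0.0:  # orient the bracket so that p(lo) <= 0 < p(hi)
        lo, hi = hi, lo
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    x1 = 0.5 * (lo + hi)
    # Synthetic division by (x - x1) leaves the quadratic q2*x^2 + q1*x + q0.
    q2 = c3
    q1 = c2 + x1 * q2
    q0 = c1 + x1 * q1
    return [x1] + _quadratic_real_roots(q2, q1, q0)


def scaling_factors_1d(f0, g0, h0, f1, g1, h1):
    """Solve Eqs. (11)-(12) for (gamma_k, gamma_{k+1}) when n = 1."""
    delta = f0 - f1  # Eq. (11): gamma_{k+1} = gamma_k + delta
    if delta <= 0.0:
        raise ValueError("the algorithm assumes f_{k+1} < f_k")
    # Substituting Eq. (11) into Eq. (12) gives a cubic in x = gamma_k.
    c3 = h1 - h0
    c2 = delta * h1 + g1 ** 2 - g0 ** 2 - 2.0 * delta * h0
    c1 = -2.0 * delta * g0 ** 2 - delta ** 2 * h0
    c0 = -(delta ** 2) * g0 ** 2
    positive = [x for x in _cubic_real_roots(c3, c2, c1, c0) if x > 0.0]
    if not positive:
        raise ValueError("no positive scaling factor found")
    gamma_k = min(positive)  # gamma_{k+1} > gamma_k holds because delta > 0
    return gamma_k, gamma_k + delta
```

As a check, on the exact funnel F(z) = 1 - 2 exp[-(z² - z)] sampled at $z_k$ = 0 and $z_{k+1}$ = 0.3, the smallest positive root is $\gamma_k$ = 2 = $F_0$ - f(0), and Eq. (10) then returns A = 2.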

The proposed multi-scale optimization algorithm
is very robust. This is because, if the funneling
algorithm gets trapped at a local minimum, the
methodology returns to the small-scale terrain
calculations to get average gradient and average Hessian
information around that local minimum. Replacing
point-wise zero-valued gradient information at a local
minimum with averaged non-zero-valued gradient
information forces the optimizer to look for a minimizer
that is deeper in the funnel. By forcing movement further
down the funnel in this way, the multi-scale algorithm
will continue to improve the value of the objective
function in a monotonic fashion until it reaches the
global minimum at the bottom of the funnel.


Application 


In this sub-section, a simple illustration of the
multi-scale global optimization methodology is given
using the classical thirteen-particle Lennard-Jones
(LJ$_{13}$) problem. This example was selected because it
is the first in the series of Lennard-Jones clusters that
truly has a single funnel based on the Mackay icosahedron
structure, with the global minimum lying at the bottom of
the funnel [8]. The LJ$_{13}$ cluster has 33 unknown
Cartesian coordinates, 1509 minima, 116,835 first-order
and second-order transition states, and many higher
order transition states [9].


Small Scale Terrain Optimization 


Small-scale optimization calculations were performed
using the terrain methodology from two different
starting points. From the first starting point, which
corresponds to an energy of E = -38.8572, six (6)
stationary and singular points along a terrain path were
calculated, requiring 4832 function and gradient
evaluations and 616.68 s of computer time. Average
gradient and Hessian information was accumulated
along the terrain path using the path integrals given by
Eqs. (4) and (5). From a second and quite different
starting point on the objective function surface, another
set of small-scale optimization calculations was
performed using the terrain methodology. The energy at
this second starting point was E = -38.4246, and five (5)
stationary and singular points were calculated, requiring
1415 function and gradient evaluations and 214.25 s of
computer time. Again, average gradient and Hessian
information was accumulated along the terrain path
using the path integrals of Eqs. (4) and (5).


Large Scale Funneling Optimization 


To build an approximation of the large-scale geometry, 
two singular points - one from each of the two sets of 
terrain calculations - were selected. Using the function 
values at these singular points as well as average gra- 
dient and average Hessian information, funnel param- 
eters were determined and a first estimate of the fun- 
nel minimum was calculated. We then replaced one of 
the singular points with this estimate of the funnel min- 
imum and repeated the funnel optimization calcula- 
tions until || g|| < ¢ = 1x 10~*. Figure 2 summarizes 
these results, which are shown by the curve of dark or
black diamonds. The resulting structure for LJ$_{13}$ is also
shown in Fig. 2. Twenty (20) funneling iterations were
required for convergence to the global minimum at
E = -44.3268 on the LJ$_{13}$ energy surface.


Robustness 


Terrain/funnel calculations were performed using
many other arbitrary starting points on the potential
energy surface for the LJ$_{13}$ cluster to illustrate the
robustness of this multi-scale optimization method.
Results for some of these calculations are also shown in
Fig. 2 by the two light gray curves with diamonds. Note
that these multi-scale optimization calculations using
the terrain/funneling approach also converge easily and
monotonically to the global minimum, in 32 and 27
funnel iterations, respectively, from these starting
points. In all cases, the funneling portion of the overall
multi-scale algorithm finds the global minimum in less
than 0.35 s.

Multi-Scale Global Optimization Using Terrain/Funneling Methods, Figure 2
Multi-Scale Terrain/Funneling Calculations for the LJ$_{13}$ Cluster


Reliability - Avoiding Traps at Local Minima 


To show that the proposed optimization approach does
not get trapped at local minima, terrain/funneling
calculations were repeated from a different set of initial
terrain calculations. In particular, small-scale terrain
calculations starting from points on the energy surface
corresponding to energy values of $E_1$ = -39.1597 and
$E_2$ = -38.4246 were performed, and average gradient
and average Hessian information was gathered along the
resulting terrain paths. Using this information, the
funneling algorithm converged to a local minimum on the
potential energy surface at $E_3$ = -41.4445 in 18 funnel
iterations. The results of these calculations correspond
to the gray curve that terminates at E = -41.4445 in
Fig. 2. This local minimum was then used as a starting
point to perform a third set of small-scale terrain
calculations and to gather average gradient and average
Hessian information in the valley around the local
minimum. Using this average information around the
local minimum, $E_3$, together with average information
around $E_1$, a second set of iterative funneling
calculations was performed. In this case, the funneling
calculations located the global minimum at E = -44.3268
on the energy surface in 16 funnel iterations. This
second set of funnel iterations is shown by the red curve
in Fig. 2.

Those readers interested in the numerical details
of the LJ$_{13}$ illustration are encouraged to contact the
author, who is quite willing to provide all computer-
generated numerical results for this example.
Many problems in finance, economics and other
applications require that decisions $x_t \in \mathbb{R}^{n_t}$ are made
periodically over time, depending on observations of
uncertain data $(\eta_t, \xi_t)$ in future periods t = 1, ..., T.
Here, a distinction is made between random data
$\eta_t \in \Theta_t \subset \mathbb{R}^{K_t}$ that influence prices in the objective
function and random data $\xi_t \in \Xi_t \subset \mathbb{R}^{L_t}$ that affect the
demand on the right-hand side of the constraints in an
optimization problem. Once an observation $(\eta_t, \xi_t)$
becomes available, the decision maker has to determine
a policy $x_t$ that minimizes the costs $\rho_t(x^{t-1}, x_t, \eta^t)$
in t plus the expected costs in the subsequent periods
t + 1, ..., T, subject to a set of constraints
$f_t(x^{t-1}, x_t) \le h_t(\xi^t)$. Both the objective function and
the constraints may depend on the sequences of
observations $\eta^t = (\eta_1, \dots, \eta_t)$, $\xi^t = (\xi_1, \dots, \xi_t)$ up to t
and on earlier decisions $x^{t-1} = (x_0, \dots, x_{t-1})$.
Obviously, an action $x_t$ must be selected after $(\eta_t, \xi_t)$
is observed but before the future outcomes
$\eta_{t+1}, \dots, \eta_T$ and $\xi_{t+1}, \dots, \xi_T$ are known, i.e. the
decision is based only on information available at time t.
Hence, one obtains a sequence of decisions
$x_0, x_1(\eta^1, \xi^1), \dots, x_T(\eta^T, \xi^T)$, a property called
nonanticipativity. This results in a multistage stochastic
program, which may be written in its dynamic
representation as a series of nested two-stage programs
(with $\phi_{T+1}(\cdot) := 0$, see [4]):


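$$ \phi_t(x^{t-1}, \eta^t, \xi^t) = \min_{x_t} \Big\{ \rho_t(x^{t-1}, x_t, \eta^t) + \int \phi_{t+1}(x^t, \eta^{t+1}, \xi^{t+1})\, dP_{t+1} \Big\}, \quad t = 0, \dots, T, \qquad (1) $$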


where the expectation is taken w.r.t. the probability
measure $P_{t+1}(\eta_{t+1}, \xi_{t+1} \mid \eta^t, \xi^t)$ of the joint
distribution of $(\eta_{t+1}, \xi_{t+1})$, subject to

$$ f_t(x^{t-1}, x_t) \le h_t(\xi^t), \quad x_t \ge 0. \qquad (2) $$


In case of discrete distributions, it is well known 
that one can immediately transform the stochastic mul- 
tistage program given by (1) and (2) into a (large) de- 
terministic equivalent problem which can be solved by 
standard optimization tools, possibly combined with 
decomposition techniques to exploit the special struc- 
ture of the problem (see e.g. [1,9,10]). However, if the 
distribution is continuous with some density function, 
it is in general impossible to do the integration in (1) 
exactly. One way to overcome this difficulty is to
approximate the (continuous) probability measure $P_t$ by
a discrete one $Q_t$. In MSP, this is usually done by
constructing a scenario tree which can be illustrated as
follows:

Together with the associated scenario probabilities, 
this tree is defined formally as 


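$$ \mathcal{A} = \{ (\eta^T, \xi^T) : (\eta_t, \xi_t) \in \mathcal{A}_t(\eta^{t-1}, \xi^{t-1}),\ t = 1, \dots, T \}, $$

$$ q(\eta^T, \xi^T) = \prod_{t=1}^{T} q(\eta_t, \xi_t \mid \eta^{t-1}, \xi^{t-1}) \qquad (3) $$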
The scenario tree represents an approximation of
the discrete-time process $(\eta_t, \xi_t;\ t = 1, \dots, T)$, and
$\mathcal{A}_t(\cdot)$ denotes the set of finitely many outcomes for
$(\eta_t, \xi_t)$ conditioned on the history $(\eta^{t-1}, \xi^{t-1})$.
Again, this results in a sparse large-scale program.
Naturally, the question arises how accurate the
associated (deterministic) optimization problem is, and
whether the set of scenarios can be improved w.r.t. the
accuracy.

For convex optimization problems where the random
data are decomposable into two groups, one that
determines the cost function and a second one affecting
the demand, it can be shown (see [4] for details) that
the value function (1) is a saddle function for all
t = 1, ..., T under the following conditions:

i) $\rho_t(\cdot)$ is concave in $\eta_t$,

ii) the left-hand sides of the constraints are
deterministic, and

iii) the distribution function of $P_t(\cdot \mid \eta^{t-1}, \xi^{t-1})$
depends linearly on the past.

Then, (1) is concave in $\eta_t$ and convex in $(x_t, \xi_t)$. The
situation where assumptions i)-iii) are fulfilled is called
the entire convex case.

This underlying saddle property of the value function
motivates the application of barycentric approximation,
which derives two scenario trees $\mathcal{A}^{u}$ and $\mathcal{A}^{l}$. The
associated approximate deterministic programs provide
upper and lower bounds to the original problem.
In this sense, barycentric approximation is a
generalization of the inequalities due to H.P. Edmundson
[2] and A. Madansky [8] (see e.g. Stochastic Programs
with Recourse: Upper Bounds) and J.L. Jensen [6] that
is applicable to saddle functions of correlated random
data. Here, it is assumed that $\Theta_t \subset \mathbb{R}^{K_t}$ and $\Xi_t \subset \mathbb{R}^{L_t}$
are regular simplices whose vertices are denoted by
$u_{\nu_t}$, $\nu_t = 0, \dots, K_t$, and $v_{\mu_t}$, $\mu_t = 0, \dots, L_t$. Both
$\Theta_t$ and $\Xi_t$ may depend on prior observations
$(\eta^{t-1}, \xi^{t-1})$, although this is not stressed in the
notation for simplicity.


To illustrate the way the discretization is performed,
assume that a two-stage problem is given (the time
index is omitted here) with a deterministic objective, i.e.
only the right-hand side coefficients h(ξ) are random
(see e.g. [7]). For any $\xi \in \Xi$, the barycentric weights
$\tau_0(\xi), \dots, \tau_L(\xi)$ w.r.t. the simplex Ξ are given by


$$ \tau_0 + \dots + \tau_L = 1, \qquad \tau_0 v_0 + \dots + \tau_L v_L = \xi \qquad (4) $$


Since φ(x, ξ) is convex in ξ for all x, φ(ξ) := φ(x, ξ)
is a convex function for any fixed first-stage decision
x. Due to convexity, φ(ξ) is bounded from above for
all $\xi \in \Xi$ by the linear function
$\Psi(\xi) = \sum_{\mu=0}^{L} \tau_\mu(\xi)\, \phi(v_\mu)$. To construct the
'classical' Edmundson-Madansky upper bound (EM) for
$\int \phi(\xi)\, dP$ over the simplex Ξ, ξ is replaced by a
discrete random variable with the same expectation,
attaining values $v_0, \dots, v_L$. To obtain the corresponding
probabilities, ξ has to be replaced by $\bar{\xi} = \int \xi\, dP$ in
(4), and the system must be solved for $\tau_0, \dots, \tau_L$.
Then $\int \phi(\xi)\, dP \le \sum_{\mu=0}^{L} \tau_\mu(\bar{\xi})\, \phi(v_\mu)$, and the
weights may be interpreted as the probabilities of the
discrete outcomes.

On the other hand, a lower bound can be found
using Jensen's inequality, $\phi(\bar{\xi}) \le \int \phi(\xi)\, dP$, i.e. by
evaluating the function at the expectation of ξ; the
tangent ψ(ξ) to φ(ξ) at $\bar{\xi}$ is a lower bound to the
original function. Both linear approximations ψ(ξ) and
Ψ(ξ) to the convex value function for a given policy are
shown in Fig. 2.
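For a single random right-hand side (L = 1) supported on an interval [v₀, v₁], both bounds need only the mean of ξ. The sketch below is a minimal illustration with hypothetical function names: it solves Eq. (4) for the barycentric weights of the mean, weights the vertex values by them for the EM upper bound, and evaluates φ at the mean for the Jensen lower bound. It assumes φ is convex and that the mean is known exactly.

```python
def barycentric_weights_1d(xi, v0, v1):
    """Solve Eq. (4) for L = 1: tau0 + tau1 = 1 and tau0*v0 + tau1*v1 = xi."""
    tau1 = (xi - v0) / (v1 - v0)
    return 1.0 - tau1, tau1


def jensen_em_bounds(phi, mean, v0, v1):
    """Lower (Jensen) and upper (Edmundson-Madansky) bounds on E[phi(xi)]
    for convex phi and xi supported on [v0, v1] with E[xi] = mean."""
    lower = phi(mean)  # Jensen: evaluate phi at the expectation
    tau0, tau1 = barycentric_weights_1d(mean, v0, v1)
    upper = tau0 * phi(v0) + tau1 * phi(v1)  # EM: vertices weighted by tau(mean)
    return lower, upper
```

For ξ uniform on [0, 1] and φ(ξ) = ξ², the true expectation 1/3 lies between the Jensen value 0.25 and the EM value 0.5, and the EM weights (0.5, 0.5) are the probabilities assigned to the vertices 0 and 1.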

From a computational viewpoint, the original function
φ(ξ) is replaced by two affine functions. Clearly, ψ(ξ)
and Ψ(ξ) can be integrated easily over the support of ξ.
If there is randomness only in the objective, with
deterministic right-hand sides, a lower and an upper
bound can be constructed by applying the same
procedure to the dual concave (maximization) problem,
deriving an upper bound from Jensen's inequality and a
lower approximation with the EM rule.

Barycentric approximation combines these con- 
cepts for stochastic objective and right-hand sides [3] 
and extends them to the multistage case [4,5]. It derives 
distinguished points, so-called generalized barycenters, 
where the value function (1) must be supported by two 
bilinear functions to minimize the error induced by the 
approximation. This is shown in Fig. 3 for $K_t = L_t = 1$,
where the minorant is supported at $\xi_0$ and $\xi_1$ and the
majorant at $\eta_0$ and $\eta_1$.

Let $\lambda_{\nu_t}(\eta_t)$, $\nu_t = 0, \dots, K_t$, and $\tau_{\mu_t}(\xi_t)$,
$\mu_t = 0, \dots, L_t$, be the barycentric weights w.r.t. $\Theta_t$
and $\Xi_t$, defined analogously to (4). For both simplices,
the generalized barycenters and their probabilities are
given by


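$$ \eta_{\mu_t} = \frac{1}{q(\eta_{\mu_t})} \sum_{\nu_t=0}^{K_t} u_{\nu_t} \int \lambda_{\nu_t}(\eta_t)\, \tau_{\mu_t}(\xi_t)\, dP_t, \qquad q(\eta_{\mu_t}) = \int \tau_{\mu_t}(\xi_t)\, dP_t, \quad \mu_t = 0, \dots, L_t, $$

$$ \xi_{\nu_t} = \frac{1}{q(\xi_{\nu_t})} \sum_{\mu_t=0}^{L_t} v_{\mu_t} \int \lambda_{\nu_t}(\eta_t)\, \tau_{\mu_t}(\xi_t)\, dP_t, \qquad q(\xi_{\nu_t}) = \int \lambda_{\nu_t}(\eta_t)\, dP_t, \quad \nu_t = 0, \dots, K_t. $$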


Note that the integrand $\lambda_{\nu_t}(\eta_t) \cdot \tau_{\mu_t}(\xi_t)$ is a bilinear
function in $(\eta_t, \xi_t)$ since the barycentric weights $\lambda_{\nu_t}$
and $\tau_{\mu_t}$ are affine in their arguments. Obviously, a
bilinear function is easy to integrate, which was the
intention of the approximation.

The generalized barycenters $\xi_{\nu_t}$, $\nu_t = 0, \dots, K_t$, are
supporting points of the minorant. They are combined
with the vertices $u_{\nu_t}$ and weighted with the
corresponding probabilities $q(\xi_{\nu_t})$ to obtain discrete
outcomes for the lower approximation of the original
measure $P_t$. This way, one derives a discrete probability
measure $Q_t^{l}$ with support

$$ \operatorname{supp} Q_t^{l} = \{ (u_{\nu_t}, \xi_{\nu_t}) : \nu_t = 0, \dots, K_t \}. $$


Analogously, $\eta_{\mu_t}$, $\mu_t = 0, \dots, L_t$, are supporting
points for the majorant with assigned probabilities
$q(\eta_{\mu_t})$. This induces a discrete measure $Q_t^{u}$ for the
upper approximation with

$$ \operatorname{supp} Q_t^{u} = \{ (\eta_{\mu_t}, v_{\mu_t}) : \mu_t = 0, \dots, L_t \}. $$


Both measures represent the solutions of two
corresponding moment problems. The advantageous
feature from a computational viewpoint is that the
generalized barycenters and their probabilities are
completely determined by the first moments of $\eta_t$ and
$\xi_t$ and by the bilinear cross moments $E(\eta_{t,\nu_t} \cdot \xi_{t,\mu_t})$,
$\nu_t = 0, \dots, K_t$, $\mu_t = 0, \dots, L_t$. Note that the
covariance of two random variables is derived from the
first moments and the corresponding cross moments.
Therefore, the measures $Q_t^{u}$ and $Q_t^{l}$ implicitly
incorporate a correlation between $\eta_t$ and $\xi_t$. However,
cross moments (or covariances, respectively) between
different elements of $\eta_t$ are not taken into account (the
same holds for the components of $\xi_t$). Hence, the
formulae given above are applicable without the
assumption of independent random variables.
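The claim that only first moments and cross moments are needed can be made concrete for $K_t = L_t = 1$, where Θ_t = [u₀, u₁] and Ξ_t = [v₀, v₁] are intervals. The sketch below is a minimal illustration of the barycenter formulas above, not a reference implementation: because each product λ_ν τ_μ is bilinear, its expectation is assembled from E[η], E[ξ] and E[ηξ] alone. The function name and argument layout are hypothetical, and all probabilities q are assumed to be positive.

```python
def generalized_barycenters(u0, u1, v0, v1, m_eta, m_xi, m_cross):
    """K_t = L_t = 1: lower measure Q^l and upper measure Q^u as lists of
    ((eta, xi), probability), built from E[eta], E[xi] and E[eta*xi] only."""
    du, dv = u1 - u0, v1 - v0
    # Affine barycentric weights as (constant, slope): lambda_nu(eta) = c + s*eta.
    lam = [(u1 / du, -1.0 / du), (-u0 / du, 1.0 / du)]
    tau = [(v1 / dv, -1.0 / dv), (-v0 / dv, 1.0 / dv)]
    # M[nu][mu] = E[lambda_nu(eta) * tau_mu(xi)]; bilinear, so moments suffice.
    M = [[lc * tc + lc * ts * m_xi + ls * tc * m_eta + ls * ts * m_cross
          for (tc, ts) in tau] for (lc, ls) in lam]
    u, v = (u0, u1), (v0, v1)
    lower, upper = [], []
    for nu in range(2):
        q = M[nu][0] + M[nu][1]  # q(xi_nu) = E[lambda_nu(eta)]
        xi_nu = sum(v[mu] * M[nu][mu] for mu in range(2)) / q
        lower.append(((u[nu], xi_nu), q))  # vertex in eta, barycenter in xi
    for mu in range(2):
        q = M[0][mu] + M[1][mu]  # q(eta_mu) = E[tau_mu(xi)]
        eta_mu = sum(u[nu] * M[nu][mu] for nu in range(2)) / q
        upper.append(((eta_mu, v[mu]), q))  # barycenter in eta, vertex in xi
    return lower, upper
```

With η and ξ independent and uniform on [0, 1], each measure puts probability 1/2 on two points, and for the saddle function φ(η, ξ) = -η² + ξ² (concave in η, convex in ξ) the two measures give -0.25 and 0.25, bracketing the true expectation 0.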
Applying the approximation scheme dynamically
over time, one obtains two barycentric scenario trees
$\mathcal{A}^{l}$ and $\mathcal{A}^{u}$ with their path probabilities of type (3).
The set of outcomes at stage t = 1, ..., T is given by
$\mathcal{A}_t^{l}(\eta^{t-1}, \xi^{t-1}) = \operatorname{supp} Q_t^{l}(\cdot \mid \eta^{t-1}, \xi^{t-1})$ and
$\mathcal{A}_t^{u}(\eta^{t-1}, \xi^{t-1}) = \operatorname{supp} Q_t^{u}(\cdot \mid \eta^{t-1}, \xi^{t-1})$.
Substituting $P_t$ in (1) by the discrete measures $Q_t^{l}$ and
$Q_t^{u}$ yields two value functions


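$$ \psi_t(x^{t-1}, \eta^t, \xi^t) = \min_{x_t} \Big\{ \rho_t(x^{t-1}, x_t, \eta^t) + \int \psi_{t+1}(x^t, \eta^{t+1}, \xi^{t+1})\, dQ^{l}_{t+1} \Big\}, $$

$$ \Psi_t(x^{t-1}, \eta^t, \xi^t) = \min_{x_t} \Big\{ \rho_t(x^{t-1}, x_t, \eta^t) + \int \Psi_{t+1}(x^t, \eta^{t+1}, \xi^{t+1})\, dQ^{u}_{t+1} \Big\} $$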


for t = 0, ..., T with $\psi_{T+1}(\cdot) = \Psi_{T+1}(\cdot) := 0$. According
to [4], these are lower and upper bounds to the original
value function, i.e.

$$ \psi_t(x^{t-1}, \eta^t, \xi^t) \le \phi_t(x^{t-1}, \eta^t, \xi^t) \le \Psi_t(x^{t-1}, \eta^t, \xi^t). $$


In the entire convex case, the accuracy of the
approximation is quantifiable by the difference between
the upper and lower bounds. If required, the
approximation can be improved by partitioning the
simplices $\Theta_t$ and $\Xi_t$. If the subsimplices become
arbitrarily small, the extremal measures converge to $P_t$,
and the convergence of the upper and lower bounds to
the expectation of the value function is guaranteed (see
[5] for details).
Introduction


The nested partitions (NP) method is a powerful opti- 
mization method that has been found to be very effec- 
tive for solving large-scale discrete optimization prob- 
lems. Such problems are common in many practical 
applications and the NP method is hence useful in 
diverse application areas. It can be applied to both op- 
erational and planning problems and has been demon- 
strated to effectively solve complex problems in both 
manufacturing and service industries. 

The NP method was first introduced in [4] and ap- 
plication examples include diverse areas such as opti- 
mization of beam orientation in radiation therapy [1], 
feature selection in data mining [3], and product de- 
sign [5]. 


Formulation 


The NP method is particularly well suited for complex 
large-scale discrete optimization problems where tradi- 
tional methods experience difficulty. It is, however, very 
broadly applicable and can be used to solve any opti- 
mization problem that can be stated mathematically in 
the following generic form: 


$$ \min_{x \in X} f(x) \qquad (1) $$


where the solution space or feasible region X is either 
a discrete or a bounded set of feasible solutions. 

An important special type of problem that can be
effectively addressed using the NP method is the mixed
integer program (MIP) [6]. For such problems there may
be one set of discrete variables and one set of con- 
tinuous variables and the objective function and con- 
straints are both linear. A general MIP can be stated as 
follows: 


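$$ z_{\mathrm{MIP}} = \min_{x, y \in X} c^{1} x + c^{2} y \qquad (2) $$

where $X = \{x \in \mathbb{Z}_{+}^{n}, y \in \mathbb{R}^{n} : A^{1} x + A^{2} y \le b\}$, and
we use $z_{\mathrm{MIP}}$ to denote any linear objective function, that
is, $z_{\mathrm{MIP}} = f(x) = cx$. While some large-scale MIPs can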
be solved efficiently using exact mathematical program- 
ming methods, complex applications often give rise to 
MIPs where exact solutions can only be found for rel- 
atively small problems. When dealing with such com- 
plex large-scale problems the NP method provides an 
attractive alternative. However, even in such cases it 
may be possible to take advantage of exact mathemati- 
cal programming methods by incorporating them into 
the NP framework. The NP method therefore provides 
a framework for combining the complementary benefits
of two optimization approaches that have traditionally
been studied separately, namely, mathematical
programming and metaheuristics.

Another important class of problems are combina- 
torial optimization problems (COP) where the feasible 
region is finite but its size typically grows exponentially 
with the number of input parameters of the problem. 
A general COP can be stated as follows: 


$$ \min_{x \in X} f(x), \qquad (3) $$


where $|X| < \infty$, but the objective function $f: X \to \mathbb{R}$
may be a complex nonlinear function. Sometimes it
may have no analytic expression and must be evaluated 
through a model, such as a simulation model, a data 
mining model, or other application-dependent models. 
One advantage of the NP method is that it is effective 
for optimization when f is known analytically (deter- 
ministic optimization), when it is noisy (stochastic op- 
timization), or even when it must be evaluated using an 
external process. 


Methods 


The NP method is best viewed as a metaheuristic frame- 
work, and it has similarities to branching methods in 
that like branch-and-bound it creates partitions of the 
feasible region. However, it also has some unique fea- 
tures that make it well suited for very hard large-scale 
optimization problems. 

Metaheuristics have emerged as the most widely 
used approach for solving difficult large-scale combina- 
torial optimization problems [2]. A metaheuristic pro- 
vides a framework to guide application-specific heuris- 
tics, such as a greedy local search, by restricting which 
solution or set of solutions should or can be visited next. 
For example, the tabu search metaheuristic disallows 
certain moves that might otherwise be appealing by 
making the reverse of recent moves tabu or forbidden. 
At the same time it always forces the search to take the 
best nontabu move, which enables the search to escape 
local optima. Similar to tabu search, most metaheuris- 
tics guide the search from solution to solution or pos- 
sibly from a set of solutions to another set of solutions. 
In contrast, the NP method guides the search by deter- 
mining where to concentrate the search effort. Any op- 
timization method, such as an application-specific lo- 


cal search, other general purpose heuristic, or a math- 
ematical programming method, can then be integrated 
within this framework. 

The development of metaheuristics and other 
heuristic search methods has been made largely in iso- 
lation from the recent advancements in the use of math- 
ematical programming methods for solving large-scale 
discrete problems. It is a very important and novel char- 
acteristic of the NP method that it provides a natu- 
ral metaheuristic framework for combining the use of 
heuristics and mathematical programming and taking 
advantage of their complementary nature. Indeed, as 
far as we know, the NP method is the first systematic 
search method that enables users to simultaneously re- 
alize the full benefits of incorporating lower bounds 
through various mathematical programming methods 
and using any domain knowledge or heuristic search 
method for generating good feasible solutions. It is this 
flexibility that makes the NP method so effective for 
practical problems. 

To concentrate the search effort, the NP method 
employs a decomposition approach similar to that 
of branch-and-bound. Specifically, in each step the
method partitions the space X of feasible solutions into 
the most promising region and the complementary re- 
gion, namely, the set of solutions not contained in the 
most promising region. The most promising region is 
then partitioned further into subregions. The partition- 
ing can be done exactly as branching for a branch-and- 
bound algorithm, but instead of focusing on obtain- 
ing lower bounds and comparing those bounds with 
a single primal feasible solution, the NP method
focuses on generating primal feasible solutions from each
of the subregions and the complementary region. This
results in an upper bound on the performance of each 
of these regions. The region with the best feasible so- 
lution is judged the most promising and the search 
is focused accordingly. A best upper bound does not 
guarantee that the corresponding subset contains the 
optimal solution, but since the NP method also finds 
primal feasible solutions for the complementary region, it is able to recover from incorrect moves. Specifically, if the best solution is found in one of the subregions, this becomes the new most promising region, whereas if it is in the complementary region, the NP
method backtracks. This focus on generating primal 
feasible solutions and the global perspective it achieves 
through backtracking are distinguishing features of the
NP method that set it apart from similar branching 
methods. 
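The move rule just described (partition the most promising region, sample each subregion and the complementary region, then advance or backtrack) can be sketched for binary decision vectors. This is a minimal sketch of a pure NP method with uniform random sampling under our own assumptions: a region is identified by a prefix of fixed coordinates, variables are partitioned in index order, and backtracking returns to the parent region. The function `nested_partitions` and its parameters are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]


def nested_partitions(
    f: Callable[[Point], float],
    n: int,
    iterations: int = 200,
    samples: int = 5,
    seed: int = 0,
) -> Tuple[Point, float]:
    """Minimize f over {0,1}^n with a pure NP method (uniform sampling).

    A region is a prefix of fixed coordinates: it holds every x whose
    first len(prefix) entries equal the prefix ([] is the whole space X).
    """
    rng = random.Random(seed)
    prefix: List[int] = []  # current most promising region
    best_x: Optional[Point] = None
    best_val = float("inf")

    def sample_inside(pref: Sequence[int]) -> Point:
        # Keep the fixed prefix, draw the free coordinates uniformly.
        free = tuple(rng.randint(0, 1) for _ in range(n - len(pref)))
        return tuple(pref) + free

    def sample_outside(pref: Sequence[int]) -> Point:
        # Rejection sampling from the complementary region X \ region(pref).
        while True:
            x = tuple(rng.randint(0, 1) for _ in range(n))
            if list(x[: len(pref)]) != list(pref):
                return x

    def best_of(xs: List[Point]) -> Tuple[float, Point]:
        return min((f(x), x) for x in xs)

    for _ in range(iterations):
        # Split the most promising region on its next free coordinate
        # (a single point cannot be split further).
        subregions = [prefix + [0], prefix + [1]] if len(prefix) < n else [prefix]
        candidates: List[Tuple[Tuple[float, Point], Optional[List[int]]]] = [
            (best_of([sample_inside(r) for _ in range(samples)]), r) for r in subregions
        ]
        if prefix:  # the complementary region is empty when prefix == []
            comp = best_of([sample_outside(prefix) for _ in range(samples)])
            candidates.append((comp, None))

        (val, x), region = min(candidates, key=lambda c: c[0][0])
        if val < best_val:
            best_val, best_x = val, x
        # Advance into the best subregion, or backtrack to the parent region.
        prefix = prefix[:-1] if region is None else region

    assert best_x is not None
    return best_x, best_val


target = (1, 0, 1, 1, 0, 0, 1, 0)
x, v = nested_partitions(lambda x: sum(a != b for a, b in zip(x, target)), n=8)
print(x, v)
```

Swapping `sample_inside` and `sample_outside` for heuristic or knowledge-based samplers is where the hybrid and knowledge-based variants discussed in the following paragraphs would plug in.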

Unlike exact optimization methods such as branch-and-bound, the NP method does not guarantee that the
correct region is selected in each move of the algorithm. 
Incorrect moves can be corrected through backtrack- 
ing, but for the method to be both effective and efficient, 
the correct move must be made frequently. How this is 
accomplished depends on how the feasible solutions are 
generated. 

In what we refer to as the pure NP method, feasible solutions are generated using simple uniform random sampling. To increase the probability of making the correct move, the number of samples should be increased. Purely uniform random sampling is rarely
efficient, however, and the strength of the NP method is 
that it can incorporate application-specific methods for 
generating feasible solutions. In particular, for practical 
applications domain knowledge can often be utilized 
to very effectively generate good feasible solutions. We 
call such implementations knowledge-based NP meth- 
ods. We will also see examples of what we refer to as 
hybrid NP methods, where feasible solutions are generated using either general heuristic methods such as greedy local search, genetic algorithms, or tabu search,
or mathematical programming methods. If done effec- 
tively, incorporating such methods into the NP frame- 
work makes it more likely that the correct move is made 
and hence makes the NP method more efficient. In- 
deed, such hybrid and knowledge-based implementa- 
tions are often an order of magnitude more efficient 
than uniform random sampling. 

In addition to the method for generating feasible so- 
lutions, the probability of making the correct move de- 
pends heavily on the partitioning approach. A generic 
method for partitioning is usually straightforward to 
implement but by taking advantage of special structure 
and incorporating this into intelligent partitioning the 
efficiency of the NP method may be improved by an or- 
der of magnitude. The strength of the NP method is in- 
deed in this flexibility. Special structure, local search, 
any heuristic search, and mathematical programming 
can all be incorporated into the NP framework to de- 
velop optimization algorithms that are more effective 
in solving large-scale optimization problems than when 
these methods are used alone. 


Cases 


Here we introduce three application examples that il- 
lustrate the type of optimization problems for which 
the NP method is particularly effective. For each ap- 
plication the optimization problem has a complicating 
aspect that makes it difficult for traditional optimiza- 
tion methods. For the first of these problems, resource- 
constrained project scheduling, the primary difficulty 
is in a set of complicating constraints. For the second 
problem, the feature selection problem, the difficulty 
lies in a complex objective function. The third prob- 
lem, radiation treatment planning, has both difficult- 
to-satisfy constraints and a complex objective function 
that cannot be evaluated through an analytical expres- 
sion. Each of the three problems can be solved effec- 
tively by the NP method by incorporating our under- 
standing of the application into the framework. 


Resource-Constrained Project Scheduling 


Planning and scheduling problems arise as critical chal- 
lenges in many manufacturing and service applications. 
One such problem is the resource-constrained project 
scheduling problem that can be described as follows. 
A project consists of a set of tasks to be performed and 
given precedence requirements between some of the 
tasks. The project scheduling problem involves finding 
the starting time of each task so that the overall comple- 
tion time of the project is minimized. It is well known 
that this problem can be solved efficiently using what 
is called the critical path method that uses forward re- 
cursion to find the earliest possible completion time 
for each task. The completion time of the last task de- 
fines the makespan or the completion time of the entire 
project. 
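The forward recursion of the critical path method can be sketched as follows, assuming integer processing times and an acyclic precedence graph. The function name `critical_path` and the toy data are illustrative; Python's `graphlib.TopologicalSorter` supplies an order in which every task comes after its predecessors.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def critical_path(
    p: Dict[str, int], edges: List[Tuple[str, str]]
) -> Tuple[Dict[str, int], int]:
    """Earliest start times and makespan by forward recursion.

    p[i] is the processing time of task i; (i, j) in edges means that
    task i must be completed before task j can start.
    """
    preds: Dict[str, List[str]] = {i: [] for i in p}
    for i, j in edges:
        preds[j].append(i)
    start: Dict[str, int] = {}
    # A topological order guarantees all predecessors are handled first.
    for j in TopologicalSorter(preds).static_order():
        start[j] = max((start[i] + p[i] for i in preds[j]), default=0)
    makespan = max(start[i] + p[i] for i in p)
    return start, makespan


p = {"a": 3, "b": 2, "c": 4, "d": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
start, makespan = critical_path(p, edges)
print(sorted(start.items()), makespan)
# [('a', 0), ('b', 3), ('c', 3), ('d', 7)] 8
```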

Now assume that one or more resources are re- 
quired to complete each task. The resources are lim- 
ited so if a set of tasks requires more than the available 
resources they cannot be performed concurrently. The 
problem now becomes NP-hard and cannot be solved 
efficiently to optimality using any traditional methods. 
To state the problem, we need the following notation: $V$ is the set of all tasks, $E$ is the set of precedence constraints, $p_i$ is the processing time of task $i \in V$, $R$ is the set of resources, $R_k$ are the available resources of type $k \in R$, and $r_{ik}$ are resources of type $k$ required by task $i$. The decision variables are the starting times for each task: $x_i$ is the starting time of task $i \in V$.

Finally, for notational convenience, we define the set of tasks processed at time $t$ as

$$V(t) = \{\, i \in V : x_i \le t < x_i + p_i \,\}. \qquad (4)$$


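With this notation, we now formulate the resource-constrained project scheduling problem mathematically as follows:

$$\min \max_{i \in V} \; (x_i + p_i), \qquad (5)$$

subject to

$$x_i + p_i \le x_j, \quad \forall (i,j) \in E, \qquad (6)$$

$$\sum_{i \in V(t)} r_{ik} \le R_k, \quad \forall k \in R,\; t \in \mathbb{Z}_+, \qquad x_i \in \mathbb{Z}_+. \qquad (7)$$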


Here the precedence constraints (6) are easy, whereas 
the resource constraints (7) are hard. By this we mean 
that if the constraints (7) are dropped then the prob- 
lem becomes easy to solve. Such problems, where com- 
plicating constraints transform the problem from easy 
to very hard, are common in large-scale optimization. 
Indeed the classic job shop scheduling problem can 
be viewed as a special case of the resource-constrained 
project scheduling problem where the machines are the 
resources. Without the machine availability constraints 
the job shop scheduling problem reduces to a simple 
project scheduling problem. Other well-known combi- 
natorial optimization problems have similar properties. 
For example, without the subtour elimination constraints
the classic traveling salesman problem reduces to a sim- 
ple assignment problem that can be solved efficiently. 

The flexibility of the NP method allows us to ad- 
dress such problems effectively by taking advantage of 
special structure when generating feasible solutions. It 
is important to note that it is very easy to use sampling 
to generate feasible solutions that satisfy very compli- 
cated constraints. Therefore, when faced with a prob- 
lem with complicating constraints we want to use ran- 
dom sampling to generate partial feasible solutions that 
resolve the difficult part of the problem and then com- 
plete the solution using the appropriate efficient opti- 
mization method. 

For example, when a feasible solution for the resource-constrained project scheduling problem is generated, the resource allocation should be generated using random sampling, and the solution can then be
completed by applying the critical path method to de- 
termine the starting times for each task. This requires 
reformulating the problem so that the resource and 
precedence constraints can be separated, but such a re- 
formulation is rather easily achieved by noting that the 
resource constraints can be resolved by determining 
a sequence between the tasks that require the same re- 
source(s) at the same time. Once this sequence has been 
determined it can be added as precedence constraints 
and the remaining solution can be generated using the 
critical path method. Feasible solutions can therefore 
be generated in the NP method by first randomly sam- 
pling a sequence to resolve resource conflicts and then 
applying the critical path method. Both procedures are 
very fast, so complete sample solutions can be gener- 
ated rapidly. 
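A sketch of this two-step sampler follows, under the simplifying assumption (ours, not the text's) that every resource has capacity one, as for machines in job shop scheduling, so resolving a conflict just means ordering the tasks that share a resource. A random linear extension of the precedence graph supplies that order, which guarantees that the added precedence constraints never create a cycle; the critical path forward pass then completes the schedule. All function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def random_linear_extension(
    preds: Dict[str, List[str]], rng: random.Random
) -> List[str]:
    """Random task order that respects every precedence constraint."""
    remaining = {t: len(ps) for t, ps in preds.items()}
    succs: Dict[str, List[str]] = {t: [] for t in preds}
    for j, ps in preds.items():
        for i in ps:
            succs[i].append(j)
    ready = [t for t, k in remaining.items() if k == 0]
    order: List[str] = []
    while ready:
        t = ready.pop(rng.randrange(len(ready)))  # uniform choice among ready tasks
        order.append(t)
        for s in succs[t]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return order


def sample_schedule(
    p: Dict[str, int],
    edges: List[Tuple[str, str]],
    uses: Dict[str, Set[str]],
    rng: random.Random,
) -> Tuple[Dict[str, int], int]:
    """One sampled feasible schedule, assuming unit-capacity resources."""
    preds: Dict[str, List[str]] = {t: [] for t in p}
    for i, j in edges:
        preds[j].append(i)
    order = random_linear_extension(preds, rng)
    # Step 1: resolve resource conflicts by sequencing the tasks that share
    # a resource in the sampled order; add these as precedence constraints.
    for tasks in uses.values():
        seq = [t for t in order if t in tasks]
        for a, b in zip(seq, seq[1:]):
            preds[b].append(a)
    # Step 2: critical path forward pass; `order` is still a topological
    # order because every added constraint points forward in it.
    start: Dict[str, int] = {}
    for j in order:
        start[j] = max((start[i] + p[i] for i in preds[j]), default=0)
    return start, max(start[t] + p[t] for t in p)


def best_sampled_schedule(p, edges, uses, n_samples=50, seed=0):
    rng = random.Random(seed)
    samples = (sample_schedule(p, edges, uses, rng) for _ in range(n_samples))
    return min(samples, key=lambda s: s[1])


p = {"a": 3, "b": 2, "c": 4, "d": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
uses = {"machine": {"b", "c"}}  # b and c compete for one machine
start, makespan = best_sampled_schedule(p, edges, uses)
print(makespan)  # 10
```

Priority rules supplied by a domain expert could replace the uniform choice in `ready.pop(...)` with a biased one, which is one way to realize the knowledge-based sampling described next.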

We also note that constraints that are difficult for 
optimization methods such as mathematical program- 
ming are sometimes very easily addressed in prac- 
tice by incorporating domain knowledge. For exam- 
ple, a domain expert may easily be able to specify 
priorities among tasks requiring the same resource(s) 
in the resource-constrained project scheduling prob- 
lem. The domain expert can therefore, perhaps with 
some assistance from an interactive decision support 
system, specify some priority rules to convert a very 
complex problem into an easy-to-solve problem. The 
NP method can effectively incorporate such domain 
knowledge into the optimization framework by using 
the priority rules when generating feasible solutions. 
This is particularly effective because the domain expert 
would not need to specify priority rules to resolve all 
resource conflicts. Rather, any available priority rule or 
other domain knowledge can be incorporated to guide 
the sampling. 

The same structure can be used to partition intelligently. Instead of partitioning directly using the decision variables ($x_i$), we note that it is sufficient to partition so as to resolve the resource conflicts. Once those are resolved, the remaining problem can be solved efficiently by the critical path method. This approach is applicable to any problem that can be decomposed in a similar manner.


Feature Selection 


Knowledge discovery and data mining is a relatively 
new field that has experienced rapid growth owing to 
its ability to extract meaningful knowledge from very
large databases. One of the problems that must usu- 
ally be solved as part of practical data mining projects 
is the feature selection problem, which involves select- 
ing a good subset of variables to be used by subse- 
quent inductive data mining algorithms. The problem 
of selecting a best subset of variables is well known 
in the statistical literature as well as in machine learn- 
ing. The recent explosion of interest in data mining for 
addressing various business problems has led to a re- 
newed interest in this problem. From an optimization 
point of view, feature selection can clearly be formulated as a COP where binary decision variables determine whether a feature (variable) is included or excluded. The solution space can therefore be stated very simply as all binary vectors of length $n$, where $n$ is the number of variables. The size of this feasible region is $2^n$, so it grows exponentially, but typically there are no additional constraints to complicate its structure.

On the other hand, there is no consensus objec- 
tive function that measures the quality of a feature or 
a set of features. Dozens of alternatives have been proposed in the literature, including both functions that
measure the quality of individual features and functions 
that measure the quality of a set of features. However, 
no single measure is satisfactory in all cases and the 
ultimate measure is therefore: Does it work? In other 
words, when the features selected are used for learn- 
ing does it result in a good model being induced? The 
most effective feature selection approach in terms of so- 
lution quality is therefore the wrapper approach, where 
the quality of a set of features is evaluated by apply- 
ing a learning algorithm to the set and evaluating its 
performance. Specifically, an inductive learning algorithm, such as decision tree induction, support vector machines, or neural networks, is applied to training
data containing only the features selected. The perfor- 
mance of the induced model is evaluated and this per- 
formance is used to measure the quality of the feature 
subset. This objective function is not only nonlinear, 
but since a new model must be induced for every fea- 
ture subset it is also very expensive to evaluate. 

Mathematically, the feature selection problem can be stated as the following COP:

$$\min_{x \in \{0,1\}^n} f(x), \qquad (8)$$

that is, $X = \{0,1\}^n$. Feature selection is therefore a very
hard COP not because of the complexity of the feasible 
region, although it does grow exponentially, but owing 
to the complexity of an objective function that is very 
expensive to evaluate. However, this is also an example 
where application-specific heuristics can be effectively 
exploited by the NP method. 

Significant research has been devoted to methods 
for measuring the quality of features. This includes 
information-theoretic methods such as using Shan- 
non’s entropy to measure the amount of information 
contained in each feature: the more information, the 
more valuable the feature. The entropy is measured for 
each feature individually and it can hence be used as 
a very fast local search or a greedy heuristic, where the 
features with the highest information gain are added 
one at a time. While such a purely entropy based fea- 
ture selection will rarely lead to satisfactory results, the 
NP method can exploit this by using the entropy mea- 
sure to define an intelligent partitioning. 

We let $X(k) \subseteq X$ denote the most promising region in the $k$th iteration and partition the set into two disjoint subsets (note that $X(0) = X$):

$$X_1(k) = \{x \in X(k) : x_i = 1\}, \qquad (9)$$

$$X_0(k) = \{x \in X(k) : x_i = 0\}. \qquad (10)$$

Hence, a partition is defined by a sequence of features $x_1, x_2, \ldots, x_n$, which determines the order in which the features are either included ($x_i = 1$) or excluded ($x_i = 0$).

We calculate the information gain Gain(i) of fea- 
ture i, which is the expected reduction in entropy that 
would occur if we knew the value of feature i, that is, 


$$\mathrm{Gain}(i) = I - E(i), \qquad (11)$$


where $I$ is the expected information that is needed to classify a given instance and $E(i)$ is the entropy of feature $i$. The maximum information gain, or equivalently the minimum entropy, determines a ranking of
the features. Thus, we select

$$i_1 = \arg\min_{i \in \{1,2,\ldots,n\}} E(i),$$

$$i_2 = \arg\min_{i \in \{1,2,\ldots,n\} \setminus \{i_1\}} E(i),$$

$$\vdots$$

$$i_n = \arg\min_{i \in \{1,2,\ldots,n\} \setminus \{i_1,\ldots,i_{n-1}\}} E(i).$$
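Under the usual ID3-style definitions (an assumption; the text does not spell them out), $I$ is the class entropy of the training data and $E(i)$ is the expected class entropy after splitting the data on feature $i$. Sorting features by increasing $E(i)$ then reproduces the sequence $i_1, i_2, \ldots, i_n$ above. A minimal sketch with hypothetical function names and toy data:

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_entropy(rows: List[Sequence[Hashable]], labels: Sequence[Hashable], i: int) -> float:
    """E(i): expected class entropy after splitting the data on feature i."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[i], []).append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def rank_features(rows, labels) -> List[int]:
    """i_1, ..., i_n: features by increasing E(i), i.e. decreasing Gain(i) = I - E(i)."""
    return sorted(range(len(rows[0])), key=lambda i: split_entropy(rows, labels, i))


rows = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = [0, 0, 1, 1]
I = entropy(labels)
print(rank_features(rows, labels), I - split_entropy(rows, labels, 0))
# [0, 1, 2] 1.0
```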
The feature order $i_1, i_2, \ldots, i_n$ defines an intelligent partition for the NP method, and this has been found to be an order of magnitude more efficient than an average arbitrary partitioning [3]. We can use a similar idea to generate feasible solutions from each region using a sampling strategy that is biased towards including features with high information gain. A very fast greedy heuristic can thus greatly increase the efficiency of the NP method while resulting in much higher quality solutions than the greedy heuristic is able to achieve on its own.
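One way to bias the sampling toward high-gain features is sketched below. The linear decay of inclusion probabilities with rank, and the parameters `p_high` and `p_low`, are our own illustrative choices rather than the scheme used in [3]; features whose values are already fixed by the current region keep them.

```python
import random
from typing import Dict, List


def biased_sample(order: List[int], fixed: Dict[int, int], rng: random.Random,
                  p_high: float = 0.8, p_low: float = 0.2) -> List[int]:
    """Draw x in {0,1}^n from the region defined by `fixed` (feature -> 0/1).

    Free features are included with a probability that decreases linearly
    from p_high to p_low along `order` (highest information gain first).
    """
    n = len(order)
    x = [0] * n
    for rank, i in enumerate(order):
        if i in fixed:
            x[i] = fixed[i]  # decision already made by the partitioning
        else:
            p = p_high - (p_high - p_low) * rank / max(n - 1, 1)
            x[i] = 1 if rng.random() < p else 0
    return x


rng = random.Random(0)
# Ranking i_1 = 2, i_2 = 0, i_3 = 1; the current region has fixed x_2 = 1.
draws = [biased_sample([2, 0, 1], {2: 1}, rng) for _ in range(1000)]
print(sum(d[0] for d in draws), sum(d[1] for d in draws))
```

Here feature 0 is included with probability 0.5 and feature 1 with probability 0.2, so the first printed count should be clearly larger than the second.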


Radiation Treatment Planning 


Health care delivery is an area of immense importance 
where optimization techniques have been used increas- 
ingly in recent years. Radiation treatment planning is 
an important example of this and intensity-modulated 
radiation therapy (IMRT) is a recently developed com- 
plex technology for such treatment. It employs a mul- 
tileaf collimator to shape the beam and to control, or 
modulate, the amount of radiation that is delivered 
from each of the delivery directions (relative to the pa- 
tient). The planning of the IMRT is very important be- 
cause it needs to achieve the treatment goal while in- 
curring the minimum possible damage to other organs. 
Because of its complexity the treatment planning prob- 
lem is generally divided into several subproblems. The 
first of these is termed the beam angle selection (BAS) 
problem. In essence, BAS requires the determination 
of roughly four to nine angles from 360 possible angles 
subject to various spacing and opposition constraints. 
Designing an optimal IMRT plan requires the selec- 
tion of beam orientations from which radiation is de- 
livered to the patient. These orientations, called beam 
angles, are currently manually selected by a clinician 
on the basis of his/her judgment. The planning process 
proceeds as follows. A dosimetrist selects a collection of angles and waits 10–30 min while a dose pattern is calculated. The resulting treatment is likely to be unacceptable, so the angles and dose constraints are adjusted, and the process is repeated. Finding a suitable collection of angles often takes several hours. The goal
of using optimization methods to identify quality angles 
is to provide a better decision support system to replace 
the tedious repetitive process just described. An integer 
programming model of the problem contains a large 
number of binary variables and the objective value of 
a feasible point is evaluated by solving a large, continuous optimization problem. For example, in selecting five to ten angles, there are between $4.9 \times 10^{10}$ and $8.9 \times 10^{18}$ subsets of $\{0, 1, 2, \ldots, 359\}$.
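The subset counts quoted above are the binomial coefficients $\binom{360}{5}$ and $\binom{360}{10}$, which can be checked with Python's standard `math.comb`:

```python
from math import comb

# Number of ways to choose k beam angles from the 360 candidates 0, 1, ..., 359.
for k in (5, 10):
    print(f"{k} angles: {comb(360, k):.1e} subsets")
# 5 angles: 4.9e+10 subsets
# 10 angles: 8.9e+18 subsets
```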

The BAS problem is complicated by both an ob- 
jective function with no analytical expression and con- 
straints that are hard to satisfy. In the end an IMRT plan 
is either acceptable or not and the considerations for 
determining acceptability are too complex for a simple 
analytical model. Thus, the acceptability and hence the 
objective function value for each plan must be evalu- 
ated by a qualified physician. This makes evaluating the 
objective not only expensive in terms of time and ef- 
fort, but also introduces noise into the objective func- 
tion because two physicians may not agree on the ac- 
ceptability of a particular plan. The constraints of the 
BAS problem are also complicated since each beam an- 
gle will result in radiation of organs that are not the 
target of the treatment. There are therefore two types 
of constraints: the target should receive at least a minimum amount of radiation, and other organs should receive no
more than some maximum amount of radiation. Since 
these bounds need to be specified tightly the constraints 
are hard to satisfy. 

The BAS problem illustrates how mathematical pro- 
gramming can be effectively incorporated into the NP 
framework. Since the evaluation of even a single IMRT 
plan must be done by an expert and is hence both time- 
consuming and expensive, it is imperative to impose 
a good structure on the search space that reduces the 
number of feasible solutions that need to be generated. 
This can be accomplished through an intelligent parti- 
tioning, and specifically by computing the optimal so- 
lution of an integer program with a much simplified ob- 
jective function [1]. The output of the integer program 
then serves to define an intelligent partitioning. For ex- 
ample, suppose a good angle set (50°, 80°, 110°, 250°, 
280°, 310°, 350°) is found by solving the integer pro- 
gram. We can then partition on the first angle in the 
set, which is 50° in this example. Then one subregion 
includes angle 50°, and the other excludes 50°. 
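The include/exclude partitioning step can be sketched by representing a region as a pair of angle sets (angles forced in, angles forced out). The angle list is the example above; the integer program that produces it is not shown, and the names `split` and `contains` are hypothetical.

```python
from typing import FrozenSet, Tuple

# A region: (angles every plan in it must use, angles no plan in it may use).
Region = Tuple[FrozenSet[int], FrozenSet[int]]


def split(region: Region, angle: int) -> Tuple[Region, Region]:
    """Partition a region into plans that include `angle` and plans that exclude it."""
    inc, exc = region
    return (inc | {angle}, exc), (inc, exc | {angle})


def contains(region: Region, plan: FrozenSet[int]) -> bool:
    inc, exc = region
    return inc <= plan and not (exc & plan)


ip_angles = [50, 80, 110, 250, 280, 310, 350]  # angle set from the integer program
root: Region = (frozenset(), frozenset())
with_50, without_50 = split(root, ip_angles[0])
plan = frozenset({50, 110, 250, 310})
print(contains(with_50, plan), contains(without_50, plan))  # True False
```

Repeating `split` on the next angle of the list inside whichever subregion becomes most promising yields the nested sequence of regions.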
Conclusions


The NP method is a powerful metaheuristic for solv- 
ing large-scale discrete optimization problems. The 
method systematically partitions the feasible region 
into subregions and moves from one region to another 
on the basis of information obtained by randomly gen- 
erating feasible sample solutions from each of the cur- 
rent regions. The method keeps track of which part of 
the feasible region is the most promising in each iter- 
ation and the number of feasible solutions generated, 
and hence the computational effort is always concen- 
trated in this most promising region. 

The efficiency of the NP algorithm depends on mak- 
ing the correct move frequently. This success probabil- 
ity depends in turn on both the partitioning and the 
method for generating feasible solutions. For any prac- 
tical application it is therefore important to increase 
the success probability by developing intelligent par- 
titioning methods, incorporating special structure into 
weighted sampling, and applying randomized heuris- 
tics to generate high-quality feasible solutions. 

The NP method has certain connections to stan- 
dard mathematical programming techniques such as 
branch-and-bound. However, the NP method is pri- 
marily useful for problems that are either too large or 
too complex for mathematical programming to be ef- 
fective. But even for such problems mathematical pro- 
gramming methods can often be used to solve either 
a relaxed problem or a subproblem of the original and 
these solutions can be effectively incorporated into the 
NP framework. 

The three application examples presented here illus- 
trate the broad usefulness of the NP method in both 
manufacturing and service industries, and how it can 
take advantage of special structure and application- 
specific heuristics to improve the efficiency of the 
search. 
Scientific or engineering applications usually require solving mathematical problems. Network-related applications of this kind span a wide range, from
modeling the evolution of species in biology to mod- 
eling soap films for grids of wires; from the design 
of collections of data to the design of heating or air- 
conditioning systems in buildings; and from the cre- 
ation of oil and gas pipelines to the creation of com- 
munication networks, road and railway lines. These are 
all network design problems of significant importance 
and nontrivial complexity. The network topology and 
design characteristics of these systems are classical ex- 
amples of optimization problems. 

I. Intuitively speaking, a network is a set of points 
and a set of connections where each connection joins 
one point to another and has a certain length. The com- 
binatorial structure of such a network is described as 
a graph $G$ which is defined to be a pair $(V, E)$ where
• $V$ is any finite set of elements, called vertices, and
• $E$ is a finite family of elements which are unordered pairs of vertices, called edges.

Additionally, assume that a function $l: E \to \mathbb{R}$ is given for the edges of the graph $G$. Usually, one assumes that $l$ has only positive values and calls it a length-function. A (connected) graph equipped with a length-function is called a network.

The length of a subgraph $G' = (V', E')$, $V' \subseteq V$, $E' \subseteq E \cap \binom{V'}{2}$, of the network $G$ is defined as

$$L(G') = \sum_{e \in E'} l(e).$$
Each network $G = (V, E)$ with the length-function $l: E \to \mathbb{R}$ is a metric space $(V, \rho)$ by defining the distance function so that $\rho(v, v')$ is the length of a shortest path between the vertices $v$ and $v'$ in $G$. That means that $\rho: V \times V \to \mathbb{R}$ is a real-valued function satisfying:
i) $\rho(v, v') \ge 0$ for all $v, v'$ in $V$;
ii) $\rho(v, v') = 0$ if and only if $v = v'$;
iii) $\rho(v, v') = \rho(v', v)$ for all $v, v'$ in $V$; and
iv) $\rho(v, v') \le \rho(v, w) + \rho(w, v')$ for all $v, v', w$ in $V$ (triangle inequality).
The problem of finding shortest paths in a graph with a length-function is an important and well-studied problem. Such a path is easily found by an algorithm due to E.W. Dijkstra [7]:


PROCEDURE shortest path
  InputInstance();
  Start with the vertex v;
  Label the vertex v with 0: L(v) = 0;
  REPEAT
    Select an edge ww' between a labeled vertex w and an unlabeled vertex w' such that the quantity L(w) + l(ww') is as small as possible;
    Label w': L(w') = L(w) + l(ww')
  UNTIL v' is labeled
END shortest path;


Pseudocode for a procedure finding a shortest path between two vertices v and v' in a network


If the procedure is continued until all vertices of the network are labeled, it creates a spanning tree rooted at the vertex $v$ containing shortest paths from $v$ to every vertex. Moreover, the label $L(w)$ is the distance from $v$ to $w$; in other terms, $\rho(v, w) = L(w)$.
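A runnable version of the labeling procedure is sketched below. It uses a binary heap (Python's `heapq`) to find the edge minimizing $L(w) + l(ww')$, a standard implementation choice rather than one prescribed by the text. It runs until all reachable vertices are labeled and records each vertex's parent, so it also returns the shortest-path tree just described. The function names and the small example network are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # vertex -> [(neighbor, length)]


def shortest_path_tree(graph: Graph, v: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Labels L(w) = rho(v, w) and parents in the shortest-path tree rooted at v."""
    label: Dict[str, float] = {}
    parent: Dict[str, str] = {}
    heap: List[Tuple[float, str, str]] = [(0.0, v, v)]
    while heap:
        dist, w, via = heapq.heappop(heap)
        if w in label:
            continue  # w was already labeled through a shorter route
        label[w] = dist
        if w != v:
            parent[w] = via
        for u, length in graph[w]:
            if u not in label:
                heapq.heappush(heap, (dist + length, u, w))
    return label, parent


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    g: Graph = {}
    for a, b, length in edges:
        g.setdefault(a, []).append((b, length))
        g.setdefault(b, []).append((a, length))
    return g


g = undirected([("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 4.0), ("c", "d", 1.0)])
label, parent = shortest_path_tree(g, "a")
print(label, parent)
# {'a': 0.0, 'b': 1.0, 'c': 3.0, 'd': 4.0} {'b': 'a', 'c': 'b', 'd': 'c'}
```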

Conversely, each finite metric space can be repre- 
sented as a graph with a nonnegative length-function 
[23]. 


Graphs lend themselves as natural models of transportation as well as communication networks. Consequently, it is natural to study network design problems, such as optimal facility location problems, for graphs and for graphs embedded in metric spaces.

The core network design problem is the minimum spanning tree problem, where one wishes to design a minimum-cost network containing a path from each vertex to each other vertex. Such a network must be a tree, which is called a minimum spanning tree (MST). The minimum spanning tree problem has the longest history of all ND problems, starting with O. Borůvka [2] in 1926. See [19] for an excellent historical survey.

All the known efficient minimum spanning tree algorithms are special cases of a general greedy method, in which one builds up an MST edge by edge, including appropriate short edges and excluding appropriate long edges. Perhaps the simplest method to find an MST is due to J.B. Kruskal [30]:


PROCEDURE minimum spanning tree
  InputInstance();
  Start without any edge;
  REPEAT
    Choose the shortest edge that does not form a cycle with the edges already chosen
  UNTIL |V| − 1 edges are chosen
END minimum spanning tree;


Pseudocode for a procedure finding a minimum spanning tree in a network G = (V, E)
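Kruskal's procedure can be sketched with a union-find structure that tests whether an edge would close a cycle with the edges already chosen. Union-find with path halving is a common implementation choice (see [3] for more refined procedures); the names and the example network here are illustrative.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, str, str]  # (length, endpoint, endpoint)


def kruskal(vertices: List[str], edges: List[Edge]) -> List[Edge]:
    parent: Dict[str, str] = {v: v for v in vertices}

    def find(v: str) -> str:
        # Root of v's component, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree: List[Edge] = []
    for length, u, w in sorted(edges):  # shortest edges first
        ru, rw = find(u), find(w)
        if ru != rw:  # u and w in different components: no cycle is formed
            parent[ru] = rw
            tree.append((length, u, w))
            if len(tree) == len(vertices) - 1:
                break
    return tree


tree = kruskal(["a", "b", "c", "d"],
               [(1.0, "a", "b"), (2.0, "b", "c"), (4.0, "a", "c"), (1.0, "c", "d")])
print(tree, sum(e[0] for e in tree))
# [(1.0, 'a', 'b'), (1.0, 'c', 'd'), (2.0, 'b', 'c')] 4.0
```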


For an efficient implementation of Kruskal's algorithm and several more efficient procedures, see [3].

II. A general ND problem is, for a given configuration of vertices and/or edges, to find a network which contains these objects, fulfills some predetermined requirements and minimizes a given objective function. This is quite general and models a wide variety of problems.

J. McGregor Smith [40] presents a classification of applications for network design problems. Generalizing it, one obtains the following classes:
• Large region networks. The metric in large geographic regions is given by the shortest great circle distance between the points on the (Euclidean) sphere. Large region location problems arise when the difference between the Euclidean and the great circle metrics is considerable. Location of international headquarters or distribution centers and planning of oil or natural gas pipelines or long distance telephone lines are examples; see [27] and [38].
• Regional networks. Consider inter-urban networks, like communication networks, railway lines and interstate highway networks. R.F. Love and J.G. Morris [33] study a variety of mathematical forms as models for intercity, urban and rural road distances. Moreover, they found that the p-norms, and linear combinations of these, give the best fit. Recent surveys are [4] and [34].

• Macro scale networks. Chemical processing plants,
urban arterial systems and similar intra-urban sys- 
tems are typical applications. In these situations the 
rectilinear metric is often used. If the structure of the 
possible connections is predetermined it is also pos- 
sible to formulate the problem as a network design 
problem in graphs. 

• Intermediate scale networks. Electric, heating and
air-conditioning systems in buildings are examples 
of network optimization problems where Steiner 
points can reduce the overall minimum cost solu- 
tion of the network. The rectilinear metric is the 
most frequent measure of the distance in these ap- 
plications. For models and methods see [26]. 

• Micro scale networks. The design of very large
scale integration (VLSI) networks is an example of 
Steiner’s problem where the overall interconnect- 
ing length of the network is crucial for the solution. 
In this class of applications, the rectilinear metric is 
again the most frequent metric. It is also conceivable 
to use a linear combination of rectilinear and maxi- 
mum norm [26] and [32]. 

• Evolutionary networks. Molecular sequences are used to reconstruct the course of evolution. Since evolution is assumed to have proceeded from a common ancestral species in a tree-like branching of species, this process is generally modeled by a tree. The key question is the reconstruction of this tree based on the contemporary data. Molecular data come either as DNA sequences (composed of nucleotides from an alphabet of four letters, namely the four bases Adenine, Guanine, Thymine and Cytosine) or as sequences of proteins (composed of amino acids from an alphabet of 20 letters, namely Alanine, Arginine, Asparagine, ..., Valine). The Hamming distance is often used as the metric. Surveys are given in [13] and [39].

III. Considering the group of ND problems with connectivity requirements, one finds:
• The bounded degree minimum spanning tree problem (abbreviated: BDMST-problem), which is defined as follows: Given a finite set $N$ of points in a metric space and an integer $B > 1$, find a spanning tree $T = (N, E)$ such that $T$ has minimal length among all candidates whose maximum vertex degree is less than or equal to $B$.

Finding a BDMST of maximum degree $B = 2$ is equivalent to finding a shortest Hamiltonian path, a variant of the traveling salesman problem, which is known to be NP-hard. The borderline dividing the NP-hard and polynomially solvable cases of the BDMST problem, depending on the space and the quantity $B$, is described in [5] and [37].

Many modified (minimum) spanning tree problems are presented in the literature, for instance

- find a spanning tree interconnecting all vertices with minimal maximum degree;

- find a spanning tree interconnecting all vertices with at least $k$ leaves in the tree;

- find a spanning tree $T = (V, E)$ such that the quantity
$$\sum_{v, v' \in V} L(\text{the path from } v \text{ to } v' \text{ in } T)$$
is minimal;

- find a spanning tree isomorphic to a given tree.

These problems are NP-complete in general, but they can be solved more easily in several specific cases [15,17,28].

• The Steiner minimal tree problem, where one seeks a minimum network that connects a set $N$ of designated terminal points. Any network solving this problem must be a tree, which is called a Steiner minimal tree (SMT). It may contain vertices different from the points which are to be connected. Such points are called Steiner points. In other words, an SMT for $N$ is a minimum spanning tree on $N \cup Q$, where $Q$ is a set of additional vertices inserted into the metric space in order to achieve a minimal solution. In general, however, it is impossible to compute the number of Steiner points in an easy way independently of the determination of an SMT. Additionally, Steiner point locations in the space are not prespecified from a candidate list of point locations.

Steiner’s problem for graphs was originally formu- 
lated by S.B. Hakimi [22] in 1971. Since then, the 
problem has received considerable attention in the 
literature. Several exact algorithms and heuristics 
have been suggested and discussed. 

R.M. Karp [29] showed that Steiner’s problem is 
NP-complete. An algorithm which finds a solution 
in exponential time is given in [8]. The algorithm is 
based on the dynamic programming methodology 
using a decomposition property. On the other hand, 
Hakimi [22] remarked that an SMT for N in a net- 
work G = (V, E) can be found by the enumeration 
of minimum spanning trees of subgraphs of G in- 
duced by supersets of N. E.L. Lawler [31] suggested 
a modification of this algorithm, using the fact that the number of Steiner points is bounded by |N| − 2, which shows that not all subsets V' need to be considered.

A recent survey about Steiner minimal trees in net- 
works is given in [26]. 

Algorithmic problems on arbitrary graphs often remain NP-complete even when restricted to special classes of graphs, yet may become solvable in polynomially bounded time on others. For instance, Steiner's problem remains NP-complete even for planar graphs [14]; yet it can be solved in linear time on several specific classes of graphs [41,43,44].

The geometric version of the Steiner minimal tree problem originated with P. Fermat [10] early in the 17th century and with C.F. Gauss [16] in 1836. Perhaps starting with the book [6] in 1941, the Gauss problem became popularized under the name of Steiner's problem. That is: given a finite set of points in a Euclidean space, find a network which connects all points of the set with minimal length.

A classical survey of Steiner's problem in the Euclidean plane is [18], in which the shortest interconnecting network is termed a ‘Steiner minimal tree’ and the additional vertices are termed ‘Steiner points’.

The following properties hold for any SMT for a finite set N of points in the Euclidean plane:

1) the degree of each vertex is at most three;

2) the degree of each Steiner point equals three, and any two edges incident to a Steiner point meet at an angle of 120°;

3) there are at most |N| − 2 Steiner points.


In the Euclidean plane there is a geometric ruler-and-compass construction originating in [35] and [42]. Recent (1999) surveys of Steiner minimal trees, including geometries other than the Euclidean one, are given in [4] and [26].

Clearly, an MST is an approximation of an SMT. More exactly, one can find a tree interconnecting a finite set of points in a metric space quickly (namely, in O(n²) time, where n = |N| is the number of given points) whose length is at most twice the length of a shortest possible tree, namely an SMT. Hence, it is of interest to consider the quantity

$$m = \inf\left\{ \frac{L(\text{SMT for } N)}{L(\text{MST for } N)} : N \text{ a finite set of points} \right\},$$

which is called the Steiner ratio of the space and says how much the total length of an MST can be decreased by allowing Steiner points.


space                          Steiner ratio         source
plane with rectilinear norm    2/3 = 0.66666...      [25]
plane with Euclidean norm      √3/2 = 0.86602...     [9]
plane with p-norm              2/3 ≤ m ≤ √3/2        [4]
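The 2-approximation and the Euclidean value √3/2 can be illustrated numerically. The following Python sketch (our own illustration; the function name is hypothetical, and `math.dist` needs Python 3.8 or later) computes an MST with Prim's algorithm in O(n²) time on the complete Euclidean graph and applies it to the vertices of a unit equilateral triangle, with and without the centroid added as a Steiner point. For this configuration the SMT uses exactly one Steiner point, at which the three edges meet at 120°, so the ratio of the two lengths is √3/2.

```python
import math

def mst_length(points):
    """Length of a minimum spanning tree of the complete Euclidean graph (Prim, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each vertex to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the vertex that is cheapest to connect.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total

triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
steiner_point = (0.5, math.sqrt(3) / 6)        # centroid of the triangle
mst = mst_length(triangle)                      # approximately 2: two sides
smt = mst_length(triangle + [steiner_point])    # approximately sqrt(3)
ratio = smt / mst                               # approximately sqrt(3)/2
```

Computing the MST on N ∪ Q, with Q the chosen Steiner points, mirrors the description of an SMT given above; the difficulty in general lies in choosing Q.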


IV. Most ND problems can be modeled as integer programs.

Let G = (V, E) be a graph. A cut S in G is a partition of the vertex set V into two nonempty parts, S and V \ S. An edge e crosses the cut S, written e ∈ γ(S), if it has exactly one endpoint in each part. Now let l: E → ℝ be a length function, and define the following integer linear program:


$$\min \sum_{e \in E} l(e) \cdot x_e$$

$$\text{s.t.} \quad \sum_{e \in \gamma(S)} x_e \ge p(S), \quad \emptyset \ne S \subsetneq V,$$

$$x_e \in \{0, 1\}, \quad e \in E,$$

where p: 2^V → ℕ is a function parametrizing the ND problem to be solved. If one has two vertices v and v' and sets p(S) = 1 when S contains v but not v' (and p(S) = 0 otherwise), then, for nonnegative lengths, the program models the shortest path problem (between v and v'). If p(S) = 1 for all cuts S, then the program models the minimum spanning tree problem. Other ND problems with connectivity constraints that are formulated as integer linear programs are discussed in [20] and [21].
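For very small graphs the integer program can be checked by exhaustive enumeration. The Python sketch below is purely illustrative (its running time is exponential in |E| and |V|, and the graph and the names `solve_cut_ip` and `EDGES` are hypothetical). With p(S) = 1 for all cuts it returns the MST length, and with p(S) = 1 exactly for the cuts containing v = 0 but not v' = 3 it returns the length of a shortest path from 0 to 3.

```python
from itertools import product

def solve_cut_ip(n, edges, p):
    """Optimal value of min sum l(e) x_e s.t. sum_{e in gamma(S)} x_e >= p(S), x_e in {0,1}."""
    # All cuts: nonempty proper subsets S of V = {0, ..., n-1}.
    cuts = [frozenset(i for i in range(n) if (mask >> i) & 1)
            for mask in range(1, 2 ** n - 1)]
    best = None
    for x in product((0, 1), repeat=len(edges)):
        # e crosses S when exactly one endpoint lies in S.
        feasible = all(
            sum(xe for xe, (u, v, _) in zip(x, edges) if (u in S) != (v in S)) >= p(S)
            for S in cuts)
        if feasible:
            cost = sum(xe * length for xe, (_, _, length) in zip(x, edges))
            if best is None or cost < best:
                best = cost
    return best

EDGES = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (0, 3, 5), (0, 2, 2)]
mst_value = solve_cut_ip(4, EDGES, lambda S: 1)
path_value = solve_cut_ip(4, EDGES, lambda S: 1 if 0 in S and 3 not in S else 0)
```

Here the MST {01, 23, 12} has length 4, and the shortest path 0-2-3 has length 3.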

V. The second group of ND problems consists of the problems with capacity constraints.

Let G = (V, E) be a directed graph, usually called a digraph, that is,
- V is a finite set of elements, called vertices, and
- E ⊆ V × V is a finite family, called the set of edges.
Assume that there are two distinguished vertices, a source v_0 and a sink v_1, in G, and that there is a (directed) path from v_0 to v_1. Additionally, assume that there is a nonnegative capacity function c: E → ℝ.

A flow f on the digraph G is a nonnegative function on the edges such that
1) f does not exceed the capacities: 0 ≤ f(e) ≤ c(e) for every edge e; and
2) f satisfies the so-called Kirchhoff condition

$$\sum_{(u,v) \in E} f(u,v) = \sum_{(v,w) \in E} f(v,w)$$

for every vertex v ∈ V \ {v_0, v_1}.


The quantity

$$\sum_{(v_0,v) \in E} f(v_0,v) = \sum_{(v,v_1) \in E} f(v,v_1)$$

is called the value of the flow f. The problem is to find a flow of maximum value, called a maximum flow.

The fundamental theory of network flows was developed by L.R. Ford Jr. and D.R. Fulkerson [11,12]. Similarly as above, a cut is defined to be a vertex partition S and V \ S such that v_0 ∈ S and v_1 ∈ V \ S. The capacity of the cut is

$$c(S) = \sum_{e \in \gamma(S)} c(e),$$

where, for a digraph, γ(S) denotes the set of edges directed from S to V \ S. Ford and Fulkerson's main result, the max-flow min-cut theorem, states that the maximum flow value equals the minimum cut capacity.

Ford and Fulkerson proved this theorem by devising an algorithm that, given a flow f, either finds a cut whose capacity equals the flow value or finds a way to increase the flow value along an augmenting path from v_0 to v_1.


PROCEDURE maximum flow
  InputInstance();
  Start with the zero flow;
  REPEAT
    find an augmenting path from v0 to v1;
    increase the flow value by altering the flows
      along the edges of the path
  UNTIL no augmenting path exists
END maximum flow;

Pseudocode for a procedure finding a maximum flow from a source v_0 to a sink v_1 in a graph G = (V, E) equipped with a capacity function c


If all capacities are integers, this procedure produces a maximum flow. Note that if the capacities are arbitrary real numbers, the algorithm may never terminate, and successive flow values, though they converge, need not converge to the maximum flow value.
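The procedure above leaves open how the augmenting path is found. The following Python sketch makes one concrete choice of our own: it takes a shortest augmenting path found by breadth-first search (the Edmonds-Karp rule), which also guarantees termination for real capacities. When no augmenting path remains, the vertices reachable from v_0 in the residual network form the source side S of a cut whose capacity equals the flow value, as in the max-flow min-cut theorem. The network and function name are hypothetical.

```python
from collections import deque

def max_flow(n, arcs, source, sink):
    """Return (maximum flow value, source side S of a minimum cut); arcs are (u, v, capacity)."""
    residual = [[0] * n for _ in range(n)]
    for u, v, cap in arcs:
        residual[u][v] += cap
    value = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [None] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] is None:
            u = queue.popleft()
            for v in range(n):
                if parent[v] is None and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] is None:
            # No augmenting path: the vertices reached from the source form S.
            return value, {v for v in range(n) if parent[v] is not None}
        # Trace the path back from the sink and find its bottleneck.
        path = []
        v = sink
        while v != source:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[u][v] for u, v in path)
        # Push delta forward; the reverse residual lets later paths cancel it.
        for u, v in path:
            residual[u][v] -= delta
            residual[v][u] += delta
        value += delta

ARCS = [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)]
value, S = max_flow(4, ARCS, source=0, sink=3)
cut_capacity = sum(c for u, v, c in ARCS if u in S and v not in S)
```

On this small network the maximum flow value is 5, and the cut S = {v_0} has capacity 3 + 2 = 5.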

Surveys for network flow algorithms, including clas- 
sical work and a discussion of complexity, are [1] and 
[36]. 

VI. Combining ND problems with connectivity and capacity constraints yields the minimum cost flow problem, which is to determine a least cost shipment of a commodity through a network in order to satisfy the supplies and demands at its vertices.

Let G = (V, E) be a digraph. Associated with each vertex v there is a number b(v) ∈ ℝ satisfying Σ_{v∈V} b(v) = 0. A vertex v is called a source, a sink or a transshipment vertex if b(v) is positive, negative or zero, respectively. Additionally, assume that there is a nonnegative capacity function c: E → ℝ and a positive cost function (i.e., length function) l: E → ℝ. Then the minimum cost flow problem is formulated as


$$\min \sum_{e \in E} l(e) \cdot x_e$$

$$\text{s.t.} \quad \sum_{(v,w) \in E} x_{(v,w)} - \sum_{(w,v) \in E} x_{(w,v)} = b(v), \quad v \in V,$$

$$0 \le x_e \le c(e), \quad e \in E,$$

$$x_e \in \{0, 1\}, \quad e \in E.$$

This problem has been discussed in the literature many times; one of several good sources is [1], which includes several polynomial time algorithms. A general background is [24].
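A compact way to solve small instances is the successive shortest path method, one of the classical algorithms for this problem (chosen here for brevity; it is pseudo-polynomial and not one of the polynomial algorithms referred to above). The Python sketch below assumes nonnegative costs l(e), integer capacities and integer b(v) with Σ b(v) = 0, and treats x_e as an integer amount between 0 and c(e). It repeatedly sends flow from a source to the nearest remaining sink along a shortest path in the residual network, using Bellman-Ford because residual (reverse) arcs carry negative costs. The function name and instance are hypothetical.

```python
import math

def min_cost_flow(n, arcs, b):
    """Successive shortest paths; arcs are (u, v, capacity, cost), b[v] = b(v)."""
    # Residual graph: graph[u] holds [head, residual capacity, cost, index of reverse arc].
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in arcs:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    excess = list(b)   # remaining supply (> 0) or demand (< 0)
    total = 0
    while any(e > 0 for e in excess):
        s = next(v for v in range(n) if excess[v] > 0)
        # Bellman-Ford from s over arcs with positive residual capacity.
        dist = [math.inf] * n
        dist[s] = 0
        pred = [None] * n   # pred[v] = (u, k): arc graph[u][k] enters v
        for _ in range(n - 1):
            for u in range(n):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        pred[v] = (u, k)
        reachable = [v for v in range(n) if excess[v] < 0 and dist[v] < math.inf]
        if not reachable:
            raise ValueError("no feasible flow")
        t = min(reachable, key=lambda v: dist[v])
        path = []
        v = t
        while v != s:
            u, k = pred[v]
            path.append((u, k))
            v = u
        delta = min([excess[s], -excess[t]] + [graph[u][k][1] for u, k in path])
        for u, k in path:
            arc = graph[u][k]
            arc[1] -= delta                      # use residual capacity forward
            graph[arc[0]][arc[3]][1] += delta    # and open the reverse arc
            total += delta * arc[2]
        excess[s] -= delta
        excess[t] += delta
    return total

ARCS = [(0, 1, 4, 2), (0, 2, 2, 2), (1, 2, 2, 1), (1, 3, 3, 3), (2, 3, 5, 1)]
```

Shipping 4 units from vertex 0 to vertex 3 sends 2 units along 0-2-3 (cost 3 each) and 2 along 0-1-2-3 (cost 4 each), for a total cost of 14.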
The covering problem on a network involves the decision problem of determining the location of one or more ‘facilities’ or ‘centers’ to provide service to several clients located at known points on the network. To provide service to a client, a facility must be located ‘close enough’ to the client. In a more general version of the
problem, there may be fixed costs associated with lo- 
cating facilities, and also there may be penalty costs as- 
sociated with not serving clients (not locating a facility 
close enough to the client). In this situation, to mini- 


mize total cost there is a trade-off between establishing 
facilities and not serving clients. 

Let N(V, A) be a connected undirected network with node set V = {v_1, ..., v_m} and arc set A. Each arc a ∈ A has a given length l_a > 0. If we consider a = [v_i, v_j] as a line segment with length l_a, then any point on a can be defined by its distance from v_i (or from v_j). We denote by d(x, y) the shortest path distance on the arcs of N between the points x and y on N. If X is a set of points on N, then D(X, y) denotes the distance between y and a point in X which is closest to y.

Without loss of generality, we assume that the clients are located at the nodes of N. Furthermore, we assume that each node v_i ∈ V is the site of exactly one client. To specify the notion of ‘coverage’, for i = 1, ..., m, let the ‘covering radius’ r_i > 0 be associated with v_i. In order for client i to be covered, we require that at least one facility x be located so that d(x, v_i) ≤ r_i. Equivalently, if X is a set of located facilities, D(X, v_i) ≤ r_i.

In our version of the problem, we assume that facilities can only be located at members of a subset V' of V, where V' = {v_1, ..., v_n}, n ≤ m. Since both clients and facilities are located at nodes of N, we can define an m × n (0-1) covering matrix A, where a_ij = 1 if and only if d(v_j, v_i) ≤ r_i. Thus if a_ij = 1 and there is a facility located at v_j, then client i is covered. Let c_j > 0 be the cost of locating a facility at node v_j, j = 1, ..., n, and let b_i > 0 be the penalty cost associated with not covering client i. Finally, let z_i, i = 1, ..., m, be a (0-1) variable, and let x_j, j = 1, ..., n, be a (0-1) variable, where x_j = 1 means that a facility is located at v_j. Then we can formulate the covering problem as the following (0-1) integer programming problem:


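$$\text{(P)} \quad \min \sum_{j=1}^{n} c_j x_j + \sum_{i=1}^{m} b_i z_i$$

subject to:

$$\sum_{j=1}^{n} a_{ij} x_j + z_i \ge 1, \quad i = 1, \dots, m,$$

$$x_j \in \{0, 1\}, \quad j = 1, \dots, n,$$

$$z_i \in \{0, 1\}, \quad i = 1, \dots, m.$$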


In an optimal solution to problem (P), note that z_i = 1 only when client i is not covered, that is, only when x_j = 0 for every j where d(v_j, v_i) ≤ r_i.

For some versions of the problem there is a ‘budget constraint’ imposed on the facilities. This constraint usually takes the form:

$$\sum_{j=1}^{n} c_j x_j \le M. \qquad (1)$$

In the special case where each c_j = 1, the budget constraint simply limits the number of facilities that can be located (to a total of M). This problem, when each b_i = 1 and the cost of locating facilities is eliminated from the objective function, is often called the maximum coverage location problem, since when the objective function is minimized, a minimum number of clients are not covered and so a maximum number of clients are covered. Henceforth, we will only consider versions of problem (P) that do not include (1).
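To make the construction of A and of problem (P) concrete, the following Python sketch builds the covering matrix from shortest path distances (Floyd-Warshall) and solves (P) by enumerating all x ∈ {0,1}^n, setting z_i = 1 exactly when client i is uncovered, as noted above. It is only an illustration for tiny instances; the network (a path v_1 - v_2 - v_3 - v_4 with arcs of length 2, V' = V, all r_i = 2), the costs, and the function names are hypothetical.

```python
import math
from itertools import product

def shortest_paths(m, arcs):
    """Floyd-Warshall: d[i][j] = shortest path distance between nodes i and j."""
    d = [[0 if i == j else math.inf for j in range(m)] for i in range(m)]
    for i, j, length in arcs:
        d[i][j] = d[j][i] = min(d[i][j], length)
    for k in range(m):
        for i in range(m):
            for j in range(m):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def solve_covering(A, c, b):
    """Exhaustive search over x in {0,1}^n; z_i = 1 exactly when client i is uncovered."""
    m, n = len(A), len(c)
    best_cost, best_x = math.inf, None
    for x in product((0, 1), repeat=n):
        z = [0 if any(A[i][j] and x[j] for j in range(n)) else 1 for i in range(m)]
        cost = sum(cj * xj for cj, xj in zip(c, x)) + sum(bi * zi for bi, zi in zip(b, z))
        if cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x

d = shortest_paths(4, [(0, 1, 2), (1, 2, 2), (2, 3, 2)])
r = [2, 2, 2, 2]
# a_ij = 1 iff d(v_j, v_i) <= r_i
A = [[1 if d[j][i] <= r[i] else 0 for j in range(4)] for i in range(4)]
cost, x = solve_covering(A, c=[3, 2, 2, 3], b=[5, 5, 5, 5])
```

The optimum opens facilities at v_2 and v_3 (cost 2 + 2 = 4) and covers every client; any single facility leaves a client uncovered and pays a penalty of 5.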

Several applications of problem (P) and its variants 
have been reported in the literature (see [11] for a par- 
tial list of applications). Besides direct applications of 
the problem, in [9] it is shown that the p-center prob- 
lem on a network (the problem of locating no more 
than p facilities to minimize the maximum distance be- 
tween any client and its closest facility) can be solved 
by solving a sequence of covering problems. As further 
evidence of its applicability, in [7] it is shown that the 
uncapacitated facility location problem can be formulated as a covering problem. Thus, the covering problem is one of the most fundamental network location problems.

On a general network, the covering problem is 
known to be NP-hard. Heuristics for general (0-1) cov- 
ering problems have been well-studied ([3,4,5,8,10]). 

A popular heuristic is the greedy heuristic, which works as follows. (For notational convenience, in the following discussion of the heuristic we assume that each client must be covered. Thus the z_i variables can be removed from (P). If this is not the case, modifications to the heuristic are obvious.) At each iteration t, let A^t be the submatrix of A that consists of the rows and columns of A that have not been removed thus far. For each column j of A^t compute r_j = c_j/l_j, where l_j is the number of nonzero entries in column j of A^t. Set x_{j*} to 1, where j* is an index j for which r_j is minimum.

Remove j* from A^t as well as every row i where a_{ij*} = 1. The resulting matrix is A^{t+1}, and the heuristic continues until there are no nonzero entries remaining in the matrix.

Thus the heuristic chooses the column of A^t for which the corresponding facility covers uncovered clients at ‘least cost per coverage’. The procedure continues until all clients are covered.

There is a performance guarantee of the greedy heuristic that is independent of the cost coefficients, but does depend upon the entries in the matrix A. It can be shown that the ratio of the objective function value derived via the greedy approach to the optimal objective function value will not exceed

$$H(d) = \sum_{k=1}^{d} \frac{1}{k},$$

where d is the maximum number of nonzero entries in any column of A.
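The greedy heuristic and its bound can be sketched in a few lines of Python. The sketch assumes, as in the text, that every client must be covered; ties in r_j are broken by the smallest column index (our own choice), and exact fractions avoid rounding in the ratios. The instance (clients on a path, each site covering itself and its neighbours) and the function names are hypothetical.

```python
from fractions import Fraction

def greedy_cover(A, c):
    """Greedy heuristic for covering every row of the 0-1 matrix A at column costs c."""
    rows = set(range(len(A)))   # rows (clients) of A^t not yet removed
    cols = set(range(len(c)))   # columns (sites) of A^t not yet removed
    chosen = []
    while any(A[i][j] for i in rows for j in cols):
        ratios = {}
        for j in cols:
            l_j = sum(A[i][j] for i in rows)      # nonzeros of column j in A^t
            if l_j > 0:
                ratios[j] = Fraction(c[j], l_j)   # r_j = c_j / l_j
        j_star = min(sorted(ratios), key=ratios.get)  # ties go to the smallest index
        chosen.append(j_star)
        cols.discard(j_star)
        rows -= {i for i in rows if A[i][j_star]}  # clients now covered
    return chosen

def harmonic(d):
    """H(d) = 1 + 1/2 + ... + 1/d."""
    return sum(Fraction(1, k) for k in range(1, d + 1))

A = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
c = [3, 2, 2, 3]
chosen = greedy_cover(A, c)
d = max(sum(row[j] for row in A) for j in range(len(c)))
```

On this instance the heuristic first picks column 2 of the text's numbering (index 1, ratio 2/3), then the remaining client is covered by index 2, for a total cost of 4. Here d = 3, so the guarantee is H(3) = 11/6.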

When the underlying network N(V, A) is a tree T (a connected undirected network without cycles), problem (P) can be solved in polynomial time. A. Kolen and A. Tamir [7] have shown that in the case of a tree network, the rows and columns of A can be permuted (in polynomial time) so that the matrix A is in standard greedy form (does not contain the submatrix $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$). They then show that any covering problem (P) where A is in standard greedy form can be solved in O(mn) time. See [1] for the existence of covering problems where A is in standard greedy form, but where the problem cannot be derived from a tree network location problem. Thus, the Kolen-Tamir approach may be applicable to a broader class of covering problems on networks.
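The defining property of standard greedy form can be tested directly. The following Python sketch is a brute-force O(m²n²) check of our own (it is not the Kolen-Tamir permutation algorithm, which is not reproduced here): it reports whether a 0-1 matrix, in its current row and column order, avoids the submatrix with rows (1, 1) and (1, 0) taken from rows i < k and columns j < l.

```python
from itertools import combinations

def is_standard_greedy(A):
    """True if no rows i < k and columns j < l give the submatrix [[1, 1], [1, 0]]."""
    m, n = len(A), len(A[0])
    for i, k in combinations(range(m), 2):
        for j, l in combinations(range(n), 2):
            if A[i][j] == A[i][l] == A[k][j] == 1 and A[k][l] == 0:
                return False
    return True
```

For instance, the covering matrix of four clients on a path, each covered by its own site and its neighbours, [[1,1,0,0],[1,1,1,0],[0,1,1,1],[0,0,1,1]], passes the test, whereas [[1,1,0],[1,0,1]] does not.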

In the special case of (P) on a tree, where each c_j = 1 and every client must be covered (b_i is sufficiently large to prohibit noncoverage), even more efficient solution methods are possible (see [2,6], and [12]). With each c_j = 1, problem (P) reduces to the problem of minimizing the number of facilities located subject to the condition that every client is covered.

The procedure of B.C. Tansel, R.L. Francis, T.J. 
Lowe, and M.L. Chen [12] involves a physical ‘string 
model’. Their approach does not involve direct use of 
the formulation (P) and facilities can be located any- 
where (on nodes or arcs) on the tree T. The procedure 
begins by inscribing straight-line segments on a planar 
surface such that each segment represents an arc of T. 
Next, a string of length r_i is fastened to each node v_i of the inscribed tree.

The procedure involves pulling the strings toward the interior of the tree by first considering strings fastened to nodes at the tips of the tree. Facilities are located at points on the tree where the ends of tight strings reach. Once a facility is located, each remaining string that reaches the facility is removed from the model, since the corresponding client is covered by that facility.

Many engineering design problems and applications can be formulated as nonlinear programming problems in which the objective function to be optimized is nonlinear and has many local optima in its feasible region. It is desirable to find a local optimum that corresponds to a near global optimum. The problem of finding a global minimum or maximum is known as the global optimization problem. The main difficulty in finding a global optimum is that there are no operationally useful optimality conditions for identifying whether a point is indeed a global optimum, except in cases of specially structured problems [16].

In this article, one class of global optimization problems, namely combinatorial optimization problems, will be mainly discussed. Combinatorial optimization problems deal with maximizing or minimizing an objective function subject to inequality and/or equality constraints over a set of combinatorial alternatives, that is, problems whose decision variables are combinatorial or discrete. A naive way to solve a combinatorial optimization problem is to list all of its feasible solutions, evaluate the objective function at each of them, and choose the best one as an optimum solution. However, even though it is possible in principle to solve the problem in this way, in practice it is not, because of the large number of feasible solutions of any problem of reasonable size. Many combinatorial optimization problems are NP-complete [34]. For such problems, the time needed by all known exact algorithms grows exponentially with the size of the problem in the worst case, and no exact polynomial-time algorithm exists unless P = NP.


Over the last three decades, combinatorial optimization has been one of the most challenging problem areas and has received considerable attention. The traditional solution methodologies for combinatorial optimization problems can be categorized into three groups: exact, heuristic, and approximation methods [50]. Exact methods are guaranteed to obtain an optimum solution, but the computational time for obtaining it is usually an exponential function of the number of variables. The simplex method and branch and bound methods are some examples of exact methods. Heuristic methods are problem-specific methods that rely on empirical success without formal analysis of performance. Simulated annealing (SA), tabu search, and genetic algorithms are some examples of heuristic methods [45]. Approximation methods, on the other hand, generate feasible solutions that are near optimal. Methods based on neural networks (NNs) are perceived as approximation methods.

This article reviews the use of NNs for combina- 
torial optimization problems and provides a survey of 
most of the NN approaches that have been applied to 
combinatorial optimization problems. In Section 2, ba- 
sic definitions, classifications and applications of NNs 
are introduced. In Section 3, the mathematical basis of 
problem formulation for NNs based on an energy func- 
tion is explained. In Section 4, various combinatorial 
optimization problems studied on NNs are presented. 
Finally, Section 5 is the conclusion. 


Neural Networks 


Neural Network (NN) models are algorithms for intellectual tasks such as learning and optimization that are based on the concept of how the human brain works. An NN model is composed of a large number of processing elements called neurons. Each neuron is connected to other neurons by links, each with an associated weight. Neurons without incoming links are called input neurons, and neurons without outgoing links are called output neurons. The neurons are represented by state variables, which are functions of the weighted sum of input variables and other state variables. All neurons perform simple transformations simultaneously, in a parallel-distributed manner. The input-output relation of the transformation in a neuron is characterized by an activation function. Some examples of activation functions are the threshold, linear, and sigmoidal functions. The combination of input neurons, output neurons, and links between neurons with associated weights constitutes the architecture of the NN.

Mathematically, an NN model is a directed graph with the following properties [12]:

1) Each neuron (node) is associated with a state variable S_i and an activation threshold (bias) θ_i.

2) Each link (edge) between two neurons i and j is associated with a weight w_ij.

3) Each neuron (node) defines a transfer function f_i(S_j, w_ij, θ_i), which determines the state of the neuron.
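Properties 1) to 3) can be made concrete for a single neuron. In the Python sketch below we assume one common form of transfer function, namely an activation function applied to the weighted input sum minus the threshold θ_i; the function names are our own. With weights 1, 1 and threshold 1.5, a threshold neuron outputs 1 only when both inputs are 1, i.e. it acts as a logical AND unit with outputs 1 or 0.

```python
import math

def threshold(u):
    """Threshold (step) activation: 1 if u >= 0, else 0."""
    return 1.0 if u >= 0 else 0.0

def sigmoid(u):
    """Sigmoidal activation with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def neuron_state(weights, inputs, theta, activation=threshold):
    """State of one neuron: activation of (sum_j w_j * S_j - theta)."""
    u = sum(w * s for w, s in zip(weights, inputs)) - theta
    return activation(u)
```

Swapping `threshold` for `sigmoid` gives a graded output instead of a 0/1 state, which is the distinction between discrete and continuous neurons used later in this article.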
The classification of NNs can be done in different ways. According to the nature of activation, they can be categorized as deterministic and stochastic NNs. Deterministic NNs activate their states by using deterministic activation functions such as the threshold, linear, or sigmoidal functions. Most of the existing NN models are deterministic. Stochastic NNs activate their states according to a probability distribution. A typical example of stochastic NNs is the Boltzmann Machine (BM).

According to the nature of the connectivity, NNs can be categorized as feed-forward and recurrent NNs. If a directed graph has no closed paths, then it is called a feed-forward NN. A typical example of the feed-forward NN is the popular multilayer perceptron. Conversely, if a directed graph has closed paths, then it is called a recurrent NN. A typical example of the recurrent NN is the Hopfield NN ([9,12]).

The modern era of NNs is said to have begun with the introductory work of W.S. McCulloch and W. Pitts [28]. They proposed a general theory of information processing based on networks of neurons. Each of these neurons can only take the output values 1 or 0, representing the active and resting states of neurons, respectively. NNs have come a long way since the early days of McCulloch and Pitts, and have established themselves as an interdisciplinary subject with rich connections to neuroscience, psychology, the physical sciences, mathematics and engineering.

NNs have very close ties with optimization, and these ties manifest themselves in two main ways. On the one hand, many learning algorithms have been developed based on optimization techniques to train NNs to perform modeling tasks ([2,9,12,37,39]). On the other hand, NNs have been developed for solving optimization problems ([4,6,12,26,36,40,44,50,51]). Because of the inherently parallel and distributed information processing in NNs, they are promising computational models for solving large scale optimization problems in real time ([3,5,21,22,23,49]).


Neural Networks and Combinatorial 
Optimization Problems 


NNs have been proposed as a model of computation to solve combinatorial optimization problems. Solving a combinatorial optimization problem with NNs requires mapping the problem onto an NN in such a way that a solution can be identified from the outputs of the neurons. In other words, the key is to map the problem onto the architecture of an NN whose stable state represents the solution of the combinatorial optimization problem.

Combinatorial optimization problems can be solved 
using NNs by following two approaches. 

1) By formulating a combinatorial optimization prob- 
lem in terms of minimizing an energy function of 
either discrete or continuous variables ([1,3,4,5,8,13, 
15,35,36,40]). 

2) By designing competition based NNs in which neu- 
rons are allowed to compete to become active under 
certain conditions ([7,9]). 

These two approaches suggest that NNs are an alternative to other optimization techniques for solving combinatorial optimization problems. Motivations for using NNs include the improvement in the speed of operation through massively parallel computation and possible hardware design advantages. One of the main advantages of NN approaches over classical optimization approaches is the inherently parallel and distributed nature of the dynamic solution procedure. Therefore, NNs are considered capable of solving large scale optimization problems in real time ([3,5,21,22,23,49]).

NNs have been used to solve many combinatorial optimization problems since the pioneering work of J.J. Hopfield and D.W. Tank [15]. They formulated the traveling salesman problem (TSP) on a highly interconnected NN and made exploratory numerical studies on a 10-city and a 30-city TSP. They showed that an energy function can be defined for an analog Hopfield NN and that the NN always converges to a local minimum of this function. To find the (near) global minimum of the energy function, the Boltzmann Machine (BM) was proposed [2]. The BM was the first attempt to combine SA and NN architectures to solve combinatorial optimization problems. However, the BM is designed for discrete variables and has the disadvantage of being very slow. Another approach combining SA and NNs was proposed in [24]. In this study, a noise term was added to the neural dynamics, and the intensity of that noise term was shown to depend on the state of the neuron as well as on a temperature parameter T. By selecting an appropriate temperature schedule T(t), the resulting NN converged to a near global minimum of the NN energy function that is in the neighborhood of the global minimum of the combinatorial optimization problem. The Hopfield NN was extended to handle inequality constraints, and its convergence characteristics were analyzed by means of eigenvalue analysis [1].
The mathematical basis of the behavior of the Hopfield NN, by means of an idealization of the Hopfield network, was given in [47]. This study helped to give a better understanding of the relationship between NNs and combinatorial optimization. The analog neural solution of the combinatorial optimization problem was considered in [48], where the solution method was analyzed based on the Lagrange multiplier for the continuous relaxation of 0-1 integer programming. The theory and methodology of deterministic NNs for combinatorial optimization were presented in [50], and this study was extended to convex programming [51]. The use of NN methods for combinatorial optimization problems was reviewed ([13,26,44]). A systematic approach to designing competition-based NNs for combinatorial optimization was presented in [7], and competition-based NNs were studied in detail in [9].

The procedure of NN approaches to combinatorial optimization mostly begins with the formulation of an energy function. Ideally, the minimum of this energy function corresponds to the optimal solution of the combinatorial optimization problem. Most of the existing NN approaches to combinatorial optimization problems formulate an energy function by incorporating an objective function and constraints through functional transformation and numerical weighting. A functional transformation is usually used to convert constraints to a penalty function that penalizes constraint violations. Numerical weighting is often used to balance constraint satisfaction and objective minimization. In the NN formulation of a combinatorial optimization problem, constraint violations enter the energy function explicitly. In general, such an energy function has the following form:


$$E = \sum_{i} A_i \, (\text{constraint violation})_i + \text{cost}, \qquad (1)$$

where A_i > 0 and cost is an objective function that is independent of the constraint violations. By minimizing the energy function E, we attempt to minimize the cost while at the same time minimizing the constraint violations.
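The role of the weights A_i in (1) can be seen on a toy problem. The Python sketch below uses hypothetical costs and a single constraint, ‘select exactly one item’, encoded as the penalty (Σ_i S_i − 1)². It minimizes E by enumeration, which is meant only to show the balance between constraint satisfaction and cost, not how an NN would actually minimize E. With a large weight the minimizer selects the cheapest item; with too small a weight the constraint is violated, because selecting nothing is cheaper.

```python
from itertools import product

def energy(S, costs, A):
    """E = A * (constraint violation) + cost for the constraint 'choose exactly one item'."""
    violation = (sum(S) - 1) ** 2               # zero iff exactly one S_i equals 1
    cost = sum(c * s for c, s in zip(costs, S))
    return A * violation + cost

def minimize(costs, A):
    """Exhaustive minimization of E over binary states (for illustration only)."""
    return min(product((0, 1), repeat=len(costs)), key=lambda S: energy(S, costs, A))
```

With costs (3, 1, 2), the weight A = 10 gives the feasible minimizer (0, 1, 0) with E = 1, whereas A = 0.5 gives the infeasible state (0, 0, 0) with E = 0.5.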

The second step in designing NNs for combinatorial optimization problems is to derive a dynamic equation (also known as a state equation or equation of motion). The dynamic equation of the NN prescribes the motion of the activation states of the NN. A properly derived dynamic equation can ensure that the state of the NN reaches equilibrium and that the equilibrium state of the NN satisfies the constraints and optimizes the objective function of the problem. Currently, the dynamic equations of most NNs for optimization problems are derived by letting the time derivative of a state vector be directly proportional to the negative gradient of an energy function. The dynamic equations and energy functions for a wide variety of combinatorial optimization problems were studied in [28]. The last step is to determine the architecture of the NN in terms of neurons and connections based on the derived dynamic equation, in such a way that a solution can be identified from the outputs of the neurons.

The success of optimization using NNs lies in the 
formulation of an energy function, based on the objec- 
tive function and constraints of the given optimization 
problem, and the derivation of a dynamic equation of 
NNs, based on the formulated energy function. 

NN approaches to optimization problems started with the work of Hopfield [14], who pioneered an approach for solving minimization problems by utilizing the collective computational capabilities of NNs. He mapped the objective function and the problem constraints onto a quadratic energy function of the neural states that represents the energy of the system of neurons. The energy function had the following form:


$$E(S) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} C_{ij} S_i S_j - \sum_{i=1}^{N} S_i I_i, \qquad (2)$$

where S_i represents the output state of neuron i, C_ij represents the strength of the link between neurons i and j, and I_i represents the input bias; the input of neuron i is denoted x_i. The equations of motion of the neurons are defined by

$$\frac{dx_i}{dt} = \sum_{j \ne i} C_{ij} S_j + I_i, \qquad (3)$$

where S_i = f_i(x_i).
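A minimal Python sketch of the discrete model with states S_i ∈ {0, 1} follows. It assumes the common asynchronous threshold rule, S_i ← 1 if Σ_{j≠i} C_ij S_j + I_i > 0 and S_i ← 0 if it is negative, together with a symmetric C having zero diagonal; the random instance and function names are hypothetical. For such C, each change of a single state lowers the energy (2) by |Σ_{j≠i} C_ij S_j + I_i|, so the recorded energies never increase and the run stops at a local minimum.

```python
import random

def hopfield_energy(S, C, I):
    """Energy (2): -1/2 sum_ij C_ij S_i S_j - sum_i S_i I_i."""
    n = len(S)
    return (-0.5 * sum(C[i][j] * S[i] * S[j] for i in range(n) for j in range(n))
            - sum(S[i] * I[i] for i in range(n)))

def run_hopfield(C, I, S, max_sweeps=100):
    """Asynchronous 0/1 updates until no neuron changes; returns final state and energies."""
    S = list(S)
    energies = [hopfield_energy(S, C, I)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            h = sum(C[i][j] * S[j] for j in range(len(S)) if j != i) + I[i]
            new = 1 if h > 0 else 0 if h < 0 else S[i]   # keep the state when h == 0
            if new != S[i]:
                S[i] = new
                changed = True
                energies.append(hopfield_energy(S, C, I))
        if not changed:
            break
    return S, energies

rng = random.Random(0)
n = 6
C = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        C[i][j] = C[j][i] = rng.uniform(-1, 1)   # symmetric, zero diagonal
I = [rng.uniform(-0.5, 0.5) for _ in range(n)]
S, energies = run_hopfield(C, I, [rng.randint(0, 1) for _ in range(n)])
```

The final state is a fixed point of the update rule, but, as stated in the text, it is in general only a local minimum of the energy.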


Hopfield showed that for an NN with symmetric connections and nonnegative elements on the diagonal of the C matrix, the NN will always converge, by performing gradient descent, to a stable state, which is a local minimum of the energy function. Many difficult optimization problems can be formulated as the minimization of a quadratic form and thus can be mapped onto the NN model. Hopfield and Tank [15] extended the discrete model to an analog model where 0 ≤ S_i ≤ 1. The modified energy function is:


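$$E(S) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} C_{ij} S_i S_j - \sum_{i=1}^{N} S_i I_i + \sum_{i=1}^{N} \frac{1}{R_i} \int_{0}^{S_i} g_i^{-1}(s) \, ds, \qquad (4)$$

where R_i is the input resistance to unit i, and g_i is the activation function. The dynamics of the neuron update are defined by

$$\frac{dx_i}{dt} = -x_i + \sum_{j \ne i}^{N} C_{ij} S_j + I_i, \qquad (5)$$

where S_i = ½(1 + tanh(x_i)).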


This model is synchronous, continuous and determin- 
istic. 
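The continuous dynamics (5) can be simulated by forward Euler integration, updating all neurons at once. The Python sketch below uses a hypothetical three-neuron instance with small symmetric weights, chosen so that the dynamics settle quickly at an equilibrium; the step size `dt` and the number of steps are likewise our own choices. At the equilibrium the right-hand side of (5) vanishes, and every output S_i lies strictly between 0 and 1, as in the analog model.

```python
import math

def simulate(C, I, x0, dt=0.05, steps=4000):
    """Forward Euler for dx_i/dt = -x_i + sum_{j != i} C_ij S_j + I_i, S_i = (1 + tanh x_i)/2."""
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        S = [(1 + math.tanh(xi)) / 2 for xi in x]
        dx = [-x[i] + sum(C[i][j] * S[j] for j in range(n) if j != i) + I[i]
              for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]   # synchronous update of all neurons
    return x, [(1 + math.tanh(xi)) / 2 for xi in x]

C = [[0.0, 0.5, -0.5], [0.5, 0.0, 0.25], [-0.5, 0.25, 0.0]]
I = [0.1, -0.2, 0.3]
x, S = simulate(C, I, [0.0, 0.0, 0.0])
```

Forward Euler is the simplest choice here; it is adequate because the weights are small, and stiffer instances would need a smaller step or a better integrator.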

NNs that minimize the energy function are composed of either discrete or continuous neurons. The possibilities of using different types of NNs, such as discrete-state and discrete-time, continuous-state and discrete-time, and continuous-state and continuous-time networks, to solve combinatorial optimization problems were investigated in [29]. One usually prefers to use continuous neurons rather than discrete ones for two reasons. First, a continuous network tends to avoid oscillations between stable states. Second, its solutions are typically much better than those provided by discrete NNs, because the valleys of the energy landscape are wider and neuron outputs are not restricted to corners of the hypercube during convergence. NNs do not guarantee globally optimal solutions; they compute locally optimal solutions.

In the last two decades, spin glass theories of statistical physics have been extensively studied because of their applications to other areas such as optimization and neural networks ([10,30,46]). A similarity between an NN of the type proposed in [28] and a system of elementary magnetic spins was pointed out in [25]. These ideas were developed further in [14] by studying how such an NN or a spin system can store and retrieve information. The idea of an energy function was used to formulate a new way of understanding the computation performed by NNs.

The use of NNs and the mean field theory (MFT) of statistical physics ([3,4,5,17,18,20,31,40]) to solve combinatorial optimization problems was introduced by C. Peterson and J.R. Anderson [38]. This was the first detailed attempt to use the MFT and NNs for solving combinatorial optimization problems after a brief description by Hopfield and Tank [15]. The problem was mapped onto the NN such that a neuron being ‘on’ corresponded to a certain decision, and the system was then relaxed using the MFT in order to escape from local minima. In this study, the problem was mapped onto an Ising glass model ([10,11,31,40]). Another method for finding approximate solutions to combinatorial optimization problems using NNs and the MFT concept was presented in [40]; there, the problem was mapped onto a Potts glass model ([19,31,40,46,52]), and a mean field annealing (MFA) algorithm was described based on the MFT and SA. Similar approaches based on the MFT of statistical physics for solving combinatorial optimization problems have been studied ([4,5,49]). A general framework for the MFA algorithm was derived, and its relationship to Hopfield NNs was shown in [4]. An NN mapping and an MFT solution method for finding good solutions to combinatorial optimization problems containing inequality constraints, such as the knapsack problem, were developed ([32,33]).

A method to solve combinatorial optimization
problems based on a 'two-layer random field model'
using the technique of MFA was proposed ([17,18]).
This method determined appropriate values for the
weights of the two kinds of terms in the objective
function: a cost term that should be minimized and
a constraint term that expresses the constraints imposed
on solutions. A parallel MFT method and a new
temperature scheduling method, named the maximum
entropy cooling schedule, were proposed [43]. The
relationship between the MFT NN model and the
continuous-time Hopfield NN was analyzed [20] using
the theory of dynamical systems. This study showed
that the asynchronous MFT model is equivalent to the
continuous-time Hopfield NN with respect to the nature
of its fixed points, which theoretically justifies using the
MFT model in place of the continuous-time Hopfield NN
for solving combinatorial optimization problems.


Combinatorial Optimization Problems 


In this section, we list some of the combinatorial
optimization problems that have been solved using NNs.
Most of these problems are NP-complete [34]. The
mapping of combinatorial optimization problems onto
NNs was studied in [13], focusing on graph problems
such as graph K-partitioning, maximum graph matching,
and maximum clique. The use of NNs for combinatorial
optimization problems was reviewed in [26]. A number
of interesting combinatorial optimization problems, such
as graph partitioning, the traveling salesman problem,
vertex cover, maximum clique, maximum independent
set, number partitioning, maximum matching, set cover,
and graph coloring, were studied in [44] to show how
to map them onto NNs. An NN mapping and the MFT
solution method for finding good solutions to
combinatorial optimization problems containing
inequality constraints, such as the knapsack problem,
were developed [32]. The MFT approach to the knapsack
problem was extended to multiple knapsacks and
generalized assignment problems [33]. The following
three problems have been studied most extensively, by
several researchers using different approaches.


Quadratic Assignment Problem 


The quadratic assignment problem (QAP) represents
a large class of combinatorial optimization problems
arising in a variety of planning and design contexts.
The QAP seeks to minimize a quadratic cost function
for the assignment of a number of objects to positions,
and can be described mathematically as follows [50]:


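$$
\begin{aligned}
\min\ f(x) ={}& \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N}\sum_{l=1}^{N} c_{ijkl}\, x_{ij}\, x_{kl}\\
\text{s.t.}\quad & \sum_{i=1}^{N} x_{ij} = 1, \quad j = 1,\dots,N,\\
& \sum_{j=1}^{N} x_{ij} = 1, \quad i = 1,\dots,N,\\
& x_{ij} \in \{0,1\}, \quad i,j = 1,\dots,N,
\end{aligned}
\tag{6}
$$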


where c_{ijkl} denotes the cost associated with assigning
object i to position j and object k to position l, for
i, j, k, l = 1, ..., N.

An energy function for this problem can be written 
as follows [8]: 


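$$
E(S) = \frac{A}{2}\sum_{i=1}^{n}\sum_{j\neq i}\sum_{k=1}^{m}\sum_{l\neq k} c_{ikjl}\,S_{ik}S_{jl}
+ \frac{B}{2}\sum_{i=1}^{n}\sum_{k=1}^{m}\sum_{l\neq k} S_{ik}S_{il}
+ \frac{C}{2}\sum_{k=1}^{m}\Big(1-\sum_{i=1}^{n}S_{ik}\Big)^{2}. \tag{7}
$$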


The first term is just the cost function. The second term
expresses the constraint that at most one position can be
assigned to each object, and the third term expresses the
constraint that every object must be assigned exactly
once; this term rules out the trivial solution with no
assignments. The QAP was studied in ([6,7,8,50]).


Graph Partitioning Problem 


The graph partitioning (GP) problem can be defined as
follows: given a set of N nodes with a given
connectivity, partition them into K sets, each with N/K
nodes, such that the net connectivity between the sets
(the cut size) is minimal. If K = 2, the problem is known
as the graph bipartitioning problem.

An energy function for this problem can be written
as follows [42]. For each vertex i, a binary unit S_i = +1
or -1 is assigned, and for each pair of vertices i ≠ j,
a value T_{ij} = 1 is assigned if they are connected and
T_{ij} = 0 if they are not:

$$
E(S) = -\frac{1}{2}\sum_{i,j=1}^{N} T_{ij}S_iS_j + \frac{\alpha}{2}\Big(\sum_{i=1}^{N}S_i\Big)^{2}, \tag{8}
$$

where α is an imbalance parameter. The first term
corresponds to the cost function and the second term
corresponds to the constraint of the problem that
guarantees an equal partition. The product T_{ij}S_iS_j is 0
whenever vertices i and j are not connected, positive
whenever connected vertices i and j are in the same
partition, and negative when they are in different
partitions. The GP was studied in ([5,6,38,40,41,42,44,49]).
The graph bipartitioning problem was studied in
([4,6,36,40,41,42]). The graph K-partitioning problem was
studied in ([13,44]).
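The energy (8) can be evaluated directly, and a mean field annealing (MFA) scheme of the kind mentioned earlier replaces each spin S_i by its thermal average v_i = tanh(h_i/t), with the local field h_i = Σ_j T_{ij} v_j - α Σ_{j≠i} v_j obtained from -∂E/∂v_i, while the temperature t is lowered gradually. This is a minimal sketch under stated assumptions: the cooling schedule, parameter values, and test graph are hypothetical, and it is not the specific algorithm of [38] or [42].

```python
import math
import random


def gp_energy(T, S, alpha):
    """Energy (8): -1/2 sum_ij T_ij S_i S_j + alpha/2 (sum_i S_i)^2, with S_i in {+1, -1}."""
    n = len(S)
    pair_term = -0.5 * sum(T[i][j] * S[i] * S[j] for i in range(n) for j in range(n))
    return pair_term + 0.5 * alpha * sum(S) ** 2


def cut_size(T, S):
    """Number of edges whose endpoints lie in different parts."""
    n = len(S)
    return sum(T[i][j] for i in range(n) for j in range(i + 1, n) if S[i] != S[j])


def mean_field_annealing(T, alpha, t0=2.0, t_min=0.05, cooling=0.9, sweeps=20, seed=0):
    """Illustrative MFA: mean spins v_i = tanh(h_i / t), geometric cooling, then sign rounding."""
    rng = random.Random(seed)
    n = len(T)
    v = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # start near the symmetric point v = 0
    t = t0
    while t > t_min:
        for _ in range(sweeps):
            for i in range(n):  # asynchronous updates, one unit at a time
                # Local field h_i = -dE/dv_i, excluding the constant self-term of the balance penalty.
                h = sum(T[i][j] * v[j] for j in range(n)) - alpha * sum(v[j] for j in range(n) if j != i)
                v[i] = math.tanh(h / t)
        t *= cooling
    return [1 if x >= 0 else -1 for x in v]


# Hypothetical test graph: triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
T = [[0] * 6 for _ in range(6)]
for a, b in edges:
    T[a][b] = T[b][a] = 1

S = mean_field_annealing(T, alpha=0.5)
print(S, gp_energy(T, S, 0.5), cut_size(T, S))
```

The partition returned by the MFA loop depends on α and on the schedule; only the energy and cut-size evaluations are exact computations.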


Traveling Salesman Problem 


The traveling salesman problem (TSP) can be defined
as follows: given a list of N cities and the distances d_{ij}
between them, find the shortest possible tour through
the N cities that visits each one exactly once. Note
that the TSP is a special case of the GP problem where
K = N. For the n-city TSP, the number of possible
orderings of the cities is n!. However, a tour describes
only the cyclic order in which the cities are visited:
each closed tour can be started at any of the n cities and
traversed in either of two directions. Hence, for the
n-city TSP, there are 2n orderings of equal path length
for every tour, and therefore n!/2n distinct closed TSP
tours. A total of N = n^2 neurons is required to map the
n-city TSP onto an NN. For the N neurons of the TSP
network to compute a solution to the problem, the
network must be described by an energy function in
which the lowest energy state corresponds to the best
path. Hopfield and Tank [15] used the following energy
function to solve the TSP, where the output S_{Xi}
indicates whether or not city X is assigned to position i
in the tour:


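$$
\begin{aligned}
E(S) ={}& \frac{A}{2}\sum_{X}\sum_{i}\sum_{j\neq i} S_{Xi}S_{Xj}
+ \frac{B}{2}\sum_{i}\sum_{X}\sum_{Y\neq X} S_{Xi}S_{Yi}
+ \frac{C}{2}\Big(\sum_{X}\sum_{i} S_{Xi} - n\Big)^{2}\\
&+ \frac{D}{2}\sum_{X}\sum_{Y\neq X}\sum_{i} d_{XY}\,S_{Xi}\big(S_{Y,i+1}+S_{Y,i-1}\big),
\end{aligned}
\tag{9}
$$

where the position subscripts are taken modulo n.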


The first three terms characterize the general problem
constraints. The first triple sum is zero if and only if
each city row X contains no more than one '1' (the rest
of the entries being zero). The second triple sum is zero
if and only if each 'position in tour' column contains
no more than one '1' (the rest of the entries being zero).
The third term is zero if and only if there are exactly n
entries equal to '1' in the entire matrix. The last triple
sum is the cost term, which measures the length of the
path corresponding to a given tour. The TSP was studied
in ([6,15,17,18,26,27,35,40,41,42,43,50]).
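The counting argument and the role of each term in (9) can be checked numerically for small n. The sketch below assumes the tour matrix is a list of lists S[X][i] (city X at tour position i) and uses a hypothetical symmetric 4-city distance matrix; it only evaluates the energy (9) and does not implement the Hopfield-Tank network dynamics.

```python
from itertools import permutations
from math import factorial


def distinct_tours(n):
    """n!/(2n): among the n! orderings, each closed tour appears n (start) x 2 (direction) times."""
    return factorial(n) // (2 * n)


def brute_force_distinct_tours(n):
    """Count orderings up to rotation and reversal by storing one canonical representative each."""
    seen = set()
    for p in permutations(range(n)):
        variants = []
        for q in (p, p[::-1]):
            variants.extend(q[k:] + q[:k] for k in range(n))
        seen.add(min(variants))
    return len(seen)


def hopfield_tank_energy(S, d, A, B, C, D):
    """Energy (9) for an n x n 0/1 matrix S[X][i]; tour positions are cyclic (mod n)."""
    n = len(S)
    rows = sum(S[X][i] * S[X][j] for X in range(n) for i in range(n) for j in range(n) if j != i)
    cols = sum(S[X][i] * S[Y][i] for i in range(n) for X in range(n) for Y in range(n) if Y != X)
    count = (sum(map(sum, S)) - n) ** 2
    length = sum(d[X][Y] * S[X][i] * (S[Y][(i + 1) % n] + S[Y][(i - 1) % n])
                 for X in range(n) for Y in range(n) if Y != X for i in range(n))
    return A / 2 * rows + B / 2 * cols + C / 2 * count + D / 2 * length


# Hypothetical distances; the tour 0-1-2-3-0 has length 4.
d = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
tour = [[1 if X == i else 0 for i in range(4)] for X in range(4)]  # city X at position X
print(distinct_tours(5), brute_force_distinct_tours(5), hopfield_tank_energy(tour, d, 1, 1, 1, 1))
```

For a valid tour matrix the three constraint terms vanish and each tour edge is counted twice in the last sum, so the energy equals D times the tour length.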


In addition to these problems, the following problems
have also been studied:


The Airline Crew Scheduling [21]. 

The Constraint Satisfaction Problem [26]. 

The Generalized Assignment Problem ([7,33]). 
The Graph Coloring Problem [44]. 

The Job Scheduling [7]. 

The Knapsack Problem ([1,7,32,33,36,41,42]). 
The Maximum Clique Problem ([13,22,23,44]). 
The Maximum Independent Set Problem [44]. 
The Maximum Matching Problem ([13,44]). 
The Morphism Problems [13]. 

The Number Partitioning [44]. 

The Satellite Broadcasting Scheduling [3]. 

The School Scheduling ([41,42]). 

The Set Cover Problem [44]. 

The Vertex Cover Problem [44]. 


Conclusions 


The use of NNs for the combinatorial optimization 
problems and the NN approaches that have been ap- 
plied to combinatorial optimization problems were re- 
viewed. These approaches suggest that NNs are an al- 
ternative for solving combinatorial optimization prob- 
lems as compared to other optimization techniques. 
Motivations for using NNs include the improvement
in operating speed through massively parallel
computation, and possible hardware design advantages.
One of the main advantages of NN approaches over
classical optimization approaches is the inherently
parallel and distributed nature of the dynamic solution
procedure. NNs are therefore promising computational
models for solving large-scale optimization problems in
real time, which is essential in many engineering design,
control, and optimization problems.


See also 


> Neuro-dynamic Programming 

> Replicator Dynamics in Combinatorial 
Optimization 

> Unconstrained Optimization in Neural Network 
Training 
Neuro-dynamic programming (NDP for short) is a
relatively new class of dynamic programming methods for
control and sequential decision making under uncer- 
tainty. These methods have the potential of dealing with 
problems that for a long time were thought to be in- 
tractable due to either a large state space or the lack 
of an accurate model. The methods discussed combine 
ideas from the fields of neural networks, artificial in- 
telligence, cognitive science, simulation, and approxi- 
mation theory. In this article, we delineate the major 
conceptual issues, we survey a number of recent devel- 
opments, we describe some computational experience, 
and we address a number of open questions. 
We consider systems where decisions are made in
stages. The outcome of each decision is not fully pre- 
dictable but can be anticipated to some extent before 
the next decision is made. Each decision results in some 
immediate cost but also affects the context in which 
future decisions are to be made and therefore affects 
the cost incurred in future stages. Dynamic program- 
ming (DP for short) provides a mathematical formal- 
ization of the trade-off between immediate and future 
costs. 

Generally, in DP formulations there is a discrete- 
time dynamic system whose state evolves according to 
given transition probabilities that depend on a deci- 
sion/control u. In particular, if we are in state i and we 
choose decision u, we move to state j with given
probability p_{ij}(u). Simultaneously with this transition,
we incur a cost g(i, u, j). However, when comparing the available
decisions u, it is not enough to look at the magnitude of 
the cost g(i, u, j); we must also take into account how 
desirable the next state j is. We thus need a way to rank 
or rate states j. This is done by using the optimal cost 
(over all remaining stages) starting from state j, which 
is denoted by J*(j). These costs can be shown to satisfy 
some form of Bellman’s equation 


$$
J^*(i) = \min_u E\{g(i,u,j) + J^*(j) \mid i,u\}, \quad \text{for all } i,
$$


where j is the state subsequent to i, and E{· | i, u} denotes
expected value with respect to j, given i and u. Gener- 
ally, at each state i, it is optimal to use a control u that 
attains the minimum above. Thus, decisions are ranked 
based on the sum of the expected cost of the present 
period, and the optimal expected cost of all subsequent 
periods. 
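Bellman's equation can be solved numerically by value iteration, J ← TJ with (TJ)(i) = min_u E{g(i, u, j) + J(j) | i, u}. The sketch below assumes a tiny, hypothetical stochastic shortest path problem with a cost-free terminal state 0, a setting in which this undiscounted form of Bellman's equation holds; the dictionary layout of the model is an illustrative choice. For this model J*(1) = 1, and J*(2) solves J*(2) = min{1 + 0.5 J*(1) + 0.5 J*(2), 5}, giving J*(2) = 3.

```python
# model[i][u] = list of (p_ij(u), j, g(i, u, j)); state 0 is a cost-free terminal state.
# Hypothetical two-state stochastic shortest path example.
MODEL = {
    1: {"a": [(1.0, 0, 1.0)]},
    2: {"a": [(0.5, 1, 1.0), (0.5, 2, 1.0)], "b": [(1.0, 0, 5.0)]},
}


def bellman_update(J, model):
    """(TJ)(i) = min_u sum_j p_ij(u) * (g(i, u, j) + J(j)); the terminal state keeps J(0) = 0."""
    new_J = {0: 0.0}
    for i, controls in model.items():
        new_J[i] = min(sum(p * (g + J[j]) for p, j, g in outcomes) for outcomes in controls.values())
    return new_J


def value_iteration(model, tol=1e-10, max_iter=10_000):
    """Repeat J <- TJ until successive iterates differ by less than tol."""
    J = {0: 0.0, **{i: 0.0 for i in model}}
    for _ in range(max_iter):
        new_J = bellman_update(J, model)
        if max(abs(new_J[i] - J[i]) for i in J) < tol:
            return new_J
        J = new_J
    return J


J = value_iteration(MODEL)
print(J)  # converges toward J*(1) = 1, J*(2) = 3
```

Each sweep touches every state and control, which is the computational burden behind the 'curse of dimensionality' when the state space is large.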

The objective of DP is to calculate numerically the 
optimal cost function J*. This computation can be done 
off-line, i. e., before the real system starts operating. An 
optimal policy, that is, an optimal choice of u for each 
i, is computed either simultaneously with J*, or in real 
time by minimizing in the right-hand side of Bellman’s 
equation. It is well known, however, that for many im- 
portant problems the computational requirements of 
DP are overwhelming, mainly because of a very large 
number of states and controls (Bellman’s ‘curse of di- 
mensionality’). In such situations a suboptimal solution 
is required. 


Cost Approximations in Dynamic Programming 


NDP methods are suboptimal methods that center 
around the approximate evaluation of the optimal cost 
function J*, possibly through the use of neural net- 
works and/or simulation. In particular, we replace the
optimal cost J*(j) with a suitable approximation J̃(j, r),
where r is a vector of parameters, and we use at state i
the (suboptimal) control μ̃(i) that attains the minimum
in the (approximate) right-hand side of Bellman's
equation

$$
\tilde\mu(i) = \arg\min_u E\{g(i,u,j) + \tilde J(j,r) \mid i,u\}.
$$

The function J̃ will be called the scoring function, and
the value J̃(j, r) will be called the score of state j. The
general form of J̃ is known and is such that, once the
parameter vector r is determined, the evaluation of
J̃(j, r) for any state j is fairly simple.
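A minimal sketch of the one-step lookahead rule above, assuming a hypothetical toy model (transition probabilities, next states, and costs stored in a dictionary, with a cost-free terminal state 0) and a hypothetical linear scoring function J̃(j, r) = r_1 j + r_2. It shows that the quality of the resulting control depends on how well J̃ approximates J*.

```python
# model[i][u] = list of (p_ij(u), j, g(i, u, j)); hypothetical toy model, state 0 terminal.
MODEL = {
    1: {"a": [(1.0, 0, 1.0)]},
    2: {"a": [(0.5, 1, 1.0), (0.5, 2, 1.0)], "b": [(1.0, 0, 5.0)]},
}


def j_tilde(j, r):
    """Hypothetical scoring function J~(j, r) = r[0] * j + r[1]; the terminal state scores 0."""
    return 0.0 if j == 0 else r[0] * j + r[1]


def lookahead_control(i, model, r):
    """Suboptimal control mu~(i) = argmin_u E{g(i, u, j) + J~(j, r) | i, u}."""
    controls = model[i]
    return min(controls, key=lambda u: sum(p * (g + j_tilde(j, r)) for p, j, g in controls[u]))


print(lookahead_control(2, MODEL, (2.0, -1.0)))  # scores equal J* here, so "a" is chosen
print(lookahead_control(2, MODEL, (5.0, 0.0)))   # pessimistic scores make "b" look better
```

With r = (2, -1) the scores coincide with the optimal costs of this toy model and the lookahead picks the optimal control; with r = (5, 0) the overestimated scores lead it to the costlier control "b".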

We note that in some problems the minimization
over u of the expression

$$
E\{g(i,u,j) + \tilde J(j,r) \mid i,u\}
$$

may be too complicated or too time-consuming for
making decisions in real time, even if the scores J̃(j, r)
are simply calculated. In such problems we may use a
related technique, whereby we approximate the
expression minimized in Bellman's equation,

$$
Q(i,u) = E\{g(i,u,j) + J^*(j) \mid i,u\},
$$

which is known as the Q-factor corresponding to (i, u).
In particular, we replace Q(i, u) with a suitable
approximation Q̃(i, u, r), where r is a vector of
parameters. We then use at state i the (suboptimal)
control that minimizes the approximate Q-factor
corresponding to i:

$$
\tilde\mu(i) = \arg\min_u \tilde Q(i,u,r).
$$

Much of what will be said about approximation of the
optimal cost function also applies to approximation of
Q-factors. We thus focus primarily on approximation
of the optimal cost function J*.
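Once Q-factors are available, the control is chosen by a plain minimization over u, with no expectation over next states at decision time. The sketch below computes exact Q-factors for a hypothetical toy model from its optimal costs (which satisfy Bellman's equation for this model) and extracts the greedy policy; in NDP, Q would be replaced by an approximation Q̃(i, u, r).

```python
# model[i][u] = list of (p_ij(u), j, g(i, u, j)); hypothetical toy model, state 0 terminal.
MODEL = {
    1: {"a": [(1.0, 0, 1.0)]},
    2: {"a": [(0.5, 1, 1.0), (0.5, 2, 1.0)], "b": [(1.0, 0, 5.0)]},
}
J_STAR = {0: 0.0, 1: 1.0, 2: 3.0}  # optimal costs of this toy model


def q_factors(model, J):
    """Q(i, u) = E{g(i, u, j) + J(j) | i, u} for every state-control pair."""
    return {(i, u): sum(p * (g + J[j]) for p, j, g in outcomes)
            for i, controls in model.items() for u, outcomes in controls.items()}


def greedy_policy(Q):
    """mu(i) = argmin_u Q(i, u); only the Q-factors are needed, not the model."""
    policy = {}
    for (i, u), q in Q.items():
        if i not in policy or q < Q[(i, policy[i])]:
            policy[i] = u
    return policy


Q = q_factors(MODEL, J_STAR)
print(Q)
print(greedy_policy(Q))
```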

We are interested in problems with a large number of
states and in scoring functions J̃ that can be described
with relatively few numbers (a vector r of small
dimension). Scoring functions involving few parameters
are called compact representations, while the tabular
description of J* is called the lookup table representation.
Thus, in a lookup table representation, the values J*(j)
are stored in a table for all states j. In a typical compact
representation, only the vector r and the general
structure of the scoring function J̃(·, r) are stored; the
scores J̃(j, r) are generated only when needed. For
example, J̃(j, r) may be the output of some neural
network in response to the input j, with r the associated
vector of weights or parameters of the neural network;
or J̃(j, r) may involve a lower-dimensional description
of the state j in terms of its 'significant features', with r
the associated vector of relative weights of the features.
Thus, determining the scoring function J̃(j, r) involves
two complementary issues:

1) deciding on the general structure of the function
J̃(j, r); and
2) calculating the parameter vector r so as to minimize
in some sense the error between the functions J*(·)
and J̃(·, r).
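For issue 2), the simplest case is a linear architecture J̃(j, r) = r_1 φ_1(j) + r_2 φ_2(j) fitted by least squares to target values at sample states. The sketch below assumes such targets are available (in NDP they would have to come from simulation, since J* itself is unknown) and solves the 2 × 2 normal equations directly; the features φ(j) = (j, 1) and the data are hypothetical.

```python
def fit_linear_architecture(phi, targets):
    """Least-squares r for J~(j, r) = r[0]*phi[j][0] + r[1]*phi[j][1] via the 2x2 normal equations."""
    a11 = sum(f[0] * f[0] for f in phi)
    a12 = sum(f[0] * f[1] for f in phi)
    a22 = sum(f[1] * f[1] for f in phi)
    b1 = sum(f[0] * t for f, t in zip(phi, targets))
    b2 = sum(f[1] * t for f, t in zip(phi, targets))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("features are linearly dependent on the sample states")
    # Cramer's rule for [[a11, a12], [a12, a22]] r = [b1, b2].
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Hypothetical sample states j = 1..5 with features (j, 1) and cost targets.
phi = [(float(j), 1.0) for j in range(1, 6)]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]
r = fit_linear_architecture(phi, targets)
print(r)
```

Only the two numbers in r need to be stored, however many states share this structure, which is the point of a compact representation.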

Approximations of the optimal cost function have been 
used in the past in a variety of DP contexts. Chess play- 
ing programs represent a successful example. A key idea 
in these programs is to use a position evaluator to rank 
different chess positions and to select at each turn a 
move that results in the position with the best rank. 
The position evaluator assigns a numerical value to each 
position, according to a heuristic formula that includes 
weights for the various features of the position
(material balance, piece mobility, king safety, and other
factors). Thus, the position evaluator corresponds to the
scoring function J̃(j, r) above, while the weights of the
features correspond to the parameter vector r. Usually, 
some general structure of position evaluator is selected 
(this is largely an art that has evolved over many years, 
based on experimentation and human knowledge about 
chess), and the numerical weights are chosen by trial 
and error or (as in the case of the champion program 
Deep Thought) by ‘training’ using a large number of 
sample grandmaster games. 

As the chess program paradigm suggests, intuition 
about the problem, heuristics, and trial and error are 
all important ingredients for constructing cost approx- 
imations in DP. However, it is important to supplement 
heuristics and intuition with more systematic tech- 
niques that are broadly applicable and retain as much 
as possible the nonheuristic aspects of DP. 

NDP aims to develop a methodological foundation
for combining dynamic programming, compact
representations, and simulation to provide the basis for
a rational approach to complex stochastic decision
problems.


Approximation Architectures 


An important issue in function approximation is the
selection of architecture, that is, the choice of a
parametric class of functions J̃(·, r) or Q̃(·, ·, r) that suits
the problem at hand. One possibility is to use a neural
network architecture of some type. We should emphasize
here that in this article we use the term 'neural network'
in a very broad sense, essentially as a synonym for
'approximating architecture'. In particular, we do not restrict
ourselves to the classical multilayer perceptron struc- 
ture with sigmoidal nonlinearities. Any type of univer- 
sal approximator of nonlinear mappings could be used 
in our context. The nature of the approximating struc- 
ture is left open in our discussion, and it could involve, 
for example, radial basis functions, wavelets, polynomi- 
als, splines, etc. 

Cost approximation can often be significantly en- 
hanced through the use of feature extraction, a process 
that maps the state i into some vector f (i), called the fea- 
ture vector associated with the state i. Feature vectors 
summarize, in a heuristic sense, what are considered 
to be important characteristics of the state, and they 
are very useful in incorporating the designer’s prior 
knowledge or intuition about the problem and about 
the structure of the optimal controller. For example, in
a queueing system involving several queues, a feature
vector may include, for each queue, a three-valued
indicator that specifies whether the queue is 'nearly empty',
‘moderately busy’, or ‘nearly full’. In many cases, anal- 
ysis can complement intuition to suggest the right fea- 
tures for the problem at hand. 
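The queueing example can be made concrete with a small feature-extraction function. The thresholds separating 'nearly empty', 'moderately busy', and 'nearly full' are hypothetical fractions of each queue's capacity; the point is that many distinct states i (vectors of queue lengths) share the same feature vector f(i), which has only 3^q possible values for q queues.

```python
def queue_indicator(length, capacity, low=0.2, high=0.8):
    """0 = 'nearly empty', 1 = 'moderately busy', 2 = 'nearly full' (hypothetical thresholds)."""
    fill = length / capacity
    if fill <= low:
        return 0
    if fill >= high:
        return 2
    return 1


def feature_vector(queue_lengths, capacities):
    """f(i): one three-valued indicator per queue for the state i = queue_lengths."""
    return tuple(queue_indicator(q, c) for q, c in zip(queue_lengths, capacities))


print(feature_vector((1, 9, 3), (10, 10, 10)))
```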

Feature vectors are particularly useful when they
can capture the 'dominant nonlinearities' in the optimal
cost function J*. By this we mean that J*(i) can be
approximated well by a 'relatively smooth' function
Ĵ(f(i)); this happens, for example, if through a change
of variables from states to features, the function J*
becomes a (nearly) linear or low-order polynomial
function of the features. When a feature vector can be
chosen to have this property, one may consider
approximation architectures where both features and
(relatively simple) neural networks are used together. In
particular, the state is mapped to a feature vector, which
is then used as input to a neural network that produces
the score of the state. More generally, it is possible that
both the state and the feature vector are provided as
inputs to the neural network.

A simple method to obtain more sophisticated
approximations is to partition the state space into
several subsets and construct a separate cost function
approximation in each subset. For example, by using
a linear or quadratic polynomial approximation in
each subset of the partition, one can construct piece- 
wise linear or piecewise quadratic approximations over 
the entire state space. An important issue here is the 
choice of the method for partitioning the state space. 
Regular partitions (e.g., grid partitions) may be used, 
but they often lead to a large number of subsets and 
very time-consuming computations. Generally speak- 
ing, each subset of the partition should contain ‘simi- 
lar’ states so that the variation of the optimal cost over 
the states of the subset is relatively smooth and can 
be approximated with smooth functions. An interest- 
ing possibility is to use features as the basis for parti- 
tion. In particular, one may use a more or less regular 
discretization of the space of features, which induces a 
possibly irregular partition of the original state space. In 
this way, each subset of the irregular partition contains 
states with ‘similar features’. 
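A minimal sketch of the partitioning idea, assuming a scalar state x, sampled cost values, and a hypothetical feature that induces the partition; a separate least-squares line is fitted in each subset, giving a piecewise linear approximation. The kinked target |x - 5| is made-up data chosen so that one global line fits poorly while two pieces fit exactly.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares y = a*x + b for (x, y) pairs with at least two distinct x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def fit_piecewise(samples, feature):
    """Fit a separate linear approximation in each subset {x : feature(x) = k}."""
    groups = defaultdict(list)
    for x, y in samples:
        groups[feature(x)].append((x, y))
    return {k: fit_line(pts) for k, pts in groups.items()}


def evaluate(pieces, feature, x):
    """Score of state x using the line fitted in its subset."""
    a, b = pieces[feature(x)]
    return a * x + b


def feature(x):
    """Hypothetical two-valued feature: 0 below 5, 1 otherwise."""
    return 0 if x < 5 else 1


samples = [(x, abs(x - 5)) for x in range(11)]
pieces = fit_piecewise(samples, feature)
print(pieces, evaluate(pieces, feature, 2), evaluate(pieces, feature, 8))
```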


Simulation and Training 


Some of the most successful applications of neural net- 
works are in the areas of pattern recognition, nonlin- 
ear regression, and nonlinear system identification. In 
these applications the neural network is used as a uni- 
versal approximator: the input-output mapping of the 
neural network is matched to an unknown nonlinear 
mapping F of interest using a least squares optimiza- 
tion. This optimization is known as training the net- 
work. To perform training, one must have some train- 
ing data, that is, a set of pairs (i, F(i)), which is repre- 
sentative of the mapping F that is approximated. 

It is important to note that, in contrast with these
neural network applications, in the DP context there is
no readily available training set of input-output pairs
(i, J*(i)) that can be used to approximate J* with a least
squares fit. The only possibility is to evaluate (exactly
or approximately) by simulation the cost functions of
given (suboptimal) policies, and to try to iteratively
improve these policies based on the simulation outcomes.
This creates analytical and computational difficulties
that do not arise in classical neural network training
contexts. Indeed, the use of simulation to evaluate
approximately the optimal cost function is a key new
idea that distinguishes the methodology of this
presentation from earlier approximation methods in DP.
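The simplest way to obtain such cost samples is Monte Carlo simulation of a fixed policy: run trajectories, add up the costs, and average. The sketch assumes a hypothetical toy model that is used only as a simulator (its probabilities serve to draw next states, never to compute expectations); the averaged costs, or individual pairs (i, simulated cost), could then serve as least squares targets in place of the unavailable J*(i).

```python
import random

# model[i][u] = list of (p_ij(u), j, g(i, u, j)); hypothetical toy model, state 0 terminal.
MODEL = {
    1: {"a": [(1.0, 0, 1.0)]},
    2: {"a": [(0.5, 1, 1.0), (0.5, 2, 1.0)], "b": [(1.0, 0, 5.0)]},
}
POLICY = {1: "a", 2: "a"}  # hypothetical fixed policy to be evaluated


def simulate_cost(start, model, policy, rng):
    """Simulate one trajectory under the policy until the terminal state 0; return its total cost."""
    state, total = start, 0.0
    while state != 0:
        outcomes = model[state][policy[state]]
        _, next_state, g = rng.choices(outcomes, weights=[p for p, _, _ in outcomes])[0]
        total += g
        state = next_state
    return total


def monte_carlo_cost(start, model, policy, episodes=20_000, seed=0):
    """Average of simulated trajectory costs: an estimate of the policy's cost from `start`."""
    rng = random.Random(seed)
    return sum(simulate_cost(start, model, policy, rng) for _ in range(episodes)) / episodes


print(monte_carlo_cost(2, MODEL, POLICY))  # close to the exact value 3 for this toy model
```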

Using simulation offers another major advantage: it 
allows the methods of this article to be used for sys- 
tems that are hard to model but easy to simulate; that 
is, in problems where an explicit model is not avail- 
able, and the system can only be observed, either as it 
operates in real time or through a software simulator. 
For such problems, the traditional DP techniques are 
inapplicable, and estimation of the transition probabil- 
ities to construct a detailed mathematical model is often 
cumbersome or impossible. 

There is a third potential advantage of simulation: 
it can implicitly identify the ‘most important’ or ‘most 
representative’ states of the system. It appears plausible 
that if these states are the ones most often visited dur- 
ing the simulation, the scoring function will tend to ap- 
proximate better the optimal cost for these states, and 
the suboptimal policy obtained will perform better. 


Neuro-Dynamic Programming 


The name ‘neuro-dynamic programming’ expresses the 
reliance of the methods of this article on both DP and 
neural network concepts. In the artificial intelligence 
community, where the methods originated, the name 
reinforcement learning is also used. In common arti- 
ficial intelligence terms, the methods allow systems to 
‘learn how to make good decisions by observing their 
own behavior, and use built-in mechanisms for im- 
proving their actions through a reinforcement mech- 
anism’. In less anthropomorphic DP terms, ‘observing 
their own behavior’ relates to simulation, and ‘improv- 
ing their actions through a reinforcement mechanism’ 
relates to iterative schemes for improving the quality of 
approximation of the optimal cost function, or the Q- 
factors, or the optimal policy. There has been a grad- 
ual realization that reinforcement learning techniques 
can be fruitfully motivated and interpreted in terms 
of classical DP concepts such as value and policy it- 
eration; see the survey [1], and the book [6], which 
point out the connections between the artificial
intelligence/reinforcement learning viewpoint and the
control theory/DP viewpoint, and give many references.

The currently most popular methodology in NDP
iteratively adjusts the parameter vector r of the scoring
function J̃(·, r) as it produces sample state trajectories
(i_0, ..., i_k, i_{k+1}, ...) by using simulation. These
trajectories correspond either to a fixed stationary policy
or to a 'greedy' policy that applies, at state i, the control
u that minimizes the expression

$$
E\{g(i,u,j) + \tilde J(j,r) \mid i,u\},
$$

where r is the current parameter vector. A central
notion here is that of a temporal difference, defined by

$$
d_k = g(i_k,u_k,i_{k+1}) + \tilde J(i_{k+1},r) - \tilde J(i_k,r),
$$

which expresses the difference between our expected
cost estimate J̃(i_k, r) at state i_k and the predicted cost
estimate g(i_k, u_k, i_{k+1}) + J̃(i_{k+1}, r) based on the
outcome of the simulation. If the cost approximations
were exact, the average temporal difference would be
zero by Bellman's equation. Thus, roughly speaking,
the values of the temporal differences can be used to
make incremental adjustments to r so as to bring about
an approximate equality (on average) between expected
and predicted cost estimates along the simulated
trajectories. This viewpoint, formalized by R.S. Sutton
in [5], can be implemented through the use of gradient
descent/stochastic approximation methodology. Sutton
proposed a family of methods of this type, called
TD(λ) and parameterized by a scalar λ ∈ [0, 1]. One
extreme, TD(1), is closely related to Monte-Carlo
simulation and least squares parameter estimation, while
the other extreme, TD(0), is closely related to stochastic
approximation. A related method is Q-learning,
introduced by C.J.C.H. Watkins [9], which is a stochastic
approximation-like method that iterates on the
Q-factors. While there is convergence analysis of TD(λ)
and Q-learning for the case of lookup table
representations (see [8,4]), the situation is much less clear
in the case of compact representations. A number of
results have been derived for approximate policy and
value iteration methods, which are obtained from the
traditional DP methods after compact representations of
the various cost functions involved are introduced.
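A minimal sketch of TD(0) for evaluating a fixed policy with a lookup table representation, the case for which convergence analysis exists: each simulated transition moves J(i_k) by a small constant step in the direction of the temporal difference d_k. The toy model, policy, step size, and episode count are hypothetical choices; a compact representation would instead update the parameter vector r.

```python
import random

# model[i][u] = list of (p_ij(u), j, g(i, u, j)); hypothetical toy model, state 0 terminal.
MODEL = {
    1: {"a": [(1.0, 0, 1.0)]},
    2: {"a": [(0.5, 1, 1.0), (0.5, 2, 1.0)], "b": [(1.0, 0, 5.0)]},
}
POLICY = {1: "a", 2: "a"}  # hypothetical fixed policy


def td0(model, policy, start=2, episodes=10_000, step=0.002, seed=0):
    """TD(0) policy evaluation with a lookup table: J(i_k) += step * d_k after each transition."""
    rng = random.Random(seed)
    J = {0: 0.0, **{i: 0.0 for i in model}}  # the terminal state 0 is never updated
    for _ in range(episodes):
        i = start
        while i != 0:
            outcomes = model[i][policy[i]]
            _, j, g = rng.choices(outcomes, weights=[p for p, _, _ in outcomes])[0]
            d = g + J[j] - J[i]  # temporal difference d_k
            J[i] += step * d
            i = j
    return J


print(td0(MODEL, POLICY))  # should approach J(1) = 1, J(2) = 3 for this policy
```

A constant step leaves some residual noise in the estimates; a diminishing step size is the usual choice when exact convergence is required.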


While the theoretical support for the NDP method- 
ology is only now emerging, there have been quite a few 
reports of successes with problems too large and com- 
plex to be treated in any other way. A particularly im- 
pressive success is the development of a backgammon 
playing program as reported by G. Tesauro [7]. Here a 
neural network provided a compact representation of 
the optimal cost function of the game of backgammon 
by using simulation and TD(λ). The training was
performed by letting the program play against itself.
After training for several months, the program nearly
defeated the human world champion. Variations of the
method used by Tesauro have been used with success 
in a variety of applications. 

The recent experience of several researchers, involv- 
ing several engineering applications, has confirmed that 
NDP methods can be impressively effective in problems 
where traditional DP methods would hardly be applicable
and other heuristic methods would have a limited
chance of success. We note, however, that the practi- 
cal application of NDP is computationally very inten- 
sive, and often requires a considerable amount of trial 
and error. Fortunately, all the computation and exper- 
imentation with different approaches can be done off- 
line. Once the approximation is obtained off-line, it can 
be used to generate decisions fast enough for use in 
real time. In this context, we mention that in the ma- 
chine learning literature, reinforcement learning is of- 
ten viewed as an ‘on-line’ method, whereby the cost ap- 
proximation is improved as the system operates in real 
time. This is reminiscent of the methods of traditional 
adaptive control. 

Extensive references for the material of this article 
are the research monographs [3,6]. A more limited text- 
book discussion is given in [2]. The survey [1], and the 
overviews [10,11], and other papers in the edited vol- 
ume [12] point out the connections between the artifi- 
cial intelligence/reinforcement learning viewpoint and 
the control theory/DP viewpoint, and give many refer- 
ences. 
Problems
New hybrid conjugate gradient algorithms are proposed
and analyzed. In these hybrid algorithms the parameter
β_k is computed as a convex combination of the
Polak-Ribière-Polyak and Dai-Yuan conjugate gradient
parameters. In one hybrid algorithm the parameter of
the convex combination is computed in such a way that
the conjugacy condition is satisfied, independently of the
line search. In the other, the parameter of the convex
combination is computed in such a way that the
conjugate gradient direction is the Newton direction.
The algorithms use the standard Wolfe line search
conditions. Numerical comparisons with conjugate
gradient algorithms on a set of 750 unconstrained
optimization problems, some of them from the CUTE
library, show that the hybrid computational scheme
based on the conjugacy condition outperforms the known
hybrid conjugate gradient algorithms.
Introduction


Let us consider the nonlinear unconstrained
optimization problem

$$
\min\{f(x) : x \in \mathbb{R}^n\}, \tag{1}
$$

where f: R^n → R is a continuously differentiable
function, bounded from below. For solving this problem,
starting from an initial guess x_0 ∈ R^n, a nonlinear
conjugate gradient method generates a sequence {x_k} as

$$
x_{k+1} = x_k + \alpha_k d_k, \tag{2}
$$

where α_k > 0 is obtained by line search, and the
directions d_k are generated as

$$
d_{k+1} = -g_{k+1} + \beta_k s_k, \qquad d_0 = -g_0. \tag{3}
$$


In (3) β_k is known as the conjugate gradient
parameter, s_k = x_{k+1} - x_k and g_k = ∇f(x_k). Let ‖·‖
denote the Euclidean norm and define y_k = g_{k+1} - g_k.
The line search in conjugate gradient algorithms is
often based on the standard Wolfe conditions:

$$
f(x_k + \alpha_k d_k) - f(x_k) \le \rho\,\alpha_k\, g_k^T d_k, \tag{4}
$$

$$
g_{k+1}^T d_k \ge \sigma\, g_k^T d_k, \tag{5}
$$

where d_k is a descent direction and 0 < ρ < σ < 1.
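Conditions (4) and (5) are straightforward to test for a trial step. The sketch below assumes x and d are plain Python lists, and the defaults ρ = 10^-4, σ = 0.9 are common illustrative choices rather than values prescribed by the text; on f(x) = x_1² + x_2² it rejects a step that is too long (condition (4) fails) and one that is too short (condition (5) fails).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def satisfies_wolfe(f, grad, x, d, alpha, rho=1e-4, sigma=0.9):
    """Check the standard Wolfe conditions (4) and (5) for the step alpha along d."""
    gd = dot(grad(x), d)  # g_k^T d_k, negative for a descent direction
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) - f(x) <= rho * alpha * gd  # condition (4)
    curvature = dot(grad(x_new), d) >= sigma * gd              # condition (5)
    return sufficient_decrease and curvature


def f(x):
    return x[0] ** 2 + x[1] ** 2


def grad(x):
    return [2 * x[0], 2 * x[1]]


x, d = [1.0, 0.0], [-2.0, 0.0]  # d = -g at x
for alpha in (0.01, 0.5, 1.0):
    print(alpha, satisfies_wolfe(f, grad, x, d, alpha))
```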
Plenty of conjugate gradient methods are known, and
an excellent survey of these methods, with special
attention to their global convergence, was given by
Hager and Zhang [19]. Different conjugate gradient
algorithms correspond to different choices for the scalar
parameter β_k. Some of these methods, such as those of
Fletcher and Reeves (FR) [16], Dai and Yuan (DY) [12]
and conjugate descent (CD) proposed by Fletcher [15],
have strong convergence properties, but they may have
modest practical performance owing to jamming:

$$
\beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}, \qquad
\beta_k^{DY} = \frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}, \qquad
\beta_k^{CD} = \frac{g_{k+1}^T g_{k+1}}{-g_k^T s_k}.
$$


On the other hand, the methods of Polak and
Ribière [23] and Polyak (PRP) [24], Hestenes and Stiefel
(HS) [20] or Liu and Storey (LS) [22] may in general not
be convergent, but they often have better computational
performance:

$$
\beta_k^{PRP} = \frac{g_{k+1}^T y_k}{g_k^T g_k}, \qquad
\beta_k^{HS} = \frac{g_{k+1}^T y_k}{y_k^T s_k}, \qquad
\beta_k^{LS} = \frac{g_{k+1}^T y_k}{-g_k^T s_k}.
$$

In this contribution we focus on hybrid conjugate
gradient methods. These methods are combinations of
different conjugate gradient algorithms; they are mainly
proposed to avoid the jamming phenomenon. One of
the first hybrid conjugate gradient algorithms was
introduced by Touati-Ahmed and Storey [28], where the
parameter β_k is computed as

$$
\beta_k =
\begin{cases}
\beta_k^{PRP} = \dfrac{g_{k+1}^T y_k}{\|g_k\|^2}, & \text{if } 0 \le \beta_k^{PRP} \le \beta_k^{FR},\\[2mm]
\beta_k^{FR} = \dfrac{\|g_{k+1}\|^2}{\|g_k\|^2}, & \text{otherwise.}
\end{cases}
$$


The PRP method has a built-in restart feature that di- 
rectly addresses jamming. Indeed, when the step s; 
is small, then the factor y, in the numerator of pee 
tends to zero. Therefore, B;*? becomes small and the 
search direction d+, is very close to the steepest de- 
scent direction —g;,+1. Hence, when the iterations jam, 
the method of Touati-Ahmed and Storey uses the PRP 
computational scheme. 
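A minimal sketch of the Touati-Ahmed and Storey rule (the function name is hypothetical). The first call mimics a jamming step, where $y_k$ is nearly zero and the small PRP value is kept; the second has a negative PRP value, so FR takes over.

```python
import numpy as np

def beta_ts(g, g_new):
    """Touati-Ahmed and Storey hybrid: beta^PRP if 0 <= beta^PRP <= beta^FR, else beta^FR."""
    gg = g @ g
    b_prp = g_new @ (g_new - g) / gg
    b_fr = g_new @ g_new / gg
    return b_prp if 0.0 <= b_prp <= b_fr else b_fr

g = np.array([1.0, 0.0])
print(beta_ts(g, np.array([1.0, 0.01])))   # tiny step: y_k ~ 0, so the small PRP value is returned
print(beta_ts(g, np.array([0.5, 0.0])))    # beta^PRP = -0.25 < 0, so beta^FR = 0.25 is returned
```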

Another hybrid conjugate gradient method was given by Hu and Storey [21], where $\beta_k$ in (3) is

$$\beta_k^{HuS} = \max\{0, \min(\beta_k^{PRP}, \beta_k^{FR})\}.$$

As above, when the method of Hu and Storey is jamming, the PRP method is used instead.

The combination of the LS and CD conjugate gradient methods leads to the following hybrid method:

$$\beta_k^{LS\text{-}CD} = \max\{0, \min(\beta_k^{LS}, \beta_k^{CD})\}.$$

The CD method of Fletcher [15] is very similar to the FR method. With an exact line search, the CD method is identical to the FR method. Similarly, for an exact line search, the LS method is identical to the PRP method. Therefore, the hybrid LS-CD method with an exact line search performs similarly to the hybrid method of Hu and Storey.

Gilbert and Nocedal [17] suggested a combination of the PRP and FR methods as

$$\beta_k^{GN} = \max\{-\beta_k^{FR}, \min(\beta_k^{PRP}, \beta_k^{FR})\}.$$

Since $\beta_k^{FR}$ is always nonnegative, the lower bound $-\beta_k^{FR}$ is nonpositive, so $\beta_k^{GN}$ can be negative. The method of Gilbert and Nocedal has the same advantage of avoiding jamming.
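The Hu-Storey, LS-CD and Gilbert-Nocedal rules differ only in the bounds applied to the PRP or LS value. The sketch below (hypothetical helper name) evaluates all three on one small example in which $\beta_k^{PRP} = \beta_k^{LS} = -0.25$: the first two clip the value to zero, while GN keeps the negative value because its lower bound is $-\beta_k^{FR}$.

```python
import numpy as np

def hybrid_betas(g, g_new, s):
    """Hu-Storey, LS-CD and Gilbert-Nocedal hybrids (s = s_k, y_k = g_{k+1} - g_k)."""
    y = g_new - g
    gg, nn, gs = g @ g, g_new @ g_new, g @ s
    prp, fr = g_new @ y / gg, nn / gg
    ls, cd = g_new @ y / -gs, nn / -gs
    return {
        "HuS": max(0.0, min(prp, fr)),
        "LS-CD": max(0.0, min(ls, cd)),
        "GN": max(-fr, min(prp, fr)),
    }

h = hybrid_betas(np.array([1.0, 0.0]), np.array([0.5, 0.0]), np.array([-1.0, 0.0]))
print(h)   # HuS and LS-CD are clipped to 0, GN stays negative (-0.25)
```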

Using the standard Wolfe line search, the DY method always generates descent directions and, if the gradient is Lipschitz-continuous, the method is globally convergent. In an effort to improve their algorithm, Dai and Yuan [13] combined it with other conjugate gradient algorithms and proposed the following two hybrid methods:

$$\beta_k^{hDY} = \max\{-c\,\beta_k^{DY}, \min\{\beta_k^{HS}, \beta_k^{DY}\}\},$$

$$\beta_k^{hDYz} = \max\{0, \min\{\beta_k^{HS}, \beta_k^{DY}\}\},$$

where $c = (1-\sigma)/(1+\sigma)$. For the standard Wolfe conditions (4) and (5), under the Lipschitz continuity of the gradient, Dai and Yuan [13] established the global convergence of these hybrid computational schemes.
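A sketch of the two Dai-Yuan hybrids, assuming $\sigma$ is the same Wolfe parameter used in (5) ($\sigma = 0.9$ by default here, matching the experiments); the helper name is hypothetical.

```python
import numpy as np

def dai_yuan_hybrids(g, g_new, s, sigma=0.9):
    """Return (beta^hDY, beta^hDYz); sigma is the Wolfe parameter of (5)."""
    y = g_new - g
    ys = y @ s
    hs, dy = g_new @ y / ys, g_new @ g_new / ys
    c = (1.0 - sigma) / (1.0 + sigma)
    return max(-c * dy, min(hs, dy)), max(0.0, min(hs, dy))

# Here beta^HS = -0.5 < 0: hDY returns the lower bound -c * beta^DY, hDYz returns 0.
h, hz = dai_yuan_hybrids(np.array([1.0, 0.0]), np.array([0.5, 0.0]), np.array([-1.0, 0.0]))
print(h, hz)
```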

In the following we propose another hybrid conjugate gradient algorithm as a convex combination of the PRP and DY conjugate gradient algorithms. We selected these two methods for combination in a hybrid conjugate gradient algorithm because the PRP algorithm has good computational properties, on the one hand, and the DY algorithm has strong convergence properties, on the other hand. Often the PRP method performs better in practice than the DY method, and we exploit this in order to obtain a conjugate gradient algorithm that performs well in practice. The structure of this chapter is as follows. In Sect. “New Hybrid Conjugate Gradient Algorithms” we introduce our hybrid conjugate gradient algorithm and prove that it generates descent directions which, under some conditions, satisfy the sufficient descent condition. Section “The New Hybrid Algorithms (CCOMB, NDOMB)” presents the algorithms, and in Sect. “Convergence Analysis” we give the convergence analysis. In Sect. “Numerical Experiments” we present some numerical experiments and the Dolan-Moré [14] performance profiles of this new hybrid conjugate gradient algorithm and of some other conjugate gradient algorithms. The performance profiles corresponding to a set of 750 unconstrained optimization problems, taken from the CUTE test problem library [6] and from the unconstrained optimization problems presented in [1], show that this hybrid conjugate gradient algorithm outperforms the known hybrid conjugate gradient algorithms.
New Hybrid Conjugate Gradient Algorithms

The iterates $x_0, x_1, x_2, \ldots$ of our algorithm are computed by means of recurrence (2), where the step size $\alpha_k > 0$ is determined according to the Wolfe conditions (4) and (5), and the directions $d_k$ are generated by the rule

$$d_{k+1} = -g_{k+1} + \beta_k^N s_k, \qquad d_0 = -g_0, \qquad (6)$$

where

$$\beta_k^N = (1-\theta_k)\,\beta_k^{PRP} + \theta_k\,\beta_k^{DY} = (1-\theta_k)\,\frac{g_{k+1}^T y_k}{g_k^T g_k} + \theta_k\,\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}, \qquad (7)$$

and $\theta_k$ is a scalar parameter satisfying $0 \le \theta_k \le 1$, which needs to be determined. Observe that if $\theta_k = 0$, then $\beta_k^N = \beta_k^{PRP}$, and if $\theta_k = 1$, then $\beta_k^N = \beta_k^{DY}$. On the other hand, if $0 < \theta_k < 1$, then $\beta_k^N$ is a convex combination of $\beta_k^{PRP}$ and $\beta_k^{DY}$.

Referring to the PRP method, Polak and Ribière [23] proved that when the function f is strongly convex and the line search is exact, the PRP method is globally convergent. In an effort to understand the behavior of the PRP method, Powell [25] showed that if the step length $s_k = x_{k+1} - x_k$ approaches zero, the line search is exact and the gradient $\nabla f(x)$ is Lipschitz-continuous, then the PRP method is globally convergent. Additionally, assuming that the search direction is a descent direction, Yuan [29] established the global convergence of the PRP method for strongly convex functions and a Wolfe line search.
For general nonlinear functions the convergence of the PRP method is uncertain. Powell [26] gave a three-dimensional example, in which the function to be minimized is not strongly convex, showing that even with an exact line search the PRP method may not converge to a stationary point. Later on, Dai [7] presented another example, this time with a strongly convex function, for which the PRP method fails to generate a descent direction. Therefore, theoretically, the convergence of the PRP method is limited to strongly convex functions. For general nonlinear functions the convergence of the PRP method is established only under restrictive conditions (Lipschitz continuity, exact line search and step sizes tending to zero). However, the numerical experiments presented, for example, by Gilbert and Nocedal [17] showed that the PRP method is one of the best conjugate


New Hybrid Conjugate Gradient Algorithms for Unconstrained Optimization 


2563 


gradient methods, and this was the main motivation to 
consider it in (7). 

On the other hand, the DY method always generates descent directions, and in [8] Dai established a remarkable property of the DY conjugate gradient algorithm, relating the descent directions to the sufficient descent condition. It was shown that if there exist positive constants $\gamma_1$ and $\gamma_2$ such that $\gamma_1 \le \|g_k\| \le \gamma_2$ for all k, then for any $p \in (0,1)$ there exists a constant $c > 0$ such that the sufficient descent condition $g_i^T d_i \le -c\|g_i\|^2$ holds for at least $\lfloor pk \rfloor$ indices $i \in [0,k]$, where $\lfloor j \rfloor$ denotes the largest integer $\le j$. This property is the main reason we consider the DY method in (7).

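It is easy to see that

$$d_{k+1} = -g_{k+1} + (1-\theta_k)\,\frac{g_{k+1}^T y_k}{g_k^T g_k}\, s_k + \theta_k\,\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}\, s_k. \qquad (8)$$

Supposing that $d_k$ is a descent direction ($d_0 = -g_0$), for the algorithm given by (2) and (8) we can prove the following result.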


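Theorem 1 Assume that $\alpha_k$ in algorithm (2) and (8) is determined by the Wolfe line search (4) and (5). If $0 < \theta_k < 1$ and

$$-\frac{g_k^T s_k}{y_k^T s_k}\,\|g_{k+1}\|^2 > \frac{(g_{k+1}^T y_k)(g_{k+1}^T s_k)}{\|g_k\|^2}, \qquad (9)$$

then the direction $d_{k+1}$ given by (8) is a descent direction.

Proof Since $0 < \theta_k < 1$, from (8) we get

$$\begin{aligned} g_{k+1}^T d_{k+1} &= -\|g_{k+1}\|^2 + (1-\theta_k)\,\frac{(g_{k+1}^T y_k)(g_{k+1}^T s_k)}{\|g_k\|^2} + \theta_k\,\frac{\|g_{k+1}\|^2\, g_{k+1}^T s_k}{y_k^T s_k} \\ &\le -\|g_{k+1}\|^2 + \frac{(g_{k+1}^T y_k)(g_{k+1}^T s_k)}{\|g_k\|^2} + \frac{\|g_{k+1}\|^2\, g_{k+1}^T s_k}{y_k^T s_k} \\ &= -\left(1 - \frac{g_{k+1}^T s_k}{y_k^T s_k}\right)\|g_{k+1}\|^2 + \frac{(y_k^T g_{k+1})(g_{k+1}^T s_k)}{\|g_k\|^2} \\ &= \frac{g_k^T s_k}{y_k^T s_k}\,\|g_{k+1}\|^2 + \frac{(y_k^T g_{k+1})(g_{k+1}^T s_k)}{\|g_k\|^2}. \end{aligned}$$

But $y_k^T s_k > 0$ by (5), since $y_k^T s_k = \alpha_k (g_{k+1}^T d_k - g_k^T d_k) \ge \alpha_k(\sigma - 1)\, g_k^T d_k > 0$, and since $g_k^T s_k < 0$, it follows that

$$\frac{g_k^T s_k}{y_k^T s_k}\,\|g_{k+1}\|^2 < 0.$$

Therefore, from (9), it follows that $g_{k+1}^T d_{k+1} < 0$, i.e., the direction $d_{k+1}$ is a descent one. □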


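Theorem 2 Suppose that $(g_{k+1}^T y_k)(g_{k+1}^T s_k) \le 0$. If $0 < \theta_k < 1$, then the direction $d_{k+1}$ given by (8) satisfies the sufficient descent condition

$$g_{k+1}^T d_{k+1} \le -\left(1 - \theta_k\,\frac{g_{k+1}^T s_k}{y_k^T s_k}\right)\|g_{k+1}\|^2. \qquad (10)$$

Proof From (8) we have

$$\begin{aligned} g_{k+1}^T d_{k+1} &= -\|g_{k+1}\|^2 + (1-\theta_k)\,\frac{(g_{k+1}^T y_k)(g_{k+1}^T s_k)}{g_k^T g_k} + \theta_k\,\frac{\|g_{k+1}\|^2\, g_{k+1}^T s_k}{y_k^T s_k} \\ &= -\left(1 - \theta_k\,\frac{g_{k+1}^T s_k}{y_k^T s_k}\right)\|g_{k+1}\|^2 + (1-\theta_k)\,\frac{(g_{k+1}^T y_k)(g_{k+1}^T s_k)}{g_k^T g_k} \\ &\le -\left(1 - \theta_k\,\frac{g_{k+1}^T s_k}{y_k^T s_k}\right)\|g_{k+1}\|^2 < 0. \end{aligned}$$

Observe that $y_k^T s_k > 0$ by (5) and that $g_{k+1}^T s_k = y_k^T s_k + g_k^T s_k < y_k^T s_k$, since $g_k^T s_k < 0$. If $g_{k+1}^T s_k \le 0$, the factor $1 - \theta_k\, g_{k+1}^T s_k / y_k^T s_k$ is at least 1. Otherwise $y_k^T s_k / g_{k+1}^T s_k > 1$, so $0 < \theta_k < 1$ implies $\theta_k < y_k^T s_k / g_{k+1}^T s_k$. In both cases

$$1 - \theta_k\,\frac{g_{k+1}^T s_k}{y_k^T s_k} > 0,$$

proving the theorem. □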
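Theorem 2 rests on an algebraic identity: the middle line of the proof differs from the bound (10) only by the term $(1-\theta_k)(g_{k+1}^T y_k)(g_{k+1}^T s_k)/\|g_k\|^2$, which is nonpositive under the hypothesis. The randomized check below is our own sanity test, not part of the source. It samples vectors with $g_k^T s_k < 0$ and $y_k^T s_k > 0$ (as a Wolfe step would give), keeps those satisfying the hypothesis, and verifies both (10) and the negativity of its right-hand side. The helper name `direction_8` is hypothetical.

```python
import numpy as np

def direction_8(g, g_new, s, theta):
    """Hybrid direction (8): d_{k+1} = -g_{k+1} + beta_k^N s_k."""
    y = g_new - g
    beta = (1 - theta) * (g_new @ y) / (g @ g) + theta * (g_new @ g_new) / (y @ s)
    return -g_new + beta * s

rng = np.random.default_rng(0)
checked = 0
for _ in range(10_000):
    g, g_new = rng.standard_normal(5), rng.standard_normal(5)
    s = -0.5 * g                      # s_k = alpha_k d_k with d_k = -g_k, so g_k^T s_k < 0
    y = g_new - g
    theta = rng.uniform(0.01, 0.99)
    if y @ s <= 0 or (g_new @ y) * (g_new @ s) > 0:
        continue                      # skip samples violating y_k^T s_k > 0 or the hypothesis
    lhs = g_new @ direction_8(g, g_new, s, theta)
    rhs = -(1 - theta * (g_new @ s) / (y @ s)) * (g_new @ g_new)   # right side of (10)
    assert lhs <= rhs + 1e-9 * max(1.0, abs(lhs), abs(rhs)) and rhs < 0
    checked += 1
print(checked, "samples satisfied (10)")
```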


To select the parameter $\theta_k$ we consider the following two possibilities. In the first hybrid conjugate gradient algorithm, the parameter $\theta_k$ is selected in such a manner that the conjugacy condition $y_k^T d_{k+1} = 0$ is satisfied at every iteration, independent of the line search. Hence, substituting (8) into $y_k^T d_{k+1} = 0$, multiplying by $g_k^T g_k$ and solving for $\theta_k$, we get

$$\theta_k = \theta_k^{CCOMB} = \frac{(y_k^T g_{k+1})(y_k^T s_k) - (y_k^T g_{k+1})(g_k^T g_k)}{(y_k^T g_{k+1})(y_k^T s_k) - \|g_{k+1}\|^2\,\|g_k\|^2}. \qquad (11)$$
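The sketch below (hypothetical helper names) evaluates (11), including the $\theta_k = 0$ safeguard for a vanishing denominator that Step 4 of the algorithm adopts, and confirms on a small hand-checked example that the resulting direction (8) satisfies $y_k^T d_{k+1} = 0$.

```python
import numpy as np

def theta_ccomb(g, g_new, s):
    """theta_k from the conjugacy condition y_k^T d_{k+1} = 0, formula (11)."""
    y = g_new - g
    yg, ys, gg, nn = y @ g_new, y @ s, g @ g, g_new @ g_new
    den = yg * ys - nn * gg
    return 0.0 if den == 0.0 else (yg * ys - yg * gg) / den   # theta_k = 0 if den = 0

def direction_8(g, g_new, s, theta):
    """Hybrid direction (8)."""
    y = g_new - g
    beta = (1 - theta) * (g_new @ y) / (g @ g) + theta * (g_new @ g_new) / (y @ s)
    return -g_new + beta * s

g, g_new, s = np.array([1.0, 0.0]), np.array([0.2, 0.6]), np.array([-1.0, 0.0])
theta = theta_ccomb(g, g_new, s)
d = direction_8(g, g_new, s, theta)
y = g_new - g
print(theta, y @ d)   # theta is 1/6 up to rounding; y_k^T d_{k+1} vanishes up to rounding
```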

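In the second algorithm the parameter $\theta_k$ is selected in such a manner that the direction $d_{k+1}$ from (8) is the Newton direction, i.e.,

$$-\nabla^2 f(x_{k+1})^{-1} g_{k+1} = -g_{k+1} + (1-\theta_k)\,\frac{g_{k+1}^T y_k}{g_k^T g_k}\, s_k + \theta_k\,\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}\, s_k. \qquad (12)$$

Having in view that $\nabla^2 f(x_{k+1}) s_k = y_k$, premultiplying (12) by $s_k^T \nabla^2 f(x_{k+1})$ and solving for $\theta_k$, we get

$$\theta_k = \theta_k^{NDOMB} = \frac{(y_k^T g_{k+1} - s_k^T g_{k+1})\,\|g_k\|^2 - (g_{k+1}^T y_k)(y_k^T s_k)}{\|g_{k+1}\|^2\,\|g_k\|^2 - (g_{k+1}^T y_k)(y_k^T s_k)}. \qquad (13)$$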
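Similarly, formula (13) can be checked through the scalar identity it is derived from, $y_k^T d_{k+1} = -s_k^T g_{k+1}$. In the hand-checked example below (hypothetical helper names), $\theta_k^{NDOMB} = -1/14$ lies outside $[0,1]$, which illustrates why a clipping rule is needed.

```python
import numpy as np

def theta_ndomb(g, g_new, s):
    """theta_k from the Newton-direction condition, formula (13)."""
    y = g_new - g
    yg, ys, gg, nn = y @ g_new, y @ s, g @ g, g_new @ g_new
    den = nn * gg - yg * ys
    if den == 0.0:                     # same safeguard as for (11)
        return 0.0
    return ((yg - s @ g_new) * gg - yg * ys) / den

def direction_8(g, g_new, s, theta):
    """Hybrid direction (8)."""
    y = g_new - g
    beta = (1 - theta) * (g_new @ y) / (g @ g) + theta * (g_new @ g_new) / (y @ s)
    return -g_new + beta * s

g, g_new, s = np.array([1.0, 1.0]), np.array([0.2, 0.6]), np.array([-1.0, 0.0])
theta = theta_ndomb(g, g_new, s)       # -1/14: outside [0, 1]
d = direction_8(g, g_new, s, theta)
y = g_new - g
print(np.isclose(y @ d, -(s @ g_new)))  # True: (12) premultiplied by s_k^T Hessian holds
```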


Observe that the parameter $\theta_k$ given by (11) or (13) can lie outside the interval $[0,1]$. However, in order to have a true convex combination in (7), the following rule is used: if $\theta_k \le 0$, then set $\theta_k = 0$ in (7), i.e., $\beta_k^N = \beta_k^{PRP}$; if $\theta_k \ge 1$, then take $\theta_k = 1$ in (7), i.e., $\beta_k^N = \beta_k^{DY}$. Therefore, under this rule for selecting $\theta_k$, the direction $d_{k+1}$ in (8) combines the properties of the PRP and DY algorithms.
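The clipping rule amounts to projecting $\theta_k$ onto $[0, 1]$ before forming (7). A minimal sketch (hypothetical function name):

```python
import numpy as np

def beta_n(g, g_new, s, theta):
    """beta_k^N of (7) with the clipping rule: theta <= 0 gives PRP, theta >= 1 gives DY."""
    y = g_new - g
    b_prp = g_new @ y / (g @ g)
    b_dy = g_new @ g_new / (y @ s)
    theta = min(max(theta, 0.0), 1.0)          # enforce a genuine convex combination
    return (1.0 - theta) * b_prp + theta * b_dy

g, g_new, s = np.array([1.0, 0.0]), np.array([0.5, 0.0]), np.array([-1.0, 0.0])
vals = [beta_n(g, g_new, s, t) for t in (-0.3, 0.0, 0.5, 1.0, 1.7)]
print(vals)   # the ends reproduce beta^PRP = -0.25 and beta^DY = 0.5
```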


The New Hybrid Algorithms (CCOMB, NDOMB) 


Step 1 Initialization. Select $x_0 \in \mathbb{R}^n$ and the parameters $0 < \rho < \sigma < 1$. Compute $f(x_0)$ and $g_0$. Set $d_0 = -g_0$ and the initial guess $\alpha_0 = 1/\|g_0\|$.

Step 2 Test for continuation of iterations. If $\|g_k\|_\infty \le 10^{-6}$, then stop.

Step 3 Line search. Compute $\alpha_k > 0$ satisfying the Wolfe line search conditions (4) and (5) and update the variables $x_{k+1} = x_k + \alpha_k d_k$. Compute $f(x_{k+1})$, $g_{k+1}$, $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$.

Step 4 $\theta_k$ parameter computation. If $(y_k^T g_{k+1})(y_k^T s_k) - \|g_{k+1}\|^2\|g_k\|^2 = 0$, then set $\theta_k = 0$; otherwise compute $\theta_k$ as follows:
CCOMB algorithm ($\theta_k$ from the conjugacy condition): $\theta_k = \theta_k^{CCOMB}$ from (11).
NDOMB algorithm ($\theta_k$ from the Newton direction): $\theta_k = \theta_k^{NDOMB}$ from (13).

Step 5 $\beta_k^N$ conjugate gradient parameter computation. If $0 < \theta_k < 1$, then compute $\beta_k^N$ as in (7). If $\theta_k \ge 1$, then set $\beta_k^N = \beta_k^{DY}$. If $\theta_k \le 0$, then set $\beta_k^N = \beta_k^{PRP}$.

Step 6 Direction computation. Compute $d = -g_{k+1} + \beta_k^N s_k$. If the restart criterion of Powell

$$|g_{k+1}^T g_k| > 0.2\,\|g_{k+1}\|^2 \qquad (14)$$

is satisfied, then set $d_{k+1} = -g_{k+1}$; otherwise define $d_{k+1} = d$. Compute the initial guess $\alpha_k = \alpha_{k-1}\|d_{k-1}\|/\|d_k\|$, set $k = k+1$ and continue with Step 2.
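Steps 1-6 can be sketched in Python as below. This is not the author's Fortran code: the Wolfe line search is a simple bisection/expansion scheme of our own choosing, and the extra test `g_new @ d_new >= 0`, which restarts with $-g_{k+1}$ whenever the hybrid direction is not a descent direction, is an added safeguard that the algorithm above does not state. The function names and the quadratic test problem are hypothetical.

```python
import numpy as np

def wolfe_search(f, grad, x, d, alpha, rho=1e-4, sigma=0.9, max_iter=60):
    """Bisection/expansion search for a step satisfying (4) and (5) (our own choice)."""
    lo, hi = 0.0, np.inf
    fx, gd = f(x), grad(x) @ d
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + rho * alpha * gd:      # (4) violated: step too long
            hi = alpha
        elif grad(x + alpha * d) @ d < sigma * gd:         # (5) violated: step too short
            lo = alpha
        else:
            break
        alpha = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return alpha

def hybrid_cg(f, grad, x0, variant="CCOMB", tol=1e-6, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                                 # Step 1
    alpha = 1.0 / np.linalg.norm(g)
    for _ in range(max_iter):
        if np.max(np.abs(g)) <= tol:                       # Step 2
            break
        alpha = wolfe_search(f, grad, x, d, alpha)         # Step 3
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        yg, ys, gg, nn = y @ g_new, y @ s, g @ g, g_new @ g_new
        den = yg * ys - nn * gg                            # Step 4
        if den == 0.0:
            theta = 0.0
        elif variant == "CCOMB":                           # formula (11)
            theta = (yg * ys - yg * gg) / den
        else:                                              # NDOMB, formula (13)
            theta = ((yg - s @ g_new) * gg - yg * ys) / (-den)
        theta = min(max(theta, 0.0), 1.0)                  # Step 5
        beta = (1.0 - theta) * yg / gg + theta * nn / ys
        d_new = -g_new + beta * s                          # Step 6
        # Powell restart (14); the descent test is our added safeguard.
        if abs(g_new @ g) > 0.2 * nn or g_new @ d_new >= 0.0:
            d_new = -g_new
        alpha *= np.linalg.norm(d) / np.linalg.norm(d_new)  # next initial guess
        x, g, d = x_new, g_new, d_new
    return x, g

# Hypothetical test problem: f(x) = x^T A x / 2 - b^T x with A = diag(1, ..., 10).
A = np.diag(np.arange(1.0, 11.0))
b = np.ones(10)
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
for variant in ("CCOMB", "NDOMB"):
    x, g = hybrid_cg(f, grad, np.zeros(10), variant)
    print(variant, np.max(np.abs(g)))
```

Clipping $\theta_k$ with `min(max(...))` reproduces Step 5, since $\theta_k = 0$ and $\theta_k = 1$ in (7) give exactly $\beta_k^{PRP}$ and $\beta_k^{DY}$.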


It is well known that if $d_k$ is a descent direction and f is bounded below along $d_k$, then there exists a step size $\alpha_k$ satisfying the Wolfe line search conditions (4) and (5). In our algorithm, when the Powell restart condition is satisfied, we restart the algorithm with the negative gradient $-g_{k+1}$. More sophisticated restarting criteria have been proposed in the literature [10], but we are interested in the performance of a conjugate gradient algorithm that uses this restart criterion, associated with a direction that satisfies the conjugacy condition or is equal to the Newton direction. Under reasonable assumptions, conditions (4), (5) and (14) are sufficient to prove the global convergence of the algorithm. We consider this aspect in the next section.

The first trial of the step length crucially affects the practical behavior of the algorithm. At every iteration $k \ge 1$ the starting guess for the step $\alpha_k$ in the line search is computed as $\alpha_{k-1}\|d_{k-1}\|_2/\|d_k\|_2$. This selection was first considered by Shanno and Phua [27] in CONMIN. It is also used in the packages SCG by Birgin and Martínez [5] and SCALCG by Andrei [2,3,4].


Convergence Analysis 


Throughout this section we assume that

1. The level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.

2. In a neighborhood N of S, the function f is continuously differentiable and its gradient is Lipschitz-continuous, i.e., there exists a constant $L > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$ for all $x, y \in N$.

Under these assumptions on f, there exists a constant $\Gamma \ge 0$ such that $\|\nabla f(x)\| \le \Gamma$ for all $x \in S$.

It was proved in [11] that for any conjugate gradient method with the strong Wolfe line search the following lemma holds.

Lemma 1 Suppose that assumptions 1 and 2 hold and consider any conjugate gradient method (2) and (3), where $d_k$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe line search. If

$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty, \qquad (15)$$

then

$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (16)$$

For uniformly convex functions which satisfy the above assumptions we can prove that the norm of $d_{k+1}$ generated by (8) is bounded above. Thus, by Lemma 1 we have the following result.


Theorem 3 Suppose that assumptions 1 and 2 hold. Consider the algorithm (2) and (8), where $d_{k+1}$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe line search

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho\,\alpha_k\, g_k^T d_k, \qquad (17)$$

$$|g_{k+1}^T d_k| \le -\sigma\, g_k^T d_k. \qquad (18)$$

If $\|s_k\|$ tends to zero, there exist positive constants $\eta_1$ and $\eta_2$ such that

$$\|g_k\|^2 \ge \eta_1\|s_k\|^2 \quad \text{and} \quad \|g_{k+1}\|^2 \le \eta_2\|s_k\|, \qquad (19)$$

and the function f is uniformly convex, i.e., there exists a constant $\mu > 0$ such that for all $x, y \in S$

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu\|x - y\|^2, \qquad (20)$$

then

$$\lim_{k \to \infty} \|g_k\| = 0. \qquad (21)$$

Proof From (20) it follows that $y_k^T s_k \ge \mu\|s_k\|^2$. Now, since $0 \le \theta_k \le 1$, from uniform convexity and (19) we have

$$|\beta_k^N| \le \frac{|g_{k+1}^T y_k|}{|g_k^T g_k|} + \frac{|g_{k+1}^T g_{k+1}|}{|y_k^T s_k|} \le \frac{\|g_{k+1}\|\,\|y_k\|}{\eta_1\|s_k\|^2} + \frac{\eta_2\|s_k\|}{\mu\|s_k\|^2}.$$

But $\|y_k\| \le L\|s_k\|$; therefore,

$$|\beta_k^N| \le \frac{\Gamma L}{\eta_1\|s_k\|} + \frac{\eta_2}{\mu\|s_k\|}.$$

Hence,

$$\|d_{k+1}\| \le \|g_{k+1}\| + |\beta_k^N|\,\|s_k\| \le \Gamma + \frac{\Gamma L}{\eta_1} + \frac{\eta_2}{\mu},$$

which implies that (15) is true. Therefore, by Lemma 1 we have (16), which for uniformly convex functions is equivalent to (21). □


Powell [25] showed that for general functions the PRP method is globally convergent if the step lengths $\|s_k\| = \|x_{k+1} - x_k\|$ tend to zero, i.e., $\|s_k\| \le \|s_{k-1}\|$ is a condition of convergence. For convergence of our algorithms, (19) shows that along the iterations, for $k \ge 1$, the gradient must be bounded as $\eta_1\|s_k\|^2 \le \|g_k\|^2 \le \eta_2\|s_{k-1}\|$. If the Powell condition is satisfied, i.e., $\|s_k\|$ tends to zero, then eventually $\|s_k\|^2 \le \|s_{k-1}\|$, and therefore the norm of the gradient can satisfy (19). In the numerical experiments we observed that (19) is always satisfied in the last part of the iterations.

For general nonlinear functions the convergence analysis of our algorithm exploits insights developed by Gilbert and Nocedal [17], Dai and Liao [9] and Hager and Zhang [18]. The global convergence proof of these new hybrid conjugate gradient algorithms is based on the Zoutendijk condition combined with an analysis showing that the sufficient descent condition holds and that $\|d_k\|$ is bounded. Suppose that the level set S is bounded and the function f is bounded from below.


Lemma 2 Assume that $d_k$ is a descent direction and $\nabla f$ satisfies the Lipschitz condition $\|\nabla f(x) - \nabla f(x_k)\| \le L\|x - x_k\|$ for all x on the line segment connecting $x_k$ and $x_{k+1}$, where L is a constant. If the line search satisfies the second Wolfe condition (5), then

$$\alpha_k \ge \frac{1-\sigma}{L}\,\frac{|g_k^T d_k|}{\|d_k\|^2}. \qquad (22)$$

Proof Subtracting $g_k^T d_k$ from both sides of (5) and using the Lipschitz condition, we have

$$(\sigma - 1)\, g_k^T d_k \le (g_{k+1} - g_k)^T d_k \le L\,\alpha_k\,\|d_k\|^2. \qquad (23)$$

Since $d_k$ is a descent direction and $\sigma < 1$, (22) follows immediately from (23). □
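For a quadratic $f(x) = \tfrac12 x^T A x$ the bound (22) can be checked directly: the curvature condition (5) holds exactly for $\alpha \ge (1-\sigma)(-g_k^T d_k)/(d_k^T A d_k)$, and since $d_k^T A d_k \le L\|d_k\|^2$ with L the largest eigenvalue of A, that threshold never falls below the right-hand side of (22). The sketch below uses a random test problem of our own and confirms this numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + np.eye(6)                 # SPD Hessian of f(x) = x^T A x / 2
L = np.linalg.eigvalsh(A).max()         # Lipschitz constant of grad f(x) = A x
sigma = 0.9

ratios = []
for _ in range(1000):
    x = rng.standard_normal(6)
    d = -(A @ x) + 0.5 * rng.standard_normal(6)
    gd = (A @ x) @ d                    # g_k^T d_k
    if gd >= 0:
        continue                        # Lemma 2 assumes a descent direction
    # On a quadratic, (5) holds exactly for alpha >= (1 - sigma)(-g^T d)/(d^T A d).
    alpha_min = (1 - sigma) * (-gd) / (d @ A @ d)
    bound = (1 - sigma) * abs(gd) / (L * (d @ d))   # right-hand side of (22)
    ratios.append(alpha_min / bound)
print(min(ratios) >= 1 - 1e-12)         # True: every step satisfying (5) obeys (22)
```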


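Theorem 4 Suppose that assumptions 1 and 2 hold and that, for every $k \ge 0$, $0 < \theta_k < 1$ and $(g_{k+1}^T y_k)(g_{k+1}^T s_k) \le 0$. Suppose further that there exists a positive constant $\omega$ such that $1 - \theta_k (g_{k+1}^T s_k)/(y_k^T s_k) \ge \omega > 0$ for every k, and that there exist constants $\gamma$ and $\Gamma$ such that $\gamma \le \|g_k\| \le \Gamma$ for all k. Then for the computational scheme (2) and (8), where $\alpha_k$ is determined by the Wolfe line search (4) and (5), either $g_k = 0$ for some k or

$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (24)$$

Proof By the Wolfe condition (5) we have

$$y_k^T s_k = (g_{k+1} - g_k)^T s_k \ge (\sigma - 1)\, g_k^T s_k = -(1-\sigma)\, g_k^T s_k. \qquad (25)$$

By Theorem 2 and the assumption $1 - \theta_k (g_{k+1}^T s_k)/(y_k^T s_k) \ge \omega$, it follows that

$$g_{k+1}^T d_{k+1} \le -\left(1 - \theta_k\,\frac{g_{k+1}^T s_k}{y_k^T s_k}\right)\|g_{k+1}\|^2 \le -\omega\,\|g_{k+1}\|^2.$$

Therefore,

$$-g_k^T d_k \ge \omega\,\|g_k\|^2. \qquad (26)$$

Combining (25) with (26), and using $s_k = \alpha_k d_k$, we get

$$y_k^T s_k \ge (1-\sigma)\,\omega\,\alpha_k\,\gamma^2.$$

On the other hand, $\|y_k\| = \|g_{k+1} - g_k\| \le L\|s_k\|$; hence,

$$|g_{k+1}^T y_k| \le \|g_{k+1}\|\,\|y_k\| \le \Gamma L\,\|s_k\|.$$

With these, from (7) we get

$$|\beta_k^N| \le \frac{|g_{k+1}^T y_k|}{g_k^T g_k} + \frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}.$$

But

$$\frac{|g_{k+1}^T y_k|}{g_k^T g_k} \le \frac{\|g_{k+1}\|\,\|y_k\|}{\gamma^2} \le \frac{\Gamma L\,\|s_k\|}{\gamma^2} \le \frac{\Gamma L D}{\gamma^2},$$

where $D = \max\{\|y - z\| : y, z \in S\}$ is the diameter of the level set S. On the other hand,

$$\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k} \le \frac{\Gamma^2}{(1-\sigma)\,\omega\,\alpha_k\,\gamma^2}.$$

Therefore,

$$|\beta_k^N| \le \frac{\Gamma L D}{\gamma^2} + \frac{\Gamma^2}{(1-\sigma)\,\omega\,\alpha_k\,\gamma^2} \equiv E. \qquad (27)$$

Now, we can write

$$\|d_{k+1}\| \le \|g_{k+1}\| + |\beta_k^N|\,\|s_k\| \le \Gamma + E D. \qquad (28)$$

Since the level set S is bounded and the function f is bounded from below, using Lemma 2, from (4) it follows that

$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty, \qquad (29)$$

i.e., the Zoutendijk condition holds. Therefore, from Theorem 2, using (29), the descent property (26) yields

$$\sum_{k=0}^{\infty} \frac{\gamma^4}{\|d_k\|^2} \le \sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} \le \frac{1}{\omega^2} \sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty,$$

which contradicts (28), since (28) implies $\sum_k 1/\|d_k\|^2 = \infty$. Hence, $\gamma = \liminf_{k \to \infty} \|g_k\| = 0$. □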


Therefore, when $0 < \theta_k < 1$, our hybrid conjugate gradient algorithms are globally convergent, meaning that either $g_k = 0$ for some k or (24) holds. Observe that under the conditions of Theorem 2 the direction $d_{k+1}$ satisfies the sufficient descent condition independent of the line search.


Numerical Experiments 


In this section we present the computational performance of a Fortran implementation of the CCOMB and NDOMB algorithms on a set of 750 unconstrained optimization test problems. The test problems are the unconstrained problems in the CUTE [6] library, along with other large-scale optimization problems presented in [1]. We selected 75 large-scale unconstrained optimization problems in extended or generalized form. Each problem was tested ten times for a gradually increasing number of variables: n = 1000, 2000, ..., 10000. At the same time we present comparisons with other conjugate gradient algorithms, including the performance profiles of Dolan and Moré [14].

All algorithms implement the Wolfe line search conditions with $\rho = 0.0001$ and $\sigma = 0.9$, and the same stopping criterion $\|g_k\|_\infty \le 10^{-6}$, where $\|\cdot\|_\infty$ is the maximum absolute component of a vector.

The comparisons of algorithms are given in the following context. Let $f_i^{ALG1}$ and $f_i^{ALG2}$ be the optimal values found by ALG1 and ALG2 for problem $i = 1, \ldots, 750$, respectively. We say that, in the particular problem i, the performance of ALG1 was better than the performance of ALG2 if

$$\left| f_i^{ALG1} - f_i^{ALG2} \right| < 10^{-3} \qquad (30)$$

and the number of iterations, or the number of function-gradient evaluations, or the CPU time of ALG1 was less than the number of iterations, or the number of function-gradient evaluations, or the CPU time corresponding to ALG2, respectively.
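Counting wins and ties under criterion (30) for one metric can be sketched as follows; the function name and the four-problem data are hypothetical.

```python
import numpy as np

def compare(f1, f2, m1, m2, tol=1e-3):
    """Return (ALG1 better, ALG2 better, tie) counts on one metric (iterations,
    function-gradient evaluations or CPU time), over the problems whose optimal
    values agree in the sense of criterion (30)."""
    f1, f2, m1, m2 = (np.asarray(a, dtype=float) for a in (f1, f2, m1, m2))
    ok = np.abs(f1 - f2) < tol
    return (int(np.sum(ok & (m1 < m2))), int(np.sum(ok & (m2 < m1))),
            int(np.sum(ok & (m1 == m2))))

# Hypothetical data for four problems; problem 3 fails criterion (30).
print(compare([0, 1, 2, 3], [0, 1, 2.1, 3.0005], [10, 20, 30, 40], [12, 20, 25, 50]))  # (2, 0, 1)
```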

All codes were written in double-precision Fortran and compiled with f77 (default compiler settings) on an Intel Pentium 4, 1.8 GHz workstation. All these codes were authored by Andrei. The performances of these algorithms were evaluated using the profiles of Dolan and Moré [14]. That is, for each algorithm we plotted the fraction of problems for which the algorithm is within a factor of the best CPU time. The left side of these figures gives the percentage of the test problems, out of 750, for which an algorithm is the best performer; the right side gives the percentage of the test problems that were successfully solved by each of the algorithms. Mainly, the right side is a measure of an algorithm’s robustness.
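A minimal sketch of a Dolan-Moré profile, assuming failures are recorded as infinite CPU time (the function name and timings are hypothetical). The value at $\tau = 1$ is the share of problems on which a solver was fastest (the left side of the plots), and the value for large $\tau$ is the share it solved at all (the right side).

```python
import numpy as np

def performance_profile(times, taus):
    """Dolan-More profile. times[p, s] is the CPU time of solver s on problem p,
    with np.inf marking a failure. Returns rho[s, j], the fraction of problems
    whose performance ratio t[p, s] / min_s t[p, s] is at most taus[j]."""
    times = np.asarray(times, dtype=float)
    best = times.min(axis=1, keepdims=True)           # best time on each problem
    ratios = times / best                             # >= 1; inf for failures
    taus = np.asarray(taus, dtype=float)
    return (ratios[:, :, None] <= taus[None, None, :]).mean(axis=0)

# Hypothetical timings: 4 problems x 2 solvers; solver 2 fails on problem 4.
t = [[1.0, 2.0], [3.0, 3.0], [2.0, 1.0], [1.0, np.inf]]
rho = performance_profile(t, [1.0, 2.0, 10.0])
print(rho)
```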

In the first set of numerical experiments, we com- 
pared the performance of the CCOMB algorithm with 
the performance of the NDOMB algorithm. Figure 1 
shows the Dolan and Moré CPU performance profile 
of the CCOMB algorithm compared with that of the 
NDOMB algorithm. 

Observe that the CCOMB algorithm outperforms 
the NDOMB algorithm in the vast majority of prob- 
lems. Only 730 problems out of 750 satisfy crite- 
rion (30). Referring to the CPU time, the CCOMB al- 
gorithm was better in 575 problems, in contrast to the 
NDOMB algorithm, which solved only 72 problems in 
a better CPU time. 


Figure 1 Performance based on CPU time: CCOMB algorithm compared with the NDOMB algorithm (CPU time metric, 730 problems). Number of problems in which CCOMB was better / NDOMB was better / both were equal: #iter 150 / 144 / 436; #fg 181 / 189 / 360; cpu 575 / 72 / 83.


Figure 2 Performance based on CPU time: CCOMB algorithm compared with the Polak-Ribière-Polyak (PRP) algorithm (CPU time metric, 711 problems)


In the second set of numerical experiments we compared the performance of the CCOMB algorithm with the performances of the PRP and DY conjugate gradient algorithms. Figures 2 and 3 show the Dolan and Moré CPU performance profiles of the CCOMB algorithm compared with those of the PRP and DY algorithms, respectively.

When comparing the CCOMB and PRP algorithms (Fig. 2) with respect to the number of iterations, we see that the CCOMB algorithm was better in 324 problems (i.e., it achieved the minimum number of iterations in 324 problems), the PRP algorithm was better in 196 problems and they achieved the same number of iterations in 191 problems, etc. Out of 750 problems, criterion (30) holds for only 711 problems. Similarly, Fig. 3 shows the number of problems for which the CCOMB algorithm was better than the DY algorithm. Observe that the convex combination of the PRP and DY algorithms, expressed as in (7), is far more successful than either the PRP or the DY algorithm.

Figure 3 Performance based on CPU time: CCOMB algorithm compared with the Dai-Yuan (DY) algorithm

The third set of numerical experiments refers to the comparisons of the CCOMB algorithm with the hybrid conjugate gradient algorithms hDY, hDYz, GN, HuS, TS and LS-CD. Figures 4-9 present the Dolan and Moré CPU performance profiles of these algorithms, as well as the number of problems solved by each algorithm in the minimum number of iterations, the minimum number of function evaluations and the minimum CPU time, respectively.

From these figures we see that the CCOMB algorithm is the top performer. Since these codes use the same Wolfe line search and the same stopping criterion, they differ only in their choice of the search direction. Hence, among the conjugate gradient algorithms considered here, the CCOMB algorithm appears to generate the best search direction.

In the fourth set of numerical experiments we compared the CCOMB algorithm with the CG_DESCENT conjugate gradient algorithm of Hager and Zhang [18]. The computational scheme implemented in the CG_DESCENT algorithm is a modification of the HS method which satisfies the sufficient descent condition, independent of the accuracy of the line search. The CG_DESCENT code, authored by Hager and Zhang, contains the variant CG_DESCENT (HZw), implementing the Wolfe line search, and the variant CG_DESCENT (HZaw), implementing an approximate Wolfe line search. There are two main points associated with the CG_DESCENT algorithm. Firstly, the scalar products are implemented using loop unrolling of depth 5, which is efficient for large-scale problems (over $10^6$ variables). Secondly, the Wolfe line search is implemented using a very fine numerical interpretation of the first Wolfe condition (4). The Wolfe conditions implemented in the CCOMB and CG_DESCENT (HZw) algorithms can compute a solution with accuracy of the order of the square root of the machine epsilon.

Figure 4 Performance based on CPU time: CCOMB algorithm compared with the hybrid Dai-Yuan (hDY) algorithm. Number of problems in which CCOMB was better / hDY was better / both were equal: #iter 300 / 237 / 160; #fg 292 / 299 / 106; cpu 440 / 177 / 80.

Figure 5 Performance based on CPU time: CCOMB compared with the hybrid Dai-Yuan (hDYz) algorithm

Figure 6 Performance based on CPU time: CCOMB compared with the Gilbert-Nocedal (GN) algorithm

Figure 7 Performance based on CPU time: CCOMB algorithm compared with the Hu-Storey (HuS) algorithm


Figure 8 Performance based on CPU time: CCOMB algorithm compared with the Touati-Ahmed-Storey (TS) algorithm (CPU time metric, 719 problems). Number of problems in which CCOMB was better / TS was better: #iter 413 / 105; #fg 407 / 196; cpu 554 / 97.

Figure 9 Performance based on CPU time: CCOMB algorithm compared with the Liu-Storey-conjugate descent (LS-CD) algorithm


In contrast, the approximate Wolfe line search implemented in the CG_DESCENT (HZaw) algorithm can compute the solution with accuracy of the order of the machine epsilon. Figures 10 and 11 present the performance profiles of these algorithms in comparison with that of the CCOMB algorithm. We see that the CG_DESCENT algorithm is more robust than the CCOMB algorithm.
Figure 10 Performance based on CPU time: CCOMB algorithm compared with the CG_DESCENT algorithm with Wolfe line search (HZw)

Figure 11 Performance based on CPU time: CCOMB algorithm compared with the CG_DESCENT algorithm with approximate Wolfe line search (HZaw)


Conclusion 


A large variety of conjugate gradient algorithms is known. The known hybrid conjugate gradient algorithms are based on projection of classical conjugate gradient algorithms. In this chapter we have proposed new hybrid conjugate gradient algorithms in which the famous parameter $\beta_k$ is computed as a convex combination of $\beta_k^{PRP}$ and $\beta_k^{DY}$, i.e., $\beta_k^N = (1-\theta_k)\beta_k^{PRP} + \theta_k\beta_k^{DY}$. The parameter $\theta_k$ is computed in such a manner that either the conjugacy condition is satisfied or the corresponding direction in the hybrid conjugate gradient algorithm is the Newton direction. For uniformly convex functions, if the step size $s_k$ approaches zero, the gradient is bounded in the sense that $\eta_1\|s_k\|^2 \le \|g_k\|^2 \le \eta_2\|s_{k-1}\|$ and the line search satisfies the strong Wolfe conditions, then our hybrid conjugate gradient algorithms are globally convergent. For general nonlinear functions, if the parameter $\theta_k$ in the definition of $\beta_k^N$ is bounded, i.e., $0 < \theta_k < 1$, then our hybrid conjugate gradient algorithm is globally convergent. The Dolan and Moré CPU performance profile of the hybrid conjugate gradient algorithm based on the conjugacy condition (CCOMB algorithm) is better than the performance profile corresponding to the hybrid algorithm based on the Newton direction (NDOMB algorithm). The performance profile of the CCOMB algorithm was better than those of the well-established conjugate gradient algorithms (hDY, hDYz, GN, HuS, TS and LS-CD) for a set of 750 unconstrained optimization test problems, some of them from the CUTE library. Additionally, the proposed hybrid conjugate gradient algorithm CCOMB is more robust than the PRP and DY conjugate gradient algorithms. However, in terms of robustness, the CCOMB algorithm is outperformed by the CG_DESCENT algorithm.
A large number of problems in mechanics, engineering and economics can be defined by means of appropriate differentiation of an energy function. This function is sometimes called a potential. Assuming differentiability, one may write the first-order optimality condition that the gradient of the function is equal to zero at some point. All points which satisfy this condition are called critical points. The set of nonlinear equations derived this way constitutes the governing equations (the model) of the studied problem. Equivalently, one may require that the directional derivative of the function at this critical point, in all directions emanating from this point, be equal to zero, or that all possible small variations of the energy function around the critical point be equal to zero. In this way the critical point is described by a variational equation, and one speaks of a variational formulation of the problem. Whether a critical point is also a minimum of the considered energy function or, for example, a maximum or a saddle point requires the consideration of second-order optimality conditions. One should mention in passing that (possibly local) minima are of utmost importance in applications. For instance, in mechanics they provide stable equilibria of the studied mechanical systems. Convexity and coercivity of the energy function guarantee that a critical point is a minimum, while strict convexity ensures its uniqueness as well.

Lack of differentiability of the energy function, or the consideration of inequality constraints in the minimization problem, changes the picture. As long as the convexity property holds, one may use the powerful tools of convex analysis to study the problem. The gradient of the nonsmooth but convex function is replaced by the subgradient, the differential by a set-valued subdifferential (in the sense of J.-J. Moreau and R.T. Rockafellar) and the critical point equation by the differential inclusion: zero must be an element of the subdifferential of the energy function at a minimum. Accordingly, in the variational formulation a variational equation is replaced by a variational inequality. Analogous considerations hold true for the case of inequality constraints. Here one has, in principle, two ways to study the problem. Either all admissible variations are taken into account in the derivation of the optimality conditions, or the inequality constraints are included by means of the indicator function of the admissible set in the set of the problem’s variables. Following Moreau, who introduced and studied convex analysis and applied it to the solution of mechanical problems, a convex nondifferentiable potential energy function is called a superpotential.
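As a one-dimensional illustration of our own (not taken from the text), the energy $f(x) = (x - a)^2/2 + \lambda|x|$ is convex but not differentiable at $x = 0$, where $\partial|x|$ is the interval $[-1, 1]$. Its minimizer, given by soft thresholding, is exactly the point at which the differential inclusion $0 \in \partial f(x)$ holds; the sketch checks this for several values of a with $\lambda = 1$. Both function names are hypothetical.

```python
def minimizer(a, lam):
    """Minimizer of the convex, nonsmooth f(x) = (x - a)**2 / 2 + lam * |x| (soft thresholding)."""
    return max(abs(a) - lam, 0.0) * (1.0 if a >= 0 else -1.0)

def zero_in_subdifferential(x, a, lam, tol=1e-12):
    """Check 0 in (x - a) + lam * d|x|, where d|x| = {sign(x)} for x != 0 and [-1, 1] at 0."""
    if x != 0.0:
        return abs((x - a) + lam * (1.0 if x > 0 else -1.0)) <= tol
    return abs(x - a) <= lam + tol

for a in (-3.0, -0.5, 0.0, 0.5, 3.0):
    x = minimizer(a, 1.0)
    print(a, x, zero_in_subdifferential(x, a, 1.0))   # the inclusion holds for every a
```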


Admittedly, convexity is a convenient assumption that is often too good to be true in real-life applications. The study of nonconvex energy functions requires new tools and methods, which are being developed within the area of nonconvex analysis. Among them, the notion of the generalized gradient in the sense of F.H. Clarke and R.T. Rockafellar [2] has found several applications. Concerning the search for minima of a nonconvex energy function, one may formulate critical point problems. Under certain assumptions the generalized gradient of Clarke provides a useful tool for the formulation and the study of problems with nonconvex and nonsmooth energy functions. In the area of mechanics, P.D. Panagiotopoulos was the first to introduce and use this notion. He called the resulting nonconvex variational inequalities hemivariational inequalities. Initially, the notion of Clarke, which is suitable for locally Lipschitz energy functions, was used. Later, the extended notion of Rockafellar, which, roughly speaking, includes infinite vertical branches, was considered. Panagiotopoulos subsequently extended this analysis by incorporating inequality constraints or convex superpotential terms (as in the case of convex problems), which were inspired by the engineering applications he studied. Thus, he formulated and studied variational-hemivariational inequality problems. By analogy with convex analysis, Panagiotopoulos called the nonconvex and possibly nondifferentiable energy functions nonconvex superpotentials. Once more, it should be emphasized that hemivariational inequalities are not equivalent to minimum problems but to substationarity problems. Nevertheless, they constitute a consistent extension of variational inequalities, and they include them for the case of convex energy functions. Furthermore, and this is important for some numerical algorithms, as with the subdifferential of convex analysis, the generalized subdifferential of Clarke involves one convex set in the set-valued approximation of the differential of a nonconvex and nonsmooth energy function. Consequently, the development of theory and algorithms runs in parallel with the ones developed for problems of convex analysis and of variational inequalities. It should be mentioned that propositions of substationary potential and complementary energy generalize the classical minimum energy propositions of mechanics (where, for historical reasons, they are known as principles). Moreover, following the example of nonsmooth analysis, Panagiotopoulos named the whole area of mechanics which deals with nondifferentiable functions nonsmooth mechanics.

In this article the notions of the generalized derivative of Clarke and Rockafellar and the definition of substationarity points are given first. After that, the hemivariational and the variational-hemivariational inequalities of Panagiotopoulos are presented. Short comments on the theoretical tools used for their study, examples of applications, and numerical algorithms which have been proposed for their solution complete the article. Details on all these issues can be found in the cited publications. See also: Hemivariational inequalities: applications in mechanics.


Clarke’s Generalized Derivative 

Let a function $f$ be locally Lipschitz at $x \in X$ and let $y$ be a vector in $X$. The directional differential in the sense of Clarke of $f$ at $x$ in the direction $y$, denoted by $f^\circ(x, y)$, is defined by the relation:

$$ f^\circ(x, y) = \limsup_{h \to 0,\ \lambda \to 0^+} \frac{f(x + h + \lambda y) - f(x + h)}{\lambda}. $$

$f^\circ(x, y)$ is also called a generalized directional differential.
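To make the limsup in this definition concrete, the following Python sketch approximates it by sampling the difference quotient over small base-point perturbations $h$ and a few small step sizes $\lambda$. It is only a numerical illustration, assuming a one-dimensional $f$ and a finite sampling grid; the helper name `clarke_dd_estimate` and the grid parameters are hypothetical choices. For the concave function $f(x) = -|x|$ at $x = 0$ it gives $f^\circ(0, 1) = 1$, although the one-sided derivative there is $f'(0, 1) = -1$.

```python
import numpy as np

def clarke_dd_estimate(f, x, y, delta=1e-2, n_h=201, lambdas=(1e-3, 1e-4, 1e-5)):
    """Sampled approximation of f°(x, y) = limsup (f(x + h + lam*y) - f(x + h)) / lam.

    The limsup over h -> 0 and lam -> 0+ is replaced by a max over a grid of
    h in [-delta, delta] and a few small lam > 0, so this is only an estimate.
    """
    hs = np.linspace(-delta, delta, n_h)
    best = -np.inf
    for lam in lambdas:
        quotients = (f(x + hs + lam * y) - f(x + hs)) / lam
        best = max(best, float(quotients.max()))
    return best

f = lambda t: -np.abs(t)                    # concave and nonsmooth at 0
print(clarke_dd_estimate(f, 0.0, 1.0))      # close to 1.0
print((f(1e-8) - f(0.0)) / 1e-8)            # one-sided derivative: -1.0
```

The gap between the two printed values is exactly what the generalized derivative is designed to capture: $f^\circ$ looks at difference quotients at all nearby base points, not only at $x$ itself.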

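By means of the directional differential $f^\circ(x, y)$ one can now define the generalized gradient $\bar\partial f: X \rightrightarrows X^*$:

$$ \bar\partial f(x) = \{ x^* \in X^* : f^\circ(x, x_1 - x) \ge \langle x^*, x_1 - x \rangle \ \ \forall x_1 \in X \}. $$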


One may also use the definition

$$ \bar\partial f(x) = \{ x^* \in X^* : (x^*, -1) \in N_{\operatorname{epi} f}(x, f(x)) \}, \qquad (2) $$

where $N_C(x)$ denotes the normal cone to a set $C$ at the point $x$ and $\operatorname{epi} f$ is the epigraph of the function $f$.

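Note that $\bar\partial f(\cdot)$ is a multivalued mapping; $\bar\partial f(x)$ is a nonempty, convex, closed and bounded subset of $X^*$, and the following relation holds true:

$$ f^\circ(x, y) = \max\{ \langle y, x^* \rangle : x^* \in \bar\partial f(x) \}. \qquad (3) $$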


For didactical reasons the notation $\bar\partial$ is used here (and in most of the work of Panagiotopoulos). In honor of Clarke, who proposed it, the notation $\partial_{Cl}$ is sometimes used. When no misunderstanding is expected, the notation $\partial$, which is usually reserved for the subdifferential of convex analysis, is also used. It should also be noted that $\bar\partial$ should not be confused with the superdifferential used in the theory of quasidifferentiability in the sense of V.F. Demyanov.

Relation (2) can be used to define the generalized gradient $\bar\partial f(x)$ for any type of function $f: X \to \overline{\mathbb{R}}$ which is finite at the point $x$. Note that $\bar\partial f(x)$ may be empty. This definition of $\bar\partial f(x)$ makes sense for any function $f: X \to \overline{\mathbb{R}}$, because the normal cone $N_C(x)$ can be defined with respect to any set $C = \operatorname{epi} f$. The generalized directional differential $f^\uparrow(x, y)$ at $x$ in the direction $y$ is defined by the relation:

$$ f^\uparrow(x, y) = \sup\{ \langle y, x^* \rangle : x^* \in \bar\partial f(x) \}. \qquad (4) $$
Thus one may write the relation:

$$ \bar\partial f(x) = \{ x^* \in X^* : f^\uparrow(x, x_1 - x) \ge \langle x^*, x_1 - x \rangle \ \ \forall x_1 \in X \}. \qquad (5) $$


The directional differential $f^\uparrow(x, y)$ is also called the directional differential in the sense of Rockafellar [15]. Note that $\bar\partial f(x) = \emptyset$ if $f^\uparrow(x, 0) = -\infty$, while if $f^\uparrow(x, y)$ is finite for every $y$, then $\bar\partial f(x) \neq \emptyset$.

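It should be noted that for a convex function $f$ one has

$$ f^\uparrow(x, y) = \liminf_{y' \to y} f'(x, y'), \quad \forall y \in X, \qquad (6) $$

where $f'(\cdot, \cdot)$ denotes the one-sided directional Gateaux differential. Moreover, for a function $f$ that is locally Lipschitz at the point $x$ one has

$$ f^\uparrow(x, y) = f^\circ(x, y), \quad \forall y \in X, \qquad (7) $$

and for a continuously differentiable $f$:

$$ \bar\partial f(x) = \{ \operatorname{grad} f(x) \}. \qquad (8) $$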


Examples 


For a convex function $f$ one has

$$ \bar\partial f(x) = \partial f(x), \qquad (9) $$

and for a function $f$ that is concave and bounded below on a neighborhood of $x$:

$$ \bar\partial f(x) = -\partial(-f)(x) \qquad (10) $$

at every point $x$ where $f$ is finite. The indicator function $I_C$ of a (generally nonconvex) set $C$ is defined by $I_C(x) = 0$ if $x \in C$, and $I_C(x) = \infty$ otherwise. It can be proved that

$$ \bar\partial I_C(x) = N_C(x) \qquad (11) $$

and

$$ I_C^\uparrow(x, y) = I_{T_C(x)}(y), \qquad (12) $$

where $T_C(x)$ denotes the (Clarke) tangent cone to $C$ at $x$.


In the finite-dimensional case $X = \mathbb{R}^n$, for a function $f$ locally Lipschitz at a point $x$, $\bar\partial f(x)$ is the convex hull of all points $y \in \mathbb{R}^n$ of the form

$$ y = \lim_{i \to \infty} \operatorname{grad} f(x_i), \qquad (13) $$

where $x_i$ converges to $x$ as $i \to \infty$, avoiding the nondifferentiability points and any other points of a set of (Lebesgue) measure zero, and such that $\operatorname{grad} f(x_i)$ converges.

For a maximum-type function $f$, i.e., when the function is defined by means of continuously differentiable functions $\varphi_i = \varphi_i(x)$, $i = 1, \dots, m$, $x \in \mathbb{R}^n$, through the relation $f = \max\{\varphi_1, \dots, \varphi_m\}$, one has

$$ \bar\partial f(x) = \operatorname{co}\{ \operatorname{grad} \varphi_i(x) : i \in I(x) \}, \qquad (14) $$

where $I(x) = \{ i : \varphi_i(x) = f(x) \}$ is the active index set.
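For a max-type function, combining (14) with (3) gives $f^\circ(x, y) = \max_{i \in I(x)} \langle \operatorname{grad} \varphi_i(x), y \rangle$, because a linear function attains its maximum over a convex hull at one of the generating points. The Python sketch below evaluates this formula; the function name `clarke_dd_max_type` and the activity tolerance `tol` are illustrative assumptions. For the example $f(x) = \max\{x_1, x_2, -x_1 - x_2\}$ at $x = 0$ all three pieces are active and $0 = \tfrac13[(1,0) + (0,1) + (-1,-1)]$ belongs to $\bar\partial f(0)$, so $f^\circ(0, y) \ge 0$ for every direction $y$.

```python
import numpy as np

def clarke_dd_max_type(phis, grads, x, y, tol=1e-12):
    """f°(x, y) for f = max_i phi_i with continuously differentiable phi_i.

    The Clarke gradient is the convex hull of the active gradients, and f°(x, .)
    is its support function, i.e. the largest <grad phi_i(x), y> over active i.
    """
    values = np.array([phi(x) for phi in phis])
    active = np.flatnonzero(values >= values.max() - tol)   # I(x), up to tol
    return max(float(np.dot(grads[i](x), y)) for i in active)

# f(x) = max{x1, x2, -x1 - x2}
phis = [lambda x: x[0], lambda x: x[1], lambda x: -x[0] - x[1]]
grads = [lambda x: np.array([1.0, 0.0]),
         lambda x: np.array([0.0, 1.0]),
         lambda x: np.array([-1.0, -1.0])]

x0 = np.zeros(2)
print(clarke_dd_max_type(phis, grads, x0, np.array([1.0, 0.0])))    # 1.0
print(clarke_dd_max_type(phis, grads, x0, np.array([-1.0, -1.0])))  # 2.0
```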

The normal cone to a set defined by $C = \{x \in \mathbb{R}^n : f(x) \le 0\}$ at a point $x_0$ with $f(x_0) = 0$ is described by the relation

$$ N_C(x_0) \subset \{ \lambda x^* : \lambda \ge 0,\ x^* \in \bar\partial f(x_0) \}, $$

whenever $f$ is Lipschitzian on a neighborhood of $x_0$ and $0 \notin \bar\partial f(x_0)$. The notion of d-regularity ensures that directional derivative information can be recovered from Clarke's notion. For a locally Lipschitz function one requires that $f^\circ(x, y) = f'(x, y)$, $\forall y \in X$, holds at $x$. This definition is equivalent to the statement that $\operatorname{epi} f$ is regular at $(x, f(x))$. For instance, a convex function and a maximum-type function are d-regular at every point $x$ where they are finite. For example, for the max-type function $f = \max\{\varphi_1, \dots, \varphi_m\}$ one has

$$ N_C(x_0) = \bar\partial I_C(x_0) = \Big\{ \sum_{i=1}^m \lambda_i \operatorname{grad} \varphi_i(x_0) : \lambda_i \ge 0,\ \varphi_i(x_0) \le 0,\ \lambda_i \varphi_i(x_0) = 0 \Big\} \qquad (15) $$

if $0 \notin \bar\partial f(x_0)$. The above relation permits the extension of the Lagrange multiplier rule to optimization problems subject to the nonconvex inequality constraints $\varphi_i(x) \le 0$, $i = 1, \dots, m$. This becomes obvious, e.g., if one considers the problem of finding a local minimum of a continuously differentiable function $g: \mathbb{R}^n \to \mathbb{R}$ over $C = \{x \in \mathbb{R}^n : \varphi_i(x) \le 0,\ i = 1, \dots, m\}$. A necessary condition is $0 \in \bar\partial(g + I_C)(x)$, which implies that

$$ -\operatorname{grad} g(x) \in \bar\partial I_C(x), \qquad (16) $$

which together with (15) leads to the Lagrange multiplier rule.
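As a small numerical check of the multiplier rule, the sketch below considers the hypothetical problem of minimizing $g(x) = (x_1 - 0.5)^2 + x_2^2$ over the nonconvex set $\{x : \varphi_1(x) = 1 - x_1^2 - x_2^2 \le 0\}$ (the exterior of the unit disk), whose minimizer is $x = (1, 0)$. The multipliers are recovered by a least-squares fit of $-\operatorname{grad} g(x) = \sum_i \lambda_i \operatorname{grad} \varphi_i(x)$ over the active constraints; the data and the function name are illustrative assumptions, not taken from the text.

```python
import numpy as np

def lagrange_multipliers(grad_g, active_grads):
    """Fit -grad g = sum_i lam_i grad phi_i in the least-squares sense.

    Returns the multipliers and the residual norm; the multiplier rule holds at
    the point when the residual vanishes and all multipliers are nonnegative.
    """
    G = np.column_stack(active_grads)            # one column per active constraint
    lam, *_ = np.linalg.lstsq(G, -grad_g, rcond=None)
    residual = np.linalg.norm(G @ lam + grad_g)
    return lam, residual

x = np.array([1.0, 0.0])                          # candidate minimizer on the circle
grad_g = np.array([2.0 * (x[0] - 0.5), 2.0 * x[1]])   # = (1, 0)
grad_phi1 = np.array([-2.0 * x[0], -2.0 * x[1]])      # = (-2, 0), constraint active
lam, res = lagrange_multipliers(grad_g, [grad_phi1])
print(lam, res)                                   # multiplier 0.5, residual ~0
```

The nonnegative multiplier $\lambda_1 = 0.5$ confirms that $-\operatorname{grad} g(x)$ lies in the cone described by the right-hand side of the normal cone formula, even though the feasible set is not convex.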


Critical Points and Substationarity 


The notion of substationarity plays an important role in the theory of hemivariational inequalities because it permits the formulation of the propositions of substationary potential and complementary energy, which generalize the corresponding classical minimum energy propositions in mechanics [12,13,14]. A point $x_0$ is a substationarity point of a functional $f: X \to \overline{\mathbb{R}}$ if

$$ 0 \in \bar\partial f(x_0). \qquad (17) $$

Equivalently (take $x^* = 0$ in (5)), one has:

$$ f^\uparrow(x_0, y) \ge 0, \quad \forall y \in X. \qquad (18) $$

Substationarity points include all the classical stationarity points, all the local minima, a large class of local maxima, as well as all the saddle points. A point $x$ is said to be a substationarity point of $f$ with respect to a set $C$ if $f + I_C$ is substationary at $x$.


Nonconvex Energy Functions 


The nonconvex superpotentials resulting from the integration of discontinuous functions $\beta \in L^\infty_{loc}(\mathbb{R})$ play an important role in the formulation of hemivariational inequalities for several types of mechanical problems. After some technical details about filling in the gaps of a multifunction, nonmonotone laws in mechanics which admit a nonconvex energy function will be introduced.


Filling the Gaps in a Multifunction 


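Suppose that $\beta: \mathbb{R} \to \mathbb{R}$ is a function such that $\beta \in L^\infty_{loc}(\mathbb{R})$, i.e., a function which is essentially bounded on any bounded interval of $\mathbb{R}$. For any $\rho > 0$ and $\xi \in \mathbb{R}$ let us define

$$ \underline{\beta}_\rho(\xi) = \operatorname*{ess\,inf}_{|\xi_1 - \xi| \le \rho} \beta(\xi_1) \qquad (19) $$

and

$$ \overline{\beta}_\rho(\xi) = \operatorname*{ess\,sup}_{|\xi_1 - \xi| \le \rho} \beta(\xi_1). \qquad (20) $$

Since $\rho \mapsto \underline{\beta}_\rho(\xi)$ is nonincreasing and $\rho \mapsto \overline{\beta}_\rho(\xi)$ is nondecreasing, and both stay bounded as $\rho \to 0^+$ because $\beta$ is locally essentially bounded, the limits as $\rho \to 0^+$ exist. Therefore one may write $\underline{\beta}(\xi) = \lim_{\rho \to 0^+} \underline{\beta}_\rho(\xi)$ and $\overline{\beta}(\xi) = \lim_{\rho \to 0^+} \overline{\beta}_\rho(\xi)$. Furthermore, one defines the multivalued function

$$ \hat\beta(\xi) = [\underline{\beta}(\xi), \overline{\beta}(\xi)], \qquad (21) $$

where $[\cdot, \cdot]$ denotes the interval between the two given arguments. Then a locally Lipschitz function $j$ can be determined, up to an additive constant, by the relation

$$ j(\xi) = \int_0^\xi \beta(\xi_1)\, d\xi_1, \qquad (22) $$

such that $\bar\partial j(\xi) \subset \hat\beta(\xi)$. If moreover the limits $\beta(\xi \pm 0)$ exist for each $\xi \in \mathbb{R}$, then one has $\bar\partial j(\xi) = \hat\beta(\xi)$.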
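The construction (19)-(22) can be tried numerically. The sketch below assumes a piecewise-continuous $\beta$, so that sampled minima and maxima on a fine grid can stand in for the essential infimum and supremum, and uses the hypothetical law $\beta(\xi) = 1$ for $\xi < 1$, $\beta(\xi) = 0$ for $\xi \ge 1$. The gap at $\xi = 1$ is filled by the interval $[0, 1]$ of (21), and (22) yields the nonconvex superpotential $j(\xi) = \min\{\xi, 1\}$ for $\xi \ge 0$. The function names, the trapezoidal rule and the grid sizes are illustrative choices.

```python
import numpy as np

def beta(xi):
    """Hypothetical nonmonotone law: value 1 below xi = 1, value 0 from 1 on."""
    return np.where(np.asarray(xi) < 1.0, 1.0, 0.0)

def filled_interval(beta, xi, rho=1e-6, n=2001):
    """Approximate [beta_lower(xi), beta_upper(xi)] of (21) for a small rho.

    Sampled min/max over |xi1 - xi| <= rho replace ess inf/ess sup, which is
    adequate for piecewise-continuous beta but not for arbitrary L-infinity data.
    """
    samples = beta(np.linspace(xi - rho, xi + rho, n))
    return float(samples.min()), float(samples.max())

def superpotential(beta, xi_max, n=2001):
    """j(xi) = integral of beta from 0 to xi, cf. (22), by the trapezoidal rule."""
    xs = np.linspace(0.0, xi_max, n)
    b = beta(xs)
    increments = 0.5 * (b[1:] + b[:-1]) * np.diff(xs)
    return xs, np.concatenate(([0.0], np.cumsum(increments)))

print(filled_interval(beta, 1.0))   # (0.0, 1.0): the gap is filled
print(filled_interval(beta, 0.5))   # (1.0, 1.0): beta is continuous here
xs, j = superpotential(beta, 2.0)
print(j[-1])                        # close to 1.0 = j(2)
```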


Superpotential Nonmonotone Laws 


Let us assume that one has a one-dimensional mechanical law which is described by the graph of a discontinuous function. For instance, a force-displacement law ($S$-$u$) is considered, which may correspond to a one-dimensional nonlinear spring law, to a nonlinear boundary condition, etc. The law is considered to be of the form $\beta: u \mapsto -S$, where $u \in \mathbb{R}$ and $S \in \mathbb{R}$. The procedure of (19)-(22) is used in order to define a locally Lipschitz nonconvex superpotential energy function $j(u)$. The mechanical law is produced, in turn, by using the previously introduced generalized subdifferential operator $\partial_{Cl} = \bar\partial$ and the nonconvex superpotential $j$ as follows:

$$ -S \in \bar\partial j(u). \qquad (23) $$

By definition, (23) is equivalent to the inequality

$$ j^\uparrow(u, u^* - u) \ge \langle -S, u^* - u \rangle, \quad \forall u^* \in U, \qquad (24) $$

for $u \in U$, which Panagiotopoulos has called a hemivariational inequality, and to the inclusion

$$ (-S, -1) \in N_{\operatorname{epi} j}(u, j(u)). \qquad (25) $$

For $j$ Lipschitzian, $j^\uparrow$ in (24) is replaced by $j^\circ$. If $j$ is a convex superpotential, one obtains superpotential laws which can be described by monotone graphs with complete vertical branches. The procedure would be identical in that case as well, with $\partial_{Cl}$ replaced by the subdifferential of convex analysis; the result would be a variational inequality. Note also that extensions to multidimensional laws (e.g., material constitutive relations) can be considered within this formulation.
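For a Lipschitz superpotential on the real line, the hemivariational inequality (24) can be checked directly, since by (3) $j^\circ(u, d) = \max\{g_{lo} d, g_{hi} d\}$, where $[g_{lo}, g_{hi}]$ is the generalized gradient of $j$ at $u$. The hypothetical sketch below uses $j(u) = \min\{u, 1\}$, whose generalized gradient at the kink $u = 1$ is $[0, 1]$, and tests the inequality for a force value $r$ (playing the role of the left-hand side of (23)) on a grid of trial points $u^*$. At $u = 1$ the test succeeds exactly for $r \in [0, 1]$, in agreement with the inclusion (23). The grid and all names are assumptions made for illustration.

```python
import numpy as np

def clarke_interval(u):
    """Clarke generalized gradient of j(u) = min(u, 1) as an interval [lo, hi]."""
    if u < 1.0:
        return 1.0, 1.0
    if u > 1.0:
        return 0.0, 0.0
    return 0.0, 1.0            # kink: convex hull of the one-sided slopes

def j_circ(u, d):
    """Clarke directional derivative j°(u, d): max of g*d over the interval, cf. (3)."""
    lo, hi = clarke_interval(u)
    return max(lo * d, hi * d)

def satisfies_hvi(u, r, u_stars):
    """Check j°(u, u* - u) >= r * (u* - u) for all trial points u*, cf. (24)."""
    return all(j_circ(u, us - u) >= r * (us - u) for us in u_stars)

u_stars = np.linspace(-3.0, 5.0, 801)
print(satisfies_hvi(1.0, 0.3, u_stars))   # True: 0.3 lies in [0, 1]
print(satisfies_hvi(1.0, 1.2, u_stars))   # False: 1.2 lies outside [0, 1]
```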


Hemivariational Inequalities 


An abstract coercive hemivariational inequality is written first. Let $V$ be a real Hilbert space with the property that $V \subset [L^2(\Omega)]^n \subset V^*$, where $V^*$ denotes the dual space of $V$, and the injections are continuous and dense. Let moreover a boundary value problem be defined in an open, bounded subset $\Omega$ of $\mathbb{R}^n$. Here $(\cdot, \cdot)$ denotes the $[L^2(\Omega)]^n$ inner product and the duality pairing, $\|\cdot\|$ is the norm of $V$ and $|\cdot|_2$ is the $[L^2(\Omega)]^n$-norm. One should recall that the form $(\cdot, \cdot)$ extends uniquely from $V \times [L^2(\Omega)]^n$ to $V \times V^*$. Moreover, let $L: V \to L^2(\Omega)$, $Lu = \hat u$, $\hat u(x) \in \mathbb{R}$, be a linear continuous mapping. Further, assume that $l \in V^*$ and that $\tilde V = \{ v \in V : \hat v \in L^\infty(\Omega) \}$ is dense in $V$ for the $V$-norm and has a Galerkin base in $V$. It is also assumed that $a(\cdot, \cdot): V \times V \to \mathbb{R}$ is a bilinear, symmetric, continuous form which is coercive, i.e., there exists a constant $c > 0$ such that

$$ a(v, v) \ge c \|v\|^2, \quad \forall v \in V. \qquad (26) $$


A coercive hemivariational inequality problem reads: Find $u \in V$ such that

$$ a(u, v - u) + \int_\Omega j^\circ(\hat u, \hat v - \hat u)\, d\Omega \ge (l, v - u), \quad \forall v \in V. \qquad (27) $$


For example, a linear elastostatic structural analysis problem with additional nonlinear elements of nonmonotone type which admit a nonconvex superpotential $j$ can be written in the hemivariational inequality form (27). In this context $u$ denotes the displacements at the various points of the structure, which occupies $\Omega$ in its undeformed configuration. For a plate problem one has $\Omega \subset \mathbb{R}^2$, for a three-dimensional continuum $\Omega \subset \mathbb{R}^3$, etc. Moreover, the functional space $V$ is dictated by the kind of application considered and by the (natural, support) boundary conditions of the structure. The operator $L$ and the energy form $a(u, u)$ depend on the mechanical theory used for the elastic part of the structure, while $l$ denotes the external loading. Coercivity usually means that a well-defined mechanical theory has been assumed and sufficient boundary conditions are assigned so that, for example, no rigid body motions of the structure are allowed. Finally, one should note that the familiar form of the principle of virtual work (i.e., a variational equality) is recovered from the hemivariational inequality (27) if the effect of the nonlinear terms is neglected, i.e., if the second term on the left-hand side of (27) is absent, and if an equality is assumed in place of the inequality.


Theoretical Studies 


Recall that the existence theory for variational inequalities is a well-developed part of mathematics which is closely connected with the convexity of the energy functionals involved; indeed, it is based on monotonicity arguments. In contrast, owing to the absence of convexity, the study of hemivariational inequalities is based on compactness arguments. Existence and approximation results for hemivariational inequalities have been established for several applications. See [11,14] for details and citations to related references.


Semicoercive Case 


Noncoercive problems arise in mechanics, for exam- 
ple, when the existing classical boundary conditions 
are not sufficient to prevent (even infinitesimal) rigid 
body displacements and rotations of the structure. In 
classical, equality mechanics, one may write conditions 
that guarantee the existence of a solution to the con- 
sidered boundary value problem. Nevertheless, unique- 
ness of some quantities may be lost. One may consider, 
as an example, a free elastic body subjected to self-equi- 
librated external forces. In this case stresses and defor- 
mations of the elastic body can be determined, but its 
displacements are only determined modulo some rigid 
body displacements and rotations. In the presence of 
inequality (e.g., unilateral contact) constraints analo- 
gous relations have also been provided by G. Fichera 
(see, e. g., [13, Chap. 4]). It is interesting to observe that 
some inequality constraints may be activated and sta- 
bilize the body, a result that has certain applications in 
robotics [1]. 

For hemivariational inequalities, analogous results have been obtained by Panagiotopoulos and coworkers (see also [4]). One such result for an abstract hemivariational inequality is given here, without proof. Here $a(\cdot, \cdot)$ is assumed to be semicoercive, i.e., $a(\cdot, \cdot)$ is continuous and symmetric but it has a nonzero kernel, $\ker a = \{ q : a(q, q) = 0 \} \neq \{0\}$. Moreover, let $\ker a$ be finite dimensional. Let us assume that the norm $\|v\|$ on $V$ is equivalent to $|||v||| = p(\bar v) + |q|_2$, where $v = \bar v + q$, $q \in \ker a$, $\bar v \in (\ker a)^\perp$ (i.e., $(\bar v, q) = 0$ for all $q \in \ker a$), and $p(v)$ is a seminorm on $V$ such that $p(v) = p(v + q)$, $\forall v \in V$, $q \in \ker a$, and let $a(v, v) \ge c\,(p(v))^2$, $\forall v \in V$, $c = \text{const} > 0$. This semicoercivity inequality replaces (26) of the coercive case. Moreover, let $\hat q^+$ and $\hat q^-$ be the positive and negative parts of $\hat q$, where $\hat q = Lq$, i.e., $\hat q^+ = \max\{0, \hat q\}$, $\hat q^- = \max\{0, -\hat q\}$. The following quantities will also be used: $\beta(-\infty) = \limsup_{\xi \to -\infty} \beta(\xi)$ and $\beta(\infty) = \liminf_{\xi \to +\infty} \beta(\xi)$. On the assumption that

$$ \beta(-\infty) \le \beta(\xi) \le \beta(\infty), \quad \forall \xi \in \mathbb{R}, \qquad (28) $$

a necessary condition for the existence of a solution $u \in V$ of a semicoercive hemivariational inequality problem is the following inequality:

$$ \int_\Omega [\beta(-\infty)\hat q^+ - \beta(\infty)\hat q^-]\, d\Omega \le (l, q) \le \int_\Omega [\beta(\infty)\hat q^+ - \beta(-\infty)\hat q^-]\, d\Omega, \quad \forall q \in \ker a. \qquad (29) $$

If (28) holds strictly (with $<$ instead of $\le$), then (29) also holds strictly.


Substationarity of the Energy 


Under suitable assumptions, the hemivariational inequality is equivalent to the following substationarity problem: Find $u \in V$ such that the superpotential energy functional

$$ \Pi(v) = \tfrac{1}{2} a(v, v) + \int_\Omega j(\hat v)\, d\Omega - (l, v) \qquad (30) $$

is substationary at $v = u$. Here, the integral $\int_\Omega j(\hat v)\, d\Omega$ is set equal to $\infty$ if it is not defined.

Recall that the latter problem is, by definition, equivalent to: Find $u \in V$ which is a solution of the inclusion

$$ 0 \in \bar\partial \Pi(u). \qquad (31) $$

The equivalence of this substationarity problem with the hemivariational inequality can be proved on the assumption that $j$ is locally Lipschitz and d-regular.
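A one-degree-of-freedom analogue of (30)-(31) shows that substationary points need not be minima. Assume, purely for illustration, $\Pi(v) = \tfrac12 k v^2 + c \min\{v, 1\} - l v$ with $k, c > 0$, i.e., a spring of stiffness $k$ in parallel with a nonmonotone element whose force drops from $c$ to $0$ at $v = 1$. Then $0 \in \bar\partial \Pi(v)$ holds either on a smooth branch ($kv + c = l$ with $v < 1$, or $kv = l$ with $v > 1$) or at the kink, where $\bar\partial \Pi(1) = k + [0, c] - l$. The function names below are hypothetical.

```python
def substationary_points(k, c, l, kink=1.0):
    """All v with 0 in the Clarke gradient of Pi(v) = k v^2 / 2 + c min(v, kink) - l v."""
    points = []
    v = (l - c) / k                      # branch v < kink: Pi'(v) = k v + c - l
    if v < kink:
        points.append(v)
    v = l / k                            # branch v > kink: Pi'(v) = k v - l
    if v > kink:
        points.append(v)
    if 0.0 <= l - k * kink <= c:         # kink: 0 in [k kink - l, k kink + c - l]
        points.append(kink)
    return sorted(points)

def energy(v, k, c, l, kink=1.0):
    return 0.5 * k * v**2 + c * min(v, kink) - l * v

pts = substationary_points(k=1.0, c=1.0, l=1.5)
print(pts)                                        # [0.5, 1.0, 1.5]
print([energy(v, 1.0, 1.0, 1.5) for v in pts])    # [-0.125, 0.0, -0.125]
```

The kink point $v = 1$ is a local maximum of $\Pi$ lying between two minima of equal energy, which illustrates why substationarity points include a class of local maxima.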


Applications 


Several types of hemivariational inequalities have al- 
ready been studied (see, for example, [13,14], and the 
references given there) with respect to certain engineer- 
ing problems, e.g. in nonmonotone semipermeability 
problems, in the theory of multilayered plates (delam- 
ination), in the theory of composite structures, in the 
theory of partial debonding of adhesive joints etc. 


Numerical Algorithms 


Up to now (1998), the following methods have been investigated for the numerical solution of hemivariational inequality problems in mechanics.

• Nonsmooth optimization algorithms (in particular, of the bundle type) for the solution of the inclusion (31). See [6,9] and, for the bundle concept in nonsmooth optimization, e.g., [7].

• Decomposition into a series of convex subproblems (for instance, into a number of variational inequalities). For the numerical treatment of problems in elastostatics this approach has been the most fruitful, according to the experience of the authors. The decomposition has been based either on engineering-motivated methods and heuristics (cf. [8]) or on mathematical programming techniques. In the latter case, results from the theory of quasidifferentiability [3], of difference-convex optimization [10] and of enumeration-type or branch and bound procedures [16] have been tested. More details can be found in [3,5,10,14].
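The decomposition principle can be illustrated, under strong simplifying assumptions, on a hypothetical two-degree-of-freedom model: stiffness matrix $K$, load vector $l$, and at every degree of freedom a nonmonotone element whose force is $c$ for $u_i < 1$ and $0$ for $u_i > 1$ (the law of $c \min\{u_i, 1\}$). Fixing one branch per element turns the problem into the linear, hence convex, subproblem $K u = l - c\,s$, with $s_i = 1$ on the lower branch and $s_i = 0$ on the upper one; a solution is kept only if it is consistent with the assumed branches. This brute-force enumeration is a minimal sketch of the principle only, not one of the engineering or mathematical programming decompositions cited above, and it ignores solutions located exactly at a kink.

```python
import itertools
import numpy as np

def enumerate_branches(K, l, c, kink=1.0):
    """Solve K u = l - c*s for every branch pattern s in {0, 1}^n.

    s_i = 1 means element i is on its lower branch (u_i < kink, force c);
    s_i = 0 means the upper branch (u_i > kink, force 0). Only solutions
    consistent with the assumed pattern are returned.
    """
    n = len(l)
    solutions = []
    for pattern in itertools.product((0, 1), repeat=n):
        s = np.array(pattern, dtype=float)
        u = np.linalg.solve(K, l - c * s)          # linear subproblem
        lower = s == 1.0
        if np.all(u[lower] < kink) and np.all(u[~lower] > kink):
            solutions.append(u)
    return solutions

K = np.array([[2.0, -1.0], [-1.0, 2.0]])
l = np.array([1.5, 1.5])
for u in enumerate_branches(K, l, c=1.0):
    print(np.round(u, 4))
```

All four branch patterns turn out to be consistent here, so the nonconvex problem has four equilibria away from the kinks; the enumeration grows as $2^n$, which is why practical decompositions rely on heuristics or branch and bound.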
The basic network flow problem can be formulated as

$$ \begin{aligned} \min\ & \sum_{(i,j) \in A} \varphi_{ij}(x_{ij}) \\ \text{s.t.}\ & \sum_{(n,j) \in A} x_{nj} - \sum_{(i,n) \in A} x_{in} = b_n, \quad \forall n \in N, \\ & l_{ij} \le x_{ij} \le u_{ij}, \quad \forall (i,j) \in A, \end{aligned} \qquad (1) $$

where $A$ is the (directed) arc set with generic element $(i, j)$; $N$ is the node set with generic element $n$; $b_n$ is the net supply (if $b_n > 0$) or net demand (if $b_n < 0$) at node $n$; $u_{ij}$ (respectively, $l_{ij}$) is the flow upper (respectively, lower) bound for arc $(i, j)$; $x_{ij}$ is the flow decision variable for arc $(i, j)$; and $\varphi_{ij}(x_{ij})$ is the cost function for arc $(i, j)$.
It is assumed that $\sum_{n \in N} b_n = 0$ (otherwise, a dummy supply node or demand node can be added to $N$ to enforce this condition). The objective function in (1) minimizes total costs in the network; the first constraints are node flow balance equations; and the second constraints are arc flow bounds. Extensions to the basic network flow problem given above include multicommodity networks, generalized networks (i.e., networks with arc flow gains or losses), and augmented networks (i.e., formulations with additional constraints and/or decision variables in addition to $x_{ij}$) (see, for example, [1]).
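The constraints of (1) are easy to check for a candidate flow. The sketch below assumes that arc flows and bounds are stored in dictionaries keyed by the arc $(i, j)$ (a data layout chosen here for illustration; the function name and the small three-node network are hypothetical). It computes outflow minus inflow at every node, compares it with $b_n$, and checks the bounds $l_{ij} \le x_{ij} \le u_{ij}$.

```python
def is_feasible(flows, lower, upper, supply, tol=1e-9):
    """Check the constraints of (1) for arc flows given as {(i, j): x_ij}."""
    if abs(sum(supply.values())) > tol:               # sum of b_n must vanish
        raise ValueError("total supply must equal total demand")
    balance = {n: 0.0 for n in supply}
    for (i, j), x in flows.items():
        if not (lower[(i, j)] - tol <= x <= upper[(i, j)] + tol):
            return False                              # arc bound violated
        balance[i] += x                               # flow leaving node i
        balance[j] -= x                               # flow entering node j
    return all(abs(balance[n] - supply[n]) <= tol for n in supply)

supply = {"s": 10.0, "m": 0.0, "t": -10.0}            # b_n: supply > 0, demand < 0
arcs = [("s", "m"), ("m", "t"), ("s", "t")]
lower = {a: 0.0 for a in arcs}
upper = {a: 8.0 for a in arcs}
print(is_feasible({("s", "m"): 6.0, ("m", "t"): 6.0, ("s", "t"): 4.0}, lower, upper, supply))  # True
print(is_feasible({("s", "m"): 2.0, ("m", "t"): 6.0, ("s", "t"): 8.0}, lower, upper, supply))  # False
```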

If the cost function $\varphi_{ij}(x_{ij})$ is a nonconvex function for one or more of the arcs in set $A$, then (1) is referred to as a nonconvex network flow problem (NNFP). The most commonly used cost function in an NNFP is a fixed charge function, of the form

$$ \varphi_{ij}(x_{ij}) = \begin{cases} 0 & \text{if } x_{ij} = 0, \\ \alpha_{ij} + \beta_{ij} x_{ij} & \text{if } x_{ij} > 0, \end{cases} \qquad (2) $$

where $\alpha_{ij}$ and $\beta_{ij}$ are coefficients with $\alpha_{ij} > 0$ (and $l_{ij} = 0$). Because the fixed cost $\alpha_{ij}$ must be incurred before any flow can be carried on arc $(i, j)$, fixed charge NNFPs can be used to model a variety of network design and expansion [29], lot sizing [17], and facility location [6,30] problems. Classical combinatorial problems, such as traveling salesperson and 0-1 knapsack problems, can also be represented as fixed charge NNFPs. In fact, any mathematical programming problem that can be formulated as an integer program with integer coefficients can be recast as a fixed charge NNFP [28].

Another common form of cost function in an NNFP is a concave quadratic function, of the form

$$ \varphi_{ij}(x_{ij}) = \alpha_{ij} + \beta_{ij} (x_{ij} - \gamma_{ij})^2, \qquad (3) $$

where $\alpha_{ij}$, $\beta_{ij}$, and $\gamma_{ij}$ are coefficients with $\beta_{ij} < 0$. Concave quadratic NNFPs are used to model arc flow economies of scale, which are present in many communication [32], water resource [10], and physical distribution [22] problems.

A third type of cost function used in NNFPs is the 'sawtooth' function. A simple two-piece sawtooth function is given by

$$ \varphi_{ij}(x_{ij}) = \begin{cases} \alpha_{ij} x_{ij} & \text{if } x_{ij} < \gamma_{ij}, \\ \beta_{ij} x_{ij} & \text{if } x_{ij} \ge \gamma_{ij}, \end{cases} \qquad (4) $$

where $\alpha_{ij}$, $\beta_{ij}$, and $\gamma_{ij}$ are coefficients with $\alpha_{ij} > \beta_{ij}$ and $l_{ij} < \gamma_{ij} < u_{ij}$. Functions of the form (4) are used to represent price-breaks and other types of all-units discounting [7] that occur, for instance, in network representations of inventory [33] and cash flow management problems [24].
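The three cost functions (2)-(4) are straightforward to code. The sketch below writes them as plain Python functions of a single arc flow; the function names and the numerical coefficients in the usage lines are arbitrary illustrative values.

```python
def fixed_charge(x, alpha, beta):
    """Fixed charge cost (2): zero without flow, alpha + beta*x otherwise (alpha > 0)."""
    return 0.0 if x == 0 else alpha + beta * x

def concave_quadratic(x, alpha, beta, gamma):
    """Concave quadratic cost (3): alpha + beta*(x - gamma)**2 with beta < 0."""
    return alpha + beta * (x - gamma) ** 2

def sawtooth(x, alpha, beta, gamma):
    """Two-piece sawtooth (4): all-units discount from unit price alpha to beta at gamma."""
    return alpha * x if x < gamma else beta * x

print(fixed_charge(0.0, 5.0, 1.0), fixed_charge(2.0, 5.0, 1.0))       # 0.0 7.0
print(sawtooth(9.9, 2.0, 1.5, 10.0), sawtooth(10.0, 2.0, 1.5, 10.0))  # about 19.8, then 15.0
```

The last line shows the price-break effect: shipping 10 units costs less than shipping 9.9 units. The downward jump at $\gamma_{ij}$ makes (4) discontinuous inside $[l_{ij}, u_{ij}]$, so it can be neither concave nor convex there.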

The NNFP is in the class of NP-hard problems [14]. Determining the global minimum of an NNFP is challenging because of the existence of many feasible points that are locally, but not globally, optimal. G.M. Guisewite [18] distinguishes between NNFPs with concave cost functions (such as (2) and (3)) and indefinite cost functions (that is, functions that are neither concave nor convex, such as (4)). For concave NNFPs, if the constraints in (1) are feasible, then there exists an optimal extreme point solution. Further, if the coefficients $b_n$, $l_{ij}$, and $u_{ij}$ in (1) are integer-valued, then there exists an optimal solution whose arc flows $x_{ij}$ are also integer-valued [4].
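A tiny hypothetical instance shows both properties at once. A supply of 10 units must be sent from node $s$ to node $t$ over two parallel arcs with fixed charge costs (2): arc 1 with $\alpha = 5$, $\beta = 1$, and arc 2 with $\alpha = 1$, $\beta = 2$, each with capacity 10. Enumerating the integer flows confirms that the global minimum lies at an extreme point, while the other extreme point $(0, 10)$ is only a local minimum: moving a small amount $\epsilon$ of flow onto arc 1 raises the cost from 21 to $26 - \epsilon$.

```python
def fixed_charge(x, alpha, beta):
    """Fixed charge cost (2)."""
    return 0.0 if x == 0 else alpha + beta * x

def total_cost(x1, demand=10):
    x2 = demand - x1                       # flow balance at s and t
    return fixed_charge(x1, 5.0, 1.0) + fixed_charge(x2, 1.0, 2.0)

costs = {(x1, 10 - x1): total_cost(x1) for x1 in range(0, 11)}
best = min(costs, key=costs.get)
print(best, costs[best])                   # (10, 0) 15.0
print(costs[(0, 10)], costs[(1, 9)])       # 21.0 25.0
```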

One of the earliest solution methods for concave 
NNFPs was proposed by B. Yaged [32]. This method 
uses successive linearizations of the concave cost func- 
tion. It quickly converges to a local (but not necessar- 
ily global) minimum. Other approximate methods for 
concave NNFPs involve dual ascent [2,9], Lagrangian 
relaxation [11], local extreme point search techniques 
[13,21], and tabu search methods [16]. 
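The idea of successive linearization can be sketched for the simplest possible network, assuming (for illustration only) a demand routed over parallel arcs with concave costs $2x - 0.05x^2$ and $1.5x - 0.01x^2$, both instances of (3). Each iteration replaces the costs by their tangent slopes at the current flow and solves the resulting linear min-cost flow problem, which for parallel arcs reduces to filling the arcs in order of increasing slope. This is a generic linearization loop, not a reproduction of Yaged's published algorithm, and all names and coefficients are hypothetical. Started from different flows it stops at different extreme points, only one of which, $(0, 10)$ with cost 14, is globally optimal.

```python
def linear_parallel_flow(slopes, caps, demand, tol=1e-12):
    """Min-cost flow with linear arc costs on parallel arcs: fill cheapest arcs first."""
    flows = [0.0] * len(slopes)
    remaining = demand
    for i in sorted(range(len(slopes)), key=lambda k: slopes[k]):
        flows[i] = min(caps[i], remaining)
        remaining -= flows[i]
    if remaining > tol:
        raise ValueError("demand exceeds total capacity")
    return flows

def successive_linearization(derivatives, caps, demand, x0, max_iter=50):
    """Repeatedly linearize the concave arc costs at x and re-solve the linear problem."""
    x = list(x0)
    for _ in range(max_iter):
        slopes = [d(xi) for d, xi in zip(derivatives, x)]
        x_new = linear_parallel_flow(slopes, caps, demand)
        if x_new == x:                       # fixed point: no further change
            break
        x = x_new
    return x

costs = [lambda x: 2.0 * x - 0.05 * x**2, lambda x: 1.5 * x - 0.01 * x**2]
derivs = [lambda x: 2.0 - 0.1 * x, lambda x: 1.5 - 0.02 * x]
caps, demand = [10.0, 10.0], 10.0

for start in ([10.0, 0.0], [5.0, 5.0]):
    x = successive_linearization(derivs, caps, demand, start)
    print(start, "->", x, sum(c(xi) for c, xi in zip(costs, x)))
```

Starting at $(10, 0)$, the linearized problem already prefers arc 1, so the method stops at a local minimum of cost 15; starting at $(5, 5)$, it moves to the global minimum.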

Exact methods for concave NNFPs generally rely on 
the underlying network topology. For problems with 
an arbitrary network topology, branch and bound pro- 
cedures using rectangular partitioning are the most 
widely used techniques [3,5,26,30]. If the number of 
supply nodes or demand nodes in the set N is limited, 
then dynamic programming approaches are tractable 
[8,17,20]. Alternatively, if there are only a small num- 
ber of arcs in the set A with a nonlinear cost function, 
then parametric programming techniques may be em- 
ployed [25]. 

For the second type of NNFP, with an indefinite form of cost function, the optimal solution to (1) is not necessarily at an extreme point of the feasible region. Where the indefinite cost function is piecewise-linear but not necessarily continuous (as in (4)), the problem can be formulated and solved as a mixed integer program. If the indefinite function is continuous (and satisfies certain regularity properties), then it can be converted to a difference convex function (d.c. function), and the indefinite NNFP can be solved as a specialized difference convex programming problem [31]. For more general functions, the indefinite NNFP can be converted (at least in principle) to a concave NNFP on an expanded network. Then the concave NNFP solution techniques can be applied to the problem on the expanded network [27].


See [12,15,18,19,22,23] for additional discussion of applications and solution methods for NNFPs.
Problems in unilateral mechanics involving both monotone and nonmonotone unilateral boundary conditions have as variational formulations the so-called variational-hemivariational inequalities [5]. Solutions of these mathematical models can be determined as the critical points of some energy functionals which can usually be expressed as the sum of a locally Lipschitz function $\Phi: X \to \mathbb{R}$ and a proper, convex and lower semicontinuous function $\Psi: X \to \mathbb{R} \cup \{+\infty\}$. The critical points of $J := \Phi + \Psi$ are here defined as the solutions of the variational-hemivariational inequality:

$$ u \in X: \quad \Phi^\circ(u; v - u) + \Psi(v) - \Psi(u) \ge 0, \quad \forall v \in X. \qquad (1) $$
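In one dimension this definition of a critical point can be checked by hand. Take, as a hypothetical example, $X = \mathbb{R}$, $\Phi(u) = -|u|$ and $\Psi$ the indicator function of $[a, b] = [-1, 2]$. Since $\Phi^\circ(u; d) = \max\{g_{lo} d, g_{hi} d\}$ for the Clarke gradient $[g_{lo}, g_{hi}]$ of $\Phi$ at $u$, a point $u \in [a, b]$ solves the inequality exactly when ($u = b$ or $g_{hi} \ge 0$) and ($u = a$ or $g_{lo} \le 0$). The sketch below applies this test to a few candidate points; the function names are illustrative.

```python
def clarke_gradient_neg_abs(u):
    """Clarke gradient of Phi(u) = -|u| as an interval [g_lo, g_hi]."""
    if u > 0:
        return -1.0, -1.0
    if u < 0:
        return 1.0, 1.0
    return -1.0, 1.0                       # kink at 0: hull of the one-sided slopes

def is_critical(u, a, b, clarke_gradient):
    """Test Phi°(u; v - u) + Psi(v) - Psi(u) >= 0 for all v, with Psi = indicator of [a, b]."""
    if not (a <= u <= b):
        return False                        # Psi(u) = +inf: u is not admissible
    g_lo, g_hi = clarke_gradient(u)
    ok_right = (u == b) or (g_hi >= 0.0)    # directions d = v - u > 0
    ok_left = (u == a) or (g_lo <= 0.0)     # directions d = v - u < 0
    return ok_right and ok_left

candidates = [-1.0, -0.5, 0.0, 1.0, 2.0]
print([u for u in candidates if is_critical(u, -1.0, 2.0, clarke_gradient_neg_abs)])
# [-1.0, 0.0, 2.0]
```

The three critical points found are a local minimum at the endpoint $-1$, the local maximum $0$ at the kink of $\Phi$, and the global minimum at $2$, so critical points in this generalized sense mix minima and maxima just as in the smooth theory.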

For example, let us consider a linear elastic body identified with an open bounded subset $\Omega \subset \mathbb{R}^3$. The boundary $\Gamma$ of the body $\Omega$ is assumed to consist of three open disjoint parts $\Gamma_1$, $\Gamma_2$ and $\Gamma_3$, i.e., $\Gamma = \bar\Gamma_1 \cup \bar\Gamma_2 \cup \bar\Gamma_3$. Let us denote by $u = (u_i)$, $\sigma = (\sigma_{ij})$, $\varepsilon = (\varepsilon_{ij})$, $S = (S_i)$ the displacement vector, the stress tensor, the strain tensor and the stress vector, respectively. It is supposed that the body is subject to a body force density $f \in L^2(\Omega; \mathbb{R}^3)$. Moreover, the part $\Gamma_1$ of the boundary is required to satisfy the constraint

$$ u(x) \in Q(x), \quad \forall x \in \Gamma_1, \qquad (2) $$

where $Q(x)$ is defined as follows:

$$ Q(x) := \{ y \in \mathbb{R}^3 : g(x, y) \le 0 \}, \qquad (3) $$

where $g: \Gamma_1 \times \mathbb{R}^3 \to \mathbb{R}$ is continuous and convex in the second variable. Moreover, we assume that $g(x, 0) \le 0$, $\forall x \in \Gamma_1$. Thus $Q(x)$ is a nonempty, closed and convex subset of $\mathbb{R}^3$ for all $x \in \Gamma_1$. The formulation of these unilateral constraints has to encompass the associated forces of constraint $r(x)$. We assume a normal reaction law of the form

$$ -r(x) \in N_{Q(x)}(u(x)), \quad \forall x \in \Gamma_1, \qquad (4) $$

where for a vector $v \in \mathbb{R}^3$, $N_{Q(x)}(v)$ denotes the normal cone of $Q(x)$ at $v$. Note that the system (2)-(4) is equivalent to the variational inequality

$$ u(x) \in Q(x): \quad r(x)^T (v - u(x)) \ge 0, \quad \forall v \in Q(x),\ \forall x \in \Gamma_1. \qquad (5) $$
The part $\Gamma_2$ of the boundary (it is assumed that $\operatorname{meas}(\Gamma_2) > 0$) is assumed to be fixed, that is,

$$ u(x) = 0, \quad \forall x \in \Gamma_2. \qquad (6) $$


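On $\Gamma_3$ we consider conditions of unilateral contact between the body and a Winkler-type support which may sustain only limited values of traction, that is (we consider the usual decompositions $v = v_N n + v_T$, $S = S_N n + S_T$, $v_N, S_N \in \mathbb{R}$, $v_T, S_T \in \mathbb{R}^3$, where $n$ denotes the unit outward normal vector on $\Gamma$):

$$ -S_N \in \bar\partial j(x, u_N(x)), \quad \forall x \in \Gamma_3, \qquad (7) $$

where

$$ j(x, y) = \begin{cases} 0 & \text{if } y < 0, \\ \tfrac{1}{2} k_0(x)\, y^2 & \text{if } 0 \le y \le \varepsilon, \\ \tfrac{1}{2} k_0(x)\, \varepsilon^2 & \text{if } y > \varepsilon, \end{cases} $$

and

$$ S_T = C_T \quad \text{on } \Gamma_3, \qquad (8) $$

with $C_T$ given in $L^2(\Gamma_3; \mathbb{R}^3)$. We refer the reader to [4] for the details concerning the formulation of the subdifferential law (7). From (7), (8) and the orthogonality between $S_N n$ and $u_T$ and between $S_T$ and $u_N n$, we obtain the hemivariational inequality

$$ S^T v + j^\circ(x, u_N(x); v_N) - C_T^T v_T \ge 0, \quad \forall v \in \mathbb{R}^3,\ \forall x \in \Gamma_3. $$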


Here $\bar\partial j(x, \cdot)$ stands for the Clarke subdifferential of $j$ with respect to the second variable, while $j^\circ(x, u; v)$ denotes the generalized directional derivative of $j$ at $u$ in the direction $v$ [2]. In the framework of a small deformation theory, one has the equilibrium equations

$$ \sigma_{ij,j} + f_i = 0, \qquad (9) $$

the strain-displacement relations

$$ \varepsilon_{ij}(u) = \tfrac{1}{2}(u_{i,j} + u_{j,i}), \qquad (10) $$

and the constitutive law

$$ \sigma_{ij} = C_{ijkl}\, \varepsilon_{kl}(u), \qquad (11) $$

where $C_{ijkl} \in L^\infty(\Omega)$ denotes the elasticity tensor, assumed to satisfy the usual symmetry and ellipticity properties. Suppose now that all the data are sufficiently smooth to justify the following computations. From (9) and for a virtual displacement $v - u$ we obtain the equation

$$ -\int_\Omega \sigma_{ij,j} (v_i - u_i)\, dx = \int_\Omega f^T (v - u)\, dx. $$

Using the Green-Gauss theorem and relations (10) and (11), we get

$$ \int_\Omega C_{ijkl}\, \varepsilon_{ij}(u)\, \varepsilon_{kl}(v - u)\, dx = \int_\Omega f^T (v - u)\, dx + \int_\Gamma S^T (v - u)\, ds. $$

Using now the boundary conditions, we derive for $v$ and $u$ satisfying (6):

$$ \int_\Omega C_{ijkl}\, \varepsilon_{ij}(u)\, \varepsilon_{kl}(v - u)\, dx = \int_\Omega f^T (v - u)\, dx + \int_{\Gamma_3} S_N (v_N - u_N)\, ds + \int_{\Gamma_3} C_T^T (v_T - u_T)\, ds + \int_{\Gamma_1} r^T (v - u)\, ds. $$
Combining this last expression, which expresses the principle of virtual work, with the mechanical laws (4) and (7), we obtain the variational-hemivariational inequality problem: Find $u$ satisfying (2) and (6) such that

$$ \int_\Omega C_{ijkl}\, \varepsilon_{ij}(u)\, \varepsilon_{kl}(v - u)\, dx + \int_{\Gamma_3} j^\circ(x, u_N; v_N - u_N)\, ds - \int_\Omega f^T (v - u)\, dx - \int_{\Gamma_3} C_T^T (v_T - u_T)\, ds \ge 0 $$

for all $v$ satisfying (2) and (6). This mathematical model expresses the principle of virtual work in its inequality form. Let us now present a suitable framework to formulate this last inequality model. The boundary $\Gamma$ of the body is assumed sufficiently regular and we denote by $\gamma: H^1(\Omega; \mathbb{R}^3) \to H^{1/2}(\Gamma; \mathbb{R}^3)$, $\gamma_N: H^1(\Omega; \mathbb{R}^3) \to H^{1/2}(\Gamma)$ and $\gamma_T: H^1(\Omega; \mathbb{R}^3) \to H_T$ (see, e.g., [6] for the notation) the usual trace mapping, normal trace mapping and tangential trace mapping, respectively. We set

$$ X = \{ u \in H^1(\Omega; \mathbb{R}^3) : \gamma(u(x)) = 0 \ \text{a.e. } x \in \Gamma_2 \} $$

and

$$ C = \{ u \in X : \gamma(u(x)) \in Q(x) \ \text{a.e. } x \in \Gamma_1 \}. $$
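We define the operator $A: X \to X'$, the element $h \in X'$ and the functional $J: X \to \mathbb{R}$ by the formulas:

$$ \langle Au, v \rangle = \int_\Omega C_{ijkl}\, \varepsilon_{ij}(u)\, \varepsilon_{kl}(v)\, dx, \quad \forall u, v \in X, $$

$$ \langle h, v \rangle = \int_\Omega f_i v_i\, dx + \int_{\Gamma_3} C_T^T v_T\, ds, \quad \forall v \in X, $$

$$ J(v) = \int_{\Gamma_3} j(x, \gamma_N v(x))\, ds, \quad \forall v \in X. $$

Using the rules concerning the subdifferentiation of integral functionals and composite mappings [2], we can show that

$$ J^\circ(u; v) \le \int_{\Gamma_3} j^\circ(x, \gamma_N u(x); \gamma_N v(x))\, ds, \quad \forall u, v \in X. \qquad (12) $$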


Setting

$$ \Phi(u) = \tfrac{1}{2} \langle Au, u \rangle - \langle h, u \rangle + J(u) $$

and using relation (12), we see that the equilibria of our system can be determined by means of the variational-hemivariational inequality:

$$ u \in C: \quad \Phi^\circ(u; v - u) \ge 0, \quad \forall v \in C, $$

or equivalently

$$ u \in X: \quad \Phi^\circ(u; v - u) + \Psi_C(v) - \Psi_C(u) \ge 0, \quad \forall v \in X, \qquad (13) $$

where $\Psi_C$ denotes the indicator function of $C$. It is now natural to introduce a concept of generalized critical point of the energy functional $I = \Phi + \Psi_C$ as a solution of the variational-hemivariational inequality (13) (which is a particular case of the general model given in (1)).

Various other examples in mechanics leading to a 
variational-hemivariational inequality like (1) are de- 
scribed in [5]. As a rule, the formulation of concrete 
problems involving both monotone and nonmonotone 
boundary conditions introduces a general nonsmooth 
and nonconvex energy functional (expressed as the sum 


of a locally Lipschitz function ®: X — R and a proper, 
convex and lower semicontinuous function W: X > R 
U {+ co }) whose critical points are defined as the solu- 
tions of the variational-hemivariational inequality (1). 
If ® € C! (X; R) then problem (1) reduces to a classi- 
cal variational inequality. A critical point theory to deal 
with such case has been developed by A. Szulkin [7]. 
Defining a compactness condition of Palais-Smale type 
related to the variational inequality model and using 
the famous Ekeland principle, Szulkin provided a de- 
formation lemma and extended the well-known moun- 
tain pass theorem, the saddle point theorem, the main 
results for even functionals, etc. Several examples in- 
cluding variational inequalities are given in [7]. If W = 
0, the problem (1) reduces to the problem studied by 
K.-C. Chang [1]. Motivated by the study of partial dif- 
ferential equations involving discontinuous nonlinear- 
ities, Chang extended the concept of critical point, the 
Palais-Smale condition and the deformation lemma so 
as to be applicable to locally Lipschitz functionals. The 
minimax method developed by Chang is now one of the 
most efficient and appreciated tools to deal with hemi- 
variational inequalities arising in unilateral mechanics 
and partial differential equations involving discontinu- 
ous nonlinearities. It has been in particular extensively 
used by D. Goeleven, D. Motreanu, and P.D. Pana- 
giotopoulos [3] so as to develop in several directions the 
theory of hemivariational inequalities (see for instance 
the references cited in [3]). Thus, the theories of Szulkin and of Chang have proved very efficient for problems in unilateral mechanics involving certain classes of boundary conditions. However, as seen above, to deal efficiently with problems in unilateral mechanics involving both monotone and nonmonotone boundary conditions, one needs a critical point theory for functions that can be written as the sum of a locally Lipschitz function and a proper, convex and lower semicontinuous function. Such a theory has recently been developed by Goeleven, Motreanu, and Panagiotopoulos [3]. The approach combines those of Szulkin and of Chang in a nontrivial way. A general linking theorem
is obtained and then used to generalize the mountain 
pass theorem, the saddle point theorem and the main 
results for even functionals. This last theory constitutes 
a contribution in the field of nonconvex-nonsmooth 
calculus of variations that unifies the theory of Szulkin 
and Chang. Moreover, it can be used to develop the theory of variational-hemivariational inequalities and, consequently, to study various concrete problems in mechanics involving different types of unilateral boundary conditions.

Nondifferentiable, or nonsmooth, optimization (NDO) is concerned with problems where the smoothness assumption on the functions involved is relaxed. 'Nondifferentiability' means that the gradient does not exist at some points, implying that the function may have kinks or corner points. Consequently, the function cannot be approximated locally by a tangent hyperplane or by a quadratic approximation. In NDO, the smoothness assumption is usually replaced by weaker ones, which at least guarantee the existence of directional derivatives.

NDO problems arise in a variety of contexts, and 
methods designed for smooth optimization may fail to 
solve them. This justifies developing specialized theory 
and methods that are the object of this short introduc- 
tion. In the sequel, we will often refer to convex NDO, 
a subclass of nondifferentiable optimization, in which 
functions are further assumed to be convex. Due to its 
global property, convexity allows stronger convergence 
results and finer analyses. Yet, the difficulties linked 
with the presence of kinks remain an important aspect, 
justifying special interest for this class of problems. 

In the following sections, we give some basic definitions, then discuss examples of nondifferentiable optimization problems and, finally, describe a few different solution techniques.


Basic Definitions 


The basic nondifferentiable optimization problem takes 
the form 


$$\text{[NDP]}\qquad \min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$


where f is a real-valued, continuous, nondifferentiable function. If f is, in addition, convex, it has at least one supporting hyperplane at every point of $\mathbb{R}^n$. The slopes of such hyperplanes form the set of subgradients, which is known as the subdifferential set or the generalized gradient [7]. At differentiable points there is a unique supporting hyperplane, whose slope is the gradient. At nondifferentiable points, there is an infinite set of subgradients and, hence, an infinite set of supporting hyperplanes.

A supporting hyperplane to f at a point $x_0$ is given by

$$y = f(x_0) + \xi^T (x - x_0),$$

where $\xi$ is any element of the subdifferential $\partial f(x_0)$ of f at $x_0$. Since this hyperplane supports the graph of f, it lies below it everywhere, which leads to the subgradient inequality

$$f(x_0) + \xi^T (x - x_0) \le f(x), \quad \forall x. \qquad (2)$$

Subgradients are defined by this inequality.

Determining the whole subdifferential set is generally an extremely difficult, or impossible, task. If the function f is polyhedral, the number of extreme points of the subdifferential may be exponential in the dimension of the underlying space. A complete description of the subdifferential can be given in simple situations, such as when f is the maximum of a finite number of convex differentiable functions, $f(x) = \max_{i=1,\dots,m} f_i(x)$. The subdifferential $\partial f(x_0)$ is then given by

$$\partial f(x_0) = \Big\{ \sum_{i \in I(x_0)} \alpha_i \nabla f_i(x_0) : \sum_{i \in I(x_0)} \alpha_i = 1,\ \alpha_i \ge 0 \Big\}, \qquad I(x_0) = \{ i : f_i(x_0) = f(x_0) \}.$$


When f is a Lipschitz function, the subdifferential set can be defined as the convex hull of the limits of the gradients $\nabla f(x_i)$ along sequences of differentiable points $x_i$ converging to $x_0$ [7]. The precise definition of $\partial f(x_0)$ is

$$\partial f(x_0) = \operatorname{conv} \{ \lim_{i \to \infty} \nabla f(x_i) : x_i \to x_0,\ \nabla f(x_i) \text{ exists} \}.$$


In nondifferentiable optimization, the whole subd- 
ifferential set is never calculated. Subgradients are cal- 
culated when needed and often a single element suf- 
fices. It is common practice to isolate the procedures for 
calculating subgradients into an oracle. The number of 
calls to the oracle can be a basis for comparing different 
NDO methods. 
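As an illustration, a minimal oracle for the case discussed above, where f is the maximum of finitely many differentiable functions, might look as follows. The function name `max_oracle` and the representation of each piece $f_i$ as a pair of Python callables (value, gradient) are assumptions made for this sketch. It returns $f(x)$ and the gradient of one active piece, which belongs to $\partial f(x)$ (take $\alpha_i = 1$ for that index and $\alpha_j = 0$ for the others).

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A piece f_i is represented (hypothetically) by a (value, gradient) pair of callables.
Piece = Tuple[Callable[[Sequence[float]], float], Callable[[Sequence[float]], Vector]]


def max_oracle(pieces: Sequence[Piece], x: Sequence[float]) -> Tuple[float, Vector]:
    """Return f(x) = max_i f_i(x) and one subgradient of f at x.

    The gradient of any active piece (f_i(x) = f(x)) lies in the
    subdifferential: it is the convex combination with alpha_i = 1.
    """
    values = [fi(x) for fi, _ in pieces]
    k = max(range(len(values)), key=values.__getitem__)  # an active index
    return values[k], pieces[k][1](x)


# Hypothetical example on R^1: f(x) = max(x_1^2, 1 - x_1)
pieces: List[Piece] = [
    (lambda x: x[0] ** 2, lambda x: [2.0 * x[0]]),
    (lambda x: 1.0 - x[0], lambda x: [-1.0]),
]
print(max_oracle(pieces, [2.0]))  # (4.0, [4.0])
print(max_oracle(pieces, [0.0]))  # (1.0, [-1.0])
```

Only one subgradient is returned per call, in line with the practice described above of never computing the whole subdifferential.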

A natural solution method in nonsmooth analysis is an iterative method, where a search is done following descent directions. A descent direction is one along which a small movement leads to a strict improvement of f. In other words,

$$f'(x_0; d) = \lim_{t \downarrow 0} \frac{f(x_0 + t d) - f(x_0)}{t}$$

should be strictly negative. $f'(x_0; d)$ is called the directional derivative, and it is related to the subdifferential through the relation

$$f'(x_0; d) = \sup \{ \xi^T d : \xi \in \partial f(x_0) \}. \qquad (3)$$

This relation implies that for d to be a descent direction, $-d$ has to make an acute angle with every subgradient of f at $x_0$.
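For a piecewise linear function $f(x) = \max_i (a_i^T x + b_i)$, the subdifferential is the convex hull of the active rows $a_i$, so the supremum in (3) is attained at one of them and $f'(x_0; d)$ can be computed exactly. The sketch below assumes this piecewise linear form. The function name and the tolerance used to detect active pieces are illustrative choices. The example shows that $-\xi$ for a single subgradient $\xi$ need not be a descent direction.

```python
from typing import Sequence


def directional_derivative(A: Sequence[Sequence[float]], b: Sequence[float],
                           x: Sequence[float], d: Sequence[float],
                           tol: float = 1e-12) -> float:
    """f'(x; d) for f(x) = max_i (a_i^T x + b_i), using relation (3).

    The subdifferential is the convex hull of the active rows a_i, and a
    linear function attains its supremum over a convex hull at a generator.
    """
    vals = [sum(aij * xj for aij, xj in zip(row, x)) + bi for row, bi in zip(A, b)]
    fx = max(vals)
    active = [i for i, v in enumerate(vals) if v >= fx - tol]
    return max(sum(aij * dj for aij, dj in zip(A[i], d)) for i in active)


# f(x) = max(x_1 + x_2, -x_1 + x_2) at x = 0, where both pieces are active
A, b = [[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]
print(directional_derivative(A, b, [0.0, 0.0], [-1.0, -1.0]))  # 0.0: not a descent direction
print(directional_derivative(A, b, [0.0, 0.0], [0.0, -1.0]))   # -1.0: a descent direction
```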


Sources of NDO Problems 


Nonsmooth problems are encountered in many disci- 
plines. In some instances, they occur naturally and in 
others they result from mathematical transformations. 
In statistics, for example, rectilinear data fitting, which was recognized early on as superior to the Euclidean one in that it overcomes the effect of outliers [25], results directly in an NDO problem. Similarly, problems involving $\ell_1$ or $\ell_\infty$ norms, Euclidean or Chebyshev distances, or a maximum of convex functions are typical NDO problems. As an example, the $\ell_\infty$ solution of an overdetermined linear system is found by minimizing the nondifferentiable convex function

$$\min_{x \in \mathbb{R}^n} \|Ax - b\|_\infty = \min_{x \in \mathbb{R}^n} \max_{i=1,\dots,m} |a_i^T x - b_i|, \qquad (4)$$

where $x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times n}$ with rows $a_i^T$. This problem can be traced back to the Russian mathematician P.L. Chebyshev, who studied it in the 1850s [25].
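A subgradient of the objective in (4) is cheap to obtain: if row k attains the largest residual magnitude, then $\operatorname{sign}(a_k^T x - b_k)\, a_k$ is a subgradient of $\|Ax - b\|_\infty$ at x. The sketch below, with a hypothetical function name and small made-up data, implements this oracle with plain Python lists.

```python
from typing import List, Sequence, Tuple


def linf_oracle(A: Sequence[Sequence[float]], b: Sequence[float],
                x: Sequence[float]) -> Tuple[float, List[float]]:
    """Value of ||Ax - b||_inf and one subgradient at x.

    If row k attains the maximum residual magnitude, sign(r_k) * a_k is a
    subgradient of the maximum of the convex functions |a_i^T x - b_i|.
    """
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    k = max(range(len(r)), key=lambda i: abs(r[i]))
    s = 1.0 if r[k] >= 0 else -1.0
    return abs(r[k]), [s * aij for aij in A[k]]


# Hypothetical overdetermined system with 3 equations and 2 unknowns
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
value, xi = linf_oracle(A, b, [0.0, 0.0])
print(value)  # 2.0, attained by the second equation (residual -2)
```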
Among the mathematical transformations that lead to NDO problems is the technique that changes constrained problems into unconstrained ones through the use of exact penalty functions [11] (cf. also » Quasidifferentiable optimization: Exact penalty methods). Equality constraints $\phi(x) = 0$ and inequality constraints $\psi(x) \le 0$ are placed in the objective using penalty parameters and the nondifferentiable functions $|\phi(x)|$ and $\max\{0, \psi(x)\}$, respectively. In other words, a solution to the constrained problem

$$\min\ f(x) \quad \text{s.t.}\quad \phi(x) = 0,\ \ \psi(x) \le 0, \qquad (5)$$

is determined by solving

$$\min_x\ f(x) + t_1 |\phi(x)| + t_2 \max\{0, \psi(x)\}$$

for large enough values of $t_1$ and $t_2$.
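A one-dimensional sketch of the exact penalty idea, on the hypothetical problem $\min x$ s.t. $\psi(x) = 1 - x \le 0$ (no equality constraint), whose solution is $x^* = 1$. The penalized function $x + t_2 \max\{0, 1 - x\}$ is minimized by brute force over a grid on $[-5, 5]$, a choice made only to keep the example self-contained. The constrained solution is recovered when $t_2$ exceeds 1, which in this example is the Lagrange multiplier of the constraint, and it is lost when $t_2$ is too small.

```python
def penalized(x: float, t2: float) -> float:
    """Exact penalty function for min x s.t. 1 - x <= 0 (hypothetical example)."""
    return x + t2 * max(0.0, 1.0 - x)


def grid_minimizer(t2: float) -> float:
    """Minimize the penalized function over the grid {-5, -4.99, ..., 5}."""
    grid = [i / 100 for i in range(-500, 501)]
    return min(grid, key=lambda x: penalized(x, t2))


print(grid_minimizer(2.0))  # 1.0: penalty large enough, constrained solution recovered
print(grid_minimizer(0.5))  # -5.0: penalty too small, minimizer runs to the grid boundary
```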

Still, the major source of NDO problems lies in the master problems that result from the application of relaxation/restriction techniques such as Lagrangian relaxation [14] (cf. also » Integer programming: Lagrangian relaxation), [12], Benders decomposition (cf. also » Generalized Benders decomposition) [4,13] and Dantzig-Wolfe decomposition [9,10].

These different approaches are conceptually similar, at least in the linear case, and end up solving the same NDO problem. To show that, let us consider the linear program

$$\text{[LP]}\qquad \min\ c^T x \quad \text{s.t.}\quad Ax \ge b,\ \ Dx \ge d,\ \ x \ge 0,$$

where, for ease of exposition, we assume that $\{x : Ax \ge b,\ x \ge 0\}$ is a bounded, nonempty polytope. The dual of [LP] is

$$\text{[LD]}\qquad \max\ b^T u + d^T v \quad \text{s.t.}\quad A^T u + D^T v \le c,\ \ u, v \ge 0.$$
Applying Lagrangian relaxation to [LP] is equivalent to relaxing $Dx \ge d$ using nonnegative dual multipliers v, leading to

$$\max_{v \ge 0}\ \min_{x \ge 0} \{ c^T x + v^T (d - Dx) : Ax \ge b \}. \qquad (6)$$


Benders decomposition applied to [LD] results in

$$\max_{v \ge 0}\ \max_{u \ge 0} \{ b^T u + d^T v : A^T u \le c - D^T v \},$$

where v is assumed to be the complicating variable. Replacing the inner problem by its dual leads to

$$\max_{v \ge 0}\ \min_{x \ge 0} \{ c^T x + v^T (d - Dx) : Ax \ge b \}. \qquad (7)$$


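Dantzig-Wolfe decomposition replaces [LP] by its convex representation in terms of the extreme points $x^h$ of $\{x : Ax \ge b,\ x \ge 0\}$, indexed by $h \in \mathcal{E}$, to get

$$\begin{aligned} \min\ & \sum_{h \in \mathcal{E}} \alpha_h (c^T x^h) \\ \text{s.t.}\ & \sum_{h \in \mathcal{E}} \alpha_h (D x^h) \ge d, \\ & \sum_{h \in \mathcal{E}} \alpha_h = 1, \\ & \alpha_h \ge 0, \quad h \in \mathcal{E}. \end{aligned}$$

Taking the dual results in

$$\max_{v \ge 0,\ v_0}\ v_0 + d^T v \quad \text{s.t.}\quad c^T x^h - v^T D x^h \ge v_0, \quad \forall h \in \mathcal{E},$$

which is equivalent to

$$\max_{v \ge 0}\ \min_{h \in \mathcal{E}}\ c^T x^h + v^T (d - D x^h). \qquad (8)$$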

The equivalence between (6) and (7) is obvious. Using the fact that a linear program over a nonempty bounded polytope always has an extreme point solution, the equivalence between (6), (7) and (8) is established. Therefore, Lagrangian relaxation applied to the primal is exactly Benders decomposition applied to the dual, and is equivalent by duality to Dantzig-Wolfe decomposition. Furthermore, all three solve (8), which is the maximization of a concave piecewise linear function that is nondifferentiable at intersection points.
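The function maximized in (8) is the minimum of finitely many affine functions of v, one per extreme point, which makes its concavity and its kinks explicit. The sketch below evaluates it, together with a supergradient $d - D x^h$ at a minimizing extreme point. The function name and the tiny instance are hypothetical. In the instance, $\{x : Ax \ge b,\ x \ge 0\}$ is the interval [0, 1] (extreme points 0 and 1), the complicating constraint is $x \ge 0.5$ and $c = 1$, so the LP optimum is 0.5. A coarse grid search over v recovers the same value, as the equivalences above predict.

```python
import math
from typing import List, Sequence, Tuple


def dual_function(extreme_points: Sequence[Sequence[float]], c: Sequence[float],
                  D: Sequence[Sequence[float]], d: Sequence[float],
                  v: Sequence[float]) -> Tuple[float, List[float]]:
    """Evaluate L(v) = min_h c^T x^h + v^T (d - D x^h), the function maximized in (8).

    Returns L(v) and a supergradient d - D x^h for a minimizing extreme point x^h.
    """
    best_val, best_g = math.inf, []
    for xh in extreme_points:
        Dx = [sum(dij * xj for dij, xj in zip(row, xh)) for row in D]
        g = [di - dxi for di, dxi in zip(d, Dx)]  # d - D x^h
        val = sum(ci * xj for ci, xj in zip(c, xh)) + sum(vi * gi for vi, gi in zip(v, g))
        if val < best_val:
            best_val, best_g = val, g
    return best_val, best_g


# Hypothetical instance: extreme points 0 and 1, complicating constraint x >= 0.5, c = 1.
pts, c, D, d = [[0.0], [1.0]], [1.0], [[1.0]], [0.5]
grid = [i / 100 for i in range(0, 301)]
v_best = max(grid, key=lambda v: dual_function(pts, c, D, d, [v])[0])
print(v_best, dual_function(pts, c, D, d, [v_best])[0])  # 1.0 0.5
```

In a real decomposition the extreme points are not enumerated in advance; they are generated one at a time by the subproblem, which is exactly the role of the oracle.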

F.H. Clarke [7] and [8] discusses further examples from physics, engineering, economics and optimal control. Other mathematical problems leading to NDO include semi-infinite programming, eigenvalue optimization and variational inequalities [16].


Solution Approaches 


Due to the existence of successful solution methods for differentiable optimization, one solution approach tries to transform nonsmooth problems into smooth ones. As an example, the absolute value function |x|, which is nondifferentiable at zero, can be approximated by

$$|x| \approx \begin{cases} -x, & x \le -\epsilon, \\ \dfrac{x^2}{2\epsilon} + \dfrac{\epsilon}{2}, & -\epsilon \le x \le \epsilon, \\ x, & x \ge \epsilon, \end{cases}$$

for small values of the parameter $\epsilon$. For these transformations to be successful, the right transformation should be found and the nondifferentiable points should be known. A solution approach based on this transformation is discussed in [21].
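The piecewise approximation above can be coded directly. This sketch, with hypothetical function names, also provides its derivative, which is continuous, so the smoothed function can be handed to a gradient-based method. The approximation error is largest at the kink, where it equals $\epsilon/2$.

```python
def smooth_abs(x: float, eps: float) -> float:
    """C^1 approximation of |x|: quadratic on [-eps, eps], exact outside."""
    if abs(x) >= eps:
        return abs(x)
    return x * x / (2.0 * eps) + eps / 2.0


def smooth_abs_derivative(x: float, eps: float) -> float:
    """Derivative of smooth_abs: -1, x / eps or 1, continuous at x = +-eps."""
    if x >= eps:
        return 1.0
    if x <= -eps:
        return -1.0
    return x / eps


eps = 0.1
print(smooth_abs(0.0, eps))  # 0.05: the largest error, eps / 2, occurs at the kink
print(smooth_abs(0.5, eps))  # 0.5: exact away from the kink
```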

Other solution approaches eliminate nondifferentiability by transforming an unconstrained nonsmooth problem into a constrained smooth one. This approach is highly efficient for problems that can be transformed into easily solvable constrained problems, such as linear programs. The $\ell_\infty$ optimization problem described in (4) is equivalent to the linear program

$$\min\ y \quad \text{s.t.}\quad Ax - y e \le b, \quad Ax + y e \ge b,$$

where e is the vector of appropriate dimension whose entries are all ones. Since this is a linear program with a special matrix structure, many linear programming techniques have been adapted to solve (4). These include the simplex-like algorithm in [3] and the interior point algorithms in [23] and [27].


Subgradient Methods 


The first methods for nondifferentiable optimization 
tried to extend the gradient-based methods that were 
successful for smooth optimization. The transition 
from gradients to subgradients is not straightforward, as some subgradient-based search directions are not necessarily improving directions. P. Wolfe [26] gives an example where the extension of the steepest descent method fails. To overcome that, some methods [18,20] take a serious step only when the next iterate is a better one.

Subgradient methods (cf. also » Nondifferentiable optimization: Subgradient optimization methods) were developed by N.Z. Shor [24] in the 1960s. They are basically an iterative technique where iterates are updated using a current subgradient and a carefully chosen stepsize. Applied to (1), iterates are given by

$$x_{k+1} = x_k - t_k \xi_k,$$

where $x_k$ is the current point, $\xi_k$ is a subgradient of f at $x_k$ and $t_k > 0$ is a stepsize. Shor [24] states that the method with a constant stepsize does not converge, even for the simple function |x|. He proposes the use of a stepsize that satisfies

$$\sum_{k=0}^{\infty} t_k = \infty, \qquad t_k \to 0.$$

In practice, the most widely used stepsize is $t_k = \theta\, [f(x_k) - f^*] / \|\xi_k\|^2$, where $\theta \in (0, 2]$ and $f^*$ is the best estimate of the optimal value $f(x^*)$.
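A minimal sketch of the subgradient method with the stepsize $\theta\, [f(x_k) - f^*] / \|\xi_k\|^2$, assuming the optimal value $f^*$ is known. It is known here, since the hypothetical test function $|x_1| + 2|x_2|$ has minimum 0. Because the iteration is not monotone, the best point found so far is tracked. The function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def subgradient_method(f: Callable[[Sequence[float]], float],
                       subgrad: Callable[[Sequence[float]], List[float]],
                       x0: Sequence[float], f_star: float,
                       theta: float = 1.0, iters: int = 100) -> Tuple[List[float], float]:
    """x_{k+1} = x_k - t_k xi_k with t_k = theta (f(x_k) - f_star) / ||xi_k||^2.

    The method is not monotone, so the best point found so far is returned.
    """
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for _ in range(iters):
        fx, g = f(x), subgrad(x)
        gg = sum(gi * gi for gi in g)
        if fx <= f_star or gg == 0.0:  # target value reached, or 0 is a subgradient
            break
        t = theta * (fx - f_star) / gg
        x = [xi - t * gi for xi, gi in zip(x, g)]
        if f(x) < best_f:
            best_x, best_f = list(x), f(x)
    return best_x, best_f


def sign(z: float) -> float:
    return float((z > 0) - (z < 0))


def f(x: Sequence[float]) -> float:
    """Hypothetical test function with optimal value 0 at the origin."""
    return abs(x[0]) + 2.0 * abs(x[1])


def subgrad(x: Sequence[float]) -> List[float]:
    return [sign(x[0]), 2.0 * sign(x[1])]


x_best, f_best = subgradient_method(f, subgrad, [3.0, -2.0], f_star=0.0)
print(f_best < 1e-10)  # True
```

When $f^*$ is only an estimate, the same update is used with the estimate in place of the optimal value, as described above.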


Steepest Descent and ε-Subgradient Methods


Subgradient methods are not monotonic, as they do not guarantee an improvement of the minimized function at each iteration. Descent methods are designed to overcome this drawback. As an example, we discuss the steepest descent method, which chooses its search direction by solving

$$\min_{\|d\| \le 1} f'(x_k; d).$$

Using relation (3), the steepest descent direction at a point $x_k$ is given by

$$d_k = -\frac{\xi_k}{\|\xi_k\|}, \qquad \xi_k = \arg\min_{\xi \in \partial f(x_k)} \|\xi\|$$

(if $\xi_k = 0$, then $0 \in \partial f(x_k)$ and $x_k$ is optimal). The method proceeds iteratively, updating the iterates by

$$x_{k+1} = x_k + \alpha_k d_k$$

and choosing the steplength $\alpha_k$ so that $f(x_{k+1}) < f(x_k)$.
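When exactly two pieces are active, $\partial f(x_k)$ is a segment and the minimum-norm subgradient has a closed form, which gives a concrete version of the steepest descent direction above. This sketch is restricted to that two-piece case (the general case requires a small quadratic program), and the function names are hypothetical. On $f(x) = \max(x_1 + x_2, -x_1 + x_2)$ at the origin it returns the direction $(0, -1)$, a descent direction, whereas $-\xi$ for the single subgradient $\xi = (1, 1)$ is not.

```python
import math
from typing import List, Optional, Sequence


def min_norm_on_segment(g1: Sequence[float], g2: Sequence[float]) -> List[float]:
    """Minimum-norm point of conv{g1, g2} = {lam*g1 + (1 - lam)*g2 : 0 <= lam <= 1}."""
    diff = [a - b for a, b in zip(g1, g2)]
    dd = sum(z * z for z in diff)
    if dd == 0.0:
        lam = 1.0
    else:
        # minimizer of ||g2 + lam*(g1 - g2)||^2, clipped to [0, 1]
        lam = -sum(b * z for b, z in zip(g2, diff)) / dd
        lam = min(1.0, max(0.0, lam))
    return [lam * a + (1.0 - lam) * b for a, b in zip(g1, g2)]


def steepest_descent_direction(g1: Sequence[float],
                               g2: Sequence[float]) -> Optional[List[float]]:
    """d_k = -xi_k / ||xi_k|| when the subdifferential is conv{g1, g2}.

    Returns None if 0 belongs to the subdifferential (x_k is optimal).
    """
    xi = min_norm_on_segment(g1, g2)
    norm = math.sqrt(sum(z * z for z in xi))
    if norm == 0.0:
        return None
    return [-z / norm for z in xi]


# Active gradients (1, 1) and (-1, 1) at the origin
print(steepest_descent_direction([1.0, 1.0], [-1.0, 1.0]))  # direction (0, -1)
```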

The main difficulty with the steepest descent method resides in the calculation of the direction $d_k$, which requires knowledge of the whole subdifferential $\partial f(x_k)$. To overcome that, ε-subgradient methods compute an approximate steepest descent direction by searching through subgradients of neighboring points, through the use of the ε-subdifferential set

$$\partial_\epsilon f(x_0) = \{ \xi : f(x_0) + \xi^T (x - x_0) - \epsilon \le f(x),\ \forall x \}.$$

Details of the method can be found in [5].
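For $f(x) = |x|$ the ε-subdifferential can be written down explicitly: for $x_0 > 0$, $\xi \in \partial_\epsilon f(x_0)$ if and only if $|\xi| \le 1$ and $x_0 - \xi x_0 \le \epsilon$, and symmetrically for $x_0 < 0$. The sketch below, with a hypothetical function name, returns this interval. It shows how the set grows from the gradient to all of $[-1, 1]$ as $x_0$ gets close to the kink relative to $\epsilon$, which is what lets ε-subgradient methods anticipate nearby nondifferentiabilities.

```python
from typing import Tuple


def eps_subdifferential_abs(x0: float, eps: float) -> Tuple[float, float]:
    """Interval [lo, hi] equal to the eps-subdifferential of f(x) = |x| at x0.

    xi belongs to it iff |xi| <= 1 and |x0| - xi * x0 <= eps.
    """
    if x0 > 0:
        return max(-1.0, 1.0 - eps / x0), 1.0
    if x0 < 0:
        return -1.0, min(1.0, -1.0 + eps / -x0)
    return -1.0, 1.0


print(eps_subdifferential_abs(2.0, 1.0))  # (0.5, 1.0)
print(eps_subdifferential_abs(0.1, 1.0))  # (-1.0, 1.0): x0 is close to the kink
print(eps_subdifferential_abs(2.0, 0.0))  # (1.0, 1.0): eps = 0 gives the gradient
```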


Cutting Plane Methods 


J.E. Kelley [17] and W. Cheney and A.A. Goldstein [6] were the first to realize the potential of cutting plane methods for convex programming. Applied to (1), cutting plane algorithms use the subgradient inequality to approximate f by

$$f(x) \approx \max_{i \in I} \{ f(x_i) + \xi_i^T (x - x_i) \},$$

where $\xi_i$, $i \in I$, are subgradients of f at $x_i$, $i \in I$. Thus, (1) is replaced by

$$\min_x\ \max_{i \in I} \{ f(x_i) + \xi_i^T (x - x_i) \},$$

which is equivalent to

$$\min\ v \quad \text{s.t.}\quad f(x_i) + \xi_i^T (x - x_i) \le v, \quad \forall i \in I. \qquad (9)$$

Problem (9) is a linear program that is easier to deal with than the original problem. Note, however, that it is only an approximation of (1), which gets better as more constraints are added. Let us denote by $[MP_k]$ the relaxed master problem (9) with index set $I_k$.

By transforming (1) to (9), a nondifferentiable problem is replaced by a constrained problem having a large number of constraints. Cutting plane methods use only a subset of these constraints and generate the rest as needed. In fact, they solve a series of relaxed master problems $[MP_k]$ and stop when an optimal (or satisfactory) solution to (1) is reached.
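The following sketch applies Kelley's rule (the query point is a minimizer of the relaxed master problem) to a convex function of one variable on a box $[lo, hi]$. The box, the function names and the test function are assumptions made for illustration. In one dimension the master problem (9) can be solved without an LP solver: a convex piecewise linear function attains its minimum over an interval at an endpoint or at the intersection of two cuts, so it suffices to compare those candidates. The master value provides a lower bound and the best evaluated point an upper bound.

```python
import math
from itertools import combinations
from typing import Callable, List, Tuple

Cut = Tuple[float, float, float]  # (x_i, f(x_i), subgradient of f at x_i)


def model(cuts: List[Cut], x: float) -> float:
    """Cutting plane model max_i { f(x_i) + xi_i (x - x_i) } of f."""
    return max(fp + gp * (x - xp) for xp, fp, gp in cuts)


def solve_master(cuts: List[Cut], lo: float, hi: float) -> Tuple[float, float]:
    """Exact minimizer and value of the 1-D relaxed master problem on [lo, hi]."""
    candidates = [lo, hi]
    for (x1, f1, g1), (x2, f2, g2) in combinations(cuts, 2):
        if g1 != g2:  # intersection of two non-parallel cuts
            xc = ((f2 - g2 * x2) - (f1 - g1 * x1)) / (g1 - g2)
            if lo <= xc <= hi:
                candidates.append(xc)
    x = min(candidates, key=lambda z: model(cuts, z))
    return x, model(cuts, x)


def kelley(f: Callable[[float], float], subgrad: Callable[[float], float],
           lo: float, hi: float, x0: float,
           tol: float = 1e-8, max_iter: int = 100) -> Tuple[float, float, float]:
    """Kelley's cutting plane method; returns (best point, upper bound, lower bound)."""
    cuts: List[Cut] = []
    upper, lower = math.inf, -math.inf
    x_best, x = x0, x0
    for _ in range(max_iter):
        fx = f(x)
        cuts.append((x, fx, subgrad(x)))   # oracle call: new cut at the query point
        if fx < upper:
            upper, x_best = fx, x          # best objective value found so far
        x, v = solve_master(cuts, lo, hi)  # next query point: master minimizer
        lower = max(lower, v)              # master value bounds min f on [lo, hi]
        if upper - lower <= tol:
            break
    return x_best, upper, lower


def sign(z: float) -> float:
    return float((z > 0) - (z < 0))


def f(x: float) -> float:
    """Hypothetical test function with minimum value 0.5 at x = 1."""
    return abs(x - 1.0) + 0.5 * abs(x)


def df(x: float) -> float:
    """One subgradient of f."""
    return sign(x - 1.0) + 0.5 * sign(x)


x_best, upper, lower = kelley(f, df, -3.0, 3.0, x0=-3.0)
print(x_best, upper, lower)
```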

Various cutting plane methods have been proposed over the years. Each variant generates cuts at a different point, called the query point. Kelley's classical cutting plane method [17] chooses the minimum of the relaxed master problem $[MP_k]$ as the query point. Although it may work well for some problems, this method suffers from slow convergence [22]. The analytic center cutting plane method (ACCPM) [15,16], on the other hand, chooses the analytic center as its query point. Its calculation makes use of interior point concepts, and it has shown promising results for a number of applications [1,2]. Bundle methods [19,20] choose the query point by solving a quadratic program that contains a small number of cutting planes. The information (bundle of cutting planes) is updated regularly and kept moderately small.


Conclusion 


Nondifferentiable optimization tackles a class of problems that are intractable for classical optimization methods. Most of the theory is based on the notion of subgradients, and most of the work has been done for the convex case. It has an abundance of real-life applications, because nondifferentiability captures some of the inherent complexity of real-life problems. As in other disciplines, favoring an easily implemented and easily understood method does not necessarily lead to a good solution method. In NDO, this is the case of the subgradient method: although it is easy to implement, it converges slowly. More sophisticated methods, such as bundle methods or ACCPM, are more promising from a computational point of view but require more know-how about the method and about numerical linear algebra.

Cutting plane methods were proposed independently by J.E. Kelley [27] and W. Cheney and A.A. Goldstein [5] as a solution technique for constrained convex optimization problems. Although they may not compete with some of the efficient methods for smooth optimization, they are among the first and most fundamental solution approaches for nondifferentiable optimization (cf. » Nondifferentiable optimization).

The fact that they rely on polyhedral approximations of convex functions makes the technique suitable for nondifferentiable optimization. Together with subgradient optimization [38], it forms one of the two major solution approaches for nondifferentiable problems. Cutting plane algorithms designed over the years have become more sophisticated and show promising computational results and stable numerical properties.

In this article, we give an overview of cutting plane methods for nondifferentiable optimization problems. We start with a brief introduction to cutting planes. We then give a generic cutting plane algorithm that will be used to describe some of the main variants.


A Generic Cutting Plane Algorithm 


To describe the general cutting plane approach, we consider the following nondifferentiable problem

$$\text{[NDP]}\qquad \min\ f(x) \quad \text{s.t.}\quad g(x) \le 0,$$


where f and g are real-valued, continuous, nondifferentiable, convex functions. Convexity implies that there is at least one supporting hyperplane to f at every point $x_0$ of the domain, whose equation is given by

$$y = f(x_0) + \xi (x - x_0),$$

where $\xi$ is any element of the subdifferential $\partial f(x_0)$ of f at $x_0$. (For ease of notation we assume that subgradients are row vectors.) Recalling that a supporting hyperplane gives an underestimate of f, the subgradient inequality

$$f(x_0) + \xi (x - x_0) \le f(x) \qquad (1)$$

can be used to approximate f by the maximum of a finite set of affine functions. Therefore, given a set of points $x_i$, $i \in I$, and their corresponding subgradients $\xi_i^f$, f is tangentially approximated by

$$\hat f_I(x) = \max_{i \in I} \{ f(x_i) + \xi_i^f (x - x_i) \}. \qquad (2)$$

Inequality (1) implies that $\hat f_I(x) \le f(x)$ for any index set I. Larger sets give better approximations.


Nondifferentiable Optimization: Cutting Plane Methods 




The same can be said about g, leading to the polyhedral approximation

$$\hat g_J(x) = \max_{j \in J} \{ g(x_j) + \xi_j^g (x - x_j) \},$$

where $\xi_j^g$ are subgradients of g at a collection of points $x_j$ indexed by J. Thus, [NDP] can be approximated by

$$\min_x\ \max_{i \in I} \{ f(x_i) + \xi_i^f (x - x_i) \} \quad \text{s.t.}\quad \max_{j \in J} \{ g(x_j) + \xi_j^g (x - x_j) \} \le 0,$$

which is equivalent to

$$\begin{aligned} \min\ & v \\ \text{s.t.}\ & f(x_i) + \xi_i^f (x - x_i) \le v, && \forall i \in I, \\ & g(x_j) + \xi_j^g (x - x_j) \le 0, && \forall j \in J. \end{aligned} \qquad (3)$$

Problem (3) is a linear program that is easier to deal with than the original problem. Note, however, that it is only an approximation of [NDP], which gets better as more constraints are added. Let us denote by $[MP_k]$ the relaxed master problem (3) with index sets $I_k$ and $J_k$, whose optimal solution is denoted by $(x_k, v_k)$.

By transforming [NDP] into (3), nondifferentiability is eliminated at the cost of having a problem with a very large number of constraints. To get around that, cutting plane methods use only a subset of these constraints and generate the rest as needed. In fact, they handle a series of relaxed master problems $[MP_k]$ and stop when an optimal (or satisfactory) solution to [NDP] is reached. $[MP_k]$ is a relaxation of [NDP] since, by convexity of g, the set

$$\{ x : \max_{j \in J_k} \{ g(x_j) + \xi_j^g (x - x_j) \} \le 0 \}$$

contains $\{x : g(x) \le 0\}$, while, by convexity of f,

$$\max_{i \in I_k} \{ f(x_i) + \xi_i^f (x - x_i) \} \le f(x)$$

for all $x \in \{x : g(x) \le 0\}$. This implies that the optimal value $v_k$ is a lower bound on the optimal value $f(x^*)$. The relaxation gets tighter as more points are included in the index sets I and J. The hope is that termination occurs before the sets I and J become very large.

The idea described above can be put into the generic 
cutting plane algorithm below. 


In the course of the cutting plane algorithm, upper 
bounds are usually achieved through the objective eval- 
uation at a feasible point. Lower bounds are either given 
by the optimal solution of the relaxed master problem, 
or by evaluating the dual objective at a feasible dual solution. As a stopping criterion, the algorithm may be stopped when the bound on the duality gap, given by the difference between the best known upper and lower bounds, drops below a certain threshold.


Initialize:

1 Get an initial upper bound $\bar v$ and a lower bound $\underline v$ for the optimal value $f(x^*)$.

2 Get an initial relaxed master problem $[MP_0]$.

Iterate (k):

1 Get a query point $x_k$ and a corresponding lower bound $\underline v_k$.

2 If $x_k$ is infeasible for [NDP] ($g(x_k) > 0$), then the oracle of g generates a feasibility cut of type $g(x_k) + \xi_k^g (x - x_k) \le 0$.

3 If $x_k$ is feasible for [NDP] ($g(x_k) \le 0$), then the oracle of f generates an optimality cut of type $f(x_k) + \xi_k^f (x - x_k) \le v$ and an upper bound $\bar v_k = f(x_k)$.

4 Update the bounds:
a) if $x_k$ is feasible for [NDP], then $\bar v := \min\{\bar v, \bar v_k\}$;
b) $\underline v := \max\{\underline v, \underline v_k\}$.

5 Update the index sets:
a) if a feasibility cut is added, then $I_{k+1} := I_k$ and $J_{k+1} := J_k \cup \{k\}$;
b) if an optimality cut is added, then $I_{k+1} := I_k \cup \{k\}$ and $J_{k+1} := J_k$.

6 Either STOP or set $k := k + 1$.
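A sketch of the generic algorithm in one variable, using Kelley's query point (a minimizer of $[MP_k]$) and both types of cuts. The box $[lo, hi]$ plays the role of the box constraints needed at initialization, and all names, as well as the test instance, are hypothetical. With a single variable each feasibility cut is a half-line, so the master problem reduces to minimizing the piecewise linear model over an interval, which is done by comparing the interval endpoints with the intersections of pairs of optimality cuts. Before the first optimality cut the master gives no finite lower bound, and the midpoint of the interval is used as the query point.

```python
import math
from itertools import combinations
from typing import List, Tuple

Cut = Tuple[float, float, float]  # (x_j, function value at x_j, subgradient at x_j)


def solve_master(opt_cuts: List[Cut], feas_cuts: List[Cut],
                 lo: float, hi: float) -> Tuple[float, float]:
    """Minimize the f-model subject to the g-cuts and lo <= x <= hi (one variable)."""
    a, b = lo, hi
    for xj, gj, sj in feas_cuts:  # g(x_j) + s_j (x - x_j) <= 0 is a half-line
        if sj > 0:
            b = min(b, xj - gj / sj)
        elif sj < 0:
            a = max(a, xj - gj / sj)
        elif gj > 0:
            raise ValueError("[NDP] is infeasible")
    if a > b:
        raise ValueError("[NDP] is infeasible")
    if not opt_cuts:  # no optimality cut yet: no finite lower bound
        return 0.5 * (a + b), -math.inf

    def model(x: float) -> float:
        return max(fi + si * (x - xi) for xi, fi, si in opt_cuts)

    cands = [a, b]
    for (x1, f1, s1), (x2, f2, s2) in combinations(opt_cuts, 2):
        if s1 != s2:
            xc = ((f2 - s2 * x2) - (f1 - s1 * x1)) / (s1 - s2)
            if a <= xc <= b:
                cands.append(xc)
    x = min(cands, key=model)
    return x, model(x)


def cutting_plane(f, df, g, dg, lo, hi, x0, tol=1e-8, max_iter=200):
    opt_cuts: List[Cut] = []
    feas_cuts: List[Cut] = []
    upper, lower, x_best, x = math.inf, -math.inf, None, x0  # initialization
    for _ in range(max_iter):
        gx = g(x)
        if gx > 0:                    # step 2: feasibility cut
            feas_cuts.append((x, gx, dg(x)))
        else:                         # step 3: optimality cut
            fx = f(x)
            opt_cuts.append((x, fx, df(x)))
            if fx < upper:            # step 4a: upper bound
                upper, x_best = fx, x
        x, v = solve_master(opt_cuts, feas_cuts, lo, hi)  # step 1: next query point
        lower = max(lower, v)         # step 4b: lower bound
        if upper - lower <= tol:      # step 6: duality gap test
            break
    return x_best, upper, lower


# Hypothetical instance: min |x - 3| s.t. g(x) = x - 1 <= 0 on [-5, 5]; x* = 1, f* = 2.
x_best, upper, lower = cutting_plane(
    f=lambda x: abs(x - 3.0), df=lambda x: 1.0 if x > 3.0 else -1.0,
    g=lambda x: x - 1.0, dg=lambda x: 1.0,
    lo=-5.0, hi=5.0, x0=4.0)
print(x_best, upper, lower)  # 1.0 2.0 2.0
```

The index sets of step 5 appear here as the two lists `opt_cuts` and `feas_cuts`.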


At each iteration of the method, we can construct a polyhedral set, namely the localization set. Given an upper bound $\bar v_k$ on the optimal value, it is given by

$$F_k(\bar v_k) = \left\{ (x, v) : \begin{array}{ll} v \le \bar v_k, & \\ f(x_i) + \xi_i^f (x - x_i) \le v, & i \in I_k, \\ g(x_j) + \xi_j^g (x - x_j) \le 0, & j \in J_k \end{array} \right\}.$$

The localization set $F_k(\bar v_k)$ is a polyhedral subset of the feasible region of $[MP_k]$ that contains any optimal solution $(x^*, f(x^*))$ of [NDP]. To see that, note first that $f(x^*) \le \bar v_k$, as $\bar v_k$ is an upper bound on the optimal value. Furthermore, by the feasibility of $x^*$ and the subgradient inequality, $g(x_j) + \xi_j^g (x^* - x_j) \le g(x^*) \le 0$ for all $j \in J_k$, and, again by the subgradient inequality, $f(x_i) + \xi_i^f (x^* - x_i) \le f(x^*)$ for all $i \in I_k$.

Although every cutting plane method follows the above general scheme, different query point choices produce different algorithms. Before discussing a few of them, we would like to stress that some methods are explicitly based on the localization set and require that this set be bounded. This might not hold, in particular in the initialization phase, since no cut is yet present and no upper and lower bounds on the objective are known. Therefore, one has to introduce box constraints. This may not even suffice if the first generated cut is not an optimality one, because the localization set then remains unbounded in the v variable. One must then proceed with an auxiliary phase I problem.


Kelley’s Cutting Plane Method 


This is the classical cutting plane method that was originally described in [27] and is present in the original Dantzig-Wolfe decomposition [7,8] and Benders decomposition [4]. At iteration k, the method solves the relaxed master problem $[MP_k]$ and uses its solution $x_k$ to generate further cuts.

As $[MP_k]$ is a relaxation of [NDP], $v_k$ gives a lower bound on $f(x^*)$ which is monotonically nondecreasing. In addition, when $x_k$ is feasible, $f(x_k)$ gives an upper bound that, unfortunately, is not monotonically decreasing. The difference between the updated bounds gives an estimate of the duality gap and can be used as a practical stopping criterion. The exact optimal solution of [NDP] is detected when $x_k$ is feasible and $f(x_k) = v_k$, which is equivalent to having $0 \in \partial f(x_k)$.

This optimal point strategy assumes that the relaxation is a good approximation of the original problem, but this is only true once a large number of cuts has been generated. The method is globally convergent [47], but in practice it sometimes converges slowly [35].


Center of Gravity Method 


This method was first proposed in [32] as the first cutting plane method that generates query points at the center of the localization set. Choosing the center of gravity to generate query points seems to be the natural choice. In fact, cutting through the center of gravity of a localization set of volume V and dimension n produces two sets, each with a volume of at most

$$\left(1 - \left(\frac{n}{n+1}\right)^{n}\right) V \le \left(1 - \frac{1}{e}\right) V.$$

Therefore, after k iterations the volume of the localization set is at most $(1 - 1/e)^k$ times its initial volume, i.e., it shrinks at a constant rate. This rate of convergence is the best that can be obtained [46]. Unfortunately, finding a single center of gravity could be as difficult as solving the original problem.
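A quick numerical check of the bound above (a sketch; the helper name is hypothetical). The worst-case fraction $1 - (n/(n+1))^n$ of the volume that survives one central cut grows with the dimension but stays below $1 - 1/e \approx 0.632$, so a dimension-independent number of cuts guarantees a given volume reduction.

```python
import math

# Worst-case fraction of the volume kept after one cut through the center of
# gravity, 1 - (n / (n + 1))**n, compared with its dimension-free bound 1 - 1/e.
bound = 1.0 - 1.0 / math.e
for n in (1, 2, 5, 10, 100):
    q = 1.0 - (n / (n + 1)) ** n
    print(n, round(q, 4), q <= bound)


def cuts_needed(reduction: float) -> int:
    """Smallest k with (1 - 1/e)^k <= reduction."""
    return math.ceil(math.log(reduction) / math.log(bound))


print(cuts_needed(1e-6))  # cuts that suffice for a 10^6-fold volume reduction
```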

For some simple convex bodies such as ellipsoids, 
cubes or spheres, calculating the center of gravity is 
relatively easy. This idea is the motivation behind the 
largest inscribed sphere method of [11], the volumet- 
ric method of [43] and the analytic center cutting plane 
method (ACCPM) of [18,19,21]. 


Largest Inscribed Sphere Method 


The query point of this method is chosen as the center of the largest inscribed sphere in the localization set. Its calculation is based on work of G.L. Nemhauser and W.B. Widhelm [34], who showed that the solution of the simple linear program

$$\max\ \sigma \quad \text{s.t.}\quad a_i^T x + \|a_i\|\, \sigma \le b_i, \quad i \in I,$$

gives the radius σ and the center x of the largest inscribed sphere in the bounded polyhedron $\{x : a_i^T x \le b_i,\ i \in I\}$. The method is detailed in [11].


Volumetric Method 


P.M. Vaidya [43] proposed the volumetric center as a query point. It is the minimizer of the determinant of the Hessian matrix of the logarithmic barrier function. This choice is motivated by the observation that an inscribed ellipsoid can be constructed around every interior point of the localization set. The point that gives the maximum-volume inscribed ellipsoid is the minimizer of

$$\frac{1}{2} \log \det \left( \sum_i \frac{a_i a_i^T}{(b_i - a_i^T x)^2} \right),$$

where $a_i$ are the columns of the matrix defining the localization set $\{x : a_i^T x \le b_i\}$.
Bundle Methods


These methods were first proposed by C. Lemaréchal [30]. The method has been developed over the years [28,31], building upon the pioneering work [30].

The method adds a regularization term to the estimate of f and solves

$$\min_x\ \max_{i \in I_k} \{ f(x_i) + \xi_i^f (x - x_i) \} + \frac{\alpha_k}{2} \|x - x_{k-1}\|^2 \quad \text{s.t.}\quad \max_{j \in J_k} \{ g(x_j) + \xi_j^g (x - x_j) \} \le 0 \qquad (4)$$

to get the next query point $x_k$. The main idea is based on the fact that (2) is a good approximation of f when the next iterate is not far from the previous ones. Problem (4) is equivalent to the quadratic program

$$\begin{aligned} \min\ & v + \frac{\alpha_k}{2} \|x - x_{k-1}\|^2 \\ \text{s.t.}\ & f(x_i) + \xi_i^f (x - x_i) \le v, && i \in I_k, \\ & g(x_j) + \xi_j^g (x - x_j) \le 0, && j \in J_k. \end{aligned}$$

The nice feature of bundle methods is that they limit the number of hyperplanes that are used to approximate f. The set of subgradients (the bundle) is updated at each iteration and kept moderately small. As a result, (4) is solved very quickly.

Techniques to estimate the regularization parameter $\alpha_k$ are treated in [28], where three different choices are proposed.
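In one variable and without constraints g, the regularized master problem (4) can be solved exactly by enumeration, which makes the effect of $\alpha_k$ easy to see. The sketch below assumes a bundle of cuts for $f(x) = |x|$ and a stability center playing the role of $x_{k-1}$. The names are hypothetical. The objective is strictly convex, so its minimizer is either a point where a single cut is active and the derivative vanishes, or a kink where two cuts meet. A large $\alpha_k$ keeps the step short, while a small one lets it reach the kink of the model.

```python
from itertools import combinations
from typing import List, Tuple

Cut = Tuple[float, float, float]  # (x_i, f(x_i), subgradient at x_i)


def bundle_step(cuts: List[Cut], center: float, alpha: float) -> float:
    """Minimize max_i {f(x_i) + s_i (x - x_i)} + (alpha / 2)(x - center)^2 over R.

    Candidates: zeros of s_i + alpha (x - center) for each cut, and the
    intersections of pairs of cuts; the objective decides among them.
    """
    def phi(x: float) -> float:
        cut_model = max(fi + si * (x - xi) for xi, fi, si in cuts)
        return cut_model + 0.5 * alpha * (x - center) ** 2

    cands = [center - si / alpha for _, _, si in cuts]
    for (x1, f1, s1), (x2, f2, s2) in combinations(cuts, 2):
        if s1 != s2:
            cands.append(((f2 - s2 * x2) - (f1 - s1 * x1)) / (s1 - s2))
    return min(cands, key=phi)


# Bundle for the hypothetical f(x) = |x|: cuts at x = -1 and x = 1, stability center 2.
bundle = [(-1.0, 1.0, -1.0), (1.0, 1.0, 1.0)]
print(bundle_step(bundle, center=2.0, alpha=1.0))   # 1.0: strong regularization, short step
print(bundle_step(bundle, center=2.0, alpha=0.25))  # weak regularization: reaches the kink at 0
```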


Analytic Center Cutting Plane Method (ACCPM) 


To overcome the difficulty associated with the calculation of centers of polyhedra and still use a central point strategy, ACCPM uses concepts from the interior point literature that have proven to be highly efficient.

The query point for ACCPM is the analytic center of the localization set $F_k(\bar v_k)$. This notion of center was first introduced in [40,41,42] as the unique pair $(x_k, v_k)$ that minimizes

$$-\log(\bar v_k - v) - \sum_{i \in I_k} \log\big[v - f(x_i) - \xi_i^f (x - x_i)\big] - \sum_{j \in J_k} \log\big[-g(x_j) - \xi_j^g (x - x_j)\big].$$

This function is a potential function similar to the one used by N.K. Karmarkar [26] when presenting the first interior point algorithm. Its minimization is equivalent to the solution of a linear program by any interior point method. The primal projective algorithm (cf. also » Linear programming: Karmarkar projective algorithm) is favored due to its ability to deal with dual infeasibilities when new cuts are added. It is, in principle, a modified Newton method applied to the minimization of the potential function [16].

The Newton procedure for computing a central point $(x_k, v_k)$ of $F_k(\bar v_k)$ identifies, upon termination, a dual feasible solution to $[MP_k]$. Evaluating the dual objective at this point gives a lower bound on the optimal value of [NDP]. In addition, if $x_k$ is feasible for [NDP], i.e. $g(x_k) \le 0$, then $f(x_k)$ is an upper bound. Unfortunately, $x_k$ never coincides with the optimizer $x^*$, as it can never be a vertex, so the stopping criterion can only be based on the difference between the upper and lower bounds. The algorithm stops when that gap is sufficiently small.

The convergence analysis of the method is done 
in [20] and [36]. ACCPM is readily implemented [25] 
and has shown promising results in solving different 
large scale problems [2,3,17,33]. The method was mod- 
ified to use weighted analytic centers as a query point 
[22], to add multiple cuts at once [24], to use a primal- 
dual interior point algorithm for the calculation of 
analytic centers [9] and to accommodate quadratic 
cuts [10]. 


Concluding Remarks 


Cutting plane methods are an effective solution approach for nondifferentiable optimization. This has proven to be true since advanced methods such as bundle and analytic center methods were designed. Not only do they make use of recent advances in optimization theory, they have also resulted in efficient computer implementations.

Manipulating the set of cuts is an important enhancement factor for cutting plane methods. When adding cuts, introducing a whole set at once is more effective than introducing single cuts, as it allows the fast accumulation of information about the problem. This can be done for certain classes of problems that allow disaggregation, such as Dantzig-Wolfe and Benders decompositions with primal and dual block diagonal matrices, respectively.

The second improvement concerns the number of cuts kept in the relaxed master problem. Keeping all cuts may lead to a better approximation, but it also requires huge amounts of storage space and solution time. Thus, well-defined cut-dropping strategies can considerably improve the performance of the method.

A final remark concerns the use of heuristics with 
cutting planes. They can be used to initialize the 
method so that sufficient cutting planes are obtained to 
start the method, or take over the search for optimal so- 
lutions when the method stalls. 


See also 


> Dini and Hadamard Derivatives in Optimization 

> Global Optimization: Envelope Representation 

> Nondifferentiable Optimization 

> Nondifferentiable Optimization: Minimax Problems 

> Nondifferentiable Optimization: Newton Method 

> Nondifferentiable Optimization: Parametric 
Programming 

> Nondifferentiable Optimization: Relaxation 
Methods 

> Nondifferentiable Optimization: Subgradient 
Optimization Methods 


References 


1. Atkinson DS, Vaidya PM (1995) A cutting plane algorithm 
that uses analytic centers. Math Program B 69:1-43 

2. Bahn O, Goffin JL, Vial JP, du Merle O (1994) Implementation and behavior of an interior point cutting plane algorithm for convex programming: An application to geometric programming. Discrete Appl Math 49:3-23

3. Bahn O, du Merle O, Goffin JL, Vial JP (1995) A cutting plane method from analytic centers for stochastic programming. Math Program B 69:45-73

4. Benders JF (1962) Partitioning procedures for solv- 
ing mixed-variables programming problems. Numerische 
Math 4:238-252 

5. Cheney W, Goldstein AA (1959) Newton’s method for 
convex programming and Chebyshev approximation. Nu- 
merische Math 1-5:253-268 

6. Clarke FH (1989) Optimization and nonsmooth analysis. 
Les Publ. CRM, Montreal 

7. Dantzig GB, Wolfe P (1960) Decomposition principle for lin- 
ear programs. Oper Res 8:101-111 

8. Dantzig GB, Wolfe P (1961) The decomposition algorithm 
for linear programming. Econometrica 29(4):767-778 


9. Denault M, Goffin J-L (1997) On a primal-dual analytic 
center cutting plane method for variational inequalities. 
GERAD Techn Report G-97-56 

10. Denault M, Goffin J-L (1998) Variational inequalities with 
quadratic cuts. GERAD Techn Report G-98-69 

11. Elzinga J, Moore TG (1975) A central cutting plane algo- 
rithm for the convex programming problem. Math Pro- 
gram 8:134-145 

12. Fiacco AV, McCormick GP (1968) Nonlinear programming: 
Sequential unconstrained minimization techniques. Wiley, 
New York. Reprint: SIAM Classics App! Math, vol 4, 1990 

13. Fisher ML (1981) The Lagrangian relaxation method for 
solving integer programming problems. Managem Sci 
27:1-18 

14. Geoffrion AM (1972) Generalized Benders decomposition. 
J Optim Th Appl 10:237-260 

15. Geoffrion AM (1974) Lagrangean relaxation for integer 
programming. Math Program Stud 2:82-114 

16. de Ghellinck G, Vial JP (1986) A polynomial Newton 
method for linear programming. Algorithmica 1:425-453 

17. Goffin J-L, Gondzio J, Sarkissian R, Vial J-P (1997) Solving 
nonlinear multicommodity flow problem by the analytic 
centre cutting plane method. Math Program B 76:131-154 

18. Goffin J-L, Haurie A, Vial J-P (1992) Decomposition and 
nondifferentiable optimization with the projective algo- 
rithm. Managem Sci 38(2):284-302 

19. Goffin J-L, Haurie A, Vial J-P, Zhu DL (1993) Using central 
prices in the decomposition of linear programs. Europ J 
Oper Res 64:393-409 

20. Goffin J-L, Luo Z-Q, Ye Y (1996) Complexity analysis of an 
interior cutting plane method for convex feasibility prob- 
lems. SIAM J Optim 6:638-652 

21. Goffin J-L, Vial J-P (1990) Cutting planes and column gen- 
eration techniques with the projective algorithm. J Optim 
Th Appl 65:409-429 

22. Goffin J-L, Vial J-P (1993) On the computation of weighted 
analytic centers and dual ellipsoids with the projective al- 
gorithm. Math Program 60:81-92 

23. Goffin J-L, Vial J-P (1998) Interior point methods for nondif- 
ferentiable optimization. In: Kishka P, Lorenz HW, Derigs U, 
Domschke W, Kleinschmidt P, Moehring R (eds) Operations 
Research Proc. 1997. Springer, Berlin, pp 35-49 

24. Goffin J-L, Vial J-P (1998) Multiple cuts in the analytic cen- 
ter cutting plane method. HEC/Logilab Techn Report 98.10 
and G-98-26, Dept. Management Stud Univ Geneva 

25. Gondzio J, du Merle O, Sarkissian R, Vial JP (1994) ACCPM- 
A library for convex optimization based on analytic centre 
cutting plane method. Europ J Oper Res 94:206-211 

26. Karmarkar NK (1984) A new polynomial-time algorithm for 
linear programming. Combinatorica 4:373-395 

27. Kelley JE (1960) The cutting plane method for solving con- 
vex programs. J SIAM 8:703-712 

28. Kiwiel KC (1990) Proximity control in Bundle methods 
for convex nondifferentiable optimization. Math Program 
46:105-122 




29. Lasdon L (1970) Optimization theory for large scale systems. MacMillan, New York

30. Lemaréchal C (1978) Bundle methods in nonsmooth optimization. In: Lemaréchal C, Mifflin R (eds) Nonsmooth Optimization, Proc. IIASA Workshop, March 28-April 8 1977. Pergamon, Oxford

31. Lemaréchal C, Nemirovskii A, Nesterov Y (1995) New variants of bundle methods. Nonsmooth Optim, Math Program 69:111-147

32. Levin AY (1965) On an algorithm for the minimization of convex functions over convex functions. Soviet Math Dokl 6:286-290

33. du Merle O, Hansen P, Jaumard B, Mladenovic N (1997) An interior point algorithm for minimum sum of squares clustering. GERAD Techn Report G97-53

34. Nemhauser GL, Widhelm WB (1971) A modified linear program for columnar methods in mathematical programming. Oper Res 19:1051-1060

35. Nemirovskii AS, Yudin DB (1983) Problem complexity and method efficiency in optimization. Wiley, New York

36. Nesterov Y (1996) Cutting plane methods from analytic centers: efficiency estimates. Math Program B 69:149-176

37. Roos C, Terlaky T, Vial J-P (1997) Interior point methods: Theory and algorithms. Springer, Berlin

38. Shor NZ (1978) Subgradient methods: A survey of Soviet research. In: Lemaréchal C, Mifflin R (eds) Nonsmooth optimization: Proc. IIASA workshop, March 28-April 8 1977. Pergamon, Oxford

39. Shor NZ (1985) Minimization methods for non-differentiable functions. Springer, Berlin

40. Sonnevend G (1985) An analytical center for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming. Lecture Notes Control Inform Sci, vol 84. Springer, Berlin, pp 866-876

41. Sonnevend G (1988) New algorithms in convex programming based on a notion of centre (for systems of analytic inequalities) and on rational extrapolation. In: Hoffmann KH, Hiriart-Urruty JB, Lemaréchal C, Zowe J (eds) Trends in Mathematical Optimization; Proc. 4th French-German Conf. Optimization, Irsee, April 1986. Internat Ser Numer Math. Birkhäuser, Basel, pp 311-327

42. Sonnevend G (1989) Applications of the notion of analytic center in approximation (estimation) problem. J Comput Appl Math 28:349-358

43. Vaidya P (1996) A new algorithm for minimizing convex functions over convex sets. Math Program 73:291-341

44. Wolfe P (1975) A method of conjugate subgradients for minimizing nondifferentiable functions. Math Program Stud 3:145-173

45. Ye Y (1992) A potential reduction algorithm allowing column generation. SIAM J Optim 2:7-20

46. Ye Y (1997) Interior point algorithms: Theory and analysis. Wiley, New York

47. Zangwill WI (1969) Nonlinear programming: A unified approach. Prentice-Hall, Englewood Cliffs


Nondifferentiable Optimization: 
Minimax Problems 


VLADIMIR F. DEMYANOV 
St. Petersburg State University, St. Petersburg, Russia 


MSC2000: 90C30, 65K05 


Article Outline 


Keywords 
Max-Type Functions 


Algorithms for Unconstrained Minimization 
Method of Steepest Descent 
Hypodifferential Descent 
Newton-Type Method 


A Convex Max-Function 
Extremal Basis Method 


Constrained Minimax Problems 
Method of Hypodifferential Descent 
The Kelley Method 

See also 

References 


Keywords 


Minimax problem; Nondifferentiable optimization; 
Inf-stationary point; Extremal basis method; 
Hypodifferential; Subdifferential 


Max-Type Functions 


Let $S \subset \mathbb{R}^n$ be an open set. A standard minimax problem (MMP) is the problem of minimizing a function

$$f(x) = \max_{y \in G} \varphi(x, y), \tag{1}$$

where $G$ is a compact set of some space $Y$, and the function $\varphi: S \times G \to \mathbb{R}$ is continuous jointly in $[x, y]$ on $S \times G$ and continuously differentiable in $x$ (the function $\varphi'_x(x, y)$ is continuous in $[x, y]$ on $S \times G$). The function $f$ is, in general, nonsmooth even though $\varphi$ is smooth in $x$.


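Remark 1 The function

$$f(x) = \max_{y \in G} |\varphi(x, y)|$$

can be rewritten in the form (1), since

$$f(x) = \max_{y \in G} \max\{\varphi(x, y), -\varphi(x, y)\} = \max_{\tilde y \in \tilde G} \tilde\varphi(x, \tilde y),$$

where

$$\tilde G = \{[1, y] : y \in G\} \cup \{[-1, y] : y \in G\} \subset \mathbb{R} \times Y = \tilde Y,$$

$$\tilde\varphi(x, \tilde y) = \begin{cases} \varphi(x, y), & \tilde y = [1, y], \\ -\varphi(x, y), & \tilde y = [-1, y]. \end{cases}$$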


The first serious study of minimax problems was performed by P.L. Chebyshev, who may be considered the godfather of nonsmooth analysis. He treated the problem of minimizing the function


$$f(x) = \max_{y \in [a, b]} |\Psi(y) - P(x, y)|, \tag{2}$$

where $x = (x_0, \dots, x_{n-1}) \in \mathbb{R}^n$ and $P(x, y) = \sum_{i=0}^{n-1} x_i y^i$. The function $P(x, y)$ is a polynomial (in $y$). If $x^* \in \mathbb{R}^n$ and

$$f(x^*) = \min_{x \in \mathbb{R}^n} f(x),$$

then $P(x^*, y)$ is called the polynomial of best approximation (for the function $\Psi(y)$). If $\Psi(y) = y^n$ and $[a, b] = [-1, 1]$, then

$$P^*(y) = y^n - P(x^*, y) = \frac{1}{2^{n-1}} T_n(y),$$

where $T_n(y)$ is the famous Chebyshev polynomial:

$$T_0(y) = 1, \quad T_1(y) = y, \quad T_{m+1}(y) = 2y\,T_m(y) - T_{m-1}(y), \quad m \ge 1.$$

On the interval $[-1, 1]$ the relation $T_n(y) = \cos(n \arccos y)$ holds.

Let $\varphi(x, y) = \Psi(y) - P(x, y)$. The following condition holds (see [4,9]): for a point $x^* \in \mathbb{R}^n$ to be a minimizer of $f$ (see (2)) it is necessary and sufficient that $n + 1$ points $y_0, \dots, y_n$ exist such that

$$y_i \in [a, b], \quad i = 0, \dots, n; \qquad y_0 < \dots < y_n; \tag{3}$$

$$\varphi(x^*, y_{i+1}) = -\varphi(x^*, y_i), \quad i = 0, \dots, n-1; \tag{4}$$

$$|\varphi(x^*, y_i)| = f(x^*), \quad i = 0, \dots, n. \tag{5}$$

The set $\{y_0, \dots, y_n\}$ satisfying (3)-(5) is called a Chebyshev alternation (or alternance).
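The Chebyshev case can be checked numerically. The following Python sketch (function names are ours) evaluates $T_n$ by the recurrence above and verifies that the error $y^n - P(x^*, y) = T_n(y)/2^{n-1}$ equioscillates at the points $y_i = \cos(\pi(n-i)/n)$, $i = 0, \dots, n$, so these points form a Chebyshev alternation in the sense of (3)-(5). The choice of these particular points is an assumption of the sketch (they are the extrema of $T_n$ on $[-1, 1]$), not something named in the text.

```python
import math


def chebyshev_T(n, y):
    """T_n(y) via T_0 = 1, T_1 = y, T_{m+1} = 2 y T_m - T_{m-1}."""
    if n == 0:
        return 1.0
    t_prev, t = 1.0, y
    for _ in range(1, n):
        t_prev, t = t, 2.0 * y * t - t_prev
    return t


def best_error(n, y):
    """phi(x*, y) = y^n - P(x*, y) = T_n(y) / 2^(n-1) on [-1, 1] (n >= 1)."""
    return chebyshev_T(n, y) / 2 ** (n - 1)


def alternation_points(n):
    """y_i = cos(pi (n - i) / n), i = 0..n: increasing points in [-1, 1] (n >= 1)."""
    return [math.cos(math.pi * (n - i) / n) for i in range(n + 1)]


n = 5
points = alternation_points(n)
values = [best_error(n, y) for y in points]
print([round(v, 6) for v in values])  # signs alternate, |value| = 1/16
```

Since $|T_n| \le 1$ on $[-1, 1]$, the value $1/2^{n-1}$ attained at the $n + 1$ points is also the maximum error $f(x^*)$, as required by (5).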
The following properties of a max-function will be 
used in the sequel (see [1,4,5]). 


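1) Let a function $f$ be described by (1). Then it is directionally differentiable at any point $x \in S$ and

$$f'(x, g) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha g) - f(x)}{\alpha} = \max_{v \in \partial f(x)} \langle v, g \rangle, \quad g \in \mathbb{R}^n, \tag{6}$$

where

$$\partial f(x) = \operatorname{co}\{\varphi'_x(x, y) : y \in R(x)\}, \qquad R(x) = \{y \in G : \varphi(x, y) = f(x)\}.$$

Relation (6) means that $f$ is subdifferentiable; $\partial f(x)$ is the subdifferential of $f$ at $x$ (it is a convex compact set). The subdifferential mapping $\partial f$ is not, in general, Hausdorff continuous.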

2) For a point $x^* \in S$ to be a (local or global) minimizer of $f$ on $S$ it is necessary that

$$0_n \in \partial f(x^*). \tag{7}$$

Here $0_n = (0, \dots, 0) \in \mathbb{R}^n$. A point $x^* \in S$ satisfying (7) is called an inf-stationary point of $f$.


Remark 2 Note that for a point $x^{**} \in S$ to be a (local or global) maximizer of $f$ on $S$ it is necessary that

$$\partial f(x^{**}) = \{0_n\}. \tag{8}$$

If the function $f$ is convex (this is the case, e.g., if $\varphi$ is convex in $x$ for any $y \in G$), then condition (7) is also sufficient for $x^*$ to be a global minimizer of $f$ on an open set $S$. Therefore condition (8) implies that a convex function $f$ does not attain its maximal value on an open convex set $S$ unless it is constant on $S$.


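3) If $x_0$ is not inf-stationary, then the direction

$$g(x_0) = -\frac{v_0}{\|v_0\|}, \tag{9}$$

where

$$\|v_0\| = \min_{v \in \partial f(x_0)} \|v\|,$$

is the steepest descent direction of $f$ at $x_0$, i.e.

$$f'(x_0, g(x_0)) = \min_{\|g\| = 1} f'(x_0, g).$$

4) The following expansion holds:

$$f(x + \Delta) = f(x) + \max_{y \in G} \left[\varphi(x, y) - f(x) + \langle \varphi'_x(x, y), \Delta \rangle\right] + o_x(\Delta),$$

where $o_x(\Delta)/\|\Delta\| \to 0$ as $\|\Delta\| \to 0$.

This expression can be rewritten as

$$f(x + \Delta) = f(x) + \max_{[a, v] \in \underline{d}f(x)} \left[a + \langle v, \Delta \rangle\right] + o_x(\Delta), \tag{10}$$

where

$$\underline{d}f(x) = \operatorname{co}\left\{[a, v] \in \mathbb{R} \times \mathbb{R}^n : a = \varphi(x, y) - f(x),\ v = \varphi'_x(x, y),\ y \in G\right\} \tag{11}$$

is the hypodifferential of $f$ at $x$. Note that $a \le 0$ and $\max_{[a, v] \in \underline{d}f(x)} a = 0$. The mapping $\underline{d}f$ is Hausdorff continuous on $S$.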

5) Necessary condition (7) is equivalent to

$$0_{n+1} \in \underline{d}f(x^*). \tag{12}$$
6) If $x_0$ is not inf-stationary, then let us find

$$\min_{z \in \underline{d}f(x_0)} \|z\| = \|z_0\|,$$

where $z_0 = [a_0, v_0]$. In this case $v_0 \ne 0_n$. The direction

$$\tilde g(x_0) = -\frac{v_0}{\|v_0\|} \tag{13}$$

is a descent (not necessarily a steepest descent) direction of $f$ at $x_0$.

The vector function $\tilde g(x)$ (see (13)) is continuous.
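If the function $\varphi$ in (1) is twice continuously differentiable in $x$, then the following relation holds:

$$f(x + \Delta) = f(x) + \max_{y \in G} \left[\varphi(x, y) - f(x) + \langle \varphi'_x(x, y), \Delta \rangle + \tfrac{1}{2}\langle \varphi''_{xx}(x, y)\Delta, \Delta \rangle\right] + o_x(\|\Delta\|^2), \tag{14}$$

where $o_x(\|\Delta\|^2)/\|\Delta\|^2 \to 0$ as $\|\Delta\| \to 0$.

Relation (14) can be rewritten as

$$f(x + \Delta) = f(x) + \max_{[a, v, A] \in \underline{d}^2 f(x)} \left[a + \langle v, \Delta \rangle + \tfrac{1}{2}\langle A\Delta, \Delta \rangle\right] + o_x(\|\Delta\|^2), \tag{15}$$

where

$$\underline{d}^2 f(x) = \operatorname{co}\left\{[a, v, A] \in \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^{n \times n} : a = \varphi(x, y) - f(x),\ v = \varphi'_x(x, y),\ A = \varphi''_{xx}(x, y),\ y \in G\right\}. \tag{16}$$

Here $\mathbb{R}^{n \times n}$ is the space of real $(n \times n)$-matrices. The set $\underline{d}^2 f(x)$ is called the second hypodifferential of $f$ at $x$. It is closed, bounded and convex. The mapping $\underline{d}^2 f$ is Hausdorff continuous on $S$. In this case $f$ is twice continuously hypodifferentiable.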
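Properties 1), 3), 4) and 6) become directly computable when $G$ is finite. The following Python sketch assumes a hypothetical two-piece example $G = \{1, 2\}$ with $\varphi(x, 1) = (x_1 - 1)^2 + x_2^2$ and $\varphi(x, 2) = (x_1 + 1)^2 + x_2^2$; it forms the active gradients for (6), the generators $[a, v]$ of the hypodifferential (11), and the directions (9) and (13). The minimum-norm points are computed approximately by a Frank-Wolfe (Gilbert-type) iteration, which is our choice of inner solver, not one prescribed by the article.

```python
import math

# Hypothetical example with a finite G = {1, 2}; x[0], x[1] stand for x_1, x_2.
def phi(x):
    """Values phi(x, y) for y in G."""
    return [(x[0] - 1) ** 2 + x[1] ** 2, (x[0] + 1) ** 2 + x[1] ** 2]


def phi_grad(x):
    """Gradients phi'_x(x, y) for y in G."""
    return [[2 * (x[0] - 1), 2 * x[1]], [2 * (x[0] + 1), 2 * x[1]]]


def f(x):
    """The max-function (1)."""
    return max(phi(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_point(points, iters=1000, tol=1e-14):
    """Approximate the minimum-norm point of co(points) by Frank-Wolfe with
    exact line search (exact after one step when co(points) is a segment)."""
    z = list(points[0])
    for _ in range(iters):
        p = min(points, key=lambda q: dot(z, q))  # vertex minimizing <z, q>
        d = [pi - zi for pi, zi in zip(p, z)]
        gap, dd = -dot(z, d), dot(d, d)            # gap = <z, z - p> >= 0
        if gap <= tol or dd == 0.0:
            break
        t = min(1.0, gap / dd)                     # exact line search on [0, 1]
        z = [zi + t * di for zi, di in zip(z, d)]
    return z


def active_gradients(x, tol=1e-12):
    """Gradients for y in R(x); their convex hull is the subdifferential in (6)."""
    fx = f(x)
    return [g for val, g in zip(phi(x), phi_grad(x)) if fx - val <= tol]


def directional_derivative(x, g):
    """f'(x, g) = max of <v, g> over the subdifferential, formula (6)."""
    return max(dot(v, g) for v in active_gradients(x))


def descent_from(v):
    """Return (-v/||v||, ||v||); a zero v means the point is inf-stationary."""
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("zero direction: the point is inf-stationary")
    return [-c / n for c in v], n


def steepest_descent_direction(x):
    """Direction (9): v0 is the min-norm element of the subdifferential."""
    return descent_from(min_norm_point(active_gradients(x)))


def hypodifferential(x):
    """Generators [a, v] = [phi(x, y) - f(x), phi'_x(x, y)] of (11)."""
    fx = f(x)
    return [[val - fx] + g for val, g in zip(phi(x), phi_grad(x))]


def hypodifferential_direction(x):
    """Direction (13): z0 = [a0, v0] is the min-norm point of the hypodifferential."""
    z0 = min_norm_point(hypodifferential(x))
    return descent_from(z0[1:])
```

At $x = (0, 1)$ both pieces are active, $\partial f(x)$ is the segment between $(-2, 2)$ and $(2, 2)$, and the sketch returns $v_0 = (0, 2)$, $g = (0, -1)$ with $f'(x, g) = -\|v_0\|$. At $x = (0.1, 1)$ only the second piece is active, yet the hypodifferential direction still uses the first piece through its (negative) $a$-value, which is what makes (13) continuous in $x$.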


Algorithms for Unconstrained Minimization 


Assume that $S = \mathbb{R}^n$ in (1). Then the problem of minimizing $f$ is an unconstrained minimax problem. There are many numerical methods for solving this problem based on the properties of max-functions.


Method of Steepest Descent 


Let $x_0 \in \mathbb{R}^n$ be arbitrary. Assume that $x_k$ has already been defined. If $0_n \in \partial f(x_k)$, then $x_k$ is an inf-stationary point and the process terminates. If $0_n \notin \partial f(x_k)$, then let us take $g_k = g(x_k)$ (see (9)) and find

$$\min_{\alpha \ge 0} f(x_k + \alpha g_k) = f(x_k + \alpha_k g_k). \tag{17}$$

Now put $x_{k+1} = x_k + \alpha_k g_k$. Continuing in the same manner, we construct a sequence $\{x_k\}$ such that

$$f(x_{k+1}) < f(x_k). \tag{18}$$

If this sequence $\{x_k\}$ is finite, then, by construction, the last point is an inf-stationary one.

Let the sequence $\{x_k\}$ be infinite. Assume that the set $\Omega(x_0) = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded (it is then compact, being closed by the continuity of $f$). Due to (18), $x_k \in \Omega(x_0)$ and, hence, there exist a point $x^* \in \Omega(x_0)$ and a subsequence $\{x_{k_s}\}$ such that $x_{k_s} \to x^*$. One may expect that $x^*$ is inf-stationary. However, in general, this is not the case, and the reason is the discontinuity of the mapping $\partial f$.

To ensure convergence it is necessary to overcome the discontinuity of $\partial f$.

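Let us introduce the set $R_\varepsilon(x) = \{y \in G : f(x) - \varphi(x, y) \le \varepsilon\}$, where $\varepsilon > 0$. Put

$$L_\varepsilon(x) = \operatorname{co}\{\varphi'_x(x, y) : y \in R_\varepsilon(x)\}.$$

Find

$$\min_{v \in L_\varepsilon(x)} \|v\| = \|v_\varepsilon(x)\|.$$

If $\|v_\varepsilon(x)\| = 0$, the point $x$ is called an $\varepsilon$-stationary point. Choose any $\varepsilon > 0$ and construct the following method. Take any $x_0 \in \mathbb{R}^n$. Let $x_k$ have already been found. If $\|v_\varepsilon(x_k)\| = 0$, then $x_k$ is $\varepsilon$-stationary. If $\|v_\varepsilon(x_k)\| > 0$, then let us find

$$\min_{\alpha \ge 0} f(x_k - \alpha v_\varepsilon(x_k)) = f(x_k - \alpha_k v_\varepsilon(x_k))$$

and put $x_{k+1} = x_k - \alpha_k v_\varepsilon(x_k)$. Continuing analogously we get a sequence $\{x_k\}$.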


Proposition 3 If the set $\Omega(x_0)$ is bounded, then in a finite number of steps we arrive at a point $x_k$ such that

$$0_n \in L_{2\varepsilon}(x_k).$$

Thus, in a finite number of steps we shall find a $2\varepsilon$-stationary point.

Now it is not difficult to modify this method to get an inf-stationary point of $f$.

Choose any $\varepsilon_0 > 0$ and $x_0 \in \mathbb{R}^n$, and assume that $\Omega(x_0)$ is bounded. Applying the above method, in a finite number of steps we shall find a point $\bar x_0$ such that $0_n \in L_{2\varepsilon_0}(\bar x_0)$. Let $\bar x_k \in \Omega(x_0)$ be found such that $0_n \in L_{2\varepsilon_k}(\bar x_k)$, where $\varepsilon_k = 2^{-k}\varepsilon_0$. Take $\varepsilon_{k+1} = \varepsilon_k / 2$ and start the above method from $\bar x_k$. In a finite number of steps a point $\bar x_{k+1}$ will be found such that $0_n \in L_{2\varepsilon_{k+1}}(\bar x_{k+1})$. Clearly, $\bar x_{k+1} \in \Omega(x_0)$. The sequence $\{\bar x_k\}$ is bounded.


Proposition 4 Any limit point of the sequence $\{\bar x_k\}$ is an inf-stationary point of $f$.
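The $\varepsilon$-steepest descent scheme with halving $\varepsilon_k$ can be sketched in Python for a finite $G$. The sketch assumes hypothetical convex pieces (so the function along each ray is unimodal and golden-section search on a fixed bracket `[0, alpha_max]` can stand in for the exact line search), a numerical tolerance for the test $0_n \in L_{2\varepsilon}(x_k)$, and finite iteration caps in place of the exact finite termination guaranteed by Proposition 3; minimum-norm points are approximated by a Frank-Wolfe iteration. All names are ours.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_point(points, iters=1000, tol=1e-14):
    """Approximate the minimum-norm point of co(points) (Frank-Wolfe, exact line search)."""
    z = list(points[0])
    for _ in range(iters):
        p = min(points, key=lambda q: dot(z, q))
        d = [pi - zi for pi, zi in zip(p, z)]
        gap, dd = -dot(z, d), dot(d, d)
        if gap <= tol or dd == 0.0:
            break
        t = min(1.0, gap / dd)
        z = [zi + t * di for zi, di in zip(z, d)]
    return z


def golden_section(h, lo, hi, tol=1e-10):
    """Minimize a unimodal function h on [lo, hi] (stand-in for the exact line search)."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    hc, hd = h(c), h(d)
    while b - a > tol:
        if hc <= hd:
            b, d, hd = d, c, hc
            c = b - r * (b - a)
            hc = h(c)
        else:
            a, c, hc = c, d, hd
            d = a + r * (b - a)
            hd = h(d)
    return (a + b) / 2.0


def eps_steepest_descent(pieces, grads, x0, eps0=1.0, stages=30,
                         max_steps=100, alpha_max=2.0, tol=1e-12):
    """f(x) = max_i pieces[i](x); eps_k = 2^-k eps0 as in the modified method."""
    f = lambda z: max(p(z) for p in pieces)

    def v_eps(x, eps):
        vals = [p(x) for p in pieces]
        fx = max(vals)
        active = [g(x) for val, g in zip(vals, grads) if fx - val <= eps]  # R_eps(x)
        return min_norm_point(active)                                      # v_eps(x)

    x, eps = list(x0), eps0
    for _ in range(stages):
        for _ in range(max_steps):
            w = v_eps(x, 2.0 * eps)
            if math.sqrt(dot(w, w)) <= tol:   # 0_n in L_{2 eps}(x): stage finished
                break
            v = v_eps(x, eps)                  # nonzero, since L_eps(x) lies in L_{2 eps}(x)
            step = golden_section(
                lambda a: f([xi - a * vi for xi, vi in zip(x, v)]), 0.0, alpha_max)
            x = [xi - step * vi for xi, vi in zip(x, v)]
        eps /= 2.0                             # eps_{k+1} = eps_k / 2
    return x


# Hypothetical test problem: f(x) = max((x1 - 1)^2 + x2^2, (x1 + 1)^2 + x2^2), minimizer (0, 0).
PIECES = [lambda x: (x[0] - 1) ** 2 + x[1] ** 2, lambda x: (x[0] + 1) ** 2 + x[1] ** 2]
GRADS = [lambda x: [2 * (x[0] - 1), 2 * x[1]], lambda x: [2 * (x[0] + 1), 2 * x[1]]]
```

For example, `eps_steepest_descent(PIECES, GRADS, [2.0, 1.0])` first moves along the gradient of the single dominant piece to the kink $x_1 = 0$, after which both pieces are $\varepsilon$-active and the min-norm element points straight down the valley to $(0, 0)$.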


Hypodifferential Descent 


Another method is based on expansion (10). Take $x_0 \in \mathbb{R}^n$. Let $x_k$ have been found. Compute

$$\min_{z \in \underline{d}f(x_k)} \|z\| = \|z_k\|,$$

where $z_k = [a_k, v_k]$. If $\|z_k\| = 0$, then the point $x_k$ is inf-stationary and the process terminates. If $\|z_k\| > 0$, then $v_k \ne 0_n$; let us find

$$\min_{\alpha \ge 0} f(x_k - \alpha v_k) = f(x_k - \alpha_k v_k)$$

and put $x_{k+1} = x_k - \alpha_k v_k$.

If the sequence $\{x_k\}$ thus constructed is finite, then the last point is inf-stationary. If $\{x_k\}$ is infinite, then the following result holds:

Proposition 5 If the set $\Omega(x_0)$ is bounded, then any limit point of the sequence $\{x_k\}$ is inf-stationary.


Remark 6 The two algorithms described above are ‘conceptual’ (in the terminology of E. Polak). They are computationally effective when the set $G$ (see (1)) contains only a finite number of points. Various practical implementations of the above ideas can be found in [2,4,10,11,12].
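Following Remark 6, the hypodifferential descent method is easy to sketch when $G$ is finite. The Python sketch below assumes a hypothetical convex two-piece example, builds the generators $[a_i, v_i]$ of $\underline{d}f(x_k)$ from (11), approximates the minimum-norm point $z_k$ by a Frank-Wolfe iteration, and replaces the exact line search by golden-section search on a fixed bracket; these inner solvers, the bracket and the stopping tolerances are our assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_point(points, iters=1000, tol=1e-14):
    """Approximate the minimum-norm point of co(points) (Frank-Wolfe, exact line search)."""
    z = list(points[0])
    for _ in range(iters):
        p = min(points, key=lambda q: dot(z, q))
        d = [pi - zi for pi, zi in zip(p, z)]
        gap, dd = -dot(z, d), dot(d, d)
        if gap <= tol or dd == 0.0:
            break
        t = min(1.0, gap / dd)
        z = [zi + t * di for zi, di in zip(z, d)]
    return z


def golden_section(h, lo, hi, tol=1e-10):
    """Minimize a unimodal function h on [lo, hi] (stand-in for the exact line search)."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    hc, hd = h(c), h(d)
    while b - a > tol:
        if hc <= hd:
            b, d, hd = d, c, hc
            c = b - r * (b - a)
            hc = h(c)
        else:
            a, c, hc = c, d, hd
            d = a + r * (b - a)
            hd = h(d)
    return (a + b) / 2.0


def hypodifferential_descent(pieces, grads, x0, alpha_max=2.0, max_iter=100, tol=1e-10):
    """x_{k+1} = x_k - alpha_k v_k, where z_k = [a_k, v_k] is the min-norm point of df(x_k)."""
    f = lambda z: max(p(z) for p in pieces)
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        gens = [[p(x) - fx] + g(x) for p, g in zip(pieces, grads)]  # [a, v] as in (11)
        z = min_norm_point(gens)
        if math.sqrt(dot(z, z)) <= tol:        # 0_{n+1} in df(x_k), see (12)
            break
        v = z[1:]
        alpha = golden_section(
            lambda a: f([xi - a * vi for xi, vi in zip(x, v)]), 0.0, alpha_max)
        x_new = [xi - alpha * vi for xi, vi in zip(x, v)]
        if f(x_new) >= fx:                     # no numerical progress: stop
            break
        x = x_new
    return x


# Hypothetical test problem (convex pieces), minimizer (0, 0).
PIECES = [lambda x: (x[0] - 1) ** 2 + x[1] ** 2, lambda x: (x[0] + 1) ** 2 + x[1] ** 2]
GRADS = [lambda x: [2 * (x[0] - 1), 2 * x[1]], lambda x: [2 * (x[0] + 1), 2 * x[1]]]
```

Unlike the $\varepsilon$-method, no parameter $\varepsilon$ has to be tuned here: the values $a_i = \varphi_i(x) - f(x)$ automatically weight nearly active pieces, which is the continuity of $\underline{d}f$ at work.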


Newton-Type Method 


If the function $\varphi$ in (1) is twice continuously differentiable in $x$, one can employ expansions (14)-(15).

Take any $x_0 \in \mathbb{R}^n$. Let $x_k$ have already been defined. Find

$$\min_{\Delta \in \mathbb{R}^n} F_k(\Delta) = F_k(\Delta_k),$$

where

$$F_k(\Delta) = \max_{[a, v, A] \in \underline{d}^2 f(x_k)} \left[a + \langle v, \Delta \rangle + \tfrac{1}{2}\langle A\Delta, \Delta \rangle\right].$$

Now put $x_{k+1} = x_k + \Delta_k$.

Under some additional conditions (see [3]) the sequence $\{x_k\}$ thus constructed converges to (at least) a local minimizer of $f$, and the rate of convergence is quadratic.


A Convex Max-Function 
Extremal Basis Method 


Now let us consider the case where $f$ is described by (1) and the function $\varphi$ in (1) is strongly convex in $x$ with a constant $m > 0$ for every fixed $y \in G$, i.e.

$$\varphi(x + \Delta, y) \ge \varphi(x, y) + \langle \varphi'_x(x, y), \Delta \rangle + m\|\Delta\|^2, \quad \forall [x, y] \in \mathbb{R}^n \times G,\ \forall \Delta \in \mathbb{R}^n. \tag{19}$$

The function $f$ is then also strongly convex and has a unique minimizer.
Choose an arbitrary set of $n + 2$ points from $G$:

$$\tau_0 = \{y_{01}, \dots, y_{0,n+2}\}, \quad y_{0i} \in G, \quad i = 1, \dots, n+2.$$

The set $\tau_0$ will be called a basis.

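Assume that the points $x_0, \dots, x_{k-1}$ and the basis $\tau_k = \{y_{k1}, \dots, y_{k,n+2}\}$ have already been found. Let us define the function

$$f_k(x) = \max_{1 \le i \le n+2} \varphi(x, y_{ki})$$

and choose a point $x_k \in \mathbb{R}^n$ such that

$$f_k(x_k) = \min_{x \in \mathbb{R}^n} f_k(x). \tag{20}$$

If $f(x_k) = f_k(x_k)$, then $x_k$ is the minimizer of $f$ (since $f_k \le f$ everywhere), and the process terminates.

The minimization problem in (20) is simpler than that of minimizing $f$, since $f_k$ is the maximum of only $n + 2$ functions.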

Consider the case $f(x_k) > f_k(x_k)$. By the necessary and (in our convex case) sufficient condition (7), applied to $f_k$,

$$0_n \in L_k, \tag{21}$$

where $L_k = \operatorname{co} H_k$, $H_k = \{\varphi'_x(x_k, y_{ki}) : i \in R_k\}$ and $R_k = \{i \in \{1, \dots, n+2\} : \varphi(x_k, y_{ki}) = f_k(x_k)\}$. By the Carathéodory theorem [7] every point of $L_k$ can be represented as a convex combination of not more than $n + 1$ points of $H_k$. Therefore, there exists at least one index $i_k \in \{1, \dots, n+2\}$ such that either $i_k \notin R_k$ or the origin in (21) may be ‘constructed’ without the vector $\varphi'_x(x_k, y_{k i_k})$.
Let $\bar y_k \in G$ be such that

$$\varphi(x_k, \bar y_k) = f(x_k) = \max_{y \in G} \varphi(x_k, y).$$

Now let us construct a new basis

$$\tau_{k+1} = \{y_{k+1,1}, \dots, y_{k+1,n+2}\},$$

where

$$y_{k+1,i} = \begin{cases} y_{ki}, & i \ne i_k, \\ \bar y_k, & i = i_k. \end{cases}$$

The basis $\tau_{k+1}$ differs from $\tau_k$ by one point and also contains $n + 2$ points. For the basis $\tau_{k+1}$ we again define the function $f_{k+1}(x)$ and the point $x_{k+1}$.

As a result, a sequence $\{x_k\}$ is constructed. If this sequence is finite, its last point is the minimizer. If not, the following property holds.

Proposition 7 (See [6, Sect. III.10].) The sequence $\{x_k\}$ converges to the minimum point of $f$.


Remark 8 The extremal basis method can be extended (with necessary adjustments) to the case where $\varphi$ is just convex in $x$, not necessarily strongly convex.


Remark 9 If the function $\varphi$ in (1) is not convex, then condition (7) and the Carathéodory theorem produce the following properties (with $k \le n$):

1) There exist points $y_1, \dots, y_{k+1}$ such that $y_i \in Y$ for any $i = 1, \dots, k+1$, and a minimizer $x^*$ of $f$ is an inf-stationary point of the function

$$L(x) = \max_{1 \le i \le k+1} \varphi(x, y_i).$$

2) There exist points $y_1, \dots, y_{k+1}$ and coefficients $\alpha_1, \dots, \alpha_{k+1}$ such that

$$y_i \in Y, \quad \alpha_i \ge 0, \quad i = 1, \dots, k+1, \qquad \sum_{i=1}^{k+1} \alpha_i = 1,$$

and a minimizer $x^*$ is a stationary point of the smooth function

$$\sum_{i=1}^{k+1} \alpha_i \varphi(x, y_i).$$

These properties can be used to derive corresponding numerical algorithms.


Constrained Minimax Problems 


Let $f$ be defined by (1) on an open set $S \subset \mathbb{R}^n$, and let $\Omega \subset S$ be a closed set. The problem is to find

$$\min_{x \in \Omega} f(x) = f^*.$$

Take $x \in \Omega$. The set

$$\Gamma(x, \Omega) = \left\{g \in \mathbb{R}^n : \exists \{[\alpha_k, g_k]\} : [\alpha_k, g_k] \to [+0, g],\ x + \alpha_k g_k \in \Omega\ \forall k\right\}$$

is the Bouligand cone to $\Omega$ at $x$. It is nonempty and closed.

The function f is subdifferentiable (see (6)). 

The following necessary condition holds (see [4,5]):

For a point $x^* \in \Omega$ to be a minimizer of $f$ on $\Omega$ it is necessary that

$$f'(x^*, g) \ge 0, \quad \forall g \in \Gamma(x^*, \Omega). \tag{22}$$

The cone $\Gamma(x^*, \Omega)$ may be represented in the form

$$\Gamma(x^*, \Omega) = \bigcup_{i \in I} \Lambda_i, \tag{23}$$

where the $\Lambda_i$ are closed convex cones and $I$ is some index set (e.g., $\Gamma(x^*, \Omega)$ can always be given as the union of all its rays). Of course, for practical purposes we are interested in having as few elements in $I$ as possible (the best case is the one where $\Omega$ is convex; then $\Gamma(x^*, \Omega)$ is also convex).

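Taking into account (6), condition (22) can be rewritten in the equivalent form

$$\partial f(x^*) \cap \Lambda_i^+ \ne \emptyset, \quad \forall i \in I, \tag{24}$$

where $\Lambda_i^+$ is the cone conjugate to $\Lambda_i$:

$$\Lambda_i^+ = \{v \in \mathbb{R}^n : \langle v, g \rangle \ge 0,\ \forall g \in \Lambda_i\}.$$

A point $x^* \in \Omega$ satisfying (24) is called an inf-stationary point of $f$ on $\Omega$.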

Let $x_0 \in \Omega$ be not inf-stationary. For every $i \in I$ let us find

$$\min_{v \in \partial f(x_0),\ w \in \Lambda_i^+} \|v - w\| = \|v_i - w_i\| = \rho_i \tag{25}$$

and

$$\max_{i \in I} \rho_i = \rho_{i_0} = \|v_{i_0} - w_{i_0}\|. \tag{26}$$

Then the direction

$$g_{i_0}(x_0) = \frac{w_{i_0} - v_{i_0}}{\rho_{i_0}}$$

is a direction of steepest descent of the function $f$ at the point $x_0$ (on the set $\Omega$):

$$f'(x_0, g_{i_0}(x_0)) = \min_{g \in \Gamma(x_0, \Omega),\ \|g\| = 1} f'(x_0, g).$$

It may happen that there exist several steepest descent directions (s.d.d.). If $\Gamma(x_0, \Omega)$ is convex, then such a direction is unique.

Many numerical methods for minimizing $f$ on $\Omega$ employ condition (24) (see [2,6,10,11,12]).

Let us discuss in detail the case where $\Omega$ is described in the form

$$\Omega = \{x \in S : h(x) \le 0\}, \tag{27}$$

where

$$h(x) = \max_{z \in G_1} \psi(x, z),$$

$G_1$ is a compact set of some space $Z$, and $\psi: S \times G_1 \to \mathbb{R}$ is continuous jointly in $x$ and $z$ on $S \times G_1$ and continuously differentiable in $x$. Assume that $\Omega$ is nonempty. Note that $\Omega$ is closed.

The function $h$ is also subdifferentiable and

$$\partial h(x) = \operatorname{co}\{\psi'_x(x, z) : z \in Q(x)\}, \qquad Q(x) = \{z \in G_1 : \psi(x, z) = h(x)\}.$$

Note that if $h(x) = 0$ and $0_n \notin \partial h(x)$, then

$$\Gamma^+(x, \Omega) = \{v = \lambda w : \lambda \le 0,\ w \in \partial h(x)\}.$$

If $h(x) < 0$, then $\Gamma^+(x, \Omega) = \{0_n\}$.

The cone $\Gamma(x, \Omega)$ is convex and closed and, hence, there exists only one steepest descent direction.

The following necessary condition is true. Let $x^* \in \Omega$, $h(x^*) = 0$. For the point $x^*$ to be a minimizer of $f$ on $\Omega$ it is necessary that

$$0_n \in \operatorname{co}\{\partial f(x^*), \partial h(x^*)\} = L(x^*). \tag{28}$$

If $0_n \notin \partial h(x^*)$, then conditions (22) and (28) are equivalent. If $0_n \in \partial h(x^*)$, then (28) holds automatically but (22) does not.

If $0_n \notin L(x_0)$ and $h(x_0) = 0$, then the direction

$$g(x_0) = -\frac{v(x_0)}{\|v(x_0)\|}, \quad \text{where} \quad \|v(x_0)\| = \min_{v \in L(x_0)} \|v\|,$$

is a descent direction and an admissible direction, i.e. there exists $\alpha_0 > 0$ such that $x_0 + \alpha g(x_0) \in \Omega$ for all $\alpha \in [0, \alpha_0]$. Observe that the steepest descent direction is not necessarily admissible.
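A small Python sketch of this admissible descent direction, under assumed data: $f(x) = (x_1 - 2)^2 + (x_2 - 2)^2$ and $h(x) = \max\{x_1, x_2\} - 1$ (so $G_1$ is finite), at the boundary point $x_0 = (1, 0.5)$ where only the first piece of $h$ is active. The minimum-norm element of $L(x_0)$ is approximated by a Frank-Wolfe iteration; the example data and helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_point(points, iters=1000, tol=1e-14):
    """Approximate the minimum-norm point of co(points) (Frank-Wolfe, exact line search)."""
    z = list(points[0])
    for _ in range(iters):
        p = min(points, key=lambda q: dot(z, q))
        d = [pi - zi for pi, zi in zip(p, z)]
        gap, dd = -dot(z, d), dot(d, d)
        if gap <= tol or dd == 0.0:
            break
        t = min(1.0, gap / dd)
        z = [zi + t * di for zi, di in zip(z, d)]
    return z


# Hypothetical data: a single smooth piece for f, two linear pieces for h.
def f(x):
    return (x[0] - 2) ** 2 + (x[1] - 2) ** 2


def f_active_grads(x):
    return [[2 * (x[0] - 2), 2 * (x[1] - 2)]]


H_PIECES = [(lambda x: x[0] - 1, [1.0, 0.0]), (lambda x: x[1] - 1, [0.0, 1.0])]


def h(x):
    return max(p(x) for p, _ in H_PIECES)


def h_active_grads(x, tol=1e-12):
    """Gradients of the active pieces of h; their hull is the subdifferential of h."""
    hx = h(x)
    return [list(g) for p, g in H_PIECES if hx - p(x) <= tol]


def admissible_descent_direction(x0):
    """g(x0) = -v/||v||, v the min-norm element of L(x0) = co{df(x0), dh(x0)}; needs h(x0) = 0."""
    v = min_norm_point(f_active_grads(x0) + h_active_grads(x0))
    nv = math.sqrt(dot(v, v))
    if nv == 0.0:
        raise ValueError("0_n in L(x0): condition (28) holds at x0")
    return [-c / nv for c in v]
```

Here $L(x_0)$ is the segment between $(-2, -3)$ and $(1, 0)$, its minimum-norm element is $(0.5, -0.5)$, and the resulting direction $(-1, 1)/\sqrt{2}$ both decreases $f$ and keeps $h \le 0$ for small steps, whereas the unconstrained direction $-\nabla f/\|\nabla f\|$ immediately leaves $\Omega$.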

Note also that for any $\lambda > 0$ condition (28) is equivalent to

$$0_n \in \operatorname{co}\{\partial f(x^*), \lambda\,\partial h(x^*)\} = L_\lambda(x^*).$$


Method of Hypodifferential Descent 


The function $h$ (as well as $f$) is continuously hypodifferentiable with the hypodifferential (cf. (11))

$$\underline{d}h(x) = \operatorname{co}\left\{[a, v] : a = \psi(x, z) - h(x),\ v = \psi'_x(x, z),\ z \in G_1\right\}. \tag{29}$$


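Let

$$L(x) = \operatorname{co}\left\{\underline{d}f(x),\ \underline{d}h(x) + [h(x), 0_n]\right\}.$$

Proposition 10 For a point $x^* \in \Omega$ to be a minimizer of $f$ on $\Omega$ it is necessary that

$$0_{n+1} \in L(x^*). \tag{30}$$

A point $x^* \in \Omega$ satisfying (30) is an inf-stationary point of $f$ on $\Omega$.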
If $x \in \Omega$ is not inf-stationary, then

$$\rho(x) = \min_{z \in L(x)} \|z\| = \|\bar z(x)\| > 0.$$

Here

$$\bar z(x) = [\eta(x), z(x)], \quad \eta(x) \in \mathbb{R}, \quad z(x) \in \mathbb{R}^n, \quad z(x) \ne 0_n.$$

The direction $g(x) = -z(x)/\|z(x)\|$ is an admissible descent direction. The vector function $z(x)$ is continuous on $\Omega$.

Let us describe the following method (see [5, Chap. 5, Sect. 5]).

Take any $x_0 \in \Omega$. Let $x_k \in \Omega$ have already been defined. If $\rho(x_k) = 0$, then $x_k$ is inf-stationary. If $\rho(x_k) > 0$, then let us find

$$\min_{\alpha \ge 0,\ x_k - \alpha z(x_k) \in \Omega} f(x_k - \alpha z(x_k)) = f(x_k - \alpha_k z(x_k)).$$

Now put

$$x_{k+1} = x_k - \alpha_k z(x_k).$$

By construction $x_k \in \Omega$ for all $k$. If the sequence $\{x_k\}$ is finite, its last point is an inf-stationary one. If it is infinite, the following result holds:


Proposition 11 If the set

$$\{x \in \Omega : f(x) \le f(x_0)\}$$

is bounded, then any limit point of the sequence $\{x_k\}$ is inf-stationary.


Remark 12 The method described is ‘conceptual’. For its practical implementation it is necessary to avoid computing $\underline{d}f$ and $\underline{d}h$ by (11) and (29) and to take some smaller sets instead (this is possible since the hypodifferential mapping is not uniquely defined).


Remark 13 If both functions $\varphi$ and $\psi$ are convex in $x$, then the extremal basis method (given above) can be extended to minimizing $f$ on $\Omega$ (see [6]).


The Kelley Method 


Let us consider the problem of minimizing the function

$$f(x) = \max_{y \in G} \left[\varphi_1(x, y) + f_1(x)\right]$$

on a convex compact set $\Omega \subset \mathbb{R}^n$, where $\varphi_1: \Omega \times G \to \mathbb{R}$ is convex in $x$ on $\Omega$ for any $y \in G$ and continuous in $y$ on $G$, and $f_1: \Omega \to \mathbb{R}$ is continuous on $\Omega$. Then the following modification of the Kelley cutting plane method [8] can be used.

Choose any $x_0 \in \Omega$ and find $y_0 \in G$ such that $\varphi_1(x_0, y_0) + f_1(x_0) = f(x_0)$. Take any $v_0 \in \partial \varphi_1(x_0, y_0)$, where $\partial \varphi_1(x_0, y_0)$ is the subdifferential of the convex function $\varphi_1(x, y_0)$ at $x_0$. Put

$$B_0(x) = f_1(x) + \varphi_1(x_0, y_0) + \langle v_0, x - x_0 \rangle.$$

Let $x_k \in \Omega$ have already been defined. Find $y_k \in G$ such that $\varphi_1(x_k, y_k) + f_1(x_k) = f(x_k)$, take any $v_k \in \partial \varphi_1(x_k, y_k)$ and put

$$B_k(x) = f_1(x) + \varphi_1(x_k, y_k) + \langle v_k, x - x_k \rangle.$$

Let

$$f_k(x) = \max_{0 \le i \le k} B_i(x) = f_1(x) + \max_{0 \le i \le k} \left[\varphi_1(x_i, y_i) + \langle v_i, x - x_i \rangle\right].$$
Find

$$\min_{x \in \Omega} f_k(x) = f_k(x_k^*).$$

If $f_k(x_k^*) = f(x_k^*)$, then $x_k^*$ is a minimizer of $f$ on $\Omega$ and the process terminates. Otherwise, take $x_{k+1} = x_k^*$ and proceed as above. If the sequence $\{x_k\}$ thus constructed is finite, its last point is a minimizer. If not, the following statement holds.


Proposition 14 Any limit point of the sequence $\{x_k\}$ is a minimizer of $f$ on $\Omega$.
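For $n = 1$, $\Omega = [l, u]$ and a finite $G$, each step of this Kelley-type scheme can be carried out exactly. The Python sketch below additionally assumes that $f_1$ is linear, $f_1(x) = c x$ (a hypothetical simplification that keeps the model $f_k$ piecewise linear, so its minimum over $[l, u]$ is attained at an endpoint or at a crossing of two cuts); the example $\varphi_1(x, y) = (x - y)^2$ and all function names are illustrative.

```python
def minimize_pwl(cuts, lo, hi):
    """Minimize max_i (a_i + b_i x) over [lo, hi] exactly: the minimum of a convex
    piecewise-linear function is attained at an endpoint or where two pieces cross."""
    model = lambda x: max(a + b * x for a, b in cuts)
    candidates = [lo, hi]
    for i, (ai, bi) in enumerate(cuts):
        for aj, bj in cuts[i + 1:]:
            if bi != bj:
                x = (aj - ai) / (bi - bj)
                if lo <= x <= hi:
                    candidates.append(x)
    best = min(candidates, key=model)
    return best, model(best)


def kelley(phi1, dphi1, G, c, lo, hi, x0, tol=1e-9, max_iter=100):
    """Kelley-type cutting plane method for f(x) = max_y phi1(x, y) + c*x on [lo, hi]."""
    f = lambda x: max(phi1(x, y) for y in G) + c * x
    cuts, x = [], x0
    for _ in range(max_iter):
        y = max(G, key=lambda yy: phi1(x, yy))  # y_k with phi1(x_k, y_k) + f1(x_k) = f(x_k)
        v = dphi1(x, y)                         # v_k, a (sub)gradient of phi1(., y_k) at x_k
        # B_k(x) = c x + phi1(x_k, y_k) + v_k (x - x_k), stored as a + b x
        cuts.append((phi1(x, y) - v * x, v + c))
        x_star, model_value = minimize_pwl(cuts, lo, hi)
        if f(x_star) - model_value <= tol:      # f_k(x_k^*) = f(x_k^*) up to tol
            return x_star
        x = x_star                              # x_{k+1} = x_k^*
    return x
```

For example, with $G = \{-1, 0.5, 2\}$ and $\varphi_1(x, y) = (x - y)^2$ on $[-3, 3]$, the function $f$ has a kink at its minimizer $x = 0.5$, and the model minimizers approach it within a handful of cuts because tangents from both sides of the kink quickly pin it down. Since each $B_k$ underestimates $f$, the gap $f(x_k^*) - f_k(x_k^*)$ bounds the suboptimality of $x_k^*$.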


See also 


> Bilevel Linear Programming: Complexity, Equivalence to Minmax, Concave Programs

> Bilevel Optimization: Feasibility Test and Flexibility Index

> Dini and Hadamard Derivatives in Optimization

> Global Optimization: Envelope Representation

> Minimax: Directional Differentiability

> Minimax Theorems

> Nondifferentiable Optimization

> Nondifferentiable Optimization: Cutting Plane Methods

> Nondifferentiable Optimization: Newton Method

> Nondifferentiable Optimization: Parametric Programming

> Nondifferentiable Optimization: Relaxation Methods

> Nondifferentiable Optimization: Subgradient Optimization Methods

> Stochastic Programming: Minimax Approach

> Stochastic Quasigradient Methods in Minimax Problems


References 


1. Danskin JM (1967) The theory of max-min and its application to weapons allocation problems. Springer, Berlin

2. Daugavet VA, Malozemov VN (1981) Quadratic rate of convergence of one linearization method for solving discrete minimax problems. USSR J Comput Math Math Phys 21(4):835-843

3. Demyanov VF (1995) Fixed point theorem in nonsmooth analysis and its applications. Numer Funct Anal Optim 16(1-2):53-109

4. Demyanov VF, Malozemov VN (1974) Introduction to minimax. Wiley, New York

5. Demyanov VF, Rubinov AM (1995) Constructive nonsmooth analysis. P. Lang, Frankfurt am Main

6. Demyanov VF, Vasiliev LV (1986) Nondifferentiable optimization. Springer and Optim. Software, Berlin

7. Karlin S (1959) Mathematical methods and theory in games, programming and economics. Addison-Wesley, Reading

8. Kelley JE (1960) The cutting-plane method for solving convex problems. SIAM J Appl Math 8(4):703-712

9. Laurent P-J (1972) Approximation et optimisation. Hermann, Paris

10. Panin VM (1981) On some methods for solving convex programming problems. USSR J Comput Math Math Phys 21(2):315-328

11. Polak E (1987) On the mathematical foundations of nondifferentiable optimization in engineering design. SIAM Rev 29:21-89

12. Pschenichny BN (1983) The method of linearization. Nauka, Moscow (In Russian)


Nondifferentiable Optimization: 
Newton Method 


A. M. RUBINOV 
School Inform. Techn. and Math. Sci. University 
Ballarat, Ballarat, Australia 


MSC2000: 49J52, 90C30 


Article Outline 


Keywords 
See also 
References 


Keywords 


Newton method; Variational inequality problem; Nonlinear complementarity problem; Smoothing; Approximation of nonsmooth mappings; Semismooth mapping; Superlinear convergence


One of the main approaches to solving equations and unconstrained optimization problems involving differentiable functions is based on the Newton method (NM). The classical version of this method (in a finite-dimensional setting) deals with an equation

$$F(x) = 0 \tag{1}$$

with a differentiable mapping $F$. Having an approximate solution $x_k$ ($k = 0, 1, \dots$), we wish to find a new one, $x_{k+1} = x_k + y$, which is ‘better’ than $x_k$. Since the derivative $\nabla F$ exists, we can approximate the mapping $y \mapsto F(x_k + y)$ by the mapping $y \mapsto F(x_k) + \nabla F(x_k) y$ and hence approximate the equation $F(x_k + y) = 0$ by the equation

$$F(x_k) + \nabla F(x_k) y = 0. \tag{2}$$

It is assumed that the linear mapping $\nabla F(x_k)$ is invertible. A new approximation $x_{k+1}$ to the solution has the form $x_{k+1} = x_k + y_k$, where $y_k = -(\nabla F(x_k))^{-1} F(x_k)$ is the solution of equation (2). (Sometimes it is more convenient to add a degree of freedom and consider the vector $x_{k+1} = x_k + t_k y_k$ with $t_k > 0$ as a new approximation.) Thus the classical smooth version of the NM has the form

$$x_{k+1} = x_k - (\nabla F(x_k))^{-1} F(x_k). \tag{3}$$

There are many methods for solving an unconstrained minimization problem

$$f(x) \to \min \tag{4}$$

based on scheme (3) and its modifications. The simplest scheme,

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k), \tag{5}$$

is suitable for twice continuously differentiable ($C^2$) functions $f$. This scheme allows us to find critical points of the function $f$.
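Scheme (3) can be sketched in a few lines of Python. The helper names below are ours; the linear system (2) is solved by Gaussian elimination with partial pivoting instead of forming the inverse explicitly, and the stopping rule on $\max_i |F_i(x_k)|$ is an assumption. Scheme (5) is the special case $F = \nabla f$, $\nabla F = \nabla^2 f$.

```python
import math


def solve_linear(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ZeroDivisionError("the Jacobian is singular")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (M[i][n] - s) / M[i][i]
    return y


def newton(F, J, x0, tol=1e-12, max_iter=50):
    """Scheme (3): x_{k+1} = x_k + y_k, where y_k solves F(x_k) + J(x_k) y = 0, i.e. (2)."""
    x = [float(v) for v in x0]
    for _ in range(max_iter):
        Fx = F(x)
        if max(abs(v) for v in Fx) <= tol:
            break
        y = solve_linear(J(x), [-v for v in Fx])
        x = [xi + yi for xi, yi in zip(x, y)]
    return x


# Example system: x1^2 + x2^2 = 4, x1 = x2 (solution (sqrt 2, sqrt 2) near the start).
F = lambda x: [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]
J = lambda x: [[2 * x[0], 2 * x[1]], [1.0, -1.0]]
root = newton(F, J, [1.0, 0.5])

# Scheme (5) for f(x) = exp(x1) - x1 + (x2 - 1)^2: F = grad f, J = Hessian of f.
grad = lambda x: [math.exp(x[0]) - 1.0, 2.0 * (x[1] - 1.0)]
hess = lambda x: [[math.exp(x[0]), 0.0], [0.0, 2.0]]
crit = newton(grad, hess, [1.0, 3.0])
```

Both examples are smooth with nonsingular Jacobians near the solution, which is exactly the setting in which (3) converges quadratically; the rest of the article concerns what replaces $\nabla F(x_k)$ when $F$ is not differentiable.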

As it turns out, many optimization and related problems, even with smooth objective functions and constraints, can be reduced to the solution of special kinds of nonsmooth equations (see, for example, [20]).

We shall consider as an example the variational inequality problem (VIP): find a vector $x \in K$ such that

$$(y - x)^T F(x) \ge 0 \quad \text{for all } y \in K, \tag{6}$$

where $K$ is a closed convex subset of $\mathbb{R}^n$ and $F$ is a continuously differentiable ($C^1$) mapping defined on an open set $D \supset K$. Let $P_K$ be the metric projection onto the set $K$ (that is, $\|x - P_K x\| = \min_{y \in K} \|x - y\|$ with the Euclidean norm $\|\cdot\|$) and

$$F_K(x) = x - P_K(x - F(x)), \qquad \bar F_K(z) = F(P_K(z)) + (z - P_K(z)). \tag{7}$$

It can be shown (see, for example, [20]) that a vector $x$ satisfies (6) if and only if $x$ is a solution of the equation $F_K(x) = 0$, and if and only if $z = x - F(x)$ is a solution of the equation $\bar F_K(z) = 0$. Since $P_K$ is not necessarily a smooth mapping, the mappings $F_K$ and $\bar F_K$ are not necessarily smooth either.

A special case of VIP (6) with $K = \mathbb{R}^n_+$ is the nonlinear complementarity problem (NCP): find a vector $x \in D$ such that

$$x \ge 0, \quad F(x) \ge 0, \quad x^T F(x) = 0, \tag{8}$$

where $D \subset \mathbb{R}^n$ is an open set containing the nonnegative orthant $\mathbb{R}^n_+$, and $F$ is a continuously differentiable ($C^1$) function defined on $D$. Since $P_K(x) = x^+$, the mappings (7) have the following form:

$$F_K(x) = \min(x, F(x)), \qquad \bar F_K(z) = F(z^+) + z^-, \tag{9}$$

where $x^+ = \max(x, 0)$ and $x^- = \min(x, 0)$; max and min stand for the componentwise maximum and minimum, respectively. Thus the NCP reduces to equations with max and min operators.

The following properties of the function $m(b, c) = \min(b, c)$ are important for the application to the NCP: $m$ is positively homogeneous of the first degree, and the set $\{x = (x_1, x_2) : m(x) = 0\}$ coincides with the union of the two nonnegative semi-axes, i.e. with $\{x : x_1 \ge 0,\ x_2 \ge 0,\ x_1 x_2 = 0\}$. Sometimes it is more convenient to consider functions that have the same properties and are also $C^1$ on $\mathbb{R}^2$ except at the origin. One example from this large class of functions is the so-called Fischer-Burmeister function [8]:

$$\phi(b, c) = (b^2 + c^2)^{1/2} - (b + c). \tag{10}$$
If

$$H(x) = \left(\phi(x_1, F_1(x)), \dots, \phi(x_n, F_n(x))\right)^T, \tag{11}$$

then $x$ is a solution of the NCP if and only if $H(x) = 0$.
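The reformulations (7), (9) and (11) are easy to check numerically. The Python sketch below assumes a hypothetical linear $F(x) = Mx + q$ with $M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $q = (-1, 2)$, whose NCP solution is $x = (0.5, 0)$ with $F(x) = (0, 2.5)$; the function names are ours.

```python
import math

M = [[2.0, 1.0], [1.0, 2.0]]
q = [-1.0, 2.0]


def F(x):
    """Hypothetical linear mapping F(x) = M x + q."""
    return [sum(mij * xj for mij, xj in zip(row, x)) + qi for row, qi in zip(M, q)]


def natural_map(x):
    """F_K(x) = x - P_K(x - F(x)) with P_K(z) = max(z, 0), the first mapping in (7)."""
    return [xi - max(xi - fi, 0.0) for xi, fi in zip(x, F(x))]


def min_map(x):
    """The same mapping written as min(x, F(x)), see (9)."""
    return [min(xi, fi) for xi, fi in zip(x, F(x))]


def normal_map(z):
    """bar F_K(z) = F(z^+) + z^-, see (9)."""
    zp = [max(zi, 0.0) for zi in z]
    return [fi + min(zi, 0.0) for fi, zi in zip(F(zp), z)]


def fischer_burmeister(b, c):
    """phi(b, c) = sqrt(b^2 + c^2) - (b + c), formula (10)."""
    return math.hypot(b, c) - (b + c)


def H(x):
    """H(x) from (11)."""
    return [fischer_burmeister(xi, fi) for xi, fi in zip(x, F(x))]


x_sol = [0.5, 0.0]
print(min_map(x_sol), H(x_sol))  # both vanish at the NCP solution
```

At $x = (0.5, 0)$ all three residuals vanish, and the normal map vanishes at $z = x - F(x) = (0.5, -2.5)$; at a non-solution such as $(1, 1)$ they do not. All of these maps have kinks, which is why the classical scheme (3) cannot be applied to them directly.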
Thus it became necessary to extend the NM to enable the efficient solution of nonsmooth equations. Note that such an extension is not possible in general. B. Kummer [15] has found an example of a function $F$ defined on the real line which enjoys very good properties ($F$ is Lipschitz, strictly monotone, directionally differentiable everywhere and Fréchet differentiable at the solution of equation (1)) and such that the NM iterates alternate for almost all starting points.

One of the first contributions to the nonsmooth NM was made by S.M. Robinson [27] for solving feasibility problems involving inequality constraints, and by M. Kojima and S. Shindo [14] for solving (1) with a piecewise smooth mapping $F$. Various versions and modifications of the NM for nonsmooth equations have been developed during the last decade. In particular, damped NM and smoothing NM enjoy global convergence under certain monotonicity assumptions. There are finite-dimensional and infinite-dimensional settings of this problem. Here we consider only finite-dimensional versions of the nonsmooth NM. For infinite-dimensional problems see, for example, [7,16].

The first problem which arises in the study of nonsmooth NM is to find suitable approximations of nonsmooth mappings. Many authors have suggested various versions of such approximations. Two types of approximations are mainly considered. One of them is an approximation by a certain set-valued mapping x ↦ V(x), where V(x) is a set of invertible matrices. In this case the equation (2) is replaced by a linear equation F(x_k) + A_k y = 0, where A_k is an arbitrary matrix from V(x_k). Thus NM in such a setting has the form


$$ x_{k+1} = x_k - A_k^{-1} F(x_k). \qquad (12) $$
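A minimal sketch of iteration (12) applied to the min-reformulation min(x, F(x)) = 0 of the NCP. It assumes a hypothetical affine test mapping F(x) = Mx + q and takes A_k as one element of the B-subdifferential of x ↦ min(x, F(x)): row i is the unit vector e_i where x_i ≤ F_i(x) and row i of M otherwise (ties are broken toward e_i, an arbitrary choice). It does not guard against singular A_k.

```python
from typing import List

# Hypothetical LCP data: F(x) = M x + q.
M = [[2.0, 1.0], [1.0, 2.0]]
q = [-1.0, 1.0]


def F(x: List[float]) -> List[float]:
    return [sum(M[i][j] * x[j] for j in range(len(x))) + q[i] for i in range(len(x))]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def newton_min(x: List[float], tol: float = 1e-12, max_iter: int = 50) -> List[float]:
    """Iteration (12) for Phi(x) = min(x, F(x)) = 0 with A_k from the B-subdifferential."""
    n = len(x)
    for _ in range(max_iter):
        Fx = F(x)
        phi = [min(xi, fi) for xi, fi in zip(x, Fx)]
        if max(abs(p) for p in phi) <= tol:
            break
        A = [
            [1.0 if j == i else 0.0 for j in range(n)] if x[i] <= Fx[i] else M[i][:]
            for i in range(n)
        ]
        step = solve(A, [-p for p in phi])  # solve A_k y = -Phi(x_k)
        x = [xi + yi for xi, yi in zip(x, step)]
    return x


print(newton_min([1.0, 1.0]))  # [0.5, 0.0], reached in two steps from (1, 1)
```

For piecewise affine mappings like this one, the iteration terminates as soon as A_k corresponds to the correct active piece, which is why two steps suffice here.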


The second type is based on an approximation by means of a certain nonlinear mapping F'. In such a case the method can be presented in the following form: x_{k+1} = x_k + y_k, where y_k is a solution of the nonlinear equation

$$ F(x_k) + F'(x_k)(y) = 0. \qquad (13) $$


It is assumed that the auxiliary nonlinear equation (13) 
can be solved relatively easily. 

There is a general approach [16] based on the set-valued approximation 𝓕(x, y) of a single-valued locally Lipschitz mapping F which includes both of the above-mentioned types. This approach clarifies two conditions which lead to convergence of NM: first, the uniform injectivity (invertibility) of the approximating mapping in a neighborhood Y of a solution x*: there exists c > 0 such that

$$ \|u\| \ge c\,\|y\| \quad \text{for all } x \in Y,\ u \in \mathcal{F}(x, y), \qquad (14) $$

and second, that the mapping 𝓕 should provide a relatively good approximation in this neighborhood:

$$ F(x) + \mathcal{F}(x, y) \subset \mathcal{F}(x^{*}, y + (x - x^{*})) + o(\|x - x^{*}\|), \qquad (15) $$

where o(y)/‖y‖ → 0 as ‖y‖ → 0.
It can be shown (see [16]) that for many concrete situations (15) is equivalent to

$$ F(x) + \mathcal{F}(x, x^{*} - x) \subset o(\|x - x^{*}\|)\,B, \qquad (16) $$

where B is the unit ball. The local speed of convergence is determined by the number c in (14) and the function o in (15). Conditions (14) and (15) are necessary for convergence of NM under some additional assumptions ([16]).

We now turn to the first type of approximation. If F is a locally Lipschitz mapping, then the B-subdifferential ∂_B F(x) and the Clarke generalized Jacobian ∂_{Cl} F(x) are considered as approximations. By definition

$$ \partial_B F(x) = \Bigl\{ \lim_{x_k \to x} F'(x_k) : F \text{ differentiable at } x_k \Bigr\}, $$

and ∂_{Cl} F(x) is the convex hull of ∂_B F(x). L. Qi suggested another approximation, the C-differential [23], which can be applied to all continuous (not just Lipschitz) mappings. By definition, the C-subdifferential T is a compact-valued upper-semicontinuous mapping such that F(x + u) = F(x) + Au + o(u) for any A ∈ T(x + u). (It is important that T is a mapping, not an individual set at the point x; the approximation near the point x is accomplished by matrices from the sets T(y) for all y sufficiently close to x.) A closely related construction, a point-based set-valued approximation, is studied in [30], where connections with the constructions from [16,28] are mentioned. There exist exact calculus rules for C-differentials which allow their relatively easy computation. An interesting example of approximation is given by approximate Jacobians (see, for example, [10]). This construction is again applicable to all continuous mappings and allows the unification of various approaches to approximation.
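To illustrate the definition of ∂_B F, the sketch below approximates ∂_B m(0, 0) for m(b, c) = min(b, c) by central-difference gradients taken at nearby points where m is differentiable. The sampling pattern, the step h and the rounding are illustrative assumptions. It recovers the two limiting gradients (1, 0) and (0, 1); the Clarke generalized Jacobian ∂_{Cl} m(0, 0) is then the segment joining them.

```python
from typing import Set, Tuple


def m(b: float, c: float) -> float:
    return min(b, c)


def fd_gradient(b: float, c: float, h: float = 1e-9) -> Tuple[float, float]:
    """Central-difference gradient of m; meaningful when h is much smaller
    than |b - c|, so that m is affine on the whole stencil."""
    gb = (m(b + h, c) - m(b - h, c)) / (2 * h)
    gc = (m(b, c + h) - m(b, c - h)) / (2 * h)
    return round(gb, 6), round(gc, 6)


def approx_B_subdifferential(radius: float = 1e-3, samples: int = 16) -> Set[Tuple[float, float]]:
    """Collect gradients of m at points approaching the kink (0, 0) off the diagonal b = c."""
    grads = set()
    for k in range(samples):
        t = radius * (k + 1) / samples
        for b, c in [(t, -t), (-t, t), (2 * t, t), (t, 2 * t)]:
            grads.add(fd_gradient(b, c))
    return grads


print(sorted(approx_B_subdifferential()))  # [(0.0, 1.0), (1.0, 0.0)]
```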
Local convergence based on a matrix approximation V(x) can be proved only if all matrices A belonging to V(x), with x sufficiently close to a solution x*, provide a sufficiently good approximation (compare with [16]). In particular, convergence can be proved if the mapping F is semismooth (see [25]) at the point x* or enjoys some properties close to semismoothness (see, for example, [10]). The mapping F is called semismooth with respect to a matrix approximation V(x) at a point x* if the directional derivative F'(x, ·) exists at all points x close to x* and

$$ Au - F'(x, u) = o(\|u\|), \qquad A \in V(x + u), \quad u \to 0, \qquad (17) $$

that is, each matrix A ∈ V(x + u) approximates the directional derivative at the point x in the direction u. (Semismoothness was originally introduced by R. Mifflin for real-valued functions in 1977.) In some applications strong semismoothness is required. The mapping F is called strongly semismooth at a point x if there exists a number K such that

$$ \|Au - F'(x, u)\| \le K\,\|u\|^{2}, \qquad A \in V(x + u), \quad u \to 0. \qquad (18) $$

Semismoothness and strong semismoothness can be easily verified in many concrete situations. The simplest example of a strongly semismooth mapping is a coordinatewise maximum (or minimum) of a finite number of C^2 mappings.

It is well known that, for differentiable mappings, the NM produces a sequence which converges to a solution very fast. As it turns out, this property also holds in the nonsmooth setting under some regularity conditions. L. Qi and J. Sun demonstrated that for semismooth mappings the rate of convergence is Q-superlinear [25], that is

$$ \lim_{k \to +\infty} \frac{\|x_{k+1} - x^{*}\|}{\|x_k - x^{*}\|} = 0. $$

Under some additional assumptions (for example, strong semismoothness) [25] it can be shown that the rate of convergence is Q-quadratic:

$$ \limsup_{k \to +\infty} \frac{\|x_{k+1} - x^{*}\|}{\|x_k - x^{*}\|^{2}} < +\infty. $$


The so-called Kantorovich scheme [12] in the study of NM can also be extended to nonsmooth equations (see for example [16,25,27,29]). This scheme allows one to prove not only the convergence of the NM but also the existence of a solution in a small neighborhood of the starting point x_0, if some combinations of such parameters as ‖F(x_0)‖, the uniform bound on the norms of the inverse matrices, the value K in (18) and the radius of the neighborhood are sufficiently small.

Various kinds of nonlinear approximation are used 
for setting the NM in the framework of the second ap- 
proach. 

The B-derivative [19,22], that is, a directional derivative that is uniform with respect to the direction, is often used as a nonlinear approximation. Another tool for such an approximation is the codifferential [5]. One of the difficult tasks arising in the application of the NM based on nonlinear approximation is to solve the auxiliary subproblem (13). For B-derivative based NM, J.S. Pang [19] proposed solving this problem inexactly. A version of NM under the assumption that each matrix of ∂_B F(x) is invertible (the so-called BD-regularity) was studied in [22]. This assumption is weaker than invertibility of the B-derivative and also weaker than nonsingularity of all matrices of ∂_{Cl} F(x). A different approach to nonlinear approximation was proposed by Robinson [28], who introduced the so-called point-based approximation. This approach is convenient in the study of equations with mappings F_K (see (7)) generated by the projection P_K onto a convex set K. Q-superlinear convergence and (for strong approximations) Q-quadratic convergence can be proved for various kinds of nonlinear approximations.

An abstract version of the second approach, based on the Kantorovich scheme, was proposed in [29]. The mapping F°(x, v) is called an approximator for the mapping F at the point x in the direction v if F(x + v) = F(x) + F°(x, v) + o(v), where lim_{t→+0} t^{-1} o(tv) = 0. Assume that an approximator F° provides a strong approximation: there exists a number K such that ‖F(x + u) − F(x) − F°(x, u)‖ ≤ K‖u‖² for all x close to the starting point x_0, and F° enjoys the following property (which is weaker than the pseudo-Lipschitz property [1]): there exists a number L such that the equation F°(x, u) = y has a solution u with ‖u‖ ≤ L‖y‖ for all sufficiently small y. Then the convergence of the NM based on F° can be proved [29] by means of the Kantorovich scheme under some assumptions typical for this scheme. The inequality ‖x* − x_k‖ ≤ γ 2^{−k} holds for the Newton sequence (x_k) with a number γ > 0.

As a rule, NM converges to a solution (even in the smooth setting) only if the starting point is sufficiently close to this solution. Thus the question arises: is it possible to find modifications of NM which provide global convergence for some equations, that is, convergence from an arbitrary initial point? One such modification is the damped Newton method. For instance, if the NM uses a positively homogeneous approximation F'(x_k, u), such as a Jacobian in the smooth case, or a directional derivative [7,13,15,19], then x_{k+1} is chosen as x_k + λ_k y_k for some λ_k ∈ (0, 1] such that

$$ \|F(x_{k+1})\| \le (1 - \sigma\lambda_k)\,\|F(x_k)\| < \|F(x_k)\|, \qquad (19) $$

where σ ∈ (0, 1) is a fixed constant. This can be generalized to approximations that are not positively homogeneous, such as point-based approximations [3,21], by setting up a path p_k(λ) joining x_k = p_k(0) to x_k + y_k = p_k(1), where y_k is found by the NM, such that F(p_k(λ)) = (1 − λ)F(x_k) + o(λ). Then it is easy to determine λ_k ∈ (0, 1] such that (19) holds for x_{k+1} = p_k(λ_k).
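A minimal scalar sketch of the damped Newton method: λ_k is found by halving, starting from 1, until |F(x_k + λ_k y_k)| ≤ (1 − σλ_k)|F(x_k)|. The test function F = arctan, the constant σ = 10⁻⁴ and the halving rule are illustrative assumptions; arctan is chosen because the undamped NM diverges from x_0 = 10, while the damped iteration converges.

```python
import math
from typing import Callable


def damped_newton(F: Callable[[float], float], dF: Callable[[float], float], x: float,
                  sigma: float = 1e-4, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Scalar damped Newton method with step halving on the decrease test."""
    for _ in range(max_iter):
        fx = F(x)
        if abs(fx) <= tol:
            break
        y = -fx / dF(x)  # full Newton step y_k
        lam = 1.0
        # Halve lambda until |F(x + lam*y)| <= (1 - sigma*lam) |F(x)|.
        while abs(F(x + lam * y)) > (1.0 - sigma * lam) * abs(fx) and lam > 1e-12:
            lam *= 0.5
        x = x + lam * y
    return x


def F(x: float) -> float:
    return math.atan(x)


def dF(x: float) -> float:
    return 1.0 / (1.0 + x * x)


x_full = 10.0 - F(10.0) / dF(10.0)     # one undamped Newton step: about -138.6
x_damped = damped_newton(F, dF, 10.0)  # damped iterates approach the root 0
print(x_full, x_damped)
```

Near the root the test accepts λ_k = 1, so the method reverts to the pure NM and keeps its fast local convergence.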

There is a close connection between the damped NM and the so-called least squares merit function of the operator F:

$$ \theta(x) = \tfrac{1}{2}\,\|F(x)\|^{2}. $$

For some nonsmooth operators F arising from NCP and related problems, the function θ is continuously differentiable. This property is very useful in the study of NM and damped NM [11].

One modification of the nonsmooth NM is the so-called smoothing Newton method, which is also called splitting NM or homotopy NM. This method is based on an approximation of the operator F in (1) by a smooth function G. Usually a smoothing damped NM is studied. This method has the following form:

$$ x_{k+1} = x_k - \lambda_k\, G'_x(x_k, \varepsilon_k)^{-1} F(x_k), \qquad (20) $$

where G(x, ε) is a smoothing approximation of the mapping F, that is, for any ε > 0 the function x ↦ G(x, ε) is continuously differentiable and ‖F(x) − G(x, ε)‖ → 0 as ε → 0.

Various types of smoothing functions have been used in optimization for a long time (see, for example, [13]). Some of them can be constructed by means of the so-called Chen-Harker-Kanzow-Smale function

$$ \xi(u, \varepsilon) = \tfrac{1}{2}\Bigl((u^{2} + 4\varepsilon^{2})^{1/2} + u\Bigr), \qquad (u, \varepsilon) \in \mathbb{R}^{2}, $$

which serves as an approximation of max(0, u) (see, for example, [24]). In particular, it is possible to find smoothing operators for the operators defined by (9) by means of the function ξ. An interesting approach to the smoothing methods can be found in [3]. Global convergence of the method (20) can be proved under some assumptions. If the gradient of the smoothing function G is sufficiently close to some approximation of the operator F, then the method converges Q-superlinearly (Q-quadratically) [4,24].
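A small sketch of the Chen-Harker-Kanzow-Smale function ξ and of a smoothed min built from it via the identity min(a, b) = a − max(0, a − b); this is one way, assumed here rather than taken from the references, to smooth the operators in (9). Since ξ(u, ε) − max(0, u) = ((u² + 4ε²)^{1/2} − |u|)/2, the gap lies in [0, ε] and equals ε at u = 0.

```python
import math


def chks(u: float, eps: float) -> float:
    """Chen-Harker-Kanzow-Smale smoothing xi(u, eps) of max(0, u)."""
    return 0.5 * (math.sqrt(u * u + 4.0 * eps * eps) + u)


def smooth_min(a: float, b: float, eps: float) -> float:
    """Smoothing of min(a, b) = a - max(0, a - b), built from chks."""
    return a - chks(a - b, eps)


for eps in (1e-1, 1e-3, 1e-6):
    worst = max(abs(chks(u, eps) - max(0.0, u)) for u in (-2.0, -0.5, 0.0, 0.5, 2.0))
    print(eps, worst)  # the gap is largest at u = 0, where chks(0, eps) = eps
```

For every ε > 0 the function chks is C^∞ in u, which is what makes a Newton step on the smoothed system well defined.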

There is a close connection between smoothing NM 
and interior point methods (see, for example, [21]). 

For some special classes of problems it is possible to 
obtain stronger results than in the general setting (see, 
for example, [6,11,18,26]). On the other hand, there are modifications of the nonsmooth Newton method for the solution of generalized equations of the form y ∈ F(x) with a set-valued mapping F (see, for example,
[2,7,16]). 
Mathematical descriptions of real-life problems are typically stated in terms of decision variables x and parameters θ. These objects may be related through a system of equations and inequalities such as

$$ F(\theta) = \{ x \in \mathbb{R}^{n} : g^{i}(x, \theta) \le 0,\ i \in I;\ h^{j}(x, \theta) = 0,\ j \in J \}, $$

where g^i, h^j : R^n × R^p → R, i ∈ I, j ∈ J, are some functions, and I and J are finite index sets. Problems of the form


$$ \min_{x} f(x, \theta) \quad \text{s.t.} \quad x \in F(\theta), $$

where θ is allowed to vary over some set in R^p, are
termed parametric programming models (PPM). Para- 
metric programming (PP) is the study of parametric 
programming models. A local analysis of these models, around a fixed θ, is referred to as sensitivity analysis (SA). In particular, SA is concerned with changes of the minimal value subject to small perturbations of the parameter. Parametric programming is a huge area containing, or closely related to, many topics, such as path following methods (cf. ▶ Parametric optimization: Embeddings, path following and singularities), sensitivity in semi-infinite programming, constraint qualifications (cf. also ▶ First order constraint qualifications; ▶ Second order constraint qualifications), bilevel programming (cf. ▶ Bilevel programming: Introduction, history and overview), etc. We will refer to some of these topics here only in passing. Since every equality constraint can be replaced by two inequalities, one can assume that J = ∅. Then the model is said to be linear (resp. convex) if the functions f(·, θ), g^i(·, θ): R^n → R, i ∈ I, are linear (resp. convex) for every θ ∈ R^p.


Historical Outline 


Parametric programming has its roots in the study of 
linear programs: 


where one or more coefficients of the vectors b and c, 
or the matrix A, are considered as parameters and al- 
lowed to vary. The study can be traced to the 1950s lit- 
erature. According to [16], the right-hand side changes 
in a linear program were investigated by W. Orchard- 
Hays in his unpublished Master’s thesis (1952). The 
term ‘parametric’ LP was used in [41]. The classical 
problems of SA and PPM dealt mainly with pivoting 
and the simplex method. A different approach that uses 
polyhedral structures rather than the simplex method 
was developed in [46,47,48,49]. A classical parametric 
problem in LP is to determine the range of perturba- 
tions for specific parameters in b and c, that preserve 
optimal bases. A related problem is to determine the 
range for which an optimal solution exists. This range 
is called the ‘critical set’. Various approaches to solving 
these problems have been successfully implemented in 
commercial software packages and adjusted to particular situations, e.g., data envelopment analysis [45]. It is well known that difficulties may arise when the problem under consideration is degenerate (i.e., when the optimal basis is not unique). In that case the commercial packages may provide essentially different results; that is to say, the information could be 'confusing and hardly allows a solid interpretation'; see, e.g., [5], where the claim is demonstrated on a transportation problem. The study of changes of the parameters in [5] is
a departure from the classical approach. Instead of em- 
ploying local analysis and pivoting, the authors make 
use of the strict complementarity condition and opti- 
mal partitioning in order to construct and study the be- 
havior of the optimal value function for the right-hand 
side and objective-function perturbations. They do it 
for both LP and convex quadratic programming and 
obtain sharp intervals. Other approaches used to study 
the effect of perturbations of parameters in LP, includ- 
ing the ‘tolerance approach’ (where variations may oc- 
cur simultaneously and independently), are described 
in [63,64]. The classical texts on sensitivity analysis and 
parametric linear programming include [9,15,37,48,49]. 
([15] lists 1031 references.) Almost 1000 items on degeneracy in LP alone are listed in [17].

A major obstacle to using linear algebra and calcu- 
lus methods in the study of linear models is that these 
methods are not suited to explain continuity properties 
of the model. Indeed, the model does not generally react 
‘continuously’ to continuous changes of the parameters 
in the coefficient matrix. Let us illustrate such a situa- 
tion. 


Example 1 Consider the model 


When θ = 0, the feasible set F(θ) is the segment −1 ≤ x ≤ 1, the optimal solution is x = −1, and the optimal value is −1. For any perturbation θ ≠ 0, all three objects jump to zero.

An example of a linear model where the critical set is disconnected and not closed is given in [9, pp. 114-116]. Another one with a matrix of full row rank where both the
feasible set and the set of optimal solutions experience 
jumps under continuous perturbations of the parame- 
ter in the interior of the critical set is given in [42]. We 
extend this example to an LP in canonical form with 
a full row rank matrix below. 


Example 2 Consider the model

$$ \begin{aligned} \min\ & -x_2 \\ \text{s.t.}\ & x_1 + x_2 + x_3 = 1, \\ & x_1 + \theta x_2 - x_4 = 1, \\ & x_i \ge 0, \quad i = 1, \dots, 4. \end{aligned} $$

A unique optimal solution at θ = 1 is x = (x_i) = (0, 1, 0, 0) and the optimal value is −1. However, for any perturbation θ = 1 − ε, ε > 0, the solution jumps to another unique optimal solution (1, 0, 0, 0), and the optimal value becomes zero.
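The jump in Example 2 can be checked numerically. The sketch below assumes the objective is −x_2, the reading consistent with the stated optimal value −1 at (0, 1, 0, 0). It enumerates the basic solutions of the 2 × 4 system, which suffices here because the feasible set is nonempty and bounded, so an optimum is attained at a basic feasible solution. The function name and tolerance are illustrative.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def solve_example2(theta: float, tol: float = 1e-9) -> Tuple[float, List[float]]:
    """Minimize -x2 s.t. x1 + x2 + x3 = 1, x1 + theta*x2 - x4 = 1, x >= 0,
    by enumerating the basic solutions of the 2x4 equality system."""
    A = [[1.0, 1.0, 1.0, 0.0],
         [1.0, theta, 0.0, -1.0]]
    b = [1.0, 1.0]
    c = [0.0, -1.0, 0.0, 0.0]
    best: Optional[Tuple[float, List[float]]] = None
    for i, j in combinations(range(4), 2):
        det = A[0][i] * A[1][j] - A[0][j] * A[1][i]
        if abs(det) < tol:
            continue  # columns i, j do not form a basis
        xi = (b[0] * A[1][j] - A[0][j] * b[1]) / det  # Cramer's rule
        xj = (A[0][i] * b[1] - b[0] * A[1][i]) / det
        if xi < -tol or xj < -tol:
            continue  # basic but infeasible
        x = [0.0] * 4
        x[i] = xi if xi > 0.0 else 0.0  # clip round-off negatives to zero
        x[j] = xj if xj > 0.0 else 0.0
        value = sum(ck * xk for ck, xk in zip(c, x))
        if best is None or value < best[0] - tol:
            best = (value, x)
    assert best is not None, "feasible set is empty"
    return best


print(solve_example2(1.0))   # (-1.0, [0.0, 1.0, 0.0, 0.0])
print(solve_example2(0.99))  # (0.0, [1.0, 0.0, 0.0, 0.0])
```

At θ = 1 − ε the basic solutions containing x_2 require x_3 < 0 or x_4 < 0, so they drop out of the feasible set; this is the loss of lower semicontinuity described in the text.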


In order to study reactions of a model to continu- 
ous perturbations of data, the feasible set and the op- 
timal solutions set can be viewed as images of point- 
to-set mappings with a domain in the space of param- 
eters. Hence the study of continuity of these and re- 
lated objects, such as Lagrange multiplier sets, requires 


basic tools of point-to-set topology. These tools have 
been used in mathematical programming sporadically 
and in different contexts, e.g., in an analysis of con- 
vergence of numerical algorithms in [50,65]. After pa- 
pers such as [26], the point-to-set approach to the study 
of parametric programming has become standard. The 
first text on the theory of nonlinear parametric opti- 
mization is [4], written by several authors from the 
‘Berlin school of parametric optimization’ initiated by 
F. Nožička. (Twenty-nine students obtained doctorates
under his supervision.) This text contains 30 pages of 
bibliography on PP. A classical text on methodology 
used in perturbation analyses in nonlinear program- 
ming is [13], and one on path following methods in 
PP is [25]. A unified approach to general perturbations 
with applications to system analysis and numerical op- 
timization is given in [34]. Since the late 1970s there 
has been an outburst of research activities in PP. For 
instance, to date (1998) there have been 20 annual sym- 
posia on mathematical programming with data pertur- 
bations held at the George Washington Univ., and the 
International Conference on Parametric Optimization and Related Topics has been held bi-annually since 1985.
There have been at least 15 books written on parametric 
programming. 

The study of parametric programming can be 
roughly divided into three general areas: stability, op- 
timality and numerical methods, and applications. 


Stability 


In this area one mainly studies continuity properties of the feasible set F(θ), the set of all optimal solutions F°(θ) = {x°(θ)}, the set of all Lagrange multipliers U = U(θ), and the optimal value f°(θ) = f(x°(θ), θ), as the parameter θ varies. Here f° is a function, while F: θ ↦ F(θ), F°: θ ↦ F°(θ), and U: θ ↦ U(θ) are point-to-set mappings. If the constraints in a parametric programming model are continuous functions, then F is a closed mapping. In order to guarantee continuity of the mapping F one requires an extra condition, e.g., that F be lower semicontinuous (or, equivalently, open). If the PPM is convex and F°(θ*) ≠ ∅ and bounded at some θ*, then continuity of F locally implies both the existence of optimal solutions and continuity of the optimal value function, and the mapping F° is closed. However, continuity of F does not imply continuity of U.
Conditions for the existence of a differentiable path of solutions x°(θ) in a general model, as the parameter θ varies, are given in [14]. They are extended to a continuously differentiable path of saddle points in [13]; see also [19,20]. The results are established by applying an appropriate implicit function theorem to a necessary condition for optimality. The reference [14] also contains an explicit formula for the partial derivatives, as well as approximations based on classical penalty functions. For a state-of-the-art survey of Lipschitz stability for linear and quadratic (also nonconvex) programs up to 1987, see [30]. For a recent (1998) guided tour of sensitivity analysis that yields lower and upper estimates for the optimal value function see [6].

If the feasible set mapping F is continuous and the set of optimal solutions is nonempty and bounded, then such models are often termed stable. In convex parametric programming continuity of F is implied by Slater's condition. In the fundamental paper [51], stability for linear systems in the canonical form, subject to perturbations of all data, is characterized by the existence of a positive feasible solution x > 0. The result is stated for linear inequalities in partially ordered Banach spaces. It is extended in [52] to nonlinear inequalities over a closed convex set. When perturbations of specific coefficients are considered, the existence of a positive solution is a sufficient, but not a necessary, condition for stability. Also, in this case, chunks (regions) of the parameter space attached to a fixed parameter θ*, where F is lower semicontinuous, are termed regions of stability at θ*; see [73,74]. (In the model from Example 2, a region of stability at θ* = 1 is θ > 1.) Such regions can often be calculated globally; one of these is the set of all paths emanating from θ* on which the constraints satisfy Slater's condition. For a list of regions of stability and a necessary condition for stability in convex PP see [69]. These regions are of independent interest in, e.g., the study of random decision systems with complete connections [66] and linear programming [7]. The radius of the largest ball centered at θ*, with the property that the model is stable at every interior point θ of the ball, is the radius of stability at θ*, e.g., [69]. It is a measure of how much the system can be uniformly strained from θ* before it starts breaking down. In a linear parametric programming model in canonical form with a full row rank coefficient matrix, stability is implied by the existence of

a nondegenerate basic feasible solution. On the other 
hand, instability (i.e., loss of continuity of the feasible set mapping) typically occurs in situations of 'enforced optima' such as lexicographic and multilevel programming, including von Stackelberg games of market economy. For example, in bilevel programming, instability occurs when the optimal solutions set of the follower (lower level decision maker) loses its lower semicontinuity. The leader's (upper level decision maker's) model is then unstable, because its feasible set is the set of optimal solutions of the follower. In this situation the leader's optimal value function typically experiences a discontinuity even if the follower's model is globally stable; see [43, Chapt. 13; 16]. The notion of stability is not uniquely defined; see, e.g., [4,13,21,31,34]. Structural stability and continuous deformations of nonlinear programs have been studied in, e.g., [22,28,29,33]. In particular, the 'topological stability' of the feasible set (i.e., homeomorphy with respect to all sufficiently small perturbations up to second order of the involved functions) is proved to be equivalent to the Mangasarian-Fromovitz constraint qualification being satisfied at the feasible points; see [24]. Characterizations of stability (local existence and uniqueness) of stationary points and Karush-Kuhn-Tucker points with respect to all sufficiently small perturbations up to second order are given, respectively, in [32,53]. For a study of stability with nondifferentiable data see [3,44], also [57]. Some difficulties with an abstract formulation of parametric programming are mentioned in [43, Chapt. 14]; see also [2,58,75]. Stability in optimal control is studied in, e.g., [38,39].


Optimality and Numerical Methods 


A parameter θ is said to be locally (resp. globally) optimal if it locally (resp. globally) optimizes the optimal value function f°(θ) over its feasible set {θ : F(θ) ≠ ∅}. Calculation and characterization of optimal parameters are basic problems of parametric programming. For convex PPM one can formulate necessary and sufficient conditions for local (and global) optimality of the parameter, e.g., [68,69,70,71]. These conditions are expressed in terms of saddle-point inequalities, and local results typically require conditions such as uniqueness of the optimal solution in the x component and lower semicontinuity of the feasible set mapping. They are simplified under input constraint qualifications [55]. The optimal value function is not generally known analytically; it is generally nondifferentiable, nonconvex (nonconcave), and discontinuous even for linear models. However, for the right-hand side and objective-function perturbations in linear models it can be constructed, e.g., [5].

Calculation of optimal parameters is often achieved 
by using the so-called ‘marginal value formula’. This 
formula gives the derivative of the optimal value func- 
tion on a prescribed path in terms of the derivatives of 
the Lagrangian function relative to x and θ. At each
iteration the calculation consists of two parts: first, an 
improvable feasible path is determined and then a step- 
size problem is solved on the path by a search method. 
Optimization of the optimal value function by stable 
perturbations of the parameters (also called input op- 
timization) in convex PP is a challenging problem and 
no satisfactory theory or numerical methods presently 
seem to exist, e.g., [71]. In contrast, the theory of 
path following methods, based on nonlinear program- 
ming optimality conditions and used when data depend 
on one scalar parameter, is well developed, although 
'not successful in every case'; see, e.g., [23,25]. In order to improve the path following approach these authors propose jumps between connected components
in the sets of local minimizers and generalized critical 
points. 


Applications 


Classical sensitivity analysis, when applied to the right- 
hand side perturbations in linear programming, pro- 
vides interpretation of Lagrange multipliers as shadow 
prices. The results have been extended to convex pro- 
gramming using the marginal value formula, e.g., in 
[12]. Genuine applications of parametric programming 
extend far beyond classical sensitivity analysis. Some of 
them are related to the fundamental notion of a well- 
posed problem in the sense of Hadamard. These are 
problems in applied mathematics which have a unique 
solution and the solution changes continuously when 
the parameters (e.g., boundary conditions in differ- 
ential equations) change continuously. Problems that 
are not well-posed are called ill-posed. According to 
Fritz John (see [62, p. ix]) ‘the majority of applied 
problems are, and always have been, ill-posed, partic- 


ularly when they require numerical answers’. There is 
a more general notion of well-posedness (for problems 
with nonunique solutions); see [10]. Many problems of 
mathematical physics can be formulated as optimization problems and continuous dependence of the solution on the boundary conditions can be studied using PP. In the context of mathematical programming,
some of the first applications of PP included the study 
of convergence of numerical algorithms. The rate of 
convergence can be determined using continuity prop- 
erties of point-to-set mappings; see, e. g., [14,50,54,65]. 
Some of the ambiguities that occur while solving or- 
dinary LP can be understood and resolved by PP. For 
example, in some models describing real-life problems, 
such as the one reported in [62, pp. 212-213], signifi- 
cant jumps of optimal solutions occur when some data 
are perturbed, while the respective values of the objec- 
tive function are comparatively close. This is a typical 
behavior of stable linear programming models when 
the optimal solutions mapping is not continuous. The 
authors of [62] suggest a method for stabilization of optimal solutions of such programs by 'Tikhonov regularization'.

The results on optimal parameters in parametric 
programming have many applications including ma- 
chine scheduling [61,67], restructuring of the work 
force in a textile mill [71,72], ranking of efficiently ad- 
ministered university libraries by their robustness of 
data [40,72], the study of systems of differential equa- 
tions under matrix perturbations in robust analysis and 
control [27], as well as approximation theory, especially 
in the problems of best fitting to data. These problems 
are formulated as follows: given a set of points, one 
wishes to determine a function (from a prescribed class 
of functions, e. g., linear) that best approximates these 
points. After forcing the points to satisfy the function, 
the problem generally reduces to an inconsistent sys- 
tem of equations for which one determines a best ap- 
proximate solution. The solution is given in terms of 
decision variables (such as the slope of the line and its intercept with the y-axis, in the linear case of two-dimensional data in the (x, y)-plane). If the Euclidean norm is used, then the solutions are called the least squares solutions. However, the problem can also incorporate estimates of errors made in measuring data. The errors may fall within some known lower and upper bounds. One can consider the data vector as a parameter, in which case the best approximation problem assumes a more general form. It consists of finding
perturbations of data, within specified boundaries, for 
which one achieves the best fit. The problem is called 
the generalized least squares problem in the context of 
the Euclidean space. If the analytic form of the func- 
tion that approximates data is known, and if it contains 
some ‘parameters’ to be determined, then the best ap- 
proximation problem is called the parameter identifica- 
tion problem. 


Example 3 One may wish to determine the gravitational acceleration g, the initial position s_0, and the initial velocity v_0 of a falling object. It is known that the motion of the object is governed by Newton's second law, giving s = s_0 + v_0 t + g t²/2. Measuring s = s(t) at various times t, one obtains a generally inconsistent system of linear equations in the 'parameters' s_0, v_0, and g. After minimizing a norm of the residual vector one identifies the parameters. However, one may also take into consideration relative errors that have been made in reading t and s(t). The generalized parameter identification problem is to determine the best fit within the allowable errors. This approach typically gives more accurate results. In the context of parametric programming models, the optimal errors are optimal values of the parameter, while the 'parameters' to be identified, like s_0, v_0, and g, are actually decision variables.
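A minimal sketch of the classical (non-generalized) step of Example 3: an ordinary least squares fit of s_0, v_0 and g, computed from the normal equations. The measurement times, reading errors and function name are hypothetical; the generalized problem, which also optimizes over the admissible reading errors, is not shown.

```python
from typing import List, Sequence


def fit_falling_body(ts: Sequence[float], ss: Sequence[float]) -> List[float]:
    """Least squares estimate of (s0, v0, g) in s = s0 + v0*t + g*t^2/2,
    obtained from the normal equations (A^T A) p = A^T s."""
    rows = [[1.0, t, 0.5 * t * t] for t in ts]
    n = 3
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    ats = [sum(r[i] * s for r, s in zip(rows, ss)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 normal system.
    aug = [ata[i] + [ats[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Synthetic measurements (hypothetical): s0 = 2, v0 = 3, g = 9.81, with small reading errors.
ts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
noise = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02]
ss = [2.0 + 3.0 * t + 9.81 * t * t / 2 + e for t, e in zip(ts, noise)]
print(fit_falling_body(ts, ss))  # close to [2.0, 3.0, 9.81]
```

Forming A^T A squares the condition number, which is harmless for this tiny, well-scaled problem; a QR factorization would be the more robust choice for larger or badly scaled data.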


Parameter identification problems are used in many ar- 
eas. For example, in the biological sciences they are used 
in attempts to find kinetic constants which can quan- 
titatively describe certain biochemical processes. Typi- 
cally, experiments are performed with tracers (labeled 
with radioactive or stable isotopes), then experimen- 
tal tissue radioactivity curves are fitted to a biological 
model to find kinetic parameters. On the basis of these 
kinetic parameters one can calculate regional glucose 
utilization in the brain [60], serotonin synthesis [8], 
aromatic amino acid activity [18] and receptor densi- 
ties [36]. 

Optimal parameters are also important for post- 
optimality analyses of linear programs. Using PP one 
can answer basic questions like: Given an optimal solu- 
tion of an LP, for what perturbations of data does the 
solution remain optimal? The following example ex- 
poses the problem. 


Example 4 Consider the linear program in one variable
with zero-value objective function:


A (global) minimizer is $x^* = -1$. Now suppose that this
program belongs to the class of perturbed programs


when $\theta$ is fixed at $\theta^* = 0$. Then, for perturbations $\theta \neq 0$, the point $(x^*, \theta^*)$ is actually a local maximizer! In situations like these some optimal solutions of linear programs may be 'better' than others. Here, say, $x = 1$ is 'better' than $x^* = -1$, because its local optimality is not affected by perturbations of $(x, \theta)$.


The optimality-preserving problems, at the global optimality level, are solved for linear models using the saddle-point optimality conditions along $\theta$-paths and after setting the terms corresponding to the components of the $x$ variable equal to zero. Two closely related problems are:

i) given an infeasible point $x$ (e.g., a prescribed profile of production that one wishes to achieve), find perturbations of the data $\theta$ that make the point feasible; and

ii) given a feasible but nonoptimal point, find perturbations that make the point optimal.

Many results from convex parametric programming can be adjusted to work for partly convex programs (PC) and more general mathematical programs, e.g., [1,23,70]. Partly convex programs are programs which, after 'freezing' some of the coordinates in $x$, become convex programs in the remaining variables. The program from Example 4, with the objective function -x, and the constraint $-1 \le x_1 \le 1$, is a PC program (identify $x_2 = \theta$). Since every mathematical program with twice continuously differentiable functions can be formulated as a partly convex program, see [35], one can in principle study (and solve) many mathematical programs by studying PC programs and convex parametric programming models.

Numerous applications of parametric programming are mentioned in the classical texts [4,14,15,34]. Parametric programming has found many applications in discrete optimization, transportation problems (e.g., [59]), economics and finance (e.g., [11,56]), approximation, as well as in multi-objective, multilevel, stochastic and global optimization (e.g., » Parametric global optimization: Sensitivity). Some of the recent research in parametric programming has been focused on connections between polynomial complexity and perturbation theory, stability results for nonunique solutions, control, semi-infinite programs (e.g., [28]), and nonsmooth problems.


See also 


> Bounds and Solution Vector Estimates for 
Parametric NLPs 

> Dini and Hadamard Derivatives in Optimization 

> Global Optimization: Envelope Representation 

> Multiparametric Linear Programming 

> Multiparametric Mixed Integer Linear 
Programming 

> Nondifferentiable Optimization 

> Nondifferentiable Optimization: Cutting Plane Methods

> Nondifferentiable Optimization: Minimax Problems 

> Nondifferentiable Optimization: Newton Method 

> Nondifferentiable Optimization: Relaxation Methods

> Nondifferentiable Optimization: Subgradient 
Optimization Methods 

> Parametric Global Optimization: Sensitivity 

> Parametric Linear Programming: Cost Simplex 
Algorithm 

> Parametric Mixed Integer Nonlinear Optimization 

> Parametric Optimization: Embeddings, Path 
Following and Singularities 

> Selfdual Parametric Method for Linear Programs 
Relaxation methods for convex nondifferentiable optimization have their origin in relaxation methods for finding a solution to a system of linear inequalities, see [1] and [6]. In mathematical terms, one wants to find a vector $x \in \mathbb{R}^n$ such that $a_i^T x - b_i \le 0$ for $i = 1, \dots, m$. In the simplest form, such a relaxation method iterates an initial guess $x_0 \in \mathbb{R}^n$ with the iteration scheme

$$x_{k+1} = x_k - \gamma_k \frac{a_{i_k}^T x_k - b_{i_k}}{\|a_{i_k}\|_2^2}\, a_{i_k}, \qquad (1)$$

where $i_k \in \operatorname{Argmax}_{i=1,\dots,m} \{a_i^T x_k - b_i\}$, $\gamma_k \in [\delta, 2 - \delta]$ and $\delta \in (0, 1)$. The iterations stop once a feasible solution is found. With $\gamma_k = 1$ the iteration formula (1) corresponds to a projection of $x_k$ onto the hyperplane of the most violated inequality at $x_k$.
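Iteration (1) can be sketched as follows, assuming the rows of a NumPy array `A` hold the vectors $a_i^T$ and that a small tolerance `tol` replaces the exact feasibility test; `relaxation_method` is a hypothetical name.

```python
import numpy as np

def relaxation_method(A, b, x0, gamma=1.0, max_iter=1000, tol=1e-12):
    """Iteration (1) for the system A x <= b (rows of A are the a_i^T)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        violation = A @ x - b            # a_i^T x_k - b_i for all i
        i = int(np.argmax(violation))    # index i_k of the most violated inequality
        if violation[i] <= tol:          # x_k is feasible (up to tol): stop
            break
        a = A[i]
        x = x - gamma * violation[i] / (a @ a) * a
    return x

# Example: x_1 + x_2 <= 1, x_1 >= 0, x_2 >= 0, started from an infeasible point.
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 0.0])
x = relaxation_method(A, b, [3.0, 2.0])   # one projection gives [1.0, 0.0]
```

With `gamma` strictly between 0 and 2 but different from 1, the iterates may approach the feasible set only asymptotically, which is why the sketch caps the number of iterations.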

The problem of finding a solution to an inequality system can be cast into a convex nondifferentiable optimization problem with known optimal value, namely

$$\min_x f(x) = \max\Big\{\max_{i=1,\dots,m} \{a_i^T x - b_i\},\ 0\Big\},$$

which has optimal solution value equal to zero if the system is consistent.

The iteration scheme (1) can be generalized to convex nondifferentiable minimization with known optimal value in the following manner. Suppose that we want to find $x \in \mathbb{R}^n$ such that $f(x) \le f_{\mathrm{lev}}$, where $f_{\mathrm{lev}} = f^* = \inf f(x)$ is known. Let $M_f$ denote the set of affine functions minorizing $f$,

$$M_f = \{(a, b) : a^T y - b \le f(y) \text{ for all } y \in \mathbb{R}^n\}.$$
Assume, as is customary in nondifferentiable optimization, that at any $x \in \mathbb{R}^n$ we can evaluate $f(x)$ and one arbitrarily chosen subgradient

$$g \in \partial f(x) = \{g : f(y) \ge f(x) + g^T (y - x) \text{ for all } y \in \mathbb{R}^n\}.$$

Then $(g, g^T x - f(x)) \in M_f$ and, by definition of $M_f$,

$$f(x) \ge \sup_{(a,b) \in M_f} \{a^T x - b\} \ge g^T x - (g^T x - f(x)) = f(x).$$

Hence $f(x) = \sup_{(a,b) \in M_f} \{a^T x - b\}$.

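Thus, finding an $x$ such that $f(x) \le f_{\mathrm{lev}}$ can be seen as solving an infinite system of inequalities, $a^T x - b - f_{\mathrm{lev}} \le 0$ for all $(a, b) \in M_f$. The analog of (1) then becomes

$$x_{k+1} = x_k - \gamma_k \frac{f(x_k) - f_{\mathrm{lev}}}{\|g_k\|_2^2}\, g_k, \qquad (2)$$

where $g_k \in \partial f(x_k)$. It can be shown that if $X^* = \{x : f(x) \le f_{\mathrm{lev}}\} \neq \emptyset$, then $x_k \to x_{\mathrm{lev}} \in X^*$.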
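Iteration (2) needs only $f(x_k)$ and one subgradient per step. The sketch below assumes the optimal value is known; the test function $f(x) = |x_1| + 2|x_2|$ (with $f^* = 0$), the tolerance and the name `polyak_level_iteration` are illustrative assumptions.

```python
import numpy as np

def polyak_level_iteration(f, subgrad, x0, f_lev, gamma=1.0, max_iter=500, tol=1e-10):
    """Iteration (2): x_{k+1} = x_k - gamma*(f(x_k) - f_lev)/||g_k||^2 * g_k."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        fx = f(x)
        if fx - f_lev <= tol:          # reached the level set {x : f(x) <= f_lev}
            break
        g = subgrad(x)
        gg = g @ g
        if gg == 0.0:                  # x minimizes f although f(x) > f_lev
            break
        x = x - gamma * (fx - f_lev) / gg * g
    return x

# Example with known optimal value f* = 0 (np.sign(0) = 0 is a valid choice).
f = lambda x: abs(x[0]) + 2 * abs(x[1])
subgrad = lambda x: np.array([np.sign(x[0]), 2 * np.sign(x[1])])
x = polyak_level_iteration(f, subgrad, [3.0, -1.0], f_lev=0.0)
```

On this example the function value shrinks by a constant factor per step, an instance of the linear rate mentioned below.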

This steplength rule has often been called the Polyak II rule in the literature, perhaps because it appears as the second numbered equation in B.T. Polyak's classical paper [8], where its convergence properties are studied. It is shown there that, under certain mild conditions on $f$, the iterates converge linearly to an optimal solution, should one exist.

The step (2) corresponds to a projection onto a hyperplane. With $\gamma_k = 1$, $x_{k+1}$ solves the quadratic programming problem

$$\min_x \|x - x_k\|_2^2 \quad \text{s.t.}\quad f_{\mathrm{lev}} = f(x_k) + g_k^T (x - x_k). \qquad (3)$$

One may in this framework add previously generated subgradients in order to obtain faster convergence, i.e. let $x_{k+1}$ solve

$$\min_x \|x - x_k\|_2^2 \quad \text{s.t.}\quad f_{\mathrm{lev}} = f(x_j) + g_j^T (x - x_j),\ j \in J^k, \qquad (4)$$

where $J^k$ is a subset of $\{1, \dots, k\}$ including $k$.


Constraints on the Variables 


The classical way to handle constraints on the variables, i.e. when $x$ has to be in some closed and convex set $X$ as well as in the set $\{x : f(x) \le f_{\mathrm{lev}}\}$, is to let

$$x_{k+1} = P_X\Big(x_k - \gamma_k \frac{f(x_k) - f_{\mathrm{lev}}}{\|g_k\|_2^2}\, g_k\Big), \qquad (5)$$

where $P_X(y) = \operatorname{argmin}_{x \in X} \|y - x\|_2$ denotes the projection of $y$ onto the feasible set $X$.

However, it is often better to let $x_{k+1}$ be the projection of $x_k$ onto the set $X \cap \{x : \gamma_k (f_{\mathrm{lev}} - f(x_k)) = g_k^T (x - x_k)\}$, i.e. for $\gamma_k = 1$ to add $x \in X$ to the constraint set of (3). If $X$ consists of simple bounds or is the unit simplex, then it can be shown that this latter way results in a new iteration point which is closer to the desired set $\{x \in X : f(x) \le f_{\mathrm{lev}}\}$ than does (5) (see [2]). In these two cases the computational burden of solving this one projection problem need not be larger than that of solving the two very simple projections involved in (5).
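Scheme (5) is easiest to see when $X$ is a box $\{x : l \le x \le u\}$, where $P_X$ is componentwise clipping. A minimal sketch under that assumption; the test function, the box $[0,1]^2$ and the name `projected_polyak` are hypothetical. The improved variant (projecting onto $X$ intersected with the hyperplane) is not shown, since it needs a dedicated projection routine.

```python
import numpy as np

def projected_polyak(f, subgrad, x0, f_lev, lower, upper, gamma=1.0, max_iter=1000, tol=1e-10):
    """Scheme (5) with X = {lower <= x <= upper}; P_X is componentwise clipping."""
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(max_iter):
        fx = f(x)
        if fx - f_lev <= tol:
            break
        g = subgrad(x)
        gg = g @ g
        if gg == 0.0:
            break
        # Relaxation step (2), followed by the projection P_X.
        x = np.clip(x - gamma * (fx - f_lev) / gg * g, lower, upper)
    return x

# min |x_1 - 2| + |x_2 + 1| over [0, 1]^2: optimal value 2 at (1, 0).
f = lambda x: abs(x[0] - 2) + abs(x[1] + 1)
subgrad = lambda x: np.array([np.sign(x[0] - 2), np.sign(x[1] + 1)])
x = projected_polyak(f, subgrad, [0.0, 0.5], f_lev=2.0, lower=0.0, upper=1.0)
```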


Finding a Minimum Point 


Suppose one wants to find a minimum point $x^*$ of a convex function $f$. If the optimal value $f^*$ is known a priori, the algorithms mentioned above can be used successfully with $f_{\mathrm{lev}} = f^*$. Usually, however, $f^*$ is not known in advance. Nevertheless, when solving the dual problem in applications of Lagrangian relaxation to obtain upper bounds on a primal maximization problem, a (good) lower bound $f_{\mathrm{low}}$ on $f^*$ is often known from a primal feasible solution. In these applications, iteration schemes such as the ones above can be applied heuristically, and often successfully, by replacing $f_{\mathrm{lev}}$ in (2) by $f_{\mathrm{low}}$.

To be specific, one such heuristic goes as follows. Suppose that at iteration $k$ a lower bound $f_{\mathrm{low}}^k$ on $f^*$ is known, which is used in place of $f_{\mathrm{lev}}$ in (2). Initially $\gamma_0 = 2$, and at iteration $k$, $\gamma_k$ is reduced by a constant factor $\alpha \in (0, 1)$ if there has been no decrease in the function value during the last $K_{\max}$ iterations (which could indicate that too large steps are taken). Here $\alpha$ and $K_{\max}$ are parameters which need to be chosen appropriately for the application in question. Typical values are $\alpha = 0.5$ and $K_{\max} = 5$.
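The heuristic can be sketched as below. It assumes that "no decrease" is measured against the best function value found so far, and that the best point is returned after a fixed iteration budget; both choices, the test function and the name `polyak_with_lower_bound` are assumptions for illustration.

```python
import numpy as np

def polyak_with_lower_bound(f, subgrad, x0, f_low, alpha=0.5, k_max=5, max_iter=300):
    """Step (2) with f_lev replaced by a lower bound f_low on f*; gamma starts at 2
    and is multiplied by alpha after k_max iterations without a new best value
    (assumed interpretation of 'no decrease')."""
    x = np.asarray(x0, dtype=float).copy()
    gamma = 2.0
    best_x, best_f = x.copy(), np.inf
    stall = 0                                   # iterations without improvement
    for _ in range(max_iter):
        fx = f(x)
        if fx < best_f:
            best_x, best_f, stall = x.copy(), fx, 0
        else:
            stall += 1
            if stall >= k_max:                  # steps seem too long: shrink gamma
                gamma *= alpha
                stall = 0
        g = subgrad(x)
        gg = g @ g
        if gg == 0.0:                           # x is a minimizer of f
            return x, fx
        x = x - gamma * (fx - f_low) / gg * g
    return best_x, best_f

f = lambda x: abs(x[0]) + 2 * abs(x[1])         # f* = 0, treated as unknown
subgrad = lambda x: np.array([np.sign(x[0]), 2 * np.sign(x[1])])
x_best, f_best = polyak_with_lower_bound(f, subgrad, [3.0, -1.0], f_low=-1.0)
```

Because $f_{\mathrm{low}} < f^*$, the target is never reached, so the method is stopped by the iteration budget rather than by a convergence test.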
0 (Initialization)
Select a point $x_1 \in \mathbb{R}^n$, $\delta_1 > 0$ and $\sigma_{\max} > 0$.
Set $\sigma_1 = 0$ and $f_{\mathrm{rec}}^0 = \infty$. Set $k = 1$, $l = 1$ and $k(1) = 1$.

1 (Function call)
Calculate $f(x_k)$ and $g_k \in \partial f(x_k)$.
IF $\|g_k\| = 0$ THEN terminate ($x_k \in \operatorname{Argmin} f$).
Set $f_{\mathrm{rec}}^k = \min\{f(x_k), f_{\mathrm{rec}}^{k-1}\}$ and update $x_{\mathrm{rec}}^k$ correspondingly.

2 (Sufficient descent?)
If $f(x_k) \le f_{\mathrm{rec}}^{k(l)} - \delta_l/2$, set $k(l+1) = k$, $\sigma_k = 0$, $\delta_{l+1} = \delta_l$, $l = l + 1$ and skip the next step.

3 (Oscillation?)
If $\sigma_k > \sigma_{\max}$, set $k(l+1) = k$, $\sigma_k = 0$, $\delta_{l+1} = \delta_l/2$, $l = l + 1$, and replace $x_k$ by $x_{\mathrm{rec}}^k$ and $g_k$ correspondingly.

4 (New point)
Set $f_{\mathrm{lev}}^k = f_{\mathrm{rec}}^{k(l)} - \delta_l$ and
$$x_{k+1} = x_k - \frac{f(x_k) - f_{\mathrm{lev}}^k}{\|g_k\|^2}\, g_k.$$

5 (Path update)
Set $\sigma_{k+1} = \sigma_k + \|x_{k+1} - x_k\|$. Set $k = k + 1$ and return to Step 1.


A simple convergent method, [2] and [4], based on (2) for minimizing $f$ with no assumption on prior knowledge of the optimal value is motivated as follows. It is a fact that if the set $\{x \in \mathbb{R}^n : f(x) \le f_{\mathrm{lev}}\}$ is empty, then the iteration scheme (2) generates a path whose length $\sum_{k=1}^{\infty} \|x_{k+1} - x_k\|$ is unbounded, and if the set is nonempty, then the path length is bounded. If the former seems to be occurring, then $f_{\mathrm{lev}}$ should be increased, and if the algorithm seems to be converging to $f_{\mathrm{lev}}$, then a lower $f_{\mathrm{lev}}$ could be used. The mechanics of the algorithm should be clear from the following description. Let $f_{\mathrm{rec}}^k = \min_{j \le k} f(x_j)$ denote the best function value found up to iteration $k$ and let $x_{\mathrm{rec}}^k$ be the point at which this occurs. Let $f_{\mathrm{lev}}^k$ denote the 'level' which we are aiming for. The number $k(l)$ denotes the iteration of the $l$th change of $f_{\mathrm{lev}}^k$. The number $\sigma_k$ is the length of the path since the last update of $f_{\mathrm{lev}}$.

Note that the level $f_{\mathrm{lev}}^k$ remains fixed between the iterations $k(l)$ and $k(l+1) - 1$. For this algorithm it is possible to derive so-called efficiency estimates. In particular, it can be shown that for 'small' $\varepsilon > 0$ the algorithm produces $f_{\mathrm{rec}}^k - f^* < \varepsilon$ for $k > K/\varepsilon^2$, where $K$ is a positive constant.
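Steps 0 to 5 can be transcribed as the following sketch. The default values of `delta1` and `sigma_max`, the iteration budget and the test function are assumptions; `path_level_method` is a hypothetical name, and `f_ref` stands for $f_{\mathrm{rec}}^{k(l)}$.

```python
import numpy as np

def path_level_method(f, subgrad, x1, delta1=1.0, sigma_max=10.0, max_iter=2000):
    """Sketch of Steps 0-5: aim at f_lev = f_rec^{k(l)} - delta_l; halve delta_l when
    the path length since the last level change exceeds sigma_max."""
    x = np.asarray(x1, dtype=float).copy()
    delta, sigma = float(delta1), 0.0       # Step 0: delta_1 and sigma_1 = 0
    f_rec, x_rec = f(x), x.copy()           # record value f_rec^k and point x_rec^k
    f_ref = f_rec                           # f_rec^{k(l)}: record when level l started
    for _ in range(max_iter):
        fx, g = f(x), subgrad(x)            # Step 1: function call
        if not np.any(g):                   # zero subgradient: x minimizes f
            return x, fx
        if fx < f_rec:
            f_rec, x_rec = fx, x.copy()
        if fx <= f_ref - delta / 2:         # Step 2: sufficient descent, keep delta
            f_ref, sigma = f_rec, 0.0
        elif sigma > sigma_max:             # Step 3: oscillation, halve delta
            f_ref, sigma = f_rec, 0.0       # and restart from the record point
            delta /= 2
            x, fx, g = x_rec.copy(), f_rec, subgrad(x_rec)
        f_lev = f_ref - delta               # Step 4: aim at the current level
        x_new = x - (fx - f_lev) / (g @ g) * g
        sigma += np.linalg.norm(x_new - x)  # Step 5: path update
        x = x_new
    return x_rec, f_rec

f = lambda x: abs(x[0]) + 2 * abs(x[1])
subgrad = lambda x: np.array([np.sign(x[0]), 2 * np.sign(x[1])])
x_best, f_best = path_level_method(f, subgrad, [3.0, -1.0])
```

No knowledge of $f^*$ enters the code: the level is driven only by the record value, $\delta_l$ and the path length.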

Of course, one may in Step 4 use (4) instead of (2) to obtain a new iteration point. Moreover, when using (4) in Step 4, more sophisticated schemes to adjust the aiming level $f_{\mathrm{lev}}^k$ are possible. A scheme suggested in [3] uses ideas from so-called proximal point bundle methods, in which sufficient descent in terms of function values is enforced by means of so-called null steps. Another method, by C. Lemaréchal, A.S. Nemirovsky and Yu.E. Nesterov [5], for the case when $x$ is constrained to a compact set $X$, uses $f_{\mathrm{lev}}^k = \alpha f_{\mathrm{rec}}^k + (1 - \alpha) f_{\mathrm{low}}^k$, where $f_{\mathrm{low}}^k$ is the best lower bound possible, that is

$$f_{\mathrm{low}}^k = \min_{x \in X} \max_{j \le k} \{f(x_j) + g_j^T (x - x_j)\}.$$

This algorithm has a complexity estimate given by $f_{\mathrm{rec}}^k - f^* < \varepsilon$ for $k > K/\varepsilon^2$, where $K$ is a positive constant. Such a complexity estimate is optimal in the sense that it cannot be improved uniformly with respect to the dimension by more than an absolute constant factor (for details, see [7]).
Subgradient methods for minimization of a convex function $f: \mathbb{R}^n \to \mathbb{R}$ over a closed convex set $X$ have proven to be an efficient means to solve large-scale optimization problems. In particular, this is the case when $X$ is a simple set, such as $\mathbb{R}^n$ or the positive orthant, and when high accuracy of the solution is not required, e.g. in the context of Lagrangian relaxation of integer programming problems (cf. » Integer programming: Lagrangian relaxation).

A subgradient at a point $x \in \mathbb{R}^n$ of a convex function $f$ is a vector $g \in \mathbb{R}^n$ satisfying $f(y) \ge f(x) + g^T (y - x)$ for all $y \in \mathbb{R}^n$. The set of subgradients at a point $x$ is known as the subdifferential $\partial f(x)$. At points where the function is differentiable, the gradient is the sole member of the subdifferential.

Gradient-based iterative methods, e.g. the steepest descent method designed for minimization of smooth functions, fail when one attempts to use their analogs, i.e. replacing gradients by subgradients, for minimization of convex nonsmooth functions. This is because the negative subgradient may not be a descent direction, and even if it were at all points generated along the path, the sequence of iterates would not necessarily minimize $f$.

The basic subgradient algorithm takes the following form:

0 (Initialize)
Choose a point $x_0 \in \mathbb{R}^n$ and set $k = 0$.
1 (Function call)
Calculate $g_k \in \partial f(x_k)$.
IF $\|g_k\| = 0$ THEN terminate since $x_k \in \operatorname{Argmin} f$.
2 (New point)
Set
$$x_{k+1/2} = x_k - t_k \frac{g_k}{\|g_k\|}$$
and
$$x_{k+1} = \operatorname{argmin}_{y \in X} \|y - x_{k+1/2}\|,$$
replace $k$ by $k + 1$. Return to Step 1.

It can be shown, see [4], that if

$$t_k \to 0 \quad \text{and} \quad \sum_{k=0}^{\infty} t_k = \infty, \qquad (1)$$

then $\liminf f(x_k) = \inf_{x \in X} f(x) = f^*$ and, furthermore, if also

$$\sum_{k=0}^{\infty} t_k^2 < \infty,$$

then $x_k$ converges to a minimum point, if there is one (see [2]).
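The basic algorithm with the divergent series rule (1) can be sketched as follows, assuming $t_k = 1/(k+1)$ (which also satisfies $\sum t_k^2 < \infty$) and $X$ the nonnegative orthant. Because the iterates are not monotone in $f$, the sketch returns the best point found; the test problem and the name `projected_subgradient` are illustrative.

```python
import numpy as np

def projected_subgradient(f, subgrad, project, x0, max_iter=5000):
    """Basic subgradient algorithm with normalized steps t_k g_k/||g_k||,
    t_k = 1/(k+1), followed by projection onto X."""
    x = project(np.asarray(x0, dtype=float))
    best_x, best_f = x.copy(), f(x)
    for k in range(max_iter):
        g = subgrad(x)
        norm_g = np.linalg.norm(g)
        if norm_g == 0.0:                    # Step 1 stopping test
            return x, f(x)
        t = 1.0 / (k + 1)                    # t_k -> 0, sum t_k = inf, sum t_k^2 < inf
        x_half = x - t * g / norm_g          # x_{k+1/2}
        x = project(x_half)                  # x_{k+1} = argmin_{y in X} ||y - x_{k+1/2}||
        fx = f(x)
        if fx < best_f:                      # keep the best iterate seen so far
            best_x, best_f = x.copy(), fx
    return best_x, best_f

# Minimize |x_1 - 1| + |x_2 + 2| over x >= 0 (f* = 2 at (1, 0)).
f = lambda x: abs(x[0] - 1) + abs(x[1] + 2)
subgrad = lambda x: np.array([np.sign(x[0] - 1), np.sign(x[1] + 2)])
project = lambda y: np.maximum(y, 0.0)
x_best, f_best = projected_subgradient(f, subgrad, project, [5.0, 5.0])
```

Thousands of iterations are needed here to get within about $10^{-3}$ of $f^*$, which illustrates the slow rate discussed next.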

As can be expected, the rate of convergence of the above algorithm with the divergent series rule (1) is very slow. In fact, it can be shown that the algorithm cannot have a so-called geometric convergence rate. An algorithm is said to have a geometric convergence rate, or R-linear convergence rate, if for any convex function and any starting point there exist $M$ and $q \in (0, 1)$ such that $\|x_k - x^*\| \le M q^k$, where $x^*$ is an optimal point. J.-L. Goffin [3] has shown that it is possible to obtain a geometric convergence rate with the geometric series rule

$$t_k = M \rho^k,$$

if $M$ and $\rho \in (0, 1)$ are chosen large enough. In practice, though, for a particular $f$ and starting point $x_0$ it is impossible to know what sufficiently large is.

Another often used steplength rule is the so-called relaxation rule, or Polyak II rule, where $t_k$ is chosen according to

$$t_k = \gamma_k \frac{f(x_k) - f_{\mathrm{lev}}^k}{\|g_k\|},$$

where $f_{\mathrm{lev}}^k$ is an estimate of the minimum value $f^*$, $\gamma_k \in [\delta, 2 - \delta]$, and $\delta \in (0, 1)$. See also » Nondifferentiable optimization: Relaxation methods.

Pure subgradient methods have a tendency to zig-zag, i.e. a step in the direction $-g_k$ tends to be followed by a step which is almost parallel with $g_k$. Several solutions to overcome this behavior have been proposed. One such solution is the concept of space dilation of N.Z. Shor (see [5]). This approach is related to quasi-Newton methods for differentiable optimization. Another solution to the zig-zagging problem is the method proposed in [1], in which a step is taken in a direction which is a sum of the previous direction and the current subgradient. This approach is analogous to conjugate gradient methods for differentiable optimization.
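A direction of the form $d_k = g_k + \beta_k d_{k-1}$ can be sketched as below. The weight rule (add a multiple of $d_{k-1}$ only when $g_k$ points back against it, with factor `tau = 1.5`) and the Polyak-type step with a known $f^*$ are assumptions chosen for illustration, not necessarily the rule used in [1]; `direction_sum_subgradient` is a hypothetical name.

```python
import numpy as np

def direction_sum_subgradient(f, subgrad, x0, f_star, tau=1.5, max_iter=200):
    """Steps along d_k = g_k + beta_k d_{k-1}. Assumed weight rule:
    beta_k = -tau * g_k^T d_{k-1} / ||d_{k-1}||^2 if g_k^T d_{k-1} < 0, else 0.
    Assumed step: Polyak-type, using a known optimal value f_star."""
    x = np.asarray(x0, dtype=float).copy()
    d_prev = None
    best_x, best_f = x.copy(), f(x)
    for _ in range(max_iter):
        fx, g = f(x), subgrad(x)
        if fx - f_star <= 1e-12 or not np.any(g):
            break
        d = g.copy()
        if d_prev is not None and g @ d_prev < 0:   # zig-zag: g_k opposes d_{k-1}
            d = g - tau * (g @ d_prev) / (d_prev @ d_prev) * d_prev
        x = x - (fx - f_star) / (d @ d) * d
        d_prev = d
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x.copy(), fx
    return best_x, best_f

f = lambda x: abs(x[0]) + 2 * abs(x[1])
subgrad = lambda x: np.array([np.sign(x[0]), 2 * np.sign(x[1])])
x_best, f_best = direction_sum_subgradient(f, subgrad, [3.0, -1.0], f_star=0.0)
```

Adding part of the previous direction damps the back-and-forth component of consecutive subgradients, much as conjugate gradient directions do for smooth functions.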
Nonlinear least squares problems can be phrased in terms of minimizing a real-valued function that is a sum of squares of nonlinear functions of several variables. Efficient solution of unconstrained nonlinear least squares problems is important: although problems that arise in practical areas usually have constraints placed upon the variables, and special techniques are required to handle these constraints, the numerical techniques used eventually rely upon the efficient solution of unconstrained nonlinear least squares problems.

The unconstrained nonlinear least squares problems have the form

$$\min_{x \in \mathbb{R}^n} f(x) = \frac{1}{2} \sum_{i=1}^{m} r_i(x)^2 = \frac{1}{2} r(x)^T r(x),$$

where $r(x) = (r_1(x), \dots, r_m(x))^T$ and $r_i(x)$, $i = 1, \dots, m$, are nonlinear functions of $x \in \mathbb{R}^n$. When $r(x)$, and hence $f(x)$, is twice continuously differentiable, the gradient and the Hessian matrix of $f(x)$ are given by

$$g(x) = \nabla f(x) = A(x) r(x),$$
$$G(x) = \nabla^2 f(x) = A(x) A(x)^T + \sum_{i=1}^{m} r_i(x) \nabla^2 r_i(x),$$

where $A(x) = [\nabla r_1(x), \dots, \nabla r_m(x)]$. The special structures of these derivatives have been exploited in developing effective solution methods for nonlinear least squares.
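The formulas above can be checked numerically. A minimal sketch, assuming a hypothetical exponential model $y \approx x_1 e^{x_2 t}$ and made-up data; note that $A(x)$ is $n \times m$, the transpose of the usual Jacobian.

```python
import numpy as np

# Hypothetical model y ~ x_1 * exp(x_2 * t) fitted to data (t_i, y_i), i = 1..m.
def residuals(x, t, y):
    return x[0] * np.exp(x[1] * t) - y

def A_matrix(x, t):
    """A(x) = [grad r_1(x), ..., grad r_m(x)], an n x m matrix (transposed Jacobian)."""
    e = np.exp(x[1] * t)
    return np.vstack([e, x[0] * t * e])

def gradient_and_gn_part(x, t, y):
    r = residuals(x, t, y)
    A = A_matrix(x, t)
    g = A @ r        # g(x) = A(x) r(x)
    M = A @ A.T      # first-order part A(x) A(x)^T of the Hessian G(x)
    return g, M

t = np.linspace(0.0, 1.0, 6)
y = np.array([2.1, 1.6, 1.35, 1.1, 0.9, 0.7])   # illustrative data
x = np.array([1.5, -0.5])
g, M = gradient_and_gn_part(x, t, y)
```

Only first derivatives of the $r_i$ are needed for $g$ and $M$; the remaining term $\sum_i r_i \nabla^2 r_i$ is what the methods below either neglect or approximate.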

As for a general unconstrained minimization problem, the Newton method plays a central role in the development of numerical methods for nonlinear least squares. Most commonly used nonlinear least squares methods can be viewed as variations on Newton's method. The Newton method for general optimization is derived from the quadratic model

$$q_k(\delta) = f_k + g_k^T \delta + \tfrac{1}{2} \delta^T G_k \delta,$$

where $f_k = f(x^{(k)})$, $g_k = g(x^{(k)})$, $G_k = G(x^{(k)})$, $\delta = x - x^{(k)}$, and $x^{(k)}$ is an approximation to a local minimizer $x^*$ of the objective function $f(x)$. $q_k(\delta)$ is a local approximation to $f(x)$ at $x^{(k)}$ obtained from the truncated Taylor expansion. If this approximation is appropriate, then a presumably better approximation $x^{(k+1)} = x^{(k)} + \delta^{(k)}$ can be obtained by requiring that the step $\delta^{(k)}$ be a minimizer of $q_k(\delta)$. Thus the Newton method takes an initial approximation $x^{(1)}$ to $x^*$ and attempts successively to improve the approximation through the iteration

• Solve the system $G_k \delta = -g_k$ for $\delta = \delta^{(k)}$,

• Set $x^{(k+1)} = x^{(k)} + \delta^{(k)}$.

If $G_k = M_k + C_k$, with $M_k = A_k A_k^T$, $A_k = A(x^{(k)})$ and $C_k = C(x^{(k)}) = \sum_{i=1}^{m} r_i(x^{(k)}) \nabla^2 r_i(x^{(k)})$, is positive definite, the solution $\delta^{(k)}$ of the system is the global minimizer of $q_k(\delta)$, and if the starting point $x^{(1)}$ is sufficiently close to $x^*$, at which $g(x^*) = 0$ and $G(x^*)$ is positive definite, the Newton method is well defined and converges at a quadratic rate.

Unfortunately, the basic Newton method as it stands is not suitable for general-purpose use, since $G_k$ may not be positive definite when $x^{(k)}$ is remote from $x^*$, and even if $G_k$ is positive definite, convergence may not occur since $\{f_k\}$ may not decrease. Though both possibilities can be eliminated by incorporating either a trust region technique for the former case or a line search technique for the latter case, the main disadvantage of the Newton method is the demand for evaluation of second-order derivatives of the problem functions.

Since $r(x)$ is being minimized in the least squares sense, it may be the case that $r_i(x^*)$, $i = 1, \dots, m$, are zero or very small. Thus when $x^{(k)}$ is close to $x^*$, the second part $C_k$ of $G_k$ may be negligible compared with $M_k$. This suggests that $M_k$ is a good approximation to $G_k$ and gives the well-known full-step Gauss–Newton method

• Solve the system $M_k \delta = -A_k r_k$ for $\delta = \delta^{(k)}$,

• Set $x^{(k+1)} = x^{(k)} + \delta^{(k)}$.

An important feature of the Gauss–Newton method is that the approximation to the Hessian $G_k$ is obtained directly from the first-order derivatives of the problem functions alone. The approximation is exact when the functions $r_i(x)$, $i = 1, \dots, m$, are all either linear or zero. Since the Gauss–Newton method is obtained from the Newton method by neglecting the part $C_k$ of $G_k$, its convergence properties depend greatly on the size of the omitted part. If $C(x^*)$ is large relative to $M(x^*)$ in the sense that $\eta(x^*) = \|M(x^*)^{-1} C(x^*)\| > 1$, a quantity regarded as a combined relative measure of the nonlinearity and the residual size of the problem (assuming $A(x^*)$ has full rank), then the method does not converge. If $\eta(x^*) < 1$ and the initial point $x^{(1)}$ is close enough to $x^*$, the Gauss–Newton method converges. If convergence occurs, the rate of convergence also depends upon the size of $\eta(x^*)$. If $C(x^*) = 0$, the method is rather satisfactory in the sense that it converges quadratically for zero residual problems; if $C(x^*) \neq 0$, the convergence rate is linear, and the speed of convergence decreases as the relative nonlinearity or the residual size increases.
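A minimal sketch of the full-step Gauss–Newton iteration on a zero-residual problem. It assumes the callable `jac` returns the $m \times n$ Jacobian $J = A(x)^T$, so that $M_k \delta = -A_k r_k$ is the normal equation of $\min_\delta \|J\delta + r\|$; that least squares problem is solved directly. The data and the name `gauss_newton` are illustrative.

```python
import numpy as np

def gauss_newton(residuals, jac, x0, max_iter=50, tol=1e-12):
    """Full-step Gauss-Newton: delta = argmin ||J delta + r||, x <- x + delta."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        r, J = residuals(x), jac(x)
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + delta                      # x^{(k+1)} = x^{(k)} + delta^{(k)}
        if np.linalg.norm(delta) <= tol * (1.0 + np.linalg.norm(x)):
            break
    return x

# Zero-residual example: data generated exactly by y = 2 exp(-t).
t = np.linspace(0.0, 1.0, 6)
y = 2.0 * np.exp(-t)
res = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
x_gn = gauss_newton(res, jac, [1.8, -0.8])
```

Since $C(x^*) = 0$ here, the iteration converges quadratically from this nearby starting point, in line with the discussion above.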

Since the matrix $M_k$ is at worst positive semidefinite, the solution of the Gauss–Newton system, denoted now by $s^{(k)}$, is not uphill for $f(x)$. A possible modification to force the Gauss–Newton method to converge is to incorporate line search techniques. If the matrix $A_k$ has full rank, then $M_k$ is positive definite and the solution $s^{(k)}$ of the Gauss–Newton system is a descent direction of $f(x)$ at $x^{(k)}$. A line search along the direction $s^{(k)}$ determines a steplength $\alpha_k$ such that $f(x^{(k)} + \alpha_k s^{(k)}) < f(x^{(k)})$, and a new approximation to $x^*$ is then obtained as $x^{(k+1)} = x^{(k)} + \alpha_k s^{(k)}$. This modification was first suggested by H.O. Hartley [7] and is generally referred to as the damped Gauss–Newton method. This method does prevent divergence and usually has a larger domain of convergence than the full-step Gauss–Newton method. In fact, convergence follows if the condition number $\kappa(A_k A_k^T)$ is uniformly bounded above. Since $A_k$ will usually be bounded above, this essentially requires that $A_k$ does not lose rank in the limit. Unfortunately this can happen [11], and convergence to a noncritical point can occur. Also, when $A_k$ is rank deficient, the solution $s^{(k)}$ of the Gauss–Newton system becomes numerically orthogonal to $g_k$ at some distance from a local minimizer $x^*$, and no further progress can be made by line searches.
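The damped variant only adds a line search along $s^{(k)}$. A minimal sketch, assuming a backtracking (Armijo) rule with halving, which is one common choice and not necessarily Hartley's; `jac` again returns $J = A(x)^T$, and the example starts farther away than the full-step sketch would safely allow.

```python
import numpy as np

def damped_gauss_newton(residuals, jac, x0, max_iter=100, gtol=1e-10, c=1e-4):
    """Gauss-Newton direction s = argmin ||J s + r|| combined with a backtracking
    (Armijo) line search on f(x) = 0.5 r(x)^T r(x)."""
    x = np.asarray(x0, dtype=float).copy()
    f = lambda z: 0.5 * residuals(z) @ residuals(z)
    for _ in range(max_iter):
        r, J = residuals(x), jac(x)
        g = J.T @ r                                  # gradient A_k r_k
        if np.linalg.norm(g) <= gtol:
            break
        s, *_ = np.linalg.lstsq(J, -r, rcond=None)   # Gauss-Newton direction s^{(k)}
        alpha, fx = 1.0, 0.5 * r @ r
        # Halve alpha until f decreases sufficiently (g^T s < 0 if J has full rank).
        while f(x + alpha * s) > fx + c * alpha * (g @ s) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * s
    return x

t = np.linspace(0.0, 1.0, 6)
y = 2.0 * np.exp(-t)
res = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
x_dgn = damped_gauss_newton(res, jac, [1.0, 0.0])
```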

A further modification to the full-step Gauss–Newton method is to incorporate the trust region technique. In this modification, the trust region subproblem

$$\min_{\delta}\ q_k(\delta) = f_k + r_k^T A_k^T \delta + \tfrac{1}{2} \delta^T M_k \delta \quad \text{s.t.}\quad \|\delta\| \le \Delta_k$$

is solved for $\delta^{(k)}$ with a properly adjusted radius $\Delta_k$, and the new point is obtained as $x^{(k+1)} = x^{(k)} + \delta^{(k)}$. The trust region radius is adjusted in such a way that the model function $q_k(\delta)$ is believed to approximate the function $f(x)$ adequately in the region $\|\delta\| \le \Delta_k$. This modification, in a different form, was first suggested by K. Levenberg [8] and D.W. Marquardt [9]. J.J. Moré used the modification in the above form [10].
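The sketch below uses the damping-parameter form of the Levenberg–Marquardt method, not the explicit-radius form above: it solves $(M_k + \lambda I)\delta = -A_k r_k$ and adapts $\lambda$ by factors of 10, which plays the role of adjusting $\Delta_k$. The update factors, the bounds on $\lambda$ and the Rosenbrock test residuals are assumptions.

```python
import numpy as np

def levenberg_marquardt(residuals, jac, x0, lam=1e-2, max_iter=200, gtol=1e-10):
    """Damping-parameter Levenberg-Marquardt: (J^T J + lam I) delta = -J^T r,
    solved as an augmented least squares problem; lam adapted by factors of 10."""
    x = np.asarray(x0, dtype=float).copy()
    r = residuals(x)
    fx = 0.5 * r @ r
    for _ in range(max_iter):
        J = jac(x)
        g = J.T @ r
        if np.linalg.norm(g) <= gtol or lam > 1e16:
            break
        n = x.size
        # (J^T J + lam I) delta = -J^T r  <=>  min || [J; sqrt(lam) I] delta + [r; 0] ||
        J_aug = np.vstack([J, np.sqrt(lam) * np.eye(n)])
        r_aug = np.concatenate([r, np.zeros(n)])
        delta, *_ = np.linalg.lstsq(J_aug, -r_aug, rcond=None)
        r_new = residuals(x + delta)
        f_new = 0.5 * r_new @ r_new
        if f_new < fx:                  # success: accept the step, trust the model more
            x, r, fx = x + delta, r_new, f_new
            lam = max(lam / 10.0, 1e-12)
        else:                           # failure: reject the step, shorten it
            lam *= 10.0
    return x

# Rosenbrock function as residuals r(x) = (10 (x_2 - x_1^2), 1 - x_1).
res = lambda x: np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])
jac = lambda x: np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])
x_lm = levenberg_marquardt(res, jac, [-1.2, 1.0])
```

A large $\lambda$ corresponds to a small trust region (short, nearly steepest descent steps) and a small $\lambda$ to nearly full Gauss–Newton steps.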

An important aspect of the Gauss–Newton method is the efficient solution of the Gauss–Newton system. It must be emphasized that the system should not be solved by first forming the product $A_k A_k^T$ and then factorizing it, because this worsens the conditioning of the system and leads to a substantial loss of precision. A stable and efficient way is to factorize the augmented matrix $[A_k^T \;\; r_k]$ using either Householder transformations or Givens rotations.
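A minimal sketch of the augmented-matrix approach, assuming $J = A_k^T$ is $m \times n$ with $m > n$ and full column rank. It uses NumPy's `qr` (LAPACK, Householder-based) with `mode="r"`, so neither $Q$ nor $M_k$ is formed; `gauss_newton_step_qr` is a hypothetical name.

```python
import numpy as np

def gauss_newton_step_qr(J, r):
    """Gauss-Newton step from a QR factorization of the augmented matrix [J  r],
    J = A_k^T (m x n, m > n, full column rank). M_k = J^T J is never formed."""
    m, n = J.shape
    R = np.linalg.qr(np.column_stack([J, r]), mode="r")   # (n+1) x (n+1), upper triangular
    R11, z = R[:n, :n], R[:n, n]
    # ||J d + r|| = ||(R11 d + z, R[n, n])||, minimized by R11 d = -z.
    delta = np.linalg.solve(R11, -z)   # R11 is triangular; a triangular solver also works
    return delta, abs(R[n, n])         # step and residual norm ||J delta + r||

rng = np.random.default_rng(0)
J = rng.standard_normal((8, 3))
r = rng.standard_normal(8)
delta, res_norm = gauss_newton_step_qr(J, r)
```

The last diagonal entry of $R$ gives the residual norm of the linearized problem at no extra cost.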

Regarded as a general optimization problem, another way to obtain approximations to the Hessian matrix $G_k$ from first-order derivative information of the problem functions is to use quasi-Newton updates; the resulting Newton-type methods are called quasi-Newton methods. Let $B_k$ denote an approximation to the Hessian $G_k$ in the quadratic model $q_k(\delta)$. Any Newton-type method with line searches has the following basic framework.

• Solve the system $B_k s = -g_k$ for a descent direction $s^{(k)}$,

• Determine a steplength $\alpha_k$ along the direction $s^{(k)}$ by line searches,

• Set $x^{(k+1)} = x^{(k)} + \alpha_k s^{(k)}$.

In order to ensure global convergence and a rapid local convergence rate, quasi-Newton updates require the matrix $B_k$ to satisfy the following conditions:

• since $G_{k+1} \delta^{(k)} \approx y^{(k)}$, $B_{k+1}$ should satisfy the so-called quasi-Newton equation
$$B_{k+1} \delta^{(k)} = y^{(k)},$$
where $\delta^{(k)} = x^{(k+1)} - x^{(k)}$ and $y^{(k)} = g_{k+1} - g_k$;

• $B_{k+1}$ is symmetric;

• $B_{k+1}$ is efficiently obtained from $B_k$ using a low-rank update
$$B_{k+1} = B_k + E_k,$$
so that the calculation of $B_{k+1}$ is inexpensive;

• $B_k$ is positive definite, so that the solution $s^{(k)}$ obtained as the minimizer of $q_k(\delta)$ is a descent direction of $f(x)$ at $x^{(k)}$.

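There exist numerous updating formulas to achieve these conditions, and the Broyden family with a single parameter $\theta$,

$$B_{k+1}(\theta) = B_k - \frac{B_k \delta^{(k)} \delta^{(k)T} B_k}{\delta^{(k)T} B_k \delta^{(k)}} + \frac{y^{(k)} y^{(k)T}}{\delta^{(k)T} y^{(k)}} + \theta\, w^{(k)} w^{(k)T},$$

is the most important, where

$$w^{(k)} = \big(\delta^{(k)T} B_k \delta^{(k)}\big)^{1/2} \left(\frac{y^{(k)}}{\delta^{(k)T} y^{(k)}} - \frac{B_k \delta^{(k)}}{\delta^{(k)T} B_k \delta^{(k)}}\right)$$

and $\theta$ is a parameter [3]. If $\delta^{(k)T} y^{(k)} > 0$ for all $k$, then $B_{k+1}(\theta)$ preserves positive definiteness for all values

$$\theta > \bar\theta = \frac{1}{1 - a_k b_k} \ (< 0),$$

where

$$a_k = \frac{y^{(k)T} H_k y^{(k)}}{\delta^{(k)T} y^{(k)}}, \qquad b_k = \frac{\delta^{(k)T} B_k \delta^{(k)}}{\delta^{(k)T} y^{(k)}},$$

and $H_k = B_k^{-1}$. The value $\bar\theta$ is the one which causes $B_{k+1}(\theta)$ to be singular. The condition $\delta^{(k)T} y^{(k)} > 0$ is realistic and can always be achieved when the steplength $\alpha_k$ is either determined by an exact line search or satisfies the Wolfe conditions. The Broyden family includes the famous BFGS ($\theta = 0$) formula

$$B_{k+1}^{\mathrm{BFGS}} = B_k + \frac{y^{(k)} y^{(k)T}}{\delta^{(k)T} y^{(k)}} - \frac{B_k \delta^{(k)} \delta^{(k)T} B_k}{\delta^{(k)T} B_k \delta^{(k)}},$$

the DFP ($\theta = 1$) formula

$$B_{k+1}^{\mathrm{DFP}} = B_k + \left(1 + \frac{\delta^{(k)T} B_k \delta^{(k)}}{\delta^{(k)T} y^{(k)}}\right) \frac{y^{(k)} y^{(k)T}}{\delta^{(k)T} y^{(k)}} - \frac{y^{(k)} \delta^{(k)T} B_k + B_k \delta^{(k)} y^{(k)T}}{\delta^{(k)T} y^{(k)}},$$

and the self-dual rank-one formula ($\theta = 1/(1 - b_k)$)

$$B_{k+1}^{\mathrm{SR1}} = B_k + \frac{(y^{(k)} - B_k \delta^{(k)})(y^{(k)} - B_k \delta^{(k)})^T}{(y^{(k)} - B_k \delta^{(k)})^T \delta^{(k)}}.$$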
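The whole Broyden family can be coded from the single formula for $B_{k+1}(\theta)$. The sketch below, with illustrative $2 \times 2$ data chosen so that $\delta^{(k)T} y^{(k)} > 0$, evaluates it at $\theta = 0$, $1$ and $1/(1 - b_k)$; `broyden_family_update` is a hypothetical name.

```python
import numpy as np

def broyden_family_update(B, d, y, theta):
    """B_{k+1}(theta) = B - B d d^T B/(d^T B d) + y y^T/(d^T y) + theta w w^T,
    w = sqrt(d^T B d) * (y/(d^T y) - B d/(d^T B d)); requires d^T y > 0."""
    Bd = B @ d
    dBd, dy = d @ Bd, d @ y
    w = np.sqrt(dBd) * (y / dy - Bd / dBd)
    return B - np.outer(Bd, Bd) / dBd + np.outer(y, y) / dy + theta * np.outer(w, w)

B = np.array([[2.0, 0.5], [0.5, 1.0]])
d = np.array([1.0, -0.5])          # delta^{(k)}
y = np.array([1.5, 0.2])           # y^{(k)}, with d^T y = 1.4 > 0
b_k = d @ B @ d / (d @ y)          # = 1.25
B_bfgs = broyden_family_update(B, d, y, 0.0)
B_dfp = broyden_family_update(B, d, y, 1.0)
B_sr1 = broyden_family_update(B, d, y, 1.0 / (1.0 - b_k))
```

Since $w^{(k)T}\delta^{(k)} = 0$, every member satisfies the quasi-Newton equation $B_{k+1}\delta^{(k)} = y^{(k)}$ regardless of $\theta$.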


Both the DFP and BFGS methods were found to work well in practice and have been widely used. These two methods have a number of important properties, as follows:

1) For quadratic functions (with exact line searches) they
- terminate in at most $n$ iterations (with $B_{n+1} = G$);
- preserve the hereditary property $B_i \delta^{(j)} = y^{(j)}$, $j = 1, \dots, i - 1$;
- generate conjugate directions, and conjugate gradients when $B_1 = I$.

2) For general functions they
- preserve positive definite $B_k$;
- converge globally for strictly convex functions (with exact line searches);
- have a local superlinear convergence rate.

The BFGS method is even better than the DFP method and has usually been used with low-accuracy line searches. In fact, global convergence and a superlinear convergence rate of the BFGS method with inexact line searches have been proved [12]. The Broyden family is important in that many of the properties of the BFGS and DFP formulas are common to the whole family. L.C.W. Dixon showed that, when applied to any continuously differentiable function, all Broyden methods generate the same sequence $\{x^{(k)}\}$ from the same starting point, assuming that multiple local minima in the line search are resolved consistently, degenerate values of $\theta$ are avoided and the algorithm is well defined.

When any of the above quasi-Newton updating formulas is used in a method, the system $B_k s = -g_k$ needs to be solved to get the direction $s^{(k)}$, and this needs $O(n^3)$ arithmetic operations. In early versions of quasi-Newton methods, the search direction $s^{(k)}$ is obtained as a product of a matrix and a vector,

$$s^{(k)} = -H_k g_k,$$

where the matrix $H_k$ is an inverse approximation to $G_k$ ($H_k \approx G_k^{-1}$) obtained from an inverse quasi-Newton updating formula. This avoids the solution of the system and requires only $O(n^2)$ arithmetic operations. The inverse updating formulas can be derived similarly from the quasi-Newton equation

$$H_{k+1} y^{(k)} = \delta^{(k)},$$

or can be derived from the above updating formulas using the Sherman–Morrison formula. For example, the Broyden family of inverse updatings is given by

$$H_{k+1}(\phi) = H_k - \frac{H_k y^{(k)} y^{(k)T} H_k}{y^{(k)T} H_k y^{(k)}} + \frac{\delta^{(k)} \delta^{(k)T}}{\delta^{(k)T} y^{(k)}} + \phi\, v^{(k)} v^{(k)T},$$

where

$$v^{(k)} = \big(y^{(k)T} H_k y^{(k)}\big)^{1/2} \left(\frac{\delta^{(k)}}{\delta^{(k)T} y^{(k)}} - \frac{H_k y^{(k)}}{y^{(k)T} H_k y^{(k)}}\right),$$

and $\phi$ is a parameter related to $\theta$ by

$$\phi = \frac{1 - \theta}{1 + \theta (a_k b_k - 1)}.$$

The corresponding inverse BFGS, DFP and rank-one updating formulas are obtained by setting $\phi = 1$, $\phi = 0$ and $\phi = 1/(1 - a_k)$, respectively. It can be seen that one kind of updating formula is obtained from the other simply by making the interchanges $B_k \leftrightarrow H_k$ and $\delta^{(k)} \leftrightarrow y^{(k)}$. Although the matrix $H_k$ preserves positive definiteness in theory when $\delta^{(k)T} y^{(k)} > 0$, loss of positive definiteness may occur in practical calculations due to round-off errors, and it cannot easily be recognized whether the matrix $H_k$ is positive definite or not. The now widely used versions of quasi-Newton methods represent the matrix $B_k$ in a factorized form $L_k D_k L_k^T$ and update the factors in $O(n^2)$ operations to obtain the factors $L_{k+1} D_{k+1} L_{k+1}^T$ of the matrix $B_{k+1}$, where $L$ is a unit lower triangular matrix and $D$ is a diagonal matrix with positive diagonal elements [5]. The solution of the system $B_k s = -g_k$ is then obtained in $O(n^2)$ operations by forward and backward substitutions. The positive definiteness of the matrix $B_k$ is indicated by the diagonal elements of $D_k$ and can be maintained by controlling round-off errors. Methods with quasi-Newton updating in $LDL^T$ form preserve the convergence properties of quasi-Newton methods and reduce the arithmetic operations at each iteration.
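The inverse form can be checked against the direct form. The sketch assumes the inverse Broyden family $H_{k+1}(\phi) = H_k - H_k y y^T H_k/(y^T H_k y) + \delta\delta^T/(\delta^T y) + \phi\, v v^T$ (the dual of $B_{k+1}(\theta)$ under $B \leftrightarrow H$, $\delta \leftrightarrow y$) and verifies, on illustrative data, that $\phi = 1$ gives the inverse of the BFGS matrix; `inverse_broyden_update` is a hypothetical name.

```python
import numpy as np

def inverse_broyden_update(H, d, y, phi):
    """H_{k+1}(phi) = H - H y y^T H/(y^T H y) + d d^T/(d^T y) + phi v v^T,
    v = sqrt(y^T H y) * (d/(d^T y) - H y/(y^T H y)); phi = 1 gives inverse BFGS."""
    Hy = H @ y
    yHy, dy = y @ Hy, d @ y
    v = np.sqrt(yHy) * (d / dy - Hy / yHy)
    return H - np.outer(Hy, Hy) / yHy + np.outer(d, d) / dy + phi * np.outer(v, v)

B = np.array([[2.0, 0.5], [0.5, 1.0]])
H = np.linalg.inv(B)
d = np.array([1.0, -0.5])
y = np.array([1.5, 0.2])
H_bfgs = inverse_broyden_update(H, d, y, 1.0)
# Direct BFGS update of B for comparison; H_bfgs should be its inverse.
Bd = B @ d
B_bfgs = B - np.outer(Bd, Bd) / (d @ Bd) + np.outer(y, y) / (d @ y)
```

The search direction is then the matrix-vector product $-H_{k+1} g_{k+1}$, costing $O(n^2)$ operations instead of a linear solve.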

When quasi-Newton updating formulas are used to generate approximations to the Hessian matrices or their inverses for nonlinear least squares problems, $A_1 A_1^T$ can be selected as the initial matrix $B_1$ if $A_1 A_1^T$ is positive definite. One more modification to the quasi-Newton updatings for nonlinear least squares is the definition of the vector $y^{(k)}$. Let $y^{(k)}$ denote the previously defined vector. From the Taylor expansion of $g(x)$, a new definition

$$\hat y^{(k)} = M_{k+1} \delta^{(k)} + (A_{k+1} - A_k) r_{k+1}$$

can be used to replace $y^{(k)}$ in any quasi-Newton updating formula. This idea was essentially suggested by M.C. Bartholomew-Biggs [2]. However, the condition $\delta^{(k)T} \hat y^{(k)} > 0$, required for maintaining positive definite approximations, may not be guaranteed by line searches. This difficulty can be avoided by using the safeguarded value

$$\max\{\delta^{(k)T} \hat y^{(k)},\ 0.01\, \delta^{(k)T} y^{(k)}\}$$

in place of $\delta^{(k)T} \hat y^{(k)}$ in updating formulas [1].
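A minimal sketch of the structured vector $\hat y^{(k)}$ and the safeguard, assuming Jacobians are stored as $J = A^T$ ($m \times n$). For linear residuals $r(x) = Jx - c$, $A$ is constant and $\hat y^{(k)}$ coincides with the true gradient difference, which the illustrative example uses as a sanity check. Both function names are hypothetical.

```python
import numpy as np

def structured_y(J_old, J_new, r_new, d):
    """y_hat = M_{k+1} delta + (A_{k+1} - A_k) r_{k+1}, with A = J^T (J is m x n)."""
    M_new = J_new.T @ J_new
    return M_new @ d + (J_new - J_old).T @ r_new

def safeguarded_curvature(d, y_hat, y):
    """max{delta^T y_hat, 0.01 * delta^T y}, used in place of delta^T y_hat."""
    return max(d @ y_hat, 0.01 * (d @ y))

# Linear residuals r(x) = J x - c: A is constant, so y_hat equals the true y.
J = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]])
c = np.array([1.0, 0.0, 2.0])
grad = lambda x: J.T @ (J @ x - c)
x_old, x_new = np.array([0.0, 0.0]), np.array([0.5, -0.25])
d = x_new - x_old
y = grad(x_new) - grad(x_old)
y_hat = structured_y(J, J, J @ x_new - c, d)
```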

For nonlinear least squares problems, both the quasi-Newton and Gauss–Newton methods generate approximations to the Hessian matrices of the objective function only from first-order derivative information. The convergence rate is superlinear when a quasi-Newton method, such as the BFGS method, is applied to any differentiable nonlinear least squares problem, but the special structure of the problem functions is not taken into account, and many iterations are required to build up a satisfactory approximation to the Hessian. The Gauss–Newton method takes advantage of the special form of nonlinear least squares, and better approximations to the Hessian matrices are obtained directly from the Jacobian matrix $A(x)$ of $r(x)$ for zero and small residual problems. The damped Gauss–Newton method converges at a quadratic rate for zero residual problems and at a fast linear rate for small residual problems, which in limited precision may be preferable to superlinearly convergent methods. However, the method converges slowly for large residual problems, and may even fail on singular problems. Hence, the damped Gauss–Newton method is generally preferred for zero and small residual problems, but should be avoided for large residual and singular problems. Attempts have been made to combine the best features of both kinds of methods. These include adaptive methods [4,13], hybrid methods [1,6], and factorized quasi-Newton methods [14,15].

In adaptive methods approximations to Hessian 
G(x) are obtained by approximating the nonlinear term 
Cy in Gy by a symmetric matrix S,; and keeping the 
Gauss—Newton matrix M; unchanged, that is, to define 


By = My + Sx 


and to develop updating formulas for the matrix S,. 
This idea was first suggested by K.M. Brown and J.E. 
Dennis in 1973. In their method S_ = )°”™,ri(x™) 
and a approximates V7r;(x) obtained from 


oc using certain quasi- Newton updating formula. In 
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methods of C.G. Broyden and Dennis (1973), J.T. Bett 
(1976), Bartholomew-Biggs (1977), P.E. Gill and W. 
Murray (1978), and Dennis, D.M. Gay and R.E. Welsch 
(1981) the formulas for updating S; are defined by us- 
ing certain quasi-Newton-like updating formulas, for 
example, Gill and Murray used the BFGS-like formula 


yOyH™ wes" w, 
§(k)T yl) 6H) WH | 


where Wx = Mis + Sy, while Dennis, Gay and Welsch 
developed an updating formula for S;: 


Seti = Se + 


T YT 
qn y% a yh 7 


Skt1 =Sk- 


where n = 8,6 — y and yy = (Agii— Ak) Tky1- 
These methods attempt to obtain better approximations to $G_k$ from first-order derivative information by exploiting the special structure of nonlinear least squares. However, since the positive definiteness of the resulting approximation $B_k = M_k + S_k$ cannot be guaranteed, these methods must be combined with trust region techniques, which increases their complexity. Theoretical analysis and practical calculations show that the Dennis-Gay-Welsch adaptive method is superlinearly convergent and effective.
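A minimal sketch of the Dennis-Gay-Welsch update in plain Python, assuming dense symmetric matrices stored as lists of rows. The helper names and the small test data are illustrative, and the sizing and trust region safeguards used in practical codes are omitted. The sketch checks the property the formula is built to have: with $\eta^{(k)} = S_k\delta^{(k)} - \hat{y}^{(k)}$, the updated matrix satisfies $S_{k+1}\delta^{(k)} = \hat{y}^{(k)}$.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [dot(row, v) for row in M]


def dgw_update(S, delta, y, y_hat):
    """One Dennis-Gay-Welsch update of the symmetric matrix S.

    eta   = S delta - y_hat
    S_new = S - (eta y^T + y eta^T) / (delta^T y)
              + (delta^T eta) y y^T / (delta^T y)^2
    The result is symmetric and satisfies S_new delta = y_hat.
    """
    n = len(delta)
    Sd = matvec(S, delta)
    eta = [Sd[i] - y_hat[i] for i in range(n)]
    dy = dot(delta, y)          # must be nonzero (positive in practice)
    de = dot(delta, eta)
    return [[S[i][j]
             - (eta[i] * y[j] + y[i] * eta[j]) / dy
             + de * y[i] * y[j] / dy ** 2
             for j in range(n)]
            for i in range(n)]


# Illustrative data (not taken from any real problem).
S = [[1.0, 0.2], [0.2, 0.5]]
delta = [0.3, -0.1]
y = [1.0, 0.4]          # gradient change; delta^T y = 0.26 > 0
y_hat = [0.7, -0.2]     # stands in for (A_{k+1} - A_k) r_{k+1}
S_new = dgw_update(S, delta, y, y_hat)
print(matvec(S_new, delta))  # agrees with y_hat up to rounding
```

Note that nothing in the update forces $M_{k+1} + S_{k+1}$ to be positive definite, which is why such methods are paired with trust regions.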

At each iteration of a hybrid method, the approximation $B_{k+1}$ is simply chosen to be either the Gauss-Newton matrix $M_{k+1}$ or a quasi-Newton matrix, for example the BFGS matrix, according to a test $T_k$, that is,

$$B_{k+1} = \begin{cases} \mathrm{BFGS}(B_k, \delta^{(k)}, y^{(k)}) & \text{if } T_k \text{ holds},\\ M_{k+1} & \text{otherwise}, \end{cases}$$

where $\mathrm{BFGS}(B_k, \delta^{(k)}, y^{(k)})$ denotes the BFGS updating formula. Thus each step of a hybrid method is either a Gauss-Newton step or a BFGS step. In developing a hybrid method, a test must be derived to distinguish between these two steps. A reasonable test should be able to differentiate problems so that the method ultimately takes the damped Gauss-Newton method for zero and small residual problems and the BFGS method for large residual and singular problems.
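M. Al-Baali and R. Fletcher [1] proposed the test

$$T_k:\ \Delta(B_k, \delta^{(k)}, y^{(k)}) < \Delta(M_{k+1}, \delta^{(k)}, y^{(k)}),$$

based on the quantity

$$\Delta(A, \delta, y) = (\alpha - 2\beta + 1)^{1/2},$$

which is regarded as a measure of the approximation error of the matrix $A$ to $G_{k+1}$, where

$$\beta = \frac{\delta^{(k)T} A\, \delta^{(k)}}{\delta^{(k)T} y^{(k)}}$$

and $\alpha$ is a second ratio with the same denominator $\delta^{(k)T} y^{(k)}$ whose numerator is a quadratic form in $y^{(k)}$.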


Though the resulting hybrid method is robust and effective, the test needs an extra $O(n^2)$ arithmetic operations at each iteration, and the resulting method is only linearly convergent [6]. Fletcher and C.X. Xu [6] proposed the simple test

$$T_k:\ \frac{f_k - f_{k+1}}{f_k} < \epsilon,$$

where $\epsilon \in (0, 1)$ is a preset parameter. The rationale is that for zero residual problems the Gauss-Newton method drives $f_k$ to zero rapidly, so the relative reduction tends to one, whereas for nonzero residual problems $f_k$ tends to a positive limit and the relative reduction tends to zero. Numerical experience shows that the method is not sensitive to the choice of $\epsilon$, although the value $\epsilon = 0.2$ is recommended. The test is simple, and theoretical analysis shows that the method ultimately takes the Gauss-Newton method for zero residual problems and the BFGS method for nonzero residual problems. Therefore, the method maintains the convergence properties of the BFGS method and combines the best features of both the Gauss-Newton and BFGS methods. Practical calculations show that the latter hybrid method is better than the Al-Baali-Fletcher hybrid method.
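The selection logic of a hybrid method can be sketched as follows. This is a minimal illustration, assuming the Fletcher-Xu test takes the relative-reduction form $(f_k - f_{k+1})/f_k < \epsilon$ (BFGS step when it holds) with $\epsilon = 0.2$; the function names are hypothetical, and line searches and other safeguards are omitted.

```python
def bfgs_update(B, delta, y):
    """BFGS formula: B + y y^T/(delta^T y) - B delta delta^T B/(delta^T B delta)."""
    n = len(delta)
    Bd = [sum(B[i][j] * delta[j] for j in range(n)) for i in range(n)]
    dy = sum(d * yi for d, yi in zip(delta, y))
    dBd = sum(d * b for d, b in zip(delta, Bd))
    return [[B[i][j] + y[i] * y[j] / dy - Bd[i] * Bd[j] / dBd
             for j in range(n)]
            for i in range(n)]


def hybrid_next_matrix(B, M_next, delta, y, f_k, f_next, eps=0.2):
    """Select B_{k+1} with a relative-reduction test (assumed form of T_k).

    T_k: (f_k - f_{k+1}) / f_k < eps  ->  slow progress, use a BFGS update;
    otherwise                         ->  use the Gauss-Newton matrix M_{k+1}.
    """
    if (f_k - f_next) / f_k < eps:
        return bfgs_update(B, delta, y), "BFGS"
    return M_next, "Gauss-Newton"


B = [[2.0, 0.0], [0.0, 1.0]]
M_next = [[3.0, 1.0], [1.0, 2.0]]
delta, y = [1.0, 1.0], [1.0, 2.0]
print(hybrid_next_matrix(B, M_next, delta, y, f_k=1.0, f_next=0.1)[1])   # Gauss-Newton
print(hybrid_next_matrix(B, M_next, delta, y, f_k=1.0, f_next=0.95)[1])  # BFGS
```

The test costs only a division per iteration, in contrast with tests that compare two matrix approximations.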

Factorized quasi-Newton methods for nonlinear least squares take the approximation $B_k$ in the form

$$B_k = (A_k + L_k)(A_k + L_k)^T$$

and develop updating formulas for the matrix $L_k$ such that $L_k L_k^T + A_k L_k^T + L_k A_k^T$ approximates the second part $C_k$ of $G_k$. If the matrix $A_k + L_k$ has full rank, the search direction $s^{(k)}$ obtained from the system $B_k s = -g_k$ is guaranteed to be a descent direction. The updating formulas for $L_k$ can similarly be derived from the quasi-Newton equation

$$(A_{k+1} + L_{k+1})(A_{k+1} + L_{k+1})^T \delta^{(k)} = y^{(k)}.$$

Using the theory of generalized inverses of matrices, Xu, X.F. Ma and M.Y. Kong derived a class of updating formulas for $L_k$ that depends on four parameters $a$, $b$, $c$ and $d$ and is expressed in terms of

$$V_k = A_{k+1} + L_k, \qquad W_k = V_k V_k^T.$$

The parameters must satisfy a system of equations that includes a quadratic normalization condition on $a$ and $b$, with coefficients involving $\delta^{(k)T} y^{(k)}$ and $\delta^{(k)T} W_k \delta^{(k)}$, and the condition

$$c + d = 1.$$

The resulting approximations $B_{k+1} = (A_{k+1} + L_{k+1})(A_{k+1} + L_{k+1})^T$ form a class of Broyden-like updating formulas

$$B_{k+1} = \alpha B_{k+1}^{\mathrm{DFP}} + \beta B_{k+1}^{\mathrm{BFGS}},$$

where $\alpha = c^2 + 2cd$, $\beta = d^2$ and $\alpha + \beta = (c + d)^2 = 1$; here $B_{k+1}^{\mathrm{BFGS}}$ and $B_{k+1}^{\mathrm{DFP}}$ are the usual BFGS and DFP formulas with $B_k$ replaced by $W_k$. Both the BFGS-like and the DFP-like formulas were first obtained by H. Yabe and T. Takahashi [15] from their proposed updating formulas for $L_{k+1}$, which can be recovered from the above class by setting $c = 0$, $d = 1$, $b = (\delta^{(k)T} y^{(k)}\, \delta^{(k)T} W_k \delta^{(k)})^{-1/2}$ for the BFGS-like formula and $c = 1$, $d = 0$, $a = (\delta^{(k)T} y^{(k)} / \delta^{(k)T} W_k \delta^{(k)})^{1/2}$ for the DFP-like formula, respectively.
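To illustrate how a factorized update can satisfy the quasi-Newton equation while keeping $B_{k+1}$ a Gram matrix, the sketch below applies a rank-one correction to a factor $V$ standing in for $V_k = A_{k+1} + L_k$. The particular correction, $V_{+} = V + (b\,y - W\delta/\delta^T W\delta)(V^T\delta)^T$ with $b = (\delta^T y\,\delta^T W\delta)^{-1/2}$ and $W = VV^T$, is our own BFGS-type derivation given for illustration; it is not claimed to be the exact Xu-Ma-Kong or Yabe-Takahashi formula. The data are made up.

```python
import math


def factored_bfgs(V, delta, y):
    """Rank-one update of an n x m factor V so that V_new V_new^T delta = y.

    With z = V^T delta and W = V V^T (so W delta = V z):
        V_new = V + (b y - W delta / (delta^T W delta)) z^T,
        b = 1 / sqrt((delta^T y) (delta^T W delta)).
    Requires delta^T y > 0 and V^T delta != 0.
    """
    n, m = len(V), len(V[0])
    z = [sum(V[i][j] * delta[i] for i in range(n)) for j in range(m)]
    Wd = [sum(V[i][j] * z[j] for j in range(m)) for i in range(n)]
    dy = sum(d * yi for d, yi in zip(delta, y))
    dWd = sum(zj * zj for zj in z)          # delta^T W delta = ||z||^2
    b = 1.0 / math.sqrt(dy * dWd)
    return [[V[i][j] + (b * y[i] - Wd[i] / dWd) * z[j] for j in range(m)]
            for i in range(n)]


def gram_times(V, v):
    """Return (V V^T) v without forming V V^T."""
    n, m = len(V), len(V[0])
    w = [sum(V[i][j] * v[i] for i in range(n)) for j in range(m)]
    return [sum(V[i][j] * w[j] for j in range(m)) for i in range(n)]


V = [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]   # stands in for A_{k+1} + L_k (n=2, m=3)
delta, y = [1.0, -1.0], [2.0, 1.0]       # delta^T y = 1 > 0
V_new = factored_bfgs(V, delta, y)
print(gram_times(V_new, delta))           # equals y up to rounding
```

Because $B_{k+1} = V_{+}V_{+}^T$ is formed from a factor, it is positive semidefinite by construction, which is the main attraction of working with factors.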
Nonlinear least squares problems are among the most commonly occurring and important applications of optimization techniques. The problem is to find minima of a real-valued function that has the form of a sum of squares of nonlinear functions of several independent variables:

$$\min f(x) = \frac{1}{2}\sum_{i=1}^{m} r_i(x)^2 = \frac{1}{2} r(x)^T r(x),$$

where $r_i(x)$, $i = 1, \dots, m$, are $m$ nonlinear functions defined on $\mathbb{R}^n$, $r(x)$ is the vector with components $r_i(x)$, $i = 1, \dots, m$, $m \ge n$, and the factor $1/2$ is included for convenience.

Approaches to least squares can be traced back more than two hundred years. It is well known that C.F. Gauss proposed the method now called the Gauss-Newton method in 1809 to estimate the orbits of planets from observation data. However, as early as the 1750s some methods had been suggested by P.S. Laplace, L. Euler and others to deal with measured data in astronomical observations. In 1805, A.M. Legendre proposed a method to determine the orbits of comets and the meridian of the earth. He called it the least squares method, but did not demonstrate its optimality in theory. The advent of computers promoted the development and application of numerical analysis, including optimization methods and nonlinear least squares solutions.

Nonlinear least squares problems arise in various practical areas such as scientific computing, scientific experiments, engineering design, surveying and observation, geological prospecting, physical science, mathematics and so on. They are particularly useful in data processing and error estimation. Here are a few examples to illustrate the applications of nonlinear least squares problems.


Solution of Simultaneous Equations 


It is frequently required in scientific computing to find a solution of a system of nonlinear equations

$$f_1(x_1, \dots, x_n) = 0, \quad \dots, \quad f_m(x_1, \dots, x_n) = 0,$$

where $x_1, \dots, x_n$ are unknowns. The system is underdetermined if $m < n$, well-determined if $m = n$ and overdetermined if $m > n$. It is usually not possible to obtain an exact solution of an overdetermined system, and one possibility is to seek a best least squares solution, that is, to find $x^*$ such that the function

$$f(x) = \frac{1}{2}\sum_{i=1}^{m} f_i(x)^2$$

is minimized at $x^*$.


Curve (Data) Fitting 


Perhaps the most frequently solved of all nonlinear least squares problems are data fitting problems. In scientific research, for example in chemical or physical experiments, it is often the case that the dependence of some observable quantity $y$ on some independent variable(s) $t$ is predicted, on theoretical grounds, to have the form

$$y = h(x, t),$$

where $x \in \mathbb{R}^n$ is an adjustable parameter vector, and $m$ values $y_1, \dots, y_m$ are measured at points $t_1, \dots, t_m$. We are required to choose an optimal parameter vector $x^*$ such that the function $h(x, t)$ best fits the data in the least squares sense, that is, we define the function $f(x)$ by

$$f(x) = \frac{1}{2}\sum_{i=1}^{m} \bigl(h(x, t_i) - y_i\bigr)^2 = \frac{1}{2}\sum_{i=1}^{m} r_i(x)^2,$$

and $x^*$ is found by minimizing $f(x)$, where $r_i(x) = h(x, t_i) - y_i$, $i = 1, \dots, m$, are called residuals.
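As a concrete instance of the data fitting formulation, the following sketch builds the residuals $r_i(x) = h(x, t_i) - y_i$ and the objective $f(x)$ for a hypothetical exponential model $h(x, t) = x_1 e^{x_2 t}$. The model and the synthetic, noise-free data are assumptions chosen only for illustration.

```python
import math


def h(x, t):
    """Hypothetical model h(x, t) = x1 * exp(x2 * t)."""
    return x[0] * math.exp(x[1] * t)


def residuals(x, ts, ys):
    """r_i(x) = h(x, t_i) - y_i for i = 1, ..., m."""
    return [h(x, t) - y for t, y in zip(ts, ys)]


def f(x, ts, ys):
    """f(x) = 0.5 * sum_i r_i(x)^2."""
    return 0.5 * sum(r * r for r in residuals(x, ts, ys))


ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [h([2.0, -0.5], t) for t in ts]   # synthetic, noise-free data
print(f([2.0, -0.5], ts, ys))          # 0.0
```

With measured (noisy) data the minimum value of $f$ would be positive, giving a nonzero residual problem.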

In practice, both the values $y_1, \dots, y_m$ and $t_1, \dots, t_m$ are usually subject to measurement errors. In the above conventional approach, the implicit assumptions are that only the values $y_1, \dots, y_m$ are subject to measurement errors and that the values $t_1, \dots, t_m$ are either exact or contain negligible errors. In many applications, however, this is an oversimplification, and the use of the above nonlinear least squares problem may lead to bias in the estimated parameters and variances. It is then necessary to take proper account of errors in all variables. The function of the resulting problem has the form

$$f(x, \tau) = \frac{1}{2}\sum_{i=1}^{m}\Bigl[\bigl(h(x, \tau_i) - y_i\bigr)^2 + (\tau_i - t_i)^2\Bigr] = \frac{1}{2}\Bigl[r(x, \tau)^T r(x, \tau) + e(\tau)^T e(\tau)\Bigr],$$

where $\tau = (\tau_1, \dots, \tau_m)^T$ and $r(x, \tau)$, $e(\tau)$ are $m$-vectors with components $r_i(x, \tau) = h(x, \tau_i) - y_i$ and $e_i(\tau) = \tau_i - t_i$, $i = 1, \dots, m$, respectively. Taking errors in all variables into account increases the complexity of the problem. For example, the simplest linear problem $h(x, t) = x_1 + x_2 t$ is no longer linear when errors in all variables are taken into account, because the residuals then contain the products $x_2 \tau_i$.
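The errors-in-variables objective can be sketched by stacking the two residual vectors. Here $h(x, t) = x_1 + x_2 t$ as in the text, and the unknowns are the $n + m$ quantities $(x, \tau)$; the data are made up for illustration.

```python
def eiv_residuals(x, tau, ts, ys):
    """Stacked residual [r(x, tau); e(tau)] for the line h(x, t) = x1 + x2 t.

    r_i = h(x, tau_i) - y_i contains the product x2 * tau_i, so the problem
    is nonlinear in the joint unknowns (x, tau); e_i = tau_i - t_i.
    """
    r = [x[0] + x[1] * tau_i - y_i for tau_i, y_i in zip(tau, ys)]
    e = [tau_i - t_i for tau_i, t_i in zip(tau, ts)]
    return r + e


def eiv_objective(x, tau, ts, ys):
    """f(x, tau) = 0.5 * (r^T r + e^T e)."""
    return 0.5 * sum(v * v for v in eiv_residuals(x, tau, ts, ys))


ts = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]                                   # exactly on the line x = (1, 2)
print(eiv_objective([1.0, 2.0], ts, ts, ys))           # 0.0
print(len(eiv_residuals([1.0, 2.0], ts, ts, ys)))      # 6, i.e. 2m residuals
```

The doubled number of residuals and the extra $m$ unknowns are the increase in dimension mentioned in the text.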


Optimal Designs 


Let us consider the optimal design of a coaxial cable circuit. Let $x_1, \dots, x_n$ be the design parameters. For a given circuit, its performance index is a function of $x_1, \dots, x_n$ and $\omega$,

$$h(x_1, \dots, x_n, \omega),$$

where $\omega$ is the frequency of the circuit. The aim of optimal circuit design is to determine the circuit parameters $x_1, \dots, x_n$ such that the performance index of the circuit best approximates a given characteristic function $\phi(\omega)$ on a given interval $[\alpha, \beta]$ of the frequency $\omega$. This can be expressed as finding $x^* \in \mathbb{R}^n$ which minimizes the function

$$f(x) = \frac{1}{2}\sum_{i=1}^{m} r_i(x)^2 = \frac{1}{2}\sum_{i=1}^{m}\bigl[h(x, \omega_i) - \phi(\omega_i)\bigr]^2,$$

where $\alpha \le \omega_1 < \cdots < \omega_m \le \beta$. Moreover, the circuit parameters are generally subject to restrictions imposed by the supplied cables, such as geometric shapes, diameters, medium materials and so on. These restrictions can be expressed as constraints of the form

$$c_j(x_1, \dots, x_n) \ge 0, \quad j = 1, \dots, p.$$

Therefore, the mathematical model of the optimal design of a circuit generally has the form

$$\min f(x) = \frac{1}{2}\sum_{i=1}^{m} r_i(x)^2 \quad \text{s.t.} \quad c_j(x) \ge 0, \ j = 1, \dots, p.$$


Nonlinear least squares problems can be classified into unconstrained and constrained ones depending on whether there are constraints on the variables $x \in \mathbb{R}^n$. For example, the problem arising in the optimal design of the coaxial cable circuit is a constrained nonlinear least squares problem, while the problem formed in the solution of simultaneous equations is an unconstrained one. In some situations, the solution of an unconstrained nonlinear least squares problem may be unacceptable. To avoid such a situation, restrictions on the parameter vector $x$ can be imposed in the form of constraints: for example, some variables are required to be nonnegative, some may be bounded from below and/or above, and some dependence relationships among the variables must be satisfied. This also results in a constrained nonlinear least squares problem. When errors in all variables are taken into account, the resulting problem is generally called a generalized nonlinear least squares problem. This increases not only the complexity but also the dimension of the problem. An advantage of generalized nonlinear least squares problems that can be exploited is that the problem variables are separable. An optimization problem is called separable if the optimization with respect to some of the variables is easier than with respect to the others. Of course, there are other separable nonlinear least squares problems, such as those arising in curve fitting, component analysis and orthogonal regression (see [14]).

When $r(x)$, and hence $f(x)$, is twice continuously differentiable, the gradient and the Hessian matrix of $f(x)$ are given by

$$\nabla f(x) = A(x) r(x), \qquad \nabla^2 f(x) = A(x) A(x)^T + \sum_{i=1}^{m} r_i(x) \nabla^2 r_i(x),$$

where $A(x) = [\nabla r_1(x), \dots, \nabla r_m(x)]$. Let $x^*$ be a minimizer of a nonlinear least squares (NLS) problem. The problem is called a zero residual problem if $r(x^*) = 0$, and hence $f(x^*) = 0$, and a nonzero residual problem if $r(x^*) \ne 0$. A nonzero residual problem is called small residual if $\|r(x^*)\|$ is small or the second part of $\nabla^2 f(x^*)$ is relatively small compared with the first part of $\nabla^2 f(x^*)$; otherwise it is called large residual.
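The formula $\nabla f(x) = A(x) r(x)$ can be checked numerically against central finite differences. The sketch below uses a hypothetical exponential model $r_i(x) = x_1 e^{x_2 t_i} - y_i$ and made-up data; it also forms the first part $A(x)A(x)^T$ of the Hessian, which the Gauss-Newton method uses in place of $\nabla^2 f(x)$.

```python
import math


def residuals(x, ts, ys):
    """r_i(x) = x1 * exp(x2 * t_i) - y_i (hypothetical exponential model)."""
    return [x[0] * math.exp(x[1] * t) - y for t, y in zip(ts, ys)]


def f(x, ts, ys):
    return 0.5 * sum(r * r for r in residuals(x, ts, ys))


def grad_r(x, ts):
    """Columns of A(x) = [grad r_1(x), ..., grad r_m(x)]."""
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in ts]


def grad_f(x, ts, ys):
    """grad f(x) = A(x) r(x) = sum_i r_i(x) grad r_i(x)."""
    r = residuals(x, ts, ys)
    cols = grad_r(x, ts)
    return [sum(ri * c[j] for ri, c in zip(r, cols)) for j in range(len(x))]


def gauss_newton_matrix(x, ts):
    """First part A(x) A(x)^T of the Hessian of f."""
    cols = grad_r(x, ts)
    n = len(x)
    return [[sum(c[i] * c[j] for c in cols) for j in range(n)] for i in range(n)]


ts = [0.0, 0.5, 1.0, 1.5]
ys = [2.1, 1.5, 1.2, 0.95]
x = [1.8, -0.4]
h = 1e-6
fd = [(f([x[0] + h, x[1]], ts, ys) - f([x[0] - h, x[1]], ts, ys)) / (2 * h),
      (f([x[0], x[1] + h], ts, ys) - f([x[0], x[1] - h], ts, ys)) / (2 * h)]
print(grad_f(x, ts, ys), fd)   # the two gradients should agree closely
```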

Methods for nonlinear least squares are iterative and are based upon trying to find a point $x^*$ at which the so-called Kuhn-Tucker optimality conditions are satisfied. These methods are generally of descent type. From a given initial point, these methods generate a sequence $\{x^{(k)}\}$ such that either one point of the sequence or any accumulation point of $\{x^{(k)}\}$ is a Kuhn-Tucker point (referred to as a KT point). The typical behavior of a method is that, if the iteration is not terminated at some point $x^{(k)}$, the values $\phi(x^{(k)})$ of a merit function $\phi$ for the problem decrease monotonically, so that the iterates $x^{(k)}$ move steadily towards a neighborhood of a KT point $x^*$ and then converge rapidly to $x^*$. The basic structure of the $k$th iteration of such a method is as follows:

• Check whether $x^{(k)}$ is a KT point.

• If $x^{(k)}$ is not a KT point, determine a $\delta^{(k)}$ such that

$$\phi(x^{(k)} + \delta^{(k)}) < \phi(x^{(k)}),$$

and set $x^{(k+1)} = x^{(k)} + \delta^{(k)}$.

Second-order derivative information about the problem functions is required to determine whether a KT point is a local minimizer. Since the evaluation of second-order derivatives is time-consuming, and sometimes the second-order derivatives are not available, most nonlinear least squares methods do not evaluate them and just try to locate a KT point. A KT point may not be a local minimizer. However, other features of the methods, such as the monotonic decrease of the sequence $\{\phi(x^{(k)})\}$, usually imply that a KT point found is a local minimizer, except in rare cases. A descent method with this property is called globally convergent, that is, the method does not require the initial point $x^{(1)}$ to be close to $x^*$.

A globally convergent method for optimization usually defines a merit function $\phi$ to force convergence from poor starting points. For unconstrained nonlinear least squares problems the choice of $\phi$ is simple: it is natural to choose the objective function $f$ as $\phi$. For constrained nonlinear least squares problems, the choice of $\phi$ is complicated by the requirement that $x^{(k)} + \delta^{(k)}$ should move closer to satisfying the constraints than $x^{(k)}$ and that $\delta^{(k)}$ should yield a reasonable decrease in the objective function. A number of merit functions are available for solving constrained nonlinear least squares problems. These include the $\ell_1$ penalty function proposed by S.P. Han [11] and used by R. Fletcher [9], the augmented Lagrangian functions of M.R. Hestenes [12] and Fletcher [7], the merit function of G. Di Pillo and L. Grippo [6] and the merit function of P.T. Boggs and J.W. Tolle [2].

Two strategies are available to achieve the descent property in nonlinear least squares methods: the line search strategy and the trust region strategy. The line search strategy transforms a multivariable minimization problem into a series of single-variable subproblems and is the most commonly used in practice. In line search methods, $\delta^{(k)}$ is obtained in the form $\delta^{(k)} = \alpha_k s^{(k)}$, where $s^{(k)}$ is a descent direction of the merit function $\phi(x)$ at $x^{(k)}$ and $\alpha_k > 0$ is a steplength along the direction $s^{(k)}$. The descent direction $s^{(k)}$ is generally obtained as a minimizer of a model problem that is a local approximation to the original problem at $x^{(k)}$, and the steplength $\alpha_k$ is determined by a line search method so that $x^{(k)} + \alpha_k s^{(k)}$ gives a sufficient decrease in the chosen merit function from $x^{(k)}$. In particular, $\alpha_k$ is determined such that $\phi(x^{(k)} + \alpha_k s^{(k)}) < \phi(x^{(k)})$. For methods with line searches, the convergence and the convergence rate depend upon the reduction of the merit function at each iteration, which relies on the choices of the descent direction $s^{(k)}$ and the steplength $\alpha_k$. Theoretically, exact line searches satisfying $\alpha_k = \arg\min_{\alpha > 0} \phi(x^{(k)} + \alpha s^{(k)})$ usually give the maximal reduction of the function $\phi(x^{(k)} + \alpha s^{(k)})$. But determining $\alpha_k$ by an exact line search is unnecessary and inefficient when the iterate $x^{(k)}$ is remote from $x^*$. Practical line search methods search for a steplength satisfying certain conditions. These conditions are readily satisfied, so that a steplength can be determined efficiently, and they guarantee the convergence of a method when the search direction $s^{(k)}$ is a sufficient descent direction [1]. Among various conditions, Goldstein's conditions are widely used in practice. These conditions require the steplength $\alpha_k$ to satisfy

$$\phi(x^{(k)} + \alpha s^{(k)}) \le \phi(x^{(k)}) + \rho\,\alpha\,\nabla\phi(x^{(k)})^T s^{(k)},$$

$$\phi(x^{(k)} + \alpha s^{(k)}) \ge \phi(x^{(k)}) + \sigma\,\alpha\,\nabla\phi(x^{(k)})^T s^{(k)},$$

where $0 < \rho < \sigma < 1$. Usually, these two conditions define an interval of acceptable $\alpha$ values. A disadvantage of the second condition is that the minimizer of $\phi(x^{(k)} + \alpha s^{(k)})$ may be excluded, lying to the left of this interval. A remedy is to replace that condition by

$$\nabla\phi(x^{(k)} + \alpha s^{(k)})^T s^{(k)} \ge \sigma\,\nabla\phi(x^{(k)})^T s^{(k)}.$$

Numerous line search methods are available. Strategies used in line search methods generally consist of bracketing and sectioning. The simplest line search methods are pattern searches such as golden section search and Fibonacci search [1]. These pattern searches use only function values. When first-order derivative information is available, the simplest line search is backtracking, which, for a given trial steplength $\alpha_0 > 0$ and reduction factor $\tau \in (0, 1)$, seeks the smallest nonnegative integer $i$ satisfying

$$\phi(x^{(k)} + \tau^i \alpha_0 s^{(k)}) \le \phi(x^{(k)}) + \rho\,\tau^i \alpha_0\,\nabla\phi(x^{(k)})^T s^{(k)}$$

and sets $\alpha_k = \tau^i \alpha_0$ for that integer $i$. Effective line search methods for determining $\alpha_k$ satisfying the Goldstein conditions are also available [1]; these employ sectioning schemes and interpolation.
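A minimal sketch of the backtracking line search described above, assuming the merit function and its gradient are supplied as Python callables. The defaults $\alpha_0 = 1$, $\tau = 0.5$ and $\rho = 10^{-4}$ are common choices, not values prescribed by the text, and the test function is illustrative.

```python
def backtracking(phi, grad_phi, x, s, alpha0=1.0, tau=0.5, rho=1e-4, max_iter=50):
    """Return alpha = tau**i * alpha0 for the smallest i >= 0 such that
    phi(x + alpha s) <= phi(x) + rho * alpha * grad_phi(x)^T s."""
    phi0 = phi(x)
    slope = sum(g * si for g, si in zip(grad_phi(x), s))
    if slope >= 0:
        raise ValueError("s is not a descent direction")
    alpha = alpha0
    for _ in range(max_iter):
        trial = [xi + alpha * si for xi, si in zip(x, s)]
        if phi(trial) <= phi0 + rho * alpha * slope:   # sufficient decrease
            return alpha
        alpha *= tau                                   # shrink and retry
    raise RuntimeError("no acceptable steplength found")


def phi(x):
    return x[0] ** 2 + 10 * x[1] ** 2


def grad_phi(x):
    return [2 * x[0], 20 * x[1]]


x0 = [1.0, 1.0]
s0 = [-g for g in grad_phi(x0)]               # steepest descent direction
print(backtracking(phi, grad_phi, x0, s0))    # 0.0625
```

Here the unit step overshoots badly along the steep coordinate, so the step is halved four times before the sufficient decrease condition holds.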

The trust region strategy generates $\delta^{(k)}$ by solving a model subproblem with a trust region constraint. The trust region, defined by $\|\delta\| \le \Delta_k$, is a neighborhood of the current point $x^{(k)}$ and is adjusted in such a way that the subproblem model is believed to approximate the chosen merit function adequately in that region. For a given trust region radius $\Delta_k$, the solution $\delta^{(k)}$ minimizing the subproblem model within the region $\|\delta\| \le \Delta_k$ is sought. If a satisfactory reduction of the merit function is obtained at $x^{(k)} + \delta^{(k)}$, then $x^{(k)} + \delta^{(k)}$ is accepted as $x^{(k+1)}$. If the computed step $\delta^{(k)}$ is not acceptable, the subproblem model is not accurate enough in that region; the trust region radius is then reduced to improve the accuracy of the approximation, and the step is recomputed. The trust region radius may be increased after an acceptable step. Methods with a trust region converge globally. An obstacle that has prevented trust region methods from common use is the efficient solution of the trust region subproblems: repeated solution of systems of linear equations with modified coefficient matrices is required to obtain a satisfactory $\delta^{(k)}$, which increases the complexity of trust region methods. Effective approximate solution methods for trust region subproblems now exist. These include the positive definite dogleg path method [5,13], indefinite dogleg path methods [19], the optimal path method in two-dimensional subspaces [3] and the conjugate gradient method [15].

Subproblem models are generally local approximations to the chosen merit function at the current iterate $x^{(k)}$. These approximations are generally linear or quadratic. A linear approximation is the simplest, and a quadratic approximation is usually more accurate than a linear one in a certain neighborhood of $x^{(k)}$. The quadratic function is one of the simplest smooth functions, and the minimizer of a model problem with a quadratic function is well defined and relatively easy to determine.

Since nonlinear least squares problems form an important class of optimization problems, any method for general minimization can be used to solve them. However, special methods that take advantage of the special structure of the objective function in nonlinear least squares are available. Most special methods are based on the well-known Gauss-Newton (G-N) method, which requires only the first-order derivatives of the problem functions. In the G-N method, the matrix $A(x)A(x)^T$ is used to approximate the Hessian matrix $\nabla^2 f(x)$. Since $r(x)$ may become small or zero as $f(x)$ is minimized, this may be a good approximation when $x$ is close to $x^*$. It is well known that for zero residual problems the local convergence rate of the G-N method is quadratic, but for small residual problems the convergence of the method is at most linear, and the method may even fail to converge for large residual problems. According to the information used, methods for nonlinear least squares can be divided into first-order derivative methods, second-order derivative methods and methods without derivatives. Among them, the first-order derivative methods are the most commonly used. These include the (damped) G-N method, quasi-Newton methods, hybrid methods [1,10], the adaptive method [4], factorized quasi-Newton methods [17,18] and methods with quasi-Newton corrections to the Gauss-Newton matrix. Among second-order derivative methods, the Newton method is the most famous. However, its disadvantages, such as the need to evaluate second-order derivatives and its merely local convergence, mean that it is rarely used in practice. Among methods that do not use derivatives, one can choose between the direction set methods [8] and the hybrid Gauss-Newton-Broyden method [16]. Theoretical analysis and numerical results show that these methods have good convergence properties and are robust and reliable.
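The (damped) Gauss-Newton method can be sketched for a two-parameter problem, where the Gauss-Newton system $A_kA_k^T s = -A_k r_k$ is a $2 \times 2$ system solved by Cramer's rule. The exponential model, the data and the simple step-halving rule used for damping are illustrative assumptions. Because the data are generated without noise, this is a zero residual problem, for which the text predicts fast local convergence.

```python
import math


def gauss_newton(ts, ys, x, iters=30):
    """Damped Gauss-Newton for r_i(x) = x1*exp(x2*t_i) - y_i (two parameters)."""
    def res(p):
        return [p[0] * math.exp(p[1] * t) - y for t, y in zip(ts, ys)]

    def f(p):
        return 0.5 * sum(r * r for r in res(p))

    for _ in range(iters):
        r = res(x)
        # Columns of A(x): grad r_i = (exp(x2 t_i), x1 t_i exp(x2 t_i)).
        cols = [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in ts]
        m11 = sum(c[0] * c[0] for c in cols)           # M = A A^T
        m12 = sum(c[0] * c[1] for c in cols)
        m22 = sum(c[1] * c[1] for c in cols)
        g1 = sum(ri * c[0] for ri, c in zip(r, cols))  # g = A r
        g2 = sum(ri * c[1] for ri, c in zip(r, cols))
        if math.hypot(g1, g2) < 1e-12:
            break
        det = m11 * m22 - m12 * m12
        s = [(-g1 * m22 + g2 * m12) / det, (-g2 * m11 + g1 * m12) / det]  # M s = -g
        alpha, fx = 1.0, f(x)
        # Damping: halve the step until f does not increase.
        while f([x[0] + alpha * s[0], x[1] + alpha * s[1]]) > fx and alpha > 1e-10:
            alpha *= 0.5
        x = [x[0] + alpha * s[0], x[1] + alpha * s[1]]
    return x


ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-0.5 * t) for t in ts]   # zero residual data
print(gauss_newton(ts, ys, [1.5, -0.3]))      # close to [2.0, -0.5]
```

With noisy data the same loop still runs, but the neglected term $\sum_i r_i \nabla^2 r_i$ no longer vanishes at the solution and the convergence slows, as described above.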
Descent methods for nonlinear least squares problems generate a sequence $\{x^{(k)}\}$ such that $f(x^{(k+1)}) < f(x^{(k)})$ and $x^{(k)}$ converges to a local minimizer of the objective function $f(x)$. Methods with line searches generate the sequence using the iteration

$$x^{(k+1)} = x^{(k)} + \alpha_k s^{(k)},$$

where $s^{(k)}$ is a descent direction of $f(x)$ at $x^{(k)}$ and $\alpha_k$ is a steplength along the direction $s^{(k)}$ determined by a line search. The search direction $s^{(k)}$ is obtained as the solution of the system

$$B_k s = -g_k,$$

where $g_k$ is the gradient of $f(x)$ at $x^{(k)}$ and $B_k$ is either the Hessian $G_k$ of $f(x)$ at $x^{(k)}$ or an approximation to it. The matrix $B_k$ is required to be positive definite, so that the solution $s^{(k)}$, which is the unique minimizer of the quadratic model

$$q_k(\delta) = f_k + g_k^T \delta + \frac{1}{2}\delta^T B_k \delta,$$

is guaranteed to be a descent direction; here $q_k(\delta)$ is a local approximation to $f(x)$ at $x^{(k)}$, $f_k = f(x^{(k)})$ and $\delta = x - x^{(k)}$.

When the matrix $B_k$ is not positive definite, as is often the case in practical calculations, the quadratic model does not have a unique minimizer and methods with line searches may not be well defined. A more realistic approach is to take $x^{(k+1)} = x^{(k)} + \delta^{(k)}$, where $\delta^{(k)}$ minimizes the quadratic model within a neighborhood $N(x^{(k)}, \Delta_k)$ of the point $x^{(k)}$ in which the quadratic function is believed to approximate the function $f(x)$ adequately. The neighborhood $N(x^{(k)}, \Delta_k)$ is generally called a trust region, and methods with this framework are called trust region methods. These methods can retain the rapid rate of convergence of Newton-type methods, but are also generally applicable and globally convergent.

The development of trust region methods can be traced back to the work of K. Levenberg [8] and D.W. Marquardt [9] on unconstrained nonlinear least squares problems

$$\min f(x) = \frac{1}{2}\sum_{i=1}^{m} r_i(x)^2 = \frac{1}{2} r(x)^T r(x),$$

where $r(x) = (r_1(x), \dots, r_m(x))^T$ and $r_i(x)$, $i = 1, \dots, m$, are nonlinear functions of $x \in \mathbb{R}^n$. Assuming that $r_i(x)$, $i = 1, \dots, m$, and hence $f(x)$, are twice continuously differentiable, the gradient and the Hessian matrix of $f(x)$ are given by

$$g(x) = \nabla f(x) = A(x) r(x),$$

$$G(x) = \nabla^2 f(x) = A(x) A(x)^T + \sum_{i=1}^{m} r_i(x) \nabla^2 r_i(x),$$

where $A(x) = [\nabla r_1(x), \dots, \nabla r_m(x)]$.


The Levenberg-Marquardt method is a modification of the well-known Gauss-Newton method and is based upon the idea that, when the full Gauss-Newton step fails, a proper bias towards the steepest descent direction may generate a satisfactory reduction of the function $f(x)$. Thus the step $\delta^{(k)}$ between iterates of the Levenberg-Marquardt method is a solution of the system

$$(A_k A_k^T + \mu_k I)\,\delta = -g_k$$

for some properly chosen $\mu_k > 0$. The application of the Levenberg-Marquardt method to general unconstrained minimization was given by S.M. Goldfeld, R.E. Quandt and H.F. Trotter [6], in which the matrix $A_k A_k^T$ in the above system is simply replaced by the matrix $B_k$, that is,

$$(B_k + \mu_k I)\,\delta = -g_k.$$


Let $\delta^{(k)}$ be the solution of this system, with $\mu_k$ such that $B_k + \mu_k I$ is positive semidefinite. Then $\delta^{(k)}$ solves the trust region subproblem

$$\min q_k(\delta) = f_k + g_k^T \delta + \frac{1}{2}\delta^T B_k \delta \quad \text{s.t.} \quad \|\delta\| \le \Delta_k$$

with $\Delta_k = \|\delta^{(k)}\|$. Conversely, for any given $\Delta_k$, the solution $\delta^{(k)}$ of the trust region subproblem usually satisfies the system

$$(B_k + \mu_k I)\,\delta = -g_k, \qquad \|\delta\| = \Delta_k,$$

where $\mu_k \ge 0$ is such that $B_k + \mu_k I$ is at least positive semidefinite, except in the case where $B_k$ is positive definite and $\|B_k^{-1} g_k\| \le \Delta_k$; the solution is then $\delta^{(k)} = -B_k^{-1} g_k$. It can be seen that the solution $\delta^{(k)}$ can be obtained by controlling either the value $\mu_k$ or the radius $\Delta_k$.
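The effect of the parameter $\mu$ can be seen on a small example. The sketch solves $(B_k + \mu I)\delta = -g_k$ for a positive definite $2 \times 2$ matrix and prints $\|\delta(\mu)\|$, which decreases as $\mu$ increases, while $\mu\,\delta(\mu)$ approaches $-g_k$, that is, the step turns towards the steepest descent direction. The matrix and gradient are arbitrary illustrative values.

```python
import math


def lm_step(B, g, mu):
    """Solve (B + mu I) delta = -g for a symmetric 2 x 2 matrix B (Cramer's rule)."""
    a, b, d = B[0][0] + mu, B[0][1], B[1][1] + mu
    det = a * d - b * b
    return [(-g[0] * d + g[1] * b) / det, (-g[1] * a + g[0] * b) / det]


B = [[2.0, 0.5], [0.5, 1.0]]   # positive definite stand-in for A_k A_k^T
g = [1.0, -2.0]
for mu in [0.0, 1.0, 10.0, 100.0]:
    step = lm_step(B, g, mu)
    print(mu, math.hypot(*step))   # the step length shrinks as mu grows
```

Each value of $\mu$ corresponds to the trust region radius $\Delta = \|\delta(\mu)\|$, which is the correspondence between controlling $\mu$ and controlling $\Delta$ noted above.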

Early versions of the Levenberg-Marquardt and Goldfeld-Quandt-Trotter methods determined $\delta^{(k)}$ from the system by controlling $\mu_k$, whereas more recent trust region methods determine $\delta^{(k)}$ by controlling the radius $\Delta_k$. Direct control of $\mu_k$ has a number of disadvantages. One is that there does not seem to be a reasonable automatic initial choice for the value $\mu_1$, while a reasonable initial value $\Delta_1$ can be a small fraction of the size $\|x^{(1)}\|$ of the starting point $x^{(1)}$. Another problem occurs when $x^{(k)} + \delta^{(k)}$ leads to an increase in the objective function $f(x)$. In this case, the function and derivative information evaluated at $x^{(k)}$ and $x^{(k)} + \delta^{(k)}$ can be used to estimate a required decrease in the radius $\Delta_k$, but it is not clear how this information can be used to estimate a reasonable increase for $\mu_k$.

In implementing recent forms of trust region methods, there are two main problems. One problem is how the trust region radius $\Delta_k$ should be chosen. To prevent undue restriction of the step $\delta^{(k)}$, the trust region radius $\Delta_k$ should be as large as possible under the condition that $q_k(\delta)$ adequately approximates $f(x)$ in that region. Let $\delta^{(k)}$ be the solution of the trust region subproblem for a given $\Delta_k$. The agreement between $q_k(\delta)$ and $f(x)$ in the neighborhood $N(x^{(k)}, \Delta_k)$ can be measured by comparing the actual reduction in $f(x)$,

$$\mathrm{ared}(\delta^{(k)}) = f(x^{(k)}) - f(x^{(k)} + \delta^{(k)}),$$

with the reduction predicted by the quadratic model,

$$\mathrm{pred}(\delta^{(k)}) = f(x^{(k)}) - q_k(\delta^{(k)}) = -g_k^T \delta^{(k)} - \frac{1}{2}\delta^{(k)T} B_k \delta^{(k)},$$

where $B_k$ is either the Hessian matrix $G_k$ of $f(x)$ or an approximation to it, for example $A_k A_k^T$. If $\mathrm{ared}(\delta^{(k)})$ is satisfactory compared with $\mathrm{pred}(\delta^{(k)})$, which implies a good reduction in $f$, the trust region is referred to as proper. Then $x^{(k)} + \delta^{(k)}$ is accepted as the new iterate $x^{(k+1)}$, and the trust region radius may either be kept unchanged or be increased if the reduction in $f$ is sufficient and the trust region constraint is active. If the computed step $\delta^{(k)}$ is not acceptable, which occurs when $q_k(\delta)$ is not accurate enough in $N(x^{(k)}, \Delta_k)$, then the radius $\Delta_k$ is reduced to improve the accuracy of the approximation, and the step is recomputed from the trust region subproblem with the reduced $\Delta_k$. A model trust region method with direct control of $\Delta_k$ can now be described as follows:

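1) Give parameters $0 < \gamma_1 < 1 < \gamma_2$, $0 < \eta_1 < \eta_2 < 1$ and $\Delta_{\max} > 0$, an initial point $x^{(1)}$ and $\Delta_1$ ($\le \Delta_{\max}$), and set $k = 1$.

2) Calculate $f_k$ and $g_k$. If $g_k = 0$, then terminate; otherwise form $B_k$.

3) Solve the trust region subproblem for $\delta^{(k)}$.

4) Compute $\rho_k = \mathrm{ared}(\delta^{(k)}) / \mathrm{pred}(\delta^{(k)})$.

5) If $\rho_k < \eta_1$, then set $\Delta_k = \gamma_1 \Delta_k$ and go to step 3).

6) Set $x^{(k+1)} = x^{(k)} + \delta^{(k)}$,

$$\Delta_{k+1} = \begin{cases} \bar{\Delta} & \text{if } \rho_k \ge \eta_2 \text{ and } \|\delta^{(k)}\| = \Delta_k,\\ \Delta_k & \text{otherwise}, \end{cases}$$

where $\bar{\Delta} = \min\{\gamma_2 \Delta_k, \Delta_{\max}\}$; set $k = k + 1$ and go to step 2).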
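The model trust region method in steps 1) to 6) can be sketched in Python as follows. To keep the example short, step 3) is solved only approximately by the Cauchy point (the minimizer of $q_k$ along $-g_k$ inside the trust region) rather than exactly; this substitution, the parameter values and the test function are assumptions for illustration.

```python
import math


def cauchy_point(g, B, radius):
    """Minimize q(d) = g^T d + 0.5 d^T B d along -g subject to ||d|| <= radius."""
    n = len(g)
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    Bg = [sum(B[i][j] * g[j] for j in range(n)) for i in range(n)]
    gBg = sum(gi * bi for gi, bi in zip(g, Bg))
    t = radius / gnorm                    # step length that reaches the boundary
    if gBg > 0:
        t = min(t, gnorm ** 2 / gBg)      # interior minimizer along -g
    return [-t * gi for gi in g]


def trust_region(f, grad, hess, x, radius=1.0, radius_max=10.0,
                 gamma1=0.25, gamma2=2.0, eta1=0.25, eta2=0.75,
                 tol=1e-8, max_iter=1000):
    """Model trust region method (steps 1-6); step 3 uses the Cauchy point."""
    n = len(x)
    for _ in range(max_iter):
        fx, g, B = f(x), grad(x), hess(x)                      # step 2
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            return x
        for _ in range(100):
            d = cauchy_point(g, B, radius)                     # step 3
            Bd = [sum(B[i][j] * d[j] for j in range(n)) for i in range(n)]
            pred = (-sum(gi * di for gi, di in zip(g, d))
                    - 0.5 * sum(di * bi for di, bi in zip(d, Bd)))
            x_new = [xi + di for xi, di in zip(x, d)]
            rho = (fx - f(x_new)) / pred                       # step 4
            if rho >= eta1:
                break
            radius *= gamma1                                   # step 5
        else:
            return x                   # no acceptable step found; give up
        dnorm = math.sqrt(sum(di * di for di in d))
        x = x_new                                              # step 6
        if rho >= eta2 and dnorm >= (1 - 1e-12) * radius:      # active constraint
            radius = min(gamma2 * radius, radius_max)
    return x


def f(x):
    return (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2


def grad(x):
    return [2 * (x[0] - 1), 20 * (x[1] + 2)]


def hess(x):
    return [[2.0, 0.0], [0.0, 20.0]]


print(trust_region(f, grad, hess, [0.0, 0.0]))   # close to [1, -2]
```

The Cauchy point makes each iteration behave like a safeguarded steepest descent step, so convergence is only linear here; an exact or near-exact subproblem solver in step 3) would be needed to retain the fast local rate of Newton-type methods.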

The most important issue in implementing a trust region method is the efficient solution of the trust region subproblem. There are three possible cases for the solution of the trust region subproblem:

a) Newton step case: $B_k$ is positive definite and $\|B_k^{-1} g_k\| \le \Delta_k$. The solution is $\delta^{(k)} = -B_k^{-1} g_k$.

b) General case: $B_k$ is positive definite and $\|B_k^{-1} g_k\| > \Delta_k$, or $B_k$ is indefinite and $\|(B_k - \lambda_1^{(k)} I)^+ g_k\| \ge \Delta_k$, where $(A)^+$ denotes the generalized inverse of the matrix $A$ and $\lambda_1^{(k)}$ is the smallest eigenvalue of $B_k$. The solution is $\delta^{(k)} = -(B_k + \mu_k I)^{-1} g_k$ with $\mu_k > \bar{\mu}_k = \max\{0, -\lambda_1^{(k)}\}$ and $\|\delta^{(k)}\| = \Delta_k$.

c) Hard case [11]: $B_k$ is indefinite and $\|(B_k - \lambda_1^{(k)} I)^+ g_k\| < \Delta_k$. The solution is $\delta^{(k)} = -(B_k - \lambda_1^{(k)} I)^+ g_k + \tau_k u^{(k)}$, where $u^{(k)}$ is an eigenvector of $B_k$ corresponding to $\lambda_1^{(k)}$ and $\tau_k$ is determined from the equation $\|-(B_k - \lambda_1^{(k)} I)^+ g_k + \tau_k u^{(k)}\| = \Delta_k$.

Since the hard case rarely occurs, the solution of the trust region subproblem is closely related to the solution of the equation

$$\phi_k(\mu) = \|\delta_k(\mu)\| - \Delta_k = \|(B_k + \mu I)^{-1} g_k\| - \Delta_k = 0.$$

This is a nonlinear equation in $\mu$, and there is generally no finite method to find its exact solution. Since $\phi_k(\bar{\mu}_k) > 0$ and $\phi_k(\infty) = -\Delta_k$, the solution can be found in the interval $(\bar{\mu}_k, \infty)$. Since $\phi_k(\mu)$ is a convex, continuous and monotonically decreasing function on $(\bar{\mu}_k, \infty)$, the solution of the equation is unique. M.D. Hebden [7] used the particular structure of the function $\phi_k(\mu)$ to propose a method that generates a sequence $\{\mu^{(i)}\}$ such that $\mu^{(i)} \to \mu_k$. Let $\mu^{(i)}$ be the current estimate of $\mu_k$. A rational function

$$\psi_k(\mu) = \frac{\alpha}{\beta + \mu} - \Delta_k$$

is used to approximate the function $\phi_k(\mu)$ such that

$$\psi_k(\mu^{(i)}) = \phi_k(\mu^{(i)}), \qquad \psi_k'(\mu^{(i)}) = \phi_k'(\mu^{(i)}),$$

where

$$\alpha = -\frac{\bigl(\phi_k(\mu^{(i)}) + \Delta_k\bigr)^2}{\phi_k'(\mu^{(i)})}, \qquad \beta = -\frac{\phi_k(\mu^{(i)}) + \Delta_k}{\phi_k'(\mu^{(i)})} - \mu^{(i)}.$$

Setting $\psi_k(\mu) = 0$ and substituting $\alpha$ and $\beta$ gives the iteration

$$\mu^{(i+1)} = \mu^{(i)} + \xi(\mu^{(i)})\left[\frac{\|\delta_k(\mu^{(i)})\|}{\Delta_k} - 1\right], \qquad \xi(\mu) = \frac{\|\delta_k(\mu)\|^2}{\delta_k(\mu)^T (B_k + \mu I)^{-1} \delta_k(\mu)}.$$

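J.J. Moré and D.C. Sorensen [11] also derived this iteration by applying Newton's method to the equation

$$h_k(\mu) = \frac{1}{\Delta_k} - \frac{1}{\|\delta_k(\mu)\|} = 0.$$

The iteration can be calculated in the following way:

i) Factorize the matrix $B_k + \mu^{(i)} I = R^T R$, with $R$ an upper triangular matrix.

ii) Solve $R^T R \delta = -g_k$ for $\delta_k(\mu^{(i)})$ using forward and backward substitution.

iii) Solve $R^T z = \delta_k(\mu^{(i)})$ for $z_k$ using forward substitution.

iv) Set $\mu^{(i+1)} = \mu^{(i)} + \left(\dfrac{\|\delta_k(\mu^{(i)})\|}{\|z_k\|}\right)^2 \dfrac{\|\delta_k(\mu^{(i)})\| - \Delta_k}{\Delta_k}$.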
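Steps i) to iv) can be sketched in plain Python with a hand-written Cholesky factorization $B_k + \mu I = R^T R$. This minimal version assumes that $B_k$ is positive definite and that $\|B_k^{-1}g_k\| > \Delta_k$ (the general case), starts from $\mu^{(0)} = 0$, and omits the safeguarding bounds that practical codes maintain; the function names are hypothetical.

```python
import math


def cholesky(M):
    """Return upper triangular R with M = R^T R (M symmetric positive definite)."""
    n = len(M)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = M[j][j] - sum(R[k][j] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        R[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            R[j][i] = (M[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R


def forward(R, b):
    """Solve R^T z = b (R^T is lower triangular)."""
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(R[k][i] * z[k] for k in range(i))) / R[i][i]
    return z


def backward(R, z):
    """Solve R x = z (R is upper triangular)."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x


def more_sorensen_mu(B, g, radius, mu=0.0, tol=1e-10, max_iter=50):
    """Newton iteration on 1/radius - 1/||delta(mu)|| = 0 (no safeguards)."""
    n = len(g)
    for _ in range(max_iter):
        R = cholesky([[B[i][j] + (mu if i == j else 0.0) for j in range(n)]
                      for i in range(n)])                          # step i)
        step = backward(R, forward(R, [-gi for gi in g]))          # step ii)
        z = forward(R, step)                                       # step iii)
        snorm = math.sqrt(sum(v * v for v in step))
        znorm = math.sqrt(sum(v * v for v in z))
        if abs(snorm - radius) <= tol * radius:
            break
        mu += (snorm / znorm) ** 2 * (snorm - radius) / radius     # step iv)
    return mu, step


B = [[1.0, 0.0], [0.0, 2.0]]
g = [1.0, 1.0]
mu, step = more_sorensen_mu(B, g, radius=0.5)
print(mu, step)   # ||step|| is 0.5 to within the tolerance
```

Each iteration costs one Cholesky factorization, which is why a good starting value for $\mu$ matters in practice.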


To start the iteration, an initial value $\mu^{(0)}$ is required. A natural choice is $\mu^{(0)} = \bar{\mu}_k$. This choice requires the calculation of the smallest eigenvalue of the matrix $B_k$ and will cause numerical difficulties when $B_k$ is not positive definite, since $B_k + \bar{\mu}_k I$ is then singular. Moré [10] used the choice

$$\mu_k^{(0)} = \mu_{k-1}\frac{\Delta_{k-1}}{\Delta_k}$$

as the initial value of $\mu$ at the $k$th iteration, where $\mu_{k-1}$ is the accepted solution of the equation $\phi_{k-1}(\mu) = 0$ at the $(k-1)$th iteration. J.E. Dennis and R.B. Schnabel [4] proposed the choice

$$\mu_k^{(0)} = \mu_{k-1} + \xi(\mu_{k-1})\left[\frac{\|\delta_{k-1}(\mu_{k-1})\|}{\Delta_k} - 1\right], \qquad \xi(\mu_{k-1}) = \frac{\|\delta_{k-1}(\mu_{k-1})\|^2}{\delta_{k-1}(\mu_{k-1})^T (B_{k-1} + \mu_{k-1} I)^{-1} \delta_{k-1}(\mu_{k-1})},$$

which is an analog of the iteration for $\mu^{(i)}$. Of course, a safeguarding strategy is imposed to force convergence, that is, lower and upper bounds for $\mu^{(i)}$ are provided and updated during the iteration. The iteration finds an approximate solution $\delta^{(k)}$ satisfying the Hebden conditions [7]

$$\mathrm{pred}(\delta^{(k)}) \ge \tau\,(f_k - q_k^*), \qquad \|\delta^{(k)}\| \le \rho\,\Delta_k,$$

where $\tau$ and $\rho$ are positive constants and $q_k^*$ is the optimal value of the trust region subproblem for the given $\Delta_k$. Strong convergence results can then be obtained (see [5]).

In the general case, the repeated solution of the system $(B_k + \mu I)\delta = -g_k$ for different values of $\mu$, to determine a satisfactory approximate solution $\delta^{(k)}$ of the trust region subproblem for a given value $\Delta_k$, and the recomputation of $\delta^{(k)}$ for reduced values of $\Delta_k$ may be required at each iteration. It is this complication that prevented trust region methods from wide use in the past two decades. Most practical trust region methods attempt to find an approximate solution of the trust region subproblem with a reasonable amount of computational effort. G.A. Shultz, Schnabel and R.H. Byrd [13] proposed general conditions on the approximate solution $\delta^{(k)}$ that ensure a satisfactory reduction of $q_k(\delta)$, so that the resulting trust region methods have strong convergence properties. These conditions are as follows:
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A) There exist constants 6,;>0 and o,> 0 such that for 
all A; and By 


Ilgel 
|| Bel 


pred(6) > B; [lgu| min Ase; 


B) There exists a 62> 0 such that for all A; and By, 
pred(8) > —Bopu\ A? . 


C) If the matrix B; is positive definite and ||By'g«|| < 

Ax, then 

6) = —B,' gx : 

Condition A) ensures the global convergence of a trust region method to a point satisfying the first order necessary conditions, and a trust region method satisfying condition B) will converge to a point at which the second order necessary conditions are satisfied. When condition C) is satisfied, a convergent trust region method converges at a quadratic rate if $G(x^*)$ is positive definite at the limit point $x^*$.
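Condition A) is usually met by requiring the step to do at least as well as the Cauchy point, which is the minimizer of $q_k$ along $-g_k$ inside the trust region. For that point the bound in A) holds with $\beta_1 = 1/2$ and $\sigma_1 = 1$ when $\|B_k\|$ is the spectral norm. The NumPy sketch below computes the Cauchy point and the predicted reduction $\mathrm{pred}(\delta) = f_k - q_k(\delta)$. The function names are hypothetical.

```python
import numpy as np


def cauchy_point(B, g, delta):
    """Minimizer of q(d) = g^T d + 0.5 d^T B d along -g subject to ||d|| <= delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                                    # q decreases up to the boundary
    else:
        tau = min(1.0, gnorm ** 3 / (delta * gBg))   # interior minimizer, if inside
    return -tau * (delta / gnorm) * g


def pred(B, g, d):
    """Predicted reduction f_k - q_k(d) of the quadratic model."""
    return -(g @ d + 0.5 * d @ B @ d)
```

Any approximate solution whose predicted reduction is at least that of the Cauchy point inherits condition A).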

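Let $\Delta_k$ vary in the interval $(0, \infty)$. Then the solution $\delta_k(\mu)$ of the trust region subproblem forms a continuous optimal path $\Gamma^{(k)}(\tau)$ in the space $\mathbb{R}^n$. The optimal path can be expressed as

$$\Gamma^{(k)}(\tau) = \Gamma_P^{(k)}(\tau) + \Gamma_N^{(k)}(\tau),$$

where

$$\Gamma_P^{(k)}(\tau) = \sum_{i \in P} \frac{t(\tau)}{\mu_i^{(k)} t(\tau) + 1}\, \gamma_i^{(k)} u_i^{(k)}, \qquad \Gamma_N^{(k)}(\tau) = \theta(\tau)\, u_1^{(k)},$$

$$t(\tau) = \begin{cases} \tau & \text{if } \mu_1^{(k)} \ge 0 \text{ or } \tau < -1/\mu_1^{(k)}, \\ -1/\mu_1^{(k)} & \text{if } \mu_1^{(k)} < 0 \text{ and } \tau \ge -1/\mu_1^{(k)}, \end{cases}$$

$$\theta(\tau) = \begin{cases} 0 & \text{if } 1 \in P \text{ or } \mu_1^{(k)} \ge 0, \\ \max\{\tau + 1/\mu_1^{(k)},\ 0\} & \text{if } 1 \in N \text{ and } \mu_1^{(k)} < 0, \end{cases}$$

$$P = \{i : \gamma_i^{(k)} \ne 0\}, \qquad N = \{i : \gamma_i^{(k)} = 0\}, \qquad \gamma_i^{(k)} = -g_k^T u_i^{(k)}, \quad i = 1, \dots, n.$$

Here $\mu_1^{(k)} \le \dots \le \mu_n^{(k)}$ are the eigenvalues of $B_k$, and $u_i^{(k)}$, $i = 1, \dots, n$, are the corresponding orthonormal eigenvectors. When $\mu_1^{(k)} < 0$ and $1 \in P$, the path is unbounded as $\tau \to -1/\mu_1^{(k)}$, so only $\tau < -1/\mu_1^{(k)}$ is needed.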
The optimal path has two properties. As a point $x$ proceeds from $x_k$ along the path, i) the distance to $x_k$ increases monotonically, and ii) the value of $q_k(\delta)$ decreases monotonically. These properties guarantee that for any given $\Delta_k$, the solution of the trust region subproblem is $\delta^{(k)} = \Gamma^{(k)}(\tau_k)$, where $\tau_k$ is uniquely determined from the equation $\|\Gamma^{(k)}(\tau)\| = \Delta_k$. Formulating the path $\Gamma^{(k)}(\tau)$ requires the calculation of all eigenvalues and eigenvectors of the matrix $B_k$. This is time consuming and unrealistic in practice. J.P. Bulteau and J.-Ph. Vial [1] restrict the solution of the trust region subproblem to a two-dimensional subspace $S = \mathrm{span}[a_1, a_2]$ and choose the solution of the problem

$$\min\ q_k(\delta) = f_k + g_k^T \delta + \tfrac{1}{2}\delta^T B_k \delta \quad \text{s.t.}\ \|\delta\| \le \Delta_k,\ \delta \in S,$$

as an approximate solution of the subproblem. The vectors $a_1$ and $a_2$ are chosen so that $A = [a_1, a_2]$ satisfies $A^T A = I$. Then any $\delta \in S$ can be expressed as $\delta = Az$ for some $z \in \mathbb{R}^2$, and the restricted subproblem simplifies to

$$\min\ q_k(z) = f_k + g_k^T A z + \tfrac{1}{2} z^T A^T B_k A z \quad \text{s.t.}\ \|z\| \le \Delta_k.$$
This is a trust region subproblem in the space $\mathbb{R}^2$ and can easily be solved using the optimal path method. Forming the optimal path $\Gamma_S^{(k)}(\tau)$ only requires the whole eigensystem of the $(2 \times 2)$-matrix $A^T B_k A$. The subspace $S$ is generally chosen in the following way:

$$S = \begin{cases} \mathrm{span}[-g_k,\ -B_k^{-1} g_k] & \text{if } B_k \text{ P.D.}, \\ \mathrm{span}[-g_k,\ u_1^{(k)}] & \text{if } B_k \text{ I.D.}, \end{cases}$$

where P.D. and I.D. denote positive definite and indefinite, respectively. In fact, the path $\Gamma_S^{(k)}(\tau)$ can be regarded as a projection of the optimal path $\Gamma^{(k)}(\tau)$ onto the subspace $S$ and is an approximation of the path $\Gamma^{(k)}(\tau)$. Recent efficient solution methods for trust region subproblems build on this idea. They first form an approximate path $\tilde\Gamma^{(k)}(\tau)$ and then choose $\delta^{(k)} = \tilde\Gamma^{(k)}(\tau_k)$ with $\|\tilde\Gamma^{(k)}(\tau_k)\| = \Delta_k$. This greatly reduces the complexity of solving the trust region subproblem. For any given $\Delta_k$, the solution is obtained from the equation $\|\tilde\Gamma^{(k)}(\tau)\| = \Delta_k$, and the path does not have to be reformulated for a reduced $\Delta_k$. An approximate path is required to satisfy the two properties i) and ii) of the optimal path.
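For positive definite $B_k$, the two-dimensional subspace reduction can be sketched in a few lines. The sketch builds an orthonormal basis $A$ of $\mathrm{span}[-g_k, -B_k^{-1} g_k]$, forms the $2 \times 2$ matrix $A^T B_k A$, and solves the small subproblem using its full eigensystem. This NumPy illustration rests on several assumptions: it assumes the easy case, and it uses bisection on $\mu$ in place of an explicit optimal-path parameterization. The function names are hypothetical. The indefinite choice $\mathrm{span}[-g_k, u_1^{(k)}]$ is not covered.

```python
import numpy as np


def solve_small_tr(H, c, delta, iters=200):
    """Easy-case solution of min c^T z + 0.5 z^T H z, ||z|| <= delta, for small
    symmetric H, using the eigensystem of H and bisection on mu."""
    lam, V = np.linalg.eigh(H)
    gam = V.T @ c                                 # coordinates of c in the eigenbasis

    def z_of(mu):
        return -V @ (gam / (lam + mu))            # z(mu) = -(H + mu I)^{-1} c

    if lam[0] > 0 and np.linalg.norm(z_of(0.0)) <= delta:
        return z_of(0.0)                          # unconstrained minimizer is feasible
    lo = max(0.0, -lam[0])
    hi = lo + np.linalg.norm(c) / delta + 1.0     # guarantees ||z(hi)|| < delta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(z_of(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return z_of(hi)


def subspace_step(B, g, delta):
    """Two-dimensional subspace step with S = span[-g, -B^{-1} g] (B positive definite)."""
    S = np.column_stack([-g, -np.linalg.solve(B, g)])
    A, _ = np.linalg.qr(S)                        # orthonormal columns: A^T A = I
    z = solve_small_tr(A.T @ B @ A, A.T @ g, delta)
    return A @ z
```

Because $-B_k^{-1} g_k$ lies in $S$, the sketch returns the Newton step whenever it fits inside the trust region, consistent with condition C).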

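Dogleg path methods are among the most effective methods of this kind. They were first suggested by M.J.D. Powell, modified by Dennis and H.H.W. Mei [3] for positive definite matrices, and extended to indefinite matrices by J.Z. Zhang and C.X. Xu [15]. Powell's method uses a single dogleg path of the form

$$\Gamma_S^{(k)} = (0,\ \delta_C^{(k)},\ \delta_N^{(k)})$$

to approximate the optimal path, where

$$\delta_C^{(k)} = -\frac{g_k^T g_k}{g_k^T B_k g_k}\, g_k$$

is the minimizer of $q_k(\delta)$ in the steepest descent direction and

$$\delta_N^{(k)} = -B_k^{-1} g_k$$

is the global minimizer of $q_k(\delta)$. Then the solution $\delta^{(k)}$ of the problem

$$\min\{q_k(\delta) : \|\delta\| \le \Delta_k,\ \delta \in \Gamma_S^{(k)}\}$$

is taken as an approximate solution of the trust region subproblem. The solution $\delta^{(k)}$ can be obtained in the following way:

$$\delta^{(k)} = \begin{cases} -B_k^{-1} g_k & \text{if } \|\delta_N^{(k)}\| \le \Delta_k, \\ -\dfrac{\Delta_k}{\|g_k\|}\, g_k & \text{if } \|\delta_C^{(k)}\| \ge \Delta_k, \\ \delta_C^{(k)} + t_k\,(\delta_N^{(k)} - \delta_C^{(k)}) & \text{otherwise}, \end{cases}$$

where $t_k \in (0, 1)$ is determined from the equation $\|\delta_C^{(k)} + t(\delta_N^{(k)} - \delta_C^{(k)})\| = \Delta_k$. Dennis and Mei modified the single dogleg path to form a double dogleg path

$$\Gamma_D^{(k)} = (0,\ \delta_C^{(k)},\ \hat\delta_N^{(k)},\ \delta_N^{(k)}),$$

where $\hat\delta_N^{(k)} = \eta_k\, \delta_N^{(k)}$ with

$$\eta_k = 0.8\,\gamma_k + 0.2, \qquad \gamma_k = \frac{\|g_k\|^4}{(g_k^T B_k g_k)(g_k^T B_k^{-1} g_k)}.$$

The solution $\delta^{(k)}$ in both methods satisfies the general conditions A) and C). Hence both methods are globally convergent, and the convergence rate is quadratic if $G(x^*)$ is positive definite at the limit point $x^*$.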
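Powell's single dogleg step follows directly from the three cases of the single dogleg formula. The NumPy sketch below assumes $B_k$ is positive definite, so that $g_k^T B_k g_k > 0$. The name `dogleg` is hypothetical.

```python
import numpy as np


def dogleg(B, g, delta):
    """Powell's single dogleg step for min g^T d + 0.5 d^T B d, ||d|| <= delta.

    Assumes B is symmetric positive definite.
    """
    d_newton = -np.linalg.solve(B, g)                 # global minimizer of q
    if np.linalg.norm(d_newton) <= delta:
        return d_newton
    d_cauchy = -(g @ g) / (g @ B @ g) * g             # minimizer along -g
    if np.linalg.norm(d_cauchy) >= delta:
        return -delta / np.linalg.norm(g) * g
    # Positive root t of ||d_cauchy + t (d_newton - d_cauchy)||^2 = delta^2;
    # c < 0 here, so exactly one root is positive, and it lies in (0, 1).
    p = d_newton - d_cauchy
    a, b, c = p @ p, 2.0 * (d_cauchy @ p), d_cauchy @ d_cauchy - delta ** 2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return d_cauchy + t * p
```

The boundary equation for $t_k$ is a scalar quadratic. Because $\|\delta_C^{(k)}\| < \Delta_k < \|\delta_N^{(k)}\|$ on the middle branch, taking its larger root always gives $t_k \in (0, 1)$.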


Both the single and the double dogleg path methods work well when all the matrices $B_k$ are positive definite. However, they cannot deal with the nonpositive definite case, which occurs very often in practice. Zhang and Xu proposed indefinite dogleg paths for indefinite matrices $B_k$. One of these indefinite dogleg paths has the form

$$\Gamma_I^{(k)} = (0,\ \delta_S^{(k)},\ \delta_\mu^{(k)},\ d),$$

where

$$\delta_S^{(k)} = -\frac{g_k^T g_k}{g_k^T (B_k + \mu_k I)\, g_k}\, g_k, \qquad \delta_\mu^{(k)} = -(B_k + \mu_k I)^{-1} g_k.$$

Here $d$ is a negative curvature direction of the matrix $B_k$, and $\mu_k$ is chosen such that $B_k + \mu_k I$ is positive definite. Since the optimal path $\Gamma^{(k)}(\tau)$ is infinite when $B_k$ is indefinite, this is an infinite dogleg path. The solution $\delta^{(k)}$ obtained on this path has the following form:

$$\delta^{(k)} = \begin{cases} -\dfrac{\Delta_k}{\|g_k\|}\, g_k & \text{if } \|\delta_S^{(k)}\| \ge \Delta_k, \\ \delta_\mu^{(k)} + t_k d,\ \ t_k > 0 & \text{if } \|\delta_\mu^{(k)}\| < \Delta_k, \\ \delta_S^{(k)} + t_k(\delta_\mu^{(k)} - \delta_S^{(k)}),\ \ t_k \in (0, 1) & \text{otherwise}. \end{cases}$$

In both cases $t_k$ is determined from the equation $\|\delta^{(k)}(t)\| = \Delta_k$. When this indefinite dogleg path is combined with either the single or the double dogleg path, the resulting trust region method generates solutions $\delta^{(k)}$ satisfying all three general conditions A)-C).
The negative curvature direction $d$ can be obtained efficiently from the Bunch-Parlett factorization

$$P B_k P^T = L D L^T$$

for symmetric matrices [2]. Here $P$ is a permutation matrix, $L$ is a unit lower triangular matrix, and $D$ is a block diagonal matrix with $1 \times 1$ and $2 \times 2$ diagonal blocks. Since the matrices $B_k$ and $D$ have the same inertia, the negative curvature directions of $B_k$ can be obtained directly from the eigenvectors of $D$ corresponding to its negative eigenvalues. Let $v_1$ be the eigenvector corresponding to the smallest eigenvalue of $D$, and let $v = (v_1^T L^{-1} P g_k)\, v_1$. Then

$$d = -\mathrm{sgn}(g_k^T P^T L^{-T} v)\, P^T L^{-T} v$$

is a satisfactory negative curvature direction of $B_k$ [15].
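The negative curvature computation can be sketched under some stated assumptions. The sketch uses SciPy's `scipy.linalg.ldl` (SciPy 1.1 or later). That function wraps LAPACK's symmetric indefinite factorization, which uses Bunch-Kaufman pivoting rather than Bunch-Parlett. Both produce an $L D L^T$ factorization with $1 \times 1$ and $2 \times 2$ blocks in which $D$ has the inertia of $B_k$. SciPy returns the row-permuted factor `lu` $= P^T L$, so solving with `lu.T` gives $P^T L^{-T} v_1$. For simplicity the sketch uses $v_1$ itself and fixes the sign afterwards, instead of the scaled vector $v$ of the text. The function name is hypothetical.

```python
import numpy as np
from scipy.linalg import ldl


def negative_curvature_direction(B, g):
    """Return d with d^T B d < 0 and g^T d <= 0, or None if D (hence B) has no
    negative eigenvalue.

    scipy.linalg.ldl gives B = lu @ D @ lu.T with lu a row-permuted unit lower
    triangular matrix and D block diagonal (1x1 and 2x2 blocks).
    """
    lu, D, _ = ldl(B, lower=True)
    w, V = np.linalg.eigh(D)                # D is symmetric with the inertia of B
    if w[0] >= 0:
        return None
    d = np.linalg.solve(lu.T, V[:, 0])      # d^T B d = v1^T D v1 = w[0] < 0
    return -d if g @ d > 0 else d           # choose the sign so that g^T d <= 0
```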
For large-scale problems, methods based on matrix factorizations are generally not suitable. T. Steihaug [14] proposed using the conjugate gradient method on the system $B_k \delta = -g_k$ to find an approximate solution of the trust region subproblem. Given $\delta_j$, $h_j$ and $d_j$, the iteration of the conjugate gradient method has the form

$$\delta_{j+1} = \delta_j + \alpha_j d_j, \qquad \alpha_j = \frac{h_j^T h_j}{d_j^T B_k d_j},$$

$$h_{j+1} = h_j - \alpha_j B_k d_j,$$

$$d_{j+1} = h_{j+1} + \beta_j d_j, \qquad \beta_j = \frac{h_{j+1}^T h_{j+1}}{h_j^T h_j}, \qquad j = 1, \dots, n.$$

When this iteration is used to generate an approximate solution of the trust region subproblem, the solution is obtained in the following way. Assume that $d_j^T B_k d_j > 0$ for $j = 1, \dots, i-1$ and $\|\delta_i\| < \Delta_k$, and that $h_i$ and $d_i$ have been calculated. Suppose either $d_i^T B_k d_i \le 0$, or $d_i^T B_k d_i > 0$ but $\|\delta_i + \alpha_i d_i\| \ge \Delta_k$. Then $\delta_i + t_i d_i$ is chosen as $\delta^{(k)}$, where the value $t_i > 0$ is determined from the equation $\|\delta_i + t d_i\| = \Delta_k$. Now suppose $d_j^T B_k d_j > 0$ for all $j = 1, \dots, n$ and $\|\delta_{n+1}\| < \Delta_k$. Then it follows from the properties of the conjugate gradient method that the matrix $B_k$ is positive definite, $h_{n+1} = 0$, and $\delta_{n+1} = -B_k^{-1} g_k$ is the exact solution of the system $B_k \delta = -g_k$. Therefore $\delta_{n+1}$ is the desired solution of the trust region subproblem. The conjugate gradient method for the solution of the trust region subproblem has the following form.

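1) Set $\delta_1 = 0$, $h_1 = -g_k$, $d_1 = h_1$, choose $\eta_k \in (0, 1)$, and set $j = 1$.

2) If $d_j^T B_k d_j \le 0$, go to 6) below; otherwise calculate $\alpha_j = h_j^T h_j / d_j^T B_k d_j$ and $\delta_{j+1} = \delta_j + \alpha_j d_j$.

3) If $\|\delta_{j+1}\| \ge \Delta_k$, go to 6) below.

4) Calculate $h_{j+1} = h_j - \alpha_j B_k d_j$. If $\|h_{j+1}\| \le \eta_k \|g_k\|$, choose $\delta^{(k)} = \delta_{j+1}$ and stop.

5) Calculate $\beta_j = h_{j+1}^T h_{j+1} / h_j^T h_j$ and $d_{j+1} = h_{j+1} + \beta_j d_j$, set $j = j + 1$, then go to 2).

6) Determine $t_j > 0$ such that $\|\delta_j + t_j d_j\| = \Delta_k$, and choose $\delta^{(k)} = \delta_j + t_j d_j$.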
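Steps 1)-6) translate almost line by line into code. The following NumPy sketch is illustrative only: the function names and the default $\eta_k$ are hypothetical. The helper `_to_boundary` returns the positive root $t$ of $\|\delta_j + t d_j\| = \Delta_k$ that step 6) requires.

```python
import numpy as np


def _to_boundary(x, p, delta):
    """Positive t with ||x + t p|| = delta, assuming ||x|| < delta."""
    a, b, c = p @ p, 2.0 * (x @ p), x @ x - delta ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def steihaug_cg(B, g, delta, eta=1e-6):
    """Truncated CG for min g^T x + 0.5 x^T B x subject to ||x|| <= delta."""
    n = g.size
    x = np.zeros(n)                                    # step 1: delta_1 = 0
    h = -g                                             # h_1 = -g (negative residual)
    d = h.copy()                                       # d_1 = h_1
    for _ in range(n):
        Bd = B @ d
        curv = d @ Bd
        if curv <= 0:                                  # step 2: nonpositive curvature
            return x + _to_boundary(x, d, delta) * d   # -> step 6
        alpha = (h @ h) / curv
        x_new = x + alpha * d
        if np.linalg.norm(x_new) >= delta:             # step 3: left the region
            return x + _to_boundary(x, d, delta) * d   # -> step 6
        h_new = h - alpha * Bd                         # step 4: residual test
        if np.linalg.norm(h_new) <= eta * np.linalg.norm(g):
            return x_new
        beta = (h_new @ h_new) / (h @ h)               # step 5: new direction
        d = h_new + beta * d
        x, h = x_new, h_new
    return x
```

Only products $B_k d_j$ are needed. This is why the method suits large problems where factorizing $B_k$ is too expensive.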

In fact, the conjugate gradient method for the approximate solution of the trust region subproblem is essentially a multiple dogleg path method. When the matrix $B_k$ is positive definite, the multiple dogleg path is

$$\Gamma_{PD}^{(k)} = (0,\ \delta_2,\ \dots,\ \delta_{n+1}),$$

and when the matrix $B_k$ is indefinite, the multiple dogleg path is

$$\Gamma_{ID}^{(k)} = (0,\ \delta_2,\ \dots,\ \delta_i,\ d_i),$$

where $d_i$ is the first negative curvature direction calculated in the iteration process. The solution $\delta^{(k)}$ is then obtained either on the path $\Gamma_{PD}^{(k)}$ or on the path $\Gamma_{ID}^{(k)}$.
An azeotrope occurs when a liquid mixture boils at a constant temperature, producing a vapor with the same composition as the liquid. This is a significant phenomenon in distillation processes. An azeotrope represents a barrier to distillation, beyond which further separation of the components in the mixture is not possible. Therefore, locating all azeotropes in a chemical mixture is more than a thermodynamic curiosity; it is important for a number of reasons. First, locating all azeotropes in a chemical mixture is essential for the synthesis of separation processes. A separation process that is designed without complete knowledge about the azeotropes in the system is likely to be infeasible.

In addition, many thermodynamic models have 
been proposed which can predict the phase behavior 
of nonideal mixtures. Unfortunately, the accuracy of 
these models is not uniform over a wide range of mix- 
tures. One useful way of testing the accuracy of a model 
for a given mixture is to compare the compositions of 
the azeotropes predicted by the model with those deter- 
mined by experiment. 

Despite the considerable interest in the area of pre- 
dicting phase equilibria for chemical mixtures, rela- 
tively few methods for enumerating the azeotropes for 
a given system have been reported. The thermodynamic 
conditions for azeotropy constitute a nonlinear system 
of equations. This problem presents the additional chal- 
lenge of finding all of the solutions to the nonlinear sys- 
tem of equations where the number of solutions is not 
known. 


Nonlinear System of Equations 


The most common type of azeotrope studied to date is the homogeneous azeotrope, which occurs when a single liquid phase is in equilibrium with the vapor phase. The thermodynamic conditions for homogeneous azeotropy are therefore:

1) equilibrium between the vapor phase and a single liquid phase, and

2) the composition of the vapor phase is identical to the composition of the liquid phase.

Two other classifications of azeotropes, heterogeneous and reactive, are discussed later in this article.


Condition 1: Phase Equilibrium 


The equilibrium condition requires that the chemical potential of each component be the same in the liquid and the vapor phases. For a mixture of $N$ components,

$$\mu_i^V = \mu_i^L, \qquad \forall i \in N,$$

where $\mu_i^V$ and $\mu_i^L$ represent the chemical potential of component $i$ in the vapor and liquid phases. From the definition of the fugacity of component $i$ in a mixture,

$$\hat f_i^V = \hat f_i^L, \qquad \forall i \in N,$$

hence,

$$y_i\, \hat\phi_i^V P = x_i\, \gamma_i\, f_i^L, \qquad \forall i \in N. \tag{1}$$

The symbol $\hat\phi_i^V$ represents the mixture fugacity coefficient of component $i$ in the vapor phase. For the liquid phase, $\gamma_i$ is the activity coefficient, and $f_i^L$ is the fugacity of pure component $i$ in the liquid phase. Rearranging (1) gives

$$\frac{y_i}{x_i} = \frac{\gamma_i f_i^L}{\hat\phi_i^V P}, \qquad \forall i \in N.$$

A common simplification is to assume that at low pressure the vapor phase can be modeled as an ideal gas, for which $\hat\phi_i^V = 1$. For the liquid phase, the fugacity is $f_i^L = \phi_i^{sat} P_i^{sat} (PF)_i$, where $(PF)_i$ is the Poynting factor. For an ideal gas $\phi_i^{sat} = 1$, and at low pressure $(PF)_i \approx 1$. Therefore,

$$\frac{y_i}{x_i} = \frac{\gamma_i P_i^{sat}}{P}, \qquad \forall i \in N. \tag{2}$$


Condition 2: Equality of Phase Compositions 


The azeotropy condition requires that the composition of the vapor phase be identical to the composition of the liquid phase:

$$y_i = x_i, \qquad \forall i \in N. \tag{3}$$

Equations must also be added to require that the mole fractions in each phase sum to unity and take values between 0 and 1:

$$\sum_{i \in N} y_i = \sum_{i \in N} x_i = 1, \qquad 0 \le y_i, x_i \le 1, \quad \forall i \in N. \tag{4}$$


Equations (2), (3), (4) constitute the nonlinear system of equations satisfied by a homogeneous azeotrope. Typical nonlinear equation solvers cannot be used robustly to find all of the solutions of this system, since it is a nonlinear, constrained problem with multiple solutions. Many systems contain multiple azeotropes, each of which is a solution of the system. In addition, each pure component is a solution, giving at least $N$ solutions.
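With (3), condition (2) reduces to $P = \gamma_i P_i^{sat}$ for every component. A binary homogeneous azeotrope at fixed temperature is therefore an interior composition with $\gamma_1 P_1^{sat} = \gamma_2 P_2^{sat}$. The sketch below locates such a point by bisection for a hypothetical one-parameter (two-suffix) Margules model, $\ln\gamma_1 = A x_2^2$ and $\ln\gamma_2 = A x_1^2$, with made-up parameters. This model is not one of those used in the cited work (Wilson, NRTL, UNIQUAC, UNIFAC). Being a local bracketing method, the sketch only illustrates the equations; it does not enclose all solutions.

```python
import math


def margules_gamma(x1, A):
    """Hypothetical two-suffix Margules model: ln g1 = A x2^2, ln g2 = A x1^2."""
    x2 = 1.0 - x1
    return math.exp(A * x2 * x2), math.exp(A * x1 * x1)


def binary_azeotrope(A, p1_sat, p2_sat, tol=1e-12):
    """Interior x1 with gamma1 P1sat = gamma2 P2sat at fixed T, and the pressure P.

    With y = x (3), both equations (2) read P = gamma_i P_i^sat, so their
    difference F(x1) = ln(g1 P1sat) - ln(g2 P2sat) must vanish.
    """
    def F(x1):
        g1, g2 = margules_gamma(x1, A)
        return math.log(g1 * p1_sat) - math.log(g2 * p2_sat)

    lo, hi = 0.0, 1.0
    if F(lo) * F(hi) > 0:
        return None                  # no sign change: no interior root bracketed
    while hi - lo > tol:             # bisection
        mid = 0.5 * (lo + hi)
        if F(lo) * F(mid) <= 0:
            hi = mid
        else:
            lo = mid
    x1 = 0.5 * (lo + hi)
    g1, _ = margules_gamma(x1, A)
    return x1, g1 * p1_sat
```

For this model $F$ is linear in $x_1$, so the root has the closed form $x_1 = \tfrac12 + \ln(P_1^{sat}/P_2^{sat})/(2A)$. This makes the sketch easy to check.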


Solution Methodologies 


Most of the previous work reported in the literature has 
been limited to calculating homogeneous azeotropes 
using local nonlinear equation solvers. This means that 
the ability of these methods to predict azeotropes is de- 
pendent upon choosing good starting points for the so- 
lution technique. These methods cannot guarantee that 
all of the azeotropes have been located in a particular 
system. [1] calculated ternary homogeneous azeotropes 
using the Wilson model under isothermal conditions. 
[10] calculated homogeneous azeotropes of binary mix- 
tures using an equation of state as the thermodynamic 
model. Their approach was to fix temperature and vary 
composition and volume until thermodynamic equilib- 
rium conditions were satisfied. [12] also used an equa- 
tion of state to calculate homogeneous azeotropes for 
binary mixtures. 

[2] presented a search method for finding ho- 
mogeneous and heterogeneous azeotropes which uses 
a Levenberg-Marquardt algorithm to find homoge- 
neous azeotropes and then checks the stability of each 
solution with the tangent plane criterion described by 
[8]. A solution which is found to be unstable is then 
used as the starting point for a new search for a heterogeneous azeotrope. Again, this technique relies on
local solution techniques and cannot guarantee that all 
azeotropes are located. 

[4] proposed a method for locating all homoge- 
neous azeotropes in multicomponent systems. This 
method uses a homotopy continuation technique for 
tracking branches of solutions to the nonlinear system 
of equations. They demonstrated that the technique 
performed robustly for systems containing up to five 
components using the Wilson activity coefficient equa- 
tion. Recently, [3] have extended this method for het- 
erogeneous azeotropes, and [9] have used it to predict 
reactive azeotropes. [7] have formulated the problem of
locating all homogeneous azeotropes as a global opti- 
mization problem in which each global minimum solu- 
tion corresponds to an azeotrope. They applied this ap- 
proach robustly using the Wilson, NRTL, UNIQUAC 
and UNIFAC activity coefficient equations for systems 
containing up to five components. 

An excellent review on nonideal distillation, includ- 
ing a discussion on the computation of azeotropes has 
recently been published in [13]. 


Global Optimization Approach 


In order to find all azeotropes, one must find all solu- 
tions to the system of nonlinear equations constituted 
by (2), (3), and (4). The method proposed in [6] refor- 
mulates the problem of enclosing all solutions of non- 
linear systems of constrained equations into a global 
optimization problem in which the task is to enclose all 
global minimum solutions. In this approach, each non- 
linear equality is replaced by two inequalities and a sin- 
gle slack variable is introduced. For the location of all 
homogeneous azeotropes, this corresponds to employ- 
ing equations (2), (3), and (4) and reformulating them 
as the following global optimization problem: 


$$\begin{aligned} \min\ & s \\ \text{s.t. } & \ln P - \ln P_i^{sat} - \ln\gamma_i - s \le 0, && \forall i \in N, \\ & -\ln P + \ln P_i^{sat} + \ln\gamma_i - s \le 0, && \forall i \in N, \\ & \sum_{i \in N} x_i = 1, \quad 0 \le x_i \le 1, && \forall i \in N, \\ & s \ge 0. \end{aligned} \tag{5}$$

Problem (5) may have multiple global minima. Each global minimum of Problem (5) with $s^* = 0$ corresponds to a homogeneous azeotrope, since for $s = 0$ the constraints (2), (3), and (4) are satisfied. Note that the first two sets of constraints of (5) are the nonlinear equilibrium equations (2), each written as two inequalities. In addition, the nonlinear term $\ln\gamma_i$ appears with both a positive and a negative sign. Thus, this term must be nonconvex in at least one of the two constraints. This means that if a local optimization approach is used to solve Problem (5), some or all of the global solutions may be missed.
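At a fixed composition and temperature, the best choice of $\ln P$ in (5) can be written down explicitly. Both constraint sets say $|\ln P - (\ln P_i^{sat} + \ln\gamma_i)| \le s$. The smallest feasible $s$ is therefore half the spread of the terms $\ln P_i^{sat} + \ln\gamma_i$, attained at their midpoint. The following sketch shows that $s = 0$ exactly at an azeotrope and that $s$ is positive elsewhere. The helper names, the Margules model, and its parameters are assumptions for illustration.

```python
import math


def min_slack(log_terms):
    """Smallest s meeting both constraint sets of (5) at a fixed composition and T.

    log_terms[i] = ln P_i^sat + ln gamma_i. The constraints say
    |ln P - log_terms[i]| <= s, so the best ln P is the midpoint of the extremes.
    Returns (s, ln P).
    """
    a_max, a_min = max(log_terms), min(log_terms)
    return 0.5 * (a_max - a_min), 0.5 * (a_max + a_min)


def margules_log_terms(x1, A, p1_sat, p2_sat):
    """ln P_i^sat + ln gamma_i for a hypothetical binary Margules model."""
    x2 = 1.0 - x1
    return [math.log(p1_sat) + A * x2 * x2, math.log(p2_sat) + A * x1 * x1]
```

This evaluates $s$ only at one composition. Finding every composition with $s = 0$ is the global problem that (5) poses.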


For azeotropes in which fewer than $N$ of the components participate (a $k$-ary azeotrope, where $k < N$), the case where $x_i = 0$ for one or more components must be accounted for. This can be done by multiplying the equilibrium constraints used in (5) by $x_i$. The general search for all $k$-ary homogeneous azeotropes is formulated as:

$$\begin{aligned} \min\ & s \\ \text{s.t. } & x_i(\ln P - \ln P_i^{sat} - \ln\gamma_i) - s \le 0, && \forall i \in N, \\ & x_i(-\ln P + \ln P_i^{sat} + \ln\gamma_i) - s \le 0, && \forall i \in N, \\ & \sum_{i \in N} x_i = 1, \quad 0 \le x_i \le 1, && \forall i \in N, \\ & s \ge 0. \end{aligned} \tag{6}$$


0 | Start with an initial guess for a solution (this does not affect the convergence of the algorithm).

1 | Determine whether the current solution satisfies the original nonlinear system of equations.
  | IF it does, THEN store the solution; it corresponds to an azeotrope.

2 | Is the current region smaller than the minimum size tolerance?
  | IF it is, THEN fathom the region.

3 | Partition the region into two subdomains.

4 | Solve a lower bounding problem in each subdomain.
  | IF the solution is greater than zero, THEN fathom the region.
  | ELSE store the solution in the list of lower bounding solutions.

5 | Choose among the active regions the one with the minimum lower bounding solution and update the current solution point. Return to Step 1.
  | IF there are no regions remaining, THEN terminate.

Global optimization procedure
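The procedure in the table can be sketched for a single composition variable. This illustration rests on assumptions that differ from [7]. Instead of convex underestimators, the lower bound on a region $[a, b]$ is the Lipschitz bound $s(m) - L(b - a)/2$ at the midpoint $m$, which requires a known Lipschitz constant $L$ of $s$. The tolerances and the merging of nearby solutions are hypothetical choices. Comments map the code to Steps 1-5.

```python
import heapq
import math


def bb_enclose_roots(s, lipschitz, lo=0.0, hi=1.0,
                     s_tol=1e-6, min_width=1e-8, merge=1e-4):
    """Enclose all points of [lo, hi] where s(x) = 0 (s >= 0) by branch and bound.

    Lower bound on a region [a, b]: s(m) - L (b - a) / 2 with m its midpoint,
    valid when L is a Lipschitz constant of s (an assumption replacing the
    convex underestimators of [7]).
    """
    roots = []
    active = [(-math.inf, lo, hi)]              # (lower bound, a, b)
    while active:                               # Step 5: stop when no regions remain
        _, a, b = heapq.heappop(active)         # region with the smallest lower bound
        m = 0.5 * (a + b)                       # current solution point
        if s(m) <= s_tol and all(abs(m - r) > merge for r in roots):
            roots.append(m)                     # Step 1: store an approximate azeotrope
        if b - a < min_width:                   # Step 2: fathom tiny regions
            continue
        for a2, b2 in ((a, m), (m, b)):         # Step 3: bisect the region
            lb = s(0.5 * (a2 + b2)) - lipschitz * (b2 - a2) / 2   # Step 4
            if lb <= 0:                         # a zero of s cannot be excluded here
                heapq.heappush(active, (lb, a2, b2))
    return sorted(roots)
```

For the hypothetical binary Margules model, $s(x_1) = \tfrac12 |\ln(P_1^{sat}/P_2^{sat}) + A(1 - 2x_1)|$. In that case $L = A$ is an exact Lipschitz constant, and only regions that contain the azeotrope survive Step 4.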


In [7], the authors have applied a global optimiza- 
tion approach to enclose all of the solutions to Prob- 
lems (5) and (6). In this approach, each nonconvex term 
is identified and replaced by a convex underestimator.
The solution of the convexified problem is then a lower 
bound on the solution to the original nonconvex prob- 
lem. A branch and bound procedure is used to im- 
prove the lower bound. Regions that cannot contain an 
azeotrope have a positive lower bound and can be fath- 
omed. Regions whose lower bound is less than or equal 
to zero are refined further until all of the azeotropes 
have been located. [7] have analyzed the Wilson, NRTL, 
UNIQUAC, and UNIFAC equations and have devel- 
oped tight convex underestimators for all of the non- 
convex terms for these thermodynamic functions. 


Example 1 In this quaternary system (methanol, benzene, i-propanol, n-propanol), three binary azeotropes have been reported in the literature, as shown in Table 1. No experimental data were found for the ternary and quaternary systems.

Using the global optimization approach of [7], both the Wilson and NRTL equations predicted only the three reported azeotropes. The results for the Wilson equation are very close to the reported compositions and temperatures of the azeotropes. The results for the NRTL equation are also close to the reported values, with the exception of the methanol-benzene azeotrope.


Homotopy Continuation Approach 


The method proposed in [4] is based on tracking 
branches of solutions to nonlinear systems of equations 
that are perturbed from the original system. The key 
idea is to start with an equilibrium surface, on which all 
of the solutions are known a priori. The postulation is 
that every solution is connected to one of the branches 
of the initial surface. This surface is then gradually de- 
formed, and the solution branches are tracked, ending 
with the actual nonideal equilibrium surface. 

If the original equilibrium condition (2) is rearranged so that the vapor mole fraction $y_i$ is represented as a function of $x_i$, we obtain an equation of the form

$$y_i = \gamma_i \frac{P_i^{sat}}{P}\, x_i.$$

The ideal system can be represented by Raoult's law:

$$y_i = \frac{P_i^{sat}}{P}\, x_i.$$

Nonlinear Systems of Equations: Application to the Enclosure of All Azeotropes, Table 1
Solution for methanol (1) - benzene (2) - i-propanol (3) - n-propanol (4)

Azeo   x1      x2      x3      x4      T (deg C)
Experimental data from [5]
1-2    0.605   0.395   -       -       58.08
1-3    no azeotrope
1-4    no azeotrope
2-3    -       0.600   0.400   -       71.80
2-4    -       0.791   -       0.209   77.10
3-4    no azeotrope
Wilson equation
1-2    0.624   0.376   -       -       58.129
2-3    -       0.586   0.414   -       71.951
2-4    -       0.780   -       0.220   76.946
NRTL equation
1-2    0.063   0.937   -       -       80.166
2-3    -       0.588   0.412   -       71.832
2-4    -       0.776   -       0.224   77.131


One of the most convenient ways to represent the gradual deformation of the equilibrium surface is a linear convex combination of the ideal and nonideal equilibrium surfaces. The homotopy parameter $t$ determines the degree of deformation:

$$y_i = \left[(1 - t) + t\,\gamma_i\right] \frac{P_i^{sat}}{P}\, x_i.$$

The problem now is to find the roots of

$$h(x, t) = 0, \tag{7}$$

where

$$h(x, t) = y - x.$$

In [4], the authors use a homotopy continuation method to find the roots of (7). At $t = 0$, the pure components are used as the initial solutions of the ideal equilibrium equation. The homotopy parameter is then increased by a small amount and the solution branches are tracked. At each step, the determinant of $h_x$, the Jacobian matrix of $h$ with respect to $x$, is calculated.
Nonlinear Systems of Equations: Application to the Enclosure of All Azeotropes, Table 2
Solution for acetone (1) - chloroform (2) - methanol (3)

Azeo     x1       x2       x3       T (deg C)
Experimental data from [5]
1-2      0.360    0.640    -        64.50
1-3      0.800    -        0.200    55.70
2-3      -        0.653    0.347    53.35
1-2-3    0.318    0.241    0.441    57.67
Wilson equation
1-2      0.3437   0.6563   -        65.47
1-3      0.7944   -        0.2056   55.34
2-3      -        0.6481   0.3519   53.91
1-2-3    0.3234   0.2236   0.4530   57.58


If $\det[h_x] = 0$, then a bifurcation may occur, and an additional solution branch, which corresponds to an azeotrope, is started.

[4] applied this method to a range of well-understood systems using the Wilson activity coefficient equation. The homotopy continuation approach does not offer a theoretical guarantee of locating all azeotropes. Nevertheless, [4] observed that the method predicted all of the azeotropes that have been verified experimentally for these systems.
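For a binary mixture the bifurcation test can be made concrete. After eliminating $P$ through $y_1 + y_2 = 1$, $h$ depends on $x_1$ alone, and both pure components remain solutions for every $t$. A new branch bifurcates from a pure-component branch where $\partial h/\partial x_1$ changes sign along it. The sketch below scans $t$ and then bisects on the sign change. It is an illustration under assumptions: it uses a hypothetical Margules model with made-up parameters and a central finite difference for $h_x$. It also does not track the new branch, which the method of [4] does.

```python
import math


def h(x1, t, A, p1_sat, p2_sat):
    """h = y1 - x1 for y_i = [(1 - t) + t*gamma_i] P_i^sat x_i / P, with P from y1 + y2 = 1.

    gamma_i come from a hypothetical two-suffix Margules model (parameter A).
    """
    x2 = 1.0 - x1
    g1, g2 = math.exp(A * x2 * x2), math.exp(A * x1 * x1)
    k1 = ((1.0 - t) + t * g1) * p1_sat
    k2 = ((1.0 - t) + t * g2) * p2_sat
    return k1 * x1 / (k1 * x1 + k2 * x2) - x1


def h_x(x1, t, A, p1_sat, p2_sat, eps=1e-6):
    """Central finite-difference approximation of dh/dx1 (a 1 x 1 Jacobian)."""
    return (h(x1 + eps, t, A, p1_sat, p2_sat)
            - h(x1 - eps, t, A, p1_sat, p2_sat)) / (2.0 * eps)


def find_bifurcation(x_branch, A, p1_sat, p2_sat, steps=200, tol=1e-10):
    """Increase t from 0 to 1 along the pure-component branch x1 = x_branch and
    return the t at which h_x changes sign (a candidate bifurcation), or None."""
    jac = lambda t: h_x(x_branch, t, A, p1_sat, p2_sat)
    ts = [i / steps for i in range(steps + 1)]
    for t0, t1 in zip(ts, ts[1:]):
        if jac(t0) * jac(t1) <= 0:
            while t1 - t0 > tol:                # bisection on the sign change
                tm = 0.5 * (t0 + t1)
                if jac(t0) * jac(tm) <= 0:
                    t1 = tm
                else:
                    t0 = tm
            return 0.5 * (t0 + t1)
    return None
```

For this model $h_x(1, t) = (1 - t + t e^{A}) P_2^{sat}/P_1^{sat} - 1$. The branch $x_1 = 1$ therefore bifurcates at $t^* = (P_1^{sat}/P_2^{sat} - 1)/(e^{A} - 1)$, while $h_x$ stays positive on the branch $x_1 = 0$.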


Example 2 [4] examined the ternary system of acetone, chloroform, and methanol. They found that their homotopy continuation approach, using the Wilson equation, predicted all three binary azeotropes reported in the literature. Their results are shown in Table 2.


Heterogeneous Azeotropes 


Azeotropy is not limited to systems with a single liq- 
uid phase. In the more general case, where multiple liq- 
uid phases exist in equilibrium, a liquid mixture that 
boils at a constant temperature is called a heteroge- 
neous azeotrope. Heterogeneous azeotropes can be of 
two different types. Type I heterogeneous azeotropes 
occur when the overall liquid composition is identical 
to the vapor composition. Conversely, Type II hetero- 
geneous azeotropes occur when the overall liquid com- 
position is not equal to the vapor composition. Type II 
heterogeneous azeotropes are possible theoretically, but 
have never been verified experimentally. 


The phase equilibrium condition for a heterogeneous azeotrope in a system with $M$ liquid phases is written

$$\mu_i^V = \mu_i^{L_j}, \qquad \forall i \in N,\ \forall j \in M.$$

When the definitions of the chemical potentials are applied and simplifications are made, the expression becomes

$$y_i = \frac{\gamma_i^j\, x_i^j\, f_i^L}{\hat\phi_i^V P}, \qquad \forall i \in N,\ \forall j \in M.$$

In Type I heterogeneous azeotropes, the overall composition of the liquid is equal to the composition of the vapor,

$$y_i = \sum_{j=1}^{M} \psi^j x_i^j, \qquad \forall i \in N,$$

where $\psi^j$ is the mole fraction of liquid phase $j$ in the liquid mixture. The additional constraints are:


An extension of the homotopy continuation 
method for homogeneous azeotropes of [4] was devel- 
oped by [3] in order to examine the problem of finding 
all Type I heterogeneous azeotropes. 


Reactive Azeotropes 


Azeotropic behavior can also occur in reacting mixtures [11]. The authors of [11] derived necessary and sufficient conditions for reactive azeotropes. These are:

$$Y_i - X_i = 0, \qquad i = 1, \dots, N - 1,$$

where

$$X_i = \frac{\nu_k x_i - \nu_i x_k}{\nu_k - \nu_T x_k}, \qquad Y_i = \frac{\nu_k y_i - \nu_i y_k}{\nu_k - \nu_T y_k}.$$

Here $\nu_i$ is the vector of reaction stoichiometric coefficients for component $i$, the index $k$ refers to the reference component, and $\nu_T$ is the vector of the sums of the stoichiometric coefficients for each reaction.
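The transformed compositions are simple to evaluate for a single reaction. The minimal sketch below uses plain Python with scalar stoichiometric coefficients, so $\nu_T$ is just their sum. The reaction A + B ⇌ C in the usage check and the function names are hypothetical.

```python
def transformed(comp, nu, k):
    """Transformed compositions (nu_k c_i - nu_i c_k) / (nu_k - nu_T c_k), i != k,
    for a single reaction with stoichiometric coefficients nu and reference k."""
    nu_T = sum(nu)
    denom = nu[k] - nu_T * comp[k]
    return [(nu[k] * comp[i] - nu[i] * comp[k]) / denom
            for i in range(len(comp)) if i != k]


def reactive_azeotrope_residual(x, y, nu, k):
    """Residuals Y_i - X_i, i != k; all zero at a reactive azeotrope."""
    X = transformed(x, nu, k)
    Y = transformed(y, nu, k)
    return [Yi - Xi for Xi, Yi in zip(X, Y)]
```

Consider compositions that differ only by an extent of reaction, $y = (x + \xi\nu)/(1 + \xi\nu_T)$. They have identical transformed compositions, which is why $Y_i = X_i$ can hold even when $y \ne x$.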

In addition to this requirement, the system must also be in phase equilibrium and in chemical equilibrium. The chemical equilibrium expression for each reaction $r$ is written

$$\prod_{i \in \mathrm{react}} (x_i \gamma_i)^{|\nu_{i,r}|} - \frac{1}{K_r^{eq}} \prod_{i \in \mathrm{prod}} (x_i \gamma_i)^{\nu_{i,r}} = 0,$$

where $K_r^{eq}$ is the equilibrium constant for reaction $r$.

[9] have applied an arc-length continuation approach to search for all reactive azeotropes at fixed $K_{eq}$ for systems with a single chemical reaction.
> Contraction-Mapping

> Global Optimization Methods for Systems of Nonlinear Equations

> Nonlinear Least Squares: Newton-type Methods
Basic Problem Formulation


Sensitivity analysis problems typically reduce to determining the response of a vector $x^* = (x_1^*, \dots, x_n^*)$ to changes in a scalar $\alpha^*$, where $x^*$ and $\alpha^*$ are required to satisfy an $n$-dimensional system of nonlinear equations of the form

$$0 = \psi(x, \alpha) = (\psi^1(x, \alpha), \dots, \psi^n(x, \alpha))^T. \tag{1}$$
This problem formulation, with a scalar parameter $\alpha$, is more general than it might first appear. For example, suppose an analyst wishes to investigate the surface of function values $x = f(z)$ taken on by some function $f: \mathbb{R}^m \to \mathbb{R}^n$ as $z$ ranges over a specified region $Z$ in $\mathbb{R}^m$. One approach is to consider a suitably smooth curve $s: [0, 1] \to Z$ of the form $z = s(\alpha)$ which roughly fills this region. One then defines a new function $\psi(x, \alpha) = x - f(s(\alpha))$. Solving the system of equations $\psi(x, \alpha) = 0$ for $x$ as a function of $\alpha$, as $\alpha$ ranges from 0 to 1, yields a curve of points $x(\alpha)$ on the function surface. This curve gives some idea of the shape of the surface over the region $Z$.

Assume $\psi: \mathbb{R}^{n+1} \to \mathbb{R}^n$ is twice continuously differentiable and has a nonsingular Jacobian matrix $\psi_x(x^*, \alpha^*)$. Then the implicit function theorem guarantees the existence of a continuously differentiable function $x(\alpha)$, taking some neighborhood $N(\alpha^*)$ of $\alpha^*$ into $\mathbb{R}^n$, such that

$$0 = \psi(x(\alpha), \alpha), \qquad \alpha \in N(\alpha^*), \tag{2}$$

with $x(\alpha^*) = x^*$. From (2) one obtains the fundamental equation for sensitivity analysis,

$$\frac{dx(\alpha)}{d\alpha} = -\psi_x(x(\alpha), \alpha)^{-1}\, \psi_\alpha(x(\alpha), \alpha), \qquad \alpha \in N(\alpha^*). \tag{3}$$


As it stands, (3) is an analytically incomplete system of ordinary differential equations. That is, a closed form representation of the Jacobian inverse $J(\alpha)^{-1} = \psi_x(x(\alpha), \alpha)^{-1}$ as a function of $\alpha$ is often not obtainable for $n > 3$. Thus, integrating (3) from initial conditions would typically require the supplementary algebraic determination of the Jacobian inverse $J(\alpha)^{-1}$ at each step of the integration process.

Why not simply incorporate a linear equation solver to accomplish the needed matrix inversions? Two reasons can be given. First, the Jacobian matrix might have one or more eigenvalues which are small in absolute value. Consequently, as can be seen using a singular value decomposition, the inverse matrix can be highly ill-conditioned: its elements can have large absolute values and take on both positive and negative values. In this case, small roundoff and truncation errors can cause large errors in the numerically determined components of the sensitivity vector $dx(\alpha)/d\alpha$. Second, there exists an alternative approach [14] that has proven its reliability and efficiency in numerous contexts over the past twenty years: replace the algebraic operation of matrix inversion by an initial value problem highly suited for modern digital computers.

The latter approach is taken in [10]. The differential system (3) is extended by incorporating ordinary differential equations for the Jacobian inverse. More precisely, let $A(\alpha)$ and $\delta(\alpha)$ denote the adjoint and the determinant of the Jacobian matrix $J(\alpha)$. The inverse of any nonsingular matrix can be represented as the ratio of its adjoint to its determinant. On this basis, the following differential system is validated for $x(\alpha)$, $A(\alpha)$, and $\delta(\alpha)$:

$$\frac{dx(\alpha)}{d\alpha} = -\frac{A(\alpha)\, \psi_\alpha(x(\alpha), \alpha)}{\delta(\alpha)}, \tag{4}$$

$$\frac{dA(\alpha)}{d\alpha} = \frac{A(\alpha)\, \mathrm{Trace}(A(\alpha) B(\alpha)) - A(\alpha) B(\alpha) A(\alpha)}{\delta(\alpha)}, \tag{5}$$

$$\frac{d\delta(\alpha)}{d\alpha} = \mathrm{Trace}(A(\alpha) B(\alpha)). \tag{6}$$

The $ij$th component of the matrix $B(\alpha) = dJ(\alpha)/d\alpha$ appearing in equations (5) and (6) is

$$\sum_{k=1}^{n} \psi^i_{jk}(x(\alpha), \alpha)\, \frac{dx_k(\alpha)}{d\alpha} + \psi^i_{j,n+1}(x(\alpha), \alpha), \tag{7}$$

where $\psi^i_{jk}$ denotes the second partial of $\psi^i$ with respect to $x_j$ and $x_k$, and $\psi^i_{j,n+1}$ denotes the second partial of $\psi^i$ with respect to $x_j$ and $\alpha$. Given (4), each of the components (7) is expressible as a known function of $x(\alpha)$, $A(\alpha)$, $\delta(\alpha)$, and $\alpha$. Initial conditions for equations (4)-(6) must be provided at a parameter point $\alpha^*$. They are given by values for $x(\alpha^*)$, $A(\alpha^*)$, and $\delta(\alpha^*)$ satisfying $0 = \psi(x(\alpha^*), \alpha^*)$, $A(\alpha^*) = \mathrm{Adj}(J(\alpha^*))$, and $\delta(\alpha^*) = \mathrm{Det}(J(\alpha^*)) \ne 0$.

In summary, the system of equations (4)-(6) provides an analytically complete system of ordinary differential equations for the nonlocal sensitivity analysis of the original system of interest, $0 = \psi(x, \alpha)$. That is, it permits the tracking of the solution vector $x(\alpha)$ and the sensitivity vector $dx(\alpha)/d\alpha$, together with the adjoint $A(\alpha)$ and the determinant $\delta(\alpha)$ of the Jacobian matrix $J(\alpha)$, over any $\alpha$-interval $[\alpha^*, \alpha^{**}]$ where the determinant remains nonzero.
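A small worked instance of (4)-(7) can be integrated with a standard Runge-Kutta scheme. This sketch is illustrative only. The two-equation system $\psi(x, \alpha) = (x_1^2 - \alpha,\ x_1 x_2 - 1)$ is a hypothetical example, chosen because $x(\alpha) = (\sqrt{\alpha}, 1/\sqrt{\alpha})$, $A(\alpha)$, and $\delta(\alpha) = 2\alpha$ are known in closed form. The second partials in (7) are hand-coded, and classical RK4 replaces the Adams-Moulton scheme used in [10].

```python
import numpy as np

# Hypothetical test system (not from the source):
#   psi(x, a) = (x1^2 - a, x1*x2 - 1),  exact solution x(a) = (sqrt(a), 1/sqrt(a)).


def psi_alpha(x, a):
    """Partial derivative of psi with respect to the parameter a."""
    return np.array([-1.0, 0.0])


def b_matrix(x, dx):
    """B = dJ/da from (7): sum_k psi^i_{jk} dx_k/da + psi^i_{j,a} (last term is 0 here)."""
    return np.array([[2.0 * dx[0], 0.0],
                     [dx[1], dx[0]]])


def rhs(a, state):
    """Right-hand side of (4)-(6); state = (x, A flattened row-wise, delta)."""
    x, A, d = state[:2], state[2:6].reshape(2, 2), state[6]
    dx = -A @ psi_alpha(x, a) / d                       # (4)
    AB = A @ b_matrix(x, dx)
    dA = (A * np.trace(AB) - AB @ A) / d                # (5)
    dd = np.trace(AB)                                   # (6)
    return np.concatenate([dx, dA.ravel(), [dd]])


def rk4(f, a0, y0, a1, steps):
    """Classical fourth-order Runge-Kutta integration from a0 to a1."""
    h = (a1 - a0) / steps
    a, y = a0, np.array(y0, dtype=float)
    for _ in range(steps):
        k1 = f(a, y)
        k2 = f(a + h / 2, y + h / 2 * k1)
        k3 = f(a + h / 2, y + h / 2 * k2)
        k4 = f(a + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        a += h
    return y


# Initial conditions at a* = 1: x = (1, 1), J = [[2, 0], [1, 1]],
# A = Adj(J) = [[1, 0], [-1, 2]], delta = Det(J) = 2.
y0 = np.concatenate([[1.0, 1.0], [1.0, 0.0, -1.0, 2.0], [2.0]])
y = rk4(rhs, 1.0, y0, 4.0, steps=400)
```

No linear system is solved during the integration. The Jacobian inverse is available at every step as $A(\alpha)/\delta(\alpha)$.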
Fully Automated Implementation


The complete differential system (4)-(6) was initially implemented in [10] by means of a Fortran program incorporating a fourth-order Adams-Moulton integration method with a Runge-Kutta start and hand-coded partial derivatives. High numerical accuracy was obtained in illustrative applications, even near critical points $\alpha$ where the determinant $\delta(\alpha)$ became zero. Nevertheless, hand-coding of partial derivatives was clearly an undesirable feature of the program. The partial derivative expressions in (7) involve the second order partial derivatives of $\psi(\cdot)$. In turn, $\psi(\cdot)$ could involve the partial derivatives of some still more basic function, such as the criterion function for an optimization problem. This is indeed the typical case for economic problems (e.g., the profit maximization problem handled in [10]), since such problems invariably incorporate the decision-making processes of various types of economic agents.

In consequence, a more fully automated Fortran 
program for nonlocal sensitivity analysis was eventu- 
ally developed in [11]. This program is referred to as 
Nasa (an acronym for Nonlocal Automated Sensitiv- 
ity Analysis); it is available for downloading as free- 
ware from the Web site http://www.econ.iastate.edu/ 
tesfatsi/. Nasa incorporates a fairly substantial library 
for the forward-mode automatic evaluation of partial 
derivatives through order three [13] as well as an adap- 
tive homotopy method [12] for automatically obtaining 
all required initial conditions. The following sections 
briefly describe these features. A recent example of how 
Nasa has been applied to an applied general equilibrium 
problem in economics is detailed in [2]. 


Incorporation of Automatic Differentiation 


Four basic approaches can be used to obtain computer-generated numerical values for derivatives (see Jerrell [8] for an interesting comparative discussion of these alternatives): hand-coding; numerical differentiation; symbolic differentiation; and automatic derivative evaluation, or automatic differentiation for short. Recently, computational differentiation has come to be the preferred term for automatic differentiation; see [1]. To avoid confusion, the more traditional term is used here. Numerical differentiation methods substitute discrete approximate forms for derivative expressions. For example, finite difference methods approximate derivatives by ratios of discrete increments, e.g., $f'(t) \approx [f(t + h) - f(t)]/h$ for some suitably small $h$. Symbolic differentiation methods generate exact symbolic expressions for derivatives that can be manipulated algebraically as well as evaluated numerically. In contrast, automatic differentiation methods do not generate explicit derivative expressions, either approximate or symbolic. Rather, they generate derivative evaluations by breaking down the evaluation of a derivative at a given point into a sequence of simpler evaluations for functions of at most one or two variables. These evaluations are exact up to roundoff and truncation error.

For the nonlocal sensitivity analysis problem out- 
lined above, the primary requirement is for partial 
derivative evaluations through order three to be ob- 
tained in a reliable and efficient manner. The use of 
numerical differentiation methods such as finite differ- 
ence introduces systematic approximation errors into 
applications that can be reduced but not eliminated en- 
tirely due to the risk of catastrophic floating point er- 
ror. Symbolic differentiation software packages such as 
Macsyma, Mathematica, and Maple produce analytical 
expressions for derivatives but are notorious for ‘ex- 
pression swell’, that is, for the great many lines of code 
they produce for the derivative expressions of even rel- 
atively simple functional forms despite repeated use of 
reduction routines; see [5] for explicit examples (note 
that automatic differentiation has now been introduced 
into Maple, see [6]). Thus, an automatic derivative eval- 
uation routine would seem to be the preferred alterna- 
tive for the application at hand. 
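To make the contrast concrete, here is a minimal first-order forward-mode sketch, assuming a hypothetical `Dual` class (it is not part of Nasa or Feed) and the illustrative function $f(t) = t + \log(3t)$, whose exact derivative is $1 + 1/t$. The finite-difference quotient carries a step-size-dependent error, while the dual-number sweep propagates the derivative through the same elementary operations as the function value.

```python
import math


class Dual:
    """Value plus first derivative: represents val + der*eps with eps**2 = 0."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants (zero derivative).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule applied to stored numbers; no symbolic expression is built.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def dual_log(u):
    # Chain rule for log: (log u)' = u'/u.
    return Dual(math.log(u.val), u.der / u.val)


def f_dual(t):
    return t + dual_log(3 * t)  # f(t) = t + log(3t) on dual numbers


def f_float(t):
    return t + math.log(3 * t)


def forward_difference(g, t, h):
    # f'(t) ~ [f(t + h) - f(t)] / h
    return (g(t + h) - g(t)) / h


t0 = 2.0
exact = 1.0 + 1.0 / t0            # f'(t) = 1 + 1/t
ad = f_dual(Dual(t0, 1.0)).der    # seed dt/dt = 1
for h in (1e-1, 1e-5, 1e-12):
    err = abs(forward_difference(f_float, t0, h) - exact)
    print(f"h={h:g}  finite-difference error={err:.2e}")
print("AD error:", abs(ad - exact))
```

Shrinking $h$ first reduces the truncation error and then lets cancellation in $f(t+h) - f(t)$ take over, which is the floating-point risk mentioned above; the dual-number result has no step size to tune.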

Automatic differentiation appears to have been in- 
dependently developed by R.E. Moore [16] and R. 
Wengert [20]. The key idea of Moore and Wengert was 
to decompose the evaluation of complicated functions 
of many variables into a sequence of simpler evalua- 
tions of special functions of one or two variables, re- 
ferred to below as a ‘function list’. Total differentials 
of the special functions could be automatically evalu- 
ated along with the special function values, and partial 
derivatives could then be recovered from the total dif- 
ferentials by solving certain associated sets of linear al- 
gebraic equations. 
Nonlocal Sensitivity Analysis with Automatic Differentiation, Table 1

Illustrative application of the Feed algorithm

| Function list | ∂/∂x | ∂/∂y | ∂²/∂x² | ∂³/∂x³ |
|---|---|---|---|---|
| $a = x$ | $1$ | $0$ | $0$ | $0$ |
| $b = y$ | $0$ | $1$ | $0$ | $0$ |
| $c = ab$ | $a_x b + a b_x$ | $a_y b + a b_y$ | $a_{xx} b + 2 a_x b_x + a b_{xx}$ | $a_{xxx} b + 3 a_{xx} b_x + 3 a_x b_{xx} + a b_{xxx}$ |
| $d = \log(c)$ | $c_x / c$ | $c_y / c$ | $[c\,c_{xx} - c_x^2]/c^2$ | $[c^2 c_{xxx} - 3c\,c_x c_{xx} + 2 c_x^3]/c^3$ |
| $e = a + d$ | $a_x + d_x$ | $a_y + d_y$ | $a_{xx} + d_{xx}$ | $a_{xxx} + d_{xxx}$ |


As detailed in [1] and [4], great strides have been 
made over the past thirty years in developing fast and 
reliable automatic differentiation algorithms. The Nasa 
program incorporates one such algorithm, originally 
developed in [13], that is now referred to as Feed (an 
acronym for Fast Efficient Evaluation of Derivatives). 
A detailed discussion of the use of this automatic dif- 
ferentiation algorithm for both optimization and sen- 
sitivity analysis can be found in [9]. Total differentials 
are replaced by derivative arrays in order to avoid re- 
peated function evaluations and the need to recover 
partial derivatives from total differentials for each suc- 
cessively higher-order level of differentiation. 

As a simple illustration of Feed, consider the function $F: \mathbb{R}^2_{++} \to \mathbb{R}$ defined by

$$z = F(x, y) = x + \log(xy). \qquad (8)$$

Suppose one wishes to evaluate the function value $z$ and the partial derivatives $z_x$, $z_y$, $z_{xx}$ and $z_{xxx}$ at a given domain point $(x, y)$. Consider Table 1.

The first column of Table 1 constitutes the function list for the function (8); it sequentially evaluates the function value $z = x + \log(xy)$ at the given domain point $(x, y)$. The remaining entries in each row give the indicated derivative evaluations of the first entry in the row, using only algebraic operations. The first two rows initialize the algorithm, one row being required for each independent variable. The only input required for the first two rows is the domain point $(x, y)$. Each subsequent row outputs a one-dimensional array of the form $(p, p_x, p_y, p_{xx}, p_{xxx})$, using the arrays obtained from previous row calculations as inputs. The final row yields the desired evaluations $(z, z_x, z_y, z_{xx}, z_{xxx})$. Note that the limitation to this collection of partial derivative evaluations is for expositional simplicity only. Any additional desired partial derivative of $z$, say $z_{xyy}$ or $z_{xxxy}$, can be obtained in a similar manner by suitably augmenting Table 1 with an additional column of algebraic operations.
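The rows of Table 1 translate almost line for line into code. The following is a sketch under stated assumptions, not the Fortran Feed library: the array layout $(p, p_x, p_y, p_{xx}, p_{xxx})$ follows Table 1, and the names `Arr`, `var_x`, `var_y`, `feed_add`, `feed_mul`, `feed_log` and `z_array` are hypothetical. Each function builds a new array only from arrays produced by earlier rows.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Arr:
    """Derivative array (p, p_x, p_y, p_xx, p_xxx), as in one row of Table 1."""
    v: float
    x: float
    y: float
    xx: float
    xxx: float


def var_x(x):  # row a = x
    return Arr(x, 1.0, 0.0, 0.0, 0.0)


def var_y(y):  # row b = y
    return Arr(y, 0.0, 1.0, 0.0, 0.0)


def feed_add(a, b):  # row e = a + d
    return Arr(a.v + b.v, a.x + b.x, a.y + b.y, a.xx + b.xx, a.xxx + b.xxx)


def feed_mul(a, b):
    # Leibniz rule, as in row c = ab of Table 1.
    return Arr(
        a.v * b.v,
        a.x * b.v + a.v * b.x,
        a.y * b.v + a.v * b.y,
        a.xx * b.v + 2 * a.x * b.x + a.v * b.xx,
        a.xxx * b.v + 3 * a.xx * b.x + 3 * a.x * b.xx + a.v * b.xxx,
    )


def feed_log(c):
    # Row d = log(c) of Table 1; valid only for c.v > 0.
    return Arr(
        math.log(c.v),
        c.x / c.v,
        c.y / c.v,
        (c.v * c.xx - c.x ** 2) / c.v ** 2,
        (c.v ** 2 * c.xxx - 3 * c.v * c.x * c.xx + 2 * c.x ** 3) / c.v ** 3,
    )


def z_array(x, y):
    """One forward sweep through the function list of z = x + log(xy)."""
    a = var_x(x)
    b = var_y(y)
    c = feed_mul(a, b)
    d = feed_log(c)
    return feed_add(a, d)  # (z, z_x, z_y, z_xx, z_xxx)


print(z_array(2.0, 3.0))
```

For $z = x + \log(xy)$ the exact values are $z_x = 1 + 1/x$, $z_y = 1/y$, $z_{xx} = -1/x^2$ and $z_{xxx} = 2/x^3$, which the sweep reproduces up to rounding. Adding a column such as $z_{xyy}$ would mean extending `Arr` and every rule with the corresponding Leibniz and chain-rule terms.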

The elements in each of the rows in Table 1 can be numerically evaluated by means of sequential calls to Feed calculus subroutines. These evaluations are exact up to roundoff and truncation error. For expositional simplicity, Table 1 only depicts evaluations for partial derivatives through order three. However, Feed calculus subroutines can in principle be constructed to evaluate the function value and the distinct partial derivatives through order $k$ of any real-valued multivariable function that can be sequentially evaluated in a finite number of steps by means of the two-variable functions

$$w = u + v, \quad w = u - v, \quad w = uv, \quad w = \frac{u}{v}, \quad w = u^v, \qquad (9)$$

and arbitrary nonlinear one-variable $k$th-order differentiable functions such as

$$\cos(u), \quad \sin(u), \quad \exp(u), \quad c^u, \quad \log(u), \quad a u^b + c, \qquad (10)$$

for arbitrary constants $a$, $b$, and $c$. Systematic rules for constructing general $k$th-order calculus subroutines for special functions such as (10) are derived in [13]. References to other work focusing on recurrence relations for the derivatives of special functions such as (10) can be found, for example, in [15]. A detailed discussion of the library of Feed calculus subroutines currently incorporated into Nasa is given in [11].

The Feed algorithm thus envisions the successive 
transformation of arrays of partial derivatives through 
any specified order k into similarly-configured arrays 
as one forward sweep is taken through the function list 
for a specified kth-order differentiable function. A sim- 
ilar approach is proposed in [15] and [18, p. 280]. In 
contrast, the partial derivative evaluation methods proposed in [17, Chapter VI, pp. 91-111] and [21] have
a tree structure; that is, gradient operations are used to 
generate evaluations for each successively higher-order 
collection of partial derivatives using the results of pre- 
vious gradient operations as inputs. Another approach 
that has attracted a great deal of interest is reverse- 
mode differentiation; see [3] and [7]. 


Automatic Initialization 
via Adaptive Homotopy Continuation 


The initial conditions needed to integrate the complete differential system (4)-(6) from a given initial parameter point $\alpha^*$ consist of a solution vector $x(\alpha^*)$ together with evaluations for the adjoint $A(\alpha^*)$ and determinant $\delta(\alpha^*)$ of the Jacobian matrix $\psi_x(x(\alpha^*), \alpha^*)$. For many nonlinear problems, finding an initial solution vector is a difficult matter in and of itself.

Nasa incorporates an adaptive homotopy method [12] for automating these needed initializations. A standard (linear) homotopy method applied to the problem of finding a solution $x^*$ for a system of equations $0 = F(x)$ proceeds by introducing a homotopy of the form

$$0 = tF(x) + [1 - t][x - c] \qquad (11)$$

and solving for $x$ as a function of $t$ as $t$ varies from 0 to 1 along the real line, where $c$ represents any initial guess for the solution vector $x^*$. In contrast, an adaptive homotopy is a homotopy for which the usual continuation parameter $t$ varying from 0 to 1 on the real line is replaced by an adaptive continuation 'agent' that makes its way by trial and error from $0 + 0i$ to $1 + 0i$ in the complex plane in accordance with certain stated objectives.
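As an illustration of the standard linear homotopy (11), the sketch below tracks its solution in equal steps of $t$ with a few Newton corrections per step. It assumes a hypothetical scalar test equation $F(x) = x^3 - 2x - 5$ and the initial guess $c = 0$; it is a plain fixed-step continuation, not the adaptive method of [12].

```python
def F(x):
    return x ** 3 - 2 * x - 5  # hypothetical test equation 0 = F(x)


def dF(x):
    return 3 * x ** 2 - 2


def linear_homotopy(c, steps=100, newton_iters=5):
    """Track 0 = t F(x) + (1 - t)(x - c) from t = 0 (where x = c) to t = 1."""
    x = c
    for i in range(1, steps + 1):
        t = i / steps
        # The previous solution serves as predictor; Newton's method corrects it.
        for _ in range(newton_iters):
            h = t * F(x) + (1 - t) * (x - c)
            dh = t * dF(x) + (1 - t)  # derivative of the homotopy in x
            x -= h / dh
    return x


root = linear_homotopy(c=0.0)
print(root, F(root))
```

For this $F$ and $c$ the derivative of the homotopy with respect to $x$ stays positive along the path, so fixed steps suffice; the difficulties discussed next arise when that derivative vanishes somewhere on $[0, 1]$.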

Specifically, the continuation agent designed in [12] adaptively selects a path of $\beta$ values from $0 + 0i$ to $1 + 0i$ in the complex plane for the homotopy

$$0 = [F(x) - F(c)] + \beta F(c), \qquad (12)$$

where $c$ again represents any initial guess for the solution vector $x^*$. The path for $\beta$ is selected in accordance with the following multiple-objective optimization problem: reach the point $1 + 0i$ starting from the point $0 + 0i$ by taking as few steps as possible along a spider-web (spoke/hub) grid centered at $1 + 0i$ in the complex plane, but do so in a way that avoids regions where the Jacobian matrix becomes ill-conditioned.
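The benefit of leaving the real line can be seen on the same hypothetical test equation $F(x) = x^3 - 2x - 5$. For the homotopy (12) with $c = 0$, the real path $x(\beta)$ starts at $x = 0$, moves left, and reaches a turning point where $F'(x) = 0$ at $\beta \approx 0.218$, so continuation along the real segment breaks down there. The sketch below instead follows a fixed semicircle in the lower half of the complex $\beta$-plane from $0 + 0i$ to $1 + 0i$; the fixed path is an assumption made for brevity, whereas the agent of [12] chooses its path adaptively on a spider-web grid.

```python
import cmath


def F(x):
    return x ** 3 - 2 * x - 5  # hypothetical test equation 0 = F(x)


def dF(x):
    return 3 * x ** 2 - 2


def newton_homotopy_complex(c, steps=400, newton_iters=6):
    """Track 0 = [F(x) - F(c)] + beta F(c) along a fixed semicircle from beta = 0 to 1."""
    Fc = F(c)
    x = complex(c)
    for i in range(1, steps + 1):
        s = i / steps
        beta = 0.5 - 0.5 * cmath.exp(1j * cmath.pi * s)  # arc in the lower half-plane
        for _ in range(newton_iters):
            # The x-Jacobian of (12) is F'(x) itself: no new singularities are added.
            x -= (F(x) - Fc + beta * Fc) / dF(x)
    for _ in range(20):  # polish at beta = 1, i.e. solve F(x) = 0
        x -= F(x) / dF(x)
    return x


def singular_betas(c):
    """Real beta values 1 - F(x_s)/F(c) at which the path meets F'(x_s) = 0."""
    xs = (2 / 3) ** 0.5  # roots of F'(x) = 3x^2 - 2 are +xs and -xs
    return [1 - F(r) / F(c) for r in (xs, -xs)]


print(singular_betas(0.0))
x_end = newton_homotopy_complex(0.0)
print(x_end, abs(F(x_end)))
```

Which root is reached depends on how the path winds around the singular values of $\beta$, so only the final residual is meaningful here; the semicircle keeps a distance of about 0.218 from both singular values.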

The adaptive homotopy method introduced in [12] and incorporated into Nasa is thus an example of what might more generally be called an adaptive computational method, i.e., a computational method that embodies the following principle important for applied researchers: let the computational algorithm adapt to the physical problem at hand instead of requiring users to reformulate their physical problems to conform to algorithmic requirements. For sufficiently smooth functions $F(\cdot)$, a properly constructed homotopy (e.g., a probability-one homotopy as formulated in [19]) is theoretically guaranteed to have no singular points along the real continuation path from 0 to 1 for almost all initial starting points $c$. However, successful implementation of such homotopy methods can require a mathematically sophisticated reformulation of the user's original problem.

The homotopy (12) is solved for $x$ as a function of $\beta$ as $\beta$ varies from $0 + 0i$ to $1 + 0i$ in the complex plane by making use of a complete system of ordinary differential equations analogous to the system set out in the basic problem formulation above. At each $\beta$ point one obtains a solution vector $x^*(\beta)$ together with evaluations $A^*(\beta)$ and $\delta^*(\beta)$ for the adjoint and determinant of the homotopy Jacobian matrix $J^*(\beta) = F_x(x^*(\beta))$. Note that the homotopy Jacobian matrix coincides with the Jacobian matrix for the original function of interest $F(\cdot)$, implying that singularities are not artificially introduced into the problem by the homotopy method per se. In principle, the solution vector $x^*(1 + 0i)$ obtained for (12) at $\beta = 1 + 0i$ yields a solution vector for the original system of interest, $0 = F(x)$. In particular, letting $F(x) = \psi(x, \alpha^*)$, one obtains complete initial conditions for the original problem of interest, the nonlocal sensitivity analysis of the system $0 = \psi(x, \alpha)$ over an interval of $\alpha$ values starting at $\alpha^*$.
Nonoriented Multicommodity Flow Problems

The routing of traffic requirements is one of the important problems that have to be solved when designing a telecommunication network. Another important problem is the computation of the standby capacities that must be set up to enable the rerouting of the traffic requirements affected by any network failure of some given types. Both problems can be formulated as nonoriented multicommodity network flow models, sometimes also called undirected multicommodity network flow models. The routing problem captures the main aspects of such models and is therefore considered in more detail hereafter. The reader interested in the computation of standby capacities is referred to [...].

A transmission network can be viewed as an undi- 
rected graph. An edge (i.e. a link of the network) is 
characterized by a pair of nodes, a transmission cost 
per circuit and a transmission capacity in the number 
of circuits. The routing problem is the determination of 
the most economical way of using the available trans- 
mission capacities to route a traffic matrix (a number 
of circuits between various pairs of nodes) through the 
network. This problem can be expressed as follows: 


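$$\begin{aligned} \min \quad & \sum_{k \in K} \sum_{j \in J} c_j^k \, |x_j^k| \\ \text{s.t.} \quad & A x^k = r^k, && k \in K, \\ & \sum_{k \in K} |x_j^k| \le b_j, && j \in J, \end{aligned}$$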


where:

• $A$ is the node-arc incidence matrix corresponding to an arbitrary orientation of the network graph (i.e. $A_{ij} = +1$ if arc $j$ is directed away from node $i$, $A_{ij} = -1$ if arc $j$ is directed towards node $i$, $A_{ij} = 0$ otherwise);

• $J = \{1, \dots, J\}$ is the set of arcs of the network;

• $k$ is a commodity characterized by a number of circuits, $d^k$, to be routed through the network between a given pair of nodes, $s^k$ and $t^k$;

• $K = \{1, \dots, K\}$ is the set of commodities to be routed;

• $r^k$ is the requirement vector for commodity $k$ (i.e. $r^k_{s^k} = d^k$, $r^k_{t^k} = -d^k$ and $r^k_i = 0$ when $i \notin \{s^k, t^k\}$);

• $x^k$ is the flow vector for commodity $k$ (i.e. $x_j^k$ is the flow on arc $j = (s, t)$, with $x_j^k > 0$ if the flow goes from $s$ to $t$ and $x_j^k < 0$ if the flow goes from $t$ to $s$);

• $b_j$ is the capacity of link $j$;

• $c_j^k$ is the cost per unit of flow on link $j$ for commodity $k$.


The first group of constraints contains one block of 
constraints per commodity. These so-called flow con- 
straints specify that x is a routing of the traffic re- 
quirements. The second group of constraints ties to- 
gether the various commodities. These coupling con- 
straints specify that only a limited number of circuits 
can be routed through any given edge of the network. 
The above formulation is called a node-arc formulation 
of the problem. 
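The node-arc model can be checked mechanically on a toy instance. The sketch below builds $A$ for an arbitrary orientation, forms $r^k$, and tests the flow and coupling constraints; the triangle network, capacities, costs and flows are made-up data, and the helper names are hypothetical.

```python
def incidence_matrix(n_nodes, arcs):
    """Node-arc incidence matrix for an arbitrary orientation of the edges."""
    A = [[0] * len(arcs) for _ in range(n_nodes)]
    for j, (u, v) in enumerate(arcs):
        A[u][j] = 1    # arc j is directed away from node u
        A[v][j] = -1   # arc j is directed towards node v
    return A


def requirement_vector(n_nodes, s, t, d):
    r = [0] * n_nodes
    r[s], r[t] = d, -d
    return r


def is_feasible(A, b, commodities, flows):
    """Check A x^k = r^k for every k and sum_k |x_j^k| <= b_j for every link j."""
    n_nodes, n_arcs = len(A), len(b)
    for (s, t, d), x in zip(commodities, flows):
        Ax = [sum(A[i][j] * x[j] for j in range(n_arcs)) for i in range(n_nodes)]
        if Ax != requirement_vector(n_nodes, s, t, d):
            return False
    return all(sum(abs(x[j]) for x in flows) <= b[j] for j in range(n_arcs))


def routing_cost(costs, flows):
    """Objective sum_k sum_j c_j^k |x_j^k|."""
    return sum(c[j] * abs(x[j]) for c, x in zip(costs, flows) for j in range(len(x)))


arcs = [(0, 1), (1, 2), (0, 2)]        # arbitrary orientation of a triangle
A = incidence_matrix(3, arcs)
b = [4, 4, 4]
commodities = [(0, 2, 5), (2, 1, 2)]   # (s^k, t^k, d^k), hypothetical data
flows = [[2, 2, 3], [0, -2, 0]]        # commodity 2 uses arc (1, 2) backwards
costs = [[1, 1, 1], [1, 1, 1]]
print(is_feasible(A, b, commodities, flows), routing_cost(costs, flows))
```

The second commodity travels from node 2 to node 1 over the arc oriented $(1, 2)$, so its flow on that arc is negative, as the sign convention for $x_j^k$ prescribes; since the coupling constraint counts $|x_j^k|$, that arc is used at its full capacity of 4.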

If we assume that the cost coefficients $c_j^k$ are nonnegative, this nonlinear problem can be replaced with an equivalent standard (linear) multicommodity flow problem by using the transformation of the network graph depicted in the following figure, where the numbers over edges and arcs denote capacities.

[Figure: network transformation of an edge with capacity $b_j$]


The structure of the resulting model is similar to 
that of the original model except that absolute values 
are removed and nonnegativity restrictions on the flow 
variables are introduced. The transformation is justified 
by the fact that the support of the flow associated with 
any given commodity does not contain any directed cy- 
cle in the optimal solution of the transformed problem. 
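The transformed model could be solved by any code designed for the solution of directed multicommodity flow problems. However, as this model is much larger than the original nonlinear model, a better approach has to be devised when a computationally efficient solution method is sought. Assuming that $c_j^k > 0$, one can introduce nonnegative variables $x_j^{k+}$ and $x_j^{k-}$ such that $x_j^k = x_j^{k+} - x_j^{k-}$ and $|x_j^k| = x_j^{k+} + x_j^{k-}$ (at an optimal solution at most one of the two is positive, since $c_j^k > 0$) to obtain an equivalent linear formulation [11]:

$$\begin{aligned} \min \quad & \sum_{k \in K} \sum_{j \in J} c_j^k \,(x_j^{k+} + x_j^{k-}) \\ \text{s.t.} \quad & A(x^{k+} - x^{k-}) = r^k, && k \in K, \\ & \sum_{k \in K} (x_j^{k+} + x_j^{k-}) \le b_j, && j \in J, \\ & x_j^{k+} \ge 0, \; x_j^{k-} \ge 0, && k \in K,\ j \in J. \end{aligned}$$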

For real networks such as the Paris district transmission network, the number of nodes, $I$, can be larger than 100 and the number of links larger than 300. If one assumes that there can be a traffic requirement between any pair of nodes, the size of the problem can be larger than approximately $M \approx 5 \times 10^5$ constraints and $N \approx 3 \times 10^6$ variables, since $K = I(I-1)/2$ in this case. However, the formulation can often be simplified and an equivalent smaller problem solved. In fact, when the cost per unit of flow on a given link does not depend on the commodity (i.e. $c_j^k = c_j$ for any $k$), all the commodities which have a common endpoint can be 'merged'. More precisely, consider a particular node $i$ and the set $K_i$ of commodities $k$ such that $s^k = i$ or $t^k = i$. As the orientation of the commodities is arbitrary, the commodities in $K_i$ can be replaced with a single commodity characterized by a requirement vector $r$ with components $r_i = \sum_{k \in K_i} d^k$, $r_j = -d^k$ for all $j \in \bigcup_{k \in K_i} \{s^k, t^k\} \setminus \{i\}$ (where $k$ is the commodity joining $i$ and $j$), and $r_j = 0$ for all $j \notin \bigcup_{k \in K_i} \{s^k, t^k\}$. A merging of commodities that minimizes the number of merged commodities can be found by solving a minimum node cover problem in a nondirected graph (cf. Network location: Covering problems) where the vertices correspond to the endpoints of the commodities and the edges correspond to the commodities. In practice a satisfactory solution to this problem can be obtained by a greedy algorithm where at each iteration the node that covers the largest number of edges not yet covered is selected. In any case, the problem reduces to a number of commodities not larger than $I$. Assuming that $K = I$, one obtains $M \approx 10{,}000$ constraints and $N \approx 60{,}000$ variables. This size of problem is within the reach of modern general purpose linear programming codes. However, if extra constraints were to be included in the initial formulation, the merging of the commodities might no longer be possible. For example, this would be the case if quality of service constraints were introduced to impose that no more than $p$ percent of any traffic requirement is routed on any link. Constraints of the form $x_j^{k+} + x_j^{k-} \le p\, d^k / 100$ would have to be added to the above formulation. The size of this new model would be huge.
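The size figures quoted above can be reproduced by counting, assuming the split-variable linear model: $I$ flow constraints per commodity plus one coupling constraint per link, and two variables $x_j^{k+}, x_j^{k-}$ per link and commodity. The greedy merging heuristic is sketched alongside; `lp_size`, `greedy_merge` and `merged_requirement` are hypothetical helpers, and the complete demand pattern on four nodes is illustrative data.

```python
from itertools import combinations


def lp_size(n_nodes, n_links, n_commodities):
    """Rows and columns of the split-variable node-arc model."""
    rows = n_nodes * n_commodities + n_links  # flow constraints + coupling constraints
    cols = 2 * n_links * n_commodities        # x_j^{k+} and x_j^{k-}
    return rows, cols


def greedy_merge(commodities):
    """Greedy node cover: repeatedly pick the endpoint covering most uncovered commodities."""
    uncovered = set(range(len(commodities)))
    groups = {}
    while uncovered:
        degree = {}
        for k in sorted(uncovered):
            s, t = commodities[k][:2]
            degree[s] = degree.get(s, 0) + 1
            degree[t] = degree.get(t, 0) + 1
        hub = max(degree, key=degree.get)
        members = sorted(k for k in uncovered if hub in commodities[k][:2])
        groups[hub] = members
        uncovered -= set(members)
    return groups


def merged_requirement(n_nodes, hub, commodities, members):
    """Requirement vector of the single commodity replacing those in `members`."""
    r = [0] * n_nodes
    for k in members:
        s, t, d = commodities[k]
        other = t if s == hub else s  # reorient so that every flow leaves the hub
        r[hub] += d
        r[other] -= d
    return r


print(lp_size(100, 300, 100 * 99 // 2))  # K = I(I-1)/2
print(lp_size(100, 300, 100))            # K = I after merging
demands = [(s, t, 1) for s, t in combinations(range(4), 2)]
groups = greedy_merge(demands)
for hub, members in groups.items():
    print(hub, members, merged_requirement(4, hub, demands, members))
```

With $I = 100$, 300 links and $K = I(I-1)/2 = 4950$ the count gives 495,300 rows and 2,970,000 columns, in line with $5 \times 10^5$ and $3 \times 10^6$; with $K = I = 100$ it gives 10,300 rows and 60,000 columns.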

Fortunately, specialized algorithms that exploit both the block structure of the problem and the structure of each block of flow constraints can be designed to provide efficient solution methods. A primal partitioning simplex algorithm specialization is presented in [3]. Specializations based on price-directive decomposition (cf. Branch and price: Integer programming with column generation), resource-directive decomposition and partitioning of linear systems (simplex and dual affine scaling algorithm) are reviewed in [5]. Technical details on specializations as well as comparative experiments are presented in [4].

It is worth giving more details on one of the most attractive specialization methods, obtained by applying the Dantzig-Wolfe decomposition principle to the above formulation. The advantage of this specialization is threefold. First, it leads to a reformulation of the problem that appeals to network planners (as the concept of routing path is more directly part of the model) and is easily understood without any reference to the Dantzig-Wolfe decomposition principle. Secondly, it can be easily implemented using modern linear programming libraries. Thirdly, and more importantly, it is computationally efficient for real-life instances, which are usually weakly constrained (no more than 10% of edges saturated at optimality) (see [4]).

By applying the Dantzig-Wolfe decomposition principle it can be shown that the routing problem can be reformulated as follows [5]:

$$\begin{aligned} \min \quad & \sum_{k \in K} \sum_{p \in P^k} \big([c^k]^T v_p^k\big)\, a_p^k \\ \text{s.t.} \quad & \sum_{p \in P^k} a_p^k = d^k, && k \in K, \\ & \sum_{k \in K} \sum_{p \in P^k} v_p^k \, a_p^k \le b, \\ & a_p^k \ge 0, && k \in K,\ p \in P^k, \end{aligned}$$

where $P^k$ is the set of paths that connect $s^k$ to $t^k$ in the undirected network graph and $v_p^k$ is the characteristic vector of a path $p \in P^k$, i.e. component $j$ of $v_p^k$ is 1 if link $j$ belongs to $p$ and 0 otherwise. $[c^k]^T v_p^k$ is the cost of $p$ per unit of flow routed on that path (i.e. the sum of the costs $c_j^k$ of its links) and $a_p^k$ is the amount of traffic routed on path $p$. The first set of constraints expresses that $d^k$ units of traffic have to be routed between $s^k$ and $t^k$, whereas the second set of constraints expresses the limitation on the capacity available on the links to route the traffic. The formulation above is known as the node-path formulation of the multicommodity flow problem.

This formulation has an exponential number of columns, but it is not necessary to write out all the columns of this linear program explicitly. Indeed, the reduced cost of variable $a_p^k$ is $w_p^k = [c^k + \lambda]^T v_p^k - \pi_k$, where $-\lambda$ is the vector of simplex multipliers associated with the capacity constraints and $\pi_k$ is the simplex multiplier associated with the constraint corresponding to commodity $k$ in the first group of constraints. Therefore $\min_p w_p^k$ can be obtained by computing a shortest path between $s^k$ and $t^k$ in the undirected network graph where the edge lengths are given by the vector $c^k + \lambda$.
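The reduced cost formula follows directly from the column structure of the node-path formulation. Writing $P^k$ for the set of $s^k$-$t^k$ paths, the column of $a_p^k$ has objective coefficient $[c^k]^T v_p^k$, a 1 in the demand row of commodity $k$, and $v_p^k$ in the capacity rows; with multipliers $\pi_k$ and $-\lambda$ as in the text:

$$w_p^k = [c^k]^T v_p^k - \pi_k \cdot 1 - (-\lambda)^T v_p^k = [c^k + \lambda]^T v_p^k - \pi_k ,$$

$$\min_{p \in P^k} w_p^k = \Big( \min_{p \in P^k} \sum_{j \in p} (c_j^k + \lambda_j) \Big) - \pi_k .$$

So a column of commodity $k$ can improve the reduced master program exactly when the shortest $s^k$-$t^k$ path length for edge lengths $c^k + \lambda$ is smaller than $\pi_k$.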

In the decomposition algorithm the formulation given above is called the (full) master program. The $K$ shortest path problems which are solved to generate columns of the master program are called the satellite problems. At each stage of the algorithm a reduced master program, i.e. a program that contains a subset of the columns of the master program, is solved. The first reduced master program is initialized with a set of columns corresponding to a basic matrix and the columns corresponding to the slack variables. If no such initial set of columns is available, artificial variables are added to the formulation and a first phase or a big-M method has to be used to drive the program to a feasible basis. After solving a reduced master program to optimality, the $K$ satellite problems are solved to check whether the solution obtained is optimal for the full master program. If the reduced costs corresponding to the solutions of the satellite problems are nonnegative, the solution to the reduced master program is an optimal solution to the full master program; otherwise the new columns that are candidates for basis entry are added to the reduced master program and the simplex algorithm goes on. After the solution of a reduced master program the reduced costs of all variables in that program are nonnegative. Hence, as $\lambda$ is the reduced cost vector associated with the slack variables of the coupling constraints, we deduce that $\lambda \ge 0$ and that in turn the edge lengths in the satellite problems are nonnegative. The satellite problems can therefore be solved using Dijkstra's algorithm (see [9] and [13] for details of efficient implementations).
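A satellite problem then amounts to one Dijkstra run per commodity. The sketch below, with hypothetical helper names and toy data, computes a shortest $s^k$-$t^k$ path for the lengths $c^k + \lambda$, returns its characteristic vector $v$, and reports the reduced cost $[c^k + \lambda]^T v - \pi_k$; a negative value signals a column that should enter the reduced master program. The multipliers are assumed to come from an LP solver, which is not shown.

```python
import heapq


def shortest_path_column(n_nodes, edges, lengths, s, t):
    """Dijkstra on an undirected graph; returns (length, characteristic vector v)."""
    if any(length < 0 for length in lengths):
        raise ValueError("Dijkstra's algorithm requires nonnegative edge lengths")
    adj = [[] for _ in range(n_nodes)]
    for j, (u, v) in enumerate(edges):
        adj[u].append((v, j))
        adj[v].append((u, j))  # undirected: each edge is usable in both directions
    dist = [float("inf")] * n_nodes
    pred = [None] * n_nodes    # (previous node, edge index)
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue           # stale heap entry
        if u == t:
            break
        for v, j in adj[u]:
            nd = d + lengths[j]
            if nd < dist[v]:
                dist[v], pred[v] = nd, (u, j)
                heapq.heappush(heap, (nd, v))
    if dist[t] == float("inf"):
        return dist[t], None
    v_char = [0] * len(edges)
    node = t
    while node != s:
        node, j = pred[node]
        v_char[j] = 1
    return dist[t], v_char


def price_commodity(n_nodes, edges, c_k, lam, pi_k, s, t):
    """Reduced cost min_p [c^k + lambda]^T v_p - pi_k and the entering column, if any."""
    lengths = [cj + lj for cj, lj in zip(c_k, lam)]
    length, v = shortest_path_column(n_nodes, edges, lengths, s, t)
    reduced_cost = length - pi_k
    return reduced_cost, (v if reduced_cost < 0 else None)


edges = [(0, 1), (1, 2), (0, 2)]
rc, column = price_commodity(3, edges, c_k=[1, 1, 3], lam=[0, 2, 0], pi_k=3.5, s=0, t=2)
print(rc, column)
```

Here the direct edge $(0, 2)$ has length 3 while the detour through node 1 costs $1 + (1 + 2) = 4$, so the shortest path has length 3 and, with $\pi_k = 3.5$, a reduced cost of $-0.5$.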

The solution of each master program can be made more efficient by using a refinement of the GUB specialization of the simplex algorithm (see [4] for details). Some variations of the algorithm may also be applied. For example, some columns generated may be discarded from the reduced master program at later stages, and the generation of new columns may be carried out before optimality is reached, provided that the simplex multipliers $\lambda$ are nonnegative.

Finally it is worth mentioning that a new cutting plane technique based on interior point methods and called the analytic center cutting plane method (ACCPM) was recently proposed to solve large scale convex optimization problems [8]. Its application to the dual of the above formulation gives performance roughly similar to that of the Dantzig-Wolfe decomposition algorithm [4], but ACCPM is much more efficient for highly nonlinear problems [7].

Further details on network programming and mod- 
eling can be found in [1,2,6,10]. 


See also 


> Maximum Flow Problem 

> Minimum Cost Flow Problem 

> Multicommodity Flow Problems 

> Nonconvex Network Flow Problems 

> Piecewise Linear Network Flow Problems 
Introduction


Being natural generalizations of the classical Fréchet 
derivative and the subdifferential in the sense of con- 
vex analysis, Fréchet subdifferentials have been known 
for more than 30 years. They were probably first in- 
troduced in finite dimensions in [1] (under the name 
“lower semidifferentials”). Some of their properties 
in the infinite-dimensional setting were investigated 
in [18,21]. During the first decade Fréchet subdifferen- 
tials were not widely used because of rather poor (di- 
rect) calculus. They mostly served as building blocks 
for more sophisticated limiting (Fréchet) subdifferen- 
tials [19,23,30,31]. 

The discovery of the “fuzzy rules” in the 
1980s [11,12,15,29] revitalized interest in Fréchet sub- 
differentials. It was shown that calculus results and 
optimality conditions can be formulated in terms of 
Fréchet and other “simple” subdifferentials computed 
not at the given point, but at some points arbitrarily 
close to it, thus incorporating “differential” properties 
of the function at nearby points. Such results are ac- 
tually at the core of the corresponding statements for 
limiting subdifferentials. 

Being the smallest among all “simple” subdifferen- 
tials with reasonable properties, the Fréchet subdiffer- 
entials have proved to be convenient tools for the anal- 
ysis of nondifferentiable functions on Asplund spaces, 
a very important subclass of general Banach spaces. 
Furthermore, the main fuzzy results in terms of Fréchet 
subdifferentials present characterizations of Asplund 
spaces themselves [6,12,13,34,35,38,45]. 

The article contains no proofs. A more detailed sur- 
vey of Fréchet subdifferentials can be found in [25]. 

Mostly standard notations are used throughout the article. $X$ and $Y$ denote normed linear spaces and $X^*$ and $Y^*$ denote their topological duals. $\langle \cdot, \cdot \rangle$ is the bilinear form defining the canonical pairing between a space and its dual. $B_\rho(x)$ stands for the closed ball with center $x$ and radius $\rho$. We write $B_\rho$ instead of $B_\rho(0)$ and just $B$ if $\rho = 1$ (unit ball).
Definitions
Fréchet Subdifferentials 
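Let $f: X \to \mathbb{R}_\infty = \mathbb{R} \cup \{+\infty\}$, $f(x) < \infty$. The set

$$\partial f(x) = \left\{ x^* \in X^* : \liminf_{u \to x} \frac{f(u) - f(x) - \langle x^*, u - x \rangle}{\|u - x\|} \ge 0 \right\} \qquad (1)$$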


is called the Fréchet subdifferential of f at x. This set is 
(norm) closed and convex. The next proposition shows 
that it generalizes the notions of the Fréchet derivative 
and the subdifferential in the sense of convex analysis. 


Proposition 1

1. If $f$ is Fréchet-differentiable at $x$ with the derivative $\nabla f(x)$, then $\partial f(x) = \{\nabla f(x)\}$.

2. If $f$ is convex, then

$$\partial f(x) = \{ x^* \in X^* : f(u) - f(x) \ge \langle x^*, u - x \rangle \ \ \forall u \in X \}.$$
Note that the Fréchet subdifferential does not change if another equivalent norm on $X$ is used in (1).


Example 1 The set (1) can be empty. Take $f: \mathbb{R} \to \mathbb{R}$, $f(u) = -|u|$, $u \in \mathbb{R}$. Then $\partial f(0) = \emptyset$.
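Example 1 can be checked against definition (1) numerically. For $f(u) = \pm|u|$ at $x = 0$ the quotient in (1) equals $\pm 1 - x^* \operatorname{sign}(u)$ and does not depend on $|u|$, so sampling $u = \pm r$ captures its lower limit exactly up to rounding; for a general $f$ such sampling is only a heuristic, not a test of (1). The helper `lower_quotient` and the candidate grid are illustrative assumptions.

```python
def lower_quotient(f, x, x_star, radii=(1e-1, 1e-3, 1e-6)):
    """Smallest sampled value of [f(u) - f(x) - x_star*(u - x)] / |u - x| at u = x - r, x + r."""
    values = []
    for r in radii:
        for u in (x - r, x + r):
            values.append((f(u) - f(x) - x_star * (u - x)) / abs(u - x))
    return min(values)


candidates = [-1.5, -1.0, 0.0, 0.5, 1.0, 1.5]
# f(u) = |u| is convex; its subdifferential at 0 is [-1, 1].
print([s for s in candidates if lower_quotient(abs, 0.0, s) >= -1e-12])
# f(u) = -|u| (Example 1): no candidate passes.
print([s for s in candidates if lower_quotient(lambda u: -abs(u), 0.0, s) >= -1e-12])
```

For $f(u) = |u|$ exactly the candidates in $[-1, 1]$ survive, in agreement with Proposition 1 for convex functions, while for $f(u) = -|u|$ the smallest quotient is $-1 - |x^*| < 0$ for every $x^*$.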


One can also consider the Fréchet superdifferential

$$\partial^+ f(x) = \left\{ x^* \in X^* : \limsup_{u \to x} \frac{f(u) - f(x) - \langle x^*, u - x \rangle}{\|u - x\|} \le 0 \right\}. \qquad (2)$$


While the set (1) consists of linear continuous func- 
tionals “supporting” f from below, the functionals from 
(2) “support” f from above. Unlike the classical case, the 
existence of two different derivative-like objects is quite 
natural for nonsmooth analysis: “differential” proper- 
ties of a function “from below” and “from above” could 
be essentially different. 

Subdifferentials and superdifferentials are related by the equality

$$\partial(-f)(x) = -\partial^+ f(x). \qquad (3)$$


Surely, in the nondifferentiable case at least one of 
the sets (1) and (2) must be empty. 


Proposition 2. $\partial f(x) \ne \emptyset$ and $\partial^+ f(x) \ne \emptyset$ if and only if $f$ is Fréchet-differentiable at $x$. In this case one has $\partial f(x) = \partial^+ f(x) = \{\nabla f(x)\}$.


Example 2 Both sets (1) and (2) can be empty simultaneously. Take $f: \mathbb{R} \to \mathbb{R}$, $f(u) = u \sin(1/u)$ if $u \ne 0$, and $f(0) = 0$. Then $\partial f(0) = \partial^+ f(0) = \emptyset$.


Example 3 The fact that the set (1) is a singleton does not imply differentiability. Take $f: \mathbb{R} \to \mathbb{R}$, $f(u) = \max(u \sin(1/u), 0)$ if $u \ne 0$, and $f(0) = 0$. Then $f$ is nondifferentiable at 0, although one evidently has $\partial f(0) = \{0\}$.
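The claims in Example 3 can be verified directly from (1) with two families of points. For $x^* = 0$ the quotient is nonnegative because $f \ge 0 = f(0)$. For $x^* \ne 0$ take

$$u_n^{\pm} = \pm \frac{1}{2n\pi + 3\pi/2}, \qquad u_n^{\pm} \sin(1/u_n^{\pm}) = -|u_n^{\pm}| < 0, \qquad f(u_n^{\pm}) = 0,$$

$$\frac{f(u_n^{\pm}) - f(0) - x^* u_n^{\pm}}{|u_n^{\pm}|} = \mp x^* ,$$

and choosing the sign equal to that of $x^*$ gives the value $-|x^*| < 0$, so $x^* \notin \partial f(0)$. Nondifferentiability follows from $f(v_n)/v_n = 1$ along $v_n = 1/(2n\pi + \pi/2)$ while $f(u_n^+)/u_n^+ = 0$, so the difference quotient at 0 has no limit.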


Example 4 Fréchet differentiability is essential in Proposition 1: Gâteaux differentiable functions can be nonsubdifferentiable in the Fréchet sense. Take $f: \mathbb{R}^2 \to \mathbb{R}$, $f(u_1, u_2) = -\sqrt{u_1^2 + u_2^2}$ if $u_2 = u_1^2$, and $f(u_1, u_2) = 0$ otherwise. The Gâteaux derivative of $f$ at 0 is 0, while $\partial f(0) = \emptyset$.
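Both claims of Example 4 follow in a few lines, assuming $f(u_1, u_2) = -\sqrt{u_1^2 + u_2^2}$ on the parabola $u_2 = u_1^2$ and $f = 0$ elsewhere. For a direction $h = (h_1, h_2) \ne 0$, the points $th$ lie on the parabola only for $t = 0$ or, when $h_1 \ne 0$, for $t = h_2 / h_1^2$, so $f(th) = 0$ for all sufficiently small $|t|$ and

$$\lim_{t \to 0} \frac{f(th) - f(0)}{t} = 0 .$$

Along $u = (s, s^2)$, $s \to 0$, one has $f(u) = -\|u\|$ and $u / \|u\| \to (\operatorname{sign} s, 0)$, hence

$$\frac{f(u) - f(0) - \langle x^*, u \rangle}{\|u\|} \to -1 - x_1^* \operatorname{sign}(s),$$

whose smaller one-sided limit is $-1 - |x_1^*| < 0$; no $x^*$ satisfies (1) at 0.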


Remark 1 One can define the Gâteaux subdifferential on the basis of the notion of Gâteaux differentiability. For this subdifferential, the analogs of Propositions 1 and 2 and some other results hold true. Considering Gâteaux (and other types of) “simple” subdifferentials can be useful in some applications. The Gâteaux subdifferential always contains the Fréchet subdifferential.


If $\dim X < \infty$, the Fréchet subdifferential can be expressed equivalently in terms of certain generalized directional derivatives [1,16,18,39,42,43].


Fréchet Normal Cone 


Now consider a set $\Omega \subset X$ and let $x \in \Omega$. Similarly to definition (1) of the Fréchet subdifferential, one can define the Fréchet normal cone

$$N(x \mid \Omega) = \left\{ x^* \in X^* : \limsup_{u \xrightarrow{\Omega} x} \frac{\langle x^*, u - x \rangle}{\|u - x\|} \le 0 \right\} \qquad (4)$$

to $\Omega$ at $x$. Here $u \xrightarrow{\Omega} x$ means that $u \to x$ with $u \in \Omega$. It is a norm-closed and convex cone closely related to the subdifferential defined above. It is actually the Fréchet subdifferential $\partial \delta_\Omega(x)$ of the indicator function $\delta_\Omega$ of $\Omega$ ($\delta_\Omega(u) = 0$ if $u \in \Omega$ and $\delta_\Omega(u) = \infty$ otherwise).

This fact allows one to deduce some properties of normal cones from the corresponding statements about subdifferentials. Thus, it follows from Proposition 1 that the normal cone (4) generalizes the corresponding notion of convex analysis.

In finite dimensions the Fréchet normal cone coincides with the polar of the tangent (contingent, Bouligand tangent) cone to $\Omega$ at $x$. If $X$ is reflexive, it coincides with the polar of the weak tangent cone [1,2,9,14,43].

The relationship between Fréchet subdifferentials and normal cones is bilateral. For a function $f: X \to \mathbb{R}_\infty$ one can consider its epigraph $\mathrm{epi}\, f = \{(u, \mu) \in X \times \mathbb{R} : f(u) \le \mu\}$. If the norm on $X \times \mathbb{R}$ is compatible with that on $X$ (that is, $\|(x, 0)\| = \|x\|$), then the following equivalent definition of the Fréchet subdifferential holds true:

$$\partial f(x) = \{ x^* \in X^* : (x^*, -1) \in N((x, f(x)) \mid \mathrm{epi}\, f) \}. \qquad (5)$$


“Horizontal” normals to the epigraph can also be of interest. They define the singular Fréchet subdifferential of $f$ at $x$:

$$\partial^\infty f(x) = \{ x^* \in X^* : (x^*, 0) \in N((x, f(x)) \mid \mathrm{epi}\, f) \}.$$

Of course, if $f$ is calm at $x$ [7,42], that is, $|f(u) - f(x)| \le l \|u - x\|$ for some $l > 0$ and for all $u$ in a neighborhood of $x$, then the latter set is trivial: $\partial^\infty f(x) = \{0\}$.


Strict Fréchet δ-Subdifferentials


As mentioned in “Introduction,” Fréchet subdifferen- 
tials have poor calculus and their direct application has 
been rather limited. There exists a way of enriching 
the properties of the subdifferentials. It consists in con- 
sidering differential properties of a function at nearby 
points. 

Consider a new derivative-like object based on the Fréchet subdifferential:

$$\partial_\delta f(x) = \bigcup_{\substack{u \in B_\delta(x) \\ |\mathrm{cl} f(u) - f(x)| \le \delta}} \partial(\mathrm{cl} f)(u). \qquad (6)$$

It depends on some positive $\delta$. Here $\mathrm{cl} f$ denotes the lower semicontinuous envelope of $f$ (its epigraph is the closure of the epigraph of $f$ in $X \times \mathbb{R}$). Unlike (1), the set (6) can be nonconvex. It is called the strict Fréchet $\delta$-subdifferential of $f$ at $x$ [24].

The strict Fréchet $\delta$-superdifferential $\partial^+_\delta f(x)$ of $f$ at $x$ can be defined in a similar way. The equality $\partial^+_\delta f(x) = -\partial_\delta(-f)(x)$ holds true. The strict subdifferentials and superdifferentials can be nonempty simultaneously and can be essentially different. The set $\partial^0_\delta f(x) = \partial_\delta f(x) \cup \partial^+_\delta f(x)$ can be useful in some situations. It is called the strict Fréchet $\delta$-differential of $f$ at $x$.

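The strict Fréchet $\delta$-normal cone to a set $\Omega$ at $x \in \Omega$ is defined similarly:

$$N_\delta(x \mid \Omega) = \bigcup_{u \in \mathrm{cl}\,\Omega \,\cap\, B_\delta(x)} N(u \mid \mathrm{cl}\,\Omega).$$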


The goal of introducing strict Fréchet $\delta$-subdifferentials is mainly notational. They are convenient for formulating “fuzzy” results, but such results can certainly be formulated in terms of ordinary Fréchet subdifferentials.


Limiting Subdifferentials 


The limiting Fréchet subdifferentials are defined as limits of “simple” ones [23,30,31,34]. To simplify the definitions, we assume in this subsection that $f: X \to \mathbb{R}_\infty$ is lower semicontinuous in a neighborhood of $x$.

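The limiting Fréchet subdifferential of $f$ at $x$ is defined as

$$\bar\partial f(x) = \{ x^* \in X^* : \exists \text{ sequences } \{x_k\} \subset X,\ \{x_k^*\} \subset X^* \text{ such that } x_k \xrightarrow{f} x,\ x_k^* \xrightarrow{w^*} x^* \text{ and } x_k^* \in \partial f(x_k),\ k = 1, 2, \dots \}. \qquad (7)$$

The notations $x_k \xrightarrow{f} x$ and $x_k^* \xrightarrow{w^*} x^*$ here mean, respectively, that $x_k \to x$ with $f(x_k) \to f(x)$ ($f$-attentive convergence [42]), and that $x_k^*$ converges to $x^*$ in the weak* topology of $X^*$.

$\bar\partial f(x)$ is a weakly* sequentially closed set in $X^*$. In general it is nonconvex. If $f$ is strictly differentiable at $x$, the set (7) reduces to the derivative: $\bar\partial f(x) = \{\nabla f(x)\}$.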
Using strict δ-subdifferentials, one can rewrite (7) in the following way:

$$\bar\partial f(x) = \bigcap_{\delta > 0} \mathrm{cl}^*\, \partial_\delta f(x),$$

where cl* denotes the weak* sequential closure.

Other limiting objects (the limiting superdifferen- 
tial, the limiting differential, the limiting normal cone, 
the singular limiting subdifferential, and the limiting 
coderivative) can be defined in a similar way. 

For instance, the limiting normal cone to a closed set Ω is defined by the equality

$$\bar N(x|\Omega) = \bigcap_{\delta > 0} \mathrm{cl}^*\, N_\delta(x|\Omega).$$

It coincides with the limiting subdifferential of the indicator function of Ω. The analog of (5) is also valid:

$$\bar\partial f(x) = \{x^* \in X^* : (x^*, -1) \in \bar N((x, f(x))|\mathrm{epi}\, f)\}.$$


The limiting subdifferentials and normal cones have been well investigated. They possess a good calculus (a consequence of the fuzzy calculus of Fréchet subdifferentials; see [19,23,30,31,32,34,36,42] for the properties of these objects and some examples). They have proved to be very efficient for formulating optimality conditions in nonsmooth optimization [20,22,29,30,31,32,34,36], especially in finite dimensions. When applying limiting subdifferentials in infinite dimensional spaces, one must be careful about the nontriviality of the limits in the weak* topology: additional regularity conditions are needed (compact epi-Lipschitzness, sequential normal compactness, partial sequential normal compactness [4,34,36], etc.).


Fréchet ε-Subdifferentials and ε-Normals


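In some cases it can be convenient to use ε-extensions of the Fréchet subdifferentials and normal cones [21,23,29,44]. For instance, the Fréchet ε-subdifferential and the Fréchet ε-superdifferential of f at x are defined as

$$\partial_\varepsilon f(x) = \left\{x^* \in X^* : \liminf_{u \to x} \frac{f(u) - f(x) - \langle x^*, u - x\rangle}{\|u - x\|} \ge -\varepsilon\right\},$$

$$\partial^+_\varepsilon f(x) = \left\{x^* \in X^* : \limsup_{u \to x} \frac{f(u) - f(x) - \langle x^*, u - x\rangle}{\|u - x\|} \le \varepsilon\right\}.$$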


Unlike (1) and (2), these sets depend on the specific norm on X (when ε > 0).

The next two propositions extend Propositions 1 
and 2 respectively. 


Proposition 3 If f is convex, then

$$\partial_\varepsilon f(x) = \partial f(x) + \varepsilon B^* = \{x^* \in X^* : f(u) - f(x) \ge \langle x^*, u - x\rangle - \varepsilon\|u - x\|,\ \forall u \in X\}.$$


Remark 2 Note that the above ε-subdifferential differs from the corresponding notion of convex analysis, which is usually defined [41] as the set of all $x^* \in X^*$ such that $f(u) - f(x) \ge \langle x^*, u - x\rangle - \varepsilon$ for all $u \in X$.
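As a one-dimensional illustration (our own example, not from the source), take f(x) = x² on ℝ and x = 0. By Proposition 3 the Fréchet ε-subdifferential is $\partial f(0) + \varepsilon B^* = [-\varepsilon, \varepsilon]$, whereas the convex-analysis ε-subdifferential of Remark 2 is $[-2\sqrt{\varepsilon}, 2\sqrt{\varepsilon}]$. The Python sketch below tests the two defining inequalities on a finite grid of points u. The grid, the tolerance and the function names are assumptions made for the illustration, so the check is numerical evidence rather than a proof.

```python
def f(u):
    """The convex test function f(u) = u^2 (illustrative choice)."""
    return u * u

# Finite grid on [-1, 1] standing in for "for all u in X" (an approximation).
GRID = [-1.0 + 2.0 * i / 20000 for i in range(20001)]
TOL = 1e-15  # guards against rounding in the comparisons

def in_frechet_eps_subdiff(xs, eps, x=0.0):
    """Characterization from Proposition 3: f(u) - f(x) >= xs*(u - x) - eps*|u - x|."""
    return all(f(u) - f(x) >= xs * (u - x) - eps * abs(u - x) - TOL for u in GRID)

def in_convex_eps_subdiff(xs, eps, x=0.0):
    """Convex-analysis definition from Remark 2: f(u) - f(x) >= xs*(u - x) - eps."""
    return all(f(u) - f(x) >= xs * (u - x) - eps - TOL for u in GRID)

eps = 0.01
for xs in (0.005, 0.1, 0.3):
    print(xs, in_frechet_eps_subdiff(xs, eps), in_convex_eps_subdiff(xs, eps))
```

With ε = 0.01, the value x* = 0.1 passes the convex-analysis test but fails the Fréchet one, because [−0.01, 0.01] is much smaller than [−0.2, 0.2].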


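Proposition 4 If $x_1^* \in \partial_{\varepsilon_1} f(x)$, $x_2^* \in \partial^+_{\varepsilon_2} f(x)$, $\varepsilon_1 \ge 0$, $\varepsilon_2 \ge 0$, then $\|x_1^* - x_2^*\| \le \varepsilon_1 + \varepsilon_2$.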


Formulation 
Direct Calculus 


The propositions below present some simple calculus 
results for Fréchet subdifferentials. Most of them fol- 
low directly from the definitions. More advanced state- 
ments of fuzzy calculus are presented in the next sub- 
sections. 


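Proposition 5 If f attains a local minimum at x, then $0 \in \partial f(x)$.

Proposition 6 $\partial(\lambda f)(x) = \lambda\, \partial f(x)$ for any $\lambda > 0$.

Proposition 7 Let $f_1, f_2 : X \to \mathbb{R}_\infty$ be finite at x. Then

$$\partial(f_1 + f_2)(x) \supset \partial f_1(x) + \partial f_2(x). \qquad (8)$$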


The above proposition presents an example of a Sum 
Rule. Usually the sum rule is the central result of 
any subdifferential calculus. Unfortunately, the inclu- 
sion (8) is almost useless: it does not allow one to de- 
compose elements of the subdifferential of the sum of 
functions in terms of elements of subdifferentials of the 
original functions. Simple examples show that inclu- 
sion (8) can be strict even in the convex case. The next 
proposition gives two important cases when the equal- 
ity holds true in (8). 
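A standard example (added here for illustration; it is not taken from the source) shows that the inclusion can be strict for convex functions. In $\mathbb{R}^2$ with the Euclidean norm, let $f_1$ and $f_2$ be the indicator functions of the closed disks $\Omega_1 = \{z : \|z - (1,0)\| \le 1\}$ and $\Omega_2 = \{z : \|z + (1,0)\| \le 1\}$, and let x = 0. Since $\Omega_1 \cap \Omega_2 = \{0\}$,

$$\partial(f_1 + f_2)(0) = N(0|\{0\}) = \mathbb{R}^2, \qquad \partial f_1(0) + \partial f_2(0) = \{t(-1,0) : t \ge 0\} + \{t(1,0) : t \ge 0\} = \mathbb{R} \times \{0\}.$$

Neither indicator function is continuous at 0, so this does not contradict part 1 of Proposition 8.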
Proposition 8 Let $f_1, f_2 : X \to \mathbb{R}_\infty$ be finite at x.
1. If $f_1$ and $f_2$ are convex and one of them is continuous at x, then

$$\partial(f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x).$$

2. If $f_1$ is Fréchet-differentiable at x, then

$$\partial(f_1 + f_2)(x) = \nabla f_1(x) + \partial f_2(x). \qquad (9)$$


Part 1 of Proposition 8 is known as the Moreau-Rockafellar theorem [41]. Part 2 is a simple corollary of Proposition 7. It is an interesting example of an inclusion implying an equality. Indeed, applying Proposition 7 to the sum of the functions $f_1 + f_2$ and $-f_1$ and making use of (3), one gets

$$\partial f_2(x) \supset \partial(f_1 + f_2)(x) - \partial^+ f_1(x). \qquad (10)$$

Taking into account Proposition 2, the inclusions (8) and (10) imply (9).

Proposition 8 yields a simple necessary optimality 
condition generalizing Proposition 5. 


Proposition 9 Let $f_1 : X \to \mathbb{R}$ be Fréchet-differentiable at x, where $f_2 : X \to \mathbb{R}_\infty$ is finite. If $f_1 + f_2$ attains a local minimum at x, then $-\nabla f_1(x) \in \partial f_2(x)$.


The next two assertions are corollaries of Propositions 7 
and 9. 


Proposition 10 Let $x \in \Omega_1 \cap \Omega_2$. Then $N(x|\Omega_1 \cap \Omega_2) \supset N(x|\Omega_1) + N(x|\Omega_2)$.


Proposition 11 Let f be Fréchet-differentiable at x. If f attains at x a local minimum on Ω, then $-\nabla f(x) \in N(x|\Omega)$.


For a set $\Omega \subset X$ and $u \in X$ consider the distance $d_\Omega(u) = \inf_{\omega \in \Omega} \|u - \omega\|$.

Proposition 12 ([18]) For any $x \in \Omega$ one has $\partial d_\Omega(x) = \{x^* \in N(x|\Omega) : \|x^*\| \le 1\}$.
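Proposition 12 can be checked by hand in one dimension. Taking, as an illustrative assumption, Ω = (−∞, 0] ⊂ ℝ and x = 0, one has N(0|Ω) = [0, ∞) and $d_\Omega(u) = \max(u, 0)$, so the proposition predicts $\partial d_\Omega(0) = [0, 1]$. The Python sketch below approximates the liminf in the definition of the Fréchet subdifferential by sampling difference quotients at finitely many nearby points. The helper names, sampling radius and tolerance are our own choices, and a finite sample can only suggest (not prove) membership.

```python
def dist_to_halfline(u):
    """d_Omega(u) for Omega = (-inf, 0] in R: the distance is max(u, 0)."""
    return max(u, 0.0)

def is_frechet_subgradient(fun, x, s, radius=1e-6, samples=1000, tol=1e-9):
    """Heuristic test of s in the Frechet subdifferential of fun at x (one dimension).

    Samples the quotients (fun(u) - fun(x) - s*(u - x)) / |u - x| for finitely many
    u with 0 < |u - x| <= radius and requires all of them to be >= -tol.
    """
    for k in range(1, samples + 1):
        h = radius * k / samples
        for u in (x - h, x + h):
            q = (fun(u) - fun(x) - s * (u - x)) / abs(u - x)
            if q < -tol:
                return False
    return True

candidates = (-0.5, 0.0, 0.5, 1.0, 1.5)
print([s for s in candidates if is_frechet_subgradient(dist_to_halfline, 0.0, s)])
```

Only the candidates in [0, 1] pass, matching the prediction of Proposition 12 for this set.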


Strict Differentiability 


Recall that f is called strictly differentiable [34,42] at x (with the strict derivative $\nabla f(x)$) if

$$\lim_{\substack{u \to x,\ u' \to x\\ u \ne u'}} \frac{f(u') - f(u) - \langle \nabla f(x), u' - u\rangle}{\|u' - u\|} = 0.$$
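The function f(x) = x² sin(1/x), f(0) = 0, a classical example chosen here for illustration, separates the two notions. It is Fréchet differentiable at 0 with ∇f(0) = 0, but $f'(x) = 2x\sin(1/x) - \cos(1/x)$ keeps oscillating between values close to −1 and 1 as x → 0, so f is not strictly differentiable at 0. The Python sketch below evaluates difference quotients with one endpoint fixed at 0 (they tend to 0) and with both endpoints moving to 0 along a sequence where f′ ≈ −1 (they stay close to −1, so the limit in the definition above is not 0). The point sequence is our own choice.

```python
import math

def f(x):
    """f(x) = x^2 sin(1/x), f(0) = 0: Frechet differentiable at 0, not strictly."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def quotient(u, v):
    """Difference quotient (f(v) - f(u)) / (v - u) for u != v."""
    return (f(v) - f(u)) / (v - u)

for k in (10, 100, 1000):
    t = 1.0 / (2.0 * math.pi * k)                # t -> 0 as k grows
    fixed_base = quotient(0.0, t)                # base point u = 0: tends to 0 = f'(0)
    moving_pair = quotient(t, t + 1e-3 * t * t)  # both points -> 0: stays near -1
    print(k, fixed_base, moving_pair)
```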


In the case of a strictly differentiable function the 
Fréchet subdifferentials and superdifferentials at nearby 
points cannot differ much from the strict derivative. 


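Proposition 13 ([25]) If f is strictly differentiable at x with the derivative $\nabla f(x)$, then for any ε > 0 there exists a δ > 0 such that:

1. $\partial^0_\delta f(x) \subset \nabla f(x) + \varepsilon B^*$;

2. $\nabla f(x) \in \partial_\varepsilon f(u) \cap \partial^+_\varepsilon f(u)$ for all $u \in B_\delta(x)$.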


The rest of the section is devoted to “fuzzy” results in terms of Fréchet subdifferentials and strict Fréchet δ-differentials.


Variational Principles 


The variational principles by Ekeland [10], Borwein 
and Preiss [3] as well as their subsequent followers 
[6,8,34,40,42] are very powerful tools of modern variational analysis. They make it possible to substitute an “almost minimal” point (up to ε) by another point, arbitrarily close to the initial one, which is a local minimizer of a slightly perturbed (usually by adding a small term) function. Thus, such principles can be viewed as fuzzy results.

The next assertion is valid for an arbitrary Asplund 
space. It is known as the Subdifferential Variational 
Principle. Recall that a Banach space is called Asplund [6,8,34,40] if any continuous convex function on it is Fréchet-differentiable on a dense $G_\delta$ set of points.
Asplund spaces form a rather broad subclass of Banach 
spaces. It includes, for instance, all spaces which ad- 
mit Fréchet-differentiable bump functions (in particu- 
lar, Fréchet smooth spaces). Reflexive spaces are exam- 
ples of Fréchet smooth spaces. 

Asplund spaces provide a very convenient frame- 
work for investigating “differential” properties of non- 
smooth functions. Actually the Asplund property of 
a Banach space is not only sufficient but also a neces- 
sary condition for the fulfillment of some basic results 
in nonsmooth analysis involving Fréchet normals and 
subdifferentials (see [6,13,34,35,38] and the statements 
below). 


Theorem 1 (Mordukhovich and Wang [38]) Let X be Asplund, $f : X \to \mathbb{R}_\infty$ be lower semicontinuous and bounded below, ε > 0, λ > 0. Suppose that $f(x) < \inf f + \varepsilon$. Then there exist a $u \in B_\lambda(x)$ and an $x^* \in \partial f(u)$ such that $f(u) \le f(x)$ and $\|x^*\| < \varepsilon/\lambda$.
The following theorem states that the class of Fréchet subdifferentiability spaces [15] coincides with the class of Asplund spaces.


Theorem 2 The following assertions are equivalent:

1. X is an Asplund space.

2. For any lower semicontinuous function $f : X \to \mathbb{R}_\infty$ the set $\{u \in X : \partial f(u) \ne \emptyset\}$ is dense in dom f.

3. For any lower semicontinuous function $f : X \to \mathbb{R}_\infty$ there exists an $x \in \mathrm{dom}\, f$ such that $\partial f(x) \ne \emptyset$.


Sum Rules 


After the sum rule was first established in the limiting form in [19] (see [23,31]), the fuzzy versions were derived in [12,15] (see also [6,17,37]). Now two main versions of the fuzzy sum rule are known. For simplicity they are formulated below in terms of strict δ-subdifferentials.


Rule 1 Weak Fuzzy Sum Rule Let $f_1, f_2 : X \to \mathbb{R}_\infty$ be finite at x and lower semicontinuous near x. Then $\partial(f_1 + f_2)(x) \subset \partial_\delta f_1(x) + \partial_\delta f_2(x) + U^*$ for any δ > 0 and any weak* neighborhood $U^*$ of 0 in $X^*$.


Rule 2 Strong Fuzzy Sum Rule Let $f_1 : X \to \mathbb{R}_\infty$ be finite at x and lower semicontinuous near x, and let $f_2 : X \to \mathbb{R}$ be Lipschitz continuous near x. Then $\partial(f_1 + f_2)(x) \subset \partial_\delta f_1(x) + \partial_\delta f_2(x) + \delta B^*$ for any δ > 0.


A Banach space is called a trustworthy space [15] (for 
some kind of a subdifferential) if Rule 1 is valid in it. 
The following theorem proved by Fabian [12] states 
that for the Fréchet subdifferential the class of trustwor- 
thy spaces coincides with Asplund spaces. 


Theorem 3 The following assertions are equivalent: 
1. X is an Asplund space. 

2. The Weak Fuzzy Sum Rule is valid in X. 

3. The Strong Fuzzy Sum Rule is valid in X. 


The Weak Fuzzy Sum Rule yields the following rep- 
resentation of Fréchet normals to the intersection of 
closed sets. 


Proposition 14 Let $\Omega_1, \Omega_2$ be closed subsets in an Asplund space X and $x \in \Omega_1 \cap \Omega_2$. Then $N(x|\Omega_1 \cap \Omega_2) \subset N_\delta(x|\Omega_1) + N_\delta(x|\Omega_2) + U^*$ for any δ > 0 and any weak* neighborhood $U^*$ of 0 in $X^*$.


Other fuzzy calculus results (chain rules, formulas for maximum-type functions, mean value theorems, etc.) for functions and multifunctions can be deduced from (some form of) the sum rule [5,16,24,37].


Extremal Principle 


The Extremal Principle continues the line of variational 
principles discussed above and is in a sense equivalent 
to them as well as to the sum rules. 

Let $\Omega_1, \Omega_2$ be closed subsets in X. They are called locally extremal [29,30] near $x \in \Omega_1 \cap \Omega_2$ if there exist a neighborhood U of x and sequences $\{a_{ik}\} \subset X$, i = 1, 2, k = 1, 2, ..., such that $a_{ik} \to 0$ when $k \to \infty$ and $(\Omega_1 - a_{1k}) \cap (\Omega_2 - a_{2k}) \cap U = \emptyset$, k = 1, 2, ....

This means that by an arbitrarily small shift the sets can be made disjoint in a neighborhood of x.
The definition represents a rather general notion of ex- 
tremality: some locally extremal system corresponds to 
a local solution of any optimization problem (see vari- 
ous examples in [22,29,31,33]). 

The Extremal Principle, first established in [29] (see 
also [22,30,31]) for the case of a Fréchet smooth space 
(and in terms of ε-normals) and in [35] (see [34]) in the
Asplund space setting, provides a dual space character- 
ization of locally extremal systems in terms of Fréchet 
normals. It can be viewed as a fuzzy form of the separa- 
tion property. 


Extremal Principle If a system of sets $\Omega_1, \Omega_2$ is locally extremal near $x \in \Omega_1 \cap \Omega_2$, then for any δ > 0 there exist elements $x_1^* \in N_\delta(x|\Omega_1)$, $x_2^* \in N_\delta(x|\Omega_2)$ such that $\|x_1^* + x_2^*\| < \delta$, $\|x_1^*\| + \|x_2^*\| = 1$.
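For instance (a standard illustration, not taken from the source), in $\mathbb{R}^2$ with the Euclidean norm let $\Omega_1 = \{(a, b) : b \ge a^2\}$, $\Omega_2 = \{(a, b) : b \le 0\}$ and x = (0, 0). Taking $a_{1k} = 0$ and $a_{2k} = (0, 1/k)$ gives

$$(\Omega_1 - a_{1k}) \cap (\Omega_2 - a_{2k}) = \{(a, b) : b \ge a^2\} \cap \{(a, b) : b \le -1/k\} = \emptyset,$$

so the system is locally extremal near x. The Fréchet normals

$$x_1^* = (0, -\tfrac{1}{2}) \in N(x|\Omega_1), \qquad x_2^* = (0, \tfrac{1}{2}) \in N(x|\Omega_2)$$

belong to $N_\delta(x|\Omega_1)$ and $N_\delta(x|\Omega_2)$ for every δ > 0 and satisfy $\|x_1^* + x_2^*\| = 0$ and $\|x_1^*\| + \|x_2^*\| = 1$, as the Extremal Principle requires.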


The following theorem proved in [35] shows that the 
Extremal Principle provides an extremal characteriza- 
tion of Asplund spaces. 


Theorem 4 The following assertions are equivalent: 
1. X is an Asplund space. 
2. The Extremal Principle is valid in X. 


Due to Theorems 3 and 4 the Extremal Principle 
is equivalent to the Sum Rules. It is also equiva- 
lent to some other basic results of nonsmooth analy- 
sis [6,34,45]. 

The Extremal Principle can be viewed as a certain 
extension of the classical separation theorem for convex 
sets. It was used in [22,29,30,31,36] and in many other 
papers as a main tool for deducing calculus formulas 
and necessary optimality conditions. 
The local extremality assumption in the Extremal
Principle can be replaced by a weaker stationarity 
condition [26,27,28]. The resulting Extended Extremal 
Principle is formulated as a necessary and sufficient 
condition and is also equivalent to the asplundity of the 
space. 

As noticed in [35], considering the extremal system provided by the pair {x}, Ω, where x is a boundary point of a closed set Ω, makes it possible to deduce from Theorem 4 the following nonconvex generalization of the well-known Bishop-Phelps theorem [40].


Corollary 1 Let Ω be a closed subset in an Asplund space X and let $x \in \mathrm{bd}\,\Omega$. Then for any δ > 0 there exists $x^* \in N_\delta(x|\Omega)$ such that $\|x^*\| = 1$.


See also 
Introduction


The article considers different stationarity and regular- 
ity concepts for extended real-valued functions on met- 
ric spaces. 

All the properties can be characterized in terms of 
certain local constants. A function is said to be station- 
ary at a point (in some sense) if the corresponding con- 
stant is zero (a critical point). Otherwise the function is 
said to be regular at this point (in the same sense), and 
the constant provides a quantitative estimate of regu- 
larity. 

Traditionally the stationary behavior of a function 
at a point (stationary point) means that it is arbitrar- 
ily close to a constant near this point, that is the rate of 
change of the function is infinitely small compared to 
the increment of the variable. Of course, this is equiv- 
alent to the derivative being equal to zero (dual char- 
acterization of stationarity). The classical examples of 
the stationary behavior of a function, found in the textbooks, are given by the functions y = x², y = −x², and y = x³. These three examples characterize the three
possible types of stationary behavior in the differen- 
tiable case. 

Stationarity arises naturally in optimization theory: 
a point of minimum or maximum is necessarily a sta- 
tionary point. At the same time it is easy to see that 
weaker stationarity concepts are applicable to optimiza- 
tion problems. For instance, when dealing with mini- 
mizing a real-valued function, only its decrement must 
be infinitely small at a stationary point, and the func- 
tion itself does not need to be arbitrarily close to a con- 
stant or even differentiable at the point. Of course, if 
it is differentiable, one has the same classical stationarity concept. In the nondifferentiable case more types of stationary behavior are possible. The examples are: y = |x| and y = max(x, −x²) (both functions are considered near the point x = 0). One can speak about inf-stationarity (the term suggested by Vladimir F. Demianov). From the point of view of maximization a concept of sup-stationarity can be considered in a similar
way. Thus in the nondifferentiable case the stationarity 
splits into two “semi-stationarity” concepts. 

Another pair of stationarity concepts can be of in- 
terest in optimization theory: the point itself may not be 
stationary (inf or sup), but in any of its neighborhoods there exists another point at which the behavior of the
function is arbitrarily close to stationary (a “fuzzy” con- 
dition). We will speak about weak stationarity (inf or 
sup). The exact definitions will be given below. 

An example of this type of stationary behavior is given by the function y = x sin(1/x) if x ≠ 0, and y = 0 if x = 0. One can easily see that this function is differentiable on ℝ∖{0}, and there exists a sequence {x_k} such that x_k → 0 and x_k is a point of local minimum, k = 1, 2, .... Another example: y = x + x² sin(1/x) if x ≠ 0, and y = 0 if x = 0. This function is everywhere differentiable, y′(0) = 1, and there exists a sequence {x_k} such that x_k → 0 and y′(x_k) = 0, k = 1, 2, ....
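The second example can be checked directly (this short computation is our addition). For x ≠ 0, y′ = 1 + 2x sin(1/x) − cos(1/x), and at $x_k = 1/(2\pi k)$ one has sin(1/x_k) = 0 and cos(1/x_k) = 1, so y′(x_k) = 0. The Python sketch below simply evaluates this formula; the printed derivatives are zero up to floating-point rounding.

```python
import math

def dy(x):
    """Derivative of y = x + x^2 sin(1/x) for x != 0."""
    return 1.0 + 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

for k in (1, 10, 100):
    xk = 1.0 / (2.0 * math.pi * k)   # here sin(1/xk) = 0 and cos(1/xk) = 1
    print(k, xk, dy(xk))             # dy(xk) is zero up to rounding error
```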

Stationarity concepts can also be defined in terms 
of dual space elements (subdifferentials). The relations 
between primal and dual definitions provide dual char- 
acterizations of (primal space concepts of) stationarity. 

This article contains no proofs. A more detailed de- 
scription of the stationarity and regularity concepts for 
real-valued functions can be found in [11]. 

Mostly standard notations are used throughout this article. X denotes a metric space with distance d. $B_\rho(x)$ stands for the closed ball with center x and radius ρ.


Definitions 
Inf-θ-Stationarity and Inf-θ-Regularity


Let f be a function on a metric space X with values in the extended real line $\mathbb{R}_\infty = \mathbb{R} \cup \{+\infty\}$. It is assumed to be finite at some point $x^\circ \in X$.

For ρ > 0 define the constant

$$\theta_\rho[f](x^\circ) = \inf_{x \in B_\rho(x^\circ)} f(x) - f(x^\circ). \qquad (1)$$


Note that $\theta_\rho[f](x^\circ) \le 0$, and the equality $\theta_\rho[f](x^\circ) = 0$ for some ρ > 0 (for all ρ > 0) means that x° is a point of local (global) minimum of f.

Of course, the infimum in (1) can be limited to the set $\{x \in B_\rho(x^\circ) : f(x) \le f(x^\circ)\}$, or even to the set $\{x \in B_\rho(x^\circ) : f(x) < f(x^\circ)\}$ under the additional agreement that the infimum over the empty subset of ℝ is 0.

The function $\rho \mapsto \theta_\rho[f](x^\circ)$ is nonincreasing on $\mathbb{R}_+$ and $\lim_{\rho \to +0} \theta_\rho[f](x^\circ) \le 0$. The equality $\lim_{\rho \to +0} \theta_\rho[f](x^\circ) = 0$ means that f is lower semicontinuous at x°. In the latter case it can be important to know how quickly $\theta_\rho[f](x^\circ)$ approaches 0 compared to ρ.

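Define two more “derivative-like” constants based on (1):

$$\theta[f](x^\circ) = \limsup_{\rho \to +0} \frac{\theta_\rho[f](x^\circ)}{\rho}, \qquad (2)$$

$$\bar\theta[f](x^\circ) = \limsup_{\substack{x \xrightarrow{f} x^\circ\\ \rho \to +0}} \frac{\theta_\rho[f](x)}{\rho}, \qquad (3)$$

where $x \xrightarrow{f} x^\circ$ means that $x \to x^\circ$ with $f(x) \to f(x^\circ)$. Due to the variations of x in (3), $\bar\theta[f](x^\circ)$ gains some properties of the strict derivative.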

The constants (2) and (3) are nonpositive too, and 
“zero cases” correspond to certain kinds of stationary 
behavior of f near x°. If a constant is strictly negative, 
this can be considered as a kind of regularity. 


Definition 1 f is

(i) inf-θ-stationary at x° if $\theta[f](x^\circ) = 0$;

(ii) weakly inf-θ-stationary at x° if $\bar\theta[f](x^\circ) = 0$;

(iii) inf-θ-regular at x° if $\theta[f](x^\circ) < 0$;

(iv) strongly inf-θ-regular at x° if $\bar\theta[f](x^\circ) < 0$.
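The constants (1) and (2) can be approximated for simple functions of one variable. The Python sketch below is our own illustration. It replaces the infimum in (1) by a minimum over a finite grid (an assumption that is harmless for the continuous functions used) and prints $\theta_\rho[f](0)/\rho$ for a few small ρ. For y = x the ratio is −1 (inf-θ-regular at 0), for y = |x| it is 0 (a minimum), and for y = max(x, −x²) it equals −ρ, so θ[f](0) = 0 although 0 is not a minimum point. This matches the inf-stationary behavior described in the Introduction.

```python
def theta_rho(f, x0, rho, n=2001):
    """Grid version of (1): min of f over n points of [x0 - rho, x0 + rho], minus f(x0).

    The grid contains x0 itself, so the result is <= 0 like theta_rho[f](x0).
    """
    pts = [x0 + rho * (-1.0 + 2.0 * i / (n - 1)) for i in range(n)]
    return min(f(x) for x in pts) - f(x0)

examples = {
    "y = x": lambda x: x,                          # ratio -1: inf-theta-regular at 0
    "y = |x|": abs,                                # ratio 0: a local minimum
    "y = max(x, -x^2)": lambda x: max(x, -x * x),  # ratio -rho -> 0: inf-theta-stationary
}
for name, func in examples.items():
    print(name, [theta_rho(func, 0.0, r) / r for r in (1e-1, 1e-2, 1e-3)])
```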


The purpose of the “inf” prefix in this definition is to 
emphasize that minimization problems are addressed 
here. Unlike the classical case, stationarity-regularity 
properties of nondifferentiable functions “from below” 
and “from above” can be essentially different. 


Inf-τ-Stationarity and Inf-τ-Regularity
Another way of defining stationarity-regularity is based on using slightly modified versions of (2) and (3):

$$\tau[f](x^\circ) = \liminf_{x \to x^\circ} \frac{[f(x) - f(x^\circ)]_-}{d(x, x^\circ)}, \qquad (4)$$

$$\bar\tau[f](x^\circ) = \limsup_{\substack{x \xrightarrow{f} x^\circ\\ \rho \to +0}}\ \inf_{u \in B_\rho(x) \setminus \{x\}} \frac{[f(u) - f(x)]_-}{d(u, x)}. \qquad (5)$$

The notation $[a]_- = \min(a, 0)$ is used here. Again, only the points $x \in B_\rho(x^\circ)$ with $f(x) < f(x^\circ)$ and $u \in B_\rho(x)$ with $f(u) < f(x)$ are of interest in (4) and (5), respectively. The role of the notation is to handle the case where the set of such points is empty. Similarly to (2) and (3), these constants are nonpositive.


Remark 1 $\tau[f](x^\circ)$ coincides up to a sign with the strong slope $|\nabla f|(x^\circ)$ of f at x° [1] (see also [5]).

Definition 2 f is

(i) inf-τ-stationary at x° if $\tau[f](x^\circ) = 0$;

(ii) weakly inf-τ-stationary at x° if $\bar\tau[f](x^\circ) = 0$;

(iii) inf-τ-regular at x° if $\tau[f](x^\circ) < 0$;

(iv) strongly inf-τ-regular at x° if $\bar\tau[f](x^\circ) < 0$.

The relations between the constants (2), (3) and (4), (5), as well as between the corresponding stationarity and regularity concepts, will be discussed in the next section.


Sup-Stationarity and Sup-Regularity 


Similarly to (1)-(5), corresponding “maximization” constants can be defined. To do this, one has to replace “inf,” “lim inf,” “lim sup,” and $[\cdot]_-$ by “sup,” “lim sup,” “lim inf,” and $[\cdot]_+$, respectively, in the corresponding definitions. The resulting constants are nonnegative. They are related to (1)-(5) by the following equalities:

$$\theta^+_\rho[f](x^\circ) = -\theta_\rho[-f](x^\circ), \quad \theta^+[f](x^\circ) = -\theta[-f](x^\circ), \quad \bar\theta^+[f](x^\circ) = -\bar\theta[-f](x^\circ),$$

$$\tau^+[f](x^\circ) = -\tau[-f](x^\circ), \quad \bar\tau^+[f](x^\circ) = -\bar\tau[-f](x^\circ),$$

and lead to similar sup-stationarity and sup-regularity concepts.

Of course, for a function f the set of sup-stationary 
(sup-regular) points is different in general from that of 
inf-stationary (inf-regular) points. 

The “combined” concepts can also be of interest. It is natural to say that a function is stationary (in some sense) at a point if it is either inf-stationary or sup-stationary at this point. In contrast, the regularity property for a function is satisfied when this function is both inf-regular and sup-regular at the point.

Definition 3 f is

(i) θ-stationary at x° if $\max(\theta[f](x^\circ), \theta[-f](x^\circ)) = 0$;

(ii) weakly θ-stationary at x° if $\max(\bar\theta[f](x^\circ), \bar\theta[-f](x^\circ)) = 0$;

(iii) θ-regular at x° if $\max(\theta[f](x^\circ), \theta[-f](x^\circ)) < 0$;

(iv) strongly θ-regular at x° if $\max(\bar\theta[f](x^\circ), \bar\theta[-f](x^\circ)) < 0$;

(v) τ-stationary at x° if $\max(\tau[f](x^\circ), \tau[-f](x^\circ)) = 0$;

(vi) weakly τ-stationary at x° if $\max(\bar\tau[f](x^\circ), \bar\tau[-f](x^\circ)) = 0$;

(vii) τ-regular at x° if $\max(\tau[f](x^\circ), \tau[-f](x^\circ)) < 0$;

(viii) strongly τ-regular at x° if $\max(\bar\tau[f](x^\circ), \bar\tau[-f](x^\circ)) < 0$.


Strong inf-regularity can be interpreted in the following 
way: all points in a neighborhood of a given point have 
“descent sequences,” and the rate of descent is uniform. 
In contrast to that, strong regularity is equivalent to the 
existence of both descent and ascent sequences with the 
uniformity property. 


Dual Stationarity and Regularity 


All definitions in the preceding subsections are primal 
space definitions. As in the classical analysis, dual char- 
acterizations of stationarity and regularity concepts are 
important. In the case of a normed linear space, such 
characterizations can be formulated in terms of Fréchet 
subdifferentials. 

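Let X be a normed linear space. Its (topological) dual is denoted $X^*$, and $\langle\cdot,\cdot\rangle$ is the bilinear form defining the duality pairing. Recall that the Fréchet subdifferential of f at x° is defined as

$$\partial f(x^\circ) = \left\{x^* \in X^* : \liminf_{x \to x^\circ} \frac{f(x) - f(x^\circ) - \langle x^*, x - x^\circ\rangle}{\|x - x^\circ\|} \ge 0\right\}. \qquad (6)$$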


Definition 4 f is
(i) inf-d-stationary at x° if $0 \in \partial f(x^\circ)$;
(ii) inf-d-regular at x° if $0 \notin \partial f(x^\circ)$.
It follows immediately from the definitions that in the normed space setting inf-τ-stationarity (inf-τ-regularity) is equivalent to inf-d-stationarity (inf-d-regularity).
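In one dimension this equivalence can be observed numerically. The Python sketch below is our own illustration; the sampling scheme and names are assumptions. It computes a sampled version of the infimum of $[f(x) - f(x^\circ)]_-/|x - x^\circ|$ over $0 < |x - x^\circ| \le \rho$, which by (4) increases to τ[f](x°) as ρ → +0. For y = max(x, −x²) the estimate equals −ρ and tends to 0, so τ[f](0) = 0 and 0 ∈ ∂f(0) although 0 is not a minimum point. For y = x it stays at −1, and indeed ∂f(0) = {1} does not contain 0.

```python
def tau_estimate(f, x0, rho, n=1000):
    """Sampled version of inf { [f(x) - f(x0)]_- / |x - x0| : 0 < |x - x0| <= rho }.

    As rho -> +0 the exact infimum increases to tau[f](x0) from (4).
    """
    best = 0.0
    for i in range(1, n + 1):
        h = rho * i / n
        for x in (x0 - h, x0 + h):
            best = min(best, min(f(x) - f(x0), 0.0) / abs(x - x0))
    return best

for name, func in (("y = x", lambda x: x),
                   ("y = max(x, -x^2)", lambda x: max(x, -x * x)),
                   ("y = |x|", abs)):
    print(name, [tau_estimate(func, 0.0, r) for r in (1e-1, 1e-2, 1e-3)])
```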

Somewhat more complicated constructions are 
needed for the characterization of weak stationarity and 
strong regularity. 

Let us assume for simplicity that f is lower semicon- 
tinuous near x°. 

In the general nonconvex setting the subdifferential mapping ∂f(·) fails to possess good (semi-)continuity properties. In fact, the set ∂f(x) can be empty rather often. Based on (6) one can define a more robust derivative-like object:

$$\partial_\delta f(x^\circ) = \bigcup_{\substack{x \in B_\delta(x^\circ)\\ |f(x) - f(x^\circ)| \le \delta}} \partial f(x). \qquad (7)$$

This object depends on a positive parameter δ and accumulates information on “differential” properties of f at nearby points, thus attaining some properties of the strict derivative. The set (7) is called the strict δ-subdifferential of f at x° (see [7,8,9]). In contrast to (6), set (7) can be nonconvex. However, it possesses a certain subdifferential calculus.
Using (7), one more constant can be defined for characterizing stationarity/regularity properties of f:

$$\eta[f](x^\circ) = \lim_{\delta \to +0} \inf\{\|x^*\| : x^* \in \partial_\delta f(x^\circ)\}. \qquad (8)$$

Unlike the constants considered in the preceding subsections, this constant is nonnegative.


Definition 5 f is
(i) inf-η-stationary at x° if $\eta[f](x^\circ) = 0$;
(ii) inf-η-regular at x° if $\eta[f](x^\circ) > 0$.

Note that the inf-η-stationarity condition $\eta[f](x^\circ) = 0$ does not imply the inclusion $0 \in \partial_\delta f(x^\circ)$.

Example 1 Take f(x) = x if x ≤ 0, and f(x) = x² otherwise. One has $\partial f(0) = \emptyset$ and $0 \notin \partial_\delta f(0)$ for any δ > 0, while $\eta[f](0) = 0$.


Fortunately (8) happens to be closely related to (3) and 
(5). 

Sup-d-stationarity and sup-η-stationarity, as well as the corresponding regularity concepts, can be defined in a similar way.


Formulation 


Relations Between the “Elementary” Constants 


Proposition 1 The following assertions hold true:

(i) $\tau[f](x^\circ) \le \theta[f](x^\circ)$;

(ii) If $\theta_\rho[f](x^\circ) = 0$ for some ρ > 0, then $\tau[f](x^\circ) = \theta[f](x^\circ) = 0$.


Proposition 1 (i) implies the relations between the corresponding stationarity and regularity concepts:

• inf-τ-stationarity ⇒ inf-θ-stationarity;

• inf-θ-regularity ⇒ inf-τ-regularity.

Proposition 1 (ii) means that at a point of local minimum a function is both inf-τ-stationary and inf-θ-stationary.

Inequality (i) in Proposition 1 can be strict even for 
functions from R to R. 


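Example 2 Take $f(x) = -|x|$ if $|x| = 1/2^n$, n = 1, 2, ..., and f(x) = 0 otherwise. Obviously $\tau[f](0) = -1$. At the same time, for any $\rho \in \Delta_n = \{\rho : 1/2^n \le \rho < 1/2^{n-1}\}$ one has $\theta_\rho[f](0) = -1/2^n$ and

$$\sup_{\rho \in \Delta_n} \frac{\theta_\rho[f](0)}{\rho} = -\frac{2^{n-1}}{2^n} = -\frac{1}{2}.$$

Thus, $\theta[f](0) = -1/2$.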


It is possible to modify the above example to make θ[f](0) equal zero.


Example 3 Take $f(x) = -|x|$ if $|x| = 1/n^n$, n = 1, 2, ..., and f(x) = 0 otherwise. One still has $\tau[f](0) = -1$ while $\theta[f](0) = 0$.


Thus, in the above example f is inf-τ-regular at 0 while being inf-θ-stationary at this point.
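The computations in Examples 2 and 3 can be reproduced numerically. For a function equal to −|x| at the points $\pm a_n$ and 0 elsewhere, the infimum in (1) over $B_\rho(0)$ is $-\max\{a_n : a_n \le \rho\}$, which is what the helper below returns. The helper name and the truncation to finitely many spikes are our own assumptions. On each interval $\Delta_n = [a_n, a_{n-1})$ the ratio $\theta_\rho[f](0)/\rho$ runs from −1 up to nearly $-a_n/a_{n-1}$. This bound is −1/2 for $a_n = 1/2^n$ but tends to 0 for $a_n = 1/n^n$.

```python
def theta_rho_at_zero(rho, spikes):
    """theta_rho[f](0) when f = -|x| at the points +-a (a in spikes) and f = 0 elsewhere.

    The infimum of f over B_rho(0) is -max{a in spikes : a <= rho}, or 0 if no spike fits.
    """
    inside = [a for a in spikes if a <= rho]
    return -max(inside) if inside else 0.0

N = 12  # only finitely many spikes are kept; enough for the values of rho used below
ex2 = [2.0 ** -n for n in range(1, N + 1)]       # Example 2: a_n = 1/2^n
ex3 = [float(n) ** -n for n in range(1, N + 1)]  # Example 3: a_n = 1/n^n

for name, a in (("Example 2", ex2), ("Example 3", ex3)):
    for n in (5, 10):
        left = a[n - 1]                  # rho = a_n, left end of Delta_n
        right = a[n - 2] * (1 - 1e-9)    # rho just below a_{n-1}
        print(name, n,
              theta_rho_at_zero(left, a) / left,     # equals -1
              theta_rho_at_zero(right, a) / right)   # close to -a_n / a_{n-1}
```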

It is possible to modify the example further to make 
f continuous and even differentiable near 0 (but not 
strictly differentiable!) while keeping the inequality (i) 
in Proposition 1 strict. 


Relations Between the “Strict” Constants 


The relations between the elementary constants and 
their “strict” counterparts, as well as between the two 
“strict” constants, are given by the following theorem. 


Theorem 1 The following assertions hold true:
(i) $\bar\theta[f](x^\circ) \ge \limsup_{x \xrightarrow{f} x^\circ} \theta[f](x)$;
(ii) $\bar\tau[f](x^\circ) = \limsup_{x \xrightarrow{f} x^\circ} \tau[f](x)$;
(iii) $\bar\tau[f](x^\circ) \le \bar\theta[f](x^\circ)$;
(iv) If X is complete and f is lower semicontinuous near x°, then $\bar\tau[f](x^\circ) = \bar\theta[f](x^\circ)$.
Parts (i) and (ii) of Theorem 1 imply the inequalities $\theta[f](x^\circ) \le \bar\theta[f](x^\circ)$ and $\tau[f](x^\circ) \le \bar\tau[f](x^\circ)$, and both of them can be strict.


Example 4 Take the function f from Example 2. Evidently, f attains a local minimum at $x_n = 1/2^n$ for any n = 1, 2, ..., and consequently, $\theta_\rho[f](x_n) = 0$ for some ρ > 0. It follows from Proposition 1 (ii) that $\tau[f](x_n) = \theta[f](x_n) = 0$. Consequently, $\bar\tau[f](0) = \bar\theta[f](0) = 0$. Recall that $\tau[f](0) = -1$ and $\theta[f](0) = -1/2$.

Inequalities (i) and (iii) in Theorem 1 can be strict, too. 


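Example 5 Define the function $f : \mathbb{R} \to \mathbb{R}$ in the following way: f(x) = x if x ≤ 0, f(x) = x − 1/n if 1/n < x ≤ 1/(n − 1), n = 2, 3, ..., and f(x) = x − 1/2 if x > 1/2. It is easy to see that $\theta[f](x) \le -1$ and $\tau[f](x) \le -1$ for any $x \in \mathbb{R}$. Then $\bar\tau[f](0) = -1$. On the other hand, take $x_n = 1/n + 1/n^2$, $\rho_n = 1/n$, n = 2, 3, .... Then $f(x_n) = 1/n^2$, and consequently, $\theta_{\rho_n}[f](x_n) \ge -1/n^2$. It follows immediately that $\bar\theta[f](0) = 0$. (This f is not lower semicontinuous near 0, so part (iv) of Theorem 1 does not apply.)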


Due to part (iv) of Theorem 1, in the case of a lower semicontinuous function on a complete metric space the two weak stationarity concepts, as well as the two strong regularity concepts, coincide, and the prefixes θ and τ can be omitted.


Corollary 1 The following assertions hold true:

(i) Inf-θ-stationarity ⇒ weak inf-θ-stationarity; strong inf-θ-regularity ⇒ inf-θ-regularity;

(ii) Inf-τ-stationarity ⇒ weak inf-τ-stationarity; strong inf-τ-regularity ⇒ inf-τ-regularity;

(iii) Weak inf-τ-stationarity ⇒ weak inf-θ-stationarity; strong inf-θ-regularity ⇒ strong inf-τ-regularity;

(iv) If X is complete and f is lower semicontinuous near x°, then
weak inf-τ-stationarity ⇔ weak inf-θ-stationarity;
strong inf-θ-regularity ⇔ strong inf-τ-regularity.


The next “fuzzy” characterization of weak inf-τ-stationarity can be convenient for applications. It follows directly from definition (5).


Proposition 2 f is weakly inf-τ-stationary at x° if and only if for any ε > 0 there exists an $x \in B_\varepsilon(x^\circ)$ such that $|f(x) - f(x^\circ)| < \varepsilon$ and

$$f(u) + \varepsilon\, d(u, x) \ge f(x) \quad \text{for all } u \text{ near } x. \qquad (9)$$


Remark 2 A point x satisfying (9) is referred to in [12] (see also [6]) as a local Ekeland point of f (with factor ε). If all the conditions in Proposition 2 are satisfied, then x° is said to be a stationary point of f with respect to minimization [12]. Thus, stationarity with respect to minimization is equivalent to weak inf-τ-stationarity and, in the case of a lower semicontinuous function on a complete metric space, also to weak inf-θ-stationarity.


Relations Between the Primal and Dual Constants 


Henceforth X is assumed to be a normed linear space. 
The next assertion is straightforward and has already 
been mentioned in the previous section. 


Proposition 3
(i) inf-τ-stationarity ⇔ inf-d-stationarity;
(ii) inf-τ-regularity ⇔ inf-d-regularity.


Remark 3 Due to Propositions 1 and 3, the inclusion $0 \in \partial f(x^\circ)$ is sufficient for inf-θ-stationarity of f at x°. The opposite implication is not true in general (see Example 3, where f is inf-θ-stationary at 0 while τ[f](0) = −1).


In what follows f is assumed to be lower semicontinu- 
ous near x°. 


Theorem 2

(i) $\bar\theta[f](x^\circ) + \eta[f](x^\circ) \ge 0$.

(ii) If X is Asplund, then

$$\bar\theta[f](x^\circ) + \frac{\eta[f](x^\circ)}{1 + \eta[f](x^\circ)} \le 0.$$

This theorem follows from [10], Theorem 2. The first 
part of the theorem is elementary. The proof of the sec- 
ond part is based on the application of the two fun- 
damental results of variational analysis: the Ekeland 
variational principle [2] and the fuzzy sum rule due to 
Fabian [3]. 

Thus, in an Asplund space the constants $\bar\theta[f](x^\circ)$ and $\eta[f](x^\circ)$ can be zero or nonzero only simultaneously. Recall that a Banach space is called Asplund (see [4,13,14]) if any continuous convex function on it is Fréchet differentiable on a dense $G_\delta$ subset. Note that in a Banach space $\bar\theta[f](x^\circ) = \bar\tau[f](x^\circ)$ due to Theorem 1.


Corollary 2

(i) inf-η-stationarity ⇒ weak inf-θ-stationarity;

(ii) strong inf-θ-regularity ⇒ inf-η-regularity;

(iii) If X is Asplund, then
weak inf-θ-stationarity ⇔ weak inf-τ-stationarity ⇔ inf-η-stationarity;
strong inf-θ-regularity ⇔ strong inf-τ-regularity ⇔ inf-η-regularity.


Differentiable Functions 


The constants and corresponding stationarity/regu- 
larity concepts defined above take quite a traditional 
form when the function is assumed differentiable or 
convex. Fortunately, the number of different constants 
and concepts reduces significantly. 


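Theorem 3 If f is Fréchet differentiable at x° with the derivative $\nabla f(x^\circ)$, then

$$\theta[f](x^\circ) = \tau[f](x^\circ) = -\theta^+[f](x^\circ) = -\tau^+[f](x^\circ) = -\|\nabla f(x^\circ)\|.$$

If, additionally, the derivative is strict, then

$$\bar\theta[f](x^\circ) = \bar\tau[f](x^\circ) = -\bar\theta^+[f](x^\circ) = -\bar\tau^+[f](x^\circ) = -\|\nabla f(x^\circ)\|.$$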
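Theorem 3 is easy to check numerically for a smooth function of one variable. The Python sketch below is our illustration; f = sin, x° = 0.3 and the grid are arbitrary choices. It approximates $\theta_\rho[f](x^\circ)/\rho$ and $\theta^+_\rho[f](x^\circ)/\rho = -\theta_\rho[-f](x^\circ)/\rho$. As ρ → +0 they approach $-|f'(x^\circ)| = -\cos 0.3$ and $|f'(x^\circ)|$, respectively, up to an O(ρ) error.

```python
import math

def theta_rho(f, x0, rho, n=20001):
    """Grid version of theta_rho[f](x0): min of f on [x0 - rho, x0 + rho] minus f(x0)."""
    return min(f(x0 + rho * (-1.0 + 2.0 * i / (n - 1))) for i in range(n)) - f(x0)

x0 = 0.3
for rho in (1e-1, 1e-2, 1e-3):
    inf_ratio = theta_rho(math.sin, x0, rho) / rho
    # theta^+_rho[f](x0) = -theta_rho[-f](x0), the first relation listed for the sup constants
    sup_ratio = -theta_rho(lambda x: -math.sin(x), x0, rho) / rho
    print(rho, inf_ratio, sup_ratio, math.cos(x0))  # ratios approach -cos(0.3) and cos(0.3)
```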
Recall that f is called strictly differentiable [13,15] at x° (with the derivative $\nabla f(x^\circ)$) if

$$\lim_{\substack{x \to x^\circ,\ u \to x^\circ\\ u \ne x}} \frac{f(u) - f(x) - \langle \nabla f(x^\circ), u - x\rangle}{\|u - x\|} = 0.$$

This condition is stronger than the traditional Fréchet differentiability. Thus, the condition $\nabla f(x^\circ) \ne 0$ does not guarantee strong regularity in the sense of Definition 1 (or Definition 2) unless f is strictly differentiable at x°.


Example 6 Take f(x) = x + x² sin(1/x) if x ≠ 0 and f(0) = 0. This function is everywhere Fréchet differentiable and ∇f(0) = 1. Thus, f is regular at zero. At the same time $\bar\tau[f](0) = \bar\tau^+[f](0) = 0$: there exists a sequence $x_k \to 0$ such that $\nabla f(x_k) \to 0$, and the assertion follows from Theorem 1, part (ii). Consequently, f is both weakly inf-stationary and weakly sup-stationary at zero.
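Proposition 2 gives a convenient way to see the weak inf-stationarity in Example 6. The Python sketch below is our own numerical illustration, not from the source. It takes the points $x_k = 1/(2\pi k)$, where $f'(x_k) = 0$, and checks inequality (9) with ε = 0.01 on a finite sample of points u with $|u - x_k| \le 10^{-3} x_k^2$. The radius is an ad hoc choice: because of the rapidly oscillating term x² sin(1/x), the inequality can only be expected on a very small neighborhood of $x_k$. Since also $|x_k| < \varepsilon$ and $|f(x_k) - f(0)| = x_k < \varepsilon$, each such $x_k$ serves as a witness for the characterization in Proposition 2.

```python
import math

def f(x):
    """Example 6: f(x) = x + x^2 sin(1/x) for x != 0, f(0) = 0."""
    return x + x * x * math.sin(1.0 / x) if x != 0 else 0.0

def is_local_ekeland_point(x, eps, radius, samples=2000):
    """Sampled check of (9): f(u) + eps*|u - x| >= f(x) for 0 < |u - x| <= radius."""
    for i in range(1, samples + 1):
        h = radius * i / samples
        for u in (x - h, x + h):
            if f(u) + eps * abs(u - x) < f(x):
                return False
    return True

eps = 0.01
for k in (100, 1000):
    xk = 1.0 / (2.0 * math.pi * k)       # f'(xk) = 0 and f(xk) = xk
    close = abs(xk) < eps and abs(f(xk) - f(0.0)) < eps
    print(k, close, is_local_ekeland_point(xk, eps, radius=1e-3 * xk * xk))
```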


Corollary 3 If f is Fréchet differentiable at x° with the derivative $\nabla f(x^\circ)$, then the following conditions are equivalent:

(i) f is inf-θ-stationary at x°;

(ii) f is inf-τ-stationary at x°;

(iii) f is θ-stationary at x°;

(iv) f is τ-stationary at x°;

(v) $\nabla f(x^\circ) = 0$.

If, additionally, the derivative is strict, then the above conditions are also equivalent to the following ones:

(vi) f is weakly inf-θ-stationary at x°;

(vii) f is weakly inf-τ-stationary at x°;

(viii) f is weakly θ-stationary at x°;

(ix) f is weakly τ-stationary at x°.


Remark 4 Stationarity and weak stationarity in the above corollary can be replaced with regularity and strong regularity, respectively, if one replaces the equality in (v) with the inequality $\nabla f(x^\circ) \ne 0$.


Convex Functions 


In the convex case, as one might expect, all versions of 
inf-stationarity coincide and appear to be equivalent to 
just (local and global) minimality. 


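Theorem 4 Let f be convex.

(i) If $\theta_\rho[f](x^\circ) < 0$ for some $\rho > 0$, then $\theta_\rho[f](x^\circ) < 0$ for all $\rho > 0$.

(ii) The functions $\rho \mapsto \theta_\rho[f](x^\circ)/\rho$ and $\rho \mapsto \theta^+_\rho[f](x^\circ)/\rho$ are nondecreasing on $\mathbb{R}_+\setminus\{0\}$.

(iii) The following equalities hold true:

$$\theta[f](x^\circ) = \tau[f](x^\circ) = \hat\theta[f](x^\circ) = \hat\tau[f](x^\circ) = \inf_{\rho>0}\frac{\theta_\rho[f](x^\circ)}{\rho} = \inf_{x\ne x^\circ}\frac{f(x)-f(x^\circ)}{\|x-x^\circ\|},$$

$$\theta^+[f](x^\circ) = \tau^+[f](x^\circ) = \inf_{\rho>0}\,\sup_{\|x-x^\circ\|=\rho}\frac{f(x)-f(x^\circ)}{\rho} = \inf_{\rho>0}\frac{\theta^+_\rho[f](x^\circ)}{\rho}.$$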
(iv) $\tau[f](x^\circ) + \tau^+[f](x^\circ) \ge 0$.
(v) $\theta[f](x^\circ) + \theta^+[f](x^\circ) \ge 0$.
(vi) If $\tau[f](x^\circ) + \tau^+[f](x^\circ) = 0$ and $\{x_k\} \subset X$ is a sequence defining $\tau[f](x^\circ)$, that is, $x_k \to 0$ and

$$\tau[f](x^\circ) = \lim_{k\to\infty}\frac{f(x^\circ + x_k) - f(x^\circ)}{\|x_k\|},$$

then $\{-x_k\}$ is a sequence defining $\tau^+[f](x^\circ)$:

$$\tau^+[f](x^\circ) = \lim_{k\to\infty}\frac{f(x^\circ - x_k) - f(x^\circ)}{\|x_k\|}.$$
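The monotonicity in Theorem 4 (ii) can be observed numerically. The sketch below is only an illustration under stated assumptions. It uses the convex function $f(x) = x^2$ at $x^\circ = 1$ (our choice). It approximates $\theta_\rho[f](x^\circ)$ on a grid, assuming the definition $\theta_\rho[f](x^\circ) = \inf_{|x-x^\circ|\le\rho} f(x) - f(x^\circ)$ from earlier in the article. It then prints $\theta_\rho[f](x^\circ)/\rho$ for increasing $\rho$.

```python
import numpy as np

def f(x):
    """Convex test function chosen for illustration."""
    return x ** 2

def theta_rho(func, x0, rho, n=20001):
    """Grid approximation of theta_rho[f](x0) = inf_{|x - x0| <= rho} f(x) - f(x0)."""
    grid = np.linspace(x0 - rho, x0 + rho, n)
    return float(np.min(func(grid)) - func(x0))

x0 = 1.0
rhos = [0.01, 0.1, 0.5, 1.0, 2.0, 4.0]
ratios = np.array([theta_rho(f, x0, r) / r for r in rhos])
# Nondecreasing in rho; as rho -> 0 the ratio tends to -|f'(1)| = -2,
# the value that Theorem 3 gives for theta[f](1).
print(ratios)
```

For $\rho \le 1$ the exact ratio is $\rho - 2$, and for $\rho \ge 1$ it is $-1/\rho$. Both branches are nondecreasing, in agreement with (ii).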


Corollary 4 If f is convex, then the following conditions are equivalent:

(i) f attains a global minimum at x°;
(ii) f attains a local minimum at x°;
(iii) f is inf-θ-stationary at x°;
(iv) f is inf-τ-stationary at x°;
(v) f is θ-stationary at x°;
(vi) f is τ-stationary at x°;
(vii) f is weakly inf-θ-stationary at x°;
(viii) f is weakly inf-τ-stationary at x°;
(ix) f is weakly stationary at x°.


Remark 5 The conditions $\tau[f](x^\circ) = \tau^+[f](x^\circ) = 0$ imply Fréchet differentiability of f at x° (with the derivative equal to zero). The weaker condition $\tau[f](x^\circ) + \tau^+[f](x^\circ) = 0$ in Theorem 4 (vi) implies linearity of the directional derivative of f along the direction of steepest descent (if the latter exists), with the opposite direction automatically being the direction of steepest ascent. This condition is not sufficient for differentiability of f at x° unless X = R. Note also that the direction opposite to the direction of steepest ascent need not be a direction of steepest descent.


Example 7¹ Take the function f(x, y) = max(x, y) on R² and assume that R² is equipped with the max-type norm ||(x, y)|| = max(|x|, |y|). f is obviously not differentiable at 0. At the same time $\tau[f](0) = -1$ and $\tau^+[f](0) = 1$. The vector (−1, −1) defines the (unique) direction of steepest descent. The opposite vector (1, 1) defines a direction of steepest ascent, and f is linear along the line defined by these vectors. Note that the direction of steepest ascent is not unique. For instance, the vector (1, 0) also defines a direction of steepest ascent, while the opposite vector does not define the direction of steepest descent and f is not linear along this line.


¹ The example was suggested by Alexander Rubinov (personal communication).
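Example 7 can be checked by brute force. Since f is positively homogeneous, its slope at 0 along a unit direction d equals f(d). Hence τ[f](0) and τ⁺[f](0) are the minimum and the maximum of f over the unit sphere of the max norm, which is the boundary of the square [−1, 1]². The Python sketch below samples that boundary. It is an illustration only, and the sampling resolution is arbitrary.

```python
import numpy as np

def f(v: np.ndarray) -> float:
    """f(x, y) = max(x, y); positively homogeneous, so f(t*d)/t = f(d) for t > 0."""
    return float(np.max(v))

# Unit sphere of the max norm: the four edges of the square [-1, 1]^2.
s = np.linspace(-1.0, 1.0, 401)
one = np.ones_like(s)
dirs = np.vstack([np.column_stack([s, one]), np.column_stack([s, -one]),
                  np.column_stack([one, s]), np.column_stack([-one, s])])

# Slope of f at 0 along each unit direction d is f(d)/||d|| = f(d).
slopes = np.array([f(d) for d in dirs])
tau, tau_plus = slopes.min(), slopes.max()
descent = np.unique(dirs[slopes == tau], axis=0)    # all steepest-descent directions
print(tau, tau_plus, descent)
print(f(np.array([1.0, 0.0])), f(np.array([-1.0, 0.0])))  # slopes along (1,0), (-1,0)
```

The minimum −1 is attained only at (−1, −1), while the maximum 1 is attained along a whole part of the boundary, including (1, 0). The slope along (−1, 0) is 0, not −1.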


Remark 6 Stationarity and weak stationarity in asser- 
tions (iii)-(ix) of the above corollary can be replaced 
with regularity and strong regularity, respectively, if 
one replaces (i) and (ii) with the opposite assertions: x° 
is not a point of (local or global) minimum of f. 


See also 


> Nonsmooth Analysis: Fréchet Subdifferentials 
Introduction


Clustering is the unsupervised classification of patterns. Cluster analysis deals with the problem of organizing a collection of patterns into clusters based on similarity. It has found many applications, including information retrieval, document extraction, image segmentation, etc.

In cluster analysis we assume that we have been given a set A of a finite number of points of n-dimensional space Rⁿ, that is

$$A = \{a^1, \dots, a^m\}, \quad a^i \in \mathbb{R}^n,\ i = 1, \dots, m.$$

The subject of cluster analysis is the partition of the set A into a given number q of overlapping or disjoint subsets $C_i$, i = 1, …, q, with respect to predefined criteria such that

$$A = \bigcup_{i=1}^{q} C_i.$$

The sets $C_i$, i = 1, …, q, are called clusters. The clustering problem is said to be hard clustering if every data point belongs to one and only one cluster. Unlike hard clustering, in the fuzzy clustering problem the clusters are allowed to overlap and instances have degrees of membership in each cluster. In this paper we exclusively consider the hard unconstrained clustering problem, that is, we additionally assume that

$$C_i \cap C_k = \emptyset \quad \forall\, i, k = 1, \dots, q,\ i \ne k,$$

and no constraints are imposed on the clusters $C_i$, i = 1, …, q. Thus every point a ∈ A is contained in exactly one set $C_i$.

Each cluster $C_i$ can be identified by its center (or centroid). Then the clustering problem can be reduced to the following optimization problem (see [12,26]):

$$\text{minimize } \varphi(\mathcal{C}, x) = \frac{1}{m}\sum_{i=1}^{q}\sum_{a\in C_i}\|x^i - a\|^2 \quad (1)$$

$$\text{subject to } \mathcal{C} \in \bar{\mathcal{C}},\quad x = (x^1, \dots, x^q) \in \mathbb{R}^{n\times q},$$

where ||·|| denotes the Euclidean norm, $\mathcal{C} = \{C_1, \dots, C_q\}$ is a set of clusters, $\bar{\mathcal{C}}$ is the set of all possible q-partitions of the set A, and $x^i$ is the center of the cluster $C_i$, i = 1, …, q:

$$x^i = \frac{1}{|C_i|}\sum_{a\in C_i} a,$$

where $|C_i|$ is the cardinality of the set $C_i$, i = 1, …, q. Problem (1) is also known as minimum sum-of-squares clustering. The combinatorial formulation (1) of minimum sum-of-squares clustering is not suitable for direct application of mathematical programming techniques. Problem (1) can be rewritten as the following mathematical programming problem:


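$$\text{minimize } \psi(x, w) = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{q} w_{ij}\|x^j - a^i\|^2 \quad (2)$$

subject to

$$\sum_{j=1}^{q} w_{ij} = 1, \quad i = 1, \dots, m,$$

and

$$w_{ij} \in \{0, 1\}, \quad i = 1, \dots, m,\ j = 1, \dots, q.$$

Here

$$x^j = \frac{\sum_{i=1}^{m} w_{ij}\, a^i}{\sum_{i=1}^{m} w_{ij}}, \quad j = 1, \dots, q,$$

and $w_{ij}$ is the association weight of pattern $a^i$ with cluster j (to be found), given by

$$w_{ij} = \begin{cases} 1 & \text{if pattern } a^i \text{ is allocated to cluster } j,\\ 0 & \text{otherwise.}\end{cases}$$

w is an m × q matrix.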




Nonsmooth Optimization Approach to Clustering 


There exist different approaches to clustering in- 
cluding agglomerative and divisive hierarchical clus- 
tering algorithms as well as algorithms based on 
mathematical programming techniques. Descriptions 
of many of these algorithms can be found, for example, 
in [15,20,21,26]. 

Problem (2) is a global optimization problem. Therefore, different algorithms of mathematical programming can be applied to solve it. A review of these algorithms can be found in [16]. However, most of these algorithms are applicable only to clustering small data sets.

Different heuristics can be used for solving clustering problems on large data sets, and k-means is one such algorithm. Different versions of this algorithm have been studied by many authors (see [26]). It is a fast algorithm. k-means gives good results when there are few clusters but deteriorates when there are many [16]. This algorithm achieves a local minimum of problem (1) (see [24]); however, results of numerical experiments presented, for example, in [19] show that the best clustering found with k-means may be more than 50% worse than the best known one.

Much better results have been obtained with metaheuristics, such as simulated annealing, tabu search and genetic algorithms [23]. Simulated annealing approaches to clustering have been studied, for example, in [13,25,27]. The application of tabu search methods to the clustering problem is studied in [1]. Genetic algorithms for clustering have been described in [23]. The results of numerical experiments presented in [2] show that even for small clustering problems, with m < 100 entities and q < 5 clusters, these algorithms take 500-700 (sometimes several thousand) times more CPU time than the k-means algorithm. For relatively large data sets one can expect this difference to increase. This makes metaheuristic global optimization algorithms ineffective for solving many clustering problems.

The paper [18] develops a variable neighborhood search algorithm, and the paper [17] presents the j-means algorithm, which extends k-means by adding a jump move. The global k-means heuristic, an incremental approach to the minimum sum-of-squares clustering problem, is developed in [22]. The incremental approach is also studied in [19]. The numerical experiments presented in these papers show the high effectiveness of these algorithms for many clustering problems.

As mentioned above, problem (2) is a global optimization problem and its objective function is multimodal. However, global optimization techniques are highly time-consuming for many clustering problems. It is therefore very important to develop clustering algorithms that compute near-global minimizers of the objective function. We propose clustering algorithms based on the nonsmooth optimization approach. These algorithms calculate clusters step by step, gradually increasing the number of clusters until termination conditions are met; that is, they allow one to calculate as many clusters as a data set contains with respect to some tolerance.


Formulation 


The problems (1) and (2) can be reformulated as the following mathematical programming problem [7,8,12]:

$$\text{minimize } f(x^1, \dots, x^q) \quad \text{subject to } x = (x^1, \dots, x^q) \in \mathbb{R}^{n\times q}, \quad (3)$$

where

$$f(x^1, \dots, x^q) = \frac{1}{m}\sum_{i=1}^{m}\min_{j=1,\dots,q}\|x^j - a^i\|^2. \quad (4)$$

It is shown in [12] that problems (2) and (3) are equivalent. However, there are some differences between these two formulations:

• The number of variables in problem (2) is (m + n) × q, whereas in problem (3) this number is only n × q and does not depend on the number of instances. It should be noted that in many real-world databases the number of instances m is substantially greater than the number of attributes n.

• In the hard clustering problem (2) the coefficients $w_{ij}$ are integers, that is, problem (2) contains both integer and continuous variables. In the nonsmooth optimization formulation the variables are only continuous.

• The nonsmooth optimization formulation of the clustering problem allows one to easily consider different similarity measures.
All these circumstances can be considered advantages of the nonsmooth optimization formulation (3).

If q > 1, the objective function (4) in problem (3) is nonconvex and nonsmooth. If the number q of clusters and the number n of attributes are large, we have a large-scale global optimization problem. Moreover, the form of the objective function in this problem is complex enough not to be amenable to the direct application of general-purpose global optimization methods. Therefore, to ensure the practicality of the nonsmooth optimization approach to clustering, proper identification and use of local optimization methods is very important. Clearly, such an approach does not guarantee a globally optimal solution to problem (3). On the other hand, it provides a "near" global minimum of the objective function that, in turn, provides a good enough clustering description of the data set under consideration.

Note also that a meaningful choice of the number of clusters is very important for cluster analysis. It is difficult to define a priori how many clusters represent the set A under consideration. To avoid this difficulty, a step-by-step calculation of clusters is implemented in the algorithms discussed in the next section.


Methods 


In this section we describe two incremental clustering algorithms. Both algorithms are based on the nonsmooth optimization approach to clustering. In the first algorithm, the nonsmooth optimization approach is used to find starting points for the k-means algorithm; this algorithm is a modification of the global k-means algorithm proposed in [22]. The second algorithm is an optimization-based clustering algorithm.


Modified Global k-Means Algorithm 


The k-means algorithm and its variations are known to be fast clustering algorithms, and they are applicable to large data sets. In this subsection we propose a new version of the k-means algorithm, the modified global k-means algorithm, which in turn is a modification of the global k-means algorithm. First we briefly describe the k-means and global k-means algorithms. The k-means algorithm proceeds as follows.


Step 1. Choose a seed solution consisting of k centroids (not necessarily belonging to A).

Step 2. Allocate each data point $a^i \in A$ to its closest centroid and obtain a k-partition of A.

Step 3. Recompute centroids for this new partition and go to Step 2, until no more data points change their clusters.


Nonsmooth Optimization Approach to Clustering, Algorithm 1
k-means algorithm 
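A minimal NumPy sketch of Algorithm 1 (Lloyd-type iterations). It makes three assumptions that the text does not state. Distances are squared Euclidean, as in (4). Ties in the nearest-centroid rule go to the lowest index. An empty cluster simply keeps its previous center.

```python
import numpy as np

def kmeans(A: np.ndarray, centers: np.ndarray, max_iter: int = 100):
    """Algorithm 1 sketch. A: m x n data, centers: k x n seed (Step 1).
    Returns the final centers, the labels and the value of objective (4)."""
    centers = centers.astype(float).copy()
    labels = None
    for _ in range(max_iter):
        # Step 2: allocate each point to its closest centroid.
        sq = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # no point changed its cluster
        labels = new_labels
        # Step 3: recompute the centroids of the new partition.
        for j in range(centers.shape[0]):
            members = A[labels == j]
            if len(members) > 0:                   # empty cluster: keep old center
                centers[j] = members.mean(axis=0)
    sq = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, sq.argmin(axis=1), float(sq.min(axis=1).mean())

A = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
C, lab, val = kmeans(A, A[[0, 3]])   # seed with two data points
print(C, lab, val)
```

With the two well-separated groups above, one reassignment step already gives the stable partition. The result depends entirely on the seed, which is the weakness discussed next.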


Step 1. Compute the centroid $x^1$ of the set A:

$$x^1 = \frac{1}{m}\sum_{i=1}^{m} a^i,$$

and set k = 1.

Step 2. Set k = k + 1 and consider the centers $x^1, x^2, \dots, x^{k-1}$ from the previous iteration.

Step 3. Consider each point a of A as a starting point for the k-th cluster center, thus obtaining m initial solutions with k points $(x^1, x^2, \dots, x^{k-1}, a)$; apply the k-means algorithm starting from each of them; keep the best k-partition obtained and its centers $(x^1, x^2, \dots, x^k)$.

Step 4. If k = q, stop; otherwise go to Step 2.


Nonsmooth Optimization Approach to Clustering, Algorithm 2
The global k-means algorithm 
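Below is a brute-force sketch of Algorithm 2 in NumPy. A compact k-means helper is repeated so that the block is self-contained. The sketch runs k-means m times for every added center, which illustrates why this version is impractical for medium-sized and large data sets. The three-group data set is illustrative.

```python
import numpy as np

def _sqdist(A, C):
    """m x k matrix of squared distances from data points to centers."""
    return ((A[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def _kmeans(A, C, max_iter=100):
    """Plain Lloyd iterations (Algorithm 1); returns centers and objective (4)."""
    C = C.astype(float).copy()
    for _ in range(max_iter):
        labels = _sqdist(A, C).argmin(axis=1)
        new_C = np.array([A[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, float(_sqdist(A, C).min(axis=1).mean())

def global_kmeans(A, q):
    """Algorithm 2 sketch: add one center at a time, trying every data point."""
    C = A.mean(axis=0, keepdims=True)              # Step 1: centroid of A, k = 1
    for _k in range(2, q + 1):                     # Steps 2 and 4
        best = None
        for a in A:                                # Step 3: m starting solutions
            cand, val = _kmeans(A, np.vstack([C, a]))
            if best is None or val < best[1]:
                best = (cand, val)
        C = best[0]
    return C, float(_sqdist(A, C).min(axis=1).mean())

A = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)
C, val = global_kmeans(A, 3)
print(C, val)   # one center per group
```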


The effectiveness of this algorithm depends strongly on the starting point. It converges only to a local solution, which can differ significantly from the global one on large data sets.

The global k-means algorithm proposed in [22] is a modification of the k-means algorithm that computes clusters successively: to compute the k-th cluster centroid, it uses the centroids of the k − 1 clusters from the previous iteration. To compute q < m clusters, this algorithm proceeds as follows.

This version of the algorithm is not applicable to clustering medium-sized and large data sets. Two procedures were introduced to reduce its complexity (see [22]). We mention here only one of them. Let

$$d^i_{k-1} = \min\{\|x^1 - a^i\|^2, \dots, \|x^{k-1} - a^i\|^2\}. \quad (5)$$

For each $a^l \in A$ we compute

$$r_l = \sum_{i=1}^{m}\min\{0,\ \|a^l - a^i\|^2 - d^i_{k-1}\}$$

and we take the data point $a^{l^*} \in A$ for which

$$l^* = \arg\min_{l=1,\dots,m} r_l.$$

Then the k-means algorithm is applied starting from the point $(x^1, x^2, \dots, x^{k-1}, a^{l^*})$ to find k cluster centers.
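The fast selection rule above can be vectorized as follows. This is a sketch, and the function name is ours. A one-line calculation explains the rule. Since $\min\{d^i, s\} = d^i + \min\{0, s - d^i\}$, adding $a^l$ as a k-th center (other centers fixed) gives objective (4) the value $\frac{1}{m}\sum_i d^i_{k-1} + r_l/m$. So $l^*$ is the data point with the largest immediate decrease.

```python
import numpy as np

def best_start_by_bound(A, centers):
    """Return (l*, r) with r_l = sum_i min{0, ||a^l - a^i||^2 - d^i_{k-1}}, l* = argmin r_l."""
    d = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)  # (5)
    pair = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)  # ||a^l - a^i||^2
    r = np.minimum(0.0, pair - d[None, :]).sum(axis=1)          # row l gives r_l
    return int(r.argmin()), r

A = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)
centers = np.array([A[:6].mean(axis=0), A[6:].mean(axis=0)])  # k - 1 = 2 centers
l_star, r = best_start_by_bound(A, centers)
print(l_star, A[l_star])   # a point of one of the two groups sharing a center
```

Computing all pairwise distances costs O(m²n) once per k, instead of m full runs of k-means.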

Now we describe a new version of the global k-means algorithm, in which the starting point for the k-th cluster center is computed using the nonsmooth optimization approach. Consider the problem of finding the k-th cluster center, assuming that the centers $x^1, \dots, x^{k-1}$ for the (k − 1)-clustering problem are known. We introduce the following function:

$$\bar f^k(y) = \frac{1}{m}\sum_{i=1}^{m}\min\{d^i_{k-1},\ \|y - a^i\|^2\}, \quad (6)$$

where $y \in \mathbb{R}^n$ stands for the k-th cluster center and $d^i_{k-1}$ is defined as in (5). Consider the set

$$D = \{y \in \mathbb{R}^n : \|y - a^i\|^2 \ge d^i_{k-1},\ i = 1, \dots, m\}.$$

This is the set of points y whose distance to any data point is no less than the distance between this data point and its cluster center. We also consider the set

$$D_0 = \mathbb{R}^n \setminus D \equiv \{y \in \mathbb{R}^n : \exists I \subset \{1, \dots, m\},\ I \ne \emptyset :\ \|y - a^i\|^2 < d^i_{k-1}\ \forall i \in I\}.$$

The function $\bar f^k$ is constant on the set D and its value over this set is

$$\bar f^k(y) = d_0 \equiv \frac{1}{m}\sum_{i=1}^{m} d^i_{k-1} \quad \forall y \in D.$$

It is clear that $x^j \in D$ for all j = 1, …, k − 1 and $a^i \in D_0$ for all $a^i \in A$ with $a^i \ne x^j$, j = 1, …, k − 1. It is also clear that $\bar f^k(y) < d_0$ for all $y \in D_0$.
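Here is a sketch of the auxiliary function (6) and of the membership test for D (NumPy; the function names are hypothetical). The usage lines check two facts stated above. An existing center lies in D, where $\bar f^k = d_0$. A data point that is not a center lies in $D_0$, where $\bar f^k < d_0$.

```python
import numpy as np

def sq_dist_to_centers(A, centers):
    """d^i_{k-1} from (5): squared distance from a^i to its nearest existing center."""
    return ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)

def aux_function(y, A, centers):
    """Auxiliary function (6): (1/m) * sum_i min{d^i_{k-1}, ||y - a^i||^2}."""
    d = sq_dist_to_centers(A, centers)
    return float(np.minimum(d, ((A - y) ** 2).sum(axis=1)).mean())

def in_D(y, A, centers):
    """True if ||y - a^i||^2 >= d^i_{k-1} for every data point, i.e. y lies in D."""
    d = sq_dist_to_centers(A, centers)
    return bool(np.all(((A - y) ** 2).sum(axis=1) >= d))

A = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)
centers = np.array([A[:6].mean(axis=0), A[6:].mean(axis=0)])  # k - 1 = 2 centers
d0 = sq_dist_to_centers(A, centers).mean()
print(in_D(centers[0], A, centers), aux_function(centers[0], A, centers), d0)
print(in_D(A[0], A, centers), aux_function(A[0], A, centers) < d0)
```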


Step 1. For each $a^i \in D_0 \cap A$ calculate the set $S_2(a^i)$, the centroid $c^i$ of this set, and the value $\bar f^k(c^i)$ of the function $\bar f^k$ at this point.

Step 2. Compute

$$a^j = \arg\min_{a^i \in D_0\cap A} \bar f^k(c^i)$$

and the corresponding center $c^j$.

Step 3. Compute the set $S_2(c^j)$ and its centroid.

Step 4. Recompute the set $S_2(c^j)$ and its centroid until no more data points escape this set or return to this set.


Nonsmooth Optimization Approach to Clustering, Algorithm 3
An algorithm for finding the initial point 
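A compact sketch of Algorithm 3 (NumPy; names are ours). It assumes that at least one data point is not a current center, so that $D_0 \cap A$ is nonempty. The sets $S_2$ stay nonempty along the way. Replacing y by the centroid of $S_2(y)$ can only decrease $\sum_{a\in S_2(y)}\|y-a\|^2$, so $\bar f^k$ at the new point stays below $d_0$.

```python
import numpy as np

def initial_point(A, centers, max_iter=100):
    """Sketch of Algorithm 3: a starting point for the k-th cluster center."""
    d = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)  # (5)

    def fbar(y):                 # auxiliary function (6)
        return np.minimum(d, ((A - y) ** 2).sum(axis=1)).mean()

    def S2(y):                   # boolean mask of S_2(y) = {a^i : ||y - a^i||^2 < d^i}
        return ((A - y) ** 2).sum(axis=1) < d

    # Step 1: a^i lies in D_0 exactly when d^i > 0 (a^i is not a current center);
    # for each such a^i take the centroid c^i of S_2(a^i).
    candidates = [A[S2(a)].mean(axis=0) for a in A[d > 0]]
    # Step 2: keep the centroid with the smallest value of (6).
    c = min(candidates, key=fbar)
    # Steps 3-4: recompute S_2(c) and its centroid until the set no longer changes.
    members = S2(c)
    for _ in range(max_iter):
        c = A[members].mean(axis=0)
        if np.array_equal(S2(c), members):
            break
        members = S2(c)
    return c

A = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)
centers = np.array([A[:6].mean(axis=0), A[6:].mean(axis=0)])  # two groups merged
y0 = initial_point(A, centers)
print(y0)   # centroid of one of the two merged groups
```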


Any point $y \in D_0$ can be taken as a starting point for the k-th cluster center. A global minimizer of the function $\bar f^k$ is probably the most preferable among them. This function is nonconvex and nonsmooth, and its minimization is a difficult task. We consider a scheme for finding its local minimizer.

For any $y \in D_0$ we consider the following sets:

$$S_1(y) = \{a^i \in A : \|y - a^i\|^2 = d^i_{k-1}\},$$
$$S_2(y) = \{a^i \in A : \|y - a^i\|^2 < d^i_{k-1}\},$$
$$S_3(y) = \{a^i \in A : \|y - a^i\|^2 > d^i_{k-1}\}.$$

Since $y \in D_0$, the set $S_2(y) \ne \emptyset$. We suggest the following algorithm to find a starting point for the k-th cluster center.

Now we can describe the modified global k-means algorithm.

It is clear that $f^{k,*} \ge 0$ for all $k \ge 1$ and the sequence $\{f^{k,*}\}$ is decreasing, that is,

$$f^{k+1,*} \le f^{k,*} \quad \text{for all } k \ge 1.$$

The latter implies that after $\bar k > 0$ iterations the stopping criterion in Step 4 will be satisfied.
Step 1. (Initialization). Select a tolerance ε > 0. Calculate the centroid $x^{1,*} \in \mathbb{R}^n$ of the set A. Let $f^{1,*}$ be the corresponding value of the objective function (4). Set k = 1.

Step 2. (Computation of the next cluster center). Let $x^{1,*}, \dots, x^{k,*}$ be the cluster centers for the k-clustering problem. Apply Algorithm 3 to find an initial point $y^{k+1,0} \in \mathbb{R}^n$ for the (k + 1)-th cluster center.

Step 3. (Refinement of all cluster centers). Take $x^{k+1,0} = (x^{1,*}, \dots, x^{k,*}, y^{k+1,0})$ as a new starting point and apply the k-means algorithm to solve the clustering problem for q = k + 1. Let $x^{1,*}, \dots, x^{k+1,*}$ be a solution to this problem and $f^{k+1,*}$ the corresponding value of the objective function (4).

Step 4. (Stopping criterion). If

$$\frac{f^{k,*} - f^{k+1,*}}{f^{1,*}} < \varepsilon,$$

then stop; otherwise set k = k + 1 and go to Step 2.
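The steps above fit together in a self-contained sketch of Algorithm 4 (NumPy; helper names are ours). Two details are our own assumptions. First, the function returns the last accepted set of centers when the stopping test fires. Second, the loop also stops if every data point already coincides with a center, because then $D_0 \cap A$ is empty.

```python
import numpy as np

def _sq(A, C):
    """m x k matrix of squared distances ||a^i - x^j||^2."""
    return ((A[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def objective(A, C):
    """Objective (4)."""
    return float(_sq(A, C).min(axis=1).mean())

def _kmeans(A, C, max_iter=100):
    """Lloyd iterations (Algorithm 1); an empty cluster keeps its old center."""
    C = C.astype(float).copy()
    for _ in range(max_iter):
        lab = _sq(A, C).argmin(axis=1)
        new_C = np.array([A[lab == j].mean(axis=0) if np.any(lab == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C

def _initial_point(A, C, max_iter=100):
    """Algorithm 3: best centroid of S_2(a^i) with respect to (6), then refine S_2."""
    d = _sq(A, C).min(axis=1)                        # d^i from (5)

    def fbar(y):                                     # function (6)
        return np.minimum(d, ((A - y) ** 2).sum(axis=1)).mean()

    def S2(y):
        return ((A - y) ** 2).sum(axis=1) < d

    c = min((A[S2(a)].mean(axis=0) for a in A[d > 0]), key=fbar)
    members = S2(c)
    for _ in range(max_iter):
        c = A[members].mean(axis=0)
        if np.array_equal(S2(c), members):
            break
        members = S2(c)
    return c

def modified_global_kmeans(A, eps=1e-2):
    """Algorithm 4 sketch with the relative-decrease stopping test of Step 4."""
    C = A.mean(axis=0, keepdims=True)                # Step 1
    f1 = fk = objective(A, C)
    while len(C) < len(A) and np.any(_sq(A, C).min(axis=1) > 0):
        y = _initial_point(A, C)                     # Step 2
        C_new = _kmeans(A, np.vstack([C, y]))        # Step 3
        f_new = objective(A, C_new)
        if (fk - f_new) / f1 < eps:                  # Step 4
            break
        C, fk = C_new, f_new
    return C

A = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)
C = modified_global_kmeans(A, eps=0.01)
print(len(C), objective(A, C))   # three clusters are kept for this tolerance
```

With ε = 0.01 a fourth center would lower (4) by only about 0.1% of $f^{1,*}$, so the test stops at three clusters. This matches Remark 1: the tolerance controls how fine the partition becomes.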
Nonsmooth Optimization Clustering Algorithm


In this subsection we propose a clustering algorithm in which nonsmooth optimization techniques are used both to find a starting point for the k-th cluster center and to solve the k-clustering problems.

It is clear that $f^{k,*} \ge 0$ for all $k \ge 1$ and the sequence $\{f^{k,*}\}$ is decreasing, that is,

$$f^{k+1,*} \le f^{k,*} \quad \text{for all } k \ge 1.$$

The latter implies that after $\bar k > 0$ iterations the stopping criterion in Step 4 will be satisfied.


Remark 1 One of the important questions when applying Algorithms 4 and 5 is the choice of the tolerance ε > 0. Large values of ε can result in the appearance of large clusters, whereas small values can produce small and artificial clusters.


Remark 2 Algorithms 2, 4 and 5 are incremental clustering algorithms. The main difference between Algorithms 2 and 4 is in the way they compute starting

Step 1. (Initialization). Select a tolerance ε > 0. Calculate the centroid $x^{1,*} \in \mathbb{R}^n$ of the set A. Let $f^{1,*}$ be the corresponding value of the objective function (4). Set k = 1.

Step 2. (Computation of the next cluster center). Select a point $y^0 \in \mathbb{R}^n$ and solve the following minimization problem:

$$\text{minimize } \bar f^{k+1}(y) \quad \text{subject to } y \in \mathbb{R}^n, \quad (7)$$

where $\bar f^{k+1}$ is defined by (6).

Step 3. (Refinement of all cluster centers). Let $y^{k+1,*}$ be a solution to problem (7). Take $x^{k+1,0} = (x^{1,*}, \dots, x^{k,*}, y^{k+1,*})$ as a new starting point and solve the following minimization problem:

$$\text{minimize } f^{k+1}(x) \quad \text{subject to } x = (x^1, \dots, x^{k+1}) \in \mathbb{R}^{n\times(k+1)}, \quad (8)$$

where

$$f^{k+1}(x^1, \dots, x^{k+1}) = \frac{1}{m}\sum_{i=1}^{m}\min_{j=1,\dots,k+1}\|x^j - a^i\|^2.$$

Step 4. (Stopping criterion). Let $x^{k+1,*}$ be a solution to problem (8) and $f^{k+1,*}$ the corresponding value of the objective function. If

$$\frac{f^{k,*} - f^{k+1,*}}{f^{1,*}} < \varepsilon,$$

then stop; otherwise set k = k + 1 and go to Step 2.
points for the next cluster center. Algorithm 2 uses data points, whereas Algorithm 4 uses local minimizers of the function $\bar f^k$. Algorithm 5 uses nonsmooth optimization techniques for finding both the starting points and the k-partition of a data set.


Remark 3 Computational results on gene expression 
data sets presented in [6] demonstrate that Algorithm 4 
is more efficient than Algorithm 2. However, the former 
requires more computational time. 




Remark 4 Results of numerical experiments presented in [11] demonstrate that Algorithm 5 is efficient for solving large-scale clustering problems in reasonable CPU time. Moreover, its success rate in locating global solutions is higher than that of Algorithms 2 and 4. However, this algorithm requires significantly more CPU time than the other algorithms.


Solving Optimization Problems 


The objective functions in problems (7) and (8) are nonsmooth and nonconvex. If the numbers of attributes and clusters are large, then problem (8) is a large-scale problem. Both objective functions are non-regular, and computing even one of their subgradients may be a very difficult problem (for the definition of a non-regular function, see [14]). Therefore, subgradient-based methods are not always efficient for solving problems (7) and (8). We use the discrete gradient method to solve these problems [3,4,5]. This is a derivative-free method.

The objective functions in problems (7) and (8) are 
piecewise partially separable (for the definition of piece- 
wise partially separable functions, see [9]). The discrete 
gradient method was modified taking into account this 
special structure of the objective functions. This modi- 
fied discrete gradient method is described in [10]. 


Conclusions 


In this paper we discussed a nonsmooth optimization approach to clustering problems. Many clustering problems are large-scale global optimization problems. The nonsmooth optimization approach allows one to significantly reduce the number of variables in these problems. It can also easily handle different similarity measures.

We introduced two algorithms based on the nonsmooth optimization approach. Both algorithms are incremental clustering algorithms. As these algorithms compute clusters step by step, they allow the decision maker to easily vary the number of clusters according to the criteria suggested by the nature of the decision-making situation, without incurring the obvious costs of increased complexity of the solution procedure. The suggested approach uses a stopping criterion that prevents the appearance of small and artificial clusters. Nonsmooth optimization problems from cluster analysis have a special structure which allows one to design efficient algorithms for their solution.
Given a nonempty closed set X ⊆ Rⁿ and a mapping F: Rⁿ → Rⁿ, assumed to be continuously differentiable, the variational inequality problem (abbreviated: VI) is to find an x* ∈ X such that

$$(x - x^*)^T F(x^*) \ge 0 \quad \forall x \in X. \quad (1)$$


When $X = \mathbb{R}^n_+$ (the nonnegative orthant), (1) is equivalent to the nonlinear complementarity problem (abbreviated: NCP): find x* ∈ Rⁿ such that

$$x^* \ge 0, \quad F(x^*) \ge 0, \quad (x^*)^T F(x^*) = 0. \quad (2)$$


Usually, X is represented by several inequalities and equalities. By considering the Karush-Kuhn-Tucker (KKT) conditions of (1) if necessary, X is assumed here to be a closed convex subset of Rⁿ. It was proved by B.C. Eaves in [9] that solving (1) is equivalent to finding a solution of the equation

$$H(x) := x - \Pi_X[x - F(x)] = 0, \quad (3)$$


where $\Pi_X$ is the orthogonal projection onto X. When $X = \mathbb{R}^n_+$, (3) becomes

$$H(x) = \min(x, F(x)) = 0, \quad (4)$$

where the min operation is taken componentwise.
This means that finding a solution of NCP is equivalent to finding a root of (4). It has also been shown by A. Fischer in [10] that finding a solution of NCP is equivalent to finding a root of another equation, namely

$$H_i(x) := \phi(x_i, F_i(x)) = 0, \quad i = 1, \dots, n, \quad (5)$$

where φ is called the Fischer-Burmeister function (FB function), defined in [10] as

$$\phi(a, b) = \sqrt{a^2 + b^2} - (a + b), \quad a, b \in \mathbb{R}.$$

Due to the nonsmoothness of the orthogonal projection operator $\Pi_X$, the min function and the FB function, the function H defined in (3), (4) or (5) is, in general, not smooth (i.e., not continuously differentiable), no matter how smooth F is. This prevents one from using classical Newton methods to find solutions of these (nonsmooth) equations.
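A quick numerical check of the two reformulations follows. We take an affine map F(x) = Mx + q. The data M, q and the solution x* = (0.5, 0) are our own illustrative choices, verified by hand: x* ≥ 0, F(x*) = (0, 1.5) ≥ 0 and x*ᵀF(x*) = 0. Both the min residual (4) and the FB residual (5) vanish at x*. The last line uses the standard property that φ(a, b) = 0 exactly when a ≥ 0, b ≥ 0 and ab = 0.

```python
import numpy as np

def fb(a, b):
    """Fischer-Burmeister function phi(a, b) = sqrt(a^2 + b^2) - (a + b), componentwise."""
    return np.sqrt(a * a + b * b) - (a + b)

def ncp_residuals(x, F):
    """Return H from (4) (componentwise min) and H from (5) (FB function) at x."""
    Fx = F(x)
    return np.minimum(x, Fx), fb(x, Fx)

M = np.array([[2.0, 1.0], [1.0, 2.0]])   # illustrative affine map F(x) = Mx + q
q = np.array([-1.0, 1.0])

def F(x):
    return M @ x + q

h_min, h_fb = ncp_residuals(np.array([0.5, 0.0]), F)
print(h_min, h_fb)                        # both residuals vanish at the solution
print(fb(np.array([0.0, 2.0, -1.0]), np.array([3.0, 0.0, 1.0])))  # 0, 0, sqrt(2)
```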

Suppose that H: Rⁿ → Rⁿ is a locally Lipschitz function (the function H defined in (3), (4) or (5) is locally Lipschitz) but is not necessarily smooth. By Rademacher's theorem, H is differentiable almost everywhere. Let

$$D_H = \{x : H \text{ is differentiable at } x\}.$$
Then the generalized Jacobian of H at x in the sense of F.H. Clarke [6] can be defined by

$$\partial H(x) = \operatorname{conv}\,\partial_B H(x),$$

where $\partial_B H(x)$ [20] is defined by

$$\partial_B H(x) = \Big\{\lim_{x^j \to x,\ x^j \in D_H} H'(x^j)\Big\}.$$


The nonsmooth Newton method for solving

$$H(x) = 0, \quad x \in \mathbb{R}^n, \quad (6)$$

can be defined as follows: having the vector $x^k \in \mathbb{R}^n$, compute $x^{k+1}$ by

$$x^{k+1} = x^k - V_k^{-1} H(x^k), \quad (7)$$

where $V_k \in \partial H(x^k)$. The nonsmooth Newton method (7) reduces to the classical Newton method for a system of equations if H is continuously differentiable. The classical Newton method has the favorable feature that the sequence $\{x^k\}$ generated by (7) converges locally superlinearly (quadratically) to a solution x* of H(x) = 0 if H'(x*) is nonsingular (and H' is Lipschitz continuous) [8,18]. However, in general the iterative method (7) is not convergent for nonsmooth equations (6); see [16] for a counterexample.
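The sketch below applies the Newton iteration to H(x) = min(x, F(x)) with affine F(x) = Mx + q, that is, to a linear complementarity problem. It is an illustration only. The matrix $V_k$ is assembled row by row: row i is $e_i$ when $x_i$ is the active (smaller) argument of the min, and row i of M otherwise. Away from ties this is the Jacobian $H'(x^k)$. At ties it picks one active row, so strictly speaking it is an element of $\partial_B H(x^k)$, as in the variant (8) below. No globalization is included, so only local convergence can be expected. The test data are our own.

```python
import numpy as np

def semismooth_newton_lcp(M, q, x0, tol=1e-12, max_iter=50):
    """Newton iteration x^{k+1} = x^k - V_k^{-1} H(x^k) for H(x) = min(x, Mx + q)."""
    n = len(q)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = M @ x + q
        H = np.minimum(x, Fx)
        if np.linalg.norm(H) <= tol:
            break
        # Row i of V_k: e_i if x_i is the active argument (ties go to e_i), else M[i].
        V = np.where((x <= Fx)[:, None], np.eye(n), M)
        x = x - np.linalg.solve(V, H)
    return x

M = np.array([[2.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 1.0])
x = semismooth_newton_lcp(M, q, [1.0, 1.0])
print(x)   # solution (0.5, 0): x >= 0, Mx + q = (0, 1.5) >= 0, complementary
```

On this example the iterates are (1, 1), (0, 0), (0.5, 0). The method terminates once the correct active set is identified, a behavior typical of Newton methods on piecewise affine equations.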

In order to establish superlinear convergence results for the nonsmooth Newton method (7), we use the concept of semismoothness. Let H be directionally differentiable at x. H is said to be semismooth at x if

$$Vd - H'(x; d) = o(\|d\|), \quad d \to 0,$$

and H is called strongly semismooth at x if

$$Vd - H'(x; d) = O(\|d\|^2), \quad d \to 0,$$

where $V \in \partial H(x + d)$. Semismoothness was originally introduced by R. Mifflin [17] for functionals. L. Qi and J. Sun [24] extended the concept of semismoothness to vector-valued functions. See [19] for several forms of semismooth equations. Using semismoothness, Qi and Sun [24] presented the following convergence theorem for the generalized Newton method (7):


Theorem 1 Suppose that H(x*) = 0 and that all V ∈ ∂H(x*) are nonsingular. Then the generalized Newton method (7) is Q-superlinearly convergent in a neighborhood of x* if H is semismooth at x*, and quadratically convergent if H is strongly semismooth at x*.


Note that the nonsingularity of ∂H(x*) in the above theorem is somewhat restrictive in some cases. Qi [20] presented a modified version of (7), which may be stated as follows:

$$x^{k+1} = x^k - V_k^{-1} H(x^k), \quad (8)$$

where $V_k \in \partial_B H(x^k)$. The difference between this version and (7) is that $V_k$ is chosen from $\partial_B H(x^k)$ rather than from its convex hull $\partial H(x^k)$. Analogously to the above theorem, Qi [20] established the following result:


Theorem 2 Suppose that H(x*) = 0 and that all $V \in \partial_B H(x^*)$ are nonsingular. Then the generalized Newton method (8) is Q-superlinearly convergent in a neighborhood of x* if H is semismooth at x*, and quadratically convergent at x* if H is strongly semismooth at x*.


In general, neither (7) nor (8) can be globalized, because θ is not necessarily continuously differentiable, where for any x ∈ Rⁿ, θ(x) = ||H(x)||²/2. However, if θ is continuously differentiable (e.g., when H is defined via (5); see [14]), the nonsmooth Newton direction is a descent direction of θ and thus globalized methods can be designed. See [7] for a line search model and [13] for a trust region model.

The idea of smoothing methods is to construct a smoothing approximation function $G: \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R}^n$ of H such that for any ε > 0 and x ∈ Rⁿ, G(ε, ·) is continuously differentiable on Rⁿ and satisfies

$$\|H(x) - G(\varepsilon, x)\| \to 0 \quad \text{as } \varepsilon \downarrow 0, \quad (9)$$

and then to find a solution of H(x) = 0 by (inexactly) solving the following problems for a given positive sequence $\{\varepsilon^k\}$ with $\varepsilon^k \to 0$ as $k \to \infty$:

$$G(\varepsilon^k, x) = 0. \quad (10)$$
It was suggested in [21] to use convolution to construct smooth approximations of the nonsmooth function H. A function $\Phi: \mathbb{R}^n \to \mathbb{R}_+$ is called a kernel function if

$$\int_{\mathbb{R}^n} \Phi(x)\,dx = 1.$$
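Suppose Φ is a smooth kernel function. Define $\Theta: \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R}_+$ by

$$\Theta(\varepsilon, x) = \varepsilon^{-n}\,\Phi(\varepsilon^{-1}x),$$

where $(\varepsilon, x) \in \mathbb{R}_{++} \times \mathbb{R}^n$. Then a smooth approximation of the projection operator $\Pi_X$ can be described by

$$P(\varepsilon, x) = \int_{\mathbb{R}^n} \Pi_X(x - y)\,\Theta(\varepsilon, y)\,dy, \quad (11)$$

where $(\varepsilon, x) \in \mathbb{R}_{++} \times \mathbb{R}^n$. Suppose that

$$\kappa := \int_{\mathbb{R}^n} \|y\|\,\Phi(y)\,dy < +\infty.$$

Then for any x ∈ Rⁿ and ε > 0 one has

$$\|P(\varepsilon, x) - \Pi_X(x)\| \le \kappa\varepsilon.$$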

In general, P(ε, x) is intractable because a multidimensional integration is involved. However, it can be written explicitly if X has a special structure (for example, X is a rectangle) and Φ is chosen appropriately (see [3,12]). In fact, already in 1986 S. Smale [26] gave the smooth function $(w + \sqrt{w^2 + 4\varepsilon^2})/2$ to approximate max(0, w), w ∈ R, and used it to study linear complementarity problems; see also [2,15]. The paper [1] has stimulated much recent study of smoothing methods for solving NCP and VI. For convenience of discussion, for any ε < 0 define P(ε, x) = P(−ε, x), and let P(0, x) = $\Pi_X(x)$, x ∈ Rⁿ. Then the smooth approximation $G: \mathbb{R}^{n+1} \to \mathbb{R}^n$ of H defined in (3) can be described by

$$G(\varepsilon, x) = x - P(\varepsilon, x - F(x)), \quad (12)$$

where (ε, x) ∈ R × Rⁿ. Thus, for any (ε, x) ∈ R × Rⁿ,

$$\|G(\varepsilon, x) - H(x)\| \le \kappa|\varepsilon|.$$

See [21] for a general case if H is not of the form defined in (3).
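For $X = \mathbb{R}_+$ (one dimension) the projection is $\Pi_X(w) = \max(0, w)$. A widely used closed-form smoothing of it is $p(\varepsilon, w) = (w + \sqrt{w^2 + 4\varepsilon^2})/2$, often called the Chen-Harker-Kanzow-Smale function. We use it here as an illustrative choice. The sketch checks numerically that $0 < p(\varepsilon, w) - \max(0, w) \le \varepsilon$, an O(ε) error bound of the same type as the estimate above. The worst case occurs at w = 0.

```python
import numpy as np

def smooth_plus(eps, w):
    """Smoothing of max(0, w): p(eps, w) = (w + sqrt(w^2 + 4 eps^2)) / 2.
    It depends on eps only through eps^2, so p(-eps, w) = p(eps, w)."""
    return 0.5 * (w + np.sqrt(w * w + 4.0 * eps * eps))

w = np.linspace(-3.0, 3.0, 601)
for eps in (1.0, 0.1, 0.01):
    gap = smooth_plus(eps, w) - np.maximum(0.0, w)
    # The gap is positive and at most eps; it equals eps at w = 0.
    print(eps, gap.min() > 0, gap.max())
```

The symmetry in ε mirrors the convention P(ε, x) = P(−ε, x) used above to extend the smoothing to all ε ∈ R.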

Note that, for variational inequalities, if the function F is only defined on X and not well defined outside X, then the function H defined in (3) is not well defined on Rⁿ. In this case, one can use the normal map introduced by S.M. Robinson [25] to overcome this difficulty. Note also that solving (1) is equivalent to finding a solution of the equation

$$\hat H(x) := x - \Pi_X[x - F(\Pi_X(x))] = 0, \quad (13)$$

where x ∈ Rⁿ. The map $\hat H$ only requires F to be defined on X instead of on Rⁿ, as required by the function defined in (3). Unlike Robinson's normal map, this map does not need to work on a transformed space; it works on the original space directly. Using the definition of P, the smoothing approximation $G: \mathbb{R}^{n+1} \to \mathbb{R}^n$ of $\hat H$ defined in (13) can be described by

$$G(\varepsilon, x) = x - P(\varepsilon, x - F(P(\varepsilon, x))), \quad (14)$$

where (ε, x) ∈ R × Rⁿ.

The first globally and superlinearly (quadratically) convergent smoothing Newton method was proposed by X. Chen, Qi and D. Sun in [4], where the authors exploited a Jacobian consistency property and applied this property to an infinite sequence of smoothing approximation functions to obtain high-order convergent methods. The smoothing function defined by (12) satisfies the Jacobian consistency property, while the one defined in (14) does not. The method in [4] was further studied by Chen and Y. Ye in [5].

Suppose that G: R’*! > R" is a smoothing approxi- 
mation of H. Define E: R” > R” by 


€ 
E(z) := ‘ae | F 


where z := (€, x) € R x R". Then solving H(x) = 0 is 
equivalent to finding a solution of E(e, x) = 0. Note that 
E is continuously differentiable at any (¢, x) € R x R" 
with ¢ ¥ 0 and is possibly nonsmooth at (0, x) € R x 
R". We call E a smoothing-nonsmooth reformulation of 
H. Then classical Newton methods for solving smooth- 
ing equations can be used to solve E(z) = 0 with one 
additional requirement: ¢ must be positive during the 
process of iteration. The latter can be done by solving 
a slightly modified Newton equation: 


(15) 


E(z) + E'(z)Az = pz, (16) 


where $\beta \in (0, \infty)$ and $\bar{z} := (\bar{\varepsilon}, 0)$ with $\bar{\varepsilon} > 0$. Clearly, $\varepsilon$ should be driven to zero neither too fast (otherwise global convergence cannot be guaranteed) nor too slowly (otherwise fast local convergence cannot be guaranteed). A special line search model involving $\beta$ and $\bar{z}$ was designed in [23] to achieve this. Choose $\bar{\varepsilon} \in \mathbb{R}_{++}$ and $\gamma \in (0, 1)$ such that $\gamma \bar{\varepsilon} < 1$, and choose constants $\delta, \sigma \in (0, 1)$. Let $\varepsilon^0 := \bar{\varepsilon}$ and let $x^0 \in \mathbb{R}^n$ be an arbitrary point. For $k = 0, 1, \ldots$, find a solution $\Delta z^k$ of (16) with $z := z^k$ and $\beta := \beta_k = \gamma \min\{1, e(z^k)\}$, where for any $y \in \mathbb{R}^{n+1}$, $e(y) = \|E(y)\|^2$. Let $l_k$ be the smallest nonnegative integer $l$ satisfying

$$e(z^k + \delta^l \Delta z^k) \le [1 - 2\sigma(1 - \gamma \bar{\varepsilon})\delta^l]\, e(z^k).$$

Define $z^{k+1} := z^k + \delta^{l_k} \Delta z^k$. It can often be verified that $G$ is also semismooth everywhere, jointly in $\varepsilon$ and $x$ [23]. Hence the semismooth theory of nonsmooth Newton methods for solving nonsmooth equations can be used to obtain superlinear (quadratic) convergence of $(\varepsilon, x)$ for the above smoothing Newton method, while global convergence rests on the specially designed line search. See [23] for details.
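To make the iteration above concrete, here is a minimal NumPy sketch under stated assumptions that this section does not fix: the problem is taken to be an NCP (the box is the nonnegative orthant) with a linear map $F(y) = My + q$, and $P(\varepsilon, t)$ is taken to be the Chen-Harker-Kanzow-Smale type smoothing $(t + \sqrt{t^2 + 4\varepsilon^2})/2$ of $\max(0, t)$; the parameter values are illustrative. $G$ follows (14), $E$ follows (15), each step solves (16), and the backtracking uses the line search condition stated above. The function names are hypothetical.

```python
import numpy as np


def chks(eps, t):
    """Assumed smoothing of max(0, t): P(eps, t) = (t + sqrt(t^2 + 4 eps^2)) / 2.

    Returns P together with its partial derivatives in t and in eps.
    """
    r = np.sqrt(t * t + 4.0 * eps * eps)
    return 0.5 * (t + r), 0.5 * (1.0 + t / r), 2.0 * eps / r


def E_and_jacobian(z, M, q):
    """E(z) = (eps, G(eps, x)) with G from (14) and F(y) = M y + q, plus E'(z)."""
    eps, x = z[0], z[1:]
    n = x.size
    u, du_dx, du_deps = chks(eps, x)        # u = P(eps, x)
    w = x - (M @ u + q)                     # w = x - F(P(eps, x))
    dw_dx = np.eye(n) - M * du_dx           # I - M diag(du_dx)
    dw_deps = -M @ du_deps
    v, dv_dw, dv_deps = chks(eps, w)        # v = P(eps, w)
    G = x - v
    J = np.zeros((n + 1, n + 1))
    J[0, 0] = 1.0                           # first component of E is eps itself
    J[1:, 0] = -(dv_deps + dv_dw * dw_deps)
    J[1:, 1:] = np.eye(n) - dv_dw[:, None] * dw_dx
    return np.concatenate(([eps], G)), J


def smoothing_newton(M, q, x0, eps_bar=1.0, gamma=0.5, delta=0.5, sigma=0.25,
                     tol=1e-10, max_iter=100):
    assert gamma * eps_bar < 1.0
    z = np.concatenate(([eps_bar], np.asarray(x0, dtype=float)))
    z_bar = np.zeros_like(z)
    z_bar[0] = eps_bar                      # z_bar = (eps_bar, 0)
    for _ in range(max_iter):
        E, J = E_and_jacobian(z, M, q)
        e = E @ E                           # merit function e(z) = ||E(z)||^2
        if np.sqrt(e) < tol:
            break
        beta = gamma * min(1.0, e)
        dz = np.linalg.solve(J, beta * z_bar - E)   # modified Newton equation (16)
        step = 1.0                          # step = delta^l for l = 0, 1, 2, ...
        for _ in range(60):
            E_trial, _ = E_and_jacobian(z + step * dz, M, q)
            if E_trial @ E_trial <= (1.0 - 2.0 * sigma * (1.0 - gamma * eps_bar) * step) * e:
                break
            step *= delta
        z = z + step * dz
    return z[0], z[1:]
```

For example, with $M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $q = (-5, -6)$ the complementarity solution is interior, $x^* = (4/3, 7/3)$, and the iterates approach it while $\varepsilon$ is driven to zero. Note that the first row of (16) gives $\Delta\varepsilon = \beta\bar{\varepsilon} - \varepsilon$, so $\varepsilon$ stays positive for every step length in $(0, 1]$.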


Conclusion 


In this article, semismooth Newton methods and smoothing Newton methods for solving NCP and VI based on nonsmooth equations have been briefly reviewed. These topics are still developing very rapidly; see [22] for an up-to-date review. Another nonsmooth approach for solving NCP and VI is to reformulate these problems as unconstrained optimization problems whose objective functions are once but not twice differentiable (see [11]).


See also 


> Composite Nonsmooth Optimization 

> Nonconvex-Nonsmooth Calculus of Variations 

> Solving Hemivariational Inequalities by Nonsmooth 
Optimization Methods 


References 


1. Burke J, Xu S (1999) A polynomial time interior-point 
path-following algorithm for LCP based on Chen-Harker- 
Kanzow smoothing techniques. Math Program 86:91-103 

2. Chen B, Harker PT (1993) A non-interior-point continuation 
method for linear complementarity problems. SIAM J Ma- 
trix Anal Appl 14:1168-1190 

3. Chen C, Mangasarian OL (1996) A class of smoothing func- 
tions for nonlinear and mixed complementarity problems. 
Comput Optim Appl 5:97-138 

4. Chen X, Qi L, Sun D (1998) Global and superlinear con- 
vergence of the smoothing Newton method and its appli- 
cation to general box constrained variational inequalities. 
Math Comput 67:519-540 

5. Chen X, Ye Y (1999) On homotopy-smoothing methods for 
variational inequalities. SIAM J Control Optim 37:589-616 


6. Clarke FH (1983) Optimization and nonsmooth analysis. Wiley, New York

7. DeLuca T, Facchinei F, Kanzow C (1996) A semismooth equation approach to the solution of nonlinear complementarity problems. Math Program 75:407-439

8. Dennis JE, Schnabel RB (1983) Numerical methods for unconstrained optimization and nonlinear equations. Prentice-Hall, Englewood Cliffs

9. Eaves BC (1971) On the basic theorem of complementarity. Math Program 1:68-75

10. Fischer A (1992) A special Newton-type optimization method. Optim 24:269-284

11. Fukushima M (1996) Merit functions for variational inequality and complementarity problems. In: Di Pillo G, Giannessi F (eds) Nonlinear Optimization and Applications. Plenum, New York, pp 155-170

12. Gabriel SA, Moré JJ (1997) Smoothing of mixed complementarity. In: Ferris MC, Pang JS (eds) Complementarity and Variational Problems: State of the Art. SIAM, Philadelphia, pp 105-116

13. Jiang H, Fukushima M, Qi L, Sun D (1998) A trust region method for solving generalized complementarity problems. SIAM J Optim 8:140-157

14. Kanzow C (1994) An unconstrained optimization technique for large-scale linearly constrained convex minimization. Computing 53:101-117

15. Kanzow C (1996) Some noninterior continuation methods for linear complementarity problems. SIAM J Matrix Anal Appl 17:851-868

16. Kummer B (1988) Newton's method for non-differentiable functions. In: Guddat J, Bank B, Hollatz H, Kall P, Klatte D, Kummer B, Lommatzsch K, Tammer L, Vlach M, Zimmerman K (eds) Adv. Mathematical Optimization. Akademie, Berlin, pp 114-125

17. Mifflin R (1977) Semismooth and semiconvex functions in constrained optimization. SIAM J Control Optim 15:957-972

18. Ortega JM, Rheinboldt WC (1970) Iterative solution of nonlinear equations in several variables. Acad. Press, New York

19. Pang J-S, Qi L (1993) Nonsmooth equations: Motivation and algorithms. SIAM J Optim 3:443-465

20. Qi L (1993) Convergence analysis of some algorithms for solving nonsmooth equations. Math Oper Res 18:227-244

21. Qi L, Chen X (1995) A globally convergent successive approximation method for severely nonsmooth equations. SIAM J Control Optim 33:402-418

22. Qi L, Sun D (1999) A survey of some nonsmooth equations and smoothing Newton methods. In: Hill R, Eberhard A, Glover B, Ralph D (eds) Progress in Optimization. Kluwer, Dordrecht, pp 121-146

23. Qi L, Sun D, Zhou G (2000) A new look at smoothing Newton methods for nonlinear complementarity problems and box constrained variational inequalities. Math Program 87:1-35




24. Qi L, Sun J (1993) A nonsmooth version of Newton’s 
method. Math Program 58:353-367 

25. Robinson SM (1992) Normal maps induced by linear trans- 
formation. Math Oper Res 17:691-714 

26. Smale S (1986) Algorithms for solving equations. In: Proc. 
Internat. Congress Math, pp 172-195 


NP-complete Problems and Proof Methodology


SANATAN RAI, GEORGE VAIRAKTARAKIS 
Department OR and Operations Management, 
Case Western Reserve University, Cleveland, USA 


MSC2000: 90C60, 68Q25 


Article Outline 


Keywords 

Some Known NP-Complete Problems 
Methodology for NP-Completeness Proofs 
Example Proofs 

Conclusion 

See also 

References 


Keywords 


Computational complexity; Reducibility; Polynomial 
time reduction; NP-hard problem; NP-complete 
problem; strong NP-completeness; ordinary 
NP-completeness 


Given a combinatorial problem, one tries to exploit its structure so as to develop a solution algorithm that guarantees identifying an optimal solution for every instance of the problem. For some problems, the inherent structure is such that one can develop an algorithm that progressively builds an optimal solution, or selects among a small number of candidate solutions. Such algorithms are quite desirable because their computer time requirements are small, often bounded by a polynomial function of the number n of parameters needed to specify the problem (e.g., $O(n)$ or $O(n^2)$; see [9] for details on algorithmic complexity). Such algorithms are called polynomial algorithms. Problems solvable by a polynomial algorithm may be solved quickly on a computer.

Unfortunately, not all combinatorial problems possess enough structure to allow for a polynomial algorithm. Hence, when we encounter a new problem for which we cannot identify enough structure, we would like to know whether this lack of structure is due to the problem itself or to incomplete analysis. To address this issue, one idea is to compare the structure of the problem at hand with the structure of other well-known and notoriously hard problems; such problems are known in the literature as NP-complete problems. Specifically, if we can show that our problem is 'equivalent' to an NP-complete problem, then any algorithm that solves our problem can be used to solve the hard one and vice versa. We can then justifiably suggest that our problem is very difficult. With this information, we can either continue focusing our efforts on finding a polynomial time optimal algorithm (admitting that our chances of success are low) or consider heuristics or enumeration techniques.

If we are ever able to find a polynomial time algorithm for an NP-complete problem, we will make one of the most important discoveries in human knowledge. This would mean that we are able to solve all hard combinatorial problems very quickly (for details about the relationship between the class of polynomially solvable problems and the class of NP-complete problems see [4]). If we believe that this is unlikely, then we focus our analysis on heuristics and enumeration techniques. In this case, the equivalence between the problem at hand and the NP-complete problem has dictated our approach towards the problem at hand. Since 1971, when the foundations of complexity theory were laid by S.A. Cook in [2], all the papers on NP-complete problems that have appeared in the literature have taken the latter route, namely, they focus on heuristics and/or enumeration techniques. In this sense, complexity theory is a very useful tool for determining our approach towards solving difficult combinatorial problems. In this article, we present some of the fundamental techniques that have been used in the literature to prove equivalence among problems. We start in the next section by presenting a list of combinatorial optimization problems. Then, we describe some basic methodology for theorem proving in complexity theory, and conclude with a few illustrative example proofs.
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Some Known NP-Complete Problems 


In what follows we present some problems commonly used in the literature to prove equivalence between problems. All of these problems are notoriously hard: they belong to the class of NP-complete problems, and hence no polynomial algorithm is known for them. In the rest of this article we present methods for proving the equivalence between selected pairs of these problems. Our selected problems span a sample of areas in combinatorial optimization, including set partition, logic, graph theory, network theory, and scheduling theory.

The following two problems are representative of problems in set partition. The problems are stated in their decision form, i.e., each instance is described by the input data required to define the problem, and the question only requires a 'yes/no' answer. This presentation of combinatorial problems follows the form adopted in [4], which was the first text devoted to a systematic compilation of hard combinatorial problems.


Definition 1 (Partition) INSTANCE: Set $A = \{a_1, \ldots, a_n\}$ of elements, and a set function $s: A \to \mathbb{Z}^+$.
QUESTION: Does there exist a set $A' \subseteq A$ such that

$$\sum_{a \in A'} s(a) = \frac{1}{2} \sum_{a \in A} s(a)\ ?$$


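Definition 2 (3-Partition) INSTANCE: Set $A = \{a_1, \ldots, a_{3n}\}$ of elements, set function $s: A \to \mathbb{Z}^+$, and threshold value $B$.

QUESTION: Is there a partition of $A$ into subsets $A_k \subseteq A$, $k = 1, \ldots, n$, such that

$$\sum_{a \in A_k} s(a) = B \quad \text{and} \quad |A_k| = 3 \quad \text{for } 1 \le k \le n\ ?$$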


These problems are among the most popular problems found in complexity theory. Note that in 'Partition', $A$ is partitioned into two sets with no restriction on the number of elements per set. For this reason, 'Partition' is often referred to in the literature as 2-partition. In contrast, '3-partition' involves the partition of $A$ into $n$ sets, each consisting of precisely three elements.


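Definition 3 (3-Satisfiability) INSTANCE: A Boolean expression $B$ in the variables $x_k$, $k = 1, \ldots, q$,

$$B = (p_{11} \vee p_{12} \vee p_{13}) \wedge \cdots \wedge (p_{m1} \vee p_{m2} \vee p_{m3}) = \bigwedge_{i=1}^{m} \bigvee_{j=1}^{3} p_{ij},$$

where each literal $p_{ij}$ is either $x_k$ or its negation $\bar{x}_k$ for some $1 \le k \le q$.

QUESTION: Is there a truth assignment for the variables $x_k$ such that $B$ is true?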


This problem is also referred to in the literature as 3-Sat. The related problem where every clause has an arbitrary number of literals (rather than precisely three) is known in the literature as the satisfiability problem, or Sat, and has the distinction of being the first problem shown to be NP-complete (see [2]). Note that if we were able to solve Sat in polynomial time, then we would be able to determine quickly the truth value of any statement in propositional calculus. Effectively, we could cast every imaginable theorem in propositional form and let a computer answer it; this would amount to theorem proving by computer.


Definition 4 (Maximum clique) INSTANCE: Graph 
G =(V, E) and positive integer k. 

QUESTION: Does there exist a complete subgraph 
of G on k vertices? 


The 'maximum clique' problem is very important in graph theory, with applications in diverse fields. The following problem is encountered when one wants to identify a path (with certain properties) in a given network.


Definition 5 (Impossible pairs constrained path problem (IPP)) INSTANCE: Directed graph $G = (V, A)$ with source $s$ and sink $t$, and pairs of nodes $(a_i, b_i)$ for $i = 1, \ldots, m$.

QUESTION: Does there exist a directed $s$-$t$ path containing at most one node from each pair $(a_i, b_i)$, $i = 1, \ldots, m$?


The following problems are found in scheduling theory, where a set of jobs is to be processed in a production system so as to optimize a given objective. We use the standard 3-field notation $\alpha/\beta/\gamma$ (see [6]), where $\alpha$ denotes the number and type of processors, $\beta$ describes the job characteristics, and $\gamma$ the objective function. For example, $\alpha = 1$ indicates a single processor, and $\alpha = Pm$ denotes $m$ identical processors operated in parallel. Job characteristics include completion deadlines, start times, precedence constraints among jobs, or processing characteristics. A popular scheduling objective is the minimization of the makespan $C_{\max}$, the maximum completion time, where the maximum is taken over all jobs. $C_{\max}$ is a natural objective when a manager wants to maximize the utilization of processors.


Definition 6 1/rj, dj/Cj ≤ dj INSTANCE: Set $J = \{J_1, \ldots, J_n\}$ of jobs, each with a processing time $p_i$, a due date $d_i$, and a release time $r_i$, $1 \le i \le n$.

QUESTION: Is there a schedule of the jobs in $J$ such that each job $J_i$ starts no earlier than time $r_i$ and completes at time $C_i \le d_i$?


In the above problem one wants a single-processor schedule where every job $J_i$ starts no earlier than time $r_i$ and finishes no later than time $d_i$. Such a schedule would allow on-time delivery of jobs to the customers.


Definition 7 Pm//Cmax INSTANCE: Set $M = \{M_1, \ldots, M_m\}$ of parallel identical processors, set $J = \{J_1, \ldots, J_n\}$ of jobs, each with a processing time $p_i$, and threshold value $B$.

QUESTION: Is there a nonpreemptive assignment of the $n$ jobs to the $m$ processors so that at any time every machine processes at most one job, and the completion time of $J_i$ is $C_i \le B$ for every $1 \le i \le n$?


Here we seek a schedule of the n jobs on the m pro- 
cessors so as to minimize the completion time of the 
last job. Every processor can process at most one job at 
a time, and every job must be processed in its entirety 
without being interrupted by any of the processors. 


Definition 8 P/prec, pj = 1/Cmax INSTANCE: Set $J = \{J_1, \ldots, J_n\}$ of jobs with processing times $p_i = 1$ for $1 \le i \le n$, and a set $A$ of precedence constraints between jobs in $J$. Also, a set $P$ of parallel identical processors, and a threshold value $B$.

QUESTION: Is there a nonpreemptive assignment of jobs to processors so that at any time every machine processes at most one job, the job precedence constraints are satisfied, and the completion time of $J_i$ is $C_i \le B$ for every $1 \le i \le n$?


Unlike the previous problem, the number of parallel identical processors is not fixed in advance in this problem but is part of the input, i.e., $\alpha = P$. Every job has unit processing time, and these unit jobs must satisfy a set of precedence constraints. Among all nonpreemptive schedules that satisfy these constraints, we seek one that minimizes the completion time of the last job.

The following section presents a general methodol- 
ogy for NP-completeness proofs with the problems de- 
scribed above as examples. 


Methodology for NP-Completeness Proofs 


We start by presenting the four basic steps of a complexity proof. Such proofs demonstrate that a known NP-complete problem $P \in NPC$ can be transformed to a new problem $\Pi$. To indicate this reduction we use the notation $P \propto \Pi$.

1) Show that $\Pi \in NP$.

2) Construct a transformation from $P$ to $\Pi$.

3) Show that the transformation in step 2 can be effected by a polynomial time algorithm.

4) Show that there exists a solution $S_P$ for $P$ if and only if there exists a solution $S_\Pi$ for $\Pi$, and that the transformations from $S_P$ to $S_\Pi$ and vice versa are done by a pseudopolynomial algorithm for strong NP-complete reductions, and by a polynomial algorithm for ordinary NP-complete reductions.

Step 1 requires that, given a solution $S_\Pi$ of $\Pi$, we can check whether $S_\Pi$ provides a 'yes' or 'no' answer for $\Pi$ using a polynomial algorithm. Given an arbitrary instance $I$ of $P$, step 2 requires constructing an instance $I'$ of $\Pi$. Step 3 requires that the construction of $I'$ from $I$ is polynomial in the number of input data required to specify $I$. Finally, step 4 requires proving that, given a solution $S_P$ for the instance $I$, we can construct a solution $S_\Pi$ for $I'$ and vice versa. In almost all reductions that have appeared in the literature, steps 1 to 3 are simple and usually straightforward, while step 4 often requires considerable creativity.
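As an illustration of step 1, here is a Python sketch that checks a proposed solution (a certificate) of 3-Partition in linear time. Representing the certificate as lists of element indices is an assumption made for this example, and the function name is hypothetical.

```python
def verify_three_partition(sizes, B, groups):
    """Check a proposed 3-Partition certificate in O(n) time.

    sizes  -- positive integers s(a_1), ..., s(a_3n)
    groups -- proposed subsets A_1, ..., A_n, each given as a list of element indices
    """
    n = len(sizes) // 3
    if len(sizes) != 3 * n or len(groups) != n:
        return False
    seen = [False] * len(sizes)
    for group in groups:
        if len(group) != 3:                          # |A_k| = 3
            return False
        for i in group:
            if not 0 <= i < len(sizes) or seen[i]:   # the subsets must be disjoint
                return False
            seen[i] = True
        if sum(sizes[i] for i in group) != B:        # the sum over A_k equals B
            return False
    return all(seen)                                 # every element of A is used
```

For instance, with sizes `[6, 7, 7, 6, 6, 8]` and $B = 20$, the certificate `[[0, 1, 2], [3, 4, 5]]` is accepted, while `[[0, 1, 3], [2, 4, 5]]` is rejected because its first triple sums to 19.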

Step 4 refers to strong and ordinary NP-complete problems. In a nutshell, this is one of many classifications of NP-complete problems into smaller subclasses; for a detailed description of these classes see [4]. In practical terms, an ordinary NP-complete problem can be solved using implicit enumeration algorithms like dynamic programming. In this case, the complexity of the algorithm is not polynomial in the length of the input data, but it is polynomial in the size of these data.

For instance, Partition is an NP-complete problem solvable by dynamic programming in $O(n \sum_i s(a_i))$ time (see [8]). Evidently, this complexity is polynomial in the size $\sum_i s(a_i)$ of the data. To see that this complexity bound is not polynomial in the length of the data, consider the binary encoding scheme. In this scheme each $s(a_i)$ can be represented by a string of length $O(\log s(a_i))$, and hence $s(a_1), \ldots, s(a_n)$ can be described by a string of length $O(\sum_i \log s(a_i))$, which is no greater than $O(n \log B)$, where $B = \sum_i s(a_i)$. We see that the time complexity $O(nB)$ of the dynamic program (DP)

• is polynomial in the size $B$ of the data,

• but not polynomial in the length of the input data for the instance $I$ of 'partition', where $\mathrm{length}(I) = O(n \log B)$.

Notice that the complexity of this DP, $O(nB)$, is not bounded by any polynomial function of $n \log B$; for fixed $n$, for example, $B = 2^{\log B}$ grows exponentially in $\log B$. When the complexity of an algorithm is polynomial in the size of the data, but not in the length of the input, we refer to the algorithm as a pseudopolynomial algorithm. An NP-complete problem solvable by a pseudopolynomial algorithm is called ordinary NP-complete. Otherwise, the problem is strongly NP-complete.
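The pseudopolynomial behavior can be seen in a short Python sketch of a Partition dynamic program. This is the standard subset-sum table over sums $0, \ldots, B/2$, offered as an illustration; the table layout and the backtracking used to recover $A'$ are choices made for this sketch, not details taken from [8].

```python
def partition_dp(sizes):
    """Decide Partition by dynamic programming in O(n * B) time, B = sum(sizes).

    Returns the sorted indices of a subset A' whose sizes add up to B / 2, or None.
    """
    total = sum(sizes)
    if total % 2:
        return None
    half = total // 2
    # last[t]: index of the element that first made subset sum t reachable;
    # None means t is unreachable, -1 marks the empty subset (t = 0).
    last = [None] * (half + 1)
    last[0] = -1
    for i, s in enumerate(sizes):
        for t in range(half, s - 1, -1):   # descending t: each element is used at most once
            if last[t] is None and last[t - s] is not None:
                last[t] = i
    if last[half] is None:
        return None
    subset, t = [], half
    while t > 0:                            # walk back through the recorded elements
        i = last[t]
        subset.append(i)
        t -= sizes[i]
    return sorted(subset)
```

For example, `partition_dp([3, 1, 1, 2, 2, 1])` returns `[0, 1, 2]` (sizes 3 + 1 + 1 = 5, half of 10), while `partition_dp([1, 5, 2])` returns `None`. The running time is governed by the number of table entries, roughly $n \cdot B/2$, which is exactly the quantity that is polynomial in $B$ but not in $n \log B$.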

As indicated by its complexity, solving ‘partition’ is 
easier than solving any problem not solvable by implicit 
enumeration. Such problems require explicit enumer- 
ation algorithms like branch and bound. In the list of 
problems given earlier, ‘partition’ is the only problem 
solvable by dynamic programming. 

Given the complexity status of the known NP-complete problem $P$, we can determine whether a new problem $\Pi$ is strongly or ordinary NP-complete if one of the following happens:

• $P$ is strongly NP-complete, and $P \propto \Pi$. Then, $\Pi$ is strongly NP-complete.

• $P$ is ordinary NP-complete, $P \propto \Pi$, and a pseudopolynomial algorithm exists for $\Pi$. Then, $\Pi$ is ordinary NP-complete.

As indicated in the decision tree of Fig. 1, all other outcomes leave the exact complexity status of problem $\Pi$ undetermined.
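The decision tree of Fig. 1 can be sketched as a small Python function. The four boolean arguments are hypothetical flags standing for facts that must be established by analysis (a known polynomial algorithm, a known reduction $P \propto \Pi$ from some $P \in NPC$, whether that $P$ is strongly NP-complete, a known pseudopolynomial algorithm); the code only records how the tree combines them.

```python
def complexity_status(poly_alg_known, reduction_from_npc, p_is_strong, pseudo_poly_alg_known):
    """Walk the decision tree of Fig. 1 for a problem Pi."""
    if poly_alg_known:                 # polynomial time optimal algorithm for Pi?
        return "Pi is polynomially solvable"
    if not reduction_from_npc:         # is there P in NPC with P reducing to Pi?
        return "complexity status of Pi is open"
    if p_is_strong:                    # is P strongly NP-complete?
        return "Pi is strongly NP-complete"
    if pseudo_poly_alg_known:          # pseudopolynomial optimal algorithm for Pi?
        return "Pi is ordinary NP-complete"
    return "Pi is NP-complete, but exact status open"
```

For example, Example 9 below corresponds to `complexity_status(False, True, False, True)`, since Partition is ordinary NP-complete and a pseudopolynomial algorithm exists for P2//Cmax.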


Example Proofs 


In this section we present some simple applications of the four-step reduction process outlined previously.


Example 9 Partition $\propto$ P2//Cmax. Step 1 requires us to show that P2//Cmax $\in NP$, i.e., given a schedule $S$ of the $n$ jobs, we can check in polynomial time whether the associated makespan satisfies $C_{\max}(S) \le B$. To perform the check, we need to find the completion time of the last job processed by each of the processors. This requires no more than $n$ additions involving the processing times of the jobs in $J$. Hence, $C_{\max}(S)$ can be computed in $O(n)$ time, and whether $C_{\max}(S) \le B$ or not can subsequently be established in $O(1)$ time. Hence, P2//Cmax $\in NP$. This completes step 1.

For step 2 we must construct an instance of P2//Cmax, given an instance of 'partition'. Let $I$ be an instance of 'partition', and $s(a_1), \ldots, s(a_n)$ be the values of the $n$ elements $a_1, \ldots, a_n$ in $A$. We construct an instance $I'$ of P2//Cmax as follows. Let $p_i = s(a_i)$, $i \in \{1, \ldots, n\}$. The number of processors is $m = 2$ and the number of jobs is $n$. The threshold value is set to $B = \frac{1}{2} \sum_i p_i$. This completes step 2.

This construction of I’ required n + 2 assignments, 
and n + 1 basic operations to compute B. Evidently, the 
total amount of effort required to construct I’ is O(n). 
This concludes step 3. 

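In step 4, we need to show that there exists a solution for the instance $I$ of 'partition' if and only if there exists a solution for the instance $I'$ of P2//Cmax. Indeed, let $A_1, A_2$ be a partition of $A$ such that

$$\sum_{a_i \in A_1} s(a_i) = \sum_{a_i \in A_2} s(a_i) = \frac{1}{2} \sum_{a_i \in A} s(a_i).$$

From this, we can construct a solution for $I'$ by assigning all jobs in $J^1 = \{J_i : a_i \in A_1\}$ to be processed (in any order) by processor $M_1$, and all jobs in $J^2 = \{J_i : a_i \in A_2\}$ to be processed (in any order) by processor $M_2$. Let $S$ be the resulting schedule for P2//Cmax. By definition of $A_1$ and $A_2$,

$$C_{\max}(S) = \sum_{J_i \in J^1} p_i = \sum_{J_i \in J^2} p_i = \frac{1}{2} \sum_{i=1}^{n} p_i = B.$$

Clearly, the schedule $S$ is constructed from $A_1, A_2$ in $O(n)$ time. Conversely, let $S$ be a schedule that solves P2//Cmax. The two machine loads add up to $\sum_i p_i = 2B$ and neither exceeds $B$, so both equal $B$; hence the elements corresponding to the jobs on $M_1$ and on $M_2$ form the required partition $A_1, A_2$, which can be constructed in $O(n)$ time as well.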
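The four steps of Example 9 can be traced in a short Python sketch. A two-machine schedule is represented here simply as two lists of job indices, since the order on each machine does not affect the makespan; this encoding and the function names are assumptions made for illustration.

```python
def partition_to_p2cmax(sizes):
    """Step 2: build I' = (p, m, B) of P2//Cmax from an instance of Partition."""
    p = list(sizes)                   # p_i = s(a_i)
    return p, 2, sum(p) / 2           # m = 2 processors, threshold B = (1/2) sum_i p_i


def schedule_from_partition(n, A1):
    """Step 4, partition to schedule: jobs J_i with a_i in A1 on M1, the rest on M2."""
    A1 = set(A1)
    return [i for i in range(n) if i in A1], [i for i in range(n) if i not in A1]


def partition_from_schedule(machines):
    """Step 4, schedule to partition: the jobs on M1 give A1; the remaining ones give A2."""
    return sorted(machines[0])


def makespan(p, machines):
    """Step 1: Cmax(S) is the largest machine load, found with O(n) additions."""
    return max(sum(p[i] for i in jobs) for jobs in machines)
```

For the instance with sizes `[3, 1, 1, 2, 2, 1]` and $A_1 = \{a_1, a_2, a_3\}$ (indices `[0, 1, 2]`), both machine loads equal 5, so the makespan equals the threshold $B = 5$.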

Since 'partition' is an ordinary NP-complete problem, to completely determine the status of P2//Cmax we have to develop a pseudopolynomial algorithm for it. Such an algorithm can in fact be developed (see [1]), which means that P2//Cmax belongs to the class of ordinary NP-complete problems.


NP-complete Problems and Proof Methodology, Figure 1
Establishing the complexity status of a problem $\Pi$


Example 10 3-partition $\propto$ 1/rj, dj/Cj ≤ dj. For step 1, consider a given schedule $S$ for 1/rj, dj/Cj ≤ dj. By scanning the $n$ jobs of $S$ in sequence, we can obtain the start and completion times of the $n$ jobs in $O(n)$ time. Then we can check $r_i \le C_i - p_i$ and $C_i \le d_i$ for all $n$ jobs in $O(n)$ time. Hence, 1/rj, dj/Cj ≤ dj $\in NP$. This completes step 1.

For step 2, suppose we are given an instance $I$ of 3-partition, that is, a set $A = \{a_1, \ldots, a_{3n}\}$ with associated values $s(a_1), \ldots, s(a_{3n})$, and threshold value $B$. For reasons that will become clear later, we make the following assumptions. Firstly, $s(a_i) < B$ for every $1 \le i \le 3n$; otherwise we can immediately conclude that there is no solution for the instance $I$ of 3-partition. Secondly, we assume that the elements $a_1, \ldots, a_{3n}$ possess the property

(*) $B/4 < s(a_i) < B/2$ for $i = 1, \ldots, 3n$.

Indeed, if this condition does not hold, we can replace the value of each element $a_i$ by $s'(a_i) = s(a_i) + B$, and the threshold value $B$ by $B' = 4B$. Then it is easy to see that $B < s'(a_i) < 2B$ (because $s(a_i) < B$), and hence $B'/4 < s'(a_i) < B'/2$ for $i = 1, \ldots, 3n$. In this case, the elements $a_1, \ldots, a_{3n}$ possess the required property with respect to the values $s'(a_i)$ and the threshold $B'$, and a triple sums to $B$ under $s$ exactly when it sums to $B' = B + 3B$ under $s'$. Hence, without loss of generality, we assume that the elements of the instance $I$ of 3-partition satisfy property (*). This ensures that, if $\sum_{a_i \in A_k} s(a_i) = B$, then $|A_k| = 3$, since any two elements sum to less than $B$ and any four elements to more than $B$.

From $I$, we construct the instance $I'$ of 1/rj, dj/Cj ≤ dj with $4n$ jobs as follows. Set $p_i = s(a_i)$, $r_i = 0$ and $d_i = nB + n - 1$ for $i = 1, \ldots, 3n$. Also include in $I'$ $n$ 'filler' jobs $F_1, \ldots, F_n$ with processing times $p_{F_i} = 1$, release times $r_{F_i} = iB + i - 1$ and due dates $d_{F_i} = iB + i$ for $i = 1, \ldots, n$. Note that $p_{F_i} = d_{F_i} - r_{F_i} = 1$, so the filler jobs have no slack. This completes step 2.

Clearly, the construction of $I'$ is done in $O(n)$ time, which settles step 3. To prove step 4, let $A_k$, $k = 1, \ldots, n$, be a solution of $I$. By definition of the 3-partition problem, each set $A_k$ is such that $|A_k| = 3$ and $\sum_{a \in A_k} s(a) = B$, $k = 1, \ldots, n$. From this, construct a schedule $S$ for $I'$ as follows. Let $J^k = \{J_i : a_i \in A_k\}$ for $k = 1, \ldots, n$, and schedule the jobs in $J^k$ to be processed (in any order) starting at time $(k-1)B + k - 1$ and finishing at time $kB + k - 1$, with the filler job $F_k$ processed in $[kB + k - 1, kB + k]$, as in Fig. 2.

Evidently, the schedule $S$ is constructed in $O(n)$ time. Conversely, if $S$ is a schedule that solves $I'$, then the filler jobs partition the $3n$ regular jobs into $n$ distinct sets, say $J^k = \{J_i : J_i \text{ starts after } F_{k-1} \text{ and is completed before } F_k\}$ for $k = 1, \ldots, n$. Note that for $k = 1$, $J^1$ contains the jobs that start after time $t = 0$ ($F_0$ is not defined) and complete prior to $F_1$. As the filler jobs have no slack, each of the $n$ intervals available to the regular jobs has length exactly $B$, while the regular jobs have total processing time $\sum_i s(a_i) = nB$ (if this sum differs from $nB$, $I$ trivially has no solution). Hence in each $J^k$ the sum of the processing times must be precisely $B$ units. By property (*) of the instance $I$, each $J^k$ must contain precisely 3 jobs. Therefore, the sets $A_k = \{a_i \in A : J_i \in J^k\}$, $k = 1, \ldots, n$, partition $A$ into $n$ triplets such that $\sum_{a_i \in A_k} s(a_i) = \sum_{J_i \in J^k} p_i = B$ for $k = 1, \ldots, n$, i.e., the collection $\{A_k\}_{k=1}^{n}$ solves $I$. Hence, given a schedule $S$ that solves $I'$, we can construct a partition $\{A_k\}_{k=1}^{n}$ that solves $I$ in $O(n)$ time. This completes step 4.
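The construction of Example 10 can be sketched in Python. As an assumption for this sketch, a single-machine schedule is represented by a job sequence in which each job starts as soon as both its release time and the completion of its predecessor allow; this suffices to check the "triple, then filler" schedule built in step 4. Jobs are `(p, r, d)` triples, regular jobs come first and the fillers follow; the function names are hypothetical.

```python
def three_partition_to_scheduling(sizes, B):
    """Step 2: jobs (p, r, d) of 1/rj, dj/Cj <= dj; 3n regular jobs, then n fillers."""
    n = len(sizes) // 3
    jobs = [(s, 0, n * B + n - 1) for s in sizes]                       # regular jobs
    jobs += [(1, i * B + i - 1, i * B + i) for i in range(1, n + 1)]    # fillers F_i
    return jobs


def sequence_from_three_partition(groups, n_regular):
    """Step 4: the jobs of triple A_k, followed by filler F_k, for k = 1, ..., n."""
    seq = []
    for k, group in enumerate(groups):
        seq += list(group) + [n_regular + k]
    return seq


def is_feasible(jobs, seq):
    """Step 1: start each job as early as possible in the given order; check C_j <= d_j."""
    if sorted(seq) != list(range(len(jobs))):
        return False
    time = 0
    for j in seq:
        p, r, d = jobs[j]
        time = max(time, r) + p       # completion time C_j
        if time > d:
            return False
    return True
```

With sizes `[6, 7, 7, 6, 6, 8]` and $B = 20$ (which satisfy property (*)), the sequence built from the triples `[0, 1, 2]` and `[3, 4, 5]` is feasible: the first filler runs in $[20, 21]$ and the second in $[41, 42]$. The sequence `[0, 1, 3, 6, 2, 4, 5, 7]` is not, because its first triple leaves an idle unit before the filler and the last regular job then completes at 42, after its due date 41.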


This example demonstrates that we can often assume, without loss of generality, that the instance $I$ satisfies certain properties that may become handy in proving step 4. As seen so far, steps 1 and 3, as well as parts of step 4, are quite often trivial; in such cases the details are usually omitted. A typical example is provided next.


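Example 11 3-Sat $\propto$ IPP. Given a Boolean expression $B$, construct a digraph $G$ as indicated in Fig. 3, with a source $s$, a sink $t$, and a node $v_{ij}$ for every literal $p_{ij}$, $i = 1, \ldots, m$, $j = 1, 2, 3$. Specifically, the arc set of $G$ is

$$A = \{(s, v_{1j}) : j = 1, 2, 3\} \cup \{(v_{ij}, v_{i+1,k}) : 1 \le i \le m - 1;\ 1 \le j, k \le 3\} \cup \{(v_{mj}, t) : j = 1, 2, 3\}.$$

Also, let the set of restricted pairs be $M = \{(v_{ij}, v_{kl}) : p_{ij} = \bar{p}_{kl}\}$.

Let $P = s, v_{1j_1}, \ldots, v_{mj_m}, t$ be a constrained path. By making $p_{ij_i}$ true for $i = 1, \ldots, m$, we force $B$ to be true. Since $P$ satisfies the constraints in $M$, these assignments do not conflict, i.e., no variable is set both true and false. Therefore the existence of a path $P$ implies the satisfiability of $B$. Conversely, if a truth assignment satisfies $B$, choosing a true literal $p_{ij_i}$ in each clause $i$ yields a path $s, v_{1j_1}, \ldots, v_{mj_m}, t$ that contains no restricted pair. The construction takes polynomial time: $G$ has $9(m-1) + 6 = O(m)$ arcs and $M$ has $O(m^2)$ pairs. This completes the reduction.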
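The reduction of Example 11 can be exercised on small formulas with the following Python sketch. As assumptions for illustration, literals are encoded as nonzero integers (`+k` for $x_k$, `-k` for $\bar{x}_k$), node `(i, j)` stands for $v_{ij}$ with 0-based indices, and both the path search and the satisfiability test are brute force, so they serve only to compare the two sides of the equivalence, not as efficient algorithms.

```python
from itertools import product


def sat_to_ipp(clauses):
    """Build the arcs and restricted pairs of Example 11 from a list of 3-literal clauses."""
    m = len(clauses)
    arcs = [("s", (0, j)) for j in range(3)]
    arcs += [((i, j), (i + 1, k)) for i in range(m - 1) for j in range(3) for k in range(3)]
    arcs += [((m - 1, j), "t") for j in range(3)]
    nodes = [(i, j) for i in range(m) for j in range(3)]
    # restricted pairs: nodes carrying complementary literals
    pairs = [(u, v) for u in nodes for v in nodes
             if u < v and clauses[u[0]][u[1]] == -clauses[v[0]][v[1]]]
    return arcs, pairs


def has_constrained_path(arcs, pairs, s="s", t="t"):
    """Depth-first search for an s-t path using at most one node of each restricted pair."""
    succ, conflict = {}, {}
    for u, v in arcs:
        succ.setdefault(u, []).append(v)
    for u, v in pairs:
        conflict.setdefault(u, set()).add(v)
        conflict.setdefault(v, set()).add(u)

    def dfs(node, on_path):
        if node == t:
            return True
        return any(conflict.get(nxt, set()).isdisjoint(on_path) and dfs(nxt, on_path | {nxt})
                   for nxt in succ.get(node, []))

    return dfs(s, frozenset([s]))


def satisfiable(clauses, q):
    """Brute-force check over all 2^q truth assignments, for comparison only."""
    return any(all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)
               for bits in product([False, True], repeat=q))
```

For instance, $(x_1 \vee x_2 \vee x_3) \wedge (\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3)$ gives a graph with a constrained path, while the eight clauses containing every sign pattern of $x_1, x_2, x_3$ form an unsatisfiable expression and give a graph without one.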


Our last example demonstrates how complexity theory 
can be used to derive approximation results. 


Example 12 Max Clique $\propto$ P/prec, pj = 1/Cmax. Given $G = (V, E)$, construct a digraph $D$ as follows. Introduce a job $J_v$ for every $v \in V$, a job $J_e$ for every $e \in E$, and an arc $J_v \to J_e$ whenever $v$ is an endpoint of $e$. Let $l = \binom{k}{2}$ be the number of edges in a $k$-clique, $k' = |V| - k$ the number of nodes not in the clique, and $l' = |E| - l$ the number of edges not in the clique.

Let the number of processors be $m = \max(k, l + k', l') + 1$. Introduce three sets of dummy nodes

$$X_x,\ x = 1, \ldots, m - k; \qquad Y_y,\ y = 1, \ldots, m - l - k'; \qquad Z_z,\ z = 1, \ldots, m - l',$$

and the arcs $X_x \to Y_y \to Z_z$ for all $x$, $y$, $z$. Note that the total number of jobs is $|V| + |E| + (m - k) + (m - l - k') + (m - l') = 3m$.

We will show that the constructed instance of P/prec, pj = 1/Cmax has a feasible schedule with $C_{\max} = 3$ if and only if $G$ has a clique of size $k$. In other words, there is a clique of size $k$ if and only if we can complete the jobs in 3 periods.

($\Rightarrow$) Suppose a $k$-clique exists. Then:

• Schedule the jobs corresponding to the $k$ clique vertices in period 1 (see Fig. 4).

• Schedule the $m - k$ jobs $X_x$ on the remaining processors in period 1.

• Schedule the $l$ jobs corresponding to the clique edges, the $k' = |V| - k$ remaining vertex jobs, and the $m - l - k'$ jobs $Y_y$ in period 2.

• Finally, schedule the $l' = |E| - l$ remaining edge jobs and the $m - l'$ jobs $Z_z$ in period 3 (see Fig. 4).

NP-complete Problems and Proof Methodology, Figure 4
Instance of P/prec, pj = 1/Cmax

This is a feasible schedule on three periods: every period uses exactly $m$ processors, every edge job is processed after both of its endpoint jobs, and the dummy jobs $X_x$, $Y_y$, $Z_z$ occupy periods 1, 2 and 3, respectively.

($\Leftarrow$) Suppose no $k$-clique exists, and suppose, for contradiction, that a 3-period schedule exists. Since there are $m$ machines and $3m$ jobs, in a three-period schedule all machines must be busy in all three periods. Due to the precedence constraints $X_x \to Y_y \to Z_z$, all $m - k$ jobs $X_x$ must be done in period 1, all $Y_y$ in period 2 and all $Z_z$ in period 3. The remaining $k$ slots in period 1 must then be taken by vertex jobs, since edge jobs have predecessors. Since no $k$-clique exists, these $k$ vertices span at most $l - 1$ edges, so at most $l - 1$ edge jobs can be scheduled in period 2. Hence period 2 contains at most the $l - 1$ edge jobs, the $m - l - k'$ jobs $Y_y$ and the $k'$ remaining vertex jobs, i.e., at most $m - 1$ jobs in total. Therefore at least one machine stays idle in period 2, contradicting the fact that all machines must be busy in every period. Hence no schedule on three periods can exist.

This completes the reduction, and since 'Max Clique' is strongly NP-complete, so is P/prec, pj = 1/Cmax.
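The construction of Example 12 and the three-period schedule built from a clique can be checked mechanically on a small graph. In this Python sketch, jobs are encoded as tagged tuples and a schedule as a map from job to period (1, 2 or 3); these encodings and the function names are assumptions for illustration. The checker only tests one given schedule; it does not search for a schedule.

```python
def clique_to_scheduling(V, E, k):
    """Build the jobs, precedence arcs and processor count m of Example 12.

    Assumes |V| >= k and |E| >= k(k-1)/2, so that all counts below are meaningful.
    """
    l = k * (k - 1) // 2                 # edges in a k-clique
    k_prime = len(V) - k                 # vertices outside the clique
    l_prime = len(E) - l                 # edges outside the clique
    m = max(k, l + k_prime, l_prime) + 1
    X = [("X", x) for x in range(m - k)]
    Y = [("Y", y) for y in range(m - l - k_prime)]
    Z = [("Z", z) for z in range(m - l_prime)]
    jobs = [("v", v) for v in V] + [("e", e) for e in E] + X + Y + Z
    arcs = [(("v", v), ("e", e)) for e in E for v in e]          # J_v -> J_e
    arcs += [(a, b) for a in X for b in Y] + [(b, c) for b in Y for c in Z]
    return jobs, arcs, m


def schedule_from_clique(jobs, clique):
    """Assign periods as in the clique-to-schedule direction of the proof."""
    C = set(clique)
    period = {}
    for kind, name in jobs:
        if kind == "v":
            period[(kind, name)] = 1 if name in C else 2
        elif kind == "e":
            period[(kind, name)] = 2 if set(name) <= C else 3
        else:
            period[(kind, name)] = {"X": 1, "Y": 2, "Z": 3}[kind]
    return period


def unit_schedule_ok(period, arcs, m):
    """At most m unit jobs per period and every precedence arc respected."""
    loads = {}
    for p in period.values():
        loads[p] = loads.get(p, 0) + 1
    return max(loads.values()) <= m and all(period[a] < period[b] for a, b in arcs)
```

For a triangle on vertices 1, 2, 3 with an extra edge (3, 4) and $k = 3$, we get $l = 3$, $k' = 1$, $l' = 1$, $m = 5$ and 15 jobs; the clique $\{1, 2, 3\}$ yields a feasible three-period schedule, whereas applying the same rule to the non-clique $\{1, 2, 4\}$ overloads period 3 with 7 jobs.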


Observe that in Example 12 the number $m$ of processors is not fixed. Therefore, Example 12 does not settle the complexity status of the problem in which $m$ is specified beforehand; for example, it does not resolve the complexity status of P3/prec, pj = 1/Cmax. A careful examination of Example 12 also enables us to derive error bounds. Suppose we develop a polynomial time heuristic $H$ to solve P/prec, pj = 1/Cmax. Given an instance of Max Clique, we generate the associated instance of P/prec, pj = 1/Cmax described in Example 12 and apply $H$ to it. Since $H$ cannot always solve P/prec, pj = 1/Cmax optimally (unless P = NP), there will be instances where an optimal schedule on three periods exists but $H$ fails to find it. Since the instance produced in Example 12 uses integer processing times, whenever $H$ fails to produce an optimal solution its makespan satisfies $C_H \ge 4$ rather than 3. Equivalently, the worst-case error bound of $H$ must satisfy $\rho = C_H/C^* \ge 4/3$. This observation holds for any possible heuristic for P/prec, pj = 1/Cmax. Therefore, unless P = NP, no polynomial time heuristic algorithm can have a worst-case error bound $\rho < 4/3$, and worst-case analyses for P/prec, pj = 1/Cmax must focus on values $\rho \ge 4/3$.


Conclusion 


In this article we presented some basic techniques used in proving NP-completeness results. Following Cook's seminal paper [2], the first list of reductions for combinatorial problems was compiled in [5]. The four examples described in this article illustrate diverse optimization areas; they can be found in [3], [4], [7] and [6], respectively.
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We give a survey of numerical methods for the unary 
optimization problem: min f(x) = }°7_, Uj(ai(x)), x € 
R", where U(-) is a function of a single argument and 
aj(x) = aes a; € R",i=1,..., m. 
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Introduction 


Consider the unconstrained optimization problem:

$$\min f(x), \quad x \in \mathbb{R}^n. \qquad (1)$$

If $f(x)$ takes the form

$$f(x) = \sum_{i=1}^{m} U_i(a_i(x)), \quad x \in \mathbb{R}^n, \qquad (2)$$

where $U_i(\cdot)$ is a smooth function of a single argument and $a_i(x) = a_i^T x$, $a_i \in \mathbb{R}^n$, $i = 1, \ldots, m$, then we call problem (1) a unary optimization problem. Usually $m = n$. Unary optimization problems appear in many practical fields such as linear robust regression, portfolio selection and chemical equilibrium; see, e.g., [1,2,9].

The unary optimization problem was first proposed by G.P. McCormick and A. Sofer [9]. They exploited the special structure of the derivatives of the unary function and applied the resulting algorithm to the solution of the classical chemical equilibrium problem. D. Goldfarb and S. Wang [7] first proposed an implementable algorithm specifically for the unary optimization problem. Using the special structure of unary functions to save computational cost, they modified the regular Newton method into a partial-update Newton method. In recent years several Chinese researchers have worked on this subject and published papers on it. We review the state of the art in this relatively new field.


Goldfarb-Wang Algorithm GW 
If the unary functions $U_i(\cdot)$, $i = 1, \dots, m$, in (2) are all twice differentiable and the Hessian matrix $\nabla^2 f(x)$ is nonsingular for all $x \in \mathbb{R}^n$, then we can use Newton's method to solve problem (1), i.e., starting from the current point $x^k$, we compute the new iterate by

$$x^{k+1} = x^k + s^k,$$

where the increment $s^k$ is obtained by solving the Newton equation

$$\nabla^2 f(x^k)\, s = -\nabla f(x^k). \qquad (3)$$


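Goldfarb and Wang noticed that the Hessian $\nabla^2 f(x)$ of the unary function has the following form:

$$\nabla^2 f(x) = \sum_{i=1}^{m} \phi_i(x)\, a_i a_i^T, \qquad (4)$$

where

$$\phi_i(x) = \frac{d^2 U_i(a_i(x))}{d a_i^2}. \qquad (5)$$

Let $\Phi(x) = \mathrm{diag}(\phi_1(x), \dots, \phi_m(x))$, and $A^T = (a_1, \dots, a_m)$. Then we can write

$$\nabla^2 f(x) = A^T \Phi(x) A. \qquad (6)$$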
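The structure in (4)–(6) can be checked numerically. The sketch below is illustrative only: it assumes $U_i(t) = \log(1 + e^t)$ for every $i$ (a choice made here, not taken from the source), so that $\phi_i = U_i''$ is the derivative of the logistic function, and it compares $A^T \Phi(x) A$ with a central-difference Hessian built from the gradient $\nabla f(x) = \sum_i U_i'(a_i^T x)\, a_i$.

```python
import numpy as np

# Illustrative unary terms (assumption): U_i(t) = log(1 + exp(t)) for all i.
def dU(t):
    return 1.0 / (1.0 + np.exp(-t))            # U_i'(t), the logistic function

def d2U(t):
    s = dU(t)
    return s * (1.0 - s)                       # U_i''(t), i.e. phi_i in (5)

def grad(x, A):
    return A.T @ dU(A @ x)                     # sum_i U_i'(a_i^T x) a_i

def hessian(x, A):
    phi = d2U(A @ x)                           # diagonal entries of Phi(x)
    return A.T @ (phi[:, None] * A)            # A^T Phi(x) A, as in (6)

def fd_hessian(x, A, h=1e-5):
    """Central differences of the gradient, column by column."""
    n = x.size
    H = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad(x + e, A) - grad(x - e, A)) / (2.0 * h)
    return H

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))                # rows are a_i^T: m = 6, n = 4
x = rng.standard_normal(4)
print(np.allclose(hessian(x, A), fd_hessian(x, A), atol=1e-6))
```

Only the $m$ scalars $\phi_i(x)$ change with $x$; the vectors $a_i$ are fixed, which is what the partial-update methods below exploit.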


Notice that only $\phi_i(x)$ on the right-hand side of (4) or (6) may vary when $x$ changes, and the rank of the matrix $\phi_i(x) a_i a_i^T$ is one for any nonzero value $\phi_i$. Suppose that only the $j$th diagonal elements of $\Phi(x^k)$ and $\Phi(x^{k-1})$ differ at iteration $k$, i.e.,

$$A^T \Phi(x^k) A = A^T \Phi(x^{k-1}) A + \left(\phi_j(x^k) - \phi_j(x^{k-1})\right) a_j a_j^T, \qquad (7)$$

or equivalently,

$$\nabla^2 f(x^k) = \nabla^2 f(x^{k-1}) + \left(\phi_j(x^k) - \phi_j(x^{k-1})\right) a_j a_j^T. \qquad (8)$$

Therefore, $\nabla^2 f(x^k)^{-1}$ can be obtained from $\nabla^2 f(x^{k-1})^{-1}$ by the well-known Sherman–Morrison rank-one update formula, and the next iterate can be obtained in only $O(n^2)$ arithmetic operations after evaluating $\nabla f$ and $\phi_i$, $i = 1, \dots, m$.
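The rank-one update behind (8) can be sketched as follows, assuming the previous inverse Hessian is stored explicitly and that the updated matrix remains nonsingular; the helper name `sherman_morrison` is hypothetical. For symmetric $B$, $(B + \delta a a^T)^{-1} = B^{-1} - \frac{\delta\, B^{-1} a\, a^T B^{-1}}{1 + \delta\, a^T B^{-1} a}$, which costs $O(n^2)$ operations.

```python
import numpy as np

def sherman_morrison(H_prev, a, delta):
    """Return (B + delta * a a^T)^{-1} from H_prev = B^{-1}, B symmetric; O(n^2)."""
    Ha = H_prev @ a                             # B^{-1} a
    denom = 1.0 + delta * (a @ Ha)
    if abs(denom) < 1e-14:
        raise ZeroDivisionError("updated matrix is numerically singular")
    return H_prev - (delta / denom) * np.outer(Ha, Ha)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))                 # rows a_i^T, m = 5, n = 3
phi_old = rng.uniform(0.5, 2.0, 5)
B_old = A.T @ np.diag(phi_old) @ A              # Hessian at x^{k-1}, form (6)
j = 2
phi_new = phi_old.copy()
phi_new[j] += 0.7                               # only phi_j changes, as in (7)-(8)
H_new = sherman_morrison(np.linalg.inv(B_old), A[j], phi_new[j] - phi_old[j])
print(np.allclose(H_new, np.linalg.inv(A.T @ np.diag(phi_new) @ A)))
```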

The essential point of the GW algorithm is, in the $k$th iteration, to construct an approximation $B^k$ to the Hessian $\nabla^2 f(x^k)$. Goldfarb and Wang compute the increment $s^k$ by solving the approximate Newton equation

$$B^k s = -\nabla f(x^k), \qquad (9)$$

where $B^k = A^T \Phi^k A$ and $\Phi^k = \mathrm{diag}(\phi_1^k, \dots, \phi_m^k)$. So their method can be viewed as a special type of inexact Newton method. The diagonal matrix $\Phi^k$ is introduced to approximate $\Phi(x^k)$, and is defined by setting, for $i = 1, \dots, m$, $\phi_i^0 = \phi_i(x^0)$, and

$$\phi_i^{k+1} = \phi_i^k \quad \text{if } \phi_i(x^{k+1}) \text{ is 'replaceable' by } \phi_i^k, \qquad (10)$$
$$\phi_i^{k+1} = \phi_i(x^{k+1}) \quad \text{otherwise}, \qquad (11)$$

for $k \ge 0$. Choleski factorization is then used to solve (9). Notice that

$$B^{k+1} = B^k + \sum_{i \in S^k} \left(\phi_i(x^{k+1}) - \phi_i^k\right) a_i a_i^T, \qquad (12)$$

where $S^k$ is the set of indices $i$, $i = 1, \dots, m$, such that $\phi_i(x^{k+1})$ is not 'replaceable' by $\phi_i^k$. We can exploit rank-one updates starting from the Choleski factorization of $B^k$, which needs $O(|S^k| n^2)$ operations, where $|S^k|$ is the number of elements in $S^k$. So it is possible to save computation cost if $|S^k|$ is not very large.
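The partial update (12) can be sketched with a standard rank-one Cholesky update/downdate, applied once per index in $S^k$ at $O(n^2)$ operations each. This is a minimal sketch that assumes $B^k$ stays positive definite throughout, as in the GW algorithm. A downdate that would destroy positive definiteness raises an error instead. The function names `chol_rank1` and `partial_update` are illustrative.

```python
import numpy as np

def chol_rank1(L, v, sign=1.0):
    """Return lower-triangular L1 with L1 @ L1.T = L @ L.T + sign * outer(v, v).

    sign = +1.0 gives an update, sign = -1.0 a downdate; O(n^2) operations."""
    L = L.copy()
    x = np.array(v, dtype=float)
    n = x.size
    for k in range(n):
        r2 = L[k, k] ** 2 + sign * x[k] ** 2
        if r2 <= 0.0:
            raise np.linalg.LinAlgError("downdate would destroy positive definiteness")
        r = np.sqrt(r2)
        c = r / L[k, k]
        s = x[k] / L[k, k]
        L[k, k] = r
        L[k + 1:, k] = (L[k + 1:, k] + sign * s * x[k + 1:]) / c
        x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L

def partial_update(L, A, phi_old, phi_new, S):
    """Cholesky factor of B + sum_{i in S} (phi_new[i] - phi_old[i]) a_i a_i^T, cf. (12)."""
    for i in S:
        delta = phi_new[i] - phi_old[i]
        if delta != 0.0:
            # delta * a a^T = sign(delta) * (sqrt|delta| a)(sqrt|delta| a)^T
            L = chol_rank1(L, np.sqrt(abs(delta)) * A[i], np.sign(delta))
    return L

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))                 # rows a_i^T, m = 8, n = 5
phi = rng.uniform(1.0, 2.0, 8)
L = np.linalg.cholesky(A.T @ np.diag(phi) @ A)  # factor of B^k
phi_new = phi.copy()
phi_new[1] += 0.5                               # i = 1 and i = 4 form S^k
phi_new[4] -= 0.3
L_new = partial_update(L, A, phi, phi_new, [1, 4])
print(np.allclose(L_new @ L_new.T, A.T @ np.diag(phi_new) @ A))
```

Each index in $S^k$ costs one $O(n^2)$ pass. This is why the total is $O(|S^k| n^2)$ and why the saving disappears once $|S^k|$ approaches a fraction of $n$.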

Consider the assumptions:
A1) there exists a point $x^* \in \mathbb{R}^n$ with $\nabla f(x^*) = 0$;
A2) $\nabla^2 f(x^*)$ is nonsingular;
A3) $\Phi(x)$ is continuous in a neighborhood of $x^*$.
Goldfarb and Wang proved the following

Theorem 1 Let assumptions A1)–A3) hold. Then there exists $\delta > 0$ such that if $\|x^0 - x^*\| < \delta$, the sequence $x^k$ generated by the partial-update Newton method converges to $x^*$. Moreover, the convergence is linear, i.e.,

$$\|x^{k+1} - x^*\| \le \tau \|x^k - x^*\|, \quad \forall k, \qquad (13)$$

where $0 < \tau < 1$.


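In order to get a higher rate of convergence, they proposed two 'replacement' criteria for the method:

• Criterion 1: For $i = 1, \dots, m$, $\phi_i(x^{k+1})$ is replaceable by $\phi_i^k$ if
$$\frac{|\phi_i(x^{k+1}) - \phi_i^k|}{|\phi_i(x^{k+1}) - \phi_i^k| + \|\nabla f(x^{k+1})\|} \le \eta, \qquad (14)$$
where $0 < \eta < 1$.
• Criterion 2: For $i = 1, \dots, m$, $\phi_i(x^{k+1})$ is replaceable by $\phi_i^k$ if $k < p$ or
$$\frac{|\phi_i(x^{k+1}) - \phi_i^k|}{\max_{k-p+1 \le j \le k} |\phi_i(x^j) - \phi_i(x^{j-1})|} \le \eta, \qquad (15)$$
where $p$ is a given positive integer.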
Consider the additional assumption
A4) $\Phi(x)$ is Lipschitz continuous at $x^*$.
The authors proved the following

Theorem 2 Let assumptions A1), A2) and A4) hold and let $x^k$ be the sequence generated by the GW algorithm. Then
1) $x^k$ is locally quadratically convergent to $x^*$ if Criterion 1 is used;
2) $x^k$ is locally superlinearly convergent to $x^*$ with R-order at least $r_p$, where $r_p$ is the unique positive root of $t^{p+1} - t^p - 1 = 0$, if Criterion 2 is used.
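The R-order bound $r_p$ in Theorem 2 is easy to tabulate. Writing the equation as $t^p(t - 1) = 1$ shows that it has no root in $(0, 1]$ and that the left-hand side increases for $t > 1$. The unique positive root therefore lies in $(1, 2]$, where bisection (an implementation choice made here) finds it. For $p = 1$ it is the golden ratio $(1 + \sqrt{5})/2 \approx 1.618$; as $p$ grows, $r_p$ decreases toward 1.

```python
def r_p(p, tol=1e-12):
    """Unique positive root of t**(p + 1) - t**p - 1 = 0, found by bisection."""
    g = lambda t: t ** (p + 1) - t ** p - 1.0
    lo, hi = 1.0, 2.0                           # g(1) = -1 < 0 and g(2) = 2**p - 1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (1, 2, 3, 5, 10):
    print(p, round(r_p(p), 4))                  # r_p decreases toward 1 as p grows
```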


In order to obtain global convergence of the method, the authors modified the 'working approximation' $\Phi^k$ of $\Phi(x^k)$ in $B^k$, i.e., they modified the replacement criterion to ensure that $A^T \Phi^k A$ is positive definite. But the modified globally convergent algorithm is only R-linearly convergent.

The authors' numerical results show that the method takes less time than modified Newton methods on some types of problems, but not always.


Analysis of the Replacement Criterion 


The GW algorithm opened a new field for unary optimization, but, to our knowledge, for nearly five years no further papers in this field were published. Recently (as of 1999), J.Z. Zhang, N.Y. Deng and L.H. Chen [10] discussed the efficiency of the GW algorithm. They asked whether it was more efficient than Newton's method when $x^k$ approaches the solution $x^*$. They showed that, generally speaking, we cannot expect the number $|S^k|$ to be small enough that the computation cost to factorize $B^{k+1}$ by rank-one updates is less than that to factorize $\nabla^2 f(x^{k+1})$ directly. The efficiency of the GW algorithm heavily depends on the magnitude of $|S^k|$, so the frequency with which replacement takes place is a key point. Based on this idea, the authors extended Criterion 1 of the GW algorithm to the so-called

• Criterion $R_\alpha$: For $i = 1, \dots, m$ we replace $\phi_i(x^{k+1})$ by $\phi_i^k$ if
$$|\phi_i(x^{k+1}) - \phi_i^k| \le \sigma \|\nabla f(x^{k+1})\|^{\alpha}, \qquad (16)$$
where $\sigma$ and $\alpha$ are two constants satisfying $\sigma > 0$ and $\alpha \in (0, 1]$.
It is easy to see that GW's Criterion 1 is a special case of Criterion $R_\alpha$ with $\alpha = 1$ and $\sigma = \eta/(1 - \eta)$.
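The special-case claim follows from rearranging (14). For $d, g \ge 0$ and $0 < \eta < 1$, the condition $d/(d + g) \le \eta$ is equivalent to $d(1 - \eta) \le \eta g$, i.e., $d \le \frac{\eta}{1 - \eta} g$, which is (16) with $\alpha = 1$. The sketch below checks the equivalence on random data. The array names and the value of $\|\nabla f(x^{k+1})\|$ are illustrative.

```python
import numpy as np

def criterion1_replaceable(dphi, gnorm, eta):
    """GW Criterion 1, (14): |dphi| / (|dphi| + ||grad f||) <= eta."""
    d = np.abs(dphi)
    return d <= eta * (d + gnorm)               # multiplied out to avoid 0/0

def criterion_R_replaceable(dphi, gnorm, sigma, alpha):
    """Criterion R_alpha, (16): |dphi| <= sigma * ||grad f||**alpha."""
    return np.abs(dphi) <= sigma * gnorm ** alpha

rng = np.random.default_rng(3)
dphi = rng.standard_normal(1000)                # phi_i(x^{k+1}) - phi_i^k, i = 1..m
gnorm, eta = 0.3, 0.25                          # ||grad f(x^{k+1})|| and eta in (0, 1)
same = np.array_equal(criterion1_replaceable(dphi, gnorm, eta),
                      criterion_R_replaceable(dphi, gnorm, eta / (1 - eta), 1.0))
print(same)
```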
Consider the assumptions
A1) For $i = 1, \dots, m$, $U_i(\cdot)$ is three times continuously differentiable;
A2) problem (1) has a solution $x^*$ and $\nabla^2 f(x^*)$ is positive definite;
A3) the initial point $x^0$ is close enough to the solution $x^*$.
They proved two theorems:


Theorem 3 Let assumptions A1)–A3) hold. Consider the sequence $x^k$ generated by Algorithm $R_\alpha$ with $\alpha > \alpha_{J+1}$, where $J$ is a nonnegative integer and $\alpha_t$ is the unique positive root of the equation

$$(1 + \alpha)^t \alpha = 1. \qquad (17)$$

Suppose that for a fixed $i$, we have
1) $d^3 U_i(a_i(x^*))/da_i^3 \ne 0$, and
2) there exists $K > 0$ such that
$$\phi_i^k \ne \phi_i(x^*), \quad \forall k > K. \qquad (18)$$
Then
a) for any $K' > 0$, there exists an integer $k > K'$ such that
$$\phi_i^k = \phi_i(x^k); \qquad (19)$$
b) for any $\epsilon > 0$, there exists a $\bar K > 0$ such that if
$$k > \bar K, \qquad (20)$$
$$\phi_i^k = \phi_i(x^k) \qquad (21)$$
and
sk 
—— a a (22) 
| 
Then, for $x^l = x^{k+1}, x^{k+2}, \dots$, $\phi_i(x^l)$ is replaceable by $\phi_i^k$ successively for at most $J$ times.


Theorem 4 The conditions are the same as in Theorem 3. Suppose there exist $M_1, M_2 > 0$ such that when $k$ is large enough,

$$M_1 \|x^k - x^*\|^{1+\alpha} \le \|x^{k+1} - x^*\| \qquad (23)$$
$$\le M_2 \|x^k - x^*\|^{1+\alpha}. \qquad (24)$$

Then there exists $K > 0$ such that if

$$k > K \qquad (25)$$

and

$$\phi_i^k = \phi_i(x^k), \qquad (26)$$

then for $x^l = x^{k+1}, x^{k+2}, \dots$, $\phi_i(x^l)$ is replaceable by $\phi_i^k$ successively for at least $J$ times.


When $J = 0$, the positive root of $(1 + \alpha)\alpha = 1$ is $\alpha_1 = (\sqrt{5} - 1)/2 \approx 0.618$. For GW's Criterion 1, $\alpha = 1 > \alpha_1$, so by Theorem 3, $\phi_i(x^{k+1})$ will never be replaceable by $\phi_i^k$ when $k$ is large enough. So the GW algorithm with Criterion 1 cannot be expected to be more efficient than Newton's method. The authors' numerical results also supported this conclusion.
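The threshold $\alpha_{J+1}$ in Theorem 3 is the unique positive root of (17) with $t = J + 1$. Since $(1 + \alpha)^t \alpha - 1$ is $-1$ at $\alpha = 0$ and $2^t - 1 > 0$ at $\alpha = 1$, the root lies in $(0, 1)$. The sketch below finds it by bisection (an implementation choice, not from the source), reproduces $\alpha_1 = (\sqrt{5} - 1)/2$, and shows that GW's $\alpha = 1$ exceeds it. Larger $J$ gives smaller thresholds.

```python
import math

def alpha_t(t, tol=1e-12):
    """Unique positive root of (1 + a)**t * a = 1; it lies in (0, 1)."""
    h = lambda a: (1.0 + a) ** t * a - 1.0
    lo, hi = 0.0, 1.0                           # h(0) = -1 < 0, h(1) = 2**t - 1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a1 = alpha_t(1)
print(round(a1, 6), round((math.sqrt(5) - 1) / 2, 6))   # both round to 0.618034
print(1.0 > a1)                                 # GW's Criterion 1 uses alpha = 1
for J in range(4):
    print(J, round(alpha_t(J + 1), 4))          # threshold alpha_{J+1} shrinks with J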

However, the authors pointed out that the idea of the GW algorithm of using the approximate Hessian $B^k$ is promising, and that efficient algorithms can hopefully be constructed.


Using the Trust Region Approach 


In the GW algorithm, in order to ensure global convergence, the authors modified the criteria so that the approximate Hessian $B^k$ is positive definite. However, this prevents the algorithm from having locally superlinear convergence, may lead to $B^k$ being a poor approximation, and may increase the computation cost. Motivated by this observation, Chen, Deng and Zhang [3] removed the requirement for $B^k$ to be positive definite, and proposed two modified partial-update algorithms based on trust region stabilization. They also improved the replacement criteria of the GW algorithm and constructed the matrix $B^k$ in a different way so that it may be a better approximation to the Hessian.

Besides the replacement of the line search technique by the trust region strategy, the main differences between their algorithms and the GW algorithm are as follows:

a) As mentioned above, in order to get a better approximation to $\nabla^2 f(x^k)$, they abandoned the positive definiteness requirement for $B^k$. This change requires an efficient method to handle a trust region subproblem with an indefinite matrix, so they used the indefinite dogleg method suggested in [11]. The main feature of the method is that it solves the TR subproblem approximately. It first computes a Bunch–Parlett (B-P) factorization of $B^k$, and then forms a dogleg curve. Instead of minimizing the objective function within the whole trust region, it performs a curvilinear search along the dogleg curve in the region.

b) They introduced a threshold value $l \approx n/6$ to control whether the algorithm computes the B-P factorization directly or uses rank-one updates. This is motivated by the fact that the numbers of multiplications of the two methods are

$$c_b = \frac{1}{6} n^3 + O(n^2)$$
Step 1. Set parameters $\rho > 0$ and positive integers $w$ and $l$. Choose the initial point $x^0 \in \mathbb{R}^n$. Set $k = 0$.
Step 2. IF $\nabla f(x^k) = 0$, stop.
Step 3. IF $k = 0$, go to Step 4; ELSE, define the set
$$S^k = \left\{ i : |\phi_i(x^k) - \phi_i^{k-1}| > \|\nabla f(x^k)\|^{1/\rho} \right\}. \qquad (30)$$
IF
$$|S^k| \le w, \qquad (31)$$
where $|S^k|$ is the number of elements in the set $S^k$, go to Step 5; otherwise go to Step 4.
Step 4. Set
$$\phi_i^k = \phi_i(x^k), \quad i = 1, \dots, m, \qquad B^k = \nabla^2 f(x^k). \qquad (32)$$
Compute the Choleski factorization $\nabla^2 f(x^k) = L_k D_k L_k^T$, and the increment $s^k$ by solving the Newton equation $\nabla^2 f(x^k) s = -\nabla f(x^k)$. Set $x^{k+1} = x^k + s^k$. Go to Step 7.
Step 5. Set
$$\phi_i^k = \begin{cases} \phi_i(x^k), & i \in S^k, \\ \phi_i^{k-1}, & i \notin S^k, \end{cases} \qquad (33)$$
and
$$B^k = B^{k-1} + \sum_{i \in S^k} \left(\phi_i(x^k) - \phi_i^{k-1}\right) a_i a_i^T. \qquad (34)$$
Compute the Choleski factorization $B^k = L_k D_k L_k^T$, which is obtained by $|S^k|$ successive rank-one corrections from the Choleski factorization $B^{k-1} = L_{k-1} D_{k-1} L_{k-1}^T$.
Step 6. The increment $s^k$ is given by
$$s^k = \Psi(\nabla^2 f(x^k), \nabla f(x^k), B^k, l), \qquad (35)$$
where $\Psi(\nabla^2 f(x^k), \nabla f(x^k), B^k, l)$ is a mapping defined by approximately solving the Newton equation
$$\nabla^2 f(x^k) s = -\nabla f(x^k) \qquad (36)$$
with preconditioned conjugate gradient inner iterations, where the preconditioner is given by the inverse of $B^k$. The initial point is selected as $s = 0$. The number of inner iterations is $l$. The PCG method used here is a standard one, e.g., see [8, Algorithm 2.5.1]. Set $x^{k+1} = x^k + s^k$.
Step 7. Set $k = k + 1$, and then go to Step 2.

Numerical Methods for Unary Optimization, Algorithm 1
Deng–Wang–Zhang algorithm


and

$$c_u = n^2 + O(n),$$

respectively, and $c_b/c_u \approx n/6$. That is, if $k = 1$ or $|S^k| > l$, one performs the B-P factorization of $B^k$ directly; otherwise, one applies $|S^k|$ rank-one updates to the B-P factorization to get the factorization of the matrix

$$B^k = B^{T_k} + \sum_{i \in S^k} \left(\phi_i^k - \phi_i^{T_k}\right) a_i a_i^T,$$

where $T_k$ is the index of the iteration at which the latest B-P factorization was performed.

c) They considered the effect of $a_i$ on the replacement criteria. From the expression

$$\nabla^2 f(x) = \sum_{i=1}^{m} \|a_i\|^2 \phi_i(x) \left(\frac{a_i}{\|a_i\|}\right)\left(\frac{a_i}{\|a_i\|}\right)^T, \qquad (27)$$

they define $S^k$ by
sk = {i: 
llaill? |oi(x*) — oF | 5 
[[laill? |bi@x*) — GF" | + ane |VFG IH 7 
l1<ix<xm}, 
(28) 


where $\gamma > 0$ is a constant. Then they gave two modified replacement criteria:

• Modified replacement criterion 1: For $k = 1$ and $i \in \{1, \dots, m\}$, set $\phi_i^1 = \phi_i(x^1)$. For $k > 1$, define $\phi_i^k$ as follows:
1) If $|S^k| \le l$, set
$$\phi_i^k = \begin{cases} \phi_i(x^k), & i \in S^k, \\ \phi_i^{k-1}, & i \notin S^k. \end{cases}$$
2) Otherwise, set $\phi_i^k = \phi_i(x^k)$.
• Modified replacement criterion 2: The same as modified replacement criterion 1, except that an extra integer parameter $p > 0$ is chosen, and if $k < p$, $S^k$ is the same; otherwise, define
$$S^k = \left\{ i : |\phi_i(x^k) - \phi_i^{k-1}| \ge \max_{k-p+1 \le j \le k} |\phi_i(x^j) - \phi_i(x^{j-1})|, \ 1 \le i \le m \right\}. \qquad (29)$$
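The bookkeeping in item b) and in modified replacement criterion 1 can be sketched as follows. The sketch takes the index set $S^k$ as given, since computing it requires (28). It uses the leading-order counts $c_b \approx n^3/6$ and $c_u \approx n^2$ from above and sets $l = \lfloor n/6 \rfloor$. The function name `next_phi` and its refactor flag are hypothetical.

```python
import numpy as np

def next_phi(phi_prev, phi_curr, S, k, l):
    """Modified replacement criterion 1 (sketch; S = S^k is taken as given).

    phi_prev : phi^{k-1}, diagonal used for the previous model matrix
    phi_curr : values phi_i(x^k), i = 1..m
    Returns (phi^k, refactor); refactor=True means B^k is factored from
    scratch, False means |S^k| rank-one updates of the old factors suffice."""
    if k == 1 or len(S) > l:
        return phi_curr.copy(), True            # phi_i^k = phi_i(x^k) for all i
    phi = phi_prev.copy()
    idx = list(S)
    phi[idx] = phi_curr[idx]                    # refresh only i in S^k
    return phi, False

n = 600
l = n // 6                                      # threshold l ~ n/6 from item b)
c_b, c_u = n ** 3 / 6, n ** 2                   # leading terms of the two counts
print(l, c_b / c_u)                             # prints: 100 100.0

phi, refactor = next_phi(np.zeros(5), np.arange(1.0, 6.0), {1, 3}, k=2, l=2)
print(phi, refactor)
```

Here $l$ is the break-even point: $l$ rank-one updates cost about as much as one fresh factorization.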
Consider the assumptions
[A1] For $i = 1, \dots, m$, $U_i(\cdot)$ is twice continuously differentiable;
[A2] For any real constant $\bar f$ the level set $\{x : f(x) \le \bar f\}$ is compact;
[A3] For $i = 1, \dots, m$, the function $\phi_i(x)$ is Lipschitz continuous.
Under these assumptions the authors proved that the two algorithms are globally convergent. Moreover, if $x^k$ converges to $x^*$ and $\nabla^2 f(x^*)$ is positive definite, then the convergence rate of the first algorithm is quadratic. For the second algorithm, $x^k$ is locally superlinearly convergent to $x^*$ with R-order at least $r_p$, where $r_p$ is the unique positive root of $t^{p+1} - t^p - 1 = 0$.
The authors' numerical experiments showed that the two algorithms outperform the trust region method that uses GW's Criteria 1 and 2.


An Inexact Newton Method Using Preconditioned 
Conjugate Gradient Techniques 


More recently (1999), Deng, Wang and Zhang [6] developed Goldfarb and Wang's idea further and exploited the preconditioned conjugate gradient technique proposed in [5] and [4] to derive a new inexact Newton method. They showed that, when $n \ge 31$, the local behavior of this algorithm is superior, from the efficiency point of view, to that of both the GW algorithm and the algorithm of [3] described in the previous section. Their algorithm model is given in Algorithm 1.
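Step 6 of Algorithm 1 applies $l$ iterations of preconditioned conjugate gradients to the Newton equation (36), starting from $s = 0$, with the inverse of $B^k$ as preconditioner. Below is a minimal sketch of a standard PCG loop under these assumptions. It does not transcribe [8, Algorithm 2.5.1] verbatim. $B^k$ is taken to be positive definite so that a Cholesky factor can serve the preconditioner solves, and `psi` is a hypothetical name for the mapping $\Psi$ in (35).

```python
import numpy as np

def psi(H, g, B, l):
    """Approximate solution of H s = -g by l PCG iterations from s = 0,
    preconditioned with B^{-1} (B assumed symmetric positive definite)."""
    C = np.linalg.cholesky(B)                   # B = C C^T, factored once
    def prec(r):                                # z = B^{-1} r via two solves
        return np.linalg.solve(C.T, np.linalg.solve(C, r))
    s = np.zeros_like(g)
    r = -g - H @ s                              # residual of H s = -g
    z = prec(r)
    d = z.copy()
    rz = r @ z
    for _ in range(l):
        if np.linalg.norm(r) <= 1e-14 * np.linalg.norm(g):
            break                               # already (numerically) solved
        Hd = H @ d
        step = rz / (d @ Hd)
        s = s + step * d
        r = r - step * Hd
        z = prec(r)
        rz_new = r @ z
        d = z + (rz_new / rz) * d
        rz = rz_new
    return s

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
phi = rng.uniform(1.0, 2.0, 6)
H = A.T @ np.diag(phi) @ A                      # plays the role of the Hessian
B = A.T @ np.diag(phi * rng.uniform(0.9, 1.1, 6)) @ A   # stale approximation B^k
g = rng.standard_normal(4)
s_exact = np.linalg.solve(H, -g)
for l in (1, 2, 4):
    print(l, np.linalg.norm(psi(H, g, B, l) - s_exact))
```

When $B^k$ equals the true Hessian, one inner iteration already returns the Newton step. The closer $B^k$ is to $\nabla^2 f(x^k)$, the fewer inner iterations are needed, which is the trade-off the parameter $l$ controls.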

Suppose problem (1) has a solution $x^*$ and the following conditions are valid in a neighborhood of $x^*$:
[A1] For $i = 1, \dots, m$, the second derivative $\phi_i(x)$ of $U_i(\cdot)$ defined by (5) is Lipschitz continuous with constant $L$;
[A2] $\nabla^2 f(x^*) \succ 0$, i.e., the Hessian is symmetric positive definite.
The authors proved the following local quadratic convergence theorem about the Deng–Wang–Zhang algorithm.


Theorem 5 Let Assumptions A1)–A2) hold. If $l \ge \rho$, then there are positive scalars $\delta$ and $M$ such that if $\|x^0 - x^*\| < \delta$, the Deng–Wang–Zhang Algorithm (Algorithm 1) is well defined and, furthermore, the sequence $\{x^k\}$ generated by it satisfies

$$\|x^{k+1} - x^*\| \le M \|x^k - x^*\|^2.$$

In other words, the sequence $\{x^k\}$ converges to $x^*$ with Q-order at least 2.


The authors also studied the precise quadratic convergence (Q-order exactly 2) of the Deng–Wang–Zhang Algorithm, which is essential for the estimation of its computation cost. Consider the following additional assumption:

[A3] For any nonzero $h \in \mathbb{R}^n$, we have
$$\sum_{i=1}^{m} \phi_i'(x^*)\, (a_i^T h)^2\, a_i \ne 0. \qquad (37)$$

The authors proved


Theorem 6 Consider the sequence $\{x^k\}$ generated by the Deng–Wang–Zhang Algorithm with $l \ge \rho$. Let Assumptions (A1)–(A3) hold. Then there are positive scalars $\delta_1$ and $M_1$ such that if $\|x^0 - x^*\| < \delta_1$, then for any $k = 0, 1, \dots$,

$$M_1 \|x^k - x^*\|^2 \le \|x^{k+1} - x^*\| \le M \|x^k - x^*\|^2, \qquad (38)$$

where $M$ is defined in Theorem 5. In other words, the sequence $\{x^k\}$ converges to $x^*$ with Q-order equal to 2.


In order to obtain an implementable algorithm, the authors specified the values of the parameters $l$, $\rho$ and $w$ in the algorithm with the following purposes in mind:

1) retain the quadratic convergence;
2) keep the computation cost per iteration as low as possible.

Here the computation cost counts only the arithmetic operations in solving the Newton equation in Step 4 and Steps 5–6. For simplicity, they only considered the number of multiplicative operations. For example, for Step 4, the arithmetic operations refer to the number $Q_N$ of multiplications needed to compute the solution $s^k$ of (36) by a direct Choleski factorization, which is

$$Q_N = Q_N(n) = \frac{n^3}{6} + \frac{3n^2}{2} - \frac{2n}{3}. \qquad (39)$$
Similarly, the arithmetic operations in one conjugate gradient inner iteration in Step 6 refer to the number of multiplications, denoted by $Q_{CG}$, in calculating the right-hand side of (35), but with $l$ there replaced by 1:

$$Q_{CG} = Q_{CG}(n) = 2n^2 + 6n + 2. \qquad (40)$$

By minimizing an upper bound on the average computation cost per iteration, the authors obtained the optimal parameter values of $w$, $\rho$ and $l$, and then established an implementable algorithm for problem (1) with $n \ge 9$ as follows.

The same as the Deng–Wang–Zhang algorithm, except that the parameters are specified as follows:
1) The parameter $w$: set $w = 0$.
2) The parameter $\rho$: when $19 \ge n \ge 9$, set $\rho \in (0, 1)$; when $30 \ge n \ge 20$, set $\rho \in (0, 2)$; when $n \ge 31$, set $\rho \in (2^{i^*}, 2^{i^*} + 1)$, where the integer $i^*$ can be determined as follows. Let $N = \lfloor Q_N/Q_{CG} \rfloor$ and divide the interval $(0, N]$ by the points $2^i$, $i = 1, 2, \dots$, into the subintervals $(0, 2^1], (2^1, 2^2], \dots, (2^{j-1}, 2^j], (2^j, N]$, where $2^j + 1 \le N < 2^{j+1}$. Then compare the values
$$u(i) = \frac{i(2^i + 1) Q_{CG} + Q_N}{1 + i}$$
for $i = 1, \dots, j$. The integer $i^*$ is defined as the index corresponding to the smallest value $u^*$ of $u(i)$: $u(i^*) = u^*$.
3) The parameter $l$: set $l = l(\rho)$, where the integer $l(\rho)$ is defined by $\rho + 1 \ge l(\rho) \ge \rho$.

Numerical Methods for Unary Optimization, Table 1
n              100     200     400     500     800     1000    2000
w              0       0       0       0       0       0       0
ρ              (2,3)   (4,5)   (4,5)   (8,9)   (8,9)   (8,9)   (16,17)
l              3       5       5       9       9       9       17
Q(n)/Q_N(n)    0.67    0.53    0.43    0.41    0.35    0.33    0.28


Then the authors proved the following theorem about the computation cost:

Theorem 7 Let A1)–A3) hold. Then, compared with the corresponding number $Q_N = Q_N(n)$ of Newton's method with Choleski factorization, the average number of arithmetic operations per iteration $Q = Q(n)$ of the improved Deng–Wang–Zhang algorithm has the following properties:
1) when $30 \ge n \ge 9$, $Q(n) \le Q_N(n)$;
2) when $n \ge 31$, $Q(n) < Q_N(n)$;
3) $\lim_{n \to \infty} Q(n)/Q_N(n) = 0$.


Corresponding to the theoretical result in Theorem 7, some numerical values are listed in Table 1. For example, for $n = 200$, we have $i^* = 2$, $\rho \in (4, 5)$, $l = 5$ and $u^* = u(2) = 0.53 Q_N$, which means that the average cost per iteration of the improved Deng–Wang–Zhang algorithm is no more than $0.53 Q_N$.

The authors also pointed out that the parameter selection problem in the Deng–Wang–Zhang algorithm is worthy of further study from both theoretical and numerical points of view. A better parameter selection method is expected to give better performance than that listed in Table 1.
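The entries of Table 1 can be recomputed from (39), (40) and $u(i)$. The sketch below minimizes $u(i)$ over the integers $i \ge 1$ with $2^i + 1 \le N$, which is our reading of the subinterval construction. It reports the interval $(2^{i^*}, 2^{i^*} + 1)$ for $\rho$, the integer $l = 2^{i^*} + 1$ in $[\rho, \rho + 1]$, and the ratio $u^*/Q_N$. The function names are hypothetical.

```python
def q_newton(n):
    """(39): multiplications for one Newton step with a direct Choleski solve."""
    return n ** 3 / 6 + 3 * n ** 2 / 2 - 2 * n / 3

def q_cg(n):
    """(40): multiplications for one preconditioned CG inner iteration."""
    return 2 * n ** 2 + 6 * n + 2

def choose_parameters(n):
    """Return (i_star, rho_interval, l, u_star / Q_N) for n >= 31."""
    QN, QCG = q_newton(n), q_cg(n)
    N = int(QN // QCG)
    def u(i):                                   # average cost per iteration
        return (QN + i * (2 ** i + 1) * QCG) / (1 + i)
    candidates = [i for i in range(1, N.bit_length() + 1) if 2 ** i + 1 <= N]
    i_star = min(candidates, key=u)
    return i_star, (2 ** i_star, 2 ** i_star + 1), 2 ** i_star + 1, u(i_star) / QN

for n in (100, 200, 400, 500, 800, 1000, 2000):
    i_star, rho, l, ratio = choose_parameters(n)
    print(n, rho, l, round(ratio, 2))
```

Running it gives the same $\rho$ intervals, values of $l$ and rounded ratios as Table 1. The pattern is one full Newton step followed by $i^*$ cheaper PCG steps, each with $2^{i^*} + 1$ inner iterations, averaged over $1 + i^*$ iterations.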
An oligopoly consists of a finite number (usually few) of firms involved in the production of a good. Oligopolies are a basic economic market structure, with examples ranging from large firms in the automobile, computer, chemical, or mineral extraction industries to small firms with local markets. Oligopolies are examples of imperfect competition in that the producers or firms are sufficiently large that they affect the prices of the goods. A monopoly, on the other hand, consists of a single firm that has full control of the market.

Oligopoly theory dates to A. Cournot [1], who considered competition between two producers, the so-called duopoly problem, and is credited with being the first to study noncooperative behavior, in which the agents act in their own self-interest. In his study, the decisions made by the producers or firms are said to be in equilibrium if no producer can increase his income by unilateral action, given that the other producer does not alter his decision.

J.F. Nash [18,19] generalized Cournot's concept of an equilibrium for a behavioral model consisting of several agents or players, each acting in his own self-interest, which has come to be called a noncooperative game. Specifically, consider $m$ players, each player $i$ having at his disposal a strategy vector $x_i = \{x_{i1}, \dots, x_{in}\}$ selected from a closed, convex set $K^i \subset \mathbb{R}^n$, with a utility function $u_i : K \to \mathbb{R}^1$, where $K = K^1 \times \dots \times K^m \subset \mathbb{R}^{mn}$. The rationality postulate is that each player $i$ selects a strategy vector $x_i \in K^i$ that maximizes his utility level $u_i(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_m)$ given the decisions $(x_j)_{j \ne i}$ of the other players. In this framework one then has:

Definition 1 (Nash equilibrium) A Nash equilibrium is a strategy vector $x^* = (x_1^*, \dots, x_m^*) \in K$ such that

$$u_i(x_i^*, \hat x_i^*) \ge u_i(x_i, \hat x_i^*), \quad \forall x_i \in K^i, \ \forall i,$$

where $\hat x_i^* = (x_1^*, \dots, x_{i-1}^*, x_{i+1}^*, \dots, x_m^*)$.


It has been shown (cf. [6,10]) that Nash equilibria satisfy variational inequalities. In the present context, under the assumption that each $u_i$ is continuously differentiable on $K$ and concave with respect to $x_i$, one has

Theorem 2 (variational inequality formulation) Under the previous assumptions, $x^*$ is a Nash equilibrium if and only if $x^* \in K$ is a solution of the variational inequality

$$\langle F(x^*), x - x^* \rangle \ge 0, \quad \forall x \in K,$$




Oligopolistic Market Equilibrium 


where $F(x) = (-\nabla_{x_1} u_1(x), \dots, -\nabla_{x_m} u_m(x))$ is assumed to be a row vector, and $\nabla_{x_i} u_i(x)$ denotes the gradient of $u_i$ with respect to $x_i$.


If the feasible set $K$ is compact, then existence is guaranteed under the assumption that each utility function $u_i$ is continuously differentiable. J.B. Rosen [22] proved existence under similar conditions. S. Karamardian [12] relaxed the assumption of compactness of $K$ and provided a proof of existence and uniqueness of Nash equilibria under the strong monotonicity condition. As shown by D. Gabay and H. Moulin [6], the imposition of a coercivity condition on $F(x)$ guarantees the existence of a Nash equilibrium $x^*$ even if the feasible set is no longer compact. Moreover, if $F(x)$ satisfies the strict monotonicity condition, then uniqueness of $x^*$ is guaranteed, provided that the equilibrium exists.

We begin with the presentation of a classical 
oligopoly model and then present a spatial oligopoly 
model which is related to the spatial price equilibrium 
problem. 


The Classical Oligopoly Problem 


We now describe the classical oligopoly problem (cf. [1,4,5,6,7,13,20,21]) in which there are $m$ producers involved in the production of a homogeneous commodity. The quantity produced by firm $i$ is denoted by $q_i$, with the production quantities grouped into a column vector $q \in \mathbb{R}^m$. Let $f_i$ denote the cost of producing the commodity by firm $i$, and let $p$ denote the demand price associated with the good. Assume that

$$f_i = f_i(q_i)$$

and

$$p = p\left(\sum_{i=1}^{m} q_i\right).$$

The profit for firm $i$, $u_i$, which is the difference between the revenue and cost, can then be expressed as

$$u_i(q) = p\left(\sum_{i=1}^{m} q_i\right) q_i - f_i(q_i).$$


Given that the competitive mechanism is one of noncooperative behavior, one can immediately write down:

Theorem 3 (variational inequality formulation) Assume that the profit function $u_i(q)$ for each firm $i$ is concave with respect to $q_i$, and that $u_i(q)$ is continuously differentiable. Then $q^* \in \mathbb{R}^m_+$ is a Nash equilibrium if and only if it satisfies the variational inequality:

$$\sum_{i=1}^{m} \left[ \frac{\partial f_i(q_i^*)}{\partial q_i} - p\left(\sum_{j=1}^{m} q_j^*\right) - \frac{\partial p\left(\sum_{j=1}^{m} q_j^*\right)}{\partial q_i}\, q_i^* \right] \times \left[q_i - q_i^*\right] \ge 0, \quad \forall q \in \mathbb{R}^m_+.$$


Example 4 In this oligopoly example there are three firms. The data are as follows: the producer cost functions are given by:

$$f_1(q_1) = q_1^2 + q_1 + 10,$$
$$f_2(q_2) = \frac{1}{2} q_2^2 + 4 q_2 + 12,$$
$$f_3(q_3) = q_3^2 + \frac{1}{2} q_3 + 15,$$

and the inverse demand or price function is given by:

$$p\left(\sum_{i=1}^{3} q_i\right) = -\sum_{i=1}^{3} q_i + 5.$$

The equilibrium production outputs are as follows:

$$q^* = (q_1^*, q_2^*, q_3^*) = \left(\frac{23}{30}, 0, \frac{14}{15}\right), \qquad \sum_{i=1}^{3} q_i^* = \frac{17}{10}.$$

We now verify that the variational inequality is satisfied: $-\partial u_1(q^*)/\partial q_1$ is equal to zero, as is $-\partial u_3(q^*)/\partial q_3$, whereas $-\partial u_2(q^*)/\partial q_2 = 7/10$. Since both $q_1^*$ and $q_3^*$ are greater than zero, and $q_2^* = 0$, one sees that, indeed, the above variational inequality is satisfied.
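Example 4 can be checked with exact rational arithmetic. With $p(Q) = 5 - Q$ we have $p' = -1$, so $-\partial u_i/\partial q_i = f_i'(q_i) - p(Q) + q_i$ with $Q = \sum_i q_i$. The sketch below evaluates these quantities at $q^*$ and checks the variational inequality at a few nonnegative test points. The helper names and the test points are illustrative choices.

```python
from fractions import Fraction as F

# Marginal costs f_i'(q_i) derived from the cost functions of Example 4.
marginal_cost = [
    lambda q: 2 * q + 1,          # f_1 = q_1^2 + q_1 + 10
    lambda q: q + 4,              # f_2 = (1/2) q_2^2 + 4 q_2 + 12
    lambda q: 2 * q + F(1, 2),    # f_3 = q_3^2 + (1/2) q_3 + 15
]

def price(total):
    return 5 - total              # p = -(q_1 + q_2 + q_3) + 5, so p' = -1

def neg_marginal_profit(q):
    """-du_i/dq_i = f_i'(q_i) - p(Q) - p'(Q) q_i, with p' = -1."""
    total = sum(q)
    return [mc(qi) - price(total) + qi for mc, qi in zip(marginal_cost, q)]

q_star = [F(23, 30), F(0), F(14, 15)]
F_star = neg_marginal_profit(q_star)
print(sum(q_star))                # 17/10
print(F_star)                     # [Fraction(0, 1), Fraction(7, 10), Fraction(0, 1)]

# Variational inequality sum_i F_i(q*) (q_i - q_i*) >= 0 at sample points q >= 0.
for q in ([F(0), F(1), F(0)], [F(2), F(0), F(3)], [F(1, 3), F(5), F(0)]):
    print(sum(Fi * (qi - qs) for Fi, qi, qs in zip(F_star, q, q_star)) >= 0)
```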

Computational approaches can be found in [2,4,8,14,16,17], and the references therein.

In the special case where the production cost functions are quadratic (and separable) and the inverse demand or price function is linear, one can reformulate the Nash equilibrium conditions of the Cournot oligopoly problem as the solution to an optimization problem (see [16,23]).


A Spatial Oligopoly Model 


We now describe a generalized version of the oligopoly model due to S.C. Dafermos and A. Nagurney [3] (see also [11]), which is spatial in that the firms are now located in different regions and there are transportation costs associated with shipping the commodity between the producers and the consumers. For the relationship between this model and the perfectly competitive spatial price equilibrium problem, see [3]. For algorithms for the computation of solutions to this model, see [15] (see also, e.g., [9,16]). For additional background, including qualitative properties and references, see [16]. In [17], dynamic oligopolistic models, both spatial and aspatial, are presented (see also [4] and [20]), and stability analysis results are given.

Assume that there are $m$ firms and $n$ demand markets that are generally spatially separated. Assume that the homogeneous commodity is produced by the $m$ firms and is consumed at the $n$ markets. As before, let $q_i$ denote the nonnegative commodity output produced by firm $i$, and now let $d_j$ denote the demand for the commodity at demand market $j$. Let $Q_{ij}$ denote the nonnegative commodity shipment from supply market $i$ to demand market $j$. Group the production outputs into a column vector $q \in \mathbb{R}^m_+$, the demands into a column vector $d \in \mathbb{R}^n_+$, and the commodity shipments into a column vector $Q \in \mathbb{R}^{mn}_+$.

The following conservation of flow equations must hold:

$$q_i = \sum_{j=1}^{n} Q_{ij}, \quad \forall i,$$
$$d_j = \sum_{i=1}^{m} Q_{ij}, \quad \forall j,$$

where $Q_{ij} \ge 0$, $\forall i, j$.

As before, we associate with each firm $i$ a production cost $f_i$, but now allow for the more general situation where the production cost of a firm $i$ may depend upon the entire production pattern, that is,

$$f_i = f_i(q).$$

Similarly, allow the demand price for the commodity at a demand market to depend, in general, upon the entire consumption pattern, that is,

$$p_j = p_j(d).$$

Let $c_{ij}$ denote the transaction cost, which includes the transportation cost, associated with trading (shipping) the commodity between firm $i$ and demand market $j$. Here we permit the transaction cost to depend, in general, upon the entire shipment pattern, that is,

$$c_{ij} = c_{ij}(Q).$$

The profit $u_i$ of firm $i$ is then given by:

$$u_i = \sum_{j=1}^{n} p_j Q_{ij} - f_i - \sum_{j=1}^{n} c_{ij} Q_{ij},$$

which, in view of the conservation of flow equations and the functions, one may write as

$$u = u(Q).$$
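To make the profit expression concrete, the sketch below derives $q$ and $d$ from a shipment matrix $Q$ via the conservation of flow equations. It then evaluates $u_i(Q) = \sum_j p_j(d) Q_{ij} - f_i(q) - \sum_j c_{ij}(Q) Q_{ij}$ for every firm. The linear price, quadratic cost and constant transaction cost functions are hypothetical data chosen for illustration, not data from the source.

```python
import numpy as np

def profits(Q, price, cost, trans):
    """u_i(Q) for all firms i; Q has shape (m, n) with Q[i, j] >= 0."""
    q = Q.sum(axis=1)                  # q_i = sum_j Q_ij
    d = Q.sum(axis=0)                  # d_j = sum_i Q_ij
    p = price(d)                       # p_j(d), shape (n,)
    f = cost(q)                        # f_i(q), shape (m,)
    c = trans(Q)                       # c_ij(Q), shape (m, n)
    return Q @ p - f - (c * Q).sum(axis=1)

# Hypothetical data: m = 2 firms, n = 3 demand markets.
price = lambda d: 10.0 - d             # p_j(d) = 10 - d_j
cost = lambda q: q ** 2 + q            # f_i(q) = q_i^2 + q_i
trans = lambda Q: np.ones_like(Q)      # c_ij(Q) = 1
Q = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])
print(profits(Q, price, cost, trans))
```

Here $q = (1.5, 3)$ and $d = (1, 1.5, 2)$, so the two firms earn $8$ and $9.5$. The functions are passed in as callables so that $f_i$, $p_j$ and $c_{ij}$ can depend on the whole production, consumption and shipment patterns, as the model allows.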


Now consider the usual oligopolistic market mech- 
anism, in which the m firms supply the commodity 
in a noncooperative fashion, each one trying to maxi- 
mize his own profit. We seek to determine a nonnega- 
tive commodity distribution pattern Q for which the m 
firms will be in a state of equilibrium as defined below. 


Definition 5 (spatial Cournot–Nash equilibrium) A commodity shipment distribution $Q^* \in \mathbb{R}^{mn}_+$ is said to constitute a Cournot–Nash equilibrium if for each firm $i$, $i = 1, \dots, m$,

$$u_i(Q_i^*, \hat Q_i^*) \ge u_i(Q_i, \hat Q_i^*), \quad \forall Q_i \in \mathbb{R}^n_+,$$

where

$$Q_i = \{Q_{i1}, \dots, Q_{in}\}, \qquad \hat Q_i^* = (Q_1^*, \dots, Q_{i-1}^*, Q_{i+1}^*, \dots, Q_m^*).$$

The variational inequality formulation of the Cournot–Nash equilibrium is given in the following theorem.

Theorem 6 (variational inequality formulation; [3]) Assume that for each firm $i$ the profit function $u_i(Q)$ is concave with respect to the variables $\{Q_{i1}, \dots, Q_{in}\}$, and continuously differentiable. Then $Q^* \in \mathbb{R}^{mn}_+$ is a Cournot–Nash equilibrium if and only if it satisfies the variational inequality

$$-\sum_{i=1}^{m} \sum_{j=1}^{n} \frac{\partial u_i(Q^*)}{\partial Q_{ij}} \times \left(Q_{ij} - Q_{ij}^*\right) \ge 0, \quad \forall Q \in \mathbb{R}^{mn}_+.$$


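Using the expressions for the utility functions for this model and the conservation of flow equations, this variational inequality may be rewritten as:

$$\sum_{i=1}^{m} \frac{\partial f_i(q^*)}{\partial q_i} \times (q_i - q_i^*) + \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}(Q^*) \times (Q_{ij} - Q_{ij}^*) - \sum_{j=1}^{n} p_j(d^*) \times (d_j - d_j^*)$$
$$+ \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{l=1}^{n} \left[ \frac{\partial c_{il}(Q^*)}{\partial Q_{ij}} - \frac{\partial p_l(d^*)}{\partial d_j} \right] \times Q_{il}^* \, (Q_{ij} - Q_{ij}^*) \ge 0, \quad \forall (q, Q, d) \in K,$$

where $K = \{(q, Q, d) : Q \ge 0$, and the conservation of flow equations hold$\}$.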


Note that, in the special case, where there is only a sin- 
gle demand market and the transaction costs are identi- 
cally equal to zero, this variational inequality collapses 
to the variational inequality governing the aspatial or 
the classical oligopoly problem. 


See also 


> Equilibrium Networks 

> Financial Equilibrium 

> Generalized Monotonicity: Applications to 
Variational Inequalities and Equilibrium Problems 

> Spatial Price Equilibrium 

> Traffic Network Equilibrium 

> Walrasian Price Equilibrium 
Operations research


Operations research originated circa 1945. Webster's Ninth New Collegiate Dictionary (1989) defines it as: “The application of scientific and especially mathematical methods to the study and analysis of problems involving complex systems (as firm management, economic planning, and the waging of war).”

Case Institute of Technology (presently part of Case 
Western Reserve Univ.) located in Cleveland, Ohio, 
USA, was the first institution of higher learning in the 
world to establish a graduate program in Operations 
Research leading towards the MSc and PhD degrees. 
The first PhD degree in operations research in the USA 
was awarded to L. Friedman circa 1957, at Case. In the 
1950s and early 1960s, Case was one of the leading in- 
stitutions in the field of operations research. The fa- 
mous operations research group at Case was well estab- 
lished and reputed internationally. The operations re- 
search philosophy at Case was essentially based on the 
premise that operations research is science. An appro- 
priate definition of operations research is: The applica- 
tion of scientific methods to analyze, model, solve, and 
control human-created or management problems using 
mainly quantitative techniques [2]. Operations research 
provides the decision maker either with an optimum 
option from among a set of feasible alternative courses 
of action or with an optimum allocation of limited re- 
sources so as to minimize or maximize a given criterion 
or objective function [1]. The seven steps used to carry out an operations research study are:

1) Observation. 
2) Analysis of the present systems. This includes: 

a) qualitative analysis, quantitative analysis and data collection;

b) components to be incorporated such as: man, 
machine, material, management, money, con- 
sumer, competitors, public, government, safety, 
reliability, aesthetics, ethics. 


3) Definition and formulation of the problem. These must involve:

a) the decision-maker; 

b) the objectives or criteria used; 

c) the environmental or system constraints; 

d) the feasible alternative courses of action. 

4) Construction of a model (synthesis/design). Four models are possible:

a) mathematical (or symbolic or abstract) model; 

b) iconic model (e. g. CAD, graphics, drawings); 

c) analog model; 

d) simulation model. 

As an example, the construction of a mathematical model involves abstracting the operation of the system, including:

a) defining and quantifying 
i) the decision or control variables; 

ii) the parameters or uncontrollable variables; 

b) establishing functional relationships between 
variables and parameters; 

c) developing a mathematical function for the crite- 
rion (objective function to be minimized or max- 
imized). One must specify: 

i) the horizon period; 

ii) the units; 

iii) the measures used such as: cost or profit (e. g. 
average, total, discounted), reliability, avail- 
ability; 

d) setting up the constraints. 
5) Derivation of a solution to the problem. This typically consists in selecting the best from among the set of feasible alternative courses of action to meet the defined objective function and satisfy the constraints, through optimization techniques and tools using analytic or numerical procedures.

6) Testing and/or validation. This involves:

a) validation of data; 

b) validation of the model; 

c) testing the solution. The testing methodology uses:

i) retrospective testing;

ii) prospective testing; 

iii) simulation (accelerated testing); 

iv) others. 
7) Control and implementation of the solution. The
control methodology uses, for example:
a) sensitivity analysis;
b) statistical quality control.

The implementation methodology uses, for example:

a) statistical inference; 

b) sampling techniques. 
Let me say that the above definition of operations re- 
search and the outlined methodology were not exactly 
the ones proposed in the late 1950s. However, they may 
be considered close enough for all practical purposes. 
One may ask to what extent the definition and the
methodology have been altered over the years.
In my opinion, very little! In the foreword to [3], Aristotle
is quoted: “And the science which knows to what end
each thing must be done is the most authoritative of the
sciences, and more authoritative than any ancillary
science; and the end is the good of that thing, and in
general the supreme good in the whole of nature.”


See also 


> History of Optimization 
From the 1950s onwards, a strong relationship between
operations research (OR) and finance has developed,
resulting in a large and rapidly growing literature.
Although most applications have been of OR techniques
to finance, finance problems have also stimulated the 
development and refinement of OR techniques. 
Finance problems, and especially those relating to 
financial markets, are particularly well suited to analy- 
sis using OR techniques. These problems are generally 
separable and well defined, have a clear objective (of- 
ten to maximise profit or minimise risk), and have vari- 
ables which are quantified in monetary terms. The re- 
lationships between the variables in finance models are 
usually stable and well defined, so that the resulting OR 
model is a good representation of the problem. As there 
are few concerns about human behavior ruling out the 
implementation of some solutions, the solutions pro- 
duced by the analysis can usually be implemented. In 
addition, large amounts of data, both historic and real 
time, are readily available and can be used in OR mod- 
els. Some finance problems involve very large sums of 
money, so that even a very small improvement in the 
quality of the solution is profitable to implement. 
This review describes the application of OR to problems
in the analysis of financial markets (e.g., the markets
for debt, equity and foreign exchange, and the
corresponding derivatives markets). For a review of
the application of OR to other areas of finance, such as
the management of the firm’s finances (working capital
management, capital investment, multinational taxation,
and financial planning models such as those developed
for banks), see [3].


Portfolio Theory 


A seminal application of OR techniques to finance was
made by H.M. Markowitz [67,68], who specified the
portfolio problem in terms of optimization over the means
and variances of the assets available, and proposed the 
solution of this problem through quadratic program- 
ming, see [11] for a survey. In addition to specify- 
ing the portfolio problem in terms of OR techniques, 
Markowitz also developed solution algorithms for more 
general quadratic programming problems. This pro- 
vides an example of the interaction between OR tech- 
niques and finance, with the former sometimes being 
adapted to meet the needs of the latter. 
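To make the quadratic programming formulation concrete, the following Python sketch computes the minimum-variance portfolio that attains a given mean return. It assumes, for simplicity, that short sales are allowed, so the only constraints are the budget and target-return equalities and the optimality (KKT) conditions form a single linear system; Markowitz's own algorithms also handle inequality constraints such as no short selling. The function names and all numerical inputs are hypothetical.

```python
# Minimum-variance portfolio for a target mean return (Markowitz).
# Sketch: short sales allowed, so only equality constraints remain and the
# quadratic program reduces to one linear KKT system. Data are hypothetical.


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def min_variance_weights(mu, cov, target):
    """Minimise w'Cw subject to mu'w = target and sum(w) = 1."""
    n = len(mu)
    # Stationarity 2Cw + a*mu + b*1 = 0 (a, b multipliers), plus both constraints.
    kkt = [[2.0 * cov[i][j] for j in range(n)] + [mu[i], 1.0] for i in range(n)]
    kkt.append(list(mu) + [0.0, 0.0])
    kkt.append([1.0] * n + [0.0, 0.0])
    rhs = [0.0] * n + [target, 1.0]
    return solve_linear(kkt, rhs)[:n]  # drop the two Lagrange multipliers


def portfolio_variance(w, cov):
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    mu = [0.08, 0.12, 0.15]            # hypothetical expected returns
    cov = [[0.040, 0.006, 0.004],
           [0.006, 0.090, 0.012],
           [0.004, 0.012, 0.160]]      # hypothetical covariance matrix
    w = min_variance_weights(mu, cov, target=0.11)
    print("weights:", [round(x, 4) for x in w])
    print("variance:", round(portfolio_variance(w, cov), 6))
```

Adding nonnegativity constraints on the weights turns this into a general quadratic program, for which a dedicated QP algorithm would normally be used.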

Although the most obvious application of portfolio
theory is to the choice of equity portfolios (empirical
papers, e.g., [10,81], have used quadratic programming
to compute efficient equity portfolios), the technique
can be applied more widely (e.g., to portfolios of
currencies, bonds, or commercial loans). Multiperiod
portfolio problems have been specified as dynamic
programming problems [35], while [75] uses a stochastic
generalized network model.

OR researchers have also modified or replaced the 
quadratic programming approach to portfolio prob- 
lems, often by explicitly specifying the relevant utility 
function and using stochastic linear programming with 
recourse to model risk in a multiperiod framework. For
example, [15] proposes forming bond portfolios that
maximize expected value, using stochastic linear
programming to allow for interest rate risk. Early
references in this area are [50,59,109,110]. The scenar- 
ios included in portfolio models may be generated by 
Monte-Carlo simulation, prior to the use of stochas- 
tic programming to maximise expected utility, e. g., 
[37,97,102,103] and [105], where this approach is ap- 
plied to form portfolios of mortgage backed securities. 


The investment policy of a pension fund can be for- 
mulated using asset-liability management models that 
allow for the correlations between the values of the 
fund’s assets and liabilities. While these problems can 
be formulated using quadratic programming, they have 
usually been solved in other ways (see [113]). For
example, [74] assumes that the objective is to maximise
the expected value of a nonlinear utility of wealth
function, and specifies the problem as a nonlinear
network problem, with simulation of future pension
fund liabilities. Similar asset-liability problems are also
faced by insurance companies; for example, [19,20,21]
formulate this problem for a Japanese insurance
company. See also [50,59,76]. In [52] it is pointed out that
the use of Monte-Carlo simulation can bias the results 
by including arbitrage opportunities in the sampled 
scenarios. To avoid this, an arbitrage-free event tree is 
aggregated before its inclusion in a multistage stochas- 
tic programming model of the asset-liability problem. 

Another application of quadratic programming is 
generalized hedging, in which the objective is usually to 
minimise the variance of a portfolio of a given set of as- 
sets and the chosen hedging instruments. If the hedging 
instruments include options, this introduces a nonlin- 
earity into the hedging decision, and [77] devises a non- 
linear programming model to hedge foreign currency 
exposure using a mixture of currency forward and op- 
tions contracts. Similarly, quadratic programming has 
been used to construct index tracking portfolios, where 
the purpose is to select a portfolio of assets (e. g., eq- 
uities or bonds) which, when combined with a match- 
ing short position in the index to be tracked, has mini- 
mum risk [69,70,86,88]. Multistage stochastic program- 
ming with recourse, in conjunction with Monte-Carlo 
simulation to generate the scenarios, has been used 
in [97] and [105] to track an index of mortgage backed 
securities. 

A related problem is that of portfolio immunization,
in which the objective is to construct a portfolio of
interest rate dependent securities whose value is the same
as that of some target asset (usually another interest rate
dependent asset). By matching the duration of the
portfolio with that of the target asset, the portfolio is
immunized against small parallel shifts in the yield curve
(which shows the interest rates for different maturities)
[2,36,55,78]. These immunization studies use a risk measure
which does not involve squares or cross products of the
decision variables, and so linear programming, not
quadratic programming, is the solution technique. (There
is also a literature on managing the assets and liabilities
held by banks, which are taken to exclude equities; here
the objective is usually to maximise the value, or expected
value, of the portfolio over one or many time periods, net
of penalty costs from constraint target violations, subject
to restrictions on the total investment, maximum capital
loss and various bank regulations.)
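The duration-matching condition can be illustrated with a small worked example. The Python sketch below is an assumption-laden illustration rather than one of the linear programming formulations cited above: it assumes a flat yield curve, annual coupons, Macaulay duration, and a single zero-coupon liability as the target asset, and chooses value weights for two hypothetical bonds so that the portfolio's duration equals the liability's. Function names and data are hypothetical.

```python
# Duration-matching immunization with two bonds (illustrative sketch).
# Assumptions: flat yield curve, annual coupons, Macaulay duration, and a
# single zero-coupon liability as the target asset. Data are hypothetical.


def price_and_duration(face, coupon_rate, years, y):
    """Price and Macaulay duration of an annual-coupon bond at flat yield y."""
    price = 0.0
    time_weighted = 0.0
    for t in range(1, years + 1):
        cash = face * coupon_rate + (face if t == years else 0.0)
        pv = cash / (1.0 + y) ** t
        price += pv
        time_weighted += t * pv
    return price, time_weighted / price


def immunizing_weights(d_short, d_long, d_target):
    """Value weights (summing to 1) whose weighted duration equals d_target."""
    w_short = (d_long - d_target) / (d_long - d_short)
    return w_short, 1.0 - w_short


if __name__ == "__main__":
    y, liability, horizon = 0.05, 1_000_000.0, 6      # hypothetical target
    pv_liab = liability / (1.0 + y) ** horizon        # its duration is 6
    short_bond, long_bond = (100.0, 0.05, 2), (100.0, 0.06, 10)  # hypothetical
    p_s, d_s = price_and_duration(*short_bond, y)
    p_l, d_l = price_and_duration(*long_bond, y)
    w_s, w_l = immunizing_weights(d_s, d_l, horizon)
    n_s, n_l = w_s * pv_liab / p_s, w_l * pv_liab / p_l  # number of bonds held
    for shift in (-0.01, 0.0, 0.01):                 # parallel yield shifts
        yy = y + shift
        assets = (n_s * price_and_duration(*short_bond, yy)[0]
                  + n_l * price_and_duration(*long_bond, yy)[0])
        surplus = assets - liability / (1.0 + yy) ** horizon
        print(f"yield {yy:.2%}: surplus {surplus:,.2f}")
```

Duration matching only neutralizes the first-order effect of a small parallel shift: the surplus for the shifted yields stays close to zero but is not exactly zero, and non-parallel shifts are not hedged.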

In some applications of portfolio theory, the decision
variables must be integer. Quadratic integer programming
is used to compute hedging strategies involving futures
in [82] and [89]. A. Shapiro (1988) used
stochastic integer programming with recourse to construct
bond portfolios that allow for some of the bonds
being called (or redeemed early) if interest rates are low.

Some authors have argued that formulating and
solving quadratic programming portfolio problems is
too onerous, and have proposed simplified solution
techniques. W.F. Sharpe [91] proposed a single index model
which can be solved by the use of special purpose 
quadratic programming algorithms. When each asset 
represents only a small proportion of the portfolio, 
he [92] showed that his single index model can be 
treated as having a linear objective function. In 1971, 
he suggested using a piecewise linear approximation 
to the quadratic objective function, enabling the appli- 
cation of linear programming to solve portfolio prob- 
lems. Another proposal is to minimize the mean abso- 
lute deviation (MAD), which can be solved using lin- 
ear programming, rather than quadratic programming 
(e.g. [53,54,101,106], and [100]). Another approach is 
to specify the problem as choosing between a range of 
pre-specified equity portfolios using data envelopment 
analysis [84]. A further approach is to reformulate the 
portfolio problem as a nonlinear generalized network 
model for which efficient solution algorithms exist [73]. 
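The appeal of the MAD measure is that it becomes linear once the absolute values are replaced by auxiliary variables. The Python sketch below only evaluates the measure (alongside the standard deviation, for comparison) for a portfolio over equally likely scenarios; a full MAD model would pass the linearized objective, together with the budget and return constraints, to an LP solver. The scenario data and function names are hypothetical.

```python
# Mean absolute deviation (MAD) of a portfolio over equally likely scenarios.
# In the LP version each |d_t| is replaced by an auxiliary variable y_t with
# y_t >= d_t and y_t >= -d_t, and the objective (1/T) * sum(y_t) is linear in
# the portfolio weights. Scenario data below are hypothetical.


def scenario_deviations(weights, scenarios):
    """Deviation of the portfolio return from its mean, one value per scenario."""
    n_assets = len(weights)
    t_count = len(scenarios)
    means = [sum(s[i] for s in scenarios) / t_count for i in range(n_assets)]
    return [sum((s[i] - means[i]) * weights[i] for i in range(n_assets))
            for s in scenarios]


def mean_absolute_deviation(weights, scenarios):
    devs = scenario_deviations(weights, scenarios)
    return sum(abs(d) for d in devs) / len(devs)


def standard_deviation(weights, scenarios):
    devs = scenario_deviations(weights, scenarios)
    return (sum(d * d for d in devs) / len(devs)) ** 0.5


if __name__ == "__main__":
    scenarios = [[0.02, 0.01], [-0.01, 0.03], [0.04, -0.02], [0.03, 0.02]]
    w = [0.5, 0.5]
    print("MAD:", mean_absolute_deviation(w, scenarios))
    print("std:", standard_deviation(w, scenarios))
```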

Portfolio problems, with the twin objectives of max- 
imising returns and minimising risk, can also be viewed 
as goal programming problems with two goals. Addi- 
tional goals can be introduced, and a number of authors 
have solved portfolio problems using goal programming,
among them [57,58] and [61]. Capital growth theory (see
[40] for a survey) has been used in a number of
applications to determine the optimal allocation of
repeated investments over time; see, e.g., [64,65,66,111].
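Capital growth theory chooses investments to maximise the expected logarithm of wealth (the Kelly criterion), which maximises the long-run growth rate of capital. The Python sketch below applies it to the simplest hypothetical case, a repeated bet with a fixed win probability and fixed odds, and checks the closed-form fraction f* = (bp - q)/b, with q = 1 - p, against a grid search. Function names and numbers are illustrative.

```python
import math

# Capital growth (Kelly) criterion for a repeated binary bet (sketch).
# Assumed setting: win probability p, net odds b (win b per unit staked),
# and a fixed fraction f of current wealth staked on every bet.


def growth_rate(f, p, b):
    """Expected log growth of wealth per bet when staking fraction f."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)


def kelly_fraction(p, b):
    """Fraction maximising growth_rate; zero if the bet is unfavourable."""
    return max(0.0, (b * p - (1.0 - p)) / b)


if __name__ == "__main__":
    p, b = 0.55, 1.0  # hypothetical: 55% chance of winning an even-money bet
    f_star = kelly_fraction(p, b)
    grid = [i / 1000 for i in range(1000)]  # f in [0, 0.999]
    f_grid = max(grid, key=lambda f: growth_rate(f, p, b))
    print(f"Kelly fraction {f_star:.3f}, grid optimum {f_grid:.3f}, "
          f"growth {growth_rate(f_star, p, b):.5f} per bet")
```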


Pricing Derivatives 


It is very important when trading in financial markets 
to have a good model for valuing the asset being traded, 
and OR techniques have made a substantial contribu- 
tion in this area. Indeed, the very rapid growth of these 
markets is partly due to the application of OR tech- 
niques in pricing models. 

P.P. Boyle [12] proposed the use of Monte-Carlo 
simulation as an alternative to the binomial model for 
pricing options for which a closed form solution is not 
readily available. Monte-Carlo simulation has the ad- 
vantage over the binomial model that its convergence 
rate is independent of the number of state variables 
(e. g., the number of underlying asset prices and interest 
rates), while that of the binomial model is exponential 
in the number of state variables. Simulation is used to
generate paths for the price of the underlying asset until
maturity under the risk neutral probabilities. The cash
flows from the option along each path are discounted
back to the present at the risk free rate, and the average
present value across all the sample paths yields the
current price of the option [14]. Risk neutral
probabilities are those inferred from prices under the
assumption that investors have linear utility functions.
Monte-Carlo simulation can also be used
to compute various option price sensitivities, which in- 
clude the hedge ratio. These sensitivities, or ‘Greeks’, 
are essential for many trading strategies [17]. 
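The Monte-Carlo procedure described above can be sketched in a few lines of Python for the simplest case, a European call on a non-dividend-paying stock. The sketch assumes geometric Brownian motion under the risk neutral measure with constant rate and volatility (assumptions not stated in the text), and also estimates the hedge ratio (delta) by the pathwise method; the Black-Scholes formula is included only as a benchmark, since a closed form exists for this special case. Parameter values and function names are hypothetical.

```python
import math
import random

# Monte-Carlo pricing of a European call (sketch in the spirit of Boyle's
# proposal). Assumptions: geometric Brownian motion under the risk neutral
# measure, constant risk free rate r and volatility sigma. A European payoff
# depends only on the terminal price, so each "path" is a single step here;
# path-dependent options would need the full simulated path.


def mc_european_call(s0, k, r, sigma, t, n_paths, seed=1):
    """Return (price, pathwise delta) estimates."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    payoff_sum = delta_sum = 0.0
    for _ in range(n_paths):
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        if st > k:
            payoff_sum += st - k
            delta_sum += st / s0  # derivative of the payoff w.r.t. s0 on this path
    return disc * payoff_sum / n_paths, disc * delta_sum / n_paths


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form benchmark: (price, delta)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2), norm_cdf(d1)


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)  # hypothetical s0, k, r, sigma, t
    print("Monte-Carlo:", mc_european_call(*args, n_paths=200_000))
    print("Black-Scholes:", black_scholes_call(*args))
```

The standard error of the price estimate falls only with the square root of the number of paths, which is why variance reduction techniques are commonly added in practice.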

Until recently it was thought that Monte-Carlo simulation
could not be used to price American style options, which
can be exercised at any time before the option expires,
because the optimal exercise decision must be determined
along with the price and no closed form solutions for
their price exist. This is a major problem, as the majority
of options are American style. However, progress is being
made in developing Monte-Carlo simulation techniques
for pricing American style options [18,39]. Options have
also been priced using finite difference approximations,
and it has been proposed [27,28] to use linear
programming to solve the finite difference approximations
to the price of American style put options. In addition,
American style options can be priced using dynamic
programming [29].
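As an illustration of dynamic programming applied to American options, the following Python sketch uses a Cox-Ross-Rubinstein binomial tree (our choice of lattice, not necessarily that of [29]) and backward induction: at each node the option is worth the larger of its immediate exercise value and its discounted risk neutral expected value one step later. Constant rate and volatility, no dividends, the function name, and all parameter values are assumptions.

```python
import math

# American put priced by dynamic programming (backward induction) on a
# Cox-Ross-Rubinstein binomial tree. Assumptions: constant r and sigma,
# no dividends. At each node the value is the larger of immediate exercise
# and the discounted risk neutral expectation of next-period values.


def binomial_put(s0, k, r, sigma, t, steps, american=True):
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)  # risk neutral up-move probability
    # Terminal payoffs, indexed by the number of up moves j.
    values = [max(k - s0 * u ** j * d ** (steps - j), 0.0) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                cont = max(cont, k - s0 * u ** j * d ** (n - j))
            values[j] = cont  # in place: values[j + 1] is still from step n + 1
    return values[0]


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)  # hypothetical s0, k, r, sigma, t
    print("American put:", round(binomial_put(*args, steps=500), 4))
    print("European put:", round(binomial_put(*args, steps=500, american=False), 4))
```

Setting american=False gives the European put on the same tree, which converges to the Black-Scholes value as the number of steps grows; the difference between the two outputs is the early exercise premium.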

Provided a price history is available, a neural net- 
work can be trained to produce prices using a specified 
set of inputs, which can then be used for the out-of- 
sample pricing [49] of securities. 
Mortgage backed securities (MBS) are created by
the securitization of a pool of mortgages. For any
specific mortgage, the borrower has the right to repay the
loan early (the prepayment option), or may default on
the payments of capital and interest. Thus MBS are hy- 
brid securities, as they are variable interest rate securi- 
ties with an early exercise option. Monte-Carlo simula- 
tion can be used to generate interest rate paths for fu- 
ture years. Forecasts of the mortgage prepayment rates 
then permit the computation of the cash flows from 
each interest rate path, and these sequences of cash 
flows are used to value the MBS [6,13,104]. This pro- 
cedure, which can be used to identify mispriced MBS 
in real time, is computationally demanding and paral- 
lel (and massively parallel) and distributed processing 
have been used in the solution of the problem. Simu- 
lation has also been used to price collateralized mort- 
gage obligations or CMOs [80]. Other hybrid securi- 
ties, such as callable and putable bonds and convertible 
bonds face similar valuation problems to MBS and re- 
quire similarly intensive solution methods. 

There is an active secondary market in loan portfo- 
lios which may carry a significant default risk. In [26] 
a Markov chain analysis with 14 loan performance 
states is used and Monte-Carlo simulation is performed 
to generate the probability distribution of the present 
value of loan portfolios. 


Trading Tactics 


Besides pricing financial securities, traders are inter- 
ested in finding imperfections in financial markets 
which can be exploited to make profits. See [51] for 
a survey of equity anomalies. One aspect of this is 
the search for weak form inefficiency (i.e. that an as- 
set’s past prices can be used as the basis of a prof- 
itable trading rule). An early attempt to find such ex- 
ploitable regularities in stock prices is the use of Markov 
chains [30,31]. Such strategies have been found in 
horserace betting markets, see [42,43,44,45], and the 
survey book [41]. 

Arbitrageurs seek to exploit small price discrepan- 
cies to give riskless profits. Network models have been 
used to find arbitrage opportunities between sets of 
currencies [22,55,73,75]. This problem can be specified
as a maximal flow network, where the aim is to
maximise the flow of funds out of the network, or as
a shortest path network. A risk arbitrage (convergence
type) hedge fund trade exploiting discrepancies between
various markets for Nikkei puts is described in [94].
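The shortest path formulation can be sketched as follows. Taking edge weights equal to minus the logarithm of each exchange rate, a sequence of conversions that returns more currency than it started with corresponds to a negative-weight cycle, which the Bellman-Ford algorithm detects. This Python sketch ignores transaction costs and bid-ask spreads; the function name and the rates in the tests are hypothetical.

```python
import math

# Currency arbitrage as a shortest path problem (sketch). With edge weights
# -log(rate), a cycle of trades multiplies wealth by more than 1 exactly when
# its total weight is negative, so Bellman-Ford negative-cycle detection
# finds arbitrage. Transaction costs and bid-ask spreads are ignored.


def find_arbitrage(currencies, rates, eps=1e-12):
    """rates[(a, b)] = units of b per unit of a. Returns a cycle or None."""
    n = len(currencies)
    idx = {c: i for i, c in enumerate(currencies)}
    edges = [(idx[a], idx[b], -math.log(x)) for (a, b), x in rates.items()]
    dist = [0.0] * n   # as if a virtual source reached every node at cost 0
    pred = [-1] * n
    last = -1
    for _ in range(n):
        last = -1
        for u, v, w in edges:
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
                pred[v] = u
                last = v
    if last == -1:
        return None    # no improvement in the final pass: no arbitrage
    for _ in range(n):  # walk back far enough to land inside the cycle
        last = pred[last]
    cycle, node = [last], pred[last]
    while node != last:
        cycle.append(node)
        node = pred[node]
    cycle.append(last)
    return [currencies[i] for i in reversed(cycle)]  # in trading order
```

With consistent cross rates the function returns None; if, say, converting USD to EUR to GBP and back yields more than one dollar, it returns that cycle of currencies.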

There has been a growing interest in using artifi- 
cial intelligence based techniques (expert systems, neu- 
ral networks, genetic algorithms, fuzzy logic and induc- 
tive learning) to develop trading strategies for financial 
markets (e. g., [38,85,96,99]). Such approaches have the 
advantage that they can pick up nonlinear dynamics, 
and require little prior specification of the relationships 
involved. 


Funding Decisions 


OR techniques have also been used to help firms de- 
termine the most appropriate method by which to raise 
capital from the financial markets. In [16] a chance con- 
strained linear programming model is put forward to 
compute the values of the debt-equity ratio each period 
that maximize the value of the firm. Other studies have 
specified the choice between various types of funding as 
a linear goal programming problem [48,60]. 

A different approach to the debt problem is to as- 
sume that the firm has found its desired debt-equity ra- 
tio, and is purely concerned with raising the requisite 
debt as cheaply as possible. In this case, debt can be 
treated like any other input to the productive process, 
and inventory models used to determine the optimal 
‘reorder’ times and quantities [9,62]. 
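The inventory analogy can be made explicit with the classic economic order quantity (EOQ) formula. In the Python sketch below, the interpretation of the cost terms is an illustrative assumption rather than the formulation of [9,62]: funds are consumed at a steady rate D per year, each debt issue incurs a fixed flotation cost F, and raising funds ahead of need costs h per unit per year in net interest on the idle balance. The optimal issue size is then Q* = sqrt(2DF/h). Function names and figures are hypothetical.

```python
import math

# Treating debt like an inventoried input (classic EOQ logic; the mapping of
# cost terms is an illustrative assumption, not taken from [9,62]).
# d: funds needed per year, f: fixed cost per issue, h: net carrying cost
# per unit of idle funds per year.


def annual_cost(q, d, f, h):
    # issuing cost (d / q issues, each costing f) + carrying cost on average idle balance q / 2
    return f * d / q + h * q / 2.0


def economic_issue_size(d, f, h):
    """Return (issue size, issues per year, total annual cost) at the optimum."""
    q = math.sqrt(2.0 * d * f / h)
    return q, d / q, annual_cost(q, d, f, h)


if __name__ == "__main__":
    # hypothetical: 10m needed per year, 50k per issue, 2% net carrying cost
    q, n_issues, cost = economic_issue_size(10_000_000.0, 50_000.0, 0.02)
    print(f"issue size {q:,.0f}, {n_issues:.2f} issues per year, cost {cost:,.0f}")
```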

The design of callable bonds has been addressed 
in [23,24], using nonlinear programming, while [47] 
uses a simulated annealing algorithm. Firms which 
have issued callable debt face the bond-scheduling 
problem, in which they must decide when to call (re- 
pay) the existing debt and refinance it with a new issue, 
presumably at a lower cost. This dynamic programming 
problem has been modeled in [35,56] and [98]. 

Finally, the problem facing borrowers of choosing 
between alternative mortgage contracts (e. g., fixed rate, 
variable rate and adjustable rate mortgages) has been 
modeled using decision trees [46,63]. 


Strategic Problems 


In recent years, some of the decisions facing traders 
and market makers in financial markets have been anal- 
ysed using game theory [32,79]. Traders in stock mar- 
kets seek to trade at the most attractive prices and large 
trades are often broken up into a sequence of smaller
trades in an effort to minimise the price impact. This 
can be viewed as a strategic problem, and [8] uses 
stochastic dynamic programming to compute an opti- 
mal trading strategy. 
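A minimal dynamic programming sketch of the order-splitting problem follows. It assumes (as a simplification, not necessarily the model of [8]) that every trade moves the price permanently in proportion to its size, plus zero-mean noise. Under this assumption the expected cost separates into a term fixed by the initial price and an impact term that the recursion minimizes over whole-share trades. Order size, impact coefficient, price, and function names are hypothetical.

```python
from functools import lru_cache

# Splitting a large purchase over T periods by dynamic programming (sketch).
# Assumed price dynamics (a simplification, not necessarily the model of
# [8]): P_t = P_{t-1} + theta * S_t + noise, with zero-mean noise. The
# expected cost of buying W shares from starting price p is then
# p * W + theta * G_T(W), where
#   G_1(w) = w**2  and  G_k(w) = min over s of [s * w + G_{k-1}(w - s)].


def optimal_schedule(total_shares, periods):
    """Return (shares traded in each period, G_T(W)) for whole-share trades."""

    @lru_cache(maxsize=None)
    def g(w, k):
        if k == 1:
            return w * w, w          # last period: buy everything left
        best = None
        for s in range(w + 1):
            value = s * w + g(w - s, k - 1)[0]
            if best is None or value < best[0]:
                best = (value, s)
        return best

    schedule, w = [], total_shares
    for k in range(periods, 0, -1):
        s = g(w, k)[1]
        schedule.append(s)
        w -= s
    return schedule, g(total_shares, periods)[0]


if __name__ == "__main__":
    sched, g_value = optimal_schedule(10, 5)   # hypothetical order size
    theta, p0 = 0.01, 50.0                     # hypothetical impact and price
    print(sched, "expected cost:", p0 * 10 + theta * g_value)
```

With purely linear permanent impact the recursion returns an equal split of the order across the periods; richer price dynamics would generally change the optimal schedule.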

In [83] game theory is applied to the situation where 
a company has two major shareholders, and a large 
number of very small shareholders. This can be mod- 
eled as an oceanic game, in which the two large players 
behave strategically while the many small shareholders 
(the ocean) do not. This approach can be used to de- 
rive the highest price a large shareholder will pay in the 
market for corporate control. 


Regulatory and Legal Problems 


Financial regulators have become increasingly con- 
cerned about financial markets with their very large and 
rapid international financial flows. OR techniques have 
proved useful in regulating the capital reserves held by 
banks and other financial institutions to cover their risk 
exposure. OR techniques have also been used to ensure 
compliance with various legal requirements by design- 
ing appropriate strategies, and to solve other legal prob- 
lems relating to financial markets. 

A key regulatory issue is determining the capital re- 
quired by financial institutions to underpin their ac- 
tivities in financial markets. An increasingly popular 
approach to this problem is the value at risk (VAR), 
which involves quantification of the lower tail of the 
probability distribution of outcomes from the firm’s 
portfolio. Portfolios usually include options (or
financial securities with option-like characteristics), and
these have highly asymmetric payoffs. For such
securities, analytical solutions for the probabilities
in the lower tail of the payoff distribution are
unreliable. RiskMetrics™ uses approximations based on
‘the Greeks’ for options that are at or near the money
(i.e., the current price of the underlying asset is close
to the price at which the option can be exercised),
and Monte-Carlo simulation for other options
positions [72]. (A related application of Monte-Carlo
simulation is stress testing, which quantifies the sensitivity
of a portfolio to specified, often adverse, market
scenarios.) Some securities are also subject to credit risk,
which has a highly nonnormal distribution for all
instruments. Therefore, Monte-Carlo simulation is
relevant to modeling the credit risk of portfolios of
financial instruments (e.g., loans, letters of credit, bonds,
trade credit, swaps, forwards), as in CreditMetrics™
[71]. Y. Zhao and Ziemba [107,108] discuss how to
model downside risk in continuous and discrete time
models.
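A Monte-Carlo VAR calculation with full revaluation can be sketched as follows for a hypothetical covered call position (long stock, short calls). The Python sketch assumes that the stock follows geometric Brownian motion over the horizon and that the options are revalued with the Black-Scholes formula; it is a generic illustration, not the RiskMetrics methodology. The VAR is read off as the 99% quantile of the simulated losses. Function names, positions, and parameters are hypothetical.

```python
import math
import random

# Monte-Carlo value at risk with full revaluation of an option position
# (sketch). Assumptions: the stock follows geometric Brownian motion over the
# horizon; options are revalued with the Black-Scholes formula; all position
# sizes and market parameters are hypothetical.


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, t):
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d1 - sigma * math.sqrt(t))


def mc_var(value, s0, mu, sigma, horizon, alpha=0.99, n=20_000, seed=7):
    """alpha-quantile of the loss value(s0, 0) - value(S_h, horizon)."""
    rng = random.Random(seed)
    v0 = value(s0, 0.0)
    losses = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        s_h = s0 * math.exp((mu - 0.5 * sigma ** 2) * horizon
                            + sigma * math.sqrt(horizon) * z)
        losses.append(v0 - value(s_h, horizon))  # full revaluation
    losses.sort()
    return losses[min(n - 1, math.ceil(alpha * n) - 1)]


if __name__ == "__main__":
    K, R, SIG, T = 100.0, 0.05, 0.2, 0.5   # hypothetical option terms

    def stock_only(s, elapsed):           # 100 shares
        return 100.0 * s

    def covered_call(s, elapsed):         # 100 shares, short 100 calls
        return 100.0 * s - 100.0 * bs_call(s, K, R, SIG, T - elapsed)

    h = 10 / 252                          # ten trading days
    print("stock VaR:", round(mc_var(stock_only, 100.0, 0.08, SIG, h), 2))
    print("covered call VaR:", round(mc_var(covered_call, 100.0, 0.08, SIG, h), 2))
```

Because the short calls cushion losses when the stock falls, the covered call's VAR is smaller than that of the stock alone; a delta approximation would miss the curvature of the option payoff that full revaluation captures.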

Data envelopment analysis has been used to assist 
in bank regulation by measuring bank efficiency, which 
is then used to predict bank failure [4,5]. 

Traders are required to put up margin when they
trade options; see [87] for a linear programming model
in which this problem is modeled as a transportation
problem.

An extensive set of rules governs the way in which 
a ‘to-be-announced’ MBS can be structured, leading 
to a complex problem in devising a feasible solution.
This can be specified as a complicated integer program- 
ming problem (with the objective of maximising the 
originator’s profit). Collateralized mortgage obligations 
(CMOs) also involve the securitization of a mortgage 
pool, but in this case the pool is structured into a series 
of bonds (or tranches), each with a different maturity 
and risks. See [25] for a complex zero-one program- 
ming model for solving this problem, with the objective 
of maximizing the proceeds from the issue. 

In [90] a linear programming formulation is pro- 
posed to establish the maximum loss that investors 
could have sustained from trading in a company’s 
shares. This figure can then be used by the company’s 
lawyers in lawsuits claiming damages from a mislead- 
ing statement by the company. 

In August 1982, the Kuwait Stock Market collapsed,
leaving $94 billion of debt to be resolved. This led to the
problem of devising a fair method for distributing the
assets seized from insolvent brokers among the other
brokers and private investors. This problem was solved
using linear programming, which reduced the total
unresolved debt to $20 billion, saving an estimated $10.34
billion in lawyers’ fees [33,34,95].


Economic Understanding 


OR can help in trying to understand the economic 
forces shaping the finance sector. Using a linear
programming model of a bank, [7] employs annual data
to compute movements in the shadow prices of the
various constraints, and suggests that a rise in the
shadow price of the deposits constraint led to the
financial innovation of negotiable CDs.

Arbitrage pricing theory (APT) seeks to identify the 
factors which affect asset returns. Most tests of the APT 
use factor analysis, and have difficulty in determining 
the number and definition of the factors that influence 
asset returns. To overcome these problems [1] suggests 
using a neural network, which also has the advantage 
that the results are distribution free. 


Conclusions 


Mathematical programming is the OR technique that 
has been most widely applied in financial markets. 
Most types of mathematical programming have been
employed: linear, quadratic, nonlinear, integer, goal,
chance constrained, stochastic, fractional, DEA and
dynamic. Monte-Carlo simulation is also widely used in
financial markets, mainly to value exotic options and
securities with embedded options, and to estimate the 
VAR for various financial institutions. In some cases 
the use of OR techniques has influenced the way finan- 
cial markets function since they permit traders to make 
better decisions in less time. For example, exotic op- 
tions would trade with much wider bid-ask spreads, if 
they traded at all, in the absence of the accurate prices 
computed using Monte-Carlo simulation. 

Other OR techniques are less used in financial 
markets. Arbitrage and multiperiod portfolio problems 
have been formulated as network models, while mar- 
ket efficiency has been tested using neural networks. 
Game theory has been applied to battles for corporate 
control, decision trees to analyse mortgage choice, in- 
ventory models to set the size and timing of corporate 
bond issues, and Markov chains to value loan
portfolios and test market efficiency. One important OR
technique, queueing theory, has found little
application in financial markets.

This review has shown that OR techniques have 
been usefully applied to portfolio problems and the ac- 
curate pricing of complex financial instruments. They 
are also used by financial regulators and financial insti- 
tutions in setting capital adequacy standards. It is clear 
from this that OR techniques play an important role in 
financial markets and that this role is likely to increase 
over time. 
A supply chain is ‘a connected series of activities which
is concerned with planning, coordinating and control- 
ling materials, parts, and finished goods from sup- 
plier to customer. It is concerned with two distinct 
flows (material and information) through the organiza- 
tion’, [67]. Supply chain management (SCM) is the co- 
ordination and integration of these activities with the 
goal of achieving a sustainable competitive advantage. 
SCM therefore encompasses a wide range of strategic, 
financial, and operational issues. 

With the emergence of supply chain management 
as a new discipline, the role of operations research (OR) 
models in effective SCM has become significant. From 
1985 onwards, the performance, design, and analysis of
the supply chain have received increased attention.
Optimizing the performance of an entire supply chain is one
of the most comprehensive strategic problems facing 
managers today. See [63] for 23 different kinds of OR 
models for logistics activities in supply chain manage- 
ment and strong arguments that OR/MS techniques are 
essential for supporting the redesign of logistics
processes. The authors of [63] suggest, however, that these
various models must be coordinated effectively to properly model
a multistage supply chain. 

This entry considers past literature encompassing 
both strategic and operational issues in supply chain 
management and design. Our goal is to highlight many 
of the recent (and some not so recent, as of 2000) 
papers that have led to the current increased level of 
activity in supply chain research. Due to the volume
of work done from 1985 onwards, we limit ourselves
to a discussion of papers that use OR modeling
techniques in both the overall design of the supply
chain and the design of operations coordination
and control systems within the supply chain. We do 
not consider work done in the growing areas of sup- 
ply chain negotiation and contracts (except where such 
contracts may be driven by production and transporta- 
tion system models) or in the value of information in 
the supply chain; for a detailed discussion on these 
and many other developments in the field of supply 
chain management, see [68], which provides a com- 
prehensive view of the field of quantitative modeling 
applications in supply chain management. Other re- 
cent reviews of the literature in supply chain mod- 
eling include [72] and [9]. Our work complements 
these reviews by including the evolution of models for
combined location-routing and inventory-routing decisions.

The strategic design of a supply chain requires man- 
agers to determine: 

1) which suppliers and vendors to select for supplying 
raw materials; 

2) the number, location, and capacity of manufactur- 
ing plants and warehouses; 

3) specific transportation channels and modes for ma- 
terial movement between facilities; 

4) raw material and end-item production amounts and 
control mechanisms for flows between suppliers, 
plants, warehouses, and customers; and 

5) strategies for managing raw material, intermediate 
product, and finished goods inventory at each of the 
various locations. 

Operational decisions in the supply chain, which 
are often constrained by choices made in the strate- 
gic design phase, include deciding short-term distribu- 
tion and logistics flows between locations and produc- 
tion planning and inventory control policies at each lo- 
cation. Such decisions include determining the amount 
and timing of material flows from suppliers to plants, 
through plants and warehouses, and to retailers and 
customers. At the more detailed level, production and 
logistics operations planners must create detailed pro- 
duction sequencing and product delivery schedules. 

We classify the literature we review into three broad 
categories: 

• strategic supply chain design models;

• production and logistics coordination and control
models; and

• supply chain simulation models.

The next three Sections consider models in each of 
these three areas. Note that within each section we have 
attempted to present past papers in chronological order 
of publication with a few exceptions to maintain conti- 
nuity of presentation. 


Strategic Design Models 


This Section summarizes the evolution of OR models in 
the broad area of supply chain design. We first consider 
approaches that decide important long-term strategic 
factors such as location of facilities and assignment of 
products and/or customers to these facilities. Note that 
these models attempt to determine the supply chain
design that provides the best combination of service and
cost performance from an overall system view. That is, 
in these approaches, a central planner determines the 
best overall system configuration. We then take a look 
at models that combine location and transportation de- 
cisions in the subsection on location-routing models. 


Distribution System Design 


A.M. Geoffrion and G.W. Graves [36] present the first 
comprehensive supply chain design model for Hunt- 
Wesson foods. They develop a Benders decomposition- 
based algorithm that successfully solves a multicom- 
modity production and distribution system design 
problem. This system contains several plants (with 
known and fixed capacities), distribution centers with 
limits on throughput, and retailer zones (each of which 
must be sourced to a single distribution center). The 
model finds the optimal distribution center configu- 
ration by considering fixed plus variable costs for op- 
erating warehouses as well as detailed production and 
transportation costs. 
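To make the structure of such a design model concrete, the Python sketch below solves a tiny, hypothetical instance by brute-force enumeration of every set of open distribution centers. It is not the Benders decomposition of [36]: plant and throughput capacities and multiple commodities are ignored, and enumeration is only practical for a handful of candidate sites. As in [36], each customer zone is single-sourced from one distribution center, and the objective combines fixed opening costs with per-unit production and transportation costs. The function name and all data are hypothetical.

```python
from itertools import combinations

# Distribution-center configuration by brute-force enumeration (sketch).
# NOT the Benders decomposition of [36]: capacities and multiple commodities
# are ignored, and every set of open centers is simply tried. Each customer
# zone is single-sourced from its cheapest open center. Data are hypothetical.


def best_configuration(fixed_cost, unit_cost, demand):
    """fixed_cost[j]: cost of opening center j.
    unit_cost[j][z]: production plus transport cost per unit to serve zone z via j.
    Returns (total cost, open centers, center serving each zone)."""
    zones = range(len(demand))
    best = (float("inf"), (), [])
    for size in range(1, len(fixed_cost) + 1):
        for open_set in combinations(range(len(fixed_cost)), size):
            assign = [min(open_set, key=lambda j: unit_cost[j][z]) for z in zones]
            cost = sum(fixed_cost[j] for j in open_set)
            cost += sum(demand[z] * unit_cost[assign[z]][z] for z in zones)
            if cost < best[0]:
                best = (cost, open_set, assign)
    return best


if __name__ == "__main__":
    fixed = [100, 120, 90]                               # hypothetical
    unit = [[2, 4, 5, 6], [5, 3, 2, 4], [6, 5, 4, 1]]    # hypothetical
    demand = [30, 20, 25, 40]                            # hypothetical
    print(best_configuration(fixed, unit, demand))
```

In this instance opening centers 0 and 2 balances the fixed costs against the per-unit costs better than opening one center or all three; realistic instances with capacities require the decomposition methods described in the text.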

J.E. Hodder and M.C. Dincer [43] describe an inter- 
national plant location model and develop a large scale 
nonlinear programming model. The objective function 
captures the difference between the expected value of 
profit and the variance of profit, which is scaled by a risk 
aversion factor. The model incorporates constraints on 
plant capacities, market demands, and financial invest- 
ment limits. Their formulation considers the impacts 
of market prices, international interest rates, exchange 
rate fluctuations, production and transportation costs, 
import tariffs, and export taxes. 

M.A. Cohen and H.L. Lee [22] present a single- 
period, multicommodity, nonlinear programming 
model to develop a global resource deployment pol- 
icy. This paper provides a good initial effort to create 
a fully integrated logistics chain model by linking a set 
of distinct stochastic submodels. The objective function 
maximizes the total after-tax profit for the manufactur- 
ing facilities and distribution centers in all countries. 
The submodels incorporate plant production capacity 
limits, plant material requirements constraints, inven- 
tory balance constraints at both the plants and dis- 
tribution centers, demand and supply capacity limits, 
and offset trade requirements. The model decides the 
assignment of products and subassemblies to plants,
vendors to distribution centers, and distribution
centers to market regions. It also determines the amounts
of components, subassemblies, and final products to 
produce at each plant and how to ship these items 
between vendors, manufacturing facilities, and distri- 
bution centers. 

J.H. Bookbinder and K.E. Reece [10] extend the
Geoffrion-Graves [36] model to consider multicommodity
distribution system design with vehicle routing and
transportation fleet-sizing decisions. They combined
the Geoffrion-Graves [36] Benders decomposition
approach with the vehicle routing approach of M.L. Fisher
and R. Jaikumar [33]. The master problem in their al- 
gorithm determines the number and locations of ware- 
houses. The subproblems then determine the best set of 
vehicle routes (including the number and sizes of vehi- 
cles used by location), given the warehouse configura- 
tion specified by the master problem. 

Cohen and Lee [23] built on their prior work by cre- 
ating a deterministic model for designing a large scale 
distribution network. This model also incorporates
offset trade requirements and estimates before- and
after-tax profits. Although they provide a comprehensive for-
mulation of the global supply chain design problem, no 
detailed solution procedure for determining the opti- 
mal configuration is provided. 

Cohen and S. Moon [24] formulate a mixed in- 
teger, multicommodity model that determines the as- 
signment of product lines and production volumes to 
plants. For each plant they determine inbound raw ma- 
terial flows and outbound finished product flows. This 
research provides a restricted optimization algorithm to 
solve production-distribution problems with piecewise 
linear concave production costs. The objective func- 
tion consists of fixed and variable production and trans- 
portation costs. The model includes supply, capacity, 
assignment, demand, and raw material requirements 
constraints. A variant of the Benders decomposition 
technique is applied to solve each problem instance. 

Cohen and P.R. Kleindorfer [21] present a norma- 
tive model for global manufacturing operations that di- 
rects plant location and capacity decisions as well as 
product mix, material flow, and cash flow amounts. 
The model consists of several submodels (a stochastic 
supply chain network model, a financial flow model, 
a stochastic exchange rate model, and a price-demand 
model). The submodels link a multiperiod stochastic 
master problem to a set of single-period stochastic
subproblems. See [23] for a description of an
implementation of this model framework.

B.C. Arntzen et al. [3] present one of the most com- 
prehensive supply chain design models to date, which 
they use to redesign Digital Equipment Corporation’s 
(DEC’s) supply chain. They develop a multiperiod, 
multicommodity, mixed integer programming model 
called GSCM (Global Supply Chain Model) to optimize 
the configuration of DEC’s global supply chain. The 
model accommodates multiple facilities, stages (eche- 
lons), time periods, and transportation modes. GSCM 
minimizes a composite function of activity days and the 
total cost of production, inventory, material handling, 
overhead, and transportation. The constraints enforce 
requirements on meeting customer demands, produc- 
tion and throughput capacity limits at each facility, and 
bounds on decision variables. GSCM encodes the global 
bill-of-materials (BOM) for each product and enforces 
inventory balance constraints for every product and lo- 
cation. Their formulation reflects offset trade and local 
content restrictions as well as duty payment and draw- 
back for flows through various countries. The model 
decides the number and location of distribution cen- 
ters, the customer-to-distribution center assignments, 
and product-to-plant assignments. Arntzen et al. [3] 
describe how DEC used GSCM to evaluate global sup- 
ply chain alternatives and develop worldwide manufac- 
turing and distribution strategies. DEC used GSCM to 
guide a worldwide restructuring that saved the com- 
pany over $100 million. 

J.D. Camm et al. [15] develop an integer program- 
ming model using an uncapacitated facility location 
formulation for Procter & Gamble’s distribution net- 
work. The model minimizes the total cost of distribu- 
tion center location selection and distribution center- 
to-customer assignments, subject to assignment con- 
straints and a maximum number of distribution cen- 
ters in operation. This model chooses the best location 
and scale of operation for producing items in Procter & 
Gamble’s product line. 

H. Pirkul and V. Jayaraman [58] propose a mixed 
integer programming model for designing a three-level 
distribution network. The model defines the physical 
flows of commodities from plants to warehouses and 
from warehouses to customer zones. Their method
determines the locations of plants and warehouses that
result in the minimum total operating plus fixed costs
for the distribution network. They [58] develop a La- 
grangian relaxation-based heuristic that assigns cus- 
tomers to open warehouses based on their demand for 
products, and then assigns open warehouses to open 
plants. 

Recent approaches (as of 2000) for dynamic distri- 
bution system design, in which customer demands vary 
over a finite planning horizon, have been developed 
in [60,61] and [34]. These papers consider a set of ca- 
pacitated plants with associated (uncapacitated) ware- 
houses that must distribute products to a set of cus- 
tomers in each period. Each of these models requires 
assigning every customer to a single source, or ware- 
house, and hence employs heuristics for generalized as- 
signment formulations of the problem. H.E. Romeijn 
and D. Romero Morales [60,61] show the asymptotic 
optimality of specific greedy heuristic procedures for 
the cyclic (repeating demand patterns) and acyclic de- 
mand cases. R. Freling et al. [34] apply a branch and 
price algorithm to the single sourcing problem under 
seasonal demands. 

L.M.A. Chan and D. Simchi-Levi [17] consider 
a transportation-inventory-routing design problem 
and provide a model and algorithm for a three-level dis- 
tribution system. The system contains a single vendor, 
multiple cross-docking warehouses, and multiple re- 
tailers with constant demand rates. We include this pa- 
per in the design section because, unlike the inventory- 
routing problems covered in the following section, this 
work develops insights regarding distribution system 
design strategy. The authors argue that this system cor- 
responds closely to the Wal-Mart distribution system, 
which has proven immensely successful over the past 
decade. Their algorithm assigns each retailer to a ware- 
house using a bin-packing scheme, and partitions the 
assigned retailers into clusters using a capacitated con- 
centrator location algorithm. They then combine re- 
tail clusters into groups that share the same reorder in- 
terval. The model minimizes asymptotic long-run av- 
erage transportation plus inventory costs, and the au- 
thors show an optimal solution that forces each ware- 
house to receive fully loaded trucks from the vendor, 
but never to hold inventory. The warehouse therefore 
serves only as a coordinator of the frequency, time, and 
size of deliveries to retailers, i.e., as a cross-docking
facility. 
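The bin-packing step can be illustrated with a generic heuristic. The sketch below uses first-fit decreasing, a standard bin-packing rule that we substitute for illustration (the scheme in [17] may differ), to assign retailers with hypothetical demand rates to warehouses sharing a hypothetical common capacity; the function name `first_fit_decreasing` is ours.

```python
def first_fit_decreasing(demands, capacity):
    """Assign retailers to warehouses with the first-fit decreasing heuristic.

    demands[i] is retailer i's demand rate; every warehouse has the same capacity.
    Returns one list of retailer indices per opened warehouse.
    """
    # Consider the largest retailers first (ties keep their original order).
    order = sorted(range(len(demands)), key=lambda i: demands[i], reverse=True)
    warehouses, loads = [], []
    for i in order:
        d = demands[i]
        if d > capacity:
            raise ValueError(f"retailer {i} exceeds warehouse capacity")
        # Put the retailer in the first open warehouse with enough spare capacity...
        for w, load in enumerate(loads):
            if load + d <= capacity:
                warehouses[w].append(i)
                loads[w] += d
                break
        else:
            # ...or open a new warehouse if none has room.
            warehouses.append([i])
            loads.append(d)
    return warehouses


print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
```

For demand rates [4, 8, 1, 4, 2, 1] and capacity 10, the sketch opens two warehouses, serving retailers {1, 4} and {0, 3, 2, 5}.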
Location-Routing Models


Next we briefly summarize work on so-called
location-routing models. This research area has
received considerable attention since the 1970s, so we
highlight only a few of the more significant
developments. For a detailed summary of location-routing
models, see [8] and [53]. Since traditional facility lo- 
cation models consider only the cost of assigning cus- 
tomers or markets to facilities (see, for example, [25]), 
many researchers have shown the value of explicitly 
considering integrated location models that incorpo- 
rate detailed vehicle routing decisions. 

Perhaps the earliest work to take such an integrated 
view was done by I.R. Webb [73], who randomly gen- 
erates depot locations and considers the impact of us- 
ing straight-line distances as a substitute for route dis- 
tances in a deterministic setting with vehicle capacity 
restrictions. The paper concludes that a significant loss 
of accuracy can result from using straight-line models. 
S.K. Jacobsen and O.B.G. Madsen [45] subsequently use 
sophisticated heuristic approaches to solve a two-level 
newspaper distribution system design problem in prac- 
tice. 

G. Laporte and Y. Nobert [47] and Laporte et al. [48] 
give exact integer programming approaches for the 
location-routing problem in the absence of vehicle ca- 
pacity restrictions (the first paper considers a single de- 
pot, while the second extends this to multiple depots). 
To solve a set of randomly generated problems they em- 
ploy a branch and bound algorithm with the addition of 
violated subtour constraints. 

J. Perl and M.S. Daskin [55,56] use optimization- 
based heuristic approaches for solving the multiple- 
depot routing problem with depot throughput costs 
and capacities. Their heuristic initially opens all depots 
and solves a routing problem for each. They then use 
fixed cost and depot capacity considerations to deter- 
mine which depots to keep open and heuristically real- 
locate customers to open facilities. 

Laporte et al. [46] extend the analysis to consider the
location of a depot whose vehicles collect goods from
customers and return them to the depot. In this model customer
supplies are random variables and a vehicle must return 
to the depot once it is filled to capacity. The problem 
is modeled as a two-stage stochastic program with re- 
course. 


R. Srivastava and W.C. Benton [65] consider the im- 
pact of environmental and operational factors on so- 
lutions to location-routing problems. These factors in- 
clude the customers’ spatial distribution and the ratio 
of location cost to routing costs. The authors apply sev- 
eral combinations of well-known routing and location 
heuristics to determine the effects of specific factors on 
heuristic performance. 

Other recent heuristics and case studies in 
the location-routing literature include [4,41,54,64], 
and [52]. The following section considers shorter-term 
operations decisions in supply chain management. 


Production and Logistics Control Models 


This Section considers models that address more
detailed operational decisions, such as the timing and
quantities of flows between facilities. These issues and 
decisions have been addressed for many years within 
the context of single-stage models. We focus on mod- 
els that consider the impacts such decisions have on 
multiple entities in a supply chain. These entities 
may or may not belong to the same firm and may 
have either the same or conflicting objectives. With 
a few exceptions we have not summarized the vo- 
luminous literature in the area of multi-echelon in- 
ventory control. Papers in this area tend to develop 
cost models of multistage inventory systems under var- 
ious cost, demand, and operational assumptions. They 
then seek to minimize overall production and inven- 
tory related costs. With the exception of the paper 
[20] noted in the following paragraph, we focus on 
models that consider inventory costs in conjunction 
with distribution and logistics related costs. For a de- 
tailed discussion of multi-echelon inventory analysis, 
see [39,49,68]. 

This Section first focuses on models that combine 
inventory and transportation decisions. (For in-depth 
analyses of general vehicle routing and inventory- 
routing models, see [30] and [11].) We then look at pa- 
pers that consider the coordination of material flows 
and inventory placement in the supply chain. Finally 
we look at the phenomenon known as the ‘bullwhip
effect’ (the commonly observed increase in demand
variance at upstream stages in a supply
chain) and models that have attempted to mitigate this 
effect. 
Combined Inventory and Transportation Decisions


A.J. Clark and H. Scarf [20] provide perhaps the ear- 
liest work considering the operational interaction be- 
tween two separate members of a distribution chain. 
They develop a periodic review inventory management 
model for a serial distribution system. They use a sim- 
ple system consisting of a single distribution center and 
a single retailer to show the optimality of modified base- 
stock policies for the retailer’s inventory control. Mod- 
ified base-stock policies require a fixed base-stock level 
that the retailer attempts to bring inventory up to at the 
beginning of each period. If the retailer cannot reach 
its target base-stock level (because the distribution cen- 
ter has insufficient stock), the retailer attempts to get as 
close to the base-stock level as possible by exhausting 
distribution center stock. 
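The retailer's ordering rule described above reduces to a one-line calculation. The following minimal sketch assumes that the retailer reviews at the start of each period, that its inventory position and the distribution center's on-hand stock are known, and that the distribution center ships whatever it can; the function and parameter names are hypothetical.

```python
def modified_base_stock_order(base_stock, inventory_position, dc_on_hand):
    """Retailer order under a modified base-stock policy (single retailer, single DC).

    The retailer tries to raise its inventory position to base_stock; if the
    distribution center (DC) lacks enough stock, it ships everything it has.
    """
    desired = max(base_stock - inventory_position, 0)  # shortfall from the target
    return min(desired, max(dc_on_hand, 0))            # capped by DC availability


print(modified_base_stock_order(50, 30, 100))  # DC can cover the full shortfall
print(modified_base_stock_order(50, 30, 12))   # DC runs short: retailer takes all 12
```

With a base-stock level of 50 and an inventory position of 30, the retailer orders 20 units if the distribution center holds at least 20, but only 12 if the center holds 12.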

In one of the initial efforts to provide an exact 
model that incorporates combined inventory and trans- 
portation costs with stochastic demands, A. Federgruen 
and P. Zipkin [31] consider a single-period problem in 
which a central depot must allocate its inventory among 
multiple retailers. Their model minimizes expected in- 
ventory holding and shortage costs plus vehicle routing 
costs. Each of a set of vehicles has a fixed capacity and 
incurs a cost equal to $c_{ij}$ for travel between locations $i$
and $j$. The model determines vehicle routes and the al-
location of inventory among the retailer locations that 
minimizes combined inventory and routing costs. 

L.B. Burns et al. [12] present the model behind a sys- 
tem developed in a streamlining effort for GM’s Delco 
Electronics Division distribution network. The model 
examines the trade-offs between inventory and trans- 
portation costs and determines which plants should 
supply a variety of assembly facilities in North Amer- 
ica. The key decision involved was between the current 
practice of straight-line deliveries to a single central- 
ized warehouse versus a peddling strategy that delivers 
parts directly to the assembly plants. The model recom- 
mended a new peddling strategy for Delco components 
and resulted in a 26.9% logistics savings opportunity. 

C.A. Yano and Y. Gerchak [76] consider a two-stage 
system where a manufacturing plant supplies an assem- 
bly plant with Just-in-time (JIT) shipments of a high- 
volume part. The model assumes that the retailer ob- 
serves stochastic, periodic demand of a single product. 
They allow for emergency shipments of parts when
insufficient vehicle capacity exists. They determine the
order-up-to point (base-stock level) for the part, the 
time between successive deliveries of the part, and the 
number of vehicles contracted for deliveries between 
the supplier and assembly plant. The model minimizes 
the sum of assembly plant expected inventory costs, 
contracted shipment costs, and emergency shipment 
costs. R. Ernst and Pyke [29] build on the work of Yano 
and Gerchak [76] by including the manufacturer’s (or, 
in their case, the warehouse’s) expected inventory costs 
plus per unit shipping costs (as opposed to a per truck 
shipping cost only). 

We find a more detailed treatment of the trade- 
off between vehicle routing and inventory costs in [2], 
which considers a system with a central depot and mul- 
tiple geographically dispersed retailers. All stock en- 
ters the system through the depot and each retailer ob- 
serves deterministic demand that occurs at a constant 
rate. The model minimizes inventory costs at the retail- 
ers (the depot holds no stock) plus distribution costs 
incurred by combining retailer deliveries into routes 
served by a set of vehicles with fixed capacities. The au- 
thors develop bounds on both the optimal solution and 
solutions from a class of heuristics they develop. They 
then show the asymptotic optimality of their heuristics 
(within the class of strategies that partition retailers into
regions such that a vehicle visiting a region must visit
all retailers in that region). For other algorithms for, and
extensions to, the inventory-routing model, see [19,27,28],
and [74]. 

P. Chandra and Fisher [18] consider the problem 
of coordinating production runs with vehicle routing 
decisions for a single facility with multiple customers. 
They compared the case of separately optimizing pro- 
duction planning and vehicle routing decisions to a co- 
ordinated approach that attempts to minimize the total 
combined cost of production and routing. They found 
increased value in coordinating these decisions under 
higher production capacity, longer time horizons, and 
low holding and setup costs (i.e., the more flexible and
less constrained the production planning situation). 

M. Henig et al. [42] consider the problem of de- 
termining the optimal inventory replenishment policy 
structure and truck capacity when a distribution center 
imposes a variable cost for each unit ordered in excess 
of distribution center truck capacity, R. They show that 
the optimal periodic ordering policy under such a cost 
contains two base-stock levels, $S_1$ and $S_2$ ($S_2 > S_1$). The
policy structure requires that if the initial inventory
position before ordering is less than $S_2$, we either order up
to $S_1$, order exactly R units, or order up to $S_2$, depending
on the value of the initial inventory position; otherwise
we do not order. The paper also specifies a method
to determine the best level of truck capacity under the 
given cost structure. 
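One way to make the policy concrete is to write down the thresholds at which each of the three actions applies. The sketch below is our own derivation under stated assumptions, not code from [42]: the first R units of an order incur no variable charge, each unit beyond R incurs one, and the expected holding and shortage cost is convex in the post-order inventory position. Under these assumptions $S_2$ is the best order-up-to level when ordering is free and $S_1$ is the best level when every unit carries the excess charge, which gives $S_1 < S_2$ and the rule below (parameter names are hypothetical).

```python
def two_level_order(x, s1, s2, r):
    """Order quantity for inventory position x under the two-level policy.

    Assumes s1 < s2 and that only units beyond the truck capacity r carry an
    extra variable cost.
    """
    if not s1 < s2:
        raise ValueError("expected s1 < s2")
    if x >= s2:
        return 0        # at or above the upper level: do not order
    if x >= s2 - r:
        return s2 - x   # reaching s2 fits within the truck capacity
    if x >= s1 - r:
        return r        # order exactly one truckload
    return s1 - x       # far below: order beyond r, but only up to s1


for position in (75, 60, 30, 10):
    print(position, two_level_order(position, s1=40, s2=70, r=20))
```

For example, with $S_1 = 40$, $S_2 = 70$ and $R = 20$, an inventory position of 60 leads to an order of 10 units (up to $S_2$), a position of 30 to a full truck of 20 units, and a position of 10 to an order of 30 units (up to $S_1$).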

Due to the increased reliance by manufacturers on 
third party less-than-truckload (LTL) carriers, Chan 
et al. [16] consider a multiperiod distribution model 
with LTL shipments. Their model assumes that a set of 
customers observe periodic, deterministic demand for 
multiple products over a finite planning horizon. The 
distribution network contains both suppliers and ware- 
houses and the model allows for backlogging customer 
demands. The authors pose the problem as a fixed- 
charge minimum concave cost network flow problem 
over an equivalent distribution network, where the 
costs include LTL shipping costs plus inventory hold- 
ing and shortage costs. The paper develops proper- 
ties of optimal solutions and shows conditions under 
which the linear programming (LP) relaxation (under 
piecewise-linear concave costs) value equals the opti- 
mal mixed integer solution value. The heuristic solution 
algorithm proposed is based on effectively characterizing
the cost of modifying an integer variable that takes
a fractional value in the LP relaxation solution. Their
results compare favorably to an earlier solution approach for the
problem proposed in [7]. 


Material Flows and Inventory Placement 


We next consider several papers that deal with more ef- 
fectively managing material flows and inventory place- 
ment in the supply chain. 

Lee and C. Billington [50] develop a stochas- 
tic model for managing material flows in Hewlett- 
Packard’s deskjet printer supply chain. They assume 
that each site observes stochastic demand and employs 
a periodic, order-up-to inventory system and that a pre- 
determined value is set for either a target service level or 
base-stock level (for each site). Their model character- 
izes the demand transmission process (whereby a site 
translates its demand into orders on its suppliers) and
the availability transfer process, which describes the 
availability of goods at the supplier location. They de- 


termine the review period and order quantity for each 
product type and location and address the trade-offs be- 
tween inventory investment and service level in a mul- 
tistage supply chain. 

Pyke and Cohen [59] develop a stochastic model 
for managing material flows in an integrated three- 
level production-distribution system. The system con- 
tains multiple products, a single manufacturing facil- 
ity, one warehouse, and a single retailer. The retailer 
promises its customers a minimum service level and 
the model minimizes total cost under constant setup 
and processing times at the factory. Although the trans- 
portation time from factory to retailer is constant, an 
option exists for the retailer to receive an expedited 
shipment if the finished goods stockpile at the fac- 
tory cannot satisfy retailer demand. The model assumes 
stochastic production times and lead times between the 
factory and its finished goods stockpile, and approxi- 
mates key inventory and service time distributions nec- 
essary for expressing expected total system cost. The 
outputs of the model include the economic reorder in- 
terval and replenishment batch size for each product 
type. 

T. Altiok and R. Ranjan [1] analyze a multistage 
pull-type production system containing one final prod- 
uct, FIFO (first in, first out) processing rules, inter- 
mediate buffers, and Poisson demand. Each stage fol- 
lows an (R, r) inventory policy (when inventory posi- 
tion falls below r, order up to R) and backorders when 
stock is insufficient to meet demand. They develop an 
iterative procedure that separately considers the flow in 
each two-node subsystem as a function of the policy pa- 
rameters. The procedure terminates when the average 
throughput values of all subsystems are approximately 
equal. The outputs include the inventory level in each 
buffer and the backorder probability, P. The authors 
show that when P does not exceed 0.3, their approxi- 
mation method provides acceptable results. 

S.C. Graves and S. Willems [40] consider the place- 
ment (location) of safety stock in a multistage supply 
chain. They represent the supply chain as a network 
and assume that each stage follows a base-stock policy 
(a stage represents a processing function, such as pro- 
curement of raw material, component production, as- 
sembly, testing, etc.). Demand occurs only at nodes that 
have no successors and each stage provides a guaran- 
teed service time for meeting downstream demand. The 
model minimizes system-wide safety stock costs while
meeting the guaranteed service times (which constitute 
the decision variables). They [40] validated their model 
through a successful implementation in Eastman Ko- 
dak’s digital camera supply chain. 
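A toy version of the guaranteed-service idea helps show why service times act as decision variables. The sketch below assumes a serial chain (the model in [40] handles general networks), integer service times, and the common guaranteed-service assumption that a stage's safety stock is proportional to the square root of its net replenishment time, that is, its inbound service time plus its processing time minus its outbound service time. It enumerates every service-time choice, which is only practical for tiny chains; all names and numbers are hypothetical.

```python
import itertools
import math


def serial_safety_stock_cost(proc_times, holding, service_times, z_sigma=1.0):
    """Safety-stock cost of a serial chain under guaranteed service times.

    Stage j (0 = most upstream) quotes outbound service time service_times[j];
    its inbound service time is the upstream stage's outbound time (0 for the
    first stage). The factor z * sigma is assumed equal at every stage.
    """
    cost, inbound = 0.0, 0
    for t, h, s in zip(proc_times, holding, service_times):
        net = inbound + t - s  # net replenishment time to be covered by stock
        if net < 0:
            return math.inf    # infeasible: promised faster than replenishment allows
        cost += h * z_sigma * math.sqrt(net)
        inbound = s
    return cost


def best_service_times(proc_times, holding, final_service=0):
    """Enumerate integer service times for all stages except the last."""
    n, max_s = len(proc_times), sum(proc_times)
    best = (math.inf, None)
    for upstream in itertools.product(range(max_s + 1), repeat=n - 1):
        s = list(upstream) + [final_service]
        c = serial_safety_stock_cost(proc_times, holding, s)
        if c < best[0]:
            best = (c, s)
    return best


print(best_service_times([3, 2], [1, 2]))
```

For processing times (3, 2), holding-cost rates (1, 2) and a zero service time to the customer, the enumeration sets the upstream service time to 3, so the upstream stage holds no safety stock and the downstream stage covers a net replenishment time of 5 periods, for a cost of $2\sqrt{5} \approx 4.47$.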

Graves et al. [38] study requirements planning 
in multistage production systems. They begin with 
a single-stage model that produces one (aggregated) 
product to stock for a finished goods stockpile. They as- 
sume that each stage forecasts demand H periods in ad- 
vance and revises forecasts in each period. These fore- 
casts drive the production plan, which is also planned 
H periods in advance and is revised each period. They 
measure three significant parameters: the smoothness 
of the production plan, the stability of the production 
plan, and the safety stock level (smoothness of the pro- 
duction plan differs from stability in that smoothness 
characterizes the variability of actual production out- 
put while stability characterizes the variability of the 
production schedule). The paper captures the trade- 
off between production capacity and inventory require- 
ments. The authors show how to extend the single-stage 
model to a multiple-stage dynamic requirements plan- 
ning (DRP) model by replicating single-stage models. 
An application of the model to film manufacturing at 
Kodak resulted in a 60% decrease in inventory require- 
ments for two items, while increasing one item’s inven- 
tory by 20%, with a significant net savings reported for 
the business line. 


The Bullwhip Effect 


Several recent papers (as of 2000) have described and 
quantified the phenomenon known as the bullwhip
effect in supply chains: the tendency for demand
variability to increase at upstream stages in the chain. This
demand variability places a burden on suppliers be- 
cause of the increased safety stock and excess vehicle 
capacity requirements. 

C.C. Holt, F. Modigliani, and J.P. Shelton [44] first 
showed evidence that the variation in orders can in- 
crease at upstream points in the television manufactur- 
ing chain. J.D. Sterman’s [66] documentation of expe- 
rience with supply chain simulation experiments shows 
how rational decision making can lead to the bullwhip 
effect in the absence of full information about partners 
in the chain. 


Lee, V. Padmanabhan, and S. Whang [51] offer four 
factors to explain the existence of the bullwhip effect 
(demand forecast updating, order batching, price fluc- 
tuation, and shortage gaming), along with mathemati- 
cal models of these phenomena. They provide insights 
on how coordination between channel partners can di- 
minish the bullwhip effect. Z. Drezner et al. [26] specif- 
ically consider the impact of forecast errors on the bull- 
whip effect. Their model shows that even in supply 
chains with perfect information shared among all mem- 
bers, errors in demand forecasts can still create a bull- 
whip effect. 
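The forecast-updating mechanism can be reproduced with a few lines of simulation. The sketch below assumes i.i.d. normal demand, a retailer that forecasts with a moving average of the last p demands, and an order-up-to level equal to (L + 1) times the forecast for lead time L; orders are left unconstrained (negative orders are allowed), a common simplification. These choices are ours for illustration and are not claimed to reproduce the models of [26] or [51]. For this setup the order in period t equals $D_t + \frac{L+1}{p}(D_t - D_{t-p})$, so the variance ratio is $(1+a)^2 + a^2$ with $a = (L+1)/p$, about 2.92 for L = 2 and p = 5.

```python
import random
import statistics


def simulate_bullwhip(periods=2000, lead_time=2, window=5, seed=1):
    """Return Var(orders) / Var(demand) for an order-up-to retailer.

    Assumptions: i.i.d. normal demand; moving-average forecast over `window`
    periods; order-up-to level = (lead_time + 1) * forecast (a constant
    safety stock would cancel out of the orders).
    """
    rng = random.Random(seed)
    demand = [rng.gauss(100.0, 10.0) for _ in range(periods)]
    orders, prev_level = [], None
    for t in range(window - 1, periods):
        forecast = statistics.fmean(demand[t - window + 1 : t + 1])
        level = (lead_time + 1) * forecast
        if prev_level is not None:
            # Order = demand just observed + change in the order-up-to level.
            orders.append(demand[t] + level - prev_level)
        prev_level = level
    return statistics.pvariance(orders) / statistics.pvariance(demand)


print(round(simulate_bullwhip(), 2))
```

The simulated ratio should be close to that value; lengthening the window p or shortening the lead time L moves it toward 1.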

M.P. Baganha and Cohen [5] consider conditions 
under which demand stabilization (or damping) be- 
comes economically attractive in a two-echelon distri- 
bution system containing a warehouse and multiple re- 
tailers. Their model assumes that each retailer incurs 
a fixed order cost, which results in the optimality of 
(s, S) inventory policies at each retailer (when inven- 
tory position falls below s, order up to S). Following 
an (s, S) policy results in autocorrelated orders from 
each retailer since we can expect that a retailer will not 
place an order in a period immediately following one 
in which an order was placed. Baganha and Cohen [5] 
therefore apply an autoregressive model to describe the 
distribution of retailer orders (which constitutes ware- 
house demand) and show situations in which retailer 
order damping can lead to lower system cost. 
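The negative autocorrelation that motivates the autoregressive description can be seen in a short simulation. The sketch assumes integer i.i.d. uniform demand between 0 and a hypothetical maximum and tracks only the inventory position (so no lead time needs to be modeled). When S - s exceeds the largest possible one-period demand, an order can never be followed by another order in the next period, which makes the lag-1 autocorrelation of the order stream negative. Parameter values are hypothetical.

```python
import random


def s_S_orders(s, S, periods, max_demand, seed=0):
    """Orders placed by a retailer using an (s, S) policy with i.i.d. integer demand."""
    rng = random.Random(seed)
    position, orders = S, []
    for _ in range(periods):
        position -= rng.randint(0, max_demand)  # demand lowers the inventory position
        if position < s:
            orders.append(S - position)         # order back up to S
            position = S
        else:
            orders.append(0)
    return orders


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


orders = s_S_orders(s=20, S=100, periods=200, max_demand=20)
print(round(lag1_autocorrelation(orders), 3))
```

With s = 20, S = 100 and demand between 0 and 20, for example, the retailer orders roughly every eight or nine periods and never in two consecutive periods.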

G.P. Cachon [13] studies supply chain demand vari- 
ability in a model with one supplier and N retailers that 
face stochastic demand. Retailers follow a periodic (R, 
nQ) policy (order up to R using integer multiples of 
some base quantity, Q). The model develops exact ex- 
pressions for supply chain (inventory) costs under sta- 
tionary retailer demand distributions. The results show 
how the supplier’s demand variance declines as the re- 
tailers’ order intervals are lengthened or as batch size 
increases. The main result of the paper shows that a bal- 
anced ordering strategy (when the same number of re- 
tailers order in each period) can lead to significant sav- 
ings in supply chain inventory costs. 
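The effect of balancing can be seen even without randomness. In the deterministic sketch below (an assumption for illustration; [13] works with stochastic demand and (R, nQ) policies), each of N retailers sees a constant demand and orders every T periods. The sketch compares the supplier's per-period demand when all retailers order in the same period with the case where their orders are staggered evenly. Names and numbers are hypothetical.

```python
import statistics


def supplier_demand(n_retailers, interval, per_period_demand, periods, balanced):
    """Supplier demand per period when each retailer orders every `interval` periods.

    Each order covers `interval` periods of constant retailer demand. Synchronized:
    every retailer orders in the same periods. Balanced: retailer i orders in the
    periods congruent to i modulo `interval`.
    """
    load = [0] * periods
    for i in range(n_retailers):
        offset = i % interval if balanced else 0
        for t in range(offset, periods, interval):
            load[t] += interval * per_period_demand
    return load


sync = supplier_demand(4, 2, 5, 10, balanced=False)
bal = supplier_demand(4, 2, 5, 10, balanced=True)
print(sync, statistics.pvariance(sync))
print(bal, statistics.pvariance(bal))
```

With four retailers, an order interval of two periods and a demand of 5 units per retailer per period, synchronized ordering produces supplier demand alternating between 40 and 0 (variance 400), whereas balanced ordering gives a constant 20 units per period (variance 0).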

Cachon and Zipkin [14] consider competition and 
cooperation in a two-stage serial supply chain. The 
model assumes stationary stochastic demand occurs at 
a single retailer who then orders from a supplier. They 
consider both the competitive and cooperative scenar- 
ios: in the competitive case each firm minimizes its own 
cost, while a coordinated approach minimizes system
costs (both firms follow a base-stock policy). The paper 
shows how the competitive approach reduces efficiency 
by leading to a Nash Equilibrium point that differs from 
the system optimal solution. A sequence of linear trans- 
fer payments allows the firms to achieve the system op- 
timal solution while operating independently. 

A. Balakrishnan et al. [6] consider a two-echelon 
supply chain with a single distribution center, multiple
retailers, and multiple products. Like Cachon and Zip- 
kin [14], they compare expected chain costs under au- 
tonomous decisions (competitive scenario) to those un- 
der coordination mechanisms. The model differs from 
that of Cachon and Zipkin [14] since it considers in- 
ventory plus transportation costs, and the coordination 
mechanisms differ from linear transfer payments. Bal- 
akrishnan et al. [6] offer both cost- and policy-driven 
mechanisms that effectively use the retailer order vari- 
ability as a decision variable. By tuning to the best level 
of order variability they show that the coordination 
mechanisms can lead to better system costs than un- 
der autonomous decisions by damping retailer demand 
variation. 


Supply Chain Simulation Models 


Because of the large number of decision variables and 
the complexity of the constraints required in develop- 
ing exact cost models of large scale distribution systems, 
many researchers have used simulation models to pro- 
vide valuable insight into complex supply chain dynam- 
ics. We next briefly consider models that have used sim- 
ulation to derive such insights. 

J. Wikner et al. [75] examine five supply chain 
improvement strategies using a simulation model for 
a three-level supply chain that includes one factory, 
multiple distribution centers and retailers, and carries 
inventory at each facility. The five strategies include 
1) reducing system delays; 

2) fine tuning order policy parameters; 

3) removing the distribution center echelon from the 
system; 

4) changing different echelon decision rules; and 

5) improving information integration between stages. 

They conclude that integrating information flow be- 
tween channel partners is the most effective strategy for 
minimizing supply chain operating costs. 


D.R. Towill et al. [70] then extend the simulation 
model to consider a just-in-time (JIT) delivery strategy. 
The JIT strategy combined with the removal of the dis- 
tribution center echelon was shown to be more effec- 
tive than the integration of information flow or mod- 
ification of the order policy parameters. Towill and A. 
Del Vecchio [69] use methods from filter theory com- 
bined with a simulation model to analyze the effects of 
demand variability in the supply chain. They liken each 
stage in the chain to an electrical filter (with a response 
function) and analyze various supply chain responses 
to randomness in demand patterns. The simulated re- 
sponses determine the minimum safety stock required 
to achieve a desired service level. 

S. Tzafestas and G. Kapsiotis [71] present a math- 
ematical programming based approach to optimize 
a portion of the supply chain and use simulation tech- 
niques to numerically analyze the performance of the 
optimized model. They consider a two-echelon sys- 
tem in which a manufacturer supplies multiple assem- 
bly plants. They then consider three decision scenar- 
ios. In scenario I the manufacturer minimizes its cost 
and its customers must accept the deliveries imposed 
by the manufacturer. Scenario II incorporates a cen- 
tral decision-maker that attempts to minimize overall 
system costs. Scenario III presents a decentralized de- 
cision framework where the supplier minimizes its cost 
subject to the demand imposed by the assembly plants 
under their optimal decisions. They simulate the system
under each of these three scenarios (manufacturing facility
optimization, global supply chain optimization, and
decentralized optimization at each level). The numerical
examples chosen by the authors do not result in signif- 
icant differences in cost performance under any of the 
three tested scenarios. 

D. Petrovic et al. [57] use fuzzy modeling to deter- 
mine order-up-to levels at various stages in a supply 
chain. They develop a supply chain simulator that an- 
alyzes the effects of order-up-to levels on cost and the 
dynamic behavior of the chain in an uncertain envi- 
ronment. The fuzzy model handles uncertainty in both 
customer demand and external supply of raw materials. 
The model attempts to determine the stock levels and 
order quantities at each stage in the chain that give ac- 
ceptable delivery performance at a reasonable total cost. 

R. Ganeshan [35] presents a near optimal (s, Q) 
inventory policy (when inventory position falls below 
s, order Q units) for a production/distribution net-
work with multiple suppliers, a central warehouse, and 
multiple retailers. This model develops a system-wide 
cost equation (by describing the demand process at the 
warehouse) that includes warehouse and retailer in- 
ventory costs and uses a conjugate gradient method to 
find the best policy parameters. The paper verifies the 
costs implied by the prescribed policy parameters using 
a SLAM-based supply chain simulation model. 
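The (s, Q) reorder rule itself is easy to state in code. The following is a minimal sketch for a single stocking point only. It assumes integer per-period demand, a fixed lead time of at least one period, and full backordering. The function name and parameters are hypothetical, and the sketch reproduces neither Ganeshan's multi-echelon cost equation nor the conjugate gradient search over policy parameters.

```python
from collections import deque


def simulate_sQ(demands, s, Q, lead_time, initial_stock):
    """Single-location (s, Q) policy with full backordering (illustrative only).

    Inventory position = net inventory (on hand minus backorders) + on order.
    Whenever the position falls below s, a batch of Q units is ordered (again,
    if necessary) until the position is at least s. Orders placed in period p
    arrive at the start of period p + lead_time. Returns end-of-period net
    inventory levels; negative values are backorders.
    """
    if lead_time < 1:
        raise ValueError("lead_time must be at least one period")
    if Q <= 0:
        raise ValueError("Q must be positive")
    net = initial_stock
    pipeline = deque()  # (arrival_period, quantity), in order of arrival
    history = []
    for period, demand in enumerate(demands):
        while pipeline and pipeline[0][0] <= period:
            net += pipeline.popleft()[1]  # receive orders arriving now
        net -= demand
        position = net + sum(q for _, q in pipeline)
        while position < s:  # the (s, Q) reorder rule
            pipeline.append((period + lead_time, Q))
            position += Q
        history.append(net)
    return history


print(simulate_sQ([3] * 10, s=5, Q=10, lead_time=2, initial_stock=10))
# [7, 4, 1, 8, 5, 2, -1, 6, 3, 0]
```

A trajectory like this is the kind of output a simulation model would then cost out. The costing step is omitted here.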


Summary 


Fisher [32] notes that the performance of supply chains 
today is extremely poor despite advances in the areas of 
quick response systems, mass customization, lean man- 
ufacturing, and new technologies. As firms increase 
their operations throughout the world, they will have 
even greater need for tools that integrate operations and 
information among geographically dispersed locations. 

Success in SCM requires an emphasis
on integrating activities through cooperation, coordi- 
nation, and information sharing throughout the entire 
chain. To have the greatest benefit, the supply chain 
must be managed as a single entity, which requires im- 
proved OR models and tools so that supply chain man- 
agers can solve problems that reflect the relationships 
among all supply chain activities. 

Many research opportunities exist for developing 
global supply chain models that take into account el- 
ements necessary for providing a complete and inte- 
grated view of the system. New areas for research in- 
clude the following: 

• modeling the effects of a greater number of stochastic elements;

• accounting for international economic issues (including exchange rate fluctuations and risks);

• incorporating and modeling detailed BOM relationships;

• developing product differentiation and mass customization strategies;

• capitalizing on advances in information technologies; and

• assessing the value of important strategic global alliances.

We have commented on many significant develop- 
ments and applications of OR models. Despite these 
advances, Geoffrion and Powers [37] report that many 
of the most popular commercial software packages still 
use simple heuristic approaches that result in signif- 
icantly suboptimal cost performance. Because of the 
proliferation of desktop computers and software pack- 
ages with sophisticated graphical user interfaces, logis- 
tics executives have almost exclusively selected overly 
simplified heuristic-based software for the design and 
analysis of the supply chain. 

Shapiro et al. [62] note that now is the perfect time 
for developing sophisticated OR supply chain models 
for personal computers and describe their success in 
doing so for a large consumer products company. The 
OR/MS profession has recently created many powerful 
decision tools that provide opportunities for improved 
SCM. It is important that these tools continue to find 
widespread application in industry through the devel- 
opment of comprehensive and user-friendly SCM sys- 
tems. 


See also 


> Global Supply Chain Models 

> Inventory Management in Supply Chains 
> Nonconvex Network Flow Problems 

> Piecewise Linear Network Flow Problems 


Mechanical Model


We consider the rotation of a flexible arm in a horizontal plane around an axis through the arm's fixed end, driven by a motor whose torque is controlled. The equations of motion of the motor are given by

$$\dot\theta(t) = \omega(t), \qquad \dot\omega(t) = u(t) = \frac{\tau(t)}{J}, \tag{1}$$

where $\theta$ is the angle of rotation, $\omega$ is the angular velocity, $\tau$ is the torque generated by the motor, and $J$ is the moment of inertia of the motor and the arm. $J$ is assumed to be constant, since the displacement of the arm due to the vibration caused by the rotation is assumed to be small.

In addition, we assume the arm to be homogeneous and of length 1. Following [6], the displacement $y = y(t,x)$ of the arm from the rotating zero line is modeled by the differential equation

$$y_{tt}(t,x) + \alpha\, y_{xxxx}(t,x) - \dot\theta(t)^2 \left[ \frac{1}{2} \frac{\partial}{\partial x}\Big( (1-x^2)\, y_x(t,x) \Big) + y(t,x) \right] = -x\, \ddot\theta(t) \tag{2}$$

for all $t \in (0,T)$ and $x \in (0,1)$, where $T > 0$ is some given time and $\alpha = EI/\rho$, with $E$ being Young's modulus of the arm material, $I$ being the moment of inertia of the cross-section of the arm, and $\rho$ being the mass per unit length.

The left end of the arm is clamped, and the right end is free. This leads to the boundary conditions

$$y(t,0) = y_x(t,0) = y_{xx}(t,1) = y_{xxx}(t,1) = 0 \quad \text{for all } t \in [0,T]. \tag{3}$$


At the beginning of the motion the arm is assumed to be at rest, which leads to the initial conditions

$$y(0,x) = y_t(0,x) = 0 \quad \text{for all } x \in (0,1) \tag{4}$$

and

$$\theta(0) = \dot\theta(0) = 0. \tag{5}$$


The Problem of Controllability 
and Minimum Norm Controllability 


Let some angle $\theta_T \in \mathbb{R}$ with $\theta_T \neq 0$ be prescribed. Then we look for some control function $u \in L^2(0,T)$ such that the solution $y = y(t,x)$, $t \in [0,T]$, $x \in [0,1]$, of (2), (3) and (4) satisfies the end conditions

$$y(T,x) = y_t(T,x) = 0 \quad \text{for all } x \in (0,1) \tag{6}$$

and the angle $\theta = \theta(t)$, $t \in [0,T]$, of rotation satisfies the end conditions

$$\theta(T) = \theta_T \quad \text{and} \quad \dot\theta(T) = 0. \tag{7}$$


If this problem of controllability is solvable, then a control function $u \in L^2(0,T)$ is sought which solves the problem of controllability and whose norm

$$\|u\|_{L^2(0,T)} = \left( \int_0^T u(t)^2 \, dt \right)^{1/2}$$

is as small as possible.
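On using (1) and (5) we get

$$\omega(t) = \dot\theta(t) = \int_0^t u(s)\,ds, \qquad \theta(t) = \int_0^t (t-s)\,u(s)\,ds$$

for $t \in [0,T]$, so that the differential equation (2) can be rewritten in the form

$$y_{tt}(t,x) + \alpha\, y_{xxxx}(t,x) - \left( \int_0^t u(s)\,ds \right)^2 \left[ \frac{1}{2} \frac{\partial}{\partial x}\Big( (1-x^2)\, y_x(t,x) \Big) + y(t,x) \right] = -x\, u(t) \tag{8}$$

for $t \in (0,T)$ and $x \in (0,1)$,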
and the end conditions (7) are equivalent to

$$-\int_0^T t\,u(t)\,dt = \theta_T, \qquad \int_0^T u(t)\,dt = 0. \tag{9}$$


Solvability of the Problem of Controllability


At first we consider the special case where the cross-section areas behave like rigid bodies, i.e., they stay plane, do not change their measurements and do not rotate around their centers. Further, it is assumed that they stay orthogonal to the zero line. Then the differential equation (2) can be replaced by

$$y_{tt}(t,x) + \alpha\, y_{xxxx}(t,x) = -x\, u(t) \tag{10}$$

for $t \in (0,T)$ and $x \in (0,1)$ (see [1] and [2]).

In this case it can be shown that the problem of controllability is solvable for every $T > 0$ and the problem of controllability with minimum norm has a unique solution (see, for instance, [3]). The main tool in the proof of this result is linear moment theory (see [4]).
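Linear moment theory can be illustrated on a piece of the problem small enough to solve by hand. The sketch below keeps only the two conditions (9) on the rigid rotation and ignores the end conditions (6) for the arm. Under that simplification, the minimum-norm $u \in L^2(0,T)$ is a linear combination of the moment functions $1$ and $-t$, whose coefficients solve a $2 \times 2$ Gram system. The function names are hypothetical, and this is not the full construction of [3].

```python
def min_norm_rigid_control(theta_T, T):
    """Minimum-L2-norm control satisfying only the two moment conditions (9).

    Conditions:  integral_0^T u(t) dt = 0,   -integral_0^T t u(t) dt = theta_T.
    The minimiser is u = lam1 * g1 + lam2 * g2 with g1(t) = 1, g2(t) = -t,
    where (lam1, lam2) solves the Gram system G lam = (0, theta_T).
    """
    if T <= 0:
        raise ValueError("T must be positive")
    # Gram matrix entries <g_i, g_j> = integral_0^T g_i(t) g_j(t) dt
    g11, g12, g22 = T, -T**2 / 2, T**3 / 3
    det = g11 * g22 - g12 * g12  # equals T**4 / 12 > 0
    lam1 = -g12 * theta_T / det   # Cramer's rule, right-hand side (0, theta_T)
    lam2 = g11 * theta_T / det
    return lambda t: lam1 - lam2 * t


def integrate(g, a, b, n=20000):
    """Composite midpoint rule, used here only to check the moment conditions."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


u = min_norm_rigid_control(theta_T=0.5, T=2.0)
print(integrate(u, 0.0, 2.0))                    # close to 0
print(-integrate(lambda t: t * u(t), 0.0, 2.0))  # close to 0.5
```

Solving the Gram system by hand gives $u(t) = (6\theta_T/T^2)(1 - 2t/T)$. The motor accelerates the arm during the first half of $[0,T]$ and brakes it during the second half.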

In the general case where the displacement $y = y(t,x)$ of the arm from the rotating zero line is modeled by the differential equation (2), the problem of controllability is not exactly solvable, as shown in [5]. The main tool in the proof of this result is nonlinear moment theory. However, one can proceed as follows. First determine the unique control $u = u^1 \in L^2(0,T)$ which satisfies (9) with minimal norm such that the corresponding solution $y = y^1 = y^1(t,x)$ of (10), (3) and (4) satisfies the end conditions (6). Then determine the unique control $u = u^2 \in L^2(0,T)$ which satisfies (9) with minimum norm such that the solution $y = y^2 = y^2(t,x)$ of the differential equation

$$y^2_{tt}(t,x) + \alpha\, y^2_{xxxx}(t,x) - \left( \int_0^t u^1(s)\,ds \right)^2 \left[ \frac{1}{2} \frac{\partial}{\partial x}\Big( (1-x^2)\, y^1_x(t,x) \Big) + y^1(t,x) \right] = -x\, u^2(t)$$

for $t \in (0,T)$, $x \in (0,1)$, the boundary conditions (3), and the initial conditions (4) satisfies the end conditions (6). Then the solution $y = y^* = y^*(t,x)$ of

$$y^*_{tt}(t,x) + \alpha\, y^*_{xxxx}(t,x) - \left( \int_0^t u^2(s)\,ds \right)^2 \left[ \frac{1}{2} \frac{\partial}{\partial x}\Big( (1-x^2)\, y^*_x(t,x) \Big) + y^*(t,x) \right] = -x\, u^2(t)$$

for $t \in (0,T)$, $x \in (0,1)$, the boundary conditions (3), and the initial conditions (4) satisfies the end conditions (6) up to a very small error.


Designing a composite stiffened panel is a complicated
process, and optimization techniques are proving help- 
ful to engineers in designing a cost-effective, practical 
structure. Random search algorithms (cf. also » Ran- 
dom search methods), such as simulated annealing and 
genetic algorithms, are being used in the design of com- 
posite structures [3]. A benefit of these algorithms is 
that they can optimize functions that cannot be han- 
dled with traditional optimization techniques. In this 
article, a structural design problem for composite fuse- 
lage is described which involves solving a mixed inte- 
ger global optimization problem. The objective func- 
tion may be discontinuous, have many local optima, 
and the feasible region may be disconnected. The ran- 
dom search algorithm, improving hit and run [23], that 
is used to solve these composite structural optimiza- 
tion problems is also described in » Global optimiza- 
tion: Hit and run methods. Although a random search 
algorithm can only provide a probabilistic guarantee 
of finding the global optimum, the benefits of having 
a more realistic formulation outweigh the disadvantage 
of not having a 100% guarantee of finding the global 
optimum. The near optimal solutions found using this 
approach have been well-received by Boeing engineers 
and have demonstrated significant reductions in weight 
and cost [15]. 
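To make the random search idea concrete, here is a minimal sketch of the basic improving hit-and-run step for a purely continuous, box-constrained problem. It picks a direction uniformly on the unit sphere, picks a point uniformly on the feasible segment through the current point in that direction, and moves only if the objective improves. The function name and parameters are hypothetical. The mixed discrete-continuous extension used for the second and third formulations below is not shown.

```python
import math
import random


def improving_hit_and_run(f, lower, upper, x0, iters=2000, seed=0):
    """Minimise f over the box lower <= x <= upper with basic improving hit-and-run.

    Hypothetical helper for illustration: continuous variables and box
    constraints only, no discrete variables.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    n = len(x)
    for _ in range(iters):
        # Direction uniformly distributed on the unit sphere (normalised Gaussian).
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(di * di for di in d))
        if norm == 0.0:
            continue
        d = [di / norm for di in d]
        # Range of step lengths [lo, hi] that keeps x + step * d inside the box.
        lo, hi = -math.inf, math.inf
        for xi, di, li, ui in zip(x, d, lower, upper):
            if di > 0:
                lo = max(lo, (li - xi) / di)
                hi = min(hi, (ui - xi) / di)
            elif di < 0:
                lo = max(lo, (ui - xi) / di)
                hi = min(hi, (li - xi) / di)
        # Candidate uniformly distributed on the feasible segment
        # (clamped to the box to guard against rounding).
        step = rng.uniform(lo, hi)
        y = [min(max(xi + step * di, li), ui)
             for xi, di, li, ui in zip(x, d, lower, upper)]
        fy = f(y)
        if fy < fx:  # "improving": accept only strictly better points
            x, fx = y, fy
    return x, fx


f = lambda x: (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2
print(improving_hit_and_run(f, [-1.0, -1.0], [1.0, 1.0], [0.9, 0.9], iters=3000, seed=1))
```

Because only improving candidates are accepted, the sequence of objective values is nonincreasing. The random direction and step are what let the method move across nonconvex or disconnected regions.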

Three optimization formulations for a composite 
design problem will be described, each increasing in 
complexity and incorporating more realism. The first 
two formulations are point designs, where a single 
cross-section of a composite panel is optimized. The 
first formulation assumes a fixed number of plies, while 
the second one allows the number of plies to be a vari- 
able in the optimization. The third formulation extends 
a ‘point’ optimization to a ‘blended’ panel optimization, 
by dividing a panel into elements. This third formula- 
tion is applied to the design of a large panel with vary- 
ing cross-sections. Manufacturing considerations lead 
to constraints in the third formulation to ensure a practical, consistent panel. A sample problem is also presented.


Point Design 


The first formulation optimizes a laminated compos- 
ite structure, with skin and stiffeners. Laminated com- 
posites are composed of several thin layers, called plies, 
which are bonded together to form a composite lam- 
inate. A single ply consists of long reinforcing fibers 
(e.g., graphite fibers), embedded within a relatively 
weak matrix material (e.g., epoxy). Composite lami- 
nates are usually fabricated such that all fibers within 
an individual ply are oriented in one direction; however, the angle may vary from ply to ply. The angle of the fibers in the $i$th ply is denoted by $\theta_i$. The design variables for the first formulation include the fiber angles for each ply in the skin and stiffeners, denoted $\theta_i^{\text{skin}}$ for $i = 1, \ldots, n^{\text{skin}}$ and $\theta_i^{\text{stiffener}}$ for $i = 1, \ldots, n^{\text{stiffener}}$, and the stiffener geometry variables: height, width of cap, width of flange, angle of web, and stiffener spacing, as shown schematically in Fig. 1. In the first formulation, it is assumed that the numbers of plies in the skin and stiffeners, $n^{\text{skin}}$ and $n^{\text{stiffener}}$, are fixed. This assumption is relaxed in the second formulation, where the number of plies is also allowed to vary.

Optimal Design of Composite Structures, Figure 1
Design variables for a hat stiffened composite laminate

The first optimization formulation is stated as:

$$\begin{aligned}
\min\ & f(x) \\
\text{s.t. } & g_j(x) \ge 0 && \text{for } j = 1, \ldots, m, \\
& x_i^l \le x_i \le x_i^u && \text{for } i = 1, \ldots, n,
\end{aligned} \tag{1}$$

where the $n$ variables consist of the $n^{\text{skin}} + n^{\text{stiffener}}$ fiber angles and five stiffener geometry variables, and are real-valued, $x \in \mathbb{R}^n$. The objective function $f(x)$ may be cost, weight, or a combination of cost and weight, $f(x) = s_{\text{cost}} f_{\text{cost}}(x) + s_{\text{weight}} f_{\text{weight}}(x)$. An alternate objective may be to maximize performance, such as the margin of safety.

The constraints are composed of simple upper and lower bounds on the variables, $x_i^l \le x_i \le x_i^u$, and the more complicated structural mechanics constraints, $g_j(x) \ge 0$. The structural mechanics constraints are composed of margins of safety for strain, strength, damage tolerance and buckling analyses [16]. The inequality constraints are formed so that a feasible design has a positive margin of safety. This first formulation has been described in [1,2,4,20].


Optimal Design of Composite Structures, Figure 2 
Graph of in-plane stiffness for a four ply symmetric laminate, 
[θ1, θ2, θ2, θ1]


The margin-of-safety functions to be used as con- 
straints and/or the objective function can be described 
as black-box functions where the functions often are 
only available in the form of a computer subroutine. For 
example, stiffness of a composite laminate can be calcu- 
lated using classical lamination theory [5] or it could 
involve a finite element analysis [10,12]. 

To illustrate the global nature of these equations, a plot of the in-plane stiffness of a four ply, symmetric laminate, [θ1, θ2, θ2, θ1], using classical lamination theory, is shown in Fig. 2. The greatest in-plane stiffness occurs when the fiber angles are all 0 degrees, as one would expect intuitively. See [20] for the equations used to generate the plot. The plateau in the graph represents infeasible designs, with a stiffness that is less than a prescribed critical value. If stiffness is used as an objective function, it can be seen to be nonlinear and nonconvex. If stiffness is used in the constraints to allow only those designs above a threshold of critical stiffness, the feasible region itself is nonconvex and even has holes in it. This illustrates how difficult even attaining feasibility can be.
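Classical lamination theory, one of the subroutines mentioned above, is compact enough to sketch. The following computes one common in-plane stiffness measure, $A_{11} = \sum_k \bar Q_{11}(\theta_k)\, t$, for plies of equal thickness $t$, using the standard transformed reduced stiffness of an orthotropic ply. The material constants and ply thickness are illustrative placeholders, not the AS4-3501 properties used in [1]. $A_{11}$ may also not be exactly the quantity plotted in Fig. 2; see [20] for that.

```python
import math


def reduced_stiffness(E1, E2, nu12, G12):
    """Plane-stress reduced stiffnesses (Q11, Q22, Q12, Q66) of one orthotropic ply."""
    nu21 = nu12 * E2 / E1
    denom = 1.0 - nu12 * nu21
    return E1 / denom, E2 / denom, nu12 * E2 / denom, G12


def qbar11(theta_deg, Q11, Q22, Q12, Q66):
    """Transformed stiffness Qbar11 of a ply whose fibers are rotated by theta degrees."""
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    return Q11 * c**4 + 2.0 * (Q12 + 2.0 * Q66) * s**2 * c**2 + Q22 * s**4


def a11(angles_deg, ply_thickness, props):
    """In-plane stiffness A11 = sum over plies of Qbar11(theta_k) * t (equal thickness)."""
    Q = reduced_stiffness(*props)
    return sum(qbar11(theta, *Q) * ply_thickness for theta in angles_deg)


# Illustrative placeholder properties (E1, E2, nu12, G12 in Pa), NOT data from [1].
PROPS = (140e9, 10e9, 0.3, 5e9)
T_PLY = 1.3e-4  # assumed ply thickness in metres

# A11 for the symmetric laminate [0, theta, theta, 0] as the inner angle varies.
for theta in (0, 30, 45, 60, 90):
    print(theta, a11([0, theta, theta, 0], T_PLY, PROPS))
```

With these assumed properties, $A_{11}$ is largest for the all-zero laminate and depends only on $\cos^2\theta$ and $\sin^2\theta$, so $+\theta$ and $-\theta$ plies contribute equally.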

It is also possible to define a hierarchy of objectives. 
For example, we might minimize cost and when there 
is a tie on cost, we minimize weight, and when there is 
a tie on weight, then maximize margin of safety. This 
hierarchy is natural to the formulation because weight 
is not directly affected by the fiber angles, but vary- 
ing the fiber angles can increase margin of safety while 
maintaining a low cost and weight. A two-phase approach was used, where phase 1 maximized the minimum margin of safety until a positive value was obtained, and then phase 2 maintained feasibility while
optimizing the hierarchy of cost, weight and margin of 
safety. A contrasting approach using a penalty method 
with a variable penalty factor was compared numeri- 
cally in [10,11]. Numerically the penalty method was 
better than the hierarchical approach, although the hi- 
erarchical approach was more intuitive to the engi- 
neers. 
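A hierarchy of objectives amounts to a lexicographic comparison. The sketch below assumes each candidate design has already been evaluated for cost, weight, and minimum margin of safety; the `Design` record and function names are hypothetical. Python's tuple ordering implements "cost first, then weight, then largest margin". Infeasible candidates are used only when no feasible one exists, which loosely mirrors the two phases described above. Exact ties are assumed here; in practice a tolerance on cost and weight would likely be needed.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    cost: float
    weight: float
    margin: float  # minimum margin of safety; treated as feasible when >= 0


def hierarchy_key(d):
    """Lexicographic key: lower cost, then lower weight, then larger margin."""
    return (d.cost, d.weight, -d.margin)


def best_design(designs):
    feasible = [d for d in designs if d.margin >= 0]
    if not feasible:
        # Phase-1 analogue: nothing is feasible yet, so prefer the largest margin.
        return max(designs, key=lambda d: d.margin)
    # Phase-2 analogue: among feasible designs, apply the hierarchy.
    return min(feasible, key=hierarchy_key)


candidates = [
    Design("A", cost=10.0, weight=5.0, margin=0.2),
    Design("B", cost=10.0, weight=4.0, margin=0.1),
    Design("C", cost=10.0, weight=4.0, margin=0.3),
    Design("D", cost=9.0, weight=6.0, margin=-0.1),  # cheapest but infeasible
]
print(best_design(candidates).name)  # C
```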

An improvement to the first formulation involves 
relaxing the requirement to specify the numbers of plies 
in the skin and stiffener. Since the composite laminate 
is manufactured by laying down individual plies, it is 
realistic to treat the number of plies as an integer vari- 
able. Other optimization techniques have treated thick- 
ness of a ply as a continuous variable, and then rounded 
to the appropriate number of plies [3,17]. This is not as 
accurate as treating the number of plies as integer vari- 
ables directly. The second formulation is very similar 
to the first, with additional binary variables to indicate 
whether a ply exists in the laminate: $t_i^{\text{skin}}$ for $i = 1, \ldots, n^{\text{skin}}$ and $t_i^{\text{stiffener}}$ for $i = 1, \ldots, n^{\text{stiffener}}$, where $t_i = 1$ if ply $i$ exists and takes on fiber angle $\theta_i$, and $t_i = 0$ if ply $i$ is dropped from the laminate. The upper bounds on the number of plies needed in the skin and stiffener, $n^{\text{skin}}$ and $n^{\text{stiffener}}$, are now easier to provide and not a critical aspect of the inputs.

The complete second optimization formulation, including the binary variables, is summarized as:

$$\begin{aligned}
\min\ & f(x,t) \\
\text{s.t. } & g_j(x,t) \ge 0 && \text{for } j = 1, \ldots, m, \\
& x_i^l \le x_i \le x_i^u && \text{for } i = 1, \ldots, v, \\
& t_i \in \{0,1\} && \text{for } i = 1, \ldots, n',
\end{aligned} \tag{2}$$

where $v$ is the total number of continuous variables, including fiber angles and geometry variables, and $n'$ is the largest number of plies, $n' = n^{\text{skin}} + n^{\text{stiffener}}$.

The random search algorithm had to be modified 
in order to solve the second formulation. See » Global 
optimization: Hit and run methods, or [10,11,13] for 
details on the algorithm for mixed discrete-continuous 
global optimization. Once this capability was available, 
it was possible to create other discrete variables. The 
most interesting are the fiber angles, which may take 
on continuous values between +90 and −90 degrees, but for practical purposes are often restricted to discrete values, such as 0, ±45, or 90 degrees. This flexibility in the formulation provides the ability to investigate manufacturing considerations. For example,
running the optimization problem allowing the fiber 
angles to be continuous values may not be practical 
for manufacturing, but can provide a lower bound 
on weight. Then the problem can be run again to 
investigate the additional weight associated with us- 
ing a discrete set of fiber angles. In this way, trade- 
offs can be carefully evaluated early in the design 
process. 


Blended Panel 


The third formulation expands the optimization prob- 
lem beyond considering a single point to include an 
entire panel. In the point optimization, it is assumed 
that the loads are constant and uniformly distributed 
over the section, and the design is also constant over 
the section. In reality, the loads over a large panel, such as a crown panel in a fuselage, are not evenly distributed. We could use the second formulation for
the entire panel using the heaviest loads, but then we 
would essentially overdesign parts of the panel where 
the loads are much lighter. Therefore, in the third for- 
mulation, the panel is divided up into elements, where 
the loads are assumed constant for each element, but 
the loads may vary between different elements [16,19]. 
The elements cannot be point-optimized individually 
because the result may be impractical for manufac- 
turing. For example, the stiffener spacing could in- 
crease and decrease in adjacent elements, or a 45° ply 
could change to a —30° in adjacent elements. This 
would not be considered manufacturable, since it is as- 
sumed that once a ply is in place, the orientation of 
the fiber angle cannot change, although the ply may 
be dropped off. The third formulation must ensure 
that the elements can be produced into a consistent 
panel, or what we call a ‘blended’ panel. By including 
binary variables for each ply in each element, ‘blend- 
ing rules’ are established to reflect these manufacturing 
considerations. 

Optimal Design of Composite Structures, Figure 3
Design variables for a blended panel

Figure 3 illustrates the main blending rule for a panel divided into four elements. Consider any ply as it passes through the four elements. The fiber orientation of ply $i$ is $\theta_i$, and there are also four binary integer variables associated with the ply. As described in the second formulation, if the binary variable $t_{i,(j,k)} = 1$ then ply $i$ in element $(j,k)$ exists with fiber angle $\theta_i$, and if $t_{i,(j,k)} = 0$ then ply $i$ has been dropped in element $(j,k)$.
By adding constraints on the binary variables, we can 
force plies to be dropped in such a way that the panel 
is manufacturable. We assume that the heaviest load on 
the panel exists in the upper left corner, and we want to 
be able to drop a ply, but once it is dropped we do not 
allow it to be added back into the panel. Thus the main 
blending rule is nicknamed the ‘less-than-or-equal-to 
rule’, and includes the following constraints: 


$$t_{i,(j,k)} \ge t_{i,(j,k+1)}, \qquad t_{i,(j,k)} \ge t_{i,(j+1,k)}$$

for all plies $i$, and for all rows $j$ and columns $k$ of the panel.


For the panel illustrated in Fig. 3, the above constraints would be:

$$t_{i,(1,1)} \ge t_{i,(1,2)}, \qquad t_{i,(2,1)} \ge t_{i,(2,2)}$$

and

$$t_{i,(1,1)} \ge t_{i,(2,1)}, \qquad t_{i,(1,2)} \ge t_{i,(2,2)}.$$


In the example, if ply $i$ exists in element (1, 1) but is dropped in element (1, 2), then it must also be dropped in element (2, 2). Ply $i$ may exist or be dropped in element (2, 1) and still satisfy the blending rule. This blending rule has been very useful in structuring the optimization over a panel where the loads are allowed to vary, and a realistic design would make use of ply drops.
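The less-than-or-equal-to rule is easy to check mechanically. The sketch below uses a hypothetical function name. It takes the 0/1 existence variables of a single ply laid out as a rows-by-columns grid (row $j$, column $k$, as in Fig. 3) and lists every adjacent pair that violates $t_{i,(j,k)} \ge t_{i,(j,k+1)}$ or $t_{i,(j,k)} \ge t_{i,(j+1,k)}$. It only checks a given assignment; in the formulation the rule is imposed as constraints inside the optimization.

```python
def blending_violations(t):
    """List adjacent element pairs that break the blending rule for one ply.

    t[j][k] is the 0/1 variable of the ply in element (j+1, k+1); rows j and
    columns k are 0-based here. The rule requires t[j][k] >= t[j][k+1] and
    t[j][k] >= t[j+1][k]. Element labels in the result are 1-based, as in Fig. 3.
    """
    rows, cols = len(t), len(t[0])
    violations = []
    for j in range(rows):
        for k in range(cols):
            if k + 1 < cols and t[j][k] < t[j][k + 1]:
                violations.append(((j + 1, k + 1), (j + 1, k + 2)))
            if j + 1 < rows and t[j][k] < t[j + 1][k]:
                violations.append(((j + 1, k + 1), (j + 2, k + 1)))
    return violations


# Ply present in (1,1), dropped in (1,2), but present again in (2,2): not allowed.
print(blending_violations([[1, 0], [1, 1]]))  # [((1, 2), (2, 2))]
# Ply dropped in (1,2), (2,1) and (2,2): allowed.
print(blending_violations([[1, 0], [0, 0]]))  # []
```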

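To summarize the third formulation, the second formulation is expanded to span several elements, and the blending rule is introduced as additional constraints:

$$\begin{aligned}
\min\ & f(x,t) \\
\text{s.t. } & g_j(x,t) \ge 0, && j = 1, \ldots, m, \\
& x_i^l \le x_i \le x_i^u, && i = 1, \ldots, v, \\
& t_{i,(j,k)} \ge t_{i,(j,k+1)}, && i = 1, \ldots, n', \\
& t_{i,(j,k)} \ge t_{i,(j+1,k)}, && i = 1, \ldots, n', \\
& t_{i,(j,k)} \in \{0,1\}, && i = 1, \ldots, n',
\end{aligned} \tag{3}$$

for all defined elements $(j,k)$.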

Another aspect in blending of a panel has to do with 
the variation allowed in stiffener geometry across ele- 
ments. For example, stiffener spacing cannot vary in the 
axial direction, but is allowed to vary across elements 
in the hoop direction. In the third formulation, we include a design variable for stiffener spacing that is only subscripted by the row $j$ of the elements, which corresponds to the hoop direction. For example, in the illustration in Fig. 3 the stiffeners are one unit apart in the first row, where the loads are heaviest, and two units apart in the second row, which has lighter loads. No additional constraints are needed to incorporate this restriction, since the definition of the variable ensures
that stiffener spacing is constant in the axial direction. 
The other geometry variables are also allowed to vary 
across rows in the hoop direction, but remain constant 
in the axial direction. For example, as the stiffeners get 
spaced farther apart, they may also become shorter. In 
the sample problem presented here, it was assumed that 
the height of a stiffener remained constant for its entire 
length. 

The main difference between the second and third 
formulation is the difference between a point design 
and a ‘blended’ panel design. The panel design is of 
greater use to the engineers, but there is a penalty in 
computational complexity. The addition of elements 
for a blended panel greatly increases the number of design variables for the optimization problem. Suppose the panel is divided into $P \times Q$ elements. Then there will be $(n^{\text{skin}} + n^{\text{stiffener}}) \times P \times Q$ binary integer variables. There will be $n^{\text{skin}} + n^{\text{stiffener}}$ continuous variables for the fiber angles, and $5 \times P$ continuous geometry variables. An alternative formulation has been developed to reduce the number of variables by using integer variables that capture the number of elements, or distance, over which a ply runs. Details of this formulation and numerical results are presented in [6,14,22].

We next present a sample problem and discuss the 
differences between a point design and a ‘blended’ panel 
design. 


Sample Problem 


The sample problem presented here is taken from [18], 
and is intended to demonstrate the difference between 
the point formulation and the blended panel formulation. It uses two sets of loading conditions for each element, as shown in Table 1, and uses the material properties associated with AS4-3501 graphite/epoxy, as in [1]. There are 2 × 3 elements with a maximum number of 16 plies (symmetric) in the skin and stiffeners. For this sample problem there are 96 binary integer variables and 26 continuous variables, and the computer algorithm used 25,000 function evaluations.
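The variable counts quoted for the sample problem follow from the formulas for a $P \times Q$ panel given earlier. The check below, with a hypothetical function name, assumes two readings of the sample problem: that the 16-ply symmetric maximum means 8 independent ply variables each for skin and stiffener, and that the 2 × 3 panel has $P = 2$ rows (hoop direction) and $Q = 3$ columns. Under these readings it reproduces both the 96 binary and the 26 continuous variables.

```python
def blended_panel_variable_counts(n_skin, n_stiffener, P, Q, n_geometry=5):
    """Variable counts for the third formulation on a P x Q panel.

    Binary: one existence variable per ply per element.
    Continuous: one fiber angle per ply (shared across elements) plus
    n_geometry stiffener geometry variables per row (P rows, hoop direction).
    """
    n_plies = n_skin + n_stiffener
    binary = n_plies * P * Q
    continuous = n_plies + n_geometry * P
    return binary, continuous


print(blended_panel_variable_counts(8, 8, P=2, Q=3))  # (96, 26)
```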

Optimal Design of Composite Structures, Table 1
Loading conditions for the sample problem, where Nx, Ny, and Nxy are in lb and Mx, My, Mxy are in lb-in

Element         (1,1)           (1,2)           (1,3)
Load case       1      2        1      2        1      2
Nx           3000   3000     2500   2500     2500   2500
Ny           2000   2000     1500   1500     1000   1000
Nxy          1500  −1500     1500  −1500     1000  −1000
Mx           2000   2000     2000   2000     2000   2000
My              0      0        0      0        0      0
Mxy             0      0        0      0        0      0

Element         (2,1)           (2,2)           (2,3)
Load case       1      2        1      2        1      2
Nx           2500   2500     2000   2000     2000   2000
Ny           2000   2000     1500   1500     1000   1000
Nxy          1000  −1000     1000  −1000      500   −500
Mx           2000   2000     2000   2000     2000   2000
My              0      0        0      0        0      0
Mxy             0      0        0      0        0      0


For the sample problem we first did a point-by-point optimization for each element independently to obtain a lower bound on weight, and to observe the trends of the designs. The point-by-point optimization results for this sample problem are presented in Table 2. Notice that fiber angles vary in a way that would be impractical for manufacturing. For example, the fiber angles
vary drastically between adjacent elements, such as between element (1, 1) and element (1, 2). Notice also that stiffener spacing varies in an impractical fashion. In the first row, stiffener spacing changes from 21
inches, to 22 inches, and then up to 30 inches. This is 
not a manufacturable design. Table 3 depicts the opti- 
mal design of the blended panel using the third formu- 
lation. Although the overall weight increased, the opti- 
mal design is now considered a practical design. Notice 
that the first ply in the first element of 35° stays in the 
entire panel, while the second ply in the first element 
of 26° gets dropped immediately. This type of tailored 
ply dropoffs is manufacturable, and makes use of the 
ability to tailor composite materials. Also, the stiffener 
spacing is fixed in the axial direction at 20.6 inches in 
the first row, as desired, and has a slight change to 20.8 
inches in the second row. This also satisfies the blending 
rule. 

The formulations presented thus far have been for 
a stiffened composite panel, but the point and the 
blended panel formulations have also been extended 
to a sandwich composite panel, as depicted in Fig. 4. 
A sandwich panel consists of an inner core, with plies 
on the outside. Typically the depth of the entire panel is 
constant, but the core increases to compensate for plies 
that are dropped. The sandwich formulation was used 
to design a fuselage keel panel in [7]. 
Optimal Design of Composite Structures, Table 2
Point by point optimization of the sample problem

Element   Skin                                Stiffener                           Spacing     Weight
(1,1)     12 plies [49/52/−42/10/−45/−72]s    8 plies [1/−47/1/53]s               21 inches   634 E−5 lb/in²
(1,2)     10 plies [36/45/−39/−82/−39]s       10 plies [28/17/−23/−20/−80]s       22 inches   577 E−5 lb/in²
(1,3)     8 plies [25/50/−25/−51]s            10 plies […]s                       30 inches   450 E−5 lb/in²
(2,1)     10 plies [−85/55/34/−34/−44]s       8 plies [41/−3/−58]s                21 inches   548 E−5 lb/in²
(2,2)     8 plies [−36/77/−38/35]s            12 plies [−10/14/42/−7/−59/−59]s    24 inches   506 E−5 lb/in²
(2,3)     6 plies [1/57/−48]s                 10 plies [−33/69/12/−17/14]s        24 inches   396 E−5 lb/in²

• The overall weight of this nonblended panel is 3,111 E−5 lb/in².
• The stiffener geometry variables were always at their upper and lower bounds: height 2 inches; width of flange 1 inch; width of cap 2 inches; and angle of web 90 degrees.


Optimal Design of Composite Structures, Table 3
Blended panel optimal design for the sample problem

Element   Skin                                Stiffener                           Spacing       Weight
(1,1)     12 plies [35/26/−35/41/−42/−89]s    12 plies [−3/33/−53/−2/53/−88]s     20.6 inches   702 E−5 lb/in²
(1,2)     10 plies [35/−35/41/−42/−89]s       10 plies […]s                       20.6 inches   585 E−5 lb/in²
(1,3)     8 plies [35/−35/41/−42/−89]s        8 plies […]s                        20.6 inches   552 E−5 lb/in²
(2,1)     10 plies [35/−35/41/−42/−89]s       12 plies [−3/33/−53/−2/53/−88]s     20.8 inches   616 E−5 lb/in²
(2,2)     10 plies [35/−35/41/−42/−89]s       10 plies […]s                       20.8 inches   583 E−5 lb/in²
(2,3)     8 plies [35/−35/41/−89]s            8 plies […]s                        20.8 inches   466 E−5 lb/in²

• The overall weight of this blended panel is 3,504 E−5 lb/in².
• The stiffener geometry variables were always at their upper and lower bounds: height 2 inches; width of flange 1 inch; width of cap 2 inches; and angle of web 90 degrees.


Optimal Design of Composite Structures, Figure 4
Design variables for a sandwich composite panel


COSTADE


Through the collective efforts of Boeing, NASA, the University of Washington and others, a preliminary design software package called COSTADE (Cost/Composite Optimization Software for Transport Aircraft Design Evaluation) has been developed [7,8,9]. The three optimization formulations described here are available in the software. The optimization software has
been applied to the design of aircraft composite panels, 
including a crown panel [15], a keel panel [7], a window 
belt [9], and most recently to a full fuselage barrel [10]. 
Research continues in defining more general blending 
rules [6], and more accurately reflecting the manufac- 
turing considerations. The modified hit and run al- 
gorithm has been robust enough to solve the mixed 
integer-continuous global optimization problem with 
a hierarchy of objective functions efficiently. The design 
optimization has proved to be an effective aid in the de- 
sign process of composite structures. 


Over the past two decades, significant technology advances have been made in nonlinear optics due to rapid
developments of laser technology and nonlinear optical 
materials. Applications of nonlinear optics are every- 
where, for example, lasers, spectroscopy, optical switch- 
ing, parametric amplifiers and oscillators, optical com- 
puting, and communications. A remarkable applica- 
tion is to generate powerful coherent radiation at a fre- 
quency that is twice that of available lasers, so-called 
second harmonic generation (SHG). However, nonlin- 
ear optical effects are generally very weak. In order to 
obtain useful nonlinear optical effects, several methods 
may be employed. First, extremely high intensity laser 
beams or materials with very high nonlinearities could 
be used. Unfortunately, limited by the availability of 
technology and high costs of such lasers/materials, this 
method is often impractical. Another method is to in- 
crease the effective nonlinearity of the medium by us- 
ing composite materials. Such a method is currently 
an active research topic in material sciences. The third 
method is a structure-assisted method. The idea is to enhance the nonlinear interaction between the material and the light by using gratings, waveguides, or other
diffractive structures. The advantage is that the method 
is very practical and can make good use of available 
lasers and materials [10]. 

This work is devoted to optimal design or param- 
eter identification problems that arise in modeling of 
nonlinear optical thin films [2]. We shall restrict our 
attention to second order nonlinear effects, which are the simplest and are representative of other nonlinear effects. The following problems are of particular interest: from the measured transmittance and reflectance
at both frequencies, what kind of information might be 
retrieved about the medium? Given nonlinear films and 
coating materials in a multi-layered form, maximize the 
transmittance of the second harmonic field, i.e., maximize the nonlinear optical effects. Note that in the simplest setting, the problem may be formulated as a two-point boundary value problem for a first order nonlinear
system of ordinary differential equations (ODEs). One 
goal of this research is to identify physical properties of 
the medium by probing the medium with light or other 
energy sources. Another goal is to design new materials 
with desirable properties, i. e., solving an optimal design 
problem. 

Throughout we shall view the optimal design prob- 
lem as a parameter identification problem. Because 
of important applications in diverse areas of science 
and engineering, parameter identification problems for 
nonlinear systems of ODEs have been studied exten- 
sively. The advent of computers and parallel machines 
has greatly accelerated activity in this area and has 
driven the need for efficient computational methods
and algorithms. For closely related parameter identifi- 
cation for nonlinear ODE initial value problems, three 
general approaches have been developed. We refer to 
[5] for a detailed discussion. Among them, the black 
box approach is the simplest one which essentially 
separates the numerical solution of differential equa- 
tions from the optimization process. The ODE solver 
is treated as a ‘black box’ with a minimum commu- 
nication with the optimization procedure. However, 
there is a severe drawback of the approach. Due to the 
nonlinearity of the problem, the model often does not 
have a solution for all parameters. Without the neces- 
sary communication, the optimization algorithm may 
ask for a solution to the ODE with parameter values 
for which the ODE solver fails. The second approach, 
the all-together approach, first discretizes the ODE sys- 
tem and then formulates the identification problem 
as an optimization problem with equality constraints 
resulting from the ODE discretization. The third ap- 
proach often referred to as the domain decomposition 
or in-between approach divides the interval into a num- 
ber of subintervals, and uses a black box procedure 
to solve the problem on each subinterval. In order to 
solve the ODE system over the entire interval, matching 
conditions are introduced to patch together solutions 
over subintervals. Recently, J.E. Dennis, G. Li, and K.A. 
Williamson [5] have developed two families of algo- 
rithms, the in-between and altogether approaches, for 
solving ODE inverse problems. Their model problem is 
a nonlinear initial value problem. Efficient algorithms 
are introduced by a nonlinear programming formula- 
tion of the problem coupled with an orthogonal collo- 
cation scheme for solving the model ODE. The general 
idea is to tailor the algorithm to best fit the level of in- 
teraction between the optimization algorithm and the 
ODE solution technique. 

In this article, we propose a domain decomposition, or in-between, approach for solving the parameter identification problem for nonlinear ODE two-point boundary value problems. The approach has the following distinct features:

1) The problem is formulated as a constrained optimization problem, as opposed to the standard data-fitting least squares problem. The set of variables is decomposed into two parts, a set of explicit variables and a set of implicit variables, and a complexity analysis shows this decomposition to be efficient. The continuity and boundary conditions are treated as explicit constraints of the optimization problem. Hence, the parameters and the function values at the subdivision points are EXPLICIT variables, and the constraints on them need to be satisfied only at the solution, i. e., at the final step. At each iteration step, the differential equation (for the IMPLICIT variables) is solved over each subinterval independently. Furthermore, the extremely large number of linear systems for computing the Jacobian, the gradient of the constraints, and the Hessian of the Lagrangian, which account for more than 95% of the total computation, are all solved completely independently over each of the subintervals.

2) The optimization approach adopted from [4] and [5] is a much refined version of successive quadratic programming (SQP) together with a globalization strategy. The basic idea of the optimization approach, together with a discussion of sparsity structures, may be found in [5]. We point out that the approach is quite involved, since it requires computations of Jacobians and Hessians with respect to the implicit variables. Since our goal in this article is to present the main ideas of the in-between approach, the tedious technical details of the optimization techniques will be left out.

We follow the general idea of Dennis, Li, and 
Williamson [5]. However, the situation here is more 
complicated for the following reasons: 

1) Unlike in [5], we do not assume availability of the data in the interior of the interval. This feature is essential in many practical applications, for example, in nondestructive testing. Instead, we use a set of samples that correspond to a set of experiments. The samples are taken only on the boundary, by varying the sources.

2) Because the boundary value varies from sample to sample, the size of the system of differential equations is a multiple of the number of samples. Unlike in the initial value problem case, where the unknowns are independent of the number of data points, the number of unknowns in our case is the number of sources (or experiments) multiplied by the number of unknowns introduced in the domain decomposition approach. As a result, the number of nonlinear equations and linear systems that must be solved for implicit differentiation and for computing the Hessian of the Lagrangian is a multiple of the square of the number of sources. Consequently, even with a moderate number of experiments, the computational and storage costs may be so high that even mainframe computers cannot solve the problem unless special efforts are made to exploit the sparsity structures of the gradient of the constraints and of the Hessian of the Lagrangian, so as to reduce the number and the size of the resulting systems of linear and nonlinear equations.

Our approach has the flavor of the multiple or paral- 
lel shooting method for solving ODE two point bound- 
ary value problems [9]. We point out that a straightfor- 
ward modification of the approach gives rise to a vari- 
ant of the multiple shooting method. However, our em- 
phasis here is not on solving the direct problem but 
rather on solving the parameter identification problem 
or the inverse problem. 
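A small example makes the multiple shooting analogy concrete. The following is a minimal sketch and not the method of this article. It uses NumPy, a hypothetical linear model problem y'' = -y with y(0) = 0, y(1) = 1, a fixed-step RK4 integrator, and Newton's method with a forward-difference Jacobian, all chosen only for illustration. The unknowns are the states at the left end of each subinterval. The residual stacks the matching conditions between neighboring subintervals together with the two boundary conditions. Each subinterval is integrated independently, and that independence is the source of the parallelism.

```python
import numpy as np


def rhs(x, y):
    # Hypothetical model problem y'' = -y as a first-order system (y, y').
    return np.array([y[1], -y[0]])


def rk4(f, x0, y0, x1, steps=50):
    # Classical fixed-step Runge-Kutta integration of y' = f(x, y) from x0 to x1.
    h = (x1 - x0) / steps
    x, y = x0, np.array(y0, dtype=float)
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h / 2 * k1)
        k3 = f(x + h / 2, y + h / 2 * k2)
        k4 = f(x + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return y


def residual(s, nodes):
    # s holds the unknown states at the left end of each subinterval.
    ns = len(nodes) - 1
    S = s.reshape(ns, 2)
    res = []
    for i in range(ns - 1):
        # Matching condition: the trajectory leaving subinterval i must hit S[i+1].
        res.extend(rk4(rhs, nodes[i], S[i], nodes[i + 1]) - S[i + 1])
    y_end = rk4(rhs, nodes[-2], S[-1], nodes[-1])
    # Boundary conditions y(0) = 0 and y(1) = 1.
    res.extend([S[0][0] - 0.0, y_end[0] - 1.0])
    return np.array(res)


def multiple_shooting(ns=4, tol=1e-12, max_iter=20):
    nodes = np.linspace(0.0, 1.0, ns + 1)
    s = np.zeros(2 * ns)
    for _ in range(max_iter):
        r = residual(s, nodes)
        if np.linalg.norm(r) < tol:
            break
        # Forward-difference Jacobian; each column needs only subinterval solves.
        eps = 1e-7
        J = np.empty((r.size, s.size))
        for j in range(s.size):
            ds = np.zeros_like(s)
            ds[j] = eps
            J[:, j] = (residual(s + ds, nodes) - r) / eps
        s = s - np.linalg.solve(J, r)
    return s.reshape(ns, 2)
```

With ns = 4, the sketch recovers the exact solution sin x / sin 1 at the subinterval end points to integrator accuracy. In the article's approach, the interior of each subinterval is handled by collocation rather than by an explicit integrator.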


Model Formulation 


We make the following assumptions: the fields are 
transverse; the medium is stratified; and the surface 
is flat. Since the medium is stratified, the fields vary 
only in one direction. The transversality assumption al- 
lows us to reduce the Maxwell system to a system of 
Helmholtz equations. 

Let us specify the geometry of the model. Assume that a slab of stratified nonlinear material (say, composed of many layers of different nonlinear media) occupies the domain $\Omega = (0, 1)$ and is placed between two linear homogeneous materials. Suppose that the whole space is filled with material in such a way that the indices of refraction $q_1(x)$ and $q_2(x)$ at frequencies $\omega_1$ and $\omega_2 = 2\omega_1$, respectively, satisfy

$$q_j(x) = \begin{cases} q_{j1}, & x \ge 1,\\ q_{j0}(x), & x \in \Omega,\\ q_{j2}, & x \le 0, \end{cases}$$

for $j = 1, 2$, where $q_{j1}$ and $q_{j2}$ are fixed constants, and $q_{j0}$ may be piecewise constant.

Assume that a plane wave with electric field $(0, E_I e^{i q_{11} x}, 0)$ is incident on $\Omega$ from above. Using the jump conditions, we can derive the following two-point boundary value problem:

$$\Big(\frac{d^2}{dx^2} + q_1^2\Big) E_1 = \chi_1 \bar{E}_1 E_2 \quad \text{in } \Omega,$$
$$\Big(\frac{d^2}{dx^2} + q_2^2\Big) E_2 = \chi_2 E_1^2 \quad \text{in } \Omega,$$
$$E_1'(1) + i q_{11} E_1(1) = 2 i q_{11} e^{i q_{11}} E_I,$$
$$E_2'(1) + i q_{21} E_2(1) = 0,$$
$$E_1'(0) - i q_{12} E_1(0) = 0,$$
$$E_2'(0) - i q_{22} E_2(0) = 0,$$

where $E_1 = E(x, \omega_1)$, $E_2 = E(x, \omega_2)$, and $\chi_1$, $\chi_2$ characterize the nonlinearity of the medium at frequencies $\omega_1$, $\omega_2$, respectively. The most striking feature of second harmonic generation is that new frequency components (at $\omega_2$) are present.

By introducing new variables, the system may be rewritten as a first order system of ODEs. From now on, we shall consider the following two-point boundary value problem for a general system of nonlinear ODEs:

$$y' = F(x, y; p, k), \qquad A\,y(1) + B\,y(0) = g(k),$$

where $x \in (0, 1)$ is the independent variable, $y = y(x) \in \mathbb{R}^{ny}$ is the solution, $p \in \mathbb{R}^{np}$ denotes the parameters, $k \in \mathbb{R}^{nd}$ represents the source terms, and A and B are matrices possibly depending on the parameters p. Here ny is the number of dependent variables, np is the number of parameters, and nd is the number of samples or sources.

The direct problem is to determine solutions $y = y(x; p, k)$, given the parameters p and source terms g(k). Mathematically, the problem is well understood. It is well known, however, that because of the nonlinearities the existence of solutions is not obvious; roughly speaking, it depends on the regularity of F. Further, even if a solution does exist, it may not be unique because of the boundary conditions. Throughout the article, we assume that the two-point boundary value problem has a unique solution.

The parameter identification problem or inverse problem is to determine the parameters p from additional boundary data. We consider the data set $y_{\mathrm{data}}(0)_s$, $y_{\mathrm{data}}(1)_s$, $s = 1, \dots, nd$, where once again nd is the number of samples obtained by varying k. Define

$$\Phi(p) = \frac{1}{2}\sum_{s=1}^{nd}\big\|y(0; p, k_s) - y_{\mathrm{data}}(0)_s\big\|^2 + \frac{1}{2}\sum_{s=1}^{nd}\big\|y(1; p, k_s) - y_{\mathrm{data}}(1)_s\big\|^2,$$

where $\|\cdot\|$ is the vector norm defined by $\|u\|^2 = u^T u$. We are interested in identifying the parameters in the least squares sense, i. e., in finding a $p^*$ which best fits the data:

$$\begin{aligned} \min_{p}\ & \Phi(p)\\ \text{s.t.}\ & y' = F(x, y; p, k_s),\\ \text{b.c.}\ & A\,y(1; k_s) + B\,y(0; k_s) = g(k_s), \qquad s = 1, \dots, nd. \end{aligned}$$
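For real-valued data, the objective $\Phi(p)$ is half the sum of the squared boundary mismatches over all samples. The pure-Python sketch below assumes that a forward solver (not shown) has already produced the model values $y(0; p, k_s)$ and $y(1; p, k_s)$. The function and argument names are hypothetical.

```python
def misfit(pred0, pred1, data0, data1):
    """Phi(p) = 1/2 sum_s ||y(0; p, k_s) - ydata(0)_s||^2
              + 1/2 sum_s ||y(1; p, k_s) - ydata(1)_s||^2.

    pred0[s], pred1[s]: model boundary values for sample s (real vectors).
    data0[s], data1[s]: measured boundary data for sample s.
    """
    def sq(u, v):
        # ||u - v||^2 with the Euclidean inner product u^T u.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    total = 0.0
    for s in range(len(data0)):
        total += 0.5 * sq(pred0[s], data0[s]) + 0.5 * sq(pred1[s], data1[s])
    return total
```

Each sample enters additively. This is why adding experiments enlarges the problem without changing its structure.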


Collocation Scheme 


We next describe a collocation scheme for solving the nonlinear system of ODEs. Here, we follow the general procedure given in [5]. Let us begin by dividing the interval (0, 1) into ns subintervals $[x_i, x_{i+1})$, for $i = 1, \dots, ns$, with $x_1 = 0$ and $x_{ns+1} = 1$. On the ith subinterval, the function y(x) may be approximated by a polynomial:

$$y(x) \approx \sum_{j=0}^{nc} z_{ij}\,\psi_{ij}(x),$$

where nc is the number of collocation points in one subinterval, $\{\psi_{ij}(x)\}$ is a basis of Lagrange polynomials of degree nc at the points $t_{ij}$ (with $t_{i0} = x_i$ the left end point), $z_{ij}$ are the collocation coefficients to be determined, and $t_{ij}$ is the jth collocation point on the ith subinterval.
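Because the basis is Lagrange, each coefficient $z_{ij}$ is the value of the approximation at $t_{ij}$. The derivatives $\psi_{ik}'(t_{ij})$ are the quantities that enter the collocation conditions. The pure-Python sketch below works on a single subinterval. It assumes nc + 1 equally spaced points purely for illustration; orthogonal collocation codes typically use Gauss-type points instead. All function names are hypothetical.

```python
def lagrange_basis(nodes, k, x):
    # psi_k(x) = prod_{m != k} (x - t_m) / (t_k - t_m): 1 at t_k, 0 at other nodes.
    val = 1.0
    for m, t in enumerate(nodes):
        if m != k:
            val *= (x - t) / (nodes[k] - t)
    return val


def lagrange_basis_derivative(nodes, k, x):
    # psi_k'(x) by the product rule: sum over which factor is differentiated.
    total = 0.0
    for l, tl in enumerate(nodes):
        if l == k:
            continue
        term = 1.0 / (nodes[k] - tl)
        for m, tm in enumerate(nodes):
            if m != k and m != l:
                term *= (x - tm) / (nodes[k] - tm)
        total += term
    return total


def interpolant(nodes, z, x):
    # y(x) ~ sum_k z_k psi_k(x); z_k is the value of the approximation at t_k.
    return sum(zk * lagrange_basis(nodes, k, x) for k, zk in enumerate(z))
```

With nc = 4 (five points), any cubic is reproduced exactly, together with its derivative. A collocation residual at $t_{ij}$ is then $\sum_k z_{ik}\psi_{ik}'(t_{ij}) - F(\dots)$.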

The collocation method requires the above piecewise polynomial approximation to satisfy the ODE at the collocation points on each of the subintervals. For the sth sample, $s = 1, \dots, nd$, solving for $\{z_{ij}^{s}\}$ leads to an approximation to y at the collocation points. This step leads to the following collocation conditions: $h_1(p, z) = 0$, where for $s = 1, \dots, nd$, $j = 1, \dots, nc$, $i = 1, \dots, ns$,

$$h_1(p, z)_{sij} = \sum_{l=0}^{nc} z_{il}^{s}\,\psi_{il}'(t_{ij}) - F(t_{ij}, z_{ij}^{s}; p, k_s).$$

In order to approximate the solution y(x) over the interval (0, 1), we need to patch together the approximations over the subintervals. This can be done by enforcing continuity conditions at all end points of the subintervals except the two end points $x_1 = 0$ and $x_{ns+1} = 1$. The continuity conditions are natural, since the solution is assumed to be continuous: $h_2(p, z) = 0$, where for $s = 1, \dots, nd$, $i = 2, \dots, ns$,

$$h_2(p, z)_{si} = z_{i0}^{s} - \sum_{l=0}^{nc} z_{i-1,l}^{s}\,\psi_{i-1,l}(t_{i0}).$$

In addition, we want to enforce the boundary value conditions: $h_3(p, z) = 0$, where for $s = 1, \dots, nd$,

$$h_3(p, z)_{s} = A\sum_{l=0}^{nc} z_{ns,l}^{s}\,\psi_{ns,l}(1) + B\,z_{10}^{s} - g(k_s).$$

Stacking, for q = 1, 2, 3, the conditions $h_q$ over all samples $s = 1, \dots, nd$, and combining the above conditions, we then have

$$h(p, z) = 0, \qquad h = (h_1, h_2, h_3)^T. \qquad (1)$$


Domain Decomposition 


It follows that the parameter identification problem may be formulated as a nonlinear programming problem:

$$\min_{p,\,z}\ \Phi(p, z) \quad \text{s.t.}\quad h(p, z) = 0.$$

If both the parameters and the collocation coefficients are treated as independent variables, i.e., the variables are treated all together, then the approach is called all-together or all-at-once. In this approach, the nonlinear system of equations h(p, z) = 0 gives rise to a set of explicit constraints. Thus, p and z need to satisfy the constraints only at the solution. One can readily verify that the dimension of the problem, or the number of unknowns, in this case is np + (ns + nc × ns) × ny × nd. There are two potential drawbacks to this approach. First, because all of the variables are treated as independent variables, the resulting nonlinear programming problem can be very large, and sophisticated optimization techniques are impractical for large problems. Second, the approach does not support parallel structures, so it is difficult to make it efficient in a parallel environment.

In order to exploit parallelism and reduce the size of the nonlinear programming problem, we propose a domain decomposition or in-between approach which follows [5] in principle. The basic idea of the domain decomposition approach is to identify a set of implicit constraints, i.e., constraints that are satisfied at every iteration. More specifically, for each sample, this approach allows us to treat only the model parameters p and the subset of collocation coefficients corresponding to the left ends of the subintervals, $z_{i0}$, $i = 1, \dots, ns$, as the independent variables of the nonlinear programming problem. Thus the dimension of the problem is reduced to np + ns × ny × nd.

We decompose the set of collocation coefficients into a set of explicit variables

$$z_E = \{z_{i0}^{s}:\ i = 1, \dots, ns;\ s = 1, \dots, nd\}$$

and a set of implicit variables

$$z_I = \{z_{ij}^{s}:\ i = 1, \dots, ns;\ j = 1, \dots, nc;\ s = 1, \dots, nd\},$$

where again $z_{ij}$, $j \ne 0$, are determined by the collocation conditions.

Given p and $z_E$, we then solve the nonlinear system $h_1(p, z_I) = 0$ for the implicit variables, independently on each subinterval. The dimension of the system on a subinterval is ny × nc × nd. The special structure of the problem allows us to break this nonlinear system into nd independent systems, each of dimension ny × nc. Note that the continuity conditions and the boundary conditions are constraints of the nonlinear programming problem; they are satisfied only at the final solution $(p^*, z^*)$ of the optimization problem.

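Therefore, the problem becomes:

$$\begin{aligned} \min_{p,\,z_E}\ & \Phi\big(p, z_E, z_I(p, z_E)\big)\\ \text{s.t.}\ & h_2\big(z_E, z_I(p, z_E)\big) = 0,\\ & h_3\big(z_E, z_I(p, z_E)\big) = 0, \end{aligned}$$

where $z_I(p, z_E)$ solves $h_1(z_I; p, z_E) = 0$.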


Remark 1 A simple calculation indicates that the dimension of the new nonlinear programming problem is np + ns × ny × nd and the number of constraints is ns × ny × nd.
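The problem-size bookkeeping for the all-together and in-between formulations is easy to check. The pure-Python sketch below uses a hypothetical function name. It counts ny boundary conditions per sample (one row of $A\,y(1) + B\,y(0) = g(k)$ per dependent variable) and (ns − 1) × ny continuity conditions per sample.

```python
def nlp_sizes(n_p, ny, nd, ns, nc):
    """Return (all-together size, in-between size, number of constraints)."""
    # All-together: p plus every collocation coefficient z_ij^s, j = 0..nc.
    all_together = n_p + (ns + nc * ns) * ny * nd
    # In-between: p plus only the explicit coefficients z_i0^s.
    in_between = n_p + ns * ny * nd
    # (ns - 1) interfaces of continuity plus ny boundary conditions, per sample.
    constraints = (ns - 1) * ny * nd + ny * nd
    return all_together, in_between, constraints
```

With np = ny = 4, the in-between counts match the $n_x$ and $n_h$ values reported in the numerical experiment (84 and 80 for ns = 20, nd = 1; 644 and 640 for ns = 32, nd = 5). The all-together formulation would need 404 and 3204 variables for the same cases.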


We now briefly discuss some implementation issues for the domain decomposition approach. The number of nonlinear systems on one subinterval is (np + ny × nd + 1) × nd. Implicit differentiation should be used to compute the first order partial derivatives of the implicit variables with respect to the explicit variables. The total number of linear systems to be solved for the implicit differentiation on one subinterval is (np + ny × nd + 1) × nd × (ny + np). This number is usually quite large. However, the linear systems are independent not only on each of the subintervals but also for each of the samples. Furthermore, for one sample on one subinterval, the coefficient matrices of the linear systems are all the same. Therefore, only one LU factorization is necessary; the rest are hundreds of independent triangular solves. The second order derivatives are computed by using the finite-difference technique. The main advantage of this approach is that the resulting nonlinear systems are independent on each of the subintervals. Thus, there is no communication between the subintervals, an ideal feature for parallel computation.
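Using one LU factorization for every linear system of a given sample and subinterval is the usual factor-once, solve-many pattern. Below is a self-contained pure-Python sketch with partial pivoting. In practice one would call a LAPACK-based routine instead. Here the matrix stands in for the shared coefficient matrix, and each right-hand side stands in for one implicit-differentiation system. The function names are hypothetical.

```python
def lu_factor(a):
    """Doolittle LU with partial pivoting; L (unit diagonal) and U packed in lu."""
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    for k in range(n):
        # Choose the largest pivot in column k for numerical stability.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    # Two triangular solves per right-hand side; the factorization is reused.
    n = len(lu)
    y = [b[piv[i]] for i in range(n)]
    for i in range(n):  # forward substitution with the unit lower triangle
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):  # back substitution with the upper triangle
        y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
    return y
```

The O(n^3) factorization is paid once, and each additional right-hand side costs only O(n^2). This is why the many linear systems per subinterval are cheap once the first one is factored.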


Optimization 


We have formulated the problem as a nonlinear programming problem by using the domain decomposition approach. We next describe a general method, developed originally in [5], for solving this type of optimization problem. The optimization algorithm is based on successive quadratic programming (SQP) with a trust region globalization. The idea is to adopt different techniques depending on how close the current approximation is to the solution. If it is ‘close’ to the solution, we choose the step to be the solution of the quadratic program:

minimize  a quadratic model  subject to  linearized constraints.

Otherwise, if it is ‘far’ from the solution, we choose the step to be the solution of a trust region subproblem.

The algorithm for the QP is robust. It forms the reduced Hessian and determines whether a solution exists. Note that if the reduced Hessian is not positive definite, then the QP may have an infinite number of solutions or no solution at all. If a solution does exist, the algorithm will find it; when the QP does not have a solution, the algorithm calculates a descent direction instead. The trust region globalization technique of [4] is employed to deal with the possible lack of positive definiteness of the reduced Hessian. In addition, the algorithm handles degeneracies in the linearized constraints by using an eigendecomposition.
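The ‘close’ step is an equality-constrained QP. The NumPy sketch below assumes the following are given: the gradient g of Φ, a Hessian approximation W of the Lagrangian, and the constraint values c with Jacobian J. These names are hypothetical. The sketch first tests whether the reduced Hessian $Z^T W Z$ is positive definite, where Z spans the null space of J, and only then solves the KKT system. It does not implement the trust-region fallback or the degeneracy handling of [4].

```python
import numpy as np


def null_space_basis(J, tol=1e-12):
    # Orthonormal basis Z of {d : J d = 0}, taken from the SVD of J.
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    return vt[rank:].T


def eq_qp_step(g, W, J, c):
    """Solve min g^T d + 1/2 d^T W d subject to J d + c = 0.

    Returns None when the reduced Hessian Z^T W Z is not positive definite,
    i.e. when the QP is unbounded or has no unique minimizer.
    """
    Z = null_space_basis(J)
    if Z.shape[1] > 0 and np.linalg.eigvalsh(Z.T @ W @ Z).min() <= 0.0:
        return None
    n, m = W.shape[0], J.shape[0]
    # KKT system: [[W, J^T], [J, 0]] [d; mu] = [-g; -c], mu = constraint multipliers.
    K = np.block([[W, J.T], [J, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, -c]))
    return sol[:n]
```

When `None` is returned, an algorithm of the kind described above would fall back to a descent direction or a trust region step rather than use the QP solution.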

Numerical Experiment


Our test problem is based on a simplified model of the system of nonlinear Maxwell’s equations that arises in modeling second harmonic generation in nonlinear optical thin films, under some assumptions [2]. Let $Y = (y_1, y_2, y_3, y_4)^T$. The model problem takes the following form:

$$Y' = \big(y_2,\ -p_1 y_1 + p_3 y_1 y_3,\ y_4,\ p_4 y_1^2 - p_2 y_3\big)^T$$

with the boundary condition

$$A\,Y(1) + B\,Y(0) = g,$$

where

$$A = \begin{pmatrix} a_{11} & 1 & 0 & 0\\ 0 & 0 & a_{21} & 1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ a_{12} & 1 & 0 & 0\\ 0 & 0 & a_{22} & 1 \end{pmatrix},$$

and $g = (g_k, 0, 0, 0)^T$.
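For orientation, the direct problem for this test case can be solved by simple shooting. The last two rows of $B\,Y(0) = 0$ fix $y_2(0)$ and $y_4(0)$ in terms of $y_1(0)$ and $y_3(0)$. The pure-Python sketch below uses the constants and parameter values of the experiment described in the next paragraph. The RK4 step count, the forward-difference Newton iteration, and all function names are illustrative choices. This is not the collocation/domain-decomposition solver of the article.

```python
# Constants a11, a21, a12, a22 and the parameters p1..p4 used to generate data.
A11, A21, A12, A22 = 5.0, 4.0, -2.0, -3.0
P_TRUE = (0.005, 0.003, 0.0015, 0.025)


def F(y, p):
    # Y' = (y2, -p1 y1 + p3 y1 y3, y4, p4 y1^2 - p2 y3).
    p1, p2, p3, p4 = p
    y1, y2, y3, y4 = y
    return [y2, -p1 * y1 + p3 * y1 * y3, y4, p4 * y1 ** 2 - p2 * y3]


def integrate(y0, p, steps=200):
    # Classical fixed-step RK4 from x = 0 to x = 1 (the system is autonomous).
    h = 1.0 / steps
    y = list(y0)
    for _ in range(steps):
        k1 = F(y, p)
        k2 = F([a + 0.5 * h * b for a, b in zip(y, k1)], p)
        k3 = F([a + 0.5 * h * b for a, b in zip(y, k2)], p)
        k4 = F([a + h * b for a, b in zip(y, k3)], p)
        y = [a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y


def shoot(c, g, p):
    # Rows 3-4 of B Y(0) = 0: y2(0) = -a12 y1(0), y4(0) = -a22 y3(0).
    y0 = [c[0], -A12 * c[0], c[1], -A22 * c[1]]
    y_end = integrate(y0, p)
    # Rows 1-2 of A Y(1) = (g, 0, 0, 0)^T.
    r = [A11 * y_end[0] + y_end[1] - g, A21 * y_end[2] + y_end[3]]
    return r, y0, y_end


def solve_forward(g, p=P_TRUE, tol=1e-10, max_iter=50):
    # Newton's method on the 2x2 boundary residual, forward-difference Jacobian.
    c = [0.0, 0.0]
    for _ in range(max_iter):
        r, y0, y_end = shoot(c, g, p)
        if max(abs(v) for v in r) < tol:
            break
        eps = 1e-7
        cols = []
        for j in range(2):
            cp = list(c)
            cp[j] += eps
            rp = shoot(cp, g, p)[0]
            cols.append([(rp[i] - r[i]) / eps for i in range(2)])
        (j11, j21), (j12, j22) = cols
        det = j11 * j22 - j12 * j21
        # Newton update c <- c - J^{-1} r by Cramer's rule.
        c = [c[0] - (r[0] * j22 - j12 * r[1]) / det,
             c[1] - (j11 * r[1] - j21 * r[0]) / det]
    return y0, y_end
```

Calling `solve_forward(g_k, p)` for each source $g_k$ yields the boundary values $Y(0)$ and $Y(1)$ that play the role of the synthetic data. Because p is small, the solution stays close to that of the linear problem, and Newton's method converges in a few iterations.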

In this over-simplified case, the inverse problem is to determine the parameters $p = (p_1, p_2, p_3, p_4)$ from the measured Y(0) and Y(1) for a given source term $g_k$. Since p, which characterizes the physical properties of the medium, is independent of $g_k$, it is natural to expect a better reconstruction of p from a set of experiments, i.e., from a number of values $g_k$. Here the constant $g_k$ represents the intensity (power) of the incident light, which may vary with k in a given range.

In our experiment, the known constants were chosen as follows: $a_{11} = 5$, $a_{21} = 4$, $a_{12} = -2$ and $a_{22} = -3$. We used five pairs of sample boundary data at the end points. These data were generated by using the fixed parameters $p_1 = 0.005$, $p_2 = 0.003$, $p_3 = 0.0015$ and $p_4 = 0.025$. We used five samples $g_k$, k = 1, ..., 5, where $g_1 = 2$, $g_2 = 5$, $g_3 = -2$, $g_4 = -3$ and $g_5 = -5$. We chose the number of collocation points nc = 4. Clearly, in this example np = 4 and ny = 4. From the data, we then tried to recover the parameters p. The experiment was done on a Cray J916, a 16-CPU shared memory computer. The results are shown in the following three tables, where ns is the number of subintervals, nd is the number of samples, $n_x$ is the number of variables of the NLP problem, $n_h$ is the number of constraints, nit is the number of iterations for the algorithm to converge, error is the norm of the relative error between the estimated parameter and the exact parameter, and ncpu is the number of CPUs used for solving the problem.

Optimal Design in Nonlinear Optics, Table 1
Accuracy improvement with increasing number of samples (ns = 20, ncpu = 4)

nd    n_x    n_h    nit    error      CPU time
1     84     80     8      1.36e-2    6.16
2     164    160    5      1.12e-6    12.17
3     244    240    5      8.16e-7    26.43
4     324    320    5      6.29e-7    47.76
5     404    400    5      6.12e-7    69.04

Optimal Design in Nonlinear Optics, Table 2
Convergence effect of the number of subintervals (nd = 4, ncpu = 4)

ns    n_x    n_h    nit    error      CPU time
8     100    96     ?      1.28e-5    19.18
16    260    256    5      8.81e-6    31.97
32    516    512    5      3.65e-7    93.45

Table 1 shows the accuracy improvement when the number of samples is increased from one to five. One may observe that after some point, additional data make very little difference in the reconstruction of the parameters. This is largely because the accuracy to which the nonlinear systems of equations are solved has already been reached. In Table 2, we demonstrate the effect of increasing the number of subintervals, i. e., refining the grid. Finally, Table 3 shows the total CPU time and the speed-up for different numbers of CPUs.
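The speed-ups in Table 3 are consistent with reading ‘total CPU time’ as CPU time summed over all processors. On that reading, the speed-up is ncpu × T(1) / T(ncpu). This interpretation is an inference from the numbers, not a statement of the original article. The short Python sketch below checks it against the legible rows and computes the implied parallel efficiency. The function names are hypothetical.

```python
# Legible rows of Table 3: ncpu -> total CPU time.
TOTAL_CPU = {1: 118.7, 2: 117.0, 4: 118.4, 8: 124.3,
             10: 127.4, 12: 131.6, 16: 140.4}


def speedup(ncpu, total_cpu=TOTAL_CPU):
    # Assumed reading: T(ncpu) is aggregate CPU time, so the wall-clock time is
    # T(ncpu) / ncpu and the speed-up is ncpu * T(1) / T(ncpu).
    return ncpu * total_cpu[1] / total_cpu[ncpu]


def efficiency(ncpu, total_cpu=TOTAL_CPU):
    # Parallel efficiency = speed-up / ncpu = T(1) / T(ncpu).
    return speedup(ncpu, total_cpu) / ncpu
```

Under this reading, the efficiency falls gently from 1 to about 0.85 at 16 CPUs. This matches the absence of communication between subintervals in the approach.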


Discussion 


We present a general approach to parameter identification for nonlinear ODE two-point boundary value problems that arise in optimal design in nonlinear optics. The data for our parameter identification problems are given only at the boundary points. Consequently, the problem size is a multiple of the number of boundary data pairs. Our approach is based on ideas of nonlinear programming and domain decomposition. It extends [5] to a more general setting. Our preliminary numerical results indicate that the method is not only efficient on parallel machines but also effective on sequential machines. We also develop a technique to reduce the size of the linear and nonlinear systems resulting from the approach.

Optimal Design in Nonlinear Optics, Table 3
Speed-up for different numbers of CPUs (n_x = 644, n_h = 640, nd = 5, ns = 32)

ncpu    Total CPU time    Speed-up
1       118.7             1
2       117.0             2.03
4       118.4             4.01
6       ?                 5.83
8       124.3             7.64
10      127.4             9.32
12      131.6             10.82
14      136.1             ?
16      140.4             13.53

A new research topic is to use the general approach 
developed in this article and [5] to solve inverse scat- 
tering and diffraction (PDE) problems. A crucial step 
is to develop a fast and efficient domain decomposi- 
tion solver for the direct problem. Similar ideas have 
recently been used by Dennis and R.M. Lewis [6] for 
solving an inverse conductivity problem. The interested 
reader is referred to [1] [3] [8] for other results on re- 
lated optimal design problems. 

The multiphase, multireaction chemical equilibrium
problem (the CEP) is a nonlinear optimization problem 
in chemical thermodynamics that is of interest in many 
fields. Some examples of application areas include bio- 
chemistry [23], chemical engineering [32], chemistry 
[2], geochemistry [3], metallurgy [21], and plasma sci- 
ence [10]. The CEP has a number of special features 
arising from its unique structure [9,28]. This makes 
its numerical solution especially challenging, leading to 
difficulties associated with multiple local minima and 
numerical scaling. General references to the CEP in- 
clude [8,27], and [33]. A recent review of aspects of the 
CEP has been given in [24]. 

The CEP may be expressed in two general ways. The more commonly encountered one is formulated as a global optimization problem, in which the objective function is specified at a ‘macroscopic level’ in terms of its analytical dependence on the underlying composition variables and an appropriate set of parameters. The other is formulated at a ‘molecular level’, in terms of an underlying intermolecular force model, with the optimization problem solved by means of a Monte-Carlo simulation method based on statistical mechanics [30]. In this article, we consider only the former viewpoint, and aspects of its formulation and solution.

The formulation and numerical solution of the CEP 
require, first, an assumption about which chemical sub- 
stances are to be considered; and, second, about their 
distribution over possible phases. The latter may take 
two forms. One form of the CEP assumes that all sub- 
stances are represented in all possible phases, and is re- 
ferred to as the universally accessible form (UAF). The 
other form prohibits at least one substance from be- 
ing present in all phases, and is referred to as the re- 
stricted accessibility form (RAF). Distinguishing these 
two forms has important consequences, as discussed 
below. 


A special case of the CEP is the phase equilibrium 
problem (PEP), which involves a set of chemical sub- 
stances which can distribute themselves among two or 
more phases, in the absence of chemical reaction. An 
example of UAF-PEP is the system n-butanol + water 
+ n-butyl acetate [5]. This type of problem is impor- 
tant in the chemical processing industry, and there is 
much current interest in the development of efficient 
algorithms for its numerical solution. Elementary ex- 
amples of RAF-PEP are ‘simple eutectic’ systems and 
‘steam distillation’, as usually modeled. 

UAF and RAF forms also arise in the general 
CEP in which chemical reactions occur, complicating 
their formulation and numerical solution. Examples 
of RAF-CEP include multiphase problems involving 
condensed phases, and multiphase problems involving 
ionic species accessible to only a single phase. 


Problem Formulation 


Defining a CEP requires specifying two thermody- 
namic variables from the set {P, V, T, U, H, S, G, A}. 
{T, P} is a common choice and is primarily considered 
here; this choice implies that the objective function to 
be optimized is the Gibbs function, G. Other choices can 
be formulated in terms of this choice (see the section 
‘Sensitivity Analysis’ below). 

One of the most general problem formulations has been given in [25], which we summarize here. We assume the following are given:

1) a substance formula matrix, $A_S \in E^{M \times S}$, where M is the number of elements and S is the number of substances, each of which is described by a formula vector $a_i$ with entries $a_{ji}$, which is a column of $A_S$ (electric charge is considered to be an element; in the unusual situation in which ionic species are accessible to more than one phase, a ‘charge row’ must be included in $A_S$ for each phase); we assume here that $\operatorname{rank}(A_S) < S$; the rank is usually given by M (as assumed herein);

2) an elemental-abundance vector, $b \in E^{M}$, with entries $b_j \ge 0$ and $b \ne 0$;

3) a set of π chemical potential models or phase classes, $\mu^{\beta}(T, P, x^{\beta}, \alpha^{\beta})$, $\beta = 1, \dots, \pi$, where $\mu^{\beta}: E^{2 + s^{\beta} + \varphi^{\beta}} \to E^{s^{\beta}}$ and:

- $x^{\beta} \in E^{s^{\beta}}$ is the composition (e. g., mole-fraction) vector for $\mu^{\beta}$;

- $\alpha^{\beta}$ is the vector of parameters for $\mu^{\beta}$;

- $I^{\beta}$ is the substance index set for phase class β, containing the set of subscripts of substances that are accessible to the phase class. For a UAF, $I^{\beta} = \{1, \dots, S\}$ for all β. For an RAF, at least one substance subscript is absent from at least one $I^{\beta}$. $s^{\beta} = |I^{\beta}|$ denotes the number of substances deemed to be accessible to any phase consistent with $\mu^{\beta}$;

- $\varphi^{\beta}$ is the number of chemical potential model parameters given by $\alpha^{\beta}$;

- each phase class satisfies the Gibbs-Duhem equation, as well as the limiting law $\lim_{x_i^{\beta} \to 0} \mu_i^{\beta} = -\infty$.

To elaborate on the term ‘phase class’ and to relate it to the term ‘phase’, we note that:

3.1) every substance in the system is represented in at least one phase class, i.e., $\bigcup_{\beta=1}^{\pi} I^{\beta} = \{1, \dots, S\}$; we emphasize that not every substance need be represented in every phase class.

3.2) there may be more than one phase accessible to a given phase class β; for each such phase k arising from a particular chemical potential model $\mu^{\beta}$, we have $\mu^{\beta k} = \mu^{\beta}(T, P, x^{\beta k}, \alpha^{\beta})$, where $x^{\beta k} \in E^{s^{\beta}}$ is the composition vector for the phase.

3.3) although the number of phase classes, π, is specified a priori, the number of phases, $\pi^{\beta}$, ‘accessible’ to the phase class β may not be known a priori; furthermore, π and $\pi^{\beta}$ are distinct from the total number of phases present at equilibrium in nonzero amounts, Π, which is also not known a priori, but is a result of the equilibrium computation.

3.4) The term species refers to a substance in a specific phase. Since $\pi^{\beta}$ is generally unknown, construction of a species index set is not always possible.

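The general statement of the chemical equilibrium problem at specified T, P and b is given by (omitting the dependence of $\mu_i^{\beta}$ on T, P, $\alpha^{\beta}$)

$$\min_{\{\hat n^{\beta k},\, x^{\beta k}\}}\ G = \sum_{\beta=1}^{\pi}\sum_{k=1}^{\pi^{\beta}} \hat n^{\beta k} \sum_{i \in I^{\beta}} x_i^{\beta k}\,\mu_i^{\beta}(x^{\beta k}), \qquad (1)$$

subject to

$$\hat n^{\beta k} \ge 0, \quad \beta = 1, \dots, \pi;\ k = 1, \dots, \pi^{\beta}, \qquad (2)$$

$$\sum_{\beta=1}^{\pi}\sum_{k=1}^{\pi^{\beta}} \hat n^{\beta k} \sum_{i \in I^{\beta}} a_{ji}\,x_i^{\beta k} - b_j = 0, \quad j = 1, \dots, M, \qquad (4)$$

$$\sum_{i \in I^{\beta}} x_i^{\beta k} = 1, \quad \beta = 1, \dots, \pi;\ k = 1, \dots, \pi^{\beta}, \qquad (5)$$

where $\hat n^{\beta k}$ is the total number of moles in phase β, k. In all cases we assume that at least one feasible solution exists.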
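In the simplest case, there is a single phase described by the ideal-solution model (21) given later in this article. The objective (1) then reduces to $G = \sum_i n_i(\mu_i^{*} + RT\ln x_i)$ with $n_i = \hat n\,x_i$. The pure-Python sketch below evaluates G and the element-balance residuals (4) for such a phase. The function names are hypothetical. The usage values are the formula matrix of CO, CO2 and O2 (rows C and O) with arbitrary mole numbers, not data from the article.

```python
import math


def gibbs_ideal(n, mu_star, RT):
    # G = sum_i n_i (mu_i* + RT ln x_i), x_i = n_i / sum(n); n_i = 0 terms vanish.
    total = sum(n)
    return sum(ni * (mi + RT * math.log(ni / total))
               for ni, mi in zip(n, mu_star) if ni > 0)


def element_residual(A, n, b):
    # Element balance (4) for one phase: sum_i a_ji n_i - b_j, j = 1..M.
    return [sum(a_ji * ni for a_ji, ni in zip(row, n)) - bj
            for row, bj in zip(A, b)]
```

Here A = [[1, 1, 0], [1, 2, 2]] and n = (1, 1, 0.5) for (CO, CO2, O2) give element totals of 2 for C and 4 for O, so the residuals for b = (2, 4) are zero. A minimizer of G would search over all such feasible n.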


The Special Case of Phase Equilibrium (PEP)


In the PEP case, the problem statement is given by (1)-(3) and (5), with the element-balance equations (4) replaced by the substance balance:

$$\sum_{\beta:\, i \in I^{\beta}}\ \sum_{k=1}^{\pi^{\beta}} \hat n^{\beta k} x_i^{\beta k} - q_i = 0, \quad i = 1, \dots, S, \qquad (6)$$

where $q_i$ (> 0) is the (constant) total number of moles of substance i in the system.


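Kuhn-Tucker (KT) Conditions


We call a solution $\{\hat n^{\beta k}, x^{\beta k}, \lambda, \theta\}$ (λ and θ are Lagrange multipliers) of the following equations a Kuhn-Tucker point (KT point):

$$\hat n^{\beta k}\Big[\mu_i^{\beta}(x^{\beta k}) - \sum_{j=1}^{M} a_{ji}\lambda_j\Big] = 0, \quad i \in I^{\beta};\ \beta = 1, \dots, \pi;\ k = 1, \dots, \pi^{\beta}, \qquad (7)$$

$$\theta^{\beta k} = \sum_{i \in I^{\beta}} x_i^{\beta k}\Big[\mu_i^{\beta}(x^{\beta k}) - \sum_{j=1}^{M} a_{ji}\lambda_j\Big] \ge 0, \quad \beta = 1, \dots, \pi;\ k = 1, \dots, \pi^{\beta}, \qquad (8)$$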

Alternative Form of KT Conditions


The KT conditions given above are referred to as the nonstoichiometric form of the equilibrium conditions [27, pp. 46-48]. These conditions may be expressed in an alternative form, the stoichiometric form [27, pp. 45-46], which arises from expressing the element abundances in terms of component species ([27, Chap. 2], [29]). Equations (7), (8), and (10) are replaced, respectively, by

$$\hat n^{\beta k}\Big[\mu_i^{\beta}(x^{\beta k}) - \sum_{j=1}^{M} \nu_{ij}\lambda_j\Big] = 0, \quad i \in I^{\beta};\ \beta = 1, \dots, \pi;\ k = 1, \dots, \pi^{\beta}, \qquad (11)$$

$$\theta^{\beta k} = \sum_{i \in I^{\beta}} x_i^{\beta k}\Big[\mu_i^{\beta}(x^{\beta k}) - \sum_{j=1}^{M} \nu_{ij}\lambda_j\Big] \ge 0, \quad \beta = 1, \dots, \pi;\ k = 1, \dots, \pi^{\beta}, \qquad (12)$$

$$\sum_{\beta=1}^{\pi}\sum_{k=1}^{\pi^{\beta}} \hat n^{\beta k} \sum_{i \in I^{\beta}} \nu_{ij}\,x_i^{\beta k} - q_j = 0, \quad j = 1, \dots, M, \qquad (13)$$

where $\nu_{ij}$ is a stoichiometric coefficient for species i with respect to component species j, and $q_j$ is the total amount of component j.


Global Optimality Conditions (Reaction Tangent-Plane Criterion, RTPC)


A necessary and sufficient condition for global optimality of a KT point is that the objective function is nowhere smaller than its value at that point. This may be expressed as:

$$\sum_{\beta=1}^{\pi}\sum_{k=1}^{\pi^{\beta}} \hat n^{\beta k} \sum_{i \in I^{\beta}} x_i^{\beta k}\Big[\mu_i^{\beta}(x^{\beta k}) - \sum_{j=1}^{M} a_{ji}\lambda_j^{*}\Big] \ge 0 \qquad (14)$$

over all $\{\hat n^{\beta k}, x^{\beta k}\}$ satisfying (2)-(5), where $\lambda^{*}$ is the Lagrange multiplier at the KT point.


The above criterion takes different special forms for a UAF and for an RAF. If formation of a potential new phase is feasible in terms of the element-balance equations (which is always the case for a UAF, but not necessarily for an RAF), then we may consider each phase class individually, and obtain a simpler set of conditions involving only the mole fraction variables for the phase class from the inner summation in relation (14):

$$\sum_{i \in I^{\beta}} x_i^{\beta}\Big[\mu_i^{\beta}(x^{\beta}) - \sum_{j=1}^{M} a_{ji}\lambda_j^{*}\Big] \ge 0 \qquad (15)$$

over all $\{\hat n^{\beta k}, x^{\beta k}\}$ satisfying equations (2) to (6). This is called the reaction tangent-plane criterion (RTPC) [9,25].

For a UAF, all $I^{\beta}$ contain the complete set of substances. By equation (7), each summation involving the Lagrange multipliers $\lambda_j^{*}$ in the above then gives the chemical potential $\mu_i^{*}$ at the KT point. Criterion (15) then becomes

$$\sum_{i=1}^{S} x_i\big(\mu_i(x) - \mu_i^{*}\big) \ge 0. \qquad (16)$$

This is the tangent-plane criterion (TPC) for a PEP [4] (see also [15,18]).
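The sketch below shows how criterion (16) is used as a stability test. It is a minimal pure-Python example for a binary liquid described by the two-suffix Margules activity-coefficient model, which is an assumption of this example and not part of the formulation above. With $\mu_i = \mu_i^{*} + RT\ln(\gamma_i x_i)$ as in (20), the standard-state terms cancel. A negative tangent-plane distance at some trial composition x then shows that the single-phase state z is not the global minimum. The grid scan is a crude stand-in for a proper global search, and the function names are hypothetical.

```python
import math


def margules_ln_gamma(x1, a):
    # Two-suffix Margules model (assumed): ln g1 = a x2^2, ln g2 = a x1^2, a = A/RT.
    x2 = 1.0 - x1
    return a * x2 * x2, a * x1 * x1


def tangent_plane_distance(x1, z1, a):
    """D(x)/RT = sum_i x_i [mu_i(x) - mu_i(z)] / RT for a binary mixture.

    The standard-state terms mu_i* cancel. A negative value for some trial x
    means the single phase of composition z violates criterion (16).
    """
    gx = margules_ln_gamma(x1, a)
    gz = margules_ln_gamma(z1, a)
    xs, zs = (x1, 1.0 - x1), (z1, 1.0 - z1)
    return sum(xi * (math.log(xi) + gxi - math.log(zi) - gzi)
               for xi, zi, gxi, gzi in zip(xs, zs, gx, gz))


def is_stable(z1, a, grid=999):
    # Crude global check: scan trial compositions strictly inside (0, 1).
    worst = min(tangent_plane_distance(k / (grid + 1), z1, a)
                for k in range(1, grid + 1))
    return worst >= -1e-12
```

For this model, the molar Gibbs function of mixing is convex whenever a = A/RT < 2. An equimolar mixture therefore passes the test for a = 1 and fails it for a = 3, where D is negative near x1 = 0.1.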

For an RAF, criterion (15) is not equivalent to criterion (16). A simple example arises in the system containing {CO(g), CO2(g), O2(g), C(s)}. There are two phase classes, with index sets $I^1 = \{1, 2, 3\}$ and $I^2 = \{4\}$. Consider a KT point at which only the gas phase is present, and denote the Lagrange multipliers as $\lambda_{\mathrm{C}}^{*}$, $\lambda_{\mathrm{O}}^{*}$. To test for the presence of the solid phase, criterion (15) is

$$\mu_{\mathrm{C(s)}} - \lambda_{\mathrm{C}}^{*} \ge 0. \qquad (17)$$

If this criterion is satisfied, C(s) is not present at equilibrium; otherwise C(s) is present in nonzero amount at equilibrium.

Similar, but more involved, RAF problems often 
arise in metallurgical applications [12]. More complex 
RAF problems, in which a test must be made for the 
simultaneous presence of multiple phases, have been 
considered in [7]. 


Chemical Potential Models 


The chemical potential of substance i in phase class β, $\mu_i^{\beta}$, may be expressed in terms of activity, $a_i$, or fugacity, $f_i$, or activity coefficient, $\gamma_i$ (for ease of notation we omit the superscript β):

$$\mu_i(T, P, x) = \mu_i^{*}(T, P^{*}, x^{*}) + RT\ln a_i(T, P, x), \qquad (18)$$

$$\mu_i(T, P, x) = \mu_i^{*}(T, P^{*}, x^{*}) + RT\ln\frac{f_i(T, P, x)}{f_i(T, P^{*}, x^{*})}, \qquad (19)$$

$$\mu_i(T, P, x) = \mu_i^{*}(T, P^{*}, x^{*}) + RT\ln\big(\gamma_i(T, P, x)\,x_i\big), \qquad (20)$$

where R is the universal gas constant, 8.3145 J mol⁻¹ K⁻¹ (see also [27, pp. 48-57, 62-73]).

Each form consists of two parts on the right side: $\mu_i^{*}(T, P^{*}, x^{*})$ and a logarithmic term (which has its origins in the use of the ideal-gas equation of state). $\mu_i^{*}(T, P^{*}, x^{*})$ is the chemical potential in a standard state at $(T, P^{*}, x^{*})$, which serves as a reference state. The superscript * denotes the requirement to specify the standard-state conditions $P^{*}$ and $x^{*}$. Methods of obtaining $\mu_i^{*}(T, P^{*}, x^{*})$ are discussed in the next section.

The logarithmic term is obtained from an equation of state (EOS), from a specific model (e.g., a particular γ model), or from a free-energy model (e.g., [17]). A simple phase-class model is the ideal-solution model

$$\mu_i(T,P,x) = \mu_i^{*}(T,P) + RT \ln x_i . \qquad (21)$$

In this case, the objective function G is convex. More generally, G may be nonconvex, leading to multiple local minima.
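For the ideal-solution model (21), $G = \sum_i n_i \mu_i$ is convex in the mole numbers. The sketch below spot-checks the midpoint inequality $G((a+b)/2) \leq (G(a)+G(b))/2$ on randomly drawn positive mole vectors; the standard potentials are hypothetical, and a numerical check like this illustrates, but does not prove, convexity.

```python
import math
import random

R_GAS = 8.31451  # J mol^-1 K^-1

def gibbs_ideal(n, mu_star, T):
    """G = sum_i n_i (mu_i^* + RT ln x_i) for one ideal-solution phase."""
    total = sum(n)
    return sum(ni * (ms + R_GAS * T * math.log(ni / total))
               for ni, ms in zip(n, mu_star))

def midpoint_convex(f, a, b, tol=1e-6):
    """Check f((a+b)/2) <= (f(a)+f(b))/2 up to a rounding tolerance."""
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    return f(mid) <= (f(a) + f(b)) / 2 + tol

random.seed(0)
mu_star = [-110e3, -394e3, 0.0]   # hypothetical values, J/mol
T = 1500.0
G = lambda n: gibbs_ideal(n, mu_star, T)
ok = all(
    midpoint_convex(G, [random.uniform(0.01, 5) for _ in range(3)],
                       [random.uniform(0.01, 5) for _ in range(3)])
    for _ in range(1000)
)
print(ok)  # True
```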


Methods of Obtaining μ*


Analysis of (11), together with (18)–(20), reveals that the equilibrium composition is determined not by the set of individual values of $\mu_i^{*}(T,P^{*},x^{*})$, but by the linear combinations $\{\Delta G_j(T) : j = 1, \dots, R\}$, where $R = N - M$ is the maximum number of linearly independent stoichiometric vectors (or chemical equations), $\nu_j$, with elements $\nu_{ij}$ (R as defined here is not to be confused with the universal gas constant R), and N is the number of chemical species [27, p. 17]. For a single-phase system, N = S; for a multiphase system, N > S. $\Delta G_j$ is defined by

$$\Delta G_j = -RT \ln K_j = \sum_{i=1}^{S} \nu_{ij}\, \mu_i^{*}, \qquad (22)$$

and $\mu_i^{*}$ is the standard chemical potential of substance i in a particular (specified) phase.
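Equation (22) is a dot product of a stoichiometric vector with the standard potentials. A minimal sketch for the reaction 2 CO + O2 = 2 CO2 in the CO/CO2/O2 system used earlier; the $\mu^{*}$ values are hypothetical placeholders, not thermochemical data.

```python
import math

R_GAS = 8.31451  # J mol^-1 K^-1

def delta_g(nu, mu_star):
    """Equation (22): Delta G_j = sum_i nu_ij * mu_i^*."""
    return sum(n * m for n, m in zip(nu, mu_star))

def equilibrium_constant(dg, T):
    """Invert Delta G_j = -RT ln K_j."""
    return math.exp(-dg / (R_GAS * T))

mu_star = [-200e3, -396e3, 0.0]   # CO, CO2, O2 (hypothetical, J/mol)
nu = [-2, 2, -1]                  # 2 CO + O2 -> 2 CO2
T = 1500.0
dg = delta_g(nu, mu_star)
print(dg)                                 # -392000.0
print(equilibrium_constant(dg, T) > 1.0)  # True: products favoured
```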

In order to give rise to the same set of equilibrium solutions, any choice of $\{\mu_i^{*}\}$ in (7) and (8) must yield an invariant value of $\Delta G_j(T)$ for any possible chemical equation j involving the system substances. This means that [27, p. 73], for any two choices $\mu_i^{*}(1)$ and $\mu_i^{*}(2)$,

$$\sum_{i=1}^{S} \nu_{ij} \left( \mu_i^{*}(2) - \mu_i^{*}(1) \right) = 0 . \qquad (23)$$

Equation (22) may be viewed as an underdetermined set of R linearly independent equations in the N values of $\mu_i^{*}$, in terms of a specified $\{\Delta G_j\}$. A set of $\mu_i^{*}$ may then be constructed by assigning arbitrary values (zero is the most convenient) to M substances with linearly independent formula vectors [6; 27, pp. 214–217]. This approach must be followed if values of $\mu_i^{*}$ are required in cases for which data are only available in the form of $\Delta G_j$ (as, for example, in many situations involving biochemical and/or ionic systems).
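A minimal sketch of this construction for the {CO, CO2, O2, C(s)} system (N = 4, M = 2, R = 2), assuming O2 and C(s) are the M reference substances set to zero and that $\Delta G_j$ values for two formation reactions are given (the values are hypothetical; numpy is assumed available). The remaining $\mu^{*}$ then follow from the R equations (22).

```python
import numpy as np

species = ["CO", "CO2", "O2", "C(s)"]
# Formula matrix A (rows: elements C, O; columns: species).
A = np.array([[1, 1, 0, 1],
              [1, 2, 2, 0]], dtype=float)
# Stoichiometric vectors as columns: C + 1/2 O2 -> CO and C + O2 -> CO2.
nu = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [-0.5, -1.0],
               [-1.0, -1.0]])
assert np.allclose(A @ nu, 0.0)          # each reaction conserves elements

delta_g = np.array([-250e3, -396e3])     # hypothetical Delta G_j, J/mol

ref = [2, 3]        # O2 and C(s): linearly independent formula vectors, mu* = 0
unknown = [0, 1]    # CO and CO2
mu_star = np.zeros(4)
# Equation (22) split into known and unknown parts:
# nu[unknown]^T mu_unknown = Delta G - nu[ref]^T mu_ref
rhs = delta_g - nu[ref].T @ mu_star[ref]
mu_star[unknown] = np.linalg.solve(nu[unknown].T, rhs)

print(dict(zip(species, mu_star.tolist())))
```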

Values of $\mu_i^{*}$ obtained for a particular system using equation (22) cannot normally be used for other chemical systems. A universal set of $\mu^{*}$ values that can be used for any chemical system requires assigning $\{\mu^{*}\}$ to a set of species containing all possible chemical elements and whose formula vectors are linearly independent, and then assigning all $\mu^{*}$ values relative to these as a datum. The minimum number of such species is equal to the number of chemical elements; thus, the simplest and most common choice is the set of elements themselves.

Even using the elements as a reference chemical state, various routes to obtaining a universal $\{\mu^{*}\}$ are possible, and one must take great care when combining data from different sources.

For a given standard-state pressure, the temperature dependence of the thermochemical properties $h^{*}$ (enthalpy), $s^{*}$ (entropy), and $\mu^{*}$ is given by

$$h_i^{*}(T) = h_i^{*}(T_r) + \int_{T_r}^{T} C_{p,i}(T')\, dT', \qquad (24)$$

$$s_i^{*}(T) = s_i^{*}(T_r) + \int_{T_r}^{T} \frac{C_{p,i}(T')}{T'}\, dT', \qquad (25)$$

$$\mu_i^{*}(T) = h_i^{*}(T) - T\, s_i^{*}(T), \qquad (26)$$

where $T_r$ is an arbitrary reference temperature, and $C_p$ is the molar heat capacity at constant pressure. An alternative route to $\mu_i^{*}(T)$ is equation (24) together with integration of

$$\frac{\partial \left( \mu_i^{*}/T \right)}{\partial T} = -\frac{h_i^{*}}{T^{2}} . \qquad (27)$$

Other routes to $\mu_i^{*}(T)$ are described in [27, pp. 65–73].
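A sketch of (24)–(26), assuming a hypothetical heat-capacity polynomial $C_p(T) = a + bT + cT^2$ so that both integrals are analytic; the coefficients and reference values are made up. The last lines confirm numerically that the result obeys the Gibbs–Helmholtz relation (27).

```python
import math

def h_star(T, Tr, h_r, a, b, c):
    """Equation (24) with Cp = a + b T + c T^2 integrated analytically."""
    return h_r + a * (T - Tr) + b / 2 * (T**2 - Tr**2) + c / 3 * (T**3 - Tr**3)

def s_star(T, Tr, s_r, a, b, c):
    """Equation (25): s(T) = s(Tr) + integral of Cp / T dT."""
    return s_r + a * math.log(T / Tr) + b * (T - Tr) + c / 2 * (T**2 - Tr**2)

def mu_star(T, Tr, h_r, s_r, a, b, c):
    """Equation (26): mu* = h* - T s*."""
    return h_star(T, Tr, h_r, a, b, c) - T * s_star(T, Tr, s_r, a, b, c)

# Hypothetical data for one substance (J/mol and J/(mol K)).
Tr, h_r, s_r = 298.15, -110.5e3, 197.7
a, b, c = 28.0, 4.0e-3, -1.0e-7

# Central-difference check of (27): d(mu*/T)/dT = -h*/T^2.
T, dT = 1000.0, 1e-3
g = lambda t: mu_star(t, Tr, h_r, s_r, a, b, c) / t
lhs = (g(T + dT) - g(T - dT)) / (2 * dT)
rhs = -h_star(T, Tr, h_r, a, b, c) / T**2
print(abs(lhs - rhs) / abs(rhs) < 1e-6)  # True
```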

The above equations require a choice of $T_r$, values of $h_i^{*}(T_r)$, one of $\{s_i^{*}(T_r), \mu_i^{*}(T_r)\}$, and values of $C_{p,i}(T)$. Common choices of $T_r$ are 0 K and 298.15 K. The former is convenient when the quantities are determined from statistical-mechanical considerations, and the latter is convenient from an ambient-temperature standpoint.

Given any set of data, $\{\mu_i^{*}(1)(T)\}$, another set may be formed via

$$\mu_i^{*}(2)(T) = \mu_i^{*}(1)(T) - \sum_{j=1}^{M} a_{ji}\, c_j(T), \qquad (28)$$

where $\{c_j(T)\}$ is a set of arbitrary constants for the elements; such a set ensures that (23) is satisfied, because every stoichiometric vector conserves the elements ($\sum_i a_{ki}\nu_{ij} = 0$), so the shift contributes nothing to any $\Delta G_j$. The choice

$$c_j(T) = \mu_j^{*}(1)(T) \qquad (29)$$

for each element in some specified (standard) state at T gives $\mu_j^{*}(2)(T) = 0$ for each element. $\{\mu_i^{*}(2)(T)\}$ is then called the set of formation values of $\mu^{*}$. This choice is often used, but is not computationally convenient, since its calculation at an arbitrary T requires an agreed-upon set of choices of the element standard states at each T, as well as information for the elements concerning $C_p(T)$ and possible phase-transition values of enthalpy.
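The invariance behind (28) and (23) can be checked numerically. The sketch below uses the CO/CO2/O2/C(s) system with hypothetical $\mu^{*}(1)$ values: an arbitrary shift leaves every $\Delta G_j$ unchanged, and a formation-type shift zeroes the element reference species. Because O2 contains two O atoms, the oxygen constant is taken per atom; that is our reading of (29) for a diatomic element.

```python
import numpy as np

# Formula matrix A (rows: elements C, O; columns: CO, CO2, O2, C(s)).
A = np.array([[1, 1, 0, 1],
              [1, 2, 2, 0]], dtype=float)
# Two independent reactions as columns: 2 CO + O2 -> 2 CO2 and CO2 + C(s) -> 2 CO.
nu = np.array([[-2.0, 2.0],
               [2.0, -1.0],
               [-1.0, 0.0],
               [0.0, -1.0]])

mu1 = np.array([-150e3, -300e3, 20e3, 5e3])   # hypothetical mu*(1), J/mol

def shifted(mu, c):
    """Equation (28): mu*(2)_i = mu*(1)_i - sum_j a_ji c_j."""
    return mu - A.T @ c

mu2 = shifted(mu1, np.array([12.3e3, -4.0e3]))
print(np.allclose(nu.T @ mu1, nu.T @ mu2))    # True: Delta G_j unchanged

# Formation-type choice: C(s) and O2 become zero.
mu_f = shifted(mu1, np.array([mu1[3], mu1[2] / 2]))
print(mu_f[2], mu_f[3])                       # 0.0 0.0
```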

Some sources of $\mu^{*}$ and other thermochemical data are listed at the web site [34].


Sensitivity Analysis 


In many cases, it is desired to calculate the rate of change of the optimal solution with respect to one or more members of the set of underlying parameters, $\{T, P, \mu_i^{*}, b_j\}$. The sensitivity parameters of interest are the first-order derivatives $\partial n_i/\partial p_j$ and the second-order derivatives $\partial^2 n_i/\partial p_j \partial p_k$, where $p_j$ denotes a parameter. These quantities can be used in the calculation of:

1) the solution of problems with specified thermodynamic variables other than (T, P);

2) the effect of inaccurate $\mu^{*}$ data on the equilibrium composition;

3) thermodynamic properties of the equilibrium reacting mixture (e.g., H, $C_p$, $C_V$);

4) the effect of changes in the specified overall (initial) composition on the equilibrium composition (e.g., buffer capacity of aqueous systems).

These topics are discussed in [26] and [27, Chap. 8], and an application is described in [19].
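A first-order sensitivity such as $\partial n_i/\partial \mu_k^{*}$ can always be approximated by re-solving the problem at perturbed parameter values. The sketch below does this with central differences on a deliberately tiny, hypothetical system, an ideal-gas isomerization A ⇌ B with b moles in total, for which $n_B = bK/(1+K)$ and $K = \exp(-(\mu_B^{*} - \mu_A^{*})/RT)$, and compares the result with the analytic derivative.

```python
import math

R_GAS = 8.31451  # J mol^-1 K^-1

def n_b(mu_a, mu_b, b, T):
    """Equilibrium amount of B for the ideal-gas isomerization A <=> B."""
    K = math.exp(-(mu_b - mu_a) / (R_GAS * T))
    return b * K / (1.0 + K)

def central_difference(f, p, h):
    """Second-order accurate estimate of df/dp."""
    return (f(p + h) - f(p - h)) / (2.0 * h)

T, b, mu_a, mu_b = 500.0, 2.0, 0.0, 1.5e3   # hypothetical data
fd = central_difference(lambda m: n_b(mu_a, m, b, T), mu_b, h=1.0)

K = math.exp(-(mu_b - mu_a) / (R_GAS * T))
exact = -b * K / (R_GAS * T * (1.0 + K) ** 2)   # d n_B / d mu_B^*
print(abs(fd - exact) / abs(exact) < 1e-6)      # True
```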


Stoichiometric Restrictions 


Stoichiometric restrictions for a CEP arise when not all solutions of the element-balance equations, (4) and (5), are allowed. Stoichiometric restrictions typically arise from kinetics. The CEP can be solved by increasing the 'effective number' of component species, removing the restrictions [27, pp. 27–36, 218–219]. A simple illustration is provided by the permanganate-peroxide reaction [16].


Equilibrium Constraints 


Equilibrium constraints for a CEP result from the a pri- 
ori specification of some function of the equilibrium 
composition. Examples include fixed pH problems and 
solubility calculations in aqueous chemistry. Additional 
examples are explored in [6,20], and [1]. 


Numerical Implementation 


Phase-class models giving rise to a convex G are important in many areas, including rocket propellant evaluation, aqueous speciation, gas-phase chemical processing, and metallurgical operations. Solution of such problems, and implementation of the RTPC when necessary (in the case of pure condensed phases or of ideal-solution phases), is relatively straightforward ([27, pp. 58–59, 204–212], [28]).

In the case of nonconvex G, most workers have focused on numerical algorithms for the UAF, and in particular the PEP. See [31] for an implementation of a homotopy continuation method for the PEP. Global optimization algorithms have been applied in [11,13] and [22]. See [14] for the use of an algorithm based on interval analysis. Numerical implementation of the RTPC has not yet been fully developed.

Some computer software packages to solve certain types of chemical reaction equilibrium problems are listed at the web site [34].


See also 


> Global Optimization: Application to Phase Equilibrium Problems

> Global Optimization in Phase and Chemical Reaction Equilibrium

Introduction


Offshore oilfield infrastructure planning is a challenging problem encompassing both complex physical constraints and intricate economic specifications. An offshore oilfield infrastructure consists of Production Platforms (PP), Well Platforms (WP), wells and connecting pipelines (see Fig. 1), and is constructed for the purpose of producing oil and/or gas from one or more oilfields. Each oilfield (F) consists of a number of reservoirs (R), while each reservoir in turn contains a number of potential locations for wells (W) to be drilled.

Offshore oilfield facilities are often in operation over 
several decades and it is therefore important to take fu- 
ture conditions into consideration when designing an 
initial infrastructure. This can be incorporated by di- 
viding the operating horizon into a number of time 
periods and allowing planning decisions in each pe- 
riod, while design decisions are made for the horizon 
as a whole. The corresponding optimization model is 
then run periodically with updated information on the 
oilfields in order to reoptimize the planning decisions 
of the offshore oilfield facilities. 

Design decisions involve the capacities of the PPs and WPs, as well as decisions regarding which PPs, WPs and wells to install over the whole operating horizon. Planning decisions involve the production profiles in each period, as well as decisions regarding when to install the PPs, WPs, and wells included in the design. Decision variables can also be grouped into discrete variables, for example those representing the installation of PPs, WPs and wells in each period, and continuous variables, for example those representing the production profiles and pressures in each period.

Optimal Planning of Offshore Oilfield Infrastructure, Figure 1
Configuration of fields, well platforms and production platforms

Iyer and Grossmann [7] proposed a multi-period MILP model that optimizes the planning and scheduling of investment and operation decisions, including the selection of reservoirs to develop, the selection of well sites, and the production rates from wells in each time period. The model incorporates the nonlinear reservoir behavior through piecewise linear approximations. Van den Heever and Grossmann [14] proposed a multi-period MINLP model for oilfield infrastructure planning. In contrast to Iyer and Grossmann [7], a nonlinear reservoir model was incorporated directly into the formulation. Meister, Clark and Shah [9] proposed a model for selecting the optimal information-gathering process during the exploration phase while simultaneously optimizing the operating policies. Jonsbraten [8] presented an MILP model for the optimal development of an oilfield under oil price uncertainty, and solved it with a progressive hedging algorithm that is very similar to Lagrangean decomposition. Ahmed, Gorman and Bagajewicz [1] discuss financial risk management in the planning and scheduling of offshore oil infrastructure. They introduce uncertainty, risk management and budgeting constraints into the model by Iyer and Grossmann [7], employ a sampling average algorithm to overcome the numerical difficulties, and compare the results with optimum results found using upper-bound risk curves. Goel and Grossmann [4,5] extended this research to gas field development planning under uncertainty. The major uncertainties were in the size and initial deliverability of the fields. The authors assumed that the uncertainty in the size and initial deliverability of a field resolves immediately when a well platform is built on that field. Linear reservoir models were used, which provide a reasonable approximation for gas fields. The authors proposed a multistage stochastic programming model and a solution algorithm based on the problem structure. Ulstein, Nygreen and Sagli [13] presented a model for tactical planning of Norwegian petroleum production. The model maximizes the net income before taxes from the production and sale of petroleum products. Different cases with demand variations, varying quality constraints and system breakdowns are considered. The model is solved for different scenarios, and the solutions are compared with the base-case scenario. The benefit of the model is that it identifies feasible ways to satisfy the demand for varying network configurations.

We describe in this chapter the deterministic model proposed by Van den Heever and Grossmann [14], which incorporates nonlinearities directly into the optimization model. Specifically, these are the reservoir pressure, gas-to-oil ratio, and cumulative gas produced, expressed as nonlinear functions of the cumulative oil produced.


Problem Statement 


The design and planning of an offshore oilfield infrastructure (refer to Fig. 1) is considered over a planning horizon of Y years, divided into T time periods (e.g. quarterly time periods). An oilfield layout consists of a number of fields, each containing one or more reservoirs, and each reservoir contains one or more well sites. After the decision has been made to produce oil from a given well site, it is drilled from a WP using drilling rigs. A network of pipelines connects the wells to the WPs and the WPs to the PPs. For our purposes, we assume that the location/allocation problem has been solved, i.e. the possible locations of the PPs and WPs, as well as the assignment of wells to WPs and WPs to PPs, are fixed. In the model, one can easily relax the assumption of a fixed allocation of each well to a WP and consider that each well may be allocated to two or more WPs. However, this will clearly increase the computational effort significantly. A more practical option might be to consider allocating each well to up to two WPs. In this case, the two different allocations are treated as two choices of identical wells, of which only one can be selected.

The design decisions we consider are valid for the whole planning horizon and are:
1) whether or not to include each PP, WP, and well in the infrastructure over the planning horizon;
2) the capacities of the PPs and WPs.
The planning decisions are made in each time period and are:
1) whether or not to install each PP and WP;
2) whether or not to drill each well;
3) the production profile of each well.
These decisions are made under the assumption that operating conditions are constant during a given time period.
Optimization Model


The following is a complete Generalized Disjunctive 
Programming [6] model of the offshore oilfield infras- 
tructure planning problem. Please refer to the list of 
nomenclature at the end of this chapter. 


Objective Function 


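The objective function is to maximize the Net Present Value (NPV), which includes sales revenues, investment costs and depreciation.

$$\max\ \mathrm{NPV} = \sum_{t=1}^{T} \left\{ Rev_t - \sum_{p \in PP} \left[ CI_t^{p} + \sum_{\pi \in WP(p)} \left( CI_t^{\pi,p} + \sum_{w \in W_{WP}(\pi)} CI_t^{w,\pi,p} \right) \right] \right\} \qquad (1)$$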


The cost of exploration and appraisal drilling is not included in the objective, since this model will be used after the results from exploration and appraisal have become available. Costs related to taxes, royalties and tariffs are not included in this model either; the treatment of complex economic objectives is described in [15]. The costs of decommissioning and well maintenance are also not included.


Constraints Valid for the Whole Infrastructure 


In (2) the sales revenue in each time period is calcu- 
lated from the total oil produced, which is in turn cal- 
culated in (3) as the sum of the oil produced from all 
production platforms. (4) calculates the amount of oil 
flow from each reservoir in each time period to be the 
sum of all oil flowrates from wells associated with that 
reservoir times the duration of the time period. The cu- 
mulative flow of oil from each reservoir is calculated in 
(5). Note that (5) is one of the linking constraints that link the time periods together and thus prevent a solution procedure in which each period is solved individually. The cumulative flow of oil is used in (6) to calculate the reservoir pressure through an exponential function obtained by fitting a nonlinear curve to the linear interpolation data used by Iyer et al. [7].


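$$Rev_t = c_{1t}\, x_t^{total} \qquad (2)$$

$$\sum_{p \in PP} x_t^{p} = x_t^{total} \qquad (3)$$

$$l_t^{r} = \Delta_t \sum_{(w,\pi,p) \in W_R(r)} x_t^{w,\pi,p} \quad \forall r \in R(f),\ f \in F \qquad (4)$$

$$xc_t^{r} = \sum_{\theta=1}^{t-1} l_\theta^{r} \quad \forall r \in R(f),\ f \in F \qquad (5)$$

$$v_t^{r} = \gamma_{p1}^{r} \exp\left( \gamma_{p2}^{r}\, xc_t^{r} \right) \quad \forall r \in R(f),\ f \in F \qquad (6)$$

for t = 1, …, T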

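A minimal sketch of how (2)–(6) chain together for a single reservoir, assuming hypothetical values for the price coefficients $c_{1t}$, the period lengths $\Delta_t$ and the pressure coefficients $\gamma_{p1}$, $\gamma_{p2}$ (a negative $\gamma_{p2}$ makes pressure decline as oil is withdrawn). For simplicity all wells are assumed to draw from this one reservoir, and the well flow rates are given data here, whereas in the model they are decision variables.

```python
import math

def reservoir_profile(well_rates, dt, gamma_p1, gamma_p2):
    """Apply (4)-(6) for one reservoir.

    well_rates[t][w] is the oil flow rate of well w in period t.
    Returns oil withdrawn l_t, cumulative oil xc_t, and pressure v_t per period.
    """
    l, xc, v = [], [], []
    cumulative = 0.0                       # (5): withdrawals before period t
    for t, rates in enumerate(well_rates):
        xc.append(cumulative)
        v.append(gamma_p1 * math.exp(gamma_p2 * cumulative))   # (6)
        l_t = dt[t] * sum(rates)                                # (4)
        l.append(l_t)
        cumulative += l_t
    return l, xc, v

def revenue(total_rates, c1):
    """(2)-(3): Rev_t = c_1t * x_t^total."""
    return [c * x for c, x in zip(c1, total_rates)]

# Hypothetical data: three periods, two wells.
rates = [[10.0, 8.0], [9.0, 7.0], [8.0, 6.0]]
dt = [91.0, 91.0, 92.0]
l, xc, v = reservoir_profile(rates, dt, gamma_p1=5000.0, gamma_p2=-1e-4)
rev = revenue([sum(r) for r in rates], c1=[2.0, 1.9, 1.8])
print(xc)   # [0.0, 1638.0, 3094.0]
```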

Disjunction for each PP, WP and Well 


We exploit the hierarchical structure of the oilfield to 
formulate a disjunctive model. The production plat- 
forms are at the highest level of the hierarchy, and the 
disjunction includes all constraints valid for that PP, as 
well as the disjunction for the next hierarchical level, 
i.e. for all WPs associated with that PP. In turn, the dis- 
junction for each WP, which is located within the dis- 
junction of a PP, contains all constraints valid for that 
WP, as well as the disjunctions for all wells associated 
with that WP. We present the disjunctions here with numbers indicating the constraints present, and follow with the explanation of the individual constraints contained in each disjunction (Fig. 2).

The outer disjunction is valid for each PP in each time period and can be interpreted as follows: if production platform p has been installed during or before period t (discrete expression $\bigvee_{\theta=1}^{t} Z_\theta^{p}$ = true), then all constraints in the largest bracket are applied:


[Constraints (7)–(12) for production platform p; see Fig. 2]

Optimal Planning of Offshore Oilfield Infrastructure, Figure 2
[Nested disjunctions: constraints (7)–(12) for each PP, (13)–(19) for each WP, and (20)–(29) for each well]


First, the smaller nested disjunction is used to calculate the discounted investment cost (including depreciation) of the production platform in each time period. This cost is calculated if production platform p is installed in period t ($Z_t^{p}$ = True); otherwise it is set to zero. (7) expresses the cost as a function of the expansion capacity, which is set to zero if the production platform is not installed ($Z_t^{p}$ = False), while (8) sets an upper bound on the expansion. (9) determines the design capacity to be the maximum flow among all time periods, and this is modeled linearly by defining the expansion variable, which can take a nonzero value in only one time period. (11) and (12) are mass balances calculating the oil/gas flow from the PP as the sum of the flow from all WPs associated with that PP. If the production platform has not been installed yet (discrete expression $\bigvee_{\theta=1}^{t} Z_\theta^{p}$ = false), the oil/gas flows, as well as the investment cost, are set to zero.

The middle disjunction is valid for all well platforms associated with production platform p and is only applied if the discrete expression $\bigvee_{\theta=1}^{t} Z_\theta^{p}$ is true. This disjunction states that if well platform π has been installed before or during period t (discrete expression $\bigvee_{\theta=1}^{t} Z_\theta^{\pi,p}$ = true), then the constraints present in that disjunction are applied:
Again, the smaller nested disjunction is used to calculate the discounted investment cost (including depreciation) of the well platform in each time period. This cost is calculated if well platform π is installed in period t ($Z_t^{\pi,p}$ = True); otherwise it is set to zero. (13) expresses the cost as a function of the expansion capacity, which is set to zero if the well platform is not installed ($Z_t^{\pi,p}$ = False), while (14) sets an upper bound on the expansion. (15) and (16) determine the design capacity as described in the case of the production platform. (17) and (18) are mass balances calculating the oil/gas flow from the WP as the sum of the flow from all wells associated with that WP. (19) relates the pressure at the WP to the pressure at the PP it is associated with: the pressure at the PP is the pressure at the WP minus the pressure drop in the corresponding pipeline, which is given by the remaining terms in (19). If the well platform has not been installed yet (discrete expression $\bigvee_{\theta=1}^{t} Z_\theta^{\pi,p}$ = false), the oil/gas flows, as well as the investment cost, are set to zero.

The innermost disjunction is valid for each well w associated with well platform π, and is only included if well platform π has already been installed (discrete expression $\bigvee_{\theta=1}^{t} Z_\theta^{\pi,p}$ = true). If well w has been drilled during or before period t (discrete expression $\bigvee_{\theta=1}^{t} Z_\theta^{w,\pi,p}$ = true), then the following constraints are applied:

[Constraints (20)–(29) for well w; see Fig. 2]
The smaller nested disjunction is used to calculate the discounted investment cost (including depreciation) of the well in each time period. This cost is calculated in (20) if well w is drilled in period t ($Z_t^{w,\pi,p}$ = True); otherwise it is set to zero. (21) relates the pressure at the well to the pressure at the WP it is associated with: the pressure at the WP is the pressure at the well minus the pressure drop in the corresponding pipeline, which is given by the remaining terms. (22) states that the oil flowrate equals the productivity index times the pressure differential between reservoir and well bore. (23) restricts the gas flowrate to be the oil flow times the GOR, while (24) bounds the oil flow by the productivity index times the maximum allowable pressure drop. The productivity index and the reservoir pressure determine the oil production rate from a well in a given time period. The well is usually capped when the GOR (gas-to-oil ratio) exceeds a certain threshold or when the reservoir pressure falls below a minimum pressure. (25) and (26) calculate the cumulative flow to be the sum of flows over all periods up to the current one. Note that (25) and (26) are linking constraints that link the time periods together and prevent a solution procedure in which every time period is solved separately. (27) represents a specification by the oil company that restricts the flow profile to be non-increasing. While this company does not require a lower bound on the quantity of oil flowing through a pipeline, one could consider adding such a lower bound in the form of a threshold constraint, where the flow is either zero or above some minimum, in order to address concerns that the pipeline may seize up if the flow rate were to drop below a certain level. The linear interpolations used to calculate cumulative gas and GOR as functions of cumulative oil are replaced by the nonlinear constraints (28) and (29). These quadratic equations are obtained from a curve fit of the linear interpolation data from Iyer et al. [7]. If the well has not been drilled yet (discrete expression $\bigvee_{\theta=1}^{t} Z_\theta^{w,\pi,p}$ = false), the oil/gas flows, cumulative flows, as well as the investment cost, are set to zero.
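To make the well-level constraints concrete, the sketch below checks a candidate production profile for one well against simplified versions of (22), (23), (24) and (27). It assumes hypothetical argument names (`pi_index` for the productivity index, `dp_max` for the maximum allowable pressure drop) and that reservoir pressure, well-bore pressure and GOR are given per period; the pipeline terms of (21) and the curve fits (28)–(29) are left out.

```python
def check_well_profile(oil, p_res, p_wb, gor, gas, pi_index, dp_max, tol=1e-6):
    """Return a list of (period, message) pairs for violated conditions."""
    violations = []
    for t, x in enumerate(oil):
        if abs(x - pi_index * (p_res[t] - p_wb[t])) > tol:   # (22)
            violations.append((t, "oil != PI * (p_res - p_wb)"))
        if abs(gas[t] - x * gor[t]) > tol:                   # (23)
            violations.append((t, "gas != oil * GOR"))
        if x > pi_index * dp_max + tol:                      # (24)
            violations.append((t, "oil exceeds PI * dp_max"))
        if t > 0 and x > oil[t - 1] + tol:                   # (27)
            violations.append((t, "profile increases"))
    return violations

# Hypothetical three-period profile for one well.
p_res = [3000.0, 2900.0, 2800.0]
p_wb = [2800.0, 2750.0, 2700.0]
pi_index, dp_max = 0.05, 250.0
oil = [pi_index * (pr - pw) for pr, pw in zip(p_res, p_wb)]
gor = [1.2, 1.3, 1.5]
gas = [x * g for x, g in zip(oil, gor)]
print(check_well_profile(oil, p_res, p_wb, gor, gas, pi_index, dp_max))  # []
```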


Logical Constraints 


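These represent logical relationships between the discrete decisions. (30)–(32) specify that each well, WP and PP can be drilled/installed in at most one period. (33) states that if a WP has not been installed by period t, then no well w associated with that WP can be drilled in t. Likewise, (34) states that if a PP has not been installed by period t, then no WP associated with that PP can be installed in t. The restriction that at most $M_w$ wells can be drilled in any given time period is given by (35).

$$\sum_{t=1}^{T} Z_t^{w,\pi,p} \leq 1 \quad \forall w \in W_{WP}(\pi),\ \pi \in WP(p),\ p \in PP \qquad (30)$$

$$\sum_{t=1}^{T} Z_t^{\pi,p} \leq 1 \quad \forall \pi \in WP(p),\ p \in PP \qquad (31)$$

$$\sum_{t=1}^{T} Z_t^{p} \leq 1 \quad \forall p \in PP \qquad (32)$$

$$\neg \bigvee_{\theta=1}^{t} Z_\theta^{\pi,p} \Rightarrow \neg Z_t^{w,\pi,p} \quad \forall w \in W_{WP}(\pi),\ \pi \in WP(p),\ p \in PP \qquad (33)$$

$$\neg \bigvee_{\theta=1}^{t} Z_\theta^{p} \Rightarrow \neg Z_t^{\pi,p} \quad \forall \pi \in WP(p),\ p \in PP \qquad (34)$$

$$\sum_{(w,\pi,p)} Z_t^{w,\pi,p} \leq M_w \qquad (35)$$

(in the sums, a Boolean value of true is counted as 1)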
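A small feasibility check for a candidate installation schedule against (30)–(35) is sketched below. The helper and data layout are hypothetical: facilities are identified by strings, `parent` maps each well to its WP and each WP to its PP, and `install` maps each facility to its installation period (facilities never installed are simply absent).

```python
def check_schedule(install, parent, wells, max_wells_per_period):
    """Check (30)-(35) for a schedule {facility: period}; return problems found.

    (30)-(32) hold by construction, since a dict stores one period per facility.
    """
    problems = []
    for child, par in parent.items():
        if child in install:
            # (33)/(34): the parent must be installed in the same or an earlier period.
            if par not in install or install[par] > install[child]:
                problems.append(f"{child} installed before its parent {par}")
    counts = {}
    for w in wells:
        if w in install:
            counts[install[w]] = counts.get(install[w], 0) + 1
    for t, n in counts.items():
        if n > max_wells_per_period:                      # (35)
            problems.append(f"{n} wells drilled in period {t}")
    return problems

parent = {"W1": "WP1", "W2": "WP1", "W3": "WP1", "WP1": "PP1"}
wells = ["W1", "W2", "W3"]
good = {"PP1": 1, "WP1": 1, "W1": 1, "W2": 2}
bad = {"PP1": 2, "WP1": 1, "W1": 1, "W2": 1, "W3": 1}
print(check_schedule(good, parent, wells, max_wells_per_period=2))  # []
print(check_schedule(bad, parent, wells, max_wells_per_period=2))
```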
Solution Method 


Van den Heever and Grossmann [14] proposed an iterative aggregation/disaggregation algorithm, in which the time periods are aggregated in the design problem and subsequently disaggregated when the planning problem is solved for the fixed infrastructure obtained from the design problem. Both of these subproblems are solved with the logic-based outer approximation method [6,12]. The solution of the planning problem is used to update the aggregation scheme after each iteration. This is done through a dynamic programming subproblem that determines how the time periods should be aggregated to yield an aggregate problem that resembles the disaggregate problem as closely as possible. Convex envelopes are used to deal with nonconvexities arising from nonlinearities.
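The source does not spell out the dynamic programming subproblem, so the following is only a generic illustration under our own assumption: split T disaggregate periods into a given number of contiguous aggregate periods so that a per-period profile (for example, production rates from the latest planning solution) is approximated as closely as possible by its aggregate-period averages, measured by squared error. The function name and criterion are hypothetical.

```python
def aggregate_periods(profile, n_agg):
    """Split profile into n_agg contiguous blocks minimizing within-block
    squared error; return the block lengths m_tau."""
    T = len(profile)
    prefix, prefix_sq = [0.0], [0.0]
    for y in profile:
        prefix.append(prefix[-1] + y)
        prefix_sq.append(prefix_sq[-1] + y * y)

    def sse(i, j):
        """Squared error of profile[i:j] around its mean."""
        s, sq, n = prefix[j] - prefix[i], prefix_sq[j] - prefix_sq[i], j - i
        return sq - s * s / n

    INF = float("inf")
    # cost[k][j]: best error covering the first j periods with k blocks.
    cost = [[INF] * (T + 1) for _ in range(n_agg + 1)]
    back = [[0] * (T + 1) for _ in range(n_agg + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_agg + 1):
        for j in range(k, T + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    lengths, j = [], T
    for k in range(n_agg, 0, -1):    # trace the optimal split backwards
        i = back[k][j]
        lengths.append(j - i)
        j = i
    return lengths[::-1]

profile = [10, 10, 10, 10, 6, 6, 6, 3, 3, 3, 3, 3]
print(aggregate_periods(profile, 3))  # [4, 3, 5]
```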


Optimal Planning of Offshore Oilfield Infrastructure, Figure 3 
The final configuration 


Optimal Planning of Offshore Oilfield Infrastructure, Table 1
The optimal investment plan

[Table columns: Reservoir | Well | Period invested. Investment periods listed: Jan. 1999, Apr. 1999, Jul. 1999, Oct. 1999 and Jan. 2000; the individual reservoir and well entries are not recoverable]


Figures 3 and 4, together with Table 1, show the optimal solution obtained for the largest problem instance of 25 wells in Fig. 1 for a planning horizon of 24 periods. The final configuration is shown in Fig. 3. Note that only 9 of the potential 25 wells are drilled over the 24 periods. Of these, 3 are drilled in the first period, 1 in the second, 1 in the third, 1 in the fourth, and 3 in the fifth period, as shown in Table 1.

Figure 4 shows the production profile for the whole infrastructure over the 24 time periods encompassing the six years from January 1999 up to December 2004. The net present value obtained is $67.99 million. This final solution is found in less than 25 min by the proposed algorithm, whereas a traditional solution approach such as the OA algorithm needs more than 5 h to find the solution. Due to the short solution time, the model can quickly be updated and re-solved periodically as more information about the future becomes available. Also, different instances of the same problem can be solved in a relatively short time to determine the effect of different reservoir simulations on the outcome.

Optimal Planning of Offshore Oilfield Infrastructure, Figure 4
Production profile over six years


Nomenclature 


Sets and Indices 


PP set of production platforms

p production platform p ∈ PP

WP(p) set of well platforms associated with platform p

π well platform π ∈ WP(p)

F set of fields

f field f ∈ F

R(f) set of reservoirs associated with field f

r reservoir r ∈ R(f)

$W_{WP}(\pi)$ set of wells associated with well platform π

$W_{R}(r)$ set of wells associated with reservoir r

$W_{WP,R}(r, \pi)$ set of wells associated with reservoir r and well platform π

w well w ∈ $W_{(\cdot)}(\cdot)$

T set of disaggregate time periods

TA set of aggregate time periods

t disaggregate time period t ∈ T

τ aggregate time period τ ∈ TA


To denote a specific well w that is associated with a specific well platform π, which is in turn associated with a specific production platform p, we use the index combination (w, π, p). Similarly, the index (π, p) applies to a specific well platform π associated with a specific production platform p. We omit superscripts in the variable definitions for the sake of simplicity.


Continuous Variables 


$x_t$ oil flow rate in period t

$xc_t$ cumulative oil flow up to period t

$g_t$ gas flow rate (volumetric) in period t

cumulative gas flow up to period t

$l_t$ oil flow (mass) in period t

$\phi_t$ gas-to-oil ratio (GOR) in period t

$v_t$ pressure in period t

$\delta_t$ pressure drop at choke in period t

$d_t$ design variable in period t

$e_t$ design expansion variable in period t

$Rev_t$ sales revenue in period t

$CI_t$ investment cost in period t (including depreciation)

$S_\tau$ state at the end of aggregate period τ, i.e. number of disaggregate periods available for assignment at the end of aggregate period τ

$m_\tau$ length of aggregate period τ

Boolean Variables

$Z_t$ = true if facility (well, WP or PP) is drilled/installed in period t
Parameters


productivity index of well 


$\Delta p^{max}$ maximum pressure drop from well bore to well head

$GOR^{max}$ maximum GOR

$m_\tau$ number of periods in aggregate time period τ

|TA| number of aggregate time periods, τ = 1, …, |TA|

$M_w$ maximum number of wells drilled in a time period

$\Delta_t$ length of time period t

U upper bound parameter (defined by the respective constraint)

α pressure drop coefficient for oil flow rate

β pressure drop coefficient for GOR

$c_{1t}$ discounted revenue price coefficient for oil sales

$c_{2t}$ discounted fixed cost coefficient for capital investment

$c_{3t}$ discounted variable cost coefficient for capital investment

$\gamma_{p1}$ first coefficient for pressure vs. cumulative oil

$\gamma_{p2}$ second coefficient for pressure vs. cumulative oil

$\gamma_{g1}$ first coefficient for cumulative gas vs. cumulative oil

$\gamma_{g2}$ second coefficient for cumulative gas vs. cumulative oil

$\gamma_{g3}$ third coefficient for cumulative gas vs. cumulative oil

$\gamma_{gor1}$ first coefficient for GOR vs. cumulative oil

$\gamma_{gor2}$ second coefficient for GOR vs. cumulative oil

$\gamma_{gor3}$ third coefficient for GOR vs. cumulative oil

$f_{inv,t}$ discounting factor for investment in period t

$f_{dpr,t}$ discounting factor for depreciation in period t

$f_{rev,t}$ discounting factor for revenue in period t

$I_t$ investment costs in period t

$R_t$ revenue in period t

Superscripts 


(w, π, p) variables associated with well w ∈ $W_{WP}(\pi)$, with well platform π and production platform p

(π, p) variables associated with well platform π and production platform p

(p) variables associated with production platform p

(r) variables associated with reservoir r
Background


Sensor networks are collections of devices designed to detect and/or measure characteristics of targets within an environment, or traits of the environment itself. These measurements are then used to estimate states within the system as modeled by the designer. The design goal is to provide an optimal estimate of these states against predetermined measures of goodness. Combining all available measurements into optimal estimates of a dynamic system is a well-studied problem and is widely implemented in variants of Kalman filters in many current applications (see The Analytic Sciences Corporation [10]).
However, in many applications, practical limitations preclude the collection of data by all sensors at any given instant. In fact, systems are often limited to collecting information from a single sensor at each instant [1,7]. For example, radar or sonar systems may be precluded from simultaneously transmitting pulses due to inter-pulse interference, or data transmission rates may exceed system bandwidth capabilities. Under these conditions, the question is no longer how to combine the measurements to obtain an optimal state estimate, but in what order the sensors should be visited to obtain an optimal estimate. The determination of this visitation order is called the sensor scheduling problem. A variant is the single-sensor, multiple-site problem, where a sensor is moved between discrete locations while maintaining an estimate of a dynamic physical attribute at each site.
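To make the scheduling problem concrete, here is a sketch of one simple heuristic, which is not one of the methods surveyed below: at each step, visit the single sensor whose Kalman measurement update gives the smallest trace of the posterior error covariance. The system matrices and sensor models are hypothetical, and numpy is assumed available.

```python
import numpy as np

def kalman_covariance_update(P, H, R):
    """Posterior covariance after a measurement z = H x + v, v ~ N(0, R)."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return (np.eye(P.shape[0]) - K @ H) @ P

def greedy_schedule(P0, F, Q, sensors, steps):
    """Predict, then pick the sensor minimizing the trace of the updated covariance."""
    P, schedule = P0, []
    for _ in range(steps):
        P = F @ P @ F.T + Q                      # time update (prediction)
        candidates = [kalman_covariance_update(P, H, R) for H, R in sensors]
        best = int(np.argmin([np.trace(c) for c in candidates]))
        schedule.append(best)
        P = candidates[best]
    return schedule, P

# Hypothetical two-state system observed by one of two scalar sensors at a time.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
sensors = [(np.array([[1.0, 0.0]]), np.array([[0.5]])),   # position sensor
           (np.array([[0.0, 1.0]]), np.array([[0.5]]))]   # velocity sensor
schedule, P = greedy_schedule(np.eye(2), F, Q, sensors, steps=6)
print(schedule)
```

A greedy rule like this is myopic; the approaches discussed below instead pose the selection over the whole horizon.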


Numerous optimization techniques have been used to address the sensor scheduling problem. In general, sensor scheduling has been shown to be a combinatorial optimization problem, but under certain simplifying assumptions, several successful approaches employing dynamic, linear modeling techniques have been developed. These techniques benefit from the extensive development of filter theory and are often incorporated directly into existing filter designs or are immediate extensions of them. In this paper, selected techniques are presented in chronological order of their development. Each is presented with a brief overview of its theoretical development, followed by observations on its significant implications and application.


Methods 
Linear Plant Control Model 


In an early and oft-cited publication, Meier et al. [8]
posed the sensor selection problem as an adaptive plant
control problem, in which a sequence of plant control
vectors $u_k^{p}$ and measurement control vectors $u_k^{m}$
is sought to optimize plant performance. The measurement
control vector selects which sensor (or sensors, in the
more general case), if any, the measurement subsystem
will use, creating a measurement matrix
$Z_k = \{z_0, z_1, \ldots, z_k\}$, where $z_j$ is an
$n$-dimensional column vector. Initial development is for
the general discrete case. Since, in the general case, 
the problem is of undetermined dimensionality and 
unbounded in many cases of practical interest, further 
development is made under the assumptions that the 
plant and measurement subsystem are linear in nature 
and that both the system disturbance and measurement 
noise are Gaussian processes. It is also assumed that the 
cost function is quadratic in nature. The examination of 
this special case allows more definitive results to be ob- 
tained. The resulting dynamic linear system is modeled 
as follows. 


$$x_{k+1} = F_k x_k + G_k u_k^{p} + w_k,$$
$$z_k = H_k\left(u_k^{m}\right) x_k + v_k,$$
where $x_k$ is the state of the system at time $k$, $F_k$ is the
state transition matrix, $G_k$ is the input matrix, $w_k$ is
zero-mean Gaussian system noise with covariance
$Q_k$, $z_k$ is the measurement vector, $H_k$ is the
observation matrix, and $v_k$ is zero-mean Gaussian
measurement noise with covariance $R_k$, which is uncorrelated
with $w_k$. The system state and its associated error
covariance matrix are optimally propagated in accordance
with the following equations.


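$$u_k^{p} = -K_k \hat{x}_k,$$

where $\hat{x}_k$ is the optimal state estimate and

$$K_k = \left[G_k^{T} P_{k+1} G_k + R_k\right]^{-1} G_k^{T} P_{k+1} F_k,$$
$$P_k = Q_k + F_k^{T} P_{k+1} F_k - P_k^{*},$$
$$P_k^{*} = F_k^{T} P_{k+1} G_k \left(G_k^{T} P_{k+1} G_k + R_k\right)^{-1} G_k^{T} P_{k+1} F_k, \qquad k = 0, 1, \ldots, N,$$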


where a superscript $*$ indicates a variable that has been
updated with the current measurement, and its absence
indicates the value as propagated through the model
prior to the measurement update.

Noting that the gain matrix $K_k$ and the covariance
matrices $P_k$ and $P_k^{*}$ are independent of $R_k$ and $H_k$
indicates that they are also independent of the measurement
control vector $u_k^{m}$. This leads to a major conclusion:
the measurement control vector $u_k^{m}$ can be
solved for separately from the plant control vector $u_k^{p}$, and
can in fact be determined a priori. This allows the
computation of the measurement control matrix to be
performed through the following equivalent nonlinear,
deterministic control problem:


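$$\text{Minimize } J^{*} = \sum_{k=0}^{N} \left\{ \operatorname{tr}\left[P_k^{*} P_{k|k}\right] + M\left(u_k^{m}\right) \right\},$$

where $P_{k|k}$ denotes the estimation error covariance after the measurement update at time $k$ and $M(u_k^{m})$ is the cost of taking the selected measurement.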

Once in this form the problem can be solved em- 
ploying dynamic programming or gradient method 
techniques, with examples of each presented by Meier, 
et al. In the presence of possible local minima, the
dynamic programming approach is preferred to ensure
that a global minimum is obtained. The inclusion of a cost for
taking the measurement as part of the cost function al- 
lows for the dropping of unnecessarily expensive mea- 
surements. All following techniques focus on the min- 
imization of the error covariance, P, and assume that 


exactly one measurement will always be made each it- 
eration. 
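Meier et al. solve the deterministic scheduling problem by dynamic programming or gradient methods. For a handful of sensors and a short horizon, the same global minimum can be found by brute-force enumeration, which makes the structure of the cost easy to see. The sketch below does exactly that and is not the authors' algorithm: it assumes a time-invariant model, a standard Kalman-filter covariance update, exactly one sensor per step, the plain trace of the filtered error covariance in place of the weighted trace, and a fixed per-sensor measurement cost. All function and parameter names are hypothetical.

```python
import itertools
import numpy as np

def kalman_step(P, F, Qw, H, Rv):
    """One predict/update cycle of the error covariance for the chosen sensor (H, Rv)."""
    P_pred = F @ P @ F.T + Qw                      # time update (open-loop propagation)
    S = H @ P_pred @ H.T + Rv                      # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)            # Kalman gain
    return (np.eye(P.shape[0]) - K @ H) @ P_pred   # measurement update

def best_schedule(P0, F, Qw, sensors, cost, N):
    """Enumerate every sensor sequence of length N and return (J, sequence)
    minimizing sum_k [tr(P_k) + cost[s_k]]; exponential in N, small cases only."""
    best = (np.inf, None)
    for seq in itertools.product(range(len(sensors)), repeat=N):
        P, J = P0, 0.0
        for s in seq:
            H, Rv = sensors[s]
            P = kalman_step(P, F, Qw, H, Rv)
            J += np.trace(P) + cost[s]             # estimation term + measurement cost
        if J < best[0]:
            best = (J, seq)
    return best
```

With two sensors that each observe one component of a two-dimensional state and a measurement cost of 100 on the second sensor, the enumeration never selects the expensive sensor, which mirrors the remark above that a measurement cost lets unnecessarily expensive measurements be dropped.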


Bounded Open-Looped Covariance 
Propagation Model 


A technique for a weighted random sampling of sensors 
is proposed by Gupta, et al. and developed in a series 
of publications [4,5,6]. Here a discrete linear system is 
assumed and the error covariance matrix is allowed to 
propagate open-loop (that is, without measurement
updates) for a predetermined number of iterations, $k$.
Dropping the plant portion of the model above, we have
the following equations.


$$x_k = F_{k-1} x_{k-1} + B w_{k-1},$$
$$z_k = H_k x_k + v_k,$$
$$P_{k+1} = F_k P_k F_k^{T} + B Q_k B^{T},$$


where $x_k$ is the state of the system at time $k$, $F_k$ is the
state transition matrix, $w_k$ is zero-mean Gaussian system
noise with covariance $Q_k$, $B$ is the noise input
matrix, $z_k$ is the measurement vector, $H_k$ is the
observation matrix, and $v_k$ is zero-mean Gaussian
measurement noise with covariance $R_k$, which is uncorrelated
with $w_k$. (Note that the absence of the $G_k u_k$ term in this
model reflects the pure sensor approach taken here, rather than
the plant control approach taken by Meier, et al.) The
error covariance matrix propagates in accordance with
the following equations.


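$$f_i(P) = F P F^{T} + B Q B^{T} - F P H_i^{T}\left(H_i P H_i^{T} + R_i\right)^{-1} H_i P F^{T},$$

$$f_i^{k}(P) = f_i\big(f_i(\cdots f_i(P))\big), \qquad f_i \text{ applied } k \text{ times}.$$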
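The one-step covariance map and its $k$-fold composition translate directly into code. The sketch below assumes NumPy and time-invariant matrices, and the function names are hypothetical. In the scalar case with every matrix equal to 1 the map reduces to $p \mapsto (2p+1)/(p+1)$, whose fixed point is the golden ratio, which gives a convenient sanity check.

```python
import numpy as np

def f(P, F, B, Q, H, R):
    """Riccati-type map for one sensor: F P F' + B Q B' - F P H'(H P H' + R)^{-1} H P F'."""
    S = H @ P @ H.T + R
    return F @ P @ F.T + B @ Q @ B.T - F @ P @ H.T @ np.linalg.solve(S, H @ P @ F.T)

def f_k(P, k, F, B, Q, H, R):
    """Apply the map k times, i.e. f(f(...f(P)))."""
    for _ in range(k):
        P = f(P, F, B, Q, H, R)
    return P

one = np.array([[1.0]])
print(f_k(one, 2, one, one, one, one, one))   # scalar example: 1 -> 1.5 -> 1.6
```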


While a general solution to this equation appears 
intractable, the concept is developed further by prov- 
ing the existence of both upper and lower bounds un- 
der certain conditions. Requirements for the existence 
of these bounds include that $P$ is positive semi-definite,
$R$ is positive definite, and that the selection of sensor $i$ for the
$j$th measurement is independent of all other selections.
The upper bound is 


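$$X = BQB^{T} + AXA^{T} - \sum_{i=1}^{N} q_i\, A\left[X C_i^{T}\left(R_i + C_i X C_i^{T}\right)^{-1} C_i X\right] A^{T},$$

where $A$ and $C_i$ play the roles of $F$ and $H_i$ above and $q_i$ is the probability of selecting sensor $i$,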
and the corresponding condition for the lower bound is

$$\sum_{i} q_i \left|\lambda_{\max}(A_i)\right|^{2} < 1.$$


The final step in the development of this technique 
is to show that sensors chosen according to a Markov 
chain fall within these bounds, which in terms of the 
Markov chain parameters are 


N 
x - Jy 
k+1 = TAK 
j=l 


N 
jwi _ i eek 
TX = di fe (Xi) aij 
i=1 
k 
= i-1(_J 2 eed i 
Ye = 2045; Ca aims) $6, 
i=1 


-(BQB") + qi! m3 f6,(Po)- 


The bounds provided by this technique are loose,
and a designer has the opportunity to fine-tune the
system's performance by adjusting $q_j$. Selection of this
parameter is also known to create a thresholding effect
where use of a particular sensor is disallowed. Once im- 
plemented, this algorithm has a computational burden 
that is orders of magnitude less than a tree-search al- 
gorithm and finds a near-optimal solution in the steady 
state. 
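The effect of the selection probabilities can be explored by simulation. The sketch below uses the simplest randomized scheme, in which sensor $i$ is drawn independently at every step with probability $q_i$ (a Markov chain would instead draw the next sensor from the row of a transition matrix indexed by the current sensor). It estimates the time-averaged trace of the covariance by Monte Carlo; it does not compute the analytical bounds above, and the function and parameter names are hypothetical.

```python
import numpy as np

def average_trace(P0, F, B, Q, sensors, q, steps, runs, seed=0):
    """Monte Carlo estimate of the time-averaged tr(P_k) when, at every step,
    sensor i = (H_i, R_i) is chosen independently with probability q[i]."""
    rng = np.random.default_rng(seed)
    BQB = B @ Q @ B.T
    total = 0.0
    for _ in range(runs):
        P = P0
        for _ in range(steps):
            i = rng.choice(len(sensors), p=q)          # i.i.d. random sensor choice
            H, R = sensors[i]
            S = H @ P @ H.T + R
            P = F @ P @ F.T + BQB - F @ P @ H.T @ np.linalg.solve(S, H @ P @ F.T)
            total += np.trace(P)
    return total / (runs * steps)
```

Giving a poor sensor a larger share of $q$ raises the average trace, which is the lever a designer adjusts when tuning the selection probabilities.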


Branch-and-Bound Search Method 


The next technique, developed by Feng, et al. [3], is 
a branch-and-bound method, also built on a discrete 
time, time-varying model. The sensor schedule is
represented by the vector $u$, where element $u_k = i$ means sensor
$i$ is in use at time $k \in \{1, 2, \ldots, N\}$, and $U$ is the set of
all possible u vectors. Since U is a finite set, at least one 
vector u will provide an optimal solution. To avoid an 
exhaustive search, a lower bound is developed for the 
propagation of the error covariance matrix and the al- 
gorithm performs a tree search for the optimal solution, 
pruning branches based on this lower bound. While the 
system model is similar to the previous model, the cost 
function is modified as follows to reflect use of the u
vector. 


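$$J(u) = \sum_{k=0}^{N-1} \operatorname{Tr}\{\Sigma(k) P_u(k)\}\,dt + c \cdot \operatorname{Tr}\{P_u(N)\},$$

where $\Sigma$ is an $n \times n$ positive definite matrix-valued
function with

$$|\Sigma_{ij}(k)| < L, \quad k \in \{1, 2, \ldots, N\}, \quad i, j = 1, 2, \ldots, n,$$

where $L$ and $c$ are positive constants, and

$$P_u(k) = E\left\{\left(x(k) - \hat{x}(k)\right)\left(x(k) - \hat{x}(k)\right)^{T} \,\middle|\, \mathcal{F}_u\right\}$$

is the error covariance matrix and $\mathcal{F}_u$ is the smallest
$\sigma$-algebra for which $z$ is measurable for $u \in U$.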

For two symmetric matrices $P_1$ and $P_2$ of the same
dimension, the notation $P_1 \geq P_2$ is used to indicate that
$P_1 - P_2$ is a positive semi-definite matrix. Feng, et al. [3]
proceeded to prove that the lower bound on the covariance
matrix exists by showing that if

$$P_{u_1}(k) \leq P_{u_2}(k) \quad \text{for } k \in \{1, 2, \ldots, N\}, \quad \text{then } J(u_1) \leq J(u_2),$$

and that if

$$p_{ii} \geq \sum_{j=1,\, j \neq i}^{n} |p_{ij}|, \quad i = 1, 2, \ldots, n,$$

then $P$, an $n \times n$ symmetric matrix, is positive semi-definite.
Hence, choosing

$$W(k) = \max_{i=1,2,\ldots,N} W_i(k),$$

where $W_i(k)$ is a diagonal matrix chosen such that
$C_i^{T}(k)\left(D_i(k) D_i^{T}(k)\right)^{-1} C_i(k) \leq W_i(k)$,
the lower bound is given by

$$LB(u(1), u(2), \ldots, u(i)) = \sum_{k=0}^{N-1} \operatorname{Tr}\{\Sigma(k) P^{*}(k)\}\,dt + c \cdot \operatorname{Tr}\{P^{*}(N)\},$$

subject to the system propagation equations

$$x_k = F_{k-1} x_{k-1} + B(k)\, w_{k-1},$$
$$z_k = C(k)\, x_k + D(k)\, v_k,$$

where $k \in \{0, 1, \ldots, T-1\}$ and $Q_k$ and $R_k$ are the
covariance matrices of the mutually uncorrelated noises $w_k$ and $v_k$, respectively.
Branching is now implemented by noting that if
$Y_i(k) \geq Y_j(k)$, where $Y_i(k) = C_i^{T}(k)\left(D_i(k) D_i^{T}(k)\right)^{-1} C_i(k)$, then for any sequences

$$\{u(1), u(2), \ldots, u(k-1), i, u(k+1), \ldots, u(T)\} \quad \text{and} \quad \{u(1), u(2), \ldots, u(k-1), j, u(k+1), \ldots, u(T)\},$$

the covariance matrix obtained with the first sequence is less
than or equal to that obtained with the second sequence.

Feng, et al. [3] present the associated algorithm and 
provide an illustrative example. An exhaustive search 
of the solution space for the example would have taken 
10'* checks, while the branch and bound approach 
found the solution after branching 6842 times and ex- 
amining 435 solutions. 
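The tree search can be illustrated with a depth-first branch and bound over schedules. The sketch below uses a simpler bound than the one of Feng et al.: because every covariance produced by the Riccati-type update is at least $BQB^{T}$ (the remaining term is a propagated posterior covariance, hence positive semi-definite), each remaining step contributes at least $\operatorname{Tr}(BQB^{T})$, so the cost accumulated so far plus (steps left) $\times \operatorname{Tr}(BQB^{T})$ is a valid lower bound. It assumes a time-invariant model, a cost of $\sum_k \operatorname{Tr}\,P_u(k)$ with no weighting or terminal term, and hypothetical function names; an exhaustive search is included for comparison.

```python
import itertools
import numpy as np

def riccati(P, F, BQB, H, R):
    """One-step covariance map for sensor (H, R)."""
    S = H @ P @ H.T + R
    return F @ P @ F.T + BQB - F @ P @ H.T @ np.linalg.solve(S, H @ P @ F.T)

def branch_and_bound(P0, F, BQB, sensors, N):
    """Return (cost, schedule, nodes visited) minimizing sum_k tr(P_u(k))."""
    best = [np.inf, None]          # incumbent cost and schedule
    floor = np.trace(BQB)          # every future term satisfies tr(P) >= tr(BQB')
    nodes = [0]

    def visit(P, prefix, cost):
        nodes[0] += 1
        if cost + (N - len(prefix)) * floor >= best[0]:
            return                                   # prune: cannot beat incumbent
        if len(prefix) == N:
            best[0], best[1] = cost, tuple(prefix)   # complete schedule, new incumbent
            return
        for i, (H, R) in enumerate(sensors):         # branch on the next sensor
            P_next = riccati(P, F, BQB, H, R)
            visit(P_next, prefix + [i], cost + np.trace(P_next))

    visit(P0, [], 0.0)
    return best[0], best[1], nodes[0]

def exhaustive(P0, F, BQB, sensors, N):
    """Reference answer: evaluate every schedule."""
    best = np.inf
    for seq in itertools.product(range(len(sensors)), repeat=N):
        P, J = P0, 0.0
        for i in seq:
            H, R = sensors[i]
            P = riccati(P, F, BQB, H, R)
            J += np.trace(P)
        best = min(best, J)
    return best
```

Comparing the returned node count with the size of the full tree shows how much the bound prunes; a tighter bound, such as the one built from $W(k)$ above, prunes more.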


Multi-site Problem Overview 


In the multi-site single-sensor version of the sensor 
scheduling problem, the objective is for a single sen- 
sor to maintain an estimate of a dynamic physical at- 
tribute (e.g., position) of multiple targets. Tiwari, et 
al. [11] present a feasibility criterion for a single sensor 
to maintain a bounded estimate of an attribute at $n$
locations. Yerrick, et al. [13] demonstrate by simulation
the feasibility criterion presented in Tiwari, et al. [11]
and develop a heuristic to find a good sensor motion
model given the dynamics of the system under obser- 
vation. Yerrick, et al. [14] provide an optimal sensor 
coverage solution for two sensor motion models given 
a model of the observed system’s dynamics. The first 
model they study is based on i.i.d. transition
probabilities among the sites. That is, the next site to be visited
by the sensor is chosen according to a stationary proba- 
bility distribution that is independent of the previously 
visited site. Their second model is more sophisticated 
as the moves are chosen according to a transition prob- 
ability matrix. 

All these mentioned works investigate probabilistic 
strategies for the motion of the single sensor among the 
sites. A deterministic approach overcomes one disad- 
vantage of probabilistic motion: with any random mo- 
tion strategy, there is nonzero probability that a par- 
ticular site will not be visited at all in any finite time 
horizon. Yavuz and Jeffcoat [12] build an optimization 
model and show that it is NP-hard. The authors also
develop valid lower and upper bounds for the objec- 
tive function of their model. This paper also exploits 
the relationships between the sensor scheduling prob- 


lem and periodic scheduling problems. In particular, 
pinwheel scheduling and just-in-time manufacturing
scheduling literatures are exploited, and useful results
are incorporated into sensor scheduling. The authors
propose two constructive heuristic procedures to solve the
sensor scheduling problem and evaluate their perfor- 
mance through a computational study. 
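The disadvantage of random motion noted above is easy to quantify for the i.i.d. model: if site $i$ is chosen with probability $p_i$ at every move, the probability that it is never visited in $T$ moves is $(1 - p_i)^T$, which stays positive for every finite $T$ whenever $p_i < 1$. A deterministic cyclic schedule instead guarantees a maximum revisit gap, the quantity that pinwheel scheduling constrains. The helpers below are hypothetical illustrations, not the model of Yavuz and Jeffcoat [12].

```python
def prob_never_visited(p, T):
    """Probability that each site is never visited in T moves when the next
    site is drawn i.i.d. with probabilities p."""
    return [(1.0 - pi) ** T for pi in p]

def max_revisit_gap(schedule, n_sites):
    """Longest gap between consecutive visits to any site when the deterministic
    schedule is repeated cyclically; None if some site never appears."""
    L = len(schedule)
    worst = 0
    for s in range(n_sites):
        idx = [t for t, site in enumerate(schedule) if site == s]
        if not idx:
            return None
        gaps = [b - a for a, b in zip(idx, idx[1:])]   # gaps inside one period
        gaps.append(idx[0] + L - idx[-1])               # wrap-around gap
        worst = max(worst, max(gaps))
    return worst
```

For example, with $p = (0.5, 0.25, 0.25)$ and $T = 4$, each of the two rarely chosen sites is missed with probability $0.75^4 \approx 0.316$, whereas the cyclic schedule $(0, 1, 0, 2)$ revisits every site within at most 4 moves.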


Conclusions 


The three approaches presented demonstrate the power 
of using a dynamic linear model to address the sensor 
scheduling problem. These approaches also integrate 
easily into existing linear filter algorithms and systems. 
As with linear filtering schemes, small system non- 
linearities are masked by the system and measurement 
noises. Should system nonlinearities prevent achieving
the required level of performance, a filter-type
approach may still be achievable with nonlinear
filters and potential approaches to the sensor schedul- 
ing problem using nonlinear filters are presented by 
Oshman [9] and Baras, et al. [2]. 

The recent contributions to multi-site single-sensor 
scheduling explore both probabilistic and deterministic 
solution approaches. The formulation of the problem 
is relatively new and in fact a wide range of different 
versions of the problem remain to be studied. Such ver- 
sions should consider multiple sensors or relax one or 
more of the underlying assumptions, e.g., statistically
independent sites, uniform transition times, no cost of 
movement or measurement, or time-invariant models 
of system dynamics and measurement. 
Introduction/Background


Solvents find widespread use throughout the chemical 
industry, from separations in bulk chemical processing 
to reactions, transport and separations in the fine chem- 
icals and pharmaceutical industries. Solvents are use- 
ful for dissolving solid materials in order to enhance 
reaction rates, to facilitate the transport of solid ma- 
terials, to provide a heat sink during highly exother- 
mic reactions and to allow difficult separations to take 
place (liquid-liquid separations, absorption or extrac- 
tive distillation). The choice of solvent has an impact 
on the economic performance of a given process, and 
on the quality of the products manufactured. For in- 
stance, different solvents can lead to different crystal 
structures being formed during crystallization; the re- 
generation of a solvent leaving a separation unit can be 
more or less energy-intensive, as in the case of chemical 
absorption versus physical absorption, leading to differ- 
ent operating costs; the amount of solvent required to 
process a certain amount of material may vary signifi- 
cantly, leading to different equipment sizes and hence 
capital investment. 

The identification of an optimal solvent is a diffi- 
cult task for several reasons. First of all, solvent design 
is a very complex problem in which many trade-offs 
must be considered. In absorption, for instance, a sol- 
vent which has a high capacity for the compound it 
must remove may be comparatively expensive to re- 
generate, and may be lost with the separated com- 
pound in larger quantities during regeneration. In re- 
actions, a solvent which leads to a high product yield 
may also favor side reactions and thus result in the loss 
of valuable reactants. In addition, the optimal solvent 
is closely linked to the choice of operating conditions 
for a process. Thus, the decoupling of the choice of sol- 
vent and of operating conditions can lead to subopti- 
mal solvent designs. Early methods for the computer- 
aided selection of solvents were based on enumeration 
or “generate-and-test” techniques [3,7]. However, be- 
cause of the complexity of the problem, optimization 
provides a natural framework for solvent design. In its
most complete form, the solvent design problem is in 
fact an integrated plant-wide process and solvent de- 
sign problem. The plant-wide aspect is required as a sol- 
vent which is optimal for a reactor may cause prob- 
lems in separations (e. g., difficult recovery, poor behav- 
ior in crystallization) and vice versa. The design vari- 
ables to be considered include equipment sizes and op- 
erating conditions (continuous variables), as well as the 
molecular structure of the solvent (integer or binary 
variables). The general problem structure is therefore 
a mixed-integer program. Usually, this problem is non- 
linear in nature and, depending on the type of process 
model used, it may involve differential and algebraic 
equations. 

From a practical point of view, it is not yet feasible 
to tackle the integrated plant-wide design problem for 
processes including both reaction and separation. The 
first publications on the use of optimization in solvent 
design focused on the optimization of physical property 
metrics such as solvent capacity and heat of vaporiza- 
tion [25,29]. There has since been steady progress on 
extensions of the problem formulation, with more di- 
rect measures of performance being used. The design 
of solvents for gas-liquid and liquid-liquid separations 
is a well-studied problem and some of the techniques 
developed are now used in industry. Much remains 
to be done on solvent design for reactions and solid- 
liquid separations (crystallization). For these unit op- 
erations, solvents are often chosen on the basis of in- 
tuition, similarity with other known systems and ex- 
perimentation. The choices being made are thus often 
suboptimal. One major hurdle remains problem for- 
mulation because meaningful relations between solvent 
structure and solvent properties do not exist, or existing 
relations are very complex (e.g., Monte Carlo simula- 
tions or quantum-mechanical calculations). 

In this article, the state-of-the-art in problem for- 
mulation is reviewed, and methods used to solve the 
problem are briefly discussed. There has been extensive 
and successful work on solvent selection based on “gen- 
erate-and-test” approaches [3,15]. This approach is par- 
ticularly well suited to the case where all decisions are 
integer and can therefore be enumerated but is not con- 
sidered here because it does not usually rely on the for- 
mulation and solution of an optimization problem, but 
rather on matching certain property constraints [26]. 


Perspectives for further developments of optimization- 
based solvent design are considered at the end of this 
article. 


Formulation 
General Problem Formulation 


The general formulation of the optimal solvent design 
problem is given by 


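$$
\begin{aligned}
\min_{x,\,y}\quad & f(x, y)\\
\text{subject to}\quad & h_{sp}(x, y) = 0,\\
& g_{sp}(x, y) \le 0,\\
& h_{cf}(x, y) = 0,\\
& g_{cf}(x, y) \le 0,\\
& h_p(x, y) = 0,\\
& g_p(x, y) \le 0,\\
& x \in \mathbb{R}^{m},\\
& y \in \{0, 1\}^{q},
\end{aligned}
\tag{1}
$$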


where $f$ is the objective function; $h_{sp}$ is a set of
structure-property equality constraints, which relate
solvent molecular structure to physical properties; $h_{cf}$ is
a set of chemical feasibility and complexity equality
constraints, which ensure that only meaningful solvent
molecules are designed; $h_p$ is a set of equality
constraints describing the process, such as mass balances,
energy balances and sizing constraints; $g_{sp}$ is a set of
structure-property inequality constraints; $g_{cf}$ is a set
of chemical feasibility, complexity and solvent
property inequality constraints, such as limits on the
solvent boiling point; $g_p$ is a set of process performance
constraints; $x$ is an $m$-dimensional vector of continuous
variables denoting operating conditions, physical
properties or process variables; and $y$ is a $q$-dimensional
vector of binary variables describing the solvent structure.


Representation of the Solvent 


Atom Groups The solvent is usually represented as 
a set of atom groups which are commonly used in group 
contribution methods [34], such as CH3, CH2, OH and 
COOH. Any given molecule can thus be represented by 
a vector of integer variables, n, where the ith element 
corresponds to the number of groups of type i. Thus, 
n-butanol is represented by $n_{CH_3} = 1$, $n_{CH_2} = 3$ and
$n_{OH} = 1$. In order to allow the solution of the problem
by mixed-integer programming algorithms such as the 
outer approximation [12,42], the integer variables are 
mapped onto a set of binary variables with the following 
constraints: 


$$\sum_{k=0}^{K} 2^{k} y_{ik} - n_i = 0, \quad \forall i \in G, \tag{2}$$


where $G$ denotes the set of groups, each $y_{ik}$ denotes
a binary variable and $K$ is such that $\sum_{k=0}^{K} 2^{k}$ is equal
to the maximum number of groups of any given type 
allowed in the optimal solvent. This allows the variable 
vector n to be treated as a vector of continuous vari- 
ables. This approach has been used extensively in the 
optimal solvent design literature. 
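The binary expansion in (2) can be checked with a few lines of code. The sketch picks the smallest $K$ whose largest representable value, $2^{K+1} - 1$, covers a given maximum group count; when the maximum is not of that form, an additional bound $n_i \le n_{\max}$ would be needed. Function names are hypothetical.

```python
def binary_digits_needed(n_max):
    """Smallest K such that sum_{k=0}^{K} 2^k = 2^(K+1) - 1 >= n_max."""
    K = 0
    while 2 ** (K + 1) - 1 < n_max:
        K += 1
    return K

def encode(n, K):
    """Binary variables y_k, k = 0..K, with sum_k 2^k y_k = n."""
    if not 0 <= n <= 2 ** (K + 1) - 1:
        raise ValueError("n is not representable with K + 1 binary variables")
    return [(n >> k) & 1 for k in range(K + 1)]

def decode(y):
    """Recover the integer group count from its binary variables."""
    return sum(2 ** k * yk for k, yk in enumerate(y))
```

For n-butanol with at most 7 groups of any type, $K = 2$ and $n_{CH_2} = 3$ is encoded as $(y_0, y_1, y_2) = (1, 1, 0)$.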


Atom Groups and Connectivity Information An al- 
ternative approach, which allows a more detailed rep- 
resentation of the molecule and the use of alterna- 
tive structure-property relations such as second-order 
group contribution techniques [1,28] or connectivity 
indices [20], is based on a graph-theoretic representa- 
tion of the compound via an adjacency matrix or sim- 
ilar tools. This has been proposed in the context of 
computer-aided molecular design by a number of au- 
thors [4,9,10,35]. 


Chemical Feasibility Constraints In order to ensure 
that the solvent designed is indeed physically meaning- 
ful, a number of constraints must be imposed to limit 
the combinations of binary variables. On the basis of 
typical boiling point constraints on solvents, it is first 
noted that solvents are usually acyclic, monocyclic aro- 
matic or, more rarely, bicyclic aromatic molecules. This 
can be represented by three binary variables: $y_a = 1$
for an acyclic molecule, $y_b = 1$ for a bicyclic solvent
molecule and $y_m = 1$ for a monocyclic molecule [29]:


$$y_a + y_b + y_m = 1. \tag{3}$$


Furthermore, a continuous variable $m$ is defined to
represent the type of molecule. For a monocyclic
molecule, $m = 0$; for an acyclic molecule, $m = 1$;
and for a bicyclic molecule, $m = -1$. This is expressed
in terms of the binary variables as

$$m - (y_a - y_b) = 0. \tag{4}$$


The octet rule [29] ensures that the solvent molecule 
designed is structurally feasible and that there are no 
remaining free attachments in the molecule. It is based 
on the valency ($v_i$) of different structural groups:

$$\sum_{i \in G} (2 - v_i) n_i - 2m = 0. \tag{5}$$

To prevent any group from being bonded to itself, the
formation of a double bond or the formation of more
than one molecule, the following constraint can be
included [29]:

$$n_j (v_j - 1) + 2m - \sum_{i \in G} n_i \le 0, \quad \forall j \in G. \tag{6}$$


The following constraint ensures that the molecule 
contains at least two groups: 


$$2 - \sum_{i \in G} n_i \le 0. \tag{7}$$

In an aromatic molecule, the number of aromatic 
groups must be 6 if the molecule is monocyclic or 10 if 
it is bicyclic: 


$$\sum_{i \in G_a} n_i - 6 y_m - 10 y_b = 0, \tag{8}$$

where $G_a$ is the set of aromatic groups.
In a bicyclic molecule the number of aromatic
carbon groups ($n_{aC}$) must be greater than or equal to 2:

$$2 y_b - n_{aC} \le 0. \tag{9}$$


Chemical complexity constraints can be imposed 
to describe the presence of side chains on cyclic 
molecules [13]. Finally, limits can be placed on the to- 
tal number of groups in the molecule, on the number 
of specific functional groups and on particular com- 
binations of groups. This reduces the solvent design 
space, making the problem more tractable and it also 
accounts for the fact that group contribution methods 
typically work well for medium-sized compounds with 
a few functional groups. 
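A candidate group vector can be tested against constraints (3) to (9) directly. The valencies below are assumptions chosen for illustration (CH3 and OH with one free attachment, CH2 and the aromatic aCH with two, counting ring bonds, and the substituted or fused aromatic carbon aC with three); they are not taken from [29], and the group labels and function name are hypothetical.

```python
# Hypothetical group data: name -> (valency v_i, is_aromatic)
GROUPS = {
    "CH3": (1, False),
    "CH2": (2, False),
    "OH": (1, False),
    "aCH": (2, True),
    "aC": (3, True),
}

def feasible(n, ya, yb, ym, groups=GROUPS):
    """Check constraints (3)-(9) for group counts n (dict: group -> count)."""
    m = ya - yb                                              # (4)
    total = sum(n.values())
    aromatic = sum(c for g, c in n.items() if groups[g][1])
    checks = [
        ya + yb + ym == 1,                                               # (3)
        sum((2 - groups[g][0]) * c for g, c in n.items()) - 2 * m == 0,  # (5) octet rule
        # (6); groups absent from n give 2m - total <= 0, implied by (7) since m <= 1
        all(c * (groups[g][0] - 1) + 2 * m - total <= 0 for g, c in n.items()),
        2 - total <= 0,                                                  # (7)
        aromatic - 6 * ym - 10 * yb == 0,                                # (8)
        2 * yb - n.get("aC", 0) <= 0,                                    # (9)
    ]
    return all(checks)
```

With these assumed valencies, n-butanol, benzene, toluene and naphthalene pass, while a CH3/CH2 chain with no end group fails the octet rule (5) because a free attachment remains.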
Design of Solvent Mixtures Several authors have ex- 
panded the size of the solvent design space by consider- 
ing solvent mixtures, in which the solvents in the mix- 
ture and its composition are determined as part of the 
problem solution [8,38]. The presence of continuous 
variables to describe composition makes the use of op- 
timization techniques, as opposed to enumeration tech- 
niques, especially suitable. 
Methods/Applications


Having introduced some general concepts on the for- 
mulation of the optimal solvent design problem, we 
now turn our attention to the specific problems which 
have been tackled to date, and to the solution methods 
which have been employed. The applications are clas- 
sified in terms of the type of objective function used. 
Most applications are in the area of fluid-fluid
separations, but there have been a few publications on
reactions and on solid-fluid separations.


Design for Property Targets 


Properties from Group Contribution and Correla- 
tions Initial work in the area of solvent design was 
focused on identifying solvents that maximize property 
performance measures. In a separation process, solvent 
capacity and selectivity could be taken as metrics for the 
effectiveness of the separation and solvent loss as a met- 
ric for the difficulty of the solvent regeneration. This ap- 
proach was adopted in [29], where the following func- 
tion was considered as an objective: 


$$\text{capacity} \times \text{selectivity} - 100 \times \text{solvent loss}. \tag{10}$$


This early work on optimal solvent design was applied 
to multicomponent gas absorption and liquid-liquid 
extraction using a mixed-integer nonlinear program- 
ming (MINLP) formulation, thus extending an earlier 
approach based on a nonlinear programming (NLP) 
formulation [25]. The required physical properties were 
calculated using group contribution techniques and 
similar correlations. In particular, phase equilibria were
predicted using the UNIFAC approach [21]. This ac- 
tivity coefficient model has found many applications in 
solvent design thanks to its versatility. 

The use of group contribution techniques and cor- 
relations allows many important selection criteria to be 
taken into account. Issues such as toxicity and environ- 
mental impact as deduced from octanol-water parti- 
tion coefficients can be incorporated in the formulation 
as constraints [2]. Correlations provide a route to ex- 
tending the scope of solvent design formulations. The 
design of solvents for crystallization can be considered 
by incorporating effects such as the influence of solvent 
choice on crystal shape [18]. Solvent choice for an ex- 
tractive fermentation process can be considered by in- 
corporating issues such as biocompatibility, inertness 


and ability to cause phase splitting [43]. This captures 
the effect of the solvent on yield in terms of its ability to 
promote product extraction from the mixture. The im- 
pact of solvent on reaction rate constants can be taken 
into account via a combination of group contribution 
techniques and the solvatochromic equation of [11], as 
shown in [13]. In all these cases, the MINLP formula- 
tion provides a single integrated framework to consider 
the trade-offs between the different solvent properties. 


Properties from More Detailed Models Group con- 
tribution techniques are usually most accurate for 
medium-sized molecules which contain few functional 
groups [34]. While this is often appropriate for the sol- 
vent being designed, the solutes considered are often 
more complex and this may result in a poor representa- 
tion of the solvent-solute interactions. This has reper- 
cussions on the quality of the phase equilibria predic- 
tions and can have a significant impact on the design. 
To address this issue, recent work has been aimed at in- 
corporating more realistic representations of the solute 
and of the solvent-solute systems. 

In [22,36] it was shown how continuum solvation 
models, in which the solvent is modeled as a continuum 
but the solute is modeled at the quantum-mechanical 
level, can be used in solvent design. Lehmann and 
Maranas [22] used the model of [24] to predict activ- 
ity coefficients for solvent-solute pairs. This informa- 
tion was used to optimize the capacity and selectiv- 
ity of the solvent, while taking environmental consid- 
erations into account. The use of such models poses 
additional difficulties in the context of optimization- 
based solvent design, because the evaluation of the 
quantum-mechanical quantities requires the solution 
of an optimization problem (energy minimization). 
This naturally leads to a bilevel formulation with a very 
expensive inner problem. Lehmann and Maranas [22] 
successfully applied their approach to a small case 
study, using a genetic algorithm to solve the outer sol- 
vent design problem. However, they concluded that 
the presence of adjustable parameters in the solvation 
model limited the applicability of the method because 
these parameters are hard to determine reliably. Shel- 
don et al. [36] used a different solvation model [23] 
and focused on minimizing the free energy of solvation. 
This particular problem can be formulated as a single- 
level problem and the authors applied this formulation
to a set of case studies, using the outer-approximation
algorithm [12,42]. A solvent design space of
over 3000 molecules was considered in each case study 
and the algorithm consistently converged to the
optimal solution after evaluating fewer than 12% of the
candidate molecules. Despite the high computational cost
of quantum-mechanical calculations, the best solvents 
were identified in reasonable CPU time. However, work 
remains to be done to incorporate such detailed prop- 
erty models into more realistic solvent design problems. 

Another possibility to improve the quality of prop- 
erty prediction is to resort to group contribution equa- 
tions of state which provide a description of gas and 
liquid phases, in contrast to activity coefficient mod- 
els which are limited to the liquid phase. This allows 
the representation of nonideal vapor phases, which 
is especially relevant for high-pressure processes. Ta- 
mouza et al. [40] and Thi et al. [41] have proposed 
a group contribution version of the statistical associat- 
ing fluid theory (SAFT) equation of state which takes 
into account molecular shape and hydrogen bonding 
and can thus be used to model complex fluid mixtures 
reliably. Recently, Keskes et al. [19] have shown that 
a solvent for a high-pressure absorption process for 
CO$_2$ removal from natural gas can be designed based
on a group contribution version of the SAFT equation 
of state. A limited solvent design space was used, but 
this opens the way for further uses of equations of state 
in solvent design. 


Design for Process Targets 


Since the mid-1990s, the formulation of the solvent 
design problem has become increasingly sophisticated 
as researchers have attempted to capture the com- 
plex trade-offs that are necessary to design an optimal 
process-solvent system. Two main classes of techniques 
have been developed: sequential approaches, in which 
some features of the solvent are preselected before turn- 
ing to the process optimization problem, and integrated 
solvent and process design approaches. 


Sequential Approaches In sequential approaches, 
a first step consists of identifying candidate solvents, or
features of “good” solvents on the basis of property tar- 
gets. In a second step, an optimal process is designed for 
the solvents selected in the first step. Such an approach 


was proposed by Pistikopoulos et al. [33,39] based on 
a multiobjective framework which accounts for eco- 
nomic and environmental considerations. A list of sol- 
vents is first generated on the basis of environmental 
and property-based criteria. The solvents chosen are 
then verified in different process flowsheets/designs. 

In a series of papers, Papadopoulos and
Linke [30,31,32] proposed an approach in which
candidate solvents are first designed on the basis of property
targets. A clustering algorithm is then used to extract
information from the set of high-performance
candidates. This information is then incorporated into
a multiobjective process design problem. Both
optimization problems (initial solvent design and process
design) are solved using a stochastic optimization
algorithm. This approach has been applied to several
separation process design problems.


Integrated Solvent Design and Process Operation 
The simultaneous optimization of the operation of 
a dynamic separation process and of the solvent was 
considered in [14]. In this case, the process equations 
$h_p$ and $g_p$ in formulation (1) are given by a set of
differential and algebraic equations, resulting in a
mixed-integer dynamic optimization (MIDO) problem. The
authors applied the MIDO algorithm of [5,6] and a de- 
composition approach based on the work of [33], in 
which feasibility of key property constraints is tested 
before solving the computationally intensive MIDO 
primal problem. The system considered in this work 
was a batch extractive distillation process, with the ob- 
jective to debottleneck existing equipment. 


Integrated Process and Solvent Design 


Single-Objective Framework Hostrup et al. [16] pro- 
posed an MINLP formulation of the integrated process 
synthesis and solvent design problem. They used a se- 
ries of analysis steps to explore the properties of can- 
didate compounds and the properties of mixtures in- 
cluding candidate compounds. This allowed them to 
eliminate some candidate compounds on the basis of 
heuristics or on the basis of specific process constraints. 
Depending on the number of solvent candidates re- 
maining after this study, a set of NLP process design 
problems or an MINLP is solved to identify the optimal 
process/solvent combination. 

Multiobjective Framework Buxton et al. [8] extended
the approach of [33]. Their approach is based on for- 
mulating a multiobjective problem in which economic 
and environmental criteria are taken into account. An 
approach to life-cycle analysis has been included in 
the formulation [17]. The large and highly nonlinear 
mixed-integer problem which results from this formu- 
lation is solved using a decomposition-based approach. 


Solution Techniques 


Several solution techniques have been brought to 
bear on the range of problem formulations devel- 
oped so far. Many researchers have focused on for- 
mulation rather than solution technique, and have 
applied local optimization algorithms such as the 
outer-approximation [12,42] and MIDO [5,6] algo- 
rithms. Others have used decomposition-based ap- 
proaches [33] or sequential approaches [30] to reduce 
the complexity of the problem and avoid unnecessary 
evaluations of large sets of equations. 

The solvent design problem is inherently noncon- 
vex: whenever phase equilibria are considered, as is 
always the case for separation design, highly nonlin- 
ear constraints are introduced in the problem formu- 
lation. To address this issue, there have been a few at- 
tempts at deploying and adapting global optimization 
techniques. Sinha et al. [37] have developed a special- 
ized branch-and-bound algorithm. They later devel- 
oped an approach based on interval analysis to tackle 
the same problem [38]. Wang and Achenie [44] pro- 
posed a hybrid stochastic/deterministic approach based 
on a combination of simulated annealing and local 
MINLP (outer approximation). Marcoulaki and Kokos- 
sis [27] demonstrated the use of a simulated anneal- 
ing algorithm to solve solvent design problems in sepa- 
ration process design (liquid-liquid extraction, extrac- 
tive distillation, gas absorption). Finally, genetic algo- 
rithms have been used, for instance, by Lehmann and 
Maranas [22]. 


Conclusions 


The solvent design problem can naturally be framed 
as a mixed-integer optimization problem in which 
trade-offs between different properties can be consid- 
ered through property targets or, more realistically, 
by measuring their effect on process metrics such as 


environmental impact or economic performance. The 
main difficulties in the formulation and solution arise 
from the complexity of the performance-structure re- 
lationships, which can be highly nonlinear, as in the 
case of phase-equilibrium models, or very expensive 
to evaluate, as in the case of detailed process models 
or quantum-mechanical calculations. Although much 
progress has been made in formulating problems for 
fluid-fluid separations, much remains to be done for reactive processes and solid-fluid separations. In particular, reliable property-structure relations must be developed and incorporated into the formulation of the
problem. This may require new solution techniques to 
be developed, especially when the property-structure 
model requires an optimization problem to be solved. 
Furthermore, the nonconvexity of the problem must be 
addressed more generally so that reliable solutions can 
be guaranteed. 

Introduction


A triangulation of a given set S of n points in the Eu- 
clidean plane is a maximal set of non-crossing straight 
line segments (called edges) which have both endpoints 
in S. As an equivalent definition, a triangulation of S is 
a partition of the convex hull of S into triangular faces
whose vertex set is exactly S. Triangulations are a versa- 
tile means for partitioning and/or connecting geomet- 
ric objects. They are used in many areas of engineer- 
ing and scientific applications such as finite element 
methods, approximation theory, numerical computa- 
tion, computer-aided geometric design, computational 
geometry, etc. Many of their applications are surveyed 
in [8,11,20,61]. 


A triangulation of S can be viewed as a planar graph 
whose vertex set is S and whose edge set is a subset of 
S x S. The Eulerian relation for planar graphs implies 
that the number e(S) of edges, and the number t(S) of
triangles, do not depend on the way of triangulating S.
In particular,

$$e(S) = 3n - 3 - h, \qquad t(S) = 2n - 2 - h,$$


where h denotes the number of edges bounding the 
convex hull of S. The number of different triangulations 
of S is, however, an exponential function of n. More precisely, every set of n points in general position has $\Omega(2.63^n)$ triangulations [57]. A general upper bound is $O(43^n)$ [68], and point sets can be constructed that exhibit $\Omega((\sqrt{72})^n)$ triangulations [6]. All these bounds are very recent, and various
prior bounds have been given; see the citations listed 
in [6,57,68]. 
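The two counting formulas are easy to check numerically. The following minimal Python sketch is our own illustration (the function names are hypothetical): it counts the convex hull vertices h with Andrew's monotone chain algorithm and then evaluates e(S) and t(S), assuming points in general position.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle o, a, b (> 0 for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_size(points: List[Point]) -> int:
    """Number h of convex hull vertices via Andrew's monotone chain.

    Collinear boundary points are dropped (general position assumed)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return len(pts)
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Each chain ends with the first point of the other chain.
    return len(lower) + len(upper) - 2


def triangulation_counts(points: List[Point]) -> Tuple[int, int]:
    """Return (e(S), t(S)) = (3n - 3 - h, 2n - 2 - h)."""
    n = len(points)
    h = hull_size(points)
    return 3 * n - 3 - h, 2 * n - 2 - h
```

For a square with one interior point (n = 5, h = 4), every triangulation has 8 edges and 4 triangles.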

The problem of automatically generating optimal triangulations for a point set S has been a subject of research for decades. Enumerating all possible triangulations and selecting an optimal one (exhaustive
search) is too time-consuming even for small n. In fact, 
constructing optimal triangulations in polynomial time 
is a challenging task. This becomes more apparent as 
greedy methods, such as deleting candidate triangles or 
edges from worst to best, are doomed to fail by the NP- 
completeness of the following problem; see Lloyd [56]: 
Given a point set S and some set E of edges on S, decide 
whether E contains a triangulation of S. 

Results on optimizing combinatorial properties of 
triangulations, such as their degree (Jansen [39]) or 
connectivity (Dey et al. [21] and Dillencourt [25]) are 
rare. Most optimization criteria for which efficient algo- 
rithms are known concern geometric properties of the 
edges and triangles. 


Delaunay Triangulations 


The most commonly constructed and maybe the most 
famous triangulation for a point set S is the Delaunay triangulation, DT(S). See [8,27,34,45] for extensive treatments and surveys. DT(S) contains, for each triple of points in S, the corresponding triangle provided its circumcircle is empty of points in S. Sibson [70] proved that DT(S) can be constructed from any given triangulation T of S by applying any sequence of good edge flips. These are exchanges of diagonals in one of T's convex quadrilaterals Q such that, after the flip, the two new triangles are locally Delaunay, i.e., have circumcircles empty of vertices of Q.
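The local test behind a good flip can be expressed with the classical incircle determinant. The Python sketch below is a minimal illustration under our own naming; it uses plain floating point, without the exact-arithmetic safeguards a robust implementation would need. For a convex quadrilateral a, b, c, d split by the diagonal ac, flipping to bd is good exactly when d lies inside the circumcircle of abc.

```python
from typing import Tuple

Point = Tuple[float, float]


def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (> 0 if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if d lies strictly inside the circumcircle of triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    det = ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
           + (bdx * bdx + bdy * bdy) * (cdx * ady - adx * cdy)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))
    # A positive determinant means "inside" only for counterclockwise abc.
    return det > 0 if orient(a, b, c) > 0 else det < 0


def is_good_flip(a: Point, b: Point, c: Point, d: Point) -> bool:
    """Convex quadrilateral a, b, c, d (cyclic order) with diagonal ac:
    replacing ac by bd is a good flip iff ac is not locally Delaunay."""
    return in_circle(a, b, c, d)
```

For the kite with vertices (0, 0), (2, -1), (4, 0), (2, 1), the long diagonal is flipped away and the short one is kept.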

Various global optimality properties of DT(S) can 
be proved by observing that every good flip gives a lo- 
cal improvement of the respective optimality measure. 
Lawson [47] observed that equiangularity of a triangu- 
lation, which is the sorted list of its angles, increases 
lexicographically in this way. DT(S) thus maximizes 
the minimum angle. Coarseness of a triangulation is 
measured by the largest circumcircle that arises for its 
triangles. D’Azevedo and Simpson [19] showed that 
DT(S) minimizes coarseness in this sense, and also if 
smallest enclosing circles are taken rather than circumcircles. The latter property, unlike the others, generalizes to higher-dimensional Delaunay triangulations;
see Rajan [64]. Similarly, fatness may be defined as the 
sum of triangle inradii. Lambert [46] pointed out that 
DT(S) maximizes fatness or, equivalently, the mean in- 
radius. Triangular surfaces obtained from lifting DT(S) 
to 3-space (for any given heights at triangle vertices) 
minimize roughness, which is the integral of the squared 
gradient; see Rippa [66]. It is also known that a vari- 
ant of DT(S) minimizes the minimum angle; see Epp- 
stein [31]. 
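Equiangularity can be made concrete in a few lines. In this minimal Python sketch (our own illustration, hypothetical names) a triangulation is a list of triangles given by their vertex coordinates; the sorted angle list is compared lexicographically, which is exactly how Python compares lists.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def angle_at(p: Point, q: Point, r: Point) -> float:
    """Interior angle (radians) at vertex p of triangle pqr."""
    ux, uy = q[0] - p[0], q[1] - p[1]
    vx, vy = r[0] - p[0], r[1] - p[1]
    return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)


def angle_vector(triangles: Sequence[Triangle]) -> List[float]:
    """Equiangularity: all angles of all triangles, sorted increasingly."""
    return sorted(angle_at(t[i], t[(i + 1) % 3], t[(i + 2) % 3])
                  for t in triangles for i in range(3))
```

For the kite (0, 0), (2, -1), (4, 0), (2, 1), the split along the short diagonal (the Delaunay one) has the lexicographically larger angle vector, and hence the larger minimum angle.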

The Delaunay triangulation is a special instance of 
regular triangulations, which are obtained by project- 
ing the lower boundary part of a convex polytope in 
3-space; see e.g. Edelsbrunner and Shah [28]. Regular 
triangulations are, thus, obviously optimal in the sense 
that they allow for a convex lifting surface. Some more 
optimality properties of DT(S) can be derived from this 
fact; see Musin [60]. Delaunay and regular triangula- 
tions can be constrained to live in nonconvex polygons 
(rather than in the convex hull of the underlying point 
set S), see [48] and [1], respectively. Equiangularity and, 
with it, various other optimality properties carry over to 
constrained Delaunay triangulations [48]. 

DT(S) is the geometric dual of the famous Voronoi 
diagram of a point set S and can be computed in 
O(n log n) time and O(n) space by various different ap- 
proaches; see e.g. [8,27,34]. Su and Drysdale [71] gave 
a thorough experimental comparison of available De- 
launay triangulation algorithms. 


On the negative side, DT(S) fails to fulfill opti- 
mization criteria similar to those mentioned above, 
such as minimizing the maximum angle or minimizing the longest edge. Edelsbrunner et al. [29,30] gave $O(n^2 \log n)$ time and $O(n^2)$ time algorithms, respectively, for computing triangulations optimal in these respects. The former algorithm is based on an edge insertion paradigm which, as shown in Bern et al. [10], leads in polynomial time to triangulations with maxmin triangle height, minmax triangle eccentricity, and minmax gradient surface.


Minimum Weight Triangulations 


For a long time, another optimal triangulation problem remained open: the minimum weight triangulation.
Here the criterion is weight, which is defined as the 
sum of all edge lengths. The complexity of computing 
a minimum weight triangulation, MWT(S), for arbi- 
trary planar point sets S was open since 1975 when it 
was mentioned in Shamos and Hoey [67]. Minimum 
weight triangulation is included in Garey and John- 
son’s [35] list of problems neither known to be NP- 
hard, nor known to be solvable in polynomial time. 
Very recently, its complexity status has been resolved; 
Mulzer and Rote [59] proved that minimum weight tri- 
angulation is NP-hard. 

Earlier attempts to prove the minimum weight tri- 
angulation problem NP-hard have resulted in some re- 
lated NP-completeness results; they are listed in [38]. 
Several heuristic algorithms have been proposed to solve 
this problem; see Lingas [53], Plaisted and Hong [63], 
and Heath and Pemmaraju [38]. None of these is 
known to produce a constant approximation in weight, 
although progress in this respect has been made later, 
see below. 

It is well known that the Delaunay triangulation 
DT(S) may exceed MWT(S) in weight by a factor 
of O(n); see Kirkpatrick [42]. Another popular tri- 
angulation, the greedy triangulation, GT(S), also may 
fail to approximate MWT(S) well. GT(S) is obtained by a greedy algorithm intended to yield small weight: edges are considered in order of increasing length, and each edge is inserted unless it crosses a previously inserted edge, until the triangulation is complete. Several fast implementations have been proposed, e.g., by Dickerson et al. [22], who also give a brief history. Levcopoulos [49] showed a lower bound of $\Omega(\sqrt{n})$ on the approximation factor.
A matching upper bound has been given in Levcopoulos and Krznaric [50]. The same paper introduces an interesting modification of GT(S) that achieves a constant-factor weight approximation of MWT(S) in polynomial time, though with a very large constant. Very recently, Remy and Steger [65] developed an algorithm that finds a $(1 + \varepsilon)$-approximation of MWT(S) in quasi-polynomial time, $n^{O(\log^8 n)}$.
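The greedy rule is simple to state in code. The Python sketch below is the naive version (roughly cubic time, far slower than the implementations cited above) and assumes points in general position, so that a crossing can be detected with orientation tests alone; all names are our own.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Edge = Tuple[Point, Point]


def orient(a: Point, b: Point, c: Point) -> float:
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(e: Edge, f: Edge) -> bool:
    """True if e and f meet in a point interior to both segments.
    Edges sharing an endpoint do not cross (general position assumed)."""
    (p, q), (r, s) = e, f
    if len({p, q, r, s}) < 4:
        return False
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def greedy_triangulation(points: Sequence[Point]) -> List[Edge]:
    """Scan edges by increasing length; keep those crossing no kept edge."""
    candidates = sorted(combinations(points, 2), key=lambda e: math.dist(*e))
    chosen: List[Edge] = []
    for e in candidates:
        if not any(properly_cross(e, f) for f in chosen):
            chosen.append(e)
    return chosen
```

On the kite (0, 0), (2, -1), (4, 0), (2, 1) the short diagonal is inserted first, so the long diagonal is rejected and the result has 3n - 3 - h = 5 edges.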

For uniformly distributed point sets S, both triangu- 
lations GT(S) and DT(S) yield satisfactory approxima- 
tions for MWT(S); see e.g. Lingas [54]. In fact, GT(S) 
seems to perform slightly better, as reported in Dick- 
erson et al. [23]. GT(S) can be constructed in O(n) 
expected time in this case, by an algorithm in Drys- 
dale et al. [26] or by a modification of the algorithm in 
Levcopoulos and Lingas [52]. 

The weight of a triangulation may decrease when 
additional points (so-called Steiner points) are admitted. Eppstein [32] showed that the weight of a minimum weight Steiner triangulation for S may be $\Omega(n)$ times smaller than the weight of MWT(S). The same paper gives efficient triangulation algorithms that approximate the weight of the former within a constant factor,
thus improving over previous results in Bern et al. [12]. 
No polynomial-time algorithms are known for the ex- 
act minimum weight Steiner triangulation problem. 

Dynamic programming is a powerful tool to deal 
with discrete optimization problems which are decom- 
posable in a certain sense. It leads to polynomial-time 
solutions for some restricted instances of the minimum 
weight triangulation problem. For example, if S is the 
vertex set of a convex polygon, then MWT(S) can be computed in $O(n^3)$ time and $O(n^2)$ space. The basic
observation used is that - once some triangle of the tri- 
angulation has been fixed - the problem splits into sub- 
problems (subpolygons) whose solutions can be found 
recursively, thereby avoiding recomputation of com- 
mon subproblems. The triangulation method, first pro- 
posed by Gilbert [36] and Klincsek [44], does not really 
exploit convexity. It works as well for nonconvex poly- 
gons, and in fact for any interior face of a planar straight 
line graph. It is worth mentioning that, in the convex 
polygon case, MWT(S) is approximated by GT(S) up 
to a constant factor; see Levcopoulos and Lingas [51]. 
Anagnostou and Corneil [7] consider the case where S gives rise to a small number k of convex layers (nested convex hulls). Their dynamic programming approach works in time $O(n^{3k+1})$ and thus is polynomial for constant k. Meijer and Rappaport [58] later improved the bound to $O(n^k)$ when S is restricted to lie on k non-intersecting line segments.
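The dynamic program for the convex case can be sketched as follows. This minimal Python illustration (hypothetical name, convex polygon given in cyclic order) follows the recursion described above: fixing the triangle on a chord splits the polygon into two subpolygons. Weight is the sum of all edge lengths, boundary included.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mwt_convex_polygon(poly: Sequence[Point]) -> float:
    """Minimum weight triangulation of a convex polygon, O(n^3) time."""
    n = len(poly)

    def d(i: int, j: int) -> float:
        return math.dist(poly[i], poly[j])

    # c[i][j]: least total length of diagonals triangulating poly[i..j],
    # where the chord (i, j) itself is paid for by the enclosing problem.
    c: List[List[float]] = [[0.0] * n for _ in range(n)]
    for gap in range(2, n):
        for i in range(n - gap):
            j = i + gap
            best = math.inf
            for k in range(i + 1, j):       # apex k of the triangle on (i, j)
                cost = c[i][k] + c[k][j]
                if k > i + 1:               # (i, k) is a diagonal, not a side
                    cost += d(i, k)
                if j > k + 1:               # (k, j) is a diagonal, not a side
                    cost += d(k, j)
                best = min(best, cost)
            c[i][j] = best
    perimeter = sum(d(i, (i + 1) % n) for i in range(n))
    return perimeter + c[0][n - 1]
```

The three nested loops give the $O(n^3)$ time bound and the table c the $O(n^2)$ space bound. Unlike the method cited above, this sketch relies on convexity, since it never checks whether a chord lies inside the polygon.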

The minimum weight triangulation problem can 
also be formulated as a linear programming problem. 
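To this end, a variable $x_i$ is assigned to each of the $\binom{n}{2}$ edges $e_i$ defined by S. The objective is to minimize

$$\sum_{e_i} |e_i| \, x_i$$

subject to the constraints

$$0 \le x_i \le 1,$$

$$x_i + x_j \le 1 \quad \text{for } e_i \cap e_j \neq \emptyset,$$

$$x_i + \sum_{e_j \cap e_i \neq \emptyset} x_j \ge 1,$$

where $e_i \cap e_j \neq \emptyset$ means that the edges $e_i$ and $e_j$ cross.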


The last two constraints express the property that a triangulation is a maximal set of non-crossing edges. An integer solution of this linear program yields a minimum weight triangulation: for each edge $e_i$, inclusion into, or exclusion from, MWT(S) is indicated by $x_i = 1$ or $x_i = 0$, respectively. An optimal solution need not be integer, however, and insisting on an integer solution results in an integer programming problem whose general version is known to be NP-complete [35]. Using a modified version of this approach, Ono et al. [62] were able to compute MWT(S) for up to a hundred points.
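To make the 0-1 version of this program concrete, the Python sketch below solves it by brute force: it enumerates every 0/1 vector x, keeps those satisfying the crossing and maximality constraints, and returns the lightest. This is only sensible for a handful of points, since there are $2^{\binom{n}{2}}$ vectors; it is not the method of Ono et al., the names are our own, and general position is assumed.

```python
import math
from itertools import combinations


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(e, f):
    """Interior crossing of two segments; shared endpoints do not count."""
    (p, q), (r, s) = e, f
    if len({p, q, r, s}) < 4:
        return False
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def mwt_by_enumeration(points):
    edges = list(combinations(points, 2))
    m = len(edges)
    cross = [[properly_cross(edges[i], edges[j]) for j in range(m)]
             for i in range(m)]
    best_weight, best_edges = math.inf, []
    for mask in range(1 << m):
        x = [(mask >> i) & 1 for i in range(m)]
        # x_i + x_j <= 1 whenever e_i and e_j cross
        if any(x[i] and x[j] and cross[i][j]
               for i in range(m) for j in range(i + 1, m)):
            continue
        # x_i + sum of x_j over edges e_j crossing e_i >= 1 (maximality)
        if any(x[i] + sum(x[j] for j in range(m) if cross[i][j]) < 1
               for i in range(m)):
            continue
        weight = sum(math.dist(*edges[i]) for i in range(m) if x[i])
        if weight < best_weight:
            best_weight = weight
            best_edges = [edges[i] for i in range(m) if x[i]]
    return best_weight, best_edges
```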

The afore-mentioned polynomial-time results for 
triangulating polygonal domains (rather than point 
sets) give motivation for the following subgraph ap- 
proach to compute MWT(S), proposed e. g. in Xu [73] 
and Cheng et al. [14]: Find a (suitable) subgraph G of 
MWT(S). If G contains k connected components, try 
all possibilities to add k —1 edges to make it a con- 
nected graph C. Complete each of these graphs C to 
a triangulation by optimally triangulating its faces, and 
select a triangulation with minimum weight, which 
gives MWT(S). This approach, which basically is exhaustive search, can be implemented to run in $O(n^{k+2})$
time. The problem, of course, is to find candidates for 
G with k small. 

The subgraph approach should be distinguished from the heuristic approaches in [38,53,63] mentioned before, which also first fix some graph G for S (for example, the minimum spanning tree) and then triangulate its faces. The difference is that G will not be a subgraph of MWT(S) in general, and thus does not lead to an exact solution.


Locally Defined Subgraphs of MWT 


Much effort has been put into the study of subgraphs of MWT(S). Gilbert [36] pointed out that the shortest edge defined by S always belongs to MWT(S). Another
simple observation is that unavoidable edges, which are 
edges not being crossed by any other edge in S x S, 
have to appear in any triangulation of S and thus are 
in MWT(S). For example, all edges of the convex hull 
of S are unavoidable. The number of unavoidable edges 
does not exceed 2n - 2, see Xu [74], and is usually very small.
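Unavoidable edges can be found directly from the definition by testing every pair of candidate edges. The naive Python sketch below takes $O(n^4)$ time, assumes general position, and uses names of our own choosing.

```python
from itertools import combinations


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(e, f):
    """Interior crossing of two segments; shared endpoints do not count."""
    (p, q), (r, s) = e, f
    if len({p, q, r, s}) < 4:
        return False
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def unavoidable_edges(points):
    """Edges of S x S that are crossed by no other edge of S x S."""
    edges = list(combinations(points, 2))
    return [e for e in edges if not any(properly_cross(e, f) for f in edges)]
```

For four points in convex position only the four hull edges are unavoidable, because the two diagonals cross each other.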

Only in recent years have several more substantial subgraphs of MWT(S) been identified. One of them arises from a class of empty neighborhood graphs introduced by Kirkpatrick and Radke [43]. Let $p, q \in S$ and $\beta \ge 1$. The edge pq is included in the β-skeleton, β(S), if the two discs of diameter $\beta|pq|$ passing through both p and q are empty of points in S. It is not hard to see that β(S) always is a subgraph of the Delaunay triangulation DT(S). In fact, β(S) can be constructed from DT(S) in O(n) time; see Lingas [55] or Jaromczyk et al. [40].
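The definition translates into a straightforward $O(n^3)$ Python sketch, meant to illustrate the definition rather than the linear-time construction from DT(S). We take the two discs of diameter $\beta|pq|$ whose boundary circles pass through p and q (their centers lie on the perpendicular bisector of pq) and treat points on a boundary circle as not inside; the function name is our own.

```python
import math
from itertools import combinations


def beta_skeleton(points, beta):
    """Edges pq such that both discs of diameter beta*|pq| through p and q
    contain no other point of S in their interior (beta >= 1)."""
    if beta < 1:
        raise ValueError("this sketch assumes beta >= 1")
    result = []
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        r = beta * d / 2                                  # disc radius
        mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2     # midpoint of pq
        off = math.sqrt(max(r * r - (d / 2) ** 2, 0.0))   # center offset
        ux, uy = -(q[1] - p[1]) / d, (q[0] - p[0]) / d    # unit normal to pq
        centers = [(mx + off * ux, my + off * uy),
                   (mx - off * ux, my - off * uy)]
        if all(math.dist(c, s) >= r
               for c in centers for s in points if s != p and s != q):
            result.append((p, q))
    return result
```

For beta = 1 both discs coincide with the disc on diameter pq, so the function returns the Gabriel graph.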

Interestingly, β(S) is a subgraph of MWT(S) for β large enough. The original bound $\beta > \sqrt{2}$ in Keil [41] has been improved later in Cheng and Xu [17] to $\beta > 1.1768$, which is close to the largest value $2/\sqrt{3}$ for which a counterexample is available [41]. Only for point sets S in convex position is it known that $\beta > 2/\sqrt{3}$ always implies inclusion of β(S) in MWT(S); see Hainz et al. [37]. Whereas β(S) is a connected graph for $\beta = 1$ (known as the Gabriel graph of S), it may be highly disconnected for larger β-values. Cheng et al. [14] prove that, for uniformly distributed point sets S, the number of components is expected to be O(n).

Yang et al. [76] formulated and proved a different 
inclusion region: If the union of the two disks with ra- 
dius |pq| and centered at p and q, respectively, is empty 
of points in S, then pq is an edge of MWT(S). That is, 
points p and q are mutual nearest neighbors. The skeleton generated in this way and the β-skeleton do not contain each other for $\beta > 2/\sqrt{3}$, but for smaller β-values the β-skeleton contains the former as a subgraph.
Note that both graphs are defined via a symmetric and 
local condition. A sufficient asymmetric condition can 
be found in Wang et al. [72]. We refer to Eppstein [33] 
for a survey paper on geometric graphs. 

A distinct attempt to find a sufficient local condi- 
tion defines an edge e as a light edge [2] if there is 
no edge in S x S crossing e and shorter than e. Let 
L(S) denote the graph formed by the light edges for S. 
L(S) obviously contains all unavoidable edges, and in fact contains both Cheng and Xu's 1.1768-skeleton and the skeleton in Yang et al. [76]. By construction, L(S) is a subgraph of the greedy triangulation GT(S), as light edges cannot be blocked by any edge previously inserted by the greedy strategy. On the other hand, L(S) is not a subgraph of MWT(S) in general. However,
if L(S) happens to be a full triangulation of S, then 
L(S) = MWT(S) = GT(S). This allows for a fast com- 
putation of MWT(S) for a certain class of point sets S, 
using greedy triangulation algorithms. 
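Light edges can likewise be computed straight from their definition. The naive Python sketch below assumes general position and uses names of our own; if the returned list has exactly 3n - 3 - h edges, L(S) is a full triangulation and hence, by the result above, equal to MWT(S) and GT(S).

```python
import math
from itertools import combinations


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(e, f):
    """Interior crossing of two segments; shared endpoints do not count."""
    (p, q), (r, s) = e, f
    if len({p, q, r, s}) < 4:
        return False
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def light_edges(points):
    """Edges e such that no edge of S x S crossing e is shorter than e."""
    edges = list(combinations(points, 2))
    return [e for e in edges
            if not any(properly_cross(e, f) and math.dist(*f) < math.dist(*e)
                       for f in edges)]
```

For the kite (0, 0), (2, -1), (4, 0), (2, 1) (n = 4, h = 4) the sketch returns the four sides and the short diagonal, that is, 5 = 3n - 3 - h edges forming a full triangulation.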

These results are observed in Aichholzer et al. [2] as a consequence of the following matching theorem for planar triangulations, proved in Aichholzer et al. [4] and in Cheng and Xu [16]: For any two triangulations $T_1$ and $T_2$ of a fixed point set S, there is a perfect matching between the edge set of $T_1$ and the edge set of $T_2$ such that matched edges either cross or are identical. A similar matching exists for the triangle sets of $T_1$ and $T_2$, where matched triangles either overlap or are identical. The paper [2] further gives several polynomial-time computable lower weight bounds for MWT(S), the simplest one being the weight of L(S). Another bound can be stated as

$$\min_{g \in X(E)} \sum_{e \in E} |g(e)|,$$

where E is any set of non-crossing edges, and X(E) is the set of all matchings $g : E \to S \times S$ with the property that $g(e) = e$ or $g(e)$ crosses e.

The concept of light edges gives rise to a partition of the greedy triangulation GT(S) into levels $L_1, \ldots, L_k$: Let $L_1 = L(S)$ be the set of edges that are light with respect to S x S, let $L_2$ be the set of edges that are light with respect to $(S \times S) \setminus C_1$, where $C_1$ is the set of edges crossing $L_1$, and so on. Upper bounds on the ratio in weight between GT(S) and MWT(S) that depend on the number k of levels are observed in Levcopoulos and Krznaric [50] and Aichholzer et al. [5]. In particular, GT(S) is a constant-factor approximation of MWT(S) provided k = O(1).


Globally Defined Subgraphs of MWT 


We have seen that several subgraphs of MWT(S) can
be found from local conditions, namely, via emptiness 
of particular inclusion regions. The subgraphs below 
are defined in a global way, via intersection of triangu- 
lations. 

Call a triangulation T of S locally minimal if every 
4-sided and point-empty polygon drawn by T is opti- 
mally triangulated. That is, every convex quadrilateral 
contains the shorter one among its two diagonals. It is 
easy to see that GT(S) is locally minimal and DT(S), in 
general, is not. Let LMT(S) denote the intersection of 
all locally minimal triangulations for S. Then LMT(S) is 
a subgraph of MWT(S), as this triangulation of course 
is locally minimal, too. 
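A point-empty quadrilateral drawn by a triangulation is the union of two adjacent triangles, and it is convex exactly when its two diagonals cross. Local minimality can therefore be checked edge by edge, as in this minimal Python sketch (triangles given by vertex coordinates, general position assumed, names our own).

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(e, f):
    """Interior crossing of two segments; shared endpoints do not count."""
    (p, q), (r, s) = e, f
    if len({p, q, r, s}) < 4:
        return False
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def is_locally_minimal(triangles: Sequence[Triangle]) -> bool:
    """True if every convex quadrilateral formed by two adjacent triangles
    contains the shorter of its two diagonals."""
    opposite: Dict[FrozenSet[Point], List[Point]] = {}
    for t in triangles:
        for i in range(3):
            u, v, w = t[i], t[(i + 1) % 3], t[(i + 2) % 3]
            opposite.setdefault(frozenset((u, v)), []).append(w)
    for edge, apexes in opposite.items():
        if len(apexes) == 2:                  # interior edge uv
            u, v = tuple(edge)
            a, b = apexes
            # Quadrilateral a, u, b, v is convex iff ab crosses uv.
            if properly_cross((u, v), (a, b)) and math.dist(a, b) < math.dist(u, v):
                return False
    return True
```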

Whereas it is not known how to compute LMT(S) in polynomial time, a surprisingly large subgraph of LMT(S), the so-called LMT-skeleton, can be identified by a simple and efficient method, proposed in Belleville et al. [9] and in Dickerson and Montague [24]: Consider some edge set $E \subseteq S \times S$. An edge $e \in E$ is called redundant in E if there is no convex quadrilateral formed by E that has e as its shorter diagonal. Edge e is called unavoidable in E if no other edge in E crosses e. The LMT-skeleton algorithm sets $E = S \times S$ and proceeds in several rounds. Each round first identifies all edges redundant in E and eliminates them from the set, and then includes all edges that are unavoidable in the reduced set E into the LMT-skeleton. The algorithm stops when no further edge in E can be classified as either redundant or unavoidable. The number of rounds
(but not the produced LMT-skeleton) depends on the 
ordering in which the edges are examined. In particu- 
lar, there always exists an ordering such that one round 
suffices; see Hainz et al. [37]. 

The fact that the LMT-skeleton for S, and thus LMT(S), tend to be connected even for large point sets comes as a surprise. From the practical point of
view, the LMT-skeleton almost always nearly triangu- 
lates S. On the other hand, a 19-point counterexam- 
ple to connectedness exists [9], and for uniformly dis- 


tributed points, the expected number of components 
is O(n); see Bose et al. [13]. The constant of propor- 
tionality is extremely small, however. It is interesting 
to note that the LMT-skeleton, and the graph of light 
edges L(S), exhibit a similar behavior of connectedness, 
but do not contain each other in general. We mention 
further that the improved LMT-algorithm in [37], that 
tends to yield some additional edges of LMT(S), in- 
deed constructs LMT(S) provided the original LMT- 
skeleton for S is connected. 

The LMT-skeleton clearly can be constructed in 
polynomial time, and several variants have been con- 
sidered in order to gain efficiency [9,15,24,37]. A pow- 
erful tool is pre-exclusion of edges before starting the 
LMT-algorithm, using the exclusion region in Das and 
Joseph [18]: For an edge e, consider the two triangular regions with base e and base angles $\pi/8$. If both regions contain points in S, then e cannot be part of MWT(S). If S is drawn from a uniform distribution, reduction to an expected linear number of candidate edges for MWT(S) is achieved [22], and near-linear expected-time implementations of the LMT-algorithm exist [37,69]. The LMT-skeleton based subgraph approach enables the computation of a minimum weight triangulation for some ten thousand points in reasonable time.
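The exclusion test of Das and Joseph reduces to angle comparisons. In the minimal Python sketch below (names are our own), a point s lies strictly inside the isosceles triangle on one side of the base pq exactly when it is on that side and both base angles it subtends, at p and at q, are below $\pi/8$. An edge is discarded when both sides contain such a point.

```python
import math


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def angle_at(p, q, r):
    """Angle (radians) at p between rays pq and pr."""
    ux, uy = q[0] - p[0], q[1] - p[1]
    vx, vy = r[0] - p[0], r[1] - p[1]
    return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)


def in_region(p, q, s, side, base_angle=math.pi / 8):
    """Is s strictly inside the triangle with base pq and the given base
    angles on the given side of pq (+1 left, -1 right)?"""
    if orient(p, q, s) * side <= 0:
        return False
    return angle_at(p, q, s) < base_angle and angle_at(q, p, s) < base_angle


def excluded_by_diamond_test(p, q, points):
    """True if both regions contain points of S, so pq is not in MWT(S)."""
    others = [s for s in points if s != p and s != q]
    return all(any(in_region(p, q, s, side) for s in others)
               for side in (1, -1))
```

For S = {(0, 0), (10, 0), (5, 1), (5, -1)} the long edge from (0, 0) to (10, 0) is excluded, since (5, 1) and (5, -1) subtend base angles of about 11.3 degrees on opposite sides, while the short edge between (5, -1) and (5, 1) survives.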


Some Related and Open Problems 


Let us briefly state a few open problems related to opti- 
mal triangulations. 

A fast algorithm for computing the minimum 
weight triangulation of a simple polygon would have 
many applications and thus is of practical interest. 
Even for convex polygons, no algorithm using less than $O(n^3)$ time and $O(n^2)$ space is known [44]. No progress has been made since 1980 on this problem.

Minimality in weight may be relaxed to k- 
optimality, meaning that all k-sided and point-empty 
polygons in a triangulation are optimally triangulated. 
This is a generalization of local minimality which con- 
stitutes the instance k = 4. Whereas it is easy to com- 
pute 4-optimal triangulations (the greedy triangula- 
tion is one), no results are known on how to compute 
a 5-optimal triangulation in polynomial time. An algo- 
rithm based on the edge insertion paradigm is proposed 
in [75]. 

The maximal number of triangulations of a set of n points is a quantity still not well understood. The gap between the best known upper bound, $O(43^n)$ [68], and lower bound, $\Omega((\sqrt{72})^n)$ [6], is large. The common belief is that the latter function is closer to the truth.

In the last ten years, a relaxation of triangulations, so-called pseudo-triangulations, has become popular, especially in computational geometry. In
addition to triangles, pseudo-triangles (polygons with 
exactly three convex internal angles) are used as faces. 
Unfortunately, not much is known about optimality 
properties of pseudo-triangulations. Some basic prop- 
erties of minimum weight pseudo-triangulations are 
given in the paper [3], which also shows that the greedy 
pseudo-triangulation always has to be a triangulation. 

Introduction


Optimization techniques have found wide applica- 
tions in solving various problems in ad hoc networks. 
Many problems in topology control, routing, maximum 
flow and resource allocation [23,29,33] can be formu- 
lated as optimization problems. Compared with wired 
networks, there are many challenges exclusive to ad 
hoc networks. Owing to the interdependencies across 
multiple layers, many factors have to be considered 
in a unified framework. The interdependencies complicate both problem formulation and algorithm design. For instance, interference among transmissions is a critical constraint on the throughput of wireless networks, since the channel is shared by all nodes in the network. Power control determines the interference range
of each node. Therefore, to achieve the maximum flow 
in a static wireless network, power control should also 
be considered. 

In our research, we have applied optimization- 
based approaches to routing, maximum flow and cross 
layer design problems in ad hoc networks and WSNs. 
The optimization problems formulated are usually NP-hard (nondeterministic polynomial-time hard), so their computational complexity is not affordable for resource-constrained nodes. We have devised distributed or approximation algorithms to solve the problems efficiently. In the rest of this chapter, we present some of
the successful approaches that we have developed. 


Applications 


Multiconstrained Quality-of-Service 
Multipath Routing 


Although small in size, sensor nodes are built with sens- 
ing, processing and computing capabilities. They report 
the information collected to the sink for further pro- 
cessing. Depending on different applications, the pack- 
ets generated show diverse attributes. Different traffic 
has different requirements regarding packet delivery, so 
quality-of-service (QoS) routing is an important issue 
in WSNs. We have investigated both reliability and de- 
lay constraints in QoS routing. Here, reliability is de- 
fined as the packet delivery ratio. Prone to link changes 
and failures, sensor networks are unreliable. The em- 
pirical result from Berkeley [28] shows that the aver- 
age packet loss ratio increases 5-10% per link in sen- 
sor networks. Multiconstrained routing is faced with 
time complexity and/or space complexity. For wireless
networks, complete and accurate state information is 
not available owing to the time-varying traffic and link 
quality. Only soft-QoS provisioning is attainable in no- 
toriously unpredictable wireless communications. Here 
soft QoS is defined as guaranteeing the QoS requirements with a given probability. It approximates hard QoS when
the probability approaches 1. It is known that finding 
a path subject to two or more additive constraints is 
NP-complete [19]. Therefore, solving the problem in 
a heuristic and approximate way is the only reason- 
able approach for resource-limited sensor nodes. Soft 
QoS follows naturally from the inherent random link 
characteristics of ad hoc networks and WSNs. Owing to 
the inherent difficulty of end-to-end QoS and the lim- 
ited functionality of sensor nodes, some approximate 
methods have to be applied to deal with the compu- 
tational complexity. We first formulate the end-to-end 
soft-QoS problem as a stochastic program. Then we 
propose a distributed routing algorithm based on the 
linear program, which is a deterministic approximation 
of the original end-to-end QoS routing. Our proposed 
routing algorithm is hop-based, so it is scalable and 
convenient to implement. As another favorable feature, 
it circumvents the formidable computational complex- 
ity of the multiconstrained path problem. 

For wired networks, many papers have proposed 
exact or heuristic algorithms targeted at multicon- 
strained path or multiconstrained optimal path prob- 
lems [13,18,19,26,27,42,43]. However, WSNs differ 
from wired networks in the limited energy, memory 
and computation capabilities of nodes, and link char- 
acteristics. Multipath routing has been applied to ad- 
dress QoS in ad hoc networks [38], but we formulate 
the problem in a more rigorous way and consider mul- 
tiple QoS constraints. 

For a given path p, the end-to-end reliability can be 
computed as 


$$\prod_{(i,j) \in p} r_{ij}, \qquad (1)$$

where $r_{ij}$ is the reliability of link (i,j) on path p. If
there is no single feasible path satisfying the reliabil- 
ity requirement, multipath routing can be used to im- 
prove the reliability. Carefully choosing a subset of ex- 
isting paths, one can transfer the packet on all those 
paths. Although an individual path cannot achieve the 


performance goal, multiple paths may meet it aggregately, as shown in Fig. 1. In Fig. 1, the source node assigns reliability $R_1$ to its next hop node 1. Neither link $l_{12}$ nor link $l_{13}$ could satisfy this reliability requirement alone, so node 1 distributes reliability requirement $R_2$ to link $l_{12}$ and $R_3$ to link $l_{13}$, so that $R_2 + R_3 \ge R_1$. The same process is performed at each intermediate node. Finally, at sink node d, the three paths s→1→2→4→d, s→1→3→5→7→d and s→1→3→6→7→d can achieve the desired reliability additively. The assembly efficiency of multiple
paths is a great boon to unreliable sensor networks. Ob- 
viously, there exist many feasible combinations. To save 
the energy cost, the set with the minimum number of 
paths is chosen as the forwarding set. We argue that 
sending a packet on more paths induces a higher energy 
cost, because more data packets have to be transmitted. 
Using more paths introduces more contention, which 
degrades energy efficiency. Although some paths in the 
set may have more hops, it is still more energy efficient 
to confine packets to a few paths. 
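Equation (1) and the aggregation idea of Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration that assumes independent paths; the function names and reliability values are hypothetical. The exact combination $1-\prod_p(1-r_p)$ is the probability that at least one copy of the packet arrives, while the additive rule mentioned in the text never underestimates it (it is a union bound), so it is optimistic when path reliabilities are large.

```python
from math import prod

def path_reliability(link_reliabilities):
    """End-to-end reliability of one path, as in (1): product over its links."""
    return prod(link_reliabilities)

def multipath_reliability_exact(path_reliabilities):
    """Probability that at least one of several independent paths succeeds."""
    return 1.0 - prod(1.0 - r for r in path_reliabilities)

def multipath_reliability_additive(path_reliabilities):
    """Additive approximation (capped at 1); never below the exact value."""
    return min(1.0, sum(path_reliabilities))

# Hypothetical link reliabilities for three paths from s to d.
paths = [[0.9, 0.8, 0.7], [0.9, 0.6, 0.8, 0.9], [0.9, 0.6, 0.7, 0.9]]
per_path = [path_reliability(p) for p in paths]
print(per_path)                                  # each path alone is weak
print(multipath_reliability_exact(per_path))     # combined delivery probability
print(multipath_reliability_additive(per_path))  # optimistic additive estimate
```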

First of all, the question of how to quantify the re-
liability achieved by a subset of paths needs to be ad-
dressed. Then, how to choose an energy-efficient path
set subject to the delay constraint is our main focus.
Denote d as the sink, which is assumed to be station-
ary. Let P(s,d) denote the set of P possible paths
from a source node s to d. Each path $p_j$ in P(s,d),
j = 1, 2, ..., P, is associated with delay $d_j$ and reliabil-
ity $r_j$. The aggregate reliability of multiple paths is ap-
proximated as the sum of the reliabilities of those paths.
We formulate the problem at source node s as follows:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{j=1}^{P} x_j\\
\text{subject to}\quad & x_j d_j \le D,\\
& r = 1 - \prod_{j=1}^{P}\left(1 - x_j r_j\right) \ge R,\\
& x_j = 0 \text{ or } 1, \quad \forall j = 1, 2, \ldots, P,
\end{aligned}
\qquad (2)
$$

where D and R denote the delay and reliability
QoS requirements, respectively, and the $x_j$ are the
decision variables indicating whether path j is chosen.
This defines a 0-1 integer programming problem. For
clarity, the notation used in this chapter is explained in
Table 1.

Optimization in Ad Hoc Networks, Figure 1
Reliability distribution between a source-destination pair

Optimization in Ad Hoc Networks, Table 1
Notation

Notation      Meaning
              Hop requirement for delay at node i
              Hop requirement for reliability at node i
              Actual delay of the packet arriving at node i
              Reliability requirement assigned to the path through node i
$d_{ij}$      Delay of link $l_{ij}$, described as a random variable
$r_{ij}$      Reliability of link $l_{ij}$, described as a random variable
$x_{ij}$      Decision variable of whether link (i,j) is used
              Mean of $d_{ij}$
              Mean of $r_{ij}$
              Standard deviation of $d_{ij}$
              Standard deviation of $r_{ij}$
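For small path sets, the 0-1 program (2) can be solved by exhaustive search over subsets in order of increasing size. The sketch below is a brute-force illustration with hypothetical names and data, exponential in P, and not a method proposed in the text. Because $x_j d_j \le D$ only constrains chosen paths, paths that violate the delay bound are filtered out first.

```python
from itertools import combinations
from math import prod

def min_path_set(delays, reliabilities, D, R):
    """Brute-force solution of (2): fewest paths with delay <= D and
    combined reliability 1 - prod(1 - r_j) >= R. Returns path indices or None."""
    # x_j d_j <= D only restricts selected paths, so drop slow paths up front.
    candidates = [j for j, d in enumerate(delays) if d <= D]
    for k in range(1, len(candidates) + 1):        # fewest paths first
        for subset in combinations(candidates, k):
            if 1.0 - prod(1.0 - reliabilities[j] for j in subset) >= R:
                return subset
    return None

# Hypothetical path delays (ms) and reliabilities.
print(min_path_set([10, 30, 12, 15], [0.6, 0.95, 0.7, 0.5], D=20, R=0.9))  # (0, 2, 3)
```

Relaxing the delay bound to D = 40 admits the single fast-enough reliable path 1, which then suffices on its own.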

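The problem definition requires exact information
about path quality, which is almost impossible to obtain
in WSNs; hence, only soft-QoS provisioning is achiev-
able. We can formulate the constraints of the defined
problem in a probabilistic way:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{j=1}^{P} x_j\\
\text{subject to}\quad & P\left(x_j d_j \le D\right) \ge \alpha, \quad \text{for } D > 0,\\
& P\left(r \ge R\right) \ge \beta,\\
& x_j = 0 \text{ or } 1, \quad \forall j = 1, 2, \ldots, P.
\end{aligned}
\qquad (3)
$$

Since $r \ge R$ holds exactly when $\prod_{j=1}^{P}(1 - x_j r_j) \le 1 - R$,
taking logarithms simplifies the reliability constraint in (3) to

$$
P\left[\sum_{j=1}^{P}\log\left(1 - x_j r_j\right) \le \log(1 - R)\right] \ge \beta.
$$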
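The simplified chance constraint can be checked numerically by Monte Carlo sampling. The sketch assumes, purely for illustration, that each path reliability $r_j$ is an independent Gaussian clipped to [0, 0.999]; the text does not specify a distribution, and the function name is hypothetical. A candidate selection x satisfies the constraint when the returned estimate is at least β.

```python
import math
import random

def chance_reliability(x, mean_r, std_r, R, samples=10_000, seed=0):
    """Monte Carlo estimate of P[sum_j log(1 - x_j r_j) <= log(1 - R)].

    Illustrative assumption: each path reliability r_j is an independent
    Gaussian N(mean_r[j], std_r[j]^2) clipped to [0, 0.999].
    """
    rng = random.Random(seed)
    threshold = math.log(1.0 - R)
    hits = 0
    for _ in range(samples):
        total = 0.0
        for xj, mu, sigma in zip(x, mean_r, std_r):
            if xj:  # unselected paths contribute log(1) = 0
                rj = min(max(rng.gauss(mu, sigma), 0.0), 0.999)
                total += math.log(1.0 - rj)
        if total <= threshold:  # equivalent to 1 - prod(1 - x_j r_j) >= R
            hits += 1
    return hits / samples

# Two selected paths with hypothetical reliability statistics.
p = chance_reliability([1, 1, 0], [0.6, 0.7, 0.8], [0.05, 0.05, 0.05], R=0.85)
print(p >= 0.9)  # does this selection meet beta = 0.9?
```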


This formulation is a nonlinear program, which
could have more than one solution. Solving this non-
linear program at each node whenever a packet is
received is not practical. An approximate method that
significantly simplifies the computation of the original
problem while providing comparable results is there-
fore more practical.

Though the end-to-end QoS problem formulation
provides the exact optimal routing solution, it faces
several serious challenges. First, wireless links are
susceptible to fading, interference and traffic variation; 
therefore, it is almost impossible to obtain the exact in- 
stantaneous link state information. So path informa- 
tion, which is accumulated along all links on it, is even 
more unpredictable. Second, keeping path metrics con- 
sistent at all nodes is an even more formidable problem. 
Since it takes some time for updates to propagate across 
the network, some nodes refresh their path information 
with the new updates received, while other nodes are 
still using obsolete information for routing decisions. 
A packet going through nodes with asynchronous path 
information may miss the QoS requirement. Third,
storing voluminous end-to-end path information is
extremely memory-demanding. Possible paths between
two nodes may be numerous given densely deployed 
nodes, whereas a sensor node is equipped with very 
limited memory. Furthermore, manipulation of end- 
to-end information is computationally burdensome for 
sensor nodes. The delay-constrained path problem is
known to be NP-hard, and its complexity exceeds the
computation and energy budgets of sensors.

The preceding reasons motivate link-based QoS
routing. Per-hop information is convenient to acquire
and maintain at a low overhead cost. The acquired
neighbor information is sufficient to make routing de-
cisions, which saves a large amount of computation,
so sensor nodes are spared intricate computation.
Given these advantages of per-hop routing, we pro-
pose approximating path quality on the basis of link
quality. In wireless networks, delay and reliability tend
to fluctuate with time. To model this phenomenon, we
assume that the link delay and reliability are random
processes $d_{ij}(t)$ and $r_{ij}(t)$. The time index t is omitted
for simplicity in the following discussion. We assume
that links are independent in terms of delay and relia-
bility. Our goal is to develop a method that ensures
both delay and reliability with high probability. We
only employ the first and second moments of delay and
reliability in our derivation. The new approximate
problem to be addressed based on local information is
formulated as


Minimize y Xj 
jEN(i) 


subject to P(xjdij < eS) >a, for . >0, 


P( ¥ xij > L;)= Bp, (4) 
jeN(i) 
x;=O0orl, VjeN(i), 


where the $x_{ij}$ are the decision variables, and $d_{ij}$ and $r_{ij}$
are the delay and reliability of link $l_{ij}$ at the routing de-
cision instant, respectively. This is a probabilistic in-
teger program. In the original problem definition, the
nonlinear program is solved only at the source,
based on end-to-end information. In contrast, the ap-
proximate problem must be solved at all intermediate
nodes, since it is based on hop information. We
further reduce the computational complexity of the
approximation constraints [10]. The final
problem formulation is
Formulation: At each node i, 
minimize », xj 
j€N(i) 


a 2 2 
Sit. Xj (—— (44) 2h dy= d},) eae 
when is = dij >0 
Y~ xilriy + QUB)AT) = WR, 
jEN(i) 


xj=Oorl, VjeN(i). 


The new optimization problem is a deterministic es- 
timate of the problem formulated in (4). The problem 
size is proportional to the number of neighbors, which 
is usually small, so it can be conveniently solved by ex- 
isting algorithms. 
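Because the per-node problem involves only the neighbors of node i, a node can afford to enumerate neighbor subsets directly. The sketch below assumes a Gaussian approximation built from the first and second moments of link delay and reliability, in the spirit of the text; the constraint forms, parameter names (`delay_budget`, `rel_req`) and data are our assumptions, not the exact formulation of [10].

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def select_neighbors(neighbors, delay_budget, rel_req, alpha, beta):
    """Fewest next hops meeting a hop delay budget and reliability requirement.

    neighbors: dict name -> (mean_delay, std_delay, mean_rel, std_rel).
    Gaussian approximations assumed for illustration:
      P(d_ij <= budget) >= alpha  <=>  mean_d + z(alpha) * std_d <= budget
      P(sum r_ij >= req) >= beta  <=>  sum mean_r - z(beta) * sqrt(sum var_r) >= req
    """
    z_alpha = NormalDist().inv_cdf(alpha)
    z_beta = NormalDist().inv_cdf(beta)
    fast = [n for n, (md, sd, _, _) in neighbors.items()
            if md + z_alpha * sd <= delay_budget]
    for k in range(1, len(fast) + 1):              # smallest sets first
        for subset in combinations(fast, k):
            mean = sum(neighbors[n][2] for n in subset)
            std = sqrt(sum(neighbors[n][3] ** 2 for n in subset))
            if mean - z_beta * std >= rel_req:
                return subset
    return None

# Hypothetical neighbor statistics (delay in ms, reliability in [0, 1]).
stats = {"a": (5, 1, 0.5, 0.05), "b": (8, 2, 0.45, 0.05), "c": (20, 3, 0.9, 0.01)}
print(select_neighbors(stats, delay_budget=12, rel_req=0.8, alpha=0.9, beta=0.9))  # ('a', 'b')
```

With a looser budget of 30, the slow but reliable neighbor c becomes admissible and is chosen alone.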


Maximum Flow Problem in Wireless Ad Hoc 
Networks with Directional Antennas 


Owing to the hostile wireless channel, and interference 
within and among flows, how to achieve the maximum 
throughput in multihop wireless ad hoc networks has 
been of great interest over the past few decades. Espe- 
cially for resource-constrained ad hoc networks, how 
to improve the system capacity is even more impor- 
tant. With switched-beam technology, the directional 
antenna is shown to be an appealing option for wire- 
less ad hoc networks. By concentrating RF energy in the 
intended transmission direction, the spatial transmis- 
sion region shrinks proportionally to the beam width 
of a sector. A directional antenna is able to reduce in- 
terference and energy consumption, and improve the 
spatial reuse ratio; thus, it can significantly boost the 
channel capacity. So the problem of interest is as fol- 
lows: Given a network topology and existing traffic 
load, how can we achieve the maximum flow between 
a given source-destination pair through optimal path 
selection? 

Owing to the different interference patterns induced 
by directional antennas in ad hoc networks, constraints 
for the maximum flow are novel and distinct. The max- 
imum flow problem to be addressed here is different 
from the classical maximum flow problem in network 
flow theory [1,4,5,6,7,16,20,36,39]. In wired networks, 
there is no interference among transmissions. Any link 
can be active at any instant without interference from 
other links. However, the broadcast nature of the
wireless medium makes the shared wireless channel
the bottleneck for network flow. To avoid collisions, links
in a close neighborhood may not be active simulta- 
neously. Furthermore, the interference condition of 
wireless networks with directional antennas is different 
from those with omnidirectional antennas. The asymp- 
totic throughput bounds under certain assumptions re- 
garding network topology and node configuration have 
been derived in many papers [9,14,17,30,40]. Without
assumptions regarding the network topology or homo-
geneity of link capacity, we attempt to solve the prob-
lem in a generalized setting. For the first time, the
interference-constrained maximum flow problem in ad
hoc networks with directional antennas is formulated as
an optimization problem [12]. This problem is inher-
ently a joint multipath routing and optimal scheduling
problem.

Optimization in Ad Hoc Networks, Figure 2
The directional antenna model

According to the beam pattern (beam radius, beam 
width, beam orientation), we have omnidirectional an- 
tennas, single-beam directional antennas (e.g., sin- 
gle-beam switched-beam antennas) and multibeam di- 
rectional antennas (e.g., multibeam switched-beam 
antennas or sectorized beam antennas). The beam ra- 
dius is the distance that a transmission reaches. The 
beam width is determined by the angle of a sector. For
a six-beam directional antenna, the angle of a beam
is π/3 (60°). The direction that a beam is targeting is
defined as the beam orientation. For directional anten-
nas, both directional transmission and directional re-
ception are enabled. To be clear, for single-beam di-
rectional antennas, we assume only one directional
transmitting beam or one directional receiving beam 
can be active at a time; for multibeam directional 
antennas, multiple directional transmission beams or 
multiple directional receiving beams can be active at 
a time. However, a beam can only be either trans- 
mitting or receiving at any instant. An illustration of 
a switched-beam antenna with six beams is shown in 
Fig. 2. Assume that the antenna is directed to dis- 
crete directions, with fixed beam radius and beam 
width. There is a link between nodes i and j if the dis- 
tance from node j to node i is shorter than the beam 
radius. 
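The sector geometry described above can be expressed with a little trigonometry. In this sketch the sector numbering (counter-clockwise from the positive x-axis) and the function names are assumptions made for illustration; only the equal sector width $2\pi/B$ and the beam-radius link rule come from the text.

```python
import math

def beam_index(src, dst, num_beams):
    """Index (0..num_beams-1) of the sector of node src that contains node dst.

    Assumption: sectors have equal width 2*pi/num_beams and are numbered
    counter-clockwise starting from the positive x-axis.
    """
    angle = math.atan2(dst[1] - src[1], dst[0] - src[0]) % (2 * math.pi)
    return int(angle // (2 * math.pi / num_beams)) % num_beams

def has_link(src, dst, beam_radius):
    """A link (src, dst) exists if dst is closer to src than the beam radius."""
    return math.dist(src, dst) < beam_radius

# Six-beam antenna: each sector is 2*pi/6 = pi/3 rad (60 degrees) wide.
print(beam_index((0, 0), (0, 1), 6))   # node straight "north" -> sector 1
print(has_link((0, 0), (3, 4), 6.0))   # distance 5 < 6 -> True
```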

An illustration of a node graph comprising nodes
with directional antennas is shown in Fig. 3, though
a realistic node graph is always more complex. Node 1
and node 6 are considered the source node and the des-
tination node, respectively.

Optimization in Ad Hoc Networks, Figure 3
Node graph G = (V, E)

The problem to be addressed here is the following: given network
G(V,E) and existing flows, find the maximum flow sup-
ported by the network between the pair (s, d). Before the
complete problem formulation is presented, we define
the constants and variables used:

• $x_{i,j}$ indicates the flow over link (i,j).

• f is the flow from source node s to destination node d.

• $b_{i,j}(i,l)$ indicates whether link (i,j) is in the lth beam of
node i.

• B is the total number of beams at each node.

• E is the set of edges.

• V is the set of nodes.

• $\theta_{ij}$ is the beam of node i that node j resides in.

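Now the maximum flow problem can be formulated as
the following optimization problem:

$$
\begin{aligned}
\text{Maximize}\quad & f\\
\text{subject to}\quad & \sum_{\{j:(i,j)\in E\}} x_{i,j} - \sum_{\{j:(j,i)\in E\}} x_{j,i} =
\begin{cases} f, & i = s,\\ 0, & i \in V - \{s,d\},\\ -f, & i = d;\end{cases}\\
& \sum_{(k,l)\in A_{i,j}} x_{k,l} \le u_{i,j}, \quad \forall (i,j)\in E,\\
& x_{i,j} \ge 0, \quad \forall (i,j)\in E,
\end{aligned}
\qquad (5)
$$

where $u_{i,j}$ is the normalized remaining capacity or
bandwidth ($0 \le u_{i,j} \le 1$) of link (i,j), and $A_{i,j}$ denotes
the set of links that contend with link (i,j). The second con-
straint specifies the contention for the resource of each
link. This is a traditional maximum flow problem with
an added interference constraint.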
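The flow-conservation part of (5) is the classic maximum flow problem. Assuming that each contention set $A_{i,j}$ contains link (i,j) itself, dropping the other contention terms leaves classic max flow with capacities $u_{i,j}$, whose value is an upper bound on the optimum of (5). The Edmonds-Karp sketch below computes that relaxation; it is not the interference-aware solver of [12], and the graph is hypothetical.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp maximum flow; capacity is {u: {v: u_uv}} with u_uv >= 0."""
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)   # reverse arc for cancelling flow
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Bottleneck residual capacity along the path.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow forward and record it on the reverse arcs.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical normalized remaining capacities u_ij.
caps = {"s": {"a": 0.5, "b": 0.7}, "a": {"d": 0.6, "b": 0.3}, "b": {"d": 0.4}}
print(max_flow(caps, "s", "d"))  # upper bound on f in (5)
```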
Optimization in Ad Hoc Networks, Figure 4
Interference caused by (u,v) to (i,j) 


In wireless networks, a transmission collision oc- 
curs when a receiver is in the communication range 
of two transmitters, because the receiver receives both 
time-overlapping signals and cannot decode them cor- 
rectly. We assume that an antenna both transmits and 
receives directionally, but that it cannot transmit and 
receive simultaneously. With a directional antenna, 
two links interfere with each other if a receiver is in 
the transmitting beams of both transmitters, as shown
in Fig. 4. To guarantee a successful reception at node
j, any node in the receiving beam of node j cannot
transmit towards node j before the current transmission
finishes.

The protocol model: In the protocol model, the
transmission from node i to node j is successful if
(1) node j is in the transmission range of node i,
$d_{ij} \le R$, where R is the transmission range; and (2) any
node u that is in the receiving beam of node j towards
node i is not transmitting in the beam covering node j
(assuming the interference range equals the transmission range).
This means that node j must be outside the transmis-
sion beam of node u.

Since the interference region is a beam, informa-
tion about the beam to which a link belongs is essential
for routing and scheduling. Assume there are B fixed
beams for each antenna. We can now restate con-
dition (2) of the protocol model as follows:

(2′) When (i,j) is active, for any node u in node j's re-
ceiving beam towards node i, the beam $\theta_{uj}$ of node u should keep
silent. Denote b(i,l) as the lth beam of node i, where
l = 1, ..., B.
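Condition (2′) can be turned into a pairwise test between an active link (i, j) and another transmission (u, v). The sketch assumes B equal sectors numbered counter-clockwise from the positive x-axis (a hypothetical convention), takes the interference range equal to the transmission range as in the protocol model, and assumes u is a third node distinct from i and j (half-duplex conflicts at i or j are a separate rule). Function names and coordinates are illustrative only.

```python
import math

def beam_of(src, dst, num_beams):
    """Sector of node src containing node dst (B equal sectors, counter-clockwise from +x)."""
    angle = math.atan2(dst[1] - src[1], dst[0] - src[0]) % (2 * math.pi)
    return int(angle // (2 * math.pi / num_beams)) % num_beams

def interferes(pos, link, other, num_beams, radius):
    """True if transmission other = (u, v) breaks reception on link = (i, j).

    Condition (2'): u lies in j's receiving beam towards i (and within range),
    and u is transmitting in the beam that covers j.
    """
    i, j = link
    u, v = other
    if math.dist(pos[u], pos[j]) > radius:   # interference range = transmission range
        return False
    in_rx_beam = beam_of(pos[j], pos[u], num_beams) == beam_of(pos[j], pos[i], num_beams)
    aims_at_j = beam_of(pos[u], pos[v], num_beams) == beam_of(pos[u], pos[j], num_beams)
    return in_rx_beam and aims_at_j

# Hypothetical coordinates; six beams, transmission range 1.5.
pos = {"i": (0, 0.3), "j": (1, 0), "u": (0.5, 0.5), "v": (2, -1.2), "w": (0.5, 2)}
print(interferes(pos, ("i", "j"), ("u", "v"), 6, 1.5))  # True: u beams towards j
print(interferes(pos, ("i", "j"), ("u", "w"), 6, 1.5))  # False: u beams away
```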

A single-beam directional antenna can only target 
one beam at a time. So the channel utilization is shared 
by all links in all beams. For a single-beam directional 
antenna, we can formulate the maximum flow problem 
as the following mixed-integer program (MIP). 


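Problem formulation:

$$
\begin{aligned}
\text{Maximize}\quad & f\\
\text{subject to}\quad & \sum_{\{j:(i,j)\in E\}} x_{i,j} - \sum_{\{j:(j,i)\in E\}} x_{j,i} =
\begin{cases} f, & i = s,\\ 0, & i \in V - \{s,d\},\\ -f, & i = d;\end{cases}\\
& \underbrace{\sum_{u\in b(i,l)}\sum_{(u,v)\in E} x_{u,v}\, b_{u,v}(u,\theta_{ui})}_{\text{interfering links in the } l\text{th beam}}
+ \underbrace{\sum_{(k,i)\in E} x_{k,i}\, b_{k,i}(i,l)}_{\text{incoming flows}} \le 1, \quad \forall l, i,\\
& \sum_{l=1}^{B}\left(\sum_{(k,i)\in E} x_{k,i}\, b_{k,i}(i,l) + \sum_{(i,j)\in E} x_{i,j}\, b_{i,j}(i,l)\right) \le 1, \quad \forall i \in V,\\
& b_{i,j}(i,l) = \begin{cases} 1, & \text{if } (i,j) \in b(i,l),\\ 0, & \text{otherwise,}\end{cases}\\
& x_{i,j} \ge 0, \quad \forall (i,j)\in E.
\end{aligned}
\qquad (6)
$$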


The first constraint describes the in-flow and the
out-flow at each node. The second constraint captures
the flow interference around node i, as specified by condition
(2′) in the protocol model. The first term represents the
sum of flows causing interference to node i in beam l;
when those flows are active, node i must refrain from
receiving. The second term stands for the total incom-
ing flow to node i in beam l. The sum of these two
terms must not exceed the normalized beam capacity
of 1. By beam capacity, we mean the channel capacity
or bandwidth, which is a constant. The third constraint
describes the time-sharing constraint. The second and
third constraints together describe the interfering
flows at a node. The contention region includes flows
over all arcs in the one-hop area of a node.

Denote M as the number of links in the network and
N as the number of nodes in the network. The num-
bers of variables and constraints in this MIP are M and
O(N + M), respectively, where O(x) denotes a quan-
tity on the order of x.

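For a single pair of source and destination nodes
in wireless networks with multibeam directional anten-
nas, the problem formulation in (5) can be expanded
more specifically as follows.

Problem formulation:

$$
\begin{aligned}
\text{Maximize}\quad & f\\
\text{subject to}\quad & \sum_{\{j:(i,j)\in E\}} x_{i,j} - \sum_{\{j:(j,i)\in E\}} x_{j,i} =
\begin{cases} f, & i = s,\\ 0, & i \in V - \{s,d\},\\ -f, & i = d;\end{cases}\\
& \sum_{u\in b(i,l)}\sum_{(u,v)\in E} x_{u,v}\, b_{u,v}(u,\theta_{ui}) + \sum_{(k,i)\in E} x_{k,i}\, b_{k,i}(i,l) \le 1, \quad \forall l, i,\\
& \sum_{(k,i)\in E} x_{k,i}\, b_{k,i}(i,l) + \sum_{(i,j)\in E} x_{i,j}\, b_{i,j}(i,m) \le 1, \quad \forall\, 1 \le l, m \le B,\ \forall i \in V,\\
& b_{i,j}(i,l) = \begin{cases} 1, & \text{if } l = \theta_{ij},\\ 0, & \text{otherwise,}\end{cases}\\
& x_{i,j} \ge 0, \quad \forall (i,j)\in E.
\end{aligned}
$$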


The first two constraints are the same as those in (6). 
The third constraint guarantees that the flow is feasible 
because the in-flow and the out-flow share the capacity 
at the node. This constraint also implies that the in-flow 
from any beam should not be greater than 1. The num- 
ber of variables and constraints in this MIP are M and 
O(N + M), respectively. 
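The ∀ l, m family in the third multibeam constraint collapses to a single check: it holds exactly when the busiest receiving beam plus the busiest transmitting beam fit within the unit capacity. The sketch below compares this with the single-beam time-sharing constraint of (6) on hypothetical per-beam flows at one node, illustrating why multibeam antennas can carry more flow through a node. Function names and numbers are illustrative.

```python
def single_beam_ok(in_per_beam, out_per_beam):
    """Time-sharing constraint of (6): all in- and out-flows share one channel."""
    return sum(in_per_beam) + sum(out_per_beam) <= 1

def multibeam_pairwise_ok(in_per_beam, out_per_beam):
    """Third multibeam constraint, written literally: in_l + out_m <= 1 for all l, m."""
    return all(a + b <= 1 for a in in_per_beam for b in out_per_beam)

def multibeam_compact_ok(in_per_beam, out_per_beam):
    """Equivalent check: busiest receiving beam plus busiest transmitting beam."""
    return max(in_per_beam) + max(out_per_beam) <= 1

# Hypothetical per-beam flows at one node with B = 6 beams.
incoming = [0.4, 0.4, 0.0, 0.0, 0.0, 0.0]
outgoing = [0.0, 0.0, 0.0, 0.5, 0.3, 0.0]
print(single_beam_ok(incoming, outgoing))         # False: 0.8 + 0.8 > 1
print(multibeam_pairwise_ok(incoming, outgoing))  # True
print(multibeam_compact_ok(incoming, outgoing))   # True: 0.4 + 0.5 <= 1
```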


Joint Mobility Control and Power-Efficient Routing 
in Sparse Wireless Sensor Networks 


Many WSNs have been deployed for environmental 
monitoring. Such networks are typically sparse and 
consist of nodes of various capabilities. In a sparse net- 
work, a target node may be far away from other nodes 
and data stations. It has to transmit at high power to 
reach another node when it reports collected information to
the sink. A transmission over a long hop con-
sumes considerable energy, and thus undermines the
purpose of long-term monitoring. For certain appli- 
cations demanding an end-to-end path, communica- 
tion over long distance is too expensive for energy- 
constrained sensor nodes. 

Current routing protocols for intermittently con- 
nected networks are designed for collecting delay- 
tolerant data in the network [24,34,35,37]. They are 
incapable of supporting certain applications owing to 
the large delay and the lack of a path between the 
source and the destination. In order to support end-to- 
end data delivery in the sparse network, a novel rout- 
ing algorithm is proposed to establish energy-efficient 
paths. Our idea is to utilize the mobility of controllable 
robots to realize energy-efficient on-demand routing in 
a sparse WSN. By adding mobility as another design
dimension, our work differs from traditional joint
power control and routing [2,3,8,15,21,22,25,31,41].
Through optimal placement of mobile robots along 
with optimal power assignment and path selection, en- 
ergy efficiency can be significantly improved. 

We assume that all nodes have the same receiver
sensitivity, or receiving power threshold, $P_R$. If the re-
ceived power is greater than the threshold, the recep-
tion is successful; otherwise, the node cannot decode
the packet correctly. Since nodes transmit at the same
data rate, the transmission time of a packet is a con-
stant, so minimizing energy consumption is equivalent
to minimizing power. Denote $P_r^{ij}$ and
$P_t^{ij}$ as the receiving power and the transmission power
over the directed link (i,j), respectively, and let the Eu-
clidean distance between nodes i and j be $d_{ij}$. The prop-
agation model [32] used in our paper is

$$
P_r^{ij} = \frac{P_t^{ij}\, h(G_t, G_r, h_t, h_r, L, \lambda)}{d_{ij}^{\gamma}} = \frac{\alpha P_t^{ij}}{d_{ij}^{\gamma}},
$$

where $G_t$ and $G_r$ are the gain factors of the transmitter's
and the receiver's antennas, and $h_t$ and $h_r$ are the antenna
heights of the transmitter and the receiver, respectively.
L is the system loss factor not related to propagation, and λ
is the wavelength. α is determined by $G_t$, $G_r$, $h_t$, $h_r$, L
and λ, and is a constant in our paper. γ is the
path loss exponent, $2 \le \gamma \le 6$. To correctly receive the
data, the transmission power over link (i,j) must satisfy
the following condition:

$$
P_t^{ij} \ge \frac{P_R\, d_{ij}^{\gamma}}{\alpha}. \qquad (9)
$$

Denote $(x_i, y_i)$ as the coordinates of node i. Substituting
$d_{ij}$ expressed in coordinates into (9) gives

$$
P_t^{ij} \ge \frac{P_R\left((x_i - x_j)^2 + (y_i - y_j)^2\right)^{\gamma/2}}{\alpha}. \qquad (10)
$$

Assume the power consumed in packet reception is $P_r$,
which is the same for all nodes. The problem of concern
is how to actively place robots so that the total power
consumption for data delivery is minimized. Therefore,
determining the best positions of the robots
is one of the objectives. Besides, choosing the path and
the corresponding transmission powers of all interme- 
diate nodes on the path also determines the total power 
expenditure in communications. The energy consumed 
in mechanical movement is not considered, because the 
robot is constantly roaming in its territory to collect 
data messages. 
Single Robot Assume there are N sensor nodes and
one robot in the field. A single path is to be established
from the source node to the data station; therefore,
each node on the path transmits to a single next-
hop node. The transmission power over a link is deter-
mined by the hop distance. Wireless links are of poor
quality, so the link error rate plays an important role
in energy consumption. We use the expected transmission
count (ETX) to quantify the retransmissions caused by
link errors. ETX is the expected number of transmis-
sions needed for a successful reception. Assume the expected
transmission count over link (i,j) is $ETX_{ij}$; then the ex-
pected power consumption for transmitting a packet
with a link retransmission mechanism is $P_t^{ij} ETX_{ij}$. An
indicator variable $A_{ij}$ is used to identify whether link (i,j) is
active. Then the transmission power of node i, $P_t^i$, is

$$
P_t^i = \sum_{j=1,\, j\ne i}^{N+1} A_{ij}\, ETX_{ij}\, P_t^{ij}.
$$


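When only one robot is deployed, we name it the
(N+1)th node. Now the problem of finding the min-
imum power path from the source sensor node s to
the data station d can be formulated as an optimization
problem:

$$
\text{Minimize}\quad \sum_{i=1}^{N+1}\left(\sum_{j=1,\, j\ne i}^{N+1} A_{ij}\, ETX_{ij}\, P_t^{ij} + \sum_{k=1,\, k\ne i}^{N+1} A_{ki}\, P_r^{i}\right) \qquad (11)
$$

$$
\begin{aligned}
\text{subject to}\quad & A_{ij} P_t^{ij} \ge A_{ij}\,\frac{P_R\left((x_i - x_j)^2 + (y_i - y_j)^2\right)^{\gamma/2}}{\alpha}, \quad \forall i, j, && (12)\\
& \sum_{j=1,\, j\ne i}^{N+1} A_{ij} \le 1, \quad \forall i, && (13)\\
& \sum_{k=1,\, k\ne i}^{N+1} A_{ki} - \sum_{j=1,\, j\ne i}^{N+1} A_{ij} = \begin{cases} -1, & i = s,\\ 1, & i = d,\\ 0, & \text{otherwise}, \end{cases} && (14)\\
& A_{ij} = 0 \text{ or } 1. && (15)
\end{aligned}
$$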


In this optimization problem, the $A_{ij}$ determine the
path between the source and the destination. Since the
locations of the sensor nodes are known and fixed by
the sink, their coordinates are constant. We only need
to determine the coordinates of the robot, which are
$x_{N+1}$ and $y_{N+1}$. Clearly, this is a three-dimensional
problem. The objective in (11) is to minimize the to-
tal power consumption of all nodes on the end-to-end
path. As in (10), to reach node j from node
i, the transmission power has to satisfy (12). Since only
a single path is chosen to carry the traffic, at most one
outgoing link can be active at every node, as shown
in (13). Constraint (14) is the flow-conservation condition,
which requires exactly one outgoing link to be selected at every
node on the chosen path except the destination. As an
indicator of whether a link is selected, $A_{ij}$ is set to 1 if
the corresponding link is selected, and 0 otherwise.

As the power consumption for reception is essentially
fixed, we can simplify $P_r^i$ to $P_r$. According to (14), the
objective can be rewritten as the following expression:

$$
\min \sum_{i=1}^{N+1}\left(\sum_{j=1,\, j\ne i}^{N+1} A_{ij}\left(P_t^{ij}\, ETX_{ij} + P_r\right)\right).
$$


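As shown in [11], ETX is a constant for all links
in our scenario. Then the problem of minimum-energy
routing with the consideration of link retransmission is

$$
\begin{aligned}
\text{Minimize}\quad & f(A, P) = \sum_{i=1}^{N+1}\left(\sum_{j=1,\, j\ne i}^{N+1} A_{ij}\left(P_t^{ij}\, ETX + P_r\right)\right)\\
\text{s.t.}\quad & A_{ij} P_t^{ij} \ge A_{ij}\,\frac{P_R\left((x_i - x_j)^2 + (y_i - y_j)^2\right)^{\gamma/2}}{\alpha}, \quad \forall i, j,\\
& \sum_{j=1,\, j\ne i}^{N+1} A_{ij} \le 1, \quad \forall i,\\
& \sum_{k=1,\, k\ne i}^{N+1} A_{ki} - \sum_{j=1,\, j\ne i}^{N+1} A_{ij} = \begin{cases} -1, & i = s,\\ 1, & i = d,\\ 0, & \text{otherwise}, \end{cases}\\
& A_{ij} = 0 \text{ or } 1.
\end{aligned}
\qquad (16)
$$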


Define $EPC_{ij} = P_t^{ij}\, ETX + P_r$ as the expected power
consumption over link (i,j). The decision variables
in (16) include both integer and real-valued variables.
The optimization problem is a nonlinear MIP, which
is usually hard to solve, so we decompose the for-
mulated problem into subproblems and solve them
sequentially. The strategy of decomposition is as fol-
lows. First, we assume routing is fixed for any $A \in W$,
where W denotes the solution space specified by the
routing constraints (13)-(15). Given the restriction of 
the original problem, we could obtain the optimal so- 
lution with the coordinates of the mobile robot repre- 
sented as a function of A. The physical meaning of this 
subproblem is that given a schedule, what is the optimal 
location of the robot which yields the minimum-energy 
path? The expected energy consumption associated 
with each link is the link weight. The energy consump- 
tion is the measurement of the path quality. By vary- 
ing the placement of the robot over a limited number 
of locations in each iteration, one can discover the op- 
timal route through the shortest-path problem. Among 
all the shortest paths for each specific location of the 
robot, the one with the optimal quality is selected. This 
algorithm can solve the problem in polynomial time. 

In the derivation, it is interesting to observe that
the optimal coordinates of the robot are the average of
the coordinates of its upstream and downstream nodes;
therefore, on the minimum-energy path, the robot is al-
ways situated midway between two nodes. With
this observation, the solution space of the potential op-
timal position of the robot is reduced to the possible
links between sensor nodes, that is, N(N + 1)/2 candidates. This re-
sult helps us develop an algorithm to solve the mini-
mum-power routing problem efficiently. Given the pos-
sible locations of the mobile robot, the optimization
problem (16) becomes


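$$
\begin{aligned}
\text{Minimize}\quad & f(A) = \sum_{i=1}^{N+1}\left(\sum_{j=1,\, j\ne i}^{N+1} A_{ij}\, EPC_{ij}\right)\\
\text{subject to}\quad & \sum_{j=1,\, j\ne i}^{N+1} A_{ij} \le 1, \quad \forall i,\\
& \sum_{k=1,\, k\ne i}^{N+1} A_{ki} - \sum_{j=1,\, j\ne i}^{N+1} A_{ij} = \begin{cases} -1, & i = s,\\ 1, & i = d,\\ 0, & \text{otherwise}, \end{cases}\\
& A_{ij} = 0 \text{ or } 1.
\end{aligned}
\qquad (17)
$$

Taking $EPC_{ij}$ as the weight of the associated link, the
above problem is a shortest-path problem. It
can be conveniently solved by a shortest-path algo-
rithm, such as Dijkstra's algorithm.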
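The single-robot procedure described above can be sketched as follows: place the robot at the midpoint of each node pair in turn, weight every link with its EPC, run Dijkstra's algorithm from s to d, and keep the cheapest result. This is an illustrative sketch, not the authors' code; it assumes a complete graph (every pair of nodes can communicate at sufficient power), a constant ETX, and hypothetical names and coordinates. The midpoint rule reflects the observation above: for a fixed upstream node u and downstream node v, the cost $P_R(\|p-u\|^\gamma + \|p-v\|^\gamma)/\alpha$ is symmetric and convex in the robot position p for γ ≥ 1, so it is minimized at the midpoint.

```python
import heapq
import math
from itertools import combinations

def epc(p, q, threshold, alpha, gamma, etx, rx_power):
    """EPC of a link: ETX times minimum transmit power (10) plus reception power."""
    return etx * threshold * math.dist(p, q) ** gamma / alpha + rx_power

def dijkstra(pos, src, dst, weight):
    """Cheapest src->dst path on the complete graph over pos; returns (cost, path)."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v in pos:
            if v != u and v not in done:
                nd = d + weight(pos[u], pos[v])
                if nd < dist.get(v, math.inf):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

def place_single_robot(nodes, src, dst, **link):
    """Try the robot at the midpoint of every node pair; keep the cheapest route."""
    best = None
    for a, b in combinations(nodes, 2):
        mid = ((nodes[a][0] + nodes[b][0]) / 2, (nodes[a][1] + nodes[b][1]) / 2)
        pos = dict(nodes, robot=mid)
        cost, path = dijkstra(pos, src, dst, lambda p, q: epc(p, q, **link))
        if best is None or cost < best[0]:
            best = (cost, path, mid)
    return best

# Hypothetical layout: source s, sensor a, data station d on a line.
nodes = {"s": (0, 0), "a": (10, 0), "d": (20, 0)}
print(place_single_robot(nodes, "s", "d", threshold=1.0, alpha=1.0,
                         gamma=2, etx=1.0, rx_power=0.0))
# (150.0, ['s', 'robot', 'a', 'd'], (5.0, 0.0))
```

Each candidate position costs one Dijkstra run, so the search is polynomial in the number of nodes, in line with the claim in the text.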


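Multiple Robots When multiple robots are deployed
in the field, the problem becomes more complicated.
Suppose there are R robots; we number them as nodes
N+1, N+2, ..., N+R, respectively. Now the
problem can be formulated as follows:

$$
\begin{aligned}
\text{Minimize}\quad & f(A, P) = \sum_{i=1}^{N+R}\left(\sum_{j=1,\, j\ne i}^{N+R} A_{ij}\left(P_t^{ij}\, ETX + P_r\right)\right)\\
\text{s.t.}\quad & A_{ij} P_t^{ij} \ge A_{ij}\,\frac{P_R\left((x_i - x_j)^2 + (y_i - y_j)^2\right)^{\gamma/2}}{\alpha}, \quad \forall i, j,\\
& \sum_{j=1,\, j\ne i}^{N+R} A_{ij} \le 1, \quad \forall i,\\
& \sum_{k=1,\, k\ne i}^{N+R} A_{ki} - \sum_{j=1,\, j\ne i}^{N+R} A_{ij} = \begin{cases} -1, & i = s,\\ 1, & i = d,\\ 0, & \text{otherwise}, \end{cases}\\
& A_{ij} = 0 \text{ or } 1.
\end{aligned}
$$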


In this problem, we need to determine the best po-
sitions of all robots, which may be correlated with each
other. In the single-robot case, the optimal location of
the robot depends solely on the locations of the sen-
sor nodes, which are known. With multiple robots, the
best location of each robot also depends on where the
other robots are placed; therefore, it is much more
difficult to compute the optimal locations of multiple
robots than to compute the optimal location of
a single robot.

Following a technique similar to that of the single-robot
case, we decompose the optimization problem into
a subproblem and a master problem. The subprob-
lem computes the best positions for the robots, while
the master problem determines the optimal path. De-
note G = (V, E, W) as the graph of the sensor network.
V is the set of vertices with N + 1 nodes (including
data station t). E contains all the edges between each
pair of vertices. W specifies the EPC weight
associated with each edge.

Theorem 1 Denote G(V,E, W) as the resulting graph 
with the optimal placement of robots in the sensor net- 
work, which minimizes the energy consumption to de- 
liver a packet from node s to node t over all possible 
paths. Given a path from node s to node t in graph G, 
the robots must be situated on the edges in G(V,E,W). 
More specifically, each robot is located at the spot that 
equally divides the distance along the edge in G. 

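Please refer to [11] for the detailed proof. Although
Theorem 1 reveals the potential locations of robots
given a path, it is still hard to solve the minimum-
energy problem in the multiple-robot case owing to
the large solution space. It remains to select the edges in
G on which robots reside and to determine how many robots
reside on each. Denote $Q_{ij}$ as the number of robots on edge (i,j). The
problem can be formulated as follows:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{i=1}^{N}\sum_{j=1,\, j\ne i}^{N} A_{ij}\left(\frac{ETX\, P_t^{ij}}{(Q_{ij}+1)^{\gamma-1}} + (Q_{ij}+1)\, P_r\right)\\
\text{subject to}\quad & A_{ij} P_t^{ij} \ge A_{ij}\,\frac{P_R\, d_{ij}^{\gamma}}{\alpha}, \quad \forall i, j,\\
& \sum_{i=1}^{N}\sum_{j=1,\, j\ne i}^{N} Q_{ij} \le R,\\
& \sum_{j=1,\, j\ne i}^{N} A_{ij} \le 1, \quad \forall i,\\
& \sum_{k=1,\, k\ne i}^{N} A_{ki} - \sum_{j=1,\, j\ne i}^{N} A_{ij} = \begin{cases} -1, & i = s,\\ 1, & i = t,\\ 0, & \text{otherwise}, \end{cases}\\
& A_{ij} = 0 \text{ or } 1.
\end{aligned}
$$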


The formulation is a nonconvex integer program, 
which is NP-hard. In [11], an approximation algorithm 
is proposed to solve the problem efficiently. 
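Theorem 1 makes the cost of an edge carrying Q robots easy to evaluate: Q + 1 equal hops of length d/(Q+1), each paying ETX times the minimum transmit power plus one reception, give $ETX\,P_t^{ij}/(Q+1)^{\gamma-1} + (Q+1)P_r$, the per-edge term in the objective above. Once a path is fixed, this cost is convex in Q for γ ≥ 1, so assigning robots one at a time to the edge with the largest saving solves the allocation for that path. The sketch below covers only this fixed-path subproblem, with hypothetical names and data; it is not the approximation algorithm of [11], which also chooses the path.

```python
def edge_cost(d, robots, threshold, alpha, gamma, etx, rx_power):
    """Expected power of one edge of length d split into robots+1 equal hops."""
    hops = robots + 1
    p_full = threshold * d ** gamma / alpha        # P_t^{ij} for the whole edge
    return etx * p_full / hops ** (gamma - 1) + hops * rx_power

def allocate_robots(edge_lengths, num_robots, **link):
    """Greedy marginal allocation of robots to the edges of a FIXED path.

    Per-edge cost is convex in the robot count, so adding robots one at a
    time where they save the most is optimal for this restricted problem.
    """
    counts = [0] * len(edge_lengths)
    for _ in range(num_robots):
        gains = [edge_cost(d, q, **link) - edge_cost(d, q + 1, **link)
                 for d, q in zip(edge_lengths, counts)]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # another robot would add more reception power than it saves
        counts[best] += 1
    total = sum(edge_cost(d, q, **link) for d, q in zip(edge_lengths, counts))
    return counts, total

# Hypothetical fixed path with two edges of lengths 10 and 20, and two robots.
print(allocate_robots([10, 20], 2, threshold=1.0, alpha=1.0,
                      gamma=2, etx=1.0, rx_power=0.0))
```

Both robots go to the longer edge here, because splitting a long hop saves more transmit power than splitting a short one.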
Introduction


Cancer is the second leading cause of death in the 
United States [2]. Treatment options are determined by 
the type and the stage of the cancer and include surgery, 
radiation therapy, chemotherapy, etc. Physicians often 
use a combination of those treatments to obtain the best 
results. Our application is based on radiation therapy. 
Thanks to the continuous development of new treat- 
ment machines and technologies, it is now possible to 
have much greater control over the treatment delivery 
than was possible in the past. Researchers in the optimiza-
tion community have made significant contributions
to improving the quality of such treatment plans for
cancer patients [5,6,12,13,24,26,27,29,36,40,42,46,47].
The common objective of radiotherapy planning is to 
achieve tumor control by planning a significant total 
dose of radiation to the cancerous region to sterilize the 
tumor without damaging the surrounding healthy tis- 
sues. One of the major difficulties in treatment planning 
is due to the presence of organs-at-risk (OARs). An 
OAR is a healthy organ located close to the target. The
radiation dose reaching an OAR must be severely con-
strained, because an overdose in the OAR
may lead to medical complications. OAR is also termed
“sensitive structure” or “critical structure” in the liter- 
ature. There are several survey articles that cover the 
essential elements of the radiation treatment planning 
problem, see [17,31,37,40]. 

Our aim in this chapter is to describe optimiza-
tion techniques to improve the delivery of radiation
for cancer patients. The two most common types of
radiation therapy are teletherapy (or external beam
therapy) and brachytherapy. In teletherapy, radiation is
delivered from outside the body and directed at the
patient's tumor location using special radiation de-
livery machines (see Fig. 1). Different devices produce
different types of radiation; they include Cobalt-60
machines (such as Gamma Knife radiosurgery), linear
accelerators (such as intensity modulated radiation
therapy), neutron beam machines, orthovoltage x-ray
machines, and proton beam machines. In brachytherapy,
radioactive substances are placed within the tumor
region in the form of wires, seeds, or rods. Types of
brachytherapy are categorized by how the radioactive
sources are placed inside the body, such as interstitial
brachytherapy, intracavitary brachytherapy, intraluminal
radiation therapy, and radioactively tagged molecules
given intravenously.

An external beam therapy machine

There are two types of radiation treatment
planning: forward planning and inverse planning. In
forward planning, treatment plans are typically gener- 
ated by a trial and error approach. Therefore this pro- 
cess can be very tedious and time-consuming, and does 
not necessarily produce “high-quality” treatment plans. 
On the other hand, there has been a significant move 
toward inverse treatment planning. Such a move is due 
to significant advances in modern technologies such 
as imaging technologies and computer control to aid 
the delivery of radiation. The inverse treatment plan-
ning procedure allows modeling highly complex treat-
ment planning problems, from brachytherapy to exter-
nal beam therapy. Inverse planning is also called com-
puter-based treatment planning.

In inverse treatment planning, an objective function 
is defined to measure the goodness (quality) of a treat- 
ment plan. Two types of objective functions are often 
used: dose-based models and biological (radiobiologi- 
cal) models. The biological model argues that optimiza- 
tion should be based on the biological effects resulting
from the underlying radiation dose distributions. The 
treatment objective is usually to maximize the tumor 
control probability (TCP) while keeping the normal tis- 
sue complication probability (NTCP) within acceptable 
levels. In the dose based model, achieving accurate ra- 
diation dose distributions on organs of interest is the 
main concern. The treatment objective is to minimize 
the deviation between the projected dose that the pa- 
tient will receive and the prescribed dosage that the 
physician provides. This is the main approach we will 
describe in this chapter. The biological aspect is implic- 
itly given in the physician’s prescription. 


Applications and Methods 
The Radiation Treatment Planning Procedure 


When a patient comes in for a treatment, the doctor 
will choose what type of radiation beam to use for the 
treatment. The choice of radiation will depend on the 
type of the cancer the patient has and how far into the 
body the radiation should penetrate to reach the tumor 
volume. 

The next step is to identify the three-dimensional 
shapes of organs of interest in the patient’s body. The 
location and the volume of organs are obtained by using 
three-dimensional imaging techniques such as computed tomography (CT) or magnetic resonance imaging (MRI). Based on the three-dimensional images, a physician specifies the tumor region as the gross tumor volume (GTV), clinical target volume (CTV), and planning target volume (PTV), and delineates the OARs. The GTV represents the volume of the known tumor. The CTV represents the volume of the suspected microscopic spread. The PTV adds the marginal volume necessary to account for setup variations and for organ and patient motion, i.e., PTV = GTV + marginal volume. The PTV is typically used in designing treatment plans, and in this chapter we call the PTV the target. Organ
geometries are the key input data for designing a treat- 
ment plan. 

A radiation physicist and a dosimetrist meet to de- 
cide what kind of radiation delivery machine to use and 
the number of treatments for the patient. Optimiza- 
tion algorithms are crucial to determine how much and 
where to deliver radiation in the patient’s body. For 
most types of cancer, radiation therapy is administered 
5 days each week for 5 to 8 weeks. Giving small radiation doses daily, rather than a few large doses, helps protect normal body tissues in the treatment area. Resting over the weekend gives normal cells some time to recover from the radiation damage.

In optimization, the three-dimensional volume is represented by a grid of voxels. Optimization approaches in radiation treatment planning require several inputs. The first input describes the machine that delivers radiation. The second, and most troublesome, input is the dose distribution of a particular treatment problem. A dose distribution consists of the dose contribution from a radiation source to each voxel of the region of interest. It can be expressed in functional form or as a set of data. Either form is hard to work with: the functional form is highly nonlinear [13], and the amount of data that specifies the dose distribution is very large [27]. An automated treatment planning tool must overcome this difficulty. The third common input is the set of organ geometries that are of interest to the physician. Further common inputs are the desired dose levels for each organ of interest, which are typically provided by physicians. Other types of inputs can also be specified depending on the treatment planning problem. However, a desirable treatment planning system should be able to generate high-quality treatment plans with minimal additional input and human guidance.
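In the models below, the dose in each voxel is a linear combination of the contributions of the individual beams. A common way to store the dose distribution as data is therefore a dose-deposition matrix with one row per voxel and one column per beam. The sketch below assumes that representation. The function name and the toy numbers are hypothetical. Real dose matrices are very large and are usually stored in a sparse format.

```python
import numpy as np


def voxel_doses(dose_matrix: np.ndarray, beam_weights: np.ndarray) -> np.ndarray:
    """Total dose per voxel for given beam weights.

    dose_matrix[v, b] is the dose delivered to voxel v by beam b at unit weight
    (hypothetical data); beam_weights[b] >= 0 is the weight of beam b.
    """
    if np.any(beam_weights < 0):
        raise ValueError("beam weights must be nonnegative")
    # Dose is assumed to be linear in the beam weights: D = dose_matrix @ w.
    return dose_matrix @ beam_weights


# Toy data: 4 voxels, 2 beams (illustrative numbers only).
dose_matrix = np.array([[1.0, 0.2],
                        [0.8, 0.5],
                        [0.1, 0.9],
                        [0.0, 0.3]])
beam_weights = np.array([2.0, 1.0])
doses = voxel_doses(dose_matrix, beam_weights)
print(doses)
```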


Use of Optimization Techniques 


Two major goals in treatment planning optimization 
are speed and quality. Solution quality of a treatment 
plan can be measured by uniformity, conformity, and 
avoidance [12,27,29]. Determining a solution quickly and simply is another essential part of a clinically useful treatment planning procedure. Acceptable dose levels for these requirements are established by various professional and advisory groups.

It is important for a treatment plan to have uni- 
form dose distributions on the target so that cold and 
hot spots can be minimized. A cold spot is a portion of 
an organ that receives below its required radiation dose 
level. On the other hand, a hot spot is a portion of an 
organ that receives more than the desired dose level. 
The uniformity requirement ensures that the radiation delivered to the tumor volume produces as few hot and cold spots on the target as possible. This requirement can be enforced using lower and upper bounds on the dose, or approximated using penalization. The conformity requirement is used to achieve target dose control while minimizing the damage to OARs or healthy normal tissue. It is generally expressed as the ratio of the cumulative dose on the target to the total dose prescribed for the entire treatment, and this ratio can be used to control conformity in optimization models. As mentioned earlier, a major difficulty in producing radiation treatment plans is the proximity of the target to the OARs. An avoidance requirement can be used to limit the dose delivered to OARs. Finally, simplicity requirements state that a treatment plan should be as simple as possible. Simple treatment plans typically reduce both the treatment time and implementation error. In this chapter, we introduce a few optimization models and solution techniques that are practically useful for the following radiation treatment modalities: Gamma Knife radiosurgery, conventional three-dimensional conformal radiation therapy (3DCRT) [27], and intensity modulated radiation therapy (IMRT) [4,18]. Many treatment planning models have also been developed for proton therapy [48] and tomotherapy [11,22], but they are beyond the scope of this chapter.
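Once the dose in each voxel is known, the uniformity and conformity requirements can be turned into simple plan statistics. The sketch below shows one possible reading of them; it is not a clinical standard. `lower` and `upper` stand for hypothetical required and desired dose levels on the target. Conformity is computed here as the share of the total delivered dose that falls on the target, which is an assumption about how the ratio above is normalized.

```python
import numpy as np


def plan_quality(dose, is_target, lower, upper):
    """Simple plan statistics (one possible reading of the requirements above).

    dose: per-voxel dose; is_target: boolean mask of target voxels;
    lower, upper: hypothetical required and desired dose levels on the target.
    Returns (cold-spot fraction, hot-spot fraction, target share of total dose).
    """
    dose = np.asarray(dose, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    target = dose[is_target]
    cold = float(np.mean(target < lower))  # target voxels below the required dose
    hot = float(np.mean(target > upper))   # target voxels above the desired dose
    # Assumption: conformity measured as the share of all delivered dose on the target.
    share = float(target.sum() / dose.sum())
    return cold, hot, share


print(plan_quality([1.0, 0.9, 1.2, 0.1, 0.2],
                   [True, True, True, False, False], lower=0.95, upper=1.1))
```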


Gamma Knife Radiosurgery 


Problem The Gamma Knife is a highly specialized 
treatment unit that provides an advanced stereotactic 
approach to the treatment of tumors and vascular malformations within the head [14]. The Gamma Knife delivers a single, high dose of radiation emanating from
201 Cobalt-60 unit sources. All 201 beams simultane- 
ously intersect at the same location in space to form an 
approximately spherical region that is typically termed 
a shot of radiation. 

Gamma Knife Radiosurgery begins by finding the 
location and the size of the tumor. After administer- 
ing local anesthesia, a stereotactic head frame is fixed 
to the patient’s head using adjustable posts and fixa- 
tion screws. This frame establishes a coordinate frame 
within which the target location is known precisely and 
serves to immobilize the patient’s head within an at- 
tached focussing helmet during the treatment. An MRI 
or CT scan is used to determine the position of the 
treatment volume in relation to the coordinates determined by the head frame. Once the location and the volume of the tumor are identified, the neurosurgeon, the radiation oncologist, and the physicist work together to develop the patient’s treatment plan. A Gamma Knife treatment often uses multiple shots, for two reasons: tumors are irregular in shape and vary in size, and the focussing helmets are available in only four sizes (4, 8, 14, and 18 mm).

Optimization Based Framework for Radiation Therapy, Figure 2
Radiation delivery: a collimator is positioned on patient's head

The plan aims to deliver a high dose of radiation to 
the intracranial target volume with minimum damage 
to the surrounding normal tissue. The treatment goals 
can vary from one neurosurgeon to the next. But the 
following requirements are typical for a treatment plan, 
although the level of treatment and importance of each 
may vary. 

1. A complete 50% isodose line coverage of the tar- 
get volume. This means that the complete tumor 
volume must receive at least 50% of the maximum 
dosage delivered to the target. This can be thought 
of as a “uniformity” requirement. 

2. Minimize the nontarget volume that is covered by 
a shot or the series of delivered shots. This require- 
ment is clear and can be thought of as a “conformity” 
requirement. 

3. Limit the amount of dosage that is delivered to organs at risk that are close to the target. Such a requirement can be thought of as an “avoidance” requirement.
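The first requirement can be checked directly from the doses in the target voxels. The sketch below measures the fraction of target voxels that receive at least 50% of the maximum dose delivered to the target; the requirement is met when this fraction equals 1. The function name and the sample doses are illustrative.

```python
import numpy as np


def isodose_coverage(target_dose, level=0.5):
    """Fraction of target voxels receiving at least `level` times the maximum
    dose delivered to the target (level=0.5 gives the 50% isodose criterion)."""
    target_dose = np.asarray(target_dose, dtype=float)
    threshold = level * target_dose.max()
    return float(np.mean(target_dose >= threshold))


coverage = isodose_coverage([2.0, 1.5, 1.0, 0.9])
print(coverage)  # → 0.75 (the voxel at 0.9 is below 50% of the maximum 2.0)
```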

There are standard rules established by the Radiation Therapy Oncology Group (RTOG) that recommend the acceptable uniformity and conformity requirements. In addition to these requirements, it is also preferable to use a small number of shots to limit the treatment times and thus increase the number of patients that can be treated.


Optimization Model Formulation The most commonly known optimization models are the mixed integer programming (MIP) model and the mixed integer nonlinear programming (MINLP) model. MIP models can guarantee global optimality, but their long computation times make them impractical. We discuss an MINLP model that has been shown to be practically useful [12]. A variant of this approach has been successfully implemented for planning treatments [39].

Suppose that the number of radiation shots for the 
treatment is given a priori. Adding the goal of minimiz- 
ing this number is typically straightforward in the opti- 
mization model. However, solving such models can be 
extremely difficult. 

Decision Variables: Consider a grid G of voxels. Let T denote the subset of voxels that are within the target and N the subset of voxels that are not in the target. Let $D_{i,j,k}$ denote the amount of radiation dose that voxel (i, j, k) receives. In general, there are three types of decision variables.

1. A set of discrete coordinates $(x_s, y_s, z_s)$. These are the target locations for the (ellipsoidal) shots.

2. A discrete set of collimator sizes w: currently four different sizes of focussing helmets are available (4 mm, 8 mm, 14 mm, 18 mm).

3. Radiation exposure time $t_{s,w}$: the amount of radiation to be delivered for each shot centered at location $(x_s, y_s, z_s)$.

Constraints: 

1. Uniformity - Isodose line coverage: A treatment plan is normally considered acceptable if a θ% isodose curve encompasses the tumor region. For example, the 50% isodose curve is a curve that encompasses all voxels that receive at least 50% of the maximum radiation dose delivered to any voxel in the target volume.

$$\theta \le D_{i,j,k} \le 1, \qquad (i,j,k) \in T \qquad (1)$$

2. Choosing shot sizes: The location of the shot center is chosen by a continuous optimization process. Choosing the particular shot width at each shot location is a discrete optimization problem that is treated by approximating the step function

$$H(t) = \begin{cases} 1 & \text{if } t > 0, \\ 0 & \text{otherwise,} \end{cases}$$

by a nonlinear function,

$$H(t) \approx H_\alpha(t) = \frac{2}{\pi} \arctan(\alpha t).$$

For increasing values of $\alpha$, $H_\alpha$ becomes a closer approximation to the step function H. This process is typically called smoothing.
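The effect of the smoothing parameter can be seen numerically. The sketch below uses the approximation $H_\alpha(t) = (2/\pi)\arctan(\alpha t)$ as given above and evaluates it only for $t \ge 0$, since exposure times are nonnegative. The function names and the sample values of $\alpha$ are illustrative.

```python
import math


def step(t: float) -> float:
    """Step function H(t): 1 if t > 0, 0 otherwise."""
    return 1.0 if t > 0 else 0.0


def smoothed_step(t: float, alpha: float) -> float:
    """H_alpha(t) = (2 / pi) * arctan(alpha * t), intended for t >= 0."""
    return 2.0 / math.pi * math.atan(alpha * t)


# For a fixed small exposure time t, H_alpha(t) approaches H(t) = 1 as alpha grows.
for alpha in (1.0, 10.0, 100.0, 1000.0):
    print(alpha, round(smoothed_step(0.1, alpha), 4), step(0.1))
```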
An Optimization Model: 
Solution Techniques The most critical problem in solving the optimization model (2) is the large number of voxels needed when dealing with large irregular tumors (both within and outside of the target). This makes the optimization problem computationally intractable. One approach to overcoming this problem is to remove a large number of the non-target voxels from the model. While this improves the computation time, it typically weakens the conformity of the dose to the target. Ferris et al. [12] propose a sequential solution approach that reduces computation time while maintaining conformality. First, a coarse grid problem is solved as a nonlinear programming (NLP) model using reduced data points. Then the finer grid NLP problem with full data points is solved, starting from the point obtained by the coarse grid model in the previous stage. Typically, the solution from this finer grid model is very close to a good local optimum for the MINLP. Using this solution, the full MINLP model is finally solved to determine the values of the three decision variables for this problem.
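The coarse-to-fine idea is to solve a cheap problem on reduced data and then use its solution to warm-start the full problem. The sketch below illustrates only that warm-start pattern, on a deliberately simple stand-in: a nonnegative least-squares fit of beam weights to a prescribed dose, solved by projected gradient descent. It is not the NLP/MINLP model of the Gamma Knife problem. The data, the 10% voxel subset, and the iteration counts are assumptions chosen for illustration.

```python
import numpy as np


def projected_gradient_nnls(A, b, x0, iters=200):
    """Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    # Step 1/L, where L = 2 * sigma_max(A)^2 is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - b)
        x = np.maximum(x - step * grad, 0.0)  # project back onto x >= 0
    return x


def coarse_to_fine(A, b, coarse_rows, coarse_iters=300, fine_iters=500):
    """Solve on a subset of voxels (rows) first, then warm-start the full problem."""
    x_coarse = projected_gradient_nnls(A[coarse_rows], b[coarse_rows],
                                       np.zeros(A.shape[1]), coarse_iters)
    return projected_gradient_nnls(A, b, x_coarse, fine_iters)


rng = np.random.default_rng(0)
A = rng.random((200, 5))                  # toy dose data: 200 voxels, 5 beams
x_true = np.array([1.0, 0.0, 2.0, 0.5, 0.0])
b = A @ x_true                            # toy prescription that is exactly achievable
coarse_rows = rng.choice(200, size=20, replace=False)  # ~10% of the voxels
x = coarse_to_fine(A, b, coarse_rows)
print(np.round(x, 3))
```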
Three-Dimensional Conformal Radiation Therapy


Problem As described in Section “Gamma Knife Radiosurgery”, the Gamma Knife is specifically designed for treating diseases of the human brain. Three-dimensional conformal radiation therapy (3DCRT) offers much greater flexibility and can be used to treat cancers in the brain, breast, prostate, and other sites. One of the main strategies for minimizing morbidity in 3DCRT is to reduce the dose delivered to normal tissues that are spatially well separated from the tumor. This can be done by using multiple beams from different angles. A single radiation beam delivers a higher dose to the tissues in front of the tumor than to the tumor itself. Consequently, if one were to give a dose sufficient to control the tumor with a reasonably high probability, the dose to the upstream tissues would likely lead to unacceptable morbidity. A single beam would only be used for very superficial tumors, where there is little upstream normal tissue to damage. For deeper tumors, one uses multiple cross-firing beams delivered within minutes of one another. All of the beams encompass the tumor, but successive beams are directed toward the patient from different directions so that they traverse different tissues outside the target volume. The delivery of cross-firing beams is greatly facilitated by mounting the radiation-producing equipment on a gantry equipped with a multileaf collimator (MLC).
Optimization Based Framework for Radiation Therapy, Figure 4
A beam’s-eye-view is a 2D shape of a tumor viewed by the beam source at a fixed angle


Several directed beams noticeably change the distri- 
bution of dose, as is illustrated in Fig. 3. As a result, 
dose outside the target volume can often be quite tol- 
erable even when dose levels within the target volume 
are high enough to provide a substantial probability of 
tumor control. 

The leaves of the multileaf collimator are computer-controlled and can be moved to the appropriate positions to create the desired beam shape. From each beam angle, three-dimensional anatomical information is used to shape the beam of radiation to match the shape of the tumor. Given a gantry angle, the view of the tumor that the beam source sees through the multileaf collimator is called the beam’s-eye-view of the target (see Fig. 4) [15]. This beam’s-eye-view (BEV) approach ensures adequate irradiation of the tumor while reducing the dose to normal tissue.
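Geometrically, the beam’s-eye-view amounts to projecting the target onto a plane perpendicular to the beam. The sketch below does this under an assumed coordinate convention, which is hypothetical and not that of any particular machine: the isocenter is at the origin, the gantry rotates about the y-axis, and at angle θ the beam travels along (sin θ, 0, cos θ). The function name is illustrative.

```python
import numpy as np


def beams_eye_view(points, gantry_angle_deg):
    """Project 3D voxel centres onto the plane perpendicular to the beam axis.

    Assumed geometry (hypothetical convention): the isocentre is the origin, the
    gantry rotates about the y-axis, and at angle theta the beam travels along
    (sin theta, 0, cos theta). Returns (u, v) coordinates in the BEV plane.
    """
    theta = np.deg2rad(gantry_angle_deg)
    u_axis = np.array([np.cos(theta), 0.0, -np.sin(theta)])  # perpendicular to beam
    v_axis = np.array([0.0, 1.0, 0.0])                        # along rotation axis
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    return np.column_stack((pts @ u_axis, pts @ v_axis))


# Two tumour voxels that differ only along the beam direction overlap in the BEV.
print(beams_eye_view([[5.0, 1.0, 0.0], [-5.0, 1.0, 0.0]], 90.0))
```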

Wedge Filters: A wedge (also called a “wedge fil- 
ter”) is a tapered metallic block with a thick side (the 
heel) and a thin edge (the toe); (see Fig. 5). This metal- 
lic wedge varies the intensity of the radiation in a lin- 
ear fashion from one side of the radiation field to the 
other. When the wedge is placed in front of the aper- 
ture, less radiation is transmitted through the heel of 
the wedge than through the toe. Figure 5b shows an external 45° wedge, so named because it produces isodose lines that are oriented at approximately 45°. The quality of the dose distribution can be improved by incorporating a wedge filter into one or more of the treatment beams. Wedge filters are particularly useful in compensating for a curved patient surface, which is common in breast cancer treatments.

Optimization Based Framework for Radiation Therapy, Figure 5
Wedges: (a) a wedge filter; (b) an external wedge

Two different wedge systems are used in clinical 
practice. In the first system, four different wedges with 
angles 15°, 30°, 45°, and 60° are available, and the ther- 
apist is responsible for selecting one of these wedges 
and inserting it with the correct orientation. In the sec- 
ond system, a single 60° wedge (the universal wedge) 
is permanently located on a motorized mount located 
within the head of the treatment unit. This wedge can 
be rotated to the desired orientation or removed alto- 
gether, as required by the treatment plan. 


Optimization Model Formulation Suppose that the data for the optimization models are given. Let $D_{(i,j,k),A}$ be the dose contribution to voxel (i, j, k) from a beam of weight 1 from angle A, let S be the collection of voxels in the sensitive structure(s), and let N be the collection of voxels in the normal tissue. When wedges are allowed in the optimization, the data are provided as $D_{(i,j,k),A,F}$, which represents the dose contribution to voxel (i, j, k) from a beam of weight 1 from angle A using wedge orientation F.

Beam Weight Optimization: The classical optimization problem in conformal radiation therapy is to choose the weights (or intensity levels) to be delivered from a given set of angles $\mathcal{A}$. Suppose $w_A$ represents the beam weight delivered from angle A, $D_{(i,j,k)}$ the total dose deposited to voxel (i, j, k), and λ the relative weighting factors in the objective function. Given the set $\Omega = T \cup S \cup N$, a general optimization model that determines the optimal radiation intensity is

$$
\begin{aligned}
\min\ & \lambda_T f(D_T) + \lambda_S f(D_S) + \lambda_N f(D_N) \\
\text{s.t.}\ & D_\Omega = \sum_{A \in \mathcal{A}} D_{\Omega,A}\, w_A, \\
& l \le D_T \le u, \\
& 0 \le w_A, \quad \forall A \in \mathcal{A}.
\end{aligned}
\qquad (3)
$$

Hard upper and lower bound constraints are imposed on the target dose so that, in the worst case, the resulting solution will satisfy the minimum requirement for a treatment plan. The objective function f(D) can be defined based on the planner’s preference, but a general function can be written as

$$f(D_\Omega) = \lVert D_\Omega - \theta \rVert_p, \qquad p \in \{1, 2, \infty\}.$$

Note that θ is the desired dose level for an organ of interest. For p = 2, these problems can be cast as quadratic programming (QP) problems that minimize the Euclidean distance between the dose delivered to each voxel and the prescribed dose [6,35,40,41]. Linear programming (LP) has also been used extensively to improve conventional treatment planning techniques [3,24,32,37,40]. The strengths of LP are its ability to control hot and cold spots or integral dose on the organs using constraints, and the availability of many state-of-the-art LP solvers. The LP model replaces the Euclidean norm objective function of a QP with a polyhedral one, for which standard reformulations (see [27,30]) result in linear programming problems. Although these techniques still have to handle large amounts of data in $D_{(i,j,k),A}$, they are typically solved in acceptable time frames. These models tend to find
optimal solutions more quickly than the corresponding 
QP formulations. 
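For p = 1, the standard reformulation mentioned above introduces one auxiliary deviation variable $s_v$ per voxel v. These variables are added here for illustration. A sketch of the result in the notation of (3):

$$
\begin{aligned}
\min_{w,\,s}\ & \lambda_T \sum_{v \in T} s_v + \lambda_S \sum_{v \in S} s_v + \lambda_N \sum_{v \in N} s_v \\
\text{s.t.}\ & D_\Omega = \sum_{A \in \mathcal{A}} D_{\Omega,A}\, w_A, \\
& -s_v \le D_v - \theta \le s_v, \quad v \in \Omega, \\
& l \le D_T \le u, \\
& 0 \le w_A, \quad \forall A \in \mathcal{A}.
\end{aligned}
$$

When the weights λ are positive, each $s_v$ equals $|D_v - \theta|$ at an optimum, so the objective equals the weighted 1-norm deviation. For p = ∞, one instead introduces a single variable $t_X$ per structure $X \in \{T, S, N\}$, imposes $-t_X \le D_v - \theta \le t_X$ for all $v \in X$, and minimizes $\lambda_T t_T + \lambda_S t_S + \lambda_N t_N$.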

Another technique to convert the quadratic (or 
more generally convex) problem to a linear program 
is via a piecewise-linear approximation of the objective 
(see [36]). For a quadratic function, a uniform spac- 
ing for the breakpoints guarantees small approxima- 
tion errors from the piecewise linear interpolant [23]. 
Since the piecewise linear interpolant is convex, stan- 
dard techniques can be used to reformulate this as a lin- 
ear program [16,23]. 
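For the quadratic case, the approximation error can be made explicit. For f(t) = t², the chord over an interval of width h overestimates f by at most h²/4, and this maximum is attained at the midpoint. Halving the breakpoint spacing therefore divides the worst-case error by four. The sketch below checks this numerically with NumPy’s `np.interp`; the interval, the number of breakpoints, and the evaluation grid are arbitrary choices.

```python
import numpy as np


def pwl_interpolant(f, lo, hi, num_breakpoints):
    """Piecewise-linear interpolant of f on uniformly spaced breakpoints in [lo, hi]."""
    xs = np.linspace(lo, hi, num_breakpoints)
    ys = f(xs)
    return xs, (lambda t: np.interp(t, xs, ys))


def f(t):
    return t ** 2


xs, g = pwl_interpolant(f, -2.0, 2.0, 9)   # breakpoint spacing h = 0.5
h = xs[1] - xs[0]
t = np.linspace(-2.0, 2.0, 10001)
max_err = np.max(g(t) - f(t))              # chords lie above a convex function
print(max_err, h ** 2 / 4)                 # for t**2 the worst error is h**2 / 4
```

Because f is convex, the slopes of the interpolant increase from piece to piece. The interpolant is therefore the maximum of its affine pieces, and it can be written in a linear program through the usual epigraph constraints.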

Equivalent Uniform Dose (EUD): Recently, some of the medical physics literature has advocated other forms of objective function in place of the ones outlined above. A popular alternative is the generalized equivalent uniform dose (EUD), which is defined on a per-structure basis as

$$\mathrm{EUD}_a(D, \Omega) := \left( \frac{1}{\operatorname{card}(\Omega)} \sum_{(i,j,k) \in \Omega} D_{(i,j,k)}^{\,a} \right)^{1/a}.$$

Note that EUD is a scaled version of the a-norm of the dose to the particular structure, and hence is known to be a convex function for any a ≥ 1 and concave for a ≤ 1 [7]. Thus the problem

$$
\begin{aligned}
\max_{w}\ & \mathrm{EUD}_a(D, T) \\
\text{s.t.}\ & D_\Omega = \sum_{A \in \mathcal{A}} D_{\Omega,A}\, w_A, \quad \Omega = T \cup S \cup N, \\
& \mathrm{EUD}_b(D, S) \le \varepsilon, \\
& \mathrm{EUD}_c(D, N) \le u, \\
& 0 \le w_A, \quad \forall A \in \mathcal{A}
\end{aligned}
$$

is a convex optimization problem provided a ≤ 1 and b, c ≥ 1. As such, nonlinear programming algorithms will find global solutions to these problems.
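The behavior of the EUD for different exponents can be illustrated with a few lines of code. The sketch below implements the definition above for a single structure. The function name and the two-voxel dose vector are illustrative. Negative exponents are only evaluated for strictly positive doses.

```python
import numpy as np


def eud(doses, a):
    """Generalized equivalent uniform dose: (mean of D**a) ** (1/a), for a != 0."""
    doses = np.asarray(doses, dtype=float)
    if a == 0:
        raise ValueError("a must be nonzero")
    if a < 0 and np.any(doses <= 0):
        raise ValueError("negative a requires strictly positive doses")
    return float(np.mean(doses ** a) ** (1.0 / a))


structure_dose = [1.0, 3.0]
for a in (-10, 1, 10):
    print(a, round(eud(structure_dose, a), 3))
```

With a = 1 the EUD is the mean dose. Large a pulls it toward the maximum dose, so hot spots dominate, and very negative a pulls it toward the minimum, so cold spots dominate. This matches the convexity conditions above: EUD_a with a ≤ 1 is maximized on the target, while EUD_b and EUD_c with b, c ≥ 1 are bounded on S and N.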

Beam Angle Selection and Wedge Orientation Optimization: Optimization also lends itself to the more complex problem of selecting which angles and wedge orientations to use, as well as their intensities. Mixed integer programming (MIP) is a straightforward technique for this type of problem. We describe an optimization model that simultaneously optimizes beam angles, wedge orientations, and beam intensities. Wedges are placed in front of the collimator to produce a gradient over the dose distribution and can be effective for reducing the dose to organs at risk. This can be done by adding an extra dimension F to the variable $w_A$:

$$
\begin{aligned}
\min\ & \lambda_T f(D_T) + \lambda_S f(D_S) + \lambda_N f(D_N) \\
\text{s.t.}\ & D_\Omega = \sum_{A \in \mathcal{A}} \sum_{F} D_{\Omega,A,F}\, w_{A,F}, \\
& 0 \le w_{A,F} \le M \cdot \psi_A, \\
& l \le D_T \le u, \\
& \sum_{A \in \mathcal{A}} \psi_A \le K, \\
& \psi_A \in \{0,1\}, \quad \forall A \in \mathcal{A}.
\end{aligned}
\qquad (4)
$$

The binary variable $\psi_A$ determines whether or not angle A is used for delivery, and K bounds the number of angles used. The choice of M plays a critical role in the speed of the optimization; further advice on its choice is given in [27]. Note that the data for this problem are considerably larger, growing by a factor related to the number of wedge orientations allowed.
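The role of the binary variables in (4) is easiest to see by checking a candidate solution directly. The sketch below tests the linking constraints 0 ≤ w_{A,F} ≤ M ψ_A and the cardinality bound Σ ψ_A ≤ K as written above. The nonnegativity of the weights is carried over from (3). The function name and the data are hypothetical.

```python
import numpy as np


def satisfies_angle_constraints(w, psi, big_m, max_angles):
    """Check the angle-linking constraints of a model like (4).

    w[A, F] is the weight for angle A and wedge orientation F; psi[A] in {0, 1}
    says whether angle A is used. Verifies 0 <= w[A, F] <= M * psi[A] for all
    A, F and sum(psi) <= K (passed as `max_angles`).
    """
    w = np.asarray(w, dtype=float)
    psi = np.asarray(psi)
    if not np.all(np.isin(psi, (0, 1))):
        return False
    linked = np.all((w >= 0) & (w <= big_m * psi[:, None]))
    return bool(linked and psi.sum() <= max_angles)


w = [[1.0, 0.0],   # angle 0: weight only with wedge orientation 0
     [0.0, 0.0],   # angle 1: unused
     [2.0, 0.5]]   # angle 2: both orientations used
print(satisfies_angle_constraints(w, [1, 0, 1], big_m=10.0, max_angles=2))  # True
print(satisfies_angle_constraints(w, [1, 0, 0], big_m=10.0, max_angles=2))  # False
print(satisfies_angle_constraints(w, [1, 0, 1], big_m=1.0, max_angles=2))   # False
```

The last case shows why the value of M matters. If M is smaller than the largest weight a good plan needs, it cuts off feasible plans. An unnecessarily large M is known to weaken the linear relaxation that branch-and-bound relies on.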


Solution Techniques Simulated Annealing (SA) has been widely adopted in the medical community [33,44]. From an optimization point of view, however, the weakness of SA is its inability to verify optimality. By contrast, it is possible to find a globally optimal solution of (4). However, because of its slow convergence, the MIP model has not been very useful for designing treatment plans in the hospital. Recently, Lim et al. [27] proposed an iterative solution approach, termed the Three-Phase Approach, that solves the MIP problem quickly (within 20 min in two clinical case examples).

The Three-Phase Approach is a multiphase technique that “ramps up” to the solution of the full problem via a sequence of models. Essentially, the models are solved in increasing order of difficulty, with the solution of one model providing a good starting point for the next. The models differ from each other in the selection of voxels included in the formulation and in the number of beam angles allowed.

If the most promising beam angles can be identified in advance, the full problem can be solved with a small number of discrete variables. One simple approach for removing unpromising beam angles is to remove from consideration those that pass directly through any OAR [38]. A more elaborate approach [34] introduces a score function for each candidate angle, based on the ability of that angle to deliver a high dose to the PTV without exceeding the prescribed dose tolerance to OARs or to normal tissue located along its path. Only beam angles with the best scores are included in the model. We now describe the Three-Phase Approach:

1. Phase 1: Selection of Promising Beam Angles: The aim in this phase is to construct a subset of beam angles $\mathcal{A}_1$ that are likely to appear in the final solution of (4). We solve a collection of r MIPs. Each MIP is constructed from a reduced set of voxels consisting of the voxels in the PTV, a randomly sampled 10% of the OAR voxels ($\tilde{S}$), and the voxels in $R_p(T)$; that is,

$$\Omega_1 = T \cup \tilde{S} \cup R_p(T).$$

We define $\mathcal{A}_1$ as the set of all angles $A \in \mathcal{A}$ for which $\psi_A = 1$ in at least one of these r sampled problems.

2. Phase 2: Treatment Beam Angle Determination: In the next phase, we select K or fewer treatment beam angles from $\mathcal{A}_1$. We solve (4) using $\mathcal{A}_1$ and a reduced set of voxels defined as follows:

$$\Omega_2 = T \cup \tilde{S} \cup R_p(T) \cup \tilde{N}.$$

Note that $|\mathcal{A}_1|$ is typically larger than K, so the binary variables play a nontrivial role in this phase.

3. Phase 3: Final Approximation: In the final phase, we fix the K beam angles (by fixing $\psi_A = 1$ for the angles selected in Phase 2 and $\psi_A = 0$ otherwise) and solve the resulting simplified optimization problem over the complete set of voxels. This final approximation typically takes much less time to solve than the full-scale model, for two reasons: there is less data (because there are fewer beam angles), and there are no binary variables.

Although there is no guarantee that this technique 
will produce the same solution as the original full-scale 
model (4), Lim et al. [27] have found that the quality 
of its approximate solution is close to optimal based on 
several numerical experiments. 
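The control flow of the Three-Phase Approach can be sketched independently of any particular solver. In the sketch below, the MIP and LP solves are passed in as callbacks; `select` and `optimize` are hypothetical names. The r sampled problems and the 10% OAR sampling follow the description above. Each Phase 1 problem is assumed to use the same bound K, and the Phase 2 voxel set is simply passed in. The toy stubs exist only to make the sketch runnable.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical solver callbacks standing in for MIP/LP solves of model (4):
#   select(candidate_angles, voxels, K) -> set of angles with psi_A = 1
#   optimize(fixed_angles, voxels)      -> solution, e.g. {angle: weight}
SelectFn = Callable[[Sequence[int], Sequence[int], int], Set[int]]
OptimizeFn = Callable[[Sequence[int], Sequence[int]], Dict[int, float]]


def three_phase(select: SelectFn, optimize: OptimizeFn, angles: Sequence[int],
                target: Sequence[int], oar: Sequence[int], ring: Sequence[int],
                phase2_voxels: Sequence[int], all_voxels: Sequence[int],
                K: int, r: int = 5, oar_fraction: float = 0.1,
                seed: int = 0) -> Tuple[List[int], List[int], Dict[int, float]]:
    rng = random.Random(seed)
    oar = list(oar)
    sample_size = min(len(oar), max(1, round(oar_fraction * len(oar))))

    # Phase 1: r problems on T, a random OAR sample, and the ring R_p(T);
    # keep every angle used by at least one of them.
    promising: Set[int] = set()
    for _ in range(r):
        voxels = list(target) + rng.sample(oar, sample_size) + list(ring)
        promising |= select(angles, voxels, K)
    candidates = sorted(promising)

    # Phase 2: choose at most K angles from the promising set on a reduced voxel set.
    chosen = sorted(select(candidates, phase2_voxels, K))

    # Phase 3: fix the chosen angles and optimize over all voxels (no binaries left).
    solution = optimize(chosen, all_voxels)
    return candidates, chosen, solution


# Toy stubs so the sketch runs: "use" angles that are multiples of 40 degrees.
def toy_select(cands, voxels, K):
    return set(sorted(a for a in cands if a % 40 == 0)[:K])


def toy_optimize(fixed_angles, voxels):
    return {a: 1.0 for a in fixed_angles}


result = three_phase(toy_select, toy_optimize, angles=list(range(0, 360, 20)),
                     target=[0, 1], oar=list(range(2, 52)), ring=[52, 53],
                     phase2_voxels=list(range(0, 60)), all_voxels=list(range(0, 100)),
                     K=5)
print(result)
```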


Intensity Modulated Radiation Therapy 


Introduction A sophisticated treatment planning approach known as intensity modulated radiation therapy (IMRT) allows a number of differently shaped beams with different uniform radiation intensities to be delivered from each direction. This gives a high degree of flexibility in delivering the radiation dose distribution from each beam angle [4,18]. In IMRT treatment planning, two-dimensional (2D) beams are divided into several hundred or several thousand pencil beams to generate a very precise dose distribution on the treatment volume.

Decision Variables: First, one needs to decide how many beam angles need to be coordinated for the treatment (beam angle optimization). For each beam angle, radiation is delivered using a multileaf collimator (MLC). In practice, an MLC is designed so that each leaf can move in only one direction, in discrete steps. We therefore model an MLC as an M × N grid of pixels. M is the number of leaves in the MLC (this number can vary from one manufacturer to another), and N is the number of discrete units that a leaf can move. Second, radiation intensity maps (fluence maps) for these beam angles need to be optimized so that the three-dimensional dose conforms to the requirements for controlling the tumor (fluence map optimization). For a fixed beam angle, the fluence map contains real numbers on a set of two-dimensional discrete coordinates associated with the MLC. Since no machine can deliver such a non-uniform real-valued intensity map, the intensity maps are first approximated as multiples of a physically deliverable minimum discrete unit (this number can be a fraction). For example, an approximated intensity map for a 3 × 4 MLC may look as follows (assuming that the minimum discrete value allowed is 0.5):


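$$
W \approx \begin{pmatrix} 0 & 0.5 & 2.0 & 1.5 \\ 0.5 & 2.5 & 3.5 & 2.0 \\ 0 & 1.0 & 2.0 & 1.5 \end{pmatrix}
= 0.5 \times \begin{pmatrix} 0 & 1 & 4 & 3 \\ 1 & 5 & 7 & 4 \\ 0 & 2 & 4 & 3 \end{pmatrix}
\qquad (5)
$$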


Third, a non-uniform map such as (5) cannot be delivered to the treatment volume with one open beam shape. An intensity map is therefore decomposed into several shape matrices, each of which contains only zeros and a single uniform value. This is called the beam segmentation problem.
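The approximation step can be sketched as rounding each entry to the nearest multiple of the minimum deliverable unit. Rounding to the nearest multiple is an assumption, since the text does not say how the approximation is chosen. The fluence values below are made up so that, with a unit of 0.5, they reproduce the integer matrix in (5).

```python
import numpy as np


def discretize_fluence(fluence, unit):
    """Approximate a real-valued fluence map by multiples of `unit`.

    Returns (integer levels, approximated map) with map = unit * levels.
    Rounds to the nearest multiple (np.rint rounds exact halves to even).
    """
    levels = np.rint(np.asarray(fluence, dtype=float) / unit).astype(int)
    levels = np.maximum(levels, 0)  # intensities cannot be negative
    return levels, unit * levels


# Made-up real-valued fluence map for a 3 x 4 MLC grid.
fluence = np.array([[0.10, 0.60, 1.90, 1.40],
                    [0.45, 2.60, 3.40, 2.10],
                    [0.20, 1.10, 2.00, 1.60]])
levels, approx = discretize_fluence(fluence, unit=0.5)
print(levels)
print(approx)
```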
Optimization Model Formulations

Problem 1: Beam Angle and Fluence Map Optimization Since the optimal set of beam angles is intimately related to the optimal fluence map for each angle, these two problems must be treated together in the problem formulation. Let a denote an angle, $a \in \mathcal{A}$, let l denote the leaf index of the collimator, l = 1, 2, ..., m, and let p represent the position of the leaf, p = 1, 2, ..., n. Then a model that optimizes both the beam angles and the fluence maps is a simple extension of the model (4) that we discussed for conventional 3DCRT:

$$
\begin{aligned}
\min\ & \lambda_T f(D_T) + \lambda_S f(D_S) + \lambda_N f(D_N) \\
\text{s.t.}\ & D_\Omega = \sum_{a \in \mathcal{A}} \sum_{l=1}^{m} \sum_{p=1}^{n} D_{\Omega,a,l,p}\, w_{a,l,p}, \\
& 0 \le w_{a,l,p} \le M \cdot \psi_a, \\
& l \le D_T \le u, \\
& \sum_{a \in \mathcal{A}} \psi_a \le K, \\
& \psi_a \in \{0,1\}, \quad \forall a \in \mathcal{A}.
\end{aligned}
\qquad (6)
$$

More details on this formulation and others can be found in [26,28].


Solution Methods Solving this problem with any classical optimization technique would take too long for clinicians to use in daily treatment planning. Lim et al. [28] propose a fast MIP solution approach and an LP-based iterative method that exploits score functions. However, because of the computational difficulties of solving the optimization problem with large data sets, heuristic methods are often used in practice [25].


Problem 2: Beam Segmentation Optimization Consider a matrix

$$
W = \begin{pmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m,1} & w_{m,2} & \cdots & w_{m,n} \end{pmatrix},
$$

where $w_{i,j} \in \mathbb{Z}_+$ for i = 1, ..., m, j = 1, ..., n. Our objective is to decompose the matrix W into K binary matrices $S^k$ such that

$$W = \sum_{k=1}^{K} u_k S^k, \qquad (7)$$

where $S^k = [s^k_{i,j}]$, $s^k_{i,j} \in \{0,1\}$, $i \in \{1,2,\ldots,m\}$, $j \in \{1,2,\ldots,n\}$, and $u_k \in \mathbb{Z}_+$, $k \in \{1,2,\ldots,K\}$. Solving (7) is quite easy in general. However, the problem becomes extremely difficult to solve when we impose the following two objectives and a physical constraint.

Objectives:

1. Minimize the value of K.

2. Minimize $T := \sum_{k=1}^{K} u_k$, the sum of the matrix multipliers.

Consecutive One Constraint: For each row of a binary matrix $S^k$, if there is more than one nonzero element (1’s), the 1’s must be consecutive, i.e., zeros are not allowed to break the sequence of nonzero elements. For example,

0 1 1 1 0

is a feasible sequence, but

0 1 0 1 0

is not allowed because the sequence of ones is not continuous.
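The consecutive one constraint is easy to test for a candidate shape matrix. A minimal sketch follows; the function names are illustrative.

```python
from typing import Sequence


def has_consecutive_ones(row: Sequence[int]) -> bool:
    """True if the 1's in a 0/1 row form one contiguous block (or there are none)."""
    ones = [j for j, s in enumerate(row) if s == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def is_deliverable_shape(matrix: Sequence[Sequence[int]]) -> bool:
    """Check the consecutive one constraint for every row of a binary matrix S^k."""
    return all(has_consecutive_ones(row) for row in matrix)


print(has_consecutive_ones([0, 1, 1, 1, 0]))  # True
print(has_consecutive_ones([0, 1, 0, 1, 0]))  # False
```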

Note that there can be many more constraints to 
this problem depending on the machine that is used 
for radiation delivery. Some of the common constraints 
are overlap elimination constraint, interleaf collision 
constraint, and tongue-and-groove constraint. Details 
about these more elaborate constraints can be found 
in [1,19,20,43]. 


Solution Methods This is a combinatorial optimiza- 
tion problem that is proven to be strongly NP-hard [9]. 
Optimization formulations have been proposed includ- 
ing integer nonlinear program (INLP) and integer pro- 
gramming (IP) [28]. IP models are easier to solve than 
INLP models. IP models with relatively small num- 
bers of rows and columns can be solved within a rea- 
sonable amount of time using a branch-and-bound 
method [45]. However, as the matrix size increases (say, beyond 10) and the maximum value of the matrix W increases, finding global solutions for the IP models can take too long for treatment planners to use. Therefore, both researchers and planners use various heuristics. Engel [10] proposed a heuristic that generates an optimal T, although K is not necessarily optimal. Other approaches and extensions to this problem can be found in [21]. A genetic algorithm has also been used by other researchers [8].
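The objective T has a simple lower bound, and a constructive sketch can attain it. Under the consecutive one constraint, each $S^k$ contributes at most one block of ones per row. Every increase $w_{i,j} - w_{i,j-1} > 0$ in row i (taking $w_{i,0} = 0$) must therefore be covered by blocks that start at column j, which gives $T \ge \max_i \sum_j \max(0, w_{i,j} - w_{i,j-1})$. The sketch below is not Engel’s heuristic [10], and it ignores interleaf and tongue-and-groove constraints. It splits each row into unit intervals and stacks them into layers, which attains this bound. K is reduced only by merging identical layers, so in general it is not minimal.

```python
import numpy as np


def row_intervals(row):
    """Split a nonnegative integer row into unit-intensity intervals [start, end)."""
    intervals, open_starts, prev = [], [], 0
    for j, value in enumerate([int(v) for v in row] + [0]):  # trailing 0 closes all
        if value > prev:                        # rising edge: open new intervals
            open_starts.extend([j] * (value - prev))
        for _ in range(prev - value):           # falling edge: close open intervals
            intervals.append((open_starts.pop(), j))
        prev = value
    return intervals


def decompose(W):
    """Write W = sum_k u_k S^k with binary S^k obeying the consecutive one constraint.

    Builds one shape per layer of unit intervals, then merges identical shapes.
    T = sum(u_k) equals the largest row complexity; K is not minimized.
    """
    W = np.asarray(W, dtype=int)
    if np.any(W < 0):
        raise ValueError("W must be nonnegative")
    per_row = [row_intervals(row) for row in W]
    layers = max((len(ivs) for ivs in per_row), default=0)
    merged = {}
    for k in range(layers):
        S = np.zeros_like(W)
        for i, ivs in enumerate(per_row):
            if k < len(ivs):
                start, end = ivs[k]
                S[i, start:end] = 1             # one contiguous block per row
        key = S.tobytes()
        if key in merged:
            merged[key][1] += 1                 # identical shape: raise its multiplier
        else:
            merged[key] = [S, 1]
    return [(S, u) for S, u in merged.values()]


W = np.array([[0, 1, 4, 3],
              [1, 5, 7, 4],
              [0, 2, 4, 3]])
shapes = decompose(W)
print("K =", len(shapes), "T =", sum(u for _, u in shapes))
```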
Introduction


Very often scientific data are defined by multidimensional vectors of numerical values. Visualization of such data is highly desirable for exploratory data analysis that draws on the heuristic abilities of a human expert: a picture is worth a thousand words. There are different approaches to visualization [8]. We consider one of the
most popular approaches known as multidimensional 
scaling (MDS) [2,9,14,27,31,39,42]; it will be shown be- 
low that an essential part of the technique is optimiza- 
tion of a function possessing many optimization-ad- 
verse properties. By means of MDS a set of multidi- 
mensional vectors can be represented as a set of points 
in a low-dimensional space and exposed in this way to 
a human expert for heuristic analysis. Even more gen- 
eral sets of objects can be visualized: it is sufficient to 
know pairwise similarity/dissimilarity between the ob- 
jects. Application areas of MDS vary from psychomet- 
rics [41] and market analysis [15,36] to mobile commu- 
nications [22] and pharmacology [45]. 


Formulation 


A set of n objects is considered whose pairwise dissimilarities are given by an (n × n) matrix $(\delta_{ij})$, i, j = 1, ..., n. It is assumed that the dissimilarities are nonnegative, $\delta_{ij} \ge 0$, and symmetric, $\delta_{ij} = \delta_{ji}$, and that $\delta_{ii} = 0$. Frequently the considered objects are vectors, and dissimilarities are defined by a metric in the corresponding vector space. Sometimes a proximity relation between objects (the reciprocal of dissimilarity) is defined instead; this case can be treated in the same way as the dissimilarity relation.

An image of a set of objects is sought as a set of points $x_i \in \mathbb{R}^m$, $i = 1, \ldots, n$, in a (low-dimensional metric) embedding space with pairwise distances between the image points fitting the corresponding dissimilarities; normally Minkowski distances $d_p(x_i, x_j)$ between the points $x_i$ and $x_j$ are used, where

$$d_p(x_i, x_j) = \left( \sum_{k=1}^{m} |x_{ik} - x_{jk}|^p \right)^{1/p}.$$

The formula defines Euclidean distances when $p = 2$ and city-block distances when $p = 1$.

The problem of constructing images of the considered objects is reduced to the minimization of an accuracy-of-fit criterion, e.g., of a least-squares stress function,

$$S(X) = \sum_{i<j} w_{ij} \big( d(x_i, x_j) - \delta_{ij} \big)^2,$$

where the weights are nonnegative: $w_{ij} \ge 0$. Normalized stress, defined by

$$S^*(X) = \frac{\sum_{i<j} w_{ij} \big( d(x_i, x_j) - \delta_{ij} \big)^2}{\sum_{i<j} w_{ij} \, \delta_{ij}^2},$$

shows the proportion of the unexplained sum of squares and can be used to compare the results of different problems. Stress is not everywhere differentiable. The function normally has many local minima. It is invariant with respect to translation, rotation, and mirroring. The minimization problem of stress is high dimensional: the number of variables is $N = n \times m$. Therefore minimization of the stress function is a difficult global optimization problem.
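To make the stress criteria concrete, the following minimal sketch (assuming NumPy; the names `minkowski_distances` and `stress` are hypothetical, not taken from the cited works) evaluates the least-squares stress and the normalized stress of a configuration X with Minkowski distances. All weights default to 1, and only the pairs i < j are summed.

```python
import numpy as np

def minkowski_distances(X, p=2.0):
    """Matrix of d_p(x_i, x_j) for the rows x_i of X (an n x m array)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

def stress(X, delta, w=None, p=2.0, normalized=False):
    """Least-squares stress over pairs i < j; optionally the normalized variant."""
    n = X.shape[0]
    w = np.ones((n, n)) if w is None else np.asarray(w, dtype=float)
    delta = np.asarray(delta, dtype=float)
    d = minkowski_distances(X, p)
    iu = np.triu_indices(n, k=1)  # index pairs with i < j
    residual = np.sum(w[iu] * (d[iu] - delta[iu]) ** 2)
    if normalized:
        # Divide by the weighted sum of squared dissimilarities.
        return float(residual / np.sum(w[iu] * delta[iu] ** 2))
    return float(residual)
```

For example, with `X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])` and `delta = minkowski_distances(X)`, both `stress(X, delta)` and `stress(X + 10.0, delta)` return `0.0`, reflecting the translation invariance of stress.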

Many global optimization methods for minimiza- 
tion of stress include auxiliary local minimization al- 
gorithms. Differentiability of an objective function at 
a minimizer is an important factor for a proper choice 
of a local minimization algorithm. The well-known re- 
sult on differentiability of stress with Euclidean dis- 
tances at a local minimizer [12] is generalized for 
Minkowski distances in [21]: positiveness of distances holds at a local minimizer, i.e., image points in the embedding space do not coincide. For Minkowski distances with $p > 1$, or when $m = 1$, this means that stress is differentiable at a local minimizer.


The result on differentiability of stress with Minkowski distances at a local minimizer [21] does not include the case of city-block distances ($p = 1$). It was shown in [44] that positiveness of distances at a local minimizer does not imply differentiability of stress with city-block distances. Examples of images at minimizers show that, in an embedding space of dimension $m > 1$, image points may have equal values of some coordinates, and therefore stress may be nondifferentiable at a minimizer in the case of city-block distances.

Everywhere differentiable S-stress is defined by

$$S_S(X) = \sum_{i<j} w_{ij} \big( d^2(x_i, x_j) - \delta_{ij}^2 \big)^2.$$


Sometimes, instead of the least-squares stress, a least absolute deviation ($L_1$-norm) function is used:

$$S_1(X) = \sum_{i<j} w_{ij} \big| d(x_i, x_j) - \delta_{ij} \big|.$$
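The two alternative criteria differ from stress only in the per-pair term. The sketch below (assuming NumPy and Euclidean distances; the function names are hypothetical) evaluates S-stress and the least absolute deviation criterion. With Euclidean distances the squared distance in S-stress is a polynomial in the coordinates, which is why S-stress is differentiable everywhere; the code takes a square root and squares it again only for brevity.

```python
import numpy as np

def _pair_terms(X, delta, w):
    """Euclidean distances, dissimilarities and weights for the pairs i < j."""
    n = X.shape[0]
    iu = np.triu_indices(n, k=1)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    w = np.ones((n, n)) if w is None else np.asarray(w, dtype=float)
    return d[iu], np.asarray(delta, dtype=float)[iu], w[iu]

def s_stress(X, delta, w=None):
    """S-stress: compares squared distances with squared dissimilarities."""
    d, dl, ww = _pair_terms(X, delta, w)
    return float(np.sum(ww * (d ** 2 - dl ** 2) ** 2))

def lad_stress(X, delta, w=None):
    """Least absolute deviation (L1-norm) variant of stress."""
    d, dl, ww = _pair_terms(X, delta, w)
    return float(np.sum(ww * np.abs(d - dl)))
```

For two points at distance 5 with dissimilarity 6, S-stress is (25 − 36)² = 121, while the absolute-deviation criterion is 1.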


Examples of two-dimensional images produced using minimization of stress with city-block and Euclidean distances are shown in Fig. 1. Vertices of multidimensional geometrical figures are considered as the objects to be visualized. The dissimilarity between vertices is measured by the distance in the original vector space. Although it is not possible to imagine geometrical figures in spaces of dimensionality larger than 3, the properties of well-understood geometrical figures are known. Multidimensional simplices and cubes are special in the symmetric location of their vertices, and this feature is expected in the images. Besides this common feature, the central location of the “zero” vertex is characteristic of a multidimensional simplex. The other vertices of a simplex are equally distant from the “zero” vertex. The vertices of a multidimensional cube compose clusters of $2^k$ vertices corresponding to edges and faces. The vertices of a cube are equally distant from the center.

The images of a 63-dimensional simplex are shown 
in the left column of Fig. 1, and the images of a 6-di- 
mensional cube are shown in the right column. Both ge- 
ometrical figures have n = 64 vertices. The images pro- 
duced by city-block MDS are shown in the upper row, 
and the images produced by Euclidean MDS are shown 
in the lower row. The images of vertices are shown by 
circles. 
Optimization-Based Visualization, Figure 1
Images of vertices of multidimensional simplices and cubes (left column: 63-dimensional simplex, n = 64; right column: 6-dimensional cube, n = 64; upper row: city-block MDS; lower row: Euclidean MDS)


The image of the “zero” vertex of the simplex is located at the center of the structures. The images of the other vertices tend to form a square with a vertical diagonal when city-block distances are used, and circles when Euclidean distances are used. A square with a vertical diagonal in the city-block metric is equivalent to a circle in the Euclidean metric: the points on such a figure are equally distant from the center. Therefore the images of the vertices are similarly distant from the “zero” vertex when city-block distances are used. When Euclidean distances are used, the images of vertices on different circles are at different distances from the “zero” vertex.

The images of the vertices of the multidimensional cube also tend to form a square with a vertical diagonal when city-block distances are used; therefore they are visualized as similarly distant from the center of the image. In the case of the Euclidean metric, the images of the vertices of the cube tend to fill a circle; however, there is no uniformity in the location of the images of the vertices. The images of the vertices form clusters representing lower-dimensional cubes in the cases of both metrics.


Methods/Applications 


MDS is a generalization of unidimensional scaling 
(UDS) (m = 1) [33] to the multidimensional case 
(m > 1). 

Minimization of the stress function with equal weights for $m = 1$ can be changed to a combinatorial maximization problem [10]:

$$\max_{\psi \in \Psi} \sum_{i=1}^{n} \Big( \sum_{j>i} \delta_{\psi(i)\psi(j)} - \sum_{j<i} \delta_{\psi(i)\psi(j)} \Big)^2,$$

where $\Psi$ is the set of all possible permutations of $1, \ldots, n$. The optimal permutation $\psi^*$ found by this maximization defines the optimal sequence of objects. Then the coordinate values of the image points are found as

$$x_{\psi^*(1)} = 0,$$

$$x_{\psi^*(i+1)} = x_{\psi^*(i)} + \frac{1}{n} \Big( \sum_{j>i} \delta_{\psi^*(i)\psi^*(j)} - \sum_{j<i} \delta_{\psi^*(i)\psi^*(j)} - \sum_{j>i+1} \delta_{\psi^*(i+1)\psi^*(j)} + \sum_{j<i+1} \delta_{\psi^*(i+1)\psi^*(j)} \Big),$$

$i = 1, \ldots, n-1$.
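For very small n, the combinatorial formulation can be solved by plain enumeration. The sketch below (hypothetical function names; exhaustive search over all n! permutations, not the branch-and-bound or heuristic methods discussed in this section) maximizes the criterion above and then recovers the coordinates with the recursion, using 0-based positions.

```python
from itertools import permutations

def defays_criterion(delta, perm):
    """Sum over positions i of (sum_{j>i} delta - sum_{j<i} delta)^2."""
    n = len(perm)
    total = 0.0
    for i in range(n):
        after = sum(delta[perm[i]][perm[j]] for j in range(i + 1, n))
        before = sum(delta[perm[i]][perm[j]] for j in range(i))
        total += (after - before) ** 2
    return total

def uds_brute_force(delta):
    """Exhaustive maximization; practical only for small n."""
    n = len(delta)
    best_perm, best_val = None, float("-inf")
    for perm in permutations(range(n)):
        val = defays_criterion(delta, perm)
        if val > best_val:  # keep the first maximizer found
            best_perm, best_val = perm, val
    # Coordinates: x of the first object in the sequence is 0, then the recursion.
    x = [0.0] * n
    for i in range(n - 1):
        p, q = best_perm[i], best_perm[i + 1]
        s = (sum(delta[p][best_perm[j]] for j in range(i + 1, n))
             - sum(delta[p][best_perm[j]] for j in range(i))
             - sum(delta[q][best_perm[j]] for j in range(i + 2, n))
             + sum(delta[q][best_perm[j]] for j in range(i + 1)))
        x[q] = x[p] + s / n
    return best_perm, best_val, x
```

With dissimilarities taken from points at 0, 1 and 3 on a line, `uds_brute_force([[0, 1, 3], [1, 0, 2], [3, 2, 0]])` returns the sequence `(0, 1, 2)`, criterion value `42.0`, and coordinates `[0.0, 1.0, 3.0]`, i.e., the original points are recovered exactly.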

The number of local minima for the problem of UDS was estimated in [37]. In the same work, a “smoothing technique” approach was presented to locate the globally optimal solution.

A branch-and-bound method for obtaining the 
guaranteed globally optimal solution to the problems of 
UDS was presented in [7]. An interchange test and new 
bounding procedures were used to improve computa- 
tional performance. 

Solving larger problems with guaranteed optimality is not possible; therefore heuristic approaches are used. A simulated annealing (SA) approach for the problem of UDS
via maximization of the Defays criterion was presented 
in [5]. This algorithm includes efficient storage and 
computation methods to facilitate rapid evaluation of 
trial solutions. 

Quadratic assignment methods to generate initial permutations for UDS were developed in [6]. The methods include locally optimal pairwise interchange, SA, and a hybrid method. It was shown that substantial improvements in UDS can be achieved using starting permutations obtained via the solution of a quadratic assignment problem.

A heuristic algorithm based on SA for providing good starting solutions for combinatorial algorithms is proposed in [3]. The heuristic starts by partitioning the coordinate axis into equally spaced discrete points. An SA algorithm is used to search the lattice defined by these points with the objective of minimizing a least-squares or least absolute deviation loss function.

An algorithm implementing SA for UDS in a different way is presented in [35]. The strategy is based on a weighted alternating process: permutations and pointwise translations are used to locate the optimal configuration.

A recursive dynamic programming strategy for some problems including UDS is discussed in [23]. Four different optimization strategies for UDS have been compared in [24]: dynamic programming, an iterative quadratic assignment heuristic, the smoothing technique [37], and a nonlinear programming reformulation [28]. The results show that the first two strategies are better than the other two and should lead to optimal solutions if several random starts are used.

A mixed-integer programming formulation for the 
least absolute deviation UDS is developed in [40]. In- 
teger linear programming models for UDS were dis- 
cussed in [4]. In the case of least absolute deviation 
UDS, the objective function is piecewise linear. 

The special geometry of the squared-error loss function for UDS is exploited in [38]. The developed algorithm is linear in the number of parameters, as the global minimum for every coordinate is conditioned on every other coordinate being held fixed.

One of the most popular algorithms for MDS is SMACOF [11]. The algorithm is based on a majorization approach [17] that iteratively replaces the original objective function by an auxiliary majorization function, which is much simpler to optimize. The convergence properties of MDS algorithms are studied in [13].
It was proved that the majorization method is globally 
convergent. In almost all cases the convergence is lin- 
ear, with a convergence rate close to unity. The ma- 
jorization algorithm has been extended to deal with 
Minkowski distances with 1 < p < 2, and an algorithm 
that is partially based on majorization for p outside this 
range is suggested in [21]. 
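The majorization idea can be illustrated in its simplest setting: unit weights and Euclidean distances. In that case each iteration applies the Guttman transform $X \leftarrow n^{-1} B(X) X$, where $b_{ij} = -\delta_{ij}/d_{ij}(X)$ for $i \ne j$ (0 when the points coincide) and $b_{ii} = -\sum_{j \ne i} b_{ij}$. The following is a minimal sketch under these assumptions (NumPy; hypothetical function names), not the implementation of [11]; weights, Minkowski distances and safeguards are omitted.

```python
import numpy as np

def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def raw_stress(X, delta):
    """Unweighted stress: sum over i < j of (d_ij(X) - delta_ij)^2."""
    d = euclidean_distances(X)
    iu = np.triu_indices(len(X), k=1)
    return float(((d[iu] - delta[iu]) ** 2).sum())

def guttman_transform(X, delta):
    """One majorization step for unit weights: X_new = B(X) X / n."""
    n = len(X)
    d = euclidean_distances(X)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(d > 0, delta / d, 0.0)  # 0 where points coincide
    B = -ratio
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))  # rows of B sum to zero
    return B @ X / n

def smacof(delta, m=2, n_iter=300, tol=1e-9, seed=0):
    """Iterate Guttman transforms from a random start; stress never increases."""
    delta = np.asarray(delta, dtype=float)
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((len(delta), m))
    history = [raw_stress(X, delta)]
    for _ in range(n_iter):
        X = guttman_transform(X, delta)
        history.append(raw_stress(X, delta))
        if history[-2] - history[-1] < tol:
            break
    return X, history
```

The recorded `history` is nonincreasing, which is the monotonicity property that makes majorization attractive; as noted above, the final configuration is generally only a local minimizer, so the method is usually combined with multiple starts or a global strategy. For practical use, scikit-learn's `sklearn.manifold.MDS` also provides a SMACOF-based implementation.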

A tunneling method for global minimization 
was introduced and adjusted for MDS with general 
Minkowski distances in [18]. The tunneling method al- 
ternates a local search step, in which a local minimum is 
sought, with a tunneling step, in which a different con- 
figuration is sought with the same value of stress as the 
previous local minimum. In this manner successively 
better local minima are obtained and the last one is of- 
ten the global minimum. 

A method for MDS based on combining a local search algorithm with an evolutionary strategy of generating new initial points was proposed in [32]. Its efficiency was investigated by numerical experiments. The testing results in [22,30] showed that the hybrid algorithm combining an evolutionary global search with an efficient local descent is the most reliable, though the most time-consuming, method for MDS with Euclidean distances. The advantages of genetic algorithms in MDS with nonstandard stress criteria were discussed in [16].

The concept of sequential estimation in MDS was 
introduced in [34]. The sequential estimation method 
refers to continually updating estimates of a configuration as new observations are added. A locally optimal experimental design was constructed.

A globalized Newton method for stress and S-stress 
is developed in [25]. A deterministic annealing algo- 
rithm for the S-stress is presented in [26] and experi- 
mentally compared with gradient-descent methods. 

General methods for MDS can be applied in the case 
of city-block distances if they do not rely on the dif- 
ferentiability of the objective function at a minimizer. 
However, there are some methods developed especially 
for city-block MDS. 

A survey of city-block MDS is presented in [1]. Top- 
ics include theoretical issues, algorithmic developments 
and their implications for seemingly straightforward 
analyses, isometries with other distances, and links to 
graph-theoretic models. 

A distance smoothing approach for city-block MDS was proposed in [19]. The technique helps to avoid local minima in optimization. The technique was extended to any Minkowski distance, and a majorization algorithm with a monotonically nonincreasing series of stress values was suggested in [20].

A heuristic algorithm based on SA for two- 
dimensional city-block scaling was proposed in [3]. The 
heuristic starts with the partitioning of each coordinate 
axis into equally spaced discrete points. An SA algorithm
is used to search the lattice defined by these points with 
the objective of minimizing least-squares or least ab- 
solute deviation loss function. The object permutations 
for each dimension of the solution obtained by the SA 
algorithm are used to find a locally optimal set of coor- 
dinates by quadratic programming. 

A two-stage approach for city-block MDS was proposed in [29]. Least-squares regression is used to obtain a local minimum of the stress function in the first stage. SA is used in the second stage of the method.

A bilevel method for city-block MDS was proposed in [44]. The method exploits the piecewise quadratic structure of stress with city-block distances, reformulating the global optimization problem as a two-level optimization problem:

$$\min_{P} \; S(P), \quad \text{s.t.} \quad S(P) = \min_{X \in A(P)} S(X),$$

where the upper-level combinatorial problem is defined over the set of all possible permutations of $1, \ldots, n$ for each coordinate of the embedding space, and the lower-level problem is a quadratic programming problem with a positive definite quadratic objective function and linear constraints that fix the order of the coordinate values as defined by the $m$ permutations in $P$. The lower-level problems are solved using a quadratic programming algorithm. The upper-level combinatorial problem can be solved by guaranteed methods for small $n$, and by evolutionary search for larger problems.

Interaction between optimization and visualization 
means not only application of optimization methods 
to implement visualization algorithms but also appli- 
cation of visualization methods to analyze properties 
of optimization problems. For example, optimal design 
of the chemical engineering process considered in [43] 
includes a minimization problem in nine-dimensional 
space where the feasible region is defined by interval 
constraints, and a nonexplicit (black box) indicator- 
type constraint. Properties of the feasible region are of 
interest while choosing the optimization algorithm and 
while making a final decision about process parameters. 
The properties of interest cannot be proven analytically, 
but heuristic analysis of the image of the set of points 
consisting of vertices of the nine-dimensional cube and 
300 randomly generated points of the feasible region is 
helpful to guess the form and dimensionality of the re- 
gion and its location in the hypercube. 


See also 


> Continuous Global Optimization: Models, 
Algorithms and Software 

> Dynamic Programming in Clustering 

> Evolutionary Algorithms in Combinatorial 
Optimization 

> Integer Programming 

> Integer Programming: Branch and Bound Methods 
There are many situations in which it is necessary or desirable to classify objects into two or more mutually exclusive sets or classes. Medical diagnosis, for example, has been the focus of numerous research efforts over the past several years. Given a set of attributes, or relevant characteristics, which describe a patient, the problem then is the extraction and identification of various biological and/or historical attributes in order to determine the correct diagnosis or classification.

To illustrate the magnitude of the problem, consider 
the case of breast cancer diagnosis. Based on a patient’s 
medical history and on the results of mammography 
screening (the most effective diagnostic tool available 
to health care professionals), doctors attempt to clas- 
sify breast tumors as being suspicious for malignancy 
or benign. Unfortunately, of all breast tumors which are 
suspected to be malignant, over 70% are later found to 
be benign through an expensive and emotionally try- 
ing surgical procedure called a biopsy [5]. In addition, 
almost 50% of those patients who actually have breast 
cancer are classified as benign by their physicians, so 
that many malignancies go unrecognized [27]. 

The decision maker, in this example the medical 
doctor, must infer from existing information the char- 
acteristics or combinations of characteristics which are 
indicative of a benign or malignant tumor in order to 
correctly classify new cases. In their most basic form, 
the characteristics used to describe each patient are rep- 
resented by one or more binary attributes. That is, each 
object (patient) may be represented by a Boolean vec- 
tor in which an attribute value is either 1 (true) or 
0 (false). Often, the problem is compounded by the 
fact that complete information is not available. Con- 
tinuing with the breast cancer example, suppose that 
the information related to all pertinent characteristics 
is not available due to the patient’s inability to un- 
dergo certain tests because of excessive cost, the pos- 
sibility of indeterminate test results, lack of knowledge 
related to family histories, etc. Thus, in addition to
the binary data indicating the presence or absence of 
a given characteristic, the attribute value or level of 
some characteristics may be unknown. The doctor is 
then faced with the problem of assessing a limited set 
of characteristics to determine whether a biopsy is war- 
ranted. He/She must decide if the available characteris- 
tics and/or combinations of characteristics provide suf- 
ficient information for an accurate classification of the 
tumor. 

This problem, referred to as the inductive inference 
problem or Boolean classification problem, is illustra- 
tive of a vast number of similar situations through- 
out business, industry and medicine. Technological ad- 
vances have created a ‘data explosion’, providing de- 
cision makers with ever increasing amounts of in- 
formation. Unfortunately, this information is usually 
not exploited in an optimal way, and at times, not at 
all. Clearly, the classification problem becomes more 
complex as the amount of information related to the 
object increases. Individuals, or groups of individu- 
als find themselves incapable of consistently and reli- 
ably handling, manipulating and analyzing the avail- 
able information. As a result, the creation of com- 
puter systems capable of learning the concepts under- 
lying the data and subsequently classifying new exam- 
ples accurately and efficiently has become a practical 
necessity. 


Background Information 


As informally presented above, solving the Boolean 
classification problem generally involves the develop- 
ment of a system that learns from feature-based ex- 
amples. That is, each example is described by a set of 
Boolean attributes. The binary vector [0 1 1 1], for 
instance, describes an example in which the first at- 
tribute (or characteristic) is false, and the remaining at- 
tributes are true. Each example also carries a classifica- 
tion: positive or negative. The goal of a learning algo- 
rithm is to infer from these examples a Boolean func- 
tion (logical system) that is capable of accurately pre- 
dicting the class of new examples. Generally, the in- 
ferred system is expressed as a Boolean function in con- 
junctive normal form (CNF) or disjunctive normal form 
(DNF). 


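The general forms of a CNF and a DNF Boolean function are defined as (1) and (2), respectively:

$$\bigwedge_{j=1}^{k} \Big( \bigvee_{i \in \rho_j} a_i \Big) \qquad (1)$$

$$\bigvee_{j=1}^{k} \Big( \bigwedge_{i \in \rho_j} a_i \Big) \qquad (2)$$

where $A_i$ denotes the $i$th attribute, $a_i$ is either $A_i$ or its negation $\bar{A}_i$, and $\rho_j$ is the set of indices of the attributes appearing in the $j$th clause. That is, a CNF expression is a conjunction of disjunctions, while a DNF expression is a disjunction of conjunctions. Any Boolean function can be transformed into CNF or DNF format [15].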
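The difference between the two normal forms is easiest to see by evaluating them on an example. In the sketch below (a hypothetical encoding, not taken from the cited works), a literal is a pair `(attribute_index, positive)`, where `positive=False` stands for the negation $\bar{A}_i$, and attribute $A_1$ corresponds to index 0.

```python
from typing import List, Sequence, Tuple

Literal = Tuple[int, bool]  # (attribute index, True for A_i, False for NOT A_i)

def literal_value(example: Sequence[int], lit: Literal) -> bool:
    """Truth value of a literal on a binary example vector."""
    idx, positive = lit
    return example[idx] == 1 if positive else example[idx] == 0

def eval_cnf(clauses: List[List[Literal]], example: Sequence[int]) -> bool:
    """Conjunction of disjunctions: every clause needs at least one true literal."""
    return all(any(literal_value(example, lit) for lit in c) for c in clauses)

def eval_dnf(terms: List[List[Literal]], example: Sequence[int]) -> bool:
    """Disjunction of conjunctions: some term must have all literals true."""
    return any(all(literal_value(example, lit) for lit in t) for t in terms)
```

For the example vector [0 1 1 1] mentioned above, the CNF $(A_1 \vee A_2) \wedge (\bar{A}_1 \vee A_4)$, encoded as `[[(0, True), (1, True)], [(0, False), (3, True)]]`, evaluates to true, while the DNF $(A_1 \wedge A_3) \vee \bar{A}_2$, encoded as `[[(0, True), (2, True)], [(1, False)]]`, evaluates to false. Both functions are illustrative only.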

To clarify the concepts presented thus far, suppose 
that the following sets of positive and negative examples 
are somehow known: 
The goal of a learning algorithm is to infer a Boolean
function which correctly classifies all the examples. One 
such function (in CNF) is as follows: 


(Ay V Ag) A (Az V A3) A (Aq V AZ V Ag). 


These three clauses, when taken together, accept the previous four positive examples and reject the six negative examples.

Traditionally, there are two main methods with the 
goal of creating these intelligent systems: decision trees 
and neural networks. These tools, which have evolved 
over a forty-year period, represent a large portion of 
the literature on learning algorithms which propose to 
solve the Boolean classification problem. When applied 


Optimization in Boolean Classification Problems 




to classification problems in which the goal of the sys- 
tem is to learn from feature-based examples, decision 
trees have been one of the most popular methodologies 
for the extraction of knowledge. Because of the natu- 
ral interpretation of knowledge, symbolic decision trees 
can be easily translated into a set of rules suitable for 
use in rule-based systems. The size and form of the decision tree are significantly affected by the ordering of
the attributes, and often the resulting tree is nonopti- 
mal or it may be overspecialized. A complex tree is not 
only more difficult to validate, but as J.R. Quinlan [16] 
demonstrated, a simpler tree is more likely to capture 
structures inherent in the data. 

Neural networks comprise the other extreme in arti- 
ficial intelligence approaches. These systems consist of 
a set of programs based on the structure of biological 
neural systems. Knowledge is represented in the form of 
a series of interconnected neurons, the structure of their 
interconnections, and the strength of their interconnec- 
tions. To the user the process is a ‘black box’. Though 
these systems have demonstrated the ability to provide 
accurate classifications in many applications, the exam- 
ples are classified without explanations or justifications. 
In an attempt to overcome this deficiency, hybrid sys- 
tems which combine neural networks and rule-based 
systems have been developed [4]. While these hybrid 
systems are efficient and effective in terms of both time 
and storage requirements, unfortunately, an exponen- 
tial number of rules may be derived [17]. This renders 
attempts at justification of the process virtually useless 
due to the complexity of the explanation. The problem 
is that the logical rules are not derived within a com- 
plete logical framework. 


Optimization Approaches 


Recognizing the need to minimize the resulting system, 
the problem of inferring a Boolean system from positive 
and negative examples was formulated as a satisfiabil- 
ity problem (SAT) and a method for inferring a mini- 
mal DNF system was proposed [8]. The SAT problem is 
next translated into an integer programming (IP) prob- 
lem that is then solved by using an interior point method 
developed in [9]. The method makes use of a parameter, say k, which presupposes the number of disjunctions in the DNF system to be derived. If the IP problem is feasible, it is solved and k is successively lowered until infeasibility is encountered; it is then concluded that there exists no system of size k or smaller which accepts all positive examples and rejects all negative examples.
Many solution methods exist for solving the SAT prob- 
lem (see, for instance, [6,7,8] and [26]). Unfortunately, 
trying to determine a minimum size Boolean function 
may be computationally very difficult since it is much 
harder to prove that a given SAT problem is infeasible than to prove it is feasible. Thus, while the SAT approach can be used successfully on small data sets to derive a minimal number of DNF clauses, when dealing with real-world data, minimizing the size of the system may be neither feasible nor desirable due to the vast amounts of time and storage required by such algorithms.

In [25] a logical (Boolean) function approach to the classification problem was introduced with the one clause at a time (OCAT) approach. Like the SAT approach, the OCAT approach formulates the Boolean classification problem as a series of integer programming problems. The OCAT algorithm is sequential and greedy in nature. The first iteration takes as input the E⁺ and E⁻ sets, and generates a single clause which accepts all positive examples and rejects as many negative examples as possible. This is the greedy aspect of the method. In the next iteration, it performs the same task using the original E⁺ set and a revised E⁻ set which includes only those negative examples not rejected by the preceding CNF clause. The iterations continue until a set of clauses is constructed which rejects all the negative examples and, of course, each clause accepts all the positive examples. This algorithm is as follows:


i = 0; C = ∅;

DO WHILE (E⁻ ≠ ∅)
  1  i ← i + 1;
  2  Find a clause c_i which accepts all members of E⁺ while it rejects as many members of E⁻ as possible;
  3  Let E⁻(c_i) be the set of members of E⁻ which are rejected by c_i;
  4  Let C ← C ∪ {c_i};
  5  Let E⁻ ← E⁻ − E⁻(c_i);
REPEAT;


The one clause at a time (OCAT) approach (the CNF case) 
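The outer OCAT loop is simple to express in code; all of the difficulty lies in step 2. The sketch below is an illustration under explicit assumptions. In place of the branch and bound algorithm it uses a simple greedy ratio heuristic of our own (loosely in the spirit of RA1, but neither the published RA1 nor the B&B of [19]): literals are added to the clause until every positive example is accepted, preferring literals that accept many uncovered positives while making few still-rejected negatives true. If the heuristic clause rejects no remaining negative, a fallback clause that is false only on the first remaining negative guarantees progress. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Literal = Tuple[int, bool]  # (attribute index, True for A_i, False for NOT A_i)

def lit_true(example: Sequence[int], lit: Literal) -> bool:
    i, positive = lit
    return example[i] == (1 if positive else 0)

def clause_accepts(clause: List[Literal], example: Sequence[int]) -> bool:
    """A CNF clause (disjunction) accepts an example if any literal is true."""
    return any(lit_true(example, lit) for lit in clause)

def _score(lit, uncovered, still_rejected):
    gain = sum(lit_true(e, lit) for e in uncovered)       # positives newly accepted
    loss = sum(lit_true(e, lit) for e in still_rejected)  # negatives no longer rejected
    return gain / (1 + loss)

def greedy_clause(pos, neg, n_attr) -> List[Literal]:
    """Heuristic stand-in for step 2: accept all of pos, reject many of neg."""
    clause: List[Literal] = []
    uncovered, still_rejected = list(pos), list(neg)
    literals = [(i, s) for i in range(n_attr) for s in (True, False)]
    while uncovered:
        best = max((l for l in literals if l not in clause),
                   key=lambda l: _score(l, uncovered, still_rejected))
        clause.append(best)
        uncovered = [e for e in uncovered if not lit_true(e, best)]
        still_rejected = [e for e in still_rejected if not lit_true(e, best)]
    return clause

def ocat(pos, neg, n_attr) -> List[List[Literal]]:
    """Outer OCAT loop (CNF case): add clauses until every negative is rejected."""
    if any(e in pos for e in neg):
        raise ValueError("an example cannot be both positive and negative")
    clauses, remaining = [], list(neg)  # remaining plays the role of E-
    while remaining:
        c = greedy_clause(pos, remaining, n_attr)
        if all(clause_accepts(c, e) for e in remaining):
            # Fallback: a clause that is false only on remaining[0].
            c = [(i, remaining[0][i] == 0) for i in range(n_attr)]
        clauses.append(c)
        remaining = [e for e in remaining if clause_accepts(c, e)]
    return clauses
```

Every clause produced accepts all positive examples, and each pass removes at least one negative example, so the loop terminates; the conjunction of the returned clauses is a CNF system consistent with the training data, though not necessarily a minimal one.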
The core of the method lies in step 2, the application
of a branch and bound algorithm. Through the devel- 
opment of new search strategies and fathoming tests, E. 
Triantaphyllou [19] improved the performance of the 
branch and bound step. Still, like all branch and bound 
algorithms, it suffers from exponential time complex- 
ity. However, computational experiments indicate that 
the OCAT approach, when combined with the branch 
and bound algorithm, is a very efficient method for in- 
ferring logical clauses from sets of positive and nega- 
tive examples. In fact, in over half of the test cases, this 
approach generated a minimum number of clauses. In 
addition, when compared to the SAT approach of [8], 
OCAT and the branch and bound was found to be con- 
siderably faster while performing at the same level of 
predictive accuracy [19]. Thus, while the OCAT approach may not always derive an absolute minimal system, computationally it is much less expensive than the SAT approaches and therefore more applicable to real-world applications.

Continued research in this area resulted in the development of two randomized heuristics [2]. It should be stated here that this approach is similar, in principle, to the GRASP (greedy randomized adaptive search procedure) approach presented in [3]. The first heuristic (RA1) was developed to overcome the exponential time complexity of OCAT's branch and bound algorithm. That is, RA1 derives a Boolean system from positive and negative examples in polynomial (quadratic) time. The primary difference between the branch and bound algorithm and the RA1 heuristic is that in each iteration, the branch and bound attempts to reject as many negative examples as possible, while RA1 attempts only to reject many negative examples. Again, the increased speed resulted in generally larger systems. When comparing the two algorithms, A.S. Deshpande and Triantaphyllou [2] found that the branch and bound used in the original OCAT approach produced, in general, fewer conjunctions and required higher CPU times than the RA1 heuristic. Additionally, it was concluded that a combination of the RA1 heuristic and the branch and bound method performs much better, in terms of both the computational time/memory requirements of the process and the size of the derived system, than either approach used alone.

Faced with real-world problems, in which there is often incomplete information related to both the attribute values and the classifications, the goal of optimizing the system becomes more desirable and necessary. Each of the methods discussed thus far has considered only positive and negative examples with complete data. That is, there is no missing information in the data set. Often, the complete examples represent only a portion of the available data, since in general databases are plagued by missing information. In [1] a logical method for deriving a Boolean function from positive and negative examples was introduced in which some of the attribute values may be unknown. Since the missing information did not inhibit the classification process, these attributes are treated as ‘don’t care’ values by the
algorithm. Using a network flow algorithm, the method 
has been shown to efficiently derive a Boolean function 
with a very high predictive accuracy. The fact that the 
algorithm is capable of effectively handling missing in- 
formation makes it more applicable to real data bases. 
Note, however, that the method makes no attempts at 
minimizing the size of the derived function. 
Deshpande and Triantaphyllou [2] extended the RA1 heuristic for complete positive and negative examples to include the use of incomplete data through the development of a second randomized heuristic, termed RA2. This method allows not only for the inclusion of missing information in the attribute values, but it also makes use of examples which are unclassifiable due to the presence of missing information. That is, for some examples, the correct classification cannot be determined due to the lack of sufficient information. The objective of the second heuristic, similar to the first one, is to iteratively derive a small-sized Boolean function from these three mutually exclusive sets: positive, negative, and unclassifiable examples. The algorithm consists of two phases. In each iteration of the first phase, the objective of the algorithm is to reject many negative examples while accepting all positive examples and rejecting no unclassifiable examples. Once all negative examples have been rejected by the current set of clauses, phase II then ensures that none of the unclassifiable examples are accepted by the system. When compared to RA1, the accuracy obtained with the inclusion of unclassifiable data was always higher than the corresponding accuracy obtained without the inclusion of the unclassifiable data. This method has satisfactorily addressed the issues of efficiency and system size. Furthermore, it demonstrated that the process of extracting knowledge from examples can be expedited by exploiting the patterns contained in missing information and unclassifiable examples.

A related issue in this area is the development of approaches for partitioning large-scale problems in optimal or semi-optimal ways [24]. That was done by using a graph-theoretic approach. The same approach also allows one to establish lower limits on the minimum number of clauses derivable from two given collections of input examples. Also, an approach for guided learning of a target Boolean function is proposed in [22]. In [10,21], and [18] the above problem was studied for the case in which the property of monotonicity can be established in the input data. Finally, in [12,13] and [11] some methods were presented for dealing with fuzziness and uncertainty.


Concluding Remarks 


Clearly, minimizing the size of the inferred system is 
an attractive goal. A complex system is difficult to val- 
idate, difficult to apply, and difficult to understand. On 
the other hand, a method which seeks to minimize the 
size of the system creates an inefficient process which is 
both computationally difficult and limits the method’s 
applicability due to the vast time and storage require- 
ments. In light of the success of the RA2 heuristic, 
the authors of this article are currently conducting re- 
search aimed at the development of an ‘optimal’ logical 
method which has the ability to handle missing infor- 
mation not only in the attribute values, but in the clas- 
sification of the examples as well. 

The new method works in conjunction with the 
OCAT approach. Through the application of a modified branch-and-bound (B&B) algorithm, CNF clauses are interactively generated such that the set of clauses, when taken together,
accepts all positive examples, rejects all negative exam- 
ples and neither accepts nor rejects any unclassifiable 
example. We consider in this effort three optimization 
goals: efficiency of the process, accuracy of the derived 
function and the number of clauses which comprise the 
derived function. Thus, ‘optimal’ in this sense implies
the derivation of a small (hopefully minimum) and ac- 
curate Boolean system through the efficient exploita- 
tion of information contained in unclassifiable exam- 
ples and, of course, the positive and negative examples. 

In our current research efforts, optimization becomes even more vital. By allowing missing information and unclassifiable examples, the amount of available data increases, which necessitates a learning algorithm that does not require excessive amounts of time and/or memory. In addition, the
goal of a logical approach is to derive a system capa- 
ble of accurately classifying new examples and provid- 
ing justification for the decision. A logical system de- 
rived from incomplete data may encounter an example 
which cannot be classified due to insufficient informa- 
tion. This system must be capable of explaining why the 
example is unclassifiable. That is, it has the additional 
responsibility of assisting the decision maker in iden- 
tifying the minimal amount of additional information 
required for classification of the example. In a minimal 
system, this information is more readily accessible. 
From the 1950s onwards, the search for computerized tools and mathematical models that can speed up the classification of large collections of documents has been the focus of many research efforts. These efforts have centered on developing tools that can speed up the classification of documents according to some underlying context. A current example of this situation is the
Internet. In this worldwide conglomerate of databases, 
one can easily see the speed at which documents on 
the topic, say, ‘basketball’ are retrieved from among the 
millions of documents produced daily on the Internet. 
Document classification is also of paramount impor- 
tance in many information retrieval applications, such 
as news routing [7], classification/declassification of official documents [15], e-mail filtering [27], and context
derivation of electronic meetings [3]. 

From the 1950s onwards, various fields of human knowledge have produced several solutions for the document classification problem (see, for example, [21,23], and [2]). Some examples of these fields are mathematical optimization, computational linguistics, expert systems, neural networks, and genetic algorithms. These methodologies have been limited, sometimes severely, by the huge amounts of information, both textual and graphical, generated by today’s information-driven society. On the other hand, this ‘technological’ limitation has also spurred the development of more efficient and effective classification procedures [15].

The purpose of this article is to exhibit some con- 
tributions of discrete optimization during the process 
of automatic document classification. This paper illus- 
trates these contributions by presenting three cases (ap- 
plication areas) in which optimization is used in the 
classification process. The first case deals with a generic 
procedure for the selection of a set of indexing terms 
(keywords or context descriptors). The second case deals 
with the selection of an optimal set of indexing terms to 
minimize the overlapping of keywords used in different 
documents. The last case deals with the classification of 
text documents from mutually exclusive classes. These 
three cases are only a tiny sample of a vast collection 
of related instances in the field of information retrieval 
systems; see [1,11], and [25] for additional literature. 

This article is organized as follows. The next section 
presents an overview of the document classification 
process. The subsequent section illustrates the three application areas in which optimization has contributed to the solution of the classification problem. Finally, a summary section is given.


Overview of Automatic Classification 
of Documents 


The automatic classification of text documents consists 
of grouping documents of similar context into meaningful groups in order to facilitate their storage and retrieval [22]. Text classification can be viewed as a four-step process. In the first step, a representative sample of
documents from various classes is presented to a com- 
puterized system, and a list of the co-occurring words 
with their frequencies is secured (see, for example, [22] 
and [4]). 

In the second step the frequency of the words is an- 
alyzed, and only the most ‘meaningful’ words are ex- 
tracted as indexing terms (keywords or context descrip- 
tors) [14]. The ‘meaningful’ words or keywords are the 
words with moderate co-occurring frequencies. H.P. 
Luhn [13], G. Salton [21], D. Cleveland and A.D. Cleve- 
land [4], and Ch. Fox [6] suggest the elimination of 
the ‘common’ and ‘rare’ words (i.e., frequent and in- 
frequent words, respectively) as indexing terms because 
they convey little lexical meaning. Some examples of 
common words are ‘a’, ‘an’, ‘and’, and ‘the’ [6]; ‘rare’ 
words are dependent on the document’s subject [13]. 

In the third step the context of unclassified doc- 
uments is determined by affixing them with the key- 
words that occur in their text. According to [4], ‘the 
assignment of these keywords to a document is correct 
because authors usually repeat words that conform with 
the document’s subject.’ Finally, the documents which 
were indexed with similar keywords are grouped to- 
gether [22]. 

The set of keywords attached to each document dur- 
ing the third step is often referred to as a document sur- 
rogate or just a document [4]. A surrogate is a conve- 
nient way to represent and to computationally process 
the context of real documents. For instance, the surro- 
gate of seven words { ‘document classification’, ‘doc- 
ument indexing’, ‘optimization’, ‘vector space model’, 
‘logical analysis approach’, ‘OCAT algorithm’, and ‘ma- 
chine learning’ } is a condensed and convenient way 
to represent the context of this article which contains 
thousands of words, symbols, and numbers. 
Optimization in Classifying Text Documents, Figure 1
A collection of N surrogates


Often, a surrogate is further simplified by defining it as a binary vector. (For nonbinary surrogates, see [22].) In this case, when a surrogate’s element w_ij = 1 (or 0), it indicates the presence (or absence) of keyword T_i (i = 1, ..., t) in document D_j (j = 1, ..., N). For example, the surrogate D1 = [0 1 1 1 1 0] of six binary elements indicates the presence of keywords T2, T3, T4, and T5 and the absence of keywords T1 and T6 in D1. Figure 1 shows a popular way to summarize a collection of N documents (surrogates) which are defined on t keywords [22].

In the last step, documents sharing similar key- 
words are grouped together. This classification fol- 
lows from the pairwise comparison of the surrogates in 
Fig. 1 [22]. More on this is described in the following 
section. 
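As a minimal illustration of the binary surrogates described above, the following Python sketch marks which keywords of a given list occur in each document. The keyword list, the documents, the single-word lowercase tokenization, and the helper names `binary_surrogate` and `surrogate_matrix` are hypothetical choices for illustration, not part of the cited methodology.

```python
import re


def binary_surrogate(text, keywords):
    """Return [w_1, ..., w_t] with w_i = 1 if keyword T_i occurs in the text, else 0.

    Assumes keywords are single lowercase words (a simplification for this sketch).
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if kw in tokens else 0 for kw in keywords]


def surrogate_matrix(docs, keywords):
    """One binary surrogate (row) per document D_j, one column per keyword T_i."""
    return [binary_surrogate(doc, keywords) for doc in docs]


# Hypothetical keyword list and documents.
keywords = ["basketball", "court", "score", "vitamin", "nutrition", "diet"]
docs = [
    "The basketball team left the court after a record score.",
    "A diet rich in vitamin A supports good nutrition.",
]
for row in surrogate_matrix(docs, keywords):
    print(row)
# [1, 1, 1, 0, 0, 0]
# [0, 0, 0, 1, 1, 1]
```

Pairwise comparison of such rows is what the last classification step relies on.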


Examples of Optimization 
in Document Classification 


Optimization techniques have been used in various ar- 
eas of text classification with various levels of success. 
Their utilization has been limited mainly because the 
size of the document classification problem is so large 
that even with the current computerized technologies, 
it would take a very long time to produce an optimal solution. Despite these technological limitations,
the contributions of optimization in this field can be 
seen, for example, in the selection of keywords, auto- 
matic classification of documents, automatic retrieval, 
etc. Some applications of these techniques are presented 
next. 

The first example illustrates the principle of least ef- 
fort [29]. This principle is used for the derivation of an 
indexing vocabulary based solely on the frequency of 
the co-occurring words in a collection of documents. 
The second example illustrates the application of the 
vector space model [23] for the derivation of an optimal indexing vocabulary to minimize the overlapping
of keywords used by different documents. The third 
example illustrates the utilization of a machine learn- 
ing and operations research algorithm called the one 
clause at a time (OCAT) algorithm [28] for the classi- 
fication of documents which belong to mutually exclu- 
sive classes. 


Optimization in the Principle of Least Effort 


The principle of least effort (PLE) can be viewed as one 
of the first optimization attempts in the area of document classification. It was introduced by G.K. Zipf [29].
Although the PLE does not have a strict mathematical 
formulation, the problem it solves can be stated as fol- 
lows. Given a collection of documents, the question is 
how to derive the ‘best’ set of indexing terms that will be 
used to identify the subject of documents in the collec- 
tion. The set of the best indexing terms (or keywords) 
is often referred to as indexing vocabulary (see, for ex- 
ample, [4]). Hence, the goal of the PLE is to derive an 
optimal indexing vocabulary with the most meaningful 
words occurring in these documents. 

Under the PLE an indexing vocabulary is derived 
as follows. At first all the co-occurring words and their 
frequencies are extracted from the collection of docu- 
ments. Then, these words are ranked in descending or- 
der according to their frequencies. Finally, the words 
with frequencies in between some preestablished upper 
and lower frequency limits are selected as the index- 
ing vocabulary. The frequency boundaries of the mean- 
ingful words are determined by a trial-and-error ap- 
proach [13]. Other words with co-occurring frequen- 
cies above or below the preestablished limits are known 
as ‘common’ and ‘rare’ words (the frequent and infre- 
quent words, respectively) and usually are discarded 
for indexing purposes because they convey little lexical 
meaning (see, for example, [13] and [16]). 
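The PLE selection procedure can be sketched in a few lines of Python, assuming that ‘frequency’ means the total number of occurrences of a word across the collection and that the lower and upper limits are supplied by hand (the text notes they are found by trial and error). The function name `ple_vocabulary` and the toy documents are hypothetical.

```python
import re
from collections import Counter


def ple_vocabulary(docs, lower, upper):
    """Rank words by total frequency (descending) and keep those with lower <= freq <= upper."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    ranked = counts.most_common()  # all (word, frequency) pairs, most frequent first
    return [(word, freq) for word, freq in ranked if lower <= freq <= upper]


docs = ["the cat and the dog", "the dog barked at the cat", "a dog and a bird"]
print(ple_vocabulary(docs, lower=2, upper=3))
# [('dog', 3), ('cat', 2), ('and', 2), ('a', 2)]
```

Here ‘the’ (4 occurrences) is dropped as too common and ‘barked’, ‘at’, and ‘bird’ (1 occurrence each) as too rare, while ‘and’ and ‘a’ survive, which shows that frequency limits alone do not guarantee meaningful keywords.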

It is interesting to notice here that although the 
PLE does minimize the number of keywords, its un- 
wise utilization may jeopardize the quality of the in- 
dexing vocabulary. This can be illustrated by consider- 
ing the word ‘a’. The word ‘a’ is one of the most com- 
mon words in the English language (other such words 
are ‘an’, ‘and’, and ‘the’; see, for example, [6]). However, if the collection of documents is about nutrition, then ‘a’ may represent the name of the vitamin ‘a’ or ‘A’, and its elimination would clearly jeopardize the quality of the indexing vocabulary.


Optimization in the Vector Space Model 


One of the most successful models in information re- 
trieval systems is the vector space model (VSM). It was 
introduced in the mid 1970s in [23]. The VSM addresses problems of the following nature: given a sample of documents, how can one derive an optimal indexing vocabulary such that keywords used in one document are minimally used in other documents? In [23], the VSM was also extended to determine a vocabulary that minimizes the overlapping of
keywords used in different classes. That is, keywords 
used in documents belonging to one class are minimally 
used in other classes. 

The VSM derives this optimal vocabulary as follows. 
At first, a sample of t words is taken from all the words 
co-occurring in a collection of N documents. This sam- 
ple of words is used as a candidate indexing vocabu- 
lary. Then, all documents in the collection are indexed 
(their subject is defined) by using words from this can- 
didate vocabulary. Document surrogates are formed in 
this step. Next, the VSM computes the similarity of all 
the surrogates in the collection according to 


$$
f = \sum_{j=2}^{N} \sum_{i=1}^{j-1} \operatorname{sim}(D_i, D_j) \qquad (1)
$$


where sim(D_i, D_j) measures the similarity between documents D_i and D_j. Usually, sim(D_i, D_j) is replaced by a function that relates any two vectors, such as the functions listed in Table 1. This procedure is repeated for various candidate vocabularies. Finally, the candidate vocabulary that minimizes expression (1) is selected as the optimal indexing vocabulary [23]. The following example illustrates an application of the VSM with binary surrogates, where the cosine coefficient is used to evaluate (1).




Example 1 Let the words T1, ..., T7 be the set of all the words which were found in documents D1 and D2. (In real practice, this set may contain hundreds or even thousands of words.) Next, let D1 and D2 be indexed with only four of these words. Hence, the question here is: what is the ‘optimal’ indexing vocabulary of four words such that document D1 is indexed with keywords that are minimally used by D2?

It is not difficult to see that the number of candidate vocabularies of four words that can be formed out of seven words is $\binom{7}{4} = 35$. Table 2 shows the similarities between D1 and D2 for only three vocabularies. The first column of this table shows these vocabularies. For example, words T1, T2, T4, and T7 correspond to the first vocabulary. The second and third columns show the binary surrogates of documents D1 and D2. For instance, the surrogate D1 = [1 0 1 1] indicates the absence of word T2 and the presence of words T1, T4, and T7 in D1. Similarly, the surrogate D2 = [0 0 1 1] indicates the absence of T1 and T2 and the presence of words T4 and T7 in D2. Finally, the fourth column shows the CC similarity values, sim(D1, D2), between D1 and D2 for the three vocabularies.

The CC similarity values in Table 2 indicate that when words T3, T4, T5, and T6 are used as indexing terms, the similarity of documents D1 and D2 is minimal. Furthermore, the similarity value sim(D1, D2) = 0.00 indicates that the two documents are completely different with respect to this vocabulary, because their surrogates do not contain common words. Therefore, according to the VSM the optimal indexing vocabulary corresponds to terms T3, T4, T5, and T6. Thus, the other words T1, T2, and T7 can be discarded as indexing terms.
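The exhaustive search behind Example 1 can be sketched as follows, assuming binary surrogates, the binary cosine coefficient of Table 1, and the objective f of (1) summed over all document pairs. The word sets of the two documents are hypothetical (the example does not list them in full), so the values produced are not those of Table 2. Vocabularies that leave some document with an empty surrogate are skipped, since such a document would not be indexed at all; this filter is an assumption of the sketch.

```python
from itertools import combinations
from math import sqrt


def cosine_binary(x, y):
    """Binary cosine coefficient |X ∩ Y| / (|X|^(1/2) |Y|^(1/2)); both vectors must be non-empty."""
    common = sum(a & b for a, b in zip(x, y))
    return common / sqrt(sum(x) * sum(y))


def objective(vocab, docs):
    """f of (1): sum of sim(D_i, D_j) over all pairs i < j, using only the words in vocab."""
    s = [[1 if w in d else 0 for w in vocab] for d in docs]
    return sum(cosine_binary(s[i], s[j]) for j in range(1, len(s)) for i in range(j))


def best_vocabulary(words, docs, size):
    """Try every vocabulary of the given size; return (f, vocabulary) with the smallest f."""
    feasible = (v for v in combinations(words, size)
                if all(any(w in d for w in v) for d in docs))  # every document gets indexed
    return min((objective(v, docs), v) for v in feasible)


words = [f"T{k}" for k in range(1, 8)]
D1 = {"T1", "T3", "T4", "T7"}  # hypothetical word sets
D2 = {"T2", "T5", "T7"}
print(best_vocabulary(words, [D1, D2], 4))  # (0.0, ('T1', 'T2', 'T3', 'T4'))
```

Any four-word vocabulary that omits the only shared word T7 (while still indexing both documents) gives f = 0 here; `min` then returns the lexicographically first one. Enumerating all $\binom{t}{k}$ candidates is only practical for very small t.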

The solution presented in this table seems to be a trivial one. However, for realistic indexing problems finding such a solution can be quite elaborate. Suppose, for example, that the number of words extracted from D1 and D2 is not seven but fifty (which is still a small number for realistic situations). Furthermore, if one is now interested in candidate vocabularies of ten words rather than four, then the number of vocabularies that can be constructed is $\binom{50}{10} \approx 10.27$ billion. That is, in addition to minimizing (1), the VSM must also use an efficient search strategy to quickly eliminate many nonoptimal vocabularies. By the same token, if vocabularies of all sizes are considered, then the number of such vocabularies is $2^t - 1$ (where t is the number of extracted words), which is prohibitively large even for moderate values of t.
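The counts quoted above are easy to verify with Python’s standard `math.comb` (Python 3.8 or later); the helper name `vocabulary_counts` is hypothetical.

```python
from math import comb


def vocabulary_counts(t, k):
    """Return (number of k-word vocabularies from t words, number of non-empty vocabularies)."""
    return comb(t, k), 2 ** t - 1


print(vocabulary_counts(7, 4))    # (35, 127)
print(vocabulary_counts(50, 10))  # (10272278170, 1125899906842623)
```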

As a result of these humongous search spaces, researchers have been compelled to design fast heuristic search strategies at the expense of optimality. Furthermore, because the size of the indexing problem can be very large, suboptimal heuristic solutions have been preferred over optimal but slow ones [2].

Optimization in Classifying Text Documents, Table 1
Measures of vector similarity (taken from [22, Chap. 10])

| Similarity measure sim(X, Y) | Evaluation for binary term vectors | Evaluation for weighted term vectors |
|---|---|---|
| Inner product (IP) | $\lvert X \cap Y \rvert$ | $\sum_{i=1}^{t} x_i y_i$ |
| Dice coefficient (DC) | $\dfrac{2\,\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert}$ | $\dfrac{2\sum_{i=1}^{t} x_i y_i}{\sum_{i=1}^{t} x_i^2 + \sum_{i=1}^{t} y_i^2}$ |
| Cosine coefficient (CC) | $\dfrac{\lvert X \cap Y \rvert}{\lvert X \rvert^{1/2}\,\lvert Y \rvert^{1/2}}$ | $\dfrac{\sum_{i=1}^{t} x_i y_i}{\sqrt{\sum_{i=1}^{t} x_i^2 \cdot \sum_{i=1}^{t} y_i^2}}$ |
| Jaccard coefficient (JC) | $\dfrac{\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert - \lvert X \cap Y \rvert}$ | $\dfrac{\sum_{i=1}^{t} x_i y_i}{\sum_{i=1}^{t} x_i^2 + \sum_{i=1}^{t} y_i^2 - \sum_{i=1}^{t} x_i y_i}$ |

$\lvert X \rvert$ = number of terms in X.
$\lvert X \cap Y \rvert$ = number of terms appearing jointly in X and Y.

Optimization in Classifying Text Documents, Table 2
Similarity sim(D1, D2) for three candidate vocabularies of four words

| Vocabulary | Surrogate D1 | Surrogate D2 | sim(D1, D2) |
|---|---|---|---|
| T1, T2, T4, T7 | [1 0 1 1] | [0 0 1 1] | 0.50 (*) |
| … | [0 1 0 0] | [0 1 1 0] | 0.25 |
| T3, T4, T5, T6 | [0 1 0 0] | [0 0 1 1] | 0.00 |

(*) sim(D1, D2) = $2/(4^{1/2} \times 4^{1/2}) = 0.50$.
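The four measures of Table 1 can be written directly in Python for weighted term vectors. For 0/1 vectors, Σ x_i y_i equals |X ∩ Y| and Σ x_i² equals |X|, so the same functions also reproduce the binary column. Returning 0.0 when a denominator vanishes is an assumption of this sketch, not part of the table.

```python
from math import sqrt


def inner_product(x, y):
    """IP: sum of x_i * y_i (equals |X ∩ Y| for 0/1 vectors)."""
    return sum(a * b for a, b in zip(x, y))


def dice(x, y):
    """DC: 2 * IP / (sum x_i^2 + sum y_i^2)."""
    denom = inner_product(x, x) + inner_product(y, y)
    return 2 * inner_product(x, y) / denom if denom else 0.0


def cosine(x, y):
    """CC: IP / sqrt(sum x_i^2 * sum y_i^2)."""
    denom = sqrt(inner_product(x, x) * inner_product(y, y))
    return inner_product(x, y) / denom if denom else 0.0


def jaccard(x, y):
    """JC: IP / (sum x_i^2 + sum y_i^2 - IP)."""
    xy = inner_product(x, y)
    denom = inner_product(x, x) + inner_product(y, y) - xy
    return xy / denom if denom else 0.0
```

For example, `cosine([1, 1, 0, 1], [1, 0, 1, 1])` returns 2/3, since the two surrogates share two of their three terms each.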


Optimization in the Classification 
of Text Documents 


The process of document classification consists of 
grouping documents according to their underlying sub- 
ject. Some examples of familiar document subjects are 
history, geography, music, engineering, etc. Usually, 
this underlying subject is determined by the set of in- 
dexing terms which was attached to each document. 
That is, documents about history share indexing terms
whose content describes past events. Similarly, the in- 


dexing terms of documents about geography share con- 
tent which describes different geographic places on 
earth. Hence, the problem that this process of docu- 
ment classification attempts to solve is described as fol- 
lows: Given samples of preclassified documents (i. e., 
their surrogates), the question is how to use the infor- 
mation contained in their surrogates such that new un- 
classified documents can be grouped into the appropri- 
ate classes. 

There are many methodologies for solving this doc- 
ument classification problem. Some examples of such 
methodologies are: the vector space model for doc- 
ument classification [22]; fuzzy set theories [12] and 
[17]; semantic analysis methodologies [10] and [19]; 
and some others which use artificial intelligence approaches [3] and [2]. To some extent all these methodologies use optimization in order to maximize (or minimize) some performance measure, which usually is the
similarity between indexing terms. In what follows, we 
present only one methodology which is based on arti- 
ficial intelligence and operations research approaches. 
This methodology is the one clause at a time (OCAT) 
algorithm [28] for the classification of examples (e. g., 
documents) in mutually exclusive classes. The OCAT 
algorithm uses optimization methodologies for con- 
structing classification clauses (e.g., word patterns) of 
minimal (or near minimal) size. Figure 2 shows this al- 
gorithm. 
Input: E+ and E−
Output: Logical rules in CNF (or DNF) form.
i = 0; C = ∅;
DO WHILE (E− ≠ ∅)
    1: i ← i + 1;   /* i indicates the ith iteration */
    2: find a clause c_i which accepts all members of E+ while it rejects as many members of E− as possible;
    3: let E−(c_i) be the set of members of E− which are rejected by c_i;
    4: let C ← C ∧ c_i;
    5: let E− ← E− − E−(c_i);
END;

Optimization in Classifying Text Documents, Figure 2
The one clause at a time (OCAT) algorithm


The OCAT algorithm is also a machine learning al- 
gorithm. It uses logical analysis and branch and bound 
approaches to extract knowledge (sets of rules) from 
sets of preclassified examples. It takes as input data 
samples of examples from (usually two) mutually exclu- 
sive classes and extracts knowledge that is represented 
in a compact form of key data patterns which can be 
used to classify new unclassified examples into these 
two classes. 

The two mutually exclusive classes are referred to as the sets of positive and negative examples (denoted by E+ and E−, respectively). Furthermore, the examples in both classes are defined over the same set of parameters (also called atoms, characteristics, or factors), which are assumed to be binary valued. Figure 3 illustrates a set of four positive examples, e1, e2, e3, e4, and a set of six negative examples, e5, e6, e7, e8, e9, e10. All ten examples are defined on the four atoms A1, A2, A3, and A4. For instance, example e1 = [0 1 0 0] indicates the presence of atom A2 and the absence of atoms A1, A3, and A4 in e1. On the other hand, example e5 = [1 0 1 0] indicates that atoms A1 and A3 are present and that atoms A2 and A4 are absent.

When the OCAT algorithm is used to solve the document classification problem, E+ and E− (i.e., the sets with the positive and negative examples, respectively) correspond to the sets of documents which belong to two mutually exclusive classes. That is, the documents in the positive class are the ones that belong to one of the two classes, while the documents in the other class are the negative examples. Hence, Fig. 3 may represent a set of ten documents (surrogates) which were indexed by using four keywords.

E+ =
    e1  [0 1 0 0]
    e2  [1 1 0 0]
    e3  [0 0 1 1]
    e4  [1 0 0 1]

E− =
    e5  [1 0 1 0]
    e6  [0 0 0 1]
    e7  [1 1 1 1]
    e8  [0 0 0 0]
    e9  [1 0 0 0]
    e10 [1 1 1 0]

Optimization in Classifying Text Documents, Figure 3
Two illustrative sets of positive and negative examples

The OCAT algorithm is a greedy algorithm which 
determines a set of compact clauses either in the con- 
junctive normal form or disjunctive normal form (CNF 
or DNF, respectively, as defined below). For example, 
CNF clauses are determined as follows. In the first iteration it determines a clause that accepts all the examples in E+ while it rejects as many examples in E− as possible. In the second iteration it performs the same operation using the original E+ set, but this time the current E− set contains only the negative examples that have not been rejected by any of the previous clauses. The iterations continue until the set of constructed clauses rejects all the negative examples. Hence, when these CNF
or DNF clauses are taken together, they accept all the 
positive examples while they reject all the negative ex- 
amples. 
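The OCAT loop of Figure 2 can be sketched in Python as follows. The outer loop follows the figure; step 2, however, is carried out with a simple greedy rule (a swapped-in heuristic, neither the branch-and-bound procedure of [28] nor the RA1 heuristic), so the clauses are valid but not guaranteed to be minimal. To ensure progress, each new clause is built only from literals that are false on the first remaining negative example, which is therefore always rejected. The literal encoding and all function names are assumptions of this sketch.

```python
# A literal is a pair (i, v): it is true on example e iff e[i] == v
# (v = 1 stands for atom A_{i+1}, v = 0 for its negation).

def satisfies(clause, example):
    """A clause (disjunction of literals) accepts an example if any literal is true on it."""
    return any(example[i] == v for i, v in clause)


def count_true(literal, examples):
    """Number of examples on which the literal (i, v) is true."""
    i, v = literal
    return sum(1 for e in examples if e[i] == v)


def find_clause(pos, neg):
    """Greedy step 2: accept every positive example and certainly reject neg[0]."""
    target = neg[0]
    # Literals false on the target: A_i if target[i] == 0, otherwise not-A_i.
    candidates = [(i, 1 - bit) for i, bit in enumerate(target)]
    clause, uncovered = [], list(pos)
    while uncovered:
        # Prefer literals that accept many uncovered positives and few negatives.
        best = max(candidates,
                   key=lambda lit: count_true(lit, uncovered) / (1 + count_true(lit, neg)))
        if count_true(best, uncovered) == 0:
            raise ValueError("a positive example coincides with a negative one")
        clause.append(best)
        uncovered = [e for e in uncovered if not satisfies([best], e)]
    return clause


def ocat(pos, neg):
    """Outer loop of Figure 2: add clauses until every negative example is rejected."""
    cnf, remaining = [], list(neg)
    while remaining:
        clause = find_clause(pos, remaining)
        cnf.append(clause)
        # Keep only the negatives still accepted, i.e. E- minus E-(c_i).
        remaining = [e for e in remaining if satisfies(clause, e)]
    return cnf


def classify(cnf, example):
    """The CNF system accepts an example iff every clause accepts it."""
    return all(satisfies(clause, example) for clause in cnf)


E_pos = [[0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
E_neg = [[1, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
print(ocat(E_pos, E_neg))
```

On the examples of Fig. 3 this greedy rule returns three clauses, (¬A1 ∨ ¬A3) ∧ (A1 ∨ A2 ∨ A3) ∧ (¬A1 ∨ A2 ∨ A4), which accept e1 to e4 and reject e5 to e10. Different clause-selection rules generally yield different, equally consistent, CNF systems.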

The conjunctive normal form and disjunctive normal form (see, for example, [24]) are defined as in expressions (2) and (3), respectively:

$$
\bigwedge_{j=1}^{k} \Big( \bigvee_{i \in \rho_j} a_i \Big) \qquad (2)
$$

$$
\bigvee_{j=1}^{k} \Big( \bigwedge_{i \in \rho_j} a_i \Big) \qquad (3)
$$

where each a_i is either A_i or its negation ¬A_i, and each ρ_j is a subset of {1, ..., t}. Thus, a CNF expression over a vector v ∈ {0, 1}^t is a conjunction of disjunctions (each disjunction being a logical clause) defined on the terms A_i (i = 1, ..., t). Similarly, a DNF expression is a disjunction of conjunctions on the same terms A_i.

Let n be the number of atoms and M1 the number of positive examples. It can be easily shown that the maximum number of clauses that can be formed using n atoms and M1 examples is equal to $n^{M_1}$ [28]. To form these clauses, consider the first example e1 = [0 1 0 0]. It can be observed that in order to accept this positive example, at least one of the four atoms A1, A2, A3, A4 must be specified as follows: (A1 = FALSE, i.e., ¬A1 = TRUE), (A2 = TRUE), (A3 = FALSE, i.e., ¬A3 = TRUE), or (A4 = FALSE, i.e., ¬A4 = TRUE). Hence, any valid CNF clause must include ¬A1, or A2, or ¬A3, or ¬A4. Similarly, the second positive example e2 = [1 1 0 0] indicates that any valid CNF clause must include A1, or A2, or ¬A3, or ¬A4. In this manner, all valid CNF clauses must include at least one literal from each of the following sets: {¬A1, A2, ¬A3, ¬A4}, {A1, A2, ¬A3, ¬A4}, {¬A1, ¬A2, A3, A4}, and {A1, ¬A2, ¬A3, A4}. Relation (4) shows a CNF system which was derived by using the OCAT algorithm on the examples in Fig. 3:

$$
(A_2 \lor A_4) \land (\lnot A_2 \lor \lnot A_3) \land (A_1 \lor A_3 \lor \lnot A_4) \qquad (4)
$$


Example 2 An application of the OCAT algorithm can be illustrated by using a new example, say, e11 = [0 0 1 0]. When e11 is ‘tested’ by the above CNF expression, it is classified as a negative example. This can be seen as follows. The clause A2 ∨ A4 evaluates to 0 because e11 contains neither the second nor the fourth atom. On the other hand, both clauses ¬A2 ∨ ¬A3 and A1 ∨ A3 ∨ ¬A4 evaluate to 1. However, when the three clauses are taken together, expression (4) evaluates to 0, thus indicating that e11 is a negative example.
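Example 2 can be checked mechanically. The sketch below assumes that expression (4) reads (A2 ∨ A4) ∧ (¬A2 ∨ ¬A3) ∧ (A1 ∨ A3 ∨ ¬A4), a reading that is consistent with all ten examples of Fig. 3 (it accepts e1 to e4 and rejects e5 to e10); the function name `cnf4` is hypothetical.

```python
def cnf4(e):
    """Evaluate the three clauses of (4) on example e = [A1, A2, A3, A4] (0/1 values)."""
    a1, a2, a3, a4 = (bool(bit) for bit in e)
    clauses = [a2 or a4, (not a2) or (not a3), a1 or a3 or (not a4)]
    return clauses, all(clauses)


e11 = [0, 0, 1, 0]
print(cnf4(e11))  # ([False, True, True], False): e11 is classified as negative
```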


Conclusions and Future Research 


This article illustrated some contributions of optimization to solving the document classification problem. These contributions were illustrated by presenting three cases (application areas) in which optimization has been used. The first case dealt with the principle of least effort (PLE), which is used for the selection of an indexing vocabulary based solely on the frequency
of the co-occurring words. The second case dealt with 


the vector space model (VSM) for the selection of an 
indexing vocabulary that minimizes the overlapping of 
words used in various documents (or in various docu- 
ment classes). The third case illustrated the one clause 
at a time (OCAT) algorithm for the classification of 
documents into mutually exclusive classes. 

A common characteristic of these three cases is the 
huge amounts of information that need to be processed 
before optimal solutions can be found. Therefore, the 
optimization techniques presented in these examples 
have been used extensively only on document classifi- 
cation problems of small size. The main reason for this 
limitation is that even with the current computerized 
technologies, these techniques would take unacceptable 
processing times to find optimal solutions for larger 
classification problems. As a consequence, scientific research efforts have focused on developing effective and efficient heuristics for solving problems of
more realistic size. 
The engine is an essential part of a motor car but it is
useless if it is not embedded in a car body with chairs, 
windshields, tyres, brakes, etc. Similarly, an optimiza- 
tion algorithm is useless if it is not embedded in an ap- 
propriate model and if it is not linked to the outside 
world via a system that helps the decision makers to use 
the data and the model. 

Optimization systems may be found at all levels of 
decision making: at the level of operational planning 
with a short-term horizon (hours or days), such as the 
scheduling of the production of corrugated cardboard, 
glass, iron, and steel, the operational control of a water- 
management system, and the load dispatching in elec- 
tricity generation; at the level of tactical planning with 
a medium-term horizon (months), such as the capacity 
planning of assembly-line production; and at the level 
of strategic planning with a long-term horizon (years), 
such as the choice of a strategy for the national energy 
supply or the design of the infrastructure of a water-management system. In what follows we shall describe
some of these systems in more detail. 

In the voluminous literature on linear programming 
applications one constantly comes across references to 
the cutting-stock problem or the trimloss problem in 
the corrugated-cardboard industry, where long strips 
or rectangular plates of material are cut into smaller 
rectangles. The objective is to set the circular and the 
guillotine cutters in such a way that the cardboard pro- 
duction machine supplies the required number of rect- 
angular plates for each client with a minimum of waste. 
Real-life conditions on the factory floor impose addi- 
tional requirements, such as the reduction of the set- 
up costs due to the resetting of the cutters, which can- 
not be formulated in pure linear programming terms. 
They can be allowed for in a mixed integer program- 
ming model but this tends to become intractable for ex- 
isting numerical methods because the number of zero- 
one variables may be large. A more successful approach 
is the generation of a number of promising cutting pat- 
terns followed by the selection of a small number of 
patterns for the actual production of corrugated cardboard. This is a pure integer programming formulation of the trimloss problem. An additional complication is that the format of the rectangular plates is not necessarily a hard constraint. Under certain conditions the planners may deviate from the ordered dimensions, and also from the required number of plates. This flexibility greatly helps them to reduce the losses of time and material as much as possible. An interesting feature of the systems for the planning of cardboard production is that they not only reduce the losses but also contribute to a smooth processing of orders, planning lists, deliveries, and invoices. Thus, these systems not only tend to control the actual production but also support the administrative processes in the factory.
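The pattern-generation step mentioned above can be illustrated by a deliberately simplified one-dimensional version: strips of a single roll width are cut into ordered widths, and every cutting pattern whose trim loss stays under a threshold is listed. The roll width, ordered widths, threshold, and the function name `cutting_patterns` are hypothetical, and the brute-force enumeration is only suitable for toy sizes; the subsequent selection of a small number of patterns (the integer program mentioned in the text) is not shown.

```python
from itertools import product


def cutting_patterns(roll_width, widths, max_trim):
    """List (counts, trim) for every pattern using at most roll_width with trim <= max_trim.

    counts[k] is how many strips of widths[k] the pattern cuts; results are sorted by trim.
    """
    ranges = [range(roll_width // w + 1) for w in widths]
    patterns = []
    for counts in product(*ranges):
        used = sum(c * w for c, w in zip(counts, widths))
        if used <= roll_width and roll_width - used <= max_trim:
            patterns.append((counts, roll_width - used))
    return sorted(patterns, key=lambda p: p[1])


print(cutting_patterns(100, [45, 30, 20], max_trim=5))
# [((0, 0, 5), 0), ((0, 2, 2), 0), ((1, 1, 1), 5)]
```

With these numbers, two patterns waste nothing (five strips of 20, or two of 30 plus two of 20) and one wastes 5 units (one strip each of 45, 30, and 20).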

Electricity generation is subject to the iron law that 
demand has to be satisfied, as and when it occurs. Un- 
der these constraints electricity companies usually pur- 
sue the economic objective of minimizing the fuel costs. 
The calculation of the cheapest production schedule is 
divided into several parts, each with its own time hori- 
zon. One of these parts is the static dispatch problem, 
the allocation of the electricity demand predicted for 
the next fifteen or sixty minutes to the available power 


units. Another part is the unit-commitment problem, 
the daily selection of the units for actual production. In its mathematical form the dispatch problem can be written as the minimization of a nonlinear fuel-cost function subject to a linear demand constraint, with upper and lower bounds imposed on the variables.
In general, the fuel-cost function can be simplified and 
written as a convex, separable, quadratic function of the 
loads assigned to the respective power units. The dis- 
patch problem is accordingly a quadratic continuous 
knapsack problem which, in its primal form, can ef- 
ficiently be solved via gradient methods for nonlinear 
optimization. A dramatic acceleration occurs when the 
dispatch problem is considered in its dual form. In each 
step, after the unconstrained minimization of the asso- 
ciated Lagrangian function, one can immediately find 
a number of variables which are at their optimal val- 
ues so that they can be removed from the problem for- 
mulation. This rapidly reduces the size of the dispatch 
problem [1]. Moreover, it implies that the dispatch
problem can efficiently and reliably be solved, even if 
there are hundreds of variables. This problem size is 
normal in the electricity generation for a country of 15 
or 20 million inhabitants. 
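The dual viewpoint can be illustrated with the classical λ-iteration for the quadratic dispatch problem: minimize Σ (a_i p_i + b_i p_i²) subject to Σ p_i = D and l_i ≤ p_i ≤ u_i, with b_i > 0. For a fixed multiplier λ the Lagrangian separates per unit, and each load is (λ − a_i)/(2 b_i) clipped to its bounds; λ is then found by bisection so that the loads meet the demand. This is a swapped-in textbook sketch with hypothetical cost coefficients; it shares the dual approach but not the variable-elimination step of [1].

```python
def dispatch(units, demand, iterations=100):
    """Economic dispatch by bisection on the multiplier lam.

    units: list of (a, b, lo, hi) for cost a*p + b*p**2 with b > 0 and lo <= p <= hi.
    Returns the loads p minimizing total cost subject to sum(p) == demand.
    """
    def loads(lam):
        # Minimizer of a*p + b*p**2 - lam*p over [lo, hi] for each unit.
        return [min(hi, max(lo, (lam - a) / (2 * b))) for a, b, lo, hi in units]

    if not sum(u[2] for u in units) <= demand <= sum(u[3] for u in units):
        raise ValueError("demand cannot be met within the unit bounds")
    # At lam_lo every unit sits at its lower bound, at lam_hi at its upper bound.
    lam_lo = min(a + 2 * b * lo for a, b, lo, hi in units)
    lam_hi = max(a + 2 * b * hi for a, b, lo, hi in units)
    for _ in range(iterations):  # total load is nondecreasing in lam
        lam = (lam_lo + lam_hi) / 2
        if sum(loads(lam)) < demand:
            lam_lo = lam
        else:
            lam_hi = lam
    return loads((lam_lo + lam_hi) / 2)


# Two hypothetical units: costs 10p + 0.5p^2 and 12p + 0.25p^2, bounds [0, 100].
print(dispatch([(10, 0.5, 0, 100), (12, 0.25, 0, 100)], demand=60))
```

At the optimum both units run at equal marginal cost a_i + 2 b_i p_i = λ (here λ = 94/3, giving loads of about 21.33 and 38.67), unless a unit is held at one of its bounds.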

After the oil crisis of the early 1970s many countries 
stimulated the development of so-called energy mod- 
els which could be used to analyze long-term strate- 
gies for the national energy supply via linear program- 
ming. Such a model has three parts at a high aggre- 
gation level: winning and/or import of primary en- 
ergy carriers like crude oil, natural gas, and coal; con- 
version (refineries and power plants) into secondary 
energy carriers like petrol and electricity; and con- 
sumption in various sectors of the national economy 
like heavy industry, light industry, transportation, and 
households. Because there is a flow of materials from 
winning/import to consumption, the linear program- 
ming model is largely a network flow model. In the 
first and the second part there are certain limits to be 
set by the decision makers: upper limits on import and 
winning, and upper limits on the nuclear capacity for 
electricity generation, for instance. In the third part the 
decision makers can introduce the scenario-dependent 
projections of the future energy demand. In practice, 
since the supply of oil and natural gas is in the hands 
of large multi-national companies, these models are 
mainly concerned with strategies for the national electricity supply. For a prespecified year, using as input the
projection of the electricity demand in various sectors 
of the economy, the projected prices of primary energy 
carriers, the estimates of the investment costs of new 
energy technologies, and the estimated supply restric- 
tions, capacity limitations, and conversion efficiencies, 
the linear programming model yields an optimal mix 
of secondary energy carriers and technologies. An ob- 
vious extension of an energy model is the incorporation 
of multiple objective functions representing the policy 
objectives:

a) to minimize the environmental damage;

b) to maximize

- the economic benefits;

- the safety of the energy system;

- the strategic independence of the country.

The analysis of the model usually shows that the ex- 
ploitation of nuclear energy is strongly in conflict with 
the exploitation of fossil fuels. Nuclear energy is felt 
to reduce the safety of the energy system, but it also 
reduces the environmental damage of CO₂ and NOₓ
emissions, and it enhances the strategic independence 
by the reduction of crude-oil imports. 

In the 1980s energy modeling strongly contributed 
to the popularity of scenario analysis. It became fash- 
ionable to explore the future and to judge various long- 
term strategies for the national electricity generation 
within the framework of a number of scenarios. The 
increasing electricity demand could be covered by an 
increased nuclear capacity, by the construction of new 
and more powerful fossil-fuel plants, and/or by interna- 
tional cooperation and diversification. Scenarios were 
designed as follows. First, the factors were identified 
which would affect the economic, political, and so- 
cial developments until a predetermined horizon and 
which were beyond the decision maker’s control. Be- 
cause these factors are mostly interconnected, the hy- 
pothetical future developments could be bundled in 
a small number of coherent scenarios. Strategic plan- 
ners designed a trend-following business-as-usual sce- 
nario where the current developments were smoothly 
continued, an optimistic scenario where the develop- 
ments proceeded in an upward direction, and a pes- 
simistic scenario with a less upward or even down- 
ward direction of the future developments. Each strat- 
egy had certain desirable and/or undesirable conse- 
quences within the context of the respective scenarios. 


These consequences could be evaluated via the anal- 
ysis of linear programming energy models, at least to 
some extent. In choosing a strategy, the decision mak- 
ers would have to weigh the consequences and to take 
into account the scenario probabilities. This compli- 
cated process was usually prepared in a network of pol- 
icy committees and advisory councils who were ex- 
pected to suggest a preferred strategy on the basis of 
a national energy outlook [2] with a horizon of ten or 
twenty years. 

The decision support systems just described have 
been used for decades. The systems for operational 
planning support the short-term decisions of the plan- 
ners who work in situations where the rules and the 
data are fairly well-known. Nevertheless, even in short- 
term planning the objectives and the constraints are 
not necessarily hard. The planners in a corrugated- 
cardboard factory, for instance, have to find a compro- 
mise between the objective to minimize the losses of 
material and the objective to smooth the administrative 
processes on the factory floor. The demand constraints 
may also be relaxed. The planners may deviate from the 
required dimensions, and they may ask the clients, ei- 
ther to order a larger quantity of rectangular plates now, 
or to postpone the delivery to the next planning period, 
because the order can nicely be combined with the or- 
der of another regular client. 

In the systems for strategic planning, not only the 
rules and the data are imprecise, but also the users are 
difficult to identify. Strategic planning is largely a dis- 
tributed decision-making process wherein many actors 
are involved: members of policy committees and advi- 
sory councils, managers of energy-producing compa- 
nies, representatives of trade unions, members of pres- 
sure groups, the press, etc. They all have contradictory 
views and conflicting objectives in the energy sector, 
and they have widely varying power positions. In the 1980s their views were analyzed via multicriteria decision analysis and multi-objective optimization [3]. The study clearly identified the critical issues in the energy debate: safety and environmental protection. The proposed compromise solution to the long-term energy supply problem contained a significant contribution of nuclear energy, and for that reason it was ignored after the Chernobyl disaster.

In the late 1980s the study had to be taken up again 
because the choice of primary energy carriers cannot 
be postponed indefinitely, certainly not when obsolete
power plants have to be redesigned and replaced within 
the next planning period [5]. There were several op- 
tions: new nuclear power plants, or larger and more 
sophisticated fossil-fuel plants using oil, coal, or nat- 
ural gas. The objectives of the electricity companies 
were clear. The two small nuclear power plants in the 
country covered some 7% of the national demand, and 
the companies were convinced that the nuclear option 
should be increased. However, any proposal to start the 
design of a new nuclear plant would be rejected by the 
Ministry of Economic Affairs, since it would be politically infeasible. Hence, in order to avoid a negative decision
that would be irreversible for a period of ten or twenty 
years, the electricity companies proposed to cover the 
peak demand in the next planning period via special 
contracts with foreign companies. This strategy would 
enable them to delay the design of nuclear power plants 
beyond the planning horizon of ten years. The compro- 
mise appeared to be acceptable, at least in the late eight- 
ies, and even today the consequences do not constitute 
an issue for the political parties and the environmental groups. The domestic nuclear capacity will be closed down within a few years, but through electricity imports from neighboring countries the national economy will remain dependent on nuclear power, even more than in the 1980s.

We have briefly summarized the course of events in 
order to sketch the fortunes of a decision support sys- 
tem in a field full of controversies. The system cannot 
dictate a solution, but it can easily identify the conflict- 
ing views of the actors and the relative importance of 
their arguments. It makes the criteria operational be- 
cause it shows the effects of the weights assigned to 
them. 

Particular attention must be given to the interface 
with the decision makers. Many decisions are made 
in groups: in boards, councils, committees, where the 
members have contradictory views, opinions, aspira- 
tions, and power positions. Single-objective optimiza- 
tion models therefore tend to be inadequate. A realis- 
tic model has multiple objective functions, and usually 
these functions have different weights for the decision 
makers. In fact, multi-objective optimization has two 
subfields: the identification of the nondominated so- 
lutions, and the selection of a nondominated solution 
where the objective functions are felt to be in a proper 


balance. The first-named subfield can be studied in the 
splendid isolation of mathematical research. The sec- 
ond subfield, however, straddles the boundary between 
mathematics and other disciplines because human sub- 
jectivity is an integral part of the selection process. One 
cannot just leave it to an optimization expert to formu- 
late a model and to calculate an acceptable nondomi- 
nated solution. At various stages the expert has to in- 
terrupt the computational process in order to fathom 
the preferences of the decision makers. Certain param- 
eters (weights, targets, desired levels) are adjusted on 
the basis of new preference information, whereafter the 
computations proceed in a somewhat modified direc- 
tion. It is not always clear, however, how the param- 
eters should be adjusted in order to guarantee a rapid 
convergence towards an acceptable compromise. This 
is crucial because decision makers cannot spend much 
time on a particular decision problem. It is an illusion 
to think that many interruptions would be possible to 
elicit preference information. One or two sessions of 
the decision-making body, communication via the mail 
and the telephone in the time intervals between inter- 
ruptions, and that’s all. 
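The two subfields can be illustrated with a toy Python sketch. The alternatives and their scores are hypothetical, not taken from the study discussed above: each alternative is scored on two objectives to be minimized, the nondominated alternatives are filtered first, and a weighted sum then selects one of them, so that changing the weights shows which compromise each weighting favours. A weighted sum is only one of several possible selection rules and is used here for brevity.

```python
def dominates(p, q):
    """True if p is at least as good as q in every objective (all minimized)
    and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def nondominated(points):
    """First subfield: keep the alternatives that no other alternative dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_choice(points, weights):
    """Second subfield (one simple rule): minimize a weighted sum of objectives."""
    return min(points, key=lambda p: sum(w * v for w, v in zip(weights, p)))


if __name__ == "__main__":
    # Hypothetical alternatives scored as (cost, environmental damage).
    options = [(1, 5), (2, 3), (4, 1), (3, 4)]
    front = nondominated(options)  # (3, 4) is dominated by (2, 3)
    for weights in [(1, 0), (1, 1), (0, 1)]:
        print(weights, weighted_choice(front, weights))
```

The three weightings select (1, 5), (2, 3) and (4, 1) respectively: the mathematics only delivers the front, while the choice along it depends on the weights the decision makers supply.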

In the 1980s, experts could work with question- 
naires to fathom the preferential feelings in a group. 
The decision makers leisurely answered the questions 
in their office or at home, and returned the responses 
via the mail. In the next session of the group the cal- 
culated results were available for discussion [3]. To- 
day, however, information technology provides sophis- 
ticated facilities for group decision making. Group De- 
cision Rooms with networked PCs and a public screen 
for electronic brainstorming and weighted voting are 
commercially available. The sessions in a GDR have 
more impact than the questionnaires. First, the tech- 
nology of the GDR eliminates the advantages of cer- 
tain discussion techniques. The group members with 
strong verbal skills who usually dominate a meeting 
lose their grip on the silent majority as soon as the but- 
tons are to be pressed. Second, the anonymous brain- 
storming and voting procedures promote an egalitar- 
ian attitude in the group (this may be a stumbling block 
in authoritarian cultures where decisions are deferred 
to the boss). GDR sessions create a certain commitment among the participants, possibly due to the intense communication, so that the decisions cannot easily be reversed thereafter. In summary, one may observe a power shift in a GDR which strongly affects the choice of a strategy [4].
The piecewise sequential quadratic programming
(PSQP) method is a numerical method for solving 
certain mathematical programs with equilibrium con- 
straints (MPEC), based on the classical sequential 
quadratic programming (SQP) method for nonlinear 
programming (NLP) problems [2,12]. This description 
draws on both [9] and [6], which extend the original 
proposal for PSQP [13] that was restricted to the case of 
MPEC with linear complementarity constraints. See [7] 
for a brief account of an application of PSQP to a prob- 
lem in civil engineering. Its performance on randomly 
generated quadratic programs with affine equilibrium 
constraints is documented in [6] and also in [9,10]. 

PSQP can be applied directly to any MPEC whose lower-level problem is a mixed complementarity problem, and indirectly to any MPEC whose lower-level problem belongs to the class of variational inequalities (VI) (cf. also ► Variational inequalities) that can be written via their Karush-Kuhn-Tucker (KKT) conditions.

The formulation of the KKT-constrained MP is

$$
\begin{aligned}
\min_{x,y}\quad & f(x,y)\\
\text{s.t.}\quad & G(x,y)\le 0,\\
& F(x,y)+\sum_{i=1}^{\ell}\lambda_i\,\nabla_y g_i(x,y)=0,\\
& \lambda\ge 0,\quad g(x,y)\le 0,\quad \lambda^\top g(x,y)=0,
\end{aligned}
\tag{1}
$$

where $f:\mathbb{R}^{n+m}\to\mathbb{R}$, $G:\mathbb{R}^{n+m}\to\mathbb{R}^{s}$ and $F:\mathbb{R}^{n+m}\to\mathbb{R}^{m}$ are twice continuously differentiable; $g:\mathbb{R}^{n+m}\to\mathbb{R}^{\ell}$ is thrice continuously differentiable; and $\nabla_y g_i(x,y)$ is the gradient with respect to $y$ of $g_i$ at $(x,y)$. This is a special case of MPEC with mixed complementarity constraints. It includes the subclass of MPEC whose equilibrium constraints are nonlinear complementarity problems, obtained by taking $g(x,y)=-y$. Note that to ease notation, equality constraints have been omitted both at the upper level ($G(x,y)\le 0$) and at the lower level ($g(x,y)\le 0$); these can be included without any difficulty.

A selection of MPEC applications, including Stackelberg games, network design, and the design of mechanical structures, is given in the monograph [9]. Also note that in many cases the equilibrium constraints arise as the first order conditions of a (usually convex) optimization problem, in which case an MPEC problem is closely related, if not equivalent, to a bilevel program (cf. also ► Bilevel programming: Introduction, history and overview). The volume [1] is a good source of applications of hierarchical optimization; see also the volume containing [10].

For comparison, and also for later reference in the section on mathematical programs with affine equilibrium constraints, consider the general MPEC with upper-level constraints $G(x,y)\le 0$ and the equilibrium constraint that, given $x$, $y$ solves the parametric variational inequality $\mathrm{VI}(F(x,\cdot),C(x))$, i.e.

$$
y\in C(x),\qquad F(x,y)^\top(y'-y)\ge 0\quad \forall\, y'\in C(x),
\tag{2}
$$

where $C(x)=\{y: g(x,y)\le 0\}$. Under some conditions on the vector function $g$, any pair $(x,y)$ satisfying these constraints is associated with a KKT multiplier $\lambda$ such that $(x,y,\lambda)$ is feasible for (1). One such condition is that $g$ is affine, i.e. linear plus a constant; another is that each component function $g_i$ is convex and that the Slater constraint qualification holds: there exists $\hat y$ such that $g_i(x,\hat y)<0$ for all $i$. Conversely, under convexity of $g$, each feasible point $(x,y,\lambda)$ of (1) satisfies the general MPEC constraints.


Local Decomposition of the Feasible Set 


Let $F^{\mathrm{KKT}}$ denote the set of feasible points $(x,y,\lambda)\in\mathbb{R}^{n+m+\ell}$ of (1). Let $\bar w=(\bar x,\bar y,\bar\lambda)\in F^{\mathrm{KKT}}$. For any other feasible point $(x,y,\lambda)$, the conditions

$$
\lambda\ge 0,\quad g(x,y)\le 0,\quad \lambda^\top g(x,y)=0
$$

imply complementarity: for each $i$, either $\lambda_i=0$ or $g_i(x,y)=0$. It follows that each $(x,y,\lambda)\in F^{\mathrm{KKT}}$ near $\bar w$ is also feasible for some nonlinear program of the form

$$
\begin{aligned}
\min\quad & f(x,y)\\
\text{s.t.}\quad & G(x,y)\le 0,\quad L(x,y,\lambda)=0,\\
& \lambda_i=0,\ \ g_i(x,y)\le 0 \quad \forall\, i\in J_1,\\
& \lambda_i\ge 0,\ \ g_i(x,y)=0 \quad \forall\, i\in J_2,
\end{aligned}
\tag{3}
$$

where $L(x,y,\lambda)=F(x,y)+\sum_i\lambda_i\nabla_y g_i(x,y)$ and the index sets $J_1,J_2$ partition $\{1,\dots,\ell\}$ such that $J_1\supseteq\{i: g_i(\bar z)<0\}$ and $J_2\supseteq\{i:\bar\lambda_i>0\}$, with $\bar z=(\bar x,\bar y)$.

For each such pair $(J_1,J_2)$, the feasible set of (3) is called a branch of the feasible set $F^{\mathrm{KKT}}$ at $\bar w$. Decomposition is given by an easy but critical observation: the union of the branches (3) of $F^{\mathrm{KKT}}$ at a feasible point $\bar w$ is a neighborhood of $\bar w$ in $F^{\mathrm{KKT}}$.

Each branch can be called a 'local piece' of $F^{\mathrm{KKT}}$. The PSQP method takes advantage of this piecewise, or disjunctive, or decomposition approach to $F^{\mathrm{KKT}}$, and hence is one of the class of disjunctive programming methods.

A decomposition scheme at infeasible points is required to allow for disjunctive methods that work with infeasible iterates. This decomposition is based on another easy observation: for a point $(\bar z,\bar\lambda)\in F^{\mathrm{KKT}}$, where $\bar z=(\bar x,\bar y)$,

$$
\{i: g_i(\bar z)<0\}=\{i: g_i(\bar z)+\bar\lambda_i<0\},\qquad
\{i: \bar\lambda_i>0\}=\{i: g_i(\bar z)+\bar\lambda_i>0\}.
$$

Let $w=(x,y,\lambda)\in\mathbb{R}^{n+m+\ell}$ with $\lambda\ge 0$, and write $z=(x,y)$. Let $\mathcal{A}(w)$ denote the family of index set pairs $(J_1,J_2)$ that partition $\{1,\dots,\ell\}$ and satisfy

$$
J_1\supseteq\{i: g_i(z)+\lambda_i<0\},\qquad J_2\supseteq\{i: g_i(z)+\lambda_i>0\}.
\tag{4}
$$
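The family $\mathcal{A}(w)$ in (4) can be enumerated directly: indices with $g_i(z)+\lambda_i<0$ must lie in $J_1$, those with $g_i(z)+\lambda_i>0$ must lie in $J_2$, and each remaining index (where $g_i(z)+\lambda_i=0$) may go to either set, so the number of pairs is 2 raised to the number of such indices. The Python sketch below assumes the values $g_i(z)$ and $\lambda_i$ are already available as lists; the function name and the optional tolerance `tol` (for treating tiny sums as zero in floating point) are illustrative additions, not part of the method as described.

```python
from itertools import product


def branch_pairs(g_vals, lam, tol=0.0):
    """Enumerate the index-set pairs (J1, J2) of A(w) in (4).

    g_vals[i] = g_i(z) and lam[i] = lambda_i (lambda >= 0); indices are 0-based.
    Returns a list of (J1, J2) pairs, each set given as a sorted tuple."""
    s = [gi + li for gi, li in zip(g_vals, lam)]
    forced_j1 = [i for i, si in enumerate(s) if si < -tol]  # g_i + lambda_i < 0
    forced_j2 = [i for i, si in enumerate(s) if si > tol]   # g_i + lambda_i > 0
    free = [i for i, si in enumerate(s) if abs(si) <= tol]  # either set allowed
    pairs = []
    for choice in product((1, 2), repeat=len(free)):
        j1 = forced_j1 + [i for i, c in zip(free, choice) if c == 1]
        j2 = forced_j2 + [i for i, c in zip(free, choice) if c == 2]
        pairs.append((tuple(sorted(j1)), tuple(sorted(j2))))
    return pairs


if __name__ == "__main__":
    # Hypothetical data: index 0 inactive, index 1 strictly active, index 2 degenerate.
    for pair in branch_pairs([-1.0, 0.0, 0.0], [0.0, 2.0, 0.0]):
        print(pair)
```

At a feasible point the undetermined indices are exactly those with $g_i(z)=0=\lambda_i$, which is why degenerate points can have many branches.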


The Algorithm 


Consider multiplier vectors, called MPEC multipliers, $\xi\in\mathbb{R}^{s}$, $\pi\in\mathbb{R}^{m}$ and $\eta\in\mathbb{R}^{\ell}$ corresponding, respectively, to the constraints $G(z)\le 0$, $L(z,\lambda)=0$, and the block of constraints $g_i(z)\le 0$ ($i\in J_1$), $g_i(z)=0$ ($i\in J_2$); and define the MPEC Lagrangian as

$$
L^{\mathrm{MPEC}}(z,\lambda,\xi,\pi,\eta)=f(z)+\xi^\top G(z)-\pi^\top L(z,\lambda)+\eta^\top g(z).
$$

Multipliers corresponding to the constraints $\lambda_i\ge 0$ (or $=0$) are omitted here and in the sequel since, these constraints being linear, they turn out not to play a role in the PSQP method.
PSQP can now be presented. At iteration $k$, given $w^k=(z^k,\lambda^k)$ and a triple of multipliers $v^k=(\xi^k,\pi^k,\eta^k)$, pick an arbitrary member $(J_1,J_2)$ of $\mathcal{A}(w^k)$. Corresponding to this pair $(J_1,J_2)$, form the quadratic program associated with the NLP (3) at $(z^k,\lambda^k)$; the variables of this quadratic program are given by the vector $dw=(dz,d\lambda)$, with $dz=(dx,dy)$:

$$
\begin{aligned}
\min_{dw}\quad & \nabla f(z^k)^\top dz+\tfrac12\,dw^\top\nabla^2_{w}L^{\mathrm{MPEC}}(w^k,v^k)\,dw\\
\text{s.t.}\quad & G(z^k)+\nabla G(z^k)\,dz\le 0,\\
& L(w^k)+\nabla L(w^k)\,dw=0,\\
& (\lambda^k+d\lambda)_i=0,\ \ g_i(z^k)+\nabla g_i(z^k)^\top dz\le 0\quad\forall\, i\in J_1,\\
& (\lambda^k+d\lambda)_i\ge 0,\ \ g_i(z^k)+\nabla g_i(z^k)^\top dz=0\quad\forall\, i\in J_2.
\end{aligned}
\tag{5}
$$

Here gradients of real-valued functions are denoted by $\nabla$, e.g. $\nabla f(z^k)$, whereas the derivative $\nabla G(z^k)$ is the $s\times(n+m)$ Jacobian matrix of $G$ at $z^k$; and $\nabla^2_w L^{\mathrm{MPEC}}(w^k,v^k)$ is the Hessian, or second-derivative matrix, with respect to $w$ of the MPEC Lagrangian at $(w^k,v^k)$. A KKT tuple of this quadratic program will be used to define the next iterate.

The PSQP method for (1) is summarized below.

Recall, for use in Step 1, the definition of stationarity: a point $z$ is stationary for the general nonlinear program

$$
\min\ f(z)\quad\text{s.t.}\quad h(z)=0,\quad g(z)\le 0,
\tag{6}
$$

for smooth functions $f,g,h$, if there are vectors $\lambda_h$ and $\lambda_g$, with dimensions matching $h(z)$ and $g(z)$, respectively, such that the KKT conditions hold: $\nabla f(z)+\nabla h(z)^\top\lambda_h+\nabla g(z)^\top\lambda_g=0$, $h(z)=0$, $g(z)\le 0$, $\lambda_g\ge 0$ and $\lambda_g^\top g(z)=0$. See [6] for details on using approximate stationarity in Step 1, namely checking whether the KKT conditions approximately hold, instead of stationarity.
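As an illustration of checking the KKT conditions of (6) approximately, the following NumPy sketch computes a single residual at a candidate point and candidate multipliers. The function name and the choice of a max-norm measure are assumptions made for this example; [6] is not claimed to use this particular measure. Complementarity is checked componentwise, which is equivalent to $\lambda_g^\top g(z)=0$ once $\lambda_g\ge 0$ and $g(z)\le 0$ hold.

```python
import numpy as np


def kkt_residual(grad_f, jac_h, h, jac_g, g, lam_h, lam_g):
    """Max-norm violation of the KKT conditions of (6) at a point z.

    grad_f: gradient of f at z, shape (n,)
    jac_h, h: Jacobian (p, n) and values (p,) of the equality constraints
    jac_g, g: Jacobian (q, n) and values (q,) of the inequality constraints
    lam_h, lam_g: candidate multipliers, shapes (p,) and (q,)
    """
    grad_f, h, g = (np.asarray(v, dtype=float) for v in (grad_f, h, g))
    jac_h, jac_g = np.asarray(jac_h, dtype=float), np.asarray(jac_g, dtype=float)
    lam_h, lam_g = np.asarray(lam_h, dtype=float), np.asarray(lam_g, dtype=float)
    stationarity = grad_f + jac_h.T @ lam_h + jac_g.T @ lam_g
    pieces = [
        stationarity,             # gradient of the Lagrangian vanishes
        h,                        # h(z) = 0
        np.maximum(g, 0.0),       # g(z) <= 0
        np.maximum(-lam_g, 0.0),  # lam_g >= 0
        lam_g * g,                # complementarity, componentwise
    ]
    return float(np.max(np.abs(np.concatenate(pieces))))


if __name__ == "__main__":
    # min (z1-1)^2 + (z2-1)^2  s.t.  z1 + z2 - 1 <= 0, candidate z = (0.5, 0.5)
    z = np.array([0.5, 0.5])
    print(kkt_residual(grad_f=2 * (z - 1), jac_h=np.zeros((0, 2)), h=np.zeros(0),
                       jac_g=np.array([[1.0, 1.0]]), g=np.array([z.sum() - 1.0]),
                       lam_h=np.zeros(0), lam_g=np.array([1.0])))
```

With the multiplier 1.0 the residual is zero, so the candidate is stationary; an approximate test would compare the residual with a small tolerance instead.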


0. (Initialization.)

Let $w^0=(z^0,\lambda^0)\in\mathbb{R}^{n+m+\ell}$, $v^0=(\xi^0,\pi^0,\eta^0)\in\mathbb{R}^{s+m+\ell}$, $\mathcal{A}^0=\mathcal{A}(w^0)$, and $k=0$.

1. (Direction finding.)

Choose a pair $(J_1^k,J_2^k)\in\mathcal{A}^k$.

IF the QP (5) is found to be infeasible or unbounded below, THEN go to Step 3.
ELSE find a stationary point $dw$ of (5), with multipliers $v=(\xi,\pi,\eta)$ as above.

IF $dw=0$ (hence $w^k$ is stationary for (3)) THEN go to Step 3.

2. (Serious step.)

Let $w^{k+1}=w^k+dw$, $v^{k+1}=v$, and $\mathcal{A}^{k+1}=\mathcal{A}(w^{k+1})$. Let $k=k+1$ and go to Step 1.

3. (Null step.)

Let $w^{k+1}=w^k$, $\mathcal{A}^{k+1}=\mathcal{A}^k\setminus\{(J_1^k,J_2^k)\}$, and $k=k+1$.

4. (Stopping rule.)

IF the stopping condition is satisfied THEN STOP ELSE go to Step 1.

Algorithm PSQP


At each iteration PSQP selects a nonlinear program, indexed by $(J_1,J_2)$, and updates $(w^k,v^k)$ to $(w^{k+1},v^{k+1})$ by applying one step of the SQP method to this subproblem. The idea of selecting one of possibly several subproblems at each iteration is based on the Kojima-Shindo method [8] for solving piecewise smooth equations $\Phi(x)=0$, where the mapping $\Phi:\mathbb{R}^n\to\mathbb{R}^n$ is a continuous selection of a finite family of smooth functions $\{\Phi^i\}$, each of which maps $\mathbb{R}^n$ to itself. Given $x^k\in\mathbb{R}^n$, an arbitrary index $i=i(k)$ is chosen with $\Phi(x^k)=\Phi^i(x^k)$, and then $x^k$ is updated to $x^{k+1}$ by applying a single Newton iteration to $\Phi^i$ at $x^k$.

Superlinear or quadratic convergence of PSQP and 
of the method of [8] can be shown under appropriate 
conditions. 
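The selection idea is easiest to see in one dimension. The Python sketch below applies the Kojima-Shindo scheme, as described above, to the scalar equation $\Phi(x)=\max(x^3-2,\;x-2)=0$, a continuous selection of the two smooth pieces $x^3-2$ and $x-2$; this test function and the helper names are illustrative choices, not taken from [8]. At each iterate an active piece is selected and one Newton step is taken on that piece alone: from $x^0=0.5$ the linear piece is active first, after which the iterates switch to the cubic piece and converge to $2^{1/3}$.

```python
def piecewise_newton(pieces, select, x0, tol=1e-12, max_iter=50):
    """Newton's method for a scalar piecewise smooth equation Phi(x) = 0.

    pieces: list of (phi_i, dphi_i) pairs (smooth piece and its derivative).
    select(x): returns an index i with Phi(x) == phi_i(x), i.e. an active piece.
    """
    x = x0
    for _ in range(max_iter):
        phi, dphi = pieces[select(x)]  # pick one active smooth piece
        val = phi(x)
        if abs(val) <= tol:
            return x
        x -= val / dphi(x)             # one Newton step on that piece only
    raise RuntimeError("no convergence within max_iter iterations")


pieces = [
    (lambda x: x**3 - 2.0, lambda x: 3.0 * x**2),  # piece 0
    (lambda x: x - 2.0, lambda x: 1.0),            # piece 1
]


def select_max(x):
    """Phi is the max of the pieces, so the larger piece is active."""
    return 0 if pieces[0][0](x) >= pieces[1][0](x) else 1


if __name__ == "__main__":
    root = piecewise_newton(pieces, select_max, x0=0.5)
    print(root, root**3)
```

PSQP follows the same pattern, with the smooth pieces replaced by the branch NLPs (3) and the Newton step replaced by one SQP step.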

Moreover, PSQP can be viewed as a localized version of previous disjunctive approaches to finding global solutions of MPEC, rather than local solutions or stationary points. Papers on finding global optima include [11] and [5]. In connection with the latter, which uses a branch and bound approach, note that a relaxed problem such as (7) can be used to generate bounds for approximating the global optimal value of an MPEC. Both [5] and [11] treat problems that can be cast as MPEC with affine equilibrium constraints; these are discussed in a later section.

To discuss stopping conditions for Step 4, observe that if all branches of $F^{\mathrm{KKT}}$ at the current point $w^k$ are such that $w^k$ is stationary for each of the associated nonlinear programs (3), then $w^k$ is stationary for the MPEC. This follows because the union of the branches of $F^{\mathrm{KKT}}$ at $w^k$ forms a neighborhood of $w^k$ in $F^{\mathrm{KKT}}$.

A) (Stopping condition A.) $\mathcal{A}^k=\emptyset$.

Stopping condition A requires an exhaustive check of stationarity of $w^k$ for each NLP (3) where $(J_1,J_2)\in\mathcal{A}(w^k)$. Thus, if $w^k$ is a local minimizer, then verifying stationarity at $w^k$ amounts to checking stationarity of the current point for $2^{|J_0(w^k)|}$ nonlinear programs, where $|J_0(w^k)|$ is the cardinality of the set

$$
J_0(w^k)=\{i: g_i(z^k)=0=\lambda_i^k\}.
$$

To provide a more efficient stopping rule, introduce the relaxed nonlinear program at $w^k$, in which the complementarity condition $\lambda_i g_i(x,y)=0$ is relaxed for indices $i$ in $J_0(w^k)$. Let $I_+(w^k)=\{i: g_i(z^k)+\lambda_i^k<0\}$ and $I_-(w^k)=\{i: g_i(z^k)+\lambda_i^k>0\}$. The relaxed NLP is:

$$
\begin{aligned}
\min\quad & f(x,y)\\
\text{s.t.}\quad & G(x,y)\le 0,\\
& L(x,y,\lambda)=0,\\
& g_i(x,y)\le 0=\lambda_i\quad\forall\, i\in I_+(w^k),\\
& g_i(x,y)\le 0\le\lambda_i\quad\forall\, i\in J_0(w^k),\\
& g_i(x,y)=0\le\lambda_i\quad\forall\, i\in I_-(w^k).
\end{aligned}
\tag{7}
$$

Our interest in the relaxed NLP stems from the obvious fact that if $w^k\in F^{\mathrm{KKT}}$, then the feasible set of (7) contains every branch of $F^{\mathrm{KKT}}$ at the current iterate $w^k$. Hence if the current iterate is stationary for (7), then it must also be stationary for the MPEC (1).

B) (Stopping condition B.) Either $\mathcal{A}^k=\emptyset$, or $w^k$ is a stationary point of (7) with corresponding multipliers $v^k$ (corresponding to $G$, $L$ and $g$).

Stopping condition B, first used in [10], is an often 
effective heuristic to reduce the number of branches ex- 
amined (null steps executed) in order to either iden- 
tify a stationary point of the MPEC, or find a branch 
where progress can be made. It is guaranteed to ver- 
ify or disprove stationarity by checking just one branch 
[10], provided that the active constraints at w* of the re- 
laxed nonlinear program (7) have linearly independent 
gradients. Also, see computational examples in [6,10], where the algorithm executes considerably fewer null steps when using stopping condition B instead of the exhaustive stopping condition A, even when the relaxed NLP has linearly dependent active gradients.


Convergence of PSQP 


Superlinear local convergence of the PSQP algorithm can be shown if, in addition to a second order condition, there is a uniqueness condition on the optimal multipliers associated with each nonlinear program (3) for relevant pairs of index sets $(J_1,J_2)$. The proof of convergence [9] relies on the convergence analysis of [2] and [12], which applies to stationary points of nonlinear programs such that the corresponding KKT multipliers are unique and the second order sufficient optimality condition holds. For superlinear convergence of PSQP it is also needed that the optimal multipliers be independent of the pairs $(J_1,J_2)$, so that the current value of the multipliers is (locally) a reasonable estimate for the NLP subproblem corresponding to any branch.

Recall the second order sufficient condition for the general nonlinear program (6); this condition will be needed below for the NLP associated with each branch of $F^{\mathrm{KKT}}$ at a solution point of the KKT-constrained MP. Let $\bar z$ be a stationary point of the NLP (6). For convenience assume (as below) that the associated KKT multipliers $\lambda_h,\lambda_g$ are unique. Let $H$ be the Hessian matrix, with respect to $z$, at $z=\bar z$ of the Lagrangian mapping $f(z)+\lambda_h^\top h(z)+\lambda_g^\top g(z)$; and let $\mathcal{C}$ be the critical cone of (6) at $\bar z$, namely the set of vectors $d$ in $z$-space such that $\langle\nabla f(\bar z),d\rangle=0$, $\nabla h(\bar z)\,d=0$, and $\nabla g_I(\bar z)\,d\le 0$, where $\nabla g_I(\bar z)$ is the submatrix of the Jacobian of $g$ at $\bar z$ consisting of the rows $i$ such that $g_i(\bar z)=0$. The second order sufficient condition says that $H$ is strictly copositive on $\mathcal{C}$, i.e. $d^\top Hd>0$ for $0\ne d\in\mathcal{C}$.
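Checking strict copositivity on a general polyhedral cone is itself a nontrivial problem. In the special case where the critical cone reduces to a linear subspace $\{d: Md=0\}$ (for example, when every active inequality has a positive multiplier, since then $\langle\nabla f(\bar z),d\rangle=0$ together with $\nabla h(\bar z)d=0$ and $\nabla g_I(\bar z)d\le 0$ forces $\nabla g_I(\bar z)d=0$), the condition becomes positive definiteness of the reduced Hessian $Z^\top HZ$, where the columns of $Z$ span the null space of $M$. The NumPy sketch below checks only this special case; the matrix name `M` (the stacked constraint rows defining the subspace) and the function name are assumptions for this example.

```python
import numpy as np


def reduced_hessian_positive_definite(H, M, tol=1e-10):
    """Check d^T H d > 0 for all nonzero d with M d = 0.

    Valid as a second order sufficient condition test only when the
    critical cone is the subspace {d : M d = 0}; H is assumed symmetric."""
    H = np.asarray(H, dtype=float)
    M = np.atleast_2d(np.asarray(M, dtype=float))
    _, s, vt = np.linalg.svd(M)      # full SVD: vt is n-by-n
    rank = int(np.sum(s > tol))
    Z = vt[rank:].T                  # orthonormal basis of the null space of M
    if Z.shape[1] == 0:
        return True                  # subspace is {0}: condition holds vacuously
    return bool(np.linalg.eigvalsh(Z.T @ H @ Z).min() > tol)


if __name__ == "__main__":
    H = np.diag([1.0, -1.0])         # indefinite Hessian
    print(reduced_hessian_positive_definite(H, [[0.0, 1.0]]))  # null space spanned by e1
    print(reduced_hessian_positive_definite(H, [[1.0, 0.0]]))  # null space spanned by e2
```

The example prints True and then False: an indefinite $H$ can still satisfy the condition, provided its negative curvature lies outside the critical directions.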

The main convergence result for PSQP is: 


Theorem 1 In the context of the KKT-constrained MPEC (1), let the functions $f$ and $F$ be twice continuously differentiable, and $g$ be thrice continuously differentiable. Let $\bar w=(\bar x,\bar y,\bar\lambda)$ be a point in $F^{\mathrm{KKT}}$ and $\bar v=(\bar\xi,\bar\pi,\bar\eta)\in\mathbb{R}^{s+m+\ell}$ be such that, for each $(J_1,J_2)$ in $\mathcal{A}(\bar w)$,

1) $\bar w$ is stationary for (3);

2) $\bar v$ is the unique KKT multiplier for (3) associated with $\bar w$; and

3) the second order sufficient condition for (3) holds at $\bar w$.

Then $\bar w$ is a strict local minimizer of the MPEC (1). Moreover, for each $(w^0,v^0)$ sufficiently close to $(\bar w,\bar v)$, the PSQP algorithm is well defined and produces a sequence $\{(w^k,v^k)\}$ that converges Q-superlinearly to $(\bar w,\bar v)$. Finally, convergence is Q-quadratic if, in addition, $\nabla^2 f$, $\nabla^2 F$ and $\nabla^3 g$ are Lipschitz continuous near $(\bar x,\bar y)$.


If $(\bar x,\bar y,\bar\lambda)\in F^{\mathrm{KKT}}$ is stationary for the relaxed NLP (7), it is clear that the linear independence constraint qualification for (7) at this point, namely linear independence of the gradients of the constraints that are active at $(\bar x,\bar y,\bar\lambda)$, implies the uniqueness condition on the multipliers $\bar v=(\bar\xi,\bar\pi,\bar\eta)$ required in the above result.

The above convergence result is local in the sense 
that a starting point near to an eventual solution point 
is needed. To improve the performance of the method 
particularly from starting points that may be far from 
being optimal, it is typical in the context of nonlinear 
programming to use a line search or trust region strat- 
egy [3]. See [6] for a line search approach to MPEC 
where the objective function f is smooth and convex, 
the upper constraints are affine, and the lower or equi- 
librium constraints form an affine variational inequal- 
ity. (PSQP also has special properties for this case as 
seen shortly.) 

In this case, assuming $w^k=(z^k,\lambda^k)$ is feasible and $dw=(dz,d\lambda)$ is the direction computed by solving the QP (5), the update in Step 2 of the PSQP method is modified to choose a step size $t>0$ such that $f(z^k+t\,dz)<f(z^k)$, amongst other conditions, and then to set $w^{k+1}=w^k+t\,dw$. Attempts to globalize PSQP for problems with nonlinear constraint functions, by adding penalty terms to the objective to replace these constraints and then using line search or trust region strategies, are the subject of research; see [14,15], for example.
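A step-size rule of the kind described can be sketched with Armijo backtracking, a standard choice in nonlinear programming; it is not claimed that this matches the "other conditions" used in [6]. The sketch assumes `f` and `grad_f` act on the $z$-part only and that `dz` is a descent direction; the parameters `c`, `shrink` and `t_min` are illustrative defaults.

```python
import numpy as np


def backtracking_step(f, grad_f, z, dz, c=1e-4, shrink=0.5, t_min=1e-12):
    """Largest t in {1, shrink, shrink^2, ...} with the Armijo decrease
    f(z + t dz) <= f(z) + c * t * grad_f(z)^T dz."""
    fz = f(z)
    slope = float(grad_f(z) @ dz)
    if slope >= 0.0:
        raise ValueError("dz is not a descent direction for f at z")
    t = 1.0
    while f(z + t * dz) > fz + c * t * slope:
        t *= shrink
        if t < t_min:
            raise RuntimeError("line search failed to find a step")
    return t


if __name__ == "__main__":
    z = np.array([1.0, 1.0])
    t = backtracking_step(lambda v: float(v @ v), lambda v: 2.0 * v, z, -2.0 * z)
    print(t)  # full step overshoots to (-1, -1); t = 0.5 reaches the minimizer
```

The accepted $t$ would then be used in $w^{k+1}=w^k+t\,dw$ for the whole vector $dw$, not only its $z$-part.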

Quasi-Newton variants of PSQP, in which, for example, the Hessian matrix $\nabla^2_w L^{\mathrm{MPEC}}(w^k,v^k)$ is replaced by an easily updated approximation, are also the subject of research. See [3] for motivation from nonlinear programming.
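For illustration, the BFGS formula is one common "easily updated approximation" from nonlinear programming; whether it suits the possibly indefinite Hessian of the MPEC Lagrangian is part of the open question above, so the sketch below is only an example under stated assumptions. It skips the update when the curvature $s^\top y$ is not sufficiently positive, which keeps a positive definite approximation positive definite; `s` stands for the step in $w$ and `y` for the corresponding change in the gradient of the Lagrangian (names assumed for this example).

```python
import numpy as np


def bfgs_update(B, s, y, eps=1e-12):
    """Return the BFGS update of the Hessian approximation B.

    s: step taken (e.g. dw), y: corresponding change in the Lagrangian gradient.
    The update is skipped when the curvature s^T y is not sufficiently positive,
    which keeps a positive definite B positive definite."""
    B = np.asarray(B, dtype=float)
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    sy = float(s @ y)
    if sy <= eps * np.linalg.norm(s) * np.linalg.norm(y):
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / sy


if __name__ == "__main__":
    s = np.array([1.0, 0.0])
    y = np.array([2.0, 1.0])
    B1 = bfgs_update(np.eye(2), s, y)
    print(np.allclose(B1 @ s, y))  # the secant condition B1 s = y holds
```

The updated matrix satisfies the secant condition $B_{k+1}s=y$, which is what makes it a cheap surrogate for the exact Hessian along the last step.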


Affine Equilibrium Constraints 


An important special case of the PSQP method is its application to mathematical programs with affine equilibrium constraints (MPAEC), that is, problems whose equilibrium constraints are in the form of affine variational inequalities (AVI). MPEC whose equilibrium constraints can be expressed as linear complementarity problems (cf. also ► Linear complementarity problem), such as those in [5,11,13], fall into this subclass.

Specifically, we consider the MPAEC formulated with KKT constraints, for which the upper-level constraint function $G$ and the functions $F$ and $g$ in the lower-level VI are affine (linear plus a constant):


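$$
\begin{aligned}
\min\quad & f(x,y)\\
\text{s.t.}\quad & Ax+By+a\le 0,\\
& Px+Qy+q+E^\top\lambda=0,\\
& Dx+Ey+b\le 0,\quad \lambda\ge 0,\\
& \lambda^\top(Dx+Ey+b)=0,
\end{aligned}
\tag{8}
$$

for some $A\in\mathbb{R}^{s\times n}$, $B\in\mathbb{R}^{s\times m}$, $a\in\mathbb{R}^{s}$; $P\in\mathbb{R}^{m\times n}$, $Q\in\mathbb{R}^{m\times m}$, $q\in\mathbb{R}^{m}$; and $D\in\mathbb{R}^{\ell\times n}$, $E\in\mathbb{R}^{\ell\times m}$, $b\in\mathbb{R}^{\ell}$. This is the problem (1) with

$$
G(x,y)=Ax+By+a,\qquad F(x,y)=Px+Qy+q,\qquad g(x,y)=Dx+Ey+b.
\tag{9}
$$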


Note that there exists $\lambda$ such that $(x,y,\lambda)$ is feasible for (8) if and only if $G(x,y)\le 0$ and $y$ solves the affine variational inequality $\mathrm{VI}(F(x,\cdot),C(x))$ described in (2), where $C(x)=\{y: g(x,y)\le 0\}$ is a polyhedral convex set. Thus the KKT-constrained MP (8) is equivalent to the MPAEC problem in its standard form:

$$
\min\ f(x,y)\quad\text{s.t.}\quad Ax+By+a\le 0,\quad y\ \text{solves the AVI (2)}.
\tag{10}
$$
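The data in (8) and (9) make feasibility easy to test numerically. The NumPy sketch below returns the largest violation of the constraints of (8) at a given $(x,y,\lambda)$, using the matrices $A,B,a,P,Q,q,D,E,b$ as named in (8); the function name and the max-norm measure are assumptions for this example. The one-dimensional instance in the usage block is hypothetical: with $D=0$, $E=-1$, $b=0$ one gets $g(x,y)=-y$, i.e. the complementarity case mentioned earlier (here linear, since $F$ is affine).

```python
import numpy as np


def mpaec_violation(x, y, lam, A, B, a, P, Q, q, D, E, b):
    """Largest violation of the constraints of (8) at (x, y, lam)."""
    G = A @ x + B @ y + a              # upper-level constraints, need G <= 0
    L = P @ x + Q @ y + q + E.T @ lam  # lower-level KKT stationarity, need L == 0
    g = D @ x + E @ y + b              # lower-level constraints, need g <= 0
    parts = [
        np.maximum(G, 0.0),
        L,
        np.maximum(g, 0.0),
        np.maximum(-lam, 0.0),         # need lam >= 0
        lam * g,                       # componentwise complementarity
    ]
    return float(np.max(np.abs(np.concatenate(parts))))


if __name__ == "__main__":
    data = dict(A=np.array([[0.0]]), B=np.array([[0.0]]), a=np.array([-1.0]),
                P=np.array([[1.0]]), Q=np.array([[1.0]]), q=np.array([-2.0]),
                D=np.array([[0.0]]), E=np.array([[-1.0]]), b=np.array([0.0]))
    for x, y, lam in [(1.0, 1.0, 0.0), (3.0, 0.0, 1.0), (0.0, 0.0, 0.0)]:
        v = mpaec_violation(np.array([x]), np.array([y]), np.array([lam]), **data)
        print((x, y, lam), v)
```

The first two points are feasible (one with $\lambda=0$, one with $y=0$); the third violates the stationarity equation $x+y-2-\lambda=0$ by 2.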


Some differences between the special PSQP method for MPAEC and the general method should be noted. First, in the QP subproblem (5) solved in Step 1, the objective function simplifies to

$$
\nabla f(z^k)^\top dz+\tfrac12\,dz^\top\nabla^2 f(z^k)\,dz,
$$

because $G$, $L$ and $g$ are affine, so that the only second-order term left in the Hessian of the MPEC Lagrangian is $\nabla^2 f$. Second, therefore, the vector $v^k$ of MPEC multipliers is not needed, except perhaps if stopping rule B is employed. Third, the specialized algorithm requires a feasible starting point and maintains feasibility of iterates.
A heuristic 'phase-1' procedure [6] for finding a feasible starting point is to find a local minimizer or stationary point of the indefinite quadratic program:

$$
\begin{aligned}
\min\quad & -\lambda^\top g(x,y)\\
\text{s.t.}\quad & G(x,y)\le 0,\\
& F(x,y)+E^\top\lambda=0,\\
& \lambda\ge 0,\\
& g(x,y)\le 0.
\end{aligned}
$$

See [4] for conditions under which all stationary points of the phase-1 problem are indeed feasible for the MPAEC.

Similar to SQP applied to nonlinear programs with linear constraints, the PSQP algorithm specialized to MPEC with affine KKT constraints exhibits locally superlinear convergence if the Hessian of the objective function $\nabla^2 f(x,y)$ satisfies a second order sufficient condition, regardless of whether the KKT multiplier vector $\lambda$ or the MPEC multiplier vectors $\xi,\pi,\eta$ are unique. To make this precise, suppose $\bar z=(\bar x,\bar y)$ is feasible for the MPAEC (10), and let $\Lambda(\bar z)$ be the set of all KKT multipliers $\lambda$ such that $(\bar z,\lambda)$ is feasible for (8), i.e. $(\bar z,\lambda)\in F^{\mathrm{KKT}}$. Now for each $\lambda\in\Lambda(\bar z)$ and $(J_1,J_2)\in\mathcal{A}(\bar z,\lambda)$, let $C^{\mathrm{KKT}}(J_1,J_2)$ be the critical cone of the linearly constrained NLP (3) at $(\bar z,\lambda)$; and define the associated $z$-critical cone $C_z^{\mathrm{KKT}}(J_1,J_2)$ as the set of vectors $dz$ such that $(dz,d\lambda)\in C^{\mathrm{KKT}}(J_1,J_2)$ for some $d\lambda\in\mathbb{R}^{\ell}$. The second order sufficient condition for $(\bar z,\lambda)$ to be a local minimizer of (8) for each $\lambda\in\Lambda(\bar z)$ is that the Hessian matrix $\nabla^2 f(\bar z)$ be strictly copositive on each of these $z$-critical cones.

The second order conditions act in the following 
way: 


Theorem 2 Consider the MPAEC problems (8) and (10), where the function $f$ is twice continuously differentiable, and the functions $G$, $F$ and $g$ are given by (9). Let $\bar z$ be a feasible point of (10) and $\Lambda(\bar z)$ be the (nonempty) set $\{\lambda\in\mathbb{R}^{\ell}: (\bar z,\lambda)\in F^{\mathrm{KKT}}\}$. Suppose for each $\lambda\in\Lambda(\bar z)$ and $(J_1,J_2)$ in $\mathcal{A}(\bar z,\lambda)$ that $(\bar z,\lambda)$ is stationary for the NLP (3), and that $\nabla^2 f(\bar z)$ is strictly copositive on the $z$-critical cone $C_z^{\mathrm{KKT}}(J_1,J_2)$ defined above. Then $\{\bar z\}\times\Lambda(\bar z)$ consists of local minimizers of the KKT-constrained MP (8); in fact, $\bar z$ is a strict local minimizer of the MPAEC (10).


The local convergence properties of PSQP applied to 
MPAEC are now presented. 


Theorem 3 Let $f$, $G$, $F$, $g$, $\bar z$ and $\Lambda(\bar z)$ be as above, and let the associated first and second order conditions hold. Then for each $(z^0,\lambda^0)$ near $\{\bar z\}\times\Lambda(\bar z)$, the PSQP algorithm specialized to MPAEC is well defined and produces a sequence $\{(z^k,\lambda^k)\}$ such that $\{z^k\}$ converges Q-superlinearly to $\bar z$. Convergence of $\{z^k\}$ is Q-quadratic if, in addition, $\nabla^2 f$ is Lipschitz near $\bar z$.


See also 


► Feasible Sequential Quadratic Programming

► Generalized Monotonicity: Applications to Variational Inequalities and Equilibrium Problems

► Sequential Quadratic Programming: Interior Point Methods for Distributed Optimal Control Problems

► Successive Quadratic Programming

► Successive Quadratic Programming: Applications in Distillation Systems

► Successive Quadratic Programming: Applications in the Process Industry

► Successive Quadratic Programming: Decomposition Methods

► Successive Quadratic Programming: Full Space Methods

► Successive Quadratic Programming: Solution by Active Sets and Interior Point Methods


References 


1. Anandalingam G, Friesz T (1992) Hierarchical optimization. 
Ann Oper Res 34 

2. Bonnans JF (1994) Local analysis of Newton-type methods 
for variational inequalities and nonlinear programming. 
Appl Math Optim 29:161-186 

3. Fletcher R (1987) Practical methods of optimization. Wiley, 
New York 

4. Fukushima M, Pang JS (1998) Some feasibility issues 
in mathematical programs with equilibrium constraints. 
SIAM J Optim 8:673-681 

5. Hansen P, Jaumard B, Savard G (1992) New branch-and- 
bound rules for linear bilevel programming. SIAM J Sci 
Statist Comput 13:1194-1217 

6. Jiang H, Ralph D (to appear) QPECgen, a MATLAB generator for mathematical programs with quadratic objectives and affine variational inequality constraints. Comput Optim Appl

7. Jiang H, Ralph D, Tin-Loi F (1997) Identification of yield
limits as a mathematical program with equilibrium constraints.
In: Grzebieta RH, Al-Mahaidi R, Wilson JL (eds) The Mechanics
of Structures and Materials: Proc. 15th Australasian Conf. the
Mechanics of Structures and Materials. A.A. Balkema, Leiden,
pp 399-404

8. Kojima M, Shindo S (1986) Extensions of Newton and
quasi-Newton methods to systems of PC¹ equations.
J Oper Res Soc Japan 29:352-374

9. Luo Z-Q, Pang JS, Ralph D (1997) Mathematical programs
with equilibrium constraints. Cambridge University Press,
Cambridge

10. Luo Z-Q, Pang JS, Ralph D (1998) Piecewise sequential
quadratic programming for mathematical programs with
nonlinear complementarity constraints. In: Migdalas A,
Pardalos PM, Värbrand P (eds) Multilevel Optimization:
Algorithms and Applications. Kluwer, Dordrecht, pp 209-229

11. Maier G, Giannessi F, Nappi A (1982) Indirect identification 
of yield limits by mathematical programming. Eng Struc- 
tures 4:86-98 

12. Pang JS (1993) Convergence of splitting and Newton 
methods for complementarity problems: An application of 
some sensitivity results. Math Program 58:149-160 

13. Ralph D (1996) Sequential quadratic programming for
mathematical programs with linear complementarity constraints.
In: May RL, Easton AK (eds) Proc. seventh Conf. Computational
Techniques and Applications (CTAC95). Sci. Press, Marrickville,
pp 663-668

14. Scholtes S, Stöhr M (1999) Exact penalization of mathematical
programs with equilibrium constraints. SIAM J Control
Optim 37:617-652

15. Stöhr M (1999) Nonsmooth trust region methods and
their applications to mathematical programs with equilibrium
constraints. PhD Thesis, University Karlsruhe, Shaker,
Aachen, 2000


Optimization in Leveled Graphs 


PETRA MUTZEL 
Institute Computergraphik und Algorithmen Techn., 
University Wien, Wien, Austria 


MSC2000: 90C35 


Article Outline 


Keywords 

Multiple Sequence Alignment 
Leveled Crossing Minimization 
Level Planarization 

See also 

References 


Keywords 


Leveled graphs; Integer programming; Crossing 
minimization; Planarization; Multiple sequence 
alignment 


A k-leveled graph or a k-level hierarchy is defined as
a graph G = (V, E) = (V_1, ..., V_k, E) with vertex sets
V_1, ..., V_k, V = V_1 ∪ V_2 ∪ ... ∪ V_k, V_i ∩ V_j = ∅ for i ≠ j,
and an edge set E connecting vertices in levels V_i and
V_j with i ≠ j (1 ≤ i, j ≤ k). V_i is called the ith level. In
a geometric representation of a k-leveled graph, the vertices
in each level V_i are drawn on a horizontal line L_i
with y-coordinate k − i, and the edges are drawn strictly
monotone, i.e., an edge (v_i, v_j) ∈ E, v_i ∈ V_i, v_j ∈ V_j,
i < j, is drawn with decreasing y-coordinates. Essentially,
a k-leveled graph is a k-partite graph that is drawn in
a special way.

A proper k-leveled graph is a k-leveled graph
G = (V_1, ..., V_k, E) in which every edge in E connects
vertices in two consecutive levels V_i and V_{i+1}, for
i ∈ {1, ..., k − 1}. Figure 1 shows a proper leveled graph
on k = 4 levels. This graph represents the face lattice of the
cuboctahedron [4].
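The definition can be checked mechanically. The minimal Python sketch below assumes a graph is given as a dictionary `level` that maps each vertex to its level index, together with a list of edges. `is_proper` tests the consecutive-levels condition. `make_proper` is a hypothetical helper that subdivides every longer edge with one artificial vertex per skipped level. The dummy-vertex naming scheme is our own invention.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def is_proper(level: Dict[str, int], edges: List[Edge]) -> bool:
    """True if every edge joins two consecutive levels V_i and V_{i+1}."""
    return all(abs(level[u] - level[v]) == 1 for u, v in edges)


def make_proper(level: Dict[str, int], edges: List[Edge]) -> Tuple[Dict[str, int], List[Edge]]:
    """Subdivide each long edge with one artificial (dummy) vertex per skipped level."""
    new_level = dict(level)
    new_edges: List[Edge] = []
    for u, v in edges:
        if level[u] == level[v]:
            raise ValueError(f"edge ({u}, {v}) has both endpoints in one level")
        if level[u] > level[v]:
            u, v = v, u                      # orient from the upper (smaller index) level
        prev = u
        for lvl in range(level[u] + 1, level[v]):
            dummy = f"d_{u}_{v}_{lvl}"       # hypothetical naming scheme for dummies
            new_level[dummy] = lvl
            new_edges.append((prev, dummy))
            prev = dummy
        new_edges.append((prev, v))
    return new_level, new_edges
```

For example, an edge from level 1 to level 3 is split into two edges through a dummy vertex on level 2.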

Optimization problems in leveled graphs arise in 
applications in computational biology and in automatic 
graph drawing. 


Multiple Sequence Alignment 


In computational biology the vertices in each level V_i
represent letters of a sequence S_i over a finite alphabet
Σ. The optimization problem which arises is the multiple
sequence alignment problem. Here, the k sequences
S_1, ..., S_k should be aligned so that the score of the
alignment is maximized. An alignment can be interpreted
as an array with k rows, one row for each S_i. Two letters
of distinct sequences are said to be aligned if they
are placed in the same column. There are many ways
to measure the quality of an alignment, leading to different
problem formulations. One of them is the maximum
weight trace formulation introduced in [14]. Here,
the letters of the sequences S_i = (s_{i1}, ..., s_{in_i}) are viewed
as vertices in level i in a k-leveled graph G = (V_1, ...,
V_k, E). Every edge e ∈ E has a nonnegative weight representing
the gain of aligning the endpoints of the edge.
We say that an alignment S realizes an edge if it places
the endpoints into the same column of the alignment
array.

The set of edges realized by an alignment S is called
the trace of S, and the weight of an alignment S is the
sum of the weights of all edges in the trace of S. The goal
is to compute an alignment S of maximum weight.
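To illustrate the definitions, the following Python sketch computes the trace of a given alignment and its weight. It assumes each row of the alignment is a string that uses '-' for gaps. A letter is written as a pair (i, j), meaning the jth letter of S_i with 0-based indices, and each edge carries a weight. These conventions are ours.

```python
from typing import Dict, List, Tuple

Letter = Tuple[int, int]            # (sequence index i, position j within S_i), 0-based
WEdge = Tuple[Letter, Letter, float]


def letter_columns(alignment: List[str], gap: str = "-") -> Dict[Letter, int]:
    """Map each letter (i, j) to the column in which the alignment places it."""
    cols: Dict[Letter, int] = {}
    for i, row in enumerate(alignment):
        j = 0
        for c, ch in enumerate(row):
            if ch != gap:
                cols[(i, j)] = c
                j += 1
    return cols


def trace_and_weight(alignment: List[str], edges: List[WEdge]) -> Tuple[List[WEdge], float]:
    """Return the trace (edges whose endpoints share a column) and its total weight."""
    cols = letter_columns(alignment)
    trace = [e for e in edges if cols[e[0]] == cols[e[1]]]
    return trace, sum(w for _, _, w in trace)
```

With rows "AC-" and "A-C", only the edge joining the two A's is realized, because the two C's end up in different columns.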

The maximum weight trace problem is NP-hard,
but it can be solved in polynomial time for fixed k.
A dynamic programming approach gives an algorithm
with time complexity O(k² 2^k N) and space complexity
O(N), where N = ∏_{i=1}^{k} n_i, which is feasible only for very
small problem instances. J. Kececioglu [15] presented
a branch and bound algorithm whose implementation
could optimally align six sequences of length 250 in
a few minutes.

K. Reinert et al. [19] presented a first formulation as
an integer linear program. It is based on a graph-theoretical
characterization of traces given in [19]. For this,
the alignment graph G = (V_1, ..., V_k, E) is extended to
a mixed graph G' = (V_1, ..., V_k, E, H) by adding a set
of directed edges (arcs) H = {(v_{ij}, v_{i,j+1}) : 1 ≤ i ≤ k,
1 ≤ j < n_i}. This graph is called the extended alignment graph.
The weight function is extended to E ∪ H by assigning
weight 0 to all the edges in H. A cycle in G' is called
a mixed cycle if it contains at least one arc of H. In [19]
it has been shown that an edge set T ⊆ E is a trace in
G = (V, E) if and only if there is no mixed cycle in G' =
(V, T, H). This characterization can be strengthened by
replacing 'mixed cycle' with 'critical mixed cycle', which is
essentially a simple mixed cycle visiting each sequence
at most once.
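The trace criterion can be tested directly. The Python sketch below relies on an equivalent column-assignment view, which is our own reformulation and not the construction used in [19]. Letters joined by edges of T are merged into groups that must share a column. T is then a trace exactly when these groups can be ordered so that every arc of H points strictly to the right. This holds when the digraph that H induces on the groups has no directed cycle. A group that contains two consecutive letters of one sequence is rejected immediately.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Letter = Tuple[int, int]  # (sequence index i, position j in S_i), 0-based


def is_trace(lengths: List[int], T: List[Tuple[Letter, Letter]]) -> bool:
    """Decide whether the edge set T can be realized by a single alignment."""
    parent: Dict[Letter, Letter] = {
        (i, j): (i, j) for i, n in enumerate(lengths) for j in range(n)
    }

    def find(x: Letter) -> Letter:
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for u, v in T:                           # letters joined by T share a column
        parent[find(u)] = find(v)

    succ: Dict[Letter, Set[Letter]] = defaultdict(set)
    indeg: Dict[Letter, int] = defaultdict(int)
    comps = {find(x) for x in list(parent)}
    for i, n in enumerate(lengths):
        for j in range(n - 1):               # arc of H: s_ij precedes s_i,j+1
            a, b = find((i, j)), find((i, j + 1))
            if a == b:
                return False                 # two letters of S_i in one column
            if b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1

    # Kahn's algorithm: a topological order of the groups is a column order.
    queue = deque(c for c in comps if indeg[c] == 0)
    ordered = 0
    while queue:
        c = queue.popleft()
        ordered += 1
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    return ordered == len(comps)             # all ordered <=> no mixed cycle
```

For two sequences of length 2, the crossing pair {(s_11, s_22), (s_12, s_21)} is rejected because the groups form a directed 2-cycle.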

Figure 2 shows an alignment graph and its extended
alignment graph. Two critical mixed cycles are shown
by the dotted lines. A formulation in the form of an
integer linear program is now straightforward. Let G' =
(V, E, H) be an extended alignment graph. For every
edge e ∈ E we introduce a variable x_e ∈ {0, 1} indicating
whether e is in the trace or not. Let w_e denote the
alignment weight of edge e ∈ E. The formulation is as
follows:


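$$
\begin{aligned}
\max\ & \sum_{e \in E} w_e x_e \\
\text{s.t.}\ & \sum_{e \in C \cap E} x_e \le |C \cap E| - 1 && \forall \text{ critical mixed cycles } C \text{ in } G', \\
& x_e \in \{0, 1\} && \forall e \in E.
\end{aligned}
$$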


This integer linear program can be solved via 
a branch and cut approach based on polyhedral combi- 
natorics. Reinert et al. define the trace polytope as the 
convex hull of the characteristic vectors of all possi- 
ble traces of G = (V, E). Their investigation leads to 
some classes of tight inequalities which can be used 
in a branch and cut algorithm. An implementation of 
a branch and cut algorithm based on ABACUS [13] 
could optimally align 15 sequences (arising from prion 
proteins) of length 230 within 2.5 hours of computation 
time on a SparcStation Ultra 1/170. 

For two sequences, Reinert et al. have given a complete
description of the trace polytope. The set system
(E, 𝒯) = (E, {T : T ⊆ E is a trace in G}) is an independence
system, since ∅ ∈ 𝒯 and for any T_1 ∈ 𝒯 and T_2 ⊆ T_1, also
T_2 ∈ 𝒯. The minimal dependent subsets, or circuits, of
(E, 𝒯) have size two. Hence, (E, 𝒯) is a 2-regular
independence system. A set F ⊆ E is a clique in a 2-regular
system (E, 𝒯) if |F| ≥ 2 and all |F|(|F| − 1)/2 two-element
subsets of F are circuits of (E, 𝒯).

For two sequences, the trace polytope is completely
described by the trivial inequalities 0 ≤ x_e ≤ 1 and by
the clique inequalities Σ_{e∈C} x_e ≤ 1, where C is a maximal
clique in the independence system (E, 𝒯). Since the
separation problem for the clique inequalities can be
solved in polynomial time, this provides an algorithm
to solve the maximum weight trace problem for two
sequences in polynomial time.
An efficient combinatorial algorithm for solving the
maximum weight trace problem for two sequences has 
been given in [14]. Kececioglu has shown how to trans- 
form the problem into the heaviest increasing subse- 
quence problem. This can be solved in time O(n log n), 
where n is the length of the sequence [9]. 
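For two sequences, a trace is a set of edges (i, j) that is strictly increasing in both coordinates, which is why the circuits above have size two. The Python sketch below reduces the problem to a heaviest increasing subsequence. It assumes edges are given as triples (i, j, w) with 0-based letter positions and nonnegative weights. The Fenwick-tree implementation is our choice and is not necessarily Kececioglu's. It runs in O(|E| log |E|) time.

```python
from bisect import bisect_left
from typing import List, Tuple


def max_weight_trace_2(edges: List[Tuple[int, int, float]]) -> float:
    """Maximum weight trace for two sequences.

    edges: (i, j, w) aligns letter i of S1 with letter j of S2 (0-based), w >= 0.
    A trace is a set of edges strictly increasing in both i and j.
    """
    # Sort by i ascending and, for equal i, by j descending, so that a strictly
    # increasing subsequence in j never uses two edges with the same i.
    order = sorted(edges, key=lambda e: (e[0], -e[1]))
    js = sorted({j for _, j, _ in edges})   # coordinate compression of j
    size = len(js)
    tree = [0.0] * (size + 1)               # Fenwick tree of prefix maxima

    def query(pos: int) -> float:           # max over compressed ranks 1..pos
        best = 0.0
        while pos > 0:
            best = max(best, tree[pos])
            pos -= pos & -pos
        return best

    def update(pos: int, value: float) -> None:
        while pos <= size:
            tree[pos] = max(tree[pos], value)
            pos += pos & -pos

    best = 0.0
    for _, j, w in order:
        rank = bisect_left(js, j) + 1       # 1-based rank of j
        val = query(rank - 1) + w           # extend the best chain ending at a smaller j
        update(rank, val)
        best = max(best, val)
    return best
```

The tie-breaking order is the key design choice. Listing equal-i edges by decreasing j means two of them can never both appear in a strictly increasing chain.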

One can show that the maximum weight trace problem
is equivalent to the 2-level planarization problem
with two fixed levels arising in automatic graph drawing
(see the section ‘Level Planarization’). For further
information on multiple sequence alignment or on
combinatorial optimization problems in computational
biology, see, e.g., [23].


Leveled Crossing Minimization 


In automatic graph drawing the task is to compute 
a layout of a given graph that is easy to read and un- 
derstand. Applications include flow diagrams, organi- 
zation charts, entity-relationship diagrams, or PERT- 
diagrams. 

A common method for drawing directed graphs is, 
as a first step, to partition the vertices into a set of k lev- 
els so that no edges have their endpoints in the same 
level and most of the edges point downwards. This can 
easily be done using a depth-first-search or a topolog- 
ical sorting algorithm when the graph is acyclic. The 
crucial step is the second one. Given a k-leveled graph, 
the vertices within the levels need to be permuted so 
that the resulting drawing is easy to read [22]. A widely 
used criterion for a ‘nice’ drawing is a minimum num- 
ber of crossings. 

The k-level crossing minimization problem arises:
Given a k-leveled graph G = (V_1, ..., V_k, E), permute
the vertices within the levels so that the resulting
geometric representation of G contains a minimum
number of crossings.
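The objective is easy to evaluate for a fixed drawing. The Python sketch below counts the crossings between two adjacent levels. It assumes positions are given as dictionaries from vertex to left-to-right index, and that each edge is listed as (level-1 vertex, level-2 vertex). For a proper k-leveled graph, the total is the sum of this count over all pairs of consecutive levels.

```python
from typing import Dict, List, Tuple


def count_crossings(pos1: Dict[str, int], pos2: Dict[str, int],
                    edges: List[Tuple[str, str]]) -> int:
    """Crossings in a 2-level drawing with vertex positions pos1 (level 1) and pos2 (level 2).

    Each edge is (u, v) with u on level 1 and v on level 2. Two edges cross
    exactly when their endpoints appear in opposite orders on the two levels;
    edges sharing an endpoint never cross.
    """
    total = 0
    for a in range(len(edges)):
        u1, v1 = edges[a]
        for b in range(a + 1, len(edges)):
            u2, v2 = edges[b]
            if (pos1[u1] - pos1[u2]) * (pos2[v1] - pos2[v2]) < 0:
                total += 1
    return total
```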

The k-level crossing minimization problem is NP-hard,
even when the number of levels is two [5]. The
problem can be solved in polynomial time for 2-leveled
permutation graphs [21] and for 2-leveled trees [20].

An integer linear programming formulation has 
been given in [10]. This is the first approach attacking 
the k-level crossing minimization problem via polyhe- 
dral combinatorics. Recently (1999), P. Healy and A. 
Kuusik [8] have shown that the original formulation 
can be tightened using additional inequalities derived 
in the so-called vertex-exchange graph. They are able to
solve instances with up to 100 vertices, 100 edges, and 8 
levels to provable optimality. 

Figure 3 shows a k-level crossing minimum drawing 
of the face lattice graph shown in the first figure of this 
article. It turns out to be a symmetrical drawing. 

F. Shahrokhi et al. [20] have suggested algorithms
approximating the 2-level minimum crossing number
by a factor of O(log n log log n) and O(log² n), respectively,
for a certain class Γ of graphs. The approximation
algorithms are based on the relationship between the
2-level crossing minimization problem and the linear
arrangement problem. The linear arrangement problem
for a graph G = (V, E) is to find a bijection f: V → {1,
..., |V|} of minimum length. The length of the bijection
f is given by L_f = Σ_{vw∈E} |f(v) − f(w)|.
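The length L_f translates directly into code. The Python sketch below evaluates L_f for a given bijection, here a dictionary from vertex to position in 1..|V|. It also finds the minimum by brute force over all |V|! bijections. This is only meant for tiny illustrative graphs and is not an approximation algorithm.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def arrangement_length(f: Dict[str, int], edges: List[Tuple[str, str]]) -> int:
    """Length of a bijection f: V -> {1, ..., |V|}: sum of |f(v) - f(w)| over edges vw."""
    return sum(abs(f[v] - f[w]) for v, w in edges)


def min_linear_arrangement(vertices: List[str], edges: List[Tuple[str, str]]) -> int:
    """Exhaustive search over all bijections; exponential, for tiny graphs only."""
    return min(arrangement_length({v: k + 1 for k, v in enumerate(p)}, edges)
               for p in permutations(vertices))
```

For the path a-b-c, the identity arrangement has length 2, which is optimal. The arrangement a→1, c→2, b→3 has length 3.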

Shahrokhi et al. have shown that for a certain graph
class Γ the order of magnitude of the optimal number
of crossings is bounded from below and above,
respectively, by the minimum degree times the optimal
arrangement value and by the arboricity times the
optimal arrangement value. The arboricity a_G of a graph
G = (V_G, E_G) is defined as

$$
a_G = \max_{H} \left\lceil \frac{|E_H|}{|V_H| - 1} \right\rceil ,
$$

where the maximum is taken over all subgraphs H of G
with |V_H| ≥ 2. Equivalently, a_G is the minimum number
of edge-disjoint acyclic subgraphs needed to cover
G. The graph class Γ includes, e.g., connected 2-level
graphs G of degree at most a constant k with |E| ≥ (1
+ γ)|V|, where γ > 0 is fixed. It also includes regular
graphs, degree-bounded graphs, and genus-bounded
graphs that are not too sparse.
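The formula for a_G can be evaluated by brute force on small graphs. The Python sketch below does this. For a fixed vertex set, the induced subgraph has the most edges, so it suffices to maximize over vertex subsets with at least two vertices. The running time is exponential in |V|, so this is an illustration of the formula (the Nash-Williams characterization) and not a practical algorithm.

```python
from itertools import combinations
from math import ceil
from typing import List, Tuple


def arboricity(n: int, edges: List[Tuple[int, int]]) -> int:
    """Arboricity of a simple graph on vertices 0..n-1 via max over S of ceil(|E(S)| / (|S| - 1))."""
    best = 0
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            s = set(subset)
            m = sum(1 for u, v in edges if u in s and v in s)   # edges induced by S
            best = max(best, ceil(m / (size - 1)))
    return best
```

For the complete graph K_4 the maximum is attained by the whole vertex set, giving ⌈6/3⌉ = 2. Indeed, K_4 can be covered by two spanning trees.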
In practice, the crossing minimization problem for
k-leveled graphs is reduced to a series of 2-level crossing
minimization problems in the following way. In a
preprocessing step, we add artificial vertices to the levels L_i
for all the edges traversing L_i (i = 2, ..., k − 1). For
i = 1, ..., k − 1, we solve the 2-level crossing minimization
problem for the two adjacent levels L_i and L_{i+1} with L_i
fixed, repermuting the vertices on level L_{i+1}. Then we
go backward, fixing level L_i and repermuting the vertices
on level L_{i−1}, for i = k, k − 1, ..., 2. The heuristic
consists of repeating these two loops until no more
improvement is obtained.
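The two-loop scheme is sketched below in Python. The text does not fix which 2-level routine is used in each step, so this sketch substitutes the barycenter heuristic as a stand-in. Each free vertex is sorted by the mean position of its neighbours on the fixed level. The sketch assumes the graph is already proper (artificial vertices inserted) and that levels are given as lists of vertex names in left-to-right order. `layer_sweep` and the other names are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str]


def barycenter_order(fixed: List[str], free: List[str], edges: List[Edge]) -> List[str]:
    """Sort `free` by the mean position of each vertex's neighbours in `fixed`."""
    pos = {v: k for k, v in enumerate(fixed)}

    def key(v: str) -> float:
        nbrs = [pos[b] for a, b in edges if a == v and b in pos]
        nbrs += [pos[a] for a, b in edges if b == v and a in pos]
        # Vertices without neighbours keep their current index as key.
        return sum(nbrs) / len(nbrs) if nbrs else float(free.index(v))

    return sorted(free, key=key)


def crossings(upper: List[str], lower: List[str], edges: List[Edge]) -> int:
    """Number of crossings between two adjacent levels for the given orders."""
    pu = {v: k for k, v in enumerate(upper)}
    pl = {v: k for k, v in enumerate(lower)}
    segs = [(pu[a], pl[b]) for a, b in edges if a in pu and b in pl]
    segs += [(pu[b], pl[a]) for a, b in edges if b in pu and a in pl]
    return sum(1 for s in range(len(segs)) for t in range(s + 1, len(segs))
               if (segs[s][0] - segs[t][0]) * (segs[s][1] - segs[t][1]) < 0)


def total_crossings(levels: List[List[str]], edges: List[Edge]) -> int:
    return sum(crossings(levels[i], levels[i + 1], edges) for i in range(len(levels) - 1))


def layer_sweep(levels: List[List[str]], edges: List[Edge], max_rounds: int = 10):
    """Alternate downward and upward sweeps until the crossing count stops improving."""
    levels = [list(level) for level in levels]
    best = total_crossings(levels, edges)
    for _ in range(max_rounds):
        snapshot = [list(level) for level in levels]
        for i in range(1, len(levels)):              # downward: fix level above, repermute below
            levels[i] = barycenter_order(levels[i - 1], levels[i], edges)
        for i in range(len(levels) - 2, -1, -1):     # upward: fix level below, repermute above
            levels[i] = barycenter_order(levels[i + 1], levels[i], edges)
        now = total_crossings(levels, edges)
        if now >= best:                              # no improvement: keep previous orders
            return snapshot, best
        best = now
    return levels, best
```

On two levels [a, b] and [x, y] with edges a-y and b-x, the first downward pass swaps x and y and removes the single crossing.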

Unfortunately, the 2-level crossing minimization
problem in which the permutation of the vertices in
one level is fixed is also NP-hard [3]. Therefore, much
effort went into the design of efficient heuristics (see,
e.g., [1,22]). P. Eades and N. Wormald [3] have shown
that the drawings constructed using the median heuristic
have at most three times the optimal number of
crossings.
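A minimal Python sketch of the one-sided median heuristic follows. Each vertex of the free level is placed according to the median position of its neighbours on the fixed level. The sketch assumes every free vertex has at least one neighbour. It takes the lower median when the degree is even and breaks ties by keeping the input order. Eades and Wormald's analysis concerns their own precise definition, which is not reproduced here, so this illustrates the idea rather than their exact rule.

```python
from statistics import median_low
from typing import List, Tuple

Edge = Tuple[str, str]


def median_order(fixed: List[str], free: List[str], edges: List[Edge]) -> List[str]:
    """One-sided median heuristic: sort `free` by the median neighbour position in `fixed`.

    Assumes every vertex in `free` has a neighbour in `fixed`;
    ties keep the input order (Python's sort is stable).
    """
    pos = {v: k for k, v in enumerate(fixed)}

    def med(v: str) -> int:
        nbrs = [pos[b] for a, b in edges if a == v and b in pos]
        nbrs += [pos[a] for a, b in edges if b == v and a in pos]
        return median_low(nbrs)          # lower median when the degree is even

    return sorted(free, key=med)
```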

M. Jünger and P. Mutzel [12] transformed the 2-level
crossing minimization problem with a fixed level
into a linear ordering problem:

Any solution is completely specified by the fixed
permutation π_1 of V_1 and a permutation π_2 of V_2.
For k = 1, 2 let y^k_{ij} = 1 if π_k(i) < π_k(j), and 0 otherwise.
Thus π_k (k = 1, 2) is uniquely characterized by the vector
y^k ∈ {0, 1}^{n_k(n_k − 1)/2}. Let n_1 = |V_1|, n_2 = |V_2|,
m = |E|, and let N(v) = {w ∈ V : (v, w) ∈ E} denote the
set of neighbors of v ∈ V = V_1 ∪ V_2 in G. The number
of edge crossings between the levels V_1 and V_2 is given
by

$$
C(\pi_2) = \sum_{i=1}^{n_2-1} \sum_{j=i+1}^{n_2} \sum_{k \in N(i)} \sum_{l \in N(j)} \left( y^2_{ij}\, y^1_{lk} + y^2_{ji}\, y^1_{kl} \right).
$$

Let

$$
c_{ij} = \sum_{k \in N(i)} \sum_{l \in N(j)} y^1_{lk}
$$

denote the number of crossings among the edges adjacent
to i and j if π_2(i) < π_2(j). Then, since y^2_{ji} = 1 − y^2_{ij},

$$
C(\pi_2) = \sum_{i=1}^{n_2-1} \sum_{j=i+1}^{n_2} \left( c_{ij}\, y^2_{ij} + c_{ji} \bigl(1 - y^2_{ij}\bigr) \right)
= \sum_{i=1}^{n_2-1} \sum_{j=i+1}^{n_2} (c_{ij} - c_{ji})\, y^2_{ij} + \sum_{i=1}^{n_2-1} \sum_{j=i+1}^{n_2} c_{ji}.
$$

For n = n_2, y_{ij} = y^2_{ij}, and a_{ij} = c_{ij} − c_{ji}, we solve the
linear ordering problem

$$
\begin{aligned}
\min\ & \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij}\, y_{ij} \\
\text{s.t.}\ & 0 \le y_{ij} + y_{jk} - y_{ik} \le 1, && 1 \le i < j < k \le n, \\
& y_{ij} \in \{0, 1\}, && 1 \le i < j \le n.
\end{aligned}
\tag{LO}
$$

If z is the optimum value of the linear ordering problem,
then

$$
z + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} c_{ji}
$$

is the minimum number of crossings in the corresponding
2-level drawing. The constraints of (LO) guarantee that
the solutions correspond precisely to all permutations
π_2 of V_2.
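The reduction can be checked numerically on a small instance. The Python sketch below builds the numbers c_ij from a fixed order of V_1. It computes the minimum crossing number in two ways: directly, as the minimum over all permutations π_2 of Σ c_ij over pairs with i left of j, and as z + Σ_{i<j} c_ji, where z is the optimum of (LO). Both minimizations are done by brute force, so this is a consistency check rather than a branch and cut solver. Edges are assumed to be listed as (vertex of V_1, vertex of V_2).

```python
from itertools import combinations, permutations
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (u, v) with u in V1 and v in V2


def crossing_matrix(order1: List[str], v2: List[str], edges: List[Edge]) -> Dict[Tuple[str, str], int]:
    """c[i, j]: crossings among edges at i and j (both in V2) when i is left of j."""
    pos1 = {u: k for k, u in enumerate(order1)}
    nbrs = {v: [pos1[u] for u, w in edges if w == v] for v in v2}
    return {(i, j): sum(1 for pk in nbrs[i] for pl in nbrs[j] if pl < pk)
            for i in v2 for j in v2 if i != j}


def min_crossings_bruteforce(order1: List[str], v2: List[str], edges: List[Edge]) -> int:
    """Minimum over all permutations pi_2 of sum c_ij over pairs with i left of j."""
    c = crossing_matrix(order1, v2, edges)
    return min(sum(c[i, j] for i, j in combinations(p, 2)) for p in permutations(v2))


def via_linear_ordering(order1: List[str], v2: List[str], edges: List[Edge]) -> int:
    """z + sum_{i<j} c_ji, with z the optimum of (LO) found by brute force."""
    c = crossing_matrix(order1, v2, edges)
    pairs = list(combinations(v2, 2))          # i < j in the fixed numbering of V2
    const = sum(c[j, i] for i, j in pairs)

    def objective(p) -> int:                   # sum of a_ij * y_ij for the ordering p
        pos = {v: k for k, v in enumerate(p)}
        return sum(c[i, j] - c[j, i] for i, j in pairs if pos[i] < pos[j])

    z = min(objective(p) for p in permutations(v2))
    return z + const
```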

Jünger and Mutzel [12] have shown that the resulting
linear ordering problem can be solved efficiently via
a branch and cut algorithm for instances containing up 
to 250 vertices per level. Extensive computational ex- 
periments show that the running time of their exact 
branch and cut algorithm is comparable to that of the 
widely used heuristics for instances with up to 60 ver- 
tices per level. 

They combined this branch and cut algorithm with 
a branch and bound algorithm in order to achieve 
a practically efficient exact algorithm for the 2-level 
crossing minimization problem for instances with up 
to 15 vertices on the smaller level. Computational ex- 
periments have shown that the results achieved via 
heuristic methods are far from the optimum solution 
(see [12]). 

In Fig. 4 an example of a 2-level graph with 20 edges 
is shown. The first two drawings have been computed 
via heuristics (LR-heuristic and barycenter heuristic, 
respectively) and have 30 and 10 crossings, respectively. 
The third one is the optimum solution with only 4 
crossings. This is only one example showing that it is 
worth searching for better algorithms for leveled graph 
drawing. 
Level Planarization


Several authors have suggested minimizing the num- 
ber of edges ‘producing’ crossings instead of min- 
imizing the number of crossings in hierarchical 
drawings. 

Figure 5 shows two drawings of the same graph. 
Figure 5a) has been generated after first removing the 
edges that ‘produce’ crossings, and then drawing the re- 
maining graph without crossings before finally insert- 
ing the four edges again. Although the drawing shown 
in Fig. 5a) has 34 crossings, that is 41% more crossings 
than the crossing minimum drawing shown in Fig. 5b), 
the reader will not recognize this fact. 

This example motivates careful consideration of the
k-level planarization problem. A k-level graph G is level
planar if a k-level representation of G exists in which
no two edges cross except at common endpoints. Level
planarity can be tested in linear time [11]. Given a k-level
graph G = (V_1, ..., V_k, E) with weights w_e > 0 on
the edges, the k-level planarization problem is to extract
a level planar subgraph G' = (V_1, ..., V_k, F), F ⊆ E, of
maximum weight, i.e., such that the sum Σ_{e∈F} w_e is
maximum. The k-level planarization problem is NP-hard,
even for k = 2 levels.

So far (1999), no exact algorithm for the k-level pla- 
narization problem on k > 3 levels is known. For k = 2, 
Mutzel [17] has presented a branch and cut algorithm 
based on polyhedral studies of the associated polytope. 
The integer programming formulation for the 2-level 
planarization problem is based on the following char- 
acterization of 2-level planar graphs (first presented 
in [7]). 

A 2-level graph is 2-level planar if and only if it con- 
tains no cycle and no double claw. 

Figure 6 shows a double claw (Fig. 6a) and a cy- 
cle (Fig. 6b). Although both graphs are planar bipar- 
tite graphs, they are not 2-level planar (see Fig. 6c) for 
the cycle). For double claw free graphs, the 2-level pla- 
narization problem is equivalent to the maximum for- 
est subgraph problem that can be solved via a simple 
greedy algorithm. Shahrokhi et al. [20] have given a lin- 
ear time algorithm for solving the 2-level planarization 
problem for 2-level acyclic graphs. 
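The characterization above yields a simple test. The Python sketch below assumes that a double claw is the star K_{1,3} with every edge subdivided once, and that the input is a simple graph. Under that assumption, a forest contains no double claw exactly when each of its trees is a caterpillar, that is, when deleting all leaves leaves a path. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Tuple


def is_two_level_planar(edges: List[Tuple[Hashable, Hashable]]) -> bool:
    """Test 'no cycle and no double claw' for a simple 2-level (bipartite) graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Forest test: a simple graph is a forest iff |E| = |V| - (number of components).
    seen, components = set(), 0
    for s in adj:
        if s in seen:
            continue
        components += 1
        stack = [s]
        seen.add(s)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    if len(edges) != len(adj) - components:
        return False
    # Caterpillar test: after deleting leaves, no vertex keeps more than 2 neighbours.
    spine = {v for v in adj if len(adj[v]) > 1}
    return all(len(adj[v] & spine) <= 2 for v in spine)
```

A cycle fails the forest test. A double claw fails the caterpillar test, because its center keeps three non-leaf neighbours.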

In order to get an integer linear programming formulation
for the 2-level planarization problem, we introduce
variables for all edges e ∈ E of the given 2-level
graph G = (V_1, V_2, E). For any set P ⊆ E of edges we
define a characteristic vector χ^P ∈ ℝ^{|E|} whose ith
component χ^P(e_i) has value 1 if e_i ∈ P, and 0 otherwise.
Any 0-1 vector x^T = (x_{e_1}, ..., x_{e_{|E|}}) that is the
characteristic vector of a 2-level planar graph satisfies the
following inequalities:

$$
\begin{aligned}
& \sum_{e \in C} x_e \le |C| - 1 && \forall \text{ cycles } C \subseteq E, \\
& \sum_{e \in T} x_e \le |T| - 1 && \forall \text{ double claws } T \subseteq E, \\
& x_e \in \{0, 1\} && \forall e \in E,
\end{aligned}
$$

and vice versa. Hence, solving the integer linear program
max Σ_{e∈E} w_e x_e subject to the constraints described above
yields the solution of the maximum 2-level planar
subgraph problem for a given graph G = (V_1, V_2, E)
with weights w_e on the edges e ∈ E.
The computational experiments presented in [17]
show that although most of the instances (from 20 to 
100 vertices) could not be solved to optimality, solu- 
tions could be obtained which are provably within 5% 
of the optimum value. 

As in the 2-level crossing minimization problem, 
the case in which the permutation in one of the two 
levels is fixed has also been investigated [18]. Unfortu- 
nately, this version of the problem is also NP-hard [2]. 

While the fixed version of the crossing minimization
problem turned out to be ‘easier’ to attack in practice
than the free version, unfortunately, this is not true
for the one level fixed version of the 2-level planarization
problem. The integer programming formulation
suggested in [18] contains variables y_{ij} ∈ {0, 1} for
i < j, i = 1, ..., |V_2| − 1, j = i + 1, ..., |V_2|, coding the
permutation π_2 of the vertices in level V_2, and variables
x_e ∈ {0, 1} for e ∈ E coding the edges contained in the
subgraph. In the following integer linear programming
formulation for the one level fixed 2-level planarization
problem, let (p, i) and (q, j) be edges in E with p, q ∈ V_1
and i, j ∈ V_2:

$$
\begin{aligned}
\max\ & \sum_{e \in E} w_e x_e \\
\text{s.t.}\ & 0 \le y_{ij} + y_{jk} - y_{ik} \le 1, && 1 \le i < j < k \le |V_2|, \\
& y_{ij} \in \{0, 1\}, && 1 \le i < j \le |V_2|, \\
& y_{ij} + x_{(p,i)} + x_{(q,j)} \le 2, && \pi_1(q) < \pi_1(p),\ i < j, \\
& -y_{ij} + x_{(p,i)} + x_{(q,j)} \le 1, && \pi_1(p) < \pi_1(q),\ i < j, \\
& x_e \in \{0, 1\}, && \forall e \in E.
\end{aligned}
$$

The first two constraints require the variable vector y to
respect a linear ordering π_2. The last three constraints
ensure that no crossings are introduced with respect to
the ordering π_2 coded by y.

Mutzel and R. Weiskircher have shown that all the 
tight inequalities of the linear ordering polytope (see, 
e.g., [6]) are still tight inequalities for the polytope in- 
vestigated here. In particular, all the knowledge about 
the linear ordering polytope can be used in order to 
get a practically efficient algorithm. Hence, the prob- 
lem considered here is tightly connected with the 2- 
level crossing minimization problem in which the per- 
mutation of one level is fixed. 
Figure 7 shows two drawings computed via iteratively
solving the one level fixed 2-level crossing minimization
problem and the 2-level planarization problem with
one fixed level. In both drawings, the subproblems have
been computed optimally.

A comparison of both drawings shows that it is 
worthwhile to also consider planarization and not only 
crossing minimization in hierarchical graph drawing. 
The ultimate goal is to solve these problems not lev- 
elwise but in one step. This will improve hierarchical 
drawings tremendously (see [12]). The authors in [18] 
conjecture that it will be easier to solve practical in- 
stances of the k-level planarization problem than of the 
k-level crossing minimization problem. 

The question arises whether the 2-level planariza- 
tion problem can be solved efficiently when the permu- 
tations of the vertices in both levels are fixed. This prob- 
lem, however, is equivalent to the maximum weight 
trace problem for two sequences (see the section ‘Multi- 
ple Sequence Alignment’). The transformation in both 
directions is indicated in Fig. 8. 

If the graph shown in Fig. 8a) is an instance of 
a fixed 2-level planarization problem, it is equivalent to 
solving the maximum weight trace problem on the in- 
stance shown in Fig. 8b). In the case that Fig. 8a) shows 
an instance of the maximum weight trace problem, it 
is equivalent to solving the fixed 2-level planarization 
problem in the graph shown in Fig. 8c). 

It follows that the fixed 2-level planarization prob- 
lem can be solved in polynomial time. Moreover, we 
know the complete description of the associated poly- 
tope (see also [18]). This information is very helpful for 
solving the 2-level planarization problem in which the 
permutation of one level is fixed. 
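The polynomial solvability can also be seen through a direct reduction. This reduction is our own and is not the transformation of Fig. 8. With both orders fixed, two edges (p, i) and (q, j) cross only if their endpoint orders disagree strictly. A crossing-free edge set is therefore a chain that is non-decreasing in both coordinates, since edges sharing an endpoint may coexist. The Python sketch below finds a heaviest such chain. It assumes edges are given as (position on V_1, position on V_2, weight) without duplicates.

```python
from bisect import bisect_right
from typing import List, Tuple


def fixed_two_level_planarization(edges: List[Tuple[int, int, float]]) -> float:
    """Max weight of a crossing-free edge subset when both level orders are fixed.

    edges: (p, i, w) with p the position on V1, i the position on V2, w >= 0.
    """
    order = sorted(edges, key=lambda e: (e[0], e[1]))   # a linear extension of the product order
    js = sorted({i for _, i, _ in edges})
    size = len(js)
    tree = [0.0] * (size + 1)                            # Fenwick tree of prefix maxima

    def query(pos: int) -> float:                        # max over ranks 1..pos
        best = 0.0
        while pos > 0:
            best = max(best, tree[pos])
            pos -= pos & -pos
        return best

    def update(pos: int, value: float) -> None:
        while pos <= size:
            tree[pos] = max(tree[pos], value)
            pos += pos & -pos

    best = 0.0
    for _, i, w in order:
        rank = bisect_right(js, i)                       # 1-based rank of i
        val = query(rank) + w                            # extend a chain ending at a position <= i
        update(rank, val)
        best = max(best, val)
    return best
```

The only difference from the trace version for two sequences is the weak inequality. That difference is why the equivalence in Fig. 8 has to modify the graph.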

Besides the application in automatic graph drawing, 
the 2-level planarization problem comes up in compu- 
tational biology. In DNA mapping, small fragments of 
DNA have to be ordered according to the given over- 
lapping data and some additional information. M.S. 
Waterman and J.R. Griggs [24] have suggested combin- 
ing the information derived by a digest mapping exper- 
iment with the information on the overlap between the 
DNA fragments. If the overlapping data is correct, the 
maps can be represented as a 2-level planar graph. But,
in practice, the overlapping data may contain errors. 
Hence, they suggested solving the 2-level planarization 
problem (see also [23]). Furthermore, the 2-level pla- 
narization problem arises in global routing for row- 
based VLSI layout (see [16]). 


See also 


> Graph Planarization 
> Integer Programming 
By allowing physicians to peer into the human body,
medical imaging technologies provide vital informa- 
tion for medical diagnosis, treatment and research. Be- 
yond the mere visualization of anatomical structures, 
these imaging technologies are now being used in such 
roles as surgical planning, cancer diagnosis and prog- 
nosis, intra-operative navigation, radiotherapy treat- 
ment planning, and the tracking of disease progression. 
A key component in the effectiveness of medical imag- 
ing technologies is the development of sophisticated 
computer algorithms, which extract and analyze use- 
ful information. Such algorithms enable reliable and re- 
peatable quantitative data to be extracted in order to 
support accurate medical diagnosis and treatment as 
well as clinical research. The development of such im- 
age analysis algorithms remains a rich area of research, 
involving numerous applications of optimization. 

Medical images are generated by a variety of technologies.
Among the most widely used are X-ray computed
tomography (CT), emission computed tomography,
magnetic resonance imaging (MRI), biomagnetic
source imaging, ultrasound, and digital subtraction
angiography. (See [1] for a tutorial on these medical
imaging technologies.) Images from these machines are
typically stored as two-, three-, or four-dimensional arrays
of data elements, corresponding to two or three spatial
coordinates, and possibly a temporal coordinate. These
data elements are generally referred to as pixels, but are
also called voxels in three dimensions and hypervoxels
in four dimensions. The values of these elements can be
scalars, such as tissue density, or vectors, such as
relaxation time pairs in magnetic resonance imaging.

A central issue in extracting information from med- 
ical images is the problem of segmenting the image, ei- 
ther by identifying boundaries of critical structures in 
the image (spatial segmentation), or by classifying pixels
according to some set of features, such as texture or
gray levels (feature segmentation). In the first case, the 
image is segmented spatially into a number of regions 
which correspond to anatomical objects, such as or- 
gans. These regions typically form connected sets with 
well-behaved boundaries. In the second case, spatial in- 
formation plays little or no role. For example, microcal- 
cifications, which may be an early sign of breast cancer, 
may be scattered throughout otherwise healthy tissue. 
In detecting such microcalcifications, the pixels of an 
image would be classified either as healthy tissue or as 
microcalcifications. In this case, the pixel classes are not 
connected spatially, and can have very complex bound- 
aries. 

In either case, segmenting medical images requires 
the integration of low-level information about the im- 
age along with a priori high-level information about 
what the image represents. Extracting low-level infor- 
mation involves detecting features such as edges or tex- 
tures. Once these features are detected, they can then 
be combined with high-level information either in the 
form of an expert system or some mathematical model 
of the knowledge. This integration task is made more 
difficult by the fact that medical images tend to suf- 
fer from sampling artifacts, spatial aliasing and noise, 
which can cause boundaries of structures to be indis- 
tinct and disconnected. 


Low-Level Feature Detection 


Numerous algorithms have been developed for identi- 
fying low-level features of images. The most common 
are edge-detection methods and texture transforms. 
Edge detection methods work by applying a digital fil- 
ter, which emphasizes edges in the image. As an exam- 
ple, vertical edges can be detected by convolving the im- 
age with the Sobel edge filter 


$$
\begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix}.
$$


After applying this filter, the resulting image will have 
values close to zero, except at vertical edges, which will 
have values which are relatively large in magnitude. By 
changing the elements of the filter, edges with different 
orientations can be detected. 
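The filtering step can be reproduced in a few lines. The Python sketch below performs a plain 2D convolution with the kernel flipped, as convolution requires, and keeps only the 'valid' region where the 3×3 kernel fits entirely inside the image. The image is assumed to be a list of rows of gray values. Real systems would use an optimized library routine instead.

```python
from typing import List

SOBEL_X = [[1, 0, -1],
           [2, 0, -2],
           [1, 0, -1]]


def convolve2d_valid(image: List[List[float]], kernel: List[List[float]]) -> List[List[float]]:
    """2D convolution (kernel flipped) over the region where the kernel fits inside the image."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(flipped[a][b] * image[r + a][c + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out
```

For an image whose rows are 0, 0, 0, 10, 10, 10, the response is zero in the flat parts and large (40) in the two columns straddling the vertical step.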
Another important low-level feature is texture.
Generally speaking, this is a measure of image coarse- 
ness, smoothness and regularity. Numerous texture 
transforms have been proposed, including statistical, 
spectral and structural measurements. These trans- 
forms quantify various attributes of texture at every 
pixel in the image. Thus, after applying the texture 
transforms, each image pixel is represented by a vector 
of numbers which represent the texture. 
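As a small example of a statistical texture transform, the Python sketch below represents each pixel by the mean and variance of the gray values in a square window around it. The window radius is a hypothetical parameter, and the window is clipped at the image border. Practical texture transforms use richer statistics, but the output has the same shape: one feature vector per pixel.

```python
from typing import List, Tuple


def local_stats(image: List[List[float]], radius: int = 1) -> List[List[Tuple[float, float]]]:
    """Per-pixel texture vector (local mean, local variance) over a clipped square window."""
    h, w = len(image), len(image[0])
    out: List[List[Tuple[float, float]]] = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [image[i][j]
                    for i in range(max(0, r - radius), min(h, r + radius + 1))
                    for j in range(max(0, c - radius), min(w, c + radius + 1))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            row.append((mean, var))
        out.append(row)
    return out
```

A flat region yields zero variance everywhere, while a fine checkerboard pattern yields a positive variance.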

Edge detection methods and texture transforms are
described in detail in most recent image processing
textbooks (see, for example, [7]).


Spatial Segmentation 


Once low-level features of the image have been ex- 
tracted, they can then be combined with high-level in- 
formation to segment the image. In the case of spa- 
tial segmentation, this generally involves determining 
boundaries between various regions of interest. Figure 1 
provides an example of a CT scan of a human abdomen 
which has been partially segmented spatially. In this 
picture, boundaries of critical organs have been high- 
lighted in white. 

More than a thousand different algorithms have 
been proposed for automatically segmenting images. 
Among the techniques used in these algorithms are 
thresholding, region splitting, region merging, clus- 
tering, multiscale analysis, surface fitting, rule-based 
expert systems, relaxation, and deformable models. 


Optimization in Medical Imaging, Figure 1 
Partially segmented CT image of human abdomen. Organ 
boundaries are shown in white 


A common theme underlying many of these techniques 
is the minimization of an energy function which in some 
way measures the quality of the segmentation. 


Deformable Models 


A spatial segmentation technique which has attracted 
much recent attention, particularly in the medical field, 
is the use of deformable models. Deformable models 
have the capability of combining low-level information 
derived from the image data with high-level knowl- 
edge about the characteristics of anatomical structures. 
An extensive survey of deformable models in medi- 
cal image processing is given in [6]. Deformable mod- 
els in the form of snakes were first introduced by M. 
Kass, A. Witkin, and D. Terzopoulos [5] to segment 
contour objects in 2D images. Deformable models are 
curves or surfaces, often defined using splines, which 
are controlled by an energy function. The energy func- 
tion has two major components: the external energy, 
which measures how well the spline matches image fea- 
tures, such as edges, and the internal energy which mea- 
sures nonaffine deformations from some model curve. 
By minimizing the sum of these two energy functions, 
a reasonable balance is achieved between matching im- 
age features and determining boundaries which are 
consistent with the a priori knowledge. 

In two dimensions, the internal energy of a curve
c(s) can be determined by the formula

$$
E = \int_s \left( E_{\mathrm{elasticity}}(c(s)) + E_{\mathrm{bending}}(c(s)) \right) ds ,
$$

where s is the arclength of the curve, E_elasticity represents
the energy due to the elasticity of the spline, and E_bending
represents the energy due to bending. Minimizing this
internal energy results in a curve which is as close as
possible to the original shape of the model curve.
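The text leaves E_elasticity and E_bending unspecified. A common concrete choice, following the classical snake formulation, takes them proportional to |c'(s)|² and |c''(s)|². The Python sketch below discretizes this choice for a closed polygon using finite differences. The weights `alpha` and `beta` are hypothetical parameters. This simple version rewards smoothness and short length rather than closeness to a specific model curve.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def internal_energy(points: List[Point], alpha: float = 1.0, beta: float = 1.0) -> float:
    """Discrete internal energy of a closed contour (polygon) given by its points."""
    n = len(points)
    energy = 0.0
    for k in range(n):
        xp, yp = points[k - 1]                 # previous point (wraps around)
        x, y = points[k]
        xn, yn = points[(k + 1) % n]           # next point (wraps around)
        elastic = (xn - x) ** 2 + (yn - y) ** 2                    # |c'|^2, forward difference
        bending = (xp - 2 * x + xn) ** 2 + (yp - 2 * y + yn) ** 2  # |c''|^2, second difference
        energy += alpha * elastic + beta * bending
    return energy
```

For the unit square, each side contributes 1 to the elastic term and each corner contributes 2 to the bending term, for a total of 12 with unit weights.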

The external energy of a curve is determined by how 
well the curve matches image features. In particular, the 
external energy is minimized when the curve is aligned 
with edges or when textures on either side of the curve 
are different. 

Figure 2 and Fig. 3 give an example of the use of snakes to segment an image representing a breast tissue sample [9]. In Figure 2, an initial estimate of the cell boundaries is entered by a technician using a mouse to define approximate contours of cell boundaries. After these snakes are initialized, a greedy algorithm is used to achieve a local minimum of the energy function. If the energy function value at a particular snake point can be lowered by moving the point to an adjacent pixel, then the point is moved. The process is repeated for each point until all points settle into a local minimum of the energy function. The result is shown in Fig. 3.

Optimization in Medical Imaging, Figure 2
Initialization of snake contours for cell boundaries (Copied with permission from [9])

Optimization in Medical Imaging, Figure 3
Cell contours minimizing energy (Copied with permission from [9])
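The greedy step described above can be sketched as follows, assuming the snake points lie on an integer pixel grid and that `energy` is any callable returning the total (internal plus external) energy of the whole contour; `greedy_snake` and `toy_energy` are hypothetical names. Each sweep visits every point, tries its eight adjacent pixels and keeps the move with the lowest energy; the loop stops when a full sweep moves no point, i.e., at a local minimum in this neighborhood sense.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int]

def greedy_snake(points: List[Pixel],
                 energy: Callable[[List[Pixel]], float],
                 max_sweeps: int = 100) -> List[Pixel]:
    """Move each snake point to the adjacent pixel that lowers the energy most."""
    pts = list(points)
    # Offsets of the 8 adjacent pixels of a point.
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(pts)):
            x, y = pts[i]
            best_p, best_e = (x, y), energy(pts)
            for dx, dy in offsets:
                pts[i] = (x + dx, y + dy)        # tentatively move point i
                e = energy(pts)
                if e < best_e:
                    best_p, best_e = pts[i], e
            pts[i] = best_p                       # keep the best position found
            moved = moved or best_p != (x, y)
        if not moved:                             # no point moved: local minimum
            break
    return pts

# Hypothetical toy energy that pulls every point toward pixel (3, 4).
def toy_energy(pts: List[Pixel]) -> float:
    return float(sum((x - 3) ** 2 + (y - 4) ** 2 for x, y in pts))

print(greedy_snake([(0, 0), (6, 8)], toy_energy))  # [(3, 4), (3, 4)]
```

In a real segmentation, `energy` would combine internal terms with an image term that is low on edges.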

Early implementations of deformable models pos- 
sessed only generic a priori knowledge (for example, 
smoothness criteria) of what region boundaries should 
look like. More sophisticated tools, called deformable templates [8], which incorporate more specific a priori information with respect to expected shapes and their spatial relationships, have also been studied.


Optimization Techniques 
for Minimizing the Energy Function 


The energy function governing the deformable mod- 
els or templates is, in general, a nonconvex function. 
Worse, in some cases, this energy function may actu- 
ally be discontinuous. Consequently, local optimization 
techniques are usually inadequate unless the initializa- 
tion of the deformable models is very accurate. This 
generally requires human intervention to provide an 
acceptable initialization. 

Alternatively, global optimization techniques have 
been considered for minimizing the energy function. 
Among the more frequently used algorithms in this arena are simulated annealing (cf. ▶ Simulated annealing) and genetic algorithms (cf. ▶ Genetic algorithms).
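To illustrate the idea behind simulated annealing, the sketch below minimizes a nonconvex one-dimensional test function, a hypothetical stand-in for a snake energy (which in practice depends on all contour points). Moves are random perturbations of size up to `step`; uphill moves are accepted with the Metropolis probability $e^{-\Delta/T}$, and the temperature T decreases geometrically. The parameter values are illustrative assumptions, and the method does not guarantee reaching the global minimum in a finite run.

```python
import math
import random
from typing import Callable, Tuple

def simulated_annealing(f: Callable[[float], float], x0: float,
                        step: float = 0.5, t0: float = 10.0,
                        cooling: float = 0.995, iters: int = 5000,
                        seed: int = 0) -> Tuple[float, float]:
    """Return the best (x, f(x)) visited by a simple annealing run."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(iters):
        y = x + rng.uniform(-step, step)          # random neighbouring state
        fy = f(y)
        # Metropolis rule: accept downhill moves always, uphill with prob. exp(-delta/T).
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                              # geometric cooling schedule
    return best_x, best_f

# Nonconvex test function with many local minima; global minimum 0 at x = 0.
def rastrigin_1d(x: float) -> float:
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

print(simulated_annealing(rastrigin_1d, x0=4.0))
```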


Feature Segmentation 


In applications where regions of interest are not ex- 
pected to be spatially connected, it is more natural 
to segment according to features. To accomplish this, 
a number of features are evaluated at each pixel of the 
image. These features might include, for example, gray 
levels or texture transforms. The collection of measure- 
ments for a pixel forms a vector in n-dimensional fea- 
ture space, where n is the number of features evaluated. 
It is then possible to classify pixels according to where 
they are located in feature space. 
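For concreteness, the sketch below builds a two-dimensional feature vector for every pixel of a gray-level image: the gray level itself and the mean gray level over the pixel's 3 × 3 neighborhood (clipped at the image border), a crude stand-in for a texture measure. The choice of features and the function name `pixel_features` are assumptions for illustration; any list of features yields points in $\mathbb{R}^n$ in the same way.

```python
from typing import List

def pixel_features(image: List[List[float]]) -> List[List[float]]:
    """One feature vector [gray level, 3x3 local mean] per pixel, in row-major order."""
    h, w = len(image), len(image[0])
    features = []
    for r in range(h):
        for c in range(w):
            # Neighborhood window, clipped at the image border.
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            features.append([image[r][c], sum(window) / len(window)])
    return features

print(pixel_features([[0.0, 0.0], [0.0, 4.0]]))
# [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [4.0, 1.0]]
```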


Clustering 


One approach to classifying points in feature space is called clustering. Here, the m points representing the image are assigned to a predetermined number k of clusters. The problem can be stated more explicitly as that of determining k centers in $\mathbb{R}^n$ such that the sum of the distances of each point to the nearest center is minimized. This amounts to solving the following minimization problem:

$$\min_{C_1,\dots,C_k} \sum_{i=1}^{m} \min_{j=1,\dots,k} \left\| x_i - C_j \right\|,$$

where $C_j$, j = 1, ..., k, represent the centers and $x_i$, i = 1, ..., m, are the feature vectors for the m points. When the norm is the 2-norm (used in squared form), this problem is addressed by the k-means algorithm [4], whereas when the norm is the 1-norm, the problem reduces to a bilinear program which is solved by the k-median algorithm [3]. Both algorithms generally stop at a local, not necessarily global, minimum.
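Both algorithms alternate between assigning each point to its nearest center and recomputing the centers. The sketch below follows that scheme under stated assumptions: with the squared 2-norm the new center is the cluster mean (k-means), and with the 1-norm it is the coordinate-wise median, which minimizes the sum of 1-norm distances within a cluster (k-median). Initial centers are supplied by the caller, and `cluster` is a hypothetical name; it is not the code of [3] or [4].

```python
import statistics
from typing import List, Sequence

Vector = List[float]

def cluster(points: Sequence[Vector], centers: Sequence[Vector],
            l1: bool = False, max_iter: int = 100) -> List[Vector]:
    """Lloyd-type iteration: k-means (l1=False) or k-median (l1=True)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        groups: List[List[Vector]] = [[] for _ in centers]
        for p in points:
            if l1:
                dist = [sum(abs(a - b) for a, b in zip(p, c)) for c in centers]
            else:
                dist = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dist.index(min(dist))].append(p)
        # Update step: coordinate-wise median (1-norm) or mean (squared 2-norm).
        update = statistics.median if l1 else statistics.mean
        new_centers = [
            [float(update(coord)) for coord in zip(*g)] if g else c  # empty cluster keeps its center
            for c, g in zip(centers, groups)
        ]
        if new_centers == centers:   # centers unchanged: iteration has converged
            break
        centers = new_centers
    return centers

pts = [[0.0], [1.0], [10.0]]
print(cluster(pts, [[0.0]]))           # mean of the three points (11/3)
print(cluster(pts, [[0.0]], l1=True))  # [[1.0]], the median
```

The example shows why the norm matters: the outlying point at 10 pulls the mean far more than the median.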

Clustering algorithms are unsupervised classifica- 
tion schemes in that no a priori knowledge is used to 
determine the classification. However, the choice of the 
number of clusters k can play a significant role. If k is 
too large, it may be difficult to attach meaning to the 
clusters. If, on the other hand, k is too small, critical information may be missed; for example, microcalcifications may be clustered together with healthy tissue.


Supervised Classification 


Another approach to classifying pixels is to partition 
feature space into a number of regions according to 
some a priori knowledge. This knowledge can be pro- 
vided by a training set of data taken from a large col- 
lection of images. Each element of the training data in- 
cludes the vector of features, along with the known clas- 
sification of that vector. These data elements can be par- 
titioned into a collection of subsets, with each subset 
corresponding to a unique classification. 

With this training data, a discriminant function can 
be constructed that distinguishes between the different 
classifications. If the conical hulls of the subsets cor- 
responding to each classification are disjoint, then the 
subsets can be discriminated by a piecewise-linear func- 
tion, which can be calculated by solving a single linear 
program (cf. also ▶ Linear programming) [2]. Otherwise, a decision tree can be constructed by solving a finite sequence of linear programs.

Once the discriminant function has been deter- 
mined, new images can be segmented by evaluating the 
features of each pixel, and then applying the discrimi- 
nant function to classify the pixel. 
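Applying a trained piecewise-linear discriminant pixel by pixel can be sketched as below. The sketch assumes one common form, in which class j has an affine function $f_j(x) = w_j^\top x - \gamma_j$ and a pixel is assigned to the class whose function value is largest. The weights and offsets shown are hypothetical placeholders; in the approach described above they would come from the solution of the linear program.

```python
from typing import List, Sequence

def classify(x: Sequence[float], weights: Sequence[Sequence[float]],
             offsets: Sequence[float]) -> int:
    """Return the class index j maximizing w_j . x - gamma_j."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) - g
              for w, g in zip(weights, offsets)]
    return scores.index(max(scores))

def segment(features: Sequence[Sequence[float]],
            weights: Sequence[Sequence[float]],
            offsets: Sequence[float]) -> List[int]:
    """Label every pixel's feature vector with a class index."""
    return [classify(f, weights, offsets) for f in features]

# Hypothetical two-class discriminant on 2-D feature vectors.
W = [[1.0, 0.0], [-1.0, 0.0]]
gamma = [0.0, 0.0]
print(segment([[2.0, 0.0], [-1.0, 5.0]], W, gamma))  # [0, 1]
```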


Summary 


These applications give a sense of the applicability of 
optimization techniques to medical image processing. 
Global optimization methods, such as simulated an- 
nealing and genetic algorithms, are used to minimize 
energy functions corresponding to deformable mod- 
els; bilinear programs arise in clustering approaches to 
image segmentation; and linear programs arise in the 
training of image classification schemes. 


See also 

▶ Entropy Optimization: Shannon Measure of Entropy and Its Properties

▶ Maximum Entropy Principle: Image Reconstruction
From the 1960s onwards, the computer-assisted solution of the problem of distributing the load of an electric power system among all generation units in operation (economic dispatch) attracted the attention of load dispatchers (persons responsible for the economic and secure operation of a power system) in utilities and of engineering scientists [12]. As computer hardware developed toward high internal speed, large main storage and low cost, different optimization methods have been applied to increasingly complex operation planning models of power systems to better reflect their reality [3,4,10,13]. Today, the operation planning problem is of high economic importance to the electric power industry in the presence of deregulation measures, privatization, and an emerging competitive market environment. This dramatic move to a market-based industry will significantly change power system operations. New pricing mechanisms (e.g., spot markets, auctions) will be put in place and will require the fast solution of new optimization problems to economically plan the operation of future power systems under the new market conditions.

This article therefore outlines the basic principles of how to model and optimize a power system in order to plan its operation economically.


Problem Definition 


The objectives of the operation planning problems of 
electric power systems are dependent on their plan- 
ning period. Short-term planning problems cover a day, 
a weekend or a week and are designed to solve 
the scheduling of generation units and contracts in 
the most economic way possible. Long-term planning 
problems cover a planning period of a season or a year 
and more and must allocate the resources available 
(e. g., fuels, storage water, contracted energy amounts) 
to smaller time periods. Medium-term problems, covering more than a week up to a month, often have to be solved in the presence of small hydro-reservoirs or of fuel amounts that must be used within short intervals. These three planning problems with different objectives are not independent of each other. They form a hierarchical model system (long-term, medium-term, short-term) to be run continuously to plan the operation economically from a day to a year.

Electrical energy is not storable in large amounts. Therefore, the load dispatcher of a power system must balance its load $P_l(t)$ and the generation $P_g(t)$, including energy purchase contracts $P_c(t)$, at each point in time, as expressed by equation (1):

$$P_g(t) + P_c(t) = P_l(t). \quad (1)$$

$P_l(t)$ is defined by the load curve (total power system load over the time of day) and is given as input data representing constant load values for short time intervals (e.g., 1 hour), called time-steps in modeling terms. This simplification replaces the continuous load curve by a stepwise representation. The transformation into a stepwise load curve (Fig. 1) is necessary to be able to apply mathematical programming optimization methods.

The power system load $P_l[t]$ (the time-discretized representation of the load curve, indicated by [t]) is covered partially by an energy purchase contract and by generators located in different power plants.

There are two main types of power plants: thermal plants (including nuclear plants) and hydro plants, distinguished by the energy input (primary energy) they use to produce electricity (secondary energy). Thermal plants use coal, oil and gas, whereas hydro plants need the water of rivers or water stored in reservoirs as input. Moreover, thermal generation units (a generator in a thermal plant) are subject to different operational conditions than hydro units. These different operational conditions must be taken into account to correctly model the different types of generation units.

Optimization in Operation of Electric and Energy Power Systems, Figure 1
Load curve representation (stepwise load $P_l(t)$ over time t)

Thermal generation units have, in a first approach, unlimited access to their fuels (coal, oil, gas), which must be paid for. The operational costs of thermal units are defined by the fuel cost curve between a minimal and a maximal electric power output. Moreover, their start-up or shut-down procedure lasts a number of hours and also causes procedure costs. Finally, their electric power output can only be changed slowly, with a small time gradient.

There are also two types of hydro plants to be distinguished. Firstly, there are run-of-river plants, which are located on a river. They generate electricity from the (no-cost) water carried by the river. Therefore, their electric power output is constant over longer time periods (e.g., a day) and changes only when the water flow of the river changes. They offer no degree of freedom for the optimization, but a run-of-river plant must be taken into account to compute correct generation schedules that load dispatchers will accept.

Secondly, there are storage plants, which can only use a small amount of their reservoir-stored water during the planning period. This condition makes the model dynamic in time and requires special attention when modeling a storage plant. Fast changes of their electric power output are possible within their generation limits. This is necessary to follow large differences in the power system load between consecutive time-steps.

Sometimes, storage plants also have pumping units, which transport water from a lower reservoir to a higher one at times when the energy costs needed to pump the water are low (e.g., during the night). When the energy costs are high, the pumped water is released from the upper reservoir to generate electricity by running into the lower one. This pumping possibility makes the operation of a power system more flexible but also more complex. For simplicity, pumping units are therefore not considered here.

The last important power system element that con- 
tributes to cover the power system load is the energy 
purchase contract with an outside utility. Such a con- 
tract is defined by the costs for the delivered energy 
and is limited by a maximum power and often by a cer- 
tain amount of energy allowed to be purchased during 
a given time period (e. g., day). 

Summarizing the power system elements described above, equation (1) can be extended as follows:

$$P_g[t] = P_{th}[t] + P_{ro}[t] + P_s[t], \quad (2)$$
$$P_{th}[t] + P_{ro}[t] + P_s[t] + P_c[t] = P_l[t], \quad (3)$$

where
• $P_{th}[t]$: generation of thermal plants at time-step t;
• $P_{ro}[t]$: generation of run-of-river plants at time-step t;
• $P_s[t]$: generation of storage plants at time-step t.
Equation (3) describes the power balance of a so- 
called hydro-thermal system in its most general form 
(pumping not included). 
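A schedule can be checked against the power balance (3) time-step by time-step. The sketch below (hypothetical function names `balance_residuals` and `is_balanced`) returns the residual $P_{th}[t] + P_{ro}[t] + P_s[t] + P_c[t] - P_l[t]$ for each t; the numbers in the usage example are illustrative.

```python
from typing import List, Sequence

def balance_residuals(p_th: Sequence[float], p_ro: Sequence[float],
                      p_s: Sequence[float], p_c: Sequence[float],
                      p_l: Sequence[float]) -> List[float]:
    """Residual of the power balance (3) at each time-step; zero means balanced."""
    return [th + ro + s + c - l
            for th, ro, s, c, l in zip(p_th, p_ro, p_s, p_c, p_l)]

def is_balanced(residuals: Sequence[float], tol: float = 1e-6) -> bool:
    """True if every residual is within the tolerance."""
    return all(abs(r) <= tol for r in residuals)

# Illustrative three-step schedule (MW).
res = balance_residuals(p_th=[110, 210, 410], p_ro=[90, 90, 90],
                        p_s=[0, 0, 0], p_c=[0, 0, 100], p_l=[200, 300, 600])
print(res, is_balanced(res))  # [0, 0, 0] True
```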
The short-term scheduling problem is chosen as 
a sample of the operation planning problem for the 
complete planning cycle (day to year). The load dis- 
patcher must solve this operation planning problem on 
a regular (daily) basis in order to achieve his objective to 
operate the power system and its elements in the most 
economic way possible. 


Modeling 


In the following, all power plants discussed before, the energy purchase contract, the power balance, and the objective to minimize operational costs are modeled in MIP terms (mixed integer programming). Simple modeling approaches with linear relationships are chosen for the power system elements involved, to help the reader understand the short-term scheduling problem of the operation planning process in power systems.

Optimization in Operation of Electric and Energy Power Systems, Figure 2
Cost curve of a thermal unit

Before starting with the model descriptions of the 
power system elements involved in covering the power 
system load, it is necessary to summarize the time pa- 
rameters used for modeling. 

The planning period is the time horizon (e.g., day, weekend, week, month, year) covered by the optimization problem. Each time horizon is split up into so-called time-steps of equal length (e.g., 1 hour, 0.5 hour), during which the power system load is approximated by a constant value (see Fig. 1). This simplification makes the application of mathematical programming methods possible, since the right-hand side of the constraint matrix then becomes a constant for each time-step.

In general, a power plant contains more than one unit. Here, only one generation unit per plant is assumed. Therefore, the term unit is used synonymously with plant in the following.


Thermal Plant 


A linear relationship between the cost $C_{th}[t]$ and the electric generation $P_{th}[t]$ is chosen for the sake of simplicity; this does not alter the modeling approach for the thermal unit itself.

• Fuel costs:

$$C_{th}[t] = a \cdot Y[t] + b \cdot P_{th}[t]. \quad (4)$$

• Limits on electric generation:

$$p_n \cdot Y[t] \le P_{th}[t] \le p_x \cdot Y[t]. \quad (5)$$

• Start-up indicator:

$$U[t] \ge Y[t] - Y[t-1], \quad (6)$$
$$U[t] \in \{0, 1\}, \quad Y[t] \in \{0, 1\}. \quad (7)$$

• Start-up costs:

$$C_{st}[t] = c_{sth} \cdot U[t]. \quad (8)$$

• Data (thermal unit)
- a: fixed costs;
- b: incremental costs;
- $p_n$: minimum electric power output;
- $p_x$: maximum electric power output;
- $c_{sth}$: start-up costs.
• Variables (thermal unit)
- $C_{th}[t]$: fuel costs at time-step t;
- $C_{st}[t]$: start-up costs at time-step t;
- $P_{th}[t]$: electric generation at time-step t;
- Y[t]: on/off indicator at time-step t;
- U[t]: start-up indicator at time-step t.
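For a fixed on/off schedule, the thermal unit model can be evaluated directly. The sketch below (hypothetical function `thermal_costs`) checks the limits (5), sets the start-up indicator to the smallest binary value satisfying (6) and (7), assuming (6) reads $U[t] \ge Y[t] - Y[t-1]$ with a given initial state Y[0], and computes the fuel costs (4) and start-up costs (8). In the MIP itself Y, U and $P_{th}$ are decision variables; here they are inputs, and the numbers are illustrative.

```python
from typing import List, Sequence, Tuple

def thermal_costs(Y: Sequence[int], P: Sequence[float], a: float, b: float,
                  p_n: float, p_x: float, c_sth: float, y0: int = 0
                  ) -> Tuple[List[float], List[float], List[int]]:
    """Fuel costs (4), start-up costs (8) and start-up indicators for a given schedule."""
    fuel, startup, U = [], [], []
    prev = y0                                   # Y[0]: state before the horizon
    for y, p in zip(Y, P):
        if not p_n * y <= p <= p_x * y:         # limits (5); forces P = 0 when off
            raise ValueError(f"output {p} MW violates limits (5) for Y = {y}")
        u = max(0, y - prev)                    # smallest U in {0,1} with U >= Y[t] - Y[t-1]
        fuel.append(a * y + b * p)              # (4)
        startup.append(c_sth * u)               # (8)
        U.append(u)
        prev = y
    return fuel, startup, U

fuel, startup, U = thermal_costs(Y=[0, 1, 1, 0, 1], P=[0, 100, 200, 0, 150],
                                 a=10.0, b=2.0, p_n=50, p_x=300, c_sth=1000.0)
print(fuel)     # [0.0, 210.0, 410.0, 0.0, 310.0]
print(startup)  # [0.0, 1000.0, 0.0, 0.0, 1000.0]
print(U)        # [0, 1, 0, 0, 1]
```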


Storage Plant 


In a first modeling approach, the water discharge curve is assumed to be linear between zero and a maximum discharge value $q_x$. Moreover, a constant head $H = H_s(t)$ is also permissible:

$$P_s(t) = k \cdot Q_s(t) \cdot H_s(t). \quad (9)$$

Therefore, equation (9) for the electric generation $P_s(t)$ of a storage plant with a water discharge $Q_s(t)$ and a water height $H_s(t)$ can be transformed into (12) by applying equation (11):

$$P_s(t) = k \cdot Q_s(t) \cdot H_s(t), \quad (10)$$
$$H_s(t) = H = \text{const}, \quad (11)$$
$$P_s(t) = k_s \cdot Q_s(t), \quad \text{with } k_s = k \cdot H. \quad (12)$$

• Electric generation

$$P_s[t] = k_s \cdot Q_s[t]. \quad (13)$$

• Limits on water discharge

$$0 \le Q_s[t] \le q_x. \quad (14)$$

• Data (hydro unit; storage plant)
- $k_s$: gradient of the linear water discharge curve;
- $q_x$: maximum possible water discharge.


Optimization in Operation of Electric and Energy Power Systems, Figure 3
Water discharge curve ($P_s[t]$ versus $Q_s[t]$, up to $q_x$)


• Variables (hydro unit; storage plant)
- $P_s[t]$: electric generation at time-step t;
- $Q_s[t]$: water discharge (cubic meters per second) for electric generation at time-step t.


Hydro-Reservoir 


Each storage plant is connected to at least one hydro-reservoir. The content of the hydro-reservoir changes from time-step to time-step and can be computed from the so-called storage equation (15), which expresses the water volume V[t] in the reservoir at the end of each time-step t (the factor 3600 converts a flow in m³/s over dt hours into m³):

• Water volume

$$V[t] = V[t-1] - 3600 \cdot Q_s[t] \cdot dt + 3600 \cdot Q_{in}[t] \cdot dt. \quad (15)$$

• Volume limits

$$v_n \le V[t] \le v_x. \quad (16)$$

• Storage condition

$$A_r = V[0] - V[T]. \quad (17)$$

Equation (17) defines the amount of water allowed for generation by the storage plant during the planning period. In short-term planning (e.g., daily optimization) it is often required that the water volume at the end of the planning horizon, V[T], equals the starting volume V[0], i.e., $A_r = 0$.

• Data (hydro-reservoir)
- V[0]: volume in the reservoir at the beginning of the planning period;
- $A_r$: water volume available for generation by the storage plant during the planning period;
- $v_n$: minimum volume allowed in the reservoir;
- $v_x$: maximum volume allowed in the reservoir;
- $Q_{in}[t]$: natural inflow into the reservoir at time-step t;
- dt: time-step length in hours.
• Variables (hydro-reservoir)
- $P_s[t]$: electric generation at time-step t;
- $Q_s[t]$: water discharge (cubic meters per second) for electric generation at time-step t.
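Given a discharge schedule, equations (13) to (17) can be evaluated step by step. The sketch below (hypothetical function `storage_schedule`) checks the discharge limits (14), updates the volume with the storage equation (15), checks the volume limits (16), and returns the generation (13) together with the volumes, from which $A_r = V[0] - V[T]$ in (17) follows. The numbers in the example are illustrative, not the data of the numerical example.

```python
from typing import List, Sequence, Tuple

def storage_schedule(v0: float, q_s: Sequence[float], q_in: Sequence[float],
                     dt: float, k_s: float, q_x: float,
                     v_n: float, v_x: float) -> Tuple[List[float], List[float]]:
    """Return (P_s[t] for t = 1..T, V[t] for t = 0..T) for a given discharge schedule."""
    power, volumes = [], [v0]
    for t, (qs, qi) in enumerate(zip(q_s, q_in), start=1):
        if not 0.0 <= qs <= q_x:                                  # (14)
            raise ValueError(f"discharge {qs} m3/s out of range at t={t}")
        v = volumes[-1] - 3600 * qs * dt + 3600 * qi * dt        # (15), in m3
        if not v_n <= v <= v_x:                                  # (16)
            raise ValueError(f"volume {v} m3 out of range at t={t}")
        volumes.append(v)
        power.append(k_s * qs)                                   # (13), in MW
    return power, volumes

power, volumes = storage_schedule(v0=1.0e6, q_s=[2.0, 0.0, 1.0], q_in=[1.0, 1.0, 1.0],
                                  dt=1.0, k_s=2.0, q_x=5.0, v_n=0.0, v_x=2.0e6)
print(power)                      # [4.0, 0.0, 2.0]
print(volumes[0] - volumes[-1])   # A_r = 0.0
```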


Run-of River Plant 


The electric generation of a run-of-river plant depends directly on the water flow of the river. In general, the river flow remains constant during a short-term planning period of a day or even a weekend. Therefore, the electric generation of a run-of-river plant remains constant during the planning period and, in mathematical terms, offers no degree of freedom. Although of no mathematical importance to the optimization, run-of-river plants must be included in the model for practical reasons: a load dispatcher needs the economic dispatch problem to be solved in engineering terms, taking into account all power system elements, including run-of-river plants, that contribute to covering the power system load.
• Data (hydro unit; run-of-river plant)
- $P_{ro}[t]$: electric generation of the run-of-river plant.


Energy Purchase Contract 


Although there is a wide variety of different contract 
conditions, a very simple contract is modeled here to 
demonstrate the basic principles. 

Constant incremental costs are assumed for the purchased energy between $P_c = 0$ and $P_c = p_{cx}$, the maximum power allowed to be purchased from the selling utility during the planning period.

• Contract costs

$$C_c[t] = T_c \cdot P_c[t] \cdot dt. \quad (18)$$

• Contracted electric power limits

$$0 \le P_c[t] \le p_{cx}. \quad (19)$$

Optimization in Operation of Electric and Energy Power Systems, Figure 4
Contract cost curve


In addition to the above (basic) model of a contract, there is often a maximum amount of contracted energy allowed to be purchased from the selling utility during the planning period:

$$\sum_{t=1}^{T} P_c[t] \cdot dt \le d_{cx}. \quad (20)$$

• Data (energy purchase contract)
- $T_c$: tariff for the purchased energy;
- dt: time-step length (already defined);
- $p_{cx}$: maximum power allowed to be purchased from the selling utility;
- $d_{cx}$: maximum energy allowed to be purchased from the selling utility during the planning period.
• Variables (energy purchase contract)
- $C_c[t]$: contract costs at time-step t;
- $P_c[t]$: electric power of purchased energy at time-step t.
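The contract model (18) to (20) can be checked and costed for a given purchase schedule in a few lines; `contract_costs` is a hypothetical helper, and the numbers in the example are illustrative.

```python
from typing import List, Sequence

def contract_costs(p_c: Sequence[float], tariff: float, dt: float,
                   p_cx: float, d_cx: float) -> List[float]:
    """Contract costs (18) after checking the power limits (19) and energy limit (20)."""
    for t, p in enumerate(p_c, start=1):
        if not 0.0 <= p <= p_cx:                                   # (19)
            raise ValueError(f"contract power {p} MW out of range at t={t}")
    energy = sum(p * dt for p in p_c)
    if energy > d_cx + 1e-9:                                       # (20), small float tolerance
        raise ValueError(f"purchased energy {energy} MWh exceeds {d_cx} MWh")
    return [tariff * p * dt for p in p_c]                          # (18)

print(contract_costs([50.0, 100.0, 0.0], tariff=25.0, dt=1.0, p_cx=100.0, d_cx=200.0))
# [1250.0, 2500.0, 0.0]
```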


Power Balance 


The power balance equation (3) is the most important part of an operation planning optimization model and is therefore repeated here:

$$P_{th}[t] + P_{ro}[t] + P_s[t] + P_c[t] = P_l[t]. \quad (21)$$

The variables $P_{th}[t]$, $P_s[t]$ and $P_c[t]$ and the input data $P_{ro}[t]$ and $P_l[t]$ were already defined when the different power system elements were modeled.


Objective Function 


One of the main goals of utilities is to operate their power system as economically as possible. Therefore, the operational costs during the planning period must be minimized to meet the overall target of a load dispatcher. This objective is formulated as follows:

$$Z = \sum_{t=1}^{T} \left( C_{th}[t] \cdot dt + C_{st}[t] \right) + \sum_{t=1}^{T} C_c[t] \rightarrow \min. \quad (22)$$

The variables $C_{th}[t]$, $C_{st}[t]$ and $C_c[t]$ were already defined when describing the model of the thermal plant and the model of the contract.


Optimization 


Since the model of the power system was formulated in MIP terms (mixed integer programming), an optimization code with integrated MIP techniques (linear programming followed by a branch and bound method) must be applied. Professional optimization codes offer MIP methods to the user with different degrees of modularity: sometimes as a package with a control language (e.g., MPSX/370 and MIP/370 [6]), sometimes as a subroutine library (e.g., OSL [9]).

Moreover, the solution time and the accuracy of the results are two main aspects that must be taken into account when planning the operation of an electric power system by applying optimization techniques. Therefore, the solution procedure must be selected carefully to balance the required accuracy of the results against a solution time acceptable in practice, two goals that are primarily conflicting. Today, however, each operation planning problem of an electric power system can be solved as an MIP optimization model within the required solution time and with the necessary accuracy, even for highly nonlinear and combinatorial problems (e.g., unit commitment: scheduling a large number of thermal units taking into account time-dependent start-up costs) [2,14]. Moreover, multiprocessor hardware (e.g., RS/6000-SP) and related optimization codes (e.g., OSLp [8]) can further reduce the solution time for time-critical applications, which sometimes arise in the operational planning of very large scale electric power systems. It must be emphasized in this context that three elements must be taken into account to successfully meet challenging solution-time targets: an appropriate modeling approach [14], an optimization procedure with acceleration techniques implemented, and problem-dependent hardware with a professional optimization code.
Computer Implementation:
Optimization in Practice 


The computer implementation is an important step when developing an optimization model for load dispatch practice. The problem definition, the modeling approach, and the optimization techniques are relatively easy to define. However, it is much more difficult to transform the optimization model, written on paper, into a computer-executable form. This bottleneck can be overcome by modeling tools, so-called model generators (e.g., EasyModeler/6000 [7]), which allow the optimization model to be formulated in algebraic terms and at the same time offer ease-of-use support for data input and debugging. Moreover, close program interaction between the model generation tool and an optimization code is a must for fast and successful model development and model usage in utility practice when planning the operation of a power system. Therefore, tools with algebraic model formulations are widely accepted by engineers.

In the sequel, the power balance equation (21) is ap- 
plied as a demonstration example concerning the ease- 
of-use of a model generator (EasyModeler/6000) ap- 
propriate for application by utility engineers and load 
dispatchers. 


Model Formulation 


Optimization in Operation of Electric and Energy Power Systems, Figure 5
Power balance ($P_{th}(t)$, $P_{ro}(t)$, $P_s(t)$ and $P_c(t)$ covering $P_l(t)$)

$$P_{th}[t] + P_{ro}[t] + P_s[t] + P_c[t] = P_l[t]. \quad (23)$$


Model Generator Formulation (EasyModeler/6000)
• DATA
- $P_l[t]$ (power system load);
- $P_{ro}[t]$ (generation; run-of-river plant → constant power output);
• ENDDATA
• VARIABLE
- $P_{th}[t]$ (generation; thermal plant);
- $P_s[t]$ (generation; storage plant);
- $P_c[t]$ (energy purchase contract; power);
• ENDVARIABLE
• CONSTRAINTS
- Load (FORALL t)
$P_{th}[t] + P_s[t] + P_c[t] = P_l[t] - P_{ro}[t]$;
• ENDCONSTRAINT

Comparing the power balance equation (21) with the formulation of the model generator (where $P_{ro}[t]$, a constant value for each t, appears on the right-hand side), no relevant differences occur in their algebraic representation. The same is valid for the modeling approaches of all other power system elements. Therefore, the application of a model generator to real-life problems can be learned very easily, even by self-study. Moreover, it supports very fast development of the optimization models occurring in the operational practice of utilities.


Numerical Example 


To support the reader's understanding of the given problem, the numerical example is based on the (simplified) model of a power system described above, used to plan its operation economically. Table 1 shows the result of the optimization run, namely the schedules of the power system elements in operation; the value of the objective function is given under Numerical Results.


Input Data 


t = 1, ..., 7; dt = 1 h.

• Thermal plant
- $p_n$: 100 MW;
- $p_x$: 950 MW;
- a: 0 MU;
- b: 26.31583 MU/MWh (MU denoting money unit);
- $c_{sth}$: 1000 MU.
• Run-of-river plant
- $P_{ro}[t]$: 90 MW.
• Storage plant
- $k_s$: 2.304 MW·s/m³;
- $q_x$: 34.72 m³/s.
• Hydro-reservoir
- V[0]: 41.23·10^6 m³;
- $v_n$: 0 m³;
- $v_x$: 150·10^6 m³;
- $A_r$: 0 m³;
- $Q_{in}$: 1 m³/s.
• Energy purchase contract
- $T_c$: 25 MU/MWh;
- $p_{cx}$: 100 MW;
- $d_{cx}$: 400 MWh.

Optimization in Operation of Electric and Energy Power Systems, Table 1
Schedule for all power system elements (7 time steps; power in MW, "-" denotes 0)

t    P_l[t]   P_th[t]   P_ro[t]   P_s[t]   P_c[t]
1    200      110       90        -        -
2    300      210       90        -        -
3    600      410       90        -        100
4    800      615.4     90        -        94.6
5    1000     798.5     90        11.5     100
6    700      510       90        -        100
7    200      100       90        4.6      5.4


Numerical Results 


- Objective function: 82 470.3 MU.
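As a plausibility check, the objective (22) can be recomputed from the schedule in Table 1 with the input data above. The sketch assumes a = 0 (the reading of the partly illegible fixed-cost entry adopted here), that the thermal unit is already on at t = 0 so that no start-up cost arises, and dt = 1 h; storage and run-of-river generation carry no cost in this model. With the one-decimal table values, the result lies within about 1 MU of the reported 82 470.3 MU, a gap consistent with the rounding of the table entries.

```python
# Schedule from Table 1 (MW per 1-hour time-step).
p_th = [110, 210, 410, 615.4, 798.5, 510, 100]   # thermal plant
p_c = [0, 0, 100, 94.6, 100, 100, 5.4]            # energy purchase contract

a, b = 0.0, 26.31583      # fixed costs (assumed 0) and incremental costs (MU/MWh)
tariff, dt = 25.0, 1.0    # contract tariff T_c (MU/MWh) and time-step length (h)
c_st = [0.0] * len(p_th)  # assumption: unit already on at t = 0, so no start-up

c_th = [a * 1 + b * p for p in p_th]              # (4) with Y[t] = 1 at every t
c_c = [tariff * p * dt for p in p_c]              # (18)
Z = sum(ct * dt + cs for ct, cs in zip(c_th, c_st)) + sum(c_c)   # (22)
print(round(Z, 1))
```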


Summary and Future Trends 


Today, operational planning tools are state-of-the-art 
in utilities to support load dispatchers in meeting their 
objective to economically operate their power sys- 
tems [15]. These planning tools are based on computer- 
assisted optimization models. From the 1960s onwards, 
these models have been closely linked to the develop- 
ment of computer hardware, modeling tools and op- 
timization codes. With decreasing price/performance 
of computers, power system models became larger and 


larger, taking into account more details of the power 
system elements involved and so coming closer to their 
technical and operational reality. It must be emphasized in this context that MIP modeling and optimization methods have turned out to be the most effective strategy as an overall compromise between the conflicting targets of accuracy of results and solution time. Multiprocessor hardware and an associated optimization code are also available for time-critical applications with combinatorial and highly nonlinear models.

In the future, utilities (sometimes privatized) will 
have to operate their power systems in a widely dereg- 
ulated environment of a competitive market. This ex- 
pected move to a market driven industry will signifi- 
cantly change electric utility operations [11]. New pric- 
ing mechanisms of energy trading (e. g. spot-marketing, 
auctions) will require the formulation of new optimization problems [1,5]. They will have to be solved much faster than before in order to support the decision-making process effectively and efficiently in the future business environment of utilities. This new situation is a great and fascinating challenge for future research concerning the new operation planning targets of power systems.

Although this contribution referred to electric 
power systems only, the same modeling and optimiza- 
tion principles can be applied for other line-based en- 
ergy systems (e.g., gas, district heating), even for cou- 
pled ones with co-generation plants (plants, that pro- 
duce electricity and steam for industrial or public use 
in district heating systems). 


Note 


In memoriam Felix Schlaepfer, my friend, who contributed significantly to the economic dispatch problem.
Introduction


Unit-Disk Graphs (UDGs) are intersection graphs of equal-diameter (or, w.l.o.g., unit-diameter) circles in the Euclidean plane. In the geometric (or disk) representation, each circle is specified by the coordinates of its center. Three equivalent graph models can be defined with vertices representing the circles [18]. In the intersection graph model, two vertices are adjacent if the corresponding circles intersect (tangent circles are also said to intersect). In the containment graph model, two vertices are adjacent when one circle contains the center of the other. In the proximity graph model, an edge exists between two vertices if the Euclidean distance between the centers of the corresponding circles is within a specified bound. Recognizing UDGs is NP-hard [10], and hence no polynomial time algorithm is known for deriving the geometric representation from the graph model. From an algorithmic perspective, this places an emphasis on whether or not the geometric representation is needed as input. Unlike several other geometric intersection graph classes, UDGs are not necessarily perfect or planar [18], which motivates the need for dedicated theoretical study.
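Given the geometric representation, the graph itself is easy to build. Two circles of diameter d intersect (or touch) exactly when the distance between their centers is at most d, so the intersection model coincides with the proximity model whose bound is d. The sketch below (hypothetical function `unit_disk_graph`, a naive O(n²) pair test; `math.dist` requires Python 3.8 or later) builds the adjacency sets this way. The reverse direction, recovering centers from the graph alone, is the NP-hard recognition problem mentioned above.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

def unit_disk_graph(centers: List[Tuple[float, float]],
                    diameter: float = 1.0) -> Dict[int, Set[int]]:
    """Adjacency sets of the intersection graph of equal-diameter circles."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(centers))}
    for i, j in combinations(range(len(centers)), 2):
        # Circles of equal diameter d intersect iff their centers are within d
        # (tangent circles count as intersecting).
        if math.dist(centers[i], centers[j]) <= diameter:
            adj[i].add(j)
            adj[j].add(i)
    return adj

print(unit_disk_graph([(0.0, 0.0), (1.0, 0.0), (2.5, 0.0)]))  # {0: {1}, 1: {0}, 2: set()}
```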

The remainder of this article is organized as fol- 
lows. We introduce the necessary definitions and nota- 
tions for the various optimization problems on graphs 
considered in this article in Sect. “Definitions”. A brief 
survey of applications modeled using UDGs and the 
role of optimization problems discussed in this chapter 
are presented in Sect. “Applications”. A survey of algorithms and their key ideas for cliques, independent sets, vertex covers, domination, graph coloring and clique partitioning is presented in Sect. “Models”. We conclude with a summary in Sect. “Conclusions”.


Definitions 


We consider simple, undirected graphs on n vertices and m edges denoted by G = (V, E). For a vertex $v \in V$, N(v) is its neighborhood and $N[v] = N(v) \cup \{v\}$ is its closed neighborhood. We denote the complementary graph by $\bar G = (V, \bar E)$ and the subgraph induced by $S \subseteq V$ by G[S]. We denote by δ(G) and Δ(G) the minimum and maximum vertex degrees in G, respectively. Denote by d(u, v) the shortest distance between $u, v \in V$; then the k-neighborhood of v is defined as $N_k(v) = \{u \in V : d(u, v) \le k\}$. We also use the notation G − v and G − I to refer to the graph obtained from G by deleting vertex v (and incident edges) and by deleting a subset of vertices I (and incident edges), respectively. That is, $G - v = G[V \setminus \{v\}]$ and $G - I = G[V \setminus I]$.

A clique is a subset of pairwise adjacent vertices in G. 
The maximum clique problem is to find a largest cardi- 
nality clique in G and the clique number w(G) is the 
size of a maximum clique. An independent set (or sta- 
ble set) is a subset of mutually non-adjacent vertices 
and the maximum independent set problem is to find 
an independent set of maximum cardinality. The inde- 
pendence number (or stability number) of a graph G is 
denoted by a(G) and it is the size of a maximum in- 
dependent set. A maximal clique (independent set) is 
one that is not a proper subset of another clique (in- 
dependent set). Cliques and independent sets are com- 
plementary to each other in the sense that C C V is 
a clique in G if and only if C is an independent set in G. 
For arbitrary graphs, the maximum clique and independent set problems are equivalent and possess similar complexity and approximation results. Algorithms and heuristics for one can be adapted, via the complement, for the other. However, this equivalence does not extend naturally to geometric graphs, whose geometric property is not preserved under complementation. For instance, maximum independent set on planar graphs is NP-complete [30], while maximum clique on planar graphs is trivial (a planar graph contains no K_5, so it suffices to enumerate vertex subsets of size at most four). Results on UDGs will be discussed in Sects. “Cliques” and “Independent Sets”. A vertex cover S is a subset of vertices such that every edge in G has at least one end point in S. We denote by β(G) the size of a minimum vertex cover. Clearly, if I ⊆ V is independent, then V \ I is a vertex cover of G.

A proper coloring of a graph is one in which every 
vertex is colored (assigned a natural number) such that 
no two vertices of the same color are adjacent. A graph 
is said to be k-colorable if it admits a proper coloring 
with k colors. Vertices of the same color are referred 
to as a color class and they induce an independent set. 


The chromatic number of the graph, denoted by χ(G), is the minimum number of colors required to properly color G. Note that for any graph G, ω(G) ≤ χ(G), as different colors are required to color the vertices of a clique. The famous theorem by Brooks on graph coloring [11] states that χ(G) ≤ Δ(G) if G is a connected graph that is neither a complete graph nor an odd cycle. A related problem is the minimum clique partitioning problem, which is to partition the given graph G into a minimum number χ̄(G) of cliques. Note that this is exactly the graph coloring problem on Ḡ, and χ̄(G) = χ(Ḡ).

A dominating set D is a subset of vertices such that every vertex in the graph is either in this set or has a neighbor in this set. A minimal dominating set contains no proper subset which is also dominating. The minimum cardinality of a dominating set is called the domination number, denoted by γ(G). Note that every maximal independent set is also a minimal dominating set. If a dominating set D is independent, it is called an independent dominating set. A dominating set D is called a connected dominating set if G[D] is connected. The independent and connected domination numbers (defined analogously) are denoted by γ_i(G) and γ_c(G), respectively. Naturally, G is assumed to be connected when we consider connected domination.

An approximation algorithm with approximation ratio ρ ≥ 1 for an optimization problem Π outputs, for every instance x of Π with optimal value opt(x), a solution of value sol(x) in time polynomial in the size of x, such that sol(x) ≤ ρ · opt(x) if Π is a minimization problem, or sol(x) ≥ opt(x)/ρ if Π is a maximization problem.

An optimization problem Π admits a fully polynomial time approximation scheme (FPTAS) if there is an approximation algorithm with approximation ratio 1 + ε for any ε > 0 that runs in time polynomial in the size of the input and 1/ε. Π is said to admit a polynomial time approximation scheme (PTAS) if it has a polynomial time approximation algorithm with approximation ratio 1 + ε for each fixed ε > 0. A problem that is NP-hard in the strong sense [30] does not admit an FPTAS unless P = NP.


Applications 


A major application area for UDG models is wireless communication. Here the underlying connectivity graph of wireless nodes with equal and omnidirectional transmission-reception ranges can be modeled as a UDG [5,33]. Various optimization problems on UDGs are solved to facilitate the effective operation of such networks.

For instance, a maximum independent set corresponds to a largest set of wireless nodes that can broadcast simultaneously without interference [56]. Alternately, in location logistics, α(G) is also the maximum number of facilities that can be located in n potential locations if proximity between any two facilities is undesirable [45,61]. Clique and clique partitioning are popular approaches for clustering wireless networks [40].
Maximal cliques are also used to model and avoid link interference in ad-hoc networks [32]. A dominating set in a UDG modeling a wireless network functions as a small set of nodes that can send an emergency communication to the entire graph [18]. Domination and connected domination are also used to cluster wireless networks. The vertices in a dominating set D are designated as cluster-heads, and N[v] for each v ∈ D forms a cluster. Inside a cluster formed in this fashion, 2-hop communication is possible between any pair of nodes via the cluster-head. In mobile wireless networks, the nodes are weighted appropriately to find a weighted dominating set that yields cluster-heads that are less mobile [8,16]. Alternately, if a virtual backbone is desirable among the cluster-heads, a connected dominating set is used in clustering [19,20]. Clustering is an important problem in wireless networks as it helps routing and improves efficiency and throughput [57]. Graph coloring problems are used to solve channel assignment problems in wireless networks such as frequency assignment, code or time slot assignment, depending on the protocols used [33]. The idea is that the chromatic number of the connectivity graph is the smallest number of frequency bands (time-slots or codes) required to communicate without interference. The UDG recognition problem also has applications in determining molecular conformations [34].


Models 
Cliques 


An O(n^{4.5}) time algorithm for finding a maximum clique in a UDG G = (V, E), given its disk representation, is presented in [18]. We briefly describe the ideas presented in [18]. Consider the set of disks V = {1, ..., n} with centers c_i, i ∈ V. For a pair i, j ∈ V, (i, j) ∈ E if and only if the Euclidean distance L(c_i, c_j) ≤ 1. Denote by R_ij the region of intersection of two disks of radius L(c_i, c_j) centered at c_i and c_j (see Fig. 1). Let H_ij ⊆ V denote the disks with centers in the region R_ij. Consider a maximum clique C and let i, j ∈ C be the farthest pair (in terms of Euclidean distance) of vertices in C; then C ⊆ H_ij. If such a farthest pair i, j in some maximum clique C is known, then we only need to find a maximum clique in G[H_ij] to find a maximum clique in G. Since such an i, j pair is unknown, we can enumerate over all (i, j) ∈ E to derive a polynomial time algorithm, provided we can solve the maximum clique problem in polynomial time on every G[H_ij]. This is facilitated by the following observation made in [18]. Consider the region R_ij with L(c_i, c_j) ≤ 1. The line joining c_i and c_j bisects R_ij into two halves. The disk centers located in each half form a clique, and hence the complement of G[H_ij] is a bipartite graph. Since the maximum independent set problem in bipartite graphs can be solved in O(n^{2.5}), we can find a maximum clique in G[H_ij] in the same time; over the O(n^2) candidate pairs this results in the claimed O(n^{4.5}) running time. After polynomial solvability was established in [18], the running time was improved to O(n^{3.5} log n) in [9].

Optimization Problems in Unit-Disk Graphs, Figure 1: The region R_ij is shaded
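The lens-based algorithm described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from [18]: disks have unit diameter (i and j are adjacent iff L(c_i, c_j) ≤ 1), only the size of a maximum clique is returned, and the function names are hypothetical. The bipartite matching uses simple augmenting paths, so the sketch does not attain the O(n^{4.5}) bound.

```python
import math


def _max_bipartite_matching(left, adj):
    """Size of a maximum bipartite matching (augmenting paths, Kuhn's method)."""
    match = {}  # right vertex -> left vertex it is currently matched to

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match or augment(match[v], seen):
                match[v] = u
                return True
        return False

    return sum(1 for u in left if augment(u, set()))


def max_clique_udg(centers):
    """Maximum clique size of a UDG with unit-diameter disks (edge iff dist <= 1)."""
    n = len(centers)
    best = 1 if n else 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(centers[i], centers[j])
            if d > 1:
                continue  # (i, j) is not an edge
            # H_ij: centers inside the lens R_ij (two disks of radius d)
            lens = [k for k in range(n)
                    if math.dist(centers[k], centers[i]) <= d
                    and math.dist(centers[k], centers[j]) <= d]
            (xi, yi), (xj, yj) = centers[i], centers[j]
            side_a, side_b = [], []
            for k in lens:
                xk, yk = centers[k]
                # sign of the cross product tells which half of R_ij holds c_k
                cross = (xj - xi) * (yk - yi) - (yj - yi) * (xk - xi)
                (side_a if cross >= 0 else side_b).append(k)
            # each half is a clique, so the complement of G[H_ij] is bipartite
            # with edges only between the two halves
            comp = {a: [b for b in side_b
                        if math.dist(centers[a], centers[b]) > 1]
                    for a in side_a}
            # max clique of G[H_ij] = max independent set of the complement
            # = |H_ij| - maximum matching (Koenig's theorem)
            best = max(best, len(lens) - _max_bipartite_matching(side_a, comp))
    return best
```

For example, max_clique_udg([(0, 0), (1, 0), (0.5, 0.5), (0.5, -0.6)]) considers the lens of the edge between the first two centers, finds one complement edge across the two halves, and reports a clique of size 3.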
A relevant notion of robust algorithms for restricted graph classes such as UDGs was introduced recently in [55]. A robust algorithm for a problem on UDGs accepts only the graph G in standard format (adjacency list or matrix) as input, and either solves the problem if G is indeed a UDG or reports that G is not a UDG. A polynomial time robust algorithm is presented in [55] for finding a maximum clique in UDGs (without the geometric representation), which returns a maximum clique or reports that G is not a UDG. The existence of a polynomial time robust algorithm for the maximum clique problem on UDGs is a surprising result given the NP-hardness of UDG recognition. A key idea is an ordering L = e_1, e_2, ..., e_m of the edges of G (input in standard format), referred to as a cobipartite neighborhood edge elimination ordering (CNEEO). Denote by G_L(i) the subgraph of G with edge set {e_i, e_{i+1}, ..., e_m}. Define for each edge e_i = (u, v) the set N_L(i) to be the set of vertices adjacent to both u and v in G_L(i). The authors define an edge ordering L to be a CNEEO if for each e_i, N_L(i) induces a cobipartite (complement of bipartite) graph in G. The authors then prove that, given G and a CNEEO L, a maximum clique can be found in polynomial time, and describe a greedy algorithm that determines a CNEEO L if one exists or certifies in polynomial time that G has none. Finally, the authors show that every UDG admits a CNEEO, thereby completing the robust polynomial time algorithm for the maximum clique problem on UDGs (in fact for the larger class of graphs that admit a CNEEO).
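How a given edge ordering L can be checked for the CNEEO property and then used to compute the clique number is sketched below. This is a hedged illustration, not the authors' implementation: vertices are assumed to be 0, ..., n−1, edges are pairs, the function names are hypothetical, and the greedy search for a CNEEO is not included. The sketch relies on the observation that a clique K whose earliest edge in L is e_i = (u, v) satisfies K \ {u, v} ⊆ N_L(i), and that a maximum clique of a cobipartite graph equals the number of its vertices minus a maximum matching of its (bipartite) complement.

```python
from collections import deque


def _max_matching(left, adj):
    """Maximum bipartite matching by augmenting paths (Kuhn's method)."""
    match = {}

    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    return sum(1 for u in left if augment(u, set()))


def _two_color_complement(vertices, adj):
    """2-color the complement of G[vertices]; None if it is not bipartite."""
    side = {}
    for s in vertices:
        if s in side:
            continue
        side[s] = 0
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in vertices:
                if y == x or y in adj[x]:
                    continue  # only non-edges of G are edges of the complement
                if y not in side:
                    side[y] = 1 - side[x]
                    queue.append(y)
                elif side[y] == side[x]:
                    return None
    return side


def clique_from_cneeo(n, order):
    """Max clique size of G = ({0..n-1}, order) if `order` is a CNEEO, else None."""
    pos = {frozenset(e): t for t, e in enumerate(order)}
    adj = [set() for _ in range(n)]
    for u, v in order:
        adj[u].add(v)
        adj[v].add(u)
    best = 1 if n else 0
    for i, (u, v) in enumerate(order):
        # N_L(i): common neighbors of u and v in G_L(i) (edges e_i, ..., e_m)
        common = [w for w in adj[u] & adj[v]
                  if pos[frozenset((u, w))] >= i and pos[frozenset((v, w))] >= i]
        side = _two_color_complement(common, adj)
        if side is None:
            return None  # N_L(i) does not induce a cobipartite graph in G
        left = [w for w in common if side[w] == 0]
        comp = {x: [y for y in common if side[y] == 1 and y not in adj[x]]
                for x in left}
        # clique = {u, v} plus a maximum clique of the cobipartite G[N_L(i)]
        best = max(best, 2 + len(common) - _max_matching(left, comp))
    return best
```

For a triangle with a pendant edge, any ordering is a CNEEO and the sketch returns 3; an ordering whose first edge has a 5-cycle as common neighborhood is rejected, since the complement of a 5-cycle is an odd cycle.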


Independent Sets 


Contrary to maximum clique, the maximum indepen- 
dent set (MIS) problem on UDGs is known to be NP- 
hard, even when the disk representation is given [18]. 
However, simple constant factor approximation algorithms and PTASs have been developed for this problem. Note that the strong NP-hardness of the MIS problem precludes the possibility of an FPTAS unless P = NP [30].

Given a graph G that does not contain a (p + 1)-claw as an induced subgraph, an O(n log n + m) algorithm is presented in [37] to find an independent set of size at least α(G)/p. A p-claw is a graph on p + 1 vertices V_p = {u_0, u_1, ..., u_p} such that u_0 is adjacent to all other vertices and V_p \ {u_0} is an independent set. The algorithm proceeds by adding a vertex v ∈ V to I, followed by the removal of v and its neighbors, i.e., N[v], from the graph. This step is repeated until V is empty, and the resulting independent set I is maximal. Let I* denote a MIS in G. Suppose, for the sake of argument, that we remove the vertices of I one at a time and, for each vertex v removed from I, also remove the vertices of N[v] from I*. In any step, if the vertex v removed from I is also in I*, exactly one vertex of I* is deleted in that step, since I* is independent. If v ∈ I \ I*, at most p vertices are removed from I*, since a MIS in N(v) has at most p vertices (otherwise v and p + 1 independent neighbors would form an induced (p + 1)-claw). Since I is maximal, every vertex of I* lies in N[v] for some v ∈ I, so I* will be empty when I is empty, and

$$\alpha(G) = |I^*| \le |I \cap I^*| + p\,|I \setminus I^*| \le p\,|I| .$$

By geometry, UDGs do not contain an induced 6-claw [45], so the above algorithm is a 5-approximation for the MIS problem on UDGs.
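The greedy procedure from [37] described above can be sketched in a few lines of Python. This is a minimal illustration assuming the graph is given as a dict mapping each vertex to the set of its neighbors; the function name and the `order` parameter (which fixes the vertex picked next) are hypothetical, and the O(n log n + m) data structures of [37] are not reproduced.

```python
def greedy_maximal_independent_set(adj, order=None):
    """Repeatedly add a remaining vertex v to I and delete N[v]; I ends up maximal.

    `adj` maps each vertex to the set of its neighbors; `order` fixes which
    remaining vertex is picked next (default: sorted vertex labels).
    """
    remaining = set(adj)
    independent = []
    for v in (sorted(adj) if order is None else order):
        if v in remaining:
            independent.append(v)
            remaining -= {v} | adj[v]  # remove the closed neighborhood N[v]
    return independent
```

The star K_{1,5} is a UDG (a center with five leaves at distance 1, spaced 72° apart, so the leaves are pairwise farther than 1 apart). Picking the center first yields a single vertex while α = 5, so the ratio 5 is attained by this arbitrary-order greedy.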

A simple 3-approximation algorithm is presented in [45] that, given a UDG G, constructs an independent set of size at least α(G)/3. This algorithm is based on the observation that every UDG has some vertex v such that α(G[N(v)]) ≤ 3. In particular, this is true for the vertex corresponding to the disk with minimum x-coordinate. Since every induced subgraph of a UDG is also a UDG, we can apply the algorithm from [37] stated before, and the observation will continue to hold in each step. Now, however, the vertex v added to I in each step is one whose neighborhood N(v) contains no independent set of more than 3 vertices, so at most 3 vertices of a MIS are removed per step, yielding the desired approximation ratio. Given the disk representation, such a vertex v can be found easily in each step, and without the disk representation such a vertex can
certainly be found in polynomial time (O(n’)). 
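With the disk representation available, the 3-approximation can be sketched as the same greedy loop with a geometric choice of v: always take the remaining disk with minimum x-coordinate. This is a hedged sketch assuming unit-diameter disks (adjacent iff centers are at most 1 apart); the function name is hypothetical.

```python
import math


def leftmost_first_mis(centers):
    """Independent set of a unit-diameter UDG, picking the leftmost remaining center."""
    # indices sorted by (x, y), so the first remaining index has minimum x
    remaining = sorted(range(len(centers)), key=lambda i: centers[i])
    chosen = []
    while remaining:
        v = remaining[0]  # disk with minimum x-coordinate among remaining disks
        chosen.append(v)
        # delete N[v]: v itself (distance 0) and every center within distance 1
        remaining = [u for u in remaining if math.dist(centers[u], centers[v]) > 1]
    return chosen
```

The neighbors of the leftmost disk all lie in a half-disk to its right, which is why at most 3 of them can be pairwise non-adjacent.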

The shifting strategy for geometric graphs introduced in [38], analogous to techniques for planar graphs introduced in [4], is the key ingredient in the PTASs developed for the MIS problem in [39,47]. The approaches are similar, and we follow the presentation in [39]. Let G = (V, E) be a UDG in which the center of each disk in V is specified. If we seek an independent set IS[G] such that |IS[G]| ≥ (1 − ε)α(G), then choose the parameter k to be the smallest integer for which (k/(k + 1))^2 ≥ 1 − ε. Grid the region containing G with unit squares by dividing it into horizontal and vertical strips of unit width. Assume that the intervals on the axes corresponding to each strip are left closed and right open; this is necessary to deal with disks whose centers lie on the boundary of two strips. For
some 0 ≤ i ≤ k, delete all the disks with centers in every horizontal strip congruent to i mod (k + 1), and denote the resulting UDG by G(i). This leaves r disjoint horizontal “super”-strips of width k containing disjoint UDGs G(i)_1, G(i)_2, ..., G(i)_r such that

$$G(i) = \bigcup_{1 \le j \le r} G(i)_j .$$

See Fig. 2 and Fig. 3. Varying i varies the deleted horizontal strips and hence permits us to shift the horizontal super-strips vertically over the graph G.

Optimization Problems in Unit-Disk Graphs, Figure 2: An example UDG G with a unit grid applied

Now for some 1 ≤ j ≤ r, select the jth horizontal super-strip. For some 0 ≤ l ≤ k, delete from the super-strip G(i)_j all disks with centers in every vertical strip congruent to l mod (k + 1), and denote the resulting UDG by G(i, l)_j. Parameter l can similarly be seen as the horizontal shift parameter for the vertical super-strips. This partitions G(i, l)_j into s_j UDGs G(i, l)_{j,1}, ..., G(i, l)_{j,s_j}, each contained in a square block of side k. See Fig. 4. Thus any MIS of G(i, l)_{j,t} with 1 ≤ j ≤ r and 1 ≤ t ≤ s_j is of size at most O(k^2) and can be found in time n^{O(k^2)} by enumeration. The independent set returned is a union of independent sets of disjoint UDGs, namely those corresponding to the best block partition, which depends on the shift parameters i and l. This can be expressed as follows. For a fixed vertical shift i, horizontal super-strip j, and some choice of l,

$$IS[G(i,l)_j] := \bigcup_{1 \le t \le s_j} MIS[G(i,l)_{j,t}] .$$

Optimization Problems in Unit-Disk Graphs, Figure 3: Graph G(i) with i = k = 2. Disks centered in the underlined horizontal strips (2, 5) have been deleted, leaving 2 horizontal super-strips corresponding to disjoint graphs G(2)_1 and G(2)_2.

Optimization Problems in Unit-Disk Graphs, Figure 4: Graph G(2, 0)_2 with l = 0 (above). Disks centered in the underlined vertical strips (0, 3, 6) have been deleted, leaving two 2 × 2 square blocks. Graph G(2, 0)_{2,2} inside a 2 × 2 square block (below).
The best independent set in super-strip j is then obtained by horizontally shifting the grid, i.e., varying l (the maximum is taken with respect to cardinality):

$$IS[G(i)_j] := \max_{0 \le l \le k} IS[G(i,l)_j] .$$

For each i, having found the best independent set in each super-strip,

$$IS[G(i)] := \bigcup_{1 \le j \le r} IS[G(i)_j] .$$

By varying the vertical shift parameter i, we can select the best independent set for the graph as

$$IS[G] := \max_{0 \le i \le k} IS[G(i)] .$$

It is shown in [39] that

$$|IS[G]| \ge \left(\frac{k}{k+1}\right)^2 \alpha(G)$$

and that the algorithm has a running time of n^{O(k^2)}. By the choice of k, this implies a PTAS for the MIS problem on UDGs. Suggestions for improving running time and solution quality are presented in [39,47]: after a vertical shift i, the MIS problem is solved optimally on each horizontal super-strip j using dynamic programming (DP) instead of being approximated by horizontal shifting. This results in |IS[G]| ≥ (k/(k + 1))α(G) and a total running time of n^{O(k)}.

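In either version, the shifting strategy supports a divide-and-conquer approach: it breaks the graph into pieces on which optimal resolution is possible in polynomial time (by enumeration or DP), while bounding the error created by the division. The division is flexible: the k + 1 choices of a shift parameter delete pairwise disjoint sets of strips, so for at least one choice the deleted strips contain at most a 1/(k + 1) fraction of a fixed maximum independent set. We refer to [39] for details on the performance guarantees mentioned.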
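The enumeration version of the shifting scheme can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code of [39] or [47]: disks have unit diameter (adjacent iff centers are at most 1 apart), centers are given, the names choose_k and shifting_mis are hypothetical, and each k × k block is solved by plain subset enumeration, which is only practical for tiny inputs. The DP refinement is not implemented.

```python
import itertools
import math


def choose_k(eps):
    """Smallest integer k with (k / (k + 1))**2 >= 1 - eps."""
    k = 1
    while (k / (k + 1)) ** 2 < 1 - eps:
        k += 1
    return k


def _independent(points):
    return all(math.dist(a, b) > 1 for a, b in itertools.combinations(points, 2))


def _mis_by_enumeration(points):
    """Exact MIS of one k x k block by trying subsets from largest to smallest."""
    for size in range(len(points), 0, -1):
        for combo in itertools.combinations(points, size):
            if _independent(combo):
                return list(combo)
    return []


def shifting_mis(centers, k):
    """Independent set via vertical shift i and horizontal shift l on a unit grid."""
    m = k + 1
    best = []
    for i in range(m):  # delete horizontal strips congruent to i mod (k + 1)
        super_strips = {}
        for x, y in centers:
            row = math.floor(y)  # strips are left closed, right open
            if row % m != i:
                super_strips.setdefault((row - i) // m, []).append((x, y))
        chosen_i = []
        for points in super_strips.values():  # super-strip G(i)_j
            best_j = []
            for l in range(m):  # delete vertical strips congruent to l mod (k + 1)
                blocks = {}
                for x, y in points:
                    col = math.floor(x)
                    if col % m != l:
                        blocks.setdefault((col - l) // m, []).append((x, y))
                # union of exact MIS over the disjoint k x k blocks
                candidate = [p for b in blocks.values() for p in _mis_by_enumeration(b)]
                if len(candidate) > len(best_j):
                    best_j = candidate
            chosen_i.extend(best_j)  # IS[G(i)] is the union over super-strips
        if len(chosen_i) > len(best):
            best = chosen_i
    return best
```

Blocks in different positions are separated by a deleted strip of unit width, so their centers are more than 1 apart and the union of the per-block solutions is independent. For ε = 0.5, choose_k returns 3.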

Clearly, the disk representation is required for the above PTAS to work. The open problem of whether a robust PTAS exists for this problem, in the sense described in Sect. “Cliques”, was settled positively in [51]. Given a graph G = (V, E) (in standard format) and a desired error ε > 0, we seek an independent set of size at least α(G)/ρ, where ρ = 1 + ε. The algorithm starts with some arbitrary vertex v and finds a MIS I_k in G[N_k(v)] for k = 0, 1, 2, ... sequentially until the condition |I_{k+1}| > ρ|I_k| is violated. Let r denote the smallest k ≥ 0 for which |I_{k+1}| ≤ ρ|I_k|. The authors show that there exists a constant c(ρ), dependent only on ρ, such that r ≤ c(ρ), and that each MIS I_k can be found in polynomial time. By the choice of r, we know that α(G[N_{r+1}(v)]) = |I_{r+1}| ≤ ρ|I_r|, i.e., I_r is a ρ-approximate MIS for G[N_{r+1}(v)]. Suppose we have a ρ-approximate MIS I' for G' = G[V \ N_{r+1}(v)]; then clearly I = I' ∪ I_r is independent, since I' ⊆ V \ N_{r+1}(v) and I_r ⊆ N_r(v), so no vertex of I' is adjacent to a vertex of I_r. Furthermore,

$$\alpha(G) \le \alpha(G[N_{r+1}(v)]) + \alpha(G') \le \rho|I_r| + \rho|I'| = \rho|I| ,$$

i.e., I is a ρ-approximate MIS of G. Combining this with the fact that every vertex-induced subgraph of a UDG is also a UDG gives an inductive argument leading to the required PTAS. Robustness of the above algorithm is due to the following observations. The performance guarantee does not require G to be a UDG, and the algorithm always returns a (1 + ε)-approximate solution. However, the geometry of UDGs is required to establish polynomially bounded running times for finding a MIS in k-neighborhoods and the existence of the constant c(ρ). The proof of these claims also shows that if there exists an independent set I_r with |I_r| > (2r + 1)^2 (this bound assumes a unit-radius disk representation, which is equivalent to the other representations discussed before), then G is not a UDG. Since this certificate can be obtained in polynomial time, this PTAS is robust.
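The neighborhood-growing procedure of [51] can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the graph is assumed to be a dict of neighbor sets, names are hypothetical, and the exact MIS of each neighborhood is computed by brute-force enumeration (exponential in general; the polynomial bound relies on UDG geometry). The (2r + 1)^2 certificate that G is not a UDG is not implemented.

```python
import itertools
from collections import deque


def _ball(adj, alive, v, k):
    """N_k(v) in the graph induced by `alive` (BFS up to depth k)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for w in adj[u]:
            if w in alive and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def _exact_mis(vertices, adj):
    """Exact MIS by enumeration; stands in for the bounded search of [51]."""
    vs = sorted(vertices)
    for size in range(len(vs), 0, -1):
        for combo in itertools.combinations(vs, size):
            if all(b not in adj[a] for a, b in itertools.combinations(combo, 2)):
                return list(combo)
    return []


def robust_mis(adj, eps):
    """(1 + eps)-approximate MIS by growing neighborhoods (no geometry used)."""
    rho = 1 + eps
    alive = set(adj)
    result = []
    while alive:
        v = min(alive)  # any remaining vertex will do
        k, best = 0, _exact_mis(_ball(adj, alive, v, 0), adj)
        while True:
            bigger = _exact_mis(_ball(adj, alive, v, k + 1), adj)
            if len(bigger) <= rho * len(best):
                break  # r = k: I_r is rho-approximate for N_{r+1}(v)
            k, best = k + 1, bigger
        result.extend(best)
        alive -= _ball(adj, alive, v, k + 1)  # continue on G[V \ N_{r+1}(v)]
    return result
```

On the path 0-1-2-3-4 with ε = 0.5, each outer iteration stops at r = 0 and the sketch returns [0, 2, 4], which is a maximum independent set.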


Vertex Cover 


The minimum vertex cover problem on UDGs is also NP-hard, as shown in [18]. Given a UDG G = (V, E), a polynomial time heuristic that does not require a disk representation and finds a vertex cover of size at most 1.5β(G) is presented in [45]. This algorithm requires results from [37,50]. The first result is the well-known Nemhauser-Trotter (NT) decomposition [50], which states that, given an arbitrary graph G = (V, E), there exist disjoint vertex subsets P and Q such that (1) there exists a minimum vertex cover containing P; (2) if D is a vertex cover for G[Q], then D ∪ P is a vertex cover for G; and (3) any minimum vertex cover of G[Q] contains at least |Q|/2 vertices. The second result, from [37], states that following an NT decomposition, if G[Q] can be colored using k colors, then P ∪ (Q \ S) is a vertex cover of size at most 2(1 − 1/k)β(G), where S is the largest color class in G[Q].

The authors of [45] show that triangle-free UDGs can be colored using 4 colors. Given a UDG G, the heuristic first repeatedly deletes the vertices of triangles in G until the remaining graph is triangle-free; call the set of deleted vertices V'. With G := G − V', the NT decomposition is then applied to the resulting triangle-free UDG G to identify the sets P and Q. G[Q] is then colored using 4 colors, and the set S ⊆ Q corresponding to the largest color class is identified. The heuristic then returns V' ∪ P ∪ (Q \ S) as the approximate vertex cover. The approximation ratio follows by applying the local-ratio principle [6,7,45] as follows. In V' we pick 3 vertices for each triangle, whereas any vertex cover must contain at least 2 of them, a ratio of 3/2. For the triangle-free UDG, which is 4-colorable, the result from [37] applies and gives the ratio 2(1 − 1/4) = 3/2. The running time of the heuristic is dominated by the time to obtain the NT decomposition, which can be accomplished in polynomial time.

A PTAS has been developed in [39] for minimum vertex cover that uses an approach similar to the PTAS for the MIS problem described in Sect. “Independent Sets” from the same article. For 0 ≤ i < k, instead of deleting the horizontal strips congruent to i mod (k + 1), this approach uses super-strips of width k + 1 overlapping at horizontal strips congruent to i mod k. Solving the MIS problem exactly using DP on each super-strip G(i)_j then also yields a minimum vertex cover of G(i)_j (the complement of a maximum independent set). For a fixed i, the union over 1 ≤ j ≤ r of the minimum vertex covers of the G(i)_j is a valid vertex cover for G, and the smallest vertex cover found over all i has size at most ((k + 1)/k)β(G). Details are available in [39].


Domination 


The minimum dominating set (MDS) problem, the minimum independent dominating set (MIDS) problem and the minimum connected dominating set (MCDS) problem are known to be NP-hard for UDGs [18]. In fact, they are NP-hard even when restricted to a subclass of UDGs called grid graphs, on which MIS is polynomial time solvable [18]. The observation that a maximal independent set is also a minimal dominating set is used frequently in approximating dominating sets. It has been proven in [45] that any maximal independent set in a UDG G is no larger than five times its domination number; in particular, α(G) ≤ 5γ(G) ≤ 5γ_i(G). This follows from the observation that if D is a maximal independent set in a UDG G, then any vertex in an MDS (or an MIDS) can dominate at most 5 vertices in D, since UDGs contain no induced 6-claw. Any maximal independent set is hence a 5-approximate solution for the MDS and MIDS problems. A 10-approximate algorithm for the MCDS problem is also presented in [45]. This bound has been improved to 8 in several papers [2,12,14,60], which present distributed implementations that are applicable in a practical setting in wireless networks. These heuristics construct a maximal independent set (which is dominating) and then connect it using a tree approach to obtain a CDS. This approach is based on the result from [2] that, for a UDG G,

$$\alpha(G) \le 4\gamma_c(G) + 1 . \tag{1}$$


The maximal independent set I that is constructed (in polynomial time [60]) also has the property that, for any nonempty proper subset I' ⊂ I, the sets I' and I \ I' are exactly distance two apart, i.e., there exist u_1 ∈ I' and u_2 ∈ I \ I' with d(u_1, u_2) = 2. The maximal independent set is connected using a spanning tree approach in [2,12,60] and using a Steiner tree in [14]. The bound (1) was improved recently in [62] to

$$\alpha(G) \le 3.8\gamma_c(G) + 1.2 . \tag{2}$$


This tighter bound shows that the 8-approximate algorithms, such as the ones from [14,60], are in fact 7.8-approximate. It is also observed in [62] that if

$$\alpha(G) \le a\,\gamma_c(G) + b ,$$

then the best possible constant a satisfies 2.5 ≤ a ≤ 3.8, which suggested that further improvement of their result was possible. Recently in [28], it has been shown that

$$\alpha(G) \le 3.453\gamma_c(G) + 8.291 \tag{3}$$

for UDGs, and a distributed algorithm is presented that finds a CDS of size at most 6.91γ_c(G) + 16.58.
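The MIS-then-connect idea behind these heuristics can be sketched as follows. This is a hedged illustration in the spirit of the BFS-based constructions cited above, not the exact published algorithm of any of them: vertices are ranked by (BFS level, label), a maximal independent set I is chosen greedily in rank order, and the BFS parent of every non-root vertex of I is added as a connector. The graph is assumed connected and given as a dict of neighbor sets; the function name is hypothetical.

```python
from collections import deque


def mis_based_cds(adj, root):
    """Connected dominating set: BFS-ranked MIS plus BFS parents as connectors."""
    level, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:  # BFS from the root
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in level:
                level[w], parent[w] = level[u] + 1, u
                queue.append(w)
    independent = set()
    for v in sorted(level, key=lambda x: (level[x], x)):  # rank = (level, label)
        if not adj[v] & independent:
            independent.add(v)  # greedy maximal independent set in rank order
    # the parent of a non-root MIS vertex was skipped only because it already had
    # an MIS neighbor of smaller rank, so adding the parents links every MIS
    # vertex to one of smaller rank and hence, inductively, to the root
    connectors = {parent[v] for v in independent if v != root}
    return independent | connectors
```

The result has at most 2|I| − 1 vertices, so with bound (1) its size is at most 8γ_c(G) + 1. On a 6-cycle rooted at vertex 0, the sketch picks I = {0, 2, 4} and connectors {1, 5}.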

Using the bound (2), a 6.8-approximate algorithm for the MCDS problem has also been proposed recently in [48] that connects a maximal independent set using a Steiner tree approach. Given a set of vertices designated as terminals, a tree connecting the terminals such that every leaf is a terminal is called a Steiner tree. The non-terminal nodes are called Steiner nodes. In principle, we could find a Steiner tree with the minimum number of Steiner nodes (ST-MSN) with a maximal independent set I as terminals; the Steiner nodes S together with I then yield a CDS for the UDG G. Instead of solving ST-MSN optimally, the authors approximate this problem. This is sufficient because of the following observation.

From an MCDS D, we can obtain a solution for the ST-MSN problem with terminals I as follows. We can find a spanning tree in G[D], add an edge between each vertex in I \ D and some vertex in D, and remove any leaf which is not a terminal in the resulting tree. The Steiner nodes in this solution are contained in D. Hence, in the optimal solution to the ST-MSN problem, the number of Steiner nodes is at most γ_c(G).

In the approach taken in [48], first a maximal independent set I is found such that every subset of I and its complement in I are exactly distance two apart. The authors develop and use a 3-approximate algorithm for the ST-MSN problem on UDGs given terminals I (see [48] for details). Denote by S the Steiner nodes in the 3-approximate solution for the ST-MSN problem; then |S| ≤ 3γ_c(G). Thus, by (2), the CDS S ∪ I has size at most 3γ_c(G) + 3.8γ_c(G) + 1.2 = 6.8γ_c(G) + 1.2. This appears to be the approach with the best performance guarantee available presently.

The weighted versions of the MDS and MCDS problems, where the vertices of the UDG G are weighted and the objective is to find a dominating or connected dominating set of minimum weight (sum of the weights of the selected vertices), have only been studied recently, and the first constant factor approximation algorithms have been developed in [3]. A factor 72 approximation for the minimum weight dominating set (MWDS) problem and a factor 89 approximation for the minimum weight connected dominating set (MWCDS) problem are available, and these problems appear to be more complicated than their unweighted counterparts.

A PTAS for the MDS problem given the disk representation was developed in [39], along similar lines as the schemes proposed for the maximum independent set and minimum vertex cover problems. The MCDS problem also has a PTAS, developed in [17], for UDGs presented in their disk representation. In this work, an approximation algorithm running in time n^{O((s log s)^2)} is presented that constructs a CDS of size no larger than (1 + 1/s)γ_c(G). The algorithm uses a grid-based divide-and-conquer approach in combination with the shifting strategy. A robust PTAS for the MDS problem on UDGs was proposed recently in [52].

We briefly describe the robust PTAS for MDS on UDGs from [52]. Given a graph G = (V, E), the authors define a 2-separated collection of subsets S as

$$S = \{S_1, \dots, S_k\}, \quad S_i \subseteq V,\ i = 1, \dots, k, \quad \text{with } d(s,t) > 2 \ \ \forall i \ne j,\ \forall s \in S_i,\ \forall t \in S_j .$$

If D(S) denotes an MDS of G[S], the authors show that for a 2-separated collection S in G,

$$\gamma(G) = |D(V)| \ge \sum_{i=1}^{k} |D(S_i)| .$$


Furthermore, if we have subsets T_i such that S_i ⊆ T_i, i = 1, ..., k, and a bound ρ ≥ 1 such that

$$|D(T_i)| \le \rho\,|D(S_i)| , \quad \forall i = 1, \dots, k, \tag{4}$$

and

$$D' = \bigcup_{i=1,\dots,k} D(T_i) \ \text{dominates } G , \tag{5}$$


then D' is a ρ-approximate MDS of G. This is true since

$$|D'| \le \sum_{i=1}^{k} |D(T_i)| \le \rho \sum_{i=1}^{k} |D(S_i)| \le \rho\,|D(V)| = \rho\,\gamma(G) .$$


Given a UDG G = (V, E) and an ε > 0, the algorithm in [52] constructs in polynomial time (for fixed ε) a 2-separated collection S_1, ..., S_K and the supersets T_i with ρ = 1 + ε satisfying the required properties (4) and (5). This is accomplished as follows. First, we start with an arbitrary vertex v_1 ∈ V_1 := V and compute an MDS D(N_k(v_1)) of N_k(v_1) for k = 0, 1, 2, ... until the condition

$$|D(N_{k+2}(v_1))| > \rho\,|D(N_k(v_1))|$$

is violated. Denote by r_1 the smallest k for which the above condition is violated, i.e., |D(N_{r_1+2}(v_1))| ≤ ρ|D(N_{r_1}(v_1))|. Then we iterate this procedure for the graph induced by V_{i+1} := V_i \ N_{r_i+2}(v_i) until V_{i+1} = ∅. Note that in the subsequent iterations, the k-neighborhood is defined with respect to the current graph G[V_{i+1}]. Suppose this procedure terminates after K iterations; let T_i = N_{r_i+2}(v_i) and S_i = N_{r_i}(v_i) for i = 1, ..., K. The authors then show that S_1, ..., S_K is a 2-separated collection and that ∪_{i=1}^{K} D(T_i) dominates G. The termination condition for each iteration, |D(T_i)| ≤ ρ|D(S_i)|, guarantees the required approximation ratio.
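The iteration just described can be sketched in Python as follows. This is a hedged illustration, not the implementation from [52]: the graph is assumed to be a dict of neighbor sets, names are hypothetical, and each MDS D(N_k(v_i)) is found by brute-force enumeration (exponential in general; the polynomial bound relies on UDG geometry). The robustness certificate is not implemented.

```python
import itertools
from collections import deque


def _ball(adj, alive, v, k):
    """N_k(v) in the graph induced by `alive` (BFS up to depth k)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for w in adj[u]:
            if w in alive and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def _exact_mds(vertices, adj):
    """Minimum dominating set of G[vertices] by enumeration (exponential)."""
    vs = sorted(vertices)
    for size in range(1, len(vs) + 1):
        for combo in itertools.combinations(vs, size):
            covered = set(combo)
            for c in combo:
                covered |= adj[c] & vertices
            if covered == vertices:
                return list(combo)
    return []


def robust_mds(adj, eps):
    """(1 + eps)-approximate dominating set via 2-separated neighborhoods."""
    rho = 1 + eps
    alive = set(adj)  # V_i
    result = set()
    while alive:
        v = min(alive)  # arbitrary vertex v_i
        k = 0
        while True:
            small = _exact_mds(_ball(adj, alive, v, k), adj)    # D(N_k(v_i))
            big = _exact_mds(_ball(adj, alive, v, k + 2), adj)  # D(N_{k+2}(v_i))
            if len(big) <= rho * len(small):
                break  # r_i = k
            k += 1
        result |= set(big)  # D(T_i) with T_i = N_{r_i+2}(v_i)
        alive -= _ball(adj, alive, v, k + 2)  # V_{i+1} = V_i \ T_i
    return result
```

On the path 0-1-2-3-4 with ε = 0.5, the sketch returns {1, 3}, a minimum dominating set.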
It is noted in [52] that G need not be a UDG to derive the approximation ratio; however, this is necessary to show polynomial time solvability. The running time guarantee is provided by the following results from [52]. First, the number of iterations K is at most n. Second, in each iteration i, an MDS of each k-neighborhood N_k(v_i) can be found in polynomial time, since its size is shown to be bounded by a polynomial function of k. Third, the number of k-neighborhoods considered is also bounded, since r_i ≤ c(ρ), where c(ρ) is a constant that depends only on the desired approximation factor ρ. Finally, this PTAS can be made robust by utilizing the same certification approach that the same authors developed for the MIS problem (described in Sect. “Independent Sets”) to show that the graph is not a UDG.


Coloring and Clique Partitioning 


The graph coloring problem on UDGs is known to be NP-hard. In [18], 3-colorability of UDGs is shown to be NP-complete, and hence no approximation algorithm can achieve a ratio better than 4/3 unless P = NP. In fact, k-colorability of UDGs is NP-complete for any fixed k ≥ 3 [31]. A simple 3-approximation algorithm for the problem was presented in [45] based on results from [37,58]. Let


$$p(G) = \max_{H \subseteq G} \delta(H) ,$$

the largest p such that G contains a subgraph H of minimum degree p. Then

$$\chi(G) \le p(G) + 1 \quad [58]$$


and p(G) can be found in O(m + n) steps [37,58] as follows. Let p := 0. Repeatedly let v be a vertex of minimum degree in the current graph G, set p := max{p, δ(G)}, and then set G := G − v, until no vertices remain in G; the final value of p is p(G). If we denote by v_i the vertex removed in step i, each v_i then has at most p(G) neighbors in {v_{i+1}, ..., v_n}. Processing the vertices in the order v_n, ..., v_1 and coloring each vertex with the smallest color not yet assigned to any of its already colored neighbors guarantees a coloring of G with at most p(G) + 1 colors. If G is a UDG, then it is proven in [45] that

$$\frac{p(G)}{3} + 1 \le \chi(G) ,$$

so the number of colors used is at most p(G) + 1 ≤ 3(χ(G) − 1) + 1 = 3χ(G) − 2.


Using similar approaches, it has also been shown in [54] that a UDG G can be colored using no more than 3ω(G) − 2 colors. A 3-approximate algorithm for coloring UDGs using network flow and matching techniques is also available from [31].
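The elimination-and-coloring procedure for p(G) described above can be sketched in Python. This is a minimal illustration assuming a dict of neighbor sets; the function name is hypothetical, and the minimum-degree vertex is found by a linear scan (O(n^2) overall) rather than with the bucket structure needed for the O(m + n) bound.

```python
def smallest_last_coloring(adj):
    """Greedy coloring in reverse minimum-degree elimination order.

    Returns (color, p) where color maps vertices to 0, 1, ... and p = p(G).
    """
    alive = {v: set(nbrs) for v, nbrs in adj.items()}
    order, p = [], 0
    while alive:
        # minimum degree vertex of the current graph (linear scan)
        v = min(alive, key=lambda u: len(alive[u]))
        p = max(p, len(alive[v]))  # p := max{p, delta(G)}
        order.append(v)            # v_i is removed in step i
        for w in alive[v]:
            alive[w].discard(v)
        del alive[v]               # G := G - v
    color = {}
    for v in reversed(order):  # process v_n, ..., v_1
        used = {color[w] for w in adj[v] if w in color}  # at most p colors
        c = 0
        while c in used:
            c += 1
        color[v] = c  # smallest color unused by already colored neighbors
    return color, p
```

On a 5-cycle, p(G) = 2 and the sketch uses 3 colors, which is optimal since an odd cycle is not bipartite.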

Clique partitioning is NP-complete even when restricted to coin graphs (UDGs where all overlaps are tangential) [15]. A polynomial time 3-approximate algorithm for this problem that uses the disk representation is available from [15]. The algorithm proceeds by first partitioning the plane into horizontal strips of width √3. A disk belongs to strip i if its center lies in the strip; a disk with its center on a boundary is assigned to the strip on top. Let G_i denote the UDG induced by the disks in strip i; the vertex sets V(G_i) are pairwise disjoint. Solve the minimum clique partitioning problem exactly on each G_i, and let Z_i denote the resulting collection of cliques. The authors observe that this can be accomplished in polynomial time by coloring the complement, since each G_i is a cocomparability graph [9,15]. The clique
partition returned by the algorithm is Z := ∪_i Z_i, and it can be shown that |Z| ≤ 3χ̄(G) as follows. Let Z' denote a minimum clique partition of the UDG G, and let Z'_i be the restriction of Z' to G_i, obtained by excluding the vertices not in V(G_i) from the cliques in Z' (and discarding empty cliques). Z'_i is a valid clique partition of G_i, and hence |Z'_i| ≥ |Z_i|. If C is a clique in Z', the authors observe, based on geometric arguments, that the centers of the disks in C must lie inside three consecutive strips. Hence, each C in Z' is a union of at most 3 disjoint cliques from Z'_j, Z'_{j+1} and Z'_{j+2} for some j. Hence we have

$$|Z| = \sum_i |Z_i| \le \sum_i |Z'_i| \le 3\,|Z'| = 3\bar\chi(G) .$$

The running time of the approximation algorithm is dominated by the exact solution step on each strip, resulting in O(n + m), where m denotes the number of edges in G.


Related Results 


A survey of on-line and off-line approximation algo- 
rithms for independent set and coloring problems on 
UDGs and general disk graphs (intersection graphs of 
disks of arbitrary radii) can be found in [23]. A short 
survey of results for cliques, independent sets and col- 
oring of disk graphs is also available in [27]. A survey 
of complexity results on recognizing several variants of UDGs can be found in [36]. PTASs for the maximum weighted independent set and minimum weighted vertex cover problems on intersection models of disks are available in [24].

A notion of thickness of UDGs is introduced and 
fixed parameter tractability of maximum independent 
set, minimum vertex cover and minimum (connected) 
dominating set problems (with thickness as parameter) 
is established in [59]. A parameterized algorithm running in time n^{O(√k)} for finding an independent set of size k on bounded ratio disk graphs (where the ratio of the maximum diameter to the minimum diameter is bounded by a constant) is presented in [1].

Several variants of the classical vertex coloring 
problem have been considered on UDGs, primarily 
motivated by different frequency assignment problems 
that arise in wireless networks. Apart from natural
generalizations of UDGs such as the general disk graphs and
bounded ratio disk graphs mentioned before, other
generalizations of UDGs such as Quasi UDGs [42], bi- 
sectored UDGs [53] and double disk graphs [44] have 
also been developed motivated by wireless applications. 
Coloring problems have been studied in the context of 
these generalizations. 

Algorithms for distance constrained labeling, which 
is a generalization of the well-known vertex coloring 
problem, for disk graphs are presented in [26]. Another 
variant of coloring called the multicoloring problem on 
UDGs is considered in [49]. The notion of conflict-free 
coloring is introduced and studied in the context of 
disk graphs in [25]. A k-improper coloring of a graph
is one in which each color class induces a subgraph of
maximum degree at most k. Note that a 0-improper coloring is
a proper coloring by the standard definition. For fixed
k, the k-improper coloring problem has been shown
to be NP-complete in [35]. Coloring and other problems
on bisectored unit disk graphs, which generalize
UDGs to allow for the phenomenon of cell sectorization
in wireless communication, are studied in [53]. Another
generalization of UDGs motivated by frequency
assignment problems in wireless networks is the class of
double disk graphs. Here, two concentric disks of arbitrary
radii are associated with each vertex, and two vertices are
adjacent if the inner disk of one intersects the outer disk
of the other. For instance, one could think of the inner
disk as the receiver range and the outer disk as the
transmission range. Coloring problems on these graphs
are studied and constant factor approximation algorithms
are developed in [22,44]. Hierarchical models of
UDGs formed by a sequence of labeled UDGs are considered
in [46], where PTASs for the maximum independent set,
minimum dominating set, minimum clique cover, and
minimum vertex coloring problems for UDGs specified
hierarchically are developed.
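The defining condition of a k-improper coloring is easy to check directly: every vertex may have at most k neighbours of its own color. The sketch below assumes the graph is given as a dictionary of neighbour sets and the coloring as a dictionary from vertex to color; the function name is hypothetical.

```python
def is_k_improper(adj, coloring, k):
    """Return True if each color class induces a subgraph of maximum degree
    at most k, i.e. no vertex has more than k same-colored neighbours."""
    for v, neighbours in adj.items():
        same_color = sum(1 for u in neighbours if coloring[u] == coloring[v])
        if same_color > k:
            return False
    return True


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(is_k_improper(triangle, {0: "a", 1: "a", 2: "a"}, 2))  # one class, degree 2
print(is_k_improper(triangle, {0: "a", 1: "b", 2: "c"}, 0))  # a proper coloring
```

With k = 0 the check reduces to the usual definition of a proper coloring.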

The notion of well-separated pair decomposition [13],
with applications in geometric proximity problems, is
studied in the context of UDGs, and algorithms for
computing it are developed in [29]. In [41], the
hardness of approximately embedding UDGs is considered.
Given a UDG G = (V, E), let $L(c_u, c_v)$ denote
the Euclidean distance between the centers $c_u$, $c_v$ of the
disks u, v ∈ V in an embedding emb(G). The authors define
the quality of an embedding emb(G) as

$$q(\mathrm{emb}(G)) = \frac{\max_{(u,v)\in E} L(c_u, c_v)}{\min_{(u,v)\notin E} L(c_u, c_v)}.$$

Note that for any proper unit disk embedding emb(G),
the numerator of q(emb(G)) is at most 1 and the
denominator is more than 1. The authors of [41] then
show that finding an embedding emb(G) for a UDG
G such that $q(\mathrm{emb}(G)) \le \sqrt{3/2} - \varepsilon$, where
$\varepsilon \to 0$ as $n \to \infty$, is NP-hard.

A data structure referred to as the extended doubly
connected edge list is developed in [43] for representing
UDGs, which facilitates faster implementation of routing
algorithms in mobile wireless networks.

Max-cut and max-bisection problems in UDGs are
shown to be NP-hard in [21].
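As a small illustration of the embedding quality measure q(emb(G)) used in [41], the sketch below computes it for a given set of disk centers. It assumes the centers are supplied as a dictionary from vertex to (x, y) coordinates and the edges as vertex pairs; the function name is hypothetical, and a complete graph (no non-edges) yields 0 because the denominator is then infinite.

```python
import math
from itertools import combinations


def embedding_quality(centers, edges):
    """Longest edge length divided by the shortest distance between
    non-adjacent vertices, for the embedding given by `centers`."""
    edge_set = {frozenset(e) for e in edges}
    longest_edge = 0.0
    shortest_non_edge = math.inf
    for u, v in combinations(centers, 2):
        d = math.dist(centers[u], centers[v])
        if frozenset((u, v)) in edge_set:
            longest_edge = max(longest_edge, d)
        else:
            shortest_non_edge = min(shortest_non_edge, d)
    return longest_edge / shortest_non_edge


centers = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (2.0, 0.0)}
print(embedding_quality(centers, [("a", "b")]))  # 0.5 / 1.5
```

For a proper unit disk embedding the result is below 1, in line with the remark on the numerator and denominator.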


Conclusions 


In this chapter, we have surveyed results from the literature
on classical combinatorial optimization problems such
as maximum clique, maximum independent set,
minimum vertex cover, minimum (connected) domination,
graph coloring and minimum clique partitioning on
unit-disk graphs. The approaches taken to solve these
problems and the key ideas involved have been briefly
described, and several recent results from the literature
have also been presented. A summary of the important
results surveyed can be found in Table 1.
Optimization Problems in Unit-Disk Graphs, Table 1
A summary of results surveyed in this chapter


Problem 


Clique 
Independent 
Set 

Vertex cover 
Domination 
Connected 
Domination 
Coloring NPC [18,31] 
Clique 
Partitioning 


Complexity Constant PTAS Robust algo. 
factor 


InP[is] |N/A N/A 
NPC{is] |3(45]* | Bolt 


Poly-time [55] 
PTAS [51] 


a7 
39] 
[i7]t 


1.5 [45] # 
5 [45] * 
6.8 [48] 


NPC [18] 
NPC [18] 
NPC [18] 


PTAS [52] 


3 [45]* 
3[15]# 


NPC [15] 


¥ Algorithm requires disk representation. * Algorithm does not 
use a disk representation, but graph must be a UDG to ensure 
running time and/or performance guarantees. 
The origin of the term mathematical software can be
traced back to J.R. Rice, who organized a symposium on
this topic in 1969 [20,21]. Today, the term ‘mathematical
software’ refers to accurate, efficient and reliable
software for the solution of mathematical problems that
arise in scientific and engineering applications. Different
applications are described by similar mathematical
models, leading to common computational kernels.
Mathematical software provides solutions to these
kernels and supplies building blocks for the development
of application software. Therefore, the availability of
mathematical software simplifies the solution of
application problems by relieving users from having to
deal with details related to basic algorithms and their
implementations, while exploiting the experience and
know-how of mathematical software developers that is
needed to produce reliable, accurate and robust modules.
Mathematical software is therefore the result of the
collaboration of experts in different fields of scientific
computing, as confirmed by the existence of organized
mathematical software repositories and cross-indexed
catalogs [1,10,18].

The production of mathematical software is a com- 
plex process, ranging from the development of algo- 
rithms, to their implementation in specific environ- 
ments, to the development of user-friendly interfaces, 
and to intensive testing and quality assurance of the fi- 
nal product. Moreover, this activity is largely influenced 
by the evolution of computer architectures. 

To solve real world problems, hardware must be 
‘dressed’ with a suitable suite of software products. This 
software can be grouped into three main layers that we 
refer to as low-level, medium-level and high-level. These 
terms indicate how close the software is to the hardware 
(with low-level referring to the closest layer), and, at the 
same time, how close the software is to the real-world 
problem (with high-level software referring to the layer 
closest to the application). 

We divide mathematical software into the following 
main categories [23]: 

a) individual routines, sometimes gathered into collec- 
tions, 

b) packages of basic routines, 

c) packages for specific mathematical areas, 

d) general-purpose libraries, 

e) problem solving environments (PSE). 

Note that problem solving environments have been
included among the above categories, although they do
not consist only of mathematical software, in order to
give an idea of the trends in the development of
scientific software and of the role of mathematical software
in such contexts.
In the field of optimization, a lot of sequential software
is available within each of the above categories.
The continuing evolution of most optimization
algorithms and software makes it hard both to give an
exhaustive overview of the current (1999) state of the
art and to rapidly identify, among the existing software,
a package able to take advantage of the special nature
of the particular problem to be solved.

A first step in this direction should be to formulate
the optimization problem as one of the standard
optimization paradigms. This usually leads to a taxonomy,
according to which most optimization problems can be
classified. One of the most complete classifications can
be found in [17], which provides an up-to-date (1999)
online optimization tree guide to the different subfields
of optimization and includes an overview of the major
algorithms in each area, with pointers to software
packages where appropriate. The first two branches of
this optimization tree provide pathways through
continuous and discrete problems. Following these
pathways we encounter the most relevant classes of
optimization problems, ranging from nonlinear equations
and nonlinear least squares for unconstrained
continuous optimization, to linear programming and
general nonlinearly constrained problems for constrained
continuous optimization.

The choice of an algorithm, and related software,
to solve an optimization problem has to be made taking
into account some intrinsic properties of the
problem, such as:

• type of the objective function and constraints,

• size (number of variables and constraints),

• sparsity degree,

as well as the main factors that determine the
computational cost of the algorithm, such as:

• evaluation of objective functions, constraints,
and/or derivatives,

• number of evaluations of objective functions,
constraints, and/or derivatives,

• number of variables or constraints,

• number of iterations (optimization algorithms are
essentially iterative).

We point out that much of the available optimization
software belongs to category c), that is, it consists of
packages specifically aimed at optimization problems,
which can be divided into the following two groups:
• single-class, that is, packages which address a specific
problem class;
• multiple-class, that is, packages that cover more than
one class of problems.
Among the software collections belonging to the
first group, we mention WHIZARD [25] for linear
programming, which uses primal, dual and network
simplex algorithms, and L-BFGS-B [5] for
bound-constrained optimization problems, which uses a
limited memory BFGS algorithm and is suitable for
solving large problems.

Among the packages belonging to the multiple-class
group, we mention MINPACK-1 [14], which is intended
to solve systems of nonlinear equations and nonlinear
least squares problems and is based on the trust
region concept.

One of the most recently developed packages of the 
multiple-class group is LANCELOT [6,12], which pro- 
vides solvers for unconstrained optimization problems, 
systems of nonlinear equations, bound-constrained 
optimization problems, and general nonlinearly con- 
strained optimization problems. 

We observe that in many cases software packages
that can be used for more than one problem class
sacrifice some efficiency, because they treat the special
nature of the target problem as if it were a general
one.

We also mention three general numerical software 
libraries, NAG, Harwell and IMSL (category d)), which 
contain optimization capabilities. 

In the 1990s, much research effort has been devoted
to developing modeling languages and optimization
systems (category e), that is, PSEs). The basic idea
of a modeling language is to provide the user
with a common notation and familiar concepts to
formulate optimization models and examine solutions, while
the computer manages communication with an
appropriate solver. One of the most used modeling languages is
AMPL [9], which offers an interactive command
environment for setting up and solving linear and nonlinear
optimization problems, in continuous or discrete
variables.

An optimization system is a complete system
formed by modeling languages to formulate the
optimization problem, as well as by easy-to-use user
interfaces to solve it and other utilities, like report writing,
model management and execution control. An example
of a currently available optimization system is the
NEOS Server [17], which allows users to solve optimization
problems automatically over the Internet, with minimal
input from the user and with state-of-the-art
optimization software, without downloading and linking
code. Furthermore, the NEOS Server provides
derivatives and sparsity patterns determined automatically
with ADIFOR, a tool for the automatic
differentiation of Fortran programs [3].

Finally, we observe that, as pointed out in [7,26], 
computational kernels that frequently appear in nu- 
merical optimization are those of dense and sparse 
linear algebra (matrix factorization, orthogonalization, 
preconditioning, etc.). In most optimization software,
these computational kernels are handled by efficient
routines from standard packages designed for basic linear
algebra problems (BLAS, LAPACK).

While the quality of mathematical software can be 
considered satisfactory for ‘traditional’ computing en- 
vironments, the widespread and effective use of high 
performance computing (HPC) resources, required for 
the solution of the so-called grand challenges, is still 
inhibited by the lack of software suitable for such ad- 
vanced computational environments [23]. 

Design and implementation of mathematical soft- 
ware for high performance computing environments 
has to take into account a number of issues in addi- 
tion to the ones faced for sequential and vector systems. 
The new features to be dealt with include a variety of 
processor and memory system architectures, the lack of 
standard language features for specifying parallel oper- 
ations or data distribution, and nondeterminism in ex- 
ecution. 

The number of available parallel optimization
packages is small compared with the large number of
sequential packages. Among them, we mention BTN
(block truncated Newton) [16], for unconstrained
minimization in shared and distributed memory computing
environments, and PDS, a collection of routines for
solving unconstrained nonlinear optimization problems
using direct search methods, which has been
developed for distributed memory architectures [24].
Moreover, versions of GENOS (generalized network
optimization system), for solving optimization
problems with network and generalized network
constraints, are available for vector and parallel
machines [8]. Routines for unconstrained nonlinear
problems based on Newton-type methods are included in
the NAG parallel library. Other efforts to produce
optimization software for HPC environments are gathered
into a few projects currently under development. A
well-known project is MINPACK-2, aimed mainly at
developing a version of MINPACK-1 suitable for advanced
architectures [2].

Two basic strategies for introducing parallelism into 
optimization algorithms, and more generally into nu- 
merical algorithms, can be identified: 

• parallelizing the computational kernels;
• parallelizing the methods.

One of the computational kernels in optimization
algorithms mentioned above is the evaluation of
objective functions and/or derivatives, which can dominate
the overall execution time. If the evaluations are
computationally intensive, one can exploit parallelism
within each of them, and how this is done depends on the
type of the objective function. For example, in the case
of partially separable functions, i.e. functions expressed
as the sum of element functions, each of which is affected
by only a small subset of the variables, one can exploit
parallelism by having different processors compute different
element functions concurrently, as in [11].
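A minimal sketch of concurrent element-function evaluation for a partially separable function follows. Each element is represented as a pair (indices, function of the sub-vector); this representation, the chained Rosenbrock example and the function names are assumptions made for illustration. Python threads show the structure of the computation but give little speedup for CPU-bound pure-Python code, so a real implementation would rather use processes (for example `concurrent.futures.ProcessPoolExecutor`) or MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_partially_separable(elements, x, max_workers=4):
    """Evaluate f(x) = sum of element functions, each element computed by a
    separate worker on the small sub-vector of x it depends on."""
    def eval_element(element):
        indices, func = element
        return func([x[i] for i in indices])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(eval_element, elements))  # results arrive in order


def chained_rosenbrock_elements(n):
    """Hypothetical example: n - 1 elements, element i depends on x[i], x[i+1]."""
    def element(v):
        return 100.0 * (v[1] - v[0] ** 2) ** 2 + (1.0 - v[0]) ** 2
    return [((i, i + 1), element) for i in range(n - 1)]


print(evaluate_partially_separable(chained_rosenbrock_elements(3), [0.0, 0.0, 0.0]))
```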

With the strategy of ‘parallelizing the method’,
parallelism is introduced at a level higher than
computational kernels, often leading to new optimization
methods. For example, in the context of quasi-Newton
methods, parallelism has been introduced using line
searches that evaluate the objective function at multiple
points concurrently. The basic idea consists in choosing
several points along a search direction, evaluating the
objective function at each of them concurrently, and
using the point with the lowest function value as the
next iterate.
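The multi-point line search can be sketched as follows. The trial step lengths, the test function and the function names are hypothetical, the search direction is simply passed in (in a quasi-Newton method it would come from the current Hessian approximation), and no sufficient-decrease test is applied, which a practical method would add.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_line_search(f, x, direction, step_lengths=(0.125, 0.25, 0.5, 1.0),
                         max_workers=4):
    """Evaluate f at x + alpha * direction for several alphas concurrently
    and return the trial point with the lowest function value."""
    def trial(alpha):
        point = [xi + alpha * di for xi, di in zip(x, direction)]
        return f(point), alpha, point

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        trials = list(pool.map(trial, step_lengths))
    best_value, best_alpha, best_point = min(trials, key=lambda tr: tr[0])
    return best_point, best_alpha, best_value


def sphere(point):
    return sum((p - 1.0) ** 2 for p in point)


print(parallel_line_search(sphere, [0.0, 0.0], [1.0, 1.0]))
```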

A further example can be found in the area of global
optimization. The idea is to partition the feasible region
into subregions, where each processor searches for a
local minimum; the global minimum is obtained by
comparing the local results (see for example [4,19]).
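A sketch of the partitioning idea for a one-dimensional problem: the interval is split into equal subregions, a golden-section search (an assumed choice of local method) runs in each subregion concurrently, and the best local result is kept. The test function and the names are hypothetical; the approach finds the global minimum only if each subregion is small enough for the local search to be reliable there.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def golden_section(f, a, b, tol=1e-8):
    """Local minimizer of f on [a, b]; exact only when f is unimodal there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    x = 0.5 * (a + b)
    return x, f(x)


def partitioned_global_search(f, lower, upper, n_regions=8, max_workers=4):
    """Search each subregion in parallel and compare the local minima."""
    width = (upper - lower) / n_regions
    regions = [(lower + k * width, lower + (k + 1) * width) for k in range(n_regions)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        local_results = list(pool.map(lambda reg: golden_section(f, *reg), regions))
    return min(local_results, key=lambda res: res[1])


def tilted_double_well(x):
    return (x * x - 1.0) ** 2 + 0.3 * x  # local minima near x = -1 and x = 1


print(partitioned_global_search(tilted_double_well, -2.0, 2.0))
```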

Finally we observe that an effective implementation 
of the parallelization strategies described requires the 
use of dynamic load balancing techniques, to ensure as 
much as possible that the workload is uniformly dis- 
tributed among the processors and, hence, to minimize 
processor idle time. 


Optimization Strategies for Dynamic Systems 




See also 


> Continuous Global Optimization: Models, Algorithms and Software
> Large Scale Unconstrained Optimization
> Modeling Languages in Optimization: A New Paradigm


References 


1. ACM Trans Math Softw. http://www.netlib.org/toms

2. Averick BM, Moré JJ: User’s guide for the MINPACK-2 test problem collection. Argonne Nat Lab Math Comput Sci Div 157, http://www.mcs.anl.gov/summaries/minpack93/

3. Bischof C, Carle A, Hovland P, Khademi P, Mauer A (1993) ADIFOR 2.0 user’s guide (revision D). ANL/MCS-TM-92

4. Byrd RH, Dert CL, Rinnooy Kan AHG, Schnabel RB (1990) Concurrent stochastic methods for global optimization. Math Program 46:1-29

5. Byrd RH, Lu P, Nocedal J (1997) L-BFGS-B: Fortran subroutines for large scale bound constrained optimization. ACM Trans Math Softw 23:550-560

6. Conn AR, Gould NIM, Toint PhL (1992) LANCELOT: A Fortran package for large scale nonlinear optimization. Ser Comput Math, vol 17. Springer, Berlin

7. Conn AR, Gould NIM, Toint PhL (1994) Large-scale nonlinear constrained optimization: A survey. In: Spedicato E (ed) Algorithms for Continuous Optimization: the State of the Art. Ser C: Math and Physical Sci, vol 434. Kluwer, Dordrecht, pp 287-332

8. Dembo JM, Zenios SA (1987) GENOS 1.0 user’s guide: A generalized network optimization system. Report 87-13-03, Dept Decision Sci Wharton School, Univ Pennsylvania

9. Fourer R, Gay DM, Kernighan BW (1993) AMPL: A modeling language for mathematical programming. Duxbury Press, Pacific Grove

10. GAMS: Guide to the available mathematical software. Nat Inst Standards and Techn (NIST) http://gams.nist.gov

11. Griewank A, Toint PhL (1983) Numerical experiments with partially separable optimization problems. In: Dold A, Eckmann B (eds) Numerical Analysis: Lecture Notes in Mathematics, vol 1066. Springer, Berlin

12. LANCELOT. Dept Comput and Inform, Council Central Lab Res Councils http://www.dci.clrc.ac.uk/Activity/LANCELOT

13. MIPIII (1994) User’s manual. Ketron Management Sci

14. Moré JJ, Sorensen DC, Hillstrom KE, Garbow BS (1984) The MINPACK project. In: Cowell WR (ed) Sources and Development of Mathematical Software. Prentice-Hall, Englewood Cliffs

15. Moré JJ, Wright SJ (1993) Optimization software guide. SIAM, Philadelphia

16. Nash SG, Sofer A (1992) BTN: Software for parallel unconstrained optimization. ACM Trans Math Softw 18:414-448

17. NEOS Optimization software: NEOS guide. Optim Techn Center Argonne Nat Lab and Northwestern Univ, http://www-c.mcs.anl.gov/home/otc/

18. Netlib Repository. Univ Tennessee, Knoxville and Oak Ridge Nat Lab, http://www.netlib.org

19. Pardalos PM, Philips A, Rosen JB (1992) Topics in parallel computing in mathematical programming. Sci Press, Marrickville

20. Rice JR (1969) Announcement and call for papers, mathematical software. SIGNUM Lett 4(3)

21. Rice JR (1971) Mathematical software. Acad Press, New York

22. Rice JR, Boisvert RF (1996) From scientific software libraries to problem-solving environments. IEEE Comput Sci Eng Fall

23. di Serafino D, Maddalena L, Messina P, Murli A (1998) Some perspectives on high performance mathematical software. In: De Leone R, Murli A, Pardalos PM, Toraldo G (eds) High Performance Algorithms and Software in Nonlinear Optimization. Kluwer, Dordrecht

24. Torczon V: PDS: Direct search methods for unconstrained optimization on either sequential or parallel machines. http://www-c.mcs.anl.gov/home/otc/

25. Whiz C (1994) Linear programming optimizer. Ketron Management Sci

26. Wright MH (1993) Some linear algebra issues in large-scale optimization. In: Proc NATO ASI Conf Linear Algebra for Large-Scale and Real Time Applications. Kluwer, Dordrecht


Optimization Strategies for Dynamic Systems


ARTURO CERVANTES, L. T. BIEGLER 
Chemical Engineering Department, Carnegie
Mellon University, Pittsburgh, USA


MSC2000: 93-XX, 65L99 


Article Outline 


Keywords 
Variational Methods 


Partial Discretization 
Dynamic Programming 
Sequential Methods 
Sensitivity-Based Gradients 
Adjoint-Based Gradients 

Full Discretization 
Multiple Shooting 
Collocation 
NLP Techniques 
Full Space SQP Approaches 




Emerging Areas 
Addressing Bottlenecks in NLP Solvers 
Discrete Decisions In Dynamic Optimization 
Multistage Applications 


See also 
References 


Keywords 


Dynamic optimization; NLP; Adjoint; Sensitivity; 
Collocation; SQP 


Interest in dynamic simulation and optimization of
chemical processes increased significantly during
the 1980s and 1990s. Common problems include control
and scheduling of batch processes; startup, upset, shut- 
down and transient analysis; safety studies and the 
evaluation of control schemes. Chemical processes are 
modeled dynamically using differential-algebraic equa- 
tions (DAEs). The DAE formulation consists of differ- 
ential equations that describe the dynamic behavior of 
the system, such as mass and energy balances, and al- 
gebraic equations that ensure physical and thermody- 
namic relations. 

The general dynamic optimization problem can be
stated as follows:

$$\min_{z(t),\,y(t),\,u(t),\,t_f,\,p} \; \Phi\big(z(t_f), y(t_f), u(t_f), t_f, p\big) \tag{1}$$

s.t. DAE model:

$$\frac{dz}{dt} = F\big(z(t), y(t), u(t), t, p\big) \tag{2}$$

$$G\big(z(t), y(t), u(t), t, p\big) = 0 \tag{3}$$

initial conditions:

$$z(0) = z^0 \tag{4}$$

point conditions:

$$G_s\big(z(t_s), y(t_s), u(t_s), t_s, p\big) = 0 \tag{5}$$

bounds:

$$z^L \le z(t) \le z^U,\quad y^L \le y(t) \le y^U,\quad u^L \le u(t) \le u^U,\quad p^L \le p \le p^U,\quad t_f^L \le t_f \le t_f^U \tag{6}$$

where

Φ is a scalar objective function,
F are differential equation constraints,
G are algebraic equation constraints,
G_s are additional point conditions at times t_s,
z are differential state profile vectors,
z^0 are the initial values of z,
y are algebraic state profile vectors,
u are control state profile vectors,
p is a time-independent parameter vector.


We assume, without loss of generality, that the index of 
the DAE system is one and that the objective function 
is in linear Mayer form. Otherwise, it is easy to refor- 
mulate most problems to this form. Dynamic optimiza- 
tion problems can be solved either by the variational 
approach or by applying some level of discretization 
that converts the original continuous time problem into 
a discrete problem. The variational approaches focus
on obtaining a solution to the classical necessary
conditions for optimality; they are also known
as indirect methods.

The methods that discretize the original continuous 
time formulation can be divided into two categories, 
according to the level of discretization. Here we dis- 
tinguish between the methods that discretize only the 
control profiles (partial discretization) and those that 
discretize the state and control profiles (full discretiza- 
tion). Basically, the partially discretized problem can be 
solved either by dynamic programming or by apply- 
ing a nonlinear programming (NLP) strategy (direct- 
sequential). A basic characteristic of these methods is 
that at every iteration a feasible solution of the DAE 
system, for given control values, is obtained by integra- 
tion. The main advantage of these approaches is that 
they generate smaller discrete problems than full dis- 
cretization methods. 

The methods that fully discretize the continuous 
time problem also apply NLP strategies to solve the 
discrete system and are known as direct-simultaneous 
methods. These methods can use different NLP and dis- 
cretization techniques but the basic characteristic is that 
they solve the DAE system only once, at the optimum. 
In addition, they have better stability properties than 
partial discretization methods, especially in the
presence of exponentially increasing modes. On the other
hand, the discrete problem is larger and may require
special solution techniques.

With this classification we take into account the de- 
gree of discretization used by the different methods. 
Some authors might prefer to classify the methods ac- 
cording to the solution strategy as indirect methods, 
methods based on dynamic programming and direct 
methods [41]. 

This article is organized as follows. In the next sec- 
tion we present the description of the variational meth- 
ods. Following this we describe methods that partially 
discretize the dynamic optimization problem, and we 
then discuss full discretization methods as well. Finally, 
we conclude with a brief description of some of the 
emerging areas related to dynamic optimization such as 
interior point methods and the solution of mixed integer 
dynamic optimization problems and multistage prob- 
lems. 


Variational Methods 


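These methods are based on the solution of the first
order necessary conditions for optimality that are
obtained from Pontryagin’s maximum principle [37]. For
the problem (1)-(4), the optimality conditions are
formulated as a set of differential-algebraic equations

$$\frac{dz}{dt} = \frac{\partial H}{\partial \lambda} = F\big(z(t), y(t), u(t), t, p\big), \qquad z(0) = z^0, \tag{7}$$

$$\frac{d\lambda}{dt} = -\frac{\partial H}{\partial z} = -\Big(\frac{\partial F}{\partial z}\Big)^{T}\lambda - \Big(\frac{\partial G}{\partial z}\Big)^{T}\mu, \qquad \lambda(t_f) = \frac{\partial \Phi}{\partial z(t_f)} + \Big(\frac{\partial G_f}{\partial z(t_f)}\Big)^{T}\nu, \tag{8}$$

$$\frac{\partial H}{\partial y} = \Big(\frac{\partial F}{\partial y}\Big)^{T}\lambda + \Big(\frac{\partial G}{\partial y}\Big)^{T}\mu = 0,$$

$$\frac{\partial H}{\partial u} = \Big(\frac{\partial F}{\partial u}\Big)^{T}\lambda + \Big(\frac{\partial G}{\partial u}\Big)^{T}\mu = 0, \tag{9}$$

$$\int_0^{t_f} \frac{\partial H}{\partial p}\, dt = 0,$$

$$G\big(z(t), y(t), u(t), t, p\big) = 0,$$

where the Hamiltonian, H, is a scalar function of the
form

$$H(t) = \lambda(t)^{T} F(t) + \mu(t)^{T} G(t), \tag{10}$$

and λ, μ are vectors of the adjoint variables and ν is
the multiplier associated with the final time constraint

$$G_f\big(z(t_f), y(t_f), u(t_f), t_f, p\big) = 0.$$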


The main difficulty in obtaining a solution to these
equations lies in the boundary conditions. Normally the
state variables are assigned initial conditions and the
adjoint variables are assigned final conditions. This
leads to a two-point boundary value problem
(TPBVP) that can be solved with different approaches:
single shooting, invariant embedding, multiple shooting,
collocation on finite elements and finite differences.

In the single shooting method the missing initial
condition values are guessed. Then, an initial value
solver integrates the DAE forward and a Newton iter- 
ation is applied to adjust the guessed initial conditions 
so that the final conditions are equal to the given values. 
The main disadvantage of this method is that in many 
cases the problem cannot be solved for a given set of 
guessed initial conditions, due to nonlinearities and in- 
stabilities of the DAE system. 
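A minimal single shooting sketch for a scalar second-order boundary value problem with y(t0) = y_a and y(t1) = y_b follows: the missing initial slope is guessed, the ODE is integrated forward with classical RK4, and a Newton iteration with a finite-difference derivative corrects the guess. The Bratu-type test problem y'' + exp(y) = 0, y(0) = y(1) = 0, and all function names are assumptions for illustration; this problem has two solutions, and the starting guess s0 = 0 selects the one with the smaller initial slope.

```python
import math


def rk4_endpoint(f, y0, t0, t1, steps=200):
    """Integrate the first-order system y' = f(t, y) from t0 to t1 with RK4."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def single_shooting(f, y_a, y_b, s0, t0=0.0, t1=1.0, tol=1e-10, max_iter=50, fd=1e-7):
    """Adjust the guessed slope s = y'(t0) until the final condition y(t1) = y_b holds."""
    def residual(s):
        return rk4_endpoint(f, [y_a, s], t0, t1)[0] - y_b

    s = s0
    r = residual(s)
    for _ in range(max_iter):
        if abs(r) < tol:
            break
        slope = (residual(s + fd) - r) / fd  # finite-difference derivative dr/ds
        s -= r / slope                       # Newton correction of the guess
        r = residual(s)
    return s, r


def bratu(t, y):
    return [y[1], -math.exp(y[0])]  # y'' + exp(y) = 0 as a first-order system


slope, final_residual = single_shooting(bratu, 0.0, 0.0, s0=0.0)
print(slope, final_residual)
```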

Invariant embedding [45] is a procedure for converting
the TPBVP to an initial value problem (IVP).
It is based on assuming the structure of the solution, 
and results in solution procedures analogous to the Ric- 
cati matrix differential equation. The main disadvan- 
tage here is the high dimensionality of the resulting 
problem. 

Multiple shooting methods follow the same idea as 
single shooting, but now the integration horizon is di- 
vided into smaller subintervals. In this way, state vari- 
able values are not only guessed at initial time, but also 
at several points in between. Then the system equa- 
tions are decomposed by either solving a collocation 
system for each region or using a direct integrator along 
the nominal trajectory on each subinterval. The New- 
ton iteration is also needed to enforce the continuity 
between subintervals. The discretization methods (or
global methods) are known to be the most stable. The
solution to the TPBVP is obtained simultaneously for the
whole horizon, so the initial conditions do not need to
be guessed.
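A compact sketch of multiple shooting for the hypothetical test problem y'' = y, y(0) = 0, y(1) = 1, whose exact solution is sinh(t)/sinh(1). The state and slope are guessed at the start of each subinterval, and Newton's method, with a finite-difference Jacobian and a small dense Gaussian elimination, enforces the two boundary conditions and the continuity between subintervals. The function names and parameter values are assumptions; a production code would exploit the block structure of this linear system rather than eliminate it densely.

```python
import math


def rk4(f, y0, t0, t1, steps=50):
    """Classical RK4 for y' = f(t, y) from t0 to t1; returns y(t1)."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def multiple_shooting(f, y_a, y_b, n_intervals=4, t0=0.0, t1=1.0,
                      newton_iters=5, fd=1e-7):
    nodes = [t0 + (t1 - t0) * k / n_intervals for k in range(n_intervals + 1)]

    def residual(S):
        res = [S[0] - y_a]                                  # boundary condition at t0
        for k in range(n_intervals):
            end = rk4(f, S[2 * k:2 * k + 2], nodes[k], nodes[k + 1])
            if k < n_intervals - 1:                         # continuity at node k + 1
                res += [end[0] - S[2 * k + 2], end[1] - S[2 * k + 3]]
            else:
                res.append(end[0] - y_b)                    # boundary condition at t1
        return res

    S = [0.0] * (2 * n_intervals)  # guessed (y, y') at the start of each subinterval
    for _ in range(newton_iters):
        R = residual(S)
        n = len(S)
        J = [[0.0] * n for _ in range(n)]
        for j in range(n):          # finite-difference Jacobian, one column at a time
            S_pert = S[:]
            S_pert[j] += fd
            R_pert = residual(S_pert)
            for i in range(n):
                J[i][j] = (R_pert[i] - R[i]) / fd
        delta = solve_dense(J, [-r for r in R])
        S = [s + d for s, d in zip(S, delta)]
    return S


def linear_rhs(t, y):
    return [y[1], y[0]]  # y'' = y as a first-order system


print(multiple_shooting(linear_rhs, 0.0, 1.0)[1], 1.0 / math.sinh(1.0))
```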

For the multiple shooting and discretization meth- 
ods, special decomposition strategies are usually used to 
decompose the structured linear algebraic system that 
is obtained at every iteration of the solution procedure. 
Efficient factorization schemes, based on structured
Gaussian elimination [27,30] and structured orthogonal
factorization [50], can be used in order to minimize
the computational effort. Although these methods work
well for problems without bounds, handling inequality
constraints is difficult unless a priori information about
the active constraints is known.


Partial Discretization 
Dynamic Programming 


The use of iterative dynamic programming (IDP) for the
solution of dynamic optimization problems has been
limited, largely because of the high dimensionality
usually associated with it. This problem is often avoided by
allowing a very coarse grid, which in some cases can
be accurate enough [13]. Although the IDP algorithm
is slower than most gradient-based algorithms, it can
be useful to cross-check results of relatively small
problems (n < 100). This is especially true when the global
optimum is unknown, as the probability of obtaining
the global optimum is usually high provided the grid is not
poorly chosen [20]. For these techniques the time
horizon is divided into P time stages, each of length L. Then,
the control variables are usually represented as
piecewise constant or piecewise linear functions in each
interval. The piecewise linear function in each interval
$(t_i, t_{i+1})$ usually takes the form

$$u(t) = u_i + \frac{u_{i+1} - u_i}{L}\,(t - t_i),$$

where $u_i$ and $u_{i+1}$ are the values of u at $t_i$ and $t_{i+1}$,
respectively.

The dynamic optimization problem is to find $u_i$, i =
0, ..., P-1, that minimize a given objective function.
The basic search algorithm is the following [33]:

1) Divide the time interval $[0, t_f]$ into P time stages,
each of length L.

2) Choose the number of allowable values M for u.

3) Choose an initial profile for each $u_i$, an initial region
size $r_i$, and the contraction factor γ.

4) By using the initial control policy, integrate the
system from t = 0 to $t_f$ to generate the state trajectory,
and store the values of the states at the beginning
of each time stage, so that the states at (i-1)
correspond to the values of the states at the beginning
of stage i.

5) Starting at stage P, integrate the system from $t_f - L$
to $t_f$, using as initial value the states at P-1 from
step 4, once with each of the allowable values for the
control vector. Choose the control $u_{P-1}$ that gives
the minimum value for the objective function, and
store the value.


6) Step back to stage P—1, corresponding to time 
tr — 2L. For each allowable value of up_» integrate 
the system by using as initial value the states at P—2 
chosen from step 4 and the given control policy 
(constant or linear). Continue integration until t = 
ty using for the last stage the value up_, from step 5. 
Compare the M values of the objective function and 
choose the up_, that gives the smallest value. 

7) Continue the procedure until stage 1, corresponding 
to the initial time t = 0. 

8) Reduce the region for allowable control, pele y rk, 
where k is the iteration index. Use the control policy 
from step 7 as the midpoint for the allowable values 
for the control u at each stage. 

9) Increment the iteration index and go to step 5. Con- 
tinue the procedure for a specified number of itera- 
tions and examine the results. 
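The steps above can be sketched in Python for a toy problem. This is a minimal sketch under stated assumptions, not the implementation of [33]: the scalar model $dz/dt = -z + u$, the objective $z(t_f)^2 + \int_0^{t_f} u^2\,dt$, explicit Euler integration, piecewise constant controls, a single region size shared by all stages, and the names `simulate_stage`, `total_cost` and `idp` are all hypothetical choices made for illustration.

```python
import math


def simulate_stage(z, u, L, n_sub=20):
    """Integrate the hypothetical model dz/dt = -z + u over one stage of
    length L with explicit Euler, accumulating the running cost u**2."""
    h = L / n_sub
    cost = 0.0
    for _ in range(n_sub):
        cost += h * u * u
        z += h * (-z + u)
    return z, cost


def total_cost(u, z0, L):
    """Objective z(tf)**2 + integral of u**2 for a piecewise constant policy."""
    z, J = z0, 0.0
    for ui in u:
        z, c = simulate_stage(z, ui, L)
        J += c
    return J + z * z


def idp(P=5, tf=1.0, M=5, r0=1.0, gamma=0.8, iters=20, z0=1.0):
    """Iterative dynamic programming sketch; M should be odd so that the
    current control value is always one of the allowable values."""
    L = tf / P                                   # step 1
    u = [0.0] * P                                # step 3: initial profile
    r = r0                                       # step 3: region size (shared)
    J_best = math.inf
    for _ in range(iters):
        zs = [z0]                                # step 4: forward pass
        for i in range(P):
            zs.append(simulate_stage(zs[-1], u[i], L)[0])
        for i in reversed(range(P)):             # steps 5-7: backward pass
            best_u, J_best = u[i], math.inf
            for m in range(M):                   # M allowable values around u[i]
                c = u[i] + r * (2.0 * m / (M - 1) - 1.0)
                z, J = simulate_stage(zs[i], c, L)
                for j in range(i + 1, P):        # later stages keep their choice
                    z, cj = simulate_stage(z, u[j], L)
                    J += cj
                J += z * z
                if J < J_best:
                    best_u, J_best = c, J
            u[i] = best_u
        r *= gamma                               # step 8: contract the region
    return u, J_best
```

Because the current value of each $u_i$ stays among the candidates, the objective cannot increase from one iteration to the next in this sketch; the contraction factor then refines the search around the best policy found so far.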

This algorithm works well when the dynamic op- 
timization problem does not include bounds on state 
variables. In order to include them, a penalty term 
has to be added to the objective function to penalize the constraint violation. This can be done by adding
a state variable for each inequality that measures the 
constraint violation over time [35] or by computing the 
constraint violation at given points in time [20]. 


Sequential Methods 


In the sequential methods, only the control variables are 
discretized. This is why these techniques are also known 
as control parametrization methods. Given the initial 
conditions and a given set of control parameters, the 
DAE system is solved with a differential algebraic equa- 
tion solver at each iteration. This produces the value 
of the objective function, which is used by a nonlinear 
programming solver to find the optimal parameters in 
the control parametrization. The sequential method is 
of the feasible path type, that is, in every iteration the 
DAE system is solved. This procedure is very robust 
when the system contains only stable modes. If this is 
not the case, finding a feasible solution for a given set of 
control parameters can be difficult. 

The time horizon is divided into P time stages and 
at each stage the control variables are represented with 
a piecewise constant, a piecewise linear or a polynomial 
approximation [22,48]. A common practice is to use a set of Lagrange polynomials, so that in each stage $i$ the control variables can be written as

$$u_i(t) = \sum_{q=1}^{ncol} \psi_q\!\left(\frac{t - t_{i-1}}{h_i}\right) u_{iq}, \qquad (11)$$

where $u_{iq}$ represents the values of the control variables in stage $i$ at collocation point $q$, and $h_i$ is the length of stage $i$. Here, $\psi_q$ is a Lagrange polynomial of order ncol satisfying

$$\psi_q(\rho_r) = \delta_{q,r},$$

where $\rho_r$ denotes the $r$th collocation point.
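As an illustration of (11), the sketch below builds the Lagrange basis $\psi_q$ on a set of collocation points and evaluates the control within one stage. The choice of points (here the three Radau points on $[0, 1]$) and the function names are assumptions made for the example.

```python
import math


def lagrange_basis(points, q, tau):
    """psi_q(tau): the Lagrange polynomial equal to 1 at points[q] and 0 at
    every other collocation point."""
    value = 1.0
    for r, rho_r in enumerate(points):
        if r != q:
            value *= (tau - rho_r) / (points[q] - rho_r)
    return value


def control_in_stage(t, t_start, h, points, u_values):
    """Evaluate u_i(t) = sum_q psi_q((t - t_{i-1}) / h_i) * u_iq."""
    tau = (t - t_start) / h
    return sum(lagrange_basis(points, q, tau) * u_q
               for q, u_q in enumerate(u_values))


# Three Radau collocation points on [0, 1] (an illustrative choice).
RADAU3 = [(4 - math.sqrt(6)) / 10, (4 + math.sqrt(6)) / 10, 1.0]
```

With ncol = 3 points the representation reproduces any quadratic control profile exactly, which is a convenient check of an implementation.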


The gradients of the objective function with respect 
to the control parameters can be calculated with the 
sensitivity equations of the DAE system or by integra- 
tion of the adjoint equations. 


Sensitivity-Based Gradients 


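The sensitivity equations are found by differentiating the DAE system, after the control has been discretized with the parameter set $\theta$ [21,45], with respect to each element $\theta_j$:

$$\frac{d}{dt}\left(\frac{\partial z}{\partial \theta_j}\right) = \left(\frac{\partial F}{\partial z}\right)^{T}\frac{\partial z}{\partial \theta_j} + \left(\frac{\partial F}{\partial y}\right)^{T}\frac{\partial y}{\partial \theta_j} + \frac{\partial F}{\partial \theta_j},$$

$$0 = \left(\frac{\partial G}{\partial z}\right)^{T}\frac{\partial z}{\partial \theta_j} + \left(\frac{\partial G}{\partial y}\right)^{T}\frac{\partial y}{\partial \theta_j} + \frac{\partial G}{\partial \theta_j}.$$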

The solution of the sensitivity equations is simplified because the Jacobians in the sensitivity equations ($\partial F/\partial z$, $\partial G/\partial z$, $\partial F/\partial y$, $\partial G/\partial y$) are equal to the DAE system Jacobians calculated at each step (or given number of steps) of the integration. The computational effort is reduced to one matrix multiplication per parameter per Jacobian evaluation. Once the sensitivities of the states with respect to the parameters are known, the gradients of the objective function $\Phi$ and of the constraints $c$ can be calculated as follows [21,45]:


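$$\frac{d\Phi}{d\theta_j} = \left(\frac{\partial \Phi}{\partial z}\right)^{T}\frac{\partial z}{\partial \theta_j} + \left(\frac{\partial \Phi}{\partial y}\right)^{T}\frac{\partial y}{\partial \theta_j} + \frac{\partial \Phi}{\partial \theta_j},$$

$$\frac{dc}{d\theta_j} = \left(\frac{\partial c}{\partial z}\right)^{T}\frac{\partial z}{\partial \theta_j} + \left(\frac{\partial c}{\partial y}\right)^{T}\frac{\partial y}{\partial \theta_j} + \frac{\partial c}{\partial \theta_j}.$$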
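For a single ODE the sensitivity equation can be integrated alongside the state. The sketch below assumes the hypothetical model $dz/dt = F(z, \theta) = -\theta z$ with $z(0) = 1$ and no algebraic variables, so that $\partial F/\partial z = -\theta$ and $\partial F/\partial \theta = -z$, together with the objective $\Phi = z(t_f)^2$. It uses classical fourth-order Runge-Kutta on the augmented system; the names are illustrative only.

```python
import math


def rhs(theta, z, s):
    """Right-hand sides of the state and of its sensitivity s = dz/dtheta."""
    dz = -theta * z                  # F(z, theta) for the hypothetical model
    ds = -theta * s - z              # (dF/dz) s + dF/dtheta
    return dz, ds


def integrate_with_sensitivity(theta, z0=1.0, tf=1.0, n=200):
    """RK4 on the augmented system (z, s); s(0) = 0 because the initial
    condition does not depend on theta."""
    h = tf / n
    z, s = z0, 0.0
    for _ in range(n):
        k1 = rhs(theta, z, s)
        k2 = rhs(theta, z + 0.5 * h * k1[0], s + 0.5 * h * k1[1])
        k3 = rhs(theta, z + 0.5 * h * k2[0], s + 0.5 * h * k2[1])
        k4 = rhs(theta, z + h * k3[0], s + h * k3[1])
        z += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        s += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return z, s


def objective_and_gradient(theta):
    """Phi = z(tf)**2 and dPhi/dtheta = (dPhi/dz) dz/dtheta = 2 z s."""
    z, s = integrate_with_sensitivity(theta)
    return z * z, 2.0 * z * s
```

Here the exact solution is $z = e^{-\theta t}$ and $\partial z/\partial\theta = -t e^{-\theta t}$, so both the state and its sensitivity can be checked analytically; with several parameters one such sensitivity system is needed per parameter.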


Although there have been many advances in solving the sensitivity equations more efficiently [23], solving them is still an expensive part of the optimization algorithms. The cost of solving these equations is strongly dependent on the number of input variables, since one set of sensitivity equations is required for each parameter. Current directions for handling the
computational load include exploitation of faster com- 
puter hardware and parallel computer programming 
architectures, as well as more efficient solution strate- 
gies that avoid repeated factorization of Jacobians [5]. 

The methods based on this approach cannot treat the bounds on state variables directly, because the state variables are not included in the nonlinear programming problem. Special methods have been developed to address this problem. Most of the techniques for dealing with inequality path constraints rely on defining a measure of the constraint violation over the entire horizon, and then penalizing it in the objective function, or forcing it directly to zero through an end-point constraint [49].
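One way to form such a measure, sketched below under assumptions, is to append an extra state $v$ with $dv/dt = \max(0, g(z))^2$ and $v(0) = 0$, so that $v(t_f) = 0$ exactly when $g(z(t)) \le 0$ at every integration point. The trajectory $dz/dt = -z$, the explicit Euler rule and the function name are illustrative assumptions, not the formulation used in [49].

```python
def violation_measure(g, z0=1.0, tf=2.0, n=2000):
    """Integrate dz/dt = -z together with the extra state
    dv/dt = max(0, g(z))**2 using explicit Euler, and return v(tf)."""
    h = tf / n
    z, v = z0, 0.0
    for _ in range(n):
        v += h * max(0.0, g(z)) ** 2     # accumulates only while g(z) > 0
        z += h * (-z)
    return v
```

The value $v(t_f)$ can either be added with a penalty weight to the objective or imposed as the end-point constraint $v(t_f) = 0$; squaring the violation keeps the measure continuously differentiable in $g$.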

An alternative is to transform the inequality constraints into equalities by adding a squared slack variable [26]. The slack variables can then be treated as control variables, and the proper bounds can be imposed in the NLP. The problem with this approach is that it can generate high-index problems, so special index reduction techniques have to be applied at the same time.

In [49], inequality path constraints are handled 
through a hybrid approach which is the result of the 
combined application of the discretization of these con- 
straints at a finite number of points, and forcing an in- 
tegral measure of their violation to zero. Each inequal- 
ity path constraint requires the introduction of an addi- 
tional ordinary differential equation, and it is converted 
to 2P (P = number of stages) point constraints and one 
end-point constraint. 

The use of initial value solvers that can directly handle the path constraints has also been studied [22]. The main idea is to use an algorithm for constrained dynamic simulation so that any admissible combination of the control parameters produces an initial value problem that is feasible with respect to the path constraints. The algorithm proceeds by detecting activation and deactivation of the constraints during the solution, and solving the resulting high-index DAE system and its related sensitivities.
Adjoint-Based Gradients


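The gradients can also be calculated through adjoint methods [14,25,40] at a cost independent of the number of input variables. The DAE adjoint equations are determined from the Hamiltonian function (10). The adjoint profiles $\lambda$ and $\mu$ form a semi-explicit index-one DAE that can be solved easily. Once the adjoint system (8)-(9) is solved, the gradients with respect to the control parameters $\theta$ are obtained from

$$\frac{d\Phi}{d\theta} = \int_0^{t_f} \frac{\partial u}{\partial \theta}\left(\frac{\partial F}{\partial u}\lambda + \frac{\partial G}{\partial u}\mu\right) dt + \frac{\partial \Phi}{\partial \theta}.$$

If the control profile is discretized into piecewise constants $u_i$, the gradient with respect to $u_i$ can be expressed as

$$\frac{d\Phi}{du_P} = \int_{t_{P-1}}^{t_P}\left(\frac{\partial F}{\partial u}\lambda + \frac{\partial G}{\partial u}\mu\right) dt + \frac{\partial \Phi}{\partial u_P}$$

and

$$\frac{d\Phi}{du_i} = \int_{t_{i-1}}^{t_i}\left(\frac{\partial F}{\partial u}\lambda + \frac{\partial G}{\partial u}\mu\right) dt, \qquad i = 1, \ldots, P-1.$$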
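A small discrete analogue of these formulas is sketched below. It assumes the hypothetical scalar model $dz/dt = -z + u$ with piecewise constant $u_i$, the objective $\Phi = z(t_f)^2$ (so there is no direct $\partial\Phi/\partial u_P$ term), and explicit Euler integration. Instead of integrating the continuous adjoint DAE, it propagates the adjoint of the Euler recursion backward (a discretize-then-differentiate variant), so the per-stage sums play the role of the integrals over $[t_{i-1}, t_i]$ with $\partial F/\partial u = 1$. As noted in the text, the forward states are stored for the backward sweep.

```python
def forward(u, z0=1.0, tf=1.0, n_sub=50):
    """Explicit Euler for dz/dt = -z + u_i on each of the P stages; returns
    all states (stored for the backward sweep) and the step size."""
    h = tf / (len(u) * n_sub)
    zs = [z0]
    for ui in u:
        for _ in range(n_sub):
            z = zs[-1]
            zs.append(z + h * (-z + ui))
    return zs, h


def objective(u):
    zs, _ = forward(u)
    return zs[-1] ** 2


def adjoint_gradient(u, n_sub=50):
    """dPhi/du_i from a backward sweep of the discrete adjoint lam = dPhi/dz_k."""
    zs, h = forward(u, n_sub=n_sub)
    lam = 2.0 * zs[-1]                   # terminal condition dPhi/dz_N
    grad = [0.0] * len(u)
    for i in reversed(range(len(u))):
        for _ in range(n_sub):
            # Step z_k = (1 - h) z_{k-1} + h u_i: dz_k/du_i = h, dz_k/dz_{k-1} = 1 - h.
            grad[i] += h * lam
            lam *= 1.0 - h
    return grad
```

One backward sweep yields the gradient with respect to all $P$ control values, which is the cost advantage over the sensitivity approach when there are many inputs.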

Adjoint methods are not difficult to automate, but they require the storage of the state profiles for the subsequent adjoint calculation. Also, since the Jacobians for the system and adjoint integrations are in general evaluated at different times, sparse LU factors of the Jacobians from the system integration are not reused while solving the adjoint equations. The use of implicit Runge-Kutta methods that transform the DAE system into discrete-time implicit equations [38] can solve this problem.

As in the sensitivity-based methods, in the adjoint-based methods the bounds on the state variables cannot be treated directly. Usually, when state constraints are imposed, a separate adjoint system is developed for each constraint. However, if path constraints are handled individually, we face a daunting task because of the number of adjoint systems that must be developed. Special techniques that reduce the number of adjoint variables have been developed to overcome this problem [38]. Other techniques approximate the constraint satisfaction (constraint aggregation methods) by introducing an exact penalty function [10,40] or a Kreisselmeier-Steinhauser function [10] into the problem.
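The Kreisselmeier-Steinhauser function aggregates many constraint values $g_k \le 0$ into a single smooth one, $KS(g) = \frac{1}{\rho}\ln\sum_k \exp(\rho g_k)$. The sketch below evaluates it in the usual overflow-safe form, shifting by the maximum; the aggregation parameter $\rho$ and the sample values are arbitrary choices for illustration.

```python
import math


def ks(values, rho=50.0):
    """Kreisselmeier-Steinhauser aggregate of constraint values g_k.

    Computed as max(g) + ln(sum(exp(rho * (g_k - max(g))))) / rho to avoid
    overflow; the result satisfies max(g) <= KS <= max(g) + ln(m) / rho."""
    g_max = max(values)
    total = sum(math.exp(rho * (g - g_max)) for g in values)
    return g_max + math.log(total) / rho
```

Because $KS \ge \max_k g_k$, requiring $KS \le 0$ is conservative; increasing $\rho$ tightens the approximation at the price of a more sharply curved constraint.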


Full Discretization 


Full discretization methods explicitly discretize all the 
variables of the DAE system and generate a large scale 
nonlinear programming problem that is usually solved 
with a successive quadratic programming (SQP) algo- 
rithm. These kinds of methods follow a simultaneous 
approach (or infeasible path approach); that is, the DAE 
system is not solved at every iteration. It is only solved 
at the optimum point. Because of the size of the prob- 
lem, special decomposition strategies are used to solve 
the NLP efficiently. Despite this characteristic, the si- 
multaneous approach has advantages for problems with 
state variable (or path) constraints and for systems 
where instabilities occur for a range of inputs. In ad- 
dition, the simultaneous approach can avoid interme- 
diate solutions that may not exist, be difficult to obtain, 
or require excessive computational effort. 

There are two main approaches to discretizing the state variables explicitly: multiple shooting [12] and collocation on finite elements [19]. We briefly describe both of them in the following sections.


Multiple Shooting 


In these methods the control variables are approximated by suitable parametrizations using only a finite set of control parameters. Usually a piecewise constant or piecewise linear representation is used. On each stage $i = 0, \ldots, P-1$ a time transformation is used [29]:

$$\theta_i(\tau, v) = t_i + \tau h_i, \qquad t_i = t_0 + \sum_{k=0}^{i-1} h_k, \qquad \tau \in [0, 1],$$

with $v = (t_0, h_0, h_1, \ldots, h_{P-1})$, together with a dimensionless discretization grid

$$0 = \tau_{i,0} < \tau_{i,1} < \cdots < \tau_{i,m_i} = 1$$

such that $\theta_i(\tau_{i,0}, v) = t_i$ and $\theta_i(\tau_{i,m_i}, v) = t_{i+1}$. A piecewise approximation $\tilde u_i$ of the control $u_i$ is then defined by

$$\tilde u_i(\tau) = \varphi_{i,j}(\tau, q_{ij}), \qquad \tau \in [\tau_{i,j}, \tau_{i,j+1}],$$

using local control parameters $q_{ij}$. The functions $\varphi_{i,j}$ are given basis functions, typically vectors of polynomials. If a piecewise constant approximation is chosen, this function takes the form $\varphi_{i,j}(\tau, q_{ij}) = q_{ij}$. For a piecewise linear approximation the functions are expressed as

$$\varphi_{i,j}(\tau, q_{ij}) = q_{ij}^{l} + \frac{\tau - \tau_{i,j}}{\tau_{i,j+1} - \tau_{i,j}}\left(q_{ij}^{r} - q_{ij}^{l}\right), \qquad q_{ij} = \left(q_{ij}^{l}, q_{ij}^{r}\right),$$

that is, by linear interpolation between the values $q_{ij}^{l}$ and $q_{ij}^{r}$ at the endpoints of the subinterval. With this representation, a continuous approximation can be obtained by imposing continuity equations between the stages.
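The two choices of $\varphi_{i,j}$ can be sketched as follows. The stage grid, the parameter layout (one scalar or one $(q^l, q^r)$ pair per subinterval) and the function names are assumptions for illustration; the continuity residuals are applied here between consecutive subintervals of one stage, in the same spirit as the continuity equations mentioned above.

```python
from bisect import bisect_right


def control_on_stage(tau, grid, q, kind="linear"):
    """Evaluate the piecewise control on one stage at dimensionless time tau.

    grid: 0 = tau_0 < ... < tau_m = 1; q[j] holds q_ij for subinterval j,
    either a scalar (piecewise constant) or a pair (q_left, q_right)."""
    # Index j of the subinterval [grid[j], grid[j+1]] that contains tau.
    j = min(bisect_right(grid, tau) - 1, len(grid) - 2)
    if kind == "constant":
        return q[j]
    q_left, q_right = q[j]
    frac = (tau - grid[j]) / (grid[j + 1] - grid[j])
    return q_left + frac * (q_right - q_left)


def continuity_defects(q):
    """Residuals q_{i,j}^r - q_{i,j+1}^l, zero for a continuous control."""
    return [q[j][1] - q[j + 1][0] for j in range(len(q) - 1)]
```

In an NLP the continuity residuals would appear as equality constraints on the $q_{ij}$, leaving the solver free to choose the values themselves.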

After this, the DAE system is explicitly discretized on each stage $i = 0, \ldots, P-1$ at the points $\tau_{ij}$ of the discretization grid using multiple shooting [11,43]. At each grid point the values of the state variables $s_{ij} = (s_{ij}^{z}, s_{ij}^{y})$ are chosen as additional unknowns. In this way a set of relaxed decoupled initial value problems (IVP) is obtained:

$$\frac{dz_i}{d\tau} = h_i\, f\big(z_i(\tau), y_i(\tau), \varphi_{i,j}(\tau, q_{ij}), p, \theta_i(\tau, v)\big),$$

$$0 = g\big(z_i(\tau), y_i(\tau), \varphi_{i,j}(\tau, q_{ij}), p, \theta_i(\tau, v)\big) - g\big(s_{ij}^{z}, s_{ij}^{y}, \varphi_{i,j}(\tau_{ij}, q_{ij}), p, \theta_i(\tau_{ij}, v)\big) \qquad (12)$$

with initial conditions

$$z_i(\tau_{ij}) = s_{ij}^{z}, \qquad y_i(\tau_{ij}) = s_{ij}^{y}.$$

By including into the NLP the continuity conditions for the differential variables and the consistency conditions

$$0 = g\big(s_{ij}^{z}, s_{ij}^{y}, \varphi_{i,j}(\tau_{ij}, q_{ij}), p, \theta_i(\tau_{ij}, v)\big)$$

as equality constraints, the final solution satisfies the DAE system. With this approach, the inequality constraints for states and controls can be imposed directly at the grid points. For piecewise constant or linear controls this approximation is adequate, but path constraints for the states may not be satisfied between grid points. This problem can be avoided by applying special techniques to enforce feasibility, like the ones used in the sequential methods.
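The role of the continuity conditions can be seen on a scalar ODE without algebraic variables. The sketch below assumes the hypothetical model $dz/dt = -z + u$ with one piecewise constant control value per stage, and uses the exact flow over a stage in place of a numerical integrator. Each stage starts from its own node value $s_i$ and is evaluated independently of the others, which is what makes a parallel implementation possible; the matching conditions $s_{i+1} - z_i(t_{i+1}) = 0$ are the NLP equality constraints.

```python
import math


def stage_flow(s, u, L):
    """Exact solution of dz/dt = -z + u after time L, starting from z = s."""
    return u + (s - u) * math.exp(-L)


def shooting_defects(s, u, z0, L):
    """Equality constraints of the multiple shooting NLP: the initial
    condition s_0 - z0 and the continuity conditions s_{i+1} - z_i(t_{i+1}).
    Each stage uses only its own node value, so stages are independent."""
    ends = [stage_flow(s[i], u[i], L) for i in range(len(u))]
    return [s[0] - z0] + [s[i + 1] - ends[i] for i in range(len(u) - 1)]
```

Node values taken from a single forward simulation make every defect vanish; arbitrary initial guesses for the nodes give nonzero defects that the SQP iterations drive to zero.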

The resulting NLP is solved using an SQP-type method that requires at each iteration the calculation of the objective function gradient and the constraint Jacobians. For almost all of the functions involved, explicit formulas are available and the corresponding derivatives can easily be calculated. The only exception is $z_i(\tau_{i,j+1})$, which is computed by numerical integration of the relaxed decoupled IVP; hence its sensitivities with respect to the initial values and parameters must be determined. This task is performed with the same techniques used in sequential methods. The only difference is that they are applied at every stage, which allows a parallel implementation of this kind of algorithm. The SQP-type methods used for the solution of the NLP are very similar to the algorithms used after full discretization by collocation. For this reason, we consider the collocation methods next.


Collocation 


The continuous time problem is converted into an NLP by approximating the profiles as a family of polynomials on finite elements. Different polynomial representations are used in the literature. In [16,46] a monomial basis representation [4] for the differential profiles is used. This representation is recommended because of its smaller condition number and smaller rounding errors:

$$z(t) = z_{i-1} + h_i \sum_{q=1}^{ncol} \Omega_q\!\left(\frac{t - t_{i-1}}{h_i}\right) \frac{dz}{dt}\bigg|_{i,q}, \qquad (13)$$

where $z_{i-1}$ is the value of the differential variable at the beginning of element $i$, $h_i$ is the length of element $i$, $dz/dt|_{i,q}$ is the value of its first derivative in element $i$ at collocation point $q$, and $\Omega_q$ is a polynomial of order ncol satisfying

$$\Omega_q(0) = 0, \qquad \frac{d\Omega_q}{d\tau}(\rho_r) = \delta_{q,r}, \qquad q, r = 1, \ldots, ncol,$$

where $\rho_r$ is the $r$th collocation point within each element.
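The basis functions $\Omega_q$ of (13) can be built as $\Omega_q(\tau) = \int_0^\tau \ell_q(\sigma)\,d\sigma$, where $\ell_q$ is the Lagrange polynomial with $\ell_q(\rho_r) = \delta_{q,r}$; this construction satisfies both conditions above. The sketch uses NumPy's polynomial module and three Radau points; the construction route, the point set and the function names are illustrative assumptions.

```python
import math

import numpy as np
from numpy.polynomial import polynomial as Pnp

# Three Radau collocation points on [0, 1] (an illustrative choice).
RADAU3 = [(4 - math.sqrt(6)) / 10, (4 + math.sqrt(6)) / 10, 1.0]


def omega_basis(points):
    """Coefficient arrays (lowest degree first) of Omega_q for each point."""
    basis = []
    for q, rho_q in enumerate(points):
        ell = np.array([1.0])                    # Lagrange polynomial l_q
        for r, rho_r in enumerate(points):
            if r != q:
                ell = Pnp.polymul(ell, np.array([-rho_r, 1.0]) / (rho_q - rho_r))
        basis.append(Pnp.polyint(ell))           # integral from 0: Omega_q(0) = 0
    return basis


def z_in_element(t, t_start, h, z_start, dz_values, basis):
    """Evaluate (13): z(t) = z_{i-1} + h_i * sum_q Omega_q(tau) * dz/dt|_{i,q}."""
    tau = (t - t_start) / h
    return z_start + h * sum(Pnp.polyval(tau, c) * dz
                             for c, dz in zip(basis, dz_values))
```

With three points the representation integrates any quadratic derivative exactly, so a cubic-free profile such as $z(t) = t^2$ is reproduced everywhere in the element.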
One disadvantage of the representation (13) is that state path constraints can only be enforced directly at the mesh points dividing each element. However, this problem can be solved by adding bounded algebraic variables to the problem formulation. The control and algebraic profiles are approximated using Lagrange polynomials of the form

$$y(t) = \sum_{q=1}^{ncol} \psi_q\!\left(\frac{t - t_{i-1}}{h_i}\right) y_{iq}, \qquad (14)$$

$$u(t) = \sum_{q=1}^{ncol} \psi_q\!\left(\frac{t - t_{i-1}}{h_i}\right) u_{iq}, \qquad (15)$$

where $y_{iq}$ and $u_{iq}$ represent the values of the algebraic and control variables, respectively, in element $i$ at collocation point $q$. Here, $\psi_q$ is a Lagrange polynomial of order ncol satisfying

$$\psi_q(\rho_r) = \delta_{q,r}.$$


Other authors prefer to use low order Lagrange polynomials [7,31] for the differential variables:

$$z(t) = \sum_{q=1}^{ncol} \psi_q\!\left(\frac{t - t_{i-1}}{h_i}\right) z_{iq}. \qquad (16)$$


NLP Techniques 


The large scale NLP problems that arise from the full 
discretization of the DAE system are usually solved us- 
ing successive quadratic programming (SQP) methods. 
These methods can be classified into full space and re- 
duced space approaches. 


Full Space SQP Approaches 


Full space methods take advantage of the DAE op- 
timization problem structure and the sparsity of the 
model. They are very efficient for problems with many 
degrees of freedom [7,8] as the optimality conditions 
can be easily stored and factored. Two important dis- 
advantages of these methods are that second derivatives 
of the objective function and constraints are usually re- 
quired, and special precautions are necessary to ensure 
descent properties. In [1], a full space algorithm which 
exploits the almost block diagonal structure of the DAE 
optimization problem was developed. This approach 
decouples the optimality conditions for each block of 
the quadratic programming (QP) subproblem using an 
affine transform. In this way, the first-order conditions in the state and control variables can be solved recursively, so that the computational effort increases only linearly with the number of blocks. A full space method is also presented in [7,8]. In this work, the sparsity and the block diagonal structure are also exploited, but the degrees of freedom of the problems solved are relatively large compared to the number of variables.


Reduced Space SQP Approaches


In process engineering problems, the degrees of freedom are relatively few, as the number of state variables is much larger than the number of control variables.


In these cases, a reduced space SQP approach (rSQP) can be very efficient. With this approach, either projected Hessian matrices or their quasi-Newton approximations may be used, avoiding the need for second derivatives. An efficient algorithm can be constructed by decoupling the search direction into its components in the range and null spaces and solving a smaller QP subproblem at every iteration. When using collocation, this decomposition allows one to exploit the structure of the collocation matrix [32], decreasing the computational effort of these methods.
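The range/null-space decomposition can be sketched on a small equality-constrained QP, $\min \tfrac12 d^T B d + g^T d$ s.t. $A d + c = 0$. Assuming $A = [C\ N]$ with $C$ square and nonsingular ($C$ standing for the state columns and $N$ for the control columns), the coordinate bases $Y = [I; 0]$ and $Z = [-C^{-1}N; I]$ split $d = Y d_Y + Z d_Z$, and only the small reduced Hessian $Z^T B Z$ has to be factored. The random data, the dimensions and the use of the exact $B$ (rather than a quasi-Newton approximation) are assumptions of this sketch.

```python
import numpy as np


def null_space_step(B, g, A, c, n_state):
    """Solve min 0.5 d'Bd + g'd s.t. A d + c = 0 with a coordinate basis.

    The first n_state columns of A (block C) are assumed square and nonsingular."""
    n = A.shape[1]
    C, N = A[:, :n_state], A[:, n_state:]
    Y = np.vstack([np.eye(n_state), np.zeros((n - n_state, n_state))])
    Z = np.vstack([-np.linalg.solve(C, N), np.eye(n - n_state)])   # A Z = 0
    d_y = np.linalg.solve(C, -c)                 # range-space step: A Y d_y = -c
    reduced_H = Z.T @ B @ Z                      # small reduced Hessian
    d_z = np.linalg.solve(reduced_H, -Z.T @ (g + B @ (Y @ d_y)))
    return Y @ d_y + Z @ d_z
```

The result agrees with a direct solve of the full KKT system, but the only dense factorization besides $C$ is of size equal to the number of degrees of freedom, which is small in the process engineering setting described above.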

A partially reduced strategy using multiple shoot- 
ing was developed in [42] and more recently in [29]. In 
this strategy, the structured NLP is projected onto the 
reduced space of differential variables plus control pa- 
rameters, utilizing the natural decomposition of the dis- 
cretized states into differential and algebraic variables. 
This algorithm is particularly efficient for problems with a relatively large number of algebraic constraints. In
addition to these methods, specialized decomposition 
procedures that take advantage of the structure of the 
Hessian were explored in [44]. 


Emerging Areas 


In this final section we briefly summarize emerging areas of research (as of 2000) in dynamic optimization. These extend the methods presented so far to
larger and more challenging applications and can be 
classified as improvements to nonlinear programming 
solvers, extensions to include discrete decisions and the 
treatment of multistage dynamic systems. 


Addressing Bottlenecks in NLP Solvers 


Several features in the above NLP strategies lead to per- 
formance bottlenecks for dynamic optimization. As the 
problem size increases, the selection of the correct ac- 
tive set can be expensive, especially for tightly con- 
strained NLPs. It has been noted that the normally effi- 
cient active set strategies in [7,8] can become time con- 
suming as many state and control profiles become con- 
strained by their bounds. To overcome this problem, 
interior point and barrier methods allow us to deal with 
many active constraints in an efficient manner. This ad- 
vantage is rooted in improved complexity properties 
of interior point methods. Whereas active set strategies have an exponential worst-case complexity, interior point methods typically have a complexity proportional to a low power of the number of variable bounds. In actual practice, active set strategies exhibit a polynomial increase in the number of active set iterations with problem size, whereas the number of interior point iterations is independent of problem size.

The barrier (or interior point) method can be motivated by representing the NLP (resulting from a discretization of (1)-(6))

$$\min_x\ f(x) \quad \text{s.t.} \quad c(x) = 0, \quad x \ge 0$$

as

$$\min_x\ f(x) - \mu \sum_i \ln(x_i) \quad \text{s.t.} \quad c(x) = 0,$$

and solving the equality constrained problem for a decreasing sequence of positive $\mu$. This transformation can be applied directly to the NLP or at the QP level for the subproblem derived from an SQP algorithm. In the latter case, an interior point method can be derived that follows a central path for decreasing $\mu$. Very efficient implementations (e.g., predictor-corrector algorithms [34]) have been developed for this purpose, and these have desirable convergence rates. However, despite these properties, interior point QP solvers embedded within SQP are not competitive with active set strategies unless the number of active constraints is large. Here active set solvers take advantage of warm starts from previous QP solutions, while so far interior point solvers still require a fixed computational cost (typically about ten linear factorizations and solutions of the KKT system) regardless of the number of active constraints. To reduce this fixed cost to only one KKT factorization per NLP iteration, it becomes advantageous to develop the barrier method at the NLP level. Recently, efficient barrier NLP solvers have been developed, including the LOQO solver [47] and the NITRO solver [15]. For these methods there are still some limitations on convergence properties for nonconvex NLPs.
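The effect of the barrier parameter can be seen on a tiny linear program, $\min x_1 + 2x_2$ s.t. $x_1 + x_2 - 1 = 0$, $x \ge 0$, whose solution is $x = (1, 0)$. Eliminating $x_2 = 1 - x_1$ turns each barrier subproblem into a convex one-dimensional problem in $x_1 \in (0, 1)$, which the sketch below solves by bisection on the derivative for a decreasing sequence of $\mu$. The problem, the elimination and the bisection are illustrative choices only; they are not how LOQO or NITRO work.

```python
def barrier_minimizer(mu, tol=1e-14):
    """Minimize x1 + 2 (1 - x1) - mu (ln x1 + ln(1 - x1)) over 0 < x1 < 1
    by bisection on its strictly increasing derivative."""
    def dphi(x1):
        return -1.0 - mu / x1 + mu / (1.0 - x1)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dphi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def central_path(mus=(1.0, 0.1, 0.01, 0.001)):
    """Barrier solutions (x1, x2) for a decreasing sequence of mu."""
    return [(x1, 1.0 - x1) for x1 in (barrier_minimizer(mu) for mu in mus)]
```

Every iterate stays strictly inside the bounds, and as $\mu$ decreases the points approach the vertex $(1, 0)$ where the bound $x_2 \ge 0$ is active; for this example the path has the closed form $x_1(\mu) = \big((1 - 2\mu) + \sqrt{1 + 4\mu^2}\big)/2$.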

NLP methods based on interior point concepts al- 
low us to exploit directly all of the features mentioned 


above for dynamic systems. Examples that demonstrate 
the performance of these approaches include the so- 
lution of linear model predictive control (MPC) prob- 
lems [39] and nonlinear MPC problems [1] using inte- 
rior point QP solvers and the solution of large optimal 
control problems using barrier NLP solvers [17]. 


Discrete Decisions In Dynamic Optimization 


Along with the DAE models described in (2)-(3), it be- 
comes important to consider the modeling of discrete 
events in many dynamic simulation and optimization 
problems. In chemical processes, examples of these phenomena include phase changes in vapor-liquid equilibrium systems, changes in modes in the operation of safety and relief valves, vessels running dry or overflowing, discrete decisions made by control systems, and explosions due to accidents. These actions can be reversible or irreversible in their effect on the state profiles, and should be modeled with appropriate logical constraints. An interesting presentation on modeling discrete events can be found in [6]. The simulation of these events is often triggered by an appropriate discontinuity function which monitors a change in the condition and leads to a change in the state equations. These changes can be reformulated either by using complementarity conditions (with positive continuous variables x and y alternately set to zero) [24] or as binary decision variables [6]. These additional variables can then be embedded within optimization problems. Here complementarity conditions can be reformulated through smoothing [18] to yield an NLP, while the incorporation of integer variables leads to mixed integer optimization problems.
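As an illustration of smoothing, the complementarity condition $x \ge 0$, $y \ge 0$, $xy = 0$ can be replaced by the smoothed Fischer-Burmeister equation $x + y - \sqrt{x^2 + y^2 + 2\varepsilon} = 0$, whose solutions satisfy $x, y > 0$ and $xy = \varepsilon$; as $\varepsilon \to 0$ they approach complementary pairs. This particular function is one common choice and is assumed here; it is not necessarily the smoothing used in [18].

```python
import math


def smoothed_fb(x, y, eps):
    """Smoothed Fischer-Burmeister function: for eps > 0 it is zero exactly
    when x > 0, y > 0 and x * y = eps; with eps = 0 it is the nonsmooth FB
    function, zero exactly at complementary pairs."""
    return x + y - math.sqrt(x * x + y * y + 2.0 * eps)
```

Each equation $\phi_\varepsilon(x, y) = 0$ is differentiable for $\varepsilon > 0$, so a standard NLP solver can be applied while $\varepsilon$ is driven toward zero.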

For the latter case, several studies have considered 
the solution of mixed integer dynamic optimization 
(MIDO) problems. In particular, [3] developed a com- 
plete discretization of the state and control variables to 
form a mixed integer nonlinear program. On the other 
hand, [2] apply a sequential strategy and discretize only 
the control profile. In this case, careful attention is paid 
to the calculation of sensitivity information across dis- 
crete decisions that are triggered in time. 


Multistage Applications 


The ability to solve large dynamic optimization prob- 
lems and to model discrete decisions allows the integration of multiple dynamic systems for design and
analysis. Here different dynamic stages of operation 
can be considered with individual models for each dy- 
namic stage. Multistage applications in process engi- 
neering include startups and transients in dynamic sys- 
tems with different modes of operation, design and 
operation of periodic processes with different models 
(e. g., adsorption, regeneration, pressurization, in a dy- 
namic cycle, [36]), synthesis of chemical reactor net- 
works [28], changes in physical phenomena due to dis- 
crete changes (as seen above) and multiproduct and 
multiperiod batch plants where scheduling and dynam- 
ics need to be combined and different sequences and 
dynamic operations need to be optimized. 

For these applications each stage is described by separate state variables and models as in equations (2)-(3). The problem includes an overall objective function, with parameters linking the stages, and control profiles that are manipulated within each stage. Moreover, multistage models need to incorporate transitions between dynamic stages. These can include logical conditions and transitions to multiple models for different operating modes. In addition, the DAE models for each stage require consistent initializations across profile discontinuities triggered by discrete decisions.

The solution of multistage optimization problems 
has been considered in a number of recent studies. See 
[9] for the simultaneous design, operation and schedul- 
ing of a multiproduct batch plant by solving a large 
NLP. More recently (as of 2000), multistage problems 
have been considered as mixed integer problems using 
sequential strategies [2] as well as simultaneous strategies [3,28]. These applications represent only the initial stages of dynamic systems modeling aimed at an integrated analysis and optimization of large scale process models. With the development of more efficient decomposition and solution strategies for dynamic optimization, much more challenging and diverse multistage applications will continue to be considered.


See also 


> Dynamic Programming: Continuous-Time Optimal 
Control 

> Dynamic Programming: Infinite Horizon Problems, 
Overview 


> Dynamic Programming and Newton’s Method in 
Unconstrained Optimal Control 

> Dynamic Programming: Optimal Control 
Applications 

> Dynamic Programming: Stochastic Shortest Path 
Problems 

> Hamilton-Jacobi-Bellman Equation 

> Infinite Horizon Control and Dynamic Games 

> Quasidifferentiable Optimization: Stability of 
Dynamic Systems 
Introduction


The 3D structure of molecules is important for a multitude of reasons prominent in contemporary studies: structure-property relations, reaction kinetics and dynamics, and drug design. Determining these structures is not straightforward, however, and a variety of techniques have been developed, including X-ray diffraction, NMR spectroscopy, and electron spectroscopy. Despite the many techniques available, at least in the area of protein structures, more than 85% of the structures present in the PDB have been solved using X-ray diffraction methods [10].

In a single-crystal X-ray diffraction experiment, X-rays are focused on a molecular structure which has been crystallized. The incident rays are diffracted and their intensity is sampled using a detector. The resulting pattern is recorded and analyzed, yielding data which primarily consist of a reciprocal space coordinate $(h, k, l)$, termed the Miller index, together with a diffraction intensity for each individual coordinate set. The coordinates $(h, k, l)$ describe an infinite set of parallel planes through a given crystal, using the primitive reciprocal lattice vectors as a basis.

An ideal crystal can be described as a periodic arrangement of atoms repeated infinitely in space. This allows for a characterization of the 3-D crystal structure in terms of a Fourier series known as the density function:

$$\rho(x) = \frac{1}{V}\sum_{m} F_{H_m} \exp(-2\pi i\, H_m \cdot x), \qquad (1)$$

where $V$ is the unit cell volume, $m$ is an index for the set of reflections, $H$ is an $(h, k, l)$ Miller index, and $F_H$ is a structure factor defined as:

$$F_H = \sum_j f_j \exp(2\pi i\, H \cdot x_j), \qquad (2)$$

where $j$ is an index for the set of atoms in the structure, $f_j$ is the atomic scattering factor for atom $j$, and $x_j$ is the position of atom $j$. A structure factor can also be written in terms of an amplitude and a phase:

$$F_H = |F_H| \exp(i\varphi_H). \qquad (3)$$
Hence, full characterization of a crystal structure requires both amplitude and phase data for a large number of reflections, $H$. In a traditional X-ray diffraction experiment, the measured diffraction intensity $I_H$ is proportional to the squared structure factor amplitude, $|F_H|^2$, so the amplitudes can be recovered from the data. Phase data, however, are not directly available, yet they are vital for reconstructing the density function of a crystal. This article is therefore concerned with the difficult task of phase retrieval.
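The loss of phase information can be illustrated numerically with a one-dimensional analogue of (2): translating every atom by the same shift changes each phase $\varphi_H$ by $2\pi H$ times the shift but leaves every amplitude $|F_H|$, and hence every measured intensity, unchanged. The atom positions, scattering factors and the 1-D setting below are invented for illustration.

```python
import cmath


def structure_factor(H, positions, f):
    """1-D analogue of (2): F_H = sum_j f_j exp(2 pi i H x_j)."""
    return sum(fj * cmath.exp(2j * cmath.pi * H * xj)
               for fj, xj in zip(f, positions))


positions = [0.10, 0.35, 0.72]          # fractional coordinates (illustrative)
f = [8.0, 6.0, 16.0]                    # scattering factors (illustrative)
shifted = [(x + 0.23) % 1.0 for x in positions]   # same structure, translated
```

Since two different density functions give identical intensities, the measured data alone cannot distinguish them; additional information or assumptions are needed to fix the phases.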

First, a discussion of the Patterson and MAD tech- 
niques is presented. Traditionally, these techniques do 
not directly rely on optimization. Thus, the treatment 
of these methods is brief, focusing on some interest- 
ing uses of optimization in their context. Then, two 
prominent direct methods are discussed, both of which 
rely on the solution of difficult nonconvex optimiza- 
tion formulations. Minimal principle methods are ad- 
dressed first, with particular emphasis on the solution 
of centrosymmetric structures. Then, maximum en- 
tropy methods are presented, with a focus on the maxi- 
mum determinant method. 


Patterson Methods 


Derivation of an electron density map requires amplitude and phase information for a large number of reciprocal lattice vectors, $H$. As mentioned in the introduction, phase information is not directly measurable in a traditional X-ray diffraction experiment, a serious obstacle in the calculation of a density map. In 1934, with this severe limitation in mind, Arthur Lindo Patterson first began to look at a Fourier transform of diffraction intensities instead of structure factors. Such a transform does not require any knowledge of phase information. Naturally, the resulting image is not identical to the original density function; however, certain structural information is still contained in the resultant map. Techniques based on a Fourier transform of intensities, still used in some contemporary structure solution methods, therefore form the broad class of Patterson methods.

The Patterson function is simply the convolution of the density function with its inversion through the origin, $\rho(-x)$:

$$P(u) = \rho(x) * \rho(-x). \qquad (4)$$

The discrete form of the Patterson function is written as:

$$P(u) = \frac{1}{V}\sum_{H} |F_H|^2 \exp(-2\pi i\, H \cdot u). \qquad (5)$$

A Patterson map can therefore be constructed from a sufficient set of diffraction intensities measured for a set of reciprocal lattice vectors, $H$, in a single-crystal X-ray diffraction experiment.

The physical meaning of a Patterson map in relation to the original structure is easiest to interpret when atomic contributions to the density function are considered point-like. Hence, for the time being, assume that the electron density is large at the exact position $x_j$ of each atom in the crystal and zero at non-atomic coordinates. Then it is clear from (4) that the Patterson function at $u$ will only be substantial when $u$ is a difference of two atomic position vectors, namely $u = x_i - x_j$. Thus, for every possible combination of interatomic differences we observe a Patterson peak, with intensity proportional to the product of the atomic scattering factors of the participating atoms. Consequently, an $N$-atom structure, without consideration of overlap, yields a Patterson map with $N(N-1)$ non-origin peaks and a highly pronounced central peak. In other words, a Patterson map is the superposition of $N$ copies of the original electron density map, each shifted by one of the atomic position vectors and weighted by that atom's scattering factor.
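A one-dimensional point-atom example makes this concrete. The sketch computes intensities $|F_H|^2$ from a 1-D version of (2) and sums the discrete Patterson series (5) with $V = 1$; because $|F_{-H}|^2 = |F_H|^2$ for real scattering factors, the series is real and only cosines remain. The map is then large at every interatomic difference $x_i - x_j$ and small elsewhere. The atom list, the resolution limit $H_{max}$ and the function names are assumptions for the example.

```python
import math


def intensities(positions, f, h_max):
    """|F_H|^2 for H = 0..h_max from the 1-D structure factor."""
    out = []
    for H in range(h_max + 1):
        re = sum(fj * math.cos(2 * math.pi * H * xj) for fj, xj in zip(f, positions))
        im = sum(fj * math.sin(2 * math.pi * H * xj) for fj, xj in zip(f, positions))
        out.append(re * re + im * im)
    return out


def patterson(u, I):
    """Discrete Patterson sum over H = -h_max..h_max with V = 1, written with
    cosines because the terms for H and -H are equal for real f_j."""
    return I[0] + 2.0 * sum(I[H] * math.cos(2 * math.pi * H * u)
                            for H in range(1, len(I)))
```

Only intensities enter the calculation, yet the peak positions reveal all $N(N-1)$ interatomic vectors; the difficulty addressed by the methods below is recovering the individual atomic positions from these differences.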

Solution by Patterson methods requires a technique for deconvolution of the density function from the Patterson map. In traditional techniques [2], the Patterson function $P(u)$ is displaced by two vectors, $x_j$ and $x_k$. Then the two shifted Patterson maps are superimposed,

$$P_{jk}(u) = P(u - x_j) + P(u - x_k). \qquad (6)$$

Ideally, the difference of the two vectors chosen to derive $P_{jk}(u)$ represents an interatomic difference vector. In such a case, the structural image in $P_{jk}(u)$ is enhanced. This procedure is repeated until sufficient information is available to construct the original density function. A Patterson map is particularly useful in the case of heavy atom structures, since the atomic scattering factor of an atom is approximately proportional to the atomic number of its element. Hence, when a heavy atom is present at position $x_h$, all peaks at $u = x_h - x_j$ are pronounced in the map. Naturally, from such a Patterson map it is much easier to deconvolute the original density function.

A function for the selection of appropriate displacement vectors, used for the solution of Patterson maps, is presented in [11] and [9]. The generalized symmetry minimum function, SMF, can be written as:

$$\mathrm{SMF}(u) = \sum_i \min_{s} P(R_s x_i + v_s - u), \qquad (7)$$

where $i$ indexes the set of sampled points $x_i$ in the Patterson map, $R_s$ represents a symmetry rotation operator, $v_s$ a translational operator, and $s$ ranges over the set of symmetry relations present. This form of the SMF is applicable to the full space of the Patterson map. The positions of atoms are chosen from peaks in SMF(u). The SMF is still used in contemporary algorithms for displacement vector selection, and has yielded structural solutions for molecules containing up to 6,000 non-hydrogen atoms in the asymmetric unit [3]. Its success, however, is still largely restricted to structures containing heavy atoms.
In terms of optimization, solution by Patterson methods is classically posed as a search over a set of interatomic distance vectors calculated from a Patterson map. In [8] this was done for the solution of myoglobin, based on the 'minimum average' function:

$$\max_{\omega} \; \frac{1}{|M|} \sum_{m \in M} \frac{P_m}{W_m} \qquad (8)$$
$$\text{s.t.} \quad P_m = P(\omega_m), \quad \forall m \in M, \qquad (9)$$
$$W_m = W(\omega_m), \quad \forall m \in M, \qquad (10)$$
$$\omega \in \Omega, \qquad (11)$$

where $M$ is a subset of all available ratios having the lowest values of $P_m/W_m$, $P_m$ and $W_m$ are the values of the Patterson and test functions, and $\omega$ is a vector chosen from $\Omega$, the set of all possible orientations in the Patterson map. Traditionally, this problem is solved stochastically. A large set of trials is constructed with randomly generated orientations $\omega$, the merit function is calculated for each trial, and the best solution is selected, subject to further solution filtering.
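The stochastic search can be sketched under several simplifying assumptions. Each trial orientation $\omega$ is a single number, and it predicts a short list of interatomic vectors. The merit is the average of the $|M|$ smallest ratios $P_m/W_m$, as in (8), and the trial with the largest merit is kept. The callables `toy_predict`, `toy_patterson` and `toy_weight` are hypothetical stand-ins for a real vector model, Patterson map and test function.

```python
import random


def minimum_average(ratios, m):
    """Average of the m smallest ratios P_m / W_m: the merit of (8)."""
    lowest = sorted(ratios)[:m]
    return sum(lowest) / len(lowest)


def random_search(predict, patterson_fn, weight_fn, n_trials, m, seed=0):
    """Sample random orientations and keep the one with the largest merit."""
    rng = random.Random(seed)
    best_omega, best_merit = None, float("-inf")
    for _ in range(n_trials):
        omega = rng.uniform(0.0, 360.0)       # one-parameter 'orientation' (toy)
        ratios = [patterson_fn(v) / weight_fn(v) for v in predict(omega)]
        merit = minimum_average(ratios, m)
        if merit > best_merit:
            best_omega, best_merit = omega, merit
    return best_omega, best_merit


# Toy problem: Patterson peaks at 120, 130 and 140 (arbitrary units); a trial
# orientation omega predicts vectors at omega, omega + 10 and omega + 20.
def toy_predict(omega):
    return [omega + d for d in (0.0, 10.0, 20.0)]


def toy_patterson(v):
    return 10.0 if any(abs(v - c) < 2.0 for c in (120.0, 130.0, 140.0)) else 1.0


def toy_weight(v):
    return 1.0


omega, merit = random_search(toy_predict, toy_patterson, toy_weight, 5000, m=3)
print(round(omega, 1), merit)   # expected: omega close to 120, merit 10.0
```

Using the smallest ratios is the key design choice here. A trial scores well only if every predicted vector lands on a Patterson peak, so partially aligned orientations (for example, omega near 110) are penalized.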


MAD 


Multiple anomalous diffraction, MAD, directly solves the phase problem by means of vector analysis of diffraction amplitudes sampled at no fewer than three wavelengths. The list of elements that produce anomalous differences is limited by the set of experimentally available wavelengths. With the exception of sulfur, none of the elements commonly found in biological macromolecules yields sufficient anomalous scattering. Hence, some sort of chemical substitution is often required. The calculation of phases is straightforward and simply requires knowledge of the positions of the anomalous scatterers in the unit cell.

Since most anomalously scattering elements are heavy atoms, Patterson methods are often sufficient to locate the substituted atoms. Determining the candidates for substitution and calculating the heavy-atom structure can, however, be difficult when a large number of substitution sites exist. In [13], a figure of merit is proposed for automated selection of the correct trial heavy-atom structures. This figure of merit is composed of four individual tests: Patterson function, difference Fourier, phasing figure of merit, and solvent location.


Direct Methods 


The term direct methods refers to a class of techniques that rely on probability theory to determine phases in reciprocal space using data from a single-crystal X-ray diffraction experiment. These methods, revolutionary at the time they were introduced, substantially pushed the limit of solvable structures. Unfortunately, the accuracy of direct methods falls as the size of the asymmetric unit increases. In addition, these methods typically require data at near-atomic resolution. Finally, since all direct methods operate primarily in reciprocal space, a Fourier transform of the derived phases is required to construct the electron density function. Atomic positions are then selected from this density function.

Most direct methods rely on origin-independent combinations of phases, the most prominent of which are the triplet phase relations:

$$\Phi = \varphi_H + \varphi_K + \varphi_{-H-K}. \qquad (12)$$
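The origin independence of (12) can be checked numerically. The sketch assumes the standard convention $F(H) = \sum_j f_j \exp(2\pi i\, H \cdot x_j)$, under which moving the origin by $t$ changes each phase as $\varphi_H \to \varphi_H - 2\pi\, H \cdot t$. Because $H + K + (-H-K) = 0$, these shifts cancel in $\Phi$. The Miller indices, phases and shift below are made up.

```python
import math
import random


def shift_phase(phi, h, t):
    """Phase of reflection h after moving the origin by t (fractional coordinates)."""
    dot = sum(hi * ti for hi, ti in zip(h, t))
    return (phi - 2 * math.pi * dot) % (2 * math.pi)


def triplet(phases, h, k):
    """Triplet invariant Phi = phi_H + phi_K + phi_{-H-K}, reduced to [0, 2*pi)."""
    mhk = tuple(-a - b for a, b in zip(h, k))
    return (phases[h] + phases[k] + phases[mhk]) % (2 * math.pi)


rng = random.Random(1)
h, k = (1, 2, 0), (0, -1, 3)
mhk = (-1, -1, -3)                          # -H-K
phases = {r: rng.uniform(0, 2 * math.pi) for r in (h, k, mhk)}
t = (0.13, 0.41, 0.77)                      # arbitrary origin shift
shifted = {r: shift_phase(p, r, t) for r, p in phases.items()}
before, after = triplet(phases, h, k), triplet(shifted, h, k)
# Compare via cos/sin so that a wrap-around at 0 / 2*pi does not matter.
print(math.isclose(math.cos(before), math.cos(after), abs_tol=1e-9))  # True
```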


The following sections first discuss minimal principle methods for phasing. These techniques rely on the solution of a nonlinear program (NLP) whose constraint set is defined by the triplets. The focus then moves to maximum entropy methods for phasing, with particular emphasis on maximum determinant methods.


The Minimal Principle 


The minimal principle methods are among the best-known phasing techniques. Like all direct methods, they are applied in the reciprocal space of a crystal. Specifically, a phase solution is determined from the minimal principle by minimizing a merit function in the framework of a nonconvex NLP. The optimization program, first proposed by [5], is as follows:

$$\min \; R_{\min} = \frac{\sum_{t} A_t \left[\cos(\Phi_t) - a_t\right]^2}{\sum_{t} A_t} \qquad (13)$$
$$\text{s.t.} \quad \varphi_{H_t} + \varphi_{K_t} + \varphi_{-H_t-K_t} = \Phi_t, \quad \forall t \in T, \qquad (14)$$
$$\varphi_{H_m} \in [0, 2\pi], \quad \forall m \in M, \qquad (15)$$

where $M$ denotes the set of reflections from an X-ray diffraction experiment after all symmetry-equivalent reflections have been removed, $T$ denotes the set of triplet phase invariants, $A_t = 2N^{-1/2}|E_{H_t}||E_{K_t}||E_{-H_t-K_t}|$, $a_t = I_1(A_t)/I_0(A_t)$, $|E|$ is a normalized structure factor amplitude, $N$ is the number of atoms in the unit cell, $I_n$ is a modified Bessel function of order $n$, and $H$ and $K$ denote Miller indices.

It has been demonstrated [7] that, in all practical cases, a set of phases that minimizes $R_{\min}$ and satisfies atomicity constraints represents the true phase solution for a structure. Most contemporary phasing algorithms that make use of the minimal principle rely on a combination of local search and stochastic optimization techniques. One commonly used direct-methods package for phasing is known as SnB [16].
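To make (13) concrete, the sketch below evaluates $R_{\min}$ for given triplet sums $\Phi_t$ and weights $A_t$. The modified Bessel functions are computed from their power series, which is adequate only for the moderate $A_t$ values used here; a library routine would normally be preferred. The function names and the weights are hypothetical, and nothing here describes how SnB itself computes the function.

```python
import math


def bessel_i(n, x, terms=60):
    """Modified Bessel function I_n(x) from its power series (moderate x only)."""
    return sum((x / 2) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))


def r_min(A, Phi):
    """Minimal function (13): A-weighted mean of [cos(Phi_t) - a_t]^2, a_t = I1/I0."""
    num = 0.0
    for A_t, Phi_t in zip(A, Phi):
        a_t = bessel_i(1, A_t) / bessel_i(0, A_t)
        num += A_t * (math.cos(Phi_t) - a_t) ** 2
    return num / sum(A)


A = [2.5, 1.8, 0.9]                   # hypothetical triplet weights A_t
good = r_min(A, [0.0, 0.0, 0.0])      # triplet sums near their expected value
bad = r_min(A, [math.pi] * 3)         # all triplets 'odd'
print(good < bad)                     # True
```

Because $a_t < 1$, the best a triplet can do is $\cos\Phi_t = 1$, and the penalty grows as $\Phi_t$ moves towards $\pi$. Strongly weighted triplets (large $A_t$) dominate the sum.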

For centrosymmetric structures, an integer programming model of the minimal principle has been derived by [12]:

$$\min \; R_{\min} = \frac{\sum_{t} A_t \left[4 a_t \beta_t + 1 - 2a_t + a_t^2\right]}{\sum_{t} A_t} \qquad (16)$$
$$\text{s.t.} \quad \varphi_{H_t} + \varphi_{K_t} + \varphi_{-H_t-K_t} = 2\alpha_t + \beta_t, \quad \forall t \in T, \qquad (17)$$
$$\varphi_{H_m} + \varphi_{H_m R_s} = 1, \quad \forall m \in M, \; \forall s \in S_m, \qquad (18)$$
$$\varphi_{H_m} = \varphi_{H_m R_u}, \quad \forall m \in M, \; \forall u \in U_m, \qquad (19)$$
$$\alpha_t, \beta_t \in \{0, 1\}, \quad \forall t \in T, \qquad (20)$$
$$\varphi_{H_m} \in \{0, 1\}, \quad \forall m \in M, \qquad (21)$$

where $M$, as before, denotes the set of reflections from an X-ray diffraction experiment after all symmetry-equivalent reflections have been removed. $S_m$ is the set of shifted phases related to $H_m$ by rotational symmetry $R_s$, and $U_m$ is the set of unshifted phases related to $H_m$ by rotational symmetry $R_u$. Each binary phase variable encodes a phase of 0 or $\pi$. Consequently, $\beta_t = 1$ exactly when the triplet sum is an odd multiple of $\pi$ ($\cos \Phi_t = -1$), and the bracket in (16) equals $[\cos\Phi_t - a_t]^2$ for both values of $\beta_t$. Solving this model is nontrivial. In addition, it has been shown [17] that a global solution to the minimal principle model can in fact be a false minimum, which does not correspond to a true phase solution. False minima arise when a true structural solution contains triplet invariants that sum to an odd multiple of $\pi$, termed 'odd triplets'. Such odd triplets are typically absent from sets of triplets with strong A-values. This has motivated the formulation of a modified integer minimal principle, which solves over a subset $T'$ of the full triplet set $T$. The global minimum of this model can easily be obtained by finding a phase solution that satisfies the constraint set when all invariants are set to zero:


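$$\varphi_{H_t} + \varphi_{K_t} + \varphi_{-H_t-K_t} \equiv 0 \pmod{2}, \quad \forall t \in T', \qquad (22)$$
$$\varphi_{H_m} + \varphi_{H_m R_s} = 1, \quad \forall m \in M, \; \forall s \in S_m, \qquad (23)$$
$$\varphi_{H_m} = \varphi_{H_m R_u}, \quad \forall m \in M, \; \forall u \in U_m, \qquad (24)$$
$$\varphi \in \{0, 1\}. \qquad (25)$$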


This model can be solved in polynomial time, and 
has been shown to greatly enhance computational ef- 
ficiency for a variety of structures when compared to 
a standard crystallography package [12]. 
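With phases encoded as bits and the relations read modulo 2, constraints (22)–(24) form a linear system over GF(2), which Gaussian elimination solves in polynomial time. The sketch below is one such solver. The equation encoding (a set of variable indices plus a right-hand-side bit) and the toy constraints are illustrative assumptions, not the formulation used in [12]. Constraint (24) fits the same encoding when written as $\varphi_a + \varphi_b \equiv 0$.

```python
def solve_gf2(equations, n):
    """Solve sum_{j in vars} phi_j = rhs (mod 2) for binary phi by Gauss-Jordan.

    equations: list of (set_of_variable_indices, rhs_bit). Returns n bits with
    free variables set to 0, or None if the system is inconsistent.
    """
    rows = []
    for vars_, rhs in equations:
        mask = 0
        for j in vars_:
            mask ^= 1 << j                  # bit j <-> variable phi_j
        rows.append([mask, rhs & 1])
    pivots = []                             # (pivot column, row index)
    r = 0
    for col in range(n):
        sel = next((i for i in range(r, len(rows)) if (rows[i][0] >> col) & 1), None)
        if sel is None:
            continue                        # no pivot: phi_col is a free variable
        rows[r], rows[sel] = rows[sel], rows[r]
        for i in range(len(rows)):          # clear this column in every other row
            if i != r and (rows[i][0] >> col) & 1:
                rows[i][0] ^= rows[r][0]
                rows[i][1] ^= rows[r][1]
        pivots.append((col, r))
        r += 1
    if any(mask == 0 and rhs == 1 for mask, rhs in rows):
        return None                         # a row reads 0 = 1
    phi = [0] * n
    for col, i in pivots:
        phi[col] = rows[i][1]               # valid because free variables are 0
    return phi


# Toy system: two triplets with zero invariant, one symmetry pair shifted by pi.
eqs = [({0, 1, 2}, 0), ({0, 1, 3}, 0), ({0, 3}, 1)]
print(solve_gf2(eqs, 4))   # [1, 1, 0, 0]
```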


Maximum Determinant 


One of the first papers to lay a foundation for maximum entropy methods for phasing was published in 1950. By constructing Hermitian forms of the structure factor function in terms of electron density and noting that electron density is positive, [6] showed that a system of determinants containing certain $F_H$'s must be non-negative. This yielded the concept of a Karle-Hauptman matrix, abbreviated as KH matrix.

Later, [14] proposed a method by which KH matrices could be utilized for phasing of crystal structures. First, consider the Sayre equation:

$$\langle E_K E_{H-K} \rangle_K = \theta E_H, \qquad (26)$$

where $E$ is a normalized structure factor at a particular Miller index, $\theta$ is a normalization constant, and the brackets around the left-hand side indicate an average over $K$. Let $i \in r$ and $j \in c$ index the rows and columns of a KH matrix, and let $L$ be a random vector. With the substitutions $H = H_i - H_j$ and $K = L - H_j$, (26) reduces to:

$$\langle E_{H_i - L}\, E_{L - H_j} \rangle_L = \theta E_{H_i - H_j}, \quad \forall i \in r, \; \forall j \in c. \qquad (27)$$

The values of $E_{H_i - H_j}$ in the corresponding KH matrix represent correlation coefficients, and consequently form a covariance matrix. Suppose one then assumes a large number of atoms in the unit cell whose positions are mutually independent. Under these assumptions it is possible to derive a conditional joint probability law of the basic form:

$$p(E_1, \ldots, E_m) = C \exp\left(\frac{\delta_{m+1}}{D_m}\right), \qquad (28)$$

where $D_m$ is the determinant of a KH matrix of order $m$, denoted by $A_m$, and $\delta_{m+1}$ is the determinant of $A_m$ with one additional row and column appended. In this situation, the structure factors that compose the $A_m$ matrix are known, while the phases of the appended structure factors are varied. Ultimately, (28) implies that the most probable values for the phases of the structure factors added to $A_m$ will maximize the determinant $\delta_{m+1}$. Tsoucaris [14] also suggested a generalized rule: given a KH matrix that contains structure factors with unknown phases, the most probable set of phases will maximize the determinant of the KH matrix. In terms of an optimization framework:


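$$\max \; Z = \det A \qquad (29)$$
$$\text{s.t.} \quad A \succeq 0, \qquad (30)$$
$$A = A(\varphi), \qquad (31)$$
$$\varphi_{H_m} \in [0, 2\pi], \quad \forall m \in M. \qquad (32)$$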


CRUNCH is a well-known crystallography package [4] in which phasing is based primarily on solving the maximum determinant formulation. The determinant is maximized using a local search technique combined with an expression for the derivative of the determinant of $A$ with respect to the phase of element $i, j$:

$$\frac{\partial \det A}{\partial \alpha_{ij}} = 2|a_{ij}||b_{ij}| \sin(\beta_{ij} - \alpha_{ij}) \det A, \qquad (33)$$

where $\alpha_{ij}$ and $\beta_{ij}$ are the phases of elements $a_{ij}$ and $b_{ij}$, respectively, and $b_{ij}$ is the corresponding element of the inverse of $A$. In addition, programs such as CRUNCH will often maximize the determinant for a large number of small matrices until enough phase information is available to compose a full density function. Each of these matrices will typically contain a small amount of overlap to fix the origin. It has also been shown by [15] that the eigenvalues of KH matrices can be used to assess phase set quality and for phase refinement.
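Expression (33) can be checked against a finite-difference derivative. The sketch makes three assumptions: $A$ is Hermitian, so changing the phase of $a_{ij}$ also changes $a_{ji} = \overline{a_{ij}}$; $b_{ij}$ is the $(i, j)$ entry of $A^{-1}$; and a random positive definite matrix stands in for a real KH matrix. The helper names are hypothetical.

```python
import numpy as np


def det_phase_derivative(A, i, j):
    """Right-hand side of (33) for Hermitian A and an off-diagonal element (i, j)."""
    a = A[i, j]
    b = np.linalg.inv(A)[i, j]                 # assumed: b_ij = (A^{-1})_ij
    d = np.linalg.det(A).real                  # det of a Hermitian matrix is real
    return 2 * abs(a) * abs(b) * np.sin(np.angle(b) - np.angle(a)) * d


def det_with_phase(A, i, j, alpha):
    """det A after setting the phase of a_ij to alpha (a_ji kept as its conjugate)."""
    B = A.copy()
    B[i, j] = abs(A[i, j]) * np.exp(1j * alpha)
    B[j, i] = np.conj(B[i, j])
    return np.linalg.det(B).real


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = X @ X.conj().T + 4 * np.eye(4)             # Hermitian positive definite stand-in
i, j, h = 0, 2, 1e-6
alpha = np.angle(A[i, j])
numeric = (det_with_phase(A, i, j, alpha + h)
           - det_with_phase(A, i, j, alpha - h)) / (2 * h)
analytic = det_phase_derivative(A, i, j)
scale = abs(np.linalg.det(A).real)
ok = bool(np.isclose(numeric, analytic, rtol=1e-5, atol=1e-6 * scale))
print(ok)   # True
```

The sign follows from Jacobi's formula: $d \det A = \det A \cdot \mathrm{tr}(A^{-1}\, dA)$, applied to the two entries $a_{ij}$ and $a_{ji}$ that depend on $\alpha_{ij}$.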


Entropy Maximization 


The more general technique of entropy maximization is discussed in detail by [1]. Essentially, the optimization model described involves both entropy and physical considerations. In general, the solution is obtained by maximizing the entropy, defined as:

$$S = -\int_V q(x) \log\frac{q(x)}{m(x)}\, d^3x, \qquad (34)$$

where $q$ is the probability density of atoms and $m$ is a 'prior prejudice', typically taken as a uniform distribution. It then remains to write $q$ in terms of the normalized structure factors $U$ of the crystallography problem. In the space group P1, this is achieved as:

$$q(x) = \frac{1}{V} \sum_{m} U_{H_m} \exp(-2\pi i\, H_m \cdot x), \qquad (35)$$
$$U_H = \int_V q(x) \exp(2\pi i\, H \cdot x)\, d^3x. \qquad (36)$$


Solution of the maximum entropy formulation will typ- 
ically allow phasing of structures beyond the size limi- 
tations of other direct methods. 
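A discretized, one-dimensional reading of (34) is sketched below. The sketch assumes that $q$ and $m$ are sampled on n equal cells of a unit-length cell and normalized to integrate to 1, and it replaces the integral with a Riemann sum. Under these assumptions $S \le 0$, with equality exactly when $q$ equals the prior $m$. The function name is hypothetical.

```python
import math


def relative_entropy(q, m):
    """Discretized S = -integral of q log(q/m) over n equal cells of unit total length."""
    dx = 1.0 / len(q)
    s = 0.0
    for qi, mi in zip(q, m):
        if qi > 0:                       # the convention 0 * log 0 = 0
            s -= qi * math.log(qi / mi) * dx
    return s


n = 8
uniform = [1.0] * n                      # density integrating to 1 over the cell
peaked = [4.0, 4.0] + [0.0] * (n - 2)    # same integral, concentrated in two cells
print(relative_entropy(uniform, uniform))   # 0.0: q equals the prior
print(relative_entropy(peaked, uniform))    # -log(4), about -1.386
```

With a uniform prior, the entropy alone favors a featureless density. The structure-factor data expressed through (35) and (36) are what pull $q$ towards an atomic solution.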


Conclusion 


Phase information is not directly measurable in a traditional X-ray diffraction experiment. Numerous techniques have been developed for phasing of crystal structures, the best known of which are the Patterson, direct, and MAD methods. Patterson methods typically rely on the presence of one or more heavy atoms in the structure. MAD methods require a measurable anomalous difference. Direct methods phase in reciprocal space and are typically subject to size and resolution limits. Much of the optimization work in phasing has concerned direct methods. Still, despite the age of the crystallography field, many problems remain open in the area of robust and reliable determination of crystal structures.
Problem Description


The complexity of water-related problems is escalating as the uses of water and the (environmental and other) objectives to fulfill continue to expand. Most of the easier structural solutions for greater utilization of water resources have already been implemented, and new projects, including interbasin transfers, face some opposition in society. Changes in general climatic conditions and the increasing demand for water resources make the need for rational water resources planning stronger than ever.

Successful catchment management of complex water resource systems requires the most reliable models and supporting tools available. Such systems involve the interaction of reservoirs and channels in the surface river network as well as aquifers and other groundwater resources. In the conjunctive use of water resource systems, reality is complex, and so the planning models are large (in terms of the number of decision variables) and stochastic. Some parameters, such as the hydrological exogenous inflow and the demand for different uses, are uncertain and cannot be controlled by the decision maker. This uncertainty makes water resources planning difficult to tackle, yet its solution is critical for a proper utilization of the (scarce) water resources.

The multiperiod optimization modeling framework should give water resources planning agencies the ability to solve vital problems. The problem consists of water resources planning under uncertainty in hydrological exogenous inflow and demand for a set of interrelated (and transboundary) basin systems along a given planning horizon. The framework should bear directly on the assignment of water resources to the requirements of the different uses. It should achieve significant demand savings and minimize the associated degradation in quality of both the water environment and the natural environment. The main elements of the problem are as follows:

• the water resource sources (such as the surface and groundwater systems);

• the water demand centers that satisfy current and potential future needs (for hydropower generation, irrigation, industrial, domestic, recreational and ecological purposes, among others);

• the infrastructure of reservoirs and water transportation systems, including artificial, natural and planned basin and interbasin channels.

A water resource system is composed of a surface subsystem and a groundwater subsystem, which are interconnected. The system can be viewed as a physical network whose nodes and arcs are as follows:

1) Nodes with water storage capacity (i.e., reservoirs, including lakes), where evaporation and losses by infiltration to groundwater should be considered. This type of node can have associated hydropower generation units that use the water without appreciably reducing it.

2) Physical junction nodes. These are points in the river where the water flow undergoes some modification, such as river confluences, hydrological inflows, diversions, etc.

3) Demand nodes. Demand uses other than hydropower include irrigation, urban, industrial, recreational and ecological purposes, among others. They can be represented by consumptive and (partially) nonconsumptive water demand nodes.

4) Return nodes. These are nodes (points in the river) where water is (partially) returned from some demand nodes.

5) On-the-river hydropower nodes. These are nodes without water storage capacity, so they can only use the water flow to satisfy hydropower generation needs, without regulating or reducing it.

6) Surface water pumping facilities. These nodes allow water to be pumped to upstream reservoirs.

7) Natural stream arcs. Different types of arcs can be modeled as network arcs, such as natural channels (i.e., river reaches in multireservoir systems), canals, ditches, interbasin transfers, etc.

8) Aquifers. These are nodes of the groundwater system with water storage capacity. Conjunctive use of surface water and groundwater is of great importance in many basins, given the scarcity of water resources and the competition between conflicting uses.

9) Controlled recharge facilities. These nodes allow direct injection of surface water into the aquifers.

10) Groundwater pumping facilities. These nodes allow direct pumping of groundwater to the natural stream arcs.
The main purpose of optimization in this field is to balance water resources availability and demand for each period of the planning horizon under study. If no feasible balancing solution exists, the approach should provide a water resources plan that minimizes the weighted unbalancing deviation. Water flows through interbasin transfer channels along a given planning horizon, and technically the problem is converted into a time-replicated network. Novel modeling schemes should be considered for multistage linking constraints that impose upper bounds on the cumulated water demand deficit over given consecutive time periods, see [8]. This type of constraint forces water management policies to avoid the disastrous consequences of drought events. For the same purpose, a further constraint type can preserve 'earmarked' reserve water stored in (directly and indirectly) upstream reservoirs along the river. This reserve is kept to satisfy potential future needs in selected demand centers at given time periods.

The decision maker must decide on the water volume to be stored at each reservoir and controlled aquifer and then on the water volume to be released. These decisions must satisfy the physical structural constraints, and water utilization policies must be prioritized and optimized. These policies relate to:

• environmental objectives;

• hydroelectric production;

• irrigation, urban and industrial demands and other uses;

• reservoir water levels;

• artificial aquifer recharge and pumping policies;

• reserve water stored at reservoirs to satisfy potential future needs at selected demand centers during drought events, etc.

The objective function to minimize is therefore the expected value of a composite function made up of three parts: the penalization of the deficits with respect to the satisfaction target levels, the weights of water flow through natural stream and surface pumping arcs, and the weights of water pumping from aquifers along the planning horizon.

As an important byproduct, the system should determine the risk of significant water deficiencies and help mitigate the consequences of extreme events such as droughts and floods. Other useful results include assessing the impact on water balancing of two kinds of modifications: changes to the water channelling infrastructure, and changes to the water use policies for some demand types. See some approaches in [1,2,7,9] and [12,13], among others.

Other results from using optimization schemes for water resources planning are as follows.

• Assessing environmental protection by enforcing lower bounds on the flow through natural stream arcs and on water quality.

• Assessing the degree of system reliability, and determining the risk of significant water deficiencies even over very extended areas.

• Qualifying the water demand according to the requirements of the different uses, and assessing the impact of each use on the others.

• Anticipating and quantifying the potential repercussions of certain water utilization policies on the environment and the economy at given time periods under a variety of potential scenarios.

• Determining the structural works and management changes needed to mitigate the disastrous consequences of droughts and floods.

• Assessing the rational use of groundwater by helping to decide when and how much aquifer pumping should be performed (and when and how much artificial aquifer recharge should be commanded) to preserve the aquifers' structural constitution. These decisions depend on the foreseen scenario tree (see below) and on the ranking and weighting of the demand uses considered.

• Assessing the need for and timing of interbasin transfers by considering the potential scenarios and the demand uses in the different river basin areas. The same applies to transboundary rivers.


Stochastic Approach 


The optimization problem described above can be expressed with the following model structure:

$$\min_{v} \; c^T v \quad \text{s.t.} \quad Av = p, \quad v \ge 0, \qquad (1)$$

where $c$ is the vector of objective function coefficients, $A$ is the $m \times n$ constraint matrix, $p$ is the right-hand side $m$-vector and $v$ is the $n$-vector of decision variables to optimize. The model must be extended to deal properly with uncertainty in the values of some parameters, say $c$ and $p$. In this case, these are the hydrological exogenous inflow and the demand in various uses. Optimization problems with uncertain parameters are among the most intractable classes in numerical computation.

In any case, one needs to consider two additional features. First, one must model the availability of hydrological information over time and state which water resource decisions can be made at each of the various stages. Second, computing an optimal water resources solution in the stochastic setting requires comparing any proposed solution with other candidate solutions, as in the deterministic setting. In the stochastic setting, however, the criteria for this comparison are much less clear. Thus, one needs an approach to model the uncertainty in the problem data. The traditional approach is to make probability distribution assumptions, estimate the parameters from historical data and then develop a stochastic model that takes the uncertainty into account. Such an approach may not be appropriate if only limited information is available. In many such cases one may employ a technique called scenario analysis, where the uncertainty is modeled via a set of scenarios [6].

Let $S$ denote the set of scenarios to consider, and $w^s$ the likelihood that the decision maker assigns to scenario $s$, for $s \in S$. In contrast to traditional mathematical programming approaches, state-of-the-art schemes use scenarios to characterize the uncertain parameters. A scenario tree is generated and, through full recourse techniques, an implementable solution is obtained for the first time stage. This solution considers all scenarios without being subordinated to any of them. In addition, a coordinated solution should be provided for each scenario group at the other time stages. With this approach, the so-called deterministic equivalent model (DEM) has a huge number of constraints and variables. Very often the problem structure (network-like and others) is lost as a result. The loss comes from the additional conditions that must be imposed on the variables to keep the water resource decisions taken at different time stages coherent.
The minimization of the expected value of the composite function made up of the penalization of the deficit with respect to the satisfaction target levels can be expressed as

$$\min_{v} \; \sum_{s \in S} w^s c^{sT} v \quad \text{s.t.} \quad Av = p^s, \; \forall s \in S, \quad v \ge 0. \qquad (2)$$

Model (2) gives an implementable policy based on the so-called simple recourse scheme (observe that the whole vector of decision variables is anticipated at stage 1).
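As written, (2) uses a single decision vector $v$. That vector must satisfy $Av = p^s$ for every scenario, and its cost is averaged with the weights $w^s$. The sketch below evaluates this objective for a candidate $v$ and checks feasibility. The two-variable data and the function name are invented for illustration.

```python
def expected_cost(v, A, scenarios, tol=1e-9):
    """Objective of (2): sum_s w^s (c^s . v), after checking A v = p^s for all s.

    scenarios: list of (w_s, c_s, p_s). Returns None if v is negative somewhere
    or violates the constraints of some scenario.
    """
    if any(vi < -tol for vi in v):
        return None
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    total = 0.0
    for w_s, c_s, p_s in scenarios:
        if any(abs(lhs - rhs) > tol for lhs, rhs in zip(Av, p_s)):
            return None
        total += w_s * sum(c * x for c, x in zip(c_s, v))
    return total


A = [[1.0, 1.0]]                          # one balance row: total release = demand
wet = (0.6, [1.0, 3.0], [10.0])           # (w^s, c^s, p^s), hypothetical
dry = (0.4, [2.0, 1.0], [10.0])
print(expected_cost([4.0, 6.0], A, [wet, dry]))    # about 18.8
dry_high = (0.4, [2.0, 1.0], [12.0])      # same costs, different demand
print(expected_cost([4.0, 6.0], A, [wet, dry_high]))   # None
```

The second call shows the limitation. When the right-hand sides $p^s$ differ across scenarios, no single anticipated $v$ can satisfy them all. This is one motivation for the full recourse scheme.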


Nonanticipative Water Resources Policies 


Model (2) anticipates decisions in $v$ that, in multistage environments, may not be needed at stage $r = 1$. Very frequently, the stage $r = 1$ decisions are the only ones that must be made now. By stage $r = 2$, one may realize that some of the data have changed, some scenarios have vanished, etc. In this case, the model is usually reoptimized in a rolling planning horizon mode. When only spot decisions (i.e., first-stage decisions) are to be made, the information about future uncertainty is still taken into account to improve them. This type of scheme is termed full recourse.

Let $R$ denote the set of stages and $v_r^s$ the vector of variables related to stage $r$ under scenario $s$, for $r \in R$ and $s \in S$. Let $v^s$ be the set of vectors $v_r^s$, $\forall r \in R$. The so-called nonanticipativity principle is stated as follows, see [15]. Suppose two different scenarios, say $s$ and $s'$, are identical up to stage $r$ on the basis of the information available about them up to that stage. Then the values of the $v$-variables must be identical up to stage $r$. This principle guarantees that the solution obtained from the model does not depend, at stage $r$, on information that is not yet available. To illustrate this concept, consider a so-called scenario tree, where each node represents a point in time at which a water resource decision can be made. Once a decision is made, several contingencies can happen, and information about them becomes available at the beginning of the next stage. This information structure is visualized as a tree in which each root-to-leaf path represents one specific scenario, that is, one realization of the uncertain parameters.


To introduce the implications of this principle in water resources planning optimization, see [8], let us define a set of scenario groups, say $G_r$, for each stage $r$. All scenarios with the same realizations of the uncertainty up to stage $r$ belong to the same scenario group, say $g$ for $g \in G_r$. Let $S_{g,r}$ denote the set of scenarios that belong to group $g$ at stage $r$, with $S_{g,r} \subseteq S$. Let a node in the scenario tree be represented by the pair, say, $(k, r)$ for $k \in G_r$, $r \in R$. The scenario tree is then defined by the set of nodes $\bigcup_{k \in G_r, r \in R} (k, r)$ and the set of directed arcs $E$. Here $(k, \ell) \in E$ if and only if $S_{\ell, r+1} \subseteq S_{k,r}$, for $k \in G_r$ and $\ell \in G_{r+1}$. Let $G_k^+ = \{\ell \in G_{r+1} : (k, \ell) \in E\}$. Finally, let $N$ denote the set of solutions that satisfy the so-called nonanticipativity constraints. That is,

$$N = \left\{ v : v_r^s = v_r^{s'}, \; \forall s, s' \in S_{g,r}, \; g \in G_r, \; r \in R \right\}. \qquad (3)$$
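The scenario groups $G_r$ and the set $N$ in (3) can be built directly from scenario paths. The sketch below assumes that a scenario is a tuple of per-stage outcomes and that two scenarios share a group at stage $r$ when their outcomes coincide up to $r$. The function `is_nonanticipative` then checks that stage-$r$ decisions agree within every group. Stages are numbered from 0 in the code, whereas the text starts at $r = 1$. The data and names are hypothetical.

```python
from collections import defaultdict


def scenario_groups(scenarios):
    """Return {r: sorted list of groups S_{g,r}}; a group shares outcomes up to stage r."""
    n_stages = len(next(iter(scenarios.values())))
    groups = {}
    for r in range(n_stages):
        by_prefix = defaultdict(list)
        for name, path in scenarios.items():
            by_prefix[path[: r + 1]].append(name)
        groups[r] = sorted(by_prefix.values())
    return groups


def is_nonanticipative(decisions, groups, tol=1e-9):
    """Check v_r^s == v_r^{s'} for all s, s' in the same group at stage r, as in (3)."""
    for r, group_list in groups.items():
        for group in group_list:
            ref = decisions[group[0]][r]
            if any(abs(decisions[s][r] - ref) > tol for s in group[1:]):
                return False
    return True


# Hypothetical 3-stage tree: one root outcome, then wet/dry, then high/low demand.
scenarios = {
    "s1": ("root", "wet", "high"), "s2": ("root", "wet", "low"),
    "s3": ("root", "dry", "high"), "s4": ("root", "dry", "low"),
}
groups = scenario_groups(scenarios)
print(groups[1])   # [['s1', 's2'], ['s3', 's4']]
release = {"s1": [5, 3, 1], "s2": [5, 3, 2], "s3": [5, 4, 0], "s4": [5, 4, 1]}
print(is_nonanticipative(release, groups))   # True
```

The single group at the first stage holds all scenarios. This forces one common spot decision, which is the full recourse behavior described above.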


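So, the DEM of the so-called full recourse version of model (1) can be expressed as

$$\min_{v} \; \sum_{s \in S} w^s c^{sT} v^s \quad \text{s.t.} \quad A v^s = p^s, \; \forall s \in S, \quad v \in N, \quad v^s \ge 0, \; \forall s \in S. \qquad (4)$$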


Model (4) has a nice structure that we may exploit. Two approaches can be used to represent the nonanticipativity constraints (3). One approach is based on a compact representation, where (3) is used to eliminate variables in (4) and thereby reduce model size. There is then a single variable for each element at each scenario group of each stage, but any special structure of the constraints in (1) is destroyed. In this case, let the variable vector $v = (x, y, z)$ have the following structure:

• $x_{g,r}$ is the vector of variables with nonzero coefficients only in the constraints related to stage $r$, for $g \in G_r$, $r \in R$;

• $y_{g,r}$ is the vector of variables with nonzero elements in the constraints related to stages $r$ and $r + 1$ (in the water resources planning problem, these variables represent the water stored in the reservoirs and aquifers at the last time period of stage $r$);

• $z_{g,r}$ is the vector of variables with nonzero elements in the constraints related to stage $r$ as well as in the constraints related to sets of stages defined below.
Introduce the following additional notation:

• $U$ is the set of $z$-related constraint blocks through time stages, the so-called multistage linking constraints;

• $R_u$ is the set of time stages related to constraint block $u$, for $u \in U$;

• $r_u$ and $\bar r_u$ are the smallest and largest elements of $R_u$, respectively;

• $N_{g,u}$ is the set of nodes in the directed path through the set of time stages $R_u$ whose ending node is node $(g, \bar r_u)$ and whose unique origin node is, say, $(i, r_u)$.

So, the pair $(k, r)$ index for variable $z_{k,r}$ is such that $(k, r) \in N_{g,u}$ for ending node $(g, \bar r_u)$ and constraint block $u$. This type of constraint block can be represented as follows:

$$\sum_{(k,t) \in N_{g,u}} D_{u,t}\, z_{k,t} = d_{g,u}, \quad \forall g \in G_{\bar r_u}, \; u \in U, \qquad (5)$$

where $D_{u,t}$ is the matrix for constraint block $u$ related to the $z$-variables from stage $t$, and $d_{g,u}$ is the right-hand side of constraint block $u$ for scenario group $g$ from stage $\bar r_u$, $u \in U$. (Note that constraint block $u$ has $|G_{\bar r_u}|$ versions.) In water resources planning, this type of constraint keeps the cumulated water demand deficit at given nodes through consecutive time stages within given upper bounds under given scenario groups.
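Constraint block (5) couples $z$-variables along one path of the scenario tree per ending node. The sketch below is a simplified reading under explicit assumptions:

• each node stores its parent and a scalar deficit $z$;

• every $D_{u,t}$ is taken as 1, so the left-hand side becomes the deficit cumulated along the path from $(i, r_u)$ to the ending node $(g, \bar r_u)$;

• the equality in (5) is relaxed to the upper bound that the text describes.

Node names, deficits and the bound are hypothetical.

```python
def path_to_end(node, parent, length):
    """Nodes on the directed path spanning `length` stages that ends at `node`."""
    path = [node]
    while len(path) < length:
        node = parent[node]
        path.append(node)
    return list(reversed(path))


def cumulated_deficit_check(end_nodes, parent, deficit, length, bound):
    """For each ending node g: (path N_{g,u}, cumulated deficit, deficit <= bound)."""
    report = {}
    for g in end_nodes:
        nodes = path_to_end(g, parent, length)
        total = sum(deficit[k] for k in nodes)
        report[g] = (nodes, total, total <= bound)
    return report


parent = {"wet": "root", "dry": "root", "wet-high": "wet", "wet-low": "wet",
          "dry-high": "dry", "dry-low": "dry"}
deficit = {"root": 1.0, "wet": 0.0, "dry": 2.0, "wet-high": 1.0,
           "wet-low": 0.0, "dry-high": 3.0, "dry-low": 1.0}
leaves = ["wet-high", "wet-low", "dry-high", "dry-low"]
report = cumulated_deficit_check(leaves, parent, deficit, 3, 4.0)
for g, (nodes, total, ok) in report.items():
    print(g, nodes, total, ok)   # only dry-high (total 6.0) violates the bound
```

There is one check per ending node, which mirrors the $|G_{\bar r_u}|$ versions of each constraint block.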

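The compact representation of model (4) can be expressed as follows:

$$\min_{x,y,z} \; \sum_{r \in R} \sum_{g \in G_r} w_{g,r} \left(a_{g,r}^T x_{g,r} + b_{g,r}^T y_{g,r} + c_{g,r}^T z_{g,r}\right) \qquad (6)$$
$$\text{s.t.} \quad x_{g,r}, y_{g,r}, z_{g,r} \ge 0, \quad \forall g \in G_r, \; r \in R,$$
$$z \in Z_{g,u}, \quad \forall g \in G_{\bar r_u}, \; u \in U,$$

such that

$$A_r x_{g,r} + B'_r y_{\ell,r-1} + B_r y_{g,r} + C_r z_{g,r} = p_{g,r}, \quad \forall g \in G_r, \; r \in R, \qquad (7)$$

where:

• $w_{g,r}$ gives the weight for scenario group $g$ at stage $r$;

• $a_{g,r}$, $b_{g,r}$ and $c_{g,r}$ are the objective function coefficients related to the $x$-, $y$- and $z$-variables, respectively, for the pair $(g, r)$;

• $A_r$, $B'_r$, $B_r$ and $C_r$ are the appropriate constraint matrices and $p_{g,r}$ is the right-hand side, all with conformable dimensions;

• $Z_{g,u}$ denotes the set of $z$-values satisfying (5);

• $\ell$ is the scenario group at stage $r - 1$ such that $g \in G_\ell^+$, for $g \in G_r$, $r \in R$;

and

$$w_{g,r} = \sum_{s \in S_{g,r}} w^s. \qquad (8)$$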
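Equation (8) aggregates scenario likelihoods over each group. The short sketch below identifies a group with the path prefix its scenarios share, which is an assumption of the illustration. When the $w^s$ sum to 1, the single first-stage group carries total weight 1, and each later node inherits the weight of the scenarios passing through it. The names and probabilities are hypothetical.

```python
from collections import defaultdict


def group_weights(scenarios, probs):
    """w_{g,r} = sum of w^s over s in S_{g,r}; groups keyed by (stage, path prefix)."""
    weights = defaultdict(float)
    for name, path in scenarios.items():
        for r in range(len(path)):
            weights[(r, path[: r + 1])] += probs[name]
    return dict(weights)


scenarios = {"s1": ("root", "wet", "high"), "s2": ("root", "wet", "low"),
             "s3": ("root", "dry", "high"), "s4": ("root", "dry", "low")}
probs = {"s1": 0.3, "s2": 0.3, "s3": 0.25, "s4": 0.15}
w = group_weights(scenarios, probs)
print(w[(0, ("root",))])          # about 1.0: the single stage-1 group
print(w[(1, ("root", "dry"))])    # about 0.4 = 0.25 + 0.15
```

These are the weights $w_{g,r}$ that multiply each node's cost terms in (6).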


One of the main drawbacks of the compact representation (5)-(7) is the inherent difficulty of decomposing it into smaller models. Given the large scale of the model instances, easy decomposition is a key to success. It can be obtained from the so-called splitting variable representation, which requires producing siblings (copies) of the $y$- and $z$-variables.

For this purpose, let $N^{g,r}$ denote the set of pairs $(k, t)$ such that $k \in G_t$, $t \in R$, $t > r$ and $\exists u \in U: \bar r_u = t$, for $(g, r) \in N_{k,u}$. That is, $(g, r)$ and $(k, t)$, for $g \in G_r$, $k \in G_t$ and $r, t \in R$, are any two nodes in the scenario tree that satisfy two conditions. First, there is a constraint block $u \in U$ with $t = \bar r_u$. Second, there is a path from some node, say $(i, r_u)$, to node $(k, t)$ through node $(g, r)$.

To introduce the new representation, let us rename the $y$- and $z$-variables, replacing $y_{g,r}$ and $z_{g,r}$ by $y_{g,r}^{g}$ and $z_{g,r}^{g,r}$, respectively. We then add the new variables $y_{\ell,r-1}^{g}$, where $\ell: g \in G_\ell^+$, and $z_{g,r}^{k,t}$, $\forall (k,t) \in N^{g,r}$, $g \in G_r$, $r \in R$. The splitting variable representation is as follows:

$$\min_{x,y,z} \; \sum_{r \in R} \sum_{g \in G_r} w_{g,r} \left(a_{g,r}^T x_{g,r} + b_{g,r}^T y_{g,r}^{g} + c_{g,r}^T z_{g,r}^{g,r}\right) \qquad (9)$$
$$\text{s.t.} \quad x, y, z \ge 0,$$

where

$$A_r x_{g,r} + B'_r y_{\ell,r-1}^{g} + B_r y_{g,r}^{g} + C_r z_{g,r}^{g,r} = p_{g,r}, \quad \forall g \in G_r, \; r \in R, \; \ell: g \in G_\ell^+, \qquad (10)$$
$$\sum_{(k,t) \in N_{g,u}} D_{u,t}\, z_{k,t}^{g,\bar r_u} = d_{g,u}, \quad \forall g \in G_{\bar r_u}, \; u \in U, \qquad (11)$$
$$y_{g,r}^{g} - y_{g,r}^{\ell} = 0, \quad \forall \ell \in G_g^+, \; g \in G_r, \; r \in R, \qquad (12)$$
$$z_{g,r}^{g,r} - z_{g,r}^{k,t} = 0, \quad \forall (k,t) \in N^{g,r}, \; g \in G_r, \; r \in R. \qquad (13)$$

Remark 1 The two constraint blocks (12) and (13) in (9) are the expressions for the nonanticipativity constraints (3).


Different types of decomposition approaches can be used for solving model (9), namely augmented Lagrangian and Benders [3] decomposition schemes, both of which are very amenable to parallel computing approaches, see [4,5,10,11,14,16] and [17], among others.


Conclusions 


Mathematical schemes based on full recourse have been used as the kernels of decision support systems for water resource planning under uncertainty in exogenous water inflows and demand, where the uncertainty is treated via scenario analysis. This methodology results in a huge deterministic equivalent model (with hundreds of thousands of constraints and variables), in which care should be taken to preserve the constraint structure of the original problem. Two very useful constraint types are considered for the demand centers, namely, upper bounds on the deficit of reserve water stored in (directly and indirectly) upstream reservoirs to satisfy potential future needs, and upper bounds on the water demand deficit accumulated over consecutive time periods (so-called multistage linking constraints).

A splitting variable scheme can be used to represent the water stored in reservoirs and controlled aquifers, together with decomposition frameworks based on Benders and augmented Lagrangian approaches. Moreover, given the separability of the subproblems attached to the nodes of the scenario tree and the reduced overhead required, it is natural to develop parallel versions of the decomposition approaches for a distributed environment. This should significantly help in solving large-scale water resource planning problems under uncertainty in exogenous water inflows and demand parameters.




See also 


> Global Optimization in the Analysis and 
Management of Environmental Systems 
Abstract


Finding the median (or minisum) point in continuous 
space relative to a set of weighted demand points (cus- 
tomers, markets, existing facilities) is a classical prob- 
lem in location theory. We examine an extension of 
the problem where the existing facilities are modeled 
as fixed areas, and the median point is determined rel- 
ative to the closest distances to these areas. Two popu- 
lar distance metrics are considered: the Euclidean and 
rectangular norms. These demonstrate the differences 
between the classes of round (or smooth) norms and 
block norms. Mathematical properties of the new mod- 
els are used to adapt existing solution methods to solve 
them. 


Keywords 


Euclidean norm; Rectangular norm; Closest distance; 
Median 


Introduction 


In continuous location theory, facilities to be optimally 
located are generally represented by points, and the cus- 
tomers or markets that they serve are also geometri- 
cal points in space. The objective is to find the optimal 
site of one or more facilities with respect to a specified 
performance measure such as the sum of transporta- 
tion costs. This is one of the oldest formal optimization 
problems in mathematics and has a long and interest- 
ing history ([15] Section 1.3, [7,9,21]). Many variants of 
the problem exist. A very basic version of the location 
problem is to minimize: 


$$W(x) = \sum_{i=1}^{n} w_i K(x - a_i), \qquad (1)$$


where $x = (x_1, x_2)$ is the unknown facility location in $\mathbb{R}^2$, $w_i$ is a positive weight representing the transportation cost per unit distance for customer i, and $K(x - a_i)$ is a norm measuring the distance from the facility location x to the fixed location $a_i = (a_{i1}, a_{i2})$ of demand point i. The most common distance measure is Euclidean or straight-line distance, and in this case the most common solution procedure is some form of nonlinear optimization such as gradient descent.

A simple yet elegant solution procedure for the minisum problem (1) with Euclidean ($\ell_2$) distances was proposed by Weiszfeld in 1937 [20]. Setting the first-order partial derivatives of $W(x)$ to zero, we obtain the following system of equations for a stationary point:

$$\sum_{i=1}^{n} \frac{w_i (x_t - a_{it})}{\ell_2(x - a_i)} = 0, \qquad t = 1, 2. \qquad (2)$$


Since the objective function is convex, the above equa- 
tions are both necessary and sufficient for any differ- 
entiable point to be a global solution of (1). However, 
since the system of equations cannot be solved explic- 
itly, Weiszfeld developed the following one-point iter- 
ative scheme by isolating the coordinates on one side: 


$$x_t^{q+1} = \frac{\sum_{i=1}^{n} w_i a_{it} / \ell_2(x^q - a_i)}{\sum_{i=1}^{n} w_i / \ell_2(x^q - a_i)}, \qquad t = 1, 2, \qquad (3)$$


where $q = 0, 1, 2, \ldots$ denotes the iteration number. The Weiszfeld procedure is equivalent to gradient descent with a predetermined step size. In a seminal paper by Kuhn [14], global convergence of the procedure was proven provided that no iterate coincides with a fixed point $a_i$, where the iteration functions are undefined. The local convergence rate is always linear when the optimal solution occurs at a differentiable point; however, when the optimal solution coincides with a fixed point, the convergence rate may be sublinear or quadratic under certain conditions [13]. The Weiszfeld procedure has been generalized to other distance functions, and its convergence properties have been studied; see e.g. [1,2] for $\ell_p$ distances. We note that global convergence is guaranteed for the $\ell_p$ norm if the parameter $p \in [1, 2]$, that is, for the rectangular norm ($p = 1$), the Euclidean norm ($p = 2$), and all $\ell_p$ norms in between.
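Iteration (3) is short enough to state directly in code. The following is a minimal Python sketch, assuming point demands in the plane given as coordinate pairs; the function name `weiszfeld`, the tolerance and the iteration cap are illustrative choices, not part of the original procedure. If an iterate lands exactly on a demand point, where (3) is undefined, the sketch simply stops there instead of applying one of the modified rules studied in the literature.

```python
import math


def weiszfeld(points, weights, x0, tol=1e-10, max_iter=10_000):
    """Weiszfeld iteration (3) for the Euclidean minisum problem (1)."""
    x1, x2 = x0
    for _ in range(max_iter):
        num1 = num2 = den = 0.0
        for (a1, a2), w in zip(points, weights):
            d = math.hypot(x1 - a1, x2 - a2)  # l2(x^q - a_i)
            if d == 0.0:
                # Iterate coincides with a fixed point a_i: (3) is undefined here.
                return (x1, x2)
            num1 += w * a1 / d
            num2 += w * a2 / d
            den += w / d
        y1, y2 = num1 / den, num2 / den  # x^{q+1}
        if math.hypot(y1 - x1, y2 - x2) < tol:
            return (y1, y2)
        x1, x2 = y1, y2
    return (x1, x2)
```

For four equally weighted demand points at the corners of a square, the iterates stay strictly inside the square (each is a positively weighted average of the corners) and approach its center, which is the minisum point by symmetry.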

The $\ell_p$ norm, with $1 < p < \infty$, belongs to the class of round norms, which are characterized by unit circles with no 'flat spots'. Block norms, on the other hand, have unit circles that are polygons in $\mathbb{R}^2$ (or polyhedra in higher dimensional space). With this property, the objective function in (1) becomes convex piecewise linear. Block norms play a useful role in location models (e.g., see [17,19]), as the search in continuous space may then be confined to a finite set of points, and linear programming or related techniques may be used. Similarly, many other types of problems become much easier to solve when absolute values occur in a certain form. Examples of block norms include the rectangular norm ($p = 1$), the Tchebycheff norm ($p = \infty$), and a linear combination of the two that has been used to approximate the $\ell_p$ norm [18].

When distances are rectangular, that is, when

$$K(x - a_i) = |x_1 - a_{i1}| + |x_2 - a_{i2}|, \qquad (4)$$

minimizing $W(x)$ is accomplished by simply finding weighted median locations separately along the $x_1$ and $x_2$ axes (e.g., see [15] Chapter 2, or [9]). For example, if the $a_{i1}$'s are ordered from smallest to largest, with their weights attached, then the $a_{i1}$ associated with the median of the weights determines the optimal solution for $x_1$. The separability property of rectangular distances allows for the solution of more complex problems. For example, the demand points may be replaced by demand 'areas', and the distance becomes the expected distance between the facility and each demand for a given density function of spatially distributed demand [6,22].
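The separable rectangular case reduces to two one-dimensional weighted median problems. A minimal sketch, assuming point demands and positive weights (the helper names `weighted_median` and `rectilinear_median` are hypothetical): sort the coordinates, accumulate the weights, and stop at the first coordinate where the cumulative weight reaches half of the total.

```python
def weighted_median(values, weights):
    """Return a weighted median of `values` (weights assumed positive)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for v, w in pairs:
        cumulative += w
        if cumulative >= half:  # first point where half the total weight is reached
            return v
    return pairs[-1][0]


def rectilinear_median(points, weights):
    """Minimize (1) with the rectangular norm (4): one weighted median per axis."""
    return (weighted_median([p[0] for p in points], weights),
            weighted_median([p[1] for p in points], weights))
```

When the cumulative weight equals exactly half of the total, every point between that coordinate and the next one is also optimal; this sketch returns the left end of that interval.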


The Model 


We now consider the problem of locating a new facility, denoted by the point $x \in \mathbb{R}^2$, to service a set of n specified demand regions (or market areas) denoted by $A_i \subset \mathbb{R}^2$, $i = 1, \ldots, n$. The $A_i$ are assumed to be fixed areas in the plane that are bounded and closed, with known demands again specified by weighting constants $w_i > 0$, $i = 1, \ldots, n$. The objective is to find the point x that minimizes the weighted sum of distances to the n demand regions.

What makes our problem different from the well-studied minisum problem discussed above is that the travel distance separating the facility from a demand region $A_i$ is now defined as the distance measured by norm K from x to the closest point in $A_i$. This may be interpreted as flow from the facility entering the given market at the closest entry point. Internal distribution costs within the market area are assumed to be unimportant or 'someone else's concern' (see [4,5] for further discussion).

The idea of closest distances is well known in set theory (e.g. [11]). Area demands have been used many times in location problems, but together with the assumption that travel distances within areas are relevant; see e.g. [6,10,22]. If some form of aggregation is used, then the expected travel distance from the facility to some 'mean' point in the interior of the demand area determines the transportation cost.

Denote the closest point in $A_i$ by $a_i(x)$. Since this point is determined by the intersection of the smallest possible circle of norm K centered at x with $A_i$, it follows that if $A_i$ is a convex region and K a round norm, then $a_i(x)$ is always uniquely defined. The travel distance now becomes:

$$d(x, A_i) = \min_{y \in A_i} K(x - y) = K(x - a_i(x)). \qquad (5)$$

The single facility minisum problem with closest distance function then takes the form:

$$\min_x W(x), \qquad W(x) = \sum_{i=1}^{n} w_i d(x, A_i) = \sum_{i=1}^{n} w_i K(x - a_i(x)). \qquad (6)$$

Property 1 (Brimberg and Wesolowsky, [4]) If $A_i$ is a convex region, then $d(x, A_i)$ is a convex function of x for any norm K.
It follows that if all the $A_i$ are convex regions, then W is a sum of convex functions and is therefore itself convex. In this case a general descent algorithm may be used to obtain the optimal solution. However, more specialized methods may be devised based on properties of the particular model.


Euclidean Closest Distances 


We now specify the norm K to be the Euclidean norm, that is,

$$d(x, A_i) = \ell_2(x - a_i(x)) = \left[(x_1 - a_{i1}(x))^2 + (x_2 - a_{i2}(x))^2\right]^{1/2}, \qquad (7)$$

for all $x = (x_1, x_2)$, with $a_i(x) = (a_{i1}(x), a_{i2}(x)) \in \mathbb{R}^2$.

Let $\mathrm{int}(A_i)$ and $B_i$ denote, respectively, the interior and boundary of $A_i$, $i = 1, \ldots, n$. Let $\bar{a}_i$ represent the "fixed point" $a_i(x)$, which is assumed to be unique. When $x \in \mathrm{int}(A_i)$, it is clear that $\partial d(x, A_i)/\partial x_j$ exists and is equal to 0 for all j. However, since the functional form of $W(x)$ changes when crossing from the interior to the exterior of $A_i$, it also follows that when $x \in B_i$, $\partial d(x, A_i)/\partial x_j$ is undefined for at least one direction j. The following result [5] provides an interesting relation when x is a point outside $A_i$.

Property 2 Consider any $x \notin A_i$. Then the partial derivative $\partial d(x, A_i)/\partial x_j$ is defined for all j; its value at x is the same when $a_i(x)$ is replaced by the fixed point $\bar{a}_i$.


It follows that if x is external to all the $A_i$, the gradient vector of $W(x)$ may be calculated by replacing all the demand regions by the associated fixed points $a_i(x)$. Meanwhile, if x is inside a region, we simply delete that region from the calculation. Thus the problem may be converted, at least locally, to the standard form given in (1). The basic idea behind the algorithm given in [5] may now be summarized as follows:

Given an initial location $x^0$, we determine the closest point $a_i(x^0)$ for each demand region $A_i$ not containing $x^0$. These $a_i(x^0)$ are then treated as fixed points replacing the respective areas $A_i$. This allows us to make use of the well-known Weiszfeld procedure introduced above. One Weiszfeld iteration produces a new location $x^1$ with a lower objective function value relative to the set $\{a_i(x^0); i = 1, \ldots, n\}$, although reduction of the step size may be required if $x^0$ falls within a demand region. Using the new location $x^1$, we then recalculate the closest points to obtain $\{a_i(x^1); i = 1, \ldots, n\}$ and a further improvement in the objective function. The whole process is repeated to provide a sequence of descent moves. The algorithm will converge to the optimal solution if all the $A_i$ are convex regions; otherwise, since the objective function is no longer convex, only a local minimum is guaranteed.

[Figure: Closest distances with an area facility]
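The descent scheme above can be sketched as follows for a special case, assumed here for simplicity, in which every demand region is an axis-aligned rectangle, so that the Euclidean closest point $a_i(x)$ is obtained by clamping each coordinate of x to the rectangle. Regions containing the current iterate are dropped from the Weiszfeld step, and the step is halved (a simple backtracking rule chosen for this sketch, not necessarily the rule of [5]) whenever the full step does not decrease W. The names `closest_point`, `objective` and `area_weiszfeld` and the rectangle encoding `(lo1, hi1, lo2, hi2)` are hypothetical.

```python
import math


def closest_point(x, rect):
    """Euclidean closest point of an axis-aligned rectangle (lo1, hi1, lo2, hi2) to x."""
    lo1, hi1, lo2, hi2 = rect
    return (min(max(x[0], lo1), hi1), min(max(x[1], lo2), hi2))


def objective(x, rects, weights):
    """W(x) in (6) with the Euclidean norm: weighted closest distances."""
    return sum(w * math.dist(x, closest_point(x, r)) for r, w in zip(rects, weights))


def area_weiszfeld(rects, weights, x0, tol=1e-9, max_iter=5000):
    x = (float(x0[0]), float(x0[1]))
    for _ in range(max_iter):
        num1 = num2 = den = 0.0
        for r, w in zip(rects, weights):
            a = closest_point(x, r)  # a_i(x), treated as a fixed point
            d = math.dist(x, a)
            if d == 0.0:
                continue  # x lies in A_i: drop this region from the step
            num1 += w * a[0] / d
            num2 += w * a[1] / d
            den += w / d
        if den == 0.0:
            return x  # x lies in every region, so W(x) = 0
        target = (num1 / den, num2 / den)  # one Weiszfeld step
        f_old = objective(x, rects, weights)
        step, y = 1.0, target
        while objective(y, rects, weights) > f_old and step > 1e-12:
            step /= 2.0  # step-size reduction
            y = (x[0] + step * (target[0] - x[0]), x[1] + step * (target[1] - x[1]))
        if math.dist(x, y) < tol:
            return y
        x = y
    return x
```

For two unit squares separated by a horizontal gap of length 2, the method stops at a point between them with W = 2, the length of the gap.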

Suppose now that instead of being a point, the new facility is represented by an area of fixed dimensions and orientation, denoted by $S(x)$, where x is a specified 'center' point. The closest distance between the facility and demand region $A_i$ is then defined as:

$$d(x, A_i) = \min_{z \in S(x),\, y \in A_i} \{\ell_2(z - y)\} = \ell_2(c_i(x) - a_i(x)), \qquad (8)$$

where $c_i(x)$ and $a_i(x)$ are the closest points in $S(x)$ and $A_i$, respectively. As illustrated in Fig. 1, we may replace $c_i(x)$ by x and $a_i(x)$ by $a'_i(x) = a_i(x) + (x - c_i(x))$.

Using this insight, the problem may be converted to the original form by replacing each $A_i$ by an enlarged area $A'_i$ [5]. This is illustrated in Fig. 2, where $S(x)$ is a rectangle and $A_i$ a polygon. Note that $A'_i$ is also a polygon, but with a larger number of sides.


Rectilinear Closest Distances 


The norm K is now specified by the rectangular norm; hence,

$$d(x, A_i) = \ell_1(x - a_i(x)) = |x_1 - a_{i1}(x)| + |x_2 - a_{i2}(x)|. \qquad (9)$$

The problem now becomes:

$$\min W(x) = \sum_{i=1}^{n} w_i \left(|x_1 - a_{i1}(x)| + |x_2 - a_{i2}(x)|\right). \qquad (10)$$


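If the area $A_i$ is a rectangle, then the distance $d(x, A_i)$ has a very convenient form. Referring to the notation and closest distances given in Fig. 3, with $A_i = [a_{i11}, a_{i12}] \times [a_{i21}, a_{i22}]$, it follows that

$$d(x, A_i) = \frac{1}{2} \sum_{j=1}^{2} \sum_{k=1}^{2} |x_j - a_{ijk}| - l_i, \qquad (11)$$

where

$$l_i = \frac{a_{i12} - a_{i11}}{2} + \frac{a_{i22} - a_{i21}}{2}, \qquad (12)$$

from which we can conclude (since the factor 1/2 and the constants $l_i$ do not affect the minimizer) that this problem is equivalent to solving:

$$\min W(x) = \sum_{i=1}^{n} \sum_{j=1}^{2} \sum_{k=1}^{2} w_i |x_j - a_{ijk}|. \qquad (13)$$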


This problem is now separable, and is very easily solv- 
able by ordering coordinates and taking medians, as 
discussed previously (see [15], Chapter 2). The physi- 
cal interpretation of this problem is also interesting. If 
each rectangle A; with weight or demand w; is replaced 
by two fixed points with the same weights w; at diago- 
nally opposing vertices, then the original problem be- 
comes an equivalent problem with point demands. 
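Identity (11)-(12) rests on a one-dimensional fact: the distance from $x_j$ to an interval $[a_1, a_2]$ equals $\frac{1}{2}(|x_j - a_1| + |x_j - a_2|) - \frac{1}{2}(a_2 - a_1)$. The Python sketch below checks (11) against a direct clamping computation and then solves (13) by weighted medians, one axis at a time. The encoding of a rectangle as `((a_i11, a_i12), (a_i21, a_i22))` with $a_{ij1} \le a_{ij2}$ and all function names are assumptions of this sketch.

```python
def l1_distance_to_rect(x, rect):
    """Closest rectangular distance via (11)-(12).

    rect = ((a_i11, a_i12), (a_i21, a_i22)) with a_ij1 <= a_ij2 (assumed encoding).
    """
    half_sum = 0.5 * sum(abs(x[j] - rect[j][k]) for j in range(2) for k in range(2))
    l_i = sum((rect[j][1] - rect[j][0]) / 2.0 for j in range(2))
    return half_sum - l_i


def l1_distance_by_clamping(x, rect):
    """Direct computation: l1 distance from x to its closest point in the rectangle."""
    return sum(abs(x[j] - min(max(x[j], rect[j][0]), rect[j][1])) for j in range(2))


def weighted_median(values, weights):
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for v, w in pairs:
        cumulative += w
        if cumulative >= half:
            return v
    return pairs[-1][0]


def solve_rect_demands(rects, weights):
    """Problem (13): every a_ijk enters the median on axis j with weight w_i."""
    x = []
    for j in range(2):
        vals = [r[j][k] for r in rects for k in range(2)]
        wts = [w for w in weights for _ in range(2)]
        x.append(weighted_median(vals, wts))
    return tuple(x)
```

Because each rectangle contributes the coordinates of two diagonally opposite vertices, this is exactly the point-demand interpretation given above.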

In common with many location models using rec- 
tilinear distances, W(x) remains piecewise-linear along 
any straight line in the solution space when closest rect- 
angular distances are in use. Specifically, W(x) is lin- 
ear in segments of the plane marked off by lines drawn 
through the vertices of polygonal demand areas. This is 
particularly important in location problems, such as the 
maximin location problem, where the objective func- 
tion is not convex. It means that the problem can be 
decomposed into a number of problems with linear ob- 
jective functions [3,4,8,10,12]. 

Another characteristic of this problem in common 
with other rectilinear location problems (and block 
norms in general) is that it can be solved with lin- 
ear programming; that is, optimizing W(x) can be ex- 
pressed as a set of linear programming formulations. 

The rectangular distance function has discontinu- 
ities in its derivatives, so that associated location prob- 
lems provide difficulties for gradient descent based pro- 
cedures. One way around this is to use hyperbolic ap- 
proximations for the terms represented by absolute dif- 
ferences [16,23]. 
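As an illustration of the smoothing idea, the sketch below replaces each $|x_j - a_{ij}|$ for point demands by the hyperbolic approximation $\sqrt{(x_j - a_{ij})^2 + \varepsilon^2}$ and minimizes the resulting differentiable function with a Weiszfeld-type fixed-point iteration applied coordinatewise (the problem is separable). This is one possible way to exploit the approximation, not necessarily the procedure of [16,23]; the parameter `eps` and the function name are assumptions. Each smoothed term overestimates $|u|$ by at most $\varepsilon$, so for small $\varepsilon$ the result lies close to a rectilinear median.

```python
import math


def smoothed_rectilinear_median(points, weights, eps=1e-3, tol=1e-12, max_iter=100_000):
    """Approximately minimize sum_i w_i (|x1 - a_i1| + |x2 - a_i2|).

    Each |u| is replaced by sqrt(u**2 + eps**2); the smooth problem is solved
    per coordinate by a Weiszfeld-type fixed-point iteration.
    """
    total = sum(weights)
    x = [sum(w * p[j] for p, w in zip(points, weights)) / total for j in range(2)]
    for _ in range(max_iter):
        new = []
        for j in range(2):
            num = den = 0.0
            for p, w in zip(points, weights):
                s = math.sqrt((x[j] - p[j]) ** 2 + eps ** 2)  # smoothed |x_j - a_ij|
                num += w * p[j] / s
                den += w / s
            new.append(num / den)  # zero of the partial derivative with s frozen
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x
```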
Order Complementarity


The introduction of the order complementarity problem in complementarity theory can be justified by the following two reasons:

1) In the study of some particular classical complementarity problems, the essential fact is not orthogonality in the sense of an inner product but lattice orthogonality. In some circumstances it is very useful to represent the classical complementarity problem as an order complementarity problem.

2) In several practical problems the complementarity condition appears with more than one operator. This situation arises, for example, in economics, in lubrication theory, and in stochastic optimal control theory [16]. The study of order complementarity problems is a new chapter in complementarity theory, one that is still under development.


Preliminaries 


Denote by $E(\tau)$ (respectively, by $(E, \|\cdot\|)$ and $(E, \langle\cdot,\cdot\rangle)$) a locally convex space (respectively, a Banach space and a Hilbert space). Suppose that E is ordered by a pointed convex cone $K \subset E$, i.e., K is a subset of E satisfying the following properties:

1) $K + K \subseteq K$;

2) $\lambda K \subseteq K$ for all $\lambda \in \mathbb{R}_+$; and

3) $K \cap (-K) = \{0\}$.

Denote by $\le$ the ordering defined by K, that is, $x \le y$ if and only if $y - x \in K$. Assume that the ordered vector space $(E, K)$ is a vector lattice, i.e., for every pair $(x, y) \in E \times E$, the supremum $\vee(x, y)$ and the infimum $\wedge(x, y)$ with respect to the ordering $\le$ exist in E. In this case, for every $x, y, z \in E$ we have:

1) $\vee(x, y) + z = \vee(x + z, y + z)$;

2) $\wedge(x, y) + z = \wedge(x + z, y + z)$;

3) $\vee(x, \vee(y, z)) = \vee(\vee(x, y), z) = \vee(x, y, z)$.

We can show that $\vee(x, y) = -\wedge(-x, -y)$ for all $x, y \in E$.

Let $E(\tau)$ be a locally convex space and $K \subset E$ a closed convex cone.

We say that K is regular (respectively, completely regular) if all monotone increasing and order bounded (respectively, topologically bounded) sequences of elements of K are $\tau$-convergent. Let $D \subset E$ be a subset. An operator $T: D \to E$ is said to be isotone (respectively, antitone) if $x_1 \le x_2$ ($x_1, x_2 \in D$) implies $T(x_1) \le T(x_2)$ (respectively, $T(x_2) \le T(x_1)$). If S is a subset of D, we denote by $\mu(S)$ the measure of noncompactness of S, i.e., $\mu(S) = \inf\{r > 0 : S$ can be covered by a finite family of subsets of E whose diameter is $\le r\}$.

We say that T is a k-set-contraction ($k \ge 0$) if it is continuous, bounded and $\mu(T(S)) \le k\mu(S)$ for any bounded set $S \subset D$. A k-set-contraction is called a strict-set-contraction if $k < 1$. T is called condensing if it is continuous, bounded and $\mu(T(S)) < \mu(S)$ for any bounded set $S \subset D$ with $\mu(S) > 0$. If $K \subset E$ is a pointed convex cone, we say that a bilinear form $\langle\cdot,\cdot\rangle$ on E is K-local if and only if $\langle x, y\rangle = 0$ whenever $x, y \in K$ and $\wedge(x, y) = 0$.


Order Complementarity Problems 


Order complementarity problems represent a relatively new chapter in complementarity theory. They are needed because, in many situations, some classical complementarity problems must be represented as lattice orthogonality problems. Furthermore, in some practical problems the complementarity condition must be used simultaneously with respect to several operators.

Let $(E, K)$ be a vector lattice with respect to the order defined by the pointed convex cone K. Let D be a nonempty subset of E; in particular, D can be the cone K. Given m linear or nonlinear functions $f_1, \ldots, f_m: E \to E$, the order complementarity problem associated with the family of functions and with the set D is:

$$\mathrm{OCP}(\{f_i\}_{i=1}^m, D): \quad \text{find } x_0 \in D \text{ such that } \wedge(f_1(x_0), \ldots, f_m(x_0)) = 0.$$


In [16] this problem is named the implicit general order complementarity problem. We have several interesting particular cases:

1) If $m = 2$, $D = E$, $f_1 = Id$ (the identity mapping) and $f_2(x) = Tx + q$, where $T: E \to E$ is a linear mapping and q an element of E, we have the linear order complementarity problem, denoted by $\mathrm{LOCP}(T, q)$. This problem was studied systematically for the first time in 1989 in [2], where several interesting new classes of linear operators were introduced, for example the operators of classes (H™), (S), (Z), (K), (P) and (A).

2) If m is arbitrary and the functions $f_i$ ($i = 1, \ldots, m$) are affine mappings, we have the generalized linear order complementarity problem. Several results about this problem are obtained in [9,16,25].

3) If $m = 2$, $D = K$ and $f_1, f_2$ are nonlinear mappings, we have the nonlinear order complementarity problem, studied for the first time in 1986 in [12].

4) If $m = 3$, $D = E$, $f_1 = Id$ and $f_2, f_3$ are nonlinear, we have the order complementarity problem introduced in lubrication theory in 1986 in [22].

5) If m is arbitrary, $D = K$, $f_1 = Id$ and $f_2, \ldots, f_m$ are nonlinear but of the form $f_i(x) = x - T_i(x)$ ($i = 1, \ldots, m$), with every $T_i$ a nonlinear mapping, we have the generalized order complementarity problem, studied systematically in [17], and for set-valued mappings in [18].


Order Complementarity Problem 
as Mathematical Model 


Order complementarity problems can also be considered as mathematical models for many practical problems. In this section we indicate some such models.


Mixed Lubrication Problem 


Consider mixed lubrication in the context of a journal bearing with elastic support. The problem is to determine the contact pressure X. In this case $E = H^1(\Omega)$ (defined over $L^2(\Omega)$) and the cone is $K = \{u \in H^1(\Omega) : u \ge 0 \text{ a.e. on } \Omega\}$. We have two operators, $T_1(X)$ and $T_2(X)$, where $T_1$ is generally an integral operator and $T_2$ is the Reynolds partial differential operator. For the definition of these operators, the reader is referred to [16,17,22]. In this case, there are three distinct functions which cause the decomposition of the spatial area into three disjoint regions: the innermost region (solid-to-solid contact), the elasto-hydrodynamic lubrication region (solid-to-fluid contact), and the cavity region (in which the pressure returns to the ambient value). The complementarity formulation is based on the observation that the contact pressure X satisfies the following equations specified for every region:

1) $X > 0$, $T_1(X) = 0$, $T_2(X) > 0$ (solid-to-solid contact);

2) $X = 0$, $T_1(X) = 0$, $T_2(X) = 0$ (cavity point);

3) $X \ge 0$, $T_1(X) = 0$, $T_2(X) = 0$ (lubrication point).


Determining the contact pressure X is equivalent to solving the problem $\mathrm{OCP}(Id, T_1, T_2; K)$. This problem was first defined in [22] and, to date, it has not been solved.


Global Reproduction of an Economic System 
Working with Several Technologies 


Consider a nonlinear economic system which is a generalization of the classical linear input-output system defined by Leontief. Suppose that the system has n production sectors and that every sector works with m technologies to produce one type of output. The number of technologies is the same for every sector. Every sector is constrained to use the production of the others. Let $x_j$ be the level, in units, of the gross activity performed in sector j. Suppose that to produce $x_j$ units in sector j, $f_{ij}^k(x_j)$ units from technology k of sector i are needed as inputs. We make the following assumptions for all i, j, k:
1) all $f_{ij}^k(x_j)$ are continuous;
2) $f_{ij}^k(0) = 0$;
3) $0 \le u_j \le v_j$ implies $f_{ij}^k(u_j) \le f_{ij}^k(v_j)$.
The balances between total activities and final demands for technology k are given by $x_i = \sum_{j=1}^{n} f_{ij}^k(x_j) + y_i$, $i = 1, \ldots, n$, where $y_i$ is the final demand for sector i. Denote this system by $S(\{f_{ij}^k\})$. This is the classical Leontief nonlinear input-output system studied by several authors (see the references cited in [17]). We replace condition 3) by the following more realistic condition:
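4) there exists a continuous mapping $\Phi: \mathbb{R}^n \to \mathbb{R}^n$ such that:

i) $Id + \Phi$ is invertible and $(Id + \Phi)^{-1}$ is isotone (with respect to the ordering defined by $\mathbb{R}^n_+$);

ii) $\Phi(x) + f^k(x)$ is isotone for every k, where $x = (x_1, \ldots, x_n)^T$ and $f^k(x) = \left(\sum_{j=1}^{n} f_{1j}^k(x_j), \ldots, \sum_{j=1}^{n} f_{nj}^k(x_j)\right)^T$.

If $\{f_{ij}^k(x_j)\}$ satisfies 1), 2) and 4), we say that it is a tolerant system, and in this case $\Phi$ is a tolerance. We define $F^k(x) = x - f^k(x)$ and, for any $y^0 \ge 0$,

$$S_{y^0} = \left\{x \in \mathbb{R}^n_+ : F^1(x) - y^0 \ge 0, \ldots, F^m(x) - y^0 \ge 0\right\}.$$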

For this model, the problem is to show that, given $y^0 \ge 0$ with $S_{y^0}$ nonempty, the problem $\mathrm{OCP}(T_1, \ldots, T_m; \mathbb{R}^n_+)$ has a solution $x^0 \ge 0$ which is the least element of $S_{y^0}$, where

$$T_1(x) = F^1(x) - y^0, \quad \ldots, \quad T_m(x) = F^m(x) - y^0.$$

In this case we say that the production $x^0$ realizes $y^0$ with a minimal social cost. This model was studied in [17], where it is shown that a tolerant economic system, which is locally nonlinear and with the functions $f_{ij}^k$ not necessarily isotone, behaves similarly to a classical Leontief model.


Discrete Dynamic Complementarity Problem 


One of the recent discoveries in complementarity theory is the dynamic complementarity problem. It seems that this problem was defined in [10]; it is a unifying framework for fluid and diffusion approximations of stochastic flow networks. We now consider the discrete dynamic complementarity problem (DDCP). Let $(\mathbb{R}^m, \langle\cdot,\cdot\rangle)$ be the Euclidean space ordered by $\mathbb{R}^m_+$. Let $x = \{x(0), \ldots, x(n), \ldots\}$ be a sequence of vectors in $\mathbb{R}^m$. Assume that $x(0) \ge 0$ and R is a real $(m \times m)$-matrix. The problem (DDCP) is the following: given the sequence x and the matrix R, find the sequence $y = \{y(0), \ldots, y(n), \ldots\}$ such that for all $n \in \mathbb{N}$:

i) $z(n) = x(n) + Ry(n) \ge 0$;

ii) $y(0) = 0$, $\Delta y(n) = y(n) - y(n-1) \ge 0$;

iii) $\langle z(n), \Delta y(n) \rangle = 0$.

We assume by convention that $y(-1) = 0$ and $\mathbb{N} = \{0, 1, 2, \ldots\}$. Consider the vector space $S = \{x \mid x: \mathbb{N} \to \mathbb{R}^m\}$, ordered by the convex cone


$$K = \{x \in S : x(n) \ge 0 \text{ for all } n \in \mathbb{N}\}$$

and endowed with the Fréchet locally convex topology defined by the family of seminorms $\{p_n\}_{n \in \mathbb{N}}$, where $p_n(x) = \sum_{k=0}^{n} \|x(k)\|$. The space S is a vector lattice and K is normal. We define the following operators from S into S:

$$T_1(y) = \{x(0) + Ry(0), \ldots, x(n) + Ry(n), \ldots\},$$
$$T_2(y) = \{0, y(1) - y(0), \ldots, y(n) - y(n-1), \ldots\}.$$


We put $A = \{y \in S : y(0) = 0\}$. The solvability of problem DDCP is equivalent to the solvability of the problem $\mathrm{OCP}(T_1, T_2; A)$. The study of the dynamic complementarity problem is an interesting new research domain in complementarity theory. The reader can find other examples of order complementarity problems in [14,16], where it is shown that the generalized linear complementarity problem in Cottle and Dantzig's sense, or the Bellman routing problem, can be reformulated as an order complementarity problem.
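In the finite-dimensional setting of the DDCP, the sequence y can be built step by step: with $y(0) = 0$, the increment $\Delta y(n)$ for $n \ge 1$ must satisfy $\Delta y(n) \ge 0$, $z(n) = q(n) + R\,\Delta y(n) \ge 0$ and $\langle z(n), \Delta y(n) \rangle = 0$, where $q(n) = x(n) + Ry(n-1)$. This is a linear complementarity problem with matrix R. The Python sketch below (using NumPy) solves each of these small problems by enumerating complementary index sets, which is practical only for small m and is an assumption of this illustration, not a method from the cited literature. It raises an error if some step has no solution.

```python
from itertools import combinations

import numpy as np


def solve_lcp_enum(M, q, tol=1e-12):
    """Find w >= 0 with z = q + M w >= 0 and <z, w> = 0 by trying every
    complementary index set alpha (w_i = 0 off alpha, z_i = 0 on alpha).
    Exponential in len(q): intended for small examples only."""
    m = len(q)
    for size in range(m + 1):
        for alpha in combinations(range(m), size):
            w = np.zeros(m)
            if alpha:
                idx = list(alpha)
                try:
                    w[idx] = np.linalg.solve(M[np.ix_(idx, idx)], -q[idx])
                except np.linalg.LinAlgError:
                    continue  # singular principal submatrix: skip this alpha
            z = q + M @ w
            if (w >= -tol).all() and (z >= -tol).all():
                return w
    raise ValueError("no solution for this step")


def solve_ddcp(x_seq, R):
    """Return y(0), ..., y(N) for the DDCP i)-iii) given x(0), ..., x(N)."""
    R = np.asarray(R, dtype=float)
    y = [np.zeros(R.shape[0])]  # y(0) = 0; z(0) = x(0) >= 0 is assumed
    for n in range(1, len(x_seq)):
        q = np.asarray(x_seq[n], dtype=float) + R @ y[-1]  # x(n) + R y(n-1)
        y.append(y[-1] + solve_lcp_enum(R, q))  # y(n) = y(n-1) + Delta y(n)
    return y
```

With $m = 1$ and $R = 1$, the computed $y(n)$ equals $\max(0, -\min_{k \le n} x(k))$, the smallest nondecreasing correction that keeps $z$ nonnegative.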


Solution Methods 


Let $E(\tau)$ be a locally convex space ordered by a closed pointed convex cone $K \subset E$, and suppose that E is a vector lattice. Given the operators $T_1, \ldots, T_m: E \to E$ and the set $D \subset E$, we consider the problem $\mathrm{OCP}(\{T_i\}_{i=1}^m, D)$. Because the operator $\wedge$ is used in the definition of this problem, many classical methods for solving nonlinear equations or fixed point problems are not applicable. For example, the operator $\wedge$ can destroy the compactness or the differentiability of the operators $T_i$ ($i = 1, \ldots, m$). However, some fixed point methods and some topological methods are applicable. In this sense, several existence theorems and iterative methods for solving $\mathrm{OCP}(\{T_i\}_{i=1}^m, D)$ are presented in [12,13,14,15,16,17,18]. Several existence results can be obtained using fixed point theory and the following result.


Theorem 1 If $\Phi_0: E \to E$ is an arbitrary function such that $(Id + \Phi_0)^{-1}$ exists, then $x \in D$ is a solution of the problem $\mathrm{OCP}(\{T_i\}_{i=1}^m, D)$ if and only if x is a fixed point of the mapping $H(x)$ defined by

$$H(x) = (Id + \Phi_0)^{-1}\left(\vee\{(Id + \Phi_0 - T_1)(x), \ldots, (Id + \Phi_0 - T_m)(x)\}\right)$$

for every $x \in E$ (indeed, with $u = (Id + \Phi_0)(x)$ one has $\vee_i (u - T_i(x)) = u - \wedge_i T_i(x)$).

The importance of this theorem is that, for practical problems, we can choose the mapping $\Phi_0$ such that $H(x)$ has some good properties. In the case $\Phi_0 = 0$ we denote $H(x) = \vee(x - T_1(x), \ldots, x - T_m(x))$ for all $x \in E$.


Definition 2 We say that H is $\Phi$-isotone on D if there exists a mapping $\Phi: E \to E$ such that $H + \Phi$ is isotone on D, $Id + \Phi$ is invertible and $(Id + \Phi)^{-1}$ is isotone.

We recall that D is order convex if $u, v \in D$ and $u \le w \le v$ imply that $w \in D$.

Theorem 3 Let $(E(\tau), K)$ be a locally convex vector lattice and $D \subset E$ an order convex set. If the following assumptions are satisfied:

1) K is a regular cone;

2) H is $\Phi$-isotone;

3) $(Id + \Phi)^{-1}(H + \Phi)$ is continuous;

4) there exist $x_0, y_0 \in D$ such that $x_0 \le y_0$, $x_0 \le H(x_0)$ and $H(y_0) \le y_0$,

then the problem has a minimal and a maximal solution, both computable by iterative methods.


Proof The proof is in [16]. 
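As a finite-dimensional illustration of Theorems 1 and 3 (a sketch under stated assumptions, not the construction of [16]), take $E = \mathbb{R}^n$ with the componentwise order, $D = \mathbb{R}^n_+$, $T_1 = Id$ and $T_2(x) = Mx + q$, so that the OCP is the linear complementarity problem $\min(x, Mx + q) = 0$. Choosing $\Phi_0 = (c - 1)Id$ with $c > 0$ in Theorem 1 gives $H(x) = \frac{1}{c}\vee\{(c - 1)x, cx - Mx - q\} = x - \frac{1}{c}\wedge\{x, Mx + q\}$. If M has nonpositive off-diagonal entries and $c \ge \max(1, \max_i M_{ii})$, this map is isotone (so assumption 2 of Theorem 3 holds with $\Phi = 0$), and iterating from a point $x_0$ with $x_0 \le H(x_0)$ gives an increasing sequence. The function name and the parameter `c` are hypothetical.

```python
def ocp_monotone_iteration(M, q, x0, c, tol=1e-12, max_iter=100_000):
    """Iterate H(x) = x - min(x, Mx + q) / c, i.e. Theorem 1's map with
    Phi_0 = (c - 1) Id, for the problem min(x, Mx + q) = 0 in R^n.
    Monotone if M is a Z-matrix and c >= max(1, max_i M_ii) (assumed)."""
    n = len(q)
    x = [float(v) for v in x0]
    for _ in range(max_iter):
        f = [sum(M[i][j] * x[j] for j in range(n)) + q[i] for i in range(n)]
        new = [x[i] - min(x[i], f[i]) / c for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x
```

For $M = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$, $q = (-1, -1)$ and $c = 2$, starting from the lower point $(0, 0)$ and from the upper point $(2, 2)$ both give $(1, 1)$, so the minimal and maximal solutions in that interval coincide.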


Suppose that $Id - T_i = R_i + S_i$, where $R_i$ is isotone and $S_i$ is antitone. In this case we can associate to the mapping H the mapping

$$\hat{H}(x, y) = \vee\{R_1(x) + S_1(y), \ldots, R_m(x) + S_m(y)\}$$

for all $x, y \in E$. Then $\hat{H}$ is a heterotonic operator in Opoitsev's sense (see [23]). We say that $(x_*, y_*)$ is a coupled fixed point for $\hat{H}$ if $\hat{H}(x_*, y_*) = x_*$ and $\hat{H}(y_*, x_*) = y_*$. We have the following result:


Theorem 4 Let $(E, \|\cdot\|)$ be a uniformly convex Banach space ordered by a regular pointed closed convex cone $K \subset E$. If one of the following assumptions is satisfied:

1) $\hat{H}$ is nonexpansive;

2) $\hat{H}$ is condensing with respect to a measure of noncompactness;

3) $\hat{H}$ is continuous and $\dim E < +\infty$,

then for every conical interval $[x_0, y_0]$ strongly invariant for $\hat{H}$ (i.e., $x_0 \le \hat{H}(x_0, y_0)$ and $\hat{H}(y_0, x_0) \le y_0$), there exist a coupled fixed point $(x_*, y_*)$ for $\hat{H}$ and a solution $\bar{x}$ of the problem $\mathrm{OCP}(\{T_i\}_{i=1}^m, K)$ such that $x_* \le \bar{x} \le y_*$. Moreover, $x_* = \lim_{k \to \infty} x_k$ and $y_* = \lim_{k \to \infty} y_k$, where $x_k = \hat{H}(x_{k-1}, y_{k-1})$ and $y_k = \hat{H}(y_{k-1}, x_{k-1})$.


Proof The proof of this theorem is in [16]. 
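A finite-dimensional sketch of the coupled iteration in Theorem 4, under illustrative assumptions: take $E = \mathbb{R}^n$ with the componentwise order, $K = \mathbb{R}^n_+$, $T_1 = Id$ and $T_2(x) = Mx + q$ (the linear complementarity problem $\min(x, Mx + q) = 0$), with no sign assumptions on M. Write $N = I - M = N^+ - N^-$, with $N^+$ and $N^-$ the entrywise positive and negative parts, and take $R_1 = S_1 = 0$, $R_2(x) = N^+x - q$ (isotone) and $S_2(y) = -N^-y$ (antitone). Then $\hat{H}(x, y) = \vee\{0, N^+x - N^-y - q\}$ and $\hat{H}(x, x) = \vee\{0, x - Mx - q\}$. The splitting and the function name are assumptions; the sketch runs the coupled updates but does not check the strong invariance of $[x_0, y_0]$, which the caller must ensure.

```python
def coupled_iteration(M, q, x0, y0, tol=1e-12, max_iter=100_000):
    """x_k = H(x_{k-1}, y_{k-1}), y_k = H(y_{k-1}, x_{k-1}) for the split
    H(u, v) = max(0, N+ u - N- v - q), where N = I - M = N+ - N-."""
    n = len(q)
    N = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(n)] for i in range(n)]
    n_pos = [[max(v, 0.0) for v in row] for row in N]   # isotone part (R_2)
    n_neg = [[max(-v, 0.0) for v in row] for row in N]  # antitone part (S_2 = -n_neg)

    def h_hat(u, v):
        return [max(0.0, sum(n_pos[i][j] * u[j] - n_neg[i][j] * v[j] for j in range(n)) - q[i])
                for i in range(n)]

    x, y = [float(t) for t in x0], [float(t) for t in y0]
    for _ in range(max_iter):
        x_new, y_new = h_hat(x, y), h_hat(y, x)  # simultaneous coupled update
        if max(abs(a - b) for a, b in zip(x_new + y_new, x + y)) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y
```

For $M = \begin{pmatrix} 1 & 0.25 \\ 0.25 & 1 \end{pmatrix}$ and $q = (-1.25, -1.25)$, the interval $[0, (5, 5)]$ is strongly invariant, and both sequences converge to $(1, 1)$, so here the coupled fixed point is itself the solution.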


Several interesting existence results based on a special topological index defined on cones are presented in [15]. This topological index was defined in [23]. The problem $\mathrm{OCP}(\{T_i\}_{i=1}^m, D)$ can also be studied by means of topological degree. Several results in this direction have been obtained, especially for the linear order complementarity problem [8,9,25]. The results obtained for $\mathrm{OCP}(\{T_i\}_{i=1}^m, D)$ can be applied to the problem $\mathrm{NCP}(f, K)$ when E is a Hilbert space and the inner product is K-local, since in this case $\mathrm{NCP}(f, K)$ is equivalent to $\mathrm{OCP}(Id, f, K)$.
> Integer Linear Complementary Problem

> LCP: Pardalos—Rosen Mixed Integer Formulation 

> Lemke Method 

> Linear Complementarity Problem 

> Linear Programming 

> Parametric Linear Programming: Cost Simplex 
Algorithm 

> Principal Pivoting Methods for Linear 
Complementarity Problems 

> Sequential Simplex Method 

> Topological Methods in Complementarity Theory 
Matroids were defined in 1935 as a generalization of graphs and matrices.

Since the 1950s they have attracted increasing interest, and the theoretical results obtained have been used to solve several difficult problems in various fields such as civil, electrical, and mechanical engineering, computer science, and mathematics. Oriented matroids are a special class of matroids. They can be viewed as a combinatorial abstraction of real hyperplane arrangements, of point configurations over the reals, of convex polytopes, or of directed graphs. The scope of this article is to introduce the reader to the theory of oriented matroids, providing an extensive discussion of their axiom systems and illustrating the different aspects that characterize these objects.


Historical Review 


In 1935 H. Whitney [35] studied linear dependence and its important applications in mathematics. His pioneering paper, which is considered the first scientific work on matroid theory, contains a number of equivalent axiomatic systems for matroids.

In the 1950s and 1960s, starting from Whitney's ideas, W. Tutte [22,23,24,25,26,27,28,29,30] built a considerable body of theory about the structural properties of matroids, which became popular in the 1960s, when J. Edmonds [7,8,9,10,11,12,13] introduced matroid theory into combinatorial optimization. From 1965 on, a growing number of researchers became interested in matroids. The origins of oriented matroids go back to these years, with the work of R.G. Bland, J. Folkman, M. Las Vergnas, and Lawrence. Among the first results, see [17] and [15], which contains the fundamental topological representation theorem for oriented matroids. Further notions of oriented matroids from the same years are due to Bland, who was motivated by linear programming duality theory ([3,4]), and, independently, to Las Vergnas, who was motivated by graph theory ([31,32,33,34]). Reference [5] appeared in 1978.

Several other early papers exist in the literature, among them [16,19], and [21].

Reference [2] contains a comprehensive tutorial treatment of oriented matroids; updates on current research are provided in [36].


Axiom Systems for Oriented Matroids. 


Researchers from various mathematical areas arrived at 
four basic, equivalent, axiom systems: 

1) circuit axioms; 

2) orthogonality axioms; 

3) chirotopes, or basis orientations; 

4) vector axioms. 

In order to understand these axiom systems, some notions and definitions are needed.


Definition 1 A signed set X is a set X together with 
a partition (X*, X~) of X. X* is the positive set; X~ is 
the negative set. X is positive (negative), if X* = @ (X~ 
=). X =X* UX is the support of X. 


Definition 2 Let $X$ and $Y$ be two signed sets. $X$ is a restriction of $Y$ if and only if $X^+ \subseteq Y^+$ and $X^- \subseteq Y^-$.


Definition 3 Let $F$ be an unsigned set and $X$ a signed set. The restriction of $X$ to $F$, denoted $X|_F$, is the signed set $Y$ with $\underline{Y} = \underline{X} \cap F$, $Y^+ = X^+ \cap F$ and $Y^- = X^- \cap F$.


Definition 4 A signed set $X$ can also be defined through a mapping $\mathrm{sg}_X: \underline{X} \to \{-1, 1\}$ such that $X^+ = \{x : \mathrm{sg}_X(x) = 1\}$ and $X^- = \{x : \mathrm{sg}_X(x) = -1\}$. $\mathrm{sg}_X$ is called the signature of $X$.


In the following, $X \setminus Y$ denotes the restriction of $X$ to $\underline{X} \setminus \underline{Y}$.


Definition 5 Let $X$ and $Y$ be two signed sets. Their composition $X \circ Y$ is the signed set such that $(X \circ Y)^+ = X^+ \cup (Y^+ \setminus X^-)$ and $(X \circ Y)^- = X^- \cup (Y^- \setminus X^+)$.


Note that $\circ$ is associative, while $X \circ Y = Y \circ X$ if and only if the restrictions of $X$ and $Y$ to $\underline{X} \cap \underline{Y}$ are equal.


Definition 6 The opposite of a signed set $X$, denoted $-X$, is the signed set such that $(-X)^+ = X^-$ and $(-X)^- = X^+$.


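Definition 7 Let $X$ and $Y$ be two signed sets. $X$ and $Y$ are called orthogonal, denoted $X \perp Y$, if either $\underline{X} \cap \underline{Y} = \emptyset$ or the restrictions of $X$ and $Y$ to $\underline{X} \cap \underline{Y}$ are neither equal nor opposite, i.e. there exist $x, y \in \underline{X} \cap \underline{Y}$ such that $X(x)Y(x) = -X(y)Y(y)$, where $X(e) \in \{-1, 1\}$ denotes the sign of $e$ in $X$.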


Definition 8 Let E be any set. A signed subset of E is 
a signed set whose support is contained in E. 


A signed subset of $E$ can be identified with an element of $\{-1, 0, 1\}^E$, which is usually abbreviated as $\{-, 0, +\}^E$. If $E = \{1, \dots, n\}$ and $X \in \{-, 0, +\}^E$, then $X$ is a sign vector with $n$ entries $-$, $0$ or $+$.
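Using this sign-vector identification, the operations of Definitions 5 to 7 can be sketched in a few lines of Python. This is a minimal illustration, assuming signed subsets of $E = \{0, \dots, n-1\}$ are stored as tuples with entries in $\{-1, 0, 1\}$; the function names are hypothetical and not taken from the literature.

```python
from typing import Set, Tuple

SignVector = Tuple[int, ...]  # one entry in {-1, 0, +1} per element of E


def compose(X: SignVector, Y: SignVector) -> SignVector:
    """X o Y: the sign of X where X is nonzero, otherwise the sign of Y."""
    return tuple(x if x != 0 else y for x, y in zip(X, Y))


def opposite(X: SignVector) -> SignVector:
    """-X: swap the positive and negative parts."""
    return tuple(-x for x in X)


def restrict(X: SignVector, F: Set[int]) -> SignVector:
    """X|_F: keep the signs of X on F and set every other entry to 0."""
    return tuple(x if e in F else 0 for e, x in enumerate(X))


def orthogonal(X: SignVector, Y: SignVector) -> bool:
    """X ⊥ Y: the supports are disjoint, or the products X(e)Y(e) on the
    common support take both values +1 and -1."""
    products = {x * y for x, y in zip(X, Y) if x * y != 0}
    return not products or products == {1, -1}
```

For instance, with $X = (+,+,-,0)$ and $Y = (0,-,+,+)$ the two compositions $X \circ Y = (+,+,-,+)$ and $Y \circ X = (+,-,+,+)$ differ, as expected, because $X$ and $Y$ disagree on their common support.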


Circuits and Circuit Axioms. 


In this Section we provide the definition of an oriented 
matroid in terms of its signed circuits. 


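Definition 9 (Circuit axioms) A collection $\mathcal{C}$ of signed subsets of a set $E$ is the set of signed circuits of an oriented matroid $\mathcal{M}$ on $E$ if and only if $\mathcal{C}$ satisfies the following axioms:

1) $\emptyset \notin \mathcal{C}$;

2) symmetry: $\mathcal{C} = -\mathcal{C}$;

3) incomparability: for all $X, Y \in \mathcal{C}$, if $\underline{X} \subseteq \underline{Y}$, then $X = Y$ or $X = -Y$;

4) weak elimination: for all $X, Y \in \mathcal{C}$ with $X \neq -Y$ and $x \in X^+ \cap Y^-$, there exists $Z \in \mathcal{C}$ such that $Z^+ \subseteq (X^+ \cup Y^+) \setminus \{x\}$ and $Z^- \subseteq (X^- \cup Y^-) \setminus \{x\}$.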


Note that the Axioms 1), 3), and 4) of Definition 9 are 
the circuit axioms of an ordinary matroid. 
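Because Definition 9 only involves finitely many sign vectors, the axioms can be checked by brute force on small examples. The following Python sketch (hypothetical helper `is_circuit_collection`, sign vectors as tuples over $\{-1, 0, 1\}$) tests axioms 1) to 4) literally; its cost grows cubically with $|\mathcal{C}|$, so it is only meant for toy collections.

```python
from itertools import product


def is_circuit_collection(C):
    """Brute-force check of axioms 1)-4) of Definition 9 for a finite set C
    of sign vectors (tuples with entries in {-1, 0, +1}) of equal length."""
    C = set(C)
    n = len(next(iter(C))) if C else 0

    def neg(X):
        return tuple(-x for x in X)

    def supp(X):
        return {e for e in range(n) if X[e] != 0}

    if (0,) * n in C:                                  # 1) no empty signed set
        return False
    if any(neg(X) not in C for X in C):                # 2) symmetry
        return False
    for X, Y in product(C, repeat=2):                  # 3) incomparability
        if supp(X) <= supp(Y) and X != Y and X != neg(Y):
            return False
    for X, Y in product(C, repeat=2):                  # 4) weak elimination
        if X == neg(Y):
            continue
        for x in range(n):
            if X[x] == 1 and Y[x] == -1:
                # need Z with Z(x) = 0, Z+ within X+ ∪ Y+, Z- within X- ∪ Y-
                if not any(
                    Z[x] == 0
                    and all(Z[e] != 1 or 1 in (X[e], Y[e]) for e in range(n))
                    and all(Z[e] != -1 or -1 in (X[e], Y[e]) for e in range(n))
                    for Z in C
                ):
                    return False
    return True
```

The collection $\{(+,+,-), (-,-,+)\}$, for example, passes the test, while dropping $(-,-,+)$ violates symmetry.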


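Corollary 10 The circuit supports $\underline{\mathcal{C}} = \{\underline{X} : X \in \mathcal{C}\}$ of an oriented matroid $\mathcal{M}$ are the circuits of a matroid $\underline{\mathcal{M}}$, called the underlying matroid of $\mathcal{M}$.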


Definition 11 $\mathcal{C}$ is called a circuit orientation of $\underline{\mathcal{M}}$; the oriented matroid $\mathcal{M}$ has the same rank as $\underline{\mathcal{M}}$.

An ordinary matroid $M$ is orientable if it has a circuit orientation.
Deciding the orientability of a matroid is a difficult problem. Bland and Las Vergnas [5] proved that a binary matroid is orientable if and only if it is regular.


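Definition 12 Let $\mathcal{M}$ be an oriented matroid on a set $E$ with set of signed circuits $\mathcal{C}$, and let $x \in E$. Then $x$ is a loop of $\mathcal{M}$ if $(\{x\}, \emptyset) \in \mathcal{C}$. If $x \notin \underline{X}$ for every $X \in \mathcal{C}$, $x$ is called a coloop of $\mathcal{M}$.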


Bland and Las Vergnas [5] and, independently, Folk- 
man and Lawrence [15] obtained the following result. 


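Theorem 13 Let $\mathcal{C}$ be a collection of signed subsets of a set $E$ satisfying 1), 2), and 3) of Definition 9. Then 4) of Definition 9 is equivalent to

4') strong elimination: for all $X, Y \in \mathcal{C}$, $x \in X^+ \cap Y^-$, and $f \in (X^+ \setminus Y^-) \cup (X^- \setminus Y^+)$, there is a $Z \in \mathcal{C}$ such that $f \in \underline{Z}$, $Z^+ \subseteq (X^+ \cup Y^+) \setminus \{x\}$, and $Z^- \subseteq (X^- \cup Y^-) \setminus \{x\}$.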


Minors 


As in the ordinary case, for an oriented matroid $\mathcal{M}$ on a set $E$ it is possible to define submatroids, or minors, induced by a subset $F$ of $E$ by deletion and/or contraction.

For any $\mathcal{S} \subseteq \{-, 0, +\}^E$, let $\mathrm{Min}\,\mathcal{S}$ be the collection of nonempty signed sets in $\mathcal{S}$ with inclusion-minimal support, and let $\mathrm{Max}\,\mathcal{S}$ be the collection of signed sets in $\mathcal{S}$ with inclusion-maximal support. Then the following properties hold.


Proposition 14 (Deletion) Let $\mathcal{M}$ be an oriented matroid on a set $E$ with set of signed circuits $\mathcal{C}$, and let $F \subseteq E$. Then $\mathcal{C}' = \{X \in \mathcal{C} : \underline{X} \subseteq F\}$ is the set of circuits of an oriented matroid on $F$, called the submatroid of $\mathcal{M}$ induced on $F$ and denoted $\mathcal{M}(F)$.


Proposition 15 (Contraction) Let $\mathcal{M}$ be an oriented matroid on a set $E$ with set of signed circuits $\mathcal{C}$, and let $F \subseteq E$. Then $\mathrm{Min}\{X|_F : X \in \mathcal{C}\}$ is the set of circuits of an oriented matroid on $F$, called the contraction of $\mathcal{M}$ to $F$ and denoted $\mathcal{M}/A$, where $A = E \setminus F$.
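Propositions 14 and 15 translate directly into set operations on sign vectors. In the Python sketch below (illustrative names), elements are the positions $0, \dots, n-1$, and entries on the removed set $A$ are simply kept at 0 rather than dropping coordinates.

```python
def support(X):
    """Positions where the sign vector X is nonzero."""
    return frozenset(e for e, s in enumerate(X) if s != 0)


def delete(C, A):
    """Signed circuits of M \\ A (Proposition 14 with F = E \\ A):
    the circuits whose support is disjoint from A."""
    return {X for X in C if not support(X) & set(A)}


def contract(C, A):
    """Signed circuits of M / A (Proposition 15): Min{X|_F : X in C},
    i.e. the nonempty restrictions to F = E \\ A with minimal support."""
    A = set(A)
    restricted = {tuple(0 if e in A else s for e, s in enumerate(X)) for X in C}
    nonempty = [X for X in restricted if support(X)]
    return {X for X in nonempty
            if not any(support(Y) < support(X) for Y in nonempty)}
```

For the circuits $\pm(+,+,-,0)$, $\pm(+,+,0,-)$, $\pm(0,0,+,-)$ of a digraph with edges $1 \to 2$, $2 \to 3$ and two parallel edges $1 \to 3$, contracting the third edge turns the fourth into a loop, which appears as the circuits $(0,0,0,\pm)$.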


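Proposition 16 Let $\mathcal{M}$ be an oriented matroid on a set $E$, and let $A$, $B$ be two disjoint subsets of $E$. Then

$(\mathcal{M} \setminus A) \setminus B = \mathcal{M} \setminus (A \cup B)$,

$(\mathcal{M} / A) / B = \mathcal{M} / (A \cup B)$,

$(\mathcal{M} \setminus A) / B = (\mathcal{M} / B) \setminus A$.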


Definition 17 A circuit signature of a matroid assigns 
to each circuit C two opposite signed sets X and —X 
supported by C. 


Theorem 18 Let $M$ be a matroid on a set $E$, and let $\mathcal{S}$ be a circuit signature of $M$. Suppose that for all $x \in E$ the induced circuit signatures $\mathcal{S} \setminus \{x\}$ and $\mathcal{S}/\{x\}$ are circuit orientations of $M \setminus \{x\}$ and $M/\{x\}$, respectively. Then one of the following conditions holds:

1) $\mathcal{S}$ is a circuit orientation of $M$;

2) $|E| = 3$;

3) $|E| = 4$.


Duality.


Let $\mathcal{M}$ be an oriented matroid on a set $E$, $B$ a basis of $\mathcal{M}$, and $x \in E \setminus B$. Then there is a unique circuit $\underline{c}(x, B)$ of $\underline{\mathcal{M}}$ contained in $B \cup \{x\}$, and it supports two opposite signed circuits of $\mathcal{M}$. The basic circuit of $x$ with respect to $B$, denoted $c(x, B)$, is the one of these two signed circuits that has $x$ in its positive part. Dually, given $x \in B$, $\underline{c}^*(x, B)$ denotes the unique cocircuit of $\underline{\mathcal{M}}$ disjoint from $B \setminus \{x\}$, and $c^*(x, B)$ denotes the corresponding signed cocircuit of $\mathcal{M}$ that is positive on $x$, called the basic cocircuit of $x$ with respect to $B$. The existence and uniqueness of $c^*(x, B)$ follow from the following proposition.


Proposition 19 Let $\mathcal{M}$ be an oriented matroid on a set $E$ with set of circuits $\mathcal{C}$. Then the following three properties hold.

1) There exists a unique signature $\mathcal{C}^*$ of the cocircuits of $\underline{\mathcal{M}}$ such that $X \perp Y$ for all $X \in \mathcal{C}$ and $Y \in \mathcal{C}^*$.

2) The collection $\mathcal{C}^*$ is the set of signed circuits of an oriented matroid on $E$, called the dual matroid (or orthogonal matroid) of $\mathcal{M}$ and denoted $\mathcal{M}^*$.

3) $\mathcal{M}^{**} = \mathcal{M}$.


The following theorem, proved in [5], gives an axiomatic definition of an oriented matroid in terms of its dual.


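Theorem 20 (Orthogonality axioms) Let $M$ be a matroid, $\mathcal{C}$ a circuit signature of $M$, and $\mathcal{C}^*$ a cocircuit signature of $M$. Then the following properties are equivalent.

1) $\mathcal{C}$ and $\mathcal{C}^*$ are the circuit collections of a pair of dual oriented matroids.

2) $X \perp Y$ for all $X \in \mathcal{C}$ and all $Y \in \mathcal{C}^*$.

3) $X \perp Y$ for all $X \in \mathcal{C}$ and all $Y \in \mathcal{C}^*$ with $|\underline{X} \cap \underline{Y}| \le 3$.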
In [5] and [6] Bland and Las Vergnas gave a further axiom system, which they called painting axioms, expressed in the following theorem.


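Theorem 21 (Painting axioms) Let $\mathcal{C}$ and $\mathcal{C}^*$ be two collections of signed subsets of a set $E$. Then $\mathcal{C}$ is the set of circuits and $\mathcal{C}^*$ is the set of cocircuits of an oriented matroid on $E$ if and only if they satisfy conditions 1)-3) of Definition 9 and one of the two following equivalent properties.

1) 3-painting: for all 3-partitions $E = B \cup G \cup R$ and $x \in B$, either there exists $X \in \mathcal{C}$ such that $x \in \underline{X} \subseteq B \cup G$ and $\underline{X} \cap B \subseteq X^+$, or there exists $Y \in \mathcal{C}^*$ such that $x \in \underline{Y} \subseteq B \cup R$ and $\underline{Y} \cap B \subseteq Y^+$, but not both.

2) 4-painting: for all 4-partitions $E = B \cup W \cup G \cup R$ and $x \in B \cup W$, either there exists $X \in \mathcal{C}$ such that $x \in \underline{X} \subseteq B \cup W \cup G$, $\underline{X} \cap B \subseteq X^+$, and $\underline{X} \cap W \subseteq X^-$, or there exists $Y \in \mathcal{C}^*$ such that $x \in \underline{Y} \subseteq B \cup W \cup R$, $\underline{Y} \cap B \subseteq Y^+$, and $\underline{Y} \cap W \subseteq Y^-$, but not both.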


Corollary 22 Each element of an oriented matroid be- 
longs either to a positive circuit or to a positive cocircuit, 
but not to both. 


The previous corollary motivates the definition of a special type of oriented matroid, the acyclic oriented matroid.


Definition 23 An oriented matroid $\mathcal{M} = (E, \mathcal{C})$ is acyclic if it does not contain a positive circuit.

$\mathcal{M}$ is totally cyclic if all its elements are contained in a positive circuit.


The identification of the minors of $\mathcal{M}^*$ with the duals of the minors of $\mathcal{M}$ follows from Theorem 20, as shown in the following proposition.


Proposition 24 Let $\mathcal{M}$ be an oriented matroid on a set $E$ and let $A$ be a subset of $E$. Then

$(\mathcal{M} \setminus A)^* = \mathcal{M}^* / A$,
$(\mathcal{M} / A)^* = \mathcal{M}^* \setminus A$.


Chirotopes and Basis Orientations 


An oriented matroid can also be defined by assigning a sign to each of its ordered bases. The following definitions are needed to understand how to construct a basis orientation of a given oriented matroid; the characterization of oriented matroids in terms of signed bases is given in [32] and [34].


Definition 25 The basis signature of an oriented ma- 
troid is called chirotope. 


Definition 26 Let $\mathcal{M}$ be an oriented matroid on a set $E$ with signed circuits $\mathcal{C}$. The bases of $\mathcal{M}$ are the maximal subsets of $E$ that do not contain the support of any circuit, i.e. they are the bases of $\underline{\mathcal{M}}$.


Definition 27 A basis orientation of an oriented matroid $\mathcal{M}$ is a mapping $\chi$ from the set of ordered bases of $\mathcal{M}$ to $\{-1, 1\}$ such that

1) $\chi$ is alternating;

2) for any two ordered bases of $\mathcal{M}$ of the form $(a, x_2, \dots, x_r)$ and $(b, x_2, \dots, x_r)$ with $a \neq b$, it holds that

$\chi(b, x_2, \dots, x_r) = -C(a)\,C(b)\,\chi(a, x_2, \dots, x_r)$,

where $C$ is one of the two opposite signed circuits of $\mathcal{M}$ with support in $\{a, b, x_2, \dots, x_r\}$.


Condition 2) in Definition 27 is also known as the pivoting property.

Las Vergnas [32,34] proved that every oriented matroid $\mathcal{M}$ has exactly two basis orientations, which are opposite to each other. He also showed that if $\chi$ is a basis orientation of $\mathcal{M}$, then $\mathcal{M}$ is uniquely determined by $\underline{\mathcal{M}}$ and $\chi$.

The pivoting property 2) in Definition 27 can also be rewritten as follows.


Definition 28 A basis orientation $\chi$ of an oriented matroid $\mathcal{M}$ satisfies

2*) for any two ordered bases of $\mathcal{M}$ of the form $(a, x_2, \dots, x_r)$ and $(b, x_2, \dots, x_r)$ with $a \neq b$, it holds that

$\chi(b, x_2, \dots, x_r) = D(a)\,D(b)\,\chi(a, x_2, \dots, x_r)$,

where $D$ is one of the two opposite signed cocircuits complementary to the hyperplane spanned by $\{x_2, \dots, x_r\}$ in $\underline{\mathcal{M}}$.


Conditions 2) and 2*) of Definitions 27 and 28, respectively, are equivalent for a map $\chi$ if $\mathcal{M}$ is an oriented matroid.


Definition 29 (Chirotope axioms) Let $E$ be a finite set and let $r \ge 1$ be an integer. A chirotope of rank $r$ is a mapping $\chi: E^r \to \{-1, 0, 1\}$ satisfying the following three properties:

1) $\chi$ is not identically zero;

2) $\chi$ is alternating, i.e.

$\chi(x_{\sigma(1)}, \dots, x_{\sigma(r)}) = \mathrm{sign}(\sigma)\,\chi(x_1, \dots, x_r)$

for each permutation $\sigma$ and each $x_1, \dots, x_r \in E$;

3) for all $x_1, \dots, x_r, y_1, \dots, y_r \in E$ such that $\chi_i^1 \chi_i^2 \ge 0$ for $i = 1, \dots, r$, where

$\chi_i^1 = \chi(y_i, x_2, \dots, x_r)$,
$\chi_i^2 = \chi(y_1, \dots, y_{i-1}, x_1, y_{i+1}, \dots, y_r)$,

it holds that

$\chi(x_1, \dots, x_r)\,\chi(y_1, \dots, y_r) \ge 0$.
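Definition 29 can be tested by brute force for small ground sets. The sketch below (Python; `is_chirotope`, `perm_sign` and `det_chirotope` are hypothetical helpers) checks axioms 1) to 3) for a map `chi` on $r$-tuples, using the signs of $2 \times 2$ determinants of vectors in $\mathbb{R}^2$ as a test case. The number of pairs of tuples grows like $|E|^{2r}$, so only tiny examples are practical.

```python
from itertools import permutations, product


def perm_sign(p):
    """Sign of a permutation of 0..r-1, computed by counting inversions."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1


def is_chirotope(chi, E, r):
    """Check axioms 1)-3) of Definition 29 for chi: E^r -> {-1, 0, 1}."""
    tuples = list(product(E, repeat=r))
    if all(chi(t) == 0 for t in tuples):                     # 1) not identically 0
        return False
    for t in tuples:                                         # 2) alternating
        for p in permutations(range(r)):
            if chi(tuple(t[i] for i in p)) != perm_sign(p) * chi(t):
                return False
    for x, y in product(tuples, repeat=2):                   # 3) exchange condition
        hypothesis = all(
            chi((y[i],) + x[1:]) * chi(y[:i] + (x[0],) + y[i + 1:]) >= 0
            for i in range(r))
        if hypothesis and chi(x) * chi(y) < 0:
            return False
    return True


def det_chirotope(vectors):
    """chi(i, j) = sign det(v_i, v_j) for vectors in R^2 (a realizable example)."""
    def chi(t):
        (a, b), (c, d) = vectors[t[0]], vectors[t[1]]
        det = a * d - b * c
        return (det > 0) - (det < 0)
    return chi
```

For the four vectors $(1,0), (1,1), (0,1), (-1,1)$ the test succeeds, while flipping the sign of the basis $\{0, 2\}$ alone keeps the map alternating but breaks axiom 3) (take $x = (0, 2)$ and $y = (1, 3)$).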


J. Lawrence [18] proved the following result. 


Theorem 30 Let $E$ be a set and let $r \ge 1$ be an integer. Then a mapping $\chi: E^r \to \{-1, 0, 1\}$ is a basis orientation of an oriented matroid of rank $r$ on $E$ if and only if it is a chirotope.


Vectors and Covectors 


The concepts of vectors and covectors of oriented matroids were introduced in 1978 by Bland and Las Vergnas [5].


Definition 31 A vector of an oriented matroid is any composition of circuits.

A covector is a vector of the dual oriented matroid, i.e. any composition of cocircuits.


The set $\mathcal{V}$ of vectors of an oriented matroid $\mathcal{M}$ can be viewed as a partially ordered set, with the partial order $\le$ given by

$Y \le X$ if $Y$ is a restriction of $X$.

$(\mathcal{V}, \le)$ is a pure ordered set of rank $\rho^* = \rho(\mathcal{M}^*)$. It has a unique minimal element $0$, and its atoms (the elements covering $0$) are the circuits of $\mathcal{M}$. Note that all the above definitions and properties of vectors can easily be dualized for covectors.
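Definition 31 suggests a direct way to generate all vectors: close the set of circuits under composition. A minimal Python sketch, assuming the circuits are given as sign-vector tuples and following the convention that the zero vector (the empty composition) is a vector; the function names are illustrative.

```python
def compose(X, Y):
    """X o Y: the sign of X where X is nonzero, otherwise the sign of Y."""
    return tuple(x if x != 0 else y for x, y in zip(X, Y))


def vectors_from_circuits(circuits):
    """All compositions of circuits, including the empty composition 0."""
    n = len(next(iter(circuits)))
    V = {(0,) * n} | set(circuits)
    while True:
        # one round of pairwise compositions; stop at the fixed point
        new = {compose(X, Y) for X in V for Y in V} - V
        if not new:
            return V
        V |= new
```

For the circuits $\pm(+,+,-,0)$, $\pm(+,+,0,-)$, $\pm(0,0,+,-)$ this yields 13 vectors, and the nonzero vectors with inclusion-minimal support are exactly the six circuits, in agreement with the atoms of $(\mathcal{V}, \le)$.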

The formal characterization of the set of vectors of an oriented matroid is given in the following theorem, due to Edmonds and A. Mandel (1982) [14].


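Theorem 32 (Vector axioms) A collection $\mathcal{V}$ of signed subsets of a set $E$ is the set of vectors of an oriented matroid if and only if the following properties hold:

1) $\emptyset \in \mathcal{V}$;

2) symmetry: $\mathcal{V} = -\mathcal{V}$;

3) composition: for all $X, Y \in \mathcal{V}$, $X \circ Y \in \mathcal{V}$;

4) strong vector elimination: for all $X, Y \in \mathcal{V}$, $a \in X^+ \cap Y^-$ and $b \in (\underline{X} \setminus \underline{Y}) \cup (\underline{Y} \setminus \underline{X}) \cup (X^+ \cap Y^+) \cup (X^- \cap Y^-)$, there exists $Z \in \mathcal{V}$ such that
i) $Z^+ \subseteq (X^+ \cup Y^+) \setminus \{a\}$;
ii) $Z^- \subseteq (X^- \cup Y^-) \setminus \{a\}$;
iii) $b \in \underline{Z}$.

In Theorem 32, condition 4) can be replaced by either of the following two conditions:

4') vector elimination: for all $X, Y \in \mathcal{V}$ and $a \in X^+ \cap Y^-$, there exists $Z \in \mathcal{V}$ such that
i) $Z^+ \subseteq (X^+ \cup Y^+) \setminus \{a\}$;
ii) $Z^- \subseteq (X^- \cup Y^-) \setminus \{a\}$;
iii) $(\underline{X} \setminus \underline{Y}) \cup (\underline{Y} \setminus \underline{X}) \cup (X^+ \cap Y^+) \cup (X^- \cap Y^-) \subseteq \underline{Z}$.

4'') $Y$-approximation of $X$: for all $X, Y \in \mathcal{V}$ with $\underline{Y} \subseteq \underline{X}$ and $X^+ \cap Y^- \neq \emptyset$, there is a proper restriction $Z$ of $X$ such that
i) $Z \in \mathcal{V}$;
ii) $(X^+ \cap Y^+) \cup (X^- \cap Y^-) \subseteq \underline{Z}$.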

The oriented matroid operations of deletion and contraction, as well as the duality concept, can be formulated in terms of vectors, as stated in the following two propositions.


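Proposition 33 Let $\mathcal{M}$ be an oriented matroid on $E$ with set of circuits $\mathcal{C}$ and set of vectors $\mathcal{V}$, and let $A \subseteq E$.

• The set of vectors of the deletion $\mathcal{M} \setminus A$ is

$\mathcal{V} \setminus A = \{X \in \mathcal{V} : \underline{X} \cap A = \emptyset\}$.

• The set of vectors of the contraction $\mathcal{M}/A$ is

$\mathcal{V} / A = \{X \setminus A : X \in \mathcal{V}\}$.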


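Proposition 34 Let $\mathcal{M}$ be an oriented matroid on $E$ with set of circuits $\mathcal{C}$ and set of vectors $\mathcal{V}$. Then the set of vectors of its dual matroid $\mathcal{M}^*$ is

$\mathcal{L} = \{Y \in \{+, -, 0\}^E : X \perp Y \ \forall X \in \mathcal{C}\} = \{Y \in \{+, -, 0\}^E : X \perp Y \ \forall X \in \mathcal{V}\}$.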
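Proposition 34 gives a finite, if exponential, recipe for the covectors: enumerate $\{-, 0, +\}^E$ and keep the sign vectors orthogonal to every circuit. The Python sketch below does exactly that (illustrative names; there are $3^{|E|}$ candidates, so it is meant for small $E$ only).

```python
from itertools import product


def orthogonal(X, Y):
    """X ⊥ Y (Definition 7): products on the common support are all absent
    or take both signs."""
    products = {x * y for x, y in zip(X, Y) if x * y != 0}
    return not products or products == {1, -1}


def dual_vectors(circuits, n):
    """Vectors of the dual (covectors of M): all Y in {-,0,+}^E, |E| = n,
    that are orthogonal to every circuit of M."""
    return {Y for Y in product((-1, 0, 1), repeat=n)
            if all(orthogonal(X, Y) for X in circuits)}
```

For the circuits $\pm(+,+,-,0)$, $\pm(+,+,0,-)$, $\pm(0,0,+,-)$ this gives 13 covectors, and orthogonality to $(0,0,+,-)$ forces every covector to have equal signs on the last two elements.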


General Topics 


This section shows where oriented matroids are located within both pure and applied mathematics, and how their four axiom systems arise from the following four topics:
• directed graphs;
• orthogonal pairs of real vector subspaces;
• point configurations and convex polytopes;
• real hyperplane arrangements.


Directed Graphs 


Let $D = (V, E)$ be a digraph and let $C$ be the set of its simple cycles. Each cycle $c \in C$, traversed in one of its two directions, consists of some forward (positive) edges and some backward (negative) edges. Therefore, any $c \in C$ can be viewed as a signed subset of $E$ with positive and negative parts. A signed subset of $E$ derived from a cycle $c \in C$ is called a signed circuit of $D$, and all signed circuits of $D$ form the collection

$\mathcal{C} = \{X = (X^+, X^-) : X \text{ a signed circuit of } D\}$.

The oriented matroid $\mathcal{M} = \mathcal{M}_D$ of $D$, also denoted $\mathcal{M}(D)$, is given by the pair $(E, \mathcal{C})$.

For a digraph $D$ it is also possible to define its set of signed cocircuits as follows. Let $V = (V_1, V_2)$ be a minimal cut of $D$, i.e. a partition of the nodes of $D$ such that removing the edges connecting elements of $V_1$ with elements of $V_2$ increases the number of components of the underlying undirected graph by one. Let $Y^+$ be the set of edges of $D$ from $V_1$ to $V_2$ and $Y^-$ the set of edges from $V_2$ to $V_1$. The signed sets $Y = (Y^+, Y^-)$ so obtained are called signed cocircuits of $D$, and they form the collection

$\mathcal{C}^* = \{Y = (Y^+, Y^-) : Y \text{ a signed cocircuit of } D\}$,

which is the collection of circuits of the dual oriented matroid of $\mathcal{M}_D$.

It is quite easy to show that the signed circuits of $D$ satisfy properties 1)-4) of the circuit axioms of Definition 9, and that the properties of $D$ are reflected in $\mathcal{C}$ and $\mathcal{C}^*$. For example, if $D$ does not contain any directed cycle, then $\mathcal{C}$ contains no positive circuit, i.e. the oriented matroid of $D$ is acyclic.
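The construction of $\mathcal{C}$ from a digraph can be sketched by brute force: enumerate edge subsets in which every touched node has degree 2, keep those forming a single cycle, and sign each edge by the direction of traversal. The Python below is a hypothetical illustration for small digraphs (exponential in the number of edges); edges are `(tail, head)` pairs and sign vectors are indexed like the edge list.

```python
from itertools import combinations


def signed_circuits(edges):
    """Signed circuits of a digraph: +1 for an edge traversed forward,
    -1 for an edge traversed backward, 0 for an edge off the cycle."""
    m = len(edges)
    circuits = set()
    for k in range(1, m + 1):
        for subset in combinations(range(m), k):
            degree = {}
            for i in subset:
                for node in edges[i]:
                    degree[node] = degree.get(node, 0) + 1
            if any(d != 2 for d in degree.values()):
                continue                       # not a union of simple cycles
            signs = [0] * m
            signs[subset[0]] = 1               # traverse the first edge forward
            current, used = edges[subset[0]][1], {subset[0]}
            while len(used) < k:
                nxt = next((i for i in subset if i not in used and current in edges[i]), None)
                if nxt is None:
                    break                      # several disjoint cycles: skip
                tail, head = edges[nxt]
                signs[nxt] = 1 if tail == current else -1
                current = head if tail == current else tail
                used.add(nxt)
            if len(used) == k:                 # a single simple cycle
                circuits.add(tuple(signs))
                circuits.add(tuple(-s for s in signs))
    return circuits


def is_acyclic(circuits):
    """Acyclic oriented matroid: no circuit has all entries >= 0."""
    return not any(all(s >= 0 for s in X) for X in circuits)
```

With edges $1 \to 2$, $2 \to 3$, $1 \to 3$, $1 \to 3$ there is no directed cycle and no positive circuit; adding the edge $3 \to 1$ creates the positive circuit formed by $1 \to 3$ and $3 \to 1$.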


Real Vector Spaces


In this Section the two most important ways to relate 
oriented matroids to real vector spaces are considered: 
point configurations and hyperplane arrangements. 


Vector Configurations 


Generally speaking, given an arbitrary field $F$ and a finite set of vectors that spans a vector space of dimension $r$ over $F$, the minimal linear dependences generate the circuits of a matroid of rank $r$. In order to obtain an oriented matroid, the field $F$ must be ordered. In more detail, let $E = \{v_1, \dots, v_n\} \subset \mathbb{R}^r$ be a finite set of vectors spanning $\mathbb{R}^r$ over the ordered field $\mathbb{R}$. A linear dependence is a relation

$\sum_{i=1}^{n} \lambda_i v_i = 0$

with $\lambda_i \in \mathbb{R}$ not all zero; it is minimal if the set $\{v_i : \lambda_i \neq 0\}$ is inclusion-minimal among the supports of linear dependences. The circuits of the oriented matroid $\mathcal{M} = (E, \mathcal{C})$ of the vector configuration $E$ are the signed sets $X = (X^+, X^-)$ such that

$X^+ = \{v_i : \lambda_i > 0\}$, $X^- = \{v_i : \lambda_i < 0\}$

for all the minimal dependences among the vectors $v_i$.

The bases of the matroid corresponding to a vector configuration $E$ are the subsets of $E$ that form vector space bases, i.e. all subsets $\{v_{i_1}, \dots, v_{i_r}\}$ of $E$ such that $\det(v_{i_1}, \dots, v_{i_r}) \neq 0$.

For a vector configuration over the real field $\mathbb{R}$, consider the signs of the determinants of ordered $r$-subsets of $\{v_1, \dots, v_n\}$. The basis orientation, or chirotope, of the vector configuration is defined as

$\chi(i_1, \dots, i_r) = \mathrm{sign} \det(v_{i_1}, \dots, v_{i_r})$,

where $\chi(i_1, \dots, i_r) \in \{+, -, 0\}$. $\chi$ is an alternating function and satisfies properties 1)-3) of the chirotope axioms of Definition 29.
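For a concrete configuration the chirotope is just a table of determinant signs. A minimal Python sketch for vectors in $\mathbb{R}^3$ (so $r = 3$), with illustrative function names; only ordered triples of distinct indices are stored, since triples with a repeated index have sign 0 by alternation.

```python
from itertools import permutations


def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def chirotope(vectors):
    """chi(i1, i2, i3) = sign det(v_i1, v_i2, v_i3) for all ordered triples."""
    chi = {}
    for idx in permutations(range(len(vectors)), 3):
        d = det3(*(vectors[i] for i in idx))
        chi[idx] = (d > 0) - (d < 0)
    return chi


def bases(chi):
    """Unordered bases: the 3-subsets with a nonzero determinant."""
    return {frozenset(t) for t, s in chi.items() if s != 0}
```

For $v_0 = e_1$, $v_1 = e_2$, $v_2 = e_3$, $v_3 = e_1 + e_2$, the set $\{v_0, v_1, v_3\}$ is not a basis, and the dependence $v_0 + v_1 - v_3 = 0$ gives the circuit $C = (+, +, 0, -)$; accordingly $\chi(3, 1, 2) = -C(0)C(3)\,\chi(0, 1, 2) = +1$, as required by the pivoting property 2) of Definition 27.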


Point Configurations 


In the following, starting from the observation that every vector configuration in $\mathbb{R}^r \setminus \{0\}$ corresponds to an affine point configuration in an $(r-1)$-dimensional affine space, it is shown how any point configuration in a real affine space leads to an acyclic oriented matroid. In fact, after choosing a linear form $l_0$ with $l_0(v_i) \neq 0$ for all $i$, it is possible to define an $(r-1)$-dimensional affine space as

$A^{r-1} := \{x \in \mathbb{R}^r : l_0(x) = 1\}$
and to associate with each vector $v_i$ the point $\frac{1}{l_0(v_i)}\, v_i \in A^{r-1}$. Vectors $v_i$ with $l_0(v_i) < 0$ correspond to points called 'points with negative weight'. If the given vector configuration does not contain any positive linear dependence, i.e. a relation

$\sum_i \lambda_i v_i = 0$, $\lambda_i \ge 0$, not all $\lambda_i = 0$,

then it is possible to choose $l_0$ such that $l_0(v_i) > 0$ for all $i$. This corresponds to the situation where the oriented matroid is acyclic, which can always be achieved by simply replacing some of the vectors $v_i$ by their negatives. The circuits of the corresponding acyclic oriented matroid are given by the signs of the coefficients in the minimal affine dependences ($\sum_i \lambda_i v_i = 0$, $\sum_i \lambda_i = 0$). For example, the vertices of a convex polytope describe an acyclic oriented matroid.
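For four points in the plane, the minimal affine dependence can be computed with the generalized Cramer rule applied to the homogenized points $(x, y, 1)$: the coefficients $\lambda_i = (-1)^i \det(\text{the other three})$ satisfy $\sum_i \lambda_i (x_i, y_i, 1) = 0$, i.e. $\sum_i \lambda_i p_i = 0$ and $\sum_i \lambda_i = 0$. The Python sketch below (illustrative names) assumes the four points are not all on one line and returns the sign pattern, i.e. one of the two opposite circuits; its positive and negative parts form a Radon partition.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def affine_circuit_signs(points):
    """Signs of the affine dependence sum_i t_i p_i = 0, sum_i t_i = 0,
    for four planar points that do not all lie on one line."""
    hom = [(x, y, 1) for x, y in points]          # homogenize the points
    coefficients = []
    for i in range(4):
        others = [hom[j] for j in range(4) if j != i]
        coefficients.append((-1) ** i * det3(*others))  # generalized Cramer rule
    return tuple((t > 0) - (t < 0) for t in coefficients)
```

For the corners of the unit square the circuit separates the two diagonals, while for a point inside a triangle it separates that point from the three vertices.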

The sign patterns of arbitrary (possibly nonminimal) affine dependences can be derived from the circuits by composition, where for two signed sets $X$ and $Y$

$X \circ Y = \big(X^+ \cup (Y^+ \setminus X^-),\; X^- \cup (Y^- \setminus X^+)\big)$.

The signed sets so obtained are called vectors of the oriented matroid.


Hyperplane Arrangements. 


A real hyperplane arrangement $\mathcal{A} = \{H_1, \dots, H_n\}$ is a finite set of hyperplanes through the origin in $\mathbb{R}^r$. Since every hyperplane is defined by a linear function $l_i(x) = \sum_{j=1}^{r} a_{ij} x_j$, any hyperplane can be written as $H_i = \{x \in \mathbb{R}^r : l_i(x) = 0\}$. Since the $l_i$ can be viewed as vectors in the dual space $(\mathbb{R}^r)^*$, they form a vector configuration in $(\mathbb{R}^r)^*$, which determines an oriented matroid. In fact, once the vectors $l_i$ are chosen, a positive side $H_i^+ = \{x \in \mathbb{R}^r : l_i(x) > 0\}$ of each $H_i$ is distinguished, and the oriented matroid corresponds to the arrangement of halfspaces $\{H_i^+ : 1 \le i \le n\}$ in $\mathbb{R}^r$.
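For lines through the origin in $\mathbb{R}^2$, the sign vectors $(\mathrm{sign}\, l_1(x), \dots, \mathrm{sign}\, l_n(x))$, $x \in \mathbb{R}^2$, are the covectors of the oriented matroid of the vectors $l_i$, and they can be enumerated exactly by evaluating the forms at the origin, at one point on each ray of the arrangement, and at one point inside each sector between consecutive rays. The following Python sketch does this in floating point with a small tolerance (illustrative names; a normal $(a, b)$ encodes $l(x) = a x_1 + b x_2$).

```python
import math


def sign(t, tol=1e-9):
    """Sign of t, treating |t| <= tol as zero."""
    return 0 if abs(t) <= tol else (1 if t > 0 else -1)


def covector(normals, x):
    """Sign vector of the point x with respect to the lines a*x1 + b*x2 = 0."""
    return tuple(sign(a * x[0] + b * x[1]) for a, b in normals)


def covectors_2d(normals):
    """All sign vectors of a central line arrangement in R^2."""
    angles = set()
    for a, b in normals:
        theta = math.atan2(a, -b)          # direction (-b, a) of the line
        angles.add(round(theta % (2 * math.pi), 12))
        angles.add(round((theta + math.pi) % (2 * math.pi), 12))
    angles = sorted(angles)
    samples = [(0.0, 0.0)]                 # the origin gives the zero covector
    for k, t in enumerate(angles):
        nxt = angles[(k + 1) % len(angles)] + (2 * math.pi if k + 1 == len(angles) else 0)
        for phi in (t, (t + nxt) / 2):     # a point on the ray, one inside the sector
            samples.append((math.cos(phi), math.sin(phi)))
    return {covector(normals, x) for x in samples}
```

The two coordinate axes give all $3^2 = 9$ sign vectors, while three distinct lines give $1 + 6 + 6 = 13$ covectors (the origin, six rays and six sectors).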

As shown below in Theorem 36, not only does every real arrangement of hyperplanes give rise to an oriented matroid, but the converse is also 'nearly' true.


Topological Representation Theorem 


In [15] Folkman and Lawrence showed that every oriented matroid has a pseudosphere representation. This property, expressed in the so-called topological representation theorem, is a generalization of the hyperplane arrangement model to arbitrary oriented matroids. In this section the topological representation theorem is treated only briefly, giving its meaning and some of the most important consequences it implies. For more details about this fundamental result in the theory of oriented matroids, see [2].

Let $E$ be a finite, parallel-free, spanning set of nonzero vectors in $\mathbb{R}^{r+1}$, and let $\mathcal{C} \subseteq \{+, -, 0\}^E$ be the set of signed circuits of the corresponding oriented matroid. For each $e \in E$, let $S_e = \{x \in \mathbb{R}^{r+1} : \langle x, e \rangle = 0, \|x\| = 1\}$. The positive and negative sides of $S_e$ are respectively $S_e^+ = \{x \in \mathbb{R}^{r+1} : \langle x, e \rangle \ge 0, \|x\| = 1\}$ and $S_e^- = -S_e^+$. It is easy to prove that

1) $S_e$, $S_e^+$ and $S_e^-$ are subsets of the unit $r$-sphere $S^r = \{x \in \mathbb{R}^{r+1} : \|x\| = 1\}$;

2) $S_e$ is a linear $(r-1)$-sphere;

3) $S_e^+$ and $S_e^-$ are the two closed hemispheres of $S^r$ bounded by $S_e$, which is the intersection of $S^r$ with the hyperplane orthogonal to $e$, so that the arrangement of spheres $\mathcal{A} = (S_e)_{e \in E}$ is equivalent to an arrangement of hyperplanes as discussed above.

In fact, once the arrangement of spheres $\mathcal{A}$ is fixed and the signs $+$ and $-$ are assigned to the hemispheres, such signed arrangements of $(r-1)$-spheres in $S^r$ identify an oriented matroid of rank $r + 1$. In more detail, the signed circuits $\mathcal{C}$ in this case are the sign vectors $X \in \{+, -, 0\}^E$ such that

c1) $\bigcup_{e \in \underline{X}} S_e^{X_e} = S^r$, where $S_e^{X_e}$ is $S_e^+$ or $S_e^-$ according to the sign $X_e$;
c2) $\underline{X} = \{e \in E : X_e \neq 0\}$ is minimal with property c1).

A subset $S$ of $S^r$ is called a pseudosphere if there exists a homeomorphism $h: S^r \to S^r$ such that $S = h(S^{r-1})$, where $S^{r-1} = \{x \in S^r : x_{r+1} = 0\}$.


Definition 35 An arrangement of pseudospheres $\mathcal{A} = (S_e)_{e \in E}$ is a finite set of pseudospheres $S_e$ in $S^r$ such that

• for all $A \subseteq E$, the intersection $S_A = \bigcap_{e \in A} S_e$ is homeomorphic to a sphere of some dimension;

• for every $e \in E$ and every nonempty intersection $S_A$ such that $S_A \not\subseteq S_e$, the intersection $S_A \cap S_e$ is a pseudosphere in $S_A$ with sides $S_A \cap S_e^+$ and $S_A \cap S_e^-$.


Theorem 36 (Topological representation) Let $\mathcal{A}$ be a signed arrangement of pseudospheres, and let $\mathcal{C}(\mathcal{A})$ be the family of the sign vectors $X \in \{+, -, 0\}^E$ that satisfy c1) and c2). Then

tr1) if $\mathcal{A} = (S_e)_{e \in E}$ is a signed arrangement of pseudospheres in $S^r$, then $\mathcal{C}(\mathcal{A})$ is the family of circuits of an oriented matroid on $E$ of rank $r + 1$;

tr2) if $(E, \mathcal{C})$ is an oriented matroid of rank $r + 1$, then there exists a signed arrangement of pseudospheres $\mathcal{A}$ in $S^r$ such that $\mathcal{C} = \mathcal{C}(\mathcal{A})$;

tr3) given two signed arrangements $\mathcal{A}$ and $\mathcal{A}'$, $\mathcal{C}(\mathcal{A}) = \mathcal{C}(\mathcal{A}')$ if and only if $\mathcal{A}' = h(\mathcal{A})$ for some self-homeomorphism $h$ of $S^r$.


Corollary 37 There is a one-to-one correspondence between arrangements of pseudospheres in $S^r$ and oriented matroids of rank $r + 1$.


Conclusions 


Starting from the 1950s, matroids and oriented matroids have attracted increasing interest. A huge number of scientific works have been published on these subjects, and a large collection of theorems and theoretical results on matroids exists.

This article has introduced the combinatorial theory of oriented matroids, providing their axiomatic definitions and basic properties.


See also 


> Matroids 
Triangularization is the process of reducing a square (rectangular) matrix to upper-triangular (upper-trapezoidal) form by applying a series of elementary transformations. There are basically two types of elementary transformations: orthogonal and nonorthogonal. Examples of nonorthogonal transformations include classical Gaussian transforms and Gauss-Jordan transforms.


Orthogonal Factorization 


Here the matrix $A \in \mathbb{R}^{m \times n}$ is reduced to upper-trapezoidal form using a sequence of elementary orthogonal transformations (such as Householder or Givens transformations; see » QR factorization for more details).


Rank Revealing Factorizations 


Orthogonal factorizations can also be used effectively for computing the numerical rank [2] of a matrix. The general idea is to identify the independent columns of the matrix and permute them to the left-hand side, i.e. to find a permutation matrix $\Pi$ and an orthogonal matrix $Q$ such that

$Q^T A \Pi = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}$,   (1)

where $R_{11} \in \mathbb{R}^{r \times r}$ is nonsingular and upper triangular, $R_{12} \in \mathbb{R}^{r \times (n-r)}$, and $\|R_{22}\| \le \epsilon$. The $\epsilon$-rank of $A$ is then said to be $r$. QR factorization with column pivoting computes the factorization (1): given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, the following algorithm computes $r = \mathrm{rank}(A)$ by permuting columns while computing the QR factorization (see [2] for details).


for j = 1:n
    c(j) = A(1:m, j)^T A(1:m, j);
end;
r = 0;
find k such that c(k) = max(c(1:n));
t = c(k);
while t > ε
    r = r + 1;
    exchange columns r and k of A;
    exchange c(r) and c(k);
    v = house(A(r:m, r));
    apply the Householder reflection defined by v to A(r:m, r:n);
    for i = r+1:n
        c(i) = c(i) - A(r, i)^2;
    end;
    if r < n
        find k such that c(k) = max(c(r+1:n));
        t = c(k);
    else
        t = 0;
    end;
end;
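A minimal NumPy sketch of the algorithm above, assuming a dense float array and a tolerance `eps` applied, as in the pseudocode, to squared column norms. The function name `qr_column_pivoting` is illustrative; $Q$ is kept implicit, a guard `r < min(m, n)` is added so the loop also stops safely when $m < n$, and the simple norm downdating can lose accuracy through cancellation, so a library routine would normally be preferred in practice.

```python
import numpy as np


def qr_column_pivoting(A, eps=1e-10):
    """Householder QR with column pivoting, following the pseudocode above.

    Returns (R, perm, r): R = Q^T A[:, perm] for an orthogonal Q that is not
    formed explicitly, perm is the column permutation, and r is the number
    of pivoting steps performed (the computed eps-rank).
    """
    R = np.array(A, dtype=float)
    m, n = R.shape
    perm = np.arange(n)
    c = np.sum(R * R, axis=0)                 # c(j) = squared norm of column j
    r = 0
    k = int(np.argmax(c))
    t = c[k]
    while t > eps and r < min(m, n):
        # move the column with the largest remaining norm into position r
        R[:, [r, k]] = R[:, [k, r]]
        c[[r, k]] = c[[k, r]]
        perm[[r, k]] = perm[[k, r]]
        # Householder reflection I - 2 v v^T / (v^T v) that zeros R[r+1:, r]
        x = R[r:, r]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        vv = v @ v
        if vv > 0:
            R[r:, r:] -= np.outer(v, (2.0 / vv) * (v @ R[r:, r:]))
        # downdate the squared norms of the remaining columns
        c[r + 1:] -= R[r, r + 1:] ** 2
        r += 1
        if r < n:
            k = r + int(np.argmax(c[r:]))
            t = c[k]
        else:
            t = 0.0
    return R, perm, r
```

Since $R = Q^T A \Pi$ with $Q$ orthogonal, $R^T R = (A\Pi)^T (A\Pi)$ gives a quick consistency check without forming $Q$.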


Complete Orthogonal Factorization 


Sometimes it is desirable to reduce the matrix $A \in \mathbb{R}^{m \times n}$ to the form

$Q^T A Z = \begin{pmatrix} T_{11} & 0 \\ 0 & 0 \end{pmatrix}$,

where $Q$ and $Z$ are orthogonal and $T_{11} \in \mathbb{R}^{r \times r}$ is nonsingular and upper triangular. Such a factorization is rank revealing, and the rank of $A$ is $r$. Such factorizations are very useful in rank-deficient optimization problems (such as rank-deficient least squares) [1].

The first step towards a complete orthogonal factorization is to apply QR factorization with column pivoting, which gives

$Q^T A \Pi = \begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}$.

Then a series of orthogonal transformations is applied on the right-hand side so that $R_{12}$ is zeroed. While $R_{12}$ is being zeroed, the values in $R_{11}$ change; let us call the modified $R_{11}$ $T_{11}$. The only trick here is to make sure that, while $R_{12}$ is being zeroed, the zeros already present in $R_{11}$ are not disturbed.

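Let us consider an example of a matrix $A \in \mathbb{R}^{5 \times 6}$ whose rank is three. After QR with column pivoting, let

$Q^T A \Pi = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ 0 & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ 0 & 0 & a_{33} & a_{34} & a_{35} & a_{36} \\ 0 & 0 & 0 & a_{44} & a_{45} & a_{46} \\ 0 & 0 & 0 & a_{54} & a_{55} & a_{56} \end{pmatrix}$,

where

$\left\| \begin{pmatrix} a_{44} & a_{45} & a_{46} \\ a_{54} & a_{55} & a_{56} \end{pmatrix} \right\|_2 \le \epsilon$,

$Q$ is orthogonal and $\Pi$ is some permutation matrix. The following sequence of Givens transformations,

$Q^T A \Pi\, G_{36} G_{35} G_{34} G_{26} G_{25} G_{24} G_{16} G_{15} G_{14}$,

where $G_{ij}$ is a rotation acting on columns $i$ and $j$ that zeros the entry in position $(i, j)$, will zero $R_{12}$ (the entries $a_{ij}$ with $i \le 3 < j$) without disturbing the zeros that are already present in $R_{11}$.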
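The Givens sweep of the example can be sketched for a general upper-trapezoidal factor. In this illustrative NumPy version (hypothetical name `zero_r12_with_givens`), rows are processed from the bottom of $R_{11}$ upwards, and each rotation mixes column $i$ with a column $j > r$ to annihilate the entry $(i, j)$; since the rows below $i$ are already zero in both columns, the triangular structure of $R_{11}$ survives. The negligible block $R_{22}$ is assumed to be set to zero beforehand.

```python
import numpy as np


def zero_r12_with_givens(R, r):
    """Zero the block R[:r, r:] by Givens rotations applied from the right.

    Rows r: of R are treated as negligible and set to zero first. Returns
    (T, Z) with T = R0 @ Z, where R0 is R with rows r: zeroed, Z is
    orthogonal, T[:r, :r] is upper triangular and T[:, r:] = 0.
    """
    T = np.array(R, dtype=float)
    T[r:, :] = 0.0                          # treat R22 as exactly zero
    n = T.shape[1]
    Z = np.eye(n)
    for i in range(r - 1, -1, -1):          # last row of R11 first
        for j in range(r, n):               # annihilate T[i, j] with columns i, j
            a, b = T[i, i], T[i, j]
            if b == 0.0:
                continue
            h = np.hypot(a, b)
            G = np.array([[a / h, -b / h], [b / h, a / h]])
            T[:, [i, j]] = T[:, [i, j]] @ G     # row i becomes (h, 0)
            Z[:, [i, j]] = Z[:, [i, j]] @ G     # accumulate the rotations in Z
    return T, Z
```

Applied to the $5 \times 6$ example above with $r = 3$, the loops perform the rotations $G_{36}, G_{35}, G_{34}, G_{26}, \dots, G_{14}$ (1-based indices), up to the order of the rotations within each row.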


See also 
A classical problem in the field of multiple criteria decision making (MCDM) is to build a preference relation on a set of multi-attributed alternatives on the basis of preferences expressed on each attribute and 'inter-attribute' information such as weights. Based on this preference relation (or, more generally, on various relations obtained following a robustness analysis), a recommendation is elaborated (e.g. exhibiting a subset likely to contain the 'best' alternatives).

A common way [20] to do so is to attach a number $v(x)$ to each alternative $x \in X$ and to declare that $x$ is 'at least as good as' $y$ if and only if $v(x) \ge v(y)$. The number $v(x)$ depends on the evaluations $x_1, \dots, x_n$ of $x$ on the $n$ attributes, and we have $v(x) = V(x_1, \dots, x_n)$. The most common form for $V$ is an additive value function, in which $V(x_1, \dots, x_n) = \sum_{i=1}^{n} k_i v_i(x_i)$; in that case the task of the analyst reduces to assessing the partial value functions $v_i$ and the scaling constants $k_i$. The preference relation built using this value function approach is a weak order, i.e. a complete and transitive binary relation. Using such information it is not difficult, in general, to elaborate a recommendation. However, the definition of the aggregation function $V$ may not always be simple: making all alternatives comparable in a 'nice transitive way' requires much information and, in particular, a detailed analysis of trade-offs between attributes.
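The additive model can be illustrated with a small hypothetical example: two attributes (a cost to be minimized and a quality score), linear partial value functions and weights chosen only for illustration. The Python sketch computes $V$ for each alternative and sorts by decreasing value, which yields the weak order described above (equal values would mean indifference).

```python
def additive_value(x, weights, partial_values):
    """V(x_1, ..., x_n) = sum_i k_i * v_i(x_i)."""
    return sum(k * v(xi) for k, v, xi in zip(weights, partial_values, x))


def rank_alternatives(alternatives, weights, partial_values):
    """Sort alternatives by decreasing V; equal values would be indifferent."""
    scores = {a: additive_value(x, weights, partial_values)
              for a, x in alternatives.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical data: attribute 1 is a cost in [0, 100], attribute 2 a quality in [0, 10].
weights = (0.4, 0.6)
partial_values = (lambda cost: (100 - cost) / 100, lambda quality: quality / 10)
alternatives = {"a": (20, 6), "b": (50, 9), "c": (80, 9)}
order, scores = rank_alternatives(alternatives, weights, partial_values)
```

Here $V(b) = 0.74 > V(a) = 0.68 > V(c) = 0.62$: the cheaper alternative $a$ is compensated by the better quality of $b$, which is exactly the kind of trade-off the analyst must assess.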

Outranking methods (OMs) were first developed in France in the late 1960s, following difficulties experienced with the value function approach in dealing with practical problems. They are closely associated with the name of B. Roy, who developed the well-known family of ELECTRE methods. A large part of the literature on OMs was written in French, which has hindered their international diffusion; good accounts in English are [18,32,38,39,46,53,56,57], while detailed references in French include [24,31,41,47,48].


Basic Ideas 


As in the value function approach, OMs build a pref- 
erence relation, usually called an outranking relation, 
among alternatives evaluated on several attributes. Roy 
defines an outranking relation as a binary relation S on 
the set X of alternatives such that xSy if, given what is 
known about the preferences of the decision-maker, the 
quality of the evaluations of the alternatives and the na- 
ture of the problem, there are enough arguments to de- 
cide that x is at least as good as y, while there is no es- 
sential reason to refute that statement. 

In most OMs the outranking relation is built through a series of pairwise comparisons of the alternatives (this implies that these methods deal with finite sets of alternatives; their underlying principles may, however, be adapted in order to deal with infinite sets [19]). Although pairwise comparisons can be done in many ways, the concordance-discordance principle is prevalent in most OMs (exceptions include [2,50]). It consists in declaring that an alternative $x$ is at least as good as an alternative $y$ ($xSy$) if:

• a majority of the attributes supports this assertion (concordance condition); and

• the opposition of the other attributes (the minority) is not 'too strong' (nondiscordance condition).

This principle is at variance with the ones underlying the value function approach. It rests on a 'voting' analogy and may be used without recourse to a subtle analysis of trade-offs between attributes. It mainly uses ordinal considerations and has a strong noncompensatory flavor [3,11]. The application of this principle gives rise, in general, to binary relations that are neither complete (i.e. it is possible that Not($xSy$) and Not($ySx$)) nor transitive (i.e. we may have $xSy$, $ySz$ and Not($xSz$)). Exploiting an outranking relation in order to arrive at a recommendation is therefore not an easy task and calls for the application of specific techniques [41,53].

We briefly describe below ELECTRE I [35], which is the oldest and simplest OM, before coming to some extensions and comments.


ELECTRE I


Consider a finite set of alternatives $X$ evaluated on a family $N = \{1, \dots, n\}$ of attributes. A first step in the comparison of two alternatives $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is to know how they compare on each attribute. ELECTRE I uses a traditional preference model for this purpose: a weak order (i.e. a complete and transitive binary relation) $S_i$ is supposed to be defined on each attribute $i \in N$, $x_i S_i y_i$ meaning that $x$ is judged at least as good as $y$ on attribute $i$. Dealing with a finite set, it is not restrictive (see [15]) to assume the existence of a real-valued function $g_i$ such that

$x_i S_i y_i \iff g_i(x_i) \ge g_i(y_i)$.

Quite often in practice, numbers are used to evaluate the alternatives on the various attributes, and the relations $S_i$ stem from the comparison of these numbers [4].
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In ELECTRE I, the analysis of the proposition xSy 
rests on the partition of the set N of attributes into 
a concordant coalition C(xSy) = {j € N: gj(xj) = gj(yj)} 
and a discordant coalition D(xSy) = {j € N: gj(xj) < 
gj(yj)}. The proposition xSy will be accepted if the 
concordant coalition C(xSy) is ‘sufficiently important’ 
(concordance condition) and if on any of the attributes 
in D(xSy) the ‘difference of preference’ in favor of y is 
not considered to be ‘large’ (condition of nondiscor- 
dance). In order to implement the concordance condi- 
tion, a positive weight k; is assigned to each attribute 
j € N and the importance of a coalition is supposed 
to be represented by the sum of the weights of the at- 
tributes belonging to that coalition. Thus the concor- 
dance index c(x, y) = je cxsykj/ Dije wk; represents 
the relative importance of the coalition C(xSy) in the 
set N of all attributes; we have c(x, y) € [0, 1]. Whether 
or not C(xSy) is ‘sufficiently important’ is then judged 
comparing c(x, y) to a concordance level s € [1/2, 1]. 
It is worth noting that the partition of N into C(xSy) and D(xSy) and the computation of the concordance index c(x, y) rest on purely ordinal comparisons: altering the functions $g_j$ without altering the binary relations $S_j$ will not change the values of the concordance index. Suppose that $c(x, y) \ge s$. Concluding that xSy would give no power to the attributes in D(xSy). If on any of these attributes the (positive) preference difference between y and x is 'large', there are good reasons to reject the proposition xSy. The definition of 'large' preference differences is done in ELECTRE I via the definition of nonnegative veto thresholds $v_j$ (which may vary with $g_j$) on each attribute; a preference difference is declared 'large' as soon as $g_j(y_j) - g_j(x_j) > v_j$. It should be noticed that the implementation of the nondiscordance principle through the definition of veto thresholds $v_j$ linked to a particular function $g_j$ is a matter of convenience only; what is in fact looked for is a subset of the asymmetric part of $S_j$ corresponding to 'large' preference differences, which may be identified independently of any numerical representation. In summary, we have in ELECTRE I:


$$xSy \Leftrightarrow \left[\, c(x, y) \ge s \ \text{ and } \ g_j(y_j) - g_j(x_j) \le v_j \ \ \forall j \in D(xSy) \,\right].$$
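The test summarized above can be sketched in a few lines of Python. This is a minimal illustration, not ELECTRE software: it assumes each alternative is given as a tuple of numerical evaluations $g_j(x_j)$ (larger is better), positive weights $k_j$ and constant veto thresholds $v_j$; the function names `concordance` and `outranks` are hypothetical.

```python
from typing import Sequence


def concordance(x: Sequence[float], y: Sequence[float], k: Sequence[float]) -> float:
    """Concordance index c(x, y): share of the total weight held by the
    attributes j with g_j(x_j) >= g_j(y_j), i.e. by the coalition C(xSy)."""
    return sum(kj for xj, yj, kj in zip(x, y, k) if xj >= yj) / sum(k)


def outranks(x: Sequence[float], y: Sequence[float], k: Sequence[float],
             v: Sequence[float], s: float) -> bool:
    """ELECTRE I test for xSy: c(x, y) >= s and no veto, i.e.
    g_j(y_j) - g_j(x_j) <= v_j on every discordant attribute j in D(xSy)."""
    if concordance(x, y, k) < s:
        return False
    return all(yj - xj <= vj for xj, yj, vj in zip(x, y, v) if xj < yj)


# Example: three attributes with equal weights, s = 0.6, v_j = 3.
k, v, s = (1, 1, 1), (3, 3, 3), 0.6
print(outranks((10, 8, 5), (9, 9, 4), k, v, s))   # True: c = 2/3, no veto
print(outranks((9, 9, 4), (10, 8, 5), k, v, s))   # False: c = 1/3 < s
print(outranks((10, 8, 1), (5, 5, 9), k, v, s))   # False: vetoed, 9 - 1 > 3
```

The first two calls show that S need not be symmetric; the third shows a concordant majority overturned by a single veto.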


When s = 1 (which amounts to requiring unanimity of the attributes in order to accept outranking) or $v_j = 0$ for all j (implying that all positive preference differences are 'large'), the outranking relation S is nothing but the so-called dominance relation Δ defined by


$$x \,\Delta\, y \Leftrightarrow [\, x_j S_j y_j \ \text{for all } j \in N \,].$$


It is not difficult to see that $\Delta \subseteq S$ always holds. An outranking relation may be usefully seen as an enrichment of the dominance relation Δ in which unanimity of the attributes is not required and not all positive preference differences are considered 'large'; decreasing the value of s and/or increasing the values of the $v_j$ results in a richer but somewhat riskier outranking relation. Although the dominance relation Δ is clearly reflexive and transitive (but not complete), simple examples, inspired by Condorcet's paradox [49], show that, in general, S is neither complete nor transitive when s < 1 and $v_j > 0$.

It is important to note that in ELECTRE I, the weights $k_j$ cannot be interpreted as substitution rates or trade-offs; they are thus fundamentally different from the scaling constants that are used in the value function approach. In line with the voting analogy underlying the concordance-discordance principle, it is useful to interpret $k_j$ as the 'number of votes' given to attribute j (this number of votes being independent of the choice of the function $g_j$), the concordance threshold s specifying a level of 'qualified majority'.

ELECTRE I was originally designed to lead to 'choice-type' results. Since S may not be complete or transitive, the set $\{x \in X : xSy \text{ for all } y \in X\}$ of maximal alternatives (in X given S) can be empty. In order to overcome this difficulty, ELECTRE I determines the minimal (with respect to inclusion) set of alternatives not outranking each other such that all the alternatives outside of this set are outranked by at least one alternative from this set. Technically, this leads to the determination of the kernel of the graph (X, S) after the detection and elimination by reduction of possible circuits (a well-known result in graph theory proves the existence and uniqueness of kernels in graphs without circuits).
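For a graph without circuits, the kernel can be obtained in one pass over a topological order, since an alternative that nobody outranks must belong to it. The sketch below assumes S is given as a set of ordered pairs (x, y) meaning xSy, ignores the reflexive loops xSx, and assumes circuits have already been reduced; the helper name `kernel` is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def kernel(alternatives, S):
    """Kernel of the graph (X, S), where (x, y) in S means xSy.

    Assumes that, apart from the loops xSx (S is reflexive), the graph has
    no circuits. Then the kernel exists, is unique, and is found by scanning
    X in topological order: x joins the kernel exactly when none of the
    alternatives outranking it is in the kernel."""
    preds = {x: set() for x in alternatives}
    for x, y in S:
        if x != y:                     # ignore reflexive loops
            preds[y].add(x)
    K = set()
    # static_order() yields each vertex after all of its predecessors and
    # raises graphlib.CycleError if a circuit remains.
    for x in TopologicalSorter(preds).static_order():
        if not (preds[x] & K):
            K.add(x)
    return K


X = ["a", "b", "c", "d"]
S = {(x, x) for x in X} | {("a", "b"), ("b", "c"), ("d", "c")}
print(sorted(kernel(X, S)))   # ['a', 'd']
```

Here a and d are outranked by no one and so must be selected; b is outranked by a and c by d, so both are excluded, and a and d do not outrank each other.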


Extensions 


Besides ELECTRE I, many other OMs have been 
proposed in the literature [13,14,23,33,36,40,42,45,58]. 
They mainly differ on:

• the type of result that is looked for (e.g. one may wish to use S to rank order alternatives or to sort them into pre-defined categories);

• the way the outranking relation is built. It is indeed possible to implement the concordance-discordance principle in various ways (e.g. allowing for synergy effects in D(xSy)). Moreover, varying the values of s and of the thresholds $v_j$ leads to several different relations S; such variations are incorporated in methods which use several nested outranking relations (as in [40,42]) or a fuzzy outranking relation in which a 'credibility' is attached to each arc in the graph (X, S), as in [13,14,36,58] (on the notion of fuzzy outranking relation, see [17,28]);

• the way alternatives are compared on each attribute. In ELECTRE I it is postulated that alternatives can be compared on each attribute according to a weak order. This traditional preference model may be inappropriate considering the inevitable elements of imprecision, uncertainty and inaccurate determination entering the evaluations of the alternatives. Indifference on each attribute may not be fully transitive; moreover, there may exist cases in which the transition from indifference to strict preference is not without ambiguity, giving rise to models involving 'weak preference' relations (such models involve indifference and/or preference thresholds [34,37,51,52]).

The following table summarizes the main characteristics of the existing ELECTRE methods and might help in choosing an appropriate OM. See [41] and [56] for a complete description of these methods and of many others in a similar vein, in particular the TACTIC method [54] and the family of PROMETHEE methods [13,14].


Outranking Methods, Table 1
Main characteristics of existing ELECTRE methods; adapted from [41]

ELECTRE method | Preference model on each attribute | Use of weights | No. of outranking relations used | Type of result
I [35]         | traditional                        | yes            | 1                                | Choice
IS [45]        | nontraditional                     | yes            | 1                                | Choice
II [40]        | traditional                        | yes            | 2                                | Ranking (partial)
III [36]       | nontraditional                     | yes            | 1 (fuzzy)                        | Ranking (partial)
IV [42]        | nontraditional                     | no             | up to 5                          | Ranking (partial)
TRI [41,58]    | nontraditional                     | yes            | 1                                | Assignment into predefined categories


Practical Considerations


We give here some indications on how to give a value to the parameters used in ELECTRE I: weights $k_j$, veto thresholds $v_j$ and the concordance threshold s (they may be transposed to all ELECTRE methods; for a more detailed account see [24,26,41,48] and for an alternative approach [21]). Before doing so, it is important to note that the underlying philosophy of OMs is not to describe as accurately as possible the preferences of a decision-maker. This decision-maker is often a remote abstract entity (the state, the region, the firm); when this is not the case, he/she is frequently not very accessible and his/her preferences may be only very partially structured. Searching for the 'true' values of $k_j$, $v_j$ or s makes little sense under these conditions. The concordance-discordance principle is best seen as a useful and easily understandable convention to help structure preferences. The 'assessment' of the parameters of the method should therefore aim at transforming what appear to be the stable basic judgements of the actors to be helped into numerical values. Needless to say, under these conditions the elaboration of a recommendation should be preceded by a thorough robustness analysis.

In order to give a numerical value to the weights $k_j$, it is useful to envisage imaginary but realistic alternatives combining plausible evaluations on the various attributes. Consider two such alternatives x and y such that $g_j(x_j) > g_j(y_j)$ for all $j \in J \subset N$ and $g_i(y_i) > g_i(x_i)$ for all $i \notin J$. If the differences between the evaluations of x and y have been chosen in such a way as to avoid 'large' preference differences, and if it may be agreed that x is at least as good as y while y is not at least as good as x, we can then infer that $\sum_{j \in J} k_j \ge s$ and $\sum_{j \notin J} k_j < s$, supposing without loss of generality that $\sum_{j \in N} k_j = 1$. Combining several questions of this type gives rise to a polyhedron of plausible values for $k_j$ and s to be explored during the robustness analysis, see [43,44]. It should be noted here that the precise numerical values of $k_j$ and s are irrelevant in ELECTRE I as long as they imply a similar partition of subsets of attributes into 'winning' coalitions (for which the sum of the weights reaches the concordance threshold) and 'losing' ones.
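Because only the partition into winning and losing coalitions matters, two parameter sets can be compared by enumerating coalitions. A minimal sketch, assuming integer weights (read as 'numbers of votes') and a rational threshold s so that the comparison at the threshold is exact; `winning_coalitions` is a hypothetical helper, and the enumeration of all $2^n$ subsets only suits a small number of attributes.

```python
from fractions import Fraction
from itertools import combinations


def winning_coalitions(k, s):
    """All coalitions (frozensets of attribute indices) whose share of the
    total weight reaches the concordance threshold s.

    k: integer weights ("numbers of votes"); s: a Fraction in [1/2, 1].
    Exact rational arithmetic avoids rounding problems at the threshold."""
    n, total = len(k), sum(k)
    return {
        frozenset(coalition)
        for size in range(n + 1)
        for coalition in combinations(range(n), size)
        if Fraction(sum(k[j] for j in coalition), total) >= s
    }


# Two parameter sets that induce the same winning coalitions,
# hence the same concordance tests in ELECTRE I.
print(winning_coalitions((4, 3, 3), Fraction(3, 5))
      == winning_coalitions((1, 1, 1), Fraction(1, 2)))   # True
```

In both cases every pair of attributes wins and no single attribute does, so the two settings cannot be distinguished by the concordance condition.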

Consider now two imaginary alternatives x and y such that $g_j(x_j) > g_j(y_j)$ for all $j \in N \setminus \{i\}$, and choose $g_i(y_i)$ to be one of the best evaluations on attribute i and $g_i(x_i)$ to be one of the worst. We have $D(xSy) = \{i\}$. If it can be accepted that xSy, then clearly no veto power should be conferred on attribute i, which amounts to setting $v_i$ to an arbitrarily large number. If not, attribute i has a veto power; in order to give a value to $v_i$ one can then increase $g_i(x_i)$ and/or decrease $g_i(y_i)$ until xSy is accepted. A value slightly larger than the difference $g_i(y_i) - g_i(x_i)$ leading to the acceptance of xSy gives a plausible value for $v_i$ (note that before choosing a constant value of $v_i$ it should be checked that the maximum difference $g_i(y_i) - g_i(x_i)$ on attribute i compatible with xSy does not vary along the scale of $g_i$; when it does, variable thresholds can easily be used).


Theoretical Appraisal 


OMs have often been criticized for their lack of axiomatic foundations; ELECTRE I was proposed on a more or less ad hoc basis and subsequent methods aimed at extending it. The situation has changed dramatically in recent years, giving rise to a variety of studies investigating the foundations of these methods. In particular, it is worth mentioning that:

• the links between the concordance-discordance principle, which leads to possibly intransitive and incomplete outranking relations, and classical aggregation problems in social choice theory (exemplified by Arrow's impossibility theorem; see [49]) have been studied in depth [1,5,27];

• outranking methods may be axiomatized in more or less the same way as the various instances of the value function approach (see [3,10,11,30,54]), the axioms emphasizing the 'ordinal' and 'noncompensatory' features of the methods;

• the structural properties of outranking relations have been studied in depth [7], this problem having strong links with the classical problems of the construction of voting paradoxes [25] and the binary choice probabilities problem [16];

• various ways of exploiting outranking relations have been carefully analyzed and/or axiomatized; see [6,8,9,12,22,29,55].

This literature on the foundations of OMs, while still in its early stages, has already greatly contributed to a better understanding of these methods and their underlying hypotheses.


Practical Applications 


OMs have been applied in real-world studies since their 
creation. It is impossible to give here a complete list of 
applications and references. We only mention a few sig- 
nificant applications in various fields (detailed biblio- 
graphical indications may be found in [41]). 


Environment 


Forestry management (Canada), Nuclear waste man- 
agement (Belgium), Pollution prevention and control 
(France), Solid waste management (Finland, Greece), 
Water resource management (France, Hungary, USA); 


Finance 


Allocation of grants (Belgium), Analysis of the interna- 
tional diversification of portfolios (Canada), Equitable 
burden sharing in international institutions (Belgium), 
Investment planning (France), Portfolio management 
(Canada); 


Health 


Computer-aided diagnosis (France), Epidemiology 
(France), Identification of bacteria (Belgium), Manage- 
ment of hospitals (Canada); 


Location 


Airports (Canada, the Netherlands), High voltage elec- 
tric lines (France, Canada), Schools (France), Thermal 
power plants (Algeria); 
Transportation


Choice of a highway route (France), Planning the reno- 
vation of metro stations (France), Selection of suburban 
metro extensions projects (France); 


Miscellaneous 


Analysis of tenders (France, Portugal), Choice between 
forecasting models (Belgium), Choice of a market- 
ing strategy (France), Inventory management (France), 
Production planning in a job-shop (Canada), Promo- 
tion of navy officers (Portugal), Regional planning (The 
Netherlands). 
Consider a set of experimental observations represented by a vector $b \in \mathbb{R}^n$. The goal is to estimate a set of parameters $x \in \mathbb{R}^m$ with the help of a matrix of independent input conditions represented by $A \in \mathbb{R}^{n \times m}$. In other words, one wishes to express b in terms of A. However, one may have a larger number of experimental observations than parameters to be estimated, i.e., it may be the case that $n \gg m$. The problem described above is a typical estimation problem which gives rise to an overdetermined system of linear equations:


$$Ax = b. \quad (1)$$


In general one cannot expect to obtain a vector x 
which satisfies (1) even if A has m linearly independent 
columns. This feature of the problem leads to the search 
for a vector x which makes Ax as close as possible to b. 
The closeness is measured in some suitable norm which 
is usually either the 2-norm, or the 1-norm, or the ∞-norm. The most common is the 2-norm, which yields the well-known linear least squares problem:


$$\min_x \|Ax - b\|_2 = \sqrt{(Ax - b)^T (Ax - b)}. \quad (2)$$


The linear least squares approach is usually preferred 
because it leads to a simpler problem. More precisely, 
it admits a closed-form solution which can be obtained 
by solving the linear system of equations: 


$$A^T A x = A^T b. \quad (3)$$


Since $A^T A$ is a symmetric positive (semi)definite matrix (it is positive definite when A has m linearly independent columns, in which case the solution is unique), it can be decomposed in the form $LDL^T$ (or Cholesky factorization), where L is unit lower triangular and D is diagonal. The factored form can then be used to solve (3), which always has a solution. However, this method is only reliable when A is a well-conditioned matrix. A more numerically stable way to solve (3) is to use an orthogonal factorization (e.g., QR) combined with a pivoting strategy. A detailed treatment of the linear least squares problem can be found in [8].
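The two routes can be compared with NumPy. This sketch assumes A has full column rank; it uses the Cholesky form $A^T A = LL^T$ (an equivalent variant of $LDL^T$) and NumPy's `qr`, which does not perform column pivoting. The function names are hypothetical.

```python
import numpy as np


def lstsq_normal_equations(A, b):
    """Solve (3), A^T A x = A^T b, through a Cholesky factorization
    A^T A = L L^T (assumes A has full column rank)."""
    L = np.linalg.cholesky(A.T @ A)      # L is lower triangular
    y = np.linalg.solve(L, A.T @ b)      # solve L y = A^T b
    return np.linalg.solve(L.T, y)       # solve L^T x = y


def lstsq_qr(A, b):
    """Solve the same problem from a thin QR factorization A = QR,
    without forming A^T A: R x = Q^T b."""
    Q, R = np.linalg.qr(A)               # 'reduced' mode: Q is n x m
    return np.linalg.solve(R, Q.T @ b)


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)
print(lstsq_normal_equations(A, b))
print(lstsq_qr(A, b))
```

For this well-conditioned A both agree with `np.linalg.lstsq`. The difference appears for ill-conditioned A: the 2-norm condition number of $A^T A$ is the square of that of A, so forming the normal equations can lose roughly twice as many digits as the QR route.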

In some instances, the set of observations includes gross inaccuracies or wild points. In such cases, it may be preferable to use the 1-norm, which leads to the following estimation problem:


$$\min_x \|Ax - b\|_1 = \sum_{i=1}^{n} |(Ax - b)_i|, \quad (4)$$

where $(Ax - b)_i$ is used to represent the ith component of Ax − b. The function in (4) is not differentiable at those points where $(Ax - b)_i = 0$ for some $i \in \{1, \dots, n\}$. The problem is commonly referred to as the $\ell_1$ estimation problem. The parameter values obtained from the minimization problem (4) will not be as adversely affected by the presence of wild points as the estimates obtained using (3). On the other hand, in contrast to the linear least squares problem, (4) is a combinatorial optimization problem, because it can be shown that a minimizing point x has the property that some of the components of the residual vector Ax − b are equal to zero, some are positive and some are negative (this property is what makes this approach immune to wild points). Hence, if one had access to the information as to which components are zero, positive, and negative, respectively, one could find a minimizing point x by solving the following linear program:


$$\min_x \sum_{i \in \mathcal{A}^c} s_i^* (Ax - b)_i \quad \text{s.t.} \quad (Ax - b)_i = 0, \ \forall i \in \mathcal{A},$$


where $\mathcal{A}$ is the set of indices corresponding to zero components of Ax − b, $\mathcal{A}^c$ is its complement with respect to $\{1, \dots, n\}$, and $s^*$ is the sign function, which assumes the value +1 for positive residuals and −1 for negative residuals.

Unfortunately, one has a priori no idea about $s^*$ and $\mathcal{A}$. An alternative way to pose (4) leads to the following problem:


$$\min_{c \ge 0,\ d \ge 0,\ u \ge 0,\ v \ge 0} \ \sum_{i=1}^{n} (u_i + v_i) \quad \text{s.t.} \quad u - v + A(c - d) = b,$$


with $|(Ax - b)_i| = u_i + v_i$ (at an optimal solution at most one of $u_i$, $v_i$ is nonzero) and $x_j = c_j - d_j$. The equivalence of (4) and the above linear program is discussed in [17]. The most successful attempts at solving (4) were based on the above reformulation and its dual problem.
Notably, I. Barrodale and F.D.K. Roberts [4] specialized 
the simplex algorithm of linear programming to the 
above formulation by taking advantage of the complementarity between the $u_i$ and $v_i$ variables in the pivoting process. R.D. Armstrong, E.L. Frome and D.S. Kung
[1] developed a revised simplex algorithm for the linear 
programming formulation of the problem. A different 
algorithm which aims at minimizing the nondifferen- 
tiable 1-norm function (4) was given in [6]. 
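The LP reformulation above can be handed to a general LP solver. A minimal sketch, assuming SciPy is available; it uses `scipy.optimize.linprog` with the HiGHS backend, not the Barrodale–Roberts or Armstrong–Frome–Kung simplex variants, and `l1_fit` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog


def l1_fit(A, b):
    """Minimize ||Ax - b||_1 via the LP
         min sum(u + v)  s.t.  u - v + A(c - d) = b,  c, d, u, v >= 0,
    and return (x, optimal value) with x = c - d."""
    n, m = A.shape
    cost = np.concatenate([np.zeros(2 * m), np.ones(2 * n)])
    # Variable order: [c, d, u, v]; each row reads A c - A d + u - v = b.
    A_eq = np.hstack([A, -A, np.eye(n), -np.eye(n)])
    res = linprog(cost, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m] - res.x[m:2 * m], res.fun


# Fit a constant to data containing one wild point.
A = np.ones((3, 1))
b = np.array([1.0, 2.0, 100.0])
x, value = l1_fit(A, b)
print(x, value)                                # approximately [2.] and 99.0
print(np.linalg.lstsq(A, b, rcond=None)[0])    # least squares: about [34.33]
```

The $\ell_1$ estimate is the median 2, which zeroes one residual and ignores the wild point, whereas the least squares estimate (the mean) is dragged toward 100.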

A more recent idea for solving the $\ell_1$ estimation problem was given in [12]. This idea is quite different from those mentioned above in that it replaces the original function with a once continuously differentiable function, and leads to the following problem:


$$\min_x \sum_{i=1}^{n} \rho\big((Ax - b)_i\big), \quad (5)$$

where

$$\rho(t) = \begin{cases} \dfrac{t^2}{2\gamma} & \text{if } |t| \le \gamma, \\[4pt] |t| - \dfrac{\gamma}{2} & \text{if } |t| > \gamma, \end{cases} \quad (6)$$


with t being a dummy variable and γ a positive scalar. This function is known as Huber's M-estimator function [9] in the statistics literature, as it was introduced by P.J. Huber as a robust estimator in the face of inaccuracies in the observations. K. Madsen and H.B. Nielsen observed that they can obtain a solution of (4) by repeatedly solving (5) for decreasing values of γ tending to zero. They were also able to avoid the potential ill-conditioning effects of driving γ to zero.
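The continuation idea can be illustrated as follows. This sketch is an assumption-laden stand-in, not the Madsen–Nielsen algorithm: each problem (5) is solved by iteratively reweighted least squares (IRLS), with a fixed, arbitrarily chosen γ schedule and iteration count, and without their safeguards against ill-conditioning. The function names are hypothetical.

```python
import numpy as np


def huber(t, gamma):
    """Huber function (6): t^2/(2 gamma) if |t| <= gamma, |t| - gamma/2 otherwise."""
    a = np.abs(t)
    return np.where(a <= gamma, a**2 / (2 * gamma), a - gamma / 2)


def huber_fit(A, b, gamma, x, iters=100):
    """Approximately minimize (5) for a fixed gamma by iteratively reweighted
    least squares: rho'(t)/t equals 1/gamma for |t| <= gamma and 1/|t| otherwise."""
    for _ in range(iters):
        r = A @ x - b
        w = 1.0 / np.maximum(np.abs(r), gamma)
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x


def l1_by_continuation(A, b, gammas=(1.0, 1e-1, 1e-2, 1e-3, 1e-4)):
    """Track the minimizer of (5) as gamma decreases toward zero, warm-starting
    each solve from the previous one, to approximate a solution of (4)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]      # least squares starting point
    for gamma in gammas:
        x = huber_fit(A, b, gamma, x)
    return x


A = np.ones((3, 1))
b = np.array([1.0, 2.0, 100.0])
print(l1_by_continuation(A, b))   # close to [2.], the l1 estimate
```

Starting from the least squares estimate (about 34.33), the iterates move to the median 2; as γ shrinks, the weight $1/\gamma$ on the zero residual grows, which is the ill-conditioning effect mentioned above.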

As far as obtaining a set of parameters 'immune' to grossly inaccurate observations is concerned, one has the option of using the 1-norm problem (4) or the Huber problem (5). It is interesting that Huber's problem was used as a subproblem to solve (4). The relationship between problems (4) and (5) was further explored in [13] and [11].

Another popular choice for the solution of overdetermined systems of linear equations is to compute a solution that minimizes the ∞-norm of the residual vector. This approach yields the problem


$$\min_x \max_{1 \le i \le n} |(Ax - b)_i|. \quad (7)$$


The problem is commonly known as the Chebyshev 
problem. Here, one faces again a problem of a combi- 
natorial nature as it can be proved that a solution to 
(7) has certain residual values equal to the maximum 
in absolute value, and others smaller than this value in 
modulus, respectively. This partition of the residuals at 
a minimizing point is obviously unknown. Hence, one 
must resort to some algorithm to compute a solution to 
(7) much the same way as in the case of (4). Here, again 
there exist approaches based on minimizing the nondif- 
ferentiable function in (7) (nondifferentiable at points 
where at least two residuals attain the maximum value 
in modulus). The most notable of such methods are those of Bartels–Golub [7] and Bartels–Conn–Charalambous [5].
There exist also methods based on the linear program- 
ming formulation which is given as follows in [17]: 


$$\min_{x,\ z} \ z \quad \text{s.t.} \quad -z \le (Ax - b)_i \le z, \ \forall i = 1, \dots, n.$$
Some of the approaches based on linear program- 
ming favored the above primal formulation for use in 
a penalty function algorithm [10,15]. Some others used 
the dual formulation: 


$$\max_{v \ge 0,\ w \ge 0} \ (v - w)^T b \quad \text{s.t.} \quad A^T (v - w) = 0, \quad e^T (v + w) = 1,$$


where e represents a vector of all ones. Among these ap- 
proaches, the most successful is the simplex adaptation 
of [2]. 
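The primal LP above can likewise be passed to a generic solver. A minimal sketch, assuming SciPy is available; it uses `linprog` with HiGHS rather than the simplex adaptation of [2], and `chebyshev_fit` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog


def chebyshev_fit(A, b):
    """Minimize max_i |(Ax - b)_i| via the primal LP
         min z  s.t.  -z <= (Ax - b)_i <= z,  i = 1..n,
    with x free and z >= 0. Returns (x, z)."""
    n, m = A.shape
    cost = np.zeros(m + 1)
    cost[-1] = 1.0                                   # objective: z
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),         #  Ax - z <= b
                      np.hstack([-A, -ones])])       # -Ax - z <= -b
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * m + [(0, None)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m], res.fun


A = np.ones((3, 1))
b = np.array([1.0, 2.0, 100.0])
x, z = chebyshev_fit(A, b)
print(x, z)          # approximately [50.5] and 49.5
```

At the solution the residuals $x - b_i$ are 49.5, 48.5 and −49.5: two of them attain the maximum in modulus, as the combinatorial characterization of (7) predicts, and the estimate is pulled halfway toward the wild point.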

A survey of the use of the 2-norm, 1-norm and ∞-norm criteria in linear regression in statistics is given in [14], but it covers only developments up to 1981.
Some of the algorithms mentioned above are available as software packages. In particular, the 1-norm algorithms of Barrodale–Roberts and of Bartels–Conn–Sinclair are available in the NAG (Numerical Algorithms Group) software library. The 1-norm and Huber algorithms of Madsen–Nielsen are available from the authors. The Chebyshev algorithm of Barrodale–Phillips is available in the NAG library, and also in the ACM collection [3]. The Chebyshev algorithm of Pinar–Elhedhli is available from the authors. A copy of the Bartels–Golub algorithm for the Chebyshev problem can be obtained from [16].
The packet annealing method [1,7,9] was motivated and developed specifically for thermodynamic global minimization problems such as those encountered in protein structure prediction [3], but it may be applicable to other global minimization problems as well. It uses the intrinsic variable-scale coarse-grained hierarchical structure [11] of the potential energy (objective function) landscape to guide a deterministic search to the global minimum. The method is similar to simulated annealing [4] in that it assumes that each point in the search space, parameterized by a multidimensional vector R, corresponds to a conformation of a physical system whose motion is governed by the potential energy V(R). According to statistical mechanics, at temperature T, the conformational probability distribution is the Gibbs–Boltzmann distribution [5],


$$p_B(T; R) = \frac{\exp[-\beta V(R)]}{\int \exp[-\beta V(R)]\, dR},$$


where $\beta = 1/k_B T$ is the inverse temperature ($k_B$ is Boltzmann's constant, which relates the energy and temperature scales). As $T \to 0$, all probability is concentrated in the vicinity of the global minimum, $R_g$, of V. Simulated annealing attempts to find $R_g$ by following $p_B(T; R)$ as the system is cooled, using the Metropolis [6] or other (e.g., molecular dynamics) search procedures to search the entire space simultaneously. In contrast, during cooling, packet annealing recursively subdivides conformation space into a sequence of compact macrostate regions which are separated from each other by potential energy barriers that are large compared to the current T. By this means, the space is subdivided into a growing number of smaller and smaller macrostates which are searched in parallel. The hierarchical relationships between the macrostates are represented in tree-like macrostate trajectory diagrams which describe the thermodynamic properties of the macrostates as functions of temperature. This allows computational effort to be focused on the most promising subregions of conformation space. A key feature is that both the characteristic energetic and spatial scales of each macrostate are computed during the search process, so that each macrostate can be searched using appropriately coarse-grained energetic and spatial variables. The nature of the linkage between these scales, the scaling properties of the system, determines the difficulty of finding the minimum.
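The concentration of $p_B$ near the global minimum as T decreases can be checked numerically on a toy problem. This sketch only discretizes the Gibbs–Boltzmann formula on a grid; it is not the packet annealing method itself. The one-dimensional tilted double-well potential, the grid and the units ($k_B = 1$) are illustrative assumptions.

```python
import numpy as np


def boltzmann(V, T, kB=1.0):
    """Gibbs-Boltzmann weights exp(-V/(kB*T)) on a uniform grid, normalized
    to sum to 1 (a discrete stand-in for p_B(T; R); the grid spacing cancels)."""
    beta = 1.0 / (kB * T)
    w = np.exp(-beta * (V - V.min()))   # shifting V by a constant leaves p_B unchanged
    return w / w.sum()


# Hypothetical one-dimensional tilted double well, in units where kB = 1;
# its global minimum lies near r = -1 and a higher local minimum near r = +1.
r = np.linspace(-2.0, 2.0, 4001)
V = (r**2 - 1.0)**2 + 0.3 * r
for T in (2.0, 0.5, 0.05):
    p = boltzmann(V, T)
    print(f"T = {T}: probability in the left well = {p[r < 0].sum():.4f}")
```

At high T the mass is spread over both wells; at T = 0.05 the energy gap of about 0.6 between the two minima leaves essentially all of it in the left well.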

For illustration, consider the problem of finding the global minimum of the two-dimensional potential shown in Fig. 1a. At a high temperature, where $k_B T$ exceeds the internal energy barriers (right panel, Fig. 1b), the probability distribution is spread fairly smoothly over a compact region and can be coarsely approximated by a single Gaussian characteristic packet α (right panel, Fig. 1c), which is characterized by its centroid (vector $R^0$) and root-mean-square size (tensor $\Delta^0$) (Fig. 1d). As T is reduced to the point where the central energy barrier becomes larger than $k_B T$, the distribution bifurcates into two lobes, β and γ, which can be approximated by two child packets (center panels, Figs. 1b–1d). As the temperature is further decreased, one of these packets bifurcates into two children, δ and ε (left panels). By this temperature it is evident that the peak within one of these packets corresponds to the global minimum of V. While this could be found by a random search (as in simulated annealing), it is clear that it would be more efficient to use the hierarchically coarse-grained structure manifested by the characteristic packet analysis to direct the search.

Packet Annealing, Figure 1
Packet annealing analysis

(a) A model two-dimensional potential, $V(r_1, r_2)$.

(b) The corresponding Gibbs–Boltzmann probability distribution $p_B$ at three temperatures $T_{hi} > T_{med} > T_{lo}$.

(c) Superposition of the Gaussian packets that are solutions of the characteristic packet equations at the three temperatures (a large number of characteristic packets, corresponding to the very small-scale fluctuations of V, will appear at lower temperatures).

(d) The characteristic packets are characterized by the positions of their centers of mass ($R^0$) and by their root-mean-square fluctuation tensors ($\Delta^0$), represented here by ellipses.

(e) Free-energy vs. temperature trajectory diagram for this temperature range. Solid lines represent metastable macrostate trajectories and dotted lines represent transitions. The discontinuities in the trajectories correspond to branch points at which packets bifurcate (from [1]).

The characteristic packets are computed by solving a coupled set of self-consistent equations identifying Gaussian distributions which are metastable in the stochastic physical system having potential V(R) [10]. This is equivalent to finding the Gaussian packets that locally minimize the Hellinger distance to $p_B$ [2]. The characteristic packets only coarsely approximate $p_B$; their main role is to determine the boundaries of the macrostate regions and thus dissect conformation space. Thermodynamic properties such as entropy and free energy can then be computed for each macrostate region and compactly represented in trajectory diagrams (Fig. 1e). All trajectories are tracked in this simple example, so finding the global minimum is guaranteed. This is not the case in more complicated problems, where the number of branches exceeds computational capacity and it is necessary to prune the trajectory diagram and pursue only a limited subset of branches. These are selected by a branch selection algorithm which uses the macrostate thermodynamic properties to predict those which are most likely to contain the global minimum. While the global minimum in Fig. 1 could be found by following only the trajectory having the lowest free energy at each bifurcation, in general multiple trajectories will have to be followed. The efficiency of the method will largely be determined by the ability of the branch selection algorithm to minimize the number of needed search trajectories. This is problem-dependent and requires the existence of underlying regularities within the class of potentials being studied. It has been suggested that such regularities do exist for protein potential functions [1]; this is an active area of research.

A key feature is that reduced accuracy is sufficient until the final (low) temperature is reached: it is only necessary that each trajectory remain within the catchment region of its macrostate. Within each macrostate region, V(R) can be replaced with an approximation which has been spatially smoothed on a scale commensurate with the size of the macrostate. The use of smoothing is somewhat analogous to that occurring in the diffusion equation [8] and Gaussian density annealing [12] methods [11].
Unrestricted Parallelism


For simplicity, complexity theory usually focuses on de- 
cision problems. A complexity class consists of all deci- 
sion problems solvable with given resource bounds. 

The depth of a Boolean circuit is the most obvious 
measure of parallel time, because all gates can work in 
parallel. Problems solvable by uniform families of cir- 
cuits of depth O(log n) have long been recognized as 
an important complexity class. Here n always denotes 
the length of an input, and it is assumed that the family 
contains one circuit for each n. 
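As a concrete instance of logarithmic depth, the OR (or the sum) of n inputs can be computed by a balanced binary tree of two-input gates with depth $\lceil \log_2 n \rceil$. The sketch below simulates the tree level by level on a sequential machine and reports the depth; `tree_reduce` is a hypothetical helper.

```python
import operator


def tree_reduce(values, op=operator.or_):
    """Combine the inputs with a balanced binary tree of 2-input gates.

    Each pass of the while loop is one circuit level: the gates within a
    level are independent and could all be evaluated simultaneously.
    Returns (result, depth); depth equals ceil(log2(n)) for n inputs."""
    level = list(values)
    if not level:
        raise ValueError("need at least one input")
    depth = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired input is passed to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


print(tree_reduce([False] * 7 + [True]))            # (True, 3)
print(tree_reduce(range(1, 1001), operator.add))    # (500500, 10)
```

The tree uses n − 1 gates, so its size is polynomial (in fact linear) while its depth is O(log n).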

Complexity theory for parallel computations really 
started with the observation of the strong correspon- 
dence between parallel time and Turing machine space 
complexity discovered by A. Borodin [1]. In terms of 
circuit complexity, this means that not only is the depth 
of a Boolean circuit equal to the parallel time, but for 
every narrow circuit there is an equivalent shallow cir- 
cuit. The close relationship between space in sequential 
machines and time in parallel machines is known as the 
parallel computation thesis [8]. 

This relationship between space and time complex- 
ity classes has been strengthened by the introduction of 
alternating Turing machines [2] as a computing model 
with unbounded parallelism. These machines have uni- 
versal states in addition to the existential states present 
in nondeterministic machines. For universal configura- 
tions, all successor configurations must lead to accep- 
tance, while for existential configurations at least one 
successor must lead to acceptance. 

For time functions T(n) and space functions S(n), alternating Turing machines define the complexity classes ATIME(T(n)) and ASPACE(S(n)). With ALOGSPACE = ASPACE(log n) and $\text{APTIME} = \bigcup_k \text{ATIME}(n^k)$, the correspondence between sequential and parallel complexity classes is ALOGSPACE = P (polynomial time) and APTIME = PSPACE (polynomial space). Similar relations hold at higher levels of the time and space hierarchy.


Restricted Amount of Hardware, Circuits 


More practical complexity classes (for parallel computing) simultaneously restrict the parallel time and the amount of hardware used. S.A. Cook [3] has proposed to consider complexity classes obtained by simultaneously bounding the time and space of Turing machines. The most important of these classes was later named SC (Steve's class). $SC^k$ consists of all decision problems that can be solved simultaneously in polynomial time and $O(\log^k n)$ space. Finally, SC is defined as $SC = \bigcup_k SC^k$.

Similarly, N. Pippenger has studied the complexity classes (later called $NC^k$) consisting of all problems that can be solved by uniform families of Boolean circuits such that the size of the nth circuit is polynomial in n and its depth is $O(\log^k n)$ (see [9]). Again, NC is defined as $NC = \bigcup_k NC^k$. The classes NC and $NC^k$ are much more widely used than the classes SC and $SC^k$, because NC directly measures the parallel time. Informally, it consists of all decision problems that can be solved very fast (in polylogarithmic time) on a parallel machine with only a moderate (polynomial) amount of hardware. It is an open problem (as of 2000) whether NC = SC [1].

If Boolean circuits are allowed to have 'AND' and 'OR' gates with an arbitrary number of inputs, then the restriction to polynomial size and depth $O(\log^k n)$ defines the class $AC^k$. It is easy to see that $NC^k \subseteq AC^k \subseteq NC^{k+1}$.

By allowing randomized computations, NC extends 
to RNC. Formally, these circuits may contain gates 
(without inputs) producing independent random out- 
puts 0 and 1 (representing false or true) with equal 
probability. Such a randomized circuit solves a prob- 
lem, if it rejects every negative instance, while accept- 
ing every positive instance with probability at least 1/2. 
Deciding whether a graph has a perfect matching is in 
RNC, but is not known to be in NC. 
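The randomized test behind the perfect matching result is the classical random-substitution idea: a graph has a perfect matching if and only if its Tutte matrix (skew-symmetric, with an indeterminate for each edge) has a determinant that is not identically zero, so one substitutes random values and evaluates the determinant. The sketch below is sequential: its Gaussian elimination is not an NC algorithm, and RNC membership relies on the fact that determinants can be computed in NC. The prime modulus, the number of trials and the function names are illustrative assumptions.

```python
import random

P = 2**31 - 1   # a large prime modulus (an arbitrary choice for this sketch)


def det_mod_p(M, p=P):
    """Determinant of a square integer matrix modulo the prime p,
    by Gaussian elimination over the field Z_p."""
    M = [[entry % p for entry in row] for row in M]
    n = len(M)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det = det * M[col][col] % p
        inv = pow(M[col][col], p - 2, p)      # inverse by Fermat's little theorem
        for r in range(col + 1, n):
            factor = M[r][col] * inv % p
            for c in range(col, n):
                M[r][c] = (M[r][c] - factor * M[col][c]) % p
    return det


def has_perfect_matching(n, edges, trials=5, seed=0):
    """Randomized test on a simple graph with vertices 0..n-1.

    A True answer is always correct. If a perfect matching exists, a single
    trial misses it with probability at most about n/P (Schwartz-Zippel)."""
    rng = random.Random(seed)
    for _ in range(trials):
        T = [[0] * n for _ in range(n)]       # random Tutte matrix
        for i, j in edges:
            x = rng.randrange(1, P)
            T[i][j], T[j][i] = x, -x          # skew-symmetric entries
        if det_mod_p(T):
            return True
    return False


print(has_perfect_matching(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))   # True (4-cycle)
print(has_perfect_matching(4, [(0, 1), (0, 2), (0, 3)]))           # False (star)
```

The error is one-sided in the sense used above: a graph without a perfect matching is always rejected, while a graph with one is accepted with probability far above 1/2.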

For good reasons, practical parallel computing has 
initially focused on utilizing the obvious parallelism 
present in many scientific computations due to the 
presence of matrices, vectors or simple loops. At the 
same time, theoretical research has tried to classify 


problems according to their efficient solvability by par- 
allel algorithms, the central question being whether 
a problem is in NC. 


Restricted Amount of Hardware, PRAMs 


To show membership in NC, and for designing any parallel algorithm while abstracting from all communication issues, the PRAM model of parallel computing has been defined. A PRAM [6,13] consists of many cooperating processors, each being a random access machine (RAM, [4]). Each processor can do local computations consisting of additions, subtractions, shifts, conditional and unconditional jumps, as well as indirect addressing. Arbitrarily long shifts and multiplications of large numbers are not allowed in one step, as these operations would make a single processor impractically powerful.

The processors in a PRAM are synchronized, and communication between processors is accomplished through a global memory. The intention is not to suggest that synchronized operation and global memory are easy to realize, but that they simplify the programming of a parallel machine. Simulation of global memory (by local memory and communication) can be handled by a compiler or even implemented directly in hardware [10]. Several flavors of PRAM have been defined.

• In an EREW (exclusive-read, exclusive-write) PRAM, different processors are not allowed to access the same memory location simultaneously.

• In a CREW (concurrent-read, exclusive-write) PRAM, only writing to memory is so restricted.

• In a CRCW (concurrent-read, concurrent-write) PRAM, simultaneous reading and writing is allowed.

There are three kinds of CRCW PRAMs with different ways to handle concurrent writing.

• In the COMMON model, all processors writing to the same location simultaneously are required to write the same data.

• In the ARBITRARY model, an arbitrary processor succeeds (i.e., it writes last).

• In the PRIORITY model, the lowest-numbered processor succeeds in writing.
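The three write-conflict rules can be made concrete with a small simulation of one synchronous CRCW write step. This is a minimal sketch, not taken from the article: the representation of a write request as a `(processor id, address, value)` tuple and the function name `resolve_concurrent_writes` are hypothetical, and for ARBITRARY the sketch simply lets the last request in list order win (any single winner would be allowed).

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a write request: (processor id, address, value).
Write = Tuple[int, int, int]


def resolve_concurrent_writes(writes: List[Write], model: str,
                              memory: Dict[int, int]) -> Dict[int, int]:
    """Apply one synchronous CRCW write step under the given conflict rule."""
    by_address: Dict[int, List[Tuple[int, int]]] = {}
    for pid, address, value in writes:
        by_address.setdefault(address, []).append((pid, value))

    for address, requests in by_address.items():
        if model == "COMMON":
            # All writers to one cell must agree; otherwise the program is illegal.
            values = {value for _, value in requests}
            if len(values) > 1:
                raise ValueError(f"COMMON violation at address {address}")
            memory[address] = values.pop()
        elif model == "ARBITRARY":
            # Any single writer may succeed; here: the last request in list order.
            memory[address] = requests[-1][1]
        elif model == "PRIORITY":
            # The lowest-numbered processor succeeds (processor ids are unique).
            memory[address] = min(requests)[1]
        else:
            raise ValueError(f"unknown model {model!r}")
    return memory
```

For example, with requests `[(3, 0, 7), (1, 0, 9), (2, 1, 5)]` the PRIORITY rule stores 9 (written by processor 1) at address 0 and 5 at address 1.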

Interesting complexity classes are obtained by restricting computations to be uniform. Technically, it is assumed that a logspace Turing machine produces the programs for all processors. Then the complexity classes $\mathrm{EREW}^k$, $\mathrm{CREW}^k$ and $\mathrm{CRCW}^k$ are defined as the classes of problems solved by a uniform PRAM of the given type in time $O(\log^k n)$ with polynomially many processors. (For $\mathrm{CRCW}^k$ the most powerful type, PRIORITY, is assumed.) It is known that

$$\mathrm{NC}^k \subseteq \mathrm{EREW}^k \subseteq \mathrm{CREW}^k \subseteq \mathrm{CRCW}^k = \mathrm{AC}^k \subseteq \mathrm{NC}^{k+1}$$

(see [9]).

Besides minimizing the parallel time, a key issue in 
parallel computing is to minimize the work, which is the 
product of the parallel time with the number of pro- 
cessors. An algorithm is said to be optimal if the ratio 
between the work and the optimal sequential time is 
bounded by a constant, and it is said to be efficient if 
that ratio is bounded by a polylogarithmic factor. 
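The notions of work, optimality and efficiency can be illustrated with the textbook example of summing n numbers on an EREW PRAM. The following is a minimal sketch under stated assumptions (it is not an algorithm from this article): each of p processors first adds up its own block sequentially, after which the p partial sums are combined by a balanced binary tree, one synchronous round per level. The function names `pram_sum` and `work_ratio` are hypothetical.

```python
import math
from typing import List, Tuple


def pram_sum(values: List[int], p: int) -> Tuple[int, int]:
    """Simulate summing n values on an EREW PRAM with p processors.

    Returns (total, parallel_steps), counting one step per synchronous round
    of additions.
    """
    n = len(values)
    block = math.ceil(n / p)
    # Phase 1: processor i adds up its own block sequentially
    # (block - 1 additions; processors touch disjoint cells).
    partial = [sum(values[i * block:(i + 1) * block]) for i in range(p)]
    steps = max(block - 1, 0)
    # Phase 2: balanced binary-tree reduction; in each round, disjoint
    # processors combine pairs of partial sums (exclusive read and write).
    while len(partial) > 1:
        partial = [partial[i] + (partial[i + 1] if i + 1 < len(partial) else 0)
                   for i in range(0, len(partial), 2)]
        steps += 1
    return partial[0], steps


def work_ratio(n: int, p: int) -> float:
    """Work (p * parallel time) divided by the n - 1 sequential additions."""
    _, t = pram_sum(list(range(n)), p)
    return p * t / (n - 1)
```

With n = 1024, choosing p = 128 (roughly n / log n) gives 14 parallel steps and a work ratio of about 1.75, a constant, so this variant is optimal; with p = 1024 the time drops to 10 steps but the ratio grows to about 10, i.e. proportionally to log n, so that variant is only efficient.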

While the PRAM model abstracts from communi- 
cation issues, another branch of research has focused 
exactly on communication for various interesting ar- 
rangements of processors, like arrays, trees and hyper- 
cubes [11]. 

Theoretically, it has been shown that PRAM algo- 
rithms can be implemented by some fixed networks of 
communicating processors with very little loss of speed 
[7,14]. Yet, it has been felt that the constant factors in the speed loss could be too large to justify restricting the algorithm designer to simply programming a PRAM and letting the compiler handle all communication.


BSP and LogP 


Several parallel machine models have been proposed with two goals in mind.

• Programs (like PRAM programs) are portable to various types of physical parallel machines.

• The programmer (unlike a PRAM programmer) has some control over the communication between processors in order to obtain high efficiency.

These two somewhat conflicting goals can be
achieved by letting the programmer choose the source 
and target, but not the path and detailed timing of each 
message. The two most influential such models are the 
BSP model (bulk synchronous parallel model) of L.G. 
Valiant [15,16] and the LogP model [5]. 

The BSP model is called a bridging model, because 
it is intended to bridge the gap between hardware and 
software for parallel computing, as the von Neumann 
model did for sequential computing. Thanks to compil- 
ers, the programmer does not have to know too many 
details of the actual machine. 


The BSP model performs a sequence of supersteps, each consisting of three phases: local computation, global communication and barrier synchronization.
The latter is a global check that all components have 
finished a superstep. The BSP model is characterized by 
two parameters, L and g. The time unit is the duration 
of a local operation. The periodicity parameter L mea- 
sures the length of a superstep, while g measures the 
length of a global operation. 

The LogP model has basically the same goals as the BSP model, but intends to give the programmer slightly more flexibility to address important performance issues without having to deal with unnecessary details. The LogP model is characterized by four parameters: the latency L, the overhead per message o, the time gap g between messages (for each processor), and the number P of processor/memory modules. This allows each parallel machine to be characterized by only a few parameters. The algorithm designer can prescribe different methods for different ranges of the parameters, so that such an algorithm can then be compiled efficiently on various parallel machines.

The BSP and LogP models, as well as the QSM 
(queueing shared-memory model), are actually quite 
closely related as indicated by various simulation results 
(e.g., [12]). 
The rapid growth and wide availability of high-speed networking have brought high performance computing systems (HPCS) within reach of many people wishing to process very large data sets and to solve difficult problems as fast as possible. Such systems evolve at an incredible pace, and new machines with new architectures, programming models and paradigms, and computation granularity are proposed every year. For instance, clusters of personal computers (PCs) interconnected by high performance local networks with raw throughput close to 1 Gb/s and latency smaller than 10 μs yielded, in the late 1990s, parallel systems whose computing power was close to, or even better than, that of the supercomputers of the mid-1980s, for a hundredth to a tenth of their nominal price. Their local networks were realized either with off-the-shelf hardware (e.g., Myrinet and Fast Ethernet) or with application-driven devices, in which case additional functionalities were built in, mainly at the memory access level. Such clusters supported the Linux operating system, among others, and offered system-level virtualization for user-friendly programming environments such as multi-threading, communication libraries, automatic load balancing, I/O systems, etc.

When using an HPCS to solve computationally intensive problems, the first aspect to be understood is the level of concurrency existing in the problem, i.e., which tasks can be executed simultaneously and which cannot. For instance, there are cases where a problem is not suited at all to the parallel setting and only a very small benefit can be obtained from parallelism. Therefore, the choice of a model for parallel computing is of primary importance.


Models 


Parallel computing models can be roughly divided into those which implicitly hide or assume the values of their parameters and those which set these values explicitly, as follows. The machine size can be defined by a parameter p or be (implicitly) tied to the problem input size n. The topology of the communication network is either explicit (ring, grid, hypercube, etc.), with the processors communicating only with their neighbors, or hidden, i.e., the processors can communicate with any other processor through a high-speed interconnection medium. The communication costs may depend on the size of the messages sent and on the distance they travel, or be fixed per message or even per set of messages sent. Finally, although parallel computers are asynchronous, and some models take this into consideration, most of the existing models assume a synchronized mode with some kind of synchronization barrier, even a light one based on rendezvous communication. In the following, we briefly describe the most important models for parallel computing and the way they deal with the parameters above. For more details, we refer to [11,12,14,22,23,24,29].


Atomic Models 


In these models, a parallel machine consists of a large 
set of atomic processors communicating by the ex- 
change of atomic messages at a constant cost per mes- 
sage. The number of processors is usually taken as 
a function of n, the input size of the problem to be 
solved. 


PRAM 


The shared memory model known as parallel random 
access machine (PRAM) [19,22,23] is the best known 
model for parallel algorithm design, because of its high 
abstraction level. In this model, a parallel machine con- 
sists of a large set of atomic processors. They commu- 
nicate through a shared memory, where any position 
can be accessed in constant time. In order to design an 
algorithm we just have to describe a sequence of syn- 
chronous parallel operations executed by the proces- 
sors on the shared memory, without worrying about 
scheduling the communications between the processors. This model is well suited to determining the level of parallelism inherent in a problem, since all parameters are implicit or hidden [10,21]. Unfortunately,
however, only small shared-memory parallel comput- 
ers have been built so far, because of technological con- 
straints regarding the concurrent access to the mem- 
ory in constant time, when the number of processors is 
large. 


Atomic GRAM 


One way to solve the problem raised by the fully con- 
nected shared memory was to consider distributed 
memory machines and make explicit the topology 
through which the atomic processors communicate. 
Among the most used GRAMs (where G stands for the 
graph defining the topology of the communication net- 
work), we find grids and hypercubes [11,24], whose def- 
initions follow. 


The 2-dimensional grid 


The 2-dimensional grid (called grid in the remainder) of size N is composed of N processors $PE_{i,j}$, $1 \le i,j \le \sqrt{N}$, such that processor $PE_{i,j}$ is linked to processors $PE_{i-1,j}$, $PE_{i+1,j}$, $PE_{i,j-1}$ and $PE_{i,j+1}$, for $2 \le i,j \le \sqrt{N} - 1$. The grid has degree 4, diameter $2(\sqrt{N} - 1) = O(\sqrt{N})$ (attained, for instance, between processors $PE_{1,1}$ and $PE_{\sqrt{N},\sqrt{N}}$), and bisection width $\sqrt{N}$ (which can be obtained by deleting all links between processors $PE_{i,\sqrt{N}/2}$ and $PE_{i,\sqrt{N}/2+1}$, for $1 \le i \le \sqrt{N}$).

Grids can be generalized in two ways. The first one is to consider d-dimensional grids, which are defined on N processors $PE_{i_1,\dots,i_d}$, $1 \le i_k \le N^{1/d}$, with links between $PE_{i_1,\dots,i_k,\dots,i_d}$ and $PE_{i_1,\dots,i_k+1,\dots,i_d}$ for $1 \le k \le d$. The second classic generalization is to add links in the 2-dimensional grid between processors $PE_{1,j}$ and $PE_{\sqrt{N},j}$ and between $PE_{i,1}$ and $PE_{i,\sqrt{N}}$, for $1 \le i,j \le \sqrt{N}$. The obtained structure is called a 2-dimensional torus. Like 2-dimensional grids, 2-dimensional tori can be generalized to d-dimensional tori.

Processors of a grid are usually numbered in row-major order by using a unique index. $PE_{i,j}$ is then denoted by $PE_{(i-1)\sqrt{N} + j - 1}$; that is, processors are linearly numbered from left to right and from top to bottom, with indices in the range $0, \dots, N - 1$.
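The grid and torus definitions, together with the row-major numbering, can be checked mechanically. The following minimal sketch (function names hypothetical, not from the article) computes the neighbors of a processor from its linear index and measures the diameter by breadth-first search from every node. It uses 0-based row and column numbers, so $PE_{i,j}$ of the text has index $(i-1)\sqrt{N} + (j-1)$.

```python
import math
from collections import deque
from typing import Callable, List


def grid_neighbors(k: int, n: int, torus: bool = False) -> List[int]:
    """Neighbors of processor k in a sqrt(n) x sqrt(n) grid (or torus).

    Uses the row-major index k = row * sqrt(n) + col with 0-based row and col.
    """
    side = math.isqrt(n)
    row, col = divmod(k, side)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if torus:
            r, c = r % side, c % side          # wrap-around links of the torus
        elif not (0 <= r < side and 0 <= c < side):
            continue                           # border processors have fewer links
        result.append(r * side + c)
    return result


def diameter(n: int, neighbors: Callable[[int], List[int]]) -> int:
    """Largest shortest-path distance, via breadth-first search from every node."""
    worst = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst
```

For N = 16 this reports a grid diameter of $2(\sqrt{N} - 1) = 6$ and a torus diameter of 4, since the wrap-around links halve the worst-case distance in each dimension.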


The Hypercube 


An interconnection network with the topology of a d-dimensional hypercube, denoted H(d), is composed of $N = 2^d$ processors, labeled from 0 to $N - 1$, and $dN/2$ communication links. Let $(i)_2$ be the binary string representing i, and let $i_k$ denote the k-th digit, from right to left, in $(i)_2$. Then the neighbors of $PE_i$ are all $PE_j$ such that $(i)_2$ and $(j)_2$ differ in exactly one bit position, say k, $0 \le k < d$, implying that its degree is d. In this case, we say that $PE_i$ and $PE_j$ are neighbors along dimension k. It is not difficult to see that the maximum distance in a hypercube is attained by those pairs of processors whose binary strings differ in all d positions, implying that its diameter is $d = \log_2 N$. Finally, the bisection width of a hypercube is N/2.

Figure 2: A hypercube H(5) with 32 nodes and diameter 5; its decomposition into four copies of H(3) is shown in bold.
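Because neighbors differ in exactly one address bit, the properties above can be verified with a few bit operations. In this minimal sketch (function names hypothetical), neighbors are obtained by flipping one bit with XOR, the shortest-path distance is the number of differing bits (the Hamming distance), and the bisection width is the number of links crossing between the halves with bit k = 0 and bit k = 1.

```python
from itertools import combinations
from typing import List, Tuple


def neighbors(i: int, d: int) -> List[int]:
    """Neighbors of PE_i in H(d): flip exactly one of the d address bits."""
    return [i ^ (1 << k) for k in range(d)]


def distance(i: int, j: int) -> int:
    """Shortest-path length in a hypercube = number of differing bits."""
    return bin(i ^ j).count("1")


def edges(d: int) -> List[Tuple[int, int]]:
    """All links of H(d), each listed once as (i, j) with i < j."""
    return [(i, j) for i in range(2 ** d) for j in neighbors(i, d) if i < j]


def cut_size(d: int, k: int) -> int:
    """Number of links between the halves with bit k = 0 and bit k = 1."""
    return sum(1 for i, j in edges(d) if ((i >> k) & 1) != ((j >> k) & 1))
```

For H(5), `max(distance(i, j) for i, j in combinations(range(32), 2))` evaluates to 5 = d, and `cut_size(5, k)` is 16 = N/2 for every dimension k.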

Figure 2 shows that processors are the vertices of a hypercube of dimension d, each connected to d neighbors. Notice further that, for instance, $PE_0$ and $PE_4$ are neighbors along dimension 2 in any d-dimensional hypercube with $d > 2$.

A hypercube H(d) can be decomposed in d different ways into two copies of $H(d-1)$, with N/2 edges connecting them. In order to find one such decomposition, it suffices to fix any one bit position in the processors’ addresses, say position k, $0 \le k < d$. Then the two copies of $H(d-1)$ are composed of the vertices $i_{d-1} \cdots i_{k+1}\, 0\, i_{k-1} \cdots i_0$ and $i_{d-1} \cdots i_{k+1}\, 1\, i_{k-1} \cdots i_0$, respectively. It is interesting to notice that one can use these decompositions in order to implement divide and conquer algorithms in hypercubes.
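As an illustration of divide and conquer along these decompositions, the sketch below (a standard recursive-doubling reduction, not an algorithm given in the article; names hypothetical) first shows the split along bit k and then sums values held by all processors. Round k exchanges values between neighbors along dimension k, so each round merges the results of two copies of H(k) into one H(k+1); after d rounds every processor holds the global sum.

```python
from typing import List, Tuple


def split(d: int, k: int) -> Tuple[List[int], List[int]]:
    """Split H(d) along bit k into two copies of H(d - 1)."""
    zero = [i for i in range(2 ** d) if not (i >> k) & 1]
    one = [i for i in range(2 ** d) if (i >> k) & 1]
    return zero, one


def hypercube_allreduce(values: List[int]) -> List[int]:
    """Sum-reduce over H(d); values[i] is held by PE_i, len(values) = 2**d."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of processors must be a power of two")
    d = n.bit_length() - 1
    acc = list(values)
    for k in range(d):
        # Every PE_i adds the value of its neighbor along dimension k; afterwards
        # each PE holds the sum over its subcube spanned by dimensions 0..k.
        acc = [acc[i] + acc[i ^ (1 << k)] for i in range(n)]
    return acc
```

For example, `split(3, 2)` yields `([0, 1, 2, 3], [4, 5, 6, 7])`, and reducing the values 1 to 8 on H(3) leaves 36 on every processor after three rounds.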

Let us further remark that the hypercube is vertex- and edge-symmetric, making it easy to use. Very informally, we could say that, as far as the neighbors are concerned, the hypercube looks the same from every node. Note that this is also true for the torus, but false for the grid.


Bulk Models 


Parallel algorithm designers who dealt with real problems soon discovered that they needed not only to determine a problem’s inherent parallelism, but also to write algorithms that could be efficiently implemented. Atomic models were of little help here, since parallel computers cannot expand endlessly to reflect the size of the problem at hand, and so distributed memory bulk models appeared.

In contrast to the atomic models, the number of processors, p, becomes a parameter, and each processor is assumed to be able to hold n/p data items. A parallel algorithm is then a sequence of supersteps, each composed of local computations followed by a communication round in which messages are exchanged among the processors. Notice that such an algorithm, where the number of supersteps is small and independent of n, will be efficient in any HPCS, provided that the communication procedures are implemented efficiently, which is usually the case.


Bulk GRAM 


One difference between bulk and atomic GRAMs is the number of processors, which here is set to an independent value p. In each communication phase, processors can then send a message of varying length to their neighbors. The cost of these communications is also modeled, as a value proportional to the time needed to initialize the communication channel plus the time the message takes to arrive at its destination [5,11,20].


BSP 


The BSP model (for bulk synchronous parallel) [28]
uses slackness in the number of processors and memory 
mapping via hash functions to hide communication la- 
tency and provide for the efficient execution of atomic 
PRAM algorithms on existing hardware. 

An input of size n is distributed evenly across a p-processor parallel computer. In a single superstep each processor may send h and receive h’ messages (called an (h, h’)-relation) and then perform an internal computation on its internal memory cells using the messages it has just received. To avoid any conflicts that might be caused by asynchronies in the network (whose topology is left undefined), the messages sent out in a round t by some processor cannot depend upon any messages that the processor receives in round t (but, of course, they may depend upon messages received in round $t - 1$).

Communication costs depend on two parameters, namely the latency L and the throughput g. The cost of an (h, h’)-relation performed by a processor is then $L + \max(h, h')\,g$, and the total cost of a communication round is

$$\max_{i = 1, \dots, p} \bigl( L + \max(h_i, h_i')\, g \bigr),$$

where $h_i$ (respectively, $h_i'$) represents the amount of data sent (respectively, received) by processor i. Precise models of parallel computers can be obtained by assigning realistic values to L and g. More detailed BSP models and algorithms can be found in [2,3,27,29].


CGM 


The coarse grained multicomputer model, or CGM(n, p) 
for short, was introduced in [7]. The CGM(n, p) is a BSP 
model consisting of a set of p processors with O(n/p) 
local memory each, in which each superstep has h = h’ 
= O(n/p). 

The originality of CGM stems from its cost model. 
The algorithm designer will no longer try to minimize 
the overall amount of data exchanged, as in the BSP, 
but rather design algorithms with a small number of supersteps, independent of the input size n. As a consequence, these algorithms will be very efficient in practice. Ideally, algorithms should run for a constant number of supersteps, as is the case for sorting [27] and many problems in image processing [17,18], computational geometry [13,15], and optimization [8,16]. On
the other hand, graph problems seem to be somewhat 
more complex, some of them requiring O(log p) com- 
munication rounds to be solved [4]. 
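A constant number of supersteps is easy to achieve for simple problems such as prefix sums. The sketch below is a standard two-superstep scheme chosen here only for illustration (it is not taken from the cited CGM papers, and the function name is hypothetical): each processor sums its block, broadcasts that total, and then adds the totals of lower-ranked processors as an offset to its local prefix sums. Each processor sends and receives $p - 1$ messages, which fits the CGM requirement $h = O(n/p)$ when $n \ge p^2$.

```python
from itertools import accumulate
from typing import List


def cgm_prefix_sums(blocks: List[List[int]]) -> List[List[int]]:
    """Prefix sums on a simulated CGM(n, p); blocks[i] is processor i's data."""
    p = len(blocks)
    # Superstep 1 (computation): each processor sums its own n/p values.
    totals = [sum(block) for block in blocks]
    # Superstep 1 (communication): every processor sends its total to all
    # others, so h = h' = p - 1 messages per processor.
    inbox = [[(src, totals[src]) for src in range(p) if src != dst]
             for dst in range(p)]
    # Superstep 2 (computation only): offset = totals of lower-ranked
    # processors, then local prefix sums shifted by that offset.
    result = []
    for rank, block in enumerate(blocks):
        offset = sum(total for src, total in inbox[rank] if src < rank)
        result.append([offset + s for s in accumulate(block)])
    return result
```

With blocks `[[1, 2], [3, 4], [5, 6]]` the result is `[[1, 3], [6, 10], [15, 21]]`, using a single communication round regardless of n.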


LogP 


This model uses the BSP model as a starting point and 
focuses on the technological trend from fine-grained 
parallel machines towards coarse-grained systems, ad- 
vocating portable parallel algorithm design [6]. It char- 
acterizes a distributed memory parallel computer by 
four parameters (whence its name), as follows. 

• L: the latency of a small message being communicated from user space to user space.

• o: the overhead incurred in a communication. It is defined as the time interval during which a processor cannot do anything else because it is communicating.

• g: the gap between two consecutive message transmissions (or receptions) by one processor. It is defined as the inverse of the bandwidth of the communication processing element.

• P: the number of processors.

Memory resources are considered finite. Hence, at most $\lceil L/g \rceil$ messages can be in transit from or to any processor at the same time. The cost to communicate an elementary packet between two processors is $L + 2o$. If a reception acknowledgment is required, this cost becomes $2L + 4o$, since the acknowledgment is itself a message. Designing algorithms in LogP may become an elaborate task, because of the several parameters involved and the asynchronous character of the model [1,9,26]. It is also very interesting to notice that work on LogP algorithms is very similar to work on bulk GRAM algorithms done some 10 years earlier [5,25].
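The LogP cost expressions can be evaluated with a few one-line functions. This is a minimal sketch with hypothetical function names; it assumes that an acknowledgment is itself a small message, and, for k back-to-back messages from one processor to another, that consecutive sends start $\max(g, o)$ apart, so the last message arrives $L + 2o$ after its send began.

```python
import math


def logp_send(L: float, o: float) -> float:
    """One small message: send overhead + latency + receive overhead."""
    return L + 2 * o


def logp_acknowledged_send(L: float, o: float) -> float:
    """Message plus acknowledgment, the latter treated as a second small message."""
    return 2 * logp_send(L, o)


def logp_k_messages(k: int, L: float, o: float, g: float) -> float:
    """Time until the last of k back-to-back messages is received.

    Consecutive sends start max(g, o) apart (the gap, or the send overhead if
    it is larger); the last one then needs o + L + o.
    """
    return (k - 1) * max(g, o) + L + 2 * o


def logp_capacity(L: float, g: float) -> int:
    """Finite capacity: at most ceil(L / g) messages in transit per processor."""
    return math.ceil(L / g)
```

With L = 6, o = 2 and g = 4, a single message costs 10 time units, an acknowledged one 20, three back-to-back messages 18, and at most 2 messages can be in transit from one processor.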
Optimization -
Parallel Search Algorithms - Heuristics 


Many applications in the field of artificial intelligence 
and operations research rely on heuristic search (cf. 
Heuristic search) as their primary solution method. Be- 
cause these applications often spawn very large deci- 
sion trees, the design of efficient parallel algorithms is 
of prime importance. Depending on the level of paral- 
lelism (fine-grained or coarse-grained) and the degree 
of parallelism (moderately or massively parallel), the 
techniques used in parallel heuristic search can be categorized into three classes:

• task partitioning

• parallel window search

• tree partitioning


Task Partitioning 


The method of task partitioning (or operator parti- 
tioning) provides a fine-grained parallelism at the low- 
est level, the operator level. It speeds up the process- 
ing of individual nodes by performing repetitive steps 
like successor generation, node evaluation, and book- 
keeping in parallel. 

Task partitioning is especially popular in the field of 
computer chess, where special purpose hardware was 
built to assist in the move generation and board eval- 
uation. The world computer chess champions Deep 
Thought [3] and Deep Blue, for example, employ 
special-purpose coprocessors for generating all moves 
of a position and for evaluating all 64 squares of a board
in parallel. In the early years, hardware implementa- 
tions were only feasible for simple operators that could 
be easily implemented in silicon. With the advent of 
programmable hardware accelerator chips like FPGAs 
(field programmable gate arrays), more complex parts 
of the tree search could be implemented in hardware, 
which dramatically improved the performance of com- 
puter chess programs, most notably that of the world 
champion Shredder. 
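In software, the same operator-level idea can be sketched by generating the successors of one node and evaluating them concurrently. The following is only a structural illustration under stated assumptions: the function `expand_node` and the toy successor and evaluation functions are hypothetical, and Python threads stand in for the special-purpose hardware described above (in CPython, CPU-bound evaluations in threads will not actually run faster because of the global interpreter lock).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")


def expand_node(state: S,
                successors: Callable[[S], Iterable[S]],
                evaluate: Callable[[S], float],
                pool: ThreadPoolExecutor) -> List[Tuple[float, S]]:
    """Generate the successors of one node and score them in parallel."""
    children = list(successors(state))
    scores = pool.map(evaluate, children)   # one evaluation task per child
    # Best (lowest) score first; sorted() is stable, so ties keep generation order.
    return sorted(zip(scores, children), key=lambda pair: pair[0])
```

For example, expanding state 10 with successors `s - 1, s + 1, 2 * s` and score `abs(20 - s)` returns `[(0, 20), (9, 11), (11, 9)]`.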
Parallel Window Search


In parallel window search, all processors examine the entire tree, but each with a different search bound. This method was originally developed to speed up game playing programs. As first suggested by Baudet [1], the total range of values in a game tree is subdivided into p nonoverlapping alpha-beta windows (where p is the number of processors), so that approximately one third is covered. The advantage is that the processor having the true minimax value in its window will find it
faster by virtue of starting with a narrow search window 
instead of using the full-width window. Even the un- 
successful processors are productive: They determine 
whether the minimax value lies below or above their 
assigned search interval. If the true minimax value does 
not lie in one of the initial p windows, the processors are 
re-scheduled to cover some of the remaining intervals. 
The iterative re-scheduling process is continued (like in 
binary search) until the final solution is found by one of 
the processors. 
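The behavior of the individual processors can be simulated with a fail-hard alpha-beta search run once per window. In this minimal sketch (names hypothetical; the loop runs the "processors" one after another instead of concurrently), a tree is a nested list whose integer leaves are values for the maximizing player. Window boundaries are placed at half-integers so that an integer minimax value always falls strictly inside exactly one window; every other processor reports whether the value lies below ("low") or above ("high") its window, which is the information used for re-scheduling.

```python
import math
from typing import List, Tuple, Union

Tree = Union[int, list]  # a leaf value or a list of subtrees


def alphabeta(node: Tree, alpha: float, beta: float, maximizing: bool) -> float:
    """Fail-hard alpha-beta: inner nodes return a value clamped to [alpha, beta]."""
    if isinstance(node, int):
        return node
    if maximizing:
        for child in node:
            v = alphabeta(child, alpha, beta, False)
            if v >= beta:
                return beta              # cutoff: true value >= beta
            alpha = max(alpha, v)
        return alpha
    for child in node:
        v = alphabeta(child, alpha, beta, True)
        if v <= alpha:
            return alpha                 # cutoff: true value <= alpha
        beta = min(beta, v)
    return beta


def parallel_window_search(tree: Tree, bounds: List[float]) -> List[Tuple[str, float]]:
    """One window (bounds[k], bounds[k + 1]) per simulated processor."""
    reports = []
    for lo, hi in zip(bounds, bounds[1:]):
        v = alphabeta(tree, lo, hi, True)
        if v <= lo:
            reports.append(("low", v))    # minimax value lies below this window
        elif v >= hi:
            reports.append(("high", v))   # minimax value lies above this window
        else:
            reports.append(("exact", v))  # this processor found the minimax value
    return reports
```

For the tree `[[3, 5], [2, 9], [0, 1]]` (minimax value 3) and bounds `[-math.inf, 0.5, 2.5, 4.5, math.inf]`, the four processors report high, high, exact (3) and low.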

In some favorable cases, parallel window search may even cause superlinear speedup $s(p) > p$ when the minimax value is found early in the search. Moreover, the communication and synchronization overheads are quite low, allowing efficient execution on loosely coupled parallel systems. On the negative side, however, is the limited scalability of parallel window search. Even with an infinite number of processors, a maximal speedup of only five or six can be attained in chess trees, because in the best case all $w^{\lceil d/2 \rceil} + w^{\lfloor d/2 \rfloor} - 1$ leaf nodes of the minimal tree (with width w and depth d) must be examined by the ‘successful’ processor returning the minimax value. For this reason, parallel window search is suitable only for small systems with up to three processors.
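The minimal-tree bound explains why narrowing the window cannot help beyond a certain point: even a perfectly informed search must visit the minimal tree. The short sketch below (function names hypothetical) compares the minimal-tree leaf count $w^{\lceil d/2 \rceil} + w^{\lfloor d/2 \rfloor} - 1$ with the $w^d$ leaves of the full tree. The width w = 38 used in the example is only an illustrative assumption for chess-like branching factors.

```python
def minimal_tree_leaves(w: int, d: int) -> int:
    """Leaves of the minimal tree: w**ceil(d/2) + w**floor(d/2) - 1."""
    return w ** ((d + 1) // 2) + w ** (d // 2) - 1


def full_tree_leaves(w: int, d: int) -> int:
    """Leaves of the complete tree of width w and depth d."""
    return w ** d
```

For w = 38 and d = 4, the minimal tree has 2887 leaves compared with 2,085,136 in the full tree, and no window, however narrow, lets a single processor skip those 2887 leaves.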

Parallel window search can also be applied to single-agent searches like iterative-deepening A* (IDA*) [7]. Here, different processors are used to search the entire tree up to different thresholds (windows), hoping that one of them will find a solution. If not, a global administration scheme determines the next larger threshold, and the node expansion starts over again. Note that this scheme works only in applications where the increments in the threshold are known a priori. In the 15-puzzle, for example, the first processor’s threshold would be set to the heuristic estimate h(n) of the initial position n, the next processor gets the threshold h(n) + 2, and so on.
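For the 15-puzzle the increments are known a priori because each move changes the path cost g by 1 and the Manhattan distance h by exactly ±1, so $f = g + h$ changes by 0 or 2. The minimal sketch below assigns IDA* thresholds to p processors on that basis. The state encoding (a 16-tuple in row-major order, 0 for the blank, tile t belonging at index t) and the function names are hypothetical conventions chosen for illustration.

```python
from typing import List, Sequence


def manhattan(state: Sequence[int], width: int = 4) -> int:
    """Sum of Manhattan distances of tiles 1..15 from their goal squares.

    Assumed (hypothetical) goal layout: tile t belongs at index t, squares
    numbered in row-major order, blank (0) at index 0.
    """
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue                      # the blank does not count
        row, col = divmod(index, width)
        goal_row, goal_col = divmod(tile, width)
        total += abs(row - goal_row) + abs(col - goal_col)
    return total


def window_thresholds(state: Sequence[int], p: int) -> List[int]:
    """IDA* thresholds h(n), h(n) + 2, ... for p processors."""
    h0 = manhattan(state)
    return [h0 + 2 * k for k in range(p)]
```

For the position obtained from the goal by sliding tile 1 into the blank corner, h(n) = 1 and three processors receive the thresholds 1, 3 and 5.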

As in adversary game tree search, the maximum scalability is limited to the maximum number of iterations. Because the iterations with consecutive thresholds are not explored in sequential order, the first solution found may not be optimal. Optimality can, however, be guaranteed by completing all iterations shallower than that of the best solution found so far. The better the node ordering, the sooner a solution is found. In the extreme case, superlinear speedup can be achieved when the solution is found early in some ‘left’ part of the tree.

Because of its nonoptimality and its limited scalability, parallel window search is mainly used for quickly determining a (possibly suboptimal) solution, which is then improved by other means [7].


Tree Partitioning 


In tree partitioning, the total search space is split into 
smaller parts (subtrees) for simultaneous exploration 
by different processors. Once the subtrees have been 
distributed among the processors, little communication is necessary for broadcasting improved bound
values and for termination detection. 

Compared to the other two techniques, tree partitioning is the only parallelization strategy that (in principle) allows an unlimited number of processors to be employed. This is especially true for the use of tree partitioning in parallel depth-first search (DFS). Here, applications have been successfully tested on massively parallel systems with more than a thousand processors.

Because DFS trees tend to be highly irregular in 
practice, i.e. they exhibit varying branching degrees 
and search depths, static tree partitioning methods are 
insufficient to keep all processors busy during the whole 
computation. Two dynamic work partitioning methods 
have been proposed: stack splitting and search-frontier 
splitting. 


Stack Splitting 


In the stack splitting scheme [5], the local work is partitioned by splitting the donor’s search stack into two parts, one of which is given to the requester. Care must be taken to select a suitable amount of work for shipment. On the one hand, the transferred work must be large enough to justify the communication costs, and on the other hand, not too much work should be sent, to avoid thrashing effects.

[Figure: Stack splitting]

Several strategies for selecting nodes from a donor’s stack have been proposed:

• Removing nodes from the tree levels near the root gives coarse-grained work packets, which are likely to be split a second time during the search.

• Removing nodes near a cut-off level deep in the tree gives fine-grained work packets, sometimes making it necessary to send more requests to obtain enough work.

• Removing a ‘vertical slice’ of nodes, one from each level, may be useful in trees with large branching factors and irregular search depths.
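The three selection strategies can be expressed on a simple model of a DFS stack. In this minimal sketch, the stack layout and all names are hypothetical: `stack[d]` holds the untried sibling nodes at depth d (d = 0 nearest the root), the donor continues its own search by popping from the end of each list, and work is therefore given away from the front, i.e. the nodes the donor would reach last.

```python
from copy import deepcopy
from typing import List, Tuple

# Hypothetical stack layout: stack[d] lists the untried siblings at depth d.
Stack = List[List[str]]


def split_stack(stack: Stack, strategy: str) -> Tuple[Stack, Stack]:
    """Return (donor_stack, work_packet) for one of three splitting strategies."""
    donor = deepcopy(stack)
    packet: Stack = [[] for _ in donor]
    nonempty = [d for d, level in enumerate(donor) if level]
    if not nonempty:
        return donor, packet              # nothing to share
    if strategy in ("root", "cutoff"):
        # "root": shallowest level -> coarse-grained packet;
        # "cutoff": deepest level -> fine-grained packet.
        d = nonempty[0] if strategy == "root" else nonempty[-1]
        k = max(1, len(donor[d]) // 2)    # hand over about half of that level
        packet[d], donor[d] = donor[d][:k], donor[d][k:]
    elif strategy == "slice":
        # A vertical slice: one node from every non-empty level.
        for d in nonempty:
            packet[d].append(donor[d].pop(0))
    else:
        raise ValueError(f"unknown strategy {strategy!r}")
    return donor, packet
```

Handing over only about half of one level is a design choice of this sketch; it mirrors the trade-off in the text between packets large enough to pay for communication and small enough to avoid thrashing.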

Stack splitting works for simple DFS as well as for iterative DFS. The iterative-deepening search PIDA* (parallel IDA* [4]) starts a new iteration with an increased cost bound when all processors have finished their current iteration without success. The end of an iteration is determined by a barrier synchronization, e.g., the distributed termination detection algorithm of Dijkstra [2].


Search-Frontier Splitting 


Rather than subdividing a processor’s stack, search-frontier splitting initially generates a suitable number of ‘work packets’. A work packet is a node from a ‘search frontier’ in the tree, i.e., a set of nodes n with the same cost value f(n). Search-frontier splitting has two phases:

• In an initialization phase, the nodes of a search frontier are generated in a distributed fashion by a cost-bounded BFS or an iterative DFS. The bound is incrementally increased until at least p nodes with the same cost value f are generated and stored in the local memories. Each frontier node forms the root of a subtree that represents an indivisible piece of work used in the second phase.

• In the main asynchronous search phase, each processor expands its own frontier nodes in DFS or DFBB (depth-first branch-and-bound) fashion. When a processor becomes idle, it sends a request for a work packet (an unprocessed frontier node) to another processor.

[Figure: Search-frontier splitting]
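The initialization phase can be sketched sequentially as follows. This is a simplified illustration, not the published procedure: it assumes unit edge costs and a zero heuristic, so that all nodes at one depth share the same f-value and raising the cost bound by one corresponds to expanding one more BFS level. The function name `frontier_split` and the round-robin dealing of frontier nodes to processors are hypothetical choices.

```python
from typing import Callable, List, TypeVar

Node = TypeVar("Node")


def frontier_split(root: Node, successors: Callable[[Node], List[Node]],
                   p: int) -> List[List[Node]]:
    """Generate a search frontier with at least p nodes and deal it to p processors.

    Assumes unit edge costs and a zero heuristic, so each BFS level is one
    frontier of equal f-value.
    """
    frontier = [root]
    while len(frontier) < p:
        # Raise the cost bound by one: replace the frontier by all its children.
        next_frontier = [child for node in frontier for child in successors(node)]
        if not next_frontier:
            break                         # tree too small for p work packets
        frontier = next_frontier
    # Each frontier node is the root of an indivisible work packet.
    return [frontier[i::p] for i in range(p)]
```

On a binary tree with successors `2n, 2n + 1` and root 1, three processors receive the packets `[4, 7]`, `[5]` and `[6]`; in the asynchronous phase an idle processor would then request one of the others' unprocessed frontier nodes.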

The initialization phase is short. A suitable number of frontier nodes is generated to provide a fine work granularity. As before, the work packets must be neither too small (to pay for the communication costs) nor too large (to avoid thrashing effects).

In practice, little load balancing is required, because the expansion of the local frontier nodes keeps the processors busy most of the time. Search-frontier splitting is applicable to DFS and iterative DFS. The iterative-deepening variant AIDA* (asynchronous IDA* [8]) starts a new iteration on the previously used frontier nodes. This has the effect of a self-improving load balancing scheme, because all subtrees tend to grow at approximately the same rate when searching to the next larger cost bound. In practice, the communication overhead decreases with increasing search time.
Work Distribution


The work distribution is initiated either by the sender (donor) or by the receiver (recipient). In sender-initiated work distribution, the generation of subtasks is independent of idle processors. It is based on the rationale that the local workloads of any two processors in the system should not differ by a factor > 4 [6]. Work packets are delivered to lightly loaded processors either on demand (when they become idle) or when imbalances occur. In conjunction with a node priority scheme, sender-initiated work distribution may help avoid speedup anomalies, but at the cost of reduced execution speed.

The receiver-initiated work distribution scheme, also known as work stealing, is more popular. Work requests are sent only by processors that have become idle. When a donor has work to share, it returns a work packet; otherwise it notifies the requester accordingly. More sophisticated variants start issuing work requests as soon as there are fewer than 5 work packets left on their stack, thereby reducing communication latency by overlapping communication and computation.

But which processor should best be addressed to obtain a work packet? The answer depends on the topology and characteristics of the interconnection network of the parallel system. Four receiver-initiated work distribution methods have been extensively tested and analysed:

• ARR: In the asynchronous round robin strategy, each processor maintains a local variable target pointing to the next donor. Whenever a processor runs out of work, it sends a work request to the target processor and increments target (modulo p) thereafter.

• GRR: The global round robin method works similarly to ARR, but with a global target variable instead. Whenever a processor runs out of work, it looks up the global target variable, increments it, and sends a work request to the assigned donor. Hence, four messages are sent for a single work request. Memory contention can be reduced by introducing a hierarchy of distributed target variables.

• RP: In random polling, idle processors send work requests to randomly chosen processors. Each donor is selected with the same probability.

• PF: In packet forwarding, unsuccessful work requests are not returned to the sender, but forwarded to the next neighbor. On ring topologies, work requests are forwarded until a processor responds with a work packet, or the message makes a full round through the ring, thereby indicating that no work is available.
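The four donor-selection rules differ only in how the next donor is chosen. The following minimal sketch (function names hypothetical) shows each rule as a small function. Two details are assumptions of the sketch rather than statements of the article: a processor never selects itself as donor, and for PF the `work` list stands in for each processor's current amount of work on a ring numbered 0 to p − 1.

```python
import random
from typing import Dict, List, Optional


def arr_target(targets: List[int], me: int, p: int) -> int:
    """ARR: each processor keeps its own `target`, advanced modulo p."""
    t = targets[me]
    if t == me:                           # never ask yourself for work
        t = (t + 1) % p
    targets[me] = (t + 1) % p
    return t


def grr_target(shared: Dict[str, int], me: int, p: int) -> int:
    """GRR: one global `target`, read and incremented by every requester."""
    t = shared["target"]
    if t == me:
        t = (t + 1) % p
    shared["target"] = (t + 1) % p
    return t


def rp_target(rng: random.Random, me: int, p: int) -> int:
    """RP: a donor chosen uniformly among the p - 1 other processors."""
    t = rng.randrange(p - 1)
    return t + 1 if t >= me else t


def pf_target(work: List[int], me: int) -> Optional[int]:
    """PF on a ring: forward the request until some processor has work."""
    p = len(work)
    for step in range(1, p):
        candidate = (me + step) % p
        if work[candidate] > 0:
            return candidate
    return None                           # full round: no work anywhere
```

In a real system the GRR counter would be a remote shared variable, which is exactly where the extra messages and memory contention mentioned above come from; here it is just a dictionary.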

Which of the work distribution schemes to choose depends on the interconnection topology of the target platform. On systems with a small communication diameter (e.g., hypercube), RP and ARR give the best speedups, while PF performs better on systems with a large communication diameter (e.g., ring, torus) [4,8].


Table Driven Search 


In many domains, application-specific heuristics and 
search enhancements introduce interdependencies be- 
tween the generated states, making efficient paralleliza- 
tion a challenging task. Much of this information is 
stored in memory tables (e.g. transposition tables or 
refutation lists), which are essentially large caches in 
which the generated nodes and some book-keeping in- 
formation are stored. Before expanding a new node, 
a table lookup is performed to check whether infor- 
mation on that node is available. This is especially 
useful when a node can have multiple predecessors, 
i.e. when the search space is a graph rather than 
a tree. 

In parallel implementations the transposition table is partitioned among the processors. Each time a processor expands a new node, it first does a remote lookup: it sends a message to the processor responsible for that portion of the table and waits for the result. This causes a large time overhead, even with asynchronous message passing. As an efficient alternative, transposition-driven work scheduling (TDS) [9] was introduced. TDS migrates the node to be expanded to the processor that may contain the corresponding transposition table entry. From that moment on, the receiving processor is responsible for the table lookup and, depending on the result, for expanding the subtree of that node to the given search depth. TDS was introduced for parallel single-agent search (like IDA*) but can also be applied to multi-agent search.
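The core of TDS is to send work to the processor that owns the table entry, instead of sending a lookup request. It can be simulated sequentially as below. Several parts are assumptions of this sketch: the hash-based owner function `home_processor`, the queue-per-processor layout, and the depth-limited grid search used as a test problem. A real implementation would run the processors concurrently and exchange nodes as asynchronous messages.

```python
import hashlib
from collections import deque


def home_processor(node, p):
    """Processor that owns the table entry for `node` (hash partitioning, assumed)."""
    digest = hashlib.sha256(repr(node).encode()).digest()
    return int.from_bytes(digest[:8], "big") % p


def tds_search(root, children, depth, p):
    """Sequential simulation of transposition-driven scheduling.

    Every processor owns a work queue and its slice of the transposition table.
    Nodes are always sent to their home processor, so each table lookup is local.
    Returns the number of expanded nodes and the per-processor tables.
    """
    queues = [deque() for _ in range(p)]
    tables = [{} for _ in range(p)]
    queues[home_processor(root, p)].append((root, depth))
    expanded = 0
    while any(queues):
        for proc in range(p):            # one step per processor per round
            if not queues[proc]:
                continue
            node, remaining = queues[proc].popleft()
            searched = tables[proc].get(node)
            if searched is not None and searched >= remaining:
                continue                 # transposition: already searched as deep
            tables[proc][node] = remaining
            expanded += 1
            if remaining > 0:
                for child in children(node):
                    queues[home_processor(child, p)].append((child, remaining - 1))
    return expanded, tables
```

On a grid whose successors are $(x+1, y)$ and $(x, y+1)$, a depth-4 search from $(0, 0)$ has 31 tree nodes but only 15 distinct states. Because every transposition is detected at the owning processor, the simulation expands only those 15 states.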
For finite-dimensional optimization problems with infinitely many inequality constraints (semi-infinite optimization), the topological concept of global stability and its algebraic characterization are introduced in the following. Global stability refers to perturbations of the defining functions and their derivatives up to second order. Then, transitions between stable problems via one-parametric families are considered. Necessarily, certain unstable situations will be met, and they will be discussed by means of the underlying singularities.

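The considered optimization problems are of the following type:

$$(\mathrm{SIP})\colon \quad \min f(x) \quad \text{s.t. } x \in M[h, g],$$

$$M[h, g] = \{x \in \mathbb{R}^n : h_i(x) = 0,\ i \in A,\ g(x, y) \ge 0 \text{ for all } y \in Y\},$$

$$Y = \{y \in \mathbb{R}^r : u_i(y) = 0,\ i \in I,\ v_j(y) \ge 0,\ j \in J\},$$

where $h = (h_i, i \in A)$, $u = (u_i, i \in I)$, $v = (v_j, j \in J)$; $|A| < n$, $|I| < r$, $|J| < \infty$. The index set $Y$ is assumed to be compact. All defining functions are assumed to be of class $C^k$ ($k \ge 2$), i.e., $f \in C^k(\mathbb{R}^n, \mathbb{R})$, $g \in C^k(\mathbb{R}^n \times \mathbb{R}^r, \mathbb{R})$, etc.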


The Reduction Ansatz 


Under certain assumptions it is possible to reduce (SIP) locally to an optimization problem with a finite number of inequality constraints. In fact, put $Y_0(x) = \{y \in Y : g(x, y) = 0\}$ and let $\bar x \in M[h, g]$. Then all points of $Y_0(\bar x)$ are global minima for $g(\bar x, \cdot)|_Y$.

Consequently, suppose $Y_0(\bar x) = \{\bar y_1, \dots, \bar y_p\}$ and all $\bar y_j$ are ‘nondegenerate’, respectively ‘strongly stable’ (cf. [16]). Then the implicit function theorem guarantees the existence of local $C^1$-, respectively Lipschitz continuous, mappings $y_j(x)$, where $y_j(x)$ is a local minimum for $g(x, \cdot)|_Y$, $j = 1, \dots, p$.

But then, in a neighborhood of $\bar x$, we can describe the feasible set $M[h, g]$ by means of the equalities $h_i = 0$, $i \in A$, and the finite number of $C^2$-, respectively $C^{1,1}$-, inequality constraints $\varphi_j(x) \ge 0$, $j = 1, \dots, p$. Here $\varphi_j(x) = g(x, y_j(x))$ is the local marginal function (cf. [4]).
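The local marginal function $\varphi(x) = g(x, y(x))$ of the Reduction Ansatz can be evaluated numerically by solving the lower level problem for fixed $x$. The sketch below assumes a hypothetical one-dimensional index set $Y = [-1, 1]$ and the hypothetical constraint $g(x, y) = x_1 + x_2 y + y^2$. For $|x_2| < 2$ its unique minimizer $y(x) = -x_2/2$ is nondegenerate ($\partial_y^2 g = 2 > 0$), so that $\varphi(x) = x_1 - x_2^2/4$ there. Golden-section search stands in for a general lower level solver.

```python
import math


def golden_section_min(func, a, b, tol=1e-10):
    """Minimize a unimodal function on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            b = d                        # minimum lies in [a, d]
        else:
            a = c                        # minimum lies in [c, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    return 0.5 * (a + b)


def g(x, y):
    """Hypothetical semi-infinite constraint function, strictly convex in y."""
    return x[0] + x[1] * y + y * y


Y_LO, Y_HI = -1.0, 1.0                   # hypothetical index set Y = [-1, 1]


def lower_level_minimizer(x):
    """y(x): minimizer of g(x, .) over Y."""
    return golden_section_min(lambda y: g(x, y), Y_LO, Y_HI)


def marginal(x):
    """Local marginal function phi(x) = g(x, y(x)); x is feasible iff phi(x) >= 0."""
    return g(x, lower_level_minimizer(x))
```

For example, $x = (1, 0.5)$ gives $y(x) = -0.25$ and $\varphi = 0.9375$, so the point is feasible. The point $x = (0, 1)$ gives $\varphi = -0.25$ and violates the constraint for $y = -0.5$.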


Global (Structural) Stability 


In this section, global (structural) stability of (SIP) will be introduced and characterized by using two concepts: the topological stability of the feasible set $M[h, g]$, and the strong stability of stationary points for (SIP). Throughout this section assume that for all $y \in Y$ the gradients $Du_i(y)$, $i \in I$, $Dv_j(y)$, $j \in J_0(y)$, are linearly independent, where $J_0(y) = \{j \in J : v_j(y) = 0\}$.


Topological Stability of M[h, g]


The set $M[h, g]$ is called topologically stable with respect to the strong $C^2$-topology (briefly: $C^2$-stable) if there exists a $C^2$-neighborhood $U$ of $(h, g)$ in $C^k$ such that $M[h, g]$ is homeomorphic with $M[\tilde h, \tilde g]$ for every $(\tilde h, \tilde g) \in U$. Here, a $C^2$-neighborhood of $(h, g)$ is generated by perturbations of $(h, g)$ and their derivatives up to second order. These perturbations are controlled by a positive continuous function $\varepsilon(\cdot)\colon \mathbb{R}^n \to \mathbb{R}$ (for details, cf. [6]).

The topological stability of $M[h, g]$ is closely related to the Mangasarian-Fromovitz constraint qualification (MFCQ). The (MFCQ) is said to hold at $\bar x \in M[h, g]$ if two conditions are met. First, the (row) vectors $Dh_i(\bar x)$, $i \in A$, are linearly independent. Second, there exists a vector $\xi \in \mathbb{R}^n$ satisfying $Dh_i(\bar x)\,\xi = 0$, $i \in A$, and $D_x g(\bar x, y)\,\xi > 0$ for all $y \in Y_0(\bar x)$.

In [12] it is shown that topological stability of $M[h, g]$ can be characterized by an equivalent algebraic condition: if $M[h, g]$ is compact, then $M[h, g]$ is $C^2$-stable if and only if (MFCQ) holds at all $x \in M[h, g]$. This equivalence was proved first in [2] for finite optimization problems (i.e., for those with finitely many constraints). It was then generalized to semi-infinite problems in [12]. A generalization to noncompact feasible sets $M[h, g]$ under a stronger constraint qualification is presented in [9].


Strong Stability of Stationary Points 


The strong stability of a stationary point for a finite 
problem and its equivalent algebraic characterization 
have been introduced in [16]. 

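A point $\bar x \in M[h, g]$ is called a stationary point for (SIP) if there exist $\bar y_1, \dots, \bar y_p \in Y_0(\bar x)$ and reals $\beta_i$, $i \in A$, $\gamma_j \ge 0$, $j = 1, \dots, p$, such that

$$Df(\bar x) = \sum_{i \in A} \beta_i\, Dh_i(\bar x) + \sum_{j=1}^{p} \gamma_j\, D_x g(\bar x, \bar y_j).$$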


A stationary point can be a local minimum or a saddle point for (SIP). Let SIP(f, h, g) denote the semi-infinite problem that is generated by the function vector (f, h, g).

The strong stability of a stationary point $\bar x(f, h, g)$ for SIP(f, h, g) is a local property. It means the existence and local uniqueness of a stationary point $\bar x(\tilde f, \tilde h, \tilde g)$ for SIP$(\tilde f, \tilde h, \tilde g)$. Here all sufficiently small local perturbations of (f, h, g) and their derivatives up to second order are considered, and $\bar x(\tilde f, \tilde h, \tilde g)$ depends continuously on the perturbed function vector $(\tilde f, \tilde h, \tilde g)$.

Here, the function vector (f, h, g) is considered as a parameter. Under certain assumptions, equivalent algebraic conditions for strong stability can be obtained by using the implicit function theorem in the corresponding function space. This was done first in [16] for finite problems. A generalization to semi-infinite problems is given in [17], where the following three cases are distinguished for the considered point $\bar x = \bar x(f, h, g)$.

1) The set $Y_0(\bar x)$ is finite with $Y_0(\bar x) = \{\bar y_1, \dots, \bar y_p\}$, and all points $\bar y_1, \dots, \bar y_p$ are strongly stable local minima for $g(\bar x, \cdot)|_Y$. Furthermore, the linear independence constraint qualification (LICQ) holds at $\bar x \in M[h, g]$, i.e., the vectors $Dh_i(\bar x)$, $i \in A$, $D_x g(\bar x, \bar y_j)$, $j = 1, \dots, p$, are linearly independent.

2) $Y_0(\bar x) = \{\bar y_1, \dots, \bar y_p\}$, and all points $\bar y_1, \dots, \bar y_p$ are strongly stable local minima for $g(\bar x, \cdot)|_Y$. Furthermore, (MFCQ) holds at $\bar x \in M[h, g]$, but (LICQ) does not.

3) Not all points of $Y_0(\bar x)$ are strongly stable local minima for $g(\bar x, \cdot)|_Y$.

In Case 1 and Case 2 the Reduction Ansatz is fulfilled at $\bar x$, i.e., near $\bar x$ the set $M[h, g]$ can be described by means of finitely many $C^{1,1}$-constraints. Therefore, the Karush-Kuhn-Tucker system (written as a system of equations as in [16,17]) is Lipschitz continuous and has a generalized Jacobian.

Then, an equivalent algebraic characterization for the strong stability of $\bar x \in M[h, g]$ in Case 1 and Case 2 is a condition on a certain subset of this generalized Jacobian (for details, cf. [17]).

In Case 1 one obtains a condition on coherent orientation. Here, all elements (matrices) of this subset of the generalized Jacobian, restricted to the corresponding tangent space, have a nonvanishing determinant with a common sign.

In Case 2, all elements of this subset restricted to the corresponding tangent space are positive definite. In particular, in Case 2 a strongly stable stationary point $\bar x \in M[h, g]$ has to be a local minimum for SIP(f, h, g).

In Case 3 the active index set $Y_0(\bar x)$ might contain infinitely many points. The equivalent algebraic characterization for strong stability in that case is also a positive definiteness condition, but a rather technical one (for details, see [17]). In Case 3 a strongly stable stationary point $\bar x \in M[h, g]$ has to be a local minimum for SIP(f, h, g) as well.
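Whether Case 1 or Case 2 applies at a given point depends, in part, on a rank test on the gradients listed in (LICQ). Below is a minimal NumPy sketch. It assumes the gradients $Dh_i(\bar x)$ and $D_x g(\bar x, \bar y_j)$ have already been evaluated and are passed in as plain vectors. The function name `licq_holds` and the rank tolerance are illustrative choices.

```python
import numpy as np


def licq_holds(eq_grads, active_grads, tol=1e-10):
    """True if the gradients Dh_i(x) and D_x g(x, y_j) are linearly independent.

    eq_grads: gradients of the equality constraints h_i, i in A.
    active_grads: gradients D_x g(x, y_j) for the active indices y_j in Y_0(x).
    """
    rows = [np.asarray(v, dtype=float) for v in list(eq_grads) + list(active_grads)]
    if not rows:
        return True                      # an empty family is independent
    M = np.vstack(rows)
    return bool(np.linalg.matrix_rank(M, tol=tol) == M.shape[0])
```

In $\mathbb{R}^2$, one equality gradient $(1, 0)$ and one active gradient $(0, 1)$ satisfy (LICQ). Adding a second active index with gradient $(1, 1)$ destroys it, since three vectors in $\mathbb{R}^2$ are always dependent. That situation can at best fall under Case 2.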


Global (Structural) Stability of SIP(f, h, g)


Two problems SIP(f, h, g) and SIP$(\tilde f, \tilde h, \tilde g)$ are called equivalent if there exist continuous mappings $\phi\colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ and $\psi\colon \mathbb{R} \to \mathbb{R}$ with the following properties:
1) The mapping $\phi(t, \cdot)\colon \mathbb{R}^n \to \mathbb{R}^n$ is a homeomorphism for each $t \in \mathbb{R}$.
2) The mapping $\psi$ is a homeomorphism and monotonically increasing.
3) For all $t \in \mathbb{R}$: $\phi_t[L^t(f, h, g)] = L^{\psi(t)}(\tilde f, \tilde h, \tilde g)$, where $\phi_t := \phi(t, \cdot)$ and $L^t(f, h, g) = \{x \in M[h, g] : f(x) \le t\}$.
The semi-infinite problem SIP(f, h, g) is called structurally stable if there exists a $C^2$-neighborhood $V$ of the defining triple (f, h, g) such that SIP$(\tilde f, \tilde h, \tilde g)$ and SIP(f, h, g) are equivalent for all $(\tilde f, \tilde h, \tilde g) \in V$.
The structural stability of SIP(f, h, g) can be characterized by means of the topological stability of $M[h, g]$ introduced above, together with the strong stability of stationary points, as follows. Suppose $M[h, g]$ is compact. Then SIP(f, h, g) is structurally stable if and only if three conditions hold: (MFCQ) holds at all $x \in M[h, g]$, every stationary point for SIP(f, h, g) is strongly stable, and different stationary points have different f-values. This result was proved first in [1,4] for finite problems. It was then (partially) generalized to semi-infinite problems in [8,20].


Generic Transitions 


In this section one-parametric families $(\mathrm{SIP})_t$ of semi-infinite optimization problems are considered:

$$(\mathrm{SIP})_t\colon \quad \min f(x, t) \quad \text{s.t. } x \in M(t),$$

where all defining functions f, h, g, u, v depend on one additional variable $t$ (the parameter), i.e., one considers $f(x, t)$, $h_i(x, t)$, $i \in A$, $g(x, t, y)$, etc. Note that, in particular, the index set $Y(t)$ now also depends on the parameter $t$. This is in contrast to one-parametric finite optimization problems (i.e., $Y(t)$ finite), where the index set of inequality constraints is not assumed to be parameter-dependent (i.e., $Y(t)$ constant; cf., e.g., [5,7,13,15]). It is assumed that each set $Y(t) \subset \mathbb{R}^r$, $t \in \mathbb{R}$, is compact, and that the set-valued mapping $t \rightrightarrows Y(t)$ is upper semicontinuous at each $t \in \mathbb{R}$. In the sequel this assumption is briefly referred to as $A_{\mathrm{cusc}}$.

A point $\bar x \in M(\bar t)$ is called a generalized critical point (shortly, g.c.point) for $(\mathrm{SIP})_{\bar t}$ if the following family of vectors is linearly dependent: $D_x f(\bar x, \bar t)$, $D_x h_i(\bar x, \bar t)$, $i \in A$, $D_x g(\bar x, \bar t, y)$, $y \in Y_0(\bar x, \bar t)$. Here, $D_x f$ stands for the row vector of first partial derivatives with respect to $x$. The set $Y_0(x, t)$ denotes the parameter-dependent set of active inequality constraints, i.e., $Y_0(x, t) = \{y \in Y(t) : g(x, t, y) = 0\}$. The generalized critical point set is defined to be the set $\Sigma = \{(x, t) \in \mathbb{R}^n \times \mathbb{R} : x \text{ is a g.c.point for } (\mathrm{SIP})_t\}$. By the well-known first order necessary optimality condition of F. John, for every local minimizer $\bar x$ of $(\mathrm{SIP})_{\bar t}$ one has $\bar z = (\bar x, \bar t) \in \Sigma$.

The main idea for the investigation of parameter-dependent local minimizers is to study the larger set $\Sigma$, which also contains local maximizers and several kinds of saddle points. For a fixed generic problem, these different classes of g.c.points can easily be distinguished by algebraic conditions which use the so-called linear and quadratic (co-)indices.

In particular, the linear index (co-index) is the number of negative (positive) Lagrange multipliers corresponding to active inequality constraints. The quadratic index (co-index) counts the number of negative (positive) eigenvalues of the restriction of the Hessian of a Lagrangian to the tangent space. For a one-parametric problem, these numbers may only change at singularities in $\Sigma$. The nonsingular points in $\Sigma$ are termed nondegenerate critical points (for details, cf. [13]).

In the finite case, i.e., when $Y(t)$ is a constant and finite set, a nondegenerate critical point is a local minimizer if and only if its linear as well as its quadratic index vanish. Moreover, at a nondegenerate critical point $\bar z$, the set $\Sigma$ can locally be described by the implicit function theorem, which yields a regular parametrization of $\Sigma$ by $t$. Since the quadruple of linear and quadratic (co-)indices remains constant in a neighborhood of $\bar z$, a nondegenerate local minimizer remains stable under small perturbations of the parameter $t$.
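For a nondegenerate critical point of a finite problem, the index quadruple can be computed directly from its definition. In this NumPy sketch the caller supplies three inputs:
- the gradients of the equality and active inequality constraints, which span the normal space;
- the multipliers $\gamma_j$ of the active inequalities, with the sign convention $Df = \sum \beta_i Dh_i + \sum \gamma_j Dg_j$ of the stationarity condition above;
- the Hessian of the Lagrangian $f - \sum \beta_i h_i - \sum \gamma_j g_j$.

The function name and tolerance are illustrative.

```python
import numpy as np


def index_quadruple(active_grads, ineq_multipliers, hess_lagrangian, tol=1e-10):
    """Return (linear index, linear co-index, quadratic index, quadratic co-index)."""
    H = np.asarray(hess_lagrangian, dtype=float)
    n = H.shape[0]
    gamma = np.asarray(ineq_multipliers, dtype=float)
    li = int(np.sum(gamma < -tol))       # negative multipliers
    lci = int(np.sum(gamma > tol))       # positive multipliers
    G = np.asarray(active_grads, dtype=float).reshape(-1, n)
    if G.shape[0] == 0:
        Z = np.eye(n)                    # no active constraints: tangent space is R^n
    else:
        _, s, vt = np.linalg.svd(G)      # rows of vt beyond the rank span null(G)
        rank = int(np.sum(s > tol))
        Z = vt[rank:].T                  # orthonormal basis of the tangent space
    if Z.shape[1] == 0:
        eig = np.array([])
    else:
        eig = np.linalg.eigvalsh(Z.T @ H @ Z)
    qi = int(np.sum(eig < -tol))         # negative curvature directions
    qci = int(np.sum(eig > tol))         # positive curvature directions
    return li, lci, qi, qci
```

Consider $\min x_1^2 + x_2^2$ s.t. $x_1 + x_2 - 1 \ge 0$ at $\bar x = (1/2, 1/2)$. There $\gamma = 1$ and the Hessian is $2I$, giving $(0, 1, 0, 1)$. Both indices vanish, so the point is a local minimizer. Replacing $f$ by $-(x_1^2 + x_2^2)$ gives $\gamma = -1$, Hessian $-2I$ and $(1, 0, 1, 0)$, a local maximizer.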

In [5,7], H.Th. Jongen, P. Jonker, and F. Twilt studied one-parametric finite optimization problems. They showed that, apart from nondegenerate critical points (points of type 1), generically there occur exactly four different types of singularities in $\Sigma$ (points of type 2, 3, 4, and 5). At each singularity the change of the index quadruple is completely described by certain characteristic numbers, which can be computed from the problem data at the singularity. Results about the topological structure of $\Sigma$ around the singular points provide the foundation for the design of numerical path-following methods (for details, cf. [3]).

In one-parametric semi-infinite optimization, generically three additional singularities come into play. Let all defining functions of $(\mathrm{SIP})_t$ be three times continuously differentiable. Define the set CUSC to be the subset of $C^3(\mathbb{R}^r \times \mathbb{R}, \mathbb{R})^{|I|+|J|}$ consisting of all functions (u, v) which define index sets $Y(t)$ such that $A_{\mathrm{cusc}}$ holds. Then there exists a $C^3_s$-open dense subset $F$ of $C^3(\mathbb{R}^n \times \mathbb{R}, \mathbb{R})^{1+|A|} \times C^3(\mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^r, \mathbb{R}) \times \mathrm{CUSC}$ with the following property: for all $(f, h, g, u, v) \in F$, each point of the corresponding g.c.point set $\Sigma$ is one of eight types. The latter result is given in [10] and proved in detail in [18].

In the remainder of the present section, this type classification is motivated, and the local structure of $\Sigma$ at the typically semi-infinite singularities is discussed. In the generic case the number of active indices at a point $(\bar x, \bar t) \in \Sigma$ cannot exceed $n + 1$ (cf. [10]). Hence one deals with finitely many solutions of the so-called lower level problem $\min g(\bar x, \bar t, \cdot)|_{Y(\bar t)}$, as well as with the upper level problem $\min f(\cdot, \bar t)|_{M(\bar t)}$. The idea of the type classification in the semi-infinite case is that a singularity of codimension one occurs in exactly one of these finitely many finite optimization problems.

First, assume that all active indices are nondegenerate global minimizers of the lower level problem. Then a local reduction of the semi-infinite to a finite one-parametric optimization problem can be performed, as described in the previous section about the Reduction Ansatz. Here, the assumption $A_{\mathrm{cusc}}$ is needed. In this reduced problem, all five types from one-parametric finite programming can occur. In particular, $(\bar x, \bar t)$ is a nondegenerate critical point of $(\mathrm{SIP})_t$ if it is of type 1 for the reduced problem. Moreover, this shows that the four singularities from the finite case persist in the semi-infinite setting.

The typically semi-infinite singularities arise if exactly one of the active indices in $Y_0(\bar x, \bar t) = \{\bar y_1, \dots, \bar y_p\}$ is degenerate. Let this be the index $\bar y_1$. In the generic case, the degeneracy can only be due to situations such as at points of type 2, 3, 4, and 5. As points of type 3 are never local minimizers (cf. [10]), one is left with three possibilities. These give rise to the typically semi-infinite singularities of type 6, 7, and 8.

The present survey will not handle all details of the type definitions and of the local structures of $\Sigma$. For locally simplified problems these definitions are given in [10], whereas the definitions in terms of the original problem data are given in [18].

In the following, a one-dimensional manifold $\Gamma$ in $\mathbb{R}^n \times \mathbb{R}$ is said to exhibit a turning point at $\bar z = (\bar x, \bar t) \in \Gamma$ if the function $\Phi(x, t) = t$, restricted to $\Gamma$, possesses a local extremum at $\bar z$. If, additionally, $\Gamma$ is locally a $C^2$-manifold and the extremum of $\Phi|_\Gamma$ is nondegenerate, then $\bar z$ is called a quadratic turning point.
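On a sampled branch of $\Sigma$, a turning point shows up as a strict local extremum of the $t$-coordinate. The sketch below checks this definition on a polyline of sample points $(x, t)$. It is only a heuristic for finely sampled curves; path-following codes would instead monitor the $t$-component of the tangent vector. The curve $t = -x^2$ used in the example is hypothetical.

```python
def turning_points(curve):
    """Indices k at which t = curve[k][1] has a strict local extremum along the curve.

    `curve` is a list of samples (x, t) ordered along a one-dimensional branch.
    """
    found = []
    for k in range(1, len(curve) - 1):
        dt_prev = curve[k][1] - curve[k - 1][1]
        dt_next = curve[k + 1][1] - curve[k][1]
        if dt_prev * dt_next < 0:        # t changes direction: turning point
            found.append(k)
    return found
```

On the parabola $t = -x^2$, sampled at $x = -1, -0.9, \dots, 1$, the function reports the single turning point at $x = 0$, which is a quadratic one. On a branch that is regularly parametrized by $t$, such as $t = x$, it reports none.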

A g.c.point $\bar z$ of type 6 can roughly be characterized by two facts. The gradients of the constraints which are active at $\bar y_1$ in the lower level problem are linearly independent, and exactly one of the multipliers corresponding to active inequality constraints vanishes.

Now consider two auxiliary problems $(\mathrm{SIP})^1_t$ and $(\mathrm{SIP})^2_t$. In the first, this inequality constraint is treated as an equality constraint; in the second, it is deleted. It can be shown that $\bar z$ is a nondegenerate critical point of both $(\mathrm{SIP})^1_t$ and $(\mathrm{SIP})^2_t$. This gives rise to two solution curves $\Psi^1$ and $\Psi^2$ in $\mathbb{R}^n \times \mathbb{R}$, both being regularly parametrized by $t$. Exactly one branch of each curve belongs to $\Sigma$, so that at $\bar z$ the set $\Sigma$ is locally a one-dimensional manifold which is piecewise of differentiability class $C^2$.

In the case $|A| + p = n$, the curves $\Psi^1$ and $\Psi^2$ meet tangentially, and $\Sigma$ does not exhibit a turning point at $\bar z$. In the case $|A| + p < n$, the curves $\Psi^1$ and $\Psi^2$ meet under a nonvanishing angle.

Moreover, it can be shown that the linear index and co-index remain constant when passing a point of type 6 along $\Sigma$. The quadratic index and co-index change if and only if $\Sigma$ exhibits a turning point at $\bar z$. In the latter case, the quadratic index changes by one, and characteristic numbers determine the direction of change. In particular, a local minimizer can only be lost at a point of type 6 if $\Sigma$ exhibits a turning point there. In this case, a feasible direction of quadratic descent (i.e., a jump direction) can be given (cf. [3,11,18]).

At a g.c.point $\bar z$ of type 7, the number of active constraints at $\bar y_1$ in the lower level problem does not exceed $r$, and their gradients are linearly dependent. It turns out that necessarily a component of the index set $Y(\bar t)$ vanishes under perturbations of the parameter (cf. also [8]). As a consequence, the feasible set mapping $t \rightrightarrows M(t)$ is not upper semicontinuous at $\bar t$, and a branch of $\Sigma$ emanates, respectively ends, at $\bar z$. More precisely, $\Sigma$ coincides locally with one branch of a one-dimensional $C^2$-manifold which exhibits a quadratic turning point at $\bar z$. A feasible direction of linear descent can be given if $\Sigma$ consists locally of local minimizers (cf. [11,18]).

At a g.c.point $\bar z$ of type 8, the number of active constraints at $\bar y_1$ in the lower level problem equals $r + 1$. If the Mangasarian-Fromovitz constraint qualification holds at $\bar y_1$, one calls $\bar z$ a point of type 8a, else of type 8b.

First consider points of type 8a. Similarly to the situation at points of type 6, there are two auxiliary problems. They give rise to two $C^2$-curves $\Psi^1$, $\Psi^2$ of nondegenerate critical points, both being regularly parametrized by $t$, where exactly one branch of each curve belongs to $\Sigma$. Since $\Psi^1$ and $\Psi^2$ meet in $\bar z$ under a nonvanishing angle, $\Sigma$ is locally a one-dimensional manifold which is piecewise of differentiability class $C^2$. It can be shown that $\Sigma$ does not exhibit a turning point at $\bar z$ and that the index quadruple remains constant when passing $\bar z$ along $\Sigma$.

Now let $\bar z$ be a g.c.point of type 8b. Similarly to the situation at points of type 7, a component of the index set $Y(\bar t)$ vanishes under perturbations of the parameter, and a branch of $\Sigma$ emanates, respectively ends, at $\bar z$. More precisely, $\Sigma$ coincides locally with one branch of a one-dimensional $C^2$-manifold which can be regularly parametrized by $t$. Again, a feasible direction of linear descent can be given if $\Sigma$ consists locally of local minimizers (cf. [11,18]).

Finally, there is a remarkable phenomenon concerning the global structure of $\Sigma$. In one-parametric finite programming, the singular points form the (relative) boundary of the set of nondegenerate critical points (cf. [5,7]). In contrast, in the semi-infinite case there appears an additional type of boundary point which does not belong to $\Sigma$. This so-called trap-door point occurs when a new component of $Y(t)$ is born at a parameter value $\bar t$ (recall that at points of type 7 and type 8b such a component vanishes). Trap-door points, as well as points of type 7 and 8b, can be avoided if the mapping $t \rightrightarrows Y(t)$ is assumed to be not only upper but also lower semicontinuous. In this case, the singular points form the (relative) boundary of the set of nondegenerate critical points. For details, cf. [18,19].
In this article, we describe solution approaches for two types of linear problems involving uncertain parameters. The first problem involves a single uncertain parameter on the right-hand side of the constraints. The second problem involves uncertainty in the coefficients of the objective function.

Both problems matter in practice. The first covers the case in which a parameter of the model equations, such as demand or supply, is uncertain. The second incorporates uncertainty in the coefficients of the variables that define the objective function, such as the cost of raw materials and the selling price of products. Mathematically, the former problem depicts the value of the objective function as the feasible region shrinks (or enlarges). The latter characterizes the ‘fixed’ feasible region by a number of regions corresponding to different sets of active and inactive constraints.

Parametric Linear Programming: Cost Simplex Algorithm, Figure 1
Right-hand side parametrization

For the first problem, in Fig. 1, PQRS represents an initial feasible region corresponding to the value $\theta_1$ of the uncertain parameter, and C represents the constraint involving $\theta$. As $\theta$ changes from $\theta_1$ to $\theta_2$, the feasible region and the set of active constraints change. For a given range of $\theta$, the aim is then to identify the intervals of $\theta$ within which the expression for the optimal value function $z(\theta)$ remains valid, i.e., within which the same basis remains optimal.

For the second problem, in Fig. 2, $z(\phi)$ represents the objective function and $\phi$ represents the vector of uncertain parameters in the objective function coefficients. The feasible region PQRS remains constant, but the slope of the objective function changes with $\phi$. The aim in this case is to identify regions of the uncertain parameter space rather than intervals. Next we describe solution approaches for both problems.


Parametric Linear Programming 


Parametric Linear Programming: Cost Simplex Algorithm, Figure 2
Objective function parametrization

Parametric Linear Programming: Cost Simplex Algorithm, Figure 3
Optimal value function (right-hand side case)

Consider the following parametric linear programming problem:

$$
\begin{aligned}
z(\theta) = \min_x\ & c^T x\\
\text{s.t. } & Ax = b + f\theta,\\
& x \ge 0,\\
& \theta_{\min} \le \theta \le \theta_{\max},\\
& x \in \mathbb{R}^n,
\end{aligned}
\tag{1}
$$


where $x$ is a vector of continuous variables; $A$ is a constant matrix, and $c$, $b$ and $f$ are constant vectors of appropriate dimensions; $\theta$ is a scalar uncertain parameter. The solution of (1) is approached by incorporating the uncertain parameter $\theta$ in the simplex tableau. The solution procedure consists of two phases.

In the first phase, an optimal solution for an arbitrary value of $\theta$ in $[\theta_{\min}, \theta_{\max}]$ is obtained. Let $B_1$ denote the corresponding optimal basis, and $x_B^1(\theta) = B_1^{-1}(b + f\theta)$ the vector of corresponding basic variables. The critical region $CR_1$ (Fig. 1) associated with this basis is given by the condition of primal feasibility, $x_B^1(\theta) \ge 0$, together with the condition $\theta_{\min} \le \theta \le \theta_{\max}$. The optimal value function is $z^1(\theta) = (c_B^1)^T x_B^1(\theta)$. This completes the first phase.

In the second phase, the neighboring bases ($B_0$ or $B_2$) are identified by one dual simplex step. The procedure is repeated until the complete range of $\theta$ has been characterized by critical regions and corresponding optimal value functions.

Solving problems of the form (1) is usually computationally expensive, but it is better than exhaustively obtaining the optimal solutions for all values of $\theta$ between $\theta_{\min}$ and $\theta_{\max}$. However, in the worst case the computational requirements for solving (1) are not bounded by a polynomial in the size of the problem [10]. Some other references on the subject are [1,2,3,11] and [12].
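Once an optimal basis is known, phase one above reduces to two linear solves. The vector $x_B(\theta) = B^{-1}b + (B^{-1}f)\theta$ is affine in $\theta$, so the primal feasibility condition gives an interval of $\theta$, and $z(\theta)$ is affine on it. The NumPy sketch below computes that critical interval and the affine optimal value function for a given basis. It assumes the basis is dual feasible; in the right-hand side case dual feasibility does not depend on $\theta$. The function name and the small example are illustrative.

```python
import numpy as np


def rhs_critical_region(A, b, f, c, basis, theta_min, theta_max, tol=1e-12):
    """Critical interval and affine value z(theta) = z0 + z1*theta for one basis.

    Returns ((lo, hi), z0, z1), or None if the basis is primal infeasible for
    every theta in [theta_min, theta_max].
    """
    A, b, f, c = (np.asarray(v, dtype=float) for v in (A, b, f, c))
    B = A[:, basis]
    p = np.linalg.solve(B, b)            # B^{-1} b
    d = np.linalg.solve(B, f)            # B^{-1} f, so x_B(theta) = p + d*theta
    lo, hi = float(theta_min), float(theta_max)
    for p_i, d_i in zip(p, d):           # enforce p_i + d_i*theta >= 0
        if d_i > tol:
            lo = max(lo, -p_i / d_i)
        elif d_i < -tol:
            hi = min(hi, -p_i / d_i)
        elif p_i < -tol:
            return None                  # infeasible for every theta
    if lo > hi:
        return None
    c_B = c[basis]
    return (float(lo), float(hi)), float(c_B @ p), float(c_B @ d)
```

As an example, take $A = \begin{pmatrix} 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 \end{pmatrix}$, $b = (1, 2)$, $f = (1, -1)$, $c = (-1, -2, 0, 0)$ and the basis $\{x_1, x_2\}$. Then $x_B(\theta) = (1 + \theta, 2 - \theta)$. Inside $[0, 3]$ the critical region is $[0, 2]$, and $z(\theta) = -5 + \theta$ there. Beyond $\theta = 2$, a dual simplex step would move to a neighboring basis.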


Objective Function Parametrization 


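Consider the following linear programming problem with uncertain objective function coefficients:

$$
\begin{aligned}
z(\phi) = \min_x\ & c(\phi)^T x\\
\text{s.t. } & Ax = b,\\
& x \ge 0,\\
& x \in \mathbb{R}^n,\ \phi \in \mathbb{R}^s,
\end{aligned}
\tag{2}
$$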


where $c(\phi) = c + H\phi$, such that $c$ is a constant vector and $H$ is a constant matrix; $b$ is a constant vector; $\phi$ is a vector of uncertain parameters. For each $\phi \in K$, $K \subseteq \mathbb{R}^s$, (2) has a finite optimal solution, and it has no optimal solution for $\phi \in \mathbb{R}^s \setminus K$. Further, consider the restriction $\phi \in \Phi$, $\Phi = \{\phi : G\phi \le g\}$, where $G$ is a constant matrix and $g$ is a constant vector. By defining new variables $v$ and $z$, (2) can be rewritten as:

$$
\begin{aligned}
z(\phi) = \min_{x, v, z}\ & \phi^T v + z\\
\text{s.t. } & Ax = b,\\
& v = H^T x,\\
& z = c^T x,\\
& x \ge 0,\\
& x \in \mathbb{R}^n,\ \phi \in \mathbb{R}^s.
\end{aligned}
\tag{3}
$$


In the simplex-based solution algorithm [4], each optimal basis is associated with a region of $\phi$ which can be uniquely defined for that basis. The basic idea of the solution approach is then to identify all such bases. In this way $\Phi$ is completely characterized by a number of regions and the corresponding values of the objective function. This can be achieved as follows.

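Let $B$ denote an optimal basis and $x_B$ the corresponding optimal solution. Let $p$ denote the corresponding index, and let $H_B$ denote the rows of $H$ corresponding to the basic variables. Then we can write:

$$
\begin{aligned}
{}^p H' &= H_B^T Y - H^T, & {}^p z' &= c_B^T Y - c^T,\\
{}^p h_{n+1} &= H_B^T x_B, & {}^p z_{n+1} &= c_B^T x_B,
\end{aligned}
\tag{4}
$$

where $Y = B^{-1}A$ and $x_B = B^{-1}b$.

The set of equations in (3) can now be rewritten as follows:

$$
Ax = b, \qquad v + {}^pH'\,x = {}^p h_{n+1}, \qquad z + {}^p z'\,x = {}^p z_{n+1}.
$$

Using an appropriate transformation, the objective function can be written as:

$$
z(\phi) = -\left({}^p z' + \phi^T\, {}^pH'\right)x + \phi^T\, {}^p h_{n+1} + {}^p z_{n+1}. \tag{5}
$$

By denoting $c_B(\phi) = c_B + H_B\phi$ and substituting in (4), the following reduced cost row of the parametric problem is obtained:

$$
{}^p z'(\phi) = c_B(\phi)^T Y - c(\phi)^T = {}^p z' + \phi^T\, {}^pH'. \tag{6}
$$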


As described below, this is an important relation for characterizing the region of $\phi$ associated with the optimal basis $B$. In order to obtain the parametric solution and the corresponding regions of $\phi$, we first need the following:

• The optimal value function $z_{\min}(\phi)$ is concave, piecewise linear and continuous over $K$.

• Two bases are said to be neighboring bases if and only if
i) there exists $\phi^* \in K$ such that both bases are optimal for $\phi^*$, and
ii) it is possible to pass from one basis to the other by one primal simplex step.

• The critical region where the basis $B$ remains optimal is given by the following condition of dual feasibility: ${}^p z'(\phi) \le 0$, i.e., $\phi^T\, {}^pH' \le -{}^p z'$ (componentwise). This is taken together with the restriction on $\phi$ given by $G\phi \le g$.

• Two critical regions are said to be neighbors (or neighboring critical regions) if their corresponding optimal bases are neighbors (or neighboring bases).

The solution algorithm then relies on finding an 
initial optimal solution and the corresponding critical 
region, and then identifying all the critical regions by 
passing from one region to its neighbor by a primal 
simplex step. The procedure is continued until the com- 
plete space of ¢, along with the corresponding opti- 
mal value function, has been characterized. Also see 
[5,6,7,8,9,13,14] and [3] for further details. 
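Relations (4) and (6) translate directly into a test for the critical region of a given basis. The reduced costs ${}^p z'(\phi)$ are affine in $\phi$, so the region is a polyhedron $M\phi \le m$. Below is a minimal NumPy sketch. It assumes the reduced cost convention of (4) and a minimization problem as in (2), so that optimality requires ${}^p z'(\phi) \le 0$. The function names and the small example are illustrative. The restriction $G\phi \le g$ could simply be appended as extra rows of $M$ and $m$.

```python
import numpy as np


def cost_critical_region(A, b, c, H, basis):
    """Critical region {phi : M @ phi <= m} and value function of one basis.

    Reduced costs follow (4) and (6): z'(phi) = z' + phi^T H',
    with z' = c_B^T Y - c^T and H' = H_B^T Y - H^T, Y = B^{-1} A.
    """
    A, b, c, H = (np.asarray(v, dtype=float) for v in (A, b, c, H))
    B = A[:, basis]
    Y = np.linalg.solve(B, A)            # Y = B^{-1} A
    x_B = np.linalg.solve(B, b)          # x_B = B^{-1} b
    z_row = c[basis] @ Y - c             # ^p z'  (length n)
    H_row = H[basis].T @ Y - H.T         # ^p H'  (s x n)
    M, m = H_row.T, -z_row               # z' + phi^T H' <= 0  <=>  H'^T phi <= -z'
    h_top = H[basis].T @ x_B             # ^p h_{n+1}
    z_top = c[basis] @ x_B               # ^p z_{n+1}
    return M, m, z_top, h_top            # z(phi) = z_top + phi @ h_top on the region


def in_region(M, m, phi, tol=1e-12):
    """True if phi satisfies the dual feasibility inequalities M phi <= m."""
    return bool(np.all(M @ np.asarray(phi, dtype=float) <= m + tol))
```

As an example, consider $\min \phi_1 x_1 + \phi_2 x_2$ over $x_1 + x_2 + x_3 = 1$, $x \ge 0$. The basis $\{x_3\}$ is optimal exactly when $\phi_1 \ge 0$ and $\phi_2 \ge 0$. The basis $\{x_1\}$ is optimal when $\phi_1 \le \min(0, \phi_2)$. The two regions share the boundary $\phi_1 = 0 \le \phi_2$, where one primal simplex step passes between the bases, as in the definition of neighboring bases.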
In this article we present some developments in parametric programming for problems involving 0-1 integer variables and nonlinearities in the model. For pure quadratic integer problems, [13] proposed an algorithm by extending some of the concepts described in [10] for the case of linear models. The problem they considered can be stated as follows:

$$
\begin{aligned}
z(\theta) = \min_y\ & p(y)\\
\text{s.t. } & Ay \ge b + r\theta,\\
& 0 \le \theta \le 1,\\
& y \in \{0,1\}^m,
\end{aligned}
\tag{1}
$$
where $\theta$ is a scalar uncertain parameter; $p(y) = y^T Q y$; $r$ is a nonnegative $n$-vector such that only $r_1 > 0$; $Q$ is an $(m \times m)$ matrix; $A$ is an $(n \times m)$ matrix.

The solution procedure is based upon the observation that increasing $\theta$ from 0 to 1 shrinks the feasible region of (1) (because $r \ge 0$). Hence only a solution worse than (greater than or equal to) the solution at 0 can be obtained, i.e., $z(\theta)$ is a nondecreasing function of $\theta$ (see Fig. 1).

Another interesting feature of the formulation in (1) is that, since $\theta$ is bounded between lower and upper bounds, there is a finite number of (integer) solutions that lie between these bounds. Further, the optimal solution of (1) at a fixed value of $\theta$ corresponds to a lattice point of a polyhedron. It therefore remains optimal for a range of $\theta$ until another lattice point becomes optimal. This results in a finite number of intervals of $\theta$, corresponding to the finite number of solutions that lie in $[0, 1]$ (Fig. 1). [13] proposed the following procedure for identifying these (critical) intervals and the corresponding optimal solutions:


Set $k = 1$ and let $\theta = 0$.
1 | For (1), find the optimum solution $y_k$ and the corresponding $p(y_k)$. Also find $\theta_k$ using
$$\theta_k = \frac{a_1 y_k - b_1}{r_1},$$
where $a_1$ denotes the first row of $A$.
IF no such $y_k$ exists, THEN go to 3, ELSE go to 2.
2 | Set $k = k + 1$ and let $\theta = \theta_{k-1} + \delta$, where $\delta = g/r_1$ and $g$ is the greatest common divisor of the elements of $a_1$.
IF $\theta \le 1$, THEN go to 1, ELSE go to 3.
3 | Stop; all critical solutions have been found.
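The procedure can be sketched in Python with two stated substitutions. First, brute-force enumeration of $\{0,1\}^m$ stands in for the quadratic 0-1 solver used in step 1, so the sketch is only practical for small $m$. Second, exact rational arithmetic (`fractions.Fraction`) is used so that the tests $\theta \le 1$ are not affected by rounding. Integer data and $r_1 > 0$ are assumed, as in the text. The function name is hypothetical.

```python
import math
from fractions import Fraction
from functools import reduce
from itertools import product


def critical_solutions(Q, A, b, r):
    """Critical solutions of z(theta) = min y^T Q y s.t. A y >= b + r*theta, 0 <= theta <= 1.

    Assumes integer data, r[0] > 0 and r[i] == 0 for i > 0 (only the first
    constraint depends on theta). Returns a list of (theta_k, y_k, p(y_k)).
    """
    m = len(Q)
    a1, b1, r1 = A[0], b[0], r[0]
    delta = Fraction(reduce(math.gcd, a1), r1)        # delta = g / r_1

    def cost(y):
        return sum(Q[i][j] * y[i] * y[j] for i in range(m) for j in range(m))

    def feasible(y, theta):
        return all(sum(a * v for a, v in zip(row, y)) >= bi + ri * theta
                   for row, bi, ri in zip(A, b, r))

    results = []
    theta = Fraction(0)                               # set k = 1, theta = 0
    while theta <= 1:
        # step 1: enumeration stands in for a quadratic 0-1 solver
        candidates = [y for y in product((0, 1), repeat=m) if feasible(y, theta)]
        if not candidates:                            # no such y_k: stop
            break
        y_k = min(candidates, key=cost)
        theta_k = Fraction(sum(a * v for a, v in zip(a1, y_k)) - b1, r1)
        results.append((theta_k, y_k, cost(y_k)))
        theta = theta_k + delta                       # step 2
    return results
```

Take the small instance $p(y) = y_1 + 2y_2 + 4y_3$ with the single constraint $2y_1 + 3y_2 + 4y_3 \ge 2 + 5\theta$, so that $\delta = 1/5$. The sketch returns five critical solutions with $\theta_k = 0, 1/5, 3/5, 4/5, 1$ and nondecreasing costs $1, 2, 3, 5, 6$. This matches the step-function shape of $z(\theta)$.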


[2] extended their previous work on dynamic programming to present a solution procedure for pure integer nonlinear programming problems. [18] extended the branch and bound technique of [17] for linear programs to nonlinear programs. They also discussed the extension of their results to the mixed integer case.

Parametric Mixed Integer Nonlinear Optimization, Figure 1
z(θ) is a discontinuous nondecreasing step function

We next consider the following parametric mixed integer nonlinear programming problem:


$$
\begin{aligned}
z(\theta) = \min_{x, y}\ & c^T y + f(x)\\
\text{s.t. } & h(x) = 0,\\
& By + g(x) \le b + r\theta,\\
& \theta_{\min} \le \theta \le \theta_{\max},\\
& x \in X,\ y \in \{0,1\}^m,
\end{aligned}
\tag{2}
$$

where $f$ and $g$ are continuously differentiable and convex on the $n$-dimensional compact polyhedral convex set $X = \{x \in \mathbb{R}^n : A_1 x \le a_1\}$; $h$ is affine with respect to $x$; $\theta$ is a scalar uncertain parameter; $c$, $b$ and $r$ are constant vectors, and $B$ is a constant matrix.
The solution of (2) can be approached by using
i) outer-approximation (OA) [3], or
ii) generalized Benders decomposition (GBD) principles [9].
See [7] for details and some applications of these algorithms. The basic idea in both approaches is to decompose the problem into a primal and a master subproblem. The basic difference between the two approaches lies in the formulation of the master subproblem. The solution of the primal subproblem, which is obtained for fixed $y$, represents a parametric upper bound. The solution of the master subproblem, which is obtained for a relaxation of the initial problem, represents a parametric lower bound. The solution algorithm is based upon iterating between these two subproblems (upper and lower bounds).
The primal subproblem is formulated by fixing $y = \bar y$ in (2) to obtain a parametric NLP (pNLP) of the following form:

$$
\begin{aligned}
z(\bar y, \theta) = \min_x\ & c^T \bar y + f(x)\\
\text{s.t. } & h(x) = 0,\\
& B\bar y + g(x) \le b + r\theta,\\
& \theta_i \le \theta \le \theta_{i+1},\\
& x \in X.
\end{aligned}
\tag{3}
$$


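Under appropriate continuity and convexity properties ([4,5,6,11]), the solution of (3) is approximated by creating linear parametric upper and lower bounds at the extreme points of the interval $[\theta_i, \theta_{i+1}]$. A parametric upper bound is given by

$$
\bar z(\bar y, \theta) = \alpha\, z^*(\bar y, \theta_i) + (1 - \alpha)\, z^*(\bar y, \theta_{i+1}), \quad \theta = \alpha\theta_i + (1 - \alpha)\theta_{i+1},\ \alpha \in [0, 1], \tag{4}
$$

and a parametric lower bound is given by

$$
\begin{aligned}
\underline z(\bar y, \theta) &= \max\{LB_i, LB_{i+1}\},\\
LB_i &= z^*(\bar y, \theta_i) + \nabla_\theta z^*(\bar y, \theta_i)(\theta - \theta_i),\\
LB_{i+1} &= z^*(\bar y, \theta_{i+1}) + \nabla_\theta z^*(\bar y, \theta_{i+1})(\theta - \theta_{i+1}),
\end{aligned}
\tag{5}
$$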


where z*(ȳ, θ_i) and z*(ȳ, θ_{i+1}) are the solutions of (3) at θ_i and θ_{i+1}, respectively, and ∇_θ z* is evaluated through the Lagrange multipliers. The parametric upper and lower bounds are tightened at the intersection point (in θ) of LB_i and LB_{i+1}. This point breaks the interval of θ into two smaller intervals, and another set of upper and lower bounds is obtained within each of them. This approximation is continued until the difference between the upper and lower bounds is within ε. If an infeasibility is found at an extreme point, the following problem is solved to obtain the feasible interval:


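$$
\begin{aligned}
\min_{x,\,\theta}\ & \theta \quad (\text{or } \max_{x,\,\theta}\ \theta)\\
\text{s.t. } & h(x) = 0,\\
& B\bar y + g(x) - r\theta \le b,\\
& x \in X,\\
& \theta_i \le \theta \le \theta_{i+1}.
\end{aligned}
\tag{6}
$$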


The final solution of a primal subproblem is given by the parametric upper bounds obtained in their corresponding intervals of θ.
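The bounding step for a fixed ȳ can be illustrated in one dimension. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The optimal value z*(ȳ, θ) is replaced by a known convex function `z` with derivative `dz`, which stand in for the NLP solves of (3) and for the multiplier-based evaluation of ∇_θ z*. Infeasible extreme points (problem (6)) are not handled, and all function names are hypothetical. Each interval gets the chord (4) as its upper bound and the larger of the two tangents (5) as its lower bound. The interval is then split at the tangent intersection until the gap is at most ε.

```python
from typing import Callable, List, Tuple

Fn = Callable[[float], float]


def tangent_intersection(z: Fn, dz: Fn, a: float, b: float) -> float:
    """θ where the two tangent lines LB_i and LB_{i+1} of (5) intersect."""
    za, zb, ga, gb = z(a), z(b), dz(a), dz(b)
    if abs(ga - gb) < 1e-14:        # parallel tangents: z is affine on [a, b]
        return 0.5 * (a + b)
    # Solve za + ga (θ - a) = zb + gb (θ - b) for θ.
    return (zb - za + ga * a - gb * b) / (ga - gb)


def bound_gap(z: Fn, dz: Fn, a: float, b: float, theta: float) -> float:
    """Upper bound (4) minus lower bound (5) at θ, with θ = α a + (1 - α) b."""
    alpha = (b - theta) / (b - a)
    upper = alpha * z(a) + (1.0 - alpha) * z(b)
    lower = max(z(a) + dz(a) * (theta - a), z(b) + dz(b) * (theta - b))
    return upper - lower


def refine(z: Fn, dz: Fn, a: float, b: float, eps: float) -> List[Tuple[float, float]]:
    """Split [a, b] at tangent intersections until the bound gap is at most eps."""
    pending, accepted = [(a, b)], []
    while pending:
        lo, hi = pending.pop()
        t = tangent_intersection(z, dz, lo, hi)
        # For convex z the gap is largest at the tangent intersection.
        if bound_gap(z, dz, lo, hi, t) <= eps:
            accepted.append((lo, hi))
        else:
            pending.extend([(lo, t), (t, hi)])
    return sorted(accepted)


# Hypothetical convex optimal-value function z*(θ) = θ² on [0, 1].
intervals = refine(lambda t: t * t, lambda t: 2 * t, 0.0, 1.0, eps=0.01)
print(len(intervals))  # 8: for θ², the gap on [a, b] is (b - a)² / 2
```

For a convex z*, the gap between the chord and the larger of the two tangents is piecewise linear and concave in θ. Its maximum therefore lies at the tangent intersection, which is why one evaluation per interval is enough in this sketch.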


The master subproblem can then be constructed by using either OA or GBD principles. Using OA principles, the master subproblem is formulated as follows ([1,15,16]):


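$$
\begin{aligned}
\underline{z}(\theta) = \min_{x,\,y,\,\mu}\ & c^{T}y + \mu\\
\text{s.t. } & T(\lambda)\,[h(x^{*}) + \nabla h(x^{*})(x - x^{*})] \le 0,\\
& By + g(x^{*}) + \nabla g(x^{*})(x - x^{*}) \le b + r\theta,\\
& f(x^{*}) + \nabla f(x^{*})(x - x^{*}) - \mu \le 0,\\
& \sum_{j\in J} y_j - \sum_{j\in L} y_j \le |J| - 1,\\
& \theta_i \le \theta \le \theta_{i+1},\\
& x \in X,\ y \in \{0,1\}^{l},
\end{aligned}
\tag{7}
$$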


where μ is a scalar variable and T(λ) is a diagonal matrix with elements

$$t_{pp} = \begin{cases} -1, & \lambda_p < 0,\\ 0, & \lambda_p = 0,\\ 1, & \lambda_p > 0,\end{cases}$$

where λ_p is the Lagrange multiplier of equation h_p, p = 1, ..., P [12]. Further, x* are the solutions of (3) obtained at the extreme points of intervals of θ while solving the primal subproblem. For a previously visited integer solution ȳ, J = {j: ȳ_j = 1} and L = {j: ȳ_j = 0}, and |J| is the cardinality of J.
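Two ingredients are specific to the OA master problem: the sign matrix T(λ) and the integer cut that excludes a previously visited integer vector. Both can be generated mechanically. The helpers below are a hypothetical sketch, and their names do not come from the original work. In practice, the resulting rows would be passed to an MILP solver together with the linearizations in (7).

```python
from itertools import product
from typing import List, Sequence, Tuple


def t_diagonal(multipliers: Sequence[float]) -> List[int]:
    """Diagonal t_pp of T(λ): -1, 0 or +1 according to the sign of λ_p."""
    return [-1 if lam < 0 else (1 if lam > 0 else 0) for lam in multipliers]


def integer_cut(y_prev: Sequence[int]) -> Tuple[List[int], int]:
    """Row a and right-hand side r of  sum_{j in J} y_j - sum_{j in L} y_j <= |J| - 1,

    with J = {j : y_prev_j = 1} and L = {j : y_prev_j = 0}.
    """
    coeffs = [1 if yj == 1 else -1 for yj in y_prev]
    rhs = sum(y_prev) - 1          # |J| - 1 for a binary vector
    return coeffs, rhs


def satisfies(coeffs: Sequence[int], rhs: int, y: Sequence[int]) -> bool:
    """True if the binary vector y satisfies the cut a^T y <= r."""
    return sum(a * yj for a, yj in zip(coeffs, y)) <= rhs


a, r = integer_cut([1, 0, 1])
cut_off = [y for y in product([0, 1], repeat=3) if not satisfies(a, r, y)]
print(cut_off)                        # [(1, 0, 1)]: only the visited vector is excluded
print(t_diagonal([-2.0, 0.0, 3.5]))   # [-1, 0, 1]
```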

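Alternatively, using GBD principles, the master subproblem is formulated as follows [14]:

$$
\begin{aligned}
\underline{z}(\theta) = \min_{y,\,\mu}\ & \mu\\
\text{s.t. } & \mu \ge c^{T}y + f(x^{*}) + \lambda^{T}[h(x^{*})] + \tau^{T}[By + g(x^{*}) - b - r\theta],\\
& \lambda_{\mathrm{inf}}^{T}[h(x^{*})] + \tau_{\mathrm{inf}}^{T}[By + g(x^{*}) - b - r\theta] \le 0,\\
& \theta_i \le \theta \le \theta_{i+1},\\
& y \in \{0,1\}^{l},
\end{aligned}
\tag{8}
$$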


where λ and τ are Lagrange multipliers obtained at extreme points of intervals of θ in the primal subproblem. The subscript inf corresponds to the Lagrange multipliers in the infeasible interval of θ, where the following relaxed problem is solved:


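$$
\begin{aligned}
\min_{x,\,s_1,\,s_2,\,s_3}\ & (s_1 + s_2 + s_3)\\
\text{s.t. } & h(x) + s_1 - s_2 = 0,\\
& B\bar y + g(x) - b - r\theta_{\mathrm{inf}} \le s_3,\\
& s_1,\ s_2,\ s_3 \ge 0,\\
& x \in X,
\end{aligned}
\tag{9}
$$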


where s_1, s_2 and s_3 are nonnegative slack variables and the subscript inf corresponds to infeasible θ points.

Note that (7) and (8) are parametric mixed integer linear programming (pMILP) problems, which can be solved using the following algorithm proposed by [15]. First, θ is fixed at the lower value of the interval in the master subproblem. This results in a deterministic MILP, which is then solved to obtain an integer solution given by y′. Fixing y = y′ in the master subproblem results in a parametric LP (pLP) problem. The solution of this pLP, obtained by using an algorithm described by [8], is given by a linear parametric profile, z′(θ), and represents an upper bound on the solution of the master subproblem. Another MILP is then formulated as follows to obtain a breakpoint θ_b, where some other integer solution is lower than the upper bound z′(θ):


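$$
\begin{aligned}
\theta_b = \min_{x,\,y,\,\mu,\,\theta}\ & \theta\\
\text{s.t. } & T(\lambda)\,[h(x^{*}) + \nabla h(x^{*})(x - x^{*})] \le 0,\\
& By + g(x^{*}) + \nabla g(x^{*})(x - x^{*}) \le b + r\theta,\\
& f(x^{*}) + \nabla f(x^{*})(x - x^{*}) - \mu \le 0,\\
& \sum_{j\in J} y_j - \sum_{j\in L} y_j \le |J| - 1,\\
& c^{T}y + \mu \le z'(\theta),\\
& \theta_i \le \theta \le \theta_{i+1},\\
& x \in X,\ y \in \{0,1\}^{l},\ y \ne y'.
\end{aligned}
\tag{10}
$$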


For the interval [θ_i, θ_b], the solution is given by z̲(θ) = z′(θ). For the rest of the interval, [θ_b, θ_{i+1}], the procedure is repeated until (10) is infeasible. The final solution is given by a set of parametric profiles valid in their corresponding intervals.
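The pMILP procedure can be imitated on a toy instance. In the sketch below, the two MILPs (the one with θ fixed at the lower end of the interval, and the breakpoint problem (10)) are replaced by enumeration over a small, fixed set of candidate integer solutions. As a simplifying assumption, each candidate is given a single linear profile z_y(θ) = a_y + b_y θ on the whole interval, whereas the pLP in the text may return a piecewise linear profile. The sketch illustrates the bookkeeping only. It is not the algorithm of [15], and the names are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Tuple[float, float]          # (a, b) with z_y(θ) = a + b θ


def value(p: Profile, theta: float) -> float:
    return p[0] + p[1] * theta


def envelope(profiles: Dict[str, Profile], lo: float, hi: float,
             tol: float = 1e-12) -> List[Tuple[float, float, str]]:
    """Pieces (θ_start, θ_end, y) of min_y z_y(θ) on [lo, hi].

    Enumeration stands in for the two MILPs of the procedure: choosing y' at
    the lower end of the current interval, and finding the breakpoint θ_b at
    which some other integer solution falls below z'(θ) = z_{y'}(θ).
    """
    pieces: List[Tuple[float, float, str]] = []
    theta = lo
    while True:
        # "MILP with θ fixed at the lower end": ties go to the smaller slope,
        # i.e. to the profile that stays lower just to the right of θ.
        best = min(profiles, key=lambda k: (value(profiles[k], theta), profiles[k][1]))
        a0, b0 = profiles[best]
        # "Breakpoint MILP (10)": earliest θ > current with z_y(θ) = z'(θ), y != y'.
        theta_b = hi
        for k, (a, b) in profiles.items():
            if k != best and b < b0:           # only a smaller slope can overtake z'
                cross = (a - a0) / (b0 - b)
                if theta + tol < cross < theta_b:
                    theta_b = cross
        pieces.append((theta, theta_b, best))
        if theta_b >= hi:                      # breakpoint problem infeasible: stop
            return pieces
        theta = theta_b


# Hypothetical profiles of three integer solutions on [0, 2].
profiles = {"y1": (0.0, 2.0), "y2": (1.0, 0.0), "y3": (3.0, -2.0)}
print(envelope(profiles, 0.0, 2.0))
# [(0.0, 0.5, 'y1'), (0.5, 1.0, 'y2'), (1.0, 2.0, 'y3')]
```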


0 (initialization): Define an interval of θ; a tolerance, ε; an upper bound z*(θ) = ∞; and an initial ȳ.

1 (primal problem): For each interval with a new integer structure, ȳ:

1a: Solve problem (3) to obtain z(ȳ, θ) such that the tolerance, ε, is satisfied.

1b: If z(ȳ, θ) < z*(θ), update the best upper bound function and the corresponding integer solutions, y*.

2 (master problem): For each interval, solve either (7) or (8) to obtain a lower bound z̲(θ) and the corresponding breakpoints. Define the new set of intervals.

3 (convergence): If, for some interval, z*(θ) ≤ z̲(θ) or the master problem is infeasible, the solution is given by the current solution. Otherwise, return to the primal problem with the new integer solution.


This solution of the master subproblem represents a parametric lower bound. The algorithm stops in an interval if the lower bound z̲(θ) exceeds the upper bound z(ȳ, θ) there, or if the master subproblem is infeasible. The current solution in those intervals is then the final solution. Otherwise, the new vector y, along with the corresponding interval obtained from the solution of the master subproblem, is returned to the primal subproblem.

The main steps of the algorithm are summarized above.

These algorithms have been tested on a number of applications, including process synthesis and planning, heat exchanger network synthesis, multi-objective optimization, and simultaneous product and process design ([1,14,15]).
Introduction


The principle of embedding for nonlinear equations has 
been known for more than 40 years (cf. e. g. [9, Chapt. 
11], and the historical remarks there, [10]). We con- 
sider the nonlinear system of equations 


$$F(x) = 0 \tag{1}$$


defined by F: R^n → R^n (e.g. F ∈ C^2(R^n, R^n)). Equation (1) is embedded by


$$H(x,t) = 0,\qquad t \in [0,1],$$


with the following properties: 


$$H(x^0,0) = 0,\qquad H(x,1) = F(x),$$


where H: R^n × R → R^n, and x^0 ∈ R^n is arbitrarily chosen and fixed.

The following so-called standard embedding is one 
example: 


$$H(x,t) := F(x) + (t-1)F(x^0).$$


Moreover, there are many practical examples, in particular from mechanics and electrical engineering, that depend a priori on one real parameter t. In these cases, the function H is given explicitly. In order to find a solution of (1), we have to find a discretization of the interval [0, 1]:


$$0 = t_0 < \dots < t_i < t_{i+1} < \dots < t_N = 1 \tag{2}$$


and corresponding points (x(t_i), t_i) with H(x(t_i), t_i) = 0, i = 1, ..., N, using the starting point x^0. The main tools for finding such a discretization are path-following methods (also called homotopy methods or continuation methods). The general principle will be explained under the following assumptions:


E1) H ∈ C^2(R^n × R, R^n).

E2) There exists a function x ∈ C^1([t̲, t̄], R^n) such that H(x(t), t) = 0 for all t ∈ [t̲, t̄].

If we denote by r(t) the radius of convergence, for instance of the Newton method solving H(x, t) = 0 in a neighborhood of x(t), then we have (cf. [3])


Theorem 1 Assume E1) and E2). Then there exists a real number r > 0 such that r(t) ≥ r for all t ∈ [t̲, t̄].


Path-following methods are realized by so-called predictor-corrector schemes. The main idea of such a scheme is the following: the step t_i → t_{i+1}, beginning at some t with known x(t), is the predictor, and the Newton method is the corrector. More precisely, the known point (x^i, t_i) serves as an approximation of (x(t_i), t_i). We then compute an approximation (x^{i+1}, t_{i+1}) of (x(t_{i+1}), t_{i+1}) by a finite number of Newton steps, where t_{i+1} has to be chosen so that x^i lies in the convergence region of the problem

$$H(x, t_{i+1}) = 0.$$

By Theorem 1, this is the case if |t_{i+1} − t_i| is chosen sufficiently small.

For example, consider H(x, t) = 3/x + t − 4.5. The point (x^i, t_i) = (0.7499, 0.5) is an approximation of (x(t_i), t_i). For t_{i+1} = 0.6, x^i lies in the convergence region of H(x, 0.6) = 0, and Newton's method started at x^i converges to x^{i+1} = 0.7692.
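The corrector step of this example can be reproduced directly. For H(x, 0.6) = 3/x − 3.9, the Newton iteration simplifies to x ← 2x − 1.3x². The error e = 1 − 1.3x then satisfies e ← e², so the convergence region is 0 < x < 2/1.3 ≈ 1.54, and the point x^i = 0.7499 lies well inside it. The function names in this short sketch are illustrative.

```python
def newton(h, dh, x, tol=1e-12, max_iter=50):
    """Newton corrector for a scalar equation h(x) = 0, started at x."""
    for _ in range(max_iter):
        step = h(x) / dh(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton corrector did not converge")


def H(x, t):
    return 3.0 / x + t - 4.5


def dH_dx(x, t):
    return -3.0 / x**2


x_i, t_next = 0.7499, 0.6            # known point at t_i = 0.5, next parameter
x_next = newton(lambda x: H(x, t_next), lambda x: dH_dx(x, t_next), x_i)
print(round(x_next, 4))              # 0.7692, i.e. x(0.6) = 1/1.3
```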

Let t̄ be the endpoint of the scheme. In the case [t̲, t̄] = [0, 1], we obtain, in a finite number of steps, a point lying within the radius of convergence r(1) and having at least superlinear convergence.
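A minimal predictor-corrector scheme for the standard embedding H(x, t) = F(x) + (t − 1)F(x^0) fits in a few lines. The sketch below assumes a scalar equation, a uniform discretization (2) with a fixed number of steps, and the simplest predictor, which reuses the previous point. Practical codes adapt the step length t_{i+1} − t_i and use tangent predictors. The test function F(x) = x³ + x − 10 is a hypothetical choice with the single root x = 2 and F′ > 0, so the path x(t) exists on all of [0, 1].

```python
def follow_standard_embedding(F, dF, x0, steps=20, tol=1e-12, max_newton=20):
    """Trace H(x, t) = F(x) + (t - 1) F(x0) = 0 from t = 0 to t = 1 (scalar x).

    Predictor: the previous point x^i is reused at t_{i+1} (zero-order predictor).
    Corrector: Newton's method on H(., t_{i+1}) = 0; note dH/dx = dF/dx.
    """
    F0 = F(x0)
    x = x0
    path = [(0.0, x0)]
    for i in range(1, steps + 1):
        t = i / steps                      # uniform discretization (2)
        for _ in range(max_newton):
            step = (F(x) + (t - 1.0) * F0) / dF(x)
            x -= step
            if abs(step) < tol:
                break
        else:                              # no break: corrector failed
            raise RuntimeError(f"corrector failed at t = {t}; use more steps")
        path.append((t, x))
    return path


# F(x) = x^3 + x - 10 has the root x = 2; start the path at x0 = 0.
path = follow_standard_embedding(lambda x: x**3 + x - 10, lambda x: 3 * x**2 + 1, 0.0)
t_end, x_end = path[-1]
print(t_end, round(x_end, 10))             # 1.0 2.0
```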

Concerning parameter-dependent equations we re- 
fer to [1] and the very good and informative bibliogra- 
phy therein. 

Now we consider a nonlinear optimization problem:

$$\min\{f(x):\ h_i(x) = 0,\ i \in I,\ g_j(x) \le 0,\ j \in J\} \tag{P}$$


(where I := {1, ..., m} and J := {1, ..., s}, and we assume that f, h_i, g_j ∈ C^k(R^n, R), i ∈ I, j ∈ J, where k ≥ 1 will be specified later) and the corresponding Karush-Kuhn-Tucker system (shortly KKT system)
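$$
\begin{aligned}
Df(x) + \sum_{i\in I}\lambda_i\, Dh_i(x) + \sum_{j\in J}\mu_j\, Dg_j(x) &= 0,\\
h_i(x) &= 0,\quad i \in I,\\
g_j(x) &\le 0,\quad j \in J,\\
\mu_j &\ge 0,\quad j \in J,\\
\mu_j\, g_j(x) &= 0,\quad j \in J
\end{aligned}
\tag{3}
$$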


(here Df(x) := (∂f/∂x_1, ..., ∂f/∂x_n) is a row vector).

In the literature, we can distinguish two general approaches to realizing the idea explained above for nonlinear optimization problems.

• First approach: Transform the KKT system into a system of equations or into a generalized equation.
• Second approach: Embed (P) in a one-parametric optimization problem.

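In the first approach, both transformations are well known. The KKT system (3) is a mixture of equations and inequalities, and its reformulation as a system of equations is quite simple. For y ∈ R we define

$$y^{+} := \max\{y, 0\},\qquad y^{-} := \min\{y, 0\}.$$

We can formulate the KKT system as an equivalent system of equations:

$$
H(x,y) := \begin{pmatrix}
Df(x) + \sum_{i\in I} y_i\, Dh_i(x) + \sum_{j\in J} y_{m+j}^{+}\, Dg_j(x)\\
h_i(x),\quad i \in I\\
y_{m+j}^{-} - g_j(x),\quad j \in J
\end{pmatrix} = 0, \tag{4}
$$

with y = (y_1, ..., y_{m+s}) ∈ R^{m+s}. The multipliers of (3) are recovered as λ_i = y_i and μ_j = y_{m+j}^+.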

(cf. e.g. [3, Ref. 134 and 135]). H(x, y) is called a Kojima function. We refer to the interesting article [7] and the references therein. Here, f, h_i, g_j ∈ C^{1,1}(R^n, R), i ∈ I, j ∈ J (i.e., the first derivatives exist and are locally Lipschitz), and two embeddings (parametrizations) H(x, y, t) = 0 are investigated. In particular, this permits new interpretations of the related solution methods (penalty and a new barrier method) and allows for estimates of the solutions by using implicit function theorems.
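Evaluating a Kojima function is straightforward once the multiplier vector is laid out. The sketch below assumes that y stacks the m equality multipliers first, followed by s variables for the inequalities, so that μ_j = y_{m+j}^+ and g_j(x) = y_{m+j}^−. With this choice, three KKT conditions hold automatically at every zero of H: nonnegativity of μ_j, feasibility g_j(x) ≤ 0, and complementarity μ_j g_j(x) = 0. The function name and the example problem are hypothetical.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def kojima(x: Vec, y: Vec,
           grad_f: Callable[[Vec], Vec],
           h: Sequence[Callable[[Vec], float]],
           grad_h: Sequence[Callable[[Vec], Vec]],
           g: Sequence[Callable[[Vec], float]],
           grad_g: Sequence[Callable[[Vec], Vec]]) -> Vec:
    """Kojima function: y holds m equality multipliers, then s inequality variables."""
    m, s = len(h), len(g)
    if len(y) != m + s:
        raise ValueError("y must have length m + s")
    mu = [max(v, 0.0) for v in y[m:]]        # y^+ plays the role of mu_j
    gval = [min(v, 0.0) for v in y[m:]]      # y^- must equal g_j(x)
    stat = list(grad_f(x))
    for i in range(m):
        stat = [a + y[i] * b for a, b in zip(stat, grad_h[i](x))]
    for j in range(s):
        stat = [a + mu[j] * b for a, b in zip(stat, grad_g[j](x))]
    eqs = [hi(x) for hi in h]
    ineqs = [gval[j] - g[j](x) for j in range(s)]
    return stat + eqs + ineqs


# min x1^2 + x2^2  s.t.  g(x) = 1 - x1 - x2 <= 0;  KKT point x = (0.5, 0.5), mu = 1.
args = (lambda x: [2 * x[0], 2 * x[1]], [], [],
        [lambda x: 1 - x[0] - x[1]], [lambda x: [-1.0, -1.0]])
print(kojima([0.5, 0.5], [1.0], *args))   # [0.0, 0.0, 0.0]
print(kojima([0.5, 0.5], [-1.0], *args))  # [1.0, 1.0, -1.0]: not a zero of H
```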
The KKT system can also be reformulated as a k-times (k ≥ 1) continuously differentiable nonlinear parameter-dependent system of equations (cf. e.g. [3, Ref. 62] and the references there). Then we can use standard methods for parameter-dependent equations (cf. the literature mentioned above). Furthermore, we can also consider the KKT system as a generalized equation (cf. [3, Ref. 193] and the references there) and use the corresponding path-following methods (cf. [3, Ref. 177]).

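Before we come to the second approach, we introduce a general one-parametric nonlinear optimization problem

$$P(t):\qquad \min\{f(x,t):\ x \in M(t)\},\qquad t \in \mathbb{R}\ (\text{resp. } t \in [0,1]),$$

$$M(t) = \{x \in \mathbb{R}^n:\ h_i(x,t) = 0,\ i \in I,\ g_j(x,t) \le 0,\ j \in J\},$$

where f, h_i, g_j ∈ C^k(R^n × R, R), i ∈ I, j ∈ J, k ≥ 2, and I := {1, ..., m}, J := {1, ..., s}.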

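Furthermore, we introduce the following notation:

$$
\begin{aligned}
\Sigma_{gc} &:= \{(x,t) \in \mathbb{R}^n \times \mathbb{R}:\ x \text{ is a generalized critical point of } P(t)\},\\
\Sigma_{stat} &:= \{(x,t) \in \mathbb{R}^n \times \mathbb{R}:\ x \text{ is a stationary point of } P(t)\},\\
\Sigma_{loc} &:= \{(x,t) \in \mathbb{R}^n \times \mathbb{R}:\ x \text{ is a local minimizer of } P(t)\}
\end{aligned}
$$

(a generalized critical (g.c.) point is defined, for instance, in [3]).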

We adapt the concept of embedding to problem (P) by using a problem P(t) with at least the following properties:
A1) A local minimizer x^0 of P(0) is known;
A2) P(1) is equivalent to (P).
The conditions A1) and A2) are fulfilled if we choose, e.g., P(t) defined by

$$
\begin{aligned}
f(x,t) &:= t f(x) + (1-t)\,\|x - x^0\|^2,\\
h_i(x,t) &:= h_i(x) + (t-1)\, h_i(x^0),\quad i \in I,\\
g_j(x,t) &:= g_j(x) + (t-1)\, |g_j(x^0)|,\quad j \in J,
\end{aligned}
$$

where x^0 ∈ R^n is arbitrarily fixed.
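The embedding above can be built from arbitrary callables. At t = 0, x^0 satisfies h_i(x^0, 0) = 0 and g_j(x^0, 0) = g_j(x^0) − |g_j(x^0)| ≤ 0. Moreover, f(·, 0) = ‖· − x^0‖² has x^0 as its unique global minimizer, which is why A1) holds. At t = 1 the original functions are recovered, which gives A2). The helper name and the example problem are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def standard_embedding(f: Callable[[Vec], float],
                       h: Sequence[Callable[[Vec], float]],
                       g: Sequence[Callable[[Vec], float]],
                       x0: Vec) -> Tuple[Callable, Callable, Callable]:
    """Return f(x, t), h(x, t), g(x, t) of the embedding above for a fixed x0."""
    h0 = [hi(x0) for hi in h]            # h_i(x0)
    g0 = [abs(gj(x0)) for gj in g]       # |g_j(x0)|

    def f_t(x: Vec, t: float) -> float:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, x0))
        return t * f(x) + (1.0 - t) * dist2

    def h_t(x: Vec, t: float) -> Vec:
        return [hi(x) + (t - 1.0) * c for hi, c in zip(h, h0)]

    def g_t(x: Vec, t: float) -> Vec:
        return [gj(x) + (t - 1.0) * c for gj, c in zip(g, g0)]

    return f_t, h_t, g_t


# Hypothetical problem: min x1 + x2  s.t.  x1^2 + x2^2 - 1 = 0,  x1 - 0.5 <= 0.
f_t, h_t, g_t = standard_embedding(lambda x: x[0] + x[1],
                                   [lambda x: x[0] ** 2 + x[1] ** 2 - 1.0],
                                   [lambda x: x[0] - 0.5],
                                   x0=[3.0, -1.0])
print(f_t([3.0, -1.0], 0.0), h_t([3.0, -1.0], 0.0), g_t([3.0, -1.0], 0.0))
# 0.0 [0.0] [0.0]: x0 is feasible for P(0) and minimizes f(., 0)
```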

Here, too, we have to find a discretization (2) of the interval [0, 1] and corresponding points (x(t_i), t_i) ∈ Σ_loc (Σ_stat or Σ_gc), i = 1, ..., N. We can also use a predictor-corrector scheme applied either to the full problem P(t) in some interval [t̲, t̄] ⊂ [0, 1] (a modification of the above theorem holds, cf. [3]) or to a finite number of reduced problems with equality constraints only:


$$\tilde P(t):\qquad \min\{f(x,t):\ h_i(x,t) = 0,\ i \in I,\ g_j(x,t) = 0,\ j \in J_0\},\qquad t \in [t_s, t_{s+1}],$$


where J_0 is the index set of active constraints, which is constant in a certain interval [t_s, t_{s+1}] ⊂ [0, 1]. We call this procedure an active index set strategy. To show how difficult it is to find such a discretization (2), we give a very brief summary of two classes of functions (f, H, G), where H = (h_1, ..., h_m)^T and G = (g_1, ..., g_s)^T. These are the Jongen-Jonker-Twilt class ℱ (cf. [3] and the original articles [3, Ref. 115 and 118]) and the Kojima-Hirabayashi class (cf. [3, Ref. 135]).
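For a reduced problem with equality constraints only, the corrector can work on the Lagrange system of P̃(t). The sketch below uses a hypothetical instance that does not come from the source: n = 2, one equality constraint, and no active inequalities (J_0 = ∅). It is obtained from the embedding above with f(x) = x_1⁴ + x_2², h(x) = x_1 + x_2 − 2 and x^0 = 0, so that h(x, t) = x_1 + x_2 − 2t. At each t_i, Newton's method is applied to D_x L = 0, h = 0. Along the way, the sketch checks that the Hessian of the Lagrangian restricted to the tangent space is positive, which for this instance corresponds to a nondegenerate local minimizer (condition (8) below).

```python
from typing import List

Vec = List[float]


def solve3(A: List[Vec], b: Vec) -> Vec:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            factor = M[r][c] / M[c][c]
            M[r] = [u - factor * v for u, v in zip(M[r], M[c])]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def residual(z: Vec, t: float) -> Vec:
    """Lagrange system of the reduced problem: grad_x L = 0 and h(x, t) = 0."""
    x1, x2, lam = z
    return [4 * t * x1**3 + 2 * (1 - t) * x1 + lam,   # dL/dx1
            2 * x2 + lam,                             # dL/dx2: 2t x2 + 2(1-t) x2
            x1 + x2 - 2 * t]                          # h(x, t) = h(x) + (t-1) h(x0)


def jacobian(z: Vec, t: float) -> List[Vec]:
    x1 = z[0]
    return [[12 * t * x1**2 + 2 * (1 - t), 0.0, 1.0],
            [0.0, 2.0, 1.0],
            [1.0, 1.0, 0.0]]


def follow(steps: int = 10, tol: float = 1e-12) -> Vec:
    z = [0.0, 0.0, 0.0]                   # x = x0 = (0, 0), lambda = 0 solves P~(0)
    for i in range(1, steps + 1):
        t = i / steps
        for _ in range(30):               # Newton corrector, started at the old point
            d = solve3(jacobian(z, t), [-r for r in residual(z, t)])
            z = [u + v for u, v in zip(z, d)]
            if max(abs(v) for v in d) < tol:
                break
        else:
            raise RuntimeError(f"corrector failed at t = {t}")
        # V = (1, -1) spans the tangent space {xi: xi1 + xi2 = 0}; V^T D2L V > 0
        # means a nondegenerate local minimizer of P~(t) in the sense of (8).
        if 12 * t * z[0] ** 2 + 2 * (1 - t) + 2.0 <= 0:
            raise RuntimeError(f"degenerate point at t = {t}")
    return z


x1, x2, lam = follow()
print(x1, x2, lam)    # approximate KKT point of min x1^4 + x2^2 s.t. x1 + x2 = 2
```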


Generic Singularities, the Approach via Piecewise 
Differentiability and Topological Stability 


First, we introduce two well-known constraint qualifications.

The linear independence constraint qualification (LICQ) is satisfied at x̄ ∈ M(t̄) if the vectors D_x h_i(x̄, t̄), i ∈ I, and D_x g_j(x̄, t̄), j ∈ J_0(x̄, t̄), are linearly independent, where J_0(x̄, t̄) := {j ∈ J: g_j(x̄, t̄) = 0}.

The Mangasarian-Fromovitz constraint qualification (MFCQ) is satisfied at x̄ ∈ M(t̄) if:

MF1) D_x h_i(x̄, t̄), i ∈ I, are linearly independent;
MF2) there exists a vector ξ ∈ R^n with

$$D_x h_i(\bar x,\bar t)\,\xi = 0,\ i \in I,\qquad D_x g_j(\bar x,\bar t)\,\xi < 0,\ j \in J_0(\bar x,\bar t).$$
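Both constraint qualifications can be checked numerically at a given point. LICQ is a rank test. MFCQ additionally needs a vector ξ, which in general is found by solving a linear program, so the hypothetical helper below only verifies a supplied candidate ξ. The example shows the standard situation in which MFCQ holds although LICQ fails: three active gradients in R².

```python
from typing import Sequence

Row = Sequence[float]


def rank(rows: Sequence[Row], tol: float = 1e-10) -> int:
    """Numerical rank of a list of row vectors (Gaussian elimination)."""
    M = [list(r) for r in rows]
    if not M:
        return 0
    r = 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = max(range(r, len(M)), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < tol:
            continue                      # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            fct = M[i][c] / M[r][c]
            M[i] = [a - fct * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def dot(a: Row, b: Row) -> float:
    return sum(u * v for u, v in zip(a, b))


def licq(dh: Sequence[Row], dg_active: Sequence[Row]) -> bool:
    """LICQ: the gradients D_x h_i and D_x g_j (j active) are linearly independent."""
    rows = list(dh) + list(dg_active)
    return rank(rows) == len(rows)


def mfcq_with(dh: Sequence[Row], dg_active: Sequence[Row], xi: Row,
              tol: float = 1e-10) -> bool:
    """MF1 plus a check that the supplied xi certifies MF2."""
    mf1 = rank(dh) == len(dh)
    mf2 = (all(abs(dot(r, xi)) <= tol for r in dh)
           and all(dot(r, xi) < -tol for r in dg_active))
    return mf1 and mf2


# At x = (0, 0): active constraints -x1 <= 0, -x2 <= 0 and -x1 - x2 <= 0.
active = [[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]]
print(licq([], active))                    # False: three gradients in R^2
print(mfcq_with([], active, [1.0, 1.0]))   # True: xi = (1, 1) points strictly inward
```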


We consider the Kojima function H(x, y, t) corresponding to P(t). H(x, y, t) is piecewise continuously differentiable (shortly PC^1) (see [4]). The classical definition of a regular value of a continuously differentiable function is generalized to PC^1 functions. Furthermore, it is shown that, if 0 is a regular value of H, then the set H^{-1}(0) is a PC^1 manifold (cf. [3, Ref. 135]).
Next, we cite our short characterization of the class ℱ introduced by H.Th. Jongen, P. Jonker and F. Twilt. The space C^3(R^n × R, R) will be endowed with the strong (or Whitney) C^3-topology, the C^3-topology of the product of a finite number of copies of C^3(R^n × R, R) being the induced product topology. A typical C^3 base-neighborhood N_ε of the zero function in C^3(R^n × R, R) is induced by means of a continuous positive function ε: R^n × R → R as follows:

$$
N_\varepsilon = \Big\{\varphi \in C^3(\mathbb{R}^n\times\mathbb{R},\mathbb{R}):\ |\varphi(z)| + \sum_{i}\Big|\frac{\partial\varphi}{\partial z_i}(z)\Big| + \sum_{i,j}\Big|\frac{\partial^2\varphi}{\partial z_i\,\partial z_j}(z)\Big| + \sum_{i,j,k}\Big|\frac{\partial^3\varphi}{\partial z_i\,\partial z_j\,\partial z_k}(z)\Big| < \varepsilon(z)\ \text{for all } z \in \mathbb{R}^{n+1}\Big\},
$$

where z = (x, t) ∈ R^n × R. A typical C^3 base-neighborhood of f ∈ C^3(R^n × R, R) will be the set f + N_ε.

The local structure of Σ_gc is completely described if (f, H, G) belongs to a C^3-open and dense subset ℱ of C^3(R^n × R, R)^{1+m+s}.

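If (f, H, G) ∈ ℱ, then Σ_gc can be divided into five types:

• Type 1: A point z̄ = (x̄, t̄) ∈ Σ_gc is of Type 1 if the following conditions are satisfied:

$$D_x f + \sum_{i\in I}\lambda_i\, D_x h_i + \sum_{j\in J_0(\bar z)}\mu_j\, D_x g_j = 0 \ \text{ at } \bar z, \tag{5}$$
$$\text{the LICQ is satisfied at } \bar x \in M(\bar t), \tag{6}$$
$$\mu_j \neq 0,\quad j \in J_0(\bar z), \tag{7}$$
$$D_x^2 L(\bar z)\big|_{T(\bar z)} \text{ is nonsingular.} \tag{8}$$

Here, D_x^2 L is the Hessian of the Lagrangian

$$L = f + \sum_{i\in I}\lambda_i h_i + \sum_{j\in J_0(\bar z)}\mu_j g_j,$$

and the uniquely determined numbers λ_i, μ_j are taken from (5). Furthermore,

$$T(\bar z) = \{\xi \in \mathbb{R}^n:\ D_x h_i(\bar z)\,\xi = 0,\ i \in I,\ D_x g_j(\bar z)\,\xi = 0,\ j \in J_0(\bar z)\}$$

is the tangent space at z̄. D_x^2 L(z̄)|_{T(z̄)} represents V^T D_x^2 L(z̄) V, where V is a matrix whose columns form a basis of T(z̄).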

A point of Type 1 is called a nondegenerate critical point. The set Σ_gc is the closure of the set of all points of Type 1, and the points of Types 2 to 5 constitute a discrete subset of Σ_gc. The points of Types 2 to 5 represent three basic degeneracies:

• Type 2: violation of (7).
• Type 3: violation of (8).
• Type 4: violation of (6) and |I| + |J_0(z̄)| − 1 < n.
• Type 5: violation of (6) and |I| + |J_0(z̄)| = n + 1.

[Figure: Parametric Optimization: Embeddings, Path Following and Singularities, Figure 1. Local structure of Σ_gc near points of Types 1 to 5; panel labels include "MFCQ holds" and "MFCQ violated".]

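For each of these five types, Fig. 1 illustrates the local structure of Σ_gc. Let Σ_gc^v, v ∈ {1, ..., 5}, be the set of g.c. points of Type v. Figure 2 illustrates the local structure of Σ_loc and Σ_stat for (f, H, G) ∈ ℱ. The class ℱ is defined by

$$\mathcal{F} := \Big\{(f, H, G) \in C^3(\mathbb{R}^n\times\mathbb{R},\mathbb{R})^{1+m+s}:\ \Sigma_{gc} = \bigcup_{v=1}^{5}\Sigma_{gc}^{v}\Big\}.$$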

The following theorem (cf. [4, Ref. 15]) provides a special perturbation of (f, H, G) such that the perturbed function vector belongs to the class ℱ. The perturbation uses additional parameters that can be chosen arbitrarily small.

Theorem 2 Let (f, H, G) ∈ C^3(R^n × R, R^{1+m+s}). Then, for almost all (b, A, c, D, e, F) ∈ R^n × R^{n×n} × R^m × R^{m×n} × R^s × R^{s×n}, we have

$$\big(f(x,t) + b^{T}x + x^{T}Ax,\ H(x,t) + c + Dx,\ G(x,t) + e + Fx\big) \in \mathcal{F}.$$

Here 'almost all' means that each measurable subset of

$$\big\{(b, A, c, D, e, F):\ \big(f(x,t) + b^{T}x + x^{T}Ax,\ H(x,t) + c + Dx,\ G(x,t) + e + Fx\big) \notin \mathcal{F}\big\}$$

has Lebesgue measure zero.


Remark 3 There is also a perturbation theorem for the Kojima-Hirabayashi class with a linear perturbation in the objective and scalar perturbations in the constraints (cf. [3, Ref. 135]).

Remark 4 (cf. [3]) Considering Σ_stat, we note that the condition (f, H, G) ∈ ℱ implies that zero is a regular value of the Kojima mapping H.


Definition 5 Let K ⊂ R ∪ {±∞}.

i) The problem P(t) is called regular in the sense of Jongen-Jonker-Twilt (briefly: JJT-regular) with respect to K if (f, H, G) ∈ ℱ(R^n × K), i.e., if

$$\Sigma_{gc} \cap (\mathbb{R}^n \times K) \subset \bigcup_{v=1}^{5}\Sigma_{gc}^{v}.$$

ii) The problem P(t) is called regular in the sense of Kojima-Hirabayashi (briefly: KH-regular) with respect to K if 0 ∈ R^{n+m+s} is a regular value of H|_{R^n × R^m × R^s × K}.


Now we present some facts about path-following methods (for more details see [3]). For this, we assume A1), A2) and (f, H, G) ∈ ℱ. The algorithm PATH III (cf. [3, Chap. 4]) computes a numerical description of a compact connected component in Σ_gc and Σ_stat, respectively.

In the last part of this section, we present two theorems that are essential for analyzing the embeddings considered in the literature and for modifying these embeddings so that they succeed under certain assumptions.


Theorem 6 ([3, Ref. 71]) We assume

C1) M(t) is nonempty and there exists a compact set C with M(t) ⊂ C for all t ∈ [0, 1].
C2) P(t) is KH-regular with respect to [0, 1].
C3) There exist a t_1 > 0 and a continuous function x: [0, t_1) → R^n such that x(t) is the unique stationary point of P(t) for t ∈ [0, t_1).
C4) MFCQ is satisfied for all x ∈ M(t) and all t ∈ [0, 1].

Then there exists a PC^1-path in Σ_stat that connects (x^0, 0) with some point (x*, 1).

[Figure: Parametric Optimization: Embeddings, Path Following and Singularities, Figure 2. Panels show the neighborhoods of points of Types 3, 4 and 5. The full curve stands for the curve of local minimizers and the dotted curve represents a curve of stationary points.]


Applying Remark 4, we obtain

Corollary 7 We assume C1), C3), C4) and
C2′) P(t) is JJT-regular with respect to [0, 1].
Then there exists a PC^1-path K(x^0, 0) in Σ_stat connecting (x^0, 0) with some point (x*, 1). Furthermore, if (x, t) ∈ K(x^0, 0), then (x, t) belongs to ∪_{v∈{1,2,3,5}} Σ_gc^v.


Finally, we present a consequence of a general topological stability result given in [3, Ref. 87]:

Theorem 8 We assume C1) and C4). Then M(t_1) is homeomorphic to M(t_2) for all t_1, t_2 ∈ [0, 1].


Remark 9 We see that the MFCQ plays an important role in both theorems above. If the MFCQ is not satisfied, we cannot expect to attain a point (x, 1) ∈ Σ_stat by path-following methods alone. The reason is that g.c. points of Types 4 and 5 may appear, and the path in Σ_stat may end at such a point.


From this point of view, we refer to the analysis of possible jumps from one connected component to another in Σ_stat and Σ_gc, respectively (cf. [3, Chap. 5]). In the worst case, we have to find all connected components in Σ_gc. Since jumps are not available in all cases, this problem has not been solved yet. This is not surprising. If we could always find a discretization (2) and corresponding points (x^i, t_i) ∈ Σ_loc, i = 1, ..., N, the problem of global optimization would be solved (cf. e.g. [3, Sect. 6.3]).


Concluding Remarks 


i) So far, we have restricted ourselves to parametric optimization problems in R^n. For semi-infinite problems, we refer to [6, pp. 161-176] and the references in that article.

ii) The theory and methods described in the section above are used to analyze and appropriately modify the standard embedding (cf. [6, pp. 59-84]) and the penalty, exact penalty, and Lagrange multiplier embeddings (cf. [4] and [4, Refs. 2, 3, 6]). They are also used to construct new methods [2]. In summary, we now have a clearer picture of what kinds of difficulties (mainly singularities) may appear and how to overcome them in some cases (see also [5]).


iii) We refer to further applications: one-parametric optimization problems arising, for instance, in practical problems, multi-objective optimization, stochastic optimization, etc. (cf. [3, Chap. 1]). Of course, on the one hand, the applications described there are not complete and, on the other hand, several fields are developing rapidly.


iv) Finally, we would like to refer to so-called interior point methods, which are path-following methods, too (cf. e.g. [8]).


See also 


> Bounds and Solution Vector Estimates for 
Introduction


Peptide and protein identification is of fundamental importance in the study of proteomics. Tandem mass spectrometry (MS/MS) coupled with high-performance liquid chromatography (HPLC) has emerged as a powerful protocol for high-throughput and high-sensitivity peptide and protein identification experiments. A single mass spectrum contains an extensive amount of sequence information. This has motivated the recent development of numerous computational approaches formulated to sequence peptides robustly and efficiently from tandem MS data, with particular emphasis on integrating these algorithms into a high-throughput computational framework. The two most frequently reported computational approaches in the literature are (a) de novo and (b) database search methods, both of which can use deterministic, probabilistic and/or stochastic solution techniques.

The majority of peptide identification methods 
used in practice are database search methods [2,11,19, 
21,30,31,32,36,37] due to their accuracy and ease of 
implementation. A variety of techniques for peptide 
identification via databases are currently available. 
One approach, as implemented in the SEQUEST al- 
gorithm [11,36,37], uses cross-correlation to mathe- 
matically determine the similarity between a theoret- 
ical tandem mass spectrum predicted from a peptide 
sequence in the database and the experimental tan- 
dem mass spectrum under investigation. The more fre- 
quently used technique, known as probability-based 
matching, utilizes a probabilistic model to determine 
whether an ion peak match between the experimen- 
tal and theoretical tandem mass spectrum is actual or 
random [2,19,30,32]. Various models have been formu- 
lated for this purpose, ranging from a likelihood ratio 
hypothesis test [2,19] to the null hypothesis that peptide 
matches are random [32]. The major limitation of database methods is that they are ineffective if the database in which the search is conducted does not contain the protein responsible for generating the tandem mass spectrum.

De novo methods have an advantage over database techniques because they can be used to find novel proteins and amino acid mutations and to study the proteome before the genome. A prominent methodology for the de novo peptide identification problem is the spectrum graph approach [3,4,5,6,8,12,18,22,25,34,35]. In the majority of spectrum graph representations, the peaks in the tandem mass spectrum are translated into nodes on a directed graph, where two nodes are connected by an edge if the mass difference between them is equal to the weight of an amino acid. The nodes
or edges of the spectrum graph are typically assigned 


scores based on empirically-derived weights. Various 
alternative techniques to the spectrum graph approach 
have also been developed. For example, the de novo 
algorithm PEAKS [26] generates 10,000 potential se- 
quences using a dynamic programming algorithm and 
then in a subsequent step reevaluates the predicted 
sequences using a stricter confidence scorer. Another 
technique addresses the peptide identification prob- 
lem via stochastic optimization using genetic algo- 
rithms to solve multi-objective models and can em- 
pirically test for independence between scoring func- 
tions [20,27,28]. The algorithm NovoHMM [13] uses 
a hidden Markov model to solve the peptide identifica- 
tion problem, where the observable random variables 
are the observed mass peaks and the hidden variables 
correspond to the unknown peptide sequence. Despite 
the vast potential of de novo methods, they can be com- 
putationally demanding and may exhibit variable pre- 
diction accuracies. 

In this work, we present a novel mixed-integer lin- 
ear optimization (MILP) approach to efficiently ad- 
dress the de novo peptide identification problem so 
as to form a basis for a high-throughput compu- 
tational framework for peptide identification [9,10]. 
This framework is denoted as PILOT, which stands 
for Peptide identification via Mixed-Integer Linear 
Optimization and Tandem mass spectrometry. 


Formulation 


This section provides a thorough description of the mathematical model and algorithmic framework for the de novo identification of peptides from spectra resulting from tandem mass spectrometry.


Mathematical Model for Peptide Identification 


The essential components of the mixed-integer linear 
programming problem formulation are the parameters, 
sets, binary variables, constraint equations and objec- 
tive function [10]. 


Parameters Each tandem mass spectrum of a pep- 
tide contains the mass of the parent peptide, its charge 
state, and a list of mass-to-charge ratios and their corre- 
sponding intensities of the fragment ions of the peptide. 
It is important to note that the mass of the parent peptide and the masses of the ion peaks in the tandem mass spectra are subject to a certain degree of experimental error [8]. The parameters are:

m_p: mass of the parent peptide
mass(ion peak i): mass of ion peak i
λ_i: intensity of ion peak i
Set Definitions The first set to consider is the mass difference between all of the peaks in the tandem mass spectrum, which we denote by the matrix M defined in Eq. (1).

$$M = \{M_{i,j} = \text{mass(ion peak } j) - \text{mass(ion peak } i):\ \text{mass(ion peak } j) > \text{mass(ion peak } i)\} \tag{1}$$

The index i corresponds to the rows and the index j to the columns of the matrix M_{i,j}. The construction of the peptide sequence should be restricted to those peaks whose mass difference is equal to the weight of an amino acid. The indices of these peak pairs are stored in the set S, defined in Eq. (2).

$$S = \{S_{i,j} = (i, j):\ M_{i,j} = \text{mass of an amino acid}\} \tag{2}$$

Thus, the mass difference between peak i and peak j is equal to the weight of some amino acid for every (i, j) ∈ S_{i,j}. The subsequent problem formulation will only be considered over this set, S_{i,j}.
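The sets M and S can be built directly from a peak list. The sketch below makes several assumptions that are not part of the model as stated. It uses nominal (integer) residue masses and a matching tolerance of 0.5 Da instead of exact equality, since peak masses carry experimental error. The input is a hypothetical b-ion ladder for the peptide GAS plus one noise peak. The spurious pair (0, 2), whose difference of 128 Da matches Q or K, illustrates why S alone does not determine the sequence.

```python
from typing import Dict, List, Tuple

# Nominal (integer) residue masses in Daltons; an illustrative assumption.
RESIDUE_MASS: Dict[str, int] = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def mass_differences(peaks: List[float]) -> Dict[Tuple[int, int], float]:
    """M of Eq. (1): M[i, j] = mass(peak j) - mass(peak i) when mass(j) > mass(i)."""
    return {(i, j): mj - mi
            for i, mi in enumerate(peaks)
            for j, mj in enumerate(peaks) if mj > mi}


def amino_acid_pairs(M: Dict[Tuple[int, int], float],
                     tol: float = 0.5) -> Dict[Tuple[int, int], List[str]]:
    """S of Eq. (2): pairs whose difference matches a residue mass (within tol)."""
    S = {}
    for pair, d in M.items():
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(d - m) <= tol]
        if matches:
            S[pair] = matches
    return S


# b-ion ladder of the hypothetical peptide GAS (1 Da start), plus a noise peak.
peaks = [1.0, 58.0, 129.0, 216.0, 300.0]
S = amino_acid_pairs(mass_differences(peaks))
print(S)
# {(0, 1): ['G'], (0, 2): ['Q', 'K'], (1, 2): ['A'], (2, 3): ['S']}
```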

The physical relationships between fragment ions can be used to construct additional sets. The sequencing of a candidate peptide must be done using ions from the same ion series (i.e., b, y, etc.). Although the ion type of a given mass peak is not known a priori, there are important relationships among the different ions. As the charged parent peptide undergoes collision-induced dissociation (CID), it primarily fragments into a pair of ions: either a and x, b and y, or c and z. By definition, the two ions in each of these pairs are complementary. These ion pairs are easily identified a priori, since the masses of two complementary ions sum to the experimentally determined weight of the parent peptide, m_p, plus 2 Daltons for the protons carried by the two singly charged fragments. The indices of peak pairs that satisfy this relationship are stored in the set C, defined in Eq. (3).

$$C = \{C_{i,j} = (i, j):\ \text{mass(ion peak } i) + \text{mass(ion peak } j) = m_p + 2;\ i \neq j\} \tag{3}$$


This set will be useful for eliminating certain ions in 
the sequencing calculations. However, one should note 
that further fragmentation of these ions is possible 
and frequently observed, which places limitations on 
how many complementary ions are actually detected in 
a tandem mass spectrum. 
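Complementary pairs can be detected in the same way. The sketch below uses the same assumptions (nominal masses and a 0.5 Da tolerance). It stores each pair once with i < j, which is enough because constraint (6) below is symmetric in i and j. For the hypothetical peptide GAS with m_p = 233 Da, the pairs b1/y2 and b2/y1 both sum to 235 Da.

```python
from typing import List, Set, Tuple


def complementary_pairs(peaks: List[float], parent_mass: float,
                        tol: float = 0.5) -> Set[Tuple[int, int]]:
    """C of Eq. (3) as unordered pairs i < j with mass(i) + mass(j) = m_p + 2."""
    return {(i, j)
            for i in range(len(peaks))
            for j in range(i + 1, len(peaks))
            if abs(peaks[i] + peaks[j] - (parent_mass + 2.0)) <= tol}


# Hypothetical peptide GAS: m_p = 57 + 71 + 87 + 18 = 233 Da (nominal).
# Peaks: b1 = 58, y1 = 106, b2 = 129, y2 = 177.
print(sorted(complementary_pairs([58.0, 106.0, 129.0, 177.0], 233.0)))
# [(0, 3), (1, 2)]
```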

Different types of ion series begin and end at different m/z values in the tandem mass spectrum. For instance, a candidate peptide derived using the y-ion series must begin at the weight of water (19 Daltons, i.e., water plus a proton) and terminate at the weight of the parent peptide (m_p + 1). When the same sequence is derived using the b-ion series, the corresponding bounds are zero mass (1 Dalton) and the weight of the parent peptide minus the weight of water (m_p − 17). To model this mathematically, two new sets are created, corresponding to the boundary conditions at the "head" and the "tail" of the peptide. Note that the sets presented below consider only the possibility of b- or y-ions in the candidate sequence.

$$BC^{\mathrm{head}} = \{1,\ 19\}\ \text{Daltons} \tag{4}$$
$$BC^{\mathrm{tail}} = \{m_p - 17,\ m_p + 1\}\ \text{Daltons} \tag{5}$$


Binary Variables Binary {0-1} variables are used in the problem formulation to model which peaks (p_i) and which paths connecting peaks (w_{i,j}) are used in the construction of the candidate sequence. The use of binary variables also allows us to invoke logical inference when formulating the model constraints.

$$p_i = \begin{cases}1, & \text{if peak } i \text{ is selected}\\ 0, & \text{otherwise}\end{cases}$$

$$w_{i,j} = \begin{cases}1, & \text{if peaks } i \text{ and } j \text{ are connected by a path (i.e., } p_i = p_j = 1)\\ 0, & \text{otherwise}\end{cases}$$
Constraints Several constraints derived from ion properties and graph theory are formulated in terms of the binary variables via logical inference. The first constraint exploits two facts: the candidate peptide must be sequenced using ions of the same type, and complementary ions are, by definition, of different types. Thus, suppose peak i is used to construct a candidate sequence and the peak pair (i, j) belongs to the complementary ion set defined in Eq. (3). Then peak j should be eliminated from consideration in the sequencing calculations. This is modeled mathematically by Eq. (6).

$$p_i + p_j \le 1\quad \forall (i, j) \in C_{i,j} \tag{6}$$


An obvious but important constraint to impose on the candidate sequence is that the sum of the weights of its amino acids equals the mass of the parent peptide (m_p) minus the mass of water (18 Daltons). It is well known that the experimentally measured parent peptide mass is subject to a certain degree of experimental error [8], which depends on the resolution of the mass spectrometer used. Thus, exact conservation of mass cannot be achieved and must be relaxed by some tolerance of error, as shown in Eqs. (7) and (8).

$$\sum_{(i,j)\in S_{i,j}} M_{i,j}\, w_{i,j} \le (m_p - 18) + \text{tolerance} \tag{7}$$
$$\sum_{(i,j)\in S_{i,j}} M_{i,j}\, w_{i,j} \ge (m_p - 18) - \text{tolerance} \tag{8}$$

The algorithm typically uses a tolerance of 2 Daltons above and below the parent peptide mass. It is also possible to formulate the tolerance term as a variable and incorporate it into the model so that its value is minimized.

The sequencing of the candidate peptide is best envisioned as connecting peaks in the tandem mass spectrum with paths that correspond to weights of amino acids. To ensure that the selected paths are continuous and non-degenerate, we use the flow conservation law from graph theory, which has been used extensively in process synthesis problems [1,15,16,17,24,29,33], as shown in Eq. (9).

$$\sum_{j:\,(j,i)\in S_{i,j}} w_{j,i} - \sum_{k:\,(i,k)\in S_{i,j}} w_{i,k} = 0\quad \forall i \notin BC^{\mathrm{head}},\ i \notin BC^{\mathrm{tail}} \tag{9}$$

The above constraints ensure that the number of input paths entering a peak is equal to the number of output paths leaving that peak.


To anchor the beginning and end of the candidate sequence, the peaks denoted as the boundary conditions for the sequencing are activated, as shown in Eqs. (10) and (11).

$$\sum_{i\in BC^{\mathrm{head}}}\ \sum_{j:\,(i,j)\in S_{i,j}} w_{i,j} = 1 \tag{10}$$
$$\sum_{j\in BC^{\mathrm{tail}}}\ \sum_{i:\,(i,j)\in S_{i,j}} w_{i,j} = 1 \tag{11}$$


One should note that the existence of certain
boundary condition elements (contained in the sets
$BC^{\text{start}}$ and $BC^{\text{end}}$) is checked by the preprocessing
algorithm (described in a subsequent section), and these
elements can be adjusted if information is missing from the tandem
mass spectrum. Furthermore, these constraints enforce
the non-degeneracy of paths, since only one path can initiate
the sequence and only one path can terminate it.

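The final set of constraints constitutes the mathematical
relationship between the binary variables representing
the peaks, $p_i$, and the paths connecting the
peaks, $w_{i,j}$:

$$\sum_{j:(i,j)\in S_{i,j}} w_{i,j} = p_i \qquad \forall i \notin BC^{\text{end}} \qquad (12)$$

$$\sum_{j:(j,i)\in S_{i,j}} w_{j,i} = p_i \qquad \forall i \notin BC^{\text{start}} \qquad (13)$$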


These constraints ensure that if there exists a path
entering and leaving a peak $k$ (i.e., $w_{i,k} = 1$ and $w_{k,j} =
1$), then peak $k$ will be activated in the construction of
the candidate sequence (i.e., $p_k = 1$). These constraints
also allow the option of removing peaks (and the paths
connected to these peaks) from the sequencing calculations
by simply deactivating the binary variables that
represent those peaks ($p_k$). For instance, this is useful
for eliminating the precursor ion and multiply-charged
ions from consideration.
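To make constraints (6) and (9) through (13) concrete, here is a small Python checker for a fixed assignment of the binary variables. It assumes the selected paths are given as a set of (i, j) pairs with $w_{i,j} = 1$ and the activated peaks as a set of k with $p_k = 1$. The data layout and the function name are illustrative. In PILOT these constraints are handed to the MILP solver rather than tested afterwards.

```python
from collections import Counter


def satisfies_path_constraints(edges, active, peaks, bc_start, bc_end, complementary=()):
    """Check Eqs. (6) and (9)-(13) for selected paths `edges` and active peaks `active`."""
    out_deg = Counter(i for i, _ in edges)  # selected paths leaving each peak
    in_deg = Counter(j for _, j in edges)   # selected paths entering each peak
    p = {k: int(k in active) for k in peaks}

    # Eq. (6): two complementary peaks are never both used.
    if any(p[i] + p[j] > 1 for i, j in complementary):
        return False
    # Eqs. (10)-(11): exactly one path starts and exactly one path ends the sequence.
    if sum(out_deg[i] for i in bc_start) != 1 or sum(in_deg[j] for j in bc_end) != 1:
        return False
    for k in peaks:
        interior = k not in bc_start and k not in bc_end
        if interior and in_deg[k] != out_deg[k]:    # Eq. (9): flow conservation
            return False
        if k not in bc_end and out_deg[k] != p[k]:  # Eq. (12)
            return False
        if k not in bc_start and in_deg[k] != p[k]: # Eq. (13)
            return False
    return True
```

A simple chain 0 → 1 → 2 → 3 with start peak 0 and end peak 3 passes. A branching path, or one that uses both members of a complementary pair, is rejected.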


Objective Function The b- and y-ion peaks are, on
average, the most abundant in intensity throughout the
entire m/z range [23]. Based on this observation, the
objective function is postulated as an explicit function
of the peak intensities in an attempt to maximize the number
of b- and y-ions used in constructing the candidate
sequence.


$$\max_{p_k,\, w_{i,j}} \ \sum_{(i,j)\in S_{i,j}} A_j\, w_{i,j} \qquad (14)$$


Equations (1-14) comprise the entire mathematical 
model for the de novo peptide sequencing problem us- 
ing tandem mass spectrometry. The entire problem for- 
mulation is summarized in problem (P). 


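$$
\begin{aligned}
\max_{p_k,\, w_{i,j}} \quad & \sum_{(i,j)\in S_{i,j}} A_j\, w_{i,j} \\
\text{s.t.} \quad & \sum_{(i,j)\in S_{i,j}} m_{i,j}\, w_{i,j} \le (m_P - 18) + \text{tolerance} \\
& \sum_{(i,j)\in S_{i,j}} m_{i,j}\, w_{i,j} \ge (m_P - 18) - \text{tolerance} \\
& p_i + p_j \le 1 \qquad \forall (i,j) \in C_{i,j} \\
& \sum_{j:(i,j)\in S_{i,j}} w_{i,j} = p_i \qquad \forall i \notin BC^{\text{end}} \\
& \sum_{j:(j,i)\in S_{i,j}} w_{j,i} = p_i \qquad \forall i \notin BC^{\text{start}} \\
& \sum_{i\in BC^{\text{start}}}\ \sum_{j:(i,j)\in S_{i,j}} w_{i,j} = 1 \\
& \sum_{j\in BC^{\text{end}}}\ \sum_{i:(i,j)\in S_{i,j}} w_{i,j} = 1 \\
& \sum_{j:(j,i)\in S_{i,j}} w_{j,i} - \sum_{k:(i,k)\in S_{i,j}} w_{i,k} = 0 \qquad \forall i \notin BC^{\text{start}},\ i \notin BC^{\text{end}} \\
& w_{i,j},\ p_k \in \{0,1\} \qquad \forall (i,j),\ k
\end{aligned}
\qquad \text{(P)}
$$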


The resulting problem (P) is a mixed-integer linear
programming (MILP) problem and can be solved to
optimality using existing methods (e.g., CPLEX [7]). To
generate a rank-ordered list of candidate sequences,
integer cuts are used to prevent previous solutions from
being revisited. That is, for every solution, an integer cut
of the following general form [14] is incorporated into
the model:


$$\sum_{(i,j)\in B} w_{i,j} - \sum_{(i,j)\in NB} w_{i,j} \le |B| - 1 \qquad (15)$$

where $B = \{(i,j) : w_{i,j} = 1\}$ and $NB = \{(i,j) : w_{i,j} = 0\}$ in the current solution, and
$|B|$ is the cardinality of $B$.
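The integer cut (15) can be generated mechanically from each MILP solution. The Python sketch below builds $B$ and $NB$ from a solution, given as a dictionary of 0/1 path values, and evaluates the cut for another assignment. The function names are illustrative. The previous solution violates the cut. Any assignment that differs in at least one path variable satisfies it: either a path of $B$ is switched off, or a path of $NB$ is switched on.

```python
def integer_cut(solution, edges):
    """Return (B, NB, rhs) for the cut sum_B w - sum_NB w <= |B| - 1, Eq. (15)."""
    B = {e for e in edges if solution.get(e, 0) == 1}
    NB = set(edges) - B
    return B, NB, len(B) - 1


def satisfies_cut(w, cut):
    """Evaluate the cut for a 0/1 assignment w (missing edges count as 0)."""
    B, NB, rhs = cut
    lhs = sum(w.get(e, 0) for e in B) - sum(w.get(e, 0) for e in NB)
    return lhs <= rhs


edges = [(0, 1), (1, 2), (0, 2)]
cut = integer_cut({(0, 1): 1, (1, 2): 1}, edges)
print(satisfies_cut({(0, 1): 1, (1, 2): 1}, cut))  # False: previous solution excluded
print(satisfies_cut({(0, 2): 1}, cut))              # True: a different path is allowed
```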


Peptide Identification via Mixed-Integer Optimization, Table 1
Ions identified by the preprocessing algorithm

Ion type | Relation to the $b_2$-ion
$y_{n-2}$ | $m_P + 2 - b_2$
$a_2$ | $b_2 - 28$
 | $AA_1$ or $AA_2$, where $AA_1 + AA_2 + 1 = b_2$
$y_{n-1}$ | $m_P + 2 - AA_1$ or $m_P + 2 - AA_2$, where $AA_1 + AA_2 + 1 = b_2$


Algorithmic Framework 


The overall algorithm PILOT consists of (1) a preprocessing
algorithm used to identify certain peaks
and to validate boundary conditions, (2) a two-stage 
mixed-integer linear optimization framework to ad- 
dress missing ion peaks due to residue-dependent frag- 
mentation characteristics, and (3) a post-processing 
technique for selecting the most probable sequence by 
cross-correlating the theoretical tandem mass spectra of 
the candidate sequences with the experimental tandem 
mass spectrum [9]. 


Preprocessing of Spectral Data Before formulating
the MILP problem, the raw tandem mass spectrum is
analyzed using a preprocessing algorithm to elucidate
key spectral features that can be exploited in the
sequencing calculations. In particular, certain ion peaks
are sought to confirm the proposed boundary conditions
previously mentioned. First, the raw spectrum is
examined for the existence of the $b_2$-ion, which is
typically of high intensity [23] and whose validity can be confirmed
by its complementary $y_{n-2}$-ion. If the corresponding
$y_{n-2}$-ion also exists in the spectrum, then the two possible
$y_{n-1}$-ions are back-calculated using the mass of the
parent peptide ($m_P$) and the weights of the amino acids
that constitute the $b_2$-ion (see Table 1), and the spectrum
is searched again to confirm these ions.
Every $b_2$-ion amino acid pair found by the algorithm
is then ranked based on the intensities of its
supporting ions (i.e., $y_{n-2}$, $y_{n-1}$, $a_2$, and their isotopic
offsets and neutral losses of water and ammonia).
The highest-scoring $b_2$-ion amino acid pair is then
selected to represent the N-terminal boundary conditions
for the sequencing calculations. The preprocessing
algorithm is also used to identify the C-terminal amino
acid based on known ion peaks. For example, if the peptide
is a result of tryptic digestion, then its C-terminal
amino acid must be lysine (indicated by an ion peak
at 147 Da) or arginine (indicated by an ion peak at
175 Da). Multiply-charged ions and immonium ions
are also searched for in the experimental tandem mass
spectrum to guide the interpretation process.
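The back-calculations of Table 1 are simple arithmetic. The Python sketch below evaluates them for a proposed $b_2$-ion using nominal masses. It also computes the singly protonated y1-ion masses that indicate a C-terminal lysine or arginine. The ion-type labels follow our reading of the text, and the helper names are hypothetical. The third row of Table 1 is omitted because its ion type is not given.

```python
WATER = 18.01056  # monoisotopic mass of H2O (Da)
PROTON = 1.00728  # mass of a proton (Da)


def b2_related_ions(b2, aa1, aa2, m_p):
    """Nominal masses related to a b2-ion via Table 1 (requires AA1 + AA2 + 1 = b2)."""
    if abs(aa1 + aa2 + 1 - b2) > 0.5:
        raise ValueError("residue masses are inconsistent with the b2-ion")
    return {
        "y(n-2)": m_p + 2 - b2,
        "a2": b2 - 28,  # loss of CO from the b2-ion
        "y(n-1)": (m_p + 2 - aa1, m_p + 2 - aa2),
    }


def y1_mass(residue_mass):
    """Singly protonated y1-ion: C-terminal residue + water + proton."""
    return residue_mass + WATER + PROTON


# Peptide PEPTIDE (neutral mass about 799 Da) with b2 = P + E + 1 = 227.
print(b2_related_ions(227, 97, 129, 799))
# {'y(n-2)': 574, 'a2': 199, 'y(n-1)': (704, 672)}
print(round(y1_mass(128.09496)), round(y1_mass(156.10111)))  # 147 175 (K, R)
```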


Missing Ion Peaks: Two-Stage Algorithmic Approach 
In the mixed-integer linear optimization model (see 
problem P), the algorithm attempts to derive sev- 
eral candidate sequences using only single amino acid 
weights to connect mass peaks. However, if the spec- 
trum is missing certain mass peaks then it might not 
be possible to fully construct the correct sequence in 
this fashion. To accommodate this issue, the sequenc- 
ing problem is split into two stages: stage one derives 
the candidate peptides using only single amino acid weights
and stage two allows for the possibility of using two 
amino acid weights. That is to say, in the second stage 
the additional option is available to connect two peaks 
via the weight of any two combined amino acids. In 
this stage the emphasis is again placed on primarily us- 
ing the weights of single amino acids to construct the 
candidate sequences by penalizing the use of combined 
amino acid weights. This is accomplished in the ob- 
jective function by multiplying the path variable cor- 
responding to multiple residues by a penalty weighting 
fraction which is less than one and decreases with increasing
mass error. As a result, the driving force for the
algorithm is the single residue weights, while the dou- 
ble residue weights are utilized only to bridge the gap 
between disjoint single residue segments of the candi- 
date sequence. 
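The text specifies only that the penalty fraction is below one and decreases as the mass error grows. One hypothetical way to realize this, sketched in Python, is a linear decay that reaches zero at the matching tolerance. The functional form, the cap `max_fraction` and the function name are assumptions, not PILOT's actual choice.

```python
def double_residue_weight(intensity, mass_error, tolerance, max_fraction=0.9):
    """Objective coefficient for a two-residue path: intensity times a penalty fraction.

    Hypothetical form: the fraction equals max_fraction (< 1) at zero mass error
    and decreases linearly to 0 when |mass_error| reaches the tolerance.
    """
    if not 0 < max_fraction < 1 or tolerance <= 0:
        raise ValueError("need 0 < max_fraction < 1 and tolerance > 0")
    fraction = max_fraction * max(0.0, 1.0 - abs(mass_error) / tolerance)
    return fraction * intensity
```

A single-residue path would keep its full intensity as the coefficient. The solver therefore prefers single residues and uses doublets only where they bridge a gap.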


Postprocessing: Scoring Candidate Sequences 
A subsequent scoring of the candidate peptide se- 
quences is necessary in order to: (a) assign amino 
acids to regions of the peptide not considered in the 
sequencing calculations due to boundary condition ad- 
justments, (b) resolve doublet and triplet amino acid 
combinations due to missing peaks, and (c) validate the 
“b” or “y” ions used to construct the candidate sequence
by looking for other supporting ions in the raw tandem 
mass spectrum. For cases (a) and (b), the weight in 
the candidate sequence is replaced by permutations of 
amino acids consistent with this mass. This results in 


a superset of candidate sequences whose theoretical
tandem mass spectra can be predicted and compared to 
the experimental tandem mass spectrum for validity. 

Since only the y- or b- ion series were used in con- 
structing the peptide sequences, it would be beneficial 
to utilize various other types of ions when scoring these 
candidate peptides in order to exploit as much infor- 
mation from the tandem mass spectrum as possible. 
In particular, the assignment of a peak as a b- or y-ion
can be confirmed by the existence of supporting
isotopes, neutral losses of small molecules (i.e., -H2O,
-NH3, -H2O-NH3, -H2O-H2O) and multiply-charged
ions. To introduce dependencies among the ion series,
a reward/penalty system was created. For instance,
a match between a predicted y-ion and a peak in the
experimental spectrum is more probable if the
corresponding y-ion's isotopes and offsets are also found in
the experimental spectrum [18]. Thus, a match with
a b- or y-ion is assigned a reward proportional
to the number of its corresponding isotopes
and offsets that also match the experimental spectrum.
Conversely, the existence of isotopic offsets and
neutral loss ions without a corresponding y- or
b-ion is penalized in the score. These conventions
address the likelihood that the peaks used in the
construction of the candidate sequence actually belong to the b- or
y-ion series.
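The reward/penalty convention can be illustrated with a toy scorer. In this Python sketch, each predicted b- or y-ion earns a base point plus a reward for each matched supporting peak: the 13C isotope, the loss of water and the loss of ammonia. Supporting peaks found without their parent ion cost a penalty. The offsets are standard approximate mass differences. The weights, the matching tolerance and the function names are hypothetical and do not reproduce PILOT's scoring function.

```python
# Approximate mass offsets (Da) of supporting peaks relative to a b- or y-ion.
SUPPORT_OFFSETS = (1.00336, -18.01056, -17.02655)  # +13C isotope, -H2O, -NH3


def matches(mass, peaks, tol):
    """True if some observed peak lies within tol of the given mass."""
    return any(abs(mass - peak) <= tol for peak in peaks)


def score_candidate(ion_masses, peaks, tol=0.5, reward=0.5, penalty=0.5):
    """Score predicted b-/y-ion masses against observed peak masses."""
    score = 0.0
    for m in ion_masses:
        n_support = sum(matches(m + off, peaks, tol) for off in SUPPORT_OFFSETS)
        if matches(m, peaks, tol):
            score += 1.0 + reward * n_support  # ion confirmed by its supporting peaks
        else:
            score -= penalty * n_support       # offsets without a parent ion are suspect
    return score


print(score_candidate([100.0, 200.0], [100.0, 101.0, 182.0]))  # 1.0
```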


Cases 


The proposed framework is for the de novo identifica- 
tion of doubly-charged tryptic peptides that were ion- 
ized via electrospray ionization. A comparative study is 
presented with several existing de novo peptide identi- 
fication methods to demonstrate the predictive capabil- 
ities of the proposed framework, PILOT. The de novo 
peptide identification algorithms Lutefisk, LutefiskXP,
PepNovo, PEAKS, EigenMS and NovoHMM were selected
on the basis of availability and reported performance.
Tandem mass spectra from quadrupole time-of-flight mass
spectrometers were analyzed in the comparative study.


Quadrupole Time-of-Flight, QTOF, Spectra 


To test the method’s performance on quadrupole time- 
of-flight tandem mass spectra, we selected an exist- 
ing data set that is publicly available [26]. These spec- 
tra were collected with Q-TOF2 and Q-TOF-Global 
mass spectrometers for a control mixture of four known
proteins: alcohol dehydrogenase (yeast), myoglobin 
(horse), albumin (bovine, BSA), and cytochrome C 
(horse). A total of 38 doubly-charged tryptic peptides 
were examined using Lutefisk, LutefiskXP, PepNovo, 
PEAKS, EigenMS and PILOT [9]. The top-ranked se- 
quence reported from each of these methods is used in 
the comparison. 

The results of the comparison can be summarized 
using various measures. First, consider the overall pep- 
tide identification accuracy of the de novo methods, 
which is reported in Table 2. In terms of correct identi- 
fications, PILOT is superior to the other de novo meth- 
ods with an identification rate of about 66%, followed 
by PEAKS and EigenMS at about 55% and 53%, respectively.
A common limitation of de novo methods is the inability to
assign the correct N-terminal amino acid pair or resolve
isobaric residues (e.g., Q or GA, W or SV).
To accommodate this limitation in the comparison, we 
also reported the percentage of predictions for which 
there are only one, two, or three incorrect amino acid 
assignments in the entire sequence. In Table 2, it is seen 
that allowing for up to three incorrect amino acids in- 
creases the identification rate for all methods on the or- 
der of 30%, which supports the hypothesis that these 
limitations are inherent in all the de novo methods. The 
last entry in Table 2 reports the number of correctly as- 
signed residues normalized by the total number of ac- 
tual residues (which is 418 for the 38 doubly-charged 
peptides considered). PILOT outperforms the other de 
novo methods with a residue accuracy of 91%. 

Another metric of performance is the
percentage of correct continuous subsequences of
a given number of amino acids [13,18]. These trends
are summarized in Fig. 1 for all of the de novo methods.
This metric reveals that although LutefiskXP
has a lower overall identification rate than Lutefisk, it
exhibits much better accuracy over subsequences of
varying length (as shown in Fig. 1). Note that some of
the trends in Fig. 1 exhibit an increase in accuracy for 
correct subsequences greater than 8 consecutive amino 
acids in length. This is because these counts are normal- 
ized by the total number of peptides that are of at least 
the length specified, which decreases from 37 for subse- 
quences of length 8 to 30 for subsequences of length 9 
to 25 for subsequences of length 10. For each of the 38
QTOF peptides, PILOT correctly predicts at least 6 consecutive
amino acids, and it performs consistently better than
the other de novo methods over the entire range of
subsequences considered.

Peptide Identification via Mixed-Integer Optimization, Figure 1
Comparison of correct subsequences of varying length for
quadrupole time-of-flight predictions
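The subsequence metric can be computed as follows. This Python sketch assumes that predicted and actual sequences are compared position by position. Under that rule, an isobaric substitution such as GA for Q shifts the alignment and breaks the run. As described above, the result is normalized by the number of peptides at least as long as the subsequence length. The exact matching rule used in [13,18] may differ, and the function names are illustrative.

```python
def longest_correct_run(predicted, actual):
    """Length of the longest run of consecutive positions where the sequences agree."""
    best = run = 0
    for p, a in zip(predicted, actual):
        run = run + 1 if p == a else 0
        best = max(best, run)
    return best


def subsequence_accuracy(pairs, length):
    """Fraction of peptides with >= `length` residues whose correct run is >= length."""
    eligible = [(p, a) for p, a in pairs if len(a) >= length]
    if not eligible:
        return float("nan")
    hits = sum(longest_correct_run(p, a) >= length for p, a in eligible)
    return hits / len(eligible)


pairs = [("PEPTIDE", "PEPTIDE"), ("EPPTIDE", "PEPTIDE"), ("AK", "QK")]
print([subsequence_accuracy(pairs, n) for n in (1, 5, 6)])  # [1.0, 1.0, 0.5]
```

The shrinking denominator is why a curve can rise for longer subsequences: short peptides simply drop out of the count.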


Conclusions 


A novel mixed-integer linear optimization (MILP) 
framework, PILOT, was proposed for the de novo iden- 
tification of peptides using tandem mass spectrome- 
try data. PILOT is the first reported mixed-integer lin- 
ear optimization formulation for the peptide identi- 
fication problem which can introduce integer cuts to 
generate a rigorous rank-ordered list of candidate se- 
quences, introduce complementary ions directly into 
the sequencing calculations and allow for the error 
tolerance to be introduced as a variable term. For 
a given experimental MS/MS spectrum, PILOT gen- 
erates a rank-ordered list of potential candidate se- 
quences and a cross-correlation technique is employed 
to select the most probable peptide by assessing the 
degree of similarity between the theoretical tandem 
mass spectra of the predicted sequences and the exper- 
imental tandem mass spectrum. A comparative study 
using tandem mass spectra from quadrupole time-of-flight
instruments was presented to benchmark the performance of
the proposed framework against several existing de novo
methods.
Peptide Identification via Mixed-Integer Optimization, Table 2
Identification rates for QTOF spectra

 | Lutefisk | LutefiskXP | PepNovo | PEAKS Online | EigenMS | PILOT
Correct identifications | 10 (0.263) | 9 (0.237) | 16 (0.421) | 21 (0.553) | 20 (0.526) | 25 (0.658)
Within 1 residue | 11 (0.290) | 10 (0.263) | 17 (0.447) | 22 (0.579) | 21 (0.553) | 25 (0.658)
Within 2 residues | | | | | |
Within 3 residues | 23 (0.605) | 25 (0.658) | | 32 (0.842) | 30 (0.790) | 35 (0.921)
Total correct residues | 245 (0.586) | 294 (0.703) | | | |
Introduction


Conjugate-gradient methods represent an important 
class of unconstrained optimization algorithms with 
strong local and global convergence properties and 
modest memory requirements. An excellent survey 
of the development of different versions of nonlinear 
conjugate-gradient methods, with special attention to 
global convergence properties, is presented by Hager 
and Zhang [22]. This family of algorithms includes many
variants, well known in the literature, with important
convergence properties and numerical efficiency. The
purpose of this chapter is to present these algorithms
as well as their performance in solving a large variety of
large-scale unconstrained optimization problems.

For solving the nonlinear unconstrained optimiza- 
tion problem 


$$\min \{ f(x) : x \in \mathbb{R}^n \}, \qquad (1)$$


where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function
bounded from below, starting from an initial guess
$x_0 \in \mathbb{R}^n$, a nonlinear conjugate-gradient method generates
a sequence $\{x_k\}$ as

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (2)$$

where $\alpha_k > 0$ is obtained by line search, and the
directions $d_k$ are generated as

$$d_{k+1} = -g_{k+1} + \beta_k s_k, \qquad d_0 = -g_0. \qquad (3)$$


In (3) $\beta_k$ is known as the conjugate-gradient parameter,
$s_k = x_{k+1} - x_k$ and $g_k = \nabla f(x_k)$. Let $\|\cdot\|$ denote
the Euclidean norm and define $y_k = g_{k+1} - g_k$. The
line search in conjugate-gradient algorithms is often
based on the standard Wolfe conditions:

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho\, \alpha_k\, g_k^T d_k, \qquad (4)$$

$$g_{k+1}^T d_k \ge \sigma\, g_k^T d_k, \qquad (5)$$

where $d_k$ is a descent direction and $0 < \rho < \sigma < 1$.
For some conjugate-gradient algorithms, stronger versions
of the Wolfe conditions are needed to ensure convergence
and to enhance stability. According to the
formula used to compute $\beta_k$, conjugate-gradient
algorithms can be classified as classical, hybrid, scaled,
modified and parametric. In the following we
present these algorithms and emphasize their numerical
performance, measured by the performance profiles of Dolan and Moré, for solving
large-scale unconstrained optimization problems.
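The performance profiles of Dolan and Moré used in this article can be computed in a few lines. For solver $s$ and problem $p$ with cost $t_{p,s}$ (for example, CPU time or number of iterations), form the ratio $r_{p,s} = t_{p,s} / \min_{s'} t_{p,s'}$. The profile $\rho_s(\tau)$ is then the fraction of problems with $r_{p,s} \le \tau$, and a failure counts as an infinite ratio. The Python sketch below computes these values. The function name and the data layout are our assumptions.

```python
import math


def performance_profile(costs, taus):
    """Dolan-More profiles; costs[solver][p] is the cost on problem p (math.inf = failure)."""
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] if math.isfinite(costs[s][p]) else math.inf
                  for p in range(n_problems)]
        profile[s] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile


costs = {"A": [1.0, 2.0, 4.0], "B": [2.0, 1.0, math.inf]}
print(performance_profile(costs, taus=[1.0, 2.0]))
# {'A': [0.6666666666666666, 1.0], 'B': [0.3333333333333333, 0.6666666666666666]}
```

The value at $\tau = 1$ is the fraction of problems on which a solver is (jointly) fastest. The limit for large $\tau$ is the fraction of problems it solves at all.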

The history of the conjugate-gradient method begins
with the seminal paper of Hestenes and Stiefel [23], who
presented an algorithm for solving symmetric, positive-definite
linear algebraic systems. In 1964 Fletcher and
Reeves [18] extended the domain of application of the
conjugate-gradient method to nonlinear problems, thus
starting the nonlinear conjugate-gradient research direction.
The main advantages of the conjugate-gradient
method are its low memory requirements and its convergence
speed. A large variety of nonlinear conjugate-gradient
algorithms are known. For each of them, convergence
results have been proved under mild conditions,
namely Lipschitz and boundedness assumptions.
To prove the global convergence of nonlinear
conjugate-gradient methods, the Zoutendijk condition
is often combined with an analysis showing that the
sufficient descent condition $g_k^T d_k \le -c\, \|g_k\|^2$ holds
for some constant $c > 0$, and that there exists a constant $\delta$
such that $\|d_k\|^2 \le \delta k$.
Often, the convergence analysis of conjugate-gradient
algorithms for general nonlinear functions follows
insights developed by Gilbert and Nocedal [19]. The idea
is to bound the change $u_{k+1} - u_k$ in the normalized
direction $u_k = d_k / \|d_k\|$, which is used to conclude, by
contradiction, that the gradients cannot be bounded
away from zero.


Classical Conjugate-Gradient Algorithms 


These algorithms are defined by (2) and (3), where the
parameter $\beta_k$ is computed as in Table 1. Observe that
these algorithms can be classified as algorithms with
$\|g_{k+1}\|^2$ in the numerator of $\beta_k$ and algorithms with
$g_{k+1}^T y_k$ in the numerator of $\beta_k$.

The FR, CD and DY methods (see the tables for
the definitions of the acronyms used for the algorithms
throughout the text), with $\|g_{k+1}\|^2$ in the numerator
of $\beta_k$, have strong convergence theory, but all
Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Table 1
Classical conjugate-gradient algorithms

No. | Formula | Author(s)
1 | $\beta_k^{HS} = \dfrac{g_{k+1}^T y_k}{y_k^T s_k}$ | Hestenes and Stiefel [23] (HS). The first conjugate-gradient algorithm for linear algebraic systems
2 | $\beta_k^{FR} = \dfrac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$ | Fletcher and Reeves [18] (FR). The first conjugate-gradient algorithm for nonlinear functions
3 | $\beta_k^{PRP} = \dfrac{g_{k+1}^T y_k}{g_k^T g_k}$ | Polak-Ribière [31] and Polyak [32] (PRP)
4 | $\beta_k^{PRP+} = \max\left\{0, \dfrac{g_{k+1}^T y_k}{g_k^T g_k}\right\}$ | Polak-Ribière and Polyak + (PRP+), suggested by Powell [33]
5 | $\beta_k^{CD} = \dfrac{g_{k+1}^T g_{k+1}}{-g_k^T s_k}$ | Conjugate descent (CD), introduced by Fletcher [17]
6 | $\beta_k^{LS} = \dfrac{-g_{k+1}^T y_k}{g_k^T s_k}$ | Liu and Storey [26] (LS)
7 | $\beta_k^{DY} = \dfrac{g_{k+1}^T g_{k+1}}{y_k^T s_k}$ | Dai and Yuan [13] (DY)
these methods are susceptible to jamming: they begin
to take small steps without making any significant
progress toward the minimum. On the other hand, the HS,
PRP and LS methods, with $g_{k+1}^T y_k$ in the numerator
of $\beta_k$, have a built-in restart feature that addresses
the jamming phenomenon. When the step $s_k$ is
small, the factor $y_k = g_{k+1} - g_k$ in the numerator of $\beta_k$
tends to zero. Therefore, $\beta_k$ becomes small and the new
direction $d_{k+1}$ in (3) is essentially the steepest descent
direction $-g_{k+1}$. In other words, the HS, PRP and
LS methods automatically adjust $\beta_k$ to avoid jamming,
and their performance is better than that
of the methods with $\|g_{k+1}\|^2$ in the numerator of $\beta_k$.
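The classical methods differ only in the formula for $\beta_k$, so a single loop can host all of them. The Python sketch below is illustrative and is not the Fortran code benchmarked later. It finds a step satisfying the Wolfe conditions (4) and (5) with a simple bisection search, and it restarts along $-g_{k+1}$ whenever the direction is not a descent direction. It uses the common form $d_{k+1} = -g_{k+1} + \beta_k d_k$ with $d_k$ in the denominators. Since $s_k = \alpha_k d_k$, the HS and DY terms coincide exactly with $\beta_k s_k$ in (3) computed from the $s_k$-based formulas of Table 1. All function names are our own.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def axpy(a, x, y):
    """Return a * x + y for lists x, y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def wolfe_step(f, grad, x, d, rho=1e-4, sigma=0.9, alpha=1.0, max_iter=60):
    """Bisection/expansion search for a step satisfying the Wolfe conditions (4)-(5)."""
    lo, hi = 0.0, math.inf
    fx, slope = f(x), dot(grad(x), d)             # slope = g_k^T d_k < 0
    for _ in range(max_iter):
        x_new = axpy(alpha, d, x)
        if f(x_new) > fx + rho * alpha * slope:    # (4) fails: step too long
            hi = alpha
        elif dot(grad(x_new), d) < sigma * slope:  # (5) fails: step too short
            lo = alpha
        else:
            return alpha
        alpha = 0.5 * (lo + hi) if hi < math.inf else 2.0 * lo
    return alpha


def beta(rule, g_new, g_old, d):
    """Classical beta_k written for d_{k+1} = -g_{k+1} + beta_k d_k."""
    y = [a - b for a, b in zip(g_new, g_old)]
    if rule == "FR":
        return dot(g_new, g_new) / dot(g_old, g_old)
    if rule == "PRP":
        return dot(g_new, y) / dot(g_old, g_old)
    if rule == "PRP+":
        return max(0.0, dot(g_new, y) / dot(g_old, g_old))
    if rule == "HS":
        return dot(g_new, y) / dot(y, d)
    if rule == "DY":
        return dot(g_new, g_new) / dot(y, d)
    raise ValueError(f"unknown rule {rule!r}")


def nonlinear_cg(f, grad, x0, rule="PRP+", tol=1e-8, max_iter=1000):
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:
            break
        if dot(g, d) >= 0:  # safeguard: restart along -g if d is not a descent direction
            d = [-gi for gi in g]
        alpha = wolfe_step(f, grad, x, d)
        x = axpy(alpha, d, x)
        g_new = grad(x)
        d = axpy(beta(rule, g_new, g, d), d, [-gi for gi in g_new])
        g = g_new
    return x


def quad(v):
    return (v[0] - 1) ** 2 + 10 * (v[1] + 2) ** 2


def quad_grad(v):
    return [2 * (v[0] - 1), 20 * (v[1] + 2)]


print(nonlinear_cg(quad, quad_grad, [0.0, 0.0], rule="HS"))  # close to [1.0, -2.0]
```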


Hybrid Conjugate-Gradient Methods 


These algorithms have been devised to exploit the
attractive features of the classical conjugate-gradient
algorithms. They are defined by (2) and (3), where the
parameter $\beta_k$ is as in Table 2. There are two classes of
hybrid algorithms. The first class combines, in a projective
manner, the algorithms having $\|g_{k+1}\|^2$ in the numerator
of $\beta_k$ with the algorithms having $g_{k+1}^T y_k$ in the
numerator of $\beta_k$. The second class, more recently
established, considers convex combinations of the algorithms
with $\|g_{k+1}\|^2$ in the numerator of $\beta_k$ and the algorithms
having $g_{k+1}^T y_k$ in the numerator of $\beta_k$. In general, the
performance of hybrid conjugate-gradient algorithms
is better than that of classical conjugate-gradient
algorithms.
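The hybrid rules of Table 2 are truncations or convex combinations of the classical values. The Python sketch below states them using the standard definitions from the cited papers. It takes the classical $\beta_k$ values as inputs, and `sigma` is the Wolfe parameter of (5) that enters the hDY rule. The function names are illustrative.

```python
def hybrid_betas(b_fr, b_prp, b_hs, b_dy, b_ls, b_cd, sigma=0.9):
    """Truncation-type hybrid parameters built from classical beta_k values."""
    c = (1.0 - sigma) / (1.0 + sigma)
    return {
        "hDY": max(-c * b_dy, min(b_hs, b_dy)),
        "hDYz": max(0.0, min(b_hs, b_dy)),
        "GN": max(-b_fr, min(b_prp, b_fr)),
        "HuS": max(0.0, min(b_prp, b_fr)),
        "TaS": b_prp if 0.0 <= b_prp <= b_fr else b_fr,
        "LS-CD": max(0.0, min(b_ls, b_cd)),
    }


def convex_combination(b_first, b_second, theta):
    """(1 - theta) * b_first + theta * b_second, with theta clipped to [0, 1]."""
    theta = min(1.0, max(0.0, theta))
    return (1.0 - theta) * b_first + theta * b_second


betas = hybrid_betas(b_fr=1.0, b_prp=-0.5, b_hs=2.0, b_dy=0.8, b_ls=0.3, b_cd=1.2)
print(betas["GN"], betas["HuS"], betas["TaS"])  # -0.5 0.0 1.0
```

The clipping in `convex_combination` mirrors the rule used by the convex-combination hybrids: $\theta_k \le 0$ falls back to the first method and $\theta_k \ge 1$ to the second.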


Scaled Conjugate-Gradient Algorithms 


The algorithms in this class generate a sequence $x_k$ of
approximations to the minimum $x^*$ of $f$, in which

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (6)$$

$$d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k, \qquad (7)$$

where $\theta_{k+1}$ is a parameter. The iterative process is
initialized with an initial point $x_0$ and $d_0 = -g_0$. Observe
that if $\theta_{k+1} = 1$, then we get the classical
conjugate-gradient algorithms according to the value of
the scalar parameter $\beta_k$. On the other hand, if $\beta_k = 0$,
then we get another class of algorithms according to the
selection of the parameter $\theta_{k+1}$. Considering $\beta_k = 0$,
there are two possibilities for $\theta_{k+1}$: a positive scalar or
a positive-definite matrix. If $\theta_{k+1} = 1$, then we have the
steepest-descent algorithm. If $\theta_{k+1} = \nabla^2 f(x_{k+1})^{-1}$, or
an approximation of it, then we get the Newton or
the quasi-Newton algorithms, respectively. Therefore,
in the general case, when $\theta_{k+1} \ne 0$ is selected
in a quasi-Newton manner and $\beta_k \ne 0$, (7) represents
a combination of the quasi-Newton and
the conjugate-gradient methods. However, if $\theta_{k+1}$ is
a matrix containing some useful information about the
inverse Hessian of the function $f$, we are better off using
$d_{k+1} = -\theta_{k+1} g_{k+1}$, since the addition of the term $\beta_k s_k$
in (7) may prevent the direction $d_{k+1}$ from being a descent
direction unless the line search is sufficiently
accurate. Therefore, in the following we consider
$\theta_{k+1}$ as a positive scalar which contains some useful
information on the inverse Hessian of the function $f$.

To determine $\beta_k$, consider the following procedure
[1,2,3,4]. The Newton direction for
solving (1) is given by $d_{k+1} = -\nabla^2 f(x_{k+1})^{-1} g_{k+1}$.
Therefore, from the equality

$$-\nabla^2 f(x_{k+1})^{-1} g_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k,$$

multiplying both sides on the left by $s_k^T \nabla^2 f(x_{k+1})$, we get

$$\beta_k = \frac{s_k^T \nabla^2 f(x_{k+1})\, \theta_{k+1} g_{k+1} - s_k^T g_{k+1}}{s_k^T \nabla^2 f(x_{k+1})\, s_k}.$$

Using the Taylor development $\nabla^2 f(x_{k+1}) s_k \approx y_k$, after some algebra, we
get

$$\beta_k = \frac{(\theta_{k+1} y_k - s_k)^T g_{k+1}}{y_k^T s_k}, \qquad (8)$$

where $y_k = g_{k+1} - g_k$. Birgin and Martinez [10], who
first introduced scaled conjugate-gradient algorithms,
arrived at the same formula for $\beta_k$, but using a geometric
interpretation of quadratic function minimization.
The parameter $\beta_k$ in (7) can be defined as in Table 3,
where the scaling parameter $\theta_{k+1}$ is computed as

$$\theta_{k+1} = \frac{s_k^T s_k}{y_k^T s_k}. \qquad (10)$$
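Multiplying (7) on the left by $y_k^T$ shows that $\beta_k$ from (8) gives $y_k^T d_{k+1} = -s_k^T g_{k+1}$ for any $\theta_{k+1}$. For a quadratic $f(x) = \tfrac12 x^T A x$ we have $y_k = A s_k$ exactly. This relation is then the same projected condition $s_k^T A d_{k+1} = -s_k^T g_{k+1}$ that the Newton direction satisfies. The Python sketch below computes the spectral parameter (10), $\beta_k$ from (8) and the direction (7), and checks the identity numerically. The helper names and the test quadratic are our assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spectral_theta(s, y):
    """Spectral scaling parameter theta_{k+1} = s^T s / y^T s, Eq. (10)."""
    return dot(s, s) / dot(y, s)


def scaled_perry_beta(theta, s, y, g_new):
    """beta_k = (theta * y - s)^T g_{k+1} / (y^T s), Eq. (8)."""
    return (theta * dot(y, g_new) - dot(s, g_new)) / dot(y, s)


def scaled_direction(theta, beta, g_new, s):
    """d_{k+1} = -theta * g_{k+1} + beta * s_k, Eq. (7)."""
    return [-theta * gi + beta * si for gi, si in zip(g_new, s)]


# Quadratic f(x) = 0.5 * x^T A x with A = diag(2, 5): gradients are A x.
A = (2.0, 5.0)
x_old, x_new = [1.0, 1.0], [0.5, 0.2]
g_old = [a * xi for a, xi in zip(A, x_old)]
g_new = [a * xi for a, xi in zip(A, x_new)]
s = [b - a for a, b in zip(x_old, x_new)]
y = [b - a for a, b in zip(g_old, g_new)]
theta = spectral_theta(s, y)
d = scaled_direction(theta, scaled_perry_beta(theta, s, y, g_new), g_new, s)
print(dot(y, d), -dot(s, g_new))  # the two values agree (about 1.3)
```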
Another scaled conjugate-gradient algorithm was 
presented by Andrei [1,2,3,4]. This is a scaled memory- 
less Broyden-Fletcher-Goldfarb-Shanno (BFGS) pre- 
conditioned conjugate-gradient algorithm. The basic 
idea is to combine the scaled memoryless BFGS method 
and the preconditioning technique within the framework of the
conjugate-gradient method. The preconditioner, which 
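Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Table 2
Hybrid conjugate-gradient algorithms

Nr. | Formula | Author(s)
1 | $\beta_k^{hDY} = \max\left\{-\frac{1-\sigma}{1+\sigma}\beta_k^{DY},\ \min\{\beta_k^{HS}, \beta_k^{DY}\}\right\}$ | Hybrid Dai-Yuan [15] (hDY)
2 | $\beta_k^{hDYz} = \max\{0, \min\{\beta_k^{HS}, \beta_k^{DY}\}\}$ | Hybrid Dai-Yuan zero [15] (hDYz)
3 | $\beta_k^{GN} = \max\{-\beta_k^{FR}, \min\{\beta_k^{PRP}, \beta_k^{FR}\}\}$ | Gilbert and Nocedal [19] (GN)
4 | $\beta_k^{HuS} = \max\{0, \min\{\beta_k^{PRP}, \beta_k^{FR}\}\}$ | Hu and Storey [24] (HuS)
5 | $\beta_k^{TaS} = \beta_k^{PRP}$ if $0 \le \beta_k^{PRP} \le \beta_k^{FR}$; $\beta_k^{TaS} = \beta_k^{FR}$ otherwise | Touati-Ahmed and Storey [37] (TaS)
6 | $\beta_k^{LS\text{-}CD} = \max\{0, \min\{\beta_k^{LS}, \beta_k^{CD}\}\}$ | Hybrid Liu-Storey, conjugate descent (LS-CD)
7 | $\beta_k^{CCOMB} = (1-\theta_k)\beta_k^{PRP} + \theta_k \beta_k^{DY}$, $\theta_k = \dfrac{(y_k^T g_{k+1})(y_k^T s_k) - (y_k^T g_{k+1})(g_k^T g_k)}{(y_k^T g_{k+1})(y_k^T s_k) - \|g_{k+1}\|^2 \|g_k\|^2}$; if $\theta_k \le 0$, then set $\theta_k = 0$, i.e., $\beta_k^{CCOMB} = \beta_k^{PRP}$; if $\theta_k \ge 1$, then take $\theta_k = 1$, i.e., $\beta_k^{CCOMB} = \beta_k^{DY}$ | Andrei [7]. Convex combination of PRP and DY where $\theta_k$ is obtained by a conjugacy condition (CCOMB)
8 | $\beta_k^{NDOMB} = (1-\theta_k)\beta_k^{PRP} + \theta_k \beta_k^{DY}$, $\theta_k = \dfrac{(y_k^T g_{k+1} - s_k^T g_{k+1})\|g_k\|^2 - (g_{k+1}^T y_k)(y_k^T s_k)}{\|g_{k+1}\|^2 \|g_k\|^2 - (g_{k+1}^T y_k)(y_k^T s_k)}$; if $\theta_k \le 0$, then set $\theta_k = 0$, i.e., $\beta_k^{NDOMB} = \beta_k^{PRP}$; if $\theta_k \ge 1$, then take $\theta_k = 1$, i.e., $\beta_k^{NDOMB} = \beta_k^{DY}$ | Andrei [7]. Convex combination of PRP and DY where $\theta_k$ is obtained using the Newton direction (NDOMB)
9 | $\beta_k^{NDHSDY} = (1-\theta_k)\beta_k^{HS} + \theta_k \beta_k^{DY}$, $\theta_k = -\dfrac{s_k^T g_{k+1}}{g_k^T g_{k+1}}$; if $\theta_k \le 0$, then set $\theta_k = 0$, i.e., $\beta_k^{NDHSDY} = \beta_k^{HS}$; if $\theta_k \ge 1$, then take $\theta_k = 1$, i.e., $\beta_k^{NDHSDY} = \beta_k^{DY}$ | Andrei [8]. Convex combination of HS and DY, where $\theta_k$ is obtained using the Newton direction (NDHSDY)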


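Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Table 3
Scaled conjugate-gradient algorithms

No. | Formula | Author(s)
1 | $\beta_k = \dfrac{(\theta_{k+1} y_k - s_k)^T g_{k+1}}{y_k^T s_k}$ | Scaled Perry. Suggested by Birgin and Martinez [10] and Andrei [1,2,3,4] (BM)
2 | $\beta_k = \max\left\{\dfrac{\theta_{k+1} y_k^T g_{k+1}}{y_k^T s_k}, 0\right\} - \dfrac{s_k^T g_{k+1}}{y_k^T s_k}$ | Scaled Perry+. Suggested by Birgin and Martinez [10] (BM+)
3 | $\beta_k = \dfrac{\theta_{k+1} y_k^T g_{k+1}}{\alpha_k \theta_k g_k^T g_k}$ | Scaled Polak-Ribière-Polyak. Suggested by Birgin and Martinez [10] and Andrei [1,2,3,4] (sPRP)
4 | $\beta_k = \dfrac{\theta_{k+1} g_{k+1}^T g_{k+1}}{\alpha_k \theta_k g_k^T g_k}$ | Scaled Fletcher-Reeves. Suggested by Birgin and Martinez [10] and Andrei [1,2,3,4] (sFR)
5 | $\beta_k = \dfrac{\theta_{k+1} y_k^T g_{k+1}}{y_k^T s_k}$ | Scaled Hestenes-Stiefel (sHS) [1]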
Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Table 4
Modified conjugate-gradient algorithms

No. | Formula | Author(s)
1 | $d_{k+1} = -g_{k+1} + \bar\beta_k^N d_k$, $d_0 = -g_0$, $\bar\beta_k^N = \max\{\beta_k^N, \eta_k\}$, $\eta_k = \dfrac{-1}{\|d_k\| \min\{\eta, \|g_k\|\}}$, $\eta = 0.01$, $\beta_k^N = \dfrac{1}{y_k^T d_k}\left(y_k - 2 d_k \dfrac{\|y_k\|^2}{y_k^T d_k}\right)^T g_{k+1}$ | Introduced by Hager and Zhang [20,21] (CG_DESCENT). This is a modification of the HS method
2 | | Suggested by Andrei [9] (ACGA)
3 | | Suggested by Andrei [9] (ACGA+)
4 | | Introduced by Andrei [5] (CGSD) as a modification of the DY method


is also a scaled memoryless BFGS matrix, is reset when 
the Powell restart criterion holds. The parameter scal- 
ing the gradient is selected as the spectral gradient (10). 


SCALCG Algorithm 


Step 1. Initialization. Select $x_0 \in \mathbb{R}^n$ and the parameters
$0 < \rho < \sigma < 1$. Compute $f(x_0)$ and $g_0 = \nabla f(x_0)$.
Set $d_0 = -g_0$ and $\alpha_0 = 1/\|g_0\|$. Set $k = 0$.

Step 2. Line search. Compute $\alpha_k$ satisfying the Wolfe
conditions (4) and (5). Update the variables $x_{k+1} =
x_k + \alpha_k d_k$. Compute $f(x_{k+1})$, $g_{k+1}$ and $s_k = x_{k+1} -
x_k$, $y_k = g_{k+1} - g_k$.

Step 3. Test for continuation of iterations. If this test is
satisfied, the iterations are stopped; else set $k = k + 1$.

Step 4. Scaling factor computation. Compute $\theta_k$ using
(10).

Step 5. Restart direction. Compute the (restart) direction
$d_k$ as

$$d_{k+1} = -\theta_{k+1} g_{k+1} + \theta_{k+1} \frac{g_{k+1}^T s_k}{y_k^T s_k}\, y_k - \left[\left(1 + \theta_{k+1}\frac{y_k^T y_k}{y_k^T s_k}\right)\frac{g_{k+1}^T s_k}{y_k^T s_k} - \theta_{k+1}\frac{g_{k+1}^T y_k}{y_k^T s_k}\right] s_k.$$


Suggested by Andrei [9] (APRP). 
This is a modification of the PRP method 


Step 6. Line search. Compute the initial guess
$\alpha_k = \alpha_{k-1} \|d_{k-1}\|_2 / \|d_k\|_2$. Using this initialization,
compute $\alpha_k$ satisfying the Wolfe conditions. Update the
variables $x_{k+1} = x_k + \alpha_k d_k$. Compute $f(x_{k+1})$, $g_{k+1}$
and $s_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$.

Step 7. Store $\theta = \theta_k$, $s = s_k$ and $y = y_k$.

Step 8. Test for continuation of iterations. If this test is
satisfied, the iterations are stopped; else set $k = k + 1$.

Step 9. Restart. If the Powell restart criterion
$|g_{k+1}^T g_k| > 0.2\, \|g_{k+1}\|^2$ is satisfied, then go to Step 4
(a restart step); otherwise continue with Step 10 (a standard
step).

Step 10. Standard direction. Compute the direction $d_{k+1}$
as

$$d_{k+1} = -v + \frac{(g_{k+1}^T s_k)\, w + (g_{k+1}^T w)\, s_k}{y_k^T s_k} - \left(1 + \frac{y_k^T w}{y_k^T s_k}\right)\frac{g_{k+1}^T s_k}{y_k^T s_k}\, s_k,$$

where $v$ and $w$ are computed as

$$v = \theta g_{k+1} - \theta \frac{g_{k+1}^T s}{y^T s}\, y + \left[\left(1 + \theta\frac{y^T y}{y^T s}\right)\frac{g_{k+1}^T s}{y^T s} - \theta\frac{g_{k+1}^T y}{y^T s}\right] s$$
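Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Table 5
Parametric conjugate-gradient algorithms

No. | Formula | Author(s)
1 | $\beta_k^{DL} = \dfrac{g_{k+1}^T (y_k - t s_k)}{y_k^T s_k}$, $t > 0$ is a constant | Dai and Liao [12] (DL)
2 | $\beta_k^{DL+} = \max\left\{0, \dfrac{g_{k+1}^T y_k}{y_k^T s_k}\right\} - t\, \dfrac{g_{k+1}^T s_k}{y_k^T s_k}$, $t > 0$ is a constant | Dai and Liao + [12] (DL+)
3 | $\beta_k^{YT} = \dfrac{g_{k+1}^T (z_k - t s_k)}{d_k^T z_k}$, where $z_k = y_k + \dfrac{\delta \eta_k}{s_k^T u_k} u_k$, $\eta_k = 6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s_k$, $\delta > 0$ is a constant and $u_k \in \mathbb{R}^n$ satisfies $s_k^T u_k \ne 0$, for example $u_k = d_k$ | Suggested by Yabe and Takano [38] (YT) based on a modified secant condition given by Zhang et al. [39]
4 | | Suggested by Yabe and Takano plus [38] (YT+)
5 | $\beta_k = \dfrac{\|g_{k+1}\|^2}{\lambda_k \|g_k\|^2 + (1-\lambda_k)\, y_k^T s_k}$, $\lambda_k \in [0,1]$. The FR algorithm corresponds to $\lambda_k = 1$; the DY algorithm corresponds to $\lambda_k = 0$ | Suggested by Dai and Yuan [14]
6 | $\beta_k = \dfrac{\mu_k \|g_{k+1}\|^2 + (1-\mu_k)\, g_{k+1}^T y_k}{\lambda_k \|g_k\|^2 + (1-\lambda_k)\, y_k^T s_k}$, $\lambda_k, \mu_k \in [0,1]$ | Suggested by Nazareth [28]. This two-parameter family includes the methods FR, DY, PRP and HS in extreme cases
7 | $\beta_k = \dfrac{\mu_k \|g_{k+1}\|^2 + (1-\mu_k)\, g_{k+1}^T y_k}{(1-\lambda_k-\omega_k)\|g_k\|^2 + \lambda_k\, y_k^T s_k - \omega_k\, s_k^T g_k}$, $\lambda_k, \mu_k \in [0,1]$ and $\omega_k \in [0, 1-\lambda_k]$ |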


and

$$w = \theta y_k - \theta \frac{y_k^T s}{y^T s}\, y + \left[\left(1 + \theta\frac{y^T y}{y^T s}\right)\frac{y_k^T s}{y^T s} - \theta\frac{y_k^T y}{y^T s}\right] s$$

with saved values $\theta$, $s$ and $y$.

Step 11. Line search. Compute the initial guess
$\alpha_k = \alpha_{k-1} \|d_{k-1}\|_2 / \|d_k\|_2$. Using this initialization,
compute $\alpha_k$ satisfying the Wolfe conditions. Update the
variables $x_{k+1} = x_k + \alpha_k d_k$. Compute $f(x_{k+1})$, $g_{k+1}$
and $s_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$.

Step 12. Test for continuation of iterations. If this test
is satisfied, the iterations are stopped; else set $k = k + 1$
and go to Step 9.


Suggested by Dai and Yuan [14] 

This three-parameter family includes the six classical 
conjugate-gradient algorithms, as well as the 
previous one-parameter and two-parameter families 
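The restart direction of Step 5 of SCALCG has the form $d_{k+1} = -Q_{k+1} g_{k+1}$. Here $Q_{k+1} = \theta_{k+1} I - \theta_{k+1}\frac{s_k y_k^T + y_k s_k^T}{y_k^T s_k} + \left(1 + \theta_{k+1}\frac{y_k^T y_k}{y_k^T s_k}\right)\frac{s_k s_k^T}{y_k^T s_k}$ is the BFGS update of $\theta_{k+1} I$, and expanding the product gives the formula of Step 5. The Python sketch below evaluates that formula with inner products only, so no $n \times n$ matrix is stored. It also illustrates two properties: the secant equation $Q_{k+1} y_k = s_k$, and the descent property when $\theta_{k+1} > 0$ and $y_k^T s_k > 0$. The function names and the sample vectors are ours.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scalcg_restart_direction(theta, s, y, g):
    """Step 5 of SCALCG: d = -Q g for the scaled memoryless BFGS matrix Q(theta, s, y)."""
    ys, gs, gy, yy = dot(y, s), dot(g, s), dot(g, y), dot(y, y)
    if ys <= 0:
        raise ValueError("need y^T s > 0 (guaranteed by the Wolfe conditions)")
    coef_s = (1 + theta * yy / ys) * gs / ys - theta * gy / ys
    return [-theta * gi + theta * (gs / ys) * yi - coef_s * si
            for gi, yi, si in zip(g, y, s)]


theta, s, y = 0.7, [1.0, 2.0, -1.0], [0.5, 3.0, 1.0]
# Secant equation Q y = s: with g = y the direction is -s (up to rounding).
print(scalcg_restart_direction(theta, s, y, y))
g = [1.0, -1.0, 2.0]
print(dot(g, scalcg_restart_direction(theta, s, y, g)) < 0)  # True: descent direction
```

Step 10 applies the same BFGS update to the stored matrix $Q(\theta, s, y)$ instead of $\theta_{k+1} I$. This is why it only needs the two products $v = Q g_{k+1}$ and $w = Q y_k$.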


To a great extent, the SCALCG algorithm [1,2,3,4]
is very close to the Perry-Shanno computational
scheme [30,34,35]. SCALCG is a scaled memoryless
BFGS preconditioned algorithm in which the scaling
factor is the inverse of a scalar approximation
of the Hessian. If the Powell restart criterion
$|g_{k+1}^T g_k| > 0.2\, \|g_{k+1}\|^2$ is used, then for general functions
$f$ bounded from below with bounded second partial
derivatives and bounded level set, using the same
arguments as Shanno in [35], one can prove
that either the iterates converge to a point $x^*$ satisfying
$\|\nabla f(x^*)\| = 0$ or the iterates cycle.


Modified Conjugate-Gradient Algorithms 


A large variety of modified conjugate-gradient
algorithms is known. All of them are designed to improve the
performance of the classical computational schemes,
using the idea of preconditioning or modifying the
classical schemes in order to satisfy the sufficient-descent
condition. The algorithms in this class are
characterized by (2) and (3), where the parameter $\beta_k$ is
computed as in Table 4.

The maximization in the formula for $\bar\beta_k^N$ in the
scheme of Hager and Zhang plays the role of
the truncation operation in, for example, the PRP+ scheme.
Hager and Zhang obtained this algorithm
by deleting a term from the search direction of the
memoryless quasi-Newton scheme of Perry [30] and
Shanno [34]. Hager and Zhang [20] proved global
convergence with inexact line search, showing that
for any line search and any function the sufficient-descent
condition $g_k^T d_k \le -(7/8)\, \|g_k\|^2$ is satisfied, and
that jamming is avoided essentially owing to the $y_k^T g_{k+1}$
term in the formula for $\beta_k^N$.

The ACGA and ACGA+ computational schemes 
are a modification of the DY conjugate-gradient algo- 
rithm, designed to satisfy the sufficient-descent con- 
dition. Andrei [9] proved that for uniformly convex 
functions under the strong Wolfe conditions, ACGA
is globally convergent. The CGSD algorithm is also
a modification of the Dai and Yuan conjugate-gradient 
algorithm. Andrei [9] proved the global convergence of 
CGSD for general nonlinear functions under the Wolfe 
conditions. 

One of the best conjugate-gradient algorithms in this class is CONMIN by Shanno [34] and Shanno and Phua [36]. Using the Hestenes–Stiefel formula for updating $\beta_k$, Perry [30] suggested a formula for computing the search direction $d_{k+1}$ that satisfies a system of linear equations similar, but not identical, to the quasi-Newton equation. Shanno [34] reconsidered Perry's method and interpreted it as a memoryless BFGS updating formula. In this algorithm $g_{k+1}$ is modified by a positive-definite matrix that best estimates the inverse Hessian, without any additional storage requirements. For convex functions, under inexact line search, Shanno [35] proved the global convergence of CONMIN.
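The memoryless BFGS interpretation can be made concrete. Applying one BFGS update to the identity with the pair $s_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$ and setting $d_{k+1} = -H_{k+1} g_{k+1}$ requires only inner products with $s_k$ and $y_k$, which is why no extra storage is needed. The sketch below compares the matrix-free formula with the explicit matrix; it is plain memoryless BFGS, not CONMIN itself (which adds further refinements not described here), and both function names are hypothetical.

```python
import numpy as np


def memoryless_bfgs_direction(g, s, y):
    """d = -H g, with H the BFGS update of the identity using the pair (s, y).

    Only inner products are needed, so no n-by-n matrix is formed or stored.
    Assumes the curvature condition y^T s > 0.
    """
    rho = 1.0 / (y @ s)
    sg, yg, yy = s @ g, y @ g, y @ y
    return -g + rho * sg * y + (rho * yg - rho * sg * (1.0 + rho * yy)) * s


def explicit_h(s, y):
    """Explicit update (I - rho s y^T)(I - rho y s^T) + rho s s^T, for checking."""
    rho = 1.0 / (y @ s)
    v = np.eye(s.size) - rho * np.outer(y, s)
    return v.T @ v + rho * np.outer(s, s)


rng = np.random.default_rng(0)
s, y, g = rng.standard_normal((3, 6))
if y @ s < 0:            # enforce the curvature condition for the demo
    y = -y
d = memoryless_bfgs_direction(g, s, y)
print(np.allclose(d, -explicit_h(s, y) @ g), g @ d < 0)  # prints: True True
```

Because $H_{k+1}$ is positive definite whenever $y_k^T s_k > 0$, the resulting direction is always a descent direction.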


Parametric Conjugate-Gradient Algorithms 


The parametric conjugate-gradient algorithms were introduced in the same way as quasi-Newton methods were combined to obtain the Broyden and Huang families. These algorithms are defined by (2) and (3), where the parameter $\beta_k$ is as in Table 5.
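Table 5 is not reproduced here. Assuming that DL(t) in the experiments of this article refers to the Dai–Liao family, a minimal sketch of its parameter is $\beta_k^{DL} = g_{k+1}^T(y_k - t s_k)/(d_k^T y_k)$, with DL+ truncating the Hestenes–Stiefel part at zero. The function name is hypothetical.

```python
import numpy as np


def dai_liao_beta(g_new, d, y, s, t=1.0, plus=False):
    """Dai-Liao parameter for d_{k+1} = -g_{k+1} + beta_k d_k.

    g_new = g_{k+1}, d = d_k, y = g_{k+1} - g_k, s = x_{k+1} - x_k, t >= 0.
    plus=True gives the DL+ variant, truncating the Hestenes-Stiefel part at 0.
    """
    dy = d @ y
    hs_part = (g_new @ y) / dy               # Hestenes-Stiefel term
    if plus:
        hs_part = max(hs_part, 0.0)
    return hs_part - t * (g_new @ s) / dy


g, y = np.array([1.0, 0.0]), np.array([-1.0, 2.0])
d = s = np.array([1.0, 1.0])                 # s_k = alpha_k d_k with alpha_k = 1
print(dai_liao_beta(g, d, y, s), dai_liao_beta(g, d, y, s, plus=True))  # -2.0 -1.0
```

With $t = 0$ the formula reduces to Hestenes–Stiefel, and under an exact line search ($g_{k+1}^T s_k = 0$) the $t$ term vanishes, so DL then coincides with HS for every $t$.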


Performance Profiles 


In this section we present the computational perfor- 
mance of a Fortran implementation of conjugate-gra- 
dient algorithms on a set of 750 unconstrained op- 
timization test problems. The test problems are the 
unconstrained problems in the CUTE [11] library, 
along with other large-scale optimization problems 
presented in [6]. We selected 75 large-scale uncon- 
strained optimization problems in extended or gen- 
eralized form. For each function we have considered 
ten numerical experiments with the number of vari- 
ables n = 1000, 2000,..., 10000. CG_DESCENT was 
authored by Hager and Zhang [20,21], CONMIN by Shanno and Phua [36]. The CG_DESCENT code contains the variant CG_DESCENT(w), implementing the Wolfe line search, and the variant CG_DESCENT(aw), implementing an approximate Wolfe line search. The Wolfe line search implemented in CG_DESCENT(w) can compute a solution with an accuracy on the order of the square root of machine epsilon. In contrast, the approximate Wolfe line search implemented in CG_DESCENT(aw) can compute a solution with an accuracy on the order of machine epsilon. The rest of the algorithms considered in this study were written by Andrei. All codes were written in double-precision Fortran and compiled with f77 (default compiler settings) on an Intel Pentium 4, 1.8 GHz workstation.

Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Figure 1
Performance profiles of the HS, FR, PRP, PRP+, CD, LS and DY methods (CPU time metric, 657 problems). See the tables for the definitions of the acronyms used for the algorithms referred to in all the figures

Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Figure 2
Performance profiles of some hybrid conjugate-gradient algorithms (CPU time metric; panels compare CCOMB and NDOMB with PRP, NDHSDY with HS, and CCOMB, NDOMB and NDHSDY with DY)

All algorithms implement the Wolfe line search conditions with $\rho = 0.0001$ and $\sigma = 0.9$, and the same stopping criterion $\|g_k\|_\infty \le 10^{-6}$, where $\|\cdot\|_\infty$ is the maximum absolute component of a vector.
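The two tests shared by all codes in this study, the Wolfe conditions with $\rho = 10^{-4}$, $\sigma = 0.9$ and the infinity-norm stopping test, can be sketched as follows. The function names are hypothetical, the default tolerance mirrors the stopping criterion stated above, and the step-size search that produces `alpha` is not shown.

```python
import numpy as np


def satisfies_wolfe(f, grad, x, d, alpha, rho=1e-4, sigma=0.9):
    """Standard Wolfe conditions for step alpha along a descent direction d."""
    g0d = grad(x) @ d                       # directional derivative at alpha = 0
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) <= f(x) + rho * alpha * g0d
    curvature = grad(x_new) @ d >= sigma * g0d
    return bool(sufficient_decrease and curvature)


def converged(g, tol=1e-6):
    """Stopping test ||g||_inf <= tol: largest absolute gradient component."""
    return bool(np.max(np.abs(g)) <= tol)


# Convex quadratic f(x) = ||x||^2 / 2, steepest-descent direction from (1, 1)
def f(x):
    return 0.5 * (x @ x)


def grad(x):
    return x


x0 = np.array([1.0, 1.0])
d0 = -grad(x0)
print([satisfies_wolfe(f, grad, x0, d0, a) for a in (0.01, 1.0, 2.5)])  # [False, True, False]
```

The step 0.01 fails the curvature condition (too short), and 2.5 fails sufficient decrease (too long).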


The comparisons of the algorithms are made in the following context. Let $f_i^{ALG1}$ and $f_i^{ALG2}$ be the optimal values found by ALG1 and ALG2, respectively, for problem $i = 1, \dots, 750$. We say that, on problem $i$, the performance of ALG1 was better than the performance of ALG2 if

$$\left| f_i^{ALG1} - f_i^{ALG2} \right| < 10^{-3}$$

and the number of iterations, the number of function-gradient evaluations, or the CPU time of ALG1 was less than the number of iterations, the number of function-gradient evaluations, or the CPU time of ALG2, respectively.
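The comparison rule above can be written as a small counting function. This is a sketch; the function name is hypothetical, and counting ties separately is an assumption, since the rule only defines when one algorithm is better.

```python
def compare_pair(f1, f2, m1, m2, tol=1e-3):
    """Count problems on which ALG1 or ALG2 performs better.

    f1, f2: optimal values found by ALG1 and ALG2 on each problem.
    m1, m2: the chosen metric (iterations, function-gradient evaluations or
    CPU time) on each problem. Problems whose optimal values differ by tol or
    more are not counted; equal metrics are reported as ties.
    """
    better1 = better2 = ties = 0
    for a, b, ma, mb in zip(f1, f2, m1, m2):
        if abs(a - b) >= tol:
            continue
        if ma < mb:
            better1 += 1
        elif mb < ma:
            better2 += 1
        else:
            ties += 1
    return better1, better2, ties


# Four hypothetical problems: ALG1 wins one, ALG2 wins one, the third is
# excluded (optimal values disagree) and the fourth is a tie.
print(compare_pair([1.0, 2.0, 3.0, 0.0], [1.0, 2.0005, 3.5, 0.0],
                   [10, 30, 5, 7], [20, 25, 9, 7]))   # (1, 1, 1)
```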

The performance of these algorithms was evaluated using the performance profiles of Dolan and Moré [16] on this set of 750 test problems, extracted from the CUTE collection [11] and from [6]. For each algorithm, we plot the fraction of problems for which the algorithm is within a factor of the best CPU time. The left side of each figure gives the percentage of the test problems, out of 750, for which an algorithm is the fastest; the right side gives the percentage of the test problems that were successfully solved by each of the algorithms. In essence, the right side measures an algorithm's robustness.
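A Dolan–Moré performance profile is straightforward to compute from a table of CPU times. The sketch below assumes positive times, marks failures with `np.inf`, and assumes every problem is solved by at least one algorithm; the function name is hypothetical. The value at $\tau = 1$ corresponds to the left side of the plots (fraction of problems on which an algorithm is fastest, with ties counted for each tied algorithm), and the limit for large $\tau$ corresponds to the right side (fraction solved).

```python
import numpy as np


def performance_profile(times, taus):
    """Dolan-More performance profile rho_s(tau) for each solver s.

    times: (n_problems, n_solvers) array of positive CPU times, np.inf = failure.
    taus:  factors tau >= 1 at which the profile is evaluated.
    Entry [i, s] is the fraction of problems on which solver s is within a
    factor taus[i] of the best solver for that problem.
    """
    times = np.asarray(times, dtype=float)
    best = times.min(axis=1, keepdims=True)      # best time on each problem
    ratios = times / best                         # r_{p,s} = t_{p,s} / min_s t_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])


times = [[1.0, 2.0],      # hypothetical CPU times: 3 problems, 2 solvers
         [3.0, 1.5],
         [2.0, np.inf]]   # solver 2 fails on the third problem
print(performance_profile(times, [1.0, 2.0]))
```

At $\tau = 1$ the first solver is fastest on two of the three problems; by $\tau = 2$ it is within the factor on all of them, while the second solver never exceeds 2/3 because of its failure.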

In the first set of numerical experiments we com- 
pare the classical conjugate-gradient algorithms. Fig- 
ure 1 shows the CPU time performance profiles of these 
algorithms. 
Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Figure 3
Performance profiles of the NDHSDY algorithm compared with those of some classical conjugate-gradient algorithms 


From Fig. 1 we see that the FR, CD and DY meth- 
ods, although they have strong convergence properties, 
may not perform well in practice owing to jamming. In 
contrast, although the HS, PRP and LS methods in gen- 
eral may not converge, they often perform better than 
the FR, CD and DY methods. 

Figure 2 presents the performance profiles of some 
hybrid conjugate-gradient algorithms. 

Figure 3 presents the performance profiles of the 
NDHSDY algorithm compared with those of the classi- 
cal conjugate-gradient algorithms: PRP, PRP+, LS and 
CD. It seems that the best algorithm is the hybrid algo- 
rithm NDHSDY given by a convex combination of HS 
and DY, where the parameter in the convex combina- 
tion is obtained using the Newton direction. 
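The convex combination behind NDHSDY can be sketched as follows. The rule that computes $\theta_k$ from the Newton direction is not given in this section, so $\theta$ is simply an input here; clipping it to $[0, 1]$ is an assumption of the sketch, as is the function name.

```python
import numpy as np


def hybrid_beta(g_new, d, y, theta):
    """(1 - theta) * beta_HS + theta * beta_DY with theta clipped to [0, 1].

    beta_HS = g_{k+1}^T y_k / d_k^T y_k   (Hestenes-Stiefel)
    beta_DY = ||g_{k+1}||^2 / d_k^T y_k   (Dai-Yuan)
    """
    theta = min(max(theta, 0.0), 1.0)      # keep the combination convex
    dy = d @ y
    beta_hs = (g_new @ y) / dy
    beta_dy = (g_new @ g_new) / dy
    return (1.0 - theta) * beta_hs + theta * beta_dy
```

Setting $\theta = 0$ recovers the HS parameter and $\theta = 1$ the DY parameter.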

In the next set of numerical experiments we compare the scaled conjugate-gradient algorithms. Figure 4 shows the performance profiles of SCALCG, BM, BM+, sPRP and sFR. We see that the SCALCG algorithm is the top performer among the scaled conjugate-gradient algorithms.

Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Figure 4
Performance profiles of scaled conjugate-gradient algorithms
Figure 5 shows the performance profiles of the
SCALCG algorithm compared with those of the clas- 
sical conjugate-gradient algorithms PRP and PRP+, as 
well as the hybrid algorithms CCOMB and NDHSDY. 

In the following we compare the modified con- 
jugate-gradient algorithms CG_DESCENT(w), ACGA, 
ACGA+, CGSD and APRP. Figure 6 presents the per- 
formance profiles of these algorithms. 

Figure 7 presents the performance profiles of the 
CG_DESCENT(w) algorithm compared with those of 
the PRP, PRP+, NDHSDY and SCALCG algorithms. 

Now, comparing CONMIN with some other mod- 
ified conjugate-gradient algorithms, ACGA, ACGA+, 
CGSD and APRP, we obtained the performance pro- 
files as in Fig. 8. 

We see that CONMIN is the top performer. Figure 9 presents the performance profiles of the CONMIN algorithm compared with those of the PRP, NDHSDY, SCALCG and CG_DESCENT algorithms.

Finally, let us consider the parametric conjugate-gradient algorithms DL(t=1) and DL+(t=1). Figure 10 shows the performance profiles of the DL and DL+ algorithms compared with those of the PRP, SCALCG and CONMIN algorithms.

Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Figure 6
Performance profiles of the CG_DESCENT, ACGA, ACGA+, CGSD and APRP algorithms


Conclusion and Discussion 


Conjugate-gradient algorithms are among the most elegant and probably the simplest algorithms for computational nonlinear optimization. Their theory is well established [22], and they have proved to be surprisingly effective in solving real practical applications. The computational study presented here, which includes 29 conjugate-gradient algorithms, shows that the most effective are CONMIN, CG_DESCENT and SCALCG. Close to these algorithms is NDHSDY, a convex combination of the HS and DY conjugate-gradient algorithms in which the parameter is computed using the Newton direction. Concerning robustness, CG_DESCENT is in first place.

This computational study involved a large variety of nonlinear test functions. However, to draw conclusions about the effectiveness of these algorithms, the test functions must be organized into classes with well-established characteristics, and one must then determine which conjugate-gradient algorithm is most successful on each class. This remains to be explored.
Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Figure 8
Performance profiles of the CONMIN, ACGA, ACGA+, CGSD and APRP algorithms

It is worth comparing the most successful conjugate-gradient algorithms with the limited-memory quasi-Newton BFGS (LBFGS) algorithm of Nocedal [29]. Figure 11 shows the performance profiles of the CONMIN, SCALCG, CG_DESCENT and NDHSDY algorithms compared with those of the LBFGS (m = 3) algorithm, an implementation by Liu and Nocedal [25] using the line search of Moré and Thuente [27].

Performance Profiles of Conjugate-Gradient Algorithms for Unconstrained Optimization, Figure 9
Performance profiles of the CONMIN algorithm compared with those of the PRP, NDHSDY, SCALCG and CG_DESCENT algorithms

From Fig. 11 we see that LBFGS (m = 3) is considerably more successful than any of the conjugate-gradient algorithms. Closest to LBFGS is CONMIN.

Even though conjugate-gradient methods are rele- 
vant nonlinear optimization methods, there are some 
open problems which deserve additional research: 

1. In contrast to quasi-Newton methods, for which the step length is equal to 1 in the vast majority of iterations, the step length in conjugate-gradient methods differs from 1, being larger or smaller by up to two orders of magnitude depending on how the problem is scaled. In conjugate-gradient methods the size of $\alpha_k$ varies in a very unpredictable way.


2. Another open problem is the preconditioning of conjugate-gradient algorithms. The scaled conjugate-gradient algorithms of Birgin and Martínez [10] and Andrei [1,2,3,4] introduce a scaling of $g_{k+1}$ in the computation of the direction $d_{k+1}$. However, if $\theta_{k+1}$ in (7) does contain enough information about the inverse Hessian of the function being minimized, then it is better to use the search direction $d_{k+1} = -\theta_{k+1} g_{k+1}$, since adding the term $\beta_k s_k$ in (7) may prevent $d_{k+1}$ from being a descent direction unless the line search is sufficiently accurate. In scaled conjugate-gradient algorithms there is a very delicate balance between $-\theta_{k+1} g_{k+1}$ and $\beta_k s_k$, which brings the preconditioning question to attention.

3. Another open problem with conjugate-gradient methods is that the structure of the problem being minimized is not taken into account to design more efficient computational schemes. This is in sharp contrast to quasi-Newton or truncated Newton methods.
The phase problem of X-ray crystallography is formulated as one in constrained global minimization. The problem of reaching this minimum is solved by the Shake and Bake algorithm, which effectively bypasses the myriad of (unconstrained) local minima. The function whose constrained global minimum is sought is known as the minimal function.


Introduction 


When a beam of monochromatic X-rays strikes a crystal, the crystal scatters the incident beam in different directions and with different intensities determined by the crystal structure, that is, the arrangement of the atoms in the unit cell of the crystal. From the intensities of the scattered X-rays, whose directions are labeled by the so-called reciprocal lattice vectors $\mathbf H$, a set of numbers $|E_{\mathbf H}|$ may be derived, one corresponding to each scattered X-ray. The phases $\varphi_{\mathbf H}$ of the scattered X-rays are lost in the scattering (diffraction) experiment, that is to say, they cannot be measured. However, elucidating the crystal structure requires knowledge of the complex numbers

$$E_{\mathbf H} = |E_{\mathbf H}| \exp(i \varphi_{\mathbf H}), \qquad (1)$$

the so-called normalized structure factors, of which only the magnitudes $|E_{\mathbf H}|$ can be determined from experiment. Thus the phase $\varphi_{\mathbf H}$, unobtainable from the diffraction experiment, must be assigned to each $|E_{\mathbf H}|$. The problem of determining the phases $\varphi_{\mathbf H}$ when only the magnitudes $|E_{\mathbf H}|$ are available is called the 'phase problem'. A major advance in the early 1950s was the recognition that the lost phase information was to be found in the measurable intensities of the diffraction pattern. In fact, owing to the known atomicity of crystal structures and the redundancy of the observed magnitudes $|E_{\mathbf H}|$, the phase problem is not only solvable in principle but is greatly overdetermined.

The phase problem is here formulated as one in 
constrained global minimization. The ability to impose 
constraints, in the form of identities among the phases 
which must, of necessity, be satisfied, even if only in- 
completely and approximately, enables one to avoid the 
countless local minima of the minimal function and to 
arrive at the unique constrained global minimum. The 
Shake and Bake algorithm, which implements the minimal principle formulated here, provides a routine and completely automatic solution of the phase problem when diffraction intensities to atomic resolution (1.2 Å or better) are available.


Identities Among the Phases 


The relationship between the crystal structure and the diffraction pattern, in the case that the structure consists of $N$ identical atoms in the unit cell (the only case to be considered here), is given by the pair of equations

$$E_{\mathbf H} = \frac{1}{\sqrt N} \sum_{j=1}^{N} \exp(2\pi i\, \mathbf H \cdot \mathbf r_j), \qquad (2)$$

$$\big\langle E_{\mathbf H} \exp(-2\pi i\, \mathbf H \cdot \mathbf r) \big\rangle_{\mathbf H} = \begin{cases} 1/\sqrt N & \text{if } \mathbf r = \mathbf r_j, \\ 0 & \text{if } \mathbf r \ne \mathbf r_j, \end{cases} \qquad (3)$$

where $|E_{\mathbf H}|$ is obtained directly from the intensity of the X-ray scattered in the direction labeled by the reciprocal lattice vector $\mathbf H$, whose three components are integers, $\mathbf r$ is an arbitrary three-dimensional vector, and $\mathbf r_j$ is the position vector of the atom labeled $j$. Thus the positions of the maxima of the triple Fourier series (3), a function of the position vector $\mathbf r$, yield the crystal structure directly. However, in order to calculate (3), both the phases $\varphi_{\mathbf H}$ and the magnitudes $|E_{\mathbf H}|$ (in (1)) are needed. Since only the magnitudes $|E_{\mathbf H}|$ are obtainable from the measured intensities in the diffraction experiment, while the phases $\varphi_{\mathbf H}$ are lost, (3) makes clear why the phase problem has historically been regarded as the central problem of X-ray crystallography.
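Equations (2) and (3) can be checked numerically for a toy structure. The sketch below assumes four identical atoms at hypothetical, well-separated fractional coordinates and truncates the average in (3) to the reciprocal lattice vectors whose components lie in $[-20, 20]$. The average is then approximately $1/\sqrt N$ at an atomic position and close to zero elsewhere, which is why the maxima of the Fourier series locate the atoms.

```python
import numpy as np

# Hypothetical toy structure: N = 4 identical atoms, fractional coordinates
atoms = np.array([[0.10, 0.10, 0.10],
                  [0.35, 0.60, 0.85],
                  [0.60, 0.35, 0.60],
                  [0.85, 0.85, 0.35]])
N = len(atoms)

# All reciprocal lattice vectors H with integer components in [-K, K]
K = 20
h1d = np.arange(-K, K + 1)
H = np.stack(np.meshgrid(h1d, h1d, h1d, indexing="ij"), axis=-1).reshape(-1, 3)

# Equation (2): E_H = N^(-1/2) * sum_j exp(2 pi i H . r_j)
E = np.exp(2j * np.pi * H @ atoms.T).sum(axis=1) / np.sqrt(N)


def fourier_average(r):
    """Truncated version of (3): average over H of E_H exp(-2 pi i H . r)."""
    return np.mean(E * np.exp(-2j * np.pi * H @ r))


print(abs(fourier_average(atoms[0])))                   # close to 1/sqrt(4) = 0.5
print(abs(fourier_average(np.array([0.5, 0.5, 0.5]))))  # close to 0
```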

The system of equations (2) implies the existence of relationships among the normalized structure factors $E_{\mathbf H}$, since the (relatively few) unknown atomic position vectors $\mathbf r_j$ may, at least in principle, be eliminated. In this way one obtains a system of equations among the normalized structure factors $E_{\mathbf H}$ alone, that is, among the phases $\varphi_{\mathbf H}$ and magnitudes $|E_{\mathbf H}|$:

$$F(E) = G(\varphi; |E|) = 0, \qquad (4)$$

which depends on $N$ but is independent of the atomic position vectors $\mathbf r_j$. For a specified crystal structure the magnitudes $|E|$ are determined (by (2)). Thus the system of equations (4) leads directly to a system of identities among the phases $\varphi$ alone:

$$G(\varphi; |E|) = H(\varphi \mid |E|) = 0, \qquad (5)$$

dependent now on the presumed known magnitudes $|E|$, which must of necessity be satisfied. The direct methods are those which exploit these relationships to go directly from the known magnitudes $|E_{\mathbf H}|$ to the desired phases $\varphi_{\mathbf H}$.


Structure Invariants 


Equation (3) implies that the normalized structure factors $E_{\mathbf H}$, that is to say, the magnitudes $|E_{\mathbf H}|$ and phases $\varphi_{\mathbf H}$, determine the crystal structure. Equation (2), however, does not imply that, conversely, the crystal structure determines unique values for the normalized structure factors $E_{\mathbf H}$, since the atomic position vectors $\mathbf r_j$ depend not only on the crystal structure but also on the choice of origin. As it turns out, however, the magnitudes $|E_{\mathbf H}|$ are in fact uniquely determined by the crystal structure independently of the choice of origin, but the values of the individual phases depend on the choice of origin as well as on the crystal structure. Nevertheless, there exist special linear combinations of the phases, the so-called structure invariants, whose values are uniquely determined by the crystal structure and are independent of the choice of origin. The most important class of structure invariants, the so-called triplets, are the linear combinations of three phases

$$\varphi_{\mathbf{HK}} = \varphi_{\mathbf H} + \varphi_{\mathbf K} + \varphi_{-\mathbf H - \mathbf K}, \qquad (6)$$

where $\mathbf H$ and $\mathbf K$ are arbitrary reciprocal lattice vectors.
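The origin independence of the triplets can be verified directly from (2). Shifting every atom by a vector $\mathbf t$ multiplies $E_{\mathbf H}$ by $\exp(2\pi i\, \mathbf H \cdot \mathbf t)$, so each phase moves by $2\pi \mathbf H \cdot \mathbf t$, while the three shifts in (6) cancel because $\mathbf H + \mathbf K + (-\mathbf H - \mathbf K) = \mathbf 0$. A minimal sketch with a hypothetical random structure and hypothetical function names:

```python
import numpy as np


def structure_factor(hkl, atoms):
    """Equation (2): E_H for one reciprocal lattice vector H, N identical atoms."""
    return np.exp(2j * np.pi * (atoms @ hkl)).sum() / np.sqrt(len(atoms))


def triplet(h, k, atoms):
    """phi_H + phi_K + phi_{-H-K}, eq. (6), wrapped to (-pi, pi]."""
    total = sum(np.angle(structure_factor(v, atoms)) for v in (h, k, -h - k))
    return np.angle(np.exp(1j * total))


rng = np.random.default_rng(0)
atoms = rng.random((10, 3))                 # hypothetical random structure
h, k = np.array([1, 2, 0]), np.array([0, -1, 3])
shift = np.array([0.3, 0.1, 0.7])           # arbitrary change of origin
moved = (atoms + shift) % 1.0

print(np.angle(structure_factor(h, atoms)), np.angle(structure_factor(h, moved)))  # differ
print(triplet(h, k, atoms), triplet(h, k, moved))                                  # agree
```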


Probabilistic Background 


It is assumed that the atomic position vectors $\mathbf r_j$ are random variables which are uniformly and independently distributed. Under this assumption the normalized structure factors $E_{\mathbf H}$, as functions (2) of the primitive random variables $\mathbf r_j$, are themselves random variables. Since the magnitudes $|E_{\mathbf H}|$ are known from the diffraction experiment, (1) and (2) imply that the phases $\varphi_{\mathbf H}$ are also random variables. Hence it makes sense to ask for the conditional probability distribution of the special linear combinations of three phases (6), the triplets $\varphi_{\mathbf{HK}}$, assuming that the values of an appropriate set of observed magnitudes $|E|$ have been given.


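For Fixed H and K, the Conditional Probability Distribution of the Triplet $\varphi_{\mathbf{HK}}$

Under the assumptions of the previous section, the conditional probability distribution of the triplet $\varphi_{\mathbf{HK}}$ (6), given the three magnitudes $|E_{\mathbf H}|$, $|E_{\mathbf K}|$, $|E_{\mathbf H + \mathbf K}|$, is

$$P(\Phi) = \frac{1}{2\pi I_0(A_{\mathbf{HK}})} \exp(A_{\mathbf{HK}} \cos \Phi), \qquad (7)$$

where $\Phi$ represents the triplet $\varphi_{\mathbf{HK}}$, the parameter $A_{\mathbf{HK}}$ is defined by

$$A_{\mathbf{HK}} = \frac{2}{\sqrt N} \left| E_{\mathbf H} E_{\mathbf K} E_{\mathbf H + \mathbf K} \right| > 0, \qquad (8)$$

and $I_0$ is the modified Bessel function of order zero. Since $A_{\mathbf{HK}} > 0$, the distribution (7) implies that the mode of $\varphi_{\mathbf{HK}}$ is zero and that the conditional expectation (or average) of $\cos \varphi_{\mathbf{HK}}$, given $A_{\mathbf{HK}}$, is

$$\mathcal{E}(\cos \varphi_{\mathbf{HK}}) = \frac{I_1(A_{\mathbf{HK}})}{I_0(A_{\mathbf{HK}})} > 0, \qquad (9)$$

where $I_1$ is the modified Bessel function of order one. It is also readily confirmed that the larger the value of $A_{\mathbf{HK}}$, the smaller the conditional variance of $\cos \varphi_{\mathbf{HK}}$, given $A_{\mathbf{HK}}$. It should be stressed that the conditional expected value of the cosine (9) is always positive, since $A_{\mathbf{HK}} > 0$.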
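The claims about (7) and (9) can be confirmed numerically. The sketch below evaluates the mean and variance of $\cos \Phi$ under (7) by quadrature over one period (normalizing the weights replaces the factor $1/(2\pi I_0)$), together with the parameter (8); the function names are hypothetical.

```python
import numpy as np


def a_hk(e_h, e_k, e_hk, n_atoms):
    """Equation (8): A_HK = (2 / sqrt(N)) |E_H E_K E_{H+K}|."""
    return 2.0 / np.sqrt(n_atoms) * abs(e_h * e_k * e_hk)


def triplet_moments(a, n_grid=4096):
    """Mean and variance of cos(Phi) when P(Phi) is proportional to exp(a cos Phi).

    The integrals over one period use the rectangle rule on a uniform grid,
    which is highly accurate for smooth periodic integrands; normalizing the
    weights replaces the factor 1 / (2 pi I_0(a)) in (7).
    """
    phi = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    w = np.exp(a * np.cos(phi))
    w /= w.sum()
    mean = np.sum(w * np.cos(phi))             # equals I_1(a) / I_0(a), eq. (9)
    var = np.sum(w * np.cos(phi) ** 2) - mean ** 2
    return mean, var


for a in (0.5, 1.0, 2.0, 4.0):
    m, v = triplet_moments(a)
    print(f"A = {a}: E(cos) = {m:.4f}, Var(cos) = {v:.4f}")
```

The printed means are positive and grow with $A_{\mathbf{HK}}$ while the variances shrink; as a spot check, $I_1(1)/I_0(1) \approx 0.4464$.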


Minimal Principle 


It is assumed that a crystal structure consisting of $N$ identical atoms in the unit cell is fixed but unknown, that the magnitudes $|E|$ of the normalized structure factors $E$ are known, and that a sufficiently large base of phases, corresponding to the largest magnitudes $|E|$, is specified. In view of (9), one is led to construct the so-called minimal function

$$R(\varphi) = \frac{1}{\sum_{\mathbf H, \mathbf K} A_{\mathbf{HK}}} \sum_{\mathbf H, \mathbf K} A_{\mathbf{HK}} \left( \cos \varphi_{\mathbf{HK}} - \frac{I_1(A_{\mathbf{HK}})}{I_0(A_{\mathbf{HK}})} \right)^2, \qquad (10)$$

which, because of (6), is seen to be a function of the phases $\varphi$, dependent on the known magnitudes $|E|$. Again in view of (9), one is led to conjecture that the global minimum of $R(\varphi)$, constrained to satisfy the identities (5), yields the correct values of all the phases (the minimal principle). It is to be emphasized that the unconstrained global minimum of $R(\varphi)$ does not give the answer sought, nor do any of the (innumerable) local minima.
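Equation (10) translates directly into code. In this sketch the phases are stored in a dictionary keyed by reciprocal lattice vectors (a hypothetical data layout), and Friedel's law $\varphi_{-\mathbf H} = -\varphi_{\mathbf H}$, which follows from (2), supplies the third phase of each triplet. Because NumPy provides `np.i0` but no $I_1$, the ratio $I_1/I_0$ is evaluated by quadrature. All function names are hypothetical.

```python
import numpy as np


def bessel_ratio(a, n_grid=2048):
    """I_1(a) / I_0(a), using I_n(a) = mean of exp(a cos t) cos(n t) over a period."""
    t = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    w = np.exp(np.multiply.outer(a, np.cos(t)))
    return (w * np.cos(t)).mean(axis=-1) / w.mean(axis=-1)


def minimal_function(phases, triplets, a):
    """Minimal function R(phi) of eq. (10).

    phases:   dict mapping a reciprocal lattice vector (tuple of ints) to its phase.
    triplets: list of (H, K) pairs; phi_HK = phi_H + phi_K + phi_{-H-K}, eq. (6).
    a:        the corresponding A_HK values, eq. (8).
    """
    def phase(v):
        if v in phases:
            return phases[v]
        return -phases[tuple(-c for c in v)]   # Friedel's law: phi_{-H} = -phi_H

    a = np.asarray(a, dtype=float)
    cos_t = np.array([
        np.cos(phase(h) + phase(k) + phase(tuple(-x - y for x, y in zip(h, k))))
        for h, k in triplets
    ])
    return float(np.sum(a * (cos_t - bessel_ratio(a)) ** 2) / np.sum(a))


phases = {(1, 0, 0): 0.3, (0, 1, 0): -0.5, (1, 1, 0): -0.2}
triplets = [((1, 0, 0), (0, 1, 0))]        # phi_HK = 0.3 - 0.5 + 0.2 = 0
print(minimal_function(phases, triplets, [1.0]))   # (1 - I_1(1)/I_0(1))^2, about 0.306
```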


Computer Program ‘Shake and Bake’ 


The six-part Shake and Bake phase determination procedure, shown by the flow diagram in Fig. 1, combines minimal-function phase refinement and real-space filtering. It is an iterative process that is repeated until a solution is achieved or a designated number of cycles has been performed. With reference to Fig. 1, the major steps of the algorithm are described next.

Phase Problem in X-ray Crystallography: Shake and Bake Approach, Figure 1
Flow chart for Shake and Bake, the minimal-function phase refinement and real-space filtering procedure (stages: generate invariants; structure-factor calculation or inverse Fourier summation; phase refinement, $R(\varphi)$ reduced; Fourier summation; real-space filtering)

A) Generate invariants. Normalized structure-factor magnitudes ($|E|$'s) are generated by standard scaling methods, and the triplet invariants that involve the largest corresponding $|E|$'s are generated. Parameter choices that must be made at this stage include the numbers of phases and triplets to be used. The total number of invariants is ordinarily chosen to be at least 100 times the number of atoms whose positions are to be determined.

B) Generate trial structure. A trial structure or model is generated that consists of a number of randomly positioned atoms equal to the number of atoms in the unit cell. The starting coordinate sets are subject to the restrictions that no two atoms are closer than a specified distance (normally 1.2 Å) and that no atom is within bonding distance of more than four other atoms.


C) Structure-factor calculation. A normalized structure-factor calculation based on the trial coordinates is used to compute initial values for all the desired phases simultaneously. In subsequent cycles, peaks selected from the most recent Fourier series are used as atoms to generate new phase values.


D) Phase refinement. The values of the phases are perturbed by a parameter-shift method in which $R(\varphi)$, which measures the mean-square difference between estimated and calculated structure invariants, is reduced in value. $R(\varphi)$ is initially computed on the basis of the set of phase values obtained from the structure-factor calculation in step C. The phase set is ordered in decreasing magnitude of the associated $|E|$'s. The value of the first phase is incremented by a preset amount and $R(\varphi)$ is recalculated. If the new calculated value of $R(\varphi)$ is lower than the previous one, the value of the first phase is incremented again by the preset amount. This is continued until $R(\varphi)$ no longer decreases or until a predetermined number of increments has been applied to the first phase. A completely analogous course is taken if, on the initial incrementation, $R(\varphi)$ increases, except that the value of the first phase is decremented until $R(\varphi)$ no longer decreases or until the predetermined number of decrements has been applied. The remaining phase values are varied in sequence as just described. Note that, when the $i$th phase value is varied, the new values determined for the previous $i - 1$ phases are used immediately in the calculation of $R(\varphi)$. This process, when convergent, often, but not always, yields the constrained global minimum of $R(\varphi)$. The step size and number of steps are variables whose values must be chosen.

E) Fourier summation. Fourier summation is used to transform phase information into an electron-density map (refer to (3)). The grid size must be specified.

F) Real-space filtering (identities among phases imposed). Image enhancement has been accomplished by a discrete electron-density modification consisting of the selection of a specified number of the largest peaks on the Fourier map for use in the next structure-factor calculation. The simple choice, in each cycle, of a number of the largest peaks corresponding to the number of expected atoms has given satisfactory results. No minimum-interpeak-distance criterion is applied at this stage.
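Step D amounts to a coordinate search over the phases. The sketch below takes the objective as an arbitrary callable (in Shake and Bake it would be the minimal function (10)). Its step size and maximum number of steps are illustrative defaults rather than the values used by the program, and the function name is hypothetical. A toy objective stands in for $R(\varphi)$ in the usage example.

```python
import numpy as np


def parameter_shift(phases, objective, step=np.pi / 2, max_steps=2):
    """One pass of parameter-shift refinement, following step D.

    phases:    1-D array, assumed ordered by decreasing |E|.
    objective: callable returning the value R(phi) for a full phase array.
    Each phase is incremented by `step` while the objective decreases (at
    most `max_steps` times); if the first increment does not lower it, the
    phase is decremented instead. New values are used at once for later phases.
    """
    phi = np.array(phases, dtype=float)
    r_best = objective(phi)
    for i in range(phi.size):
        for direction in (1.0, -1.0):
            moved = False
            for _ in range(max_steps):
                trial = phi.copy()
                trial[i] += direction * step
                r_trial = objective(trial)
                if r_trial >= r_best:
                    break                      # objective no longer decreases
                phi, r_best, moved = trial, r_trial, True
            if moved:
                break                          # skip the opposite direction
    return phi, r_best


# Toy stand-in for R(phi): zero when every phase matches a hidden target
target = np.array([np.pi / 2, -np.pi / 2, np.pi])


def toy_r(p):
    return float(np.sum(1.0 - np.cos(p - target)))


phi, r = parameter_shift(np.zeros(3), toy_r)
print(phi, r)   # phases pi/2, -pi/2, pi with objective 0.0
```

As the text notes for the real procedure, such a pass is not guaranteed to reach the global minimum; here the targets happen to lie on the step grid.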


Applications 


Shake and Bake has been tested successfully, with no failure, using experimentally measured atomic resolution data for some 30 structures, many of which had been difficult to solve with existing techniques or had defeated all previous attempts. These structures range in size from 25 to 1000 independent nonhydrogen atoms. Although a number of these structures had been previously known, this fact was not used in these applications. In all cases the trials that led to solutions were readily identified by the behavior of the minimal function $R(\varphi)$ (see Figs. 2 and 3).

Phase Problem in X-ray Crystallography: Shake and Bake Approach, Figure 2
Final values of the minimal function $R(\varphi)$ after 255 cycles for 1619 Shake and Bake trials for the 624-atom Tox II structure (space group $P2_12_12_1$), clearly showing the separation between the single solution and the 1618 nonsolutions (histogram; horizontal axis: final value of $R(\varphi)$)

Phase Problem in X-ray Crystallography: Shake and Bake Approach, Figure 3
The course of $R(\varphi)$ for Tox II, as a function of cycle number, for the solution trial and for a typical nonsolution trial


A Notable Application: Determination by Shake and Bake of the Previously Known Crystal Structure of Toxin II From the Scorpion Androctonus australis Hector

This structure, consisting of 64 amino acid residues (512 protein atoms, the heaviest comprising four disulfide bridges) and 112 water molecules (a total of 624 atoms), crystallizes in the space group $P2_12_12_1$ and diffracts to a resolution of 0.96 Å. A total of 50,000 triplets having the largest values of $A_{\mathbf{HK}}$ were generated from the 5,000 phases with the largest values of $|E|$ and used in the definition of the minimal function $R(\varphi)$. A total of 1619 Shake and Bake trials were run, each for 255 cycles. The final value of $R(\varphi)$ for the trial that led to the solution was 0.467, the value of the constrained global minimum of $R(\varphi)$. The range of final values of $R(\varphi)$ for the remaining 1618 trials was [0.507, 0.532] (see the histogram, Fig. 2), clearly nonsolutions.

Figure 3 shows the course of the minimal function $R(\varphi)$, as a function of cycle number, for the trial that led to the solution and for a typical nonsolution trial. Both trials show almost identical behavior for some 130 cycles, after which $R(\varphi)$ for the solution trial drops precipitously from a value of about 0.50 to 0.467 and remains at about that level for all remaining cycles. For the nonsolution trial, however, $R(\varphi)$ oscillates between 0.51 and 0.52 for all remaining cycles.
In this article, minimum cost network flow problems [38] with piecewise linear arc costs are considered; these are called piecewise linear minimum cost network flow problems (PLNFP). As a special subclass of minimum cost network flow problems, general piecewise linear network problems can be classified according to the type of piecewise cost functions. Using general network flow constraints, PLNFP can be stated as follows. Given a directed graph $G = (N, A)$ consisting of a set $N$ of $m$ nodes and a set $A$ of $n$ arcs, solve

[PLNFP]:
$$\min\; f(x) = \sum_{(i,j) \in A} f_{ij}(x_{ij})$$
subject to
$$\sum_{(k,i) \in A} x_{ki} - \sum_{(i,k) \in A} x_{ik} = b_i, \quad \forall i \in N, \qquad (1)$$
$$0 \le x_{ij} \le u_{ij}, \quad \forall (i,j) \in A, \qquad (2)$$

where $f$ is separable and each $f_{ij}$ is piecewise linear. For instance, the arc cost $f_{ij}(x_{ij})$ can be defined as follows:

$$f_{ij}(x_{ij}) = \begin{cases} c_{ij}^1 x_{ij} + s_{ij}^1, & 0 \le x_{ij} < \lambda_{ij}^1, \\ c_{ij}^2 x_{ij} + s_{ij}^2, & \lambda_{ij}^1 \le x_{ij} < \lambda_{ij}^2, \\ \quad \vdots & \\ c_{ij}^{r_{ij}} x_{ij} + s_{ij}^{r_{ij}}, & \lambda_{ij}^{r_{ij}-1} \le x_{ij} \le u_{ij}, \end{cases}$$

where $\lambda_{ij}^k$ for $k = 1$ to $r_{ij} - 1$ are breakpoints in the given interval $[0, u_{ij}]$. The constraints in (1) are called the conservation of flow equations. The constraints in (2) are called capacity constraints on the arc flows. The problem is uncapacitated if $u_{ij} = \infty$ for all $(i,j) \in A$.

Three specific classes of PLNFP are identified based on the arc costs $f_{ij}$:
1) Convex PLNFP.
2) Concave PLNFP.
3) Indefinite PLNFP.

In some cases, indefinite PLNFP is called discontinuous PLNFP, since it usually results from a set of discontinuities in the arc cost functions. Since the fixed charge network flow problem (FCNFP) is closely related to PLNFP, it is important to understand the special structure of FCNFP in order to solve PLNFP. Owing to the global optimality property of concave minimization (a concave function minimized over a polytope attains its minimum at an extreme point) [19,33], global solutions of FCNFP and concave PLNFP can be obtained at extreme points of the feasible region. A recent survey on minimum concave-cost network flow problems can be found in [16].


Convex PLNFP 


Firstly, let us consider a convex PLNFP. Suppose the constraints are all linear and the cost function to be minimized is separable and piecewise linear. Then the proportionality assumption of linear programming is violated for the cost function. However, if the piecewise linear function is convex, the problem can still be modeled as an LP [29]. Let $f_{ij}(x_{ij})$ denote the contribution of $x_{ij}$ to the separable objective function. Suppose that there are $r_{ij} - 1$ breakpoints at which $f_{ij}(x_{ij})$ changes slope, such that

$$0 = \lambda_{ij}^0 < \lambda_{ij}^1 < \cdots < \lambda_{ij}^{r_{ij}-1} < \lambda_{ij}^{r_{ij}} = u_{ij}.$$

Let the slope in the subinterval $\lambda_{ij}^{k-1} \le x_{ij} \le \lambda_{ij}^k$ be $c_{ij}^k$ for $k = 1$ to $r_{ij}$, and let $y_{ij}^k$ be the portion of $x_{ij}$ lying in the $k$th subinterval, $\lambda_{ij}^{k-1}$ to $\lambda_{ij}^k$ (i.e., $y_{ij}^k$ is the length of the overlap of the interval $0$ to $x_{ij}$ with the subinterval $\lambda_{ij}^{k-1}$ to $\lambda_{ij}^k$), $k = 1$ to $r_{ij}$. When defined in this manner, the new variables $y_{ij}^1, \dots, y_{ij}^{r_{ij}}$ partition $x_{ij}$ such that

$$x_{ij} = y_{ij}^1 + y_{ij}^2 + \cdots + y_{ij}^{r_{ij}}. \qquad (3)$$

These variables are subject to the constraints

$$0 \le y_{ij}^k \le \lambda_{ij}^k - \lambda_{ij}^{k-1}, \quad k = 1, \dots, r_{ij}, \qquad (4)$$

and

$$\text{for every } t, \text{ if } y_{ij}^t > 0, \text{ then } y_{ij}^k = \lambda_{ij}^k - \lambda_{ij}^{k-1} \text{ for all } k < t, \qquad (5)$$

that is, every earlier segment must be filled to its upper bound.

Defining the new variables as shown above, it is clear that $f_{ij}(x_{ij})$ is equal to $c_{ij}^1 y_{ij}^1 + \cdots + c_{ij}^{r_{ij}} y_{ij}^{r_{ij}}$ (up to the constant $f_{ij}(0)$). If the original separable piecewise linear objective function to be minimized is continuous and convex, so that the slopes satisfy the increasing conditions

$$c_{ij}^1 < c_{ij}^2 < \cdots < c_{ij}^{r_{ij}}, \qquad (6)$$

then constraints of type (5) on the new variables can be ignored in the transformed model, because a minimizing solution fills the cheaper, earlier segments first. Since convex PLNFP is reformulated as an LP using the above technique and has specially structured network constraints, it can be solved efficiently in polynomial time. If $f_{ij}$ is not continuous or the slopes do not satisfy condition (6), then the constraints (5) must be explicitly included in the model. Since these constraints are not linear, the transformed model is no longer an LP.
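The segment variables (3)-(5) can be computed directly: $y_{ij}^k$ is the length of the overlap of $[0, x_{ij}]$ with $[\lambda_{ij}^{k-1}, \lambda_{ij}^k]$. The sketch below uses a hypothetical convex arc and hypothetical function names, and assumes $f_{ij}(0) = 0$. With increasing slopes, the split that fills earlier segments first is also the cheapest split satisfying (3) and (4), which is why (5) can be dropped in the convex case.

```python
import numpy as np


def split_flow(x, breakpoints):
    """Split an arc flow x into y^1, ..., y^r as in (3)-(5).

    breakpoints: 0 = lambda^0 < lambda^1 < ... < lambda^r = u.
    y^k is the overlap of [0, x] with [lambda^{k-1}, lambda^k], so earlier
    segments are filled to their upper bounds first, as (5) requires.
    """
    lam = np.asarray(breakpoints, dtype=float)
    return np.clip(x - lam[:-1], 0.0, np.diff(lam))


def piecewise_cost(x, breakpoints, slopes):
    """Continuous piecewise linear arc cost with f(0) = 0: sum_k c^k y^k."""
    return float(np.dot(slopes, split_flow(x, breakpoints)))


# Hypothetical convex arc: breakpoints 0, 2, 5, 10 and increasing slopes 1, 3, 6
lam, c = [0.0, 2.0, 5.0, 10.0], [1.0, 3.0, 6.0]
print(split_flow(4.0, lam), piecewise_cost(4.0, lam, c))   # [2. 2. 0.] 8.0
# Another split satisfying (3)-(4) but not (5) costs more, e.g. y = (1, 3, 0):
print(np.dot(c, [1.0, 3.0, 0.0]))                          # 10.0
```

With decreasing (concave) slopes the same LP would prefer the cheaper later segments and violate (5), which is why (5) must then be enforced explicitly.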


FCNFP 


Due to the similarity of its structure to the piecewise linear case, the fixed charge network flow problem (FCNFP) is closely related to PLNFP. It is important and useful to investigate some features of FCNFP even though FCNFP does not belong to the class of PLNFP.

The FCNFP is a special case of minimum concave cost network flow problems (MCNFP) [19], whose arc cost function has a discontinuity at the origin. The arc cost function $f_{ij}(x_{ij})$ of the FCNFP has the form

$$f_{ij}(x_{ij}) = \begin{cases} 0 & \text{if } x_{ij} = 0,\\ s_{ij} + c_{ij}x_{ij} & \text{if } x_{ij} > 0, \end{cases} \quad (7)$$

where $s_{ij} > 0$ is a fixed cost for arc $(i,j) \in A$.

In many practical problems, the cost of an activity is the sum of a fixed cost and a cost proportional to the level of the activity. FCNFP is obtained by imposing a fixed cost $s_{ij} > 0$ if there is positive flow on arc $(i,j)$, together with a variable cost $c_{ij}$. Due to the discontinuity of $f_{ij}$, the problem can be transformed into a 0-1 mixed integer programming problem by introducing $n$ binary variables that indicate whether the corresponding activity is being carried out or not. Assuming $s_{ij} > 0$, $f_{ij}$ can be replaced by

$$f_{ij} = c_{ij}x_{ij} + s_{ij}y_{ij}$$

with

$$x_{ij} \ge 0 \quad\text{and}\quad y_{ij} = \begin{cases} 0 & \text{if } x_{ij} = 0,\\ 1 & \text{if } x_{ij} > 0. \end{cases} \quad (8)$$

The above condition (8) can be incorporated into the capacity constraints to yield

$$0 \le x_{ij} \le u_{ij}y_{ij}, \qquad y_{ij} \in \{0,1\}.$$


Hence we obtain the following formulation of the fixed charge network flow problem, $(\text{FCNFP})_{\text{MIP}}$:

$$\begin{aligned} \min\ & \sum_{(i,j)\in A} (c_{ij}x_{ij} + s_{ij}y_{ij})\\ \text{subject to}\ & Mx = b, \qquad (9)\\ & 0 \le x_{ij} \le u_{ij}y_{ij}, \quad (i,j)\in A, \qquad (10)\\ & y_{ij} \in \{0,1\}, \quad (i,j)\in A, \qquad (11) \end{aligned}$$

where $M$ is an $m \times n$ node-arc incidence matrix and $b$ is an $m$-dimensional column vector. This $(\text{FCNFP})_{\text{MIP}}$ can be solved by any classical branch and bound algorithm that uses LP relaxations [30]. These LP relaxations can be solved efficiently by existing linear network algorithms exploiting the special structure of their feasible domain [18].
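The LP relaxations mentioned above replace (11) by $0 \le y_{ij} \le 1$. A minimal single-arc sketch (our own illustration; the helper names are hypothetical) shows what this does to the cost (7). Because $s_{ij} > 0$, the cheapest $y_{ij}$ allowed by (10) is $x_{ij}/u_{ij}$, so the relaxed arc cost becomes the linear function $(c_{ij} + s_{ij}/u_{ij})\,x_{ij}$.

```python
def fixed_charge_cost(x, c, s):
    """Arc cost (7): 0 if x == 0, otherwise s + c*x."""
    return 0 if x == 0 else s + c * x


def relaxed_cost(x, c, s, u):
    """Cheapest value of c*x + s*y once the binary y of (11) is relaxed to [0, 1].
    Constraint (10) forces y >= x/u, and since s > 0 the minimum is at y = x/u."""
    if not 0 <= x <= u:
        raise ValueError("flow must satisfy 0 <= x <= u")
    return c * x + s * x / u
```

For example, with $c_{ij} = 2$, $s_{ij} = 20$, $u_{ij} = 10$ and $x_{ij} = 5$, the fixed charge cost is 30 while the relaxed cost is 20. The two agree only at $x_{ij} = 0$ and $x_{ij} = u_{ij}$. This gap is one reason why LP relaxation bounds for FCNFP can be weak.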

As we will see later, many concave and indefinite PLNFPs can be reduced to an FCNFP model by introducing new variables and modifying the problem structure. FCNFP models obtained from PLNFPs can also be transformed into 0-1 mixed integer programming problems. Consequently, the size of the resulting model grows very fast even if the original PLNFP is only of medium size. This has motivated many researchers to develop new efficient schemes that improve exact methods (especially branch and bound). Indeed, the computational effort and memory required to solve large scale FCNFP models have gradually been reduced in various application areas. Yet, since there is a limit to how far exact solution methods can be improved for solving the problem in a practical sense, effective approximate methods are still needed.


Concave PLNFP 


Unlike the convex case, concave PLNFPs are more difficult to solve, because the technique used in the convex case to reduce the problem to an LP cannot be applied. However, a concave PLNFP can be transformed into a fixed charge network problem on an extended network. The size of the extended network depends on the number of linear pieces in each arc cost function. An arc separation procedure (ASP) is required to solve the problem in this way, and ASP is valid because of the concavity of the arc cost functions.

Let us consider an arc $(i,j) \in A$ and its arc cost $f_{ij}$, and suppose $f_{ij}$ has $r_{ij}$ linear pieces as defined previously. Then arc $(i,j)$ can be separated into $r_{ij}$ parallel arcs between nodes $i$ and $j$. Each separated arc $(i,j)^k$, $k = 1$ to $r_{ij}$, has a fixed charge cost function $f_{ij}^k$ (see Fig. 1) defined by

$$f_{ij}^k(x_{ij}^k) = \begin{cases} 0 & \text{if } x_{ij}^k = 0,\\ s_{ij}^k + c_{ij}^k x_{ij}^k & \text{if } x_{ij}^k > 0. \end{cases}$$

This extended network is denoted by $G_e(N, A_e)$, where the number of arcs $|A_e|$ is given by

$$n_e = |A_e| = \sum_{(i,j)\in A} r_{ij}.$$


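After the ASP modification shown in Fig. 2, the original concave piecewise linear objective function can be expressed as a sum of fixed charge arc cost functions as follows:

$$f(x) = \sum_{(i,j)\in A} f_{ij}(x_{ij}) = \sum_{(i,j)\in A}\sum_{k=1}^{r_{ij}} f_{ij}^k(x_{ij}^k) = \sum_{(i,j)\in A}\sum_{k=1}^{r_{ij}} \left(c_{ij}^k x_{ij}^k + s_{ij}^k y_{ij}^k\right), \quad (12)$$

where $y_{ij}^k = 1$ if $x_{ij}^k > 0$ and $y_{ij}^k = 0$ otherwise.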


It is easy to see that the equality in (12) cannot hold in general without a set of constraints restricting the domain of each separated arc cost function. However, because of the following property, which follows from the concavity of the arc cost functions,

$$c_{ij}^1 > c_{ij}^2 > \cdots > c_{ij}^{r_{ij}}, \quad (13)$$

the equality does hold in this case. More precisely, at an optimal solution of the minimization problem, $f_{ij}(x_{ij})$ is equal to the cost of at most one of the separated arcs between nodes $i$ and $j$. This argument can be generalized as the following property.


Proposition 1 Given an extended network as described above, if a positive flow $x_{ij}^*$ is optimal for a minimum concave PLNFP and $\lambda_{ij}^{q-1} < x_{ij}^* \le \lambda_{ij}^q$ for some $1 \le q \le r_{ij}$, then it uses only one arc, $(i,j)^q$ (i.e., the $q$th arc), among all the separated arcs $(i,j)^k$, $k = 1, \ldots, r_{ij}$, between nodes $i$ and $j$.

Piecewise Linear Network Flow Problems, Figure 1
Example of a concave piecewise linear arc cost function

Piecewise Linear Network Flow Problems, Figure 2
Arc separation procedure

Based on this, the original concave PLNFP can be reduced to an FCNFP with the objective function given in (12) and some extended network constraints corresponding to the constraints in (1). The resulting formulation of a concave PLNFP is as follows:


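$$\begin{aligned} \min\ f(x) = {}& \sum_{(i,j)\in A}\sum_{k=1}^{r_{ij}} f_{ij}^k(x_{ij}^k)\\ \text{subject to}\ & \sum_{(l,i)\in A}\sum_{k=1}^{r_{li}} x_{li}^k - \sum_{(i,l)\in A}\sum_{k=1}^{r_{il}} x_{il}^k = b_i, \quad \forall i \in N,\\ & \sum_{k=1}^{r_{ij}} x_{ij}^k = x_{ij}, \quad \forall (i,j)\in A,\\ & x_{ij}^k \ge 0, \quad \forall (i,j)\in A,\ k = 1,\ldots,r_{ij},\\ & 0 \le x_{ij} \le u_{ij}, \quad \forall (i,j)\in A. \end{aligned} \quad (14)$$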


In this formulation, no other set of constraints is needed to specify lower and upper bounds for the separated arcs $(i,j)^k \in A_e$, because of the above proposition.

As a result, a solution of the concave PLNFP can be found by solving the fixed charge network problem formulated above. Thus, developing an efficient algorithm to solve an FCNFP (exactly or approximately) is the key to solving PLNFPs with this approach.
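The arc separation idea behind (12)-(13) and Proposition 1 can be checked numerically for a single arc. The sketch below is our own example with hypothetical function names. It uses a continuous concave cost with breakpoints (0, 4, 10), slopes $c^1 = 5 > c^2 = 2$, and intercepts $s^1 = 3$, $s^2 = 15$ chosen so that the two pieces meet at $x = 4$. Sending all the flow on the single cheapest separated arc reproduces $f_{ij}$, while splitting the flow pays two fixed charges.

```python
def concave_cost(x, bp, slopes, fixed):
    """Direct evaluation: 0 at x = 0, else s^q + c^q x on (lambda^{q-1}, lambda^q]."""
    if x == 0:
        return 0
    for q in range(1, len(bp)):
        if x <= bp[q]:
            return fixed[q - 1] + slopes[q - 1] * x
    raise ValueError("x exceeds the arc capacity")


def best_separated_arc(x, slopes, fixed):
    """Index k of the cheapest separated arc (i,j)^k when all of x > 0 uses it."""
    return min(range(len(slopes)), key=lambda k: fixed[k] + slopes[k] * x)


def split_cost(parts, slopes, fixed):
    """Cost of splitting the flow over separated arcs; each used arc pays its fixed charge."""
    return sum(fixed[k] + slopes[k] * p for k, p in enumerate(parts) if p > 0)
```

For instance, `best_separated_arc(6, [5, 2], [3, 15])` picks the second arc (index 1, cost 27), whereas splitting the same 6 units as (2, 4) costs 13 + 23 = 36.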


Indefinite PLNFP 


Lastly, we consider a PLNFP with indefinite arc cost functions, which is the most difficult case in this class. The major difficulty in finding exact solutions for indefinite PLNFPs originates from the structure of their arc cost functions. Such cost functions are neither convex nor concave, and may have a finite set of discontinuities. However, motivated by real-world applications of the model, we focus on two particular types of arc cost functions in indefinite PLNFP models: the so-called staircase arc cost function [6,17,26] and the sawtooth arc cost function [26].

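Both arc cost functions have a very similar overall shape; however, they behave differently at the breakpoints. In mathematical form:
• 'Staircase' arc cost function (see Fig. 3):
$$f_{ij}(\lambda_{ij}^{k-1}) < f_{ij}(\lambda_{ij}^{k-1} + \varepsilon) \quad \text{for any sufficiently small } \varepsilon > 0 \text{ and } k = 2, \ldots, r_{ij}.$$
• 'Sawtooth' arc cost function (see Fig. 4):
$$f_{ij}(\lambda_{ij}^{k-1}) > f_{ij}(\lambda_{ij}^{k-1} + \varepsilon) \quad \text{for any sufficiently small } \varepsilon > 0 \text{ and } k = 2, \ldots, r_{ij}.$$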

Moreover, it is assumed that the slope property (13) still holds, since it is a very common phenomenon in real applications. Note that extreme point solutions are not guaranteed in these cases, since the objective functions are no longer concave.

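Piecewise Linear Network Flow Problems, Figure 3
Staircase arc cost function

Piecewise Linear Network Flow Problems, Figure 4
Sawtooth arc cost function

Now we introduce an equivalent $(\text{FCNFP})_{\text{MIP}}$ formulation of the problem with some additional parameters and binary variables. Let us define the size of the interval between adjacent breakpoints as

$$\Delta\lambda_{ij}^k = \lambda_{ij}^k - \lambda_{ij}^{k-1}, \quad \forall (i,j)\in A,\ k = 1,\ldots,r_{ij}, \quad (15)$$

and define the gap of function values at each breakpoint of the arc cost functions as

$$\begin{aligned}\Delta d_{ij}^k &= \left(c_{ij}^k\lambda_{ij}^{k-1} + s_{ij}^k\right) - \left(c_{ij}^{k-1}\lambda_{ij}^{k-1} + s_{ij}^{k-1}\right)\\ &= \left(s_{ij}^k - s_{ij}^{k-1}\right) + \left(c_{ij}^k - c_{ij}^{k-1}\right)\lambda_{ij}^{k-1}, \quad \forall (i,j)\in A,\ k = 1,\ldots,r_{ij}, \end{aligned} \quad (16)$$

where $s_{ij}^0 = 0$ and $c_{ij}^0 = 0$ (also clearly $\Delta d_{ij}^1 = s_{ij}^1$). We now let $x_{ij}^k$ be the part of $x_{ij}$ that lies within level $k$ (i.e., the $k$th subinterval), in the following sense:

$$x_{ij}^k = \begin{cases} 0 & \text{if } x_{ij} \le \lambda_{ij}^{k-1},\\ x_{ij} - \lambda_{ij}^{k-1} & \text{if } \lambda_{ij}^{k-1} < x_{ij} < \lambda_{ij}^k,\\ \Delta\lambda_{ij}^k & \text{if } x_{ij} \ge \lambda_{ij}^k. \end{cases} \quad (17)$$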


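This yields the substitution $x_{ij} = \sum_{k=1}^{r_{ij}} x_{ij}^k$ for use in the $(\text{FCNFP})_{\text{MIP}}$ model. We then introduce new binary variables defined by

$$y_{ij}^k = \begin{cases} 1 & \text{if } \lambda_{ij}^{k-1} < x_{ij},\\ 0 & \text{otherwise.} \end{cases} \quad (18)$$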


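Using (15)-(18), the indefinite PLNFP under consideration can be formulated as an MIP version of FCNFP as follows:

$$\min \sum_{(i,j)\in A}\sum_{k=1}^{r_{ij}} \left(c_{ij}^k x_{ij}^k + \Delta d_{ij}^k y_{ij}^k\right)$$

subject to the constraints in (1) and

$$\begin{aligned} &\sum_{k=1}^{r_{ij}} x_{ij}^k = x_{ij}, && \forall (i,j), && (19)\\ &x_{ij}^k \le \Delta\lambda_{ij}^k y_{ij}^k, && \forall (i,j),\ k = 1,\ldots,r_{ij}, && (20)\\ &x_{ij}^k \ge \Delta\lambda_{ij}^k y_{ij}^{k+1}, && \forall (i,j),\ k = 1,\ldots,r_{ij}-1, && (21)\\ &x_{ij}^k \ge 0, && \forall (i,j),\ k = 1,\ldots,r_{ij}, && (22)\\ &y_{ij}^k \in \{0,1\}, && \forall (i,j),\ k = 1,\ldots,r_{ij}. && (23) \end{aligned}$$

Note that combining one constraint from (20) with one from (21) yields $\Delta\lambda_{ij}^{k-1} y_{ij}^k \le x_{ij}^{k-1} \le \Delta\lambda_{ij}^{k-1} y_{ij}^{k-1}$, which implies $y_{ij}^k \le y_{ij}^{k-1}$ for all $(i,j)$ and $k > 1$.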
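The following sketch (our own check; all function names are hypothetical) verifies, for one arc, that the objective $\sum_k (c_{ij}^k x_{ij}^k + \Delta d_{ij}^k y_{ij}^k)$ reproduces the arc cost $s_{ij}^q + c_{ij}^q x_{ij}$ on the level $q$ containing $x_{ij}$, when $x_{ij}^k$ is taken from (17) and $y_{ij}^k$ from (18). It also checks that these values satisfy (20) and (21). The example arc is hypothetical, with breakpoints (0, 4, 10, 15), slopes (3, 2, 1) and intercepts (5, 12, 20). Its cost jumps up at $\lambda^1 = 4$ (staircase-like) and down at $\lambda^2 = 10$ (sawtooth-like), while the slopes decrease as in (13).

```python
def level_portions(x, bp):
    """(17): the part of x lying in each level k = 1..r."""
    return [min(max(x - lo, 0), hi - lo) for lo, hi in zip(bp, bp[1:])]


def level_indicators(x, bp):
    """(18): y^k = 1 if lambda^{k-1} < x, else 0."""
    return [1 if lo < x else 0 for lo in bp[:-1]]


def gaps(bp, slopes, fixed):
    """(16): Delta d^k = (s^k - s^{k-1}) + (c^k - c^{k-1}) * lambda^{k-1}, s^0 = c^0 = 0."""
    out, s_prev, c_prev = [], 0, 0
    for lo, c, s in zip(bp, slopes, fixed):
        out.append((s - s_prev) + (c - c_prev) * lo)
        s_prev, c_prev = s, c
    return out


def satisfies_20_21(xs, ys, bp):
    """Check x^k <= Delta lambda^k y^k (20) and x^k >= Delta lambda^k y^{k+1} (21)."""
    steps = [hi - lo for lo, hi in zip(bp, bp[1:])]
    ok20 = all(xk <= dl * yk for xk, dl, yk in zip(xs, steps, ys))
    ok21 = all(xs[k] >= steps[k] * ys[k + 1] for k in range(len(xs) - 1))
    return ok20 and ok21


def mip_cost(x, bp, slopes, fixed):
    """MIP objective evaluated at the values (17)-(18)."""
    xs, ys = level_portions(x, bp), level_indicators(x, bp)
    dd = gaps(bp, slopes, fixed)
    return sum(c * xk + d * yk for c, xk, d, yk in zip(slopes, xs, dd, ys))


def direct_cost(x, bp, slopes, fixed):
    """0 at x = 0, else s^q + c^q x on (lambda^{q-1}, lambda^q]."""
    if x == 0:
        return 0
    for q in range(1, len(bp)):
        if x <= bp[q]:
            return fixed[q - 1] + slopes[q - 1] * x
    raise ValueError("x exceeds the arc capacity")
```

For this arc the gaps are $\Delta d^1 = 5$, $\Delta d^2 = 3$ and $\Delta d^3 = -2$. The negative gap is what lets the MIP objective drop at the sawtooth breakpoint, for example giving cost 32 at $x_{ij} = 12$.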

There is another approach, which formulates the problem as a concave minimum cost network flow problem (MCNFP). In [26], B.W. Lamar described an equivalent formulation of MCNFP with general nonlinear arc costs (including the problem considered in this section) as a concave MCNFP on an extended network. The equivalence between the problems is based on converting each arc with an arbitrary cost function in the original problem into an arc with a concave piecewise linear cost function in series with a set of parallel arcs, each with a linear arc cost function (cf. [26] for details). Thus, the resulting problem is a concave MCNFP, which differs from the FCNFP formulation shown above.
Applications


Piecewise linear network models have a number of applications in areas such as transportation, location, distribution, communication network design, and economic lot-sizing. Due to the structure of their objective functions (especially in concave and indefinite PLNFP models), many of the real-world situations listed above can be modeled as PLNFPs. Two major fields of application, which have been studied extensively, are the following.


Transportation Problems 


The first major field is transportation-related problems with concave cost functions, including fixed charge cost functions. The concave cost functions in this field are in many cases assumed to be piecewise linear. A number of algorithms based on different schemes, together with their computational results, have been reported; see the (limited) list of references [2,3,4,7,11,15,20,21,24,25,31,34,35].

Exact solution approaches include diverse techniques based on extreme point ranking, branch and bound, and dynamic programming. Since the problems can be formulated as MIP models, branch and bound with various branching schemes has been of major interest in the literature.

K.G. Murty [28] introduced an extreme point rank- 
ing method for solving fixed charge problems. P. McK- 
eown [27] extended Murty’s method to avoid some de- 
generacy of the problem. Recently, Pardalos [32] dis- 
cussed a range of enumerative techniques based on ver- 
tex enumeration or extreme point ranking. 

P. Rech and L.G. Barton [34] investigated a nonconvex transportation algorithm that applies a branch and bound approach to problems with piecewise linear cost functions. These functions are approximated by a convex envelope, and the resulting problems are solved using the out-of-kilter method. C.T. Bornstein and R. Rust [6] specialized this approach to the problem with a staircase cost function, using successive linearizations of the objective function. P.T. Thach [37] proposed a method for decomposing the problem with a staircase structure into a sequence of much smaller subproblems.

Lamar [25] developed a branch and bound approach for capacitated MCNFPs in which the costs consist of piecewise linear segments. The problem is formulated as an MIP, with the branching variables determining which linear cost region an arc flow falls into. Recently, D. Kim and P.M. Pardalos [23] developed a heuristic procedure for solving the general FCNFP without formulating it as an MIP. The procedure consists of solving a series of LPs to update slopes and searching extreme points of the convex feasible region with the updated slopes. This approach lends itself to parallel implementation with different initial solutions to improve the quality of solutions. Other heuristic approaches can be found in [8,11,21,24].


Location Problems 


Another major application area is location problems. Since the task in this field is to locate facilities and determine their size so as to minimize total cost [13], it naturally involves fixed costs and/or piecewise linear costs. Since solution methods in this field often use network formulations, they are quite similar to those for solving FCNFP or PLNFP [1,10,12,22,36]. However, some Lagrangian approaches [5,14] exploit particular problem structures.

Recently, K. Holmberg [17] proposed a decomposi- 
tion and linearization approach for solving the facility 
location problem with staircase costs. A comparison of 
heuristic and relaxation approaches in this field can be 
found in [9]. 


Concluding Remarks 


In this article, three categories of PLNFP are identified and formulated in general form. Some properties of the problems in each category are investigated to give insight into the problems, including FCNFP. The concave PLNFP is formulated as an FCNFP with MIP structure by exploiting the concavity of the arc cost functions. Moreover, the indefinite (nonconvex) PLNFP is also transformed into an FCNFP with a reduced feasible region. This implies that an extreme point solution of the transformed FCNFP may not be an extreme point solution of the original indefinite PLNFP.

A major advantage of the formulations introduced in this article is that solutions can be found by solving fixed charge problems instead of difficult nonconvex optimization problems. As the transformation to FCNFP shows, however, the resulting FCNFP is usually quite large because of the new binary variables introduced in the model. Thus, developing an efficient algorithm for large scale FCNFP is the key to solving concave and indefinite PLNFPs in practice.
We present a new class of simplex type algorithms for solving general linear problems. The distinguishing characteristic of the new algorithms is their capability of generating two paths to the optimal solution. One path is of simplex type while the other is not. The basic solutions of the simplex path are not feasible. The nonsimplex path consists of feasible segments whose endpoints lie on the boundary of the feasible region. Preliminary computational results indicate that the new algorithms are substantially faster than the classical simplex algorithm.


Introduction 


The classical simplex algorithm [6] was the most efficient method for solving practical linear problems until the mid-1980s. Then N.K. Karmarkar [9] developed a polynomial-time interior point algorithm. Subsequent research led to the development of efficient interior point algorithms that outperform the simplex algorithm on large linear problems. Despite this, the development of pivoting algorithms that clearly outperform the classical simplex method has remained of great interest. A new class of pivoting algorithms developed recently appears to be more efficient than the simplex method.

The new algorithms differ radically from the classical simplex algorithm. The basic solutions generated are not feasible; in that sense the algorithms are exterior point methods. However, the algorithms generate a second path, which consists of feasible points. In fact, it consists of line segments whose endpoints lie on the boundary of the feasible region. As a result, movement between adjacent basic feasible solutions is avoided. The geometry suggests that the new algorithms are faster than the well-known simplex method, which is supported by the preliminary computational results available, see [2,5,8]. However, more computational results are needed to draw firmer conclusions.

The first exterior point simplex type algorithm was developed by K. Paparrizos [11] for the assignment problem. Other exterior point algorithms for network flow problems can be found in [1,10,15]. Paparrizos [12] generalized his exterior point method to the general linear problem by developing an algorithm of dual nature. Independently, a similar primal algorithm solving a sequence of linear subproblems was developed in [3]. The algorithms in [12] and [3] generate two paths. One path is feasible, the other is not. However, both paths are of simplex type. In [13] the algorithms in [12] and [3] are generalized so that the feasible path is not of simplex type. The main algorithm presented in this article is very similar to the algorithm described in [13]. The algorithm in [5] also generates two paths of simplex type: one path is feasible for the primal problem, while the other is feasible for the dual problem. A generalization of this algorithm can be found in [14].


Algorithm Description 


We will describe the algorithm using the linear problem in the standard form

$$(\text{P1})\qquad \min\ c^Tx \quad \text{s.t.}\ Ax = b,\ x \ge 0,$$

where $c, x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m\times n}$ and $T$ denotes transposition. Without loss of generality we assume that $A$ has full row rank, i.e., the row rank of $A$ is $m$ (with $m < n$).

A submatrix of $A$ consisting of $m$ linearly independent columns is called a basic matrix. The $j$th column of a matrix or tableau $A$ is denoted by $a_j$ and the $i$th row by $A_i$. A column in a basic matrix $B$ is called basic. The columns not in $B$ are called nonbasic. The submatrix of $A$ consisting of all the nonbasic columns is called a nonbasic matrix and is denoted by $N$. With $B$ and $N$ we also denote the index sets $B = \{j : a_j \text{ is basic}\}$ and $N = \{j : a_j \text{ is nonbasic}\}$. The basic and nonbasic components of a vector $x$ (respectively, $c$) are denoted by $x_B$, $x_N$ (respectively, $c_B$, $c_N$). Setting $x_N = 0$ in the equality constraints of (P1) we have $Bx_B = b$, and hence

$$x_B = B^{-1}b.$$

The solution $x_B = B^{-1}b$, $x_N = 0$ is called basic. A solution that satisfies all the constraints of (P1) is called feasible. Clearly, a basic solution is feasible if and only if $x_B \ge 0$.


Given a basic matrix $B$, we can use the equality constraints of (P1) to express the basic components $x_B$ as a function of the nonbasic components $x_N$:

$$x_B = B^{-1}b - B^{-1}Nx_N.$$

Substituting in the objective function of (P1) we have

$$\begin{aligned} z = c^Tx &= c_B^Tx_B + c_N^Tx_N\\ &= c_B^T(B^{-1}b - B^{-1}Nx_N) + c_N^Tx_N\\ &= c_B^TB^{-1}b + (c_N^T - c_B^TB^{-1}N)x_N. \end{aligned}$$

We set

$$z_j = c_j - c_B^TB^{-1}a_j \quad\text{and}\quad H = -B^{-1}N.$$

A feasible basis $B$ is optimal if $z_j \ge 0$ for all $j \in N$.

Now we are ready to describe an exterior point algorithm. Let $B$ be a nonoptimal basic matrix, not necessarily feasible for (P1); that is, $x_B = B^{-1}b \ge 0$ need not hold. Set $Q = \{j \in N : z_j < 0\}$ and $R = N \setminus Q = \{j \in N : z_j \ge 0\}$. If $Q = \emptyset$, then $B$ is an optimal basis and $(x_B, x_N)$ is an optimal solution to (P1). In this case the algorithm stops. Otherwise, a leaving variable $x_{B[r]} = x_k$ is determined as follows.

First, an improving direction $d$ ($c^Td < 0$) such that $d_R = 0$ and $d_Q > 0$ is constructed. The direction $d$ is constructed in such a way that the ray $\{x + td : t \ge 0\}$ intersects the feasible region of problem (P1). In the case $x_B \ge 0$, $d$ is very easily computed: just set $d_R = 0$, $d_Q > 0$ and

$$d_B = \sum_{j\in Q} h_j d_j, \quad (1)$$

where $h_j = -B^{-1}a_j$ is the column of $H$ corresponding to $j$. The leaving variable is determined by the following minimum ratio test:

$$\frac{x_{B[r]}}{-d_{B[r]}} = \min\left\{ \frac{x_{B[i]}}{-d_{B[i]}} : d_{B[i]} < 0 \right\}. \quad (2)$$


Observe that the above minimum ratio test is precisely the one used by the primal simplex algorithm. However, keep in mind that our algorithm is an exterior point method and, hence, the basic solutions are not feasible in general. As in the simplex method, the exterior point algorithm stops in the case $d_B \ge 0$. In this case, it can easily be shown that problem (P1) is unbounded.
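As a small illustration of this stopping case, consider a toy problem of our own (not from the source): $\min -x_1$ subject to $-x_1 + x_2 = 1$, $x \ge 0$, started from the feasible basis $B = \{2\}$.

```latex
\begin{aligned}
x_B &= x_2 = 1, \qquad z_1 = c_1 - c_2 B^{-1} a_1 = -1 - 0 = -1 < 0
      \;\Rightarrow\; Q = \{1\},\ R = \emptyset,\\
h_1 &= -B^{-1} a_1 = -(1)^{-1}(-1) = 1, \qquad d_Q = d_1 = 1, \qquad
      d_B = d_2 = h_1 d_1 = 1 \ge 0 .
\end{aligned}
```

Since $d_B \ge 0$, the algorithm stops; indeed $x + td = (t, 1 + t)$ is feasible for every $t \ge 0$, and $c^T(x + td) = -t$ decreases without bound.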

After the choice of the leaving variable $x_{B[r]} = x_k$, we proceed to the choice of the entering variable $x_l$. It is worth mentioning that the exterior point algorithm, although primal in nature, chooses the entering and leaving variables in the reverse order to that of the primal simplex algorithm: first the leaving variable is chosen and then the entering one. In that sense, the exterior point algorithm resembles the dual simplex method. The entering variable is chosen as follows. Let $h_{rj}$ denote the $r$th component of $h_j$ (the $(r,j)$ entry of $H$), and set

$$\frac{z_p}{h_{rp}} = \min\left\{\frac{z_j}{h_{rj}} : h_{rj} < 0,\ j \in Q\right\} \quad (3)$$

and

$$\frac{z_s}{h_{rs}} = \min\left\{\frac{z_j}{h_{rj}} : h_{rj} > 0,\ j \in R\right\}. \quad (4)$$

Then

$$l = p \quad \text{if} \quad \frac{z_p}{h_{rp}} \le \frac{z_s}{h_{rs}}, \quad (5)$$

and $l = s$ otherwise. From the way the entering variable is chosen, it is easily seen that priority is given to the variables in $Q$.

Let $y$ be the feasible point at which the ray $\{x + td : t \ge 0\}$ exits the feasible region. Let $\bar B$ be the new basis, $\bar B = (B \setminus \{k\}) \cup \{l\}$. Also, let $\bar x$ be the new basic solution. The new direction is $\bar d = y - \bar x$, and a new iteration can be initiated.

Formally, the algorithm can be described as follows.

[0] (Initialization) Start with a feasible basis $B$. Set $N = \{1,\ldots,n\} \setminus B$, $Q = \{j \in N : z_j < 0\}$, $R = N \setminus Q$. Construct the improving direction $d$, where $d_R = 0$, $d_Q = 1$ and $d_B = \sum_{j\in Q} h_j$. Set also $d_0 = \sum_{j\in Q} z_j$ (this is $c^Td$).

[1] (Test of optimality) If $Q = \emptyset$, STOP. The current basic solution is optimal.

[2] (Choice of the leaving variable) If $d_B \ge 0$, STOP. Problem (P1) is unbounded. Otherwise, choose the leaving variable $x_{B[r]} = x_k$ from relation (2).

[3] (Choice of the entering variable) Choose the entering variable $x_l$: compute $p$ and $s$ from relations (3) and (4). If $z_p/h_{rp} \le z_s/h_{rs}$, set $l = p$; otherwise set $l = s$.

[4] (Pivot operation) Set $B \leftarrow (B \setminus \{k\}) \cup \{l\}$. If $l \in Q$, then $Q \leftarrow Q \setminus \{l\}$ and $R \leftarrow R \cup \{k\}$. If $l \in R$, then $Q \leftarrow Q$ and $R \leftarrow (R \setminus \{l\}) \cup \{k\}$. Let $y$ be the boundary point at which $d$ exits the feasible region and $\bar x$ the new basic solution. Set $d \leftarrow y - \bar x$, $x \leftarrow \bar x$, compute $d_0 = \sum_{j\in Q} z_j d_j$ and go to Step [1].


Example 1 We further illustrate the algorithm by applying it to the following linear problem:

$$\begin{aligned} \min\ z = {}& -2x_1 - 3x_2 + x_3 + 4x_4\\ \text{s.t.}\ & x_1 + 4x_2 + 2x_3 + x_4 + x_5 = 8\\ & x_1 + 3x_2 - 4x_3 - x_4 + x_6 = 8\\ & x_1 + 3x_2 + 2x_3 - x_4 + x_7 = 10\\ & x_1, x_2, x_3, x_4, x_5, x_6, x_7 \ge 0. \end{aligned}$$

In Step 0 we set $B = [5, 6, 7]$. It is easily seen that $Q = [1, 2]$ and $R = [3, 4]$. Also, $x_B^T = (x_5, x_6, x_7) = (8, 8, 10)$. We set $d_3 = d_4 = 0$ and $d_1 = d_2 = 1$. Then we have $d_B^T = (d_5, d_6, d_7) = (-1-4, -1-3, -1-3) = (-5, -4, -4)$. Finally, we have $z_1 = -2$, $z_2 = -3$, $z_3 = 1$, $z_4 = 4$, and $z_5 = z_6 = z_7 = 0$.

The algorithm does not stop in Step 1 because $Q \ne \emptyset$. As $d_B \not\ge 0$, we can choose the leaving variable:

$$\min\left\{\frac{x_5}{-d_5}, \frac{x_6}{-d_6}, \frac{x_7}{-d_7}\right\} = \min\left\{\frac{8}{5}, \frac{8}{4}, \frac{10}{4}\right\} = \frac{8}{5}.$$

Hence, $r = 1$ and $k = 5$ ($B[1] = 5$). Variable $x_5$ is leaving. For the choice of the entering variable we first compute

$$\frac{z_p}{h_{1p}} = \min\left\{\frac{z_1}{h_{11}}, \frac{z_2}{h_{12}}\right\} = \min\left\{\frac{-2}{-1}, \frac{-3}{-4}\right\} = \frac{3}{4} = \frac{z_2}{h_{12}}.$$

There is no $j \in R$ such that $h_{1j} > 0$. Hence, $l = 2 \in Q$. The new sets $B$, $Q$ and $R$ are $B = [2, 6, 7]$, $Q = [1]$ and $R = [3, 4, 5]$. The point $y$ at which the improving direction exits the feasible region is given by $y = x + td$, where $t = x_5/(-d_5) = 8/5$. Hence,

$$y^T = (0, 0, 0, 0, 8, 8, 10) + \tfrac{8}{5}(1, 1, 0, 0, -5, -4, -4) = \left(\tfrac{8}{5}, \tfrac{8}{5}, 0, 0, 0, \tfrac{8}{5}, \tfrac{18}{5}\right).$$

It is easily verified that the new basic solution is $\bar x^T = (0, 2, 0, 0, 0, 2, 4)$. The new direction is

$$\bar d^T = (y - \bar x)^T = \left(\tfrac{8}{5}, -\tfrac{2}{5}, 0, 0, 0, -\tfrac{2}{5}, -\tfrac{2}{5}\right).$$

Observe that $c^T\bar d < 0$, $\bar d_Q > 0$ and $\bar d_R = 0$.

The algorithm is initialized with a basic feasible so- 
lution. As a result a two phase or a big-M method very 
similar to that of the primal simplex algorithm can be 
used to solve general linear problems. 
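The formal Steps [0]-[4] can be traced with the following minimal Python sketch. It is our own illustration, not the author's implementation. It assumes dense data, exact `fractions.Fraction` arithmetic, a basis inverse recomputed from scratch in every iteration, 0-based indices, a feasible starting basis supplied by the caller, and ties in the ratio tests broken toward the lowest index. The names `inverse` and `exterior_point` are hypothetical.

```python
from fractions import Fraction as F


def inverse(M):
    """Gauss-Jordan inverse of a small nonsingular matrix in exact arithmetic."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def exterior_point(A, b, c, basis, max_iter=100):
    """Steps [0]-[4]; `basis` holds 0-based column indices of a feasible basis.
    Returns (status, x, trace) with trace = [(basis after pivot, boundary point y)]."""
    m, n = len(A), len(A[0])
    basis, d, trace = list(basis), None, []
    for _ in range(max_iter):
        Binv = inverse([[A[i][j] for j in basis] for i in range(m)])
        xB = [sum(Binv[i][t] * b[t] for t in range(m)) for i in range(m)]
        x = [F(0)] * n
        for i, j in enumerate(basis):
            x[j] = xB[i]
        nonbasic = [j for j in range(n) if j not in basis]
        # h_j = -B^{-1} a_j; z_j = c_j - c_B^T B^{-1} a_j = c_j + c_B^T h_j
        h = {j: [-sum(Binv[i][t] * A[t][j] for t in range(m)) for i in range(m)]
             for j in nonbasic}
        z = {j: c[j] + sum(c[basis[i]] * h[j][i] for i in range(m)) for j in nonbasic}
        if d is None:  # Step [0]: d_Q = 1, d_R = 0, d_B = sum of h_j over Q
            Q = {j for j in nonbasic if z[j] < 0}
            R = set(nonbasic) - Q
            d = [F(0)] * n
            for j in Q:
                d[j] = F(1)
                for i, bi in enumerate(basis):
                    d[bi] += h[j][i]
        else:  # new direction d = y - x from the previous boundary point
            d = [yi - xi for yi, xi in zip(y, x)]
        if not Q:  # Step [1]
            return "optimal", x, trace
        dB = [d[j] for j in basis]
        if all(v >= 0 for v in dB):  # Step [2]
            return "unbounded", None, trace
        r = min((i for i in range(m) if dB[i] < 0), key=lambda i: xB[i] / -dB[i])
        k = basis[r]
        # Step [3]: ratio tests (3) and (4) on row r of H
        ratio = lambda j: z[j] / h[j][r]
        p = min(sorted(j for j in Q if h[j][r] < 0), key=ratio, default=None)
        s = min(sorted(j for j in R if h[j][r] > 0), key=ratio, default=None)
        if p is None and s is None:
            raise RuntimeError("no entering candidate; not handled by this sketch")
        l = p if s is None or (p is not None and ratio(p) <= ratio(s)) else s
        # Step [4]: boundary point y = x + t d, pivot, update Q and R
        t = xB[r] / -dB[r]
        y = [xi + t * di for xi, di in zip(x, d)]
        basis[r] = l
        if l in Q:
            Q, R = Q - {l}, R | {k}
        else:
            R = (R - {l}) | {k}
        trace.append((list(basis), y))
    raise RuntimeError("iteration limit reached")
```

On Example 1, reading the first constraint as $x_1 + 4x_2 + 2x_3 + x_4 + x_5 = 8$ (the $x_4$ coefficient does not affect this run) and starting from `basis=[4, 5, 6]`, the first recorded pivot reproduces $B = [2, 6, 7]$ and $y = (8/5, 8/5, 0, 0, 0, 8/5, 18/5)$ from the text. Under the lowest-index tie-break, the next pivot exchanges $x_2$ for $x_1$, and the sketch stops at the optimal solution $x = (8, 0, 0, 0, 0, 0, 2)$ with $z = -16$.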


Algorithm Justification 


In this section we show that the algorithm correctly solves linear problems. We also show that the algorithm is finite under the usual nondegeneracy assumptions.


Lemma 2 If the algorithm stops at Step 1, then the last basic solution is optimal.

Proof The proof is by induction on the number of iterations. Assume that $z_j \ge 0$ for $j \in R$. It is easily seen that this induction hypothesis is satisfied by the initial basis. Let $\bar B$ be the next basis, $\bar R$ the updated set $R$ and $\bar z_j$ the corresponding reduced costs. Then we have

$$\bar z_j = z_j - \frac{z_l}{h_{rl}}h_{rj} \ge 0, \quad j \ne k,\ j \in \bar R. \quad (6)$$

Because of the choice of the entering variable $x_l$ we have $z_l/h_{rl} \ge 0$. If $h_{rj} \le 0$, then $\bar z_j \ge z_j \ge 0$. If $h_{rj} > 0$, (6) is equivalent to

$$\frac{z_j}{h_{rj}} \ge \frac{z_l}{h_{rl}},$$

which holds because of the choice of the entering variable. If $j = k$, then $\bar z_k = z_l/h_{rl} \ge 0$. Hence, $\bar z_j \ge 0$ for $j \in \bar R$.

Denote now by $B$ the basis just before the last basis $\bar B$. From the stopping condition $\bar Q = \emptyset$, we conclude that $Q = \{l\}$. A simple induction on the number of iterations shows that $d_R = 0$ and $d_l > 0$. Let $\tilde d$ be the direction corresponding to the increase of the entering variable $x_l$, $l \in Q$. Then $d = \lambda\tilde d$, where $\lambda > 0$. Let $y$ be the boundary point at which $d$ exits the feasible region. It is easily concluded that $\bar x = y \ge 0$. Hence, the last basic solution $\bar x$ is feasible. By the well-known duality theorem we conclude that $\bar x$ is optimal.
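The update rule (6), together with $\bar z_k = z_l/h_{rl}$, can be checked on the first pivot of Example 1 (entering $x_2$, leaving $x_5$). The sketch below is our own check. `update_reduced_costs` is a hypothetical name, and the coefficient of $x_4$ in the first constraint is taken as 1 (our reading of the example). It compares the updated values against reduced costs recomputed from the simplex multipliers $\pi^T = c_B^TB^{-1} = (-3/4, 0, 0)$ of the new basis $(a_2, a_6, a_7)$.

```python
from fractions import Fraction as F


def update_reduced_costs(z, h_row, l, k):
    """Rule (6): zbar_j = z_j - (z_l / h_rl) h_rj for j != l, plus zbar_k = z_l / h_rl."""
    ratio = z[l] / h_row[l]
    new = {j: z[j] - ratio * h_row[j] for j in z if j != l}
    new[k] = ratio
    return new


# Example 1, first pivot: B = (a5, a6, a7) = I, so z_j = c_j and h_1j = -a_1j.
A = {1: (1, 1, 1), 2: (4, 3, 3), 3: (2, -4, 2), 4: (1, -1, -1), 5: (1, 0, 0)}
c = {1: -2, 2: -3, 3: 1, 4: 4, 5: 0}
z = {j: F(c[j]) for j in (1, 2, 3, 4)}
h_row = {j: -F(A[j][0]) for j in (1, 2, 3, 4)}
zbar = update_reduced_costs(z, h_row, l=2, k=5)

# Direct recomputation z_j = c_j - pi^T a_j for the new basis (a2, a6, a7).
pi = (F(-3, 4), F(0), F(0))
direct = {j: c[j] - sum(p * a for p, a in zip(pi, A[j])) for j in (1, 3, 4, 5)}
```

Both routes give $\bar z_1 = -5/4$, $\bar z_3 = 5/2$, $\bar z_4 = 19/4$ and $\bar z_5 = 3/4$. This confirms that $R = \{3, 4, 5\}$ keeps nonnegative reduced costs after the pivot, as the proof asserts.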


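Lemma 3 If the algorithm stops at Step 2, problem (P1) is unbounded.

Proof From $d = y - x$ we conclude that $Ad = 0$. From $d_R = 0$ we have

$$c^Td = \sum_{j\in Q} z_j d_j.$$

From relation (6) and a simple induction we conclude that $z_j < 0$ for $j \in Q$. As $d_j > 0$ for $j \in Q$, we have $c^Td < 0$. Since, moreover, $d_B \ge 0$ and $d_N \ge 0$, all the points $y + td$, $t \ge 0$, are feasible, while $c^T(y + td) \to -\infty$. This concludes the proof.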


We show finiteness of the algorithm under the assumption that the dual problem of (P1) is nondegenerate, i.e., $z_j \ne 0$ for $j \in N$. We also assume that the initial basic feasible solution $x$ is nondegenerate.


Theorem 4 The algorithm solves problem (P1) correctly after a finite number of iterations.

Proof The correctness of the algorithm is an immediate consequence of Lemmas 2 and 3. Let $B$ be the current basis, $x$ the corresponding basic solution, and $y$ the boundary point obtained in the previous iteration, so that the current direction is $d = y - x$. Then $x + td \ge 0$ for $1 \le t \le x_{B[r]}/(-d_{B[r]})$; indeed, for $t = 1$ we have $x + d = y \ge 0$. We conclude that $x_{B[r]} > 0$. From the nondegeneracy assumption we have $z_l \ne 0$. Hence, the objective function decreases strictly from iteration to iteration, the decrease being $(-z_l/h_{rl})\,x_{B[r]} < 0$. Therefore no basis can be repeated, and since there are finitely many bases, the algorithm terminates. This concludes the proof.
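On the first iteration of Example 1 ($l = 2$, $r = 1$, $x_{B[1]} = x_5 = 8$, $h_{12} = -4$), the decrease given in the proof matches the change in objective value between the two basic solutions $x^T = (0, 0, 0, 0, 8, 8, 10)$ and $\bar x^T = (0, 2, 0, 0, 0, 2, 4)$:

```latex
\left(-\frac{z_2}{h_{12}}\right) x_{5} = -\frac{-3}{-4}\cdot 8 = -6,
\qquad
c^T\bar x - c^T x = (-3 \cdot 2) - 0 = -6 .
```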


The algorithm can cycle on degenerate problems; see [12]. The primal-dual pivoting rule [5] can also cycle; see [8]. However, refinements of these algorithms employing the least index rule [4] or the lexicographic rule are cycling-free; see [3,12] and [8].


Computational Improvements 


The algorithm performs two minimum ratio tests: one for choosing the entering variable and one for choosing the leaving variable. This might lead one to conclude that no modifications of the algorithm are possible. This is not true. Consider the current basis $B$. From the previous iteration, a boundary point $y$, which is used to construct the current direction $d$, is available. In [17] several heuristics have been developed for improving the point $y$ and hence reducing the number of iterations.

The algorithm may choose the entering variable $x_l$ with $l \in R$. Consequently, a minimum ratio test involving all the nonbasic variables is necessary. We can reduce the work per iteration by choosing the entering variable only among those belonging to $Q$. This gives a variant consisting of stages, whose minimum ratio test is restricted to the set $Q$. As a result, the condition $z_j \ge 0$, $j \in R$, is no longer guaranteed. However, the objective function improvement per iteration is larger.

As long as the basic solutions are exterior, the boundary points $y$ must satisfy $y_R = 0$. This restriction prevents the computation of good boundary points. One way to cure this computational deficiency is to force the basic solutions to be feasible for the dual problem of (P1). In this way a new primal-dual algorithm is constructed, which can be seen as a generalization of the primal-dual pivoting rule discussed in [5].

Pivoting algorithms that generate two paths seem to be more efficient than the classical simplex algorithm. Preliminary computational results reported in [1,7] and [5] support this belief. It is worth mentioning that the algorithm is up to 10 times faster than the simplex algorithm employing the maximum coefficient rule on some specially structured linear problems of rather small size ($n < 1000$), see [2]. In all the computational results we are aware of, the superiority of two-path pivoting algorithms over the simplex method increases with problem size. This fact is very encouraging for the new algorithms. Certainly, more computational results, particularly on benchmark problems, are required to draw firmer conclusions. A computational study of this type is now under way [16].


See also 
Introduction


Since there has been tremendous progress in planning in the process industry during the last 25 years, it seems worthwhile to give an overview of the current state of the art of planning problems in the process industry. This is the purpose of the current contribution, which has the following structure. We start with some conceptual thoughts and some comments on special features of planning in the process industry. What is said in this article applies to the chemical industry, but also to the pharmaceutical and food industries. The reader will find an orientation on production planning, strategic and design planning, planning under uncertainty, and multiobjective planning. In Sect. "Model Features in Planning Problems" the focus is on the features one would expect in a process industry planning model. Sects. "Planning Under Uncertainty" and "Multi-Criteria Planning Problems" address planning under uncertainty and multi-criteria planning.

A Definition of Planning 


A definition of the term "planning" leads to a group of related terms such as "strategic planning," "design planning," "master planning," "operative planning," and "production planning." Planning also needs to be distinguished from scheduling.

A starting point could be the definition of production planning given by Pochet and Wolsey (p. 3 in [21]): Production planning is defined as the planning of the acquisition of the resources and raw materials, as well as the planning of the production activities, required to transform raw materials into finished products meeting customer demand in the most efficient or economical way. Note that this does not say anything about the length of the time horizon. Their definition of supply chain planning is similar to that of production planning, but extends its scope by considering and integrating procurement and distribution decisions. They distinguish supply chain design problems, which cover a longer time horizon and include additional decisions such as the selection of suppliers, the location of production facilities, and the design of the distribution system.

In this article we use planning for any type of strategic, design, or operative planning. We always assume that we are dealing with multisite production networks. Operative planning includes production planning within multisite production networks and scheduling of individual sites. In production planning the focus is rather on optimizing the trade-off between economic objectives, such as cost minimization or maximization of contribution, and the less tangible objective of customer satisfaction, whereas in scheduling, due dates, makespan, or machine utilization become more relevant. So, instead of the term "production planning" we use the term "operative planning" (or just "planning"), with the aim of supporting decisions that have an operative impact on a time scale of several months, possibly up to a year. Planning involves the determination of operational plans that support different short- or mid-term objectives for the current business and for a given multisite topology. Planning covers a horizon from a few months to 12 months, can be extended to cover years (when it comes to strategic or design planning), and uses time-discrete models. If the time horizon becomes shorter, we are in the realm of scheduling, where time-continuous models become more efficient. When we extend the time horizon, we are dealing with strategic planning or design planning covering 1 year up to 20 years. Design planning includes those parts of the Pochet-Wolsey definition given earlier that allow, beyond the topology, also for the design of production units or the capacity of warehouses. Strategic planning is more concerned with product and customer portfolio optimization, but also with the acquisition of whole production sites. Kallrath [16] elaborates further on the concept of operative planning and the design planning problem, and promotes the idea of combining both in one single model.


Special Planning Features in the Process Industry 


In the process industry continuous and batch produc- 
tion systems can be distinguished. There exists also 
semibatch production, which combines features from 
both. Plants producing only a limited number of prod- 
ucts each in relatively high volume typically use spe- 
cial purpose equipment allowing a continuous flow of 
materials in long campaigns, i.e., there is a continuous 
stream of input and output products with no clearly 
defined start or end time. Alternatively, small quanti- 
ties of a large number of products are preferably pro- 
duced using multipurpose equipment which is oper- 
ated in batch mode. Batch production is characterized 
by well-defined start-ups, e. g., filling in some products 
and follow-up steps defined by specific tasks for heat- 
ing, mixing, and reaction, and a clearly defined end for 
extracting the finished product. Batch production in- 
volves an integer number of batches, where a batch is 
the smallest quantity to be produced; the batch size may 
also vary between a lower and an upper bound. Sev- 
eral batches of the same product following each other 
immediately establish a campaign. Production may be 
subject to certain constraints, e. g., campaigns are built 
up by a discrete number of batches, or a minimal cam- 
paign length (or minimal production quantity) has to 
be observed. Within a fixed planning horizon, a certain 
product can be produced in several campaigns; this im- 
plies that campaigns have to be modeled as individual 
entities. One might argue that details of the batch
production would rather be found in a scheduling model
than in a planning model. However, the model provided in
Kallrath and Maindl (Chap. 8 in [17]) is a clear example
where batch and campaign features have been incorporated
into a time-discrete planning model, enhancing it by
some continuous-time aspects. This problem
was solved first by Kallrath [10]. An elegant and
numerically more efficient formulation to add time
continuity to discrete-time models was developed more
recently by Sürie [24]. However, it seems that this
formulation still needs to be extended to support multistage
production.
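To make the batch and campaign rules above concrete, here is a minimal Python sketch that checks a given production profile after the fact. It assumes that every batch size may be chosen freely in [b_min, b_max], that a campaign is a maximal run of consecutive periods with positive output, and that each campaign must reach a minimal quantity; the function names and parameters are hypothetical. A planning model would impose these rules through integer variables and constraints rather than verify them afterwards, so this is not the formulation of the works cited above.

```python
import math

def batches_needed(quantity, b_min, b_max):
    """Smallest n with n*b_min <= quantity <= n*b_max, or None if none exists.

    Assumes every batch size may be chosen freely in [b_min, b_max].
    """
    if quantity == 0:
        return 0
    n = math.ceil(quantity / b_max)   # fewer batches cannot reach the quantity
    # If n minimum-size batches already exceed the quantity, so does any larger n.
    return n if n * b_min <= quantity else None

def campaigns(profile):
    """Maximal runs of consecutive periods with positive output.

    Returns a list of (first_period, last_period, total_quantity).
    """
    runs, start = [], None
    for t, q in enumerate(profile):
        if q > 0 and start is None:
            start = t
        elif q <= 0 and start is not None:
            runs.append((start, t - 1, sum(profile[start:t])))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1, sum(profile[start:])))
    return runs

def check_campaigns(profile, b_min, b_max, min_campaign):
    """Per campaign: (first, last, total, number of batches or None, admissible?)."""
    report = []
    for first, last, total in campaigns(profile):
        n = batches_needed(total, b_min, b_max)
        report.append((first, last, total, n, n is not None and total >= min_campaign))
    return report
```

For example, `check_campaigns([0, 12, 13, 0, 5], b_min=8, b_max=10, min_campaign=20)` accepts the first campaign (periods 1 to 2, 25 units, 3 batches) and rejects the second one (5 units), which is below both the minimal campaign quantity and one minimum-size batch.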

Chemical products made on different production
equipment may perform differently when used. Therefore,
customers might require that a product is always produced
on one particular machine, or at least always on the same
machine. This feature is called origin tracing and is
treated in Kallrath [15]. Certain performance chemicals
or goods in the food industry have a limited shelf-life
and are subject to an expiration date, or can only
be used after a certain aging time. Tracing these time
stamps requires that individual storage units are
considered, e. g., containers or drums, which carry the time
stamp or the remaining shelf-life. A model formulation
is provided in Kallrath [15].
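The time-stamp idea can be sketched as a small data structure. Each storage unit (here a hypothetical `Drum` record) carries the period in which it was filled, its shelf-life, and an optional minimum aging time; at a given period, units are classified as usable, still aging, or expired. The field names and the convention that the content is usable while its remaining life is positive are assumptions for illustration, not the model formulation in Kallrath [15].

```python
from dataclasses import dataclass

@dataclass
class Drum:
    """Hypothetical storage unit that carries its own time stamp."""
    product: str
    quantity: float
    filled_in: int       # period in which the drum was filled
    shelf_life: int      # number of periods the content stays usable
    min_age: int = 0     # aging time required before the content may be used

    def age(self, period: int) -> int:
        return period - self.filled_in

    def remaining_life(self, period: int) -> int:
        return self.shelf_life - self.age(period)

def classify(drums, period):
    """Split drums into (usable, still aging, expired) at the given period."""
    usable, aging, expired = [], [], []
    for drum in drums:
        if drum.remaining_life(period) <= 0:
            expired.append(drum)      # candidate for disposal cost
        elif drum.age(period) < drum.min_age:
            aging.append(drum)        # not yet allowed to be used
        else:
            usable.append(drum)
    return usable, aging, expired
```

In a planning model the same information would appear as an additional age or time-stamp index on inventory variables, so that constraints such as maximum shelf-life or disposal costs can refer to it.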

Another special feature in the refinery or petro- 
chemical industry or process industry in general is the 
pooling problem (see, for instance, [1], or Chap. 11 
in [18]). This is an almost classic problem in non- 
linear optimization. It is also known as the fuel mix- 
ture problem in the refinery industry but it also 
occurs in blending problems in the food industry. 
The pooling problem refers to the intrinsic nonlin- 
ear problem of forcing the same (unknown) frac- 
tional composition of multicomponent streams emerg- 
ing from a pool, e.g., a tank or a splitter in a mass- 
flow network. Structurally, this problem contains in- 
definite bilinear terms (products of variables) ap- 
pearing in equality constraints, e.g., mass balances. 
The pooling problem occurs in all multicompo- 
nent network flow problems in which the conserva- 
tion of both mass flow and composition is required 
and both the flow and composition quantities are 
variable. 
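The bilinear structure can be seen in a small numerical sketch of one pool, assuming a single quality attribute (e.g., a sulfur fraction) and given flows. All outgoing streams carry the same pool quality q_pool, so the component balance contains the products q_pool · y_j. In an optimization model both q_pool and the outflows y_j are variables, which is what makes these equality constraints bilinear and nonconvex. Function names and data are hypothetical.

```python
def pool_quality(inflows, qualities):
    """Flow-weighted quality of the pool contents (shared by all outlets)."""
    total = sum(inflows)
    if total <= 0:
        raise ValueError("the pool needs a positive total inflow")
    return sum(f * q for f, q in zip(inflows, qualities)) / total

def pool_residuals(inflows, qualities, outflows, q_pool):
    """Residuals of the balances around one pool.

    mass:      sum_i F_i - sum_j y_j                  = 0   (linear)
    component: sum_i F_i * q_i - sum_j q_pool * y_j   = 0   (bilinear in q_pool, y_j)
    """
    mass = sum(inflows) - sum(outflows)
    component = (sum(f * q for f, q in zip(inflows, qualities))
                 - sum(q_pool * y for y in outflows))
    return mass, component
```

With feeds of 1.0 and 3.0 units at qualities 0.03 and 0.01, the pool quality is 0.015 and both residuals vanish for outflows 2.5 and 1.5; forcing q_pool = 0.02 instead leaves a component residual of about -0.02.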

Nonlinear programming (NLP) models have been 
used by the refining, chemical, and other process in- 
dustries for many years. These nonlinear problems are
nonconvex and are either approximated by linear models
which can be solved by linear programming (LP) or
by a sequence of linear models. This sequential LP
technique is well established in the refinery
industry but suffers from the drawback of yielding only
locally optimal solutions. Although many users may
identify obviously suboptimal solutions from experience,
nonobvious suboptimal solutions cannot be recognized
without truly globally optimal solutions for comparison.
Recent advances in optimization algorithms
have yielded experimental academic codes which do
find globally optimal solutions to large scale pooling
NLP models [20]. Nonconvex nonlinear models are not
restricted to the oil refining and petrochemical sector,
but arise in logistics, network design, energy, environment,
and waste management as well as in finance, and
their solution requires global optimization.


Some Comments on Planning and Scheduling 
in the Process Industry 


Planning and scheduling are part of company-wide
logistics and supply chain management. Planning and
scheduling are often treated as separate approaches to
avoid mathematical complexity. Depending on the level
of detail required, the borderlines between planning
and scheduling are diffuse. There could be strong
overlaps between scheduling and planning in production,
distribution, or supply chain management and strategic
planning. The main structural elements of planning
and scheduling in the process industry are:
• Multipurpose (multi-product, multi-mode) reactors,
• Sequence-dependent set-up times and cleaning cost,
• Combined divergent, convergent, and cyclic material flows,
• Non-preemptive processes (no interruption), buffer times,
• Multistage, batch, and campaign production using
shared intermediates,
• Multicomponent flow and nonlinear blending,
• Finite intermediate storage, dedicated, and variable tanks.
Structurally, in scheduling these features often lead to
allocation and sequencing problems, knapsack structures,
or to the pooling problem. Although the horizon of
scheduling problems is usually only days to a few weeks,
time-discrete models lead to too many binary variables.
Thus, time-continuous formulations are preferable; see
Janak et al. [8] or the reviews by Floudas and Lin [4] or
Floudas [2]. The largest scheduling problem using
a continuous-time approach was solved by
Janak et al. [6,7]. It includes over 80 pieces of equipment,
considers the processing recipes of hundreds of
different products, and leads to a mixed integer linear
programming (MILP) problem with up to 463,025
constraints, 55,531 variables, among them 8,981 binary
variables, and 1,472,365 nonzeros.

In production or supply chain planning, we usually
consider material flow and balance equations connecting
sources and sinks of a supply network, avoiding some
of the complicating details of scheduling.
Time-indexed models using a relatively coarse discretization
of time, e. g., years, quarters, months, or weeks,
are usually accurate enough. LP, MILP, and mixed integer
nonlinear programming (MINLP) technologies
are often appropriate and successful for problems with
a clear quantitative objective function as outlined in
the section “Model Features in Planning Problems” or
with quantitative multicriteria objectives. A planning
problem of typical size, with four sites, 800 different
products, 1,500 different combinations of product and
production plant, and 10,000 different combinations of
customer, product, package, and month, is reported in
Sect. 5.1 of Kallrath [15]. This problem leads to over
200,000 variables, 380,000 nonzero elements, 400 integer
variables, and 900 semi-continuous variables. The number
of discrete variables can usually reach a few thousand.


Model Features in Planning Problems 


In the literature and in available software packages 
we usually find discrete-time models supporting multi- 
period analysis, i.e., nearly all the data may vary over 
time and allow one to evaluate scenarios that involve 
time-dependent aspects such as seasonal demand pat- 
terns, new product introductions, and shutdown of 
production facilities for maintenance periods. These 
models include the following main structural objects 
which are represented by the corresponding indices of 
the model: 

• Locations can be production or storage sites, hosting
plants and tanks, or demand points hosting tanks.

• Facilities are typically production, wrapping, or
inventory units that are characterized by their functional
properties. Especially in the process industry
we find multistage production systems involving
units with general product-mode relationships.
Their functional properties are attributes such as
capacity, throughput rates, product recipes, yields,
minimum production utilization rates, fixed and
variable costs, or storage limitations. Facilities can
be existing or potential (for design studies). Production
facilities may be subject to batch and campaign
constraints across periods.

• Demand points may represent customers, regional
warehouse locations, or distributors who specify the
quantity of a product they request. A demand point
can also be seen as a sink of the planning model,
i.e., a point where a product leaves the system and
is not traced further. Demand may be subject to certain
constraints, e. g., satisfying a minimum quantity
of demand, observing origins of production, or always
supplying a customer from the same origin.

• Inventories may be physically fixed entities such as
tanks or warehouses but also moveable entities (e. g.,
drums, containers, boxes). They can be defined as:
1. Dedicated to a single product from one production source,
2. Dedicated to a specific product,
3. Free to accept any product from any source or origin.
We may encounter tank farms, and especially multipurpose
storage entities, i. e., variable and multiproduct tanks.

• Products may be classified as raw materials,
intermediates, finished, and salable products. A product
may have several of these attributes, and it can be
purchased from suppliers, produced, or sold. Products
are produced according to the capabilities at
the facilities and the recipes assigned; they may
establish a product group, e. g., additives. Product
requirements are based on market demand, which is
characterized by volume, selling price, package type,
time, origin, and location, or by other products in
which they are used as intermediate products.

• Suppliers or vendors may provide products for purchase
under different offering schemes. This includes the
ability to link the product supply to locations and
describe contractual pricing mechanisms
or availability. The solver may choose the optimal
supplier.

Regarding the overall business and strategic objectives,
the model needs to incorporate data describing:

• Costs, i.e., certain fixed costs, variable costs
(production, transportation, inventory, external product
purchase, energy, resources and utilities), and other
costs,

• Commercial aspects: financial aspects such as
depreciation plans, discount rates, investment plans,
foreign currency exchange rates, duties and tariffs, as
well as site-dependent taxes.

Objective functions that maximize operating cash flow or
net present value (NPV) are used to determine
the financial and operating impacts of mergers,
acquisitions, consolidation initiatives, and capital spending
programs affecting the business. In detail this may include:

1. Maximize the net profit (free design reactors; open 
and close facilities), 

2. Maximize the contribution margin for a fixed sys- 
tem of production units, 

3. Maximize the contribution margin while satisfying 
a minimum percentage of demand, 

4. Minimize the cost while satisfying full demand (al- 
low external purchase of products), 

5. Maximize total sales neglecting cost, 

6. Maximize total production for a fixed system of
production reactors,

7. Maximize total production of products for which 
demand exists, 

8. Minimize energy consumption or the usage of 
other utilities, 

9. Minimize the deviation of the usage of resources 
from their average usage, 

10. Multicriteria objectives, e. g., maximize contribu- 
tion margin and minimize total volume of trans- 
port. 

Objective functions 2-10 support different short- or
mid-term objectives for the current business. By using
different objective functions, one can create operational
plans that support strategies such as market penetration,
top-line growth, or maximization of cash flow to
support other business initiatives.

If, besides this broad structure, the focus is on
a more detailed representation of physical entities, we
find that planning models and their constraints may
involve the following features (in alphabetic order):

• Batch production (see Kallrath [10]): The quantity
of a specific product being produced in a campaign,
possibly over several periods, must be an integer
multiple of some predefined batch size.

• Buy, build, close, or sell specific production assets
(see Kallrath [12]): This feature is used in acquisition,
consolidation, and capacity planning to determine the
NPV and operational impacts of adding specific assets or
groups of assets to the network or removing them from it.

• Campaign production (see Kallrath [10]): This allows
one to impose a lower and/or an upper bound
on a contiguous production run (campaign), possibly
across periods; this feature is also known under
the name minimal runs.

• Delay cost: Penalty costs apply if customer orders
are delivered after the requested delivery date.

• Minimum production requirements: Minimum
utilization rates modeled as semicontinuous variables
have to be observed for specific production
units and/or entire production locations for each
production time period.

• Multilocations: These can be production sites, storage
sites, and demand points.

• Multipurpose production units (see Kallrath and
Wilson [18] or Chap. 8 in Kallrath and Maindl [17]):
If a unit is fixed to a certain mode, several products
are produced (with different mode-dependent
daily production rates), and, vice versa, a product
can be produced in different modes. Daily production
can be less than the capacity rates. A detailed
mode-changing production scheme may be used to
describe the cost and time required for sequence-dependent
mode changes.

• Multistage production (see Chap. 8 in Kallrath and
Maindl [17]): Free and fixed recipe structures can
be used for the production of multiple intermediate
products before the production of the final product
with convergent and divergent product flows. The
recipes may depend on the mode of the multipurpose
production unit.

• Multitime periods (see Timpe and Kallrath [25]):
Nonequidistant time period scales are possible for
commercial and production needs. For instance, demand
may be forecast weekly for the first quarter of
the year and then quarterly for the remainder of the
year.

• Nonlinear pricing for the purchase of products [12]
or utilities (energy, water, etc.) or nonlinear cost
for inventory or transportation may lead to convex
and concave structures in order to model volume
and price discount schemes for the products or services
purchased, while, in addition, contract start-up
and cancellation fees may lead to additional binary
variables.

• Order lost cost: Penalty costs are incurred if products
are not delivered as requested and promised.

• Packaging machines are optimized to increase machine
throughput and ensure that priority is given
to the most profitable products.

• Product swaps: With the objective of saving transportation
and other costs, companies often arrange
joint supply agreements called swaps. For example,
company 1, based in Europe as well as in the USA, has
a production shortage of product A in the USA and
thus purchases a defined quantity of product A in
the USA from company 2. Company 2 (also located
in the USA and Europe) has a customer in Europe
requesting product A and thus purchases a defined
quantity of product A from company 1 in Europe.
Both companies get product A where they need it
and avoid the cost of shipping the product. Without
this type of supply agreement, company 1 would
have to ship product A from its European plant to
the USA, and company 2 would have to ship product
A from its US manufacturing plant to Europe.

• Production origin tracing (see Kallrath [15]): It is
possible to define fixed, free, or unique origins for
specific demands. For example, a customer may require
that his demand is satisfied only from a specific
plant in the network, or that it is not supplied from
a set of plants, or the customer only requests to be
supplied from one unique plant during the whole
planning horizon.

• Shelf-life (see Kallrath [15]): Product aging time can
be traced. This allows for the application of constraints
such as maximum shelf-life, disposal costs
for time-expired products, and the setting of selling
prices as a function of product life.

• Transportation and logistics (see Kallrath [13]):
Transportation quantities are appropriately modeled
by the use of semicontinuous variables. This
allows minimum and maximum shipment quantities
to be defined for each combination of source location,
destination location, product, and means of transport.
Logistics involves the costs, lead times,
and constraints (minimum shipment quantities)
associated with moving intermediate and finished
products between facilities and demand points. The
means of transport may be chosen by the optimizer,
and nonlinear cost functions have to be considered
as well.
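Several of the features above (minimum production requirements, transportation) rely on semicontinuous variables: a quantity x must be either 0 or lie between a lower bound and an upper bound. Some solvers support such variables directly; otherwise the usual MILP reformulation introduces a binary δ with lower·δ ≤ x ≤ upper·δ. The Python sketch below, with hypothetical names and illustrative bounds, checks that both descriptions accept exactly the same values of x.

```python
def semicontinuous_ok(x, lower, upper, tol=1e-9):
    """Direct definition: x = 0 or lower <= x <= upper."""
    return abs(x) <= tol or lower - tol <= x <= upper + tol

def linearized_ok(x, lower, upper, tol=1e-9):
    """MILP form: some binary delta in {0, 1} satisfies lower*delta <= x <= upper*delta.

    delta = 0 forces x = 0 (no shipment / no production);
    delta = 1 enforces the minimum and maximum quantities.
    """
    return any(lower * d - tol <= x <= upper * d + tol for d in (0, 1))

# Example: a shipment is either not made at all or carries 10 to 50 units.
for x in (0, 5, 10, 40, 50, 60):
    print(x, semicontinuous_ok(x, 10, 50), linearized_ok(x, 10, 50))
```

In a model, δ is the decision "ship (or produce) at all", and it is also the natural variable to attach a fixed cost to.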
This list covers many features but may be extended
depending on the planning problem at hand. A model
supporting these features caters to the overall business
and strategic objectives. The model incorporates
data describing the variable costs for production,
transportation, inventory, external product purchase,
energy, resources, utilities, and further commercial
aspects: financial aspects such as depreciation plans,
discount rates, investment plans, foreign currency
exchange rates, duties and tariffs, as well as
site-dependent taxes. Objective functions maximizing
operating cash flow or NPV can also be used to determine
the financial and operating impacts of mergers,
acquisitions, consolidation initiatives, and capital spending
programs affecting the business. One would expect that
a planning model supports various objective functions,
among them net profit (free design reactors; open and
close facilities), contribution margin, cost, sales, total
production, or multicriteria objectives, e. g., maximize
contribution margin and minimize total transportation
volume.

A possible extension which could relatively easily be
connected to such a model consists of the customer or
product portfolio features described in Kallrath [15].


Planning Under Uncertainty 


In many instances, the data are not in a determinis- 
tic form and this naturally leads to optimization under 
uncertainty, that is, optimization problems in which at 
least some of the input data are subject to errors or un- 
certainties, or in which even some constraints hold only 
with some probability or are just soft. These uncertainties
can arise for many reasons:

1. Physical or technical parameters which are only 
known to a certain degree of accuracy. Usually, for 
such input parameters safe intervals can be specified. 

2. Process uncertainties, e. g., stochastic fluctuations in
a feed stream to a reactor or in processing times.

3. Demand and price uncertainties occur in many situ- 
ations: supply chain planning, investment planning, 
or strategic design optimization problems involving 
uncertain demand and price over a long planning 
horizon of 10-20 years. 

For planning, the third point is most relevant. It is
difficult to predict demand and prices, especially in strategic
or design planning problems where the time horizon
covers several years. Scenario-based optimization in the
sense of stochastic optimization leads to a large number
of variables. A decision maker might be more inclined to
hedge against certain risks than to find the most probable
scenario. Therefore, the robust optimization framework
developed by Lin et al. [9,19] seems to be more
appropriate. It provides (1) an explicit trade-off between
the effect of uncertainties on the objective function
of choice, (2) a unified treatment of uncertainties
in product demands, processing times, processing
rates, prices of products, and prices of raw materials,
and (3) alternative deterministic equivalent models
for a variety of representations of uncertainties
through bounded, symmetric, normal, difference
of normal, binomial, discrete, and Poisson probability
distributions.
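The hedging idea behind the simplest, bounded case can be illustrated with a short sketch. Assume a capacity constraint sum_j a_j x_j ≤ b whose coefficients (say, processing hours per batch) are only known to lie in intervals [a_j - â_j, a_j + â_j]. Guarding against the worst case then means adding â_j·|x_j| to each nominal term. This interval (Soyster-type) counterpart is a deliberate simplification chosen for illustration; the framework of Lin et al. [9,19] is richer, covering for example the probability distributions listed above, and is not reproduced here. Function names and data are hypothetical.

```python
def worst_case_lhs(nominal, deviation, x):
    """Largest value of sum_j a_j * x_j when each a_j lies in
    [nominal_j - deviation_j, nominal_j + deviation_j].

    The worst case pushes every coefficient to the end of its interval that
    increases a_j * x_j, which adds deviation_j * |x_j| to the nominal term.
    """
    return sum(a * xj + d * abs(xj) for a, d, xj in zip(nominal, deviation, x))

def robust_feasible(nominal, deviation, x, rhs):
    """True if sum_j a_j x_j <= rhs for every coefficient realization in the box."""
    return worst_case_lhs(nominal, deviation, x) <= rhs

# Hours per batch: nominal 2.0 and 3.0, uncertain by 0.2 and 0.5; plan 10 and 5 batches.
print(worst_case_lhs([2.0, 3.0], [0.2, 0.5], [10, 5]))   # 39.5 hours in the worst case
```

With 40 available hours this plan is robustly feasible; with 38 hours it still fits nominally (35 hours) but not in the worst case, which is exactly the risk a robust model would hedge against.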


Multi-Criteria Planning Problems 


In planning we may encounter the situation that there 
are conflicting objectives. Maximizing the contribution 
margin and minimizing the amount of stocked material 
might conflict. A novice might think that if the storage
costs are appropriately included in the objective function,
both objectives would go along with each other
very well. However, some promising sales could be lost
because not enough material had been stocked. Thus,
the goal to minimize the amount of stock is different
from maximizing the contribution margin. At least in
this example it might be possible to measure both goals 
in the same unit of measure, in this case a monetary 
unit. The more general situation is that we are facing 
conflicting goals which cannot even be measured on 
a common scale. 

Multiobjective optimization, also called multicriteria
optimization or vector minimization, allows one to
consider several objective functions. A simple approach
to solving such problems is to express all
objectives in terms of a common measure of goodness,
which raises the question of how to compare different
objectives on a common scale. Basically, one can distinguish
two cases. Either the search is for Pareto optimal
solutions, or the problem has to be solved for every objective
function separately.

When minimizing several objective functions si- 
multaneously the concept of Pareto optimal solutions 
turns out to be useful. A solution is said to be Pareto
optimal if no other solution exists that is at least as good
according to every objective and strictly better
according to at least one objective. When searching for
Pareto optimal solutions, the task might be to find one,
find all, or cover the extremal set.
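The Pareto definition translates directly into a dominance test. The Python sketch below filters a finite list of candidate solutions, each given as a tuple of objective values that are all minimized (a maximized objective, such as contribution margin, can simply be negated). It illustrates the definition only; it assumes the candidates are already enumerated and does not generate the Pareto set of an LP or MILP. The function names are hypothetical.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective and strictly better in at least one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def pareto_front(points):
    """Keep the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (cost, stocked amount) of four hypothetical plans, both to be minimized
plans = [(1, 5), (2, 3), (3, 4), (4, 1)]
print(pareto_front(plans))   # (3, 4) is dominated by (2, 3)
```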

A special solution approach to multiple objective 
problems is to require that all the objectives should 
come close to some target, measured each in its own 
scale. The targets we set for the objectives are called 
goals. Our overall objective can then be regarded as to 
minimize the overall deviation of our goals from their 
target levels. The solutions derived are Pareto optimal. 

Goal programming can be considered as an exten- 
sion of standard optimization problems in which tar- 
gets are specified for a set of constraints. There are 
two basic approaches for goal programming: the preemptive
(lexicographic) approach and the Archimedean
approach. In the Archimedean approach weights or
penalties are applied for not achieving targets. A linear
combination of the target violations, weighted by
penalty factors, is added to the objective function or
forms the objective function itself. We consider only the
first approach.

In preemptive goal programming, goals are ordered 
according to importance and priorities. Especially, if 
there is a ranking between incommensurate objectives 
available, this method might be useful. The goal at
priority level i is considered to be infinitely more
important than the goal at the next lower level, i + 1.
However, it may be relaxed by a certain absolute or relative
amount when optimizing for level i + 1. In a reactor design
problem we might have the following ranking: reactor
size (i = 1), safety issues (i = 2), and production output
rate (i = 3).

Here we provide an illustrative example for preemptive
(lexicographic) goal programming with two variables
x and y subject to the constraint 42x + 13y ≤ 100
as well as the trivial bounds x ≥ 0 and y ≥ 0. We are
given the following goals:

Criterion      | Objective        | Type | A/P | Δ
Goal 1 (OBJ1): | 5x + 2y - 20     | Max  | P   | 10
Goal 2 (OBJ3): | -3x + 15y - 48   | Min  | A   | 4
Goal 3 (OBJ2): | 1.5x + 21y - 3.8 | Max  | P   | 20

where the attribute A or P indicates whether the permitted
deviation is to be interpreted as an absolute value or as a
percentage. The multi-criteria LP or MILP problem is converted to
a sequence of LP or MILP problems. The basic idea is
to work down the list of goals according to the given
priorities. Thus, we start by maximizing the LP with
respect to the first goal. This gives us the objective
function value $z_1^*$. Using this value $z_1^*$ enables us to
convert goal 1 into the constraint

$$5x + 2y - 20 \;\ge\; Z_1 := z_1^* - \frac{10}{100}\,\lvert z_1^* \rvert . \qquad (1)$$

Note how we have constructed the target $Z_1$ for
this goal (P indicates that we work percentage-wise).
In the example we have three goals with the optimization
sense {max, min, max}. Two times we apply
a percentage-wise relaxation, one time an absolute one.
Solving the original problem with respect to goal 1 gives
$z_1^*$, and inequality (1) becomes

$$z_1^* = -4.615385 \;\Rightarrow\; 5x + 2y - 20 \;\ge\; -4.615385 - 0.1 \times \lvert -4.615385 \rvert = -5.076923 . \qquad (2)$$


Now we minimize with respect to goal 2, adding (2) as
an additional constraint. We obtain

$$z_2^* = 51.133603 \;\Rightarrow\; -3x + 15y - 48 \;\le\; 51.133603 + 4 . \qquad (3)$$


Similarly as for the first goal, we now convert the
second goal into the constraint (3) (here we allow an
absolute deviation of 4) and maximize according to goal
3. Finally, we get $z_3^* = 141.943995$ and the solution
x = 0.238062 and y = 6.923186. To be complete, we
could also convert the third goal into a constraint,
giving

$$1.5x + 21y - 3.8 \;\ge\; 141.943995 - 0.2 \cdot 141.943995 = 113.555196 .$$
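The lexicographic procedure of this example can be reproduced with a short Python sketch. To stay self-contained it solves each two-variable LP by enumerating the vertices of the feasible polygon, which is only valid for small, bounded two-dimensional problems; a real model would call an LP or MILP solver instead. The goals are 5x + 2y - 20 (max, 10 %), -3x + 15y - 48 (min, absolute deviation 4), and 1.5x + 21y - 3.8 (max, 20 %). Percentage relaxations are taken relative to |z*|, which is what the reported numbers require because the first optimum is negative. Names such as `lexicographic` are hypothetical.

```python
from itertools import combinations

# A constraint (a, b, c) means a*x + b*y <= c.
BASE = [(42.0, 13.0, 100.0),   # 42x + 13y <= 100
        (-1.0, 0.0, 0.0),      # x >= 0
        (0.0, -1.0, 0.0)]      # y >= 0

# Goal (p, q, r, sense, kind, delta) for the objective p*x + q*y + r.
GOALS = [(5.0, 2.0, -20.0, "max", "P", 10.0),
         (-3.0, 15.0, -48.0, "min", "A", 4.0),
         (1.5, 21.0, -3.8, "max", "P", 20.0)]

def vertices(cons, tol=1e-7):
    """Feasible intersection points of all pairs of constraint lines."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue                       # parallel lines: no vertex
        x = (c1 * b2 - c2 * b1) / det      # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in cons):
            points.append((x, y))
    return points

def lexicographic(cons, goals):
    """Optimize goals in priority order; each optimum becomes a relaxed constraint."""
    cons = list(cons)
    results = []
    for p, q, r, sense, kind, delta in goals:
        def value(pt):
            return p * pt[0] + q * pt[1] + r
        pts = vertices(cons)
        best = max(pts, key=value) if sense == "max" else min(pts, key=value)
        z = value(best)
        slack = abs(z) * delta / 100.0 if kind == "P" else delta
        if sense == "max":                 # keep value >= z - slack
            cons.append((-p, -q, r - (z - slack)))
        else:                              # keep value <= z + slack
            cons.append((p, q, z + slack - r))
        results.append((z, best))
    return results
```

Running `lexicographic(BASE, GOALS)` returns objective values of about -4.615385, 51.133603, and 141.943995, with the final point near (0.238062, 6.923186), matching the numbers given in the text.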


Note that lexicographic goal programming based on 
objective functions provides a useful technique to tackle 
multicriteria optimization problems. The great advan- 
tage is that the absolute or percentage-wise deviations 
used in lexicographic goal programming based on ob- 
jectives are easy to interpret. However, we have to keep 
in mind that the sequence of the goals influences the 
solution strongly. Therefore, the absolute or percentage 
deviations have to be chosen with care. It is very
important that the optimization problem can be solved to
exact optimality, or at least close to optimality, because
otherwise the interpretation of the permissible deviation
from targets becomes difficult if not impossible.

Goal programming offers an alternative approach
but is not without defects. The specific goal levels
selected greatly determine the answer.
Therefore, care is needed when selecting the targets. It 
is also important in which units the targets are mea- 
sured. Detailed treatment of goal programming appears 
in such books as those by Ignizio [5] and Romero [22], 
who introduce many variations on the basic idea, as 
well as in that by Schniederjans [23]. 


Solution Approaches 


Most of the planning problems in the process industry 
lead to MILP or MINLP models and contain the follow- 
ing building blocks: tracing the states of plants, modeling 
production, balance equations for material flows, trans- 
portation terms, consumption of utilities, cost terms, and 
special model features. Mode changes, start-up and can- 
cellation features, and nonlinear cost structures require 
many binary variables. Minimum utilization rates and 
transportation often require semicontinuous variables. 
Special features such as batch and campaign constraints
across periods require special constraints to implement
the concept of contiguity [10,24]. The model, however,
remains linear in all variables. Only if the pooling
problem occurs, e. g., in the refinery industry or the food
industry, are we really facing an MINLP problem. For a
review of algorithms used in LP, MILP, NLP, and MINLP
the reader is referred to [11]. State-of-the-art global
solution techniques for nonconvex nonlinear problems
were reviewed by Floudas et al. [3].

It is very convenient and saves a lot of maintenance 
work if the planning model is implemented in an al- 
gebraic modeling language. In modeling languages one 
stores the knowledge about a model. A model coded in 
a modeling language defines the problem; it usually does 
not specify how to solve it. Unlike procedural languages
such as Fortran or C, modeling languages are declarative:
they state the problem by specifying its properties.
Algebraic modeling languages [14] are a special subclass of
declarative languages, and most of them are designed
for specifying optimization problems, i.e., the model
can be written in a form which is close to the mathematical
notation. Usually they are capable of describing
problems of the form

$$\min_{x} \; f(x) \qquad (4)$$
$$\text{subject to} \quad g(x) = 0 \qquad (5)$$
$$h(x) \ge 0 , \qquad (6)$$

where $x \in X = \mathbb{R}^{n} \times \mathbb{Z}^{m}$, i.e., x comprises continuous and integer variables.

The problem is flattened, i. e., all variables and con- 
straints become essentially one-dimensional, and the 
model is written in an index-based formulation, us- 
ing algebraic expressions in a way which is close to the 
mathematical notation. Typically, the problem is de- 
clared using sets, indices, parameters, and variables. 

In a modeling language, model and model data are 
kept separately. There is a clear cut between the model 
structure and the data. Thus, many different instances 
of the same model class with varying data can be solved. 
Many systems provide an open database connectivity 
(ODBC) interface for automatic database access and an 
interface to the most widely used spreadsheet systems. 
This relieves the user from the laborious duty of search- 
ing for the relevant data every time the model is used. 
A second advantage of this concept is that during the 
development phase of the model (in the cycle) the ap- 
proach can be tested on toy problems with small artifi- 
cial data sets, and later the model can be applied with- 
out change for large scale industry-relevant instances 
with real data. 
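The separation of model and data can be illustrated outside any particular modeling language. The following is a minimal Python sketch, not the syntax of AIMMS, GAMS or any other system named above; the function `capacity_rows`, the data keys and the capacity constraint itself are hypothetical. The model refers only to index sets and parameters, so the same function produces a small toy instance during development and a larger instance later, without change.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """One flattened linear constraint: sum(coefs[v] * v) <= rhs."""
    name: str
    coefs: dict = field(default_factory=dict)
    rhs: float = 0.0


def capacity_rows(data):
    """Hypothetical constraint written once against sets and parameters:
    for every plant p and period t,
        sum_k usage[p][k] * make[p,k,t] <= capacity[p][t].
    All numbers come from `data`; the model code contains none."""
    rows = []
    for p in data["plants"]:
        for t in data["periods"]:
            coefs = {f"make[{p},{k},{t}]": data["usage"][p][k]
                     for k in data["products"]}
            rows.append(Row(f"cap[{p},{t}]", coefs, data["capacity"][p][t]))
    return rows


# Toy instance for testing the model during development.
toy = {
    "plants": ["P1"],
    "periods": [1, 2],
    "products": ["A", "B"],
    "usage": {"P1": {"A": 1.0, "B": 2.0}},
    "capacity": {"P1": {1: 10.0, 2: 12.0}},
}

# Larger instance with the same data layout; the model function is unchanged.
plants = [f"P{i}" for i in range(1, 4)]
periods = list(range(1, 13))
products = ["A", "B", "C"]
large = {
    "plants": plants,
    "periods": periods,
    "products": products,
    "usage": {p: {k: 1.0 for k in products} for p in plants},
    "capacity": {p: {t: 100.0 for t in periods} for p in plants},
}

toy_rows = capacity_rows(toy)
large_rows = capacity_rows(large)
```

In a real modeling language the data would typically be read through the database or spreadsheet interfaces mentioned above rather than written as literals.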

In an algebraic modeling language, the formulation 
of the model is independent of solver formats. Differ- 
ent solvers can be connected to the modeling language, 
and the translation of models and data to the solver for- 
mat is done automatically. This has several advantages. 
The formerly tedious and error-prone translation steps
are done by the computer, and after thorough testing
of the interface, errors are very unlikely. There is a clean
cut between the problem definition and the solution ap- 
proach, i.e., between the modeling and the numerical, 
algorithmic part. In addition, for hard problems dif- 
ferent solvers can be tried, making it more likely that 
a solution algorithm is found which produces a useful 
result. 

Modern algebraic modeling languages such as
AIMMS, GAMS, LINGO, MPL, Mosel, or OPL studio
are well suited to implementing such models (see Kallrath [14] to get a flavor of all of them), use state-of-the-art commercial solvers, e.g., XPressMP (by Dash
Optimization, http://www.dashoptimization.com) or
CPLEX (by ILOG, http://www.ilog.com), and allow
one to solve even huge MILP problems with several 
hundred thousand variables and constraints quite ef- 
ficiently. In the case of MINLP, the solution efficiency 
depends strongly on the individual problem and the 
model formulation. However, as stressed in [11] for both problem types, MILP and MINLP, it is recommended that the full mathematical structure of a problem be exploited, that appropriate reformulations of models be made, and that problem-specific valid inequalities or cuts be used. Software packages may also differ in their presolving techniques, default strategies for the branch-and-bound algorithm, cut generation within the branch-and-cut algorithm, and, last but not least, their support for diagnosing and tracing infeasibilities, which is an important issue in practice.

Great progress has been made in solving planning problems more efficiently by constructing efficient valid inequalities for certain substructures of planning problems. The well-written books by Wolsey [26] and Pochet and Wolsey [21] contain many examples. These inequalities may be added to a model a priori, and in the extreme case they would describe the complete convex hull. As an example we consider the mixed-integer inequality

$$x \le C\lambda, \quad 0 \le x \le X; \qquad x \in \mathbb{R}_0^+, \quad \lambda \in \mathbb{N}, \qquad (7)$$

which has the valid inequality

$$x \le X - G(K - \lambda), \quad \text{where } K := \left\lceil \frac{X}{C} \right\rceil \text{ and } G := X - C(K - 1). \qquad (8)$$
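A small numerical check can make the role of (8) concrete. The sketch below, assuming hypothetical data C = 4 and X = 10 (so K = 3 and G = 2), confirms that (8) removes no integer-feasible point of (7) and shows a point of the LP relaxation that (8) cuts off. It is an illustration by enumeration, not a general proof.

```python
import math


def valid_rhs(lam, C, X):
    """Right-hand side of (8): X - G*(K - lam), with K = ceil(X/C), G = X - C*(K - 1)."""
    K = math.ceil(X / C)
    G = X - C * (K - 1)
    return X - G * (K - lam)


def cuts_no_integer_point(C, X, lam_max=20):
    """For each integer lam the largest x allowed by (7) is min(C*lam, X);
    (8) removes no feasible point if that value never exceeds its right-hand side."""
    return all(min(C * lam, X) <= valid_rhs(lam, C, X) + 1e-9
               for lam in range(lam_max + 1))


# Hypothetical data: C = 4, X = 10, hence K = 3 and G = 2.
print(cuts_no_integer_point(4, 10))  # True
# The LP-relaxation point (x, lam) = (10, 2.5) satisfies x <= C*lam and x <= X,
# but (8) allows at most x = 9 there, so it is cut off.
print(valid_rhs(2.5, 4, 10))  # 9.0
```

When X is a multiple of C, G equals C and (8) reduces to x <= C*lambda, which is why the inequality only helps when X/C and K differ.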


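The more K and X/C deviate, the more useful the valid inequality (8) becomes. A special case arising often is the situation $\lambda \in \{0, 1\}$. Another example, taken from ([26], p. 129), is

$$A_1\alpha_1 + A_2\alpha_2 \le B + x; \qquad x \in \mathbb{R}_0^+, \quad \alpha_1, \alpha_2 \in \mathbb{N}, \qquad (9)$$

which for $B \notin \mathbb{N}$ leads to the valid inequality

$$\lfloor A_1 \rfloor \alpha_1 + \left( \lfloor A_2 \rfloor + \frac{f_2 - f}{1 - f} \right) \alpha_2 \le \lfloor B \rfloor + \frac{x}{1 - f}, \qquad (10)$$

where the following abbreviations are used:

$$f := B - \lfloor B \rfloor; \quad f_1 := A_1 - \lfloor A_1 \rfloor; \quad f_2 := A_2 - \lfloor A_2 \rfloor. \qquad (11)$$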
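Inequality (10) can be checked the same way. This sketch, assuming hypothetical data A1 = 6/5, A2 = 27/10 and B = 7/2 (so f1 = 1/5, f = 1/2, f2 = 7/10), enumerates small integer values of alpha1 and alpha2, uses the smallest x that (9) allows, and confirms (10) holds there. It also shows an LP-relaxation point of (9) that (10) removes. Exact `Fraction` arithmetic avoids rounding noise in the fractional parts.

```python
import math
from fractions import Fraction
from itertools import product


def sides_of_10(A1, A2, B, a1, a2, x):
    """Left- and right-hand side of inequality (10), with f and f2 as in (11)."""
    f = B - math.floor(B)
    f2 = A2 - math.floor(A2)
    lhs = math.floor(A1) * a1 + (math.floor(A2) + (f2 - f) / (1 - f)) * a2
    rhs = math.floor(B) + x / (1 - f)
    return lhs, rhs


def holds_on_integer_points(A1, A2, B, n=15):
    """Enumerate alpha1, alpha2 in {0, ..., n}. The smallest x allowed by (9) is
    max(0, A1*alpha1 + A2*alpha2 - B); the right-hand side of (10) grows with x,
    so (10) is hardest to satisfy at that x."""
    for a1, a2 in product(range(n + 1), repeat=2):
        x = max(Fraction(0), A1 * a1 + A2 * a2 - B)
        lhs, rhs = sides_of_10(A1, A2, B, a1, a2, x)
        if lhs > rhs:
            return False
    return True


# Hypothetical data: f1 = 1/5, f = 1/2, f2 = 7/10.
A1, A2, B = Fraction(6, 5), Fraction(27, 10), Fraction(7, 2)
print(holds_on_integer_points(A1, A2, B))  # True
# alpha = (0, 35/27), x = 0 satisfies (9) with equality but violates (10).
lp_lhs, lp_rhs = sides_of_10(A1, A2, B, 0, Fraction(35, 27), 0)
```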


The dynamic counterpart of valid inequalities added 
a priori to a model leads to cutting plane algorithms 
which avoid adding a large number of inequalities a pri- 
ori to the model (note this can be equivalent to finding 
the complete convex hull). Instead, only those useful in 
the vicinity of the optimal solution are added dynami- 
cally. 

With use of these techniques, for some BASF plan- 
ning problems including up to 100,000 constraints and 
up to 150,000 variables with several thousand binary 
variables, good solutions with integrality gaps below 2% 
have been achieved within 30 min on standard Pentium 
machines [11]. 


Conclusions 


Planning is strongly based on mathematical optimiza- 
tion exploiting large MILP problems. Strategic, design, 
and operative planning models including several hun- 
dred thousand variables and constraints can be solved 
efficiently using commercial algebraic modeling lan- 
guages and attached MILP solvers. These models are 
connected to company-wide databases. 
Introduction


Increased competition leads contractors and chem- 
ical companies to look for potential savings at ev- 
ery stage of the design process. Process plant lay- 
out is an important part of the design or retrofit 
of chemical plants, and involves decisions concern- 
ing the spatial allocation of equipment items and the 
required connections between them [21]. Equipment 
items are allocated to one floor (single-floor case) or 
many floors (multifloor case) considering a number 
of cost and management or engineering drivers such 
as: 

• Connectivity cost, which involves the cost of piping and other required connections between equipment items. In addition, other related network operating costs such as pumping may also be taken into account.

• Construction cost, which leads to the design of compact plants. The trade-off between the cost of occupied area (land) and height (multifloor plants) must also be considered here.

• Retrofit, i.e. fitting new equipment items within an existing plant layout.

• Operation, which involves scheduling issues (e.g. pipeless plants).

• Safety, which introduces constraints with respect to the minimum allowable distance between equipment items.

Trade-offs between connectivity, pumping, construc- 
tion, financial risk and installation of potential pro- 
tection devices are necessary. In order to resolve 
these trade-offs optimally, computer-aided methods are 
needed to support engineers in the identification of op- 
timal layouts subject to multiple design criteria. 

Traditionally, process plant layout decisions are ei- 
ther ignored or do not receive appropriate attention 
during the design or retrofit of chemical plants. Fur- 
thermore, safety aspects are usually not considered 
systematically within process layout, design and op- 
eration frameworks, thus resulting in inefficient and 
unsafe plants. It is critical that systematic and effi- 
cient computer-aided methods are developed to sup- 
port engineers in the rapid generation of alternative, 
safe, chemical process plant layouts. 

The process plant layout problem shares many sim- 
ilarities with the facility layout problem, which has been 
the center of interest for industrial engineers for several 
years (for comprehensive reviews, see [14,18,22,36]). 
Here, we focus on the chemical process plant lay- 
out, which has attracted attention within the research 
community because of the particular production, en- 
vironmental and safety considerations of the process 
industries. Such considerations affect both the objec- 
tive functions and the constraints of any optimization 
model used. 

In this chapter, a brief review of chemical plant lay- 
out research is given for the process plant layout prob- 
lem and this is followed by a detailed description of an 
optimization-based framework. Illustrative research is 
presented by integrating traditional plant layout frame- 
works with sustainability aspects. Special attention is 
given to aspects associated with land use, safety and 
plant location. Issues related to design/operation, pro- 
duction organization and pipe routing are also dis- 
cussed. Finally, other applications based on plant layout 
principles are presented. 


Approaches to Process Plant Layout 


The initial approaches to the process plant layout prob- 
lem were based on heuristics [2,35]. Although heuris- 
tic approaches may be efficient from the computational 
point of view, they do not offer any guarantee for the 
optimality of the solution obtained. Graph-theoretic 
approaches have been applied to the problem of orga- 
nizing equipment items into sections for single-floor 
plants [15]. A method to aid decisions concerning the 
assignment of equipment items to floors in multifloor 
arrangements was proposed [16], albeit with no con- 
sideration of the detailed layout within each floor. Ap- 
plication of stochastic optimization techniques to the 
process plant layout problem was demonstrated [6,7] 
by developing software tools capable of tackling larger 
problems. 

Recently, a number of mathematical program- 
ming approaches have emerged based either on land 
space discretization [11,12,34] or continuous-domain 
representation [3,9,13,23,24,30]. Most of these mod- 
els accommodate various important issues of the lay- 
out problem, such as rectangular equipment footprints, 
rectilinear distances, equipment orientation, restric- 
tions on available space, layout organization into pro- 
duction sections and some safety aspects (e.g. mini- 
mum distances between equipment items). 

The process plant layout problem is well known 
as a hard computational problem. Most literature ap- 
proaches, which are based on mathematical program- 
ming and use branch-and-bound solution procedures, 
can usually tackle flowsheets with up to 12 equip- 
ment items. It has been demonstrated that the solu- 
tion performance can greatly be improved with ad- 
ditional simple constraints to eliminate symmetrical 
layout solutions and/or tighten mathematical formu- 
lation [8,9,33,37]. Furthermore, iterative optimization- 
based procedures have recently been developed us- 
ing construction and/or improvement solution phases, 
which are suitable for larger flowsheets [17,28,40]. 

Next, a representative mathematical programming 
framework is described, based on a continuous-domain 
representation. 


A Continuous-Domain Mathematical Model 


A mathematical programming model for the optimal single-floor process plant layout problem in a two-dimensional continuous space is described here, as proposed previously by Papageorgiou and Rotstein [24]. In the formulation presented here, rectangular shapes are assumed for equipment items, following current industrial practice. Rectilinear distances between the equipment items are used for a more realistic estimate of piping costs, as either aerial or underground corridors are usually used for the piping and instrumentation network. Equipment items, which are allowed to rotate 90°, are assumed to be connected through their geometrical centers.

Plant Layout Problems and Optimization, Table 1
Notation

Parameters
$\alpha_i, \beta_i$: Dimensions of item i

Decision variables
$O_i$: 1 if the length of item i (parallel to the x axis) is equal to $\alpha_i$; 0 otherwise
$E1_{ij}, E2_{ij}$: Nonoverlapping binary variables
$l_i$: Length of item i
$d_i$: Depth of item i
$x_i, y_i$: Coordinates of the geometrical center of item i
$R_{ij}$: Relative distance in x coordinates between items i and j, if i is to the right of j
$L_{ij}$: Relative distance in x coordinates between items i and j, if i is to the left of j
$A_{ij}$: Relative distance in y coordinates between items i and j, if i is above j
$B_{ij}$: Relative distance in y coordinates between items i and j, if i is below j
$D_{ij}$: Total rectilinear distance between items i and j

Overall, the single-floor process plant layout problem can be stated as follows:
Given
• A set of N pieces of equipment and their dimensions
• Connectivity network and associated costs
• Space and equipment allocation limitations
• Minimum safety distances, if any, between equipment items
• Production sections
Determine
• The allocation of each equipment item (i.e. coordinates and orientation)
So as to optimize a suitable criterion (here, minimize equipment connection cost).


The indices, parameters and decision variables (bi- 
nary and continuous) associated with the mathematical 
model are listed in Table 1. 

Next, the main mathematical constraints of the 
single-floor process plant layout model are described. 
Equipment Orientation Constraints The values of length, $l_i$, and depth, $d_i$, of equipment item i depend on its orientation in the space and can be determined by

$$l_i = \alpha_i O_i + \beta_i (1 - O_i) \quad \forall i, \qquad (1)$$

$$d_i = \alpha_i + \beta_i - l_i \quad \forall i. \qquad (2)$$
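Equations (1) and (2) simply pick which side length lies along the x axis. A minimal sketch, with the hypothetical function name `footprint`, evaluates them for both values of the orientation binary:

```python
def footprint(alpha, beta, O):
    """Length l (along x) and depth d (along y) of an item with side lengths
    alpha and beta, for the orientation binary O, following (1) and (2)."""
    if O not in (0, 1):
        raise ValueError("orientation variable O must be 0 or 1")
    l = alpha * O + beta * (1 - O)  # (1): l = alpha if O = 1, else beta
    d = alpha + beta - l            # (2): d is the remaining side length
    return l, d


print(footprint(5, 3, 1))  # (5, 3)
print(footprint(5, 3, 0))  # (3, 5)
```

Because both expressions are linear in $O_i$, rotation by 90° enters the MILP without any additional nonlinearity.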


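Distance Constraints The relative distances between items i and j are given by the following equalities:

$$R_{ij} - L_{ij} = x_i - x_j \quad \forall (i, j): C_{ij} > 0, \qquad (3)$$
$$A_{ij} - B_{ij} = y_i - y_j \quad \forall (i, j): C_{ij} > 0, \qquad (4)$$
$$D_{ij} = R_{ij} + L_{ij} + A_{ij} + B_{ij} \quad \forall (i, j): C_{ij} > 0. \qquad (5)$$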
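For fixed center coordinates, the smallest nonnegative values satisfying (3) and (4) put the whole coordinate difference into one variable of each pair, and (5) then equals the rectilinear distance. The following sketch, with the hypothetical name `rectilinear_split`, computes that split:

```python
def rectilinear_split(xi, yi, xj, yj):
    """Nonnegative R, L, A, B satisfying (3) and (4) with at most one variable
    of each pair (R, L), (A, B) nonzero, and D from (5)."""
    R, L = max(xi - xj, 0.0), max(xj - xi, 0.0)  # R - L = xi - xj   (3)
    A, B = max(yi - yj, 0.0), max(yj - yi, 0.0)  # A - B = yi - yj   (4)
    D = R + L + A + B                            # rectilinear distance (5)
    return R, L, A, B, D


R, L, A, B, D = rectilinear_split(4.0, 1.0, 1.0, 3.0)
```

Any other nonnegative split, such as $(R_{ij} + \delta, L_{ij} + \delta)$, also satisfies (3) but increases $D_{ij}$ by $2\delta$. With $C_{ij} > 0$ in the minimized objective such splits are never preferred, which is the property noted next.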


It should be noted that the above constraints (3)-(5) are written only for those equipment pairs whose relative distance occurs in the objective function (i.e. terms with nonzero connection costs, $C_{ij} > 0$), which is minimized. As a consequence, it is guaranteed that at most one variable of each pair $(R_{ij}, L_{ij})$ and $(A_{ij}, B_{ij})$ will be nonzero at the optimal solution of each linear programming problem during the branch-and-bound solution procedure.

Plant Layout Problems and Optimization, Figure 1
Equipment nonoverlapping

Nonoverlapping Constraints In order to avoid the overlapping of the two equipment items i and j occupying the same physical location, appropriate mathematical constraints should be included in the model to prohibit overlapping of the projections of each equipment footprint in either the x or the y dimension, as clearly shown in Fig. 1. Nonoverlapping is guaranteed if at least one of the following conditions is active:


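$$x_i - x_j \ge \frac{l_i + l_j}{2} \quad \forall i = 1, \ldots, N-1, \; j = i+1, \ldots, N, \qquad (6)$$
$$x_j - x_i \ge \frac{l_i + l_j}{2} \quad \forall i = 1, \ldots, N-1, \; j = i+1, \ldots, N, \qquad (7)$$
$$y_i - y_j \ge \frac{d_i + d_j}{2} \quad \forall i = 1, \ldots, N-1, \; j = i+1, \ldots, N, \qquad (8)$$
$$y_j - y_i \ge \frac{d_i + d_j}{2} \quad \forall i = 1, \ldots, N-1, \; j = i+1, \ldots, N. \qquad (9)$$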


These nonoverlapping disjunctive conditions can be modelled mathematically by including appropriate “big M” constraints and introducing two additional sets of binary variables, $E1_{ij}$ and $E2_{ij}$. Each pair of values (0 or 1) for these variables determines which constraint from (6) to (9) is active. For every pair (i, j) such that i < j, we have:

If constraint (6) is active, then $E1_{ij} = 0$, $E2_{ij} = 0$.

If constraint (7) is active, then $E1_{ij} = 1$, $E2_{ij} = 0$.

If constraint (8) is active, then $E1_{ij} = 0$, $E2_{ij} = 1$.

If constraint (9) is active, then $E1_{ij} = 1$, $E2_{ij} = 1$.
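The disjunction and its binary encoding can be sketched directly. The hypothetical helpers below test conditions (6) to (9) for two placed items and return the $(E1_{ij}, E2_{ij})$ pair listed above for the first condition that holds; `None` signals that the footprints overlap.

```python
# Binary pair (E1_ij, E2_ij) associated with each condition, as listed in the text.
CONDITION_TO_E = {6: (0, 0), 7: (1, 0), 8: (0, 1), 9: (1, 1)}


def active_conditions(xi, yi, li, di, xj, yj, lj, dj):
    """Which of the nonoverlapping conditions (6)-(9) hold for items i < j."""
    half_l = (li + lj) / 2
    half_d = (di + dj) / 2
    checks = {
        6: xi - xj >= half_l,  # i lies entirely to the right of j
        7: xj - xi >= half_l,  # i lies entirely to the left of j
        8: yi - yj >= half_d,  # i lies entirely above j
        9: yj - yi >= half_d,  # i lies entirely below j
    }
    return [k for k, holds in checks.items() if holds]


def choose_binaries(xi, yi, li, di, xj, yj, lj, dj):
    """(E1, E2) for the first satisfied condition, or None if the items overlap."""
    conds = active_conditions(xi, yi, li, di, xj, yj, lj, dj)
    return CONDITION_TO_E[conds[0]] if conds else None
```

When the items are separated in both directions, several conditions hold at once and any of the corresponding binary pairs is acceptable; the sketch simply takes the first.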


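In summary, the nonoverlapping constraints included in the mathematical model are:

$$x_i - x_j + M(E1_{ij} + E2_{ij}) \ge \frac{l_i + l_j}{2} \quad \forall i = 1, \ldots, N-1, \; j = i+1, \ldots, N, \qquad (10)$$
$$x_j - x_i + M(1 - E1_{ij} + E2_{ij}) \ge \frac{l_i + l_j}{2} \quad \forall i = 1, \ldots, N-1, \; j = i+1, \ldots, N, \qquad (11)$$
$$y_i - y_j + M(1 + E1_{ij} - E2_{ij}) \ge \frac{d_i + d_j}{2} \quad \forall i = 1, \ldots, N-1, \; j = i+1, \ldots, N, \qquad (12)$$
$$y_j - y_i + M(2 - E1_{ij} - E2_{ij}) \ge \frac{d_i + d_j}{2} \quad \forall i = 1, \ldots, N-1, \; j = i+1, \ldots, N, \qquad (13)$$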


where M is a suitable upper bound on the dis- 
tance between two equipment items. Note that the 
above constraints can easily be modified to include 
minimum/maximum distances between specific items 
for safety and/or operational reasons (see, for example, [24]). More systematic frameworks to account for safety aspects are described in Sect. “Other Applications”.
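To see how the big-M terms switch the constraints on and off, the sketch below evaluates the slack of (10) to (13) for every choice of $(E1_{ij}, E2_{ij})$. Exactly one constraint has no M term for each choice; the others are relaxed provided M is large enough. The helper names and the value M = 100 are assumptions for illustration.

```python
from itertools import product


def big_m_slacks(xi, yi, li, di, xj, yj, lj, dj, E1, E2, M):
    """Slack (left-hand side minus right-hand side) of each big-M constraint
    (10)-(13) for a pair i < j; the pair is feasible iff all slacks are >= 0."""
    hl = (li + lj) / 2
    hd = (di + dj) / 2
    return {
        10: xi - xj + M * (E1 + E2) - hl,
        11: xj - xi + M * (1 - E1 + E2) - hl,
        12: yi - yj + M * (1 + E1 - E2) - hd,
        13: yj - yi + M * (2 - E1 - E2) - hd,
    }


def feasible_binaries(xi, yi, li, di, xj, yj, lj, dj, M):
    """All (E1, E2) pairs for which (10)-(13) hold at the given positions."""
    return [
        (E1, E2)
        for E1, E2 in product((0, 1), repeat=2)
        if min(big_m_slacks(xi, yi, li, di, xj, yj, lj, dj, E1, E2, M).values()) >= 0
    ]


# Item i to the right of item j: only (0, 0), i.e. condition (6), is consistent.
separated = feasible_binaries(10, 2, 4, 2, 3, 2, 4, 2, M=100)
# Overlapping footprints: no binary choice satisfies (10)-(13).
overlapping = feasible_binaries(4, 2, 4, 2, 3, 2, 4, 2, M=100)
```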


Additional Layout Design Constraints Lower-bound constraints on the coordinates of the geometrical center of each equipment item should be included in order to avoid intersection of items with the origin of axes, achieved by

$$x_i \ge \frac{l_i}{2} \quad \forall i, \qquad (14)$$
$$y_i \ge \frac{d_i}{2} \quad \forall i. \qquad (15)$$

Similarly, upper-bound constraints can be used to force all equipment items to be allocated within a rectangular shape of land area defined by the corners (0, 0) and $(x^{\max}, y^{\max})$:

$$x_i + \frac{l_i}{2} \le x^{\max} \quad \forall i, \qquad (16)$$
$$y_i + \frac{d_i}{2} \le y^{\max} \quad \forall i. \qquad (17)$$
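Constraints (14) to (17) together say that each footprint lies inside the rectangle $[0, x^{\max}] \times [0, y^{\max}]$. A one-function sketch (the name `within_site` is hypothetical):

```python
def within_site(x, y, l, d, xmax, ymax):
    """Constraints (14)-(17): the footprint centred at (x, y) with length l and
    depth d lies inside the rectangle with corners (0, 0) and (xmax, ymax)."""
    return (x >= l / 2 and y >= d / 2                       # (14), (15)
            and x + l / 2 <= xmax and y + d / 2 <= ymax)    # (16), (17)


print(within_site(2, 1, 4, 2, 10, 10))  # True: touches the lower-left corner
print(within_site(1, 1, 4, 2, 10, 10))  # False: crosses the y axis
```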
Objective Function The objective function considered in this model is the minimization of the total connection cost:

$$\min \sum_i \sum_{j \ne i} C_{ij} D_{ij}. \qquad (18)$$
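For a given placement, objective (18) can be evaluated directly, using the fact that at the optimum $D_{ij}$ equals the rectilinear distance between the centers. The sketch below (hypothetical name `connection_cost`) sums only over pairs with $C_{ij} > 0$, matching the pairs for which (3) to (5) are written:

```python
def connection_cost(C, coords):
    """Objective (18): sum over ordered pairs i != j with C[i][j] > 0 of
    C[i][j] * D_ij, where D_ij is the rectilinear distance between centres."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and C[i][j] > 0:
                (xi, yi), (xj, yj) = coords[i], coords[j]
                total += C[i][j] * (abs(xi - xj) + abs(yi - yj))
    return total


coords = [(0, 0), (3, 4), (6, 0)]   # hypothetical item centres
C = [[0, 2, 0],                     # hypothetical connection costs
     [0, 0, 1],
     [0, 0, 0]]
print(connection_cost(C, coords))  # 21.0 = 2*(3+4) + 1*(3+4)
```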


Other cost terms can be considered, such as land 
area, construction/building, vertical/horizontal pump- 
ing, protection devices and accident property damage. 
All continuous variables in the formulation are de- 
fined as nonnegative. Overall, the single-floor process 
plant layout problem has been formulated as a mixed- 
integer linear programming (MILP) model (con- 
straints 1-5, 10-18). This model has also been extended 
for multifloor process plant layout problems [26], com- 
bined with safety aspects [25] and integrated with de- 
sign/operation of pipeless batch plants [29]. 


Integration Aspects 


In this section, additional important aspects are dis- 
cussed with respect to their integration into traditional 
plant layout frameworks. 


Use of Land 


The cost of land is an important component that should be included in the mathematical model of the process plant layout problem. Land cost has been taken into account implicitly [30], where the surface area has been assumed to be proportional to the land occupied by equipment items and not the space around items. A stochastic optimization technique [7] includes the cost of land explicitly in the objective function. Both previous efforts [7,30] refer to single-floor layout problems.

For the multifloor plant layout case, the trade-off 
between the cost of occupied area (land) and that of 
construction (multiple floors) should be considered 
and resolved in an optimal manner. Mathematical pro- 
gramming models that have crucially addressed such is- 
sues have been developed by Georgiadis et al. [12] and 
Patsiatzis and Papageorgiou [26]. 

Sole consideration of land cost (possibly together with construction, connection and pumping costs) may result in quite a compact layout and consequently a layout solution that is not conservative from a safety point of view. Therefore, sufficient integration of safety aspects into existing frameworks is necessary. Representative research efforts of this kind are described next.


Safety 


Safety aspects should be considered during the early 
stages of the design process by using appropriate quan- 
titative indices. Such aspects are either ignored or con- 
sidered through rather simplistic terms (e. g. minimum 
distances [3,24]). The safety of a chemical plant can fur- 
ther be improved by either putting the plant equipment 
items far apart from each other or by installing protec- 
tion devices at additional capital cost, which would re- 
duce the area of exposure and consequently reduce the 
propagation of a potential accident. Various trade-offs should be considered simultaneously within the process plant layout, such as:

• Piping, pumping and land costs, which increase as the distance between equipment items increases;

• Financial risk component cost, which decreases as equipment items are placed farther apart or as extra protection devices are installed; and

• Cost of protection devices, which eliminate accident escalation.
Along these lines, a mathematical programming model-based approach was introduced by Penteado and Ciric [30], which resulted in mixed-integer
nonlinear programming (MINLP) models determining 
simultaneously the process plant layout, the number 
and the type of protection devices and the financial 
risk associated with accidents and their propagation to 
neighboring units, assuming circular equipment foot- 
prints. An alternative MINLP model has also been pro- 
posed [27] by adopting rectangular shapes and recti- 
linear distances between units. Both these studies were 
based on a risk representation related to accidents prop- 
agating from a source to a target unit utilizing the 
equivalent TNT method [19]. An evolutionary proce- 
dure was proposed by Fuchino et al. [10] for the ar- 
rangement of equipment models in the early design 
phase by considering simultaneously the position and 
the orientation of equipment modules, as both affect the 
propagation of an accident. 

Two widely used indices for risk assessment and 
safety evaluation are the Mond Index [20] and the Dow 
Fire and Explosion Index [1]. The use of the Mond 
Index was illustrated by Castell et al. [7] by adding 
a penalty term in the objective function, which mini- 
mizes any violations of Mond safety distances. User in- 
tervention is crucial in this work to ensure the feasibil- 
ity of all separation distance requirements. Finally, the 
use of the Dow Fire and Explosion Index was demon- 
strated by Patsiatzis et al. [25] by proposing an MILP 
approach to safe process plant layout. This index de- 
termines the realistic maximum loss occurring under 
the most adverse operating conditions and is applicable 
to processes where flammable, combustible or reactive 
material is stored or processed. It is based on historic 
loss data, the energy potential of the processed materi- 
als in the chemical plants and the current application of 
loss-prevention practices. 


Location Information 


Integration of location-specific information with the 
decision-making procedure for the process plant lay- 
out problem was successfully demonstrated by Ozyurt 
and Realff [23]. Such information is related to existing 
infrastructure, geographic aspects, climate conditions, 
elevation and soil characteristics. The location-specific 
information system is first used to derive conclusions 


about the plant environment, and these are then trans- 
lated into additional mathematical constraints on the 
process plant layout problem by restricting some equip- 
ment locations. 


Design and Operation 


The process plant layout decisions are traditionally 
considered after the plant design stage has been com- 
pleted. However, it is obvious that strong interactions 
exist between plant layout and plant design and there- 
fore significant benefits could be gained by develop- 
ing systematic, simultaneous approaches. Such an ap- 
proach was proposed by Barbosa-Povoa et al. [4], where 
an MILP model was shown to be particularly suitable 
for cases with equipment of rectangular and irregular 
shapes as well as flexible equipment input/output point 
locations. The latter model has been integrated with 
operating aspects and applied to multipurpose batch 
plants [5]. 

It should be noted that layout considerations are of 
particular significance for pipeless plants as they deter- 
mine the vessel transfer times, which can then affect 
the schedule of the plant. Simultaneous optimization-based approaches have also been developed for pipeless batch plants, which offer enhanced production flexibility and piping-free cleaning requirements, by considering plant design, layout and operation aspects [29,31].


Layout Organization into Production Sections 


The organization of the plant layout problem into well- 
defined production sections is often required for vari- 
ous reasons, such as safety, efficient material handling 
and workforce management. The boundaries of these 
sections are drawn by walls or corridors, which facili- 
tate the movement of materials and/or operators. 

The organization of the plant layout into sections 
can formally be incorporated into the process plant lay- 
out mathematical model presented in Sect. “A Continu- 
ous-Domain Mathematical Model”. The basic assump- 
tion is that the equipment items are partitioned into 
subsets by using either rule-based/intuitive techniques 
or algorithmic approaches (see, for example, [15]). 
Each subset of equipment items constitutes a section, and each section can then be represented by a well-defined rectangular box. New constraints must be introduced to guarantee that no overlapping occurs (1) among different sections and (2) among the footprints within each section [24]. The aim of the optimization is then to determine the location of both
production sections and equipment items within each 
section so that the total layout cost is minimized. 


Pipe Routing 


The design of the piping system in a chemical plant 
constitutes an important part of deriving an efficient 
plant layout (apart from pipeless plants) and should 
be considered as early as possible in the design pro- 
cess. A number of factors should be taken into ac- 
count, such as cost, safety, operability and struc- 
tural/mechanical aspects. A pipe-routing method was 
proposed by Schmidt-Traub et al. [32] using grid 
and vector router algorithms. Optimization-based and 
graph theory methods have also been described by Gui- 
rardello and Swaney [13], which could include capacity 
constraints, pipes with branches and stress analysis. 


Other Applications 


This section demonstrates the application of basic plant 
layout principles to two interesting problems: alloca- 
tion and data classification. 


Allocation 


Allocation problems, which include plant layout prob- 
lems, with different numbers of dimensions have been 
research topics of great activity for many years. Most 
allocation problems are large-scale combinatorial op- 
timization problems, occurring in several different in- 
dustrial applications. A general purpose MILP model 
for allocation problems in any given number of dimen- 
sions has been presented [38]. This mathematical for- 
mulation utilizes a type of nonoverlapping constraints 
similar to those presented in Sect. “A Continuous-Do- 
main Mathematical Model”. The proposed model deals 
with the allocation of items in an N-dimensional space. 
Several problems, previously presented in the litera- 
ture, have been solved using the proposed model, such 
as one-dimensional scheduling problems, two-dimen- 
sional cutting problems, as well as plant layout prob- 
lems and three-dimensional packing problems. Prob- 
lems defined in four dimensions are also presented and 
solved using the model described above. 


Data Classification 


Beyond chemical plant design, basic plant layout prin- 
ciples can also be applied to data classification problems 
as illustrated here. Data classification, a fundamental 
problem in data mining and machine learning, deals 
with the identification of patterns and the assignment of 
new samples into known groups. A rigorous mixed-in- 
teger optimization model for multiclass data classifica- 
tion problems has been proposed [39] using a hyperbox 
representation. The optimal location and dimension of 
each box is determined by minimizing the total num- 
ber of misclassifications. Special constraints are intro- 
duced to avoid overlapping of boxes that belong to dif- 
ferent classes. These constraints simply represent an 
extension of those described in Sect. “A Continuous- 
Domain Mathematical Model” to cover more than two 
dimensions. 
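The extension to more dimensions follows the same logic as (6) to (9): two axis-aligned boxes do not overlap exactly when their projections are separated along at least one axis. The sketch below, with the hypothetical name `boxes_separated`, states that test for any number of dimensions; it is only the geometric condition, not the MILP of [39], which encodes the same disjunction with binary variables.

```python
from typing import Sequence


def boxes_separated(ci: Sequence[float], wi: Sequence[float],
                    cj: Sequence[float], wj: Sequence[float]) -> bool:
    """True if the axis-aligned boxes with centres ci, cj and side lengths wi, wj
    do not overlap, i.e. their projections are separated in at least one
    dimension. In two dimensions this is 'at least one of (6)-(9) holds'."""
    return any(abs(a - b) >= (p + q) / 2
               for a, b, p, q in zip(ci, cj, wi, wj))


# Three-dimensional examples with unit-free hypothetical data.
print(boxes_separated((0, 0, 0), (2, 2, 2), (1, 1, 5), (2, 2, 2)))  # True
print(boxes_separated((0, 0, 0), (2, 2, 2), (1, 1, 1), (2, 2, 2)))  # False
```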


Concluding Remarks 


In this chapter, a comprehensive review of the chemical 
plant layout research has been presented. Systematic, 
optimization-based frameworks for plant layout have 
the potential to offer contractors and chemical compa- 
nies significant savings and increase business competi- 
tiveness. The future will show additional benefits if is- 
sues related to dynamic plant layout, uncertainty and 
more efficient solution procedures for larger problems 
are resolved. 
It is assumed that the behavior of an object at any instant of time is characterized by n real numbers $x^1, \ldots, x^n$. The vector space X of the vector variable $x = (x^1, \ldots, x^n)$ is the phase space of the object under consideration. The motion of the object consists in the fact that the variables $x^1, \ldots, x^n$ change with time. It is assumed that the object's motion can be controlled, i.e., that the object is equipped with certain controllers on whose position the motion of the object depends. The positions of the controllers are characterized by points $u = (u^1, \ldots, u^r)$ of a certain control region U, which may be any set in some r-dimensional Euclidean space. In applications, the case where U is a closed region in the space is especially important.

In the statement of the problem it is assumed that the object's law of motion can be written in the form of a system of differential equations

$$\frac{dx^i}{dt} = f_i(x, u), \quad i = 1, \ldots, n, \qquad (1)$$

or in vector form

$$\frac{dx}{dt} = f(x, u), \qquad (2)$$

where the functions $f_i$ are defined for $x \in X$ and $u \in U$. They are assumed to be continuous in the variables $x^1, \ldots, x^n, u$ and continuously differentiable with respect to $x^1, \ldots, x^n$.


If the control law is given, i.e., a certain admissible control $u = u(t)$ is chosen, (2) takes the form

$$\frac{dx}{dt} = f(x, u(t)), \qquad (3)$$

from which, for any initial condition $x(t_0) = x_0$, the motion of the object $x = x(t)$ is uniquely determined, i.e., the solution of (3) is defined for a certain time interval. The solution x(t) is called the solution of (2) corresponding to the control u(t) for the initial condition $x(t_0) = x_0$. The solution may not be defined on the entire interval $t_0 \le t \le t_1$ on which u(t) is given, as it may run off to infinity.

It is said that the admissible control u(t), $t_0 \le t \le t_1$, transfers the phase point from the position $x_0$ to the position $x_1$ if the corresponding solution x(t) of (2), satisfying the initial condition $x(t_0) = x_0$, is defined for all t, $t_0 \le t \le t_1$, and passes through the point $x_1$ at the time $t_1$.
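The definition of a transferring control can be checked numerically for a concrete system. The sketch below assumes a hypothetical example not taken from this article: the double integrator $dx^1/dt = x^2$, $dx^2/dt = u$ with U = [-1, 1], and the piecewise constant control u = +1 on [0, 1), u = -1 on [1, 2], which should transfer (0, 0) to (1, 0) at $t_1 = 2$. Integration uses classical fourth-order Runge-Kutta; the control is evaluated at the midpoint of each step, so when the switching time falls on a step boundary every step sees the correct constant control.

```python
def simulate(f, u, x0, t0, t1, steps):
    """Integrate dx/dt = f(x, u(t)) from x(t0) = x0 to t1 with classical RK4,
    holding the control at its value at the midpoint of each step."""
    h = (t1 - t0) / steps
    x = list(x0)
    for k in range(steps):
        uk = u(t0 + (k + 0.5) * h)
        k1 = f(x, uk)
        k2 = f([a + h / 2 * b for a, b in zip(x, k1)], uk)
        k3 = f([a + h / 2 * b for a, b in zip(x, k2)], uk)
        k4 = f([a + h * b for a, b in zip(x, k3)], uk)
        x = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x


def transfers(f, u, x0, x1, t0, t1, steps=200, tol=1e-6):
    """Numerical test of 'u transfers the phase point from x0 to x1 at time t1'."""
    xt = simulate(f, u, x0, t0, t1, steps)
    return all(abs(a - b) <= tol for a, b in zip(xt, x1))


def double_integrator(x, u):
    """Hypothetical example system: dx1/dt = x2, dx2/dt = u."""
    return [x[1], u]


def bang_bang(t):
    """Hypothetical admissible control: +1 on [0, 1), -1 afterwards."""
    return 1.0 if t < 1.0 else -1.0
```

A fixed-step integrator cannot detect the blow-up mentioned above, where the solution is not defined on the whole interval; the sketch assumes the solution exists on $[t_0, t_1]$.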

It is supposed that an additional function $f_0(x, u)$, which is defined and continuous together with its partial derivatives $\partial f_0 / \partial x^i$, $i = 1, \ldots, n$, on all of $X \times U$, is given. The fundamental problem of finding the optimal controls is formulated as follows [1].

In the phase space X two points $x_0$ and $x_1$ are given. Among all the admissible controls u = u(t) which transfer the phase point from the position $x_0$ to the position $x_1$ (if such controls exist), find one for which the functional

$$J = \int_{t_0}^{t_1} f_0(x(t), u(t)) \, dt \qquad (4)$$

takes on the least possible value. Here x(t) is the solution of (2) with initial condition $x(t_0) = x_0$ corresponding to the control u(t), and $t_1$ is the time at which this solution passes through $x_1$. The control u(t) which yields the solution of the problem is called an optimal control corresponding to a transition from $x_0$ to $x_1$. The corresponding trajectory x(t) is called an optimal trajectory.

An important special case of the fundamental problem is the one where $f_0(x, u) \equiv 1$. In this case, the functional (4) takes the form

$$J = t_1 - t_0,$$

and the optimality of the control u(t) signifies minimality of the transition time from $x_0$ to $x_1$. The problem of finding the optimal controls and trajectories in this case is called the time-optimal problem.

In order to formulate the necessary optimality condition it is convenient to adjoin a new coordinate $x^0$ to the phase coordinates $x^1, \ldots, x^n$, which vary according to (1). Let $x^0$ vary according to the law

$$\frac{dx^0}{dt} = f_0(x^1, \ldots, x^n, u),$$

and consider the system of differential equations

$$\frac{dx^i}{dt} = f_i(x, u), \quad i = 0, 1, \ldots, n, \qquad (5)$$

whose right-hand sides do not depend on $x^0$. Introducing the vector

$$\mathbf{x} = (x^0, x^1, \ldots, x^n)$$

in the (n + 1)-dimensional vector space $\mathbf{X}$, system (5) may be rewritten in vector form

$$\frac{d\mathbf{x}}{dt} = \mathbf{f}(x, u),$$

where $\mathbf{f}(x, u)$ is the vector in $\mathbf{X}$ with coordinates $f_0(x, u), \ldots, f_n(x, u)$. Note that $\mathbf{f}(x, u)$ does not depend on the coordinate $x^0$ of the vector $\mathbf{x}$.
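The extra coordinate $x^0$ turns the functional (4) into a state: if $x^0(t_0) = 0$, then $x^0(t_1) = J$. The sketch below illustrates this for the same hypothetical double integrator and bang-bang control as before (assumptions, not part of this article). With $f_0 \equiv 1$ it returns the transfer time $t_1 - t_0$ of the time-optimal problem; with $f_0 = x^2$ it returns the displacement $x^1(t_1) - x^1(t_0)$.

```python
def augmented_rhs(f0, f):
    """Right-hand side of system (5) for the state xbar = (x0, x1, ..., xn):
    dx0/dt = f0(x, u) and dx/dt = f(x, u); nothing depends on x0."""
    def F(xbar, u):
        x = xbar[1:]
        return [f0(x, u)] + list(f(x, u))
    return F


def functional_value(f0, f, u, x0, t0, t1, steps=200):
    """J of (4), read off as x0(t1) after integrating (5) from x0(t0) = 0 with
    classical RK4 (control held at its value at each step midpoint)."""
    F = augmented_rhs(f0, f)
    h = (t1 - t0) / steps
    xbar = [0.0] + list(x0)
    for k in range(steps):
        uk = u(t0 + (k + 0.5) * h)
        k1 = F(xbar, uk)
        k2 = F([a + h / 2 * b for a, b in zip(xbar, k1)], uk)
        k3 = F([a + h / 2 * b for a, b in zip(xbar, k2)], uk)
        k4 = F([a + h * b for a, b in zip(xbar, k3)], uk)
        xbar = [a + h / 6 * (p + 2 * q + 2 * r + s)
                for a, p, q, r, s in zip(xbar, k1, k2, k3, k4)]
    return xbar[0]


def double_integrator(x, u):
    """Hypothetical example system: dx1/dt = x2, dx2/dt = u."""
    return [x[1], u]


def bang_bang(t):
    """Hypothetical control: +1 on [0, 1), -1 afterwards."""
    return 1.0 if t < 1.0 else -1.0


# Time-optimal case f0 = 1: J equals the transfer time t1 - t0 = 2.
J_time = functional_value(lambda x, u: 1.0, double_integrator, bang_bang, [0.0, 0.0], 0.0, 2.0)
# f0 = x2: J equals the total displacement of x1, here 1.
J_disp = functional_value(lambda x, u: x[1], double_integrator, bang_bang, [0.0, 0.0], 0.0, 2.0)
```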

To formulate the theorem which yields the solution of the fundamental problem, in addition to the fundamental system (5) another system of equations

$$\frac{d\psi_i}{dt} = -\sum_{j=0}^{n} \frac{\partial f_j(x, u)}{\partial x^i} \, \psi_j, \quad i = 0, \ldots, n, \qquad (6)$$

in the auxiliary variables $\psi_0, \ldots, \psi_n$ is considered.
If we choose an admissible control $u(t)$, $t_0 \le t \le t_1$, and have the corresponding phase trajectory $x(t)$ of system (5) with initial condition $x(t_0) = x_0$, system (6) takes the form

$$\frac{d\psi_i}{dt} = -\sum_{j=0}^{n} \frac{\partial f_j(x(t), u(t))}{\partial x^i}\,\psi_j, \qquad i = 0, \dots, n. \qquad (7)$$


This system is linear and homogeneous. Therefore, for any initial condition it admits a unique solution

$$\psi = (\psi_0, \dots, \psi_n)$$

for the $\psi_i$, which is defined on the entire interval $t_0 \le t \le t_1$. Just as the solution $x(t)$ of (5), the solution of (7) consists of continuous functions $\psi_i(t)$ which have continuous derivatives with respect to $t$ everywhere except at a finite number of points, namely the points of discontinuity of $u(t)$. For any initial condition, each solution of (7) is called the solution of (6) corresponding to the chosen control $u(t)$ and phase trajectory $x(t)$.

Systems (5) and (6) can be combined into a single system. To do so, the function

$$H(\psi, x, u) = (\psi, \mathbf{f}(x, u)) = \sum_{j=0}^{n} \psi_j f_j(x, u)$$

of the variables $x^0, \dots, x^n$, $\psi_0, \dots, \psi_n$ and $u$ is considered.

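The systems (5) and (6) may be rewritten with the aid of the function $H$ in the form of the Hamiltonian system

$$\frac{dx^i}{dt} = \frac{\partial H}{\partial \psi_i}, \qquad i = 0, \dots, n, \qquad (8)$$

$$\frac{d\psi_i}{dt} = -\frac{\partial H}{\partial x^i}, \qquad i = 0, \dots, n. \qquad (9)$$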


Thus, taking an arbitrary admissible, i.e., piecewise continuous, control $u(t)$, $t_0 \le t \le t_1$, and the initial condition $x(t_0) = x_0$, the corresponding trajectory $x(t) = (x^0(t), \dots, x^n(t))$ can be found. After that the solution of (9),

$$\psi(t) = (\psi_0(t), \dots, \psi_n(t)),$$

corresponding to the functions $u(t)$ and $x(t)$ can be found. It should be emphasized that the vector functions $x(t)$ and $\psi(t)$ are continuous and have continuous derivatives with respect to $t$ everywhere except at a finite number of points.

For fixed values of $\psi$ and $x$ the function $H$ becomes a function of the parameter $u \in U$. The least upper bound of the values of this function is denoted by

$$M(\psi, x) = \sup_{u \in U} H(\psi, x, u).$$

If the continuous function $H$ achieves its upper bound on $U$, then $M(\psi, x)$ is the maximum of the values of $H$ for fixed $\psi$ and $x$.
A necessary condition for optimality is given by Theorem 1, which is called the maximum principle [1].

Theorem 1 Let $u(t)$, $t_0 \le t \le t_1$, be an admissible control such that the corresponding trajectory $x(t)$, which begins at the point $x_0$ at the time $t_0$, passes at some time $t_1$ through a point on the line $\Pi$ (the line through the point $(0, x_1)$ parallel to the $x^0$-axis). In order that $u(t)$ and $x(t)$ be optimal it is necessary that there exist a nonzero continuous vector function $\psi(t) = (\psi_0(t), \dots, \psi_n(t))$ corresponding to $u(t)$ and $x(t)$ such that
1) for every $t$, $t_0 \le t \le t_1$, the function $H(\psi(t), x(t), u)$ of the variable $u \in U$ attains its maximum at the point $u = u(t)$,
$$H(\psi(t), x(t), u(t)) = M(\psi(t), x(t)); \qquad (10)$$
2) at the terminal time $t_1$ the relations
$$\psi_0(t_1) \le 0, \qquad M(\psi(t_1), x(t_1)) = 0 \qquad (11)$$
are satisfied. Furthermore, it turns out that if $\psi(t)$, $x(t)$ and $u(t)$ satisfy system (8)-(9) and condition 1), the time functions $\psi_0(t)$ and $M(\psi(t), x(t))$ are constant. Thus, (11) may be verified at any time $t$, $t_0 \le t \le t_1$, and not just at $t_1$.
From Theorem 1 an analogous necessary condition for time optimality can be derived. To do this, it is necessary to set $f(x, u)$ of Theorem 1 equal to 1. The function $H$ then takes the form

$$H = \psi_0 + \sum_{j=1}^{n} \psi_j f_j(x, u).$$


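Introducing the $n$-dimensional vector $\psi = (\psi_1, \dots, \psi_n)$ and the function

$$H'(\psi, x, u) = \sum_{j=1}^{n} \psi_j f_j(x, u),$$

equations (1) and (6) can be rewritten in the form of the Hamiltonian system

$$\frac{dx^i}{dt} = \frac{\partial H'}{\partial \psi_i}, \qquad i = 1, \dots, n, \qquad (12)$$

$$\frac{d\psi_i}{dt} = -\frac{\partial H'}{\partial x^i}, \qquad i = 1, \dots, n. \qquad (13)$$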


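For fixed values of $\psi$ and $x$, $H'$ is a function of $u$. The upper bound of the values of this function is denoted by

$$M'(\psi, x) = \sup_{u \in U} H'(\psi, x, u).$$

Because

$$H'(\psi, x, u) = H(\psi, x, u) - \psi_0,$$

it follows that

$$M'(\psi, x) = M(\psi, x) - \psi_0,$$

and therefore, since $\psi_0 \le 0$ and $M = 0$ at $t_1$, (10) and (11) now take the form

$$H'(\psi(t), x(t), u(t)) = M'(\psi(t), x(t)), \qquad M'(\psi(t_1), x(t_1)) = -\psi_0 \ge 0.$$

Thus, the following theorem is valid [1].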
Theorem 2 Let $u(t)$, $t_0 \le t \le t_1$, be an admissible control which transfers the phase point from $x_0$ to $x_1$, and let $x(t)$ be the corresponding trajectory, so that $x(t_0) = x_0$, $x(t_1) = x_1$. In order that $u(t)$ and $x(t)$ be time-optimal it is necessary that there exist a nonzero continuous vector function $\psi(t) = (\psi_1(t), \dots, \psi_n(t))$ corresponding to $u(t)$ and $x(t)$ such that
1) for all $t$, $t_0 \le t \le t_1$, the function $H'(\psi(t), x(t), u)$ of the variable $u \in U$ attains its maximum at the point $u = u(t)$,
$$H'(\psi(t), x(t), u(t)) = M'(\psi(t), x(t));$$
2) at the terminal time $t_1$ the relation
$$M'(\psi(t_1), x(t_1)) \ge 0 \qquad (14)$$
is satisfied. Furthermore, it turns out that if $\psi(t)$, $x(t)$ and $u(t)$ satisfy system (12)-(13) and condition 1), the time function $M'(\psi(t), x(t))$ is constant. Thus, (14) may be verified at any time $t$, $t_0 \le t \le t_1$, and not just at $t_1$.
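As a hedged illustration of Theorem 2, consider the hypothetical double integrator $dx^1/dt = x^2$, $dx^2/dt = u$ with $U = [-1, 1]$ (an example chosen here, not taken from the source). Then $H' = \psi_1 x^2 + \psi_2 u$, so condition 1) gives $u(t) = \operatorname{sign}\psi_2(t)$ wherever $\psi_2(t) \ne 0$, and $M' = \psi_1 x^2 + |\psi_2|$. System (13) gives $d\psi_1/dt = 0$ and $d\psi_2/dt = -\psi_1$, so $\psi_2$ is linear in $t$ and the control switches at most once. The sketch below, with arbitrarily chosen initial costate values, integrates (12) under this control and records $M'$, which should stay constant as the theorem asserts.

```python
def maximizing_control(psi2):
    """Maximize H' = psi1*x2 + psi2*u over u in U = [-1, 1].

    H' is linear in u, so the maximum is attained at an endpoint of U.
    If psi2 == 0, every u in U is a maximizer; 0.0 is returned in that case.
    """
    if psi2 > 0:
        return 1.0
    if psi2 < 0:
        return -1.0
    return 0.0


def costate(psi1_0, psi2_0, t):
    """Exact solution of (13): dpsi1/dt = -dH'/dx1 = 0, dpsi2/dt = -dH'/dx2 = -psi1."""
    return psi1_0, psi2_0 - psi1_0 * t


def simulate(psi1_0, psi2_0, x_start, t1, steps=1000):
    """Integrate (12) under the maximizing control, recording M'(psi(t), x(t)).

    On each step the control is taken from the costate at the step midpoint and
    held constant, so the double-integrator update below is exact on that step.
    """
    h = t1 / steps
    x1, x2 = x_start
    m_values, controls = [], []
    for k in range(steps):
        t = k * h
        psi1, psi2 = costate(psi1_0, psi2_0, t)
        m_values.append(psi1 * x2 + abs(psi2))  # M' = sup of H' over U = [-1, 1]
        u = maximizing_control(costate(psi1_0, psi2_0, t + h / 2)[1])
        controls.append(u)
        x1 += x2 * h + 0.5 * u * h * h
        x2 += u * h
    switches = sum(1 for a, b in zip(controls, controls[1:]) if a != b)
    return (x1, x2), m_values, switches


x_end, m_values, switches = simulate(1.0, 0.5, (0.0, 0.0), 1.0)
print(x_end, min(m_values), max(m_values), switches)
```

With these values the control is $+1$ up to $t = 0.5$ and $-1$ afterwards, and $M'$ stays at $0.5 \ge 0$, consistent with (14). Solving an actual time-optimal problem would additionally require searching over the initial costate so that the trajectory reaches the prescribed point $x_1$.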


From among all the trajectories which start at $x_0$ and end on some point of $\Pi$ (as well as the corresponding controls), Theorem 1 allows us to single out those separate, isolated trajectories and controls which satisfy all the formulated conditions. In fact, there are $2n + 3$ relations (8)-(10) for $2n + 3$ variables $x^i$, $\psi_i$ and $u$. Furthermore, since (10) is not differential and the number of differential equations equals $2n + 2$, the solutions of the system (8)-(10) in general depend on $2n + 2$ parameters. However, one of these parameters is redundant, as the functions $\psi_i(t)$ are defined only up to a common multiple. In addition, one of the parameters is determined by the condition that

$$\max_{u \in U} H(\psi(t_0), x(t_0), u)$$

vanishes.
Thus, there are $2n$ parameters on which the whole manifold of solutions of (8)-(10) depends. These $2n$ parameters must be chosen in such a way that the trajectory $x(t)$ passes through $x_0$ at the given time $t = t_0$ and through a point of $\Pi$ for some $t_1 > t_0$. The number $t_1 - t_0$ is also a parameter, so that there are altogether $2n + 1$ essential parameters. The condition that the trajectory must pass through the point $x_0$ and the line $\Pi$ gives rise to $2n + 1$ relations. Hence, one can expect that there exist only separate, isolated trajectories joining the point $x_0$ with the line $\Pi$ which satisfy the conditions of Theorem 1. Only these separate, isolated trajectories can turn out to be optimal.


See also 


> Dynamic Programming: Continuous-Time Optimal 
Control 

> Hamilton-Jacobi-Bellman Equation 

> High-Order Maximum Principle for Abnormal 
Extremals 
Portfolio Selection Problem


The heart of the portfolio problem is the selection of an 
optimal set of investment assets by rational economic 
agents. Elements of portfolio problems were discussed in the 1930's and 1940's by J.R. Hicks [19], J. Marschak [46], D.H. Leavens [37], J.B. Williams [62], and others; see [45] for a survey of these early contributions. However, the first formal specification of such a selection model was by H.M. Markowitz [40,42], who defined a mean-variance model for calculating optimal portfolios; see also A.D. Roy [51], whose safety-first model is very close to the mean-variance model. Following [55,58,59] and [50], this portfolio selection model may be stated as:


$$\min_x \; x'Vx \quad \text{s.t.} \quad x'r = r_p, \quad x'e = 1, \qquad (1)$$


where $x$ is a column vector of investment proportions in each of the risky assets, $V$ is a positive semidefinite variance-covariance matrix of asset returns, $r$ is a column vector of expected asset returns, $r_p$ is the investor's target rate of return and $e$ is a column vector of ones. An explicit solution for the problem can be found using the procedures described in [47,66], or [50].
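A minimal sketch of the explicit solution of (1), assuming $V$ is positive definite (so $V^{-1}$ exists) and $r$ is not a multiple of $e$. It applies the closed form given as (4) in the next section, $x = V^{-1}[r\;e]A^{-1}[r_p\;1]'$ with $a = r'V^{-1}r$, $b = r'V^{-1}e$, $c = e'V^{-1}e$. The three-asset data are hypothetical, and the small Gaussian-elimination helper stands in for a numerical library.

```python
def solve(M, y):
    """Solve M z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(M, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def frontier_weights(V, r, r_p):
    """Minimum-variance weights with expected return r_p (short sales allowed)."""
    e = [1.0] * len(r)
    Vr, Ve = solve(V, r), solve(V, e)         # V^{-1} r and V^{-1} e
    a, b, c = dot(r, Vr), dot(r, Ve), dot(e, Ve)
    d = a * c - b * b                         # determinant of A
    lam1 = (c * r_p - b) / d                  # first entry of A^{-1} [r_p, 1]'
    lam2 = (a - b * r_p) / d                  # second entry
    return [lam1 * p + lam2 * q for p, q in zip(Vr, Ve)]


# Hypothetical inputs: covariance matrix and expected returns of three assets.
V = [[0.040, 0.006, 0.000],
     [0.006, 0.090, 0.012],
     [0.000, 0.012, 0.160]]
r = [0.08, 0.12, 0.15]
x = frontier_weights(V, r, 0.11)
variance = dot(x, [dot(row, x) for row in V])
print(x, sum(x), dot(x, r), variance)
```

Any other portfolio meeting both constraints differs from $x$ by a direction orthogonal to both $r$ and $e$, and moving along such a direction can only increase the variance.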

Restrictions on short selling can be modeled by aug- 
menting (1) by the constraints: 


$$x \ge 0, \qquad (2)$$


where $0$ is a column vector of zeros. The problem now becomes a classic example of quadratic mathematical programming; indeed, the development of the portfolio problem coincided with early developments in nonlinear programming. Markowitz's [41] critical line algorithm is a systematic solution procedure. Formal investigations of the properties of both formulations, and variants, appear in [22,57], and the references above.


The Use of Mean And Variance 


The economic justification for this model is based on 
the von Neumann-Morgenstern expected utility re- 
sults, discussed in this context by Markowitz [42]. The 
model can also be viewed in terms of consumer choice 
theory together with the characteristics model devel- 
oped by K. Lancaster [36]. His argument is that goods 
purchased by consumers seldom yield a single, well de- 
fined service; instead, each good may be viewed as a col- 
lection of attributes each of which gives the consumer 
some benefit (or dis-benefit). Thus preference is defined 
over those characteristics embodied in a good rather 
than over the good itself. The analysis focusses atten- 
tion on the attributes of assets rather than on the assets 
per se. This requires the assumption that utility depends 
only on the characteristics. With $k$ characteristics, $C_i$, we need

$$U = f(W) = g(C_1, \dots, C_k),$$

where $U$ and $W$ represent utility and wealth. Modeling too few characteristics will yield apparently false empirical results. Clearly, the benefits of this approach increase as the number of assets rises relative to the number of characteristics. The objects of choice are the characteristics $C_1, \dots, C_k$. In portfolio theory, these are taken to be payoff (return) and risk.

At Markowitz’s suggestion, when dealing with 
choice among risky assets, payoff is measured as the 
expected return of the distribution of returns, and risk 
by the standard deviation of returns. Apart from minor exceptions, see [66], this pair of characteristics forms a complete description of assets which is consistent with expected utility theory in only two cases: assets have normal distributions, or investors have quadratic utility of wealth functions. The adequacy of these assumptions has been investigated by a number of authors (e.g., [5,16,60]). Although returns have been found to be nonnormal and quadratic utility has a number of objectionable features (not least diminishing marginal utility of wealth for high wealth), several authors demonstrate approximation results which are sufficient for mean-variance analysis ([39,49,52]).

A number of authors, including Markowitz [42], 
consider alternatives to the variance and suggest the 
use of the semivariance. This suggestion has been ex- 
tended into workable portfolio selection rules. E. Fama 
[14] and S. Tsiang [61] have argued the usefulness of 
the semi-interquartile range as a measure of risk. A. 
Kraus and R.F. Litzenberger [35] and others have examined the effect of preferences defined in terms of the third moment, which allows investor choice in terms of skewness. J.G. Kallberg and W.T. Ziemba [31,32] show that, if assets have normally distributed returns, the investor's degree of risk aversion is sufficient to determine optimal portfolio choice, whatever the form of the assumed concave utility function.


Solution of Portfolio Selection Model 


In the absence of short sales restrictions, (1) can be rewritten as

$$\min L = \tfrac{1}{2}x'Vx - \lambda_1(x'r - r_p) - \lambda_2(x'e - 1). \qquad (3)$$

The first order conditions are

$$Vx = \lambda_1 r + \lambda_2 e,$$

which shows that, for any efficient $x$, there is a linear relation between the expected returns $r$ and their covariances with the portfolio, $Vx$.

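Solving for $x$:

$$x = \lambda_1 V^{-1}r + \lambda_2 V^{-1}e = V^{-1}[r \;\; e]\,A^{-1}[r_p \;\; 1]', \qquad (4)$$

where

$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix} = \begin{bmatrix} r'V^{-1}r & r'V^{-1}e \\ r'V^{-1}e & e'V^{-1}e \end{bmatrix}.$$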
Substituting (4) into the definition of portfolio variance, $x'Vx$, yields

$$V_p = [r_p \;\; 1]\,A^{-1}[r_p \;\; 1]' = \frac{c\,r_p^2 - 2b\,r_p + a}{ac - b^2}, \qquad S_p = \sqrt{V_p}, \qquad (5)$$


where $V_p$ and $S_p$ represent portfolio variance and standard deviation, respectively. This defines the efficient set, which is a hyperbola in mean/standard-deviation space (or a parabola in mean/variance space). The minimum risk is at $S_{\min} = c^{-1/2}$ and $r_{\min} = b/c$ (both strictly positive). Rational risk averse investors will hold portfolios lying on this boundary with $r > r_{\min}$.
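The quantities in (5) can be checked numerically. The sketch below, on hypothetical data and with a small elimination helper standing in for a linear-algebra library, computes $a$, $b$, $c$, evaluates the frontier variance $V_p$ of (5), and compares the minimum-risk point $r_{\min} = b/c$, $V_{\min} = 1/c$ with the global minimum-variance portfolio $V^{-1}e/c$.

```python
def solve(M, y):
    """Solve M z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(M, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def frontier_variance(a, b, c, r_p):
    """Variance of the frontier portfolio with expected return r_p, as in (5)."""
    return (c * r_p ** 2 - 2 * b * r_p + a) / (a * c - b * b)


# Hypothetical inputs (covariance matrix assumed positive definite).
V = [[0.040, 0.006, 0.000],
     [0.006, 0.090, 0.012],
     [0.000, 0.012, 0.160]]
r = [0.08, 0.12, 0.15]
e = [1.0] * len(r)
Vr, Ve = solve(V, r), solve(V, e)
a, b, c = dot(r, Vr), dot(r, Ve), dot(e, Ve)

r_min = b / c                    # return of the minimum-risk portfolio
S_min = c ** -0.5                # its standard deviation
w_gmv = [q / c for q in Ve]      # its weights, V^{-1} e / c
var_gmv = dot(w_gmv, [dot(row, w_gmv) for row in V])
print(r_min, S_min, frontier_variance(a, b, c, r_min), var_gmv)
```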

Each efficient portfolio, $p$, has an orthogonal portfolio $z$ (i.e., such that $\mathrm{Cov}(r_p, r_z) = 0$) with return

$$r_z = \frac{a - b\,r_p}{b - c\,r_p}.$$

Using this, the efficient set degenerates into the straight line tangent to the hyperbola at $p$, which has intercept $r_z$:

$$r = r_z + \lambda s, \qquad (6)$$

where $r$ and $s$ represent vectors of the expected returns and risks of efficient portfolios, and $\lambda = (r_p - r_z)/S_p$ can be interpreted as the additional expected return per unit of risk. This is known as the Sharpe ratio [54,56]. Equation (6) shows a two-fund separation theorem: linear combinations of only two portfolios are sufficient to describe the entire efficient set.
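A sketch checking the orthogonal-portfolio formula on hypothetical data: it builds the frontier portfolios with returns $r_p$ and $r_z = (a - b\,r_p)/(b - c\,r_p)$ from (4), verifies that their covariance vanishes, and confirms that $\lambda = (r_p - r_z)/S_p$ equals the slope $dr/ds$ of the hyperbola (5) at $p$, i.e. that line (6) is tangent there. The target $r_p$ is assumed to differ from $b/c$.

```python
def solve(M, y):
    """Solve M z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(M, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def frontier_weights(Vr, Ve, a, b, c, r_p):
    """Frontier weights (4): x = V^{-1}[r e] A^{-1} [r_p, 1]'."""
    d = a * c - b * b
    lam1, lam2 = (c * r_p - b) / d, (a - b * r_p) / d
    return [lam1 * p + lam2 * q for p, q in zip(Vr, Ve)]


# Hypothetical inputs.
V = [[0.040, 0.006, 0.000],
     [0.006, 0.090, 0.012],
     [0.000, 0.012, 0.160]]
r = [0.08, 0.12, 0.15]
e = [1.0] * len(r)
Vr, Ve = solve(V, r), solve(V, e)
a, b, c = dot(r, Vr), dot(r, Ve), dot(e, Ve)
d = a * c - b * b

r_p = 0.13                                    # hypothetical efficient target, above b/c
r_z = (a - b * r_p) / (b - c * r_p)           # return of the orthogonal portfolio z
x_p = frontier_weights(Vr, Ve, a, b, c, r_p)
x_z = frontier_weights(Vr, Ve, a, b, c, r_z)
cov_pz = dot(x_p, [dot(row, x_z) for row in V])
S_p = dot(x_p, [dot(row, x_p) for row in V]) ** 0.5
sharpe = (r_p - r_z) / S_p                    # lambda in (6)
tangent_slope = d * S_p / (c * r_p - b)       # dr/ds of the hyperbola (5) at p
print(r_z, cov_pz, sharpe, tangent_slope)
```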

Under the additional assumptions of homogeneous beliefs (so that all investors perceive the same parameters) and equilibrium, (6) becomes the capital market line. The security market line (i.e., the relationship between expected returns and systematic risk or beta), which is the outcome of the capital asset pricing model (CAPM), can be derived by premultiplying (4) by $V$ and simplifying using the definitions of $V_p$ and $r_z$:

$$r = r_z e + (r_p - r_z)\beta, \qquad \text{where } \beta = \frac{Vx}{V_p}, \qquad (7)$$

with $x$ the efficient portfolio (4) for the target return $r_p$.
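A sketch verifying (7) numerically on hypothetical data: for a frontier portfolio $x$ with target $r_p$, the beta vector $\beta = Vx/V_p$ reproduces every asset's expected return through $r = r_z e + (r_p - r_z)\beta$, and the portfolio's own beta, $x'\beta$, equals one. The elimination helper again stands in for a linear-algebra library.

```python
def solve(M, y):
    """Solve M z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(M, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Hypothetical inputs.
V = [[0.040, 0.006, 0.000],
     [0.006, 0.090, 0.012],
     [0.000, 0.012, 0.160]]
r = [0.08, 0.12, 0.15]
e = [1.0] * len(r)
Vr, Ve = solve(V, r), solve(V, e)
a, b, c = dot(r, Vr), dot(r, Ve), dot(e, Ve)
d = a * c - b * b

r_p = 0.13
lam1, lam2 = (c * r_p - b) / d, (a - b * r_p) / d
x_p = [lam1 * p + lam2 * q for p, q in zip(Vr, Ve)]   # frontier portfolio (4)
Vx = [dot(row, x_p) for row in V]
V_p = dot(x_p, Vx)
beta = [v / V_p for v in Vx]                           # beta of each asset on p
r_z = (a - b * r_p) / (b - c * r_p)
fitted = [r_z + (r_p - r_z) * bi for bi in beta]       # security market line (7)
print(beta, fitted)
```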


If it exists, the risk-free rate of interest may be substituted for $r_z$ (definitionally, the risk-free return will be uncorrelated with the return on all risky assets). Equation (7) then becomes the original CAPM in which
expected return is calculated as the risk-free rate plus 
a risk premium (measured in terms of an asset’s co- 
variance with the market portfolio). The CAPM forms 
one of the cornerstones of modern finance theory and 
is not addressed here. Discussion of the CAPM can be 
found in [22] and [17], while systematic fundamental 
and seasonal violations of the theory are presented in 
[63] and [34]. 


Short Selling 


The assumption that assets may be sold short (i.e., $x_i < 0$) is justified when the model is used to derive analytical results for the portfolio problem. Also, when considering equilibrium (e.g., the CAPM), none of the short selling constraints should be binding (because in aggregate, short selling must net out to zero). However, significant short selling restrictions do face investors in most real markets. These restrictions may be in the form of absolute prohibition, the extra cost of deposits to back short selling, or self-imposed controls designed to limit potential losses. For example, the NYSE imposes the 'uptick rule', under which short sales are allowed only if the price of the immediately preceding trade was higher than, or equal to, the trade preceding it. As short selling is profitable when the market is falling, this rule substantially limits its attractiveness.

The set of quadratic programming problems to find the efficient frontier when short sales are ruled out can be formulated either as minimising the portfolio risk for a specified sequence of portfolio returns ($r_p$) by repeatedly solving equations (1) and (2), or as maximising the weighted sum of portfolio risk and return for a chosen range of risk-return trade-off parameters ($\mu$) by repeatedly solving (8). This latter approach has the advantages of locating only points on the efficient frontier and, for evenly spaced increments in $\mu$, locating more points on the efficient frontier where its curvature is greatest.


$$\max_x \; \phi = \mu(x'r - r_p) - x'Vx \quad \text{s.t.} \quad x \ge 0, \quad x'e = 1. \qquad (8)$$
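Problem (8) is a small convex quadratic program. The source does not prescribe an algorithm for it (Markowitz's critical line algorithm is one classical choice); the sketch below instead uses projected gradient ascent, assuming a Euclidean projection onto the set $\{x \ge 0,\ x'e = 1\}$ computed by the standard sort-based method. The step size relies on the Gershgorin bound for the largest eigenvalue of $V$, and the constant $-\mu r_p$ in (8) is dropped because it does not change the maximizer. The data are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        candidate = (cumulative - 1.0) / j
        if uj - candidate > 0:
            theta = candidate   # the last j passing this test fixes the threshold
    return [max(vi - theta, 0.0) for vi in v]


def long_only_portfolio(V, r, mu, iterations=2000):
    """Maximize mu*x'r - x'Vx subject to x >= 0, x'e = 1 by projected gradient ascent."""
    n = len(r)
    # Gershgorin: the largest eigenvalue of V is at most its largest absolute row sum,
    # so twice that bound is a Lipschitz constant for the gradient of x'Vx.
    lipschitz = 2.0 * max(sum(abs(v) for v in row) for row in V)
    step = 1.0 / lipschitz
    x = [1.0 / n] * n
    for _ in range(iterations):
        grad = [mu * r[i] - 2.0 * sum(V[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = project_to_simplex([xi + step * gi for xi, gi in zip(x, grad)])
    return x


# Hypothetical two-asset example with uncorrelated returns.
V = [[0.04, 0.0], [0.0, 0.09]]
r = [0.05, 0.10]
print(long_only_portfolio(V, r, mu=0.0))    # minimum-variance end of the frontier
print(long_only_portfolio(V, r, mu=10.0))   # risk-tolerant end: corner solution
```

For $\mu = 0$ the weights should approach the minimum-variance proportions $(0.09, 0.04)/0.13$; for large $\mu$ the non-negativity constraint binds and all wealth goes to the higher-return asset. Sweeping $\mu$ over a grid traces out the long-only frontier.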


When short sales are permitted, a position (long or 
short) is taken in every asset, while when short sell- 
ing is ruled out, the solution involves long positions in 
only about 10% of the available assets. When short sell- 
ing is permitted, about half the assets are required to 
be sold short, often in large amounts, and sometimes 
in amounts exceeding the initial value of the invest- 
ment portfolio. Indeed, this is the main activity of ‘short 
seller’ funds. 

In contrast, most models based on portfolio theory, in particular the CAPM, ignore short selling constraints [43,44]. This is consistent with the development of equilibrium models, for which institutional restrictions are inappropriate (and, if imposed, would not be binding). However, when short selling is permitted, the
number of asset return observations is required to ex- 
ceed the number of assets, while complementary slack- 
ness means that this condition need not be met when 
short selling is ruled out. Computational procedures to 
solve mean-variance models with various types of con- 
straints, and the optimal combination of safe and risky 
assets for various utility functions are discussed in [65]. 


Estimation Problems 


The model (1) requires estimates of r and V for the 
period during which the portfolio is to be held. This 
estimation problem has been given relatively little at- 
tention, and many authors, both practitioners and aca- 
demics, have used historical values as if they were pre- 
cise estimates of future values. However, S.D. Hodges 
and R.A. Brealey [21], among others, demonstrate the 
benefits obtained from even slight improvements on 
historical data. 

Estimation risk can be allowed for either by using 
different methods to forecast asset returns, variances 
and covariances, which are then used in place of the 
historical values in the portfolio model, or by using 
the historical values in a modified portfolio selection 
technique [2]. Since the portfolio selection model of 
Markowitz takes these estimates as parametric, there is 
no theoretical guidance on the estimation method and 
a variety of methods have been proposed to provide the 
estimates. The Sharpe single index market model [53] has been widely applied to forecast the covariance matrix. Originally proposed to reduce the computation required by the full model, it assumes a linear relation between stock returns and some measure of the market, $r = \alpha + \beta m + \varepsilon$ (for market index $m$ and residuals $\varepsilon$). This uses historical estimates of the means and variances; however, the implied covariance matrix is $V_s = v_m \beta\beta' + V_\varepsilon$, where $v_m$ is the variance of the index, $\beta$ is a column vector of slope coefficients from regressing each asset on the market index and $V_\varepsilon$ is a diagonal matrix of the variances of the residuals from each of these regressions. A number of studies have found that models based on the single index model outperform those based on the full historical method; see e.g. [4].
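A sketch of the single index covariance estimate $V_s = v_m\beta\beta' + V_\varepsilon$, assuming ordinary least squares with an intercept for each asset and the same $n - 1$ divisor for the index variance and the residual variances (a convention chosen here; with it, the diagonal of $V_s$ reproduces each asset's historical variance exactly). The return series are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def single_index_covariance(asset_returns, market):
    """V_s = v_m * beta beta' + diag(residual variances) from per-asset OLS regressions."""
    n = len(market)
    m_bar = mean(market)
    s_mm = sum((m - m_bar) ** 2 for m in market)
    v_m = s_mm / (n - 1)                       # variance of the index
    betas, resid_vars = [], []
    for series in asset_returns:
        r_bar = mean(series)
        beta = sum((x - r_bar) * (m - m_bar) for x, m in zip(series, market)) / s_mm
        alpha = r_bar - beta * m_bar
        resid = [x - alpha - beta * m for x, m in zip(series, market)]
        betas.append(beta)
        resid_vars.append(sum(res * res for res in resid) / (n - 1))
    k = len(asset_returns)
    return [[v_m * betas[i] * betas[j] + (resid_vars[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]


market = [0.010, -0.020, 0.030, 0.000, 0.015, -0.005]
assets = [[0.012, -0.025, 0.041, 0.002, 0.020, -0.010],
          [0.005, -0.010, 0.012, 0.004, 0.006, 0.001]]
V_s = single_index_covariance(assets, market)
print(V_s)
```

Only $2N + 1$ quantities (the betas, the residual variances and $v_m$) enter the estimate, instead of the $N(N+1)/2$ distinct entries of a full historical covariance matrix.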

The overall mean method, first proposed in [12], is based on the finding that, although historical estimates of means are satisfactory, data are typically not stable enough to allow accurate estimation of the $N(N-1)/2$ covariance terms. The crudest solution is to assume that the correlations between all pairs of assets expected in the next period are equal to the mean of all the historic correlations. An estimate of $V$ can then be derived from this by combining the common correlation with the historical standard deviations of the individual assets. E.J. Elton, M.J. Gruber and T.J. Urich [13] compared the overall mean method of forecasting the covariance matrix with forecasts made using historical values, and four alternative versions of the single index model. They concluded that the overall mean model was clearly superior. A simplified procedure for estimating the overall mean correlation appears in [1].
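A sketch of the crudest overall mean estimate: compute historical standard deviations and pairwise correlations, replace every correlation by their average $\bar\rho$, and set $V_{ij} = \bar\rho\,\sigma_i\sigma_j$ for $i \ne j$ and $V_{ii} = \sigma_i^2$. The data are hypothetical. Because the result is an equicorrelation matrix rescaled by the standard deviations, it is positive definite exactly when $-1/(N-1) < \bar\rho < 1$.

```python
import math


def overall_mean_covariance(series):
    """Covariance forecast in which every pair of assets gets the average historical correlation."""
    k, n = len(series), len(series[0])
    means = [sum(s) / n for s in series]
    sd = [math.sqrt(sum((x - mu) ** 2 for x in s) / (n - 1)) for s, mu in zip(series, means)]

    def corr(i, j):
        cov = sum((p - means[i]) * (q - means[j])
                  for p, q in zip(series[i], series[j])) / (n - 1)
        return cov / (sd[i] * sd[j])

    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]   # N(N-1)/2 pairs
    rho_bar = sum(corr(i, j) for i, j in pairs) / len(pairs)
    V = [[sd[i] * sd[j] * (1.0 if i == j else rho_bar) for j in range(k)]
         for i in range(k)]
    return V, rho_bar


returns = [[0.012, -0.025, 0.041, 0.002, 0.020],
           [0.005, -0.010, 0.012, 0.004, 0.006],
           [0.030, -0.015, 0.010, -0.020, 0.025]]
V_bar, rho_bar = overall_mean_covariance(returns)
print(rho_bar, V_bar)
```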

In recent years, statisticians have shown increas- 
ing interest in Bayesian methods [20] and particularly 
James-Stein estimators ([10,11,30,48]). The intuition 
behind this approach is that returns which are far from 
the norm have a higher chance of containing mea- 
surement error than those close to it. Thus, estimates 
of returns, based on individual share data, are cross- 
sectionally ‘shrunk’ towards a global estimate of ex- 
pected returns which is based on all the data. Although 
these estimators have unusual properties, they are gen- 
erally expected to perform well in large samples. 
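A minimal sketch of the shrinkage idea, using the textbook positive-part James-Stein estimator toward the grand mean under the simplifying assumption that every sample mean has the same known sampling variance $\sigma^2$. This simplified version is not the Bayes-Stein estimator used in [27,28], and the figures are hypothetical.

```python
def james_stein_means(means, sigma2):
    """Positive-part James-Stein shrinkage of k sample means toward their grand mean.

    sigma2 is the (assumed common and known) sampling variance of each mean.
    The factor (k - 3) applies because the shrinkage target is itself estimated,
    so the rule is only used for k >= 4.
    """
    k = len(means)
    if k < 4:
        return list(means), 1.0
    grand = sum(means) / k
    spread = sum((m - grand) ** 2 for m in means)
    if spread == 0.0:
        return [grand] * k, 0.0
    keep = max(0.0, 1.0 - (k - 3) * sigma2 / spread)   # weight kept on each own mean
    return [grand + keep * (m - grand) for m in means], keep


estimates, weight = james_stein_means([0.01, 0.02, 0.03, 0.04, 0.05], sigma2=0.0001)
print(weight, estimates)
```

Means far from the grand mean are pulled in by the same proportion as those close to it, so the cross-sectional ranking is preserved while the dispersion, and with it the influence of measurement error, is reduced.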

P. Jorion [27,28] examined the performance of 
Bayes-Stein estimation using both simulated and small 
real data sets and concluded that the Bayes-Stein ap- 
proach outperformed the use of historical estimates of 
returns and the covariance matrix. However, he found 
[29] that the index model outperformed Stein and historical models. J.L.G. Board and C.M.S. Sutcliffe [4] applied these and other methods to large data sets. They found that, in contrast to earlier studies, the relative performance of Bayes-Stein was mixed. While it produced reasonable estimates of the mean returns vector, there were superior methods (e.g., use of the overall mean) for estimating the covariance matrix when
short sales were permitted. They also found that, when 
short sales were prohibited, actual portfolio perfor- 
mance was clearly improved, although there was lit- 
tle to choose between the various estimation meth- 
ods. 

An alternative approach is to try to control for errors in the parameter estimates by imposing additional constraints on (1). Clearly, ex ante, the solution to such a model cannot dominate (1); however, ex post, dominance might emerge (i.e., what seems, in advance, to be an inferior portfolio might actually perform better than others). The argument is that
adding constraints to (1) to impose lower bounds (i.e., 
prohibiting short sales) and/or upper bounds (forc- 
ing diversification) can be used as an ad hoc method 
of avoiding the worst effects of estimation risk. Of 
course, extreme, but possibly desirable, corner solu- 
tions will also be excluded by this technique. K.J. Co- 
hen and J.A. Pogue [8] imposed upper bounds of 2.5% 
on any asset. Board and Sutcliffe [3] studied the ef- 
fects of placing upper bounds on the investment pro- 
portions, which may be interpreted as a response to 
estimation risk. Using historical forecasts of returns 
and the covariance matrix, and with short sales ex- 
cluded, they found that forcing diversification leads to 
improved actual performance over the unconstrained 
model. C.R. Hensel and A.L. Turner [18] have also 
studied adjusting the inputs and outputs to improve 
portfolio performance. 

V.R. Chopra and Ziemba [7], following the work of 
Kallberg and Ziemba [33], showed that errors in the 
mean values have a much greater effect than errors in 
the variances, which are in turn more important than 
errors in the covariances. Their simulations suggest that the relative impact of errors in means, variances and covariances is of the order of 20 to 2 to 1. This quantifies the earlier
findings and stresses the importance of having good es- 
timates of the asset means. Chopra [6] shows that mean, 
variance and covariance errors affect turnover in the 
same way. 

Another approach is to use fundamental analysis to 
provide external information to modify the estimates 
[21]. Clearly, among the simplest external data to add 
are the seasonal (e. g., turn of the year, and month and 
weekend) effects which have been found in most stock 
markets around the world. Incorporation of these into 
the parameter estimates can substantially improve the 
performance of the model. Ziemba [63] demonstrated 
the benefits of factor models to estimate the mean 
returns. 


Summary 


We have considered only the single period mean-variance portfolio theory model. Although recent developments have focussed on extending the model to multiple periods, most of these models which assume frictionless capital markets require the solution of a sequence of instantaneous mean-variance models, in which the existence of transactions costs adds enormously to the complexity of the problem. Surveys covering dynamic portfolio theory appear in [9,22,66], and [23]; see also [64].
Portfolio Selection and Multicriteria Analysis

Portfolio selection concerns the problem of finding the most attractive stocks and determining their proportions in a portfolio, which is essentially a matter of arbitration between risk and return. In 1952, H.M. Markowitz [17] proposed the mean-variance model, which started a theory that has since undergone great development. According to this approach, an investor pursues two conflicting objectives: the maximization of the expected return and the minimization of the risk, measured by the variance of the return. On the same basis (essentially measuring risk by the variance or another measure of the variation of the return), several other models were developed. A complete overview of these models can be found in [5], including the single-index models, the multi-index models, the average correlation models, the mixed models, the utility model and models using different criteria such as the geometric mean return, stochastic dominance, safety first and skewness.

However, an analysis of the nature of risk in portfolio management shows that risk comes from various origins, so that its nature is multidimensional. The traditional theoretical approach mentioned above does not take this multidimensional nature of risk into account. Also, individual goals and investor preferences cannot be incorporated in these models. Multiple criteria decision making (MCDM) provides the methodological basis to resolve the inherent multicriteria nature of the portfolio selection problem. Additionally, it builds realistic models by taking into account, apart from the two basic criteria of return and risk (mean-variance model), a number of important criteria, such as marketability, price earnings ratio (PER), growth of the dividends, and others. Furthermore, MCDM has the advantage of taking into account the preferences of any particular investor. Recently, several authors have developed a new approach, using MCDM for portfolio management.

In this article we develop some arguments for the use of MCDM in portfolio management and we present a brief review of the existing articles concerning this new approach (Section 2). Then, we propose a multicriteria methodological framework in two stages (Section 3). In the first stage the ELECTRE TRI [28] method and the MINORA system ([24,25]) are used to select attractive stocks. In the second stage, the ADELAIS [23] system is used to construct a portfolio of the attractive stocks chosen in the first stage. This methodological framework is illustrated on a case study. Finally, we summarize the methodological framework used, underline its advantages and give some concluding remarks.


MCDM and Portfolio Selection 
Portfolio Selection: A Multicriteria Problem 


To manage portfolio selection efficiently, it is necessary to take into account all the factors that influence the financial markets. Portfolio management is therefore a multicriteria problem. Indeed, multifactor models and APT (arbitrage pricing theory) point out the existence of several factors influencing the determination of stock prices. Furthermore, fundamental analysis models, commonly used in practice, underline that stock prices also depend on the firm's health and its capacity to pay dividends. The latter problem is itself a multicriteria problem because, in order to solve it, we must assess the profitability of the firm, its debt level (in the short and long terms) and the quality of its management. Finally, in practice, an investor has a personal attitude and particular objectives.

By reducing risk to its probabilistic dimension, the classical approach cannot take into account its multidimensional nature, and it relies on risk measures that do not have a concrete and practical economic meaning. In addition, this approach imposes a norm on the investor's behavior that can be restrictive. Also, it cannot take into account the personal attitude and preferences of a real investor confronted with a given risk in a particular situation. However, experience has proved that the classical approach is useful, for instance concerning the diversification principle and the use of beta as a measure of risk. Thus, the classical approach seems to be necessary, but not sufficient, to manage portfolio selection efficiently. Some additional criteria must be added to the classical risk-return criteria. In practice, these additional criteria can be found in fundamental analysis or constructed following the personal goals of the investor.

Note that the application of the above principles is difficult because of the complexity of multicriteria problems on the one hand, and the use of criteria from different origins and of conflicting nature on the other hand. MCDM facilitates and favors the analysis of compromises between the criteria. It also makes it possible to manage the heterogeneity of criteria scales and the fuzzy and imprecise nature of the evaluation, which it helps to clarify. (Here the words fuzzy and imprecise refer to: a) the delicacy of an investor's judgement (owing to human nature and the lack of information), which will not always allow discrimination between two close situations; and b) the use of a representation model, which is a simplification of reality that expresses itself in an error term.)

By linking the multicriteria evaluation of an asset portfolio and the search for a satisfactory solution to the investor's preferences, MCDM methods make it possible to take into account the investors' specific objectives. Furthermore, these methods do not impose any normative scheme on the behavior of the investors. The use of MCDM methods makes it possible to synthesize in a single procedure the theoretical and practical aspects of portfolio management, and thus allows a non-normative use of theory.

By systematizing the decision process, MCDM offers the possibility of saving time and/or increasing the number of stocks considered by the practitioner. Moreover, in a market close to efficiency, as all the big markets are, it is the good and fast use of all available information that ensures the informational efficiency of capital markets and permits the investor to compete.


Review of Existing Studies


In this section we present a brief review of articles con- 
cerning MCDM and portfolio management. An analy- 
sis and a more complete presentation of most of these 
papers can be found in [7] or [12]. T.L. Saaty, P. Rogers 
and R. Pell [22] proposed in 1980 to construct a portfo- 
lio using the analytic hierarchy process (AHP) method- 
ology. They consider that stocks must be compared fol- 
lowing the investor objectives and the criteria that in- 
fluence the prices. In addition, the latter criteria are 
themselves influenced by some global economic factors. In AHP, factors, criteria and stocks are successively weighted according to their relative importance. Finally, the weight of each stock gives its proportion in the portfolio.
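In AHP the weights at each level are usually obtained as the principal eigenvector of a reciprocal pairwise-comparison matrix. The following is a minimal sketch for a single comparison matrix, assuming Saaty's eigenvector method computed by power iteration; the aggregation through the hierarchy used in [22] is not reproduced. It also reports the consistency index $CI = (\lambda_{\max} - n)/(n - 1)$. The comparison matrix is hypothetical and perfectly consistent, so the recovered weights and $\lambda_{\max} = n$ are known in advance.

```python
def ahp_priorities(P, iterations=100):
    """Priority vector (principal eigenvector, summing to 1) of a positive comparison matrix."""
    n = len(P)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [vi / total for vi in v]
    # With sum(w) == 1 and P w = lambda w, the components of P w sum to lambda.
    lam = sum(sum(P[i][j] * w[j] for j in range(n)) for i in range(n))
    ci = (lam - n) / (n - 1) if n > 1 else 0.0
    return w, lam, ci


# Hypothetical comparisons of three stocks, built from weights (0.5, 0.3, 0.2),
# so that entry P[i][j] = w_i / w_j.
P = [[1.0, 5 / 3, 5 / 2],
     [3 / 5, 1.0, 3 / 2],
     [2 / 5, 2 / 3, 1.0]]
weights, lam_max, ci = ahp_priorities(P)
print(weights, lam_max, ci)
```

With real, slightly inconsistent judgements $\lambda_{\max}$ exceeds $n$, and the consistency index measures by how much.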

S.M. Lee and L. Chesser [16] present a goal programming (GP) model to construct a portfolio. The objectives used are the attainment of a minimum return, minimization of risk (using β), various diversification objectives and some objectives specific to the investors. In GP, the investor can set his desires in order of preference in a simple and natural way.

A. Rios-Garcia and S. Rios-Insua [20] construct 
a portfolio using multi-attribute utility theory (MAUT) 
and multi-objective linear programming (MOLP). In particular, the authors propose the use of Bayesian
econometrics to update the investor multi-attribute 
utility function, and examine the problem posed by the 
nonlinear criteria in MOLP. 

Y. Evrard and R. Zisswiller [6] use MAUT to per- 
form a valuation of some stocks. The aim is to show 
how it is possible to construct models that link the at- 
tributes of the stocks (return, risk, PER and earnings 
per share) and the investors’ preferences. 

H. Nakayama, T. Takeguchi and M. Sano [19] propose an interactive graphical methodology to construct a portfolio using the expected return, the variance and the past evolution of the return.

J.M. Martel, N.T. Khoury and M. Bergeron [18] perform a portfolio selection using the outranking methods ELECTRE I and ELECTRE II. From the stocks included in two portfolios selected by a real investor, they generate 50 portfolios. These portfolios are compared with ELECTRE I and II according to four criteria: the return, the logarithmic variance, the PER, and the liquidity. The aim is to determine which portfolio best fits the decision criteria.

G. Colson and C. De Bruyn [1] propose a system 
that performs a stock valuation and allows the con- 
struction of a portfolio. The heart of this system is the 
confrontation of the following objectives: 

1) Obtain a minimum level of gain. 

2) Maintain the risk under a predetermined level. 

3) Obtain a minimum level of gain as dividends and 
interests. 

4) Ensure a sufficient level of diversification, firm control or liquidity.

This system is divided into two subsystems, the single decision model (SDM) and the simultaneous management model (SMM). The SDM ranks the stocks using various statistical criteria and information coming from correspondents. The SMM is a GP model as in [16].

A. Szala [26] performs stock evaluation in collaboration with a French investment company. For the company's financial analyst, who examines a small number of stocks using numerous criteria, Szala uses the outranking method ELECTRE III to obtain a ranking of the stocks. For traders and portfolio managers, it is not realistic to use many criteria because they generally manage a large number of stocks. Therefore, for traders and portfolio managers, Szala decided to amalgamate the financial criteria into a synthesis criterion obtained with the PREFCALC system (PREFCALC is an interactive system of the same nature as MINORA, which is presented in the following section of this article).

Khoury, Martel and Bergeron [14] use the outrank- 
ing methods ELECTRE IS and ELECTRE III to select 
international index portfolios. They generate 19 index 
portfolios from the indexes of 16 countries. The crite- 
ria that have been used are: return, standard deviation, 
transaction costs, country risk, available cover for for- 
eign currencies and exchange risk. 

The purpose of Colson and M. Zeleny ([2,29]) is to construct an efficient frontier in accordance with the principles of stochastic dominance. To achieve their aim, they propose to use a three-dimensional vector, the ‘prospect ranking vector’ (PRV), as a measure of risk. The first component of the PRV is the probability of not achieving a minimal return, the second component is the expected return and the third component is the probability of exceeding a maximum return. We find it worthwhile to update the PRV in order to perform portfolio selection rather than to construct an efficient frontier [9]. To achieve this, it is necessary to modify the PRV in order to obtain more complete measures of risk. The first component is therefore split into two components: the first is intended to protect the investor against strong losses and the second takes into account the other possible losses (not so strong but significant). The third component, the probability of exceeding a maximum level of return, is changed into the probability of significantly exceeding the expected return. This new version of the PRV can then be associated with other criteria to perform a multicriteria portfolio selection. In [9] portfolio selection is carried out with the MINORA system, which is presented in the following section.
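The original PRV and its modified four-component version can be estimated from a set of return scenarios. The minimal sketch below assumes equally likely scenarios and treats the thresholds (`r_min`, `r_max`, `strong_loss`, `loss`, `excess`) as hypothetical parameters chosen by the investor. The exact definitions used in [9] may differ.

```python
from statistics import mean


def prospect_ranking_vector(returns, r_min, r_max):
    """Three-component PRV from equally likely return scenarios:
    (P(return < r_min), expected return, P(return > r_max))."""
    n = len(returns)
    p_below = sum(r < r_min for r in returns) / n
    p_above = sum(r > r_max for r in returns) / n
    return p_below, mean(returns), p_above


def modified_prv(returns, strong_loss, loss, excess):
    """Four-component variant: probability of a strong loss, probability
    of a lesser but significant loss, expected return, and probability of
    exceeding the expected return by more than `excess`."""
    n = len(returns)
    expected = mean(returns)
    p_strong = sum(r < strong_loss for r in returns) / n
    p_other = sum(strong_loss <= r < loss for r in returns) / n
    p_exceed = sum(r > expected + excess for r in returns) / n
    return p_strong, p_other, expected, p_exceed


scenarios = [-0.3, -0.1, 0.0, 0.1, 0.3]  # hypothetical returns
print(prospect_ranking_vector(scenarios, r_min=-0.05, r_max=0.2))
print(modified_prv(scenarios, strong_loss=-0.2, loss=-0.05, excess=0.2))
```

Splitting the loss probability in this way separates the rare severe losses (here −30%) from the milder ones (here −10%). The single first component of the original PRV lumps them together.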

C. Zopounidis, D.K. Despotis and I. Kamaratou [31] 
propose the use of the ADELAIS system to construct 
a portfolio using some diversification constraints, some 
constraints representing the investor’s personal pref- 


Portfolio Selection and Multicriteria Analysis 


2999 


erences and the following criteria: return, dividends, 

β, earnings per share growth, marketability, PER. The ADELAIS system is also used in our methodology, which is presented later.

M. Tamiz, R. Hasham and D-F. Jones [27] propose 
to use GP for portfolio evaluation and selection. The 
proposed method consists of two stages: the first stage 
estimates the sensitivity of the stocks to specific factors 
using GP and regression analysis. They use twelve fac- 
tors: UK, US and German interest rates, US and Ger- 
man inflation rates, Dow Jones index, Nikkei average 
index, Hang Seng index, oil price, gold price, house
price and sterling index. The sensitivities obtained by 
the GP model and those of the regression analysis are 
compared. In the second stage, a portfolio is selected by 
using a GP model based on the decision maker’s sce- 
narios and preferences. 

C. Dominiak [4] presents a procedure for secu- 
rity selection that uses a multicriteria discrete analysis 
method based on the idea of reference solution. The cri- 
teria he used are divided into three groups: 

1) Valuation measures that contain the price/book value ratio and the PER.

2) Fundamental variables that contain profit margin 
ratio and changes in quarterly net profits. 

3) Technical indicators that contain rate of change, rel- 
ative strength index and price appreciation during 
the last 3 months. 

Ch. Hurson and N. Ricci [8] propose to combine arbitrage pricing theory (APT) and MCDM to model the portfolio management process. First, APT is used to construct some efficient portfolios, to estimate their expected return and to identify influence factors and risk origins. Then, two multicriteria decision methods, the ELECTRE TRI outranking method and the MINORA interactive system, are used to select attractive portfolios, using APT factors as selection criteria. This methodology is illustrated by an application to the Greek market.


Methodological Framework 


In this section we present our methodology and illustrate it with a case study. For a more detailed presentation and for applications to the Greek and Belgian markets, see [7,10,11,12,32]. A first application of MINORA to portfolio management can also be found in [30].


Portfolio Selection and Multicriteria Analysis, Figure 1
Basic components of the methodological framework (flow chart: portfolio data, i.e. balance sheets, income statements, prices of shares and volume of transactions; evaluation of the criteria, i.e. financial ratios, stock market ratios, risk and return (CAPM, market model) and qualitative criteria; multicriteria analysis (ELECTRE TRI); composition (ADELAIS))


Basic Components 


The basic components of the proposed methodological 
framework are presented in Fig. 1. 


Portfolio Data and Criteria 


For the analysis and the selection of stocks, it is nec- 
essary to possess the following basic information for 
at least three consecutive years: balance sheets, income 
statements and stock market information, i.e. prices 
of shares and volume of transactions. This consecutive 
financial information allows the investor or portfolio 
manager to evaluate the evolution of the stock’s situ- 
ation, and to form financial and stock market ratios rel- 
evant to portfolio management. 

On the basis of the portfolio data one can calculate financial and stock market ratios (return on equity, return on investment, current ratio, interest charges/sales, earnings per share, price earnings ratio, dividend per share, etc.), as well as the risk/return pair given by the theoretical portfolio models, namely the capital asset pricing model (CAPM) and the market model. Apart from the quantitative criteria, the investor or portfolio manager needs more general information so that his evaluation is as objective and complete as possible. Such information about a stock may concern its quality of management, image, security of information, position on the market, etc. This qualitative information is sometimes more important than the financial information because, for example, if the firm does not have good managers, its financial results (sales, net income, return) will not be satisfactory.
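The risk/return pair mentioned above can be estimated from historical returns. The article does not specify the estimation procedure. This minimal sketch assumes the usual ordinary-least-squares fit of the market model R_i = α_i + β_i R_m + ε_i, together with the CAPM relation E[R_i] = r_f + β_i (E[R_m] − r_f). The return series and the risk-free rate are hypothetical.

```python
def market_model(stock_returns, market_returns):
    """OLS estimate of R_i = alpha + beta * R_m + e.
    Returns (alpha, beta, residual variance)."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    beta = cov / var_m
    alpha = mean_s - beta * mean_m
    residuals = [s - alpha - beta * m for s, m in zip(stock_returns, market_returns)]
    resid_var = sum(e ** 2 for e in residuals) / (n - 2)  # specific (diversifiable) risk
    return alpha, beta, resid_var


def capm_expected_return(risk_free, beta, market_expected):
    """CAPM: E[R_i] = r_f + beta * (E[R_m] - r_f)."""
    return risk_free + beta * (market_expected - risk_free)


market = [0.02, -0.01, 0.03, 0.00, 0.01]   # hypothetical market returns
stock = [0.01 + 1.5 * m for m in market]   # constructed so that beta = 1.5
alpha, beta, resid_var = market_model(stock, market)
print(round(alpha, 4), round(beta, 4), capm_expected_return(0.02, beta, 0.08))
```

The estimated β is the "beta risk" criterion used later in the case study, where it is minimized.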

Financial and stock market ratios have become an 
accepted evaluation technique of financial analysts and 
portfolio managers. They provide a quantitative view 
of every element that concerns the internal operation 
of a firm as well as its relations with the outer world, 
and permit fast processing of a large volume of financial data. Financial ratios have already been used in many fields of financial management. C.F. Lee [15] has compiled the financial ratios that have been used in the forecasting of firm failure, bond rating, market return and mergers. The risk and return given by theoretical portfolio models (CAPM and market model) are the basic criteria in portfolio selection. Qualitative criteria are modeled according to the preferences of each user (portfolio manager) with the aid of an ordinal scale (3 better than 2 and 2 better than 1).


Multicriteria Analysis for the Selection 
of Attractive Stocks 


In this step the two MCDM methods MINORA and ELECTRE TRI are used to help the portfolio manager select a set of attractive stocks, taking into consideration his preferences and experience. Here, only a short description of the two methods is given.


MINORA System 


MINORA is an interactive multicriteria decision making system that ranks a set of alternatives from the best to the worst using several criteria. For this purpose the MINORA system uses the UTA ranking algorithm [13]. Starting from the decision maker's ranking of a subset of well-known alternatives, UTA uses ordinal regression to estimate a separable additive utility function of the following form:

$u(g) = u_1(g_1) + \cdots + u_k(g_k)$,

where $g = (g_1, \ldots, g_k)$ is the performance vector of an alternative and $u_i(g_i)$ is the marginal utility function of criterion $i$, normalized between 0 and 1. The ordinal regression is performed using linear programming (for more details see [3]). In MINORA the interaction takes the form of an analysis of inconsistencies between the ranking established by the decision maker and the ranking obtained from the utility function estimated by UTA. Two measures of these inconsistencies are used in MINORA:

1) The F indicator, which is the sum of the deviations of the ordinal regression curve (global utility versus decision maker's ranking), i.e. the sum of estimation errors.

2) Kendall's τ, which gives a measure, from −1 to 1, of the correlation between the decision maker's ranking and the ranking resulting from the utility function.
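The second inconsistency measure can be computed directly from the two rankings. MINORA's own implementation is not described here. As a sketch, the code below assumes the tau-b variant of Kendall's τ, which handles the ties present in the decision maker's pre-ordering. It is applied to the commercial-sector reference set of Table 2 (D.R. and M.R. columns). In that set the two rankings order the stocks identically, so τ = 1.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two rankings of the same items (ties allowed)."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the difference
            dy = (y[i] > y[j]) - (y[i] < y[j])
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))


# Commercial-sector reference set of Table 2, in the order
# CS16, CS18, CS12, CS19, CS3, CS9, CS7, CS8, CS1, CS20.
dm_rank = [1, 2, 3, 3, 5, 5, 7, 7, 9, 10]         # D.R. column
minora_rank = [1, 2, 4, 4, 8, 8, 11, 11, 15, 20]  # M.R. column
print(kendall_tau_b(dm_rank, minora_rank))
```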

At optimality, when the two rankings are identical, the F indicator is equal to 0 and Kendall's τ to 1. The interaction is organized around four questions presented to the decision maker:

1) Is he ready to modify his ranking? 

2) Does he wish to modify the relative importance of 
a criterion, its scale or the marginal utilities (trade- 
off analysis)? 

3) Does he wish to modify the family of criteria used: 
to add, cancel, modify, divide or join some criteria? 

4) Does he wish to modify the whole formulation of the 
problem? 

Each question sends the decision maker back to the corresponding stage of MINORA, and the method stops when an acceptable compromise is reached. The result (a utility function) is then extrapolated to the whole set of alternatives to rank them.
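The extrapolation step amounts to evaluating the additive model on every alternative and sorting by global utility. The minimal sketch below assumes piecewise-linear marginal utilities, as UTA commonly uses, given by hypothetical breakpoints for a maximized return criterion and a minimized beta criterion. The worst values have utility 0 and the utilities of the best values sum to 1. The UTA estimation by linear programming itself is not reproduced.

```python
from bisect import bisect_right


def piecewise_linear(breakpoints):
    """Marginal utility u_i interpolated linearly between (value, utility)
    breakpoints sorted by increasing criterion value."""
    xs = [x for x, _ in breakpoints]
    us = [u for _, u in breakpoints]

    def u(value):
        if value <= xs[0]:
            return us[0]
        if value >= xs[-1]:
            return us[-1]
        k = bisect_right(xs, value)  # xs[k - 1] <= value < xs[k]
        x0, x1, u0, u1 = xs[k - 1], xs[k], us[k - 1], us[k]
        return u0 + (u1 - u0) * (value - x0) / (x1 - x0)

    return u


# Hypothetical marginal utilities: return (maximized) and beta (minimized).
marginals = [
    piecewise_linear([(0.0, 0.0), (1.0, 0.3), (2.2, 0.5)]),  # return
    piecewise_linear([(0.0, 0.5), (0.5, 0.3), (1.1, 0.0)]),  # beta
]


def global_utility(g):
    """Additive model u(g) = u_1(g_1) + ... + u_k(g_k)."""
    return sum(u_i(g_i) for u_i, g_i in zip(marginals, g))


stocks = {"A": (1.8, 0.7), "B": (0.5, 0.2), "C": (2.0, 1.0)}  # (return, beta)
ranking = sorted(stocks, key=lambda s: global_utility(stocks[s]), reverse=True)
print(ranking)
```

Stock C has the highest return but also the highest beta, so under these hypothetical marginal utilities it ends up last.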

The MINORA system presents two main advantages:

• It furnishes a ranking of stocks, which is a natural and frequent preoccupation of portfolio managers.

• The form of the interactivity: the originality of the MINORA system lies in the interactive analysis of inconsistencies. It helps the decision maker to construct his own model in a non-normative way and organizes, in a single procedure, the whole decision-making activity, from the formulation of the model to the final result (a ranking of the alternatives from the best to the worst in the case of MINORA). At the same time the decision maker is constantly integrated into the resolution process and can control its evolution at any moment.

Finally, it should be noted that the MINORA system has been used successfully to solve numerous management problems.


The ELECTRE TRI Outranking MCDM Method 


ELECTRE TRI is an outranking method specially conceived for sorting problems. In our methodology the ELECTRE TRI method is used to sort the stocks into three categories: attractive stocks, uncertain stocks (to be studied further) and nonattractive stocks. ELECTRE TRI deals only with ordered categories (complete order). The categories are defined by some reference alternatives or reference profiles (one lower profile and one upper profile), which are themselves defined by their values on the criteria. We define the categories $C_i$, $i = 1, \ldots, c$, where $C_1$ is the worst category and $C_c$ the best one, and the profiles $r_i$, $i = 1, \ldots, c-1$, where $r_1$ and $r_{c-1}$ are the lowest and the highest profile respectively. The profile $r_i$ is the theoretical limit between the two categories $C_i$ and $C_{i+1}$, and $r_i$ represents a fictitious stock which is strictly better than $r_{i-1}$ on each criterion. In ELECTRE TRI, the information asked from the decision maker about his preferences takes the form, for each criterion and each profile, of a relative weight and of indifference, preference and veto thresholds. To sort the stocks, ELECTRE TRI compares each of them to the profiles, using the indifference, preference and veto thresholds to construct a concordance index, a discordance index and finally a valued outranking relation, as in the ELECTRE III method (cf. [21]). This valued outranking relation $\sigma(a, b) \in [0, 1]$ measures the strength of the relation ‘a outranks b’ (a is at least as good as b). It is transformed into a ‘net’ outranking relation in the following way: $\sigma(a, b) \ge \lambda \iff aSb$, where S represents the net outranking relation, a and b are two stocks and λ is a ‘cut level’ ($0.5 \le \lambda \le 1$) above which the relation ‘a outranks b’ is considered valid. The preference P, the indifference I and the incomparability R are then defined as follows:

• $aIb \iff aSb$ and $bSa$,
• $aPb \iff aSb$ and not $bSa$,
• $aRb \iff$ not $aSb$ and not $bSa$.
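The text names the ingredients of the valued outranking relation but not the formulas. The minimal sketch below assumes the standard ELECTRE III/TRI definitions. Partial concordance falls linearly from 1 to 0 between the indifference and preference thresholds. Discordance rises linearly from 0 to 1 between the preference and veto thresholds. The weighted concordance C is then weakened by every criterion whose discordance exceeds C. It uses one set of thresholds per criterion and hypothetical data, and then derives I, P and R with the cut level λ.

```python
def credibility(a, b, weights, q, p, v, maximize):
    """Valued outranking sigma(a, b) in [0, 1], ELECTRE III style:
    weighted concordance weakened by discordance on criteria where d_j > C."""
    concordance = 0.0
    discordances = []
    for j, w in enumerate(weights):
        sign = 1 if maximize[j] else -1
        diff = sign * (b[j] - a[j])  # how much b beats a on criterion j
        if diff <= q[j]:
            c = 1.0
        elif diff >= p[j]:
            c = 0.0
        else:
            c = (p[j] - diff) / (p[j] - q[j])
        if diff <= p[j]:
            d = 0.0
        elif diff >= v[j]:
            d = 1.0
        else:
            d = (diff - p[j]) / (v[j] - p[j])
        concordance += w * c
        discordances.append(d)
    C = concordance / sum(weights)
    sigma = C
    for d in discordances:
        if d > C:  # implies C < 1, so no division by zero
            sigma *= (1 - d) / (1 - C)
    return sigma


def relation(a, b, lam, **params):
    """Return 'I', 'P' (a preferred to b), 'P-' (b preferred to a) or 'R'."""
    a_s_b = credibility(a, b, **params) >= lam
    b_s_a = credibility(b, a, **params) >= lam
    if a_s_b and b_s_a:
        return "I"
    if a_s_b:
        return "P"
    if b_s_a:
        return "P-"
    return "R"


params = dict(weights=[1, 1], q=[0.1, 0.1], p=[0.5, 0.5], v=[1.0, 1.0],
              maximize=[True, True])
stock, profile = (2.0, 1.0), (1.5, 1.5)  # hypothetical performances
print(relation(stock, profile, 0.75, **params), relation(stock, profile, 0.5, **params))
```

The stock is better than the profile on one criterion and worse on the other, so σ = 0.5 in both directions. It is incomparable with the profile at λ = 0.75 and indifferent to it at λ = 0.5.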


In ELECTRE TRI there are two non-totally compensatory sorting procedures (the pessimistic one and the optimistic one) to assign each alternative to one of the categories defined in advance. In the sorting procedure, stock a is first compared to the worst profile $r_1$; in the case of $aPr_1$, a is compared to the second profile $r_2$, etc., until one of the following situations appears:

i) $aPr_i$ (or $aIr_i$) and $r_{i+1}Pa$;
ii) $aPr_i$ and $r_{i+1}Ra, \ldots, r_{i+m}Ra, r_{i+m+1}Pa$.

In situation i) both procedures assign stock a to category $C_{i+1}$. In situation ii), the pessimistic procedure assigns stock a to category $C_{i+1}$, while the optimistic procedure assigns it to category $C_{i+m+1}$. When the value of λ gradually decreases, the pessimistic procedure becomes less demanding and the optimistic procedure less permissive. Clearly, the optimistic procedure tends to assign the stocks to the highest possible category, in contrast to the pessimistic procedure, which tends to assign them to the lowest possible category. In general, the pessimistic procedure is applied when a policy of prudence is necessary or when the available means are very constraining. The optimistic procedure is applied to problems where the decision maker wishes to favor the alternatives that present some particular interest or some exceptional qualities. In portfolio management the optimistic procedure is well adapted to an optimistic investor with a speculative investment policy, while a prudent investor, following a passive investment policy, will prefer the pessimistic procedure.
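The two assignment rules can be sketched with the usual formulation found in the ELECTRE TRI literature. The pessimistic rule scans the profiles from best to worst and stops at the first profile the stock outranks. The optimistic rule scans from worst to best and stops at the first profile strictly preferred to the stock. This reproduces situations i) and ii) above. The credibility values below are hypothetical and are supplied through a lookup table, and the function names are illustrative.

```python
def pessimistic(a, profiles, sigma, lam):
    """Scan profiles from best to worst; the first r_h with a S r_h assigns
    a to C_{h+1}. Categories are numbered 1 (worst) .. len(profiles) + 1."""
    for h in range(len(profiles), 0, -1):
        if sigma(a, profiles[h - 1]) >= lam:
            return h + 1
    return 1


def optimistic(a, profiles, sigma, lam):
    """Scan profiles from worst to best; the first r_h strictly preferred
    to a (r_h S a and not a S r_h) assigns a to C_h."""
    for h in range(1, len(profiles) + 1):
        r = profiles[h - 1]
        if sigma(r, a) >= lam and sigma(a, r) < lam:
            return h
    return len(profiles) + 1


# Hypothetical credibility values sigma(x, y) for two stocks and the
# profiles r1 < r2 separating three categories C1 < C2 < C3.
SIGMA = {
    ("X", "r1"): 0.9, ("r1", "X"): 0.4, ("X", "r2"): 0.3, ("r2", "X"): 0.6,
    ("Y", "r1"): 0.8, ("r1", "Y"): 0.5, ("Y", "r2"): 0.2, ("r2", "Y"): 0.9,
}


def sigma(x, y):
    return SIGMA[(x, y)]


for stock in ("X", "Y"):
    print(stock, pessimistic(stock, ["r1", "r2"], sigma, 0.75),
          optimistic(stock, ["r1", "r2"], sigma, 0.75))
```

At λ = 0.75, stock Y is in situation i) (Y P r1 and r2 P Y), so both rules give C2. Stock X is in situation ii) (X P r1 but X R r2), so the pessimistic rule gives C2 and the optimistic rule gives C3.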

ELECTRE TRI manages incomparability in such a way that it points out the alternatives that have particularities in their evaluations. When some alternatives are assigned to different categories by the two procedures, the conclusion is that they are incomparable with one or more reference profiles (the more categories separate the two assignments, the more important the ‘particularities’ of the alternatives). This is because these alternatives have good values for some criteria and, simultaneously, bad values for other criteria; such alternatives must therefore be examined with attention. In this way the notion of incomparability included in ELECTRE TRI brings important information to the decision maker, and for this reason the best way to employ ELECTRE TRI is to use the two assignment procedures and to compare the results. The advantages of ELECTRE TRI are the following:

• By sorting the stocks, ELECTRE TRI is well adapted to the purpose of portfolio management (acceptable stocks, stocks to be studied further and unacceptable stocks).

• ELECTRE TRI, like all the methods of the ELECTRE family, accepts intransitivity and incomparability. In ELECTRE TRI this is done in such a way that the method points out the alternatives that have particularities in their evaluation.

• The ELECTRE family uses techniques that are easily understandable by the decision maker.

The methods of the ELECTRE family are very popular and have been used with success in a great number of studies [21].


Multicriteria Analysis for the Determination 
of Attractive Stocks’ Proportions in a Portfolio: 
The ADELAIS System 


MINORA and ELECTRE TRI enable the portfolio manager to select the most attractive stocks he wishes to include in his final portfolio. The ADELAIS system then helps him to determine the proportions (percentages) of capital invested in these attractive stocks. ADELAIS is an interactive computer system developed to support the search for a satisfactory solution in MOLP problems of the general form:

$\max\ \{f_1(x), \ldots, f_n(x)\}$
s.t. $x \in A$,

where $x = (x_1, \ldots, x_m)$ is the vector of decision variables, $f_1(x), \ldots, f_n(x)$ are linear functions of the decision variables and $A \subset \mathbb{R}^m$ is the set of feasible solutions defined by linear constraints. The system provides extensive data management capabilities and, concerning the solution process, a ‘two level’ interaction: interactive assessment of the portfolio manager's utility function and interactive modification of the satisfaction levels. The system operates in two phases: a preliminary phase that is performed once and an iterative phase. In the preliminary phase, ADELAIS provides the upper and lower bounds for each objective function, the pay-off matrix, the starting efficient solution in a minimax sense and its rate of closeness to the ideal values (upper bounds). At each iteration of the iterative phase a new efficient solution is presented to the portfolio manager, together with the upper and lower bounds for each objective, the achievement percentages with respect to the upper bounds and the satisfaction levels (i.e., the revised lower bounds) established in previous iterations. The system then asks the portfolio manager to indicate the objectives he wishes to improve and whether he intends to decrease some other objectives in compensation. The portfolio manager's answers, combined with related answers from previous iterations, form the basis for establishing new satisfaction levels. The new satisfaction levels restrict the feasible set, but the system allows the portfolio manager to relax them if desired, by analyzing the local trade-offs among the objectives. If the portfolio manager does not intend to decrease any objective, the system stops at the current efficient solution (best compromise); otherwise a reference set of decision alternatives (i.e. a set of vectors that might be assumed by the objectives) is constructed and the UTA method is used to estimate an additive utility function as in the MINORA system. Once a satisfactory utility function is assessed, it is maximized over the set A in order to obtain a new efficient solution that takes into account the portfolio manager's preferences.

The advantages of the ADELAIS system for the portfolio manager are the following:

• It determines the proportions of the attractive stocks in the portfolio, which is not possible with ranking or sorting methods.

• The interactive analysis of the inconsistencies, as in the MINORA system, helps the portfolio manager to understand the portfolio selection problem.

• The interactive revision of the satisfaction levels guides, in an easily understandable way, the search for a best compromise that is well adapted to the portfolio manager's preferences.


Example 1 We illustrate our methodology with a case study taken from [10]. The sample considered in the study consists of 40 stocks from the financial and commercial sectors of the Athens Stock Exchange (ASE). These stocks are evaluated using the following seven criteria:

• g1: Return, which must be maximized.
• g2: Marketability (percentage of shares traded), which must be maximized.
• g3: Beta risk, which must be minimized.
• g4: Price earnings ratio (price of the share/earnings per share). Since there are negative values of earnings (losses), we choose to maximize the inverse of this criterion.
• g5: Growth of dividend per share, which must be maximized.
• g6: Acid test ((current assets minus inventories)/current liabilities), which must be maximized.
• g7: Return on equity (i.e. net income/stockholder equity), which must be maximized.
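The accounting criteria above are simple ratios. The sketch below computes them and shows one plausible reason for using the inverse of the PER (the earnings yield, EPS/price). With a loss the PER becomes negative, and a rule that prefers a low PER would then rank a loss-making stock first. The earnings yield keeps it last. The helper names and all figures are hypothetical.

```python
def acid_test(current_assets, inventories, current_liabilities):
    """g6: (current assets - inventories) / current liabilities."""
    return (current_assets - inventories) / current_liabilities


def return_on_equity(net_income, stockholder_equity):
    """g7: net income / stockholder equity."""
    return net_income / stockholder_equity


def earnings_yield(price, earnings_per_share):
    """Inverse of g4 (PER = price / EPS): EPS / price, to be maximized."""
    return earnings_per_share / price


# Hypothetical stocks: (price, EPS). Stock C reports a loss.
stocks = {"A": (20.0, 2.0), "B": (20.0, 1.0), "C": (20.0, -1.0)}
per = {s: p / e for s, (p, e) in stocks.items()}
ey = {s: earnings_yield(p, e) for s, (p, e) in stocks.items()}
# Sorting by lowest PER puts the loss-making stock C first (PER = -20),
# whereas sorting by highest earnings yield keeps it last.
print(sorted(per, key=per.get), sorted(ey, key=ey.get, reverse=True))
```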
All the stocks of the sample have been evaluated on the five stock market criteria g1, ..., g5, while, in order to obtain more significant results, the criterion g6 is used for the stocks of the commercial sector (CS1 to CS20) and the criterion g7 for the stocks of the financial sector (FS1 to FS20).

Table 1 presents the multicriteria evaluation for the stocks of the commercial sector. Qualitative criteria are not included because they depend on the portfolio manager's personal information, which was not available in this application.


Ranking of Stocks Using the MINORA System 


The reference set and its ranking for the commercial sector are the following: CS16; CS18; CS12, CS19; CS9, CS3; CS7, CS8; CS1; CS20 (stocks separated by a comma are tied). For this purpose the portfolio manager can use a graphic help. The MINORA system provides two basic results: the criteria's graphics (i.e. marginal utilities, Fig. 2) and the ordinal regression curve (ranking versus global utility, Fig. 3). Important graphic help is at the disposal of the portfolio manager to interpret these results. In Fig. 2 there are three utility curves for the return criterion (min, middle and max). The middle one corresponds to the additive utility model presented above and also gives the relative weight of the criterion. The two others show the entire range of the possible marginal utility functions and give an idea of the sensitivity of the criteria. Figure 3 shows the ordinal regression curve (stocks' ranking versus global utility).

Table 2 presents the pre-ordering of the portfolio manager (D.R.), the stock names (Actions), the global utility of every stock (G.U.) and the ordering from MINORA (M.R.).

Portfolio Selection and Multicriteria Analysis, Table 1
Multicriteria evaluation of the stocks of the commercial sector (?: value not recoverable)

Stock   g1    g2    g3    g4     g5     g6
CS1     0.82  0.45  0.26  −4.7   −100   0.45
CS2     0.41  0.63  0.03  2.28   −20    2.04
CS3     0.57  0.2   ?     6.08   −33.3  1.08
CS4     0.24  0.02  0.08  2.41   −53.5  0.62
CS5     0     0.46  0.62  5.04   −76.5  3.02
CS6     0.93  0.02  0.14  2.82   6.38   0.72
CS7     0.01  0.69  0.77  7.55   −40    3.23
CS8     0.86  0.86  0.86  4.28   ?      ?
CS9     2.16  0.6   0.12  2.11   56.3   0.51
CS10    1.24  0.12  0.62  11.65  12.5   1.17
CS11    0.8   0.58  0.62  13.7   34.6   1.54
CS12    1.23  0.37  0.64  8.97   45.9   0.96
CS13    0.24  0.28  0.73  −1.75  0      0.72
CS14    0.26  0.65  0.58  4.88   714    0.9
CS15    1.1   0.76  0.54  0.29   0      0.73
CS16    1.79  0.55  0.73  5.88   −100   2.69
CS17    1.02  1.06  0.82  ?      6.38   0.73
CS18    1.32  1.12  0.94  12.06  −61    2.69
CS19    1.36  0.04  1.02  1.79   ?      ?
CS20    0.57  0.17  0.23  −11.5  0      0.52

Portfolio Selection and Multicriteria Analysis, Figure 2
Marginal utility curves of the criterion return


Sorting of Stocks Using the ELECTRE TRI Method 


The objective is to sort the stocks into the following three categories: attractive stocks (C3), uncertain stocks to be studied further (C2) and nonattractive stocks (C1). The reference profiles and the thresholds related to the concordance and discordance indices are fixed by the portfolio manager according to his experience and preferences. Table 3 presents this preferential information. Table 4 presents the sorting results of the ELECTRE TRI method and recapitulates the MINORA results for comparison. The stocks that belong to the best category (C3) in both optimistic and pessimistic sorting are proposed without hesitation to the portfolio manager for selection. The stocks that belong to the worst category (C1) in both optimistic and pessimistic sorting are not proposed to the portfolio manager. When stocks belong to the uncertain category (C2) in both optimistic and pessimistic sorting, they have moderate values on all criteria and, consequently, they must be studied further. When a stock is assigned to different categories by the optimistic and the pessimistic sorting, it is incomparable with one or two reference profiles. This is due to the fact that such stocks have good values for some criteria and, simultaneously, bad values for other criteria. Thus, the notion of incomparability highlights the particularities of these stocks, which must be examined further, and brings important information to the portfolio manager. Comparing the ranking results of the MINORA system with the sorting results of the ELECTRE TRI method, one can remark that there is generally an agreement: the stocks which are well ranked (i.e. at the top of the ranking) by MINORA are in the best category C3 by ELECTRE TRI, and vice versa. This agreement between the two methods supports the interest of the methodology and allows the portfolio manager to be confident in their results.

Portfolio Selection and Multicriteria Analysis, Table 2
Global utility for the sample of 40 Greek stocks (extrapolation phase); (*) Stocks not included in the reference set

Actions  G.U.   D.R.  M.R.  |  Actions  G.U.   D.R.  M.R.
CS16     0.761  1     1     |  FS5      0.679  *     1
CS18     0.745  2     2     |  FS10     0.665  1     2
CS11     0.684  *     3     |  FS20     0.570  2     3
CS12     0.656  3     4     |  FS9      0.570  2     3
CS19     0.656  3     4     |  FS6      0.555  *     5
CS2      0.643  *     6     |  FS2      0.554  *     6
CS17     0.635  *     7     |  FS18     0.549  4     7
CS3      0.632  5     8     |  FS3      0.549  4     7
CS9      0.632  5     8     |  FS17     0.534  *     9
CS10     0.622  *     10    |  FS19     0.532  *     10
CS7      0.596  7     11    |  FS4      0.511  6     11
CS8      0.596  7     11    |  FS15     0.504  *     12
CS15     0.582  *     13    |  FS14     0.499  7     13
CS5      0.559  *     14    |  FS7      0.490  8     14
CS1      0.554  9     15    |  FS8      0.490  8     14
CS14     0.554  *     16    |  FS16     0.484  *     16
CS13     0.495  *     17    |  FS12     0.476  *     17
CS6      0.471  *     18    |  FS1      0.433  *     18
CS4      0.388  *     19    |  FS11     0.395  10    19
CS20     0.362  10    20    |  FS13     0.210  *     20

Portfolio Selection and Multicriteria Analysis, Figure 3
Portfolio manager's ordinal regression curve

Portfolio Selection and Multicriteria Analysis, Table 3
Reference profiles and thresholds (?: value not recoverable)

Parameters         g1    g2    g3   g4   g5    g6
High ref. profile  1     0.35  0.6  6    10    1.1
Low ref. profile   0.5   ?     ?    ?    0     0.7
Indiff. threshold  0.05  0.05  0    0.1  8.72  0.05
Pref. threshold    ?     ?     ?    ?    10    ?
Veto threshold     2     1     1    10   180   2.75

Portfolio Selection and Multicriteria Analysis, Table 4
Sorting of stocks

C1
  Optimistic: CS4, CS13, CS20, FS1, FS7, FS11, FS12, FS13
  Pessimistic: CS1, CS4, CS13, CS19, CS20, FS1, FS7, FS9, FS11, FS12, FS13, FS14
C2
  Optimistic: CS1, CS2, CS3, CS5, CS6, CS8, CS14, CS15, CS17, FS8, FS9, FS14, FS19
  Pessimistic: CS2, CS3, CS5, CS6, CS7, CS8, CS9, CS12, CS14, CS15, CS17, FS3, FS8, FS10, FS16, FS19
C3
  Optimistic: CS7, CS9, CS10, CS11, CS12, CS16, CS18, CS19, FS2, FS3, FS4, FS5, FS6, FS10, FS15, FS16, FS17, FS18, FS20
  Pessimistic: CS10, CS11, CS16, CS18, FS2, FS4, FS5, FS6, FS15, FS17, FS18, FS20


Determination of the Attractive Stocks’ Proportions 
in a Portfolio Using the ADELAIS System 


According to the results obtained above, the portfolio manager can choose a subset of the best stocks from each sector and then, using the ADELAIS system, determine the proportions invested in each selected stock. The chosen set is the following: FS10, FS5, FS20, FS6, FS2 from the financial sector and CS16, CS18, CS12, CS11, CS9 from the commercial sector.

The decision variables, $X_i$, are the percentages of capital invested in each stock. For the needs of the study we set: $X_1$ = FS10, $X_2$ = FS5, $X_3$ = FS20, $X_4$ = FS6, $X_5$ = FS2, $X_6$ = CS16, $X_7$ = CS18, $X_8$ = CS12, $X_9$ = CS11, $X_{10}$ = CS9.

Since the return, the marketability, the beta, the price earnings ratio and the growth of dividend per share of the constructed portfolio are directly related to the percentages of capital invested in each stock, only these five criteria are used in the application of the ADELAIS system. On the other hand, the acid test ratio and the return on equity can only be used as criteria for evaluating the financial soundness of each stock, and they do not characterize the constructed portfolio. Consequently, the objective functions are:

e Maximize the return (g)): max R)X; +--+: + RioX1o. 
e Maximize the marketability (g2): max M;X; +--+ + 

MioX10. 

Minimize the beta (g3): min B} X,+ +++ + ByoXjo. 

e Maximize the price earnings ratio (g4): max 

PER, X 1+ +++ + PERio X10. 

e Maximize the growth of dividends per share (gs): 
max GD,Xj+--- + GDj9Xjo. 

Here, $R_i$, $M_i$, $B_i$, $PER_i$, and $GD_i$ are the values of the corresponding objectives for stock $i$. The constraints are:

• $X_1 + X_2 + \cdots + X_{10} = 1$: all the available capital must be invested.
• $0.05 \le X_1 \le 0.2, \ldots, 0.05 \le X_{10} \le 0.2$: upper and lower limits on the amount invested in each stock (as a fraction of capital).

In the preliminary phase, ADELAIS estimated an initial efficient portfolio of stocks (a compromise) and presented it as shown in Table 5. A pay-off table for the objectives was also given to the portfolio manager.

Satisfied with the values attained by the marketability and the growth of dividends per share in the initial solution, the portfolio manager asked for an improvement of the other objectives. On the basis of this information, the system generated a set of portfolios and asked the portfolio manager to rank them. From these rankings, ADELAIS assessed an additive utility model and used it to determine a new reference solution. After three iterations, ADELAIS determined the best compromise solution, which could not be improved further. Table 6 presents the values of the objectives and the rate of closeness to the ideal point; the corresponding portfolio appears in Fig. 4.

Portfolio Selection and Multicriteria Analysis, Table 5
Initial efficient solution; R is the rate of closeness to the ideal (upper bound)

Initial solution    | g1    | g2    | g3    | g4    | g5
Upper bound         | 1.61  | 0.64  | 0.672 | 13.98 | 49.92
Compromise          | 1.44  | 0.58  | 0.745 | 11.9  | 47.5
Satisfaction level  | 12.05 | 0.34  | 0.933 | 9.08  | -6.33
Lower bound         | 12.05 | 0.34  | 0.933 | 9.08  | -6.33
R                   | 58.2% | 81.2% | 60.5% | 57.5% | 95.7%

Portfolio Selection and Multicriteria Analysis, Table 6
Best compromise solution

Solution            | g1    | g2    | g3    | g4    | g5
Attained values     | 1.47  | 0.52  | 0.73  | 11.9  | 47.22
R                   | 65%   | 61.9% | 65.8% | 57.5% | 95.2%

Portfolio Selection and Multicriteria Analysis, Figure 4
Portion of capital invested in the portfolio
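As a minimal illustration of how the five linear objectives and the two constraint families act on a candidate portfolio, the sketch below evaluates $g_1, \ldots, g_5$ and checks feasibility. The coefficient table `COEFFS` is hypothetical (it is not the data of the study), and the sketch does not reproduce the interactive ADELAIS procedure itself.

```python
import numpy as np

# Hypothetical criterion values for the ten stocks (NOT the data of the study).
# Rows: R_i (return), M_i (marketability), B_i (beta), PER_i, GD_i.
COEFFS = np.array([
    [0.9, 1.1, 1.3, 1.0, 1.2, 1.5, 1.4, 0.8, 1.6, 1.7],
    [0.4, 0.6, 0.5, 0.7, 0.3, 0.6, 0.5, 0.4, 0.7, 0.6],
    [0.8, 0.9, 0.7, 1.1, 0.6, 0.9, 0.8, 0.7, 1.0, 0.9],
    [10, 12, 14, 9, 11, 13, 15, 8, 12, 16],
    [40, 45, 50, 35, 55, 48, 52, 30, 44, 51],
], dtype=float)
SENSES = ("max", "max", "min", "max", "max")  # optimization sense of g1..g5


def objectives(X, coeffs=COEFFS):
    """Return (g1, ..., g5); each objective is linear in the weights X_1..X_10."""
    return coeffs @ np.asarray(X, dtype=float)


def is_feasible(X, lo=0.05, hi=0.2, tol=1e-9):
    """Budget constraint sum(X) = 1 and bounds lo <= X_i <= hi for every stock."""
    X = np.asarray(X, dtype=float)
    budget_ok = abs(X.sum() - 1.0) <= tol
    bounds_ok = np.all((X >= lo - tol) & (X <= hi + tol))
    return bool(budget_ok and bounds_ok)


equal = np.full(10, 0.1)          # equal weights satisfy both constraint families
print(is_feasible(equal), objectives(equal))
```

A multicriteria method such as ADELAIS then works on the vector returned by `objectives` rather than on a single scalar score.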


Concluding Remarks 


This article has reviewed work on MCDM and portfolio management and has presented a methodological framework for portfolio selection. MCDM is a new supportive tool for portfolio selection. The use of multicriteria decision making methods makes it possible to take into account the investor's personal preferences and all the relevant criteria, whatever their origins. In our methodology, the joint use of the MINORA system and the ELECTRE TRI method has shown some advantages, such as the complementarity and the similarity of the results obtained, which strengthen the portfolio manager's confidence in the composition of his portfolio; this joint use also addresses one of the main concerns of portfolio managers, namely the ranking and the sorting of stocks. Secondly, ELECTRE TRI, through the notion of incomparability, provides important information to the portfolio manager, especially when the evaluation of stocks appears difficult. Thirdly, the ADELAIS and MINORA systems provide considerable support to the portfolio manager in building his own portfolio selection model interactively. This model does not rely on the normative considerations of classical portfolio theory. Finally, portfolio management also covers diversification across categories of securities (stocks, bonds, options, international securities, etc.) in order to offset risks. The construction of a portfolio for each of these categories of securities and the diversification problem are also multicriteria problems, so it would be interesting to examine the contribution that MCDM can make to them. The final aim is to integrate these problems in order to formalize and improve the whole portfolio management process.
Consider the linear programming (LP) problem in the standard form:

$$\min\ c^Tx \quad \text{s.t.}\quad Ax = b,\ x \ge 0, \qquad (1)$$

where $A \in \mathbb{R}^{m\times n}$, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$ are given, and $x \in \mathbb{R}^n$ is the unknown vector. The dual of (LP), (DLP), can be written as:

$$\max\ b^Ty \quad \text{s.t.}\quad A^Ty + s = c,\ s \ge 0, \qquad (2)$$

where $y \in \mathbb{R}^m$ is the unknown vector. Let $F$ be the feasible set of $(x, y, s)$, and let the interior of $F$ be denoted by $\operatorname{int} F$, i.e., the set of feasible $(x, y, s)$ such that $x > 0$ and $s > 0$.

Potential reduction algorithms for linear programming are generally equipped with potential functions that are used solely to measure the progress of the iterates. There is no restriction on either path following or step size during the iterative process; the greater the reduction of the potential function, the faster the convergence of the algorithm. These potential functions represent the logarithmic volumes of certain coordinate-aligned ellipsoids containing the optimal solution set. There are three types of potential functions (see the references). For $(x, y, s) \in \operatorname{int} F$, the first is the primal-dual potential function

$$\psi_\rho(x,s) = \rho\log(x^Ts) - \sum_{j=1}^n \log(x_js_j),$$

where $\rho > n$. This function is also called the Tanabe-Todd-Ye potential function. Note that for $\rho = n$

$$\psi_n(x,s) = n\log(x^Ts) - \sum_{j=1}^n \log(x_js_j) \ge n\log n,$$

by the arithmetic-geometric mean inequality. Since $\psi_\rho(x,s) = (\rho - n)\log(x^Ts) + \psi_n(x,s) \ge (\rho - n)\log(x^Ts) + n\log n$, for $\rho > n$, $\psi_\rho(x,s)$ approaching $-\infty$ implies that $x^Ts$ converges to 0.
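A quick numerical check of the two facts just used, assuming nothing beyond the definitions above: $\psi_n(x,s) \ge n\log n$, with equality when all products $x_js_j$ are equal (as on the central path), and the decomposition $\psi_\rho = (\rho - n)\log(x^Ts) + \psi_n$. The vectors are arbitrary illustrative values.

```python
import math


def psi(x, s, rho):
    """Tanabe-Todd-Ye potential rho*log(x^T s) - sum_j log(x_j s_j), for x, s > 0."""
    gap = sum(xj * sj for xj, sj in zip(x, s))
    return rho * math.log(gap) - sum(math.log(xj * sj) for xj, sj in zip(x, s))


x, s = [1.0, 2.0, 3.0], [3.0, 1.0, 0.5]
n = len(x)
gap = sum(xj * sj for xj, sj in zip(x, s))

# psi_n >= n log n (arithmetic-geometric mean inequality).
print(psi(x, s, n) >= n * math.log(n))  # True

# psi_rho = (rho - n) log(x^T s) + psi_n, so psi_rho -> -inf forces x^T s -> 0.
rho = n + math.sqrt(n)
print(math.isclose(psi(x, s, rho), (rho - n) * math.log(gap) + psi(x, s, n)))  # True
```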
Let $z = b^Ty$ and $\bar z = c^Tx$ for some strictly feasible $(y, s)$ and $x$, respectively. Then, consider

$$\phi_p(x, z) = \rho\log(c^Tx - z) - \sum_{j=1}^n \log x_j$$

and

$$\phi_d(y, s, \bar z) = \rho\log(\bar z - b^Ty) - \sum_{j=1}^n \log s_j.$$

$\phi_p$ is called Karmarkar's potential function or the primal potential function, and $\phi_d$ is called the dual potential function. The three functions are closely related: since $c^Tx - b^Ty = x^Ts$ for feasible $x$ and $(y, s)$,

$$\psi_\rho(x,s) = \phi_p(x, z) - \sum_{j=1}^n \log s_j$$

and

$$\psi_\rho(x,s) = \phi_d(y, s, \bar z) - \sum_{j=1}^n \log x_j.$$

Thus, a reduction in $\phi_p(x, z)$ when $(y, s)$ is fixed, or in $\phi_d(y, s, \bar z)$ when $x$ is fixed, implies the same reduction in $\psi_\rho(x, s)$.
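The two identities can be checked numerically. The sketch below uses illustrative random data (not from the text) and builds an LP around a chosen strictly feasible pair by setting $b = Ax$ and $c = A^Ty + s$, which makes $c^Tx - b^Ty = x^Ts$ hold exactly.

```python
import numpy as np

# Random LP built around a known strictly feasible primal-dual pair (illustrative data).
rng = np.random.default_rng(0)
m, n = 2, 5
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 2.0, n)        # x > 0
y = rng.standard_normal(m)
s = rng.uniform(0.5, 2.0, n)        # s > 0
b = A @ x                           # so that Ax = b
c = A.T @ y + s                     # so that A^T y + s = c
rho = n + np.sqrt(n)


def psi(x, s):                      # primal-dual potential
    return rho * np.log(x @ s) - np.sum(np.log(x * s))


def phi_p(x, z):                    # primal (Karmarkar) potential
    return rho * np.log(c @ x - z) - np.sum(np.log(x))


def phi_d(y, s, zbar):              # dual potential
    return rho * np.log(zbar - b @ y) - np.sum(np.log(s))


z, zbar = b @ y, c @ x
print(np.isclose(c @ x - b @ y, x @ s))                        # duality gap equals x^T s
print(np.isclose(psi(x, s), phi_p(x, z) - np.sum(np.log(s))))
print(np.isclose(psi(x, s), phi_d(y, s, zbar) - np.sum(np.log(x))))
```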

Depending on which potential is used as the primary measure of reduction, there are also three types of potential reduction algorithms. Here, we describe the primal and the primal-dual algorithms.


Primal Potential Reduction Algorithm 


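Consider a point $(x^k, y^k, s^k) \in \operatorname{int} F$. Fix $z^k = b^Ty^k$; then the gradient vector of the primal potential function with respect to $x$ at $x^k$ is

$$\nabla\phi_p(x^k, z^k) = \frac{\rho}{c^Tx^k - z^k}\,c - (X^k)^{-1}e,$$

where $X^k = \operatorname{diag}(x^k)$ and $e$ is the vector of all ones. We directly solve the ball-constrained linear problem for the direction $d_x$:

$$\min\ \nabla\phi_p(x^k, z^k)^Td_x \quad \text{s.t.}\quad Ad_x = 0,\ \|(X^k)^{-1}d_x\| \le \alpha.$$

Let the minimizer be $d_x$. Then

$$d_x = -\alpha\frac{X^kp^k}{\|p^k\|},$$

where

$$p^k = p(z^k) = \left(I - X^kA^T\left(A(X^k)^2A^T\right)^{-1}AX^k\right)X^k\nabla\phi_p(x^k, z^k).$$

Update

$$x^{k+1} = x^k + d_x = x^k - \alpha\frac{X^kp^k}{\|p^k\|}, \qquad (3)$$

and we have

$$\phi_p(x^{k+1}, z^k) - \phi_p(x^k, z^k) \le -\alpha\|p^k\| + \frac{\alpha^2}{2(1-\alpha)}.$$

Thus, as long as $\|p^k\| \ge \eta > 0$, we may choose an appropriate $\alpha$ such that

$$\phi_p(x^{k+1}, z^k) - \phi_p(x^k, z^k) \le -\delta$$

for some positive constant $\delta$. By the relation between $\psi_\rho(x, s)$ and $\phi_p(x, z)$, the primal-dual potential function is also reduced. That is,

$$\psi_\rho(x^{k+1}, s^k) - \psi_\rho(x^k, s^k) \le -\delta.$$

However, even if $\|p^k\|$ is small, we will show that the primal-dual potential function can be reduced by a constant $\delta$ by increasing $z^k$ and updating $(y^k, s^k)$.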

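We focus on the expression of $p^k$, which can be rewritten as

$$p^k = \left(I - X^kA^T\left(A(X^k)^2A^T\right)^{-1}AX^k\right)\left(\frac{\rho}{c^Tx^k - z^k}X^kc - e\right) = \frac{\rho}{c^Tx^k - z^k}X^ks(z^k) - e, \qquad (4)$$

(the second equality uses $AX^ke = Ax^k = b$), where

$$s(z^k) = c - A^Ty(z^k) \qquad (5)$$

and

$$y(z^k) = y_2 - \frac{c^Tx^k - z^k}{\rho}\,y_1,$$
$$y_1 = \left(A(X^k)^2A^T\right)^{-1}b, \qquad y_2 = \left(A(X^k)^2A^T\right)^{-1}A(X^k)^2c. \qquad (6)$$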
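Formula (4) can be checked against the projection that defines $p^k$. This sketch computes $p(z)$ both ways on random data; the helper names `p_projection` and `p_closed_form` are hypothetical, and the check relies on $Ax^k = b$, which is how $b$ enters $y_1$.

```python
import numpy as np


def p_projection(A, c, x, z, rho):
    """p(z) = (I - X A^T (A X^2 A^T)^{-1} A X) X grad(phi_p), computed explicitly."""
    X = np.diag(x)
    grad = rho / (c @ x - z) * c - 1.0 / x          # gradient of phi_p at x
    AX = A @ X
    P = np.eye(len(x)) - AX.T @ np.linalg.solve(AX @ AX.T, AX)
    return P @ (X @ grad)


def p_closed_form(A, b, c, x, z, rho):
    """p(z) = rho/(c^T x - z) * X s(z) - e, with s(z) and y(z) from (5)-(6)."""
    AX2 = A * x**2                                  # A X^2 (scales column j by x_j^2)
    M = AX2 @ A.T                                   # A X^2 A^T
    y1 = np.linalg.solve(M, b)
    y2 = np.linalg.solve(M, AX2 @ c)
    gap = c @ x - z
    yz = y2 - gap / rho * y1
    sz = c - A.T @ yz
    return rho / gap * x * sz - 1.0


rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))
x = rng.uniform(0.5, 2.0, 4)
b = A @ x                       # the closed form relies on A x^k = b
c = rng.standard_normal(4)
z = c @ x - 1.0                 # any z < c^T x
rho = 4 + 2.0                   # n + sqrt(n) with n = 4
p1 = p_projection(A, c, x, z, rho)
p2 = p_closed_form(A, b, c, x, z, rho)
print(np.allclose(p1, p2), np.allclose(A @ (x * p1), 0.0))  # X p lies in the null space of A
```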
One can show that, when $\|p(z^k)\|$ is small, $(x^k, y(z^k), s(z^k))$ is in the neighborhood of the central path and $b^Ty(z^k) > z^k$. Thus, we can increase $z^k$ to $b^Ty(z^k)$. Moreover, $\psi_\rho(x^k, s(z^k))$ is smaller than $\psi_\rho(x^k, s^k)$ by a constant. Overall, we have the following potential reduction theorem to evaluate the progress.


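Theorem 1 Given $(x^k, y^k, s^k) \in \operatorname{int} F$, let $\rho = n + \sqrt{n}$ and $z^k = b^Ty^k$, let $x^{k+1}$ be given by (3), and let $y^{k+1} = y(z^k)$ in (6) and $s^{k+1} = s(z^k)$ in (5). Then, either

$$\psi_\rho(x^{k+1}, s^k) \le \psi_\rho(x^k, s^k) - \delta$$

or

$$\psi_\rho(x^k, s^{k+1}) \le \psi_\rho(x^k, s^k) - \delta,$$

where $\delta > 1/20$.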


This theorem establishes an important fact: the primal-dual potential function can be reduced by a constant no matter where $x^k$ and $y^k$ are. In practice, one can perform a line search to minimize the primal-dual potential function. This results in the following primal potential reduction algorithm.


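Given a central path point $(x^0, y^0, s^0) \in \operatorname{int} F$.
Let $z^0 = b^Ty^0$. Set $k := 0$.
WHILE $(s^k)^Tx^k \ge \epsilon$ DO
1 Compute $y_1$ and $y_2$ from (6).
2 IF there exists $z$ such that $s(z) > 0$,
  THEN compute $\bar z = \arg\min_z \psi_\rho(x^k, s(z))$,
  FI
  IF $\psi_\rho(x^k, s(\bar z)) < \psi_\rho(x^k, s^k)$,
  THEN $y^{k+1} = y(\bar z)$, $s^{k+1} = s(\bar z)$ and $z^{k+1} = b^Ty^{k+1}$;
  ELSE $y^{k+1} = y^k$, $s^{k+1} = s^k$ and $z^{k+1} = z^k$.
  FI
3 Let $x^{k+1} = x^k - \bar\alpha X^kp(z^{k+1})$ with $\bar\alpha = \arg\min_{\alpha \ge 0}\psi_\rho(x^k - \alpha X^kp(z^{k+1}), s^{k+1})$.
4 Set $k := k + 1$ and return to Step 1.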


Primal algorithm 
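The following is a minimal sketch of the primal algorithm in Python/NumPy. Two simplifications are assumptions of this sketch, not part of the text: the exact arg-min line search of Step 3 is replaced by a search over a fixed grid of step lengths $\alpha \in (0, 1)$ along the normalized direction $-X^kp/\|p\|$, and Step 2 tries only $z = z^k$ (the choice made in Theorem 1) instead of minimizing over $z$. The small LP is illustrative; its optimal value is 3.

```python
import numpy as np


def psi(x, s, rho):
    """Primal-dual potential psi_rho(x, s)."""
    return rho * np.log(x @ s) - np.sum(np.log(x * s))


def primal_potential_reduction(A, b, c, x, y, s, eps=1e-8, max_iter=5000):
    """Sketch of the primal potential reduction algorithm (grid line search assumed)."""
    x, y, s = (np.asarray(v, dtype=float) for v in (x, y, s))
    n = x.size
    rho = n + np.sqrt(n)
    z = b @ y                                      # z^k = b^T y^k
    alphas = np.linspace(0.02, 0.98, 49)           # assumed grid for the line search
    for _ in range(max_iter):
        if x @ s < eps:
            break
        AX2 = A * x**2                             # A (X^k)^2
        M = AX2 @ A.T                              # A (X^k)^2 A^T
        y1 = np.linalg.solve(M, b)                 # (6)
        y2 = np.linalg.solve(M, AX2 @ c)           # (6)
        # Step 2: dual update with y(z), s(z) from (5)-(6), kept only if psi decreases.
        yz = y2 - (c @ x - z) / rho * y1
        sz = c - A.T @ yz
        if np.all(sz > 0) and psi(x, sz, rho) < psi(x, s, rho):
            y, s, z = yz, sz, b @ yz
        # Step 3: primal step along -X p(z), with p(z) from (4).
        yz = y2 - (c @ x - z) / rho * y1
        p = rho / (c @ x - z) * x * (c - A.T @ yz) - 1.0
        norm_p = np.linalg.norm(p)
        if norm_p == 0.0:
            break
        d = -x * p / norm_p                        # ||X^{-1} d|| = 1, so x + a*d > 0 for a < 1
        vals = [psi(x + a * d, s, rho) for a in alphas]
        i = int(np.argmin(vals))
        if vals[i] < psi(x, s, rho):
            x = x + alphas[i] * d
    return x, y, s


A = np.array([[1.0, 1.0, 1.0, 0.0], [1.0, -1.0, 0.0, 1.0]])
x0, y0, s0 = np.ones(4), np.zeros(2), np.ones(4)
b, c = A @ x0, A.T @ y0 + s0                       # x0 * s0 = e: a central path point
x, y, s = primal_potential_reduction(A, b, c, x0, y0, s0)
print(c @ x, b @ y)                                # both approach the optimal value
```

Because $Ad = 0$ and $s = c - A^Ty$ by construction, primal and dual feasibility are preserved exactly (up to rounding) throughout.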


The performance of the algorithm follows from the following corollary.

Corollary 2 Let $\rho = n + O(\sqrt{n})$. Then, the primal algorithm terminates in at most $O\left(\sqrt{n}\log\left((x^0)^Ts^0/\epsilon\right)\right)$ iterations with

$$c^Tx^k - b^Ty^k = (s^k)^Tx^k < \epsilon.$$


Primal-Dual Potential Reduction Algorithm 


Another technique for solving linear programs is the symmetric primal-dual algorithm. Once we have a pair $(x, y, s) \in \operatorname{int} F$ with $\mu = x^Ts/n$, we can generate a new iterate $x^+$ and $(y^+, s^+)$ by solving for $d_x$, $d_y$ and $d_s$ from the system of linear equations:

$$\begin{aligned} Sd_x + Xd_s &= \gamma\mu e - Xs,\\ Ad_x &= 0,\\ -A^Td_y - d_s &= 0, \end{aligned} \qquad (7)$$

where $X = \operatorname{diag}(x)$ and $S = \operatorname{diag}(s)$. Let $d := (d_x, d_y, d_s)$. To show the dependence of $d$ on the current pair $(x, s)$ and the parameter $\gamma$, we write $d = d(x, s, \gamma)$. Note that $d_x^Td_s = -d_x^TA^Td_y = 0$ here.
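System (7) can be solved by eliminating $d_s = -A^Td_y$ and $d_x$, which leaves the normal equations $AXS^{-1}A^Td_y = -AS^{-1}(\gamma\mu e - Xs)$. The sketch below (the helper `newton_direction` is hypothetical) does exactly that and checks the three equations together with $d_x^Td_s = 0$ on random data.

```python
import numpy as np


def newton_direction(A, x, s, gamma):
    """Solve (7) for (d_x, d_y, d_s) through the normal equations."""
    n = x.size
    mu = x @ s / n
    r = gamma * mu - x * s                          # right-hand side gamma*mu*e - Xs
    D = x / s                                       # diagonal of X S^{-1}
    dy = -np.linalg.solve((A * D) @ A.T, A @ (r / s))
    ds = -A.T @ dy                                  # third equation of (7)
    dx = (r - x * ds) / s                           # first equation of (7)
    return dx, dy, ds


rng = np.random.default_rng(2)
A = rng.standard_normal((2, 5))
x, s = rng.uniform(0.5, 2.0, 5), rng.uniform(0.5, 2.0, 5)
gamma = 0.5
mu = x @ s / 5
dx, dy, ds = newton_direction(A, x, s, gamma)
print(np.allclose(A @ dx, 0.0), abs(dx @ ds) < 1e-10)
```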

The system (7) is the Newton step starting from $(x, s)$ toward the point on the central path with duality gap $\gamma n\mu$. If $\gamma = 0$, it steps toward the optimal solution characterized by the optimality conditions of (LP) and (DLP); if $\gamma = 1$, it steps toward the central path point with parameter $\mu$; if $0 < \gamma < 1$, it steps toward a central path point with a smaller complementarity gap. In the algorithm presented here, we choose $\gamma = n/\rho < 1$. Each iterate reduces the primal-dual potential function by at least a constant $\delta$, as does the previous potential reduction algorithm.

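Let the direction $d = (d_x, d_y, d_s)$ be generated by equation (7) with $\gamma = n/\rho$, and let

$$\theta = \frac{\alpha\sqrt{\min(Xs)}}{\left\|(XS)^{-1/2}\left(\frac{x^Ts}{\rho}e - Xs\right)\right\|}, \qquad (8)$$

where $\alpha$ is a positive constant less than 1. Let

$$x^+ = x + \theta d_x, \quad y^+ = y + \theta d_y, \quad s^+ = s + \theta d_s.$$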
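Then, we have $(x^+, y^+, s^+) \in \operatorname{int} F$ and

$$\psi_\rho(x^+, s^+) - \psi_\rho(x, s) \le -\alpha\sqrt{\min(Xs)}\left\|(XS)^{-1/2}\left(e - \frac{\rho}{x^Ts}Xs\right)\right\| + \frac{\alpha^2}{2(1-\alpha)}.$$

Let $v \in \mathbb{R}^n$ be a positive vector, $V = \operatorname{diag}(v)$, and $\rho \ge n + \sqrt{n}$. Then, we can prove

$$\sqrt{\min(v)}\left\|V^{-1/2}\left(e - \frac{\rho}{e^Tv}v\right)\right\| \ge \sqrt{3/4}.$$

Combining these with $v = Xs$, we have

$$\psi_\rho(x^+, s^+) - \psi_\rho(x, s) \le -\alpha\sqrt{3/4} + \frac{\alpha^2}{2(1-\alpha)} = -\delta,$$

which is a positive constant $\delta$ for a suitable choice of $\alpha$ (for example, $\alpha = 0.4$ gives $\delta > 0.2$). This result provides a competitive theoretical iteration bound, but a faster algorithm may again be implemented by conducting a line search along the direction $d$ to achieve the greatest reduction in the primal-dual potential function. This leads to the following algorithm.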


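Given $(x^0, y^0, s^0) \in \operatorname{int} F$.
Set $\rho = n + \sqrt{n}$ and $k := 0$.
WHILE $(s^k)^Tx^k \ge \epsilon$ DO
1 Set $(x, s) = (x^k, s^k)$ and $\gamma = n/\rho$; compute $(d_x, d_y, d_s)$ from (7).
2 Let $x^{k+1} = x^k + \bar\alpha d_x$, $y^{k+1} = y^k + \bar\alpha d_y$, and $s^{k+1} = s^k + \bar\alpha d_s$, where $\bar\alpha = \arg\min_{\alpha \ge 0}\psi_\rho(x^k + \alpha d_x, s^k + \alpha d_s)$.
3 Set $k := k + 1$ and return to Step 1.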


Primal-dual algorithm 
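A minimal sketch of the primal-dual algorithm, under assumptions similar to the previous sketch: the direction comes from (7) with $\gamma = n/\rho$ via the normal equations, and the arg min over $\alpha$ is approximated by a grid of fractions of the largest step that keeps $x$ and $s$ positive (the grid is an assumption, not part of the text). The LP is the same illustrative example, with optimal value 3.

```python
import numpy as np


def psi(x, s, rho):
    """Primal-dual potential psi_rho(x, s)."""
    return rho * np.log(x @ s) - np.sum(np.log(x * s))


def primal_dual_potential_reduction(A, b, c, x, y, s, eps=1e-8, max_iter=500):
    """Sketch: Newton directions from (7) with gamma = n/rho, grid line search on psi."""
    x, y, s = (np.asarray(v, dtype=float) for v in (x, y, s))
    n = x.size
    rho = n + np.sqrt(n)
    fractions = np.linspace(0.01, 0.99, 99)         # assumed grid of step fractions
    for _ in range(max_iter):
        if x @ s < eps:
            break
        # Step 1: solve (7); gamma*mu = (n/rho)(x^T s/n) = x^T s / rho.
        r = (x @ s) / rho - x * s
        dy = -np.linalg.solve((A * (x / s)) @ A.T, A @ (r / s))
        ds = -A.T @ dy
        dx = (r - x * ds) / s
        # Largest step keeping x and s strictly positive.
        dxs, xs = np.concatenate([dx, ds]), np.concatenate([x, s])
        neg = dxs < 0
        a_max = np.min(-xs[neg] / dxs[neg]) if neg.any() else 1.0
        # Step 2: approximate arg min of psi along d on the grid.
        steps = fractions * a_max
        vals = [psi(x + a * dx, s + a * ds, rho) for a in steps]
        a = steps[int(np.argmin(vals))]
        x, y, s = x + a * dx, y + a * dy, s + a * ds
    return x, y, s


A = np.array([[1.0, 1.0, 1.0, 0.0], [1.0, -1.0, 0.0, 1.0]])
x0, y0, s0 = np.ones(4), np.zeros(2), np.ones(4)
b, c = A @ x0, A.T @ y0 + s0
x, y, s = primal_dual_potential_reduction(A, b, c, x0, y0, s0)
print(c @ x, b @ y, x @ s)
```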


Theorem 3 Let $\rho = n + O(\sqrt{n})$. Then, the primal-dual algorithm terminates in at most $O\left(\sqrt{n}\log\left((x^0)^Ts^0/\epsilon\right)\right)$ iterations with

$$c^Tx^k - b^Ty^k = (s^k)^Tx^k < \epsilon.$$


Potential reduction methods have subsequently been extended to solving nonlinear conic programs; see, e.g., [13].
The Powell method in its basic form can be viewed as a gradient-free minimization algorithm. It requires repeated line search minimizations, which may be carried out using univariate gradient-free or gradient-based procedures. It was introduced by M.J.D. Powell [1]. The procedure is described in the algorithm steps below. The minimization problem considered is:

$$\min f(x).$$


1. Initialization

Select an accuracy $\epsilon > 0$ and a starting point $x^{(0)}$. Set the initial search directions $s^{(i)}$ to be the unit vectors along each coordinate axis, for $i = 1, \ldots, n$. Set the main iteration counter to $k = 0$ and the cycle counter to $i = 1$. Initialize $z^{(1)} = x^{(0)}$. Set the counter $j = 0$ for the case where step 2.2a is used.

2. Directional univariate minimization

2.1 Determine a univariate minimizer $\lambda_i^*$ for the problem $\min_\lambda f(z^{(i)} + \lambda s^{(i)})$.
Set $z^{(i+1)} = z^{(i)} + \lambda_i^*s^{(i)}$, and increment $i \leftarrow i + 1$.
Repeat step 2.1 until $i = n + 1$.
Check for termination (i.e., use the criterion in step 3).
Go to step 2.2a if the original version is desired, or go to step 2.2b if the variant avoiding linearly dependent directions is chosen.


2.2a New direction selection (pattern search directions)
(Original version of the method)
Set $j \leftarrow j + 1$.
IF $j < n$
THEN replace $s^{(j)}$ by $z^{(n+1)} - x^{(k)}$,
ELSE reset the search direction set to the coordinate directions.
END IF
Set $x^{(k+1)} = z^{(n+1)}$.
Initialize $z^{(1)} = x^{(k+1)}$, increment the counter $k \leftarrow k + 1$; set $i = 1$ and go to step 2.1.
2.2b New direction selection
(Modified variant to avoid linearly dependent directions)
At the present $k$th main iteration, calculate $\Delta = \max_{i=1,\ldots,n}\{f(z^{(i)}) - f(z^{(i+1)})\}$ and the index $m$ for which this maximum occurs. The corresponding direction is designated as $s^{(m)}$.
Calculate $f_1 = f(x^{(k)})$, $f_2 = f(z^{(n+1)})$ and $f_3 = f(2z^{(n+1)} - x^{(k)})$.
IF either $f_3 \ge f_1$ or $(f_1 - 2f_2 + f_3)(f_1 - f_2 - \Delta)^2 \ge 0.5\,\Delta(f_1 - f_3)^2$
THEN
use the old set of directions $s^{(1)}, \ldots, s^{(n)}$;
set $x^{(k+1)} = z^{(n+1)}$ or $x^{(k+1)} = 2z^{(n+1)} - x^{(k)}$, whichever yields the lower value of $f(x)$;
initialize $z^{(1)} = x^{(k+1)}$,
increment the counter $k \leftarrow k + 1$;
set $i = 1$ and go to step 2.1.
ELSE define the search direction (pattern) $s^{(n+1)} = z^{(n+1)} - x^{(k)}$;
find the value $\lambda^*$ minimizing $f(z^{(n+1)} + \lambda s^{(n+1)})$;
define the new set of directions $s^{(1)}, \ldots, s^{(m-1)}, s^{(m+1)}, \ldots, s^{(n+1)}$;
set $x^{(k+1)} = z^{(n+1)} + \lambda^*s^{(n+1)}$;
initialize $z^{(1)} = x^{(k+1)}$,
increment the counter $k \leftarrow k + 1$;
set $i = 1$ and go to step 2.1.
END IF


3. Termination check

A satisfactory termination criterion is generally to stop whenever, at any stage of the algorithm, the change in the variables is less than the required accuracy, that is, when $\|x^{(k+1)} - x^{(k)}\| < \epsilon$.

In terms of the termination criterion, Powell [1] gives a more elaborate termination check procedure, defined by the following steps:

• Specify a set of accuracies $\epsilon_1, \ldots, \epsilon_n$, one for each variable, each of which is greater than zero.
• Apply the standard Powell's method until a complete cycle of $n$ directional minimizations causes a change of less than one tenth of the desired accuracy in each variable individually. Call the resulting point $y^{(A)}$.
• Increase every variable by 10 times the corresponding specified accuracy.
• Apply the standard Powell's method from this point until a complete cycle of $n$ directional minimizations causes a change of less than one tenth of the desired accuracy in each variable individually. Call the resulting point $y^{(B)}$.
• Define a search direction $s = y^{(A)} - y^{(B)}$. Calculate $\lambda^*$ minimizing $f(y^{(A)} + \lambda s)$. Set $y^{(C)} = y^{(A)} + \lambda^*s$.
• Assume that the process has converged if the components (individually) of the vectors $(y^{(C)} - y^{(A)})$ and $(y^{(C)} - y^{(B)})$ are less than one tenth of the set accuracies. If this does not hold, proceed to the next step.
• Replace the search direction $s^{(1)}$ by $(y^{(C)} - y^{(A)})$ and restart the procedure from step 2 onwards (present termination control procedure).

The termination procedure proposed above by Powell is expected to be more reliable, but it is more expensive, since the entire minimization problem has to be solved at least twice until the tight convergence criteria are satisfied.

The method has a quadratic termination property: it minimizes a quadratic function in a predetermined number of operations, requiring $n^2 + O(n)$ line searches. The directions generated by the method can also be shown to be conjugate (e.g., [2]).

To remedy the case where the directions gradually tend to become linearly dependent (in the original version, step 2.2a), a modification, originally proposed by Powell, is given in step 2.2b.
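A compact sketch of the method using the step 2.2b variant and the simple termination test of step 3 (not Powell's more elaborate procedure). The univariate minimizations use SciPy's `minimize_scalar` (Brent's method by default); that choice is an assumption of this sketch, not part of the description above.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def line_min(f, z, d):
    """Univariate minimizer lambda* of f(z + lambda d) (Brent's method, assumed here)."""
    return minimize_scalar(lambda lam: f(z + lam * d)).x


def powell(f, x0, eps=1e-8, max_iter=200):
    """Powell's method with the step 2.2b direction-replacement test."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    dirs = list(np.eye(n))                         # step 1: coordinate directions
    for _ in range(max_iter):
        z, drops = x.copy(), []                    # z^(1) = x^(k)
        for d in dirs:                             # step 2.1: n univariate minimizations
            f_before = f(z)
            z = z + line_min(f, z, d) * d
            drops.append(f_before - f(z))
        m = int(np.argmax(drops))                  # index of the largest decrease Delta
        delta = drops[m]
        f1, f2, f3 = f(x), f(z), f(2 * z - x)
        if f3 >= f1 or (f1 - 2 * f2 + f3) * (f1 - f2 - delta) ** 2 >= 0.5 * delta * (f1 - f3) ** 2:
            x_new = z if f2 <= f3 else 2 * z - x   # keep the old set of directions
        else:
            d_new = z - x                          # pattern direction s^(n+1)
            x_new = z + line_min(f, z, d_new) * d_new
            dirs = dirs[:m] + dirs[m + 1:] + [d_new]   # drop s^(m), append s^(n+1)
        if np.linalg.norm(x_new - x) < eps:        # step 3: termination check
            return x_new
        x = x_new
    return x


def quad(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2 + x[0] * x[1]


print(powell(quad, [0.0, 0.0]))                    # minimizer is (16/7, -18/7)
```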


See also 


> Cyclic Coordinate Method 
> Rosenbrock Method 
> Sequential Simplex Method 
Introduction

We are concerned with Nonlinear Programming problems defined in the following way:

$$\begin{aligned} &\text{Minimize } f(x)\\ &\text{subject to } h(x) = 0,\quad g(x) \le 0,\quad x \in \Omega, \end{aligned} \qquad (1)$$

where $h: \mathbb{R}^n \to \mathbb{R}^m$, $g: \mathbb{R}^n \to \mathbb{R}^p$, $f: \mathbb{R}^n \to \mathbb{R}$ are continuous and $\Omega \subset \mathbb{R}^n$ is a closed set. From now on, $\|\cdot\|$ represents the Euclidean norm and $v_+$ means $\max\{0, v\}$. The set $\mathbb{R}_+$ will be the set of nonnegative real numbers.


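The Powell-Hestenes-Rockafellar (PHR) Augmented Lagrangian [42,54,56] is given by:

$$L_\rho(x, \lambda, \mu) = f(x) + \frac{\rho}{2}\left\{\sum_{i=1}^m\left[h_i(x) + \frac{\lambda_i}{\rho}\right]^2 + \sum_{i=1}^p\left[\left(g_i(x) + \frac{\mu_i}{\rho}\right)_+\right]^2\right\} \qquad (2)$$

for all $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}^m$, $\mu \in \mathbb{R}^p_+$.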

PHR-based Augmented Lagrangian methods for solving (1) are based on the iterative (approximate) minimization of $L_\rho$ with respect to $x \in \Omega$, followed by the updating of the penalty parameter $\rho$ and the Lagrange multiplier approximations $\lambda$ and $\mu$. The most popular practical Augmented Lagrangian method gave rise to the LANCELOT package [24,25,26]. LANCELOT does not use inequality constraints $g(x) \le 0$ in its problem formulation. When an inequality constraint $g_i(x) \le 0$ appears in a practical problem, it is replaced by $g_i(x) + s_i = 0$, $s_i \ge 0$. The convergence of the LANCELOT algorithm to KKT points was proved in [24] using regularity assumptions. Under weaker assumptions that involve the Constant Positive Linear Dependence (CPLD) constraint qualification [4,55], KKT-convergence was proved in [2] for a variation of the LANCELOT method. In the original LANCELOT method, $\Omega$ was a box. A generalization where $\Omega$ is a polytope may be found in [23].

The motivation of (2) comes from the classical External Penalty method [27,34,36]. In this method one minimizes the function given by

$$\Phi_\rho(x) = f(x) + \frac{\rho}{2}\left\{\sum_{i=1}^m h_i(x)^2 + \sum_{i=1}^p g_i(x)_+^2\right\} \qquad (3)$$

for successive values of $\rho$ that tend to infinity. If, after minimizing (3) for a given $\rho$, a satisfactory feasibility is not achieved, the External Penalty philosophy leads to increasing the value of $\rho$. If $\rho$ is very large, the problem of minimizing $\Phi_\rho$ may become very difficult for ordinary minimization solvers.

The Augmented Lagrangian philosophy is different. Assume that the result of minimizing $\Phi_\rho$ for a given $\rho$ is not satisfactory in terms of feasibility. Then, instead of increasing $\rho$ (or, perhaps, besides increasing $\rho$), one modifies the origin with respect to which infeasibility is penalized. For example, suppose that, after the minimization of $\Phi_\rho$, we obtain $x$ such that $h_i(x) = c \ne 0$. A common-sense conjecture would be that this “disappointing” result was obtained because we punished the deviation of $h_i(x)$ with respect to 0, whereas the correct strategy would be to punish the deviation with respect to $-c$. This leads to the Shifted Penalty idea, in which, instead of $\Phi_\rho$, one uses the Shifted Penalty function:

$$P(x, c, d) = f(x) + \frac{\rho}{2}\left\{\sum_{i=1}^m [h_i(x) + c_i]^2 + \sum_{i=1}^p [(g_i(x) + d_i)_+]^2\right\}. \qquad (4)$$

Writing $c_i = \lambda_i/\rho$ and $d_i = \mu_i/\rho$, we observe that the Shifted Penalty strategy coincides with the Augmented Lagrangian one. The naive modification of the shifts $c_i$, $d_i$ sketched above gives rise to the best known (first-order) updating formula for the Lagrange multipliers in the Augmented Lagrangian method. It is interesting to observe that this intuitive reasoning is independent of the smoothness of $f$, $h$ and $g$. In this article we give preference to matrix-free updating procedures, which excludes the consideration of higher order estimates [28,35].
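The equivalence between (4) and (2), and the way the naive shift correction reproduces the first-order multiplier update $\lambda \leftarrow \lambda + \rho h(x)$, $\mu \leftarrow \max\{0, \mu + \rho g(x)\}$, can be seen numerically. The problem data and helper names below are hypothetical.

```python
import numpy as np


def phr(f, h, g, x, lam, mu, rho):
    """PHR Augmented Lagrangian L_rho(x, lambda, mu) of (2)."""
    return f(x) + 0.5 * rho * (np.sum((h(x) + lam / rho) ** 2)
                               + np.sum(np.maximum(0.0, g(x) + mu / rho) ** 2))


def shifted_penalty(f, h, g, x, c, d, rho):
    """Shifted Penalty function P(x, c, d) of (4); c = d = 0 gives Phi_rho of (3)."""
    return f(x) + 0.5 * rho * (np.sum((h(x) + c) ** 2)
                               + np.sum(np.maximum(0.0, g(x) + d) ** 2))


def shift_update(h, g, x, c, d):
    """Naive correction: move each origin by the observed constraint value."""
    return c + h(x), np.maximum(0.0, d + g(x))


def f(x):
    return x[0] ** 2 + x[1] ** 2


def h(x):                                    # one equality constraint
    return np.array([x[0] + x[1] - 1.0])


def g(x):                                    # two inequality constraints
    return np.array([x[0] - 2.0, -x[1]])


x = np.array([0.3, 0.4])
lam, mu, rho = np.array([0.5]), np.array([1.0, 0.0]), 10.0
same = np.isclose(phr(f, h, g, x, lam, mu, rho),
                  shifted_penalty(f, h, g, x, lam / rho, mu / rho, rho))
c_new, d_new = shift_update(h, g, x, lam / rho, mu / rho)
# rho*c_new equals lam + rho*h(x); rho*d_new equals max(0, mu + rho*g(x)).
print(same, rho * c_new, rho * d_new)
```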

In [3] a new PHR-like algorithm was introduced that does not use slack variables to handle inequality constraints and admits general constraints in the lower-level set $\Omega$. In the box-constraint case, subproblems are solved using a technique introduced in [13], which improves the GENCAN algorithm [12]. CPLD-based convergence and penalty-parameter boundedness were also proved in [3] under suitable conditions on the problem.

In addition to its intrinsic adaptability to the case in which arbitrary constraints are included in $\Omega$, the following positive characteristics of the Augmented Lagrangian approach for solving (1) must be mentioned:

1. Augmented Lagrangian methods proceed by sequential resolution of simple problems. Progress in the analysis and implementation of simple-problem optimization procedures produces an almost immediate positive effect on the effectiveness of Augmented Lagrangian algorithms. Box-constrained optimization is a dynamic area of practical optimization from which we can expect Augmented Lagrangian improvements.

2. Global minimization of the subproblems implies that the Augmented Lagrangian method converges to global minimizers [11]. There is a large field for research on global optimization methods for box-constrained optimization. When the global box-constrained optimization problem is satisfactorily solved in practice, the effect on the associated Augmented Lagrangian method for the Nonlinear Programming problem is immediate.

3. Most box-constrained optimization methods are guaranteed to find stationary points. In practice, good methods do more than that. The line-search procedures of [12], for example, include extrapolation steps that are not necessary at all from the point of view of KKT convergence. However, they enhance the probability of convergence to global minimizers. As a consequence, the probability that a practical Augmented Lagrangian method converges to Nonlinear Programming global minimizers is enhanced too.

4. The global convergence theory of Augmented Lagrangian methods [11] does not need differentiability of the functions that define the Nonlinear Programming problem. In practice, this indicates that the Augmented Lagrangian approach may be successful in situations where smoothness is dubious.

5. The Augmented Lagrangian approach can be adapted to the situation in which analytic derivatives are not computed. See [47] for a derivative-free version of LANCELOT.

6. In many practical problems the Hessian of the Lagrangian is structurally dense (in the sense that any entry may be different from zero at different points) but generally sparse (given a specific point in the domain, the particular Lagrangian Hessian is a sparse matrix). As an example of this situation, consider the following formulation [18,19] of the problem of fitting circles of radius $r$ within a circle of radius $R$ without overlapping:

$$\text{Min } \sum_{i<j}\max\{0, 4r^2 - \|p_i - p_j\|_2^2\}^2 \quad \text{subject to } \|p_i\|_2^2 \le (R - r)^2 \text{ for all } i.$$

The Hessian of the objective function is structurally dense but sparse at any point such that the points $p_i$ are “well distributed” within the big circle, since only pairs of overlapping small circles appear in the expression of the objective function. Newtonian methods usually have difficulties with this situation, both in terms of memory and computer time, since the sparsity pattern of the matrix changes from iteration to iteration. This difficulty is almost irrelevant for the Augmented Lagrangian approach if one uses a low-memory box-constraint solver.

7. Independently of the Lagrangian Hessian density, the structure of the KKT system may be very poor for sparse factorizations. This is a serious difficulty for Newton-based methods but not for suitable implementations of the Augmented Lagrangian PHR algorithm.

8. If the Nonlinear Programming problem has many inequality constraints, the usual slack-variable approach of Interior-Point methods (also used in [2,24]) may be inconvenient. There are several approaches to reduce the effect of the presence of many slacks, but they may not be as effective as not using slacks at all.

9. Huge problems have obvious drawbacks in terms of storage requirements. The Augmented Lagrangian approach provides a radical remedy: problem data may be computed “on the fly”, used when required in the subproblem, and not stored at all. This is not possible if one uses matrix-based approaches, independently of the sparsity strategy adopted.

10. If, at the solution of the problem, some strong constraint qualification fails to hold, the performance of Newton-like algorithms could be severely affected. The Augmented Lagrangian is not as sensitive to this type of difficulty.
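The sparsity argument in item 6 can be made concrete: only overlapping pairs contribute terms (and hence nonzero Hessian blocks) to the objective. The sketch below, with hypothetical helper names and illustrative centers, evaluates the objective, lists the contributing pairs, and checks the containment constraints.

```python
import numpy as np


def packing_objective(P, r):
    """sum_{i<j} max{0, 4r^2 - ||p_i - p_j||^2}^2 for centers P (one row per circle)."""
    total, active = 0.0, []
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            viol = 4 * r**2 - np.sum((P[i] - P[j]) ** 2)
            if viol > 0:                    # only overlapping pairs contribute
                total += viol**2
                active.append((i, j))
    return total, active


def container_ok(P, r, R):
    """Check ||p_i||^2 <= (R - r)^2 for every center."""
    return bool(np.all(np.sum(P**2, axis=1) <= (R - r) ** 2))


P = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
value, pairs = packing_objective(P, r=1.0)
print(value, pairs)                         # only the pair (0, 1) overlaps
```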

The amount of research dedicated to Augmented Lagrangian methods decreased in the 21st century. Modern methods, based on interior-point techniques, sequential quadratic programming, trust regions, restoration, nonmonotone strategies and advanced sparse linear algebra procedures, attracted much more attention. A theoretical reason, and its practical consequence, seems to be behind this switch of interest. Roughly speaking, under suitable assumptions, Interior-Point Newtonian techniques converge quadratically (or, at least, superlinearly), whereas practical Augmented Lagrangian algorithms generally converge only linearly. Therefore, if both methods converge to the same point and the required precision is strict enough, an Interior-Point Newtonian method will require less computer time than an Augmented Lagrangian method, independently of the work per iteration. Several attempts have been made to alleviate both the slow convergence and the ill-conditioning of the subproblems [14,21,32,33,39,49]. Behind these attempts is the fact that the optimality conditions of the Augmented Lagrangian (and Penalty) subproblems may be decomposed in such a way that, for large $\rho$, they resemble the KKT conditions of the original problem. This fact may be exploited in several ways and makes it possible for good implementations of the Augmented Lagrangian method to be quite competitive with Interior-Point Newtonian techniques, even when high precision is the main requirement at the solution.

The general form of the Augmented Lagrangian 
method based on the PHR formula considered in this 
article is the following. 


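Algorithm 1

Let $\lambda_{\min} < \lambda_{\max}$, $\mu_{\max} > 0$, $\gamma > 1$, $0 < \tau < 1$. Let $\{\varepsilon_k\}$ be a sequence of nonnegative numbers such that $\lim_{k\to\infty}\varepsilon_k = 0$. Let $\lambda_i^1 \in [\lambda_{\min}, \lambda_{\max}]$, $i = 1, \ldots, m$, $\mu_i^1 \in [0, \mu_{\max}]$, $i = 1, \ldots, p$, and $\rho_1 > 0$. Initialize $k \leftarrow 1$.

Step 1. Solving the subproblem.
Compute $x^k \in \mathbb{R}^n$, an approximate solution of

$$\text{Minimize } L_{\rho_k}(x, \lambda^k, \mu^k) \text{ subject to } x \in \Omega. \qquad (5)$$

Step 2. Define

$$V_i^k = \max\left\{g_i(x^k), -\frac{\mu_i^k}{\rho_k}\right\}, \quad i = 1, \ldots, p.$$

If $k = 1$ or

$$\max\{\|h(x^k)\|_\infty, \|V^k\|_\infty\} \le \tau\max\{\|h(x^{k-1})\|_\infty, \|V^{k-1}\|_\infty\}, \qquad (6)$$

define $\rho_{k+1} = \rho_k$. Otherwise, define $\rho_{k+1} = \gamma\rho_k$.

Step 3. Compute $\lambda_i^{k+1} \in [\lambda_{\min}, \lambda_{\max}]$, $i = 1, \ldots, m$, and $\mu_i^{k+1} \in [0, \mu_{\max}]$, $i = 1, \ldots, p$. Set $k \leftarrow k + 1$ and go to Step 1.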
In the practical implementation of Algorithm 1, we will compute $\lambda_i^{k+1} = \min\{\max\{\lambda_{\min}, \lambda_i^k + \rho_kh_i(x^k)\}, \lambda_{\max}\}$ and $\mu_i^{k+1} = \min\{\max\{0, \mu_i^k + \rho_kg_i(x^k)\}, \mu_{\max}\}$. These are safeguarded first-order estimates of the Lagrange multipliers. The safeguards defined by $\lambda_{\min}$, $\lambda_{\max}$ and $\mu_{\max}$ are necessary to prove the global convergence results. Some authors prefer to define Augmented Lagrangian algorithms without safeguards for the Lagrange multipliers [9,26]. However, boundedness of the multiplier estimates is necessary to prove the main convergence results and, if this boundedness is not algorithmically forced, it may be guaranteed only by means of strong problem assumptions.

Different Augmented Lagrangian algorithms will 
differ only in Step 1. In each case we will need a pre- 
cise definition for the approximate solution of (5). 
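A minimal sketch of Algorithm 1 with $\Omega = \mathbb{R}^n$ and the safeguarded first-order updates given above. Assumptions of this sketch: the subproblem (5) is solved with SciPy's BFGS (`scipy.optimize.minimize`) at a fixed tight gradient tolerance instead of a sequence $\varepsilon_k \to 0$, the outer loop stops on a feasibility tolerance, and the test problem (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, $x_1 \ge 0.8$) is illustrative.

```python
import numpy as np
from scipy.optimize import minimize


def augmented_lagrangian(f, grad_f, h, jac_h, g, jac_g, x0, rho=10.0, gamma=10.0,
                         tau=0.5, lam_min=-1e6, lam_max=1e6, mu_max=1e6,
                         feas_tol=1e-8, max_outer=50):
    """Sketch of Algorithm 1 with Omega = R^n (BFGS as the subproblem solver, assumed)."""
    x = np.asarray(x0, dtype=float)
    lam, mu = np.zeros(len(h(x))), np.zeros(len(g(x)))
    prev = None
    for _ in range(max_outer):
        def L(x):                                              # L_rho of (2)
            return f(x) + 0.5 * rho * (np.sum((h(x) + lam / rho) ** 2)
                                       + np.sum(np.maximum(0.0, g(x) + mu / rho) ** 2))

        def grad_L(x):
            return (grad_f(x) + jac_h(x).T @ (lam + rho * h(x))
                    + jac_g(x).T @ np.maximum(0.0, mu + rho * g(x)))

        x = minimize(L, x, jac=grad_L, method="BFGS", options={"gtol": 1e-10}).x  # Step 1
        V = np.maximum(g(x), -mu / rho)                        # Step 2
        infeas = max(np.max(np.abs(h(x)), initial=0.0), np.max(np.abs(V), initial=0.0))
        lam = np.clip(lam + rho * h(x), lam_min, lam_max)      # Step 3 (uses rho_k)
        mu = np.clip(mu + rho * g(x), 0.0, mu_max)
        if infeas < feas_tol:
            break
        if prev is not None and infeas > tau * prev:           # test (6) failed
            rho *= gamma
        prev = infeas
    return x, lam, mu


def f(x): return x[0] ** 2 + x[1] ** 2
def grad_f(x): return 2.0 * x
def h(x): return np.array([x[0] + x[1] - 1.0])
def jac_h(x): return np.array([[1.0, 1.0]])
def g(x): return np.array([0.8 - x[0]])                        # x_1 >= 0.8
def jac_g(x): return np.array([[-1.0, 0.0]])


x, lam, mu = augmented_lagrangian(f, grad_f, h, jac_h, g, jac_g, [0.0, 0.0])
print(x, lam, mu)   # KKT point (0.8, 0.2) with multipliers -0.4 and 1.2
```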


Cases 
Augmented Lagrangians and Global Optimization 


In this section we will only assume continuity of the functions $f$, $h$ and $g$. Throughout the section we will assume that a global minimizer of (1) exists. Several versions of the Augmented Lagrangian method generate sequences that converge to global minimizers, provided that global minimizers of the subproblems are available. This property is inherited from the analogous property of the External Penalty method. A practical consequence of this property is that Augmented Lagrangian methods tend to find feasible points with lower objective function values than other nonlinear programming solvers when clever, aggressive algorithms are used for solving the subproblems.

Here we will assume that we are able to find an approximate global minimizer defined by the tolerance $\varepsilon_k$. At each iteration, $x^k$ will belong to $\Omega \cap P_k$, where $P_k$ is an auxiliary set to which a global minimizer of (1) necessarily belongs. For example, $P_k$ may be a set that contains the feasible region of (1). The presence of the constraints defined by $P_k$ helps in the global resolution of the subproblems. Obviously, in the absence of algorithmic advantages, $P_k$ may be defined as being $\mathbb{R}^n$. Algorithm 2 will be Algorithm 1, where Step 1 is defined as follows.


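Step 1. Let $P_k \subset \mathbb{R}^n$ be a closed set such that a global minimizer $z$ (the same for all $k$) belongs to $P_k$. Find an $\varepsilon_k$-global minimizer $x^k$ of the problem Min $L_{\rho_k}(x, \lambda^k, \mu^k)$ subject to $x \in \Omega \cap P_k$. That is, $x^k \in \Omega \cap P_k$ is such that:

$$L_{\rho_k}(x^k, \lambda^k, \mu^k) \le L_{\rho_k}(x, \lambda^k, \mu^k) + \varepsilon_k \qquad (7)$$

for all $x \in \Omega \cap P_k$. The $\varepsilon_k$-global minimum can be obtained using a deterministic global optimization approach, such as the $\alpha$BB method [37].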


In most deterministic global optimization methods for solving (7), the point $x^{k-1}$ is not used as an “initial approximation”, as most local optimization solvers do. In fact, the concept of “initial point” has no meaning at all in this case. The information used by the outer iteration $k$ is the set of approximate Lagrange multipliers computed after iteration $k - 1$, and nothing else.

Theorem 2 [11] is the main convergence result related to Algorithm 2. Limit points of sequences generated by this algorithm are feasible global minimizers.

Theorem 2 Assume that the sequence $\{x^k\}$ is well defined and admits a limit point $x^*$. Then, $x^*$ is a global minimizer of (1). If, instead of $\varepsilon_k \to 0$, we assume only that $\varepsilon_k \to \varepsilon \ge 0$, then $x^*$ will be feasible and $f(x^*) \le f(x) + \varepsilon$ for all feasible $x$.


The problem of finding $x^k \in \Omega \cap P_k$ satisfying (7) consists of finding an $\varepsilon_k$-global solution of the problem:

$$\text{Minimize } L_{\rho_k}(x, \lambda^k, \mu^k) \text{ subject to } x \in \Omega \cap P_k. \qquad (8)$$

When $\Omega$ and $P_k$ are defined by linear equality and/or inequality constraints and $f$, $h$, $g$ admit continuous second derivatives, this problem has been solved in [11] using the $\alpha$BB algorithm [37].

The practical results presented in [11] corroborate the theory and give hints on the effectiveness of the Augmented Lagrangian method for global optimization.
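A toy illustration of Algorithm 2 in one variable, under loudly stated assumptions: $P_k = [\text{lo}, \text{hi}]$ for every $k$ (assumed to contain a global minimizer), and an exhaustive grid search plays the role of the $\varepsilon_k$-global solver, with $\varepsilon_k$ set implicitly by the grid spacing. The text uses $\alpha$BB, not a grid. The example minimizes the double-well function $f(x) = (x^2 - 1)^2 + 0.3x$ subject to $x \ge -0.5$; the unconstrained global minimizer near $x \approx -1.04$ is infeasible, and the constrained global minimizer lies near $x \approx 0.96$.

```python
import numpy as np


def global_al_1d(f, g, lo, hi, rho=10.0, gamma=10.0, tau=0.5, mu_max=1e6,
                 n_grid=200001, max_outer=30, feas_tol=1e-10):
    """Sketch of Algorithm 2 for one variable with constraints g(x) <= 0 (grid search assumed)."""
    grid = np.linspace(lo, hi, n_grid)
    fg, gg = f(grid), np.atleast_2d(g(grid))       # gg has shape (p, n_grid)
    mu, prev = np.zeros(gg.shape[0]), None
    for _ in range(max_outer):
        L = fg + 0.5 * rho * np.sum(np.maximum(0.0, gg + (mu / rho)[:, None]) ** 2, axis=0)
        k = int(np.argmin(L))                      # global minimizer of L over the grid
        x, gx = grid[k], gg[:, k]
        infeas = np.max(np.abs(np.maximum(gx, -mu / rho)))   # ||V^k||_inf
        mu = np.clip(mu + rho * gx, 0.0, mu_max)   # safeguarded first-order update
        if infeas < feas_tol:
            break
        if prev is not None and infeas > tau * prev:
            rho *= gamma
        prev = infeas
    return x, mu


def f(x):
    return (x**2 - 1.0) ** 2 + 0.3 * x


def g(x):
    return np.array([-x - 0.5])                    # x >= -0.5


x, mu = global_al_1d(f, g, -3.0, 3.0)
print(x, mu)
```

Note that the first subproblem, with $\mu = 0$ and a moderate penalty, is globally minimized at an infeasible point near $x \approx -0.68$; the shifted penalty then moves the iterate to the feasible well.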


Augmented Lagrangian Algorithm 
with Arbitrary Lower-Level Constraints 


In [2,3] safeguarded Augmented Lagrangian methods were defined that converge to KKT points under the CPLD constraint qualification and exhibit good properties in terms of penalty parameter boundedness. ALGENCAN, which is publicly available on the TANGO Project web page http://www.ime.usp.br/~egbirgin/tango/, is the application of the main algorithm in [3] to problem (1).
In this section we will assume that $f$, $h$, $g$ admit continuous first (and, sometimes, second) derivatives. Observe that the function $L_\rho$ defined in (2) has continuous first derivatives with respect to $x$, but second derivatives are not defined at the points such that $g_i(x) + \mu_i/\rho = 0$.

In Algorithm 3 we will assume that the set $\Omega$ is defined by

$$\Omega = \{x \in \mathbb{R}^n \mid \underline{h}(x) = 0,\ \underline{g}(x) \le 0\}, \qquad (9)$$

where $\underline{h}: \mathbb{R}^n \to \mathbb{R}^{\underline{m}}$ and $\underline{g}: \mathbb{R}^n \to \mathbb{R}^{\underline{p}}$ are as smooth as necessary. The constraints defined by $\Omega$ are called lower-level constraints. Algorithm 3 is identical to Algorithm 1 except at Step 1. The subproblem resolution in Algorithm 3 is as given below.


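Step 1. Compute (if possible) $x^k \in \mathbb{R}^n$ such that there exist $v^k \in \mathbb{R}^{\underline{m}}$, $u^k \in \mathbb{R}_+^{\underline{p}}$ satisfying

$$\left\|\nabla L_{\rho_k}(x^k, \lambda^k, \mu^k) + \sum_{i=1}^{\underline{m}} v_i^k\nabla\underline{h}_i(x^k) + \sum_{i=1}^{\underline{p}} u_i^k\nabla\underline{g}_i(x^k)\right\| \le \varepsilon_k, \qquad (10)$$

$$u_i^k \ge 0, \quad \underline{g}_i(x^k) \le \varepsilon_k \quad \text{for all } i = 1, \ldots, \underline{p}, \qquad (11)$$

$$\underline{g}_i(x^k) < -\varepsilon_k \Rightarrow u_i^k = 0 \quad \text{for all } i = 1, \ldots, \underline{p}, \qquad (12)$$

$$\|\underline{h}(x^k)\| \le \varepsilon_k. \qquad (13)$$

The conditions (10)-(13) are approximate KKT conditions for the minimization of $L_{\rho_k}$ subject to the lower-level constraints. If $\Omega = \mathbb{R}^n$, these conditions reduce to $\|\nabla L_{\rho_k}(x^k, \lambda^k, \mu^k)\| \le \varepsilon_k$.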
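Conditions (10)-(13) are straightforward to test once candidate multipliers $v^k$, $u^k$ are available. The function below (hypothetical name and argument layout) checks them for a given gradient of $L_{\rho_k}$, lower-level constraint values and Jacobians; the example uses a single lower-level bound constraint on the first coordinate and no lower-level equalities.

```python
import numpy as np


def approx_kkt_ok(grad_L, hl, Jh, gl, Jg, v, u, eps):
    """Check (10)-(13) at x^k.

    grad_L : gradient of L_{rho_k}(., lambda^k, mu^k) at x^k
    hl, Jh : lower-level equality values and Jacobian (shape m_ x n)
    gl, Jg : lower-level inequality values and Jacobian (shape p_ x n)
    v, u   : candidate multipliers for the lower-level constraints
    """
    stat = np.linalg.norm(grad_L + Jh.T @ v + Jg.T @ u) <= eps      # (10)
    sign = bool(np.all(u >= 0) and np.all(gl <= eps))                 # (11)
    compl = bool(np.all((gl >= -eps) | (u == 0)))                     # (12)
    feas = np.linalg.norm(hl) <= eps                                  # (13)
    return bool(stat and sign and compl and feas)


no_h, no_Jh, no_v = np.zeros(0), np.zeros((0, 2)), np.zeros(0)      # no lower-level equalities
grad_L = np.array([2.0, 0.0])
Jg = np.array([[-1.0, 0.0]])                                          # constraint -x_1 <= 0
ok_active = approx_kkt_ok(grad_L, no_h, no_Jh, np.array([0.0]), Jg, no_v, np.array([2.0]), 1e-8)
ok_inactive = approx_kkt_ok(grad_L, no_h, no_Jh, np.array([-1.0]), Jg, no_v, np.array([2.0]), 1e-8)
print(ok_active, ok_inactive)   # the second case violates (12)
```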

The CPLD (Constant Positive Linear Dependence) condition defined by Qi and Wei [55] is a crucial tool in the convergence theory of Algorithm 3. In [4] it has been proved that CPLD is a constraint qualification, and its relation with other constraint qualifications has been reported.

A First-Order Constraint Qualification is a property
of feasible points of a Nonlinear Programming
problem such that, when it holds at a local minimizer,
the local minimizer is a KKT point. The
Linear-Independence Constraint Qualification (LICQ),
also called regularity, says that the gradients of the
active constraints at the feasible point $x$ are linearly
independent.

Assume that the feasible set of a nonlinear programming
problem is given by $h(x) = 0$, $g(x) \le 0$, where
$h: \mathbb{R}^n \to \mathbb{R}^m$ and $g: \mathbb{R}^n \to \mathbb{R}^p$. Let $I(x) \subset \{1, \dots, p\}$
be the set of indices of the active inequality constraints
at the feasible point $x$. Let $I_1 \subset \{1, \dots, m\}$, $I_2 \subset I(x)$.
The subset of gradients of active constraints that
correspond to the indices $I_1 \cup I_2$ is said to be positively
linearly dependent if there exist multipliers $\lambda$, $\mu$ such that

$$\sum_{i \in I_1} \lambda_i \nabla h_i(x) + \sum_{i \in I_2} \mu_i \nabla g_i(x) = 0, \qquad (14)$$

with $\mu_i \ge 0$ for all $i \in I_2$ and $\sum_{i \in I_1} |\lambda_i| + \sum_{i \in I_2} \mu_i > 0$.
Otherwise, we say that these gradients are positively
linearly independent.

The Mangasarian-Fromovitz Constraint Qualification
(MFCQ) says that, at the feasible point $x$, the gradients
of the active constraints are positively linearly
independent [48,57].

The CPLD Constraint Qualification says that, if
a subset of gradients of active constraints is positively
linearly dependent at the feasible point $x$ (i.e., (14)
holds), then there exists $\delta > 0$ such that the vectors

$$\{\nabla h_i(y)\}_{i \in I_1}, \quad \{\nabla g_j(y)\}_{j \in I_2}$$

are linearly dependent for all $y \in \mathbb{R}^n$ such that
$\|y - x\| \le \delta$.

The main convergence theorems related to Algo- 
rithm 3 were proved in [3]. Theorem 3 says that, if 
a limit point satisfies the CPLD condition with respect 
to the lower-level constraints, then this point is station- 
ary with respect to a natural infeasibility measure. In 
other words, this theorem says that, if the limit point 
is not feasible, then (very likely) it is a local minimizer 
of the upper-level infeasibility, subject to lower-level 
feasibility. 


Theorem 3 Let $\{x^k\}$ be a sequence generated by
Algorithm 3. Let $x^*$ be a limit point of $\{x^k\}$. Then, if the
sequence of penalty parameters $\{\rho_k\}$ is bounded, the limit
point $x^*$ is feasible. Otherwise, at least one of the
following possibilities holds:

(i) $x^*$ is a KKT point of the problem

$$\text{Minimize } \sum_{i=1}^{m} h_i(x)^2 + \sum_{i=1}^{p} \max\{0, g_i(x)\}^2 \quad \text{subject to } x \in \Omega. \qquad (15)$$

(ii) $x^*$ does not satisfy the CPLD constraint qualification
associated with $\Omega$.


From the point of view of optimality, we are interested
in the status of feasible limit points. Theorem 4 says
that, under the CPLD constraint qualification, feasible
limit points are stationary (KKT) points of the original
problem. Since CPLD is strictly weaker than the
Mangasarian-Fromovitz (MF) constraint qualification,
Theorem 4 is stronger than results where KKT
conditions are proved under MF or regularity
assumptions.


Theorem 4 Let $\{x^k\}$ be a sequence generated by
Algorithm 3. Assume that $x^*$ is a feasible limit point of (1)-(9)
that satisfies the CPLD constraint qualification related
to the set of all the constraints. Then $x^*$ is a KKT point
of problem (1)-(9).


Theorems 3 and 4 are interesting and useful, but they
do not explain why it is better to use the Augmented
Lagrangian instead of a pure penalty method. In fact,
if we define $\lambda^k = 0$, $\mu^k = 0$ for all $k$, these two
theorems remain valid and we are in the presence of a
variation of the External Penalty method. The use of
Lagrange multiplier estimates is justified by Theorem 5,
which is also proved in [3]. Theorem 5 says that, under
appropriate conditions, the sequence of penalty
parameters $\{\rho_k\}$ does not tend to infinity. In practice,
this means that the minimization subproblems tend to
remain well-conditioned and that the algorithms used to
solve them will not face difficulties associated with
very large values of $\rho_k$.


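Theorem 5 Assume that:

1. The sequence $\{x^k\}$ is generated by the application
of Algorithm 3 to problem (1)-(9) and
$\lim_{k \to \infty} x^k = x^*$.

2. In Algorithm 3 we use the updating rules
$\lambda_i^{k+1} = \max\{\lambda_{\min}, \min\{\lambda_i^k + \rho_k h_i(x^k), \lambda_{\max}\}\}$ and
$\mu_i^{k+1} = \max\{0, \min\{\mu_i^k + \rho_k g_i(x^k), \mu_{\max}\}\}$.

3. The point $x^*$ is feasible ($h(x^*) = 0$, $\underline{h}(x^*) = 0$,
$g(x^*) \le 0$ and $\underline{g}(x^*) \le 0$).

4. The gradients of the active constraints at $x^*$ are
linearly independent. The associated (unique) Lagrange
multipliers are $\lambda^*$, $\mu^*$, $v^*$, $u^*$.

5. The functions $f$, $h$, $g$, $\underline{h}$ and $\underline{g}$ admit continuous
second derivatives in a neighborhood of $x^*$.

6. Define the tangent subspace $T$ as the set of
all $z \in \mathbb{R}^n$ such that $\nabla h(x^*)^T z = \nabla \underline{h}(x^*)^T z = 0$,
$\nabla g_i(x^*)^T z = 0$ for all $i$ such that $g_i(x^*) = 0$, and
$\nabla \underline{g}_i(x^*)^T z = 0$ for all $i$ such that $\underline{g}_i(x^*) = 0$. Then,
for all $z \in T$, $z \ne 0$,

$$z^T \Big[ \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) + \sum_{i=1}^{p} \mu_i^* \nabla^2 g_i(x^*) + \sum_{i=1}^{\underline{m}} v_i^* \nabla^2 \underline{h}_i(x^*) + \sum_{i=1}^{\underline{p}} u_i^* \nabla^2 \underline{g}_i(x^*) \Big] z > 0.$$

7. For all $i = 1, \dots, m$ and $j = 1, \dots, p$: $\lambda_i^* \in (\lambda_{\min}, \lambda_{\max})$,
$\mu_j^* \in [0, \mu_{\max})$.

8. For all $i$ such that $g_i(x^*) = 0$, we have $\mu_i^* > 0$. (Strict
complementarity in the upper level.)

9. There exists a sequence $\eta_k \to 0$ such that
$\varepsilon_k \le \eta_k \max\{\|h(x^k)\|, \|V^k\|\}$ for all $k = 0, 1, 2, \dots$

Then, the sequence of penalty parameters $\{\rho_k\}$ is
bounded.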
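Hypothesis 2 of Theorem 5 describes safeguarded first-order multiplier updates. A minimal Python sketch of those two formulas, assuming the constraint values $h_i(x^k)$ and $g_i(x^k)$ are already available (the function name is hypothetical):

```python
def update_multipliers(lam, mu, h_vals, g_vals, rho, lam_min, lam_max, mu_max):
    """Safeguarded updates of hypothesis 2 of Theorem 5 (hypothetical helper)."""
    # lambda_i^{k+1} = max{lam_min, min{lambda_i^k + rho_k h_i(x^k), lam_max}}
    new_lam = [max(lam_min, min(l + rho * h, lam_max)) for l, h in zip(lam, h_vals)]
    # mu_i^{k+1} = max{0, min{mu_i^k + rho_k g_i(x^k), mu_max}}
    new_mu = [max(0.0, min(m + rho * g, mu_max)) for m, g in zip(mu, g_vals)]
    return new_lam, new_mu


print(update_multipliers([0.0], [1.0], [0.5], [-2.0], 10.0, -1e20, 1e20, 1e20))
```

The clipping keeps the estimates inside the intervals that appear in hypothesis 7, and an inequality multiplier drops to zero when its constraint is sufficiently inactive.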


Observe that strict complementarity is imposed only on
the constraints in the upper-level set. In the lower-level
set it is admissible that $\underline{g}_i(x^*) = u_i^* = 0$. Observe, too,
that the assumption on the reduced positive definiteness
of the Hessian of the Lagrangian is weaker than
the usual second-order sufficiency assumption [36],
since the subspace $T$ is orthogonal to the gradients of
all active constraints, and no exception is made with
respect to active constraints with null multiplier $u_i^*$. In
fact, this is not a second-order sufficiency assumption
for local minimizers. It holds for the problem of
minimizing $x_1 x_2$ subject to $x_2 - x_1 \le 0$ at $(0, 0)$, although
$(0, 0)$ is not a local minimizer of this problem.
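The counterexample can be checked numerically. The sketch below uses only the facts stated above: at $(0,0)$ the gradient of $f(x) = x_1 x_2$ vanishes, so the multiplier of $x_2 - x_1 \le 0$ is zero and the Lagrangian Hessian is $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$; the tangent subspace is spanned by $(1, 1)$. The function names are illustrative.

```python
def f(x1, x2):
    return x1 * x2


def feasible(x1, x2):
    return x2 - x1 <= 0.0


def lagrangian_hessian_form(z1, z2):
    # z^T [[0, 1], [1, 0]] z; the multiplier is 0 at (0, 0), so only f contributes.
    return 2.0 * z1 * z2


# T = {z : (-1, 1) . z = 0} = {(t, t)}: the quadratic form is 2 t^2 > 0 there.
for t in (1.0, -0.5, 3.0):
    assert lagrangian_hessian_form(t, t) > 0.0

# Yet the feasible points (t, -t) approach the origin with f < f(0, 0) = 0.
for t in (1e-1, 1e-2, 1e-3):
    assert feasible(t, -t) and f(t, -t) < f(0.0, 0.0)
```

Positive definiteness on $T$ therefore does not prevent descent along feasible directions outside $T$, such as $(1, -1)$.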

The last hypothesis of Theorem 5 requires that
the precision with which subproblems are solved
tend to zero faster than the measure of
infeasibility-noncomplementarity. Some authors [30,31,40,41], in
slightly different contexts, also used convergence
tolerances that depend on the degree of infeasibility of the
current inner iterate.

The Augmented Lagrangian method is the only
efficient nonlinear programming algorithm that can take
obvious advantage of the existence of case-oriented
optimization solvers for problems whose constraints are
a subset of the original problem constraints. The
partition of the constraints into “easy” and “complicated” is
very common in engineering applications. In the
Augmented Lagrangian framework, easy constraints go to
the lower level and complicated constraints contribute
to the Augmented Lagrangian function. The most
common situation is the one in which lower-level
constraints are linear. Location problems of this type are
described in [3]. Problems with more than $3 \times 10^6$
variables and $14 \times 10^6$ constraints are solved in this way,
using moderate computer time. The codes are free for
download through the TANGO Project web page
http://www.ime.usp.br/~egbirgin/tango/. The key to the
efficiency of the Augmented Lagrangian method in these
problems is the use of the Spectral Projected Gradient
method [15,16,17] for solving the subproblems.


Augmented Lagrangian Algorithm 
with Lower-Level Box Constraints 


In most applications, the definition of the lower-level set
$\Omega$ in (1) is

$$\Omega = \{x \in \mathbb{R}^n \mid a \le x \le b\}, \qquad (16)$$

where $a, b \in \mathbb{R}^n$, $a < b$. In other words, $\Omega$ is an
$n$-dimensional box. By the continuity of the Augmented
Lagrangian function and the compactness of $\Omega$, this
definition guarantees that a global minimizer of the
subproblem exists. One often adds bound constraints
to the lower level of a nonlinear programming
problem in order to guarantee solvability of the
subproblems and boundedness of the sequence $\{x^k\}$.

Obviously, the constraints (16) may be written in
the form (9) and, so, Algorithm 3 may be applied and
Theorems 3, 4 and 5 hold. However, many specific
algorithms for box-constrained optimization exist that
use stronger convergence criteria than the one given
in (10)-(13). Namely, in box-constrained minimization
one usually declares convergence when

$$x^k \in \Omega \quad \text{and} \quad \big\| P_\Omega\big(x^k - \nabla L_{\rho_k}(x^k, \lambda^k, \mu^k)\big) - x^k \big\|_\infty \le \varepsilon_k, \qquad (17)$$

where $P_\Omega$ denotes the Euclidean projection on $\Omega$. The
condition (17) implies (10)-(13). This leads us to
define Algorithm 4 as Algorithm 1 with Step 1 defined
by (17). Theorems 3, 4 and 5 obviously apply to
Algorithm 4. It must be observed, however, that, since all
the points of $\Omega$ satisfy CPLD, Theorem 3 guarantees
that the limit points of the generated sequence $\{x^k\}$ are
KKT points of $\sum_{i=1}^{m} h_i(x)^2 + \sum_{i=1}^{p} [g_i(x)_+]^2$ subject
to $x \in \Omega$.
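Criterion (17) is cheap to evaluate because the Euclidean projection onto a box is componentwise clipping. A minimal sketch, assuming the gradient $\nabla L_{\rho_k}(x^k, \lambda^k, \mu^k)$ has already been computed (the helper names are hypothetical, and GENCAN's own implementation may differ):

```python
def project_box(x, a, b):
    """Euclidean projection onto {a <= x <= b}: componentwise clipping."""
    return [min(max(xi, ai), bi) for xi, ai, bi in zip(x, a, b)]


def box_criterion_ok(x, grad, a, b, eps):
    """Condition (17): x in the box and ||P(x - grad) - x||_inf <= eps."""
    if any(xi < ai or xi > bi for xi, ai, bi in zip(x, a, b)):
        return False
    p = project_box([xi - gi for xi, gi in zip(x, grad)], a, b)
    return max(abs(pi - xi) for pi, xi in zip(p, x)) <= eps


# At the lower bound with a positive gradient the projected step is zero.
print(box_criterion_ok([0.0], [3.0], [0.0], [1.0], 1e-4))
```

A nonzero gradient component is tolerated only when it points out of the box at an active bound, which is exactly the first-order condition for box-constrained minimization.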

Algorithm 4, with the subproblems solved by the
box-constraint solver GENCAN [12] with the
modifications introduced in [13], is called ALGENCAN. The
code that implements ALGENCAN is free for download
from the TANGO Project web page
http://www.ime.usp.br/~egbirgin/tango/. It is written in Fortran 77 (double
precision), and interfaces with AMPL, CUTEr, C/C++,
Python and R (language and environment for statistical
computing) are available.

The default version of ALGENCAN uses $\tau = 0.5$,
$\gamma = 10$, $\lambda_{\min} = -10^{20}$, $\mu_{\max} = \lambda_{\max} = 10^{20}$, $\varepsilon_k = 10^{-4}$
for all $k$, $\lambda^1 = 0$, $\mu^1 = 0$ and

$$\rho_1 = \max\left\{10^{-6}, \min\left\{10, \frac{2|f(x^0)|}{\|h(x^0)\|^2 + \|g(x^0)_+\|^2}\right\}\right\}.$$

The default convergence criterion is $\max\{\|h(x^k)\|_\infty, \|V^k\|_\infty\} \le 10^{-4}$.
The condition $\|V^k\|_\infty \le 10^{-4}$ guarantees that,
for all $i = 1, \dots, p$, $g_i(x^k) \le 10^{-4}$ and that
$(\mu_i^k + \rho_k g_i(x^k))_+ = 0$ whenever $g_i(x^k) < -10^{-4}$.
This means that, approximately, feasibility and
complementarity hold at the final point. Dual feasibility with
tolerance $10^{-4}$ is guaranteed by (17) and the choice
of $\varepsilon_k$.
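The default initial penalty parameter and the stopping test can be sketched as follows. Two assumptions are made: $V^k$ is taken to be $V_i^k = \max\{g_i(x^k), -\mu_i^k/\rho_k\}$, the usual definition in Algorithm 1 (which is not reproduced in this section), and a feasible $x^0$, for which the formula for $\rho_1$ divides by zero, is handled here by returning the upper cap 10. The function names are hypothetical.

```python
def initial_penalty(f0, h0, g0):
    """rho_1 = max{1e-6, min{10, 2|f(x0)| / (||h(x0)||^2 + ||g(x0)_+||^2)}}."""
    infeas = sum(h * h for h in h0) + sum(max(0.0, g) ** 2 for g in g0)
    if infeas == 0.0:
        # Assumption: a feasible x0 makes the ratio infinite, so the cap 10 applies.
        return 10.0
    return max(1e-6, min(10.0, 2.0 * abs(f0) / infeas))


def converged(h_vals, g_vals, mu, rho, tol=1e-4):
    """max{||h||_inf, ||V||_inf} <= tol with V_i = max{g_i, -mu_i/rho} (assumed definition)."""
    v = [max(g, -m / rho) for g, m in zip(g_vals, mu)]
    return max([abs(h) for h in h_vals] + [abs(x) for x in v] + [0.0]) <= tol


print(initial_penalty(1.0, [1.0], [0.0]), converged([0.0], [-1.0], [0.0], 10.0))
```

In the second call the constraint is clearly inactive and its multiplier is zero, so $V_1 = 0$ and the test passes; a positive multiplier on the same inactive constraint would make $|V_1|$ large and block convergence, which is how the criterion enforces approximate complementarity.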

The celebrated package LANCELOT also solves the
basic Nonlinear Programming problem with box
constraints, but each inequality constraint is completed
with a slack variable so that it becomes an equality
constraint plus a lower-level bound.

A comparison between the default versions of
ALGENCAN and LANCELOT B using all the (1023)
problems of the CUTEr collection was reported in [3]. The
executions were stopped when the precision $10^{-4}$ was
achieved or when the allowed CPU time (5 min on
a 1.8 GHz AMD Opteron 244 processor with 2 GB of RAM
running Linux) was exhausted.
Codes are in Fortran 77 and the compiler option “-O”
was adopted.

Given a fixed problem, for each method
$M \in \{\text{LANCELOT}, \text{ALGENCAN}\}$, $x^M_{\text{final}}$ was defined in [3] as the final
point obtained by $M$ when solving the given problem.
Practical Augmented Lagrangian Methods, Table 1
Efficiency means the number of times that method M obtained
the best $r^M$. Robustness means the number of times in which
$r^M < \infty$

ALGENCAN    LANCELOT B


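The point $x^M_{\text{final}}$ is considered to be feasible if

$$\max\left\{\|h(x^M_{\text{final}})\|_\infty, \|g(x^M_{\text{final}})_+\|_\infty\right\} \le 10^{-4}.$$

Define

$$f_{\text{best}} = \min\left\{f(x^M_{\text{final}}) \mid x^M_{\text{final}} \text{ is feasible}\right\}.$$

It is said that the method $M$ found a solution of the
problem if $x^M_{\text{final}}$ is feasible and

$$f(x^M_{\text{final}}) \le f_{\text{best}} + 10^{-3} |f_{\text{best}}| + 10^{-6}$$

or

$$\max\left\{f_{\text{best}}, f(x^M_{\text{final}})\right\} \le -10^{20}.$$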


Finally, let $t^M$ be the computer CPU time that method
$M$ used to arrive at $x^M_{\text{final}}$. Define

$$r^M = \begin{cases} t^M, & \text{if method } M \text{ found a solution,} \\ \infty, & \text{otherwise.} \end{cases}$$

The quantity $r^M$ was used as performance measurement
in [3]. The results of the comparison are reported in the
form of performance profiles [29] and a small numerical
table. See Fig. 1 and Table 1.¹
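The measure $r^M$ and the performance profiles of [29] can be sketched in a few lines of Python. The sketch assumes the thresholds $10^{-3}$, $10^{-6}$ and $-10^{20}$ in the solution test; the function names and the toy timings are hypothetical. The profile value of method $M$ at $\tau$ is the fraction of problems on which $r^M$ is within a factor $\tau$ of the best $r$ over all methods: at $\tau = 1$ it gives the efficiency of Table 1 as a fraction, and for large $\tau$ it approaches the robustness.

```python
import math


def solved(f_final, f_best):
    """Objective part of the solution test (feasibility is checked separately)."""
    return (f_final <= f_best + 1e-3 * abs(f_best) + 1e-6
            or max(f_best, f_final) <= -1e20)


def performance_profile(r, tau):
    """Fraction of problems on which each method's r is within a factor tau of the best.

    r maps a method name to a list of r values (math.inf for failures), one per problem.
    """
    methods = list(r)
    n_prob = len(r[methods[0]])
    profile = {}
    for m in methods:
        hits = 0
        for p in range(n_prob):
            best = min(r[s][p] for s in methods)
            if math.isfinite(best) and r[m][p] <= tau * best:
                hits += 1
        profile[m] = hits / n_prob
    return profile


# Toy data (hypothetical CPU times, three problems):
r = {"A": [1.0, 2.0, math.inf], "B": [2.0, 1.0, 3.0]}
print(performance_profile(r, 1.0))
print(performance_profile(r, 2.0))
```

Plotting the profile value against $\tau$ for each method gives curves of the kind shown in Fig. 1.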


Alternative Augmented Lagrangians 


The main drawback of the PHR formula (2) is that 
the objective function of the subproblems is not twice 


¹When LANCELOT B solves a feasibility problem (problem
with constant objective function), it minimizes the squared
infeasibility instead of addressing the original problem. As a result,
it sometimes finishes without satisfying the user-required
stopping criteria (feasibility and optimality tolerances on the
original problem). In 35 feasibility problems, LANCELOT B stopped
declaring convergence but the user-required feasibility tolerance
was not satisfied at the final iterate. Of these 35 problems, 16 seem
to be problems in which LANCELOT B converged to a stationary
point of the infeasibility (large objective function value of the
reformulated problem). In the remaining 19 problems (less than
2% of the total number of problems), LANCELOT B seems to stop
prematurely. This easily remedied shortcoming may slightly
deteriorate the robustness of LANCELOT B reported here.


Practical Augmented Lagrangian Methods, Figure 1
Performance profiles of ALGENCAN and LANCELOT B in the
problems of the CUTEr collection. Note that there is a CPU
time limit of 5 min for each method/problem pair. The
second graphic is a zoom of the left-hand side of the first one


continuously differentiable. This is the main motiva- 
tion for the introduction of many alternative Aug- 
mented Lagrangian methods. See, for example, [1,5,6, 
7,8,22,38,43,45,46,50,51,52,53,58]. Most of them have 
interesting interpretations as proximal point meth- 
ods for solving the dual problem, when the origi- 
nal nonlinear programming problem is convex [44]. 
In [10] a comparison of many different Augmented 
Lagrangian formulae within an algorithmic framework 
similar to the one of Algorithm 1 has been performed 
using the CUTE collection [20]. In general, the PHR 
formula seems to be more efficient than the alter- 
native ones for the resolution of the selected prob- 
lems. 
Introduction


The protein folding question is one of the most chal- 
lenging problems in computational biology. Although 
the structures of approximately 40,000 proteins have 
been determined via experimental techniques and cat- 
aloged in the Protein Data Bank (PDB) [3], there are 
thousands more to be discovered. Modeling proteins
with computational techniques is especially critical for
proteins that do not easily crystallize or cannot be
addressed by NMR techniques.

Protein structure prediction requires searching
through a vast conformational space for the native
structure. This challenge can be met through the
application of powerful algorithms, such as the αBB global
optimization method [1,2,11]. Many ab initio protein
structure prediction methods rely on databases and 
statistical methods to predict short peptide fragments 
which are subsequently assembled using scoring func- 
tions [8,9,23,37,38,39,40,41,44,45]. Other successful 
approaches apply detailed physics-based force fields 
and search for the minimum free energy of a pro- 
tein [5,6,16,17,18,19,20,21,24,25,26,27,28,29,30,31,34, 
36,42]. For a detailed summary of protein structure 
prediction methods, the reader is directed to two recent 
reviews [12,13]. 

The prediction of residue contacts within α-helical
bundles is critical to the prediction of the overall
tertiary structure of these proteins. Predicted interhelical
distance restraints can be used to significantly reduce 
the conformational search space in the protein fold- 
ing problem. Both modeling [22] and experiments [35] 
have shown that the residues that define the hydropho- 
bic core are most crucial for folding, limiting the con- 
formational search space. The preference of nonpolar 
atoms for nonaqueous environments is called the hy- 
drophobic effect [4]. This occurrence is due to the in- 
ability of nonpolar molecules to participate in hydrogen 
bonding in an aqueous environment. The hydrophobic 
effect is a major stabilization factor for proteins, nu- 
cleic acids, and membranes. Because of the dominance 
of such hydrophobic interactions in protein folding, 
this paper focuses upon predicting specific hydropho- 
bic residue pair contacts in protein structure. The im- 
portant contributions of electrostatic, van der Waals, 
hydrogen bonded, torsional, and solvation energetic 
terms are included using secondary and tertiary
structure prediction methods (e.g., the ASTRO-FOLD
approach [16,17,18]).


Methods 


There are three main aspects of the overall approach to
the helical topology prediction problem. First, a dataset
of helical proteins was assembled. This dataset was then
used to develop hydrophobic residue-based interhelical
contact probabilities. Finally, two optimization models
were formulated to maximize the total probability of the
predicted contacts subject to constraints that enforce
physically realistic topologies. A summary of the important
details of the approach is presented here. A more detailed
description of the method is available elsewhere [32].


Dataset Selection 


A database PDB set of 318 helical protein struc- 
tures was compiled to generate probabilities for spe- 
cific hydrophobic-to-hydrophobic PRIMARY residue 
contacts and associated hydrophobic-to-hydrophobic 
WHEEL residue contacts between helices of the same 
helical protein (this terminology is explained below). 
They were taken from the following sources: 20 from 
Table 2 of Zhang et al. [43]; 7 from Table 1 of Huang et 
al. [15]; 62 from the CATH database [33]; and 229 from 
the PDB Select 25 Database [14]. 


Probability Generation and Probability Sets 


The probabilities will be established both for a contact
at position i (denoted as a PRIMARY contact) and for any
associated contacts in the helical wheel position
(denoted as WHEEL contacts). For the purpose of
calculating the PRIMARY probabilities, two helices of a given
protein in the database PDB set were considered to
interact if they had a contact between a pair of residues
with a PRIMARY distance between 4.0 Å and 10.0 Å.
Unless otherwise specified, distances in this paper refer
to Cα-Cα distances.

For a parallel helix-to-helix interaction, WHEEL
contacts include the following residue combinations:
(i+3) to (j+3); (i+3) to (j+4); (i+4) to (j+3); (i+4)
to (j+4); (i-3) to (j-3); (i-3) to (j-4); (i-4) to
(j-3); and (i-4) to (j-4). For an antiparallel helix-to-helix
interaction, the following are the possible WHEEL
contacts: (i+3) to (j-3); (i+3) to (j-4); (i+4) to
(j-3); (i+4) to (j-4); (i-3) to (j+3); (i-3) to (j+4);
(i-4) to (j+3); and (i-4) to (j+4). Figure 1 illustrates
the PRIMARY and WHEEL contacts for an antiparallel
helical interaction. Two residues k and l that were
WHEEL residues of i and j, respectively, were
considered to interact if the WHEEL distance between them
was greater than or equal to 4.0 Å and less than 12.0 Å.
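The WHEEL combinations listed above follow a simple pattern: offsets of 3 or 4 residues on each helix, taken in the same direction for parallel helices and in opposite directions for antiparallel ones. A minimal Python sketch that enumerates them (the function names are hypothetical; it does not check whether the shifted positions lie inside the helices or are hydrophobic):

```python
def wheel_pairs(i, j, orientation):
    """WHEEL residue pairs (k, l) associated with a PRIMARY contact (i, j)."""
    pairs = []
    for di in (3, 4):
        for dj in (3, 4):
            if orientation == "parallel":
                pairs += [(i + di, j + dj), (i - di, j - dj)]
            elif orientation == "antiparallel":
                pairs += [(i + di, j - dj), (i - di, j + dj)]
            else:
                raise ValueError("orientation must be 'parallel' or 'antiparallel'")
    return pairs


def wheel_contact(distance):
    """WHEEL residues are considered to interact when 4.0 A <= distance < 12.0 A."""
    return 4.0 <= distance < 12.0


print(sorted(wheel_pairs(10, 50, "antiparallel")))
```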

Formulating the problem as a set of PRIMARY and 
WHEEL contacts provides a significant advantage over 
Predictive Method for Interhelical Contacts in Alpha-Helical
Proteins, Figure 1

Two interacting α-helices in the test set protein 1rop (PDB).
The helices here interact in an antiparallel manner,
hydrophobic residues i and j form a PRIMARY contact, and the
residues (i+3), (i+4) can each interact with (j-3), (j-4) to
form WHEEL contacts if both residues of a given pair are
hydrophobic. This figure was created with PyMol [7]


other methods. Instead of making assumptions about 
the form of the helix, such as representing the helix as 
a simple cylinder, the proposed method is able to ad- 
dress irregular helices. By selecting a set of interheli- 
cal point contacts within a specified distance range, the 
approach can handle the most difficult cases, including 
those containing helices that bend or kink. 

Two helices may interact in one of three possible 
ways. They may be parallel or antiparallel to one an- 
other. The third type of possible interaction will be la- 
beled unclassified. These are cases for which neither the 
parallel nor the antiparallel label applies. For the model 
prediction section of this paper, only parallel and an- 
tiparallel helical interactions were predicted. 

The occurrence frequencies of each of the 36 pos- 
sible hydrophobic pairs were determined by count- 
ing the number of occurrences of hydrophobic-to- 
hydrophobic minimum interhelical distances within 
the 4.0 to 10.0 Å distance range. The frequency for each
pair was then split into two groups, parallel and an- 
tiparallel, based on the relative direction of the inter- 
acting helices. The total number of hydrophobic-to- 
hydrophobic minimum distance contacts was identi- 
fied as the sum of these 36 occurrence frequencies. The 
PRIMARY probabilities are then established as the oc- 
currence frequency of a specific hydrophobic pair for 
a specific directionality divided by the total number of 
hydrophobic-to-hydrophobic contacts. 
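The PRIMARY probabilities are plain relative frequencies. The sketch below counts contacts per unordered hydrophobic pair and orientation and divides by the total number of hydrophobic-to-hydrophobic contacts over both orientations. The input format and the residue names in the toy data are hypothetical; the text does not list which residue types were classed as hydrophobic.

```python
from collections import Counter


def primary_probabilities(contacts):
    """contacts: iterable of (residue_a, residue_b, orientation) tuples, one per
    hydrophobic-to-hydrophobic minimum-distance contact in the 4.0-10.0 A range.
    Residue pairs are treated as unordered."""
    counts = Counter()
    for a, b, orientation in contacts:
        counts[(tuple(sorted((a, b))), orientation)] += 1
    total = sum(counts.values())  # total number of hydrophobic contacts
    return {key: c / total for key, c in counts.items()}


# Hypothetical toy contacts:
data = [("LEU", "ILE", "parallel"), ("ILE", "LEU", "parallel"),
        ("LEU", "LEU", "antiparallel"), ("VAL", "ALA", "antiparallel")]
print(primary_probabilities(data))
```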

Conditional WHEEL probabilities were generated 
for interacting helices in the database PDB set. Given 


that two α-helices have an interaction with a
corresponding hydrophobic-to-hydrophobic minimum
interaction distance within 4.0 to 10.0 Å, the conditional
probability that the residues on the same side of the
helical wheel form any hydrophobic-to-hydrophobic
contact within 4.0 to 12.0 Å was determined. These
probabilities were calculated by considering the number
of hydrophobic-to-hydrophobic WHEEL contacts and 
the total number of possible WHEEL contacts for every 
specific helix to helix interaction individually, calculat- 
ing the probability for each interhelical residue contact 
by averaging over the total number of such contacts af- 
ter the entire database PDB set has been considered. 


Interhelical Contact Prediction Model 


Given the locations of helices in a protein’s primary 
sequence, the next step in the proposed method is to 
predict the interhelical PRIMARY and WHEEL residue 
contacts. This is done in order to impose distance con- 
straints upon such contacts in the tertiary structure pre- 
diction section of the framework. 

To accomplish this task, two mixed-integer lin- 
ear programming (MILP) optimization problems were 
formulated. The first MILP problem, denoted as the 
Level 1 Model, identifies a set of the most probable in- 
terhelical PRIMARY contacts. The PRIMARY contacts 
selected by the Level 1 Model are enhanced by the pres- 
ence of the most probable WHEEL contacts predicted 
in the Level 2 Model. This second model also provides 
a method to distinguish between equally likely results 
of the Level 1 model. 


Level 1 Model: PRIMARY Interhelical Contacts. In
the Level 1 problem, the binary variables $y^a_{mn}$ and $y^p_{mn}$
are activated if the helices m and n of the same protein
interact in an antiparallel or a parallel fashion,
respectively. In addition, the binary variables $w^{mn}_{ij}$ are defined
as active when the hydrophobic residue pair (i, j) forms
a PRIMARY contact, where i is in helix m and j is in
helix n.

Objective Function: The objective function of Level
1, Eq. (1), corresponds to maximizing the sum of
the most probable hydrophobic contacts (i, j) for the
given primary sequence by considering the product of
the binary variable $w^{mn}_{ij}$, representing the existence of
a residue-residue contact, the binary variables $y^a_{mn}$ and
$y^p_{mn}$, representing the existence of a helix-helix contact,
and the probability of an antiparallel or parallel contact,
$p^a_{ijmn}$ or $p^p_{ijmn}$, respectively.


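$$\max \sum_m \sum_n y^a_{mn} \sum_i \sum_j w^{mn}_{ij}\, p^a_{ijmn} + \sum_m \sum_n y^p_{mn} \sum_i \sum_j w^{mn}_{ij}\, p^p_{ijmn} \qquad (1)$$

$$y^a_{mn},\ y^p_{mn},\ w^{mn}_{ij} \in \{0, 1\} \qquad (2)$$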
This objective function is nonlinear due to the prod- 
ucts of binary variables that result. The objective func- 
tion can be reformulated as a linear objective function 
by introducing a second pair of variables, as described 
elsewhere [10]. 
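Reference [10] is not reproduced here, so the following shows only the standard textbook linearization of a product of two binary variables, which may differ in detail from the scheme used in [10]: a new variable $z$ replaces $xy$, together with the constraints $z \le x$, $z \le y$ and $z \ge x + y - 1$. The check below enumerates all binary cases.

```python
from itertools import product


def product_linearization_ok(x, y, z):
    """Linear constraints replacing z = x * y for binary x, y, z."""
    return z <= x and z <= y and z >= x + y - 1


# For each binary (x, y), the only binary z satisfying the constraints is x * y.
for x, y in product((0, 1), repeat=2):
    feasible_z = [z for z in (0, 1) if product_linearization_ok(x, y, z)]
    assert feasible_z == [x * y]
```

In a maximization with nonnegative coefficients such as (1), the constraint $z \ge x + y - 1$ can usually be dropped, because the optimizer already pushes $z$ up to $\min\{x, y\}$.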

Residue Contact Rules: Every hydrophobic amino
acid i of helix m can have at most one PRIMARY
contact with another hydrophobic amino acid j of helix n,
and this is given by Eq. (3). The two terms in this
equation are necessary due to the model formulation: it is
assumed that the second index is always larger than the
first index for any potentially active variables $w_{ij}$, in
order to reduce the number of binary variables.

$$\sum_{j > i} w_{ij} + \sum_{j < i} w_{ji} \le 1 \quad \forall i \qquad (3)$$


Equation (4) prevents any pairs of contacts (i, j) and
(i', j') from both being specified if either the number
of residues between i and i' or between j and j' is less than
five, or the number of residues between i and i' differs
from the number of residues between j and j' by more
than two residues. This requires that (i', j') cannot be
a WHEEL contact to the PRIMARY contact (i, j) and
also limits the size of kinks in the protein backbone that
result from a differing separation between (i and i') and
(j and j').

$$w_{ij} + w_{i'j'} \le 1 \quad \forall (i, j, i', j'):\ |\mathrm{diff}(i, i') - \mathrm{diff}(j, j')| > 2 \ \text{or}\ |\mathrm{diff}(i, i')| < 5 \ \text{or}\ |\mathrm{diff}(j, j')| < 5 \qquad (4)$$

In this equation, diff(i, i') refers to the difference in
sequence numbering between i and i'.
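The exclusion rule of Eq. (4) reduces to a simple pairwise test on sequence positions. A minimal sketch with a hypothetical function name; it assumes that both sequence differences are compared in absolute value, so that the same test serves parallel and antiparallel helices:

```python
def conflicting(i, j, i2, j2):
    """True if PRIMARY contacts (i, j) and (i2, j2) may not both be selected (Eq. (4))."""
    di = abs(i2 - i)  # |diff(i, i')|
    dj = abs(j2 - j)  # |diff(j, j')|, absolute value assumed for antiparallel pairs
    return abs(di - dj) > 2 or di < 5 or dj < 5


print(conflicting(10, 50, 13, 53))  # (13, 53) is a WHEEL position of (10, 50)
print(conflicting(10, 50, 17, 43))  # antiparallel pair seven residues apart on both helices
```

Every pair returning True would contribute one constraint $w_{ij} + w_{i'j'} \le 1$ to the model.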

Equation (5) states that if a PRIMARY contact (i, j)
occurs, then none of the WHEEL residues for i can also
be part of a PRIMARY contact themselves.

$$w_{ij} + w_{kl} \le 1 \quad \forall (i, j, k, l):\ i, k \in m \ \text{and}\ k \ \text{is in a WHEEL position of}\ i \qquad (5)$$


Helix Contact Rules: For every helix m, Eq. (6)
establishes the maximum number of PRIMARY contacts
that can be specified involving m. The parameter
counth(m) is established based upon the number of
hydrophobic residues within a helix that are not WHEEL
residues to each other.

$$\sum_n \left(y^a_{mn} + y^p_{mn}\right) \le \mathrm{counth}(m) \quad \forall m \qquad (6)$$


Equations (7)-(8) require a minimum number of loop
residues between two α-helices to yield helical
interactions of a given orientation.

$$y^p_{mn} = 0 \quad \forall (m, n):\ \text{loop between } (m, n) \text{ has less than 6 AA} \qquad (7)$$

$$y^a_{mn} = 0 \quad \forall (m, n):\ \text{loop between } (m, n) \text{ has less than 1 AA} \qquad (8)$$


Equation (9) states that two helices m and n, if they
interact at all, can interact in either a parallel or an
antiparallel fashion, but in only one manner.

$$y^a_{mn} + y^p_{mn} \le 1 \quad \forall (m, n) \qquad (9)$$


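Equations (10)-(13) restrict the topology of helical
interactions based upon transitive rules.

$$y^a_{mn} + y^a_{np} + y^a_{mp} \le 2 \quad \forall (m, n, p):\ m \ne n \ne p,\ \mathrm{nhel} \ge 3 \qquad (10)$$

$$y^a_{mn} + y^p_{np} + y^p_{mp} \le 2 \quad \forall (m, n, p):\ m \ne n \ne p,\ \mathrm{nhel} \ge 3 \qquad (11)$$

$$y^p_{mn} + y^a_{np} + y^p_{mp} \le 2 \quad \forall (m, n, p):\ m \ne n \ne p,\ \mathrm{nhel} \ge 3 \qquad (12)$$

$$y^p_{mn} + y^p_{np} + y^a_{mp} \le 2 \quad \forall (m, n, p):\ m \ne n \ne p,\ \mathrm{nhel} \ge 3 \qquad (13)$$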
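One way to read the transitive rules: give each helix a direction (+1 or -1), so that two helices are parallel when their directions agree. Among three mutually interacting helices the number of antiparallel pairs must then be even, and the four constraints can be taken to forbid the four patterns with one or three antiparallel pairs. The sketch below assumes exactly that reading (the term-by-term assignment of $y^a$ and $y^p$ in (10)-(13) is our assumption) and checks by enumeration that the four constraints accept precisely the realizable orientation patterns. The function names are hypothetical.

```python
from itertools import product


def satisfies_10_to_13(o_mn, o_np, o_mp):
    """Labels are 'a' (antiparallel) or 'p' (parallel) for three mutually
    interacting helices m, n, p; checks the assumed form of (10)-(13)."""
    ya = {lab: int(lab == "a") for lab in "ap"}
    yp = {lab: int(lab == "p") for lab in "ap"}
    return (ya[o_mn] + ya[o_np] + ya[o_mp] <= 2 and
            ya[o_mn] + yp[o_np] + yp[o_mp] <= 2 and
            yp[o_mn] + ya[o_np] + yp[o_mp] <= 2 and
            yp[o_mn] + yp[o_np] + ya[o_mp] <= 2)


def realizable(o_mn, o_np, o_mp):
    """True if some choice of directions (+1 or -1) for m, n, p yields the labels."""
    def label(s, t):
        return "p" if s == t else "a"
    for dm, dn, dp in product((1, -1), repeat=3):
        if (label(dm, dn), label(dn, dp), label(dm, dp)) == (o_mn, o_np, o_mp):
            return True
    return False


for labels in product("ap", repeat=3):
    assert satisfies_10_to_13(*labels) == realizable(*labels)
```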
Relating Residue Contacts to Helix Contacts: A direct
link between the $w_{ij}$ PRIMARY contact variables and
the $y^a_{mn}$ and $y^p_{mn}$ helical interaction variables is
provided by Eqs. (14)-(15).

$$w^{mn}_{ij} \le y^a_{mn} + y^p_{mn} \qquad (14)$$

$$y^a_{mn} + y^p_{mn} - \sum_i \sum_j w^{mn}_{ij} \le 0 \qquad (15)$$
yy 


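Equations (16)-(17) require the PRIMARY contact
predictions, $w_{ij}$, to be consistent with the variables
representing the interaction directions, $y^p_{mn}$ and $y^a_{mn}$.

$$w_{ij} + w_{i'j'} + y^a_{mn} \le 2 \quad \forall (i, j, i', j'):\ i' > i,\ j' > j \ \text{and}\ |i' - i| < |j' - j| + 3 \ \text{or}\ |i' - i| > |j' - j| - 3 \qquad (16)$$

$$w_{ij} + w_{i'j'} + y^p_{mn} \le 2 \quad \forall (i, j, i', j'):\ i' > i,\ j > j' \ \text{and}\ |i' - i| < |j' - j| + 3 \ \text{or}\ |i' - i| > |j' - j| - 3 \qquad (17)$$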


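Equations (18) and (19) specify that a PRIMARY
contact pair (i, j) may be predicted only if it results in an
overlap between the two helices of at least two-thirds of
the length of the shorter helix.

$$w_{ij} + y^a_{mn} \le 1 \quad \text{if the overlap of } m, n \text{ is less than } 2/3 \text{ of the shorter helix} \qquad (18)$$

$$w_{ij} + y^p_{mn} \le 1 \quad \text{if the overlap of } m, n \text{ is less than } 2/3 \text{ of the shorter helix} \qquad (19)$$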


Additional Features: A rank-ordered list of the most
likely sets of helical contacts is more desirable than a
single solution. Equation (20) introduces integer cut
constraints into the model. After each successive solve of
the above model, the previous solution can be excluded
from the feasible solution space using this equation. Here
A is the set of active variables, I is the set of inactive
variables and card(A) is the cardinality of set A.

$$\sum_{(m,n),(i,j) \in A} \left(y^a_{mn} + y^p_{mn} + w^{mn}_{ij}\right) - \sum_{(m,n),(i,j) \in I} \left(y^a_{mn} + y^p_{mn} + w^{mn}_{ij}\right) \le \mathrm{card}(A) - 1 \qquad (20)$$
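The integer cut of Eq. (20) can be generated mechanically from the previous 0/1 solution: active variables get coefficient +1, inactive ones -1, and the right-hand side is card(A) - 1. Among binary vectors over the same variables, only the excluded solution violates the resulting inequality. A minimal sketch; the variable names in the example are hypothetical.

```python
def integer_cut(solution):
    """Coefficients and right-hand side of Eq. (20) for a 0/1 solution.

    solution maps a variable name to its value; the cut reads
    sum(coef[v] * x[v]) <= rhs.
    """
    coef = {v: (1 if val == 1 else -1) for v, val in solution.items()}
    rhs = sum(1 for val in solution.values() if val == 1) - 1  # card(A) - 1
    return coef, rhs


def satisfies(coef, rhs, x):
    return sum(c * x[v] for v, c in coef.items()) <= rhs


# Hypothetical variable names for one Level 1 solution:
sol = {"ya_12": 1, "yp_12": 0, "w12_3_40": 1, "w12_5_44": 0}
coef, rhs = integer_cut(sol)
print(satisfies(coef, rhs, sol))
```

Adding one such cut after each solve and re-solving yields the ranked list of solutions used in the applications below.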


Equation (21) imposes an upper limit on the number
of PRIMARY contacts that two interacting helices
m and n can have specified. This upper contact limit is
denoted max_contact in Eq. (21).

$$\sum_{i,j} w^{mn}_{ij} \le \text{max\_contact} \cdot \left(y^a_{mn} + y^p_{mn}\right) \quad \forall (m, n) \qquad (21)$$


Equation (22) eliminates a number of helical
interactions from the Level 1 solutions equal to the value of the
parameter subtract. The subtract parameter effectively
loosens the helical packing, which may be desired when
predicting only the most essential and, ideally,
shortest-distance contacts.

$$\sum_m \sum_n \left(y^a_{mn} + y^p_{mn}\right) \le \sum_m \mathrm{counth}(m) - \text{subtract} \qquad (22)$$


Equations (1)-(22) represent the Level 1 mathemati- 
cal model which is a mixed-integer linear optimization 
problem (MILP). 


Level 2 Model: WHEEL Interhelical Contacts. The
Level 2 MILP problem serves as a check on the ordering
of the solutions found in Level 1.

Objective Function: The Level 2 objective function,
Eq. (23), maximizes the most probable hydrophobic
(k, l) WHEEL contacts based on probabilities calculated
using the database PDB set. Although this was
done with the (i, j) pairs and parallel or antiparallel
orientations already fixed from Level 1, the model could be
altered, for example, to fix only the helical orientations
after Level 1. The parameters $p^a_{ijklmn}$ and $p^p_{ijklmn}$ give
the probabilities that any hydrophobic (k, l) pair will
occur on the same side of the helical wheel as a specific
PRIMARY pair (i, j) for antiparallel or parallel
orientations, respectively. The Level 2 objective function must
be based not only upon the $p^a_{ijklmn}$ and $p^p_{ijklmn}$
probabilities, but also upon the probabilities $p^a_{klmn}$ and $p^p_{klmn}$,
as shown in Eq. (23). These latter probabilities treat the
(k, l) WHEEL contacts as PRIMARY contacts themselves,
and their values reflect the relative weights of the
different WHEEL contacts regarded as (k, l) PRIMARY contacts.
This approximation makes it possible to distinguish between the
possible (k, l) hydrophobic contacts that may be specified
when there is a choice of more than one pair.


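$$\max \sum_{m,n} \sum_{i,j} \sum_{k,l} y^a_{mn}\, w^{mn}_{ij}\, w^{mn}_{ijkl} \left[ p^a_{ijklmn} + p^a_{klmn} \right] + \sum_{m,n} \sum_{i,j} \sum_{k,l} y^p_{mn}\, w^{mn}_{ij}\, w^{mn}_{ijkl} \left[ p^p_{ijklmn} + p^p_{klmn} \right] \qquad (23)$$

$$y^a_{mn},\ y^p_{mn},\ w^{mn}_{ij},\ w^{mn}_{ijkl} \in \{0, 1\} \qquad (24)$$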
Like the objective function in the Level 1 model, 
Eq. (23) is also nonlinear due to the products of binary 
variables that result. It can be linearized in a similar 
fashion [10]. 

Wheel Residue Contact Rules: Equation (25) states
that a maximum of one WHEEL contact is allowed to
be specified per PRIMARY contact.

$$\sum_k \sum_l w^{mn}_{ijkl} \le w^{mn}_{ij} \quad \forall (m, n, i, j):\ y^a_{mn} + y^p_{mn} = 1 \qquad (25)$$


Applications 


The Level 1 and Level 2 MILP optimization problems 
were applied to 26 target proteins with known struc- 
tures in the PDB, termed the test PDB set. For each 
of these proteins, only the primary amino acid se- 
quence and the experimentally-determined locations of 
the helices were presented to the model. The model 
predicted the interhelical hydrophobic residue con- 
tacts between such helices using the PRIMARY and 
WHEEL probabilities developed from globular helical 
proteins. 


A predicted set of interhelical contacts (a solu- 
tion) was evaluated by computing the average of the 
distances of these contacts from the experimentally- 
determined structure. The proposed framework was 
used to generate 20 solutions for each protein and each 
parameter value by applying integer cuts. The solutions 
in this list are ranked by objective value, from best to 
worst. The best contact distance average value is defined 
as the lowest contact distance average value identified 
for a specific protein. An upper limit of 14 Å was
identified as a goal for the average distance corresponding
to a contact prediction, since such a distance constraint 
would significantly improve the structure refinement. 
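The evaluation described above reduces to simple arithmetic: average the experimental distances of the predicted contacts in each solution and keep the solution with the lowest average. The sketch below uses made-up distances (in Å) and hypothetical function names; only the 14 Å goal comes from the text, and the use of the sample standard deviation for the error bar is an assumption.

```python
from statistics import mean, stdev

GOAL_ANGSTROM = 14.0  # upper limit on the average contact distance (from the text)

def contact_distance_average(distances):
    """Average experimental distance (Å) of the contacts in one solution."""
    return mean(distances)

def best_solution(solutions):
    """Index of the solution with the lowest contact distance average."""
    return min(range(len(solutions)),
               key=lambda s: contact_distance_average(solutions[s]))

# Illustrative distances (Å) for three candidate solutions of one protein.
solutions = [[6.2, 9.8, 12.5], [5.1, 7.4, 8.9], [10.3, 13.7, 15.2]]
best = best_solution(solutions)
avg = contact_distance_average(solutions[best])
spread = stdev(solutions[best])  # assumption: sample standard deviation
print(best, round(avg, 2), round(spread, 2), avg <= GOAL_ANGSTROM)
```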

Figure 2 displays the lowest combined PRIMARY 
and WHEEL contact distance averages for every test 
protein. For the parameters given, these are the best so- 
lutions: the experimentally-determined distance aver- 
ages corresponding to contacts predicted by the model 
are lower for those solutions than for all other solutions 
in each protein system. Figure 2 shows that the predictive results of the model are highly encouraging. A general goal of 5.0 to 14.0 Å for the actual distance range of contacts predicted by the model was set, since such a distance range would significantly improve the structure refinement. This goal was attained and surpassed for the entire set of the test proteins.

The error bars of one standard deviation for the distance averages indicate that 1fc3, 1ash, 1cc5, and 2ezh may fall close to this limit and that 1a17 is beyond this target. The averages for a large number of the target proteins fall far below 14.0 Å, suggesting that lower distance restraints, such as 12.0 Å or even 10.0 Å and below, may be appropriate in some or even most cases.

The successful predictions of this model using 
both experimentally-determined helix locations as well 
as predicted helical regions support the thesis that 
hydrophobic-to-hydrophobic interactions are key to 
the folding of the native structures of α-helical globular
proteins. Despite the variety of structural motifs present 
in the test PDB set, the hydrophobic-to-hydrophobic 
interactions based model was able to identify low dis- 
tance interhelical PRIMARY and WHEEL contacts for 
each of the 26 proteins analyzed. This observation is 
also supported by the success of the subtract parame- 
ter in identifying the most essential contacts and further 
reinforces the utility of the probability values developed 
for the model. 
Predictive Method for Interhelical Contacts in Alpha-Helical Proteins, Figure 2
Lowest contact distance averages for identified solutions to the target proteins as explained in the text. Error bars for one 
standard deviation of the contact distances are given 
In decision making involving multiple criteria, the basic problem stated by analysts and decision makers concerns the way that the final decision should be made. In many cases, however, this problem is posed in the opposite way: assuming that the decision is given, how is it possible to find the rational basis through which the decision was made? Or equivalently, how is it possible to assess the decision maker’s preference model leading to exactly the same decision as the actual one, or at least to the most ‘similar’ decision? The philosophy of preference disaggregation in multicriteria analysis is to assess/infer preference models from given preferential structures and to address decision-aiding activities through operational models within the aforementioned framework.


Definitions and Notations 


Under the term multicriteria analysis, two basic approaches have been developed, involving:

a) a set of methods or models enabling the aggregation of multiple evaluation criteria to choose one or more actions from a set A;

b) an activity of decision-aid to a well-defined decision maker (individual, organization, etc.).

In both cases the set A of potential actions or decisions is analysed in terms of multiple criteria in order to model all the possible impacts, consequences or attributes related to the set A (for instance, see [11,27,47,49,69]).

B. Roy [47] outlines a general modeling methodology of decision making problems, which includes four modeling steps beginning with the definition of the object of the decision and ending with the activity of decision aid, as follows:

• Level 1: Object of the decision, including the definition of the set of potential actions A and the determination of a problematic on A (see below).

• Level 2: Modeling a consistent family of criteria, assuming that these criteria are nondecreasing value functions, exhaustive and nonredundant.

• Level 3: Development of a global preference model, to aggregate the marginal preferences on the criteria.

• Level 4: Decision-aid or decision support, based on the results of level 3 and the problematic in level 1.


In level 1, Roy [47] distinguishes four referential problematics, none of which necessarily precludes the others. These problematics can be employed separately or in a complementary way in all phases of the decision making process. The four problematics are the following:

• Problematic α: Choosing one action from A (choice).

• Problematic β: Sorting the actions into well-defined categories which are given in a preference order (sorting).

• Problematic γ: Ranking the actions from the best one to the worst one (ranking).

• Problematic δ: Describing the actions in terms of their performances on the criteria (description).


In level 2, the modeling process must conclude on a consistent family of criteria $\{g_1, \dots, g_n\}$. Each criterion is a nondecreasing real valued function defined on A, as follows:

$$g_i : A \to [g_{i*}, g_i^*] \subset \mathbb{R}, \qquad a \mapsto g_i(a) \in \mathbb{R}, \tag{1}$$

where:

• $[g_{i*}, g_i^*]$ is the criterion evaluation scale;

• $g_{i*}$ is the worst level of the ith criterion;

• $g_i^*$ is the best level of the ith criterion;

• $g_i(a)$ is the evaluation or performance of action a on the ith criterion;

• $g(a)$ is the vector of performances of action a on the n criteria.

From the above definitions the following preferential situations can be determined:

$$g_i(a) > g_i(b) \iff a \succ b \quad \text{(a is preferred to b)},$$
$$g_i(a) = g_i(b) \iff a \sim b \quad \text{(a is indifferent to b)}.$$


In multicriteria analysis four types of criteria are used, with the following properties:

• Measurable criterion: The criterion enables the preferential comparison of intervals of the evaluation scale. It can be divided into the following subtypes [69]:
  - true criterion (without any threshold);
  - semicriterion (with an indifference threshold);
  - pseudocriterion (with indifference and preference thresholds).

• Ordinal criterion: The criterion defines only an order on A; thus the evaluation scale is discrete (qualitative criterion).

• Probabilistic criterion: It covers the case of uncertainty in the actions’ performances, modeled by probability distributions (see the Section “Disaggregation Under Uncertainty” below).

• Fuzzy criterion: The actions’ performances are intervals of the criterion’s evaluation scale.

The modern theoretical streams in the field of multiple criteria decision-aid (MCDA) can be divided into four groups:

1) Multi-objective optimization (see the Section “Disaggregation in Multi-objective Optimization”);

2) Value-focused approaches ([29,30]);

3) Outranking methods ([1,49]);

4) Disaggregation methods.

Roy and D. Bouyssou [48] point out the major 
conceptual and methodological differences distinguish- 
ing value-focused approaches from outranking meth- 
ods using the classical example regarding the location 
of a thermo-nuclear electrical production plant. On the 
conceptual level, the authors characterize the value- 
focused approach as descriptive of the decision maker’s 
preferences, while outranking methods are character- 
ized as a constructive way of building these prefer- 
ences. On the methodological level, the value-focused 
approach proposes a value or utility function to model 
the decision maker’s global preference (functional sys- 
tems), whereas outranking methods propose outrank- 
ing relations (relational systems). 


Outline of the Article 


The development of disaggregation methods actually began in 1978, with the presentation of the UTA method in the ‘Cahiers du LAMSADE’ series. This article summarises the progress made in this scientific field since then.

The subsequent sections include: the background of 
the UTA method through the use of goal programming 
techniques in regression analysis; some thoughts about 
the usefulness of the disaggregation; a brief presenta- 
tion of the UTA algorithm; variants of UTA; other dis- 
aggregation methods; methods for decision making un- 
der uncertainty; multi-objective optimization. Finally, 
the last two sections show the implementation of dis- 
aggregation methods through decision support systems 
and potential real-world applications. 
History


The history of the disaggregation principle in multi- 
dimensional/multicriteria analyses begins with the use 
of goal programming techniques, a special form of lin- 
ear programming structure, in assessing/inferring pref- 
erence/aggregation models or in developing linear or 
nonlinear multidimensional regression analyses [52]. 

A. Charnes, W.W. Cooper and O. Ferguson [9] proposed a linear model of optimal estimation of executive
compensation by analysing or disaggregating pairwise 
comparisons and given measures (salaries); the model 
was estimated so that it could be as consistent as possi- 
ble with the data from the goal programming point of 
view. 

O.J. Karst [28] minimized the sum of absolute de- 
viations via goal programming in linear regression 
with one variable, while H.M. Wagner [70] generalises 
Karst’s model in the multiple regression case. Later, 
J.E. Kelley [31] proposed a similar model to minimize 
Tchebycheff’s criterion in linear regression.

Subsequently, V. Srinivasan and A.D. Shocker [65] outlined the ORDREG ordinal regression model to assess a linear value function by disaggregating pairwise judgements. N. Freed and F. Glover [17] proposed goal programming models to infer the weights of linear value functions in the frame of discriminant analysis (problematic β).

The research on handling ordinal criteria began with the studies [71] and [25]. Both research teams
faced the same problem: to infer additive value func- 
tions by disaggregating a ranking of reference alterna- 
tives. F.W. Young, J. de Leeuw and Y. Takane [71] pro- 
posed alternating least squares techniques, without en- 
suring, however, that the additive value function is op- 
timally consistent with the given ranking. In the case of 
the UTA method proposed by E. Jacquet-Lagrèze and J. Siskos [25], optimality is ensured through linear programming techniques.


The Disaggregation Paradigm in MCDA 


In the traditional aggregation paradigm, the criteria ag- 
gregation model is known a priori, while the global 
preference is unknown. On the contrary, the philoso- 
phy of disaggregation involves the inference of prefer- 
ence models from given global preferences (Fig. 1). 


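[Figure 1: in the aggregation paradigm the criteria are combined through a known aggregation model into the global preference; in the disaggregation paradigm the global preference is used to infer the unknown aggregation model (“Aggregation model ?”)]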


Preference Disaggregation, Figure 1 
The aggregation and disaggregation paradigms in MCDA 


Global Preference As Datum 


The clarification of the decision maker’s global preference necessitates the use of a reference set of actions $A_R$. Usually, this set could be:

1) a set of past decision alternatives ($A_R$: past actions);

2) a subset of decision actions, especially when A is large ($A_R \subset A$);

3) a set of fictitious actions, consisting of performances on the criteria which can be easily judged by the decision maker to perform global comparisons ($A_R$: fictitious actions).

In each of the above cases the decision maker is asked to externalise and/or confirm his/her global preferences on the set $A_R$, taking into consideration the performances of the reference actions on all criteria. Usually, the global preference takes one of the following forms:

• Measurable judgements on $A_R$;

• Ranking (weak order relation) on $A_R$ (problematic γ);

• Pairwise relation;

• Sorting of reference actions (problematic β).


The ‘Famous’ UTA Method 
Objective of the Method 


The UTA method proposed by Jacquet-Lagrèze and Siskos [26] aims at inferring one or more additive value functions from a given ranking on the reference set $A_R$. The method uses special linear programming techniques to assess these functions so that the ranking(s) obtained through these functions on $A_R$ is (are) as consistent as possible with the given one.
The Additive Value Model


The criteria aggregation model in UTA is assumed to be an additive value function of the following form [30]:

$$u(g) = \sum_{i=1}^{n} p_i\, u_i(g_i) \tag{2}$$

subject to normalization constraints:

$$u_i(g_{i*}) = 0, \quad u_i(g_i^*) = 1, \quad \forall i = 1, \dots, n; \qquad \sum_{i=1}^{n} p_i = 1, \tag{3}$$

where $u_i$, $i = 1, \dots, n$, are nondecreasing real valued functions, named marginal value or utility functions, which are normalized between 0 and 1, and $p_i$ is the weight of $u_i$.

Both the marginal and the global value functions have the monotonicity property of the true criterion. For instance, in the case of the global value function the following properties hold:

$$u[g(a)] > u[g(b)] \iff a \succ b \quad \text{(preference)},$$
$$u[g(a)] = u[g(b)] \iff a \sim b \quad \text{(indifference)}.$$


The UTA method infers an unweighted form of the additive value function, equivalent to the form defined by relations (2)-(3), as follows:

$$u(g) = \sum_{i=1}^{n} u_i(g_i) \tag{4}$$

subject to normalization constraints:

$$u_i(g_{i*}) = 0, \quad \forall i = 1, \dots, n, \tag{5}$$
$$\sum_{i=1}^{n} u_i(g_i^*) = 1. \tag{6}$$
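The equivalence between the weighted form (2)-(3) and the unweighted form (4)-(6) comes from absorbing each weight into its marginal function, i.e. replacing $u_i$ by $p_i u_i$, so that $u_i(g_i^*) = p_i$ and the best levels sum to 1. The sketch below checks this numerically on a hypothetical two-criterion example; the scales, marginal functions and weights are illustrative assumptions, not data from the article.

```python
from math import isclose

# Hypothetical criteria: scale [0, 10] and scale [1, 5], both "more is better".
def u1(g):
    """Normalized marginal value: u1(0) = 0, u1(10) = 1."""
    return g / 10

def u2(g):
    """Normalized marginal value: u2(1) = 0, u2(5) = 1."""
    return (g - 1) / 4

marginals = (u1, u2)
weights = (0.6, 0.4)  # p_i, summing to 1 as in eq. (3)

def u_weighted(g):
    """Eq. (2): sum of p_i * u_i(g_i)."""
    return sum(p * u(gi) for p, u, gi in zip(weights, marginals, g))

def u_unweighted(g):
    """Eq. (4) with rescaled marginals u_i' = p_i * u_i."""
    rescaled = [lambda x, p=p, u=u: p * u(x) for p, u in zip(weights, marginals)]
    return sum(u_prime(gi) for u_prime, gi in zip(rescaled, g))

for g in [(0, 1), (5, 3), (10, 5), (7.5, 2)]:
    assert isclose(u_weighted(g), u_unweighted(g))

# Normalization (5)-(6): worst levels give 0, best levels sum to 1.
print(u_unweighted((0, 1)), u_unweighted((10, 5)))
```

In the rescaled form the weights no longer appear explicitly; they are recovered as $p_i = u_i(g_i^*)$, which is how UTA reports them.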


Of course, the existence of such a preference model 
assumes the preferential independence of the criteria 
for the decision maker [30], although this assumption 
does not pose significant problems in a posteriori anal- 
yses such as disaggregation analysis. 


[Figure 2: ranking of the reference actions plotted against their global value, with the ordinal regression curve and the overestimation and underestimation errors marked]


Preference Disaggregation, Figure 2 
Ordinal regression curve (ranking versus global value) 


The UTASTAR Algorithm. 


In order to assess every marginal value function, the evaluation scale of each criterion (especially in the case of measurable criteria) is discretised into a limited set of $\alpha_i$ points:

$$G_i = \{g_{i*} = g_i^1, g_i^2, \dots, g_i^{\alpha_i} = g_i^*\}. \tag{7}$$
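A minimal sketch of the discretisation (7) and of how a marginal value is read off between breakpoints. Two assumptions are not stated above: the $\alpha_i$ points are equally spaced, and the marginal value is interpolated linearly between consecutive points (the usual piecewise-linear approximation in UTA-type methods). The helper names and numbers are hypothetical.

```python
from bisect import bisect_right

def equal_breakpoints(g_worst, g_best, alpha):
    """Hypothetical helper: alpha equally spaced points g^1 = g_worst, ..., g^alpha = g_best."""
    step = (g_best - g_worst) / (alpha - 1)
    return [g_worst + l * step for l in range(alpha)]

def interpolate_marginal(g, points, values):
    """Piecewise-linear marginal value at performance g.

    points must be increasing ("more is better" scale) and values[l] is the
    marginal value u_i at points[l]; g outside the scale is clamped."""
    if g <= points[0]:
        return values[0]
    if g >= points[-1]:
        return values[-1]
    l = bisect_right(points, g) - 1  # points[l] <= g < points[l + 1]
    t = (g - points[l]) / (points[l + 1] - points[l])
    return values[l] + t * (values[l + 1] - values[l])

points = equal_breakpoints(0.0, 10.0, 3)  # [0.0, 5.0, 10.0]
values = [0.0, 0.1, 0.4]                  # illustrative u_i at the breakpoints
print(interpolate_marginal(7.5, points, values))  # about 0.25
```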

On the other hand, the set of reference actions $A_R = \{a_1, \dots, a_k\}$ is ‘rearranged’ in such a way that $a_1$ is the head of the ranking and $a_k$ its tail. Since the ranking has the form of a weak order, for each pair of consecutive actions $(a_j, a_{j+1})$ one of the two following relations holds:

$$a_j \succ a_{j+1} \quad \text{(preference)}, \qquad a_j \sim a_{j+1} \quad \text{(indifference)}.$$

In the original version of UTA [26], a single error $\sigma(a)$ is introduced for each reference action $a \in A_R$ and minimised. Later, Y. Siskos and D. Yannacopoulos [60] introduced two errors, leading to better results (Fig. 2). This variant of UTA is now called UTASTAR.

UTASTAR uses linear programming techniques to find additive value functions which are as consistent as possible with the ranking on $A_R$, through the following steps:

1) Express the global value of the reference actions $u[g(a_j)]$, $j = 1, \dots, k$, first in terms of the marginal values $u_i(g_i)$, then in terms of the variables

$$w_{il} = u_i(g_i^{l+1}) - u_i(g_i^l) \ge 0, \quad \forall i = 1, \dots, n \text{ and } l = 1, \dots, \alpha_i - 1,$$

by means of the relations

$$u_i(g_i^1) = 0, \qquad u_i(g_i^l) = \sum_{t=1}^{l-1} w_{it}, \quad \forall i \text{ and } l > 1.$$


2) Introduce two error functions $\sigma^+$ and $\sigma^-$ on $A_R$ by writing, for each pair of consecutive actions in the ranking, the analytic expressions:

$$\Delta(a_j, a_{j+1}) = u[g(a_j)] - \sigma^+(a_j) + \sigma^-(a_j) - u[g(a_{j+1})] + \sigma^+(a_{j+1}) - \sigma^-(a_{j+1}). \tag{8}$$


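3) Solve the linear program

$$\min z = \sum_{j=1}^{k} \left[\sigma^+(a_j) + \sigma^-(a_j)\right]$$

subject to the set of constraints:

$$\begin{cases} \Delta(a_j, a_{j+1}) \ge \delta & \text{if } a_j \succ a_{j+1} \\ \Delta(a_j, a_{j+1}) = 0 & \text{if } a_j \sim a_{j+1} \end{cases} \quad \forall j, \tag{9}$$

$$\sum_{i=1}^{n} \sum_{l=1}^{\alpha_i - 1} w_{il} = 1, \tag{10}$$

$$w_{il} \ge 0 \;\; \forall i = 1, \dots, n,\; l = 1, \dots, \alpha_i - 1; \qquad \sigma^+(a_j) \ge 0,\; \sigma^-(a_j) \ge 0, \;\; j = 1, \dots, k, \tag{11}$$

$\delta$ being a small positive number.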

4) Test the existence of multiple or near optimal solutions of the linear program in step 3 (stability analysis); in case of nonuniqueness, find the mean additive value function of those (near) optimal solutions which maximise the objective functions $p_i = u_i(g_i^*) = \sum_{l=1}^{\alpha_i - 1} w_{il}$, for all $i = 1, \dots, n$, on the polyhedron (9)-(11) bounded by the new constraint:

$$\sum_{j=1}^{k} \left[\sigma^+(a_j) + \sigma^-(a_j)\right] \le z^* + \varepsilon, \tag{12}$$

$z^*$ being the optimal value of the linear program in step 3 and $\varepsilon$ a very small positive number.
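Steps 1-3 can be made concrete with a small checker, sketched here under stated assumptions. It does not solve the linear program (that would be handed to an LP solver such as SciPy's `scipy.optimize.linprog`); it evaluates a candidate solution instead: it builds the marginal values from the steps $w_{il}$ as in step 1, computes $\Delta$ as in (8), tests constraints (9)-(11), and returns the objective $z$. To stay short it assumes that every reference action performs exactly at breakpoints $g_i^l$; the names, the value of $\delta$ and the data are hypothetical.

```python
from itertools import accumulate
from math import isclose

def marginal_values(steps):
    """Step 1: u_i(g_i^1) = 0 and u_i(g_i^l) = w_i1 + ... + w_i,l-1."""
    return [0.0] + list(accumulate(steps))

def global_values(w, actions):
    """u[g(a_j)] for every reference action.

    w[i] lists the steps w_il of criterion i; actions[j][i] is the 0-based
    index of the breakpoint at which a_j performs on criterion i
    (simplifying assumption: performances lie exactly on breakpoints)."""
    u = [marginal_values(steps) for steps in w]
    return [sum(u[i][l] for i, l in enumerate(a)) for a in actions]

def delta(u, sp, sm, j):
    """Eq. (8) for the consecutive actions a_j, a_{j+1} (0-based j);
    sp and sm hold the errors sigma+ and sigma-."""
    return (u[j] - sp[j] + sm[j]) - (u[j + 1] - sp[j + 1] + sm[j + 1])

def check_candidate(w, actions, relations, sp, sm, delta_min=0.05, tol=1e-9):
    """Return (feasible, z) for a candidate solution of the step-3 LP.

    relations[j] is '>' if a_j is preferred to a_{j+1}, '~' if indifferent."""
    u = global_values(w, actions)
    steps = [x for row in w for x in row]
    feasible = all(x >= -tol for x in steps + list(sp) + list(sm))  # (11)
    feasible = feasible and isclose(sum(steps), 1.0, abs_tol=tol)   # (10)
    for j, rel in enumerate(relations):                             # (9)
        d = delta(u, sp, sm, j)
        if rel == '>':
            feasible = feasible and d >= delta_min - tol
        else:
            feasible = feasible and abs(d) <= tol
    z = sum(sp) + sum(sm)  # objective of step 3
    return feasible, z

# Hypothetical data: two criteria with three breakpoints each and three
# reference actions ranked a1 > a2 ~ a3.
w = [[0.2, 0.3], [0.1, 0.4]]
actions = [(2, 1), (1, 1), (0, 2)]
relations = ['>', '~']
print(check_candidate(w, actions, relations, [0, 0, 0], [0, 0, 0]))
print(check_candidate(w, actions, relations, [0, 0, 0], [0, 0.2, 0]))
```

With zero errors the indifference $a_2 \sim a_3$ is violated, since their global values are 0.3 and 0.5; a positive error $\sigma^-(a_2) = 0.2$ restores feasibility at a cost $z = 0.2$, which is the kind of trade-off the linear program of step 3 minimises.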


Variants and Meta-UTA Techniques 


After the development of the UTA method several vari- 
ants have been developed incorporating different forms 
of global preference or different forms of optimality cri- 
teria used in the linear programming formulation. The 
main variants include: 

• Inferring u from pairwise comparisons [26].

• Maximising Kendall’s τ, a consistency measure between the two rankings, via a mixed linear programming formulation [26].

• Inferring u from assignment examples in the case of problematic β ([14,22,26,78]).

• Optimising lexicographic criteria without discretization of the criteria scales $G_i$ [41].

• Inferring u in the presence of nonmonotonic preferences on the criteria evaluation scales [13].
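For reference, Kendall’s τ between two rankings of the same k actions is the number of concordant pairs minus the number of discordant pairs, divided by $k(k-1)/2$. The sketch below computes this no-ties version (τ-a) directly; it is only the consistency measure, not the mixed linear programming formulation of [26], and the function name and rankings are hypothetical.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau (tau-a) between two rankings of the same actions.

    rank_a[j] and rank_b[j] are the positions of action j in each ranking;
    pairs tied in either ranking count as neither concordant nor discordant."""
    k = len(rank_a)
    score = 0
    for p, q in combinations(range(k), 2):
        sign = (rank_a[p] - rank_a[q]) * (rank_b[p] - rank_b[q])
        score += (sign > 0) - (sign < 0)  # +1 concordant, -1 discordant
    return score / (k * (k - 1) / 2)

given = [1, 2, 3, 4]     # ranking provided by the decision maker
inferred = [1, 3, 2, 4]  # ranking induced by an inferred value function
print(kendall_tau(given, inferred))
```

Swapping one adjacent pair out of six turns one concordant pair into a discordant one, giving τ = (5 − 1)/6 = 2/3; τ = 1 means the inferred ranking reproduces the given one exactly.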

Other techniques, named meta-UTA, aim at improving the value function with respect to near optimality analysis or to its exploitation for decision support.

D.K. Despotis, Yannacopoulos and C. Zopounidis [12] propose to minimise the dispersion of the errors (Tchebycheff criterion) within step 4 of UTA. In the case where UTA gives a sum of errors equal to zero ($z^* = 0$), M. Beuthe and G. Scanella [5] propose the meta-UTA techniques UTAMP1, maximising δ (the minimum difference between the global values of two consecutive reference actions) in near optimality analysis, and UTAMP2, maximising δ plus the minimum of the marginal value steps $w_{il}$ of UTA’s step 1.

Beuthe and Scanella [7] propose similar techniques 
for the case of z* > 0 and provide some comparative 
analysis results for different UTA variants. 

Finally, Siskos [51] suggests the construction of 
fuzzy outranking relations based on multiple value 
functions u provided by UTA’s near optimality analy- 
sis. 


Other Disaggregation Methods 


The disaggregation logic has been employed in almost all aggregation models in multicriteria analysis. Of course, in some cases, it is not easy to infer aggregation models or procedures from their output.

A first attempt to infer ELECTRE III from a given ranking was made in [45], without satisfactory results. L.N. Kiss et al. [33] developed the ELECCALC system, which indirectly estimates the parameters of the ELECTRE II method from the decision maker’s responses to questions of the system regarding his/her global preferences. V. Mousseau and R. Slowinski [40] and Mousseau et al. [39] present linear programming formulations to infer the parameters of ELECTRE TRI, except the veto thresholds, from assignment examples in the case of problematic β.

P. Spiliopoulos [63] and Siskos and N.F. Matsatsi- 
nis [55] employ the UTA method iteratively as a model 
to analyse consumers’ behavior, through the MARKEX 
system providing decision support during new product 
development [37]. 

The disaggregation of measurable judgements in 
order to infer additive value functions for prediction 
purposes is proposed in [32]. Siskos et al. [58] devel- 
oped an ordinal regression formulation to measure cus- 
tomer satisfaction by disaggregating multiple satisfac- 
tion judgements. The method was implemented in the 
MUSA system and it was applied in several real-world 
studies (for instance, see [38]). 

UTA was used in several works for conflict resolu- 
tion in multi-actor decision situations ([8,24,36]). 

Additive value functions are usually assessed in 
two phases: in the first phase the marginal value 
functions are assessed under the preferential inde- 
pendence conditions and in the second phase their 
weights are assessed by disaggregating a ranking of 
a small number of reference actions [50]. Two-phase disaggregation methods were implemented through the MACBETH system ([2,3]) and the MIIDAS system ([59,64]).

The general scheme of the disaggregation philos- 
ophy is also employed in other approaches, includ- 
ing rough sets ([16,43,62,72]), machine learning [44] 
and neural networks ([35,66]). All these approaches are 
used to infer some form of decision model (a set of de- 
cision rules or a network) from given decision results 
involving assignment examples, ordinal or measurable 
judgements. 


Disaggregation Under Uncertainty 


Within the framework of multicriteria decision aid under uncertainty, Siskos [52] developed a specific version of UTA (Stochastic UTA), in which the aggregation model to infer from a reference ranking is an additive utility function of the form:

$$u(\delta^a) = \sum_{i=1}^{n} \sum_{j=1}^{\alpha_i} \delta_i^a(g_i^j)\, u_i(g_i^j) \tag{13}$$

subject to normalization constraints (5)-(6), with the following additional notation (see also Fig. 3):

• $\delta_i^a$ is the distributional evaluation of action a on the ith criterion;

• $\delta_i^a(g_i^j)$ is the probability that the performance of action a on the ith criterion is $g_i^j$;

• $u_i(g_i^j)$ is the marginal value of the performance $g_i^j$;

• $\delta^a$ is the vector of distributional evaluations of action a;

• $u(\delta^a)$ is the global utility of action a.

[Figure 3: distributional evaluation $\delta_i^a(g_i^j)$ of an action and the marginal value function $u_i$ over the evaluation scale $G_i$]

Preference Disaggregation, Figure 3
Distributional evaluation and marginal value function
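Equation (13) is an expectation: each marginal value at a breakpoint is weighted by the probability that the action performs at that level. A minimal sketch with hypothetical distributions and marginal values (two criteria, three breakpoints each), assuming the probabilities of each distributional evaluation sum to 1; the function name is illustrative.

```python
from math import isclose

def expected_utility(distributions, marginal_values):
    """Eq. (13): sum over criteria i and breakpoints j of
    delta_i^a(g_i^j) * u_i(g_i^j)."""
    total = 0.0
    for probs, values in zip(distributions, marginal_values):
        if not isclose(sum(probs), 1.0):
            raise ValueError("each distributional evaluation must sum to 1")
        total += sum(p * v for p, v in zip(probs, values))
    return total

# Hypothetical marginal values u_i(g_i^1..3): worst levels are 0 and the best
# levels sum to 1, as required by (5)-(6).
marginal_values = [[0.0, 0.2, 0.5], [0.0, 0.3, 0.5]]
delta_a = [[0.2, 0.5, 0.3], [0.0, 0.4, 0.6]]  # delta_i^a(g_i^j) for action a
print(expected_utility(delta_a, marginal_values))  # about 0.67
```

If every distribution puts probability 1 on a single breakpoint, the expression reduces to the deterministic additive value (4).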


Of course, the additive utility function (13) has the same properties as the value function:

$$u(\delta^a) > u(\delta^b) \iff a \succ b \quad \text{(preference)},$$
$$u(\delta^a) = u(\delta^b) \iff a \sim b \quad \text{(indifference)}.$$

Similarly to the UTA method described above, the stochastic UTA method disaggregates a ranking of reference actions [53]. The algorithmic procedure can be expressed in the following way:
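1) Express the global expected utilities of the reference actions $u(\delta^{a_j})$ in terms of the variables $w_{il} = u_i(g_i^{l+1}) - u_i(g_i^l) \ge 0$.

2) Introduce two error functions $\sigma^+$ and $\sigma^-$:

$$\begin{aligned} \Delta(a_j, a_{j+1}) = {} & u(\delta^{a_j}) - \sigma^+(a_j) + \sigma^-(a_j) && (14)\\ & - u(\delta^{a_{j+1}}) + \sigma^+(a_{j+1}) - \sigma^-(a_{j+1}). && (15) \end{aligned}$$

3) Solve the linear program:

$$\min z = \sum_{j=1}^{k} \left[\sigma^+(a_j) + \sigma^-(a_j)\right]$$

subject to the set of constraints:

$$\begin{cases} \Delta(a_j, a_{j+1}) \ge \delta & \text{if } a_j \succ a_{j+1} \\ \Delta(a_j, a_{j+1}) = 0 & \text{if } a_j \sim a_{j+1} \end{cases} \quad \forall j, \tag{16}$$

$$\sum_{i=1}^{n} \sum_{l=1}^{\alpha_i - 1} w_{il} = 1,$$

$$w_{il} \ge 0 \;\; \forall i, l; \qquad \sigma^+(a_j) \ge 0,\; \sigma^-(a_j) \ge 0, \;\; j = 1, 2, \dots, k,$$

$\delta$ being a small positive number.

4) Test the existence of multiple or near optimal solutions.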

Of course, the ideas employed in all variants of the 
UTA method are also applicable in the same way in the 
case of the stochastic UTA. 


Disaggregation in Multi-objective Optimization 


The disaggregation approach is also applicable in the 
specific field of multi-objective optimization, mainly in 
the field of linear programming with multiple objective 
functions. For instance, in the classical methods of [18] 
and [73] the weights of the linear combinations of the 
objectives are inferred locally from trade-offs or pair- 
wise judgements given by the decision maker at each 
iteration of the methods. 

T.J. Stewart [67] proposed a procedure of pruning the decision alternatives using the UTA method, while Jacquet-Lagrèze, R. Meziani and Slowinski [23] developed a disaggregation method, similar to UTA, to assess a whole value function of multiple objectives for linear programming systems.

Siskos and Despotis [54] proposed an interactive 
method named ADELAIS that uses UTA iteratively, 
in order to optimise an additive value function within 
the feasible region defined on the basis of the satisfac- 
tion levels determined during each iteration. Finally, 
A. Tangian and M.J. Gruber [68] propose a different 
form of disaggregation techniques for the assessment of 
quadratic multi-objective functions. 


Interactive Disaggregation Systems 


Apart from the normative features that a disaggregation approach provides, it also constitutes a basis for the interaction between analysts and decision makers. The issues involved during this interactive dialog and negotiation include:

• the consistency between the assessed preference model and the a priori preferences of the decision maker;

• the assessed values (values, weights, utilities, etc.); and

• the overall evaluation of potential actions (extrapolation output).

A general interaction scheme is given in Fig. 4.

Several decision support systems have been developed on the basis of disaggregation methods, most of them UTA-based. They include: PREFCALC [21], MINORA-MIIDAS [59], ADELAIS [54], MARKEX [55], UTA+ [34], FINEVA [81], FINCLAS [75], PREFDIS [77], MUSTARD [6].


[Figure 4 flowchart: problem formulation; criteria modelling; expression of the DM's global preferences on a reference set; assessment of the DM's preference model; decision points asking whether the consistency between the DM's preferences and those resulting from the model is satisfactory, whether there is any intention of modifying the preference model, the DM's preferences, the criteria modelling or the problem formulation, and whether the preference model is accepted; outcomes are extrapolation on the whole set of alternatives or judging the preference model unacceptable]

Preference Disaggregation, Figure 4
Simplified decision support process based on disaggregation analysis


Applications and Conclusions


From their first appearance in 1978 onwards, preference disaggregation methods have been applied to several real-world decision making problems in the fields of financial management, marketing, environmental management, and human resources management. The following list reports some of these applications (the list is not exhaustive):

• financial management
  - venture capital [56];
  - portfolio selection and management ([20,79]);
  - business failure prediction ([74,76]);
  - business financing ([61,75,81]);
  - country risk assessment ([10,42,80]);

• marketing
  - marketing of new products ([4,55,63]);
  - sales strategy problems ([46,57]);
  - customer satisfaction ([38,58]);

• environmental management ([15,19,53]);

• industrial project evaluation [22];

• job evaluation [64].

The above applications have provided insight into the applicability of preference disaggregation analysis to real-world decision problems and into its efficiency. Future research developments required to further explore the potential of the preference disaggregation philosophy within the context of multicriteria decision aid include:
Inference of More Sophisticated Aggregation Models
by Disaggregation


Currently most preference disaggregation methods lead to the assessment of additive value functions. However, in many cases such additive models fail to capture the decision maker’s preferences in a satisfactory way, either because their underlying assumption (preferential independence) does not hold, or because they do not consider the existing interactions between the criteria. In that respect it is worth examining the development of alternative aggregation models, for instance in the form of multiplicative value functions or even outranking relations.


Evaluation of the Aggregation-Disaggregation 
Relationship 


Both aggregation and disaggregation procedures share 
the same objective: to aggregate all criteria into a global 
preference model that will support decision making. Of 
course, as this article has shown, there are significant differences in the processes employed by the two
approaches to accomplish this objective. However, it 
would be interesting to explore the relationship of ag- 
gregation and disaggregation procedures in terms of 
similarities and/or dissimilarities regarding the evalu- 
ation results obtained by both approaches. This will en- 
able the identification of the reasons and the condi- 
tions under which aggregation and disaggregation pro- 
cedures will lead to different or the same results. 


Experimental Evaluation 
of Disaggregation Procedures 


Real-world applications can be used to illustrate the de- 
cision support provided by preference disaggregation 
approaches in practice. However, a thorough investiga- 
tion of their performance and ability to capture the de- 
cision maker’s preferences requires the conduct of ex- 
perimental studies. Through such studies it is possible 
to examine how different data conditions and preferen- 
tial structures affect the efficiency of preference disag- 
gregation approaches and the aggregation models used. 


Implementation of Disaggregation Methods 
into DSSs and Group DSSs 


Preference disaggregation procedures operate in an interactive and iterative way. The decision maker interacts with the procedure to achieve a consistent representation of his/her preferences in the aggregation model through an iterative trial and error process. DSS technology is well adapted to both these features, enabling the decision maker to take full advantage, in real time, of what preference disaggregation approaches provide. Furthermore, this framework could be extended, bearing in mind that many crucial decisions are taken by a group of decision makers working in a negotiating or cooperative environment to reach a consensus decision. Group DSSs provide the means required to support such decision making situations.
Multicriteria decision aid (MCDA) has emerged during the last three decades as a promising scientific field in operations research and management science, and today it has consolidated its position with a wide set of methods and tools that address, in a realistic and flexible way, decision problems in which multiple conflicting criteria must be considered. The tools provided by MCDA are not just mathematical models which aggregate several criteria, points of view or attributes; they are also decision support oriented. Indeed, ‘support’ is a key issue in MCDA, implying that the models are not developed through a straightforward sequential process in which the decision maker’s role is passive. Instead, an iterative process is employed to analyze the preferences of the decision maker and represent them as consistently as possible in an appropriate decision model. Throughout this process the interaction of the decision maker with the analyst is essential, providing significant preferential information to the analyst. This decision support nature of MCDA is the basic feature that distinguishes it from the classical models employed by the optimization approach.

Within the methods and tools provided by MCDA, 
several approaches and theoretical disciplines can be 
defined, although their distinction and the existing 
boundaries among them are often difficult to deter- 
mine. In fact several authors provided different catego- 
rizations of the MCDA approaches ([19,21,34,36]). Fol- 
lowing the proposal of P.M. Pardalos et al. [15] and C. 
Zopounidis [40], one can identify four major streams in 
MCDA: 

1) multi-objective programming ([30,32,35]), 

2) multi-attribute utility theory [11], 

3) outranking relations ([18,20]), 

4) preference disaggregation analysis [10]. 

The differences among these approaches can be identified in terms of the types of problems that they address, in terms of the preference models that they develop, as well as in terms of the process that is employed
to develop these models. As far as the type of problem 
is concerned, multi-objective programming addresses 
decision problems where there is not a finite set of al- 
ternative solutions, but instead the possible alternatives 
are determined implicitly through a set of constraints 
imposed by the nature of the problem. On the other 
hand, multi-attribute utility theory, outranking rela- 
tions, and preference disaggregation analysis are used 
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to study decision problems involving the evaluation of 
a well-defined finite set of alternatives. 

Concerning the form of the model that is employed, in multi-objective programming the decision problem is formulated as a mathematical programming model, with more than one objective function representing the objectives of the decision maker. In both multi-attribute utility theory and preference disaggregation analysis the model is a utility function, either additive or multiplicative. Finally, outranking relations are based on pairwise judgments between the alternatives of the form “alternative a is at least as good as alternative b”.

Although there are significant differences between the four MCDA approaches regarding the types of problems that they address and the model formulation, the most significant differences involve the information that they require from the decision maker and the procedure that is used to elicit this information. In multi-objective programming the process employed to develop the model and to obtain the best compromise solution is an interactive one. Once the decision problem has been consistently formulated as a mathematical programming problem, the efficient (nondominated) set of solutions is determined. The decision maker is then asked to provide some preferential information, usually in terms of reference points, indicating the way that the efficient set should be investigated. In multi-attribute utility theory, a direct interrogation process is employed to elicit information from the decision maker concerning the trade-offs among the conflicting criteria, attributes or points of view. These trade-offs are then used to construct the global preference model in the form of a utility function that the decision maker implicitly uses to make decisions. The modeling of the decision maker’s preferences in outranking relations is achieved similarly to the process used in multi-attribute utility theory (direct interrogation of the decision maker), although the type of information required differs (the decision maker must determine the weights of the evaluation criteria, as well as preference, indifference and veto thresholds). Compared to the other three approaches, preference disaggregation requires the minimal amount of information from the decision maker. The weights, trade-offs, reference points, or any other preferential information do not have to be determined a priori by the decision maker. Instead, the decision maker, based on his/her past experience, is asked to provide some characteristic examples of his/her decision making policy through the evaluation of a ‘reference’ set of alternatives. Then, using ordinal regression techniques, the utility function that has been implicitly used by the decision maker is estimated.

This indirect estimation of the decision maker’s preferences in preference disaggregation analysis is an appealing characteristic compared to the direct interrogation procedures employed in multi-objective programming, multi-attribute utility theory and outranking relations.

Preference disaggregation analysis has found an appropriate field of application in financial decision making. Most financial decision making problems, such as corporate failure prediction, credit granting, portfolio selection and management, venture capital investment and country risk assessment, have a repetitive character [37], while the decisions have to be taken in real time. These two characteristics of financial decision making problems are in accordance with the general methodological framework of preference disaggregation analysis. The repetitive character of financial decision problems enables the decision analyst to exploit the experience and the past decisions of the decision maker (financial/credit analyst, portfolio manager, etc.) in order to develop the appropriate decision model. Once this model is validated, it can be used to support real-time financial decision making. Zopounidis [40] provides a comprehensive discussion of the applications of preference disaggregation analysis, and of multicriteria analysis in general, in the study of financial decision making problems, while the books ([38,39]) and [7] illustrate the application of preference disaggregation analysis in venture capital investments, business failure prediction, and portfolio selection and management.

The rest of this article focuses in more detail on the basic concepts, principles and techniques used in preference disaggregation analysis. More specifically, Section 2 describes the foundations of preference disaggregation analysis, while Sections 3, 4 and 5 illustrate how preference disaggregation analysis can be applied in the study of ranking, sorting and choice problems, respectively, using simple illustrative examples from the field of credit granting. Finally, Section 6 concludes the article and discusses some possible future research directions in this field.


Preference Disaggregation Analysis 


The preference disaggregation approach refers to the 
analysis (disaggregation) of the global preferences of 
the decision maker to deduce the relative importance 
of the evaluation criteria, using ordinal regression tech- 
niques based mainly on linear programming formula- 
tions. 

Preference disaggregation analysis is based on the simple finding that, generally, in real world situations, decision makers are either unable or unwilling to provide specific information about their preferences, such as weights or trade-offs, in a direct way. Even if this is possible, the procedure employed to elicit such information from the decision maker is time-consuming, which may prevent its practical application to real decision problems where decisions have to be taken in real time.

By contrast, rather than describing the procedure that leads to the final decision, it would be easier for the decision maker to provide the analyst with the actual decision he/she would take, considering the specific characteristics and conditions of the problem at hand.

For instance, when a committee of professors investigates the profiles of candidates for a graduate course in order to rank them from the most appropriate to the least appropriate ones, the past evaluations that the committee has made can be used. These evaluations can either have the form of a ranking or express the intensity of preference between two candidates (how many times an alternative is preferred to another alternative) [12]. Furthermore, it is even possible to consider more detailed information that the decision maker can provide, for instance the ranking of the alternatives on each evaluation criterion combined with the ranking of the criteria according to their significance [1].

The purpose of gathering such information from the decision maker is to have some representative examples of decisions taken by the decision maker. These examples reflect the decision policy and the preferences that the decision maker has implicitly used in making the decision. Consequently, through the analysis of such decision instances, the analyst can derive useful information concerning the global preference system of the decision maker.

When making decisions, decision makers evaluate each alternative over a set of factors, criteria, attributes or points of view that affect the overall evaluation of the alternatives. These partial evaluations are then aggregated to derive the final decision. Following the same approach, the aim of preference disaggregation analysis is to disaggregate the overall decision into the partial evaluations on each of the evaluation criteria. The disaggregation should be performed in such a way that the aggregation of the partial evaluations leads to the overall evaluation that the decision maker provided. If this is not possible, then the deviations that occur should be minimized.

In such a disaggregation process, the form of the partial evaluations and the selection of the model used to aggregate them are two key issues. The first preference disaggregation approaches employed a simple weighted sum model of the form:

$$\sum_j w_j d_{ij},$$

where $w_j$ denotes the weight of criterion $j$ and $d_{ij}$ denotes the distance of the evaluation of alternative $i$ on criterion $j$ from the ideal point for this criterion ([6,16,31]). However, such a model is an oversimplification of real world situations, and it suffers from two major drawbacks.

• Firstly, this formulation implies that both the overall value (score) of an alternative and its partial value on each evaluation criterion are linear functions (the weights of the evaluation criteria are independent of the criteria’s values).

• Furthermore, this model is not appropriate for criteria that are measured on a qualitative scale. Such criteria would require the transformation of the qualitative scale into a numerical one, which is not unique and may not be in accordance with the preferential information that the qualitative scale provides.

To overcome such limitations, utility functions may be used. Utility functions are nonlinear increasing or decreasing functions of the criteria’s values, indicating the value of the alternatives in the global preference system of the decision maker. The most common forms of utility functions used in practice are the additive form and the multiplicative form. An additive utility function can be expressed as:

$$U(\mathbf{g}) = \sum_j u_j(g_j),$$

whereas a multiplicative utility function can be expressed as:

$$U(\mathbf{g}) = \prod_j \bigl[k\,u_j(g_j) + 1\bigr] - 1,$$

where $U(\mathbf{g})$ denotes the global utility of an alternative described by the vector of criteria $\mathbf{g}$, $u_j(g_j)$ is the partial or marginal utility of an alternative on criterion $g_j$, and $k$ is a scaling factor. R.L. Keeney and H. Raiffa in their book [11] provide a comprehensive discussion of utility theory and the underlying assumptions of the various types of utility functions.

The aim of preference disaggregation analysis is to estimate the marginal utilities of each evaluation criterion so that their aggregation, using either an additive or a multiplicative utility function, results in an evaluation of the alternatives that is consistent with the decision maker’s preferences and judgement policy. This estimation is achieved through mathematical programming techniques, with the objective being the optimization of a measure of consistency. Multiplicative utility functions can generally be more appropriate for modeling decision makers’ preferences in real world decisions, since they take into account possible interactions among the decision maker’s preferences on different criteria [14]. However, their estimation results in nonlinear programs that are computationally intensive and difficult to solve. Consequently, additive utility functions are commonly used in practice instead of multiplicative ones, since they provide a simple but powerful approach for modeling decision makers’ preferences in multiple criteria decision making problems.

A well known preference disaggregation method which incorporates additive utility functions to model the decision maker’s preferences is the UTA method (UTilités Additives) proposed in [10]. The subsequent three subsections illustrate the UTA method as it has been proposed in [10], as well as some of its variants which have been proposed for the study of ranking, sorting and choice decision problems. More specifically, the UTASTAR method [28] for ranking problems, the UTADIS method ([3,9,42]) for sorting problems, and two methodologies proposed in [22] and [33] for choice problems are presented.


Preference Disaggregation Analysis 
in Ranking Problems 


The UTA method performs an ordinal regression based on the preference disaggregation approach of MCDA. Following the general methodological framework of preference disaggregation analysis, the decision maker is asked to provide a ranking (pre-ordering) of a reference set of alternatives, which will be used to construct the additive utility model. This reference set may consist of examples of past decisions, or of a subset of the alternatives under consideration for which the decision maker can express a global evaluation.

Given this pre-ordering, the aim of the UTA method is to estimate an additive utility function that is as consistent as possible with the decision maker’s preferences (pre-ordering). The marginal utilities are piecewise linear (Fig. 1). The range of values of each criterion $g_i$ is divided into $\alpha_i - 1$ equal intervals, whose endpoints $g_{i*} = g_i^1, g_i^2, \ldots, g_i^{\alpha_i} = g_i^*$ run from the least to the most preferred value. The number of these subintervals can be specified by the decision maker, or it can be determined by the analyst so that at least one alternative falls into each subinterval. The estimation of the marginal utilities is achieved through the following linear programming formulation:


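$$
\begin{aligned}
\min\ & F = \sum_{a \in A} \sigma(a)\\
\text{s.t.}\ & U(a) - U(b) + \sigma(a) - \sigma(b) \ge \delta && \text{if } aPb,\ \forall a, b \in A,\\
& U(a) - U(b) + \sigma(a) - \sigma(b) = 0 && \text{if } aIb,\ \forall a, b \in A,\\
& u_i(g_i^{j+1}) - u_i(g_i^j) \ge 0 && \forall i, j,\\
& \sum_i u_i(g_i^*) = 1,\quad u_i(g_{i*}) = 0 && \forall i,
\end{aligned}
$$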
where $P$ and $I$ represent the preference and the indifference relations, respectively, $A$ is the set of reference alternatives used to develop the additive utility model, $U(a) = \sum_{i=1}^{m} u_i[g_i(a)]$ is the global utility of an alternative $a \in A$ ($m$ being the number of criteria), $\sigma(a)$ is an error function ($\sigma(a) \ge 0$), and $\delta$ is a threshold used to ensure the strict preference of an alternative $a$ over an alternative $b$ ($\delta > 0$).
Preference Disaggregation Approach: Basic Features, Examples from Financial Decision Making, Figure 1
Piecewise linear form of marginal utilities
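A piecewise linear marginal utility can be evaluated at any criterion value by linear interpolation between its breakpoints $g_i^1, \ldots, g_i^{\alpha_i}$. The Python sketch below is only an illustration under stated assumptions: the function name `piecewise_utility` and the sample numbers are hypothetical and not part of the UTA method as published. Breakpoints are assumed to be listed from the least to the most preferred value, so a criterion with decreasing preference is handled simply by passing decreasing breakpoints.

```python
def piecewise_utility(breakpoints, utilities, value):
    """Evaluate a piecewise linear marginal utility u_i at a criterion value.

    breakpoints: g_i^1, ..., g_i^alpha, ordered from least to most preferred
                 (increasing for "more is better", decreasing otherwise).
    utilities:   u_i(g_i^1), ..., u_i(g_i^alpha) at those breakpoints.
    """
    if len(breakpoints) != len(utilities) or len(breakpoints) < 2:
        raise ValueError("need at least two breakpoints with matching utilities")
    for j in range(len(breakpoints) - 1):
        lo, hi = breakpoints[j], breakpoints[j + 1]
        # Relative position of the value inside [lo, hi]; valid for either scale direction.
        t = (value - lo) / (hi - lo)
        if 0.0 <= t <= 1.0:
            return utilities[j] + t * (utilities[j + 1] - utilities[j])
    raise ValueError("value lies outside the criterion scale")


# Hypothetical "lower is better" criterion with breakpoints 93, 73, 53, 33.
print(piecewise_utility([93, 73, 53, 33], [0.0, 0.1, 0.2, 0.4], 78))
```

Here the value 78 lies three quarters of the way from 93 to 73, so it receives three quarters of the utility step between those two breakpoints.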


Solving the above linear program yields a utility function that minimizes the deviations $\sigma(a)$. However, it has been observed that in many cases this utility function is not the one most consistent with the pre-ordering provided by the decision maker. By contrast, utility functions that correspond to sub-optimal solutions of the above linear program often provide a more consistent representation of the decision maker’s preferences. Furthermore, the above linear program often has multiple optimal solutions (degeneracy). Therefore, in order to examine these two essential issues, in a second stage the UTA method proceeds to a post-optimality analysis, trying to find some characteristic sub-optimal or multiple optimal solutions.
A heuristic post-optimality procedure proposed by E. Jacquet-Lagrèze and Y. Siskos [10] finds the solutions that correspond to extreme weights of the evaluation criteria. This is achieved by incorporating an additional constraint of the form $F \le F^* + k(F^*)$, where $F^*$ is the optimal value of the objective function obtained by solving the above linear program and $k(F^*)$ is a small portion of it. The new linear program is then solved with the objective being the minimization or the maximization of the weight of each criterion:

$$\min\ (\text{or } \max)\ \sum_{i=1}^{m} p_i\, u_i(g_i^*) \quad \text{with } p_i = 0 \text{ or } 1,\ \forall i.$$


Y. Siskos and D. Yannacopoulos [28] proposed the UTASTAR method, an improved variant of the UTA method. The differences between the two methods lie in the following two aspects:

1) Instead of using the marginal utilities $u_i(g_i^j)$ as decision variables, the difference $w_{ij}$ between the marginal utilities of two successive values of a criterion is used: $w_{ij} = u_i(g_i^{j+1}) - u_i(g_i^j) \ge 0$. In this way the constraints $u_i(g_i^{j+1}) - u_i(g_i^j) \ge 0$ are transformed into nonnegativity constraints $w_{ij} \ge 0$.

2) Two error functions are used instead of one. The two error functions, denoted $\sigma^+(a)$ and $\sigma^-(a)$, represent the overestimation and underestimation errors that may occur in the decision maker’s pre-ordering. The overestimation error involves alternatives that have been ranked by the decision maker higher than their rank according to the additive utility model, while the underestimation error involves alternatives that have been ranked by the decision maker lower than their rank according to the additive utility model.

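Considering these two differences, the new linear program that is solved is the following:

$$
\begin{aligned}
\min\ & \sum_{a \in A} \left[\sigma^+(a) + \sigma^-(a)\right]\\
\text{s.t.}\ & U(a) - U(b) + \sigma^+(a) - \sigma^-(a) - \sigma^+(b) + \sigma^-(b) \ge \delta && \text{if } aPb,\\
& U(a) - U(b) + \sigma^+(a) - \sigma^-(a) - \sigma^+(b) + \sigma^-(b) = 0 && \text{if } aIb,\\
& \sum_{i=1}^{m} \sum_{j=1}^{\alpha_i - 1} w_{ij} = 1,\\
& w_{ij} \ge 0,\ \sigma^+(a) \ge 0,\ \sigma^-(a) \ge 0 && \forall a \in A,\ \forall i, j,
\end{aligned}
$$

where $u_i(g_i^j) = \sum_{k=1}^{j-1} w_{ik}$ for all $i$ and $j$.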
The UTA and the UTASTAR methods have been 
applied successfully in a variety of decision prob- 
lems, including environmental decisions [24], market- 
ing decisions and sales strategy problems ([13,17,23]), 
customer satisfaction [25], venture capital invest- 
ments [29], country risk assessment [2], evaluation of 
bankruptcy risk [37], as well as research and develop- 
ment decisions [5]. 
Preference Disaggregation Approach: Basic Features, Examples from Financial Decision Making, Table 1
Evaluations of the firms

Firm   EBIT/TA   NI/NW   TL/TA   (CA-I)/CL
F1     10%       24%     33%     2.97
F2     5%        9%      78%     1.11
F3     13%       18%     80%     0.84
F4     15%       -8%     72%     0.85
F5     8%        -30%    93%     0.60


In the subsequent subsection, a simple example is 
used to illustrate how the UTASTAR method can be ap- 
plied in the study of ranking problems. 


Example 1 Consider a decision problem concerning credit granting. Five firms (F1, F2, F3, F4, F5) are seeking financing from a bank. The credit managers of the bank evaluate the firms along four financial ratios:
1) earnings before interest and taxes/total assets (EBIT/TA);
2) net income/net worth (NI/NW);
3) total liabilities/total assets (TL/TA); and
4) (current assets-inventories)/current liabilities [(CA-I)/CL].
For the ratios EBIT/TA, NI/NW and (CA-I)/CL the preferences of the credit managers are increasing functions of their values. Hence, the higher the values of these ratios, the more creditworthy a firm is. By contrast, for the ratio TL/TA the preference of the credit managers is a decreasing function of its value, since high values of this ratio mean that the firm is highly indebted. Table 1 illustrates the evaluations of the firms along these four criteria.
According to the credit policy of the bank, the following preferential structure (pre-ordering) is defined:

$$F_1 \ P \ F_2 \ P \ F_3 \ P \ F_4 \ P \ F_5.$$


The first step of the UTASTAR method consists of making explicit the utilities of the alternatives (firms). The range of values of each criterion (financial ratio) is divided into a number of subintervals, so that there is at least one firm belonging to each interval. Following this rule, the following scales are retained:

$$
\begin{aligned}
[g_{1*}, g_1^*] &= [5\%,\ 7.5\%,\ 10\%,\ 12.5\%,\ 15\%],\\
[g_{2*}, g_2^*] &= [-30\%,\ -16.5\%,\ -3\%,\ 10.5\%,\ 24\%],\\
[g_{3*}, g_3^*] &= [93\%,\ 73\%,\ 53\%,\ 33\%],\\
[g_{4*}, g_4^*] &= [0.6,\ 1.785,\ 2.97].
\end{aligned}
$$
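Each of these scales splits the range between the worst and the best observed value into equal subintervals (four for EBIT/TA and NI/NW, three for TL/TA, two for (CA-I)/CL). A minimal Python sketch of this subdivision follows; the helper name `equal_scale` is hypothetical, and equal-width intervals are assumed, as in the scales above.

```python
def equal_scale(worst, best, n_intervals):
    """Breakpoints g^1 (worst), ..., g^(n+1) (best) splitting [worst, best] into equal parts.

    A criterion with decreasing preference is handled by passing worst > best.
    """
    if n_intervals < 1:
        raise ValueError("need at least one interval")
    step = (best - worst) / n_intervals
    return [worst + j * step for j in range(n_intervals + 1)]


# Number of subintervals per criterion as retained in Example 1.
scales = {
    "EBIT/TA": equal_scale(5, 15, 4),          # percentages
    "NI/NW": equal_scale(-30, 24, 4),          # percentages
    "TL/TA": equal_scale(93, 33, 3),           # percentages, lower is preferred
    "(CA-I)/CL": equal_scale(0.6, 2.97, 2),
}
for name, points in scales.items():
    print(name, [round(p, 3) for p in points])
```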


Using linear interpolation, the global utilities of the firms can be written as follows:

$$
\begin{aligned}
U(F_1) &= u_1(10\%) + u_2(24\%) + u_3(33\%) + u_4(2.97),\\
U(F_2) &= u_1(5\%) + 0.11\,u_2(-3\%) + 0.89\,u_2(10.5\%) + 0.25\,u_3(93\%) + 0.75\,u_3(73\%)\\
&\quad + 0.57\,u_4(0.6) + 0.43\,u_4(1.785),\\
U(F_3) &= 0.8\,u_1(12.5\%) + 0.2\,u_1(15\%) + 0.44\,u_2(10.5\%) + 0.56\,u_2(24\%)\\
&\quad + 0.35\,u_3(93\%) + 0.65\,u_3(73\%) + 0.80\,u_4(0.6) + 0.20\,u_4(1.785),\\
U(F_4) &= u_1(15\%) + 0.37\,u_2(-16.5\%) + 0.63\,u_2(-3\%) + 0.95\,u_3(73\%) + 0.05\,u_3(53\%)\\
&\quad + 0.79\,u_4(0.6) + 0.21\,u_4(1.785),\\
U(F_5) &= 0.8\,u_1(7.5\%) + 0.2\,u_1(10\%) + u_2(-30\%) + u_3(93\%) + u_4(0.6).
\end{aligned}
$$


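Using the transformation $u_i(g_i^j) = \sum_{k=1}^{j-1} w_{ik}$, the global utilities can be expressed as:

$$
\begin{aligned}
U(F_1) &= w_{11} + w_{12} + w_{21} + w_{22} + w_{23} + w_{24} + w_{31} + w_{32} + w_{33} + w_{41} + w_{42}, && (1)\\
U(F_2) &= w_{21} + w_{22} + 0.89\,w_{23} + 0.75\,w_{31} + 0.43\,w_{41}, && (2)\\
U(F_3) &= w_{11} + w_{12} + w_{13} + 0.20\,w_{14} + w_{21} + w_{22} + w_{23} + 0.56\,w_{24}\\
&\quad + 0.65\,w_{31} + 0.20\,w_{41}, && (3)\\
U(F_4) &= w_{11} + w_{12} + w_{13} + w_{14} + w_{21} + 0.63\,w_{22} + w_{31} + 0.05\,w_{32} + 0.21\,w_{41},\\
U(F_5) &= w_{11} + 0.2\,w_{12}. && (4)
\end{aligned}
$$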
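Each coefficient in (1)-(4) is the fraction of the corresponding subinterval covered by the firm's value, clipped to $[0, 1]$, because under linear interpolation $u_i(g) = \sum_k w_{ik}\,\min\{1, \max\{0, (g - g_i^k)/(g_i^{k+1} - g_i^k)\}\}$. The Python sketch below recomputes the coefficients from Table 1 and the retained scales, and forms the constraint rows of the UTASTAR program as differences of consecutive utility rows. It is an illustration only: the names `w_coefficients` and `utility_row` are hypothetical, and the (CA-I)/CL value 1.11 for F2 is the one implied by its interpolation weights 0.57 and 0.43.

```python
# Breakpoints per criterion, ordered from worst to best (Example 1 scales).
SCALES = [
    [5, 7.5, 10, 12.5, 15],       # EBIT/TA (%)
    [-30, -16.5, -3, 10.5, 24],   # NI/NW (%)
    [93, 73, 53, 33],             # TL/TA (%), lower is better
    [0.6, 1.785, 2.97],           # (CA-I)/CL
]
# Criterion values per firm (Table 1); 1.11 for F2 is inferred from its 0.57/0.43 weights.
FIRMS = {
    "F1": [10, 24, 33, 2.97],
    "F2": [5, 9, 78, 1.11],
    "F3": [13, 18, 80, 0.84],
    "F4": [15, -8, 72, 0.85],
    "F5": [8, -30, 93, 0.60],
}


def w_coefficients(scale, value):
    """Coefficient of each increment w_ik in the interpolated marginal utility u_i(value)."""
    coeffs = []
    for lo, hi in zip(scale, scale[1:]):
        t = (value - lo) / (hi - lo)        # fraction of the step from lo towards hi reached
        coeffs.append(min(1.0, max(0.0, t)))
    return coeffs


def utility_row(values):
    """Coefficients of w_11, ..., w_14, w_21, ..., w_24, w_31, ..., w_33, w_41, w_42 in U(F)."""
    row = []
    for scale, value in zip(SCALES, values):
        row.extend(w_coefficients(scale, value))
    return row


rows = {firm: utility_row(values) for firm, values in FIRMS.items()}
print([round(c, 2) for c in rows["F2"]])

# The left-hand side of the constraint for "a P b" is the difference of the two utility rows.
order = ["F1", "F2", "F3", "F4", "F5"]
for a, b in zip(order, order[1:]):
    print(a, b, [round(x - y, 2) for x, y in zip(rows[a], rows[b])])
```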
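Taking into account the pre-ordering that was defined and the global utilities of the firms, the following linear program is formulated ($\delta = 0.05$):

$$
\begin{aligned}
\min\ & \{\sigma^+(F_1) + \sigma^-(F_1) + \sigma^+(F_2) + \sigma^-(F_2) + \sigma^+(F_3) + \sigma^-(F_3)\\
& \quad + \sigma^+(F_4) + \sigma^-(F_4) + \sigma^+(F_5) + \sigma^-(F_5)\}\\
\text{s.t.}\ & w_{11} + w_{12} + 0.11\,w_{23} + w_{24} + 0.25\,w_{31} + w_{32} + w_{33} + 0.57\,w_{41} + w_{42}\\
& \quad + \sigma^+(F_1) - \sigma^-(F_1) - \sigma^+(F_2) + \sigma^-(F_2) \ge 0.05, && (5)\\
& -w_{11} - w_{12} - w_{13} - 0.2\,w_{14} - 0.11\,w_{23} - 0.56\,w_{24} + 0.1\,w_{31} + 0.23\,w_{41}\\
& \quad + \sigma^+(F_2) - \sigma^-(F_2) - \sigma^+(F_3) + \sigma^-(F_3) \ge 0.05, && (6)\\
& -0.8\,w_{14} + 0.37\,w_{22} + w_{23} + 0.56\,w_{24} - 0.35\,w_{31} - 0.05\,w_{32} - 0.01\,w_{41}\\
& \quad + \sigma^+(F_3) - \sigma^-(F_3) - \sigma^+(F_4) + \sigma^-(F_4) \ge 0.05, && (7)\\
& 0.8\,w_{12} + w_{13} + w_{14} + w_{21} + 0.63\,w_{22} + w_{31} + 0.05\,w_{32} + 0.21\,w_{41}\\
& \quad + \sigma^+(F_4) - \sigma^-(F_4) - \sigma^+(F_5) + \sigma^-(F_5) \ge 0.05, && (8)\\
& w_{11} + w_{12} + w_{13} + w_{14} + w_{21} + w_{22} + w_{23} + w_{24}\\
& \quad + w_{31} + w_{32} + w_{33} + w_{41} + w_{42} = 1. && (9)
\end{aligned}
$$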


Constraint (5) involves the pairwise comparison of F1 with F2, constraint (6) the pairwise comparison of F2 with F3, constraint (7) the pairwise comparison of F3 with F4, and constraint (8) the pairwise comparison of F4 with F5, while constraint (9) is used to normalize the global utilities between 0 and 1. The optimal solution to this linear problem is as follows. (The solver of Microsoft Excel has been used to solve all the presented linear programs. Since most of the solutions presented are not unique, different solutions may be obtained with other linear programming packages, depending on their particularities and vagaries.)


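$$
\begin{aligned}
& w_{13} = 0.0152,\quad w_{14} = 0.0357,\\
& w_{21} = 0.0758,\quad w_{22} = 0.0786,\quad w_{23} = 0.0959,\quad w_{24} = 0.011,\\
& w_{31} = 0.1301,\quad w_{32} = 0.0754,\quad w_{33} = 0.0758,\\
& w_{41} = 0.3306,\quad w_{42} = 0.0758.
\end{aligned}
$$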


This solution is fully consistent with the predefined pre-ordering. More specifically, the global utilities obtained through this solution are:

$$U(F_1) = 0.9491,\quad U(F_2) = 0.4795,\quad U(F_3) = 0.4295,\quad U(F_4) = 0.3795,\quad U(F_5) = 0.$$
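These values can be checked by substituting the reported solution into (1)-(4). The Python sketch below encodes those expressions as coefficient dictionaries (the layout and the names `global_utility` and `respects_preorder` are illustrative assumptions), treats every unreported $w_{ij}$ as zero, and tests the gaps against $\delta = 0.05$ with a small tolerance, since the reported weights are rounded to four decimals.

```python
# Coefficients of the global utilities (1)-(4) in terms of the increments w_ij.
UTILITY_ROWS = {
    "F1": {"w11": 1, "w12": 1, "w21": 1, "w22": 1, "w23": 1, "w24": 1,
           "w31": 1, "w32": 1, "w33": 1, "w41": 1, "w42": 1},
    "F2": {"w21": 1, "w22": 1, "w23": 0.89, "w31": 0.75, "w41": 0.43},
    "F3": {"w11": 1, "w12": 1, "w13": 1, "w14": 0.20, "w21": 1, "w22": 1,
           "w23": 1, "w24": 0.56, "w31": 0.65, "w41": 0.20},
    "F4": {"w11": 1, "w12": 1, "w13": 1, "w14": 1, "w21": 1, "w22": 0.63,
           "w31": 1, "w32": 0.05, "w41": 0.21},
    "F5": {"w11": 1, "w12": 0.2},
}


def global_utility(firm, w):
    """U(firm) for increments w; increments missing from w are taken as zero."""
    return sum(coef * w.get(name, 0.0) for name, coef in UTILITY_ROWS[firm].items())


def respects_preorder(order, w, delta, tol=1e-3):
    """True if each firm beats the next one in `order` by at least delta (up to tol)."""
    utils = [global_utility(f, w) for f in order]
    return all(u - v >= delta - tol for u, v in zip(utils, utils[1:]))


# Optimal solution of the first UTASTAR linear program, as reported above.
solution = {"w13": 0.0152, "w14": 0.0357, "w21": 0.0758, "w22": 0.0786,
            "w23": 0.0959, "w24": 0.011, "w31": 0.1301, "w32": 0.0754,
            "w33": 0.0758, "w41": 0.3306, "w42": 0.0758}

print({f: round(global_utility(f, solution), 4) for f in UTILITY_ROWS})
print(abs(sum(solution.values()) - 1) < 1e-3,
      respects_preorder(list(UTILITY_ROWS), solution, 0.05))
```

Because of the rounding of the reported weights, the recomputed utilities may differ from the values above in the last digit.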


In a second stage, the existence of multiple optimal solutions is investigated through the post-optimality analysis. In order to find the most characteristic solutions, the weight of each criterion is maximized. This is achieved by solving four new linear programs whose objectives are the maximization of $w_{11} + w_{12} + w_{13} + w_{14}$, $w_{21} + w_{22} + w_{23} + w_{24}$, $w_{31} + w_{32} + w_{33}$, and $w_{41} + w_{42}$, respectively. For instance, the linear program that maximizes the weight of the first criterion (EBIT/TA) is the following:


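$$
\begin{aligned}
\max\ & \{w_{11} + w_{12} + w_{13} + w_{14}\}\\
\text{s.t.}\ & w_{11} + w_{12} + 0.11\,w_{23} + w_{24} + 0.25\,w_{31} + w_{32} + w_{33} + 0.57\,w_{41} + w_{42} \ge 0.05,\\
& -w_{11} - w_{12} - w_{13} - 0.2\,w_{14} - 0.11\,w_{23} - 0.56\,w_{24} + 0.1\,w_{31} + 0.23\,w_{41} \ge 0.05,\\
& -0.8\,w_{14} + 0.37\,w_{22} + w_{23} + 0.56\,w_{24} - 0.35\,w_{31} - 0.05\,w_{32} - 0.01\,w_{41} \ge 0.05,\\
& 0.8\,w_{12} + w_{13} + w_{14} + w_{21} + 0.63\,w_{22} + w_{31} + 0.05\,w_{32} + 0.21\,w_{41} \ge 0.05,\\
& w_{11} + w_{12} + w_{13} + w_{14} + w_{21} + w_{22} + w_{23} + w_{24} + w_{31} + w_{32} + w_{33} + w_{41} + w_{42} = 1.
\end{aligned}
$$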
Solving this linear program, the obtained solution is $w_{14} = 0.2296$, $w_{23} = 0.2390$, $w_{41} = 0.5314$. Similarly, solving the other three linear programs, the following solutions are obtained:

For $\max\{w_{21} + w_{22} + w_{23} + w_{24}\}$: $w_{21} = 0.4297$, $w_{22} = 0.3529$, $w_{41} = 0.2174$.
For $\max\{w_{31} + w_{32} + w_{33}\}$: $w_{23} = 0.0524$, $w_{33} = 0.7051$, $w_{41} = 0.2425$.
For $\max\{w_{41} + w_{42}\}$: $w_{23} = 0.0524$, $w_{41} = 0.2425$, $w_{42} = 0.7051$.
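The weight of criterion $i$ in a solution is $u_i(g_i^*) = \sum_j w_{ij}$, so each of the four solutions can be summarized by its criteria weights. The Python sketch below (names are hypothetical, and it assumes single-digit criterion indices in the variable names) computes these weights and checks that each solution gives the maximized criterion its largest weight among the four.

```python
# Nonzero increments of the four post-optimality solutions, in the order in which the
# criteria weights were maximized: EBIT/TA, NI/NW, TL/TA, (CA-I)/CL.
SOLUTIONS = [
    {"w14": 0.2296, "w23": 0.2390, "w41": 0.5314},
    {"w21": 0.4297, "w22": 0.3529, "w41": 0.2174},
    {"w23": 0.0524, "w33": 0.7051, "w41": 0.2425},
    {"w23": 0.0524, "w41": 0.2425, "w42": 0.7051},
]


def criterion_weights(w, n_criteria=4):
    """Weight u_i(g_i^*) of each criterion i: the sum of its increments w_ij.

    Assumes names of the form "w<i><j>" with single-digit i, as in Example 1.
    """
    return [sum(v for name, v in w.items() if name[1] == str(i))
            for i in range(1, n_criteria + 1)]


weights = [criterion_weights(s) for s in SOLUTIONS]
for row in weights:
    print([round(x, 4) for x in row])

# Solution i should give criterion i its largest weight among the four solutions.
print(all(weights[i][i] == max(row[i] for row in weights) for i in range(4)))
```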


A simple way to select a unique utility function is to consider the mean of all the solutions obtained in the post-optimality stage:

$$
\begin{aligned}
& w_{14} = 0.0574,\quad w_{21} = 0.1074,\quad w_{22} = 0.0882,\quad w_{23} = 0.0860,\\
& w_{33} = 0.1763,\quad w_{41} = 0.3084,\quad w_{42} = 0.1763.
\end{aligned}
$$
Consequently, the marginal utilities of the financial ratios are:

$$
\begin{aligned}
& u_1(5\%) = u_1(7.5\%) = u_1(10\%) = u_1(12.5\%) = 0,\quad u_1(15\%) = 0.0574,\\
& u_2(-30\%) = 0,\quad u_2(-16.5\%) = 0.1074,\quad u_2(-3\%) = 0.1957,\\
& u_2(10.5\%) = 0.2816,\quad u_2(24\%) = 0.2816,\\
& u_3(93\%) = u_3(73\%) = u_3(53\%) = 0,\quad u_3(33\%) = 0.1763,\\
& u_4(0.6) = 0,\quad u_4(1.785) = 0.3084,\quad u_4(2.97) = 0.4847.
\end{aligned}
$$

According to this solution, the global utilities of the firms are:

$$U(F_1) = 0.9426,\quad U(F_2) = 0.4048,\quad U(F_3) = 0.3548,\quad U(F_4) = 0.2852,\quad U(F_5) = 0.$$

Clearly, the ranking of the firms according to their global utilities is fully consistent with the predefined pre-ordering. Furthermore, the bank can use the obtained additive utility function to evaluate any new firm.
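The averaged solution, the marginal utilities and the global utilities of Example 1 can be reproduced with a short computation. In the Python sketch below (hypothetical helpers, not part of the published method), each post-optimality solution is a dictionary of its nonzero increments; the marginal utilities are cumulative sums of the averaged increments, $u_i(g_i^j) = \sum_{k<j} w_{ik}$, and the global utilities reuse the coefficients of (1)-(4). Differences in the last digit come from rounding.

```python
from itertools import accumulate

# Nonzero increments of the four post-optimality solutions of Example 1.
SOLUTIONS = [
    {"w14": 0.2296, "w23": 0.2390, "w41": 0.5314},
    {"w21": 0.4297, "w22": 0.3529, "w41": 0.2174},
    {"w23": 0.0524, "w33": 0.7051, "w41": 0.2425},
    {"w23": 0.0524, "w41": 0.2425, "w42": 0.7051},
]
# Number of increments (alpha_i - 1) for each criterion i.
N_INCREMENTS = {1: 4, 2: 4, 3: 3, 4: 2}
# Coefficients of the global utilities (1)-(4) in terms of the increments w_ij.
UTILITY_ROWS = {
    "F1": {"w11": 1, "w12": 1, "w21": 1, "w22": 1, "w23": 1, "w24": 1,
           "w31": 1, "w32": 1, "w33": 1, "w41": 1, "w42": 1},
    "F2": {"w21": 1, "w22": 1, "w23": 0.89, "w31": 0.75, "w41": 0.43},
    "F3": {"w11": 1, "w12": 1, "w13": 1, "w14": 0.20, "w21": 1, "w22": 1,
           "w23": 1, "w24": 0.56, "w31": 0.65, "w41": 0.20},
    "F4": {"w11": 1, "w12": 1, "w13": 1, "w14": 1, "w21": 1, "w22": 0.63,
           "w31": 1, "w32": 0.05, "w41": 0.21},
    "F5": {"w11": 1, "w12": 0.2},
}


def mean_solution(solutions):
    """Average every increment over the solutions; an absent increment counts as zero."""
    names = sorted({name for s in solutions for name in s})
    return {n: sum(s.get(n, 0.0) for s in solutions) / len(solutions) for n in names}


def marginal_utilities(w, criterion):
    """u_i at g_i^1, ..., g_i^alpha: zero at the worst value, then cumulative sums of w_ik."""
    steps = [w.get(f"w{criterion}{k}", 0.0) for k in range(1, N_INCREMENTS[criterion] + 1)]
    return [0.0] + list(accumulate(steps))


def global_utility(firm, w):
    """U(firm) as the weighted sum of increments given by (1)-(4)."""
    return sum(coef * w.get(name, 0.0) for name, coef in UTILITY_ROWS[firm].items())


w_mean = mean_solution(SOLUTIONS)
print({n: round(v, 4) for n, v in w_mean.items()})
print({i: [round(u, 4) for u in marginal_utilities(w_mean, i)] for i in N_INCREMENTS})
print({f: round(global_utility(f, w_mean), 4) for f in UTILITY_ROWS})
```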


Preference Disaggregation Analysis 
in Sorting Problems 


Besides ranking problems, preference disaggregation analysis can also be used to study sorting problems. In this case, the primary objective is not to rank a set of alternatives from the best to the worst; instead, the aim is to sort the alternatives into two or more predefined, ordered, homogeneous classes denoted $C_1, \ldots, C_q$ ($C_1$ is the class of the best alternatives and $C_q$ is the class of the worst alternatives). The sorting of the alternatives can be accomplished in several ways. The simplest and most common one is based on the comparison of the score of each alternative with some thresholds ($u_1, \ldots, u_{q-1}$) that distinguish the classes. Following the methodological framework of the UTA method, the score of each alternative is its global utility $U(a)$. The sorting of an alternative $a$ is accomplished through the following comparisons:


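$$
\begin{aligned}
U(a) \ge u_1 &\;\Rightarrow\; a \in C_1,\\
u_2 \le U(a) < u_1 &\;\Rightarrow\; a \in C_2,\\
&\;\;\vdots\\
u_k \le U(a) < u_{k-1} &\;\Rightarrow\; a \in C_k,\\
&\;\;\vdots\\
U(a) < u_{q-1} &\;\Rightarrow\; a \in C_q.
\end{aligned}
$$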
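The threshold comparisons above amount to scanning the thresholds from the best class downwards and stopping at the first one that the global utility reaches. A minimal Python sketch, assuming the thresholds are supplied as a strictly decreasing list $u_1 > \cdots > u_{q-1}$; the function name `assign_class` is hypothetical.

```python
def assign_class(utility, thresholds):
    """Return k such that the alternative is sorted into class C_k (C_1 best, C_q worst).

    thresholds: [u_1, ..., u_{q-1}], strictly decreasing.
    """
    if any(upper <= lower for upper, lower in zip(thresholds, thresholds[1:])):
        raise ValueError("utility thresholds must be strictly decreasing")
    for k, u_k in enumerate(thresholds, start=1):
        # U(a) < u_{k-1} already holds, since the previous threshold was not reached.
        if utility >= u_k:
            return k
    return len(thresholds) + 1    # U(a) < u_{q-1}: worst class C_q


print([assign_class(u, [0.7, 0.4]) for u in (0.9, 0.7, 0.55, 0.4, 0.1)])
```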


J.M. Devaud et al. [3] (see also [9] and [42]) proposed the UTADIS method (UTilités Additives DIScriminantes), a variant of the UTA method especially conceived for the study of sorting problems. The objective of the method is to estimate a global utility model (additive utility function) and the utility thresholds that minimize the classification error. The classification error is measured through two error functions, denoted $\sigma^+(a)$ and $\sigma^-(a)$, representing the deviations of a misclassified alternative from the utility threshold. Figure 2 illustrates these two types of errors for the two-group (class) sorting problem.

[Figure 2: Classification errors in the two-group sorting problem. Alternatives of classes $C_1$ and $C_2$ are placed on the global utility axis $U(\mathbf{g})$ relative to the utility threshold $u_1$; the classification errors $\sigma^+$ and $\sigma^-$ mark misclassified alternatives.]

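In the UTADIS method, the additive utility model and the utility thresholds are estimated through the following linear program (using the transformation $w_{ij} = u_i(g_i^{j+1}) - u_i(g_i^j)$ proposed in [28]):

$$
\begin{aligned}
\min\ & F = \sum_{a \in C_1} \sigma^+(a) + \cdots + \sum_{a \in C_k} \left[\sigma^+(a) + \sigma^-(a)\right] + \cdots + \sum_{a \in C_q} \sigma^-(a)\\
\text{s.t.}\ & U(a) - u_1 + \sigma^+(a) \ge 0, && \forall a \in C_1,\\
& U(a) - u_{k-1} - \sigma^-(a) \le -\delta, && \forall a \in C_k,\\
& U(a) - u_k + \sigma^+(a) \ge 0, && \forall a \in C_k,\\
& U(a) - u_{q-1} - \sigma^-(a) \le -\delta, && \forall a \in C_q,\\
& \sum_{i=1}^{m} \sum_{j=1}^{\alpha_i - 1} w_{ij} = 1,\\
& u_{k-1} - u_k \ge s, && k = 2, \ldots, q-1,\\
& w_{ij} \ge 0,\ \sigma^+(a) \ge 0,\ \sigma^-(a) \ge 0.
\end{aligned}
$$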


The first constraint implies that the global utility of an alternative $a \in C_1$ should be greater than or equal to the utility threshold $u_1$. If this is not possible, then an amount of utility equal to $\sigma^+(a)$ should be added to the global utility of this alternative, indicating that the alternative is classified by the model into a lower class than the one it actually belongs to (cf. Fig. 2). The second set of constraints is used for alternatives that are classified by the decision maker in an intermediate class $C_k$. In such cases, the global utility of the alternative should be strictly lower than the utility threshold $u_{k-1}$ (a small positive real number $\delta$ is used to ensure the strict inequality) and greater than or equal to the utility threshold $u_k$. If either of these two conditions is not satisfied, then the corresponding amount of utility, $\sigma^-(a)$ or $\sigma^+(a)$, should be subtracted from or added to the global utility of the alternative, respectively. Similarly, the third constraint is used for alternatives that belong to the worst class $C_q$. The global utility of these alternatives should be strictly lower than the utility threshold $u_{q-1}$; otherwise, an amount of utility equal to $\sigma^-(a)$ should be subtracted from the global utility of the alternatives, indicating that these alternatives are classified by the model into a higher (better) class than the one they actually belong to (cf. Fig. 2). The fourth constraint is used as a normalization constraint, so that the global utilities and the utility thresholds are normalized between 0 and 1. Finally, the fifth constraint is used to ensure that $u_1 > \cdots > u_{q-1}$ (a positive real number $s > \delta > 0$ is used to ensure the strict inequality between the utility thresholds). The post-optimality analysis stage is performed similarly to the UTA method.
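For a fixed additive model and fixed thresholds, the smallest error values that satisfy these constraints can be written down directly: for $a \in C_k$, $\sigma^+(a) = \max\{0, u_k - U(a)\}$ when $k < q$, and $\sigma^-(a) = \max\{0, U(a) - u_{k-1} + \delta\}$ when $k > 1$. The Python sketch below evaluates them and the resulting objective $F$. It only scores a given model and does not perform the linear programming estimation itself; the function names and the sample data are hypothetical.

```python
def classification_errors(utility, k, thresholds, delta):
    """Smallest (sigma_plus, sigma_minus) satisfying the UTADIS constraints for a in C_k.

    thresholds: [u_1, ..., u_{q-1}], strictly decreasing; classes are numbered 1..q.
    """
    q = len(thresholds) + 1
    if not 1 <= k <= q:
        raise ValueError("class index out of range")
    # Unless a is in the worst class, it must satisfy U(a) >= u_k.
    sigma_plus = max(0.0, thresholds[k - 1] - utility) if k < q else 0.0
    # Unless a is in the best class, it must satisfy U(a) <= u_{k-1} - delta.
    sigma_minus = max(0.0, utility - thresholds[k - 2] + delta) if k > 1 else 0.0
    return sigma_plus, sigma_minus


def total_error(assignments, thresholds, delta):
    """Objective F: the sum of all errors over (utility, class) pairs."""
    return sum(sum(classification_errors(u, k, thresholds, delta)) for u, k in assignments)


# Hypothetical two-class data: the third alternative belongs to C_2 but lies above u_1.
data = [(0.95, 1), (0.70, 1), (0.62, 2), (0.20, 2)]
print(total_error(data, [0.6], 0.001))
```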

In [4] and [41] three variants of the UTADIS method were proposed to improve the classification accuracy of the obtained additive utility models. The first variant (UTADIS I), in addition to the classification errors, also incorporates the distances of the correctly classified alternatives from the utility thresholds, which are to be maximized. The second variant (UTADIS II) is based on a mixed integer programming formulation that minimizes the number of misclassifications instead of their magnitude, while the third variant (UTADIS III) combines UTADIS I and II, aiming to minimize the number of misclassifications and to maximize the distances of the correctly classified alternatives from the utility thresholds.

Besides classifying the alternatives into the predefined classes, the global utilities estimated through the family of UTADIS methods can be used to rank the alternatives belonging to each class. Hence, the decision maker is provided with additional information that can be used to identify the best and the worst alternatives within each class. The applications of the UTADIS method include sales strategy problems [3], the evaluation of research and development projects [9], and several financial decision problems such as credit risk and bankruptcy risk evaluation, country risk assessment, credit card evaluation, evaluation of bank branches’ efficiency, and portfolio selection and management ([4,41,42]). In the subsequent subsection, a simple example is used to illustrate how the UTADIS method can be applied in the study of sorting problems.


Example 2 Let us consider again the credit granting problem that was used previously to illustrate the application of the UTASTAR method. In this case we assume that the credit managers of the bank are not interested in ranking the firms from the most creditworthy to the least creditworthy; instead, they are interested in sorting them into two classes: the creditworthy firms, which could be financed by the bank (class $C_1$), and the untrustworthy firms, which should not be financed (class $C_2$). Suppose that, according to the credit policy of the bank, the firms F1, F2 and F3 can be considered creditworthy, whereas the firms F4 and F5 are considered risky (untrustworthy). The global utilities of the firms have the same form as in the case of the UTASTAR method (1)-(4). The new linear program that is used to estimate the additive utility model is the following:


$$\min\ \sigma^{+}(F_1)+\sigma^{+}(F_2)+\sigma^{+}(F_3)+\sigma^{-}(F_4)+\sigma^{-}(F_5)$$

s.t.

$$w_{11}+w_{12}+w_{21}+w_{22}+w_{23}+w_{24}+w_{31}+w_{32}+w_{33}+w_{41}+w_{42}-u_1+\sigma^{+}(F_1)\ge 0 \tag{10}$$
$$w_{21}+w_{22}+0.89w_{23}+0.75w_{31}+0.43w_{41}-u_1+\sigma^{+}(F_2)\ge 0 \tag{11}$$
$$w_{11}+w_{12}+w_{13}+0.20w_{14}+w_{21}+w_{22}+w_{23}+0.56w_{24}+0.65w_{31}+0.20w_{41}-u_1+\sigma^{+}(F_3)\ge 0 \tag{12}$$
$$w_{11}+w_{12}+w_{13}+w_{14}+w_{21}+0.63w_{22}+w_{31}+0.05w_{32}+0.21w_{41}-u_1-\sigma^{-}(F_4)\le -0.001 \tag{13}$$
$$w_{11}+0.2w_{42}-u_1-\sigma^{-}(F_5)\le -0.001 \tag{14}$$
$$w_{11}+w_{12}+w_{13}+w_{14}+w_{21}+w_{22}+w_{23}+w_{24}+w_{31}+w_{32}+w_{33}+w_{41}+w_{42}=1 \tag{15}$$


Constraints (10)-(12) involve the firms $F_1$, $F_2$ and $F_3$, which belong to class $C_1$; consequently, their global utilities should be greater than or equal to the utility threshold $u_1$. Conversely, constraints (13)-(14) involve the firms $F_4$ and $F_5$, which belong to class $C_2$; consequently, their global utilities should be strictly lower than the utility threshold $u_1$ (a small positive number $\delta = 0.001$ is used to ensure the strict inequality). Constraint (15) is used, as in the UTASTAR method, for normalization purposes. Solving the above linear program yields the following solution:


$$w_{22} = 0.1779,\quad w_{23} = 0.8018,\quad w_{31} = 0.0203.$$


The estimated utility threshold is $u_1 = 0.9067$. According to this solution, the global utilities of the firms are:

$$U(F_1)=1,\quad U(F_2)=0.9067,\quad U(F_3)=0.9929,\quad U(F_4)=0.1324,\quad U(F_5)=0.$$
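As a quick consistency check, the sketch below evaluates the global utilities implied by constraints (10)-(14) for the reported weights and verifies the two-class assignment. The coefficient table is transcribed from those constraints as printed; the names `COEFFS`, `global_utility` and `misclassified` are hypothetical, and the tolerance `tol` only absorbs the four-decimal rounding of the published figures.

```python
# Coefficient of each weight w_ij (key (i, j) = criterion i, interval j) in a
# firm's global utility, transcribed from constraints (10)-(14).
COEFFS = {
    "F1": {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): 1, (2, 3): 1, (2, 4): 1,
           (3, 1): 1, (3, 2): 1, (3, 3): 1, (4, 1): 1, (4, 2): 1},
    "F2": {(2, 1): 1, (2, 2): 1, (2, 3): 0.89, (3, 1): 0.75, (4, 1): 0.43},
    "F3": {(1, 1): 1, (1, 2): 1, (1, 3): 1, (1, 4): 0.20, (2, 1): 1,
           (2, 2): 1, (2, 3): 1, (2, 4): 0.56, (3, 1): 0.65, (4, 1): 0.20},
    "F4": {(1, 1): 1, (1, 2): 1, (1, 3): 1, (1, 4): 1, (2, 1): 1,
           (2, 2): 0.63, (3, 1): 1, (3, 2): 0.05, (4, 1): 0.21},
    "F5": {(1, 1): 1, (4, 2): 0.2},
}
CLASS = {"F1": 1, "F2": 1, "F3": 1, "F4": 2, "F5": 2}  # 1 = C1 (creditworthy)


def global_utility(firm, weights):
    """U(firm) = sum of coefficient * w_ij; weights absent from the dict are 0."""
    return sum(c * weights.get(key, 0.0) for key, c in COEFFS[firm].items())


def misclassified(weights, u1, delta=0.001, tol=1e-4):
    """Firms violating U >= u1 (class C1) or U <= u1 - delta (class C2)."""
    bad = []
    for firm, cls in CLASS.items():
        u = global_utility(firm, weights)
        ok = u >= u1 - tol if cls == 1 else u <= u1 - delta + tol
        if not ok:
            bad.append(firm)
    return bad


solution = {(2, 2): 0.1779, (2, 3): 0.8018, (3, 1): 0.0203}
print({f: round(global_utility(f, solution), 4) for f in COEFFS})
print(misclassified(solution, u1=0.9067))  # []
```

Raising the threshold above $U(F_2)$, e.g. `misclassified(solution, u1=0.95)`, flags `F2`, which is why the optimal threshold sits exactly at $U(F_2)$.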
All the firms are correctly classified into their original classes; however, the obtained solution is not unique. Consequently, the method proceeds to the post-optimality analysis stage to identify the most characteristic of the multiple optimal solutions. This is done as in the UTASTAR method, except that here five new linear programs are solved. The first four correspond to the maximization of the criteria weights (as in the UTASTAR method), while the fifth corresponds to the maximization of the utility threshold $u_1$. For example, the linear programming formulation for maximizing the weight of the ratio EBIT/TA is the following:

$$
\begin{aligned}
\max\ & w_{11}+w_{12}+w_{13}+w_{14}\\
\text{s.t.}\ & w_{11}+w_{12}+w_{21}+w_{22}+w_{23}+w_{24}+w_{31}+w_{32}+w_{33}+w_{41}+w_{42}-u_1\ge 0,\\
& w_{21}+w_{22}+0.89w_{23}+0.75w_{31}+0.43w_{41}-u_1\ge 0,\\
& w_{11}+w_{12}+w_{13}+0.20w_{14}+w_{21}+w_{22}+w_{23}+0.56w_{24}+0.65w_{31}+0.20w_{41}-u_1\ge 0,\\
& w_{11}+w_{12}+w_{13}+w_{14}+w_{21}+0.63w_{22}+w_{31}+0.05w_{32}+0.21w_{41}-u_1\le -0.001,\\
& w_{11}+0.2w_{42}-u_1\le -0.001,\\
& w_{11}+w_{12}+w_{13}+w_{14}+w_{21}+w_{22}+w_{23}+w_{24}+w_{31}+w_{32}+w_{33}+w_{41}+w_{42}=1.
\end{aligned}
$$
The solution to this linear program is:

$$w_{14} = 0.4704,\quad w_{23} = 0.5296,$$

and the utility threshold is $u_1 = 0.4714$.

Similarly, solving the other four linear programs yields the following solutions.

For $\max\{w_{21}+w_{22}+w_{23}+w_{24}\}$:

$$w_{21}=0.2772,\quad w_{22}=0.1619,\quad w_{23}=0.2837,\quad w_{24}=0.2772,\quad u_1=0.3802.$$

For $\max\{w_{31}+w_{32}+w_{33}\}$:

$$w_{23}=0.0011,\quad w_{33}=0.9989,\quad u_1=0.001.$$

For $\max\{w_{41}+w_{42}\}$:

$$w_{23}=0.0010,\quad w_{41}=0.0005,\quad w_{42}=0.9985,\quad u_1=0.0011.$$

For $\max\{u_1\}$:

$$w_{21}=0.9861,\quad w_{22}=0.0139,\quad u_1=1.$$


As in the UTASTAR method, the mean of these solutions is taken as a unique solution:

$$w_{14}=0.0941,\ w_{21}=0.2527,\ w_{22}=0.0352,\ w_{23}=0.1631,\ w_{24}=0.0554,\ w_{33}=0.1998,\ w_{41}=0.0001,\ w_{42}=0.1997.$$

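The averaging step is plain arithmetic; the sketch below recomputes the mean solution from the five post-optimal solutions listed above (nonzero weights only). The threshold of the $\max\{u_1\}$ program is garbled in the source scan; the value 1.0 used here is an inference: it is the largest value the constraints allow (since $u_1 \le U(F_1) \le 1$) and it reproduces the reported mean threshold 0.3707. The names are hypothetical.

```python
# (nonzero weights, threshold u1) of each post-optimal solution.
# The threshold 1.0 of the last program is inferred, not read from the source.
POST_OPTIMAL = [
    ({(1, 4): 0.4704, (2, 3): 0.5296}, 0.4714),                        # max criterion 1
    ({(2, 1): 0.2772, (2, 2): 0.1619, (2, 3): 0.2837, (2, 4): 0.2772}, 0.3802),
    ({(2, 3): 0.0011, (3, 3): 0.9989}, 0.001),                          # max criterion 3
    ({(2, 3): 0.0010, (4, 1): 0.0005, (4, 2): 0.9985}, 0.0011),         # max criterion 4
    ({(2, 1): 0.9861, (2, 2): 0.0139}, 1.0),                            # max u1
]


def average_solution(solutions):
    """Component-wise mean of the weights and of the thresholds."""
    n = len(solutions)
    mean_w = {}
    mean_u1 = 0.0
    for weights, u1 in solutions:
        for key, value in weights.items():
            mean_w[key] = mean_w.get(key, 0.0) + value / n
        mean_u1 += u1 / n
    return mean_w, mean_u1


mean_w, mean_u1 = average_solution(POST_OPTIMAL)
print({k: round(v, 4) for k, v in sorted(mean_w.items())})
print(round(mean_u1, 4))  # 0.3707
```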

The utility threshold is $u_1 = 0.3707$. Consequently, the marginal utilities of the financial ratios are:

$$u_1(5\%)=u_1(7.5\%)=u_1(10\%)=u_1(12.5\%)=0,\quad u_1(15\%)=0.0941,$$
$$u_2(-30\%)=0,\quad u_2(-16.5\%)=0.2527,\quad u_2(-3\%)=0.2878,\quad u_2(10.5\%)=0.4509,\quad u_2(24\%)=0.5064,$$
$$u_3(93\%)=u_3(73\%)=u_3(53\%)=0,\quad u_3(33\%)=0.1998,$$
$$u_4(0.6)=0,\quad u_4(1.785)=0.0001,\quad u_4(2.97)=0.1998.$$

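Each marginal utility is piecewise linear: its value at a breakpoint is the cumulative sum of the weights of the intervals below it, which is how the values above follow from the mean weights. A minimal sketch, assuming breakpoints are listed from the worst to the best value of the criterion; fractional coefficients such as 0.89 in (11) presumably arise from this linear interpolation inside an interval. Function names are hypothetical, and differences in the fourth decimal (0.2879 vs. 0.2878) come from rounding of the published weights.

```python
def marginal_values(increments):
    """Values of u_i at the breakpoints (worst to best): cumulative sums of w_ij."""
    values = [0.0]
    for w in increments:
        values.append(values[-1] + w)
    return values


def marginal_utility(breakpoints, increments, g):
    """Piecewise-linear u_i(g); breakpoints run from the worst to the best value."""
    values = marginal_values(increments)
    for k in range(len(breakpoints) - 1):
        lo, hi = breakpoints[k], breakpoints[k + 1]
        if min(lo, hi) <= g <= max(lo, hi):
            # share of interval k covered by g, times that interval's weight
            return values[k] + (g - lo) / (hi - lo) * increments[k]
    raise ValueError("g lies outside the criterion scale")


# Criterion 2 with the mean weights w21..w24.
G2 = [-30.0, -16.5, -3.0, 10.5, 24.0]
W2 = [0.2527, 0.0352, 0.1631, 0.0554]
print([round(v, 4) for v in marginal_values(W2)])
# Criterion 3 is decreasing (93% worst, 33% best); halfway into its last interval:
print(marginal_utility([93.0, 73.0, 53.0, 33.0], [0.0, 0.0, 0.1998], 43.0))
```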

According to this solution, the global utilities of the firms are:

$$U(F_1)=0.9059,\quad U(F_2)=0.4330,\quad U(F_3)=0.5008,\quad U(F_4)=0.3689,\quad U(F_5)=0.$$


The sorting of the firms according to their global utilities is fully consistent with the predefined classification: $U(F_1)$, $U(F_2)$ and $U(F_3)$ exceed the threshold $u_1 = 0.3707$, while $U(F_4)$ and $U(F_5)$ fall below it. The utility function constructed through the above procedure is the following:

$$U(\mathbf{g}) = 0.0941u_1(g_1) + 0.5064u_2(g_2) + 0.1998u_3(g_3) + 0.1998u_4(g_4).$$


Preference Disaggregation Analysis 
in Choice Problems 


Apart from ranking and sorting problems, preference disaggregation analysis is also applicable to choice problems, in which the decision maker is concerned with selecting the most appropriate alternative. T.J. Stewart [33] demonstrated how preference disaggregation analysis can be applied to this type of problem. His methodology combines the UTA method with an interactive pruning of the alternatives until the best alternative is selected.

The proposed approach proceeds in five steps: 

1) Initially, a subset of the alternatives under consider- 
ation is selected. 

2) Following the methodology of the UTA method, the 
decision maker is asked to provide a ranking (pre- 
ordering) of the subset of the alternatives that was 
selected in step 1. 

3) According to the ranking defined in the previous 
step and using the UTA method the corresponding 
utility function is estimated. In the same step, for 
each alternative a not belonging to the reference set 
used to construct the additive utility model, a lin- 
ear program is solved in order to investigate the ex- 
istence of a utility function for which U(a) > U(o) 
(where o is the best alternative among the alterna- 
tives belonging to the reference set) while retaining 
the correct (consistent) ranking of the other alterna- 
tives of the reference set. If such a utility function 
does not exist, then a can be eliminated from further 
consideration. 
4) If such a utility function exists, then in step 4 the utilities of all the alternatives (based on the utility function developed on the reference set of alternatives) are presented to the decision maker, along with all the alternatives not belonging to the reference set that can be considered best alternatives (i.e., those for which there is a utility function that makes them the best alternative).

5) If the decision maker is satisfied with the best solution, the process stops; otherwise, in step 5 the decision maker selects the alternative not belonging to the reference set which outperforms the current best alternative of the reference set. This new alternative is added to the reference set, and the decision maker is asked to provide a new ranking of the alternatives in the enlarged reference set. The methodology then proceeds from step 3, until all the alternatives not belonging to the reference set have been considered.

Another way of applying the preference disaggregation approach to choice problems has been proposed in [22]. This approach is based on constructing a fuzzy outranking relation [20] from the set of utility functions estimated through the post-optimality analysis of the UTA method. The suggested relation is based on the percentage of these utility functions for which one alternative is better than another; in this way, the degree of credibility of the statement 'alternative a is at least as good as alternative b' is estimated. The construction of this outranking relation enables the decision maker to identify the best alternative(s), i.e., those not outranked by any other alternative, and also to identify the incomparabilities that may occur among alternatives with dissimilar characteristics.

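The counting idea behind this relation can be sketched as follows. The utility values are hypothetical, the comparison uses "at least as good" ($U(a) \ge U(b)$), and the exact definition in [22] may differ in details such as the treatment of ties; function names are also hypothetical.

```python
def credibility(utilities, a, b):
    """Share of the utility functions (one per post-optimal solution) with U(a) >= U(b)."""
    return sum(1 for u in utilities if u[a] >= u[b]) / len(utilities)


def credibility_matrix(utilities):
    """Credibility of 'a is at least as good as b' for every ordered pair a != b."""
    alts = list(utilities[0])
    return {(a, b): credibility(utilities, a, b)
            for a in alts for b in alts if a != b}


# Hypothetical global utilities of three alternatives under four
# post-optimal utility functions (illustrative numbers only).
UTILITIES = [
    {"x": 0.9, "y": 0.6, "z": 0.4},
    {"x": 0.5, "y": 0.7, "z": 0.3},
    {"x": 0.8, "y": 0.8, "z": 0.2},
    {"x": 0.6, "y": 0.5, "z": 0.9},
]
for pair, value in credibility_matrix(UTILITIES).items():
    print(pair, value)
```

Here "x is at least as good as y" has credibility 0.75 while the converse has 0.5, so x outranks y more credibly than y outranks x.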

Conclusions and Future Research 


In this article the main features, disciplines and char- 
acteristics of the preference disaggregation approach of 
MCDA were presented. Furthermore, the applicability 
of the preference disaggregation analysis in the study of 
several types of problems (ranking, sorting, and choice) 
was demonstrated through the description of the UTA 
method and several of its variants. 

Preference disaggregation analysis using mini- 
mal information from the decision maker constitutes 


a promising and powerful approach for modeling the 
decision makers’ preference and developing a global 
preference model through an interactive and iterative 
procedure. 

Multicriteria decision support systems (MCDSSs) implementing this MCDA approach can be very helpful tools, both for decision analysts and for decision makers, in real-world situations where time is limited and the cost of taking a decision is a significant factor. Researchers in this field have already explored the potential of MCDSSs incorporating preference disaggregation procedures. Some representative examples are the PREFCALC system [8], the MINORA system [26], the MIIDAS system [27], the PREFDIS system [44], the FINCLAS system [43] and the FINEVA system [45] for corporate assessment, as well as the MARKEX system [13] for marketing decisions. Recently (1998), researchers have also investigated the contribution of artificial intelligence techniques within integrated decision support systems (intelligent multicriteria decision support systems), combining the preference disaggregation approach and other MCDA approaches with the capabilities provided by expert systems and neural networks. Such integration is a significant field of further research, both in preference disaggregation analysis and in MCDA in general.
Preference modeling is an inevitable step in many fields, including economics, psychology, sociology, operational research and decision support systems.

This step is sometimes implicit, as in operational research and economics, where the preferences of the decision-maker are often represented by a function to be optimized; sometimes it is studied in detail and based on experiments or surveys, as in psychophysics or multicriteria analysis. Of course, people coming from different disciplines will, in general, have different points of view on preference modeling. It is useful to distinguish the normative perspective (the attitude of defining how to take a decision in order to satisfy predefined norms or properties; here the main questions are: how to be rational? what must preferences be like in order to reach that goal?), the descriptive perspective (the attitude of modeling, as realistically as possible, the behavior of a decision-maker; how do people take decisions?) and the prescriptive one (the attitude of helping a decision-maker to be as coherent as possible with his own goals and preferences; how to help somebody to decide?).

However, people working on preferences, even if they come from different fields and have different perspectives, use more or less the same tools. The purpose of this article is to give a brief overview of these tools.


Basic Preference Relations 


Let A be the set of elements (decisions, candidates for a job, commodities, consumption levels, locations, ...) to be compared or evaluated. It is often assumed that, when comparing two elements a and b, the decision-maker can have three attitudes: preference for one element (a decision-maker prefers a decision a over a decision b, possibly according to a particular point of view, if, in a situation where he has to select one of them, he chooses a), indifference between a and b (a decision-maker is indifferent between two decisions a and b, possibly according to a particular point of view, if both decisions are equally acceptable for him), and incomparability between them (two decisions a and b are incomparable, at a given moment of a decision process, if the decision-maker is not able or refuses to express a preference or an indifference, due to a lack of data or contradictory information). We will denote:


aPb if a is preferred to b,
bPa if b is preferred to a,
aIb in case of indifference,
aJb in case of incomparability.


The basic preference relations P, I and J are the sets of pairs (a, b) such that, respectively, aPb, aIb and aJb. These relations are used in most of the works devoted to preference modeling. Moreover, it is traditionally accepted that these relations are mutually exclusive and verify the following basic properties: $\forall a, b \in A$,

$aPb \Rightarrow \neg(bPa)$ (P is asymmetric),
$aIa$ (I is reflexive),
$aIb \Rightarrow bIa$ (I is symmetric),
$\neg(aJa)$ (J is irreflexive),
$aJb \Rightarrow bJa$ (J is symmetric).


Defining the 'preference-indifference relation' S by aSb if and only if aPb or aIb, it is not difficult to see that all the above situations can be characterized using only relation S:

$aPb \iff aSb$ and $\neg(bSa)$,
$aIb \iff aSb$ and $bSa$,
$aJb \iff \neg(aSb)$ and $\neg(bSa)$.

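The three characterizations translate directly into code. A minimal sketch with a hypothetical function name, representing a relation on a finite set as a set of ordered pairs:

```python
from itertools import product


def basic_relations(A, S):
    """Split a relation S (a set of ordered pairs over A) into P, I and J."""
    P, I, J = set(), set(), set()
    for a, b in product(A, repeat=2):
        ab, ba = (a, b) in S, (b, a) in S
        if ab and not ba:
            P.add((a, b))          # aPb: aSb and not bSa
        elif ab and ba:
            I.add((a, b))          # aIb: aSb and bSa
        elif not ab and not ba:
            J.add((a, b))          # aJb: neither aSb nor bSa
    return P, I, J


A = ["a", "b", "c"]
S = {("a", "a"), ("b", "b"), ("c", "c"), ("a", "b"), ("b", "a"), ("a", "c")}
P, I, J = basic_relations(A, S)
print(sorted(P), sorted(I), sorted(J))
```

With this S, a is preferred to c, a and b are indifferent, and b and c are incomparable; the asymmetry of P and the symmetry of I and J follow automatically from the construction.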

Traditional Preference Model 


A usual attitude is to associate numbers (valuations) with the elements of A and to declare that a is preferred to b if the 'value' of a is greater (or smaller, if the 'value' is a cost, for example) than the value of b, and that there is indifference if the values are equal. This is the traditional approach in operational research, decision theory, economics or finance, where decision problems are considered as optimization problems. This attitude leads to a preference model with the following strong assumptions: $\forall a, b, c \in A$,

$aPb$ and $bPc \Rightarrow aPc$ (P transitive),
$aIb$ and $bIc \Rightarrow aIc$ (I transitive),
$aPb$ and $bIc \Rightarrow aPc$,
$aIb$ and $bPc \Rightarrow aPc$,

and J is empty (there is no incomparability).
In terms of relation S, this means that, $\forall a, b, c \in A$,

$aSb$ and $bSc \Rightarrow aSc$ (S transitive),
$\neg(aSb) \Rightarrow bSa$ (S complete),

characterizing what is usually called a weak order (i.e., a transitive and complete relation). In this case, indifference I is an equivalence (i.e., a reflexive, symmetric and transitive relation) and the set of equivalence classes is totally ordered by relation P.

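For a finite A the numerical representation can be made concrete. The sketch below (hypothetical names and values) checks the two weak-order properties and builds the valuation $v(a) = |\{b : aSb\}|$, which satisfies $aSb \iff v(a) \ge v(b)$ whenever S is a weak order; this is a standard construction, not one taken from this article.

```python
from itertools import product


def is_weak_order(A, S):
    """True if S (a set of ordered pairs) is complete and transitive on A."""
    complete = all((a, b) in S or (b, a) in S for a, b in product(A, repeat=2))
    transitive = all((a, c) in S
                     for a, b, c in product(A, repeat=3)
                     if (a, b) in S and (b, c) in S)
    return complete and transitive


def numeric_representation(A, S):
    """v(a) = number of b with aSb; for a weak order, aSb iff v(a) >= v(b)."""
    return {a: sum(1 for b in A if (a, b) in S) for a in A}


values = {"a": 3, "b": 3, "c": 1}                      # hypothetical valuations
S = {(x, y) for x in values for y in values if values[x] >= values[y]}
v = numeric_representation(list(values), S)
print(is_weak_order(list(values), S), v)              # True {'a': 3, 'b': 3, 'c': 1}
```

A cyclic relation such as aSb, bSc, cSa (plus reflexivity) is complete but fails the transitivity check, so no such valuation exists for it.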
When A is finite or countable, the weak order structure is sufficient to represent preferences by numbers as explained above; in the uncountable case, some topological conditions have to be added (see [7]).


Extensions of the Traditional Model 


The transitivity of indifference is incompatible with the existence of a 'sensitivity threshold' under which the decision-maker does not feel any difference between two elements or refuses to accept a strict preference for one of them. Moreover, the frontier between indifference and preference is not always clear, leading to new situations (and new relations) called 'hesitation between indifference and preference' or 'weak, strong, very strong, ... preferences'. Finally, it is also necessary to extend the traditional model by taking incomparability situations into account (J is not always empty). All these considerations have led to new models called semi-orders, interval orders, partial orders, pseudo-orders, suborders, embedded families of preferences, ..., which were extensively studied in the literature. The interested reader can consult [5,6,8,11,14,20,22,26].

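A constant sensitivity threshold q applied to numerical values is the standard way of producing a semi-order. The sketch below (hypothetical names and numbers) shows why indifference then fails to be transitive: each step of 0.6 stays under the threshold, but two steps do not.

```python
def threshold_relations(g, q):
    """aPb iff g(a) > g(b) + q; aIb iff |g(a) - g(b)| <= q (threshold q >= 0)."""
    A = list(g)
    P = {(a, b) for a in A for b in A if g[a] > g[b] + q}
    I = {(a, b) for a in A for b in A if abs(g[a] - g[b]) <= q}
    return P, I


g = {"x": 0.0, "y": 0.6, "z": 1.2}   # hypothetical values
P, I = threshold_relations(g, q=1.0)
print(("x", "y") in I, ("y", "z") in I, ("x", "z") in I, ("z", "x") in P)
# True True False True
```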

Valued Preferences 


Modelers sometimes want to quantify preferences to express an intensity of preference, a number (or a percentage) of votes in favor of one element over another, a probability of preference, or a 'degree' (without any specific property). In this case, a number is associated with each pair (a, b), leading to so-called valued, fuzzy or probabilistic relations. The study of the properties of these relations, of their representations by numerical functions and of their use in decision models is illustrated, for instance, by [9] or [12].


Preferences on Structured Sets 


The set A may have a special structure: the elements of A are sometimes points in a topological space (as in economics), vectors of evaluations on several dimensions (as in multicriteria analysis), probability distributions on a set of consequences (as in decision-making under risk), or sets of possible consequences depending on the states of nature (as in decision-making under uncertainty). All these situations have led to many models, concepts and methodologies, including value theory, utility theory, multi-attribute utility theory, multiple criteria decision analysis, subjective expected utility theory and conjoint measurement theory.

Some references in this abundant literature are [7,10,15,16,18,19,23,25,28,29].


Other Topics Connected to Preference Modeling 


We briefly mention in this section some topics which cannot be ignored by people interested in preference modeling:

• the 'art' of collecting preference information from subjects (see for example [30]),
• the statistical analysis of preferences (see [4] or [13]),
• the geometrical representation of preferences (see for example [2]),
• the problem of meaningfulness ([21]),
• the evolution of preferences over time (see [17]),
• the philosophical aspects of preference modeling ([3,31]),
• social choice theory ([24]),
• the preference aggregation problem ([1,27]).
A two-stage stochastic linear program with random right-hand side (and with discrete random variables) is normally written as follows:

$$\min_{x \ge 0}\ cx + \mathcal{Q}(x) \quad \text{s.t. } Ax = b,$$

where

$$\mathcal{Q}(x) = \sum_j p_j\, Q(x, \xi_j)$$

and

$$Q(x, \xi) = \min_{y \ge 0} \{\, qy : Wy = h(\xi) - T(\xi)x \,\}.$$


Here, $p_j$ is the probability that the random vector $\xi$ takes on its jth possible value $\xi_j$, and W is an (m × n) matrix. This way of formulating a two-stage stochastic program is partly motivated by solution procedures ([1,8]), partly by the time structure of the problem. For this article, the latter is the most important. The interpretation of the problem is that first (now) we make a decision x, then we observe a value of $\xi$, and finally we make a recourse decision y based on our earlier decision x and the observed value of $\xi$.

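To make the notation concrete, the sketch below evaluates $cx + \mathcal{Q}(x)$ for a one-dimensional example with simple recourse, $W = [1, -1]$. This W is an assumption chosen because $Q(x, \xi)$ then has a closed form (a general W needs an LP solver): with $y = (y^+, y^-)$, costs satisfying $q^+ + q^- \ge 0$ and $d = h(\xi) - Tx$, the second-stage problem is solved by $y^+ = \max(d, 0)$, $y^- = \max(-d, 0)$. T is taken as deterministic, and all numbers and names are hypothetical.

```python
def recourse_cost(d, q_plus, q_minus):
    """Q(x, xi) for W = [1, -1]: min q+ y+ + q- y-  s.t. y+ - y- = d, y >= 0.

    Valid when q_plus + q_minus >= 0 (otherwise the LP is unbounded).
    """
    return q_plus * max(d, 0.0) + q_minus * max(-d, 0.0)


def expected_recourse(x, scenarios, T, q_plus, q_minus):
    """Q(x) = sum_j p_j Q(x, xi_j), with scenarios given as (p_j, h(xi_j))."""
    return sum(p * recourse_cost(h - T * x, q_plus, q_minus) for p, h in scenarios)


def total_cost(x, c, scenarios, T, q_plus, q_minus):
    """First-stage cost cx plus expected recourse cost."""
    return c * x + expected_recourse(x, scenarios, T, q_plus, q_minus)


scenarios = [(0.5, 10.0), (0.5, 20.0)]  # (probability, h) pairs
for x in (10.0, 15.0, 20.0):
    print(x, total_cost(x, c=1.0, scenarios=scenarios, T=1.0, q_plus=3.0, q_minus=0.5))
```

Since pos [1, -1] is the whole real line, every x gives a feasible second stage here, i.e., this small example has complete recourse.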
Many aspects of preprocessing (pre-analysis) are in 
no way particular to multistage stochastic problems. 
For example, the issue of removing redundant rows 
and columns, as well as that of consistency, are well 
discussed for deterministic (one-stage) problems, see 
for example the work on preprocessing in linear pro- 
gramming by H.J. Greenberg [5]. Details referring to 
stochastic programs are found in [10], and in [7, Chapt. 
5; 6]. The one major point to make for stochastic pro- 
grams is that since, for example, the matrix W shows up
in the formulation as many times as there are possible
values of $\xi$, it may be worthwhile to spend a lot of
effort just to simplify the matrix a little. This may not be
true for deterministic LPs, as it may cost more to sim- 
plify the matrix than to solve the problem to optimality 
without simplifications. Hence, although technically the 
issues are the same, the motivation for simplifications 
may be rather different. 

An $x_0 \ge 0$ satisfying $Ax = b$ may produce a recourse problem $Q(x_0, \xi_j)$ which is infeasible for some j. In other words, the requirement that $Q(x, \xi)$ be feasible for all $\xi$ produces implied constraints on the first-stage decisions x. The cone pos W consists of exactly those vectors which, when used as right-hand sides in the second-stage problem, produce a feasible problem. Formally,

$$\text{pos}\, W = \{\, t : t = Wy,\ y \ge 0 \,\}.$$

If pos W = $\Re^m$, we say that we have complete recourse. That is easy to test for [10]. If $h(\xi) - T(\xi)x \in \text{pos}\, W$ for all possible $\xi$ and all $x \ge 0$ satisfying $Ax = b$, we have relatively complete recourse. This is hard to test, but relatively complete recourse can be generated. The rest of this article will discuss the generation of relatively complete recourse. The key is that if we find all implied constraints and add them to $Ax = b$, the expanded set of constraints will imply relatively complete recourse by construction. These implied constraints are useful in two ways: a) when solving the stochastic program, we can use methods requiring relatively complete recourse, and b) the implied constraints will show the modeler things he has assumed but never written down explicitly. Hence, the constraints are useful both for error detection and for increased understanding. A small numerical example can be found in [7, Sect. 6.4].

Finding an implied constraint for a given $x_0$ is equivalent to finding a feasibility cut in Benders decomposition (cf. also Generalized Benders decomposition). However, the purpose of preprocessing is not to find just one, but to find all implied constraints (feasibility cuts). More formally, we look for a polar matrix $W^*$ (with a minimal number of columns) such that

$$y \in \text{pos}\, W \iff y^{\top} W^* \le 0.$$


PROCEDURE support(W, W*);
BEGIN
  frame(W);
  done := false;
  FOR i := 1 TO n DO
    IF NOT done THEN BEGIN
      α := W_i^T W*;
      I+ := {k : α(k) > 0};
      I- := {k : α(k) < 0};
      I0 := {k : α(k) = 0};
      done := (I- ∪ I0 = ∅);
      IF done THEN W* := 0;
      IF I+ ≠ ∅ AND NOT done THEN BEGIN
        IF I- = ∅ THEN W* := W*_{I0}
        ELSE BEGIN
          FORALL k ∈ I+, j ∈ I- DO
            C_kj := W*_k - (α(k)/α(j)) W*_j;
          W* := W*_{I0} ∪ W*_{I-} ∪ (∪_{k,j} C_kj);
          frame(W*);
        END; (* ELSE *)
      END; (* IF *)
    END; (* FOR *)
END support;


Pseudocode for finding the polar matrix W* 


Finding $W^*$ is a problem of exponential complexity, equivalent to extreme point enumeration. An algorithm tailored to the problem is found in [10] and is presented above. For reasonably large LPs one cannot, in general, expect to find all implied constraints. However, if there are few implied constraints (and that is in any case the more interesting situation), the algorithm will find them quickly. Hence, this procedure may in some situations yield a lot of insight into a problem, while in other cases it leads nowhere; only by trying can that be found out. The pseudocode makes two calls to a procedure frame(W). This procedure finds a frame of pos W, i.e., it removes all columns from W that are nonnegative linear combinations of other columns. The variable done is boolean. The recourse matrix W is input to the procedure, and the polar matrix $W^*$ is output. An illustrated example of the use of this procedure is found in [7,
Sect. 5.2]. 
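A runnable sketch of the same update rule in Python follows. It is a simplified variant, not the implementation of [10]: it starts from the generators $\pm e_r$ of $\Re^m$ (an assumed initialization, since the pseudocode does not show how $W^*$ is initialized), applies the $I_+$/$I_-$/$I_0$ step for each column of W in exact rational arithmetic, and replaces frame() by merely dropping zero and duplicate columns, so redundant generators may remain. All names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence

Vector = List[Fraction]


def _dot(u: Sequence[Fraction], v: Sequence[Fraction]) -> Fraction:
    return sum((a * b for a, b in zip(u, v)), Fraction(0))


def polar_generators(W: Sequence[Sequence[float]]) -> List[Vector]:
    """Generators of the polar cone {z : z'W_i <= 0 for every column W_i of W}.

    W is given row-wise (m rows, n columns). An empty result means the polar
    cone is {0}, i.e. pos W = R^m (complete recourse).
    """
    m, n = len(W), len(W[0])
    columns = [[Fraction(W[r][c]) for r in range(m)] for c in range(n)]
    # Assumed initialization: +e_r and -e_r generate the whole space R^m.
    gens: List[Vector] = []
    for r in range(m):
        for sign in (1, -1):
            e = [Fraction(0)] * m
            e[r] = Fraction(sign)
            gens.append(e)
    for w in columns:
        alpha = [_dot(w, g) for g in gens]
        plus = [k for k, a in enumerate(alpha) if a > 0]
        minus = [k for k, a in enumerate(alpha) if a < 0]
        zero = [k for k, a in enumerate(alpha) if a == 0]
        candidates = [gens[k] for k in zero + minus]
        for k in plus:
            for j in minus:
                ratio = alpha[k] / alpha[j]  # negative, so -ratio > 0
                # C_kj = W*_k - (alpha(k)/alpha(j)) W*_j has alpha(C_kj) = 0
                candidates.append([gk - ratio * gj for gk, gj in zip(gens[k], gens[j])])
        # Crude stand-in for frame(): drop zero vectors and exact duplicates.
        gens = []
        for g in candidates:
            if any(x != 0 for x in g) and g not in gens:
                gens.append(g)
        if not gens:
            break
    return gens


def in_pos_w(y: Sequence[float], polar: List[Vector]) -> bool:
    """y is in pos W iff y'w* <= 0 for every polar generator w*."""
    yf = [Fraction(v) for v in y]
    return all(_dot(yf, g) <= 0 for g in polar)


pw = polar_generators([[1, 0], [0, 1]])    # pos W = nonnegative quadrant
print([[int(v) for v in g] for g in pw])   # [[-1, 0], [0, -1]]
print(polar_generators([[1, -1]]))         # [] : pos W = R, complete recourse
```

Each returned generator $w^*$ yields an implied constraint of the form $(h(\xi) - T(\xi)x)^{\top} w^* \le 0$ on the first-stage decisions.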
For networks there are some more specific results. The starting point is the well-known result by D. Gale [2] and A.J. Hoffman [6]. Let a network be given by a set of arcs A = {1, ..., m} and a set of nodes N = {1, ..., m}. An arc k may be described by its start node i and end node j by k ~ (i, j). We let β(i) be the external flow in node i (with a positive number meaning supply), and let γ(k) be the capacity of arc k. By $Q^+ = [Y, N \setminus Y]^+$ we understand the set of arcs starting in a node in Y and ending in $N \setminus Y$. We define $Q^-$ similarly, and define $Q = Q^+ \cup Q^-$ as a cut. Finally, we let G(Y) be the graph consisting of the nodes in Y and the arcs connecting nodes in Y. The Gale-Hoffman (GH) result says that a capacitated network flow problem is feasible if and only if, for every cut $Q = [Y, N \setminus Y]$, the net supply in Y is less than or equal to the capacity of $Q^+$, i.e., if the GH-inequality

$$\sum_{i \in Y} \beta(i) \le \sum_{k \in Q^+} \gamma(k) \tag{1}$$


is satisfied. Following [11] this has been strengthened to 
show that a GH-inequality is needed if and only if G(Y) 
and G(N\ Y) are both connected. If either of them is 
disconnected, the GH-inequality is not needed as it is 
implied by other GH-inequalities. This is an interesting 
result as it shows that a linear dependency argument 
can be obtained by looking only at one inequality at 
a time. 
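For small networks the GH condition can be checked by brute force, which also makes the exponential number of cuts visible. A minimal sketch, assuming a balanced network (total external flow zero) and hypothetical names; it checks every nonempty node subset Y and does not apply the connectivity reduction of [11].

```python
from itertools import combinations


def gale_hoffman_violation(nodes, arcs, beta, cap):
    """Return a node set Y violating sum_{i in Y} beta(i) <= sum_{k in Q+} gamma(k),
    or None if every GH-inequality holds.

    arcs maps an arc id to its (start node, end node); beta maps a node to its
    external flow (positive = supply); cap maps an arc id to its capacity.
    """
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            Y = set(subset)
            net_supply = sum(beta[i] for i in Y)
            # Q+ = arcs starting in Y and ending in N \ Y
            out_capacity = sum(cap[k] for k, (i, j) in arcs.items()
                               if i in Y and j not in Y)
            if net_supply > out_capacity:
                return Y
    return None


arcs = {"k1": (1, 2), "k2": (2, 3)}
cap = {"k1": 5.0, "k2": 4.0}
print(gale_hoffman_violation([1, 2, 3], arcs, {1: 4.0, 2: 0.0, 3: -4.0}, cap))  # None
print(gale_hoffman_violation([1, 2, 3], arcs, {1: 5.0, 2: 0.0, 3: -5.0}, cap))  # {1, 2}
```

In the second call the cut leaving Y = {1, 2} consists of arc k2 only, whose capacity 4 is smaller than the net supply 5 in Y.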

An algorithm for finding all necessary GH-inequalities can be found in [12], and one for updating the inequalities when new arcs or nodes are introduced is given in [3]. The latter can be seen as a network interpretation of procedure support. These results are shown for uncapacitated networks in [4,9]. They can be found directly or deduced from the capacitated case by observing that (1) will always be satisfied as long as $Q^+ \ne \emptyset$, since an uncapacitated arc has unlimited capacity. Hence, we keep only those GH-inequalities for which $Q^+ = \emptyset$. The result that G(Y) and $G(N \setminus Y)$ must both be connected still applies.

If the recourse problem is a network flow problem, the W-matrix is the node-arc incidence matrix (with a row removed), and both external flows and arc capacities are functions of the first-stage decisions x and the random variables $\xi$. If β and γ in (1) are replaced by the appropriate expressions in x and $\xi$, and the results are added to $Ax = b$, we obtain relatively complete recourse, which was our goal. Since this must be true for all $\xi_j$ for a given x, we can normally perform a worst-case analysis with respect to $\xi_j$ and add the result to the first-stage constraints.

If a capacitated network flow problem is formulated as an LP, with the upper bounds written explicitly as constraints, and procedure support is applied, the columns of $W^*$ will correspond to an index vector for β and γ (with negative signs on the index vector for γ) in (1).
Introduction


The classical paradigm in mathematical programming is to develop a model that assumes that the input data is precisely known and equal to some nominal values. This approach, however, does not take into account the influence of data uncertainties on the quality and feasibility of the model. It is therefore conceivable that, as the data takes values different from the nominal ones, several constraints may be violated, and the optimal solution found using the nominal data may no longer be optimal or even feasible. In a numerical case study on linear optimization problems from the NETLIB library, Ben-Tal and Nemirovski [1] concluded that in real-world applications of linear optimization, one cannot ignore the possibility that a small uncertainty in the data can make the usual optimal solution completely meaningless from a practical viewpoint. This observation raises the natural question of designing solution approaches that are immune to data uncertainty, that is, approaches that are "robust". Modern robust linear optimization models have been proposed by Ben-Tal and Nemirovski [1], Bertsimas and Sim [2] and Chen, Sim and Sun [3]. While the proposals of Ben-Tal and Nemirovski [1] and Chen, Sim and Sun [3] lead to second-order cone optimization problems (SOCP), the proposal of Bertsimas and Sim [2] preserves the linearity of the model. We focus on the results of Bertsimas and Sim [2].
Formulation


We consider the following nominal linear optimization problem,

$$\begin{aligned}\text{maximize}\quad & c'x\\ \text{subject to}\quad & a_i'x \le b_i, \quad i = 1, \dots, m,\end{aligned} \tag{1}$$

in which the data at every constraint, $(a_i, b_i)$, $i = 1, \dots, m$, are potentially uncertain at the time when the decision $x \in \Re^n$ needs to be made. For simplicity of exposition, we assume without loss of generality that the data in the objective c are not subject to uncertainty, since we can use the objective maximize t and add the constraint $t - \tilde{c}'x \le 0$ to the model.


Affine Data Dependency 


We focus on constraint-wise uncertainties. Clearly, uncertainties appearing in various parts of the data in any ith constraint $(a_i, b_i)$ may be correlated. To capture such correlation, we assume that the uncertain data at the ith constraint depend affinely on some primitive uncertainty vector $z^i \in \Re^{N_i}$ as follows:

$$a_i = a_i(z^i) = a_i^0 + \sum_{j=1}^{N_i} a_i^j z_j^i, \qquad b_i = b_i(z^i) = b_i^0 + \sum_{j=1}^{N_i} b_i^j z_j^i,$$

where $(a_i^j, b_i^j) \in \Re^{n+1}$, $j = 1, \dots, N_i$, are nonzero vectors. Note that we can always define a bijective mapping from the vector space of $z^i \in \Re^{N_i}$ at the ith constraint to the corresponding data space of $(a_i, b_i)$. Therefore, under the affine data dependency, it is always possible to map all the data uncertainties affecting the ith constraint to the primitive uncertainty vector $z^i$. The affine data mapping can easily represent linear relations among data entries. As an illustration, consider


$$\begin{pmatrix} a_1 \\ a_2 \\ b \end{pmatrix}(z_1, z_2) = \begin{pmatrix} 100 \\ 200 \\ 300 \end{pmatrix} + \begin{pmatrix} 2 \\ -5 \\ 0 \end{pmatrix} z_1 + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} z_2;$$

the data is a vector in $\Re^3$ and has two primitive uncertainties. The first and second elements are related such that when $a_1$ increases by two units, $a_2$ decreases by five units. The value of b can change independently of $a_1$ or $a_2$ and is controlled by another primitive uncertainty, $z_2$. For the case when all data entries in $(a_i, b_i)$ are independently distributed, we would have $N_i = n + 1$, and the vector $(a_i^j, b_i^j)$, $j = 1, \dots, n + 1$, takes a nonzero value only in the jth row. We assume that the distributions of the primitive uncertainty vectors $z^i$, $i = 1, \dots, m$, are mildly characterized as follows:


Model of Data Uncertainty U We assume that $z_j^i$, $j = 1, \dots, N_i$, are independently (but not necessarily identically) distributed random variables with zero means and support in [-1, 1].

Under the model of data uncertainty U, the nominal value of $z^i$ is a zero vector. Hence, it follows naturally that $(a_i^0, b_i^0)$ is the nominal value of the uncertain data $(a_i(z^i), b_i(z^i))$. Similarly, $(a_i^j, b_i^j)$, $j = 1, \dots, N_i$, is the direction of data perturbation under the influence of the primitive uncertainty $z_j^i$.

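The illustration with nominal data (100, 200, 300) can be evaluated directly. A minimal sketch with hypothetical names, storing $(a_1, a_2, b)$ as one stacked vector; under model U each $z_j$ would be drawn from [-1, 1].

```python
def affine_data(nominal, directions, z):
    """Stacked data (a(z), b(z)) = (a^0, b^0) + sum_j (a^j, b^j) z_j."""
    data = list(nominal)
    for direction, zj in zip(directions, z):
        for r, d in enumerate(direction):
            data[r] += d * zj
    return data


NOMINAL = [100.0, 200.0, 300.0]      # (a1, a2, b) at z = 0
DIRECTIONS = [[2.0, -5.0, 0.0],      # z1 moves a1 and a2 together
              [0.0, 0.0, 1.0]]       # z2 moves b alone

print(affine_data(NOMINAL, DIRECTIONS, [0.0, 0.0]))   # [100.0, 200.0, 300.0]
print(affine_data(NOMINAL, DIRECTIONS, [1.0, 0.0]))   # [102.0, 195.0, 300.0]
print(affine_data(NOMINAL, DIRECTIONS, [0.5, -1.0]))  # [101.0, 197.5, 299.0]
```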
In robust optimization, we represent data uncertainty using uncertainty sets instead of probability distributions. At each constraint, we allow the primitive uncertainty vector $z^i$ to vary within an uncertainty set $\mathcal{U}_i$ and require that the ith constraint remain satisfied. We call the following problem the robust counterpart of Problem (1):

$$\begin{aligned}\text{maximize}\quad & c'x\\ \text{subject to}\quad & a_i(z^i)'x \le b_i(z^i) \quad \forall z^i \in \mathcal{U}_i,\ i = 1, \dots, m.\end{aligned} \tag{2}$$

Indeed, given any solution to the robust counterpart, we are able to guarantee deterministically that the solution will remain feasible if the primitive uncertainty vector $z^i$ lies within the uncertainty set $\mathcal{U}_i$. Under the model of data uncertainty U, we are able to design uncertainty sets that guarantee feasibility of the constraints with high probability without being overly conservative.

Although the robust counterpart (2) has a possibly exponential or even infinite number of constraints, it is nevertheless a convex optimization problem with respect to its decision variable x. In fact, the computational complexity of the robust counterpart depends on the nature of the uncertainty set $\mathcal{U}_i$. Here, we focus on polyhedral uncertainty sets, for which the robust counterparts are computationally tractable linear optimization problems that are only moderately larger than the nominal problems.

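Using linear programming duality, we show how to convert the robust counterpart (2) into a concise linear optimization problem. We focus on the following polyhedral uncertainty set,

$$\mathcal{P}_i = \{\, z \in \Re^{N_i} : S_i z + T_i u \le r_i \text{ for some } u \,\}.$$

Theorem 1 The robust counterpart

$$a_i(z^i)'x \le b_i(z^i) \quad \forall z^i \in \mathcal{P}_i \tag{3}$$

is equivalent to

$$\exists p_i:\quad r_i'p_i \le b_i^0 - a_i^{0\prime}x,\quad S_i'p_i = \begin{pmatrix} a_i^{1\prime}x - b_i^1 \\ \vdots \\ a_i^{N_i\prime}x - b_i^{N_i} \end{pmatrix},\quad T_i'p_i = 0,\quad p_i \ge 0. \tag{4}$$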
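Proof For notational convenience, we ignore the constraint index i. Under the affine data dependency, we can represent the constraint (3) as

$$\sum_{j=1}^{N} (a^{j\prime}x - b^j) z_j \le b^0 - a^{0\prime}x \quad \forall z \in \mathcal{P},$$

or equivalently as

$$\max_{z \in \mathcal{P}} \sum_{j=1}^{N} (a^{j\prime}x - b^j) z_j \le b^0 - a^{0\prime}x. \tag{5}$$

By standard linear programming duality, the optimal objective value of the problem

$$\begin{aligned}\text{maximize}\quad & d'z\\ \text{subject to}\quad & Sz + Tu \le r\end{aligned}$$

is the same as that of

$$\begin{aligned}\text{minimize}\quad & r'p\\ \text{subject to}\quad & S'p = d\\ & T'p = 0\\ & p \ge 0.\end{aligned}$$

Hence, the constraint (5) is equivalent to

$$\min_{p}\left\{ r'p \;:\; S'p = \begin{pmatrix} a^{1\prime}x - b^1 \\ \vdots \\ a^{N\prime}x - b^N \end{pmatrix},\ T'p = 0,\ p \ge 0 \right\} \le b^0 - a^{0\prime}x. \tag{6}$$

Finally, to ensure the feasibility of x, we only need to find a vector p that is feasible in the optimization problem on the left-hand side of (6) and satisfies $r'p \le b^0 - a^{0\prime}x$: if such a vector p exists, the corresponding minimizer $p^*$ also satisfies $r'p^* \le b^0 - a^{0\prime}x$. □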
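Hence, the robust counterpart (2) in which $U_i = P_i$, $i = 1,\dots,m$, is equivalent to

$$
\begin{aligned}
\text{maximize}\quad & c'x\\
\text{subject to}\quad & r_i'p_i \le b_i^0 - (a_i^0)'x, && i = 1,\dots,m,\\
& S_i'p_i = \big((a_i^1)'x - b_i^1, \dots, (a_i^{N_i})'x - b_i^{N_i}\big)', && i = 1,\dots,m,\\
& T_i'p_i = 0, && i = 1,\dots,m,\\
& p_i \ge 0, && i = 1,\dots,m.
\end{aligned}
\tag{7}
$$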


Worst Case Uncertainty Set 


Under the model of data uncertainty U, the primitive uncertainty vector $z^i$ of the ith constraint has support in $[-1, 1]^{N_i}$. Hence, we define the worst-case uncertainty set

$$W_i = \{ z \in \mathbb{R}^{N_i} : -\mathbf{1} \le z \le \mathbf{1} \}$$

for the primitive uncertainty corresponding to the ith constraint. This uncertainty set was first considered by Soyster [4].


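Theorem 2 The robust counterpart

$$a_i(z^i)'x \le b_i(z^i) \quad \forall z^i \in W_i$$

is equivalent to

$$
\exists\, p_i, q_i:\quad
\begin{aligned}
& \mathbf{1}'(p_i + q_i) \le b_i^0 - (a_i^0)'x,\\
& p_i - q_i = \big((a_i^1)'x - b_i^1, \dots, (a_i^{N_i})'x - b_i^{N_i}\big)',\\
& p_i, q_i \ge 0.
\end{aligned}
\tag{8}
$$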
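The robust counterpart (2) in which $U_i = W_i$, $i = 1,\dots,m$, is equivalent to

$$
\begin{aligned}
\text{maximize}\quad & c'x\\
\text{subject to}\quad & \mathbf{1}'(p_i + q_i) \le b_i^0 - (a_i^0)'x, && i = 1,\dots,m,\\
& p_i - q_i = \big((a_i^1)'x - b_i^1, \dots, (a_i^{N_i})'x - b_i^{N_i}\big)', && i = 1,\dots,m,\\
& p_i, q_i \ge 0, && i = 1,\dots,m.
\end{aligned}
\tag{9}
$$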
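Theorem 2 can be checked numerically on small instances. The following Python sketch (function names are illustrative, not from the source) takes $d_j = (a_i^j)'x - b_i^j$ for a fixed x and builds the certificate $p = \max(d, 0)$, $q = \max(-d, 0)$. Its value $\mathbf{1}'(p + q) = \sum_j |d_j|$ is compared with a brute-force maximum over the $2^N$ vertices of the box, so the sketch is only meant for small N.

```python
from itertools import product


def soyster_certificate(d):
    """Certificate (p, q) of Theorem 2 for the box W = [-1, 1]^N.

    d[j] plays the role of (a^j)'x - b^j. With p = max(d, 0) and q = max(-d, 0)
    we get p - q = d, and 1'(p + q) = sum_j |d_j| is the worst-case value.
    """
    p = [max(dj, 0.0) for dj in d]
    q = [max(-dj, 0.0) for dj in d]
    return p, q, sum(p) + sum(q)


def worst_case_by_enumeration(d):
    """Max of sum_j d_j z_j over the 2^N vertices of W (a linear function peaks at a vertex)."""
    return max(sum(dj * zj for dj, zj in zip(d, z))
               for z in product((-1.0, 1.0), repeat=len(d)))


def soyster_feasible(a0, b0, A, B, x):
    """Robust feasibility of a(z)'x <= b(z) for all z in W, with affine data
    a(z) = a0 + sum_j z_j A[j] and b(z) = b0 + sum_j z_j B[j]."""
    d = [sum(aj * xk for aj, xk in zip(row, x)) - bj for row, bj in zip(A, B)]
    _, _, protection = soyster_certificate(d)
    return sum(ak * xk for ak, xk in zip(a0, x)) + protection <= b0
```

The certificate is exactly the dual solution that Theorem 2 asks for. Any other feasible split of d into $p - q$ has a larger value $\mathbf{1}'(p+q)$.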


Uncertainty Set with a Budget 


Under the model of data uncertainty U, the worst-case uncertainty set can be overly conservative. Intuitively, when $N_i$ is large, it is unlikely that all components of the primitive uncertainty vector $z^i$ deviate simultaneously in the direction that violates the constraint. Bertsimas and Sim [2] consider the following uncertainty set, which is able to withstand parameter uncertainty without excessively affecting the objective function:

$$B_i(\Gamma_i) = \Big\{ z \in \mathbb{R}^{N_i} : -\mathbf{1} \le z \le \mathbf{1},\ \sum_{j=1}^{N_i} |z_j| \le \Gamma_i \Big\}.$$


The goal is to be protected against all cases in which up to $\Gamma_i$ of the primitive uncertainties $z_j^i$, $j = 1,\dots,N_i$, are allowed to deviate maximally in $[-1, 1]$. In other words, Bertsimas and Sim stipulate that nature will be restricted in its behavior: only a subset of the primitive uncertainties will change in order to adversely affect the solution. They propose an approach with the property that, if nature behaves like this, the robust solution will be feasible deterministically.

The parameter $\Gamma_i$, commonly known as the budget of uncertainty, controls the size of the uncertainty set $B_i(\Gamma_i)$, with $B_i(0) = \{0\}$ and $B_i(N_i) = W_i$. Note that the uncertainty set is a polytope:

$$B_i(\Gamma_i) = \{ z \in \mathbb{R}^{N_i} : u \le \mathbf{1},\ \mathbf{1}'u \le \Gamma_i,\ -u \le z \le u \ \text{for some } u \}.$$


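Theorem 3 The robust counterpart

$$a_i(z^i)'x \le b_i(z^i) \quad \forall z^i \in B_i(\Gamma_i)$$

is equivalent to

$$
\exists\, t_i, s_i, p_i, q_i:\quad
\begin{aligned}
& \Gamma_i t_i + \mathbf{1}'s_i \le b_i^0 - (a_i^0)'x,\\
& s_i + \mathbf{1}t_i = p_i + q_i,\\
& p_i - q_i = \big((a_i^1)'x - b_i^1, \dots, (a_i^{N_i})'x - b_i^{N_i}\big)',\\
& p_i, q_i, s_i \ge 0,\ t_i \ge 0.
\end{aligned}
\tag{10}
$$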
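The robust counterpart (2) in which $U_i = B_i(\Gamma_i)$, $i = 1,\dots,m$, is equivalent to

$$
\begin{aligned}
\text{maximize}\quad & c'x\\
\text{subject to}\quad & \Gamma_i t_i + \mathbf{1}'s_i \le b_i^0 - (a_i^0)'x, && i = 1,\dots,m,\\
& s_i + \mathbf{1}t_i = p_i + q_i, && i = 1,\dots,m,\\
& p_i - q_i = \big((a_i^1)'x - b_i^1, \dots, (a_i^{N_i})'x - b_i^{N_i}\big)', && i = 1,\dots,m,\\
& p_i, q_i, s_i \ge 0,\ t_i \ge 0, && i = 1,\dots,m.
\end{aligned}
\tag{11}
$$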
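Theorem 3 can be made concrete. For a fixed x, write $d_j = (a_i^j)'x - b_i^j$. The worst case over $B_i(\Gamma_i)$ puts the $\lfloor \Gamma_i \rfloor$ largest $|d_j|$ at their extremes and the next largest at the fractional level $\Gamma_i - \lfloor \Gamma_i \rfloor$. The Python sketch below computes this protection value. It also constructs a feasible $(t, s, p, q)$ for system (10) with the same value, taking t equal to the $(\lfloor \Gamma_i \rfloor + 1)$-th largest $|d_j|$. This explicit dual choice was worked out for this illustration and is not taken from the text. The function names are hypothetical.

```python
import math
from itertools import product


def budget_protection(d, gamma):
    """Max of sum_j d_j z_j over B(Gamma) = {-1 <= z <= 1, sum_j |z_j| <= Gamma}."""
    mags = sorted((abs(dj) for dj in d), reverse=True)
    k = math.floor(gamma)
    value = sum(mags[:k])
    if k < len(mags):
        value += (gamma - k) * mags[k]   # fractional part of the budget
    return value


def budget_certificate(d, gamma):
    """A feasible (t, s, p, q) for system (10) whose value Gamma*t + 1's equals budget_protection."""
    mags = sorted((abs(dj) for dj in d), reverse=True)
    k = math.floor(gamma)
    t = mags[k] if k < len(mags) else 0.0
    s = [max(abs(dj) - t, 0.0) for dj in d]
    # p - q = d and p + q = s + t (which is >= |d_j|), hence p, q >= 0
    p = [(sj + t + dj) / 2 for sj, dj in zip(s, d)]
    q = [(sj + t - dj) / 2 for sj, dj in zip(s, d)]
    return t, s, p, q, gamma * t + sum(s)


def brute_force_integer_budget(d, gamma):
    """For integer Gamma the vertices of B(Gamma) lie in {-1, 0, 1}^N; enumerate them."""
    best = 0.0
    for z in product((-1, 0, 1), repeat=len(d)):
        if sum(map(abs, z)) <= gamma:
            best = max(best, sum(dj * zj for dj, zj in zip(d, z)))
    return best
```

Because the certificate attains the primal worst-case value, the sketch shows strong duality at work. The minimum in Theorem 3 is reached, not merely bounded.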


If more than $\Gamma_i$ of the primitive uncertainties $z_j^i$, $j = 1,\dots,N_i$, change, the robust solution remains feasible with very high probability.

Theorem 4 Under the model of data uncertainty U, if x is feasible in the robust counterpart (10), then

$$\Pr\big(a_i(z^i)'x > b_i(z^i)\big) \le \exp\Big(-\frac{\Gamma_i^2}{2N_i}\Big). \tag{12}$$


Note that the original bound proposed by Bertsimas and Sim [2] requires bounded, symmetrically distributed primitive uncertainties. This condition was relaxed by Chen, Sim and Sun [3]. Nevertheless, tighter bounds can be achieved if the distributions are symmetric.
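Theorem 5

1. Suppose $z_j^i$, $j = 1,\dots,N_i$, are independent, symmetrically distributed random variables in $[-1, 1]$. Then, if x is feasible in the robust counterpart (10), we have

$$\Pr\big(a_i(z^i)'x > b_i(z^i)\big) \le B(N_i, \Gamma_i), \tag{13}$$

where

$$B(n, \Gamma_i) = \frac{1}{2^n}\Big[(1-\mu)\binom{n}{\lfloor \nu \rfloor} + \sum_{l=\lfloor \nu \rfloor + 1}^{n} \binom{n}{l}\Big], \tag{14}$$

with $\nu = (\Gamma_i + n)/2$ and $\mu = \nu - \lfloor \nu \rfloor$.

2. For $\Gamma_i = \theta\sqrt{N_i}$,

$$\lim_{N_i \to \infty} B(N_i, \Gamma_i) = 1 - \Phi(\theta), \tag{15}$$

where

$$\Phi(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\theta} \exp\Big(-\frac{y^2}{2}\Big)\, dy$$

is the cumulative distribution function of a standard normal.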


We next compare the bounds (12) (Bound 1) and (13) (Bound 2) with the approximate bound (15). Table 1 shows the choice of $\Gamma_i$ as a function of $N_i$ such that the probability of constraint violation is less than 1%, evaluated with Bounds 1 and 2 and with the approximate bound. Bound 2 and the approximate bound give essentially identical values of $\Gamma_i$, while Bound 1 leads to unnecessarily higher values of $\Gamma_i$ when the primitive uncertainties are symmetrically distributed. For $N_i = 200$, we need $\Gamma_i = 33.9$, i.e., only 17% of the number of uncertain data, to guarantee a violation probability of less than 1%. For constraints with fewer uncertain data, such as $N_i = 5$, full protection is necessary, which is equivalent to the worst case, i.e. Soyster's method. Clearly, for constraints with a large number of uncertain data, the proposed approach delivers less conservative solutions than Soyster's method.
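The values of $\Gamma_i$ in Table 1 can be recomputed from the bounds. The following Python sketch evaluates Bound 1 (12) and Bound 2 (14). It then searches for the smallest $\Gamma$ on a grid of step 0.1 whose violation bound is at most 1%. The grid search and the function names are assumptions made for this illustration. Bound 1 could also be inverted in closed form as $\Gamma = \sqrt{2N\ln(1/\varepsilon)}$, and the grid search rounds that value up to the next multiple of 0.1. `math.comb` requires Python 3.8 or later.

```python
import math


def bound1(n, gamma):
    """Bound (12): exp(-Gamma^2 / (2n))."""
    return math.exp(-gamma ** 2 / (2 * n))


def bound2(n, gamma):
    """Bound (14) for independent, symmetrically distributed primitive uncertainties."""
    nu = (gamma + n) / 2
    fl = math.floor(nu)
    mu = nu - fl
    tail = sum(math.comb(n, j) for j in range(fl + 1, n + 1))
    return ((1 - mu) * math.comb(n, fl) + tail) / 2 ** n


def smallest_gamma(bound, n, eps=0.01, step=0.1):
    """Smallest Gamma on the grid 0, step, 2*step, ... below n with bound(n, Gamma) <= eps.

    Returns n (full protection, i.e. Soyster's method) if no grid point qualifies.
    """
    k = 0
    while k * step < n:
        gamma = round(k * step, 10)
        if bound(n, gamma) <= eps:
            return gamma
        k += 1
    return float(n)
```

For $N_i = 10$ the sketch gives $\Gamma = 9.6$ from Bound 1 and $\Gamma = 8.2$ from Bound 2. For $N_i = 5$ no $\Gamma < 5$ reaches 1% under either bound, so full protection is returned.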


Price of Robustness for Linear Optimization Problems, Table 1
Choice of Γ_i as a function of N_i so that the probability of constraint violation is less than 1%

N_i  | Γ_i from Bound 1 | Γ_i from Bound 2 | Γ_i from approx.
5    | 5                | 5                | 5
10   | 9.6              | 8.2              | 8.4
100  | 30.3             | 24.3             | 24.3
200  | 42.9             | 33.9             | 33.9
2000 | 135.7            | 105              | 105


Applications


Most practical optimization problems are in the form of a mixed integer programming (MIP) model,
i.e., some of the variables in the vector x take integer values. Fortunately, when the nominal problem is a MIP, the robust model (11) is still a MIP formulation and can therefore be solved in the same way as the nominal problem. Moreover, both the deterministic guarantee and the probabilistic guarantees (Theorems 4 and 5) remain valid. As a result, the robust approach applies to data uncertainty in MIPs. In our computational studies, we apply the robust formulation to a zero-one knapsack problem that is subject to data uncertainty. We examine whether this approach is computationally tractable, and whether it succeeds in reducing the price of robustness.

The zero-one knapsack problem is the following discrete optimization problem:

$$
\begin{aligned}
\text{maximize}\quad & c'x\\
\text{subject to}\quad & w'x \le b,\\
& x \in \{0, 1\}^n.
\end{aligned}
$$


Although the knapsack problem is NP-hard, problems of moderate size are often solved to optimality by state-of-the-art MIP solvers. For this experiment, we use CPLEX 6.0 to solve a random knapsack problem of size n = 200 to optimality.

Regarding the uncertainty model for data, we assume that the weights $\tilde w_i$ are uncertain, independently distributed, and follow symmetric distributions in $[w_i - \delta_i, w_i + \delta_i]$. Hence, under the affine data dependency, we have

$$\tilde w_i(z) = w_i + \delta_i z_i, \quad i = 1,\dots,n,$$

in which $z_i$, $i = 1,\dots,n$, are independently distributed and follow symmetric distributions in $[-1, 1]$.

Price of Robustness for Linear Optimization Problems, Figure 1
Optimal value of the robust knapsack formulation as a function of Γ

An application of this problem is to maximize the total value of goods to be loaded on a cargo that has strict weight restrictions. The weight of each individual item is assumed to be uncertain, independent of the other weights, and symmetrically distributed. In our robust model, we want to maximize the total value of the goods while allowing at most a 1% chance of constraint violation. The robust model is as follows:


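$$
\begin{aligned}
\text{maximize}\quad & c'x\\
\text{subject to}\quad & w'x + \sum_{i=1}^{n} \delta_i x_i z_i \le b \quad \forall z \in B(\Gamma),\\
& x \in \{0, 1\}^n.
\end{aligned}
$$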


For the random knapsack example, we set the capacity limit b to 4000. The nominal weights $w_i$ are randomly chosen from the set {20, 21, ..., 29}, and the costs $c_i$ are randomly chosen from the set {16, 17, ..., 77}. We set the weight uncertainty $\delta_i$ to 10% of the nominal weight. The time to solve the robust discrete problems to optimality using CPLEX 6.0 on a Pentium II 400 PC ranges from 0.05 to 50 s.
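The mechanics of the robust knapsack model can be seen on a toy instance without a MIP solver. The following Python sketch uses a hypothetical four-item instance, not the n = 200 instance of the experiment. It enumerates all 0-1 vectors and evaluates the worst case over $B(\Gamma)$ in closed form. The robust load is $w'x$ plus the $\lfloor\Gamma\rfloor$ largest deviations $\delta_i$ among the chosen items, plus the fractional part of $\Gamma$ times the next largest deviation. Enumeration is exponential in n, so the sketch is only intended for checking small cases.

```python
import math
from itertools import product


def robust_load(w, delta, x, gamma):
    """Worst-case weight w'x + max_{z in B(Gamma)} sum_i delta_i x_i z_i of a 0-1 vector x."""
    devs = sorted((di for di, xi in zip(delta, x) if xi), reverse=True)
    k = math.floor(gamma)
    extra = (gamma - k) * devs[k] if k < len(devs) else 0.0
    return sum(wi * xi for wi, xi in zip(w, x)) + sum(devs[:k]) + extra


def robust_knapsack_bruteforce(c, w, delta, b, gamma):
    """Enumerate x in {0,1}^n; return (best value, best x) of the robust knapsack model."""
    best_value, best_x = None, None
    for x in product((0, 1), repeat=len(c)):
        if robust_load(w, delta, x, gamma) <= b:
            value = sum(ci * xi for ci, xi in zip(c, x))
            if best_value is None or value > best_value:
                best_value, best_x = value, x
    return best_value, best_x


# Hypothetical toy data: values, nominal weights, deviations, capacity.
c, w, delta, b = [10, 7, 6, 4], [5, 4, 3, 2], [1, 1, 1, 1], 10
for gamma in (0, 1, 4):
    print(gamma, robust_knapsack_bruteforce(c, w, delta, b, gamma)[0])
```

As $\Gamma$ grows from 0 (nominal) to 4 (Soyster's full protection), the optimal value can only decrease. This is the price-of-robustness trade-off that Figure 1 reports for the real instance.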

Figure 1 illustrates the effect of the protection level on the objective function value. Without protection of the capacity constraint, the optimal value is 5592. With maximum protection, i.e. adopting Soyster's method, the optimal value is reduced by 5.5% to 5283. In Fig. 2, we plot the optimal value with respect to the approximate probability bound of constraint violation. In Table 2, we present a sample of the objective function values and the probability bounds of constraint violation.

Price of Robustness for Linear Optimization Problems, Figure 2
Optimal value of the robust knapsack formulation as a function of the probability bound of constraint violation given in (13)

Price of Robustness for Linear Optimization Problems, Table 2
Results of Robust Knapsack Solutions

Γ | Probability Bound | Optimal Value | Reduction

It is interesting to note that the optimal value is only marginally affected when we increase the protection level. For instance, to have a probability guarantee of at most 0.57% chance of constraint violation, we only reduce the objective by 1.54%.
Conclusions


The robust methodology provides solutions that ensure 
deterministic and probabilistic guarantees that con- 
straints will be satisfied as data change. Moreover, the 
protection level determines probability bounds of con- 
straint violation, which do not depend on the solution 
of the robust model. As the robust model remains a lin- 
ear optimization problem, the method naturally applies 
to discrete optimization problems. It is also possible to 
build uncertainty sets that are mapped from asymmet- 
rical distributions. The interested reader may refer to 
Chen, Sim and Sun [3]. 
LCP


Linear complementarity problems in the general form are given as follows (cf. also Linear complementarity problem):

$$-Mu + v = q, \qquad u, v \ge 0, \qquad u_i v_i = 0, \quad i = 1,\dots,n,$$

where M is a given (n × n) matrix and $q \in \mathbb{R}^n$. The variables $u_i$ and $v_i$, $i = 1,\dots,n$, must be nonnegative and complementary. The solvability of LCPs depends on special properties of the coefficient matrix M. For illustration, we give some well-solvable classes in the sequel.
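To experiment with the definition, the following Python sketch (the helper name is illustrative) recovers v from the equality constraint as $v = q + Mu$. It then checks nonnegativity and complementarity exactly. Integers or `fractions.Fraction` values avoid rounding tolerances.

```python
from fractions import Fraction


def is_lcp_solution(M, q, u):
    """Check whether u, with v = q + M u, solves -Mu + v = q, u, v >= 0, u_i v_i = 0."""
    n = len(q)
    v = [q[i] + sum(M[i][j] * u[j] for j in range(n)) for i in range(n)]
    ok = (all(x >= 0 for x in u) and all(x >= 0 for x in v)
          and all(u[i] * v[i] == 0 for i in range(n)))
    return ok, v


# Example: a positive definite M whose solution has both u_i > 0 (hence v = 0).
ok, v = is_lcp_solution([[2, 1], [1, 2]], [-5, -6], [Fraction(4, 3), Fraction(7, 3)])
```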

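LCPs are natural generalizations of linear programming (linear optimization) and quadratic programming. In the first case, with $u = (x, y)$, where y is the vector of multipliers of $Ax \ge b$,

$$M = \begin{pmatrix} 0 & -A^T \\ A & 0 \end{pmatrix} \quad\text{and}\quad q = \begin{pmatrix} c \\ -b \end{pmatrix},$$

where the matrix A and the vectors b and c are the problem data of the linear optimization problem

$$\min\{ c^T x : Ax \ge b,\ x \ge 0 \}.$$

This way a skew-symmetric matrix M is obtained. When the LCP is obtained from the convex quadratic optimization problem

$$\min\Big\{ c^T x + \tfrac{1}{2} x^T Q x : Ax \ge b,\ x \ge 0 \Big\},$$

where Q is a positive semidefinite symmetric matrix, then

$$M = \begin{pmatrix} Q & -A^T \\ A & 0 \end{pmatrix} \quad\text{and}\quad q = \begin{pmatrix} c \\ -b \end{pmatrix}.$$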
Here M is a positive semidefinite bisymmetric matrix. Bisymmetry means that the matrix has a 2 × 2 block structure and is the sum of a symmetric, positive semidefinite block-diagonal matrix and a skew-symmetric matrix with zero diagonal blocks.
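The construction of (M, q) from the problem data can be sketched as follows. The function name and the ordering $u = (x, y)$, with y the multipliers of $Ax \ge b$, are assumptions of this sketch. With Q = 0 it returns the skew-symmetric LP matrix. The identity $M + M^T = \mathrm{diag}(2Q, 0)$ exhibits the bisymmetric structure.

```python
def qp_to_lcp(Q, A, c, b):
    """(M, q) for min c'x + 1/2 x'Qx s.t. Ax >= b, x >= 0 (Q = 0 gives the LP case).

    With u = (x, y) the LCP reads v = q + M u = (c + Qx - A'y, Ax - b),
    i.e. M = [[Q, -A'], [A, 0]] and q = (c, -b).
    """
    m, n = len(A), len(c)
    M = [[Q[i][j] for j in range(n)] + [-A[k][i] for k in range(m)] for i in range(n)]
    M += [[A[k][j] for j in range(n)] + [0] * m for k in range(m)]
    q = list(c) + [-bk for bk in b]
    return M, q


# LP example: min x s.t. x >= 1, x >= 0, with optimum x = 1 and multiplier y = 1.
M, q = qp_to_lcp([[0]], [[1]], [1], [1])
u = [1, 1]
v = [q[i] + sum(M[i][j] * u[j] for j in range(2)) for i in range(2)]   # v = (0, 0)
```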

Some other classes of solvable LCPs occur when M is
• a P-matrix,
• a sufficient matrix or, equivalently, a P*-matrix [21].
LCPs are solvable by pivot methods or interior point methods (cf. also Linear programming: Interior point methods). Here the basics of some of the most popular principal pivot methods are discussed.


Principal Pivoting 


When one solves an LCP by pivoting, the equality constraints −Mu + v = q always hold. The coefficient matrix of the vector v, the unit matrix, serves as a natural initial basis. The corresponding basic solution u = 0, v = q is complementary as well. A basis is called complementary when, for every i = 1,...,n, exactly one of the complementary variables $(u_i, v_i)$ is a basic variable.

In a pivot algorithm, because we are working with 
basic solutions, the equality constraints always hold. 
We strive for nonnegativity while some variables leave 
the basis and others enter. Because the initial basis is 
complementary, it is a natural idea to preserve the com- 
plementarity property. This imposes that, if a set of 
variables leaves the basis, then precisely the variables in 
the complementary set must enter the basis. Such a step 
is called principal pivoting. 

Principal pivoting was introduced by A.W. Tucker 
[19,20]. The theory of principal pivoting, or comple- 
mentary pivot theory, was extensively studied [4,5,6,7, 
13,14,16]. 


Principal Pivot Algebra 


When we write down the basis (or simplex) tableau of the LCP and leave out the basis part, we get the tableau:

  |   | u
v | q | M


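Let $(\gamma, \bar\gamma)$ be a partition of the index set {1,...,n} and assume that the principal submatrix $M_{\gamma\gamma}$ is nonsingular. Then the current representation of the LCP can be written as

$$v_\gamma = q_\gamma + M_{\gamma\gamma}u_\gamma + M_{\gamma\bar\gamma}u_{\bar\gamma},$$
$$v_{\bar\gamma} = q_{\bar\gamma} + M_{\bar\gamma\gamma}u_\gamma + M_{\bar\gamma\bar\gamma}u_{\bar\gamma},$$

which in the usual tableau form is shown in Table 1. A principal pivot, in which the variables $v_\gamma$ leave the basis and their complementary pairs $u_\gamma$ enter, can be explained both in equation form and in tableau form as follows. Using the assumption that $M_{\gamma\gamma}$ is nonsingular, we may write

$$u_\gamma = q'_\gamma + M_{\gamma\gamma}^{-1}v_\gamma - M_{\gamma\gamma}^{-1}M_{\gamma\bar\gamma}u_{\bar\gamma},$$
$$v_{\bar\gamma} = q'_{\bar\gamma} + M_{\bar\gamma\gamma}M_{\gamma\gamma}^{-1}v_\gamma + \big(M_{\bar\gamma\bar\gamma} - M_{\bar\gamma\gamma}M_{\gamma\gamma}^{-1}M_{\gamma\bar\gamma}\big)u_{\bar\gamma},$$

where

$$q'_\gamma = -M_{\gamma\gamma}^{-1}q_\gamma, \qquad q'_{\bar\gamma} = q_{\bar\gamma} - M_{\bar\gamma\gamma}M_{\gamma\gamma}^{-1}q_\gamma.$$

This representation corresponds to the tableau in Table 2.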
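Further, the matrix

$$M' = \begin{pmatrix} M_{\gamma\gamma}^{-1} & -M_{\gamma\gamma}^{-1}M_{\gamma\bar\gamma} \\ M_{\bar\gamma\gamma}M_{\gamma\gamma}^{-1} & M_{\bar\gamma\bar\gamma} - M_{\bar\gamma\gamma}M_{\gamma\gamma}^{-1}M_{\gamma\bar\gamma} \end{pmatrix}$$

is called a principal pivotal transform of the matrix M. If we define

$$u'_\gamma = v_\gamma, \quad u'_{\bar\gamma} = u_{\bar\gamma}, \quad v'_\gamma = u_\gamma, \quad v'_{\bar\gamma} = v_{\bar\gamma},$$

then the LCP can equivalently be reformulated as

$$-M'u' + v' = q', \qquad u', v' \ge 0, \qquad u'_i v'_i = 0, \quad i = 1,\dots,n.$$

This way, without loss of generality, we may assume that the current principal pivot transform of the LCP contains v' as the vector of basic variables, that u' = 0, v' = q' is the current complementary solution, and that M' is the coefficient matrix of the nonbasic variables.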
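The principal pivot transform can be computed directly from these block formulas. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`). It relies on small Gauss–Jordan and matrix-product helpers written for this illustration and returns (M', q') for the representation v = q + Mu. Applying the transform twice on the same index set restores (M, q), which makes a convenient sanity check.

```python
from fractions import Fraction


def inverse(A):
    """Gauss-Jordan inverse of a small nonsingular matrix over the rationals."""
    k = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(k)]
           for i, row in enumerate(A)]
    for c in range(k):
        p = next((r for r in range(c, k) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("principal submatrix M_gg is singular")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(k):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def matmul(A, B):
    """Product of list-of-lists matrices (B may have zero columns)."""
    cols = len(B[0]) if B else 0
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(cols)]
            for i in range(len(A))]


def principal_pivot(M, q, gamma):
    """Principal pivot transform (M', q') of the LCP v = q + M u on the index set gamma."""
    n = len(q)
    g = sorted(gamma)
    gb = [i for i in range(n) if i not in gamma]

    def sub(rows, cols):
        return [[Fraction(M[i][j]) for j in cols] for i in rows]

    A_inv = inverse(sub(g, g))                                   # M_gg^{-1}
    B, C, D = sub(g, gb), sub(gb, g), sub(gb, gb)
    A_inv_B, C_A_inv = matmul(A_inv, B), matmul(C, A_inv)
    C_A_inv_B = matmul(C_A_inv, B)
    q_g = [[Fraction(q[i])] for i in g]
    A_inv_q, C_A_inv_q = matmul(A_inv, q_g), matmul(C_A_inv, q_g)
    Mp = [[Fraction(0)] * n for _ in range(n)]
    qp = [Fraction(0)] * n
    for a, i in enumerate(g):            # rows of the new basic variables u_gamma
        qp[i] = -A_inv_q[a][0]
        for b, j in enumerate(g):
            Mp[i][j] = A_inv[a][b]
        for b, j in enumerate(gb):
            Mp[i][j] = -A_inv_B[a][b]
    for a, i in enumerate(gb):           # rows of the remaining basic variables v_gammabar
        qp[i] = Fraction(q[i]) - C_A_inv_q[a][0]
        for b, j in enumerate(g):
            Mp[i][j] = C_A_inv[a][b]
        for b, j in enumerate(gb):
            Mp[i][j] = D[a][b] - C_A_inv_B[a][b]
    return Mp, qp
```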

When only one variable leaves the basis, i.e. γ = {k} for some index k, while its complementary pair enters, the operation is called a simple principal pivot, or diagonal pivot, because in this case $M_{\gamma\gamma} = m_{kk}$, the kth diagonal element of the matrix M. When two variables enter the basis and at the same time their pairs leave it, this 2 × 2 principal pivot is called an exchange pivot or double pivot.
Invariant Matrix Classes


Let a class of matrices K be given. The matrix class is said to be invariant under principal pivoting if, for all M ∈ K and for all principal pivot transforms M' of M, the relation M' ∈ K holds.

It is known that several matrix classes enjoy the invariance property. The most important invariant matrix classes are the following:

• the class of P-matrices;
• the class of P0-matrices;
• the class of positive definite matrices;
• the class of positive semidefinite matrices;
• the class of bisymmetric positive semidefinite matrices;
• the class of column sufficient matrices;
• the class of row sufficient matrices;
• the class of sufficient matrices;
• the class of Q-matrices.

Detailed discussion on matrix classes, their characteri- 
zation, invariance and other properties can be found in 
[7,16]. 


Simple Principal Pivoting Methods 


Simple principal pivoting methods use only simple principal pivots, i.e. principal pivots of order one. The best known variant of this method is due to Y. Bard [1] and G. Zoutendijk [22].

Let us assume that the matrix M is a P-matrix. Then all of its principal submatrices, in particular all of its 1 × 1 principal submatrices, are nonsingular.

Due to the P-matrix property and its invariance under principal pivoting, the steps of the algorithm are executable; however, there is no guarantee that the method in this form is finite. The finiteness of a variant of this simplest principal pivoting algorithm was proved by K.G. Murty [15]. Murty's least-index refinement is as follows:

• Let an ordering of the pairs of complementary variables be fixed.
• In Step 1, choose the least-indexed infeasible basic variable, i.e. let k = min{i : v'_i < 0, i = 1,...,n}.

Finally, note that a finite variant of the Zoutendijk–Bard principal pivot rule can also be developed by using an appropriate lexicographic pivot selection rule (cf. also Lexicographic pivoting rules).


Principal Pivoting Methods for Linear Complementarity Problems, Table 1

     |      | u_γ    | u_γ̄
v_γ  | q_γ  | M_γγ   | M_γγ̄
v_γ̄  | q_γ̄  | M_γ̄γ   | M_γ̄γ̄


Principal Pivoting Methods for Linear Complementarity Problems, Table 2

     |       | v_γ             | u_γ̄
u_γ  | q'_γ  | M_γγ⁻¹          | −M_γγ⁻¹ M_γγ̄
v_γ̄  | q'_γ̄  | M_γ̄γ M_γγ⁻¹     | M_γ̄γ̄ − M_γ̄γ M_γγ⁻¹ M_γγ̄

0 | (Initialization)
    Let the unit matrix, the coefficient matrix of the v variables, be the initial complementary basis.
1 | (Leaving variable selection)
    IF no infeasible variable exists,
    THEN stop; the LCP is solved.
    Choose an infeasible basic variable, say v'_k.
2 | (Principal pivot)
    Do a simple principal pivot on m'_kk ≠ 0. The chosen infeasible variable leaves the basis while its complementary pair enters.
    Go to Step 1.

Zoutendijk–Bard principal pivot rule
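The Zoutendijk–Bard rule with Murty's least-index refinement can be sketched in a few lines of Python. The sketch assumes M is a P-matrix, so every diagonal pivot m'_kk is nonzero. It stores the tableau (q', M') of the current principal pivot transform and records, for each position, whether u_i or v_i is basic. Exact fractions avoid rounding in the sign tests. The helper names are illustrative.

```python
from fractions import Fraction


def simple_principal_pivot(T, qq, k):
    """Principal pivot transform on gamma = {k}, in place (T[k][k] must be nonzero)."""
    n = len(qq)
    a, qk = T[k][k], qq[k]
    row, col = T[k][:], [T[i][k] for i in range(n)]
    for i in range(n):
        if i == k:
            continue
        for j in range(n):
            if j != k:
                T[i][j] -= col[i] * row[j] / a
        T[i][k] = col[i] / a
        qq[i] -= col[i] * qk / a
    T[k] = [-x / a for x in row]
    T[k][k] = 1 / a
    qq[k] = -qk / a


def murty_least_index(M, q, max_pivots=100_000):
    """Zoutendijk-Bard simple principal pivoting with Murty's least-index rule.

    Assumes M is a P-matrix. Returns (u, v, number_of_pivots).
    """
    n = len(q)
    T = [[Fraction(x) for x in r] for r in M]
    qq = [Fraction(x) for x in q]
    u_basic = [False] * n          # position i currently holds u_i (True) or v_i (False)
    pivots = 0
    while True:
        k = next((i for i in range(n) if qq[i] < 0), None)   # least-index infeasible
        if k is None:
            break                   # q' >= 0: the complementary basis is feasible
        if T[k][k] == 0:
            raise ValueError("zero diagonal pivot: M is not a P-matrix")
        simple_principal_pivot(T, qq, k)
        u_basic[k] = not u_basic[k]
        pivots += 1
        if pivots > max_pivots:
            raise RuntimeError("pivot limit reached")
    u = [qq[i] if u_basic[i] else Fraction(0) for i in range(n)]
    v = [Fraction(0) if u_basic[i] else qq[i] for i in range(n)]
    return u, v, pivots
```

On Murty's example from the Complexity section, this sketch performs 3 pivots for n = 2 and 7 pivots for n = 3.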


General Principal Pivoting Methods 


As mentioned earlier, general principal pivoting methods use not only simple principal pivots of order one but also larger pivot blocks. Frequently, these blocks are not determined at once; instead, the principal blocks are built up step by step.

Three variants of more general principal pivoting methods will be described. The (complementary) basis tableaux will be considered in the form given in Table 1 and Table 2.


Criss-Cross Methods 


The first one is the least-index criss-cross method [3,9,10,12], which uses first- and second-order principal pivots. During the execution of the algorithm all the basic solutions are complementary.

0  | (Initialization)
     Let the unit matrix, the coefficient matrix of the v variables, be the initial complementary basis.
     Let an ordering of the pairs of complementary variables be fixed.
1  | (Leaving variable selection)
     IF no infeasible variable exists,
     THEN stop; the LCP is solved.
     Choose the least-indexed infeasible basic variable, say v'_k, where k = min{i : v'_i < 0, i = 1,...,n}.
2  | (Entering variable selection)
     IF the pivot (v'_k, u'_k) is possible, i.e. m'_kk ≠ 0,
     THEN go to Step 3a.
     IF m'_kk = 0,
     THEN look for positive coordinates in row k of M'.
     IF all coordinates in row k are nonpositive,
     THEN stop; the LCP is infeasible.
     Choose the least-indexed positive coordinate in row k, say with index r: r = min{i : m'_ki > 0, i = 1,...,n}.
     Go to Step 3b.
3a | (Diagonal pivot)
     Do a simple principal pivot on m'_kk. The chosen infeasible variable v'_k leaves the basis while its complementary pair u'_k enters.
     Go to Step 1.
3b | (Exchange pivot)
     Do a 2 × 2 principal pivot on the principal submatrix
       ( m'_kk  m'_kr )
       ( m'_rk  m'_rr ).
     The variables v'_k and v'_r leave the basis while their complementary pairs u'_k and u'_r enter.
     Go to Step 1.

Least-index criss-cross pivot rule for sufficient LCPs

Let us assume that the matrix M is a sufficient matrix. As mentioned above, the class of sufficient matrices is closed under principal pivot transforms.

Because M is sufficient and the class of sufficient matrices is closed under principal pivoting, the steps of the algorithm are executable. It is also known that this least-index criss-cross method solves sufficient LCPs in a finite number of steps [3,9,10,11,12].

Observe that if the matrix M is a P-matrix, then only simple principal pivots are executed, because every principal pivot transform of M is again a P-matrix and therefore has positive diagonal entries. Thus, the least-index criss-cross rule reduces to the simple principal pivoting algorithm with Murty's least-index refinement.
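A compact Python sketch of the least-index criss-cross rule follows. It assumes M is sufficient, so that a zero diagonal entry m'_kk together with m'_kr > 0 yields a nonsingular 2 × 2 block. The closed-form 2 × 2 inverse and the helper names are choices made for this illustration. The function returns None when Step 2 detects infeasibility.

```python
from fractions import Fraction


def block_inverse(T, g):
    """Closed-form inverse of the 1x1 or 2x2 principal block of T on the index list g."""
    if len(g) == 1:
        return [[1 / T[g[0]][g[0]]]]
    i, j = g
    a, b, c, d = T[i][i], T[i][j], T[j][i], T[j][j]
    det = a * d - b * c   # for sufficient M with a = 0 and b > 0 we have c < 0, so det > 0
    return [[d / det, -b / det], [-c / det, a / det]]


def principal_pivot(T, qq, g):
    """Replace (T, qq) = (M', q') in place by the principal pivot transform on g."""
    n, h = len(qq), len(g)
    Ai = block_inverse(T, g)
    rest = [i for i in range(n) if i not in g]
    new = [[Fraction(0)] * n for _ in range(n)]
    newq = [Fraction(0)] * n
    for a, i in enumerate(g):                       # rows that become u_g
        newq[i] = -sum(Ai[a][c] * qq[g[c]] for c in range(h))
        for j in range(n):
            new[i][j] = (Ai[a][g.index(j)] if j in g
                         else -sum(Ai[a][c] * T[g[c]][j] for c in range(h)))
    for i in rest:                                  # rows that stay v_i
        CAi = [sum(T[i][g[c]] * Ai[c][a] for c in range(h)) for a in range(h)]
        newq[i] = qq[i] - sum(CAi[c] * qq[g[c]] for c in range(h))
        for j in range(n):
            new[i][j] = (CAi[g.index(j)] if j in g
                         else T[i][j] - sum(CAi[c] * T[g[c]][j] for c in range(h)))
    T[:], qq[:] = new, newq


def criss_cross(M, q, max_pivots=100_000):
    """Least-index criss-cross method for LCPs with a sufficient matrix M.

    Returns (u, v), or None if Step 2 proves the LCP infeasible.
    """
    n = len(q)
    T = [[Fraction(x) for x in r] for r in M]
    qq = [Fraction(x) for x in q]
    u_basic = [False] * n
    for _ in range(max_pivots):
        k = next((i for i in range(n) if qq[i] < 0), None)    # Step 1: least index
        if k is None:
            u = [qq[i] if u_basic[i] else Fraction(0) for i in range(n)]
            v = [Fraction(0) if u_basic[i] else qq[i] for i in range(n)]
            return u, v
        if T[k][k] != 0:
            g = [k]                                            # Step 3a: diagonal pivot
        else:
            r = next((j for j in range(n) if T[k][j] > 0), None)
            if r is None:
                return None                                    # row k nonpositive: infeasible
            g = [k, r]                                         # Step 3b: exchange pivot
        principal_pivot(T, qq, g)
        for i in g:
            u_basic[i] = not u_basic[i]
    raise RuntimeError("pivot limit reached")
```

The first test instance below is the skew-symmetric LCP of the linear program min{x : x ≥ 1, x ≥ 0}. Its zero diagonal forces an exchange pivot.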


Lemke’s Algorithm 


The second algorithm is the complementary pivoting algorithm of C.E. Lemke [13,14]. This algorithm can be interpreted in different ways; here only the simplest version is presented.

If q ≥ 0, then the vector u = 0, v = q solves the LCP. If q ≱ 0, then let d = (−1,...,−1)^T ∈ R^n and consider the augmented LCP:

$$-Mu + v + d\lambda = q, \qquad u, v, \lambda \ge 0, \qquad u_i v_i = 0, \quad i = 1,\dots,n.$$

The solution u = 0, v = q − dλ, where λ = −min{q_i : 1 ≤ i ≤ n}, has the following properties:
• −Mu + v + dλ = q holds;
• all coordinates are nonnegative;
• it is complementary, i.e. u_i v_i = 0 for all i;
• there is an index k for which both u_k and v_k are nonbasic, and thus zero; further, λ = −q_k.

Those solutions of the augmented LCP which satisfy these properties will be referred to as almost complementary solutions. Lemke's complementary pivot algorithm traverses a sequence of almost complementary solutions. At the last step the variable λ reaches the value zero, and in this way a complementary solution of the original LCP is obtained.

Lemke's algorithm builds up a big principal pivot block via a series of pivots which produce almost complementary basic solutions. It does not make explicit use of any special property of the coefficient matrix; however, the conclusion at Step 2 that the LCP is infeasible might not always be valid. For large classes of LCPs, e.g. if M is a copositive matrix, the infeasibility conclusion is true. For detailed discussions see e.g. [7].

0  | (Initialization)
     Form the augmented LCP as discussed above.
     The initial almost complementary feasible basis is obtained from the unit matrix by letting λ enter the basis in place of v_k, where q_k = min{q_i : 1 ≤ i ≤ n}.
     (The current basic variables, and the variable that has just left the basis, will again be denoted by v'_i, while their complementary nonbasic pairs are denoted by u'_i.)
1  | (Entering variable selection)
     The entering variable is u'_k, the complementary pair of the variable that has just left the basis.
2  | (Leaving variable selection)
     IF all coordinates in the column of u'_k are nonnegative,
     THEN stop; the LCP is infeasible.
     Make a primal simplex ratio test in the column of u'_k.
     IF the pivot (λ, u'_k) comes out of the ratio test,
     THEN go to Step 3a.
     IF for some r the pivot (v'_r, u'_k) comes out of the ratio test,
     THEN go to Step 3b.
3a | (Solution)
     Do a pivot on (λ, u'_k), stop. The resulting feasible basis is complementary; the LCP is solved.
3b | (Pivot)
     Do a pivot on (v'_r, u'_k) and let k := r.
     Go to Step 1.

Lemke's algorithm

The algorithm in this form terminates in a finite number of steps only for instances in which all almost complementary bases of the augmented LCP are nondegenerate. Degeneracy resolution is possible on the basis of lexicographic pivot selection [7,18] as well as by
using least index resolution [3]. A particularly interest- 
ing result is the lexicographic Lemke rule of M.J. Todd 
[18]. To date (2000), this is the only simplex method for 
oriented matroid programming [2] and, it is a particu- 
larly interesting simplex algorithm for linear optimiza- 
tion [17]. 
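Lemke's method can be sketched on the full tableau [I | −M | −e | q] of the augmented system v = q + Mu + eλ, i.e. with d = (−1,...,−1)^T as above. In this tableau form a positive column entry blocks the entering variable. This matches the nonnegativity test of Step 2 in the representation v' = q' + M'u', up to sign. Ties in the ratio test are broken by preferring λ and then the lowest row index. That choice is an assumption of this sketch and does not implement the lexicographic or least-index degeneracy resolution cited above, so the sketch is intended for nondegenerate instances. A return value of None corresponds to the infeasibility exit of Step 2.

```python
from fractions import Fraction


def lemke(M, q, max_iter=10_000):
    """Lemke's complementary pivoting on v = q + M u + e*lam (d = -e).

    Returns (u, v), or None when the method ends on a secondary ray.
    Ratio-test ties prefer lam, then the lowest row (nondegeneracy assumed).
    """
    n = len(q)
    if min(q) >= 0:
        return [Fraction(0)] * n, [Fraction(x) for x in q]
    LAM, RHS = 2 * n, 2 * n + 1        # columns: v_0..v_{n-1}, u_0..u_{n-1}, lam, rhs
    T = []
    for i in range(n):
        row = [Fraction(0)] * (2 * n + 2)
        row[i] = Fraction(1)
        for j in range(n):
            row[n + j] = Fraction(-M[i][j])
        row[LAM], row[RHS] = Fraction(-1), Fraction(q[i])
        T.append(row)
    basis = list(range(n))             # basis[i] = column of the basic variable in row i

    def pivot(r, c):
        piv = T[r][c]
        T[r] = [x / piv for x in T[r]]
        for i in range(n):
            if i != r and T[i][c] != 0:
                f = T[i][c]
                T[i] = [a - f * b for a, b in zip(T[i], T[r])]
        basis[r] = c

    r = min(range(n), key=lambda i: q[i])      # most negative q_k: v_k leaves, lam enters
    leaving = basis[r]
    pivot(r, LAM)
    for _ in range(max_iter):
        entering = leaving + n if leaving < n else leaving - n   # complement of leaving var
        rows = [i for i in range(n) if T[i][entering] > 0]
        if not rows:
            return None                        # secondary ray: nothing blocks the increase
        ratio = min(T[i][RHS] / T[i][entering] for i in rows)
        ties = [i for i in rows if T[i][RHS] / T[i][entering] == ratio]
        r = next((i for i in ties if basis[i] == LAM), ties[0])
        leaving = basis[r]
        pivot(r, entering)
        if leaving == LAM:                     # lam = 0: complementary solution found
            values = [Fraction(0)] * (2 * n)
            for i, c in enumerate(basis):
                values[c] = T[i][RHS]
            return values[n:], values[:n]
    raise RuntimeError("iteration limit reached")
```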


Symmetric PPM 


Finally, a general form of the symmetric principal pivoting method is presented. In the major cycles the PPM drives an infeasible variable, called the distinguished variable, to zero. In this process an artificial lower bound λ < 0 is used. The nonbasic variables u'_i are either zero or equal to λ < 0. Consequently, the 'complementary' basic solution v' = q' + M'u' will in fact not be complementary. Let M be a row sufficient matrix.

0  | (Initialization)
     Let the unit matrix, the coefficient matrix of the v variables, be the initial complementary basis; thus the initial solution is u = 0, v = q.
     Let λ < min{q_i : 1 ≤ i ≤ n}.
1  | (The distinguished variable)
     IF no infeasible variable exists,
     THEN stop; the LCP is solved.
     Choose an infeasible variable as the distinguished variable, with index k, say. We have two cases:
     1i  | The distinguished variable is u'_k = λ < 0.
     1ii | The distinguished variable is v'_k < 0.
     In both cases we use u'_k as the driving variable.
2  | (Blocking variable selection)
     Increase the value of u'_k.
     Let θ be the largest possible value of u'_k satisfying the following two conditions:
     2a | IF v'_k is the distinguished variable, THEN v'_k stays nonpositive;
     2b | v'_i ≥ λ holds for all basic variables v'_i.
     IF θ = ∞, THEN stop; the LCP is infeasible.
     IF θ = 0, THEN no pivot is needed; let u'_k := 0, let u'_i remain unchanged for all i ≠ k, and let v' := q' + M'u'. Go to Step 1.
     ELSE, let r be the index of the variable which blocks the increase of u'_k.
3a | (Diagonal pivot if m'_rr > 0)
     IF m'_rr > 0, THEN do a simple principal pivot on m'_rr and make v'_r nonbasic at its blocking lower bound value.
     IF r = k, THEN go to Step 1.
     IF r ≠ k, THEN go to Step 2.
3b | (Exchange pivot if m'_rr = 0)
     IF m'_rr = 0, THEN do a 2 × 2 principal pivot on the principal submatrix
       ( m'_kk  m'_kr )
       ( m'_rk  m'_rr ).
     The variables v'_k and v'_r leave the basis while their complementary pairs u'_k and u'_r enter.
     Let k := r and go to Step 2.

Symmetric principal pivot algorithm

The symmetric principal pivoting algorithm is finite for nondegenerate problems; in general, it can be made finite by using least-index resolution [3] or lexicographic pivot selection [7].

Several variants of the principal pivoting method (symmetric, parametric, asymmetric, etc.) were developed in the last decades. Good surveys of this extensive theory can be found in [7,16].


Complexity 


Finally, some notes are due on worst-case behavior. As mentioned, and as proved in the cited literature, the presented algorithms are finite when they are furnished with either least-index resolution or lexicographic selection rules. However, in the worst case they may require exponentially many pivots to solve the LCP. The best known exponential example is due to Murty. The data are $q_i = -1$ for all i, and

$$m_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 2 & \text{if } i < j,\\ 0 & \text{if } i > j. \end{cases}$$

Clearly the matrix M is an upper triangular P-matrix. Further, $u^T = (0,\dots,0,1)$ and $v^T = (1,\dots,1,0)$ is the unique solution of this LCP. It can be shown that, e.g., Murty's simple principal pivoting method and the least-index criss-cross method need $2^n - 1$ steps to solve this LCP. On the other hand, although it has not been proved in general, the results of [8] indicate that, analogously to simplex methods, principal pivoting methods need a polynomial number of iterations on average.
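The stated properties of Murty's example are easy to check for small n. The Python sketch below uses illustrative helper names. It builds M and verifies the P-matrix property by enumerating all principal minors with exact rational determinants, which is exponential in n and therefore only suitable for small n. Finally, it confirms that u = (0,...,0,1) yields v = q + Mu = (1,...,1,0).

```python
from fractions import Fraction
from itertools import combinations


def murty_matrix(n):
    """Murty's example: m_ii = 1, m_ij = 2 for i < j, m_ij = 0 for i > j."""
    return [[1 if i == j else (2 if i < j else 0) for j in range(n)] for i in range(n)]


def det(A):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in A]
    n, result = len(A), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result            # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return result


def is_p_matrix(M):
    """All principal minors positive (brute force over all index subsets)."""
    n = len(M)
    return all(det([[M[i][j] for j in idx] for i in idx]) > 0
               for size in range(1, n + 1) for idx in combinations(range(n), size))


M = murty_matrix(4)
u = [0, 0, 0, 1]
v = [-1 + sum(M[i][j] * u[j] for j in range(4)) for i in range(4)]
```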


See also 


> Convex-Simplex Algorithm 

> Criss-Cross Pivoting Rules 

> Equivalence Between Nonlinear Complementarity 
Problem and Fixed Point Problem 

> Generalized Nonlinear Complementarity Problem 

> Integer Linear Complementary Problem 

> LCP: Pardalos—Rosen Mixed Integer Formulation 

> Least-index Anticycling Rules 


> Lemke Method 

> Lexicographic Pivoting Rules 

> Linear Complementarity Problem 

> Linear Programming 

> Order Complementarity 

> Parametric Linear Programming: Cost Simplex 
Algorithm 

> Pivoting Algorithms for Linear Programming 
Solving linear programming problems (cf. also Linear programming) is of enormous relevance in real-world applications, which involve large amounts of data and many unknown variables. Hence, the computational efficiency of solution methods is a crucial criterion for their applicability.

Today, there is a competition between the simplex method (invented around 1947 by G.B. Dantzig) and interior point methods (starting with Karmarkar's algorithm in 1984; cf. also Sequential quadratic programming: Interior point methods for distributed optimal control problems).

This article concentrates on simplex methods and 
on an investigation of their arithmetical effort, mea- 
sured in terms of the average number of pivot steps. 

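Throughout the paper we discuss the following type of linear programming problems:

$$
\begin{aligned}
\text{maximize}\quad & v^T x\\
\text{subject to}\quad & a_1^T x \le b^1, \dots, a_m^T x \le b^m,
\end{aligned}
\tag{1}
$$

where $v, a_1, \dots, a_m \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and $m \ge n$.

For abbreviation we use

$$A := \begin{pmatrix} a_1^T \\ \vdots \\ a_m^T \end{pmatrix} \in \mathbb{R}^{m \times n} \quad\text{and}\quad b = \begin{pmatrix} b^1 \\ \vdots \\ b^m \end{pmatrix}.$$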


The matrix A collects the m gradient vectors of the restrictions as row vectors, and the vector b gives the m capacities. $X := \{x \in \mathbb{R}^n : Ax \le b\}$ is the feasible region, or feasible polyhedron, of problem (1), which can also be written in the form

$$\text{maximize } v^T x \quad \text{subject to } Ax \le b. \tag{2}$$


Other types of programs, such as

$$\text{maximize } v^T x \quad \text{subject to } Ax \le b,\ x \ge 0, \tag{3}$$

$$\text{maximize } v^T x \quad \text{subject to } Ax = b,\ x \ge 0, \tag{4}$$

and hybrids or variations of such forms, can easily be translated into (1). For form (1), however, our discussion of the influence of distributions, dimensions and variants can be carried out much better in geometric, verbal terms. All the stated results hold, after adaptation, for the other forms, too.
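Translating forms (3) and (4) into form (1) only appends rows to A and b. The following Python sketch (function names are illustrative) shows the two reductions. Form (4) becomes a pair of opposite inequalities per equation, so the resulting instance does not satisfy a general-position assumption such as the nondegeneracy assumption used later.

```python
def form3_to_form1(A, b):
    """max v'x s.t. Ax <= b, x >= 0  ->  max v'x s.t. A1 x <= b1 (append -x_j <= 0)."""
    n = len(A[0])
    nonneg = [[-1 if j == k else 0 for j in range(n)] for k in range(n)]
    return [list(r) for r in A] + nonneg, list(b) + [0] * n


def form4_to_form1(A, b):
    """max v'x s.t. Ax = b, x >= 0  ->  Ax <= b, -Ax <= -b, -x <= 0."""
    n = len(A[0])
    nonneg = [[-1 if j == k else 0 for j in range(n)] for k in range(n)]
    A1 = [list(r) for r in A] + [[-a for a in r] for r in A] + nonneg
    b1 = list(b) + [-bi for bi in b] + [0] * n
    return A1, b1
```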

Probabilistic Analysis of Simplex Algorithms

If X has vertices and if there are optimal solutions to (1), then there is a vertex in the optimal set. In each vertex of X at least n restrictions of (1) are active, or tight, and in each edge of X at least n − 1 restrictions are active. Every nonoptimal vertex is incident to an edge improving the objective. If an optimal vertex exists, every iterative construction of a connected path over such improving edges leads to the optimal vertex after a finite number of steps. These facts are exploited in the design of the simplex method, which works in two phases:


I  | Find a vertex x_0 ∈ X;
     IF there is no vertex,
     THEN STOP.
II | Construct a sequence of vertices x_0,...,x_s ∈ X such that for i = 0,...,s − 1 the vertices x_i and x_{i+1} are adjacent and v^T x_i < v^T x_{i+1}.
     IF x_s is optimal OR
     IF at x_s the nonexistence of an optimal solution becomes obvious,
     THEN STOP at x_s.


Phase I works in a similar manner to Phase II. Since Phase II admits a better geometric explanation and is simpler to analyze, we concentrate (for the beginning) on Phase II.

Note that our definition of Phase II still leaves freedom in how we determine the successor vertex (if more than one is possible). A rule for that decision fixes a 'variant' of the simplex algorithm. The complexity of Phase II (the so-called 'simplex algorithm') is mainly determined by the number s. The effort to perform a single pivot step is less difficult to analyze: under all reasonable variants it costs at most O(mn) arithmetic operations for updating an (m × n)-tableau.

In this article we are interested in the average-case behavior of the random number s when our problems (1) follow a given distribution. Since nobody knows the 'real-world distribution', we have to introduce and use a self-made stochastic model of the appearance of specific instances of (1).

Based on that model, we will evaluate the stochastic behavior of s. Clearly, this will depend massively on
• the variant in use;
• the stochastic model, or distribution, chosen.

A probabilistic analysis of the behavior of an algorithm consists of three essential steps:
• a study of the way the algorithm works on given, deterministic problem instances, including a characterization of the desired figures (e.g. s) for that instance;
• a consensus about an underlying stochastic model of the distribution of occurring problem instances;
• an accumulation over all possible instances, weighted with their probability of occurrence, leading to stochastic information on the random behavior.

So, we study the procedure of a deterministic algorithm that is employed to solve random problem instances.

This stands in contrast to the situation with randomized algorithms, where random parameters decide how the algorithm proceeds in solving a given, deterministic problem.

Throughout the paper we shall rely on a nondegeneracy assumption: all submatrices of $(A, b)$ and of $(A^T, v)$ are of full (i.e., maximal) rank.

This is compatible with our models, either by direct conditioning or by the fact that in such a probabilistic model the set of degenerate problems is a null set.

In this paper, we first report briefly on experiments and their (limited) information value. After that we turn to two different stochastic models which have admitted a successful probabilistic analysis. The first is the sign-invariance model, whose analysis reached its summit in the middle of the 1980s. The second is the rotation-symmetry model, whose evaluation had started even earlier; the refinement of that approach is still going on.


Numerical Experiments 
and Comparison of Variants 


The first idea to learn more about the average case be- 
havior of s is to carry out controlled numerical (Monte- 
Carlo) experiments. For that purpose, one has to fix 
several dimension-pairs (m, n), to use a stochastic 
model for generating the data, and to solve the created 
problem-instances by application of a given variant. 

These experiments can be employed for a variety of 
purposes, as for forecasting the number of pivot steps s 
(on the basis of (m, n) for a fixed variant and model) or 
for comparing different variants or for recognizing the 
different influences of stochastic models. 

All of this has been done and tried in the past. Studying the huge number of reports on such experiments leaves a very confusing and frustrating impression. Since stochastic models, employed variants, dimensions and problem types vary excessively, the results and methods can hardly be compared. In particular, it is not possible to summarize the outcome briefly, since all the test parameters would have to be mentioned exactly. So we refer to the very informative survey by R. Shamir in [19], who comes to the following overall conclusion:


The smaller dimension n (respectively, the di- 
mension of the polyhedra) enters the mean value 
function of s in a slightly superlinear way and 
the larger dimension m (respectively, the num- 
ber of inequality restrictions, including sign- 
constraints) has only a significantly sublinear in- 
fluence. 


Experiments are easier to understand and to interpret when they are done in parallel with a theoretical study, because then the empirical and the theoretical results can be checked against each other to see whether they justify and confirm each other. This has been achieved in experiments for the so-called rotation-symmetry model (RSM):

• Let $b = \mathbf{1}$ and let $a_1, \dots, a_m, v$ (and an auxiliary vector $u$) be distributed independently, identically and symmetrically under rotations on $\mathbb{R}^n \setminus \{0\}$.

Note that only $b > 0$ is essential. Choosing $b = \mathbf{1}$ (i.e. the vector of ones) is merely a simplifying standardization.

The experiments could more or less confirm the theoretical results on $E_{m,n}(s)$ (the expected number of pivot steps required for $(m, n)$ problems). These theoretical results will be presented later. Particularly informative was the comparison of the behavior of different variants and of the influence of different stochastic distributions. In [11] we tested seven variants belonging to three categories (A, B, C), whose geometric description can be given as follows.

Note that at each vertex a decision has to be made as to which one of the (exactly) $n$ tight restrictions should be de-activated. This means that a choice is made among the subset of the improving edges originating from that vertex. The current basis is the set of the $n$ gradients $a_i$ corresponding to the active restrictions at the current vertex.

• Category A: Variants exploiting information on the shape of $X$ and on the objective $v^T x$.

- (rule of steepest ascent) Choose the incident improving edge with the smallest angle to the gradient of the objective function.

- (rule of greatest improvement) Take the edge which leads to the maximal improvement of $v^T x$ in the next step.

• Category B: Variants exploiting information on the objective only.

- (Dantzig's rule) Since a vertex $x$ is optimal if and only if $v$ is in the cone of the gradients $a_i$ of the current basis, we can calculate in each step the representation of $v$ in that basis of $\mathbb{R}^n$. So every basis gradient is associated with its coordinate in the $v$-representation. Since optimality requires a completely nonnegative representation, Dantzig's rule suggests taking the edge that de-activates the restriction whose gradient has the most negative coordinate.

- (shadow-vertex algorithm or parametric rule) This is the variant for which theoretical studies worked very well. The results will be presented in the following sections; therefore we explain it in detail. This variant leads from a vertex optimizing an alternative objective $u^T x$ to the optimal solution for $v^T x$ (or an unbounded edge), by providing the optimal vertices for the family of objectives $(\lambda v + u)^T x$ with $\lambda \in [0, \infty)$.

For $\lambda$ starting at 0 and increasing, the sequence of optimal vertices gives just the parametric simplex path, which had also been constructed in the early parametric variant of S.I. Gass and T.L. Saaty. They had introduced the parametric concept for another type of problem and without the geometric shadow-vertex interpretation: if we project $X$ onto $\mathrm{Span}(u, v)$, then our variant constructs a path starting at the $u^T x$-optimum and visiting only shadow-vertices. These are vertices which keep their vertex property even after the projection. The projection image of the constructed path is again a path along the boundary of the two-dimensional image of $X$. The choice of the right edge is organized by calculating the basis representations of $v$ and $u$ (as in Dantzig's rule) and by minimizing the quotient of corresponding coordinates.

• Category C: Rules evaluating combinatorial principles only.

- (rule of random choice) Select one of the improving edges at random.

- (rule of justice) De-activate the tight restriction that has been active most often.

- (rule of Bland) De-activate the tight restriction that has the least original index.
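The edge choices of several of these rules can be compared on a single vertex. The following is a minimal sketch under stated assumptions, not the code used in [11]. The rows of `B` are the $n$ active gradients, $y$ solves $B^T y = v$ (the $v$-representation), and the edge obtained by de-activating restriction $j$ has direction $d_j$ with $B d_j = -e_j$. Then $v^T d_j = -y_j$, so edge $j$ is improving exactly when $y_j < 0$. The rule of greatest improvement is omitted because it also needs a ratio test against the nonactive restrictions. All function names are hypothetical.

```python
import math
import random


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [[float(x) for x in row] + [float(r)] for row, r in zip(M, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if abs(a[p][k]) < 1e-12:
            raise ValueError("singular basis matrix")
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (a[i][n] - s) / a[i][i]
    return z


def edge_data(B, v):
    """v-representation y (B^T y = v) and edge directions d_j (B d_j = -e_j)."""
    n = len(B)
    Bt = [[B[i][j] for i in range(n)] for j in range(n)]
    y = solve(Bt, v)
    dirs = [solve(B, [-1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return y, dirs


def improving(y):
    # v^T d_j = -y_j, hence edge j improves v^T x exactly when y_j < 0
    return [j for j, yj in enumerate(y) if yj < 0]


def dantzig(y):
    cand = improving(y)
    return min(cand, key=lambda j: y[j]) if cand else None


def steepest_ascent(y, dirs, v):
    cand = improving(y)

    def slope(j):  # v^T d_j / ||d_j||, proportional to cos(angle(v, d_j))
        return sum(vi * di for vi, di in zip(v, dirs[j])) / math.hypot(*dirs[j])

    return max(cand, key=slope) if cand else None


def bland(y, basis):
    cand = improving(y)
    return min(cand, key=lambda j: basis[j]) if cand else None


def justice(y, basis, times_active):
    cand = improving(y)
    return max(cand, key=lambda j: times_active[basis[j]]) if cand else None


def random_choice(y, rng):
    cand = improving(y)
    return rng.choice(cand) if cand else None


B = [[1.0, 0.0], [1.0, 1.0]]   # active gradients at the current vertex
v = [-2.2, -1.0]
y, dirs = edge_data(B, v)
```

In this small example, Dantzig's rule and steepest ascent pick different edges. Here $y \approx (-1.2, -1)$, and the shorter edge direction $d_1 = (0, -1)$ forms the smaller angle with $v$.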
These variants were compared under three different rotation-symmetric distributions for the vectors $a_i$:
• uniform distribution on $\omega_n$ (the unit sphere of $\mathbb{R}^n$);
• uniform distribution on $\Omega_n$ (the full unit ball of $\mathbb{R}^n$);
• Gaussian distribution on $\mathbb{R}^n$.
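All three distributions can be sampled with standard constructions. A normalized standard Gaussian vector is uniform on $\omega_n$. Scaling it by $U^{1/n}$ with $U$ uniform on $[0, 1)$ gives a uniform point in $\Omega_n$, since $P(\|x\| \le r) = r^n$ there. The helper `rsm_instance` is hypothetical. It produces one RSM data set with $b = \mathbf{1}$ and is meant only as a sketch of how Monte-Carlo instances could be generated.

```python
import math
import random


def uniform_sphere(n, rng):
    """Uniform point on the unit sphere omega_n: a normalized standard Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def uniform_ball(n, rng):
    """Uniform point in the unit ball Omega_n: P(||x|| <= r) = r^n gives radius U^(1/n)."""
    radius = rng.random() ** (1.0 / n)
    return [radius * x for x in uniform_sphere(n, rng)]


def gaussian(n, rng):
    """Standard Gaussian vector in R^n."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def rsm_instance(m, n, sampler, seed=0):
    """Hypothetical helper: one RSM data set with rows a_1..a_m, b = (1, ..., 1),
    objective v and auxiliary vector u, all drawn i.i.d. from `sampler`."""
    rng = random.Random(seed)
    A = [sampler(n, rng) for _ in range(m)]
    v, u = sampler(n, rng), sampler(n, rng)
    return A, [1.0] * m, v, u
```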
In general, it turned out that the results for the Gaussian distribution were better (smaller) than those for the unit ball, and the latter were better than the unit sphere results. This effect is simply caused by different 'redundancy rates'. A restriction is redundant if its existence or nonexistence has no impact on the shape of $X$. Here, the $i$th restriction is redundant if and only if $a_i$ belongs to $\mathrm{Conv}(0, a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_m)$. This will never happen when all points come from $\omega_n$, rather seldom when all points come from $\Omega_n$, but very often when the points are Gaussian distributed. And under normal circumstances a problem obviously becomes easier if more restrictions are redundant, i.e. if the redundancy rate is high.
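For $n = 2$ the redundancy criterion can be checked directly. Restriction $i$ is redundant exactly when $a_i$ is not a vertex of $\mathrm{Conv}(0, a_1, \dots, a_m)$, assuming general position. The sketch below computes that planar hull with Andrew's monotone chain, a standard convex-hull algorithm chosen here for brevity, and reports the redundant indices. Function names are illustrative. Feeding it sampled points from the three distributions above would give an empirical redundancy rate.

```python
def cross(o, p, q):
    """z-component of (p - o) x (q - o); positive for a left turn."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def hull_vertex_indices(points):
    """Indices of the vertices of Conv(points) in the plane (Andrew's monotone chain);
    points lying on an edge are not counted as vertices."""
    order = sorted(range(len(points)), key=lambda i: points[i])

    def chain(indices):
        out = []
        for i in indices:
            while len(out) >= 2 and cross(points[out[-2]], points[out[-1]], points[i]) <= 0:
                out.pop()
            out.append(i)
        return out

    lower = chain(order)
    upper = chain(reversed(order))
    return set(lower[:-1] + upper[:-1])


def redundant_restrictions(a):
    """n = 2, b = 1: restriction i is redundant iff a_i is not a vertex of
    Y = Conv(0, a_1, ..., a_m)."""
    pts = [tuple(p) for p in a] + [(0.0, 0.0)]   # the origin gets index len(a)
    vertices = hull_vertex_indices(pts)
    return [i for i in range(len(a)) if i not in vertices]
```

For example, with the four unit vectors $\pm e_1, \pm e_2$ and the extra point $(0.2, 0.1)$, only the last restriction is redundant.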

The quality of the different variants can be ordered consistently. The rule of steepest ascent shows the best performance. It is slightly better than the rule of greatest improvement. These two variants perform particularly well when the current vertex is still far away from the optimal one.

A bit worse are the variant of Dantzig and the shadow-vertex algorithm. The reason may be that they do not exploit information on the polyhedron itself (which may make the edge choice less effective, but saves computation time in the single pivot step).

Considerably worse is the performance of the com- 
binatorial variants. The best among these is random 
choice, followed by rule of justice, and finally comes 
Bland’s rule. 

The overall impression is that the differences be- 
tween Category A and Category B are not dramatic, but 
that the differences between Category B and Category 
C are striking. 

We have also tested the standard deviation of $s$ and the more meaningful quotient between standard deviation and mean value (for the number of pivot steps). This quotient was less than, but close to, 1 when $m$ was of the order of $n$. But the quotient became quite small for $m \gg n$. We understand this as a hint that in the RSM for $m \to \infty$ and fixed $n$ (the 'asymptotic case') the shape of $X$, the number of facets as well as their size, and the length of edges will stabilize more and more.
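The quotient of standard deviation and mean used above is easy to compute from recorded pivot counts. Here is a minimal sketch with Python's `statistics` module. Using the population standard deviation `pstdev` is a choice made for this example; the sample version `stdev` differs only slightly for large samples.

```python
import statistics


def mean_and_cv(pivot_counts):
    """Mean of the observed pivot counts and the quotient std. deviation / mean."""
    mu = statistics.mean(pivot_counts)
    sd = statistics.pstdev(pivot_counts)   # population standard deviation
    return mu, sd / mu
```

For the (made-up) counts `[2, 4, 4, 4, 5, 5, 7, 9]` it returns a mean of 5 and a quotient of 0.4.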

However, all these experiments and their outcome are not at all satisfactory for a final judgement. One reason is that the computation time for a sufficient number of repetitions of the experiments is prohibitively high. Hence we cannot advance to reasonably high dimensions. Also, questions of complexity theory cannot be settled by limited experiments. A third argument concerns potential regression analysis attempts based on the data of the results. It is almost impossible to model the qualitative structure of $E_{m,n}(s)$ as a function of $m$ and $n$ with parameters to be specified by the regression, as long as we do not understand (theoretically) the interaction between $m$, $n$ and the stochastic model. Many such attempts failed because the model structures fit only within a bounded range of $m$ and $n$.

Much more meaningful is the outcome of theoreti- 
cal (arbitrary dimension) considerations. In the follow- 
ing, we present two successful approaches. 


Results Under the Sign-Invariance Model 


The first investigation under this kind of model was done by S. Smale [20]. He analyzed problems of type

$$\text{maximize } v^T x \quad\text{subject to}\quad Ax \le b,\ x \ge 0 \tag{3}$$

and treated this problem as a special case of the linear complementarity problem (cf. Linear complementarity problem)

$$\text{Find } w, z \in \mathbb{R}^{m+n} \text{ such that, for given } q \in \mathbb{R}^{m+n},\ M \in \mathbb{R}^{(m+n)\times(m+n)}:\quad w - Mz = q,\quad w^T z = 0,\quad w \ge 0,\ z \ge 0. \tag{5}$$

When

$$M = \begin{pmatrix} 0 & -A \\ A^T & 0 \end{pmatrix}, \qquad q = \begin{pmatrix} b \\ -v \end{pmatrix},$$

then a solution of (5) yields a solution of (3) and its dual.

As solution procedure, Smale employs the Lemke algorithm. One starts with a solution for the right-hand side $\mathbf{1}$ (the vector of ones) in place of $q$, for which $z = 0$, $w = \mathbf{1}$ is trivially a solution. Then one moves forward on the segment $[\mathbf{1}, q]$ and performs pivot steps whenever one of the coordinates of the vector $w$ reaches the value 0, in order to keep all components nonnegative.
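The construction of (5) from (3) and the first stage of this parametric approach can be sketched as follows. This is a minimal illustration, not Smale's or Lemke's complete method. We assume the variable order $z = (y, x)$ (dual variables first), so that $w = q + Mz = (b - Ax,\ A^T y - v)$ collects the slacks. `first_breakpoint` only determines where the trivial solution $z = 0$, $w = q(t)$ of the right-hand side $q(t) = (1 - t)\mathbf{1} + t q$ stops being nonnegative. The complementary pivot that follows is not implemented. All names are hypothetical.

```python
def lcp_data(A, b, v):
    """M = [[0, -A], [A^T, 0]] and q = (b, -v) for max v^T x s.t. Ax <= b, x >= 0.

    Assumed variable order: z = (y, x) with dual y in R^m and primal x in R^n,
    so that w = q + M z = (b - A x, A^T y - v) collects the slacks.
    """
    m, n = len(A), len(A[0])
    M = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            M[i][m + j] = -A[i][j]   # upper-right block: -A
            M[m + j][i] = A[i][j]    # lower-left block: A^T
    q = list(b) + [-vj for vj in v]
    return M, q


def is_lcp_solution(M, q, z, tol=1e-9):
    """Check w - M z = q, w >= 0, z >= 0 and w^T z = 0 for w := q + M z."""
    w = [qi + sum(mij * zj for mij, zj in zip(row, z)) for qi, row in zip(q, M)]
    if any(x < -tol for x in w + list(z)):
        return False
    return abs(sum(wi * zi for wi, zi in zip(w, z))) <= tol


def first_breakpoint(q):
    """Follow q(t) = (1 - t) * 1 + t * q, t in [0, 1], with the trivial solution
    z = 0, w = q(t). Return (t, i) for the first t at which w_i reaches 0,
    or None if all q_i >= 0 (then z = 0 already solves the LCP for q)."""
    best = None
    for i, qi in enumerate(q):
        if qi < 0:
            t = 1.0 / (1.0 - qi)     # solves (1 - t) + t * q_i = 0
            if best is None or t < best[0]:
                best = (t, i)
    return best
```

For the one-dimensional LP "maximize $x$ subject to $x \le 1$, $x \ge 0$", the pair $y = 1$, $x = 1$ is complementary, and along $[\mathbf{1}, q]$ the second component of $w$ hits zero at $t = 1/2$.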

The analysis amounted to the question of how many cones of a special type will be intersected by a line segment. This is a typical question for a parametric algorithm. The expected number of pivot steps $E_{m,n}(s')$ was analyzed under the following stochastic model:

1) $(A, b, v)$ is distributed absolutely continuously.
2) The columns of $(A, b)$ and $v$ are independent.
3) The measure of $(A, b)$ is invariant under coordinate permutations in columns of $(A, b)$.

Smale proved for problems distributed under that model:

Theorem 1 ([20])

$$E_{m,n}(s') \le C(n)\,\bigl(1 + \ln(m + 1)\bigr)^{n(n+1)}.$$

This shows polynomiality in $m$ (for fixed $n$), but not in $n$. $C(n)$ is an (exponential) function of $n$.

Smale’s studies gave a motivation for the analysis 
of the so-called sign-invariance model (SIM). It is ex- 
tremely simple and only relies on a finite number of re- 
flections and symmetries. 

Let $A$, $b$ and $v$ define a nondegenerate data set for problem (3). Let the occurrence of

$$\begin{pmatrix} A & b \\ v^T & 0 \end{pmatrix} \quad\text{and of}\quad \begin{pmatrix} S_1 A S_2 & S_1 b \\ v^T S_2 & 0 \end{pmatrix}$$

be equiprobable for every pair of sign matrices $S_1 \in \mathbb{R}^{m\times m}$ and $S_2 \in \mathbb{R}^{n\times n}$. (A sign matrix is a diagonal matrix with $+1$ or $-1$ in the diagonal entries.) To explain the impact of that model, it suffices to consider a somewhat relaxed version of sign-invariance, the so-called flipping model, where we consider only the sign matrix $S_1$ and deal with problem instances of form (1):

$$\text{maximize } v^T x \quad\text{subject to}\quad a_1^T x \lessgtr b^1, \ \dots,\ a_m^T x \lessgtr b^m. \tag{6}$$

Here, $\lessgtr$ indicates that one of the relations $\le$ or $\ge$ shall be valid in the formulation of the instance. We get $\le$ if $s_{ii} = 1$ and $\ge$ if $s_{ii} = -1$ in $S_1$. Since all sign matrices $S_1$ shall be equiprobable, this means that we independently determine the $m$ directions of the relations, each one with probability $\frac{1}{2}$ for $\le$ and with probability $\frac{1}{2}$ for $\ge$.

In this way we generate exactly $2^m$ problem instances out of one data set. The idea of averaging is to solve all $2^m$ instances, to sum up the required pivot steps and to divide by the number of instances.
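For $n = 1$, this averaging over all $2^m$ sign patterns can be carried out by brute force. The sketch below enumerates the patterns with `itertools.product`, turns each instance into an interval of feasible $x$, and averages a user-supplied statistic. With the statistic `feasible` it returns the share of feasible instances. For distinct thresholds $b^i/a_i$ this share equals $(m + 1)/2^m$, because $m$ points cut the line into $m + 1$ cells. Helper names and data are illustrative assumptions.

```python
from itertools import product


def interval(a, b, signs):
    """Feasible set of the instance s_i * a_i * x <= s_i * b_i (n = 1) as (lo, hi)."""
    lo, hi = float("-inf"), float("inf")
    for ai, bi, s in zip(a, b, signs):
        c, r = s * ai, s * bi            # restriction c * x <= r, with c != 0
        if c > 0:
            hi = min(hi, r / c)
        else:
            lo = max(lo, r / c)          # dividing by c < 0 flips the relation
    return lo, hi


def flipping_average(a, b, statistic):
    """Average a statistic of the feasible interval over all 2^m sign patterns."""
    m = len(a)
    total = sum(statistic(interval(a, b, signs))
                for signs in product((1, -1), repeat=m))
    return total / 2 ** m


def feasible(iv):
    return 1 if iv[0] <= iv[1] else 0


a = [1.0, -2.0, 3.0, 0.5]
b = [0.3, 1.1, -2.0, 4.1]
share = flipping_average(a, b, feasible)   # (m + 1) / 2^m = 5/16 here
```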

This set of problem instances can be solved (as far as Phase II is concerned) by application of the shadow-vertex algorithm explained in the section above. There we realize a simplex path over all (temporarily) optimal vertices when we traverse the set of objectives $(u + \lambda v)^T x$ for $\lambda \ge 0$.
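The edge choice behind this traversal can be written down in a few lines. The sketch assumes that `alpha` and `beta` are the coordinates of $v$ and $u$ in the current basis. The current vertex is optimal for $(u + \lambda v)^T x$ exactly as long as $\beta_j + \lambda \alpha_j \ge 0$ for all $j$. The next breakpoint is therefore the minimal quotient $-\beta_j / \alpha_j$ over $\alpha_j < 0$, and the corresponding restriction is de-activated. Detecting an unbounded edge, which needs the nonbasic restrictions, is not shown. The function name is hypothetical.

```python
def shadow_vertex_step(alpha, beta):
    """One edge choice of the shadow-vertex (parametric) rule.

    alpha, beta: coordinates of v and u in the current basis, i.e.
    v = sum_j alpha_j a_j and u = sum_j beta_j a_j, with beta_j + lam * alpha_j >= 0
    for the current lam. The vertex stays optimal for (u + lam v)^T x as long as all
    these sums are nonnegative, so the next breakpoint is the minimal quotient
    -beta_j / alpha_j over alpha_j < 0. Returns (lam_next, j), where j is the basis
    position to de-activate, or None if all alpha_j >= 0 (vertex optimal for v^T x).
    """
    best = None
    for j, (aj, bj) in enumerate(zip(alpha, beta)):
        if aj < 0:
            lam = -bj / aj
            if best is None or lam < best[0]:
                best = (lam, j)
    return best
```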

If we add the corresponding set of (optimal) vertices for negative values of $\lambda$, then the total set will be called the set of co-optimal vertices.

With $s$ for the number of pivot steps, $S_{\mathrm{coop}}$ for the number of co-optimal vertices and $S$ for the number of shadow-vertices, the following relation is obvious: $s \le S_{\mathrm{coop}} \le S$.

Using simple combinatorial enumeration arguments, I. Adler and M. Haimovich [13] showed

Theorem 2 ([13]) For type (1) under SIM:

$$E_{m,n}\bigl(S_{\mathrm{coop}} \mid \text{a co-optimal path exists}\bigr) \le n \cdot \frac{m - n + 2}{m + 1}.$$
So far, the analysis considers only the procedure of moving from a $u^T x$-optimum to a $v^T x$-optimum. But this does not fit exactly into a probabilistic analysis of a complete solution method (such as Smale's method), because the $u^T x$-optimum is not given beforehand, and calculating it would be as troublesome as calculating the $v^T x$-optimum.

In 1983-1984, the combination of this result with the design of a complete algorithm was achieved in three papers: by M.J. Todd [21], by Adler and N. Megiddo [3], and by Adler, R.M. Karp and Shamir [2]. They all came to the same result for $E_{m,n}(s')$, the expected number of pivot steps required to solve the LP completely (including Phases I and II).

Theorem 3 ([2,3,21]) For problems of type (1), respectively of type (3), distributed under SIM, the expected number of pivot steps for the complete solution by a lexicographic version of Lemke's algorithm ($s'$) is

$$E_{m,n}(s') \le 2(n + 1)^2$$

(respectively, $\le 2\min(m^2, n^2)$).

In the first two papers, analyzing type-(3) problems, the proof was based (as in Smale's analysis) on the evaluation of the probability that a typical cone is intersected by a line. But this time, this is the line

[d, 1]

with $0 < d = (\delta, \delta^2, \dots, \delta^{m+n})^T$ and $\delta$ as small as desired.
Closer to our geometrical interpretation and easier to explain is the third approach in [2].
To explain the solution process of a type-(1) problem, we use

$$X^{(n+k)} := \{x \in \mathbb{R}^n : a_1^T x \le b^1, \dots, a_{n+k}^T x \le b^{n+k}\}$$

for $0 \le k \le m - n$, and $X^{(m)} = X$.

The following complete algorithm works directly in the space $\mathbb{R}^n$ and is called the lexicographic variant of the constraint-by-constraint method:


Initialization
Determine the unique vertex $\bar x$ of $X^{(n)}$ and choose $u$ as $u = \delta^1 a_1 + \dots + \delta^n a_n$ with $\delta > 0$ sufficiently small.

Stage $k$ ($1 \le k \le m - n$)
START at $\bar x$, the maximal vertex for $u^T x$ on $X^{(n+k-1)}$.
IF $a_{n+k}^T \bar x \le b^{n+k}$
THEN go to stage $k + 1$.
ELSE
use the shadow-vertex algorithm to improve the value of $a_{n+k}^T x$ (note that so far $a_{n+k}^T \bar x > b^{n+k}$):
START at $\bar x$ and minimize $a_{n+k}^T x$ on $X^{(n+k-1)}$.
STOP as soon as $a_{n+k}^T x \le b^{n+k}$.
On the last traversed edge find a point $\tilde x$ with $a_{n+k}^T \tilde x = b^{n+k}$.
ENTER stage $k + 1$ with $\tilde x$ replacing $\bar x$.
This is possible because we have moved on a co-optimal path; hence $\tilde x$ maximizes $u^T x$ on $X^{(n+k)}$.
STOP if it is impossible to achieve $a_{n+k}^T x \le b^{n+k}$, because then the original problem is infeasible.

Stage $m - n + 1$
START at $\bar x$, which maximizes $u^T x$ on $X^{(m)} = X$.
Apply the shadow-vertex algorithm to find the optimal point for $v^T x$ or discover that $v^T x$ is unbounded on $X$.

In principle, this amounts to solving $(m - n + 1)$ problems, for each of which the average number of steps is less than $n$ (Theorem 2). But now, due to the lexicographical choice of $u$, it can be exploited that (when we enter stage $k + 1$) most of the work to optimize the current objective has already been done in earlier stages. Thus the effort of stage $k + 1$ becomes much smaller than $n$.

Finally, the order of the total average number of steps is $O(n^2)$ instead of $O(mn)$.

With slight additional conditions on the distributions of the entries of $A$, Adler and Megiddo [3] could also establish a lower bound of type $C \cdot n^2$.

They argued that for m < 2n, since the share of fea- 
sible problems is at least n~ *, the conditional expected 
number of pivot steps for solving LP’s of that model, 
under the condition that the problem instance is feasi- 
ble, is O(n). 

As for every probabilistic model, one should ask about the direct impact of the model on the results.

An important feature of SIM is the fact that many instances will be infeasible, precisely

$$\frac{\text{number of feasible instances}}{\text{number of generated problems}} = \frac{1}{2^m}\sum_{i=0}^{n}\binom{m}{i} \sim \frac{1}{2^m}\binom{m}{n}$$

as $m \to \infty$ while $n$ is fixed.


Only conditioning on feasible problem instances avoids averaging over a lot of easy problems. But even if we do so, we meet a remarkably small expected number of vertices:

$$E_{m,n}(\text{vertices per feasible instance}) = \frac{2^n \binom{m}{n}}{\sum_{i=0}^{n}\binom{m}{i}},$$

which is less than $2^n$ and converges to that value for $m \to \infty$, $n$ fixed.
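For $n = 1$, the expected number of vertices per feasible instance can be checked by brute force. Each of the $m + 1$ feasible instances is an interval, and its finite endpoints are its vertices, which gives $2m$ vertices in total. The closed form $2^n \binom{m}{n} / \sum_{i \le n} \binom{m}{i}$ therefore reduces to $2m/(m + 1)$. The enumeration below, with illustrative data in general position and hypothetical helper names, confirms this for $m = 4$.

```python
from itertools import product
from math import comb


def count_feasible_and_vertices(a, b):
    """n = 1: over all 2^m sign patterns, count the feasible instances and the
    total number of vertices (finite endpoints of the feasible intervals)."""
    feasible = vertices = 0
    for signs in product((1, -1), repeat=len(a)):
        lo, hi = float("-inf"), float("inf")
        for ai, bi, s in zip(a, b, signs):
            c, r = s * ai, s * bi
            if c > 0:
                hi = min(hi, r / c)
            else:
                lo = max(lo, r / c)
        if lo <= hi:
            feasible += 1
            vertices += (lo > float("-inf")) + (hi < float("inf"))
    return feasible, vertices


def vertices_per_feasible_instance(m, n):
    """Closed form 2^n * C(m, n) / sum_{i <= n} C(m, i)."""
    return 2 ** n * comb(m, n) / sum(comb(m, i) for i in range(n + 1))


a = [1.0, -2.0, 3.0, 0.5]
b = [0.3, 1.1, -2.0, 4.1]
feasible, vertices = count_feasible_and_vertices(a, b)
```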

It is therefore not surprising that for a large class of variants the average number of pivot steps for the complete solution is bounded from above by a function of $n$ only (cf. [1]).

But the most important cause for the simplification of problem instances with $m \gg n$ is the average redundancy rate (the share of the restrictions without impact on $X$). This expected number (conditioned on feasible problems) is


CC 
(Greet) 
as m — oo while n is fixed. 


>1 


Simultaneously, the absolute number of nonredundant 
constraints (conditioned on feasibility) tends to 2n for 
m — oo and fixed n. 

So, for $m \gg n$, even for the feasible instances, the average nonredundancy rate will be very small. This of course makes these problems easy. And it says that the sign-invariance model gives reasonable and meaningful information only for $m = O(n)$.

SIM relies on symmetries and reflections only. The combinatorial methods of evaluation make it unlikely that higher moments of the distribution of $s$ can easily be calculated. Besides that, the model is somewhat inflexible. For every set of data, the reflection procedure leads to exactly the same cumulated statistical characteristics in the total set of the $2^m$ instances. There is no way to choose a desired redundancy share or a size of the expected number of vertices, and to parametrize certain figures in order to study their impact.

Apparently (in particular for $m \gg n$), the small upper bounds in Theorem 2 and Theorem 3 reflect the special properties of the model rather than confirm the efficiency of the simplex method, as had been pointed out in [1].


Results Under the Rotation-Symmetry Model 


The theoretical analysis based on the rotation-symmetry model RSM above started in 1977 [4] with work by K.H. Borgwardt and is still (1998) developing. The most important of his results, a polynomial upper bound for the expected number of shadow-vertices, was derived in 1996-1997 [10]; it had predecessors with slightly cruder bounds in 1987 [6] and 1982 [5].


Theorem 4 ([10]) For every rotation-symmetry distribution as in RSM and for every pair $(m, n)$ with $m \ge n$, the expected number $S$ of shadow-vertices (and of pivot steps $s$ in Phase II) satisfies

$$E_{m,n}(s) \le E_{m,n}(S) \le \text{const} \cdot m^{1/(n-1)} \cdot n^2.$$


This result and its predecessors have been derived by translating the question about $S$ into the dual space of the vectors $a_i$. Candidates for being a vertex are only the $\binom{m}{n}$ basic solutions $x_\Delta$ solving a system of $n$ equations

$$a_{\Delta_1}^T x = 1, \ \dots,\ a_{\Delta_n}^T x = 1 \quad\text{with}\quad \Delta = \{\Delta_1, \dots, \Delta_n\} \subset \{1, \dots, m\}.$$

$x_\Delta$ is actually a vertex if all other restrictions are satisfied, i.e. $a_i^T x_\Delta \le 1$ for all $i \notin \Delta$.

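It becomes a shadow-vertex if the projection onto $\mathrm{Span}(u, v)$ preserves its vertex property.

Now there is a one-to-one correspondence

$$x_\Delta \longleftrightarrow \Delta = \{\Delta_1, \dots, \Delta_n\} \longleftrightarrow \mathrm{Conv}(a_{\Delta_1}, \dots, a_{\Delta_n}).$$

Besides $X = \{x : Ax \le b\}$ we consider its polar polyhedron $Y = \mathrm{Conv}(0, a_1, \dots, a_m)$.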

The following equivalences enable us to derive the 
average number of shadow-vertices directly from the 
input data: 


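Lemma 5

1) $x_\Delta$ is a vertex of $X$ if and only if $\mathrm{Conv}(a_{\Delta_1}, \dots, a_{\Delta_n})$ is a facet of $Y$.

2) A vertex $x_\Delta$ is a shadow-vertex of $X$ if and only if

$$\mathrm{Conv}(a_{\Delta_1}, \dots, a_{\Delta_n}) \cap \mathrm{Span}(u, v) \ne \emptyset.$$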
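Lemma 5 turns the counting of vertices and shadow-vertices into a purely combinatorial test on the data. Below is a minimal sketch for $n = 3$ and $b = \mathbf{1}$, as in the RSM. It solves each basic system by Cramer's rule and keeps $x_\Delta$ when $a_i^T x_\Delta \le 1$ for all $i$ (part 1). A kept triangle counts as a shadow-vertex when its vertices do not lie strictly on one side of the plane $\mathrm{Span}(u, v)$, tested via the normal $u \times v$ (part 2). Averaging `len(shadow)` over many samples would give a Monte-Carlo estimate of $E_{m,n}(S)$. The $O(m^3)$ enumeration is for illustration only, and all names are hypothetical.

```python
import itertools
import random


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def solve3(M, rhs):
    """Cramer's rule for a 3 x 3 system; returns None if (nearly) singular."""
    d = det3(M)
    if abs(d) < 1e-12:
        return None
    return [det3([row[:k] + [r] + row[k + 1:] for row, r in zip(M, rhs)]) / d
            for k in range(3)]


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]


def facets_and_shadow_facets(a, u, v, tol=1e-9):
    """Index triples Delta with x_Delta a vertex of X = {x : Ax <= 1} (Lemma 5, part 1)
    and the subset whose x_Delta is a shadow-vertex (part 2)."""
    normal = cross(u, v)                                   # normal of the plane Span(u, v)
    facets, shadow = [], []
    for delta in itertools.combinations(range(len(a)), 3):
        x = solve3([list(a[i]) for i in delta], [1.0, 1.0, 1.0])   # basic solution x_Delta
        if x is None:
            continue
        if all(dot(ai, x) <= 1.0 + tol for ai in a):       # all restrictions hold
            facets.append(delta)
            side = [dot(a[i], normal) for i in delta]
            if min(side) <= 0.0 <= max(side):              # triangle meets Span(u, v)
                shadow.append(delta)
    return facets, shadow


rng = random.Random(1)


def gauss3():
    return [rng.gauss(0.0, 1.0) for _ in range(3)]


a = [gauss3() for _ in range(40)]
u, v = gauss3(), gauss3()
facets, shadow = facets_and_shadow_facets(a, u, v)
```

As a sanity check, $Y$ is simplicial for Gaussian data, so Euler's formula forces (number of facets) $= 2 \cdot$ (number of vertices of $Y$) $- 4$ whenever the origin is interior. The shadow-vertices correspond to the edges of the polygon $Y \cap \mathrm{Span}(u, v)$, so there are at least three of them.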


The addition theorem for expectation values and the symmetry of index choices yield

$$E_{m,n}(S) = \binom{m}{n} \cdot P\bigl(\mathrm{Conv}(a_1, \dots, a_n) \text{ is a facet intersected by } \mathrm{Span}(u, v)\bigr).$$

Here, one integrates over all possible configurations of $a_1, \dots, a_m, u, v$, weighted with regard to the underlying distribution. The resulting multiple integral is very hard to evaluate. For the case of moderate dimensions ($m$, $n$ arbitrary), we could only compare our integral with known results about a closely related integral. Much more efficient are the tools for evaluating the so-called asymptotic case ($m \to \infty$, $n$ fixed), because there the integrals behave like Laplace integrals and can conveniently be evaluated. So it was much easier to derive asymptotic results for specific RSM-distributions.

In the sequel, we write $E_{m,n}(S) \sim f(m, n)$ as $m \to \infty$, $n$ fixed, when we mean that

$$C_1 \le \liminf_{m \to \infty,\ n \text{ fixed}} \frac{E_{m,n}(S)}{f(m, n)} \le \limsup_{m \to \infty,\ n \text{ fixed}} \frac{E_{m,n}(S)}{f(m, n)} \le C_2$$

for certain constants $C_1, C_2 > 0$. Besides that, we speak of an RSM-distribution on $\Omega_n$ with algebraically decreasing tail if

$$P(\|x\| \ge r) \sim (1 - r)^{\gamma} \quad\text{for } r \to 1$$

for some $\gamma > 0$.
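A concrete RSM-distribution with algebraically decreasing tail is obtained by requiring $P(\|x\| \ge r) = (1 - r)^\gamma$ exactly. This is a special case of the asymptotic definition and is assumed here only for illustration. By inverse transform sampling, the radius $R = 1 - U^{1/\gamma}$ with $U$ uniform on $[0, 1)$ satisfies $P(R \ge r) = P(U \le (1 - r)^\gamma) = (1 - r)^\gamma$. A uniform direction from a normalized Gaussian vector makes the distribution rotation-symmetric. The function name is hypothetical.

```python
import math
import random


def algebraic_tail_point(n, gamma, rng):
    """Rotation-symmetric point in the unit ball with P(||x|| >= r) = (1 - r)**gamma."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            break
    radius = 1.0 - rng.random() ** (1.0 / gamma)   # inverse transform for the radius
    return [radius * x / norm for x in g]
```

A larger $\gamma$ puts less mass near the boundary sphere, so more of the restrictions become redundant.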


Theorem 6 ([4,6,14,16,18]) For fixed $n$ and $m \to \infty$, the following distributions lead to the following behavior of $E_{m,n}(S)$:

• Gaussian distribution on $\mathbb{R}^n$:


Emn(S)~ Vinmn? . 
• Uniform distribution on $\Omega_n$:
Emn(S) ~ mot 7? , 
• Uniform distribution on $\omega_n$:
Emn(S) ~ m@=5 nr? , 
• There are RSM-distributions such that $E_{m,n}(S) \sim C(n)$.


• For RSM-distributions on $\Omega_n$ with algebraically decreasing tail:


oa 
Emn(S) ~ MO-TRY n° , 


These results should be compared with corresponding results on the average number of vertices of $X$, respectively facets of $Y$, denoted by $E_{m,n}(V)$ in our model.


Theorem 7 ([6,8,16]) For fixed $n$ and $m \to \infty$, the following distributions lead to the following behavior of $E_{m,n}(V)$:

• Gaussian distribution on $\mathbb{R}^n$:


(n—1) (n—1) 


Emn(V) ~ [nm] 2 2"-n 2 


= 
va 


• Uniform distribution on $\Omega_n$:


(n-]) on n _1 (n—1) 
Emn(V) ~ meFD22 x win i(n+1) 2 
• Uniform distribution on $\omega_n$:


(u=) on n _5 (n—1) 
Emn(V) ~ m= 22 xe? n-4(n—1) 2 


• For distributions on $\Omega_n$ with algebraically decreasing tail:


w=) non 
Emn(V) ~ me-1F2V 22 7 on 


(nl) 
(n—l) (N\~ %a— 
2 Gi ees 2y) = (5) 2(n—142Y) ; 


Obviously, the simplex method is able to select a rather short path through the huge set of vertices. It visits (on the average and approximately) only the $n$th root of the total number of available vertices.

Another very important point is the variance of the number of shadow-vertices, respectively of the number of required pivot steps. Due to the technical difficulties mentioned above, so far (1998) only the asymptotic case has been analyzed. K. Küfer [17] showed


Theorem 8 ([17]) For distributions with algebraically decreasing tail on $\Omega_n$, the quotient of variance and square of expected value behaves asymptotically as follows:


Vat m,n (s) 1 


Eats) n? 
Var m.n(S) oe 
—_—§— © m(n—142Y) , 
Emn,n(S) 


Here, s is the number of pivot steps of the shadow-vertex 
algorithm (Phase II) and S is the number of shadow- 
vertices. 


So far, we have dealt only with a fictitious Phase II algorithm, starting at an optimal vertex for an auxiliary objective. But such a vertex is impracticable to find. Now let us talk about a safe Phase I.

A special feature of our problems is the feasibility of the origin, which (in contrast to the sign-invariance model) makes every instance feasible. Based on that information, we can employ a method (cf. [5] and [6]) which applies the shadow-vertex algorithm $n - 1$ times, each time increasing the dimension of the problem. In each of these stages all the stochastic requirements of our model are satisfied.

Here we introduce

$$X^{(k)} := \{x \in \mathbb{R}^n : Ax \le \mathbf{1},\ x^{k+1} = \dots = x^n = 0\}$$

and formulate the dimension-by-dimension algorithm:

Initialization (Stage $k = 1$)
Starting from the origin, find a vertex of $X^{(1)}$ maximizing $v^T x = v^1 \cdot x^1$.
IF this maximal vertex does not exist,
THEN STOP.

Stage $k$, $2 \le k \le n$
Use the optimal point $(\bar x^1, \dots, \bar x^{k-1}, 0, \dots, 0)^T$ for $v^T x$ on $X^{(k-1)}$, which is located on an edge of $X^{(k)}$.

1. Find an adjacent vertex in $X^{(k)}$ to that edge.
2. Apply the shadow-vertex algorithm using $e_k^T x$ and $v^T x$ as the pair of objectives for maximizing $v^T x$ on $X^{(k)}$.
IF $v^T x$ turns out to be unbounded on $X^{(k)}$, STOP.
3. IF $k < n$, set $k = k + 1$ and enter the next stage.
IF $k = n$, PRINT the optimal vertex for $X$.


One can derive an upper bound for this cumulation of $n - 1$ applications of the shadow-vertex algorithm by summing up all the expected numbers of shadow-vertices. But this would significantly overestimate the actual number of pivot steps in this algorithm, since we would ignore that the original distribution comes from $\mathbb{R}^n$ and that only projection distributions (from $\mathbb{R}^n$ to $\mathbb{R}^k$) can determine the behavior in stage $k$. Since the set of projection distributions is only a small subset of the RSM-distributions in dimension $k$, the corresponding bound for the expected number of steps in stage $k$ is much better. Consequently, we obtain the following result, which also holds for problems of type (3), including sign-constraints:


Theorem 9 ([9,10]) For every pair $(m, n)$ with $m \ge n$ and every RSM-distribution on $\mathbb{R}^n$, the expected total number of pivot steps for the dimension-by-dimension algorithm satisfies


1 
Expats) sm). w-C , 


for problems of type (1) as well as for problems of type (3).


But, as observed in the analysis of the constraint-by-constraint method (cf. Theorem 3), it is plausible that most of the work of optimizing in stage $k + 1$ has already been prepared in prior stages, such that the actual number of steps in stage $k + 1$ is much smaller. This was precisely clarified by G. Höfner [14] for the asymptotic case:


Theorem 10 ([14]) For every RSM-distribution the ex- 
pected total number of pivot steps in the dimension-by- 
dimension algorithm satisfies 


pote 2 
Begs) ~ me) .n2 


when $m \to \infty$ and $n$ is fixed, for problems of type (1) and (3).


It should be clear that this algorithm is crude and lengthy and has been introduced only for meeting the conditions of RSM and for making the probabilistic analysis possible.

In order to confirm the 'folklore' observation that Phase I can be done with an effort not exceeding that of Phase II, Höfner analyzed another complete algorithm. Unfortunately, this method is guaranteed to work only in the asymptotic case.

1) Solve the problem

$$\max \mathbf{1}^T x \quad\text{subject to}\quad Ax \le \mathbf{1},\ x \ge 0$$

by use of the shadow-vertex algorithm starting at the vertex 0. The optimal vertex $\bar x$ will (in the asymptotic case) with extremely high probability be a vertex of

$$X = \{x : Ax \le \mathbf{1}\}.$$

2) Start the shadow-vertex algorithm at $\bar x$, forget about the sign constraints and optimize $v^T x$ on $X$.

It can be shown that both applications of the shadow- 
vertex algorithm require (on the average) at most 
m\"~1) . 7? . const pivot steps. 

So, this is an algorithm with a Phase I effort not ex- 
ceeding that of Phase II. 

So far, the plausible, natural and very good behavior of Phase I can only be guaranteed in the asymptotic case. In the moderate cases, the situation is similar to that of the sign-invariance results, where the constraint-by-constraint method needs a factor $n$ more pivot steps than Phase II does.

As we have discussed the advantages and drawbacks of SIM, we now consider similar questions for RSM. Seemingly it is a tremendous advantage of RSM that it generates only (the hard) feasible problems. But at the same time it turns out to be a drawback that the given Phase I algorithms are designed to exploit this fact and depend on the guarantee of '0 being feasible'.

One way to overcome that drawback lies in the following idea. Remember that we want to solve all problem instances of the type

$$\max v^T x \quad\text{subject to}\quad a_1^T x \le b^1, \ \dots,\ a_m^T x \le b^m$$

with arbitrary values of $b^i$ (not necessarily positive). For integrating all these problems in our analysis, we use a 'homogenization method'. We introduce the notation $P_n := \{x : Ax \le b\}$ and reformulate our restrictions as follows:

• $a_i^T x \le b^i$ corresponds to $a_i^T x \le 1 - \bar b^i$ when $\bar b^i = 1 - b^i$.

So we can demand that

$$a_1^T x + \bar b^1 \le 1, \ \dots,\ a_m^T x + \bar b^m \le 1,$$

and define a polyhedron in $\mathbb{R}^{n+1}$ by $a_i^T x + \bar b^i x^{n+1} \le 1$ ($x \in \mathbb{R}^n$) for $i = 1, \dots, m$.

This system defines a new polyhedron $P_{n+1} \subset \mathbb{R}^{n+1}$. The set of feasible points with $x^{n+1} = 0$ is a one-to-one copy of $\{x : Ax \le \mathbf{1}\}$. The set of feasible points with $x^{n+1} = 1$ corresponds one-to-one to the set of points in $P_n$.

It is now clear that in the level $\{x^{n+1} = 0\}$ the problem satisfies all RSM-requirements. So we can solve the optimization problem for $v^T x$ on that artificial polyhedron. But then we can use one further stage $(n + 1)$ of the dimension-by-dimension algorithm to reach the level $\{x^{n+1} = 1\}$ (by maximizing $x^{n+1}$ on $P_{n+1}$). If we use the shadow-vertex algorithm starting at the $\{x^{n+1} = 0\}$-optimum, then we walk on a co-optimal path all the time. And there are two possible outcomes:

1) $\max\{x^{n+1} : x \in P_{n+1}\} < 1$.
Then the level $\{x^{n+1} = 1\}$ has no feasible points, and $P_n$ is proven to be empty, i.e. the problem is infeasible.

2) $\max\{x^{n+1} : x \in P_{n+1}\} \ge 1$.
Then the shadow-vertex path in stage $n + 1$ will traverse the desired level. We calculate the intersection point, drop the last coordinate 1 and obtain the optimal point for $\max\{v^T x : Ax \le b\}$. This results from the co-optimality of our path.
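The bookkeeping of the homogenization is easy to check numerically. The following sketch uses hypothetical helper names and is not part of the cited method's implementation. It builds the rows $(a_i, \bar b^i)$ with $\bar b^i = 1 - b^i$ and tests membership in $P_{n+1}$. It also computes the crossing point with the level $\{x^{n+1} = 1\}$ on an edge $[p, q]$ with $p^{n+1} < 1 \le q^{n+1}$, dropping the last coordinate as in outcome 2.

```python
def lift(A, b):
    """Rows (a_i, bbar_i) with bbar_i = 1 - b_i, so that P_{n+1} is
    {(x, x_{n+1}) : a_i^T x + bbar_i * x_{n+1} <= 1 for all i}."""
    return [list(ai) + [1.0 - bi] for ai, bi in zip(A, b)]


def feasible(rows, point, tol=1e-12):
    return all(sum(r * p for r, p in zip(row, point)) <= 1.0 + tol for row in rows)


def cross_level(p, q):
    """Point of the edge [p, q] of P_{n+1} with last coordinate 1
    (assumes p[-1] < 1 <= q[-1]); the last coordinate is dropped."""
    t = (1.0 - p[-1]) / (q[-1] - p[-1])
    return [pi + t * (qi - pi) for pi, qi in zip(p[:-1], q[:-1])]
```

For example, with $A = ((1, 0), (0, 1), (-1, -1))$ and $b = (-3, 2, 5)$, the origin is infeasible for $Ax \le b$. Still, $(0, 0, 0)$ lies in $P_{n+1}$, and a point $x$ lies in $P_n$ exactly when $(x, 1)$ lies in $P_{n+1}$.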

Now the following probabilistic result is obvious:

Theorem 11 ([7]) If $(a_1, \bar b^1), \dots, (a_m, \bar b^m)$ are distributed on $\mathbb{R}^{n+1}$ according to the RSM, then general problems of type (1) can be solved for every $(m, n)$ with an expected total number of pivot steps as


Emn(s‘) <m*(n + 1?°C. 


But this condition has a quite artificial flavor, because the RSM-distribution of the augmented vectors may lead to dependencies between the gradients $a_i$ and the capacities $b^i$.

We know of one special distribution where both wishes (RSM-distribution and independence) can be combined, namely the Gaussian distribution on $\mathbb{R}^{n+1}$. This is the only RSM-distribution where the components of the generated vectors are independent. We obtain:


Theorem 12 ([7]) If the vectors 


are independent and Gaussian distributed, then 


Ens) = mi(n + 1)°- const. 


For more general independent distributions of the right-hand sides (the capacities), such as the uniform distribution, we could not derive satisfactory bounds so far (1998). However, this seems to be caused by technical difficulties only. The special results in Theorems 11 and 12 indicate that general problems with arbitrary independent capacity distributions may be solvable on the average with the same effort.

We conclude our report with a look at general variants. In [15] and [12] P. Huhn proved a lower bound on
the average number of pivot steps, which is valid for 
all variants. Assume that Phase I has provided us with 




a vertex $x_0$ of $X$, and that the objective $v^T x$ had no impact on Phase I. Then we start at $x_0$ with Phase II and try to reach the optimal vertex. To bridge the distance, every variant has to use edges of the polyhedron $X$. Stochastic geometry can provide information on the distribution of the lengths of these edges. If one can show that there are extremely few 'long' edges, then a large number of 'short' edges has to be used for our walk. This has been done in [15] and [12], and it gave a guarantee that no variant can (on the average) do its job with fewer than a certain (computable) number of steps. Quantitatively, this reads as follows. We present only the result for a special distribution, the uniform distribution on $\omega_n$ (corresponding results have been derived for a large class of distributions).


Theorem 13 ([15]) In a typical RSM problem with uniform distribution on $\omega_n$, every variant of the simplex algorithm will, on the average, require a certain number $E_{m,n}(s)$ of pivot steps, and


1 
j mo 0 A 
Ee) >C-me).n with C > 0. 


Although the order in n of this lower bound is smaller than that of the upper bound for the shadow-vertex algorithm, it shows that no variant can perform substantially better: there is no algorithm (variant) that runs essentially faster than the shadow-vertex algorithm by exploiting the increasing number of options with n and that avoids the typical order $m^{1/(n-1)}$ of the RSM.

Thus, a posteriori, the results on the shadow-vertex algorithm have proved to be quite representative. It is not the very best variant, but it is not much worse than the very best.

Note that the lower bound in Theorem 13 is meaningful only when m > n, because only then does it become significantly greater than 1, although the inequality is valid for all (m, n). This is different from the results about the variance (Theorem 8) and the speedup for Phase I (Theorem 10), where it is uncertain whether these results will also be valid in moderate dimensions. (Perhaps the technical difficulties are not to blame.) It may well be that these results essentially rely on a regularization effect of the polyhedra for large numbers of points, as known from the approximation of a ball from inside by the convex hull of a huge number of random points.


Clarifying these questions remains an important challenge for future research.
When formulating stochastic programming problems, one usually starts from an underlying deterministic problem which, in the case of probabilistic constrained linear programming (PCLP), is the linear programming (LP) problem. In game theoretical formulation, the LP primal-dual pair of problems is the following:


$$\sup_{y \in Y'} \{yAx\} \to \min, \qquad x \in X' = \{x \in \mathbb{R}^n : Ax \ge b,\ x \ge 0\},$$
and
$$\inf_{x \in X'} \{yAx\} \to \max, \qquad y \in Y' = \{y \in \mathbb{R}^m : yA \le c,\ y \ge 0\},$$


where A is a given matrix of dimension m x n, b and 
c are given vectors of dimension m and n, and x and 
y are decision vectors of dimension n and m, respec- 
tively. In order to extend the LP duality concept [2] we 
reformulate the feasibility sets of the two problems by 
introducing probability constraints as follows: 


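$$\sup_{y \in Y} \{yAx\} \to \min, \qquad x \in X = \{x \in \mathbb{R}^n : P(Ax \ge \beta) \ge p,\ x \ge 0\}, \tag{1}$$
and
$$\inf_{x \in X} \{yAx\} \to \max, \qquad y \in Y = \{y \in \mathbb{R}^m : P(yA \le \gamma) \ge q,\ y \ge 0\}. \tag{2}$$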


Here $\beta$ and $-\gamma$ are random vectors of dimension m and n, respectively, with given joint continuous probability distribution functions F and G; p and q are reliability levels in (0, 1); A, x, y are defined as above.
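The probabilistic constraint in (1) involves β only through its distribution function: for a fixed x, $P(Ax \ge \beta) = F(Ax)$. The following is a minimal Monte Carlo sketch of this identity, assuming for illustration only that β has independent normal components (so that F is a product of univariate normal distribution functions); the matrix A, the point x and the parameters mu, sigma and p are hypothetical.

```python
import random
from statistics import NormalDist

# Hypothetical data: A is 2 x 2, x is a candidate point, and the random
# right-hand side beta has independent N(mu_i, sigma_i^2) components.
A = [[1.0, 2.0], [3.0, -1.0]]
x = [1.5, 0.5]
mu = [2.0, 3.0]
sigma = [0.5, 1.0]
p = 0.7  # reliability level

def F(b):
    """Joint distribution function of beta (independence assumed)."""
    value = 1.0
    for b_i, m_i, s_i in zip(b, mu, sigma):
        value *= NormalDist(m_i, s_i).cdf(b_i)
    return value

def mat_vec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * v_j for a, v_j in zip(row, v)) for row in M]

def mc_probability(M, v, n_samples=100_000, seed=1):
    """Monte Carlo estimate of P(Mv >= beta), componentwise."""
    rng = random.Random(seed)
    Mv = mat_vec(M, v)
    hits = 0
    for _ in range(n_samples):
        beta = [rng.gauss(m_i, s_i) for m_i, s_i in zip(mu, sigma)]
        if all(lhs >= rhs for lhs, rhs in zip(Mv, beta)):
            hits += 1
    return hits / n_samples

exact = F(mat_vec(A, x))          # equals Phi(1)^2 for these data
estimate = mc_probability(A, x)
feasible = all(v >= 0 for v in x) and exact >= p   # membership in X of (1)
print(f"F(Ax) = {exact:.4f}, Monte Carlo = {estimate:.4f}, feasible for (1): {feasible}")
```

For these data Ax = (2.5, 4.0), so both standardized arguments equal 1 and the simulated frequency should be close to $\Phi(1)^2$.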

The pair of problems (1) and (2) was introduced 
in [3]. Each of them is a generalization of problems 
with probabilistic constraints and linear objective, in- 
troduced by A. Charnes and W.W. Cooper [1], by B.L. 
Miller and H.M. Wagner [6], and by A. Prékopa [7]. 
The formulation of a dual problem can be of help in elaborating a solution method, in analyzing the sensitivity of a solution, and in assessing how close the objective function value at a solution at hand is to the optimal value.

Because of the presence of the bilinear function yAx, the objective functions of both (1) and (2) directly depend on the feasibility set of the other problem. It is desirable, therefore, to find objective functions for Problems (1) and (2) such that each of them can be interpreted on its own. The idea is similar to that applied in LP, where cx and yb can replace the corresponding objective functions. The argument leading to such a pair of self-contained problems follows.

Define the following sets: 


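$$B = \{b \in \mathbb{R}^m : F(b) \ge p,\ b \in \operatorname{supp} F\}, \qquad X(b) = \{x \in \mathbb{R}^n : Ax \ge b,\ x \ge 0\} \ \text{for given } b \in \mathbb{R}^m,$$
$$C = \{c \in \mathbb{R}^n : G(-c) \ge q,\ -c \in \operatorname{supp} G\}, \qquad Y(c) = \{y \in \mathbb{R}^m : yA \le c,\ y \ge 0\} \ \text{for given } c \in \mathbb{R}^n.$$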


The sets X(b) and Y(c) are convex polyhedral sets. The 
sets B and C are closed because F and G are assumed 
to be continuous, and their interiors are not empty because p, q < 1. B (respectively, C) is convex if F (respectively, G) is quasiconcave. The set B is bounded if the support set supp F (the smallest closed subset of $\mathbb{R}^m$ whose probability measure generated by F is 1) is
bounded. Similarly, the set C is bounded if supp G is 
bounded. Assume, in the course of the transformation 
below, that B and C are bounded. Obviously, 


$$X = \{x \in \mathbb{R}^n : x \in X(b) \text{ for some } b \in B\}, \qquad Y = \{y \in \mathbb{R}^m : y \in Y(c) \text{ for some } c \in C\},$$


so (1) and (2) can be rewritten in the following equiva- 
lent form: 


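$$\sup_{c \in C}\ \sup_{y \in Y(c)} yAx \to \min, \qquad x \in X(b),\ b \in B, \tag{3}$$
and
$$\inf_{b \in B}\ \inf_{x \in X(b)} yAx \to \max, \qquad y \in Y(c),\ c \in C. \tag{4}$$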
Observe that
$$\sup_{y \in Y(c)} yAx = \inf_{x' \in X(Ax)} cx'$$
for any given $c \in \mathbb{R}^n$ and $x \in \mathbb{R}^n$, with $\inf cx' = -\infty$ if $Y(c) = \emptyset$, by the LP duality theorem. It means that


$$\sup_{c \in C}\ \inf_{x' \in X(Ax)} cx' = \sup_{c \in C}\ \sup_{y \in Y(c)} yAx,$$
and both sides are defined to be $-\infty$ if $Y(c) = \emptyset$ for all $c \in C$. The function $cx'$ is linear in each of $c$ and $x'$ (hence concave in $c$ and convex in $x'$), and it is continuous on the product set $C \times X(Ax)$, which is convex provided C is convex. Assume that C is convex. Then the saddle value of $cx'$ exists with respect to minimizing over $X(Ax)$ and maximizing over $C$ [9]:


$$\inf_{x' \in X(Ax)}\ \sup_{c \in C} cx' = \sup_{c \in C}\ \inf_{x' \in X(Ax)} cx'.$$
In fact, $cx'$ has a saddle point if $\{c \in C: Y(c) \neq \emptyset\}$ satisfies the Slater condition, i.e., its interior is nonempty. The reason is that $\inf_{x' \in X(Ax)} cx'$, as a function of $c$, is closed and concave, hence continuous over its domain $\{c: Y(c) \neq \emptyset\}$ [10]. Therefore, it attains its maximum over the compact set $\{c \in C: Y(c) \neq \emptyset\}$, say at $c^0$. Then, by the LP duality theorem, $\inf_{x' \in X(Ax)} c^0 x'$ is attained, say at $x^0$. This implies that, in the presence of the Slater condition, $(c^0, x^0)$ is a saddle point with respect to minimizing over $X(Ax)$ and maximizing over $C$ [5]. This fact is needed to ensure that not all the optimal solutions of (1) are lost during the transformation.

Relax the minimization of $\sup_{c \in C} \sup_{y \in Y(c)} yAx$ over $X(b)$ in (3) and use the infimum instead. Observe that $X(b) = \{x' \ge 0 : \exists\, x \in X(b) \text{ such that } Ax' \ge Ax\}$ for any fixed $b \in \mathbb{R}^m$, so that
$$\inf_{x \in X(b)}\ \inf_{x' \in X(Ax)}\ \sup_{c \in C} cx' = \inf_{x \in X(b)}\ \sup_{c \in C} cx,$$
where both sides are defined to be $+\infty$ if $X(b) = \emptyset$. Then restore the minimization over $X(b)$ and obtain the following problem, which corresponds to (1):


$$\sup_{c \in C} cx \to \min, \qquad x \in X = \{x : Ax \ge b,\ x \ge 0,\ F(b) \ge p,\ b \in \operatorname{supp} F\}. \tag{5}$$


By assuming that B is convex, a similar argument leads to the following problem, which corresponds to (2):
$$\inf_{b \in B} yb \to \max, \qquad y \in Y = \{y : yA \le c,\ y \ge 0,\ G(-c) \ge q,\ -c \in \operatorname{supp} G\}. \tag{6}$$


Next, we summarize the relation between (1) and (5), 
and (2) and (6), respectively. 

If the probability distribution function G is quasiconcave, supp G is bounded, and $(b^0, x^0)$ is optimal for (5), then $x^0$ is optimal for (1). If the probability distribution function G is quasiconcave, supp G is bounded, the set $\{c \in C: Y(c) \neq \emptyset\}$ fulfills the Slater condition, and $x^0$ is optimal for (1), then $(b^0, x^0)$ is optimal for (5), where $b^0 = Ax^0$.

If the probability distribution function F is quasiconcave, supp F is bounded, and $(c^0, y^0)$ is optimal for (6), then $y^0$ is optimal for (2). If the probability distribution function F is quasiconcave, supp F is bounded, the set $\{b \in B: X(b) \neq \emptyset\}$ fulfills the Slater condition, and $y^0$ is optimal for (2), then $(c^0, y^0)$ is optimal for (6), where $c^0 = y^0 A$.

It remains to state the duality theorem, the proof of 
which can be found in [4]. 


Theorem 1 (Duality theorem) Suppose that the probability distribution functions F and G are quasiconcave, that F is a strictly increasing function of its components, that supp G is bounded, and that the Slater condition $\operatorname{int}\{b \in B: X(b) \neq \emptyset\} \neq \emptyset$ holds. If (5) is unbounded in value, then (6) is inconsistent. Otherwise, (6) is consistent, their values are equal, and that value is attained in (6).

Suppose that the probability distribution functions F and G are quasiconcave, that G is a strictly increasing function of its components, that supp F is bounded, and that the Slater condition $\operatorname{int}\{c \in C: Y(c) \neq \emptyset\} \neq \emptyset$ holds. If (6) is unbounded in value, then (5) is inconsistent. Otherwise, (5) is consistent, their values are equal, and that value is attained in (5).

Suppose that the probability distribution functions F and G are quasiconcave, that each of them is a strictly increasing function of its components, that supp F and supp G are bounded, and that the following regularity conditions hold: $\operatorname{int}\{b \in B: X(b) \neq \emptyset\} \neq \emptyset$ and $\operatorname{int}\{c \in C: Y(c) \neq \emptyset\} \neq \emptyset$. Then (1) has an optimal solution $x^0$, and (2) has an optimal solution $y^0$ such that $(x^0, y^0)$ is a saddle point of $yAx$ with respect to minimizing over X and maximizing over Y.


The rich class of quasiconcave probability distribution functions includes, among others, the multidimensional normal, exponential, uniform, and gamma distributions [8]. In practical problems, the support set of the
probability distribution in question is usually bounded. 
Although the approximating theoretical distribution 
often has an unbounded support, reasonable truncation 
can be applied. 


See also 


> Approximation of Extremum Problems with 
Probability Functionals 

> Approximation of Multivariate Probability 
Integrals 

> Discretely Distributed Stochastic Programs: Descent 
Directions and Efficient Points 

> Extremum Problems with Probability Functions: 
Kernel Type Solution Methods 

> General Moment Optimization Problems 

> Logconcave Measures, Logconvexity 

> Logconcavity of Discrete Distributions 

> L-shaped Method for Two-Stage Stochastic 
Programs with Recourse 

> Multistage Stochastic Programming: Barycentric 
Approximation 

> Preprocessing in Stochastic Programming 

> Probabilistic Constrained Problems: Convexity 
Theory 

> Simple Recourse Problem: Dual Method 

> Simple Recourse Problem: Primal Method 

> Stabilization of Cutting Plane Algorithms for 
Stochastic Linear Programming Problems 

> Static Stochastic Programming Models 

> Static Stochastic Programming Models: Conditional 
Expectations 

> Stochastic Integer Programming: Continuity, 
Stability, Rates of Convergence 

> Stochastic Integer Programs 

> Stochastic Linear Programming: Decomposition 
and Cutting Planes 

> Stochastic Linear Programs with Recourse and 
Arbitrary Multivariate Distributions 

> Stochastic Network Problems: Massively Parallel 
Solution 

> Stochastic Programming: Minimax Approach 

> Stochastic Programming Models: Random 
Objective 

> Stochastic Programming: Nonanticipativity and 
Lagrange Multipliers 

> Stochastic Programming with Simple Integer 
Recourse 

> Stochastic Programs with Recourse: Upper Bounds 

> Stochastic Quasigradient Methods in Minimax 
Problems 

> Stochastic Vehicle Routing Problems 




> Two-Stage Stochastic Programming: Quasigradient 
Method 
> Two-Stage Stochastic Programs with Recourse 


References 


1. Charnes A, Cooper WW (1959) Chance-constrained pro- 
gramming. Managem Sci 6:73-89 
2. Dantzig GB (1963) Linear programming and extensions. 
Princeton Univ. Press, Princeton 
3. Komaromi E (1986) Duality in probabilistic constrained 
programming. In: Prékopa A (ed) Proc. 12th IFIP Conf. Sys- 
tem Modelling and Optimization. Springer, Berlin, pp 423- 
429 
4. Komaromi E (1992) Probabilistic constraints in primal and 
dual linear programs: Duality results. J Optim Th Appl 
75:587-602 
5. Mangasarian OL (1969) Nonlinear programming. McGraw- 
Hill, New York 
6. Miller BL, Wagner HM (1965) Chance-constrained pro- 
gramming with joint constraints. Oper Res 13:930-945 
7. Prékopa A (1970) On probabilistic constrained program- 
ming. In: Kuhn H (ed) Proc. Princeton Symp. Math. Pro- 
gram., Princeton, pp 113-138 
8. Prékopa A (1971) Logarithmic concave measures with 
application to stochastic programming. Acta Sci Math 
32:301-316 
9. Rockafellar RT (1970) Convex analysis. Princeton Univ. 
Press, Princeton 
10. Williams AC (1963) Marginal values in linear programming. 
SIAM J Appl Math 11:82-94 


Probabilistic Constrained Problems: 
Convexity Theory 


ANDRAS PREKOPA 
RUTCOR, Rutgers Center for Operations Research, 
Piscataway, USA 


MSC2000: 90C15 


Article Outline 


Keywords 
See also 
References 


Keywords 


Probabilistic constraint; Logconcave probability 
density function; Logconcave function; Multivariate 
normal distribution 


The general form of a probabilistic constraint is the fol- 
lowing: 


$$P(g_1(x, \xi) \ge 0, \dots, g_r(x, \xi) \ge 0) \ge p,$$
where $g_i(x, y)$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^q$, $i = 1, \dots, r$, are some functions, $\xi$ is a q-variate random vector and p is a fixed probability. In the simplest case $g_i(x, y) = T_i x - y_i$, where $T_i$ is the ith row of a matrix T, $i = 1, \dots, r$. In this case the above constraint takes the form $P(Tx \ge \xi) \ge p$.

The most important theorem in connection with the probabilistic constraint states that if $g_1, \dots, g_r$ are concave, or quasiconcave, functions in $\mathbb{R}^{n+q}$ and $\xi$ has a continuous probability distribution with logconcave probability density function, then $P(g_i(x, \xi) \ge 0,\ i = 1, \dots, r)$ is a logconcave function in $\mathbb{R}^n$ ([4,5]).
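In the simplest case $g_i(x, y) = T_i x - y_i$ the theorem can be spot-checked numerically. A minimal sketch, assuming for illustration only that $\xi$ has independent normal components (normal densities are logconcave), so that $P(Tx \ge \xi) = \prod_i \Phi((T_i x - \mu_i)/\sigma_i)$; midpoint logconcavity is then tested at random pairs of points. T, mu and sigma are hypothetical, and a finite sample of pairs is evidence, not a proof.

```python
import random
from math import log
from statistics import NormalDist

# Hypothetical data for g_i(x, y) = T_i x - y_i with independent
# xi_i ~ N(mu_i, sigma_i^2).
T = [[1.0, -0.5], [0.3, 1.2], [-0.7, 0.4]]
mu = [0.2, -0.1, 0.5]
sigma = [1.0, 0.8, 1.5]

def h(x):
    """h(x) = P(Tx >= xi) = prod_i Phi((T_i x - mu_i) / sigma_i)."""
    value = 1.0
    for row, m_i, s_i in zip(T, mu, sigma):
        t_i = sum(a * x_j for a, x_j in zip(row, x))
        value *= NormalDist(m_i, s_i).cdf(t_i)
    return value

def midpoint_logconcave(u, v, tol=1e-10):
    """Check log h((u + v) / 2) >= (log h(u) + log h(v)) / 2 up to rounding."""
    mid = [(a + b) / 2 for a, b in zip(u, v)]
    return log(h(mid)) >= (log(h(u)) + log(h(v))) / 2 - tol

rng = random.Random(0)
pairs = [([rng.uniform(-3, 3) for _ in range(2)],
          [rng.uniform(-3, 3) for _ in range(2)]) for _ in range(2000)]
violations = sum(not midpoint_logconcave(u, v) for u, v in pairs)
print("midpoint violations:", violations)   # 0, since h is logconcave
```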

There are, however, important cases where the functions $g_i(x, y)$ are neither concave nor quasiconcave. S. Kataoka [2] and C. van de Panne and W. Popp [3] considered the probabilistic constraint of the form
$$P(\xi_1 x_1 + \cdots + \xi_n x_n \ge b) \ge p,$$
where $\xi = (\xi_1, \dots, \xi_n)^T$ has a multivariate normal distribution and b is constant. The practical problems were investment and animal feed problems, respectively. If $\mu = E(\xi)$ and $C = E[(\xi - \mu)(\xi - \mu)^T]$, then the above probabilistic constraint is shown to be equivalent to
$$\mu^T x + \Phi^{-1}(1 - p)\sqrt{x^T C x} \ge b,$$
where $\Phi$ is the univariate standard normal probability distribution function. If $p > 1/2$, then $\Phi^{-1}(1 - p) < 0$, so the left-hand side is a concave function of x (because $\sqrt{x^T C x}$ is convex), and the set of x vectors satisfying the probabilistic constraint is convex.
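The equivalence and the convexity claim can be checked numerically. A minimal sketch, assuming hypothetical two-dimensional data (mu, C, b and p below are not from the text): since $\xi^T x$ is normal with mean $\mu^T x$ and variance $x^T C x$, the probability can be computed exactly and compared with the deterministic constraint on a grid of points, and midpoints of feasible grid points are checked for feasibility, as convexity requires when $p > 1/2$.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data for P(xi_1 x_1 + xi_2 x_2 >= b) >= p with xi ~ N(mu, C).
mu = [1.0, 2.0]
C = [[1.0, 0.3],
     [0.3, 2.0]]   # positive definite covariance matrix
b = 3.0
p = 0.9            # p > 1/2, so Phi^{-1}(1 - p) < 0

def mean_and_std(x):
    """xi^T x is normal with mean mu^T x and variance x^T C x."""
    mean = sum(m_j * x_j for m_j, x_j in zip(mu, x))
    var = sum(x[i] * C[i][j] * x[j] for i in range(2) for j in range(2))
    return mean, sqrt(var)

def probability(x):
    """Exact P(xi^T x >= b) for x != 0."""
    mean, std = mean_and_std(x)
    return 1.0 - NormalDist(mean, std).cdf(b)

def deterministic_lhs(x):
    """mu^T x + Phi^{-1}(1 - p) * sqrt(x^T C x), concave in x when p > 1/2."""
    mean, std = mean_and_std(x)
    return mean + NormalDist().inv_cdf(1.0 - p) * std

# Both forms of the constraint should agree away from the boundary.
grid = [(0.5 * i, 0.5 * j) for i in range(13) for j in range(13) if (i, j) != (0, 0)]
agree = all((probability(x) >= p) == (deterministic_lhs(x) >= b)
            for x in grid if abs(deterministic_lhs(x) - b) > 1e-6)

# Midpoints of feasible grid points stay feasible (convex feasible set).
feasible = [x for x in grid if deterministic_lhs(x) >= b]
convex_on_grid = all(deterministic_lhs(((u[0] + v[0]) / 2, (u[1] + v[1]) / 2)) >= b - 1e-9
                     for u in feasible for v in feasible)
print(agree, convex_on_grid, len(feasible))
```

Points within 1e-6 of the boundary are skipped only to keep floating-point rounding from deciding the comparison.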

Generalizations of this result have been given in [1,4]. Consider the joint probabilistic constraint
$$P(\xi_{i1} x_1 + \cdots + \xi_{in} x_n \ge b_i,\ i = 1, \dots, r) \ge p,$$
where we assume that all rn random variables $\xi_{ij}$ have a joint normal distribution. The statement is that if the covariance matrices of the rows $(\xi_{i1}, \dots, \xi_{in})$, $i = 1, \dots, r$, are constant multiples of a fixed covariance matrix $C_1$, or the covariance matrices of the columns $(\xi_{1j}, \dots, \xi_{rj})^T$, $j = 1, \dots, n$, are constant multiples of a fixed covariance matrix $C_2$, then the set of x vectors satisfying the probabilistic constraint is convex, provided that $p > 1/2$. For further convexity statements see [5].
Introduction


Management of a firm’s production-distribution sys- 
tem involves tactical and operational decisions along 
with strategic ones. The policies that enable a firm to 
meet its strategic goals comprise a collection of structural and infrastructural decisions, which typically involve long-term commitments with regard to the configuration and coordination of its supply chain. These structural decisions, also known as facility design decisions, include location, capacity, product range, and production or operating technologies at various facilities. The usual objective is minimizing costs or maximizing gross profits while leaving other manufacturing tasks (e.g., goals set for quality, flexibility, etc.) for prior or subsequent analysis. Mathematical models of this nature are generally called “production-distribution system design problems” (PDSDP).

Traditionally, location, capacity, technology, and product range decisions have been dealt with separately. Facility location models, in general, ignore the geographical differences in capacity and technology acquisition/operation costs, whereas capacity expansion models mostly deal with the temporal aspects of market demand and do not incorporate location decisions with regard to the establishment of new plants. The interactions between facility design decisions, however, can be significant, and PDSDP research takes its motivation from these interactions. The governments of many countries provide subsidies to support economic activities of specific sectors or of regions with high rates of unemployment. In taking advantage of the capital and/or employment subsidies, preferential tax rates, and free trade zones provided by governments, firms, especially multinationals, need to take the interdependencies between their location, capacity and technology decisions into account. These decisions are further complicated by the varying economies of scale and scope inherent in different technologies.

A production-distribution network provides an effective representation of the manufacturing and logistics activities of a firm and gives researchers a framework to study various systems. In a typical network, nodes represent suppliers, manufacturing facilities, distribution centers, warehouses and customers of the firm. The arcs on the network delineate the flow of items between nodes. An example network of five echelons is depicted in Fig. 1. Two main features determine the difficulty of a PDSDP: first, the number of echelons in the system and, second, the number of different types of configurational, most notably locational, decisions that need to be made.


Production-Distribution System Design Problem, Figure 1
A production-distribution network (echelons: Suppliers, Component Mfg., Final Assembly, Distribution, Markets)


Our objective is to provide an overview of the 
prevailing methodology for designing production- 
distribution systems. The remainder of this article is 
organized as follows: In the next section we give an 
overview and taxonomy of the prevailing PDSDP mod- 
els. Subsequently, an overview of PDSDP in practice is 
given. The paper concludes with our remarks. 


Prevailing Models 


Location decisions have been attracting researchers since the end of the nineteenth century. However, a rigorous methodology for production-distribution systems did not emerge until the 1960s, when two competing approaches appeared. For a warehouse location problem, Shycon and Maffei [71] propose a simulation model, claiming that a proper model should be descriptive, whereas Kuehn and Hamburger [46] favor prescriptive models and propose a mixed integer linear program (MILP) and a heuristic solution procedure. After these two pioneering works, MILP formulations received considerably more attention in both practice and theory.

It is evident that PDSDP refers to a family of problems. The purpose here is to provide an overview of the production-distribution system design literature, rather than an exhaustive review. The uncapacitated facility location problem (UFLP) is the simplest type of PDSDP, with a single commodity and a network of two echelons (i.e., facilities and customers), of which only a single echelon of nodes (i.e., facilities) is to be located. The following is the most popular formulation known in the literature:


$$\min \sum_{i \in I} f_i y_i + \sum_{i \in I} \sum_{j \in J} c_{ij} x_{ij} \tag{1}$$
$$\text{s.t.} \quad \sum_{i \in I} x_{ij} = 1, \quad j \in J, \tag{2}$$
$$x_{ij} \le y_i, \quad i \in I,\ j \in J, \tag{3}$$
$$x_{ij} \ge 0, \quad i \in I,\ j \in J, \tag{4}$$
$$y_i \in \{0, 1\}, \quad i \in I, \tag{5}$$

where

$x_{ij}$: proportion of customer j’s demand satisfied by facility i,

$y_i$: 1 if facility i is opened, 0 otherwise,

$c_{ij}$: the total production and distribution costs for supplying all of customer j’s demand from facility i,

$f_i$: fixed cost of opening facility i,

$I, J$: the sets of candidate facility sites and customers.


The objective is to minimize the sum of the fixed setup costs of opening facilities and the variable costs of serving the customers. Constraints (2) guarantee that each customer’s demand is satisfied, and constraints (3) ensure that only open plants can make shipments.
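Once y is fixed, the model (1) to (5) decouples by customer: each customer is best served entirely by its cheapest open facility. A brute-force sketch exploiting this, which enumerates all nonempty sets of open sites and is therefore usable only as a check on tiny instances; the instance data f and c are hypothetical.

```python
from itertools import combinations

# Hypothetical instance: f[i] fixed cost of facility i, c[i][j] cost of
# serving all of customer j's demand from facility i.
f = [4, 3, 5]
c = [[2, 5, 6, 3],
     [4, 1, 3, 6],
     [3, 4, 2, 2]]

def total_cost(open_sites):
    """Objective (1) for a fixed set of open facilities: with y fixed, the
    best x assigns every customer wholly to its cheapest open facility."""
    fixed = sum(f[i] for i in open_sites)
    variable = sum(min(c[i][j] for i in open_sites) for j in range(len(c[0])))
    return fixed + variable

def solve_uflp_by_enumeration():
    """Exact UFLP solution by enumerating all nonempty sets of open sites
    (2^|I| - 1 candidates, so only for tiny instances)."""
    best_cost, best_sites = float("inf"), None
    for k in range(1, len(f) + 1):
        for sites in combinations(range(len(f)), k):
            cost = total_cost(sites)
            if cost < best_cost:
                best_cost, best_sites = cost, sites
    return best_cost, best_sites

print(solve_uflp_by_enumeration())   # (16, (2,))
```

In this instance the optimum 16 is attained by several configurations; the enumeration reports the first one found.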

Beginning with Kuehn and Hamburger, most research has been directed towards devising more efficient solution procedures for UFLP. Efroymson and Ray [19] propose an LP-based branch-and-bound algorithm. However, in order to solve the arising LPs more efficiently and to minimize the memory requirements, they devised a more compact but weaker formulation of UFLP by replacing constraints (3) with the following set of constraints, which is equivalent for binary $y_i$:
$$\sum_{j \in J} x_{ij} \le n_i y_i, \quad i \in I, \tag{6}$$
where $n_i$ is the number of customers that facility i can serve. Later, Spielberg [75] proposed another branch-and-bound (implicit enumeration) algorithm using the same weaker formulation. The largest stride towards an efficient solution procedure came in the late 1970s. Bilde and Krarup [8] and Erlenkotter [22], independently of each other, took advantage of the tighter formulation and proposed one of the most remarkable solution procedures. Instead of solving the LPs at the nodes to optimality, they devised quick procedures to obtain good solutions to the duals of the LPs. Using these “good” dual solutions, they generated integer feasible primal solutions using complementary slackness conditions and a heuristic. Erlenkotter reports that most of the time the optimal solution is found after a single pass. If a gap is left, it is eliminated by branch-and-bound. After these works, this type of procedure (called dual-based branch-and-bound) has been repeatedly applied to other location problems.
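The dual-based idea can be sketched on the condensed dual of the UFLP LP relaxation, $\max \sum_j v_j$ subject to $\sum_j \max(0, v_j - c_{ij}) \le f_i$ for all i, whose feasible solutions give lower bounds. The following is a much simplified dual ascent in that spirit, not Erlenkotter's procedure itself (which also includes dual adjustment steps and a branch-and-bound phase); the instance, the ascent order and the greedy closing heuristic are illustrative assumptions.

```python
# Hypothetical instance: f[i] fixed costs, c[i][j] cost of serving customer j
# entirely from facility i (rows: facilities, columns: customers).
f = [4, 3, 5]
c = [[2, 5, 6, 3],
     [4, 1, 3, 6],
     [3, 4, 2, 2]]
m, n = len(f), len(c[0])

def dual_ascent():
    """Simplified dual ascent on the condensed dual of the LP relaxation:
    maximize sum_j v_j s.t. sum_j max(0, v_j - c[i][j]) <= f[i] for all i.
    Any feasible v gives the lower bound sum_j v_j on the UFLP optimum."""
    v = [min(c[i][j] for i in range(m)) for j in range(n)]
    slack = list(f)  # f[i] - sum_j max(0, v_j - c[i][j]); all terms are 0 initially
    improved = True
    while improved:
        improved = False
        for j in range(n):
            # Raise v_j at most to the next cost level in column j ...
            higher = [c[i][j] for i in range(m) if c[i][j] > v[j]]
            step = min(higher) - v[j] if higher else float("inf")
            # ... and at most by the slack of every facility already "reached".
            reached = [i for i in range(m) if c[i][j] <= v[j]]
            step = min([step] + [slack[i] for i in reached])
            if step > 0:
                for i in reached:
                    slack[i] -= step
                v[j] += step
                improved = True
    return v, slack

def total_cost(open_sites):
    """UFLP cost when every customer uses its cheapest open facility."""
    return (sum(f[i] for i in open_sites)
            + sum(min(c[i][j] for i in open_sites) for j in range(n)))

def primal_from_dual(slack, tol=1e-9):
    """Open the facilities with tight dual constraints (complementary
    slackness), then greedily close facilities while the cost decreases."""
    open_sites = {i for i in range(m) if slack[i] <= tol}
    cost = total_cost(open_sites)
    while len(open_sites) > 1:
        best = None
        for i in sorted(open_sites):
            trial = open_sites - {i}
            trial_cost = total_cost(trial)
            if trial_cost < cost:
                cost, best = trial_cost, trial
        if best is None:
            break
        open_sites = best
    return open_sites, cost

v, slack = dual_ascent()
lower = sum(v)
sites, upper = primal_from_dual(slack)
print(lower, upper, sorted(sites))   # 16 16 [1, 2]
```

When the lower bound equals the cost of the primal solution, as in this hypothetical instance, optimality is proven without branching; otherwise the remaining gap would have to be closed by branch-and-bound.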

The normative work on UFLP has been extended in 
several ways so as to analyze more realistic production- 
distribution systems. One of the earliest attempts was 
to incorporate the limited availability of land and other 
production factors at the alternative sites. The formulation of the capacitated version is that of UFLP with capacity constraints appended. Using identical formulations, Akinc and Khumawala [2] and Nauss [55] propose linear programming and Lagrangean relaxation based branch-and-bound solution techniques, respectively. On the other hand, Geoffrion and McBride [28] solve a generalization of the capacitated model in which there are lower as well as upper bounds and arbitrary constraints over the structural variables ($y_i$).

Multicommodity models have also been widely 
studied in the literature (e.g., Warszawski [88], Neebe
and Khumawala [56], and Karkazis and Boffey [41]). 
The problem is an immediate generalization of UFLP 
where there are multiple products. Even though they 
offered different formulations, all of them assumed that 
only one product can be assigned to a facility. This as- 
sumption is later relaxed in Klincewicz and Luss [43]. 
Quite a few researchers have focused on increasing the 
number of echelons to be located. This line of research 
constitutes a major stride towards the development 
of analytical models that are capable of representing 
PDSDPs of the size managers typically encounter in 
real life. Kaufman, Eede, and Hansen [42] and Tcha 
and Lee [76] are among the earlier works. The former deals with the location of plants and warehouses to minimize the total of the respective fixed costs and the production and distribution costs. Tcha and Lee, however, deal with an arbitrary number of echelons. Both models ignore any cost that might arise from the interaction between plants and warehouses. Later, Gao and Robinson [26] propose an efficient dual-based branch-and-bound algorithm, and Barros and Labbe [6] present a profit maximization version of the same problem. They proposed various heuristics and an exact Lagrangian relaxation-based branch-and-bound algorithm.

Thus far, works on UFLP and some of its immediate generalizations have been reviewed. Arguably one of the most influential models in the PDSDP literature was proposed and solved by Geoffrion and Graves [27]. Their model consists of three echelons: plants, warehouses, and customers. The objective is to minimize the total costs of production, transportation, and warehousing plus the total fixed costs of opening the warehouses. The capacity restrictions on DCs can be both from above and from below, which enables modeling piecewise linear concave DC throughput costs. Furthermore, each customer is to be supplied from one DC, and there may be certain restrictions on the configurational decisions. They proposed a solution procedure based on Benders’ decomposition.

The most important contribution of Geoffrion and Graves [27] is the paradigm change triggered by this paper. In multi-echelon models, the usual practice was to represent the flow between neighboring echelons by different sets of variables and to impose flow conservation at the nodes. Geoffrion and Graves [27] represent the flows on the network by a single set of variables from nodes in the first echelon to the nodes in the last (i.e., using more indices). With this modeling technique the number of variables grows considerably with the size, but the formulation becomes tighter, which can be useful in devising more efficient algorithms.
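The growth in the number of variables can be made concrete with a back-of-the-envelope count for a single commodity: arc-based flows need one variable per pair of nodes in neighboring echelons, whereas flows defined from the first to the last echelon need one variable per path through all echelons. The echelon sizes below are hypothetical, and real models (multiple commodities, single-sourcing variables) have further index dimensions.

```python
from math import prod

def arc_variable_count(sizes):
    """One flow variable per arc between neighboring echelons."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

def path_variable_count(sizes):
    """One flow variable per first-to-last-echelon path."""
    return prod(sizes)

# Hypothetical single-commodity network: 5 plants, 20 warehouses, 100 customers.
sizes = [5, 20, 100]
print(arc_variable_count(sizes), path_variable_count(sizes))   # 2100 10000
```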

Later, Moon [54] extended the model and solu- 
tion procedure considering economies of scale in DC 
throughput costs. In this case the DC throughput costs 
are represented by general concave cost functions. 
Pirkul and Jayaraman [59] propose a similar model that 
differs in the following aspects: 

i) The opening decisions are not only for ware- 
houses but also for plants, 

ii) The capacity limitations are only from above, and 

iii) There are upper bounds on the number of plants and warehouses that can be opened.

They devise a Lagrangean relaxation based heuristic solution procedure. In a subsequent paper [60], they also include supplier selection. Elhedli and Goffin [20] present one of the most recent advances in this area.

In designing a production-distribution system, it is crucial to optimize the configurational decisions simultaneously, because a sequential approach is bound to produce suboptimal results, especially when interactions between these decisions and economies of scope and scale are present. One of the earliest attempts to develop a model that considers scale economies is due to Soland [73]. His model is a simple extension of UFLP where fixed facility costs are replaced by a concave function of the size of the facility. Later, Holmberg [37] and Holmberg and Ling [38] propose a capacitated facility location problem where the capacity acquisition cost is an arbitrary piecewise linear function. Verter and Dincer [85] propose an alternative model where capacity costs are assumed to be general concave functions of total acquisition. More recently, Dasci and Verter [17] and Verter and Dasci [84] extend these models to a multi-product environment and to the selection of various dedicated and flexible technologies that display different forms of scale and scope economies. A number of authors propose models that integrate inventory control and logistics decisions into a PDSDP framework. The inventory related costs also display scale economies and therefore necessitate concave or nonlinear cost modeling, as in the aforementioned works. Shen [69] presents a unifying work on this issue. Sneyder et al. [77] and Sourirajan et al. [74] present more recent location models that consider logistics related costs.

Since the mid-eighties, the world has witnessed the emergence of global manufacturing firms, which diversify their operations to different countries. Globalization provides the firm with many advantages, such as access to cheap labor, raw material and other production factors, presence at regional markets, and access to locally available technological resources and know-how. The arising supply chain structures, however, are usually more challenging from the perspective of the manager. The strategic management of international production-distribution networks is further complicated by price and exchange rate uncertainties. In a series of papers, Hodder and Jucker [35,36] and Hodder and Dincer [34] propose scenario based approaches to model these uncertainties. Their models maximize expected profit less a constant portion of the variability of the profit, which is an appropriate way to represent a risk-averse decision-maker. Later, Gutierrez and Kouvelis [31] also used a scenario-based approach to find robust solutions under all possible scenario realizations. More recently, Canel and Khumawala [11] and Kouvelis et al. [45] proposed models that include subsidies and tariffs. The prevailing studies in this field suggest that the scenario-based approach is a popular way of modeling the various types of uncertainties that characterize the international environment. The number of possible scenarios, however, proliferates quickly as the problem size increases, making not only the evaluation of the scenarios but also their generation a formidable task.

As we have seen, there are families of PDSDPs involving capacity limitations on sites, multiple products, multiple echelons, or time-phased system design. Almost all of these models are formulated as mixed integer programs and are usually solved by branch-and-bound. In general, Lagrangean relaxation, linear programming (LP) relaxation, and dual-based procedures (see, for example, Erlenkotter [22]) are the most common lower bounding techniques. While dual-based procedures are usually shown to be computationally more efficient than the former two, their efficiency depends heavily on the special structure of the problem. On the other hand, Lagrangean and LP relaxation techniques can essentially be used for any formulation. Starting with Geoffrion and Graves [27], Benders’ decomposition, which is able to handle arbitrary side constraints on structural variables, has been used a few times. Nevertheless, like dynamic programming, it remains among the less popular solution methods. From the perspective of heuristic solution techniques, Lagrangean relaxation based and pairwise improvement type procedures are the most commonly used techniques.

The match between a formulation and a solution procedure is certainly the defining factor in the efficiency of a solution method, but tighter formulations usually lead to more efficient solution procedures. For example, in the UFLP, constraints (2) and (6) are interchangeable, but the former provides a tighter formulation, which eventually led to very efficient dual-based solution procedures. Similarly, Geoffrion and Graves [27] pioneered defining the flows on a network by a single set of variables running from the nodes in the first echelon to the nodes in the last. Although this formulation causes the number of decision variables to grow, it leads to a tighter formulation.
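The effect of a tighter formulation can be illustrated on a tiny, hypothetical UFLP instance. Constraints (2) and (6) are not reproduced in this section, so the sketch assumes the usual pair: the disaggregated form x_ij ≤ y_j and the aggregated form Σ_i x_ij ≤ |I| y_j. Both forms admit the same integer solutions. However, the fractional point below, with both facilities half open, satisfies only the aggregated form and costs half the integer optimum.

```python
from itertools import product

# Hypothetical instance: 2 candidate facilities, 2 customers.
f = [10.0, 10.0]           # f[j]: fixed cost of opening facility j
c = [[0.0, 100.0],         # c[i][j]: cost of serving customer i from facility j
     [100.0, 0.0]]
I, J = range(len(c)), range(len(f))

def integer_optimum() -> float:
    """Enumerate every nonempty set of open facilities; each customer uses its cheapest open one."""
    best = float("inf")
    for is_open in product([False, True], repeat=len(f)):
        if not any(is_open):
            continue
        total = sum(f[j] for j in J if is_open[j])
        total += sum(min(c[i][j] for j in J if is_open[j]) for i in I)
        best = min(best, total)
    return best

def cost(x, y) -> float:
    return sum(f[j] * y[j] for j in J) + sum(c[i][j] * x[i][j] for i in I for j in J)

def satisfies_aggregated(x, y, tol=1e-9) -> bool:
    # sum_i x_ij <= |I| * y_j for every facility j
    return all(sum(x[i][j] for i in I) <= len(I) * y[j] + tol for j in J)

def satisfies_disaggregated(x, y, tol=1e-9) -> bool:
    # x_ij <= y_j for every customer-facility pair
    return all(x[i][j] <= y[j] + tol for i in I for j in J)

# Fractional point: every customer fully served by its nearest site, both sites half open.
x_frac = [[1.0, 0.0], [0.0, 1.0]]
y_frac = [0.5, 0.5]
print(integer_optimum(), cost(x_frac, y_frac),
      satisfies_aggregated(x_frac, y_frac), satisfies_disaggregated(x_frac, y_frac))
```

Solving the two LP relaxations of this instance by hand gives 10 with the aggregated constraints. With the disaggregated constraints it gives 20, the integer optimum, so the tighter relaxation already closes the gap.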

While we try to give an overview of the prevailing PDSDP models and solution methodologies, space prohibits us from mentioning tens if not hundreds of other works. Therefore, in Table 1 we have classified the prevailing analytical models according to the PDSDP features that differentiate them. This table gives a quick snapshot of the PDSDP literature over the last 40 years. The classification is intended to reveal the strengths and weaknesses of the existing methodology, as well as the major trends in the literature. Analytical approaches have been such a focus of research that even this table is far from exhaustive. We therefore refer the interested reader to the following review and critique papers: Aikens [1], Verter and Dincer [86], Vidal and Goetschalckx [87], Goetschalckx et al. [30], Klose and Drexl [44], Meixell and Gargeya [51], Snyder [72], Sahin and Sural [64], and Shen [70].


PDSDP in Practice 


In this section, we provide an overview of PDSDP applications in practice. In their seminal paper, Geoffrion and Graves [27] not only provided one of the most influential models of the PDSDP but also reported what is arguably its first industrial implementation. They assisted Hunt-Wesson Foods, Inc. in relocating its distribution centers (DCs). The firm had been producing several hundred commodities at 14 sites and distributing them through a dozen DCs. Geoffrion and Graves recommended five changes in the configuration of DCs as well as the establishment of a new DC.

After this work, PDSDP implementations started to appear in the literature. Arguably the most comprehensive implementation is presented by Arntzen et al. [4]. This paper reports on the development and use of Global Supply Chain Management (GSCM), a large-scale optimization model that contributed to the reorganization of the Digital Equipment Corporation (DEC) in the late 1980s and early 1990s. Although its development started as a small project, GSCM became an essential tool for examining all aspects of supply chain management at DEC. GSCM is
arguably the most comprehensive system of its kind. It considers production, inventory, material handling, taxes, facility fixed charges, production line fixed charges, transportation costs, duty costs, duty drawbacks, and duty avoidance. In an effort to establish the optimal supply chain structure, GSCM reduced the number of plants from 31 to 12, which enabled the major customer regions (America, Europe, and Pacific Rim) to become self-contained. Estimated savings as of 1995 were $1.2 billion. GSCM was also used to determine the optimal network structure for the distribution of spare parts and the collection of defective items, including the locations and capacities of repair centers. Total savings of this project were estimated to be $200 million.

Production-Distribution System Design Problem, Table 1
A taxonomy of analytical approaches for production-distribution system design problems
Classification columns: number of echelons in PDS; number of structural decisions; objective function (C/P/M)
Kuehn and Hamburger [46]
Efroymson and Ray [19]
Spielberg [75]
Elson [21]
Warszawski [88]
Geoffrion and Graves [27]
Wesolowski and Truscott [89]
Balachandran and Jain [5]
Akinc and Khumawala [2]
LeBlanc [47]
Kaufman et al. [42]
Erlenkotter [22]
Geoffrion and McBride [28]
Nauss [55]
Karkazis and Boffey [41]
Neebe and Khumawala [56]
Van Roy and Gelders [83]
Van Roy and Erlenkotter [82]
Tcha and Lee [76]
Hodder and Jucker [36]
Hodder and Jucker [35]
Hodder and Dincer [34]
Klincewicz and Luss [43]
Moon [54]
Gao and Robinson [26]
Holmberg [35]
Robinson et al. [63]
Barros and Labbe [6]
Gutierrez and Kouvelis [31]
Verter and Dincer [86]
Pirkul and Jayaraman [59]
Holmberg and Ling [38]
Pirkul and Jayaraman [60]
Lim and Kim [49]
Dogan and Goetschalckx [18]
Melachrinoudis and Min [52]
Hinojosa et al. [33]
Dasci and Verter [17]
Tsiakis et al. [78]
Jayaraman and Pirkul [40]
Snyder et al. [60]
Sourirajan et al. [57]

A set of published industrial applications is summarized in Table 2. Although such works appear at a steady pace, the literature on PDSDP applications is quite sparse. There are a few possible explanations. First, (re)configuring a firm's supply chain is a long-term process that requires a genuine commitment from top management. In many cases, the chances of successfully adopting an analytical approach to (re)designing the supply chain are severely diminished by a lack of support from top management. Second, we are aware of many firms that choose not to report their experiences in production-distribution system design simply because these decisions are strategic. Finally, the level of many industrial projects might not justify publication in academic journals.

Most of the models developed in these projects differ from each other and from the models previously published in the literature. This might be taken as a negative sign for the applicability of generic models. However, several commercial software packages offer built-in functions and off-the-shelf generalized programs. The companies developing these packages report hundreds of firms on their client rosters.

One such software package is the Strategic Logistics Integrative Modeling System (SLIM), an advanced decision support system used to evaluate strategic production and distribution planning problems. SLIM was developed by Prof. B. Shapiro of MIT and his associates, who have collaborated with over 100 companies in the US, Canada, Europe, and Australia [68]. Another package, the Strategic Analysis of Integrated Logistics System (SAILS), was developed by A. Geoffrion and his colleagues at Insight, Inc. It integrates extensive data management functions with optimization tools. Geoffrion and Powers [29] report that 50 or more studies have been carried out with SAILS or one of its earlier versions. These projects resulted in 5-15% reductions in total transportation costs.

Production-Distribution System Design Problem, Table 2
A Sample of Published Industrial Applications
Published paper | Company | Industry | Facilities | Region | Estimated savings
Geoffrion and Graves [27] | Hunt-Wesson | | 14 plants, 12 DCs | USA | $1-5 million/year (9-25% of dist. cost)
Van Roy and Gelders [83] | N.V. ESSO | | 1 plant, 7 depots | Belgium | N/A
Breitman and Lucas [10] | GM | | | Multinational | $1 billion total
Cohen and Lee [13] | Apple | | | Multinational | N/A
Martin et al. [50] | Libbey-Owens-Ford | | 4 plants | USA | $2 million/year
Robinson et al. [63] | Dow Consumer Products | Home- and Food-care | 13 CDCs, 23 RDCs | USA | $1.5 million/year
Pooley [61] | Ault Foods Limited | Dairy | 4 plants, 12 depots | Canada | $3 million/year
Arntzen et al. [4] | DEC | | 33 plants | | $1.5 billion total
Feigin et al. [23] | IBM Europe | | 1 plant, 13 DCs | | $40 million/year
Sankaran and Raghavan [65] | S. Shakti LPG Limited | LPG | 2 ports, 5 plants | India | $1 million/year
Koksalan and Sural [64] | Efes Group | | | |
Sery et al. [67] | BASF Corporation | | | |
Wouda et al. [90] | Nutricia Dairy and Drinks | | | |
Tyagi et al. [79] | GE Plastics | | | |
LeBlanc et al. [48] | Nu-kote | | | |
Ulstein et al. [80] | Elkem | | | |
Fleischmann et al. [24] | BMW | | | |
DC: Distribution Center, CDC: Central Distribution Center, RDC: Regional Distribution Center

Generic models have the advantage of shorter development times and lower costs, whereas custom-made models can incorporate various firm-specific practices. Whether generic or custom-made, the studies above show that considerable savings are possible from (re)designing production-distribution systems through an analytical model.


Concluding Remarks 


When the first edition of the encyclopedia appeared, we stressed that it was crucial to develop analytical models that could effectively represent supply chains of the sizes typically encountered in real life. Recent advances in PDSDP research and in computing have enabled researchers to analyze more complex systems and thus achieve this objective. A quick glance at Table 1 shows that the advances made in the last 10 years are far greater than those made earlier. The existing PDSDP methodology, however, still needs to be improved. We see two general directions for improvement: integrating strategic and tactical decisions, and including international features in PDSDPs.

It is well known that structural decisions restrict how firms subsequently make their tactical decisions. Therefore, integrating these decisions in a PDSDP framework should help firms improve their profits. Although a few works that integrate these decisions have recently appeared in the literature [74,77], gaps remain.
While issues such as globalization, outsourcing, and international plant location have been around for quite some time, the literature on them is still scant. Among the 56 works that we have classified in Table 1, only six papers explicitly consider international features such as subsidies, tariffs, and exchange rate uncertainty. There has also been skepticism about the impact of certain aspects of globalization on firms' configuration decisions. For example, one survey of the plant managers of 73 large multinational firms ranks subsidies and free trade zones among the least important factors. However, a number of studies, mostly conducted at the US National Bureau of Economic Research, report that both international and domestic firms have been quite responsive to subsidies, free trade zones, taxes, and labor costs when deciding their plant locations. For example, Head, Ries, and Swenson [32] investigated the location patterns of Japanese manufacturing establishments and concluded that these firms have been quite responsive to trade zones. Similarly, Coughlin and Segev [15], in their study of foreign investment in the US, found that higher average labor density (as a surrogate measure of the average wage rate) and higher taxes in a state deter foreign investment. Finally, Fuest and Huber [25] and Bergman, Fuss, and Regev [7] provide anecdotal and empirical evidence of how subsidies alter the location choices of domestic firms in Eastern Germany and Israel, respectively.

We can conclude that the interactions between fa- 
cility design decisions are especially important for in- 
ternational companies. They face not only a multitude 
of location and technology choices but also a com- 
plex array of cost structures due to regional differ- 
ences as well as governments’ tax, subsidy, and free 
trade policies. These decisions are further complicated 
by manufacturing strategy options and scale and scope 
economies. Therefore, it is important that analytical 
approaches for PDSDP incorporate demand and ex- 
change rate uncertainties as well as the other distin- 
guishing features of the international environment. 
Abstract


We briefly review the development of generalized-ensemble optimization techniques and their applications since “Protein Folding: Generalized-Ensemble Algorithms” was published in the first edition of this book.


Keywords 


Energy landscape sampling; Model hopping; Parallel 
tempering 


“Protein Folding: Generalized-Ensemble Algorithms” was submitted to the editors in February 1999 as a contribution to the “Encyclopedia of Optimization.” The article has remained useful as a short and concise introduction. However, the remarkable success over the last 8 years in developing generalized-ensemble techniques and applying them to optimization problems warrants some comments. New generalized-ensemble algorithms have been developed, and the simulation of small proteins (of order ~50 residues) has become feasible.

One example of the recent algorithmic developments is energy landscape paving (ELP) [5]. Like all successful stochastic optimization techniques, it aims to avoid entrapment in local minima and to continue the search for further solutions. For this purpose, ELP performs low-temperature Monte Carlo simulations with an effective energy:

$w(\tilde{E}) = e^{-\tilde{E}/T}$ with $\tilde{E} = E + f(H(q,t))$. (1)


Here, T is a (low) temperature, and f(H(q,t)) is a function of the histogram H(q,t) in a prechosen “order parameter” q. The weight of a local minimum state decreases with the time the system stays in that state. In other words, ELP deforms the energy landscape locally until the local minimum is no longer favored, and the system then explores higher energies. It either falls into a new local minimum or walks through this high-energy region. The walk continues until the corresponding histogram entries all have similar frequencies and the system again has a bias toward low energies. Note that for f(H(q,t)) = f(H(q)), ELP reduces to the older generalized-ensemble methods. We have evaluated the efficiency of ELP in simulations of the 20-residue trp-cage protein, whose structure we could “predict” within a root-mean-square deviation (rmsd) of 1 Å [8].

Note also that ELP even allows zero-temperature simulations [8]. For T → 0, only moves with $\Delta\tilde{E} < 0$ are accepted. If we choose $\tilde{E} = E + cH(q,t)$, we obtain the acceptance criterion

$\Delta E + c\,\Delta H(q,t) < 0 \iff c\,\Delta H(q,t) < -\Delta E$, (2)

where E is the physical energy. Hence, within ELP the system can overcome any energy barrier even at T = 0. The waiting time for such a move is proportional to the height of the barrier that needs to be crossed. The factor c now sets only the time scale, and in this sense the T = 0 form of ELP is parameter free.
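A minimal one-dimensional sketch of ELP follows. It rests on several assumptions not taken from the source. The energy is a toy double well, E(x) = (x² − 1)², with minima at x = ±1 and a barrier of height 1 at x = 0. The order parameter is q = x, binned in steps of 0.05. The histogram term uses the linear choice f(H) = cH from Eq. (2). At T = 0 the code applies the acceptance rule of Eq. (2); for T > 0 it uses the Metropolis rule with the weight of Eq. (1). The function `elp_run` is hypothetical and returns the largest x visited.

```python
import math
import random

def energy(x: float) -> float:
    """Toy double well (assumption): minima at x = -1 and x = +1, barrier height 1 at x = 0."""
    return (x * x - 1.0) ** 2

def elp_run(steps: int, c: float, T: float = 0.0, x0: float = -1.0, max_step: float = 0.1,
            lo: float = -2.0, hi: float = 2.0, bin_width: float = 0.05, seed: int = 0) -> float:
    """Energy landscape paving with E~ = E + c*H(q, t) and order parameter q = x.

    Returns the largest x visited, so crossing into the right-hand well shows up as a value > 0.
    """
    rng = random.Random(seed)
    nbins = int(round((hi - lo) / bin_width))
    hist = [0] * nbins                         # H(q, t): visits per bin of q

    def bin_of(x: float) -> int:
        return min(int((x - lo) / bin_width), nbins - 1)

    x, x_max = x0, x0
    for _ in range(steps):
        hist[bin_of(x)] += 1                   # time spent in a state raises its effective energy
        x_new = x + rng.uniform(-max_step, max_step)
        if lo <= x_new < hi:
            # Change of effective energy, dE + c*dH: the left-hand side of Eq. (2)
            d = (energy(x_new) + c * hist[bin_of(x_new)]) - (energy(x) + c * hist[bin_of(x)])
            if T == 0.0:
                accept = d < 0.0               # zero-temperature rule, Eq. (2)
            else:
                accept = d <= 0.0 or rng.random() < math.exp(-d / T)  # Metropolis with Eq. (1)
            if accept:
                x = x_new
        x_max = max(x_max, x)
    return x_max
```

Consider a run started in the left well at T = 0. With c = 0 it never accepts a move, because every proposal raises E. With c > 0, the histogram term gradually fills the left well until the walker spills over the barrier into the right well.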

Today, the most popular generalized-ensemble technique in protein science is parallel tempering (also known as replica exchange) [1,3]. In its most common form, one considers N noninteracting copies of the molecule, each at a different temperature $T_i$ (inverse temperature $\beta_i$). In addition to standard Monte Carlo [3] or molecular dynamics moves [3,9] that affect only one copy, parallel tempering introduces a new global update [1]: the exchange of conformations between two copies i and j = i + 1 with probability

$w(C^{\mathrm{old}} \to C^{\mathrm{new}}) = \min\bigl(1, \exp(-\beta_i E(C_j) - \beta_j E(C_i) + \beta_i E(C_i) + \beta_j E(C_j))\bigr)$. (3)


This exchange of conformations leads to a faster con- 
vergence than is observed in regular low-temperature 
canonical simulations. Note that parallel tempering 
does not require Boltzmann weights but can be com- 
bined easily with other generalized-ensemble tech- 
niques [3]. 
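The exchange step of Eq. (3) is short enough to sketch directly. The Python below is a minimal illustration, not the authors' code. The names `exchange_probability` and `attempt_swaps` and the user-supplied `energy` callable are assumptions, and the per-copy Monte Carlo or molecular dynamics moves that run between exchange attempts are omitted. Capping the exponent at zero computes min(1, exp(x)) without overflow.

```python
import math
import random
from typing import Callable, List, Sequence

def exchange_probability(beta_i: float, beta_j: float, e_i: float, e_j: float) -> float:
    """Eq. (3) for copies at inverse temperatures beta_i, beta_j with e_i = E(C_i), e_j = E(C_j)."""
    exponent = -beta_i * e_j - beta_j * e_i + beta_i * e_i + beta_j * e_j
    # min(1, exp(x)) == exp(min(0, x)); capping avoids overflow for large positive x.
    return math.exp(min(0.0, exponent))

def attempt_swaps(betas: Sequence[float], confs: List, energy: Callable,
                  rng: random.Random) -> List:
    """Try to exchange the conformations of neighbouring copies i and i + 1 (in place)."""
    for i in range(len(betas) - 1):
        p = exchange_probability(betas[i], betas[i + 1], energy(confs[i]), energy(confs[i + 1]))
        if rng.random() < p:
            confs[i], confs[i + 1] = confs[i + 1], confs[i]
    return confs
```

Suppose the colder copy (β_i = 1) holds a conformation with E = −10 and the hotter copy (β_j = 0.5) one with E = −12. The exponent of Eq. (3) is then 1, so the swap is always accepted and the lower-energy conformation moves to the lower temperature.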

A variant of parallel tempering is “model hopping” [6]. Here, sampling of low-energy configurations is enhanced by performing a random walk through an ensemble of systems with slightly altered energy functions. In this way, information is exchanged between varying stages of coarse graining or different local environments. We assume that the energy function can be divided into two terms: $E = E_A + a E_B$. As in parallel tempering, “model hopping” considers N noninteracting copies of the molecule, but adjacent copies are now exchanged with probability


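$w(C^{\mathrm{old}} \to C^{\mathrm{new}}) = \min\bigl(1, \exp\{-\beta[E_A(C_j) + a_i E_B(C_j) + E_A(C_i) + a_j E_B(C_i) - E_A(C_i) - a_i E_B(C_i) - E_A(C_j) - a_j E_B(C_j)]\}\bigr) = \min\bigl(1, \exp(\beta\,\Delta a\,\Delta E_B)\bigr)$. (4)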


Here, $\Delta a = a_j - a_i$ and $\Delta E_B = E_B(C_j) - E_B(C_i)$. Configurations perform a random walk on a ladder of models with $a_1 = 1 > a_2 > a_3 > \ldots > a_N$, which differ in the relative contribution of $E_B$ to the total energy E of the molecule. For instance, barriers in the energy landscape of proteins often arise from van der Waals repulsion between atoms that come too close. Hence, we have considered an implementation of “model hopping” with successively smaller contributions from the van der Waals energy. The “physical” system is on one side of the ladder (at $a_1 = 1$). The (nonphysical) model on the other end of the ladder (at $a_N < 1$) may allow atoms to share the same position in space. As the protein “tunnels” through energy barriers, sampling of low-energy configurations is enhanced in the “physical” model (at $a_1 = 1$). With this realization of “model hopping” we could “predict” the structure of the 46-residue protein A in an all-atom simulation within a root-mean-square deviation (rmsd) of 3.2 Å [6].
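The exchange rule of Eq. (4) can be checked numerically. The sketch below is illustrative only. It evaluates the full bracket of Eq. (4) for two copies with ladder parameters a_i and a_j at a common inverse temperature β, and compares it with the simplified form exp(β Δa ΔE_B). The function names and sample energies are hypothetical. Because the E_A terms cancel, only the scaled part E_B (for example, the van der Waals energy) influences the swap.

```python
import math

def hop_probability(beta: float, a_i: float, a_j: float, eb_i: float, eb_j: float,
                    ea_i: float = 0.0, ea_j: float = 0.0) -> float:
    """Eq. (4): probability of swapping C_i (in model a_i) and C_j (in model a_j).

    ea_* = E_A(C_*), eb_* = E_B(C_*); a conformation C in model a has energy E_A(C) + a*E_B(C).
    """
    new = (ea_j + a_i * eb_j) + (ea_i + a_j * eb_i)   # C_j in model a_i, C_i in model a_j
    old = (ea_i + a_i * eb_i) + (ea_j + a_j * eb_j)   # current assignment
    return math.exp(min(0.0, -beta * (new - old)))     # min(1, exp(...)) without overflow

def hop_probability_short(beta: float, a_i: float, a_j: float, eb_i: float, eb_j: float) -> float:
    """Same quantity via the simplified form min(1, exp(beta * da * dE_B))."""
    da, deb = a_j - a_i, eb_j - eb_i
    return math.exp(min(0.0, beta * da * deb))
```

Take β = 1, a_i = 1 (the physical model), and a_j = 0.5. Moving a conformation with E_B = 5 into the physical model, in exchange for one with E_B = 2, is accepted with probability exp(−1.5) ≈ 0.22. The reverse swap is always accepted.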

Model hopping also allows guiding a simulation by 
information obtained from homologous structures [2]. 
Usually, such spatial constraints introduce an addi- 
tional roughness into the energy landscape and there- 
fore often lead to extremely slow convergence of the 
simulation. This problem is circumvented in our ap- 
proach through a random walk in an ensemble of repli- 
cas that differ by the strength with which the constraints 
are coupled to the system. We have demonstrated the 
usefulness of this approach on some examples of the 
CASP6 competition [2]. 

Generalized-ensemble techniques, as discussed in “Protein Folding: Generalized-Ensemble Algorithms” and in this addendum, now allow the in silico study of small proteins (built out of ~50 amino acids) using all-atom models. Examples include the 34-residue human parathyroid hormone fragment PTH(1-34) [4], the 28-residue FSD, the 36-residue villin headpiece subdomain [5,7,10], and fragment B of protein A (46 residues) [6]. Current work aims to extend these methods to all-atom simulations of proteins built out of 50 to 100 residues.
Introduction


One of the most demanding problems in computational 
biology is the so-called ab initio protein structure pre- 
diction problem. In ab initio prediction the objective is 
to predict protein structure solely from physically based 
force fields that describe the interactions between the 
amino acids and interactions between amino acids and 
the protein’s environment. The ab initio protein structure prediction problem can be cast as an optimization problem in which the conformation of minimum free energy is sought, because this conformation corresponds to the most stable structure of the protein.

In this contribution the problem of protein loop structure prediction is addressed. By the term loops we refer to any amino acid subsequence within a protein that is not of the geometrically regular type of an α-helix or a β-strand. The importance of loops for the overall three-dimensional structure and function of proteins has been pointed out before (see, e.g., Fiser et al. [5]). Even though loops are short amino acid subsequences, they are of major importance to the overall structure. Loops are often exposed to the surface and contribute to active and binding sites. Without loops, many proteins could not fold into compact structures.

Unfortunately, loop structure is considerably more difficult to predict than the geometrically regular β-strands and α-helices. Loops possess greater structural flexibility than strands and helices, and they have relatively few contacts with the remainder of the structure.

Methods for loop structure prediction have been in- 
vestigated for at least two decades [2]. Recent progress 
in loop structure prediction has been achieved with ap- 
proaches that combine dihedral angle sampling, steric 
clash detection, clustering, and scoring or energy func- 
tion evaluation to build up ensembles of loop confor- 
mations. 


In the so-called loop reconstruction problem, the anchor geometry of the protein into which the loop must fit is assumed to be known. Here we address the problem of loop structure prediction when no information on the anchor geometry is available. More precisely, our methodology assumes that the secondary structure of the stem residues is known but treats the geometry of the protein into which the loop must fit as unknown. This loop structure prediction with flexible stems is more difficult than the loop reconstruction problem.

Ultimately, the optimization-based method for loop structure prediction summarized here will be embedded in an existing ab initio protein structure prediction method [10,11,12,13,14,15,16,17,18,19].


Method and Applications 


This section first introduces a new methodology for 
loop structure prediction for loops with flexible stems. 
Subsequently, results are summarized that have been 
obtained for a large test set of 3215 loops of known 
structure. 


Method 


The loop prediction method proceeds along the fol- 
lowing steps: (i) generating conformers by high-reso- 
lution dihedral angle sampling, (ii) structure optimiza- 
tion with a physically based energy function, (iii) itera- 
tive clustering of ensembles to discard conformers that 
are likely to have a large rmsd with respect to the native 
loop structure, and (iv) strategies for selecting optimal 
loops from the ensemble that remains after step (iii). 
These steps are briefly described. For details we refer 
to [20]. 

The description of the geometry of a loop consists of two elements. The geometry of the backbone is described in terms of the dihedral angles φ and ψ. Correspondingly, the side chain of each amino acid is described by side chain dihedral angles, whose number depends on the type of amino acid. Step (i) generates a large number of candidate backbone conformations for a loop of n amino acids. It does so by sampling n − 1 dihedral angle pairs (φ_i, ψ_i), i = 1, …, n − 1, from appropriate probability distributions. Conformers are generated with probability functions in a discretized (φ, ψ)-space [20], which is divided into 72² angular bins of size 5° × 5°. Functions for the probability of finding a (φ, ψ) pair in any of these bins are derived for each of the 20 amino acids in each of three types of secondary structure, resulting in 60 probability functions. The three types considered are α-helical, β-strand, and loop. More precisely, helical and strand-type amino acids are defined by the DSSP classification codes H and E [9]. To qualify as a loop, an amino acid sequence must not be at either terminus of the protein and must be located between strands or helices. Probability functions are derived by counting occurrences of (φ, ψ)-pairs for each amino acid in each bin over a reference set of known protein structures. The reference set consists of all proteins in the PdbSelect25 set (http://www.cmbi.kun.nl/gv/dssp/) [9] with an experimental resolution of 2.2 Å or better. This yields 939 reference proteins. Using PdbSelect25 also avoids overrepresenting angular bins through the inclusion of structures that are too similar. Similar dihedral angle bin sets have been used successfully before [3,4]. Compact and computationally efficient representations of the dihedral angle bin sets, and of their sampling for conformer generation, exist; we refer to [20] for details.
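The bin-based sampling of step (i) can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation of [20]. Angles are in degrees on [−180°, 180°). A table stores counts only for the occupied 5° × 5° bins of one amino acid and secondary-structure type. A sampled angle is placed uniformly inside the chosen bin. The function names and the table key `"ALA_H"` used in the test are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Tuple

BIN_DEG = 5.0
N_BINS = int(360 / BIN_DEG)        # 72 bins per angle, 72 * 72 cells in (phi, psi)-space

def bin_index(phi: float, psi: float) -> Tuple[int, int]:
    """Map angles in [-180, 180) degrees to the indices of their 5-by-5-degree bin."""
    return (int((phi + 180.0) // BIN_DEG) % N_BINS,
            int((psi + 180.0) // BIN_DEG) % N_BINS)

def build_table(reference_pairs: Iterable[Tuple[float, float]]):
    """Count reference (phi, psi) pairs per bin; keep only occupied bins and their counts."""
    counts: Dict[Tuple[int, int], int] = {}
    for phi, psi in reference_pairs:
        key = bin_index(phi, psi)
        counts[key] = counts.get(key, 0) + 1
    cells = sorted(counts)
    return cells, [counts[cell] for cell in cells]

def sample_pair(table, rng: random.Random) -> Tuple[float, float]:
    """Draw a bin with probability proportional to its count, then a point uniformly inside it."""
    cells, weights = table
    a, b = rng.choices(cells, weights=weights, k=1)[0]
    return (-180.0 + (a + rng.random()) * BIN_DEG,
            -180.0 + (b + rng.random()) * BIN_DEG)

def sample_backbone(tables, keys: List[str], rng: random.Random) -> List[Tuple[float, float]]:
    """One (phi, psi) pair per position; keys[i] selects the table (amino acid + SS type)."""
    return [sample_pair(tables[key], rng) for key in keys]
```

Keeping only occupied bins keeps the table small, because most of the 5184 cells in (φ, ψ)-space are empty for any given amino acid and secondary-structure type.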

Having generated a backbone geometry by dihe- 
dral angle sampling, side chain angles are optimized 
for each amino acid by looping over the amino acids 
and identifying the lowest energy conformation that 
can be achieved with any combination of angles from 
a well-established library of side chain conformations 
(the Dunbrack rotamer library [1]). While determining 
the optimal side chain configuration for an amino acid, 
the backbone and side chain angles of the remaining 
amino acids are fixed. The side chain optimization iter- 
ates over the sequence of amino acids that constitute the 
protein and terminates when no further improvement 
can be achieved. Energies are calculated with a physi- 
cally based force field, the ECEPP/3 force field [21]. Co- 
valent bond lengths and bond angles are fixed at their 
equilibrium values, so that the conformation is a func- 
tion of the torsional angles only. The energy comprises 
electrostatic, nonbonded, hydrogen-bonded, and torsional contributions. After backbone generation by dihedral angle sampling and after side chain optimization, the entire structure is subjected to an energy minimization in which all degrees of freedom (all backbone angles and all side chain angles) are optimized simultaneously. For this purpose a sequential quadratic programming algorithm [6] is used.
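The side chain step described above is a cyclic coordinate search over a discrete rotamer library. The sketch below illustrates it with a toy energy made of per-residue and pairwise terms. The energy values, the clash rule in `toy_pair_energy`, and the function names are hypothetical stand-ins for the ECEPP/3 evaluation and the Dunbrack library, and the backbone is implicitly held fixed.

```python
from typing import Callable, List, Sequence, Tuple

PairEnergy = Callable[[int, int, int, int], float]

def total_energy(assign: Sequence[int], self_e: Sequence[Sequence[float]], pair_e: PairEnergy) -> float:
    """Sum of per-residue rotamer energies plus all pairwise interaction energies."""
    n = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_e(i, assign[i], j, assign[j])
    return e

def optimize_side_chains(self_e: Sequence[Sequence[float]], pair_e: PairEnergy,
                         assign: Sequence[int]) -> Tuple[List[int], float]:
    """Loop over residues, give each its best rotamer with all others fixed, repeat until no change."""
    assign = list(assign)
    best = total_energy(assign, self_e, pair_e)
    improved = True
    while improved:
        improved = False
        for i in range(len(assign)):
            for r in range(len(self_e[i])):           # every rotamer available to residue i
                trial = assign[:i] + [r] + assign[i + 1:]
                e = total_energy(trial, self_e, pair_e)
                if e < best - 1e-12:                  # strict improvement only, so the loop terminates
                    assign, best, improved = trial, e, True
    return assign, best

# Hypothetical toy system: 3 residues with 2 rotamers each.
SELF_E = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]

def toy_pair_energy(i: int, ri: int, j: int, rj: int) -> float:
    """Neighbouring residues clash (penalty 2.0) when both adopt rotamer 0."""
    return 2.0 if j == i + 1 and ri == 0 and rj == 0 else 0.0

print(optimize_side_chains(SELF_E, toy_pair_energy, [1, 1, 1]))
```

Starting from all residues in rotamer 1, the search settles on rotamers [0, 1, 0], which avoid both clashes. In general, such a search only guarantees that no single-residue change lowers the energy.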

Using the approach described so far, a sufficiently large number $n^{(0)}$ of conformations is generated, and the resulting ensemble is subjected to clustering. In all results reported here, the ensemble size was chosen to be $n^{(0)} = 2000$. Before introducing the clustering algorithm we need the following definitions. The cluster size $N_i$ of conformer i is defined by

$N_i = \#\{\, j \mid r(i,j) \le r_{\mathrm{thresh}} \,\}$. (1)

In Eq. (1), r(i,j) denotes the root mean square deviation (rmsd) between conformers i and j, and the pound symbol is the cardinality operator. For details on the rmsd calculation we refer to [20]. Note that the number of conformers $n^{(0)}$ must be chosen with care, because the complexity of the pairwise rmsd calculation is $O\bigl((n^{(0)})^2\bigr)$. In the algorithm, the upper index k counts the iterations. The symbol $n^{(k)}$ denotes the number of conformers in the ensemble in iteration k. The symbol $N^{(k)}_{i,\max}$ denotes the number of conformers $N_i$, as defined in Eq. (1), in the largest cluster found in iteration k,

$N^{(k)}_{i,\max} = \max_{i=1,\ldots,n^{(k)}} N_i$. (2)

The symbol $r^{(k)}_{\max}$ denotes the largest pairwise rmsd found in the ensemble in iteration k,

$r^{(k)}_{\max} = \max_{i,j=1,\ldots,n^{(k)}} r(i,j)$. (3)

The relative rmsd threshold $r_{\mathrm{thresh,rel}} \in (0,1)$ and the relative critical cluster size $N_{\mathrm{crit,rel}} \in (0,1)$ express $r_{\mathrm{thresh}}$ and $N_{\mathrm{crit}}$ in terms of the largest pairwise rmsd $r^{(k)}_{\max}$ and the largest cluster $N^{(k)}_{i,\max}$ in each iteration:

$r^{(k)}_{\mathrm{thresh}} = r_{\mathrm{thresh,rel}}\, r^{(k)}_{\max}$, (4)

$N^{(k)}_{\mathrm{crit}} = N_{\mathrm{crit,rel}}\, N^{(k)}_{i,\max}$. (5)

The algorithm can now be stated as follows [20].

1. Initialization
Reset the iteration counter, k = 0. Choose the ensemble size $n^{(0)}$ and generate the ensemble. Choose the relative thresholds $r_{\mathrm{thresh,rel}}$ and $N_{\mathrm{crit,rel}}$. Sample values are $r_{\mathrm{thresh,rel}} = 0.5$ and $N_{\mathrm{crit,rel}} = 0.5$.
2. Determination of the rmsd threshold $r^{(k)}_{\mathrm{thresh}}$
Calculate the pairwise rmsds r(i,j) for i, j = 1, …, $n^{(k)}$. Determine $r^{(k)}_{\max}$ from Eq. (3) and set $r^{(k)}_{\mathrm{thresh}}$ according to Eq. (4).
3. Determination of the cluster size threshold $N^{(k)}_{\mathrm{crit}}$
Calculate the cluster sizes $N_i$ for all conformers i = 1, …, $n^{(k)}$ according to Eq. (1), with $r_{\mathrm{thresh}} = r^{(k)}_{\mathrm{thresh}}$. Calculate $N^{(k)}_{i,\max}$ according to Eq. (2). Set $N^{(k)}_{\mathrm{crit}}$ according to Eq. (5).
4. Termination criterion
If $N_i \ge N^{(k)}_{\mathrm{crit}}$ for all i = 1, …, $n^{(k)}$, stop.
5. Discarding far-from-native conformations
Discard the conformations i for which $N_i < N^{(k)}_{\mathrm{crit}}$, and let the remaining conformations constitute the new ensemble. Increment the iteration counter k. Set $n^{(k)}$ to the number of conformers in the new ensemble, renumber these conformers i = 1, …, $n^{(k)}$, and go to step 2.
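The iterative pruning can be sketched in a few lines. The version below is a minimal illustration. It assumes a symmetric distance function in place of the rmsd of [20]; here that is the absolute difference between one-dimensional toy "conformers". The name `prune_ensemble` is hypothetical, and the defaults use the sample values r_thresh,rel = N_crit,rel = 0.5. The function returns the original indices of the surviving conformers, the final iteration counter k, and the conformer with the largest cluster.

```python
from typing import Callable, List, Sequence, Tuple

def prune_ensemble(confs: Sequence, dist: Callable, r_rel: float = 0.5,
                   n_rel: float = 0.5) -> Tuple[List[int], int, int]:
    """Iteratively discard conformers whose cluster size falls below N_crit (steps 1-5)."""
    alive = list(range(len(confs)))      # original indices of the current ensemble
    k = 0
    while True:
        # Step 2: pairwise distances, r_max (Eq. 3) and r_thresh (Eq. 4)
        d = {(i, j): dist(confs[i], confs[j]) for i in alive for j in alive}
        r_thresh = r_rel * max(d.values())
        # Step 3: cluster sizes (Eq. 1; each conformer counts itself), N_max (Eq. 2), N_crit (Eq. 5)
        size = {i: sum(1 for j in alive if d[(i, j)] <= r_thresh) for i in alive}
        n_crit = n_rel * max(size.values())
        # Step 4: stop when no conformer falls below the critical cluster size
        if all(size[i] >= n_crit for i in alive):
            best = max(alive, key=lambda i: size[i])   # largest-cluster conformer (first on ties)
            return alive, k, best
        # Step 5: discard far-from-native conformers and iterate
        alive = [i for i in alive if size[i] >= n_crit]
        k += 1

# Toy 1-D "conformers": four close together and one outlier.
points = [0.0, 0.1, 0.2, 0.3, 5.0]
print(prune_ensemble(points, lambda a, b: abs(a - b)))
```

The outlier at 5.0 is removed in the first pass. In the second pass every remaining conformer reaches the critical size, so the procedure stops, with the conformer at 0.1 as the largest-cluster representative (ties go to the lower index).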

We stress that the clustering algorithm is based on the 
idea of discarding structures that are likely not to be 
close to the native structure. In contrast, existing clus- 
tering algorithms are based on selecting structures that 
are close to native [3,4,7,8,22]. 

For test cases with known three-dimensional struc- 
ture, the quality of the sampling approach can be as- 
sessed by identifying the lowest rmsd to the known 
structure in an ensemble. This check reveals that sam- 
pling is not restricting the overall prediction strat- 
egy [20]. 

Ultimately, a valid strategy is needed for selecting 
one or a few conformers from an ensemble that are 
likely to have a small rmsd with respect to the na- 
tive structure when the native structure is not known. 
The potential energy has been reported not to be a useful criterion [8,22]. Among other reasons, this can be attributed to the fact that the potential energy does not take entropic contributions into account. As a remedy, the so-called colony energy has been suggested as an approximate way of accounting for entropic contributions [22]. Alternatively, one can disregard the potential energy completely and identify conformers that are likely to be close to native based on the size of the clusters they form within the ensemble. We tested the potential energy, the colony energy, the cluster size, and a hybrid approach on a large set of 3215 loops with known structure.


Applications 


In this section we summarize the results obtained for a large test set of loops with known structure. From the PdbSelect25 set we identified the following test loops, each consisting of the loop residues plus 6 stem residues:

total length 10 (4 loop residues): 716 loops
total length 11 (5 loop residues): 656 loops
total length 12 (6 loop residues): 387 loops
total length 13 (7 loop residues): 366 loops
total length 14 (8 loop residues): 283 loops
total length 15 (9 loop residues): 223 loops
total length 16 (10 loop residues): 176 loops
total length 17 (11 loop residues): 138 loops
total length 18 (12 loop residues): 115 loops
total length 19 (13 loop residues): 75 loops
total length 20 (14 loop residues): 80 loops

To our knowledge, this is the largest dataset for which loop structure prediction results have been reported to date. We refer to [20] for details.

The results are summarized in Fig. 1. The results la- 
beled average rmsds by energy are obtained by identify- 
ing the lowest ECEPP/3 energy conformer in each en- 
semble, recording its rmsd to native, and subsequently 
averaging over the rmsds for all ensembles of loops of 
a given length. The average rmsds by colony energy result from using the colony energy [22] instead of the ECEPP/3 energy. The remaining data shown in Fig. 1 are obtained with the clustering algorithm. For all of
the loops reported here, the clustering algorithm termi- 
nated after k = 2 steps. 

Figure 1 shows that the conformer that generates the largest cluster after any clustering step k = 0, 1, 2 is, on average, a better prediction of the native structure than both the lowest energy and the lowest colony energy conformer. From this we infer that the cluster size is a better criterion than energy or colony energy for identifying conformers close to native. Furthermore, the largest cluster conformer found after step k + 1 of the clustering algorithm is as good as, or better than, the largest cluster conformer found in step k. In this sense, repeatedly applying the clustering algorithm improves the largest cluster conformer on average.

Figure 1
Average of rmsds as a function of total loop length for the loops from the PdbSelect25 proteins. Lines are added as a guide to the eye. Results are reproduced from [20]

In order to assess the computational demand of the suggested approach, we measured the CPU times needed by our approach for 20 randomly selected loops of each sequence length. The resulting average times are 1.5 h, 5 h, and 11.25 h for a single loop with 4 loop and 6 stem residues, 10 loop and 6 stem residues, and 14 loop and 6 stem residues, respectively, when the ensemble size is chosen to be 2000 conformers. These times were obtained on a single Intel Xeon 3 GHz machine running RedHat Linux 9.0.

The ensemble generation step needs a negligible 
fraction of the total time. The second step, energy mini- 
mization, is the computationally most demanding step. 
The times needed for pairwise rmsd calculations
contribute between 3.5 min and 7 min to the total times
given above. The time needed for the application of 
the clustering algorithm detailed in Sect. “Method” is 
independent of the loop length and contributes about 
4 min per loop to the total times given above. The time- 
consuming step of energy minimization is amenable to 
parallelization, since the loops in an ensemble can be 
treated separately. 
Introduction/Background


Pseudomonotone maps were introduced by Karamar- 
dian as a generalization of monotone maps [22]. Other 
generalizations include various kinds of quasimono- 
tone maps (quasimonotone, strictly quasimonotone, 
semistrictly quasimonotone ...). These generalizations, 
the relations between them, as well as the relation to 
generalized convexity are discussed in other articles of 
the Encyclopedia (see ▸ generalized monotone single valued maps and ▸ generalized monotone multivalued maps) and in [16].

It should be noted that the same term, “pseudomonotone map”, was introduced by Brézis to denote a totally different class of maps [4]. The main difference between the two classes is that pseudomonotone maps in the sense of Brézis are defined through a kind of continuity property, whereas Karamardian used only the order relation of real numbers in his definition. For this reason some authors, starting with Gwinner [12], use the term “topologically pseudomonotone” for pseudomonotone maps in the Brézis sense. Although it is possible to give a definition that includes both kinds of pseudomonotonicity [11], we will use the term “pseudomonotone” only in the sense defined by Karamardian.

Pseudomonotone maps have the advantage that they lead to generalizations of existence theorems for the Stampacchia variational inequality problem (VIP), without imposing additional assumptions, and with practically the same proof as for monotone maps [16]. However, this quasi-identical treatment of the VIP does not extend to other topics, and the properties of the two classes of maps are often quite dissimilar. For instance, while the sum of two monotone maps is monotone, this is false for pseudomonotone maps. A vast theory has been developed for monotone maps, based on the concept of maximal monotonicity. By contrast, until recently it was believed that maximality plays no role for pseudomonotone maps. Consequently, some algorithms for finding the solution of VIP with maximal monotone maps have no extension to the pseudomonotone case.

This article will present some recent developments 
that can be considered as a first step towards filling the 
lacunae in the theory of pseudomonotone maps. In par- 
ticular, maximal pseudomonotonicity will be discussed. 
The main tool is the definition of an equivalence re- 
lation in the set of all pseudomonotone maps. Also, 
a generalization of paramonotone maps and their use 
in cutting plane algorithms will be described. Finally, 
recent results on pseudoaffine maps and on the relation 
to monotone maps will be presented. 


Definitions 


Let $X$ be a real Banach space and $X^*$ its dual. Given $x, y \in X$, $[x, y]$ denotes the line segment $\{(1-t)x + ty : t \in [0,1]\}$. For $K \subseteq X^*$, $\mathbb{R}_{++}K$ will be the set $\bigcup_{\lambda > 0} \lambda K$.


A multivalued map $T : X \to 2^{X^*}$ is a map whose values are subsets of $X^*$, possibly empty. The domain $D(T)$ of $T$ is the set $\{x \in X : T(x) \neq \emptyset\}$, its graph the set $\operatorname{gr}(T) = \{(x, x^*) \in X \times X^* : x^* \in T(x)\}$ and its set of zeros is the set $Z_T = \{x \in X : 0 \in T(x)\}$. The map $T$ is called upper sign-continuous [13] if for all $x \in D(T)$ and $v \in X$, the following implication holds:

$$\Big(\forall t \in (0,1),\ \inf_{x^* \in T(x+tv)} \langle x^*, v\rangle \ge 0\Big) \implies \sup_{x^* \in T(x)} \langle x^*, v\rangle \ge 0.$$

If $T$ is upper hemicontinuous (i.e., its restriction to line segments is upper semicontinuous with respect to the weak* topology in $X^*$), then it is upper sign-continuous.

The map $T$ is called monotone if for every $(x, x^*), (y, y^*) \in \operatorname{gr}(T)$, $\langle y^* - x^*, y - x\rangle \ge 0$; it is called maximal monotone if its graph is not strictly contained in the graph of any other monotone map. Also, $T$ is called D-maximal monotone if its graph is not strictly contained in the graph of any other monotone map with the same domain.


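The map $T$ is called pseudomonotone if for every $(x, x^*), (y, y^*) \in \operatorname{gr}(T)$, the following implication holds:

$$\langle x^*, y - x\rangle \ge 0 \implies \langle y^*, y - x\rangle \ge 0.$$

Obviously, every monotone map is pseudomonotone, since monotonicity gives $\langle y^*, y - x\rangle \ge \langle x^*, y - x\rangle$.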
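For single-valued maps on the real line, the two definitions can be compared numerically. The sketch below tests them on a finite grid, which can refute but never prove either property. The example map $T(x) = x e^{-x^2}$ is our own illustrative choice: it is the identity multiplied by a positive function, and it is pseudomonotone because $T(x)(y-x) \ge 0$ means $x(y-x) \ge 0$, whence $y(y-x) = x(y-x) + (y-x)^2 \ge 0$; it is not monotone, since it decreases for $|x| > 1/\sqrt{2}$.

```python
import math
from itertools import product


def is_monotone_on(T, points):
    """Sampled check of (T(y) - T(x)) * (y - x) >= 0 for a map T: R -> R."""
    return all((T(y) - T(x)) * (y - x) >= 0 for x, y in product(points, repeat=2))


def is_pseudomonotone_on(T, points):
    """Sampled check of the implication T(x)*(y-x) >= 0  =>  T(y)*(y-x) >= 0."""
    return all(not (T(x) * (y - x) >= 0) or T(y) * (y - x) >= 0
               for x, y in product(points, repeat=2))


# Identity times the positive function exp(-x^2) (illustrative choice).
T = lambda x: x * math.exp(-x * x)
grid = [i / 10 for i in range(-30, 31)]

print(is_monotone_on(T, grid))        # False (e.g. x = 1, y = 2)
print(is_pseudomonotone_on(T, grid))  # True
```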

Given a locally Lipschitz function $f : X \to \mathbb{R} \cup \{+\infty\}$, we denote by $\partial^\circ f$ its Clarke subdifferential [7]. The locally Lipschitz function $f$ is called pseudoconvex if for every $x, y \in \operatorname{dom}(f)$ and $x^* \in \partial^\circ f(x)$ the following implication holds:

$$\langle x^*, y - x\rangle \ge 0 \implies f(y) \ge f(x).$$

It is known that a locally Lipschitz function $f$ is pseudoconvex if and only if $\partial^\circ f$ is a pseudomonotone map [23].


Formulation 
Maximal Pseudomonotonicity 


In order to introduce maximal pseudomonotone maps, one first defines an equivalence relation in the set of pseudomonotone maps. Two pseudomonotone maps $T_1$ and $T_2$ are called equivalent if they have the same domain, the same set of zeros, and for each $x$ which is not a common zero, the elements of $T_1(x)$ are positive multiples of the elements of $T_2(x)$ and vice versa. In other words,

(a) $D(T_1) = D(T_2)$,

(b) $Z_{T_1} = Z_{T_2}$,

(c) for every $x \in X \setminus Z_{T_1}$, $\mathbb{R}_{++}T_1(x) = \mathbb{R}_{++}T_2(x)$.

In this case we write $T_1 \sim T_2$. This is an equivalence relation. Another aspect of this equivalence is provided by the Stampacchia variational inequality. Given a map $T$ and a convex subset $K$ of $X$, we denote by $S(T, K)$ the set of all $x \in K$ which are solutions of the VIP:

$$\forall y \in K,\ \exists x^* \in T(x) : \langle x^*, y - x\rangle \ge 0.$$

The following result holds [14].


Proposition 1 Let $T_1$, $T_2$ be pseudomonotone maps. If $T_1 \sim T_2$, then $S(T_1, K) = S(T_2, K)$ for every convex set $K \subseteq X$. Conversely, if $S(T_1, K) = S(T_2, K)$ for every line segment $K$ and $T_1$, $T_2$ have weak*-compact convex values, then $T_1 \sim T_2$.
Since all equivalent maps provide the same solutions
to VIP, one can choose any element of the equivalence 
class to study or even find the solutions. 

Given a pseudomonotone map $T$, its equivalence class has a maximum with respect to graph inclusion. This is simply the map $\hat T$ defined by $\hat T(x) = \bigcup_{T' \sim T} T'(x)$ for all $x \in D(T)$. It can be shown [13] that $\hat T$ is also given by the formula

$$\hat T(x) = \begin{cases} \emptyset, & x \notin D(T),\\ \mathbb{R}_{++}T(x), & x \in D(T) \setminus Z_T,\\ N_{L_{T,x}}(x), & x \in Z_T, \end{cases}$$

where $N_{L_{T,x}}(x)$ is the normal cone at $x$ to the set $L_{T,x} = \{y \in X : \exists y^* \in T(y),\ \langle y^*, y - x\rangle \le 0\}$.

A pseudomonotone map $T$ is called D-maximal pseudomonotone if the graph of $T$ is not properly contained in the graph of any other pseudomonotone map with the same domain. When the domain of $T$ is convex, there is an equivalent, more appealing definition of D-maximal pseudomonotonicity [14]:


Proposition 2 Let T be pseudomonotone and such that 
D(T) is convex. Then T is D-maximal pseudomonotone 
if, and only if, every pseudomonotone extension of T with 
the same domain is equivalent to T. 


Some properties of the set of zeros of T are provided by 
the following proposition [14]. 


Proposition 3 Let $T$ be D-maximal pseudomonotone. Then $Z_T$ is weakly closed in $D(T)$. If in addition $D(T)$ is convex, then $Z_T$ is also convex, and $z \in Z_T$ is equivalent to

$$\forall (y, y^*) \in \operatorname{gr}(T),\quad \langle y^*, y - z\rangle \ge 0.$$


The following proposition provides a simple criterion 
for showing the D-maximal pseudomonotonicity of 
a map [13]. 


Proposition 4 Assume that $T$ is pseudomonotone and upper sign-continuous, with weak*-compact, convex values and open domain $D(T)$. Then $T$ is D-maximal pseudomonotone.


A simple consequence of the above proposition is: 


Corollary 5 The Clarke subdifferential $\partial^\circ f$ of a locally Lipschitz, pseudoconvex function $f : X \to \mathbb{R} \cup \{+\infty\}$ is a D-maximal pseudomonotone map.


As was explained before, from the point of view of VIP 
one can use any element of the equivalence class. This is 
fortunate, since in many cases instead of showing that 
a D-maximal pseudomonotone map has a “nice” prop- 
erty (as is the case with maximal monotone maps), one 
shows that an equivalent map has this property. For in- 
stance, one has: 


Proposition 6 If $T$ is D-maximal pseudomonotone, then $\hat T(x)$ is convex for every $x \in D(T)$. If in particular the assumptions of Proposition 4 are satisfied, then $\hat T(x) \cup \{0\}$ is weak*-closed.


Here is a case where one can find an equivalent map 
with a better continuity property [13]: 


Proposition 7 Let $T : \mathbb{R}^n \to 2^{\mathbb{R}^n}$ be a pseudomonotone, upper sign-continuous map with compact convex values. If $D(T)$ is open and convex, then there exists an equivalent upper semicontinuous map $T_1$ with compact convex values.


For instance, let $T$ be a single-valued pseudomonotone map defined on an open convex subset of $\mathbb{R}^n$. If $T$ is hemicontinuous (i.e., continuous along line segments), then the above proposition guarantees that there exists an equivalent map which is continuous. Likewise, one can show that under some fairly general assumptions, $T$ is equivalent to a map which is generically single-valued (i.e., single-valued except on a set of the first category). See Corollary 3.10 in [13].


A Generalization of Paramonotone Maps 


A monotone multivalued map $T$ is called paramonotone if for every $(x, x^*), (y, y^*) \in \operatorname{gr}(T)$, $\langle y^* - x^*, y - x\rangle = 0$ implies that $x^* \in T(y)$ and $y^* \in T(x)$. It can be shown that the subdifferential of a proper lsc convex function is paramonotone [5]. Other examples of paramonotone maps are given in [21]. Paramonotone maps have been extensively used in algorithms for the solution of VIP [5,6,24]. The main reason is that these maps have the following “cutting plane property”:


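$$\left.\begin{array}{l} x \in S(T, K)\\ y \in K\\ \langle y^*, x - y\rangle \ge 0 \ \text{for some } y^* \in T(y) \end{array}\right\} \implies y \in S(T, K). \qquad (1)$$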




Pseudomonotone Maps: Properties and Applications 


Assume that a map has property (1). If at the $n$th iteration of an algorithm one finds a point $y_n$ that is not a solution of VIP, then all solutions of VIP belong to the intersection of $K$ with the halfspace $\{x \in X : \langle y^*, x - y_n\rangle < 0\}$, where $y^*$ is an arbitrary element of $T(y_n)$.
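The pruning step described above can be sketched as a filter over a finite set of candidate points. This is only an illustration under stated assumptions: the helper names are hypothetical, and the map $T(x) = x - \hat x$ on $K = \mathbb{R}^2$ is chosen because it is the gradient of the convex function $\|x - \hat x\|^2/2$ and hence paramonotone, so property (1) holds and the unique VIP solution $\hat x$ must survive every cut.

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def cutting_plane_filter(candidates, y_n, y_star):
    """Keep only the points x with <y*, x - y_n> < 0.

    If property (1) holds and y_n is not a VIP solution, every solution lies in
    this open halfspace, so the discarded points cannot be solutions.
    """
    return [x for x in candidates
            if dot(y_star, [xi - yi for xi, yi in zip(x, y_n)]) < 0]


# Hypothetical example: T(x) = x - x_hat on K = R^2; the VIP solution is x_hat.
x_hat = [1.0, 1.0]
T = lambda x: [xi - hi for xi, hi in zip(x, x_hat)]
y_n = [3.0, 0.0]        # a non-solution produced by some iteration
candidates = [[1.0, 1.0], [4.0, 0.0], [3.0, -2.0], [0.0, 0.0]]
kept = cutting_plane_filter(candidates, y_n, T(y_n))
print(kept)             # [[1.0, 1.0], [0.0, 0.0]]
```

In an actual algorithm the halfspace would be intersected with $K$ and kept as a constraint rather than applied to a finite list; the list merely makes the effect of one cut visible.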

Let $K \subseteq X$ be nonempty, closed and convex. A single-valued pseudomonotone map $T : K \to X^*$ is called pseudomonotone$_*$ if for all $x, y \in K$,

$$\langle T(x), y - x\rangle = \langle T(y), y - x\rangle = 0$$

implies that $T(x) = kT(y)$ for some $k > 0$ [8]. Note that single-valued pseudomonotone$_*$ maps are a generalization of single-valued paramonotone maps. To extend this generalization to the multivalued case, one needs the tools presented in the previous subsection.


Definition 8 [17] A map $T : X \to 2^{X^*}$ is pseudomonotone$_*$ on $K$ if it is pseudomonotone and for every $x, y \in K$ and $x^* \in T(x)$, $y^* \in T(y)$, $\langle x^*, y - x\rangle = \langle y^*, y - x\rangle = 0$ implies $x^* \in \hat T(y)$ and $y^* \in \hat T(x)$.


It is easy to see that every paramonotone map is pseudomonotone$_*$. Other classes of pseudomonotone$_*$ maps are provided by the following propositions [17].


Proposition 9 The Clarke subdifferential $\partial^\circ f$ of a locally Lipschitz pseudoconvex function $f$ is pseudomonotone$_*$.

Proposition 10 If the map $T$ is pseudomonotone$_*$, then any map equivalent to $T$ is pseudomonotone$_*$.


Proposition 9 is a particular case of a more general situation. A map $T$ is called cyclically pseudomonotone [9,10] if for every $(x_i, x_i^*) \in \operatorname{gr}(T)$, $i = 1, 2, \dots, n$, the following implication holds:

$$\langle x_i^*, x_{i+1} - x_i\rangle \ge 0,\ \forall i = 1, 2, \dots, n-1 \implies \langle x_n^*, x_1 - x_n\rangle \le 0.$$


Proposition 11 If $T$ is D-maximal pseudomonotone and cyclically pseudomonotone with convex domain, then it is pseudomonotone$_*$.


Since the Clarke subdifferential of a locally Lipschitz pseudoconvex function is D-maximal pseudomonotone and cyclically pseudomonotone, the previous proposition implies Proposition 9.

We saw that paramonotone maps have the cutting plane property (1). The same is true for pseudomonotone$_*$ maps; what is more interesting is that these maps are characterized, in some sense, by the cutting plane property:


Proposition 12 Let $T$ be pseudomonotone on the convex set $K$. If $T$ is pseudomonotone$_*$, then property (1) holds on every subset of $K$. Conversely, if property (1) holds on every convex, compact subset of $K$ and $T$ has convex, weak*-compact values, then $T$ is pseudomonotone$_*$ on the interior of $K$.


If $T$ is single-valued, the assumption of pseudomonotonicity becomes redundant:

Proposition 13 Let $T : K \to X^*$ be hemicontinuous. If $T$ has property (1) on each convex compact subset of $K$, then $T$ is pseudomonotone on $K$ and pseudomonotone$_*$ on its interior.


In Sect. “Methods/Applications” we will show how to apply pseudomonotone$_*$ maps to the solution of variational inequalities.


Pseudoaffine Maps 


Given a convex subset $K$ of $\mathbb{R}^n$, a single-valued map $T : K \to \mathbb{R}^n$ is called pseudoaffine (or PPM, as in [3]) if both $T$ and $-T$ are pseudomonotone. These maps were studied in [3] in connection with VIP. It is easy to see that a differentiable function $f : K \to \mathbb{R}$ is pseudolinear (i.e., both $f$ and $-f$ are pseudoconvex) if and only if $\nabla f$ is pseudoaffine. It is not hard to show that pseudolinear functions defined on the whole space $\mathbb{R}^n$ have a very particular form [2,25]:


Proposition 14 A differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is pseudolinear if and only if there exist a vector $u \in \mathbb{R}^n$ and a one-variable differentiable function $h$ whose derivative is either always positive or identically zero, such that $f(x) = h(\langle u, x\rangle)$.


If $T = \nabla f$ in this case, then $T(x) = h'(\langle u, x\rangle)\,u$, i.e., $T$ is equal to a positive multiple of a constant vector (or is identically zero). For general pseudoaffine maps (i.e., those that are not necessarily equal to a gradient) that are defined on the whole space, the following elegant characterization has been shown:

Proposition 15 A map $T : \mathbb{R}^n \to \mathbb{R}^n$ is pseudoaffine if and only if there exist a positive function $g : \mathbb{R}^n \to \mathbb{R}$, a skew-symmetric linear map $A$ and a vector $u$ such that

$$\forall x \in \mathbb{R}^n,\quad T(x) = g(x)(Ax + u).$$
The proof of the above result needs some “global” argu-
ments provided by algebraic topology and by projective 
geometry [2]. 
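The "if" direction of Proposition 15 rests on the identity $\langle Ax + u, y - x\rangle = \langle Ay + u, y - x\rangle$, which holds because $\langle A(y - x), y - x\rangle = 0$ for skew-symmetric $A$; thus $\langle T(x), y - x\rangle$ and $\langle T(y), y - x\rangle$ always have the same sign, and both $T$ and $-T$ are pseudomonotone. The Python sketch below checks this on random sample pairs in $\mathbb{R}^2$ and contrasts it with a symmetric indefinite matrix, for which the property fails. The matrices, the vector $u$ and the function $g$ are arbitrary illustrative choices, and a sample check is evidence, not a proof.

```python
import math
import random


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def pseudoaffine_map(A, u, g):
    """Return T(x) = g(x) (A x + u), the form in Proposition 15."""
    def T(x):
        s = g(x)
        return [s * (dot(row, x) + ui) for row, ui in zip(A, u)]
    return T


def negate(T):
    return lambda x: [-v for v in T(x)]


def violates(T, x, y, tol=1e-9):
    """True if the pair breaks  <T(x), y-x> >= 0  =>  <T(y), y-x> >= 0."""
    d = [b - a for a, b in zip(x, y)]
    return dot(T(x), d) >= 0 and dot(T(y), d) < -tol


rng = random.Random(0)
pairs = [([rng.uniform(-2, 2), rng.uniform(-2, 2)],
          [rng.uniform(-2, 2), rng.uniform(-2, 2)]) for _ in range(5000)]

g = lambda x: 1.0 + math.exp(x[0])                          # any positive function
u = [1.0, -0.5]
T_skew = pseudoaffine_map([[0.0, 2.0], [-2.0, 0.0]], u, g)  # A^T = -A
T_sym = pseudoaffine_map([[1.0, 0.0], [0.0, -1.0]], u, g)   # symmetric, indefinite

for name, T in [("skew", T_skew), ("symmetric", T_sym)]:
    bad = any(violates(T, x, y) or violates(negate(T), x, y) for x, y in pairs)
    print(name, "pseudoaffine on the sample:", not bad)
```

The skew-symmetric map should pass for both $T$ and $-T$, while the symmetric one fails on some sampled pairs; the small tolerance only guards against rounding when the inner product is essentially zero.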


Pseudomonotone vs. Monotone Maps 


One of the basic differences between the class of mono- 
tone maps and the class of pseudomonotone maps has 
to do with their stability with respect to some opera- 
tions. For instance, the class of monotone maps is stable 
with respect to addition (i. e., the sum of two monotone 
maps is monotone), while this is not the case for pseu- 
domonotone maps. By contrast, the product of a pseu- 
domonotone map with a positive function produces 
a pseudomonotone map while this is not the case for 
monotone maps. 

In this connection, it was noted in [1] that a map $T : X \to 2^{X^*}$ is monotone if and only if for every $x^* \in X^*$ the map $T + x^*$ is pseudomonotone. More recently, He [18] and Isac and Motreanu [20] obtained another result in this direction. Assume that $X$ is a Hilbert space ([18] considered $X = \mathbb{R}^n$) and $K \subseteq X$ is a convex set with nonempty interior. Let further $T : K \to X^*$ be a continuous single-valued map which is Gâteaux differentiable in the interior of $K$. Then $T$ is monotone if and only if $T + x^*$ is pseudomonotone for all $x^*$ in a straight line of $X^*$. The differentiability assumption is essential in the argument of both papers [18,20], because the proof is based on a first-order characterization of generalized monotonicity.
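The result quoted from [1] implies that, for a map $T$ that is pseudomonotone but not monotone, some translate $T + x^*$ must fail to be pseudomonotone. The sketch below finds such a translate for the illustrative map $T(x) = x e^{-x^2}$ on a grid of $[-4, 4]$. Since the constant map $x^* = 0.3$ is monotone, the same computation shows that the sum of two pseudomonotone maps need not be pseudomonotone. The map, the grid and the shifts are our own choices.

```python
import math
from itertools import product


def is_pseudomonotone_on(T, points):
    """Sampled check (maps R -> R): T(x)*(y-x) >= 0 must imply T(y)*(y-x) >= 0."""
    return all(not (T(x) * (y - x) >= 0) or T(y) * (y - x) >= 0
               for x, y in product(points, repeat=2))


T = lambda x: x * math.exp(-x * x)       # pseudomonotone but not monotone
grid = [i / 20 for i in range(-80, 81)]  # sample points of [-4, 4]

for shift in (0.0, 0.3, 1.0):
    # T + x* for the constant x* = shift (a monotone map).
    print(shift, is_pseudomonotone_on(lambda x, c=shift: T(x) + c, grid))
# prints: 0.0 True / 0.3 False / 1.0 True
```

For the shift 0.3 the sum is positive near $x = -4$, negative near $x = -0.7$ and positive again at $x = 0$; a sign change from positive to negative as $x$ increases is exactly what pseudomonotonicity on the real line forbids.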

In a recent paper [15] it was shown that the differentiability assumption is redundant, and one can also weaken considerably the assumption that the interior of $K$ is nonempty. Given $x^* \in X^*$ and a set $K \subseteq X$, one says that $x^*$ is perpendicular to $K$ if the value of $x^*$ is constant on $K$, i.e., $\langle x^*, y - x\rangle = 0$ for all $x, y \in K$. The following proposition holds.


Proposition 16 Let $K \subseteq X$ be nonempty and convex and $T : K \to 2^{X^*}$ be a map with nonempty values. Assume that there exists a straight line $S = \{x_0^* + t x^* : t \in \mathbb{R}\}$ in $X^*$ such that $x^*$ is not perpendicular to $K$, and for all $z^* \in S$, $T + z^*$ is pseudomonotone. Then $T$ is monotone.


In case $K$ has nonempty interior or, more generally, nonempty quasi-interior, the assumption “$x^*$ is not perpendicular to $K$” is automatically fulfilled. It should also be noted that the results of this subsection remain true if we replace “pseudomonotone” by “quasimonotone” (see the article ▸ generalized monotone multivalued maps in this Encyclopedia for the definition).


Methods/Applications 


Many of the algorithms used to find a solution of a variational inequality with a paramonotone map can also be used in the more general case of a pseudomonotone$_*$ map. We illustrate this with an example of a perturbed auxiliary problem method. Let $K$ be a closed convex subset of a Hilbert space $H$ and $T : K \to 2^H$ a map with nonempty values. Choose a Gâteaux differentiable strongly convex function $M : H \to \mathbb{R}$ with a weakly continuous derivative (we can take for instance $M(x) = \|x\|^2/2$). Construct a sequence $\{x_k\}_{k \in \mathbb{N}}$ by the following algorithm.


(i) Choose an arbitrary $x_0 \in K$.
(ii) Having chosen $x_k$, find $x_{k+1} \in K$ and $x_{k+1}^* \in T(x_{k+1})$ such that

$$\forall y \in K,\quad \langle \mu_k x_{k+1}^* + M'(x_{k+1}) - M'(x_k),\ y - x_{k+1}\rangle \ge 0,$$

where $\{\mu_k\}_{k \in \mathbb{N}}$ is a sequence of positive constants bounded from below. Note that finding $x_{k+1}$ amounts to solving VIP for the perturbed map $T_{k+1}(\cdot) = \mu_k T(\cdot) + M'(\cdot) - M'(x_k)$. This problem can be much easier than the original one; for instance, if $T$ is weakly monotone and $\mu_k$ is small, then $T_{k+1}$ is strongly monotone.

Assume that VIP has a solution and that the sequence $\{x_k\}_{k \in \mathbb{N}}$ is well-defined. Then it can be shown that if $T$ is pseudomonotone$_*$ and satisfies a fairly general continuity condition, the sequence $\{x_k\}_{k \in \mathbb{N}}$ converges weakly to a solution of VIP for $T$. Details can be found in [17].
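For intuition, here is a minimal Python sketch of the method in $\mathbb{R}^2$, under assumptions that are ours rather than those of [17]: $K$ is a box, $T$ is single-valued, $M(x) = \|x\|^2/2$ (so $M'(x) = x$) and $\mu_k = \mu$ is constant. With this $M$, step (ii) reduces to the implicit equation $x_{k+1} = P_K(x_k - \mu T(x_{k+1}))$, which the sketch solves by an inner fixed-point loop. The test map $T(x) = (x - \hat x)/(1 + \|x - \hat x\|^2)$ is a positive multiple of $x - \hat x$, hence pseudomonotone. It is not monotone, and $\langle T(x), y - x\rangle = \langle T(y), y - x\rangle = 0$ forces $x = y$, so it is also pseudomonotone$_*$.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box K = [lo, hi]^n (closed and convex)."""
    return [min(max(v, lo), hi) for v in x]


def auxiliary_problem_method(T, x0, lo, hi, mu=0.5, outer=200, inner=200, tol=1e-12):
    """Perturbed auxiliary problem method with M(x) = ||x||^2 / 2, so M'(x) = x.

    Step (ii) asks for x_{k+1} in K with
        <mu * T(x_{k+1}) + x_{k+1} - x_k, y - x_{k+1}> >= 0  for all y in K,
    which is equivalent to x_{k+1} = P_K(x_k - mu * T(x_{k+1})).
    """
    x = list(x0)
    for _ in range(outer):
        # Fixed-point iteration z <- P_K(x_k - mu*T(z)) for the subproblem;
        # it is a contraction when mu times the Lipschitz constant of T is < 1.
        z = list(x)
        for _ in range(inner):
            z_new = project_box([xi - mu * ti for xi, ti in zip(x, T(z))], lo, hi)
            step = max(abs(a - b) for a, b in zip(z_new, z))
            z = z_new
            if step < tol:
                break
        x = z
    return x


# Illustrative map (our choice): T(x) = d / (1 + |d|^2) with d = x - x_hat.
x_hat = [1.0, 0.5]


def T(x):
    d = [xi - hi for xi, hi in zip(x, x_hat)]
    return [di / (1.0 + sum(e * e for e in d)) for di in d]


x_end = auxiliary_problem_method(T, [2.0, -2.0], lo=-3.0, hi=3.0)
```

The final iterate should agree with $\hat x$, the unique solution of the VIP on this box, to well within $10^{-6}$. The inner loop is guaranteed to converge here because the Lipschitz constant of this $T$ is 1 and $\mu = 0.5$; for other maps a different subproblem solver may be needed.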


Conclusions 


The theory of pseudomonotone maps is far from being developed to a satisfactory level. By contrast, the theory of monotone maps has reached a high level of maturity [19]. It is hoped that some of the recent advances presented here, and in particular the ideas on maximality of pseudomonotone maps, will provide a firm background for the study of pseudomonotone maps. This is illustrated by the ease and naturalness with which notions like paramonotonicity can be generalized to pseudomonotone maps, a task that seemed almost impossible before the introduction of maximal pseudomonotone maps. In addition, the new notion of a pseudomonotone$_*$ map seems to fit the cutting plane property ideally (see Proposition 12), and this adds some confidence that the definition of maximality is on the right track.
Introduction


Many problems in process design, process control, process operations, and molecular design are posed as the search for optimal solutions; however, these problems are typically characterized by the existence of multiple minima and maxima, as well as first-, second-, and higher-order saddle points. The last decade has seen a rapid development of new methods for deterministic global optimization, as well as the application of available global optimization algorithms in important engineering fields [1,2,9,10,11,12,13,14]. Recently, in order to locate the global solutions of nonconvex phase stability analysis problems [3,4,5], a branch-and-bound algorithm based on quadratic underestimation functions, QBB, was developed for twice-differentiable nonlinear programs (NLPs) in terms of a simplicial partition of the constrained region [6,7].


Formulation 


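The nonconvex optimization problem considered in this section can be formulated as

$$(P)\qquad \begin{aligned} \min_x\ & f(x)\\ \text{subject to}\ & g_i(x) \le 0,\quad i = 1, \dots, m,\\ & x \in S^0 \subset \mathbb{R}^n, \end{aligned}$$

where $f$ and $g_i$ belong to $C^2$, the set of twice-differentiable functions, and $S^0$ is a simplex defined by

$$S^0 = \Big\{x \in \mathbb{R}^n : x = \sum_{i=1}^{n+1} \lambda_i V^i,\ \lambda_i \ge 0,\ \sum_{i=1}^{n+1} \lambda_i = 1\Big\},$$

where $V^i \in V \subset \mathbb{R}^n$, $i = 1, 2, \dots, n+1$, are the $n+1$ vertices of the simplex $S^0$, and $V$ is the set of its vertices. Let $D_g$ be the subset of $\mathbb{R}^n$ defined by

$$D_g = \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1, 2, \dots, m\}.$$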


In general, the set $D_g$ is nonconvex and may even be disconnected. We assume throughout this section that problem (P) has an optimal solution, unless otherwise stated. For a nonconvex optimization problem such as (P), the QBB algorithm proposed in this section follows a branch-and-bound scheme. In each iteration of this framework, both a branching step and a bounding step are carried out.


Simplicial Partition 
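For the branching procedure, the simplex $S^0$ is divided into refined subregions by simplicial partition. Let $S$ be the current simplex with vertices $V^1, \dots, V^{n+1}$, let $U = \sum_{i=1}^{n+1} \lambda_i V^i$ be a point of $S$, and let $I = \{i : \lambda_i > 0\}$. For this kind of branching, it is a simple matter to check that for every $i \in I$ the points $V^1, \dots, V^{i-1}, U, V^{i+1}, \dots, V^{n+1}$ are the vertices of a simplex $S_i \subseteq S$, and that

$$(\operatorname{int} S_i) \cap (\operatorname{int} S_j) = \emptyset \quad \forall j \ne i, \qquad \bigcup_{i \in I} S_i = S.$$

Then the simplices $S_i$, $i \in I$, form a subdivision of the simplex $S$ via $U$. Each $S_i$ will be referred to as a subsimplex of $S$. An important special case is bisection, where $U$ is a point of a longest edge of the simplex $S$, for example $U \in [V^m, V^l]$, i.e.,

$$\|V^m - V^l\| = \max_{\substack{i < j\\ i, j = 1, \dots, n+1}} \|V^i - V^j\|,$$

where $\|\cdot\|$ denotes any given norm in $\mathbb{R}^n$, and $U = \alpha V^m + (1 - \alpha) V^l$ with $0 < \alpha \le 1/2$. Adjiman et al. [9] proved that this simplicial bisection is exhaustive, since $\delta(S_k) \to 0$ as $k \to +\infty$, where $\delta(S_k)$ denotes the diameter of the $k$th simplex in a nested sequence generated by bisection.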
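The longest-edge bisection can be sketched in a few lines of Python. The sketch assumes the Euclidean norm (the text allows any norm), represents a simplex by the list of its $n + 1$ vertices, and returns the two subsimplices obtained by replacing $V^m$, respectively $V^l$, with $U$; the function names are hypothetical.

```python
import math
from itertools import combinations


def diameter(vertices):
    """Length of the longest edge of the simplex (Euclidean norm)."""
    return max(math.dist(p, q) for p, q in combinations(vertices, 2))


def bisect_longest_edge(vertices, alpha=0.5):
    """Split a simplex along a longest edge [V^m, V^l].

    U = alpha * V^m + (1 - alpha) * V^l; the two subsimplices replace V^m,
    respectively V^l, by U (the subdivision via U with I = {m, l}).
    """
    if not 0 < alpha <= 0.5:
        raise ValueError("alpha must lie in (0, 1/2]")
    m, l = max(combinations(range(len(vertices)), 2),
               key=lambda e: math.dist(vertices[e[0]], vertices[e[1]]))
    U = [alpha * a + (1 - alpha) * b for a, b in zip(vertices[m], vertices[l])]
    S1 = [U if i == m else list(v) for i, v in enumerate(vertices)]
    S2 = [U if i == l else list(v) for i, v in enumerate(vertices)]
    return S1, S2


S = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]]   # a 3-4-5 right triangle
S1, S2 = bisect_longest_edge(S)

# Follow one branch of nested subsimplices: the diameter shrinks towards zero.
nested = S
for _ in range(20):
    nested = bisect_longest_edge(nested)[0]
```

On this triangle, following one branch for 20 bisections reduces the diameter from 5 to below 0.01 (a right triangle recurs every two steps with its hypotenuse halved), which illustrates the exhaustiveness property stated above.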


Quadratic Underestimation Function 
for General Non-convex Structures 


In the bounding step of a branch-and-bound algorithm, a lower bound is obtained by constructing a valid convex underestimation problem for the original problem (P) and solving the relaxed convex NLP to global optimality. Consider the current simplex

$$S = \Big\{x \in \mathbb{R}^n : x = \sum_{i=1}^{n+1} \lambda_i V^i,\ \lambda_i \ge 0,\ \sum_{i=1}^{n+1} \lambda_i = 1\Big\}, \qquad (1)$$

where $V^i \in V \subset \mathbb{R}^n$, $i = 1, 2, \dots, n+1$, are the $n+1$ vertices of the current simplex $S$, and $V$ is the set of these vertices. We intend to compute a lower bound $\mu(S)$ of the objective function $f$ on $S \cap D_g$. In other words, we compute a lower bound for the optimal value of the problem

$$(P(S))\qquad \begin{aligned} \min\ & f(x)\\ \text{subject to}\ & g_i(x) \le 0,\quad i = 1, \dots, m,\\ & x \in S \subset \mathbb{R}^n. \end{aligned}$$

As mentioned above, $f$ and $g_i$ are generic nonconvex functions belonging to $C^2$, so the main idea for computing a lower bound $\mu(S)$ is to construct from problem (P(S)) a convex problem by replacing all nonconvex functions with their respective convex underestimation functions, and then to solve the resulting relaxed convex problem. To achieve this, we use the following definition:


Definition 1 Given any nonconvex function $f : S \to \mathbb{R}$, $S \subset \mathbb{R}^n$, belonging to $C^2$, the following quadratic function is defined:

$$F(x) = \sum_{i=1}^{n} a_i x_i^2 + \sum_{i=1}^{n} b_i x_i + c, \qquad (2)$$

where $x \in S \subset \mathbb{R}^n$ and $F(x) = f(x)$ holds at all vertices of $S$. The $a_i$'s are nonnegative scalars, large enough that $F(x) \le f(x)$ for all $x \in S$.


It is trivial to see that $F(x)$ is convex, since all quadratic coefficients, i.e., the $a_i$'s, are nonnegative. Theorem 2.2.1 in [7] ensures that $F(x)$ defined by Definition 1 is a convex underestimator of $f(x)$ if the difference function between them, i.e., $D(x) = F(x) - f(x)$, is convex: since $D$ vanishes at the vertices of $S$, convexity gives $D(x) \le \sum_i \lambda_i D(V^i) = 0$ for every $x = \sum_i \lambda_i V^i \in S$. It is well known that $D(x)$ is convex if and only if its Hessian matrix $H_D(x)$ is positive semidefinite in the current simplex. A useful convexity condition is derived by noting that $H_D(x)$ is related directly to the Hessian matrix $H_f(x)$ of $f(x)$, $x \in S$, by the following equation:

$$H_D(x) = 2A - H_f(x),$$

where $A$ is a diagonal matrix whose diagonal elements are the $a_i$'s defined in Definition 1. Analogous to the “diagonal shift matrix” defined in [9], $A$ here is referred to as the diagonal underestimation matrix, since these parameters guarantee that $F(x)$ defined by Eq. (2) is a rigorous underestimator of the generic nonconvex function $f(x)$. $D(x)$, as defined, is convex if and only if $2A - H_f(x) = 2\operatorname{diag}(a_i) - H_f(x)$ is positive semidefinite for all $x \in S$.
In order to simplify the parameter calculation, the underestimator $F(x)$ is reformulated by using a single nonnegative value $a$, as follows:

$$F(x) = a \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} b_i x_i + c. \qquad (3)$$

Then all diagonal elements of the diagonal underestimation matrix $A$ are equal to the uniform quadratic coefficient $a$ defined by Eq. (3). Some interval arithmetic approaches are provided in [5,7] to estimate the quadratic coefficients in the current simplex with a theoretical guarantee.

After the quadratic coefficients have been identified, the linear and constant coefficients of $F(x)$ defined by Eqs. (2) or (3), i.e., the $b_i$'s and $c$, can be obtained from the quadratic coefficients $a_i$ and the current simplex. In view of Definition 1, $F(x) = f(x)$ holds at all vertices of $S$, so the following system of linear equations is obtained:

$$(V^k)^T A V^k + b^T V^k + c = f(V^k), \qquad k = 1, \dots, n+1,$$

where $A \in \mathbb{R}^{n \times n}$ is the diagonal underestimation matrix whose diagonal elements are the quadratic term coefficients $a_i$ defined in Eqs. (2) or (3), $b \in \mathbb{R}^n$ is the linear coefficient vector whose elements are the $b_i$'s defined in Eqs. (2) or (3), and $c$ is a scalar. Equivalently,

$$b^T V^k + c = f(V^k) - (V^k)^T A V^k, \qquad k = 1, \dots, n+1.$$

The vector $b \in \mathbb{R}^n$ is augmented as $(b, c) \in \mathbb{R}^{n+1}$ in order to include the scalar $c$. In the same way, the matrix $V \in \mathbb{R}^{(n+1) \times n}$, whose $k$th row is $(V^k)^T$, is augmented as $(V, \mathbf{1}) \in \mathbb{R}^{(n+1) \times (n+1)}$, where $\mathbf{1}$ is the column vector of ones in $\mathbb{R}^{n+1}$. $(V, \mathbf{1})$ is a nonsingular square matrix, since the vertices of a simplex are affinely independent. Then we have

$$(b, c)^T = (V, \mathbf{1})^{-1} \big[f(V) - \operatorname{diag}(V A V^T)\big],$$

where $f(V) - \operatorname{diag}(V A V^T) \in \mathbb{R}^{n+1}$ is the column vector whose $k$th entry is $f(V^k) - (V^k)^T A V^k$, one entry for each of the $n+1$ vertices of the current simplex. By virtue of this equation, the linear and constant coefficients defined by Eqs. (2) or (3) are determined uniquely by the quadratic coefficients and the current simplex.
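The computation of $(b, c)$ is a single $(n+1) \times (n+1)$ linear solve. The Python sketch below builds $(V, \mathbf{1})$ and the right-hand side $f(V^k) - (V^k)^T A V^k$ for a diagonal $A$, and solves the system by Gaussian elimination with partial pivoting, written out to keep the example dependency-free. The test function $f(x) = -x_1^2 + x_1 x_2$, the simplex and the coefficients $a = (0, 1/2)$ are our own illustrative choices; for them $2A - H_f = \begin{pmatrix} 2 & -1\\ -1 & 1 \end{pmatrix}$ is positive semidefinite, so $F$ is a valid underestimator on the simplex.

```python
def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def qbb_linear_terms(f, vertices, a):
    """Return (b, c) so that sum a_i x_i^2 + b.x + c equals f at every vertex."""
    rows = [list(v) + [1.0] for v in vertices]                 # the matrix (V, 1)
    rhs = [f(v) - sum(ai * vi * vi for ai, vi in zip(a, v)) for v in vertices]
    sol = solve_linear(rows, rhs)
    return sol[:-1], sol[-1]


def underestimator(a, b, c):
    """The quadratic F(x) = sum a_i x_i^2 + sum b_i x_i + c of Eq. (2)."""
    return lambda x: (sum(ai * xi * xi for ai, xi in zip(a, x))
                      + sum(bi * xi for bi, xi in zip(b, x)) + c)


f = lambda x: -x[0] ** 2 + x[0] * x[1]      # illustrative nonconvex function
V = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # vertices of the current simplex
a = [0.0, 0.5]                              # diagonal of A
b, c = qbb_linear_terms(f, V, a)            # b = (-1, -0.5), c = 0
F = underestimator(a, b, c)
```

With all $a_i = 0$ the same routine returns the affine interpolant of $f$ at the vertices, which is the convex envelope used for concave terms.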

By replacing all the nonconvex functions in problem (P(S)) with their corresponding quadratic-function-based convex underestimators described by Eq. (3), we obtain the following relaxed convex programming problem (QP(S)):

$$(QP(S))\qquad \begin{aligned} \min\ & F(x)\\ \text{subject to}\ & G_j(x) \le 0,\quad j = 1, \dots, m,\\ & x \in S \subset \mathbb{R}^n, \end{aligned}$$

where

$$F(x) = a^f \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} b_i^f x_i + c^f,$$

$$G_j(x) = a^{g_j} \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} b_i^{g_j} x_i + c^{g_j}, \qquad j = 1, 2, \dots, m.$$

Let $D_G$ be the subset of $\mathbb{R}^n$ defined by

$$D_G = \{x \in \mathbb{R}^n : G_j(x) \le 0,\ j = 1, 2, \dots, m\}.$$

The set $D_G$ is convex and closed, so $D_G \cap S$ is convex and compact. It should be noted that only $m + 1$ additional quadratic parameters, i.e., $a^f$ and $a^{g_j}$ for $j = 1, 2, \dots, m$, are introduced during the above transformation process if the uniform underestimation function is used, since all other linear and constant coefficients can be calculated from these quadratic parameters and the current simplex.


QBB Underestimators for Special Structures 


For a concave function structure, denoted by $f^{CA}(x)$, whose Hessian has only nonpositive eigenvalues for all $x \in S$, the quadratic coefficient of its underestimator defined by Eq. (2) can be taken to be zero, so the valid lower bound of the concave function structure over the current simplex is a linear (affine) function. In fact, the valid bound constructed by Eq. (2) is equivalent to the convex envelope of the concave function over a simplex [7]. Let $S$ be a simplex generated by the vertices $V^1, V^2, \dots, V^{n+1}$, i.e., $S = \{x \in \mathbb{R}^n : x = \sum_{i=1}^{n+1} \lambda_i V^i,\ \lambda_i \ge 0,\ \sum_{i=1}^{n+1} \lambda_i = 1\}$, and let $f^{CA}(x)$ be a concave function defined on $S$. Then the convex envelope of $f^{CA}(x)$ over $S$ is the affine function $L^{CA}(x) = b^T x + c$ that is uniquely determined by the system of linear equations $f^{CA}(V^i) = b^T V^i + c$ for $i = 1, \dots, n+1$.

For a general quadratic function

$$f(x) = \tfrac{1}{2} x^T Q x + q^T x$$

with symmetric $Q$ (note that $H_f(x) = Q$), the diagonal underestimation matrix $A$, constructed on the basis of interval arithmetic [7], is given by

$$a = \max_i \Big\{0, \tfrac{1}{2} \lambda_i\Big\},$$

where $\lambda_i$ are the eigenvalues of $Q$, for the uniform case, and by

$$a_i = \max\Big\{0, \tfrac{1}{2}\Big(q_{ii} + \sum_{j \ne i} |q_{ij}|\Big)\Big\}$$

for the nonuniform case. Then the quadratic underestimation function is

$$F(x) = x^T A x + b^T x + c,$$

where the linear and constant coefficients $(b, c)$ are determined uniquely by the quadratic coefficients calculated above and the current simplex.
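The nonuniform formula can be read as a Gerschgorin-type bound: it makes $2A - Q$ diagonally dominant with a nonnegative diagonal, which for symmetric $Q$ implies that $2A - Q$ is positive semidefinite, as required by the convexity condition on $D(x)$. A short Python sketch follows; the function names and the test matrices are illustrative assumptions.

```python
def nonuniform_alpha(Q):
    """a_i = max(0, (q_ii + sum_{j != i} |q_ij|) / 2) for f(x) = x^T Q x / 2 + q^T x."""
    n = len(Q)
    return [max(0.0, 0.5 * (Q[i][i] + sum(abs(Q[i][j]) for j in range(n) if j != i)))
            for i in range(n)]


def shifted_is_diagonally_dominant(Q, a):
    """Check that 2A - Q has a nonnegative diagonal dominating each row.

    For symmetric Q this implies, by the Gerschgorin circle theorem, that
    2A - Q is positive semidefinite.
    """
    n = len(Q)
    return all(2 * a[i] - Q[i][i] >= sum(abs(Q[i][j]) for j in range(n) if j != i)
               for i in range(n))


Q1 = [[-2.0, 1.0], [1.0, 0.0]]    # Hessian of -x1^2 + x1*x2
Q2 = [[1.0, -3.0], [-3.0, 2.0]]
a1 = nonuniform_alpha(Q1)         # [0.0, 0.5]
a2 = nonuniform_alpha(Q2)         # [2.0, 2.5]
```

A coefficient is clipped to zero exactly when row $i$ is already dominated by $-q_{ii}$, so no shift is spent on directions in which $f$ is sufficiently concave.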


Properties of the QBB Underestimator 


For the construction of the QBB underestimator, only the quadratic coefficients need to be calculated, since the linear and constant ones defined by Eqs. (2) or (3) are determined uniquely by the quadratic coefficients and the current simplex. Another important property of the QBB algorithm is that the quadratic underestimator is convex throughout the whole space $\mathbb{R}^n$, not only on the current simplex, because its quadratic coefficients are nonnegative constants. A potential benefit of this property is that the convex solver applied to the underestimation problem may follow either a feasible or an infeasible convergence path. Geometrically speaking, QBB uses a convex quadratic function to approximate the convex part of a general nonconvex function directly, which can bypass the concave parts and avoid overestimation for them.


Function Decomposition 


It should be noted that the relaxed convex programming problem (QP(S)) can contain not only the quadratic underestimation functions for the generic nonconvex terms, but also convex terms, which need not be transformed into quadratic underestimators. The final underestimation strategy of the relaxed problem (QP(S)) can then be decomposed into the following convex programming formulation:

$$(QP(S)')\qquad \begin{aligned} \min\ & F'(x)\\ \text{subject to}\ & G_i'(x) \le 0,\quad i = 1, \dots, m,\\ & x \in S \subset \mathbb{R}^n, \end{aligned}$$

where

$$F'(x) = f^{L}(x) + f^{C}(x) + L_f^{CA}(x) + F^{NC}(x),$$

$$G_i'(x) = g_i^{L}(x) + g_i^{C}(x) + L_{g_i}^{CA}(x) + G_i^{NC}(x), \qquad i = 1, 2, \dots, m,$$

and $f^{L}(x)$, $f^{C}(x)$, $L_f^{CA}(x)$, $g_i^{L}(x)$, $g_i^{C}(x)$, and $L_{g_i}^{CA}(x)$ represent the linear terms, the convex terms, and the linear underestimation functions for the concave terms in the objective function and the constraints, respectively. $F^{NC}(x)$ and $G_i^{NC}(x)$ represent the quadratic convex underestimation functions for the generic nonconvex terms. Compared with the relaxed problem (QP(S)), the relaxed problem (QP(S)') contains not only quadratic function terms, but also the generic convex terms of the original problem.


Algorithmic Procedure of QBB 


At the start of this section, problem (P) is formulated over an initial simplex $S^0$, which can easily be obtained by an outer approximation approach. We are now in a position to present the proposed algorithm for solving problem (P) using the basic operations described in the previous sections.

Step 1 - Initialization. A convergence tolerance $\varepsilon_c$ and a feasibility tolerance $\varepsilon_f$ are selected, and the iteration counter $k$ is set to zero. The global lower and upper bounds $\mu_0$ and $\gamma_0$ of the global minimum of problem (P) are initialized, and an initial current point $x^{0,c}$ is randomly selected.

Step 2 - Local solution of problem (P) and update of upper bound. The nonconvex and nonlinear optimization problem (P) is solved locally within the current simplex $S^k$. If the solution $f^{k}_{\mathrm{loc}}$ of problem (P) is $\varepsilon_f$-feasible, the upper bound $\gamma_k$ is updated as $\gamma_k = \min(\gamma_k, f^{k}_{\mathrm{loc}})$.

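Step 3 - Partitioning of the simplex. The current simplex $S^k = [V^{k,0}, \dots, V^{k,n}]$ is partitioned into the following two subsimplexes ($r = 1, 2$):

$$
\begin{aligned}
S^{k,1} &= \bigl[V^{k,0}, \dots, V^{k,m}, \dots, \tfrac{1}{2}(V^{k,m} + V^{k,l}), \dots, V^{k,n}\bigr],\\
S^{k,2} &= \bigl[V^{k,0}, \dots, \tfrac{1}{2}(V^{k,m} + V^{k,l}), \dots, V^{k,l}, \dots, V^{k,n}\bigr],
\end{aligned}
$$

where $V^{k,m}$ and $V^{k,l}$ are the endpoints of the longest edge of the current simplex, i.e., $(m, l) = \arg\max_{i<j} \{\lVert V^{k,i} - V^{k,j}\rVert\}$.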
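Step 3 can be sketched in Python as a longest-edge bisection. This is a minimal illustration under stated assumptions: a simplex is stored as a list of vertex tuples, the labelling of the two children as r = 1 and r = 2 is an assumed convention, and the function name `bisect_longest_edge` is hypothetical.

```python
import math
from itertools import combinations


def bisect_longest_edge(vertices):
    """Split a simplex (list of vertex tuples) at the midpoint of its longest edge.

    Assumed convention: child r = 1 keeps V[m] and replaces V[l] by the midpoint,
    child r = 2 replaces V[m] by the midpoint.
    """
    # (m, l) = argmax over vertex pairs i < j of ||V_i - V_j||_2
    m, l = max(combinations(range(len(vertices)), 2),
               key=lambda pair: math.dist(vertices[pair[0]], vertices[pair[1]]))
    midpoint = tuple((a + b) / 2 for a, b in zip(vertices[m], vertices[l]))
    child1, child2 = list(vertices), list(vertices)
    child1[l] = midpoint
    child2[m] = midpoint
    return child1, child2


# A triangle in R^2 whose longest edge joins (4, 0) and (0, 1).
s1, s2 = bisect_longest_edge([(0.0, 0.0), (4.0, 0.0), (0.0, 1.0)])
print(s1)  # [(0.0, 0.0), (4.0, 0.0), (2.0, 0.5)]
print(s2)  # [(0.0, 0.0), (2.0, 0.5), (0.0, 1.0)]
```

Because the longest edge is always the one that is split, repeated bisection of this kind is exhaustive (the diameters of nested subsimplexes shrink to zero), which is the property the convergence argument of the algorithm relies on.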

Step 4 - Update of $\alpha^r_{k,f}$, $b^r_{k,f}$, $c^r_{k,f}$ and $\alpha^r_{k,g_i}$, $b^r_{k,g_i}$, $c^r_{k,g_i}$ inside both subsimplexes $r = 1, 2$. The nonnegative parameters $\alpha^r_{k,f}$ and $\alpha^r_{k,g_i}$ of the general nonconvex terms in the objective function and the constraints are updated inside both subsimplexes $r = 1, 2$ according to the methods presented in former sections, and the corresponding linear and constant coefficients, i.e., $b^r_{k,f}$, $c^r_{k,f}$ and $b^r_{k,g_i}$, $c^r_{k,g_i}$, are renewed accordingly.

Step 5 - Solutions inside both subsimplexes $r = 1, 2$. The convex programming problem (QP(S)') is solved inside both subsimplexes ($r = 1, 2$) by some nonlinear programming solver. If a solution $F^{k,r}_{sol}$ is feasible and less than the current upper bound $\gamma_k$, it is stored along with the solution point $x^{k,r}_{sol}$.

Step 6 - Update of the iteration counter $k$ and the lower bound $\mu_k$. The iteration counter is increased by 1,

$$
k \leftarrow k + 1,
$$

and the lower bound $\mu_k$ is updated to the minimum solution over those stored from the previous iterations. Furthermore, the selected solution is erased from the stored set $I$:

$$
\mu_k = F^{k',r'}_{sol}, \qquad F^{k',r'}_{sol} = \min\bigl\{F^{l,r}_{sol} \in I : r = 1, 2,\ l = 1, \dots, k-1\bigr\}.
$$

If the set $I$ is empty, set $\mu_k = \gamma_k$ and go to step 8.

Step 7 - Update of the current point $x^k$ and the current simplex $S^k$. The current point is selected to be the solution point of the minimum solution found in step 6,

$$
x^k = x^{k',r'}_{sol},
$$

and the current simplex becomes the subsimplex containing that solution,

$$
S^k =
\begin{cases}
\bigl[V^{k',0}, \dots, V^{k',m}, \dots, \tfrac{1}{2}(V^{k',m} + V^{k',l}), \dots, V^{k',n}\bigr], & \text{if } r' = 1,\\
\bigl[V^{k',0}, \dots, \tfrac{1}{2}(V^{k',m} + V^{k',l}), \dots, V^{k',l}, \dots, V^{k',n}\bigr], & \text{otherwise.}
\end{cases}
$$


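Step 8 - Check for convergence. If $\gamma_k - \mu_k > \epsilon_c$, return to step 2. Otherwise, $\epsilon_c$-convergence has been reached, and the global minimum solution and the solution point are given as

$$
f^* \leftarrow F^{k'',r''}_{sol}, \qquad x^* \leftarrow x^{k'',r''}_{sol},
$$

where $(k'', r'') = \arg_{l,r}\bigl\{F^{l,r}_{sol} = \gamma_k : r = 1, 2,\ l = 1, \dots, k\bigr\}$.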


Conclusion 


The QBB algorithm is guaranteed to identify the global optimum solution of problems belonging to the broad class of twice-differentiable NLPs. For any such problem, the ability to generate progressively tighter convex lower bounding problems in a branch-and-bound framework guarantees the convergence of this algorithm to within $\epsilon$ of the global optimum solution under the exhaustive simplicial partition of the initial simplex. Different methods [7] have been developed for the construction of valid convex underestimators for special function structures and for generic nonconvex function structures, where the maximal eigenvalue analysis of the interval Hessian matrix provides the rigorous guarantee for the QBB algorithm to converge to the global solution.
QR factorization is the process of reducing a square (rectangular) matrix to upper triangular (upper trapezoidal) form by applying a sequence of elementary orthogonal transformations.


Properties of Orthogonal Transforms 


Orthogonal transformations are transformations whose matrices are orthogonal. Orthogonal matrices are square matrices in which each column is a unit vector and every column is orthogonal to every other column. Thus $Q \in \mathbb{R}^{n\times n}$ is orthogonal if and only if $Q^TQ = QQ^T = I$ (i.e., the transpose of an orthogonal matrix is its inverse). Orthogonal transformations preserve the 2-norm, i.e., $\lVert Qx\rVert_2 = \lVert x\rVert_2$. More details can be found in [8]. There are two popular orthogonal transformations: Householder and Givens transformations.


Householder Transformations 


These are named after A.S. Householder, who popularized their use in matrix computations; however, the properties of these matrices have been known for quite some time. For any nonzero $v \in \mathbb{R}^n$, a matrix $H$ of the form

$$
H = I - \frac{2\,vv^T}{v^Tv}
$$

is called a Householder transformation. It is easy to verify that $H$ is symmetric and orthogonal (which also means that it is its own inverse). Identity matrices are not Householder matrices. Geometrically, a Householder matrix reflects a given vector across the hyperplane orthogonal to $v$ (without stretching or shrinking it). Given any two distinct vectors $x$ and $y$ such that $\lVert x\rVert_2 = \lVert y\rVert_2$, there exists a Householder transformation $H$ such that $Hx = y$ (it is easy to verify that $v = y - x$ satisfies this equation). Note that $v$ completely characterizes $H$: even though $H$ is an $n\times n$ matrix, $v$ is enough to reconstruct $H$ and to apply it. Also, scaling $v$ by a nonzero scalar $\alpha$ does not change the transformation $H$.
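The properties listed above can be checked numerically. The following pure-Python sketch (helper names are illustrative, and forming H explicitly is for demonstration only) builds H from v = y - x for two vectors of equal 2-norm and verifies symmetry, H H = I, Hx = y, and invariance under scaling of v.

```python
def householder(v):
    """Explicit H = I - 2 v v^T / (v^T v) for a nonzero vector v (illustration only)."""
    vtv = sum(t * t for t in v)
    n = len(v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vtv for j in range(n)]
            for i in range(n)]


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def close(u, w, tol=1e-12):
    return all(abs(a - b) <= tol for a, b in zip(u, w))


x = [3.0, 4.0, 0.0]
y = [0.0, 0.0, 5.0]                        # ||x||_2 = ||y||_2 = 5
v = [b - a for a, b in zip(x, y)]          # v = y - x
H = householder(v)

assert all(close(H[i], [H[j][i] for j in range(3)]) for i in range(3))        # symmetric
HH = matmul(H, H)
assert all(close(HH[i], [float(i == j) for j in range(3)]) for i in range(3))  # H H = I
assert close(matvec(H, x), y)                                                # H x = y
assert all(close(a, b) for a, b in zip(H, householder([-2.5 * t for t in v])))  # scaling v
```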


QR Factorization 
Using Householder Transformations 


Since Householder transformations reflect vectors in $n$ dimensions without changing their length, they can be used to introduce zeros selectively. Specifically, given any vector $0 \ne x \in \mathbb{R}^n$, one can construct a Householder matrix $H$ such that $Hx$ is a multiple of $e_1$ (the first column of the identity matrix), i.e., every entry of $Hx$ except the first is zero. Geometrically, this amounts to reflecting the vector onto the first coordinate axis. It is easy to see that such an $H$ has the form $H = I - 2vv^T/v^Tv$ with $v = x \pm \alpha e_1$ and $\alpha = \lVert x\rVert_2$. In order to avoid subtracting nearly equal numbers in floating point arithmetic, $v$ is usually chosen as $v = x + \operatorname{sign}(\xi_1)\,\alpha\, e_1$, where $\xi_1$ is the first element of $x$.

The following function House computes the vector $v$, given $x$, that characterizes $H$, so that $H = I - 2vv^T/v^Tv$ and $Hx = -\operatorname{sign}(\xi_1)\,\alpha\, e_1$. Also, $v$ is scaled such that $v(1) = 1$, since the scaling does not affect $H$ (using a notation similar to MATLAB [5]).

To apply $H$ to a vector $y$, note that

$$
Hy = \Bigl(I - \frac{2\,vv^T}{v^Tv}\Bigr) y = y - \frac{2\,v^Ty}{v^Tv}\, v,
$$

and hence one can compute $Hy$ without explicitly computing $H$. The same idea extends to applying $H$ to a set of columns $C \in \mathbb{R}^{n\times k}$; let us call that function row.House(v, C).


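function v = House(x)
  % Householder vector v, scaled so that v(1) = 1, with
  % (I - 2*v*v'/(v'*v))*x = -sign(x(1))*norm(x)*e_1.  Requires x ~= 0.
  n = length(x);
  s = sign(x(1));
  if s == 0, s = 1; end          % MATLAB's sign(0) is 0; use +1 instead
  v = x;
  v(1) = x(1) + s*norm(x, 2);    % same signs, so no cancellation
  v(2:n) = x(2:n)/v(1);
  v(1) = 1;
end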
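A Python transcription of House and row.House may look as follows. It is a sketch rather than the MATLAB code of [5]: the names `house` and `row_house` are hypothetical (a dot is not allowed in a Python identifier), matrices are lists of rows, and sign(0) is taken as +1.

```python
import math


def house(x):
    """Return v with v[0] == 1 such that (I - 2 v v^T / v^T v) x = -sign(x[0]) ||x||_2 e_1.

    Requires x != 0; sign(0) is treated as +1.
    """
    alpha = math.sqrt(sum(t * t for t in x))
    s = 1.0 if x[0] >= 0 else -1.0
    v0 = x[0] + s * alpha              # x[0] and s*alpha share a sign: no cancellation
    return [1.0] + [t / v0 for t in x[1:]]


def row_house(v, C):
    """Overwrite C (list of rows, len(v) rows) with H C without forming H.

    Each column y of C is replaced by y - 2 (v^T y / v^T v) v.
    """
    vtv = sum(t * t for t in v)
    for j in range(len(C[0])):
        beta = 2.0 * sum(v[i] * C[i][j] for i in range(len(v))) / vtv
        for i in range(len(v)):
            C[i][j] -= beta * v[i]
    return C


C = [[3.0, 1.0],
     [4.0, 2.0]]
v = house([3.0, 4.0])      # [1.0, 0.5]
row_house(v, C)            # first column of C becomes [-5.0, 0.0]
```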


Suppose $H_1$ is the Householder matrix defined by $v$ = House(x), with $x$ taken as the first column of a matrix $A \in \mathbb{R}^{m\times n}$. Then $H_1A$ has zeros in the first column below the first row. Next, one can find $H_2'$ from House applied to $(H_1A)(2:m, 2)$ such that everything below the second row of the second column is zeroed. Effectively, applying $H_2'$ to the lower $(m-1)\times(n-1)$ submatrix is the same as applying

$$
H_2 = \begin{pmatrix} 1 & 0 \\ 0 & H_2' \end{pmatrix}
$$

(with diagonal blocks of orders $1$ and $m-1$) to $H_1A$. Note that $H_2$ does not affect the first row and column of $H_1A$. If this process is continued by applying a sequence of Householder transformations to $A$, it is reduced to an upper-trapezoidal matrix $R$, i.e.,


$$
H_p \cdots H_2 H_1 A = R, \qquad p = \min(m-1, n). \tag{1}
$$


Since each $H_i$ is orthogonal, the product $Q^T = H_p \cdots H_2 H_1$ is also orthogonal. Rearranging the equation gives

$$
A = QR, \tag{2}
$$


where Q is orthogonal and R is upper-trapezoidal. This 
form of factorization is called QR factorization (or or- 
thogonal factorization). The following algorithm com- 
putes the QR factorization of a matrix A. 


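function A = QR(A, m, n)
  % Overwrites A with R on and above the diagonal and with the essential
  % parts v(2:end) of the Householder vectors below the diagonal.
  for i = 1:min(m-1, n)
    v = House(A(i:m, i));
    A(i:m, i:n) = row.House(v, A(i:m, i:n));   % apply H_i to rows i..m
    A(i+1:m, i) = v(2:m-i+1);                  % store v below the diagonal
  end
end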


In the above algorithm, the essential components of 
the Householder vectors are stored right where the ze- 
ros are going to be introduced. Here are the different 
parts of a matrix A after the algorithm is applied (su- 
perscripts indicate how many times an entry has been 
modified): 


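$$
\begin{pmatrix}
a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} & \cdots & a_{1n}^{(1)}\\
v_{21} & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)}\\
v_{31} & v_{32} & a_{33}^{(3)} & \cdots & a_{3n}^{(3)}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
v_{m1} & v_{m2} & v_{m3} & \cdots & v_{mn}
\end{pmatrix}
$$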
$v_{ij}$ is the $i$th component of the vector $v$ produced by the above algorithm during the $j$th iteration. Note that the matrix $Q$ is available, in factored form, in the strictly lower triangular portion of the matrix.
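For experimentation, the factorization can also be written so that Q is formed explicitly. The sketch below (pure Python; the name `householder_qr` is hypothetical) applies the same reflections with v = x + sign(ξ1)‖x‖2 e1 but, unlike the storage scheme above, accumulates Q = H1 H2 ⋯ Hp instead of keeping the vectors v below the diagonal.

```python
import math


def householder_qr(A):
    """Return (Q, R) with A = Q R for an m x n matrix A given as a list of rows."""
    m, n = len(A), len(A[0])
    R = [[float(a) for a in row] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            continue                                  # nothing to eliminate
        v = x[:]
        v[0] += alpha if x[0] >= 0 else -alpha        # v = x + sign(x_1) ||x||_2 e_1
        vtv = sum(t * t for t in v)
        # R <- H_k R, touching rows k..m-1 and columns k..n-1 only
        for j in range(k, n):
            beta = 2.0 * sum(v[t] * R[k + t][j] for t in range(len(v))) / vtv
            for t in range(len(v)):
                R[k + t][j] -= beta * v[t]
        for i in range(k + 1, m):
            R[i][k] = 0.0                             # exact zeros below the diagonal
        # Q <- Q H_k, touching columns k..m-1 only
        for i in range(m):
            beta = 2.0 * sum(Q[i][k + t] * v[t] for t in range(len(v))) / vtv
            for t in range(len(v)):
                Q[i][k + t] -= beta * v[t]
    return Q, R


A = [[12.0, -51.0, 4.0],
     [6.0, 167.0, -68.0],
     [-4.0, 24.0, -41.0]]
Q, R = householder_qr(A)
# R[0][0] is -14 up to rounding: -sign(12) * ||(12, 6, -4)||_2
```

Because of the sign choice, the diagonal of R produced this way can be negative; flipping the signs of the corresponding rows of R and columns of Q gives the factorization with a positive diagonal.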


Givens Rotations 


These rotations are named after W. Givens; they are also referred to as Jacobi rotations. C.G. Jacobi devised a symmetric eigenvalue algorithm based on these transformations in 1846. Consider the following $2\times 2$ matrix

$$
G(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
$$

applied to a vector $x \in \mathbb{R}^2$. It is easy to see that $G^T x$ is a rotation of $x$ by an angle $\theta$ in the counterclockwise direction. Such transformations are called rotations and, as such, are orthogonal. A straightforward extension that applies to an $n$-vector is given by matrices of the following form


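$$
G(i,j,\theta) =
\begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \ddots & \vdots & & \vdots & & \vdots\\
0 & \cdots & c & \cdots & s & \cdots & 0\\
\vdots & & \vdots & \ddots & \vdots & & \vdots\\
0 & \cdots & -s & \cdots & c & \cdots & 0\\
\vdots & & \vdots & & \vdots & \ddots & \vdots\\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix},
$$

where the entries $c$ and $s$ lie in rows and columns $i$ and $j$, with $c^2 + s^2 = 1$ ($c = \cos\theta$, $s = \sin\theta$). Here $G^T(i,j,\theta)\,x$ is a rotation of $x \in \mathbb{R}^n$ by an angle $\theta$ in the counterclockwise direction in the $(i,j)$-plane. It is easy to verify that $G^T(i,j,\theta)$ modifies only components $i$ and $j$ of the vector it is applied to, and the remaining entries are unaffected:

$$
\bigl(G^T(i,j,\theta)\,x\bigr)_k =
\begin{cases}
c\,x_i - s\,x_j, & k = i,\\
s\,x_i + c\,x_j, & k = j,\\
x_k, & \text{otherwise.}
\end{cases}
$$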


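Given any vector $x \in \mathbb{R}^n$, $G^T(i,j,\theta)$ can be constructed such that only components $i$ and $j$ are affected and $x_j$ is zeroed. Solving the equations

$$
s\,x_i + c\,x_j = 0 \quad\text{and}\quad c^2 + s^2 = 1
$$

will yield, for $(x_i, x_j) \ne (0, 0)$,

$$
c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \qquad s = \frac{-x_j}{\sqrt{x_i^2 + x_j^2}}. \tag{3}
$$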
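Formula (3) translates directly into code. In this sketch the names `givens` and `apply_givens_t` are illustrative; `math.hypot` is used to form the square root without overflow, and returning the identity rotation when x_i = x_j = 0 is an assumption the text does not address.

```python
import math


def givens(xi, xj):
    """Return (c, s) with c^2 + s^2 = 1 and s*xi + c*xj = 0, as in (3)."""
    r = math.hypot(xi, xj)               # sqrt(xi^2 + xj^2) without overflow
    if r == 0.0:
        return 1.0, 0.0                  # nothing to zero: identity rotation (assumption)
    return xi / r, -xj / r


def apply_givens_t(x, i, j, c, s):
    """Return G^T(i, j, theta) x; only components i and j change."""
    y = list(x)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y


c, s = givens(3.0, 4.0)                              # c = 0.6, s = -0.8
y = apply_givens_t([3.0, 1.0, 4.0], 0, 2, c, s)      # approximately [5.0, 1.0, 0.0]
```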


Let $G_{ij}$ denote the application of a Givens rotation that uses rows $i$ and $j$ and zeros the $a_{ji}$ entry. The first column below the first row can be zeroed using a sequence of Givens rotations such as

$$
Q_1 = G_{1m} \cdots G_{13}\, G_{12}.
$$
Similarly, the second column can be zeroed below the diagonal by

$$
Q_2 = G_{2m} \cdots G_{24}\, G_{23}.
$$

Repeating this process for each column, $A$ is reduced to upper-trapezoidal form, as in

$$
Q_p \cdots Q_2\, Q_1 A = R, \qquad p = \min(m-1, n).
$$
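Putting the pieces together gives a Givens-based QR factorization. The sketch below (pure Python; the name `givens_qr` is hypothetical) processes the columns in the order of Q1, Q2, … above, always rotating row k against a row j > k, and updates Q so that A = QR holds after every rotation.

```python
import math


def givens_qr(A):
    """Return (Q, R) with A = Q R, using one Givens rotation per zeroed entry."""
    m, n = len(A), len(A[0])
    R = [[float(a) for a in row] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):            # Q_1, Q_2, ...: one column at a time
        for j in range(k + 1, m):             # rotate rows k and j to zero R[j][k]
            a, b = R[k][k], R[j][k]
            r = math.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, -b / r              # formula (3): s*a + c*b = 0
            for col in range(n):              # R <- G^T R changes rows k and j only
                rk, rj = R[k][col], R[j][col]
                R[k][col] = c * rk - s * rj
                R[j][col] = s * rk + c * rj
            R[j][k] = 0.0                     # store an exact zero
            for row in range(m):              # Q <- Q G changes columns k and j only
                qk, qj = Q[row][k], Q[row][j]
                Q[row][k] = c * qk - s * qj
                Q[row][j] = s * qk + c * qj
    return Q, R


A = [[12.0, -51.0, 4.0],
     [6.0, 167.0, -68.0],
     [-4.0, 24.0, -41.0]]
Q, R = givens_qr(A)
```

Only two rows of R (and two columns of Q) change per rotation, which is what makes rotations on disjoint row pairs independent of each other.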


An attractive feature of Givens rotations is that they can be applied in various orders while still producing essentially the same final QR factorization, and this flexibility can be exploited very effectively in parallel processing. Detailed parallel QR factorization algorithms can be found in [6,7] and [3].


Fast Givens Transformations 


Fast Givens transformations require half the number of multiplications of Givens rotations, and they can be used to introduce zeros without any explicit square root computation. They are also referred to as square-root-free Givens transformations. Details can be found in [1,2,4].

Finally, it can be shown that if $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) has full column rank, then its (reduced) QR factorization is unique once the diagonal elements of $R$ are required to be positive [8].


See also 


> ABS Algorithms for Linear Equations and Linear 
Least Squares 

> Cholesky Factorization 

> Interval Linear Systems 

> Large Scale Trust Region Problems 

> Large Scale Unconstrained Optimization 

> Linear Programming 

> Orthogonal Triangularization 




> Overdetermined Systems of Linear Equations 

> Solving Large Scale and Sparse Semidefinite 
Programs 

> Symmetric Systems of Linear Equations 
The quadratic assignment problem (QAP) is a combinatorial optimization problem that, although a substantial amount of research has been devoted to it, is still, to this date, not well solvable, in the sense that no exact algorithm can solve problems of size $n > 20$ in reasonable computational time. The QAP can be viewed as a natural extension of the linear assignment problem (LAP; cf. also ▶ Assignment and matching). Let $S_n$ denote the set of all permutations $\phi: N \to N$, where $N = \{1, \dots, n\} \subset \mathbb{Z}^+$. Given a cost matrix $C = (c_{ij}) \in \mathbb{R}^{n\times n}$, we can formulate the LAP using permutations as:


$$
\min_{\phi \in S_n} \sum_{i=1}^{n} c_{i\phi(i)}. \tag{1}
$$
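To make the permutation notation concrete, the LAP (1) can be solved for a tiny instance by enumerating S_n. This brute-force sketch (function name hypothetical, 0-based indices, `phi[i]` standing for φ(i)) is purely illustrative: the LAP itself is solvable in polynomial time, for example by the Hungarian method in O(n³).

```python
from itertools import permutations


def lap_brute_force(C):
    """Solve the LAP (1) by enumerating all n! permutations; returns (cost, phi)."""
    n = len(C)
    return min((sum(C[i][phi[i]] for i in range(n)), phi)
               for phi in permutations(range(n)))


print(lap_brute_force([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # (5, (1, 0, 2))
```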
The general formulation of the QAP as introduced by E.L. Lawler in [88] is obtained by increasing the dimension of the cost array $C$:

$$
\min_{\phi \in S_n} \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij\phi(i)\phi(j)}. \tag{2}
$$

Formulation (2) will be referred to as the general QAP, while an instance will be denoted by QAP(C). The most widely used formulation of the QAP, and its first appearance in the literature, is that of T.C. Koopmans and M.J. Beckmann [85], which is a special case of (2). Used as a mathematical model for the location of a set of indivisible economic activities, the formulation of Koopmans and Beckmann involves three $n\times n$ input matrices with real elements $F = (f_{ij})$, $D = (d_{kl})$ and $B = (b_{ik})$, where $f_{ij}$ is the flow between facility $i$ and facility $j$, $d_{kl}$ is the distance between location $k$ and location $l$, and $b_{ik}$ is the cost of placing facility $i$ at location $k$. The objective is to assign each facility to a location such that the total cost is minimized. The Koopmans-Beckmann QAP formulation is given as follows:


$$
\min_{\phi \in S_n} \sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}\, d_{\phi(i)\phi(j)} + \sum_{i=1}^{n} b_{i\phi(i)}. \tag{3}
$$
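Formulation (3) can likewise be evaluated and, for very small n, minimized by enumeration. In this illustrative sketch (hypothetical function names, 0-based indices, `phi[i]` is the location of facility i), three facilities are placed on three equally spaced locations on a line; enumeration serves only as a reference, since the QAP, unlike the LAP, is NP-hard.

```python
from itertools import permutations


def kb_objective(F, D, B, phi):
    """Koopmans-Beckmann objective (3) for the assignment phi."""
    n = len(F)
    flow_cost = sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))
    return flow_cost + sum(B[i][phi[i]] for i in range(n))


def qap_brute_force(F, D, B):
    """Exact solution by enumerating all n! assignments (tiny n only)."""
    n = len(F)
    return min((kb_objective(F, D, B, phi), phi) for phi in permutations(range(n)))


F = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]   # flows between facilities
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # three equally spaced locations on a line
B = [[0] * 3 for _ in range(3)]         # no linear term: QAP(F, D)
print(qap_brute_force(F, D, B))  # (24, (0, 1, 2))
```

Facility 1, which has the largest total flow, ends up in the middle location.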


In the context of facility location (cf. also ▶ Facilities layout problems) the matrices $F$ and $D$ are symmetric with zeros on the diagonal, and all the matrices are nonnegative. An instance of a QAP with input matrices $F$, $D$ and $B$ will be denoted by QAP(F, D, B), while we will denote an instance by QAP(F, D) if there is no linear term (i.e., $B = 0$). It can be seen that (3) is a special case of (2) by setting $c_{ijkl} = f_{ij} d_{kl}$ for all $i, j, k, l$ with $i \ne j$ or $k \ne l$, and $c_{iikk} = f_{ii} d_{kk} + b_{ik}$ otherwise. In terms of computational complexity (cf. also ▶ Complexity theory; ▶ Computational complexity theory), S. Sahni and T. Gonzalez [129] have shown that the QAP is NP-hard and that even finding an approximate solution within some constant factor of the optimal solution cannot be done in polynomial time unless P = NP.


Formulations 


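The QAP can be formulated as the following 0-1 integer programming problem with a quadratic objective function (hence the name 'quadratic assignment problem'):

$$
\begin{aligned}
\min\ & \sum_{\substack{i,j=1\\ i \ne j}}^{n}\ \sum_{\substack{k,l=1\\ k \ne l}}^{n} c_{ijkl}\, x_{ik}\, x_{jl} + \sum_{i,j=1}^{n} c_{iijj}\, x_{ij}\\
\text{s.t.}\ & \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, n,\\
& \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n,\\
& x_{ij} \in \{0, 1\}, \quad i, j = 1, \dots, n.
\end{aligned} \tag{4}
$$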

The above formulation is a direct consequence of formulation (2), where the constraints imposed by the permutations are expressed algebraically. A QAP in Koopmans-Beckmann form can be formulated more compactly using the inner product $\langle U, V\rangle = \sum_{i,j} u_{ij} v_{ij}$ between two matrices:

$$
\begin{aligned}
\min\ & \langle F, XDX^T\rangle + \langle B, X\rangle\\
\text{s.t.}\ & X \in \mathcal{X}_n,
\end{aligned} \tag{5}
$$

where $\mathcal{X}_n$ is the set of all permutation matrices $X = (x_{ij})$, i.e., the matrices whose elements satisfy the constraints in (4).
In the objective function of (4), let the coefficients $c_{ijkl}$ be the entries of an $n^2 \times n^2$ matrix $S$, such that $c_{ijkl}$ is on row $(i-1)n+k$ and column $(j-1)n+l$; replacing $S$ by $(S + S^T)/2$ if necessary, which does not change the value of the objective, $S$ may be assumed symmetric. Now let $Q := S - \alpha I$, where $I$ is the $(n^2 \times n^2)$ unit matrix and $\alpha$ is greater than the row norm $\lVert S\rVert_\infty$ of matrix $S$. Subtracting a constant from the entries on the main diagonal of $S$ does not change the optimal solutions of the corresponding QAP; it simply adds a constant to the objective function (namely $-\alpha n$, since a permutation matrix has exactly $n$ entries equal to 1). Hence we can consider a QAP with coefficient array $Q$ instead of $S$. Let $x = (x_{11}, \dots, x_{1n}, x_{21}, \dots, x_{nn})^T = (x_1, \dots, x_{n^2})^T$. Then we can rewrite the objective function of the QAP with array of coefficients $Q$ as a quadratic form $x^TQx$, where it can be shown that $Q$ is symmetric and negative definite. Therefore we have a quadratic concave minimization problem (cf. also ▶ Concave programming) and can formulate the QAP as
$$
\begin{aligned}
\min\ & x^TQx\\
\text{s.t.}\ & \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, n,\\
& \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n,\\
& x_{ij} \ge 0, \quad i, j = 1, \dots, n.
\end{aligned} \tag{6}
$$

The above formulation was introduced in [14] and was used to derive cutting plane procedures (cf. also ▶ Integer programming: Cutting plane algorithms). By adding the term $\alpha I$ to $S$ instead of subtracting it in the definition of $Q$, we could always assume that the objective function of the QAP is convex. This leads to the formulation of the QAP as a quadratic convex minimization problem. The QAP can also be formulated using the trace of a matrix as:


$$
\begin{aligned}
\min\ & \operatorname{tr}\bigl((FXD^T + B)X^T\bigr)\\
\text{s.t.}\ & X \in \mathcal{X}_n.
\end{aligned} \tag{7}
$$


The trace formulation of the QAP first appeared in [47], 
and was used in [51] to introduce eigenvalue lower 
bounding techniques for symmetric QAPs. 
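The construction of S and Q := S - αI described above can be checked numerically on a small Koopmans-Beckmann instance, using the coefficients c_ijkl from the reduction of (3) to (2). In this sketch (pure Python, hypothetical helper names, 0-based indices so that row (i-1)n+k becomes i·n+k), S is symmetric because F and D are; the last assertion is a Gershgorin-type sufficient check that Q is negative definite, not a full eigenvalue computation.

```python
from itertools import permutations


def kb_value(F, D, B, phi):
    """Objective (3), 0-based: phi[i] is the location assigned to facility i."""
    n = len(F)
    return (sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))
            + sum(B[i][phi[i]] for i in range(n)))


def s_matrix(F, D, B):
    """n^2 x n^2 matrix S holding c_ijkl = f_ij d_kl (+ b_ik if i == j and k == l)
    at row i*n + k and column j*n + l."""
    n = len(F)
    S = [[0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    extra = B[i][k] if (i == j and k == l) else 0
                    S[i * n + k][j * n + l] = F[i][j] * D[k][l] + extra
    return S


def perm_vector(phi):
    """x = (x_11, ..., x_1n, x_21, ..., x_nn) with x_ik = 1 iff phi(i) = k."""
    n = len(phi)
    return [1 if phi[i] == k else 0 for i in range(n) for k in range(n)]


def quad_form(M, x):
    return sum(x[p] * M[p][q] * x[q] for p in range(len(x)) for q in range(len(x)))


F = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
B = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
n = len(F)
S = s_matrix(F, D, B)
alpha = 1 + max(sum(abs(e) for e in row) for row in S)       # alpha > ||S||_inf
N = n * n
Q = [[S[p][q] - (alpha if p == q else 0) for q in range(N)] for p in range(N)]

for phi in permutations(range(n)):
    x = perm_vector(phi)
    assert quad_form(S, x) == kb_value(F, D, B, phi)               # x^T S x equals (3)
    assert quad_form(Q, x) == kb_value(F, D, B, phi) - alpha * n   # shifted by -alpha*n

# Symmetric, negative diagonal, strictly diagonally dominant => negative definite.
assert all(Q[p][q] == Q[q][p] for p in range(N) for q in range(N))
assert all(-Q[p][p] > sum(abs(Q[p][q]) for q in range(N) if q != p) for p in range(N))
```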

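Let $\operatorname{vec}(X) \in \mathbb{R}^{n^2}$ be the vector formed by stacking the columns of a permutation matrix $X$. The QAP can be formulated using the Kronecker product as

$$
\begin{aligned}
\min\ & \operatorname{vec}(X)^T (D \otimes F) \operatorname{vec}(X) + \operatorname{vec}(B)^T \operatorname{vec}(X)\\
\text{s.t.}\ & X \in \mathcal{X}_n.
\end{aligned} \tag{8}
$$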
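The agreement of (3), (5), (7) and (8) can be verified by brute force. The sketch below (pure Python, hypothetical helper names, 0-based indices) uses deliberately nonsymmetric F and D, stacks columns in vec(·), and builds D ⊗ F, the factor order implied by the identity vec(FXDᵀ) = (D ⊗ F) vec(X).

```python
from itertools import permutations


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inner(A, B):
    """<A, B> = sum_ij a_ij b_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def vec(A):
    """Stack the columns of A into one list."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def kron(A, B):
    p, q = len(B), len(B[0])
    return [[A[r // p][c // q] * B[r % p][c % q] for c in range(len(A[0]) * q)]
            for r in range(len(A) * p)]


def all_formulations(F, D, B, phi):
    """Values of (3), (5), (7) and (8) for the permutation phi."""
    n = len(F)
    X = [[1 if phi[i] == k else 0 for k in range(n)] for i in range(n)]
    v3 = (sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))
          + sum(B[i][phi[i]] for i in range(n)))
    v5 = inner(F, matmul(matmul(X, D), transpose(X))) + inner(B, X)
    M = matmul(matmul(F, X), transpose(D))
    M = [[M[i][j] + B[i][j] for j in range(n)] for i in range(n)]
    v7 = sum(matmul(M, transpose(X))[i][i] for i in range(n))
    x, K = vec(X), kron(D, F)
    v8 = (sum(x[r] * K[r][c] * x[c] for r in range(n * n) for c in range(n * n))
          + sum(b * t for b, t in zip(vec(B), x)))
    return v3, v5, v7, v8


F = [[0, 3, 1], [2, 0, 4], [5, 1, 0]]      # deliberately nonsymmetric
D = [[0, 2, 7], [1, 0, 3], [4, 6, 0]]
B = [[1, 0, 2], [0, 3, 1], [2, 2, 0]]
for phi in permutations(range(3)):
    assert len(set(all_formulations(F, D, B, phi))) == 1
```

With the factors in the other order (F ⊗ D) the same quadratic form evaluates the assignment described by Xᵀ instead; for the 3-cycle φ = (1, 2, 0) of this instance it gives 56 rather than 55 for the quadratic part.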


Using the Kronecker product, Lawler [88] provided an alternative formulation of the QAP as an $n^2 \times n^2$ LAP. An $n^2 \times n^2$ matrix $C$ is constructed from the $n^4$ costs $c_{ijkl}$, such that the $(ijkl)$th element corresponds to the $((i-1)n+k,\ (j-1)n+l)$th element of $C$. The QAP is then equivalent to an LAP of dimension $n^2$ with $C$ as the cost matrix, and with the additional constraint that the $n^2 \times n^2$ permutation matrix which defines a feasible solution must be the Kronecker product of two permutation matrices of dimension $n$. In other words, the QAP is equivalent to

$$
\begin{aligned}
\min\ & \langle C, Y\rangle\\
\text{s.t.}\ & Y = X \otimes X,\\
& X \in \mathcal{X}_n.
\end{aligned} \tag{9}
$$

The resulting LAP, however, cannot be solved efficiently (i.e., in $O(n^6)$ time, as an unconstrained LAP of dimension $n^2$ could be) because $Y$, although it is an $n^2 \times n^2$ permutation matrix, is constrained to have a special structure.


Linearizations 


Linearization is a technique which eliminates the nonlinear terms of a given objective function, making it linear, through the introduction of new variables and new linear (binary) constraints. The objective is to transform a 0-1 nonlinear integer program into a provably equivalent 0-1 linear integer program, such that existing methods for linear integer programs provide a relaxed problem from which lower bounds may be computed. Though there are several ways to linearize a given nonlinear integer program, it is desirable to have a linearization that introduces as few new variables and constraints as possible. Moreover, the 'tightness' of the relaxation of the resulting linear integer program is very important.

The first attempts to devise solution techniques for the QAP had to do with eliminating the quadratic term in the objective function of (4), in order to transform the problem into a 0-1 linear program. Four such linearizations of the QAP will be presented in this section. The first, due to Lawler [88], was the first linearization suggested for the QAP; the second, by Kaufman and F. Broeckx [80], is the smallest with regard to the number of new variables and constraints introduced. The third is a more recent (1983) one due to A.M. Frieze and J. Yadegar [56], which unifies most of the previous linearizations for the QAP and is closely related to the fourth linearization presented in this section, due to W.P. Adams and T.A. Johnson [2].


Lawler’s Linearization 


Lawler [88] replaces the quadratic terms $x_{ij}x_{kl}$ in the objective function of (4) with $n^4$ variables

$$
y_{ijkl} = x_{ij}\, x_{kl}, \qquad i, j, k, l = 1, \dots, n,
$$

which results in a 0-1 linear program with $n^4 + n^2$ binary variables and $n^4 + 2n + 1$ constraints. More specifically, it is proved in [88] that the QAP is equivalent to the following 0-1 linear program:

$$
\begin{aligned}
\min\ & \sum_{i,j=1}^{n}\sum_{k,l=1}^{n} c_{ijkl}\, y_{ijkl}\\
\text{s.t.}\ & (x_{ij}) \in \mathcal{X}_n,\\
& \sum_{i,j=1}^{n}\sum_{k,l=1}^{n} y_{ijkl} = n^2,\\
& x_{ij} + x_{kl} - 2y_{ijkl} \ge 0,\\
& y_{ijkl} \in \{0, 1\}, \qquad i, j, k, l = 1, \dots, n.
\end{aligned}
$$


Kaufman-Broeckx Linearization 


Rearranging terms in the objective function (4) we obtain

$$
\sum_{i,j=1}^{n} x_{ij} \sum_{k,l=1}^{n} c_{ijkl}\, x_{kl}.
$$

Kaufman and Broeckx [80] defined $n^2$ new real variables

$$
w_{ij} := x_{ij} \sum_{k,l=1}^{n} c_{ijkl}\, x_{kl}, \qquad i, j = 1, \dots, n,
$$

resulting in an equivalent linear objective function

$$
\sum_{i,j=1}^{n} w_{ij}.
$$

Introducing $n^2$ constants $a_{ij} := \sum_{k,l=1}^{n} c_{ijkl}$ for $i, j = 1, \dots, n$, the QAP becomes equivalent to the following mixed 0-1 linear program:

$$
\begin{aligned}
\min\ & \sum_{i,j=1}^{n} w_{ij}\\
\text{s.t.}\ & (x_{ij}) \in \mathcal{X}_n,\\
& a_{ij}\, x_{ij} + \sum_{k,l=1}^{n} c_{ijkl}\, x_{kl} - w_{ij} \le a_{ij},\\
& w_{ij} \ge 0, \qquad i, j = 1, \dots, n.
\end{aligned}
$$

The above formulation employs $n^2$ new real variables, $n^2$ binary variables and $n^2 + 2n$ constraints. The elements $c_{ijkl}$ are all assumed to be nonnegative, which is a valid assumption since adding a constant to each element does not affect the optimal solution. The proof of equivalence of the QAP to the above linear integer program can be found in [80].
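The role of the constants a_ij and of the nonnegativity assumption can be seen on a small instance. Each w_ij appears only in its own constraint, so at an optimum it sits at its smallest feasible value max(0, a_ij x_ij + Σ c_ijkl x_kl - a_ij). The sketch below (hypothetical function name; costs generated by an arbitrary formula; convention of this section, x_ij = 1 iff φ(i) = j) checks that this value equals x_ij Σ c_ijkl x_kl for every permutation.

```python
from itertools import permutations


def kaufman_broeckx_check(c, phi):
    """True if the smallest feasible w_ij reproduce sum_ijkl c_ijkl x_ij x_kl for phi."""
    n = len(c)
    x = [[1 if phi[i] == j else 0 for j in range(n)] for i in range(n)]
    quadratic = sum(c[i][j][k][l] * x[i][j] * x[k][l]
                    for i in range(n) for j in range(n) for k in range(n) for l in range(n))
    total = 0
    for i in range(n):
        for j in range(n):
            a_ij = sum(c[i][j][k][l] for k in range(n) for l in range(n))
            s_ij = sum(c[i][j][k][l] * x[k][l] for k in range(n) for l in range(n))
            w_ij = max(0, a_ij * x[i][j] + s_ij - a_ij)   # smallest feasible w_ij
            if w_ij != x[i][j] * s_ij:
                return False
            total += w_ij
    return total == quadratic


n = 3
c = [[[[(i + 2 * j + 3 * k + 5 * l) % 7 for l in range(n)] for k in range(n)]
      for j in range(n)] for i in range(n)]                # nonnegative costs
print(all(kaufman_broeckx_check(c, phi) for phi in permutations(range(n))))  # True
```

If some c_ijkl are negative, the lower bound on w_ij can become positive even when x_ij = 0, and the linear program then overestimates the quadratic objective, which is why nonnegative costs are assumed.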


Frieze-Yadegar Linearization 


In [56] the products of the binary variables are replaced by continuous variables (i.e., $y_{ijkl} := x_{ij}x_{kl}$), and QAP(C) is proved to be equivalent to the following mixed 0-1 linear program:

$$
\begin{aligned}
\min\ & \sum_{i,j=1}^{n}\sum_{k,l=1}^{n} c_{ijkl}\, y_{ijkl}\\
\text{s.t.}\ & (x_{ij}) \in \mathcal{X}_n,\\
& \sum_{i=1}^{n} y_{ijkl} = x_{kl}, && \forall\, j, k, l,\\
& \sum_{j=1}^{n} y_{ijkl} = x_{kl}, && \forall\, i, k, l,\\
& \sum_{k=1}^{n} y_{ijkl} = x_{ij}, && \forall\, i, j, l,\\
& \sum_{l=1}^{n} y_{ijkl} = x_{ij}, && \forall\, i, j, k,\\
& y_{ijij} = x_{ij}, && \forall\, i, j,\\
& 0 \le y_{ijkl} \le 1, && \forall\, i, j, k, l,
\end{aligned} \tag{10}
$$

where $i, j, k, l = 1, \dots, n$. The above program has $n^4$ real variables, $n^2$ binary variables, and $n^4 + 4n^3 + n^2 + 2n$ constraints. Note that the constraint $y_{ijij} = x_{ij}$ is redundant since it follows from the definition of the $y_{ijkl}$ variables. Frieze and Yadegar considered a Lagrangian relaxation of the above 0-1 linear program and established a relationship between the lower bounds derived by the solution of the relaxation and the lower bounds derived from decomposition techniques applied to the Gilmore-Lawler bound for the QAP.


Adams-Johnson Linearization 


Adams and Johnson presented in [2] a new 0-1 linear integer formulation for the QAP, which resembles the one of Frieze and Yadegar described previously. It is based on the general linearization technique for general 0-1 polynomial programs introduced by Adams and H.D. Sherali [3,4]. QAP(C) is proved to be equivalent to the following mixed 0-1 linear program:

$$
\begin{aligned}
\min\ & \sum_{i,j=1}^{n}\sum_{k,l=1}^{n} c_{ijkl}\, y_{ijkl}\\
\text{s.t.}\ & (x_{ij}) \in \mathcal{X}_n,\\
& \sum_{i=1}^{n} y_{ijkl} = x_{kl}, && \forall\, j, k, l,\\
& \sum_{j=1}^{n} y_{ijkl} = x_{kl}, && \forall\, i, k, l,\\
& y_{ijkl} = y_{klij}, && \forall\, i, j, k, l,\\
& y_{ijkl} \ge 0, && \forall\, i, j, k, l,
\end{aligned} \tag{11}
$$
where $i, j, k, l = 1, \dots, n$, and each $y_{ijkl}$ represents the product $x_{ij}x_{kl}$. The above formulation contains $n^2$ binary variables $x_{ij}$, $n^4$ continuous variables $y_{ijkl}$, and $n^4 + 2n^3 + 2n$ constraints, excluding the nonnegativity constraints on the continuous variables. Although, as noted in [3], a significantly smaller formulation in terms of both the number of variables and constraints could be obtained, the structure of the relaxation of the above formulation is favorable for solving it. As noted in [2], the constraint set of the above relaxation describes a solution matrix $Y$ which is the Kronecker product of two permutation matrices (i.e., $Y = X \otimes X$ where $X \in \mathcal{X}_n$), showing clearly the equivalence of the above formulation with the QAP as formulated in (9). The theoretical strength of the above linearization of the QAP lies in the fact that, as shown in [2] and [73], the constraints of the relaxations derived from all previous linearizations can be expressed as linear combinations of the constraints of the continuous relaxation of the above linearization. Moreover, many of the previously published lower-bounding techniques can be explained in terms of the dual space of this relaxation.


Complexity Issues 


The first two parts of this section provide evidence that the QAP is a 'very hard' problem from the theoretical point of view. Not only can the QAP not be solved to optimality efficiently, it cannot even be approximated efficiently within some constant approximation ratio. Furthermore, finding local optima is not a trivial task even for simply structured neighborhoods like the 2-opt neighborhood. The asymptotic behavior of the QAP and polynomially solvable special cases of the QAP are mentioned in the last two parts of this section.


Computational Complexity 


Two early results obtained by Sahni and Gonzalez [129] in 1976 settled the complexity of solving and approximating the QAP. It was shown that the QAP is NP-hard and that even finding an $\epsilon$-approximate solution for the QAP is a hard problem, in the sense that the existence of a polynomial $\epsilon$-approximation algorithm implies P = NP.

Theorem 1 [129] The quadratic assignment problem is strongly NP-hard. For an arbitrary $\epsilon > 0$, the existence of a polynomial time $\epsilon$-approximation algorithm for the QAP implies P = NP.

The proof is done by a reduction from the Hamiltonian cycle problem [58].

M. Queyranne [121] derives an even stronger result, which further confirms the widespread belief in the inherent difficulty of the QAP compared with other difficult combinatorial optimization problems. It is well known and very easy to see that the traveling salesman problem (TSP) is a special case of the QAP. The TSP on $n$ cities can be formulated as a QAP(F, D), where $F$ is the distance matrix of the TSP instance and $D$ is the adjacency matrix of a Hamiltonian cycle on $n$ vertices. In the case that the distance matrix is symmetric and satisfies the triangle inequality, the TSP is approximable in polynomial time within 3/2, as shown in [37]. Queyranne [121] showed that, unless P = NP, QAP(A, B) is not approximable in polynomial time within any finite approximation ratio, even if $A$ is the distance matrix of some set of points on a line and $B$ is a symmetric block diagonal matrix.
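The reduction of the TSP to QAP(F, D) can be illustrated on four cities. In this sketch (hypothetical names, 0-based indices, `phi[i]` is the position of city i in the tour), D is taken as the adjacency matrix of a directed Hamiltonian cycle; with a symmetric (undirected) adjacency matrix and symmetric distances, every tour would simply be counted twice.

```python
from itertools import permutations


def tour_length(dist, order):
    """Length of the closed tour that visits the cities in the given order."""
    n = len(order)
    return sum(dist[order[p]][order[(p + 1) % n]] for p in range(n))


def cycle_adjacency(n):
    """D with d_kl = 1 iff l = k + 1 (mod n): a directed Hamiltonian cycle."""
    return [[1 if l == (k + 1) % n else 0 for l in range(n)] for k in range(n)]


def qap_value(F, D, phi):
    n = len(F)
    return sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))


# Manhattan distances between the corners (0,0), (1,0), (1,1), (0,1) of a unit square.
dist = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
D = cycle_adjacency(4)
for phi in permutations(range(4)):
    order = [0] * 4
    for city, position in enumerate(phi):     # city i is visited at position phi(i)
        order[position] = city
    assert qap_value(dist, D, phi) == tour_length(dist, order)
print(min(qap_value(dist, D, phi) for phi in permutations(range(4))))  # 4
```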

A more recent result of S. Arora, Frieze and H. Kaplan [6] partially answers one of the open questions stated by Queyranne [121]: what happens if matrix $A$ is the distance matrix of $n$ points which are regularly spaced on a line, that is, points with abscissae given by $x_p = p$, $p = 1, \dots, n$? This special case of the QAP is termed the linear arrangement problem and is a well-studied NP-hard problem. In the linear arrangement problem the matrix $B$ is not restricted to have the block diagonal structure mentioned above, but is simply a symmetric 0-1 matrix. Arora, Frieze and Kaplan [6] give a polynomial time approximation scheme (PTAS) for the linear arrangement problem in the case that the 0-1 matrix $B$ is dense, that is, the number of '1' entries in $B$ is in $\Omega(n^2)$, where $n$ is the size of the problem. They show that for each $\epsilon > 0$ there exists an $\epsilon$-approximation algorithm for the dense linear arrangement problem with time complexity depending polynomially on $n$ and exponentially on $1/\epsilon$, hence polynomial for each fixed $\epsilon > 0$.


PLS-Complexity 


Assume that an optimization problem $P$ is given by specifying a ground set $\mathcal{E}$, a set $\mathcal{F} \subseteq 2^{\mathcal{E}}$ of feasible solutions and an objective function $f: \mathcal{F} \to \mathbb{R}$. A globally optimal solution $S^* \in \mathcal{F}$ of the problem $P$ is defined by

$$
f(S^*) := \min_{S \in \mathcal{F}} f(S).
$$

For any given $S \in \mathcal{F}$, denote the neighborhood of $S$ by $\mathcal{N}(S) \subseteq \mathcal{F}$. The neighborhood of $S$ consists of other feasible solutions which are topologically 'close' to $S$. A locally optimal solution, or local minimum, $\hat{S} \in \mathcal{F}$ of the problem $P$ with respect to the neighborhood $\mathcal{N}$ is defined by

$$
f(\hat{S}) \le \min_{S \in \mathcal{N}(\hat{S})} f(S).
$$


Recently (as of 1999) it has been shown that even finding a locally optimal solution for the QAP can be prohibitively hard; that is, even local search is hard in the case of the QAP. Consider the question: "How easy is it to find a locally optimal solution for the QAP?" Since local optimality is defined through a specific neighborhood structure, the answer depends on the neighborhood structure involved. If the neighborhood $\mathcal{N}$ is replaced by a new neighborhood $\mathcal{N}'$, one would generally expect changes in the local optimality status of a solution. The theoretical basis for facing this kind of problem was introduced by D.S. Johnson, C.H. Papadimitriou and Yannakakis [72]. They define the so-called polynomial time local search problems, PLS problems for short. A pair $(P, \mathcal{N})$, where $P$ is a (combinatorial) optimization problem and $\mathcal{N}$ is an associated well-defined neighborhood structure, defines a local search problem in which the objective is to find a locally optimal solution of $P$ with respect to the neighborhood structure $\mathcal{N}$. Without going into technical details, a problem in the PLS class is a local search problem for which local optimality can be checked in polynomial time. In analogy with decision problems, there exist complete problems in the class of PLS problems. The PLS-complete problems are, in the usual complexity sense, the most difficult among the PLS problems.

K.A. Murthy, Pardalos and Y. Li [103] introduce a neighborhood structure for the QAP which is similar to the neighborhood structure proposed by B.W. Kernighan and S. Lin [81] for the graph partitioning problem. For this reason we will call it a K-L type neighborhood structure for the QAP. Murthy, Pardalos and Li [103] show that the corresponding local search problem is PLS-complete. Consider a permutation $\phi_0 \in S_n$. A swap of $\phi_0$ is a permutation $\phi \in S_n$ obtained from $\phi_0$ by applying a transposition $(i, j)$ to it, $\phi = \phi_0 \circ (i, j)$. A transposition $(i, j)$ is defined as a permutation which maps $i$ to $j$, $j$ to $i$, and $k$ to $k$ for all $k \notin \{i, j\}$. A greedy swap of a permutation $\phi_0$ is a swap $\phi$ which maximizes $Z(F, D, \phi_0) - Z(F, D, \phi)$ over all swaps $\phi$ of $\phi_0$, where $Z(F, D, \phi)$ is the objective function value of QAP(F, D) with permutation $\phi$ (see formulation (3)). Let $\phi_0, \dots, \phi_l$ be a sequence of permutations in $S_n$, each of them being a greedy swap of the preceding one, with $\phi_k = \phi_{k-1} \circ (i_k, j_k)$ for $k = 1, \dots, l$. Such a sequence is called monotone if for all $k = 1, \dots, l-1$ we have $\{i_{k+1}, j_{k+1}\} \cap \bigcup_{t=1}^{k} \{i_t, j_t\} = \emptyset$. The neighborhood of $\phi_0$ consists of all permutations which occur in the (unique) maximal monotone sequence of greedy swaps starting with permutation $\phi_0$. Let us denote this neighborhood structure for the QAP by $\mathcal{N}_{K\text{-}L}$. It is not difficult to see that, given a QAP(F, D) of size $n$ and a permutation $\phi \in S_n$, the cardinality of $\mathcal{N}_{K\text{-}L}(\phi)$ does not exceed $\lfloor n/2 \rfloor + 1$. It is easily seen that the local search problem $(\mathrm{QAP}, \mathcal{N}_{K\text{-}L})$ is a PLS problem. This result can be found in [112], where the authors show that the graph partitioning problem with the neighborhood structure defined in [81] is PLS-reducible to $(\mathrm{QAP}, \mathcal{N}_{K\text{-}L})$.


Theorem 2 [112] The local search problem $(\mathrm{QAP}, \mathcal{N}_{K\text{-}L})$, where $\mathcal{N}_{K\text{-}L}$ is the Kernighan-Lin type neighborhood structure for the QAP, is PLS-complete.


The PLS-completeness of $(\mathrm{QAP}, \mathcal{N}_{K\text{-}L})$ implies that, in the worst case, a general local search algorithm as described above involving the Kernighan-Lin type neighborhood finds a local minimum only after a time which is exponential in the problem size. Numerical results, however, show that such local search algorithms perform quite well when applied to QAP test instances, as reported in [103].

Another simple and frequently used neighborhood structure is the so-called pair-exchange (or 2-opt) neighborhood $\mathcal{N}_2$. The pair-exchange neighborhood of a permutation $\phi_0 \in S_n$ consists of all permutations $\phi \in S_n$ obtained from $\phi_0$ by applying some transposition $(i, j)$ to it. Specifically,

$$
\mathcal{N}_2(\phi_0) = \{\phi_0 \circ (i, j) : i, j = 1, \dots, n,\ i \ne j\}.
$$

It can also be shown that $(\mathrm{QAP}, \mathcal{N}_2)$ is PLS-complete. A.A. Schäffer and Yannakakis [130] have proven that the graph partitioning problem with a neighborhood structure analogous to $\mathcal{N}_2$ is PLS-complete. A PLS-reduction similar to the one in [112] implies that the local search problem $(\mathrm{QAP}, \mathcal{N}_2)$ is PLS-complete.
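A basic local search over the pair-exchange neighborhood N₂ can be sketched as follows. It is a first-improvement variant chosen for brevity (an assumption; the text does not prescribe a pivoting rule), with hypothetical function names, 0-based indices, and the full objective recomputed for every neighbor; practical codes evaluate the cost change of a swap in O(n) time instead.

```python
from itertools import combinations


def qap_cost(F, D, phi):
    """Objective of QAP(F, D) for phi (phi[i] = location of facility i)."""
    n = len(F)
    return sum(F[i][j] * D[phi[i]][phi[j]] for i in range(n) for j in range(n))


def pair_exchange_local_search(F, D, phi):
    """First-improvement descent in N_2; returns a local minimum and its cost."""
    phi = list(phi)
    cost = qap_cost(F, D, phi)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(phi)), 2):
            phi[i], phi[j] = phi[j], phi[i]        # neighbor phi o (i, j)
            new_cost = qap_cost(F, D, phi)
            if new_cost < cost:                    # accept the first improving swap
                cost, improved = new_cost, True
                break
            phi[i], phi[j] = phi[j], phi[i]        # undo a non-improving swap
    return phi, cost


F = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(pair_exchange_local_search(F, D, [1, 0, 2]))  # ([0, 1, 2], 24)
```

Each accepted swap strictly lowers the cost, so the loop terminates; the complexity results above concern how many such steps may be needed in the worst case.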

Finally, let us mention that no local criteria are 
known for deciding how good a locally optimal solu- 
tion is as compared to a global one. From the complex- 
ity point of view, deciding whether a given local opti- 
mum is a globally optimal solution to a given instance 
of the QAP is a hard problem, see [108]. 


Asymptotic Behavior 


Under certain probabilistic conditions on the coefficient matrices of the QAP, the ratio between its 'best' and 'worst' values of the objective function approaches 1 as the size of the problem approaches infinity. R.E.
Burkard and U. Fincke [29] identify a common com- 
binatorial property of a number of problems which, 
under certain probabilistic conditions on the problem 
data, behave as described above. 

In an early work Burkard and Fincke [28] investi- 
gate the relative difference between the worst and the 
best values of the objective function for the Koopmans- 
Beckmann QAP. They first consider the case where the 
coefficient matrix D is the matrix of pairwise distances 
of points chosen independently and uniformly from the 
unit square in the plane. Then the general case where 
entries of the flow and distance matrices F and D are 
independent random variables taken from a uniform 
distribution on [0, 1] is considered. In both cases it is 
shown that the relative difference mentioned above ap- 
proaches 0 with probability tending to 1 as the size of 
the problem tends to infinity. 


Later Burkard and Fincke [29] consider the ratio be- 
tween the objective function values corresponding to 
an optimal (or best) and a worst solution of a generic 
combinatorial optimization problem. They find that for each $\varepsilon > 0$, the ratio between the best and the worst values of the objective function lies in $(1 - \varepsilon, 1 + \varepsilon)$ with probability tending to 1 as the size of the problem approaches infinity. Under additional combinatorial conditions, W. Szpankowski [132] strengthens this result to almost sure convergence, that is, the probability that the above mentioned ratio tends to 1 is equal to 1.
The asymptotic behavior of the QAP can be stated in 
the following theorem: 


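Theorem 3 Consider a sequence of problems QAP$(A^{(n)}, B^{(n)})$ for $n \in \mathbb{N}$, with $n \times n$ coefficient matrices $A^{(n)} = (a^{(n)}_{ij})$ and $B^{(n)} = (b^{(n)}_{ij})$. Assume that $a^{(n)}_{ij}$ and $b^{(n)}_{ij}$, $n \in \mathbb{N}$, $1 \le i, j \le n$, are independently distributed random variables on $[0, M]$, where $M$ is a positive constant. Moreover, assume that the entries $a^{(n)}_{ij}$, $n \in \mathbb{N}$, $1 \le i, j \le n$, have the same distribution, and the entries $b^{(n)}_{ij}$, $n \in \mathbb{N}$, $1 \le i, j \le n$, also have the same distribution, which does not necessarily coincide with that of the $a^{(n)}_{ij}$. Furthermore, assume that these variables have finite expected values, variances and third moments. Let $\varphi^{(n)}_{\mathrm{opt}}$ and $\varphi^{(n)}_{\mathrm{wor}}$ denote an optimal and a worst solution of QAP$(A^{(n)}, B^{(n)})$, respectively, that is,

$$Z\big(A^{(n)}, B^{(n)}, \varphi^{(n)}_{\mathrm{opt}}\big) = \min_{\varphi \in S_n} Z\big(A^{(n)}, B^{(n)}, \varphi\big)$$

and

$$Z\big(A^{(n)}, B^{(n)}, \varphi^{(n)}_{\mathrm{wor}}\big) = \max_{\varphi \in S_n} Z\big(A^{(n)}, B^{(n)}, \varphi\big).$$

Then the following equality holds almost surely:

$$\lim_{n \to \infty} \frac{Z\big(A^{(n)}, B^{(n)}, \varphi^{(n)}_{\mathrm{opt}}\big)}{Z\big(A^{(n)}, B^{(n)}, \varphi^{(n)}_{\mathrm{wor}}\big)} = 1.$$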


The above result suggests that the value of the objective function of QAP$(A^{(n)}, B^{(n)})$ (corresponding to an arbitrary feasible solution) gets somehow close to its expected value $n^2 E(A) E(B)$ as the size of the problem increases, where $E(A)$ and $E(B)$ are the expected values of $a^{(n)}_{ij}$ and $b^{(n)}_{ij}$, $n \in \mathbb{N}$, $1 \le i, j \le n$, respectively. J.C.B. Frenk, M. van Houweninge, and A.G. Rinnooy Kan [54] and W.T. Rhee [126,127] provide different analytical evaluations of this 'getting close', by imposing different probabilistic conditions on the data.
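To make the statement of Theorem 3 concrete, the following sketch computes the best/worst ratio of one random instance by full enumeration, assuming i.i.d. uniform entries on $[0, 1)$ (which satisfy the theorem's hypotheses). The helper names are illustrative. Enumeration is only feasible for very small $n$, and the theorem is an asymptotic statement, so the printed ratios are an illustration rather than evidence of the limit.

```python
import itertools
import random


def qap_cost(A, B, perm):
    """Z(A, B, phi) = sum_{i,j} a_ij * b_{phi(i), phi(j)}."""
    n = len(perm)
    return sum(A[i][j] * B[perm[i]][perm[j]] for i in range(n) for j in range(n))


def best_worst_ratio(n, seed=0):
    """Optimal over worst objective value, by enumerating all of S_n,
    for one instance with i.i.d. uniform [0, 1) entries."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n)] for _ in range(n)]
    B = [[rng.random() for _ in range(n)] for _ in range(n)]
    values = [qap_cost(A, B, p) for p in itertools.permutations(range(n))]
    return min(values) / max(values)


for n in range(3, 8):
    print(n, round(best_worst_ratio(n), 3))
```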

Results on the asymptotic behavior of the QAP have been exploited by M.E. Dyer, Frieze, and C.J.H. McDiarmid [46] to analyze the performance of branch and bound algorithms for QAPs with randomly generated coefficients as described above. They have shown that for such QAPs the optimal value of the continuous relaxation of the Frieze-Yadegar linearization stated in (10) is in $O(n)$ with probability tending to 1 as the size $n$ of the QAP tends to infinity. Since the optimal value of the QAP itself grows like $n^2 E(A) E(B)$, the gap between the optimal value of this continuous relaxation and the optimal value of the QAP grows quadratically in $n$ with probability tending to 1 as $n$ tends to infinity.


Polynomially Solvable Cases 


Since the QAP is NP-hard, restricted versions which can be solved in polynomial time are an interesting aspect of the problem. A basic question arising with respect to polynomially solvable versions is the identification of those versions and the investigation of the borderline between hard and easy versions of the problem. There are two ways to approach this topic: first, find structural conditions to be imposed on the coefficient matrices of the QAP so as to obtain polynomially solvable versions, and secondly, investigate other combinatorial optimization or graph-theoretical problems which can be formulated as QAPs, and embed the polynomially solvable versions of the former into special cases of the latter. These two approaches yield two groups of restricted QAPs which are briefly reviewed in this section. For detailed information on this topic, see [35].

Most of the restricted versions of the QAP with specially structured matrices involve Monge matrices or other matrices having analogous properties. A matrix $A = (a_{ij})$ is a Monge matrix if and only if the following Monge inequalities are fulfilled for each 4-tuple of indices $i, j, k, l$ with $i < k$, $j < l$:

$$a_{ij} + a_{kl} \le a_{il} + a_{kj}.$$

A matrix $A = (a_{ij})$ is an anti-Monge matrix if and only if the following anti-Monge inequalities are fulfilled for each 4-tuple of indices $i, j, k, l$ with $i < k$, $j < l$:

$$a_{ij} + a_{kl} \ge a_{il} + a_{kj}.$$

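Simple examples of matrices that are both Monge and anti-Monge are the sum matrices: the entries of a sum matrix $A = (a_{ij})$ are given as $a_{ij} = \alpha_i + \beta_j$, where $(\alpha_i)$ and $(\beta_j)$ are the generating row and column vector, respectively, so both families of inequalities hold with equality. A product matrix $A$ is defined in an analogous way: its entries are given as $a_{ij} = \alpha_i \beta_j$, where $(\alpha_i)$, $(\beta_j)$ are the generating vectors. If the row generating vector $(\alpha_i)$ and the column generating vector $(\beta_j)$ are sorted nondecreasingly, then the product matrix $(\alpha_i \beta_j)$ is an anti-Monge matrix, since $a_{ij} + a_{kl} - a_{il} - a_{kj} = (\alpha_k - \alpha_i)(\beta_l - \beta_j) \ge 0$ for $i < k$, $j < l$.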
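The definitions can be checked mechanically. The sketch below is an illustration with hypothetical helper names; it relies on the standard fact that the Monge inequalities for all $i < k$, $j < l$ follow from those on adjacent $2 \times 2$ submatrices, so only $(m-1)(n-1)$ inequalities are tested. It uses exact integer arithmetic; with floating-point data one would add a tolerance.

```python
def is_monge(A):
    """Check a[i][j] + a[i+1][j+1] <= a[i][j+1] + a[i+1][j] for all adjacent
    2x2 submatrices; these imply the Monge inequalities for all i < k, j < l."""
    rows, cols = len(A), len(A[0])
    return all(A[i][j] + A[i + 1][j + 1] <= A[i][j + 1] + A[i + 1][j]
               for i in range(rows - 1) for j in range(cols - 1))


def is_anti_monge(A):
    """A is anti-Monge exactly when -A is Monge."""
    return is_monge([[-x for x in row] for row in A])


def sum_matrix(alpha, beta):
    return [[a + b for b in beta] for a in alpha]


def product_matrix(alpha, beta):
    return [[a * b for b in beta] for a in alpha]


S = sum_matrix([1, 4, 2], [0, 3, 5])
P = product_matrix([1, 2, 5], [0, 3, 4])  # both generating vectors nondecreasing
print(is_monge(S), is_anti_monge(S))  # True True
print(is_monge(P), is_anti_monge(P))  # False True
```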

In contrast with the traveling salesman problem, it turns out that the QAP with both coefficient matrices being Monge or anti-Monge is NP-hard, whereas the complexity of a QAP with one coefficient matrix being Monge and the other one being anti-Monge is still open, see [23] and [35]. However, some polynomially solvable special cases can be obtained by imposing additional conditions on the coefficient matrices. These special cases involve very simple matrices like product matrices or so-called chess-board matrices. A matrix $A = (a_{ij})$ is a chess-board matrix if its entries are given by $a_{ij} = (-1)^{i+j}$. These QAPs can either be formulated as equivalent LAPs, or they are constant permutation QAPs (see [23,35]), that is, their optimal solution can be given beforehand, without knowing the entries of their coefficient matrices. A few other versions of the QAP involving Monge and anti-Monge matrices with additional structural properties can be solved by dynamic programming. Other restricted versions of the QAP involve matrices with a specific diagonal structure such as circulant and Toeplitz matrices. An $n \times n$ matrix $A = (a_{ij})$ is called a Toeplitz matrix if there exist numbers $c_{-n+1}, \dots, c_{-1}, c_0, c_1, \dots, c_{n-1}$ such that $a_{ij} = c_{j-i}$ for all $i, j$. A matrix $A$ is called a circulant matrix if it is a Toeplitz matrix and the generating numbers $c_i$ fulfill the conditions $c_{-i} = c_{n-i}$ for $1 \le i \le n - 1$. In other words, a Toeplitz matrix has constant entries along lines parallel to the diagonal, whereas a circulant is given by its first row, and its $i$th row is the first row shifted cyclically by $i - 1$ places to the right.
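The three special matrix classes can be generated as follows. This is a sketch with illustrative helper names, using 0-based indices, which leave both the Toeplitz offsets $j - i$ and the chess-board parity of $i + j$ unchanged. The Toeplitz generator takes a dictionary from offsets $-(n-1), \dots, n-1$ to values (an assumption of this sketch).

```python
def toeplitz(c, n):
    """n x n Toeplitz matrix a_ij = c[j - i]; c maps offsets -(n-1)..(n-1) to values."""
    return [[c[j - i] for j in range(n)] for i in range(n)]


def circulant(first_row):
    """Row i is the first row shifted cyclically i places to the right (0-based),
    i.e. the Toeplitz matrix whose generating numbers satisfy c_{-i} = c_{n-i}."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def chessboard(n):
    """Chess-board matrix a_ij = (-1)^(i+j)."""
    return [[(-1) ** (i + j) for j in range(n)] for i in range(n)]


print(toeplitz({-2: 9, -1: 7, 0: 1, 1: 2, 2: 3}, 3))  # [[1, 2, 3], [7, 1, 2], [9, 7, 1]]
print(circulant([1, 2, 3]))                           # [[1, 2, 3], [3, 1, 2], [2, 3, 1]]
print(chessboard(3))                                  # [[1, -1, 1], [-1, 1, -1], [1, -1, 1]]
```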

In general, versions of the QAP with one anti-Monge (Monge) matrix and one Toeplitz (circulant) matrix remain NP-hard unless additional conditions, such as monotonicity, are imposed on the coefficient matrices. A well studied problem is the so-called anti-Monge-Toeplitz QAP, where the rows and columns of the anti-Monge matrix are nondecreasing, investigated in [26]. It has been shown that this problem is
NP-hard and contains as a special case the so-called 
turbine balancing problem (TBP) introduced in [99] 
and formulated as a QAP in [87]. In the TBP we are 
given a number of blades to be welded in regular spac- 
ing around the cylinder of the turbine. Due to inac- 
curacies in the manufacturing process the weights of 
the blades differ slightly and consequently the gravity 
center of the system does not lie on the rotation axis 
of the cylinder, leading to instabilities. In an effort to 
make the system as stable as possible, it is desirable 
to locate the blades so as to minimize the distance be- 
tween the center of gravity and the rotation axis. The 
mathematical formulation of this problem leads to an 
NP-hard anti-Monge-Toeplitz QAP. (For more details 
and for a proof of NP-hardness see [26].) Interestingly, the maximization version of this problem is polynomially solvable. Further polynomially solvable special cases of the anti-Monge-Toeplitz
QAP arise if additional constraints such as benevo- 
lence or k-benevolence are imposed on the Toeplitz 
matrix. These conditions are expressed in terms of 
properties of the generating function of these matri- 
ces, see [26]. The polynomially solvable QAPs with 
one anti-Monge (Monge) matrix and the other one 
Toeplitz (circulant) matrix described above are all constant permutation QAPs. The technique used to prove this fact and to identify the optimal permutation is called reduction to extremal rays. This technique exploits two facts: first, the involved matrix classes form
cones, and secondly, the objective function of the QAP 
is linear with respect to each of the coefficient matri- 
ces. These two facts allow us to restrict the investi- 
gations to instances of the QAP with 0-1 coefficient 
matrices which are extremal rays of the above men- 
tioned cones. Such instances can then be handled by elementary means (exchange arguments, bounding techniques) more easily than the general QAP. The
identification of polynomially solvable special cases of 
the QAP which are not constant permutation QAPs 
and can be solved algorithmically remains a challeng- 
ing open question. 
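The turbine balancing problem mentioned above can be written as a Koopmans-Beckmann QAP in a natural way, shown below as a sketch. It assumes unit radius, slot angles $2\pi p / n$, and the squared norm of the weighted sum of blade positions as the objective. Dividing by $(\sum_k w_k)^2$ would give the squared distance of the center of gravity from the axis, and this constant factor does not change the minimizer. The first matrix is the product matrix $(w_k w_l)$, which is anti-Monge once the weights are sorted nondecreasingly, and the second is the circulant (hence Toeplitz) matrix $(\cos(2\pi(q - p)/n))$. We do not claim that this is verbatim the formulation of [87]; the helper names and weights are hypothetical.

```python
import cmath
import itertools
import math


def imbalance_geometric(w, pos):
    """|sum_k w_k exp(2 pi i pos_k / n)|^2 for blades on a unit circle."""
    n = len(w)
    z = sum(wk * cmath.exp(2j * math.pi * p / n) for wk, p in zip(w, pos))
    return abs(z) ** 2


def imbalance_qap(w, pos):
    """Same quantity as sum_{k,l} a_kl * b_{pos_k, pos_l} (Koopmans-Beckmann form)."""
    n = len(w)
    A = [[wk * wl for wl in w] for wk in w]  # product matrix of the weights
    B = [[math.cos(2 * math.pi * (q - p) / n) for q in range(n)]
         for p in range(n)]  # circulant matrix, c_k = cos(2 pi k / n)
    return sum(A[k][l] * B[pos[k]][pos[l]] for k in range(n) for l in range(n))


weights = [10.2, 10.0, 9.9, 10.1, 9.8, 10.05]  # pos[k] = slot of blade k
best = min(itertools.permutations(range(len(weights))),
           key=lambda p: imbalance_qap(weights, p))
print(best, imbalance_qap(weights, best))
```

The identity behind the second function is $|\sum_k w_k e^{i\theta_k}|^2 = \sum_{k,l} w_k w_l \cos(\theta_k - \theta_l)$; the brute-force minimization is only for illustration.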

Another class of matrices similar to the Monge matrices are the Kalmanson matrices. A matrix $A = (a_{ij})$ is a Kalmanson matrix if it is symmetric and its elements satisfy the following inequalities for all indices $i, j, k, l$ with $i < j < k < l$:

$$a_{ij} + a_{kl} \le a_{ik} + a_{jl}, \qquad a_{il} + a_{jk} \le a_{ik} + a_{jl}.$$

For more information on Monge, anti-Monge and Kalmanson matrices, and their properties, see [32].
The Koopmans-Beckmann QAP with one coefficient 
matrix being a Kalmanson matrix and the other one 
a Toeplitz matrix, has been investigated in [44]. The 
computational complexity of this problem is an open 
question, but analogously as in the case of the anti- 
Monge-Toeplitz QAP, polynomially solvable versions 
of the problem are obtained by imposing additional 
constraints on the Toeplitz matrix.
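A classical source of Kalmanson matrices is the distance matrix of points in convex position listed in cyclic order: for $i < j < k < l$, the pairs $(i, k)$ and $(j, l)$ are the crossing diagonals of a convex quadrilateral, whose total length is at least that of either pair of opposite sides. The sketch below checks the definition directly in $O(n^4)$ time; the function names and the example points are illustrative, and a small tolerance absorbs floating-point error.

```python
import itertools
import math


def is_kalmanson(A, tol=1e-12):
    """Symmetry plus, for all i < j < k < l:
    a_ij + a_kl <= a_ik + a_jl  and  a_il + a_jk <= a_ik + a_jl."""
    n = len(A)
    if any(abs(A[i][j] - A[j][i]) > tol for i in range(n) for j in range(n)):
        return False
    for i, j, k, l in itertools.combinations(range(n), 4):
        if A[i][j] + A[k][l] > A[i][k] + A[j][l] + tol:
            return False
        if A[i][l] + A[j][k] > A[i][k] + A[j][l] + tol:
            return False
    return True


def distance_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # convex position, cyclic order
shuffled = [(0, 0), (1, 1), (1, 0), (0, 1)]  # same points, cyclic order broken
print(is_kalmanson(distance_matrix(square)))    # True
print(is_kalmanson(distance_matrix(shuffled)))  # False
```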

Further polynomially solvable cases arise as QAP 
formulations of other problems, like the linear ar- 
rangement problem, minimum feedback arc set prob- 
lem, packing problems in graphs and subgraph iso- 
morphism, see [23,35]. Polynomially solvable versions 
of these problems lead to polynomially solvable cases 
of the QAP. The coefficient matrices of these QAPs 
are the (weighted) adjacency matrices of the underly- 
ing graphs, and the special structure of these matrices 
is imposed by properties of these graphs. The meth- 
ods used to solve these QAPs range from graph the- 
oretical algorithms (in the case of the linear arrange- 
ment problem and the feedback arc set problem), to dy- 
namic programming (in the case of subgraph isomor- 
phism). 


Lower Bounds 


Lower bounding techniques are primarily used within implicit enumeration algorithms, such as branch and bound, to perform a limited search of the feasible region of a minimization problem until an optimal solution is found. To a lesser extent, lower bounds are also used to evaluate the performance of heuristic algorithms, by providing a relative measure of the proximity of a suboptimal solution to the optimum. In comparing lower bounding techniques, the following criteria should be taken into consideration:
• complexity of computing the lower bound;
• tightness of the lower bound (i.e. how close it is to the optimal value);
• efficiency in computing lower bounds for subsets of the primal feasible set.
Since there is no clear ranking of the performance of the lower bounds discussed below, all of the above criteria should be kept in mind while reading the following paragraphs. Considering the asymptotic behavior of the QAP, it seems fair to assume that the tightness of the lower bound dominates all of the other criteria: if there is a large number of feasible solutions close to the optimum, then a lower bound which is not tight enough will fail to eliminate a large number of subproblems in the branching process.


Gilmore-Lawler Type Lower Bounds 


Based on the formulation of the general QAP as an LAP of dimension $n^2$ stated in formulation (9), Lawler [88] derived lower bounds for the QAP by constructing a solution matrix $Y$ in the process of solving a series of LAPs. If the resulting matrix $Y$ is a permutation matrix, then the objective function value is optimal; otherwise it is bounded from below by $\langle C, Y \rangle$. Specifically, consider an instance of QAP$(C)$, where the matrix $C$ is partitioned into $n^2$ minors $C^{(ij)} = (c_{ijkl})_{n \times n}$ for $i, j = 1, \dots, n$. Each minor $C^{(ij)}$ essentially contains the costs associated with the assignment $x_{ij} = 1$. Partition the solution matrix $Y$ also into $n^2$ minors $Y^{(ij)} = (y_{ijkl})_{n \times n}$ for $i, j = 1, \dots, n$, whose actual values are to be determined in the process. Solve the $n^2$ LAPs associated with each minor $C^{(ij)}$:


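$$
\begin{aligned}
l_{ij} = \min\ & \sum_{k=1}^{n} \sum_{l=1}^{n} c_{ijkl}\, y_{ijkl} \\
\text{s.t.}\ & \sum_{k=1}^{n} y_{ijkl} = 1, \quad l = 1, \dots, n, \\
& \sum_{l=1}^{n} y_{ijkl} = 1, \quad k = 1, \dots, n, \\
& y_{ijij} = 1, \\
& y_{ijkl} \in \{0, 1\}, \quad k, l = 1, \dots, n.
\end{aligned}
\tag{12}
$$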


Observe that the constraint $y_{ijij} = 1$ essentially reduces the problem to an LAP of dimension $n - 1$, obtained by deleting the $i$th row and $j$th column of the matrix $C^{(ij)}$. Denote the solution matrix of each of the above LAPs by $Y^{(ij)}$. Using the values $l_{ij}$ from above, solve the LAP to obtain the Gilmore-Lawler lower bound for general QAPs:

$$\mathrm{GLB}(C) = \min \sum_{i=1}^{n} \sum_{j=1}^{n} l_{ij} x_{ij} \quad \text{s.t. } (x_{ij}) \in X_n, \tag{13}$$


and denote its solution matrix by $X^* = (x^*_{ij})$. If

$$\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} x^*_{ij} Y^{(ij)} \in X_n,$$

then $Y^* = (x^*_{ij} Y^{(ij)})$ is a Kronecker product of two permutation matrices of dimension $n$, and then it is also an optimal solution. Otherwise the optimal value of the QAP is bounded below by $\mathrm{GLB}(C) = \langle C, Y^* \rangle$. Considering that each LAP can be solved in $O(n^3)$ time, the above lower bound for the general QAP of dimension $n$ can be computed in $O(n^5)$ time.
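A brute-force sketch of (12) and (13) for tiny instances follows, assuming $C$ is stored as nested lists indexed `C[i][j][k][l]`, with $c_{ijij}$ as the linear cost of $x_{ij} = 1$. Enumeration of permutations stands in for the $O(n^3)$ LAP solver assumed in the text, so the sketch illustrates the structure of the bound, not its $O(n^5)$ complexity; all names are illustrative.

```python
import itertools
import random


def lap_bruteforce(cost):
    """Exact LAP value by enumeration (stand-in for a Hungarian-type solver)."""
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))


def gilmore_lawler_general(C):
    """GLB(C) following (12)-(13); C[i][j][k][l] is the cost of x_ij = x_kl = 1."""
    n = len(C)
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rows = [k for k in range(n) if k != i]
            cols = [l for l in range(n) if l != j]
            # y_ijij = 1 is fixed, leaving an LAP of size n-1 on the minor C^(ij)
            minor = [[C[i][j][k][l] for l in cols] for k in rows]
            L[i][j] = C[i][j][i][j] + lap_bruteforce(minor)
    return lap_bruteforce(L)


def qap_value(C, perm):
    n = len(perm)
    return sum(C[i][perm[i]][k][perm[k]] for i in range(n) for k in range(n))


rng = random.Random(0)
n = 4
C = [[[[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
      for _ in range(n)] for _ in range(n)]
glb = gilmore_lawler_general(C)
opt = min(qap_value(C, p) for p in itertools.permutations(range(n)))
print(glb, opt)
```

The bound is valid because, for the optimal permutation $\varphi$, restricting $\varphi$ to the minor $C^{(i\varphi(i))}$ is feasible in the corresponding LAP, so $l_{i\varphi(i)}$ never exceeds the true contribution of row $i$.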

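For the more special Koopmans-Beckmann formulation of the QAP (cf. formulation (3)), where the quadratic costs are derived from the pairwise product of two matrices $F$ and $D$, the structure of the problem can be used to reduce the computational effort. Before we proceed, let us make some definitions. For vectors $a, b \in \mathbb{R}^n$, define the following extremal variations of the usual inner product between vectors, obtained by imposing an ordering on the elements of the vectors:

$$\langle a, b \rangle^- := \sum_{i=1}^{n} a_i b_i, \quad \text{where } a_i \ge a_{i+1},\ b_i \le b_{i+1}\ \forall i, \tag{14}$$

and

$$\langle a, b \rangle^+ := \sum_{i=1}^{n} a_i b_i, \quad \text{where } a_i \ge a_{i+1},\ b_i \ge b_{i+1}\ \forall i. \tag{15}$$

That is, the components of $a$ are sorted nonincreasingly, and those of $b$ nondecreasingly in (14) and nonincreasingly in (15), before the inner product is taken. The following is a well known result: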


Proposition 4 [69] For $a, b \in \mathbb{R}^n$ the following inequalities hold for any $\varphi \in S_n$:

$$\langle a, b \rangle^- \le \sum_{i=1}^{n} a_i b_{\varphi(i)} \le \langle a, b \rangle^+.$$
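The extremal inner products (14) and (15) reduce to sorting, and Proposition 4 can be checked against all permutations on a small example. A minimal sketch; the function names are illustrative.

```python
import itertools


def inner_min(a, b):
    """<a, b>^- of (14): a sorted nonincreasingly, b nondecreasingly."""
    return sum(x * y for x, y in zip(sorted(a, reverse=True), sorted(b)))


def inner_max(a, b):
    """<a, b>^+ of (15): both vectors sorted nonincreasingly."""
    return sum(x * y for x, y in zip(sorted(a, reverse=True), sorted(b, reverse=True)))


a, b = [3, -1, 2, 0], [1, 5, -2, 4]
values = [sum(a[i] * b[p[i]] for i in range(len(a)))
          for p in itertools.permutations(range(len(a)))]
print(inner_min(a, b), min(values))  # -9 -9
print(inner_max(a, b), max(values))  # 25 25
```

Both extremes are attained by some permutation, so the bounds in Proposition 4 are tight; sorting makes each of them computable in $O(n \log n)$ time.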


Consider an instance QAP$(F, D, B)$, and recall that this can be transformed into an instance of QAP$(C)$ by assigning the values

$$c_{ijkl} = \begin{cases} f_{ik} d_{jl}, & \text{for } i \ne k,\ j \ne l, \\ f_{ii} d_{jj} + b_{ij}, & \text{for } i = k,\ j = l. \end{cases}$$
Each minor $C^{(ij)}$ in the partitioned matrix $C$ is now, apart from its entries in row $i$ and column $j$, the outer product $f_{(i,\cdot)}^{T} d_{(j,\cdot)}$, where $f_{(i,\cdot)}$ and $d_{(j,\cdot)}$ are the $i$th and $j$th row of the matrices $F$ and $D$, respectively. Therefore, using the result of Proposition 4, instead of solving $n^2$ LAPs we can easily compute the values $l_{ij}$ as

$$l_{ij} = f_{ii} d_{jj} + b_{ij} + \langle \hat f_{(i,\cdot)}, \hat d_{(j,\cdot)} \rangle^-, \tag{16}$$

where the vectors $\hat f_{(i,\cdot)}, \hat d_{(j,\cdot)} \in \mathbb{R}^{n-1}$ are obtained by removing the $i$th and $j$th element of the vectors $f_{(i,\cdot)}$ and $d_{(j,\cdot)}$, respectively. Finally, by solving the resulting LAP as in (13), we obtain the Gilmore-Lawler lower bound for the Koopmans-Beckmann QAP, denoted by GLB$(F, D)$, in $O(n^3)$ time. Its name is due to the fact that Lawler [88] and P.C. Gilmore [60] independently derived this lower bound, while Lawler also considered the case of general QAPs. The simplicity of the Gilmore-Lawler lower bound makes it one of the most efficient bounds to compute, although its quality deteriorates quickly as $n$ increases. The quality of the lower bound can be improved if the contribution of the quadratic term in the objective function is made smaller than that of the linear term. Consider the formulation of the general QAP where the linear and the quadratic terms are separated for clarity. By the above discussion the lower bound will be the solution to the LAP


$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} (l_{ij} + c_{ijij})\, x_{ij} \quad \text{s.t. } (x_{ij}) \in X_n.$$
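The Koopmans-Beckmann version of the Gilmore-Lawler bound, as computed via (16) followed by the LAP (13), can be sketched as follows. The sketch assumes square nested-list matrices $F$, $D$, $B$, and the final LAP is solved by enumeration purely for illustration, where the text assumes an $O(n^3)$ solver. The comparison with the brute-force optimum only confirms validity on one random instance.

```python
import itertools
import random


def inner_min(a, b):
    """<a, b>^- of (14)."""
    return sum(x * y for x, y in zip(sorted(a, reverse=True), sorted(b)))


def glb_koopmans_beckmann(F, D, B):
    """GLB(F, D) with linear costs B: l_ij from (16), then the LAP (13)."""
    n = len(F)
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        f_hat = [F[i][k] for k in range(n) if k != i]  # row i of F without f_ii
        for j in range(n):
            d_hat = [D[j][l] for l in range(n) if l != j]  # row j of D without d_jj
            L[i][j] = F[i][i] * D[j][j] + B[i][j] + inner_min(f_hat, d_hat)
    # Enumeration for illustration only; a Hungarian-type solver gives O(n^3).
    return min(sum(L[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))


def qap_value(F, D, B, perm):
    n = len(perm)
    quad = sum(F[i][k] * D[perm[i]][perm[k]] for i in range(n) for k in range(n))
    return quad + sum(B[i][perm[i]] for i in range(n))


rng = random.Random(1)
n = 6
F = [[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
D = [[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
B = [[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
glb = glb_koopmans_beckmann(F, D, B)
opt = min(qap_value(F, D, B, p) for p in itertools.permutations(range(n)))
print(glb, opt)
```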


We want to decompose the cost coefficients in the quadratic term of (4) and transfer some of their value into the linear term such that $c_{ijij} \gg l_{ij}$, which will result in a tighter lower bound since the LAP can be solved exactly. This procedure, known as reduction, was introduced in [41], and it has been investigated by many
researchers (see [18,47,56,128]). The general idea is to 
decompose each quadratic cost coefficient into several 
terms, which in turn will end up being linear cost co- 
efficients and will be moved in the linear term of the 
objective function. Consider the following general de- 
composition for each quadratic cost coefficient in the 
objective function in (4): 

$$\text{D1)}\quad c_{ijkl} = \bar c_{ijkl} + e_{ijk} + g_{ijl} + h_{ikl} + t_{jkl}, \qquad i \ne k,\ j \ne l.$$


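Here $e, g, h, t \in \mathbb{R}^{n^3}$. Substituting the above and separating terms, the objective function in (4) becomes

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\substack{k=1\\k\ne i}}^{n}\sum_{\substack{l=1\\l\ne j}}^{n} \bar c_{ijkl}\, x_{ij} x_{kl} + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\substack{k=1\\k\ne i}}^{n}\sum_{\substack{l=1\\l\ne j}}^{n} \big(e_{ijk} + g_{ijl} + h_{ikl} + t_{jkl}\big)\, x_{ij} x_{kl} + \sum_{i,j=1}^{n} c_{ijij}\, x_{ij}.$$

Consider now the term associated with the $e_{ijk}$:

$$\sum_{i,j=1}^{n}\sum_{\substack{k=1\\k\ne i}}^{n}\sum_{\substack{l=1\\l\ne j}}^{n} e_{ijk}\, x_{ij} x_{kl} = \sum_{i,j=1}^{n} x_{ij} \sum_{\substack{k=1\\k\ne i}}^{n} e_{ijk} \sum_{\substack{l=1\\l\ne j}}^{n} x_{kl}.$$

We can add the term

$$\sum_{\substack{k=1\\k\ne i}}^{n} e_{ijk}$$

to the $(i, j)$th element of the LAP that composes the linear term of the objective function, since $x_{ij} = 1 \Rightarrow x_{kj} = 0\ \forall k \ne i \Rightarrow \sum_{l \ne j} x_{kl} = 1\ \forall k \ne i$. Using similar arguments for the vectors $g$, $h$ and $t$, their costs become linear and the objective function with decomposed quadratic costs becomes

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\substack{k=1\\k\ne i}}^{n}\sum_{\substack{l=1\\l\ne j}}^{n} \bar c_{ijkl}\, x_{ij} x_{kl} + \sum_{i=1}^{n}\sum_{j=1}^{n} \hat c_{ij}\, x_{ij},$$

where

$$\hat c_{ij} = c_{ijij} + \sum_{\substack{k=1\\k\ne i}}^{n} e_{ijk} + \sum_{\substack{l=1\\l\ne j}}^{n} g_{ijl} + \sum_{\substack{k=1\\k\ne i}}^{n} h_{kij} + \sum_{\substack{l=1\\l\ne j}}^{n} t_{lij}.$$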
Therefore we can apply the Gilmore-Lawler bound to the quadratic term of the decomposed objective function, whereas we can get an exact solution to the LAP that composes the linear term, and the sum of these two values will constitute a lower bound for the QAP. In the case of the Koopmans-Beckmann formulation of the QAP, where we have two matrices $F$ and $D$, the general decomposition scheme is:

$$\text{D2)}\quad f_{ij} = \bar f_{ij} + \lambda_i + \mu_j, \quad i \ne j; \qquad d_{kl} = \bar d_{kl} + \nu_k + \rho_l, \quad k \ne l.$$

Here $\lambda, \mu, \nu, \rho \in \mathbb{R}^n$. Substituting into the product $f_{ik} d_{jl}$, it is easily seen that D2) reduces to the general decomposition D1) with vectors $e, g, h, t \in \mathbb{R}^{n^3}$. Frieze and Yadegar [56] showed that the inclusion of the vectors $h$ and $t$ in D1) does not affect the value of the lower bound, and therefore they are redundant (similarly, the vectors $\mu$ and $\rho$ for the Koopmans-Beckmann QAP are redundant in D2)). The same authors in [56] derived
lower bounds for the QAP based on a Lagrangian relaxation (cf. also ► Integer programming: Lagrangian relaxation). Specifically, consider the Lagrangian relaxation of the 0-1 linear programming formulation of the QAP (see (10)), where the second and third constraints are included in the objective function, using as Lagrangian multipliers the elements of the vectors $e$ and $g$. The Lagrangian function is thus defined as

$$
\begin{aligned}
\mathcal{L}(e, g) := \min\ & \sum_{ijkl} c_{ijkl}\, y_{ijkl} + \sum_{jkl} e_{jkl}\Big(x_{kl} - \sum_{i} y_{ijkl}\Big) + \sum_{ikl} g_{ikl}\Big(x_{kl} - \sum_{j} y_{ijkl}\Big) \\
= \min\ & \sum_{ijkl} \big(c_{ijkl} - e_{jkl} - g_{ikl}\big)\, y_{ijkl} + \sum_{kl}\Big(\sum_{j} e_{jkl} + \sum_{i} g_{ikl}\Big) x_{kl} \\
\text{s.t.}\ & \text{first constraint in (10)}, \\
& \text{fourth constraint in (10)}, \\
& \text{last constraint in (10)}.
\end{aligned}
$$


The authors prove in [56] that for any choice of $e$ and $g$, the solution to the above minimization problem equals the value of the Gilmore-Lawler lower bound as applied to the QAP with decomposed quadratic cost coefficients, as dictated by using the vectors $e$ and $g$ only in D1). Therefore, $\max_{e, g} \mathcal{L}(e, g)$ constitutes an upper bound on the lower bounds for the QAP obtained by using the Gilmore-Lawler bound with decomposed quadratic cost coefficients. Using subgradient algorithms (cf. also ► Nondifferentiable optimization: Subgradient optimization methods) the authors derive near optimal solutions for $\max_{e, g} \mathcal{L}(e, g)$, resulting in two lower bounds, denoted by FY1 and FY2,
corresponding to the two different solution approaches 
proposed. As suggested by the experimental results in 
[56], these bounds seem to be sharper than previously 
reported Gilmore-Lawler based lower bounds using re- 
duction techniques. Almost all of the other approaches 
for obtaining lower bounds for the QAP with reduction 
techniques, are special cases of the general decomposi- 
tion scheme D2) (see [18,47,128]). 
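The step in the reduction scheme D1) that moves the $e_{ijk}$ part of the quadratic costs into the linear term relies only on $x$ being a permutation matrix. The following sketch checks that identity numerically for every permutation of a small size, assuming random real values for $e_{ijk}$ (the dictionary `e` and the function name are illustrative). The identity fails for general 0-1 matrices, which is why the transfer is only valid on the feasible set.

```python
import itertools
import random


def linearization_holds(n, seed=0, tol=1e-9):
    """Check sum_{i,j} sum_{k != i} sum_{l != j} e_ijk x_ij x_kl
    == sum_{i,j} x_ij * sum_{k != i} e_ijk for every permutation matrix x."""
    rng = random.Random(seed)
    e = {(i, j, k): rng.uniform(-1, 1)
         for i in range(n) for j in range(n) for k in range(n)}
    for perm in itertools.permutations(range(n)):
        x = [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]
        quadratic = sum(e[i, j, k] * x[i][j] * x[k][l]
                        for i in range(n) for j in range(n)
                        for k in range(n) for l in range(n)
                        if k != i and l != j)
        linear = sum(x[i][j] * sum(e[i, j, k] for k in range(n) if k != i)
                     for i in range(n) for j in range(n))
        if abs(quadratic - linear) > tol:
            return False
    return True


print(linearization_holds(4))  # True
```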


Variance Reduction Lower Bounds 


The variance reduction lower bounds were introduced in [93]. Consider an instance of the Koopmans-Beckmann formulation of the QAP, with input matrices $F = (f_{ij}), D = (d_{ij}) \in \mathbb{R}^{n \times n}$. Now partition both matrices into a sum of two matrices, $F = F_1 + F_2$ and $D = D_1 + D_2$, where $F_1 = (f^{(1)}_{ij})$, $F_2 = (f^{(2)}_{ij})$ and $D_1 = (d^{(1)}_{ij})$, $D_2 = (d^{(2)}_{ij})$. Construct an $n \times n$ matrix $L = (l_{ij})$ by solving the following $n^2$ LAPs:


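$$l_{ij} = \min_{\substack{\varphi \in S_n\\ \varphi(i) = j}} \sum_{k=1}^{n} \Big( f^{(1)}_{ik} d_{j\varphi(k)} + f_{ik} d^{(1)}_{j\varphi(k)} + f^{(2)}_{ik} d^{(2)}_{j\varphi(k)} - f^{(1)}_{ik} d^{(1)}_{j\varphi(k)} \Big). \tag{17}$$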


It is proved in [93] that the solution of the LAP with cost matrix $L$ as constructed above constitutes a lower bound for QAP$(F, D)$. The problem of concern now is to find a way to partition the matrices $F$ and $D$ such that the resulting lower bound is maximized. Observe that when $F_1 = F$ and $D_1 = D$ (i.e. no partitioning), we essentially derive GLB$(F, D)$.

Let $M \in \mathbb{R}^{m \times n}$, and denote its rows and columns respectively as $m_{(i,\cdot)}$ and $m_{(\cdot,j)}$, $i = 1, \dots, m$, $j = 1, \dots, n$. Think of $M$ as a data set of $mn$ elements $m_{ij}$, and define an average $\gamma(M)$ and a variance $V(M)$ as

$$\gamma(M) := \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} m_{ij}, \qquad V(M) := \sum_{i=1}^{m} \sum_{j=1}^{n} \big(\gamma(M) - m_{ij}\big)^2.$$

Also define the total variance

$$T(M, \lambda) := \lambda \sum_{i=1}^{m} V(m_{(i,\cdot)}) + (1 - \lambda)\, V(M),$$

where $\lambda \in [0, 1]$. The term $V(m_{(i,\cdot)})$ stands for the variance of $m_{(i,\cdot)}$, treated as a $1 \times n$ matrix. The authors observed that, as the variances of the matrices $F$ and $D$ decrease, GLB$(F, D)$ increases, and that it is optimal if the variances of the rows of the matrices are zero. The partition scheme considered is of the form $F_1 = F + \Delta_F$, $F_2 = -\Delta_F$, and $D_1 = D + \Delta_D$, $D_2 = -\Delta_D$. Considering only the matrix $F$, the problem is to find a matrix $\Delta_F$ such that the variances of $F_1$ and $F_2$ and the sum of the variances of the rows for each of $F_1$ and $F_2$ are minimized. We will only describe how $\Delta_F$ is obtained, since $\Delta_D$ is obtained in the same way. The problem of minimizing the variances can be stated mathematically as

$$\min\ \theta\, T(F + \Delta_F, \lambda) + (1 - \theta)\, T(-\Delta_F, \lambda), \tag{18}$$

where $\Delta_F \in \mathbb{R}^{n \times n}$ and $\theta \in [0, 1]$ is a parameter. Two approximate solutions were proposed in [93], corresponding to the two reduction schemes
$$\text{R1)}\quad \delta_{ij} = \theta\,(f_{nj} - f_{ij}) + \delta_{nj},$$
$$\text{R2)}\quad \delta_{ij} = \theta\,\big(\gamma(f_{(\cdot,j)}) - \gamma(F)\big).$$

Here $i, j = 1, \dots, n$, $\Delta_F = (\delta_{ij})$, and $\delta_{nj}$ is free to take any value (it was given a value of zero in the experiments conducted in [93]). In obtaining R2), the problem of minimizing the variances such that the matrix $\Delta_F$ is constrained to have constant columns is considered. The matrix $\Delta_D$ is constructed in the same way. Based on the two reduction schemes above, the resulting lower bounds from the solution of (17) are denoted by LB1$(\theta)$ and LB2$(\theta)$. The above procedure for computing $\Delta_F$, $\Delta_D$ has $O(n^2)$ computational complexity.
After the partitioning of the matrices $F$ and $D$, the solution to the LAP with cost matrix $L = (l_{ij})$, where the $l_{ij}$ are defined in (17), will yield LB1$(\theta)$ or LB2$(\theta)$ according to which reduction scheme is used. If LB2$(\theta)$ is used, the fact that the matrices $F_2$ and $D_2$ have constant columns can be exploited to compute the $l_{ij}$, $i, j = 1, \dots, n$, efficiently as


_— (FM Go\" (2) . 
li = faa) Ei; > dj 


k=1 
kx j 
+d SO) fag -—(n- DP dP + fid sj . 
k=1 
ki 


where $\hat f^{(1)}_{(i,\cdot)}, \hat d^{(1)}_{(j,\cdot)} \in \mathbb{R}^{n-1}$ are the $i$th and $j$th row of $F_1$ and $D_1$ respectively, with the $i$th and $j$th elements removed from each, and $\langle \cdot, \cdot \rangle^-$ is defined in (14). In the case that LB1$(\theta)$ is used, the direct approach would be to solve the $n^2$ LAPs defined in (17), which requires $O(n^5)$ computational effort. A different approach is to calculate lower bounds for the values $l_{ij}$, $i, j = 1, \dots, n$, as follows:


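$$l_{ij} \ge \big\langle \hat f^{(1)}_{(i,\cdot)}, \hat d_{(j,\cdot)} \big\rangle^- + \big\langle \hat f_{(i,\cdot)}, \hat d^{(1)}_{(j,\cdot)} \big\rangle^- - \big\langle \hat f^{(1)}_{(i,\cdot)}, \hat d^{(1)}_{(j,\cdot)} \big\rangle^+ + \big\langle \hat f^{(2)}_{(i,\cdot)}, \hat d^{(2)}_{(j,\cdot)} \big\rangle^-,$$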


where each vector in the above extremal inner products is of dimension $n - 1$ and corresponds to the $i$th row or column of the indicated matrix upon removal of the $i$th element. As before, the extremal inner products $\langle \cdot, \cdot \rangle^-$ and $\langle \cdot, \cdot \rangle^+$ are defined in (14) and (15). This approach requires $O(n^3)$ time to compute lower bounds for the $l_{ij}$, $i, j = 1, \dots, n$; thus the total computational complexity of the variance reduction lower bounds is $O(n^3)$. It is worth noting that there is also a closed form solution to problem (18), given in [71], which is
bij = oF yf) 
O(1 —A) + 6A7(1 — 8) — 67A7(1 — 8) 


(1— Oa —A + OA) y(F) 
6A(1 — @) 
apie LE a eee | 
for $i, j = 1, \dots, n$. However, it was reported in [93] that using the above closed form in the computation of the lower bounds poses implementation obstacles.
The experimental results reported in [93] also suggest $\theta = 0.5$ for LB1$(\theta)$ and $\theta = 1$ for LB2$(\theta)$ as the best choices. Finally, these lower bounds perform well on QAPs with input matrices that have high variances, but their performance reduces to that of the Gilmore-Lawler bound when the variance of the matrices is small.
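The quantities that drive the variance reduction bounds are easy to compute. The sketch below assumes, as in the formula for $V(M)$ above, that the variance is the unnormalized sum of squared deviations from the mean; if a normalized variance is intended, divide by the number of entries. Function names are illustrative.

```python
def gamma(M):
    """Average of all entries of an m x n matrix given as a list of rows."""
    return sum(sum(row) for row in M) / (len(M) * len(M[0]))


def variance(M):
    """V(M): sum of squared deviations from gamma(M), with no normalizing factor."""
    g = gamma(M)
    return sum((g - x) ** 2 for row in M for x in row)


def total_variance(M, lam):
    """T(M, lambda) = lambda * sum_i V(row_i) + (1 - lambda) * V(M),
    each row being treated as a 1 x n matrix."""
    return lam * sum(variance([row]) for row in M) + (1 - lam) * variance(M)


M = [[1, 2], [3, 4]]
print(gamma(M), variance(M), total_variance(M, 0.5))  # 2.5 5.0 3.0
```

With $\lambda = 1$ only the row variances count, so a matrix with constant rows has $T(M, 1) = 0$. This matches the remark that the Gilmore-Lawler bound is exact when the row variances vanish.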


Eigenvalue Based Lower Bounds 


These bounds were introduced in [50,51] and are applied to the Koopmans-Beckmann formulation of the QAP. This approach utilizes known results on permutation matrices and eigenvalues, and exploits the special structure of QAP$(F, D)$. Since the introduction of the method in [50,51], many improvements and generalizations have appeared (see for example [65,66,67,68,123,124]). There is a resemblance with the Gilmore-Lawler based lower bounds, in the sense that, starting from a general lower bound, reduction techniques are applied to the quadratic terms of the objective function in order to improve its quality. The reduction techniques applied to eigenvalue based lower bounds, however, yield a significant improvement, which is not really the case for the Gilmore-Lawler bounds under certain reductions.

Considering the trace formulation of the QAP in (7), with $F$ and $D$ being real symmetric matrices, and therefore with all their eigenvalues real, the following result can be stated for the quadratic term [51]:

Theorem 5 [51] Let $F, D \in \mathbb{R}^{n \times n}$ be symmetric matrices, and denote by $\lambda = (\lambda_1, \dots, \lambda_n)^T$ and $x_1, \dots, x_n$ the eigenvalues and (orthonormal) eigenvectors of $F$, and by $\mu = (\mu_1, \dots, \mu_n)^T$ and $y_1, \dots, y_n$ the eigenvalues and eigenvectors of $D$. Then the following two relations are true for all $X \in X_n$:

i) $\operatorname{tr} F X D X^T = \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \mu_j \langle x_i, X y_j \rangle^2 = \lambda^T S(X) \mu$, where $S(X) = (\langle x_i, X y_j \rangle^2)$ is a doubly stochastic matrix;

ii) $\langle \lambda, \mu \rangle^- \le \operatorname{tr} F X D X^T \le \langle \lambda, \mu \rangle^+$.


Using Theorem 5 ii), a lower bound for QAP$(F, D)$ based on the eigenvalues of $F$ and $D$ is then

$$\mathrm{EVB} = \langle \lambda, \mu \rangle^- + \min_{X \in X_n} \operatorname{tr} B X^T,$$

where the second term is an ordinary LAP that can be solved exactly. Observe that in Theorem 5, the smaller the interval $[\langle \lambda, \mu \rangle^-, \langle \lambda, \mu \rangle^+]$ is, the closer $\langle \lambda, \mu \rangle^-$ is to $\operatorname{tr} F X D X^T$. A possible way of making the interval smaller is to decompose the matrices $F$ and $D$ such that some of their value is transferred to the linear term, and the eigenvalues of the resulting matrices that compose the quadratic term are as uniform in value as possible. Define the spread of the matrix $F$ as

$$\operatorname{spread}(F) := \max\{ |\lambda_i - \lambda_j| : i, j = 1, \dots, n \}.$$


Based on the above discussion, we want to minimize the spreads of the matrices that compose the quadratic term. There is no simple closed form for expressing spread$(F)$ in terms of the $f_{ij}$; however, we can instead minimize the following upper bound given in [98]:

$$\operatorname{spread}(F) \le m(F) = \Big[ 2 \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}^2 - \frac{2}{n} (\operatorname{tr} F)^2 \Big]^{1/2}. \tag{19}$$
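Theorem 5 ii), the bound EVB for $B = 0$, and the spread estimate (19) can be checked numerically on a small symmetric instance. The sketch below assumes plain Python lists and computes eigenvalues with a textbook cyclic Jacobi rotation method written here for self-containment (in practice one would call a numerical library). For a permutation `perm`, $\operatorname{tr} F X D X^T$ equals $\sum_{i,k} f_{ik} d_{\mathrm{perm}(i)\mathrm{perm}(k)}$ when $D$ is symmetric. All names and matrices are illustrative.

```python
import itertools
import math


def jacobi_eigenvalues(A, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    a = [list(map(float, row)) for row in A]
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the new a[p][q] vanishes
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def inner_min(a, b):
    return sum(x * y for x, y in zip(sorted(a, reverse=True), sorted(b)))


def inner_max(a, b):
    return sum(x * y for x, y in zip(sorted(a, reverse=True), sorted(b, reverse=True)))


def quadratic_term(F, D, perm):
    n = len(perm)
    return sum(F[i][k] * D[perm[i]][perm[k]] for i in range(n) for k in range(n))


def mirsky_bound(F):
    """Right-hand side m(F) of (19)."""
    n = len(F)
    frob2 = sum(x * x for row in F for x in row)
    tr = sum(F[i][i] for i in range(n))
    return math.sqrt(max(0.0, 2.0 * frob2 - 2.0 * tr * tr / n))


F = [[0, 2, 3, 1], [2, 0, 1, 4], [3, 1, 0, 2], [1, 4, 2, 0]]
D = [[0, 5, 1, 2], [5, 0, 3, 1], [1, 3, 0, 4], [2, 1, 4, 0]]
lam, mu = jacobi_eigenvalues(F), jacobi_eigenvalues(D)
evb = inner_min(lam, mu)  # EVB for B = 0
values = [quadratic_term(F, D, p) for p in itertools.permutations(range(4))]
print(round(evb, 6), min(values), round(inner_max(lam, mu), 6), max(values))
print(round(max(lam) - min(lam), 6), round(mirsky_bound(F), 6))
```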


The decomposition scheme that the authors use in [51] is the following:

$$f_{ij} = \bar f_{ij} + e_i + e_j + r_{ij}, \tag{20}$$

$$d_{ij} = \bar d_{ij} + g_i + g_j + s_{ij}, \tag{21}$$

where $r_{ij} = s_{ij} = 0$ for $i \ne j$.

Consider the decomposition for matrix $F$ and let $\bar F = (\bar f_{ij})$. Minimizing the function $f(e, r) = m(\bar F)$, obtained by substituting the values of $\bar f_{ij}$ in (19), the following values are obtained [51]:

$$z = \frac{1}{2(n-1)} \Big( \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} - \operatorname{tr} F \Big), \tag{22}$$

$$e_i = \frac{1}{n-2} \Big( \sum_{j=1}^{n} f_{ij} - f_{ii} - z \Big), \tag{23}$$

$$r_{ii} = f_{ii} - 2 e_i, \tag{24}$$

for $i = 1, \dots, n$. Analogously we obtain the values for $g$ and $s$ for the decomposition of $D$. Replacing $F$ and $D$ in (7) we obtain


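$$\operatorname{tr}(F X D + B) X^T = \operatorname{tr}(\bar F X \bar D + \bar B) X^T,$$

where $\bar b_{ij} = b_{ij} + f_{ii} d_{jj} + 2 e_i \sum_{k \ne j} d_{jk}$, and the matrices $\bar F$ and $\bar D$ have respective eigenvalues $\bar\lambda = (\bar\lambda_1, \dots, \bar\lambda_n)$ and $\bar\mu = (\bar\mu_1, \dots, \bar\mu_n)$. The corresponding eigenvalue lower bound is then

$$\mathrm{EVB1} = \langle \bar\lambda, \bar\mu \rangle^- + \min_{X \in X_n} \operatorname{tr} \bar B X^T.$$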
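The reduction (22)-(24) and the identity $\operatorname{tr}(F X D) X^T = \operatorname{tr}(\bar F X \bar D + \bar B) X^T$ can be checked numerically. The sketch assumes symmetric $F$ and $D$, $n \ge 3$ and $B = 0$, and sets the diagonal of $\bar F$ to zero, as (24) implies. The function names and matrices are illustrative. The identity holds because every row sum of $\bar F$ vanishes, so the $g$-terms of the decomposition of $D$ contribute nothing.

```python
import itertools


def reduce_matrix(F):
    """Values (22)-(24) for a symmetric F (n >= 3) and the reduced matrix Fbar,
    which has zero diagonal and off-diagonal entries f_ij - e_i - e_j."""
    n = len(F)
    z = (sum(map(sum, F)) - sum(F[i][i] for i in range(n))) / (2 * (n - 1))
    e = [(sum(F[i]) - F[i][i] - z) / (n - 2) for i in range(n)]
    r = [F[i][i] - 2 * e[i] for i in range(n)]
    Fbar = [[0.0 if i == j else F[i][j] - e[i] - e[j] for j in range(n)]
            for i in range(n)]
    return z, e, r, Fbar


def quadratic_term(F, D, perm):
    n = len(perm)
    return sum(F[i][k] * D[perm[i]][perm[k]] for i in range(n) for k in range(n))


def reduced_linear_costs(F, D, e):
    """bbar_ij = f_ii d_jj + 2 e_i sum_{k != j} d_jk (case B = 0)."""
    n = len(F)
    return [[F[i][i] * D[j][j] + 2 * e[i] * sum(D[j][k] for k in range(n) if k != j)
             for j in range(n)] for i in range(n)]


F = [[2, 3, 1, 4, 0], [3, 1, 2, 2, 5], [1, 2, 0, 3, 1], [4, 2, 3, 2, 2], [0, 5, 1, 2, 3]]
D = [[1, 2, 2, 0, 3], [2, 0, 4, 1, 1], [2, 4, 2, 3, 0], [0, 1, 3, 1, 2], [3, 1, 0, 2, 0]]
_, e, _, Fbar = reduce_matrix(F)
_, _, _, Dbar = reduce_matrix(D)
Bbar = reduced_linear_costs(F, D, e)
gaps = [quadratic_term(F, D, p) - quadratic_term(Fbar, Dbar, p)
        - sum(Bbar[i][p[i]] for i in range(5))
        for p in itertools.permutations(range(5))]
print(max(abs(g) for g in gaps))  # expected to be at rounding-error level
```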

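If we restrict ourselves to pure quadratic ($f_{ii} = d_{ii} = 0$ for all $i$, $B = 0$) symmetric QAPs, the matrix $\bar B$ in the above decomposition becomes $\bar B = c w^T$, where $c = 2(e_1, \dots, e_n)^T$ and $w = \big(\sum_{j \ne 1} d_{1j}, \dots, \sum_{j \ne n} d_{nj}\big)^T$. Therefore $\min_{X \in X_n} \operatorname{tr} \bar B X^T = \langle c, w \rangle^-$, and

$$\mathrm{EVB1} = \langle \bar\lambda, \bar\mu \rangle^- + \langle c, w \rangle^- \le \min_{X \in X_n} \operatorname{tr}(\bar F X \bar D + \bar B) X^T.$$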


We can, however, obtain further improvement as suggested by F. Rendl [123], who examined the linear term $\langle c, w \rangle^-$ and proposed a method where EVB1 is iteratively improved until some specified number of iterations is reached or an optimality condition is satisfied. More specifically, let $S_k := \{X_1, \dots, X_k\} \subseteq X_n$ and

$$L(X_i) = \min\{ \langle c, X_i w \rangle : X_i \in X_n \setminus S_{i-1} \},$$

so for any integer $k \ge 1$, $L(X_1) \le L(X_2) \le \cdots \le L(X_k)$. In other words, the set $S_k$ contains the first $k$ solutions (permutation matrices) of the problem $\min_{X \in X_n} \langle c, X w \rangle$, where the first solution $X_1$ satisfies $L(X_1) = \langle c, w \rangle^-$. Let

$$\mathrm{QAP}(\bar F, \bar D, X_i) = \operatorname{tr}(\bar F X_i \bar D + \bar B) X_i^T,$$

and also define

$$Z(k) := \min\{ \mathrm{QAP}(\bar F, \bar D, X_i) : i = 1, \dots, k \}.$$

The following inequalities [123] result:

$$Z(1) \ge Z(2) \ge \cdots \ge Z(k) \ge \langle \bar\lambda, \bar\mu \rangle^- + L(X_k) \ge \cdots \ge \langle \bar\lambda, \bar\mu \rangle^- + L(X_1),$$

where, if $Z(i) = \langle \bar\lambda, \bar\mu \rangle^- + L(X_i)$ for some $i$, then $X_i$ is the optimal solution to the problem. So essentially,


we try to close or reduce the gap between the optimal 
solution of the QAP and the lower bound EVB1, by 


increasing the value of the linear term (c, w)~ in the 
bound in k steps, where k is specified as a parameter. 
The generation of the set $S_k$, or ranking as it is called, is a special case of the problem of ranking the $k$ first solutions of an assignment problem with cost matrix $(c_iw_j)$, which, as shown in [104], has time complexity O(kn*). Rendl [123] presents an $O(n\log n + (n + \log k)k)$ algorithm for this special case. There are two issues regarding the effectiveness of the above ranking procedure in improving the lower bound, addressed in [123]. First, observe that if the vectors $c$ and $w$ have $m\le n$ equal elements, then there are at least $m!$ permutation matrices $\{X_i\}$ such that the values $\langle c, X_iw\rangle$ are equal. This in turn implies that there will be no or only a small improvement in the lower bound while generating $S_k$, for quite some number of iterations. As dictated by the decomposition in (22), (23), $c$ and $w$ will have equal elements if the row sums of $F$ and $D$ are equal. One condition for applying the ranking procedure is then that most of the row sums of $F$ and $D$ are not equal. Secondly, Rendl [123] also defines a ratio called the degree of linearity, based on the ranges of the quadratic and linear terms that compose the lower bound:

$$L = \frac{\langle\lambda,\mu\rangle_+ - \langle\lambda,\mu\rangle_-}{\langle c,w\rangle_+ - \langle c,w\rangle_-}.$$
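The degree of linearity only needs the minimal and maximal scalar products, so it can be computed by sorting. A minimal sketch, assuming the eigenvalues `lam`, `mu` of the reduced matrices and the vectors `c`, `w` have already been computed elsewhere; returning infinity when the linear term is constant is a convention chosen here.

```python
def degree_of_linearity(lam, mu, c, w):
    """L = (<lam,mu>_+ - <lam,mu>_-) / (<c,w>_+ - <c,w>_-)."""
    def scalar_product(a, b, minimal):
        # minimal=True pairs a ascending with b descending, giving <a,b>_-
        return sum(x * y for x, y in zip(sorted(a), sorted(b, reverse=minimal)))

    quad_range = scalar_product(lam, mu, False) - scalar_product(lam, mu, True)
    lin_range = scalar_product(c, w, False) - scalar_product(c, w, True)
    if lin_range == 0:
        return float("inf")        # linear term is the same for every permutation
    return quad_range / lin_range


print(degree_of_linearity([-2, 0, 5], [1, 1, 4], [1, 2], [3, 1]))  # 10.5
```

Here the quadratic range is $18 - (-3) = 21$ and the linear range is $7 - 5 = 2$.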

The influence of the linear term on the lower bound is then inversely proportional to the value of $L$. A small value of $L$ suggests that the ranking procedure would be beneficial for improving EVB1 for symmetric, pure quadratic QAPs. For large values of $L$, we can expect the quadratic term to dominate the linear term in the objective function, and [51] suggests the following improvement on EVB1. Considering Theorem 5i) as applied to the reduced matrices $\tilde F$ and $\tilde D$, denote the elements of the matrix $S(X)$ by $s_{ij} = \langle x_i, Xy_j\rangle^2$. We can apply the bounds $l_{ij}\le s_{ij}\le u_{ij}$, where

$$u_{ij} = \max\{\langle x_i,y_j\rangle_+^2,\ \langle x_i,y_j\rangle_-^2\},$$

$$l_{ij} = \begin{cases}0 & \text{if } \langle x_i,y_j\rangle_+ \text{ and } \langle x_i,y_j\rangle_- \text{ differ in sign},\\ \min\{\langle x_i,y_j\rangle_+^2,\ \langle x_i,y_j\rangle_-^2\} & \text{otherwise.}\end{cases}$$


Recalling that the $s_{ij}$ are the elements of a doubly stochastic matrix, we can then form the capacitated transportation problem

$$\begin{aligned}\mathrm{CTP}^* = \min\ & \sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\mu_js_{ij}\\ \text{s.t.}\ & \sum_{i=1}^{n}s_{ij} = 1,\quad j = 1,\dots,n,\\ & \sum_{j=1}^{n}s_{ij} = 1,\quad i = 1,\dots,n,\\ & l_{ij}\le s_{ij}\le u_{ij}.\end{aligned}$$

The new lower bound then would be

$$\mathrm{EVB2} = \mathrm{CTP}^* + \langle c,w\rangle_-.$$
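The capacity bounds of CTP* follow from the fact that, over all permutations $X$, $\langle x_i, Xy_j\rangle$ lies between $\langle x_i,y_j\rangle_-$ and $\langle x_i,y_j\rangle_+$, so its square can vanish only when that interval contains zero. A minimal sketch of the bound computation, assuming the eigenvectors `xs` of $\tilde F$ and `ys` of $\tilde D$ are given as lists of equal-length vectors (solving the transportation problem itself is left to an LP or network-flow solver):

```python
def scalar_products(a, b):
    """Return (<a,b>_-, <a,b>_+), the min and max of <a, X b> over permutations X."""
    lo = sum(x * y for x, y in zip(sorted(a), sorted(b, reverse=True)))
    hi = sum(x * y for x, y in zip(sorted(a), sorted(b)))
    return lo, hi


def ctp_bounds(xs, ys):
    """Bounds l[i][j] <= <x_i, X y_j>^2 <= u[i][j], valid for every permutation X."""
    n = len(xs)
    l = [[0.0] * n for _ in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            lo, hi = scalar_products(x, y)
            u[i][j] = max(lo * lo, hi * hi)
            # if lo and hi differ in sign, 0 lies in [lo, hi] and the square can be 0
            l[i][j] = 0.0 if lo * hi < 0 else min(lo * lo, hi * hi)
    return l, u
```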


A more general approach to eigenvalue based lower bounding techniques was employed in [66], which led to new lower bounds. Consider the following sets of $n\times n$ matrices, where $I\in\mathbb R^{n\times n}$ is the identity matrix and $u := (1,\dots,1)^T\in\mathbb R^n$ is the vector of ones:

$$\mathcal O := \{X : X^TX = I\},\qquad \mathcal E := \{X : Xu = X^Tu = u\},\qquad \mathcal N := \{X : X\ge 0\}.$$
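Membership in the three sets can be tested directly from the definitions. The small sketch below (function names and tolerance are illustrative choices) shows that a permutation matrix lies in all three sets, the doubly stochastic matrix with all entries $1/3$ lies in $\mathcal E\cap\mathcal N$ but not in $\mathcal O$, and a signed rotation lies in $\mathcal O$ but in neither of the other two.

```python
def in_O(X, tol=1e-9):
    """X^T X = I (orthogonal)."""
    n = len(X)
    return all(abs(sum(X[k][i] * X[k][j] for k in range(n)) - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


def in_E(X, tol=1e-9):
    """X u = X^T u = u (every row and column sums to 1)."""
    n = len(X)
    rows = all(abs(sum(X[i]) - 1) <= tol for i in range(n))
    cols = all(abs(sum(X[i][j] for i in range(n)) - 1) <= tol for j in range(n))
    return rows and cols


def in_N(X, tol=1e-9):
    """X >= 0 entrywise."""
    return all(x >= -tol for row in X for x in row)


perm = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]       # permutation matrix
flat = [[1 / 3] * 3 for _ in range(3)]          # doubly stochastic, not orthogonal
rot = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]        # orthogonal, has a negative entry
```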


It is a well known result that $\mathcal X_n = \mathcal O\cap\mathcal E\cap\mathcal N$, while the set of doubly stochastic matrices is $\mathcal D = \mathcal E\cap\mathcal N$. Moreover, by Birkhoff's theorem [15] we know that $\mathcal D$ is a convex polyhedron with vertex set $\mathcal X_n$, that is, $\mathcal D = \operatorname{conv}\{X : X\in\mathcal X_n\}$. Considering the above characterization of $\mathcal X_n$, we can see that the optimal value of any relaxation of the QAP obtained by dropping one or two of the matrix sets $\mathcal O$, $\mathcal E$ and $\mathcal N$ yields a lower bound. Naturally the relaxation, and therefore the lower bound, will be tighter if only one of the matrix sets is dropped. In relation to Theorem 5, Rendl and Wolkowicz [124] showed that

$$\min_{X\in\mathcal O}\operatorname{tr}FXDX^T = \operatorname{tr}FA_FA_D^TDA_DA_F^T = \langle\lambda,\mu\rangle_-,$$

$$\max_{X\in\mathcal O}\operatorname{tr}FXDX^T = \operatorname{tr}FA_FA_D^TDA_DA_F^T = \langle\lambda,\mu\rangle_+,$$

where $A_F$, $A_D$ are the matrices whose columns are the eigenvectors of $F$ and $D$ respectively, ordered as specified by the minimal (maximal) scalar product of the eigenvalues. In other words, the lower bound on the quadratic part of the QAP obtained from the EVB is derived by relaxing the feasible set to that of orthogonal matrices. In [124] a new lower bound is derived, similar to EVB2 but using a different approach to decompose the matrices $F$ and $D$. More specifically, denote the decomposition scheme in (20) and (21) by the vector $d := (e^T, g^T, r^T, s^T)^T\in\mathbb R^{4n}$, where $r = (r_{11},\dots,r_{nn})^T$ and $s = (s_{11},\dots,s_{nn})^T$, and consider EVB1 as a function of $d$. Maximizing this function with respect to $d$ results in a lower bound with the best possible decomposition involving both the linear and the quadratic terms. This leads to a nonlinear, nonsmooth, nonconcave function which is hard to maximize, and a steepest ascent algorithm for maximizing it is proposed in [124]. The new bound, denoted EVB3, produces some of the best lower bounds for the QAP, at the expense, however, of high computational requirements.
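For $n = 2$ the Rendl-Wolkowicz identity can be checked numerically without an eigenvalue library: every $2\times 2$ orthogonal matrix is a rotation or a reflection by some angle $\theta$, so scanning $\theta$ on a fine grid approximates the minimum and maximum of $\operatorname{tr}FXDX^T$ over $\mathcal O$. The sketch uses the closed-form eigenvalues of a symmetric $2\times 2$ matrix; the matrices and the grid size are arbitrary illustrative choices.

```python
import math


def eig2(a, b, d):
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, d]]."""
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return [mid - rad, mid + rad]


def trace_fxdxt(F, D, X):
    """tr(F X D X^T) for 2 x 2 matrices stored as nested lists."""
    XD = [[sum(X[i][k] * D[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    M = [[sum(XD[i][k] * X[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
    return sum(F[i][k] * M[k][i] for i in range(2) for k in range(2))


def orthogonal_range(F, D, steps=20000):
    """Approximate min and max of tr(F X D X^T) over all 2 x 2 orthogonal X."""
    values = []
    for step in range(steps):
        t = 2 * math.pi * step / steps
        co, si = math.cos(t), math.sin(t)
        for X in ([[co, -si], [si, co]], [[co, si], [si, -co]]):  # rotation, reflection
            values.append(trace_fxdxt(F, D, X))
    return min(values), max(values)


F = [[2.0, 1.0], [1.0, -1.0]]
D = [[0.0, 3.0], [3.0, 4.0]]
lam, mu = eig2(2.0, 1.0, -1.0), eig2(0.0, 3.0, 4.0)
lower = lam[0] * mu[1] + lam[1] * mu[0]   # <lambda, mu>_-
upper = lam[0] * mu[0] + lam[1] * mu[1]   # <lambda, mu>_+
print(orthogonal_range(F, D), (lower, upper))  # both approximately (-11.0, 15.0)
```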

All of the above lower bounds relax the set of permutation matrices to $\mathcal O$. A tighter relaxation was proposed in [67], where the set of permutation matrices was relaxed to $\mathcal O\cap\mathcal E$ by incorporating $\mathcal E$ in the objective function, exploiting the fact that the vector of ones $u$ is both a left and a right eigenvector with eigenvalue 1 of any $X\in\mathcal X_n$. More specifically, define

$$P := [\,u/\|u\|\ \vdots\ V\,],$$

where $V^Tu = 0$ and $V^TV = I_{n-1}$. Therefore $V$ is an orthonormal basis for $\{u\}^\perp$, while $Q := VV^T$ is the orthogonal projection onto $\{u\}^\perp$. The following characterization of the permutation matrices is given in [67].


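Lemma 6 [67] Let $X\in\mathbb R^{n\times n}$ and $Y\in\mathbb R^{(n-1)\times(n-1)}$. If

$$X = P\begin{pmatrix}1 & 0\\ 0 & Y\end{pmatrix}P^T, \tag{25}$$

then $X\in\mathcal E$; $X\in\mathcal N \iff VYV^T\ge -uu^T/\|u\|^2$; and $X\in\mathcal O_n \iff Y\in\mathcal O_{n-1}$.

Conversely, if $X\in\mathcal E$ then there exists a $Y$ such that (25) holds.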


Note that the above characterization of the permutation matrices preserves the orthogonality and the trace structure of the problem. Substituting $X = uu^T/\|u\|^2 + VYV^T$, as suggested by (25), in the trace formulation of the QAP in (7), we obtain an equivalent projected problem (PQAP) of dimension $n-1$ in the matrix variable $Y$. The new lower bound IVB is obtained by relaxing $Y$ to $\mathcal O_{n-1}$, thereby deriving a lower bound for the quadratic part of PQAP, while the linear part can be solved exactly as an LAP. Decompositions for improving IVB are also considered in [67], where it is shown that the quadratic term in the projected problem is unaffected by $e$ and $g$ in the decomposition scheme in (20), (21). Obtaining a lower bound by considering both the quadratic and the linear term is also considered in [78].
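The substitution $X = uu^T/\|u\|^2 + VYV^T$ (with $\|u\|^2 = n$) can be illustrated numerically. The sketch builds one possible orthonormal basis $V$ of $\{u\}^\perp$ by Gram-Schmidt applied to $e_1 - e_2, e_1 - e_3, \dots$ (an arbitrary choice; any orthonormal basis works), stores $V$ as a list of its columns, and checks two facts from Lemma 6: every lifted $X$ has unit row and column sums, and a permutation matrix is recovered from $Y = V^TXV$.

```python
import math


def complement_basis(n):
    """Columns of an orthonormal basis V of {u}^perp (Gram-Schmidt on e_1 - e_k)."""
    cols = []
    for k in range(1, n):
        v = [0.0] * n
        v[0], v[k] = 1.0, -1.0                      # already orthogonal to u
        for q in cols:                              # remove components along earlier columns
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols


def lift(Y, V, n):
    """X = u u^T / n + V Y V^T, cf. (25); V[a] is the a-th column of V."""
    m = n - 1
    return [[1.0 / n + sum(V[a][i] * Y[a][b] * V[b][j] for a in range(m) for b in range(m))
             for j in range(n)] for i in range(n)]


def project(X, V, n):
    """Y = V^T X V."""
    m = n - 1
    return [[sum(V[a][i] * X[i][j] * V[b][j] for i in range(n) for j in range(n))
             for b in range(m)] for a in range(m)]
```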

All of the eigenvalue based lower bounding techniques described above require the QAP to be symmetric. Hadley, Rendl and Wolkowicz [68] show that any real QAP can be transformed into an equivalent QAP where the matrices F and D are Hermitian, which allows the application of eigenvalue based bounds.


Bounds Based on Semidefinite Relaxations 


Recently (as of 1999), semidefinite programming (SDP) relaxations for the QAP were considered [76,77,137]. The SDP relaxations considered in these papers are solved by interior point methods or cutting plane methods (cf. also » Linear programming: Interior point methods; » Extended cutting plane algorithm), and their optimal values are valid lower bounds for the QAP. In terms of quality, the bounds obtained in this way are competitive with the best existing lower bounds for the QAP. For many test instances from QAPLIB [31], such as some instances of Hadley [26], Roucairol [128], Nugent et al. [105], and Taillard [133], they are the best existing bounds. However, due to prohibitively high computation time requirements, the use of such approaches as basic bounding procedures within branch and bound algorithms is up to now not feasible. We refer to [77,78] for a detailed description of SDP approaches to the QAP and illustrate the idea by describing just one semidefinite programming relaxation for the QAP.

The set of $n\times n$ permutation matrices $\mathcal X_n$ is the intersection of the set of $n\times n$ 0-1 matrices, denoted by $\mathcal Z_n$, and the set $\mathcal E_n$ of $n\times n$ matrices with row and column sums equal to 1. Moreover, $\mathcal X_n$ is also the intersection of $\mathcal Z_n$ with the set of $n\times n$ orthogonal matrices, denoted by $\mathcal O_n$. Hence

$$\mathcal X_n = \mathcal Z_n\cap\mathcal E_n = \mathcal Z_n\cap\mathcal O_n.$$


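Recall that

$$\mathcal O_n = \{X\in\mathbb R^{n\times n} : XX^T = X^TX = I\}$$

and

$$\mathcal E_n = \{X\in\mathbb R^{n\times n} : Xu = X^Tu = u\},$$

where $I$ is the $n\times n$ identity matrix and $u$ is the $n$-dimensional vector of all ones. Then the trace formulation of the QAP (7) with the additional linear term

$$-2\sum_{i=1}^{n}\sum_{j=1}^{n}b_{ij}x_{ij}$$

can be represented equivalently as follows:

$$(\mathrm{QAP}_{\mathcal E})\qquad\begin{aligned}\min\ & \operatorname{tr}(FXDX^T - 2BX^T)\\ \text{s.t.}\ & XX^T = X^TX = I,\\ & Xu = X^Tu = u,\\ & x_{ij}^2 - x_{ij} = 0.\end{aligned}$$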
In order to obtain a semidefinite relaxation for the QAP from the formulation QAP$_{\mathcal E}$ above, we first introduce an $n^2$-dimensional vector $\operatorname{vec}(X)$, obtained as a columnwise ordering of the entries of the matrix $X$. Then the vector $\operatorname{vec}(X)$ is lifted into the space of $(n^2+1)\times(n^2+1)$ matrices by introducing a matrix $Y_X$,

$$Y_X = \begin{pmatrix}x_0 & \operatorname{vec}(X)^T\\ \operatorname{vec}(X) & \operatorname{vec}(X)\operatorname{vec}(X)^T\end{pmatrix}.$$

Thus, $Y_X$ has some entry $x_0$ in the upper-left corner followed by the vector $\operatorname{vec}(X)$ in its first row (column). The remaining terms are those of the matrix $\operatorname{vec}(X)\operatorname{vec}(X)^T$ sitting in the lower right $n^2\times n^2$ block of $Y_X$. Secondly, the coefficients of the problem are collected in an $(n^2+1)\times(n^2+1)$ matrix $K$ given as

$$K = \begin{pmatrix}0 & -\operatorname{vec}(B)^T\\ -\operatorname{vec}(B) & D\otimes F\end{pmatrix},$$

where the operator vec is defined as above and $D\otimes F$ is the Kronecker product of $D$ and $F$.

It is easy to see that with these notations the objective function of QAP$_{\mathcal E}$ equals $\operatorname{tr}(KY_X)$. By setting $y_{00} := x_0 = 1$ as done in [77], one obtains two additional constraints to be fulfilled by the matrix $Y_X$: $Y_X$ is positive semidefinite and $Y_X$ is a rank-one matrix.
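The identity $\operatorname{tr}(KY_X) = \operatorname{tr}(FXDX^T - 2BX^T)$ can be checked on a small instance. It rests on $(D\otimes F)\operatorname{vec}(X) = \operatorname{vec}(FXD^T)$ for the columnwise vec, so the sketch below assumes $D$ is symmetric, as in the symmetric QAP. Matrices are plain nested lists and all helper names are illustrative.

```python
def vec(X):
    """Columnwise vectorization: vec(X)[j*n + i] = X[i][j] (0-based)."""
    n = len(X)
    return [X[i][j] for j in range(n) for i in range(n)]


def kron(A, B):
    """Kronecker product A (x) B of two n x n matrices."""
    n = len(A)
    return [[A[a][b] * B[i][k] for b in range(n) for k in range(n)]
            for a in range(n) for i in range(n)]


def lifted_matrix(X):
    """Y_X with y_00 = x_0 = 1, i.e. the rank-one matrix (1, vec(X)) (1, vec(X))^T."""
    v = [1.0] + vec(X)
    return [[a * b for b in v] for a in v]


def cost_matrix(F, D, B):
    """K = [[0, -vec(B)^T], [-vec(B), D (x) F]]."""
    b = vec(B)
    DF = kron(D, F)
    return [[0.0] + [-x for x in b]] + [[-b[r]] + DF[r] for r in range(len(b))]


def trace_product(A, Y):
    """tr(A Y) for square matrices stored as nested lists."""
    m = len(A)
    return sum(A[r][s] * Y[s][r] for r in range(m) for s in range(m))


def qap_objective(F, D, B, X):
    """tr(F X D X^T - 2 B X^T) evaluated directly."""
    n = len(X)
    quad = sum(F[i][k] * X[k][q] * D[q][j] * X[i][j]
               for i in range(n) for k in range(n) for q in range(n) for j in range(n))
    return quad - 2 * sum(B[i][j] * X[i][j] for i in range(n) for j in range(n))
```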
Whereas the semidefiniteness and the equality $y_{00} = 1$ can be immediately included in an SDP relaxation, the rank-one condition is hard to handle and is discarded in an SDP relaxation. In order to ensure that the rank-one positive semidefinite matrix $Y_X$ is obtained from an $n\times n$ permutation matrix as described above, other constraints should be imposed on $Y_X$. Such conditions can be formulated as valid constraints of an SDP formulation for the QAP by means of some new operators, acting on matrices or vectors as introduced below. Given a matrix $A\in\mathbb R^{n\times n}$, the operator $\operatorname{diag}(A)\in\mathbb R^n$ produces a vector containing the diagonal entries of $A$ in their natural order, that is, from top-left to bottom-right. The adjoint operator Diag acts on a vector $V\in\mathbb R^n$ and produces a matrix $\operatorname{Diag}(V)\in\mathbb R^{n\times n}$ with off-diagonal entries equal to 0 and the components of $V$ on the main diagonal. For a matrix $Y\in\mathbb R^{(n^2+1)\times(n^2+1)}$, the operator $\operatorname{arrow}(Y)\in\mathbb R^{n^2+1}$ is defined as $\operatorname{arrow}(Y) := \operatorname{diag}(Y) - (0, Y_{(0,1:n^2)})$, where $(0, Y_{(0,1:n^2)})$ is an $(n^2+1)$-dimensional vector with first entry equal to 0 and other entries coinciding with the entries of $Y$ lying in the 0th row and in columns 1 to $n^2$, in their natural order. The adjoint operator Arrow acts on an $(n^2+1)$-dimensional vector $W$ and produces an $(n^2+1)\times(n^2+1)$ matrix $\operatorname{Arrow}(W)$,

$$\operatorname{Arrow}(W) = \begin{pmatrix}w_0 & -\tfrac12W_{(1:n^2)}^T\\ -\tfrac12W_{(1:n^2)} & \operatorname{Diag}(W_{(1:n^2)})\end{pmatrix},$$

where $W_{(1:n^2)}$ is the $n^2$-dimensional vector obtained from $W$ by removing its first entry $w_0$. Furthermore, we consider an $(n^2+1)\times(n^2+1)$ matrix $Y$ as composed of its first row $Y_{(0,\cdot)}$, its first column $Y_{(\cdot,0)}$, and $n^2$ submatrices of size $n\times n$ each, which are arranged in an $n\times n$ array of $n\times n$ matrices and form its remaining $n^2\times n^2$ block (this is similar to the structure of a Kronecker product of two $n\times n$ matrices). The entry $Y_{\alpha\beta}$, $1\le\alpha,\beta\le n^2$, will also be denoted by $y_{(i,j)(k,l)}$, with $1\le i,j,k,l\le n$, where $\alpha = (i-1)n + j$ and $\beta = (k-1)n + l$. Hence, $y_{(i,j)(k,l)}$ is the element with coordinates $(j,l)$ within the $n\times n$ block with coordinates $(i,k)$.
With these formal conventions let us define the so-called block-0-diagonal and off-0-diagonal operators, acting on an $(n^2+1)\times(n^2+1)$ matrix $Y$ and denoted by $\mathrm{b^0diag}$ and $\mathrm{o^0diag}$, respectively. $\mathrm{b^0diag}(Y)$ and $\mathrm{o^0diag}(Y)$ are $n\times n$ matrices given as follows:

$$\mathrm{b^0diag}(Y) = \sum_{k=1}^{n}Y_{(k,\cdot),(k,\cdot)},\qquad \mathrm{o^0diag}(Y) = \sum_{k=1}^{n}Y_{(\cdot,k),(\cdot,k)},$$

where, for $1\le k\le n$, $Y_{(k,\cdot),(k,\cdot)}$ is the $k$th $n\times n$ matrix on the diagonal of the $n\times n$ array of matrices defined above. Analogously, $Y_{(\cdot,k),(\cdot,k)}$ is an $n\times n$ matrix consisting of the elements sitting in position $(k,k)$ of the $n\times n$ matrices ($n^2$ matrices altogether) which form the $n^2\times n^2$ lower right block of $Y$. The corresponding adjoint operators $\mathrm{B^0Diag}$ and $\mathrm{O^0Diag}$ act on an $n\times n$ matrix $S$ and produce $(n^2+1)\times(n^2+1)$ matrices as follows:

$$\mathrm{B^0Diag}(S) = \begin{pmatrix}0 & 0\\ 0 & I\otimes S\end{pmatrix},\qquad \mathrm{O^0Diag}(S) = \begin{pmatrix}0 & 0\\ 0 & S\otimes I\end{pmatrix}.$$

Finally, let us denote by $e_0$ the $(n^2+1)$-dimensional unit vector with first component equal to 1 and all other components equal to 0, and let $R$ be the $(n^2+1)\times(n^2+1)$ matrix given by

$$R = \begin{pmatrix}n & -u^T\otimes u^T\\ -u\otimes u & I\otimes E\end{pmatrix} + \begin{pmatrix}n & -u^T\otimes u^T\\ -u\otimes u & E\otimes I\end{pmatrix},$$

where $E$ is the $n\times n$ matrix of all ones. With these notations, a semidefinite relaxation for QAP$_{\mathcal E}$ is given as follows:


$$(\mathrm{QAP}_{R0})\qquad\begin{aligned}\min\ & \operatorname{tr}(KY)\\ \text{s.t.}\ & \mathrm{b^0diag}(Y) = I,\\ & \mathrm{o^0diag}(Y) = I,\\ & \operatorname{arrow}(Y) = e_0,\\ & \operatorname{tr}(RY) = 0,\\ & Y\succeq 0,\end{aligned}$$

where $\succeq$ is the so-called Löwner partial order, that is, $A\preceq B$ if and only if $B - A\succeq 0$, that is, $B - A$ is positive semidefinite. In [77] it was shown that an equivalent formulation for the considered QAP is obtained from QAP$_{R0}$ by imposing one additional condition on the matrix $Y$, namely, the rank-one condition.
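To see why the constraints of QAP$_{R0}$ are valid, one can evaluate the operators on a lifted matrix $Y_X$ built from a permutation matrix. With the columnwise vec, $\mathrm{b^0diag}(Y_X) = XX^T$ and $\mathrm{o^0diag}(Y_X) = X^TX$, $\operatorname{arrow}(Y_X) = e_0$ encodes $x_{ij}^2 = x_{ij}$, and $\operatorname{tr}(RY_X) = \|X^Tu - u\|^2 + \|Xu - u\|^2$. A minimal sketch using 0-based indices (the pair $(i,j)$ maps to row $in + j + 1$ of $Y$); all helper names are illustrative.

```python
def lifted_matrix(X):
    """Y_X = (1, vec(X)) (1, vec(X))^T with columnwise vec, so y_00 = 1."""
    n = len(X)
    v = [1.0] + [X[i][j] for j in range(n) for i in range(n)]
    return [[a * b for b in v] for a in v]


def arrow(Y):
    """arrow(Y) = diag(Y) - (0, Y_(0,1:n^2))."""
    return [Y[0][0]] + [Y[a][a] - Y[0][a] for a in range(1, len(Y))]


def b0diag(Y, n):
    """Sum over k of the k-th diagonal n x n block of the lower right n^2 x n^2 part."""
    return [[sum(Y[k * n + j + 1][k * n + l + 1] for k in range(n)) for l in range(n)]
            for j in range(n)]


def o0diag(Y, n):
    """Sum over k of the n x n matrix formed by entry (k, k) of every block."""
    return [[sum(Y[i * n + k + 1][m * n + k + 1] for k in range(n)) for m in range(n)]
            for i in range(n)]


def r_matrix(n):
    """R = sum of the two arrow-shaped terms; u (x) u is the all-ones vector."""
    m = n * n
    R = [[2.0 * n] + [-2.0] * m]
    for a in range(m):
        # (I (x) E)[a][b] = 1 if same block; (E (x) I)[a][b] = 1 if same offset in block
        R.append([-2.0] + [float(a // n == b // n) + float(a % n == b % n) for b in range(m)])
    return R


def trace_product(A, Y):
    """tr(A Y) for square matrices stored as nested lists."""
    m = len(A)
    return sum(A[r][s] * Y[s][r] for r in range(m) for s in range(m))


X = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]      # a permutation matrix
Y = lifted_matrix(X)
print(b0diag(Y, 3), o0diag(Y, 3), arrow(Y), trace_product(r_matrix(3), Y))
```

For a 0-1 matrix whose row sums are $(2,1,0)$ and whose column sums are all 1, $\operatorname{tr}(RY_X) = 2$, so the constraint $\operatorname{tr}(RY) = 0$ cuts it off.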


Exact Solution Methods 


Several exact solution approaches for the QAP will be presented in this section. Specifically, the exact algorithms that have been used for the QAP are dynamic programming, cutting plane algorithms, and branch and bound, which appears to be the most successful one.


Branch and Bound 


Branch and bound algorithms appear to be the most efficient exact algorithms for solving the QAP. For the QAP there are three types of branch and bound algorithms, namely:

• Single assignment algorithms ([60,88]).

• Pair assignment algorithms ([59,86,105]).

• Relative positioning algorithms ([97]).

All of the above algorithms work by iteratively constructing an optimal permutation, starting from an empty permutation. The single assignment algorithms seem to be the most efficient, while the pair assignment algorithms have not shown favorable computational results.

We will now describe a recent branch and bound algorithm for the QAP that was proposed in [111]. In the description that follows we consider the Koopmans-Beckmann formulation of the QAP. First let us define the notation used in describing the branch and bound algorithm. A partial permutation for the set of integers $\mathcal N_n = \{1,\dots,n\}$ is denoted by

$$\phi_k = \begin{pmatrix}1 & 2 & \cdots & k\\ \phi_k(1) & \phi_k(2) & \cdots & \phi_k(k)\end{pmatrix},$$

where $k\le n$. From now on we write $\phi_k = (\phi_k(1),\phi_k(2),\dots,\phi_k(k))$ for short. An assignment of a facility $i$ to a location $j$ will be denoted by $i\to j$, while if $i$ must never be assigned to $j$ we write $i\not\to j$. Note that $\phi_k$ is essentially a partial assignment of facilities to locations. If we want to add an extra assignment to some $\phi_k$, say $k+1\to j$, we write $\phi_{k+1} = \phi_k\cup\{k+1\to j\}$, so that $\phi_{k+1}(i) = \phi_k(i)$ for $i = 1,\dots,k$, and $\phi_{k+1}(k+1) = j$. Given some $\phi_k$, let its range be $Q_k := \{\phi_k(i) : i = 1,\dots,k\}$, and define the set of nonpermissible assignments to be $E_{k+1} := \{j\in\mathcal N_n\setminus Q_k : k+1\not\to j\}$. Given an instance QAP$(F,D,B)$, a pair $\phi_k$ and $E_{k+1}$ completely defines a subproblem $P_i$ as

$$\begin{aligned}\min\ & \sum_{i=1}^{n}\sum_{j=1}^{n}f_{ij}d_{\phi(i)\phi(j)} + \sum_{i=1}^{n}b_{i\phi(i)}\\ \text{s.t.}\ & \phi(i) = \phi_k(i),\quad i = 1,\dots,k,\\ & \phi(k+1)\notin E_{k+1}.\end{aligned}$$

The original problem $P_0$ is obtained for an empty partial permutation $\phi_0$ and $E_1 = \emptyset$. For each $P_i$ a lower bound $g(P_i)$ can be computed using any of the lower bounds described previously; let the optimal value of $P_i$ be denoted by $f(P_i)$. In the branch and bound algorithm, a forest of $n$ binary trees is constructed, where each node of a tree corresponds to a subproblem $P_i$. The branching process is as follows. Given a node $P_i$ defined by some $\phi_k$ and $E_{k+1}$, two descendant nodes are created, the left child $P_i^l$ and the right child $P_i^r$. For $P_i^l$ we set $\phi_{k+1}^l = \phi_k\cup\{k+1\to j\}$ for some $j\in\mathcal N_n\setminus(Q_k\cup E_{k+1})$ and $E_{k+2}^l = \emptyset$, while for $P_i^r$ we set $\phi_k^r = \phi_k$ and $E_{k+1}^r = E_{k+1}\cup\{j\}$. A node with $\phi_k$, $k = n-1$, cannot be decomposed further, and it is called a terminal node. We can immediately identify the following properties:

• $g(P_i)\le f(P_i)$ for any node $P_i$,

• $g(P_i) = f(P_i)$ if $P_i$ is a terminal node,

• $g(P_j)\ge g(P_i)$ if $P_j$ has descended from $P_i$.

A node defined by some $\phi_k$ and $E_{k+1}$ will have two terminal nodes as children if $k = n-2$. Moreover, for any node $|E_{k+1}| + k\le n-1$, while if equality holds then there is only one $j\in\mathcal N_n\setminus(Q_k\cup E_{k+1})$, and only one left child is generated, with $\phi_{k+1} = \phi_k\cup\{k+1\to j\}$ and $E_{k+2} = \emptyset$.

The branch and bound algorithm in [111] starts by computing an upper bound for the original problem $P_0$ by means of a heuristic (cf. also » Heuristic search). Let the corresponding permutation be $\phi = (\phi(1),\phi(2),\dots,\phi(n))$. Note that during the algorithm the upper bound is continuously updated whenever a better feasible solution is found. Then $n$ nodes are created, where for each $P_i$, $i = 1,\dots,n$, we set $\phi_1 = \phi_0\cup\{1\to\phi(i)\}$, $E_2 = \emptyset$, and $g(P_i) = 0$. Then the following steps are performed at each iteration:

1) Selection: Here we choose which node to examine next; we choose the node with the maximum $g(P_i)$.

2) Branching: Given the node $P_i$ chosen in step 1), we create two new nodes $P_i^l$ and $P_i^r$, based on the branching scheme described previously. We set $g(P_i^r) = g(P_i)$ and we compute $g(P_i^l)$.

3) Elimination: If $g(P_i^l)$ is greater than or equal to the current upper bound, then the node $P_i^l$ is pruned, that is, marked not to be considered in step 1) in the future.

4) Termination: The algorithm stops if, and only if, there are no more nodes to be considered in step 1).

The authors in [111] applied the above described branch and bound algorithm for the QAP in conjunction with the variance reduction lower bounds described previously.
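The binary branching scheme can be sketched compactly. This is not the implementation of [111]: it explores nodes depth-first instead of selecting the node with maximum $g(P_i)$, branches on the smallest free location, indexes facilities and locations from 0, and uses as $g$ the cost of the already fixed assignments, a trivial lower bound that is valid only when all entries of $F$, $D$ and $B$ are nonnegative. A brute-force helper is included for cross-checking on tiny instances.

```python
from itertools import permutations


def qap_cost(F, D, B, phi):
    """Koopmans-Beckmann cost of the facilities fixed in phi (facility i -> phi[i])."""
    k = len(phi)
    quad = sum(F[i][j] * D[phi[i]][phi[j]] for i in range(k) for j in range(k))
    return quad + sum(B[i][phi[i]] for i in range(k))


def branch_and_bound(F, D, B, start):
    """Binary branching on (phi_k, E_{k+1}); start is a heuristic permutation (upper bound)."""
    n = len(F)
    best_perm, best = tuple(start), qap_cost(F, D, B, start)
    stack = [((), frozenset())]                   # root P_0: empty phi_0, E_1 = {}
    while stack:
        phi, excluded = stack.pop()
        k = len(phi)
        free = [j for j in range(n) if j not in phi and j not in excluded]
        if not free:                              # every remaining location is forbidden
            continue
        if k == n - 1:                            # terminal node: a single completion
            perm = phi + (free[0],)
            cost = qap_cost(F, D, B, perm)
            if cost < best:                       # update the upper bound
                best_perm, best = perm, cost
            continue
        if qap_cost(F, D, B, phi) >= best:        # elimination: g >= upper bound
            continue
        j = free[0]
        stack.append((phi, excluded | {j}))       # right child: k -/-> j, same g
        stack.append((phi + (j,), frozenset()))   # left child: k -> j, explored first
    return best_perm, best


def brute_force(F, D, B):
    """Optimal value by enumerating all n! permutations (tiny n only)."""
    return min(qap_cost(F, D, B, p) for p in permutations(range(len(F))))
```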


Traditional Cutting Plane Methods 


Traditional cutting plane algorithms for the QAP have been developed by different authors, see [7,8,9,13,14] and [80]. These algorithms make use of mixed integer linear
programming (MILP) formulations for the QAP which 
are suitable for Benders decomposition. In the vein 
of Benders, the MILP formulation is decomposed into 
a master problem and a subproblem, called also slave 
problem, where the master problem contains the orig- 
inal assignment variables and constraints. For a fixed 
assignment the slave problem is usually a linear pro- 
gram and hence, solvable in polynomial time. The mas- 
ter problem is a linear program formulated in terms of 
the original assignment variables and of the dual vari- 
ables of the slave problem, and is solvable in polyno- 
mial time for fixed values of those dual variables. The 
algorithms work typically as follows. First, a heuristic 
is applied to generate a starting assignment. Then the 
slave problem is solved for fixed values of the assign- 
ment variables implied by that assignment, and optimal 
values of the primal and dual variables are computed. If 
the dual solution of the slave problem satisfies all con- 
straints of the master problem, we have an optimal so- 
lution for the original MILP formulation of the QAP. 
Otherwise, at least one of the constraints of the mas- 
ter problem is violated. In this case, the master prob- 
lem is solved with fixed values for the dual variables of 
the slave problem and the obtained solution is given as 
input to the slave problem. The procedure is then re- 
peated until the solution of the slave problem fulfills all 
constraints of the master problem. 

Clearly, any solution of the master problem obtained by fixing the dual variables of the slave problem to some feasible values is a lower bound for the considered QAP. On the other hand, the objective function value of the QAP corresponding to any feasible setting of the assignment variables is an upper bound. The algorithm
terminates when the lower and the upper bounds co- 
incide. Generally, the time needed for the upper and 
the lower bounds to converge to a common value is too 
large, and hence these methods may solve to optimal- 
ity only very small QAPs. However, heuristics derived 
from cutting plane approaches produce good subopti- 
mal solutions in early stages of the search, see for exam- 
ple, [21] and [14]. 


Polyhedral Cutting Planes 


Like traditional cutting plane methods, polyhedral cutting plane or branch and cut algorithms (cf. also » Integer programming: Branch and cut algorithms) make use of an LP or MILP relaxation of the combinatorial optimization problem to be solved,
in our case the QAP. Additionally, polyhedral cutting 
plane methods make use of a class of (nontrivial) valid 
or facet defining inequalities known to be fulfilled by all 
feasible solutions of the original problem. If the solu- 
tion of the relaxation is feasible for the original prob- 
lem, we are done. Otherwise, some of the above men- 
tioned valid inequalities are probably violated. In this 
case a ‘cut’ is performed, that is, one or more of the 
violated inequalities are added to the LP or MILP relaxation of our problem. The latter is re-solved and the
whole process is repeated. In the case that none of the 
valid inequalities is violated, but some integrality con- 
straint is violated, the algorithm performs a branch- 
ing step by fixing (feasible) integer values for the cor- 
responding variable. The branching steps produce the 
search tree like in branch and bound algorithms. Each 
node of this tree is processed as described above by per- 
forming cuts and then by branching it, if necessary. 
Clearly, related elements of branch and bound algo- 
rithms like upper bounds, selection and branching rules 
play a role in branch and cut algorithms. Hence, such 
an approach combines elements of cutting plane and 
branch and bound methods. The main advantage of polyhedral cutting plane algorithms over traditional cutting planes lies in the use of cuts which are valid for the whole polytope of the feasible solutions, and possibly facet defining. Traditional cutting planes instead frequently rely on cuts which are not valid for the whole polytope of the feasible solutions. In this case the whole computation has to be redone from scratch for different variable fixings, which requires additional running time and additional amounts of memory. Another, no less important, drawback of traditional cutting plane algorithms is the 'weakness' of the cuts they involve. In contrast with cuts produced by facet defining inequalities, such weak cuts cannot prevent slow convergence.

Polyhedral cutting plane methods for the QAP are 
not yet backed by a strong theory. However, some ef- 
forts to design branch and cut algorithms for the QAP 
have been made in [106] and [75]. M.W. Padberg and 
M.P. Rijal [106] have tested their algorithm on sparse 
QAP instances. The numerical results are encouraging, 
although the developed software is of preliminary na- 
ture, as claimed by the authors. V. Kaibel [75] has used 
branch and cut to compute lower bounds for QAP in- 
stances. His results are promising especially in the case 
where box inequalities are involved. 


Heuristics 


There is a large amount of research directed toward heuristic algorithms for solving the QAP. This is partially due to the fact that, although substantial improvements have been made in the development of exact algorithms for the QAP, problems of dimension n > 20 are still not practical to solve because of very high computer time requirements. The following types of heuristic approaches have been applied to the QAP:

• construction methods (CM);

• limited enumeration methods (LEM);

• improvement methods (IM);

• tabu search (TS);

• simulated annealing (SA);

• genetic algorithms (GA);

• greedy randomized adaptive search procedures (GRASP);

• ant systems (AS).


Construction Methods 


Construction methods were introduced in [60]. They 
are iterative approaches which usually start with an 
empty permutation, and iteratively complete a partial 


permutation into a solution of the QAP by assigning 
some facility which has not been assigned yet to some 
free location. 


PROCEDURE construction(φ_0, Γ)
    φ = ∅;
    DO i = 1, ..., n − 1 →
        IF (i, j) ∉ Γ →
            j = heur(i);
            update(φ_i, (i, j));
            Γ = Γ ∪ {(i, j)};
        FI;
        φ = φ_i;
    OD;
    RETURN(φ)
END construction;

Pseudocode for construction method


A generic construction method is presented in pseudocode under the name PROCEDURE construction($\phi_0$, $\Gamma$). Here $\phi_0,\phi_1,\dots,\phi_{n-1}$ are partial permutations, and heur($i$) is some heuristic procedure that assigns facility $i$ to some location $j$ and returns $j$. $\Gamma$ is the set of already assigned pairs of facilities to locations. The procedure update constructs a permutation $\phi_i$ by adding the assignment $(i,j)$ to $\phi_{i-1}$. The heuristic heur($i$) called before update could be any heuristic which chooses a location $j$ for facility $i$, $(i,j)\notin\Gamma$, in a greedy fashion or by applying local search. One of the oldest heuristics used in practice, the CRAFT heuristic, developed in [17], is a construction method. Another construction method which yields good results has been proposed in [100].
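A runnable counterpart of the generic construction method, for the Koopmans-Beckmann objective $\sum_{r,s}f_{rs}d_{\phi(r)\phi(s)}$ with 0-based indices. The rule used as heur($i$) is a hypothetical greedy choice made for illustration: place facility $i$ at the free location that adds the least cost with respect to the facilities already assigned (ties go to the smallest location index). Unlike the pseudocode, the loop also handles the last facility explicitly.

```python
def greedy_construction(F, D):
    """Assign facilities 0..n-1 in order; heur(i) picks the cheapest free location."""
    n = len(F)
    phi = {}                                       # partial permutation: facility -> location
    for i in range(n):
        def heur_cost(j):
            # new terms of sum_{r,s} F[r][s] * D[phi(r)][phi(s)] created by i -> j
            return F[i][i] * D[j][j] + sum(F[i][r] * D[j][s] + F[r][i] * D[s][j]
                                           for r, s in phi.items())
        free = [j for j in range(n) if j not in phi.values()]
        phi[i] = min(free, key=heur_cost)          # heur(i) followed by update
    return [phi[i] for i in range(n)]


F = [[0, 5, 2, 4], [5, 0, 3, 0], [2, 3, 0, 1], [4, 0, 1, 0]]
D = [[0, 1, 1, 2], [1, 0, 2, 1], [1, 2, 0, 1], [2, 1, 1, 0]]
print(greedy_construction(F, D))  # [0, 1, 3, 2]
```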


Limited Enumeration Methods 


It has been observed that enumeration methods (e.g. branch and bound algorithms) often find good solutions in early stages of the search, and then spend a lot of time marginally improving that solution or proving its optimality. Based on this observation, limited enumeration methods impose a limit on the enumeration process, which can be either a maximum number of iterations or a time limit, to produce a heuristic solution. Another strategy which serves the same goal is to manipulate the lower bound. This can be done by increasing the lower bound if no improvement in the solution is achieved during a large number of iterations, which yields deeper cuts in the search tree and speeds up the process. Clearly, such an approach may cut off the optimal solution and hence should be used carefully, possibly in conjunction with certain heuristics that perform elaborate searches in the feasible space.


Improvement Methods 


These methods are also called local search algorithms. For a comprehensive discussion of theoretical and practical aspects of local search in combinatorial optimization, see [1].

Basic ingredients of improvement methods are the neighborhood and the order in which the neighborhood is searched. A frequently used neighborhood for the QAP is the $k$-exchange neighborhood, which we define as follows. Let the difference between two permutations $\phi$ and $\psi$ be $\delta(\phi,\psi) := \{i : \phi(i)\neq\psi(i)\}$, and define the distance between the two permutations to be $d(\phi,\psi) := |\delta(\phi,\psi)|$. The $k$-exchange neighborhood $N_k(\phi)$ of a permutation $\phi\in S_n$ is

$$N_k(\phi) = \{\psi : d(\phi,\psi)\le k\},\qquad 2\le k\le n.$$

The size of the neighborhood used in $k$-exchange local search grows with $\binom nk = n!/(k!(n-k)!)$, the number of ways to choose the $k$ positions to be exchanged. For the QAP the most frequently used values for $k$ are 2 and 3, with $N_2(\phi)$ producing better empirical results.

Another important ingredient of improvement 
methods is the order in which the neighborhood is 
scanned. This order can be either fixed previously or 
chosen at random. Given a neighborhood structure and 
a scanning order, a rule for the update of the current 
solution (from the current iteration to the subsequent 
one) should be chosen. The following update rules are 
frequently used: 

• first improvement;

• best improvement;

• Heider's rule [70].

In the case of first improvement the current solution 
is updated as soon as the first improving neighbor solu- 
tion is found. Best improvement scans the whole neigh- 
borhood and chooses the best improving neighbor so- 
lution (if such a solution exists at all). Heider’s rule 
starts by scanning the neighborhood of the initial solu- 
tion in a prespecified cyclic order. The current solution 
is updated as soon as an improving neighbor solution 
is found. The scanning of the neighborhood of the new solution starts where the scanning of the previous one was interrupted (in the prespecified cyclic order).
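The three update rules can be compared on the 2-exchange neighborhood $N_2$. In the minimal sketch below (illustrative names, Koopmans-Beckmann objective without linear term), the prespecified cyclic scanning order is the lexicographic order of position pairs, an arbitrary choice. Every rule stops at a local optimum with respect to $N_2$; Heider's rule stops after a full cycle without improvement.

```python
from itertools import combinations


def qap_cost(F, D, p):
    """Koopmans-Beckmann objective sum_{i,j} F[i][j] * D[p[i]][p[j]]."""
    n = len(p)
    return sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))


def two_exchange_search(F, D, perm, rule="first"):
    """Local search in N_2 with rule 'first', 'best' or 'heider'."""
    p = list(perm)
    pairs = list(combinations(range(len(p)), 2))     # fixed (cyclic) scanning order
    cost = qap_cost(F, D, p)
    if rule == "heider":
        pos, failures = 0, 0
        while failures < len(pairs):                 # stop after a full cycle with no improvement
            r, s = pairs[pos]
            p[r], p[s] = p[s], p[r]
            new = qap_cost(F, D, p)
            if new < cost:
                cost, failures = new, 0              # keep the swap and continue from here
            else:
                p[r], p[s] = p[s], p[r]              # undo
                failures += 1
            pos = (pos + 1) % len(pairs)
        return p, cost
    while True:
        move, move_cost = None, cost
        for r, s in pairs:
            p[r], p[s] = p[s], p[r]
            new = qap_cost(F, D, p)
            p[r], p[s] = p[s], p[r]
            if new < move_cost:
                move, move_cost = (r, s), new
                if rule == "first":                  # first improvement: stop scanning
                    break
        if move is None:                             # local optimum w.r.t. N_2
            return p, cost
        r, s = move
        p[r], p[s] = p[s], p[r]
        cost = move_cost
```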


Tabu Search 


Tabu search was introduced in [62,63] as a technique to 
overcome local optimality. See [61] for a comprehen- 
sive introduction to tabu search algorithms. 

Different implementations of tabu search have been proposed for the QAP, for example, a tabu search with fixed tabu list ([131]), the robust tabu search ([133]), where the size of the tabu list is randomly chosen between a maximum and a minimum value, and the reactive tabu search ([12]), which involves a mechanism for adapting the size of the tabu list. Reactive tabu search
aims at improving the robustness of the algorithm. The 
algorithm notices when a cycle occurs, and increases 
the tabu list size according to the length of the detected 
cycle. The numerical results show that generally the re- 
active tabu search outperforms other tabu search al- 
gorithms for the QAP (see [12]). More recently, also 
parallel implementations of tabu search have been pro- 
posed, see for example, [36]. Tabu search algorithms al- 
low a natural parallel implementation by dividing the 
burden of the search in the neighborhood among sev- 
eral processors. 
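A minimal tabu search sketch in the spirit of the robust tabu search described above, not a reproduction of [133]: moves are 2-exchanges, after a swap each facility is forbidden from returning to the location it left for a tenure drawn at random between `tmin` and `tmax`, a move is tabu only if both facilities would return to recently left locations, and an aspiration criterion admits tabu moves that beat the best solution found. Parameter names and defaults are illustrative assumptions.

```python
import random


def qap_cost(F, D, p):
    """Koopmans-Beckmann objective sum_{i,j} F[i][j] * D[p[i]][p[j]]."""
    n = len(p)
    return sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))


def tabu_search(F, D, perm, iters=200, tmin=2, tmax=6, seed=0):
    """Return the best permutation found and its cost."""
    rng = random.Random(seed)
    cur = list(perm)
    n = len(cur)
    best, best_cost = list(cur), qap_cost(F, D, cur)
    tabu_until = {}                     # (facility, location) -> last iteration it is tabu
    for it in range(iters):
        move, move_cost = None, None
        for r in range(n):
            for s in range(r + 1, n):
                cur[r], cur[s] = cur[s], cur[r]
                c = qap_cost(F, D, cur)
                cur[r], cur[s] = cur[s], cur[r]
                # the swap would put facility r on cur[s] and facility s on cur[r]
                tabu = (tabu_until.get((r, cur[s]), -1) >= it and
                        tabu_until.get((s, cur[r]), -1) >= it)
                if tabu and c >= best_cost:     # aspiration: allow tabu moves beating the best
                    continue
                if move is None or c < move_cost:
                    move, move_cost = (r, s), c
        if move is None:                        # all moves tabu: wait for tenures to expire
            continue
        r, s = move
        tabu_until[(r, cur[r])] = it + rng.randint(tmin, tmax)
        tabu_until[(s, cur[s])] = it + rng.randint(tmin, tmax)
        cur[r], cur[s] = cur[s], cur[r]         # best admissible move, even if worsening
        if move_cost < best_cost:
            best, best_cost = list(cur), move_cost
    return best, best_cost
```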


Simulated Annealing 


Simulated annealing exploits the analogy between com- 
binatorial optimization problems and problems from 
statistical mechanics. S. Kirkpatrick, C.D. Gelatt and 
M.P. Vecchi [82] and V. Cerny [135] were among the 
first authors who recognized this analogy, and showed 
how the Metropolis algorithm (see [96]) used to sim- 
ulate the behavior of a physical many-particle system 
can be applied as a heuristic for the traveling salesman 
problem. 

Burkard and Rendl [33] showed that a simulated cooling process yields a general heuristic which can be applied to any combinatorial optimization problem once a neighborhood structure has been introduced in the set of its feasible solutions. In particular, they applied simulated annealing to the QAP. Other simulated annealing (SA) algorithms for the QAP have been proposed by different authors, see for example, [136] and [40]. All these algorithms employ the 2-exchange neighborhood. They differ in the way the cooling process or the thermal equilibrium is implemented. The numerical experiments show that the performance of SA algorithms strongly depends on the values of the control parameters, and especially on the choice of the cooling schedule.
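The following sketch shows simulated annealing on the 2-exchange neighborhood with the Metropolis acceptance rule. The geometric cooling schedule and all parameter values (`t0`, `alpha`, `moves_per_t`, `t_min`) are illustrative assumptions, not the schedules of [33], [136] or [40]; as noted above, results are sensitive to these choices.

```python
import math
import random


def qap_cost(F, D, p):
    """Koopmans-Beckmann objective sum_{i,j} F[i][j] * D[p[i]][p[j]]."""
    n = len(p)
    return sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))


def simulated_annealing(F, D, perm, t0=10.0, alpha=0.95, moves_per_t=100, t_min=1e-3, seed=0):
    """2-exchange SA with geometric cooling; returns the best permutation and its cost."""
    rng = random.Random(seed)
    cur = list(perm)
    n = len(cur)
    cur_cost = qap_cost(F, D, cur)
    best, best_cost = list(cur), cur_cost
    t = t0
    while t > t_min and n > 1:
        for _ in range(moves_per_t):                 # moves at a fixed temperature
            r, s = rng.sample(range(n), 2)
            cur[r], cur[s] = cur[s], cur[r]
            new_cost = qap_cost(F, D, cur)
            delta = new_cost - cur_cost
            # Metropolis: accept improvements, accept worsening moves with prob exp(-delta/t)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur_cost = new_cost
                if cur_cost < best_cost:
                    best, best_cost = list(cur), cur_cost
            else:
                cur[r], cur[s] = cur[s], cur[r]      # reject: undo the swap
        t *= alpha                                   # geometric cooling
    return best, best_cost
```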


Genetic Algorithms 


The so-called genetic algorithms (GA) are a nature in- 
spired approach for combinatorial optimization prob- 
lems. The basic idea is to adapt the evolutionary mech- 
anisms acting in the selection process in nature to com- 
binatorial optimization problems. The first genetic al- 
gorithm for optimization problems was proposed by 
Holland [53] in 1975. For a good coverage of theoret- 
ical and practical issues on genetic algorithms, see [43] 
and [64]. 

A number of authors have proposed genetic algorithms for the QAP. Standard algorithms, like the one developed in [134], have difficulty generating the best known solutions even for QAPs of small or moderate size. Hybrid approaches, such as combinations of GA techniques with tabu search like the one developed in [52], seem to be more promising. More recently another hybrid algorithm, the so-called greedy genetic algorithm proposed in [5], produced very good results on large scale QAPs from QAPLIB [31].


Greedy Randomized Adaptive Search Procedure 


The greedy randomized adaptive search procedure 
(GRASP) was introduced in [48] and has been ap- 
plied successfully to different hard combinatorial opti- 
mization problems [49,83,84,125] and among them to 
the QAP [94,109,110] and the BiQAP [95]. See [48] for a survey and tutorial on GRASP, and [117] for a comprehensive presentation of the implementation of GRASP for the QAP and related problems.

GRASP is a combination of greedy elements with
random search elements in a two-phase heuristic. It
consists of a construction phase and a local improve- 
ment phase. In the construction phase good solu- 
tions from the available feasible space are constructed, 
whereas in the local improvement phase the neigh- 
borhood of the solution constructed in the first phase 
is searched for possible improvements. Pseudocode
of GRASP is shown below. The input parameters are
the size RCLSize of the restricted candidate list (RCL),
a maximum number of iterations, and a random seed.
The RCL contains the candidates upon which the sampling
related to the construction of a solution in the first
phase will be performed.


PROCEDURE GRASP(RCLSize, MaxIter, RandomSeed)
    InputInstance();
    DO k = 1, ..., MaxIter →
        ConstructSolution(RCLSize, RandomSeed);
        LocalSearch(BestSolutionFound);
        UpdateSolution(BestSolutionFound);
    OD;
    RETURN BestSolutionFound
END GRASP;


Pseudocode for generic GRASP
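A runnable sketch of the same two-phase loop for the QAP is given next. It assumes the Koopmans-Beckmann objective. The construction rule is a simplified, hypothetical one: facilities are placed in index order, and each is assigned to a random location among the RCLSize locations with the smallest added cost. This is not the construction phase of [94] or [117]. The local search is a plain 2-exchange descent.

```python
import random


def qap_cost(F, D, perm):
    n = len(perm)
    return sum(F[i][j] * D[perm[i]][perm[j]] for i in range(n) for j in range(n))


def construct_solution(F, D, rcl_size, rng):
    """Greedy randomized construction (simplified, hypothetical rule)."""
    n = len(F)
    perm = [None] * n
    free = set(range(n))
    for i in range(n):
        def added_cost(loc):
            # interaction of facility i placed at loc with facilities 0..i-1
            return (F[i][i] * D[loc][loc]
                    + sum(F[i][j] * D[loc][perm[j]] + F[j][i] * D[perm[j]][loc]
                          for j in range(i)))
        rcl = sorted(free, key=added_cost)[:rcl_size]  # restricted candidate list
        loc = rng.choice(rcl)                           # random pick from the RCL
        perm[i] = loc
        free.remove(loc)
    return perm


def local_search(F, D, perm):
    """2-exchange descent: keep swapping pairs while the cost strictly decreases."""
    cost = qap_cost(F, D, perm)
    improved = True
    while improved:
        improved = False
        for r in range(len(perm) - 1):
            for s in range(r + 1, len(perm)):
                perm[r], perm[s] = perm[s], perm[r]
                c = qap_cost(F, D, perm)
                if c < cost:
                    cost, improved = c, True
                else:
                    perm[r], perm[s] = perm[s], perm[r]  # undo non-improving swap
    return perm, cost


def grasp(F, D, rcl_size, max_iter, random_seed):
    rng = random.Random(random_seed)
    best, best_cost = None, float("inf")
    for _ in range(max_iter):
        perm = construct_solution(F, D, rcl_size, rng)  # phase 1
        perm, cost = local_search(F, D, perm)           # phase 2
        if cost < best_cost:                            # UpdateSolution
            best, best_cost = perm[:], cost
    return best, best_cost
```

With rcl_size = 1 the construction becomes purely greedy. Larger values add randomness, so repeated iterations explore different starting points for the local search.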


Ant Systems 


Ant systems (AS) is a recently developed heuristic for 
combinatorial optimization problems which tries to 
imitate the behavior of an ant colony in search for food. 
AS was originally introduced in [45] and [38] and has 
already produced good results for well known problems 
like the traveling salesman problem (TSP) and the QAP 
[39,57]. Numerical results in [39,57] show that ant
systems are competitive heuristics, especially for real-life
instances of the QAP with a few very good solutions
clustered together. For randomly generated instances,
which have many good solutions distributed more or less
uniformly in the search space, AS are outperformed
by other heuristics, such as genetic algorithms or tabu
search approaches.


Related Problems 


Generalizations of the QAP appeared almost as soon as
the problem itself. Specifically, Lawler [88] addressed
the extension to cubic, quartic, and, in general, N-adic
assignment problems, in the same fashion as the LAP
was extended to the QAP in formulation (1). For the
cubic assignment problem, for example, we have n^6
cost coefficients c_{ijklmp}, where i, j, k, l, m, p = 1, ..., n,
and the problem is defined as

$$\min \sum_{i,j=1}^{n} \sum_{k,l=1}^{n} \sum_{m,p=1}^{n} c_{ijklmp}\, x_{ij}\, x_{kl}\, x_{mp} \quad \text{s.t. } (x_{ij}) \in X_n.$$


As noted in [88], we can construct an n^3 × n^3 matrix
S containing the cost coefficients, such that the cubic
assignment problem is equivalent to the LAP

$$\min \langle S, Y \rangle \quad \text{s.t. } Y = X \otimes X \otimes X,\; X \in X_n.$$


In an analogous way the LAP can be extended to any
N-adic assignment problem, by considering the solution
matrix Y to be the Kronecker Nth power of a permutation
matrix in X_n. In this section several generalizations of
the QAP and related problems are presented, for which
real applications have been found that initiated an
interest in analyzing them and proposing solution
techniques.
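The equivalence between the cubic objective and the linear objective ⟨S, X ⊗ X ⊗ X⟩ depends only on how the n^6 coefficients are arranged in S. The NumPy sketch below uses one such arrangement, chosen here as an assumption: S[(i,k,m),(j,l,p)] = c_{ijklmp}, with row-major multi-indices. This matches the index layout that `numpy.kron` produces.

```python
import numpy as np


def cubic_objective(C, X):
    """sum_{i,j,k,l,m,p} c_ijklmp * x_ij * x_kl * x_mp for a 6-index array C."""
    return np.einsum("ijklmp,ij,kl,mp->", C, X, X, X)


def kronecker_cost_matrix(C):
    """Arrange C as the n^3 x n^3 matrix S with S[(i,k,m),(j,l,p)] = c_ijklmp."""
    n = C.shape[0]
    return C.transpose(0, 2, 4, 1, 3, 5).reshape(n**3, n**3)


def lap_objective(S, X):
    """<S, Y> with Y = X kron X kron X (Frobenius inner product)."""
    Y = np.kron(np.kron(X, X), X)  # Y[(i,k,m),(j,l,p)] = x_ij * x_kl * x_mp
    return np.sum(S * Y)
```

The identity holds for any matrix X, not only for permutation matrices. The restriction X ∈ X_n is what turns the linear problem in Y into the cubic assignment problem.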


Biquadratic Assignment Problem 


A generalization of the QAP is the biquadratic
assignment problem (BiQAP), which is essentially a
quartic assignment problem with cost coefficients formed
by the products of two four-dimensional arrays. More
specifically, consider two n^2 × n^2 arrays F = (f_{ijkl}) and
D = (d_{mpst}). The BiQAP can then be stated as:

$$\begin{aligned} \min\ & \sum_{i,j,k,l=1}^{n} \sum_{m,p,s,t=1}^{n} f_{ijkl}\, d_{mpst}\, x_{im}\, x_{jp}\, x_{ks}\, x_{lt} \\ \text{s.t.}\ & \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, n, \\ & \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n, \\ & x_{ij} \in \{0, 1\}, \quad i, j = 1, \dots, n. \end{aligned}$$
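Because x_{im} = 1 exactly when facility i is assigned to m = φ(i), the BiQAP objective of a permutation reduces to sum f_{ijkl} d_{φ(i)φ(j)φ(k)φ(l)}. The sketch below checks this against the 0-1 form. As an assumption, both arrays are stored as n × n × n × n NumPy arrays rather than as n^2 × n^2 matrices.

```python
import numpy as np


def biqap_cost(F, D, perm):
    """Permutation form: sum F[i,j,k,l] * D[perm[i], perm[j], perm[k], perm[l]]."""
    p = np.asarray(perm)
    Dp = D[np.ix_(p, p, p, p)]  # Dp[i,j,k,l] = D[p[i], p[j], p[k], p[l]]
    return np.sum(F * Dp)


def biqap_cost_from_x(F, D, X):
    """Same objective written with the 0-1 assignment matrix X (x_im = 1 iff i -> m)."""
    return np.einsum("ijkl,mpst,im,jp,ks,lt->", F, D, X, X, X, X)
```

The permutation form needs only O(n^4) work per evaluation, whereas the literal quadruple-product sum over eight indices costs O(n^8).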


The major application of the BiQAP arises in
very-large-scale integration (VLSI) circuit design. A
detailed description of the mathematical modeling of the
VLSI problem as a BiQAP can be found in [24].
Deterministic improvement methods and variants of simulated
annealing and tabu search have been developed for 
the BiQAP in [22]. Computational experiments on test 
problems of size up to n = 32, with known optimal so- 
lutions (a test problem generator is presented in [24]), 
suggest that one version of simulated annealing is best 
among those tested. The GRASP heuristic has also been 
applied to the BiQAP in [95], and produced the optimal 
solution for all the test problems generated in [24]. 


Multidimensional Assignment Problems 


A close relative of the class of N-adic assignment
problems is that of the multidimensional assignment
problems (MAPs), often referred to as multi-index
assignment problems, which also arise as natural
extensions of the LAP. The general formulation of the MAP is


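$$\begin{aligned} \min\ & \sum_{i_1=1}^{M_1} \cdots \sum_{i_N=1}^{M_N} c_{i_1 \cdots i_N}\, x_{i_1 \cdots i_N} \\ \text{s.t.}\ & \sum_{i_2=1}^{M_2} \cdots \sum_{i_N=1}^{M_N} x_{i_1 \cdots i_N} = 1, \quad i_1 = 1, \dots, M_1, \\ & \sum_{i_1=1}^{M_1} \cdots \sum_{i_{k-1}=1}^{M_{k-1}} \sum_{i_{k+1}=1}^{M_{k+1}} \cdots \sum_{i_N=1}^{M_N} x_{i_1 \cdots i_N} = 1, \quad i_k = 1, \dots, M_k,\; k = 2, \dots, N-1, \\ & \sum_{i_1=1}^{M_1} \cdots \sum_{i_{N-1}=1}^{M_{N-1}} x_{i_1 \cdots i_N} = 1, \quad i_N = 1, \dots, M_N, \\ & x_{i_1 \cdots i_N} \in \{0, 1\} \quad \text{for all } i_1, \dots, i_N, \end{aligned}$$

with M_1 M_2 ⋯ M_N cost coefficients c_{i_1 ⋯ i_N}. A feasible
solution to the above problem is an N-dimensional
permutation array. Multidimensional assignment problems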
in their general form have found many applications
as a means of solving the data association problem.
More specifically, the central problem in any
multitarget tracking or multisensor surveillance system
is the data association problem of partitioning the
observations into tracks and false alarms. General
classes of these problems can be formulated as
multidimensional assignment problems. For a detailed
description of the application of MAPs to multiple
target tracking, as well as of solution approaches, see
[101,102,118].

Special cases of the MAP also have various
applications. Specifically, the five-dimensional
assignment problem has been successfully used for
tracking elementary particles: by solving a
five-dimensional assignment problem, physicists
reconstruct tracks generated by charged elementary
particles produced by the Large Electron-Positron
collider (LEP) at CERN [119]. The 3-index assignment
problem is also a special case of the MAP.


Bottleneck QAP


In the bottleneck quadratic assignment problem (BQAP)
of size n we are given an n × n flow matrix F and an
n × n distance matrix D, and wish to find a permutation
φ ∈ S_n which minimizes the objective function

$$\max \{ f_{ij}\, d_{\varphi(i)\varphi(j)} : 1 \le i, j \le n \}.$$

A more general BQAP, analogous to the QAP in (2), is
obtained if the coefficients of the problem are of the
form c_{ijkl}, 1 ≤ i, j, k, l ≤ n:

$$\min_{\varphi \in S_n} \max \{ c_{ij\varphi(i)\varphi(j)} : 1 \le i, j \le n \}.$$

Besides the application in backboard wiring mentioned
above, the BQAP has many other applications.
Basically, all QAP applications give rise to applications
of the BQAP, because it often makes sense to minimize
the largest cost instead of the overall cost incurred by
some decision. A well-studied problem in graph theory
which can be modeled as a BQAP is the bandwidth
problem. In the bandwidth problem we are given an
undirected graph G = (V, E) with vertex set V and edge
set E, and seek a labeling of the vertices of G by the
numbers 1, ..., n, where |V| = n, such that the maximum
absolute difference between the labels of vertices which
are connected by an edge is minimized. In other words,
we seek a labeling of vertices such that the maximum
distance of 1-entries of the resulting adjacency matrix
from the diagonal is minimized, that is, the bandwidth
of the adjacency matrix is minimized. It is easy to see
that this problem can be modeled as a special BQAP
with flow matrix equal to the adjacency matrix of G for
some arbitrary labeling of vertices, and distance matrix
D = (|i − j|).
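The bandwidth model can be checked by brute force on very small graphs. The sketch below uses the flow matrix F = adjacency matrix and the distance matrix D = (|i − j|), and minimizes the bottleneck objective over all labelings. Labels run from 0 to n − 1 instead of 1 to n, which leaves all differences unchanged. The enumeration is only an illustration and is practical for tiny n only.

```python
from itertools import permutations


def bqap_value(F, D, perm):
    """Bottleneck objective max_{i,j} F[i][j] * D[perm[i]][perm[j]]."""
    n = len(perm)
    return max(F[i][j] * D[perm[i]][perm[j]] for i in range(n) for j in range(n))


def bandwidth_by_bqap(adj):
    """Return (bandwidth, labeling) of a small graph via its BQAP model.

    F is the adjacency matrix, D = (|i - j|); perm[v] is the label of vertex v.
    """
    n = len(adj)
    D = [[abs(i - j) for j in range(n)] for i in range(n)]
    return min((bqap_value(adj, D, p), p) for p in permutations(range(n)))
```

For example, a path always has bandwidth 1, while a star with three leaves has bandwidth 2, since its center has only two labels at distance 1.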

The BQAP is NP-hard since it contains the bottle- 
neck TSP as a special case. Some enumeration algo- 
rithms to solve BQAP to optimality have been proposed 
in [19]. These algorithms employ a Gilmore-Lawler- 
like bound for the BQAP which involves in turn the 
solution of bottleneck linear assignment problems. The 
algorithm for the general BQAP involves also a thresh- 
old procedure useful to reduce to 0 as many coefficients 
as possible. Burkard and Fincke [27] investigated the
asymptotic behavior of the BQAP and proved results
analogous to those obtained for the QAP: if the
coefficients are independent random variables taken
from a uniform distribution on [0, 1], then the relative
difference between the worst and the best value of the
objective function approaches 0 with probability tending
to 1 as the size of the problem approaches infinity.

The BQAP and the QAP are special cases of a more
general quadratic assignment problem which can be
called the algebraic QAP (in analogy to the algebraic
linear assignment problem (LAP) introduced in [30]).
If (H, ∗, ≼) is a totally ordered commutative semigroup
with composition ∗ and order relation ≼, the algebraic
QAP with cost coefficients c_{ijkl} ∈ H is formulated as

$$\min_{\varphi \in S_n}\; c_{11\varphi(1)\varphi(1)} * \cdots * c_{1n\varphi(1)\varphi(n)} * \cdots * c_{nn\varphi(n)\varphi(n)}.$$

The study of the bottleneck QAP and, more generally,
the algebraic QAP was the starting point for the
investigation of a number of algebraic combinatorial
optimization problems with coefficients taken from
linearly ordered semimodules, such as linear assignment
and transportation problems, flow problems, and others.
See [34] for a detailed discussion of this topic.
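The following brute-force sketch makes the common structure explicit. One routine takes the semigroup operation ∗ as a parameter. With ∗ = + it solves the ordinary QAP. With ∗ = max it solves the BQAP. The total order is assumed to be Python's ordering of the resulting values. The helper that builds c_{ijkl} = f_ij d_kl from two matrices is an illustrative assumption.

```python
import operator
from functools import reduce
from itertools import permutations


def algebraic_qap(C, compose):
    """Minimize c_{1 1 p(1) p(1)} * ... * c_{n n p(n) p(n)} over all permutations p.

    C[i][j][k][l] holds c_ijkl; `compose` plays the role of the semigroup operation *.
    """
    n = len(C)

    def value(p):
        terms = (C[i][j][p[i]][p[j]] for i in range(n) for j in range(n))
        return reduce(compose, terms)

    return min((value(p), p) for p in permutations(range(n)))


def koopmans_beckmann_coefficients(F, D):
    """Hypothetical helper: c_ijkl = F[i][j] * D[k][l]."""
    n = len(F)
    return [[[[F[i][j] * D[k][l] for l in range(n)] for k in range(n)]
             for j in range(n)] for i in range(n)]


F = [[0, 2, 1], [2, 0, 3], [1, 3, 0]]
D = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
C = koopmans_beckmann_coefficients(F, D)
qap_value, qap_perm = algebraic_qap(C, operator.add)  # ordinary QAP
bqap_value, bqap_perm = algebraic_qap(C, max)         # bottleneck QAP
```

Only the composition changes between the two calls. This is the sense in which the QAP and the BQAP are instances of one algebraic problem.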


Other Problems Which Can Be Formulated As QAPs 


There are a number of other well known combinato- 
rial optimization problems which can be formulated as 
QAPs with specific coefficient matrices. Of course, since
the QAP is notoriously hard to solve, it does not make
sense to use algorithms developed for the QAP to solve
these other problems: all known solution methods for
the QAP are far inferior to the specialized algorithms
developed for solving these problems.
However, the relationship between the QAP and these 
problems might be of benefit for a better understanding 
of the QAP and its inherent complexity. 

Two well-studied NP-hard combinatorial optimization
problems which are special cases of the QAP are the
graph partitioning problem (GPP) and the maximum
clique problem (MCP). In the GPP we are given an
(edge-)weighted graph G = (V, E) with n vertices and
a number k which divides n. We want to partition the
set V into k sets of equal cardinality such that the total
weight of the edges cut by the partition is minimized.
This problem can be formulated as a QAP with distance
matrix D equal to the weighted adjacency matrix of G,
and flow matrix F obtained by multiplying by −1 the
adjacency matrix of the union of k disjoint complete
subgraphs with n/k vertices each. For more information
on graph partitioning problems, see [90]. In the
maximum clique problem we are again given a graph
G = (V, E) with n vertices and wish to find the maximum
k ≤ n such that there exists a subset V_1 ⊆ V with
|V_1| = k which induces a clique in G, that is, all vertices
of V_1 are connected by edges of G. In this case consider
a QAP with distance matrix D equal to the adjacency
matrix of G and flow matrix F given as the adjacency
matrix of a graph consisting of a clique of size k and
n − k isolated vertices, multiplied by −1. A clique of size
k in G exists if and only if the optimal value of the
corresponding QAP is −k(k − 1), since each of the
k(k − 1) ordered pairs of distinct clique positions
contributes −1 exactly when it is mapped onto an edge
of G. For a review on the maximum clique problem,
see [114].
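The clique criterion can be checked by enumeration on a small graph. The sketch below assumes the Koopmans-Beckmann objective sum_{i,j} f_ij d_{φ(i)φ(j)} over ordered pairs, with zero diagonals in both matrices. Under that assumption a k-clique gives the value −k(k − 1). The function names are hypothetical.

```python
from itertools import permutations


def qap_min(F, D):
    """Brute-force optimum of sum_{i,j} F[i][j] * D[p[i]][p[j]] (tiny n only)."""
    n = len(F)
    return min(sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))
               for p in permutations(range(n)))


def clique_flow_matrix(n, k):
    """-1 times the adjacency matrix of K_k plus n - k isolated vertices."""
    return [[-1 if i < k and j < k and i != j else 0 for j in range(n)]
            for i in range(n)]


def has_clique(adj, k):
    """G has a k-clique iff the QAP with D = adj and F = clique_flow_matrix is -k(k-1)."""
    return qap_min(clique_flow_matrix(len(adj), k), adj) == -k * (k - 1)
```

In a graph made of a triangle with a two-edge tail, has_clique reports a 3-clique but no 4-clique. Every four vertices of that graph induce at most four edges, that is, eight ordered pairs, while a 4-clique would need twelve.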

The traveling salesman problem (TSP) is another 
well known combinatorial optimization problem which 
is NP-hard, and much research has been devoted to 
finding efficient algorithms that will provide near- 
optimal solutions. In the TSP we are given a set of
cities and the distances between them, and our task is
to find a tour that visits each city exactly once and
minimizes the total distance traveled. In formulating
the TSP as a QAP, the distance matrix D is the
corresponding distance matrix of the TSP, and the flow
matrix F is the adjacency matrix of a complete cycle of
length n. Without loss of generality the distance matrix
D is considered to be symmetric. A complete cycle or
tour is then defined by a permutation φ. Among the
abundant literature on the TSP, [89] is a comprehensive
reference.
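A quick consistency check of this formulation, assuming the Koopmans-Beckmann objective with perm[i] denoting the city in position i of the cycle. Because the adjacency matrix of the undirected cycle is symmetric, every tour edge is counted once in each direction. For symmetric D, the QAP value is therefore exactly twice the tour length, and minimizing one minimizes the other.

```python
def qap_cost(F, D, perm):
    n = len(perm)
    return sum(F[i][j] * D[perm[i]][perm[j]] for i in range(n) for j in range(n))


def cycle_flow_matrix(n):
    """Adjacency matrix of the undirected cycle 0-1-...-(n-1)-0 (assumes n >= 3)."""
    F = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        F[i][j] = F[j][i] = 1
    return F


def tour_length(D, perm):
    """Length of the tour perm[0] -> perm[1] -> ... -> perm[n-1] -> perm[0]."""
    n = len(perm)
    return sum(D[perm[i]][perm[(i + 1) % n]] for i in range(n))
```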

In the linear arrangement problem we are given
a graph G = (V, E) and wish to place its vertices at the
points 1, ..., n on the line so as to minimize the sum
of pairwise distances between vertices of G which are
joined by an edge. If we consider the more general
version of weighted graphs, then we obtain the
backboard wiring problem. This is an NP-hard problem,
as mentioned in [58]. It can be formulated as a QAP with
distance matrix the (weighted) adjacency matrix of the
given graph, and flow matrix F = (f_ij) given by
f_ij = |i − j| for all i, j. In the minimum weight feedback
arc set problem (FASP) a weighted digraph G = (V, E)
with vertex set V and arc set E is given. The goal is to
remove a set of arcs from E with minimum overall
weight, such that all directed cycles, so-called dicycles,
in G are destroyed and an acyclic directed subgraph
remains. Clearly, the minimum weight feedback arc set
problem is equivalent to the problem of finding an
acyclic subgraph of G with maximum weight. The
unweighted version of the FASP, that is, a FASP where
the edge weights of the underlying digraph equal 0 or 1,
is called the acyclic subdigraph problem and is treated
extensively in [74]. An interesting application of the
FASP is the so-called triangulation of input-output
tables, which arises along with input-output analysis in
economics used to forecast the development of
industries, see [91]. For details and a concrete
description of the application of triangulation results in
economics, see [41] and [122]. Since the vertices of an
acyclic subdigraph can be labeled topologically, that is,
such that in each arc the label of its head is larger than
that of its tail, the FASP can be formulated as a QAP.
The distance matrix of the QAP is the weighted
adjacency matrix of G and the flow matrix F = (f_ij) is a
strictly upper triangular matrix, that is, f_ij = −1 if i < j
and f_ij = 0 otherwise. The FASP is well known to be
NP-hard (see [58,79]).
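The FASP formulation can be illustrated by enumeration. In this sketch p[i] is assumed to be the vertex in position i of a linear order. With f_ij = −1 for i < j, the QAP value is minus the weight of the arcs that point forward in that order. Those arcs form an acyclic subgraph, so the arcs pointing backward are the ones removed. The function name is hypothetical, and the approach is usable only for tiny digraphs.

```python
from itertools import permutations


def fasp_via_qap(W):
    """W[u][v] = weight of arc u -> v. Returns (minimum feedback arc weight, order)."""
    n = len(W)
    F = [[-1 if i < j else 0 for j in range(n)] for i in range(n)]
    best_value, order = min(
        (sum(F[i][j] * W[p[i]][p[j]] for i in range(n) for j in range(n)), p)
        for p in permutations(range(n)))
    total = sum(sum(row) for row in W)
    # -best_value is the maximum forward (acyclic) weight; the remaining arcs are removed
    return total + best_value, order
```

For a directed triangle with arc weights 1, 2 and 3, the cheapest way to destroy the dicycle is to remove the arc of weight 1.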

Another well-known NP-hard problem which can
be formulated as a QAP is the graph packing problem
(cf. [16]). Given two graphs G_1 and G_2 with n vertices
each, a packing of G_1 into G_2 is a bijection between
their vertex sets that maps no edge of G_1 onto an edge
of G_2. The graph packing problem can be formulated
as a QAP with distance matrix equal to the adjacency
matrix of G_2 and flow matrix equal to the adjacency
matrix of G_1. A packing of G_1 into G_2 exists if and
only if the optimal value of this QAP is equal to 0. In
the positive case the optimal solution of the QAP
determines a packing.


QAP Problem Generators 


Since the QAP is a very hard problem from a practical
point of view, heuristics are often the only reasonable
approach to solve it, and so far there exist no
performance guarantees for any of the algorithms
developed for the QAP. One possibility to evaluate the
performance of heuristics and to compare different
heuristics is given by QAP instances with known optimal
solution. Heuristics are applied to these instances and
the heuristic solution is compared to the optimal one
known beforehand. The instances with known optimal
solution should ideally have two properties: first, they
should be representative in terms of their hardness, and
second, they should not be especially easy for any of
the heuristics.

Two generators of QAP instances with known optimal
solution have been proposed so far: Palubeckis'
generator [107] and the Li-Pardalos generator [92].

The first method for generating QAP instances
with a known optimal solution was proposed by G.S.
Palubeckis [107] in 1988. The input of Palubeckis'
algorithm consists of the size n of the instance to be
generated, the optimal solution (permutation) φ of the
output instance, two control parameters w and z, where
z < w, and the distance matrix A of an r × s grid with
rs = n. A contains rectilinear distances, also called
Manhattan distances, that is, the distance a_ij between
two given knots i, j lying in rows r_i, r_j and in columns
c_i, c_j, respectively, is given by a_ij = |r_i − r_j| + |c_i − c_j|.
The output of the algorithm is a second matrix B such
that φ is an optimal solution of QAP(A, B). The idea is
to start with a matrix B such that QAP(A, B) is a trivial
instance with optimal solution φ. Then B is transformed
such that QAP(A, B) is no longer trivial, but φ continues
to be its optimal solution.
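The two starting ingredients of the generator are easy to write down: the Manhattan distance matrix A of an r × s grid, and the constant matrix B = (w) used in the first step. With that B every permutation has the same value. The sketch numbers knots row by row (knot i in row i // s and column i % s), which is an assumption. It confirms on a 2 × 3 grid that all 720 permutations give the value w · sum_{i,j} a_ij. The later transformations of B are not reproduced here.

```python
from itertools import permutations


def grid_distance_matrix(r, s):
    """Manhattan distances between the knots of an r x s grid (n = r * s).

    Knot i is assumed to lie in row i // s and column i % s.
    """
    n = r * s
    return [[abs(i // s - j // s) + abs(i % s - j % s) for j in range(n)]
            for i in range(n)]


def qap_cost(A, B, perm):
    n = len(perm)
    return sum(A[i][j] * B[perm[i]][perm[j]] for i in range(n) for j in range(n))


A = grid_distance_matrix(2, 3)
w = 5
B = [[w] * 6 for _ in range(6)]  # constant matrix: the trivial starting instance
costs = {qap_cost(A, B, p) for p in permutations(range(6))}
```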

Palubeckis starts with a constant matrix B = (b_ij)
with b_ij = w. QAP(A, B) is a trivial problem because
all permutations yield the same value of the objective
function and thus are optimal solutions. Hence, the
identity permutation id is also an optimal solution of
QAP(A, B). Then matrix B is iteratively transformed
so that it is no longer a constant matrix and the
identity permutation remains an optimal solution of
QAP(A, B). In the last iteration the algorithm
constructs an instance QAP(A', B) with optimal solution
φ from QAP(A, B) with optimal solution the identity
permutation id, by setting A' = (a_{φ(i)φ(j)}). The
optimal value of QAP(A', B) equals w Σ_{i=1}^n Σ_{j=1}^n a_ij.
D. Cyganski, R.F. Vaz and V.G. Virball [42] have
observed that the QAP instances generated by
Palubeckis' generator are 'easy' in the sense that their
optimal value can be computed in polynomial time by
solving a linear program.

Another generator of QAP instances with known
solution has been proposed by Li and Pardalos [92].
Like Palubeckis' generator, the Li-Pardalos generator
starts with a trivial instance QAP(A, B) with the identity
permutation id as optimal solution and iteratively
transforms A and B so that the resulting QAP instance
still has the optimal solution id but is no longer trivial.
The transformations are such that, for all i, j, i', j',
a_ij ≥ a_{i'j'} is equivalent to b_ij ≤ b_{i'j'} at the end of each
iteration. By the rearrangement inequality, this opposite
ordering of the entries of A and B ensures that id
remains optimal.
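The invariant can be illustrated directly. This is not the Li-Pardalos generator itself, only a sketch of why its invariant works. A permutation φ merely re-pairs the entries of A with entries of B. When the two matrices are oppositely ordered, the identity pairing has the smallest possible sum of products. The helper `oppositely_ordered_pair` is hypothetical. It builds such a pair from a random ordering of the n^2 cells.

```python
import random
from itertools import permutations


def qap_cost(A, B, perm):
    n = len(perm)
    return sum(A[i][j] * B[perm[i]][perm[j]] for i in range(n) for j in range(n))


def oppositely_ordered_pair(n, seed=0):
    """Hypothetical helper: random A, B with a_ij >= a_i'j' <=> b_ij <= b_i'j'."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n) for j in range(n)]
    rng.shuffle(cells)
    A = [[0] * n for _ in range(n)]
    B = [[0] * n for _ in range(n)]
    for rank, (i, j) in enumerate(cells):
        A[i][j] = rank           # increasing along the random cell order
        B[i][j] = n * n - rank   # decreasing along the same order
    return A, B


def is_identity_optimal(A, B):
    """Brute-force check that id attains the minimum of QAP(A, B)."""
    n = len(A)
    identity_cost = qap_cost(A, B, list(range(n)))
    return all(qap_cost(A, B, p) >= identity_cost for p in permutations(range(n)))
```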


If the coefficient matrices are considered as 
weighted adjacency matrices of graphs, each itera- 
tion transforms entries corresponding to some spe- 
cific subgraph equipped with signs on the edges and 
hence called sign-subgraphs. The application of the 
Li-Pardalos algorithm with different sign-subgraphs 
yields different QAP generators. A number of gener- 
ators involving different sign-subgraphs, for example, 
subgraphs consisting of a single edge, signed triangles 
and signed spanning trees have been tested. It is
perhaps interesting and surprising that QAP instances
generated with more complex sign-subgraphs are
generally 'easier' than those generated with subgraphs
consisting of single edges. Here a QAP instance is
considered to be 'easy' if most heuristics applied to
it find a solution near to the optimal one in a relatively
short time. Nothing is known about the complexity of
QAP instances generated by the Li-Pardalos generator,
since the arguments used to analyze Palubeckis'
generator do not apply in this case.


Surveys and Books 


In this concluding section a list of survey articles and 
books which cover all the aspects of the QAP in depth 
is given. 

One of the early survey articles is [51], where the
eigenvalue-based lower bounds for the QAP are
introduced. The survey papers [20,112] and [25] cover
every aspect of the QAP. Specifically, the article [25]
is the most recent one, and the most comprehensive.
A collection of articles with theoretical and algorithmic
contributions for the QAP can be found in the book
[113]. The book [35] has a comprehensive introduction
to the QAP, and focuses on special cases of the QAP
which can be solved in polynomial time. Finally, the
book [106] focuses on polyhedral aspects of the QAP.


Given two continuous functions f: R^n → R and
g: R^n → R defined on a polyhedral set S ⊆ R^n such that
g(x) > 0 for all x ∈ S, the fractional programming
problem is to find some point x* which satisfies

$$\frac{f(x^*)}{g(x^*)} = \max_{x \in S} \frac{f(x)}{g(x)}. \tag{1}$$


Applications and algorithms for fractional programs 
have been treated in considerable detail since the early 
work of J.R. Isbell and W.H. Marlow [5]. Included 
among the many applications are portfolio selection, 
stock cutting, game theory, and numerous decision 
problems in management science. See [3] for work
known up to 1971 and [1,4,12,13] for the most recent
surveys.

If f(x) is concave and nonnegative and g(x) and
S are convex (and S is bounded), then (1) is called
a concave-convex fractional program. It was shown in
[10] that such problems can be solved as a single
concave program using a simple variable transformation.
This provides an efficient approach for solving a
limited class of fractional programming problems.
Unfortunately, even in some of the simplest cases (for
example when f(x) and g(x) are quadratic) a new
constraint, which may be nonlinear, must be added to
the transformed feasible region, and the transformed
problem becomes very difficult to solve. In addition, if
the problem is not concave-convex initially, then the
transformation does not necessarily yield a concave
problem. In fact, in the most general case, problem (1)
may have many local maxima which are different from
the optimal one, and hence determining the global
maximum is very difficult (i.e., NP-hard).

A different and more recent method is to replace the
nonlinear functions by suitable linear underestimators
and then obtain the global optimum by a vertex ranking
procedure. This method, due to P.M. Pardalos [6], is
applicable only when f(x) is a convex quadratic
function and g(x) is linear (hence the ratio is quasiconvex).

Another well-known approach, and one of the oldest,
is to consider the global optimization problem

$$\max_{x \in S} f(x) - \lambda g(x), \tag{2}$$

where λ ∈ R is a constant. This 'parametric' approach,
which was first proposed by W. Dinkelbach [2],
generates a sequence of values λ^{(k)} that converges to
the global optimum function value [11]. This method
has since been applied to many specific types of
fractional programs, including the concave-convex type,
but very little work has been done to solve fractional
programs where the ratio of two concave, two convex,
or a convex and a concave function is to be maximized.
In addition, this method does not provide a sequence
of improving upper bounds; hence, even though the
sequence λ^{(k)} may be converging to the global optimum
function value, no bound on the error is available at any
iteration.

The method discussed here improves Dinkelbach’s 
algorithm by providing a means for obtaining a se- 
quence of improving upper bounds which, along 
with the corresponding sequence of improving lower 
bounds, will provide a bound on the error at each it- 
eration of the solution procedure. In addition, both 
the sequence of lower bounds and the sequence of 
upper bounds converge to the global optimum func- 
tion value at a ‘superlinear’ rate. This algorithm is also 
appropriate for the class of quadratic fractional pro- 
grams (i.e., one or both of f(x) and g(x) are quadratic) 
where the ratio may involve concave, convex, or even 
indefinite terms. It combines Dinkelbach's approach
with a method guaranteed to solve linearly constrained
quadratic programming problems regardless of the
definiteness of the quadratic form [8].

Two algorithms which are similar to the one pre- 
sented here are given in [4,11]. Schaible’s method [11] 
first computes a sequence of improving upper and 
lower bounds using an efficient section method. Dinkel- 
bach’s algorithm is then started as soon as the section 
method achieves a set of bounds that differ by some 
pre-specified tolerance. The algorithm presented here 
differs from Schaible’s method in that the upper and 
lower bounds are continuously improving throughout 
the procedure. Nevertheless, in both algorithms the
sequence of upper and lower bounds converges
superlinearly.

Likewise, [4] presents a variety of related algorithms 
which also provide upper and lower bounds. These al- 
gorithms combine Dinkelbach’s approach with various 
search techniques (e.g., Newton, binary, modified bi- 
nary). The result is a set of related algorithms with con- 
vergence rates that vary depending on the search tech- 
nique employed. T. Ibaraki [4] also provides a collec- 
tion of computational results for the fractional knap- 
sack problem and quadratic fractional programs. 


Problem Formulation and Properties 


The fundamental result which relates the global
optimization problem (2) to the general fractional
programming problem (1) is as follows: x* solves the
fractional programming problem (1) if and only if x*
solves the global optimization problem (2) with constant
λ* = f(x*)/g(x*).

Dinkelbach’s original iterative algorithm is based on 
this theorem and can be described as follows: 


1 | Select some x^{(0)} ∈ S.
  | Set λ^{(0)} = f(x^{(0)})/g(x^{(0)}) and k = 0.
2 | Solve the constrained global optimization
  | problem (2) with λ = λ^{(k)} to get the optimal
  | solution point x^{(k+1)}.
3 | IF f(x^{(k+1)}) − λ^{(k)} g(x^{(k+1)}) = 0,
  | THEN set x* = x^{(k+1)}, λ* = λ^{(k)};
  | STOP.
4 | IF f(x^{(k+1)}) − λ^{(k)} g(x^{(k+1)}) > 0,
  | THEN set λ^{(k+1)} = f(x^{(k+1)})/g(x^{(k+1)})
  | and k = k + 1.
  | Go to Step 2.


Dinkelbach(S, f, g)
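To make the steps concrete, here is a minimal sketch of Dinkelbach(S, f, g) under a simplifying assumption: S is a finite list of points, so the global subproblem (2) can be solved exactly by enumeration. A polytope with a linear f − λg, represented by its vertices, is one such case. It further assumes g > 0 on S and integer- or Fraction-valued f and g, so that the test M = 0 in Step 3 is exact. The iteration cap max_iter is an added safeguard that is not part of the original algorithm.

```python
from fractions import Fraction


def dinkelbach(S, f, g, max_iter=100):
    """Dinkelbach's method for max_{x in S} f(x)/g(x) over a finite set S."""
    x = S[0]                                   # Step 1: any starting point
    lam = Fraction(f(x)) / g(x)
    for _ in range(max_iter):
        # Step 2: global maximizer of f - lam * g, found by enumerating S
        x_next = max(S, key=lambda p: f(p) - lam * g(p))
        m = f(x_next) - lam * g(x_next)        # this is M(lam), always >= 0
        if m == 0:                             # Step 3: lam is the optimal ratio
            return x_next, lam
        lam = Fraction(f(x_next)) / g(x_next)  # Step 4: strictly larger lower bound
    raise RuntimeError("no convergence within max_iter iterations")


# Usage: maximize (x1 + 2*x2 + 1)/(x1 + x2 + 1) over five points
S = [(0, 0), (1, 0), (0, 1), (2, 3), (3, 1)]
f = lambda p: p[0] + 2 * p[1] + 1
g = lambda p: p[0] + p[1] + 1
x_star, lam_star = dinkelbach(S, f, g)
```

Starting from (0, 0) with λ^{(0)} = 1, the first subproblem returns (2, 3) and raises the bound to 3/2. The second subproblem has optimal value 0, so the method stops with λ* = 3/2.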


The efficiency of this algorithm depends on the
number of times the constrained global optimization
problem must be solved, and on the time spent solving
it during each iteration. Also note that a test of the form
f(x^{(k+1)}) − λ^{(k)} g(x^{(k+1)}) < 0 is not necessary since, for
any fixed k,

$$f(x^{(k+1)}) - \lambda^{(k)} g(x^{(k+1)}) = \max_{x \in S} f(x) - \lambda^{(k)} g(x) \ge f(x^{(k)}) - \lambda^{(k)} g(x^{(k)}) = 0.$$


Now consider the function M(λ) defined as

$$M(\lambda) = \max_{x \in S} f(x) - \lambda g(x). \tag{3}$$

The function M(λ) has two interesting properties that
are important in guaranteeing convergence of upper
and lower bounds to λ* and in determining the rate of
this convergence. The first of these properties is that
M(λ) is positive for any strict lower bound λ of λ*, and
negative for any strict upper bound λ of λ*. Secondly,
the function M(λ) is convex, being the pointwise
maximum of the affine functions λ ↦ f(x) − λ g(x),
x ∈ S. That is,
1) M(λ) > 0 for all λ < λ*, and M(λ) < 0 for all λ > λ*;
and
2) M(λ) is convex.
The sequence of iterates λ^{(0)}, λ^{(1)}, ... generated by the
algorithm Dinkelbach(S, f, g) is strictly monotone
increasing and satisfies M(λ^{(i)}) > 0 for i = 0, 1, ... until
termination [2]. Hence, by the properties of M(λ) listed
above, the iterates provide a strictly monotone
increasing sequence of lower bounds for λ*.


Bounds and Convergence Rates 


The sequence of lower bounds λ^{(k)} converges
superlinearly to λ* = f(x*)/g(x*), where x* is any
optimal solution of (1), as shown in [7]. However, as it
now stands, the algorithm Dinkelbach(S, f, g) does not
provide upper bounds on the global optimum function
value λ*. One way to obtain an initial upper bound is
to solve the following two problems:


$$\max_{x \in S} f(x) \tag{4}$$

to get the optimal value f(x'), and

$$\min_{x \in S} g(x) \tag{5}$$

to get the optimal value g(x''). Then, provided f(x') ≥ 0,
an initial upper bound is given by γ^{(−1)} = f(x')/g(x''),
since f(x)/g(x) ≤ f(x')/g(x'') for all x ∈ S. In fact,
according to the properties of M (part 1), any γ ∈ R
satisfying M(γ) < 0 is also an upper bound. Hence,
if we define

$$\gamma^{(k)} = \gamma^{(k-1)} - M(\gamma^{(k-1)})\, \frac{\gamma^{(k-1)} - \lambda^{(k)}}{M(\gamma^{(k-1)}) - M(\lambda^{(k)})},$$

where λ^{(k)} is the most recent lower bound of λ* and
γ^{(k−1)} is the most recent upper bound of λ*, then the
new upper bound is given by γ^{(k)}. As Fig. 1 illustrates,
γ^{(k)} is just the root of the line segment joining the points
(λ^{(k)}, M(λ^{(k)})) and (γ^{(k−1)}, M(γ^{(k−1)})). Since M is convex,
it lies on or below this segment between the two points,
so M(γ^{(k)}) ≤ 0, and γ^{(k)} is indeed an upper bound.

This leads to an important modification of the algo- 
rithm Dinkelbach(S, f, g): 


1 Select some x^{(0)} ∈ S.
  Set λ^{(0)} = f(x^{(0)})/g(x^{(0)}).
2 Solve the constrained global optimization
  problems (4) and (5) to get the optimal function
  values f(x') and g(x''), respectively.
  Set γ^{(−1)} = f(x')/g(x'') and k = 0.
  IF γ^{(−1)} − λ^{(0)} < δ,
  THEN set λ* = λ^{(0)} and x* = x^{(0)};
  STOP.
3 Solve the constrained global optimization
  problem
      M(λ^{(k)}) = max_{x∈S} f(x) − λ^{(k)} g(x)    (6)
  to get the optimal solution point x^{(k+1)}.
4 IF M(λ^{(k)}) = 0,
  THEN set x* = x^{(k+1)} and λ* = λ^{(k)};
  STOP.
5 Solve the constrained global optimization
  problem
      M(γ^{(k−1)}) = max_{x∈S} f(x) − γ^{(k−1)} g(x)    (7)
  to get the optimal solution point y^{(k)}.
6 IF M(γ^{(k−1)}) = 0,
  THEN set x* = y^{(k)} and λ* = γ^{(k−1)};
  STOP.
7 Set
      γ^{(k)} = γ^{(k−1)} − M(γ^{(k−1)}) (γ^{(k−1)} − λ^{(k)}) / (M(γ^{(k−1)}) − M(λ^{(k)})).
8 IF γ^{(k)} − λ^{(k)} < δ,
  THEN set λ* = λ^{(k)} and x* = x^{(k)};
  STOP.
9 Set λ^{(k+1)} = f(x^{(k+1)})/g(x^{(k+1)}) and k = k + 1.
  Go to Step 3.


Fract(S, f, g, δ)


[Figure: plot of M(λ). Quadratic Fractional Programming: Dinkelbach Method, Figure 1]


Note that the parameter δ > 0 is a user-supplied
stopping tolerance. The following assertion from [7]
shows that the sequence of iterates γ^{(−1)}, γ^{(0)}, γ^{(1)}, ...
is, in fact, a sequence of upper bounds on λ*, and that
the sequence is strictly monotonically decreasing:

$$\lambda^* \le \gamma^{(i+1)} < \gamma^{(i)}, \quad i = -1, 0, 1, \dots$$

In fact, the sequence of upper bounds γ^{(k)} also
converges to λ*, and this convergence is superlinear as
well [7].
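The steps of Fract(S, f, g, δ) can be sketched under the same simplifying assumptions as for Dinkelbach's method: S is a finite list of points, so problems (4) to (7) are solved by enumeration; g > 0 on S; max f ≥ 0 on S, so that γ^{(−1)} is a valid upper bound; and values are exact integers or Fractions. The function returns the point, the lower bound and the upper bound in force when it stops. These return conventions are our own.

```python
from fractions import Fraction


def fract(S, f, g, delta):
    """Dinkelbach lower bounds plus secant upper bounds on a finite set S.

    Returns (x_star, lam_star, upper_bound).
    """
    def M(mu):
        # M(mu) = max_{x in S} f(x) - mu * g(x), computed by enumeration
        best = max(S, key=lambda p: f(p) - mu * g(p))
        return f(best) - mu * g(best), best

    x = S[0]                                                 # Step 1
    lam = Fraction(f(x)) / g(x)
    gamma = Fraction(max(map(f, S))) / min(map(g, S))        # Step 2: (4), (5)
    if gamma - lam < delta:
        return x, lam, gamma
    while True:
        m_lam, x_next = M(lam)                               # Step 3: (6)
        if m_lam == 0:                                       # Step 4
            return x_next, lam, lam
        m_gam, y = M(gamma)                                  # Step 5: (7)
        if m_gam == 0:                                       # Step 6
            return y, gamma, gamma
        # Step 7: root of the secant through (lam, M(lam)) and (gamma, M(gamma))
        gamma = gamma - m_gam * (gamma - lam) / (m_gam - m_lam)
        if gamma - lam < delta:                              # Step 8
            return x, lam, gamma
        x, lam = x_next, Fraction(f(x_next)) / g(x_next)     # Step 9


S = [(0, 0), (1, 0), (0, 1), (2, 3), (3, 1)]
f = lambda p: p[0] + 2 * p[1] + 1
g = lambda p: p[0] + p[1] + 1
x_star, lam_star, upper = fract(S, f, g, Fraction(1, 1000))
```

On this example the initial bracket is [1, 9]. After one secant step the upper bound drops to 35/11, while the lower bound rises to 3/2, which is already optimal. A generous tolerance such as δ = 10 stops immediately and returns the bracket [1, 9], which still contains λ* = 3/2.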


Special Cases 


If the feasible set S is polyhedral and the functions f(x)
and g(x) are either linear or quadratic, then the
algorithm solves a sequence of linear or quadratic
programs, respectively. In particular, if f(x) = c^T x and
g(x) = d^T x, then the algorithm solves the sequence of
linear programs

$$\max_{x \in S}\, (c - \lambda^{(k)} d)^T x. \tag{8}$$
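For the linear case, each subproblem (8) is a linear program. The sketch below takes S to be a box l ≤ x ≤ u, an assumption made only so that the LP can be solved in closed form: each x_i goes to its upper bound when (c − λ^{(k)} d)_i > 0 and to its lower bound otherwise. A general polyhedron would need an LP solver in that step. It also assumes d^T x > 0 on the box.

```python
from fractions import Fraction


def dinkelbach_box_lp(c, d, lower, upper, max_iter=100):
    """Dinkelbach for max (c^T x)/(d^T x) over the box lower <= x <= upper."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(lower)
    lam = Fraction(dot(c, x)) / dot(d, x)
    for _ in range(max_iter):
        w = [ci - lam * di for ci, di in zip(c, d)]                # c - lam * d
        x = [u if wi > 0 else l for wi, l, u in zip(w, lower, upper)]  # LP (8) on a box
        if dot(w, x) == 0:                                         # M(lam) = 0: optimal
            return x, lam
        lam = Fraction(dot(c, x)) / dot(d, x)
    raise RuntimeError("no convergence within max_iter iterations")


x_opt, ratio = dinkelbach_box_lp(c=[3, 1], d=[1, 2], lower=[1, 1], upper=[2, 3])
```

Here the iteration moves from the corner (1, 1), with ratio 4/3, to the corner (2, 1), with ratio 7/4. At that corner M(7/4) = 0, so the method stops.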


If both f(x) and g(x) are quadratic, i.e.,
f(x) = ½ x^T Q x + c^T x and g(x) = ½ x^T P x + d^T x,
then the algorithm solves the sequence of quadratic
programs

$$\max_{x \in S}\, \frac{1}{2} x^T (Q - \lambda^{(k)} P)\, x + (c - \lambda^{(k)} d)^T x. \tag{9}$$

Notice that the matrix (Q − λ^{(k)} P) may be indefinite, in
which case the algorithm is required to find the global
maximum of a linearly constrained indefinite quadratic
function. Even though this is an NP-hard problem (it
already is, e.g., when (Q − λ^{(k)} P) is positive definite),
the method developed by A.T. Phillips and J.B. Rosen
[8] is guaranteed to find an ε-approximate global
maximum (i.e., the relative error is no larger than ε)
for any specified ε > 0.
Furthermore, if f(x) and g(x) are such that
f(x) − λ g(x) is only 'partially separable', then the
method developed in [9] can be used to find an
ε-approximate global maximum for any ε > 0.
Specifically, the method in [9] is guaranteed to find
solutions to the sequence of subproblems (6) and (7) if
x can be partitioned into two components x = (w, z)
such that f(x) − μ g(x) (where the constant μ = λ^{(k)} or
γ^{(k−1)}) can be written in the form φ(w) + ψ(z), where
φ(w) is a separable convex function of w and ψ(z) is
a concave (but not necessarily separable) function of z.
The applicability of these methods to these subproblems
greatly extends the class of fractional programming
problems that can be solved in practice.


See also 


> Bilevel Fractional Programming 
> Fractional Combinatorial Optimization

> Fractional Programming 


Introduction


In this paper we consider a quadratic programming 
(QP) problem of the following form: 


; l + T 
min x)= =x Qx+cx 
f(x) = 5x7Q i 

s.t. xéED 


where $D$ is a polyhedron in $\mathbb{R}^n$ and $c \in \mathbb{R}^n$. Without any loss of generality, we can assume that $Q$ is a real symmetric $(n \times n)$-matrix. If this is not the case, then $Q$ can be converted to symmetric form by replacing $Q$ by $(Q + Q^T)/2$, which does not change the value of the objective function $f(x)$. Note that if $Q$ is positive semidefinite, then Problem (1) is a convex minimization problem. When $Q$ is negative semidefinite, Problem (1) is a concave minimization problem. When $Q$ has at least one positive and one negative eigenvalue (i.e., $Q$ is indefinite), Problem (1) is an indefinite quadratic programming problem. In the convex case, every Kuhn-Tucker point is a local minimum, which is also a global minimum, and a number of classical optimization methods, described in many places in the literature, obtain globally optimal solutions of convex quadratic programs. In the case of concave minimization over polytopes, it is well known that if the problem has an optimal solution, then an optimal solution is attained at a vertex of $D$. On the other hand, the global minimum of an indefinite quadratic programming problem is not necessarily attained at a vertex of $D$; in this case, the second order optimality conditions show that the global minimum is attained at the boundary of the feasible domain. In this research we are interested in developing techniques to solve general (convex, concave and indefinite) quadratic programming problems.
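The classification above depends only on the eigenvalues of the symmetric part of $Q$. As a minimal sketch (the function name `classify_qp` and the tolerance are our own choices, not from the text), the following Python/NumPy code symmetrizes $Q$, checks that $x^T Q x$ is unchanged, and labels the problem convex, concave or indefinite.

```python
import numpy as np

def classify_qp(Q, tol=1e-12):
    """Classify min 0.5 x^T Q x + c^T x by the eigenvalues of the symmetric part of Q."""
    Q = np.asarray(Q, dtype=float)
    Qs = 0.5 * (Q + Q.T)           # symmetric part: x^T Qs x == x^T Q x for every x
    eig = np.linalg.eigvalsh(Qs)   # real eigenvalues in ascending order
    if eig[0] >= -tol:
        return "convex"            # Q positive semidefinite
    if eig[-1] <= tol:
        return "concave"           # Q negative semidefinite
    return "indefinite"            # at least one positive and one negative eigenvalue

Q = np.array([[1.0, 4.0], [0.0, -2.0]])   # a non-symmetric example
x = np.array([0.3, -1.7])
print(np.isclose(x @ Q @ x, x @ (0.5 * (Q + Q.T)) @ x))   # True
print(classify_qp(Q))                                    # indefinite
```

Here the symmetric part is $\begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}$, with eigenvalues $2$ and $-3$, hence the indefinite label.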


Complexity of Quadratic Programming 


In this section we discuss the complexity of quadratic programming problems. The complexity analysis indicates whether efficient algorithms for solving the problem can be developed. In [10], the QP was shown to be NP-hard in the case of a negative definite matrix $Q$. The QP was also proven to be NP-hard by a reduction from the satisfiability problem [11] and from the knapsack feasibility problem [5]. Moreover, checking local optimality for the QP is itself an NP-hard problem [11]. In addition, checking strict convexity in the QP (as part of checking the second order necessary conditions for local optimality) was proven to be NP-hard [8]. In fact, finding a local minimum and proving the local optimality of such a solution to the QP may take exponential time. This is true even with a small number of concave variables: even when $Q$ has rank one with exactly one negative eigenvalue, the QP is still NP-hard [9]. However, a large number of negative eigenvalues does not necessarily make the problem harder to solve. For example, consider the following problem:


$$\min \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t. } x \ge 0.$$


If the matrix $Q$ has $(n-1)$ negative eigenvalues, then there must be at least $(n-1)$ active constraints at the optimal solution [3]. Correspondingly, to find the optimal solution it is sufficient to solve $n$ different problems, in each case setting $(n-1)$ of the constraints to equalities. In general, if $Q$ has $(n-k)$ negative eigenvalues, then we are required to solve $\binom{n}{k}$ independent problems, and the total computational time required to solve the problem is proportional to $\binom{n}{k} = O(n^k)$. Thus, if $k$ is a constant independent of $n$, then the computational time is bounded by a polynomial in $n$. On the other hand, if $k$ grows with $n$, then the computational time can grow exponentially with $n$ [3].
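To see why a fixed $k$ gives polynomial time while a growing $k$ does not, the small Python sketch below simply counts the subproblems obtained by choosing which $n-k$ of the $n$ bounds $x_i \ge 0$ are active; the helper name `num_subproblems` is hypothetical.

```python
from math import comb

def num_subproblems(n, k):
    """Ways to choose which n - k of the n bounds x_i >= 0 are fixed as equalities."""
    return comb(n, n - k)          # equal to comb(n, k)

for n in (10, 20, 40):
    # k fixed (here 2): polynomial growth; k = n // 2: exponential growth
    print(n, num_subproblems(n, 2), num_subproblems(n, n // 2))
```

For $n = 20$, for instance, this gives 190 subproblems when $k = 2$ but 184756 when $k = 10$.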


Equivalence Between Discrete 
and Continuous Problems 


Before we show the equivalence between discrete and 
continuous programs, it is important to discuss an 
equivalence property between two extremum prob- 
lems [2]. Therefore, we refer to the following theorem 
(see [2] for a proof). 
Theorem 1 Let $Z$ and $X$ be compact sets in $\mathbb{R}^n$, let $R$ be a closed set in $\mathbb{R}^n$, and let the following hypotheses hold.

H1) $f: \mathbb{R}^n \to \mathbb{R}$ is bounded on $X$, and there exist an open set $A \supset Z$ and real numbers $\alpha, L > 0$ such that, for any $x, y \in A$, $f$ satisfies the Hölder condition $|f(x) - f(y)| \le L\|x - y\|^{\alpha}$.
H2) It is possible to find $\varphi: \mathbb{R}^n \to \mathbb{R}$ such that
(i) $\varphi$ is continuous on $X$;
(ii) $\varphi(x) = 0$ for $x \in Z$, and $\varphi(x) > 0$ for $x \in X \setminus Z$;
(iii) for every $z \in Z$ there exist a neighborhood $S(z)$ and a real $\varepsilon > 0$ such that $\varphi(x) \ge \varepsilon\|x - z\|^{\alpha}$ for any $x \in S(z) \cap (X \setminus Z)$.

Then a real $\mu_0$ exists such that, for any real $\mu \ge \mu_0$, the problem $\min f(x)$, $x \in Z \cap R$, is equivalent to $\min\,[f(x) + \mu\varphi(x)]$, $x \in X \cap R$.


Now we can show an equivalence between discrete and 
continuous programs from the following theorem [2]. 


Theorem 2 Let $e^T = (1, 1, \dots, 1)$, $Z = \mathbb{B}^n = \{0,1\}^n$, $X = \{x \in \mathbb{R}^n : 0 \le x \le e\}$ and $R = \{x \in \mathbb{R}^n : g(x) \ge 0\}$. Consider the problem

$$\min f(x) \quad \text{s.t. } g(x) \ge 0,\ x \in \mathbb{B}^n, \qquad (2)$$

and the problem

$$\min\,[f(x) + \mu x^T (e - x)] \quad \text{s.t. } g(x) \ge 0,\ 0 \le x \le e. \qquad (3)$$

Suppose that $f$ verifies assumption H1 of Theorem 1 with $\alpha = 1$; that is, it is bounded on $X$ and Lipschitz continuous on an open set $A \supset Z$. Then there exists some $\mu_0 \in \mathbb{R}$ such that, for all $\mu \ge \mu_0$, Problems (2) and (3) are equivalent.


Integer Programming Problems 
and Complementarity Problems 


The connections between integer programs and complementarity problems can be exhibited by applying KKT conditions. The results can be generalized to the quadratic programming case [4].


Theorem 3 Let us first assume
3a) $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$ are continuously differentiable functions;
3b) $g(x)$ satisfies a constraint qualification at $x^0$, so that the KKT conditions hold.
Then $x^0$ is an optimal solution of the nonlinear programming problem

$$\min f(x) \quad \text{s.t. } g(x) \ge 0,\ x \ge 0 \qquad (4)$$

if and only if there exist $u^0 \in \mathbb{R}^n$ and $y^0, v^0 \in \mathbb{R}^m$ such that $(x^0, y^0, u^0, v^0)$ is an optimal solution to the following problem:

$$\begin{aligned} \min\ & f(x) \\ \text{s.t. } & f'(x) - y^T g'(x) - u = 0, \\ & g(x) - v = 0, \\ & y^T v = 0, \\ & x^T u = 0, \\ & x, y, u, v \ge 0. \end{aligned} \qquad (5)$$


Proof 1 Necessity. If $x^0$ is an optimal solution to Problem (4), then from the KKT conditions we obtain $(y^0, u^0)$ such that

$$\begin{aligned} & f'(x^0) - (y^0)^T g'(x^0) - u^0 = 0, \\ & g(x^0) \ge 0, \\ & (y^0)^T g(x^0) = 0, \\ & (x^0)^T u^0 = 0, \\ & x^0, y^0, u^0 \ge 0. \end{aligned}$$

Let $v^0 = g(x^0)$; then $(x^0, y^0, u^0, v^0)$ is an optimal solution to Problem (5).
Sufficiency. The proof is trivial. □


We now generalize the results of Theorem 3 to the 
quadratic programming case. Consider the following 
problem 


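$$\min \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t. } Ax \ge b,\ x \in \mathbb{B}^n, \qquad (6)$$

where $Q$ is a symmetric matrix. Using Theorem 2, Problem (6) is equivalent, for sufficiently large $\mu$, to

$$\min \tfrac{1}{2} x^T (Q - 2\mu I) x + (c + \mu e)^T x \quad \text{s.t. } Ax \ge b,\ x \le e,\ x \ge 0. \qquad (7)$$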
Applying Theorem 3 to Problem (7), we then obtain

$$\begin{aligned}
\min\ & \tfrac{1}{2} x^T (Q - 2\mu I) x + (c + \mu e)^T x && (8)\\
\text{s.t. } & c + Qx + \mu(e - 2x) - A^T y + t = u, && (9)\\
& Ax - b = v, && (10)\\
& e - x = w, && (11)\\
& x^T u = 0, && (12)\\
& y^T v = 0, && (13)\\
& t^T w = 0, && (14)\\
& x, y, t, u, v, w \ge 0. && (15)
\end{aligned}$$


Rearranging the terms in (9), we have $Qx - 2\mu x = -(c + \mu e) + A^T y - t + u$. From (12), (13) and (14), we have

$$x^T u = 0, \qquad 0 = y^T v = y^T A x - y^T b, \qquad 0 = t^T w = t^T e - t^T x;$$

therefore, $y^T b = y^T A x$ and $t^T e = t^T x$. Substituting these relations, (8) becomes $\min\,[\tfrac{1}{2}(c + \mu e)^T x + \tfrac{1}{2}(b^T y - e^T t)]$. Taken all together, Problem (6) is equivalent to the following problem.


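$$\min\ \hat c^T \xi \quad \text{s.t. } \hat A \xi + \eta = \hat b,\quad \xi^T \eta = 0,\quad \xi, \eta \ge 0,$$

where

$$\xi^T = (x^T, y^T, t^T), \qquad \eta^T = (u^T, v^T, w^T), \qquad \hat A = \begin{pmatrix} -Q + 2\mu I & A^T & -I \\ -A & 0 & 0 \\ I & 0 & 0 \end{pmatrix},$$

$$\hat c^T = \tfrac{1}{2}(c^T + \mu e^T,\ b^T,\ -e^T), \qquad \hat b^T = (c^T + \mu e^T,\ -b^T,\ e^T).$$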


Note that no restrictive assumptions are made on $Q$; this transformation is applicable to the convex case as well as to the nonconvex case.


Integer Programming Problems 
and Quadratic Integer Programming Problems 


Integer programming is used to model a variety of im- 
portant practical problems in operations research, engi- 
neering, and computer science. Consider the following 
linear zero-one programming problem: 

$$\min c^T x \quad \text{s.t. } Ax \le b,\ x_i \in \{0,1\}\ (i = 1, \dots, n),$$

where $A$ is a real $(m \times n)$-matrix, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$. Let $e^T = (1, \dots, 1) \in \mathbb{R}^n$ denote the vector whose components are all equal to 1. Then the zero-one integer linear programming problem is equivalent to the following concave minimization problem:

$$\min f(x) = c^T x + \mu x^T (e - x) \quad \text{s.t. } Ax \le b,\ 0 \le x \le e,$$

where $\mu$ is a sufficiently large positive integer. The function $f(x)$ is concave because $-x^T x$ is concave.

The equivalence of the two problems is based on the facts that a concave function attains its minimum over a polytope at a vertex and that $x^T(e - x) = 0$, $0 \le x \le e$, implies $x_i = 0$ or $1$ for $i = 1, \dots, n$. We note that a vertex of the feasible domain is not necessarily a vertex of the unit hypercube $0 \le x \le e$, but the global minimum is attained only when $x^T(e - x) = 0$, provided that $\mu$ is a sufficiently large number.

These transformation techniques can be applied to reduce quadratic zero-one problems to equivalent concave minimization problems. For instance, consider a quadratic zero-one problem of the following form:

$$\min f(x) = c^T x + x^T Q x \quad \text{s.t. } x \in \{0,1\}^n,$$

where $Q$ is a real symmetric $(n \times n)$ matrix. Given any real number $\mu$, let $\bar Q = Q + \mu I$, where $I$ is the $(n \times n)$ unit matrix, and $\bar c = c - \mu e$. Because $\bar f(x) = f(x)$ for every $x \in \{0,1\}^n$ (since $x_i^2 = x_i$), the above quadratic zero-one problem is equivalent to the problem

$$\min \bar f(x) = \bar c^T x + x^T \bar Q x \quad \text{s.t. } x_i \in \{0,1\},\ i = 1, \dots, n.$$

In this case, if we choose $\mu$ such that $\bar Q = Q + \mu I$ becomes a negative semidefinite matrix (e.g., $\mu = -\lambda$, where $\lambda$ is the largest eigenvalue of $Q$), then the objective function $\bar f(x)$ becomes concave and the constraints can be replaced by $0 \le x \le e$. Thus, this problem is equivalent to the minimization of a quadratic concave function over the unit hypercube [4].
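The diagonal shift can be checked numerically. The following NumPy sketch (random data and the helper names `concave_shift` and `objective` are our own) builds $\bar Q = Q + \mu I$ and $\bar c = c - \mu e$ with $\mu = -\lambda_{\max}(Q)$, confirms that $\bar f$ and $f$ agree on all 0-1 vectors, and confirms that $\bar Q$ is negative semidefinite.

```python
import itertools
import numpy as np

def concave_shift(Q, c):
    """Return (Qbar, cbar) = (Q + mu*I, c - mu*e) with mu = -lambda_max(Q)."""
    Q = np.asarray(Q, dtype=float)
    c = np.asarray(c, dtype=float)
    mu = -np.linalg.eigvalsh(Q)[-1]        # eigvalsh sorts ascending; last is lambda_max
    n = len(c)
    return Q + mu * np.eye(n), c - mu * np.ones(n)

def objective(Q, c, x):
    return c @ x + x @ Q @ x

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = 0.5 * (A + A.T)                        # symmetric, in general indefinite
c = rng.standard_normal(4)
Qbar, cbar = concave_shift(Q, c)

# Both objectives agree on every 0-1 vector, because x_i**2 == x_i there ...
same = all(np.isclose(objective(Q, c, x), objective(Qbar, cbar, x))
           for x in (np.array(v, dtype=float) for v in itertools.product((0, 1), repeat=4)))
# ... and the shifted quadratic form is negative semidefinite (concave objective).
print(same, np.linalg.eigvalsh(Qbar)[-1] <= 1e-10)
```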


Various Equivalent Forms 
of Quadratic Zero-One Programming Problems 


The problem considered here is a quadratic zero-one program, which has the form

$$\min f(x) = x^T Q x \quad \text{s.t. } x_i \in \{0,1\},\ i = 1, \dots, n,$$

where $Q$ is an $n \times n$ matrix [6,7]. Throughout this section the following notation will be used.

• $\{0,1\}^n$: set of $n$-dimensional 0-1 vectors.

• $\mathbb{R}^{n \times n}$: set of $n \times n$ real matrices.

• $\mathbb{R}^n$: set of $n$-dimensional real vectors.

In order to formalize the notion of equivalence we need 
some definitions. 


Definition 1 The problem $P$ is “polynomially reducible” to problem $P_0$ if, given an instance $I(P)$ of problem $P$, an instance $I(P_0)$ of problem $P_0$ can be obtained in polynomial time such that solving $I(P_0)$ will solve $I(P)$.


Definition 2 Two problems $P_1$ and $P_2$ are called “equivalent” if $P_1$ is “polynomially reducible” to $P_2$ and $P_2$ is “polynomially reducible” to $P_1$.


Consider the following three problems:

$P$: $\min f(x) = x^T Q x$, $x \in \{0,1\}^n$, $Q \in \mathbb{R}^{n \times n}$;
$P_1$: $\min f(x) = x^T Q x + c^T x$, $x \in \{0,1\}^n$, $Q \in \mathbb{R}^{n \times n}$, $c \in \mathbb{R}^n$;
$P_2$: $\min f(x) = x^T Q x$, $x \in \{0,1\}^n$, $Q \in \mathbb{R}^{n \times n}$, s.t. $\sum_{i=1}^n x_i = k$ for some $k$ with $0 \le k \le n$,

where $x = (x_1, x_2, \dots, x_n)$.

Next we show that problems $P$, $P_1$ and $P_2$ are all “equivalent”. Formulation $P_2$ will then be used in the rest of the sections.


Lemma 1 $P$ is “polynomially reducible” to $P_1$.


Proof 2 It is very easy to see that $P$ is a special case of $P_1$ (take $c = 0$). □


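Lemma 2 $P_1$ is “polynomially reducible” to $P$.


Proof 3 Problem $P_1$ is defined as follows: $\min f(x) = x^T Q x + c^T x$, $x \in \{0,1\}^n$, $Q \in \mathbb{R}^{n \times n}$, $c \in \mathbb{R}^n$. If $Q = (q_{ij})$, then let $B = (b_{ij})$, where

$$b_{ij} = \begin{cases} q_{ij} & \text{if } i \ne j, \\ q_{ii} + c_i & \text{if } i = j. \end{cases}$$

Since $x_i^2 = x_i$ (because $x_i \in \{0,1\}$), we have $g(x) = x^T B x = x^T Q x + c^T x$. So the following problem is equivalent to problem $P_1$: $\min g(x) = x^T B x$, $x \in \{0,1\}^n$, $B \in \mathbb{R}^{n \times n}$. □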


Using Lemma 1 and Lemma 2, it is evident that $P$ and $P_1$ are “equivalent”.
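The construction in the proof of Lemma 2 is easy to check by enumeration. In this pure-Python sketch (the example data and the names `fold_linear_term` and `quad` are our own), $B = Q + \operatorname{diag}(c)$ reproduces $x^T Q x + c^T x$ on every 0-1 vector.

```python
import itertools

def fold_linear_term(Q, c):
    """Lemma 2: absorb c^T x into the diagonal, B = Q + diag(c)."""
    n = len(c)
    return [[Q[i][j] + (c[i] if i == j else 0) for j in range(n)] for i in range(n)]

def quad(Q, x):
    """x^T Q x for a list-of-lists matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

Q = [[2, -3, 1], [-3, 0, 4], [1, 4, -5]]
c = [1, -2, 3]
B = fold_linear_term(Q, c)
for x in itertools.product((0, 1), repeat=3):
    lhs = quad(Q, x) + sum(ci * xi for ci, xi in zip(c, x))
    assert quad(B, x) == lhs   # holds because x_i**2 == x_i for x_i in {0, 1}
```

The identity fails for general real $x$, which is why the reduction is specific to 0-1 variables.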


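Lemma 3 $P_2$ is “polynomially reducible” to $P$.


Proof 4 Problem $P_2$ is as follows: $\min f(x) = x^T Q x$, $x \in \{0,1\}^n$, $Q \in \mathbb{R}^{n \times n}$, $\sum_{i=1}^n x_i = k$ for some $k$ with $0 \le k \le n$. If $Q = (q_{ij})$, then let $M = 2\sum_{i=1}^n \sum_{j=1}^n |q_{ij}| + 1$. Now define the following problem $P$: $\min g(x) = x^T Q x + M\big(\sum_{i=1}^n x_i - k\big)^2$ s.t. $x \in \{0,1\}^n$, $Q \in \mathbb{R}^{n \times n}$. Let $x_a = (x^a_1, \dots, x^a_n)$ and $x_b = (x^b_1, \dots, x^b_n)$ be such that $\sum_{i=1}^n x^a_i = k$ and $\sum_{i=1}^n x^b_i \ne k$. Then $g(x_a) \le (M-1)/2$, as $\sum_{i=1}^n x^a_i = k$, and $g(x_b) \ge -(M-1)/2 + M = (M+1)/2$, as $|\sum_{i=1}^n x^b_i - k| \ge 1$. Therefore, $g(x_a) < g(x_b)$ whenever $\sum_{i=1}^n x^a_i = k$ and $\sum_{i=1}^n x^b_i \ne k$. Hence, if $\min g(x) = g(x_0)$ where $x_0 = (x^0_1, \dots, x^0_n)$, then $\sum_{i=1}^n x^0_i = k$. So $\min f(x) = \min g(x)$. From the above discussion, it can easily be seen that $P_2$ is “polynomially reducible” to $P$. □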


The proof of Lemma 3 also illustrates how equality 
(knapsack) constraints in a quadratic zero-one program 
can be eliminated. 
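The penalty argument of Lemma 3 can be illustrated by brute force on a tiny instance. In the sketch below (the matrix and the names `quad` and `penalized_min` are our own), the unconstrained minimum of $x^T Q x$ is $-8$ at a vector with two ones, yet the penalized problem with $k = 3$ returns a vector with exactly three ones that solves the constrained problem.

```python
import itertools

def quad(Q, x):
    """x^T Q x for a list-of-lists matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def penalized_min(Q, k):
    """Minimize g(x) = x^T Q x + M (sum(x) - k)^2 over {0,1}^n by enumeration,
    with M = 2 * sum_ij |q_ij| + 1 as in the proof of Lemma 3."""
    n = len(Q)
    M = 2 * sum(abs(q) for row in Q for q in row) + 1
    return min(itertools.product((0, 1), repeat=n),
               key=lambda x: quad(Q, x) + M * (sum(x) - k) ** 2)

Q = [[0, 3, -2, 1], [3, 0, 5, -4], [-2, 5, 0, 2], [1, -4, 2, 0]]
x0 = penalized_min(Q, 3)
best = min(quad(Q, x) for x in itertools.product((0, 1), repeat=4) if sum(x) == 3)
print(x0, sum(x0), quad(Q, x0) == best)   # (1, 1, 0, 1) 3 True
```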


Lemma 4 $P$ is “polynomially reducible” to $P_2$.


Proof 5 Let problem $P$ be defined as follows: $\min f(x) = x^T Q x$, $x \in \{0,1\}^n$, $Q \in \mathbb{R}^{n \times n}$. Define a series of $(n+1)$ problems $P_2(0), P_2(1), P_2(2), \dots,$


Quadratic Integer Programming: Complexity and Equivalent Forms 


$P_2(n)$, where $P_2(j)$ is the following problem: $\min f(x) = x^T Q x$, $x \in \{0,1\}^n$, $Q \in \mathbb{R}^{n \times n}$, $\sum_{i=1}^n x_i = j$. Let the minimum of problem $P_2(j)$ be $y_j$; then the minimum of problem $P$ is easily seen to be $\min\{y_0, y_1, \dots, y_n\}$. □


Lemma 3 and Lemma 4 imply that $P$ and $P_2$ are “equivalent”. Since “equivalence” is a transitive relation, $P$, $P_1$ and $P_2$ are all “equivalent”.
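The decomposition used in the proof of Lemma 4 can be sketched directly by enumeration: solve $P_2(j)$ for every cardinality $j$ and keep the best value. The example matrix and the names `quad` and `min_by_cardinality` are our own.

```python
import itertools

def quad(Q, x):
    """x^T Q x for a list-of-lists matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def min_by_cardinality(Q):
    """Lemma 4: solve P by solving P2(j) for j = 0, ..., n and keeping the best value."""
    n = len(Q)
    cube = list(itertools.product((0, 1), repeat=n))
    y = [min(quad(Q, x) for x in cube if sum(x) == j) for j in range(n + 1)]
    return min(y), y

Q = [[0, 3, -2, 1], [3, 0, 5, -4], [-2, 5, 0, 2], [1, -4, 2, 0]]
best, y = min_by_cardinality(Q)
print(y, best)   # [0, 0, -8, 0, 10] -8
```

In practice each $P_2(j)$ would be handed to a solver rather than enumerated; the point is only that $n + 1$ calls suffice.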


Complexity of Quadratic Zero-One 
Programming Problems 


Quadratic zero-one programming is a difficult problem. We next show that the quadratic knapsack zero-one problem $P_2$ is NP-hard by proving that the k-clique problem is “polynomially reducible” to it. A k-clique is a complete graph with $k$ vertices.


k-clique Problem


Given a graph $G = (V, E)$ ($V$ is the set of vertices and $E$ is the set of edges), does $G$ have a k-clique as one of its subgraphs?

The k-clique problem is known to be NP-complete. We will show that the k-clique problem is “polynomially reducible” to problem $P_2$ defined in the previous subsection.


Theorem 4 The k-clique problem is “polynomially reducible” to $P_2$.


Proof 6 Problem $P_2$ was defined as $\min f(x) = x^T Q x$ s.t. $x_i \in \{0,1\}$, $i = 1, \dots, n$, $\sum_{i=1}^n x_i = m$ for some $0 \le m \le n$. Given the graph $G = (V, E)$, define $Q = (q_{ij})$ such that

$$q_{ij} = \begin{cases} -1 & \text{if } (v_i, v_j) \in E, \\ 0 & \text{if } (v_i, v_j) \notin E, \end{cases}$$

where $n = |V|$ and $m = k$ (we are trying to find a k-clique). The meaning attached to the vector $x \in \{0,1\}^n$ in problem $P_2$ is as follows: $x_i = 1$ means that $v_i$ is in the clique, and $x_i = 0$ means that $v_i$ is not in the clique. For any $x$ with $k$ ones, $x^T Q x = -2 \times$ (number of edges among the selected vertices), and $k$ vertices span at most $k(k-1)/2$ edges. Hence $G$ has a k-clique if and only if $\min f(x) = -k(k-1)$. So the k-clique problem is “polynomially reducible” to $P_2$. □


Problem $P_2$ is “equivalent” to $P$, so problem $P$ is also NP-hard. Therefore, no polynomial-time algorithm for it is known, and the CPU time required by exact methods can grow exponentially as the dimension of the problem increases.
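The reduction in the proof of Theorem 4 can be exercised on a small graph. This Python sketch (graph and names `clique_matrix`, `has_k_clique` are our own) builds $Q$ with $q_{ij} = -1$ on edges and decides the k-clique question by comparing the brute-force minimum of $P_2$ with $-k(k-1)$.

```python
import itertools

def quad(Q, x):
    """x^T Q x for a list-of-lists matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def clique_matrix(n, edges):
    """Q from the proof of Theorem 4: q_ij = -1 if (v_i, v_j) is an edge, 0 otherwise."""
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        Q[i][j] = Q[j][i] = -1
    return Q

def has_k_clique(n, edges, k):
    """G has a k-clique iff the minimum of P2 (with m = k) equals -k(k - 1)."""
    Q = clique_matrix(n, edges)
    best = min(quad(Q, x) for x in itertools.product((0, 1), repeat=n) if sum(x) == k)
    return best == -k * (k - 1)

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]   # triangle {0, 1, 2} plus a path
print(has_k_clique(5, edges, 3), has_k_clique(5, edges, 4))   # True False
```

Enumeration is used here only to check the reduction; it is exactly the exponential cost the NP-hardness result suggests cannot be avoided in general.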


Quadratic Zero-One Programming 
and Mixed Integer Programming 


In this section, we consider a quadratic zero-one programming problem of the following form:

$$\min f(x) = x^T Q x \quad \text{s.t. } \sum_{i=1}^n x_i = k,\ x \in \{0,1\}^n, \qquad (17)$$

where $Q$ is an $n \times n$ matrix whose elements satisfy $q_{ij} \ge 0$.

Define $x = (x_1, \dots, x_n)$, where each $x_i$ is a binary decision variable. We will show that problem (17) can be linearized as the following mixed integer programming problems. The first linearization technique is standard and can be found elsewhere. More recently, a more efficient linearization technique was introduced in [1]. In addition, a linearization technique for the more general case (where $q_{ij} \in \mathbb{R}$) and for multi-quadratic programming was also proposed in [1].


Conventional Linearization Approach 


For each product $x_i x_j$ in the objective function of problem (17), we introduce a new continuous variable $X_{ij} = x_i x_j$ ($i \ne j$). Note that $X_{ii} = x_i^2 = x_i$ for $x_i \in \{0,1\}$. The equivalent mixed integer programming problem (MIP) is given by:

$$\begin{aligned}
\min\ & \sum_{i=1}^n \sum_{j=1}^n q_{ij} X_{ij} \\
\text{s.t. } & \sum_{i=1}^n x_i = k, \\
& X_{ij} \le x_i, && i, j = 1, \dots, n\ (i \ne j), \\
& X_{ij} \le x_j, && i, j = 1, \dots, n\ (i \ne j), \\
& x_i + x_j - 1 \le X_{ij}, && i, j = 1, \dots, n\ (i \ne j), \\
& 0 \le X_{ij} \le 1, && i, j = 1, \dots, n\ (i \ne j),
\end{aligned} \qquad (18)$$

where $x_i \in \{0,1\}$, $i = 1, \dots, n$.
The main disadvantage of this approach is that the number of additional variables we need to introduce is $O(n^2)$, and the number of new constraints is also $O(n^2)$. The number of 0-1 variables remains the same.
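Why the constraints of (18) are exact for binary $x$ can be checked case by case. This sketch (names `x_ij_range` and `extra_size` are our own) computes the interval of values left for $X_{ij}$ once $x_i, x_j$ are fixed, and counts the added variables and linking constraints (bounds excluded) when ordered pairs $i \ne j$ are used as written in (18).

```python
import itertools

def x_ij_range(xi, xj):
    """Interval of X_ij allowed by the constraints of (18) once x_i, x_j are fixed."""
    lo = max(0, xi + xj - 1)          # x_i + x_j - 1 <= X_ij and X_ij >= 0
    hi = min(1, xi, xj)               # X_ij <= x_i, X_ij <= x_j and X_ij <= 1
    return lo, hi

for xi, xj in itertools.product((0, 1), repeat=2):
    lo, hi = x_ij_range(xi, xj)
    assert lo == hi == xi * xj        # the constraints force X_ij = x_i * x_j

def extra_size(n):
    """New continuous variables and linking constraints (bounds excluded) in (18)."""
    pairs = n * (n - 1)               # ordered pairs i != j
    return pairs, 3 * pairs
```

Exploiting the symmetry of $Q$ (using only $i < j$) would halve both counts, but they remain $O(n^2)$.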


A New Linearization Approach 


Consider the following mixed integer programming problem:

$$\begin{aligned}
\min\ & g(s) = \sum_{i=1}^n s_i = e^T s \\
\text{s.t. } & \sum_{i=1}^n x_i = k, \\
& Qx - y - s = 0, \\
& y \le \mu(e - x), \\
& x_i \in \{0,1\}, && i = 1, \dots, n, \\
& y_i, s_i \ge 0, && i = 1, \dots, n,
\end{aligned} \qquad (19)$$

where $Q$ is an $n \times n$ matrix whose elements satisfy $q_{ij} \ge 0$, and $\mu$ is a sufficiently large positive constant.

In [1], the mixed integer 0-1 programming problem (19) was proved equivalent to the quadratic zero-one programming problem (17). The main advantage of this approach is that we only need to introduce $O(n)$ additional variables and $O(n)$ new constraints, while the number of 0-1 variables remains the same. This linearization technique proved more robust and more efficient in solving quadratic zero-one and multi-quadratic zero-one programming problems [1].
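To see why (19) reproduces $x^T Q x$, fix a 0-1 vector $x$ and minimize over $y$ and $s$: each $y_i$ should be as large as allowed, $y_i = \min((Qx)_i, \mu(1 - x_i))$, so $s_i = (Qx)_i$ when $x_i = 1$ and $s_i = 0$ when $x_i = 0$, and $e^T s = \sum_{i: x_i = 1} (Qx)_i = x^T Q x$. The Python sketch below checks this by enumeration. Taking $\mu$ as the largest row sum of $Q$ is our own assumption of a value that is large enough for nonnegative $Q$; the name `linearized_value` is hypothetical.

```python
import itertools

def linearized_value(Q, x, mu):
    """Optimal e^T s of (19) for a fixed 0-1 vector x, in closed form:
    y_i = min((Qx)_i, mu * (1 - x_i)) and s = Qx - y."""
    n = len(x)
    Qx = [sum(Q[i][j] * x[j] for j in range(n)) for i in range(n)]
    y = [min(Qx[i], mu * (1 - x[i])) for i in range(n)]
    return sum(Qx[i] - y[i] for i in range(n))

Q = [[1, 2, 0], [2, 3, 4], [0, 4, 5]]         # nonnegative entries, as (19) requires
mu = max(sum(row) for row in Q)               # assumption: largest row sum is enough
for x in itertools.product((0, 1), repeat=3):
    xQx = sum(Q[i][j] * x[i] * x[j] for i in range(3) for j in range(3))
    assert linearized_value(Q, x, mu) == xQx  # (19) recovers the quadratic objective
```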
The quadratic knapsack problem is one of the simplest quadratic programming problems and is defined below (cf. also » Quadratic programming with bound constraints):

$$(P)\qquad \min f(x) = \tfrac{1}{2} x^T Q x + c^T x \quad \text{s.t. } \sum_{i=1}^n a_i x_i = M,\quad 0 \le x_i \le 1,\ i = 1, \dots, n,$$

where $x \in \mathbb{R}^n$ is a variable vector, $Q \in \mathbb{R}^{n \times n}$, $c, a \in \mathbb{R}^n$ and $M$ is a scalar.
Quadratic Knapsack


The problems are mainly classified by the nature of the matrix $Q$. When $Q$ is positive semidefinite, i.e., the objective function $f(x)$ is convex, problem (P) can be solved in polynomial time by the ellipsoid algorithm [8] and by several kinds of interior point algorithms (e.g. [5,7,11], which solve general convex quadratic problems including (P) as a special case). Also, P.M. Pardalos, Y. Ye and C.G. Han [15] give a potential reduction algorithm for the following special case of (P):

$$\min \tfrac{1}{2} x^T Q x \quad \text{s.t. } \sum_{i=1}^n x_i = 1,\ x_i \ge 0,\ i = 1, \dots, n.$$

In particular, when (P) has a diagonal matrix $Q$ with positive elements, an $O(n)$ algorithm has been proposed by P. Brucker [3]. The algorithm solves the corresponding KKT conditions using binary search. Pardalos and N. Kovoor [13] also propose an $O(n)$ randomized method.
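Brucker's $O(n)$ algorithm [3] is not reproduced here. As a simpler illustration of the same KKT idea for diagonal $Q = \operatorname{diag}(d)$, the following Python sketch (our own, with hypothetical name `diag_qkp`) uses plain bisection on the multiplier $\lambda$ of the knapsack constraint. For fixed $\lambda$ each coordinate minimizes $\tfrac12 d_i x_i^2 + c_i x_i - \lambda a_i x_i$ on $[0,1]$, giving $x_i(\lambda) = \min(1, \max(0, (\lambda a_i - c_i)/d_i))$, and $\sum_i a_i x_i(\lambda)$ is nondecreasing in $\lambda$. It assumes $d_i > 0$, $a_i > 0$ and $0 \le M \le \sum_i a_i$, and costs $O(n)$ per bisection step rather than $O(n)$ overall.

```python
def diag_qkp(d, c, a, M, iters=200):
    """Minimize sum(0.5*d_i*x_i**2 + c_i*x_i) s.t. sum(a_i*x_i) = M, 0 <= x_i <= 1,
    assuming d_i > 0, a_i > 0 and 0 <= M <= sum(a); bisection on the multiplier."""
    def x_of(lam):
        # coordinate-wise minimizer of the Lagrangian, clipped to [0, 1]
        return [min(1.0, max(0.0, (lam * ai - ci) / di)) for di, ci, ai in zip(d, c, a)]
    lo = min(ci / ai for ci, ai in zip(c, a))                 # every x_i(lo) = 0
    hi = max((ci + di) / ai for di, ci, ai in zip(d, c, a))   # every x_i(hi) = 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(ai * xi for ai, xi in zip(a, x_of(mid))) < M:
            lo = mid
        else:
            hi = mid
    return x_of(0.5 * (lo + hi))

x = diag_qkp(d=[2.0, 1.0, 4.0], c=[-1.0, 0.5, 0.0], a=[1.0, 1.0, 1.0], M=1.0)
```

For this data all three coordinates are strictly between their bounds at the optimum, with $\lambda = 4/7$ and $x = (11/14, 1/14, 1/7)$.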

The convex case is important because of its frequent 
appearance as a subproblem in many application areas. 
Among those are general convex quadratic program- 
ming [9], multicommodity network flow problems [1], 
resource management [2], and portfolio selection prob- 
lems [10]. 

The problem becomes extremely difficult if $f(x)$ is not convex. S. Sahni [16] shows that the problems with a negative diagonal matrix $Q$ are NP-hard (cf. also » Computational complexity theory; » Complexity theory), which implies that the general indefinite case is also NP-hard.

Let $a_1, \dots, a_n$ and $b$ be positive integers, and let us consider the subset sum problem, which asks for a feasible point of the set

$$\Big\{x : \sum_{i=1}^n a_i x_i = b,\ x_i \in \{0,1\},\ i = 1, \dots, n\Big\}.$$

Feasibility is determined by the following concave quadratic knapsack problem:

$$\min \sum_{i=1}^n x_i(1 - x_i) \quad \text{s.t. } \sum_{i=1}^n a_i x_i = b,\ 0 \le x_i \le 1,\ i = 1, \dots, n.$$

The subset sum problem is feasible if and only if the global optimum value of the corresponding quadratic knapsack problem is zero.

As the above example suggests, the nonconvex case arises in several combinatorial optimization problems. For example, given a graph $G(V, E)$, where $V = \{1, \dots, n\}$ is a set of vertices and $E \subset V^2$ is a set of edges (each edge listed once), find the maximum clique of $G$. This problem can be formulated in the following way:

$$\min -\sum_{(i,j) \in E} x_i x_j \quad \text{s.t. } \sum_{i=1}^n x_i = 1,\ x_i \ge 0,\ i = 1, \dots, n.$$

If $G$ has a maximum clique of size $k$, then the global minimum is $(1/k - 1)/2$. We can also formulate the maximum independent set problem and the node covering problem in a similar fashion.
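The value $(1/k - 1)/2$ can be checked directly: weight $1/k$ on each vertex of a maximum clique gives $-\binom{k}{2}/k^2 = (1/k - 1)/2$. The Python sketch below (graph and names are our own) does this exactly with fractions, each undirected edge listed once, and samples random points of the simplex to illustrate that none does better.

```python
import random
from fractions import Fraction

def clique_objective(edges, x):
    """-sum over edges (i, j) in E of x_i * x_j, each undirected edge listed once."""
    return -sum(x[i] * x[j] for i, j in edges)

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]   # maximum clique {0, 1, 2}, k = 3
k = 3
x_clique = [Fraction(1, k)] * 3 + [Fraction(0)] * 2   # uniform weights on the clique
value = clique_objective(edges, x_clique)
print(value, (Fraction(1, k) - 1) / 2)                # -1/3 -1/3

# Random points of the simplex never do better than the clique vector.
random.seed(0)
for _ in range(10000):
    w = [random.random() for _ in range(5)]
    s = sum(w)
    assert clique_objective(edges, [wi / s for wi in w]) >= float(value) - 1e-12
```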

One can also formulate any quadratic minimization problem over a convex hull as a quadratic knapsack problem. Consider the problem of the form

$$\min q(z) = z^T M z + r^T z \quad \text{s.t. } z \in P, \qquad (1)$$

where $z, r \in \mathbb{R}^m$, $M \in \mathbb{R}^{m \times m}$, and $P \subset \mathbb{R}^m$ is the polytope described as the convex hull of a given set of points $\{v_1, \dots, v_n\}$. It can be verified easily that the above general quadratic problem has the following equivalent formulation:

$$\min f(x) = x^T (V^T M V) x + r^T V x \quad \text{s.t. } \sum_{i=1}^n x_i = 1,\ x \ge 0, \qquad (2)$$

where $V = [v_1, \dots, v_n]$. Let $z^*$ and $x^*$ be optimal solutions of (1) and (2), respectively. Then we have

$$q(z^*) = f(x^*),$$

and moreover $Vx^*$ is an optimal solution of (1).
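The equivalence of (1) and (2) rests on the identity $q(Vx) = f(x)$, because every point of $P$ can be written as $z = Vx$ with $x$ in the unit simplex. The following NumPy check, on random data of our own choosing, verifies the identity numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
M = rng.standard_normal((m, m))
M = 0.5 * (M + M.T)                      # symmetric quadratic form
r = rng.standard_normal(m)
V = rng.standard_normal((m, n))          # columns v_1, ..., v_n generate the polytope P
x = rng.random(n)
x /= x.sum()                             # a point of the unit simplex
z = V @ x                                # the corresponding point of P

q = z @ M @ z + r @ z                    # objective of (1)
f = x @ (V.T @ M @ V) @ x + r @ V @ x    # objective of (2)
print(np.isclose(q, f))                  # True
```

Formulation (2) has $n$ variables, one per generating point, so it pays off when $P$ is given by few vertices.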

There exist only a few algorithms for obtaining 
a global optimum solution for the case of the general 
indefinite Q. See [15] for a partitioning approach as 
well as an interior point method, while [4] surveys al- 
gorithms for general nonconvex quadratic problems. 
The case when the objective function is separable has also been well investigated by several authors. Some practical algorithms to obtain an exact solution are reported in [6,14]. S.A. Vavasis [18] gives an $O(n(\log n)^2)$ algorithm for finding a local minimum of the problem, while K.G. Murty and S.N. Kabadi [12] show that verifying a local minimum for an indefinite quadratic problem with general constraints is NP-hard. Also, [17] gives an $\epsilon$-approximation algorithm which is weakly polynomial in the problem size if the number of negative diagonal elements is fixed.


See also 


> aBB Algorithm 

> Complexity Theory: Quadratic Programming 

> D.C. Programming 

> Integer Programming 

> Multidimensional Knapsack Problems 

> Quadratic Assignment Problem 

> Quadratic Fractional Programming: Dinkelbach 
Method 

> Quadratic Programming with Bound Constraints 

> Quadratic Programming Over an Ellipsoid 

> Reverse Convex Optimization 

> Standard Quadratic Optimization Problems: 
Algorithms 

> Standard Quadratic Optimization Problems: 
Applications 

> Standard Quadratic Optimization Problems: 
Theory 
Synonyms
QPwBC 


Problem Statement 


The bound constrained quadratic problem has the fol- 
lowing form: 


$$\min_{x \in \Omega} f(x) = \tfrac{1}{2} x^T Q x + c^T x, \qquad \Omega = \{x \in \mathbb{R}^n : l \le x \le u\}, \qquad (1)$$

where $Q = (q_{ij}) \in \mathbb{R}^{n \times n}$ is an indefinite symmetric matrix and $x, c, l, u \in \mathbb{R}^n$. Here (as always in the sequel), all inequalities involving vectors are interpreted componentwise, and $\nabla f(x) = Qx + c$ is the gradient of $f$. The region $\Omega$ is assumed to be nonempty (i.e., $l_i \le u_i$ for each $i \in \{1, \dots, n\}$) and may be unbounded (i.e., $l_i = -\infty$ and/or $u_i = +\infty$ for some $i \in \{1, \dots, n\}$). The function $f(x)$ is assumed to be bounded below on $\Omega$. For each $x \in \Omega$, the active set $A(x)$ is defined as

$$A(x) = \{i : x_i \in \{l_i, u_i\}\}.$$

Problems of the form (1) naturally arise in a number of different applications. Moreover, QPwBC is a basic subroutine for many nonlinear programming codes, and the monotone linear complementarity problem can be written in the above form. For the convex case (i.e., $Q$ positive semidefinite), which is known to be polynomially solvable [16], many efficient algorithms exist [4,5,7,9,10,12,18,36]. However, not many algorithms exist for the efficient solution of the general nonconvex problem [8,17,22,24,25,26].

From the complexity point of view, problem (1) is NP-hard [32], and even checking local optimality for a feasible point is NP-hard [20,27]. The complexity of finding a stationary point for (1) is an open question (in the concave case this problem is PLS-complete [15]). Polynomial-time algorithms that construct approximate solutions exist [33].


Optimality Conditions 


For problem (1) the classical local optimality conditions 
can be stated in a very special form. Moreover, there 
exist interesting results about global optimality which 
lead to efficient numerical procedures. 


Local Optimality Conditions 


Proposition 1 If $x^* \in \Omega$ is a local minimum for problem (1), then:
A) if $q_{ii} \ge 0$, then
 i) $[\nabla f(x^*)]_i = 0$; or
 ii) $[\nabla f(x^*)]_i > 0$ and $x^*_i = l_i$; or
 iii) $[\nabla f(x^*)]_i < 0$ and $x^*_i = u_i$;
B) if $q_{ii} < 0$, then
 i) $[\nabla f(x^*)]_i > 0$ and $x^*_i = l_i$; or
 ii) $[\nabla f(x^*)]_i < 0$ and $x^*_i = u_i$.


Proposition 1 specializes the classical KKT stationarity conditions, which only involve first order information, to problem (1) by taking into account the sign of the second order pure derivatives. If $x^*$ is nondegenerate, i.e.,

$$(x^*_i - l_i)(x^*_i - u_i) + |[\nabla f(x^*)]_i| \ne 0$$

for each $i \in A(x^*)$, then the conditions A)-B) are sufficient for local minimization.
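Proposition 1 translates into a simple coordinate-wise test. The NumPy sketch below (our own helper `satisfies_prop1`, with a tolerance we chose) checks conditions A)-B) at a feasible point; bounds may be infinite. On the one-dimensional example $f(x) = -x^2 + x$ on $[0, 1]$ (i.e., $Q = (-2)$, $c = 1$), both endpoints pass, while the interior stationary point $x = 1/2$ fails because $q_{11} < 0$.

```python
import numpy as np

def satisfies_prop1(Q, c, l, u, x, tol=1e-9):
    """Test the necessary conditions A)-B) of Proposition 1 at a feasible point x."""
    Q, c, l, u, x = (np.asarray(v, dtype=float) for v in (Q, c, l, u, x))
    g = Q @ x + c                                   # gradient of f at x
    for i in range(len(x)):
        at_l = abs(x[i] - l[i]) <= tol
        at_u = abs(x[i] - u[i]) <= tol
        if Q[i, i] >= 0:                            # case A): usual KKT sign conditions
            ok = abs(g[i]) <= tol or (g[i] > tol and at_l) or (g[i] < -tol and at_u)
        else:                                       # case B): x_i must sit at a bound
            ok = (g[i] > tol and at_l) or (g[i] < -tol and at_u)
        if not ok:
            return False
    return True

print([satisfies_prop1([[-2]], [1], [0], [1], [t]) for t in (0.0, 0.5, 1.0)])
# [True, False, True]
```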

The following proposition states a relationship between the number of negative eigenvalues of the matrix $Q$ and the cardinality of the active set at a stationary point $x^*$.


Proposition 2 If the matrix Q has k negative eigenval- 
ues counting multiplicities, then at least k constraints are 
active at a local solution x* of problem (1). 


Because of Proposition 2, if $f$ is concave, the problem is bounded if and only if all upper and lower bounds are finite, and the solution can be found by checking all the vertices of $\Omega$. Therefore the concave QPwBC problem is equivalent to a quadratic zero-one problem [1,22].
Global Optimality Conditions


Global optimality conditions for problem (1) can be 
stated in terms of copositivity [14] of the Hessian ma- 
trix. 


Definition 3 An $n \times n$ matrix $Q$ is copositive with respect to a polyhedral cone $\Gamma \subseteq \mathbb{R}^n$ (denoted $\Gamma$-copositive) if and only if

$$v^T Q v \ge 0 \quad \text{for all } v \in \Gamma \setminus \{0\}$$

(for strict copositivity, $\ge$ has to be replaced by $>$).


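Definition 4 Given $\bar x \in \Omega$, the tangent cone $\Gamma(\bar x)$ of $\Omega$ at $\bar x$ is defined as

$$\Gamma(\bar x) = \{v \in \mathbb{R}^n : \bar x + \alpha v \in \Omega \text{ for some } \alpha > 0\}.$$


Definition 5 Given $\bar x \in \Omega$ and $v \in \mathbb{R}^n$, we define $\lambda(\bar x, v)$ as follows:

$$\lambda(\bar x, v) = \max\{\lambda \ge 0 : \bar x + \lambda v \in \Omega\}.$$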


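Let us consider the following decomposition of the cone $\Gamma(\bar x)$:

$$\Gamma(\bar x) = \Big(\bigcup_{i=1}^n \Gamma_i^+(\bar x)\Big) \cup \Big(\bigcup_{i=1}^n \Gamma_i^-(\bar x)\Big),$$

where, for $i = 1, \dots, n$,

$$\Gamma_i^+(\bar x) = \{v \in \Gamma(\bar x) : [\bar x + \lambda(\bar x, v)\, v]_i = u_i\}, \qquad \Gamma_i^-(\bar x) = \{v \in \Gamma(\bar x) : [\bar x + \lambda(\bar x, v)\, v]_i = l_i\}.$$

If $v \in \Gamma_i^+(\bar x) \setminus \{0\}$ (or $v \in \Gamma_i^-(\bar x) \setminus \{0\}$), then $v_i \ne 0$ and the maximum stepsize along $v$ moving from $\bar x$ saturates the $i$th upper (lower) constraint (see Fig. 1).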


Proposition 6 A KKT point $\bar x$ yields a global minimum if and only if, for each $i$, the matrix $Q_i^+$ is $\Gamma_i^+$-copositive and the matrix $Q_i^-$ is $\Gamma_i^-$-copositive, where

$$Q_i^+ = (u_i - \bar x_i)\, Q + 2 \nabla f(\bar x)\, e_i^T, \qquad Q_i^- = (\bar x_i - l_i)\, Q - 2 \nabla f(\bar x)\, e_i^T.$$

Finally, the following Proposition [21] gives a sufficient condition for a KKT point to be a global minimum, in terms of convexity of an augmented function $L(x)$.
Quadratic Programming with Bound Constraints, Figure 1
Partitioning of the set $\Gamma(\bar x)$ for the two-dimensional case


Proposition 7 Let $\bar x$ be a KKT point for problem (1). Let $l_i$ and $u_i$ be finite for each $i \in \{1, \dots, n\}$. Let

$$D = \operatorname{diag}\left(\frac{|[\nabla f(\bar x)]_1|}{u_1 - l_1}, \dots, \frac{|[\nabla f(\bar x)]_n|}{u_n - l_n}\right).$$

If $L(x) = f(x) + (x - \bar x)^T D (x - \bar x)$ is convex in $\Omega$, then $\bar x$ is a global solution of (1). Moreover, if $L(x)$ is strictly convex in $\Omega$, then this solution is unique.
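Since $L$ is quadratic with Hessian $Q + 2D$, a sufficient way to apply Proposition 7 is to test whether $Q + 2D$ is positive semidefinite. The NumPy sketch below (our own helper `prop7_certificate`; it assumes finite bounds and that $\bar x$ has already been verified to be a KKT point) does this. On $f(x) = -x^2 + 0.5x$ over $[0, 1]$, the KKT point $x = 1$ (value $-0.5$) is certified, while the KKT point $x = 0$ (value $0$, not global) is not.

```python
import numpy as np

def prop7_certificate(Q, c, l, u, xbar):
    """Sufficient global-optimality test of Proposition 7 at a KKT point xbar:
    L(x) is convex iff its Hessian Q + 2D is positive semidefinite."""
    Q, c, l, u, xbar = (np.asarray(v, dtype=float) for v in (Q, c, l, u, xbar))
    g = Q @ xbar + c                                  # gradient of f at xbar
    D = np.diag(np.abs(g) / (u - l))                  # requires finite bounds
    return bool(np.linalg.eigvalsh(Q + 2 * D)[0] >= -1e-12)

print(prop7_certificate([[-2]], [0.5], [0], [1], [1.0]),   # True
      prop7_certificate([[-2]], [0.5], [0], [1], [0.0]))   # False
```

The same check can be used in reverse to build test problems with a known global minimum, as noted in the text.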


This kind of result can be a useful tool for branch and 
bound algorithms for global optimization. Moreover, 
Proposition 7 allows one to construct test problems in 
quadratic programming with known global minimum. 

More results on the global optimization criteria for 
(1) exist in the literature (see, for example, [21] and ref- 
erences therein). 


Algorithms for Local Minimization 


Most of the algorithms to locally solve problem (1) can be classified as so-called active set strategies, which reduce the solution of the problem to a sequence of auxiliary unconstrained subproblems on affine subspaces of $\mathbb{R}^n$ (faces). They generate a sequence of feasible points $x^{(k)}$, each $x^{(k)}$ associated with a working set $W^{(k)} \subseteq A(x^{(k)})$. The active set algorithms can be described according to the very general framework in Table 1.

These methods differ in the way they solve the subproblems $P(x^{(k)}, W^{(k)})$ and in the definition of a new face. One of the first such algorithms, due to B.T. Polyak [29], uses a conjugate gradient algorithm to solve $P(x^{(k)}, W^{(k)})$. Since then, many modifications have been proposed for the solution of the auxiliary problem. In particular, the approximate solution of such problems is suitable for dealing with large scale problems. With
Quadratic Programming with Bound Constraints, Table 1
Active set algorithm for QPwBC

  Initialization:
    take a first point x^(0) ∈ Ω;
    W^(0) = A(x^(0)); k = 0;
  REPEAT
    Solve the quadratic unconstrained problem:
      P(x^(k), W^(k)) = min_v f(x^(k) + v),  v_i = 0 for all i ∈ W^(k);
    IF P(x^(k), W^(k)) is unbounded below THEN
      ACT1) choose x^(k+1) ∈ Ω:
              A(x^(k+1)) ⊃ A(x^(k)) and f(x^(k+1)) ≤ f(x^(k));
            choose W^(k+1) ⊃ W^(k);
    ELSE
      (let v^(k) be the solution of P(x^(k), W^(k)))
      ACT2) α^(k) = max{α ∈ [0, 1] : x^(k) + α v^(k) ∈ Ω};
            x^(k+1) = x^(k) + α^(k) v^(k);
            choose W^(k+1) such that A(x^(k+1)) ⊇ W^(k+1) ≠ W^(k);
    ENDIF
    k = k + 1;
  UNTIL (stop condition holds)


regard to the definition of a new working set, in ACT2) a projected gradient step can be taken in order to add more than one new variable to the new working set [18]. Arguments of a combinatorial nature show that, under nondegeneracy assumptions, an active set strategy terminates in a finite number of steps at a stationary point, provided that the exact minimization of the subproblems is performed (at least once every $j$ steps, for some prefixed $j$). In case of degeneracy, finite termination still holds for some active set algorithms. Specialized versions of active set strategies have been successfully proposed for solving large sparse problems [4,5,19].
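The projected gradient step mentioned above is easy to state for a box: take a gradient step, then clip each coordinate to $[l_i, u_i]$. As a minimal sketch (this is plain projected gradient with a fixed step $1/\|Q\|_2$, our own choice, and not the active set framework of Table 1), the following NumPy code iterates that step until it stalls; for nonconvex $Q$ it can only be expected to reach a stationary point. The name `projected_gradient` is hypothetical.

```python
import numpy as np

def projected_gradient(Q, c, l, u, x0, iters=5000, tol=1e-10):
    """Projected-gradient sketch for min 0.5 x^T Q x + c^T x on l <= x <= u,
    with fixed step 1/||Q||_2 (the Lipschitz constant of the gradient)."""
    Q, c, l, u = (np.asarray(v, dtype=float) for v in (Q, c, l, u))
    x = np.clip(np.asarray(x0, dtype=float), l, u)
    step = 1.0 / max(np.linalg.norm(Q, 2), 1e-12)
    for _ in range(iters):
        x_new = np.clip(x - step * (Q @ x + c), l, u)   # gradient step, then project
        if np.linalg.norm(x_new - x) <= tol:            # no further progress
            return x_new
        x = x_new
    return x

print(projected_gradient([[2, 0], [0, 2]], [-4, 1], [0, 0], [1, 1], [0.5, 0.5]))  # [1. 0.]
```

Because the projection onto a box is coordinate-wise clipping, one step can fix many variables at their bounds at once, which is what makes such steps attractive for updating the working set.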

A completely different approach underlies the algorithms that belong to the family of interior point methods (cf. also » Linear programming: Interior point methods); after Karmarkar's polynomial algorithm for linear programming, many interior point algorithms have been developed for the convex linear complementarity problem (and therefore for the convex QPwBC). They include the primal-dual potential reduction algorithm and the path following algorithms [34]. For more detail see » Linear complementarity problem. Finally, penalty techniques have been successfully proposed for the convex QPwBC [6].


Algorithms for Global Minimization 


The global optimality conditions expressed in Proposition 6 suggest a very simple algorithmic framework for solving (1), whose main ingredient is the procedure COPOS($Q$, $\Gamma$, $d$). Such a procedure [2], given an $n \times n$ matrix $Q$ and a polyhedral cone $\Gamma$, detects either the $\Gamma$-copositivity of $Q$ or a direction $d \in \Gamma$ such that $d^T Q d < 0$. In the sequel all $Q_i^{\pm}$ matrices and the cones $\Gamma_i^{\pm}$ are relative to the stationary point $\bar x$.

In the algorithm in Table 2, COPOS is used to escape from local solutions which are not global.

In [3] the basic escape algorithm has been improved using pseudoconvexity and a preprocessing procedure. However, for complexity reasons (the problem of exactly checking copositivity is itself NP-hard), algorithms based on copositivity are suitable only for very small problems.

A different approach [23], originally proposed for 
concave quadratic problems [30], uses a separable for- 
mulation based on the eigenstructure of the quadratic 
form. Using the linear variable transformation x = Py, 
where P is an orthogonal matrix whose columns are the 
eigenvectors of Q, the original problem is transformed 
into the separable form 


$$\min_{y \in M}\ \Phi_1(y) + \Phi_2(y),$$


Quadratic Programming with Bound Constraints, Table 2
Global QPwBC algorithm

  Initialization:
    take a first stationary point x̄;
    i = 1;
  REPEAT
    IF Γ_i^+ ≠ {0} THEN call COPOS(Q_i^+, Γ_i^+, d);
    IF Γ_i^- ≠ {0} THEN call COPOS(Q_i^-, Γ_i^-, d);
    IF a direction d is found such that d^T Q_i^+ d < 0
       or d^T Q_i^- d < 0
    THEN
      x* = x̄ + λ(x̄, d) d;
      use x* as starting point for a procedure that
      generates a new stationary point x̄;
      i = 0;
    ENDIF
    i = i + 1;
  UNTIL (i = n + 1).


Quadratic Programming with Bound Constraints 




where $M$ is a rectangle of minimum volume that contains $\Omega' = \{y \in \mathbb{R}^n : l \le Py \le u\}$. The functions $\Phi_1(y)$ and $\Phi_2(y)$ are, respectively, the concave part and the convex part of the objective function.
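The separable form can be reproduced with a symmetric eigendecomposition. In the NumPy sketch below (random data and the name `separable_split` are our own), $Q = P\Lambda P^T$, $x = Py$, and $f$ splits into a sum over coordinates of $y$. Coordinates with negative eigenvalues form the concave part $\Phi_1$; all others, including the linear terms of zero eigenvalues, form the convex part $\Phi_2$. The box reformulation over $M$ is not built here.

```python
import numpy as np

def separable_split(Q, c):
    """Split f(x) = 0.5 x^T Q x + c^T x, in coordinates x = P y, into Phi1 + Phi2."""
    lam, P = np.linalg.eigh(Q)        # Q = P diag(lam) P^T with P orthogonal
    d = P.T @ c                       # linear coefficients in y-coordinates
    neg = lam < 0                     # coordinates carrying negative eigenvalues
    def phi1(y):                      # concave part
        return np.sum(0.5 * lam[neg] * y[neg] ** 2 + d[neg] * y[neg])
    def phi2(y):                      # convex part
        return np.sum(0.5 * lam[~neg] * y[~neg] ** 2 + d[~neg] * y[~neg])
    return P, phi1, phi2

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
Q = 0.5 * (A + A.T)                   # symmetric, in general indefinite
c = rng.standard_normal(4)
x = rng.standard_normal(4)
P, phi1, phi2 = separable_split(Q, c)
y = P.T @ x                           # inverse transformation, since P is orthogonal
print(np.isclose(0.5 * x @ Q @ x + c @ x, phi1(y) + phi2(y)))   # True
```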

The concave part $\Phi_1(y)$ can be underestimated by using a piecewise linear approximation, and this gives a convex problem which approximates the original problem and for which an error bound can be given, depending
both on the size of the negative eigenvalues of Q and on 
the size of the range of allowed displacements along the 
respective eigenvectors. This technique can be incorpo- 
rated within a branch and bound framework. A way 
to improve the approximation is to make a partition- 
ing of the domain along the eigendirections, based on 
the error estimate, and bounding techniques can be de- 
vised. An efficient parallel implementation is described 
in [28]. 
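To illustrate why the error depends on the negative eigenvalues and on the range of each variable, consider a single concave term φ(t) = ½λt² with λ < 0 on an interval [a, b]. This is a simplified one-dimensional instance chosen here, not the exact construction of [23]. The piecewise-linear interpolant on k equal pieces underestimates φ, with maximum gap |λ|h²/8 where h = (b − a)/k, so refining the partition shrinks the gap quadratically. The function names are hypothetical.

```python
import numpy as np

def pl_underestimator(lam, a, b, k):
    """Piecewise-linear interpolant of phi(t) = 0.5*lam*t^2 on k equal pieces of [a, b].

    For lam < 0, phi is concave, so every chord lies below phi and the
    interpolant is an underestimator.
    """
    knots = np.linspace(a, b, k + 1)
    vals = 0.5 * lam * knots ** 2
    return lambda t: np.interp(t, knots, vals)

def gap_bound(lam, a, b, k):
    h = (b - a) / k
    return abs(lam) * h ** 2 / 8.0      # attained at the midpoint of each piece

lam, a, b = -2.0, 0.0, 4.0
t = np.linspace(a, b, 4001)
for k in (1, 2, 4):
    under = pl_underestimator(lam, a, b, k)
    gap = 0.5 * lam * t ** 2 - under(t)     # phi - interpolant, nonnegative
    print(k, gap.min() >= -1e-12, gap.max(), gap_bound(lam, a, b, k))
```

On each piece the gap equals (|λ|/2)(t − tⱼ)(tⱼ₊₁ − t), which is why the bound is attained exactly at the midpoints.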

The reformulation-linearization/convexification techniques (RLT) [31] are based on a suitable linearized reformulation of problem (1). The goal of RLT is to approximate the convex envelope of the objective function over the feasible region, deriving increasingly tight lower-bounding linear programs.

Exploiting the combinatorial nature of the problem, some enumerative branch and bound techniques have been proposed [13]; they can be very expensive computationally and are therefore suitable only for small problems, or for problems whose sparsity limits the number of subproblems to be explored.

Computationally more attractive are algorithms based on interior point methods, whose main drawback is that there is no guarantee of convergence to the global solution of problem (1) [11].
Quadratic programming (QP) plays an important role in optimization theory. In one sense it is a continuous optimization problem and a fundamental subroutine for general nonlinear programming, but it is also considered one of the most challenging combinatorial optimization problems.

One QP problem is to minimize a quadratic function over an ellipsoid. Since any ellipsoid can be transformed to a ball by an affine transformation, we consider, without loss of generality, the following ball-constrained QP problem BQP(r):

$$\min\; q(x) := \tfrac{1}{2}x^T Q x + c^T x \quad \text{s.t.} \quad x \in B(r) = \{x \in \mathbb{R}^n : \|x\| \le r\}, \qquad (1)$$

where $Q \in \mathbb{R}^{n \times n}$, $c \in \mathbb{R}^n$, and the superscript T denotes transposition. Here, $\|\cdot\|$ denotes the $L_2$ norm and $r > 0$ is the radius of the ball. A main recent result is that this problem is 'easy', even when the objective function is nonconvex.

We begin with a brief history of this problem. There is a class of nonlinear programming algorithms called model trust region methods. In these algorithms, a quadratic function is used as an approximate model of the true objective function around the current iterate. The main step is then to minimize the model function. In general, however, the model is expected to be accurate, or trusted, only in a neighborhood of the current iterate. Accordingly, the quadratic model is minimized in an $L_2$-norm neighborhood, which is a ball, around the current iterate. Recently (1996), it was demonstrated [5] that a class of combinatorial optimization problems can be solved by solving a sequence of ball-constrained QP problems.

The model-trust region method is due to K. Leven- 
berg [7] and D.W. Marquardt [8]. These authors con- 
sidered only the case where Q is positive definite. J.J. 
Moré [10] proposed an algorithm with a convergence 
proof for this case. D.M. Gay [4] and D.C. Sorensen
[15] proposed algorithms for the general case, see also 
[2]. These algorithms work very well in practice, but 
no theoretical complexity result was established for this 
problem then. 

It is well known [4,15] that the solution x of problem BQP(r) satisfies the following necessary and sufficient conditions:

$$(Q + \mu I)x = -c, \qquad \mu \ge \max\{0, -\lambda\}, \qquad \|x\| = r, \qquad (2)$$

where $\lambda$ denotes the least eigenvalue of the matrix Q. Since Q is assumed not to be positive semidefinite, we have $\lambda < 0$.


Let $\mu^*$ and $x^*$ satisfy conditions (2). It has been shown that $\mu^*$ is unique and

$$|\lambda| \le \mu^* \le |\lambda| + \frac{\|c\|}{r}. \qquad (3)$$
It is also known that

$$|\lambda| \le n \max_{i,j}\{|q_{ij}|\},$$

where $q_{ij}$ is the $(i, j)$th component of the matrix Q. Thus, we have

$$0 \le \mu^* \le \bar{\mu} := n \max_{i,j}\{|q_{ij}|\} + \frac{\|c\|}{r}, \qquad (4)$$


where $\bar{\mu}$ is a computable upper bound. It is further proved ([19]) that

$$\frac{r^2}{2}|\lambda| \le \frac{r^2}{2}\mu^* \le q(0) - q(x^*) \le \frac{r^2}{2}|\lambda| + r\|c\|. \qquad (5)$$
This inequality can be used to develop an approxima- 
tion algorithm for general quadratic optimization, see 
[3]. 

We now analyze the complexity of solving BQP(r). A simple bisection method was proposed in [18] and in [19]. For any given $\mu$, denote by $x_\mu$ the solution of the linear system in conditions (2), i.e.,

$$x_\mu = -(Q + \mu I)^{-1}c, \qquad \forall \mu > |\lambda|. \qquad (6)$$

For any given $\mu$ we can check whether $\mu > |\lambda|$ by checking the positive definiteness of the matrix $Q + \mu I$, which can be done with an $LDL^T$ decomposition. These facts lead to a bisection method that searches for the root of $\|x_\mu\| = r$ over the interval $\mu \in [|\lambda|, \bar{\mu}] \subset [0, \bar{\mu}]$. Clearly, for a given $\epsilon' \in (0, 1)$, a $\mu$ such that, say, $0 \le \mu - \mu^* < \epsilon'\mu^*/8$ can be obtained in $O(\log(\bar{\mu}/\mu^*) + \log(1/\epsilon'))$ bisection steps, and the cost of each step is $O(n^3)$ arithmetic operations (for performing the $LDL^T$ decomposition).
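A minimal sketch of the bisection search in Python/NumPy follows. Two substitutions are assumptions of this sketch, not part of the method as stated. First, a Cholesky factorization (`numpy.linalg.cholesky`, which raises `LinAlgError` when the matrix is not positive definite) stands in for the LDLᵀ test. Second, the loop stops on a relative interval width `tol` instead of the ε′μ*/8 criterion, which would need the unknown μ*. The sketch also ignores the refinement needed in Case II below. The function names are hypothetical.

```python
import numpy as np

def solve_shifted(Q, c, mu):
    """Return x_mu = -(Q + mu I)^{-1} c, or None if Q + mu I is not positive definite."""
    n = Q.shape[0]
    try:
        L = np.linalg.cholesky(Q + mu * np.eye(n))   # fails iff not positive definite
    except np.linalg.LinAlgError:
        return None
    z = np.linalg.solve(L, -c)                       # forward solve  L z = -c
    return np.linalg.solve(L.T, z)                   # backward solve L^T x = z

def bisect_bqp(Q, c, r, tol=1e-12, max_iter=500):
    """Bisection on mu for ||x_mu|| = r over [0, mu_bar], with mu_bar from (4)."""
    n = Q.shape[0]
    mu_bar = n * np.max(np.abs(Q)) + np.linalg.norm(c) / r
    lo, hi = 0.0, mu_bar
    x_hi = solve_shifted(Q, c, hi)
    for _ in range(max_iter):
        if hi - lo <= tol * max(1.0, hi):
            break
        mu = 0.5 * (lo + hi)
        x = solve_shifted(Q, c, mu)
        if x is None or np.linalg.norm(x) > r:
            lo = mu            # mu <= |lambda| or ||x_mu|| too large: mu* lies to the right
        else:
            hi, x_hi = mu, x   # x_mu is feasible: mu* lies to the left
    return hi, x_hi

Q = np.diag([-1.0, 2.0])
c = np.array([1.0, 1.0])
mu, x = bisect_bqp(Q, c, r=1.0)
print(mu, np.linalg.norm(x))   # mu > 1 = |lambda| and ||x|| close to 1
```

Keeping the right endpoint `hi` mirrors the analysis that follows, which works with the right endpoint of the bisection interval, so that ‖x_μ‖ ≤ r always holds for the returned point.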

The remaining question is what $\epsilon'$ would be sufficient to generate an $\epsilon$-minimizer of $q(x)$ over the ball $B(r)$, that is, an $x \in B(r)$ satisfying

$$\frac{q(x) - q(x^*)}{q(0) - q(x^*)} \le \epsilon.$$

Let $\mu$ denote the right endpoint of the interval generated by the bisection search. Then $\mu \ge \mu^*$. If $\mu = \mu^*$, then we get an exact solution $x^* = x_{\mu^*}$. Thus, we assume $\mu > \mu^* \ge |\lambda|$. By the positive semidefiniteness of $Q + \mu^* I$, we have

$$\|x_\mu\| < \|x^*\| = r.$$


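We consider two cases.

Case I. In the first case we assume

$$\left(1 - \frac{\sqrt{\epsilon'}}{8\sqrt{n}}\right)\mu^* \ge |\lambda|, \quad \text{or} \quad \mu^* \ge |\lambda| + \frac{\sqrt{\epsilon'}}{8\sqrt{n}}\,\mu^*.$$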

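Using the relation (6) and simplifying, we obtain

$$\begin{aligned}
\|x^*\|^2 - \|x_\mu\|^2 &= (x^*)^T\left(I - (Q + \mu^* I)(Q + \mu I)^{-2}(Q + \mu^* I)\right)x^* \\
&= (x^*)^T\left(2(\mu - \mu^*)(Q + \mu I)^{-1} - (\mu - \mu^*)^2(Q + \mu I)^{-2}\right)x^*.
\end{aligned}$$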
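Next we bound the above expression by using the smallest eigenvalue $\lambda$ of $Q$. Every eigenvalue of $Q + \mu I$ is at least $\mu - |\lambda| = (\mu - \mu^*) + (\mu^* - |\lambda|)$, and the function $t \mapsto 2(\mu - \mu^*)/t - (\mu - \mu^*)^2/t^2$ is nonincreasing for $t \ge \mu - \mu^*$. This gives

$$\begin{aligned}
\|x^*\|^2 - \|x_\mu\|^2 &\le \left(\frac{2(\mu - \mu^*)}{(\mu - \mu^*) + (\mu^* - |\lambda|)} - \frac{(\mu - \mu^*)^2}{\left((\mu - \mu^*) + (\mu^* - |\lambda|)\right)^2}\right)\|x^*\|^2 \\
&= \frac{(\mu - \mu^*)^2 + 2(\mu - \mu^*)(\mu^* - |\lambda|)}{\left((\mu - \mu^*) + (\mu^* - |\lambda|)\right)^2}\, r^2 \\
&= \left(1 - \frac{(\mu^* - |\lambda|)^2}{\left((\mu - \mu^*) + (\mu^* - |\lambda|)\right)^2}\right) r^2 \\
&\le \left(1 - \frac{1}{\left(1 + \dfrac{8\sqrt{n}\,(\mu - \mu^*)}{\sqrt{\epsilon'}\,\mu^*}\right)^2}\right) r^2,
\end{aligned}$$

where in the last step we used the assumption $\mu^* - |\lambda| \ge \sqrt{\epsilon'}\,\mu^*/(8\sqrt{n})$. Therefore, if we have $\mu - \mu^* < \epsilon'\mu^*/8$, then

$$\|x^*\|^2 - \|x_\mu\|^2 \le \left(1 - \frac{1}{(1 + \sqrt{n\epsilon'})^2}\right) r^2 \le 2\sqrt{n\epsilon'}\, r^2. \qquad (7)$$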


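On the other hand, note that

$$\begin{aligned}
q(x_\mu) - q(x^*) &= \tfrac{1}{2}x_\mu^T Q x_\mu + c^T x_\mu - \tfrac{1}{2}(x^*)^T Q x^* - c^T x^* \\
&= \tfrac{1}{2}(Qx_\mu + c)^T(x_\mu - x^*) + \tfrac{1}{2}(Qx^* + c)^T(x_\mu - x^*) \\
&= -\tfrac{1}{2}\mu\, x_\mu^T(x_\mu - x^*) - \tfrac{1}{2}\mu^*(x^*)^T(x_\mu - x^*) \\
&= -\tfrac{1}{2}(\mu - \mu^*)\, x_\mu^T(x_\mu - x^*) - \tfrac{1}{2}\mu^*\left(\|x_\mu\|^2 - \|x^*\|^2\right).
\end{aligned} \qquad (8)$$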


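Now we use the bound (7), the assumption $\mu - \mu^* < \epsilon'\mu^*/8$ and the fact $\|x_\mu\| < \|x^*\| = r$ (so that $|x_\mu^T(x_\mu - x^*)| \le 2r^2$) in (8) to obtain

$$q(x_\mu) - q(x^*) < \frac{\epsilon'\mu^*}{8}\, r^2 + \sqrt{n\epsilon'}\,\mu^* r^2 = \left(\frac{\epsilon'}{4} + 2\sqrt{n\epsilon'}\right)\frac{\mu^* r^2}{2} \le \left(\frac{\epsilon'}{4} + 2\sqrt{n\epsilon'}\right)\left(q(0) - q(x^*)\right),$$

where the last step is due to (5). Thus, if we select $\epsilon'$ such that $\epsilon'/4 + 2\sqrt{n\epsilon'} \le \epsilon$ (for instance, $\epsilon' = \epsilon^2/(9n)$ when $\epsilon \le 1$), then $x_\mu$ is feasible for BQP(r) and

$$q(x_\mu) - q(x^*) \le \epsilon\left(q(0) - q(x^*)\right),$$

i.e., $x_\mu$ is an $\epsilon$-minimizer of $q(x)$ over $B(r)$.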
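Case II. In this case, we have

$$\left(1 - \frac{\sqrt{\epsilon'}}{8\sqrt{n}}\right)\mu^* < |\lambda|, \quad \text{or} \quad \mu^* < |\lambda| + \frac{\sqrt{\epsilon'}}{8\sqrt{n}}\,\mu^*.$$

Again, if we have $\mu - \mu^* < \epsilon'\mu^*/8$, then $\mu - |\lambda| < \epsilon'\mu^*/8 + \sqrt{\epsilon'}\,\mu^*/(8\sqrt{n})$. However, unlike Case I, we cannot conclude that $\|x_\mu\|$ is sufficiently close to $r$. In this case we perform the following computation, essentially due to S.A. Vavasis and R. Zippel [18], to enhance $x_\mu$.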
Let $q$, $\|q\| = 1$, be an eigenvector associated with the eigenvalue $\lambda$. Then one of the unit vectors $e_j$, $j = 1, \dots, n$, must have $|e_j^T q| \ge 1/\sqrt{n}$. (In fact, we can use any unit vector $\hat{q}$ in place of $e_j$ as long as $|\hat{q}^T q| \ge 1/\sqrt{n}$. A randomly generated $\hat{q}$ will do it with high probability.) Now we solve for $y$ from

$$(Q + \mu I)\,y = e_j$$

and let

$$x = x_\mu + \alpha y,$$

where $\alpha$ is chosen such that $\|x\| = r$. Note that we have

$$(Q + \mu I)\,x = -c + \alpha e_j,$$

and that in the computation of $x_\mu$ and $y$ the matrix $Q + \mu I$ needs to be factorized only once.
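The enhancement step can be sketched in Python/NumPy as follows; one Cholesky factor of Q + μI serves both right-hand sides, mirroring the remark that the matrix needs to be factorized only once. Assumptions of this sketch: Cholesky is used in place of LDLᵀ, the eigenvector q is computed with `numpy.linalg.eigh` purely to pick the index j (the method only needs some unit vector with |e_jᵀq| ≥ 1/√n), the caller supplies a shift μ > |λ| with ‖x_μ‖ < r, and the smaller-magnitude root α is taken. The function name is hypothetical.

```python
import numpy as np

def enhance_case_two(Q, c, r, mu):
    """Return x = x_mu + alpha*y with ||x|| = r, where (Q + mu I) y = e_j.

    Assumes mu > |lambda| (so Q + mu I is positive definite) and ||x_mu|| < r.
    """
    n = Q.shape[0]
    lam, V = np.linalg.eigh(Q)
    q = V[:, 0]                                   # unit eigenvector of the least eigenvalue
    j = int(np.argmax(np.abs(q)))                 # guarantees |e_j^T q| >= 1/sqrt(n)
    e_j = np.zeros(n)
    e_j[j] = 1.0
    L = np.linalg.cholesky(Q + mu * np.eye(n))    # factorize once ...
    rhs = np.column_stack([-c, e_j])
    sol = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # ... and solve for x_mu and y together
    x_mu, y = sol[:, 0], sol[:, 1]
    # ||x_mu + alpha y||^2 = r^2  <=>  (y.y) alpha^2 + 2 (x_mu.y) alpha + (x_mu.x_mu - r^2) = 0
    A, B, C = y @ y, 2.0 * (x_mu @ y), x_mu @ x_mu - r ** 2
    disc = np.sqrt(B * B - 4.0 * A * C)           # C < 0, so the discriminant is positive
    roots = [(-B + disc) / (2.0 * A), (-B - disc) / (2.0 * A)]
    alpha = min(roots, key=abs)
    return x_mu + alpha * y

# Example with c orthogonal to the eigenvector of lambda = -1, so ||x_mu|| stays well below r
Q = np.diag([-1.0, 2.0])
c = np.array([0.0, 1.0])
x = enhance_case_two(Q, c, 1.0, 1.001)
print(np.linalg.norm(x), 0.5 * x @ Q @ x + c @ x)   # norm 1; value close to the optimum -2/3
```

Because ‖x_μ‖ < r, the quadratic in α has roots of opposite sign, and for either root |α|‖y‖ = ‖x − x_μ‖ ≤ 2r, which is the fact behind the bound on |α| used next.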
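It is easy to show that

$$\|y\| \ge \frac{1}{\sqrt{n}\,(\mu - |\lambda|)}$$

(since $q^T y = q^T e_j/(\mu - |\lambda|)$), and, because $|\alpha|\,\|y\| = \|x - x_\mu\| \le 2r$,

$$|\alpha| \le 2r(\mu - |\lambda|)\sqrt{n} < 2r\left(\frac{\epsilon'\mu^*}{8} + \frac{\sqrt{\epsilon'}\,\mu^*}{8\sqrt{n}}\right)\sqrt{n}.$$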
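Then, using $Qx + c = -\mu x + \alpha e_j$ and proceeding as in (8), we have

$$\begin{aligned}
q(x) - q(x^*) &= \tfrac{1}{2}(Qx + c)^T(x - x^*) + \tfrac{1}{2}(Qx^* + c)^T(x - x^*) \\
&= \tfrac{1}{2}(-\mu x + \alpha e_j)^T(x - x^*) - \tfrac{1}{2}\mu^*(x^*)^T(x - x^*) \\
&= -\tfrac{1}{2}(\mu x + \mu^* x^*)^T(x - x^*) + \tfrac{1}{2}\alpha e_j^T(x - x^*) \\
&= -\tfrac{1}{2}(\mu - \mu^*)\,x^T(x - x^*) + \tfrac{1}{2}\alpha e_j^T(x - x^*),
\end{aligned}$$

where the last step follows from $\|x\| = \|x^*\| = r$. Now we use $\mu - \mu^* < \epsilon'\mu^*/8$ and the preceding upper bound on $|\alpha|$ to estimate the right-hand side: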


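$$\begin{aligned}
q(x) - q(x^*) &< \frac{r^2\epsilon'\mu^*}{8} + 2r^2\left(\frac{\epsilon'\mu^*}{8} + \frac{\sqrt{\epsilon'}\,\mu^*}{8\sqrt{n}}\right)\sqrt{n} \\
&= \left(\frac{\epsilon'}{4} + \frac{\sqrt{n}\,\epsilon'}{2} + \frac{\sqrt{\epsilon'}}{2}\right)\frac{\mu^* r^2}{2} \\
&\le \left(\frac{\epsilon'}{4} + \frac{\sqrt{n}\,\epsilon'}{2} + \frac{\sqrt{\epsilon'}}{2}\right)\left(q(0) - q(x^*)\right),
\end{aligned}$$

where the last step is due to (5). Thus, if we choose $\epsilon'$ such that $\epsilon'/4 + \sqrt{n}\,\epsilon'/2 + \sqrt{\epsilon'}/2 \le \epsilon$ (again, $\epsilon' = \epsilon^2/(9n)$ suffices when $\epsilon \le 1$), then $x$ is feasible for BQP(r) and

$$q(x) - q(x^*) \le \epsilon\left(q(0) - q(x^*)\right),$$

i.e., $x$ is an $\epsilon$-minimizer of $q(x)$ over $B(r)$.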
Hence, the bisection method will terminate with an $\epsilon$-minimizer of BQP(r) in at most

$$O\left(\log\frac{\bar{\mu}}{\mu^*} + \log\frac{1}{\epsilon} + \log n\right)$$

steps, or in a total of $O\left(n^3\left(\log(\bar{\mu}/\mu^*) + \log(1/\epsilon) + \log n\right)\right)$ arithmetic operations.


Theorem 1. The total running time of the bisection algorithm for generating an $\epsilon$-minimal solution to the ball-constrained QP is bounded by $O\left(n^3\left(\log(\bar{\mu}/\mu^*) + \log(1/\epsilon) + \log n\right)\right)$ arithmetic operations.


Recently, F. Rendl and H. Wolkowicz [14] showed that BQP(r) can be reformulated as a semidefinite programming problem, which is a convex nonlinear problem. There
are polynomial interior point algorithms (see [11]) to 
compute an d,’ such that q/(d,') — q(d,/(u*)) < €’ in 
O(n? log(M*/e’ )) arithmetic operations. This will also 
establish an 


Olt oy is eee 
og — og n = ogn 
~ 08 — +n! logn] | log — + log 


arithmetic operation bound for the algorithm. 

The polynomial complexity in Theorem 1 can be further improved. In particular, see [20] for a mixed bisection and Newton method for solving BQP(r) and for an arithmetic operation bound $O(n^3 \log(\log(\bar{\mu}/\mu^*) + \log(1/\epsilon')))$ to yield a $\mu$ such that $0 \le \mu - \mu^* < \epsilon'$. Briefly, the idea of the method is to first find an approximation $\hat{\mu}$ to the absolute value $|\lambda|$ of the least eigenvalue and an approximation $\hat{q}$ to the true eigenvector $q$, such that $0 \le \hat{\mu} - |\lambda| < \epsilon'$ and $\hat{q}^T q \ge 1 - \epsilon'$. This approximation can be done in $O(n^3 \log(\log(1/\epsilon')))$ arithmetic operations. Then, if $\|x_{\hat{\mu}}\| < r$ (i.e., Case II), we use $\hat{q}$ in place of $e_j$ as in Case II to enhance $x_{\hat{\mu}}$ and generate a desired approximation. Otherwise, we know that $\mu^* \ge \hat{\mu}$ and, using the mixed method in [20], we generate a $\mu \in (\hat{\mu}, \bar{\mu})$ such that $|\mu - \mu^*| < \epsilon'\mu^*/8$ in $O(n^3 \log(\log(\bar{\mu}/\mu^*) + \log(1/\epsilon')))$ arithmetic operations.

Finally, let Q and c have integer data. Consider the decision problem: Is there an $x \in \mathbb{R}^n$ satisfying $\|x\| \le 1$ and $q(x) < 0$? Under the Turing machine computational model, this problem can be answered in polynomial time (see [18]).
Suppose that we have n ‘objects’ and m ‘locations’, n > m, and we want to assign all objects to locations, with at least one object at each location, so as to minimize the overall distance covered by the flow of materials moving between different objects. Given a flow matrix $F = (f_{ij})$ and a distance matrix $D = (d_{kl})$, we can formulate the quadratic semi-assignment problem (QSAP) as follows:


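$$\begin{aligned}
\min\quad & \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\sum_{k=1}^{m}\sum_{l=1}^{m} f_{ij}\, d_{kl}\, x_{ik}\, x_{jl} + \sum_{i=1}^{n}\sum_{j=1}^{m} b_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{m} x_{ij} = 1, \qquad i = 1, \dots, n, \\
& x_{ij} \in \{0, 1\}, \qquad i = 1, \dots, n,\; j = 1, \dots, m.
\end{aligned}$$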


Comparing the above formulation with that of the quadratic assignment problem (cf. ▶ Quadratic assignment problem), we can see that the QSAP is a relaxed version of the QAP in which the assignment constraints are replaced by semi-assignment constraints. The QSAP unifies some interesting combinatorial optimization problems such as clustering and m-coloring. In a clustering problem we are given n objects and a dissimilarity matrix $F = (f_{ij})$. The goal is to find a partition of these objects into m classes so as to minimize the sum of dissimilarities of objects belonging to the same class. Clearly, this problem is a QSAP with coefficient matrices F and D, where D is the m × m identity matrix. In the m-coloring problem we are given a graph with n vertices and want to check whether its vertices can be colored with m different colors such that any two vertices joined by an edge receive different colors. This problem can be modeled as a QSAP with F equal to the adjacency matrix of the given graph and D the m × m identity matrix. The m-coloring problem has the answer ‘yes’ if and only if the above QSAP has optimal value equal to 0. Practical applications of the QSAP include distributed computing [5] and scheduling [1].
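As a small sanity check of the m-coloring reduction, the following brute-force sketch evaluates the QSAP objective over all mⁿ semi-assignments. It assumes the linear term is zero (b = 0), counts each unordered pair of objects once, and imposes only the semi-assignment constraints (a color need not be used). Enumeration is exponential and only meant for tiny instances; the function names are hypothetical.

```python
import itertools
import numpy as np

def qsap_value(F, D, assign):
    """Quadratic cost sum_{i<j} f_ij * d_{k(i) k(j)} for an assignment object -> location."""
    n = len(assign)
    return sum(F[i, j] * D[assign[i], assign[j]]
               for i in range(n) for j in range(i + 1, n))

def qsap_brute_force(F, D):
    n, m = F.shape[0], D.shape[0]
    # each object goes to exactly one location (semi-assignment constraints)
    return min(qsap_value(F, D, a) for a in itertools.product(range(m), repeat=n))

def is_m_colorable(adj, m):
    # F = adjacency matrix, D = m x m identity: optimum 0 iff no edge is monochromatic
    return qsap_brute_force(adj, np.eye(m)) == 0

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(is_m_colorable(triangle, 2), is_m_colorable(triangle, 3))  # False True
```

With D the identity, the objective simply counts edges whose endpoints share a location (color), which is why an optimal value of 0 certifies a proper m-coloring.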

The QSAP was originally introduced by D.E. Greenberg [2]. As pointed out in [3], this problem is NP-hard. I.Z. Milis and V.F. Magirou [5] propose a Lagrangian relaxation algorithm for this problem and show that, as for the QAP, it is very hard to find optimal solutions even for QSAPs of small size. Lower bounds for the QSAP have been provided in [4], and polynomially solvable special cases have been discussed in [3].


See also 
Smoothness, i.e., the existence of classical derivative information for a function, plays a significant role in the theory and in the tools used today for modeling, approximation, optimization and their applications. Nevertheless, nature seems to be richer than the assumptions made within current mathematical or physical theories. Nonsmoothness arises in a very large number of applications. The resulting phenomena, including complex dynamics, pattern formation and chaos, are appealing for both theoretical investigations and practical applications. Most of them have not yet been studied. Abandoning smoothness assumptions, one arrives at the area of nonsmooth analysis.

Within nonsmooth approximation, the classical notion of the derivative is replaced by some set-valued generalized derivative. This is required for the construction of qualitative and quantitative first order approximations of a function with points of nondifferentiability (kinks), or of a set with corners. In fact, the linearization (i.e., the linear or affine approximation) of a function at a given point, which relies on the familiar Taylor expansion formula, assumes that the derivative of the function (or its gradient) exists at the point considered.

Historically, for convex nondifferentiable functions, a suitable set-valued extension of the derivative has been provided by the subdifferential of convex analysis, in the sense of J.-J. Moreau and R.T. Rockafellar [8,12]. For the general case of nonconvex, nondifferentiable functions, a direct extension of the convex analysis subdifferential has been provided by the generalized subdifferential in the sense of F.H. Clarke and Rockafellar [1,2,13]. This notion has been used in a variety of applications, although it does not possess the above mentioned first order approximation property. A large number of other notions have been proposed for the approximation of nonconvex and nonsmooth functions (or sets) and for the solution of the associated optimization problems; a complete list would go beyond the limits of this short article. This activity demonstrates the large practical interest of this area.

The quasidifferential in the sense of V.F. Demyanov and A.M. Rubinov is an appropriate tool for the construction of first order approximations of functions and sets and, subsequently, for the solution of nonsmooth and nonconvex optimization problems. By treating the convex and concave contributions of the function separately, the quasidifferential introduces an ordered pair of convex sets. Intuitively speaking, a convex analysis subdifferential accounts for the convex contribution, while the superdifferential takes into account the concave parts (which, in turn, can also be studied by means of convex analysis arguments, since a concave function becomes convex if one changes its sign). The links of the quasidifferential with other notions of nonsmooth analysis have been discussed (see ▶ Quasidifferentiable optimization: Dini derivatives, Clarke derivatives and [4]). More important is that calculus rules have been developed for computing the quasidifferential of sums, differences, products, quotients and, more generally, of every function that can be constructed by applying the minimum and maximum operators a finite number of times to a finite number of classical, smooth constituent functions (see ▶ Quasidifferentiable optimization: Calculus of quasidifferentials and [7]). Finally, based on the notion of the quasidifferential, new variational formulations can be constructed which generalize the variational inequalities of convex analysis. These variational formulations have the form of sets of variational inequalities, are valid for the general nonsmooth and nonconvex case (see also ▶ Quasidifferentiable optimization: Variational formulations and [6,11]) and give a computationally advantageous form to the hemivariational inequalities in the sense of P.D. Panagiotopoulos (see, among others, ▶ Nonconvex energy functions: Hemivariational inequalities; ▶ Hemivariational inequalities: Applications in mechanics and [6,9,10,11]).

Here, the definition of the quasidifferential for one-dimensional and finite-dimensional functions is given, and hints for its extension to functionals are discussed. Finally, some information is provided on the related notion of the codifferential, which is more convenient for numerical applications, and on the construction of optimization algorithms.
One-Dimensional Nonsmooth Functions


Let f be a real-valued finite function defined on the real line $\mathbb{R}$. The most powerful and widely used tool for studying the properties of f is the notion of derivative. The function f is called differentiable at $x \in \mathbb{R}$ if its derivative $f'(x)$ at x exists, which is defined by

$$f'(x) = \lim_{\alpha \to 0} \frac{1}{\alpha}\left[f(x + \alpha) - f(x)\right]. \qquad (1)$$

If this limit exists at every point of some open set $S \subset \mathbb{R}$, the function f is called differentiable on S.

Among the variety of applications of the derivative, one recalls here the first order approximation (linearization) of f in the neighborhood of a point x:

$$f(x + \Delta) = f(x) + f'(x)\Delta + o_x(\Delta) \qquad (2)$$

with

$$\frac{o_x(\Delta)}{\Delta} \to 0 \quad \text{as } \Delta \to 0. \qquad (3)$$


Moreover, if $x^*$ is a minimum of a differentiable function f, then

$$f'(x^*) = 0. \qquad (4)$$

Relation (4) only defines a stationary point of f, since it also holds at a maximum and at a saddle point of f. As usual, higher order derivatives are checked in order to determine the nature of the stationary point.


One-Sided Differentials 


Assume now that the limit (1) does not exist, but that the following one-sided derivatives exist: the right-hand side derivative $f'_+(x)$ and the left-hand side derivative $f'_-(x)$ of f at x. The right-hand side derivative is defined by

$$f'_+(x) = \lim_{\alpha \downarrow 0} \frac{1}{\alpha}\left[f(x + \alpha) - f(x)\right]. \qquad (5)$$

Analogously, the left-hand side derivative is defined by the limit

$$f'_-(x) = \lim_{\alpha \uparrow 0} \frac{1}{\alpha}\left[f(x + \alpha) - f(x)\right]. \qquad (6)$$

Here $\alpha \downarrow 0$ means that $\alpha \to 0$ taking positive values $\alpha > 0$, and $\alpha \uparrow 0$ means that $\alpha \to 0$ taking negative values $\alpha < 0$.

It is clear that, when both one-sided derivatives exist, f is differentiable at x if and only if $f'_+(x) = f'_-(x)$.

The directional derivative of a function f at the point x in the direction $g \in \mathbb{R}$ is defined by the limit

$$f'(x, g) = \lim_{\alpha \downarrow 0} \frac{1}{\alpha}\left[f(x + \alpha g) - f(x)\right], \qquad (7)$$

if this limit exists.

The notion of directional derivative is a proper extension of the notion of derivative. For example, it can be used to linearize a given function (cf. (2)) along a direction g. In this case a relation of the form (2) holds along the given direction, with a possibly different slope for the opposite direction, so that the directional derivative provides the basis for a quasilinearization of the function f.

From the definition one may easily see that a necessary condition for a directionally differentiable function f to attain a minimum at the point $x^*$ is that

$$f'(x^*, g) \ge 0, \qquad \forall g \in \mathbb{R}. \qquad (8)$$

If strict inequality holds in (8) for every direction $g \ne 0$, the condition also becomes sufficient for $x^*$ to be a strict local minimum of f. On the other hand, a necessary condition for a directionally differentiable function f to attain a maximum at the point $x^{**}$ is that

$$f'(x^{**}, g) \le 0, \qquad \forall g \in \mathbb{R}, \qquad (9)$$

with analogous implications for a strict local maximum.

A point $x^*$ which satisfies relation (8) is called an inf-stationary point of f, while a point $x^{**}$ satisfying (9) is called a sup-stationary point. It is interesting to observe that for a nonsmooth function, first order optimality conditions may, in some cases, become sufficient for a minimum or a maximum.
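The one-sided and directional derivatives can be approximated by forward differences, which makes the tests (8)-(9) easy to try out. This is only a sketch: the fixed step `alpha` is an assumption, and forward differences merely approximate the limits in (5)-(7). Because f′(x, g) is positively homogeneous in g, on ℝ it suffices to test g = 1 (giving f′₊(x)) and g = −1 (giving −f′₋(x)). The function names are hypothetical.

```python
def dir_derivative(f, x, g, alpha=1e-7):
    """Forward-difference approximation of f'(x, g) = lim_{a -> 0+} [f(x + a g) - f(x)] / a."""
    return (f(x + alpha * g) - f(x)) / alpha

def classify_1d(f, x, tol=1e-6):
    """Check (8) and (9) at x using the two directions g = +1 and g = -1."""
    d_plus, d_minus = dir_derivative(f, x, 1.0), dir_derivative(f, x, -1.0)
    inf_stat = d_plus >= -tol and d_minus >= -tol   # (8): f'(x, g) >= 0 for all g
    sup_stat = d_plus <= tol and d_minus <= tol     # (9): f'(x, g) <= 0 for all g
    return d_plus, d_minus, inf_stat, sup_stat

print(classify_1d(abs, 0.0))                  # (1.0, 1.0, True, False)
print(classify_1d(lambda t: -abs(t), 0.0))    # (-1.0, -1.0, False, True)
```

For f(x) = |x| both directional derivatives at 0 are strictly positive, so (8) holds strictly and 0 is a strict local minimum even though f′(0) does not exist.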


Quasidifferential 


A function $f\colon \mathbb{R} \to \mathbb{R}$ is called quasidifferentiable (q.d.) at a point x if it is directionally differentiable at x and there exists a pair of closed intervals $\underline{\partial} f(x) = [v_1, v_2]$ and $\overline{\partial} f(x) = [w_1, w_2]$ such that

$$f'(x, g) = \max_{v \in \underline{\partial} f(x)} vg + \min_{w \in \overline{\partial} f(x)} wg, \qquad \forall g \in \mathbb{R}. \qquad (10)$$

The pair of intervals $\mathcal{D}f(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$ is called a quasidifferential of f at x. The set $\underline{\partial} f(x)$ is called the subdifferential and the set $\overline{\partial} f(x)$ the superdifferential of f at x.

A quasidifferential is not uniquely defined. Indeed, if a function f is quasidifferentiable at x and $\mathcal{D}f(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$ is its quasidifferential at this point, then every pair of the form $[\underline{\partial} f(x) + C, \overline{\partial} f(x) - C]$, where $C = [c_1, c_2] \subset \mathbb{R}$ with $c_1 \le c_2$, is also a quasidifferential of f at x. In fact, the quasidifferential is a class of equivalent ordered pairs of convex sets.


Necessary and Sufficient Optimality Conditions 


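For a quasidifferentiable function the necessary and sufficient optimality conditions (see (8)-(9)) can be written as follows. Let $\mathcal{D}f(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$ be a quasidifferential of f at x. A necessary condition for the function f to attain a minimum at the point $x^*$ is that

$$-\overline{\partial} f(x^*) \subset \underline{\partial} f(x^*). \qquad (11)$$

The condition

$$-\overline{\partial} f(x^*) \subset \operatorname{int} \underline{\partial} f(x^*) \qquad (12)$$

is sufficient for $x^*$ to be a strict local minimum of f. Analogously, a necessary condition for a maximum of f at $x^{**}$ is that

$$-\underline{\partial} f(x^{**}) \subset \overline{\partial} f(x^{**}), \qquad (13)$$

with an analogous result for a sufficient condition for a strict local maximum:

$$-\underline{\partial} f(x^{**}) \subset \operatorname{int} \overline{\partial} f(x^{**}). \qquad (14)$$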
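For intervals, conditions (10)-(14) reduce to comparisons of endpoints. The sketch below (function names hypothetical) represents the subdifferential [v₁, v₂] and the superdifferential [w₁, w₂] as pairs and checks them for f(x) = −|x| at x = 0, using two equivalent quasidifferentials: [{0}, [−1, 1]], and the pair obtained by adding C = [−1, 1] to the first set and subtracting it from the second.

```python
def qd_dir_derivative(sub, sup, g):
    """(10): f'(x, g) = max_{v in [v1, v2]} v*g + min_{w in [w1, w2]} w*g."""
    (v1, v2), (w1, w2) = sub, sup
    return max(v1 * g, v2 * g) + min(w1 * g, w2 * g)

def neg(interval):
    a, b = interval
    return (-b, -a)

def subset(inner, outer, strict=False):
    """inner in outer, or inner in the interior of outer when strict=True."""
    (a, b), (c, d) = inner, outer
    return (c < a and b < d) if strict else (c <= a and b <= d)

def check_optimality(sub, sup):
    return {
        "min_necessary (11)": subset(neg(sup), sub),
        "strict_min_sufficient (12)": subset(neg(sup), sub, strict=True),
        "max_necessary (13)": subset(neg(sub), sup),
        "strict_max_sufficient (14)": subset(neg(sub), sup, strict=True),
    }

# f(x) = -|x| at x = 0: two equivalent quasidifferentials
pair_a = ((0.0, 0.0), (-1.0, 1.0))
pair_b = ((-1.0, 1.0), (-2.0, 2.0))   # [sub + C, sup - C] with C = [-1, 1]
for sub, sup in (pair_a, pair_b):
    print([qd_dir_derivative(sub, sup, g) for g in (-2.0, 1.0)], check_optimality(sub, sup))
```

Both pairs reproduce f′(0, g) = −|g| and lead to the same conclusion (a strict local maximum), illustrating that the optimality tests do not depend on the chosen representative of the equivalence class.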


Finite-Dimensional Nonsmooth Functions 
Subdifferentiable Functions 


Let a function f defined on an open set $X \subset \mathbb{R}^n$ be directionally differentiable at a point $x \in X$. The function f is subdifferentiable at x if its directional derivative is a sublinear function, i.e., there exists a convex compact set U such that

$$f'(x, g) = \max_{h \in U} \langle h, g\rangle, \qquad \forall g \in \mathbb{R}^n. \qquad (15)$$


Superdifferentiable Functions 


A function is superdifferentiable at x if its directional derivative can be written by means of a convex compact set V as

$$f'(x, g) = \min_{h \in V} \langle h, g\rangle, \qquad \forall g \in \mathbb{R}^n. \qquad (16)$$


Quasidifferentiable Functions 


A directionally differentiable function f defined on an open set $X \subset \mathbb{R}^n$ is called quasidifferentiable at a point $x \in X$ if there exists an ordered pair of convex compact sets $[U, V]$ in $\mathbb{R}^n \times \mathbb{R}^n$ which produces the directional derivative of the function by

$$f'(x, g) = \max_{h \in U} \langle h, g\rangle + \min_{h \in V} \langle h, g\rangle, \qquad \forall g \in \mathbb{R}^n. \qquad (17)$$


Clearly, the first term on the right of (17) is a sublinear function, while the second term is a superlinear function. Thus, the directional derivative of a quasidifferentiable function belongs to the space L of functions which can be written as the sum of a sublinear function and a superlinear function. Moreover, with an element [U, V] of the space of pairs of convex compact sets is associated the class of equivalent ordered pairs of convex compact sets, i.e., of all pairs producing the same function in (17).

Thus, the class of equivalent ordered pairs of convex 
compact sets [U, V] of (17) (the quasidifferential Df (x) 
of f at x) fully describes the first order derivative of the 
directionally differentiable function f and gives rise to 
the quasilinearization (17) and, subsequently, to a qual- 
itative and quantitative first order approximation of f 
in the sense of (2). 

As an example, for a differentiable function f either $\mathcal{D}f = [\{\nabla f\}, \{0\}]$ or $\mathcal{D}f = [\{0\}, \{\nabla f\}]$ can be used as the quasidifferential of f. For a convex, nondifferentiable function f, one can use $\mathcal{D}f = [\partial f, \{0\}]$, where $\partial f$ denotes the classical subdifferential of convex analysis [12]. Analogously, for a concave function f one may use $\mathcal{D}f = [\{0\}, \overline{\partial} f]$, where $\overline{\partial} f$ denotes the superdifferential of the concave function f. A difference convex function (d.c. function) is a function f which can be expressed as the difference of two appropriately determined convex constituents, i.e., $f(x) = f_1(x) - f_2(x)$, $\forall x \in X$, where $f_1(x)$ and $f_2(x)$ are convex functions. In this case one constructs a quasidifferential simply as $\mathcal{D}f(x) = [\partial f_1(x), -\partial f_2(x)]$, where the convex subdifferentials of the functions $f_1(x)$ and $f_2(x)$ are used.
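For polytopes U and V given by their vertices, the maximum and minimum in (17) are attained at vertices, so the directional derivative is easy to evaluate. As an illustration (an example chosen here, not taken from the text), take the d.c. function f(x) = ‖x‖₁ − ‖x‖∞ at x = 0. Here ∂f₁(0) = [−1, 1]ⁿ and ∂f₂(0) is the unit ℓ₁-ball, so V = −∂f₂(0) is again the ℓ₁-ball, since that ball is symmetric. The helper name is hypothetical.

```python
import itertools
import numpy as np

def qd_directional_derivative(U_vertices, V_vertices, g):
    """(17) for polytopes: the max over U and the min over V are attained at vertices."""
    return max(h @ g for h in U_vertices) + min(h @ g for h in V_vertices)

n = 3
U = [np.array(s, dtype=float) for s in itertools.product([-1, 1], repeat=n)]  # vertices of [-1,1]^n
V = [s * e for e in np.eye(n) for s in (1.0, -1.0)]                          # vertices of the l1-ball

f = lambda x: np.abs(x).sum() - np.abs(x).max()
g = np.array([3.0, -1.0, 2.0])
alpha = 1e-6
print(qd_directional_derivative(U, V, g), (f(alpha * g) - f(np.zeros(n))) / alpha)
# both approximately 3.0 = ||g||_1 - ||g||_inf
```

The first term gives ‖g‖₁ and the second −‖g‖∞, matching the positively homogeneous function f itself.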


Further Related Topics 


The extension of the theory of quasidifferentiability to infinite-dimensional function spaces has not yet (1999) been studied in detail. First hints can be found in [3,6].

The notion of the quasidifferential has been extended by Demyanov to the notion of the codifferential, which has certain advantages for numerical applications (see ▶ Quasidifferentiable optimization: Codifferentiable functions and [4]). Several applications of the quasidifferentiability concept and related references are given in ▶ Quasidifferentiable optimization: Applications and in [5].
Quasidifferentiability and codifferentiability extend the notion of the subdifferential of convex analysis to a quite general class of nonconvex and nonsmooth functions. If for a directionally differentiable function $f\colon \mathbb{R}^n \to \mathbb{R}$ there exists an ordered pair of convex compact sets $[U, V]$ in $\mathbb{R}^n \times \mathbb{R}^n$ which produces the directional derivative of f at x in the direction g by the expression

$$f'(x, g) = \max_{h \in U} \langle h, g\rangle + \min_{h \in V} \langle h, g\rangle, \qquad (1)$$

this function is called quasidifferentiable in the sense of V.F. Demyanov and A.M. Rubinov.

If moreover the quasidifferential of the above func- 
tion is of the form [U, 0] (where 0 is considered as an 
element of the space R”), then function f is called subd- 
ifferentiable. 

More details on this notion, the calculus rules for computing quasidifferentials, its connection to other notions of nonsmooth analysis and its applications can be found in ▶ Quasidifferentiable optimization; ▶ Quasidifferentiable optimization: Calculus of quasidifferentials; ▶ Quasidifferentiable optimization: Dini derivatives, Clarke derivatives; ▶ Quasidifferentiable optimization: Applications; as well as in [1,2,3].


The quasidifferential, as well as the subdifferential of convex analysis, is a set-valued mapping which is discontinuous at the points of nondifferentiability. In numerical algorithms this may cause problems.
A notion that takes into account neighboring informa- 
tion would be more appropriate. This led Demyanov to 
extend the notion of the quasidifferential and to define 
the notion of the codifferential. 

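Let X be an open subset of $\mathbb{R}^n$ and let a function f be defined and finite for every $x \in X$. A function f is called codifferentiable at x if there exist convex, compact sets $\underline{d} f(x) \subset \mathbb{R}^{n+1}$ and $\overline{d} f(x) \subset \mathbb{R}^{n+1}$ such that the function admits a first order approximation in a neighborhood of x of the form

$$f(x + \Delta) = f(x) + \max_{[a, v] \in \underline{d} f(x)} \left[a + \langle v, \Delta\rangle\right] + \min_{[b, w] \in \overline{d} f(x)} \left[b + \langle w, \Delta\rangle\right] + o_x(\Delta), \qquad (2)$$

where $o_x(\alpha\Delta)/\alpha \to 0$ as $\alpha \downarrow 0$, $\forall \Delta \in \mathbb{R}^n$. The ordered pair of convex, compact sets $\mathcal{D}f(x) = [\underline{d} f(x), \overline{d} f(x)]$ is called a codifferential of f at x, where $\underline{d} f(x)$ is a hypodifferential and $\overline{d} f(x)$ is a hyperdifferential.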

If there exists a codifferential of the form $\mathcal{D}f(x) = [\underline{d} f(x), \{0\}]$, where 0 is considered as an element of the space $\mathbb{R}^{n+1}$, the function f is called hypodifferentiable.
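A standard example of a hypodifferentiable function is a finite maximum of affine functions, f(x) = maxᵢ fᵢ(x) with fᵢ(x) = aᵢᵀx + bᵢ. Since f(x + Δ) = f(x) + maxᵢ [fᵢ(x) − f(x) + ⟨aᵢ, Δ⟩], the convex hull of the points [fᵢ(x) − f(x), aᵢ] ∈ ℝⁿ⁺¹ serves as a hypodifferential, with hyperdifferential {0} and o_x(Δ) ≡ 0 in (2). The check below (helper names hypothetical) verifies this identity numerically for f(x) = |x₁| + 2|x₂|.

```python
import numpy as np

def hypodifferential_vertices(A, b, x):
    """Vertices [f_i(x) - f(x), a_i] of a hypodifferential of f(x) = max_i (a_i.x + b_i)."""
    vals = A @ x + b
    return np.column_stack([vals - vals.max(), A])

def first_order_model(A, b, x, delta):
    """f(x) + max over the vertices [a, v] of (a + <v, delta>), i.e. (2) with hyperdifferential {0}."""
    W = hypodifferential_vertices(A, b, x)
    return np.max(A @ x + b) + np.max(W[:, 0] + W[:, 1:] @ delta)

A = np.array([[1.0, 2.0], [1.0, -2.0], [-1.0, 2.0], [-1.0, -2.0]])  # f(x) = |x1| + 2|x2|
b = np.zeros(4)
f = lambda z: np.max(A @ z + b)
rng = np.random.default_rng(0)
x = np.array([1.0, 0.5])
deltas = rng.standard_normal((5, 2))
print(all(np.isclose(f(x + d), first_order_model(A, b, x, d)) for d in deltas))  # True
```

Unlike the subdifferential at x, which contains only the gradients of the active pieces, this set also carries the inactive pieces, weighted by how far below the maximum they are; this is the neighboring information mentioned above.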

One recalls that classical convex nondifferentiable 
functions are subdifferentiable (resp. hypodifferen- 
tiable) in the above outlined framework, since one may 
use the classical convex analysis subdifferential in the 
above definitions for the construction of the subdiffer- 
ential (resp. the hypodifferential) at a given point. 

More details about codifferentiability (including extensions to higher order codifferentials) can be found in ▶ Quasidifferentiable optimization: Codifferentiable functions.


Hypodifferentiable Optimization 


Efficient nonsmooth optimization algorithms can be 
constructed for hypodifferentiable functions. In fact, 
the technique of replacing a nondifferentiable opti- 
mization problem by an enlarged, classical, inequality 
constrained optimization problem has been success- 
fully used for convex or for composite optimization 
problems [4,13]. For hypodifferentiable functions a di- 
rection of descent at each given point can be deter- 
mined and used in an iterative optimization procedure. 
Let us consider a conceptual iterative steepest descent optimization algorithm and the form it takes for nondifferentiable (hypodifferentiable) functions. First, recall that a nondifferentiable function does not possess derivatives in the classical sense; instead, set-valued approximations of the derivative (e.g., the subdifferential or the hypodifferential) are used at the points of nondifferentiability.

Accordingly, the optimality conditions (which also provide the stopping rules for an optimization algorithm) and the calculation of the steepest descent direction must be modified appropriately.

The first order necessary condition for a hypodifferentiable function f to attain a minimum at the point $x_0$ reads

$$0 \in \underline{d} f(x_0). \qquad (3)$$

Points x for which relation (3) is satisfied are called inf-stationary points. Note that relation (3) holds in the space $\mathbb{R}^{n+1}$.

If at a given point $x_k$, at the kth iteration of an iterative optimization scheme, relation (3) is not satisfied, then one may always find the point $Z^*$ with minimum norm in the closed convex set $\underline{d} f(x_k)$:

$$Z^*(x_k) = [\eta^*(x_k), z^*(x_k)] = \arg\min_{Z \in \underline{d} f(x_k)} \|Z\|. \qquad (4)$$

Since (3) is not satisfied, one has $\|Z^*(x_k)\| > 0$. The direction

$$g_k(x_k) = -\frac{z^*(x_k)}{\|z^*(x_k)\|} \qquad (5)$$

can be used as a descent direction within an optimization algorithm.

In the conceptual manner used in this note, the next step of the iterative algorithm has the form

$$x_{k+1} = x_k + \alpha_k g_k,$$

where the steplength $\alpha_k$ is determined from the solution of the one-dimensional optimization problem along the direction $g_k$:

$$\alpha_k = \arg\min_{\alpha \ge 0} \{f(x_k + \alpha g_k)\}.$$
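The conceptual steps (4)-(5) can be sketched for the max-of-affine function f(x) = |x₁| + 2|x₂| = maxᵢ aᵢᵀx, whose hypodifferential is the convex hull of the points [fᵢ(x) − f(x), aᵢ]. Several choices here are assumptions of this sketch, not part of the method as described. The minimum-norm point in (4) is approximated by projected gradient over the simplex of convex weights. The exact line search is replaced by a ternary search on an assumed bracket [0, t_max], which is valid because f is convex along the ray. The loop stops when ‖Z*‖ falls below a tolerance or no decrease is obtained. All function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def min_norm_point(W, iters=5000):
    """Approximate the least-norm point of conv(rows of W) by projected gradient."""
    G = W @ W.T                                    # Gram matrix of the vertices
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)  # 1 / Lipschitz constant of the gradient
    w = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(iters):
        w = project_simplex(w - step * (G @ w))
    return w @ W

def hypodiff_descent(A, b, x0, tol=1e-9, max_iter=50, t_max=10.0):
    f = lambda z: np.max(A @ z + b)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        vals = A @ x + b
        W = np.column_stack([vals - vals.max(), A])  # vertices of the hypodifferential in R^{n+1}
        Z = min_norm_point(W)                         # Z* = [eta*, z*], cf. (4)
        z = Z[1:]
        if np.linalg.norm(Z) <= tol or np.linalg.norm(z) <= tol:
            break                                     # (approximately) inf-stationary, cf. (3)
        g = -z / np.linalg.norm(z)                    # direction (5)
        lo, hi = 0.0, t_max                           # ternary search for alpha_k
        for _ in range(200):
            m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
            if f(x + m1 * g) <= f(x + m2 * g):
                hi = m2
            else:
                lo = m1
        x_new = x + 0.5 * (lo + hi) * g
        if f(x_new) >= f(x):
            break                                     # no progress: stop
        x = x_new
    return x, f(x)

A = np.array([[1.0, 2.0], [1.0, -2.0], [-1.0, 2.0], [-1.0, -2.0]])  # f(x) = |x1| + 2|x2|
b = np.zeros(4)
x_star, f_star = hypodiff_descent(A, b, [1.0, 1.0])
print(x_star, f_star)   # close to the minimizer (0, 0), with f close to 0
```

At the starting point (1, 1) the minimum-norm element is [−1, 1, 1], so the first direction is −(1, 1)/√2, which points straight at the minimizer. The guard on f(x_new) keeps the sketch monotone even when the approximate minimum-norm point is inexact.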


For more general quasidifferentiable and codifferentiable functions one may construct appropriate solution algorithms; see ▶ Quasidifferentiable optimization: Algorithms for QD functions and the original literature (e.g., [1]).


Comments 


Nondifferentiable optimization procedures have attracted the attention of many researchers and practitioners in the last decade. In several applications, the loss of information associated with smoothing approaches is critical for the quality of the results. Beyond the quasidifferentiable optimization literature mentioned previously in this note, general methods and theory for descent type methods for nonsmooth functions can be found in [7,12]. In this respect, the bundle concept has been found useful (see, among others, [6,8,9,11]). Applications of this method to the solution of hemivariational inequality problems arising in mechanics can be found in [10] and [5].

In closing, we mention again the additional requirements of nonsmooth optimization compared with classical, smooth optimization. First, stopping criteria must take into account the set-valued nature of the nonsmooth optimality conditions; otherwise, cycling in an iterative scheme or premature exit at a noncritical point may occur. This is the more critical issue. Moreover, the line search must take into account the nondifferentiability of the function involved. This requirement is usually easily met (for instance, by means of a derivative-free technique).


See also 


> Generalized Monotonicity: Applications to 
Variational Inequalities and Equilibrium Problems 

> Hemivariational Inequalities: Applications in 
Mechanics 

> Hemivariational Inequalities: Eigenvalue Problems 

> Hemivariational Inequalities: Static Problems 

> Nonconvex Energy Functions: Hemivariational 
Inequalities 

> Nonconvex-Nonsmooth Calculus of Variations 

> Quasidifferentiable Optimization 

> Quasidifferentiable Optimization: Algorithms for 
QD Functions 

> Quasidifferentiable Optimization: Applications 

> Quasidifferentiable Optimization: Applications to 
Thermoelasticity 

> Quasidifferentiable Optimization: Calculus of 
Quasidifferentials 

> Quasidifferentiable Optimization: Codifferentiable 
Functions 




> Quasidifferentiable Optimization: Dini Derivatives, 
Clarke Derivatives 

> Quasidifferentiable Optimization: Exact Penalty 
Methods 

> Quasidifferentiable Optimization: Optimality 
Conditions 

> Quasidifferentiable Optimization: Stability of 
Dynamic Systems 

> Quasidifferentiable Optimization: Variational 
Formulations 

> Quasivariational Inequalities 

> Sensitivity Analysis of Variational Inequality 
Problems 

> Solving Hemivariational Inequalities by Nonsmooth 
Optimization Methods 

> Variational Inequalities 

> Variational Inequalities: F. E. Approach 

> Variational Inequalities: Geometric Interpretation, 
Existence and Uniqueness 

> Variational Inequalities: Projected Dynamical 
System 

> Variational Principles 


References 


1. Demyanov VF, Rubinov AM (1995) Introduction to constructive nonsmooth analysis. P. Lang, Frankfurt am Main

2. Demyanov VF, Stavroulakis GE, Polyakova LN, Panagiotopoulos PD (1996) Quasidifferentiability and nonsmooth modelling in mechanics, engineering and economics. Kluwer, Dordrecht

3. Demyanov VF, Vasiliev LN (1985) Nondifferentiable optimization. Optim. Software, New York

4. Fletcher R (1990) Practical methods of optimization. Wiley, New York

5. Haslinger J, Miettinen M, Panagiotopoulos PD (1999) Finite element method for hemivariational inequalities. Kluwer, Dordrecht

6. Hiriart-Urruty J-B, Lemaréchal C (1993) Convex analysis and minimization algorithms. Springer, Berlin

7. Kiwiel KC (1985) Methods of descent for nondifferentiable optimization. Springer, Berlin

8. Lemaréchal C, Mifflin R (eds) (1978) Bundle methods in nonsmooth optimization. Pergamon, Oxford

9. Mäkelä MM, Neittaanmäki P (1992) Nonsmooth optimization. World Sci., Singapore

10. Miettinen M, Mäkelä MM, Haslinger J (1995) On numerical solution of hemivariational inequalities by nonsmooth optimization methods. J Global Optim 8(4):401-425

11. Schramm H, Zowe J (1992) A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J Optim 2:121-152

12. Shor NZ (1985) Minimization methods for nondifferentiable functions. Springer, Berlin

13. Womersley RS, Fletcher R (1986) An algorithm for composite nonsmooth optimization problems. J Optim Th Appl 48:493-523


Quasidifferentiable Optimization: 
Algorithms for QD Functions 


VLADIMIR F. DEMYANOV 
St. Petersburg State University, St. Petersburg, Russia 


MSC2000: 90Cxx, 65Kxx 


Article Outline 


Keywords 

Codifferentiable (c.d.) Functions 

Method of Codifferential Descent (MCD) 
Method of Hypodifferential Descent (MHD) 
Difference of Convex (d.c.) Functions 
Difference of Max-Type (d.m.) Functions 
Twice Codifferentiable Functions 
Quasidifferentiable Programming Problems 
See also 

References 


Keywords 


Quasidifferentiable function; Codifferentiable 
function; Method of codifferential descent; Method of 
hypodifferential descent; D.c. function 


Codifferentiable (c.d.) Functions 


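A function $f\colon \mathbb{R}^n \to \mathbb{R}$ is called quasidifferentiable at $x \in \mathbb{R}^n$ if it is directionally differentiable (in the sense of Dini or Hadamard) and there exists a pair $Df(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$ of compact convex sets of $\mathbb{R}^n$ such that

$$f'(x, g) = \max_{v \in \underline{\partial} f(x)} \langle v, g \rangle + \min_{w \in \overline{\partial} f(x)} \langle w, g \rangle. \qquad (1)$$

Here $f'(x, g)$ is either the Dini or the Hadamard derivative of $f$ at $x$ in a direction $g \in \mathbb{R}^n$ (see > Quasidifferentiable optimization: Optimality conditions).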
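Formula (1) becomes a finite computation when both sets are polytopes given by their vertices, since a linear function attains its maximum and minimum over a polytope at vertices. A minimal Python sketch under that assumption (the function names and the example are illustrative), checked on $f(x) = |x_1| - |x_2|$ at $x = 0$, for which one may take $\underline{\partial} f(0) = \operatorname{co}\{(1,0), (-1,0)\}$ and $\overline{\partial} f(0) = \operatorname{co}\{(0,1), (0,-1)\}$, so that $f'(0, g) = |g_1| - |g_2|$:

```python
from typing import List, Sequence

Vec = Sequence[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def qd_directional_derivative(sub_vertices: List[Vec], super_vertices: List[Vec],
                              g: Vec) -> float:
    """Evaluate formula (1) when both sets of the quasidifferential are polytopes.

    A linear function attains its max and min over a polytope at vertices,
    so scanning the vertex lists is enough."""
    return (max(dot(v, g) for v in sub_vertices)
            + min(dot(w, g) for w in super_vertices))


# f(x) = |x1| - |x2| at x = 0.
sub = [(1.0, 0.0), (-1.0, 0.0)]
sup = [(0.0, 1.0), (0.0, -1.0)]
print(qd_directional_derivative(sub, sup, (3.0, -2.0)))  # 1.0
```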
In the sequel we discuss only the problem of minimizing the function $f$. In > Quasidifferentiable optimization: Optimality conditions, necessary conditions for a minimum of $f$ were formulated in terms of quasidifferentials (q.d.), and a formula for computing steepest descent directions was derived. However, it is difficult to use steepest descent directions for constructing numerical methods for minimizing the function $f$, since the quasidifferential mapping $Df$ is, in general, discontinuous in the Hausdorff metric. This is why we need some other tool to overcome the discontinuity of $Df$.

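A function $f\colon \mathbb{R}^n \to \mathbb{R}$ is called Dini codifferentiable (D.c.d.) at $x \in \mathbb{R}^n$ if there exists a pair $Df(x) = [\underline{d} f(x), \overline{d} f(x)]$ of compact convex sets of $\mathbb{R}^{n+1}$ such that

$$f(x + \Delta) = f(x) + \max_{[a,v] \in \underline{d} f(x)} \big[a + \langle v, \Delta \rangle\big] + \min_{[b,w] \in \overline{d} f(x)} \big[b + \langle w, \Delta \rangle\big] + o_x(\Delta), \qquad (2)$$

$$\frac{o_x(\alpha \Delta)}{\alpha} \to 0 \quad \text{as } \alpha \downarrow 0, \qquad \forall \Delta \in \mathbb{R}^n. \qquad (3)$$

Here $a, b \in \mathbb{R}$; $v, w \in \mathbb{R}^n$. If in (2)

$$\frac{o_x(\alpha \Delta')}{\alpha} \to 0 \quad \text{as } [\alpha, \Delta'] \to [+0, \Delta], \qquad \forall \Delta \in \mathbb{R}^n, \qquad (4)$$

then $f$ is called Hadamard codifferentiable (H.c.d.) at $x$. Without loss of generality it may be assumed that

$$\max_{[a,v] \in \underline{d} f(x)} a = \min_{[b,w] \in \overline{d} f(x)} b = 0. \qquad (5)$$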


If it causes no misunderstanding, we shall use the term codifferentiable (c.d.) for both Dini and Hadamard codifferentiable functions.

The pair $Df(x) = [\underline{d} f(x), \overline{d} f(x)]$ is called a codifferential of $f$ at $x$, $\underline{d} f(x)$ is a hypodifferential, and $\overline{d} f(x)$ is a hyperdifferential. A codifferential (like a quasidifferential) is not uniquely defined. If there exists a codifferential of the form $Df(x) = [\underline{d} f(x), \{0_{n+1}\}]$, the function $f$ is called hypodifferentiable at $x$. If there exists a codifferential of the form $Df(x) = [\{0_{n+1}\}, \overline{d} f(x)]$, the function $f$ is called hyperdifferentiable at $x$.

It is easy to see that the class of Dini (Hadamard) codifferentiable functions coincides with the class of Dini (Hadamard) quasidifferentiable functions.


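For example, if $Df(x) = [\underline{d} f(x), \overline{d} f(x)]$ is a codifferential of $f$ at $x$ such that (5) holds, then the pair $Df(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$, where

$$\underline{\partial} f(x) = \{v \in \mathbb{R}^n : [0, v] \in \underline{d} f(x)\}, \qquad \overline{\partial} f(x) = \{w \in \mathbb{R}^n : [0, w] \in \overline{d} f(x)\},$$

is a quasidifferential of $f$ at $x$.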
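A concrete instance, assuming a max-type function $f(x) = \max_i (c_i + \langle v_i, x \rangle)$ (an illustrative choice; the general max-type case appears in the d.m. section below): a hypodifferential is the convex hull of the points $[c_i + \langle v_i, x \rangle - f(x), v_i]$, and one may take $\overline{d} f(x) = \{0_{n+1}\}$. Since every $a \le 0$ and $\max a = 0$, a convex combination has $a = 0$ only if it uses generators with $a = 0$, so the slice $\{v : [0, v] \in \underline{d} f(x)\}$ is the hull of exactly those generators. A Python sketch with hypothetical helper names:

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def max_affine_hypodifferential(c: Sequence[float], V: Sequence[Vec],
                                x: Sequence[float]) -> List[Tuple[float, Vec]]:
    """Generators [a_i, v_i] of a hypodifferential of f(x) = max_i (c_i + <v_i, x>).

    a_i = c_i + <v_i, x> - f(x) <= 0, so max_i a_i = 0 as in condition (5)."""
    values = [ci + dot(vi, x) for ci, vi in zip(c, V)]
    fx = max(values)
    return [(val - fx, vi) for val, vi in zip(values, V)]


def quasi_from_codifferential(generators: List[Tuple[float, Vec]],
                              tol: float = 1e-12) -> List[Vec]:
    """Generators of {v : [0, v] in co(generators)}, valid when all a <= 0 and max a = 0."""
    return [v for a, v in generators if abs(a) <= tol]


# f(x) = max(x1, -x1, x2) at x = (0, -1): the active pieces are x1 and -x1.
hypo = max_affine_hypodifferential([0.0, 0.0, 0.0],
                                   [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)],
                                   (0.0, -1.0))
print(quasi_from_codifferential(hypo))  # [(1.0, 0.0), (-1.0, 0.0)]
```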

A function f is called continuously codifferentiable 
at x if it is codifferentiable in some neighborhood of x 
and there exists a codifferential mapping Df which is 
Hausdorff continuous at x. 


Remark 1 Of course, it is possible to introduce the no- 
tion of continuously quasidifferentiable function; how- 
ever, if f is continuously q.d. at x then it is just differen- 
tiable at x. 


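For a fixed $x$ the functions (see (1) and (2))

$$\Phi_{1x}(\Delta) = f(x) + \max_{v \in \underline{\partial} f(x)} \langle v, \Delta \rangle + \min_{w \in \overline{\partial} f(x)} \langle w, \Delta \rangle$$

and

$$\Phi_{2x}(\Delta) = f(x) + \max_{[a,v] \in \underline{d} f(x)} \big[a + \langle v, \Delta \rangle\big] + \min_{[b,w] \in \overline{d} f(x)} \big[b + \langle w, \Delta \rangle\big]$$

are both first-order approximations of $f$ in a neighborhood of $x$. The function $F_{1x}(\Delta) = \Phi_{1x}(\Delta) - f(x)$ is positively homogeneous (of degree one) in $\Delta$, while the function $F_{2x}(\Delta) = \Phi_{2x}(\Delta) - f(x)$ is, in general, not positively homogeneous. The loss of homogeneity is the price to be paid for the continuity (if any) of the codifferential mapping.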
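The difference between $\Phi_{1x}$ and $\Phi_{2x}$ can be seen on $f(x) = |x| = \max\{x, -x\}$ in one dimension (an illustrative example, not taken from the text). With the max-type hypodifferential $\operatorname{co}\{[x - |x|, 1], [-x - |x|, -1]\}$ and hyperdifferential $\{0\}$, one gets $\Phi_{2x}(\Delta) = |x + \Delta|$ exactly, so this model "sees" the kink at 0 from any nearby $x$, whereas $\Phi_{1x}$ is linear in $\Delta$ for $x \neq 0$ and the quasidifferential jumps at $x = 0$. A Python sketch:

```python
def f(x: float) -> float:
    return abs(x)


def phi1(x: float, d: float) -> float:
    """Quasidifferential model f(x) + f'(x, d) for f(x) = |x| = max(x, -x)."""
    if x > 0:
        return f(x) + d       # subdifferential {1}
    if x < 0:
        return f(x) - d       # subdifferential {-1}
    return abs(d)             # x = 0: max over v in [-1, 1] of v * d


def phi2(x: float, d: float) -> float:
    """Codifferential model with hypodifferential co{[x - |x|, 1], [-x - |x|, -1]}, hyper {0}."""
    hypo = [(x - abs(x), 1.0), (-x - abs(x), -1.0)]
    return f(x) + max(a + v * d for a, v in hypo)


x, d = 0.5, -1.0
print(phi1(x, d), phi2(x, d), f(x + d))  # -0.5 0.5 0.5
```

For $x = 0.5$ the quasidifferential model misses the kink at $0$, while the codifferential model reproduces $f(x + \Delta)$; the price is that $F_{2x}$ is not positively homogeneous.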

Note again that the 'value' of the mapping $Df$ at any $x$ is a pair of convex compact sets in $\mathbb{R}^{n+1}$.

It turns out that most of the known functions are continuously codifferentiable (see [3,4]). For example, all smooth, convex, concave and concavo-convex functions are continuously codifferentiable. The class of c.d. functions enjoys a very rich calculus, similar to that for q.d. functions (see > Quasidifferentiable optimization: Optimality conditions), which is a generalization of the classical differential calculus. The class of c.d. functions was introduced in [3].

First we discuss the problem of minimizing a c.d. 
function on the entire space (in the absence of con- 
straints). 
For a c.d. function the following necessary condition holds:

Proposition 2 For a point $x^* \in \mathbb{R}^n$ to be a minimizer of $f$ on $\mathbb{R}^n$ it is necessary that

$$0_{n+1} \in \big\{\underline{d} f(x^*) + [0, w]\big\} \qquad \forall\, [0, w] \in \overline{d} f(x^*) \qquad (6)$$

(it is assumed that condition (5) holds).


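A point $x^*$ satisfying (6) is called inf-stationary. Let $x$ be not inf-stationary. Then there exists $\bar w = [0, w] \in \overline{d} f(x)$ such that

$$0_{n+1} \notin \big\{\underline{d} f(x) + \bar w\big\} = L_{\bar w}(x). \qquad (7)$$

Find

$$\min_{Z \in L_{\bar w}(x)} \|Z\| = \|Z_{\bar w}(x)\|.$$

(7) implies that

$$Z_{\bar w}(x) = [\eta_{\bar w}(x), z_{\bar w}(x)] \neq 0_{n+1} \qquad (\eta_{\bar w}(x) \in \mathbb{R},\ z_{\bar w}(x) \in \mathbb{R}^n).$$

It is also possible to show that $z_{\bar w}(x) \neq 0_n$, and that for the direction

$$g_{\bar w}(x) = -\frac{z_{\bar w}(x)}{\|z_{\bar w}(x)\|}$$

the inequality $f'(x, g_{\bar w}(x)) \le -\|z_{\bar w}(x)\|$ holds.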
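Condition (6) and the direction $g_{\bar w}$ can be computed numerically when $\underline{d} f(x)$ and $\overline{d} f(x)$ are polytopes given by generators. The sketch below is an illustration under stated assumptions: it approximates the minimum-norm point of $L_{\bar w}(x)$ with the Frank-Wolfe method (a swapped-in solver chosen for brevity, not the one used in the literature), and it checks only the generators $[0, w]$ of $\overline{d} f(x)$, which suffices because the face $\{b = 0\}$ is their convex hull and $\underline{d} f(x)$ is convex. Points are stored as lists $[a, v_1, \dots, v_n]$; the example, $f(x) = \max\{x_1, -x_1, x_2\} - |x_2|$ at $x = 0$, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = List[float]  # stored as [a, v_1, ..., v_n]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def min_norm_point(points: Sequence[Sequence[float]], max_iter: int = 10000,
                   tol: float = 1e-14) -> Point:
    """Approximate the minimum-norm point of co(points) by Frank-Wolfe iterations."""
    z = list(points[0])
    for _ in range(max_iter):
        s = min(points, key=lambda p: dot(z, p))   # vertex minimizing <z, p>
        d = [si - zi for si, zi in zip(s, z)]
        gap = -dot(z, d)                           # Frank-Wolfe gap for 0.5 * ||z||^2
        if gap <= tol:
            break
        gamma = min(1.0, gap / dot(d, d))          # exact line search on [z, s]
        z = [zi + gamma * di for zi, di in zip(z, d)]
    return z


def inf_stationarity_test(hypo: Sequence[Point], hyper: Sequence[Point],
                          tol: float = 1e-9) -> List[Tuple[float, Optional[Point]]]:
    """For each generator [0, w] of the hyperdifferential return (||Z_w||, g_w).

    g_w = -z_w / ||z_w|| is None when 0 lies (numerically) in L_w = hypo + [0, w].
    The point x is inf-stationary iff every returned norm is ~0."""
    results = []
    for wbar in hyper:
        if abs(wbar[0]) > tol:        # condition (6) involves only generators with b = 0
            continue
        L = [[pi + wi for pi, wi in zip(p, wbar)] for p in hypo]
        Z = min_norm_point(L)
        norm_Z = math.sqrt(dot(Z, Z))
        z = Z[1:]                     # drop the eta component
        norm_z = math.sqrt(dot(z, z))
        if norm_Z <= tol or norm_z <= tol:
            results.append((norm_Z, None))
        else:
            results.append((norm_Z, [-zi / norm_z for zi in z]))
    return results


# f(x) = max(x1, -x1, x2) - |x2| at x = (0, 0).
hypo = [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
hyper = [[0.0, 0.0, -1.0], [0.0, 0.0, 1.0]]
for norm_Z, g in inf_stationarity_test(hypo, hyper):
    print(round(norm_Z, 9), g)
```

Here (6) fails for the generator $[0, (0, 1)]$, and the returned direction $g = (0, -1)$ indeed decreases $f$: $f(0, -t) = -t$.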


Method of Codifferential Descent (MCD) 


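Let a function $f$ be defined, Lipschitz and continuously codifferentiable on $\mathbb{R}^n$. Fix any $\mu > 0$. Choose an arbitrary $x_0 \in \mathbb{R}^n$. Let $x_k$ have already been found. If condition (6) holds at $x_k$, then $x_k$ is inf-stationary and the process terminates. Otherwise, for every $\bar w \in \overline{d}_\mu f(x_k)$, where

$$\overline{d}_\mu f(x) = \big\{\bar w \in \overline{d} f(x) : \bar w = [b, w],\ 0 \le b \le \mu\big\}, \qquad (8)$$

we find

$$\min_{Z \in L_{k\bar w}} \|Z\| = \|Z_{k\bar w}\|,$$

with

$$Z_{k\bar w} = [\eta_{k\bar w}, z_{k\bar w}], \qquad L_{k\bar w} = \underline{d} f(x_k) + \bar w.$$

Now, for every $\bar w \in \overline{d}_\mu f(x_k)$, we find

$$\min_{\alpha \ge 0} f(x_k - \alpha z_{k\bar w}) = f(x_k - \alpha_{k\bar w} z_{k\bar w}) \qquad (9)$$

and then

$$\min_{\bar w \in \overline{d}_\mu f(x_k)} f(x_k - \alpha_{k\bar w} z_{k\bar w}) = f(x_k - \alpha_{k\bar w_k} z_{k\bar w_k}).$$

Put $x_{k+1} = x_k - \alpha_{k\bar w_k} z_{k\bar w_k}$. Continuing in the same manner, we construct a sequence $\{x_k\}$ such that $f(x_{k+1}) < f(x_k)$.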


Proposition 3 (See [4, Thm. V.5.1].) Let the set $\{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ be bounded, let $x^*$ be a limit point of the sequence $\{x_k\}$, and let relation (4) hold uniformly in $x$ from some neighborhood of $x^*$ and in $\Delta$ from $S = \{\Delta \in \mathbb{R}^n : \|\Delta\| = 1\}$.

Then the point $x^*$ is an inf-stationary point of $f$ (i.e. condition (6) holds).


Remark 4 The MCD described above is a conceptual method (according to the terminology of E. Polak). It should be adjusted to a specific class of functions. The MCD is a generalization of the classical steepest descent method.

For example, if for every $x \in \mathbb{R}^n$ the set $\overline{d} f(x)$ is the convex hull of a finite number of points, then in (8) one can take only the points $\bar w = [b, w]$ which are 'vertices' of $\overline{d} f(x)$ with $0 \le b \le \mu$.

In this case, at each step only a finite number of one-dimensional optimization problems (9) has to be solved.


Method of Hypodifferential Descent (MHD) 


Let $f$ be defined, Lipschitz and continuously hypodifferentiable on $\mathbb{R}^n$, i.e. there exists a codifferential mapping of the form $Df(x) = [\underline{d} f(x), \{0_{n+1}\}]$ which is Hausdorff continuous. Then the necessary condition for a minimum (6) takes the form

$$0_{n+1} \in \underline{d} f(x^*). \qquad (10)$$

If $x \in \mathbb{R}^n$ is not an inf-stationary point (i.e., (10) does not hold), then let us find

$$\min_{Z \in \underline{d} f(x)} \|Z\| = \|Z(x)\| = \|[\eta(x), z(x)]\| = \rho(x).$$

Since $\rho(x) > 0$, we have $Z(x) \neq 0_{n+1}$. It is possible to show that $z(x) \neq 0_n$. The direction $g(x) = -z(x)/\|z(x)\|$ is a descent direction (not necessarily the steepest descent direction). The vector function $g(x)$ is continuous at any point which is not inf-stationary.

Take any $x_0 \in \mathbb{R}^n$. Let $x_k$ have already been constructed. Let us find $\rho(x_k) = \|Z(x_k)\|$. If $\rho(x_k) = 0$, then $x_k$ is inf-stationary and the process terminates. Otherwise, take $g_k = -z(x_k)/\|z(x_k)\|$ and find

$$\min_{\alpha \ge 0} f(x_k + \alpha g_k) = f(x_k + \alpha_k g_k).$$

Put $x_{k+1} = x_k + \alpha_k g_k$. Continuing analogously, we construct a sequence $\{x_k\}$ such that $f(x_{k+1}) < f(x_k)$.
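A compact Python sketch of the MHD for a convex piecewise-affine function $f(x) = \max_i (c_i + \langle v_i, x \rangle)$, which is continuously hypodifferentiable with $\underline{d} f(x) = \operatorname{co}\{[c_i + \langle v_i, x \rangle - f(x), v_i]\}$. The minimum-norm point is approximated by Frank-Wolfe iterations, the step is found by golden-section search on a hypothetical bracket $[0, \alpha_{\max}]$, and the loop stops when $\rho(x_k)$ falls below a tolerance, when no numerical decrease is obtained, or after a fixed number of steps; all of these are illustrative choices, not part of the method as stated.

```python
import math
from typing import Callable, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def min_norm_point(points: Sequence[Sequence[float]], max_iter: int = 5000,
                   tol: float = 1e-14) -> List[float]:
    """Frank-Wolfe approximation of the minimum-norm point of co(points)."""
    z = list(points[0])
    for _ in range(max_iter):
        s = min(points, key=lambda p: dot(z, p))
        d = [si - zi for si, zi in zip(s, z)]
        gap = -dot(z, d)
        if gap <= tol:
            break
        z = [zi + min(1.0, gap / dot(d, d)) * di for zi, di in zip(z, d)]
    return z


def golden_section(phi: Callable[[float], float], lo: float, hi: float,
                   tol: float = 1e-10) -> float:
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc <= fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


def mhd(c: Sequence[float], V: Sequence[Sequence[float]], x0: Sequence[float],
        max_outer: int = 20, tol: float = 1e-6,
        alpha_max: float = 10.0) -> Tuple[List[float], float]:
    """Method of hypodifferential descent for f(x) = max_i (c_i + <v_i, x>)."""
    def f(x: Sequence[float]) -> float:
        return max(ci + dot(vi, x) for ci, vi in zip(c, V))

    x = list(x0)
    for _ in range(max_outer):
        fx = f(x)
        # generators [a_i, v_i] of the hypodifferential at x
        hypo = [[ci + dot(vi, x) - fx] + list(vi) for ci, vi in zip(c, V)]
        Z = min_norm_point(hypo)
        if math.sqrt(dot(Z, Z)) <= tol:          # rho(x_k) ~ 0: (10) holds
            break
        z = Z[1:]
        nz = math.sqrt(dot(z, z))
        if nz <= tol:
            break
        g = [-zi / nz for zi in z]               # g_k = -z(x_k) / ||z(x_k)||
        alpha = golden_section(
            lambda t: f([xi + t * gi for xi, gi in zip(x, g)]), 0.0, alpha_max)
        x_new = [xi + alpha * gi for xi, gi in zip(x, g)]
        if f(x_new) >= fx:                       # no numerical decrease: stop
            break
        x = x_new
    return x, f(x)


# f(x) = |x1| + |x2| written as a max of four affine pieces, started at (1, 0.5).
x_min, f_min = mhd([0.0] * 4, [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)],
                   [1.0, 0.5])
print(x_min, f_min)
```

In this example the first hypodifferential direction already points from $(1, 0.5)$ straight at the minimizer $0$, something the (discontinuous) subdifferential of $f$ at $(1, 0.5)$, which is the single point $(1, 1)$, would not provide.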


Proposition 5 Let $x^*$ be a limit point of the sequence $\{x_k\}$ and let the hypotheses of Proposition 3 hold. Then $0_{n+1} \in \underline{d} f(x^*)$, i.e. $x^*$ is an inf-stationary point of $f$.


Difference of Convex (d.c.) Functions 


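Let $f(x) = f_1(x) - f_2(x)$, where $f_1, f_2\colon \mathbb{R}^n \to \mathbb{R}$ are convex. A d.c. function is quasidifferentiable with the quasidifferential $Df(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$, where $\underline{\partial} f(x) = \partial f_1(x)$, $\overline{\partial} f(x) = -\partial f_2(x)$, and $\partial f_1(x)$, $\partial f_2(x)$ are the subdifferentials (in the sense of convex analysis) of the functions $f_1$ and $f_2$, respectively:

$$\partial f_i(x) = \big\{v \in \mathbb{R}^n : f_i(z) - f_i(x) \ge \langle v, z - x \rangle\ \ \forall z \in \mathbb{R}^n\big\}.$$

The sets $\partial f_i(x)$ are convex and compact. The necessary condition for a minimum (6) takes the form

$$\partial f_2(x^*) \subset \partial f_1(x^*). \qquad (11)$$

If $f_2$ is a polyhedral function (i.e. $f_2(x) = \max_{i \in I}\{a_i + \langle v_i, x \rangle\}$, where $a_i \in \mathbb{R}$, $v_i \in \mathbb{R}^n$, $I = \{1, \dots, N\}$), then condition (11) is sufficient for the point $x^*$ to be a local minimizer of $f$.

Since the mappings $\partial f_1$ and $\partial f_2$ are, in general, discontinuous, $Df$ is also discontinuous.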

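If $F$ is a convex function and $\varepsilon > 0$, then the set

$$\partial_\varepsilon F(x) = \big\{v \in \mathbb{R}^n : F(z) - F(x) \ge \langle v, z - x \rangle - \varepsilon\ \ \forall z \in \mathbb{R}^n\big\}$$

is called the $\varepsilon$-subdifferential of $F$ at $x$.

We shall use the following properties of a convex function (see, e.g., [7]):
1) $\partial_\varepsilon F(x)$ is a convex compact set.
2) The mapping $\partial_\varepsilon F$ is Hausdorff continuous jointly in $\varepsilon$ and $x$ on $(0, \infty) \times \mathbb{R}^n$.
3) $$\max_{v \in \partial_\varepsilon F(x)} \langle v, g \rangle = \inf_{\alpha > 0} \frac{1}{\alpha}\big[F(x + \alpha g) - F(x) + \varepsilon\big] =: F'_\varepsilon(x, g).$$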


In [6] the following necessary and sufficient condition for a global minimum is stated:

For a point $x^*$ to be a global minimizer of a d.c. function $f(x) = f_1(x) - f_2(x)$ it is necessary and sufficient that

$$\partial_\varepsilon f_2(x^*) \subset \partial_\varepsilon f_1(x^*) \qquad \forall \varepsilon > 0. \qquad (12)$$
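Condition (12) can be probed numerically when the $\varepsilon$-subdifferentials are known in closed form. For the one-dimensional d.c. function $f(x) = x^2 - |x|$ (an illustrative example), the definition gives $\partial_\varepsilon(x^2) = [2x - 2\sqrt{\varepsilon}, 2x + 2\sqrt{\varepsilon}]$ and, for $x > 0$, $\partial_\varepsilon |x| = [\max\{-1, 1 - \varepsilon/x\}, 1]$ (symmetrically for $x < 0$, and $[-1, 1]$ at $x = 0$). Testing (12) on a finite grid of $\varepsilon$ values, as in the Python sketch below, is only a necessary check; the global minimizers here are $x = \pm 1/2$.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def eps_sub_square(x: float, eps: float) -> Interval:
    """eps-subdifferential of F(z) = z**2 at x: [2x - 2 sqrt(eps), 2x + 2 sqrt(eps)]."""
    r = 2.0 * math.sqrt(eps)
    return (2.0 * x - r, 2.0 * x + r)


def eps_sub_abs(x: float, eps: float) -> Interval:
    """eps-subdifferential of F(z) = |z| at x, derived from the definition."""
    if x > 0:
        return (max(-1.0, 1.0 - eps / x), 1.0)
    if x < 0:
        return (-1.0, min(1.0, -1.0 + eps / abs(x)))
    return (-1.0, 1.0)


def contained(inner: Interval, outer: Interval, tol: float = 1e-12) -> bool:
    return outer[0] <= inner[0] + tol and inner[1] <= outer[1] + tol


def passes_condition_12(x: float,
                        eps_grid: Tuple[float, ...] = tuple(10.0 ** k for k in range(-6, 3))
                        ) -> bool:
    """Check (12) for f1(z) = z**2, f2(z) = |z| on a finite grid of eps values."""
    return all(contained(eps_sub_abs(x, e), eps_sub_square(x, e)) for e in eps_grid)


for x in (0.5, -0.5, 0.0, 0.3):
    print(x, passes_condition_12(x))
```

The point $x = 0$ fails already for small $\varepsilon$ (there $\partial_\varepsilon |x| = [-1, 1]$ is not inside $[-2\sqrt{\varepsilon}, 2\sqrt{\varepsilon}]$), in agreement with the fact that it is a local maximizer of $f$.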

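Note that if $\varepsilon_1 > \varepsilon_2$ and

$$f'_{\varepsilon_1 \varepsilon_2}(x, g) := f'_{1\varepsilon_1}(x, g) - f'_{2\varepsilon_2}(x, g) < 0, \qquad (13)$$

where $f'_{i\varepsilon}(x, g)$ denotes the quantity of property 3) for $F = f_i$, then

$$\inf_{\alpha > 0} f(x + \alpha g) < f(x) + \varepsilon_2 - \varepsilon_1. \qquad (14)$$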


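Let us construct the following method for finding an inf-stationary point of $f$ (i.e. a point satisfying (11)).

Fix $\varepsilon_0 > 0$ and put $\mu_0 = \varepsilon_0/2$. Take an arbitrary $x_{00} \in \mathbb{R}^n$. Assume that the set

$$C = \{x \in \mathbb{R}^n : f(x) \le f(x_{00})\}$$

is bounded (then it is closed, since $f$ is continuous). If

$$B_{00} := \partial_{\mu_0} f_2(x_{00}) \subset A_{00} := \partial_{\varepsilon_0} f_1(x_{00}),$$

then we put $x_0 = x_{00}$. If

$$\partial_{\mu_0} f_2(x_{00}) \not\subset \partial_{\varepsilon_0} f_1(x_{00}),$$

then let us find

$$\max_{w \in B_{00}} \min_{v \in A_{00}} \|v - w\| = \|v_{00} - w_{00}\| = \rho_{00}$$

and put $g_{00} = (w_{00} - v_{00})/\|w_{00} - v_{00}\|$. Since $\rho_{00} > 0$,

$$f'_{\varepsilon_0 \mu_0}(x_{00}, g_{00}) < 0$$

and, by (13) and (14), we conclude that

$$\inf_{\alpha > 0} f(x_{00} + \alpha g_{00}) = f(x_{00} + \alpha_{00} g_{00}) < f(x_{00}) - \frac{\varepsilon_0}{2}. \qquad (15)$$

Now take $x_{01} = x_{00} + \alpha_{00} g_{00}$ and check the condition

$$B_{01} := \partial_{\mu_0} f_2(x_{01}) \subset A_{01} := \partial_{\varepsilon_0} f_1(x_{01}).$$

Continuing in the same manner, in a finite number of steps we shall find a point $x_{0s_0}$ such that

$$B_{0s_0} := \partial_{\mu_0} f_2(x_{0s_0}) \subset A_{0s_0} := \partial_{\varepsilon_0} f_1(x_{0s_0}) \qquad (16)$$

(this is due to (15) and the boundedness of $C$).

Put $x_0 = x_{0s_0}$. By (16),

$$B_0 := \partial_{\mu_0} f_2(x_0) \subset A_0 := \partial_{\varepsilon_0} f_1(x_0).$$

Let $x_k$ be constructed such that

$$\partial_{\mu_i} f_2(x_i) \subset \partial_{\varepsilon_i} f_1(x_i) \qquad \forall i = 0, \dots, k, \qquad (17)$$

where $\mu_i = \mu_0/2^i$, $\varepsilon_i = \varepsilon_0/2^i$.

Put $x_{k+1,0} = x_k$. If

$$\partial_{\mu_{k+1}} f_2(x_{k+1,0}) \subset \partial_{\varepsilon_{k+1}} f_1(x_{k+1,0}), \qquad (18)$$

then we take $x_{k+1} = x_{k+1,0}$. If (18) does not hold, we continue as above, and in a finite number of steps a point $x_{k+1,s_{k+1}}$ will be found such that

$$\partial_{\mu_{k+1}} f_2(x_{k+1,s_{k+1}}) \subset \partial_{\varepsilon_{k+1}} f_1(x_{k+1,s_{k+1}}),$$

and we put $x_{k+1} = x_{k+1,s_{k+1}}$.

As a result we construct a bounded sequence $\{x_k\}$ satisfying (17).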


Proposition 6 Any limit point of the sequence {x;} is 
an inf-stationary point of f. 


Difference of Max-Type (d.m.) Functions 
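Let

$$f(x) = f_1(x) - f_2(x),$$

where $f_1, f_2\colon \mathbb{R}^n \to \mathbb{R}$ are max-type functions:

$$f_1(x) = \max_{y \in G_1} \varphi_1(x, y), \qquad f_2(x) = \max_{y \in G_2} \varphi_2(x, y),$$

where $\varphi_1$ and $\varphi_2$ are continuous on $\mathbb{R}^n \times G_1$ and $\mathbb{R}^n \times G_2$, respectively, and there exist derivatives $\varphi'_{1x}$ and $\varphi'_{2x}$ which are continuous. The function $f$ (called a d.m. function) is quasidifferentiable. It is also continuously codifferentiable:

$$f(x + \Delta) = f(x) + \max_{[a,v] \in \underline{d} f(x)} \big[a + \langle v, \Delta \rangle\big] + \min_{[b,w] \in \overline{d} f(x)} \big[b + \langle w, \Delta \rangle\big] + o_x(\Delta),$$

where

$$\underline{d} f(x) = \operatorname{co}\big\{[a, v] : a = \varphi_1(x, y) - f_1(x),\ v = \varphi'_{1x}(x, y),\ y \in G_1\big\} \subset \mathbb{R} \times \mathbb{R}^n,$$

$$\overline{d} f(x) = \operatorname{co}\big\{[b, w] : b = f_2(x) - \varphi_2(x, y),\ w = -\varphi'_{2x}(x, y),\ y \in G_2\big\} \subset \mathbb{R} \times \mathbb{R}^n.$$

Here $a, b \in \mathbb{R}$; $v, w \in \mathbb{R}^n$. Now it is possible to employ the MCD for finding inf-stationary points.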
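When $G_1$ and $G_2$ are finite and each $\varphi_k(\cdot, y)$ is affine (an illustrative special case), the two sets above are polytopes with one generator per $y$, and the expansion holds with $o_x(\Delta) = 0$. The Python sketch builds the generators and compares the codifferential model with $f(x + \Delta)$; the data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def dm_codifferential(c: Sequence[float], P: Sequence[Vec], d: Sequence[float],
                      Q: Sequence[Vec], x: Sequence[float]
                      ) -> Tuple[float, List[Tuple[float, Vec]], List[Tuple[float, Vec]]]:
    """Codifferential of f = max_i(c_i + <p_i, x>) - max_j(d_j + <q_j, x>) at x.

    Returns f(x), hypodifferential generators [a, v] and hyperdifferential generators [b, w]."""
    phi1 = [ci + dot(pi, x) for ci, pi in zip(c, P)]
    phi2 = [dj + dot(qj, x) for dj, qj in zip(d, Q)]
    f1, f2 = max(phi1), max(phi2)
    hypo = [(val - f1, tuple(pi)) for val, pi in zip(phi1, P)]               # a = phi1 - f1, v = grad phi1
    hyper = [(f2 - val, tuple(-t for t in qj)) for val, qj in zip(phi2, Q)]  # b = f2 - phi2, w = -grad phi2
    return f1 - f2, hypo, hyper


def codifferential_model(fx: float, hypo, hyper, delta: Sequence[float]) -> float:
    """Right-hand side of the expansion without the remainder o_x(delta)."""
    return (fx + max(a + dot(v, delta) for a, v in hypo)
            + min(b + dot(w, delta) for b, w in hyper))


# Hypothetical data: f1 = max(x1 + x2, -x1, 2), f2 = max(x1, -x2).
c, P = [0.0, 0.0, 2.0], [(1.0, 1.0), (-1.0, 0.0), (0.0, 0.0)]
d, Q = [0.0, 0.0], [(1.0, 0.0), (0.0, -1.0)]
x = (1.0, 2.0)
fx, hypo, hyper = dm_codifferential(c, P, d, Q, x)
delta = (0.5, -3.0)
exact = dm_codifferential(c, P, d, Q, tuple(xi + di for xi, di in zip(x, delta)))[0]
print(codifferential_model(fx, hypo, hyper, delta), exact)  # 0.5 0.5
```

The generators satisfy the normalization (5) by construction: the active pieces give $a = 0$ and $b = 0$.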


Twice Codifferentiable Functions 

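A function $f\colon \mathbb{R}^n \to \mathbb{R}$ is called twice codifferentiable at $x \in \mathbb{R}^n$ if there exist convex compact sets $\underline{d}^2 f(x)$ and $\overline{d}^2 f(x) \subset \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^{n \times n}$ such that

$$f(x + \Delta) = f(x) + \max_{[a,v,A] \in \underline{d}^2 f(x)} \Big[a + \langle v, \Delta \rangle + \tfrac{1}{2}\langle A\Delta, \Delta \rangle\Big] + \min_{[b,w,B] \in \overline{d}^2 f(x)} \Big[b + \langle w, \Delta \rangle + \tfrac{1}{2}\langle B\Delta, \Delta \rangle\Big] + o_x(\Delta),$$

where

$$\frac{o_x(\alpha \Delta)}{\alpha^2} \to 0 \quad \text{as } \alpha \downarrow 0, \qquad \forall \Delta \in \mathbb{R}^n.$$

Here $\mathbb{R}^{n \times n}$ is the space of real $(n \times n)$-matrices.

The pair of sets $D^2 f(x) = [\underline{d}^2 f(x), \overline{d}^2 f(x)]$ is called a second-order codifferential of $f$ at $x$. If $f$ is twice c.d. in some neighborhood of $x$ and the mapping $D^2 f$ is Hausdorff continuous at $x$, then the function is called twice continuously codifferentiable at $x$.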
The class of twice c.d. functions is quite rich and enjoys a well-developed calculus (see [4]).

Let f be twice continuously c.d. on R”. Then the fol- 
lowing second order Newton-type method can be em- 
ployed to find inf-stationary points of f. 

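Take any $x_0 \in \mathbb{R}^n$. Let $x_k$ have already been constructed. Put

$$F_k(\Delta) = \max_{[a,v,A] \in \underline{d}^2 f(x_k)} \Big[a + \langle v, \Delta \rangle + \tfrac{1}{2}\langle A\Delta, \Delta \rangle\Big] + \min_{[b,w,B] \in \overline{d}^2 f(x_k)} \Big[b + \langle w, \Delta \rangle + \tfrac{1}{2}\langle B\Delta, \Delta \rangle\Big],$$

find

$$\min_{\Delta \in \mathbb{R}^n} F_k(\Delta) = F_k(\Delta_k),$$

and take $x_{k+1} = x_k + \Delta_k$.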

The sequence {x;} thus constructed converges (un- 
der some additional assumptions) to an inf-stationary 
point of f (see [1]). 


Quasidifferentiable Programming Problems 


Let functions $f$ and $h_i\colon \mathbb{R}^n \to \mathbb{R}$ ($i \in I = \{1, \dots, N\}$) be quasidifferentiable on $\mathbb{R}^n$ and let

$$\Omega = \{x \in \mathbb{R}^n : h_i(x) \le 0\ \ \forall i \in I\}.$$

Assume that $\Omega \neq \emptyset$. It is required to find

$$\text{(P)} \qquad \min_{x \in \Omega} f(x) = f^*.$$

The set $\Omega$ is called quasidifferentiable, and problem (P) is a quasidifferentiable (q.d.) programming problem. Necessary conditions for a minimum of $f$ on $\Omega$ are stated in > Quasidifferentiable optimization: Optimality conditions. If all the functions $f$ and $h_i$ are, in addition, continuously codifferentiable, then it is possible to extend the MCD to problem (P) (see [4]).

Another approach to problem (P) is based on the 
penalization technique. 

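We say that problem (P) is calm if

$$\limsup_{\varepsilon \downarrow 0} \frac{f^* - f^*_\varepsilon}{\varepsilon} \le B < \infty, \qquad (19)$$

where

$$f^*_\varepsilon = \inf_{x \in \Omega_\varepsilon} f(x), \qquad \Omega_\varepsilon = \{x \in \mathbb{R}^n : h_i(x) \le \varepsilon\ \ \forall i \in I\}, \qquad \varepsilon > 0.$$

Proposition 7 If the calmness condition (19) holds, then there exists $\lambda^* < \infty$ such that, for any $\lambda > \lambda^*$, the set of minimizers of the function $f$ on $\Omega$ coincides with the set of minimizers of the function

$$F(x, \lambda) = f(x) + \lambda \sum_{i \in I} h_i^+(x) \qquad (20)$$

on $\mathbb{R}^n$. Here $h_i^+(x) = \max\{0, h_i(x)\}$.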


Remark 8 Thus, the constrained optimization problem 
(P) is reduced to the unconstrained one. Since the func- 
tion F(x, A) is again quasidifferentiable, one can use 
methods for unconstrained optimization. Another con- 
dition (different from (19)) under which Proposition 7 
is valid was stated in [2]. 
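A one-dimensional illustration of Proposition 7 on hypothetical data: minimize $f(x) = x$ subject to $h(x) = 1 - x \le 0$. Here $f^* = 1$ and $f^*_\varepsilon = 1 - \varepsilon$, so the ratio in (19) equals 1 and the problem is calm. The Python grid search on $[-5, 5]$ below suggests that the penalty (20) recovers $x^* = 1$ for $\lambda = 2$, while for $\lambda = 0.5$ the function $F(\cdot, \lambda)$ decreases without bound as $x \to -\infty$ (the grid minimum then sits at the boundary point $-5$).

```python
from typing import Callable, Sequence


def exact_penalty(f: Callable[[float], float], hs: Sequence[Callable[[float], float]],
                  lam: float) -> Callable[[float], float]:
    """Penalty function (20): F(x, lam) = f(x) + lam * sum_i max(0, h_i(x))."""
    def F(x: float) -> float:
        return f(x) + lam * sum(max(0.0, h(x)) for h in hs)
    return F


f = lambda x: x
hs = [lambda x: 1.0 - x]                     # constraint x >= 1, minimizer x* = 1
grid = [i / 100.0 for i in range(-500, 501)]
for lam in (2.0, 0.5):
    F = exact_penalty(f, hs, lam)
    print(lam, min(grid, key=F))
```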


Remark 9 Problem (P) is called a d.c. programming problem if all the functions $f$ and $h_i$ ($i \in I$) are d.c., i.e. $f = f_1 - f_2$, $h_i = h_{1i} - h_{2i}$, where $f_1, f_2, h_{1i}, h_{2i}$ are convex. If the calmness condition (19) holds, then, by Proposition 7, problem (P) is reduced to that of minimizing the function $F(x, \lambda)$ (see (20)) for $\lambda$ sufficiently large. We have

$$h_i^+(x) = \max\{0, h_i(x)\} = \max\{0, h_{1i}(x) - h_{2i}(x)\} = h_{3i}(x) - h_{2i}(x),$$

where

$$h_{3i}(x) = \max\{h_{1i}(x), h_{2i}(x)\}.$$

The functions $h_{3i}$ and $h_{2i}$ are convex; therefore $h_i^+$ is d.c. and, hence, the function $F(x, \lambda)$ is also d.c., and one may use the method described above for d.c. functions.
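The identity $\max\{0, h_{1i} - h_{2i}\} = \max\{h_{1i}, h_{2i}\} - h_{2i}$ yields an explicit d.c. splitting $F(x, \lambda) = G(x) - H(x)$ with convex $G = f_1 + \lambda \sum_i h_{3i}$ and $H = f_2 + \lambda \sum_i h_{2i}$ (for $\lambda \ge 0$). A Python sketch that checks the splitting numerically on hypothetical data $f_1(x) = (x - 2)^2$, $f_2(x) = |x|$, $h_1(x) = x^2$, $h_2(x) = 1$:

```python
from typing import Callable, Sequence, Tuple

Fn = Callable[[float], float]


def dc_split_penalty(f1: Fn, f2: Fn, h1s: Sequence[Fn], h2s: Sequence[Fn],
                     lam: float) -> Tuple[Fn, Fn]:
    """Convex G, H with F(x, lam) = G(x) - H(x), via h_i^+ = max(h1i, h2i) - h2i."""
    def G(x: float) -> float:
        return f1(x) + lam * sum(max(h1(x), h2(x)) for h1, h2 in zip(h1s, h2s))

    def H(x: float) -> float:
        return f2(x) + lam * sum(h2(x) for h2 in h2s)

    return G, H


f1 = lambda x: (x - 2.0) ** 2
f2 = abs
h1s = [lambda x: x * x]
h2s = [lambda x: 1.0]          # constraint h(x) = x**2 - 1 <= 0
lam = 3.0


def F(x: float) -> float:
    """Penalty function (20) for f = f1 - f2, evaluated directly."""
    return f1(x) - f2(x) + lam * sum(max(0.0, h1(x) - h2(x)) for h1, h2 in zip(h1s, h2s))


G, H = dc_split_penalty(f1, f2, h1s, h2s, lam)
for x in (-2.0, -0.5, 0.0, 0.9, 1.5, 3.0):
    print(x, F(x), G(x) - H(x))
```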
Quasidifferentiability and the notion of the quasidifferential extend the subdifferential of convex analysis to a quite general class of nonconvex and nonsmooth, but directionally differentiable, functions. By using an ordered pair of convex sets, the quasidifferential copes in a natural way with both nonsmoothness and nonconvexity issues. Since its introduction by V.F. Demyanov, a number of quasidifferentiable optimization problems have been studied. Moreover, calculus rules have been developed and applications, among others in mechanics and engineering [5], have been considered. In addition, the related notion of the codifferential, which is more appropriate for numerical purposes, has been introduced.

Let us consider a classical optimization algorithm, the (anti)gradient method, and how it is modified for quasidifferentiable functions. First, recall that a nondifferentiable function does not have derivatives in the classical sense. One should instead use set-valued approximations at the points of nondifferentiability. This replacement introduces the following two problems in a gradient optimization algorithm. First, one has to calculate an appropriate direction of descent, which should be followed at a given iteration step. Moreover, the optimality conditions for a nondifferentiable function have a form dictated by the set-valued approximation of the derivative (i.e., one should check, at a given point, whether zero is an element of a set, or whether some relation between sets is satisfied). Several applications of the quasidifferentiability concept are briefly reviewed in this short article.


Nonsmooth Modeling 


Let us recall first the notion of quasidifferentiability in the sense of Demyanov. A function $f$ which is defined on an open set $X \subset \mathbb{R}^n$ and which is directionally differentiable at a point $x \in X$ is called quasidifferentiable if there exists an ordered pair of convex compact sets $[\underline{\partial} f(x), \overline{\partial} f(x)]$ in $\mathbb{R}^n \times \mathbb{R}^n$ which produces the directional derivative of the function by the following formula

$$f'(x, g) = \max_{w \in \underline{\partial} f(x)} \langle w, g \rangle + \min_{v \in \overline{\partial} f(x)} \langle v, g \rangle \qquad (1)$$

for all directions $g \in \mathbb{R}^n$. More details are given in > Quasidifferentiable optimization.

Relation (1) gives rise to a qualitative and quantitative nonsmooth approximation (quasilinearization) of a nonsmooth and nonconvex function $f$ at point $x$. Theoretical results on nonsmooth modeling can be found, among others, in [8,9,14].

The notion of the quasidifferential gives rise to non- 
smooth models, with applications in mechanics [5,11]. 
In particular, interesting nonconvex variational formu- 
lations can be written, as it is discussed in more de- 
tail in » Quasidifferentiable optimization: Variational 
formulations. They extend the variational inequalities, 
which are valid for the convex, nondifferentiable case, 
and constitute a parallel development to the hemivariational inequalities in the sense of P.D. Panagiotopoulos (see also > Nonconvex energy functions: Hemivariational inequalities; > Hemivariational inequalities: Applications in mechanics, as well as [12]). Furthermore,
quasidifferential and codifferential optimization tech- 
niques can be used for the construction of numerical 


algorithms for problems of nonsmooth computational 
mechanics [5]. 


Nonsmooth and Nonconvex Optimization 


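The notion of the quasidifferential allows one to calculate a steepest descent direction of a quasidifferentiable function $f(x)$ at a given point $x_0$. Assume that at point $x_0$ one has the subdifferential $\underline{\partial} f(x_0)$ and the superdifferential $\overline{\partial} f(x_0)$. Then a steepest descent direction $h$ can be calculated by

$$h = -\frac{w_1^* + w_2^*}{\|w_1^* + w_2^*\|} \qquad (2)$$

for $w_1^* \in \underline{\partial} f(x_0)$, $w_2^* \in \overline{\partial} f(x_0)$ such that

$$\|w_1^* + w_2^*\| = \max_{w_2 \in \overline{\partial} f(x_0)} \min_{w_1 \in \underline{\partial} f(x_0)} \|w_1 + w_2\|,$$

provided this value is positive (if it is zero, $x_0$ satisfies the necessary optimality condition $-\overline{\partial} f(x_0) \subset \underline{\partial} f(x_0)$).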

Moreover, there exist necessary (and in some cases sufficient) set-valued optimality conditions for quasidifferentiable optimization problems (see > Quasidifferentiable optimization). Thus one has whatever is needed for the construction of a numerical algorithm. Calculus rules exist for the construction of the quasidifferential (see > Quasidifferentiable optimization: Calculus of quasidifferentials), if it is not obvious from the definition of the optimization problem. Stopping rules for an optimization algorithm can also be extracted. In fact, if the optimality criteria are satisfied, then an (at least local) minimum point has been calculated. Otherwise, one can calculate a steepest descent direction by (2) and proceed with a (steepest descent like) numerical optimization scheme. In this respect, the related notion of codifferentiability has certain advantages for the numerical implementation. More details can be found in > Quasidifferentiable optimization: Codifferentiable functions and in [3,6].

It is worth noting that formula (2) may admit multiple solutions. This should be expected, since one deals with nonconvex (global) optimization problems. This is actually one of the advantages of the quasidifferentiability concept since, theoretically, if one follows all possible directions of descent which may arise along an iterative algorithm, one should be able to calculate multiple solutions (i.e., local minima).
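Formula (2) is easy to evaluate when both sets are axis-aligned boxes (an assumption made here for brevity). For fixed $w_2$ the inner minimum is attained at the $w_1$ obtained by clamping $-w_2$ coordinatewise to $\underline{\partial} f(x_0)$, and because this distance is convex in $w_2$, its maximum over the box $\overline{\partial} f(x_0)$ is attained at a corner. The Python sketch returns all maximizing corners and hence, possibly, several steepest descent directions; for $f(x) = |x_1| - |x_2|$ at $x_0 = 0$ it finds two of them, $h = (0, \pm 1)$, both with $f'(0, h) = -1$. Function names are illustrative.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Box = Tuple[Sequence[float], Sequence[float]]  # (lower corner, upper corner)


def clamp(t: float, lo: float, hi: float) -> float:
    return min(max(t, lo), hi)


def steepest_descent_directions(sub: Box, sup: Box, tol: float = 1e-12
                                ) -> Tuple[float, List[Tuple[float, ...]]]:
    """Solutions of (2) when sub- and superdifferential are axis-aligned boxes.

    Returns the max-min value and one direction h per maximizing corner of `sup`.
    An empty list means the value is ~0, i.e. no descent direction is produced."""
    lo1, hi1 = sub
    corners = sorted(set(itertools.product(*zip(*sup))))   # corners of the box sup
    best, sums = -1.0, []
    for w2 in corners:
        w1 = tuple(clamp(-t, l, h) for t, l, h in zip(w2, lo1, hi1))  # nearest point to -w2
        s = tuple(a + b for a, b in zip(w1, w2))
        r = math.sqrt(sum(t * t for t in s))
        if r > best + tol:
            best, sums = r, [s]
        elif abs(r - best) <= tol:
            sums.append(s)
    if best <= tol:
        return best, []
    return best, [tuple(-t / best for t in s) for s in sums]


# f(x) = |x1| - |x2| at x0 = 0: sub = [-1, 1] x {0}, sup = {0} x [-1, 1].
value, dirs = steepest_descent_directions(((-1.0, 0.0), (1.0, 0.0)),
                                          ((0.0, -1.0), (0.0, 1.0)))
print(value, dirs)
```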

More information on smooth (convex and nonconvex) optimization and appropriate algorithms developed for these problems can be found, among others, in [2,7]. Note also the multi-objective programming approach for the solution of systems of quasidifferentiable equations which has been developed in [13].


Multilevel and Marginal Function Optimization 


Interesting results on the application of the quasidiffer- 
entiability concept for the sensitivity analysis and algo- 
rithms for multilevel optimization problems have been 
presented in [1,10]. 


Applications in Nonsmooth Mechanics


Quasidifferential modeling and optimization have been used for nonsmooth mechanics applications. As already mentioned, these results can be found in [5,11]. A number of recent (2000) applications of quasidifferentiability can be found in [4].
Certain semipermeability or temperature control problems in thermoelasticity, which may be combined with analogous mechanical unilateral contact effects, can be formulated and studied in a unified framework by nonsmooth modeling techniques [7]. The theory of quasidifferentiable optimization, in the sense of V.F. Demyanov and A.M. Rubinov, provides a general framework for the treatment of both convex and nonconvex, nonsmooth modeling problems [1,2,3,4]. Coupled thermal and kinematical nonconvex unilateral effects will be modeled in the sequel by using the quasidifferentiable optimization approach. Analogous formulations based on the notion of hemivariational inequalities have been proposed and studied for semipermeability and thermal problems by P.D. Panagiotopoulos et al. [6,7]. An extension to thermoviscoelasticity has recently been published in [5].

This short article is mainly based on the results pre- 
sented in [4,7], where more details can be found. 


Classical Thermoelastic Model 


Let us consider a thermoelastic medium in the Euclidean space $\mathbb{R}^3$. A point is denoted by $x$ and its coordinates with respect to a fixed Cartesian coordinate system $Ox_1x_2x_3$ by $x_i$, $i = 1, 2, 3$. The time variable $t$ takes values in the interval $[0, T] \subset \mathbb{R}$. Moreover, let $u = u(x, t)$ be the displacement of the material point $x$ at time $t$ with reference to the natural state of the body, which is characterized by zero stresses and a constant absolute temperature $\theta_0 > 0$. The density at point $x$ of the natural state is denoted by $\rho = \rho(x)$, and the open, bounded, connected subset of $\mathbb{R}^3$ occupied by the body is denoted by $\Omega$. As usual, the boundary $\Gamma$ of $\Omega$ is assumed to be regular.

The behavior of a linear thermoelastic body is governed by the following constitutive equations for the stress tensor $\sigma = \{\sigma_{ij}\}$, $i, j = 1, 2, 3$, and the specific entropy deviation $\eta - \eta_0$ (where $\eta_0$ is the specific entropy of the natural state):

$$\sigma_{ij} = \tau_{ij} - m_{ij}(\theta - \theta_0) = C_{ijhk}\varepsilon_{hk} - m_{ij}(\theta - \theta_0), \qquad (1)$$

$$\eta - \eta_0 = \frac{1}{\rho} m_{ij}\varepsilon_{ij} + \frac{c_p}{\theta_0}(\theta - \theta_0). \qquad (2)$$

Here $\theta = \theta(x, t)$ is the absolute temperature, and $\varepsilon = \{\varepsilon_{ij}\}$ is the strain tensor, which is related to the displacements by the small deformation elasticity relation

$$\varepsilon_{ij}(u) = \frac{1}{2}(u_{i,j} + u_{j,i}).$$


Here $C = \{C_{ijhk}\}$, $i, j, h, k = 1, 2, 3$, is the elasticity tensor, which satisfies the well-known symmetry and ellipticity conditions, $m = \{m_{ij}\}$ is the symmetric tensor of thermal expansion, and $c_p = c_p(x) > 0$ is the specific heat at zero strain of the body. $C(x)$, $m(x)$ and $c_p(x)$ refer to the natural state of the body. The equations of motion read:

$$\rho u_i'' = \sigma_{ij,j} + f_i, \qquad (3)$$

and the law of conservation of energy has the form

$$\rho \theta_0 \eta' = -q_{i,i} + Q. \qquad (4)$$
Here $f = \{f_i\}$, $f_i = f_i(x, t)$, is the volume force vector, $q = \{q_i\}$, $q_i = q_i(x, t)$, is the heat flux vector and $Q = Q(x, t)$ is the radiant heating per unit volume. Fourier's law of heat conduction reads:

$$q_i = -k_{ij}\theta_{,j}. \qquad (5)$$

The symmetric tensor of thermal conductivity $k = \{k_{ij}\}$, $k_{ij} = k_{ij}(x)$, refers to the natural state of the body and satisfies the condition

$$k_{ij}\alpha_i\alpha_j \ge c\,\alpha_i\alpha_i \qquad \forall \alpha = \{\alpha_i\} \in \mathbb{R}^3,$$

where $c$ is a positive constant. These relations lead to the following system of differential equations, which describe the linear thermoelastic behavior of a generally nonhomogeneous and anisotropic body:

$$\rho u_i'' = f_i + \big(C_{ijhk}\varepsilon_{hk}(u)\big)_{,j} - \big(m_{ij}(\theta - \theta_0)\big)_{,j} \quad \text{in } \Omega \times (0, T), \qquad (6)$$

$$\rho c_p \theta' - (k_{ij}\theta_{,j})_{,i} + m_{ij}\theta_0\,\varepsilon_{ij}(u') = Q \quad \text{in } \Omega \times (0, T). \qquad (7)$$


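In the sequel the following initial conditions at $t = 0$ are assumed:

$$u_i = u_{0i}(x), \qquad (8)$$

$$u_i' = u_{1i}(x) \quad \text{in } \Omega, \qquad (9)$$

and

$$\theta = \Theta(x) \quad \text{in } \Omega. \qquad (10)$$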
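As a pointwise illustration of the constitutive relation (1) and the strain-displacement relation, the Python sketch below evaluates $\sigma_{ij} = C_{ijhk}\varepsilon_{hk} - m_{ij}(\theta - \theta_0)$ for an isotropic material, $C_{ijhk} = \lambda\delta_{ij}\delta_{hk} + \mu(\delta_{ih}\delta_{jk} + \delta_{ik}\delta_{jh})$, with $m_{ij} = \beta\delta_{ij}$. Isotropy and the values of the Lamé constants $\lambda$, $\mu$, of $\beta$ and of the displacement gradient are assumptions for the example; the text itself allows general anisotropic tensors. For this choice $\sigma = \lambda\,\mathrm{tr}(\varepsilon) I + 2\mu\varepsilon - \beta(\theta - \theta_0) I$, which the result can be checked against.

```python
from typing import List

Tensor2 = List[List[float]]


def delta(i: int, j: int) -> float:
    return 1.0 if i == j else 0.0


def isotropic_elasticity(lam: float, mu: float) -> List[List[List[List[float]]]]:
    """C_ijhk = lam d_ij d_hk + mu (d_ih d_jk + d_ik d_jh), stored as C[i][j][h][k]."""
    r = range(3)
    return [[[[lam * delta(i, j) * delta(h, k)
               + mu * (delta(i, h) * delta(j, k) + delta(i, k) * delta(j, h))
               for k in r] for h in r] for j in r] for i in r]


def strain(grad_u: Tensor2) -> Tensor2:
    """eps_ij = (u_i,j + u_j,i) / 2 with grad_u[i][j] = du_i / dx_j."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)] for i in range(3)]


def stress(C, m: Tensor2, eps: Tensor2, dtheta: float) -> Tensor2:
    """Relation (1): sigma_ij = C_ijhk eps_hk - m_ij (theta - theta_0)."""
    return [[sum(C[i][j][h][k] * eps[h][k] for h in range(3) for k in range(3))
             - m[i][j] * dtheta for j in range(3)] for i in range(3)]


# Hypothetical data: Lame constants, thermal coefficient beta, displacement gradient.
lam, mu, beta = 2.0, 1.0, 0.5
m = [[beta * delta(i, j) for j in range(3)] for i in range(3)]
grad_u = [[1e-3, 2e-3, 0.0], [0.0, -1e-3, 0.0], [0.0, 0.0, 5e-4]]
eps = strain(grad_u)
sigma = stress(isotropic_elasticity(lam, mu), m, eps, dtheta=10.0)
print(sigma)
```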


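Let the following bilinear forms be introduced:

$$a(u, v) = \int_\Omega C_{ijhk}\varepsilon_{ij}(u)\varepsilon_{hk}(v)\,d\Omega, \qquad (u, v) = \int_\Omega u_i v_i\,d\Omega,$$

$$M_1(\theta, v) = \int_\Omega (m_{ij}\theta)_{,j} v_i\,d\Omega, \qquad M_2(u, \varphi) = \int_\Omega m_{ij} u_{i,j}\varphi\,d\Omega,$$

$$K(\theta, \varphi) = \int_\Omega k_{ij}\theta_{,j}\varphi_{,i}\,d\Omega, \qquad (\theta, \varphi) = \int_\Omega \theta\varphi\,d\Omega.$$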


Quasidifferential Thermal Boundary Conditions 


In order to complete the description of the previous boundary value problem one needs to specify boundary conditions for the thermal and for the elasticity problem. First, let us assume that between the boundary temperature and the heat flux the following quasidifferential (QD) superpotential relation holds:

$$q_i n_i = -k_{ij}\theta_{,j} n_i = w_1(\theta, t) + w_2(\theta, t), \quad \{w_1(\theta, t), w_2(\theta, t)\} \in Dj(\theta, t) \quad \text{on } \Gamma_1 \times (0, T), \qquad (11)$$

where $\Gamma_1 \subset \Gamma$, and on the remaining part of the boundary one assumes, for simplicity, that

$$\theta = 0 \quad \text{on } \Gamma - \Gamma_1. \qquad (12)$$

For the displacements, a simple boundary condition is considered:

$$u_i = 0 \quad \text{on } \Gamma \times (0, T). \qquad (13)$$

Here $n = \{n_i\}$ denotes, as usual, the unit normal to $\Gamma$ directed towards the exterior of $\Omega$.


Variational Formulation 

One follows here the usual way for the construction of the variational or weak formulation of the previous boundary value problem (see also > Quasidifferentiable optimization: Variational formulations; > Hemivariational inequalities: Applications in mechanics). Let the virtual variations $v - u'$ and $\varphi - \theta$ be sufficiently smooth. Then, by multiplying (6) and (7) by $v - u'$ and $\varphi - \theta$, respectively, integrating over $\Omega$, and using the Green-Gauss theorem, one obtains the variational equalities

$$(\rho u'', v - u') + a(u, v - u') + M_1(\theta - \theta_0, v - u') = (f, v - u') + \int_\Gamma \tau_{ij} n_j (v_i - u_i')\,d\Gamma \quad \text{in } (0, T) \qquad (14)$$

and

$$(\rho c_p \theta', \varphi - \theta) + K(\theta, \varphi - \theta) + M_2(\theta_0 u', \varphi - \theta) = (Q, \varphi - \theta) + \int_\Gamma k_{ij}\theta_{,j} n_i (\varphi - \theta)\,d\Gamma \quad \text{in } (0, T). \qquad (15)$$
Now let us assume that $C_{ijhk}$, $k_{ij}$, $m_{ij}$, $\rho > 0$ and $c_p > 0$ are elements of $L^\infty(\Omega)$, and that $f(t) \in [L^2(\Omega)]^3$ and $Q(t) \in L^2(\Omega)$. Moreover, the spaces $[H^1_0(\Omega)]^3$ for $v, u'$ and $H^1(\Omega)$ for $\varphi, \theta$ are introduced.

Let us recall that a QD boundary condition (for instance, relation (11)) gives rise, due to the definition of the quasidifferential, to a min-max relation. This relation is used for the formulation of nonconvex variational problems, as discussed in more detail in > Quasidifferentiable optimization: Variational formulations.

Thus, the variational equalities (14) and (15), combined with the boundary conditions (11)-(13), lead to the following variational problem: find functions $u:[0,T]\to[H^1_0(\Omega)]^3$ and $\theta:[0,T]\to\Phi = \{\varphi\in H^1(\Omega): \varphi = 0 \text{ on } \Gamma-\Gamma_1\}$, with $u'(t)\in[H^1_0(\Omega)]^3$, $u''(t)\in[L^2(\Omega)]^3$, $\theta'(t)\in L^2(\Omega)$, which satisfy the initial conditions and the variational expressions


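$$
(\rho u'', v-u') + a(u, v-u') + M_1(\theta-\theta_0, v-u') = (f, v-u') \quad \forall v\in[H^1_0(\Omega)]^3 \tag{16}
$$

and

$$
\begin{aligned}
&(\rho c_p\theta', \varphi-\theta) + K(\theta, \varphi-\theta) + M_2(\theta_0 u', \varphi-\theta)\\
&\quad + \max\{(w_1, \varphi-\theta): w_1\in\underline{\partial} j(\theta,t)\} + \min\{(w_2, \varphi-\theta): w_2\in\overline{\partial} j(\theta,t)\}\\
&\quad = (Q, \varphi-\theta) \quad \forall\varphi\in\Phi.
\end{aligned} \tag{17}
$$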


Quasidifferential Elastic Boundary Conditions 


Assume now simple thermal boundary conditions, i.e.,

$$
\theta = \theta_0 \quad \text{on } \Gamma\times(0,T). \tag{18}
$$


For the elasticity problem, let a nonmonotone, possibly multivalued quasidifferential (QD) boundary law hold on a part $\Gamma_S$ of the boundary $\Gamma$:

$$
\begin{aligned}
-S = \{-S_i\} = \{-\sigma_{ij}n_j\} &= S_1(u',x,t) + S_2(u',x,t),\\
\{S_1(u',x,t), S_2(u',x,t)\} &\in Dw(u',x,t) \quad \text{on } \Gamma_S\times(0,T).
\end{aligned} \tag{19}
$$


On the remaining part of the boundary one simply assumes that

$$
u_i = U_i \quad \text{on } \Gamma_U\times(0,T). \tag{20}
$$

Here $\Gamma = \Gamma_U\cup\Gamma_S$, where $\Gamma_U$ and $\Gamma_S$ are nonempty, disjoint, open sets, and $U_i = U_i(x,t)$ is a prescribed displacement vector on $\Gamma_U$ (which should be compatible with the initial conditions (8)-(9)).

In an analogous way one proceeds with the boundary value problem defined by relations (6)-(9) and (18)-(19). Let $v, u'\in[H^1(\Omega)]^3$ be such that $v = u' = U'(t)$ on $\Gamma_U$, and $\varphi,\theta\in H^1(\Omega)$ with $\varphi = \theta = \theta_0$ on $\Gamma$. In this case one gets the variational problem: find $u:[0,T]\to[H^1(\Omega)]^3$ with $u' = U'$ on $\Gamma_U$ and $\theta:[0,T]\to H^1(\Omega)$ with $\theta = \theta_0$ on $\Gamma$, with $u'(t)\in[H^1(\Omega)]^3$, $u''(t)\in[L^2(\Omega)]^3$, $\theta'(t)\in L^2(\Omega)$, which satisfy the initial conditions and the variational expressions


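$$
\begin{aligned}
&(\rho u'', v-u') + a(u, v-u') + M_1(\theta-\theta_0, v-u')\\
&\quad + \max\{(S_1, v-u'): S_1\in\underline{\partial} w(u',t)\} + \min\{(S_2, v-u'): S_2\in\overline{\partial} w(u',t)\}\\
&\quad = (f, v-u') \qquad \forall v\in[H^1(\Omega)]^3 \text{ with } v = U'(t) \text{ on } \Gamma_U
\end{aligned} \tag{21}
$$

and

$$
(\rho c_p\theta', \varphi-\theta) + K(\theta, \varphi-\theta) + M_2(\theta_0 u', \varphi-\theta) = (Q, \varphi-\theta) \quad \forall\varphi\in H^1(\Omega) \text{ with } \varphi = \theta_0 \text{ on } \Gamma. \tag{22}
$$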


More general thermoelastic problems may be formulated by considering QD laws for both the elasticity and the thermal parts of the problem, or even mixed laws.

A function $f$ defined on an open set $X\subset\mathbb{R}^n$ and directionally differentiable at a point $x\in X$ is called quasidifferentiable (in the sense of V.F. Demyanov) if there exists an ordered pair of convex compact sets $[U, V]$ in $\mathbb{R}^n\times\mathbb{R}^n$ which produces the directional derivative of the function by the formula

$$
f'(x, g) = \max_{h\in U}\langle h, g\rangle + \min_{h\in V}\langle h, g\rangle \tag{1}
$$

for all directions $g\in\mathbb{R}^n$.
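As an illustration of (1), the following Python sketch evaluates its right-hand side when $U$ and $V$ are polytopes given by finite lists of generating points. This is an assumption made only so that the maximum and minimum can be taken over finitely many points (a linear function attains its extrema over a polytope at a vertex). The test function $f(x) = |x_1| - |x_2|$ at $x = 0$, with $U = \operatorname{co}\{(\pm 1, 0)\}$ and $V = \operatorname{co}\{(0, \pm 1)\}$, is a difference of convex functions chosen purely for illustration, and the helper names are hypothetical.

```python
# Sketch: evaluate f'(x, g) = max_{h in U} <h, g> + min_{h in V} <h, g>
# for polytopes U = co(U_pts), V = co(V_pts) (an illustrative assumption).
from typing import Sequence

Point = Sequence[float]


def dot(h: Point, g: Point) -> float:
    return sum(hi * gi for hi, gi in zip(h, g))


def quasi_dir_derivative(U_pts: Sequence[Point], V_pts: Sequence[Point], g: Point) -> float:
    # A linear function attains its max/min over a polytope at a generating point.
    return max(dot(h, g) for h in U_pts) + min(dot(h, g) for h in V_pts)


# f(x) = |x1| - |x2| at x = 0: subdifferential from |x1|, superdifferential from -|x2|.
U_pts = [(1.0, 0.0), (-1.0, 0.0)]
V_pts = [(0.0, 1.0), (0.0, -1.0)]


def f(x: Point) -> float:
    return abs(x[0]) - abs(x[1])


alpha = 1e-6  # small step for a one-sided difference quotient, for comparison only
for g in [(1.0, 2.0), (-3.0, 0.5), (0.0, -1.0)]:
    quotient = (f((alpha * g[0], alpha * g[1])) - f((0.0, 0.0))) / alpha
    print(g, quasi_dir_derivative(U_pts, V_pts, g), quotient)
```

Both columns agree with $|g_1| - |g_2|$, the directional derivative of $f$ at the origin.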

Quasidifferentiability is a genuine generalization both of the classical differentiability concept, which is valid for smooth functions, and of the subdifferential of convex analysis, which, in turn, is a set-valued differential valid for convex, nondifferentiable functions. An ordered pair of convex sets is used for the approximation of the directional derivative in (1). More details can be found in the companion article → Quasidifferentiable optimization; the links with other notions of nonsmooth analysis are discussed in → Quasidifferentiable optimization: Dini derivatives, Clarke derivatives; and applications are briefly presented in → Quasidifferentiable optimization: Applications. One may also consult the original publications [2,3,4,5,6].

In classical differential calculus, for smooth (differentiable) functions, the derivatives of sums, differences, composite functions, etc., are easily calculated: one uses calculus rules together with the derivatives of the functions involved (cf. the chain rule of differentiation). For quasidifferentiable functions appropriate calculus rules also exist [2]. The situation is more complicated here, since one manipulates ordered pairs of convex sets. Furthermore, calculus rules have been developed for composite functions built from a finite number of smooth constituents by applying a finite number of minimum or maximum operations. Moreover, since the quasidifferential of a given function is not uniquely determined (it is actually a class of equivalent ordered pairs of convex sets), one may wish to simplify the results of such a calculus operation.

Since quasidifferentials have found a number of applications, among others in optimization, mechanics, control theory and economics, there is a clear need to refine the quasidifferential calculus and to incorporate it into automatic computational procedures (e.g., in computer algebra systems, in analogy to classical systems [1]). For the latter task, which at present remains open for future research, results developed within the theory of interval analysis may be advantageous.

Calculus rules for one-dimensional functions (defined on $\mathbb{R}$) and for functions defined on $\mathbb{R}^n$ are given here without proofs; see [2,3] for more details.


One-Dimensional Case 


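Functions on $\mathbb{R}^1$ and sets which are intervals of the real line $\mathbb{R}^1$ are considered first. Let $D_1$ and $D_2$ be two pairs of closed intervals, $D_1 = [A_1, B_1]$, $D_2 = [A_2, B_2]$, where $A_1 = [v_{11}, v_{12}]$, $B_1 = [w_{11}, w_{12}]$, $A_2 = [v_{21}, v_{22}]$, $B_2 = [w_{21}, w_{22}]$, with $v_{i1}\le v_{i2}$, $w_{i1}\le w_{i2}$ for all $i\in\{1,2\}$. Addition of such pairs is defined by

$$
D = D_1 + D_2 = [A_1 + A_2,\ B_1 + B_2] = [A, B],
$$

where $A = [v_{11}+v_{21},\ v_{12}+v_{22}]$ and $B = [w_{11}+w_{21},\ w_{12}+w_{22}]$. Moreover, for $D = [A, B]$, $A = [v_1, v_2]$, $B = [w_1, w_2]$, $v_1\le v_2$, $w_1\le w_2$, multiplication by a scalar $\lambda$ is defined by

$$
\lambda D = \begin{cases} [\lambda A, \lambda B], & \lambda\ge 0,\\ [\lambda B, \lambda A], & \lambda<0, \end{cases}
$$

where on the right-hand side $\lambda A = [\lambda v_1, \lambda v_2]$ and $\lambda B = [\lambda w_1, \lambda w_2]$ for $\lambda\ge 0$, while for $\lambda<0$ the endpoints are interchanged so that the intervals remain ordered, i.e. $\lambda A = [\lambda v_2, \lambda v_1]$ and $\lambda B = [\lambda w_2, \lambda w_1]$.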
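The interval-pair operations just defined can be sketched in Python as follows. The class names `Interval` and `IntervalPair` are hypothetical, and reading the first component as a subdifferential and the second as a superdifferential anticipates the calculus rules that follow. The example builds a pair for $-|x|$ at $0$ by scaling the pair $[[-1,1],[0,0]]$ of $|x|$, and a second, equivalent pair as $|x| + (-2)|x|$.

```python
# Sketch of the interval-pair arithmetic of the one-dimensional case.
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, lam: float) -> "Interval":
        # For lam < 0 the endpoints are interchanged so that lo <= hi still holds.
        a, b = lam * self.lo, lam * self.hi
        return Interval(min(a, b), max(a, b))


@dataclass(frozen=True)
class IntervalPair:
    A: Interval  # first component (subdifferential in the rules below)
    B: Interval  # second component (superdifferential)

    def __add__(self, other: "IntervalPair") -> "IntervalPair":
        return IntervalPair(self.A + other.A, self.B + other.B)

    def scale(self, lam: float) -> "IntervalPair":
        if lam >= 0:
            return IntervalPair(self.A.scale(lam), self.B.scale(lam))
        return IntervalPair(self.B.scale(lam), self.A.scale(lam))  # components swap

    def dir_derivative(self, g: float) -> float:
        # max_{h in A} h*g + min_{h in B} h*g, i.e. formula (1) with n = 1
        return max(self.A.lo * g, self.A.hi * g) + min(self.B.lo * g, self.B.hi * g)


abs_at_0 = IntervalPair(Interval(-1.0, 1.0), Interval(0.0, 0.0))  # |x| at x = 0
neg_abs = abs_at_0.scale(-1.0)           # a pair for -|x|
combo = abs_at_0 + abs_at_0.scale(-2.0)  # another pair for |x| - 2|x| = -|x|
for g in (1.0, -0.5):
    print(g, neg_abs.dir_derivative(g), combo.dir_derivative(g))
```

The two pairs differ but produce the same directional derivative $-|g|$, which illustrates that a quasidifferential is determined only up to equivalence.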

Based on these results concerning calculus of closed 
intervals one derives calculus rules for quasidifferentials 
in the one-dimensional case. 

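Let $f_1$ be a directionally differentiable function at a point $x\in\mathbb{R}^1$ and let $Df_1(x) = [\underline{\partial} f_1(x), \overline{\partial} f_1(x)]$ be its quasidifferential at $x$, with $\underline{\partial} f_1(x) = [v_{11}, v_{12}]$, $\overline{\partial} f_1(x) = [w_{11}, w_{12}]$, $v_{11}\le v_{12}$, $w_{11}\le w_{12}$. Then the function $f = \lambda f_1$ is also directionally differentiable at $x$ and admits a quasidifferential of the form $Df(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$, where

$$
\underline{\partial} f(x) = \begin{cases}[\lambda v_{11}, \lambda v_{12}], & \lambda\ge 0,\\ [\lambda w_{12}, \lambda w_{11}], & \lambda<0,\end{cases}
\qquad
\overline{\partial} f(x) = \begin{cases}[\lambda w_{11}, \lambda w_{12}], & \lambda\ge 0,\\ [\lambda v_{12}, \lambda v_{11}], & \lambda<0.\end{cases}
$$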


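If, in addition, $f_1(x)\ne 0$, then the function $f = 1/f_1$ is also directionally differentiable at $x$ and

$$
Df(x) = -\frac{1}{f_1^2(x)}\, Df_1(x) = [\underline{\partial} f(x), \overline{\partial} f(x)],
$$

where

$$
\underline{\partial} f(x) = \Big[-\frac{w_{12}}{f_1^2(x)},\ -\frac{w_{11}}{f_1^2(x)}\Big], \qquad
\overline{\partial} f(x) = \Big[-\frac{v_{12}}{f_1^2(x)},\ -\frac{v_{11}}{f_1^2(x)}\Big].
$$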
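A small Python sketch of the two rules above (scalar multiple and reciprocal) follows; the function names are hypothetical. The test function $f_1(x) = 2 + |x| - 0.5|x|$ at $x = 0$ is an assumed example, deliberately written with the non-minimal quasidifferential $[[-1,1],[-0.5,0.5]]$, and the result is compared with a one-sided difference quotient of $1/f_1$.

```python
# Sketch: one-dimensional rules for lam*f1 and 1/f1.
def scale_pair(lam, sub, sup):
    """Quasidifferential of lam*f1 from sub = (v11, v12), sup = (w11, w12)."""
    (v11, v12), (w11, w12) = sub, sup
    if lam >= 0:
        return (lam * v11, lam * v12), (lam * w11, lam * w12)
    return (lam * w12, lam * w11), (lam * v12, lam * v11)


def reciprocal_pair(f1x, sub, sup):
    """Quasidifferential of 1/f1 at x: -1/f1(x)**2 times Df1(x); needs f1(x) != 0."""
    if f1x == 0:
        raise ValueError("f1(x) must be nonzero")
    return scale_pair(-1.0 / f1x ** 2, sub, sup)


def dir_derivative(sub, sup, g):
    return max(sub[0] * g, sub[1] * g) + min(sup[0] * g, sup[1] * g)


def f1(x):
    return 2.0 + abs(x) - 0.5 * abs(x)  # quasidifferential at 0: [[-1, 1], [-0.5, 0.5]]


sub, sup = reciprocal_pair(f1(0.0), (-1.0, 1.0), (-0.5, 0.5))
alpha = 1e-7
for g in (1.0, -2.0):
    quotient = (1.0 / f1(alpha * g) - 1.0 / f1(0.0)) / alpha
    print(g, dir_derivative(sub, sup, g), quotient)
```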


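Let us consider two directionally differentiable functions $f_1, f_2$ at a point $x$ and let $Df_1(x)$, $Df_2(x)$ be their quasidifferentials,

$$
Df_1(x) = [\underline{\partial} f_1(x), \overline{\partial} f_1(x)], \qquad Df_2(x) = [\underline{\partial} f_2(x), \overline{\partial} f_2(x)],
$$

with the corresponding intervals denoted by

$$
\underline{\partial} f_1(x) = [v_{11}, v_{12}],\quad \overline{\partial} f_1(x) = [w_{11}, w_{12}],\quad
\underline{\partial} f_2(x) = [v_{21}, v_{22}],\quad \overline{\partial} f_2(x) = [w_{21}, w_{22}],
$$

$v_{i1}\le v_{i2}$, $w_{i1}\le w_{i2}$, $\forall i\in\{1,2\}$.

Then the function $f = f_1 + f_2$ is also directionally differentiable at $x$ and one can take $Df(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$, where

$$
\underline{\partial} f(x) = \underline{\partial} f_1(x) + \underline{\partial} f_2(x) = [v_1, v_2], \qquad
\overline{\partial} f(x) = \overline{\partial} f_1(x) + \overline{\partial} f_2(x) = [w_1, w_2],
$$

with $v_1 = v_{11}+v_{21}$, $v_2 = v_{12}+v_{22}$, $w_1 = w_{11}+w_{21}$ and $w_2 = w_{12}+w_{22}$.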

Analogously one proceeds with the product of two functions $f = f_1 f_2$, where

$$
Df(x) = f_1(x)\,Df_2(x) + f_2(x)\,Df_1(x) = [\underline{\partial} f(x), \overline{\partial} f(x)].
$$


Furthermore, let $g_i(x)$ ($i\in I = \{1,\dots,N\}$) be directionally differentiable functions at a point $x$. The functions

$$
f_1(x) = \max_{i\in I} g_i(x), \qquad f_2(x) = \min_{i\in I} g_i(x)
$$

are also quasidifferentiable.

Finally, let $f(z_1,\dots,z_m)$ be a smooth function and let $y_1,\dots,y_m$ be quasidifferentiable functions at a point $x_0$. Then the function $F(x) = f(y_1(x),\dots,y_m(x))$ is also quasidifferentiable at $x_0$.

One concludes that the family of quasidifferentiable 
functions is a linear space, closed with respect to all 
smooth operations, as well as the operations of taking 
the pointwise maximum and minimum over a finite 
number of functions. 


Finite-Dimensional Case 


In this case one needs calculus rules for pairs of convex sets of $\mathbb{R}^n$ (see, e.g., [4,6]).

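Let the functions $f, f_1, f_2$ be quasidifferentiable at $x$ and let $\lambda$ be a real number. Then the sum, the product, the function $\lambda f$ and the function $1/f$ (at every point $x$ such that $f(x)\ne 0$) are also quasidifferentiable, and an element of their quasidifferential can be calculated as follows:

$$
\begin{aligned}
D(f_1+f_2)(x) &= Df_1(x) + Df_2(x),\\
D(f_1\cdot f_2)(x) &= f_1(x)\,Df_2(x) + f_2(x)\,Df_1(x),\\
D(\lambda f)(x) &= \lambda\, Df(x),\\
D\Big(\frac{1}{f}\Big)(x) &= -\frac{1}{f^2(x)}\, Df(x).
\end{aligned}
$$

Here, as in the one-dimensional case, pairs of sets are added componentwise, $[U_1,V_1]+[U_2,V_2] = [U_1+U_2,\ V_1+V_2]$, and $\lambda[U,V] = [\lambda U, \lambda V]$ for $\lambda\ge 0$, $\lambda[U,V] = [\lambda V, \lambda U]$ for $\lambda<0$.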


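Let, moreover, the functions $f_1,\dots,f_m$ be defined on an open set $X\subset\mathbb{R}^n$ and be quasidifferentiable at $x\in X$. Then the functions

$$
\varphi_1(x) = \max_{i\in\{1,\dots,m\}} f_i(x), \qquad \varphi_2(x) = \min_{i\in\{1,\dots,m\}} f_i(x)
$$

are quasidifferentiable at $x$ as well. The following relations hold:

$$
D\varphi_j(x) = [\underline{\partial}\varphi_j(x), \overline{\partial}\varphi_j(x)], \qquad j = 1, 2,
$$

with

$$
\begin{aligned}
\underline{\partial}\varphi_1(x) &= \operatorname{co}\bigcup_{k\in R(x)}\Big\{\underline{\partial} f_k(x) - \sum_{i\in R(x),\, i\ne k}\overline{\partial} f_i(x)\Big\}, &
\overline{\partial}\varphi_1(x) &= \sum_{k\in R(x)}\overline{\partial} f_k(x),\\
\underline{\partial}\varphi_2(x) &= \sum_{k\in Q(x)}\underline{\partial} f_k(x), &
\overline{\partial}\varphi_2(x) &= \operatorname{co}\bigcup_{k\in Q(x)}\Big\{\overline{\partial} f_k(x) - \sum_{i\in Q(x),\, i\ne k}\underline{\partial} f_i(x)\Big\},
\end{aligned}
$$

where $A - B = A + (-B)$ denotes the algebraic (Minkowski) difference of sets. Here, $[\underline{\partial} f_k(x), \overline{\partial} f_k(x)]$ is a quasidifferential of $f_k$ at $x$, and the following activity sets have been used:

$$
R(x) = \{i\in I: f_i(x) = \varphi_1(x)\}, \qquad Q(x) = \{i\in I: f_i(x) = \varphi_2(x)\},
$$

where $I = \{1,\dots,m\}$.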
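The max rule can be checked numerically with a short Python sketch. It assumes that all sets are polytopes given by lists of generating points; these lists may contain non-vertices, and their convex hulls are the sets of the rule. The example $f_1(x) = |x_1| - |x_2|$, $f_2(x) = x_1 + x_2$ at $x = 0$ (both active) is chosen for illustration. The helper names are hypothetical, and the result is compared with $\varphi_1'(0, g) = \max\{|g_1| - |g_2|,\ g_1 + g_2\}$.

```python
# Sketch of the max rule for phi1 = max_i f_i, with polytopes given by point lists.
import random


def dot(h, g):
    return sum(a * b for a, b in zip(h, g))


def msum(A, B):
    # Points generating co(A) + co(B) (Minkowski sum); may include non-vertices.
    return [tuple(a_i + b_i for a_i, b_i in zip(a, b)) for a in A for b in B]


def neg(A):
    return [tuple(-a_i for a_i in a) for a in A]


def quasi_dd(sub, sup, g):
    return max(dot(h, g) for h in sub) + min(dot(h, g) for h in sup)


def max_rule(pairs, values, tol=1e-12):
    """pairs[i] = (sub_i, sup_i) for f_i at x, values[i] = f_i(x).
    Returns point lists whose convex hulls form a quasidifferential of max_i f_i."""
    top = max(values)
    R = [i for i, v in enumerate(values) if v >= top - tol]  # active set R(x)
    n = len(pairs[0][0][0])
    sub_pts = []
    for k in R:
        S = pairs[k][0]
        for i in R:
            if i != k:
                S = msum(S, neg(pairs[i][1]))  # sub f_k - sum_{i != k} sup f_i
        sub_pts.extend(S)
    sup_pts = [tuple([0.0] * n)]
    for k in R:
        sup_pts = msum(sup_pts, pairs[k][1])  # sum over R(x) of sup f_k
    return sub_pts, sup_pts


pair1 = ([(1.0, 0.0), (-1.0, 0.0)], [(0.0, 1.0), (0.0, -1.0)])  # |x1| - |x2| at 0
pair2 = ([(1.0, 1.0)], [(0.0, 0.0)])                            # x1 + x2 at 0
sub, sup = max_rule([pair1, pair2], [0.0, 0.0])

rng = random.Random(0)
for _ in range(5):
    g = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    exact = max(abs(g[0]) - abs(g[1]), g[0] + g[1])
    print(g, quasi_dd(sub, sup, g) - exact)  # differences at rounding level
```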

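Finally, consider the case of a composite function. Let a mapping $H(x) = (h_1(x),\dots,h_m(x))$, $H: X\to Y$, be given, where $X$ is an open set in $\mathbb{R}^n$, $Y$ is an open set in $\mathbb{R}^m$, and every function $h_i$ is quasidifferentiable at $x_0\in X$. Let us assume that a function $f$ is defined on $Y$ and is directionally differentiable in the sense of Hadamard and quasidifferentiable at $y_0 = H(x_0)$. Then the composite function

$$
\varphi(x) = f(H(x))
$$

is quasidifferentiable at $x_0$, and its quasidifferential $D\varphi(x_0) = [\underline{\partial}\varphi(x_0), \overline{\partial}\varphi(x_0)]$ is expressed by the formulas

$$
\underline{\partial}\varphi(x_0) = \Big\{ w = \sum_{i=1}^m \big[v_i(\lambda_i + \mu_i) - \underline{v}_i\lambda_i - \overline{v}_i\mu_i\big] :\ v = (v_1,\dots,v_m)\in\underline{\partial} f(y_0),\ \lambda_i\in\underline{\partial} h_i(x_0),\ \mu_i\in\overline{\partial} h_i(x_0)\Big\},
$$

$$
\overline{\partial}\varphi(x_0) = \Big\{ w = \sum_{i=1}^m \big[v_i(\lambda_i + \mu_i) + \underline{v}_i\lambda_i + \overline{v}_i\mu_i\big] :\ v = (v_1,\dots,v_m)\in\overline{\partial} f(y_0),\ \lambda_i\in\underline{\partial} h_i(x_0),\ \mu_i\in\overline{\partial} h_i(x_0)\Big\},
$$

where $\underline{v}$ and $\overline{v}$ are arbitrary vectors such that

$$
\underline{v}\le v\le\overline{v} \quad \forall v\in\underline{\partial} f(y_0)\cup\big(-\overline{\partial} f(y_0)\big),
$$

the inequalities being understood componentwise.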
Concrete examples and the derivation of the above rules can be found in the literature cited above. It should be mentioned that if some of the sets involved (i.e., the subdifferentials or the superdifferentials) happen to be polyhedral, then some of the previous rules can be simplified significantly (see, e.g., [7]). The latter case appears, among others, in applications of quasidifferential calculus within a finite element environment in mechanics (see also → Quasidifferentiable optimization: Variational formulations; → Quasidifferentiable optimization: Applications to thermoelasticity, and [5]).

If for a directionally differentiable function $f$ there exists an ordered pair of convex compact sets $[U, V]$ in $\mathbb{R}^n\times\mathbb{R}^n$ which produces the directional derivative of $f$ at $x$ in the direction $g$ by the expression

$$
f'(x, g) = \max_{h\in U}\langle h, g\rangle + \min_{h\in V}\langle h, g\rangle, \tag{1}
$$

this function is called quasidifferentiable in the sense of V.F. Demyanov and A.M. Rubinov. This notion covers a large number of structured nonconvex and nonsmooth functions, which can be used in the solution of nonconvex and global optimization problems. For instance, the class of differences of convex functions is included.

More details on this notion, the calculus rules for computing quasidifferentials, its connection to other notions of nonsmooth analysis and its applications can be found in → Quasidifferentiable optimization; → Quasidifferentiable optimization: Calculus of quasidifferentials; → Quasidifferentiable optimization: Dini derivatives, Clarke derivatives; → Quasidifferentiable optimization: Applications, as well as in [1,2,3].

The quasidifferential, like the subdifferential of convex analysis, is a set-valued quantity which exhibits discontinuities at points of nondifferentiability. In numerical algorithms this may cause problems, and a notion that takes into account information from neighboring points would be more appropriate. This led Demyanov to extend the notion of the quasidifferential by introducing the codifferential. Accordingly, the notions of subdifferential and superdifferential are extended to the notions of hypodifferential and hyperdifferential. One should mention that all quasidifferentiable functions are codifferentiable as well. Moreover, calculus rules exist, in analogy to the quasidifferential calculus rules.

These notions, which are useful for the construction of numerical algorithms in nonsmooth optimization [1] and in nonsmooth computational mechanics [2], are introduced in this short article. More details are given in the cited literature and in the previously mentioned articles.


Codifferentiable Functions 


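Let $X$ be an open subset of $\mathbb{R}^n$ and let a function $f$ be defined and finite for every $x\in X$. The function $f$ is called codifferentiable at $x$ if there exist convex compact sets $\underline{d} f(x)\subset\mathbb{R}^{n+1}$ and $\overline{d} f(x)\subset\mathbb{R}^{n+1}$ such that the function admits a first-order approximation in a neighborhood of $x$ of the form

$$
f(x+\Delta) = f(x) + \max_{[a,v]\in\underline{d} f(x)}\big[a + \langle v,\Delta\rangle\big] + \min_{[b,w]\in\overline{d} f(x)}\big[b + \langle w,\Delta\rangle\big] + o_x(\Delta), \tag{2}
$$

where $o_x(\alpha\Delta)/\alpha\to 0$ as $\alpha\downarrow 0$, for all $\Delta\in\mathbb{R}^n$. The ordered pair of convex compact sets $Df(x) = [\underline{d} f(x), \overline{d} f(x)]$ is called a codifferential of $f$ at $x$, where $\underline{d} f(x)$ is a hypodifferential and $\overline{d} f(x)$ is a hyperdifferential.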

If moreover there exists a codifferential Df which is 
Hausdorff continuous in a neighborhood of x, the func- 
tion f is called continuously codifferentiable at x. 

If there exists a codifferential of the form $Df(x) = [\underline{d} f(x), \{0\}]$, the function $f$ is called hypodifferentiable, while if there exists a codifferential of the form $Df(x) = [\{0\}, \overline{d} f(x)]$, the function is called hyperdifferentiable.
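To illustrate approximation (2), the Python sketch below uses a hypodifferential for a pointwise maximum $f = \max_i f_i$ of smooth functions that follows from the first-order Taylor expansion of each $f_i$: $\underline{d} f(x) = \operatorname{co}\{[f_i(x) - f(x),\ f_i'(x)]\}$ with $\overline{d} f(x) = \{[0, 0]\}$, so that $f$ is hypodifferentiable. For $f(x) = |x| = \max\{x, -x\}$ near, but not at, the kink, the model reproduces $f(x+\Delta)$ exactly even when $x+\Delta$ crosses $0$. By contrast, the quasidifferential-based model $f(x) + f'(x, \Delta)$ uses only the piece active at $x$. The one-dimensional setting and the function names are assumptions made for brevity.

```python
# Sketch: first-order codifferential model (2) for f = max_i f_i, f_i smooth (1-D).
def hypodifferential_points(fs, dfs, x):
    """Points [f_i(x) - f(x), f_i'(x)]; the hyperdifferential is taken as {[0, 0]}."""
    values = [fi(x) for fi in fs]
    fx = max(values)
    return fx, [(vi - fx, dfi(x)) for vi, dfi in zip(values, dfs)]


def codifferential_model(fs, dfs, x, delta):
    # f(x) + max_{[a, v]} (a + v*delta); the hyperdifferential {[0, 0]} contributes 0
    fx, points = hypodifferential_points(fs, dfs, x)
    return fx + max(a + v * delta for a, v in points)


def quasidifferential_model(x, delta):
    # f(x) + f'(x, delta) for f = |x|: only the piece(s) active at x are used
    if x > 0:
        return x + delta
    if x < 0:
        return -x - delta
    return abs(delta)


fs = [lambda t: t, lambda t: -t]       # |t| = max(t, -t)
dfs = [lambda t: 1.0, lambda t: -1.0]
x, delta = 1e-3, -1e-2
print(codifferential_model(fs, dfs, x, delta), quasidifferential_model(x, delta), abs(x + delta))
```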

Note here that for a continuously codifferentiable function the first-order approximation based on (2) is a continuous function of both $x$ and $\Delta$ (recall that the analogous approximation based on the quasidifferential is a continuous function of $\Delta$ only).

Twice Codifferentiable Functions


Twice codifferentiable functions provide a suitable tool for constructing higher-order approximations of nondifferentiable functions. They extend the notion of second-order derivatives of classical smooth analysis.

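Let a function $f$ be defined and finite on an open set $X\subset\mathbb{R}^n$. The function $f$ is twice codifferentiable at $x\in X$ if there exist convex compact sets $\underline{d}^2 f(x)$ and $\overline{d}^2 f(x)$, both subsets of $\mathbb{R}\times\mathbb{R}^n\times\mathbb{R}^{n\times n}$, such that

$$
\begin{aligned}
f(x+\Delta) = f(x) &+ \max_{[a,v,A]\in\underline{d}^2 f(x)}\Big[a + \langle v,\Delta\rangle + \tfrac12\langle A\Delta,\Delta\rangle\Big]\\
&+ \min_{[b,w,B]\in\overline{d}^2 f(x)}\Big[b + \langle w,\Delta\rangle + \tfrac12\langle B\Delta,\Delta\rangle\Big] + o(\Delta^2),
\end{aligned}
$$

with $o((\alpha\Delta)^2)/\alpha^2\to 0$ as $\alpha\downarrow 0$, for all $\Delta\in\mathbb{R}^n$. Here $\mathbb{R}^{n\times n}$ is the space of real $(n\times n)$-matrices.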

The ordered pair of convex sets $D^2 f(x) = [\underline{d}^2 f(x), \overline{d}^2 f(x)]$ is called a second-order codifferential of $f$ at $x$; the set $\underline{d}^2 f(x)$ is a second-order hypodifferential and the set $\overline{d}^2 f(x)$ a second-order hyperdifferential of $f$ at $x$. Moreover, if $f$ is twice codifferentiable in some neighborhood of a point $x$ and the mapping $D^2 f$ is Hausdorff continuous at $x$, then the function $f$ is called twice continuously codifferentiable at $x$.

Analogously to the quasidifferentiable and codiffer- 
entiable functions, twice hypodifferentiable functions 
and twice hyperdifferentiable functions may be defined. 
Calculus rules also exist for twice codifferentiable functions (see [1, p. 216]).

For example, let $f$ be convex and finite on a convex set $X\subset\mathbb{R}^n$, let $x\in X$, and let $X_0$ be an arbitrary closed, convex and bounded subset of $X$ with $x\in\operatorname{int}X_0$. In this case one may consider the second-order codifferential $D^2 f(x) = [\underline{d}^2 f(x), \{0\}]$, with

$$
\underline{d}^2 f(x) = \operatorname{co}\Big\{[a, v, A] :\ a = f(z) - f(x) + \langle v(z), x - z\rangle,\ v = v(z)\in\partial f(z),\ A = 0\in\mathbb{R}^{n\times n},\ z\in X_0\Big\}.
$$

Here $v(z)\in\partial f(z)$ is an arbitrary element of the set-valued mapping, which is kept fixed for every $z\in X_0$, and $\partial f(z)$ is the classical subdifferential of convex analysis.
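Since $f(x) + a + \langle v,\Delta\rangle = f(z) + \langle v(z), x + \Delta - z\rangle$, the model built from this second-order hypodifferential is the maximum of the supporting affine functions of $f$ at the points $z\in X_0$. The Python sketch below checks this for the assumed example $f(z) = z^2$ on $X_0 = [-1, 1]$. It replaces $X_0$ by a finite grid (an approximation not in the text). The model then never exceeds $f$, is exact at grid points, and, for a grid of spacing $0.1$, is below $f$ by at most $(0.1/2)^2$.

```python
# Sketch: second-order hypodifferential of a convex f (A = 0), X0 replaced by a finite grid.
def hypo2_points(f, subgrad, x, zs):
    """Points [a, v, A] with a = f(z) - f(x) + v(z)*(x - z), v = v(z), A = 0 (1-D)."""
    return [(f(z) - f(x) + subgrad(z) * (x - z), subgrad(z), 0.0) for z in zs]


def model(f, subgrad, x, delta, zs):
    # f(x) + max [a + v*delta + 0.5*A*delta**2]; the hyperdifferential {0} contributes 0
    return f(x) + max(a + v * delta + 0.5 * A * delta ** 2
                      for a, v, A in hypo2_points(f, subgrad, x, zs))


def f(z):
    return z * z


def subgrad(z):
    return 2.0 * z


zs = [k / 10 for k in range(-10, 11)]  # grid on X0 = [-1, 1], spacing 0.1
x = 0.2
for delta in (0.3, 0.05, -0.7):
    print(delta, model(f, subgrad, x, delta, zs), f(x + delta))
```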

Moreover, for a twice continuously differentiable function $f$ it is well known that

$$
f(x+\Delta) = f(x) + \langle f'(x),\Delta\rangle + \tfrac12\langle f''(x)\Delta,\Delta\rangle + o(\Delta^2),
$$

where $f''(x)$ is the matrix of second-order derivatives (Hessian) of $f$ at $x$. The function $f$ is twice continuously codifferentiable, and one may consider (among other choices) one of the following second-order codifferentials of $f$:

$$
\underline{d}^2 f(x) = \{[0, f'(x), f''(x)]\}, \qquad \overline{d}^2 f(x) = \{[0, 0, 0]\},
$$

or

$$
\underline{d}^2 f(x) = \{[0, 0, 0]\}, \qquad \overline{d}^2 f(x) = \{[0, f'(x), f''(x)]\}.
$$

Applications 


Efficient nonsmooth optimization algorithms can be constructed based on the notion of the codifferential or, for hypodifferentiable functions, on the notion of the hypodifferential. In fact, the technique of replacing a nondifferentiable optimization problem by an enlarged, classical, inequality-constrained optimization problem has been successfully used for convex and for composite optimization problems [4,15]. For hypodifferentiable functions a direction of descent at each given point can be determined and used in an iterative optimization procedure. For general codifferentiable functions, several directions of descent can be determined. This is to be expected, given that one deals with nonconvex, global optimization problems. Some details in this direction are given in → Quasidifferentiable optimization: Applications and in the original publications [3,11].

Furthermore, twice (or higher-order) quasidifferentials and codifferentials provide set-valued approximations of the higher-order derivatives of a function. For numerical optimization tasks this information may lead to more efficient algorithms, in analogy to the use of Hessian matrices in classical smooth optimization. Other approaches to generalized second-order derivatives can be found in [5] for convex functions and in [10,12] for nonconvex functions. For more information in the area of nonsmooth optimization see, e.g., [6,7,8,9,13].

Another area of interest for practical applications will be the use of this information for the construction of necessary and sufficient (local) optimality conditions. Applications of these results include stability and sensitivity analysis for quasidifferentiable and codifferentiable optimization problems. In mechanics, this information can be used for the study of the stability of structures governed by quasidifferentiable superpotentials (cf., e.g., [14] and → Quasidifferentiable optimization: Stability of dynamic systems). Applications in economics will be of interest as well. Much work remains to be done in this area, which is open for further investigation (as of 1999).

The notion of the quasidifferential in the sense of V.F. Demyanov and A.M. Rubinov [5] constitutes a set-valued extension of the classical differential, which is appropriate for nonsmooth and generally nonconvex but directionally differentiable functions. This class of functions covers a large number of applications in nonsmooth analysis and includes, among others, the class of difference convex functions, popular in global optimization. The quasidifferential approximates the directional derivative of a function by using an ordered pair of convex sets, the subdifferential and the superdifferential. Definitions are given in → Quasidifferentiable optimization. Information on the corresponding calculus can be found in → Quasidifferentiable optimization: Calculus of quasidifferentials.

Here, the relation between quasidifferentials and more classical notions in nonsmooth analysis is briefly addressed. In particular, the Dini directional derivatives and the F.H. Clarke [2,3] derivatives are considered. Other notions of nonsmooth analysis may be found, among others, in the recent publications [1,3,11].


Dini Derivatives 


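The Dini upper derivative of a function $f:\mathbb{R}^n\to\mathbb{R}$ at a point $x\in\operatorname{dom} f$ in a direction $g\in\mathbb{R}^n$ is defined by

$$
f^{\uparrow}_D(x, g) = \limsup_{\alpha\downarrow 0}\frac{1}{\alpha}\big[f(x+\alpha g) - f(x)\big]. \tag{1}
$$

Note that the upper limit in (1) is not necessarily finite. Analogously, the Dini lower derivative of $f$ at $x$ is defined by the relation

$$
f^{\downarrow}_D(x, g) = \liminf_{\alpha\downarrow 0}\frac{1}{\alpha}\big[f(x+\alpha g) - f(x)\big].
$$

Recall that if the limit

$$
f'(x, g) = \lim_{\alpha\downarrow 0}\frac{1}{\alpha}\big[f(x+\alpha g) - f(x)\big]
$$

exists, it is called the derivative of the function $f$ at the point $x$ in the direction $g$, or the Dini derivative, and it is denoted by $f'_D(x, g)$.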
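Numerically, the Dini upper and lower derivatives can only be approximated heuristically. The Python sketch below takes the supremum and infimum of the difference quotient over a finite geometric grid of step sizes $\alpha\in(0,\varepsilon]$. This finite-sample proxy for the lim sup and lim inf is an assumption and not a rigorous method. For the assumed example $f(t) = t\sin(1/t)$, $f(0) = 0$, at $x = 0$ in direction $g = 1$, the quotient equals $\sin(1/\alpha)$. Hence the Dini upper and lower derivatives are $+1$ and $-1$, while the Dini derivative does not exist.

```python
# Heuristic sketch: approximate Dini upper/lower derivatives (1-D) by the sup/inf
# of the difference quotient over a finite geometric grid of alpha in (0, eps].
import math


def dini_estimates(f, x, g, eps=1e-3, decades=6, samples=200_000):
    quotients = []
    for k in range(samples):
        alpha = eps * 10.0 ** (-decades * k / samples)  # from eps down to eps*10**-decades
        quotients.append((f(x + alpha * g) - f(x)) / alpha)
    return max(quotients), min(quotients)  # finite-sample proxies for limsup / liminf


def wiggly(t):
    # f(t) = t*sin(1/t), f(0) = 0: at 0 in direction 1 the quotient is sin(1/alpha)
    return 0.0 if t == 0.0 else t * math.sin(1.0 / t)


upper, lower = dini_estimates(wiggly, 0.0, 1.0)
print(upper, lower)  # exact Dini upper/lower derivatives here: +1 and -1
```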

Since the Dini derivative (resp. the Dini upper or lower derivative) is just the one-sided (resp. the one-sided upper or lower) derivative of an ordinary real-valued function of one variable, namely $\alpha\mapsto f(x+\alpha g)$, one can use the methods developed to study functions of one variable. Thus, for instance, calculus rules for directional derivatives can be constructed.

A function $f$ defined on an open set $\Omega$ is called Dini uniformly directionally differentiable at a point $x\in\Omega$ if it is directionally differentiable at $x$ and, for every $\varepsilon>0$, there exists a real number $\alpha_0>0$ such that

$$
\frac{1}{\alpha}\big|f(x+\alpha g) - f(x) - \alpha f'(x, g)\big| < \varepsilon \qquad \forall g\in S,\ \forall\alpha\in(0,\alpha_0), \tag{2}
$$

where $S = \{g : \|g\| = 1\}$ is the unit sphere. By setting $\alpha g = v$ in (2) one gets

$$
\big|f(x+v) - f(x) - f'(x, v)\big| < \varepsilon\|v\| \qquad \forall v \text{ such that } \|v\|<\alpha_0.
$$

Thus uniform directional differentiability means that $\frac{1}{\|v\|}\big|f(x+v) - f(x) - f'(x, v)\big|$ tends to zero as $\|v\|\to 0$.


More details on Dini derivatives and their use in op- 
timization can be found in [9]. 


Clarke Derivatives 


Let us consider the upper and lower Dini derivatives of a function $f$ for a fixed direction $g$, i.e. the functions $x\mapsto f^{\uparrow}_D(x, g)$ and $x\mapsto f^{\downarrow}_D(x, g)$. Let us also consider the upper (resp. the lower) regularizations of these functions:

$$
\overline{f}^{\uparrow}_D(x, g) = \max\Big\{f^{\uparrow}_D(x, g),\ \limsup_{x'\to x} f^{\uparrow}_D(x', g)\Big\},
$$

respectively

$$
\underline{f}^{\downarrow}_D(x, g) = \min\Big\{f^{\downarrow}_D(x, g),\ \liminf_{x'\to x} f^{\downarrow}_D(x', g)\Big\}.
$$

For a Lipschitz function $f$, the upper and lower Dini derivatives are bounded in some neighborhood of $x$; hence both previous limits are finite.

The Clarke upper and lower derivatives are defined as the upper and lower regularizations of the Dini upper and lower derivatives, i.e.

$$
f^{\uparrow}_{Cl}(x, g) = \overline{f}^{\uparrow}_D(x, g), \qquad f^{\downarrow}_{Cl}(x, g) = \underline{f}^{\downarrow}_D(x, g).
$$

For the initial, equivalent definition of these quantities, see [2] and [6, p. 69]. Here the approach of [8] has been followed. For every fixed direction $g$, the function $x\mapsto f^{\uparrow}_{Cl}(x, g)$ is upper semicontinuous and the function $x\mapsto f^{\downarrow}_{Cl}(x, g)$ is lower semicontinuous.

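It is appropriate to recall here some properties of the Clarke derivatives. For every fixed point $x$, the function $g\mapsto f^{\uparrow}_{Cl}(x, g)$ is sublinear and the function $g\mapsto f^{\downarrow}_{Cl}(x, g)$ is superlinear; thus a subdifferential $\underline{\partial} f^{\uparrow}_{Cl}(x)$ and a superdifferential $\overline{\partial} f^{\downarrow}_{Cl}(x)$ can be determined such that

$$
f^{\uparrow}_{Cl}(x, g) = \max_{l\in\underline{\partial} f^{\uparrow}_{Cl}(x)}\langle l, g\rangle, \qquad
f^{\downarrow}_{Cl}(x, g) = \min_{w\in\overline{\partial} f^{\downarrow}_{Cl}(x)}\langle w, g\rangle.
$$

Moreover, the following relations hold:

$$
f^{\downarrow}_{Cl}(x, -g) = -f^{\uparrow}_{Cl}(x, g), \qquad f^{\uparrow}_{Cl}(x, -g) = -f^{\downarrow}_{Cl}(x, g).
$$

From the above properties it follows that

$$
\max_{l\in\underline{\partial} f^{\uparrow}_{Cl}(x)}\langle l, g\rangle = \max_{w\in\overline{\partial} f^{\downarrow}_{Cl}(x)}\langle w, g\rangle \qquad \forall g\in\mathbb{R}^n;
$$

thus the two compact convex sets coincide. The Clarke subdifferential is therefore defined as

$$
\partial_{Cl} f(x) = \underline{\partial} f^{\uparrow}_{Cl}(x) = \overline{\partial} f^{\downarrow}_{Cl}(x).
$$

The mapping $x\mapsto\partial_{Cl} f(x)$ is upper semicontinuous. An element of the Clarke subdifferential is called a generalized gradient of $f$ at $x$.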

Concerning the relation between the directional derivative of the function (if it is directionally differentiable) and the Clarke upper and lower derivatives, one has, in general,

$$
f^{\downarrow}_{Cl}(x, g) \le f'(x, g) \le f^{\uparrow}_{Cl}(x, g). \tag{3}
$$

Thus the Clarke upper and lower derivatives are a sublinear majorant and a superlinear minorant of $f'(x, g)$, respectively. Only in the case of an upper semicontinuous (resp. lower semicontinuous) directional derivative $f'(x, g)$ does the second (resp. the first) inequality in (3) hold as an equality. The latter property is considered to be the major drawback of the Clarke subdifferential in nonsmooth analysis applications, because it does not always give rise to an approximation of the directional derivative at points of nondifferentiability.
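The gap in (3) can be seen on the assumed example $f(x) = -|x|$ at $x = 0$. There $f'(0, g) = -|g|$, while $f^{\uparrow}_{Cl}(0, g) = |g|$, so the upper Clarke derivative does not approximate the directional derivative. The Python sketch below estimates $f^{\uparrow}_{Cl}$ with F.H. Clarke's original definition, $\limsup_{y\to x,\ \alpha\downarrow 0}[f(y+\alpha g) - f(y)]/\alpha$, which for locally Lipschitz functions is equivalent to the regularization used above. It takes the maximum over a grid of base points $y$ and a single small $\alpha$, a heuristic proxy rather than an exact limit. The lower derivative is obtained from the relation $f^{\downarrow}_{Cl}(x, g) = -f^{\uparrow}_{Cl}(x, -g)$ stated above.

```python
# Heuristic sketch (1-D): estimate the Clarke upper derivative
# limsup_{y -> x, alpha -> 0} [f(y + alpha*g) - f(y)] / alpha
# by maximizing the quotient over a grid of base points y near x and one small alpha.
def clarke_upper_estimate(f, x, g, radius=1e-4, alpha=1e-7, samples=2001):
    best = -float("inf")
    for k in range(samples):
        y = x - radius + 2.0 * radius * k / (samples - 1)
        best = max(best, (f(y + alpha * g) - f(y)) / alpha)
    return best


def clarke_lower_estimate(f, x, g, **kw):
    # relation f_Cl^down(x, g) = -f_Cl^up(x, -g)
    return -clarke_upper_estimate(f, x, -g, **kw)


def f(t):
    return -abs(t)


for g in (1.0, -1.0):
    dini = (f(1e-9 * g) - f(0.0)) / 1e-9  # f'(0, g) = -|g|
    print(g, clarke_lower_estimate(f, 0.0, g), dini, clarke_upper_estimate(f, 0.0, g))
```

In this example the first inequality in (3) holds with equality and the second is strict.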

For further reference, we recall here the necessary optimality condition for a locally Lipschitz function $f$ to attain a local minimum at a point $x$:

$$
0\in\partial_{Cl} f(x).
$$

Note also that, since approximations of sets and of functions are linked, the notion of the Clarke subdifferential gives rise to a notion of generalized tangent cone (and, respectively, of generalized normal cone). The reader is referred to [2,3], [6, p. 83], [10] for more details.


Quasidifferential and Clarke Subdifferential 


Before giving some information on the links between the quasidifferential and the Clarke subdifferential, some facts about the various definitions of the difference of convex compact sets are in order.

Differences of Convex Sets


Before stating the definition, some introductory material must be given. The max-face of a compact set $U$ generated by $x\in\mathbb{R}^n$ is defined by

$$
G_x(U) = \Big\{h\in U : \langle h, x\rangle = \max_{g\in U}\langle x, g\rangle\Big\}.
$$

Recall that the max-face coincides with the subdifferential of the support function $p_U$ of $U$, i.e. $G_x(U) = \partial p_U(x)$. Recall also that, for a convex function defined on $\mathbb{R}^n$, the set $T\subset\mathbb{R}^n$ of points where its subdifferential is a singleton is a set of full measure (i.e. $\mathbb{R}^n\setminus T$ is a set of measure zero); in particular, the set of points $x$ where the max-face $G_x(U)$ is a singleton has full measure.

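The difference $U\mathbin{\dot-}V$ of two sets $U$ and $V$ is defined on the set of full measure $T$ where both $G_x(U)$ and $G_x(V)$ are singletons by

$$
U\mathbin{\dot-}V = \operatorname{cl}\operatorname{co}\{\nabla p_U(x) - \nabla p_V(x) : x\in T\},
$$

where $\nabla g$ denotes the gradient of a function $g$. One may observe here that if $U = V + W$, then $U\mathbin{\dot-}V = W$. An equivalent definition of $U\mathbin{\dot-}V$ is given by

$$
U\mathbin{\dot-}V = \operatorname{cl}\operatorname{co}\Big\{\bigcup_{x\in T_{U,V}}\big[G_x(U) - G_x(V)\big]\Big\}, \tag{4}
$$

where the dependence of $T$ on both $U$ and $V$ is explicitly indicated.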
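For polytopes in $\mathbb{R}^2$, definition (4) suggests a simple sampling sketch in Python. Directions $x$ drawn at random lie in $T_{U,V}$ with probability one, because ties occur only on a set of measure zero. At such $x$ each max-face is the unique maximizing vertex, so the differences of maximizing vertices generate $U\mathbin{\dot-}V$ as their convex hull. The example $V = \operatorname{co}\{(\pm1,\pm1)\}$, $W = \operatorname{co}\{(0,0),(1,0)\}$, $U = V + W$ is assumed for illustration and checks the property $U\mathbin{\dot-}V = W$ noted above. The function names are hypothetical.

```python
# Sketch: generating points of the Demyanov difference U -. V for polytopes in R^2,
# sampling random directions (which lie in T_{U,V} with probability one).
import math
import random


def argmax_vertex(vertices, x):
    # unique point of the max-face G_x(U) when x lies in T
    return max(vertices, key=lambda h: h[0] * x[0] + h[1] * x[1])


def demyanov_difference_points(U, V, directions=2000, seed=0):
    rng = random.Random(seed)
    pts = set()
    for _ in range(directions):
        t = rng.uniform(0.0, 2.0 * math.pi)
        x = (math.cos(t), math.sin(t))
        gu, gv = argmax_vertex(U, x), argmax_vertex(V, x)
        pts.add((round(gu[0] - gv[0], 12), round(gu[1] - gv[1], 12)))
    return pts  # U -. V is the (closed) convex hull of these points


V = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]  # square
U = [(-1.0, -1.0), (-1.0, 1.0), (2.0, -1.0), (2.0, 1.0)]  # V + co{(0,0), (1,0)}
print(sorted(demyanov_difference_points(U, V)))
```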

An extension of (4) leads to the quasidifference operation $\ddot-$, defined by

$$
U\mathbin{\ddot-}V = \operatorname{cl}\operatorname{co}\Big\{\bigcup_{x\ne 0}\big[G_x(U) - G_x(V)\big]\Big\}. \tag{5}
$$

Unfortunately, $\ddot-$ is not invariant with respect to the equivalence relation $\sim$. Nevertheless, an estimate of the form $U\mathbin{\ddot-}V\supseteq U\mathbin{\dot-}V$ always holds for all sets $U$ and $V$, and in some cases there are conditions under which the inclusion holds as an equality (see, e.g., [6, p. 117]).


Estimation Results 


The Clarke subdifferential can also be generated, in some cases, by the set operations difference $\dot-$ and quasidifference $\ddot-$ applied to the subdifferential $\underline{\partial} f(x)$ and the superdifferential $\overline{\partial} f(x)$ of a quasidifferentiable function $f$.

Under appropriate assumptions on $f$, and for an appropriate choice of the subdifferential and the superdifferential of $f$ at $x$, an estimate of the following form can be obtained:

$$
A\subseteq\partial_{Cl} f(x)\subseteq B,
$$

with $A = \underline{\partial} f(x)\mathbin{\dot-}\big(-\overline{\partial} f(x)\big)$ and $B = \underline{\partial} f(x)\mathbin{\ddot-}\big(-\overline{\partial} f(x)\big)$, as discussed in [6, pp. 143-155]. A different approach to the study of the relationship between the Clarke subdifferential and the quasidifferential is followed in [7] (see also [6, pp. 156-159]). More details in this direction can also be found in [4].

Penalty function methods are used for solving many constrained optimization problems of the form: find

$$
\inf_{x\in X} f(x) = f^*, \tag{1}
$$

where $f$ is a locally Lipschitz quasidifferentiable function defined on $\mathbb{R}^n$, $Df(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$ is its quasidifferential at a point $x\in\mathbb{R}^n$, and $X\subset\mathbb{R}^n$ is a closed set. It is always possible to define $X$ in the form (see [2])

$$
X = \{x\in\mathbb{R}^n : \varphi(x) = 0\}, \tag{2}
$$

where $\varphi$ is also a locally Lipschitz quasidifferentiable function defined on $\mathbb{R}^n$, the pair of sets $D\varphi(x) = [\underline{\partial}\varphi(x), \overline{\partial}\varphi(x)]$ is a quasidifferential of $\varphi$ at $x\in\mathbb{R}^n$, and

$$
\varphi(x) > 0 \qquad \forall x\notin X.
$$

Thus the set $X$ is the set of global minimum points of the function $\varphi$ on $\mathbb{R}^n$; hence it is closed. We shall assume that the set $X\subset\mathbb{R}^n$ is nonempty and bounded.

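As the function $\varphi$ is quasidifferentiable, the following expansion holds:

$$
\varphi(x+\alpha g) = \varphi(x) + \alpha\varphi'(x, g) + o(\alpha, x, g), \qquad \text{where } \frac{o(\alpha, x, g)}{\alpha}\to 0 \text{ as } \alpha\downarrow 0.
$$

We shall assume that in this expansion, at each point $x\in\mathbb{R}^n$, the convergence to $0$ is uniform with respect to $g\in\mathbb{R}^n$, $\|g\| = 1$.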

The idea of penalty function methods consists in reducing problem (1) to an unconstrained optimization problem. Among the different approaches to such a reduction, we shall consider the method of exact penalty functions.

For solving problem (1) a penalty function

$$
F(c, x) = f(x) + c\,\varphi(x)
$$

is introduced, where $c$ is a nonnegative number, and then the problem

$$
\inf_{x\in\mathbb{R}^n} F(c, x) \tag{3}
$$

is considered.

We assume that $\inf_{x\in\mathbb{R}^n} F(c, x)$ is attained for every $c>0$. In practice it would be useful to find conditions which guarantee that there exists an exact penalty parameter $c^*>0$ such that the set

$$
\Big\{x\in\mathbb{R}^n : x = \arg\min_{x\in\mathbb{R}^n} F(c^*, x)\Big\}
$$

coincides with the set

$$
\Big\{x\in\mathbb{R}^n : x = \arg\min_{x\in X} f(x)\Big\}.
$$

This problem was first investigated in [5,10] for convex programming problems. There are now many works in this field (see, for example, [1,4,6,8,9]).

The implementation of exact penalty function methods depends first of all on the properties of the function $\varphi$. Therefore various conditions are imposed on $\varphi$ to make it possible to solve the problem. We shall consider some of them.


Regularity Condition 1 


(See [3].) 

We say that a regularity condition is satisfied for the function $\varphi$ if for any boundary point $x^*\in\operatorname{bd}X$ there exist positive real numbers $\varepsilon(x^*)$ and $\beta(x^*)$ such that


MOD Sox, 9) + Ble") 


=—] max (vy, 
Es 8) 


$\forall x\in A(X)\cap S_{\varepsilon(x^*)}(x^*)$, $\forall\alpha\in(0,\varepsilon(x^*)]$, $\forall g\in N(X,x)$ with $\|g\| = 1$,


w€0Q(x) 


+ min i) + B(x*), 


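where $\operatorname{bd}X$ is the set of boundary points of $X$,

$$
S_{\varepsilon(x^*)}(x^*) = \{x\in\mathbb{R}^n : \|x - x^*\| < \varepsilon(x^*)\},
$$

$$
A(X) = \{x\in\operatorname{bd}X : \exists z\notin X \text{ such that } x \text{ is a projection of } z \text{ onto } X\},
$$

and $N(X, x)$ is the normal cone to the set $X$ at the point $x\in X$:

$$
N(X, x) = \{g\in\mathbb{R}^n : \langle g, g_1\rangle\le 0\ \ \forall g_1\in\Gamma(X, x)\},
$$

$$
\Gamma(X, x) = \{g\in\mathbb{R}^n : \exists\alpha_0>0 \text{ such that } x+\alpha g\in X\ \ \forall\alpha\in[0,\alpha_0]\}.
$$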

Regularity condition 1 is a condition on the behavior of the function $\varphi$ only at the boundary points of the set $X$.

If regularity condition 1 holds for the function $\varphi$, then there exists an exact penalty parameter $c^*$.

Since in practice the exact penalty parameter is not known a priori, a sequence of real values $c_k$ is constructed satisfying the conditions

$$
0 = c_0 < c_1 < \dots < c_k < \dots, \qquad \lim_{k\to+\infty} c_k = +\infty.
$$

Let us find

$$
x(c_k) = \arg\min_{x\in\mathbb{R}^n} F(c_k, x).
$$

As a result, a decreasing sequence of real values $\{\varphi(x(c_k))\}$ is constructed. There exists an integer $K>0$ such that $x(c_k)\in X$ for all $k>K$. Thus, for $k>K$, the points $x(c_k)$ will be global minimum points of the function $F(c_k, x)$ on $\mathbb{R}^n$, i.e. they will be solutions of problem (1). The value of the penalty parameter $c^*$ is directly proportional to the Lipschitz constant of the function $f$ on the set

$$
\mathcal{L}(x^{**}) = \{x\in\mathbb{R}^n : \varphi(x)\le\varphi(x^{**})\},
$$

where

$$
x^{**} = \arg\min_{x\in\mathbb{R}^n} f(x),
$$

and inversely proportional to the number $\beta(x^*)$, where the point $x^*$ is a limit point of the sequence $\{x(c_k)\}$. In this method regularity condition 1 is used only in a neighborhood of the point $x^*$.

Note that the function $\varphi$ is essentially nondifferentiable at the boundary points of the set $X$.

If $\varphi$ is differentiable at the boundary points of the set $X$, regularity condition 1 does not hold. For this reason, if the function $\varphi$ is superdifferentiable at the boundary points of the set $X$, it cannot be used for constructing a sequence of exact penalty functions $F(c, x)$.

An example of a function which can be used for constructing a family of exact penalty functions (even without requiring the set $X$ to be bounded) is the Euclidean distance function. For it, the Lipschitz constant of the function $f$ on the set $\mathcal{L}(x^{**})$ can be taken as an exact penalty parameter. However, this function is not suitable for practical use due to computational problems.

We shall notice, that the regularity condition 1 is not 
constructive. Sometimes instead of it one uses another 
regularity condition. 


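Regularity Condition 2


(See [3].)
We shall assume that there exists a real number $\beta>0$ such that the following inequality holds:
$$\inf_{x\in\operatorname{bd}X}\ \min_{\substack{\|g\|=1\\ g\in N(X,x)}}\varphi'(x,g)=\inf_{x\in\operatorname{bd}X}\ \min_{\substack{\|g\|=1\\ g\in N(X,x)}}\Big[\max_{v\in\underline{\partial}\varphi(x)}(v,g)+\min_{w\in\overline{\partial}\varphi(x)}(w,g)\Big]\ge\beta. \qquad (4)$$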


If the set X is bounded and does not consist of isolated points, and regularity condition 2 is fulfilled for the family of penalty functions $\{F(c_k,x)\}$, then there exists an exact penalty parameter.

Under some assumptions on the set X it is possible to obtain an analytical representation of the normal cone to this set at each boundary point and then, having calculated the constant β, to evaluate the exact penalty parameter $c^*$.

We shall assume that regularity condition 2 holds for the function φ at each point $x\in\operatorname{bd}X$. Then the representation of the normal cone N(X, x) to the set X at the point $x\in\operatorname{bd}X$,
$$N(X,x)=\bigcap_{w\in\overline{\partial}\varphi(x)}\operatorname{cl}\operatorname{cone}\big(\underline{\partial}\varphi(x)+w\big),$$
holds. Here cl A is the closure of A and cone A is the conical hull of A.

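For example, if the function φ is subdifferentiable at each boundary point of the set X, then
$$N(X,x)=\operatorname{cl}\operatorname{cone}\big(\underline{\partial}\varphi(x)\big),$$
and (4) can be rewritten as
$$\inf_{x\in\operatorname{bd}X}\ \min_{\substack{\|g\|=1\\ g\in\operatorname{cl}\operatorname{cone}(\underline{\partial}\varphi(x))}}\ \max_{v\in\underline{\partial}\varphi(x)}(v,g)\ge\beta>0. \qquad (5)$$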
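Example 1 Let
$$f_0(x)=\tfrac12(A_0x,x)+(b_0,x),\qquad A_0\in\mathbb{R}^{n\times n},\ b_0\in\mathbb{R}^n,$$
$$f_i(x)=\tfrac12(A_ix,x)+(b_i,x)+c_i,\qquad A_i\in\mathbb{R}^{n\times n},\ b_i\in\mathbb{R}^n,\ c_i\in\mathbb{R},\ i\in I=\{1,\dots,p\},$$
where $f_i$, $i\in\{0,\dots,p\}$, are strongly convex functions; all the matrices $A_i$, $i=0,\dots,p$, are positive definite. Consider the problem
$$\min_{x\in X}f_0(x), \qquad (6)$$
where
$$X=\{x\in\mathbb{R}^n:\ f_i(x)\le0,\ i\in I\}.$$
Let
$$\varphi(x)=\max\{0,\varphi_1(x)\},\qquad\varphi_1(x)=\max_{i\in I}f_i(x),\qquad x\in\mathbb{R}^n.$$
Then
$$X=\{x\in\mathbb{R}^n:\ \varphi(x)=0\}.$$
Let
$$\bar x=\arg\min_{x\in\mathbb{R}^n}\varphi_1(x).$$
We shall assume that $\varphi_1(\bar x)<0$.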

Problem (6) is a convex programming problem. For this problem regularity condition 1 is valid.

Let d be the radius of the maximal ball centered at the point $\bar x$ and inscribed into the set X. Then the number $c^*=L/(2md)$ is an exact penalty parameter for problem (6). In this equality L is a Lipschitz constant of the function $f_0$ on the set
$$X_1=\{x\in\mathbb{R}^n:\ \varphi(x)\le\varphi(x^{**})\},\qquad x^{**}=\arg\min_{x\in\mathbb{R}^n}f_0(x),$$
i.e. the number
$$L=\max_{x\in X_1}\|A_0x+b_0\|$$
can be taken as a Lipschitz constant of $f_0$ on $X_1$, and m is a strong convexity constant of the function $\varphi_1$ ($m=\min_{i\in I}m_i$, where $m_i$ is the constant of strong convexity of the function $f_i$, $i\in I$).


Example 2 Consider the optimization problem
$$\min_{x\in X}f(x), \qquad (7)$$
where
$$f(x)=f_1(x)-f_2(x),\qquad X=\{x\in\mathbb{R}^n:\ \varphi_1(x)-\varphi_2(x)\le0\},$$
and $f_1$, $f_2$, $\varphi_1$, $\varphi_2$ are convex functions defined on $\mathbb{R}^n$. Let the set X be bounded. It can be defined in the form
$$X=\{x\in\mathbb{R}^n:\ \varphi(x)=0\},$$
where $\varphi(x)=\max\{0,\varphi_1(x)-\varphi_2(x)\}$.
The function φ can be represented as the difference of convex (d.c.) functions
$$\varphi(x)=\max\{\varphi_1(x),\varphi_2(x)\}-\varphi_2(x).$$
We shall consider only points $x\in X$ where $\varphi_1(x)=\varphi_2(x)$. Then the pair of convex sets
$$D\varphi(x)=\big[\operatorname{co}\{\partial\varphi_1(x),\partial\varphi_2(x)\},\ -\partial\varphi_2(x)\big]$$
is a quasidifferential of the function φ at the point x, where $\partial\varphi_i(x)$, $i=1,2$, is the subdifferential of the convex function $\varphi_i$ at x in the sense of convex analysis. Here co A is the convex hull of A.

If regularity condition 2 is valid for the function φ, i.e. there exists a real value $\beta>0$ such that
$$\inf_{x\in\operatorname{bd}X}\ \min_{\substack{\|g\|=1\\ g\in N(X,x)}}\Big[\max_{v\in\operatorname{co}\{\partial\varphi_1(x),\partial\varphi_2(x)\}}(v,g)-\max_{w\in\partial\varphi_2(x)}(w,g)\Big]\ge\beta,$$
then there exists an exact penalty parameter $c^*$ for the sequence of penalty functions
$$F(c_k,x)=f(x)+c_k\varphi(x)=\big(f_1(x)+c_k\max\{\varphi_1(x),\varphi_2(x)\}\big)-\big(f_2(x)+c_k\varphi_2(x)\big).$$
Let the set X be defined as
$$X=\{x\in\mathbb{R}^n:\ \varphi_1(x)-\varphi_2(x)=0\};$$
then it can be rewritten as
$$X=\{x\in\mathbb{R}^n:\ \varphi(x)=0\},$$
where $\varphi(x)=\max\{0,|\varphi_1(x)-\varphi_2(x)|\}$.

In this case the function φ can be represented as the difference of convex functions, namely
$$\varphi(x)=\max\{2\varphi_1(x),2\varphi_2(x),\varphi_1(x)+\varphi_2(x)\}-\big(\varphi_1(x)+\varphi_2(x)\big).$$


If regularity condition 2 is valid for the function φ, i.e. there exists a real value $\beta>0$ such that
$$\inf_{x\in\operatorname{bd}X}\ \min_{\substack{\|g\|=1\\ g\in N(X,x)}}\Big[\max_{v\in\operatorname{co}\{2\partial\varphi_1(x),\,2\partial\varphi_2(x),\,\partial\varphi_1(x)+\partial\varphi_2(x)\}}(v,g)-\max_{w\in\partial[\varphi_1(x)+\varphi_2(x)]}(w,g)\Big]\ge\beta,$$
then there exists an exact penalty parameter $c^*$ for the sequence of penalty functions
$$F(c_k,x)=f(x)+c_k\varphi(x)=f_1(x)+c_k\max\{2\varphi_1(x),2\varphi_2(x),\varphi_1(x)+\varphi_2(x)\}-\big(f_2(x)+c_k\varphi_1(x)+c_k\varphi_2(x)\big).$$
Thus the solution of problem (7) can be obtained as the result of unconstrained optimization of d.c. functions.
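The two d.c. representations of the penalty function used above can be checked numerically. In the sketch below $\varphi_1$ and $\varphi_2$ are arbitrary convex functions chosen only for illustration (hypothetical, not taken from the article); the script compares each penalty with its d.c. form at random points of the plane.

```python
import random


# Hypothetical convex functions, chosen only to exercise the identities.
def phi1(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0


def phi2(x):
    return abs(x[0]) + 0.5 * x[1]


def penalty_ineq(x):
    # phi(x) = max{0, phi1(x) - phi2(x)} for X = {phi1 - phi2 <= 0}
    return max(0.0, phi1(x) - phi2(x))


def penalty_ineq_dc(x):
    # d.c. form: max{phi1, phi2} - phi2
    return max(phi1(x), phi2(x)) - phi2(x)


def penalty_eq(x):
    # phi(x) = |phi1(x) - phi2(x)| for X = {phi1 - phi2 = 0}
    return abs(phi1(x) - phi2(x))


def penalty_eq_dc(x):
    # d.c. form: max{2 phi1, 2 phi2, phi1 + phi2} - (phi1 + phi2)
    s = phi1(x) + phi2(x)
    return max(2.0 * phi1(x), 2.0 * phi2(x), s) - s


random.seed(0)
max_err = 0.0
for _ in range(1000):
    x = (random.uniform(-3.0, 3.0), random.uniform(-3.0, 3.0))
    max_err = max(max_err,
                  abs(penalty_ineq(x) - penalty_ineq_dc(x)),
                  abs(penalty_eq(x) - penalty_eq_dc(x)))
print(f"largest discrepancy: {max_err:.2e}")
```

The discrepancy stays at rounding level, as the identities are exact in real arithmetic.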


V.F. Demyanov [2] considers the following condition for constructing a family of exact penalty functions. Put
$$d(x)=\min_{\|g\|=1}\varphi'(x,g)=\min_{\|g\|=1}\Big[\max_{v\in\underline{\partial}\varphi(x)}(v,g)+\min_{w\in\overline{\partial}\varphi(x)}(w,g)\Big],$$
$$\psi_X(x)=\limsup_{x'\to x,\ x'\notin X}d(x').$$
Regularity Condition 3


If for some ε > 0 the set
$$X_\varepsilon=\{x\in\mathbb{R}^n:\ \varphi(x)<\varepsilon\}$$
is bounded and
$$\psi_X(x)<0\qquad\forall x\in\operatorname{bd}X,$$
then for the family of penalty functions $F(c_k,x)$ there exists an exact penalty parameter $c^*<\infty$.

To use regularity condition 3 it is necessary to know the behavior of the function φ in a neighborhood of the set X.

Sometimes the following regularity condition is used (see [7]).


Regularity Condition 4

(Condition of p-regularity.) We say that problem (1) with the set X is p-regular if there exists a positive number β such that the inequality
$$\varphi(x)\ge\beta\rho_X(x)\qquad\forall x\in\mathbb{R}^n\setminus X$$
holds, where $\rho_X$ is the Euclidean distance function to the set X.

It is easy to see that regularity condition 4 is not constructive. In [7] the existence of an exact penalty parameter for a family of penalty functions is proved for nonlinear programming problems satisfying the condition of p-regularity.
Directionally Differentiable Functions


Let f be a real-valued function defined on an open set $X\subset\mathbb{R}^n$, $x\in X$. The function f is called Dini differentiable at the point x in a direction $g\in\mathbb{R}^n$ if there exists the finite limit
$$f'_D(x,g)=\lim_{\alpha\downarrow0}\frac{f(x+\alpha g)-f(x)}{\alpha}. \qquad (1)$$
Here $\alpha\downarrow0$ means that $\alpha\to0$, $\alpha>0$. The quantity $f'_D(x,g)$ is called the Dini derivative of f at x in a direction g.
The function f is called Hadamard differentiable at the point x in a direction $g\in\mathbb{R}^n$ if there exists the finite limit
$$f'_H(x,g)=\lim_{[\alpha,g']\to[+0,g]}\frac{f(x+\alpha g')-f(x)}{\alpha}. \qquad (2)$$
The quantity $f'_H(x,g)$ is called the Hadamard derivative of f at x in a direction g. Clearly, if f is Hadamard differentiable at x in a direction g then it is Dini differentiable as well and
$$f'_H(x,g)=f'_D(x,g). \qquad (3)$$

If the limit in (1) exists and is finite for every $g\in\mathbb{R}^n$ then the function f is called Dini directionally differentiable (D-d.d.) at x.

If the limit in (2) exists and is finite for every $g\in\mathbb{R}^n$ then the function f is called Hadamard directionally differentiable (H-d.d.) at x. Of course, every function that is H-d.d. at x is also D-d.d. at x; the converse is not necessarily true.
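Definition (1) can be explored numerically by evaluating the difference quotient for decreasing α. The sketch assumes, purely for illustration, the function $f(x)=|x^{(1)}|-|x^{(2)}|+(x^{(1)})^2$ at $x=0$, whose Dini derivative is $|g^{(1)}|-|g^{(2)}|$; for this f the quotient equals that value plus $\alpha(g^{(1)})^2$, so the error shrinks linearly in α.

```python
def f(x):
    # Illustrative function: nonsmooth at the origin plus a smooth quadratic term.
    return abs(x[0]) - abs(x[1]) + x[0] ** 2


def dini_quotient(func, x, g, alpha):
    """Difference quotient (func(x + alpha*g) - func(x)) / alpha from (1)."""
    shifted = [xi + alpha * gi for xi, gi in zip(x, g)]
    return (func(shifted) - func(x)) / alpha


x0 = [0.0, 0.0]
alphas = [1e-1, 1e-3, 1e-6]
directions = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8), (-0.6, 0.8)]

quotients = {}
for g in directions:
    quotients[g] = [dini_quotient(f, x0, g, a) for a in alphas]
    exact = abs(g[0]) - abs(g[1])  # Dini derivative of f at x0 in direction g
    print(g, exact, [round(q, 8) for q in quotients[g]])
```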

The directional (and generalized directional) derivatives may be used to describe optimality conditions (see ► Dini and Hadamard derivatives in optimization). However, by using properties of special classes of functions one can expect to get more 'constructive' conditions. One such class is the family of quasidifferentiable functions.


Quasidifferentiable Functions 


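Let f be a real-valued function defined on an open set $X\subset\mathbb{R}^n$, $x\in X$. The function f is called Dini (Hadamard) quasidifferentiable (q.d.) at x if it is Dini (Hadamard) directionally differentiable at x and if its directional derivative $f'_D(x,g)$ ($f'_H(x,g)$) can be represented in the form
$$f'_D(x,g)=\max_{v\in\underline{\partial}f_D(x)}(v,g)+\min_{w\in\overline{\partial}f_D(x)}(w,g),$$
$$f'_H(x,g)=\max_{v\in\underline{\partial}f_H(x)}(v,g)+\min_{w\in\overline{\partial}f_H(x)}(w,g),$$
where the sets $\underline{\partial}f_D(x)$, $\overline{\partial}f_D(x)$, $\underline{\partial}f_H(x)$, $\overline{\partial}f_H(x)$ are convex compact sets of $\mathbb{R}^n$. The pair
$$Df_D(x)=[\underline{\partial}f_D(x),\overline{\partial}f_D(x)]\qquad\big(Df_H(x)=[\underline{\partial}f_H(x),\overline{\partial}f_H(x)]\big)$$
is called a Dini (Hadamard) quasidifferential of f at x.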
Most of the results stated below are valid for both Dini and Hadamard q.d. functions; therefore we shall use the notation $Df(x)=[\underline{\partial}f(x),\overline{\partial}f(x)]$ for both $Df_D(x)$ and $Df_H(x)$, and the pair Df(x) will just be called a quasidifferential of f at x. Analogously, the notation $f'(x,g)$ is used for both $f'_D(x,g)$ and $f'_H(x,g)$.

The directional derivative f'(x, g) is positively homogeneous (in g) of degree one:
$$f'(x,\lambda g)=\lambda f'(x,g)\qquad\forall\lambda\ge0. \qquad (4)$$

Note that Hadamard quasidifferentiability implies Dini quasidifferentiability, the converse not necessarily being true.

Thus for a quasidifferentiable (q.d.) function
$$f'(x,g)=\max_{v\in\underline{\partial}f(x)}(v,g)+\min_{w\in\overline{\partial}f(x)}(w,g)\qquad\forall g\in\mathbb{R}^n. \qquad (5)$$


The set $\underline{\partial}f(x)$ is called a subdifferential of f at x, and the set $\overline{\partial}f(x)$ is called a superdifferential of f at x. Note that a quasidifferential is not uniquely defined: if a pair D = [A, B] is a quasidifferential of f at x then, e.g., for any convex compact set $C\subset\mathbb{R}^n$ the pair $D_1=[A+C,\,B-C]$, where $B-C=\{b-c:\ b\in B,\ c\in C\}$, is also a quasidifferential of f at x (since, by (5), both pairs produce the same function f'(x, g)). The equivalence class of pairs of convex compact sets $Df(x)=[\underline{\partial}f(x),\overline{\partial}f(x)]$ producing the function f'(x, g) by formula (5) is called the quasidifferential of f at x (we shall use the same notation Df(x) for the quasidifferential of f at x).

If there exists a quasidifferential Df(x) of the form $Df(x)=[\underline{\partial}f(x),\{0_n\}]$ then f is called subdifferentiable at x. If there exists a quasidifferential Df(x) of the form $Df(x)=[\{0_n\},\overline{\partial}f(x)]$ then f is called superdifferentiable at x. Here $0_n=(0,\dots,0)\in\mathbb{R}^n$.
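When both sets of a quasidifferential are polytopes, formula (5) reduces to a maximum and a minimum over their vertices, because a linear function attains its extrema over a polytope at a vertex. The sketch below (the vertex-list representation and the helper names are our own assumptions) evaluates f'(x, g) this way and checks the non-uniqueness property: replacing [A, B] by [A + C, B - C] for an arbitrary polytope C leaves f'(x, g) unchanged.

```python
import math
from itertools import product


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def dir_derivative(sub, sup, g):
    """f'(x, g) from (5) for polytopes given by vertex lists."""
    return max(dot(v, g) for v in sub) + min(dot(w, g) for w in sup)


def minkowski_sum(A, B):
    # Vertices of A + B are among the pairwise sums (duplicates are harmless).
    return [tuple(a_i + b_i for a_i, b_i in zip(a, b)) for a, b in product(A, B)]


def negate(C):
    return [tuple(-c for c in v) for v in C]


A = [(1.0, 0.0), (-1.0, 0.0)]              # subdifferential (a segment)
B = [(0.0, 1.0), (0.0, -1.0)]              # superdifferential (a segment)
C = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]   # arbitrary triangle

A1, B1 = minkowski_sum(A, C), minkowski_sum(B, negate(C))  # the pair [A + C, B - C]

max_gap = 0.0
for k in range(16):
    t = 2.0 * math.pi * k / 16
    g = (math.cos(t), math.sin(t))
    max_gap = max(max_gap, abs(dir_derivative(A, B, g) - dir_derivative(A1, B1, g)))
print(f"largest difference between [A, B] and [A + C, B - C]: {max_gap:.1e}")
```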


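Examples of q.d. Functions


1) If f is a smooth function on X then
$$f'(x,g)=(f'(x),g), \qquad (6)$$
where f'(x) is the gradient of f at x. It is clear that
$$f'(x,g)=\max_{v\in\underline{\partial}f(x)}(v,g)+\min_{w\in\overline{\partial}f(x)}(w,g) \qquad (7)$$
with
$$\underline{\partial}f(x)=\{f'(x)\},\qquad\overline{\partial}f(x)=\{0_n\}.$$
Hence, f is Hadamard quasidifferentiable and even subdifferentiable. Since in (7) one can also take
$$\underline{\partial}f(x)=\{0_n\},\qquad\overline{\partial}f(x)=\{f'(x)\},$$
f is superdifferentiable as well.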

2) If f is a convex function on a convex open set $X\subset\mathbb{R}^n$ then (as is known from convex analysis) f is H-d.d. on X and
$$f'(x,g)=\max_{v\in\partial f(x)}(v,g),$$
where ∂f(x) is the subdifferential of f in the sense of convex analysis:
$$\partial f(x)=\{v\in\mathbb{R}^n:\ f(z)-f(x)\ge(v,z-x)\ \ \forall z\in X\}.$$
Therefore f is Hadamard quasidifferentiable and one can take the pair $Df(x)=[\partial f(x),\{0_n\}]$ as its quasidifferential. Thus, f is even subdifferentiable.

3) If f is concave on a convex set X then f is H-d.d. and
$$f'(x,g)=\min_{w\in\overline{\partial}f(x)}(w,g),$$
where
$$\overline{\partial}f(x)=\{w\in\mathbb{R}^n:\ f(z)-f(x)\le(w,z-x)\ \ \forall z\in X\}.$$
Hence, f is Hadamard quasidifferentiable and one can take the pair
$$Df(x)=[\{0_n\},\overline{\partial}f(x)]$$
as its quasidifferential. Thus, f is even superdifferentiable.


Calculus of Quasidifferentials 


The family of q.d. functions enjoys a well-developed calculus. First let us define the operation of addition of two pairs of compact convex sets and the operation of multiplication of a pair by a real number.

If $D_1=[A_1,B_1]$, $D_2=[A_2,B_2]$ are pairs of convex compact sets in $\mathbb{R}^n$ then
$$D_1+D_2=[A,B]\quad\text{with}\quad A=A_1+A_2,\ \ B=B_1+B_2.$$
If D = [A, B], where A and B are convex compact sets, and $\lambda\in\mathbb{R}$, then
$$\lambda D=\begin{cases}[\lambda A,\lambda B], & \lambda\ge0,\\ [\lambda B,\lambda A], & \lambda<0.\end{cases}$$
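These pair operations can be mirrored in code when every set is a polytope stored as a (possibly redundant) list of vertices; the Minkowski sum of two polytopes is the convex hull of all pairwise vertex sums. This representation is an assumption of the sketch, not something prescribed by the article. Note how multiplication by a negative λ exchanges the roles of the two sets; the usage builds a quasidifferential of $x^{(1)}-x^{(2)}$ from those of the smooth functions $x^{(1)}$ and $x^{(2)}$ given by (7).

```python
from itertools import product


def add_sets(A, B):
    # Minkowski sum of two polytopes given by (possibly redundant) vertex lists.
    return [tuple(x + y for x, y in zip(a, b)) for a, b in product(A, B)]


def scale_set(lam, A):
    return [tuple(lam * x for x in a) for a in A]


def add_pairs(D1, D2):
    """[A1, B1] + [A2, B2] = [A1 + A2, B1 + B2]."""
    (A1, B1), (A2, B2) = D1, D2
    return (add_sets(A1, A2), add_sets(B1, B2))


def scale_pair(lam, D):
    """lam * [A, B]: [lam A, lam B] if lam >= 0, else [lam B, lam A]."""
    A, B = D
    if lam >= 0:
        return (scale_set(lam, A), scale_set(lam, B))
    return (scale_set(lam, B), scale_set(lam, A))


# Quasidifferentials [{gradient}, {0}] of the smooth functions x1 and x2, as in (7).
D_x1 = ([(1.0, 0.0)], [(0.0, 0.0)])
D_x2 = ([(0.0, 1.0)], [(0.0, 0.0)])
D_diff = add_pairs(D_x1, scale_pair(-1.0, D_x2))  # a quasidifferential of x1 - x2
print(D_diff)
```

The result [{(1, 0)}, {(0, -1)}] reproduces f'(x, g) = g1 - g2 through (5).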


Let $X\subset\mathbb{R}^n$ be an open set.


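Proposition 1 ([1, Chap. II])

1) If functions $f_1,\dots,f_N$ are quasidifferentiable at a point $x\in X$, and $\lambda_1,\dots,\lambda_N$ are real numbers, then the function
$$f=\sum_{i=1}^N\lambda_if_i$$
is also quasidifferentiable at x with a quasidifferential $Df(x)=[\underline{\partial}f(x),\overline{\partial}f(x)]$, where
$$Df(x)=\sum_{i=1}^N\lambda_iDf_i(x), \qquad (8)$$
$Df_i(x)$ being a quasidifferential of $f_i$ at x.
2) If $f_1$ and $f_2$ are quasidifferentiable at a point $x\in X$ then the function $f=f_1\cdot f_2$ is also q.d. at x and
$$Df(x)=f_1(x)Df_2(x)+f_2(x)Df_1(x). \qquad (9)$$
3) If $f_1$ and $f_2$ are quasidifferentiable at a point $x\in X$ and $f_2(x)\ne0$ then the function $f=f_1/f_2$ is also q.d. at x and
$$Df(x)=\frac{1}{f_2^2(x)}\big[f_2(x)Df_1(x)-f_1(x)Df_2(x)\big]. \qquad (10)$$
4) Let functions $f_1,\dots,f_N$ be quasidifferentiable at a point $x\in X$. Construct the functions
$$\varphi_1(x)=\max_{i\in1:N}f_i(x),\qquad\varphi_2(x)=\min_{i\in1:N}f_i(x).$$
Then the functions $\varphi_1$ and $\varphi_2$ are q.d. at x and
$$D\varphi_1(x)=[\underline{\partial}\varphi_1(x),\overline{\partial}\varphi_1(x)],\qquad D\varphi_2(x)=[\underline{\partial}\varphi_2(x),\overline{\partial}\varphi_2(x)], \qquad (11)$$
where
$$\underline{\partial}\varphi_1(x)=\operatorname{co}\Big\{\underline{\partial}f_k(x)-\sum_{i\in R(x),\,i\ne k}\overline{\partial}f_i(x):\ k\in R(x)\Big\},\qquad\overline{\partial}\varphi_1(x)=\sum_{k\in R(x)}\overline{\partial}f_k(x),$$
$$\underline{\partial}\varphi_2(x)=\sum_{k\in Q(x)}\underline{\partial}f_k(x),\qquad\overline{\partial}\varphi_2(x)=\operatorname{co}\Big\{\overline{\partial}f_k(x)-\sum_{i\in Q(x),\,i\ne k}\underline{\partial}f_i(x):\ k\in Q(x)\Big\}.$$
Here $[\underline{\partial}f_k(x),\overline{\partial}f_k(x)]$ is a quasidifferential of the function $f_k$ at the point x,
$$R(x)=\{i\in1:N:\ f_i(x)=\varphi_1(x)\},\qquad Q(x)=\{i\in1:N:\ f_i(x)=\varphi_2(x)\}.$$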
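Item 4) of Proposition 1 can likewise be sketched for polytopal quasidifferentials. The helper below (its name and the vertex-list representation are illustrative assumptions) builds the quasidifferential of the pointwise maximum from the values $f_i(x)$ and the pairs $Df_i(x)$; the convex hull co{...} in the formula is represented implicitly by the union of the generating vertex lists. The usage applies it at the origin to the smooth functions $x^{(1)}$, $-x^{(1)}$ and the inactive $x^{(1)}-1$, whose quasidifferentials come from (7).

```python
from itertools import product


def msum(*sets):
    """Minkowski sum of polytopes given by vertex lists (duplicates kept)."""
    dim = len(sets[0][0])
    out = [tuple([0.0] * dim)]
    for S in sets:
        out = [tuple(a + b for a, b in zip(p, s)) for p, s in product(out, S)]
    return out


def neg(S):
    return [tuple(-c for c in s) for s in S]


def qd_max(values, pairs, tol=1e-12):
    """Quasidifferential of phi1 = max_i f_i at x, following item 4) of Proposition 1.

    values[i] = f_i(x); pairs[i] = (sub_i, sup_i) as vertex lists.
    The returned sub-list generates co{...} as its convex hull."""
    top = max(values)
    R = [i for i, v in enumerate(values) if v >= top - tol]  # active set R(x)
    sub = []
    for k in R:
        others = [neg(pairs[i][1]) for i in R if i != k]
        sub.extend(msum(pairs[k][0], *others))
    sup = msum(*[pairs[k][1] for k in R])
    return sub, sup


# x1, -x1 and the inactive x1 - 1, evaluated at x = (0, 0)
values = [0.0, 0.0, -1.0]
pairs = [([(1.0, 0.0)], [(0.0, 0.0)]),
         ([(-1.0, 0.0)], [(0.0, 0.0)]),
         ([(1.0, 0.0)], [(0.0, 0.0)])]
sub, sup = qd_max(values, pairs)
print(sub, sup)
```

The result is the pair [co{(1, 0), (-1, 0)}, {(0, 0)}], i.e. the ordinary subdifferential of |x1| at the origin.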


The following composition theorem holds. 


Proposition 2 ([1, Chap. III]) Let X be an open set in $\mathbb{R}^n$, Y be an open set in $\mathbb{R}^m$, and let a mapping $H(x)=(h_1(x),\dots,h_m(x))$ be defined on X and take its values in Y, its coordinate functions $h_i$ being quasidifferentiable at a point $x_0\in X$. Assume also that a function f is defined on Y and is Hadamard quasidifferentiable at the point $y_0=H(x_0)$. Then the function
$$g(x)=f(H(x))$$
is quasidifferentiable at the point $x_0$.


The corresponding formula for the quasidifferential of g at $x_0$ is presented in [1, Thm. III.2.3].


Remark 3 Thus, the family of quasidifferentiable func- 
tions is a linear space closed with respect to all ‘smooth’ 
operations and, what is most important, the operations 
of taking the pointwise maximum and minimum. For- 
mulas (8)-(10) are just generalizations of the rules of 
classical differential calculus. Most problems and re- 
sults of classical differential calculus may be formulated 
for nonsmooth functions in terms of quasidifferentials 
(see, e. g., [1,3]). For example, a mean value theorem is 
valid [5]. 
Necessary and Sufficient Conditions
for an Unconstrained Optimum


The following results hold due to the properties of directionally differentiable functions.

Let $X\subset\mathbb{R}^n$ be an open set and f be a real-valued function defined and directionally differentiable on X.


Proposition 4 For a point $x^*\in X$ to be a local or global minimizer of f on X it is necessary that
$$f'(x^*,g)\ge0\qquad\forall g\in\mathbb{R}^n. \qquad (12)$$
If f is Hadamard d.d. at $x^*$ and
$$f'_H(x^*,g)>0\qquad\forall g\in\mathbb{R}^n,\ g\ne0_n, \qquad (13)$$
then $x^*$ is a strict local minimizer of f.
For a point $x^{**}\in X$ to be a local or global maximizer of f on X it is necessary that
$$f'(x^{**},g)\le0\qquad\forall g\in\mathbb{R}^n. \qquad (14)$$
If f is Hadamard d.d. at $x^{**}$ and
$$f'_H(x^{**},g)<0\qquad\forall g\in\mathbb{R}^n,\ g\ne0_n, \qquad (15)$$
then $x^{**}$ is a strict local maximizer of f.


These conditions may be restated in terms of quasidifferentials. Let f be quasidifferentiable on an open set $X\subset\mathbb{R}^n$.


Proposition 5 (see [1,3,5]) For a point $x^*\in X$ to be a local or global minimizer of f on X it is necessary that
$$-\overline{\partial}f(x^*)\subset\underline{\partial}f(x^*). \qquad (16)$$
If f is Hadamard quasidifferentiable at $x^*$ and
$$-\overline{\partial}f(x^*)\subset\operatorname{int}\underline{\partial}f(x^*), \qquad (17)$$
then $x^*$ is a strict local minimizer of f.
For a point $x^{**}\in X$ to be a local or global maximizer of f on X it is necessary that
$$-\underline{\partial}f(x^{**})\subset\overline{\partial}f(x^{**}). \qquad (18)$$
If f is Hadamard quasidifferentiable at $x^{**}$ and
$$-\underline{\partial}f(x^{**})\subset\operatorname{int}\overline{\partial}f(x^{**}), \qquad (19)$$
then $x^{**}$ is a strict local maximizer of f.


Remark 6 The quasidifferential represents a generalization of the notion of gradient to the nonsmooth case, and therefore conditions (16)-(19) are first order optimality conditions.

In the smooth case one can take $Df(x)=[\underline{\partial}f(x),\overline{\partial}f(x)]$, where $\underline{\partial}f(x)=\{f'(x)\}$, $\overline{\partial}f(x)=\{0_n\}$; therefore condition (16) is equivalent to
$$f'(x^*)=0_n, \qquad (20)$$
condition (18) is equivalent to
$$f'(x^{**})=0_n, \qquad (21)$$
and, since both sets $\underline{\partial}f$ and $\overline{\partial}f$ are singletons, conditions (17) and (19) are impossible. Thus, conditions (17) and (19) are essentially 'nonsmooth'.


A point $x^*\in X$ satisfying (16) is called an inf-stationary point, and a point $x^{**}\in X$ satisfying (18) is called a sup-stationary point of f. In the smooth case the necessary condition for a minimum (20) is the same as the necessary condition for a maximum (21).


Directions of Steepest Descent and Ascent 


Let $x\in X$ not be an inf-stationary point of f (i.e. condition (16) is not satisfied). Take $w\in\overline{\partial}f(x)$ and find
$$\min_{v\in\underline{\partial}f(x)}\|v+w\|=\|v(w)+w\|=\rho_1(w).$$
Since $\underline{\partial}f(x)$ is a convex compact set, the point v(w) is unique. Now find
$$\max_{w\in\overline{\partial}f(x)}\rho_1(w)=\rho_1(w(x)).$$
The point w(x) is not necessarily unique. As x is not an inf-stationary point, $\rho_1(w(x))>0$. The direction
$$g_1(x)=-\frac{v(w(x))+w(x)}{\|v(w(x))+w(x)\|}=-\frac{v(w(x))+w(x)}{\rho_1(w(x))} \qquad (22)$$
is a steepest descent direction of the function f at the point x, i.e.
$$f'(x,g_1(x))=\min_{\|g\|=1}f'(x,g).$$
Here $\|\cdot\|$ is the Euclidean norm. The quantity $f'(x,g_1(x))=-\rho_1(w(x))$ is the rate of steepest descent of f at x. It may happen that the set of steepest descent directions is not a singleton (and it need not be convex either). Recall that in the smooth case the steepest descent direction is always unique (if x is not a stationary point).

Similarly, if $x\in X$ is not a sup-stationary point of f (i.e. condition (18) does not hold), take $v\in\underline{\partial}f(x)$ and find
$$\min_{w\in\overline{\partial}f(x)}\|v+w\|=\|v+w(v)\|=\rho_2(v)$$
and
$$\max_{v\in\underline{\partial}f(x)}\rho_2(v)=\rho_2(v(x)).$$
The direction
$$g_2(x)=\frac{v(x)+w(v(x))}{\|v(x)+w(v(x))\|}=\frac{v(x)+w(v(x))}{\rho_2(v(x))} \qquad (23)$$
is a steepest ascent direction of the function f at x, i.e.
$$f'(x,g_2(x))=\max_{\|g\|=1}f'(x,g).$$
The quantity $f'(x,g_2(x))=\rho_2(v(x))$ is the rate of steepest ascent of f at x. As above, it may happen that there exist many steepest ascent directions.
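Formulas (22) and (23) become computable once the quasidifferential is a pair of polytopes. The sketch below relies on our own assumptions rather than on a method stated in the article. Here $\rho_1(w)$ is the distance from the origin to the polytope formed by the subdifferential shifted by w; this distance is convex in w, so its maximum over the superdifferential is attained at a vertex, and it suffices to scan the vertices. The minimum-norm point of a polytope is approximated by projected gradient on barycentric weights, a deliberately simple stand-in for exact methods such as Wolfe's algorithm; the stationarity tolerance `eps` is likewise an assumption. When several maximizers exist, only the first one found is returned.

```python
import math


def project_simplex(y):
    """Euclidean projection of y onto {lam : lam_i >= 0, sum(lam) = 1}."""
    u = sorted(y, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(yi - theta, 0.0) for yi in y]


def min_norm_point(points, iters=5000):
    """Approximate minimum-norm point of co{points} by projected gradient."""
    n, dim = len(points), len(points[0])
    lam = [1.0 / n] * n
    step = 1.0 / (2.0 * sum(c * c for p in points for c in p) + 1e-12)
    for _ in range(iters):
        z = [sum(lam[i] * points[i][d] for i in range(n)) for d in range(dim)]
        grad = [2.0 * sum(points[i][d] * z[d] for d in range(dim)) for i in range(n)]
        lam = project_simplex([lam[i] - step * grad[i] for i in range(n)])
    return [sum(lam[i] * points[i][d] for i in range(n)) for d in range(dim)]


def shift(S, w):
    return [tuple(a + b for a, b in zip(s, w)) for s in S]


def _farthest(base, movers):
    # max over vertices m of dist(0, co(base) + m); returns (rho, min-norm point)
    best_rho, best_z = -1.0, None
    for m in movers:
        z = min_norm_point(shift(base, m))
        rho = math.sqrt(sum(c * c for c in z))
        if rho > best_rho:
            best_rho, best_z = rho, z
    return best_rho, best_z


def steepest_descent(sub, sup, eps=1e-6):
    """Direction (22) and rate -rho1; (None, 0.0) if x is numerically inf-stationary."""
    rho, z = _farthest(sub, sup)  # z = v(w(x)) + w(x)
    if rho <= eps:
        return None, 0.0
    return [-c / rho for c in z], -rho


def steepest_ascent(sub, sup, eps=1e-6):
    """Direction (23) and rate rho2; (None, 0.0) if x is numerically sup-stationary."""
    rho, z = _farthest(sup, sub)  # z = v(x) + w(v(x))
    if rho <= eps:
        return None, 0.0
    return [c / rho for c in z], rho


# Quasidifferential of f(x) = |x1| - |x2| at the origin, as vertex lists.
sub = [(1.0, 0.0), (-1.0, 0.0)]
sup = [(0.0, 1.0), (0.0, -1.0)]
print(steepest_descent(sub, sup))
print(steepest_ascent(sub, sup))
```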


Remark 7 Thus, the necessary conditions (16) and (18) 
are ‘constructive’: in the case where one of these condi- 
tions is violated we are able to find steepest descent or 
ascent directions. 


The condition for a minimum (16) can be rewritten in the equivalent form
$$0_n\in\bigcap_{w\in\overline{\partial}f(x^*)}\big[\underline{\partial}f(x^*)+w\big]:=L_1(x^*), \qquad (24)$$
and the condition for a maximum (18) can also be represented in the equivalent form
$$0_n\in\bigcap_{v\in\underline{\partial}f(x^{**})}\big[\overline{\partial}f(x^{**})+v\big]:=L_2(x^{**}). \qquad (25)$$

However, if, for example, (24) is violated at a point x, we are unable to recover steepest descent directions; it may even happen that the set $L_1(x)$ is empty (see [1, Sects. V.2 and V.3]).

Therefore, condition (24) is not 'constructive': if a point x is not inf-stationary, then condition (24) supplies no information about the behavior of the function in a neighborhood of x, and we are unable to get a 'better' point (e.g., one with a smaller value of the function). The same is true for the condition for a maximum (25). Nevertheless conditions (24) and (25) may be useful for some other purposes.


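Example 8 Let $x=(x^{(1)},x^{(2)})\in\mathbb{R}^2$, $x_0=(0,0)$, $f(x)=|x^{(1)}|-|x^{(2)}|$. We have $f(x)=f_1(x)-f_2(x)$, where $f_1(x)=\max\{f_3(x),f_4(x)\}$, $f_2(x)=\max\{f_5(x),f_6(x)\}$, $f_3(x)=x^{(1)}$, $f_4(x)=-x^{(1)}$, $f_5(x)=x^{(2)}$, $f_6(x)=-x^{(2)}$. The functions $f_3,\dots,f_6$ are smooth, therefore (see (7))
$$Df_3(x)=[\underline{\partial}f_3(x),\overline{\partial}f_3(x)]\ \text{with}\ \underline{\partial}f_3(x)=\{(1,0)\},\ \overline{\partial}f_3(x)=\{(0,0)\},$$
$$Df_4(x)=[\underline{\partial}f_4(x),\overline{\partial}f_4(x)]\ \text{with}\ \underline{\partial}f_4(x)=\{(-1,0)\},\ \overline{\partial}f_4(x)=\{(0,0)\},$$
$$Df_5(x)=[\underline{\partial}f_5(x),\overline{\partial}f_5(x)]\ \text{with}\ \underline{\partial}f_5(x)=\{(0,1)\},\ \overline{\partial}f_5(x)=\{(0,0)\},$$
$$Df_6(x)=[\underline{\partial}f_6(x),\overline{\partial}f_6(x)]\ \text{with}\ \underline{\partial}f_6(x)=\{(0,-1)\},\ \overline{\partial}f_6(x)=\{(0,0)\}.$$

Applying (11) one gets $Df_1(x_0)=[\underline{\partial}f_1(x_0),\overline{\partial}f_1(x_0)]$, where
$$\underline{\partial}f_1(x_0)=\operatorname{co}\{\underline{\partial}f_3(x_0)-\overline{\partial}f_4(x_0),\ \underline{\partial}f_4(x_0)-\overline{\partial}f_3(x_0)\}=\operatorname{co}\{(1,0),(-1,0)\},$$
$$\overline{\partial}f_1(x_0)=\{(0,0)\},$$
and $Df_2(x_0)=[\underline{\partial}f_2(x_0),\overline{\partial}f_2(x_0)]$, where
$$\underline{\partial}f_2(x_0)=\operatorname{co}\{\underline{\partial}f_5(x_0)-\overline{\partial}f_6(x_0),\ \underline{\partial}f_6(x_0)-\overline{\partial}f_5(x_0)\}=\operatorname{co}\{(0,1),(0,-1)\},$$
$$\overline{\partial}f_2(x_0)=\{(0,0)\}.$$
Finally, formula (8) yields $Df(x_0)=[\underline{\partial}f(x_0),\overline{\partial}f(x_0)]$, where
$$\underline{\partial}f(x_0)=\operatorname{co}\{(1,0),(-1,0)\},\qquad\overline{\partial}f(x_0)=\operatorname{co}\{(0,1),(0,-1)\}.$$

Since (see Fig. 1) conditions (16) and (18) are not satisfied, the point $x_0$ is neither inf-stationary nor sup-stationary.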
Quasidifferentiable Optimization: Optimality Conditions, Figure 1


Quasidifferentiable Optimization: Optimality Conditions, Figure 2


Applying (22) and (23) we conclude that there exist two directions of steepest descent, $g_1'=(0,1)$ and $g_1''=(0,-1)$, and two directions of steepest ascent, $g_2'=(1,0)$ and $g_2''=(-1,0)$.

It is also clear that the sets (see (24), (25))
$$L_1(x_0)=\bigcap_{w\in\overline{\partial}f(x_0)}\big[\underline{\partial}f(x_0)+w\big]\quad\text{and}\quad L_2(x_0)=\bigcap_{v\in\underline{\partial}f(x_0)}\big[\overline{\partial}f(x_0)+v\big]$$
are both empty.
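Example 8 can be double-checked without the quasidifferential machinery. Since f is positively homogeneous, $f'(x_0,g)=|g^{(1)}|-|g^{(2)}|$, and scanning unit directions in one-degree steps (an arbitrary discretization chosen for this sketch) should recover the two steepest descent and the two steepest ascent directions obtained from (22) and (23).

```python
import math


def f(x1, x2):
    return abs(x1) - abs(x2)


def dir_derivative(g, alpha=1e-8):
    # f is positively homogeneous of degree one, so the quotient at x0 = (0, 0)
    # equals f'(x0, g) up to rounding for any alpha > 0.
    return (f(alpha * g[0], alpha * g[1]) - f(0.0, 0.0)) / alpha


vals = {}
for deg in range(360):
    t = math.radians(deg)
    vals[deg] = dir_derivative((math.cos(t), math.sin(t)))

lo, hi = min(vals.values()), max(vals.values())
argmin = [d for d, v in vals.items() if v <= lo + 1e-9]
argmax = [d for d, v in vals.items() if v >= hi - 1e-9]
print("steepest descent rate", round(lo, 9), "at angles", argmin)
print("steepest ascent rate", round(hi, 9), "at angles", argmax)
```

The angles 90 and 270 degrees correspond to the directions (0, 1) and (0, -1); the angles 0 and 180 degrees correspond to (1, 0) and (-1, 0).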


Remark 9 If a function f is directionally differentiable but not quasidifferentiable, and if its directional derivative f'(x, g) is continuous as a function of direction (this is the case, e.g., if f is directionally differentiable and Lipschitz), then (by the Stone-Weierstrass theorem) its directional derivative may be approximated (to within any given accuracy) by the difference of two positively homogeneous convex functions, i.e.
$$f'(x,g)\approx\max_{v\in A}(v,g)+\min_{w\in B}(w,g), \qquad (26)$$
where A and B are convex compact sets in $\mathbb{R}^n$. Relation (26) shows that f' can be studied by means of quasidifferential calculus (e.g., one is able to find an approximation of a steepest descent direction). Corresponding results can be found in [1,4].


Remark 10 In many cases of practical importance the quasidifferential of a function f is a pair of sets, each of them being the convex hull of a finite number of points and/or balls. If this happens, it is easy to store and operate with the quasidifferential, to check necessary conditions, to find directions of descent or ascent, and to construct numerical methods.


Necessary and Sufficient Conditions
for a Constrained Optimum


Let a function f be defined and finite on some open set $X\subset\mathbb{R}^n$ and let $\Omega\subset X$. Consider the problem of finding a minimum or a maximum of f on Ω. For definiteness, in the sequel we shall consider only the problem of minimizing f on Ω, since the problem of maximizing f is the problem of minimizing the function $f_1=-f$.

Let $x\in\Omega$. The set
$$\Gamma(x,\Omega)=\Big\{g\in\mathbb{R}^n:\ \exists\{[\alpha_k,g_k]\}:\ [\alpha_k,g_k]\to[+0,g],\ x+\alpha_kg_k\in\Omega\ \ \forall k\Big\} \qquad (27)$$
is called the Bouligand cone to the set Ω at the point x (or the cone of feasible directions). It is nonempty and closed. If $x\in\operatorname{int}\Omega$ then $\Gamma(x,\Omega)=\mathbb{R}^n$.


Proposition 11 Let f be Hadamard directionally differentiable at a point $x^*\in\Omega$. For the point $x^*$ to be a local or global minimizer of f on Ω it is necessary that
$$f'_H(x^*,g)\ge0\qquad\forall g\in\Gamma(x^*,\Omega). \qquad (28)$$
If
$$f'_H(x^*,g)>0\qquad\forall g\in\Gamma(x^*,\Omega),\ g\ne0_n, \qquad (29)$$
then $x^*$ is a strict local minimizer of f on Ω, i.e. there exists δ > 0 such that
$$f(x)>f(x^*)\qquad\forall x\in\Omega,\ \|x-x^*\|<\delta,\ x\ne x^*.$$


The set $\Omega\subset\mathbb{R}^n$ is called quasidifferentiable if it can be represented in the form
$$\Omega=\{x\in\mathbb{R}^n:\ h(x)\le0\}, \qquad (30)$$
where h is a quasidifferentiable function.
Take $x\in\Omega$ and consider the cones
$$\gamma_1(x)=\{g\in\mathbb{R}^n:\ h'(x,g)<0\},\qquad\gamma(x)=\{g\in\mathbb{R}^n:\ h'(x,g)\le0\}.$$
Let h(x) = 0. We say that the regularity condition is satisfied at x if
$$\operatorname{cl}\gamma_1(x)=\gamma(x). \qquad (31)$$
If h(x) = 0 (i.e., by (30), $x\in\Omega$) and the regularity condition (31) holds then
$$\Gamma(x,\Omega)=\gamma(x). \qquad (32)$$

Now we are able to express conditions (28) and (29) in terms of quasidifferentials of the functions f and h.

If h(x) < 0 then $x\in\operatorname{int}\Omega$, $\Gamma(x,\Omega)=\mathbb{R}^n$, and the unconstrained conditions (16) and (17) of Proposition 5 apply. Therefore let us consider the case h(x) = 0.


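Proposition 12 Let functions f and h be Hadamard quasidifferentiable at a point $x^*\in\Omega$ and $h(x^*)=0$. Assume also that the regularity condition (31) is satisfied at $x^*$. For the point $x^*$ to be a local or global minimizer of f on Ω it is necessary that
$$\big(\underline{\partial}f(x^*)+w\big)\cap\Big[-\operatorname{cl}\big(\operatorname{cone}(\underline{\partial}h(x^*)+w')\big)\Big]\ne\emptyset \qquad (33)$$
for all $w\in\overline{\partial}f(x^*)$, $w'\in\overline{\partial}h(x^*)$.
Condition (33) is equivalent to the condition
$$-\overline{\partial}f(x^*)\subset L(x^*), \qquad (34)$$
where
$$L(x)=\bigcap_{w'\in\overline{\partial}h(x)}\Big[\underline{\partial}f(x)+\operatorname{cl}\big(\operatorname{cone}(\underline{\partial}h(x)+w')\big)\Big].$$
The set L(x) is nonempty and convex.
If $h(x^*)=0$ and
$$-\overline{\partial}f(x^*)\subset\operatorname{int}L(x^*), \qquad (35)$$
then $x^*$ is a strict local minimizer of f on Ω.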

A point $x^*\in\Omega$ is called an inf-stationary point of f on Ω if condition (28) holds.

Let $x\in\Omega$, h(x) = 0. Assume that x is not an inf-stationary point and find
$$\min_{\substack{z\in[\underline{\partial}f(x)+w]\\ z'\in\operatorname{cl}(\operatorname{cone}(\underline{\partial}h(x)+w'))}}\|z+z'\|=\|z(w,w')+z'(w,w')\|=\|q(w,w')\|=d(w,w')$$
and
$$\rho(x)=\max_{\substack{w\in\overline{\partial}f(x)\\ w'\in\overline{\partial}h(x)}}d(w,w')=d(w_0,w_0')=\|q(w_0,w_0')\|. \qquad (36)$$
Since x is not inf-stationary, $\rho(x)>0$.
Proposition 13 If h(x) = 0 and the regularity condition (31) holds then the direction
$$g_0=-\frac{q(w_0,w_0')}{\rho(x)} \qquad (37)$$
is a steepest descent direction of f on Ω at x, $g_0\in\Gamma(x,\Omega)$, and
$$f'(x,g_0)=\min_{\substack{\|g\|=1\\ g\in\Gamma(x,\Omega)}}f'(x,g)=-\|q(w_0,w_0')\|=-\rho(x),$$
i.e. $-\rho(x)$ is the rate of steepest descent.

Remark 14 If there exist several pairs $[w_0,w_0']$ ($w_0\in\overline{\partial}f(x)$, $w_0'\in\overline{\partial}h(x)$) satisfying (36), then (by (37)) there are several steepest descent directions.


Remark 15 Condition (33) is also equivalent to
$$0_n\in\bigcap_{\substack{w\in\overline{\partial}f(x^*)\\ w'\in\overline{\partial}h(x^*)}}\Big[\underline{\partial}f(x^*)+w+\operatorname{cl}\big(\operatorname{cone}(\underline{\partial}h(x^*)+w')\big)\Big]=L'(x^*). \qquad (38)$$

However, condition (38) is not 'constructive', since the set L'(x) may happen to be empty if x is not a stationary point (we consider the case h(x) = 0).


Proposition 16 Let $x^*\in\Omega$ and $h(x^*)=0$. Assume that the functions f and h are quasidifferentiable at $x^*$. For the point $x^*$ to be a local or global minimizer of f on Ω it is necessary that
$$L_1(x^*)\subset L_2(x^*), \qquad (39)$$
where
$$L_1(x)=-\big[\overline{\partial}f(x)+\overline{\partial}h(x)\big],\qquad L_2(x)=\operatorname{co}\big\{\underline{\partial}f(x)-\overline{\partial}h(x),\ \underline{\partial}h(x)-\overline{\partial}f(x)\big\}.$$

If, in addition, f and h are Hadamard q.d. at $x^*$, $h(x^*)=0$ and $L_1(x^*)\subset\operatorname{int}L_2(x^*)$, then $x^*$ is a strict local minimizer of f on Ω.


Proposition 17 Let $h(x^*)=0$ and let f and h be Hadamard q.d. at $x^*$. If the regularity condition (31) holds at $x^*$ then condition (39) is equivalent to condition (28).


Let $x\in\Omega$, h(x) = 0. Assume that (39) does not hold. Find
$$d(x)=\max_{v\in L_1(x)}\rho(v)=\rho(v(x)),$$
where
$$\rho(v)=\min_{w\in L_2(x)}\|v-w\|=\|v-w(v)\|.$$
Since (39) is not satisfied, $\rho(v(x))>0$.


Proposition 18 The direction
$$g_0'=\frac{v(x)-w(v(x))}{\rho(v(x))} \qquad (40)$$
is a descent direction of f on Ω at x.


Remark 19 While the steepest descent direction $g_0$ (see (37)) may not be admissible, the direction $g_0'$ (see (40)) is always admissible, i.e. for sufficiently small α > 0 we have $x+\alpha g_0'\in\Omega$.


Recent results and the state-of-the-art in quasidifferen- 
tial calculus can be found in [2]. 
Problems in mechanics whose governing relations can be obtained from a generally nondifferentiable and nonconvex, but quasidifferentiable (in the sense of V.F. Demyanov and A.M. Rubinov) potential function are considered. Such potentials provide a fairly general framework for the modeling and study of nonsmooth problems in mechanics [4], and they cover certain classes of variational and hemivariational inequality problems of mechanics [14,15]. The notion of hemivariational inequalities has been introduced and thoroughly studied in mechanics by P.D. Panagiotopoulos (see also ► Nonconvex energy functions: Hemivariational inequalities; ► Hemivariational inequalities: Applications in mechanics). Moreover, there exists extensive theoretical support for the use of quasidifferential calculus and optimization techniques, see, e.g., [3,4]. For methods and heuristic algorithms of nonconvex optimization in computational mechanics, see [12]. In this short note some techniques for treating stability problems for nonsmooth structures are outlined. In this way results for classical, smooth structures (e.g., [1,10,11]) can be extended to cover nonsmooth ones (cf. also [8,9]). This work and the preliminary results outlined here are based on [18,19].

All previously mentioned potentials are piecewise- 
differentiable and may be described, in general, as con- 
tinuous selections of differentiable functions. In turn, 
the structural analysis problem results from minimality 
or in general critical point conditions of the potential 
(see examples in [2,6,7,14,15]). 

Results from stability analysis of parametric opti- 
mization problems for nondifferentiable functions are 
used for the study of a stepwise holonomic, incremental 
structural analysis problem. In particular the system- 
atic first and second order linearizations proposed in 
respectively, and the arising normal forms are adopted 
for the potential energy function. 

The techniques outlined here may be useful both for 
the analysis of the stability of structures which involve 
nonmonotone and possibly multivalued nonlinearities 
(in a holonomic or a stepwise holonomic setting) and 
for the design of incremental-iterative algorithms for 
structural analysis purposes. 


Smooth Potentials and Stability in Mechanics 


Let a discretized elastostatic analysis problem be formulated as a potential energy minimization problem:
$$\min_{u\in U_{ad}}\big\{\Pi(u,\lambda)=\Pi(e(u))-p(u,\lambda)\big\}. \qquad (1)$$
Here u is the n-vector of displacement degrees of freedom, e is the m-vector of discrete element deformations, $\Pi(e)\in\mathbb{R}$ is the internal energy density, $p(u,\lambda)\in\mathbb{R}$ is the external loading potential, parametrized by a loading scalar $\lambda\in\mathbb{R}^1$, and $U_{ad}\subset\mathbb{R}^n$ is the space of admissible displacements. Displacement u and deformation e vectors are connected by the geometric compatibility operator $A(u):\mathbb{R}^n\to\mathbb{R}^m$ such that e(u) = A(u) holds.
On the assumption that Π(u, λ) is smooth, the equilibrium configurations of the structural system are critical points of this potential, i.e. for a fixed load $\lambda=\bar\lambda$ one has
$$u\in\Sigma^{\bar\lambda}_{\rm crit}=\{u\in\mathbb{R}^n:\ \nabla_u\Pi(u,\bar\lambda)=0\}. \qquad (2)$$

Moreover, inspection of the second order derivatives (the Hessian matrix of Π(u, λ)) gives stability information [1]. If u is a nondegenerate critical point, i.e. $u\in\Sigma_{\rm crit}$ and $\nabla^2\Pi(u,\lambda)$ is regular (nonsingular), then a positive or negative definite Hessian $\nabla^2\Pi(u,\lambda)$ indicates that the point u is a local minimum or maximum, respectively. Only local minima correspond to stable equilibrium configurations. If $\nabla^2\Pi(u,\lambda)$ is singular at $u\in\Sigma_{\rm crit}$, then higher order derivatives of Π(u, λ) must be examined for stability [1].

If $u_0$ is either a noncritical or a nondegenerate critical point of Π(u, λ), which is assumed here to be at least a $C^2$-function, i.e. two times continuously differentiable, then Π(u, λ) is $C^1$-equivalent to its second order approximation around $u_0$, i.e.
$$\Pi(u,\bar\lambda)=\Pi(u_0,\bar\lambda)+\nabla_u\Pi(u_0,\bar\lambda)^T(\Phi(u)-u_0)+\tfrac12(\Phi(u)-u_0)^T\nabla_u^2\Pi(u_0,\bar\lambda)(\Phi(u)-u_0), \qquad (3)$$
where Φ is a $C^1$-coordinate transformation (diffeomorphism). In the vicinity of a nondegenerate critical point the behavior of Π(u, λ) is characterized by the number of negative eigenvalues of $\nabla^2\Pi(u,\lambda)$ (the quadratic index).

In the coordinates $\Delta u=u-u_0$ and the notation $\Delta\Pi(\Delta u,\bar\lambda)=\Pi(u,\bar\lambda)-\Pi(u_0,\bar\lambda)$ we can determine (cf. [5, p. 21]) a local $C^1$-coordinate transformation $\Phi:U\to V$, where U, V are neighborhoods of the origin, such that, at a nondegenerate critical point with quadratic index k,
$$\Delta\Pi(\Phi^{-1}(y),\bar\lambda)=-y_1^2-\dots-y_k^2+y_{k+1}^2+\dots+y_n^2\qquad\forall y\in V. \qquad (4)$$

Qualitative stability results for a fixed load $\lambda=\bar\lambda$ are recognized in the normal form (4).
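For a structure with two displacement degrees of freedom, the Hessian test described above reduces to the signs of two eigenvalues. The sketch below is an illustration under that assumption (the function names, the tolerance and the sample matrices are ours). It counts the negative eigenvalues of a symmetric 2x2 Hessian, i.e. the quadratic index, and flags singular Hessians, for which higher order derivatives must be examined.

```python
import math


def sym2_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]] in closed form."""
    mean = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mean - rad, mean + rad


def classify_critical_point(hessian, tol=1e-12):
    """Classify a critical point of a 2-DOF potential from its symmetric Hessian."""
    (a, b), (_, c) = hessian
    eigs = sym2_eigenvalues(a, b, c)
    if any(abs(e) <= tol for e in eigs):
        return "degenerate: examine higher order derivatives", None
    index = sum(1 for e in eigs if e < 0)  # quadratic index
    if index == 0:
        return "local minimum (stable)", index
    if index == len(eigs):
        return "local maximum (unstable)", index
    return "saddle point (unstable)", index


for H in ([[2.0, 0.0], [0.0, 3.0]], [[1.0, 2.0], [2.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]):
    print(H, classify_critical_point(H))
```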


Incremental Algorithm 


Incremental-iterative solution algorithms are based on 
appropriate approximations of (1). Let us consider the 


one-parametric load incrementation on the following 
case of (1) (cf. [10]): 


min {77 (u, A) = IT(u) — Ap! u} ; (5) 
ue n 
For equilibrium we have 


Vill(u,A)=0 => V,JT(u)—Ap=0 (6) 
For the examination of the stability of a solution we 
study the following relation in terms of A — Ap = A A, 
(defining Au as a function of AA, if V? IT (uo) is regu- 
lar) 


V2TT(uy)Au + AAp =0, (7) 


which connects the incremental displacement $\Delta u$ with a change of loading equal to $\Delta\lambda\, p$. Relation (7) is obtained by subtracting the Taylor expansions of the equilibrium equation (6) at $(u_0 + \Delta u, \lambda_0 + \Delta\lambda)$ and at $(u_0, \lambda_0)$, and by using the approximation (up to higher order terms)

$$\nabla_u\Pi(u_0 + \Delta u) = \nabla_u\Pi(u_0) + \nabla^2\Pi(u_0)\,\Delta u. \tag{8}$$


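Consider the coordinate transformation $\Delta u = \Phi^{-1}(y) = \phi_i y_i = F y$, where the $\phi_i$ (the columns of $F$) are the orthonormal eigenvectors of $\nabla^2\Pi(u_0)$ and the summation convention over repeated indices is used. Then equation (7) is written in the new coordinate system as:

$$\begin{aligned}
&\nabla^2\Pi(u_0)\,F y - p\,\Delta\lambda = 0\\
\Longrightarrow\;& F^{\top}\nabla^2\Pi(u_0)\,F y - F^{\top} p\,\Delta\lambda = 0\\
\Longrightarrow\;& [\omega_i y_i] - F^{\top} p\,\Delta\lambda = 0.
\end{aligned}\tag{9}$$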


Here $\omega_i$ are the eigenvalues of the local tangential stiffness matrix $K(u_0) = \nabla^2\Pi(u_0)$, which act as stability coefficients for the linearized equation of equilibrium (7) [10,11].
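The modal form (9) can be illustrated numerically. The Python sketch below is a hypothetical illustration, not an algorithm from [10,11]: it diagonalizes a small symmetric tangent stiffness matrix $K(u_0)$ with the classical cyclic Jacobi rotation method, solves $\omega_i y_i = (F^{\top} p)_i\,\Delta\lambda$ mode by mode, maps back with $\Delta u = F y$, and reports whether all stability coefficients $\omega_i$ are positive. Names such as `modal_increment` are assumptions; for large systems one would use a library eigensolver instead.

```python
import math


def jacobi_eig(K, max_sweeps=50):
    """Eigenvalues and orthonormal eigenvectors (columns of F) of symmetric K."""
    n = len(K)
    A = [list(map(float, row)) for row in K]
    F = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-300:
                    continue
                # rotation angle chosen so that the new A[p][q] vanishes
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors F <- F J
                    fkp, fkq = F[k][p], F[k][q]
                    F[k][p], F[k][q] = c * fkp - s * fkq, s * fkp + c * fkq
    return [A[i][i] for i in range(n)], F


def modal_increment(K, p, dlam, tol=1e-12):
    """Solve K du = dlam * p in eigen-coordinates, cf. (7) and (9)."""
    omega, F = jacobi_eig(K)
    n = len(K)
    if any(abs(w) < tol for w in omega):
        raise ValueError("singular tangent stiffness: (7) has no unique solution")
    load = [dlam * sum(F[k][i] * p[k] for k in range(n)) for i in range(n)]  # F^T p dlam
    y = [load[i] / omega[i] for i in range(n)]  # omega_i y_i = (F^T p)_i dlam
    du = [sum(F[k][i] * y[i] for i in range(n)) for k in range(n)]  # du = F y
    stable = all(w > 0 for w in omega)  # all stability coefficients positive
    return du, omega, stable
```

For $K = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}$, $p = (1, 0)$ and $\Delta\lambda = 0.3$ the sketch gives $\Delta u \approx (0.2, -0.1)$ with stability coefficients 1 and 3, i.e. a stable configuration.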


Nonsmooth Superpotentials 


Let us assume problem (1) with a nonsmooth potential energy function. For simplicity let only the internal energy function $\Pi(u)$ be nonsmooth, and let $U_{\mathrm{ad}} = \mathbb{R}^n$ in (1).

Let $V$ denote an open subset of $\mathbb{R}^n$. We call a function $f\colon V \to \mathbb{R}$ a continuous selection of the $C^r$-functions $g_i\colon V \to \mathbb{R}$, $1 \le i \le k$ (briefly, $f \in CS\{g_1,\dots,g_k\}$), iff $f$ is continuous and $\forall u \in V\ \exists i \in \{1,\dots,k\}\colon f(u) = g_i(u)$. Let $\Pi(u)$ in (1) be a piecewise differentiable ($PC^r$) function of appropriate order $r \ge 1$, defined on an open set $U \subset \mathbb{R}^n$. This means (cf. [7]) that at every point $u_0 \in U$ there exist an open neighborhood $V \subset U$ and a finite collection of $C^r$-functions $\{\Pi_1,\dots,\Pi_k\}$ defined on $V$ such that $\Pi|_V \in CS\{\Pi_1,\dots,\Pi_k\}$.

Let $I(u)$ be the active index set $\{i\colon \Pi(u) = \Pi_i(u)\}$. One considers a smooth external loading potential $p(u,\lambda)$, which depends on the one-dimensional loading parameter $\lambda$ (cf. (1)).

The assumption of a $PC^r$-potential energy function is very general and covers a large number of nonsmooth mechanics applications (see also [13, Chap. 8]). The requirements that a $PC^r$-function must meet in order to be the potential of a certain structural analysis problem must be investigated in more detail on a case-by-case basis.

Any $PC^r$-potential is locally Lipschitz continuous and Bouligand differentiable, with the B-derivative at a point $u_0 \in \mathbb{R}^n$ in the direction $d \in \mathbb{R}^n$ being a continuous selection of the functions $\nabla\Pi_i(u_0)^{\top} d$, $i \in I_e(u_0)$. Here $I_e(u_0)$ denotes the essentially active index set

$$I_e(u_0) = \{ i \in I(u_0)\colon u_0 \in \operatorname{cl}(\operatorname{int}(\{u \in U\colon \Pi(u) = \Pi_i(u)\}))\},$$

with cl (resp. int) abbreviating the closure (resp. the interior) of a set. For completeness, recall that Clarke's generalized subdifferential is given by [7]

$$\partial_C\Pi(u_0) = \operatorname{conv}\{\nabla\Pi_i(u_0)\colon i \in I_e(u_0)\},$$

where conv stands for the convex hull.
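For a function of one variable the convex hull above is simply an interval. The following Python sketch is an illustration under stated assumptions, not a method from [7]: it approximates $\partial_C\Pi(u_0)$ for a one-dimensional continuous selection by collecting the derivatives of the pieces that coincide with $\Pi$ at $u_0$ and also at a nearby point on at least one side. This one-sided test is a numerical heuristic for the condition $u_0 \in \operatorname{cl}(\operatorname{int}\{\Pi = \Pi_i\})$ and can fail for pathological pieces; the step `h`, the tolerance `tol` and the function name are assumptions.

```python
def clarke_subdifferential_1d(pi, pieces, u0, h=1e-6, tol=1e-9):
    """Approximate Clarke subdifferential [a, b] of a 1-D PC^1 function at u0.

    pi: the continuous selection itself; pieces: list of (Pi_i, dPi_i)
    callables with pi in CS{Pi_1, ..., Pi_k}.  A piece counts as essentially
    active if it agrees with pi at u0 and just to the left or right of u0.
    """
    grads = []
    for f, df in pieces:
        active = abs(f(u0) - pi(u0)) <= tol
        left = abs(f(u0 - h) - pi(u0 - h)) <= tol
        right = abs(f(u0 + h) - pi(u0 + h)) <= tol
        if active and (left or right):
            grads.append(df(u0))
    # the convex hull of finitely many reals is the interval [min, max]
    return min(grads), max(grads)
```

For $\Pi(u) = |u|$, written as a selection of $u$, $-u$ and $-u^2$, the sketch returns $[-1, 1]$ at $u_0 = 0$; the piece $-u^2$ touches $\Pi$ only at the origin, so it is active but not essentially active and is ignored.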


Nonsmooth Local Approximations 


For applications in mechanics one uses first and second order differentiation (or the appropriate analogous nonsmooth notions) and suitable local nonsmooth approximations that generalize the (second order) Taylor expansion of a smooth function. A local coordinate transformation provides a simple formulation of the energy minimization problem, cf. (4), which in turn is used for the extraction of stability information analogous to (9).
Following [6], a critical point $u_0$ of a $PC^2$-potential function $\Pi(u)$ is called a nondegenerate critical point if $\Pi$ is locally representable as a continuous selection of functions $\Pi_1,\dots,\Pi_k$ such that the following properties hold:
ND1) the vectors $\nabla\Pi_j(u_0)$, $j \in I(u_0)\setminus\{i\}$, are linearly independent for all $i \in I(u_0)$;
ND2) the restricted Hessian of the Lagrangian, the matrix $\nabla^2 L(u_0)|_{V(u_0)}$, is invertible.
Here $V(u_0)$ denotes the space

$$V(u_0) = \{ y \in \mathbb{R}^n\colon [\nabla\Pi_i(u_0) - \nabla\Pi_j(u_0)]^{\top} y = 0,\ i, j \in I(u_0)\}.$$

For the Lagrangian

$$L(u) = \sum_{i \in I(u_0)} \lambda_i \Pi_i(u)$$

there holds

$$\sum_{i \in I(u_0)} \lambda_i \nabla\Pi_i(u_0) = 0, \qquad \sum_{i \in I(u_0)} \lambda_i = 1, \quad \lambda_i \ge 0. \tag{10}$$

The qualitative behavior of the potential energy function, and hence its link to the stability of the described mechanical system, can be seen if one considers the normal form (cf. (4)). In this context, the following result of [6] is of importance.

Let $\Pi \in CS(\Pi_i, i \in I)$ and let $u_0 \in \mathbb{R}^n$ be a nondegenerate critical point for $\Pi$ with quadratic index equal to $q$. Suppose moreover that $|I_e(u_0)| = k + 1$. Then at $u_0$ the potential is topologically equivalent to $g(y)$, where

$$g(y_1,\dots,y_n) = \Pi(u_0) + CS\Big(y_1,\dots,y_k, -\sum_{i=1}^{k} y_i\Big) - \sum_{j=k+1}^{k+q} y_j^2 + \sum_{r=k+q+1}^{n} y_r^2. \tag{11}$$


One observes that the second term on the right-hand side of (11) is sufficiently rich to describe locally every type of nonsmooth finite-dimensional function.

Furthermore, following [7], one notes that a $PC^2$-function can always be transformed into the min-max normal form

$$\Pi(u) = \max_{1 \le i \le k}\ \min_{j \in M_i} \Pi_j(u), \tag{12}$$

where (12) is considered as a local representation of the potential in a neighborhood of $u_0$, $M_i \subseteq \{1,\dots,m\}$, and the functions $\Pi_j\colon U \to \mathbb{R}$, $j \in \{1,\dots,m\}$, are $C^2$-functions.
In this case a consistent nonsmooth second order approximation of the $PC^2$-potential, expressed by the normal form (12), is given by

$$\max_{1 \le i \le k}\ \min_{j \in M_i} \Big\{ \Pi_j(u_0) + \nabla\Pi_j(u_0)^{\top}(u - u_0) + \tfrac{1}{2}(u - u_0)^{\top}\nabla^2\Pi_j(u_0)(u - u_0) \Big\}. \tag{13}$$

Note that the min-max form introduced above is not uniquely defined.
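Evaluating the model (13) is straightforward once the values, gradients and Hessians of the pieces at $u_0$ are known. The Python sketch below (the function name and the nested-list data layout are assumptions made for illustration) evaluates the max over $i$ of the min over $j \in M_i$ of the second order Taylor models of the pieces.

```python
def maxmin_quadratic_model(groups, u0, u):
    """Evaluate the max-min second order approximation (13) at the point u.

    groups[i] lists, for each j in M_i, a tuple (value, grad, hess) holding
    Pi_j(u0), the gradient of Pi_j at u0 and the Hessian of Pi_j at u0.
    """
    d = [ui - u0i for ui, u0i in zip(u, u0)]
    n = len(d)

    def taylor2(value, grad, hess):
        linear = sum(g * dk for g, dk in zip(grad, d))
        curvature = sum(d[r] * hess[r][s] * d[s] for r in range(n) for s in range(n))
        return value + linear + 0.5 * curvature

    return max(min(taylor2(*piece) for piece in group) for group in groups)
```

With one group holding the linear pieces $d$ and $-d$ and a second group holding $d^2$, the model equals $\max(-|d|, d^2)$; at $u = 0.5$, $u_0 = 0$ it evaluates to $0.25$.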


Stability Results. Discussion 


For a structural analysis system with a structured nonsmooth $PC^2$-potential and a nondegenerate critical point $u_0 \in \mathbb{R}^n$, the local approximation (11) is available. Let us assume a potential of the external loading equal to $\lambda p^{\top} u$, as in (5), and let the load parameter $\lambda$ be fixed, for the present, to a given value. From (11) the following complete subdivision of the coordinate space $\mathbb{R}^n$ arises:

$$\mathbb{R}^n = \mathbb{R}^k \oplus \mathbb{R}^q \oplus \mathbb{R}^{n-k-q} = \mathbb{R}^{\mathrm{nsm}} \oplus \mathbb{R}^{\mathrm{un}} \oplus \mathbb{R}^{\mathrm{st}}, \tag{14}$$

where $\mathbb{R}^{\mathrm{nsm}}$ stands for the essentially nondifferentiable subspace, $\mathbb{R}^{\mathrm{un}}$ for the unstable subspace and $\mathbb{R}^{\mathrm{st}}$ for the stable subspace. Let, moreover, the local coordinate transformation that leads to (11) be traced for the components of the loading vector $\lambda p$ (cf. (6)-(9)), and let the components of this vector in the three subspaces of (14) be $p^{\mathrm{nsm}}$, $p^{\mathrm{un}}$ and $p^{\mathrm{st}}$, respectively.

Further, one compares the type of the CS in the linear term on the right-hand side of (11) with the three components of the loading vector defined above. This information can be used for stability analysis; smooth and nonsmooth contributions should be treated separately. For the nonsmooth part, for example, if one has a max-type function and $q = 0$, then only stable local minima of the potential energy function arise.

The scheme outlined above can be followed to derive stability statements for a structure at a given point and for a given loading level ($\lambda$ constant). For the examination of stability along a given loading path (one-parametric change of $\lambda$) one should take into account that the local representation (11) may change as $\lambda$ changes. The results are qualitatively of the same nature, but for practical applications a combinatorial problem arises, which concerns the possible changes of the subdivision (14) as the loading changes. Further work in this direction will generalize the computational mechanics techniques for tracing post-buckling equilibria to nonsmooth mechanics applications. Theoretical support will be provided by the theory of parametric optimization (cf., e.g., [5] and the applications in contact mechanics [16,17]).
In science and, especially, in engineering, the variational or weak formulation of a given boundary value problem has certain advantages. Instead of writing pointwise relations (for example, partial differential equations) which hold at each point of the considered system, one multiplies the governing relation by an arbitrary virtual variation, integrates over the entire domain and requires that the resulting integral be equal to zero. This is a weak or variational formulation of the problem. Since the considered virtual variation is arbitrary, one recovers, under the assumption of sufficient regularity, the initial pointwise relations.

Variational formulations provide the basis for the 
development of numerical approximation methods (for 
example, by the finite element method). One of the ad- 
vantages is that by performing partial integration one 
transfers differentiability requirements from the actual 
variables of the problem to the virtual ones, which, in 
turn, results in less demanding requirements on the 
complexity of the required finite element basis approx- 
imation functions. The literature on variational prob- 
lems is very large, so that every selection of references 
would be incomplete. In this sense, let us mention here 
the publications [1,3,7,8,14]. 
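To make the role of the variational formulation concrete, consider the model problem $-u'' = f$ on $(0,1)$ with $u(0) = u(1) = 0$ (an example chosen here for illustration; it does not appear in the cited works). Its weak form reads: find $u$ with $\int_0^1 u'v'\,dx = \int_0^1 f v\,dx$ for all admissible $v$. Only first derivatives appear, so continuous piecewise linear "hat" functions suffice. The Python sketch below, with a hypothetical function name, assembles and solves the resulting tridiagonal system for a constant $f$.

```python
def fem_poisson_1d(n_el, f=1.0):
    """Linear finite elements for -u'' = f on (0, 1), u(0) = u(1) = 0, f constant.

    Weak form: find u_h with int u_h' v' dx = int f v dx for every hat function v.
    Requires n_el >= 2 (at least one interior node).
    """
    h = 1.0 / n_el
    m = n_el - 1                  # interior nodes = unknowns
    diag = [2.0 / h] * m          # int phi_i' phi_i' dx
    off = [-1.0 / h] * (m - 1)    # int phi_i' phi_{i+1}' dx
    rhs = [f * h] * m             # int f phi_i dx (exact for constant f)
    # Thomas algorithm: forward elimination ...
    cp, dp = [0.0] * m, [0.0] * m
    for i in range(m):
        denom = diag[i] - (off[i - 1] * cp[i - 1] if i > 0 else 0.0)
        cp[i] = off[i] / denom if i < m - 1 else 0.0
        dp[i] = (rhs[i] - (off[i - 1] * dp[i - 1] if i > 0 else 0.0)) / denom
    # ... and back substitution
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = dp[i] - (cp[i] * x[i + 1] if i < m - 1 else 0.0)
    nodes = [k * h for k in range(n_el + 1)]
    return nodes, [0.0] + x + [0.0]
```

For $f = 1$ the exact solution is $u = x(1-x)/2$; in this one-dimensional case the linear finite element solution coincides with it at the nodes.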

In the language of smooth optimization, instead of 
considering the first order optimality condition that the 
derivative of a function at a given point is equal to zero, 
one proceeds as follows. The latter equation is multi- 
plied by a virtual change of the variables along an arbi- 
trary direction. Then, one considers the equivalent re- 
lation that the directional derivative of the function is 
equal to zero for all directions emanating from the as- 
sumed point. 

In mechanics the arising quantities have a physical meaning (for instance, they correspond to the virtual work of a system). For historical reasons one speaks about variational principles. Moreover, under adequate smoothness assumptions one writes variational equalities. Finally, for engineering applications, and depending on the nature of the studied problem, one has to solve, after numerical discretization, systems of linear or nonlinear equations.

In connection with convex, nondifferentiable potentials or with convex problems with inequality constraints, it is intuitively clear that not all virtual variations are allowed. The theory of variational inequalities has been developed for the study of this class of problems. It is connected with the subdifferential of convex analysis and is appropriate for the study of monotone operators [5,9]. In simple cases, or after appropriate reformulations, one gets linear or nonlinear complementarity problems (see, e.g., [6] for a recent review).

For general nonconvex and nonsmooth problems a nonconvex extension of the notion of the variational inequality is required. For potential operators, and by using the generalized subdifferential in the sense of F.H. Clarke, this class of variational problems has been developed and studied by P.D. Panagiotopoulos, who called them hemivariational inequalities. See ► Nonconvex energy functions: Hemivariational inequalities; ► Hemivariational inequalities: Applications in mechanics, or [9,11] for more details.

The notion of quasidifferentiability, in the sense of V.F. Demyanov and A.M. Rubinov, provides an elegant way to formulate and study nonconvex variational inequality problems. By taking advantage of the ability of quasidifferentials to provide a qualitative and quantitative nonsmooth approximation of a nonsmooth function, one arrives at a very powerful variational description of the problem. This link has been studied for several applications in mechanics in [4,10,12]. It should be mentioned that the author's understanding of this theory and first attempts were based on previous theoretical results of C.A. Stuart and J.F. Toland [13] and G. Auchmuty [2] concerning difference convex energy functions. Of course, the class of difference convex functions is included in the class of quasidifferentiable functions, so that the approach presented here is sufficiently general.


Variational Formulation of Subdifferential Laws 


Let us assume a monotone, possibly multivalued (i.e., with complete vertical branches) relation (a law) between the quantities $u$ and $-f$. To be more precise, one may think of a nonlinear boundary law which connects boundary reactions $-f$ with boundary displacements $u$ in mechanics. Let a convex, l.s.c. and proper function $\Phi$ exist (the convex superpotential in the sense of J.-J. Moreau) such that the previously mentioned law is written in the subdifferential form

$$-f \in \partial\Phi(u). \tag{1}$$

Here $\partial$ denotes the subdifferential of convex analysis. The function $\Phi(u)$ can be considered as the potential energy corresponding to the mechanical law (1).

By definition, (1) is equivalent to the following variational inequality:

$$\Phi(u^*) - \Phi(u) \ge -\langle f, u^* - u\rangle, \qquad \forall u^* \in \mathbb{R}. \tag{2}$$


For example, if $\Phi$ is the indicator $I_K$ of a closed convex interval $K$ of $\mathbb{R}$, then one has

$$-f \in \partial I_K(u). \tag{3}$$

This is a unilateral constraint, as one easily recognizes by considering the equivalent variational inequality (for $u \in K$):

$$\langle f, u^* - u\rangle \ge 0, \qquad \forall u^* \in K. \tag{4}$$


Indeed, if $u^* - u$ is an admissible variation of $u$ (in the sense that it satisfies (4)), then the same does not in general hold for the variation $u - u^*$. Of course, in the one-dimensional case $\langle\cdot,\cdot\rangle$ is a simple multiplication; for multidimensional problems it is an inner product.
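For an interval $K = [a, b]$ the subdifferential in (3) is the normal cone $N_K(u)$, which is $\{0\}$ in the interior of $K$, $(-\infty, 0]$ at $u = a$ and $[0, \infty)$ at $u = b$. The Python sketch below (function names are hypothetical) checks the inclusion (3) directly and, independently, the variational inequality (4) on sampled test points $u^* \in K$, so that the equivalence can be observed on examples.

```python
import math


def in_normal_cone(w, u, a, b, tol=1e-12):
    """True if w lies in N_K(u) = dI_K(u) for the closed interval K = [a, b]."""
    if u < a - tol or u > b + tol:
        return False  # the subdifferential of I_K is empty outside K
    lower = -math.inf if abs(u - a) <= tol else 0.0  # at the left end, w <= 0 allowed
    upper = math.inf if abs(u - b) <= tol else 0.0   # at the right end, w >= 0 allowed
    return lower - tol <= w <= upper + tol


def satisfies_vi(f, u, a, b, samples=101):
    """Sampled check of (4): f * (u_star - u) >= 0 for all u_star in [a, b]."""
    points = (a + (b - a) * k / (samples - 1) for k in range(samples))
    return all(f * (u_star - u) >= -1e-12 for u_star in points)
```

With $K = [0, 1]$ and $u = 1$ (the displacement at its upper bound), $f = -2$ gives $-f = 2 \in N_K(1)$ and (4) holds; at the interior point $u = 0.5$ only $f = 0$ is admissible.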

Analogous subdifferential relations can be written 
for multidimensional laws (for example, for constitu- 
tive laws in elastoplasticity [9]). 


Variational Formulation of Quasidifferential Laws

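Let us now assume a nonmonotone, possibly multivalued relation. By means of a real-valued, quasidifferentiable superpotential energy function $\Phi$, one may express this relation in the form

$$-f = w_1 + w_2, \tag{5}$$

with $\{w_1, w_2\} \in D\Phi(u) = [\underline{\partial}\Phi(u), \overline{\partial}\Phi(u)]$, where $\underline{\partial}\Phi(u)$ and $\overline{\partial}\Phi(u)$ are the subdifferential and the superdifferential of $\Phi$ at $u$.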
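By definition, (5) is equivalent to the relation

$$\langle -f, u^* - u\rangle = \max_{w_1^* \in \underline{\partial}\Phi(u)} \langle w_1^*, u^* - u\rangle + \min_{w_2^* \in \overline{\partial}\Phi(u)} \langle w_2^*, u^* - u\rangle, \qquad \forall u^* \in U, \tag{6}$$

for $u \in U$, or to the system of variational inequalities

$$\langle -f, u^* - u\rangle - \langle w_2^*, u^* - u\rangle \le \max_{w_1 \in \underline{\partial}\Phi(u)} \langle w_1, u^* - u\rangle, \qquad \forall u^* \in U,\ \forall w_2^* \in \overline{\partial}\Phi(u), \tag{7}$$

$$\langle -f, u^* - u\rangle - \langle w_1^*, u^* - u\rangle \ge \min_{w_2 \in \overline{\partial}\Phi(u)} \langle w_2, u^* - u\rangle, \qquad \forall u^* \in U,\ \forall w_1^* \in \underline{\partial}\Phi(u). \tag{8}$$

The space $U$ is in general a subspace of $\mathbb{R}^n$ and depends on the considered application.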
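In one dimension the right-hand side of (6) is the directional derivative $\Phi'(u; d)$ of $\Phi$, evaluated through its quasidifferential. The Python sketch below assumes, for illustration, a difference of convex functions $\Phi = \varphi_1 - \varphi_2$, for which a quasidifferential is $[\partial\varphi_1(u), -\partial\varphi_2(u)]$ (cf. the difference convex case discussed further on); the convex subdifferentials are passed as intervals, and the function name is hypothetical.

```python
def qd_directional_derivative(sub1, sub2, d):
    """Phi'(u; d) for Phi = phi1 - phi2 in 1-D via the quasidifferential.

    sub1 = (lo1, hi1) and sub2 = (lo2, hi2) are the convex subdifferential
    intervals of phi1 and phi2 at u.  The subdifferential of Phi is sub1 and
    its superdifferential is -sub2 = [-hi2, -lo2].
    """
    lo1, hi1 = sub1
    lo2, hi2 = sub2
    max_part = max(lo1 * d, hi1 * d)    # max over w1 in [lo1, hi1] of w1 * d
    min_part = min(-hi2 * d, -lo2 * d)  # min over w2 in [-hi2, -lo2] of w2 * d
    return max_part + min_part
```

For $\Phi(u) = |u| - |u - 1|$ at $u = 0$ one has $\partial\varphi_1(0) = [-1, 1]$ and $\partial\varphi_2(0) = \{-1\}$, so $\Phi'(0; d) = |d| + d$, which agrees with the one-sided difference quotients of $\Phi$.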


Analogously one treats multidimensional relations 
(for example, boundary adhesive layers) or constitutive 
laws (e. g., materials with softening effects). A number 
of concrete examples have been given in [4, Chap. 3]. 


Example: an Elastostatic Problem Involving QD-Superpotentials


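Let $\Omega \subset \mathbb{R}^3$ be an open bounded subset occupied by a deformable body in its undeformed state. Under the assumption of small deformations one writes the virtual work relation

$$\int_\Omega \sigma_{ij}(u)\,\varepsilon_{ij}(v - u)\,d\Omega = \int_\Omega f_i (v_i - u_i)\,d\Omega + \int_\Gamma \sigma_{ij} n_j (v_i - u_i)\,d\Gamma, \qquad \forall v \in V, \tag{9}$$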
for $u \in V$. Here $V$ denotes the function space of the displacements, which will be defined below. As outlined previously, for the derivation of (9) one multiplies the equilibrium equation

$$\sigma_{ij,j} + f_i = 0, \tag{10}$$

where $f_i$ is the volume force vector, by a virtual displacement $v_i - u_i$ and integrates over $\Omega$. For appropriately regular functions one then applies the Green-Gauss theorem, taking into account the strain-displacement relation

$$\varepsilon_{ij} = \tfrac{1}{2}(u_{i,j} + u_{j,i}). \tag{11}$$
Let us further assume that the body is linearly elastic, i.e. that

$$\sigma_{ij} = C_{ijhk}\,\varepsilon_{hk}, \tag{12}$$

where $C = \{C_{ijhk}\}$, $i, j, h, k = 1, 2, 3$, is the elasticity tensor, which obeys the well-known symmetry and ellipticity conditions. The energy bilinear form of linear elasticity is denoted by $a(u, v) = \int_\Omega C_{ijhk}\,\varepsilon_{ij}(u)\,\varepsilon_{hk}(v)\,d\Omega$.
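For an isotropic material, an assumption made here only for illustration (the text allows any elasticity tensor with the stated symmetries), one has $C_{ijhk} = \lambda\,\delta_{ij}\delta_{hk} + \mu(\delta_{ih}\delta_{jk} + \delta_{ik}\delta_{jh})$ with Lamé constants $\lambda$, $\mu$ (not to be confused with the load parameter or the friction coefficient used elsewhere). The Python sketch below builds this tensor, evaluates (11) and (12) pointwise (indices run from 0 to 2 in the code) and computes the integrand of $a(u, v)$; all function names are hypothetical.

```python
def delta(a, b):
    return 1.0 if a == b else 0.0


def isotropic_tensor(lam, mu):
    """C_ijhk = lam d_ij d_hk + mu (d_ih d_jk + d_ik d_jh)."""
    return [[[[lam * delta(i, j) * delta(h, k)
               + mu * (delta(i, h) * delta(j, k) + delta(i, k) * delta(j, h))
               for k in range(3)] for h in range(3)] for j in range(3)] for i in range(3)]


def strain(grad_u):
    """(11): eps_ij = (u_i,j + u_j,i) / 2 with grad_u[i][j] = du_i / dx_j."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)] for i in range(3)]


def stress(C, eps):
    """(12): sigma_ij = C_ijhk eps_hk (summation over h and k)."""
    return [[sum(C[i][j][h][k] * eps[h][k] for h in range(3) for k in range(3))
             for j in range(3)] for i in range(3)]


def energy_integrand(C, eps_u, eps_v):
    """Integrand C_ijhk eps_ij(u) eps_hk(v) of the bilinear form a(u, v)."""
    return sum(C[i][j][h][k] * eps_u[i][j] * eps_v[h][k]
               for i in range(3) for j in range(3) for h in range(3) for k in range(3))
```

For a uniaxial strain $\varepsilon = \operatorname{diag}(10^{-3}, 0, 0)$ with $\lambda = \mu = 1$ one obtains $\sigma_{11} = 3\cdot 10^{-3}$ and $\sigma_{22} = \sigma_{33} = 10^{-3}$, and the integrand of $a(u, u)$ equals $3\cdot 10^{-6}$.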


Variational Equality 


For example, let us first assume that on the boundary $\Gamma$ of the structure the classical boundary conditions $S_N = 0$ and $u_{T_i} = 0$, $i = 1, 2, 3$, hold. Then one gets the classical variational equality: find $u \in V_0 = \{v \in V\colon v_{T_i} = 0 \text{ on } \Gamma\}$ such that

$$a(u, v) = \int_\Omega f_i v_i\,d\Omega, \qquad \forall v \in V_0.$$


Convex Variational Inequality 


Furthermore, let us now assume that on $\Gamma$ the general monotone multivalued subdifferential boundary condition (1) holds. Using (2) and (9) one obtains the following variational inequality: find $u \in V$ with $\Phi(u) < \infty$ such that

$$a(u, v - u) + \int_\Gamma (\Phi(v) - \Phi(u))\,d\Gamma \ge \int_\Omega f_i (v_i - u_i)\,d\Omega, \qquad \forall v \in V\colon \Phi(v) < \infty.$$


QD Laws and Systems of Variational Inequalities 


Let us assume that on $\Gamma$ the nonmonotone, possibly multivalued boundary condition (5) holds, where $\Phi$ is a quasidifferentiable functional. It has the form

$$-S = w_1 + w_2,$$

with $\{w_1, w_2\} \in D\Phi(u) = [\underline{\partial}\Phi(u), \overline{\partial}\Phi(u)]$.

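Then one has, by definition, the relation (6), where $\Phi'(u, v) = \langle -S, v\rangle$. Finally, in an analogous way, one obtains the variational problem: find $u \in V$, $w_1, w_2 \in W$ such as to satisfy the relation

$$a(u, v - u) - \int_\Omega f_i (v_i - u_i)\,d\Omega + \max_{\substack{w_1(x) \in \underline{\partial}\Phi(u(x))\\ \text{a.e. on } \Gamma}} \langle w_1, v - u\rangle + \min_{\substack{w_2(x) \in \overline{\partial}\Phi(u(x))\\ \text{a.e. on } \Gamma}} \langle w_2, v - u\rangle = 0, \qquad \forall v \in V. \tag{13}$$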


The function spaces $V$ and $W$ depend on the studied application. For instance, for three-dimensional elastostatics the following choice has been proposed in [4]: $V = [H^1(\Omega)]^3$, $W = [L^2(\Gamma)]^3$. A more general formulation, also proposed in the same publication, would be to assume that $w_1, w_2 \in [H^{-1/2}(\Gamma)]^3$.


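Then on the left-hand side of (13) one should replace $w_1(x) \in \underline{\partial}\Phi(u(x))$ a.e. on $\Gamma$ by $w_1 \in \underline{\partial}F(u)$, and $w_2(x) \in \overline{\partial}\Phi(u(x))$ a.e. on $\Gamma$ by $w_2 \in \overline{\partial}F(u)$, where one assumes that

$$F(u) = \begin{cases} \displaystyle\int_\Gamma \Phi(u(x))\,d\Gamma & \text{if } \Phi(u(\cdot)) \in L^1(\Gamma),\\ +\infty & \text{otherwise.} \end{cases}$$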


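Then instead of (13) one has the following problem: find $u \in [H^1(\Omega)]^3$, $w_1, w_2 \in [H^{-1/2}(\Gamma)]^3$ such that

$$a(u, v - u) - \int_\Omega f_i (v_i - u_i)\,d\Omega + \max\{\langle w_1, v - u\rangle\colon w_1 \in \underline{\partial}F(u)\} + \min\{\langle w_2, v - u\rangle\colon w_2 \in \overline{\partial}F(u)\} = 0, \qquad \forall v \in [H^1(\Omega)]^3.$$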


It should be mentioned that the related questions concerning the extension of QD-superpotentials to function spaces are still open.

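Moreover, we can write the min-max form, which reads: find $u \in [H^1(\Omega)]^3$ such as to satisfy the relation

$$a(u, v - u) - \int_\Omega f_i (v_i - u_i)\,d\Omega + \min_{w_2 \in \overline{\partial}F(u)}\ \max_{w_1 \in \underline{\partial}F(u)} \langle w_1 + w_2, v - u\rangle = 0, \qquad \forall v \in [H^1(\Omega)]^3.$$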
If in particular the superpotential $F$ can be expressed as the difference of two convex functions, i.e. if $F = \Phi_1 - \Phi_2$ with $\Phi_1$ and $\Phi_2$ convex, then one has

$$\underline{\partial}F = \partial\Phi_1, \qquad \overline{\partial}F = -\partial\Phi_2,$$

where $\partial$ is the subdifferential of convex analysis. In this case the following system of variational inequalities results, as can easily be shown by using the definition of the subdifferential: find $u \in [H^1(\Omega)]^3$ such as to satisfy

$$a(u, v - u) - \int_\Omega f_i (v_i - u_i)\,d\Omega - \langle w_2, v - u\rangle + \Phi_1(v) - \Phi_1(u) \ge 0, \qquad \forall v \in [H^1(\Omega)]^3,$$

for all $w_2 \in [H^{-1/2}(\Gamma)]^3$ such that

$$\langle w_2, v - u\rangle \le \Phi_2(v) - \Phi_2(u), \qquad \forall v \in [H^1(\Omega)]^3.$$


Quasivariational Inequalities 


Further information on variational formulations in elastostatics can be found in ► Hemivariational inequalities: Applications in mechanics.
Variational or weak formulations of boundary value problems in science and, in particular, in engineering are integral, energetic expressions of all involved quantities (involving differential equations and boundary conditions). Usually, under differentiability (smoothness) assumptions on the involved variables and with equalities throughout the considered model, one gets variational equality problems. The strong formulation of the initial problem (i.e., constitutive relations, boundary conditions, etc.) can be reconstructed if one considers certain values for the (otherwise arbitrary) variations in the weak form, i.e., in the variational equality. Variational formulations provide the basis for modern computational mechanics techniques (e.g., the finite element method) and for this reason they have been extensively studied in the related literature (see, among others, [22]). In terms of optimization they can be considered as stationary point statements for the total differential of an appropriately constructed (convex or nonconvex) potential energy function, provided that the studied problem admits a potential. Namely, the weak formulation expresses the fact that the variation of a function vanishes for every small variation of the involved independent variables, which, due to the arbitrariness of the variations, is equivalent to the more classical requirement that the first derivative of the function vanishes at a critical point.

Due to inequality-type constraints or due to lack of differentiability of the involved functions, one is sometimes obliged to consider one-sided (unilateral) variations of the problem's variables. A systematic way of doing so is provided by the theory of variational inequalities [13]. They are related to monotone operators, to convex nondifferentiable optimization problems and to complementarity problems. Variational inequalities have been applied to the study of problems in engineering [17,20], economics, transportation planning and flow in networks (see also [6,8,10]).

Extensions to nonconvex variational inequalities, which are based on the generalized gradient approach in the sense of F.H. Clarke, have been proposed and studied by P.D. Panagiotopoulos, who named them hemivariational inequalities. Details are given in ► Nonconvex energy functions: Hemivariational inequalities and ► Hemivariational inequalities: Applications in mechanics. Parallel developments based on the notion of quasidifferentiability in the sense of V.F. Demyanov and A.M. Rubinov are described in ► Quasidifferentiable optimization: Variational formulations.

Furthermore, there exist problems where the admissible space (for the variables and their variations) or the involved potentials depend on the solution of the problem. Such implicit variational inequality problems are called quasivariational inequalities. They have been used for the modeling of stochastic impulse control problems, in free boundary problems, in mechanics and in economics. The interested reader may find more information in the references [1,2,7,9,15]. Here a short outline of quasivariational inequality problems is given, and a model application arising in unilateral contact problems with Coulomb friction in engineering mechanics demonstrates the discussed ideas. This approach is based on early theoretical and numerical studies of [19] (see also numerical applications in [11,12]) and has recently been tested, among others, for several convex and nonconvex problems of mechanics in [14].


Variational Inequalities 


Let us first consider abstract variational formulations of a boundary value problem defined in a subset $\Omega$ of $\mathbb{R}^n$ with boundary $\Gamma$. Let $V$ be a real Hilbert space and $V'$ its dual space. Let $a(\cdot,\cdot)\colon V \times V \to \mathbb{R}$ be a symmetric, continuous and coercive bilinear form and $\langle l, \cdot\rangle$ a continuous linear form on $V$. An abstract variational problem reads: find $u \in V$ such that

$$a(u, v - u) = \langle l, v - u\rangle, \qquad \forall v \in V. \tag{1}$$


Let moreover $K$ be a closed convex subset of $V$ and assume that a solution of the boundary value problem is sought within the set $K$. It can be shown that this solution is characterized by the following abstract variational inequality (of G. Fichera type, see [20, p. 188]):

$$\text{find } u \in K \subset V \text{ s.t. } a(u, v - u) \ge \langle l, v - u\rangle, \qquad \forall v \in K. \tag{2}$$
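After discretization, $a(u, v) = v^{\top} A u$ with a symmetric positive definite matrix $A$ and $\langle l, v\rangle = l^{\top} v$; for a box $K = \{v\colon lo \le v \le hi\}$ the inequality (2) then becomes a bound-constrained quadratic program. One standard way to solve it, chosen here for illustration and not prescribed by the text, is projected Gauss-Seidel iteration (coordinate-wise minimization followed by projection onto the bounds). The sketch below uses hypothetical names and assumes $A$ symmetric positive definite.

```python
import math


def projected_gauss_seidel(A, l, lo, hi, max_iter=1000, tol=1e-12):
    """Discrete VI (2): find u in K with (A u - l)^T (v - u) >= 0 for all v in K,
    where K = {v : lo <= v <= hi} componentwise and A is symmetric positive definite."""
    n = len(l)
    u = [min(max(0.0, lo[i]), hi[i]) for i in range(n)]  # feasible start
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            # exact minimization in coordinate i, then projection onto [lo_i, hi_i]
            r = l[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)
            new = min(max(r / A[i][i], lo[i]), hi[i])
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            break
    return u


def vi_residual(A, l, u, lo, hi, tol=1e-9):
    """Largest violation of the componentwise optimality conditions of the box VI."""
    worst = 0.0
    for i in range(len(l)):
        g = sum(A[i][j] * u[j] for j in range(len(l))) - l[i]
        if u[i] <= lo[i] + tol:
            worst = max(worst, -g)        # at the lower bound g >= 0 is required
        elif u[i] >= hi[i] - tol:
            worst = max(worst, g)         # at the upper bound g <= 0 is required
        else:
            worst = max(worst, abs(g))    # in the interior g = 0 is required
    return worst
```

With $A = \begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}$, $l = (1, 1)$ and the single constraint $u_1 \le 0.5$, the unconstrained solution $(1, 1)$ is infeasible and the sketch returns $(0.5, 0.75)$, with the first component on the obstacle.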


For a convex, l.s.c., proper functional $\Phi$ on $V$ one may define the more general (nonlinear) variational inequality ([14]):

$$\text{find } u \in V \text{ s.t. } a(u, v - u) + \Phi(v) - \Phi(u) \ge \langle l, v - u\rangle, \qquad \forall v \in V. \tag{3}$$


It is obvious that (2) is a special case of (3), with $\Phi = I_K$, where the indicator function of the set $K$ is defined by $I_K(v) = 0$ if $v \in K$, $+\infty$ otherwise.

Let moreover $j\colon \mathbb{R} \to \mathbb{R}$ denote a locally Lipschitz function and let $j^\circ(u, v - u)$ denote the generalized directional derivative of the nonconvex and nonsmooth function $j$. By definition, one has the following connection with the generalized gradient in the sense of Clarke:

$$j^\circ(u, v) = \max\{\langle w, v\rangle\colon w \in \partial_{\mathrm{Cl}}\, j(u)\}. \tag{4}$$


A hemivariational inequality problem reads:

$$\text{find } u \in V \text{ s.t. } a(u, v - u) + \int_\Omega j^\circ(u, v - u)\,d\Omega \ge \langle l, v - u\rangle, \qquad \forall v \in V. \tag{5}$$


Implicit Variational Inequalities and Quasivariational Inequalities


If one assumes, for instance, that the linear form $\langle l, \cdot\rangle$ or the set $K$ in the previous relations depends on the solution $u$, one gets various types of implicit variational inequalities or quasivariational inequalities.

Let the set $K$ depend on the solution $u$. Then from (2) one gets the quasivariational inequality:

$$\text{find } u \in K(u) \subset V \text{ s.t. } a(u, v - u) \ge \langle l, v - u\rangle, \qquad \forall v \in K(u).$$


Along the same lines one formulates from (3) the implicit variational inequality:

$$\text{find } u \in V \text{ s.t. } a(u, v - u) + \Phi(u; v) - \Phi(u; u) \ge \langle l, v - u\rangle, \qquad \forall v \in V.$$


Here the first argument in $\Phi(\cdot\,;\cdot)$ is treated as a parameter. A concrete application of this method is demonstrated by the mechanical problem in the next section.

Finally, in analogy to the previous extensions, for a continuous mapping $h(u)$ the following quasihemivariational inequality (which may also be characterized as an implicit hemivariational inequality) problem can be written (see [16, p. 128]):

$$\text{find } u \in V \text{ s.t. } a(u, v - u) + \int_\Omega h(u)\, j^\circ(u, v - u)\,d\Omega \ge \langle l, v - u\rangle, \qquad \forall v \in V. \tag{6}$$


Mechanical Example: Coupled Unilateral Contact Problem with Friction


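Let $\Omega \subset \mathbb{R}^3$ be an open bounded subset occupied by a deformable body in its undeformed state. Under the assumption of small deformations one writes the virtual work relation (for $u \in V$)

$$\int_\Omega \sigma_{ij}\,\varepsilon_{ij}(v - u)\,d\Omega = \int_\Omega f_i (v_i - u_i)\,d\Omega + \int_\Gamma S_N (v_N - u_N)\,d\Gamma + \int_\Gamma S_{T_i} (v_{T_i} - u_{T_i})\,d\Gamma, \qquad \forall v \in V. \tag{7}$$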




Here $V$ denotes the function space of the displacements, which in general is an appropriate subset of $[H^1(\Omega)]^3$, and $f_i \in L^2(\Omega)$, $S_N, S_{T_i} \in L^2(\Gamma)$. Recall that the abstract bilinear form $a(\cdot,\cdot)$ reads in this case of linear elasticity

$$a(u, v) = \int_\Omega C_{ijhk}\,\varepsilon_{ij}(u)\,\varepsilon_{hk}(v)\,d\Omega. \tag{8}$$


Moreover, the underlying elastostatic equilibrium boundary value problem has the form

$$\sigma_{ij,j} + f_i = 0, \tag{9}$$

where $f_i$ is the volume force vector. One recalls here the strain-displacement relation (small deformation theory):

$$\varepsilon_{ij} = \tfrac{1}{2}(u_{i,j} + u_{j,i}). \tag{10}$$

Let a linearly elastic body be assumed, i.e., the constitutive material relation reads

$$\sigma_{ij} = C_{ijhk}\,\varepsilon_{hk},$$

where $C = \{C_{ijhk}\}$, $i, j, h, k = 1, 2, 3$, is the elasticity tensor, which satisfies the well-known symmetry and ellipticity properties.

Recall that, under the assumption that classical support conditions hold on $\Gamma$ (say $u_N = 0$ and $u_{T_i} = 0$, $i = 1, 2, 3$), one gets the following variational equality:

$$\text{find } u \in V_0 = \{v \in V\colon v_N = 0,\ v_{T_i} = 0 \text{ on } \Gamma\} \text{ s.t. } a(u, v) = \int_\Omega f_i v_i\,d\Omega, \qquad \forall v \in V_0. \tag{11}$$


Signorini-Coulomb Unilateral Frictional Contact 


Let us assume the pointwise unilateral contact relations (known as the Signorini conditions for the frictionless unilateral contact case):

$$-S_N \ge 0, \qquad u_N - g \le 0, \qquad -S_N(u_N - g) = 0 \quad \text{on } \Gamma. \tag{12}$$


Here, the inequality on the boundary tractions corresponds to the mechanical restriction that no tensile tractions are permitted. Moreover, the normal boundary displacements should not be greater than a given initial distance (gap) $g$, because no penetration is allowed. Finally, the complementarity relation expresses the physical fact that either contact is realized or a separation takes place.
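The three conditions in (12) can be checked, and in the simplest case solved, by hand. The Python sketch below is a hypothetical one-degree-of-freedom illustration, not part of the original model: a linear spring of stiffness $k$ loaded by a force $F$ approaches a rigid obstacle at distance $g$. With the sign convention of (7), equilibrium reads $k u_N = F + S_N$, so the solution is $u_N = \min(F/k, g)$ and $S_N = k u_N - F$.

```python
def signorini_1dof(k, F, g):
    """Spring (stiffness k > 0) under force F with a rigid obstacle at gap g."""
    u_N = min(F / k, g)   # no penetration: u_N <= g
    S_N = k * u_N - F     # contact traction: negative (compressive) or zero
    return u_N, S_N


def signorini_violation(S_N, u_N, g):
    """Largest violation of (12): -S_N >= 0, u_N - g <= 0, -S_N (u_N - g) = 0."""
    return max(max(S_N, 0.0), max(u_N - g, 0.0), abs(S_N * (u_N - g)))
```

For $k = 10$ and $g = 0.3$, a force $F = 2$ leaves a gap ($u_N = 0.2$, $S_N = 0$), whereas $F = 5$ closes it ($u_N = 0.3$, $S_N = -2$); in both cases the computed violation of (12) is zero.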

A simplified static version of Coulomb's friction law connects the tangential (frictional) forces $S_T$ with the normal (contact) forces $S_N$ by the relation

$$y = \mu|S_N| - |S_T| \ge 0. \tag{13}$$

Here $|\cdot|$ denotes the norm in $\mathbb{R}^3$ and $\mu$ is the friction coefficient. The friction mechanism is considered to work in the following way: if $|S_T| < \mu|S_N|$ (i.e. $y > 0$), the slip $u_T$ must be equal to zero, and if $|S_T| = \mu|S_N|$ (i.e. $y = 0$), then slipping may occur in the direction opposite to $S_T$. Explicitly:

$$\begin{aligned} &\text{if } y > 0 &&\text{then } u_T = 0,\\ &\text{if } y = 0 &&\text{then there exists } \theta \ge 0 \text{ s.t. } u_{T_i} = -\theta S_{T_i}, \end{aligned} \tag{14}$$

where $i = 1, 2, 3$ refers to the components of the vector $S_T$ with respect to a reference Cartesian coordinate system.
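In incremental computations the friction law (13)-(14) is frequently enforced by a "radial return" of a trial tangential traction onto the Coulomb disc $|S_T| \le \mu|S_N|$. This projection is a common computational mechanics device rather than part of the formulation above; the Python sketch below, with hypothetical names and an assumed tolerance, also classifies the contact state through $y$ from (13).

```python
import math


def coulomb_return(S_T_trial, S_N, mu, tol=1e-12):
    """Project a trial tangential traction onto the disc |S_T| <= mu |S_N|.

    Returns (S_T, y, state) with y = mu |S_N| - |S_T| as in (13).
    state is "stick" if y > 0 (no slip, u_T = 0) and "slip" if y = 0,
    in which case (14) allows sliding opposite to S_T.
    """
    radius = mu * abs(S_N)
    norm = math.sqrt(sum(s * s for s in S_T_trial))
    if norm <= radius:
        S_T = list(S_T_trial)                          # trial state admissible
    else:
        S_T = [s * radius / norm for s in S_T_trial]   # scale back onto the disc
    y = radius - math.sqrt(sum(s * s for s in S_T))
    state = "stick" if y > tol * max(1.0, radius) else "slip"
    return S_T, y, state
```

With $\mu = 0.25$ and $S_N = -10$ the admissible radius is $2.5$: a trial traction $(0.3, 0.4)$ sticks with $y = 2$, while $(3, 4)$ is returned to $(1.5, 2)$ on the boundary of the disc and classified as slip.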

Contact law (12) can be written in the superpotential form

$$-S_N \in \partial I_{U_{\mathrm{ad}}}(u_N) = \partial\Phi_N(u_N) = N_{U_{\mathrm{ad}}}(u_N). \tag{15}$$

Here the set of admissible displacements is introduced,

$$U_{\mathrm{ad}} = \{u \in V\colon u_N - g \le 0\}, \tag{16}$$

and the notions of the convex analysis subdifferential and of the normal cone to a set have been used. The corresponding variational inequality reads

$$-S_N(u_N)(v_N - u_N) \le 0, \qquad \forall v_N \in U_{\mathrm{ad}}. \tag{17}$$
For the friction law one writes, analogously,

$$-S_T \in \partial_{u_T}\big(\mu|S_N|\,|u_T|\big) = \partial\Phi_T(S_N; u_T), \tag{18}$$

where the involved potential is nondifferentiable (due to the absolute value nonlinearity $|u_T|$) and depends on the normal contact traction $S_N$, thus, implicitly, on the solution $u_N$ of the problem; i.e. one considers the parametrized potential $\Phi_T(u; u_T) = \Phi_T(S_N; u_T) = \mu|S_N|\,|u_T|$. The corresponding variational inequality reads

$$-S_T(u_T)(v_T - u_T) \le \Phi_T(u; v_T) - \Phi_T(u; u_T), \qquad \forall v_T \in U_{\mathrm{ad}}. \tag{19}$$
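Combining relations (7), (9) and (15) one gets the implicit variational inequality: find $u \in U_{\mathrm{ad}}$ such that

$$\int_\Omega \sigma_{ij}\,\varepsilon_{ij}(v - u)\,d\Omega + \Phi_T(u; v_T) - \Phi_T(u; u_T) \ge \int_\Omega f_i (v_i - u_i)\,d\Omega, \qquad \forall v \in U_{\mathrm{ad}}. \tag{20}$$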


A dual problem in terms of stresses provides a corresponding quasivariational inequality problem. For simplicity, a two-dimensional problem is considered in what follows. Moreover, the following set of admissible boundary tractions for the Signorini-Coulomb unilateral contact problem is assumed:

$$S_{\mathrm{ad}} = \{(S_N, S_T)\colon g_j(S_N, S_T) \le 0,\ j = 1, 2\},$$

where the constraint functions have the form

$$g_1(S_N, S_T) = \mu S_N - S_T, \qquad g_2(S_N, S_T) = \mu S_N + S_T.$$

Moreover, one needs the set of admissible stresses (which include the boundary tractions): $\Sigma(\sigma) = \{\sigma\colon \sigma_{ij,j} + f_i = 0\} \cap S_{\mathrm{ad}}$. It may be shown that in this case the previous problem can be expressed in the form of the quasivariational inequality

$$\text{find } \sigma \in \Sigma(\sigma) \text{ s.t. } \int_\Omega \varepsilon_{ij}(\tau_{ij} - \sigma_{ij})\,d\Omega \ge 0, \qquad \forall \tau \in \Sigma(\sigma).$$


Numerical Algorithms: Applications 


Theoretical results and numerical algorithms can be found in several books dealing with variational inequalities, convex analysis and their applications. For the numerical solution one usually solves, iteratively, a sequence of variational inequality problems; the resulting sequence of solutions approximates the solution of the initial quasivariational inequality. Further information and references, mainly connected with the mechanical problems used as model applications here and their generalizations, can be found in [3,5,14,18,21]. Finally, iterative solution methods can be based on multilevel optimization techniques, as discussed in ► Multilevel optimization in mechanics.
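The strategy of solving a sequence of variational inequalities can be sketched for a finite-dimensional quasivariational inequality with a solution-dependent upper bound, $K(u) = \{v\colon v \le \psi(u)\}$. In the hypothetical Python sketch below each outer step freezes the constraint set at the previous iterate, $K(u^k)$, solves the resulting variational inequality by projected Gauss-Seidel, and repeats until the iterates stop changing. This simple fixed-point scheme is an illustration only; its convergence is not guaranteed in general and depends on how strongly $\psi$ varies with $u$.

```python
import math


def projected_gauss_seidel(A, l, lo, hi, max_iter=1000, tol=1e-12):
    """Box-constrained VI: (A u - l)^T (v - u) >= 0 for all lo <= v <= hi (A SPD)."""
    n = len(l)
    u = [min(max(0.0, lo[i]), hi[i]) for i in range(n)]
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            r = l[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)
            new = min(max(r / A[i][i], lo[i]), hi[i])
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            break
    return u


def solve_qvi(A, l, psi, max_outer=200, tol=1e-10):
    """Fixed-point iteration: u^{k+1} solves the VI on K(u^k) = {v : v <= psi(u^k)}."""
    n = len(l)
    u = [0.0] * n
    lo = [-math.inf] * n
    for _ in range(max_outer):
        u_new = projected_gauss_seidel(A, l, lo, psi(u))  # obstacle frozen at u^k
        if max(abs(a - b) for a, b in zip(u_new, u)) < tol:
            return u_new
        u = u_new
    raise RuntimeError("fixed-point iteration did not converge")
```

For $A = \begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}$, $l = (3, 3)$ and $\psi(u)_i = 1 + u_i/4$, both constraints become active and the iterates approach the fixed point $u = (4/3, 4/3)$, which satisfies $u = \psi(u)$.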
Introduction


Crew scheduling problems (CSPs) consist of assigning crews to trains and creating rosters for each crew, while satisfying several Federal Railroad Administration (FRA) regulations and trade-union work rules. The objectives are to minimize the cost of operating trains on the one hand and to improve the quality of life of the crews on the other. This chapter gives a comprehensive description of the existing literature related to crew scheduling and an overview of the CSP for North American railroads, and describes an application of space-time network flow based multi-commodity methods [13] to solve this problem. Network flow models have found successful applications in a large number of diverse fields, including applied mathematics, computer science, engineering, management, and operations research [1].

Crew scheduling has historically been associated with airlines and mass-transit companies. Several papers on crew schedule management have appeared in the literature; the most notable among these are [4,6,8,15,16]. All these articles explore a set covering based approach to solve the crew scheduling problem. Crew scheduling is conventionally divided into two stages: (1) Crew pairing: a crew pairing is a sequence of connected segments that starts and ends at the same crew base and satisfies all legality constraints. The objective is to find the minimum-cost set of crew pairings such that each flight or train segment is covered. (2) Crew rostering: the objective here is to assign individual crew members to trips or sequences of crew pairings. This pairing and rostering approach uses a set covering formulation and is usually solved using column generation embedded in a branch-and-bound framework (also called branch-and-price [10]). This approach has gained wide acceptance and application both in the airline industry [2,3,11] and in European [5], Asian [7], and Australian railroads [9,14].
While there have been several papers devoted to the study of railroad crew scheduling problems in Europe, Asia, and Australia, North American railroad problems have yet to be addressed satisfactorily. One application of optimization methods to North American railroad crew scheduling is due to [12]. They studied crew balancing in the context of a major North American railroad, BNSF Railway, and developed a dynamic programming approach to solve this problem. The major shortcoming of their approach is that they did not consider the possibility of different crew types, each governed by a different set of rules. Another drawback is that their approach could handle only a particular crew district configuration (the single-ended crew district). While most crew districts in North America are single-ended, several are double-ended or even more complex (these configurations are described in the next section). The multi-commodity network flow approach described here models all the rules considered by [12] and also handles the case where different crew pools have different sets of rules. It is also applicable to all the crew district configurations encountered in North America.

Crew pairing and rostering approaches that use column generation have been the most widely and successfully used methods for solving crew scheduling problems. However, this approach is not suited to North American railroads for the following reasons:

1. The rail network of a North American railroad is divided into several crew districts. As a train follows its route, it goes from one crew district to another, picking up and dropping off crews at crew change terminals. Almost all crew districts consist of two or three terminals. Hence, a pairing and rostering approach is needlessly complex, since most pairings would consist of two trains: a train from home to away and a train from away to home. In addition, rail networks typically consist of 200-300 crew districts, so the emphasis is on an approach that is simple and fast; computationally intensive column generation techniques are not appropriate.

2. The FRA regulations governing North American railroads are extremely complex. The most complicating of these rules is the First-In-First-Out (FIFO) requirement. FIFO constraints require that crews be called on duty in the order in which they become qualified for assignment at a location. The success of all approaches using column generation or branch-and-price algorithms is contingent on the ease of solving the subproblem, and adding the FIFO side constraints would destroy the special structure of the subproblem and drastically increase computation times.

Henceforth, whenever the CSP is mentioned, it is meant in the context of the North American railroad CSP.
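To make the FIFO requirement concrete, the following is a minimal Python sketch that checks the calls made at one terminal for FIFO violations. It assumes each crew is a hypothetical tuple `(crew_id, qualified_time, called_time)` with times in hours, where `called_time` is `None` for a crew that is not called within the horizon; a pair of crews counts as a violation when the crew that became qualified first is called later (or not at all). The function name and record layout are illustrative only.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

# Hypothetical record: (crew_id, time crew became qualified, time crew was called or None)
Crew = Tuple[str, float, Optional[float]]


def fifo_violations(crews: List[Crew]) -> List[Tuple[str, str]]:
    """Return pairs (earlier, later) where the crew qualified earlier was called later."""
    violations = []
    for first, second in combinations(crews, 2):
        if first[1] == second[1]:
            continue  # ties carry no FIFO precedence in this sketch
        # Order the pair so that `first` became qualified strictly earlier.
        if first[1] > second[1]:
            first, second = second, first
        # A crew that is never called is treated as called at +infinity.
        call_first = math.inf if first[2] is None else first[2]
        call_second = math.inf if second[2] is None else second[2]
        if call_first > call_second:
            violations.append((first[0], second[0]))
    return violations


calls = [("A", 0.0, 10.0), ("B", 2.0, 5.0), ("C", 4.0, None)]
print(fifo_violations(calls))  # [('A', 'B')]: A qualified first but was called after B
```

A pairwise check like this is quadratic in the number of crews, which is adequate for the roughly 50 crews per district mentioned later in the article.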


Definitions 


This section gives an overview of the CSP and defines 
some of the essential terminology needed to under- 
stand the problem. It also gives an overview of some of 
the typical regulations which govern crew management 
and lists the set of inputs required to properly define 
and formulate the crew scheduling problem. 


Terminology 


Crew District: The rail network of a railroad is di- 
vided into crew districts that constitute a subset of ter- 
minals (nodes). Each crew district is typically a geo- 
graphic corridor over which trains can travel with one 
crew. A typical railroad network for a major railroad 
in the U.S. may be divided into as many as 200 to 300 
crew districts. As a train follows its route, it goes from 
one crew district to another, picking up and dropping 
off crew at crew change terminals. 


Crew Pools: Within a crew district, there are several 
types of crews called crew pools or crew types, which 
may be governed by different trade-union rules and 
regulations. For example, a crew pool may have prefer- 
ence for the trains operated in a pre-specified time win- 
dow. Similarly, a crew pool consisting of senior crew 
personnel is assigned only to pre-designated trains so 
that crews in that pool know their working hours ahead 
of time. 


Home and Away Terminals: The terminals where 
crews from a crew pool change trains are designated ei- 
ther as home terminals or away terminals. The railroad 
does not incur any lodging cost when a crew is at its 
home terminal. However, the railroad has to make arrangements for crew accommodation at their away terminals. A crew district with one home terminal and one
away terminal is called a single-ended crew district. The 
other type of crew district is a double-ended crew dis- 
trict, in which more than one terminal is a home ter- 
minal for different crew pools. Some of the other crew 
district configurations are crew districts with one home 
terminal and several away terminals, and crew districts 
with several home terminals and corresponding sets of 
away terminals. 


Crew Detention: Once a crew reaches its away terminal and rests for the prescribed hours, the crew is ready to head back to its home terminal. However, if there is no train scheduled to depart, then the crew may have to wait in a hotel. According to the trade-union rules, once a crew has been at the away terminal for more than a pre-specified number of hours (generally 16 hours), the crew earns wages (called detention costs) without being on duty.


Crew Deadheading: Crew deadheading refers to the 
repositioning of crew between terminals. A crew nor- 
mally operates a train from its home terminal to an 
away terminal, rests for a designated time, and then op- 
erates another train back to its home terminal. Some- 
times, at the away terminal, there is no return train pro- 
jected for some time, or there is a shortage of crews 
at another terminal. Thus, instead of waiting for train 
assignment at its current terminal, the crew can take 
a taxicab or a train (as a passenger) and deadhead to the 
home terminal. Similarly, the crew may also deadhead 
from a home terminal to an away terminal in order to 
rebalance and better match the train demand patterns 
and avoid train delays. 


On-duty and Tie-up Time: Whenever a crew is as- 
signed to a train, it performs some tasks to prepare the 
train for departure, and hence crews are called on-duty 
before train departure time. The time at which the crew 
has to report for duty is called the on-duty time. Sim- 
ilarly, a crew performs some tasks after the arrival of 
the train at its destination, and hence crews are released 
from duty after the train arrival. The time at which the 
crew is released from duty is called tie-up time. The 
duty duration before train departure is referred to as 
duty-before-departure and the duty duration after train 
arrival as duty-after-arrival. Hence, the total duty time (or duty-period) of a crew assigned to a train is the sum of the duty-before-departure, the duty-after-arrival, and the travel time of the train.


Duty Period: In most cases, duty-period of a crew 
assigned to a train is the total duration between the 
on-duty time and the tie-up time. In some cases when 
a crew rests for a very short time at an away location 
before getting assigned to a train, the rest time and the duration of the second train may also be included in the duty period of the crew.


Dead Crews: By federal law, a train crew can only 
be on duty for a maximum of 12 consecutive hours, at 
which time the crew must cease all work and it becomes 
dead or dog-lawed. 


Train Delays: When a train reaches a crew-change 
location and there is no available crew qualified to 
operate this train, the train must be delayed. Train 
delays due to crew unavailability are quite common 
among railroads. These delays are very expensive (some 
estimate $1000 per hour) and can be reduced sig- 
nificantly through better crew scheduling and train 
scheduling. 
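The duty-time definitions above reduce to simple arithmetic. The sketch below assumes all times are hours measured from the start of the planning horizon; the `TrainLeg` record and its field names are hypothetical, and the 12-hour limit is the federal maximum quoted under Dead Crews.

```python
from dataclasses import dataclass

MAX_DUTY_HOURS = 12.0  # federal hours-of-service limit quoted in the text


@dataclass
class TrainLeg:
    on_duty: float    # time the crew reports for duty
    departure: float  # train departure time
    arrival: float    # train arrival time
    tie_up: float     # time the crew is released from duty


def duty_period(leg: TrainLeg) -> float:
    before = leg.departure - leg.on_duty  # duty-before-departure
    travel = leg.arrival - leg.departure  # travel time of the train
    after = leg.tie_up - leg.arrival      # duty-after-arrival
    return before + travel + after        # equals tie_up - on_duty


def goes_dead(leg: TrainLeg) -> bool:
    """True if the assignment would exceed the 12-hour limit (the crew is 'dog-lawed')."""
    return duty_period(leg) > MAX_DUTY_HOURS


leg = TrainLeg(on_duty=0.0, departure=1.0, arrival=9.5, tie_up=10.0)
print(duty_period(leg), goes_dead(leg))  # 10.0 False
```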


Regulatory and Contractual Requirements 


Assignment of crews to trains is governed by a variety of Federal Railroad Administration (FRA) regulations and trade-union rules. These regulations range from the simple to the complex, and they also vary from district to district and from crew pool to crew pool. Some examples of these constraints, with their typical parameter values, are:

• Duty-period of a crew cannot exceed 12 hours. Duty-period of a crew on a train is usually calculated as the time interval between the on-duty time and tie-up time of the train.

• When a crew is released from duty at the home terminal or has been deadheaded to the home terminal, they can resume duty only after 12 hours (10 hours rest followed by a 2-hour call period) if the duty-period is greater than 10 hours, and after 10 hours (8 hours rest followed by a 2-hour call period) if the duty-period is less than or equal to 10 hours.

• Whenever a crew is released from duty at the away terminal, they usually must rest for a minimum of 8 hours, with a few exceptions.

• Crews belonging to certain pools must be assigned to trains in FIFO order.

• A train can only be operated by crews belonging to pre-specified pools.

• Every train must be operated by a single crew.

• Crews are guaranteed a certain minimum pay per month regardless of whether or not they work.

Figure 1 gives an example of the kind of decision process that needs to be followed by crew planners.

Railroad Crew Scheduling, Figure 1
An example of crew assignment decision tree
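The home- and away-terminal rest rules above translate directly into an earliest-availability time, which is what a network builder needs when deciding whether a rest arc is legal. The function below is a hypothetical sketch that uses only the typical parameter values listed; it ignores the away-terminal exceptions and assumes no call period is added at the away terminal, since the text gives only the 8-hour rest there. Real values vary by district and pool.

```python
def earliest_next_on_duty(tie_up: float, duty_period: float, at_home: bool) -> float:
    """Earliest time (hours) a crew can go on duty again under the typical rules listed.

    Assumptions: away-terminal exceptions are ignored, and no call period is added
    at the away terminal.
    """
    if at_home:
        # 10 h rest + 2 h call if the last duty-period exceeded 10 h,
        # otherwise 8 h rest + 2 h call.
        wait = 12.0 if duty_period > 10.0 else 10.0
    else:
        wait = 8.0  # typical minimum rest at the away terminal
    return tie_up + wait


print(earliest_next_on_duty(20.0, 11.0, at_home=True))   # 32.0
print(earliest_next_on_duty(20.0, 10.0, at_home=True))   # 30.0
print(earliest_next_on_duty(20.0, 9.0, at_home=False))   # 28.0
```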


Problem Inputs 


The inputs for the mathematical formulation of the crew scheduling problem are:

• Train Schedule: The train schedule contains information about the departure time, arrival time, on-duty time, tie-up time, departure location, and arrival location for every train in each crew district it passes through.

• Crew Pool Attributes: This includes attributes of the various crew types, namely their home locations, their away locations, minimum rest time, train preferences, etc.

• Crew Initial Position: This provides the position of each crew at the beginning of the planning horizon, including the terminal at which the crew was released from duty, the time of release, the number of hours on duty in the previous assignment, and the crew pool the crew belongs to.

• Train-Pool Preferences: The train-pool preferences, if any, contain information about the set of trains that can be operated by a crew pool.

• Away Terminal Attributes: This consists of information about the away terminals for each crew pool, including the rest rules and the detention rules for each crew pool at each corresponding away terminal.

• Deadhead Attributes: This consists of the time taken to travel by taxi between two terminals in a crew district.

• Cost Parameters: Cost parameters are used to set up the objective function for the crew scheduling problem. They consist of crew wage per hour, deadhead cost per hour, detention cost per hour, and train delay cost per hour.


Formulation 


The CSP is formulated as a multi-commodity network 
flow problem on a space-time network. The construc- 
tion of the space-time network is described first and 
then the formulation of the CSP as an integer program- 
ming problem is given. 


Space-Time Network 


The CSP is decomposed into a separate problem for 
each crew district. In the space-time network, each 
node corresponds to a crew event and has two defin- 
ing attributes: location and time. The events that are 
modeled while constructing the space-time network are 
departure of trains, arrival of trains, departure of dead- 
heads, arrival of deadheads, supply of crew, and termi- 
nation of crew duty to mark the end of the planning 
horizon. Figure 2 presents an example of the space-time 
network in a crew district. Note that, for the sake of clarity, the figure shows only a subset of all the arcs.

For each crew, a supply node is created whose time corresponds to the time at which the crew becomes available for assignment and whose location corresponds to the terminal at which the crew was released from duty. Each supply node is assigned a supply of one unit and corresponds to one crew. The network also has a common sink node for all crews at the end of the planning horizon. This sink has no location attribute, and its time attribute equals the end of the planning horizon. The sink node has a demand equal to the total number of crews supplied.


Railroad Crew Scheduling, Figure 2
Space-time network for a single-ended district with a single crew type (columns: Home Terminal, Away Terminal; axis: Time). Node legend: green (supply), blue (arrival), yellow (departure), red (demand). Arc legend: green (train), orange (rest), blue (deadhead), black (demand)


For each train (say $l$) passing through a crew district, a departure node (say $l'$) is created at the first departing station of the train in the crew district and an arrival node (say $l''$) at the last arriving station of the train in the crew district. Each arrival or departure node has two attributes: place and time. For example, place($l'$) = departure-station($l$) and time($l'$) = on-duty-time($l$); similarly, place($l''$) = arrival-station($l$) and time($l''$) = tie-up-time($l$).

Train arc $(l', l'')$ is created for each train $l$, connecting the departure node and arrival node of train $l$. Deadhead arcs are constructed to model the travel of crew
by taxi. A deadhead arc is constructed between a train 
arrival or crew supply node at a location and a train de- 
parture node at another location. All the deadhead arcs 
which satisfy the contractual rules and regulations are 
created. Rest arcs are constructed to model resting of 
a crew at a location. A rest arc is constructed between 
a train arrival node or a crew supply node at a location and a train departure node at the same location.
Rest arcs are created in conformance to the contractual 
rules and regulations. Since the contractual regulations 
are often crew pool specific, deadhead arcs and rest arcs 
are created specific to a crew pool. Finally, demand arcs 
are created from all train arrival nodes and crew supply 
nodes to the sink node. Each arc has an associated cost 
equivalent to the crew wages, deadhead costs, or deten- 
tion costs, depending on the type of the arc. It can be 
noted that all contractual requirements other than the 
FIFO constraint are easily handled in the network con- 
struction. 
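A minimal Python sketch of this construction for a single crew pool follows. It assumes a single minimum-rest value for all terminals, a fixed taxi time between any two terminals, and that a crew must complete its rest before deadheading. The `Node` type and `build_network` function are hypothetical, and pool-specific rules, FIFO, and the penalized delay arcs are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    kind: str             # "supply", "dep", "arr", or "sink"
    name: str
    place: Optional[str]  # the sink has no location
    time: float


Arc = Tuple[str, Node, Node]  # (arc type, tail node, head node)


def build_network(trains: Dict[str, Tuple[str, str, float, float]],
                  crews: Dict[str, Tuple[str, float]],
                  min_rest: float, taxi_hours: float,
                  horizon: float) -> Tuple[List[Node], List[Arc]]:
    """trains: name -> (departure station, arrival station, on-duty time, tie-up time)
    crews:  name -> (terminal, time available for assignment)"""
    sink = Node("sink", "t", None, horizon)
    supply = [Node("supply", c, loc, t) for c, (loc, t) in crews.items()]
    deps: Dict[str, Node] = {}
    arrs: Dict[str, Node] = {}
    arcs: List[Arc] = []
    for train_id, (orig, dest, on_duty, tie_up) in trains.items():
        deps[train_id] = Node("dep", train_id + "'", orig, on_duty)
        arrs[train_id] = Node("arr", train_id + "''", dest, tie_up)
        arcs.append(("train", deps[train_id], arrs[train_id]))
    # Crews become free at supply nodes and at train arrival nodes.
    for u in supply + list(arrs.values()):
        ready = u.time if u.kind == "supply" else u.time + min_rest
        for v in deps.values():
            if v.place == u.place and v.time >= ready:
                arcs.append(("rest", u, v))        # wait at the same terminal
            elif v.place != u.place and v.time >= ready + taxi_hours:
                arcs.append(("deadhead", u, v))    # taxi to another terminal
        arcs.append(("demand", u, sink))           # crew idles until the horizon ends
    nodes = supply + list(deps.values()) + list(arrs.values()) + [sink]
    return nodes, arcs


trains = {"L1": ("H", "A", 0.0, 8.0), "L2": ("A", "H", 20.0, 28.0)}
crews = {"c1": ("H", 0.0)}
nodes, arcs = build_network(trains, crews, min_rest=8.0, taxi_hours=3.0, horizon=48.0)
print(len(nodes), [kind for kind, _, _ in arcs])
```

In this toy district, the single crew can either rest and take L1 or deadhead to the away terminal for L2, and after L1 it can rest at the away terminal and take L2 home; every free node also gets a demand arc to the sink.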

The space-time network described so far does not model the case in which no qualified crew is available for assignment to a train, causing a train delay. Train delays are modeled by the construction of additional arcs. To do this, rest arcs and deadhead arcs that do not honor the rest regulations are constructed, and flows on these arcs are penalized to ensure that they occur only when no qualified crew is available for assignment. A flow on such an arc denotes that the train will be delayed until the crew becomes qualified for train operation. However, as the delay of a train may have a propagating effect on the availability of crews in subsequent assignments, it is assumed that the crew assigned to a delayed train has sufficient slack in the rest time at the train arrival node to make it qualified for subsequent assignments.


Integer Programming Formulation 


The CSP is formulated as an integer multi-commodity 
flow problem on the space-time network described in 
the previous section. Each crew pool represents a commodity. Crews enter the system at crew supply nodes and take a sequence of connected train, rest, and deadhead arcs before finally reaching the sink.


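Decision Variables
$x_l^c$: Flow of crew pool $c \in C$ on train arc $l \in L$
$x_d$: Flow on deadhead arc $d \in D$
$x_r$: Flow on rest arc $r \in R$

Objective function

$$\min \; \sum_{l \in L} \sum_{c \in C} c_l^c \, x_l^c + \sum_{d \in D} c_d \, x_d + \sum_{r \in R} c_r \, x_r$$

Constraints

$$\sum_{c \in C} x_l^c = 1, \quad \forall\, l \in L \qquad (1)$$
$$\sum_{a \in i^+} x_a = 1, \quad \forall\, i \in N_s \qquad (2)$$
$$\sum_{a \in t^-} x_a = |N_s| \qquad (3)$$
$$x_l^c = \sum_{a \in \mathrm{tail}(l)_c^-} x_a, \quad \forall\, l \in L,\ c \in C \qquad (4)$$
$$x_l^c = \sum_{a \in \mathrm{head}(l)_c^+} x_a, \quad \forall\, l \in L,\ c \in C \qquad (5)$$
$$\sum_{r' \in A_r} x_{r'} - M(1 - x_r) \le 0, \quad \forall\, r \in R \qquad (6)$$
$$x_l^c \in \{0, 1\}, \quad \forall\, l \in L,\ c \in C \qquad (7)$$
$$x_d \in \{0, 1\}, \quad \forall\, d \in D \qquad (8)$$
$$x_r \in \{0, 1\}, \quad \forall\, r \in R \qquad (9)$$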


Constraint (1) is the train cover constraint, which ensures that every train is assigned a qualified crew to operate it; here and below, $x_a$ denotes the flow on a generic arc $a$. Constraint (2) ensures flow balance at each crew supply node, and constraint (3) ensures flow balance at the sink node. Constraints (4) and (5) ensure flow balance at train departure and arrival nodes, respectively. Constraint (6), in which $M$ is a sufficiently large constant, ensures that the crew assignment honors the FIFO constraint: if rest arc $r$ carries flow, no rest arc that would violate FIFO with respect to $r$ may carry flow. Constraints (7), (8), and (9) specify that all the decision variables in the model are binary. The objective function is constructed to minimize the total cost of crew wages, deadheading, detentions, and train delays. Note that the detention and delay costs are taken into account when calculating the cost of rest arcs.
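Since detention and delay costs are folded into rest-arc costs, one hypothetical way to compute such a cost is sketched below. It combines the detention rule (wages accrue after a threshold, generally 16 hours, at the away terminal) with the penalized short-rest arcs used to model train delays. The parameter names are assumptions, detention is assumed to accrue for the whole time spent on the rest arc, and the delay is assumed to equal the rest shortfall; the 1000-per-hour delay rate in the usage lines is only the rough estimate quoted under Train Delays.

```python
def rest_arc_cost(start: float, end: float, at_away: bool, min_rest: float,
                  detention_cost_per_hour: float, delay_cost_per_hour: float,
                  detention_threshold: float = 16.0) -> float:
    """Cost of a (possibly penalized) rest arc from time `start` to time `end`, in hours.

    Assumptions: detention accrues for every hour at the away terminal beyond
    `detention_threshold`, and a rest shorter than `min_rest` delays the train by
    the shortfall, which is charged at `delay_cost_per_hour`.
    """
    duration = end - start
    cost = 0.0
    if at_away and duration > detention_threshold:
        cost += detention_cost_per_hour * (duration - detention_threshold)
    shortfall = max(0.0, min_rest - duration)  # hours the train waits for a rested crew
    cost += delay_cost_per_hour * shortfall
    return cost


print(rest_arc_cost(10.0, 30.0, True, 8.0, 50.0, 1000.0))  # 4 h detention -> 200.0
print(rest_arc_cost(10.0, 16.0, True, 8.0, 50.0, 1000.0))  # 2 h delay -> 2000.0
```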


Theorem 1 There is a one-to-one correspondence between a feasible flow on the space-time network satisfying constraints (1)-(9) and a feasible solution to the crew scheduling problem.


Railroad Crew Scheduling, Table 1
Notation

$G(N, A)$: Space-time network
$N$: Set of nodes in the space-time network
$N_s$: Set of crew supply nodes
$t$: Sink node
$L$: Set of train arcs in the network, indexed by $l$
$D$: Set of deadhead arcs, indexed by $d$
$R$: Set of rest arcs, indexed by $r$
$C$: Set of crew pools in the system, indexed by $c$
$\mathrm{tail}(l)$: The node from which arc $l$ originates
$\mathrm{head}(l)$: The node at which arc $l$ terminates
$i^+$: Set of outgoing arcs at node $i$
$i^-$: Set of incoming arcs at node $i$
$i_c^+$: Set of outgoing arcs specific to crew pool $c$ at node $i$
$i_c^-$: Set of incoming arcs specific to crew pool $c$ at node $i$
$c_l^c$: Cost of crew wages for crew pool $c \in C$ on train arc $l \in L$
$c_d$: Cost of deadhead arc $d \in D$
$c_r$: Cost of rest arc $r \in R$
$A_r$: Set of rest arcs that cannot carry flow if there is flow on rest arc $r$

Most crew districts have two terminals, and a typical train schedule has around 500 trains running in a couple of weeks in a crew district. Each crew district could have two to four crew types and around 50 crews. Therefore, the space-time network could have around 50 + 2 × 500 = 1,050 nodes. The number of deadhead arcs is typically around 25,000, and the number of rest arcs is around 100,000.

Since the number of rest arcs for a typical prob- 
lem is of the order of 100,000, and as each rest arc has 
one FIFO constraint, the number of FIFO constraints 
in the model would be 100,000, which makes the problem very large. In addition, these constraints destroy the special structure of the problem, and a direct approach to solving the CSP is intractable: it fails to reach a feasible solution even after several hours of computational time. However, the integer programming problem with the FIFO constraints relaxed (the Relaxed Problem) can be solved to optimality within minutes.
Efficient methods to solve the CSP are described in the 
next section. 


Methods and Applications 
Successive Constraint Generation (SCG) 


The SCG algorithm works by iteratively pruning out 
crew assignments which violate the FIFO constraints 
from the current solution of a more relaxed problem. 
The SCG algorithm starts with the optimal solution of 
the Relaxed Problem. The algorithm scans the rest arcs 
in the current solution with positive flow, and for each 
rest arc assignment which violates FIFO constraints, it 
adds the corresponding FIFO constraints. The problem is then re-solved and the solution re-checked for FIFO
infeasibilities. This process is repeated until all FIFO in- 
feasibilities are pruned. 


Algorithm-SCG 

1. Solve the Relaxed Problem. If a feasible solution exists, proceed to Step 2. Otherwise, STOP: the instance is infeasible.

2. Examine all the rest arcs with positive flow in the current solution. Add FIFO constraints to the integer program for those rest-arc assignments that violate FIFO requirements.

3. If FIFO constraints were added in Step 2, re-optimize the modified integer program and go to Step 2. Otherwise, STOP: the current solution is optimal.
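The loop can be illustrated with a toy Python sketch under loudly simplified assumptions: one terminal, one crew pool, each crew operates at most one train, FIFO is checked only among crews that receive a train, and the integer program is replaced by brute-force enumeration of crew-to-train assignments (practical only for tiny instances). For a used rest arc r = (crew i, train t), the hypothetical helper `fifo_set` returns an analogue of the set of arcs forbidden by constraint (6): arcs that send a crew qualified later than i to a train departing earlier than t. All names and data are illustrative.

```python
from itertools import permutations
from typing import Dict, Optional, Set, Tuple

Arc = Tuple[str, str]  # (crew, train): the crew rests until the train's on-duty time


def fifo_set(r: Arc, crews: Dict[str, float], trains: Dict[str, float]) -> Set[Arc]:
    """Arcs that may carry no flow if rest arc r carries flow."""
    i, t = r
    return {(j, u) for j in crews for u in trains
            if crews[j] > crews[i] and trains[u] < trains[t]}


def solve(crews, trains, cost, cuts) -> Tuple[Optional[Set[Arc]], float]:
    """Brute-force stand-in for the integer program, honoring the FIFO cuts added so far."""
    best, best_cost = None, float("inf")
    for perm in permutations(crews, len(trains)):
        arcs = set(zip(perm, trains))
        if any(crews[c] > trains[t] for c, t in arcs):
            continue  # crew not yet qualified at the train's on-duty time
        if any(r in arcs and arcs & banned for r, banned in cuts.items()):
            continue  # violates an added FIFO constraint
        total = sum(cost[a] for a in arcs)
        if total < best_cost:
            best, best_cost = arcs, total
    return best, best_cost


def scg(crews, trains, cost):
    cuts: Dict[Arc, Set[Arc]] = {}  # FIFO constraints added so far
    while True:
        sol, total = solve(crews, trains, cost, cuts)  # Steps 1 and 3
        if sol is None:
            return None, None, cuts  # infeasible instance
        added = False
        for r in sorted(sol):  # Step 2: scan rest arcs with positive flow
            a_r = fifo_set(r, crews, trains)
            if sol & a_r:
                cuts[r] = a_r
                added = True
        if not added:
            return sol, total, cuts  # no FIFO violations remain


crews = {"A": 0.0, "B": 1.0}     # time each crew becomes qualified
trains = {"T1": 2.0, "T2": 5.0}  # on-duty time of each train
cost = {("A", "T1"): 2, ("A", "T2"): 1, ("B", "T1"): 1, ("B", "T2"): 2}
print(scg(crews, trains, cost))
```

On this instance the relaxed optimum swaps the crews (cost 2), one FIFO cut is added for rest arc (A, T2), and the re-solve returns the FIFO assignment {(A, T1), (B, T2)} at cost 4.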

The SCG algorithm is an exact algorithm guaranteeing an optimal solution to the original problem. However, in
the worst case, SCG could add all the FIFO constraints 
to the integer program and would hence become an in- 
tractable approach. Fortunately, this seldom happens in 
practice. Computational results show that the number 
of constraints added is usually much less than the total 
number of rest arcs in the network. 

While SCG is an exact algorithm and produces provably optimal solutions, its running time can be quite high. Some instances had running times on the order of minutes, while others took much longer. While these running times are acceptable in the planning environment, they restrict the applicability of this algorithm in the real-time environment.
Quadratic Cost-Perturbation (QCP) Algorithm


The QCP algorithm penalizes FIFO violations. This method guarantees zero FIFO violations when there is no priority in assigning crews to trains, and serves as a heuristic when there are priority restrictions. The basic intuition behind this approach is that the arc costs of the Relaxed Problem are perturbed in a way that guarantees FIFO compliance.

The cost perturbation strategy is illustrated in Fig. 3 for the case of a single crew pool. In case (I), crew assignments are made in a non-FIFO manner, and in case (II), the assignments are made in a FIFO manner. Consider the case when crews are detained at Terminal 2. Then, due to the nature of detention costs, the cost of assignment (II) is always less than or equal to the cost of assignment (I), and hence the solution to the Relaxed Problem honors the FIFO constraints. On the other hand, suppose all the rest arcs had a cost of zero; then both assignments would have the same cost, and the Relaxed Problem would have no cost incentive to choose assignment (II) over assignment (I). Thus, a solution to the Relaxed Problem may violate the FIFO constraints. To give the Relaxed Problem an incentive to choose case (II) over case (I), the costs on rest arcs are perturbed so that the solution of the Relaxed Problem has assignments of type (II) and not assignments of type (I).

The cost perturbation scheme used is a function of the duration of rest arcs. Suppose that the time durations between the events corresponding to nodes 2 and 4, 4 and 5, and 5 and 7 are $a$, $b$, and $c$, respectively. Consider a cost assignment that is proportional to the square of the duration of rest arcs, with constant of proportionality $k$ ($k$ is set to a very small value).

Then,

$$\begin{aligned}
\text{Cost of assignment (I)} &= k\,(\text{duration of arc } (2,7))^2 + k\,(\text{duration of arc } (4,5))^2 \\
&= k(a+b+c)^2 + kb^2 \\
&= k(a^2 + 2b^2 + c^2 + 2ab + 2bc + 2ca)
\end{aligned}$$

$$\begin{aligned}
\text{Cost of assignment (II)} &= k\,(\text{duration of arc } (2,5))^2 + k\,(\text{duration of arc } (4,7))^2 \\
&= k(a+b)^2 + k(b+c)^2 \\
&= k(a^2 + 2b^2 + c^2 + 2ab + 2bc)
\end{aligned}$$


Railroad Crew Scheduling, Figure 3
Illustrating the FIFO assignments between Terminal 1 and Terminal 2: (I) invalid assignment; (II) valid assignment
It can be observed that the cost of assignment (II) is less than that of assignment (I): the two expressions differ by $2kca$, which is positive whenever $a$ and $c$ are positive. Hence, when the rest arcs have zero costs, the QCP scheme applied to the Relaxed Problem gives FIFO-compliant assignments unless there is priority in assigning crew pools to trains. These observations are stated as the following theorem.


Theorem 2 Quadratic Cost Perturbation applied to the 
Relaxed Problem guarantees FIFO compliant crew as- 
signments if there is no priority in assigning crews to the 
trains. 
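A small numerical check follows, assuming zero base rest-arc costs. The first part evaluates the two assignments of Fig. 3 for arbitrary durations; the second brute-forces all pairings of three crews (qualified at hypothetical times 0, 4, and 6) to three departures and confirms that, for this instance, the quadratic perturbation selects the FIFO pairing. `qcp_cost` and `qcp_best_assignment` are illustrative names, not part of the published method, and the sketch assumes equal numbers of crews and departures.

```python
from itertools import permutations
from typing import Sequence, Tuple


def qcp_cost(durations: Sequence[float], k: float) -> float:
    """Quadratic perturbation: k times the sum of squared rest-arc durations."""
    return k * sum(d * d for d in durations)


# Two-crew check of the algebra above (durations a, b, c as in Fig. 3).
a, b, c, k = 3.0, 5.0, 2.0, 0.01
cost_I = qcp_cost([a + b + c, b], k)    # arcs (2, 7) and (4, 5)
cost_II = qcp_cost([a + b, b + c], k)   # arcs (2, 5) and (4, 7)
print(cost_I - cost_II, 2 * k * a * c)  # equal up to floating-point rounding


def qcp_best_assignment(ready: Sequence[float], departs: Sequence[float],
                        k: float = 1e-3) -> Tuple[int, ...]:
    """Brute-force the assignment p (crew i -> train p[i]) minimizing the perturbation.

    Pairings in which a train departs before its crew is ready are skipped.
    """
    def perturbed(p: Tuple[int, ...]) -> float:
        gaps = [departs[p[i]] - ready[i] for i in range(len(ready))]
        return float("inf") if min(gaps) < 0 else qcp_cost(gaps, k)
    return min(permutations(range(len(departs))), key=perturbed)


print(qcp_best_assignment([0.0, 4.0, 6.0], [10.0, 12.0, 15.0]))  # (0, 1, 2): FIFO order
```

When every pairing is feasible, expanding the squared gaps shows why this holds in general: the only pairing-dependent term is minus twice the sum of products of ready and departure times, which the rearrangement inequality maximizes by matching both in sorted (FIFO) order.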


The solution time of QCP is very short and is compara- 
ble to that of the Relaxed Problem. Note that in the case 
where there are priorities, this approach could be used 
to obtain a solution with a small number of violations 
and the SCG algorithm can then be used to prune out these violations. An interesting observation was that, for most of the instances tested, this method produced solutions with the same objective function values as the Relaxed Problem. This suggests that FIFO constraints can be satisfied with little or no impact on the
solution cost. Hence, QCP can be used to obtain ex- 
cellent quality solutions using much less computational 
time. Due to its attractive running times and high solu- 
tion quality, this method has the potential to be used in 
both the planning and the real-time environment. 


Applications 


The crew scheduling model has applications in the tac- 
tical, planning and strategic environments. Some spe- 
cific examples of how the model can be used as an ef- 
fective tool for decision making are provided in this sec- 
tion. 


Tactical Crew Scheduling: Tactical scheduling refers 
to the decisions that need to be taken in the real-time 
planning environment on a daily basis. The model for
the CSP has several potential applications in the tacti- 
cal scheduling environment. Some of the applications 
are: (i) Assign crews to trains, (ii) Recommend which 
crews to place in hotels and which crews to deadhead 
home, (iii) Minimize trains delayed due to shortage of 
crew, and (iv) Disruption management. 


Crew Planning: The essence of the crew planning 
problem for operations or planning is to determine how many crews should be deployed in each crew
pool. A planning problem is solved every month as 
the train pattern changes with seasonal traffic demand 
fluctuations. Currently, railroads solve the pool siz- 
ing problem based on historical precedent and rules- 
of-thumb, through negotiation with the union, and 
by trial and error. The network flow model can sat- 
isfy the need for a structured approach that captures 
all of the considerations, quantifies the various costs, 
and recommends the best way to define and staff 
crew pools. Some of the applications of the model in 
the planning environment are: (i) Develop and eval- 
uate crew schedules, and (ii) Analyze robustness of 
the plan with respect to availability of the crew in the 
pool. 


Crew Strategic Analysis: Strategic management in- 
volves development of policies and plans and allocat- 
ing resources to implement these plans. The timeframe 
of strategic management extends over several months 
or even years. Strategic crew problems include fore- 
casting future head-count needs and evaluating major 
policy changes such as negotiating changes to trade- 
union rules or changing the number and location of 
crew change points on a network. The model can be 
used to quickly calibrate efficient frontiers for each crew 
district and show what number of crews minimizes the 
sum of train delay costs and crew costs. Some of the 
applications of the network flow model in the strate- 
gic environment are: (i) Determining the number of 
crew districts and territory of crew districts, (ii) Effect 
of changing crew trade-union rules, and (iii) Forecast- 
ing crew requirement. 


Conclusions 


The crew scheduling problem for North American railroads is governed by several Federal Railroad Administration (FRA) regulations and trade-union work rules.
In addition to satisfying these regulations, the objective 
is to minimize the total wage costs, train delay costs, 
deadhead costs, and detention costs. The railroads di- 
vide their network into a number of crew districts, and 
each crew district has several crew pools. Each crew 
pool in a district could have a different set of operat- 
ing rules. These factors make this a complex problem 
to model and solve. 
The network flow formulation for the crew scheduling problem described in this article is both flexible and robust and can easily be adapted to represent each of the possibilities encountered in real life. The crew
scheduling problem is formulated as an integer pro- 
gram on a space-time network. The network is con- 
structed in such a way that all FRA regulations and 
trade-union work rules other than FIFO constraints are 
enforced during the network construction phase itself. 
The operational constraints are handled in the integer 
programming formulation. Two efficient approaches to 
solve the problem are described. 

The crew scheduling model has applications in 
a wide range of settings. The broad spectrum of appli- 
cations varies from the short-term problem of assign- 
ing crews to trains over the next few days to the long- 
term problem of forecasting crew requirements based 
on future demand patterns. The model gives railroad 
executives a method to calibrate and quantify the im- 
pact of current decisions on future operations by run- 
ning several “what-if” scenarios. It has the potential to 
make a significant impact on the railroad’s on-time per- 
formance, crew utilization, and productivity, while also 
improving the quality of life for crews and enhancing
railroad safety. 
Introduction


Every day, railroad managers need to assign hundreds of
locomotives to hundreds of different trains. Locomo- 
tive scheduling entails optimally assigning a set of lo- 
comotives to each train so that the assignment satisfies 
a variety of business constraints and minimizes the to- 
tal scheduling cost. Major US railroad companies have 
several billions of dollars of capital investment in loco- 
motives. Thus, solving this problem effectively is of crit- 
ical importance for railroads. This chapter gives a de- 
scription of the existing literature related to locomotive 
scheduling, an overview of the North American railroad locomotive scheduling problem (LSP), and a summary of the key features of two comprehensive models
for locomotive scheduling developed by [1] and [15]. 

The set of locomotives assigned to a train is called its consist. When a train arrives at its destination, its consist is either assigned to an outbound train in its entirety, or it goes to the pool of locomotives from which new consists are formed. The former case is referred to as a train-to-train connection between the inbound and outbound trains, and the latter case is referred to as consist-busting. The cost of assembling and disassembling consists must be controlled by developing plans that minimize consist-busting.

Locomotives which provide power to the train are 
referred to as active locomotives. Due to differences in
power demand at different stations, locomotives are 
also repositioned from one station to another. The lo- 
comotives can be repositioned by simply deadheading 
on an existing train, or by dispatching a set of loco- 
motives which travel independently from one station to 
another, also referred to as light travel. 


LSPs are notoriously hard combinatorial optimization problems. The paper [7] presents an excellent survey of existing locomotive scheduling models and algorithms. The models described in the literature can broadly be classified into two categories: single locomotive type models and multiple locomotive type models. Single locomotive scheduling models assume that only one type of locomotive is available for assignment. These models can be viewed as minimum cost flow problems with side constraints; papers on single locomotive scheduling models include [3,4,8,10,16]. Single locomotive assignment models are appropriate for some European railroad companies, but they are not suited for US railroad companies, since most US trains are assigned multiple types of locomotives. Multiple locomotive assignment models have been studied by [5,6,9,12,13,14,17,18]. The most recent and comprehensive multiple locomotive assignment model is due to [1], and it has been further refined by [15]. While [1] developed the initial framework to solve this problem, [15] improved on that effort along several dimensions, leading to a practical locomotive scheduling approach.


Definitions 


This section lists the set of inputs required to define and 
formulate the LSP. It also describes the constraints that 
govern the problem and the objective function. 


Problem Inputs 


Locomotive Data: A railroad typically has several different types of locomotives, with different pulling and cost characteristics and different numbers of axles (often ranging from four to nine). The set of all locomotive types is denoted by $K$, and the index $k$ represents a particular locomotive type. The following data are associated with each locomotive type $k \in K$: (i) $h^k$: the horsepower provided by a locomotive of type $k$; (ii) $\lambda^k$: the number of axles in a locomotive of type $k$; (iii) $G^k$: the weekly ownership cost for a locomotive of type $k$; and (iv) $B^k$: the fleet size of locomotives of type $k$, that is, the number of locomotives available for assignment.

Train Data: Locomotives pull a set of trains $L$ from their origins to their destinations. Trains have different weekly frequencies: some trains run every day, while others run less frequently. The same train running on different days is treated as different train entities; that is, a train that runs five days a week is considered as five different trains. The index $l$ denotes a specific train. Each train has the following associated information: (i) dep-time($l$): the departure time of train $l$; (ii) arr-time($l$): the arrival time of train $l$; (iii) dep-station($l$): the departure station of train $l$; (iv) arr-station($l$): the arrival station of train $l$; (v) $T_l$: the tonnage requirement of train $l$; (vi) $\beta_l$: the horsepower per ton needed for train $l$; (vii) $H_l$: the horsepower requirement of train $l$, defined as $H_l = \beta_l T_l$; and (viii) $E_l$: the penalty for using a single locomotive consist for train $l$.

Locomotive-Train Combinations: The following information is specified for each train-locomotive type combination: (i) $c_l^k$: the cost incurred in assigning an active locomotive of type $k$ to train $l$; (ii) $d_l^k$: the cost incurred in assigning a deadheaded locomotive of type $k$ to train $l$; and (iii) $t_l^k$: the tonnage pulling capability provided by an active locomotive of type $k$ to train $l$. Three disjoint sets of locomotive types are also specified for each train $l$: (i) MostPreferred[$l$], the preferred classes of locomotives; (ii) LessPreferred[$l$], the acceptable (but not preferred) classes of locomotives; and (iii) Prohibited[$l$], the prohibited classes of locomotives. When assigning locomotives to a train, only locomotives from the classes listed in MostPreferred[$l$] and LessPreferred[$l$] can be assigned, and a penalty is incurred for using LessPreferred[$l$].


Hard Constraints 


Hard constraints are mandatory constraints which have 
to be satisfied for a locomotive assignment plan to be 
feasible. 

Power Requirement for Trains: Each train must be assigned active locomotives that together provide at least the required tonnage pulling capability and horsepower.

Locomotive Class to Train Type: Each train type 
(e.g., auto train, or merchandise train, or intermodal 
train) is targeted to use specific classes of locomotive 
types. 

Geographic: Each geographic region permits the travel of only specific locomotive types. For example, it may be specified that Atlanta can only use CW40, AC44, and AC60 locomotives.

Locomotive Balance Constraints: The number of incoming locomotives of each type into a station must be equal to the number of outgoing locomotives of that type at that station.

Active Axle Constraints: Each train must be as- 
signed locomotives with at most 24 active axles. Exceed- 
ing 24 powered axles may overstress the couplers and 
cause a train separation. 

Consist Size Constraints: Each train can be as- 
signed at most 12 locomotives including both the ac- 
tive and deadheading locomotives. This business policy 
reduces risk exposure if the train were to suffer a catas- 
trophic derailment. 

Fleet Size Constraints: The number of assigned lo- 
comotives of each type is at most the number of avail- 
able locomotives of that type. 

Repeatability of the Schedule: The routing of lo- 
comotives should be such that the number of locomo- 
tives of each type at each station at the end of the week 
should be equal to the number of locomotives of each 
type at each station at the beginning of the next week 
(so that the plan is repeatable every week). 
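The per-train hard constraints above can be checked mechanically for a proposed consist. What follows is a minimal sketch in the notation of the problem inputs (horsepower $h^k$, axles per type, $T_l$, $\beta_l$, $t_l^k$, Prohibited[$l$]), not the method of [1] or [15]. The class and function names are hypothetical. The network-wide constraints (balance, fleet size, geography, repeatability) are not checked. Applying Prohibited[$l$] only to active units is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ACTIVE_AXLE_LIMIT = 24    # Active Axle Constraints
CONSIST_SIZE_LIMIT = 12   # Consist Size Constraints (active + deadheading units)


@dataclass
class LocoType:
    horsepower: float     # h^k
    axles: int            # number of axles of type k


@dataclass
class Train:
    tonnage: float                          # T_l
    hp_per_ton: float                       # beta_l
    tonnage_capability: Dict[str, float]    # t_l^k for each locomotive type k
    prohibited: Set[str] = field(default_factory=set)  # Prohibited[l]

    @property
    def hp_required(self) -> float:
        return self.hp_per_ton * self.tonnage   # H_l = beta_l * T_l


def hard_constraint_violations(train: Train, loco_types: Dict[str, LocoType],
                               active: Dict[str, int],
                               deadhead: Dict[str, int]) -> List[str]:
    """Names of the per-train hard constraints violated by a proposed consist.

    `active` and `deadhead` map locomotive type names to unit counts.
    """
    violations = []
    hp = sum(loco_types[k].horsepower * n for k, n in active.items())
    tons = sum(train.tonnage_capability.get(k, 0.0) * n for k, n in active.items())
    axles = sum(loco_types[k].axles * n for k, n in active.items())
    size = sum(active.values()) + sum(deadhead.values())
    if hp < train.hp_required:
        violations.append("horsepower")
    if tons < train.tonnage:
        violations.append("tonnage")
    if axles > ACTIVE_AXLE_LIMIT:
        violations.append("active axles")
    if size > CONSIST_SIZE_LIMIT:
        violations.append("consist size")
    # Assumption: the prohibited classes restrict only the active locomotives.
    if any(n > 0 and k in train.prohibited for k, n in active.items()):
        violations.append("prohibited class")
    return violations
```

Only active units count toward horsepower, tonnage and axles. Deadheading units count toward the 12-unit size limit alone.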


Soft Constraints 


Soft constraints are flexible: they define characteristics of a solution that are preferred but not mandatory.

Consistency in Locomotive Assignment: A train 
should be assigned the same consist each day that it 
runs. Railroads believe that crews will perform more ef- 
ficiently and more safely if they operate the same equip- 
ment on a particular route and train. 

Consistency in Train Connections: If locomotives 
traveling on a train to its destination station connect 
to another train originating at that station, then they 
should preferably make the same connection on each 
day that both of the trains run. 

Avoid Consist Busting: Since consist busting in- 
volves the use of more resources, it is preferable to avoid 
consist busting. 

Avoid Single Locomotive Consists: If a single locomotive is assigned to a train and this locomotive breaks down, the train gets stranded; consists of two or more locomotives are therefore preferred.


Objective Function 


The objective of the LSP is to minimize the sum of: (i) the cost of ownership, maintenance, and fueling of locomotives; (ii) the cost of active and deadheaded locomotives; (iii) the cost of light traveling locomotives; (iv) the penalty for consist busting; (v) the penalty for inconsistency in locomotive assignment; and (vi) the penalty for using single locomotive consists.


Formulation 


The LSP is formulated as a multi-commodity flow 
problem with side constraints on a network called the 
weekly space-time network. Each locomotive type de- 
fines a commodity in the network. 


Space-Time Network 


The weekly space-time network is denoted as $G' = (N', A')$, where $N'$ denotes the node set and $A'$ denotes the arc set. Figure 1 displays a part of the weekly space-time network at one location. The network is constructed as follows:

Nodes in the Weekly Space-Time Network: The network contains a train arc $(i_l, j_l)$ for each train $l$. The tail node $i_l$ of the arc denotes the departure event of train $l$ at dep-station($l$) and is called a departure node. The head node $j_l$ denotes the arrival event of train $l$ at arr-station($l$) and is called an arrival node. For each arrival event, an arrival-ground node is created, and for each departure event, a departure-ground node is created. Each node in the network has two attributes: time and place. time($i$) denotes the time attribute of node $i$ in the weekly space-time network. The sets of departure, arrival, and ground nodes are denoted by DepNodes, ArrNodes, and GrNodes, respectively. Let AllNodes = DepNodes ∪ ArrNodes ∪ GrNodes.

Arcs in the Weekly Space-Time Network: The network contains four kinds of arcs. The first is the set of train arcs, denoted by TrArcs, which contains an arc for every train. Each arrival node is connected to the associated arrival-ground node by a directed arc called the arrival-ground connection arc. Each departure-ground node is connected to the associated departure node by a directed arc called the ground-departure connection arc. All the ground nodes at each station are sorted in chronological order of their time attributes, and each ground node is connected to the next ground node by a directed arc called a ground arc. The ground arcs allow inbound locomotives to stay in an inventory pool as they wait to be connected to outbound trains. The last ground node of the week at a station is connected to the first ground node of the week at that station by a ground arc. This arc models the ending inventory of locomotives for a week, which becomes the starting inventory for the following week. The possibility of an inbound train sending its entire consist to an outbound train is modeled by creating train-train connection arcs from train arrival nodes to train departure nodes whenever such a connection can feasibly be made. Light travel is modeled by light arcs (LiArcs). Therefore, the four kinds of arcs are: train arcs (TrArcs); connection arcs (CoArcs), which comprise the arrival-ground, ground-departure, and train-train connection arcs; ground arcs (GrArcs); and light arcs (LiArcs). Let AllArcs = TrArcs ∪ CoArcs ∪ GrArcs ∪ LiArcs.

Railroad Locomotive Scheduling, Figure 1
A part of the weekly space-time network

The LSP can be formulated as a flow of different types of locomotives in the weekly space-time network. Locomotives flowing on train arcs are either active or deadheading, and those flowing on connection and ground arcs are idling (that is, waiting between two consecutive assignments). The following additional notation for the weekly space-time network is used in the MIP formulations: (i) $I[i]$: the set of incoming arcs into node $i \in$ AllNodes; (ii) $O[i]$: the set of arcs emanating from node $i \in$ AllNodes; (iii) $d_l^k$: defined for every arc $l \in$ AllArcs (for a train arc $l$, $d_l^k$ denotes the cost of deadheading a locomotive of type $k$ on train arc $l$, and for every other arc it denotes the cost of a non-active locomotive of type $k$ traveling on arc $l$); (v) CB: the set of all connection arcs from arrival nodes to ground nodes, that is, CB $= \{(i,j) \in \text{AllArcs} : i \in \text{ArrNodes},\ j \in \text{GrNodes}\}$; (vi) CheckTime: a time instant of the week when no event takes place (that is, no train arrives at or departs from any station); and (vii) $S$: the set of arcs that cross CheckTime, that is, $S = \{(i,j) \in \text{AllArcs} : \text{time}(i) < \text{CheckTime} < \text{time}(j)\}$.


Problem Size and Computational Issues 


The integer programming formulation of the LSP contains around 200,000 variables and 100,000 constraints. It cannot be solved to optimality or near-optimality using commercial state-of-the-art software, and even its linear programming relaxation cannot be solved in a reasonable amount of time. Additionally, this formulation cannot capture the consistency constraints effectively. The main contribution made in [1] was a two-stage approach to solve this problem. In the first stage, a simplified problem, the daily locomotive scheduling problem, is solved. In the second stage, the daily locomotive schedule is modified to obtain a feasible weekly locomotive schedule.

This approximate two-stage approach was motivated by the observation that, in a typical problem, more than 90% of the train arcs in the space-time network correspond to trains that run 5, 6, or 7 days a week. Based on this observation, the daily locomotive scheduling model, a simplification of the weekly locomotive scheduling model, is created by assuming that (i) all trains that run $p$ days or more per week run every day of the week, and (ii) all trains that run fewer than $p$ days do not run at all. (Note: this assumption results in an approximation, in the sense that locomotives are provided to some trains that do not exist, and locomotives are not provided to some trains that do exist.)

To transform the solution of the daily locomotive scheduling problem into a feasible solution to the weekly scheduling problem, locomotives are taken from the trains that exist in the daily problem but not in the weekly problem (Type 1 trains). They are then assigned to the trains that exist in the weekly problem but not in the daily problem (Type 2 trains). This may require additional locomotives to meet the constraints. The solution of the daily problem can be translated into a solution of the weekly problem more effectively if the number of Type 1 trains is less than the number of Type 2 trains, but still as close to it as possible.
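The choice of the threshold $p$ can be sketched by counting Type 1 and Type 2 train-days for each candidate value. This is a minimal illustration under stated assumptions, not the procedure of [1]. Each train-day counts as a separate train, as in the Train Data definition. The function names are hypothetical. The selection rule follows the criterion stated above, with ties between equal counts treated as acceptable and, for equal gaps, the smaller $p$ chosen; both choices are assumptions.

```python
from typing import Dict, Optional, Tuple


def type_counts(days_per_week: Dict[str, int], p: int) -> Tuple[int, int]:
    """(Type 1, Type 2) train-day counts for the daily approximation with threshold p.

    A train running d >= p days is assumed to run all 7 days (7 - d phantom
    train-days, Type 1); a train running d < p days is dropped (d missing
    train-days, Type 2).
    """
    type1 = sum(7 - d for d in days_per_week.values() if d >= p)
    type2 = sum(d for d in days_per_week.values() if d < p)
    return type1, type2


def choose_threshold(days_per_week: Dict[str, int]) -> Optional[int]:
    """p in 1..7 with Type 1 <= Type 2 and the smallest gap (ties go to the smaller p)."""
    best: Optional[Tuple[int, int]] = None  # (p, gap)
    for p in range(1, 8):
        t1, t2 = type_counts(days_per_week, p)
        if t1 <= t2 and (best is None or t2 - t1 < best[1]):
            best = (p, t2 - t1)
    return None if best is None else best[0]
```

For five hypothetical trains running 7, 6, 5, 3, and 1 days a week, $p = 4$ yields 3 Type 1 and 4 Type 2 train-days. This is the closest admissible pair, so `choose_threshold` returns 4.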

Another contribution made in [1] is the determination of good train-train connections and good light arcs. Railroads often specify some candidate train-train connections and candidate light arcs, out of which a certain number are fixed in the final solution. In the basic formulation, each candidate train connection or light arc has an associated fixed charge variable, and these fixed charge variables make the MIP very hard to solve. This issue is handled as follows. First, the space-time network containing all the candidate train-train connection arcs and candidate light arcs is constructed. The linear programming relaxation of the LSP on this network is then solved. This relaxation has no fixed charge variables and places large costs on all train-to-ground and ground-to-train arc flows to discourage flow on such arcs (or, equivalently, to discourage consist busting). Let $g(l)$ denote the total flow of locomotives (of all types) on any arc $l$ in the daily space-time network. The candidate train-train connection arc $h$ with the largest value of $g(h)$ is selected; this arc indicates a promising potential train-train connection. Connection arc $h$ is made the unique connection arc for the two corresponding trains, and the linear programming relaxation is re-solved. If this linear programming relaxation is infeasible, or if the cost of the new solution increases by more than $\alpha$, the train-train connection is not made; otherwise, it is fixed as a good connection. This iterative procedure is repeated until either the desired number of train-train connections, specified by a parameter $\gamma$, is reached, or the set of candidate train-train connections becomes empty. A similar greedy iterative approach determines the set of good light moves. Figure 2 gives an overview of the overall approach in [1].

Railroad Locomotive Scheduling, Figure 2
Overview of the multi-stage locomotive scheduling algorithm (One-Day Scheduling Problem: transform the problem to the one-day scheduling problem and solve it; Seven-Day Scheduling Problem: reoptimize the one-day locomotive schedule to determine the seven-day schedule)
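The greedy connection-selection loop can be sketched as follows. This is an illustration, not the implementation of [1]. The LP relaxation is abstracted as a hypothetical callable `solve_lp`, which receives the set of fixed connections. It returns the objective value together with the flows $g(h)$ on the candidate connection arcs, or `None` if the relaxation is infeasible. The parameters `alpha` and `gamma` play the roles of the allowed cost increase and the target number of connections. Removing the other candidates that share a train with an accepted arc is our reading of "unique connection arc for the two corresponding trains".

```python
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Connection = Tuple[Hashable, Hashable]   # (inbound train, outbound train)
# Hypothetical LP oracle: fixed connections -> (objective, flow g(h) per candidate arc),
# or None if the LP relaxation is infeasible.
LPOracle = Callable[[Set[Connection]], Optional[Tuple[float, Dict[Connection, float]]]]


def select_connections(candidates: List[Connection], solve_lp: LPOracle,
                       alpha: float, gamma: int) -> Set[Connection]:
    """Greedily fix train-train connections with the largest LP flow."""
    fixed: Set[Connection] = set()
    remaining = list(candidates)
    start = solve_lp(fixed)
    if start is None:
        raise ValueError("LP relaxation with all candidate arcs is infeasible")
    cost, flow = start
    while remaining and len(fixed) < gamma:
        # Candidate h with the largest total flow g(h); ties go to the earliest listed.
        h = max(remaining, key=lambda arc: flow.get(arc, 0.0))
        remaining.remove(h)
        trial = solve_lp(fixed | {h})
        if trial is None or trial[0] - cost > alpha:
            continue  # infeasible or too costly: the connection is not made
        fixed.add(h)
        cost, flow = trial
        # h becomes the unique connection for its two trains.
        remaining = [a for a in remaining if a[0] != h[0] and a[1] != h[1]]
    return fixed
```

The real oracle must also enforce the uniqueness of each fixed connection inside the LP. The sketch only mirrors that rule in its candidate list.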


Consist Flow Formulation for the LSP 


Consist busting is such an anathema to real-world railroad managers that most managers stipulate that a high-quality locomotive plan must require no consist busting. Consist busting affects crew requirements, station fluidity, locomotive productivity, and mechanical maintenance processes. It consumes between two and six additional hours per locomotive within the station, asset time that could otherwise be used productively to pull trains on the mainline. Upon reassembly, each consist must also undergo extensive operational testing. In addition, consist busting often results in outbound trains getting their locomotives from several inbound trains. If any of these inbound trains is delayed, the outbound train is also delayed, which can propagate further delays down the line. Consequently, railroad managers seek to streamline and simplify processes in order to eliminate fragility in the operating plan. In reality, consists will still be busted tactically during real-time operations to compensate for unplanned events.

To minimize consist busting, [15] extended the locomotive flow formulation described in [1]. Most features of the model are kept identical, but consists are routed over the network instead of individual locomotives. In this formulation, referred to as the consist flow formulation, each consist type (that is, a group of locomotives) is defined to be a commodity that flows on the network. Thus, the consist flow formulation differs from the locomotive flow formulation in that locomotive types are replaced by consist types. Observe that each feasible solution of the consist flow formulation has a corresponding feasible solution to the locomotive flow formulation with the same cost, but the converse is not true. The reader might therefore be led to believe that the consist formulation is quite restrictive and may produce a significantly inferior optimal solution compared to the locomotive flow formulation. However, computational results revealed that if the number and types of consists are judiciously chosen, both formulations produce solutions of comparable quality.

The consist flow formulation is a multi-commodity integer program. The network is constructed in a similar way to that described in Sect. “Space-Time Network”.


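Additional Notation


$C$: The set of consist types available for assignment; $c \in C$ represents a particular consist type.

$e_l^c$: Cost of assigning an active consist of type $c \in C$ to train $l$.

$d_l^c$: Defined for every arc $l \in$ AllArcs. For a train arc $l \in$ TrArcs, $d_l^c$ captures the cost of deadheading a consist of type $c \in C$ on arc $l$. For an arc $l \in$ CoArcs ∪ GrArcs, $d_l^c$ captures the cost of idling for a consist of type $c \in C$ on arc $l$.

$a^{ck}$: Number of locomotives of type $k \in K$ in consist type $c \in C$.

$I[i]$: Set of arcs entering node $i$.

$O[i]$: Set of arcs leaving node $i$.

$S$: Set of overnight arcs, that is, arcs that cross the Sunday midnight timeline. (This time is chosen as the time for counting the number of locomotives used in the solution.)


Decision Variables


$x_l^c$: Binary variable equal to 1 if an active consist of type $c \in C$ is assigned to train arc $l \in$ TrArcs, and 0 otherwise.

$y_l^c$: Integer variable representing the number of non-active consists (deadheading, light-traveling, or idling) of type $c \in C$ on arc $l \in$ AllArcs.

$z_l$: Binary variable that takes value 1 if at least one consist flows on arc $l \in$ LiArcs, and 0 otherwise.

$s^k$: Integer variable indicating the number of unused locomotives of type $k \in K$.


Objective Function


$$\min z = \sum_{l \in \text{TrArcs}} \sum_{c \in C} e_l^c x_l^c + \sum_{l \in \text{AllArcs}} \sum_{c \in C} d_l^c y_l^c + \sum_{l \in \text{LiArcs}} F_l z_l - \sum_{k \in K} S^k s^k \tag{1}$$

where $F_l$ is the fixed charge for using light arc $l$, and $S^k$ is the saving credited for each unused locomotive of type $k$.


Constraints


$$\sum_{c \in C} x_l^c = 1 \quad \text{for all } l \in \text{TrArcs}, \tag{2}$$

$$\sum_{c \in C} \sum_{k \in K} a^{ck} \left(x_l^c + y_l^c\right) \le 12 \quad \text{for all } l \in \text{TrArcs}, \tag{3}$$

$$\sum_{l \in I[i]} \left(x_l^c + y_l^c\right) = \sum_{l \in O[i]} \left(x_l^c + y_l^c\right) \quad \text{for all } i \in \text{AllNodes},\ c \in C, \tag{4}$$

$$\sum_{k \in K} \sum_{c \in C} a^{ck} y_l^c \le 12 z_l \quad \text{for all } l \in \text{LiArcs}, \tag{5}$$

$$\sum_{l \in S} \sum_{c \in C} a^{ck} \left(x_l^c + y_l^c\right) + s^k = B^k \quad \text{for all } k \in K, \tag{6}$$

$$x_l^c \in \{0,1\} \quad \text{for all } l \in \text{TrArcs},\ c \in C, \tag{7}$$

$$y_l^c \ge 0 \text{ and integer} \quad \text{for all } l \in \text{AllArcs},\ c \in C, \tag{8}$$

$$z_l \in \{0,1\} \quad \text{for all } l \in \text{LiArcs}, \tag{9}$$

$$s^k \ge 0 \quad \text{for all } k \in K. \tag{10}$$

In (4) and (6), $x_l^c$ is taken to be 0 for arcs $l \notin$ TrArcs.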

Constraint (2) ensures that every train $l$ is assigned exactly one active consist. Constraint (3) ensures that the locomotive flow upper bound on each train arc (at most 12 locomotives, active and deadheading) is satisfied. Constraint (4) ensures that flow is balanced at every node for every consist type. Constraint (5) ensures that the locomotive flow upper bound on each light arc is satisfied, and that a light arc carries flow only if it is opened ($z_l = 1$). Constraint (6) ensures that the number of locomotives used for each fleet type is no more than the fleet size.
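Every consist type is a fixed combination of locomotives, so a consist flow solution maps back to locomotive counts through $a^{ck}$. This mapping is why each consist flow solution has a corresponding locomotive flow solution. The sketch below uses it to check constraints (2), (3), and (6) for a candidate solution. It is a minimal illustration: the function names and the dictionary encoding of $x$, $y$, and $a^{ck}$ are hypothetical, and the limit of 12 locomotives is taken from constraint (3).

```python
from typing import Dict, List, Tuple

Composition = Dict[str, Dict[str, int]]   # consist type c -> {locomotive type k: a^{ck}}
Flow = Dict[Tuple[str, str], int]         # (arc l, consist type c) -> number of consists


def locomotives_on_arc(arc: str, x: Flow, y: Flow, a: Composition) -> int:
    """sum_c sum_k a^{ck} (x_l^c + y_l^c): total locomotives on arc l."""
    return sum(n * (x.get((arc, c), 0) + y.get((arc, c), 0))
               for c, comp in a.items() for n in comp.values())


def unused_locomotives(S: List[str], x: Flow, y: Flow, a: Composition,
                       fleet: Dict[str, int]) -> Dict[str, int]:
    """s^k implied by constraint (6): B^k minus the type-k units on arcs in S."""
    return {k: b - sum(comp.get(k, 0) * (x.get((l, c), 0) + y.get((l, c), 0))
                       for l in S for c, comp in a.items())
            for k, b in fleet.items()}


def violated_constraints(train_arcs: List[str], S: List[str], x: Flow, y: Flow,
                         a: Composition, fleet: Dict[str, int]) -> List[int]:
    """Numbers of the constraints among (2), (3) and (6) that the solution violates."""
    violated = []
    if any(sum(x.get((l, c), 0) for c in a) != 1 for l in train_arcs):
        violated.append(2)   # not exactly one active consist
    if any(locomotives_on_arc(l, x, y, a) > 12 for l in train_arcs):
        violated.append(3)   # more than 12 locomotives on a train arc
    if any(s < 0 for s in unused_locomotives(S, x, y, a, fleet).values()):
        violated.append(6)   # fleet size exceeded
    return violated
```

A negative $s^k$ signals that the solution uses more type-$k$ locomotives than the fleet holds. This is exactly what the equality in (6) together with $s^k \ge 0$ rules out.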

Note that this formulation does not need to state explicitly that each train receives the required tonnage and horsepower and stays within the 24-active-axle limit; these constraints are handled implicitly. The active axle constraints are handled by not creating consists that have more than 24 active axles. The tonnage and horsepower constraints are handled as follows: if assigning consist $c \in C$ as the active consist to train $l \in$ TrArcs violates the tonnage or horsepower constraints, the corresponding variable is set to zero ($x_l^c = 0$), which disallows the assignment of consist $c$ to train $l$. The consist flow formulation has significantly fewer side constraints than the locomotive flow formulation, resulting in faster solution times. Another speed-up comes from the fact that each active consist assignment variable $x_l^c$ is binary. In the locomotive flow formulation, by contrast, the corresponding variable is a general integer variable that can take the values 0, 1, 2, 3, and so on. The MIP optimization engine, typically a branch and bound algorithm, is likely to explore fewer branches for a binary integer program than for a general integer program, since there are fewer options to pursue. In some instances, the locomotive flow formulation could not find a feasible integral solution in 10 hours, whereas the consist flow formulation found an optimal solution within a few minutes of computational time.
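This implicit handling amounts to a preprocessing step. A minimal sketch follows, assuming consist types are enumerated as multisets of locomotive types. In [15] the consist types are chosen judiciously rather than necessarily generated this way. `enumerate_consists` keeps only consists with at most 24 active axles. `allowed_pairs` lists the (consist, train) pairs whose $x_l^c$ may be nonzero; every other $x_l^c$ is fixed to 0. All names are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Set, Tuple

MAX_ACTIVE_AXLES = 24
Consist = Tuple[str, ...]   # sorted multiset of locomotive type names


def enumerate_consists(axles: Dict[str, int], max_units: int) -> List[Consist]:
    """All consists of 1..max_units locomotives whose active axles total at most 24."""
    types = sorted(axles)
    return [combo
            for size in range(1, max_units + 1)
            for combo in combinations_with_replacement(types, size)
            if sum(axles[k] for k in combo) <= MAX_ACTIVE_AXLES]


def allowed_pairs(consists: List[Consist],
                  trains: Dict[str, Tuple[float, float]],
                  hp: Dict[str, float],
                  tonnage_cap: Dict[Tuple[str, str], float]) -> Set[Tuple[Consist, str]]:
    """(consist, train) pairs meeting tonnage and horsepower; x_l^c = 0 for all others.

    trains maps train l to (T_l, H_l); tonnage_cap maps (l, k) to t_l^k.
    """
    allowed = set()
    for l, (T_l, H_l) in trains.items():
        for c in consists:
            if (sum(hp[k] for k in c) >= H_l
                    and sum(tonnage_cap[(l, k)] for k in c) >= T_l):
                allowed.add((c, l))
    return allowed
```

Rules such as keeping AC-powered and DC-powered locomotives apart would fit naturally as an extra filter inside `enumerate_consists`.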

Railroads often impose complex rules on which locomotive types may be combined into consists. Some locomotives do not work well together, and some railroads segregate AC-powered and DC-powered locomotives. Such requirements are often very hard or impossible to honor in the locomotive flow formulation. In the consist flow formulation they are straightforward, since they only restrict which consist types are created. Further, in the locomotive flow formulation, an outbound train often obtains its planned consist from locomotives coming from multiple trains, and if any of these inbound trains is delayed, the outbound train is delayed as well. In the consist flow formulation, by contrast, every outbound train derives its active consist from only one inbound train (although it may derive deadhead consists from other trains), which reduces the impact of train delays. In summary, the benefits of using the consist formulation are (1) greatly improved solution speed and robustness, (2) consist busting reduced to zero, and (3) constraints that are more easily incorporated, resulting in more practical solutions.

Computational tests show that the optimal objective function value of the consist flow formulation may be as much as 5% higher than that of the locomotive flow formulation. However, the solution is far superior in terms of consistency, simplicity, and robustness. It may therefore be easier to comply with, and may require fewer locomotives overall in practice (when train delays are taken into account, for example).


Methods and Applications 


In [15], the authors describe methods that use the consist-based formulation to incorporate several practical requirements and generate an implementable plan. These include incremental locomotive planning, satisfying cab-signal requirements, incorporating foreign power requirements, and accounting for shop power requirements. The interested reader may refer to [15] for further details.

This section presents the results of two case studies that illustrate how the model can assist decision making. The aspects of the problem studied are (1) the effect of varying the minimum connection time on solution cost, and (2) the effect of varying transport volumes on key transport performance characteristics.


Effect of Varying Minimum Connection Time on Solution Cost


Freight trains often do not run on time and frequently arrive later than their planned arrival time, which makes it difficult for locomotive dispatchers to adhere to the locomotive plan. One commonly recommended way to improve plan compliance is to increase the train-train minimum connection times. Although a longer minimum connection time may improve plan adherence, it also increases locomotive costs, because more locomotives are held in inventory at terminals. This study quantifies the impact of increasing the minimum connection times.

The following conclusions can be drawn from the study (see Fig. 3):

• The minimum connection time can have a significant impact on the solution cost.

• The solution cost increases linearly with the minimum connection time, at a rate of around $200,000 per hour.
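As a back-of-the-envelope illustration of the reported rate: the 1.5-hour change below is a hypothetical input, and the linear trend can only be trusted over the range of connection times covered by Fig. 3.

$$\Delta\text{cost} \approx \$200{,}000/\text{h} \times \Delta t, \qquad \Delta t = 1.5\ \text{h} \;\Rightarrow\; \Delta\text{cost} \approx 1.5 \times \$200{,}000 = \$300{,}000.$$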

Appropriate connection times can be chosen depending on the lateness of trains and on how willing railroad planners are to improve locomotive plan compliance. As this study shows, railroads can use the model to estimate the benefit of reducing locomotive connection times at terminals. Such a reduction may require investing resources to improve the efficiency of terminal activities such as fueling and servicing locomotives, and to reduce consist-busting overheads.

Railroad Locomotive Scheduling, Figure 3
Solution cost vs. minimum connection time


Effect of Varying Transport Volume on Key Performance Characteristics


This study measures the impact of varying transport volumes (or train tonnages) on four key transport characteristics: the number of locomotives used, the solution cost, the mean pulling power of a consist, and the mean miles traveled per consist. The results of these tests are presented in Table 1 and Fig. 4.

Railroad Locomotive Scheduling, Table 1
Effect of varying transport volumes

% increase in tonnage | Mean tonnage of trains | Locos used | Solution cost ($) | Mean pulling power/consist | Mean miles/consist
-20 | 6,183 | 1,026 | 7,502,802 | 10,099 | 405.05
-15 | | | 7,588,421 | 10,373 | 405.04
-10 | | | 7,726,579 | 10,464 | 404.17
-5 | 7,342 | 1,040 | 7,910,331 | 10,841 | 405.14
0 | 7,729 | 1,049 | 8,120,134 | 11,093 | 403.73
5 | | 1,079 | 8,437,334 | 11,457 | 403.86
10 | | | 8,812,467 | 11,828 | 405.27
15 | | | 9,296,423 | 12,483 | 403.98
20 | | | 9,635,790 | 12,715 | 405.12

Railroad Locomotive Scheduling, Figure 4
Impact of transport volumes on solution cost

The following conclusions can be drawn from this study:

• As the transport volume (mean tonnage) increases, the solution cost increases as a quadratic function, as shown in Fig. 4. The nature of the variation indicates that the rate of change of cost with respect to tonnage (the slope) is a linear increasing function of tonnage.

• The mean pulling power of each consist and the number of locomotives used increase as expected. Surprisingly, however, the average number of miles traveled by each consist remains roughly the same.

As demonstrated in this study, railroads can use the model as a generic approach for estimating the optimal number of locomotives needed, or the cost, as a function of the rail freight transport volume over the entire network.
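The two statements in the first conclusion are linked by a one-line derivation. Assume, for illustration, that cost is modeled as a quadratic in the mean tonnage $T$ with a positive leading coefficient. The coefficients $a$, $b$, $c$ are hypothetical, not values fitted to Table 1.

$$C(T) = aT^2 + bT + c,\quad a > 0 \;\Longrightarrow\; \frac{dC}{dT} = 2aT + b,$$

so the slope grows linearly in $T$, and it increases with tonnage because $2a > 0$.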
Conclusions


The LSP model creates a blueprint that assists managers in making tactical, day-to-day decisions regarding locomotive assignments. This article provides an overview of some of the recent work done in this area. The locomotive planning tools presented in this article have the potential to serve as a core component of a strategic fleet sizing model. Each year, railroads develop five-year outlooks and forecast the demand for freight transportation and the train schedules needed to handle that demand. Railroads must also project locomotive fleet supply, as units reach their economic limit of repair and are retired from the fleet. The gap between the future supply of locomotives and the future demand for locomotives must be closed with a combination of improved productivity and new purchases. This planning tool will go a long way toward helping railroad financial executives and strategic planners set asset goals and negotiate the purchase of new locomotives. The planning tool can also be used to perform “what-if” analyses, allowing railroad managers to make informed decisions after analyzing their global impacts.


References 


1. Ahuja RK, Liu J, Orlin JB, Sharma D, Shughart LA (2005) 
Solving real-life locomotive scheduling problems. Trans Sci 
39:503-517 

2. Ahuja RK, Magnanti TL, Orlin JB (1993) Network Flows: 
Theory, Algorithms, and Applications. Prentice Hall, Engle- 
wood Cliffs 

3. Booler JMP (1980) The solution of a railway locomotive 
scheduling problem. J Oper Res Soc 31:943-948 

4. Booler JMP (1995) A note on the use of Lagrangean Relax- 
ation in railway scheduling. J Oper Res Soc 46:123-127 

5. Chih KC, Hornung MA, Rothenberg MS, Kornhauser AL 
(1990) Implementation of a real time locomotive distribu- 
tion system. In: Murthy TKS, Rivier RE, List GF, Mikolaj J (eds) 
Computer Applications in Railway Planning and Manage- 
ment. Computational Mechanics Publications, Southamp- 
ton, pp 39-49 

6. Cordeau JF, Soumis F, Desrosiers J (2000) A Benders de- 
composition approach for the locomotive and car assign- 
ment problem. Trans Sci 34(2):133-149 

7. Cordeau JF, Toth P, Vigo D (1998) A survey of optimiza- 
tion models for train routing and scheduling. Trans Sci 
32:380-404 

8. Fischetti M, Toth P (1997) A package for locomotive 
scheduling. Technical Report DEIS-OR-97-16, University of 
Bologna 


9. Florian M, Bushell G, Ferland J, Guerin G, Nastansky L (1976) 
The engine scheduling problem in a railway network. IN- 
FOR 14:121-138 

10. Forbes MA, Holt JN, Watts AM (1991) Exact solution of lo- 
comotive scheduling problems. J Oper Res Soc 42:825-831 

11. Nemhauser GL, Wolsey LA (1988) Integer and Combinato- 
rial Optimization. Wiley, New York 

12. Nou A, Desrosiers J, Soumis F (1997) Weekly locomotive 
scheduling at Swedish State Railways. Technical Report 
TRITA/MAT-97-OS 12, Royal Institute of Technology, Stock- 
holm 

13. Ramani KV (1981) An information system for allocating 
coach stock on Indian Railways. Interfaces 11:44-51 

14. Smith S, Sheffi Y (1988) Locomotive scheduling under un- 
certain demand. Trans Res Records 1251:45-53 

15. Vaidyanathan B, Ahuja RK, Liu J, Shughart LA (2008) Real- 
life locomotive planning: new formulations and computa- 
tional results. Transp Res Part B 42:147-168 

16. Wright MB (1989) Applying stochastic algorithms to a lo- 
comotive scheduling problem. J Oper Res Soc 40:187-192 

17. Ziarati K, Soumis F, Desrosiers J, Gelinas S, Saintonge A 
(1997) Locomotive assignment with heterogeneous con- 
sists at CN North America. Eur J Oper Res 97:281-292 

18. Ziarati K, Soumis F, Desrosiers J, Solomon MM (1999) 
A branch-first, cut-second approach for locomotive assign- 
ment. Manage Sci 45:1156-1168 


Random Search Methods 


H. EDWIN ROMEIJN 
Department Industrial and Systems Engineering, 
University Florida, Gainesville, USA


MSC2000: 65K05, 90C30 


Article Outline 


Keywords 

Conceptual Methods 
(Pure) Random Search 
Pure Adaptive Search 
Adaptive Search 
Hesitant Adaptive Search 

Algorithms 
Simulated Annealing 
Pure Localization Search 

Conclusions 

See also 

References 




Keywords 


Global optimization; Nonsmooth optimization; 
Unbounded optimization; Interval methods 


A general global optimization problem is a problem of the form

$$\max f(x) \quad \text{s.t.} \quad x \in S,$$

where $f$ is a continuous function on $S$, $S \subset \mathbb{R}^d$ is a compact body, and the goal is to find a point in the set

$$S^* = \{x \in S : f(x) = f^*\},$$

where

$$f^* = \max_{x \in S} f(x).$$


It is well known that, without additional assumptions, this problem is inherently unsolvable in a finite number of steps. Therefore, the global optimization problem is usually considered solved if a point is found in

$$B_\varepsilon = \{x \in S : \|x - x^*\| < \varepsilon \text{ for some } x^* \in S^*\}$$

or in the level set

$$S_\varepsilon = \{x \in S : f(x) \ge f^* - \varepsilon\}$$

for some $\varepsilon > 0$ [17].

Stochastic methods are methods that contain some stochastic elements. This means that either the outcome of the method is a random variable, or the objective function itself is considered to be a realization of a stochastic process. Therefore, the possibility of an absolute guarantee of success is sacrificed. Instead, it is usually possible to prove that, as the effort increases to infinity, an element of $B_\varepsilon$ or $S_\varepsilon$ will be found with probability one. Surveys on stochastic methods for global optimization can be found in [10,29,38] and [33].

Random search methods are those stochastic methods that rely solely on the random sampling of a sequence of points in the feasible region of the problem, according to some prespecified probability distribution or sequence of probability distributions. These methods are applicable to, and enjoy an asymptotic performance guarantee for, a very wide class of problems. Therefore, during the last decade these methods have enjoyed increasing interest for their ability to handle problems whose mathematical structure is difficult (or undesirable, or even impossible) to analyze.


Conceptual Methods 


The methods discussed here are of a conceptual nature, in the sense that no efficient implementation of them currently exists. However, the theoretical results that can be obtained for these methods are interesting in themselves. Moreover, they have shown potential for inspiring (or theoretically supporting) practical algorithms for global optimization.


(Pure) Random Search 


The simplest stochastic method for global optimiza- 
tion is pure random search (PRS) ([12] and [5]). This 
method consists of generating a sequence of indepen- 
dent and identically distributed uniform points in the 
feasible region S, while keeping track of the best point 
that is found. In pseudocode, the method is given be- 
low. 

The sequence of points generated by this method converges to a global optimum with probability one. In particular, the probability that a point in $S_\varepsilon$ is reached within the first $N$ iterations is equal to

$$1 - \left(1 - \psi(S_\varepsilon)\right)^N,$$

where $\psi$ denotes the uniform distribution on $S$. In other words, this method offers a probabilistic asymptotic guarantee.
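Inverting this formula gives the number of PRS iterations needed to reach $S_\varepsilon$ with probability at least $1-\delta$: the smallest $N$ with $(1-\psi(S_\varepsilon))^N \le \delta$. The following is a minimal Python sketch; the function names are illustrative and are not taken from the literature.

```python
import math

def prs_success_probability(p, n):
    """P(at least one of n i.i.d. uniform points lands in S_eps), where p = psi(S_eps)."""
    return 1.0 - (1.0 - p) ** n

def prs_iterations_needed(p, delta):
    """Smallest n with 1 - (1 - p)**n >= 1 - delta, for 0 < p < 1 and 0 < delta < 1."""
    # (1 - p)**n <= delta  <=>  n >= ln(delta) / ln(1 - p); log1p keeps small p accurate
    return math.ceil(math.log(delta) / math.log1p(-p))
```

For example, if $\psi(S_\varepsilon) = 0.01$ and $\delta = 0.05$, then `prs_iterations_needed(0.01, 0.05)` returns 299.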


PROCEDURE pure random search()
  InputInstance();
  Set y = −∞;
  DO
    Generate a point x from the uniform distribution over S;
    Set y = max(y, f(x));
  OD;
  RETURN(y);
END pure random search;


A pseudocode for PRS
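Below is a minimal runnable counterpart of the PRS pseudocode. It makes two assumptions. First, the feasible region is a box $S = [lo, hi]^d$, so uniform sampling is trivial. Second, the open-ended DO ... OD loop is replaced by a fixed iteration budget. The function and parameter names are hypothetical.

```python
import random

def pure_random_search(f, lo, hi, d, n_iter, seed=None):
    """Pure random search on the box S = [lo, hi]^d; returns (record value, record point)."""
    rng = random.Random(seed)
    y, best = float("-inf"), None
    for _ in range(n_iter):  # fixed budget instead of the open-ended DO ... OD loop
        x = [rng.uniform(lo, hi) for _ in range(d)]  # i.i.d. uniform point in S
        fx = f(x)
        if fx > y:  # keep track of the best point found so far
            y, best = fx, x
    return y, best
```

For example, `pure_random_search(lambda v: -sum(t * t for t in v), -1.0, 1.0, 2, 20000)` maximizes $-\|x\|^2$ over $[-1,1]^2$.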


In [37], F.J. Solis and R.J-B. Wets introduce a class of random search methods whose most important feature, as compared to pure random search, is that disimprovements are disallowed through a generalization of the acceptance-rejection approach. Moreover, sampling is not limited to the uniform distribution but may follow a (prespecified) sequence of (absolutely continuous) probability distributions on S. Asymptotic convergence with probability one can again be shown.

If the random search method is adaptive, i.e., if the distributions being sampled are allowed to depend on the values of previously found solutions, then convergence cannot be assured in general.


Pure Adaptive Search 


Pure adaptive search (PAS) differs from PRS, and re- 
sembles the random search methods of Solis and Wets, 
in that it forces improvement in each iteration. How- 
ever, the improving force is much stronger in that the 
algorithm is truly adaptive, without using any form of 
acceptance-rejection. In particular, an iteration point is 
generated from the uniform distribution on the subset 
of points that are improving with respect to the previ- 
ous iteration point. More formally, the method reads: 


PROCEDURE pure adaptive search()
  InputInstance();
  Set y = −∞;
  DO
    Generate a point x from the uniform distribution over {x ∈ S : f(x) > y};
    Set y = f(x);
  OD;
  RETURN(y);
END pure adaptive search;


A pseudocode for PAS


This method was introduced and analyzed in [27] for convex programming problems, and by Z.B. Zabinsky and R.L. Smith [42] for more general global optimization problems. For Lipschitz continuous problems with convex feasible regions, the expected number of iterations needed to achieve a solution with a given precision increases at most linearly in the dimension d of the problem. This result suggests that there is hope for an efficient random search method for global optimization. In fact, empirical linearity in dimension has been reported for several random search methods applied to quadratic functions ([34,35] and [37]). PAS has been extended to the case of a finite domain in [44], with analogous performance results.
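PAS is conceptual because sampling uniformly from a level set is generally hard. On a toy problem whose level sets are known in closed form, however, the method can be run exactly, and this makes the dimension dependence visible. The sketch below assumes the hypothetical test problem $f(x) = -\|x\|$ on the unit ball in $\mathbb{R}^d$. Its improving set $\{x \in S : f(x) > y\}$ is the ball of radius $-y$. For this particular problem the radius after $k$ iterations is a product of $k$ independent factors $U_i^{1/d}$, which gives an expected iteration count of exactly $1 + d\ln(1/\varepsilon)$ to reach $\|x\| \le \varepsilon$. This count is linear in $d$.

```python
import math
import random

def uniform_in_ball(d, radius, rng):
    """Uniform sample from the d-dimensional Euclidean ball of the given radius."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = radius * rng.random() ** (1.0 / d)  # radial part: P(R <= r) = (r / radius)**d
    return [r * v / norm for v in g]

def pas_iterations(d, eps, rng):
    """Exact PAS for f(x) = -||x|| on the unit ball; count iterations until f(x) >= -eps."""
    y = -math.inf
    n = 0
    while y < -eps:
        # Improving set {x in S : f(x) > y}: all of S at the start, else the ball of radius -y
        radius = 1.0 if y == -math.inf else -y
        x = uniform_in_ball(d, radius, rng)
        y = -math.sqrt(sum(v * v for v in x))  # new record value
        n += 1
    return n

# Empirical mean iteration count versus the exact value 1 + d*ln(1/eps)
rng = random.Random(0)
for d in (2, 4, 8):
    mean = sum(pas_iterations(d, 0.01, rng) for _ in range(2000)) / 2000
    print(d, round(mean, 1), round(1 + d * math.log(100), 1))
```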


Adaptive Search 


It is generally very difficult to generate a point uniformly distributed in a level set of the global optimization problem. Adaptive search (AS) avoids this issue, at the cost of having to generate points from distributions other than the uniform. In this method, a sequence of improving points is generated by sampling from a sequence of distributions on the feasible region S, while using an acceptance-rejection approach to attain improvement. The sequence of generating distributions should be chosen so that, as the method progresses, these distributions concentrate more and more around the global optimum of the problem. An example of a family of distributions having this property is the family of Boltzmann distributions. This family of distributions, parameterized by a positive parameter T, is absolutely continuous on S with densities

$$\pi_T(x) \propto e^{f(x)/T}.$$

Note that, as the parameter T approaches infinity, the distributions approach the uniform distribution on S. On the other hand, it can be shown that, as T approaches zero, the distributions converge to a distribution that concentrates on the set of points where the global optimum is attained. Using this family of distributions, the adaptive search method reads:


PROCEDURE adaptive search()
  InputInstance();
  Set y = −∞;
  Set T = ∞;
  DO
    Generate points x from the Boltzmann distribution with parameter T
      until a point is found with f(x) > y;
    Set y = f(x);
    Decrease T;
  OD;
  RETURN(y);
END adaptive search;


A pseudocode for AS

This method was introduced by H.E. Romeijn and Smith [30] with the objective of studying the behavior of simulated annealing (see below). They generalized the key result for pure adaptive search as follows. Suppose the value of the parameter T depends in a monotonically decreasing way on the current record value. Then, for a wide class of global optimization problems, the expected number of improving points needed to achieve a solution with a given precision increases at most linearly in the dimension d of the problem. The parameter T can also be used to limit the total number of iterations, including the nonrecord values generated in the acceptance-rejection phase. Ideally, T should be chosen so that, during the next iteration, the probability of obtaining an improving point is at least equal to some fixed value.
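On small examples, exact sampling from the Boltzmann distribution is possible by rejection from the uniform distribution, provided an upper bound $M \ge f^*$ is known. In practice such a bound is rarely available. The sketch below uses this device on an interval $[lo, hi]$. As one hypothetical choice of a temperature that decreases monotonically in the record value, it sets $T = c\,(M - y)$ after each improvement. Neither the bound `M` nor this cooling rule comes from the original method.

```python
import math
import random

def boltzmann_draw(f, lo, hi, T, M, rng):
    """Exact draw from the density proportional to exp(f(x)/T) on [lo, hi].

    Rejection sampling from the uniform distribution; assumes f(x) <= M on [lo, hi].
    T = math.inf yields the uniform distribution.
    """
    while True:
        x = rng.uniform(lo, hi)
        if rng.random() <= math.exp((f(x) - M) / T):
            return x

def adaptive_search(f, lo, hi, M, n_records, c=1.0, seed=None):
    """Adaptive search on [lo, hi]; returns the record value after n_records improvements."""
    rng = random.Random(seed)
    y, T = -math.inf, math.inf  # T = inf: the first point is uniform on S
    for _ in range(n_records):
        x = boltzmann_draw(f, lo, hi, T, M, rng)
        while f(x) <= y:  # acceptance-rejection: keep only improving points
            x = boltzmann_draw(f, lo, hi, T, M, rng)
        y = f(x)
        if y >= M:  # global maximum reached; avoid T = 0
            break
        T = c * (M - y)  # hypothetical rule: T decreases as the record value increases
    return y
```

For instance, `adaptive_search(lambda x: 1.0 - abs(x), -1.0, 1.0, 1.0, 5)` runs five improvements on a problem with $f^* = M = 1$. The cost of each rejection step grows as $T$ shrinks, which illustrates why exact AS is impractical in general.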


Hesitant Adaptive Search 


One way of implementing PAS would be to use an acceptance-rejection approach for generating points in the level sets of f (assuming that generating a uniformly distributed point in S itself is relatively easy). In terms of the total number of iterations, this approach would be equivalent to PAS with the following modification. At each iteration, either a new PAS iterate is generated (with some probability b) or the previous iteration point is repeated (with probability 1 − b), where b depends on the current record value. More precisely, b is the relative measure of the level set corresponding to the current record value. Hesitant adaptive search (HAS) generalizes this way of viewing PAS by relaxing the specific choice of b mentioned above. It was introduced by D.W. Bulger and G.R. Wood [13] with the objective of studying localization search algorithms (see below). An explicit expression can be derived for the expected number of iterations required to obtain a point with a given objective function value (or better), and even for the full distribution of this random variable (see also [40]).
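The role of b can be checked by simulation on a hypothetical toy problem, $f(x) = 1 - |x|$ on $S = [-1, 1]$. Here the relative measure of $\{x \in S : f(x) > y\}$ is $1 - y$. With $b(y) = 1 - y$, HAS reproduces acceptance-rejection PAS, i.e., PRS, and needs on average $1/\varepsilon$ iterations to reach $S_\varepsilon$. With $b \equiv 1$ it is PAS itself, and for this problem the expected count is $1 + \ln(1/\varepsilon)$. The sketch is illustrative only; the function name is an assumption.

```python
import math
import random

def has_iterations(b, eps, rng):
    """Hesitant adaptive search on the toy problem f(x) = 1 - |x|, S = [-1, 1].

    b(y) is the probability of taking a PAS step when the record value is y.
    Returns the number of iterations until the record value is at least 1 - eps.
    """
    y = -math.inf
    n = 0
    while y < 1.0 - eps:
        n += 1
        if y == -math.inf or rng.random() < b(y):
            # PAS step: uniform point in {x in S : f(x) > y}, so |x| ~ U(0, 1 - y)
            radius = 1.0 if y == -math.inf else 1.0 - y
            y = 1.0 - radius * rng.random()
        # otherwise the previous iteration point is repeated ("hesitation")
    return n

rng = random.Random(0)
runs = 2000
pas = sum(has_iterations(lambda y: 1.0, 0.01, rng) for _ in range(runs)) / runs
prs = sum(has_iterations(lambda y: 1.0 - y, 0.01, rng) for _ in range(runs)) / runs
```

With $\varepsilon = 0.01$, `pas` should be close to $1 + \ln 100 \approx 5.6$ and `prs` close to 100.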


Algorithms 
Simulated Annealing 


Simulated annealing (SA) is a random search method 
that avoids getting trapped in local maxima by accept- 
ing, in addition to transitions corresponding to an in- 
crease in function value, also transitions correspond- 


ing to a decrease in function value. The latter is done 
in a limited way by means of a probabilistic acceptance 
criterion. In the course of the maximization process, the 
probability of accepting deteriorations descends slowly 
towards zero. These ‘deteriorations’ make it possible to 
move away from local optima and explore the feasible 
region S in its entirety. 

SA originated from an analogy with the physical an- 
nealing process of finding low energy states of a solid 
in a heat bath [26]. M. Pincus [28] developed an algo- 
rithm based on this analogy for solving discretizations 
of continuous global optimization problems. Many ap- 
plications to date have been to discrete (combinatorial) 
optimization problems (see e. g. [1,23] and [2]). 

All SA algorithms for global optimization can be viewed as approximate versions of AS. The main problem with directly implementing AS is that, in general, it is extremely difficult to generate points exactly from the Boltzmann distribution on S. SA algorithms are approximate because they use a Markov chain sampling approach instead: a Markov chain is defined on S whose limiting distribution is the desired Boltzmann distribution. One way of achieving this is the following. First, create a Markov chain on S that has the uniform distribution as its limiting distribution. Examples of such Markov chains are the hit and run generator (see [8,9,36] and [32]) and the random ball walk (see [25]). This Markov chain can then be filtered as follows to change the limiting distribution to the Boltzmann distribution. If the current iteration point is x, and the current value of the temperature parameter is T, then the candidate point (say z) that is generated by the Markov chain is accepted with probability

$$\min\left\{1,\ e^{(f(z) - f(x))/T}\right\}$$

(the Metropolis criterion). Otherwise, we discard the candidate point and remain at the current iteration point.

C.J.P. Bélisle [7] showed that successive iterations of SA may experience deteriorations in objective function value, but that, under mild conditions, these effects are transient if the Markov chain reaches globally, i.e., if each feasible point can be reached from any other feasible point in one iteration. In particular, if the sequence of temperature parameters T decreases to zero in probability, SA will eventually be absorbed in arbitrarily small neighborhoods of the global maximum. The cooling schedule controlling the way the parameter T is decreased should be chosen in accordance with AS. In practice, this means that the temperature parameter should be proportional to (an estimate of) the value error corresponding to the current record value. If the temperature is held constant at 0 (which means that no deteriorations are ever accepted), we obtain the improving hit and run algorithm (see [43]). In [24], M. Locatelli derives convergence results for simulated annealing algorithms with nonglobally reaching Markov chains.


PROCEDURE simulated annealing()
  InputInstance();
  Set y = −∞;
  Set T = ∞;
  Choose x ∈ S;
  DO
    Generate a point z according to the Markov chain transition distribution;
    With probability min{1, exp((f(z) − f(x))/T)}, set x = z;
    If f(x) > y set y = f(x);
    Adjust T;
  OD;
  RETURN(y);
END simulated annealing;


A pseudocode for SA
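The SA pseudocode can be made concrete on a box $S = [lo, hi]^d$. The underlying Markov chain is the hit and run generator, and the Metropolis filter is the acceptance step. Several choices here are hypothetical and made only for the sketch: the geometric cooling $T \leftarrow \alpha T$ (floored at a small positive value), the parameters `T0` and `alpha`, and the fixed iteration budget. They are not the AS-based cooling schedule recommended above.

```python
import math
import random

def hit_and_run_step(x, lo, hi, rng):
    """One hit-and-run move in the box [lo, hi]^d (its limiting distribution is uniform)."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]  # random direction, uniform on the unit sphere
    lam_min, lam_max = -math.inf, math.inf
    for xi, ui in zip(x, u):  # chord {x + lam * u} intersected with the box
        if ui > 0:
            lam_min, lam_max = max(lam_min, (lo - xi) / ui), min(lam_max, (hi - xi) / ui)
        elif ui < 0:
            lam_min, lam_max = max(lam_min, (hi - xi) / ui), min(lam_max, (lo - xi) / ui)
    lam = rng.uniform(lam_min, lam_max)  # uniform point on the chord
    return [xi + lam * ui for xi, ui in zip(x, u)]

def simulated_annealing(f, lo, hi, d, n_iter, T0=1.0, alpha=0.999, seed=None):
    """Maximize f over [lo, hi]^d; returns (record value, record point)."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for _ in range(d)]  # choose x in S
    fx = f(x)
    y, best = fx, list(x)
    T = T0
    for _ in range(n_iter):
        z = hit_and_run_step(x, lo, hi, rng)
        fz = f(z)
        # Metropolis criterion: accept with probability min{1, exp((f(z) - f(x)) / T)}
        if fz >= fx or rng.random() < math.exp((fz - fx) / T):
            x, fx = z, fz
        if fx > y:  # update the record
            y, best = fx, list(x)
        T = max(alpha * T, 1e-12)  # hypothetical geometric cooling, kept positive
    return y, best
```

Because every acceptance decision depends only on $f$, the same code works for nonsmooth or black-box objectives.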

Examples of specific SA algorithms can be found in 
[3,11,15,16,21,31,39] and [32]. In [22], the first polyno- 
mial time implementation of an SA algorithm is pre- 
sented, albeit for the unimodal problem of minimizing 
a linear function over an up-monotone convex set in 
the positive orthant. The authors derive an SA algo- 
rithm using the rapidly mixing Markov chains devel- 
oped in [18]. 

Finally, a class of algorithms closely related to the SA algorithms discussed above consists of algorithms based on the Langevin stochastic differential equation

$$dx(t) = \nabla f(x(t))\,dt + \varepsilon(t)\,dw(t), \tag{1}$$

where $\nabla f$ is the gradient of $f$ and $w(t)$ is a standard $d$-dimensional Wiener process. If $\varepsilon(t)$ is constant, then the limiting distribution of the sequence of points thus generated is precisely the Boltzmann distribution at temperature $\varepsilon^2/2$. In other words, algorithms based on this result can be viewed as SA algorithms that use continuous-time instead of discrete-time Markov chains. See e.g. [19] and [14] for theoretical results, and [4] for a practical implementation.
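Algorithms based on (1) must discretize the equation. A common choice, assumed here and not prescribed by the text, is the Euler–Maruyama scheme $x_{k+1} = x_k + h\,\nabla f(x_k) + \varepsilon\sqrt{h}\,\xi_k$ with $\xi_k$ standard normal. For the hypothetical test function $f(x) = -x^2/2$, the Boltzmann distribution at temperature $\varepsilon^2/2$ is the normal distribution with variance $\varepsilon^2/2$, and the simulated path should approximately reproduce it. For this example, the step size $h$ biases the stationary variance to $\varepsilon^2/(2-h)$.

```python
import math
import random

def langevin_path(grad_f, x0, eps, h, n_steps, seed=None):
    """Euler-Maruyama path of dx = grad f(x) dt + eps dw (scalar case, constant eps)."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n_steps):
        x = x + h * grad_f(x) + eps * math.sqrt(h) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# f(x) = -x**2 / 2, so grad f(x) = -x; the target variance is eps**2 / 2 = 0.5
path = langevin_path(lambda x: -x, x0=0.0, eps=1.0, h=0.01, n_steps=200_000, seed=0)
tail = path[10_000:]  # discard a burn-in segment
mean = sum(tail) / len(tail)
var = sum((v - mean) ** 2 for v in tail) / len(tail)
print(round(var, 3))  # expected to be close to 0.5
```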


PROCEDURE pure localization search()
  InputInstance();
  Set y = −∞;
  Set L = S;
  DO
    Generate a point x uniformly in L;
    If f(x) > y set y = f(x);
    Shrink L (while remaining a superset of the current record level set);
  OD;
  RETURN(y);
END pure localization search;


A pseudocode for PLS


Pure Localization Search 


The aim of pure localization search (PLS) is to approxi- 
mate PAS. The approximation consists of the following. 
Instead of generating a point uniformly from a level set, 
a point is generated uniformly from a superset of the 
level set, called a localization. Note that PRS is an in- 
stance of PLS, where the superset of each level set is the 
entire feasible region S. 

This class of algorithms was introduced in [6] 
where, in addition, an example of a particular PLS al- 
gorithm for one-dimensional Lipschitz optimization is 
provided. 
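The following sketches one possible PLS for one-dimensional Lipschitz maximization on $[lo, hi]$, assuming a known Lipschitz constant `lip`. The localization is the set where the piecewise-linear upper bound $\min_i \{f(x_i) + \mathit{lip}\,|x - x_i|\}$ is at least the record value. Since this bound dominates $f$, the set always contains the current record level set. The construction is hypothetical and for illustration only. It is not necessarily the algorithm given in [6].

```python
import random

def localization(points, y, lip, lo, hi):
    """Intervals of [lo, hi] on which min_i (f_i + lip * |x - x_i|) >= y.

    points: list of (x_i, f_i) pairs with f_i <= y. For a lip-Lipschitz f the union
    of the returned intervals contains the record level set {x : f(x) >= y}.
    """
    pts = sorted(points)
    cand = [(lo, pts[0][0] - (y - pts[0][1]) / lip)]       # left of the first point
    for (a, fa), (b, fb) in zip(pts, pts[1:]):              # between neighbouring points
        cand.append((a + (y - fa) / lip, b - (y - fb) / lip))
    cand.append((pts[-1][0] + (y - pts[-1][1]) / lip, hi))  # right of the last point
    out = []
    for u, v in cand:
        u, v = max(u, lo), min(v, hi)
        if v > u:  # drop empty or degenerate intervals
            out.append((u, v))
    return out

def pure_localization_search(f, lo, hi, lip, n_iter, seed=None):
    """PLS for maximizing a lip-Lipschitz f on [lo, hi]; returns the record value."""
    rng = random.Random(seed)
    x = rng.uniform(lo, hi)  # the first localization is S itself
    points = [(x, f(x))]
    y = points[0][1]
    for _ in range(n_iter - 1):
        intervals = localization(points, y, lip, lo, hi)  # shrunken localization
        total = sum(v - u for u, v in intervals)
        if total <= 0.0:
            break  # localization has collapsed numerically onto the record
        r = rng.uniform(0.0, total)  # uniform point on the union of intervals
        for u, v in intervals:
            if r <= v - u:
                x = u + r
                break
            r -= v - u
        else:
            x = intervals[-1][1]  # guard against rounding in r
        fx = f(x)
        points.append((x, fx))
        y = max(y, fx)  # record update
    return y
```

Setting `lip` to infinity would remove all exclusions, so that the localization stays equal to $S$ and the method reduces to PRS, in line with the remark above.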


Conclusions 


The main advantage of random search methods for 
solving global optimization problems is that they are 
applicable to a very wide class of optimization prob- 
lems, and nevertheless provide a convergence guaran- 
tee, albeit an asymptotic one. Therefore, random search 
methods have been applied most successfully to prob- 
lems that are black-box and/or unstructured, so that 
approaches that make use of a particular mathematical 
structure cannot be used. Examples include industrial design problems and the design of composite laminates [20,41]. Moreover, these methods are usually very easy to implement. As such, they are also often used as a first approach to solving a problem, until insight into the structural properties of the problem warrants the development of new, or the application of existing, special-purpose optimization methods.
Introduction


The purpose of reactive scheduling is to adjust a pro- 
duction schedule upon the occurrence of unexpected 
or unforeseen events. The original schedule is usually 
obtained a priori and then reactive scheduling is per- 
formed upon the breakdown of a piece of equipment, 
when new customer orders are received, or when ex- 
isting orders are modified or deleted. Thus, in order to 
be effective, reactive scheduling systems must be able 
to generate updated production schedules relatively 
quickly. It is not desirable to do full-scale reschedul- 
ing for every unexpected event and usually heuristic ap- 
proaches are developed to achieve the desired sched- 
ule modifications. Recent reviews on scheduling ap- 
proaches that include reactive scheduling issues can be 
found in Floudas and Lin [3,4]. 

The proposed scheduling formulation uses the 
continuous-time scheduling formulation for short- 
term scheduling proposed by Floudas and cowork- 
ers [5,8,9] and the medium-term scheduling framework 
developed by Lin et al. [10] and Janak [6]. Full-scale 
rescheduling of each production schedule is avoided by 
fixing binary variables for a subset of tasks from the 
original production schedule. The subset of tasks to fix 
is determined using a detailed set of rules that reflect 
the production needs and can be modified for different 
production facilities. The fixing of tasks results in a re- 
duced computational effort required to solve the result- 
ing MILP problem and thus a suitable, updated produc- 
tion schedule can be determined in a shorter amount of 
CPU time. 




Reactive Scheduling of Batch Processes 


Problem Statement 


In the batch plant investigated, there are several different types of operations (or tasks), and the plant has many different types of units, over 80 of which are modeled explicitly. Also, there are hundreds of different products that can be produced through a variety
of processing recipes resulting in hundreds of different 
tasks. Each product is made using one of the process- 
ing recipes shown in Fig. 1 or a slight variation. The 
recipes are represented in the form of State-Task Net- 
work (STN), in which the state node is denoted by a cir- 
cle and the task node by a rectangle. 

The information on which units are suitable for 
each product is given. All the units are utilized in 
a batch mode with the exception of the type 5 and 6 
units, which operate in a continuous mode. The capac- 
ity limits of the type 1, 2, and 3 units vary from one 
product to another, while the capacity limits of the type 
4a, 4b, 5, and 6 units are the same for all suitable prod- 
ucts. The processing time or processing rate of each task 
in the suitable units is also specified. Additional infor- 
mation for the plant under investigation can be found 
in Janak et al. [6]. 


The time horizon considered for production 
scheduling is a few weeks or longer and customer or- 
ders are fixed throughout the time horizon with speci- 
fied amounts and due dates. There is no limitation on 
external raw materials and we apply the zero-wait stor- 
age condition or limited intermediate storage capacity 
for all materials based on actual plant data. There are 
two different types of products produced, category 1 
and 2. The solution of this medium-term scheduling 
problem, even for only a few days, results in a large- 
scale scheduling problem that can be very computa- 
tionally complex. New techniques need to be developed 
in order to both efficiently and accurately carry out re- 
active scheduling. 


Formulation 


The overall methodology for solving the reactive 
scheduling problem in a medium-term production 
scheduling framework can be summarized in the fol- 
lowing steps. The flowchart for the overall reactive 
scheduling framework is shown in Fig. 2. 

Reactive Scheduling of Batch Processes, Figure 1
State-task network (STN) representation of plant

Reactive Scheduling of Batch Processes, Figure 2
Flowchart of the reactive scheduling framework


Step 1. Obtain a nominal production schedule for the full scheduling horizon using the medium-term scheduling framework proposed in Janak et al. [6] by decomposing the full scheduling horizon into smaller short-term scheduling subproblems in successive time horizons.
Step 2. Upon the realization of an unexpected event, 
perform the following steps. 
a. Fix all the tasks in the short-term scheduling sub- 
horizons which have already taken place using the 
nominal schedule. 


b. For the subhorizon that is currently in produc- 
tion, there are two cases of unexpected events which 
are considered: breakdown of a unit or the addi- 
tion/modification of orders. If a unit breaks down, 
then fix the appropriate tasks for the current sub- 
horizon using the rules outlined in Sect. “Reactive 
Scheduling: Unit Shutdowns”. For the case when or- 
ders are added or modified, then fix the appropriate 
tasks for the current subhorizon using the rules out- 
lined in Sect. “Reactive Scheduling: New or Modified 
Orders”. Note that a combination of both can also be 
performed. 

c. Once the tasks have been fixed, then the overall 
short-term scheduling model presented in Janak 
et al. [6] with some modifications can be used to 
perform rescheduling. First, the formulation is en- 
hanced so that the results of the level 1 decompo- 
sition model are fixed to match the results from 
the nominal schedules. Thus, the days in each sub- 
horizon are fixed so that the horizons match the 
nominal schedules. Next, modifications need to 
be made to reflect changes from the unexpected 
events. Each of the two cases of unexpected events 
that can occur in a particular subhorizon requires 
a different modification to the model. Complete information on these modifications can be found in Sects. “Reactive Scheduling: Unit Shutdowns” and “Reactive Scheduling: New or Modified Orders”.

Step 3. Once the items in Step 2 have been completed, the horizon with the current reactive events is ready to be rescheduled. This is done in the same manner as for the nominal schedule; however, a smaller time limit or integer solution limit may be employed, if desired. Next, the remaining overall time horizon is rescheduled from scratch. This is necessary because changes in the reactive time horizon can cause changes in the inventory, demand satisfaction, and unit availability, which can make the nominal solution for the remaining time horizons infeasible or severely suboptimal.


Reactive Scheduling: Unit Shutdowns 


When a piece of equipment breaks down or is shut down during a subhorizon, the following tasks need to be performed for Step 2b of the overall framework. For the current subhorizon, tasks are fixed in a particular unit at a specific event point by setting the associated binary variable to one. For the case of unit shutdown, we also fix the starting time and batch size of all tasks that are fixed. Tasks in the unaffected units are fixed if they start processing before the unit shutdown. Tasks in the unit that is shut down are fixed only if they start and finish processing before the shutdown time. For Step 2c of the overall framework, some additional information needs to be included in the model before rescheduling is performed. For the case of unit shutdown, the starting and finishing times of all tasks in the unit that shuts down need to be appropriately bounded so that the unit becomes unavailable during the shutdown period. Also, for the subsequent scheduling horizon, the starting time of the unit that was shut down needs to be set to either the end of its last task or the time the unit became available again, whichever is later. This ensures that no task will be scheduled during the blocked-off time if the shutdown overlaps into the next horizon.
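The fixing rule for unit shutdowns amounts to a simple filter over the nominal schedule. In the sketch below, the `Task` record and its field names are assumptions. So is the use of the latest shutdown start as the reference time for unaffected units, which follows the choice described for the case study later in this article. The binary variable, start time, and batch size of each returned task would then be fixed to their nominal values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    """One task of the nominal schedule (hypothetical record layout)."""
    name: str
    unit: str
    event_point: int
    start: float   # nominal starting time (h)
    finish: float  # nominal finishing time (h)
    batch: float   # nominal batch size

def tasks_to_fix(nominal, shutdown_start):
    """Select nominal tasks to fix when units shut down.

    shutdown_start maps each shut-down unit to the start of its shutdown (h);
    it must contain at least one unit.
    """
    latest = max(shutdown_start.values())  # reference time for unaffected units
    fixed = []
    for t in nominal:
        if t.unit in shutdown_start:
            # shut-down unit: the task must start and finish before the shutdown
            if t.start < shutdown_start[t.unit] and t.finish <= shutdown_start[t.unit]:
                fixed.append(t)
        elif t.start < latest:
            # unaffected unit: the task must start before the (latest) shutdown
            fixed.append(t)
    return fixed
```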

The mathematical formulation for reactive schedul- 
ing with unit shutdowns uses the same sets of con- 
straints developed for short-term scheduling in Janak 
et al. [6] including constraints (22)-(52), (56)-(58), and 
the overall objective function given in constraint (66). 
In addition, the above additional requirements can be 
described mathematically as follows: 


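$$wv(i,j,n) = 1, \quad \forall i \in I^{fix},\ j \in J_i,\ n \in N \tag{1}$$

$$T^s(i,j,n) = T^s(i,j,n)^{nom}, \quad \forall i \in I^{fix},\ j \in J_i,\ n \in N \tag{2}$$

$$B(i,j,n) = B(i,j,n)^{nom}, \quad \forall i \in I^{fix},\ j \in J_i,\ n \in N \tag{3}$$

$$T^s(i,j,n),\ T^f(i,j,n) \le \mathit{Shutdown}_j^{start}, \quad \forall i \in I,\ j \in J^{shut},\ n \in N,\ n \le N^{int} \tag{4}$$

$$T^s(i,j,n),\ T^f(i,j,n) \ge \mathit{Shutdown}_j^{end}, \quad \forall i \in I,\ j \in J^{shut},\ n \in N,\ n > N^{int} \tag{5}$$

where $I^{fix}$ is the set of tasks (i) to be fixed, $J^{shut}$ is the set of units (j) which experience a shutdown, $N^{int}$ is an intermediate event point, $T^s(i,j,n)^{nom}$ is the value of the starting time of the task from the nominal scheduling run, $B(i,j,n)^{nom}$ is the batch size of the task from the nominal scheduling run, $\mathit{Shutdown}_j^{start}$ is the beginning of the shutdown in unit (j), and $\mathit{Shutdown}_j^{end}$ is the end of the shutdown in unit (j).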
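Constraints (4) and (5) split the event points of a shut-down unit at the intermediate event point. Earlier event points must finish by the start of the shutdown, and later ones may not start before its end. The minimal sketch below tabulates the resulting bounds for the event points of one unit that carry no fixed task. The function name, its arguments, and the numbers in the usage line are hypothetical.

```python
def shutdown_time_bounds(free_event_points, n_int, shutdown_start, shutdown_end):
    """Bounds on T^s(i, j, n) and T^f(i, j, n) for one shut-down unit j.

    free_event_points: event points of unit j with no fixed task.
    Returns {n: (lower, upper)}; None means that no bound is added on that side.
    """
    return {
        n: (None, shutdown_start) if n <= n_int  # eq. (4): finish by the shutdown start
        else (shutdown_end, None)                # eq. (5): start after the shutdown end
        for n in free_event_points
    }

# e.g. event points 3..6 are free, N^int = 4, and the unit is down from 24 h to 36 h
bounds = shutdown_time_bounds([3, 4, 5, 6], 4, 24.0, 36.0)
```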


Reactive Scheduling: New or Modified Orders 


When new orders are added to a scheduling horizon, the following actions need to be performed for Step 2b of the overall framework. For the current subhorizon, tasks can be fixed in a particular unit at a specific event point for a variety of reasons. Some of the possible cases for fixing tasks from the nominal schedule considered in this work are as follows.

1. If the task takes place in a unit that is not suitable 
for any products which have new or changed orders. 
This is done so that production is free in any unit 
which may need to accommodate a new order or re- 
move an old order. 

2. If the task takes place in a unit that is heavily utilized (e.g., more than a specified percentage). This is done because well-utilized units should not be rescheduled, if possible. Note that in this case, a task is only fixed in a heavily utilized unit if the unit does not have any tasks taking place in it which correspond to new or modified orders.

3. If the task corresponds to a product with one of the 
top four largest demands in the current subhorizon. 
This is done to help ensure that the larger demands 
in the horizon can be met. 

4. If the task is suitable for three or fewer processing units. These tasks are fixed because they most likely cannot be processed anywhere else.

5. If the task occurs in a special processing unit, spe- 
cific to the problem at hand. For instance, if the task 
occurs in a unit which has very tight production or 
predetermined production. 

6. If the task occurs in one of the larger processing 
units, such as the large type 1 units, and corresponds 
to a product with demand that is greater than or 
equal to some amount. These tasks are fixed because 
they represent a good utilization of resources. Larger 
demands should be assigned to the larger processing 
units. 

Thus, if a task satisfies one or more of the items above, it is fixed in the reactive scheduling formulation using the data from the nominal schedule. Note that a task which corresponds to a reduced or deleted order cannot be fixed, regardless of whether or not it satisfies any of the above items. For Step 2c of the overall framework, some additional information needs to be included in the model before rescheduling is performed. For the case of new orders, the demands used in the mathematical framework need to be updated to reflect any changes.
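The six fixing criteria and the exclusion for reduced or deleted orders can be combined into a single predicate. The sketch below rests on several assumptions: the `OrderContext` container and its field names, and the thresholds `util_threshold` and `demand_threshold`. The text only speaks of "a specified percentage" and "some amount", so these values are placeholders rather than case-study data.

```python
from dataclasses import dataclass, field

@dataclass
class OrderContext:
    """Hypothetical container for the data the fixing rules need (one subhorizon)."""
    suitable_units: dict      # product -> set of units able to process it
    utilization: dict         # unit -> fraction of the subhorizon in use (nominal)
    demand: dict              # product -> demand in the current subhorizon
    changed: set              # products with new or modified orders
    reduced_or_deleted: set   # products whose orders were reduced or deleted
    special_units: set = field(default_factory=set)
    large_units: set = field(default_factory=set)

def should_fix(product, unit, unit_products, ctx, util_threshold=0.8, demand_threshold=100.0):
    """Decide whether a nominal task (product, unit) is fixed after order changes.

    unit_products: products processed in `unit` in the nominal subhorizon schedule.
    """
    if product in ctx.reduced_or_deleted:  # such tasks are never fixed
        return False
    top4 = sorted(ctx.demand, key=ctx.demand.get, reverse=True)[:4]
    rules = (
        # 1. the unit cannot process any product with new or changed orders
        all(unit not in ctx.suitable_units.get(p, ()) for p in ctx.changed),
        # 2. heavily utilized unit with no task for a new or modified order
        ctx.utilization.get(unit, 0.0) > util_threshold and not (set(unit_products) & ctx.changed),
        # 3. product among the four largest demands of the subhorizon
        product in top4,
        # 4. task can be processed in three or fewer units
        len(ctx.suitable_units.get(product, ())) <= 3,
        # 5. special processing unit (e.g. tight or predetermined production)
        unit in ctx.special_units,
        # 6. large unit and a demand of at least demand_threshold
        unit in ctx.large_units and ctx.demand.get(product, 0.0) >= demand_threshold,
    )
    return any(rules)
```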

The mathematical formulation for reactive schedul- 
ing with new or modified orders also uses the same sets 
of constraints developed for short-term scheduling in 
Janak et al. [6] including constraints (22)-(52), (56)- 
(58), and the overall objective function given in con- 
straint (66). In addition, the above additional require- 
ments can be described mathematically using Eq. (1) 
given for reactive scheduling with unit shutdowns. 


Extensions of Reactive Scheduling Formulation 


As mentioned previously, it is also possible to consider 
a combination of reactive events. For instance, multi- 
ple units could become unavailable and multiple orders 
could be added or modified in a single horizon. This 
case can be addressed using the same methodology de- 
scribed in the previous section. In order to fix a task, we 
would need to check that the task does not correspond 
to a modified order and that it does not occur during 
a blocked time in any of the unavailable units. Then, if 
both of these conditions are met and if the task satisfies 
one or more of the above criteria, it can be fixed. Note that as more reactive events take place, fewer tasks can most likely be fixed in the reactive scheduling formulation, making the resulting problem more computationally complex.


Cases 


In this section, an example problem is presented to demonstrate the effectiveness of the proposed approach. Additional examples can be found in Janak et
al. [7]. The example considers the production schedul- 
ing of an industrial batch plant including campaign 
mode production. We use the nominal schedule ob- 
tained in Janak et al. [6] and we consider reactive 
scheduling in the event of unit shutdowns. The example 
is implemented with GAMS 2.50 [1] and solved using CPLEX 9.0 [2] on a 3.20 GHz Linux workstation. The dual simplex method is used with best-bound search and strong branching. The short-term scheduling horizon where reactive scheduling is performed is run with a relative optimality tolerance of 0.001% and a 30-minute time limit. The subsequent short-term scheduling horizons are fully rescheduled and are run with a relative optimality tolerance of 0.001%, a three-hour time limit, and an integer solution limit of 40.


Nominal Production Schedule 


The nominal production schedule was determined in 
Janak et al. [6] where the total time period is 19 days, 
from D0 to D18. The nominal problem is solved includ- 
ing campaign mode production so that production in 
the type 5 unit is fixed to yield campaigns of predeter- 
mined length. The rolling horizon framework decom- 
poses the time horizon into 8 individual subhorizons, 
each with its own products and demands. The results of 
the decomposition for each time horizon can be seen in 
Table 1. 

The final production schedule for the entire time 
period can be seen in Fig. 3 and 4 where the process- 
ing units (operation type 1, 2, 3, and 5) are shown in 
the first figure and the other units (operation type 4a, 
4b, and 6) are shown in the second. Each short-term 
scheduling horizon is represented with a different color. 

The total demand for the entire 14-day period is 2323.545 and the total production is 2719.859, where 15.00 of the demands are not met. The production schedules obtained satisfy demands for all but one product, though some due dates are relaxed, and also produce 17.06% more material than the demands require. Note that many of the processing units are not fully utilized, as shown in Table 2, indicating the potential for even more production in the given time period, which may be incorporated using reactive scheduling.

Reactive Scheduling of Batch Processes, Table 1
Decomposition results for nominal problem

Reactive Scheduling of Batch Processes, Figure 3
Overall production schedule for processing units for nominal problem

Reactive Scheduling of Batch Processes, Table 2
Unit utilization statistics for nominal problem

Time Used (h) Time Left (h) Percent Utilized
33.33%
71.71%
59.21%
80.70%
58.07%

Type F9
Type +10
Type EN
Type F13
Type 5

Type 1-6
Type 1-7
Type 1-8
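The 17.06% surplus follows from the two totals quoted above:

$$\frac{2719.859}{2323.545} \approx 1.1706, \qquad 2719.859 - 2323.545 = 396.314,$$

so total production exceeds total demand by 396.314, or about 17.06%.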


Case 1: Reactive Scheduling with Unit Shutdown 


In this example, we consider reactive scheduling for the first time horizon, where several units are unavailable for a period of time. We use the nominal production schedule as our base schedule, where the first time horizon covers three days (i.e., D0 to D2) and uses nine event points. The three unit shutdowns considered are listed in Table 3.

[Table 3: Unit shutdowns for case 1 (columns: Unit, Unavailable Start Time (h), Unavailable End Time (h))]

The reactive scheduling framework for unit shutdowns fixes tasks that start before the latest shutdown start time in units that are unaffected, and fixes tasks that start and end before the shutdown time in units that experience a shutdown. Thus, for this example, all the tasks starting before 48 h in all the units except Type 1-3, Type 1-5, and Type 1-11 are fixed. This means that the binary variable wv(i, j, n), the starting time ts(i, j, n), and the batch size b(i, j, n) must all be fixed for each task i in those units j at the event point n at which it occurred in the nominal production schedule. Thus, if we consider the nominal production schedule for the first time horizon shown in Fig. 5, then the tasks shown in Table 4 are fixed in each unit.
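The fixing rule described above can be expressed as a small filter over the nominal schedule. This is a minimal sketch under assumed data structures: the `Task` record, the `shutdowns` mapping, and the unit names and times in the example are hypothetical, not taken from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One task of the nominal schedule (hypothetical record layout)."""
    name: str     # task i
    unit: str     # unit j
    event: int    # event point n
    start: float  # ts(i, j, n), hours
    end: float    # tf(i, j, n), hours
    batch: float  # b(i, j, n)


def tasks_to_fix(schedule, shutdowns):
    """Return the tasks whose wv, ts and b values stay fixed when rescheduling.

    `shutdowns` maps each affected unit to its (start, end) unavailability in hours.
    """
    latest_start = max(start for start, _ in shutdowns.values())
    fixed = []
    for task in schedule:
        if task.unit in shutdowns:
            down_start, _ = shutdowns[task.unit]
            # Affected unit: keep only tasks that finish before the shutdown begins.
            if task.end <= down_start:
                fixed.append(task)
        elif task.start < latest_start:
            # Unaffected unit: keep tasks starting before the latest shutdown start.
            fixed.append(task)
    return fixed


# Hypothetical data: U1 and U2 shut down, U3 is unaffected.
schedule = [
    Task("A", "U1", 1, 0.0, 12.0, 10.0),
    Task("B", "U1", 2, 20.0, 30.0, 10.0),
    Task("C", "U3", 1, 10.0, 20.0, 5.0),
    Task("D", "U3", 4, 40.0, 50.0, 5.0),
]
shutdowns = {"U1": (24.0, 48.0), "U2": (36.0, 72.0)}
print([t.name for t in tasks_to_fix(schedule, shutdowns)])  # ['A', 'C']
```

Here the latest shutdown start is 36 h, so task D in the unaffected unit is left free, and task B is left free because it would run into the U1 shutdown.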

In addition, in each of the units which experience 
a shutdown, bounds need to be placed on the starting 
and finishing times for all tasks at event points which 
do not already have tasks fixed to them. An intermedi- 
ate event point is chosen and all of the event points be- 
fore and including the intermediate event point have an 
upper bound placed on the starting and finishing times 
of tasks to be less than or equal to the start of the shut- 
down. Similarly, all of the event points after the inter- 
mediate event point have a lower bound placed on the 
starting and finishing times of tasks to be greater than 
or equal to the end of the shutdown. Note that the inter- 
mediate event point is unit specific and is determined so 
that there are approximately the same number of event 
points before and after the shutdown while excluding 
event points that already have tasks assigned to them. 
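A sketch of the bound assignment for a single shut-down unit follows. The text only states that the intermediate event point splits the free event points roughly in half. The specific rule used here (the first ceil(k/2) of the k free points receive upper bounds) is an assumption, and so is the function name.

```python
def shutdown_bounds(event_points, fixed, down_start, down_end):
    """Bounds on ts(i, j, n) and tf(i, j, n) for one unit with a shutdown.

    Event points up to and including the intermediate one get
    ts, tf <= down_start; the later ones get ts, tf >= down_end.
    Event points that already carry fixed tasks are skipped.
    """
    free = [n for n in event_points if n not in fixed]
    k = (len(free) + 1) // 2  # assumed rule: ceil(half) of the free points come first
    bounds = {n: ("<=", down_start) for n in free[:k]}
    bounds.update({n: (">=", down_end) for n in free[k:]})
    return bounds


# Nine event points, N1 already fixed, unit unavailable from 24 h to 48 h.
for n, (sense, value) in shutdown_bounds(range(1, 10), {1}, 24, 48).items():
    print(f"N{n}: ts, tf {sense} {value}")
```

With these inputs, N2 to N5 receive the upper bound of 24 h and N6 to N9 the lower bound of 48 h.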
[Table 4: Tasks to fix in case 1 (columns: Unit, Task, Event Point)]

[Fig. 5: Nominal production schedule for the processing units in the first horizon]

Bounds on timing in units in case 1 (Table 5; one row per unit with a shutdown; the unit names and fixed event points are not recoverable):
ts(i, j, n), tf(i, j, n) ≤ 24 for N2, N3, N4, N5;   ts(i, j, n), tf(i, j, n) ≥ 48 for N6, N7, N8, N9
ts(i, j, n), tf(i, j, n) ≤ 36 for N1, N2, N3, N5;   ts(i, j, n), tf(i, j, n) ≥ 72 for N6, N7, N8, N9
ts(i, j, n), tf(i, j, n) ≤ 48 for N1, N2, N4, N4;   ts(i, j, n), tf(i, j, n) ≥ 72 for N7, N8, N9


Thus, considering the nominal production schedule for 
the first time horizon, the bounds shown in Table 5 are 
placed for each of the units which experience a shut- 
down. 

Then, fixing the above sets of tasks from Table 4 and 
imposing the bounds defined in Table 5, we obtain a re- 
active production schedule which excludes production 
in the affected units during the shutdown periods and 
maintains all of the production that has already taken 
place. The reactive schedule for this example can be 
seen in Fig. 6. Note that once a reactive schedule is ob- 
tained for the first time horizon which incorporates the 
unit shutdown information, the subsequent horizons 
must each be rescheduled from scratch due to the changes
in inventory, demand satisfaction, and unit availability. 


Conclusions 


In this chapter, we presented a reactive scheduling 
framework which provides an immediate response to 
unexpected events such as equipment breakdown or 
the addition or modification of orders. The reactive 
scheduling formulation takes into account the sched- 
ule currently in progress as well as planned produc- 
tions that are not affected by the unexpected event. 
The proposed approach builds on an efficient MILP formulation developed for short-term scheduling problems, with modifications introduced to reflect the effects of the unforeseen event.
To avoid full rescheduling of the current production 
schedule, the formulation determines tasks which are 
not affected by the unforeseen event, either directly or 
indirectly, and can be carried out as scheduled. The 
resulting tasks along with additional subsets of tasks 
are then fixed in the MILP problem and the rest of 


the horizon is rescheduled. We considered two types 
of unexpected events: unit shutdown and the addition 
or modification of orders, as well as their combination. 
All the cases of unexpected events utilize the nomi- 
nal production schedule, the original MILP formula- 
tion for short-term scheduling with modifications, and 
a program to determine which tasks may be fixed before 
rescheduling. The formulation is then able to determine 
an updated production schedule for the remaining time 
horizon in a reasonable amount of CPU time. Reac- 
tive scheduling of a large-scale industrial batch plant 
was performed to demonstrate the effectiveness of the 
proposed approach. Results indicate that the reactive 
scheduling framework is capable of determining up- 
dated production schedules to account for unexpected 
events and can also be used to improve existing produc- 
tion schedules. 
A typical nonlinear program is to find a point in a feasible region that will minimize an objective function. The
feasible region is often represented by a finite set of 
algebraic inequalities called constraints. If a constraint 
can be removed from the set without causing a change 
to the feasible region it is said to be redundant with re- 
spect to the remaining constraints. A constraint that is 
not redundant is said to be nonredundant, irredundant, 
or necessary. 

A more formal definition requires some notation. Let $R \subseteq \mathbb{R}^n$ denote a nonempty feasible region. The set of constraints indexed by $I = \{1, \dots, m\}$ is given by $G(I) = \{g_i \le 0 : i \in I\}$, where $g_i : \mathbb{R}^n \to \mathbb{R}$, and the region determined by $G(I)$ is $R(I) = \{x : g_i(x) \le 0,\ i \in I\}$. Suppose that $R = R(I)$. We then call $G(I)$ a representation of $R$. The $k$th inequality constraint '$g_k(x) \le 0$' is redundant with respect to $G(I)$ if $R(I) = R(I_k)$, where $I_k = I \setminus \{k\}$. Further definitions, such as those for relatively redundant constraints, weakly necessary constraints, etc., can be found in [3,5].

Of course, there can be more than one redundant constraint. If $J$ is a proper subset of $I$ and $R(J) = R(I)$, the constraint set $G(J)$ is called a reduction of $G(I)$. Further, $G(J)$ is called a common dependency set for the set of redundant constraints $G(I \setminus J)$ [8]. If all the constraints in $G(J)$ are nonredundant, then $G(J)$ is called a prime representation of the feasible region. A related concept is that of a minimal representation.

For linear constraints, i.e., when $g_i(x) = a_i^T x - b_i$, where $a_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$, a minimal representation is one having the fewest constraints. In [11] it is proved that a representation is minimal if and only if it contains no redundant constraints and no implicit equality constraints. The constraint '$g_k(x) \le 0$' is an implicit equality in $G(I)$ if $g_k(x) = 0$ for all $x \in R(I)$. In order to obtain a minimal representation, the implicit equalities must first be replaced with explicit equalities, and then the redundant constraints must be removed one at a time. (The definition of redundancy is easily modified to include equality constraints.)

For the quadratic case, i.e., when $g_i(x) = a_i^T x + \tfrac{1}{2} x^T H_i x - b_i$, where the $H_i$ are symmetric, positive definite matrices, a minimal representation is defined as one having the least number of quadratic constraints and, among those with the same number of quadratic constraints, the least number of linear constraints. It was proved in [10] that a representation is minimal if and only if it contains no redundant constraints, no implicit equality constraints, and no pseudoquadratic constraints. A pseudoquadratic constraint is one that can be replaced by a finite number of linear constraints without causing a change to the feasible region. For example, since $\{x \in \mathbb{R}^2 : x_2 = 0,\ x_1^2 + x_2^2 \le 1\}$ equals $\{x \in \mathbb{R}^2 : x_2 = 0,\ -1 \le x_1 \le 1\}$, the constraint '$x_1^2 + x_2^2 \le 1$' is pseudoquadratic.

Algorithms used to detect redundant constraints can usually be classified as either deterministic or probabilistic methods. The deterministic methods are optimization based. Consider the nonlinear program $\max\{g_k(x) : x \in R(I_k)\}$. If there is no solution, that is, if the program is either unbounded or infeasible ($R(I_k) = \emptyset$), then it follows from the definition of redundancy that the $k$th constraint is necessary. Otherwise the program has a solution $x^*$. If $g_k(x^*) > 0$, then again the $k$th constraint is necessary, and if $g_k(x^*) \le 0$, the constraint is redundant. This method can work quite well in the linear case [6,9], as each of the constraints can be classified by solving a linear program. Consequently, for the linear case, the redundancy problem can be solved in polynomial time. The importance of redundancy detection to the solution of large sparse linear programs is discussed in [1], and the importance of redundancy detection for model analysis is discussed in [7]. For the nonlinear case, this approach has the drawback that it requires the solution of nonconvex programs. For example, if all the constraint functions are convex, then $\max\{g_k(x) : x \in R(I_k)\}$ is a nonconvex program, since it maximizes a convex function.
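For linear constraints $g_i(x) = a_i^T x - b_i$, the optimization test described above amounts to one linear program per constraint. Written out in the notation of this article (a restatement for illustration, not an additional method):

```latex
z_k^* = \max\{\, a_k^T x \;:\; a_i^T x \le b_i,\ i \in I_k \,\},
\qquad
\text{constraint } k \text{ is }
\begin{cases}
\text{redundant}, & \text{if } z_k^* \le b_k,\\
\text{necessary}, & \text{if } z_k^* > b_k \text{ or the LP has no optimal solution.}
\end{cases}
```

The test works because $\max g_k = z_k^* - b_k$ over $R(I_k)$, so classifying all $m$ constraints takes $m$ such LP solves.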

An alternative to the deterministic, optimization-based methods is provided by the probabilistic methods. The first such method, which became known as the hypersphere directions hit-and-run method, was introduced in [4]. (A description of the method can be found in [9].) This technique generates a sequence of random lines that pass through the feasible region. The lines are generated as follows. A given feasible point $x \in R$ and a direction $s$, sampled from a uniform distribution over the surface of the unit hypersphere, determine the line $\{x + \sigma s : \sigma \in \mathbb{R}\}$. The next feasible point is sampled uniformly from the feasible segment of the line. Note that in order to determine the feasible line segment, the intersection points of the line with all the constraint boundaries must be calculated. Under appropriate assumptions, the constraints hit by the feasible line segment are nonredundant. This technique requires a stopping rule, after which all constraints that have not been hit are classified, possibly with error, as redundant. The main advantages of hit-and-run methods are that they can very quickly identify most of the necessary constraints, and that they can be applied to a large class of nonlinearly constrained regions. The main disadvantages are the need for an initial feasible point and the need to calculate the intersection points.
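The following is a minimal sketch of the hypersphere directions hit-and-run idea. As a simplifying assumption, it is specialized to linear constraints $a_i^T x \le b_i$, so that the intersection points reduce to simple ratios. It also assumes a bounded region and a strictly feasible starting point. The function name, iteration count, and test polytope are illustrative.

```python
import math
import random


def hit_and_run(A, b, x0, iterations=200, seed=0):
    """Hypersphere-directions hit-and-run for linear constraints a_i . x <= b_i.

    Assumes the region is bounded and x0 is strictly feasible. Returns the set
    of constraint indices that were hit (classified as necessary); every other
    constraint is classified, possibly wrongly, as redundant.
    """
    rng = random.Random(seed)
    x = list(x0)
    hit = set()
    for _ in range(iterations):
        # Direction uniform on the unit hypersphere: a normalized Gaussian vector.
        s = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in s))
        s = [c / norm for c in s]
        # Feasible segment x + sigma * s with lo <= sigma <= hi, and the
        # constraints whose boundaries end it on each side.
        lo, hi = -math.inf, math.inf
        lo_idx = hi_idx = None
        for i, (a, bi) in enumerate(zip(A, b)):
            rate = sum(ai * si for ai, si in zip(a, s))
            slack = bi - sum(ai * xi for ai, xi in zip(a, x))
            if rate > 0 and slack / rate < hi:
                hi, hi_idx = slack / rate, i
            elif rate < 0 and slack / rate > lo:
                lo, lo_idx = slack / rate, i
        hit.update(i for i in (lo_idx, hi_idx) if i is not None)
        # Next point: uniform on the feasible segment.
        sigma = rng.uniform(lo, hi)
        x = [xi + sigma * si for xi, si in zip(x, s)]
    return hit


# Unit square plus two redundant constraints: x + y <= 3 and x <= 2.
A = [[-1, 0], [0, -1], [1, 0], [0, 1], [1, 1], [1, 0]]
b = [0, 0, 1, 1, 3, 2]
hit = hit_and_run(A, b, [0.5, 0.5])
print("necessary:", sorted(hit), "redundant:", sorted(set(range(len(A))) - hit))
```

In this example, every chord leaves the square through one of its four sides before it can reach $x + y = 3$ or $x = 2$. The two extra constraints are therefore never hit and end up classified as redundant.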

For general nonlinear programs it may well be that neither the deterministic nor the hit-and-run methods are applicable. For these problems, there is an algorithm [2] based on a connection between the redundancy problem and the constraints in a set-covering problem. This method only requires that, for any $x \in \mathbb{R}^n$, it can be determined whether each constraint $g_k(x) \le 0$ is satisfied or violated at $x$. In fact, the method does not even require a nonempty feasible region. This technique is probabilistic in that it samples points from $\mathbb{R}^n$. Each sampled point is used to generate an $m$-bit binary word whose $k$th bit is 1 if and only if the $k$th constraint is violated at that point. Upon termination of the sampling process, the collection of all the generated binary words forms the rows of a set-covering matrix $E$ having $m$ columns, one corresponding to each constraint. Let $y$ be a nontrivial feasible solution to the set-covering problem, i.e., $Ey \ge e$, where $e$ is a vector of ones and $y$ has at least one zero component. Then $G(J)$ is a reduction of $G(I)$, where $i \in J$ if and only if $y_i = 1$. An objective function can be introduced into the set-covering problem so that any set-covering heuristic can be used to find an approximately minimal representation. Here the definition of minimal will depend on the choice of objective. The main disadvantage of the approach is the need to generate the set-covering rows. The main advantage is its general applicability.
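Below is a small sketch of the set-covering approach. Several choices here are assumptions of this illustration and are not prescribed by the method: sampling from a bounding box, dropping all-zero words (points that satisfy every constraint impose no covering requirement), and using a greedy heuristic in place of an exact set-covering solver. The function names are also illustrative.

```python
import random


def violation_words(constraints, box, samples=2000, seed=0):
    """Sample points uniformly from a box and return one m-bit word per point.

    Bit k is 1 iff constraint k, g_k(x) <= 0, is violated at the point.
    Points satisfying every constraint give an all-zero word, which imposes
    no covering requirement, so those words are dropped.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(samples):
        x = [rng.uniform(lo, hi) for lo, hi in box]
        word = tuple(1 if g(x) > 0 else 0 for g in constraints)
        if any(word):
            rows.append(word)
    return rows


def greedy_cover(rows, m):
    """Greedy set-covering heuristic: returns J such that y (y_j = 1 iff j in J)
    satisfies E y >= e for the rows of E."""
    uncovered = set(range(len(rows)))
    chosen = set()
    while uncovered:
        gain = [sum(rows[r][j] for r in uncovered) for j in range(m)]
        best = max(range(m), key=lambda j: (gain[j], -j))  # ties -> lowest index
        chosen.add(best)
        uncovered = {r for r in uncovered if not rows[r][best]}
    return chosen


# Unit square plus two redundant constraints, each written as g_k(x) <= 0.
constraints = [
    lambda x: -x[0],            # x >= 0
    lambda x: -x[1],            # y >= 0
    lambda x: x[0] - 1,         # x <= 1
    lambda x: x[1] - 1,         # y <= 1
    lambda x: x[0] + x[1] - 3,  # x + y <= 3 (redundant)
    lambda x: x[0] - 2,         # x <= 2 (redundant)
]
rows = violation_words(constraints, box=[(-3, 3), (-3, 3)])
print(sorted(greedy_cover(rows, len(constraints))))
```

Any point that violates one of the two redundant constraints also violates a side of the square, so the four square constraints already cover every row. The greedy cover is therefore expected to return exactly those four, which gives a reduction of the original six constraints.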


See also 


> Equality-constrained Nonlinear Programming: KKT 
Necessary Optimality Conditions 
> Inequality-constrained Nonlinear Optimization 
Discrete and continuous nonconvex programming
problems arise in a host of practical applications in 
the context of production planning and control, loca- 
tion-allocation, distribution, economics and game the- 
ory, quantum chemistry, and process and engineering 
design situations. Several recent advances have been 
made in the development of branch-and-cut type al- 
gorithms for mixed-integer linear and nonlinear pro- 
gramming problems, as well as polyhedral outer-ap- 
proximation methods for continuous nonconvex pro- 
gramming problems. At the heart of these approaches 
is a sequence of linear (or convex) programming relax- 
ations that drive the solution process, and the success 
of such algorithms is strongly tied in with the strength 
or tightness of these relaxations. 

The Reformulation-Linearization-Technique (RLT) 
is a method that generates such tight linear program- 
ming relaxations for not only constructing exact solu- 
tion algorithms, but also to design powerful heuristic 
procedures for large classes of discrete combinatorial 
and continuous nonconvex programming problems. 
Its development originated in [4,5,6], initially focus- 
ing on 0-1 and mixed 0-1 linear and polynomial 
programs [21,22], and later branching into the more 
general family of continuous, nonconvex polynomial 
programming problems [18,45,49]. For the family of 
mixed 0-1 linear (and polynomial) programs in n 
0-1 variables, the RLT generates an n-level hierarchy, 
with the n-th level providing an explicit algebraic char- 
acterization of the convex hull of feasible solutions 
[21,22]. The RLT essentially consists of two steps: a reformulation step, in which certain additional nonlinear valid inequalities are automatically generated, and a linearization step, in which each product term is replaced by a single continuous variable. The level of the hierarchy directly corresponds to the degree of the polynomial terms produced during the reformulation stage. Hence, in the reformulation phase, given a value of the level $d \in \{1, \dots, n\}$, the RLT constructs various polynomial factors of degree $d$ comprised of the product of some $d$ binary variables $x_j$ or their complements $(1 - x_j)$. These factors are then used to multiply each of the defining constraints in the problem (including the variable bounding restrictions), to create a (nonlinear) polynomial mixed-integer zero-one programming problem. Suitable additional constraint-factor products can be used to further enhance the procedure. In general, for a variable restricted to lie in the interval $[l_j, u_j]$, the nonnegative expressions $(x_j - l_j)$ and $(u_j - x_j)$ are referred to as bound-factors, and for a structural inequality $\alpha x \ge \beta$, for example, the expression $(\alpha x - \beta)$ is referred to as a constraint-factor; implied product constraints can be generated using either bound-factors or constraint-factors. After using the relationship $x_j^2 = x_j$ for each binary variable $x_j$, $j \in \{1, \dots, n\}$, which in effect accounts for the tightening of the relaxation, the linearization phase substitutes a single variable $w_J$ (respectively, $v_{Jk}$) in place of each nonlinear term of the type $\prod_{j \in J} x_j$ (respectively, $y_k \prod_{j \in J} x_j$), where $y$ represents the set of continuous variables. Hence, relaxing integrality, the nonlinear polynomial problem is linearized into a higher-dimensional polyhedral set $X_d$ defined in terms of the original variables $(x, y)$ and the new variables $(w, v)$. Denoting the projection of $X_d$ onto the space of the original $(x, y)$-variables as $X_{Pd}$, it is shown that as $d$ varies from 1 to $n$, we get

$$X_{P0} \supseteq X_{P1} \supseteq X_{P2} \supseteq \cdots \supseteq X_{Pn} \equiv \mathrm{conv}(X),$$

where $X_{P0}$ is the ordinary linear programming relaxation, and $\mathrm{conv}(X)$ represents the convex hull of the original feasible region $X$. An extension of this development to the case of general integer/discrete variables is presented in [7,25], where the bound-factors are replaced by suitable Lagrange interpolating polynomials, and a further extension to 0-1 mixed-integer as well as general mixed-discrete semi-infinite and bounded convex programming problems is presented in [26] (see also [50]). Lovász and Schrijver [16] and Boros et al. [9] have also independently developed various concepts related to the RLT process. This RLT process has also been extended and enhanced in [27] through the use of more generalized constraint-factors that imply the bounding restrictions $0 \le x_j \le 1$ for $j \in \{1, \dots, n\}$. A similar hierarchy of relaxations leading to the convex hull representation is obtained based on the use of these generalized factors in the reformulation phase, in lieu of simply the bound-factors $x_j$ and $(1 - x_j)$, for $j \in \{1, \dots, n\}$. In addition, this hierarchy embeds within its construction stronger logical implications than only $x_j^2 = x_j\ \forall j \in \{1, \dots, n\}$. For example, consider an RLT constraint that has been generated by taking the product of some nonnegative polynomial factor $F$ with a defining constraint $\gamma^T x \ge \delta$ to yield $[F(\gamma^T x - \delta)]_L \ge 0$, where $[\cdot]_L$ denotes the linearization of the polynomial expression $[\cdot]$ under the RLT substitution process. Then this constraint can be tightened by deriving a stronger valid inequality of the type $\pi^T x \ge \pi_0$ under the condition that $F > 0$, and then imposing the RLT constraint $[F(\pi^T x - \pi_0)]_L \ge 0$, which is valid whenever $F = 0$ or $F > 0$. The resulting overall RLT process is shown in [27] not only to subsume the previous development, but also to provide the opportunity to exploit frequently arising special structures such as generalized/variable upper bounds, covering, partitioning, and packing constraints, as well as sparsity.
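To make the reformulation and linearization steps concrete for pure 0-1 variables, here is a small sketch. It multiplies a linear constraint by a bound-factor $x_j$ or $(1 - x_j)$, applies $x_j^2 = x_j$, and reads each remaining product monomial as a new variable $w_J$. Representing polynomials as dictionaries keyed by frozensets of variable indices is an implementation choice of this sketch, not part of the RLT itself, and the function names are illustrative.

```python
from itertools import product


def poly_mul(p, q):
    """Multiply two polynomials in 0-1 variables.

    A polynomial is a dict {frozenset of variable indices: coefficient}. Taking
    the union of index sets applies x_j**2 = x_j, so every product stays
    multilinear. Each monomial with two or more indices is read as w_J.
    """
    out = {}
    for (s, a), (t, b) in product(p.items(), q.items()):
        key = s | t
        out[key] = out.get(key, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}


def linear_constraint(coeffs, rhs):
    """Polynomial alpha.x - beta for the constraint alpha.x >= beta."""
    poly = {frozenset({j}): a for j, a in coeffs.items()}
    poly[frozenset()] = -rhs
    return poly


def bound_factor(j, complement=False):
    """The bound-factor x_j, or (1 - x_j) when complement is True."""
    if complement:
        return {frozenset(): 1, frozenset({j}): -1}
    return {frozenset({j}): 1}


def evaluate(poly, x):
    """Value of the linearized form when each w_J equals prod_{j in J} x_j."""
    return sum(coef * all(x[j] for j in s) for s, coef in poly.items())


c = linear_constraint({1: 1, 2: 1}, 1)  # x1 + x2 >= 1
print(poly_mul(c, bound_factor(1)))     # {frozenset({1, 2}): 1}
print(poly_mul(c, bound_factor(1, complement=True)))
```

The two products give the level-1 RLT constraints $w_{12} \ge 0$ and $x_1 + x_2 - w_{12} \ge 1$. At every binary point, each linearized product agrees with the original product once $w_{12} = x_1 x_2$.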

The hierarchy of higher-dimensional representa- 
tions produced in this manner markedly strengthens 
the usual continuous relaxation, as is evidenced not 
only by the fact that the convex hull representation 
is obtained at the highest level, but that in computa- 
tional studies on many classes of problems, even the 
first level representation helps design algorithms that 
significantly dominate existing procedures in the liter- 
ature [4,6,20,27,30,41]. Based on a special case of the 
RLT process that employs the bound-factors for only 
a single variable at a time, Balas et al. [8] describe a lift- 
and-project cutting plane algorithm that is shown to 
produce encouraging results. The theoretical implica- 
tions of this hierarchy are noteworthy; the resulting 
representations subsume and unify many published lin- 
earization methods for nonlinear 0-1 programs, and 
the algebraic representation available at level n pro- 
motes new methods for identifying and characterizing 
facets and valid linear inequalities in the original vari- 
able space, as well as for providing information that 
directly bridges the gap between discrete and contin- 


uous sets [3,38,40]. Indeed, since the level-n formula- 
tion characterizes the convex hull, all valid inequalities 
in the original variable space must be obtainable via 
a suitable projection; thus such a projection operation 
serves as an all-encompassing tool for generating valid 
inequalities. References [38,40] provide discussions on 
generating facets and tight valid inequalities for sev- 
eral classes of problems. Reference [3] discusses persis- 
tency issues for certain constrained and unconstrained 
pseudo-Boolean programming problems whereby vari- 
ables that take on 0-1 values at an optimum to an 
RLT relaxation would persist to take on these same 
values at an optimum to the original problem. Ref- 
erences [1,2,13,34,39,42], respectively discuss the use 
of RLT to generate improved model representations 
for the set partitioning, quadratic assignment, traveling 
salesman problems, and to 0-1 mixed-integer programs 
subject to various assignment constraints. 

Although the Reformulation-Linearization Tech- 
nique was originally designed to employ factors in- 
volving 0-1 variables in order to generate 0-1 (mixed- 
integer) polynomial programming problems that are 
subsequently re-linearized, the approach has also been 
extended to solve continuous, bounded variable poly- 
nomial programming problems. Problems of this type 
involve the optimization of a polynomial objective 
function subject to polynomial constraints in a set 
of continuous, bounded variables, and arise in nu- 
merous applications in engineering design, produc- 
tion, location, and distribution problems. Reference 
[45] prescribes an RLT process that employs suitable polynomial-factors (bound-factors based on the bounding restrictions $l_j \le x_j \le u_j$, $j \in \{1, \dots, n\}$, as well as constraint-factors) to generate additional polynomial constraints through a multiplication process, which, upon linearization through variable redefinitions as above, produces a linear programming relaxation. The resulting relaxation is used in concert with a suitably designed partitioning technique that attempts to reduce the error between the original nonlinear terms and their linearized counterparts, in order to develop an algorithm that is proven to converge to a global optimum for this problem. Special classes of polynomial constraints based on grid factors, Lagrange interpolating polynomials, and mean value theorem constraints can be generated to further tighten these RLT relaxations [48]. In some cases (e.g., see [47]), it is beneficial to retain certain simple convex constraints in the relaxation, resulting in a more general Reformulation-Linearization/Convexification Technique. Additionally,
Sherali and Fraticelli [35] have proposed a class of 
semidefinite cuts based on semidefinite relaxation en- 
hancements that can be used to significantly tighten 
RLT representations. While RLT essentially operates 
on polynomial functions having integral exponents, 
many engineering design applications lead to polyno- 
mial programs having general rational exponents. For 
such problems, a global optimization technique has 
been designed [18] by introducing a new level of ap- 
proximation at the reformulation step, and accordingly, 
redesigning the partitioning scheme in order to induce 
the overall sequence of relaxations generated to become 
exact in the limit. Further extensions for solving non- 
linear factorable programs for which the objective and 
constraint functions involve sums of products of uni- 
variate functions have also been developed [49]. Here, 
suitable under/over-approximating nonconvex polyno- 
mial functions are derived for the defining univariate 
functions in the problem, and then an appropriate par- 
titioning scheme is devised that drives the errors from 
these approximations and those for the RLT process ap- 
plied to the resulting polynomial program simultane- 
ously to zero in the limit, in order to obtain a global 
optimum for the given factorable program. For non- 
convex programs that are defined in terms of black- 
box functions, a new concept of a pseudo-global RLT 
approach has been developed by Sherali and Gane- 
san [36], which has been successfully applied to the de- 
sign of containerships. 
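For continuous bounded variables, the simplest instance of this process is the bilinear term $xy$ with $l_x \le x \le u_x$ and $l_y \le y \le u_y$. Multiplying pairs of bound-factors and replacing $xy$ by a new variable $w$ gives four linear inequalities, often called McCormick envelopes. The sketch below evaluates the resulting interval for $w$ at a given point. The function name is illustrative.

```python
def rlt_bilinear_bounds(x, y, lx, ux, ly, uy):
    """Interval for w (the linearization of x*y) implied at the point (x, y)
    by the four bound-factor products of lx <= x <= ux and ly <= y <= uy."""
    lower = max(
        ly * x + lx * y - lx * ly,  # from (x - lx)(y - ly) >= 0
        uy * x + ux * y - ux * uy,  # from (ux - x)(uy - y) >= 0
    )
    upper = min(
        uy * x + lx * y - lx * uy,  # from (x - lx)(uy - y) >= 0
        ly * x + ux * y - ux * ly,  # from (ux - x)(y - ly) >= 0
    )
    return lower, upper


print(rlt_bilinear_bounds(1, 1, 0, 2, 0, 2))   # (0, 2): the true product is 1
print(rlt_bilinear_bounds(0, 1, 0, 2, -1, 3))  # (0, 0): exact when x is at a bound
```

At the midpoint of $[0, 2]^2$, the relaxation only guarantees $0 \le w \le 2$, although $xy = 1$. When either variable sits at one of its bounds, the interval collapses to the exact product. This gap is what the partitioning schemes described above drive to zero.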

A special application of the RLT to mixed-integer quadratic problems subject to linear equality constraints, which yields exact reformulations having fewer quadratic terms and some additional supporting RLT constraints, has been developed to produce tighter convex relaxations [10,11,12,14,15]. More precisely, we multiply a subset of equality constraints $Ax = b$ by an appropriate subset of problem variables $\{x_k \mid k \in K\}$ to obtain a reduced RLT system $\forall k \in K\ (A w_k = b x_k)$, where $w_k = (w_{k1}, \dots, w_{kn})$ is the linearization of $(x_k x_1, \dots, x_k x_n)$ for all $k \in K$. Given $Ax = b$, this is equivalent to the homogeneous linear system $\forall k \in K\ (A z_k = 0)$, where $z_k = (w_{k1} - x_k x_1, \dots, w_{kn} - x_k x_n)$, which may be written more compactly as $A' z = 0$. If we partition $A'$ into basic and nonbasic submatrices $B$ and $N$, and accordingly partition $z$ into $z_B$ and $z_N$, we have $(B \mid N) z = B z_B + N z_N = 0$, whence $z_N = 0$ implies $B z_B = 0$, and hence $z_B = 0$ because $B$ is nonsingular. We therefore conclude that enforcing the reduced RLT system and the subset of quadratic relations $w_{ki} = x_k x_i$ for $(k, i)$ corresponding to nonbasic columns of $N$ is enough to infer $w_{ki} = x_k x_i$ for all $(k, i)$. In other words, by enforcing $z_N = 0$, we automatically obtain, as an implication of the RLT linearized constraints, that the quadratic relations $z_B = 0$ hold as well.
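A toy instance makes the basic/nonbasic argument concrete. It was constructed here for illustration and is not taken from the cited work. Take the single equality $x_1 + x_2 = 1$, multiply it by $x_1$ and $x_2$ (so $K = \{1, 2\}$), and assume symmetric linearization variables $w_{12} = w_{21}$. The reduced RLT system is then $w_{11} + w_{12} = x_1$ and $w_{12} + w_{22} = x_2$. In terms of $z = (z_{11}, z_{12}, z_{22})$, the columns of $z_{11}$ and $z_{22}$ form the basic matrix (here the identity) and $z_{12}$ is the nonbasic column. Enforcing only $w_{12} = x_1 x_2$ should therefore force the other two products, which the exact-arithmetic check below confirms.

```python
from fractions import Fraction


def reduced_rlt_demo(x1):
    """Toy reduced RLT system for x1 + x2 = 1 with K = {1, 2}.

    RLT equations (assuming w12 = w21):  w11 + w12 = x1,  w12 + w22 = x2.
    Only the nonbasic relation w12 = x1*x2 is imposed; w11 and w22 are then
    determined by the RLT equations.
    """
    x1 = Fraction(x1)
    x2 = 1 - x1     # satisfies the original equality
    w12 = x1 * x2   # z_N = 0: the single enforced quadratic relation
    w11 = x1 - w12  # solved from the first RLT equation
    w22 = x2 - w12  # solved from the second RLT equation
    # z_B = 0 should follow: w11 = x1**2 and w22 = x2**2.
    return w11 == x1 * x1 and w22 == x2 * x2


print(all(reduced_rlt_demo(Fraction(k, 10)) for k in range(11)))  # True
```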

For the continuous case, there exist special instances 
where RLT can produce convex hull or convex enve- 
lope representations [17,28]. Various classes of applica- 
tions have been studied for which specialized RLT de- 
signs have been used to develop enhanced effective al- 
gorithms. This list, which is ever expanding, includes 
bilinear programming problems [15,28], general indefi- 
nite quadratic programming problems [12,13,47], loca- 
tion-allocation problems employing different distance 
metrics [20,29,41,46], water distribution network de- 
sign problems [43,44], the solution of Hartree-Fock 
equations in quantum chemistry [14], the linear com- 
plementarity problem [37], and hard and fuzzy clus- 
tering problems [31,32]. References [11,19,23,24,25,33] 
provide expository discussions and a survey of RLT the- 
ory and applications. 
Consider the following general problem of regression. Given $n$ measurements or observations $f = (f_1, \dots, f_n) \in \mathbb{R}^n$ with $f_i = \mu_i + \eta_i$, where $\mu = (\mu_1, \dots, \mu_n)$ is unknown but lies in some predefined subset $K$ of $\mathbb{R}^n$, and $\eta_i$ is the noise generated by sampling, it is required to find a best fit or an estimate $g$ of $\mu$. The set $K$ is dictated by the underlying system which generates $f$. A best fit or an optimal solution $g$ is obtained by minimizing a suitable distance function $d(f, g)$ subject to $g \in K$. If $w = (w_1, \dots, w_n) > 0$ is a given weight function, we use the following distance functions in our analysis:

$$d_\infty(f, g) = \max\{w_i |f_i - g_i| : 1 \le i \le n\},$$
$$d_1(f, g) = \sum\{w_i |f_i - g_i| : 1 \le i \le n\},$$
$$d_2(f, g) = \sum\{w_i (f_i - g_i)^2 : 1 \le i \le n\}.$$

Note that $(d_2(f, g))^{1/2}$, and not $d_2(f, g)$, is a distance function in that it satisfies the well-known triangle inequality. Let the primed entries $d_\infty'(f, g)$, $d_1'(f, g)$ and $d_2'(f, g)$ denote the corresponding distances when $w_i = 1$ for all $i$. As will be seen later, the results for the two sets of distances are different. We call $f$, $g$, $w$, etc. 'functions' because they may be considered as such on some underlying set $S = \{x_1, \dots, x_n\}$, where $f_i = f(x_i)$, $g_i = g(x_i)$, $w_i = w(x_i)$, etc. The best known example of a regression problem is linear regression, where $K$ consists of linear functions given by $g_i = a x_i + b$ with $x_1 < \cdots < x_n$, and the numbers $a$ and $b$ are unknowns to be determined by minimizing $d_2(f, g)$. In this article we are concerned with the following three cases, with $K$ formed by

• isotone or monotone functions,

• quasiconvex and umbrella functions, and, finally,

• convex and concave functions.
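The three weighted distances can be computed directly from their definitions. The short sketch below also shows why $d_2$ itself is not a metric while its square root is: with $n = 1$ and $w_1 = 1$, the points 0, 1, 2 violate the triangle inequality for $d_2$ but not for $\sqrt{d_2}$. The function names are illustrative.

```python
import math


def d_inf(f, g, w):
    """Weighted maximum absolute deviation."""
    return max(wi * abs(fi - gi) for fi, gi, wi in zip(f, g, w))


def d_1(f, g, w):
    """Weighted sum of absolute deviations."""
    return sum(wi * abs(fi - gi) for fi, gi, wi in zip(f, g, w))


def d_2(f, g, w):
    """Weighted sum of squared deviations (not itself a metric)."""
    return sum(wi * (fi - gi) ** 2 for fi, gi, wi in zip(f, g, w))


f, g, w = [1, 4, 2], [2, 2, 2], [1, 1, 3]
print(d_inf(f, g, w), d_1(f, g, w), d_2(f, g, w))  # 2 3 5

# Triangle inequality with n = 1, w = 1 and the points 0, 1, 2.
p, q, r, one = [0], [1], [2], [1]
print(d_2(p, r, one) <= d_2(p, q, one) + d_2(q, r, one))  # False
print(math.sqrt(d_2(p, r, one))
      <= math.sqrt(d_2(p, q, one)) + math.sqrt(d_2(q, r, one)))  # True
```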
We develop algorithms to compute best fits and determine their computational complexity. In most cases the algorithms are of polynomial complexity. Wherever possible we derive explicit expressions for best fits. Applications can be cited to problems in reliability engineering, economics, social sciences, order-restricted statistical inference, etc. [10,13,14,15,18,24]. For example,
the failure rate of a system increases with the age of the 
system (isotonicity), the human mortality rate (or the 
number of auto accidents) first decreases and then in- 
creases with age (quasiconvexity), the efficiency of an 
organization first increases and then decreases with its 
size (quasiconcavity). Similarly, in economics assump- 
tions of convexity/concavity are made about functions 
representing utility, marginal utility, productivity, sup- 
ply, demand etc. 

We briefly discuss the significance of the different distance functions used in regression. In the classical approach to optimization and regression, the least squares distance function $d_2(f, g)$ was extensively used; its differentiability properties led to certain simplifications of the mathematical analysis. For some time now, the $d_1(f, g)$ (mean absolute deviation) and $d_\infty(f, g)$ (maximum absolute deviation) distance functions have also been used. For example, see MINMAD and MINMADAX regression in [2], and the least absolute value (LAV) or $L_1$-norm estimation and $L_\infty$-norm estimation in [11]. See also [6]. If we write $d_p(f, g) = \sum\{w_i |f_i - g_i|^p : 1 \le i \le n\}$, $1 \le p < \infty$, then $(d_p(f, g))^{1/p}$ is a distance function, and $\|f\|_p = (d_p(f, 0))^{1/p}$, $1 \le p < \infty$, and $\|f\|_\infty = d_\infty(f, 0)$ are, respectively, the (weighted) $L_p$ and $L_\infty$ norms on $\mathbb{R}^n$. Both the $d_1$ and $d_\infty$ distances have the strong advantage that their form allows transformation of the regression problem to a linear program, which facilitates computation of its solution [2,19]. The nature of the problem essentially determines the choice of the distance function. The algorithms developed for the solution, their computational complexities, and the properties of the best fits obtained all, in turn, depend on the distance function used.
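To illustrate the transformation to a linear program mentioned above, here is one standard way to write the $d_\infty$ and $d_1$ regression problems as LPs. It uses auxiliary variables $t$ and $e_i$ (notation introduced only for this illustration) and assumes that $K$ is described by finitely many linear inequalities, as with the monotonicity constraints $g_1 \le \cdots \le g_n$:

```latex
\begin{aligned}
d_\infty:&\quad \min_{g,\,t}\ t
  &&\text{s.t. } -t \le w_i (f_i - g_i) \le t,\ 1 \le i \le n,\quad g \in K,\\
d_1:&\quad \min_{g,\,e}\ \sum_{i=1}^{n} w_i e_i
  &&\text{s.t. } -e_i \le f_i - g_i \le e_i,\ 1 \le i \le n,\quad g \in K.
\end{aligned}
```

At an optimum, $t = \max_i w_i |f_i - g_i|$ and $e_i = |f_i - g_i|$, because $w > 0$, so the optimal values equal $d_\infty(f, g)$ and $d_1(f, g)$ respectively.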


Isotonic Regression 


A function $g = (g_1, \dots, g_n) \in \mathbb{R}^n$ is called isotonic or monotonic (nondecreasing) if $g_1 \le \cdots \le g_n$. We let $K$ be the set of all isotonic functions and consider the isotonic regression problem of minimizing $d_2(f, g)$ subject to $g \in K$. Since $d_2(f, g)$ is strictly convex in $g$ and $K$ is a closed convex cone, the solution of this problem is unique. ($K$ is a cone if $\lambda h \in K$ whenever $h \in K$ and $\lambda > 0$.)

We describe the pool adjacent violators algorithm (PAV) for computing the solution; see [24] and the references given there. The form presented below appears in [5]. Let $N = \{1, \dots, n\}$. A partition $J$ of $N$ is a decomposition of $N$ into disjoint sets of consecutive integers whose union is $N$. Each member of $J$ is called a block and is denoted by $B$. We let $A_B(c) = \sum\{w_i (f_i - c)^2 : i \in B\}$. We let $g(J)$ be the $n$-vector whose $i$th coordinate $g_i(J)$, $i \in N$, is given by $g_i(J) = \mu_B$, where $B$ is the unique block of $J$ containing $i$ and $\mu_B$ is the minimizer of $A_B$. Clearly, $g_i(J)$ has identical values for all $i$ in a block. Also, $\mu_B$ equals the 'block average' $\sum\{w_i f_i : i \in B\} / \sum\{w_i : i \in B\}$. The PAV algorithm starts with the finest partition, whose blocks are single integers in $N$, and an initial solution $g$ that in general violates the constraint $g_1 \le \cdots \le g_n$. It successively merges blocks to reduce infeasibility, obtaining a new coarser partition $J$ and a solution $g(J)$ that may still be infeasible. It terminates when $g(J)$ becomes feasible, obtaining the optimal solution and the final optimal partition. Let $B = \{p, \dots, q\}$, $1 \le p \le q \le n$, denote a block of a partition $J$ during any iteration of the algorithm. The predecessor and successor blocks of $B$ in the same partition $J$, denoted respectively by $B^-$ and $B^+$, are defined as follows: if $p > 1$, then $B^-$ is the block containing $p - 1$, otherwise $B^- = \emptyset$. Similarly, if $q < n$, then $B^+$ is the block containing $q + 1$, otherwise $B^+ = \emptyset$.

It can be shown [4,21] that the algorithm has linear time complexity, $O(n)$, in the usual unit-cost RAM model [1]. We now describe an interesting implementation of the algorithm [21]. Let $W_i = \sum\{w_j : 1 \le j \le i\}$ and $F_i = \sum\{w_j f_j : 1 \le j \le i\}$, with $W_0 = F_0 = 0$. Clearly, $W_{i-1} < W_i$. By plotting the $n + 1$ points $P_i = (W_i, F_i)$, $0 \le i \le n$, in the plane we obtain the cumulative sum diagram (csd). Note that the slope of the line segment joining $P_{i-1}$ and $P_i$ is $f_i$. Let $Q_i = (W_i, G_i)$ be the greatest convex minorant (gcm), or lower convex hull, of the csd. This is the largest convex function which does not exceed the csd at any point. It is easy to see that $P_0 = Q_0$ and $P_n = Q_n$. Also, since the gcm is convex, the slopes $(G_i - G_{i-1})/(W_i - W_{i-1})$ are nondecreasing. It has been shown that $g_i = (G_i - G_{i-1})/(W_i - W_{i-1})$, $1 \le i \le n$, give an optimal solution to the isotonic regression problem. Graham's scan [12] can easily be modified to obtain the gcm in $O(n)$ time; see, e.g., [28]. Thus the overall algorithm is again $O(n)$. The treatment and the results for $d_2'$ are identical to those for $d_2$.

Initialization: Set J = {{i} : i ∈ N};
    compute the minimizer μ_B of A_B, B ∈ J
    (if B = {i}, then μ_B = f_i).
Set B = {1}, B+ = {2}, B− = ∅;
WHILE B+ ≠ ∅
    IF μ_B > μ_{B+} THEN
        merge B and B+
            (i.e. set J = J \ {B, B+} ∪ {B ∪ B+} and B = B ∪ B+);
        compute the new μ_B;
        set B+ appropriately (B− remains unchanged);
        WHILE B− ≠ ∅ and μ_{B−} > μ_B
            merge B and B−
                (i.e. set J = J \ {B, B−} ∪ {B ∪ B−} and B = B ∪ B−);
            compute the new μ_B;
            set B− appropriately (B+ remains unchanged);
        END WHILE
    ELSE set B = B+;
        set B− and B+ appropriately;
    END IF
END WHILE
g_i(J) = μ_B, i ∈ B ∈ J, is an optimal solution

Regression by Special Functions: Algorithms and Complexity, Algorithm 1
Algorithm PAV (pool adjacent violators)
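The pooling idea of Algorithm 1 can be sketched in Python as follows. This is a minimal, stack-based sketch assuming positive weights $w_i$ and the distance $d_2$; the function name `pav` is ours, not from the source. The stack holds the blocks of the current partition $J$ from left to right; pushing the next index plays the role of $B+$, and the merging loop performs the backward merges with $B-$. Each block keeps $\sum w_i f_i$ and $\sum w_i$, so its average $\mu_B$ is updated in constant time and the run is $O(n)$ amortized.

```python
def pav(f, w=None):
    """Weighted isotonic (nondecreasing) least-squares fit by pooling adjacent violators."""
    if w is None:
        w = [1.0] * len(f)
    blocks = []  # each block: [sum of w_i * f_i, sum of w_i, number of indices]
    for fi, wi in zip(f, w):
        blocks.append([wi * fi, wi, 1])
        # merge the last two blocks while their averages violate monotonicity
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, wt, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += wt
            blocks[-1][2] += c
    g = []
    for s, wt, c in blocks:
        g.extend([s / wt] * c)  # g_i(J) = mu_B, the block average
    return g


print(pav([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

Scanning left to right with a stack is one common way to realize the $B+$/$B-$ bookkeeping of Algorithm 1; the gcm-based implementation described above is an alternative with the same $O(n)$ bound.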

Now consider the problem of isotonic median regression, which is the problem of minimizing $d_1'(f, g)$ subject to $g \in K$ [7,23]. The PAV algorithm can be applied to this problem with $A_B(c) = \sum\{|f_i - c| : i \in B\}$. A minimizer $\mu_B$ of $A_B$ can easily be shown to be a median of the set $\{f_i : i \in B\}$. We choose $\mu_B$ to be the central median of the set $\{f_i : i \in B\}$, defined as follows. For an ordered (ascending) set $\{c_1, \dots, c_r\}$, it is $c_{(r+1)/2}$ if $r$ is odd, and

$(c_{r/2} + c_{r/2+1})/2$

if $r$ is even. (If $r$ is even, then both the lower and upper medians, $c_{r/2}$ and $c_{r/2+1}$, or any point in between the two, minimize $A_B(c)$. The nonuniqueness of the median is addressed later.) The PAV algorithm with this choice of the central median has complexity $O(n^2)$, since the median of an unordered set of cardinality $r$ can be computed in linear time $O(r)$ ([1, Algorithm 3.6]). Now consider minimizing $d_1(f, g)$. In this case, let $A_B(c) = \sum\{w_i |f_i - c| : i \in B\}$, whose minimizer $\mu_B$ is a weighted median of the set $\{f_i : i \in B\}$. Again it is not unique. We use the unique weighted central median of an ordered (ascending) set $\{c_1, \dots, c_r\}$ with weight $w_i$ for $c_i$, defined as follows. Let $\Lambda = \sum\{w_i : 1 \le i \le r\}$. The central median is $c_q$ if $\sum\{w_i : 1 \le i \le q-1\} < \Lambda/2 < \sum\{w_i : 1 \le i \le q\}$ for some $q$. It is $(c_{q-1} + c_q)/2$ if $\sum\{w_i : 1 \le i \le q-1\} = \Lambda/2$. (As before, both the lower and upper medians, $c_{q-1}$ and $c_q$, or any point in between the two, minimize $A_B(c)$.) Again it can be computed in linear time by extending [1, Algorithm 3.6] to the weighted case. It will be seen that the ratio $\max\{w_i\}/\min\{w_i\}$ plays an important role in the analysis. The PAV algorithm for this case has complexity $O(n^2)$.
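For $d_1$ the same pooling loop can be reused with the block average replaced by the weighted central median defined above. In the sketch below (hypothetical names `central_median` and `isotonic_median_regression`), each merged block is simply sorted, which costs $O(r \log r)$ instead of the linear-time selection of [1, Algorithm 3.6]; the exact tie test `below == half` assumes weights whose partial sums are computed exactly, such as small integers.

```python
def central_median(values, weights):
    """Weighted central median: c_q, or (c_{q-1} + c_q)/2 when the weight below q is exactly half."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    below = 0.0  # total weight of c_1, ..., c_{q-1}
    for q, (c, wq) in enumerate(pairs):
        if q > 0 and below == half:
            return (pairs[q - 1][0] + c) / 2.0  # midpoint of lower and upper median
        below += wq
        if below > half:
            return c
    return pairs[-1][0]  # reached only through rounding in the sums


def isotonic_median_regression(f, w=None):
    """PAV with A_B(c) = sum of w_i |f_i - c|, using the central median as mu_B."""
    if w is None:
        w = [1.0] * len(f)
    blocks = []  # each block: (mu_B, values in B, weights in B)
    for fi, wi in zip(f, w):
        blocks.append((fi, [fi], [wi]))
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            _, v2, w2 = blocks.pop()
            _, v1, w1 = blocks.pop()
            v, ww = v1 + v2, w1 + w2
            blocks.append((central_median(v, ww), v, ww))
    g = []
    for mu, vals, _ in blocks:
        g.extend([mu] * len(vals))
    return g


print(isotonic_median_regression([5, 1, 1]))  # [1, 1, 1]
```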
Now consider the nonuniqueness of the median. We may use the lower or upper median for $\mu_B$ in the PAV algorithm, since both minimize $A_B(c)$. When we consistently use the lower (resp. upper) median in the algorithm for $d_1$ or $d_1'$, we get the minimal (resp. maximal) optimal solution to the problem. This has been established in [5] for a general case of minimizing a separable convex function subject to the monotonicity constraint. This problem includes, as special cases, isotonic regression with distances $d_1$ and $d_2$, and other situations such as those in [25,26]. Now consider the problem of isotone optimization, i.e., minimizing $d_\infty$ and $d_\infty'$ [27]. Define $t_{ij} = w_i w_j/(w_i + w_j)$ and $\theta = \max\{t_{ij}(f_i - f_j) : 1 \le i \le j \le n\}$. Define two isotonic functions $\underline{g}$ and $\overline{g}$ by $\underline{g}_i = \max\{f_j - \theta/w_j : 1 \le j \le i\}$ and $\overline{g}_i = \min\{f_j + \theta/w_j : i \le j \le n\}$, $1 \le i \le n$. Then both $\underline{g}$ and $\overline{g}$ are optimal solutions to isotone optimization with distance function $d_\infty$, and $\theta = d_\infty(f, \underline{g}) = d_\infty(f, \overline{g})$. Furthermore, a monotonic $g$ is an optimal solution if and only if $\underline{g} \le g \le \overline{g}$. The computation of the solution clearly takes $O(n^2)$ time. The above results are also valid for $d_\infty'$, but in this case the results can be simplified and an $O(n)$ algorithm for the computation can be obtained ([28, Sect. 6]). Define $\underline{k}_1 = f_1$, $\underline{k}_i = \max\{\underline{k}_{i-1}, f_i\}$, $i = 2, \dots, n$, and $\overline{k}_n = f_n$, $\overline{k}_i = \min\{\overline{k}_{i+1}, f_i\}$, $i = n-1, n-2, \dots, 1$. Let $\theta = d_\infty'(\underline{k}, \overline{k})/2$. Then $\underline{g}_i = \underline{k}_i - \theta$ and $\overline{g}_i = \overline{k}_i + \theta$. Clearly, the algorithm is $O(n)$. For isotonic regression problems with integer constraints see [10] and [18].
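The $O(n)$ procedure for $d_\infty'$ needs only two running scans. A minimal Python sketch, assuming unit weights; the name `isotone_minimax` and the variable names `k_lo` (for $\underline{k}$) and `k_hi` (for $\overline{k}$) are ours.

```python
def isotone_minimax(f):
    """Unweighted isotone fit minimizing max_i |f_i - g_i| (distance d'_inf)."""
    n = len(f)
    k_lo = list(f)  # running maxima from the left
    for i in range(1, n):
        k_lo[i] = max(k_lo[i - 1], f[i])
    k_hi = list(f)  # running minima from the right
    for i in range(n - 2, -1, -1):
        k_hi[i] = min(k_hi[i + 1], f[i])
    theta = max(a - b for a, b in zip(k_lo, k_hi)) / 2.0  # optimal error
    g_min = [k - theta for k in k_lo]  # minimal optimal solution
    g_max = [k + theta for k in k_hi]  # maximal optimal solution
    return theta, g_min, g_max


print(isotone_minimax([3, 1, 2]))  # (1.0, [2.0, 2.0, 2.0], [2.0, 2.0, 3.0])
```

Any isotonic $g$ lying between the two returned vectors is also optimal, in line with the characterization above.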


Quasiconvex and Umbrella Regression 


A function $g = (g_1, \dots, g_n) \in \mathbb{R}^n$ is called quasiconvex if $g_i \le \max\{g_p, g_q\}$ holds for all $i$ with $p \le i \le q$, for all $1 \le p \le q \le n$ [22]. A function $g$ is called quasiconcave or umbrella if $-g$ is quasiconvex. It can be shown that $g \in \mathbb{R}^n$ is quasiconvex if and only if there exists an index $r$, $1 \le r \le n$, such that $g_1 \ge \cdots \ge g_r \le g_{r+1} \le \cdots \le g_n$. (Thus $g_r = \min\{g_i : 1 \le i \le n\}$.) [8,29]. If $r = 1$ (resp. $r = n$), then $g$ is nondecreasing (resp. nonincreasing). Similar results may be stated for a quasiconcave function. In what follows we discuss regression by quasiconvex functions; the results for quasiconcave functions are symmetric.

The quasiconvex problem can be transformed into $2n$ isotonic subproblems by the observation made above. Let $K$ denote the set of all quasiconvex $g$, and let $K_r$ denote all $g = (g_1, \dots, g_n)$ with $g_1 \ge \cdots \ge g_r$ and $g_{r+1} \le \cdots \le g_n$, where $1 \le r \le n$. Note that $K_r$ is the set of all quasiconvex $g$ such that $g_r$ or $g_{r+1}$ equals $\min\{g_i : 1 \le i \le n\}$. Clearly, $K = \bigcup\{K_r : 1 \le r \le n\}$. It is easy to see that $K$ is a closed cone which is not convex, although $K_r$, for each fixed $r$, is a closed convex cone. Hence optimal solutions obtained in this section are not necessarily unique. To solve the quasiconvex regression problem of minimizing $d_2(f, g)$ subject to $g \in K$, we consider the following two subproblems for each $r$:

• $P1_r$: Find $g_i$, $1 \le i \le r$, and $\Delta_{1r}$ so that $\Delta_{1r} = \min \sum\{w_i (f_i - g_i)^2 : 1 \le i \le r\}$ subject to $g_1 \ge \cdots \ge g_r$, and

• $P2_r$: Find $g_i$, $r + 1 \le i \le n$, and $\Delta_{2r}$ so that $\Delta_{2r} = \min \sum\{w_i (f_i - g_i)^2 : r + 1 \le i \le n\}$ subject to $g_{r+1} \le \cdots \le g_n$.

We then minimize $\Delta_{1r} + \Delta_{2r}$ subject to $1 \le r \le n$.

Note that both $P1_r$ and $P2_r$ are isotonic regression problems. They can be solved in $O(r)$ and $O(n - r)$ time, respectively, using the PAV algorithm of the previous section. Thus, the quasiconvex regression problem can be solved in $O(n^2)$ time. We have shown in [31] that the complexity can be improved to $O(n)$ by using special mathematical results and suitable data structures. Our algorithm uses one forward and one backward pass on the indexes $1, \dots, n$ to obtain the unique optimal solutions of both $P1_r$ and $P2_r$ and the values $\Delta_{1r}$ and $\Delta_{2r}$ for all $r$. Although we use Graham's scan and the gcm discussed in the previous section to compute the solutions of the isotonic regression subproblems $P1_r$ and $P2_r$, alternative schemes without the gcm can easily be devised. Another algorithm, of unknown complexity, appears in a later article [9]. Now consider the quasiconvex median regression problem of minimizing $d_1(f, g)$ subject to $g \in K$. We may consider problems $P1_r$ and $P2_r$ as above, where $\sum w_i (f_i - g_i)^2$ is replaced by $\sum w_i |f_i - g_i|$. These isotonic median regression problems can be solved in quadratic time, giving an overall complexity of $O(n^3)$. Whether this complexity can be improved is an open issue at this time.
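The $O(n^2)$ scheme, one pair $P1_r$, $P2_r$ per split point $r$, can be sketched by calling a compact PAV routine on each subproblem; the nonincreasing problem $P1_r$ is solved as a nondecreasing problem on the reversed data. This is not the $O(n)$ algorithm of [31]; the helper `_pav` and the function `quasiconvex_regression` are hypothetical names, and positive weights are assumed.

```python
def _pav(f, w):
    """Nondecreasing weighted least-squares fit; returns (fit, weighted sum of squares)."""
    blocks = []  # each block: [sum of w_i * f_i, sum of w_i, count]
    for fi, wi in zip(f, w):
        blocks.append([wi * fi, wi, 1])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, wt, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += wt
            blocks[-1][2] += c
    g = [s / wt for s, wt, c in blocks for _ in range(c)]
    sse = sum(wi * (fi - gi) ** 2 for fi, wi, gi in zip(f, w, g))
    return g, sse


def quasiconvex_regression(f, w=None):
    """O(n^2) quasiconvex fit for d_2: solve P1_r and P2_r for every r, keep the best."""
    n = len(f)
    w = [1.0] * n if w is None else list(w)
    best = None
    for r in range(1, n + 1):
        # P1_r (nonincreasing on 1..r) = nondecreasing fit of the reversed data
        g1, d1 = _pav(f[:r][::-1], w[:r][::-1])
        g2, d2 = _pav(f[r:], w[r:])  # P2_r (nondecreasing on r+1..n)
        if best is None or d1 + d2 < best[0]:
            best = (d1 + d2, r, g1[::-1] + g2)
    sse, r, g = best
    return g, sse, r


print(quasiconvex_regression([1, 3, 2]))  # ([1.0, 2.5, 2.5], 0.5, 1)
```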

The strategy of transforming the quasiconvex problem into $2n$ isotonic subproblems can also be used for $d_\infty$. Since each isotonic subproblem can be solved in $O(n^2)$ time, as for isotonic regression (see above), it may seem that the complexity of the quasiconvex problem is $O(n^3)$. A little reflection shows that the computations can be organized in $O(n^2)$ time. Indeed, using the notation for isotonic regression above, the constants $t_{ij}(f_i - f_j)$, $1 \le i < j \le n$, can be computed in $O(n^2)$ time. Then the $\theta$'s needed for the subproblems can be computed recursively in $O(n^2)$ time. The rest of the computations are $O(n)$. If $d_\infty'$ is used, then the complexity can be improved to $O(n)$. Let $f_m = \min\{f_i : 1 \le i \le n\}$; $m$ need not be unique. Define $\underline{k}_m = f_m$, $\underline{k}_i = \max\{\underline{k}_{i-1}, f_i\}$, $i = m+1, m+2, \dots, n$, $\underline{k}_i = \max\{\underline{k}_{i+1}, f_i\}$, $i = m-1, m-2, \dots, 1$, $\theta = d_\infty'(f, \underline{k})/2$, and $\underline{g}_i = \underline{k}_i - \theta$, $1 \le i \le n$. Then $\underline{g}$ is an optimal solution to the problem. Also, define $\overline{k}_1 = f_1$, $\overline{k}_i = \min\{\overline{k}_{i-1}, f_i\}$, $i = 2, \dots, m-1$, $\overline{k}_n = f_n$, $\overline{k}_i = \min\{\overline{k}_{i+1}, f_i\}$, $i = n-1, n-2, \dots, m+1$, $\overline{k}_m = f_m$, and $\overline{g}_i = \overline{k}_i + \theta$, $1 \le i \le n$, with the same $\theta$. Then $\overline{k}$ is the greatest quasiconvex minorant of $f$, i.e., the largest quasiconvex function such that $\overline{k}_i \le f_i$ for all $i$, and $\overline{g}$ is the maximal optimal solution to the problem. Clearly, the computations are $O(n)$ [29]. This problem on an interval is considered in [30].
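The $O(n)$ construction for $d_\infty'$ translates directly into Python. In this sketch (hypothetical name `quasiconvex_minimax`), indices are 0-based, $m$ is taken as the first index of a smallest $f_i$, `maj` holds $\underline{k}$ and `minor` holds $\overline{k}$; the same $\theta$ is used for both solutions.

```python
def quasiconvex_minimax(f):
    """O(n) quasiconvex fit minimizing max_i |f_i - g_i| (distance d'_inf)."""
    n = len(f)
    m = min(range(n), key=lambda i: f[i])  # an index with f_m = min f_i
    # quasiconvex majorant through f_m: running maxima moving away from m
    maj = list(f)
    for i in range(m + 1, n):
        maj[i] = max(maj[i - 1], f[i])
    for i in range(m - 1, -1, -1):
        maj[i] = max(maj[i + 1], f[i])
    theta = max(a - b for a, b in zip(maj, f)) / 2.0  # optimal error
    g = [k - theta for k in maj]  # an optimal solution
    # greatest quasiconvex minorant: running minima moving towards m
    minor = list(f)
    for i in range(1, m):
        minor[i] = min(minor[i - 1], f[i])
    for i in range(n - 2, m, -1):
        minor[i] = min(minor[i + 1], f[i])
    g_max = [k + theta for k in minor]  # the maximal optimal solution
    return theta, g, g_max


print(quasiconvex_minimax([1, 3, 2]))  # (0.5, [0.5, 2.5, 2.5], [1.5, 2.5, 2.5])
```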


Convex and Concave Regression 


We consider functions $f$, $g$, etc. on a set $S = \{x_1, \dots, x_n\}$, so that $f_i = f(x_i)$, $g_i = g(x_i)$, etc. Then $g$ is convex if $(g_i - g_{i-1})/\delta_{i-1} \le (g_{i+1} - g_i)/\delta_i$, $2 \le i \le n - 1$, where $\delta_i = x_{i+1} - x_i$, $1 \le i \le n - 1$. These constraints are linear. If the points $x_i$ are equally spaced, i.e., all $\delta_i$ have identical values, then the convexity condition becomes $g_{i+1} - 2g_i + g_{i-1} \ge 0$, $2 \le i \le n - 1$. Let $K$ be the set of all convex functions. Then $K$ is a closed convex cone. We first consider the convex regression problem of minimizing $d_2(f, g)$ ($d_2'(f, g)$) subject to $g \in K$. Clearly, this is a quadratic programming problem where $d_2$ ($d_2'$) is a strictly convex function of $g$. Its solution is unique. The problem may be formulated as a linear complementarity problem and solved [3,20]. Special methods as in [16,17], and other references given there, may be applied. Some earlier work appears in [13,14]. Results on complexity analysis of algorithms for these problems are, in general, lacking. The problem with the $d_1$, $d_1'$ or $d_\infty$ distance function can be formulated as a linear programming problem [2,19] and solved. Now consider $d_\infty'$. Let $k$ be the greatest convex minorant of $f$, i.e., the largest convex function such that $k_i \le f_i$ for all $i$. It may easily be computed in $O(n)$ time using Graham's scan, as for isotonic regression. Let $\theta = d_\infty'(f, k)/2$ and $g_i = k_i + \theta$, $1 \le i \le n$. Then $g$ is a solution to the problem, computed in $O(n)$ time. Indeed, it is the maximal optimal solution to the problem [28,32].
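A Python sketch of the convex case with $d_\infty'$, assuming strictly increasing abscissae $x_1 < \cdots < x_n$ with $n \ge 2$. The greatest convex minorant is obtained as the lower convex hull of the points $(x_i, f_i)$ with Andrew's monotone-chain scan (a variant of Graham's scan for presorted points) and is then evaluated at every $x_i$; the function names are hypothetical.

```python
def greatest_convex_minorant(x, f):
    """Lower convex hull of (x_i, f_i), x increasing, evaluated at every x_i."""
    hull = []
    for p in zip(x, f):
        # drop the last hull point while it does not make a strict left turn
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    k, j = [], 0
    for xi in x:
        # advance to the hull segment [hull[j], hull[j + 1]] containing xi
        while j + 2 < len(hull) and hull[j + 1][0] <= xi:
            j += 1
        (x1, y1), (x2, y2) = hull[j], hull[j + 1]
        k.append(y1 + (y2 - y1) * (xi - x1) / (x2 - x1))
    return k


def convex_minimax(x, f):
    """Maximal optimal convex fit for d'_inf: g_i = k_i + theta."""
    k = greatest_convex_minorant(x, f)
    theta = max(fi - ki for fi, ki in zip(f, k)) / 2.0
    return theta, [ki + theta for ki in k]


print(convex_minimax([0, 1, 2, 3], [0, 2, 1, 3]))  # (0.75, [0.75, 1.25, 1.75, 3.75])
```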

Since regression problems are indeed approxima- 
tion problems, some of the results of [33] are applica- 
ble to our problems. In particular, note the significance 
of the dual cone of the cone K of isotone functions as 
stated in the last paragraph there. 


See also 


> Isotonic Regression Problems 
Projections


Projection methods have become a useful technique for solving the convex feasibility problem

(CFP) Find $x \in C$,

where $C$ is a closed, convex, nonempty set defined in a Hilbert space $H$ endowed with an inner product $\langle \cdot, \cdot \rangle : H^2 \to \mathbb{R}$ and the induced norm $\|x\|^2 = \langle x, x \rangle$.

Projection methods are iterative. Given $x_i \notin C$, the standard iterative scheme is

$x_{i+1} = x_i + \omega_i (P_{S_i}(x_i) - x_i)$, (1)

where $S_i \supseteq C$ is a closed convex set, $P_{S_i}(x_i)$ is the projection of $x_i$ on the set $S_i$, and $\omega_i \in \mathbb{R}$ is the relaxation parameter. If $\omega_i < 1$ we call it an underprojection, and if $\omega_i > 1$ we call it an overprojection. In general, $0 < \eta \le \omega_i \le 2 - \eta$ is required to ensure convergence.

We attempt to present an overview of projection techniques. This section will cover the choice of the supersets $\{S_i\}_0^\infty$. The next section will study the relaxation parameter, and a third section offers additional material on the subject, including pertinent references. We use a standard notation with minor peculiarities that should cause no difficulty to the reader.

Let us start by recalling that $P_S(x)$, the projection of $x$ on a closed convex nonempty set $S$, is the point in $S$ closest to $x$, namely,

$P_S(x) = \arg\min_{z \in S} \|z - x\|^2$. (2)

By the minimum principle, (2) holds if and only if $P_S(x) \in S$ and

$[z \in S] \Rightarrow [\langle z - P_S(x), x - P_S(x) \rangle \le 0]$. (3)

Since

$\langle P_S(x) - x, x - z \rangle = \langle P_S(x) - x, x - P_S(x) + P_S(x) - z \rangle$,

we immediately obtain that (3) is equivalent to

$[z \in S] \Rightarrow [\langle P_S(x) - x, x - z \rangle \le -\|P_S(x) - x\|^2]$. (4)


In general, the projection problem (2) is difficult, but sometimes it is a straightforward computation. If $S$ is a halfspace given by $S := \{z \in H : \langle a, z \rangle \le \beta\}$, where $a \in H$, $\beta \in \mathbb{R}$, the projection of $x$ onto $S$ is given by:

$P_S(x) = x$ if $x \in S$, and $P_S(x) = x - \dfrac{\langle a, x \rangle - \beta}{\langle a, a \rangle}\, a$ otherwise. (5)

We can easily verify that (3) holds.
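Formula (5) and one relaxed step of scheme (1) are short enough to write out directly. A minimal Python sketch on plain lists (the names `project_halfspace` and `relaxed_step` are ours); the final lines check inequality (3) for one point $z \in S$.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def project_halfspace(x, a, beta):
    """Projection (5) of x onto S = {z : <a, z> <= beta}, assuming a != 0."""
    s = dot(a, x)
    if s <= beta:
        return list(x)  # x already belongs to S
    t = (s - beta) / dot(a, a)
    return [xi - t * ai for xi, ai in zip(x, a)]


def relaxed_step(x, project, omega):
    """One step of scheme (1): x + omega * (P_S(x) - x)."""
    p = project(x)
    return [xi + omega * (pi - xi) for xi, pi in zip(x, p)]


x = [2.0, 2.0]
p = project_halfspace(x, [1.0, 1.0], 2.0)  # [1.0, 1.0]
z = [0.0, -3.0]  # a point of S
# inequality (3): <z - P_S(x), x - P_S(x)> <= 0
print(dot([zi - pi for zi, pi in zip(z, p)], [xi - pi for xi, pi in zip(x, p)]))  # -5.0
```

With `omega = 2` the step reflects $x$ through the boundary hyperplane, the extreme case of overprojection.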

In most techniques based on (1), $S_i$ is a halfspace, a hyperplane, or an appropriate set that renders (2) an easy problem. For instance, if $C$ is the solution set of a linear system of equations in the Euclidean space $\mathbb{R}^n$, i.e., $C = \{z \in \mathbb{R}^n : Az = b\}$, block action methods split the system into $p$ subsystems, i.e.,

$[Az = b] :\Leftrightarrow [A_1 z = b_1, \dots, A_p z = b_p]$.

Let $C_k = \{z \in \mathbb{R}^n : A_k z = b_k\}$, for $k = 1, \dots, p$. If $A_k$ has full row rank and $A_k^T$ is its transpose, then

$P_{C_k}(x_i) = x_i - A_k^T (A_k A_k^T)^{-1} (A_k x_i - b_k)$.

Observe that (3) and (4) hold as equalities. A standard method chooses

$S_i = C_k$, $k = (i \bmod p) + 1$.

Convergence will be apparent in the next section.
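The cyclic block scheme for $Az = b$ can be sketched with NumPy, assuming each block $A_k$ has full row rank so that $A_k A_k^T$ is invertible. The names are hypothetical, blocks are given as lists of row indices, and with 0-based indexing `i % p` selects block $k - 1$ for $k = (i \bmod p) + 1$. Solving a linear system instead of forming $(A_k A_k^T)^{-1}$ explicitly is the usual numerical choice.

```python
import numpy as np


def block_projection(x, A_k, b_k):
    """P_{C_k}(x) = x - A_k^T (A_k A_k^T)^{-1} (A_k x - b_k), A_k of full row rank."""
    residual = A_k @ x - b_k
    return x - A_k.T @ np.linalg.solve(A_k @ A_k.T, residual)


def cyclic_block_projections(A, b, blocks, x0, sweeps=100):
    """Scheme (1) with omega_i = 1 and S_i = C_k, k = (i mod p) + 1."""
    x = np.asarray(x0, dtype=float)
    p = len(blocks)
    for i in range(sweeps * p):
        rows = blocks[i % p]
        x = block_projection(x, A[rows], b[rows])
    return x


A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]])
x_true = np.array([1.0, -1.0, 2.0])
b = A @ x_true
x = cyclic_block_projections(A, b, [[0, 1], [2, 3]], np.zeros(3))
print(np.allclose(x, x_true))  # True
```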
An important CFP is to find $x \in C$, where $C$ is defined by convex inequalities, i.e.,

$C = \{z \in H : f_j(z) \le b^j,\ j = 1, \dots, m\}$, (6)

where $f_j(\cdot) : H \to \mathbb{R}$ are convex subdifferentiable functions and $b^j$ are scalars, for $j = 1, \dots, m$. Convexity of $C$ can be shown by well-known properties of convex functions that we state for completeness.

Lemma 1 Let $f(\cdot) : H \to \mathbb{R}$. Let $v_1, \dots, v_p$ be vectors in $H$, and $\mu_1, \dots, \mu_p$ be nonnegative scalars such that $\sum_{k=1}^p \mu_k = 1$. Then $f(\cdot)$ is convex if and only if, for all such choices,

$f\left(\sum_{k=1}^p \mu_k v_k\right) \le \sum_{k=1}^p \mu_k f(v_k)$. (7)

Lemma 2 If $f_1(\cdot), \dots, f_m(\cdot) : H \to \mathbb{R}$ are convex functions and $\nu_1, \dots, \nu_m$ are nonnegative scalars, then $\sum_{j=1}^m \nu_j f_j(\cdot)$ is also convex.

Lemma 3 If $f(\cdot) : H \to \mathbb{R}$ is convex and subdifferentiable, with subgradient $\partial f(\cdot) : H \to H$, then

$[\forall x, y \in H] \Rightarrow [f(y) \ge f(x) + \langle \partial f(x), y - x \rangle]$

and (obviously)

$[\beta \ge f(y)] \Rightarrow [\beta \ge f(x) + \langle \partial f(x), y - x \rangle]$.


We turn our attention to the choice of $\{S_i\}_0^\infty$ for solving $z \in C$ defined by (6). Given $x_i$, define the halfspace

$C_j(x_i) = \{z \in H : f_j(x_i) + \langle \partial f_j(x_i), z - x_i \rangle \le b^j\}$. (8)

As $[z \in C] \Rightarrow [b^j \ge f_j(z)]$, we assert by Lemma 3 that $C_j(x_i) \supseteq C$. In the next section it will be evident that convergence of (1) is ensured if $j = (i \bmod m) + 1$ and $S_i = C_j(x_i)$.
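The cyclic choice $S_i = C_j(x_i)$ with $j = (i \bmod m) + 1$ gives a subgradient projection method. A minimal Python sketch with $\omega_i = 1$; the constraint format (function $f_j$, a subgradient of $f_j$, $b^j$), the function names, and the two-constraint example (the unit disk intersected with $z^1 \ge 0.5$) are illustrative assumptions.

```python
def halfspace_step(x, f, subgrad, b):
    """Project x onto C_j(x) = {z : f(x) + <df(x), z - x> <= b}, halfspace (8)."""
    fx = f(x)
    if fx <= b:
        return list(x)  # x already satisfies the j-th inequality
    g = subgrad(x)  # nonzero whenever f(x) > b and C is nonempty
    t = (fx - b) / sum(gi * gi for gi in g)
    return [xi - t * gi for xi, gi in zip(x, g)]


def cyclic_subgradient_projections(x0, constraints, sweeps=200):
    """Scheme (1) with omega_i = 1 and S_i = C_j(x_i), j = (i mod m) + 1."""
    x = list(x0)
    m = len(constraints)
    for i in range(sweeps * m):
        f, subgrad, b = constraints[i % m]
        x = halfspace_step(x, f, subgrad, b)
    return x


# C = {z : z1^2 + z2^2 <= 1, -z1 <= -0.5}
constraints = [
    (lambda z: z[0] ** 2 + z[1] ** 2, lambda z: [2 * z[0], 2 * z[1]], 1.0),
    (lambda z: -z[0], lambda z: [-1.0, 0.0], -0.5),
]
x = cyclic_subgradient_projections([3.0, 3.0], constraints)
print(x[0] ** 2 + x[1] ** 2 <= 1 + 1e-9, x[0] >= 0.5)  # True True
```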


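Let $f_j^+(\cdot) := \max(0, f_j(\cdot))$. Another possible choice for $S_i$ is

$S_i = \left\{z \in H : \sum_{j=1}^m f_j^+(x_i)\left[f_j(x_i) + \langle \partial f_j(x_i), z - x_i \rangle\right] \le \sum_{j=1}^m f_j^+(x_i)\, b^j\right\}$. (9)

By Lemma 3, $S_i$ contains the convex set $C_i$ given next (convex by Lemma 2). It is obvious that $C_i \supseteq C$:

$C_i = \left\{z \in H : \sum_{j=1}^m f_j^+(x_i) f_j(z) \le \sum_{j=1}^m f_j^+(x_i)\, b^j\right\}$.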

Relaxation 


Both $\{S_i\}_0^\infty$ and $\{\omega_i\}_0^\infty$ significantly influence the quality of convergence of projection methods. This section is concerned with the relaxation parameter. We would like to choose $\{\omega_i\}_0^\infty$ such that $z \in C$ is obtained with the fewest iterations possible. Using the iterative scheme (1), we have for all $z \in S_i$ that

$\|x_{i+1} - z\|^2 = \|x_i - z + \omega_i (P_{S_i}(x_i) - x_i)\|^2$
$\quad = \|x_i - z\|^2 + 2\omega_i \langle x_i - z, P_{S_i}(x_i) - x_i \rangle + \omega_i^2 \|P_{S_i}(x_i) - x_i\|^2$.

Since $S_i \supseteq C$, we conclude by (4) that

$[z \in C] \Rightarrow [\|x_{i+1} - z\|^2 \le \|x_i - z\|^2 - \omega_i (2 - \omega_i) \|P_{S_i}(x_i) - x_i\|^2]$. (10)


We may reasonably expect that $\omega_i = 1$ is the 'best' choice, because $\omega(2 - \omega)$ achieves its maximum value at $\omega = 1$ and therefore $x_{i+1}$ is then the 'closest' to the set $C$; however, $\omega_i = 1$ for all $i$ can be a very poor choice. Let us illustrate this fact with the following example in the Euclidean space $\mathbb{R}^2$, $z = (z^1, z^2)$:

Example 4 Let

$C = \{z : z^2 \le 0,\ (\sin\alpha) z^1 + (\cos\alpha) z^2 \ge \cos\alpha\}$,

$S_i = \{z : (\sin\alpha) z^1 + (\cos\alpha) z^2 \ge \cos\alpha\}$ if $i$ is even, and $S_i = \{z : z^2 \le 0\}$ if $i$ is odd.

Now the problem is: starting at $x_0 = 0$, estimate the number of iterations needed to generate $z \in C$.

• (Scheme 1) Assume exact projections, i.e., $x_{i+1} = P_{S_i}(x_i)$ for all $i$. We deduce from Fig. 1 that for all $i$:

$\|x_{i+1} - z\| = \|x_i - z\| \cos\alpha$. (11)

Consequently:

$\|x_i - z\| = \|x_0 - z\| \cos^i\alpha$, for all $i$, (12)

which means that the smaller $\alpha$, the bigger the number of iterations. Indeed, if $\alpha = 10^\circ$ and $x_0$ is the origin, we need more than 700 iterations ($i > 700$) to ensure that $\|x_i - z\| < 10^{-4}$.

Relaxation in Projection Methods, Figure 1
Successive projections (lines $L_i$, $L_{i+2}$)

• (Scheme 2) Assume

$x_{i+1} = x_i + 2(P_{S_i}(x_i) - x_i)$ if $i$ is even, and $x_{i+1} = P_{S_i}(x_i)$ if $i$ is odd.

We deduce from Fig. 2, for $i$ even, that

$\|x_{i+1} - z\| = \|x_i - z\|$,
$\|x_{i+2} - z\| = \|x_{i+1} - z\| \cos(2\alpha) = \|x_i - z\| \cos(2\alpha)$.

Consequently:

$\|x_{2i} - z\| = \|x_0 - z\| \cos^i(2\alpha)$. (13)

If $\alpha = 10^\circ$ and $x_0$ is the origin, then $\|x_i - z\| < 10^{-4}$ for all $i > 350$, a significant reduction in the number of iterations.

A small angle $\alpha$ means a slow rate of convergence, and overprojection ($\omega > 1$) has the desirable practical effect of opening up this angle. Convergence of projection methods may improve significantly if an optimum value of the relaxation parameter is chosen; however, this choice is often difficult. We argue that, in general, it is beneficial to overproject ($\omega > 1$), despite some rare examples that require underprojection ($\omega < 1$) to achieve a better (linear) rate of convergence. Figure 3 illustrates this latter possibility: the convex set $C$ is the unique intersection point of a bundle of straight lines passing through it.

Relaxation in Projection Methods, Figure 3
Underprojection; projection
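Example 4 can be replayed numerically. The Python sketch below assumes $\alpha = 10^\circ$, the starting point $x_0 = 0$, and takes $z$ to be the apex $(\cot\alpha, 0)$ of the wedge $C$, the point the iterates approach; the function names are hypothetical. With `omega_even = 1` it performs Scheme 1 and with `omega_even = 2` Scheme 2, counting iterations until $\|x_i - z\| < 10^{-4}$.

```python
import math


def project_halfspace(x, a, beta):
    """Projection (5) onto {z : <a, z> <= beta} in R^2."""
    s = a[0] * x[0] + a[1] * x[1]
    if s <= beta:
        return x
    t = (s - beta) / (a[0] ** 2 + a[1] ** 2)
    return (x[0] - t * a[0], x[1] - t * a[1])


def example4_iterations(alpha_deg, omega_even, tol=1e-4, max_iter=100_000):
    """Iterations of Example 4 until ||x_i - z|| < tol, with z the apex of C."""
    a = math.radians(alpha_deg)
    # (sin a) z1 + (cos a) z2 >= cos a, rewritten as <(-sin a, -cos a), z> <= -cos a
    s_even = ((-math.sin(a), -math.cos(a)), -math.cos(a))
    s_odd = ((0.0, 1.0), 0.0)  # z2 <= 0
    apex = (1.0 / math.tan(a), 0.0)
    x = (0.0, 0.0)
    for i in range(max_iter):
        if math.dist(x, apex) < tol:
            return i
        normal, beta = s_even if i % 2 == 0 else s_odd
        omega = omega_even if i % 2 == 0 else 1.0
        p = project_halfspace(x, normal, beta)
        x = (x[0] + omega * (p[0] - x[0]), x[1] + omega * (p[1] - x[1]))
    return None


print(example4_iterations(10, 1.0), example4_iterations(10, 2.0))
```

Under these assumptions the two counts land just above 700 and just above 350, consistent with (12) and (13).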


Fejér Property 

Most convergence analyses of projection methods are derived from the so-called Fejér property, namely, the monotonicity of the sequence $\{\|x_i - z\|\}_0^\infty$. Note from (10) that

$[0 < \omega_i < 2,\ z \in C] \Rightarrow [\|x_{i+1} - z\| \le \|x_i - z\|]$. (14)

Hence, a sufficient condition to preserve the Fejér property is that the value of the relaxation parameter $\omega_i$ belongs to the interval $[0, 2]$. To ensure convergence a stronger condition is usually imposed: $\{\omega_i\}_0^\infty$ must stay uniformly bounded away from 0 and 2. We will show below that this condition is not necessary. We will obtain a relaxation parameter $\omega > 2$, preserving the Fejér property and not impairing convergence. Moreover, numerical results reveal that the quality of convergence improves significantly. In order to present the key facts without obscuring our presentation, we focus our attention on parallel algorithms for solving the convex intersection problem (CIP), namely: find $z \in C$, where $C$ is a nonempty intersection of a finite collection of $p$ convex sets, that is,

$C = C_1 \cap \cdots \cap C_p$.

Relaxation in Projection Methods, Figure 2
Overprojections

We assume (again for the sake of clarity) that the projection problem

$y_{ik} = \arg\min_{x \in C_k} \|x - x_i\|^2$ (15)

is easily solved.


Problem  CIP
Data     x_0, an estimate of z ∈ C,
         η: 0 < η ≤ 1,
         δ: 0 < δ ≤ 1/p,
         i = 0, the iteration number
REPEAT
    FOR k = 1, ..., p DO
        IN parallel
            Let y_ik be defined by (15)
            d_ik = y_ik − x_i
        END parallel
        Choose ω_ik: η ≤ ω_ik ≤ 2 − η,
        Choose μ_ik ≥ δ: Σ_{k=1}^p μ_ik = 1
    END FOR
    x_{i+1} = x_i + Σ_{k=1}^p μ_ik ω_ik d_ik,  i = i + 1
UNTIL Convergence

Parallel projection algorithm (PPA)
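A sequential Python sketch of PPA for halfspaces $C_k = \{z : \langle a_k, z \rangle \le \beta_k\}$, assuming the simplest admissible choices $\mu_{ik} = 1/p$ and a constant $\omega_{ik} = \omega$. The 'IN parallel' loop is emulated by a list comprehension, and the stopping rule $\sum_k \|d_{ik}\| \le \varepsilon$ relies on $d_{ik} \to 0$. Names and the three-halfspace example are hypothetical.

```python
import math


def halfspace_projector(a, beta):
    """Return P_{C_k} for the halfspace C_k = {z : <a, z> <= beta}."""
    aa = sum(ai * ai for ai in a)

    def project(x):
        t = max(0.0, sum(ai * xi for ai, xi in zip(a, x)) - beta) / aa
        return [xi - t * ai for xi, ai in zip(x, a)]

    return project


def ppa(x0, projectors, omega=1.0, eps=1e-10, max_iter=10_000):
    """PPA with mu_ik = 1/p and omega_ik = omega (eta <= omega <= 2 - eta)."""
    x = list(x0)
    p = len(projectors)
    for it in range(max_iter):
        # d_ik = y_ik - x_i; the p projections are independent (the parallel step)
        ds = [[yj - xj for yj, xj in zip(proj(x), x)] for proj in projectors]
        if sum(math.sqrt(sum(dj * dj for dj in d)) for d in ds) <= eps:
            return x, it
        x = [xj + sum(omega * d[j] for d in ds) / p for j, xj in enumerate(x)]
    return x, max_iter


cons = [halfspace_projector([-1.0, 0.0], -1.0),  # z1 >= 1
        halfspace_projector([0.0, -1.0], -1.0),  # z2 >= 1
        halfspace_projector([1.0, 1.0], 4.0)]    # z1 + z2 <= 4
x, iters = ppa([0.0, 0.0], cons)
print(x, iters)
```

Raising `omega` towards 2 (overprojection) reduces the iteration count on this example.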


Above we have presented a parallel projection algorithm (PPA); we will now sketch its proof of convergence under suitable standard conditions. Then we introduce a $\lambda$ factor to modify the relaxation parameters and argue that the solution of a quadratic programming problem will lead to optimal relaxation. Finally, we present the accelerated parallel algorithm (APA) that subsumes our work.


Convergence of PPA 


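By using (10), (7) on the convex function $f(\cdot) = \|\cdot\|^2$, and the definitions of $\omega_{ik}$ and $\mu_{ik}$, we obtain for $z \in C$ that

$\|x_{i+1} - z\|^2 = \left\|x_i + \sum_{k=1}^p \mu_{ik}\omega_{ik} d_{ik} - z\right\|^2$
$\quad = \left\|\sum_{k=1}^p \mu_{ik}(x_i + \omega_{ik} d_{ik} - z)\right\|^2$
$\quad \le \sum_{k=1}^p \mu_{ik} \|x_i + \omega_{ik} d_{ik} - z\|^2$
$\quad \le \sum_{k=1}^p \mu_{ik} \left[\|x_i - z\|^2 - \omega_{ik}(2 - \omega_{ik}) \|d_{ik}\|^2\right]$
$\quad \le \|x_i - z\|^2 - \eta \sum_{k=1}^p \mu_{ik} \|d_{ik}\|^2$
$\quad \le \|x_i - z\|^2 - \eta\delta \sum_{k=1}^p \|d_{ik}\|^2$.

From the last inequality we observe that the Fejér property is maintained. Hence, $\{x_i\}_0^\infty$ is a bounded sequence. Moreover, $\{\|x_i - z\|\}_0^\infty$ decreases monotonically and is bounded below; therefore it has a limit. This forces $\{\sum_{k=1}^p \|d_{ik}\|^2\}_0^\infty \to 0$. Hence, $\{d_{ik}\}_0^\infty \to 0$, $k = 1, \dots, p$, which happens to be a convenient convergence test.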

So far the relaxation parameters $\mu_{ik}\omega_{ik}$ belong to $(0, 2)$, as required by standard convergence analysis for projection methods. We now show the existence of a $\lambda$ factor that, in general, takes the relaxation parameters out of the interval $(0, 2)$, while the Fejér property of the sequence $\{\|x_i - z\|\}_0^\infty$ and the desired convergence condition are preserved.

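Let us put

$x(\lambda) := x_i + \lambda \sum_{k=1}^p \mu_{ik}\omega_{ik} d_{ik}$. (16)

We then obtain for $\lambda \ge 0$ that

$[z \in C] \Rightarrow [\|x(\lambda) - z\|^2 \le \|x_i - z\|^2 + \varphi(\lambda)]$, (17)

where, by (4),

$\varphi(\lambda) = -2\lambda \sum_{k=1}^p \mu_{ik}\omega_{ik} \|d_{ik}\|^2 + \lambda^2 \left\|\sum_{k=1}^p \mu_{ik}\omega_{ik} d_{ik}\right\|^2$. (18)

The minimum of $\varphi(\lambda)$ occurs at

$\lambda_i = \dfrac{\sum_{k=1}^p \mu_{ik}\omega_{ik} \|d_{ik}\|^2}{\left\|\sum_{k=1}^p \mu_{ik}\omega_{ik} d_{ik}\right\|^2}$, (19)

and for any $\alpha > 0$ we obtain that

$\varphi(\alpha\lambda_i) = -\alpha(2 - \alpha) \dfrac{\left[\sum_{k=1}^p \mu_{ik}\omega_{ik} \|d_{ik}\|^2\right]^2}{\left\|\sum_{k=1}^p \mu_{ik}\omega_{ik} d_{ik}\right\|^2}$. (20)

If we choose $\alpha_i$: $\eta \le \alpha_i \le 2 - \eta$ and let $x_{i+1} = x(\alpha_i\lambda_i)$, we ensure by (17) and (20) that the Fejér property is preserved. Consequently $\{\varphi(\alpha_i\lambda_i)\}_0^\infty \to 0$, and we can assert that $\{d_{ik}\}_0^\infty \to 0$, $k = 1, \dots, p$, as long as the sequences $\{\mu_{ik}\omega_{ik}\}_0^\infty$, $k = 1, \dots, p$, remain uniformly positive. Note that no upper bounds and no other conditions whatsoever are imposed on the latter sequences.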

To observe that $\lambda_i$ may cause an overprojection, assume that $\omega_{ik} = 1$, $k = 1, \dots, p$, at all iterations; then $x_{i+1} = x(\alpha_i\lambda_i)$ becomes

$x_{i+1} = x_i + \alpha_i \dfrac{\sum_{k=1}^p \mu_{ik} \|d_{ik}\|^2}{\left\|\sum_{k=1}^p \mu_{ik} d_{ik}\right\|^2} \sum_{k=1}^p \mu_{ik} d_{ik}$, (21)

where $\lambda_i \ge 1$ by (7).

Figure 4 shows one iteration on Example 4 with $\omega_{ik} = 1$ and $\mu_{ik} = 0.5$.

The net effect of the $\lambda$ factor is to pull the sequence $\{x_i\}_0^\infty$ out of a wedge of small angle. In optimization jargon, we look for the minimum of a given upper bound of the distance from $x(\lambda)$ to $C$ along the direction $d_i := \sum_{k=1}^p \mu_{ik}\omega_{ik} d_{ik}$ starting from $x_i$. It is worth noticing that sometimes inequality (17) becomes an equality; for instance, when $C$ is the solution set of a linear system of equations. In this case $x(\lambda_i)$ is indeed the closest point to $C$ along $d_i$.

Relaxation in Projection Methods, Figure 4
λ factor
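The $\lambda$ factor (19) is a one-line change to the parallel step. The Python sketch below (hypothetical names) uses $w^k = \mu_{ik}\omega_{ik} = 1/p$ and hyperplanes, for which (17) holds with equality; with `use_lambda=False` it reduces to plain PPA with $\omega = \alpha$. The example uses two lines through the origin at an angle of $10^\circ$, a narrow wedge of the kind discussed above.

```python
import math


def hyperplane_projector(a, beta):
    """Return the projection onto the hyperplane {z : <a, z> = beta}."""
    aa = sum(ai * ai for ai in a)

    def project(x):
        t = (sum(ai * xi for ai, xi in zip(a, x)) - beta) / aa
        return [xi - t * ai for xi, ai in zip(x, a)]

    return project


def extrapolated_ppa(x0, projectors, alpha=1.0, use_lambda=True,
                     eps=1e-10, max_iter=100_000):
    """Parallel projections with w^k = 1/p and the lambda factor (19)."""
    x = list(x0)
    p = len(projectors)
    w = 1.0 / p
    for it in range(max_iter):
        ds = [[yj - xj for yj, xj in zip(proj(x), x)] for proj in projectors]
        if sum(math.sqrt(sum(dj * dj for dj in d)) for d in ds) <= eps:
            return x, it
        direction = [sum(w * d[j] for d in ds) for j in range(len(x))]  # d_i
        lam = 1.0
        if use_lambda:
            num = sum(w * sum(dj * dj for dj in d) for d in ds)
            den = sum(c * c for c in direction)  # > 0 when C is nonempty
            lam = num / den  # (19); lam >= 1 by (7)
        x = [xj + alpha * lam * c for xj, c in zip(x, direction)]
    return x, max_iter


a = math.radians(10)
lines = [hyperplane_projector([0.0, 1.0], 0.0),                # z2 = 0
         hyperplane_projector([-math.sin(a), math.cos(a)], 0.0)]  # line at 10 degrees
x_plain, n_plain = extrapolated_ppa([0.0, 1.0], lines, use_lambda=False)
x_fast, n_fast = extrapolated_ppa([0.0, 1.0], lines)
print(n_plain, n_fast)
```

On this wedge the extrapolated run should need far fewer iterations than plain averaging, whose contraction factor per step is $(1 + \cos\alpha)/2$.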

Let us proceed one step further with our analysis. Denote by $w$ the vector in the Euclidean space $\mathbb{R}^p$ with components $w^k = \mu_{ik}\omega_{ik}$, $k = 1, \dots, p$. We then write

$x(w) = x_i + \sum_{k=1}^p w^k d_{ik}$ (22)

and obtain (almost verbatim as argued for $\varphi(\lambda)$)

$\|x(w) - z\|^2 \le \|x_i - z\|^2 + \varphi(w)$,
$\varphi(w) = -2 \sum_{k=1}^p w^k \|d_{ik}\|^2 + \left\|\sum_{k=1}^p w^k d_{ik}\right\|^2$.

Hence we argue that

$w_i = \arg\min_{w^k \ge \delta} \varphi(w)$ (23)

should yield 'good' (over)relaxation parameters. But the parallel algorithm can degrade severely unless we have at hand an efficient algorithm for solving the quadratic program (23). Otherwise, any acceptable value for $w_i$ combined with the $\lambda$ factor can be advantageous.

The net effect of $w_i$ is to obtain 'locally' the best direction of search to locate some $z \in C$. Thus, $w_i$ can be considered as the optimum relaxation parameter. In Example 4, $z$ is obtained in one step.

Table 1 summarizes the accelerated parallel algorithm (APA) that subsumes our work. We dropped the iteration number $i$ for convenience.


Discussion 


The literature on projection methods is rather vast 
and rapidly growing. Reviews on projection methods 
and/or their applications in mathematics, physics and 
social sciences published in the 1990s include [1,4,5,7, 
10,13,21,23], and others. 

Projection methods have a long history. In their early inceptions [6,20,22] no relaxation parameters were introduced, but it was soon noticed that overrelaxation (or overprojection) could speed convergence. Successive overrelaxation techniques were introduced in the solution of large linear systems of equations. Significant work on determining and computing the optimum relaxation was performed in the 1950s ([19] gives a good list of references on the subject). R. Bramley and A. Sameh [2] found that underprojection could improve convergence, at least theoretically, in the solution of a nonsymmetric linear system of equations.

Relaxation in Projection Methods, Table 1
Accelerated parallel algorithm

Problem  Find z ∈ C = C_1 ∩ ··· ∩ C_p
Data     x, an estimate of z,
         0 < η < 1, 0 < δ ≤ 1/p,
         0 < ε for the convergence test
REPEAT
    FOR k = 1, ..., p DO
        IN parallel
            y_k = argmin_{y ∈ C_k} ||y − x||²,
            d_k = y_k − x
        END parallel
        Find w^k ≥ δ
    END FOR
    λ = Σ_{k=1}^p w^k ||d_k||² / ||Σ_{k=1}^p w^k d_k||²
    Choose α: η ≤ α ≤ 2 − η
    Update: x = x + λα Σ_{k=1}^p w^k d_k
UNTIL Σ_{k=1}^p ||d_k|| < ε

The convergence rate of projection methods is linear under mild conditions, i.e., $\|x_{i+1} - z\| \le \gamma_i \|x_i - z\|$, where $\{\gamma_i \in \mathbb{R}\}_0^\infty$ is uniformly positive and uniformly bounded away from one. The value of $\gamma_i$ depends strongly on the angle $\alpha_i$ between supporting hyperplanes of successive sets $S_i$ and $S_{i+1}$ [11,18,24]. Overprojection and most techniques used to improve the quality of convergence of projection methods merely attempt to reduce $\gamma_i$. For the convex inequality problem, see [12,14]. In [12] projection aggregation methods were developed and the choice of (9) was justified. In [14] a projection algorithm with a superlinear rate of convergence is presented, where $\{\omega_i\}_0^\infty \to 1$ is needed.

We proved that the sequence $\{x_i\}_0^\infty$ generated by the parallel algorithms (PPA and APA) satisfies the Fejér property. Strong convergence in Hilbert spaces for a countable number of sets can be proved under mild additional conditions [1,9, Thm. 2.16]. It is straightforward to show that convergence is preserved if we project on closed supersets $S_{ik} \supseteq C_k$, $k = 1, \dots, p$. We only need

$\|P_{S_{ik}}(x_i) - x_i\| \ge \delta \|P_{C_k}(x_i) - x_i\|$

for some $\delta > 0$.

The $\lambda$ factor has been suggested in [8,12] and [15]. Numerical results reported in the latter paper, in [9,25] and in [17] are impressive. However, the actual theoretical improvement of the quality of convergence is an open question. See [16] for theoretical results when $C$ is the intersection of affine sets (hyperplanes). In [12,15] the use of the $\lambda$ factor is analysed for sequential versions of the projection method.

We suggest the solution of the quadratic problem (23) to obtain 'optimal' relaxation parameters. K.C. Kiwiel recommends the solution of the nonlinear programming problem (20) [21]. To the author's knowledge, there is no evidence of the superiority of either approach. We do not believe that the exact solution of either problem is a good strategy, because this can degrade the performance of a parallel algorithm. Our recommendation is to obtain some acceptable initial values for the relaxation parameters and then use the $\lambda$ factor.

Throughout the paper we have assumed that $C \ne \emptyset$. The inconsistent case, i.e., $C = \emptyset$, has attracted a lot of attention lately. See [3] and references therein.


See also 
Replicator equations are a class of dynamical systems developed and studied in the context of evolutionary game
theory, a discipline pioneered by J. Maynard Smith [36] 
which aims to model the evolution of animal behavior 
using the principles and tools of game theory. Because 
of their dynamical properties, they have been recently 
applied with significant success to a number of combi- 
natorial optimization problems. It is the purpose of this 
article to provide a summary and an up-to-date bibli- 
ography of these applications. 
The Model and its Properties


In this Section we discuss the basic intuition behind 
replicator equations and present a few theoretical prop- 
erties that are instrumental for their application to op- 
timization problems. For a more systematic treatment 
see [23,55]. 

Consider a large, ideally infinite population of individuals belonging to the same species which compete for a particular limited resource, such as food, territory, etc. This kind of conflict is modeled as a game, the players being pairs of randomly selected population members. In contrast to traditional application fields of game theory, such as economics or sociology [33], players here do not behave 'rationally,' but act instead according to a pre-programmed behavior pattern, or pure strategy. Reproduction is assumed to be asexual, which means that, apart from mutation, offspring will inherit the same genetic material, and hence behavioral phenotype, as their parent. Let $J = \{1, \dots, n\}$ be the set of pure strategies and, for all $i \in J$, let $x_i(t)$ be the relative frequency of population members playing strategy $i$ at time $t$. The state of the system at time $t$ is simply the vector $x(t) = (x_1(t), \dots, x_n(t))^T$. Clearly, the states are constrained to lie in the standard simplex of the $n$-dimensional Euclidean space $\mathbb{R}^n$:

$S_n = \{x \in \mathbb{R}^n : x_i \ge 0,\ \forall i \in J,\ e^T x = 1\}$.

Here and in the sequel, the letter $e$ is reserved for a vector of appropriate length, consisting of unit entries (hence $e^T x = \sum_i x_i$).

One advantage of applying game theory to biology is that the notion of 'utility' is much simpler and clearer than in human contexts. Here, a player's utility can simply be measured in terms of Darwinian fitness or reproductive success, i.e., the player's expected number of offspring. Let $W = (w_{ij})$ be the $n \times n$ payoff (or fitness) matrix. Specifically, for each pair of strategies $i, j \in J$, $w_{ij}$ represents the payoff of an individual playing strategy $i$ against an opponent playing strategy $j$. Without loss of generality, we shall assume that the payoff matrix is nonnegative, i.e., $w_{ij} \ge 0$ for all $i, j \in J$. At time $t$, the average payoff of strategy $i$ is given by:

$\pi_i(t) = \sum_{j=1}^n w_{ij} x_j(t)$, (1)

while the mean payoff over the entire population is $\sum_{i=1}^n x_i(t)\pi_i(t)$.

In evolutionary game theory the assumption is made that the game is played over and over, generation after generation, and that the action of natural selection will result in the evolution of the fittest strategies. If successive generations blend into each other, the evolution of behavioral phenotypes can be described by the following set of differential equations [53]:

$\dot{x}_i(t) = x_i(t)\left(\pi_i(t) - \sum_{j=1}^n x_j(t)\pi_j(t)\right)$ (2)

for $i = 1, \dots, n$, where a dot signifies derivative with respect to time. The basic idea behind this model is that the average rate of increase $\dot{x}_i(t)/x_i(t)$ equals the difference between the average fitness of strategy $i$ and the mean fitness over the entire population. It is straightforward to show that the simplex $S_n$ is invariant under equation (2) or, in other words, any trajectory starting in $S_n$ will remain in $S_n$. To see this, simply note that $(d/dt)\sum_i x_i(t) = \sum_i \dot{x}_i(t) = 0$, which means that the (relative) interior of $S_n$ (i.e., the set defined by $x_i > 0$, for all $i = 1, \dots, n$) is invariant. The additional observation that the boundary too is invariant completes the proof.

Similar arguments provide a rationale for the following discrete-time version of the replicator dynamics, assuming nonoverlapping generations, which can be obtained from (2) by an explicit Euler step of length $\Delta t$, setting $1/\Delta t = \sum_{j} x_j(t)\,\pi_j(t)$:

$$x_i(t + \Delta t) = \frac{x_i(t)\,\pi_i(t)}{\sum_{j=1}^{n} x_j(t)\,\pi_j(t)} \tag{3}$$

for $i = 1, \dots, n$. Because of the nonnegativity of the fitness matrix $W$ and the normalization factor, this system too makes the simplex $S_n$ invariant, as does its continuous counterpart.
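As an illustration of (3), here is a minimal Python sketch (not taken from the cited works) of one discrete-time replicator step; matrices are plain lists of rows and all function names are illustrative. It also checks numerically that (3) coincides with an explicit Euler step of (2) when $\Delta t = 1/\sum_j x_j(t)\pi_j(t)$, and that the simplex is preserved. It assumes $W$ is nonnegative and $x$ lies in the interior of $S_n$, so that the mean payoff is positive.

```python
def payoffs(W, x):
    """Average payoffs pi_i = sum_j w_ij x_j, as in (1)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def mean_payoff(W, x):
    """Mean population payoff sum_i x_i pi_i."""
    return sum(x_i * p_i for x_i, p_i in zip(x, payoffs(W, x)))


def replicator_step(W, x):
    """One step of the discrete-time replicator dynamics (3)."""
    pi, mean = payoffs(W, x), mean_payoff(W, x)
    return [x_i * p_i / mean for x_i, p_i in zip(x, pi)]


def euler_step(W, x, dt):
    """One explicit Euler step of the continuous-time dynamics (2)."""
    pi, mean = payoffs(W, x), mean_payoff(W, x)
    return [x_i + dt * x_i * (p_i - mean) for x_i, p_i in zip(x, pi)]


# Illustrative symmetric, nonnegative payoff matrix and interior starting point.
W = [[1.0, 2.0, 0.5],
     [2.0, 0.0, 1.0],
     [0.5, 1.0, 3.0]]
x = [0.2, 0.3, 0.5]

y = replicator_step(W, x)
z = euler_step(W, x, dt=1.0 / mean_payoff(W, x))
print(sum(y))                                  # 1 up to rounding: S_n is invariant
print(max(abs(a - b) for a, b in zip(y, z)))   # ~0: (3) is Euler on (2) with this dt
```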

A point $x = x(t)$ is said to be a stationary (or equilibrium) point for the dynamical systems under consideration if $\dot{x}_i(t) = 0$ in the continuous-time case, and $x_i(t + \Delta t) = x_i(t)$ in the discrete-time case ($i = 1, \dots, n$). Moreover, a stationary point is said to be asymptotically stable if any trajectory starting in its vicinity will converge to it as $t \to \infty$. It turns out that both the continuous-time and discrete-time replicator dynamics have the same set of stationary points, namely all the points in $S_n$ satisfying, for all $i = 1, \dots, n$, the condition

$$x_i(t)\left(\pi_i(t) - \sum_{j=1}^{n} x_j(t)\,\pi_j(t)\right) = 0,$$

or, equivalently, $\pi_i(t) = \sum_{j=1}^{n} x_j(t)\,\pi_j(t)$ whenever $x_i > 0$.

Equations (2) and (3) arise independently in different branches of theoretical biology [23]. In population ecology, for example, the famous Lotka–Volterra equations for predator-prey systems turn out to be equivalent to the continuous-time dynamics (2), under a simple barycentric transformation and a change in velocity. In population genetics they are known as selection equations [17]. In this case, each $x_i$ represents the frequency of the $i$th allele $A_i$ and the payoff $w_{ij}$ is the fitness of genotype $A_iA_j$. Here the fitness matrix $W$ is always symmetric. The discrete-time dynamical equations turn out to be a special case of a general class of
dynamical systems introduced in [2] and studied in [3] 
in the context of Markov chain theory. They also rep- 
resent an instance of the so-called relaxation labeling 
processes, a class of parallel, distributed algorithms de- 
veloped in computer vision to solve (continuous) con- 
straint satisfaction problems [25,44,50]. An indepen- 
dent connection between relaxation labeling processes 
and game theory has recently been described in [37]. 

The following theorem states that under replicator 
dynamics the population’s average fitness always in- 
creases, provided that the payoff matrix is symmetric 
(in game theory terminology, this situation is referred 
to as a doubly symmetric game). 


Theorem 1 Suppose that the (nonnegative) payoff matrix $W$ is symmetric. Then, the quadratic polynomial $F$ defined as

$$F(x) = x^\top W x \tag{4}$$

is strictly increasing along any nonconstant trajectory of both continuous-time (2) and discrete-time (3) replicator equations. In other words, for all $t > 0$ we have

$$\frac{d}{dt} F(x(t)) > 0$$

for system (2), and $F(x(t + \Delta t)) > F(x(t))$ for system (3), unless $x(t)$ is a stationary point. Furthermore, any such trajectory converges to a (unique) stationary point.
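Theorem 1 can be observed numerically. The sketch below is a hypothetical test harness, not code from the cited works: it draws a random symmetric nonnegative payoff matrix (the seed is arbitrary), iterates (3) from the barycenter and checks that $F(x) = x^\top W x$ never decreases. A small tolerance absorbs floating-point rounding once the trajectory has essentially converged.

```python
import random


def replicator_step(W, x):
    """Discrete-time replicator dynamics (3)."""
    pi = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    mean = sum(xi * p for xi, p in zip(x, pi))
    return [xi * p / mean for xi, p in zip(x, pi)]


def F(W, x):
    """Mean fitness F(x) = x^T W x, as in (4)."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))


random.seed(1)
n = 6
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        W[i][j] = W[j][i] = random.random()    # symmetric with entries in [0, 1)

x = [1.0 / n] * n                              # start from the simplex barycenter
values = [F(W, x)]
for _ in range(200):
    x = replicator_step(W, x)
    values.append(F(W, x))

# Theorem 1: F never decreases along the trajectory (tolerance absorbs rounding).
print(all(b >= a - 1e-12 for a, b in zip(values, values[1:])))
```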


The previous result is known in mathematical biol- 
ogy as the fundamental theorem of natural selection 
[17,23,55] and, in its original form, traces back to [18]. 
As far as the discrete-time model is concerned, it can be regarded as a straightforward implication of the Baum-Eagon theorem [2,3], which is valid for general polynomial functions over products of simplices. F.R. Waugh
and R.M. Westervelt [54] also proved a similar result for 
a related class of continuous- and discrete-time dynam- 
ical systems. In the discrete-time case, however, they 
put bounds on the eigenvalues of W in order to achieve 
convergence to fixed points. 

The fact that all trajectories of the replicator dynamics converge to a stationary point has been proved in [32,34]. However, in general, not all stationary points are local maximizers of $F$ on $S_n$. The vertices of $S_n$, for example, are all stationary points for (2) and (3) whatever the landscape of $F$. Moreover, there may exist trajectories which, starting from the interior of $S_n$, eventually approach a saddle point of $F$. However, a result proved by I. Bomze [5] asserts that all asymptotically stable stationary points of replicator dynamics correspond to (strict) local maximizers of $F$ on $S_n$, and vice versa (see [10] for additional results relating the fields of optimization theory, evolutionary game theory and the qualitative behavior of dynamical systems).

Under continuous-time replicator dynamics, the trajectories approach their limits most efficiently in the sense that (2) is a gradient system if one uses the (non-Euclidean) Shahshahani metric [23] which, for any point $u \in S_n$, is defined as

$$d_u(x, y) = \sum_{\{i:\, u_i > 0\}} \frac{x_i\, y_i}{u_i}.$$

This efficiency result is called Kimura’s maximum principle.


Maximum Clique Problems 


Let $G = (V, E)$ be an undirected graph, where $V = \{1, \dots, n\}$ is the set of vertices and $E \subseteq V \times V$ is the set of edges. The order of $G$ is the number of its vertices, and its size is the number of edges. Two vertices $i, j \in V$ are said to be adjacent if $(i, j) \in E$. The adjacency matrix of $G$ is the $n \times n$ symmetric matrix $A_G = (a_{ij})$ defined as follows:

$$a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$


A subset $C$ of vertices in $G$ is called a clique if all its vertices are mutually adjacent, i.e., for all $i, j \in C$ with $i \neq j$, we have $(i, j) \in E$. A clique is said to be maximal if it is not contained in any larger clique, and maximum if it is the largest clique in the graph. The clique number, denoted by $\omega(G)$, is defined as the cardinality of the maximum clique. The maximum clique problem is to find a clique whose cardinality equals the clique number.

The maximum clique problem is a well-known example of a combinatorial optimization problem, not only because it was one of the first problems shown to be NP-complete [19], but also because of its theoretical as well as practical implications. Due to the inherent computational complexity of the problem, all known exact algorithms require, in the worst case, a time which grows exponentially with the number of vertices in the graph, and this makes them inapplicable even to moderately large problem instances. Moreover, a series of recent theoretical results show that the problem is in fact hard to solve even approximately. Because of these negative results, much effort has recently been directed towards devising efficient heuristics for finding large cliques, which come with no formal performance guarantee but are nonetheless of interest in practical applications. We refer to [8] for a recent survey of results concerning algorithms, complexity and applications of this problem.

In 1965, T.S. Motzkin and E.G. Straus [38] established a remarkable connection between the maximum clique problem and a certain quadratic programming problem. Consider the following quadratic function, sometimes called the Lagrangian of $G$:

$$f_G(x) = x^\top A_G x \tag{5}$$

and let $x^*$ be a global maximizer of $f_G$ on $S_n$, $n$ being the order of $G$. In [38] it is proved that the clique number of $G$ is related to $f_G(x^*)$ by the following formula:

$$\omega(G) = \frac{1}{1 - f_G(x^*)}. \tag{6}$$

Additionally, it is shown that a subset of vertices $C$ is a maximum clique of $G$ if and only if its characteristic vector $x^C$, which is the vector of $S_n$ defined as

$$x^C_i = \begin{cases} 1/|C| & \text{if } i \in C, \\ 0 & \text{otherwise,} \end{cases}$$

is a global maximizer of $f_G$ on $S_n$. In [21,47], the Motzkin-Straus theorem has been extended by providing a characterization of maximal cliques in terms of local maximizers of $f_G$ on $S_n$.
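The Motzkin-Straus formula (6) can be checked with exact rational arithmetic on a small example. The sketch below assumes a hypothetical graph made of a 4-clique $\{0, 1, 2, 3\}$ plus a vertex 4 adjacent only to vertex 0 (vertices are numbered from 0 in the code). It only evaluates $f_G$ at characteristic vectors; it does not search for the global maximizer.

```python
from fractions import Fraction
from itertools import combinations


def adjacency(n, edges):
    """Adjacency matrix A_G as a list of rows."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A


def lagrangian(A, x):
    """f_G(x) = x^T A_G x, as in (5)."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def characteristic_vector(n, C):
    """x^C: weight 1/|C| on the vertices of C, 0 elsewhere."""
    return [Fraction(1, len(C)) if i in C else Fraction(0) for i in range(n)]


n = 5
edges = list(combinations(range(4), 2)) + [(0, 4)]   # K4 plus a pendant vertex
A = adjacency(n, edges)

value = lagrangian(A, characteristic_vector(n, {0, 1, 2, 3}))
print(value, 1 / (1 - value))   # 3/4 and 4 = omega(G), as in (6)

# The maximal (but not maximum) clique {0, 4} gives a smaller value.
local_value = lagrangian(A, characteristic_vector(n, {0, 4}))
print(local_value)              # 1/2, i.e. 1 / (1 - 1/2) = 2 < omega(G)
```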

Once the maximum clique problem is formulated in terms of maximizing a quadratic polynomial over the standard simplex, the use of replicator dynamics naturally suggests itself [42]. In fact, consider a replicator system with payoff matrix defined as:

$$W = A_G.$$

From the fundamental theorem of natural selection, we know that the replicator dynamical systems, starting from an arbitrary initial state, will iteratively maximize the Lagrangian $f_G$ in $S_n$ and will eventually converge to a local maximizer which, by virtue of the Motzkin-Straus formula, provides an estimate (in fact a lower bound, since $f_G(x) \le f_G(x^*)$) of the clique number of $G$. Additionally, if the converged solution happens to be a characteristic vector of some subset of vertices of $G$, then we are also able to extract the vertices comprising the clique from its nonzero components. Clearly, in theory there is no formal guarantee that the converged solution will be a global maximizer of $f_G$. However, experimental work suggests that the basins of attraction of global maximizers are quite large, and frequently the algorithm converges to one of them.
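A minimal sketch of the replicator heuristic just described, on a hypothetical toy graph (a 4-clique $\{0, 1, 2, 3\}$ plus a vertex 4 adjacent only to vertex 0). It iterates the discrete-time dynamics (3) with $W = A_G$ from the barycenter, reads off the support of the final point with an arbitrary threshold `tol`, and uses (6) to turn the final value of $f_G$ into a clique-number estimate. The graph must have at least one edge, otherwise the mean payoff vanishes; the fixed iteration count stands in for a proper stopping rule.

```python
from itertools import combinations


def quadratic(W, x):
    """x^T W x for a list-of-rows matrix W."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))


def replicator_clique(A, iters=1000, tol=1e-6):
    """Run (3) with W = A_G from the barycenter; return support and clique-number estimate."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        mean = sum(xi * p for xi, p in zip(x, pi))      # current value of f_G(x)
        x = [xi * p / mean for xi, p in zip(x, pi)]
    support = [i for i, xi in enumerate(x) if xi > tol]
    estimate = 1.0 / (1.0 - quadratic(A, x))            # Motzkin-Straus formula (6)
    return support, estimate


n = 5
A = [[0] * n for _ in range(n)]
for i, j in list(combinations(range(4), 2)) + [(0, 4)]:   # K4 plus a pendant vertex 4
    A[i][j] = A[j][i] = 1

support, estimate = replicator_clique(A)
print(support, estimate)
```

On this graph the trajectory abandons the pendant vertex and settles on the characteristic vector of $\{0, 1, 2, 3\}$; on larger graphs nothing guarantees that the limit is a global maximizer.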

In [42], M. Pelillo presents extensive experimental 
results with the previous approach over thousands of 
randomly generated graphs. The discrete-time dynamics (3) was used, and the system was started from the vector $(1/n, \dots, 1/n)^\top$, which corresponds to the simplex barycenter. Two series of experiments were conducted.
In the first one, graphs with a relatively small number 
of vertices were considered, i.e. with up to 500 vertices 
and densities ranging from 0.10 to 0.90. The solutions 
found by the algorithm were always very close to the 
optimal ones, as found by standard exact algorithms. In 
the second part of the study, graphs with up to 2000 vertices and about one million edges were used (in this case all graphs had density 0.50). Here, to gauge the quality of the solutions found, the Matula estimate was employed, which accurately predicts the clique number in a random graph when the number of vertices is sufficiently large [35]. Specifically, let

$$M(n, \delta) = 2\log_{1/\delta} n - 2\log_{1/\delta}\log_{1/\delta} n + 2\log_{1/\delta}(e/2) + 1,$$

where, exceptionally, $e$ denotes the base of the natural logarithm. D.W. Matula proved that, as $n \to \infty$, the order of the maximum clique in an $n$-vertex $\delta$-density random graph is either $\lfloor M(n, \delta) \rfloor$ or $\lceil M(n, \delta) \rceil$ with probability tending to 1, where $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$, and $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. Interestingly, it was also shown that the smallest maximal clique is expected to have $M(n, \delta)/2$ vertices [4]. Experimentally,
in [42] it was found that the cardinality of the cliques 
found by the replicator dynamical system turned out 
to be significantly larger than the estimated minimum, 
thereby contradicting what is known as the Jerrum con- 
jecture [27], which states that in a large 0.5-density ran- 
dom graph it may be hard to find a clique whose or- 
der is even a bit larger than that of the smallest maxi- 
mal clique. A similar conclusion was also drawn in [26]. 
Overall, the results presented in [42] were competitive 
with those obtained using more sophisticated neural 
network heuristics, both in terms of quality of solutions 
and speed. 
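The Matula estimate is easy to evaluate. The sketch below computes it for the setting of the second series of experiments ($n = 2000$, $\delta = 0.5$); the function name is illustrative, and `math.e` is Euler's number, not the all-ones vector $e$. For these parameters $M \approx 16.9$, so the predicted clique number is 16 or 17 and the smallest maximal clique is expected to have roughly 8 to 9 vertices.

```python
import math


def matula_estimate(n, delta):
    """M(n, delta) for an n-vertex random graph with edge density delta (0 < delta < 1)."""
    def log_b(v):
        return math.log(v) / math.log(1.0 / delta)   # logarithm to base 1/delta
    return 2 * log_b(n) - 2 * log_b(log_b(n)) + 2 * log_b(math.e / 2) + 1


m = matula_estimate(2000, 0.5)
print(m, math.floor(m), math.ceil(m))   # clique number predicted to be floor or ceil
print(m / 2)                            # expected size of the smallest maximal clique
```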

One drawback associated with the original Motzkin-Straus formulation, however, relates to the existence of spurious solutions, i.e., maximizers of $f_G$ which are not in the form of characteristic vectors. This was first observed in [40]. To illustrate, consider the path $P_3$, i.e. the graph with three vertices $\{1, 2, 3\}$ and two edges, one between 1 and 2, and the other between 2 and 3. Clearly, $C = \{1, 2\}$ and $D = \{2, 3\}$ are maximum cliques, and from the Motzkin-Straus theorem it follows that their characteristic vectors $x^C$ and $x^D$ are global maximizers of the Lagrangian of $P_3$ in $S_3$. However, it can easily be proved that all the points lying on the segment connecting $x^C$ and $x^D$, which is a subset of $S_3$ since the simplex is convex, are also global solutions of the Motzkin-Straus program: for $x = ((1-t)/2,\ 1/2,\ t/2)$ with $t \in [0, 1]$, one has $f_G(x) = 2x_2(x_1 + x_3) = 1/2$. See [47] for general characterizations of such spurious solutions. In principle, spurious solutions represent a problem since, while providing information about the cardinality of the maximum clique, they do not allow us to easily extract its vertices.
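This can be checked with exact arithmetic. The short sketch below (an illustration, not code from [40] or [47]) evaluates $f_G$ at five evenly spaced points of the segment from $x^C$ to $x^D$ on $P_3$; vertices 1, 2, 3 are stored at indices 0, 1, 2.

```python
from fractions import Fraction

# Path P3 with vertices 1, 2, 3 stored at indices 0, 1, 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]


def f_G(x):
    """Motzkin-Straus Lagrangian x^T A_G x, as in (5)."""
    return sum(x[i] * A[i][j] * x[j] for i in range(3) for j in range(3))


x_C = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]   # C = {1, 2}
x_D = [Fraction(0), Fraction(1, 2), Fraction(1, 2)]   # D = {2, 3}

values = []
for k in range(5):
    t = Fraction(k, 4)
    x = [(1 - t) * c + t * d for c, d in zip(x_C, x_D)]   # point on the segment
    values.append(f_G(x))
print(values)   # every point of the segment attains the global maximum 1/2
```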


The spurious solution problem has been solved in [5]. Consider the following regularized version of $f_G$:

$$\hat{f}_G(x) = x^\top A_G x + \frac{1}{2}\, x^\top x \tag{7}$$

which is obtained from (5) by substituting the adjacency matrix $A_G$ of $G$ with

$$\hat{A}_G = A_G + \frac{1}{2} I_n,$$

where $I_n$ is the $n \times n$ identity matrix. Unlike the Motzkin-Straus formulation, it can be proved that all maximizers of $\hat{f}_G$ on $S_n$ are strict, and are characteristic vectors of maximal/maximum cliques in the graph [5].


Theorem 2 Let $C$ be a subset of vertices of a graph $G$, and let $x^C$ be its characteristic vector. Then, $C$ is a maximum (maximal) clique of $G$ if and only if $x^C$ is a global (local) maximizer of $\hat{f}_G$ in $S_n$. Moreover, all local (and hence global) maximizers of $\hat{f}_G$ over $S_n$ are strict.
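Returning to the path $P_3$, a short exact check (an illustration, not taken from [5]) shows how the regularization in (7) breaks the tie between $x^C$ and the other points of the segment: $\hat f_G$ equals $1 - 1/(2|C|) = 3/4$ at $x^C$, but only $11/16$ at the midpoint between $x^C$ and $x^D$, so that point is no longer a maximizer.

```python
from fractions import Fraction

# Path P3 with vertices 1, 2, 3 stored at indices 0, 1, 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]


def f_hat(x):
    """Regularized Lagrangian x^T A_G x + (1/2) x^T x, as in (7)."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + Fraction(1, 2) * sum(xi * xi for xi in x)


x_C = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]      # clique C = {1, 2}
x_mid = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # midpoint of [x^C, x^D]
print(f_hat(x_C), f_hat(x_mid))   # 3/4 = 1 - 1/(2|C|) versus 11/16
```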


In an exact sense, therefore, a one-to-one correspondence exists between maximal cliques and local maximizers of $\hat{f}_G$ in $S_n$ on the one hand, and between maximum cliques and global maximizers on the other hand.

Preliminary experiments with this regularized formulation (7) on random graphs are reported in [5], and a more extensive empirical study on DIMACS benchmark graphs is presented in [10]. The emerging picture is the following. The solutions produced by the replicator models are typically very close to the ones obtained using more sophisticated continuous-based heuristics. Moreover, the original version of the Motzkin-Straus problem performs slightly better than its regularized counterpart, but the former often returns spurious solutions. This may be intuitively explained by observing that, since all local maxima of $\hat{f}_G$ are strict, its landscape is certainly less smooth than the one associated with the nonregularized version. This therefore enhances the tendency of local optimization procedures to get stuck in local maxima. This is the price to pay for the algorithm to return nonspurious, ‘informative’ solutions.

In order to study the effects of varying the start- 
ing point of clique finding replicator dynamics, Bomze 
and F. Rendl [12] implemented various sophisticated 
heuristics and compared them with the usual (less ex- 
pensive) strategy of starting from the simplex barycen- 
ter. Surprisingly, they concluded that the amount of 
sophistication seems to have no significant impact on
the quality of the solutions obtained. Additionally, they 
showed that using (Runge-Kutta discretizations of) the 
continuous-time dynamics (2) instead of (3) does not 
improve efficiency. This analysis indicates that, to improve the performance of replicator dynamics on the maximum clique problem, one necessarily has to resort to some escape strategies. Various attempts along this
direction can be found in [5,6,9,13]. 

The Motzkin-Straus theorem has been generalized to the weighted case in [21]. Let $G = (V, E, w)$ be a weighted graph, where $V = \{1, \dots, n\}$ is the vertex set, $E \subseteq V \times V$ is the edge set and $w \in \mathbb{R}^n$ is the weight vector, the $i$th component of which corresponds to the weight assigned to vertex $i$. It is assumed that $w_i > 0$ for all $i \in V$. Given a subset of vertices $C$, the weight assigned to $C$ is defined as

$$W(C) = \sum_{i \in C} w_i.$$

A maximal weight clique $C$ is one that is not contained in any other clique having weight larger than $W(C)$. Since we are assuming that all weights are positive, it is clear that the concepts of maximal clique and maximal weight clique coincide. A maximum weight clique is one having largest total weight, and the weighted clique number of $G$, denoted $\omega(G, w)$, is its weight. The maximum weight clique problem is to find a clique $C$ such that $W(C) = \omega(G, w)$ (see [8] for a recent review). The classical (unweighted) version of the maximum clique problem arises as a special case when all vertices have the same weight. For this reason the maximum weight clique problem has at least the same computational complexity as its unweighted counterpart.

Note that the original Motzkin-Straus program for unweighted graphs can be reformulated as a minimization problem by considering the function

$$g(x) = x^\top (I + A_{\bar{G}})\, x,$$

where $A_{\bar{G}}$ is the adjacency matrix of the complement graph $\bar{G}$, which is the graph having the same vertex set as $G$ and $\bar{E} = \{(i, j) \in V \times V : i \neq j \text{ and } (i, j) \notin E\}$ as its edge set. It is straightforward to see that if $x^*$ is a global minimizer of $g$ in $S_n$, then $\omega(G) = 1/g(x^*)$: since $ee^\top = I + A_G + A_{\bar{G}}$ and $x^\top ee^\top x = 1$ on $S_n$, we have $g(x) = 1 - f_G(x)$ there. This is simply a different formulation of the Motzkin-Straus formula (6). Now, consider a weighted graph $G = (V, E, w)$, and let $\mathcal{M}(G, w)$ be the class of symmetric $n \times n$ matrices $M = (m_{ij})_{i,j \in V}$ defined as $2m_{ij} \ge m_{ii} + m_{jj}$ if $i \neq j$ and $(i, j) \notin E$, $m_{ij} = 0$ if $i \neq j$ and $(i, j) \in E$, and $m_{ii} = 1/w_i$ for all $i \in V$. Given a global solution $x^*$ of the following quadratic program, which is in general indefinite,

$$\min\; g(x) = x^\top M x \quad \text{s.t.}\; x \in S_n, \tag{8}$$

we have [21]:

$$\omega(G, w) = \frac{1}{g(x^*)}$$

for any matrix $M \in \mathcal{M}(G, w)$. Furthermore, denote by $x^C(w)$ the weighted characteristic vector of $C$, which is the vector in $S_n$ with coordinates

$$x^C_i(w) = \begin{cases} w_i / W(C) & \text{if } i \in C, \\ 0 & \text{otherwise.} \end{cases}$$

It turns out that a subset $C$ of vertices is a maximum weight clique if and only if its characteristic vector $x^C(w)$ is a global minimizer of (8). Notice that the matrix $I + A_{\bar{G}}$ belongs to $\mathcal{M}(G, e)$. In other words, the original Motzkin-Straus theorem turns out to be a special case of the preceding result.
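A small exact check of the weighted result, on a hypothetical 4-vertex graph (a triangle $\{0, 1, 2\}$ plus an edge $\{2, 3\}$, with weights $1, 2, 3, 5$). The sketch builds one particular member of $\mathcal{M}(G, w)$, taking $2m_{ij} = m_{ii} + m_{jj}$ on non-edges (an arbitrary choice; any value satisfying the inequality would do), finds the maximum weight clique by brute force, and evaluates $1/g$ at its weighted characteristic vector. Since all off-diagonal entries of $M$ vanish inside a clique, $g(x^C(w)) = \sum_{i \in C} w_i / W(C)^2 = 1/W(C)$ for every clique $C$.

```python
from fractions import Fraction
from itertools import combinations

n = 4
edges = {(0, 1), (0, 2), (1, 2), (2, 3)}
w = [1, 2, 3, 5]


def adjacent(i, j):
    return (min(i, j), max(i, j)) in edges


# One member of the class M(G, w): m_ii = 1/w_i, 2 m_ij = m_ii + m_jj on non-edges, 0 on edges.
M = [[Fraction(0)] * n for _ in range(n)]
for i in range(n):
    M[i][i] = Fraction(1, w[i])
for i, j in combinations(range(n), 2):
    if not adjacent(i, j):
        M[i][j] = M[j][i] = (M[i][i] + M[j][j]) / 2


def g(x):
    """Objective x^T M x of program (8)."""
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def weighted_char_vector(C):
    """x^C(w): coordinates w_i / W(C) on C, 0 elsewhere."""
    total = sum(w[i] for i in C)
    return [Fraction(w[i], total) if i in C else Fraction(0) for i in range(n)]


cliques = [set(C) for r in range(1, n + 1) for C in combinations(range(n), r)
           if all(adjacent(i, j) for i, j in combinations(C, 2))]
best = max(cliques, key=lambda C: sum(w[i] for i in C))
print(best, 1 / g(weighted_char_vector(best)))   # the maximum weight clique and its weight
```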

As in the unweighted case, this formulation suffers from the existence of spurious solutions, and this entails the lack of a one-to-one correspondence between the solutions of the continuous optimization problem and those of the original, discrete one. In [11] these spurious solutions are characterized and a regularized version which avoids this kind of problem is introduced (see also [7]). Specifically, let $\mathcal{N}(G, w)$ be the class of $n \times n$ symmetric matrices $M = (m_{ij})_{i,j \in V}$ defined as $m_{ij} \ge m_{ii} + m_{jj}$ if $i \neq j$ and $(i, j) \notin E$, $m_{ij} = 0$ if $i \neq j$ and $(i, j) \in E$, and $m_{ii} = 1/(2w_i)$ for all $i \in V$. The following theorem is the weighted counterpart of Theorem 2.


Theorem 3 Let $C$ be a subset of vertices of a weighted graph $G = (V, E, w)$, and let $x^C(w)$ be its characteristic vector. Then, for any matrix $M \in \mathcal{N}(G, w)$, $C$ is a maximum (maximal) weight clique of $G$ if and only if $x^C(w)$ is a global (local) solution of program (8). Moreover, all local (and hence global) solutions of (8) are strict.


Note that $\mathcal{N}(G, w)$ is isomorphic to the positive orthant in $\binom{n}{2} - |E|$ dimensions. This class is a polyhedral pointed cone with its apex given by the matrix $M(w) = (m_{ij}(w))_{i,j \in V}$ with entries

$$m_{ij}(w) = \begin{cases} \dfrac{1}{2w_i} & \text{if } i = j, \\[1ex] \dfrac{1}{2w_i} + \dfrac{1}{2w_j} & \text{if } i \neq j,\ (i, j) \notin E, \\[1ex] 0 & \text{if } i \neq j,\ (i, j) \in E. \end{cases}$$

Observe that in the unweighted case, $M(e) = ee^\top - \hat{A}_G = \hat{A}_{\bar{G}}$, the regularized adjacency matrix of the complement graph $\bar{G}$. This reflects the elementary property that an independent set of $G$, i.e. a subset of pairwise nonadjacent vertices, is a clique of $\bar{G}$. Hence, while the local maximizers of $x^\top \hat{A}_G x$ over $S_n$ correspond to maximal cliques of $G$, the local minimizers of $x^\top \hat{A}_G x$ over $S_n$ correspond to maximal independent sets of $G$.

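Theorem 3 suggests using replicator equations to approximately solve the maximum weight clique problem. Note, however, that replicator equations are maximization procedures, while ours is a minimization problem. It is a straightforward exercise to see that the problem of minimizing a quadratic form $x^\top M x$ on $S_n$ is equivalent to maximizing $x^\top(\gamma ee^\top - M)x$, where $\gamma$ is an arbitrary constant (on $S_n$ we have $x^\top ee^\top x = 1$, so the second objective equals $\gamma - x^\top M x$). Therefore, the payoff matrix for replicator dynamics to be used in this case is:

$$W = \gamma ee^\top - M,$$

where $M = (m_{ij})$ is any matrix in $\mathcal{N}(G, w)$, and $\gamma$ is taken large enough (for example, $\gamma = \max_{i,j} m_{ij}$) for $W$ to be nonnegative, as assumed throughout.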


Experiments with this approach on both random 
graphs and DIMACS benchmark graphs are reported in 
[11]. Weights were generated randomly in both cases. 
The results obtained with replicator dynamics (3) were 
compared with those produced by a very efficient maxi- 
mum weight clique algorithm of the branch and bound 
variety. The algorithm performed remarkably well es- 
pecially on large and dense graphs, and it was typically 
an order of magnitude more efficient than its competi- 
tor. 
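A sketch of the replicator approach for the weighted problem, using the apex $M(w)$ of $\mathcal{N}(G, w)$ and the payoff matrix $W = \gamma ee^\top - M$ with $\gamma = \max_{i,j} m_{ij}$ (a choice assumed here so that $W$ is nonnegative). Because $m_{ii} = 1/(2w_i)$ in this class and off-diagonal entries vanish inside a clique, $g(x^C(w)) = 1/(2W(C))$, which the code uses to read off the weight of the clique found. The toy graph, a single edge $\{0, 1\}$ with unit weights plus an isolated vertex 2 of weight 3, is hypothetical and was chosen to show that the method is only a heuristic.

```python
def apex_matrix(n, edges, w):
    """Apex M(w) of the cone N(G, w)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i][j] = 1.0 / (2 * w[i])
            elif (i, j) not in edges and (j, i) not in edges:
                M[i][j] = 1.0 / (2 * w[i]) + 1.0 / (2 * w[j])
    return M


def replicator_mwc(n, edges, w, iters=2000, tol=1e-6):
    """Discrete replicator dynamics (3) with payoff W = gamma ee^T - M(w)."""
    M = apex_matrix(n, edges, w)
    gamma = max(max(row) for row in M)
    W = [[gamma - m for m in row] for row in M]          # nonnegative payoff matrix
    x = [1.0 / n] * n                                    # simplex barycenter
    for _ in range(iters):
        pi = [sum(a * b for a, b in zip(row, x)) for row in W]
        mean = sum(a * b for a, b in zip(x, pi))
        x = [a * p / mean for a, p in zip(x, pi)]
    g = sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))
    support = [i for i in range(n) if x[i] > tol]
    return support, 1.0 / (2.0 * g)                     # g(x^C(w)) = 1 / (2 W(C))


support, weight = replicator_mwc(3, {(0, 1)}, [1.0, 1.0, 3.0])
print(support, weight)
```

Started from the barycenter, the iteration converges to the clique $\{0, 1\}$ of weight 2, a local solution of (8), although the maximum weight clique of this graph is $\{2\}$ with weight 3.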


Graph Isomorphism 


Given two graphs $G' = (V', E')$ and $G'' = (V'', E'')$, an isomorphism between them is any bijection $\phi: V' \to V''$ such that $(i, j) \in E' \Leftrightarrow (\phi(i), \phi(j)) \in E''$, for all $i, j \in V'$. Two graphs are said to be isomorphic if there exists an isomorphism between them. The graph isomorphism problem is therefore to decide whether two graphs are isomorphic and, in the affirmative, to find an isomorphism.

The graph isomorphism problem is one of those few 
combinatorial optimization problems which still resist 
any computational complexity characterization [19,28]. 
Despite decades of active research, no polynomial time 
algorithm for it has yet been found. At the same time, 
while clearly belonging to NP, no proof has been pro- 
vided that it is NP-complete. Indeed, there is strong ev- 
idence that this cannot be the case, for otherwise the 
polynomial hierarchy would collapse [14,52]. The cur- 
rent belief is that the problem lies strictly between the P 
and NP-complete classes. 

The subgraph isomorphism problem is more gen- 
eral and in fact more difficult, being NP-complete [19]. 
Given two graphs, it is the problem of determining 
whether one is isomorphic to a subgraph of the other. 
At the highest level of generality we find the maxi- 
mum common subgraph problem, which consists of 
finding the largest isomorphic subgraphs of two graphs. 
A simpler version of this problem is to find a maximal 
common subgraph, i.e., an isomorphism between sub- 
graphs which is not included in any larger subgraph iso- 
morphism. 

H.G. Barrow and R.M. Burstall [1], and also D. Kozen [30], introduced the notion of an association graph as a useful auxiliary graph structure for solving general graph/subgraph isomorphism problems. Specifically, the association graph derived from graphs $G' = (V', E')$ and $G'' = (V'', E'')$ is the undirected graph $G = (V, E)$ where

$$V = V' \times V''$$

and

$$E = \{((i, h), (j, k)) \in V \times V : i \neq j,\ h \neq k,\ \text{and } (i, j) \in E' \Leftrightarrow (h, k) \in E''\}.$$
The following straightforward result establishes an equivalence between the graph isomorphism problem and the maximum clique problem [46].

Theorem 4 Let $G' = (V', E')$ and $G'' = (V'', E'')$ be two graphs of order $n$, and let $G$ be the corresponding association graph. Then, $G'$ and $G''$ are isomorphic if and only if $\omega(G) = n$. In this case, any maximum clique of $G$ induces an isomorphism between $G'$ and $G''$, and vice versa. In general, maximum (maximal) cliques in $G$ are in one-to-one correspondence with maximum (maximal) common subgraph isomorphisms between $G'$ and $G''$.


By virtue of Theorem 2, it is a straightforward exercise to formulate the graph isomorphism problem in terms of a quadratic programming problem. Let $G'$ and $G''$ be two arbitrary graphs of order $n$, and let $A_G$ denote the adjacency matrix of the corresponding association graph $G$, whose order is $n^2$. The graph isomorphism problem is equivalent to the following program:

$$\max\; \hat{f}_G(x) = x^\top\left(A_G + \tfrac{1}{2} I_{n^2}\right)x \quad \text{s.t.}\; x \in S_{n^2}. \tag{9}$$

More precisely, $G'$ and $G''$ are isomorphic if and only if $\hat{f}_G(x^*) = 1 - 1/(2n)$, where $x^*$ is a global solution of (9); recall that, by Theorem 2, the global maximum of $\hat{f}_G$ is its value at the characteristic vector of a maximum clique, namely $1 - 1/(2\omega(G))$. In this case, any global solution to (9) induces an isomorphism between $G'$ and $G''$, and vice versa. In general, local (global) solutions to (9) are in one-to-one correspondence with maximal (maximum) common subgraph isomorphisms between $G'$ and $G''$.
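The construction of the association graph and the value $1 - 1/(2n)$ in (9) can be checked on a tiny example. In the sketch below (an illustration, not code from [46]) $G'$ is the path with edges $\{0, 1\}$ and $\{1, 2\}$, $G''$ is the path with edges $\{1, 0\}$ and $\{0, 2\}$, and $\phi$ maps $0 \mapsto 1$, $1 \mapsto 0$, $2 \mapsto 2$. The pairs $(i, \phi(i))$ form a clique of size $n = 3$ in the association graph, and $\hat f_G$ at its characteristic vector is exactly $5/6$.

```python
from fractions import Fraction
from itertools import product


def association_graph(n, E1, E2):
    """Vertices (i, h) in V' x V''; edges as in the definition of the association graph."""
    def adj(E, a, b):
        return (a, b) in E or (b, a) in E
    V = list(product(range(n), range(n)))
    A = [[0] * len(V) for _ in V]
    for p, (i, h) in enumerate(V):
        for q, (j, k) in enumerate(V):
            if i != j and h != k and adj(E1, i, j) == adj(E2, h, k):
                A[p][q] = 1
    return V, A


n = 3
E1 = {(0, 1), (1, 2)}          # G': path with middle vertex 1
E2 = {(1, 0), (0, 2)}          # G'': path with middle vertex 0
V, A = association_graph(n, E1, E2)

phi = {0: 1, 1: 0, 2: 2}       # an isomorphism from G' to G''
C = {V.index((i, phi[i])) for i in range(n)}
x = [Fraction(1, n) if p in C else Fraction(0) for p in range(len(V))]
N = len(V)
f_hat = (sum(x[p] * A[p][q] * x[q] for p in range(N) for q in range(N))
         + Fraction(1, 2) * sum(v * v for v in x))
print(f_hat, 1 - Fraction(1, 2 * n))   # both 5/6
```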

The previous result allows one to use replicator dynamics with payoff matrix

$$W = A_G + \frac{1}{2} I_{n^2}$$

as a heuristic for graph isomorphism problems. Starting from an arbitrary initial state, the dynamical system will converge to a local solution of (9). This will correspond to a characteristic vector of a maximal clique in the association graph $G$ which, in turn, will induce an isomorphism between two subgraphs of $G'$ and $G''$ which is maximal, in the sense that there is no other isomorphism between subgraphs of $G'$ and $G''$ that includes the one found.

The algorithm outlined above has been tested over 
hundreds of random 100-vertex graphs with expected 
densities ranging from 1% to 99%. Except for very 
sparse and very dense instances, the algorithm was always able to obtain a correct isomorphism very efficiently. In terms of quality of solutions, the results compare favorably with those obtained using more sophisticated state-of-the-art deterministic annealing heuristics
which, in contrast to replicator dynamics, are explicitly 
designed to escape from poor local solutions. As far as computational time is concerned, replicator dynamics turned out to be significantly faster.

In [46] experiments were also done using the following exponential version of replicator equations, which arises as a model of evolution guided by imitation [22,23,24,55]:

$$\dot{x}_i(t) = x_i(t)\left(\frac{\exp(\kappa\, \pi_i(t))}{\sum_{j=1}^{n} x_j(t)\, \exp(\kappa\, \pi_j(t))} - 1\right), \tag{10}$$

$i = 1, \dots, n$, where $\kappa$ is a positive constant. As $\kappa$ tends to 0, the orbits of this dynamics approach those of the standard, ‘first order’ replicator model (2), slowed down by the factor $\kappa$; moreover, for large values of $\kappa$ the model approximates the so-called ‘best-reply’ dynamics [24]. As it turns out [22], these models behave essentially in the same way as the standard replicator equations (2), the only difference being the size of the basins of attraction around stable equilibria.

A customary way of discretizing equation (10) is given by the following difference equations [15,20]:

$$x_i(t+1) = \frac{x_i(t)\, \exp(\kappa\, \pi_i(t))}{\sum_{j=1}^{n} x_j(t)\, \exp(\kappa\, \pi_j(t))}, \tag{11}$$

$i = 1, \dots, n$. The extensive results reported in [46] with these dynamics show that exponential replicator dynamics may be considerably faster and even more accurate than the standard, first order model.
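A minimal sketch of the discrete exponential dynamics (11), applied with the regularized payoff matrix $A_G + \tfrac12 I_n$ to a hypothetical toy graph (a 4-clique $\{0, 1, 2, 3\}$ plus a vertex 4 adjacent only to vertex 0). The value $\kappa = 5$ and the iteration count are arbitrary illustrative choices. Since $\hat f_G(x^C) = 1 - 1/(2|C|)$ at the characteristic vector of a clique $C$, the final objective value is converted into a clique-size estimate via $1/(2(1 - \hat f_G))$.

```python
import math
from itertools import combinations


def exp_replicator(W, kappa, iters=500):
    """Iterate the discrete exponential replicator dynamics (11) from the barycenter."""
    n = len(W)
    x = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(w * xj for w, xj in zip(row, x)) for row in W]
        num = [xi * math.exp(kappa * p) for xi, p in zip(x, pi)]
        total = sum(num)
        x = [v / total for v in num]
    return x


n = 5
A = [[0] * n for _ in range(n)]
for i, j in list(combinations(range(4), 2)) + [(0, 4)]:   # K4 plus a pendant vertex 4
    A[i][j] = A[j][i] = 1
W = [[A[i][j] + (0.5 if i == j else 0.0) for j in range(n)] for i in range(n)]   # A_G + I/2

x = exp_replicator(W, kappa=5.0)
F = sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))
support = [i for i in range(n) if x[i] > 1e-6]
print(support, 1.0 / (2.0 * (1.0 - F)))   # support and clique-size estimate
```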

The approach just described is general and can 
clearly be extended to deal with subgraph isomorphism 
or relational structure matching problems [45]. Prelim- 
inary experiments, however, seem to indicate that lo- 
cal optima may represent a problem here, especially in 
matching sparse and dense graphs. In these cases escape 
procedures like those presented in [5,6,9,13] would be 
helpful. 


Subtree Isomorphism 


Given a graph $G = (V, E)$, a path is any sequence of vertices $i_0 \cdots i_n$, distinct except possibly for $i_0 = i_n$, such that for all $k = 1, \dots, n$, $(i_{k-1}, i_k) \in E$; in this case, the length of the path is $n$. If $i_0 = i_n$, the path is called a cycle. A graph is said to be connected if any pair of vertices is joined by a path. The distance between two vertices $i$ and $j$, denoted by $d(i, j)$, is the length of the shortest path joining them (by convention $d(i, j) = \infty$ if there is no such path). Given a subset of vertices $C \subseteq V$, the induced subgraph $G[C]$ is the graph having $C$ as its vertex set, and two vertices are adjacent in $G[C]$ if and only if they are adjacent in $G$.

A connected graph with no cycles is called a tree. A rooted tree is one which has a distinguished vertex, called the root. The level of a vertex $i$ in a rooted tree, denoted by $\mathrm{lev}(i)$, is the length of the path connecting the root to $i$. Note that there is an obvious equivalence between rooted trees and directed trees, where the edges are assumed to be oriented. We shall therefore use the same terminology typically used for directed trees to define the relation between two adjacent vertices. In particular, if $(i, j) \in E$ and $\mathrm{lev}(j) - \mathrm{lev}(i) = +1$, we say that $i$ is the parent of $j$ and, conversely, $j$ is a child of $i$. Trees have a number of interesting properties. One which turns out to be very useful for our characterization is that in a tree any two vertices are connected by a unique path.

Let $T_1 = (V_1, E_1)$ and $T_2 = (V_2, E_2)$ be two rooted trees. Any bijection $\phi: H_1 \to H_2$, with $H_1 \subseteq V_1$ and $H_2 \subseteq V_2$, is called a subtree isomorphism if it preserves the adjacency and hierarchical relationships between the vertices and, in addition, the subgraphs obtained when we restrict ourselves to $H_1$ and $H_2$, i.e., $T_1[H_1]$ and $T_2[H_2]$, are trees. The former condition amounts to stating that, given $i, j \in H_1$, we have $(i, j) \in E_1$ if and only if $(\phi(i), \phi(j)) \in E_2$, and $i$ is the parent of $j$ if and only if $\phi(i)$ is the parent of $\phi(j)$. A subtree isomorphism is maximal if there is no other subtree isomorphism $\phi': H_1' \to H_2'$ with $H_1$ a strict subset of $H_1'$, and maximum if $H_1$ has largest cardinality. The maximal (maximum) subtree isomorphism problem is to find a maximal (maximum) subtree isomorphism between two rooted trees. This is a problem solvable in polynomial time [19].

Let $i$ and $j$ be two distinct vertices of a rooted tree $T$, and let $i = x_0 \cdots x_n = j$ be the (unique) path joining them. The path-string of $i$ and $j$, denoted by $\mathrm{str}(i, j)$, is the string $s_1 \cdots s_n$ on the alphabet $\{-1, +1\}$ where, for all $k = 1, \dots, n$, $s_k = \mathrm{lev}(x_k) - \mathrm{lev}(x_{k-1})$. By convention, when $i = j$ we define $\mathrm{str}(i, j) = \varepsilon$, where $\varepsilon$ is the null string (i.e., the string having zero length). The path-string concept has a very intuitive meaning. Because of the orientation induced by the root, only two types of elementary moves can be done from any given vertex, i.e., going down to one of the children (if one exists) or going up to the parent (if the vertex is not the root). Assigning to the first move the label $+1$, and to the second the label $-1$, the path-string of $i$ and $j$ is simply the string of elementary moves required to move from $i$ to $j$, following the unique path joining them.

The tree association graph (TAG) of two rooted trees $T_1 = (V_1, E_1)$ and $T_2 = (V_2, E_2)$ is the graph $G = (V, E)$ where

$$V = V_1 \times V_2 \tag{12}$$

and, for any two vertices $(i, h)$ and $(j, k)$ in $V$, we have

$$((i, h), (j, k)) \in E \Leftrightarrow \mathrm{str}(i, j) = \mathrm{str}(h, k). \tag{13}$$
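Path-strings are easy to compute once each vertex knows its parent: the unique path from $i$ to $j$ climbs from $i$ to their lowest common ancestor and then descends to $j$, so $\mathrm{str}(i, j)$ is a run of $-1$s followed by a run of $+1$s. The sketch below (illustrative names; trees are assumed to be given as dictionaries mapping each vertex to its parent, with `None` for the root) builds the TAG of (12) and (13) by brute force over all pairs of TAG vertices, so it is meant only for tiny trees.

```python
from itertools import product


def ancestors(parent, v):
    """List v, parent(v), ..., root; parent[root] is None."""
    chain = [v]
    while parent[v] is not None:
        v = parent[v]
        chain.append(v)
    return chain


def path_string(parent, i, j):
    """str(i, j): -1 for each move up to the lowest common ancestor, then +1 for each move down."""
    up, down = ancestors(parent, i), ancestors(parent, j)
    down_set = set(down)
    lca = next(a for a in up if a in down_set)
    return (-1,) * up.index(lca) + (+1,) * down.index(lca)


def tree_association_graph(parent1, parent2):
    """Vertices V1 x V2 as in (12); edges join distinct pairs with equal path-strings, as in (13)."""
    V = list(product(parent1, parent2))
    E = set()
    for a, b in product(V, V):
        if a != b and path_string(parent1, a[0], b[0]) == path_string(parent2, a[1], b[1]):
            E.add(frozenset((a, b)))
    return V, E


# Rooted tree: root 0 with children 1 and 2; vertex 3 is a child of 1.
T = {0: None, 1: 0, 2: 0, 3: 1}
print(path_string(T, 3, 2))   # up, up, down
V, E = tree_association_graph(T, T)
```

Matching the tree with itself, the pairs $(i, i)$ form a clique with $|V_1|$ vertices, as Theorem 5 predicts for the identity isomorphism.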


The following theorem establishes a one-to-one cor- 
respondence between the maximum subtree isomor- 
phism problem and the maximum clique problem [48]. 


Theorem 5 Any maximal (maximum) subtree iso- 
morphism between two rooted trees induces a maximal 
(maximum) clique in the corresponding TAG, and vice 
versa. 


In many practical applications the trees being matched 
have vertices with an associated vector of symbolic 
and/or numeric attributes. The framework just de- 
scribed can naturally be extended for solving attributed 
tree matching problems [48]. 

Formally, an attributed tree is a triple $T = (V, E, \alpha)$, where $(V, E)$ is the ‘underlying’ rooted tree and $\alpha$ is a function which assigns an attribute vector $\alpha(i)$ to each vertex $i \in V$. It is clear that in matching two attributed trees, the objective is to find an isomorphism which pairs vertices having ‘similar’ attributes. To this end, let $\sigma$ be any similarity measure on the attribute space, i.e., any (symmetric) function which assigns a positive number to any pair of attribute vectors. If $\phi: H_1 \to H_2$ is a subtree isomorphism between two attributed trees $T_1 = (V_1, E_1, \alpha_1)$ and $T_2 = (V_2, E_2, \alpha_2)$, the overall similarity between the induced subtrees $T_1[H_1]$ and $T_2[H_2]$ can be defined as follows:

$$S(\phi) = \sum_{i \in H_1} \sigma(\alpha_1(i), \alpha_2(\phi(i))).$$

The isomorphism $\phi$ is called a maximal similarity subtree isomorphism if there is no other subtree isomorphism $\phi': H_1' \to H_2'$ such that $H_1$ is a strict subset of $H_1'$ and $S(\phi) < S(\phi')$. It is called a maximum similarity subtree isomorphism if $S(\phi)$ is largest among all subtree isomorphisms between $T_1$ and $T_2$.




Replicator Dynamics in Combinatorial Optimization 


The weighted TAG of two attributed trees $T_1$ and $T_2$ is the weighted graph $G = (V, E, w)$, where $V$ and $E$ are defined as in (12) and (13), and $w$ is a vector which assigns a positive weight to each vertex $(i, h) \in V = V_1 \times V_2$ as follows:

$$w_{ih} = \sigma(\alpha_1(i), \alpha_2(h)).$$


The following result is the weighted counterpart of 
Theorem 5 [48]. 


Theorem 6 Any maximal (maximum) similarity sub- 
tree isomorphism between two attributed trees induces 
a maximal (maximum) weight clique in the correspond- 
ing weighted TAG, and vice versa. 
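As a small illustration of the weighted TAG construction, the sketch below computes the vertex weights $w_{ih}$ and the similarity $S(\phi)$ for a toy pair of attributed trees. It assumes a hypothetical similarity measure $\sigma(a, b) = 1/(1 + \|a - b\|)$ (any symmetric, positive function would do) and represents $\phi$ as a Python dict; the TAG edges defined in (12) and (13) are not built here. Taking the weight of a clique to be the sum of its vertex weights, the vertices $(i, \phi(i))$ carry a total weight equal to $S(\phi)$, which is consistent with Theorem 6.

```python
import math


def similarity(a, b):
    """Hypothetical sigma: symmetric and strictly positive on attribute vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def tag_vertex_weights(attrs1, attrs2, sigma=similarity):
    """Weight w_ih = sigma(alpha_1(i), alpha_2(h)) for each vertex (i, h) of V1 x V2."""
    return {(i, h): sigma(a_i, a_h)
            for i, a_i in attrs1.items()
            for h, a_h in attrs2.items()}


def similarity_score(phi, attrs1, attrs2, sigma=similarity):
    """S(phi): sum over i in H1 of sigma(alpha_1(i), alpha_2(phi(i)))."""
    return sum(sigma(attrs1[i], attrs2[h]) for i, h in phi.items())


# Toy attributed vertices: vertex -> attribute vector (values made up for illustration).
alpha1 = {"r": (0.0,), "a": (1.0,)}
alpha2 = {"R": (0.0,), "B": (3.0,)}
phi = {"r": "R", "a": "B"}   # a toy vertex map standing in for a subtree isomorphism

w = tag_vertex_weights(alpha1, alpha2)
clique_weight = sum(w[(i, h)] for i, h in phi.items())  # weight of the vertices (i, phi(i))
print(similarity_score(phi, alpha1, alpha2), clique_weight)
```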


Theorems 5 and 6 provide a formal justification for ap- 
plying replicator dynamics to find maximal subtree iso- 
morphisms. In [48] this approach has been applied in 
computer vision to the problem of matching articulated 
and deformed visual shapes described by ‘shock’ trees, 
an abstract representation of shape based on the sin- 
gularities arising during a curve evolution process. The 
experiments, conducted on a number of shapes repre- 
senting various object classes, yielded very good results, 
both in the weighted and in the unweighted case. The 
system typically converged towards the globally opti- 
mal solutions in only a few seconds, and compared 
favorably with another powerful tree matching algo- 
rithm. 


A Geometric Problem 


Let $G = \{x_1, \dots, x_m\}$ be a finite set of points in $\mathbb{R}^n$.
The convex hull of G, denoted by conv(G), is defined 
as the smallest convex set containing G. A basic prob- 
lem in computational geometry is to determine whether 
a given query point y is inside or outside conv(G) [49]. 
This task can easily be accomplished by a replicator dy- 
namical system [43]. Such an algorithm can be used as 
a subroutine for solving more general geometric prob- 
lems, such as the polygon inclusion and the convex hull 
problems. 

Consider the $n \times m$ real matrix defined as $X = [x_1 \cdots x_m]$.
It is well known that conv(G) can be written as

$$\operatorname{conv}(G) = \{u \in \mathbb{R}^n : u = Xv,\ v \in S_m\}.$$

Given an arbitrary point $y \in \mathbb{R}^n$ the measure

$$E(y, G) = \min_{v \in S_m} \|Xv - y\|,$$


sometimes called the exteriority of y to conv(G), is just 
the Euclidean distance between y and its closest point 
in conv(G). The exteriority measure can provide use- 
ful information about the ability of neural networks to 
generalize well [16]. Clearly, $y \in \operatorname{conv}(G)$ if and only if
$E(y, G) = 0$, in which case the closest point to y is y itself.

For convenience, the problem of evaluating E(y, G)
is translated into the equivalent (but more manageable)
quadratic program:

$$\min\ C(v) = \tfrac{1}{2}\|Xv - y\|^2 \quad \text{s.t. } v \in S_m. \qquad (14)$$


It is a well-known fact that C is convex (indeed strictly
convex if the vectors $x_1, \dots, x_m$ happen to be linearly
independent), and this implies that all local minima
of C are also global minima. Any descent procedure
is therefore guaranteed to approach the global optimal 
solution in this case, without the risk of getting trapped 
into poor local minima. 

It is interesting to note that a similar optimization 
problem, known as the problem of ‘optimal stability’, 
also arises in the context of learning in perceptron neu- 
ral networks, where the goal is to derive the parameters 
of the network so as to ensure larger basins of attraction 
[31,51]. Moreover, our problem turns out to be closely 
related to that of determining whether a given set of 
prototype vectors can be stored in a neural network as- 
sociative memory [29]. 

Note that the quadratic objective function in (14) is
explicitly written as follows:

$$C(v) = \tfrac{1}{2} v^\top X^\top X v - y^\top X v + \tfrac{1}{2} y^\top y,$$


which is a nonhomogeneous quadratic polynomial. In
order for replicator equations to find a solution of prob-
lem (14), we need to construct the payoff matrix as

$$W = X^\top X$$

and to replace the π function defined in (1) with:

$$\pi_i(t) = \sum_{j=1}^{m} w_{ij}\, x_j(t) + s_i,$$

where $s_i$ equals the ith component of $-X^\top y$. After
a proper rescaling of W and the $s_i$'s, it is readily seen
that C is a Liapunov function for both continuous-time
(2) and discrete-time (3) dynamics. The algorithms will
converge to a local solution of (14), say $v^*$, starting from
any interior point. Owing to the convexity of C, $v^*$ will
also be a global minimizer of C, so that the exteriority
can be calculated as:

$$E(y, G) = \sqrt{2C(v^*)}.$$


In [43], experiments with a simple toy problem demon- 
strate the validity of the approach. 
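The following Python sketch estimates the exteriority with the discrete-time replicator update. It uses one possible rescaling, which is an assumption and not necessarily the one used in [43]: because $\sum_i v_i = 1$ on $S_m$, we have $Xv - y = \sum_i v_i (x_i - y)$, so $2C(v) = v^\top D v$ with $D_{ij} = (x_i - y)^\top (x_j - y)$. The payoff matrix $W = c\,\mathbf{1}\mathbf{1}^\top - D$, with c chosen so that every entry is at least 1, then satisfies $v^\top W v = c - 2C(v)$ on $S_m$, so increasing $v^\top W v$ decreases C. This folds the linear terms $s_i$ into the matrix instead of keeping $W = X^\top X$. The function name `exteriority` and the fixed iteration count are illustrative choices.

```python
import math


def exteriority(points, y, iters=2000):
    """Estimate E(y, G) = min over v in S_m of ||Xv - y|| (a sketch).

    points: the m points x_1, ..., x_m (columns of X), each a sequence of n floats.
    y:      the query point, a sequence of n floats.
    Returns (estimated exteriority, final v in the simplex S_m).
    """
    m = len(points)
    z = [[xi - yi for xi, yi in zip(p, y)] for p in points]            # x_i - y
    D = [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]  # D_ij
    c = max(max(row) for row in D) + 1.0
    W = [[c - dij for dij in row] for row in D]   # assumed rescaling: entries >= 1
    v = [1.0 / m] * m                              # barycenter: an interior start
    for _ in range(iters):
        payoff = [sum(wij * vj for wij, vj in zip(row, v)) for row in W]  # (Wv)_i
        mean = sum(vi * pi for vi, pi in zip(v, payoff))                 # v^T W v
        v = [vi * pi / mean for vi, pi in zip(v, payoff)]                # replicator step
    two_c = sum(v[i] * D[i][j] * v[j] for i in range(m) for j in range(m))  # 2 C(v)
    return math.sqrt(max(two_c, 0.0)), v


# Usage: the unit square and two query points.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(exteriority(square, (2.0, 0.5))[0])    # outside: distance to the edge x = 1
print(exteriority(square, (0.25, 0.25))[0])  # inside: close to 0
```

The fixed iteration count keeps the sketch short; a practical version would stop once successive values of C stop changing.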


Multipopulation Models 


The single-population replicator equations discussed so
far can easily be generalized to the case where inter-
actions take place among $n \ge 2$ individuals randomly
drawn from n distinct populations [23,55]. In this case,
the continuous-time dynamics (2) becomes

$$\dot{x}_i^\mu(t) = x_i^\mu(t)\Big(\pi_i^\mu(t) - \sum_j x_j^\mu(t)\,\pi_j^\mu(t)\Big), \qquad (15)$$


and its discrete-time counterpart is

$$x_i^\mu(t+1) = \frac{x_i^\mu(t)\,\pi_i^\mu(t)}{\sum_j x_j^\mu(t)\,\pi_j^\mu(t)}. \qquad (16)$$

The function π can either be linear, as in (1), or can
take a more general form. If there exists a polynomial F
such that

$$\pi_i^\mu = \frac{\partial F}{\partial x_i^\mu},$$


then it can be proved that F strictly increases along any 
trajectory of both dynamics [2,3,23]. Note that these dy- 
namics work in a product of standard simplices. 
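A minimal sketch of the discrete-time dynamics (16) for two populations, assuming the polynomial $F(x, y) = x^\top A y$ with a positive matrix A (a hypothetical example, not taken from [2,3,23]). Here $\pi^1 = \partial F/\partial x = Ay$ and $\pi^2 = \partial F/\partial y = A^\top x$, each population is renormalized on its own simplex, and the returned history lets one check numerically that F does not decrease along the trajectory.

```python
def two_population_replicator(A, x, y, steps=50):
    """Discrete-time replicator dynamics (16) for two populations (a sketch).

    Assumes F(x, y) = x^T A y with every A[i][j] > 0, so the payoffs
    pi^1 = dF/dx = A y and pi^2 = dF/dy = A^T x are positive.
    Returns the final strategy vectors and the values of F along the way.
    """
    rows, cols = range(len(x)), range(len(y))

    def F(x, y):
        return sum(x[i] * A[i][j] * y[j] for i in rows for j in cols)

    history = [F(x, y)]
    for _ in range(steps):
        px = [sum(A[i][j] * y[j] for j in cols) for i in rows]   # dF/dx_i
        py = [sum(A[i][j] * x[i] for i in rows) for j in cols]   # dF/dy_j
        fx = sum(x[i] * px[i] for i in rows)   # mean payoff in population 1 (= F)
        fy = sum(y[j] * py[j] for j in cols)   # mean payoff in population 2 (= F)
        x = [x[i] * px[i] / fx for i in rows]  # both updates use the old (x, y)
        y = [y[j] * py[j] / fy for j in cols]
        history.append(F(x, y))
    return x, y, history


x_final, y_final, F_values = two_population_replicator(
    [[1.0, 2.0], [3.0, 1.0]], [0.5, 0.5], [0.5, 0.5])
print(F_values[0], F_values[-1])
```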

H. Mühlenbein et al. [39] used multipopulation
replicator equations to approximately solve the graph
partitioning problem, which is NP-complete [19]. Given
a graph G = (V, E) with edge weights $w_{ij}$, their goal was
to partition the vertices of G into a predefined number
of clusters in such a way as to maximize the overall
intrapartition traffic

$$F = \sum_{\mu} F_\mu,$$

where

$$F_\mu = \sum_{i,j} w_{ij}\, x_i^\mu x_j^\mu$$

is the intrapartition traffic for cluster μ. Here, $x_i^\mu$ can
be interpreted as the probability that vertex i belongs to
cluster μ.
By putting

$$\pi_i^\mu = \frac{\partial F}{\partial x_i^\mu} = 2\sum_j w_{ij}\, x_j^\mu,$$

the replicator equations seen above will indeed converge
toward a maximizer of F. However, in so doing the
system typically converges towards an interior attractor,
thereby giving an infeasible solution. To avoid this
problem, Mühlenbein et al. [39] put a ‘selection
pressure’ parameter S on the main diagonal of the
weight matrix, and altered it during the evolution of the
process. Intuitively, S = 0 has no influence on the system.
Negative values of S prevent the vertices from committing
to a partition, whereas positive values force the vertices
to take a decision. The proposed algorithm starts
with a negative value of S, and makes the discrete-time 
dynamics (16) evolve. After convergence, if an infeasi- 
ble solution has been found, S is increased and the al- 
gorithm is started again. The entire procedure is iter- 
ated until convergence to a feasible solution. A similar, 
but more principled, strategy for the maximum clique 
problem can be found in [9]. The results presented in 
[39] on a particular problem instance are fairly encour- 
aging. However, more experiments on larger and di- 
verse graphs are needed to fully assess the potential of 
the approach. 

Multipopulation replicator models have also been 
used in [39,41] to solve the traveling salesman prob- 
lem, which asks for the shortest closed tour connecting 
a given set of cities, subject to the constraint that each 
city be visited only once. The results presented on small 
problem instances, i. e., up to 30 cities, are encouraging 
but it seems that the results do not scale well with the 
size of the problem. 


Conclusions 


Despite their simplicity and inherent inability to escape 
from local solutions, replicator dynamics have proved 
to be a useful heuristic for attacking a variety of com- 
binatorial optimization problems. They are completely 
devoid of operational parameters, which typically re- 
quire a lengthy, problem-dependent tuning phase, and 
are especially suited for parallel hardware implementa- 
tion. 


This article describes resource allocation for control of
epidemics of infectious diseases in humans, particularly 
diseases that are spread directly between individuals, 
such as sexually-transmitted diseases and influenza. In 
its most general form, the problem is to determine the 
optimal amount to spend over time and in different 
populations on programs for controlling the spread of 
an infectious disease. Restricted versions of the prob- 
lem include that of determining the level of investment 
over time in a single program targeted to a single popu- 
lation; determination of the appropriate investment in 
competing interventions targeted to the same popula- 
tion; and determination of the appropriate allocation of 
resources for a single intervention targeted to different 
populations. 

While resource allocation problems have been stud- 
ied by economists and operations researchers for many 
years (see, for example, [4,10,26]), resource allocation 
for epidemic control poses special challenges. Epidemics
of infectious diseases are dynamic and are inherently 
nonlinear: while an epidemic is growing, saving one 
person today from getting infected could translate into 
scores of people being saved from infection over time. 
In addition, an epidemic may progress differently in 
different populations, and different programs for the 
control of an infectious disease can have very different 
costs and effectiveness. 

Approaches to the problem of resource alloca- 
tion for epidemic control include analytical deriva- 
tions for simple epidemic models, numerical analysis of 
more sophisticated epidemic models, and heuristic ap- 
proaches for decision makers. Before these approaches 
are described, a simple epidemic model is presented. 
Understanding this type of model is key to understand- 
ing the resource allocation problem. 


Epidemic Models 


One of the simplest types of epidemic models assumes 
transmission of an infectious disease within a closed 
population that is divided into three subgroups: unin- 
fected individuals, infected individuals, and those re- 
moved from the infection-transmission process [12,13]. 
To specify the model, let x(t) represent the number of
uninfected individuals in the population at time t, y(t)
the number of infected individuals in the population at
time t (these individuals are assumed to be infectious),
and z(t) the number of individuals removed from the
infection-transmission process at time t. Let $\beta(t)$ be the
rate of infection-transmitting contacts at time t, u(t) the
rate at which susceptibles are immunized at time t, and
$\gamma(t)$ the rate of removal from the population at time t.
The model can be written as

$$\frac{dx}{dt} = -\beta(t)\,x(t)\,y(t) - u(t), \qquad (1)$$

$$\frac{dy}{dt} = \beta(t)\,x(t)\,y(t) - \gamma(t)\,y(t), \qquad (2)$$

$$\frac{dz}{dt} = \gamma(t)\,y(t) + u(t). \qquad (3)$$


This model and similar models provide the founda- 
tion for much of the work that has been done on opti- 
mal resource allocation for epidemic control. 
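To make the model concrete, the sketch below integrates (1)-(3) with the forward Euler method. The rate functions are supplied by the caller; the step size and all parameter values in the usage lines are made up for illustration, and the caller is responsible for choosing u(t) so that x(t) stays nonnegative. The sketch also accumulates the number of new infections, $\int \beta(t)x(t)y(t)\,dt$, which is the kind of outcome a resource allocation objective would penalize.

```python
def simulate_epidemic(x0, y0, z0, beta, gamma, u, T, dt=0.01):
    """Forward-Euler sketch of model (1)-(3).

    beta, gamma and u are callables of t; returns the final (x, y, z) and
    the cumulative number of new infections, i.e. the integral of beta*x*y.
    """
    x, y, z = float(x0), float(y0), float(z0)
    new_infections = 0.0
    for k in range(int(round(T / dt))):
        t = k * dt
        infections = beta(t) * x * y      # beta(t) x(t) y(t)
        removals = gamma(t) * y           # gamma(t) y(t)
        immunized = u(t)                  # u(t); caller must keep x >= 0
        x, y, z = (x - dt * (infections + immunized),
                   y + dt * (infections - removals),
                   z + dt * (removals + immunized))
        new_infections += dt * infections
    return x, y, z, new_infections


# Made-up parameters: 1000 people, 10 initially infected; no program vs. a 5-day campaign.
no_program = simulate_epidemic(990, 10, 0, lambda t: 5e-4, lambda t: 0.1,
                               lambda t: 0.0, T=100)
campaign = simulate_epidemic(990, 10, 0, lambda t: 5e-4, lambda t: 0.1,
                             lambda t: 20.0 if t < 5 else 0.0, T=100)
print(no_program[3], campaign[3])
```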


A key feature of this and other models of infectious 
disease is a nonlinear growth rate that is a function of 
the size of the uninfected group multiplied by the size 
of the infected group. More sophisticated models may 
include features such as entry into and exit from the 
population, further subdivision of the population by 
risk group and disease stage, different types of infec- 
tious contacts, variable infectivity rates, and stochastic 
parameters. Comprehensive expositions of mathemati- 
cal epidemic models can be found in [2] and [3]. 


Analytical Results 


A number of researchers have considered the applica- 
tion of control theory to simple epidemic models sim- 
ilar to that outlined above with the goal of obtaining 
analytical results characterizing the form of the optimal 
solution. The parameters of the model may be assumed 
to be deterministic or stochastic. Examples of controls 
typically considered include vaccination of susceptibles 
(which increases the rate u(t)), treatment or removal of 
infectious persons (which increases the rate $\gamma(t)$), and
reduction in the sufficient contact rate $\beta(t)$ between
susceptibles and infectious persons. A finite or infinite
time horizon may be considered. The goal is to deter-
mine the optimal control over time: for example, the
optimal rate of immunization u(t) for $0 \le t \le T$. For
analytical tractability, most analyses assume that only 
one type of control is applied (and thus only one pa- 
rameter is affected by the control). 

A typical objective in the application of such control 
might be to minimize the cost of control (e. g., immu- 
nization cost plus the fixed cost of establishing the im- 
munization program) plus the cost associated with the 
number of individuals who become infected. With the 
exception of the fixed cost of establishing a control pro- 
gram, costs are usually assumed to be linear: the cost of 
control is a constant multiplied by the affected param- 
eter, and the cost of disease is a constant multiplied by 
the number of individuals who become infected. 

Use of simple epidemic models and linear cost func- 
tions allows for characterization of the form of the op- 
timal solution: for example, immunization of all indi- 
viduals until the epidemic is reduced to a certain level 
(a ‘bang-bang’ solution with a single switching point). 
A survey up to 1977 of the application of control the- 
ory to infectious diseases can be found in K. Wickwire 
[27]. More recent work of this type can be found in
D. Greenhalgh [6], S.P. Sethi [23], and Sethi and P.W.
Staats [24]. 

A related type of analysis considers the optimal tim- 
ing of interventions for epidemic control. For exam- 
ple, H.L. Lee and W.P. Pierskalla [15] considered mass 
screening for a contagious disease with no latent period. 
The goal is to minimize the average number of infected 
individuals in the population over a fixed time horizon. 
The authors showed that under certain assumptions, an 
optimal strategy is mass screening at equal time inter- 
vals. 

Another type of analysis considers allocation of re- 
sources among different population subgroups with the 
goal of disease eradication (reducing the disease equi- 
librium in each population to zero). Use of an equilib- 
rium condition allows for analytical tractability. For ex- 
ample, R.M. May and R.M. Anderson [17] considered 
the distribution of vaccine among several heteroge- 
neously mixing populations. The goal is to eradicate the 
disease with as little vaccine as possible. They character- 
ized the optimal fraction of each population that should 
be immunized. As another example, J. Abounadi and 
L.M. Wein [1] analyzed resource allocation for control 
of human immunodeficiency virus (HIV) among ho- 
mogeneously mixing and heterogeneously mixing pop- 
ulation groups. Expenditure of resources on a given 
population group reduces that group’s contact rates. 
The goal is to eradicate the disease while minimizing 
the total cost of control (which is a monotonically in- 
creasing function of the reduction in the contact rates). 
The authors developed analytical results characterizing 
the amount of resource that should be spent on each 
population. 

More recent analytical work by A. Richter, M.L. 
Brandeau and G.S. Zaric [22] considers allocation of 
resources across nonmixing populations with the goal 
of minimizing the total number of new infections that 
occur over a fixed time horizon. Resources for trans- 
mission reduction are to be allocated among the popu- 
lation groups subject to a constraint on total available 
resources (cost of control is a monotonically increas- 
ing function of the reduction in the transmission rate) 
and subject to limits on attainable transmission rates in 
each population. The authors establish conditions un- 
der which the resource allocation problem is convex or 
concave, as well as other analytical results. Zaric [28] 
extended such analyses to more sophisticated epidemic 
models. 


Numerical Analyses 


Another approach to the problem of resource allocation 
for epidemic control uses numerical analysis of more 
sophisticated epidemic models, often tailored for spe- 
cific diseases. Several notable examples are mentioned 
here. 

Using a model of tuberculosis epidemiology, poli- 
cies for the control of tuberculosis in developing coun- 
tries were analyzed by C. ReVelle and colleagues in [18] 
and [19]. Numerical analysis was used to determine the 
set of interventions that minimizes the cost of control 
required to achieve a specified number of active cases at 
the end of a given time horizon. I.M. Longini, E. Acker- 
man and L.R. Elveback [16] used numerical analysis to 
determine the optimal distribution of a fixed amount of 
vaccine among different age groups during an Influenza 
A epidemic. H.W. Hethcote and J.A. Yorke [8] and Het- 
hcote, Yorke and A. Nold [9] used numerical analysis 
of an epidemic model to evaluate the equilibrium epi- 
demic state for policies aimed at controlling the spread 
of gonorrhea. Although the results were not compared 
explicitly with program cost, the authors suggested that 
such comparison must be made before the appropriate 
allocation of resources can be determined. 

A framework similar to that of [22] was used by 
Richter, Brandeau and D.K. Owens [21] to evaluate al- 
location of HIV prevention resources among different 
programs targeted to different risk groups in the patient 
population of a large health care system. The authors 
estimated the cost-effect function associated with each 
intervention as applied to each population (defined as 
the reduction in transmission that could be achieved as 
a function of expenditure), and used numerical analy- 
sis to determine the optimal allocation of a fixed budget 
across the prevention programs and the populations. 

In [5], C.M. Friedrich and Brandeau investigated 
the optimal level of funding for a single HIV prevention 
program targeted to a single population using a sim- 
ple epidemic model. Via simulation, the authors devel- 
oped qualitative insights into the nature of the optimal 
investment (e. g., spend no money, spend some of the 
money, or spend as much money as possible) for differ- 
ent types of cost-effect functions. 




Resource Allocation for Epidemic Control 


Another approach to the resource allocation prob- 
lem uses ideas from artificial intelligence. W.Y. Tan and 
S. Yakowitz [25] proposed the application of a machine- 
learning algorithm for a Markov decision process (e. g., 
see T.L. Lai and Yakowitz [14]) to determine the op- 
timal policy for control of an epidemic. The approach 
was illustrated using numerical analysis of a stochas- 
tic model of the HIV epidemic in a single population. 
The goal is to minimize the number of new infections 
by continuously allocating a fixed amount of resources 
between programs that either lower the contact rate or 
lower the infectivity per contact. As the learning algo-
rithm progresses, the resource allocation converges to
an equilibrium; [25] showed that this equilibrium is the
optimal allocation. 


Practical Tools for Decision Makers 


Existing analytical work on resource allocation for epi- 
demic control has limited applicability due to the sim- 
ple epidemic models and simple control policies that 
are assumed. Numerical approaches can provide ap- 
plicable results, but require the development of real- 
istic epidemic models and the collection of a signifi- 
cant amount of data. As an alternative, E.H. Kaplan 
[11] has proposed a heuristic approach for community 
planners who must allocate HIV prevention resources. 
He also suggests that decision makers subjectively con- 
struct production functions that estimate the number of 
new HIV infections that would occur in a particular 
population group if x incremental dollars were spent on
a given HIV prevention program targeted to that group. 
Then the problem of allocating a fixed prevention bud- 
get among prevention programs and populations so as 
to minimize new HIV cases reduces to a knapsack prob- 
lem. 
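Kaplan's formulation is not spelled out here, so the following is only a sketch under stated assumptions: each (program, population) pair has a short list of candidate funding levels, in whole budget units, with the number of infections averted at each level read off a subjectively constructed production function, and at most one level may be chosen per pair. Maximizing infections averted (equivalently, minimizing new infections) is then a multiple-choice knapsack problem, which the dynamic program below solves; all names and numbers are hypothetical.

```python
def allocate_budget(options, budget):
    """Multiple-choice knapsack by dynamic programming (a sketch).

    options: dict mapping a (program, population) label to a list of
             (cost, infections_averted) pairs, with integer costs.
    budget:  total budget in the same integer units.
    Returns (total infections averted, {label: chosen cost}).
    """
    # best[b] = best (value, choices) found so far with total cost <= b
    best = [(0.0, {})] * (budget + 1)
    for label, levels in options.items():
        new = list(best)                       # default: do not fund this pair
        for b in range(budget + 1):
            for cost, averted in levels:
                if cost <= b:
                    value = best[b - cost][0] + averted   # uses earlier pairs only
                    if value > new[b][0]:
                        new[b] = (value, {**best[b - cost][1], label: cost})
        best = new
    return best[budget]


# Hypothetical production-function data, in budget units and infections averted.
options = {
    ("outreach", "group A"): [(1, 3.0), (2, 5.0)],
    ("testing", "group B"): [(1, 4.0), (3, 6.0)],
}
print(allocate_budget(options, 3))
```

Because each pair's levels are combined only with the table from earlier pairs, no pair is funded at two levels at once.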


See also 


> Combinatorial Optimization Algorithms in 
Resource Allocation Problems 


The purpose of this article is to introduce the complex
reverse convex programming problem. As such, this ar- 
ticle is not a complete survey of the research into solu- 
tion methods for reverse convex programs. This article 
will provide several examples that demonstrate the im- 
portance or relevance of the problem class, and will in- 
troduce the reader to some typical solution strategies 
that illustrate the combinatorial nature of the problem. 
Much of the early work in reverse convex programming 
deals with finding local minima [1,2,15,17]. The discus- 
sion of this article is directed toward the global opti- 
mization of reverse convex programs. 


Definitions 


Let X and C denote two convex subsets of $\mathbb{R}^n$, where X
is closed. Let $G = \operatorname{cl}(C^c)$; that is, G is the closure of the
complement of a convex subset of $\mathbb{R}^n$. We call such a set
a reverse convex set. The corresponding reverse convex 
feasible region is defined as 


$$F = X \cap G. \qquad (1)$$


Let $f: O \to \mathbb{R}$, where O is an open subset of $\mathbb{R}^n$ such
that $O \supset F$. The general reverse convex programming
problem (RCP) is defined as

$$\min\{f(x) : x \in X \cap G\}. \qquad (2)$$


Clearly, F is generally a nonconvex set and, moreover, 
F is often disconnected. As a result, (2) is a member of 
a class of difficult optimization problems, whether or 
not f is a convex function. 


Examples 


The first two examples below are well-known difficult 
examples in their own right; each can be rewritten as an 
equivalent reverse convex optimization problem. There 
is no computational significance of this conversion; 
however, the conversion is included so the reader may 
better understand the general complexity of the reverse 
convex class of optimization problems. 


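Example 1 Let

$$X_I = \{x \in \mathbb{R}^n : Ax \ge b,\ x_j \in \{0, 1\},\ j = 1, \dots, n\},$$

where A is an $(m \times n)$-matrix and b is an m-vector. Then

$$\min\{c^\top x : x \in X_I\} \qquad (3)$$

is the well-known 0-1 linear integer programming
problem. Define the two sets $X = \{x \in \mathbb{R}^n : Ax \ge b,\ 0 \le x_j \le 1,\ j = 1, \dots, n\}$
and $G = \{x \in \mathbb{R}^n : \sum_j x_j(1 - x_j) \le 0\}$. Since $\sum_j x_j(1 - x_j)$
is a concave function, G is a reverse convex set; moreover, every
term of this sum is nonnegative on X, so $x \in X \cap G$ exactly when
each $x_j \in \{0, 1\}$. Hence (3) is equivalent to the reverse convex
optimization problem

$$\min\{c^\top x : x \in X \cap G\}. \qquad (4)$$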


Example 2 Let $X_A = \{x \in \mathbb{R}^n : Ax \ge b,\ x \ge 0\}$ and let f
be a concave function on $\mathbb{R}^n$. The problem

$$\min\{f(x) : x \in X_A\} \qquad (5)$$

is the well-known, and difficult, concave minimization
problem (f is continuous since we assume it is defined
on $\mathbb{R}^n$). Introduce an additional variable, η, and the
inequality constraint $\eta - f(x) \ge 0$. Let $X = \{(x, \eta) \in \mathbb{R}^{n+1} : Ax + \eta\mathbf{0} \ge b,\ x \ge 0\}$,
where $\mathbf{0}$ is a column m-vector of zeros, and let
$G = \{(x, \eta) : \eta - f(x) \ge 0\}$, a reverse convex set. Then (5) is
equivalent to the reverse convex optimization problem

$$\min_{x,\eta}\{\eta : (x, \eta) \in X \cap G\}. \qquad (6)$$


Example 3 Let E denote the node-arc incidence matrix
of a connected, directed graph (assume there are
m nodes and n arcs). Let $f_{i,j}$ denote the nonnegative
flow on arc (i, j) and let $k_{i,j} \ge 0$ denote the capacity
of arc (i, j). Assume the capacity of arc (i, j) can be
increased by an amount $x_{i,j} \ge 0$ at a corresponding cost
of $c_{i,j}(x_{i,j}) \ge 0$. Also assume, for each arc (i, j), that
$c_{i,j}(x_{i,j}) \ge c_{i,j}(\hat{x}_{i,j})$ if $x_{i,j} \ge \hat{x}_{i,j}$, that $c_{i,j}(x_{i,j}) \to \infty$ as
$x_{i,j} \to \infty$, and that $c_{i,j}$ is continuous. Let $B > 0$ denote the
capacity expansion budget; it is desired to increase the
flow capacity (from the source to the sink, respectively
represented by the first and last rows of E) as much as
possible. We assume the network at hand has the
property that there are economies of scale present; that is,
the average cost of capacity expansion, for each arc, is
a decreasing function. Specifically, each incremental
capacity cost function $c_{i,j}$ is assumed to be concave. Let v denote the value
of a feasible flow vector f. Let $q = (-1, 0, \dots, 0, 1)^\top$. The
optimization problem may be written as

$$\begin{aligned} \max_{f,x,v}\ & v \\ \text{s.t. } & Ef - vq = 0, \\ & f - x \le k, \\ & \textstyle\sum_{i,j} c_{i,j}(x_{i,j}) \le B, \\ & f \ge 0,\ x \ge 0. \end{aligned} \qquad (7)$$

Since each $c_{i,j}$ is concave and continuous, the set
$G = \{x \ge 0 : \sum_{i,j} c_{i,j}(x_{i,j}) \le B\}$ is a reverse convex set.


Example 4 Consider the following problem, an example
of the linear bilevel optimization problem [5],

$$\min_{x,y}\ c^\top x + d^\top y \quad \text{s.t. } A_1 x + D_1 y \ge b^1, \qquad (8)$$

where y solves

$$\min_z\ f^\top z \quad \text{s.t. } A_2 x + D_2 z \ge b^2.$$


The interpretation of this problem is that a ‘king’ de- 
cides upon a vector pair, (x*, y*). He imposes his choice 
of x* upon his subjects; however, this benevolent king 
will allow his subjects to choose the vector y (since the 
king is optimistic and, he assumes, omniscient as well, 
he assumes his subjects’ optimal choice of y, when faced 
with x*, will be the same as his choice, y*). Let 
$$\varphi(b^2 - A_2 x) = \min_z\{f^\top z : A_2 x + D_2 z \ge b^2\} \qquad (9)$$

denote the so-called optimal value function for the
king’s subjects, as a function of the king’s choice of x.
The king’s problem (8) is equivalent to

$$\begin{aligned} \min_{x,y}\ & c^\top x + d^\top y \\ \text{s.t. } & A_1 x + D_1 y \ge b^1, \\ & A_2 x + D_2 y \ge b^2, \\ & \varphi(b^2 - A_2 x) - f^\top y \ge 0. \end{aligned} \qquad (10)$$

Since φ is a convex polyhedral function of x, we see that
the last constraint of (10) is a reverse convex constraint.
That is, the set $G = \{(x, y) : \varphi(b^2 - A_2 x) - f^\top y \ge 0\}$ is
a reverse convex set. Of course this problem is quite dif- 
ficult in that, unlike the previous examples, the reverse 
convex constraint is not explicitly known. 


Example 5 Consider the reverse convex optimization 
problem with several reverse convex constraints: 


$$\min\{f(x) : x \in X,\ g_i(x) \ge 0,\ i = 1, \dots, m\} \qquad (11)$$

where each $g_i$ is a convex function. By the addition
of a variable, (11) can be converted to a reverse convex
optimization problem with one reverse convex
constraint [21]. Let

$$g(x) = \min\{g_i(x) : i = 1, \dots, m\} \qquad (12)$$

and let

$$p(x) = \sum_{i=1}^{m} g_i(x), \qquad q(x) = \max\Big\{\sum_{i \ne j} g_i(x) : j = 1, \dots, m\Big\}. \qquad (13)$$

Since $\sum_{i \ne j} g_i = p - g_j$, the maximum over j equals $p - g$, so
$0 \le g(x) = p(x) - q(x)$ is equivalent to the two
constraints, in (x, t),

$$q(x) - t \le 0, \qquad p(x) - t \ge 0, \qquad (14)$$


where, of course, the last constraint is a reverse convex 
constraint. See U. Ueing [22] for an early branch and 
bound approach to the global optimization of (11). 
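The identity behind (13) and (14) is easy to check numerically. The sketch below uses three hypothetical convex functions of one variable, evaluates p and q directly from their definitions in (13), confirms that $p - q$ reproduces g from (12), and tests the lifted constraints (14) for a given t: a suitable t exists exactly when $q(x) \le p(x)$, i.e. when $g(x) \ge 0$.

```python
def dc_parts(gs, x):
    """Evaluate p(x) and q(x) of (13) for a list gs of convex functions g_i."""
    vals = [g(x) for g in gs]
    p = sum(vals)
    q = max(sum(v for i, v in enumerate(vals) if i != j) for j in range(len(vals)))
    return p, q


def lifted_feasible(gs, x, t):
    """Check the two constraints (14) in (x, t): q(x) - t <= 0 and p(x) - t >= 0."""
    p, q = dc_parts(gs, x)
    return q - t <= 0 and p - t >= 0


# Hypothetical convex functions of one variable.
gs = [lambda s: (s - 1.0) ** 2, lambda s: s ** 2 - 0.5, lambda s: abs(s + 2.0) - 1.0]
for s in (-3.0, 0.0, 0.5, 2.0):
    p, q = dc_parts(gs, s)
    print(s, p - q, min(g(s) for g in gs))   # p - q matches g(s) from (12)
```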


Example 6 Consider the separable optimization prob- 
lem 


$$\min_{x \in X,\ y \in Y}\ f_1(x) + f_2(y) \quad \text{s.t. } g^1(x) + g^2(y) \ge 0, \qquad (15)$$

where X, Y are convex sets of, perhaps, different
dimensions; $f_1$ is convex, $f_2$ is concave, $g^1$ is a vector of
concave functions, and $g^2$ is a vector of convex functions.
Assume for each $y \in Y$ that the problem

$$(P_y)\qquad \min_{x \in X}\ f_1(x) \quad \text{s.t. } g^1(x) \ge -g^2(y)$$

has a saddle point. Let $y^i$, $i = 1, \dots, k$, be points of Y and
let $(x^i, \lambda^i)$ denote a saddle point for $(P_{y^i})$, $i = 1, \dots, k$.
The kth relaxed master of Benders’ decomposition
procedure [6,8] is

$$\begin{aligned} \min_{y \in Y,\ \eta}\ & \eta \\ \text{s.t. } & \eta - \big(f_2(y) - \lambda^{i\top} g^2(y)\big) \ge f_1(x^i) - \lambda^{i\top} g^1(x^i), \quad i = 1, \dots, k. \end{aligned} \qquad (16)$$

This is a problem with several reverse convex
constraints. Note that a simple change of variable,
$\gamma = \eta - f_2(y)$, leads to the equivalent problem

$$\begin{aligned} \min_{y \in Y,\ \gamma}\ & \gamma + f_2(y) \\ \text{s.t. } & \gamma + \lambda^{i\top} g^2(y) \ge f_1(x^i) - \lambda^{i\top} g^1(x^i), \quad i = 1, \dots, k. \end{aligned} \qquad (17)$$

This problem involves the minimization of a concave
function subject to several reverse convex constraints,
a very difficult formulation of the relaxed master.


Some Basic Concepts for Solution Methods 


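Let $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}$ be convex functions
and let $G = \{x \in \mathbb{R}^n : g(x) \ge 0\}$, a reverse convex set.
Also assume

$$\exists\, w \in X \setminus G \ \text{such that}\ f(w) < f^*, \qquad (18)$$

where $f^*$ is the optimal value of the reverse convex
problem (2). Then any optimal solution $\bar{x}$ has the
property that $g(\bar{x}) = 0$ (if $g(\bar{x}) > 0$, points of the segment
$[\bar{x}, w]$ close to $\bar{x}$ would remain feasible and, by convexity
of f, have a strictly smaller objective value); it then follows that

$$0 = \max\{g(x) : x \in X,\ f(x) \le f(\bar{x})\} \qquad (19)$$

and, therefore, $\bar{x}$ is optimal for (19). Under certain
additional ‘stability’ [21] conditions, it is also the case that
if $\bar{x}$ satisfies (19) then $\bar{x}$ is optimal for (2).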


X a Convex Polytope, f Linear 


In order to develop an understanding of the combi- 
natorial nature of the problem, it is useful to consider 
the following. If, in addition to the above assumptions,
X is a convex polytope, then the convex hull of the feasible
region (1) is also a convex polytope (see [11,12]). This, in turn,
implies that there is an optimal solution for (2) on an edge
of the feasible region (1) if f is a linear function (or,
of course, if f is a concave function). In this case we
refer to the reverse convex programming problem as
LRCP to denote that we are dealing with linear
programs with an additional reverse convex constraint.
Assume a vertex $\bar{x}$ of X can be found with the property
that $g(\bar{x}) > 0$. Then pivot via the simplex algorithm, so
as to decrease $f(x) = c^\top x$, until a neighboring pair of
vertices, u and v, is found with the property $g(u) > 0$ and
$g(v) < 0$. Let $z \in [u, v]$ be the last point on the edge
$[u, v]$ with the property $g(z) = 0$. Consider the convex
polytope $X \cap \{x : c^\top x = c^\top z\}$ and generate the $n - 1$
neighbors (we assume nondegeneracy for this discussion) of
z on the hyperplane $\{x : c^\top x = c^\top z\}$. If one of those
neighbors, say η, is such that $g(\eta) > 0$, then we consider the
edge $[\eta_1, \eta_2]$ of X that contains η and continue
pivoting, via the simplex algorithm, from that edge’s
endpoint which is feasible. On the other hand, if no
neighbor of z is feasible, we solve (19) with $\bar{x} = z$. If $\bar{x}$ is
optimal for (19) then, under ‘stability’, $\bar{x}$ is optimal for
(2). If $\bar{x}$ is not optimal for (19), let $x'$ be optimal; then $x'$
is a vertex of X (if $c^\top x' < c^\top \bar{x}$) or $x'$ is on an edge of X
(if $c^\top x' = c^\top \bar{x}$) and, of course, $g(x') > 0$. Note that we
have made an implicit assumption that (19) is an easier 
problem to solve than the original problem (2). There 
is no theoretical justification for this assumption; often 
however, in practice, the assumption appears to hold. 
Also, all that is needed is an x’ with the aforementioned 
properties; an optimal x’ is not necessarily required. 

See [3,4] for an early usage of the concave minimiza- 
tion problem (19) in the context of the application of 
Benders’ decomposition [6,8] to an economies-of-scale 
network capacity expansion problem of the form of (7). 
Also see [11] for a full discussion of the above pivoting 
method. 

Since an edge optimal solution exists, a branch and 
bound edge search strategy may be in order. The first 
is due to R.J. Hillestad [10]; also, see [14]. We will not 
discuss this approach. 

There is also a large literature, employing cutting 
planes based on the seminal paper of H. Tuy [20], 
which discusses the usefulness and difficulties of cut- 
ting planes for reverse convex optimization (e.g., see 
[7,9,12,13,18]). This literature will not be discussed in 
this article. 


X Convex, f Convex


Under assumption (18), and following Tuy [21], define
the mapping $\pi: X \cap G \to \mathbb{R}^n$ by

$$\alpha = \min\{\theta \in [0, 1] : g(x + \theta(w - x)) = 0\}, \qquad \pi(x) = x + \alpha(w - x). \qquad (20)$$

If z solves (19) with $\bar{x} = x$ but $g(z) > 0$, then $f(\pi(z)) < f(x)$. This
suggests the following:


    Select x ∈ X ∩ ∂G   (∂G = {x : g(x) = 0})
    Repeat
        Let z solve (19) with x̄ = x
        If g(z) > 0, x ← π(z)
    Until {g(z) = 0}


Under a few technical assumptions, this method 
will, in the limit, produce an optimal solution. How- 
ever, the major step in the algorithm involves a convex 
maximization problem over a convex set and, in some 
sense, illustrates the relationship between convex max- 
imization (concave minimization) and reverse convex 
programs. Of course, the convex maximization prob- 
lem is often as difficult to solve as the original reverse 
convex program (2) and, therefore, solving the latter by 
a sequence of convex maximization problems may not 
be effective. However, there are excellent concave mini- 
mization methods (cf. e. g., » Concave Programming). 
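The map π in (20) is the concrete, inexpensive part of these methods. The sketch below computes it by bisection, assuming g is convex with $g(x) \ge 0$ and $g(w) < 0$; convexity makes $\theta \mapsto g(x + \theta(w - x))$ nonnegative exactly on $[0, \alpha]$, so bisection converges to α. The toy instance (G the exterior of the unit disk, a convex quadratic f, and $w = (0.3, 0)$ with $f(w) = 0 < f^* = 0.49$) is made up so that (18) holds.

```python
import math


def pi_map(x, w, g, tol=1e-12):
    """Sketch of Tuy's map (20) by bisection on theta in [0, 1].

    Requires g(x) >= 0 and g(w) < 0 with g convex; then
    g(x + theta (w - x)) >= 0 exactly on [0, alpha], so bisection keeps
    g >= 0 at lo and g < 0 at hi, and lo converges to alpha.
    """
    if g(x) < 0 or g(w) >= 0:
        raise ValueError("need g(x) >= 0 and g(w) < 0")

    def point(theta):
        return [xi + theta * (wi - xi) for xi, wi in zip(x, w)]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(point(mid)) >= 0:
            lo = mid
        else:
            hi = mid
    return point(lo)            # g >= 0 here, so the point stays in G


# Toy instance (assumed): G = {||x|| >= 1}, f convex, f(w) = 0 < f* = 0.49.
def g(p):
    return p[0] ** 2 + p[1] ** 2 - 1.0


def f(p):
    return (p[0] - 0.3) ** 2 + p[1] ** 2


z = [2.0, 2.0]
w = [0.3, 0.0]
p = pi_map(z, w, g)
print(math.hypot(*p), f(p), f(z))   # pi(z) lies on the unit circle and f(pi(z)) < f(z)
```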

There are various algorithmic strategies that attempt
to overcome the inherent difficulties for solving
(19) directly. Let $\delta_x$ denote a support (subgradient) for g at
the point x. Let $\bar{x} \in X \cap G$ and let z solve the convex
optimization problem, with a linear objective,

$$\max\{\delta_{\bar{x}}^\top (x - \bar{x}) : x \in X,\ f(x) \le f(\bar{x})\}. \qquad (21)$$

If $\delta_{\bar{x}}^\top (z - \bar{x}) > 0$, then $g(z) > 0$ (because
$g(z) \ge g(\bar{x}) + \delta_{\bar{x}}^\top (z - \bar{x})$ and $g(\bar{x}) \ge 0$) and $f(\pi(z)) < f(\bar{x})$.
To see the latter, observe

$$f(\pi(z)) \le f(z) + \alpha\big(f(w) - f(z)\big) < f(z) + \alpha\big(f(w) - f^*\big) < f(z) \le f(\bar{x}),$$

where the strict inequalities hold since (18) implies that
$\alpha > 0$ and that z cannot be optimal for (2). This suggests
the following:


    Select x̄ ∈ X ∩ ∂G
    DO
        Repeat
            Let z solve (21)
            If δ_x̄^T(z − x̄) > 0, x̄ ← π(z)
        Until {δ_x̄^T(z − x̄) = 0}
        Let z solve (19)
        If g(z) > 0, x̄ ← π(z)
    WHILE {g(z) > 0}
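The two-phase scheme can be sketched on a hypothetical two-variable instance whose data are assumptions, not taken from the text. X is the square [0, 2]^2, f(x) = x_1 + x_2, g(x) = x_1^2 + x_2^2 − 1, and w = (0, 0). The support δ_x̄ is taken to be the gradient of the differentiable g. Problem (21), a linear objective over a polygon, is solved by vertex enumeration. This is a small-scale illustration only, not an efficient implementation.

```python
# Hypothetical toy data: X = [0, 2]^2, f(x) = x1 + x2, g(x) = x1^2 + x2^2 - 1, w = (0, 0).
SQUARE = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
W = (0.0, 0.0)


def f(x):
    return x[0] + x[1]


def g(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0


def support(x_bar):
    """delta_{x_bar}: for a differentiable convex g the gradient is a support."""
    return (2.0 * x_bar[0], 2.0 * x_bar[1])


def gain(d, x_bar, x):
    """Objective of (21): delta^T (x - x_bar)."""
    return d[0] * (x[0] - x_bar[0]) + d[1] * (x[1] - x_bar[1])


def clip(poly, t):
    """Vertices of the convex polygon `poly` intersected with {x : f(x) <= t}."""
    out = []
    for i, p in enumerate(poly):
        q = poly[(i + 1) % len(poly)]
        fp, fq = f(p) - t, f(q) - t
        if fp <= 0:
            out.append(p)
        if fp * fq < 0:
            s = fp / (fp - fq)
            out.append((p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1])))
    return out


def pi_map(x, w=W, iters=100):
    """pi(x) of (20) by bisection, keeping the endpoint where g >= 0."""
    point = lambda th: (x[0] + th * (w[0] - x[0]), x[1] + th * (w[1] - x[1]))
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(point(mid)) >= 0 else (lo, mid)
    return point(lo)


def two_phase(x_bar, tol=1e-9, max_iter=100):
    for _ in range(max_iter):                          # DO ... WHILE {g(z) > 0}
        for _ in range(max_iter):                      # Repeat ... Until {delta^T (z - x_bar) = 0}
            d = support(x_bar)
            z = max(clip(SQUARE, f(x_bar)), key=lambda x: gain(d, x_bar, x))  # LP (21)
            if gain(d, x_bar, z) <= tol:
                break
            x_bar = pi_map(z)                          # gain > 0 implies g(z) > 0
        z = max(clip(SQUARE, f(x_bar)), key=g)         # convex maximization (19)
        if g(z) <= tol:
            break
        x_bar = pi_map(z)
    return x_bar


x = two_phase((0.6, 0.8))
print(x, f(x))  # approximately (0.0, 1.0) with f(x) close to 1
```

From (0.6, 0.8) the first phase alone already moves x̄ to the optimum (0, 1). The final call of (19) only certifies that no further improvement exists.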


Of course, the point of solving (21) is to avoid, as 
long as possible, addressing problem (19). Nevertheless, 
(19) must be dealt with eventually. Another approach 
is to use an outer approximation method for problem 
(19). For instance, see Tuy [21] for a method which generates a sequence of convex polytopes {S_k} with the properties S_k ⊂ S_{k−1} and each S_k ⊇ X ∩ G. Other methods,
oped for problem (2) and the reader is referred to [13] 
as well as to the ‘Journal of Global Optimization’. 
Test Problem Construction


At this point in time, as one might imagine, there are 
no generally applicable and efficient methods for solv- 
ing (2), the general RCP. Therefore, in order to test new 
procedures, it is important to have at hand a method 
for generating problems for which we know the opti- 
mal vector. There are several such methods; we will de- 
velop one, the first, due to Y. Sung and J.B. Rosen [19] 
(also, see [16]), for constructing a concave minimiza- 
tion problem, over a convex polytope, whose answer is 
known. A slight modification of that method leads to 
the construction of an LRCP whose answer is known. 
We proceed as follows. 

Let X = {x ∈ R^n : Ax ≤ b}, where A is m × n, m > n (e.g., nonnegativity constraints are included), and it is assumed that X is bounded. Let f(x) = c^T x and let x̄ be any edge point of X that is not a vertex of X. Then x̄ is a vertex of X ∩ {x : c^T x ≤ c^T x̄}. Without loss of generality, assume the first n − 1 rows of (A, b) define the edge of X of which x̄ is an element. That is, x̄ is the unique solution of the system of equations


$$a_i^T x = b_i, \quad i = 1, \dots, n-1, \qquad c^T x = c^T \bar x \tag{22}$$


(we assume c^T is not a linear combination of the rows a_i^T, i = 1, …, n − 1). Let Dx = d denote the matrix representation of (22). Let

$$v_i = \min\{a_i^T x : Ax \le b,\ c^T x \le c^T \bar x\}, \quad i = 1, \dots, n-1, \qquad v_n = \min\{c^T x : Ax \le b\} \tag{23}$$

and, for arbitrarily small ε > 0, define v(ε)^T = (v_1 − ε, …, v_n − ε). Let the jth component of the vector r be defined by r_j = (d_j + v_j(ε))/2. Then x̄ is the unique optimal solution for the LRCP


$$\min\left\{c^T x : Ax \le b,\ \|Dx - r\|^2 - \|d - v(\varepsilon)\|^2/4 \ge 0\right\},$$

where ‖·‖ denotes the Euclidean norm.
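The construction can be checked numerically on a small hypothetical instance. The sketch below assumes the values v_i of (23) are already known; in general each one is a linear program, and here they are worked out by hand. It then confirms that x̄ satisfies the reverse convex constraint with equality, while grid points of X with a strictly better objective violate it.

```python
import numpy as np


def sung_rosen_constraint(A_edge, c, x_bar, v, eps):
    """Data of the reverse convex constraint h(x) = ||Dx - r||^2 - ||d - v(eps)||^2 / 4 >= 0.

    A_edge: the n-1 rows a_i^T that define the edge through x_bar;
    v: the values v_1, ..., v_n of (23), assumed to be computed beforehand (each is an LP).
    """
    D = np.vstack([A_edge, c])                 # rows a_1^T, ..., a_{n-1}^T, c^T
    d = D @ x_bar                              # so that Dx = d is system (22)
    v_eps = np.asarray(v, dtype=float) - eps   # v(eps)
    r = (d + v_eps) / 2.0
    rad2 = float(np.sum((d - v_eps) ** 2)) / 4.0

    def h(x):
        return float(np.sum((D @ x - r) ** 2)) - rad2

    return r, h


# Hypothetical instance: X = [0, 1]^2, c = (1, 1), x_bar = (1, 0.5) on the edge x1 = 1.
A_edge = np.array([[1.0, 0.0]])                # the row of Ax <= b that is active on that edge
c = np.array([1.0, 1.0])
x_bar = np.array([1.0, 0.5])
# By hand: v_1 = min{x1 : x in X, x1 + x2 <= 1.5} = 0 and v_2 = min{x1 + x2 : x in X} = 0.
r, h = sung_rosen_constraint(A_edge, c, x_bar, v=[0.0, 0.0], eps=0.1)

# x_bar satisfies h >= 0 with equality; every grid point of X with a strictly
# smaller objective value violates it.
grid = np.linspace(0.0, 1.0, 201)
better = [np.array([s, t]) for s in grid for t in grid if s + t < c @ x_bar - 1e-9]
print(abs(h(x_bar)) < 1e-12, max(h(x) for x in better) < 0)  # True True
```

Geometrically, the map x ↦ Dx sends the part of X with c^T x ≤ c^T x̄ into the box with opposite corners v(ε) and d. The sphere in the constraint circumscribes that box, and the only corner of the box that can actually be reached is d = Dx̄.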
See also 
> αBB Algorithm


> D.C. Programming 

> Quadratic Knapsack 

> Quadratic Programming with Bound Constraints 

> Standard Quadratic Optimization Problems: Theory 
Control engineers are interested in the amount of mod-
eling uncertainty that can be tolerated in feedback con- 
trol systems. Robust control theory concerns the eval- 
uation and optimization of uncertainty tolerance, oth- 
erwise known as stability margin. The purpose of ro- 
bust control theory is to enable control engineers to de- 
termine quantitatively whether or not a feedback con- 
trol design is capable of maintaining satisfactory perfor- 
mance for all perturbations within a given class. Beyond 
this, they may also seek to optimize control system ro- 
bustness by choosing a feedback controller that maxi- 
mizes uncertainty tolerance. In either case, the problem 
is essentially a nonconvex optimization problem. 


Canonical Robust Control Problem 


By a suitable choice of variables, most robust con- 
trol problems can be cast in the general framework of 
the canonical uncertain control system (S) depicted in 
Fig. 1 (cf. [25, p. 62]). Given the plant P(s) and a set 𝚫 of uncertain feedback perturbations Δ, the canonical robust control synthesis problem is to find a controller K(s) so that the closed-loop transfer function matrix remains stable for all block diagonal perturbation matrices

$$\Delta = \begin{pmatrix} \Delta_1 & 0 & \cdots & 0 \\ 0 & \Delta_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Delta_n \end{pmatrix} \in \boldsymbol{\Delta}.$$


The Δ_i's are called uncertainties and the set 𝚫 is called the uncertainty set. Via the introduction of an additional fictitious uncertainty, J.C. Doyle, J. Wall and G. Stein [10] observed that one may embed performance issues such as noise attenuation requirements within the robust stability framework; so, the focus on stability robustness is not overly restrictive.

When the controller K(s) is given beforehand, the resulting simplified problem is called the robustness analysis problem, also known as the multivariable stability margin analysis problem.

Robust Control, Figure 1
Canonical uncertain control system (S) (blocks: uncertainty Δ, controller, nominal closed loop T)

On an abstract level, M.G. Safonov [25] showed the analysis problem to be equivalent to testing for the topological separation of the graphs of the operators T and Δ; this is so even in very general cases where T and Δ are nonlinear operators. The multivariable stability margin K_m(T) associated with a given controller K is [26,27]
$$K_m(T) \triangleq \inf_{\Delta,\ \gamma \ge 0} \{\gamma : \text{system (S) is unstable},\ \Delta \in \gamma\boldsymbol{\Delta}\}.$$


Thus, the quantity K_m is the smallest nonnegative real number γ for which the 'scaled' set γ𝚫 contains a destabilizing Δ. In cases where there is no γ ≥ 0 for which there is a destabilizing Δ ∈ γ𝚫, one defines K_m = ∞. Clearly, a controller K(s) solves the robust control synthesis problem if and only if the nominal closed-loop transfer function T(s) satisfies

K_m(T) > 1.

It may be assumed without loss of generality that dim(u_i) = dim(y_i).


Linear Time-Invariant Robustness 


If, as is often assumed by control engineers, both Δ and T are linear time-invariant with stable rational Laplace transforms, then robustness analysis may be related to the characteristic equation of system (S)

$$\det(I - \Delta(s)T(s)) = 0. \tag{1}$$

The system (S) is said to be stable for a given Δ(s) if and only if (1) has no solution s ∈ C̄_+, where C̄_+ denotes the closed right half of the complex plane.

Suppose, additionally, that the set 𝚫 is specified in the frequency domain as simply the set of stable Δ's for which Δ(jω) ∈ 𝚫(jω), where, for each ω, 𝚫(jω) is a given set of matrices. Then, the multivariable stability margin can be evaluated as

$$K_m = \inf_{\omega} k_m(T(j\omega)),$$

where

$$k_m(T(j\omega)) \triangleq \inf_{\Delta,\ \gamma \ge 0} \{\gamma : \det(I - \gamma\Delta(j\omega)T(j\omega)) = 0,\ \Delta(j\omega) \in \boldsymbol{\Delta}(j\omega)\}. \tag{2}$$
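In the simplest case, assumed here only for illustration, 𝚫(jω) is the set of all complex matrices with spectral norm at most one (a single full block). Then (2) has a closed form, k_m(T(jω)) = 1/σ̄(T(jω)), and a destabilizing perturbation can be read off the singular value decomposition. The Python sketch below uses a hypothetical transfer matrix `T_of`. It evaluates this formula and estimates K_m on a frequency grid, which can only approximate the infimum over ω.

```python
import numpy as np


def km_full_block(T):
    """k_m(T) in (2) for Delta(jw) = {Delta : ||Delta||_2 <= 1}, plus a destabilizing Delta0.

    With T = U diag(s) V^*, Delta0 = v1 u1^* has norm 1 and
    (1/s1) Delta0 T = v1 v1^*, so det(I - (1/s1) Delta0 T) = 0.
    """
    U, s, Vh = np.linalg.svd(T)
    v1 = Vh[0].conj()          # first right singular vector
    u1 = U[:, 0]               # first left singular vector
    return 1.0 / s[0], np.outer(v1, u1.conj())


def T_of(w):
    """Hypothetical stable 2x2 closed loop T(s) evaluated at s = jw."""
    s = 1j * w
    return np.array([[1 / (s + 1), 2 / (s + 2)], [0, 1 / (s + 3)]])


omegas = np.logspace(-2, 2, 400)
K_m_grid = min(km_full_block(T_of(w))[0] for w in omegas)   # grid estimate of inf_w k_m

gamma, Delta0 = km_full_block(T_of(0.0))
residual = abs(np.linalg.det(np.eye(2) - gamma * Delta0 @ T_of(0.0)))
print(K_m_grid, residual)  # residual is ~0: gamma * Delta0 makes (1) singular
```

For structured sets 𝚫(jω) this formula only gives a conservative value, since 1/σ̄(T) is never larger than k_m.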


Closely related to the multivariable stability margin k_m(T(jω)) is the structured singular value [9]

$$\mu(T(j\omega)) \triangleq \frac{1}{k_m(T(j\omega))}.$$

Thus, an 'optimal' solution to the robust control synthesis problem is obtained if one can solve the so-called μ-synthesis control problem

$$\min_{K} \sup_{\omega} \mu(T(j\omega)).$$


The exact computation of μ(T(jω)) is in general impractical, except for very simply structured uncertainty sets 𝚫. Indeed, the optimization problem (2) is in general NP-hard [6], so no algorithm is known that solves it in polynomial time in the worst case. Still, easy-to-compute conservative upper bounds on μ abound, but even for these it is usually necessary to first reformulate the robustness analysis problem.


Topological Separation, Sectors and IQCs 


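The robustness analysis problem (2) can be interpreted in terms of a 'topological separation' of the graphs of Δ(jω) and T(jω) (e.g., [15,16,25]):

$$\mathrm{graph}(\Delta) \triangleq \left\{ z = \begin{pmatrix} u \\ y \end{pmatrix} \in \mathrm{range}\begin{pmatrix} \Delta \\ I \end{pmatrix} \right\}, \tag{3}$$

$$\mathrm{graph}(\gamma T) \triangleq \left\{ z = \begin{pmatrix} u \\ y \end{pmatrix} \in \mathrm{range}\begin{pmatrix} I \\ \gamma T \end{pmatrix} \right\}. \tag{4}$$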


In particular, the robustness condition k_m(T) > 1 is equivalent to the existence of a complex matrix Q such that the quadratic functional z^*Qz topologically separates graph(Δ) and graph(γT) in the sense that [15,25]

$$\mathrm{Re}(z^*Qz) > 0 \quad \text{for all } z \in \mathrm{graph}(\Delta),\ z \ne 0, \tag{5}$$

$$\mathrm{Re}(z^*Qz) < 0 \quad \text{for all } z \in \mathrm{graph}(\gamma T),\ z \ne 0. \tag{6}$$


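Indeed, for ε > 0 sufficiently small one such matrix is [16]

$$Q(j\omega) = \begin{pmatrix} \gamma T^*(j\omega) \\ -I \end{pmatrix} \begin{pmatrix} \gamma T(j\omega) & -I \end{pmatrix} - \varepsilon I.$$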
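A quick numerical check illustrates why this Q separates the two graphs. On graph(γT) the factor (γT, −I) annihilates z, leaving −ε‖z‖². On graph(Δ) it equals −(I − γTΔ)y, which stays away from zero when I − γTΔ is nonsingular. The Python sketch below assumes a hypothetical constant matrix in place of T(jω), together with assumed values of γ and ε, and tests one sample point of each graph. It is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2
T = np.array([[0.4, 0.2], [0.0, 0.3]], dtype=complex)   # hypothetical value of T(jw)
gamma, eps = 1.0, 1e-3                                   # assumed gamma and epsilon

L = np.hstack([gamma * T, -np.eye(n)])                   # the row block (gamma T, -I)
Q = L.conj().T @ L - eps * np.eye(2 * n)                 # Q(jw) as in the text


def re_quad(z):
    """Re(z^* Q z)."""
    return float(np.real(z.conj() @ Q @ z))


def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# A point of graph(gamma T): z = (u; gamma T u), so (gamma T, -I) z = 0 and Re(z*Qz) = -eps ||z||^2.
u = crandn(n)
z_T = np.concatenate([u, gamma * T @ u])

# A point of graph(Delta) with ||Delta||_2 = 1: z = (Delta y; y). Here gamma * sigma_max(T) < 1,
# so I - gamma T Delta is nonsingular and Re(z*Qz) > 0 for this small eps.
Delta = crandn(n, n)
Delta /= np.linalg.norm(Delta, 2)
y = crandn(n)
z_D = np.concatenate([Delta @ y, y])

print(re_quad(z_T) < 0, re_quad(z_D) > 0)  # True True
```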


Safonov [25], building on the 1960s nonlinear stability work of I.W. Sandberg [32,33] and G. Zames [37], called this quadratic form of the topological separation condition the sector stability criterion. Later, A. Megretski and A. Rantzer [19] dubbed it the integral quadratic constraint (IQC) approach and linked it to the S-procedure nonlinear stability concept of V.A. Yakubovich.

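In formulating robustness analysis as an optimization problem, the conditions (5)-(6) are usually rewritten, with the help of (3)-(4), as matrix definiteness conditions

$$\mathrm{herm}\left( \begin{pmatrix} \Delta \\ I \end{pmatrix}^{*} Q(j\omega) \begin{pmatrix} \Delta \\ I \end{pmatrix} \right) > 0 \quad \text{for all } \Delta \in \boldsymbol{\Delta}, \tag{7}$$

$$\mathrm{herm}\left( \begin{pmatrix} I \\ \gamma T(j\omega) \end{pmatrix}^{*} Q(j\omega) \begin{pmatrix} I \\ \gamma T(j\omega) \end{pmatrix} \right) < 0, \tag{8}$$

where

$$\mathrm{herm}(X) \triangleq \tfrac{1}{2}(X + X^*).$$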


Consequently, the problem (2) can be reformulated as the optimization k_m(T(jω)) = sup γ over Q and γ ≥ 0, subject to the LMI constraints (7)-(8). Unfortunately, verifying that there exists a Q satisfying (7)-(8) is not in general easier than the original problem. The problem remains inherently NP-hard [6].

As is apparent from (7)-(8), the problem in this form is an instance of a linear matrix inequality (LMI) problem, but with possibly infinitely many LMI constraints: (7) has one LMI for each element of the set 𝚫, which in general may be infinite; for example, even the unit interval 𝚫 = [0, 1] ⊂ R has infinitely many points.


Restrictions on Q 


The class of matrices Q which may potentially solve the optimization (7)-(8) is inherently restricted. K.C. Goh and Safonov [15] established that every IQC topological separating functional is isomorphic to the positivity and small gain stability criteria of Zames [37]. More precisely, for some invertible matrix F, the change of variables

$$\begin{pmatrix} \tilde u \\ \tilde y \end{pmatrix} = F \begin{pmatrix} u \\ y \end{pmatrix}$$

causes the topological separation conditions (5)-(6) to simplify to the so-called 'small gain' form

$$\|\tilde u\|^2 - \|\tilde y\|^2 > 0 \quad \text{for all } z \in \mathrm{graph}(\Delta),\ z \ne 0,$$

$$\|\tilde u\|^2 - \|\tilde y\|^2 < 0 \quad \text{for all } z \in \mathrm{graph}(\gamma T),\ z \ne 0,$$

which corresponds to $Q = F^{*}\begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}F$. Similarly, if one defines

$$P = \frac{1}{\sqrt{2}} \begin{pmatrix} I & I \\ I & -I \end{pmatrix},$$

then under the invertible change of variables

$$\begin{pmatrix} \tilde u \\ \tilde y \end{pmatrix} = P F \begin{pmatrix} u \\ y \end{pmatrix}$$

conditions (5)-(6) assume the 'positivity' form

$$\mathrm{Re}(\tilde y^{*} \tilde u) > 0 \quad \text{for all } z \in \mathrm{graph}(\Delta),\ z \ne 0,$$

$$\mathrm{Re}(\tilde y^{*} \tilde u) < 0 \quad \text{for all } z \in \mathrm{graph}(\gamma T),\ z \ne 0,$$

which corresponds to $Q = F^{*}P^{*}\begin{pmatrix} 0 & 0 \\ I & 0 \end{pmatrix}PF$. One implication of these results is that, without loss of generality, one may restrict the Q matrices to those for which rank(Q) = dim(Q)/2 without introducing any conservativeness in solving the optimization. Another implication is that any Q that solves (7)-(8) must have a Hermitian part whose sign matrix has exactly as many 1's as −1's. But, despite these freedoms to further restrict Q as above, the problem of exact computation of k_m remains inherently NP-hard.

Linear Matrix Inequalities 


Since exact computation of the multivariable stability margin is inherently NP-hard, one must settle for computing upper and/or lower bounds on k_m. In particular, 'conservative' lower bounds are preferred by control engineers, since these give conservative sufficient conditions for stability of the uncertain system.
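Bounds of both kinds can be illustrated for a simple structure. The example assumes two scalar complex uncertainties, 𝚫 = {diag(δ_1, δ_2) : |δ_i| ≤ 1}. For diagonal Δ and D = diag(d, 1), DΔD^{−1} = Δ, so det(I − ΔT) = det(I − Δ DTD^{−1}), and any scaled small-gain value 1/σ̄(DTD^{−1}) is a conservative lower bound on k_m. Because δI belongs to the set, 1/ρ(T) is an upper bound. The sketch below is hypothetical: a crude grid over d stands in for the LMI machinery of the text.

```python
import numpy as np


def km_bounds_two_scalars(T, d_grid=np.logspace(-3, 3, 601)):
    """Bounds on k_m(T) for Delta = diag(delta1, delta2), delta_i complex, |delta_i| <= 1.

    Lower (conservative): 1 / min_d sigma_max(D T D^{-1}) with D = diag(d, 1).
    Upper: 1 / rho(T), because Delta = delta * I belongs to the uncertainty set.
    """
    def scaled_gain(dv):
        D = np.diag([dv, 1.0])
        return np.linalg.norm(D @ T @ np.linalg.inv(D), 2)

    best = min(scaled_gain(dv) for dv in d_grid)
    rho = max(abs(np.linalg.eigvals(T)))
    return 1.0 / best, (np.inf if rho == 0 else 1.0 / rho)


T = np.array([[0.5, 10.0], [0.0, 0.5]])  # hypothetical value of T(jw)
lower, upper = km_bounds_two_scalars(T)
print(1.0 / np.linalg.norm(T, 2), lower, upper)  # roughly 0.0998, 1.98 and 2.0
```

For this triangular T, det(I − ΔT) = (1 − 0.5δ_1)(1 − 0.5δ_2), so the exact margin is k_m = 2. The unscaled small-gain bound of about 0.1 is very conservative, while the scaled bound is nearly tight.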

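Many practical algorithms for computing a lower bound k̲_m on the multivariable stability margin involve the solution of finite-dimensional linear matrix inequalities of the form

$$k_m(T(j\omega)) \ge \underline{k}_m(T(j\omega)) \triangleq \sup_{\theta,\ \gamma} \gamma \quad \text{s.t.} \quad \mathrm{herm}\left( \begin{pmatrix} I \\ \gamma T(j\omega) \end{pmatrix}^{*} Q(\theta) \begin{pmatrix} I \\ \gamma T(j\omega) \end{pmatrix} \right) < 0, \quad \mathrm{herm}(M(\theta)) > 0. \tag{9}$$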
Casting the k̲_m lower-bounding problem in this framework typically involves assuming additional 'structural' characteristics for the uncertainty set 𝚫 and exploiting these characteristics to identify linearly parametrized subsets of matrices Q(θ) and M(θ) for which (7) is known to hold a priori. Then, having eliminated the constraint (7), the problem assumes the form of the semidefinite programming problem (9), which is practical to solve. Indeed, for each fixed γ > 0 it is a convex linear matrix inequality (LMI) optimization problem. The key to success here is identifying which classes of linearly parametrized matrices Q(θ) and M(θ) go with which sorts of uncertainty structure.


Uncertainty Structure 


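Several commonly encountered uncertainty structures (cf. [13]) are listed in the table below along with linearly parametrized M(θ) and Q(θ) such that

$$\mathrm{herm}\left( \begin{pmatrix} \Delta \\ I \end{pmatrix}^{*} Q(\theta) \begin{pmatrix} \Delta \\ I \end{pmatrix} \right) \ge 0 \quad \text{for all } \Delta \in \boldsymbol{\Delta} \text{ and all } M(\theta) \text{ with } \mathrm{herm}(M(\theta)) > 0.$$

Uncertainty 𝚫                                   | M(θ) and Q(θ)
Small gain: Δ ∈ C^{n×n}, ‖Δ‖ ≤ 1                 | M(θ) = θI, θ ∈ R;  Q(θ) = θ [−I 0; 0 I]
Positive: Δ ∈ C^{n×n}, herm(Δ) ≥ 0               | M(θ) = θI, θ ∈ R;  Q(θ) = θ [0 I; I 0]
Repeated small gain: Δ = δI, |δ| ≤ 1             | M(θ) = θ ∈ C^{n×n}, M(θ) = M^*(θ);  Q(θ) = [−M 0; 0 M]
Repeated positive, δ ∈ R: Δ = δI, δ ≥ 0          | M(θ) = θ ∈ C^{n×n};  Q(θ) = [0 M; M^* 0]
Repeated positive, δ ∈ C: Δ = δI, Re(δ) ≥ 0      | M(θ) = θ ∈ C^{n×n}, M(θ) = M^*(θ);  Q(θ) = [0 M; M 0]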
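Entries of this kind can be spot-checked numerically. The Python sketch below assumes the small-gain multiplier Q(θ) = θ[−I 0; 0 I] and the positive multiplier Q(θ) = θ[0 I; I 0] with θ > 0. It samples random members of the two uncertainty sets and records the smallest eigenvalue of herm([Δ; I]^* Q [Δ; I]). Random sampling is only a sanity check, not a proof. The proofs are one line each: the quadratic form equals θ(I − Δ^*Δ) and θ(Δ + Δ^*), respectively.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3


def herm(X):
    return 0.5 * (X + X.conj().T)


def min_eig_lhs(Delta, Q):
    """Smallest eigenvalue of herm([Delta; I]^* Q [Delta; I]); >= 0 means (7)-type condition holds."""
    G = np.vstack([Delta, np.eye(n)])
    return np.linalg.eigvalsh(herm(G.conj().T @ Q @ G)).min()


I, Z = np.eye(n), np.zeros((n, n))
theta = 0.7                                             # any theta > 0, so herm(M) = theta*I > 0
Q_small_gain = theta * np.block([[-I, Z], [Z, I]])
Q_positive = theta * np.block([[Z, I], [I, Z]])

worst_sg, worst_pos = np.inf, np.inf
for _ in range(1000):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    D_sg = A / np.linalg.norm(A, 2)                     # ||Delta||_2 = 1
    D_pos = A @ A.conj().T + (A - A.conj().T)           # herm(Delta) = A A^* >= 0
    worst_sg = min(worst_sg, min_eig_lhs(D_sg, Q_small_gain))
    worst_pos = min(worst_pos, min_eig_lhs(D_pos, Q_positive))
print(worst_sg >= -1e-9, worst_pos >= -1e-9)  # True True
```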


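When it is known that each of the individual uncertainties Δ_i satisfies such a condition for a Q_i (i = 1, …, m) partitioned evenly as

$$Q_i = \begin{pmatrix} Q_i^{11} & Q_i^{12} \\ Q_i^{21} & Q_i^{22} \end{pmatrix},$$

then it follows that inequality (7) holds for a likewise partitioned Q having (for j, k = 1, 2)

$$Q^{jk} = \begin{pmatrix} Q_1^{jk}(M_1) & & 0 \\ & \ddots & \\ 0 & & Q_m^{jk}(M_m) \end{pmatrix}.$$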
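The block-diagonal assembly of Q from the individual Q_i can be sketched as follows. For Δ = diag(Δ_1, …, Δ_m), the product [Δ; I]^* Q [Δ; I] becomes block diagonal with blocks [Δ_i; I]^* Q_i [Δ_i; I], so (7) follows blockwise. The Python example below is hypothetical. It assumes a 2×2 small-gain block with multiplier 0.5·[−I 0; 0 I] and a scalar positive block with multiplier 2·[0 1; 1 0], and it uses a small hand-written `block_diag` helper rather than any library routine.

```python
import numpy as np


def block_diag(*mats):
    """Block-diagonal matrix built from the given 2-D arrays."""
    rows = sum(m.shape[0] for m in mats)
    cols = sum(m.shape[1] for m in mats)
    out = np.zeros((rows, cols), dtype=complex)
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]
        c += m.shape[1]
    return out


def compose(Qs, sizes):
    """Q with blocks Q^{jk} = diag(Q_1^{jk}, ..., Q_m^{jk}); each Q_i is 2n_i x 2n_i, split evenly."""
    def part(Q, n, j, k):
        return Q[j * n:(j + 1) * n, k * n:(k + 1) * n]
    return np.block([[block_diag(*[part(Q, n, j, k) for Q, n in zip(Qs, sizes)])
                      for k in (0, 1)] for j in (0, 1)])


# Hypothetical structure: Delta = diag(Delta_1, delta_2), ||Delta_1||_2 <= 1, Re(delta_2) >= 0.
I2, Z2 = np.eye(2), np.zeros((2, 2))
Q1 = 0.5 * np.block([[-I2, Z2], [Z2, I2]])        # small gain multiplier, theta_1 = 0.5
Q2 = 2.0 * np.array([[0.0, 1.0], [1.0, 0.0]])     # positive multiplier, theta_2 = 2
Q = compose([Q1, Q2], [2, 1])

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
Delta = block_diag(A / np.linalg.norm(A, 2), np.array([[0.3 + 2.0j]]))
G = np.vstack([Delta, np.eye(3)])                 # (Delta; I)
H = G.conj().T @ Q @ G
print(np.linalg.eigvalsh(0.5 * (H + H.conj().T)).min() >= -1e-9)  # True: (7) holds
```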


State-Space Robustness Analysis 


Insofar as stability analysis is concerned, the 'state-integrator' operator (1/s)I may be regarded as a repeated positive uncertainty δI, δ ∈ C_+ [34]. This leads to a rearrangement of the block diagram of the system (S) so that the state-integrator is relocated from inside the nominal plant T(s) = C(Is − A)^{-1}B + D into the Δ block as an additional fictitious uncertainty. Thus, Δ and T become

$$\Delta \leftarrow \begin{pmatrix} \frac{1}{s} I & 0 & \cdots & 0 \\ 0 & \Delta_1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Delta_n \end{pmatrix}, \qquad T \leftarrow \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

whence T becomes a constant matrix.
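This rearrangement rests on a standard identity. Closing the integrator block (1/s)I around the constant matrix [A B; C D] as an upper linear fractional transformation gives back T(s) = C(sI − A)^{−1}B + D. The Python sketch below verifies the identity at one sample frequency. The system matrices are random and hypothetical, and `upper_lft` is a helper written for this example.

```python
import numpy as np


def upper_lft(M, Delta, k):
    """F_u(M, Delta) = M22 + M21 Delta (I - M11 Delta)^{-1} M12, with M11 of size k x k."""
    M11, M12 = M[:k, :k], M[:k, k:]
    M21, M22 = M[k:, :k], M[k:, k:]
    return M22 + M21 @ Delta @ np.linalg.solve(np.eye(k) - M11 @ Delta, M12)


rng = np.random.default_rng(3)
nx, nu, ny = 3, 2, 2                       # hypothetical state, input and output sizes
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
C = rng.standard_normal((ny, nx))
D = rng.standard_normal((ny, nu))
M = np.block([[A, B], [C, D]])             # the constant matrix that replaces T

s = 0.3 + 2.0j                             # any s that is not an eigenvalue of A
direct = C @ np.linalg.solve(s * np.eye(nx) - A, B) + D
lifted = upper_lft(M, np.eye(nx) / s, nx)  # integrator block (1/s) I closed around M
print(np.allclose(direct, lifted))         # True
```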


History and Further Reading 


The origins of robust control theory can be traced to 
1970s research work headed by M. Athans at the MIT 
Electronic Systems Laboratory. Up to this time, most 
mathematical control theorists generally assumed that 
untried 1960s optimal control theories would produce 
superior feedback designs. In 1971, researchers Athans 
[1] and H.H. Rosenbrock [24] began to voice cautious doubts about this popular view. By 1975 the
worst expectations had been confirmed when early at- 
tempts to apply linear-quadratic Gaussian (LQG) opti- 
mal control theory to practical feedback designs pro- 
duced unacceptable results, see [2,21,30]. The prob- 
lem was quickly identified to be a lack of attention to 
the multivariable stability margin problem [36]. Athans 
and Safonov [25,29] introduced the use of the term ro-
bustness to describe this problem and laid the foun- 
dation for its solution based on quadratic separating 
functionals. The idea of computing lower bounds on k_m via LMI optimization is due to M.K.H. Fan and A.L. Tits [12].

The evolution of robust control theory during its infancy and up to 1982 is accurately described in [30].
Bibliographies focusing on more recent developments 
in the theory of multivariable stability margin may 
be found in references [8,22]. The article [27] pro- 
vides an elementary introduction to robust control the- 
ory including motivating examples and a sampling of 
nonoptimization techniques for estimating conserva- 
tive bounds on multivariable stability margin. Books 
on robust control design methods include J.M. Ma- 
ciejowski [18], S. Skogestad and I. Postlethwaite [35], 
and B.R. Barmish [4]. Linear matrix inequality (LMI) 
optimization methods in control theory are emphasized 
in books of S.P. Boyd, L. ElGhaoui, E. Feron and V. Bal- 
akrishnan [5] and of ElGhaoui and S. Niculescu [11]. 
References [17,17,23] describe how classes of optimal H∞ robust control synthesis problems can be reduced to LMI optimizations using embeddings based on Parrott's theorem and Finsler's theorem. Bilinear matrix in-
equality (BMI) formulations of the robust control syn- 
thesis problems are described in [20,31]. Commercial 
software packages for robust control include [3,7,14]. 
Consider the robust stability analysis of linear, time-invariant, discrete-time control systems with uncertain real parameters q_i, i = 1, …, ℓ, with given lower and upper bounds q_i^- ≤ q_i ≤ q_i^+. The scalars q_i form a vector q = [q_1, …, q_ℓ]^T that is bounded by a hyperrectangle Q = {q : q_i^- ≤ q_i ≤ q_i^+, i = 1, …, ℓ}. The characteristic polynomial is

$$P(z, q) = \sum_{k=0}^{n} a_k(q) z^k, \quad q \in Q.$$

For simplicity we assume a_n(q) > 0 for all q ∈ Q. Suppose the coefficients a_k(q_1, …, q_ℓ) depend (affine) linearly on the uncertain parameters, i.e.

$$a_k(q) = a_{k0} + a_{k1} q_1 + \cdots + a_{k\ell} q_\ell.$$
Then P(z, Q) = {P(z, q) : q ∈ Q} is a polytope of polynomials. The question of robust Schur stability of the polytope is now: Are all roots of P(z, Q) located inside the unit circle of the complex z-plane?


Solution 


Define the vertex polynomials P_i(z), i = 1, …, 2^ℓ, for which all q_j take on their maximum or minimum value. An obvious necessary condition for robust Schur stability is that all vertex polynomials P_i are Schur stable.

The following derivation of a necessary and sufficient condition was given in [1]. In the first step the edge theorem [2] is used. For the present problem it says that the polytope of polynomials is Schur stable if and only if all exposed edges are Schur stable. The notion of an 'exposed' edge is illustrated by the case ℓ = 2. The admissible region is a rectangle in the (q_1, q_2)-plane. The four edges of the rectangle are exposed; the two diagonals, however, are not exposed. In the ℓ-dimensional hyperrectangle each vertex is connected to ℓ neighboring vertices by exposed edges. Counting each edge only once, there are ℓ2^{ℓ−1} edges. The edge theorem reduces the test of an ℓ-dimensional continuum to a finite number of one-dimensional stability tests for the edges.
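The vertices and exposed edges of the hyperrectangle can be enumerated directly. Each vertex is encoded by a bit tuple, where bit i = 1 means q_i is at its upper bound. Exposed edges join tuples that differ in exactly one bit. The Python sketch below is a hypothetical helper for this enumeration, and it reproduces the counts 2^ℓ and ℓ2^{ℓ−1}.

```python
from itertools import product


def box_vertices_and_edges(lower, upper):
    """Vertices of Q = {q : lower_i <= q_i <= upper_i} and its exposed edges.

    Exposed edges join vertices whose bit tuples differ in exactly one position.
    """
    ell = len(lower)
    corner = lambda bits: tuple(upper[i] if b else lower[i] for i, b in enumerate(bits))
    vertices = [corner(bits) for bits in product((0, 1), repeat=ell)]
    edges = []
    for bits in product((0, 1), repeat=ell):
        for i in range(ell):
            if bits[i] == 0:  # list each edge once, from its lower-bound end
                flipped = bits[:i] + (1,) + bits[i + 1:]
                edges.append((corner(bits), corner(flipped)))
    return vertices, edges


verts, edges = box_vertices_and_edges([-1.0, 0.0, 2.0], [1.0, 0.5, 3.0])
print(len(verts), len(edges))  # 8 12, i.e. 2**ell vertices and ell * 2**(ell - 1) edges
```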

The second step is now the Schur stability test for an edge between vertices q_B and q_C with corresponding polynomials P_B(z) and P_C(z). A point on the edge is described by

$$P_\alpha(z, \alpha) = \alpha P_B(z) + (1 - \alpha) P_C(z), \quad \alpha \in [0, 1].$$

Starting from a stable polynomial P_α(z, 0) = P_C(z), there are three possibilities how P_α(z, α) can become unstable with increasing α:

i) a real root crosses the point z = +1;
ii) a real root crosses the point z = −1; and
iii) a complex conjugate pair of roots crosses the unit circle.


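By stability of the vertices, P_B(1) > 0 and P_C(1) > 0, and therefore P_α(1, α) > 0 for α ∈ (0, 1), since it is a convex combination of two positive numbers; hence no real root can cross z = +1. A similar argument holds for P_B(−1) and P_C(−1), since (−1)^n P(−1) > 0 for every Schur stable polynomial with a_n > 0.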

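Condition iii) is checked by the following test. For a polynomial

$$P(z) = \sum_{k=0}^{n} a_k z^k = a_n \prod_{i=1}^{n} (z - z_i),$$

define the (n − 1) × (n − 1) matrices

$$X = \begin{pmatrix} a_n & a_{n-1} & \cdots & a_2 \\ 0 & a_n & \cdots & a_3 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & \cdots & 0 & a_0 \\ 0 & \cdots & a_0 & a_1 \\ \vdots & & & \vdots \\ a_0 & a_1 & \cdots & a_{n-2} \end{pmatrix},$$

$$S(P) = X - Y.$$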


It is shown in [4] that

$$\det S(P) = a_n^{\,n-1} \prod_{i<k} (1 - z_i z_k).$$
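The matrix S(P) and the determinant identity can be checked numerically. The Python sketch below builds X − Y from the coefficients a_0, …, a_n, following the banded patterns above, and compares det S(P) with the root product for a hypothetical root set.

```python
import numpy as np
from itertools import combinations


def schur_matrix(a):
    """S(P) = X - Y for P(z) = a[0] + a[1] z + ... + a[n] z^n; an (n-1)x(n-1) array (n >= 2)."""
    a = np.asarray(a, dtype=float)
    n = len(a) - 1
    m = n - 1
    X = np.zeros((m, m))
    Y = np.zeros((m, m))
    for i in range(m):
        for j in range(i, m):
            X[i, j] = a[n - (j - i)]          # a_n on the diagonal, a_{n-1} next to it, ...
        for j in range(m - 1 - i, m):
            Y[i, j] = a[j - (m - 1 - i)]      # a_0 on the anti-diagonal, a_1 below it, ...
    return X - Y


roots = np.array([0.5, -0.3, 0.2 + 0.4j, 0.2 - 0.4j])    # hypothetical root set
a = 2.0 * np.real(np.poly(roots))[::-1]                  # a_0, ..., a_n with a_n = 2
n = len(a) - 1
lhs = np.linalg.det(schur_matrix(a))
rhs = a[-1] ** (n - 1) * np.prod([1 - zi * zk for zi, zk in combinations(roots, 2)])
print(lhs, rhs.real)  # the two values agree up to rounding
```

`np.poly` returns coefficients with the highest power first, hence the reversal to the a_0, …, a_n order used in the text.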


Notice that det S(P) = 0 for z_i = 1/z_k, which is true for complex conjugate roots on the unit circle (and for pairs z_i, z_k for which |z_i| > 1 or |z_k| > 1, i.e. the stability boundary has already been crossed). The following arguments from [1] closely follow a corresponding argument in [3] for Hurwitz stability. P_α(z, α), α ∈ (0, 1), has no complex conjugate roots on the unit circle if and only if

$$S(P_\alpha) = S[\alpha P_B + (1 - \alpha) P_C] = \alpha S(P_B) + (1 - \alpha) S(P_C)$$

is nonsingular for all α ∈ (0, 1); the second equality holds because S(P) is linear in the coefficients of P. Dividing by α > 0, equivalently

$$S(P_B) - \lambda S(P_C) \quad (\text{with } \lambda = (\alpha - 1)/\alpha)$$

is nonsingular for all λ ∈ (−∞, 0), and, since S(P_C) is nonsingular for the stable vertex polynomial P_C, equivalently

$$S(P_B) S(P_C)^{-1} - \lambda I$$

is nonsingular for all λ ∈ (−∞, 0). That is, S(P_B)S(P_C)^{-1} has no negative real eigenvalues.


Summary 


A polytope of polynomials of degree n with ℓ interval parameters is Schur stable if and only if

i) all 2^ℓ vertices are Schur stable;

ii) for all ℓ2^{ℓ−1} edges a testing matrix has no negative real eigenvalues. The computation of the testing matrix involves forming two (n − 1) × (n − 1) matrices from the coefficients of the corresponding vertex polynomials, one inversion and one multiplication.

Since the sequence of vertices along an edge is arbitrary (interchanging B and C replaces the testing matrix by its inverse, whose eigenvalues are the reciprocals), each inverse S(P_C)^{-1} can be shared by all edges meeting at C. Inverting only at the vertices with an even number of parameters at their upper bounds covers every edge, so the total number of inversions is 2^{ℓ−1}.
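The two conditions of the summary translate into a short program. The Python sketch below assumes the coefficients are supplied by a function `coef(q)` that is affine in q (a hypothetical interface). It checks vertex stability with `numpy.roots`, then tests every exposed edge for negative real eigenvalues of S(P_B)S(P_C)^{-1}, using a small tolerance to decide when an eigenvalue counts as real. For clarity it inverts once per edge rather than sharing inverses between edges.

```python
import numpy as np
from itertools import product


def schur_matrix(a):
    """S(P) = X - Y for P(z) = a[0] + a[1] z + ... + a[n] z^n (n >= 2)."""
    a = np.asarray(a, dtype=float)
    n = len(a) - 1
    m = n - 1
    S = np.zeros((m, m))
    for i in range(m):
        for j in range(i, m):
            S[i, j] += a[n - (j - i)]          # X: a_n on the diagonal, a_{n-1}, ... above it
        for j in range(m - 1 - i, m):
            S[i, j] -= a[j - (m - 1 - i)]      # Y: a_0 on the anti-diagonal, a_1, ... below it
    return S


def is_schur(a):
    return bool(np.all(np.abs(np.roots(np.asarray(a)[::-1])) < 1.0))


def polytope_is_schur(coef, lower, upper, tol=1e-10):
    """Vertex test i) plus edge test ii) for the box lower <= q <= upper."""
    ell = len(lower)
    corner = lambda bits: np.array([upper[i] if b else lower[i] for i, b in enumerate(bits)])
    all_bits = list(product((0, 1), repeat=ell))
    if not all(is_schur(coef(corner(bits))) for bits in all_bits):      # condition i)
        return False
    for bits in all_bits:                                                # condition ii)
        for i in range(ell):
            if bits[i]:
                continue                                                 # each edge once
            other = bits[:i] + (1,) + bits[i + 1:]
            S_B = schur_matrix(coef(corner(other)))
            S_C = schur_matrix(coef(corner(bits)))
            eig = np.linalg.eigvals(S_B @ np.linalg.inv(S_C))
            if np.any((np.abs(eig.imag) <= tol) & (eig.real < 0)):
                return False
    return True


# Hypothetical family P(z, q) = z^3 + q1 z^2 + q2 z + 0.1, stored as (a_0, a_1, a_2, a_3).
coef = lambda q: np.array([0.1, q[1], q[0], 1.0])
print(polytope_is_schur(coef, [-0.2, -0.2], [0.2, 0.2]))   # True
print(polytope_is_schur(coef, [-1.5, -0.2], [1.5, 0.2]))   # False: a vertex is unstable
```

In the first box the sum of the magnitudes of the lower-order coefficients is at most 0.5 < a_3 = 1, so every member is Schur stable and the test must accept. In the second box the vertex q = (−1.5, −0.2) gives P(1) < 0, so that vertex has a real root outside the unit circle.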
Introduction


In engineering applications, optimization problems of- 
ten arise in which an optimal steady state of a dynam- 
ical system is sought. Chemical production plants, for 
example, can often be described by ordinary differen- 
tial equations and algebraic equations, and an optimal 
continuous operation corresponds to an optimal steady 
state of the differential algebraic (DAE) model. 

Because an optimal steady state is sought, the dy- 
namical optimization problem reduces to an algebraic 
optimization problem. This point of view, however, ne- 
glects the stability properties of the dynamical system. 
While a steady state of a dynamical system is the solu- 
tion of a nonlinear set of algebraic equations, these al- 
gebraic equations do not reveal anything about the sta- 
bility of the resulting steady state. In fact, a steady state 
that is optimal with respect to a profit function may be 
found, but this steady state may turn out to be unsta- 
ble. This problem has been addressed with the use of 
matrix measures before [2,14,17]. Unfortunately, ma- 
trix measures overestimate the region of instability in 
the process design space and therefore lead to subopti- 
mal results. 
The present contribution summarizes recent progress on a new approach to take stability boundaries into account in steady state optimization. The new methodology, which is based on ideas from nonlinear dynamics, makes it possible to consider the stability of nonlinear dynamical systems without having to introduce approximations. The approach is based on the notion of the distance to critical manifolds in the parameter space [16]. This idea proves to be very generally applicable. As a result, the approach can be used not only to consider stability boundaries in steady state optimization, but also other boundaries that are critical for the system's dynamics, as well as feasibility boundaries.

In engineering models, some or all parameters are 
often uncertain, i.e., they are only known up to a con- 
siderable error. In a chemical production plant, for ex- 
ample, a precise value of some or all kinetic constants of 
the chemical reactions may not be known. In fact, uncertainty of this kind arises systematically in engineering models, because precise measurements of system parameters are themselves costly. Therefore, a trade-off usually exists between measuring unknown system parameters to a desirable precision on the one hand, and keeping the measurement effort affordable on the other hand.


Definitions 


In this section the problem is introduced. For a more 
detailed introduction we refer to [24]. 

A large class of dynamical systems can be modeled by differential-algebraic (DAE) systems of the form

$$\dot x^{d}(t) = f^{d}(x^{d}(t), x^{a}(t), u(t), d(t), \theta, t),$$

$$0 = f^{a}(x^{d}(t), x^{a}(t), u(t), d(t), \theta, t), \tag{1}$$

$$y(t) = h(x^{d}(t), x^{a}(t), u(t), d(t), \theta, t),$$

where x = (x^{dT}, x^{aT})^T ∈ R^{n_x}, u ∈ R^{n_u}, d ∈ R^{n_d}, θ ∈ R^{n_θ}, and y ∈ R^{n_y} are state variables, inputs, disturbances, parameters, and outputs of the system. In (1), t denotes time, and f := (f^{dT}, f^{aT})^T and h are smooth functions which map from some subset U ⊂ R^{n_x} × R^{n_u} × R^{n_d} × R^{n_θ} × R onto R^{n_x} and R^{n_y}, respectively. The state variables x and the corresponding equations have been partitioned into dynamic state variables x^d and differential equations f^d, and algebraic variables x^a and equations f^a. In the sequel we assume that inputs u and disturbances d vary only quasi-statically compared to the system dynamics. Mönnigmann and Marquardt [24] show that the system (1) can be simplified considerably under this assumption. The simplified system reads


$$\dot x = f(x, \alpha, p), \quad x(0) = x_0, \qquad y = h(x, \alpha, p), \tag{2}$$

where p denotes all inputs, references, and parameters that can be assumed to be known precisely, while α denotes inputs, references, disturbances, and parameters that can be modeled in terms of an average value and an uncertainty. The parameters α therefore have the form

$$\alpha_i \in [\alpha_i] := [\alpha_i^{(0)} - \Delta\alpha_i,\ \alpha_i^{(0)} + \Delta\alpha_i], \quad i = 1, \dots, n_\alpha, \tag{3}$$

where α_i^{(0)} is the nominal value of α_i. By a slight abuse of notation, we use the same symbols f and h in (1) and (2).

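The steady state optimization problem for the uncertain dynamical system (2) with stability constraints can now be stated:

$$\min_{x, \alpha, p}\ \phi(x, \alpha, p) \tag{4a}$$

$$\text{subject to}\quad 0 = f(x, \alpha, p), \tag{4b}$$

$$0 > \mathrm{Re}(\lambda)\ \ \forall \lambda \in \sigma(J(\tilde x, \tilde\alpha, p)),\ \ 0 = f(\tilde x, \tilde\alpha, p),\ \ \forall \tilde\alpha \in [\alpha], \tag{4c}$$

$$0 \le g(\tilde x, \tilde\alpha, p)\ \ \forall \tilde\alpha \in [\alpha], \tag{4d}$$

$$x \in X,\ \alpha \in A,\ p \in P, \tag{4e}$$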


where φ is the merit function and (4b) ensures that the optimal point of operation is a steady state of the dynamical system. Equations (4c) constitute a semi-infinite constraint that guarantees all eigenvalues in the spectrum σ(J) of the Jacobian J(x, α, p) of f with respect to x to be in the open left half of the complex plane. This constraint ensures local asymptotic stability of the steady state of the nonlinear system. Since constraint (4c) enforces this condition for all α within the uncertainty region (3), the resulting optimal steady state will be robust with respect to the uncertainty in α. In (4d), g is a twice continuously differentiable function that maps into R^{n_g}, n_g ≥ 1. Equation (4d) is a semi-infinite constraint that ensures the inequality constraints hold for all α. Finally, (4e) denotes bounds on x, α, and p. In a typical application
there exist box constraints on some or all of the state variables and parameters. We note that some or all of the nominal values of the uncertain parameters α may be degrees of freedom in the optimization.


Method 


Mönnigmann et al. [19] introduced an algorithm for
solving the semi-infinite problem (4). This algorithm is 
based on detecting and backing-off critical manifolds. 
We first introduce the notion of a critical manifold and 
a normal vector, and then present the algorithm for 
solving (4). 


Critical Manifolds and Normal Vectors 


In order to introduce the notion of a critical manifold, 
consider a single feasibility constraint 


$$0 \le g_i(x, \alpha, p) \tag{5}$$

from among the feasibility constraints in (4d). In this simple case, the critical manifold, denoted by M^c, is defined by the set of points at which the constraint (5) is active,

$$M^{c} = \{(x, \alpha, p) \mid 0 = f(x, \alpha, p),\ 0 = g_i(x, \alpha, p)\}. \tag{6}$$

This critical manifold M^c separates the part of the space of uncertain parameters α in which the constraint holds from the part in which the constraint is violated, cf. Fig. 1.




In Fig. 1 the dot marks a candidate steady state of operation in the feasible regime. The shortest distance in the space of the uncertain parameters (α_1, α_2) between this candidate point of operation and the critical manifold occurs along the marked direction that is normal to the critical manifold. By imposing a minimum distance from the candidate point of operation to the critical boundary, we can ensure that the critical boundary is not crossed, regardless of the actual values of the uncertain parameters in the uncertainty region (3). This amounts to overestimating the uncertainty region by a circle and forcing the circle to at most touch the critical boundary tangentially, or to stay at a larger distance from the critical boundary, cf. Fig. 1. In the situation sketched in Fig. 1 we assume that the uncertain parameters have been scaled appropriately. This can be done, for example, by measuring them in units of their uncertainties Δα_i [18]. In the general case, the uncertainty region (3) defines a hyperrectangle and the circle shown in Fig. 1b becomes a hyperellipsoid. Regardless of the number n_α and scaling of the uncertain parameters, the distance between the candidate point of operation and the critical manifold, or the distance between the uncertainty region and the critical manifold, can be measured with the aid of the normal vector. We refer to [18] for details.
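The scaling argument can be made concrete. In units of the uncertainties, β_i = α_i/Δα_i, the box (3) becomes [−1, 1]^{n_α} around the nominal point, and it lies inside a ball of radius √n_α. If the critical manifold is replaced by a hyperplane through a critical point α_c with normal vector r (a local linearization, assumed here for illustration), a scaled normal distance of at least √n_α keeps the whole box on the feasible side. The Python sketch below uses hypothetical helper names and data. It is not the normal vector method of [18], which handles curved manifolds.

```python
import numpy as np


def scaled_normal_distance(alpha0, dalpha, alpha_c, r):
    """Signed distance from alpha0 to the hyperplane {alpha : r^T (alpha - alpha_c) = 0},
    measured in units of the uncertainties (beta_i = alpha_i / dalpha_i); in beta
    coordinates the normal vector becomes r_i * dalpha_i."""
    s = np.asarray(dalpha, dtype=float)
    r_beta = np.asarray(r, dtype=float) * s
    diff_beta = (np.asarray(alpha0, dtype=float) - np.asarray(alpha_c, dtype=float)) / s
    return float(r_beta @ diff_beta / np.linalg.norm(r_beta))


def box_clears_hyperplane(alpha0, dalpha, alpha_c, r):
    """The scaled box [-1, 1]^n_alpha around alpha0 lies inside the ball of radius
    sqrt(n_alpha); a signed distance of at least sqrt(n_alpha) therefore keeps the
    whole box on the side of the hyperplane that contains alpha0."""
    return scaled_normal_distance(alpha0, dalpha, alpha_c, r) >= np.sqrt(len(dalpha))


# Hypothetical linearized critical boundary alpha_1 + alpha_2 = 2, feasible side alpha_1 + alpha_2 > 2.
alpha_c, r, dalpha = [1.0, 1.0], [1.0, 1.0], [0.5, 1.0]
print(box_clears_hyperplane([3.0, 3.0], dalpha, alpha_c, r))   # True
print(box_clears_hyperplane([1.8, 1.0], dalpha, alpha_c, r))   # False
```

The criterion is conservative, just like the circle in Fig. 1. A box can fail the test and still stay feasible, but a box that passes cannot cross the linearized boundary.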

The concepts of a critical manifold and of measuring the distance to a critical manifold along a normal vector have been introduced for feasibility constraints. The most important application of the normal vector approach, however, is the use of normal vectors to ensure a parametric distance from stability boundaries in order to guarantee robust stability. Critical points for stability of ODE and DAE systems have been investigated in bifurcation theory. A thorough discussion of the use of normal vectors to manifolds of bifurcation points is beyond the scope of this paper and we refer the reader to textbooks in applied bifurcation theory (see, for example, [15]) and previous papers on the use of normal vectors in steady state optimization [18,21]. We give a sketch of the idea, however.

Robust Design of Dynamic Systems by Constructive Nonlinear Dynamics, Figure 1
Critical manifold M^c and robustness region R. This figure is reproduced from [16]. We note that the overestimation introduced by the circle in (b) can be mitigated by other descriptions [9].

A simple result of nonlinear systems theory states that a steady state (x, α, p) of a nonlinear system (2) is stable if all eigenvalues of the Jacobian A(x, α, p) defined by

$$A_{ij}(x, \alpha, p) = \frac{\partial f_i}{\partial x_j}(x, \alpha, p), \quad i, j = 1, \dots, n_x,$$

evaluated at this steady state, lie in the open left half of the complex plane. Since A is non-symmetric in general, stability may be lost due to either a real eigenvalue or a complex conjugate pair of eigenvalues crossing into the right half of the complex plane. In bifurcation theory, these two cases are known as saddle-node and Hopf bifurcation, respectively [15]. Under genericity conditions [15], a manifold of saddle-node (Hopf) bifurcations exists in the vicinity of a saddle-node (Hopf) bifurcation point. Saddle-node bifurcations are usually characterized by the necessary conditions


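$$\begin{aligned} 0 &= f(x^*, \alpha^*, p^*) && \text{(a)} \\ 0 &= A(x^*, \alpha^*, p^*)\, v && \text{(b)} \\ 0 &= v^T v - 1 && \text{(c)} \end{aligned} \tag{7}$$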


where (x*, α*, p*) is the bifurcation point. Equations (7a) ensure that the bifurcation point is a steady state. Equations (7b) are eigenequations with a real eigenvalue zero, the critical eigenvalue. Finally, (7c) is a regularization that is necessary because (7b) determines the eigenvector v up to its length only. Based on these necessary conditions the manifold

$$M^{sn} := \{(x^*, \alpha^*, p^*) \mid \exists\, v \in \mathbb{R}^{n_x} \text{ such that (7) holds}\} \tag{8}$$

can be stated. Similar necessary conditions can be stated for Hopf bifurcation points. Note that the determinant of A is zero if and only if A has a zero eigenvalue. The determinant therefore can be used as a test function for checking if a steady state may be a saddle-node bifurcation (numerically more robust test functions exist, see [15], for example). Test functions of this type will be used in the algorithm to detect the crossing of critical manifolds.

Mönnigmann and Marquardt [18] presented a scheme for the derivation of systems of equations for the calculation of normal vectors. This scheme can be applied to defining equations for manifolds of the form (5) or (6). Mönnigmann and Marquardt [18] applied this scheme to a number of bifurcation point types. The saddle-node normal vector system, for example, comprises Equation (7) and

$$\begin{aligned} 0 &= A^T(x^*, \alpha^*, p^*)\, w, \\ 0 &= w^T w - 1, \\ 0 &= \sum_{j=1}^{n_x} w_j \frac{\partial f_j}{\partial \alpha_i}(x^*, \alpha^*, p^*) - r_i, \quad i = 1, \dots, n_\alpha, \end{aligned} \tag{9}$$

where r ∈ R^{n_α} is the desired normal vector. This system was first stated and used by Dobson [4]. Similar systems for the calculation of normal vectors can be derived for the feasibility constraints, Hopf bifurcations, and other critical dynamical points [18]. In general, these systems have the form

$$0 = G^{t}(x, \kappa, \alpha, p, r), \tag{10}$$

where κ denotes auxiliary variables such as the eigenvector v in (7). The upper index t is introduced to distinguish between the various types of critical points and manifolds, such as t = saddle-node, t = Hopf, or t = feasibility.
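Both ingredients can be illustrated on a hypothetical two-state model f(x, α) = (α_1 + α_2 − x_1^2, −x_2), whose saddle-node manifold is the line α_1 + α_2 = 0 at x* = 0. First, the determinant test function changes sign as the steady-state branch passes the fold. Second, the normal vector is computed as r = (∂f/∂α)^T w, where w is a unit left null vector of A, obtained here from an SVD. This follows Dobson's construction of the saddle-node normal. The model and all names are assumptions made for illustration.

```python
import numpy as np


def jac_x(x, a):
    """A = df/dx for the hypothetical model f(x, a) = (a1 + a2 - x1^2, -x2)."""
    return np.array([[-2.0 * x[0], 0.0], [0.0, -1.0]])


def jac_alpha(x, a):
    """df/dalpha for the same hypothetical model."""
    return np.array([[1.0, 1.0], [0.0, 0.0]])


def saddle_node_normal(x, a):
    """Smallest singular value of A and the unit normal r = (df/dalpha)^T w,
    where w is the left null vector of A (w^T A = 0, w^T w = 1)."""
    U, s, _ = np.linalg.svd(jac_x(x, a))
    w = U[:, -1]                       # left singular vector for the smallest singular value
    r = jac_alpha(x, a).T @ w
    return s[-1], r / np.linalg.norm(r)


# Test function det(A) along the steady-state branch x = (t, 0), alpha = (t^2, 0).
ts = np.linspace(0.5, -0.5, 10)
dets = [np.linalg.det(jac_x(np.array([t, 0.0]), np.array([t * t, 0.0]))) for t in ts]
crossings = [k for k in range(len(ts) - 1) if dets[k] * dets[k + 1] < 0]

smin, r = saddle_node_normal(np.zeros(2), np.zeros(2))   # the fold at x* = 0, alpha* = 0
print(crossings, smin, r)  # sign change between ts[4] and ts[5]; r = +-(1, 1)/sqrt(2)
```

The computed r is perpendicular to the line α_1 + α_2 = 0, as expected. Its sign depends on the sign of w, so an orientation convention has to be fixed before r is used in a back-off constraint.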


Algorithm 


Based on the normal vector systems (10) an algo- 
rithm for solving (4) can be stated. Before proceed- 
ing to the algorithm a few remarks are necessary. For 
brevity, these remarks are given in an informal fash- 
ion. For a more detailed and precise discussion we refer 
to [20,21]. 

Loosely speaking, the optimization algorithm will 
move the robustness region (3) (or an approximation 
thereof such as the circle in Fig. 1) in the parameter 
space while seeking a minimum for the profit function 
(4a). In this process, critical manifolds may be crossed. 
Test functions, like the determinant introduced in Sect. “Critical Manifolds and Normal Vectors”, can be used to signal the crossing of critical manifolds. Whenever a critical point is detected, this point is added to a set of known critical points denoted by $I$. Later in the algorithm these points are used to initialize normal vector constraints. Critical points that have been used to initialize normal vector constraints are collected in a set $J$. Points in $I$ can be processed into $J$ by solving an optimization problem that is not discussed here for brevity. We refer to [20] for details.

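Given the index set $J$, the following optimization problem is solved:

$$
\begin{aligned}
\min\ & \phi\big(x^{(0)}, \alpha^{(0)}, p^{(0)}\big) && \text{(a)}\\
\text{subject to}\ & 0 = f\big(x^{(0)}, \alpha^{(0)}, p^{(0)}\big) && \text{(b)}\\
& 0 = G^{(t,i)}\big(x^{(i)}, \kappa^{(i)}, \alpha^{(i)}, p^{(i)}, r^{(i)}\big) \quad \forall i \in J && \text{(c)}\\
& 0 = \alpha^{(0)} - \alpha^{(i)} + l^{(i)} \frac{r^{(i)}}{\|r^{(i)}\|_2} \quad \forall i \in J && \text{(d)}\\
& 0 \le l^{(i)} - \sqrt{n_\alpha} \quad \forall i \in J && \text{(e)}\\
& x^{(0)} \in X,\ \alpha^{(0)} \in A,\ p^{(0)} \in P. && \text{(f)}
\end{aligned}
\qquad (11)
$$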
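The bound $\sqrt{n_\alpha}$ in (11e) has a simple geometric reading. A minimal sketch, assuming (as the square root suggests, though this section does not state it) that the uncertain parameters are scaled so that the robustness region (3) is the box $|\alpha_j - \alpha_j^{(0)}| \le 1$, and replacing the critical manifold by its tangent hyperplane at $\alpha^{(i)}$: by (11d), $\alpha^{(0)}$ lies at distance $l^{(i)}$ from that hyperplane, while the box reaches at most $\sum_j |n_j| \le \sqrt{n_\alpha}$ along any unit normal $n$ (Cauchy-Schwarz). The helper name is hypothetical.

```python
import math

def box_reach(normal):
    """Largest value of n . (alpha - alpha0) over the box |alpha_j - alpha0_j| <= 1,
    where n = normal / ||normal||_2; the maximum is attained at a corner."""
    norm = math.sqrt(sum(c * c for c in normal))
    return sum(abs(c) for c in normal) / norm

n_alpha = 2
for r in ([1.0, 0.0], [1.0, 1.0], [0.3, -2.0]):
    # Cauchy-Schwarz: sum |n_j| <= sqrt(n_alpha) * ||n||_2 (small tolerance for rounding)
    print(r, box_reach(r), box_reach(r) <= math.sqrt(n_alpha) + 1e-12)
```

A tangent hyperplane at distance $l^{(i)} \ge \sqrt{n_\alpha}$ therefore cannot cut the box, whatever the direction of $r^{(i)}$; the diagonal direction $(1, 1)$ shows that the bound is tight.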


Because the crossing of critical manifolds is only de- 
tected at the nominal point of operation, critical man- 
ifolds may enter the robustness region without being 
detected [20]. For this reason, a rigorous numerical test 
for critical points in the robustness region is necessary. Interval arithmetic can be used to carry out this test [21] for problems of moderate complexity.

Finally, an adjustable parameter $l_{\max} > \sqrt{n_\alpha}$ has to be chosen. This parameter is used to switch off normal vector constraints.

The algorithm can now be stated as follows. 

1. (Initialization) Choose a steady state of (2) that is stable in the sense of (4c) and feasible with respect to the constraints (4d). Choose a value for $l_{\max}$. If critical points in the vicinity of the steady state are known a priori, put them into $I$. Set $J = \emptyset$.

2. (Update of $J$) Process points in $I$ into $J$ [21]. Remove critical points from $J$ for which $l^{(i)} > l_{\max}$.

3. (Optimization) Solve problem (11) using the current index set $J$.

4. (Detection of critical manifolds) Analyze the path between the starting and end point of step 3. If a critical manifold has been crossed along this path, put this point into $I$ and return to step 2. Otherwise proceed to step 5.

5. (Rigorous search for critical points) Check for critical points in the robustness region (3). If a critical point exists that has not been detected in step 4, put it into $I$ and return to step 2. Otherwise stop.

For further details on the practical implementation we refer to [20]; for details on step 5 we refer to [21].
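The five steps can be summarized as a loop. The sketch below shows only the control flow. `optimize`, `find_crossing` and `rigorous_check` are hypothetical stand-ins for solving (11), the test-function analysis of step 4 and the interval-arithmetic search of step 5; the processing of points from $I$ into $J$ is reduced to a plain set transfer, and every round restarts from the same initial point. These are simplifying assumptions, not the published implementation.

```python
def normal_vector_algorithm(start, a_priori_points, optimize, find_crossing,
                            rigorous_check, l_max, max_rounds=50):
    """Control-flow sketch of steps 1-5 (all callables are hypothetical).

    optimize(start, J)        -> (end, dist): solution of (11) for index set J and
                                 the resulting distance l^(i) of every point in J
    find_crossing(start, end) -> critical point crossed on the path, or None
    rigorous_check(end)       -> undetected critical point in the region, or None
    """
    I = set(a_priori_points)     # step 1: known critical points
    J = set()                    # step 1: J is empty
    dist = {}                    # latest l^(i) for points in J
    for _ in range(max_rounds):
        # Step 2: move points from I into J, switch off constraints with l^(i) > l_max
        J = {c for c in J | I if dist.get(c, 0.0) <= l_max}
        I = set()
        # Step 3: solve (11) with the current index set J
        end, dist = optimize(start, J)
        # Step 4: a crossing on the path start -> end yields a new critical point
        crossed = find_crossing(start, end)
        if crossed is not None:
            I.add(crossed)
            continue
        # Step 5: rigorous search in the robustness region around the end point
        missed = rigorous_check(end)
        if missed is not None:
            I.add(missed)
            continue
        return end, J
    raise RuntimeError("no robust optimum found within max_rounds")

# Toy 1-D stand-ins: a critical point at 3.0 blocks the unconstrained optimum 5.0
def toy_optimize(start, J):
    return (2.0, {3.0: 1.0}) if 3.0 in J else (5.0, {})

def toy_crossing(start, end):
    return 3.0 if start < 3.0 < end else None

def toy_check(end):
    return 3.0 if abs(end - 3.0) < 1.0 else None

result = normal_vector_algorithm(0.0, [], toy_optimize, toy_crossing, toy_check, l_max=10.0)
print(result)    # (2.0, {3.0})
```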


Cases 


The normal vector based optimization approach has 
successfully been applied to different types of problems 
and a number of cases. The treated models range from 
a few to a few hundred model equations in size: 

• Steady-state optimization of parametrically uncertain dynamical systems with constraints for stability [1,9,21,22,24];

• Steady-state optimization of parametrically uncertain systems with constraints on the location of higher-codimension critical points. Normal vector constraints on the location of these points can be used to guarantee parametric robustness with respect to hysteresis, for example [7,18];

• Steady-state optimization with parametrically robust pole placement. In applications of this type, the algorithm is applied to simultaneously seek an optimal plant design and to tune controller parameters [5,8,10,11,12,13,16,19,23];

• Optimization of parametrically uncertain transient processes [3,6].

Here the use of the normal vector based methodology is demonstrated with a small example. Because only two uncertain parameters exist in this example, the results can be visualized. The results for this example were first published in [24].

The system treated here is a simple model for the 
solution free radical homopolymerisation of vinyl ac- 
etate. The model has been analyzed thoroughly with re- 
spect to its nonlinear dynamic behavior by numerical 
bifurcation analysis. We refer to [25] for details on the 
model and its analysis. The process is optimized with 
respect to an economic cost function that takes the cost 
of monomer, the cost of an initiator of the polymeriza- 
tion, and the cost of the solvent into account [24]. Two 
parameters are assumed to be uncertain. These are the residence time in the reactor, $\theta$, and the concentration of initiator, $I_f$:

$$\Delta\theta = \Delta\alpha_1 = 5\ \text{min}, \qquad \Delta I_f = \Delta\alpha_2 = 0.005\ \text{mol/l}. \qquad (12)$$

Assuming that nothing is known about the location of critical boundaries, the algorithm proceeds with $I = J = \emptyset$. The parameter $l_{\max}$ is set to $l_{\max} = 2$. The value of $l_{\max}$ is not important, since normal vector constraints are never switched off in step two in this particular example. The result of the optimization solved in step three is shown in Fig. 2a. Figure 2b shows the location of the stability boundaries which result from saddle-node and Hopf bifurcations as a function of $\theta$ and $I_f$. The particular value of $I_f$ at which Fig. 2a was obtained is marked by the horizontal dashed line in Fig. 2b. While the optimal point of operation lies on a stable branch of steady states, it is obviously not robust with respect to variations in $\theta$. The analysis in step 4 or 5 will therefore reveal that a loss of stability is likely due to the saddle-node bifurcations in the vicinity of the candidate optimal point.

Robust Design of Dynamic Systems by Constructive Nonlinear Dynamics, Figure 2
Optimum which results in the first sweep through the algorithm with $I = \emptyset$, $J = \emptyset$. The optimum that results in step three is marked by the cross. These results were first reported in [24]
[Figure: stable and unstable steady-state branches, constrained optimum, saddle-node and Hopf bifurcation curves; horizontal axis $\theta$ [min], 0 to 200]

In the next step a close saddle-node bifurcation is 
therefore added to J, and the optimization (11) is re- 
peated with a single normal vector constraint. The re- 
sult is shown in Fig. 3. Since the robustness region 
marked by the ellipse contains no critical points, the 
algorithm terminates. An optimal point of operation 
that is stable despite the parametric uncertainty (12) has 
therefore been found. 


Robust Design of Dynamic Systems by Constructive Nonlinear Dynamics, Figure 3
Optimum which results from the optimization with constraint on the minimal distance to the saddle-node bifurcation manifold. The optimum from Fig. 2 is marked for reference. These results were first reported in [24]

Conclusions


This contribution summarizes recent progress in the development of methods for the optimization of nonlinear dynamical systems with parametric uncertainties. The approach has successfully been applied to various problems in plant design and in integrated plant design and controller tuning. Cases discussed here are restricted to steady-state optimization. The extension to the robust optimization of transient processes is the subject of current investigations [3,9].
Introduction


Many deficiencies of global optimization methods for problems with nonconvex constraints make it necessary to reexamine certain concepts of approximate optimal solutions and to develop a robust approach to these problems.


Difficulties of Problems 
with Nonconvex Constraints 


A wide class of global optimization problems has the form

$$\min\{f(x) \mid g_i(x) \ge 0\ (i = 1, \ldots, m),\ x \in [a, b]\}, \qquad (P)$$

where $f, g_1, \ldots, g_m$ are nonconvex continuous real-valued functions on $\mathbb{R}^n$, $a, b \in \mathbb{R}^n$, and $[a, b] := \{x \in \mathbb{R}^n \mid a \le x \le b\}$. When the feasible set

$$D := \{x \in [a, b] \mid g_i(x) \ge 0,\ i = 1, \ldots, m\}$$


is highly nonconvex, computing just one feasible solu- 
tion may be almost as hard as solving the problem it- 
self. Therefore, most current methods for solving these 
problems are confined to finding only an approximate 
optimal value rather than an approximate optimal solution. Even within these limitations, many methods are deficient in one way or another, providing approximate optimal solutions which are not guaranteed to be close enough to the true optimum or which, in other cases, are very unstable with respect to perturbations of the data or refinements of the tolerances.
Typically, the given problem is relaxed to

$$\min\{f(x) \mid g_i(x) + \varepsilon \ge 0\ (i = 1, \ldots, m),\ x \in [a, b]\}, \qquad (R)$$

where $\varepsilon > 0$ is the tolerance. Although this problem is a bit easier than (P) because it satisfies the regularity assumption, an optimal solution of it can rarely be computed exactly in finitely many iterations. So the best that one can expect to compute in finitely many iterations is an $\eta$-optimal solution of (R), i.e., a feasible solution $x^*$ of (R) such that $f(x^*) \le f(x) + \eta$ for every other feasible solution $x$. Such an $x^*$ is sometimes referred to as an $(\varepsilon, \eta)$-approximate optimal solution of the original problem. Unfortunately, the inadequacy of this concept has been shown on simple examples where an $(\varepsilon, \eta)$-approximate optimal solution lies so far away from the true optimum that it can hardly be accepted as an approximation of the latter in some reasonable sense.
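The effect is easy to reproduce numerically. The one-dimensional instance below is hypothetical (it is not one of the examples alluded to above): with $f(x) = x$ on $[0, 10]$ and $g(x) = \max\{x - 8,\ -0.01 - 0.001x\}$, the feasible points of (P) are exactly $x \ge 8$, while for $\varepsilon = 0.02$ every point of $[0, 10]$ is feasible for (R). A grid search stands in for an exact solver.

```python
def grid_argmin(f, feasible, grid):
    """Best grid point of f among those that pass the feasibility test."""
    return min((x for x in grid if feasible(x)), key=f)

eps = 0.02
f = lambda x: x                                    # objective on [0, 10]
g = lambda x: max(x - 8.0, -0.01 - 0.001 * x)      # hypothetical constraint function

grid = [i / 100 for i in range(1001)]              # 0.00, 0.01, ..., 10.00
x_true = grid_argmin(f, lambda x: g(x) >= 0, grid)           # problem (P)
x_relaxed = grid_argmin(f, lambda x: g(x) + eps >= 0, grid)  # relaxation (R)
print(x_true, x_relaxed)                           # 8.0 0.0
```

The relaxed minimizer $x = 0$ is an exact optimal solution of (R), hence an $(\varepsilon, \eta)$-approximate optimal solution for every $\eta \ge 0$, yet it lies 8 units away from the true optimum $x = 8$.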

In view of these limitations and deficiencies, a ro- 
bust approach to nonconvex global optimization prob- 
lems is desirable which could provide the user with 
reliable good feasible solutions, stable under small per- 
turbations of the data and easily implementable, though 
not necessarily the best among all possible feasible so- 
lutions. 


D(C)-Optimization Problems 


The basic idea of this robust approach is to transform 
an optimization problem with a complicated noncon- 
vex constraint set into a sequence of problems with 
nice constraint sets. Such a transformation is possible if 
the functions describing the problem belong to certain 
classes we are going to define. 

For any two functions $g_1, g_2\colon \mathbb{R}^n \to \mathbb{R}$ write $g = g_1 \vee g_2$, $h = g_1 \wedge g_2$ if $g(x) = \max\{g_1(x), g_2(x)\}$, $h(x) = \min\{g_1(x), g_2(x)\}$.

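Let C be a family of real-valued functions on $\mathbb{R}^n$ such that
(i) $f_1, f_2 \in C$, $\alpha_1, \alpha_2 \in \mathbb{R}_+ \Rightarrow \alpha_1 f_1 + \alpha_2 f_2 \in C$;
(ii) $f_1, f_2 \in C \Rightarrow f_1 \vee f_2 \in C$.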


Proposition 1 Under assumptions (i) and (ii) the family $D(C) = C - C$ is a vector lattice with respect to the two operations $\vee$ and $\wedge$.
Also note that if $f \in D(C)$ then $|f| \in D(C)$ because $|f| = \max\{f, -f\}$.
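The lattice property can be checked directly. The following identities are a short derivation added for illustration (not quoted from the original): if $f_k = p_k - q_k$ with $p_k, q_k \in C$ for $k = 1, 2$, then

$$
\begin{aligned}
f_1 \vee f_2 &= \max\{p_1 + q_2,\ p_2 + q_1\} - (q_1 + q_2),\\
f_1 \wedge f_2 &= f_1 + f_2 - f_1 \vee f_2 = (p_1 + p_2) - \max\{p_1 + q_2,\ p_2 + q_1\}.
\end{aligned}
$$

The sums $p_1 + q_2$, $p_2 + q_1$, $q_1 + q_2$ and $p_1 + p_2$ lie in C by (i), and their maximum lies in C by (ii), so both $f_1 \vee f_2$ and $f_1 \wedge f_2$ belong to $D(C) = C - C$.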
An optimization problem of the form (P) where $f, g_i \in D(C)$, $i = 1, \ldots, m$, is called a D(C)-optimization problem.


Proposition 2 Assume C satisfies assumptions (i) and (ii) and also that:
(iii) Every function $x \mapsto x_i$, with $i \in \{1, \ldots, n\}$, belongs to C.

Then by simple manipulations any D(C)-optimization problem can be rewritten in the form (called “standard”)

$$\min\{f(x) \mid g(x) \ge 0,\ x \in [a, b]\}, \quad \text{where } f \in C,\ g \in D(C). \qquad (1)$$


The two most important cases when C satisfies assumptions (i), (ii), and (iii) are:

1) C is the family of convex functions. Any 
f € D(C) is then called a dc function and a D(C)- 
optimization problem is called a dc optimization prob- 
lem. Until recently most problems studied in global op- 
timization could be shown to belong to this class [1]. 

2) C is the family of increasing functions, i.e., functions $f(x)$ such that $x' \le x \Rightarrow f(x') \le f(x)$. Any $f \in D(C)$ is then called a dm function and a D(C)-optimization problem is called a dm optimization, or else a monotonic optimization problem. A theory of
monotonic optimization has emerged in recent years 
that has been shown to parallel dc optimization in sev- 
eral respects [2,5,6]. 

As a result, any dc (dm, respectively) optimization problem can be written in the form (1) where $g = g_1 - g_2$ and $f$, $g_1$, and $g_2$ are convex (increasing, respectively) functions.

Since any polynomial can be viewed either as a dc 
or as a dm function, the set of dc functions or dm func- 
tions on a box [a, b] is dense in the space C[a, b] of con- 
tinuous functions on [a, b] with the supnorm topology. 
Virtually every global optimization problem of interest belongs to one of the two basic classes described above.


A Robust Approach 


In a continuous nonconvex optimization problem, an isolated optimal solution is usually very difficult to compute and very difficult to implement when it is computable. To avoid these difficulties most global optimization methods assume that the feasible set $D$ satisfies

$$D = \operatorname{cl}(\operatorname{int} D),$$

where $\operatorname{cl} A$ and $\operatorname{int} A$ denote the closure and the interior, respectively, of the set $A$. However, this condition is generally very hard to check. In practice we often have to deal with feasible sets for which it is not known a priori whether or not they contain isolated points.

Therefore, from a practical point of view it is desir- 
able to know how to discard isolated feasible solutions 
without having to check for their presence. 

A nonisolated feasible solution $x^*$ of (P) is called an essential optimal solution if

$$f(x^*) \le f(x) \quad \forall x \in D^*,$$

where $D^*$ denotes the set of nonisolated points (accumulation points) of $D := \{x \in [a, b] \mid g(x) \ge 0\}$. In other words, an essential optimal solution is an optimal solution of the problem

$$\min\{f(x) \mid x \in D^*\}.$$


For a given tolerance $\varepsilon > 0$, a point $x \in [a, b]$ satisfying $g(x) \ge \varepsilon$ is said to be $\varepsilon$-feasible, and an $\varepsilon$-feasible point which is nonisolated is called an essential $\varepsilon$-feasible solution. In other words, an essential $\varepsilon$-feasible solution is a nonisolated point of the set $D_\varepsilon := \{x \in [a, b] \mid g(x) \ge \varepsilon\}$. For given tolerances $\varepsilon > 0$, $\eta > 0$, an essential $\varepsilon$-feasible solution $x^*$ is called an essential $(\varepsilon, \eta)$-optimal solution if

$$f(x^*) \le f(x) + \eta \quad \forall x \in D_\varepsilon^*,$$

where $D_\varepsilon^*$ is the set of all essential $\varepsilon$-feasible solutions. An essential $(\varepsilon, \eta)$-optimal solution for $\varepsilon = 0$, $\eta = 0$ is obviously essential optimal.

The above discussion suggests that instead of trying to find an optimal solution to (P), it would be more practical and reasonable to look for an essential $(\varepsilon, \eta)$-optimal solution.

The robust approach embodies this point of view 
for D(C)-optimization, i. e., for a class of problems that 
includes virtually every nonconvex global optimization 
problem of interest. 


Interchangeability Between Objective and Constraint 
From now on we consider problem (P) where $g_1, \ldots, g_m \in D(C)$. Setting $g(x) = \min_{i=1,\ldots,m} g_i(x)$, we rewrite (P) as

$$\min\{f(x) \mid g(x) \ge 0,\ x \in [a, b]\}, \qquad (P)$$

where $f, g \in D(C)$. Further, by Proposition 2, without loss of generality we can assume $f \in C$. Given $\varepsilon, \gamma \in \mathbb{R}$, let us consider the pair of optimization problems

$$\min\{f(x) \mid g(x) \ge \varepsilon,\ x \in [a, b]\}, \qquad (P_\varepsilon)$$

$$\max\{g(x) \mid f(x) \le \gamma,\ x \in [a, b]\}, \qquad (Q_\gamma)$$

where the objective and constraint functions are interchanged. Owing to the fact that $f \in C$, a key feature of problem $(Q_\gamma)$ for our purpose is that its feasible set is a convex set (if C is the set of convex functions) or a normal set (if C is the set of increasing functions), so in either case problem $(Q_\gamma)$ has no isolated feasible solution and a feasible solution to $(Q_\gamma)$ can be computed at low cost.


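Proposition 3 If $\max(Q_\gamma) < \varepsilon$, then $\min(P_\varepsilon) > \gamma$. Indeed, $\max(Q_\gamma) < \varepsilon$ means that $g(x) < \varepsilon$ for every $x \in [a, b]$ with $f(x) \le \gamma$; hence every $x$ feasible for $(P_\varepsilon)$ satisfies $f(x) > \gamma$.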


On the basis of this property, the robust approach to D(C)-optimization consists in replacing the original problem (P), possibly very difficult, by a sequence of easier, stable problems $(Q_\gamma)$, where the parameter $\gamma$ can be iteratively adjusted until a stable (robust) solution to (P) is obtained.


Successive Incumbent Transcending 


A key step towards finding a global optimal solution of 
a problem (P) is to deal with the following question of 
incumbent transcending. 

$(*, \gamma)$ Given a real number $\gamma$, check whether problem (P) has an essential $\varepsilon$-feasible solution $x$ satisfying $f(x) \le \gamma$, and find one such solution if it exists.

Using Proposition 3, consider a convergent branch-and-bound (BB) algorithm for solving problem $(Q_\gamma)$ generating a sequence of partition sets $M_k$, together with numbers $\alpha(M_k) \in \mathbb{R} \cup \{-\infty\}$ and points $x^k$ such that

$$x^k \in M_k, \quad f(x^k) \le \gamma, \qquad (2)$$

$$\alpha(M_k) \ge \max(Q_\gamma), \qquad (3)$$

$$\alpha(M_k) - g(x^k) \to 0 \quad (k \to +\infty). \qquad (4)$$

Here $\alpha(M)$ is an upper bound of $g$ over $M$, and (3) holds because $M_k$ is the partition set with the largest $\alpha(M)$ among all partition sets currently of interest, while (4) is the convergence condition.

From (4) we have $g(x^k) \to \max(Q_\gamma)$ and hence, for $\varepsilon > 0$ given, either $g(x^k) > 0$ for some $k$ or $\alpha(M_k) < \varepsilon$ for some $k$. In the former case, $x^k$ is an essential $\varepsilon$-feasible solution of (P) satisfying $f(x^k) \le \gamma$. In the latter case, $\max(Q_\gamma) < \varepsilon$; hence by Proposition 3, $\min(P_\varepsilon) > \gamma$, so no $\varepsilon$-feasible solution $x$ of (P) exists such that $f(x) \le \gamma$ (hence, if $\gamma = f(\bar x) - \eta$ and $\bar x$ is an essential $\varepsilon$-feasible solution, then $\bar x$ is an $\eta$-optimal solution of (P)).

Thus, with the stopping criterion “$g(x^k) > 0$ or $\alpha(M_k) < \varepsilon$”, a convergent BB procedure for solving $(Q_\gamma)$ will help to transcend a given incumbent value $\gamma$, i.e., to solve the subproblem $(*, \gamma)$.

For brevity, a convergent BB procedure for solving $(Q_\gamma)$ with stopping criterion “$\alpha(M_k) < \varepsilon$ or $g(x^k) > 0$” will be referred to as procedure $(*, \gamma)$.

Using this finite procedure, one can solve problem (P) according to the following successive incumbent transcending (SIT) scheme, where $\gamma_0$ denotes an arbitrary real number larger than $\max\{f(x) \mid x \in [a, b]\}$.


SIT (Successive Incumbent Transcending) Scheme

Start with $\gamma = \gamma_0$.

Call procedure $(*, \gamma)$. If an essential $\varepsilon$-feasible solution $x$ of (P) is obtained with $f(x) \le \gamma$, reset $\gamma \leftarrow f(x) - \eta$ and repeat. Otherwise, stop: $x$ is an $\eta$-optimal solution if $\gamma = f(x) - \eta$; problem (P) has no $\varepsilon$-feasible solution if $\gamma = \gamma_0$.

Since $f(D)$ is compact and $\eta > 0$, this scheme is necessarily finite.
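A toy version of procedure $(*, \gamma)$ inside the SIT scheme may help to see how the pieces fit. Everything here is an illustrative assumption rather than the method of the text: the instance is one-dimensional and monotonic (dm), the constraint function is written as $g = u - v$ with increasing $u, v$, so $\alpha([p, q]) = u(q) - v(p)$ is an upper bound of $g$ on a box; since $f$ is increasing, a box contains points with $f \le \gamma$ exactly when $f(p) \le \gamma$, and $x^k = p$ then satisfies (2); boxes are split by bisection. For continuous $u$, $\alpha(M_k) - g(x^k) = u(q) - u(p) \to 0$ as boxes shrink, which is condition (4).

```python
def transcend(f, u, v, gamma, a, b, eps, max_iter=100_000):
    """Toy procedure (*, gamma) on [a, b] for increasing f, u, v with g = u - v.

    Returns x with g(x) >= 0 and f(x) <= gamma, or None once every remaining
    box M has alpha(M) < eps (so that max (Q_gamma) < eps).
    """
    def alpha(box):
        p, q = box
        return u(q) - v(p)                           # upper bound of g on [p, q]

    boxes = [(a, b)] if f(a) <= gamma else []
    for _ in range(max_iter):
        boxes = [m for m in boxes if alpha(m) >= eps]   # delete boxes with alpha(M) < eps
        if not boxes:
            return None
        box = max(boxes, key=alpha)                  # M_k with the largest bound, cf. (3)
        p, q = box
        if u(p) - v(p) >= 0:                         # x^k = p; f(p) <= gamma by construction
            return p
        boxes.remove(box)
        mid = 0.5 * (p + q)
        # keep only the halves that still contain points with f <= gamma
        boxes += [m for m in ((p, mid), (mid, q)) if f(m[0]) <= gamma]
    raise RuntimeError("iteration limit reached")


def sit(f, u, v, a, b, eps, eta):
    """SIT scheme: transcend the incumbent until procedure (*, gamma) fails."""
    gamma = f(b) + 1.0                 # gamma_0 > max f on [a, b] since f is increasing
    best = None
    while True:
        x = transcend(f, u, v, gamma, a, b, eps)
        if x is None:
            return best                # None: no eps-feasible solution at gamma_0
        best, gamma = x, f(x) - eta    # reset gamma <- f(x) - eta and repeat

# Hypothetical instance: f(x) = x and g(x) = x - 3 on [0, 10]; the optimum is x = 3
x = sit(lambda x: x, lambda x: x, lambda x: 3.0, 0.0, 10.0, eps=0.01, eta=0.1)
print(x)
```

The returned point is feasible ($g(x) \ge 0$), and because the last call failed at $\gamma = f(x) - \eta$, no point with $g \ge \varepsilon$ has a value below that level, so $x$ lies within about $\varepsilon + \eta$ of the optimum 3.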


The SIT Algorithm for D(C) Optimization 
Incorporating procedure $(*, \gamma)$ into the SIT scheme, we obtain the following SIT algorithm for (P).

Let $\gamma_0$ be any real number such that $\gamma_0 > \max\{f(x) \mid x \in [a, b]\}$.


SIT algorithm for (P)

Select tolerances $\varepsilon > 0$, $\eta > 0$.

Step 0. Let $\mathcal{P}_1 = \{M_1\}$, $M_1 = [a, b]$, $\mathcal{R}_1 = \emptyset$. Let $\gamma = \gamma_0$. Set $k = 1$.

Step 1. For each box (hyperrectangle) $M \in \mathcal{P}_k$, compute an upper bound $\alpha(M)$ for $\max\{g(x) \mid x \in M,\ f(x) \le \gamma\}$. Delete every $M$ such that $\alpha(M) < \varepsilon$.

Step 2. Let $\mathcal{P}'_k$ be the collection of boxes resulting from $\mathcal{P}_k$ after completion of step 1. Let $\mathcal{R}'_k = \mathcal{R}_k \cup \mathcal{P}'_k$.

Step 3. If $\mathcal{R}'_k = \emptyset$, then terminate. If $\gamma = \gamma_0$, the problem (P) is essentially $\varepsilon$-infeasible; otherwise, the essential feasible solution $\bar x$ such that $\gamma = f(\bar x) - \eta$ is an essential $\eta$-optimal solution of (P).

Step 4. If $\mathcal{R}'_k \ne \emptyset$, let $M_k \in \operatorname{argmax}\{\alpha(M) \mid M \in \mathcal{R}'_k\}$. Divide $M_k$ into two subboxes using the standard bisection. Let $\mathcal{P}_{k+1}$ be the collection of these two subboxes of $M_k$, and $\mathcal{R}_{k+1} = \mathcal{R}'_k \setminus \{M_k\}$. Increment $k$ and return to step 1.


Proposition 4 The SIT algorithm terminates after finitely many steps, yielding an $(\varepsilon, \eta)$-optimal solution or evidence that the problem has no essential $\varepsilon$-feasible solution.


Extensions 


So far all constraints have been assumed to be of inequality type: $g_i(x) \ge 0$, $i = 1, \ldots, m$. In the case when there are equality constraints such as

$$h_j(x) = 0, \qquad j = 1, \ldots, s,$$

one can use linear equalities to eliminate certain variables, so without loss of generality it can always be assumed that all equality constraints are nonlinear. Since an exact solution to a nonlinear system of equations cannot be expected to be computable in finitely many steps, one should be content with replacing every given equality constraint $h_j(x) = 0$ by an approximate one:

$$-\delta \le h_j(x) \le \delta, \qquad j = 1, \ldots, s,$$

where $\delta > 0$ is the tolerance. A mixed system with both inequality and equality constraints can thus be replaced with an approximate system involving only inequality constraints, to which the above approach can be applied.
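The reduction to a single constraint $g(x) \ge 0$ can be made concrete. A minimal sketch with hypothetical names folds the inequalities $g_i(x) \ge 0$ and the relaxed equalities $\delta - h_j(x) \ge 0$, $\delta + h_j(x) \ge 0$ into their pointwise minimum, as in the rewriting of (P) with $g(x) = \min_i g_i(x)$; if every $g_i$ and $h_j$ lies in $D(C)$, so does the minimum, by Proposition 1.

```python
def combined_constraint(ineqs, eqs, delta):
    """Return g with g(x) >= 0 iff all g_i(x) >= 0 and all |h_j(x)| <= delta."""
    pieces = list(ineqs)
    for h in eqs:
        pieces.append(lambda x, h=h: delta - h(x))   # encodes h_j(x) <= delta
        pieces.append(lambda x, h=h: delta + h(x))   # encodes h_j(x) >= -delta
    return lambda x: min(p(x) for p in pieces)

# Hypothetical data: x_1 >= 0 and the equality x_1 + x_2 = 1, relaxed with delta = 0.01
g = combined_constraint([lambda x: x[0]], [lambda x: x[0] + x[1] - 1.0], delta=0.01)
print(g((0.5, 0.5)), g((0.5, 0.6)) >= 0)             # 0.01 False
```

The default argument `h=h` binds each equality function at definition time; without it every lambda would refer to the last `h` of the loop.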
Abstract


The various robust linear programming models inves- 
tigated so far in the literature essentially appear to be 
based either on what is referred to as rowwise uncer- 
tainty models or on columnwise uncertainty models 
(these basically assume that the rows, or the columns, 
of the constraint matrix are subject to changes within 
a well-specified uncertainty set). In this chapter, we discuss a special case of columnwise uncertainty, namely
the subclass of robust linear programming (LP) models 
with uncertainty limited to the right hand side (RHS) 
only (this subclass does not appear to have been sig- 
nificantly investigated so far). In this context we in- 
troduce the concept of a ‘two-stage robust LP model’ 
as opposed to the standard case (which might be re- 
ferred to as a ‘single-stage robust LP model’) and we 
address the question of whether LP duality can be used 
to convert a LP problem with RHS uncertainty into 
a robust LP problem with uncertainty on the objec- 
tive function. We show how to derive both statements 
of (a) the dual to the robust model and (b) the robust version of the dual. The resulting expressions of the objective function to be optimized in the two cases appear to be clearly distinct. Moreover, from a complexity point of view, one appears to be efficiently
solvable (it reduces to a convex optimization prob- 
lem), whereas the other, as a nonconvex optimiza- 
tion problem, is expected to be computationally diffi- 
cult in the general case. As an application of the two- 
stage robust LP model introduced here, we next inves- 
tigate the PERT (program evaluation and review tech- 
nique) scheduling problem, considering two possible 
natural ways of specifying the uncertainty set for the 
task durations: the case where the uncertainty set is 
a scaled ball with respect to the $L_\infty$ norm; the case
where the uncertainty set is a scaled Hamming ball 
of bounded radius (which, though leading to a quite 
different model, bears some resemblance to the well- 
known Bertsimas-Sim approach to robustness). We 
show that in both cases the resulting robust optimiza- 
tion problem can be efficiently solved in polynomial 
time. 


Keywords and Phrases 
Robust linear programming; Duality; PERT scheduling 


Introduction 


Various models for handling robustness objectives 
with respect to uncertainties on some specified co- 
efficients in linear programming (LP) models have 
been proposed in the literature. We can mention Soys- 
ter [9], Ben-Tal and Nemirovski [1,2] and Bertsimas 
and Sim [3,4]. 


The various approaches proposed can roughly be 
divided into two distinct categories, depending on 
whether the underlying uncertainty model refers to 
possible fluctuations on the row vectors of the constraint matrix (we call this ‘rowwise uncertainty’), or on the column vectors (we call this ‘columnwise uncertainty’).

Columnwise uncertainty was first considered by Soyster [9]. In this model each column $A_j$ of the $m \times n$ constraint matrix is either supposed to be exactly known, or is only known to belong to a given subset $K_j \subset \mathbb{R}^m$ (‘uncertainty set’). The cost vector and the right hand side (RHS) are supposed to be certain. A robust solution is a solution which is feasible for all possible choices of the uncertain column vectors in their respective uncertainty sets. With this definition, assuming nonnegativity constraints on all variables of the LP, it can easily be shown that the problem of finding an optimal robust solution reduces to solving an ordinary LP with constraint matrix $\bar A = (\bar a_{ij})$, where, for all $i, j$, the coefficient $\bar a_{ij}$ is defined as
• $\bar a_{ij} = \max_{y \in K_j} \{y_i\}$ in case of an $i$th constraint of the form $\le$;
• $\bar a_{ij} = \min_{y \in K_j} \{y_i\}$ in case of an $i$th constraint of the form $\ge$.

Note that the above maximization (or minimiza- 
tion) can easily be carried out if we assume the uncer- 
tainty sets are either of finite cardinality (and not too 
big!) or closed convex. 
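For finite uncertainty sets the entries of $\bar A$ can be computed directly. A minimal sketch with hypothetical names, assuming each $K_j$ is given as a list of candidate columns and, as above, $x \ge 0$:

```python
def soyster_matrix(column_sets, senses):
    """Worst-case constraint matrix for Soyster's model with finite uncertainty sets.

    column_sets[j]: list of candidate columns (each a list of m numbers) for A_j
    senses[i]:      "<=" or ">=" for the ith constraint
    With x >= 0 the worst case is the largest (<=) or smallest (>=) entry.
    """
    m = len(senses)
    matrix = [[0.0] * len(column_sets) for _ in range(m)]
    for j, K in enumerate(column_sets):
        for i in range(m):
            entries = [y[i] for y in K]
            matrix[i][j] = max(entries) if senses[i] == "<=" else min(entries)
    return matrix

# Column 1 has two candidate realizations, column 2 is known exactly
print(soyster_matrix([[[2, 1], [3, 0]], [[1, 1]]], ["<=", ">="]))   # [[3, 1], [0, 1]]
```

For an interval uncertainty set only its endpoints need to be listed; a general closed convex $K_j$ requires solving the maximization (or minimization) instead of enumerating.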

As observed by many authors, a drawback of Soys- 
ter’s model is that it usually leads to rather conservative 
solutions, in other words the price to pay for robustness 
in the above sense is often too high. 

Contrasting with the above, rowwise uncertainty 
has attracted more interest and has been studied by, 
among others, Ben-Tal and Nemirovski [1,2], and more 
recently by Bertsimas and Sim [3,4]. 

Ben-Tal and Nemirovski start with the assumption that each row $A_i$ of the constraint matrix belongs to a known uncertainty set consisting of an ellipsoid $E_i \subset \mathbb{R}^n$, and a solution $x \in \mathbb{R}^n$, $x \ge 0$, is said to be robust in this context if and only if it satisfies

$$\text{for all } i: \quad A_i x \le b_i, \quad \forall A_i \in E_i.$$

Ben-Tal and Nemirovski then show that finding an optimal robust solution reduces to solving a conic quadratic problem, which can be done in polynomial time.
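For ellipsoidal sets the inner worst case has a closed form, which explains the conic quadratic structure. A sketch, assuming the ellipsoid is parametrized as $E_i = \{a^0 + P u : \|u\|_2 \le 1\}$ (a common choice, not specified in the text): $\max_{A_i \in E_i} A_i x = (a^0)^T x + \|P^T x\|_2$, so each robust constraint becomes the second-order cone constraint $(a^0)^T x + \|P^T x\|_2 \le b_i$. The data below are hypothetical.

```python
import numpy as np

def worst_case_lhs(a0, P, x):
    """max of a @ x over the ellipsoid {a0 + P u : ||u||_2 <= 1}."""
    return float(a0 @ x + np.linalg.norm(P.T @ x))

a0 = np.array([1.0, 1.0])
P = np.diag([0.5, 0.2])
x = np.array([1.0, 2.0])
w = worst_case_lhs(a0, P, x)             # 3 + sqrt(0.41)

# Brute-force check over the boundary of the ellipsoid (u on the unit circle)
angles = np.linspace(0.0, 2.0 * np.pi, 3601)
U = np.stack([np.cos(angles), np.sin(angles)])        # shape (2, 3601)
sampled = ((a0[:, None] + P @ U).T @ x).max()
print(w, sampled)
```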

A way to obviate nonlinearity, while retaining the 
idea of rowwise uncertainty, was proposed by Bertsimas 
and Sim [3,4], considering a slightly different model of 
uncertainty. More precisely, they assume that each uncertain coefficient $a_{ij}$ can take values in a given interval $[a_{ij} - \hat a_{ij},\ a_{ij} + \hat a_{ij}]$ and, for each row $i$, a positive parameter $\Gamma_i > 0$ (not larger than the total number of uncertain coefficients in row $i$) is considered. A solution $x$ is then qualified as $\Gamma$-robust (in the sense of Bertsimas and Sim) if and only if for all $i = 1, \ldots, m$ this solution satisfies the $i$th constraint for all possible choices of the coefficients in row $i$ such that at most $\Gamma_i$ of the uncertain coefficients in the row are allowed to deviate from the nominal values $a_{ij}$ (note that the above statement implicitly assumes the $\Gamma_i$ parameters to be integers, but Bertsimas and Sim show that a slightly more general definition, allowing for nonintegral values of the $\Gamma_i$'s, can be handled in the same way). With this model of uncertainty, Bertsimas and Sim show that finding an optimal $\Gamma$-robust solution can be reduced
to solving an ordinary linear program only moderately 
increased in size, thus opening the way to large-scale 
applications. Moreover the approach readily extends to 
optimization problems including integrality constraints 
on all or part of the variables; in that case the robust ver- 
sion of the problem is a mixed-integer program, but, 
again, the resulting robust model is only moderately in- 
creased in size compared with the original model. 
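For a fixed $x$, the worst case in the Bertsimas-Sim definition can be evaluated by sorting: the extra load on row $i$ is the sum of the $\lfloor \Gamma_i \rfloor$ largest terms $\hat a_{ij}|x_j|$ over the uncertain coefficients, plus the fraction $\Gamma_i - \lfloor \Gamma_i \rfloor$ of the next largest one. A minimal sketch (the function name is hypothetical); Bertsimas and Sim instead dualize this inner maximization to obtain the linear reformulation mentioned above, so the sort only serves to evaluate a given $x$.

```python
import math

def protection(ahat_abs_x, gamma):
    """Worst-case increase of a row's left-hand side when at most gamma
    uncertain coefficients deviate; ahat_abs_x lists ahat_ij * |x_j|."""
    terms = sorted(ahat_abs_x, reverse=True)
    k = math.floor(gamma)
    value = sum(terms[:k])                   # floor(gamma) full deviations
    if k < len(terms):
        value += (gamma - k) * terms[k]      # fractional part on the next largest term
    return value

print(protection([3.0, 1.0, 2.0], 1.5))      # 3.0 + 0.5 * 2.0 = 4.0
```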

In the present chapter we investigate a specific sub- 
class of robust LP decision problems with columnwise 
uncertainty, namely LP problems with uncertainty on 
the RHS coefficients only. To handle such problems, 
a first natural idea would be to use duality to reformu- 
late them as robust LPs with uncertainty on the objec- 
tive. This is the subject of Sect. “Duality and Robustness 
for LPs with Uncertainty on the RHS”. 


Duality and Robustness for LPs 
with Uncertainty on the RHS 


We first address in this section the question of whether 
duality can prove in any way useful to convert a colum- 
nwise uncertain linear program into a rowwise un- 
certain linear program, assuming of course the same 
uncertainty model for the columns of the given linear program and for the corresponding rows in the dual.

Intuitively, no nice (i.e. strong) duality result is to 
be expected when taking into account robustness con- 
straints since, in both the primal and the dual, there is 
a price to pay for uncertainty; therefore, if we maximize 
in the primal, the robust primal optimal solution value 
will be (in general strictly) less than the optimal solu- 
tion value of the ‘nominal’ primal LP, and minimizing 
in the dual will lead to a robust dual optimal solution 
value (in general) strictly larger than the same value. 

Let us illustrate the phenomenon for a small typical example. Consider the following LP (a continuous knapsack problem actually) with two uncertain coefficients $a_1$ and $a_2$ in the constraint matrix:

$$
(P)\quad \begin{aligned} \text{Maximize}\ & 4x_1 + 3x_2\\ \text{subject to}\ & a_1 x_1 + a_2 x_2 \le 4,\\ & x_1 \ge 0,\ x_2 \ge 0, \end{aligned}
$$

the standard LP dual of which reads

$$
(D)\quad \begin{aligned} \text{Minimize}\ & 4u\\ \text{subject to}\ & a_1 u \ge 4,\\ & a_2 u \ge 3,\\ & u \ge 0. \end{aligned}
$$

Let us assume that the uncertainty set for $a_1$ is the real interval $[2, 3]$, the uncertainty set for $a_2$ is the real interval $[1, 2]$, and let us take as the definition of a robust solution in both (P) and (D) a solution which is feasible for any possible values of $a_1$ and $a_2$ in their respective uncertainty sets. Then it is easily seen that the optimal robust primal solution is $x^\circ = [0, 2]$ with corresponding primal objective function value 6, and the optimal robust dual solution is $u^\circ = 3$ with corresponding dual objective function value 12. This example thus clearly shows that no natural extension of the usual properties related to LP duality is to be expected in the context of robust LP.
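The two values can be reproduced in a few lines. A minimal sketch relying on the fact that a single-constraint continuous knapsack LP with positive data is solved by spending the whole capacity on the variable with the best ratio $c_j/a_j$. The robust primal must hold for the worst case $a = (3, 2)$, while each dual constraint $a_j u \ge c_j$ must hold for the smallest $a_j$ in its interval.

```python
def knapsack_lp(c, a, cap):
    """max c.x s.t. a.x <= cap, x >= 0 (all data positive): use the best ratio c_j/a_j."""
    j = max(range(len(c)), key=lambda k: c[k] / a[k])
    x = [0.0] * len(c)
    x[j] = cap / a[j]
    return x, c[j] * x[j]

x_rob, primal_val = knapsack_lp([4, 3], [3, 2], 4)   # worst case a = (3, 2)
u_rob = max(4 / 2, 3 / 1)                            # u >= c_j / min(a_j) for j = 1, 2
dual_val = 4 * u_rob
print(x_rob, primal_val, u_rob, dual_val)            # [0.0, 2.0] 6.0 3.0 12.0
```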

A special case of columnwise uncertainty in LP is 
when uncertainty only concerns the coefficients of the 
RHS. Such problems frequently arise in practical ap- 
plications. As a typical example, we mention the ro- 
bust program evaluation and review technique (PERT) 
scheduling problem with uncertainty on the durations 
of (some of) the tasks, assuming that a robust earliest
termination date has to be determined. More precisely, 
we want to determine the minimum total duration of 
the project under any possible assignment of task dura- 
tions, taken in a given uncertainty set. The two-stage ro- 
bust model discussed in Sect. “Two-Stage Robust (LP) 
Decision Model” will appear to be relevant to such ap- 
plications. 
Consider the following LP

$$\max\{c^T x \mid Ax \le b,\ x \ge 0\}$$

and assume that the RHS $b$ is not known exactly, but is only known to belong to some uncertainty set $B \subset \mathbb{R}^m$.
The set B may be finite or infinite (we will introduce 
additional assumptions for this set when necessary). 
Two distinct robustness models for LPs with uncer- 
tain RHS will be successively discussed in the following 
sections, namely single-stage robust decision models 
(Sect. “Single-Stage Robust (LP) Decision Model”) and 
two-stage robust decision models (Sect. “Two-Stage 
Robust (LP) Decision Model”). For both cases it will be 
shown that, even in the restricted situation addressed 
here (uncertainty on the RHS only), one cannot use 
standard duality theory to convert a columnwise un- 
certain linear program into a rowwise uncertain linear 
program while preserving equivalence. Also, examples 
will be provided to show that the two-stage robust LP 
decision model is capable of producing less conserva- 
tive solutions than the single-stage robust LP model. 


Single-Stage Robust (LP) Decision Model 


We first consider the simplest case where the values of all the decision variables $x$ have to be fixed (taking into account uncertainty) before we get any kind of information on the actual realization of the uncertain parameters. In such a simple model (indeed a special case of Soyster’s model) feasibility has to be ensured for any $b \in B$, and the problem to be solved simply reduces to

$$\max\ c^T x \quad \text{subject to} \quad Ax \le \underline{b},\ x \ge 0,$$

where, for all $i$, $\underline{b}_i = \min_{b \in B}\{b_i\}$.


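The (standard) LP dual to the above problem reads

$$(D1)\quad \min\ u^T \underline{b} \quad \text{subject to} \quad u^T A \ge c^T,\ u \ge 0.$$

On the other hand, if we consider the dual to

$$\max\ c^T x \quad \text{subject to} \quad Ax \le b,\ x \ge 0,$$

we get

$$\min\ u^T b \quad \text{subject to} \quad u^T A \ge c^T,\ u \ge 0.$$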


Now consider the robust version of this dual problem, where the cost vector $b$ is uncertain and can take any value in $B$. A simple and natural objective in this context is to find $u$ achieving the minimum value of $\max_{b \in B} u^T b$, thus leading to

$$\text{(D2)}\qquad \min_{u}\ \max_{b \in B}\{u^T b\} \quad \text{subject to}\quad u^T A \ge c^T,\ u \ge 0.$$


Clearly (D1) and (D2) are completely different optimization problems, although they have the same feasible set, since (because $\underline{b} \le b$ componentwise for every $b \in B$)

$$\forall u \ge 0:\quad u^T \underline{b} \le \max_{b \in B}\{u^T b\},$$

with strict inequality holding in the general case. If the set $B$ is either finite or closed and convex, (D2) can be solved efficiently via standard convex optimization techniques, since the function

$$u \mapsto \max_{b \in B}\{u^T b\}$$

is convex. Also, in this case, $\underline{b}$ can be computed efficiently, since $\min_{b \in B}\{b_i\}$ is itself a convex optimization problem; therefore problem (D1), too, can be solved efficiently.
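For a finite uncertainty set $B$, both dual objectives can be evaluated directly. The Python sketch below (the function names and the small set $B$ are illustrative assumptions, not taken from the article) computes the componentwise minimum $\underline{b}$ and compares $u^T\underline{b}$, the objective of (D1), with $\max_{b \in B} u^T b$, the objective of (D2), at a given $u \ge 0$.

```python
from typing import List, Sequence

Vector = Sequence[float]


def componentwise_min(B: Sequence[Vector]) -> List[float]:
    """b_under[i] = min over b in B of b[i] (Soyster-type RHS, B finite)."""
    return [min(b[i] for b in B) for i in range(len(B[0]))]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def d1_objective(u: Vector, B: Sequence[Vector]) -> float:
    """Objective of (D1): u^T b_under."""
    return dot(u, componentwise_min(B))


def d2_objective(u: Vector, B: Sequence[Vector]) -> float:
    """Objective of (D2): worst-case value max_{b in B} u^T b."""
    return max(dot(u, b) for b in B)


# Illustrative data (assumed): two RHS scenarios and one dual vector u >= 0.
B = [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
u = (1.0, 1.0, 0.0)
print(componentwise_min(B))                    # [0.0, 0.0, 1.0]
print(d1_objective(u, B), d2_objective(u, B))  # 0.0 1.0
```

Here $u^T\underline{b} = 0 < 1 = \max_{b \in B} u^T b$, a strict instance of the inequality above.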
Two-Stage Robust (LP) Decision Model


It frequently arises in applications that the process of 
decision making under uncertainty can be decomposed 
into two successive steps (two-stage decision making) 
or more (multistage decision making). For simplicity of 
presentation, we restrict ourselves here to the case of 
two-stage decision making. In this case, the set of de- 
cision variables x is decomposed (partitioned) into two 
distinct sets of variables which we denote y and z. The 
y variables concern the decisions to be taken in the first 
stage (before knowing anything about which realization 
of uncertainty will arise) and the z variables concern the 
decisions to be taken in the second stage (after realiza- 
tion of uncertainty). 

To make the discussion easier to follow, we limit ourselves to the case where the objective function depends only on the first-stage decision variables; our decision problem can then be written

$$\text{(I)}\qquad \max\ \gamma^T y \quad \text{subject to}\quad Fy + Gz \le b,\ y \ge 0,\ z \ge 0,$$


where $\gamma$ and $b$ are vectors and $F$ and $G$ are matrices of appropriate dimensions. (Restricting ourselves to an objective that does not depend on the $z$ variables serves only to simplify the presentation; our two-stage robust decision model readily handles the general case of an objective depending on both the $y$ and the $z$ variables.) Now, since the RHS $b$ in (I) is uncertain, we have to make our robustness objectives precise. In the sequel, we consider a solution $y$ robust if, for any possible RHS $b \in B$, feasibility can be ensured by a suitable choice of the second-stage decision variables $z$ (by analogy with the terminology used in stochastic programming, the $z$ variables might be referred to as ‘recourse’ variables). So, if we define

$$Y = \{\, y \mid y \ge 0 \text{ and } \forall b \in B,\ \exists z \ge 0 : Gz \le b - Fy \,\},$$

we want to solve

$$\max_{y \in Y}\{\gamma^T y\}.$$


Note that in the above, for any given robust solution $y$, the value taken by the $z$ variables depends on which $b \in B$ is actually realized. This is an important feature of our model, and it explains why the model can produce less conservative solutions than Soyster's model (see the example given in Remark 1 below).
According to Farkas' lemma, for fixed $b \in B$, a necessary and sufficient condition for the existence of $z \ge 0$ satisfying $Gz \le b - Fy$ is that $u^T(b - Fy) \ge 0$ for all $u$ in the polyhedral cone

$$C = \{\, u \mid u^T G \ge 0 \text{ and } u \ge 0 \,\}.$$

Denoting by $u^1, u^2, \dots, u^p$ the extreme rays of this cone, we can equivalently represent the set $Y$ by the system of linear inequalities

$$(u^j)^T F y \le (u^j)^T b,\qquad \forall b \in B,\ \forall j = 1, \dots, p,$$

which is equivalent to

$$(u^j)^T F y \le w^j = \min_{b \in B}\,(u^j)^T b,\qquad j = 1, \dots, p.$$


The robust two-stage decision problem is then reformulated as

$$\text{(I)}'\qquad \max\ \gamma^T y \quad \text{subject to}\quad (u^j)^T F y \le w^j,\ \forall j = 1, \dots, p,\quad y \ge 0.$$
Since we are interested in investigating duality in the context of robustness, let us state the (standard) dual to (I)'. Introducing dual variables $\lambda_j$ ($j = 1, \dots, p$), the dual to (I)' can be written as

$$\min\ \sum_j \lambda_j w^j = \sum_j \lambda_j \min_{b \in B}\,(u^j)^T b \quad \text{subject to}\quad \sum_j \lambda_j F^T u^j \ge \gamma,\quad \lambda_j \ge 0,\ \forall j = 1, \dots, p,$$


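and since $\{\, u \mid u = \sum_j \lambda_j u^j,\ \lambda_j \ge 0 \,\} = \{\, u \mid G^T u \ge 0,\ u \ge 0 \,\}$, this can be rewritten as

$$\text{(DI)}'\qquad \min_{u}\ \min_{b \in B}\ u^T b \quad \text{subject to}\quad F^T u \ge \gamma,\quad G^T u \ge 0,\quad u \ge 0,$$

or, equivalently, if we denote by $W$ the set of feasible solutions of (DI)' (that is, the set of $u$ satisfying $F^T u \ge \gamma$, $G^T u \ge 0$, $u \ge 0$),

$$\min_{u \in W}\ \min_{b \in B}\{u^T b\}. \qquad (1)$$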
An a priori different way of using duality in our context would be to take the (standard) LP dual to (I) for fixed $b$, and then to carry out robustness analysis with respect to the coefficients $b$ of the objective in the dual, allowing $b$ to take all possible values in $B$. The LP dual to (I) reads


$$\text{(DI)}\qquad \min\ u^T b \quad \text{subject to}\quad F^T u \ge \gamma,\quad G^T u \ge 0,\quad u \ge 0.$$


A natural robust version of (DI) consists in finding the dual solution $u$ that minimizes the value of $u^T b$ produced by the worst possible $b$:

$$\min_{u \in W}\ \max_{b \in B}\{u^T b\}, \qquad (2)$$

which is to be contrasted with (1): the robust version of the dual differs significantly from the dual of the robust version of the initial (primal) problem (I), because the function of $u$ to be minimized is $\min_{b \in B}\{u^T b\}$ in one case and $\max_{b \in B}\{u^T b\}$ in the other.

This structural difference between the two functions also implies a difference in the practical solvability of the corresponding problems. The objective function in (2) is convex in $u$, making the robust version of (DI) efficiently solvable, whereas the objective in (1) is concave in $u$, making (DI)' and thus (by standard LP duality) (I)' difficult problems in the general case.
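The convexity of $u \mapsto \max_{b \in B} u^T b$ and the concavity of $u \mapsto \min_{b \in B} u^T b$ can be observed numerically for a finite $B$. The Python sketch below (illustrative names; the set $B$ is assumed data) samples random pairs $u, v \ge 0$ and records the smallest midpoint gap of each function; it only illustrates the shape of the two objectives and does not solve (1) or (2).

```python
import random


def lin(u, b):
    return sum(ui * bi for ui, bi in zip(u, b))


def worst_case(u, B):
    """u -> max_{b in B} u^T b, objective of (2): pointwise max of linear maps."""
    return max(lin(u, b) for b in B)


def best_case(u, B):
    """u -> min_{b in B} u^T b, objective of (1): pointwise min of linear maps."""
    return min(lin(u, b) for b in B)


def midpoint_gaps(B, trials=1000, seed=0):
    """Smallest observed convexity gap of worst_case and concavity gap of
    best_case over random pairs u, v >= 0. Both should be >= 0."""
    rng = random.Random(seed)
    dim = len(B[0])
    convex_gap = concave_gap = float("inf")
    for _ in range(trials):
        u = [rng.uniform(0.0, 1.0) for _ in range(dim)]
        v = [rng.uniform(0.0, 1.0) for _ in range(dim)]
        m = [(a + c) / 2 for a, c in zip(u, v)]
        convex_gap = min(convex_gap,
                         (worst_case(u, B) + worst_case(v, B)) / 2 - worst_case(m, B))
        concave_gap = min(concave_gap,
                          best_case(m, B) - (best_case(u, B) + best_case(v, B)) / 2)
    return convex_gap, concave_gap


B = [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (2.0, 0.5, 0.0)]  # assumed data
print(midpoint_gaps(B))  # both entries nonnegative (up to rounding)
```

With $B = \{(1,0,1)^T, (0,1,1)^T\}$, for instance, $\min_{b} u^T b$ equals 0 at $u = (1,0,0)^T$ and at $u = (0,1,0)^T$ but 0.5 at their midpoint, so this objective is not convex.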


Remark 1 As already suggested, the two-stage robust decision model proposed here is capable of producing less conservative solutions than Soyster's model. The reason is that if, for a given uncertainty set $B$, we consider Soyster's model for problem (I), the problem to be solved is

$$\max_{y \in Y_S}\{\gamma^T y\},$$

where the set $Y_S$ is defined as $\{\, y \mid y \ge 0 \text{ and } \exists z \ge 0 : Gz \le \underline{b} - Fy \,\}$, with $\underline{b}$ defined by

$$\underline{b}_i = \min_{b \in B}\{b_i\},\qquad \forall i.$$

It is easily seen that $Y_S \subseteq Y = \{\, y \mid y \ge 0 \text{ and } \forall b \in B,\ \exists z \ge 0 : Gz \le b - Fy \,\}$ (a $z$ that is feasible for $\underline{b}$ is feasible for every $b \in B$, since $\underline{b} \le b$), and cases where strict inclusion holds (leading to a better robust optimal value than the optimal value of Soyster's model) can easily be found, as illustrated by the following example.
In this example we consider three variables, $y \ge 0$, $z_1 \ge 0$ and $z_2 \ge 0$, and three constraints,

$$y - z_1 \le b_1,\qquad y - z_2 \le b_2,\qquad z_1 + z_2 \le b_3,$$

and the uncertainty set $B$ is taken as the set containing the two vectors $(1, 0, 1)^T$ and $(0, 1, 1)^T$. The objective is to maximize $y$. It is easily checked that the set $Y_S$ corresponding to Soyster's model is in this case the real interval $[0, 1/2]$ (here $\underline{b} = (0, 0, 1)^T$, so that $y \le z_1$, $y \le z_2$ and $z_1 + z_2 \le 1$), leading to an optimal robust solution value of 0.5. On the other hand, the set $Y$ corresponding to our two-stage model is the real interval $[0, 1]$, leading to the (less conservative) optimal robust solution value 1. Indeed, the value $y = 1$ is feasible in our model because, in the case where $b = (1, 0, 1)^T$ occurs, we can take $z_1 = 0$ and $z_2 = 1$, and in the case where $b = (0, 1, 1)^T$ occurs, we can take $z_1 = 1$ and $z_2 = 0$. (Observe, as already pointed out, that the value taken by the $z$ variables indeed depends on which $b \in B$ is actually realized.) Of course, this example does not rule out the possibility of having $Y = Y_S$ for some special instances. As will be seen in the next section, this possibility arises in connection with the robust PERT scheduling problem, in the special case (referred to there as Case 1) where the uncertainty set on the task durations is the Cartesian product of a family of real intervals.
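The example can be checked numerically. For fixed $y$ and $b$, the cheapest feasible choice is $z_1 = \max(0, y - b_1)$ and $z_2 = \max(0, y - b_2)$, so feasibility amounts to $\max(0, y - b_1) + \max(0, y - b_2) \le b_3$. The Python sketch below (function names are illustrative) uses bisection, which is valid because feasibility is monotone in $y$, to recover the optimal values of Soyster's model (one $z$ that must work for $\underline{b}$) and of the two-stage model ($z$ chosen after $b$ is revealed).

```python
def feasible(y, b):
    """Is there z1, z2 >= 0 with y - z1 <= b1, y - z2 <= b2, z1 + z2 <= b3?
    The cheapest choice is z1 = max(0, y - b1), z2 = max(0, y - b2)."""
    b1, b2, b3 = b
    return max(0.0, y - b1) + max(0.0, y - b2) <= b3 + 1e-12


def max_robust_y(scenarios, hi=10.0, iters=60):
    """Largest y >= 0 such that for EVERY b in scenarios some z >= 0 exists.
    Feasibility is monotone in y, so bisection on [0, hi] applies."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if all(feasible(mid, b) for b in scenarios):
            lo = mid
        else:
            hi = mid
    return lo


B = [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
b_under = tuple(min(b[i] for b in B) for i in range(3))  # (0.0, 0.0, 1.0)

soyster = max_robust_y([b_under])  # single-stage: one z for b_under
two_stage = max_robust_y(B)        # two-stage: z may differ per scenario
print(round(soyster, 6), round(two_stage, 6))  # 0.5 1.0
```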


An Application of the Two-Stage Robust LP Model 
with RHS Uncertainty: Robust PERT Scheduling 


In this section we specialize the general two-stage robust LP decision model investigated earlier to robust PERT scheduling, with an uncertainty set $D$ on the durations of the tasks, assumed to be given as a (finite or infinite) list of ‘scenarios’. More precisely, we want to determine the earliest termination date that can be achieved for any realization of the task durations $d$ in a given uncertainty set $D$.


Formulation as a Two-Stage Robust LP Model 


Consider a PERT network represented as a directed circuitless graph $N$ in which the nodes correspond to tasks (the tasks are numbered $i = 1, 2, \dots, n$; the set of tasks is denoted $I$), and there is an arc $(i, j)$ with length (duration) $d_i$ whenever there is a precedence constraint stating that processing of task $j$ should not start before completion of task $i$. The set of arcs is denoted $U$. We assume that node 1 has no immediate predecessor (it thus represents the initial task) and node $n$ has no direct successor (it thus represents the terminal task of the project). Denoting by $y_j$ ($j = 1, \dots, n$) the starting date of task $j$, and assuming first that the task durations $d_i$ are exactly known, we want to minimize the total duration of the project while satisfying all precedence constraints; in other words


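$$\text{Maximize}\ -y_n \quad \text{subject to}\quad y_1 = 0,\qquad y_i - y_j \le -d_i,\ \forall (i, j) \in U.$$

Indeed, it is easy to check that in the above, $y_1 = 0$ can be replaced by $y_1 \ge 0$, or equivalently $-y_1 \le 0$; thus, the problem can be rewritten

$$\text{Maximize}\ -y_n \quad \text{subject to}\quad -y_1 \le 0,\qquad y_i - y_j \le -d_i,\ \forall (i, j) \in U.$$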


This model is recognized as a special case of (I), the constraint matrix $[F, G]$ being formed by the transpose of the node-arc incidence matrix of $N$ with an additional row involving variable $y_1$ only (with associated coefficient $-1$). $F$ is reduced in this case to a single column (the column corresponding to node $n$ in the transpose of the incidence matrix of $N$). The RHS vector $b$ is the vector whose coefficients are the negatives of the task durations (more specifically, the RHS coefficient of the constraint corresponding to arc $(i, j) \in U$ is equal to $-d_i$). Note that we do not state the nonnegativity conditions on $y$ explicitly, since they are implied by the precedence constraints and the nonnegativity of the $d_i$ coefficients.

Thus, the problem is cast in a form very similar to that of (I), the only difference being that the nonnegativity conditions on $y$ and $z$ are dropped. The consequence of this for the analysis in Sect. “Two-Stage Robust (LP) Decision Model” is just that we have to consider the polyhedral cone $C' = \{\, u \mid u^T G = 0 \text{ and } u \ge 0 \,\}$ instead of the polyhedral cone $C = \{\, u \mid u^T G \ge 0 \text{ and } u \ge 0 \,\}$.
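When the durations are exactly known, the PERT LP above is solved by the classical forward pass (longest path from node 1), which gives each $y_j$ the smallest value allowed by its precedence constraints. The Python sketch below assumes topological numbering and uses the lower interval ends of Table 1 (the seven-task example of Case 1) as exactly known durations; the helper names are illustrative.

```python
def earliest_starts(n, arcs, d):
    """Forward pass: y_1 = 0 and y_j = max over arcs (i, j) of y_i + d_i.
    Assumes tasks are numbered topologically (i < j for every arc) and that
    every task j > 1 has at least one predecessor."""
    preds = {j: [] for j in range(1, n + 1)}
    for i, j in arcs:
        preds[j].append(i)
    y = {1: 0.0}
    for j in range(2, n + 1):
        y[j] = max(y[i] + d[i] for i in preds[j])
    return y


def satisfies_precedences(y, arcs, d):
    """Check the LP constraints y_i - y_j <= -d_i for every arc (i, j)."""
    return all(y[i] - y[j] <= -d[i] for i, j in arcs)


ARCS = [(1, 2), (1, 3), (2, 3), (2, 5), (2, 6),
        (3, 4), (3, 7), (4, 5), (5, 7), (6, 7)]
D_NOMINAL = {1: 2, 2: 4, 3: 3, 4: 4, 5: 4, 6: 8, 7: 0}

y = earliest_starts(7, ARCS, D_NOMINAL)
print(y[7], satisfies_precedences(y, ARCS, D_NOMINAL))  # 17.0 True
```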

Owing to the special structure of the G matrix aris- 
ing in the PERT scheduling problem, we have the fol- 
lowing result. 


Proposition 1 The extreme rays of the polyhedral cone $C'$ are in one-to-one correspondence with the characteristic vectors of the various paths between node 1 and node $n$ in $N$.

Proof: Observing that $G^T$ is the node-arc incidence matrix of the graph $N$ without the row associated with node $n$, but with an extra column having coefficient $-1$ in the first row and all other coefficients 0, we see that any $u$ satisfying $G^T u = 0$ and $u \ge 0$ corresponds to a nonnegative flow from node 1 to node $n$ in $N$ whose value equals the component of $u$ associated with this extra column. Therefore (by flow decomposition, $N$ having no circuits) the extreme rays of the cone $C'$ correspond to the incidence vectors of the various paths connecting node 1 to node $n$ in $N$. □


Let us denote by $P = \{\pi^1, \pi^2, \dots, \pi^K\}$ the set of all paths between 1 and $n$ in $N$, and by $u^1, u^2, \dots, u^K$ the corresponding characteristic vectors. The condition $(u^k)^T(b - Fy) \ge 0$ then specializes to $-\sum_{i \in \pi^k} d_i + y_n \ge 0$, because in this case

$$(u^k)^T b = -\sum_{i \in \pi^k} d_i \quad\text{and}\quad (u^k)^T F y = -y_n.$$

So the condition for feasibility is that $y_n \ge \sum_{i \in \pi^k} d_i$ for each path $\pi^k$. In view of this, the robust PERT scheduling problem can be reformulated as

$$\max\ -y_n \quad \text{subject to}\quad y_n \ge \sum_{i \in \pi} d_i,\quad \forall \pi \in P,\ \forall d \in D,$$

where we recall that $D$ denotes the uncertainty set for the task durations.

This problem therefore reduces to determining the path $\pi^*$ that maximizes, over the set $P$ of all possible paths in $N$, the objective function

$$\max_{d \in D}\Big\{\sum_{i \in \pi} d_i\Big\}.$$

In other words, we want to solve

$$\max_{\pi \in P}\ \max_{d \in D}\Big\{\sum_{i \in \pi} d_i\Big\}, \qquad \text{(RPS)}$$

where RPS stands for robust PERT scheduling problem.
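For a finite list of scenarios $D$, (RPS) can be evaluated by brute force, which is useful for checking faster methods on small instances. The Python sketch below enumerates all paths from node 1 to node $n$ (their number can grow exponentially, so this is only meant for tiny graphs); the names are illustrative, and the data are the seven-task example of Case 1, with $D$ taken as the two extreme scenarios $d^-$ and $d^+$.

```python
def all_paths(n, arcs):
    """Yield every path from node 1 to node n (as a tuple of nodes) of a
    circuitless directed graph, by depth-first search."""
    succ = {}
    for i, j in arcs:
        succ.setdefault(i, []).append(j)
    stack = [(1,)]
    while stack:
        path = stack.pop()
        if path[-1] == n:
            yield path
            continue
        for j in succ.get(path[-1], []):
            stack.append(path + (j,))


def rps_bruteforce(n, arcs, scenarios):
    """max over paths pi of max over d in D of sum_{i in pi} d_i, where D is a
    finite list of duration dicts. Arc (i, j) has length d_i, so the last node
    of a path (task n) contributes nothing."""
    best_value, best_path = float("-inf"), None
    for path in all_paths(n, arcs):
        value = max(sum(d[i] for i in path[:-1]) for d in scenarios)
        if value > best_value:
            best_value, best_path = value, path
    return best_value, best_path


ARCS = [(1, 2), (1, 3), (2, 3), (2, 5), (2, 6),
        (3, 4), (3, 7), (4, 5), (5, 7), (6, 7)]
D_MINUS = {1: 2, 2: 4, 3: 3, 4: 4, 5: 4, 6: 8, 7: 0}
D_PLUS = {1: 4, 2: 8, 3: 6, 4: 8, 5: 8, 6: 16, 7: 0}
print(rps_bruteforce(7, ARCS, [D_MINUS, D_PLUS]))  # (34, (1, 2, 3, 4, 5, 7))
```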

Now, to analyze (RPS) further, we have to specify how the uncertainty set $D$ is defined. Of course there are many possible ways to do this; we content ourselves below with examining two of the most natural definitions, and show that, for each of them, the robust optimization problem (RPS) can be solved efficiently.

Case 1: $D$ is a scaled ball with respect to the $L_\infty$ norm.

The first easy special case is when, for each task $i$, the duration $d_i$ can take any value in a given real interval $[d_i^-, d_i^+]$ with $0 \le d_i^- \le d_i^+$. In this case $D$ is the Cartesian product $[d_1^-, d_1^+] \times [d_2^-, d_2^+] \times \dots \times [d_n^-, d_n^+]$, which may be viewed as a scaled ball with respect to the $L_\infty$ norm (using componentwise scaling to make all intervals of equal width). Since, for any fixed path, the inner maximum over $D$ is attained at $d = d^+$, an optimal robust solution for problem (RPS) can be obtained in this case by looking for a longest path (critical path) in $N$ when each task $i$ is assigned its longest possible duration $d_i^+$.

As an illustration, consider the following example with $n = 7$ tasks, where the graph of precedence constraints has the following arcs: (1, 2), (1, 3), (2, 3), (2, 5), (2, 6), (3, 4), (3, 7), (4, 5), (5, 7) and (6, 7). Thus, task 2 cannot be started before completion of task 1, etc. Also note that the tasks are numbered according to a topological ordering of the graph, since there is no arc $(i, j)$ with $i > j$. The associated intervals $[d_i^-, d_i^+]$ for the durations of the tasks are given in Table 1:


Robust Linear Programming with Right-Hand-Side Uncertainty, Duality and Applications, Table 1
Associated intervals $[d_i^-, d_i^+]$ for the durations of the tasks

Task       1        2        3        4        5        6
Interval   [2, 4]   [4, 8]   [3, 6]   [4, 8]   [4, 8]   [8, 16]


Task 7 is not shown in the table because it is a dummy task (of duration 0, without uncertainty) representing the end of the schedule. It is easy to see that in this example, the optimal solution to the (RPS) problem has duration 34 and corresponds to the critical path (1, 2, 3, 4, 5, 7). Indeed, 34 is the earliest achievable termination date if we require that the schedule remain feasible for any possible choice of the task durations in the Cartesian product $[d_1^-, d_1^+] \times [d_2^-, d_2^+] \times \dots \times [d_n^-, d_n^+]$. This corresponds to the situation where the duration of each task $i$ is $d_i^+$.
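Case 1 thus reduces to a single longest-path computation. A minimal Python sketch of that computation on the example above, assuming topological numbering (which holds here) and setting every duration to its upper end $d_i^+$; the function name is illustrative.

```python
def critical_path(n, arcs, d):
    """Longest path from node 1 to node n when arc (i, j) has length d[i];
    tasks numbered topologically. Returns (length, path)."""
    pred = {j: [] for j in range(1, n + 1)}
    for i, j in arcs:
        pred[j].append(i)
    length, parent = {1: 0}, {1: None}
    for j in range(2, n + 1):
        i_best = max(pred[j], key=lambda i: length[i] + d[i])
        length[j] = length[i_best] + d[i_best]
        parent[j] = i_best
    path, node = [], n
    while node is not None:  # walk the parent pointers back to node 1
        path.append(node)
        node = parent[node]
    return length[n], tuple(reversed(path))


ARCS = [(1, 2), (1, 3), (2, 3), (2, 5), (2, 6),
        (3, 4), (3, 7), (4, 5), (5, 7), (6, 7)]
D_PLUS = {1: 4, 2: 8, 3: 6, 4: 8, 5: 8, 6: 16, 7: 0}  # upper ends of Table 1
print(critical_path(7, ARCS, D_PLUS))  # (34, (1, 2, 3, 4, 5, 7))
```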

Case 2: $D$ is a scaled Hamming ball of bounded radius $\Gamma$.

Here again we assume that, for each task $i$, the duration $d_i$ can take any value in a given real interval $[d_i, d_i + \Delta_i]$ with $d_i \ge 0$; $d_i$ is called the nominal value of the duration of task $i$, and $d_i + \Delta_i$ is the extreme (or worst-case) value for the task duration. As is the case in many practical applications, it is unlikely that all tasks simultaneously take on worst-case values. To take this observation into account, we impose an upper bound $\Gamma$ on the number of task durations that are allowed to take on a worst-case value, all task durations that do not take on a worst-case value being assumed equal to their nominal value. More formally, if we associate with each task $i$ a 0-1 integer variable $u_i$, the uncertainty set $D$ corresponding to this definition is

$$D = \Big\{\, \delta = (\delta_i)_{i=1,\dots,n} : \delta_i = d_i + \Delta_i u_i\ (i = 1, \dots, n),\ \sum_{i=1}^{n} u_i \le \Gamma,\ u_i \in \{0, 1\}\ \forall i \,\Big\}.$$


As can be seen from the above definition, in the special case where all $\Delta_i$ are equal to 1, $D$ is recognized as the Hamming ball of radius $\Gamma$ centered at $d = (d_i)_{i=1,\dots,n}$ (in other words, $\delta - d$ can be any 0-1 vector with at most $\Gamma$ components equal to 1). When the $\Delta_i$'s take on arbitrary positive values, the Hamming ball structure is still present after scaling each component $i$ by the corresponding $\Delta_i$ value.
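For a fixed path, the inner maximum in (RPS) over this set $D$ has a simple form when all $\Delta_i \ge 0$: the path length is largest when the (at most) $\Gamma$ tasks on the path with the largest deviations $\Delta_i$ take their worst-case values. This observation is ours rather than a statement from the article; the Python sketch below checks it against brute-force enumeration of the 0-1 vectors $u$ for the path (1, 2, 3, 4) of the Case 1 example, with $\Delta_i = d_i^+ - d_i^-$ taken from Table 1 (names are illustrative).

```python
from itertools import combinations


def worst_path_length(tasks, d, dev, gamma):
    """max over D of the path length, where durations are d_i + dev_i * u_i
    and sum u_i <= gamma: add the gamma largest deviations (all dev_i >= 0)."""
    top = sorted((dev[i] for i in tasks), reverse=True)[:gamma]
    return sum(d[i] for i in tasks) + sum(top)


def worst_path_length_enum(tasks, d, dev, gamma):
    """Same quantity, enumerating every set S of at most gamma tasks with u_i = 1."""
    nominal = sum(d[i] for i in tasks)
    best = nominal
    for k in range(1, min(gamma, len(tasks)) + 1):
        for S in combinations(tasks, k):
            best = max(best, nominal + sum(dev[i] for i in S))
    return best


D_NOM = {1: 2, 2: 4, 3: 3}  # nominal durations of tasks 1, 2, 3
DEV = {1: 2, 2: 4, 3: 3}    # Delta_i = d_i^+ - d_i^- from Table 1
tasks = (1, 2, 3)           # tasks whose durations lie on path (1, 2, 3, 4)
print(worst_path_length(tasks, D_NOM, DEV, 2),
      worst_path_length_enum(tasks, D_NOM, DEV, 2))  # 16 16
```

The value 16 agrees with $v^*[4, 2]$ discussed below.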

We note here that, although the definition above is close in spirit to the concept of uncertainty suggested by Bertsimas and Sim [3,4], our model is fairly different from the one studied by those authors, since they restricted themselves to rowwise uncertainty, whereas in our robust PERT scheduling problem we have uncertainty on the RHS only (a special case of columnwise uncertainty). For a more detailed discussion of this issue, see Sect. “Differences with Bertsimas and Sim’s Approach”.

We now show that, with the above definition of the uncertainty set $D$, problem (RPS) can be solved efficiently via a dynamic programming recursion. To that aim, we consider the problem with parameter $\Gamma$ as just one representative of the class of problems (RPS[$i, k$]) for $i$ running from 1 to $n$ (the number of tasks) and $k$ running from 0 to $\Gamma$. More precisely, assuming that the tasks are numbered according to a topological ordering of the circuitless graph $N$, (RPS[$i, k$]) is the robust PERT scheduling subproblem corresponding to the subset of tasks $1, 2, \dots, i$, the durations of at most $k$ of which are allowed to take on their worst-case values. The case $k = 0$ (no deviation allowed) corresponds to the usual PERT scheduling problem in terms of the nominal values of the task durations. We denote by $v^*[i, k]$ the optimal objective function value of problem (RPS[$i, k$]), and for any task $i$, we denote by $\mathrm{Pred}[i]$ the set of tasks $j$ such that $(j, i)$ is an arc of $N$ (the set of direct predecessors of node $i$). Bellman's optimality principle then leads to the following dynamic programming recursion (with $v^*[1, k] = 0$ for all $k$, and the second term in the braces present only when $k \ge 1$):

$$\forall i = 2, \dots, n,\ \forall k = 0, 1, \dots, \Gamma:\quad v^*[i, k] = \max_{j \in \mathrm{Pred}[i]}\ \max\big\{\, v^*[j, k] + d_j;\ v^*[j, k-1] + d_j + \Delta_j \,\big\}. \qquad (3)$$


The optimal value of the robust PERT scheduling problem we are interested in is then $v^*[n, \Gamma]$. The rationale behind (3) can easily be explained as follows. Consider the set of all paths from 1 to $j \in \mathrm{Pred}[i]$. The duration of arc $(j, i)$ has nominal value $d_j$ and worst-case value $d_j + \Delta_j$. The maximum duration of a path from 1 to $i$ through $j$ with at most $k$ tasks allowed to deviate from their nominal values can be obtained either by allowing at most $k$ deviations on the subset of tasks $\{1, 2, \dots, j\}$ and taking the nominal duration for arc $(j, i)$, or by allowing at most $k - 1$ deviations on the subset of tasks $\{1, 2, \dots, j\}$ and taking the worst-case duration for arc $(j, i)$. Thus, the optimal value for node $i$ via node $j$ is the maximum of these two alternatives, and the optimal value for node $i$ is the maximum taken over the set of all direct predecessors of $i$. Obviously, the recursion (3) can be solved in polynomial time $O(m \times n)$, where $m$ is the number of arcs and $n$ the number of nodes of the PERT network; more precisely, the complexity is $O(m \times \Gamma)$, where $\Gamma$, the parameter defining the uncertainty set, is at most $n$, the number of tasks (but is often significantly smaller than $n$ in practical applications).
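A direct implementation of recursion (3) with back-pointers for recovering an optimal path might look as follows. This is a minimal Python sketch under the assumptions stated above (topological numbering, $v^*[1, k] = 0$); the function names are illustrative, and the data are those of the seven-task example with $\Delta_i = d_i^+ - d_i^-$ from Table 1.

```python
def robust_pert_dp(n, arcs, d, dev, gamma):
    """Recursion (3) with v[1][k] = 0:
    v[i][k] = max_{j in Pred[i]} max(v[j][k] + d_j, v[j][k-1] + d_j + dev_j),
    the second term only for k >= 1. Tasks must be numbered topologically.
    Returns the table v (v[i][k]) and back-pointers for path recovery."""
    pred = {i: [] for i in range(1, n + 1)}
    for j, i in arcs:
        pred[i].append(j)
    v = {1: [0] * (gamma + 1)}
    back = {}
    for i in range(2, n + 1):
        v[i] = []
        for k in range(gamma + 1):
            best, arg = None, None
            for j in pred[i]:
                options = [(v[j][k] + d[j], (j, k))]      # arc (j, i) nominal
                if k >= 1:                                # arc (j, i) worst case
                    options.append((v[j][k - 1] + d[j] + dev[j], (j, k - 1)))
                for value, state in options:
                    if best is None or value > best:
                        best, arg = value, state
            v[i].append(best)
            back[(i, k)] = arg
    return v, back


def recover_path(back, n, k):
    """Follow back-pointers from state (n, k) to node 1."""
    path, state = [n], (n, k)
    while state[0] != 1:
        state = back[state]
        path.append(state[0])
    return tuple(reversed(path))


ARCS = [(1, 2), (1, 3), (2, 3), (2, 5), (2, 6),
        (3, 4), (3, 7), (4, 5), (5, 7), (6, 7)]
D_NOM = {1: 2, 2: 4, 3: 3, 4: 4, 5: 4, 6: 8, 7: 0}  # lower ends of Table 1
DEV = {1: 2, 2: 4, 3: 3, 4: 4, 5: 4, 6: 8, 7: 0}    # Delta_i = d_i^+ - d_i^-

v, back = robust_pert_dp(7, ARCS, D_NOM, DEV, gamma=3)
print(v[7])  # [17, 22, 26, 29]
print(recover_path(back, 7, 2), recover_path(back, 7, 3))
```

The recovered paths for $\Gamma = 2$ and $\Gamma = 3$ are (1, 2, 6, 7) and (1, 2, 3, 4, 5, 7), matching the discussion of $v^*[7, 2]$ and $v^*[7, 3]$ below.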
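Robust Linear Programming with Right-Hand-Side Uncertainty, Duality and Applications, Table 2
Values $v^*[i, k]$ obtained from recursion (3) for $\Gamma = 3$

         i = 1   i = 2   i = 3   i = 4   i = 5   i = 6   i = 7
k = 0      0       2       6       9      13       6      17
k = 1      0       4      10      13      17      10      22
k = 2      0       4      12      16      21      12      26
k = 3      0       4      12      18      24      12      29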


Let us illustrate the above on the same seven-task example as the one considered to illustrate Case 1. We thus consider the same intervals for the task durations, the lower bound of each interval representing the nominal task duration and the upper bound representing the worst-case duration. For $\Gamma = 3$, application of the recursion (3) leads to the $v^*[i, k]$ values shown in Table 2.

For instance, the value $v^*[4, 2] = 16$ corresponds to the path (1, 2, 3, 4) with three arcs of nominal durations 2, 4 and 3, respectively. If, in this path, two arcs out of three are allowed to take on their worst-case durations, the worst case (maximum) is obtained when task 2 has duration 8 and task 3 has duration 6 (task 1 keeping its nominal duration 2), the resulting length of the path being $8 + 6 + 2 = 16$. Let us also illustrate how the recursion (3) works for computing, e.g., $v^*[7, 2]$ and $v^*[7, 3]$. We have


$$v^*[7, 2] = \max\{v^*[3, 2] + 3;\ v^*[3, 1] + 6;\ v^*[5, 2] + 4;\ v^*[5, 1] + 8;\ v^*[6, 2] + 8;\ v^*[6, 1] + 16\} = \max\{15, 16, 25, 25, 20, 26\} = 26.$$

The maximum above is obtained for $j = 6$ and the corresponding optimal path is (1, 2, 6, 7). Similarly, we have

$$v^*[7, 3] = \max\{v^*[3, 3] + 3;\ v^*[3, 2] + 6;\ v^*[5, 3] + 4;\ v^*[5, 2] + 8;\ v^*[6, 3] + 8;\ v^*[6, 2] + 16\} = \max\{15, 18, 28, 29, 20, 28\} = 29.$$

The maximum above is obtained for $j = 5$ and the corresponding optimal path is (1, 2, 3, 4, 5, 7). It is thus seen that, depending on the choice of the control parameter $\Gamma$, different optimal paths are obtained, which may of course differ from the optimal solution to the nonrobust PERT scheduling problem (considering only the nominal values of the task durations). Also, observe that the value $v^*[7, 3] = 29$ corresponds to a less conservative robust solution than the one obtained in Case 1.


Differences with Bertsimas and Sim’s Approach 


We now show that, in spite of the similarity in the definition of the uncertainty sets, the robust version of the PERT scheduling problem investigated here is essentially different from the model proposed by Bertsimas and Sim [3] for the robust version of the shortest-path problem. From an abstract point of view, the difference stems from the fact that, in our case, we are faced with an LP problem with uncertainty on the RHS, whereas Bertsimas and Sim addressed an LP problem with uncertainty on the cost coefficients. To understand the source of this difference further, we show below which difficulties would arise if we wanted to apply the Bertsimas-Sim approach to the robust longest (critical) path problem on a directed circuitless graph $G$.

Following these authors, the robust shortest $s$-$t$ path problem in $G$ with uncertainty parameter $\Gamma$ (assuming $\Gamma \in \mathbb{N}$) is formulated as

$$\text{(RSP)}\qquad \min_{x \in X}\ \Big\{ \sum_{(i,j) \in U} c_{i,j}\, x_{i,j} + \max_{S \subseteq U,\ |S| \le \Gamma}\ \sum_{(i,j) \in S} \Delta_{i,j}\, x_{i,j} \Big\},$$

where $X$ denotes the set of incidence vectors of all $s$-$t$ paths in $G$, $c_{i,j}$ denotes the nominal cost of arc $(i, j)$ and $c_{i,j} + \Delta_{i,j}$ is the worst-case cost of arc $(i, j)$. After transformation of (RSP) using the duality theorem to convert the second term in the braces into a minimization, the problem is reformulated as a standard LP, the solution of which reduces to $m + 1$ applications of a standard shortest-path algorithm. Observe that one of the reasons why all of the above works so nicely is that the second term in the braces, as a function of $x$, is convex in $x$, since it is the pointwise maximum of a finite number of linear functions.
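The reduction to $m + 1$ shortest-path computations can be sketched as follows, under our reading of Bertsimas and Sim's result for 0-1 problems with integer $\Gamma$: the optimal value equals the minimum, over $\theta$ ranging over the deviations $\Delta_{i,j}$ and 0, of $\Gamma\theta$ plus the shortest-path length with arc weights $c_{i,j} + \max(\Delta_{i,j} - \theta, 0)$. The Python code below compares this with brute-force path enumeration on a small hypothetical graph; all names and data are illustrative.

```python
def st_paths(arcs, s, t):
    """Yield every s-t path of a circuitless graph as a list of arcs (DFS)."""
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, []).append(b)
    stack = [(s, [])]
    while stack:
        node, used = stack.pop()
        if node == t:
            yield used
            continue
        for b in succ.get(node, []):
            stack.append((b, used + [(node, b)]))


def robust_cost(path, c, dev, gamma):
    """Objective of (RSP) for one path: nominal cost + gamma largest deviations."""
    top = sorted((dev[e] for e in path), reverse=True)[:gamma]
    return sum(c[e] for e in path) + sum(top)


def dag_shortest(n_nodes, arcs, w, s, t):
    """Shortest s-t distance; nodes 0..n_nodes-1 numbered topologically, so
    relaxing arcs in increasing order of their tail node is sufficient."""
    dist = [float("inf")] * n_nodes
    dist[s] = 0.0
    for a, b in sorted(arcs):
        if dist[a] + w[(a, b)] < dist[b]:
            dist[b] = dist[a] + w[(a, b)]
    return dist[t]


def bertsimas_sim(n_nodes, arcs, c, dev, gamma, s, t):
    """min over theta in {dev values} U {0} of
    gamma * theta + shortest path with weights c + max(dev - theta, 0)."""
    best = float("inf")
    for theta in sorted(set(dev.values()) | {0}):
        w = {e: c[e] + max(dev[e] - theta, 0) for e in arcs}
        best = min(best, gamma * theta + dag_shortest(n_nodes, arcs, w, s, t))
    return best


# Hypothetical toy instance: arc -> (nominal cost, deviation).
DATA = {(0, 1): (1, 5), (1, 3): (1, 5), (0, 2): (3, 0), (2, 3): (3, 0), (1, 2): (1, 1)}
ARCS = list(DATA)
C = {e: v[0] for e, v in DATA.items()}
DEV = {e: v[1] for e, v in DATA.items()}

for gamma in range(4):
    brute = min(robust_cost(p, C, DEV, gamma) for p in st_paths(ARCS, 0, 3))
    print(gamma, brute, bertsimas_sim(4, ARCS, C, DEV, gamma, 0, 3))
```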

The above approach is still valid if, instead of looking for an optimum robust minimum-cost path, we are looking for an optimum robust maximum-benefit path: $c_{i,j} > 0$ being interpreted as a reward associated with the use of arc $(i, j)$, the effect of uncertainty being to reduce the nominal reward $c_{i,j}$ by the amount $\Delta_{i,j}$. The problem would then take the form

$$\max_{x \in X}\ \Big\{ \sum_{(i,j) \in U} c_{i,j}\, x_{i,j} - \max_{S \subseteq U,\ |S| \le \Gamma}\ \sum_{(i,j) \in S} \Delta_{i,j}\, x_{i,j} \Big\},$$

which is essentially analogous to the above robust minimum-cost path problem, up to a change in the signs of the coefficients in the objective (still assuming, of course, that the graph $G$ under consideration is circuitless). In particular, we note that the function to be maximized is concave, so we still have a convex optimization problem.

By contrast, the robust PERT scheduling problem addressed in the present paper is formulated as

$$\max_{x \in X}\ \Big\{ \sum_{(i,j) \in U} c_{i,j}\, x_{i,j} + \max_{S \subseteq U,\ |S| \le \Gamma}\ \sum_{(i,j) \in S} \Delta_{i,j}\, x_{i,j} \Big\},$$

with $c_{i,j} \ge 0$ and $\Delta_{i,j} \ge 0$.

It is then readily observed that this problem consists in maximizing a convex function of $x$ over the 0-1 vectors $x \in X$, and it is well known that this cannot simply be reduced to ordinary LP as is the case for Bertsimas and Sim's approach. Thus, robust PERT scheduling may be viewed as a typical illustration of the big differences between models featuring rowwise uncertainty and models featuring columnwise uncertainty in robust LP.


Conclusions 


In this chapter, various robust LP problems have been 
investigated, and the question of whether LP duality 
can still be used to help solve such problems has been 
addressed and answered negatively. Among the prob- 
lems considered, robust LP problems with uncertainty 
on the RHS only have been recognized as an interesting 
subclass of problems, for which the solution techniques 
should not confine themselves to the classical approach 
proposed by Soyster [10]. In this respect we have been 
led to propose a new class of robust LP models referred 
to here as ‘two-stage robust decision models’ which can 
be expected to lead to less ‘conservative’ optimal ro- 
bust solutions than those usually obtained from Soys- 
ter’s model. In order to show the practical usefulness of 
this two-stage model, a specialization to robust PERT 
scheduling was discussed, leading, under two natural 
ways of defining the uncertainty set with respect to 
the task durations, to efficient solution methods. Also, some fundamental differences between our approach to robust PERT scheduling and the one proposed by Bertsimas and Sim [3] in the context of the robust shortest-path problem were pointed out. We think that many other possible applications of this two-stage robust modeling approach deserve further investigation, for instance in dynamic inventory management, optimal resource allocation problems and telecommunication problems.
The analyst who attempts to build a mathematical model for a real-world system is often faced with the problem of uncertain, noisy, incomplete or erroneous data. This is true in several application domains. In business applications noisy data are prevalent. Returns of financial instruments, demand for a firm's products, the cost of fuel, and consumption of power and other resources are examples of model data that are known, at best, through some probability distribution. In the social sciences data are often incomplete; for example, partial census surveys are carried out periodically in lieu of a complete census of the population. In the physical sciences and engineering data are usually subject to measurement errors, as in models of image restoration from remote sensing experiments. For some applications not much is lost by assuming that the value of the uncertain data is known and then developing a deterministic mathematical programming model. Worst-case or mean values can be used in this respect because they provide reasonable approximations either when the level of uncertainty is low or when the uncertain parameters have a minor impact on the system we want to model. For many applications, however, uncertainty plays a key role in the performance of the real-world system: worst-case analysis often leads to conservative and potentially expensive solutions, and solving the mean value problem, i.e., a problem where all random variables are replaced by their mean values, can even lead to nonsensical solutions, since the mean of a random variable might not be a value that can be realized in practice.

A general approach to dealing with uncertainty is to assign to the unknown parameters a probability distribution, which should then be incorporated into an appropriate mathematical programming model. Early models for dealing with data uncertainty took the form of sensitivity analysis of the mean value model. Later developments formulated stochastic linear programming models in which the data uncertainty was incorporated in the estimation of the expected value of the objective function. This chapter addresses the problem of planning under uncertainty using robust optimization models.


Robust Optimization Models 


Robust optimization formulations are applicable to optimization models that have two distinct components: a structural component that is fixed and free of any noise in its input data, and a control component that is subject to noisy input data. In some cases the robust optimization model is identical to a two-stage stochastic program with recourse, but it also allows additional flexibility in dealing with noise. In order to define the model we introduce two sets of variables:

• $x \in \mathbb{R}^{n_1}$ denotes the vector of decision variables that depend only on the noise-free structural constraints. These are the design variables, whose values are independent of realizations of the noisy parameters.

• $y \in \mathbb{R}^{n_2}$ denotes the vector of control variables that can be adjusted once the uncertain parameters are observed. Their optimal values depend both on the realization of the uncertain parameters and on the optimal values of the design variables.

The optimization model we are interested in has the following structure:

$$\min\ \langle c, x\rangle + \langle d, y\rangle \qquad (1)$$

such that

$$Ax = b, \qquad (2)$$

$$Bx + Cy = e, \qquad (3)$$

$$x \in \mathbb{R}^{n_1}_{+}, \qquad (4)$$

$$y \in \mathbb{R}^{n_2}_{+}, \qquad (5)$$

where $b$, $c$, $d$, $e$ are given vectors, $A$, $B$, $C$ are given matrices and $\langle \cdot, \cdot \rangle$ is the inner product of its arguments. Equation (2) denotes the structural constraints that are free of noise. Equation (3) denotes the control constraints. The coefficients of these constraints, i.e., the elements of $B$, $C$ and $e$, are subject to noise. The cost vector $d$ is also subject to noise, while $A$, $b$ and $c$ are not.

To define the robust optimization problem we introduce an index set $\Omega := \{1, \dots, N\}$. With each index $s \in \Omega$ we associate the scenario set $\{d(s), B(s), C(s), e(s)\}$ of realizations of the control coefficients. Reference to an index $s$ implies reference to the scenario set associated with this index. The probability of the $s$th scenario is $p_s$, and $\sum_{s \in \Omega} p_s = 1$. Now the following question is posed: what are the desirable characteristics of a solution to problem (1)-(5) when the coefficients of the constraints (3) take values from some given set of scenarios? The solution is considered robust with respect to optimality if it remains ‘close’ to optimal for any realization of the scenario index $s \in \Omega$. The problem is then termed solution robust. The solution is robust with respect to feasibility if it remains ‘almost’ feasible for any realization of $s$. The problem is then termed model robust. The concepts of ‘close’ and ‘almost’ are defined precisely later through the choice of appropriate norms.

It is unlikely that a solution to the mathematical 
program will remain both feasible and optimal for all 
realizations of s. If the system being modeled has sub- 
stantial built-in redundancies, then it might be possible 
to find solutions that remain both feasible and optimal. 
Otherwise a model is needed that permits a trade-off 
between solution and model robustness. Robust opti- 
mization models formalize a way to measure this trade- 
off. 

First let us introduce a set $\{y^1, \dots, y^N\}$ of control variables, one for each scenario $s \in \Omega$, and another set $\{z^1, \dots, z^N\}$ of feasibility error vectors that measure the infeasibility of the control constraints under each scenario.

The real-valued objective function $\xi(x, y) = \langle c, x\rangle + \langle d, y\rangle$ is a random variable taking the value $\xi_s(x, y^s) := \langle c, x\rangle + \langle d(s), y^s\rangle$ with probability $p_s$. Hence, there is no longer a single obvious choice for an aggregate objective function. The expected value

$$\sigma(\cdot) = \sum_{s \in \Omega} p_s\, \xi_s(\cdot) \qquad (6)$$

is precisely the objective function used in stochastic programming formulations. Another choice is to employ worst-case analysis and minimize the maximal value. The objective function is then defined by

$$\sigma(\cdot) := \max_{s \in \Omega}\ \xi_s(\cdot).$$
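For a fixed design $x$ and scenario-dependent controls $y^s$, both aggregate objectives are simple functions of the scenario values $\xi_s$. The Python sketch below evaluates them on hypothetical two-scenario data (one design variable, one control variable); the function names and numbers are illustrative only.

```python
def scenario_values(c, x, d_s, y_s):
    """xi_s = <c, x> + <d(s), y^s> for every scenario s."""
    cx = sum(ci * xi for ci, xi in zip(c, x))
    return [cx + sum(di * yi for di, yi in zip(d, y)) for d, y in zip(d_s, y_s)]


def expected_value(xi, p):
    """sigma as in (6): sum_s p_s * xi_s."""
    return sum(ps * v for ps, v in zip(p, xi))


def worst_case_value(xi):
    """sigma = max_s xi_s (worst case for a minimization model)."""
    return max(xi)


# Hypothetical data for two scenarios.
c, x = (1.0,), (2.0,)
d_s = [(1.0,), (3.0,)]  # d(1), d(2)
y_s = [(1.0,), (2.0,)]  # y^1, y^2
p = [0.5, 0.5]

xi = scenario_values(c, x, d_s, y_s)
print(xi, expected_value(xi, p), worst_case_value(xi))  # [3.0, 8.0] 5.5 8.0
```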
The robust optimization formulation also allows higher moments of the distribution of $\xi(\cdot)$ to be introduced into the optimization model. Indeed, the introduction of higher moments is one of the features of robust optimization that distinguishes it from the stochastic programming model. For example, we could use a nonlinear utility function that embodies a trade-off between mean value and variability in this mean value. If $U(\xi_s)$ denotes the utility of $\xi_s$, then the function

$$\sigma(\cdot) := \sum_{s\in\Omega} p_s\, U(\xi_s(\cdot))$$

captures the risk preference of the user. A popular choice of utility function, typically used in portfolio management applications, is the logarithmic function $U(\xi_s) = \log \xi_s$. The general robust optimization model includes a term $\sigma(x, \{y^s\}_{s\in\Omega})$ in the objective function. This term expresses the dependence of the function value on the scenario index $s$ and controls solution robustness. It can take different forms depending on the application; the examples mentioned above are some popular choices.
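The sketch below shows how a logarithmic utility encodes risk preference. It is a minimal Python example that interprets $\xi_s$ as a positive wealth outcome, as in the portfolio setting mentioned above. That interpretation and the sign convention are assumptions: a utility is maximized, so a minimization model would use its negative. Both outcome vectors are hypothetical.

```python
import math


def expected_log_utility(probs, outcomes):
    """sigma = sum_s p_s * U(xi_s) with U = log; outcomes must be positive."""
    return sum(p * math.log(v) for p, v in zip(probs, outcomes))


def mean(probs, outcomes):
    """First moment sum_s p_s * xi_s, for comparison."""
    return sum(p * v for p, v in zip(probs, outcomes))


probs = [0.5, 0.5]
steady = [1.0, 1.0]     # hypothetical outcomes with no spread
volatile = [0.5, 1.5]   # same mean, larger spread
print(mean(probs, steady), mean(probs, volatile))
print(expected_log_utility(probs, steady), expected_log_utility(probs, volatile))
```

Both outcome vectors have the same mean. The logarithmic utility nevertheless ranks the volatile one lower, which is the mean-versus-variability trade-off the text describes.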

The robust optimization model introduces a sec- 
ond term in the objective function to control model 
robustness. This term is a feasibility penalty function, 
denoted by $\rho(\{z^s\}_{s\in\Omega})$, and it is used to penalize violations of the control constraints under some of the
scenarios. The introduction of this penalty function 
also distinguishes the robust optimization model from 
the stochastic programming approach for dealing with 
noisy data. In particular, the model recognizes that it 
may not always be possible to arrive at a feasible solu- 
tion to a problem under all scenarios. Infeasibilities will 
inevitably arise, and they will be dealt with outside the 
optimization model. The robust optimization model 
generates solutions that present the modeler with the 
fewest infeasibilities to be dealt with outside the model. 

The specific choice of penalty function is problem 
dependent, and it also has implications for the choice 
of a solution algorithm. Two suitable penalty functions 
are the following: 

• $\rho(\{z^s\}_{s\in\Omega}) = \sum_{s\in\Omega} p_s \|z^s\|^2$. This quadratic penalty function (i.e., a weighted $\ell_2$-norm) is applicable to equality control constraints, where positive and negative violations of the constraints are equally undesirable. The resulting quadratic programming problem is twice continuously differentiable. It can be solved using standard quadratic programming algorithms, although it is typically large scale.

• $\rho(\{z^s\}_{s\in\Omega}) := \sum_{s\in\Omega} p_s \max\big(0, \max_j z_j^s\big)$, where $z_j^s$ is the $j$th element of vector $z^s$. This penalty function is applicable to inequality control constraints when only positive violations are of interest; negative values of some $z_j^s$ indicate slack in the inequality constraints. With this choice of penalty function, however, the resulting mathematical program is nondifferentiable.
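The two penalty functions can be evaluated directly for given error vectors. This is a minimal Python sketch; the probabilities and error vectors $z^s$ are hypothetical, and the function names are illustrative.

```python
def quadratic_penalty(probs, z):
    """rho = sum_s p_s * ||z^s||_2^2 (equality control constraints)."""
    return sum(p * sum(zj * zj for zj in zs) for p, zs in zip(probs, z))


def max_violation_penalty(probs, z):
    """rho = sum_s p_s * max(0, max_j z^s_j) (inequality control constraints)."""
    return sum(p * max(0.0, max(zs)) for p, zs in zip(probs, z))


probs = [0.5, 0.5]
z = [[1.0, -2.0], [0.0, -1.0]]  # hypothetical error vectors z^1, z^2
print(quadratic_penalty(probs, z), max_violation_penalty(probs, z))
```

The quadratic penalty charges the negative entry −2.0 as heavily as a positive one. The max-based penalty ignores it, treating it as slack. The inner `max` is also where the nondifferentiability of the second choice comes from.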

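The robust optimization model takes a multicriteria objective form. A goal programming weight parameter $\lambda$ is used to derive a spectrum of answers that trade off solution robustness for model robustness. The general formulation of the robust optimization model is stated as follows:

$$\begin{aligned}
\min\;& \sigma(x, \{y^s\}_{s\in\Omega}) + \lambda\,\rho(\{z^s\}_{s\in\Omega})\\
\text{s.t. }\;& Ax = b,\\
& B(s)x + C(s)y^s + z^s = e(s), && \forall s\in\Omega,\\
& x \in \mathbb{R}^{n_1},\\
& y^s \in \mathbb{R}^{n_2}, && \forall s\in\Omega,\\
& z^s \in \mathbb{R}^{m}, && \forall s\in\Omega.
\end{aligned}$$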
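The sketch below evaluates, but does not optimize, the objective $\sigma + \lambda\rho$ for a fixed candidate $(x, y^1, \dots, y^S)$. It is a minimal Python example under several assumptions:

- $\sigma$ is taken to be the expected value (6).
- $\rho$ is the quadratic penalty.
- Each $z^s$ is recovered from $B(s)x + C(s)y^s + z^s = e(s)$.
- The design constraint $Ax = b$ is assumed to hold and is not checked.

The two-scenario data are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def ro_objective(x, ys, c, ds, Bs, Cs, es, probs, lam):
    """Evaluate sigma + lam * rho for a fixed candidate (x, y^1, ..., y^S).

    sigma: expected cost (6); rho: quadratic penalty sum_s p_s ||z^s||^2.
    """
    sigma = 0.0
    rho = 0.0
    for p, y, d, B, C, e in zip(probs, ys, ds, Bs, Cs, es):
        cost = dot(c, x) + dot(d, y)  # xi_s(x, y^s) = <c, x> + <d(s), y^s>
        # Error vector from B(s)x + C(s)y^s + z^s = e(s).
        z = [ei - bx - cy for ei, bx, cy in zip(e, matvec(B, x), matvec(C, y))]
        sigma += p * cost
        rho += p * dot(z, z)
    return sigma + lam * rho, sigma, rho


# Hypothetical instance: one design variable, one control variable, two scenarios.
probs = [0.5, 0.5]
c = [1.0]
ds = [[2.0], [2.0]]
Bs = [[[1.0]], [[1.0]]]
Cs = [[[1.0]], [[1.0]]]
es = [[3.0], [5.0]]
x = [3.0]
ys = [[0.0], [1.0]]
for lam in (0.0, 10.0):
    print(lam, ro_objective(x, ys, c, ds, Bs, Cs, es, probs, lam))
```

Raising $\lambda$ makes the residual infeasibility in the second scenario dominate the objective. Sweeping $\lambda$ in this way traces the trade-off between solution and model robustness described above.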


Comparisons of Robust Optimization 
with Sensitivity Analysis 
and Stochastic Linear Programming 


We compare here the alternative approaches for dealing 
with uncertainty. In particular, we contrast robust op- 
timization (RO), stochastic linear programming (SLP) 
and sensitivity analysis (SA). We will see that RO en- 
joys several advantages, but it is not without its short- 
comings. 

Sensitivity analysis is a reactive approach to controlling uncertainty. It only measures the sensitivity of a solution, typically the solution to the mean value problem, to changes in the input data. It provides no mechanism by which this sensitivity can be controlled.

Stochastic linear programming and robust opti- 
mization are constructive approaches for controlling 
uncertainty in the values of model parameters. In this 
respect they are both superior to SA. With stochastic 
linear programming models the decision maker is af- 
forded the flexibility of recourse variables. These corre- 
spond to the control variables of RO and provide the 
mechanism with which the model recommendations 
can be adjusted to account for the realizations of random data.

The SLP model, however, optimizes only the first moment of the distribution of the objective value $\xi(\cdot)$. It ignores higher moments of the distribution and the decision maker's preferences towards risk. These aspects are particularly important for asymmetric distributions and for risk-averse decision makers.

Furthermore, aiming at expected value optimization implicitly assumes an active management style, in which the control (i.e., recourse) variables are easily adjusted as scenarios unfold. Large changes in the objective values $\xi_s$ may be observed across different scenarios, but their expected value will be minimal. The RO model also minimizes higher moments, e.g., the variance of the distribution of $\xi(\cdot)$. Hence, it assumes a more passive management style. Since the value of $\xi_s$ will not differ substantially among scenarios, little or no adjustment of the control variables will be needed. In this respect RO can be viewed as an SLP in which the recourse decisions are implicitly restricted.

This distinction between RO and SLP is important, and it defines their domains of applicability. Consider personnel planning. An SLP solution will design a workforce that can be adjusted (by hirings or layoffs) to meet demand at the least expected cost. The important consideration of maintaining stability of employment cannot be captured. The RO model, on the other hand, will design a workforce that needs few adjustments to cope with demand in all scenarios. However, the expected cost of the RO solution will be higher than that of the SLP solution.

Another important distinction of RO from SLP is the handling of the constraints. Stochastic linear programming models aim at finding design variables $x$ such that, for each realized scenario $s$, some control variable setting $y^s$ satisfies the control constraints. For systems with some redundancy such a solution might always be possible. The SLP literature even allows for the notion of relatively complete recourse, whereby a feasible solution $y^s$ exists for all scenarios and for any value of $x$ that satisfies the design constraints.

What happens when no feasible pair $(x, y^s)$ exists for every scenario? The SLP model is declared infeasible. RO explicitly accounts for this possibility. In engineering applications (e.g., image restoration) such situations inevitably arise due to measurement errors. Multiple measurements of the same quantity may be inconsistent with each other. Hence, even if the underlying physical system has a solution (in this case an image does exist!), it will not satisfy all the measurements. The RO model uses the error terms $\{z^s\}$ and the infeasibility penalty function $\rho(\cdot)$ to find a solution that violates the constraints by the least amount, as measured by $\rho(\cdot)$. This approach is receiving increasing attention in the operations research literature.

While RO has some distinct advantages over SA and SLP, it is not without limitations. First, RO models are parametric programs, and there is no mechanism for specifying a priori a ‘correct’ choice of the parameter $\lambda$. This problem is, of course, prevalent in multicriteria optimization. Second, the scenarios in $\Omega$ are just one possible set of realizations of the problem data, and RO does not provide a means by which the scenarios can be specified. This problem is prevalent in SLP models as well. Substantial progress has been made in recent years in integrating variance reduction methods, such as importance sampling, into stochastic linear programming, and these techniques also apply to RO.

Despite these potential shortcomings, we emphasize 
that working only with expected values (as in the lin- 
ear programming formulations) is fundamentally lim- 
ited for problems with noisy data. Even going a step fur- 
ther - that is, working with expected values and hedging 
against small changes in these values — is also inappro- 
priate. In this respect, RO provides a significantly im- 
proved modeling framework. 

Robust optimization integrates the methods of 
multi-objective programming with stochastic program- 
ming. It also extends SLP with the introduction of 
higher moments of the objective value, and with the 
notion of model robustness. Both RO and SLP pro- 
vide constructive mechanisms for dealing with uncer- 
tain data. 


Notes 


The material in this article is drawn from [1, Chap. 13], 
where additional material and extensive discussion of 
the literature can be found. 

The robust optimization model was suggested in [7]. That work also discussed example applications of robust optimization to the classic diet problem and to problems in finance, engineering design and capacity planning. J.K. Sengupta [10] had earlier discussed the robustness of stochastic linear programming solutions. Another approach for enforcing robustness in stochastic programs, by means of restricted recourse models, was introduced in [11].

The terminology of ‘structural’ and ‘control’ vari- 
ables is borrowed from the flexibility analysis of man- 
ufacturing systems; see [9]. 

Applications of robust optimization are discussed in 
[5] (for environmental planning), [2] (for image recon- 
struction from projections), [8] (for capacity planning), 
[12] (for matrix balancing), [6] (for capacity expansion 
for power generation firms), [4] (for capacity expansion 
planning in manufacturing). A robust optimization ap- 
proach to capacity planning for a multiproduct, multi- 
facility production firm was suggested in [3], and ap- 
plied to plan car manufacturing facilities for the Gen- 
eral Motors Company. 


See also 


> Competitive Ratio for Portfolio Management 

> Financial Applications of Multicriteria Analysis 

> Financial Optimization 

> Portfolio Selection and Multicriteria Analysis 

> Semi-infinite Programming and Applications in 
Finance 
Robust Optimization: Mixed-Integer Linear Programs


Introduction


The research area of production scheduling has re- 
ceived considerable attention from both academia and 
the chemical processing industries over the past decade. 
Most of the work in the literature assumes that all data 
are deterministic - that is, they are of constant, known 
values. However, in reality, uncertainty is prevalent in 
many scheduling problems due to lack of accurate process models and variability of process and environmental data. Thus, an emerging area of research aims at developing methods to address the problem of scheduling
under uncertainty, in order to create reliable schedules 
which remain feasible in the presence of parameter un- 
certainty. 

The issue of robustness in scheduling under uncer- 
tainty has received relatively little attention, in spite of 
its importance and the fact that there has been a sub- 
stantial amount of work to address the problem of de- 
sign and operation of batch plants under uncertainty. 
Most of the existing work has followed the scenario- 
based framework, in which the uncertainty is modeled 
through the use of a number of scenarios, using either 
discrete probability distributions or the discretization 
of continuous probability distribution functions, and 
the expectation of a certain performance criterion, such 
as the expected profit which is optimized with respect to 
the scheduling decision variables. The scenario-based 
approaches provide a straightforward way to implicitly 
incorporate uncertainty. However, they inevitably en- 
large the size of the problem significantly as the number 
of scenarios increases exponentially with the number of 
uncertain parameters. This main drawback limits the 
application of these approaches to solve practical prob- 
lems with a large number of uncertain parameters. An 
alternative approach for scheduling under uncertainty 
is reactive scheduling. It is carried out to adjust a sched- 
ule, which is usually obtained a priori in a deterministic 
manner, upon realization of the uncertain parameters 
or occurrence of unexpected events. Due to the “on- 
line” nature of reactive scheduling, it is required to gen- 
erate updated schedules in a timely manner and often, 
heuristic approaches are developed for schedule modi- 
fications. 

In this chapter, we propose a novel robust opti- 
mization approach to address the problem of schedul- 
ing under uncertainty. The underlying framework is 
based on a robust optimization methodology first intro- 
duced for Linear Programming (LP) problems by Ben- 
Tal and Nemirovski [1] and extended in this work for 
Mixed-Integer Linear Programming (MILP) problems. 
The approach produces “robust” solutions which are, in 
a sense, immune against uncertainties in both the coef- 
ficients and right-hand-side parameters of the inequal- 
ity constraints. The approach can be applied to address 


the problem of production scheduling with uncertain 
processing times, market demands, and/or prices of 
products and raw materials. We consider uncertainty 
in scheduling problems which can be characterized as 
bounded or bounded and symmetric as well as uncer- 
tainty described by a known probability distribution 
function, such as a uniform or normal distribution. Ad- 
ditional work on the robust optimization of short-term 
scheduling problems can be found in Lin et al. [2] and 
Janak et al. [3]. 


A Motivating Example 


Consider the following example process that was first 
presented by Kondili et al. [4] and has been widely 
studied in the literature. Two different products can be 
produced from three feeds according to the State-Task 
Network as shown in Fig. 1. Five processing tasks are 
considered including Heating, Reaction 1, Reaction 2, 
Reaction 3, and Separation. Four units are available and 
four intermediate materials are processed in-between 
tasks. The initial stock level for all intermediates and 
products is assumed to be zero. Each task has a unit- 
specific, variable processing time. The relevant data are 
shown in Table 1. The objective is to maximize the 
profit from the sale of products manufactured in a time 
horizon of 12 hours. 

The continuous-time formulation proposed by 
Floudas and coworkers [5,6,7,8] is used to solve this 
simple scheduling problem. The “nominal” solution is 
shown in Fig. 2, which features intensive utilization of 
the two reactors and an objective function value (profit) 
of 3638.75. However, this solution can become com- 
pletely infeasible when there is uncertainty in the pro- 
cessing times of the tasks. That is, when a task requires 
a longer processing time than its nominal value, it may 
not be able to finish processing within the time interval 
assigned in the nominal schedule. In this example, even 
a very small perturbation may make the schedule infea- 
sible, especially for the two reactors, and can have a sub- 
stantial effect on the scheduling decisions. For instance, 
if the processing time of each task is increased by sim- 
ply 10% of its nominal value, then the nominal schedule 
will become infeasible and the optimal schedule with 
the slightly increased processing times will be signifi- 
cantly different from the nominal schedule, as shown 
in Fig. 3. In the Heater and the Separator, the number of tasks as well as the processing amounts change, while in Reactor 1 and Reactor 2 even the task sequences are different. Furthermore, the profit is reduced to 3264.69.

Robust Optimization: Mixed-Integer Linear Programs, Figure 1
State-task network for the motivating example

Table 1
Data for the motivating example

Units        Capacity    Suitability    Processing Time
Heater       100         Heating        (illegible)
Reactor 2    80          Reactions      (illegible)

States       Storage Capacity    Initial Amount    Price
Feed A       Unlimited           Unlimited         (illegible)
Feed B       Unlimited           Unlimited         (illegible)
Feed C       Unlimited           Unlimited         (illegible)
HotA         100                 0.0               (illegible)
IntAB        200                 0.0               (illegible)
IntBC        150                 0.0               (illegible)
ImpureE      200                 0.0               (illegible)
Product 1    Unlimited           0.0               (illegible)
Product 2    Unlimited           0.0               (illegible)

(The rows for the remaining units, the processing-time entries, and the Price column are not recoverable from the source.)

It is clear that solving a scheduling problem at the nominal values of the uncertain data is not enough. Reliable and efficient schedules require systematic and effective approaches that take uncertainty into account. In the rest of this chapter, we propose a new robust optimization framework to generate schedules that are reliable in the presence of uncertainty arising from various sources.


Formulation 


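Consider the following generic Mixed-Integer Linear Programming (MILP) problem

$$\begin{aligned}
\min/\max_{x,y}\;& c^T x + d^T y\\
\text{s.t. }\;& Ex + Fy = e\\
& Ax + By \le p\\
& \underline{x} \le x \le \overline{x},\quad y \in \{0,1\}
\end{aligned}\qquad (1)$$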


Assume that the uncertainty arises from both the coefficients and the right-hand-side parameters of the inequality constraints, namely, $a_{lm}$, $b_{lk}$, and $p_l$. We are then concerned about the feasibility of the following inequality:

$$\sum_m \tilde a_{lm} x_m + \sum_k \tilde b_{lk} y_k \le \tilde p_l \qquad (2)$$

where $a_{lm}$, $b_{lk}$, and $p_l$ are the nominal values of the uncertain parameters and $\tilde a_{lm}$, $\tilde b_{lk}$, and $\tilde p_l$ are their “true” values.

Robust Optimization: Mixed-Integer Linear Programs, Figure 2
Optimal solution with nominal processing times (profit = 3638.75)
(Gantt chart of the Heater, Reactor 1, Reactor 2, and Separator over the 12-hour horizon; image not reproduced)

Robust Optimization: Mixed-Integer Linear Programs, Figure 3
Optimal solution with processing times increased by 10% (profit = 3264.69)
(Gantt chart of the Heater, Reactor 1, Reactor 2, and Separator over the 12-hour horizon; image not reproduced)

As the motivating scheduling example in the previous section shows, the optimal solution of an MILP problem may become infeasible if the nominal data are slightly perturbed; that is, one or more constraints may be violated substantially. Our objective here is to develop a robust optimization methodology that generates “reliable” solutions to the MILP problem, which are immune against uncertainty. This methodology was first introduced by Ben-Tal and Nemirovski [1] for Linear Programming (LP) problems with uncertain linear coefficients, and it is extended in this work to MILP problems under uncertainty. We consider several different types of uncertainty:

(i) bounded uncertainty;
(ii) bounded and symmetric uncertainty;
(iii) uncertainty described by a known distribution, such as a uniform or normal distribution.


Bounded Uncertainty 


Suppose that the uncertain data range in the following intervals:

$$|\tilde a_{lm} - a_{lm}| \le \varepsilon|a_{lm}|,\qquad |\tilde b_{lk} - b_{lk}| \le \varepsilon|b_{lk}|,\qquad |\tilde p_l - p_l| \le \varepsilon|p_l| \qquad (3)$$

where $\tilde a_{lm}$, $\tilde b_{lk}$, and $\tilde p_l$ are the “true” values, $a_{lm}$, $b_{lk}$, and $p_l$ are the nominal values, and $\varepsilon > 0$ is a given (relative) uncertainty level.

We call a solution $(x,y)$ robust if it satisfies the following conditions: (i) $(x,y)$ is feasible for the nominal problem, and (ii) whatever the true values of the coefficients and right-hand-side parameters within the corresponding intervals, $(x,y)$ satisfies the $l$th inequality constraint with an error of at most $\delta\max[1, |p_l|]$, where $\delta$ is a given infeasibility tolerance.


Theorem 1 Given an infeasibility tolerance ($\delta$), the following so-called $(\varepsilon,\delta)$-Interval Robust Counterpart (IRC[$\varepsilon,\delta$]) of the original uncertain MILP problem can be derived to generate robust solutions:

$$\begin{aligned}
\min/\max_{x,y,u}\;& c^T x + d^T y\\
\text{s.t. }\;& Ex + Fy = e\\
& Ax + By \le p\\
& \sum_m a_{lm}x_m + \varepsilon\sum_{m\in M_l}|a_{lm}|\,u_m + \sum_k b_{lk}y_k + \varepsilon\sum_{k\in K_l}|b_{lk}|\,y_k \le p_l - \varepsilon|p_l| + \delta\max[1,|p_l|], \quad \forall l\\
& -u_m \le x_m \le u_m, \quad \forall m\\
& \underline{x} \le x \le \overline{x}\\
& y_k \in \{0,1\}, \quad \forall k
\end{aligned}\qquad (4)$$

where $M_l$ and $K_l$ are the sets of indices of the $x$ and $y$ variables, respectively, with uncertain coefficients in the $l$th inequality constraint.


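Proof 1 We want to find a robust solution $(x,y)$ that satisfies conditions (i) and (ii). Condition (ii) must hold for all $l$ and for all $\tilde a_{lm}$, $\tilde b_{lk}$, $\tilde p_l$ with $|\tilde a_{lm} - a_{lm}| \le \varepsilon|a_{lm}|$, $|\tilde b_{lk} - b_{lk}| \le \varepsilon|b_{lk}|$, and $|\tilde p_l - p_l| \le \varepsilon|p_l|$:

$$\sum_{m\notin M_l} a_{lm}x_m + \sum_{m\in M_l}\tilde a_{lm}x_m + \sum_{k\notin K_l} b_{lk}y_k + \sum_{k\in K_l}\tilde b_{lk}y_k \le \tilde p_l + \delta\max[1,|p_l|]. \qquad (5)$$

The worst-case values of the uncertain parameters are those that make the inequality constraint the most difficult to satisfy:

$$\tilde a_{lm}x_m \le a_{lm}x_m + \varepsilon|a_{lm}||x_m|,\qquad \tilde b_{lk}y_k \le b_{lk}y_k + \varepsilon|b_{lk}|\,y_k,\qquad \tilde p_l \ge p_l - \varepsilon|p_l|. \qquad (6)$$

Substituting these into Eq. (5) and rearranging terms shows that a solution $(x,y)$ is robust if and only if it is a feasible solution of the following optimization problem:

$$\begin{aligned}
\min/\max_{x,y}\;& c^T x + d^T y\\
\text{s.t. }\;& Ex + Fy = e\\
& Ax + By \le p\\
& \sum_m a_{lm}x_m + \varepsilon\sum_{m\in M_l}|a_{lm}||x_m| + \sum_k b_{lk}y_k + \varepsilon\sum_{k\in K_l}|b_{lk}|\,y_k \le p_l - \varepsilon|p_l| + \delta\max[1,|p_l|], \quad \forall l\\
& \underline{x} \le x \le \overline{x}\\
& y_k \in \{0,1\}, \quad \forall k
\end{aligned}\qquad (7)$$

The above problem is equivalent to problem (4). There, the absolute value $|x_m|$ is represented by a set of auxiliary variables ($u_m$) and a set of additional constraints ($-u_m \le x_m \le u_m$). □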


For each inequality constraint that involves uncer- 
tain coefficients and/or right-hand-side parameters, 
an additional constraint is introduced to incorporate 
the uncertainty and maintain the relationships among 
the relevant binary and continuous variables under 
the uncertainty level and the given infeasibility toler- 
ance. Essentially, this constraint considers the worst- 
case values of the uncertain parameters which make 
the inequality the most difficult to maintain. At the 
same time, a certain degree of relaxation is introduced 
to allow tolerable violations of the constraint. Note 
that mathematical model (4) remains an MILP model. 
Compared to the original deterministic MILP problem, 
the robust counterpart has a set of auxiliary variables ($u_m$) and a set of additional constraints relating the variables $x_m$ and $u_m$.
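The proof of Theorem 1 rests on the claim that the worst case over the box (3) is given by the inflation terms in (6). The minimal Python sketch below checks this claim numerically. It evaluates the robust row of (7) for a fixed candidate and compares it with brute-force enumeration of the box vertices. Enumeration suffices because the row is linear in each uncertain parameter, so the maximum is attained at a vertex. The row data are hypothetical, and the function names are illustrative.

```python
from itertools import product


def irc_row(a, x, b, y, p, eps, delta, M, K):
    """LHS and RHS of the robust row in (7); M, K index the uncertain a_lm, b_lk."""
    nominal = sum(ai * xi for ai, xi in zip(a, x)) + sum(bk * yk for bk, yk in zip(b, y))
    # Worst-case inflation from Eq. (6); y is binary, so |y_k| = y_k.
    inflation = (eps * sum(abs(a[m]) * abs(x[m]) for m in M)
                 + eps * sum(abs(b[k]) * y[k] for k in K))
    lhs = nominal + inflation
    rhs = p - eps * abs(p) + delta * max(1.0, abs(p))
    return lhs, rhs


def worst_violation(a, x, b, y, p, eps, M, K):
    """Brute force: max over box vertices of sum(a~ x) + sum(b~ y) - p~ (no delta)."""
    best = float("-inf")
    for signs in product((-1.0, 1.0), repeat=len(M) + len(K) + 1):
        a_t, b_t = list(a), list(b)
        for m, s in zip(M, signs):
            a_t[m] = a[m] * (1 + s * eps)   # vertex with |a~ - a| = eps|a|
        for k, s in zip(K, signs[len(M):]):
            b_t[k] = b[k] * (1 + s * eps)
        p_t = p * (1 + signs[-1] * eps)
        value = (sum(ai * xi for ai, xi in zip(a_t, x))
                 + sum(bk * yk for bk, yk in zip(b_t, y)) - p_t)
        best = max(best, value)
    return best


a, b, p = [2.0, -1.0], [3.0], 6.0   # hypothetical nominal row data
x, y = [1.5, 2.0], [1]
lhs, rhs = irc_row(a, x, b, y, p, eps=0.1, delta=0.0, M=[0, 1], K=[0])
print(lhs, rhs, lhs <= rhs)
print(worst_violation(a, x, b, y, p, 0.1, M=[0, 1], K=[0]))
```

With $\delta = 0$, the brute-force worst violation equals the left-hand side minus the right-hand side of the robust row. This matches the "if and only if" step in the proof.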


Uncertainty with Known Probability Distribution 


Assume that for the inequality constraint $l$, the true values of the uncertain parameters are obtained from their nominal values by the random perturbations

$$\tilde a_{lm} = (1 + \varepsilon\xi_{lm})\,a_{lm},\qquad \tilde b_{lk} = (1 + \varepsilon\xi_{lk})\,b_{lk},\qquad \tilde p_l = (1 + \varepsilon\xi_l)\,p_l \qquad (8)$$

where $\xi_{lm}$, $\xi_{lk}$, and $\xi_l$ are independent random variables and $\varepsilon > 0$ is a given (relative) uncertainty level.

In this situation, we call a solution $(x,y)$ robust if it satisfies the following: (i) $(x,y)$ is feasible for the nominal problem, and (ii) for every inequality $l$, the probability of violating the uncertain inequality is at most $\kappa$, or

$$\Pr\Big\{\sum_m \tilde a_{lm}x_m + \sum_k \tilde b_{lk}y_k > \tilde p_l + \delta\max[1,|p_l|]\Big\} \le \kappa \qquad (9)$$

where $\delta > 0$ is a given feasibility tolerance and $\kappa > 0$ is a given reliability level. Thus, $\kappa$ represents the probability of violation of constraint $l$. The limiting value $\kappa = 0$ would indicate that there is no chance of constraint violation, yielding the most conservative solution.

If the probability distributions of the random variables $\xi_{lm}$, $\xi_{lk}$, and $\xi_l$ in the uncertain parameters are known, the probability measures involved can be estimated more accurately. The MILP from (1) can be rewritten as an uncertain MILP as follows:

$$\begin{aligned}
\min_{x,y}\;& c^T x + d^T y\\
\text{s.t. }\;& Ex + Fy = e\\
& \max_{\zeta\in Z}\big[\tilde A x + \tilde B y - \tilde p\big] \le \delta\max[1,|p|]\\
& \underline{x} \le x \le \overline{x},\quad y \in \{0,1\}\\
& \zeta = [\tilde A, \tilde B, \tilde p] \in Z
\end{aligned}\qquad (10)$$

Here the data set $\zeta = [\tilde A, \tilde B, \tilde p]$ varies in a given uncertainty set $Z$, and $\tilde A$, $\tilde B$, and $\tilde p$ represent the “true” values of the uncertain coefficients. The parameter $\delta > 0$ is an infeasibility tolerance, introduced to allow a certain amount of infeasibility in the inequality constraint. The inequality can be written in expanded form as

$$\max\Big[\sum_m \tilde a_{lm}x_m + \sum_k \tilde b_{lk}y_k - \tilde p_l\Big] \le \delta\max[1,|p_l|] \qquad (11)$$


for every constraint $l$, where $\tilde a_{lm}$, $\tilde b_{lk}$, and $\tilde p_l$ are again the true values of the uncertain coefficients. Using the expressions for the true values given in (8), the uncertain inequality in (11) can be rewritten as follows:

$$\max\Big[\sum_m (1+\varepsilon\xi_{lm})a_{lm}x_m + \sum_k (1+\varepsilon\xi_{lk})b_{lk}y_k - (1+\varepsilon\xi_l)p_l\Big] \le \delta\max[1,|p_l|]. \qquad (12)$$

Rearranging terms, we get

$$\sum_m a_{lm}x_m + \sum_k b_{lk}y_k - p_l + \varepsilon\max\Big(\sum_{m\in M_l}\xi_{lm}a_{lm}x_m + \sum_{k\in K_l}\xi_{lk}b_{lk}y_k - \xi_l p_l\Big) \le \delta\max[1,|p_l|] \qquad (13)$$


where $M_l$ and $K_l$ define the sets of uncertain parameters $a_{lm}$ and $b_{lk}$, respectively, for constraint $l$. A solution $(x,y)$ to the original uncertain MILP (10) that satisfies this constraint is called “reliable”. It takes into account the maximum amount of uncertainty $\zeta \in Z$ and allows an amount of infeasibility $\delta$. To transform the constraint into a deterministic form, we instead consider the following formulation:

$$\Pr\Big\{\sum_m a_{lm}x_m + \sum_k b_{lk}y_k - p_l + \varepsilon\Big(\sum_{m\in M_l}\xi_{lm}a_{lm}x_m + \sum_{k\in K_l}\xi_{lk}b_{lk}y_k - \xi_l p_l\Big) > \delta\max[1,|p_l|]\Big\} \le \kappa. \qquad (14)$$


This constraint enforces that the probability of violating the uncertain inequality is at most $\kappa$. Here $\delta > 0$ is a given feasibility tolerance, i.e., the amount of error allowed in the feasibility of constraint $l$. The parameter $\kappa > 0$ is a given reliability level, i.e., the probability of violating constraint $l$; $\kappa = 0$ would indicate that there is no chance of constraint violation. Suppose we know a probability distribution function for the sum of the random variables

$$\xi = \sum_{m\in M_l}\xi_{lm}a_{lm}x_m + \sum_{k\in K_l}\xi_{lk}b_{lk}y_k - \xi_l p_l. \qquad (15)$$

We can then use this information in the probabilistic constraint (14) to write a deterministic form of the uncertain constraint that is “almost reliable”, depending on the value of $\kappa$. This is done using the definition of a probability distribution function and the following relationship

$$F_\xi(\lambda) = \Pr\{\xi \le \lambda\} = 1 - \Pr\{\xi > \lambda\} = 1 - \kappa \qquad (16)$$


to replace the stochastic elements in constraint (13). The result is a deterministic constraint that is “almost reliable” for the given uncertainty level $\varepsilon$, infeasibility tolerance $\delta$, and reliability level $\kappa$. The final form of the deterministic constraint (or robust counterpart problem) is determined using the inverse distribution function (quantile) of the random variable $\xi$:

$$F_\xi^{-1}(1-\kappa) = f\big(\lambda, |a_{lm}|x_m, |b_{lk}|y_k, |p_l|\big). \qquad (17)$$

Thus, the additional constraints in the Robust Counterpart (RC) problem can be written as

$$\sum_m a_{lm}x_m + \sum_k b_{lk}y_k + \varepsilon\, f\big(\lambda, |a_{lm}|x_m, |b_{lk}|y_k, |p_l|\big) \le p_l + \delta\max[1,|p_l|], \quad \forall l \qquad (18)$$

where $\varepsilon$ is a given uncertainty level, $\delta$ is a given infeasibility tolerance, and $\lambda$ is determined from $\kappa$ using (16) and the probability distribution function for $\xi$.
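The probabilistic constraint (14) can also be checked empirically for a fixed candidate by sampling the perturbations in (8). The following minimal Monte Carlo sketch in Python rests on two illustrative assumptions not made in the text:

- every coefficient of the row is uncertain;
- each $\xi$ is drawn from Uniform[−1, 1].

The article allows any known distribution, and the function name and data are hypothetical.

```python
import random


def violation_probability(a, x, b, y, p, eps, delta, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the probability in (14) for one row.

    Assumes every a_lm, b_lk and p_l is uncertain and each xi ~ Uniform[-1, 1].
    """
    rng = random.Random(seed)
    gap = sum(ai * xm for ai, xm in zip(a, x)) + sum(bk * yk for bk, yk in zip(b, y)) - p
    threshold = delta * max(1.0, abs(p))
    hits = 0
    for _ in range(n_samples):
        # One draw of the random variable xi defined in (15).
        xi = sum(rng.uniform(-1.0, 1.0) * ai * xm for ai, xm in zip(a, x))
        xi += sum(rng.uniform(-1.0, 1.0) * bk * yk for bk, yk in zip(b, y))
        xi -= rng.uniform(-1.0, 1.0) * p
        if gap + eps * xi > threshold:
            hits += 1
    return hits / n_samples


# Hypothetical row with a nominal slack of 0.2 and eps = 10% perturbations.
print(violation_probability([1.0, 2.0], [2.0, 1.0], [1.0], [1], 5.2, eps=0.1, delta=0.0))
```

An estimate like this is useful for validating a robust counterpart after the fact. The counterpart itself, as the text emphasizes, is built from the quantile of $\xi$ rather than from sampling.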


Uncertainty with Normal Probability Distribution


Suppose that the random variables $\xi_{lm}$, $\xi_{lk}$, and $\xi_l$ in (8) all follow the standardized normal distribution, with zero mean and unit standard deviation. Then the random variable $\xi$ defined in (15) is also normally distributed, with zero mean and standard deviation

$$\sqrt{\sum_{m\in M_l} a_{lm}^2 x_m^2 + \sum_{k\in K_l} b_{lk}^2 y_k + p_l^2}.$$


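Theorem 2 Given an uncertainty level ($\varepsilon$), an infeasibility tolerance ($\delta$), and a reliability level ($\kappa$), the following $(\varepsilon,\delta,\kappa)$-Robust Counterpart (RC[$\varepsilon,\delta,\kappa$]) of the original uncertain MILP problem can be derived to generate robust solutions:

$$\begin{aligned}
\min/\max_{x,y}\;& c^T x + d^T y\\
\text{s.t. }\;& Ex + Fy = e\\
& Ax + By \le p\\
& \sum_m a_{lm}x_m + \sum_k b_{lk}y_k + \varepsilon\lambda\sqrt{\sum_{m\in M_l} a_{lm}^2 x_m^2 + \sum_{k\in K_l} b_{lk}^2 y_k + p_l^2} \le p_l + \delta\max[1,|p_l|], \quad \forall l\\
& \underline{x} \le x \le \overline{x}\\
& y_k \in \{0,1\}, \quad \forall k
\end{aligned}\qquad (19)$$

where $\lambda = F_n^{-1}(1-\kappa)$ and $F_n^{-1}$ is the inverse distribution function of a random variable with standardized normal distribution. Thus, $\lambda$ and $\kappa$ are related as follows:

$$\kappa = 1 - F_n(\lambda) = 1 - \Pr\{\xi \le \lambda\}$$

where $\xi$ is a random variable with standardized normal distribution.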


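Proof 2 Let $(x,y)$ satisfy the following

$$\sum_m a_{lm}x_m + \sum_k b_{lk}y_k + \varepsilon\lambda\sqrt{\sum_{m\in M_l} a_{lm}^2 x_m^2 + \sum_{k\in K_l} b_{lk}^2 y_k + p_l^2} \le p_l + \delta\max[1,|p_l|] \qquad (20)$$

where $\lambda = F_n^{-1}(1-\kappa)$ and $F_n^{-1}$ is the inverse distribution function of a random variable with standardized normal distribution. Each of $\xi_{lm}$, $\xi_{lk}$, and $\xi_l$ is symmetric about zero and independent of the others, so, for example, $\xi_{lm}a_{lm}$ and $\xi_{lm}|a_{lm}|$ have the same distribution. Then

$$\begin{aligned}
&\Pr\Big\{\sum_m \tilde a_{lm}x_m + \sum_k \tilde b_{lk}y_k > \tilde p_l + \delta\max[1,|p_l|]\Big\}\\
&= \Pr\Big\{\sum_m a_{lm}x_m + \varepsilon\sum_{m\in M_l}\xi_{lm}|a_{lm}|x_m + \sum_k b_{lk}y_k + \varepsilon\sum_{k\in K_l}\xi_{lk}|b_{lk}|y_k > p_l + \varepsilon\xi_l|p_l| + \delta\max[1,|p_l|]\Big\}\\
&\le \Pr\Bigg\{\frac{\sum_{m\in M_l}\xi_{lm}|a_{lm}|x_m + \sum_{k\in K_l}\xi_{lk}|b_{lk}|y_k - \xi_l|p_l|}{\sqrt{\sum_{m\in M_l} a_{lm}^2 x_m^2 + \sum_{k\in K_l} b_{lk}^2 y_k + p_l^2}} > \lambda\Bigg\}\\
&= 1 - \Pr\Bigg\{\frac{\sum_{m\in M_l}\xi_{lm}|a_{lm}|x_m + \sum_{k\in K_l}\xi_{lk}|b_{lk}|y_k - \xi_l|p_l|}{\sqrt{\sum_{m\in M_l} a_{lm}^2 x_m^2 + \sum_{k\in K_l} b_{lk}^2 y_k + p_l^2}} \le \lambda\Bigg\}\\
&= 1 - F_n(\lambda) = 1 - (1-\kappa) = \kappa,
\end{aligned}$$

where the inequality follows from (20). □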


Note that

$$\frac{\sum_{m\in M_l}\xi_{lm}|a_{lm}|x_m + \sum_{k\in K_l}\xi_{lk}|b_{lk}|y_k - \xi_l|p_l|}{\sqrt{\sum_{m\in M_l} a_{lm}^2 x_m^2 + \sum_{k\in K_l} b_{lk}^2 y_k + p_l^2}}$$

is also a random variable with standardized normal distribution. This formulation results in a convex MINLP problem (for $\kappa \le 0.5$, so that $\lambda \ge 0$). It can still be solved efficiently using a mixed-integer nonlinear solver (e.g., DICOPT [9], MINOPT [10]).
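The robust row of (19) can be evaluated for a fixed candidate. The following minimal Python sketch computes $\lambda = F_n^{-1}(1-\kappa)$ with the standard library's `statistics.NormalDist`. It uses the row data of the bounded-uncertainty sketch, which remain hypothetical. As in Theorem 2, $p_l$ is assumed to be uncertain, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def normal_rc_row(a, x, b, y, p, eps, delta, kappa, M, K):
    """LHS, RHS and lambda for the robust row in (19); requires 0 < kappa < 1."""
    lam = NormalDist().inv_cdf(1.0 - kappa)   # lambda = F_n^{-1}(1 - kappa)
    nominal = sum(ai * xi for ai, xi in zip(a, x)) + sum(bk * yk for bk, yk in zip(b, y))
    # Standard deviation of xi in (15); y is binary, so y_k**2 == y_k.
    std = sqrt(sum(a[m] ** 2 * x[m] ** 2 for m in M)
               + sum(b[k] ** 2 * y[k] for k in K) + p ** 2)
    lhs = nominal + eps * lam * std
    rhs = p + delta * max(1.0, abs(p))
    return lhs, rhs, lam


a, b, p = [2.0, -1.0], [3.0], 6.0   # hypothetical nominal row data
x, y = [1.5, 2.0], [1]
for kappa in (0.5, 0.05, 0.01):
    lhs, rhs, lam = normal_rc_row(a, x, b, y, p, eps=0.1, delta=0.0, kappa=kappa, M=[0, 1], K=[0])
    print(f"kappa={kappa}: lambda={lam:.3f}, lhs={lhs:.3f}, rhs={rhs:.3f}, feasible={lhs <= rhs}")
```

At $\kappa = 0.5$ the safety term vanishes ($\lambda = 0$), and the row reduces to the nominal constraint. Smaller $\kappa$ raises $\lambda$ and therefore tightens the row.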

In the discussion above, for simplicity, we have assumed a single common uncertainty level ($\varepsilon$), infeasibility tolerance ($\delta$), and reliability level ($\kappa$) in each MILP or convex MINLP problem with uncertain parameters. However, the proposed robust optimization techniques can easily be extended to account
for the more general case in which the uncertainty level 
varies from one parameter to another and the infeasibil- 
ity tolerance and reliability level are dependent on the 
constraint of interest. Furthermore, note that for each 
type of uncertainty addressed, one additional constraint 
is introduced for each inequality constraint with un- 
certain parameter(s) and auxiliary variables are added 
if needed. Because the transformation is carried out at 
the level of constraints, in principle, the various ro- 
bust optimization techniques presented can be applied 
to a single MILP or convex MINLP problem involving 
different types of uncertainties. More specifically, for 
each inequality constraint, as long as all of its uncer- 
tain parameters are of the same type, an additional con- 
straint that corresponds to the uncertainty type can be 
introduced to obtain the deterministic robust counter- 
part problem. It should be pointed out that the afore- 
mentioned robust optimization methodology circum- 
vents any need for explicit or implicit discretization or 
sampling of the uncertain data, avoiding an undesir- 
able increase in the size of the problem. Thus, the pro- 
posed methodology is potentially capable of handling 
problems with a large number of uncertain parame- 
ters. 


Applications 


The robust optimization methodology proposed in the 
previous section can be applied to address the prob- 
lem of scheduling under uncertainty. In this work, we 
employ the continuous-time formulation presented by 
Floudas and coworkers [5,6,7,8], which leads to MILP 
models, to develop new robust scheduling approaches 
for the following three classes of uncertainties: (i) un- 
certainty in processing times/rates of tasks, (ii) uncer- 
tainty in market demands for products, and (iii) uncer- 
tainty in market prices of products and raw materials. 


Uncertainty in Processing Times 


The processing-time/rate parameters of tasks participate in the duration constraint. They appear as linear coefficients of the binary variable (i.e., $\alpha_{ij}$) and the continuous variable (i.e., $\beta_{ij}$) as follows:


TI (i, j, n)—T (i, j,n) = aij-wv(i, n) + BiB, j,n), 
(21) 


where wv(i,n) is a binary variable indicating whether or 
not task (i) starts at event point (n), B(i,j,n) is a con- 
tinuous variable determining the batch-size of the task, 
and $T^s(i,j,n)$ and $T^f(i,j,n)$ are continuous variables representing the starting and finishing times of the task, respectively. Note that this is an equality constraint. Thus,
in order to apply the robust optimization techniques 
proposed in the previous section for inequality con- 
straints with uncertain parameters, the duration con- 
straint is relaxed to an inequality constraint 


$$T^f(i,j,n) - T^s(i,j,n) \ge \alpha_{ij}\, \mathrm{wv}(i,n) + \beta_{ij}\, B(i,j,n). \qquad (22)$$


Consequently, the variable $T^f(i,j,n)$ no longer equals the exact finishing time determined by the original duration constraint; the relaxed constraint only bounds it from below. Using this modified duration constraint, the various robust optimization techniques can be readily applied to consider uncertainty in the parameters $\alpha_{ij}$ and $\beta_{ij}$.

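For example, consider a task with parameters $\alpha_{ij}$ and $\beta_{ij}$ exhibiting bounded uncertainty in the following ranges:

$$\alpha_{ij}^L \le \tilde{\alpha}_{ij} \le \alpha_{ij}^U, \qquad \beta_{ij}^L \le \tilde{\beta}_{ij} \le \beta_{ij}^U. \qquad (23)$$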


According to Theorem 1, to obtain the deterministic 
robust counterpart problem, the following constraint is 
added to the original scheduling model 


$$T^f(i,j,n) - T^s(i,j,n) \ge \alpha_{ij}^U\, \mathrm{wv}(i,n) + \beta_{ij}^U\, B(i,j,n) - \delta. \qquad (24)$$


Note that no auxiliary variables need to be introduced 
because the variable B(i,j,n) (batch-size of the task) is 
non-negative by definition. 

Alternatively, consider a batch task with fixed processing time represented by parameter $\alpha_{ij}$. Then, the true value of the processing time can be represented in terms of the nominal processing time as follows:

$$\tilde{\alpha}_{ij} = (1 + \varepsilon\, \xi_{ij})\, \alpha_{ij}, \qquad (25)$$

where $\xi_{ij}$ is a random variable with known distribution and $\varepsilon$ is the uncertainty level.

For the case where the uncertainty is characterized by a standardized normal distribution, the deterministic robust counterpart problem is obtained, according to Theorem 2, by adding the following constraint to the original scheduling model:

$$T^s(i,j,n) - T^f(i,j,n) + (1 + \varepsilon\lambda)\, \alpha_{ij}\, \mathrm{wv}(i,n) \le \delta, \qquad (26)$$

where $\lambda = F_n^{-1}(1 - \kappa)$ and $F_n^{-1}$ is the inverse distribution function of a random variable with a standardized normal distribution.
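Both robust duration constraints can be evaluated directly for a given task and event point. The minimal Python sketch below assumes scalar data for a single $(i, j, n)$. The helper names are hypothetical. The value $\lambda = F_n^{-1}(1-\kappa)$ is taken from the standard library's `statistics.NormalDist`. Each function returns the smallest value of $T^f(i,j,n) - T^s(i,j,n)$ permitted by (24) or by (26). For (26), the constraint is first rearranged as $T^f - T^s \ge (1+\varepsilon\lambda)\,\alpha_{ij}\,\mathrm{wv}(i,n) - \delta$.

```python
from statistics import NormalDist


def min_duration_bounded(alpha_U, beta_U, wv, B, delta):
    """Smallest T^f - T^s allowed by the bounded-uncertainty constraint (24).

    Hypothetical helper: alpha_U and beta_U are the upper bounds of the
    uncertain coefficients, wv is the 0/1 start indicator, B >= 0 the batch size.
    """
    return alpha_U * wv + beta_U * B - delta


def min_duration_normal(alpha, wv, eps, kappa, delta):
    """Smallest T^f - T^s allowed by the normal-uncertainty constraint (26)."""
    lam = NormalDist().inv_cdf(1.0 - kappa)   # lambda = F_n^{-1}(1 - kappa)
    return (1.0 + eps * lam) * alpha * wv - delta


# Illustrative data: a task that starts (wv = 1) with nominal duration 2.0.
print(min_duration_bounded(alpha_U=2.3, beta_U=0.0, wv=1, B=0.0, delta=0.1))
print(min_duration_normal(alpha=2.0, wv=1, eps=0.15, kappa=0.05, delta=0.1))
```

With $\kappa = 5\%$, $\lambda \approx 1.645$. The normal-uncertainty counterpart therefore stretches the nominal duration by a factor of roughly $1 + 1.645\,\varepsilon$ before the tolerance $\delta$ is subtracted.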


Uncertainty in Product Demands 


The product demands (i.e., $\mathrm{dem}_s$) appear as the right-hand-side parameters in the demand constraints

$$\mathrm{STF}(s) \ge \mathrm{dem}_s, \quad \forall s \in S^P, \qquad (27)$$

where $\mathrm{STF}(s)$ is a continuous variable representing the amount of state (s) accumulated at the end of the time horizon and $S^P$ is the set of final products.

The robust optimization techniques can be directly 
applied to these inequality constraints with uncertain 
parameters. For example, in the case of bounded un- 
certainty, 


$$\mathrm{dem}_s^L \le \widetilde{\mathrm{dem}}_s \le \mathrm{dem}_s^U, \qquad (28)$$
and according to Theorem 1, the constraint to be added 
to the original scheduling model to derive the deter- 
ministic robust counterpart problem is as follows: 


$$\mathrm{STF}(s) \ge \mathrm{dem}_s^U - \delta\, \mathrm{dem}_s. \qquad (29)$$


Alternatively, for the case of uncertainty with a known distribution, consider an uncertain product demand represented by parameter $\mathrm{dem}_s$. The true value of the product demand can then be represented in terms of the nominal product demand as follows:

$$\widetilde{\mathrm{dem}}_s = (1 + \varepsilon\, \xi_s)\, \mathrm{dem}_s, \qquad (30)$$

where $\xi_s$ is a random variable with known distribution.


For the case of normal uncertainty, according to Theorem 2, the constraint to be added to the original scheduling model to derive the deterministic robust counterpart problem is

$$\mathrm{STF}(s) \ge \mathrm{dem}_s\,(1 + \varepsilon\lambda - \delta), \qquad (31)$$

where $\lambda = F_n^{-1}(1 - \kappa)$ and $F_n^{-1}$ is the inverse distribution function of a random variable with standardized normal distribution.
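Constraint (31) sets a minimum production level for each final product. The minimal Python sketch below computes that level. The helper name and the numbers (nominal demand 100, $\varepsilon = 5\%$, $\delta = 0$) are illustrative assumptions.

```python
from statistics import NormalDist


def robust_demand_level(dem_nominal, eps, kappa, delta):
    """Right-hand side of (31): STF(s) must be at least this amount (hypothetical helper)."""
    lam = NormalDist().inv_cdf(1.0 - kappa)   # lambda = F_n^{-1}(1 - kappa)
    return dem_nominal * (1.0 + eps * lam - delta)


# Required final amount for several admissible violation probabilities kappa.
for kappa in (0.01, 0.05, 0.10):
    print(kappa, round(robust_demand_level(100.0, 0.05, kappa, 0.0), 2))
```

The quantity $\lambda = F_n^{-1}(1-\kappa)$ decreases as $\kappa$ grows. A larger admissible violation probability therefore lowers the required amount $\mathrm{STF}(s)$.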


Uncertainty in Market Prices 


The market prices (i.e., $\mathrm{price}_s$) participate in the objective function for the calculation of the overall profit:

$$\text{Maximize } \mathrm{Profit} = \sum_{s \in S^P} \mathrm{price}_s \cdot \mathrm{STF}(s) - \sum_{s \in S^R} \mathrm{price}_s \cdot \mathrm{STO}(s), \qquad (32)$$

where $S^P$ and $S^R$ are the sets of final products and raw materials, respectively. $\mathrm{STO}(s)$ and $\mathrm{STF}(s)$ are continuous variables representing, respectively, the initial amount of state (s) at the beginning of the time horizon and the final amount of state (s) at its end. The objective function can be expressed in an equivalent way as follows:


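$$\text{Maximize } \mathrm{Profit} \quad \text{s.t.} \quad \mathrm{Profit} \le \sum_{s \in S^P} \mathrm{price}_s \cdot \mathrm{STF}(s) - \sum_{s \in S^R} \mathrm{price}_s \cdot \mathrm{STO}(s). \qquad (33)$$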


Now the uncertain parameters $\mathrm{price}_s$ appear as linear coefficients multiplying the continuous variables STF(s) and STO(s) in an inequality constraint, and the robust optimization techniques can be readily applied.

For example, if the uncertainty is normally distributed,

$$\widetilde{\mathrm{price}}_s = (1 + \varepsilon\, \xi_s)\, \mathrm{price}_s, \qquad (34)$$

where $\xi_s$ is a standardized normal random variable,
then according to Theorem 2 the deterministic robust counterpart problem can be obtained by introducing the following constraint to the original scheduling model:

$$\mathrm{Profit} \le \sum_{s \in S^P} \mathrm{price}_s \cdot \mathrm{STF}(s) - \sum_{s \in S^R} \mathrm{price}_s \cdot \mathrm{STO}(s) - \varepsilon\lambda \sqrt{\sum_{s \in S^P} \mathrm{price}_s^2\, \mathrm{STF}(s)^2 + \sum_{s \in S^R} \mathrm{price}_s^2\, \mathrm{STO}(s)^2} + \delta, \qquad (35)$$

where $\lambda = F_n^{-1}(1 - \kappa)$ and $F_n^{-1}$ is the inverse distribution function of a random variable with standardized normal distribution.
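The right-hand side of the robust profit constraint can be evaluated for fixed product and raw-material amounts. The Python sketch below is illustrative only, and several parts of it are assumptions:

- The helper name and the data layout (parallel lists of prices and amounts) are hypothetical.
- The infeasibility tolerance is assumed to enter additively as $+\delta$, because the rearranged right-hand side of the profit constraint (33) is zero.

```python
import math
from statistics import NormalDist


def robust_profit_bound(price_prod, stf, price_raw, sto, eps, kappa, delta):
    """Largest Profit allowed by the robust price constraint (hypothetical helper).

    price_prod/stf: prices and final amounts of the products (s in S^P);
    price_raw/sto: prices and initial amounts of the raw materials (s in S^R).
    The "+ delta" term is an assumption about how the tolerance enters.
    """
    lam = NormalDist().inv_cdf(1.0 - kappa)          # lambda = F_n^{-1}(1 - kappa)
    revenue = sum(p * x for p, x in zip(price_prod, stf))
    cost = sum(p * x for p, x in zip(price_raw, sto))
    # Square root of the sum of squared price-weighted amounts.
    spread = math.sqrt(sum((p * x) ** 2 for p, x in zip(price_prod, stf))
                       + sum((p * x) ** 2 for p, x in zip(price_raw, sto)))
    return revenue - cost - eps * lam * spread + delta


# Illustrative data: two products and one raw material.
for kappa in (0.01, 0.05, 0.10):
    print(kappa, round(robust_profit_bound([5.0, 4.0], [100.0, 50.0],
                                           [1.0], [200.0], 0.05, kappa, 0.05), 2))
```

Evaluating the bound for increasing $\kappa$ shows the same trend discussed for case 2: a larger $\kappa$ gives a smaller $\lambda$ and therefore a larger attainable profit.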


Cases 


In this section, the robust optimization formulation 
is applied to two example problems. Both examples are implemented with GAMS [11] on a 3.20 GHz
Linux workstation. The MILP problems are solved us- 
ing CPLEX 8.1 while the MINLP problems are solved 
using DICOPT [9]. 


Case 1: Bounded Uncertainty 
in the Processing Times 


Let us revisit the motivating example in Sect. “A Mo- 
tivating Example”. Assume that the uncertainty of the 
processing times is bounded and the (relative) uncertainty level ($\varepsilon$) is 15%, that is,

$$0.85\,\alpha_{ij} \le \tilde{\alpha}_{ij} \le 1.15\,\alpha_{ij}, \qquad (36)$$

and the infeasibility tolerance level ($\delta$) is 10%.

By solving the IRC[$\varepsilon$, $\delta$] problem, a “robust” schedule that takes into account uncertainty in the processing times is obtained, as shown in Fig. 4. The nominal schedule can be seen in Fig. 2 in Sect. “A Motivating Example”. The nominal solution is obtained at the nominal values of the processing times. Compared to it, the robust solution exhibits very different scheduling strategies, including both task-unit assignments and task timings. Even the sequences of tasks in the two reactors in Fig. 4 deviate significantly from those in the nominal solution in Fig. 2. The robust solution ensures that the robust schedule obtained is feasible with the specified uncertainty level and infeasibility tolerance. However, the resulting profit is reduced from 3638.75 to 2887.19, which reflects the effect of uncertainty on overall production. A comparison of the model and solution statistics for the nominal and robust solutions can be found in Table 2.

Figure 5 summarizes the results of the IRC problem 
with three different levels of uncertainty. It is shown 
that with a given infeasibility tolerance, the maximal 
profit that can be achieved decreases as the uncer- 
tainty level increases, which indicates more “conserva- 
tive” scheduling decisions because of the existence of 
uncertainty. On the other hand, at a given uncertainty 
level, the profit increases as the infeasibility tolerance 
is increased, which means more “aggressive” scheduling arrangements can be incorporated if violations of related timing constraints can be tolerated to a larger extent. These results are consistent with intuition and with other approaches. With the robust optimization approach, however, the effects of uncertainty and the tradeoffs between conflicting objectives are quantified rigorously and efficiently. It should be noted that at a given uncertainty level, the objective value of profit as well as the corresponding schedule change dramatically at discrete points as the infeasibility tolerance increases. This behavior is caused by special characteristics of the example problem, including the fixed time horizon and the fixed processing times of tasks.

Robust Optimization: Mixed-Integer Linear Programs, Figure 4
Robust solution for case 1 ($\varepsilon$ = 15%, ...)

Robust Optimization: Mixed-Integer Linear Programs, Figure 5
Profit vs. infeasibility tolerance at different uncertainty levels for case 1


Case 2: Uncertainty with a Normal Distribution 
in the Market Prices 


In this example, we consider uncertainty with a normal 
distribution in the market prices for the same process 
and processing time data given in Case 1. The objective 
function is the maximization of profit in a time horizon 
of 8 hours. The uncertainty level ($\varepsilon$) is 5%, the infeasibility tolerance ($\delta$) is 5%, and the reliability level ($\kappa$) is 5%. The nominal schedule is shown in Fig. 6 with a profit of 1088.75. The robust schedule is obtained by solving the robust counterpart problem, as shown in Fig. 7, and the corresponding profit is 966.97. By executing this schedule, the profit is guaranteed to be at least 966.97 with a probability of 95% in the presence of the 5% uncertainty in the prices of the products and raw materials. A comparison of the model and solution statistics for the nominal and robust solutions can be found in Table 3.

Robust Optimization: Mixed-Integer Linear Programs, Figure 7
Robust solution for case 2 ($\varepsilon$ = 5%, $\delta$ = 5%, $\kappa$ = 5%, profit = 966.97)

Robust Optimization: Mixed-Integer Linear Programs, Figure 8
Profit vs. reliability level at different uncertainty and infeasibility levels for case 2


Figure 8 summarizes the results of the RC problem 
at several different levels of uncertainty and an infea- 
sibility tolerance of 0% at increasing values of the reli- 
ability level. It is shown that at a given reliability level, 
the maximal profit that can be achieved decreases as the 
uncertainty level increases, which indicates more conservative scheduling decisions because of the existence of uncertainty. Also, at a given uncertainty level and infeasibility tolerance, the profit increases as the reliability level increases. As the probability of violation of the uncertain constraint, $\kappa$, increases, $\lambda$ decreases, and according to Eq. (35) the profit takes on a larger value.

Robust Optimization: Mixed-Integer Linear Programs, Table 3
Model and solution statistics for case 2

                        Nominal solution    Robust solution
Profit                  1088.75             966.97
CPU time (s)            0.02                0.05
Binary variables        60
Continuous variables
Constraints


Conclusions 


In this chapter, we propose a new approach to ad- 
dress the scheduling under uncertainty problem based 
on a robust optimization methodology, which when 
applied to MILP problems, produces “robust” solutions. These solutions are, in a sense, immune against uncertainties in the coefficients of the objective function and in both the left-hand-side and the right-hand-side parameters of the inequality constraints. A unique feature of the proposed approach is that it can address
many uncertain parameters. The approach can be ap- 
plied to address the problem of production schedul- 
ing with uncertain processing times, market demands, 
and/or prices of products and raw materials. Our com- 
putational results show that this approach provides an 
effective way to address scheduling problems under un- 
certainty, producing reliable schedules and generating 
helpful insights on the tradeoffs between conflicting ob- 
jectives. 
The Rosenbrock method is, in its basic form, a gradient-free minimization algorithm. It was introduced by H.H. Rosenbrock [2] and avoids the use of line searches. The method uses a set of orthogonal search directions. Minimization steps are taken along these directions in turn, and a pattern search is performed at the end of each orthogonal direction search cycle. The algorithm is given below. The minimization problem considered is:


min f(x). 


1. Initialization

Set the search directions to be the coordinate directions $d^{(1)}, \dots, d^{(n)}$, where $d^{(i)}$ is the vector of zeros except for a 1 in the $i$th position. Select the following parameters:
- a scalar value $\varepsilon > 0$, to be used as the termination tolerance;
- an expansion factor $\beta_1 > 1$;
- a contraction factor $\beta_2$ such that $-1 < \beta_2 < 0$;
- initial stepsizes $\bar{\delta}_i$, $i = 1, \dots, n$, along each of the search directions defined above.

Select an initial point $x^{(0)}$ and initialize by setting $z^{(1)} = x^{(0)}$. Set the pattern iteration counter $k = 0$ and the direction search counter $i = 1$. Initialize the stepsizes along each direction to $\delta_i = \bar{\delta}_i$, $i = 1, \dots, n$.


2. Main Iteration Step (Direction Search)

2.1 Forward search
IF $f(z^{(i)} + \delta_i d^{(i)}) < f(z^{(i)})$
THEN
    forward search step $i$ is successful;
    set $z^{(i+1)} = z^{(i)} + \delta_i d^{(i)}$;
    set $\delta_i \leftarrow \beta_1 \delta_i$.
ELSE (if $f(z^{(i)} + \delta_i d^{(i)}) \ge f(z^{(i)})$)
    forward search step $i$ is unsuccessful;
    set $z^{(i+1)} = z^{(i)}$;
    set $\delta_i \leftarrow \beta_2 \delta_i$.
END IF;
IF $i < n$,
    increment search counter $i \leftarrow i + 1$;
    go to step 2.1.
ELSE (if $i = n$) go to step 2.2.
END IF;

2.2 IF $f(z^{(n+1)}) < f(z^{(1)})$
    (at least one improvement achieved in 2.1)
THEN
    set $z^{(1)} = z^{(n+1)}$, $i = 1$;
    go to step 2.1.
ELSE (if $f(z^{(n+1)}) = f(z^{(1)})$)
    (no improvement achieved in 2.1);
    IF $f(z^{(n+1)}) < f(x^{(k)})$
        (at least one improvement in iteration $k$);
        go to step 3.
    ELSE (if $f(z^{(n+1)}) = f(x^{(k)})$)
        (no improvement in iteration $k$);
        IF $|\delta_i| < \varepsilon$ for all $i$,
        THEN $x^{(k)}$ is an estimate of the optimal solution;
            STOP.
        ELSE
            set $z^{(1)} = z^{(n+1)}$,
            set $i = 1$, go to step 2.1.
        END IF;
    END IF;
END IF;


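3. Pattern Search and New Search Direction Set Generation

3.1 Set $x^{(k+1)} = z^{(n+1)}$.
    IF $\|x^{(k+1)} - x^{(k)}\| < \varepsilon$,
    THEN $x^{(k+1)}$ is an estimate of the optimal solution; STOP.
    ELSE solve the linear system
        $x^{(k+1)} - x^{(k)} = \sum_{i=1}^{n} \lambda_i d^{(i)}$
    for $\lambda_i$;
    go to step 3.2.
3.2 Orthonormalization of new search directions (Gram-Schmidt procedure)
    a. set $a^{(i)} = d^{(i)}$ if $\lambda_i = 0$;
       set $a^{(i)} = \sum_{j=i}^{n} \lambda_j d^{(j)}$ if $\lambda_i \ne 0$.
    b. set $b^{(1)} = a^{(1)}$;
       set $b^{(i)} = a^{(i)} - \sum_{j=1}^{i-1} \big((a^{(i)})^T \bar{d}^{(j)}\big)\, \bar{d}^{(j)}$, $i \ge 2$,
       where $\bar{d}^{(j)} = b^{(j)} / \|b^{(j)}\|$;
       denote these new directions $\bar{d}^{(i)}$ as $d^{(i)}$, $i = 1, \dots, n$.
3.3 Reset the stepsizes $\delta_i = \bar{\delta}_i$ for $i = 1, \dots, n$;
    set the initial point $z^{(1)} = x^{(k+1)}$;
    increment the main cycle counter $k \leftarrow k + 1$;
    set $i = 1$;
    go to step 2.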
The procedure described in the algorithm above takes discrete steps along each of the $n$ directions. A success along a direction is followed by an expansion of the stepsize. A failure leads to a reduction of the stepsize and, because $\beta_2$ is negative, to a reversal of the direction for the step taken in the next cycle. Suppose a cycle of $n$ searches brings no further improvement after at least one success in the current iteration. A new set of search directions is then generated by the orthogonalization (Gram-Schmidt) procedure. Continued failures along the search directions shrink the stepsizes until the termination criterion (within the tolerance $\varepsilon$) is eventually triggered.
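The discrete-step procedure can be sketched in Python with NumPy as follows. This is an illustrative implementation, not Rosenbrock's original code. Several choices are assumptions: the function names, the defaults $\beta_1 = 3$, $\beta_2 = -0.5$ and $\varepsilon = 10^{-8}$, and the cycle limit. The coefficients $\lambda_i$ of step 3.1 are accumulated from the accepted steps instead of being obtained by solving the linear system. This yields the same values, because $x^{(k+1)} - x^{(k)}$ is exactly the sum of the accepted steps $\delta_i d^{(i)}$. As an added safeguard, components below a tiny relative threshold are treated as zero so that the Gram-Schmidt step stays well defined.

```python
import numpy as np


def new_directions(D, lam):
    """Step 3.2: build a^(i) from the old directions (columns of D) and the
    step components lam, then orthonormalize them by Gram-Schmidt."""
    n = D.shape[1]
    tiny = 1e-12 * np.linalg.norm(lam)   # assumed safeguard, not part of the method
    A = np.empty_like(D)
    for i in range(n):
        if abs(lam[i]) <= tiny:
            A[:, i] = D[:, i]              # a^(i) = d^(i) when lambda_i = 0
        else:
            A[:, i] = D[:, i:] @ lam[i:]   # a^(i) = sum_{j >= i} lambda_j d^(j)
    B = np.empty_like(D)
    for i in range(n):
        b = A[:, i].copy()
        for j in range(i):
            b -= (A[:, i] @ B[:, j]) * B[:, j]   # remove the component along dbar^(j)
        B[:, i] = b / np.linalg.norm(b)
    return B


def rosenbrock_method(f, x0, step0, eps=1e-8, beta1=3.0, beta2=-0.5,
                      max_cycles=100_000):
    """Discrete-step Rosenbrock method following steps 1-3 above (sketch)."""
    x = np.asarray(x0, dtype=float)
    step0 = np.asarray(step0, dtype=float)
    n = x.size
    D = np.eye(n)                      # step 1: coordinate directions as columns
    fx = f(x)
    cycles = 0
    while cycles < max_cycles:
        delta = step0.copy()           # steps 1 and 3.3: (re)set the stepsizes
        lam = np.zeros(n)              # sum of accepted steps along each d^(i)
        z, fz = x.copy(), fx
        while cycles < max_cycles:     # step 2: cycles of n direction searches
            cycles += 1
            f_cycle_start = fz
            for i in range(n):         # step 2.1
                trial = z + delta[i] * D[:, i]
                f_trial = f(trial)
                if f_trial < fz:       # success: accept the step and expand
                    z, fz = trial, f_trial
                    lam[i] += delta[i]
                    delta[i] *= beta1
                else:                  # failure: contract and reverse (beta2 < 0)
                    delta[i] *= beta2
            if fz < f_cycle_start:     # step 2.2: improvement in this cycle
                continue
            if fz < fx:                # improvement in iteration k: go to step 3
                break
            if np.all(np.abs(delta) < eps):
                return x               # no improvement and tiny steps: stop
        else:
            return z                   # cycle budget exhausted
        if np.linalg.norm(z - x) < eps:   # step 3.1 termination test
            return z
        D = new_directions(D, lam)     # step 3.2
        x, fx = z, fz                  # step 3.3
    return x


quad = lambda v: (v[0] - 1.0) ** 2 + 10.0 * (v[1] + 2.0) ** 2
print(rosenbrock_method(quad, [0.0, 0.0], [0.1, 0.1]))
```

Replacing the discrete trial steps in the inner `for` loop by a line search along each direction gives the continuous variant described in the following paragraph.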

It is possible to derive a continuous minimization 
procedure along each of the search directions, by re- 
placing the discrete step search with a line search pro- 
cedure. Such a scheme can be found for example in [1], 
where it is also demonstrated that under differentiabil- 
ity assumptions on the objective function f it is possible 
to show that Rosenbrock’s method converges to a sta- 
tionary point of f (minimum under convexity assump- 
tions). 


See also 


> Cyclic Coordinate Method 
> Powell Method 
> Sequential Simplex Method 
The gradient projection method proposed by J.B. Rosen in 1960 [82,83] is one of the earliest methods in the history
of mathematical programming for solving constrained 
optimization problems. The importance of this method 
in the literature also stems from the fact that many more 
efficient algorithms, in linear and nonlinear program- 
ming, developed later (e. g. by D. Goldfarb [42], by B.H. 
Murtagh and R.W.H. Sargent [63] and by N.K. Kar- 
markar [53]) incorporated the basic ideas propounded 
by Rosen. 

The global convergence of Rosen’s method was 
a long-standing open problem. Since Rosen’s method 
is included in many textbooks, the convergence problem became quite well-known. In fact, almost all books (such as [3,4,57,67]) that introduce Rosen’s method in a chapter or a section acknowledge that its global convergence was an open problem. After 26 years of effort, the proof was finally found [35,36,47].

The study of the global convergence of Rosen’s method had a great impact on the development of a general theory of global convergence in nonlinear programming. In fact, many new techniques [28,29] were discovered in the course of reaching the final solution, and it is desirable to apply them to other open problems.

One of the big remaining open problems about global convergence in nonlinear programming is Powell’s conjecture that the DFP method (Davidon-Fletcher-Powell) for unconstrained optimization is globally convergent. Progress has been slow [69,70,71].

This article reviews the story of Rosen’s method and surveys results in the theory of global convergence and developments concerning Powell’s conjecture.


Rosen’s Method 


Consider linearly constrained optimization problems in the following form:

$$\max\ f(x) \quad \text{s.t.} \quad a_j^T x \ge b_j, \quad j = 1, \dots, m, \qquad (1)$$


where $x \in \mathbb{R}^n$ and $\mathbb{R}^n$ is the $n$-dimensional Euclidean space whose points are represented as column vectors. A point is called a feasible point if it satisfies the constraints. The set of all feasible points is called the feasible region. Unless stated otherwise, we always assume that the function $f$ is continuously differentiable in a convex set containing the feasible region. The function $f$ is referred to as the objective function. For simplicity of notation, we denote $g(x) = \nabla f(x)$, and in particular $g_k = g(x_k)$ and $g^* = g(x^*)$.

A nonzero vector $d$ is called a feasible direction at a feasible point $x$ if there exists $\bar{\lambda} > 0$ such that for any $\lambda \in [0, \bar{\lambda}]$, $x + \lambda d$ is feasible.

A constraint is called an active constraint at a feasible point $x$ if its equality sign holds at the point. The set of indices of active constraints is called the active set, denoted by $J(x)$, i.e.

$$J(x) = \{\, j : a_j^T x = b_j \,\}.$$

In particular, we denote

$$J_k = J(x_k) \quad \text{and} \quad J^* = J(x^*).$$

For simplicity, we also denote $M = \{1, \dots, m\}$. Note that $J \subset J'$ will stand for $J \subseteq J'$ and $J \ne J'$. For a singleton $\{h\}$, we write $J \setminus h$ and $J \cup h$ instead of $J \setminus \{h\}$ and $J \cup \{h\}$, respectively.

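For the feasible region in the considered problem (1), a direction $d$ is feasible at a feasible point $x$ if and only if $a_j^T d \ge 0$ for $j \in J(x)$.

To see necessity, suppose $d$ is a feasible direction at point $x$. Then there exists a number $\lambda > 0$ such that $x + \lambda d$ is a feasible point. Thus, $a_j^T(x + \lambda d) \ge b_j$ for $j \in M$. Since $a_j^T x = b_j$ for $j \in J(x)$, we have $\lambda\, a_j^T d \ge 0$ for $j \in J(x)$. Therefore, $a_j^T d \ge 0$ for $j \in J(x)$.

Conversely, suppose $a_j^T d \ge 0$ for $j \in J(x)$. Set

$$\bar{\lambda} = \begin{cases} \min\left\{ \dfrac{b_j - a_j^T x}{a_j^T d} : a_j^T d < 0 \right\} & \text{if } a_j^T d < 0 \text{ for some } j, \\ 1 & \text{otherwise.} \end{cases}$$

Any $j$ with $a_j^T d < 0$ is inactive, so $b_j - a_j^T x < 0$. Clearly, then, $\bar{\lambda} > 0$, and for $\lambda \in [0, \bar{\lambda}]$, $x + \lambda d$ is feasible. Now, define

$$D^1(x) = \{\, d : a_j^T d \ge 0 \text{ for } j \in J(x) \,\}.$$

Then the nonzero vectors in $D^1(x)$ are exactly the feasible directions at point $x$.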

A feasible point $x^*$ is said to be a local maximum of problem (1) if there exists a neighborhood of the point $x^*$ such that for any feasible point $x$ in the neighborhood, $f(x) \le f(x^*)$.

A nonzero vector $d$ is called an ascendant direction at a feasible point $x$ if $g(x)^T d > 0$. Clearly, if $d$ is a feasible ascendant direction at $x$, then we can find a feasible point $x'$ along the direction $d$ such that $f(x') > f(x)$. Therefore, $x$ is a local maximum only if there does not exist a feasible ascendant direction at $x$. That is, a feasible point $x$ is a local maximum only if

$$D^1(x) \cap D^2(x) = \emptyset,$$

where

$$D^2(x) = \{\, d : g(x)^T d > 0 \,\}.$$

The following theorem states a necessary and sufficient condition for $D^1(x) \cap D^2(x) = \emptyset$.

Theorem 1 Let $x$ be a feasible point. Then $D^1(x) \cap D^2(x) = \emptyset$ if and only if there exist $u_j$, $j \in J(x)$, such that

$$g(x) = \sum_{j \in J(x)} u_j a_j \qquad (2)$$

and

$$u_j \le 0 \quad \text{for } j \in J(x). \qquad (3)$$


A feasible point $x$ is called a Kuhn-Tucker point if there exist $u_j$, $j \in J(x)$, satisfying (2) and (3). Clearly, for a linearly constrained optimization problem, every local maximum is a Kuhn-Tucker point. However, a Kuhn-Tucker point may not be a local maximum.
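When the active vectors $a_j$ are linearly independent, the multipliers in (2) are unique and equal to $(A_J^T A_J)^{-1} A_J^T g(x)$, where $A_J$ has the active $a_j$ as columns. Theorem 1 then yields a direct numerical test. The following NumPy sketch uses a hypothetical function name and replaces the exact equalities by an absolute tolerance:

```python
import numpy as np


def kuhn_tucker_multipliers(g, A_J, tol=1e-10):
    """Check conditions (2)-(3) of Theorem 1 at a feasible point (sketch).

    g   : gradient g(x), length n
    A_J : n x |J| matrix whose columns are the active a_j, assumed linearly
          independent so that the multipliers are unique
    Returns (is_kt, u).
    """
    if A_J.shape[1] == 0:                         # no active constraints
        return bool(np.linalg.norm(g) <= tol), np.zeros(0)
    u = np.linalg.solve(A_J.T @ A_J, A_J.T @ g)   # least-squares multipliers
    residual = g - A_J @ u                        # zero iff (2) holds
    is_kt = np.linalg.norm(residual) <= tol and np.all(u <= tol)
    return bool(is_kt), u


# Constraints x >= 0 (a_j = e_j, b_j = 0), both active at the origin.
print(kuhn_tucker_multipliers(np.array([-1.0, -2.0]), np.eye(2)))  # Kuhn-Tucker point
print(kuhn_tucker_multipliers(np.array([2.0, -2.0]), np.eye(2)))   # u_1 > 0: not Kuhn-Tucker
```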
A feasible point $x$ is regular if all active constraints at $x$ are linearly independent, i.e. the vectors $a_j$ for $j \in J(x)$ are linearly independent. Problem (1) is nondegenerate if its constraints satisfy the following regularity condition: every feasible point is regular.

Suppose problem (1) is nondegenerate. At each feasible point $x$, the subspace $D_J = \{\, d : a_j^T d = 0,\ j \in J \,\}$, where $J = J(x)$, is the tangent plane at the point. The orthogonal projection operator onto $D_J$ is denoted by $P_J$. It is not hard to verify that

$$P_J = I - A_J (A_J^T A_J)^{-1} A_J^T,$$

where $A_J$ denotes the matrix consisting of the column vectors $a_j$, $j \in J$. An important property of gradient projection is as follows.


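Theorem 2 Let $0 < c < +\infty$. For $J = J(x)$, define

$$d = \begin{cases} P_J\, g(x) & \text{if } \|P_J\, g(x)\| > c \cdot u_h, \\ P_{J \setminus h}\, g(x) & \text{otherwise,} \end{cases}$$

where $(u_j, j \in J)^T = (A_J^T A_J)^{-1} A_J^T g(x)$ and $u_h = \max_{j \in J} u_j$. Then $d = 0$ if and only if $x$ is a Kuhn-Tucker point. Furthermore, if $d \ne 0$, then $d$ is an ascendant feasible direction.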


Theorem 2 suggests the algorithm below, Rosen’s method.

The global convergence of Rosen’s method relies on a parameter chosen at each iteration. To avoid the zigzag phenomenon, Rosen chose the parameter to be a positive number with an upper bound. He also gave a convergence proof in his paper. However, it did not take long for someone to point out a serious mistake in the proof. Since then, substantial efforts have been made on the problem. Most books, perhaps out of a natural desire for simplicity, incorrectly state Rosen’s method with the parameter set to zero, which is exactly what Rosen tried to avoid. See [30] for a counterexample to the method described in those books. It was indicated in [35] that this counterexample also works when the parameter varies and converges to zero as the iterations proceed. It follows that Rosen’s restriction on the parameter (positive with an upper bound) is not sufficient for convergence.


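Initially, choose a feasible point $x_1$. At each iteration $k = 1, 2, \dots$, the algorithm carries out the following steps.

1 Choose a positive number $c_k$. Compute a search direction by the following formula:

$$d_k = \begin{cases} P_{J_k}\, g_k & \text{if } \|P_{J_k}\, g_k\| > c_k\, u_{k h_k}, \\ P_{J_k \setminus h_k}\, g_k & \text{otherwise,} \end{cases}$$

where

$$(u_{kj},\ j \in J_k)^T = (A_{J_k}^T A_{J_k})^{-1} A_{J_k}^T g_k$$

and

$$u_{k h_k} = \max\{\, u_{kj} : j \in J_k \,\}.$$

2 If $d_k = 0$, then stop; $x_k$ is a Kuhn-Tucker point. If $d_k \ne 0$, then compute

$$\bar{\lambda}_k = \begin{cases} \infty & \text{if } a_j^T d_k \ge 0 \text{ for all } j \notin J_k, \\ \min\left\{ \dfrac{b_j - a_j^T x_k}{a_j^T d_k} : a_j^T d_k < 0 \right\} & \text{otherwise,} \end{cases}$$

and find a new point $x_{k+1} = x_k + \lambda_k d_k$ $(0 < \lambda_k \le \bar{\lambda}_k)$ by a line search procedure.

Rosen’s method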
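Steps 1 and 2 can be combined into the following NumPy sketch for problem (1). It is illustrative, not Rosen's original code, and it rests on several assumptions not fixed by the text:

- $c_k \equiv c$ is constant.
- The active set and the test $d_k = 0$ use small tolerances.
- The line search is an Armijo-type backtracking rule started at $\min(1, \bar{\lambda}_k)$ with $\sigma = 0.1$.

All names are hypothetical.

```python
import numpy as np


def _projector(A_J):
    """P_J = I - A_J (A_J^T A_J)^{-1} A_J^T (the identity when J is empty)."""
    n = A_J.shape[0]
    if A_J.shape[1] == 0:
        return np.eye(n)
    return np.eye(n) - A_J @ np.linalg.solve(A_J.T @ A_J, A_J.T)


def rosen_method(f, grad, A, b, x1, c=1.0, sigma=0.1, tol=1e-8,
                 act_tol=1e-10, max_iter=1000):
    """Sketch of Rosen's method for: max f(x) s.t. A[:, j] @ x >= b[j].

    x1 must be feasible and the problem nondegenerate.
    """
    x = np.asarray(x1, dtype=float)
    m = A.shape[1]
    for _ in range(max_iter):
        g = grad(x)
        J = [j for j in range(m) if abs(A[:, j] @ x - b[j]) <= act_tol]
        d = _projector(A[:, J]) @ g                     # P_{J_k} g_k
        if J:
            A_J = A[:, J]
            u = np.linalg.solve(A_J.T @ A_J, A_J.T @ g)  # multipliers u_kj
            h = int(np.argmax(u))                       # position of u_{k h_k} in J
            if not np.linalg.norm(d) > c * u[h]:        # drop constraint h_k
                d = _projector(A[:, J[:h] + J[h + 1:]]) @ g
        if np.linalg.norm(d) <= tol:                    # step 2: d_k = 0 (numerically)
            return x
        Ad = A.T @ d
        ratios = [(b[j] - A[:, j] @ x) / Ad[j]
                  for j in range(m) if j not in J and Ad[j] < 0]
        lam_bar = min(ratios) if ratios else np.inf     # largest feasible step
        lam, slope = min(1.0, lam_bar), g @ d
        while f(x + lam * d) < f(x) + sigma * lam * slope:
            lam *= 0.5                                  # Armijo-type backtracking
        x = x + lam * d
    return x


# Maximize -(x1 - 1)^2 - (x2 + 1)^2 subject to x >= 0; the maximizer is (1, 0).
f = lambda x: -(x[0] - 1.0) ** 2 - (x[1] + 1.0) ** 2
grad = lambda x: np.array([-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 1.0)])
print(rosen_method(f, grad, np.eye(2), np.zeros(2), [0.0, 0.0]))
```

Starting at the origin, both constraints are active and $P_J g = 0$. Because the multiplier of $x_1 \ge 0$ is positive, the sketch drops that constraint and moves along $P_{J \setminus h}\, g$, as prescribed in step 1.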


The first convergent version was found by E. Polak [66]. Compared with the original one, Polak’s version is considerably more complicated. First of all, he used the $\varepsilon$-active set strategy. Secondly, the version contains a special procedure, which involves computing the gradient projection several times. Polak proved the convergence of his version under a condition named the $\varepsilon$-hypothesis. X.-S. Zhang [96] showed that the $\varepsilon$-hypothesis is equivalent to the regularity condition. Zhang’s work was motivated by a Yue-Han theorem [92].

M. Yue and J. Han utilized Polak’s procedure to study the reduced gradient method. The reduced gradient method was first proposed by P. Wolfe [87]. An important step in the reduced gradient method is to compute a reduced basis. There are several ways of finding a reduced basis [33,85,86,92,93]. The reduced gradient method has been known to be quite efficient in practice. Wolfe’s original version of the reduced gradient method is not globally convergent; a counterexample was given by Wolfe [90] himself. Several convergent versions [57,85,86,92,93] appeared later. Among them, Wang [85] derived his version from the Levitin-Polyak method [56]. The Levitin-Polyak method is a gradient projection method different from Rosen’s method: in each iteration, the search direction is chosen to be the projection of the gradient onto the feasible region. In general, there is no closed-form formula for such a projection. However, theoretically, the method can be put in a very general setting [1,46,52,60].

Yue and Han [92] also obtained a new property of linearly constrained sets given in the standard form of linear programming. Properties of this type are very useful for the application of the $\varepsilon$-active set strategy. See [32] for a generalization. D.-Z. Du [22] also found a simpler convergent version of Rosen’s method by deleting Polak’s special procedure.

Although several convergent versions had been found, one still wanted to know what would happen when the parameter is also bounded below by a positive number. In fact, all the above convergent versions use anti-zigzagging ideas different from Rosen’s. Thus, it was still open whether Rosen’s original idea of keeping the parameter away from zero and infinity works. (Allowing the parameter to approach zero as the iterations proceed was considered an oversight in [82].)

Zhang [97] made the first breakthrough in this direction. He showed that if the parameter $c_k$ equals a positive constant at every iteration, then Rosen’s method is convergent in three-dimensional space. Soon afterwards, Du and Zhang [35] established a more general result: if the parameter $c_k$ is chosen to be a positive constant with a specific upper bound, then Rosen’s method is convergent in $n$-dimensional space. This was the first solution of the convergence problem. In the same paper, Du and Zhang also conjectured that the upper bound on the constant can be deleted. That is, as long as the parameter does not vary from iteration to iteration, it can be chosen to be any positive number. This conjecture was settled through several efforts, including [26,47] and [36]. Finally, [34] showed a convergence theorem with a more general rule for selecting the parameter.

The global convergence of Rosen’s method also depends on the choice of line search. In [36] it is assumed that the line search is normal. The normal line searches include exact line search, the Curry test, the Goldstein test, the Wolfe test, and the Armijo rule [2].


Theorem 3 Consider Rosen’s method with a normal line search procedure. Let $\alpha$ and $\beta$ be two positive numbers with $\alpha < \beta$. Suppose in Rosen’s method the parameters $c_k$ are chosen to satisfy $\alpha \le c_k \le \beta$. Then the method either stops at a Kuhn-Tucker point or generates an infinite sequence whose cluster points are all Kuhn-Tucker points.


If the line search is not normal, X.-D. Hu [48] showed that the global convergence of Rosen’s method may fail. However, on Hu’s counterexample, the method in [22] still has the global convergence property. If the steplength in the line search is uniformly bounded, the choice of the parameter can be further relaxed [29], which improves Ritter’s result [80].


Theorem 4 Consider Rosen’s method with a normal line search procedure of uniformly bounded steplength. Let $\alpha$ and $\beta$ be two positive numbers and $\ell$ and $\ell'$ two positive integers. Suppose in Rosen’s method the parameters $c_k$ are chosen to satisfy

a) $c_k \ge \alpha$ whenever $J_k = \cdots = J_{k-\ell}$; and
b) $c_k \le \beta$ whenever $J_k \ne \cdots \ne J_{k-\ell'}$ and $|J_k| = \cdots = |J_{k-\ell'}|$.

Then the method either stops at a Kuhn-Tucker point or generates an infinite sequence whose cluster points are all Kuhn-Tucker points.


Finally, we would like to mention that [45] gives a method to deal with the degenerate case, in which the regularity condition fails. This method uses the lexicographic simplex procedure to deal with degenerate linear constraints. It is worth mentioning that Bland’s rule [5] can also be used here instead of the lexicographic simplex procedure.
Global Convergence


The significance of solving the global convergence problem of Rosen’s method is not limited to a single algorithm. In fact, the solution is based on the discovery of new techniques and on new developments of global convergence theory in nonlinear programming.

To give a unified convergence theorem, W.I. Zangwill [94,95] introduced point-to-set mappings. Each iteration that finds a new point from the current point is viewed as taking an element from the image set of the current point under a point-to-set mapping. He proved an abstract convergence theorem under the closedness of the mapping.

Let $X$ and $Y$ be two topological spaces. Denote by $\mathcal{P}(X)$ the collection of all subsets of $X$. A mapping from $X$ to $\mathcal{P}(Y)$ is usually called a point-to-set mapping. A point-to-set mapping $A: X \to \mathcal{P}(Y)$ is said to be closed at a point $x$ if

$$\left.\begin{aligned} x_k &\to x \\ y_k &\to y \\ y_k &\in A(x_k) \end{aligned}\right\} \implies y \in A(x). \qquad (4)$$

A point-to-set mapping is closed if it is closed at every point in its domain of definition.
Consider the following abstract algorithm.

Let $\Gamma$ be a subset of a topological space $X$.
Let $A$ be a point-to-set mapping from $X \setminus \Gamma$ to $\mathcal{P}(X)$.
Choose an initial point $x_1$.
Step $k$ ($k = 1, 2, \dots$): If $x_k \in \Gamma$, then stop. Otherwise, choose $x_{k+1} \in A(x_k)$.

Zangwill’s algorithm


Theorem 5 (Zangwill’s theorem) Let $f$ be a continuous function on a topological space $X$ and $\Gamma$ a subset of $X$. Let $A$ be a closed point-to-set mapping from $X \setminus \Gamma$ to $\mathcal{P}(X)$ such that

a) the closure of $\bigcup_{k=1}^{\infty} A(x_k)$ is compact for every convergent sequence of points $x_1, x_2, \dots$ in $X \setminus \Gamma$; and
b) for every $x \in X \setminus \Gamma$ and $y \in A(x)$, $f(y) > f(x)$.

Then Zangwill’s algorithm either stops at a point in $\Gamma$ or generates an infinite sequence whose cluster points are all in $\Gamma$.


However, for a constrained optimization problem, closedness is lost when the line search procedure is stopped by a constraint. Du [23] proved a result that tells when the global convergence of an algorithm is provable by Zangwill’s theorem. G.P. McCormick [58] suggested searching along a broken line. That is, if the search is stopped by reaching a new constraint, the search does not end there; keeping the constraint active, it finds a new direction and continues. K. Ritter [78,79,80] decomposed the broken line search into several line searches and applied it to a family of feasible direction methods including the gradient projection method. Since the discovery of Zangwill’s theorem, many abstract convergence results based on point-to-set mappings have been established; see [50,51,61,62] for some of them. Some necessary or sufficient conditions for global convergence, such as Bazaraa-Shetty’s condition [4] and Wolfe’s work [88,89,90], also appeared. However, none of them is powerful enough to show the convergence of Rosen’s gradient projection method.

In the course of resolving the open problem on Rosen’s method, a number of new techniques [28,29] for studying the global convergence of ‘nonclosed’ algorithms were discovered. These techniques are now known as slope lemmas.


Lemma 6 (first slope lemma) Let $\{x_k\}$ be a sequence of feasible points such that $f(x_k) < f(x_{k+1})$ for $k = 1, 2, \dots$. Let $x^*$ be a cluster point of the sequence such that for any subsequence $\{x_k\}_{k \in K}$ converging to $x^*$, $x_{k+1} - x_k \to 0$ as $k \to \infty$, $k \in K$. If $\{x_k\}$ does not converge to $x^*$, then there exists a subsequence $\{x_k\}_{k \in K}$ such that $x_k \to x^*$ as $k \to \infty$, $k \in K$, and

$$\lim_{k \to \infty,\, k \in K} \frac{g_k^T (x_{k+1} - x_k)}{\|x_{k+1} - x_k\|} = 0. \qquad (5)$$

Lemma 7 (second slope lemma) Let $\{x_k\}$ be a sequence of feasible points, in a linearly constrained region, convergent to $x^*$. Suppose $f(x_k) < f(x_{k+1})$ for all $k$. Then there exists a subsequence $\{x_k\}_{k \in K}$ such that for every $J$ with $J = J_k$ for infinitely many $k$,

$$0 \le \lim_{k \to \infty,\, k \in K} \frac{g_k^T (x_{k+1} - x_k)}{\|x_{k+1} - x_k\|} \le \|P_J\, g(x^*)\|.$$
Lemma 8 (third slope lemma) Let $\{x_k\}$ be a sequence of feasible points such that $f(x_k) < f(x_{k+1})$ for all $k$. Let $x^*$ be a cluster point of the sequence. Suppose that for any subsequence $\{x_k\}_{k \in K}$ converging to $x^*$, $\lim_{k \to \infty,\, k \in K} (x_{k+1} - x_k) = 0$. If there exists a positive number $\mu$ such that for all $k$,

$$\frac{g_k^T (x_{k+1} - x_k)}{\|x_{k+1} - x_k\|} \ge \mu \max\{\|P_{J_k} g_k\|, \|P_{J_{k+1}} g_k\|\},$$

then the following two statements are equivalent:
a) there exists a subsequence $\{x_k\}_{k \in K}$ converging to $x^*$ such that

$$\lim_{k \to \infty,\, k \in K} \frac{g_k^T (x_{k+1} - x_k)}{\|x_{k+1} - x_k\|} = 0; \qquad (6)$$

b) for every subsequence $\{x_k\}_{k \in K}$ converging to $x^*$, (6) holds.


The first and the second slope lemmas have an intuitive background, as follows.

When you drive past a mountain, you may have noticed that the road winds like a snake in an S shape. Why was the road built this way? The answer is simple: increasing the length decreases the slope. The longer the road is, the smaller its slope. But have you ever thought that as its length approaches infinity, its slope approaches zero? This is what the first slope lemma states.

Consider a path going up a mountain. The average slope is the ratio of the height increment to the length of the projection of the path onto the level plane. Clearly, the average slope of any path cannot exceed the slope of the direct path connecting the two endpoints. This is what the second slope lemma states.
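The mountain-road picture can be checked numerically. The sketch below is only an illustration under assumed data, not part of the original argument: it takes a hypothetical linear height function h(p) = c·p on the level plane and a hypothetical switchback road `switchback(n_turns)` between two fixed endpoints. For each path it computes the average slope (height increment divided by path length) and shows that the average slope never exceeds that of the direct path and shrinks toward zero as the road gets longer.

```python
import math


def average_slope(path, c):
    """Average slope of a planar path under the linear height h(p) = c . p.

    The rise is the height increment between the endpoints (for a linear h it
    depends only on the endpoints); the run is the length of the path itself,
    i.e. of its projection onto the level plane.
    """
    rise = sum(ci * (end - start) for ci, start, end in zip(c, path[0], path[-1]))
    run = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    return rise / run


def switchback(n_turns, width=3.0, height=10.0):
    """Hypothetical S-shaped road from (0, 0) to (0, height) with n_turns legs."""
    points = [(0.0, 0.0)]
    for k in range(1, n_turns):
        points.append((width * (-1) ** k, height * k / n_turns))
    points.append((0.0, height))
    return points


c = (0.0, 1.0)  # height grows along the second coordinate
direct = average_slope(switchback(1), c)  # n_turns = 1 is the straight path
slopes = [average_slope(switchback(n), c) for n in (2, 10, 100, 1000)]
print(direct, slopes)
```

Because the height function is linear, the rise is the same for every path, so the comparison isolates the effect of the path length.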

The power of the first slope lemma is surprising. It can show that a large class of feasible direction methods share the following convergence property: if the generated sequence of points has a cluster point but does not converge to it, then every cluster point of the sequence is a Kuhn-Tucker point. It is interesting to point out that Wolfe [90] showed by a counterexample that Zoutendijk’s method [4,98] can generate a sequence of points converging to a point which is not a Fritz John point. However, with the first slope lemma, we can still show that if this method generates a sequence of points which does not converge, then every cluster point of the sequence is a Fritz John point.


Powell’s Conjecture 


Consider the following unconstrained optimization problem:

$$\max f(x),$$

where $f$ is continuously differentiable on $\mathbb{R}^n$.

The first quasi-Newton method was proposed by W.C. Davidon [16] in 1959, but his work was published only nine years later [17]. Public attention to Davidon’s work was largely due to its presentation by R. Fletcher and M.J.D. Powell [41] in 1963. The method is called the DFP (Davidon-Fletcher-Powell) method.


Choose an initial point $x_1$ and an initial positive definite symmetric matrix $H_1$. Set $k = 1$.

1 Compute $g_k$. If $g_k = 0$, then stop; else, go to step 2.

2 If $k > 1$, then compute a positive definite symmetric matrix $H_k$ with the updating formula

$$H_k = H_{k-1} - \frac{\sigma_{k-1}\sigma_{k-1}^T}{\sigma_{k-1}^T y_{k-1}} - \frac{H_{k-1} y_{k-1} (H_{k-1} y_{k-1})^T}{y_{k-1}^T H_{k-1} y_{k-1}},$$

where $y_{k-1} = g_k - g_{k-1}$ and $\sigma_{k-1} = x_k - x_{k-1}$. Compute a search direction $d_k = H_k g_k$. Set $x_{k+1} = x_k + \theta d_k$ and $k := k + 1$, where $\theta$ is chosen by a line search. Go to step 1.

DFP method
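For illustration, the sketch below runs the DFP steps above on a concave quadratic f(x) = -½xᵀAx + bᵀx with A symmetric positive definite, where the exact line search has the closed form θ = g_kᵀd_k / (d_kᵀA d_k). The quadratic test function, the choice H_1 = I, the tolerance and the helper names (`dot`, `matvec`, `dfp_maximize_quadratic`) are assumptions made for the example. In this maximization setting σᵀy < 0, so the minus sign in front of the first correction term adds a positive semidefinite matrix and H_k stays positive definite.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def dfp_maximize_quadratic(A, b, x1, tol=1e-10, max_iter=100):
    """DFP method with exact line search for max f(x) = -0.5 x'Ax + b'x.

    A must be symmetric positive definite, so f is strictly concave.
    Returns the final point and the list of iterates x_1, x_2, ...
    """
    n = len(x1)

    def grad(z):  # g(x) = b - A x, the gradient of f
        return [bi - ai for bi, ai in zip(b, matvec(A, z))]

    x = [float(v) for v in x1]
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # H_1 = identity
    g = grad(x)
    iterates = [x]
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 <= tol:  # step 1: g_k = 0 (numerically), stop
            break
        d = matvec(H, g)  # search direction d_k = H_k g_k
        theta = dot(g, d) / dot(d, matvec(A, d))  # exact line search on a quadratic
        x_new = [xi + theta * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        sigma = [p - q for p, q in zip(x_new, x)]  # sigma_k = x_{k+1} - x_k
        y = [p - q for p, q in zip(g_new, g)]      # y_k = g_{k+1} - g_k
        Hy = matvec(H, y)
        sy, yHy = dot(sigma, y), dot(y, Hy)        # sy < 0 for concave f
        # Step 2 update (done right after the step, giving H_{k+1}):
        # H - sigma sigma^T / (sigma^T y) - Hy (Hy)^T / (y^T H y)
        H = [[H[i][j] - sigma[i] * sigma[j] / sy - Hy[i] * Hy[j] / yHy
              for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
        iterates.append(x)
    return x, iterates


A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
x_star, path = dfp_maximize_quadratic(A, b, [0.0, 0.0])
print(x_star, len(path))  # the maximizer solves Ax = b, i.e. (0.2, 0.4)
```

On this two-variable quadratic the method reaches the maximizer after two steps, which is the quadratic termination property mentioned below.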


Since then, many variations and generalizations of quasi-Newton methods [6,7,44,49,84,91] have appeared in the literature. There are three issues concerning the convergence of quasi-Newton methods: quadratic termination, global convergence, and convergence rate. All quasi-Newton methods have quadratic termination under certain conditions [8,9]. However, global convergence is a difficult problem for quasi-Newton methods. In 1971 Powell [69] established the first convergence theorem: if the objective function is twice continuously differentiable and uniformly concave, then the DFP method with exact line search generates a sequence convergent to a stationary point.
At about the same time, L.C.W. Dixon [20,21] found that all methods in Broyden’s family with exact line search, started from the same initial point and initial matrix, actually generate an identical sequence. This discovery extended Powell’s result to all members of Broyden’s family. It is interesting to point out that although Powell obtained a convergence theorem in 1971, he did not believe that the DFP method is globally convergent in general. Thus, in 1971 he conjectured that there exists a twice continuously differentiable objective function for which the DFP method generates a sequence converging to a point that is not stationary. However, one year later he changed his mind. In fact, he worked on his own conjecture for eighteen months but did not find such a counterexample. Instead, he proved [70] that his 1971 conjecture is false for objective functions of two variables, and that the uniform concavity in his convergence theorem can be replaced by concavity and an upper bound on the objective function.
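Dixon’s observation can be checked on a small example. The sketch assumes the standard one-parameter Broyden family in inverse form, written in this article’s maximization convention as H_φ = H_DFP + φ (yᵀHy) w wᵀ with w = Hy/(yᵀHy) − σ/(σᵀy), so that φ = 0 gives DFP and φ = 1 gives BFGS. The concave quadratic, the starting point, H_1 = I, the values of φ and the function name `broyden_iterates` are illustrative choices; with exact line search the iterates of all three members agree up to rounding.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def broyden_iterates(A, b, x1, phi, steps):
    """Iterates of a Broyden-family method (parameter phi) with exact line
    search for max f(x) = -0.5 x'Ax + b'x, starting from x1 with H_1 = I."""
    n = len(x1)

    def grad(z):  # g(x) = b - A x
        return [bi - ai for bi, ai in zip(b, matvec(A, z))]

    x = [float(v) for v in x1]
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    g, out = grad(x), [x]
    for _ in range(steps):
        if dot(g, g) ** 0.5 <= 1e-12:
            break
        d = matvec(H, g)
        theta = dot(g, d) / dot(d, matvec(A, d))  # exact line search
        x_new = [xi + theta * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [p - q for p, q in zip(x_new, x)]
        y = [p - q for p, q in zip(g_new, g)]
        Hy = matvec(H, y)
        sy, yHy = dot(s, y), dot(y, Hy)
        w = [hy / yHy - si / sy for hy, si in zip(Hy, s)]
        H = [[H[i][j] - s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy  # DFP part
              + phi * yHy * w[i] * w[j]                         # Broyden term
              for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
        out.append(x)
    return out


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
runs = {phi: broyden_iterates(A, b, [0.0, 0.0, 0.0], phi, 3) for phi in (0.0, 0.5, 1.0)}
for phi, pts in runs.items():
    print(phi, pts[-1])
```

A quadratic is used only because its exact line search has a closed form; Dixon’s result itself is not restricted to quadratics.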


Theorem 9 Let $f$ be a twice continuously differentiable concave real function on $\mathbb{R}^n$. Suppose that the level set $\{x : f(x) \ge f(x_1)\}$ is bounded. Then the DFP method with exact line search and initial point $x_1$ either stops at the maximum or generates an infinite sequence whose function values converge to the maximum value of $f(x)$.


Therefore, since 1972, the following has been regarded as Powell’s conjecture:


Conjecture 10 (Powell’s conjecture) Suppose the objective function is continuously differentiable. Then every cluster point of the sequence generated by the DFP method with exact line search is a stationary point.

D. Pu and W. Yu [77] made important progress on Powell’s conjecture. They showed the following.


Theorem 11 Suppose the DFP method generates an infinite sequence $\{x_k\}$ converging to $x^*$. If the objective function $f$ belongs to the class $C^{1,1}$, that is, there exists a constant $L$ such that for every $x$ and $y$

$$\|g(x) - g(y)\| \le L \|x - y\|,$$

then

$$\liminf_{k \to \infty} \|g_k\| = 0.$$


Powell [72] also proved a global convergence theorem for the BFGS method with inexact line search (Wolfe test) for twice continuously differentiable concave objective functions with an upper bound. Ritter [81] tried to generalize Powell’s result to the whole Broyden family; however, the restriction on the step size in his line search limits the scope of his result. R.H. Byrd, J. Nocedal, and Y. Yuan [12] successfully generalized Powell’s result to all members of Broyden’s family except the DFP method. Whether the DFP method with the Wolfe test has the same global convergence property is a very interesting open problem. With a modified Wolfe test and a suitably smooth objective function, Pu [76] showed that if a member of Broyden’s family generates a convergent sequence, then the sequence converges to a stationary point.

For the convergence rate, Powell [69] showed that if the objective function is uniformly concave and its Hessian matrix satisfies a Lipschitz condition, then the DFP method with exact line search is superlinearly convergent. By Dixon’s theorem, this extends to all members of Broyden’s family. Similarly, Powell [72] showed that if the objective function is uniformly concave and its Hessian matrix satisfies a Lipschitz condition, then the BFGS method with the Wolfe test is superlinearly convergent. Byrd, Nocedal, and Yuan [12] generalized this result to Broyden’s family, except the DFP method, under the condition that the Hessian matrix is Hölder continuous. Pu [75] showed that with the modified Wolfe test, the DFP method is one-step superlinearly convergent.

Combining Rosen’s method with variable metric methods, Goldfarb [42] and Murtagh and Sargent [63] obtained two efficient algorithms. Powell [71] showed that these two algorithms are actually equivalent. The convergence of such algorithms is still an open problem; however, several convergent variations [55,59,79,80] have been established. Convergence results for quasi-Newton methods for nonlinearly constrained optimization can be found in [15,65,73,74].


See also 


> Equality-constrained Nonlinear Programming: KKT 
Necessary Optimality Conditions 

> First Order Constraint Qualifications 

> Inequality-constrained Nonlinear Optimization 

> Kuhn-Tucker Optimality Conditions 

> Lagrangian Duality: Basics 


> Saddle Point Theory and Optimality Conditions

> Second Order Constraint Qualifications

> Second Order Optimality Conditions for Nonlinear Optimization

> Successive Quadratic Programming: Full Space Methods


References 


1. Allen E, Helgason R, Kennington J (1987) A generalization of Polyak’s convergence result for subgradient optimization. Math Program 37:309-318
2. Armijo L (1966) Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J Math 16:1-3
3. Avriel M (1976) Nonlinear programming: Analysis and methods. Prentice-Hall, Englewood Cliffs
4. Bazaraa MS, Shetty CM (1979) Nonlinear programming: Theory and algorithms. Wiley, New York
5. Bland RG (1977) New finite pivoting rules for the simplex method. Math Oper Res 2(2):103-107
6. Broyden CG (1965) A class of methods for solving nonlinear simultaneous equations. Math Comput 19:577-593
7. Broyden CG (1967) Quasi-Newton methods and their application to function minimization. Math Comput 21:368-381
8. Broyden CG (1970) The convergence of a class of double-rank minimization algorithms 1: general consideration. J Inst Math Appl 6:76-90
9. Broyden CG (1970) The convergence of a class of double-rank minimization algorithms 2: the new algorithms. J Inst Math Appl 6:222-231
10. Buckley A (1975) An alternate implementation of Goldfarb’s minimization algorithm. Math Program 8:207-231
11. Byrd RH (1985) An example of irregular convergence in some constrained optimization methods that use the projected Hessian. Math Program 32:232-237
12. Byrd RH, Nocedal J, Yuan Y (1987) Global convergence of a class of quasi-Newton methods on convex problems. SIAM J Numer Anal 24:1171-1190
13. Byrd RH, Shultz GA. A practical class of globally convergent active set strategies for linearly constrained optimization. Manuscript (unpublished)
14. Calamai PH, Moré JJ (1987) Projected gradient methods for linearly constrained problems. Math Program 39:93-116
15. Coleman TF, Conn AR (1984) On the local convergence of a quasi-Newton method for the nonlinear programming problem. SIAM J Numer Anal 21:755-769
16. Davidon WC (1959) Variable metric method for minimization. AEC Res Developm Report ANL-5990, Nov
17. Davidon WC (1968) Variable metric method for minimization. Comput J 10:406-410
18. Dax A (1978) The gradient projection method for quadratic programming. Report, Inst Math, Hebrew Univ, Jerusalem
19. Dembo RS, Klincewicz JG (1985) Dealing with degeneracy in reduced gradient algorithms. Math Program 31:257-363
20. Dixon LCW (1972) Quasi-Newton algorithms generate identical points. Math Program 2:383-387
21. Dixon LCW (1972) Quasi-Newton algorithms generate identical points II, the proof of four new theorems. Math Program 3:345-358
22. Du D-Z (1983) A modification of Rosen-Polak’s algorithm. Kexue Tongbao 28:301-305
23. Du D-Z (1985) Changing of point-to-set maps and point-to-set map families for continuities. Acta Math Applic Sinica 8(2):142-150, in Chinese
24. Du D-Z (1985) A family of gradient projection algorithms. Acta Math Applic Sinica, English Ser 2:1-13
25. Du D-Z (1985) A gradient projection algorithm for convex programming with nonlinear constraints. Acta Math Applic Sinica 8:7-16, in Chinese
26. Du D-Z (1987) Remarks on the convergence of Rosen’s gradient projection method. Acta Math Applic Sinica, English Ser 3:270-279
27. Du D-Z (1988) Gradient projection methods in linear and nonlinear programming. Hadronic Press, Palm Harbor
28. Du D-Z (1991) Convergence theory of feasible direction methods. Sci Press, Marrickville, Australia
29. Du D-Z (1992) Rosen’s method and slope lemmas. In: Pardalos PM (ed) Advances in Optimization and Parallel Computing. Elsevier, Amsterdam, pp 68-84
30. Du D-Z, Sun J, Song T-T (1980) A counterexample for Rosen’s gradient projection method. Math Assortment 1:4-6, in Chinese
31. Du D-Z, Du X-F. A convergent reduced gradient algorithm without using special pivot. Math Numer Sinica, to appear
32. Du D-Z, Sun J (1983) A new gradient projection method. Math Numer Sinica 4:378-386, in Chinese
33. Du D-Z, Sun J, Song T-T (1984) Simplified finite pivoting processes in the reduced gradient algorithms. Acta Math Applic Sinica 7:142-146, in Chinese
34. Du D-Z, Wu F, Zhang X-S. On Rosen’s gradient projection methods. Ann Oper Res, to appear
35. Du D-Z, Zhang X-S (1986) A convergence theorem of Rosen’s gradient projection method. Math Program 36:135-144
36. Du D-Z, Zhang X-S (1989) Global convergence of Rosen’s gradient projection methods. Math Program 44
37. Du D-Z, Zhang X-S (1989) Notes on a new gradient projection method. System Sci and Math Sci 2
38. Dunn JC (1981) Global and asymptotic convergence rate estimates for a class of projected gradient processes. SIAM J Control Optim 19:368-400
39. Dunn JC (1987) On the convergence of projected gradient processes to singular critical point. J Optim Th Appl 55:203-216
40. Fletcher R (1987) Practical methods of optimization, unconstrained optimization. Wiley, New York
41. Fletcher R, Powell MJD (1963) A rapidly convergent descent method for minimization. Comput J 6:163-168
42. Goldfarb D (1969) Extension of Davidon’s variable metric method to maximization under linear inequality and equality constraints. SIAM J Appl Math 17:739-764
43. Goldfarb D (1970) A family of variable metric methods derived by variational means. Math Comput 24:23-26
44. Greenstadt J (1970) Variations on variable-metric methods. Math Comput 24:1-22
45. Gui X-Y, Du D-Z (1984) A superlinearly convergent method to linearly constrained optimization problems under degeneracy. Acta Math Applic Sinica, English Ser 1:76-84
46. Haraux A (1977) How to differentiate the projection on a convex set in Hilbert space: Some applications to variational inequalities. J Math Soc Japan 20:615-631
47. He G-Z (1990) Proof of convergence of Rosen’s gradient projection method. Acta Math Applic Sinica 1
48. Hu X-D (1989) Nonlinear programming: A unified approach and the upper bound of partial concentrator. PhD Thesis, Inst Appl Math, Chinese Acad Sci
49. Huang HY (1970) Unified approach to quadratically convergent algorithms for function minimization. J Optim Th Appl 5:405-423
50. Huard P (1975) Optimization algorithms and point-to-set maps. Math Program 8:308-331
51. Huard P (1979) Extensions of Zangwill’s theorem. Math Program Stud 10:98-103
52. Kalfon P, Ribiere G, Sogno JC (1969) A method of feasible directions using projection operators. Proc IFIP Congr 68: Information Processing 1. North-Holland
53. Karmarkar N (1984) A new polynomial-time algorithm for linear programming. Combinatorica 4:373-395
54. Klee V, Minty GJ (1972) How good is the simplex algorithm? In: Shisha O (ed) Inequalities III. Acad Press, New York
55. Kwei H-Y, Gui X-Y, Wu F, Lai Y-L (1979) Extension of a variable metric algorithm to a linearly constrained optimization problem: A variation of Goldfarb’s algorithm. In: Haley KB (ed) O.R. 1978. North-Holland, Amsterdam
56. Levitin ES, Polyak BT (1966) Constrained minimization methods. USSR Comput Math Math Phys 6(5):1-50
57. Luenberger DG (1973) Introduction to linear and nonlinear programming. Addison-Wesley, Reading
58. McCormick GP (1969) Anti-zigzagging by bending. Managem Sci 15:315-320
59. McCormick GP (1990) A second order method for the linearly constrained nonlinear programming problem. In: Rosen JB, Mangasarian OL, Ritter K (eds) Nonlinear Programming. Acad Press, New York
60. McCormick GP, Tapia RA (1972) The gradient projection method under mild differentiability conditions. SIAM J Control Optim 10:93-98
61. Meyer RR (1970) The validity of a family of optimization methods. SIAM J Control Optim 8:41-54
62. Meyer RR (1976) Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J Comput Syst Sci 12:108-121
63. Murtagh BA, Sargent RWH (1969) A constrained minimization method with quadratic convergence. In: Fletcher R (ed) Optimization. Acad Press, New York
64. Murtagh BA, Saunders MA (1978) Large-scale linearly constrained optimization. Math Program 14:14-72
65. Nocedal J, Overton ML (1985) Projected Hessian updating algorithms for nonlinearly constrained optimization. SIAM J Numer Anal 22:821-850
66. Polak E (1969) On the convergence of optimization algorithms. Revue Franc Inform Rech Oper 3:17-34
67. Polak E (1971) Computational methods in optimization. Acad Press, New York
68. Pollak HO (1978) Some remarks on the Steiner problem. J Combin Th A 24:278-295
69. Powell MJD (1971) On the convergence of the variable metric algorithm. J Inst Math Appl 7:21-36
70. Powell MJD (1972) Some properties of the variable metric algorithm. In: Lootsma FA (ed) Numerical Methods for Nonlinear Optimization. Acad Press, New York
71. Powell MJD (1974) Unconstrained minimization and extensions for constraints. In: Hammer PL, Zoutendijk G (eds) Mathematical Programming Theory and Practice. North-Holland, Amsterdam, pp 31-79
72. Powell MJD (1976) Some global convergence properties of a variable metric algorithm for minimization without exact line searches. In: Cottle RW, Lemke CE (eds) Nonlinear Programming. Amer Math Soc, Providence
73. Powell MJD (1978) The convergence of variable metric methods for nonlinearly constrained optimization calculations. In: Mangasarian O, Meyer R, Robinson S (eds) Nonlinear Programming 3. Acad Press, New York, pp 27-63
74. Powell MJD, Yuan Y (1991) A trust region algorithm for equality constrained optimization. Math Program 49:189-211
75. Pu D (1992) A class of DFP algorithm without exact linear search. Asia-Pacific J Oper Res 9:207-220
76. Pu D (1997) The convergence of Broyden algorithms without convexity assumption. System Sci and Math Sci 10:289-298
77. Pu D, Yu W (1988) On convergence property of DFP algorithm. J Qufu Normal Univ 14(3):63-69
78. Ritter K (1973) A superlinearly convergent method for minimization problems with linear inequality constraints. Math Program 4:44-71
79. Ritter K (1975) A method of conjugate directions for linearly constrained nonlinear programming problems. SIAM J Numer Anal 12:272-303
80. Ritter K (1980) Convergence and superlinear convergence of algorithms for linearly constrained minimization problems. In: Dixon LCW, Spedicato E, Szego GP (eds) Nonlinear Optimization: Theory and Algorithms II. Birkhauser, Basel, pp 221-251
81. Ritter K (1981) Global and superlinear convergence of a class of variable metric methods. Math Program Stud 14:178-205
82. Rosen JB (1960) The gradient projection method for nonlinear programming, Part I: linear constraints. SIAM J Appl Math 8:181-217
83. Rosen JB (1961) The gradient projection method for nonlinear programming, Part II: nonlinear constraints. SIAM J Appl Math 9:514-553
84. Shanno DF (1970) Conditioning of quasi-Newton methods for function minimization. Math Comput 24:647-656
85. Wang CY (1981) Simplifications of a new pivoting rule and Levitin-Polyak gradient projection method and their convergent property. Acta Math Applic Sinica 4:37-52, in Chinese
86. Wang CY (1983) On convergence property of an improved reduced gradient method. Kexue Tongbao 28:577-582
87. Wolfe P (1963) Methods of nonlinear programming. In: Graves RL, Wolfe P (eds) Recent Advances in Mathematical Programming
88. Wolfe P (1969) Convergence conditions for ascent methods. SIAM Rev 11:226-235
89. Wolfe P (1970) Convergence theory in nonlinear programming. In: Abadie J (ed) Integer and Nonlinear Programming. North-Holland, Amsterdam
90. Wolfe P (1972) On the convergence of gradient methods under constraints. IBM J Res Developm 16:407-411
91. Wu F, Gui X (1981) A class of variable metric methods with n+1 parameters. Acta Math 24:921-930
92. Yue M, Han J (1979) A new reduced gradient method. Sci Sinica 22:1099-1113
93. Yue M, Han J (1984) A unified approach to the feasible direction methods for nonlinear programming with linear constraints. Acta Math Applic Sinica, English Ser 1:63-73
94. Zangwill WI (1969) Convergence conditions for nonlinear programming algorithms. Managem Sci 16:1-13
95. Zangwill WI (1969) Nonlinear programming: A unified approach. Prentice-Hall, Englewood Cliffs
96. Zhang X-S (1979) An improved Rosen-Polak method. Acta Math Applic Sinica 2:257-267, in Chinese
97. Zhang X-S (1985, 1987) On the convergence of Rosen’s gradient projection method. Acta Math Applic Sinica 8:125-128, in Chinese. Also in: English Ser 3:280-288
98. Zoutendijk G (1960) Methods of feasible directions. Elsevier, Amsterdam




Saddle Point Theory 
and Optimality Conditions 


JORGEN TIND 
University Copenhagen, Copenhagen, Denmark 


MSC2000: 90C06 


Article Outline 


Keywords 

Saddle Points 

Mathematical Programming 
Convex Programming 

See also 

References 


Keywords 


Optimization; Saddle point 


The notion of a saddle point is a fundamental concept in many areas of science and economics. A classical instance is the famous saddle point theorem for zero-sum matrix games due to J. von Neumann [2]. Here we shall emphasize the utility of saddle points in the context of optimization.


Saddle Points 


Let us first recall some fundamental observations about saddle points and duality. Consider a function $K(x, y)\colon X \times Y \to \mathbb{R}$, where $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$. We say that $(x_0, y_0) \in X \times Y$ is a saddle point of $K$ on the sets $X$ and $Y$ if

$$K(x, y_0) \le K(x_0, y_0) \le K(x_0, y) \qquad (1)$$

for all $x \in X$ and $y \in Y$.
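As a small illustration of definition (1), take X and Y to be the finite index sets of the rows and columns of a matrix, and K(i, j) its (i, j) entry, i.e. a zero-sum matrix game restricted to pure strategies (an assumption made only for this example; the function name `pure_saddle_points` is hypothetical). A pair (i0, j0) is then a saddle point exactly when its entry is the largest in its column and the smallest in its row:

```python
def pure_saddle_points(K):
    """All (i0, j0) with K[i][j0] <= K[i0][j0] <= K[i0][j] for every i, j.

    Rows play the role of x (maximized), columns the role of y (minimized),
    matching the orientation of definition (1).
    """
    points = []
    for i0, row in enumerate(K):
        for j0, value in enumerate(row):
            column = [K[i][j0] for i in range(len(K))]
            if value == max(column) and value == min(row):
                points.append((i0, j0))
    return points


print(pure_saddle_points([[3, 1, 4], [2, 2, 5], [0, 1, 6]]))  # [(1, 1)]
print(pure_saddle_points([[1, -1], [-1, 1]]))                  # []
```

The second matrix ("matching pennies") has no saddle point in pure strategies, which is why von Neumann’s theorem is stated for mixed strategies.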


Consider next the following two programs:

$$z = \max_{x \in X} \inf_{y \in Y} K(x, y) \qquad (2)$$

and

$$w = \min_{y \in Y} \sup_{x \in X} K(x, y). \qquad (3)$$

Since one program is derived from the other by interchanging the order of the two optimization operators, they constitute a pair of so-called dual programs.

We say that $x_0 \in X$ is an optimal solution of program (2) if $\inf_{y \in Y} K(x_0, y) = \max_{x \in X} \inf_{y \in Y} K(x, y)$. Similarly, $y_0 \in Y$ is an optimal solution of program (3) if $\min_{y \in Y} \sup_{x \in X} K(x, y) = \sup_{x \in X} K(x, y_0)$. Observe that


$$\inf_{y \in Y} K(x_0, y) \le K(x_0, y_0) \le \sup_{x \in X} K(x, y_0) \qquad (4)$$

for any $(x_0, y_0) \in X \times Y$. Taking the supremum over $x_0$ on the left and the infimum over $y_0$ on the right, this implies that we always have so-called weak duality: $z \le w$. We speak of strong duality when optimal solutions $x_0$ and $y_0$ exist for programs (2) and (3), respectively, such that $z = w$. If so, the inequalities of (4) turn into equations. In this case $\sup_{x \in X} K(x, y_0)$ is attained at $x_0$, so that the sup operator can naturally be replaced by the max operator. The same holds for the inf operator, and we obtain

$$\min_{y \in Y} K(x_0, y) = K(x_0, y_0) = \max_{x \in X} K(x, y_0).$$

But this expression is equivalent to the definition of a saddle point.

This leads us to the result that $(x_0, y_0)$ is a saddle point if and only if $x_0$ and $y_0$ are optimal solutions of the programs (2) and (3), respectively, with equal values, i.e., $z = w$.
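In the same finite setting (a matrix K whose rows play the role of x and whose columns play the role of y, an illustrative assumption), the values z and w of programs (2) and (3) can be computed directly. The sketch below, with hypothetical helper names `max_min`, `min_max` and `is_saddle`, checks weak duality z ≤ w on random integer matrices and confirms that z = w holds exactly when an optimal row and an optimal column form a saddle point:

```python
import random


def max_min(K):
    """z of program (2) and an optimal row: best row against the worst column."""
    row_values = [min(row) for row in K]
    i0 = max(range(len(K)), key=row_values.__getitem__)
    return row_values[i0], i0


def min_max(K):
    """w of program (3) and an optimal column: best column against the worst row."""
    col_values = [max(K[i][j] for i in range(len(K))) for j in range(len(K[0]))]
    j0 = min(range(len(col_values)), key=col_values.__getitem__)
    return col_values[j0], j0


def is_saddle(K, i0, j0):
    """Definition (1) for the pair (i0, j0)."""
    return all(K[i][j0] <= K[i0][j0] <= K[i0][j]
               for i in range(len(K)) for j in range(len(K[0])))


rng = random.Random(0)
for _ in range(1000):
    K = [[rng.randint(-5, 5) for _ in range(4)] for _ in range(3)]
    (z, i0), (w, j0) = max_min(K), min_max(K)
    assert z <= w                             # weak duality
    assert (z == w) == is_saddle(K, i0, j0)   # saddle point iff equal values
print("weak duality and the saddle point characterization hold on all samples")
```

Random sampling is of course no proof; it only exercises the two statements on many small instances.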
Mathematical Programming


We shall now study the particular situation where $K(x, y)$ is the Lagrange function of a mathematical programming problem. Let $f(x)\colon \mathbb{R}^n \to \mathbb{R}$ and $g(x)\colon \mathbb{R}^n \to \mathbb{R}^m$. Further let $K(x, y) = f(x) - y g(x) + y b$, where $b \in \mathbb{R}^m$, and $Y = \mathbb{R}^m_+$, the nonnegative orthant. In this setting program (2) becomes an ordinary mathematical programming problem stated in the following so-called primal form:

$$z_P = \left\{ \begin{array}{ll} \max & f(x) \\ \text{s.t.} & g(x) \le b \\ & x \in X. \end{array} \right. \qquad (5)$$


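The dual form of (5) is derived from program (3):

$$\min_{y \in \mathbb{R}^m_+} \sup_{x \in X} \; f(x) - y g(x) + y b,$$

or alternatively, with $u \in \mathbb{R}$, as

$$w_D = \left\{ \begin{array}{ll} \min & u + y b \\ \text{s.t.} & u + y g(x) \ge f(x), \quad \forall x \in X, \\ & u \in \mathbb{R},\; y \in \mathbb{R}^m_+. \end{array} \right. \qquad (6)$$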


Since the programs (5) and (6) are special cases of the dual programs (2) and (3), we get that optimal solutions $x_0$ and $y_0$ exist for (5) and (6), respectively, with $z_P = w_D$ if and only if $(x_0, y_0)$ is a saddle point for $K(x, y) = f(x) - y g(x) + y b$.

The condition (1) for a saddle point $(x_0, y_0)$ here reads as follows:

$$f(x) - y_0 g(x) + y_0 b \le f(x_0) - y_0 g(x_0) + y_0 b \le f(x_0) - y g(x_0) + y b \qquad (7)$$

for all $x \in X$ and $y \in \mathbb{R}^m_+$.

We shall show that in this particular case the condition for a saddle point can be restated in an alternative form, as the so-called optimality conditions:

i) $x_0$ maximizes $f(x) - y_0 g(x)$ over $X$;

ii) $y_0 (g(x_0) - b) = 0$;

iii) $y_0 \ge 0$; and

iv) $g(x_0) \le b$.

So assume that we have a saddle point $(x_0, y_0)$ satisfying (7). Then condition i) is implied by the left inequality of (7). If $g(x_0) \not\le b$, then by an appropriate choice of nonnegative elements of $y$ we can violate the right inequality of (7). This implies iv). Condition iii) is implied directly, since $y_0 \in \mathbb{R}^m_+$. Conditions iii) and iv) imply that $y_0 (g(x_0) - b) \le 0$. If this inequality were strict, then $y = 0$ would make the right-hand side of (7) smaller than the middle term, so $y_0$ would not minimize over $\mathbb{R}^m_+$, contradicting the right inequality of (7). This proves ii). Conversely, by similar arguments we can show that i)-iv) imply the saddle point condition (7).
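A concrete one-dimensional instance makes conditions i)-iv) and the saddle point inequalities (7) easy to check. The sketch assumes, purely for illustration, f(x) = -(x - 2)², g(x) = x, b = 1 and X = ℝ, for which x0 = 1 and y0 = 2 satisfy i)-iv). The maximization in i) and the inequalities in (7) are only tested on finite grids, so this is a numerical sanity check rather than a proof; the function names are hypothetical.

```python
def f(x):
    return -(x - 2.0) ** 2


def g(x):
    return x


b = 1.0


def K(x, y):
    """Lagrange function K(x, y) = f(x) - y g(x) + y b."""
    return f(x) - y * g(x) + y * b


def optimality_conditions(x0, y0, xs, tol=1e-12):
    """Conditions i)-iv); i) is only tested on the grid xs."""
    return (
        all(f(x) - y0 * g(x) <= f(x0) - y0 * g(x0) + tol for x in xs),  # i)
        abs(y0 * (g(x0) - b)) <= tol,                                   # ii)
        y0 >= 0,                                                        # iii)
        g(x0) <= b + tol,                                               # iv)
    )


def saddle_inequalities(x0, y0, xs, ys, tol=1e-12):
    """Both inequalities of (7), tested on the grids xs and ys (ys >= 0)."""
    k0 = K(x0, y0)
    return (all(K(x, y0) <= k0 + tol for x in xs)
            and all(k0 <= K(x0, y) + tol for y in ys))


xs = [i / 100 for i in range(-500, 501)]
ys = [j / 100 for j in range(0, 1001)]
print(optimality_conditions(1.0, 2.0, xs), saddle_inequalities(1.0, 2.0, xs, ys))
print(optimality_conditions(1.0, 1.0, xs), saddle_inequalities(1.0, 1.0, xs, ys))
```

With the wrong multiplier y0 = 1, condition i) fails (the Lagrangian is then maximized at x = 1.5), and so does the left inequality of (7), in line with the equivalence just proved.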

We shall next study the mathematical programming problem (5) and provide conditions leading directly to the existence of a saddle point, or equivalently to the satisfaction of the optimality conditions. In this context we shall study the perturbation function of the primal program (5), obtained by varying the right-hand side of the constraints. Let $d \in \mathbb{R}^m$. The perturbation function $\phi(d)$ is then defined as follows:

$$\phi(d) = \left\{ \begin{array}{ll} \max & f(x) \\ \text{s.t.} & g(x) \le d \\ & x \in X. \end{array} \right.$$


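Let $D = \{d \in \mathbb{R}^m : \exists x \in X \text{ s.t. } g(x) \le d\}$. Consider next the following program:

$$\begin{array}{ll} \min & u + y b \\ \text{s.t.} & u + y d \ge \phi(d), \quad \forall d \in D, \\ & u \in \mathbb{R},\; y \in \mathbb{R}^m_+. \end{array} \qquad (8)$$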


We shall show that (8) is equivalent to the dual program (6). Assume $(u, y)$ is a feasible solution of (8), i.e., it satisfies the constraints of (8). For $x \in X$ and $d = g(x)$ we then obtain $u + y g(x) = u + y d \ge \phi(d) = \phi(g(x)) \ge f(x)$. Hence $(u, y)$ is also feasible in (6). Conversely, assume that $(u, y)$ is feasible in (6). If $x \in X$ and $g(x) \le d$, we immediately get that $f(x) \le \phi(d)$ and moreover, since $y \ge 0$, that $u + y d \ge u + y g(x) \ge f(x)$. Taking the maximum over all such $x$ gives $u + y d \ge \phi(d)$, implying that $(u, y)$ is also feasible in (8). So, with equal sets of feasible solutions and the same objective function, the two programs are equivalent.

The dual program in the form (8) has a nice geometric interpretation. For a given set of coefficients $(u, y)$, the sum $u + y d$ is an affine function of $d \in \mathbb{R}^m$. The objective of (8) is to select the coefficients so that this function always lies above the perturbation function, but with the lowest possible value at the point $b$. (This confirms weak duality for the dual programs (5) and (6).) For an optimal solution of the primal programming problem (5) we thus additionally have strong duality if the affine function coincides with the perturbation function at the point $b$. In mathematical terms, the perturbation function $\phi$ then satisfies the following condition:

$$\phi(d) \le \phi(b) + y (d - b), \quad \forall d \in D, \qquad (9)$$

and we say that the perturbation function is superdifferentiable at $b$.
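To see inequality (9) at work, consider an assumed one-dimensional example not taken from the original text: f(x) = -(x - 2)², g(x) = x, X = ℝ, so every d is a feasible right-hand side and the perturbation function has the closed form φ(d) = -(d - 2)² for d < 2 and φ(d) = 0 for d ≥ 2. At b = 1 the optimality conditions hold with x0 = 1, y0 = 2, and since φ is differentiable there with φ′(1) = 2, only the slope y = 2 satisfies (9). The sketch (hypothetical helper `satisfies_9`) tests three candidate slopes on a grid of d values.

```python
def phi(d):
    """Perturbation function of max -(x - 2)^2 s.t. x <= d over X = R."""
    return 0.0 if d >= 2.0 else -(d - 2.0) ** 2


def satisfies_9(y, b, ds, tol=1e-12):
    """Check phi(d) <= phi(b) + y (d - b) for every d in the grid ds."""
    return all(phi(d) <= phi(b) + y * (d - b) + tol for d in ds)


ds = [k / 100 for k in range(-500, 1001)]
for y in (1.0, 2.0, 3.0):
    print(y, satisfies_9(y, 1.0, ds))
```

Geometrically, y = 2 gives the affine function u + yd (with u = φ(b) − yb) that lies above φ and touches it at b, which is the picture described above.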

There are significant cases in which this occurs. For example, if the perturbation function is finite and concave over the set $D$ of feasible right-hand sides, then it is superdifferentiable at points in the relative interior of $D$. Moreover, at the boundary the perturbation function fails to be superdifferentiable if and only if it has a directional derivative of value $+\infty$ at the selected point. This case rarely occurs, and if it does not happen the perturbation function is said to be stable.

These mainly geometrical observations can be treated more rigorously by applying some classical results about conjugate functions and supporting hyperplanes in convex analysis, see [3].


Convex Programming 


In this last section we assume that $X = \mathbb{R}^n_+$, that $f(x)$ is differentiable and concave, and that the components of $g(x)$ are convex and differentiable. These assumptions are fulfilled in many applications. In this case the perturbation function is finite and concave on $\mathbb{R}^m$. We also assume that the perturbation function is stable. By the observations above we then get the following fundamental property: if an optimal solution $x_0$ exists for the primal program (5), then a vector $y_0$ exists such that $(x_0, y_0)$ constitutes a saddle point (or, equivalently, such that $(x_0, y_0)$ satisfies the optimality conditions).

Moreover, by the assumptions made on convexity and differentiability, the optimality conditions can be restated as the famous Karush-Kuhn-Tucker conditions [1]:

• $\nabla_x f(x_0) - y_0 \nabla_x g(x_0) \le 0$;
• $(\nabla_x f(x_0) - y_0 \nabla_x g(x_0))\, x_0 = 0$;
• $y_0 (g(x_0) - b) = 0$;
• $g(x_0) \le b$ and $(x_0, y_0) \ge 0$.
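These four conditions are straightforward to verify for a given candidate pair. The sketch assumes a hypothetical two-variable instance, maximize f(x) = -(x1 - 2)² - (x2 - 1)² subject to x1 + x2 ≤ 2 and x ∈ ℝ²₊, whose solution is x0 = (1.5, 0.5) with multiplier y0 = 1. It evaluates each condition with a small tolerance (the name `kkt_holds` is illustrative only).

```python
def grad_f(x):
    return [-2.0 * (x[0] - 2.0), -2.0 * (x[1] - 1.0)]


def g(x):
    return x[0] + x[1]


GRAD_G = [1.0, 1.0]  # gradient of the single linear constraint g
b = 2.0


def kkt_holds(x0, y0, tol=1e-9):
    # r = grad_x f(x0) - y0 grad_x g(x0), the gradient of the Lagrangian in x
    r = [df - y0 * dg for df, dg in zip(grad_f(x0), GRAD_G)]
    return (
        all(ri <= tol for ri in r)                              # r <= 0
        and abs(sum(ri * xi for ri, xi in zip(r, x0))) <= tol   # r x0 = 0
        and abs(y0 * (g(x0) - b)) <= tol                        # y0 (g(x0) - b) = 0
        and g(x0) <= b + tol                                    # g(x0) <= b
        and min(x0) >= -tol and y0 >= -tol                      # (x0, y0) >= 0
    )


print(kkt_holds([1.5, 0.5], 1.0))  # True
print(kkt_holds([1.0, 1.0], 1.0))  # False: r = (1, -1) has a positive entry
print(kkt_holds([2.0, 1.0], 0.0))  # False: g(x0) = 3 > b
```

Under the concavity and convexity assumptions of this section, passing this check is also sufficient for optimality of x0.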
Therefore, subject to the assumptions made, the Karush-Kuhn-Tucker conditions provide necessary and sufficient conditions for an optimal solution of the mathematical programming problem (5). This result has had a tremendous impact on the development of algorithms to solve mathematical programming problems. Moreover, if the perturbation function $\phi(d)$ is also differentiable at $b$, then it is approximated by the right-hand side of (9). This happens in many applications. Thus, for numerically small deviations $d - b$, the vector $y_0$ measures the marginal effect on the optimal value of the objective function of (5). Due to this property, $y_0$ is often called the vector of shadow prices, and as such it plays a central role in the discussion and interpretation of results obtained from mathematical programs, in particular when they model problems in economics.
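The shadow price interpretation can be checked numerically. The sketch assumes the hypothetical problem maximize -(x1 - 2)² - (x2 - 1)² subject to x1 + x2 ≤ d, x ≥ 0, whose multiplier at d = b = 2 is y0 = 1. It evaluates φ(d) by a ternary search along the binding constraint, which is valid for 0 ≤ d < 3, where the unconstrained maximizer (2, 1) is infeasible. It then compares a central difference of φ at b with y0. The step size h and the iteration count are arbitrary choices.

```python
def f(x1, x2):
    return -(x1 - 2.0) ** 2 - (x2 - 1.0) ** 2


def phi(d, iters=200):
    """Perturbation function for max f s.t. x1 + x2 <= d, x >= 0 (assumes d >= 0).

    For d >= 3 the unconstrained maximizer (2, 1) is feasible. For 0 <= d < 3
    the constraint binds, so phi(d) is the maximum of the concave function
    t -> f(t, d - t) over 0 <= t <= d, found here by ternary search.
    """
    if d >= 3.0:
        return 0.0
    lo, hi = 0.0, d
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1, d - m1) < f(m2, d - m2):
            lo = m1  # the maximum lies in [m1, hi]
        else:
            hi = m2  # the maximum lies in [lo, m2]
    t = (lo + hi) / 2.0
    return f(t, d - t)


b, y0, h = 2.0, 1.0, 1e-3
shadow_price = (phi(b + h) - phi(b - h)) / (2.0 * h)
print(shadow_price, y0)
```

The finite-difference slope of the optimal value agrees with the multiplier y0, which is exactly the marginal-value reading of y0 described above.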
A second order constraint qualification (SOCQ) is a condition which is imposed on the analytical description of the constraint set of an optimization problem and which usually involves second order approximations of the data functions. Second order constraint qualifications are essential for establishing second order optimality conditions, but they also play a certain role in the perturbation analysis of optimization problems. Roughly speaking, second order constraint qualifications establish a link between the geometry of the given set and certain kinds of second order approximations of the analytical data. Second order constraint qualifications are closely related to first order constraint qualifications (cf. » First Order Constraint Qualifications); in particular, see that article for notions such as LICQ, MFCQ, Robinson CQ and others.

Historically, SOCQs were first introduced in studies of second order optimality conditions for smooth nonlinear programs, see, e.g., [11,12,19,20]. Given twice continuously differentiable functions $g_i\colon \mathbb{R}^n \to \mathbb{R}$ ($i = 0, \ldots, r$), the smooth mathematical programming problem is

$$(P)\qquad \begin{array}{ll} \min & g_0(x) \\ \text{s.t.} & g_i(x) \le 0, \quad i \in I := \{1, \ldots, m\}, \\ & g_j(x) = 0, \quad j \in J := \{m+1, \ldots, r\}. \end{array} \qquad (1)$$


A (dual) second order necessary optimality condition of Fritz John type is the following: if $\bar{x}$ is a local minimizer of (P), then for every critical direction $d$, i.e., every $d$ satisfying

$$\langle Dg_i(\bar{x}), d\rangle \le 0, \quad i \in I^0_{\bar{x}}, \qquad \langle Dg_j(\bar{x}), d\rangle = 0, \quad j \in J,$$

where $I_{\bar{x}} := \{i \in I : g_i(\bar{x}) = 0\}$ and $I^0_{\bar{x}} := I_{\bar{x}} \cup \{0\}$, there exist multipliers $u_i \ge 0$ for $i \in I^0_{\bar{x}}$ and $u_j \in \mathbb{R}$ for $j \in J$, not all zero, such that

$$u_i \langle Dg_i(\bar{x}), d\rangle = 0, \quad i \in I^0_{\bar{x}}, \qquad \sum_{k \in I^0_{\bar{x}} \cup J} u_k\, Dg_k(\bar{x}) = 0, \qquad \sum_{k \in I^0_{\bar{x}} \cup J} u_k \langle d, D^2 g_k(\bar{x})\, d\rangle \ge 0, \qquad (2)$$

see [2,3,16,18]. A SOCQ comes into play if one asks when $u_0$ can be chosen as 1 in order to have an objective-independent condition of Kuhn-Tucker type. Suppose $d$ is a critical direction. Then, under the above assumptions, the multiplier $u_0$ in (2) can be chosen as 1 if the following condition holds (see [2]):
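• Ben-Tal SOCQ: $Dg_j(\bar x)$, $j \in J$, are linearly independent, and there exists some $h \ne 0$ such that

$$
\langle h, Dg_i(\bar x)\rangle + \langle d, D^2 g_i(\bar x)\, d\rangle < 0 \quad \text{for } i \in I(\bar x, d),
$$

and $\langle h, Dg_j(\bar x)\rangle + \langle d, D^2 g_j(\bar x)\, d\rangle = 0$ for $j \in J$, where $i \in I(\bar x, d)$ if and only if $i \in I_{\bar x}$ and $\langle Dg_i(\bar x), d\rangle = 0$.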
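For a fixed point $\bar x$ and critical direction $d$, both requirements of the Ben-Tal SOCQ are linear in $h$, so the condition can be tested by linear programming: maximize a slack $s \le 1$ subject to $\langle h, Dg_i(\bar x)\rangle + \langle d, D^2 g_i(\bar x) d\rangle + s \le 0$ on $I(\bar x, d)$ and the equations on $J$; the strict inequalities are solvable exactly when the optimal $s$ is positive. This is a minimal sketch assuming SciPy's `linprog` (HiGHS); the function name and the data layout (gradient rows plus precomputed curvature terms) are hypothetical, and the requirement $h \ne 0$ is not enforced separately.

```python
import numpy as np
from scipy.optimize import linprog


def ben_tal_socq_holds(grad_I, curv_I, grad_J, curv_J, tol=1e-9):
    """Sketch of a numerical test of the Ben-Tal SOCQ for one critical direction d.

    grad_I: array (k, n), rows Dg_i(x_bar) for i in I(x_bar, d)
    curv_I: length k, entries <d, D^2 g_i(x_bar) d>
    grad_J: array (q, n), rows Dg_j(x_bar) for j in J
    curv_J: length q, entries <d, D^2 g_j(x_bar) d>
    Pass arrays with zero rows when I(x_bar, d) or J is empty.
    """
    grad_I = np.asarray(grad_I, dtype=float)
    grad_J = np.asarray(grad_J, dtype=float)
    k, n = grad_I.shape
    q = grad_J.shape[0]

    # First requirement: the gradients Dg_j(x_bar), j in J, are linearly independent.
    if q > 0 and np.linalg.matrix_rank(grad_J) < q:
        return False

    # Variables z = (h, s): maximize the slack s (minimize -s), capped at 1.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    bounds = [(None, None)] * n + [(None, 1.0)]

    # <h, Dg_i> + s <= -<d, D^2 g_i d>  for i in I(x_bar, d)
    A_ub = np.hstack([grad_I, np.ones((k, 1))]) if k else None
    b_ub = -np.asarray(curv_I, dtype=float) if k else None
    # <h, Dg_j> = -<d, D^2 g_j d>  for j in J
    A_eq = np.hstack([grad_J, np.zeros((q, 1))]) if q else None
    b_eq = -np.asarray(curv_J, dtype=float) if q else None

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    # The strict inequalities are solvable exactly when the optimal slack is positive.
    return res.status == 0 and -res.fun > tol
```

For instance, for $g_1(x) = x_1^2 + x_2^2 - 1 \le 0$ at $\bar x = (1, 0)$ with $d = (0, 1)$, one passes `grad_I = [[2, 0]]` and `curv_I = [2]`, and any $h$ with $h_1 < -1$ works. For $g_1(x) = x_1^2 \le 0$ at the origin with $d = (1, 0)$, the gradient vanishes and no $h$ exists.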
A similar SOCQ plays a role in infinite-dimensional settings [3]. If MFCQ holds at a local minimizer $\bar x$ of (P), then the Ben-Tal SOCQ is fulfilled at $\bar x$ for each critical direction $d$ [2,3,25]. The Ben-Tal SOCQ does not guarantee that the same multiplier vector can be taken such that (2) is satisfied (with $u_0 = 1$) for each critical direction. However, under LICQ this so-called strong necessary condition holds, see [2]. For other SOCQs implying the strong necessary condition, see [12]. Classical textbooks like [11,19] often apply the

• McCormick SOCQ at $\bar x \in M$: Any vector $d$ satisfying $\langle Dg_i(\bar x), d\rangle = 0$ for $i \in I_{\bar x} \cup J$ is the tangent of an arc $\alpha(\theta)$, twice differentiable, along which $g_i(\alpha(\theta)) = 0$ for all $i \in I_{\bar x} \cup J$, where $\theta \in [0, \varepsilon]$, $\varepsilon > 0$. Here $M$ denotes the constraint set of (P).

If McCormick's SOCQ, together with a first order CQ, holds, then a strong necessary condition is fulfilled in a much weaker form, namely, only for critical directions $d$ that additionally satisfy $\langle Dg_i(\bar x), d\rangle = 0$ for all $i \in I_{\bar x}$. Note that LICQ implies the McCormick SOCQ [11], but the Kuhn–Tucker (first order) CQ does not imply the McCormick SOCQ [11].

While strong necessary optimality conditions and 
the corresponding (restrictive) SOCQs rely on assump- 
tions ensuring that active inequalities may be han- 
dled as equations, the approach which goes back to 
A. Ben-Tal [2,3] is based on verification of optimality
along curves that have a second order expansion, so- 
called parabolic curves. On the one hand, the concept 
of parabolic curves and related second order tangent 
sets requires only weak and ‘natural’ constraint qualifi- 
cations, on the other hand, it leads to explicit second or- 
der optimality conditions involving second order terms 
of the original data. 

This approach was continued in the remarkable paper [9] for problems of the type

$$
(P)\qquad \min\ f(x) \quad \text{s.t. } G(x) \in C
\tag{3}
$$

(with smooth abstract constraints), where $G$ maps a Banach space $X$ into a Banach space $Y$, $f$ maps $X$ into $\mathbb{R}$, $f$ and $G$ are twice continuously differentiable, and $C$ is a nonempty closed convex subset of $Y$. The following discussion of the role of SOCQs in this context follows [5,6,9,21,24].

Assume $X = \mathbb{R}^n$. Define the Lagrange function by $L(x, u) := f(x) + \langle u, G(x)\rangle$. The first order tangent set $T_C(x)$ to $C$ at $x$ consists of all directions $p$ satisfying $\operatorname{dist}(x + tp, C) = o(t)$; the second order tangent set $T_C^2(x, p)$ to $C$ at $x$ in direction $p$ consists of all directions $q$ satisfying

$$
\operatorname{dist}\Bigl(x + tp + \tfrac{1}{2} t^2 q,\ C\Bigr) = o(t^2),
$$

where $\operatorname{dist}(x, C)$ denotes the Euclidean point-to-set distance from $x$ to $C$, and $g(t) = o(t^k)$ is the Landau notation, i.e., $\lim_{t \downarrow 0} g(t)/t^k = 0$. Now a direction $h \in X$ is said to be critical at the feasible point $\bar x$ if and only if $DG(\bar x)h \in T_C(G(\bar x))$ and $\langle Df(\bar x), h\rangle \le 0$. Then one has [9]:

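Theorem 1 Let $\bar x$ be a local minimizer of $f$ on the feasible set $S$ defined by (3). If Robinson's CQ holds at $\bar x$, then for all critical directions $h$ at $\bar x$,

$$
\sup_{u \in \Lambda(\bar x)} \Bigl\{ \langle h, D^2_{xx} L(\bar x, u)\, h\rangle - \sigma(u, Q_h) \Bigr\} \ge 0,
\tag{4}
$$

where $\Lambda(\bar x)$ is the set of Lagrange multipliers associated with $\bar x$ for (P), $Q_h := T_C^2(G(\bar x), DG(\bar x)h)$, and $\sigma(\cdot, Q)$ is the support function of $Q$, $u \mapsto \sigma(u, Q) = \sup_{q \in Q} \langle u, q\rangle$.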

In order to have a narrow gap between necessary and 
sufficient optimality conditions, the following direc- 
tional SOCQ, introduced in [4,6], is very important: 


• Directional SOCQ: The set $C$ is said to be second order regular at $y \in C$ in a direction $p \in T_C(y)$ and with respect to a linear mapping $M\colon X \to Y$ if, for any sequence $y_n \in C$ of the form

$$
y_n = y + t_n p + \tfrac{1}{2} t_n^2 q_n,
$$

where $t_n \downarrow 0$, $q_n := M w_n + a_n$, $\{a_n\}$ converging in $Y$, $w_n \in X$, and $t_n w_n \to 0$, one has $\lim_{n \to \infty} \operatorname{dist}(q_n, T_C^2(y, p)) = 0$. If $C$ is second order regular at $y$ with respect to all such $p$, $M$ and $X$, then $C$ is called second order regular at $y$.

For example, the polyhedral convex cone C in the 
setting of a nonlinear program (1) is second order regu- 
lar. Further, the cones of positive and negative semidef- 
inite matrices are second order regular (which is used 
for deriving optimality conditions in semidefinite pro- 
gramming [23]). For several other settings of C, for 
example in semi-infinite programming and composite 
optimization, the directional SOCQ has been specified 
in [6], where also the following crucial result can be 
found (known as the equivalence theorem): 


Theorem 2 If Robinson's CQ holds at a feasible point $\bar x$, and if for every critical direction $h$ the set $C$ is second order regular at $G(\bar x)$ in the direction $DG(\bar x)h$ and with respect to $DG(\bar x)(\cdot)$, then the inequality (4) holds strictly (i.e., with $> 0$) for all nonzero critical directions $h$ if and only if $f$ satisfies, with some positive $c$, the quadratic growth condition $f(x) \ge f(\bar x) + c\,\|x - \bar x\|^2$ for feasible $x$ near $\bar x$.


In this sense, there is no gap between second or- 
der necessary and sufficient conditions. Similar second 
order optimality characterizations under some direc- 
tional second order regularity were derived in [21] for 
so-called parabolically regular (extended-valued) func- 
tions. Note that the latter concept is weaker than that of 
the directional SOCQ. 

If the abstract constraint (3) reduces to the nonlinear programming form (1), then the term $\sigma(u, Q_h)$ in (4), called the 'shifting term' or '$\sigma$-term', vanishes. Hence, the equivalence theorem reduces to a well-known fact for smooth nonlinear programs (see, e.g., [2,25]), since MFCQ and Robinson's CQ coincide in this situation.

In contrast to this, for more general settings, $\sigma(u, Q_h)$ is essential. Conditions which allow a concrete representation of the $\sigma$-term can be considered as SOCQs. For standard semi-infinite programs with compact index set $T$ (i.e., $C$ is the cone of nonnegative continuous functions on $T$), this has been done in [5,24]. Conditions which ensure the local reduction of a semi-infinite program to a standard nonlinear program ('reduction approach') also allow a concrete representation of $\sigma(u, Q_h)$; for standard semi-infinite programs see, e.g., [13,22], and for generalized semi-infinite programs see, e.g., [14,17].

The directional SOCQ defined above (or some 
stronger versions, respectively), together with a first or- 
der directional CQ, have been essentially used (e. g., in 
[5,7,8]) in sensitivity and stability analysis of problem 
(P) under perturbations along certain directions: sec- 
ond order expansion of optimal values, first order ex- 
pansion of optimal solutions, differential stability, Lip- 
schitz stability, etc. 


Remarks 


In this article, the focus was on smooth problems. Though the notion SOCQ is not often used in nonsmooth optimization, the study of higher order tangent sets and their influence on optimality conditions of higher order could be considered as an analogue of SOCQs. For much material in this direction see, e.g., [1,21]. A special case between smooth and nonsmooth programs are the so-called $C^{1,1}$ optimization problems (i.e., problems of type (P) whose data are differentiable with locally Lipschitzian gradients). A SOCQ for this class of problems is a stronger variant of the Abadie CQ: the Bouligand (contingent) cone has to coincide with the linearization cone [15]. This SOCQ, together with some first order CQ, is again of interest in deriving necessary second order optimality conditions, see [10,15].


See also 


> Equality-constrained Nonlinear Programming: KKT 
Necessary Optimality Conditions 

> First Order Constraint Qualifications 

> Inequality-constrained Nonlinear Optimization 

> Kuhn-Tucker Optimality Conditions 

> Lagrangian Duality: Basics 

> Rosen’s Method, Global Convergence, and Powell’s Conjecture
The quest for optimality conditions for nonlinear optimization problems started a long time ago; however, until calculus was invented there was little progress. Calculus has enriched and accelerated this endeavor significantly.

In the following discussion $x = (x_1, \dots, x_n)^T$ denotes the column vector of decision variables whose optimal values need to be determined. All functions are assumed to be continuous real-valued functions of $x$. When discussing the first (second) order optimality conditions, we assume that all functions are continuously differentiable (twice continuously differentiable).

For any real-valued function $f(x)$, we denote the gradient vector of $f(x)$ at $\bar x$, $(\partial f(\bar x)/\partial x_1, \dots, \partial f(\bar x)/\partial x_n)$, written as a row vector, by $\nabla_x f(\bar x)$, and the $n \times n$ Hessian matrix of $f(x)$ at $\bar x$, $(\partial^2 f(\bar x)/\partial x_i \partial x_j)$, by $\nabla^2_{xx} f(\bar x)$.

We discuss the nonlinear optimization problem in 
terms of ‘minimization’ of the objective function. In- 
stead, if an objective function F(x) is required to be 
‘maximized’, this problem is equivalent to minimizing 
—F(x) subject to the same constraints. Because of this, 
we state all our results in terms of minimization prob- 
lems. 

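Nonlinear optimization problems can be classified into three broad types, which are of the following forms:

• Unconstrained minimization
$$
\min\ \theta(x) \quad \text{s.t. } x \in \mathbb{R}^n.
\tag{1}
$$
• Equality-constrained optimization
$$
\min\ \theta(x) \quad \text{s.t. } g_i(x) = 0,\ i = 1, \dots, m.
\tag{2}
$$
• General constrained optimization
$$
\begin{aligned}
\min\ & \theta(x)\\
\text{s.t. } & g_i(x) = 0, && i = 1, \dots, m,\\
& g_i(x) \ge 0, && i = m+1, \dots, m+p.
\end{aligned}
\tag{3}
$$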


The general problem (3) is said to be a convex programming problem [2,3] if the objective function to be minimized, $\theta(x)$, is convex, the equality constraint functions $g_i(x)$, $i = 1, \dots, m$, are affine, and the inequality ($\ge$) constraint functions $g_i(x)$, $i = m+1, \dots, m+p$, are concave; otherwise it is a nonconvex programming problem.

In linear programming, we only talk about an optimum solution, but in nonlinear optimization we need to consider different types of optimum solutions. In minimization problems the feasible solution $\bar x$ is said to be a:

• local (or relative) minimum if $\theta(x) \ge \theta(\bar x)$ for all feasible $x$ satisfying $\|x - \bar x\| < \varepsilon$ for some positive $\varepsilon$;
• strong (or strict) local (or relative) minimum if $\theta(x) > \theta(\bar x)$ for all feasible $x \ne \bar x$ satisfying $\|x - \bar x\| < \varepsilon$ for some positive $\varepsilon$;
• global minimum if $\theta(x) \ge \theta(\bar x)$ for all feasible $x$;
• stationary point if it satisfies one of the necessary conditions for a local minimum.

Corresponding concepts (weak or strong local maximum, global maximum) for maximization problems are defined analogously. We illustrate these concepts in Fig. 1 for the problem of minimizing the graphed function $f(x)$ of a single variable $x \in \mathbb{R}^1$ in the interval $a \le x \le b$. The points a, x°, x’, x'°, x’? are strong local minima; x°, x*, x°, x'!, b are strong local maxima; x!* is the global minimum, and x° is the global maximum. x’, x? are weak local minima; and x°, x? are weak local maxima. x° is a stationary point even though it is neither a local maximum nor a local minimum. In each of the intervals x! < x < x’ and x® < x < x’, $f(x)$ is a constant, and every point in the interior of these intervals (i.e., points x satisfying x! < x < x? or x8 < x < x?) is both a weak local minimum and a weak local maximum.

In linear programming all local minima are global 
minima, hence we only talk about an ‘optimum solu- 
tion’ there. As seen above, in nonlinear programming 
models these concepts could be different. 

The original intent of the problem is of course to 
find a global minimum. Using existing algorithms, this 
is possible with reasonable efficiency only for convex 
programming problems (which include linear pro-
grams and convex quadratic programs as special cases). 
As discussed in [3,4], for general nonconvex program- 
ming problems, even finding a local minimum is hard; 
existing efficient algorithms can at best guarantee con- 
vergence to a stationary point on such problems. 

Unfortunately, there are no known useful condi- 
tions that can efficiently characterize a global minimum 
for the general nonlinear programming problem. We 
only know some necessary, and some sufficient con- 
ditions for a local minimum for this general problem. 
In convex programming problems however, every local 
minimum is a global minimum; so for this nice class of 
problems we have useful necessary and sufficient con- 
ditions for a global minimum. 


Second Order Optimality Conditions for Nonlinear Optimization, Figure 1 


Second Order Optimality Conditions for Nonlinear Optimization 




Optimality Conditions 
for the Unconstrained Minimization Problem (1) 


These were developed first for one-dimensional minimization problems in the 17th century as Newton was developing calculus, and very soon after extended to multidimensional minimization problems. These conditions are:

• First order necessary conditions for $\bar x$ to be a local minimum: $\nabla_x \theta(\bar x) = 0$.
• Second order necessary conditions for $\bar x$ to be a local minimum: $\nabla_x \theta(\bar x) = 0$ and $\nabla^2_{xx} \theta(\bar x)$ is positive semidefinite (PSD).
• Second order sufficient conditions for $\bar x$ to be a strict local minimum: $\nabla_x \theta(\bar x) = 0$ and $\nabla^2_{xx} \theta(\bar x)$ is positive definite (PD).
• Necessary and sufficient conditions for $\bar x$ to be a global minimum if $\theta(x)$ is convex: $\nabla_x \theta(\bar x) = 0$.

The computational effort needed to check whether a given square matrix is PSD or PD is $O(n^3)$. Hence, given $\bar x$, each of the above conditions can be checked efficiently.

When $\theta(x)$ is nonconvex, there is a slight gap between the necessary and sufficient conditions for a local minimum: if $\nabla_x \theta(\bar x) = 0$ and $\nabla^2_{xx} \theta(\bar x)$ is PSD but not PD, we are unable to conclude either that $\bar x$ is a local minimum or that it is not. To bridge this gap, more complicated conditions involving higher order derivatives are needed, but they are impractical, particularly when $n$ is large.
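The $O(n^3)$ PSD/PD test can be carried out by symmetric Gaussian pivoting, one standard choice (a Cholesky or eigenvalue routine would also do). The sketch below assumes the gradient and Hessian at $\bar x$ are supplied as plain lists and replaces exact arithmetic with a floating-point tolerance; the function names are illustrative.

```python
def psd_pd(matrix, tol=1e-10):
    """Return (is_psd, is_pd) for a symmetric matrix by symmetric Gaussian pivoting, O(n^3)."""
    a = [list(map(float, row)) for row in matrix]
    is_pd = True
    while a:
        pivot = a[0][0]
        if pivot < -tol:
            return False, False            # negative diagonal entry: not PSD
        if abs(pivot) <= tol:
            if any(abs(v) > tol for v in a[0][1:]):
                return False, False        # zero pivot with a nonzero row: not PSD
            is_pd = False                  # zero pivot: at best PSD
            a = [row[1:] for row in a[1:]] # drop the zero row and column
            continue
        # Positive pivot: the matrix is PSD (PD) iff its Schur complement is.
        a = [[a[i][j] - a[i][0] * a[0][j] / pivot for j in range(1, len(a))]
             for i in range(1, len(a))]
    return True, is_pd


def classify_unconstrained(grad, hess, tol=1e-10):
    """Apply the first and second order conditions for problem (1) at a point x_bar."""
    if any(abs(g) > tol for g in grad):
        return "not stationary"
    is_psd, is_pd = psd_pd(hess, tol)
    if is_pd:
        return "strict local minimum"      # second order sufficient condition holds
    if is_psd:
        return "inconclusive"              # necessary condition holds, sufficient does not
    return "not a local minimum"           # second order necessary condition fails
```

For $\theta(x) = x_1^2 + x_2^2$, $x_1^4 + x_2^2$ and $x_1^2 - x_2^2$ at the origin (Hessians diag(2, 2), diag(0, 2) and diag(2, −2)), the function reports a strict local minimum, the inconclusive gap case described above, and not a local minimum, respectively.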


Optimality Conditions for the Equality 
Constrained Minimization Problem (2) 


Inspired by the study of problems in mechanics, these 
conditions, which form the foundation of nonlinear 
programming theory, were developed in the 18th and 
19th centuries. Major results were obtained by L. Eu- 
ler and J.L. Lagrange and were first published in a book 
written by Lagrange in 1788. 

The necessary conditions are derived under a condition on the constraints in (2) known as a constraint qualification (CQ). A well-known CQ is the regularity condition.

The feasible solution $\bar x$ for (2) is said to satisfy the regularity condition (and hence is called a regular point for (2)) if $\{\nabla_x g_1(\bar x), \dots, \nabla_x g_m(\bar x)\}$ is linearly independent.


The optimality conditions for a feasible solution $\bar x$ to be a local minimum for (2) are:

• First order necessary conditions for $\bar x$ to be a local minimum for (2): If $\bar x$ is a local minimum for (2), and either all the constraints are linear or $\bar x$ is a regular feasible solution, then there exists $\bar\pi = (\bar\pi_1, \dots, \bar\pi_m)$ such that
$$
\nabla_x \theta(\bar x) = \sum_{i=1}^m \bar\pi_i \nabla_x g_i(\bar x).
$$
The vector $\bar\pi$ in the above condition is known as the Lagrange multiplier vector. The function $L(x, \pi) = \theta(x) - \sum_{i=1}^m \pi_i g_i(x)$ is known as the Lagrangian function for (2) with Lagrange multiplier vector $\pi$. The above necessary condition can also be written as $\nabla_x L(\bar x, \bar\pi) = 0$.
• Second order necessary conditions for $\bar x$ to be a local minimum for (2): If $\bar x$ is a local minimum for (2), and either all the constraints are linear or $\bar x$ is a regular feasible solution, then there exists $\bar\pi = (\bar\pi_1, \dots, \bar\pi_m)$ such that
$$
\nabla_x \theta(\bar x) = \sum_{i=1}^m \bar\pi_i \nabla_x g_i(\bar x)
\quad\text{and}\quad
y^T \nabla^2_{xx} L(\bar x, \bar\pi)\, y \ge 0 \ \text{ for all } y \in T,
$$
where $T = \{y : \nabla_x g_i(\bar x)\, y = 0 \text{ for all } i = 1, \dots, m\}$.
• Sufficient condition for $\bar x$ to be a strict local minimum for (2): If $\bar x$ is a feasible solution for (2), and there exists $\bar\pi = (\bar\pi_1, \dots, \bar\pi_m)$ such that
$$
\nabla_x \theta(\bar x) = \sum_{i=1}^m \bar\pi_i \nabla_x g_i(\bar x)
\quad\text{and}\quad
y^T \nabla^2_{xx} L(\bar x, \bar\pi)\, y > 0 \ \text{ for all } 0 \ne y \in T,
$$
where $T$ is defined above, then $\bar x$ is a strict local minimum for (2).
Given a regular feasible solution $\bar x$, the existence of a Lagrange multiplier vector $\bar\pi$ which, together with $\bar x$, satisfies either of the necessary conditions given above can be checked efficiently.

Here again, in nonconvex programs, there is a small gap between the necessary and sufficient optimality conditions for a local minimum. If the second order necessary condition holds at $\bar x$ but the sufficient condition does not, then these conditions cannot decide whether $\bar x$ is a local minimum or not.
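At a regular point, both checks reduce to linear algebra. Solve $\nabla_x\theta(\bar x)^T = \sum_i \pi_i \nabla_x g_i(\bar x)^T$ for $\bar\pi$ by least squares, take an orthonormal basis $Z$ of $T$ from the null space of the constraint Jacobian, and inspect the eigenvalues of the reduced Hessian $Z^T \nabla^2_{xx} L(\bar x, \bar\pi) Z$. This is a minimal NumPy sketch assuming the caller supplies gradients and Hessians at $\bar x$; the function name is illustrative. The test problem, $\min x_1 + x_2$ s.t. $x_1^2 + x_2^2 - 2 = 0$, is our own choice: its minimum is $(-1, -1)$ and its maximum is $(1, 1)$.

```python
import numpy as np


def equality_second_order(grad_theta, hess_theta, jac, hess_g, tol=1e-9):
    """Check the optimality conditions for problem (2) at a regular point x_bar.

    grad_theta: gradient of theta at x_bar, shape (n,)
    hess_theta: Hessian of theta at x_bar, shape (n, n)
    jac:        rows are the gradients of g_1, ..., g_m at x_bar, shape (m, n)
    hess_g:     list of the m Hessians of g_i at x_bar, each of shape (n, n)
    """
    # First order: solve jac^T pi = grad_theta (pi is unique at a regular point).
    pi, *_ = np.linalg.lstsq(jac.T, grad_theta, rcond=None)
    if np.linalg.norm(jac.T @ pi - grad_theta) > tol:
        return pi, "first order necessary condition fails"

    # Hessian of L(x, pi) = theta(x) - sum_i pi_i g_i(x) at x_bar.
    hess_L = hess_theta - sum(p * H for p, H in zip(pi, hess_g))

    # Orthonormal basis Z of T = {y : jac y = 0} from the SVD of the Jacobian.
    _, s, vt = np.linalg.svd(jac)
    rank = int(np.sum(s > tol))
    Z = vt[rank:].T
    if Z.shape[1] == 0:
        return pi, "strict local minimum"   # T = {0}: sufficient condition holds vacuously

    eigs = np.linalg.eigvalsh(Z.T @ hess_L @ Z)
    if eigs.min() > tol:
        return pi, "strict local minimum"   # second order sufficient condition holds
    if eigs.min() >= -tol:
        return pi, "inconclusive"           # only the second order necessary condition holds
    return pi, "not a local minimum"        # second order necessary condition fails


# min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0, evaluated at x_bar = (-1, -1)
pi, status = equality_second_order(np.array([1.0, 1.0]), np.zeros((2, 2)),
                                   np.array([[-2.0, -2.0]]), [2 * np.eye(2)])
```

At $(-1, -1)$ the multiplier is $\bar\pi = -1/2$ and $\nabla^2_{xx} L = I$, so the sufficient condition holds; at $(1, 1)$ one gets $\bar\pi = 1/2$ and $\nabla^2_{xx} L = -I$, which violates the second order necessary condition.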


Optimality Conditions for the General 
Constrained Optimization Problem (3) 


Fundamental ideas about necessary optimality conditions for nonlinear optimization problems including some inequality constraints were investigated beginning with J.B.J. Fourier in 1798, and later by A. Cournot, M. Ostrogradsky, C.F. Gauss, J. Farkas, and G. Hamel, among others. Rigorous development of these conditions in the form we know them today was completed by W. Karush in 1939, and later, in essentially the same form, by H.W. Kuhn and A.W. Tucker in 1951.

Given a feasible solution $\bar x$ for (3), the $i$th constraint in (3) is said to be active at $\bar x$ if $g_i(\bar x) = 0$, and inactive otherwise. Thus all equality constraints are active at every feasible solution, and an inequality constraint is active at a feasible solution if it holds as an equation there. We will denote the index set of active constraints at a feasible solution $\bar x$, $\{i : g_i(\bar x) = 0\}$, by $B(\bar x)$.

Necessary optimality conditions are derived under a CQ. There are several CQs, some weaker than others. The principal ones are:

• Regularity condition: The feasible solution $\bar x$ satisfies this CQ, and is therefore called a regular point for (3), if $\{\nabla_x g_i(\bar x) : i \in B(\bar x)\}$ is linearly independent. This condition can be checked efficiently.
• First order CQ: The feasible solution $\bar x$ for (3) satisfies this CQ if for each $y$ satisfying
$$
\nabla_x g_i(\bar x)\, y = 0, \ i = 1, \dots, m; \qquad
\nabla_x g_i(\bar x)\, y \ge 0, \ i \in \{m+1, \dots, m+p\} \cap B(\bar x),
$$
$y$ is the tangent direction to a differentiable curve emanating from $\bar x$ and lying in the feasible region. This condition is hard to check.
• Second order CQ: The feasible solution $\bar x$ for (3) satisfies this CQ if for each $y \in \{y : \nabla_x g_i(\bar x)\, y = 0,\ i \in B(\bar x)\}$ there exists a twice differentiable curve emanating from $\bar x$ and lying in the region $\{x : g_i(x) = 0,\ i \in B(\bar x)\}$, for which $y$ is the tangent direction at $\bar x$. This condition is hard to check.
• Mangasarian–Fromovitz CQ: The feasible solution $\bar x$ for (3) satisfies this CQ if the set
$$
\{d : \nabla_x g_i(\bar x)\, d = 0,\ i = 1, \dots, m\}
\cap
\{d : \nabla_x g_i(\bar x)\, d > 0 \text{ for } i \in \{m+1, \dots, m+p\} \cap B(\bar x)\}
$$
is nonempty, and $\{\nabla_x g_i(\bar x) : i = 1, \dots, m\}$ is linearly independent. This condition can be checked efficiently.
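The efficient check of the Mangasarian–Fromovitz CQ rests on linear programming: maximize a slack $s \le 1$ subject to $\nabla_x g_i(\bar x) d = 0$ for the equalities and $\nabla_x g_i(\bar x) d \ge s$ for the active inequalities. The set in the CQ is nonempty exactly when the optimal $s$ is positive. This is a minimal sketch assuming SciPy's `linprog` (HiGHS); the function name and the array layout are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def mfcq_holds(jac_eq, jac_act, tol=1e-9):
    """Sketch of an MFCQ test at x_bar for problem (3).

    jac_eq:  array (m, n), rows grad g_i(x_bar) for the equalities i = 1..m
    jac_act: array (k, n), rows grad g_i(x_bar) for i in {m+1..m+p} ∩ B(x_bar)
    Pass arrays with zero rows when there are no such constraints.
    """
    jac_eq = np.asarray(jac_eq, dtype=float)
    jac_act = np.asarray(jac_act, dtype=float)
    m, n = jac_eq.shape
    k = jac_act.shape[0]

    # The equality gradients must be linearly independent.
    if m > 0 and np.linalg.matrix_rank(jac_eq) < m:
        return False

    # Variables z = (d, s): maximize s (minimize -s), with s capped at 1.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    bounds = [(None, None)] * n + [(None, 1.0)]
    # grad g_i d >= s  <=>  -grad g_i d + s <= 0  for active inequalities
    A_ub = np.hstack([-jac_act, np.ones((k, 1))]) if k else None
    b_ub = np.zeros(k) if k else None
    # grad g_i d = 0  for equalities
    A_eq = np.hstack([jac_eq, np.zeros((m, 1))]) if m else None
    b_eq = np.zeros(m) if m else None

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.status == 0 and -res.fun > tol
```

For the constraints $x_1 \ge 0$, $x_2 \ge 0$ at the origin, $d = (1, 1)$ works and MFCQ holds; for $x_1 \ge 0$, $-x_1 \ge 0$ (both active at the origin) the best slack is 0 and MFCQ fails.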
The Lagrangian function for (3) with Lagrange multiplier vector $\pi = (\pi_1, \dots, \pi_{m+p})$ is $L(x, \pi) = \theta(x) - \sum_{i=1}^{m+p} \pi_i g_i(x)$. The optimality conditions for a feasible solution $\bar x$ to be a local minimum for (3) are:

• First order necessary conditions for $\bar x$ to be a local minimum for (3): If $\bar x$ is a local minimum for (3), and either all the constraints are linear, or $\bar x$ satisfies the regularity, the first order, or the Mangasarian–Fromovitz CQ, then there exists a Lagrange multiplier vector $\bar\pi = (\bar\pi_1, \dots, \bar\pi_{m+p})$ such that
$$
\begin{aligned}
&\nabla_x \theta(\bar x) = \sum_{i=1}^{m+p} \bar\pi_i \nabla_x g_i(\bar x),\\
&\bar\pi_i \ge 0 \quad \text{for } i \in \{m+1, \dots, m+p\},\\
&\bar\pi_i\, g_i(\bar x) = 0 \quad \text{for } i \in \{m+1, \dots, m+p\}.
\end{aligned}
$$
In the literature, these conditions are commonly referred to as the KKT (Karush–Kuhn–Tucker) conditions. Given $\bar x$, checking for the existence of a Lagrange multiplier vector which together with $\bar x$ satisfies these conditions can be posed as a linear programming problem and solved efficiently.

• Second order necessary conditions for $\bar x$ to be a local minimum for (3): If $\bar x$ is a local minimum for (3), and either all the constraints are linear, or $\bar x$ satisfies the regularity, the second order, or the Mangasarian–Fromovitz CQ, then there exists a Lagrange multiplier vector $\bar\pi = (\bar\pi_1, \dots, \bar\pi_{m+p})$ such that
$$
\begin{aligned}
&\nabla_x \theta(\bar x) = \sum_{i=1}^{m+p} \bar\pi_i \nabla_x g_i(\bar x),\\
&\bar\pi_i \ge 0 \quad \text{for } i \in \{m+1, \dots, m+p\},\\
&\bar\pi_i\, g_i(\bar x) = 0 \quad \text{for } i \in \{m+1, \dots, m+p\},
\end{aligned}
$$
and
$$
y^T \nabla^2_{xx} L(\bar x, \bar\pi)\, y \ge 0 \quad \text{for all } y \in T_1,
\qquad\text{where } T_1 = \{y : \nabla_x g_i(\bar x)\, y = 0,\ i \in B(\bar x)\}.
$$
Given $\bar x$, the existence of a Lagrange multiplier vector which together with $\bar x$ satisfies these conditions can be checked efficiently using linear programming and PSD checking methods.

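• Sufficient conditions for $\bar x$ to be a strict local minimum for (3): If the feasible solution $\bar x$ for (3) is such that there exists a Lagrange multiplier vector $\bar\pi = (\bar\pi_1, \dots, \bar\pi_{m+p})$ which together with $\bar x$ satisfies
$$
\begin{aligned}
&\nabla_x \theta(\bar x) = \sum_{i=1}^{m+p} \bar\pi_i \nabla_x g_i(\bar x),\\
&\bar\pi_i \ge 0 \quad \text{for } i \in \{m+1, \dots, m+p\},\\
&\bar\pi_i\, g_i(\bar x) = 0 \quad \text{for } i \in \{m+1, \dots, m+p\},
\end{aligned}
$$
and
$$
y^T \nabla^2_{xx} L(\bar x, \bar\pi)\, y > 0 \quad \text{for all } 0 \ne y \in T_2,
$$
where
$$
\begin{aligned}
T_2 = \bigl\{ y :\ & \nabla_x g_i(\bar x)\, y = 0 \text{ for } i \in \{1, \dots, m\} \cup \bigl(\{i : \bar\pi_i > 0\} \cap \{m+1, \dots, m+p\} \cap B(\bar x)\bigr),\\
& \nabla_x g_i(\bar x)\, y \ge 0 \text{ for all } i \in \{i : \bar\pi_i = 0\} \cap \{m+1, \dots, m+p\} \cap B(\bar x) \bigr\},
\end{aligned}
$$
then $\bar x$ is a strict local minimum for (3). If (3) is a nonconvex program, verifying whether the last condition among the sufficient optimality conditions holds is hard (K.G. Murty and S.N. Kabadi [4] have shown that even the simple special case of checking whether $x^T D x \ge 0$ for all $x \ge 0$ is hard if $D$ is not a PSD matrix).
A weaker sufficient condition for $\bar x$ to be a strict local minimum for (3) is obtained by replacing $T_2$ in the above condition by the set
$$
T_3 = \bigl\{ y : \nabla_x g_i(\bar x)\, y = 0 \text{ for all } i \in \{1, \dots, m\} \cup \bigl(\{i : \bar\pi_i > 0\} \cap \{m+1, \dots, m+p\} \cap B(\bar x)\bigr) \bigr\}.
$$
Since $T_3 \supseteq T_2$, this condition is satisfied less often, but it can be checked efficiently.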


• Necessary and sufficient conditions for $\bar x$ to be a global minimum of (3) if it is a convex program: If (3) is a convex program, then the first order necessary conditions stated above are necessary and sufficient for a given feasible solution to be a global minimum.
If (3) is a nonconvex program [2,3], the gap between the second order necessary conditions and the sufficient conditions for a local minimum is small, but if the problem under consideration falls into this gap, these conditions cannot confirm either that $\bar x$ is or that it is not a local minimum [1,5].
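Both efficient checks mentioned above can be combined in one routine. A linear feasibility problem finds KKT multipliers, and for the multipliers it returns, the weaker sufficient condition on $T_3$ becomes a PD test of a reduced Hessian. This is a minimal sketch assuming SciPy's `linprog` (HiGHS) and NumPy; the function name and the argument layout are hypothetical. When the multipliers are not unique, the LP returns only one of them, and a negative verdict for that choice does not rule out another multiplier vector satisfying the condition.

```python
import numpy as np
from scipy.optimize import linprog


def kkt_check(grad_theta, hess_theta, jac, hess_g, g_vals, m, tol=1e-9):
    """Sketch for problem (3) at a feasible x_bar: find KKT multipliers by LP,
    then test the weaker sufficient condition (reduced Hessian PD on T_3).

    jac:    array (m + p, n), row i is grad g_i(x_bar); rows 0..m-1 are equalities
    hess_g: list of the m + p Hessians of g_i at x_bar
    g_vals: values g_i(x_bar), used to detect active inequalities
    """
    jac = np.asarray(jac, dtype=float)
    n_con, n = jac.shape
    active = np.abs(np.asarray(g_vals, dtype=float)) <= tol

    # pi_i free for equalities, pi_i >= 0 for active inequalities, and
    # pi_i = 0 for inactive ones (this enforces pi_i g_i(x_bar) = 0).
    bounds = [(None, None)] * m + [
        (0, None) if active[i] else (0, 0) for i in range(m, n_con)
    ]
    # Feasibility LP with zero objective: sum_i pi_i grad g_i(x_bar) = grad theta(x_bar).
    res = linprog(np.zeros(n_con), A_eq=jac.T, b_eq=grad_theta,
                  bounds=bounds, method="highs")
    if res.status != 0:
        return None, "KKT conditions fail"
    pi = res.x

    # Hessian of L(x, pi) = theta(x) - sum_i pi_i g_i(x) at x_bar.
    hess_L = hess_theta - sum(p * H for p, H in zip(pi, hess_g))

    # T_3: gradients of equalities and of active inequalities with pi_i > 0.
    rows = [i for i in range(n_con) if i < m or (active[i] and pi[i] > tol)]
    if rows:
        _, s, vt = np.linalg.svd(jac[rows])
        Z = vt[int(np.sum(s > tol)):].T      # orthonormal basis of T_3
    else:
        Z = np.eye(n)
    if Z.shape[1] == 0 or np.linalg.eigvalsh(Z.T @ hess_L @ Z).min() > tol:
        return pi, "strict local minimum (weaker sufficient condition holds)"
    return pi, "KKT point; weaker sufficient condition not satisfied"
```

For $\min x_1^2 + x_2^2$ s.t. $x_1 + x_2 - 2 \ge 0$ at $\bar x = (1, 1)$, the LP yields $\bar\pi_1 = 2$, $T_3 = \{y : y_1 + y_2 = 0\}$, and the reduced Hessian equals 2, so $\bar x$ is a strict local minimum. With the objective $-x_1^2 - x_2^2$ instead, the only candidate multiplier is $-2 < 0$, and the KKT conditions fail.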


See also 


> Equality-constrained Nonlinear Programming: KKT 
Necessary Optimality Conditions 

> First Order Constraint Qualifications 

> Inequality-constrained Nonlinear Optimization 

> Kuhn-Tucker Optimality Conditions 

> Lagrangian Duality: Basics 

> Rosen’s Method, Global Convergence, and Powell’s 
Conjecture 

> Saddle Point Theory and Optimality Conditions 

> Second Order Constraint Qualifications 


References 

1. Bazaraa MS, Sherali HD, Shetty CM (1993) Nonlinear pro- 
gramming theory and algorithms, 2nd edn. Wiley, New York 

2. Fiacco AV, McCormick GP (1968) Nonlinear programming: 
sequential unconstrained minimization techniques. Wiley, 
New York 

3. Murty KG (1988) Linear complementarity, linear and nonlin- 
ear programming. Heldermann, Berlin 

4. Murty KG, Kabadi SN (1987) Some NP-complete problems 
in quadratic and nonlinear programming. Math Program 
39:117-129 

5. Prékopa A (1980) On the development of optimization the- 
ory. Amer Math Monthly 87:527-542 


Selection of Maximally Informative Genes


IOANNIS P. ANDROULAKIS¹, ERIC YANG²
¹ Department of Chemical and Biochemical
Engineering, Rutgers University, Piscataway, USA
² Department of Biomedical Engineering,
Rutgers University, Piscataway, USA




Article Outline 


Introduction 


Formulations 
A Mixed Integer Formulation for Gene Selection 
A Mixed Integer Formulation for Feature Selection 
and Classifier Complexity Minimization 


Cases 
Gene Selection and Complexity Minimization 


Conclusions 
References 


Introduction 


Gene expression microarray experiments have been 
celebrated as a revolution in biology, attracting sig- 
nificant interest because they allow for the analysis of 
the combined effects of numerous genetic and envi- 
ronmental components. This global approach will al- 
low a fundamental shift from “... piece-by-piece to 
global analysis and from hypothesis driven research to 
discovery-based formulation and subsequent testing of 
hypotheses ...” [7,8,10,15,24,29,41]. One of the major 
challenges is to extract in a systematic and rigorous way 
the biologically relevant components from the array ex- 
periments in order to establish meaningful connections 
linking genetic information to cellular function. Be- 
cause of the significant amount of experimental infor- 
mation that is generated (expression levels of thousands 
of genes), computer-assisted knowledge extraction is 
the only realistic alternative for managing such an in- 
formation deluge. A number of excellent publications 
have focused on different aspects of gene expression 
experiments [1,2,3,6,15,20,25,32,35,36,38]. Novel com- 
putational approaches that exploit large warehouses of 
gene expression data have been identified as major en- 
ablers for realizing fully the potential of this technol- 
ogy [4]. A significant concern with microarray analyses
is the ability to assign a certain level of significance 
to smaller subsets of genes whose expression patterns 
could potentially indicate a more direct involvement in 
the biological process under study. 

The identification of smaller sets of “informative genes” is a manifestation of a broader problem in the machine learning community, namely the problem of “feature selection”. Feature selection has received increased attention with the recent advances in functional genomics that resulted in the creation of high-dimensional feature sets. A number of recent publications [11,13,20,47] have devised various approaches
for extracting critical, differentially expressed genes in 
a systematic manner. The advantage of multivariate 
methods is that they take into account collaborative ef- 
fects of gene expression activities. Reducing the number 
of measured variables reduces the degrees of freedom, 
hence avoiding pointless over-fitting. Too few genes 
will not discriminate or predict; too many genes might 
introduce noise to the model rather than information. 
Therefore, the identification of informative genes is 
a significant component of an integrated, computer- 
assisted analysis of array experiments. In most cases the 
question of identifying differentially expressed genes is 
restated as a hypothesis-testing problem in which the 
null hypothesis of no association between expression 
levels and responses of interest is tested [15]. 

One of the reasons for performing gene selection, 
which will be the focus of this chapter, is in tissue clas- 
sification. Samples from multiple cell types (for exam- 
ple different cancer types, cancerous and normal cells 
etc.) are comparatively analyzed using microarray gene 
expression measurements. The question therefore be- 
comes how to identify which genes provide consis- 
tent signatures that distinctly characterize the different 
classes. The problem can be viewed as either a super- 
vised classification problem in which the classes are al- 
ready known, or as an unsupervised clustering problem 
in which we attempt to identify the classes contained 
within the data. In gene selection, the computational 
problem is equivalent to that of feature selection in 
multidimensional data sets. Identifying the minimum 
number of gene markers is however critical because this 
reduced set can provide information about the biology 
behind the experiment as well as define the basis for fu- 
ture therapeutic agents. Typical examples are discussed 
in [12,20,23,33]. 

The obvious way to simplify the complexity of the 
model (classifier) is to minimize the number of degrees of freedom used for building the model. Traditionally, this problem is approached in a stepwise fashion as a forward or backward feature selection process [30]. This approach is widely used in analyzing microarray data [20]. The problem, however, is that synergistic effects are not properly captured, and it is often difficult to come to a conclusion as to what the actual number of informative features is [27]. Therefore, a search algorithm must explicitly account in the
objective for the actual number of features used in 
the model. For linear discriminant models, concepts 
such as the Akaike and Bayesian Information Criteria 
(AIC, BIC) have been used, albeit in a stepwise fash- 
ion. In either case the objective (maximum likelihood 
estimation) is augmented to account for the number 
of features (variables) used in the model. Recently, researchers have incorporated the number of features in their search objective [19]. Therefore, we treat the total number of features used in the model explicitly as one of our complexity criteria. The definition of model complexity is not an easy task. When the decision boundary
is a hyper-plane, it is rather straightforward to require 
a minimum number of non-zero coefficients (AIC and 
BIC criteria discussed earlier). In nonlinear classifiers, 
or when the separations are defined as the intersection 
of multiple hyper-planes, it is not obvious how to de- 
cide which model is “simpler” as the notion of simplic- 
ity is ill defined. 

In this article we will discuss two integer optimiza- 
tion formulations for selecting informative genes. The 
first formulation is based on a mixed integer linear for- 
mulation that while minimizing the classification error, 
attempts to simultaneously minimize the complexity of 
the classifier by controlling directly the number of fea- 
tures used. The second formulation expands the com- 
plexity quantification attempting to control both the 
number of features and the complexity of the classifier 
while maximizing its performance. 


Formulations 
A Mixed Integer Formulation for Gene Selection 


A mixed-integer linear formulation was recently pro- 
posed [46]. Feature selection is always considered 
within the framework of a given analysis. This could 
be model development/fitting, classification, clustering 
etc. In other words we want to extract the minimum 
number of required independent variables necessary 
to perform a particular task. Therefore, an objective 
measuring the “goodness of fit” will be required. The 
parameters associated with the model naturally define 
a continuous optimization problem. The notion of se- 
lecting a subset of variables, out of a superset of possi- 
ble alternatives, naturally lends itself to a combinatorial 
(integer) optimization problem. Therefore, depending 


on the model used to describe the data the problem 
of feature selection will end up being a mixed inte- 
ger (non) linear optimization problem. Furthermore, 
this problem is a multi-criteria optimization since one 
wishes to simultaneously minimize the model error and 
the number of features used. Let $m$ denote the number of observations for a two-class problem such that $k$ and $l$ denote the number of samples in each class (for example, the number of benign and cancerous cells, respectively). We also denote by $I_1$ and $I_2$ the indices of the corresponding samples, and $I = I_1 \cup I_2$ denotes the entire set of samples. Finally, the set $J$ denotes the set of all genes recorded in the observations, and $J' \subseteq J$ denotes the set of genes (features) that are required to develop an accurate model. The expression data are presented in the form $x_{ij}$, $i \in I$, $j \in J$. A linear classifier is constructed as:


$$\beta_0 + \sum_{j \in J} \beta_j x_{ij} \le 0, \quad i \in I_1,$$

$$\beta_0 + \sum_{j \in J} \beta_j x_{ij} \ge 0, \quad i \in I_2.$$


However, because the observations are not, in general, perfectly separable by a linear model, a goal programming formulation can be proposed whose goal is to estimate the coefficients that minimize the deviations from the classifier model, Fig. 1.
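For fixed coefficients β, the smallest feasible deviations in this goal-programming model are simply the positive part of each class-1 score and the negative part of each class-2 score. The minimal Python/NumPy sketch below evaluates the objective of Fig. 1 for given coefficients; the function name is hypothetical, and in the actual formulation β is a decision variable and the problem is solved as a linear program.

```python
import numpy as np

def goal_programming_deviations(beta0, beta, X1, X2):
    """Objective of the goal-programming classifier for FIXED coefficients.

    Rows of X1 (class 1) should score beta0 + x @ beta <= 0, rows of X2
    (class 2) should score >= 0. Writing each score as d_plus - d_minus with
    d_plus, d_minus >= 0, the objective charges d_plus on class 1 and
    d_minus on class 2; for fixed beta the minimal values are the positive
    and negative parts of the scores.
    """
    s1 = beta0 + X1 @ beta            # scores of class-1 samples
    s2 = beta0 + X2 @ beta            # scores of class-2 samples
    d_plus_1 = np.maximum(s1, 0.0)    # class-1 violation: score too large
    d_minus_2 = np.maximum(-s2, 0.0)  # class-2 violation: score too small
    return float(d_plus_1.sum() + d_minus_2.sum())
```

A zero objective means the given hyperplane separates the two classes in the sense of the inequalities above.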

$$\begin{aligned}
\min\ & \sum_{i \in I_1} d_i^+ + \sum_{i \in I_2} d_i^-\\
\text{s.t.}\ & \beta_0 + \sum_{j \in J} \beta_j x_{ij} - d_i^+ + d_i^- = 0, \quad i \in I_1\\
& \beta_0 + \sum_{j \in J} \beta_j x_{ij} - d_i^+ + d_i^- = 0, \quad i \in I_2\\
& \beta_j \in \mathbb{R}, \quad d_i^+, d_i^- \in \mathbb{R}^+
\end{aligned}$$

Selection of Maximally Informative Genes, Figure 1
Optimization-based classification model

Selection of Maximally Informative Genes, Figure 2
Mixed integer formulation of the feature selection problem

In order to minimize the number of variables used in the classifier, thereby extracting the most relevant features for the specific linear model, binary variables need to be introduced to define whether a particular variable is used in the model or not. Therefore:

$$y_j = \begin{cases} 1, & j \in J',\\ 0, & j \in J \setminus J'. \end{cases}$$
The number of “active” genes can therefore be constrained (that is, introduced parametrically in the formulation in order to avoid solving a multi-criteria optimization problem). According to the ε-constraint method, one additional constraint of the form $\sum_{j \in J} y_j \le \varepsilon$ is introduced. The complete MIP formulation is shown in Fig. 2.


A Mixed Integer Formulation for Feature Selection 
and Classifier Complexity Minimization 


An interesting idea was recently introduced [45] in the context of oblique multicategory classification trees, whereby the class assignment is modeled through the concept of purity of a partition. Ideally, one wishes to construct a multivariate classifier in such a way that each “partition” is occupied by elements of a single class (orthogonal partitions). However, this formulation faces a number of complications. First, it is highly nonlinear; second, it does not perform feature selection; and finally, it builds classifiers sequentially, in the sense of the one-against-all concept. Nevertheless, the purity concept introduces a very intelligent way of quantifying the ability of a classifier to partition the data. Furthermore, with proper modifications to be discussed shortly, it allows the quantification of the complexity of the classifier.

We will discuss the basic elements of a recently proposed approach [53], which can be effectively generalized in the context of mixed-integer optimization. This generalization will be used to develop a general framework of oblique multi-category trees to address the question of how to build simple, yet informative, classifiers that simultaneously perform informative feature selection. We assume that we are given the ensemble of gene expression data in the form of f-dimensional vectors belonging to k distinct classes. The question is to identify how many, and which, of these f features are critical for the construction of a simplified, yet informative, classifier. An oblique multi-category classifier is defined by the intersection of a number of planes. We term these intersections “partitions” π; there are $2^p$ of them, where p is the number of planes. We define complexity as the number of occupied partitions that are required to properly classify the data. $q_{n\pi}$ is a binary variable indicating whether point n belongs to partition π. The total number of points of class k in partition π is termed $\sigma_{k\pi}$, whereas $v_{k\pi}$ denotes the fraction of points of class k in said partition. Further, $y_{k\pi}$ is a binary variable which is 1 if that partition contains even a single point from class k, and 0 otherwise. The location of a point n relative to a particular plane is defined by the binary variable $z_{np}$, which is 1 if the point is below the plane, and 0 otherwise. This variable is a critical one, since it essentially defines all the auxiliary variables in the formulation. Finally, $s_f$ is a binary variable indicating whether feature f is used in the construction of the classifier. Given that p planes create $2^p$ partitions, we wish to identify the partitions, and the corresponding spatial distribution of points in the reduced space defined by $s_f$, that will create the “purest” possible partition. We model this by analyzing the product $y_{k\pi} v_{k'\pi}$, $k \neq k'$. In order to account for problems that are not linearly separable, we look for partitions that satisfy the set of constraints depicted in Fig. 3.
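To make the roles of the binary below/above variables and of the $2^p$ partitions concrete, the following minimal NumPy sketch reads each point's partition off its below/above pattern for fixed planes and counts occupied and mixed-class partitions. The function names, and the convention that “below” means $w^T x + w_0 < 0$, are assumptions; in the formulation the planes and all these quantities are decision variables of the MIP, whereas the sketch only evaluates them for given planes.

```python
import numpy as np

def partition_index(X, W, w0):
    """Label each point (row of X) with one of the 2**p partitions of p planes.

    Plane k is {x : W[k] @ x + w0[k] = 0}; z[n, k] = 1 when point n lies
    below plane k (assumed here to mean W[k] @ x + w0[k] < 0), mirroring the
    binary z variables. The bit pattern of z is the partition label.
    """
    z = (X @ W.T + w0 < 0).astype(int)      # shape (N, p)
    powers = 2 ** np.arange(W.shape[0])     # 1, 2, 4, ...
    return z @ powers                        # integer label in [0, 2**p)

def complexity_and_purity(labels, classes):
    """Return (number of occupied partitions, number of mixed-class partitions)."""
    occupied = {}
    for part, cls in zip(labels.tolist(), classes):
        occupied.setdefault(part, set()).add(cls)
    impure = sum(1 for members in occupied.values() if len(members) > 1)
    return len(occupied), impure
```

With two axis-aligned planes, the four quadrants become the four partitions; the first count is the complexity measure described above, and the second counts partitions that violate purity.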

The modeling idea is that (i) the partition contains no point of class “k” and the maximum numbers of “k” (empty partition), (ii) the partition contains points of class “k” and the minimum number of points of class “k'”, and (iii) if no points of class “k” are present, then it may contain an arbitrary number of points of class “k'”. Obviously, the point is to maximize the number of type (i) partitions while satisfying the “purity” requirements. The objective thus becomes to minimize the “slack” variable E. The novelty of our approach is that it allows complete control of the complexity of the model by explicitly treating the number of features and the number of occupied partitions. The detailed formulation is summarized in Fig. 4.

Selection of Maximally Informative Genes, Figure 3
Modeling occupied partitions
The proposed formulation optimizes simultaneously for a number of design criteria:
• the feature selection;
• the construction of a multivariate, multi-class classifier; and
• the creation of multiple structurally alternative solutions via the introduction of integer cuts [5].
The overall framework remains linear, and the problem is solved parametrically for a given number of occupied partitions and number of features. This is a simple way of decoupling the objectives; however, more elaborate multi-objective schemes can be, and will be, explored. We will discuss a number of computational studies to illustrate the method.
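One common way to write such an integer cut (an assumption here; the form used in [5] may differ) excludes a previously found binary vector $y^*$ by requiring $\sum_{j:\,y^*_j = 1} y_j - \sum_{j:\,y^*_j = 0} y_j \le |\{j : y^*_j = 1\}| - 1$. The small sketch below, with hypothetical helper names, builds the cut's coefficients and checks that it removes $y^*$ and no other binary vector.

```python
def integer_cut(y_star):
    """Coefficients (a, rhs) of the cut a @ y <= rhs that excludes y_star.

    Ones of y_star get coefficient +1, zeros get -1, and the right-hand side
    is (number of ones) - 1, so every other 0-1 vector still satisfies it.
    """
    a = [1 if v == 1 else -1 for v in y_star]
    rhs = sum(y_star) - 1
    return a, rhs

def satisfies(a, rhs, y):
    """True if the binary vector y satisfies the cut a @ y <= rhs."""
    return sum(ai * yi for ai, yi in zip(a, y)) <= rhs
```

Adding one such cut per solution found and re-solving yields structurally different feature subsets.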


Cases 
Gene Selection and Complexity Minimization 


Our case study will focus on the feature selection and classifier complexity minimization model described above, mainly because these results demonstrate how to interpret computational results in their biological context. This analysis demonstrated that there is indeed a strong relationship between computational complexity, as modeled through the various optimization formulations, and biological relevance.


“Small Round Blue Cell Tumors” (SRBCT) is a de- 
scriptive category encompassing a large number of 
malignant tumors that tend to occur in childhood. 
They are united by their similar histo-pathological ap- 
pearance. However, subtle clues may be present to 
distinguish between the tumors. For proper character- 
ization, pathologists often employ immunohistochem- 
istry, electron microscopy, and molecular analysis for 
chromosomal abnormalities. The SRBCTs of childhood include neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL), and the Ewing family of tumors (EWS). Currently, no single biological or chemical test exists that can detect SRBCTs. A comprehensive study was presented in which a large number of genes were monitored. The data were reduced by singular value decomposition (SVD), and the leading factors were used to train an artificial neural network to build a predictive diagnostic device.

To analyze this data set with the mixed integer formulation of Fig. 4, the raw data are first preprocessed using an extension of the signal-to-noise ratio approach introduced by [20]. The original method was extended to multiclass problems in order to assist in the elimination of irrelevant features. This step reduced the initial number of features to 500. Multiple integer cuts were generated for a multitude of feature/plane combinations, and we discuss representative results to illustrate the extracted information. The maximally informative model has 3 features and 2 planes (4 occupied partitions), Fig. 5. As a standard validation, it was verified that scrambling the data (systematic error) significantly deteriorates the performance of the classifier, which is further evidence that the underlying structure in the data is not the result of a random process.
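The two-class signal-to-noise statistic, in the form commonly used for microarray data, scores gene j as $(\mu_1 - \mu_2)/(\sigma_1 + \sigma_2)$ from the class means and standard deviations. The sketch below ranks genes this way. The one-versus-rest handling of several classes and the function names are assumptions, since the multiclass extension used in the study is not spelled out here.

```python
import numpy as np

def snr_scores(X, labels, positive):
    """Signal-to-noise score of each gene (column of X), class `positive` vs the rest.

    Uses (mu1 - mu2) / (sigma1 + sigma2) with population standard deviations.
    """
    labels = np.asarray(labels)
    a = X[labels == positive]          # samples of the chosen class
    b = X[labels != positive]          # all remaining samples
    return (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0))

def top_genes(X, labels, n_keep):
    """Indices of the n_keep genes with the largest |score| over all one-vs-rest splits."""
    best = np.zeros(X.shape[1])
    for c in np.unique(labels):
        best = np.maximum(best, np.abs(snr_scores(X, labels, c)))
    return np.argsort(-best)[:n_keep]
```

A filter of this kind only prunes the candidate list; the choice among the surviving genes is left to the mixed integer formulation.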

Through the aforementioned analysis, several conserved key genes across multiple solutions were identified, all of which are integral to tumor progression. The first of these genes, caveolin-1 (CAV-1), has been documented to be a key component in the formation of a variety of tumors when its expression patterns are altered, such as prostate [50], bladder [37], esophageal, and mammary [26,51]. The effects of CAV-1 in tumorigenesis fall into three major categories: 1) deregulation of cell cycle control [49]; 2) metalloproteinase production [51]; and 3) induction of angiogenesis [44].

Selection of Maximally Informative Genes, Figure 4
Mixed formulation for minimizing informative genes and classifier complexity

The second gene identified, neurofibromatosis 2 (NF2), has also been shown to be involved in cancer formation in neural-based tumors such as schwannomas and meningiomas [17,31,40,52]. NF2 exerts its effect through its gene product, merlin, which is involved in the regulation of cell motility and cell proliferation. Recent studies have highlighted the involvement of merlin in tumor suppression through the inhibition of Rac signaling. In cases where the production of NF2 is diminished, Rac signaling becomes activated, and the tumor suppression capabilities of NF2 are lost. The third major gene identified is a myeloid/lymphoid or mixed-lineage leukemia marker (AF1Q). While the literature on this gene is limited, it is known that this gene is necessary for neuronal differentiation [28]. Thus, it is possible that uncontrolled regulation of this gene may lead to neuronal-based tumors.

Selection of Maximally Informative Genes, Figure 5
Optimal space decomposition for the SRBCT problem

The fourth gene, sarcoglycan, alpha (SGCA), is a component of the dystrophin-glycoprotein complex, and has been linked to the onset of mammary tumorigenesis [48]. Tumorigenesis is induced when mutations inhibit the production of SGCA. This leads to a subsequent loss of control over a variety of functions such as growth control, cell survival, cytoskeletal organization, basement membrane assembly, branching morphogenesis, polarity, and tumor suppression in epithelial cells. As mentioned previously in the case of CAV-1, when these key components of cellular function become aberrant, tumorigenesis ensues.
Another gene identified through our analysis is the CD99 antigen (CD99), which has been determined to be a marker of lung carcinomas as well as mammary tumors. It is thought that CD99 might play an integral role in the aggregation of breast cancer cells, the initiating step of tumorigenesis. CD99 also assists in the invasive processes characteristic of metastatic tumors. Finally, the gene for the Fc fragment of IgG, receptor, transporter, alpha (FCGRT, FCRN) was also selected. This gene, as well as a few others, has been detected through the use of cDNA microarrays in studies aimed at elucidating the underlying genomic profile of astrocytomas [22]. Specifically, FCRN is known to mediate immune defense in response to the onset of pilocytic astrocytomas, possibly keeping the astrocytoma in a benign state. In addition, FCRN is known to be expressed by dendritic cells [54] and may serve as a basal mechanism of immune function.


Conclusions 


This summary presented a number of optimization-based formulations, with emphasis on mathematical programming (MP), for addressing the problem of gene selection. Numerous issues can be raised for future research. In fact, the advantage of an MP-based formalism is the tremendous flexibility it provides.

Multi-objective optimization: Interpretation of bi- 
ological information needs to tackle multiple simul- 
taneous objectives. In this short review we discussed 
simultaneous optimization of accuracy and size of clas- 
sifier (number of features). In clustering applications 
the number of clusters is yet another level of complex- 
ity, hence an additional decision variable. Therefore, 
multicriteria trade-off curves (Pareto solutions) have to 
be developed for these high-dimensional mixed integer 
(non) linear optimization problems. 

Incorporation of biological constraints: One of the advantages of using mathematical programming techniques is that constraints can be readily accounted for. Thus far, microarray analysis approaches treat the array data as raw unconstrained measurements. One of the targets of microarray analysis is to identify potential correlations among the data. However, prior biological knowledge is not taken into account, mainly because most data mining methods cannot handle implicit or explicit constraints. Recently, the need to account for biologically driven constraints when clustering expression profiles was demonstrated [42].

Large-scale combinatorial optimization: The development of scalable algorithms is a daunting task in optimization theory. With the recent developments in genomics, we should routinely expect the analysis of gene arrays composed of tens of thousands of probes (hence tens of thousands of binary variables in the MIP gene selection formulation). Recent works [14,18,39] discuss various mixed-integer reformulations of the classification problem. The recent work of Shioda et al. [43] identified opportunities for successful reformulations of various data mining tasks in the context of linear integer optimization. Busygin et al. present some more recent ideas for addressing the bi-clustering problem as a fractional 0-1 optimization problem [9]. Undoubtedly, integer optimization will play a prominent role in future algorithmic developments, as recent results demonstrate the complementarity of the different methodologies, suggesting that a unified approach may help to uncover complex genetic risk factors not currently discovered with a single method [34].

Global optimization: The development of general 
non-linear, non-convex separating boundaries natu- 
rally leads to requirements of solving large-scale com- 
binatorial non-linear problems to global optimality. Re- 
cent advances in the theory and practice of determinis- 
tic global optimization are also expected to be critical 
enablers [16]. 

Analyzing almost empty spaces: The sparseness of the data set is a critical roadblock. Accurate models can be developed using convoluted optimization approaches. However, we will constantly lack appropriately populated datasets to achieve a reasonable balance between the thousands of independent variables (genes measured) and the measurements (tissue samples) necessary for a robust identification. Information theoretic approaches accounting for complexity (Akaike and Bayesian Information Criteria) should be developed to strike a balance between the complexity and the accuracy of the model, so as to avoid pointless overfitting of the sparsely populated datasets.

Uncertainty considerations: Noise and uncertainty in the data are a given. Therefore, data mining algorithms in general, and mathematical programming formulations in particular, have to account for the presence of noise. Issues of robustness and uncertainty propagation have to be incorporated. However, an interesting issue emerges: how do we distinguish between noise and an infrequent, albeit interesting, observation? This may in fact be a question with no answer, especially if we consider the implications of sparsely populated data sets.
In this article we will describe the selfdual parametric method for solving linear programs (LPs) of the following form:

$$\min\ cx \quad \text{s.t.}\quad Ax = b,\ x \ge 0,\ x \in \mathbb{R}^n, \tag{1}$$


where x is a vector of continuous variables; A is a constant matrix; c and b are constant vectors; and we assume that (1) does not contain redundant constraints (see [8] or [7] for a procedure for removing redundant constraints). The solution of (1) can be approached by primal or dual simplex methods, or their variants [4,3]; see [9] for interior point methods. Primal and dual simplex algorithms start from a primal or a dual feasible solution, respectively, which is then driven to optimality by primal or dual simplex pivot steps, respectively. On the other hand, the selfdual parametric method, which is also called the criss-cross method, does not require a starting feasible solution and uses a combination of primal and dual simplex pivot steps. The criss-cross method is based upon introducing a parameter, θ, into the model (1), and then minimizing θ by using primal or dual simplex pivot steps. If θ decreases to zero, the problem is primal or dual feasible, and the optimal solution is then found by using the primal or dual simplex method, respectively; otherwise, the problem is infeasible.

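In order to apply the selfdual parametric method, the parameter θ is introduced and the problem in (1) is rewritten in the following form [4,8]:

$$\begin{aligned}
\min\ & \sum_{j=1}^{n} \left(c'_j + \theta\, \bar c_j\right) x_j\\
\text{s.t.}\ & x_i + \sum_{j=m+1}^{n} a'_{ij} x_j = b'_i + \theta\, \bar b_i, \quad i = 1, \dots, m,\\
& x_j \ge 0 \ \text{for all } j, \quad x \in \mathbb{R}^n,
\end{aligned} \tag{2}$$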

where $(x_1, \dots, x_m)$ is the vector of the selected basic variables; $(0, \dots, 0, c'_{m+1}, \dots, c'_n)$ and $(b'_1, \dots, b'_m)$ are the updated c and b vectors after pricing out the basic columns; $\bar b_i = 0$ if $b'_i \ge 0$, $\bar b_i = 1$ otherwise; and $\bar c_j = 0$ if $c'_j \ge 0$, $\bar c_j = 1$ otherwise. The basic idea of the selfdual parametric method is to minimize θ by using primal or dual simplex pivot steps; the smallest value of θ corresponding to an optimal tableau is called the critical value, $\theta^*$. When $\bar b = \bar c = 0$, then $\theta^* = 0$, and the problem can be solved to optimality by using the primal or dual simplex method. Otherwise, $\theta^* > 0$ and is given by

$$\theta^* = \max\left\{ \max_{i:\ b'_i < 0} \left(-\frac{b'_i}{\bar b_i}\right),\ \max_{j:\ c'_j < 0} \left(-\frac{c'_j}{\bar c_j}\right) \right\}.$$

In such a case, i.e., when $\theta^* > 0$, if $\theta^* = -b'_i/\bar b_i$, the ith row of the current tableau is defined as a critical row, and if $\theta^* = -c'_j/\bar c_j$, then the jth column of the current tableau is defined as a critical column. The criss-cross method identifies a critical column or a critical row, and a primal or dual simplex step, respectively, is carried out. Either θ decreases to zero in a finite number of steps or the problem is primal or dual infeasible. The basic idea of the algorithm is outlined in Fig. 1.

[Flowchart boxes: Initialize a basis; Select a critical row or a critical column; Carry out a dual or a primal pivot step; Use primal or dual simplex algorithm]
Selfdual Parametric Method for Linear Programs, Figure 1
Criss-cross method
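The critical value and the choice between a critical row and a critical column can be computed directly from the updated vectors $b'$ and $c'$. The minimal Python sketch below assumes that `c_prime` lists the updated costs of the nonbasic columns and breaks ties in favor of the row; both are assumptions, the function name is hypothetical, and the pivot steps themselves are not shown. Because $\bar b_i = 1$ exactly when $b'_i < 0$ (and likewise for $\bar c_j$), the ratios reduce to $-b'_i$ and $-c'_j$ over the negative entries.

```python
def critical_value(b_prime, c_prime):
    """Return (theta_star, kind, index) for the current tableau.

    kind is "row" for a critical row (followed by a dual simplex step),
    "column" for a critical column (followed by a primal simplex step), or
    None when theta_star = 0, i.e. no updated b' or c' entry is negative.
    """
    theta, kind, index = 0.0, None, None
    for i, bi in enumerate(b_prime):
        if bi < 0 and -bi > theta:           # ratio -b'_i / bbar_i with bbar_i = 1
            theta, kind, index = -bi, "row", i
    for j, cj in enumerate(c_prime):
        if cj < 0 and -cj > theta:           # ratio -c'_j / cbar_j with cbar_j = 1
            theta, kind, index = -cj, "column", j
    return theta, kind, index
```

For example, with $b' = (3, -2, 1)$ and nonbasic costs $c' = (0.5, -4)$, the critical value is 4, attained at the second nonbasic column, so a primal simplex step is taken in that column.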

Other useful references on the subject are [1,2,3,5, 
10,11] and [6]. 
A wide variety of nonlinear convex optimization problems can be expressed in terms of linear matrix inequalities (LMIs), i.e., convex constraints of the form

$$x_1 A_1 + \cdots + x_n A_n \preceq B,$$

where $x \in \mathbb{R}^n$ is the optimization variable, $A_i = A_i^T \in \mathbb{R}^{m \times m}$ and $B = B^T \in \mathbb{R}^{m \times m}$ are given matrices, and the inequality sign denotes matrix inequality (i.e., the matrix $B - \sum_i x_i A_i$ is positive semidefinite). Linear matrix inequalities arise in systems and control, combinatorial optimization, statistics, eigenvalue optimization, and several other fields.
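Checking whether a given x satisfies an LMI amounts to an eigenvalue test on $B - \sum_i x_i A_i$. A minimal NumPy sketch follows; the function name and the tolerance are illustrative assumptions.

```python
import numpy as np

def lmi_holds(x, A_list, B, tol=1e-9):
    """True if x_1 A_1 + ... + x_n A_n <= B in the semidefinite order.

    Equivalent to B - sum_i x_i A_i being positive semidefinite, tested via
    its smallest eigenvalue (eigvalsh assumes symmetric input).
    """
    M = B - sum(xi * Ai for xi, Ai in zip(x, A_list))
    return bool(np.linalg.eigvalsh(M).min() >= -tol)
```

The set of x passing this test is convex, which is what makes LMI-constrained problems convex.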

The most widely studied problem of this type is 
the semidefinite programming problem (SDP), in which 
we minimize a linear function subject to an LMI con- 
straint: 


$$\min\ c^T x \quad \text{s.t.}\quad x_1 A_1 + \cdots + x_n A_n \preceq B.$$


Semidefinite programming has received a great deal of 
attention recently, motivated by the discovery of new 
applications and by the development of new interior 
point methods. For surveys of the theory and applica- 
tions of SDP, see [1,5,14,15,16,21]. SDPs are convex op- 
timization problems, since the linear matrix inequality 
is a convex constraint, and the objective function is lin- 
ear. 

An interesting closely related problem is the prob- 
lem of maximizing the determinant of a linear matrix 
function, subject to LMI constraints: 


$$\begin{aligned}
\max\ & \det(C_0 + x_1 C_1 + \cdots + x_n C_n)\\
\text{s.t.}\ & C_0 + x_1 C_1 + \cdots + x_n C_n \succ 0\\
& x_1 A_1 + \cdots + x_n A_n \preceq B,
\end{aligned} \tag{1}$$

where $C_i = C_i^T \in \mathbb{R}^{l \times l}$, $A_i = A_i^T \in \mathbb{R}^{m \times m}$, and $B = B^T \in \mathbb{R}^{m \times m}$ are given. Although the objective function is not concave, this problem can easily be transformed into a convex optimization problem. We first note that the function $f(X) = -(\det X)^{1/l}$ is convex on the set of positive definite matrices in $\mathbb{R}^{l \times l}$, so problem (1) is equivalent to the convex minimization problem


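$$\begin{aligned}
\min\ & -(\det C(x))^{1/l}\\
\text{s.t.}\ & C(x) \succ 0\\
& x_1 A_1 + \cdots + x_n A_n \preceq B,
\end{aligned}$$

where $C(x) = C_0 + x_1 C_1 + \cdots + x_n C_n$. An alternative formulation is

$$\begin{aligned}
\min\ & \log\det C(x)^{-1}\\
\text{s.t.}\ & C(x) \succ 0\\
& x_1 A_1 + \cdots + x_n A_n \preceq B.
\end{aligned}$$

This is a convex problem, since the function $\log\det X^{-1}$ is convex on the set of positive definite matrices.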

A unified form that includes both the SDP and the 
determinant maximization problem is 


$$\begin{aligned}
\min\ & c^T x + \log\det C(x)^{-1}\\
\text{s.t.}\ & C(x) \succ 0\\
& x_1 A_1 + \cdots + x_n A_n \preceq B.
\end{aligned} \tag{2}$$

It is clear that this problem reduces to an SDP when $C_0 = I$ and $C_i = 0$ for $i > 0$, and to a determinant maximization problem when $c = 0$. Moreover, as we will see below, there exist important applications where both terms arise.

Reference [22] gives a detailed discussion of prob- 
lem (2), including an overview of applications and a de- 
scription of an efficient interior point method. In this 
article, we illustrate the applications of (2) with a few 
examples from different areas. Following [22], we will 
refer to (2) as a max-det problem. 


Ellipsoidal Approximation 


Our first class of examples are ellipsoidal approximation 
problems. We can distinguish two basic problems. The 
first is the problem of finding the minimum-volume el- 
lipsoid around a given set C. The second problem is the 
problem of finding the maximum-volume ellipsoid con- 
tained in a given convex set C. Both can be formulated 
as convex (semi-infinite) programming problems. 

To solve the first problem, it is convenient to 
parametrize the ellipsoid as the pre-image of a unit ball 
under an affine transformation, i. e., as 


$$\mathcal{E} = \{v : \|Av + b\| \le 1\}.$$
It can be assumed without loss of generality that $A = A^T \succ 0$, in which case the volume of $\mathcal{E}$ is proportional to $\det A^{-1}$. The problem of computing the minimum-volume ellipsoid containing C can be written as

$$\begin{aligned}
\min\ & \log\det A^{-1}\\
\text{s.t.}\ & A = A^T \succ 0\\
& \|Av + b\| \le 1 \quad \text{for all } v \in C,
\end{aligned} \tag{3}$$

where the variables are A and b. Note that both the objective function and the constraints are convex in A and b.

For the second problem, where we maximize the volume of ellipsoids enclosed in a convex set C, it is more convenient to represent the ellipsoid as the image of the unit ball under an affine transformation, i.e., as

$$\mathcal{E} = \{By + d : \|y\| \le 1\}.$$

Again it can be assumed that $B = B^T \succ 0$. The volume is proportional to $\det B$, so we can find the maximum volume ellipsoid inside C by solving the convex optimization problem

$$\begin{aligned}
\max\ & \log\det B\\
\text{s.t.}\ & B = B^T \succ 0\\
& By + d \in C \quad \text{for all } y \text{ with } \|y\| \le 1,
\end{aligned} \tag{4}$$

in the variables B and d.


For general C, problems (4) and (3) are semi-infinite 
programming problems. They reduce to finite prob- 
lems in certain cases, which we now review. 

We first consider the problem of finding the minimum volume ellipsoid that contains given points $x^1, \dots, x^K \in \mathbb{R}^n$, i.e.,

$$C = \{x^1, \dots, x^K\}$$

(or, equivalently, the convex hull of $\{x^1, \dots, x^K\}$). This problem has applications in cluster analysis [4,18] and robust statistics [19, §7]. Applying (3), we can write this problem as

$$\begin{aligned}
\min\ & \log\det A^{-1}\\
\text{s.t.}\ & \|Ax^i + b\| \le 1, \quad i = 1, \dots, K\\
& A = A^T \succ 0,
\end{aligned} \tag{5}$$

where the variables are $A = A^T \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$. The norm constraints $\|Ax^i + b\| \le 1$, which are convex quadratic inequalities in the variables A and b, can be expressed as LMIs

$$\begin{bmatrix} I & Ax^i + b\\ (Ax^i + b)^T & 1 \end{bmatrix} \succeq 0,$$

so (5) is a max-det problem in the variables A and b.
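Problem (5) is normally handed to a max-det (interior point) solver. As a self-contained illustration that produces an ellipsoid in the same $\{v : \|Av + b\| \le 1\}$ parametrization, the sketch below swaps in Khachiyan's first-order algorithm for the minimum-volume enclosing ellipsoid. It is a different method from the one discussed in [22], it returns only an approximate solution, and the function name and tolerance are assumptions.

```python
import numpy as np

def min_volume_ellipsoid(points, tol=1e-4, max_iter=100_000):
    """Approximate minimum-volume ellipsoid {x : ||A x + b|| <= 1} around points.

    Khachiyan's algorithm: reweight the points until every lifted point q_k
    satisfies q_k^T X^{-1} q_k <= n + 1 (approximately). points: shape (K, n).
    Returns a symmetric positive definite A and the vector b.
    """
    P = np.asarray(points, dtype=float).T            # shape (n, K)
    n, K = P.shape
    Q = np.vstack([P, np.ones(K)])                   # lifted points, (n+1, K)
    u = np.full(K, 1.0 / K)                          # weights on the points
    for _ in range(max_iter):
        X = Q @ np.diag(u) @ Q.T
        M = np.einsum("ij,ji->i", Q.T, np.linalg.solve(X, Q))  # q_k^T X^{-1} q_k
        j = int(np.argmax(M))
        step = (M[j] - n - 1.0) / ((n + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step                              # shift weight to worst point
        done = np.linalg.norm(new_u - u) < tol
        u = new_u
        if done:
            break
    c = P @ u                                         # ellipsoid center
    E = np.linalg.inv(P @ np.diag(u) @ P.T - np.outer(c, c)) / n
    w, V = np.linalg.eigh(E)                          # E = A^2 with A = E^{1/2}
    A = V @ np.diag(np.sqrt(w)) @ V.T
    return A, -A @ c                                  # b = -A c
```

For the four corners $(\pm1, \pm1)$ of a square, it returns $A = I/\sqrt{2}$ and $b = 0$, i.e. the circle of radius $\sqrt{2}$ through the corners.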

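In a similar way, we can compute the maximum volume ellipsoid contained in a polytope described by a set of linear inequalities

$$C = \{x : a_i^T x \le b_i,\ i = 1, \dots, L\}.$$

To apply (4) we first work out the last constraint:

$$\begin{aligned}
By + d \in C \ \ \forall\, \|y\| \le 1
&\iff a_i^T (By + d) \le b_i \ \ \forall\, \|y\| \le 1,\ i = 1, \dots, L\\
&\iff \sup_{\|y\| \le 1} a_i^T B y + a_i^T d \le b_i,\ i = 1, \dots, L\\
&\iff \|B a_i\| + a_i^T d \le b_i,\ i = 1, \dots, L.
\end{aligned}$$

This is a convex constraint in B and d, and equivalent to the LMI

$$\begin{bmatrix} (b_i - a_i^T d) I & B a_i\\ (B a_i)^T & b_i - a_i^T d \end{bmatrix} \succeq 0.$$

We can therefore formulate (4) as a max-det problem in the variables B and d:

$$\begin{aligned}
\min\ & \log\det B^{-1}\\
\text{s.t.}\ & B \succ 0\\
& \begin{bmatrix} (b_i - a_i^T d) I & B a_i\\ (B a_i)^T & b_i - a_i^T d \end{bmatrix} \succeq 0, \quad i = 1, \dots, L.
\end{aligned}$$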
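The derivation above reduces containment of $\{By + d : \|y\| \le 1\}$ in the polytope to the L scalar conditions $\|Ba_i\| + a_i^T d \le b_i$. A short NumPy check of these conditions for a given B and d follows (function name hypothetical). It verifies feasibility only and does not maximize $\log\det B$.

```python
import numpy as np

def inscribed(B, d, A_ineq, b_ineq, tol=1e-12):
    """True if {B y + d : ||y|| <= 1} lies in {x : A_ineq @ x <= b_ineq}.

    Row i of A_ineq is a_i^T; row i of A_ineq @ B is (B a_i)^T for symmetric B,
    so its norm is ||B a_i||.
    """
    lhs = np.linalg.norm(A_ineq @ B, axis=1) + A_ineq @ d
    return bool(np.all(lhs <= b_ineq + tol))
```

For the square $|x_1| \le 1$, $|x_2| \le 1$, the unit disk ($B = I$, $d = 0$) passes the test, while $B = 1.1\,I$ does not.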


These techniques extend to several interesting cases 
where C is not finite or polyhedral, but is defined as 
a combination (the sum, union, or intersection) of ellip- 
soids. In particular, it is possible to compute the optimal 
inner approximation of the intersection or the sum of 
ellipsoids, and the optimal outer approximation of the 
union or sum of ellipsoids, by solving a max-det prob- 
lem. See [5] and [8] for details. 

As an example, consider the problem of finding the minimum volume ellipsoid $\mathcal{E}_0$ containing K given ellipsoids $\mathcal{E}_1, \dots, \mathcal{E}_K$ described as

$$\mathcal{E}_i = \{x : x^T A_i x + 2 b_i^T x + c_i \le 0\}.$$
The solution can be found by solving the following max-det problem in the variables $A_0 = A_0^T$, $b_0$, and K scalar variables $\tau_i$:

$$\begin{aligned}
\min\ & \log\det A_0^{-1}\\
\text{s.t.}\ & A_0 = A_0^T \succ 0\\
& \tau_1 \ge 0, \dots, \tau_K \ge 0\\
& \begin{bmatrix} A_0 - \tau_i A_i & b_0 - \tau_i b_i & 0\\ (b_0 - \tau_i b_i)^T & -1 - \tau_i c_i & b_0^T\\ 0 & b_0 & -A_0 \end{bmatrix} \preceq 0, \quad i = 1, \dots, K.
\end{aligned}$$

($c_0$ is given by $c_0 = b_0^T A_0^{-1} b_0 - 1$.) See [5, p. 43] for details.


Experiment Design 


The field of experiment design is another source of max-det problems. Suppose $x \in \mathbb{R}^n$ is an unknown quantity that we want to estimate from a measurement $y = Ax + w$, where $w \sim N(0, I)$ is measurement noise, and $A \in \mathbb{R}^{N \times n}$ ($N \ge n$) has full rank. The minimum variance estimator is given by $\hat x = (A^T A)^{-1} A^T y$, and has an error covariance $\mathbf{E}(x - \hat x)(x - \hat x)^T = (A^T A)^{-1}$. Suppose that the rows of the matrix A can be chosen among M possible test vectors $v^i \in \mathbb{R}^n$. The goal of experiment design is to choose the rows of A so that the error covariance is ‘small’. We can write $A^T A = N \sum_{i=1}^M \lambda_i v^i v^{iT}$, where $\lambda_i$ is the fraction of measurements that use the test vector $v^i$. If $N \gg M$, we can ignore the fact that the numbers $\lambda_i$ are integer multiples of $1/N$, and treat them as continuous variables.

Several different criteria can be used to measure the size of the error covariance matrix (e.g., the maximum eigenvalue, the trace, or the determinant). A design in which the determinant $\det(A^T A)^{-1}$ is minimized is called D-optimal. This problem can be expressed as a max-det problem

$$\begin{aligned}
\min\ & \log\det\Big(\sum_{i=1}^M \lambda_i v^i v^{iT}\Big)^{-1}\\
\text{s.t.}\ & \lambda_i \ge 0, \quad i = 1, \dots, M,\\
& \sum_{i=1}^M \lambda_i = 1.
\end{aligned}$$


This formulation of D-optimal experiment design has 
the advantage that one can easily incorporate useful 


convex constraints, e.g., linear inequalities on the numbers $\lambda_i$. A few interesting possibilities are mentioned in
[22]. For surveys of experiment design, see [9,12,17]. 
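Once the $\lambda_i$ are treated as continuous, a simple classical alternative to an interior point method for the D-optimal problem is the multiplicative weight update $\lambda_i \leftarrow \lambda_i\, v^{iT} M(\lambda)^{-1} v^i / n$, with $M(\lambda) = \sum_i \lambda_i v^i v^{iT}$. It keeps $\sum_i \lambda_i = 1$ because $\sum_i \lambda_i v^{iT} M^{-1} v^i = \operatorname{Tr} I = n$. The sketch below is a swapped-in technique, not the method of [22]; the function name and iteration count are assumptions, and it does not handle additional convex constraints on the $\lambda_i$.

```python
import numpy as np

def d_optimal_weights(V, iters=2000):
    """Approximate D-optimal design weights for test vectors given as rows of V.

    Multiplicative update lam_i <- lam_i * v_i^T M(lam)^{-1} v_i / n, where
    M(lam) = sum_i lam_i v_i v_i^T; starts from uniform weights.
    """
    V = np.asarray(V, dtype=float)
    num_vectors, n = V.shape
    lam = np.full(num_vectors, 1.0 / num_vectors)
    for _ in range(iters):
        M = V.T @ (lam[:, None] * V)                            # sum_i lam_i v_i v_i^T
        g = np.einsum("ij,ji->i", V, np.linalg.solve(M, V.T))   # v_i^T M^{-1} v_i
        lam = lam * g / n                                       # weights still sum to 1
    return lam
```

With test vectors $(1,0)$, $(0,1)$, $(1,1)$, $(0.5,0.5)$, the weights converge to $1/3$ on each of the first three, and the dominated vector $(0.5, 0.5)$ is dropped.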


Estimation of Structured Covariance Matrices 


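Suppose we want to estimate the covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$ of a normal distribution $N(0, \Sigma)$ from M samples $y^1, \dots, y^M$. The maximum likelihood estimate for $\Sigma$ is the positive definite matrix that maximizes $\prod_{i=1}^M p(y^i)$, where

$$p(x) = \left((2\pi)^p \det\Sigma\right)^{-1/2} \exp\left(-\tfrac{1}{2} x^T \Sigma^{-1} x\right).$$

In other words, $\Sigma$ can be found by solving

$$\begin{aligned}
\max\ & \log\det\Sigma^{-1} - \frac{1}{M}\sum_{i=1}^M y^{iT} \Sigma^{-1} y^i\\
\text{s.t.}\ & \Sigma = \Sigma^T \succ 0,
\end{aligned} \tag{6}$$

which can be expressed as a max-det problem in the inverse $R = \Sigma^{-1}$:

$$\begin{aligned}
\min\ & \operatorname{Tr} SR + \log\det R^{-1}\\
\text{s.t.}\ & R \succ 0,
\end{aligned} \tag{7}$$

where

$$S = \frac{1}{M}\sum_{i=1}^M y^i y^{iT}.$$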

Without any additional constraints, this problem has a straightforward analytical solution: $R = S^{-1}$ (provided S is nonsingular). The formulation as a max-det problem is useful when additional constraints are added. To give a simple illustration, bounds on the variances $\Sigma_{ii}$ can be expressed as LMIs in R:

$$\Sigma_{ii} = e_i^T R^{-1} e_i \le \alpha \iff \begin{bmatrix} R & e_i\\ e_i^T & \alpha \end{bmatrix} \succeq 0.$$

Adding constraints is also useful when the matrix S is singular (for example, because the number of samples is too small) and, as a consequence, the max-det problem (7) is unbounded below. In this case we can impose constraints (i.e., prior information) on $\Sigma$, for example lower and upper bounds on the diagonal elements of R. See also [2,3,6], [20, §6.13], and [10].
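Without extra constraints, the minimizer of (7) is $R = S^{-1}$. A small NumPy sketch (function names hypothetical, zero-mean samples assumed as in the text) forms S, computes $R = S^{-1}$, and evaluates the objective of (7). This makes it easy to confirm numerically that moving away from $S^{-1}$ increases the objective.

```python
import numpy as np

def ml_precision(Y):
    """Sample matrix S and unconstrained ML estimate R = S^{-1} (rows of Y = samples)."""
    S = Y.T @ Y / Y.shape[0]          # S = (1/M) sum_i y_i y_i^T
    return S, np.linalg.inv(S)        # requires S nonsingular

def objective(S, R):
    """Objective of (7): Tr(S R) + log det R^{-1} for positive definite R."""
    _, logdet = np.linalg.slogdet(R)
    return np.trace(S @ R) - logdet
```

At $R = S^{-1}$ the objective equals $p + \log\det S$; scaling R by any $t \ne 1$ raises it by $p(t - 1 - \log t) > 0$.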
Quasi-Newton Updates


In a quasi-Newton method for unconstrained minimization of a convex function f on $\mathbb{R}^n$, the Newton step $-\nabla^2 f(x)^{-1} \nabla f(x)$ is replaced by $-H^{-1} \nabla f(x)$, where $H = H^T \succ 0$ is an approximation of the Hessian matrix, based on prior information and previous gradient evaluations. At each iteration, as the algorithm moves from x to the next point $x^+$, a new approximation $H^+$ is determined, based on the current H and on the difference between the gradients at $x^+$ and x. A good updating rule for H should satisfy several properties: $H^+$ should be close to H, it should be easy to compute (or, more precisely, the search direction $-(H^+)^{-1} \nabla f(x^+)$ should be easy to compute), and it should incorporate the new information obtained by evaluating the gradient $\nabla f(x^+)$. This last property is usually enforced by imposing the secant condition

$$H^+ (x^+ - x) = \nabla f(x^+) - \nabla f(x). \tag{8}$$


R.H. Byrd and J. Nocedal [7] have proposed to measure the difference between H and $H^+$ by using the Kullback-Leibler divergence (or relative entropy), given by

$$\frac{1}{2}\left(\operatorname{Tr} H^{-1} H^+ - \log\det H^{-1} H^+ - n\right)$$

(see also [11]). The Kullback-Leibler divergence is nonnegative for all positive definite H and $H^+$, and zero only if $H^+ = H$. The update $H^+$ that satisfies the secant condition and minimizes the Kullback-Leibler divergence is the solution of the following optimization problem:

$$\begin{aligned}
\min\ & \operatorname{Tr} H^{-1} H^+ - \log\det H^{-1} H^+\\
\text{s.t.}\ & H^+ \succ 0\\
& H^+ (x^+ - x) = \nabla f(x^+) - \nabla f(x).
\end{aligned} \tag{9}$$


R. Fletcher [13] has shown that the solution is given by

$$H^+ = H - \frac{H s s^T H}{s^T H s} + \frac{g g^T}{s^T g}, \tag{10}$$

assuming that $s^T g > 0$, where $s = x^+ - x$ and $g = \nabla f(x^+) - \nabla f(x)$. Formula (10) is well known in unconstrained optimization as the BFGS (Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton update.
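Formula (10) is cheap to apply directly. The sketch below (hypothetical function names) implements it together with the Kullback-Leibler measure above, so that the secant condition (8) and the symmetry of $H^+$ can be checked numerically.

```python
import numpy as np

def bfgs_update(H, s, g):
    """BFGS update (10) of the Hessian approximation H; requires s^T g > 0."""
    sg = s @ g
    if sg <= 0:
        raise ValueError("curvature condition s^T g > 0 violated")
    Hs = H @ s
    return H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(g, g) / sg

def kl_divergence(H, H_plus):
    """(1/2) (Tr H^{-1} H+ - log det H^{-1} H+ - n) for positive definite H, H+."""
    n = H.shape[0]
    K = np.linalg.solve(H, H_plus)                 # H^{-1} H+
    return 0.5 * (np.trace(K) - np.linalg.slogdet(K)[1] - n)
```

Applying $H^+$ to s gives back g exactly, which is the secant condition (8). Positive definiteness is preserved whenever $s^T g > 0$.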

Fletcher’s observation opens the possibility of 
adding more complicated LMI constraints to (9), and 


solving the resulting problem numerically. For example, we can impose a certain sparsity pattern on $H^+$, or we can relax the secant condition as

$$\left\|H^+ (x^+ - x) - \nabla f(x^+) + \nabla f(x)\right\| \le \epsilon,$$

where $\epsilon$ is a given tolerance.

Updating H by numerically solving a convex opti- 
mization problem will obviously involve far more com- 
putation than the BFGS update. Thus, this formulation 
for quasi-Newton updates is only interesting when gra- 
dient evaluations are very expensive. 


Conclusion 


The max-det problem (2) is an extension of the semidefinite programming problem, and includes a wide variety of convex optimization problems as special cases. Some of the applications we discussed have been studied extensively in the literature, and in some cases analytic solutions or efficient specialized algorithms have been developed. The practical importance of the general problem is that it can be handled efficiently using recently developed interior point methods. This allows us to solve interesting variations of problems for which no analytical solution or specialized method exists, and it opens the possibility of adding useful LMI constraints.
In this article we discuss optimization problems of the form

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad G(x) \preceq 0, \qquad (1)$$

where $f(x)$ is a real valued function, $G(x)$ is a mapping from $\mathbb{R}^n$ into the space $\mathcal{S}^p$ of $p \times p$ symmetric matrices, and the notation $A \preceq 0$ (the notation $A \succeq 0$) means that $A$ is a negative (positive) semidefinite matrix. We refer to (1) as a semidefinite programming problem. The above semidefinite programming problem is said to be linear if the objective function $f(x)$ is linear and the mapping $G(x)$ is affine, i.e. is of the form


$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad A_0 + \sum_{i=1}^n x_i A_i \preceq 0, \qquad (2)$$


where $c \in \mathbb{R}^n$ and $A_0, \dots, A_n \in \mathcal{S}^p$ are given matrices. Let us observe that the constraint $G(x) \preceq 0$ can be written as $G(x) \in \mathcal{S}^p_-$, where

$$\mathcal{S}^p_- := \{Z \in \mathcal{S}^p : Z \preceq 0\}$$

denotes the set of $p \times p$ negative semidefinite symmetric matrices. Note that the set $\mathcal{S}^p_-$ forms a closed convex cone in the space $\mathcal{S}^p$, and that $\mathcal{S}^p$ can be viewed as a linear (vector) space of dimension $p(p+1)/2$. We equip $\mathcal{S}^p$ with the scalar (inner) product

$$A \bullet B := \operatorname{trace}(AB) = \sum_{i,j=1}^p a_{ij} b_{ij}$$

between two matrices $A = (a_{ij})$ and $B = (b_{ij})$.
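To make the notation concrete, here is a small NumPy sketch, assuming hypothetical data with $p = n = 2$ (the helper names `G_affine`, `inner` and `is_nsd` are illustrative, not from the article). It evaluates the affine map $G(x) = A_0 + \sum_i x_i A_i$ of (2), computes the scalar product $A \bullet B$ both ways, and tests membership in $\mathcal{S}^p_-$ through the largest eigenvalue.

```python
import numpy as np

def G_affine(x, A):
    """G(x) = A_0 + sum_i x_i A_i, where A = [A_0, A_1, ..., A_n] are symmetric p x p."""
    return A[0] + sum(xi * Ai for xi, Ai in zip(x, A[1:]))

def inner(A, B):
    """Scalar product A . B = trace(AB) = sum_ij a_ij b_ij on symmetric matrices."""
    return float(np.sum(A * B))

def is_nsd(Z, tol=1e-10):
    """Membership in S^p_-: the largest eigenvalue is (numerically) <= 0."""
    return bool(np.linalg.eigvalsh(Z).max() <= tol)

# Hypothetical data with p = n = 2.
A = [-np.eye(2), np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([[0.0, 1.0], [1.0, 0.0]])]
x = np.array([0.5, 0.2])
print(is_nsd(G_affine(x, A)))   # G(x) = [[-0.5, 0.2], [0.2, -1.0]]
```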

With these observations at hand one can notice a certain similarity between problem (1) and nonlinear programming problems. In a nonlinear programming problem the corresponding constraints can be written in the form $G(x) \in \mathbb{R}^m_-$, where $G(x)$ is a mapping from $\mathbb{R}^n$ into $\mathbb{R}^m$ and $\mathbb{R}^m_-$ is the negative orthant of the space $\mathbb{R}^m$. That is, in both cases the constraints can be formulated as convex cone inclusions. Let us note at this point that the polar (negative dual) of $\mathcal{S}^p_-$ is given by the cone $\mathcal{S}^p_+$ of positive semidefinite matrices.


Duality 


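With problem (1) is associated the following Lagrangian function

$$L(x, \Omega) := f(x) + \Omega \bullet G(x)$$

of $x \in \mathbb{R}^n$ and $\Omega \in \mathcal{S}^p$. Noting that, for a given $x$, the maximum of $L(x, \Omega)$ with respect to $\Omega \in \mathcal{S}^p_+$ equals $f(x)$ if $G(x) \in \mathcal{S}^p_-$ and $+\infty$ otherwise, we can write problem (1) in the form

$$\min_{x \in \mathbb{R}^n} \max_{\Omega \in \mathcal{S}^p_+} L(x, \Omega). \qquad (3)$$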
By interchanging the order in which the operators min and max are applied, we obtain

$$\max_{\Omega \in \mathcal{S}^p_+} \left\{ \min_{x \in \mathbb{R}^n} L(x, \Omega) \right\}. \qquad (4)$$


We refer to the pair (1) and (4) as primal and dual problems, respectively. In particular, in case the primal problem is the linear semidefinite program (2), the dual problem takes the form

$$\max_{\Omega \succeq 0} \ \Omega \bullet A_0 \quad \text{s.t.} \quad c_i + \Omega \bullet A_i = 0, \quad i = 1, \dots, n. \qquad (5)$$


There is a long history in mathematical programming 
of considering dual problems. In semidefinite program- 
ming dual problems were studied in various forms, for 
example, in [1,9,12]. 

The optimal value of the primal problem is always 
greater than or equal to the optimal value of the dual 
problem. It is said that there is no duality gap between 
the primal and dual problems if their optimal values 
are equal. Note that in semidefinite programming a duality gap can occur even in the linear case. It is possible to give various regularity conditions, called constraint qualifications, ensuring the 'no duality gap' property. In the linear case, if $\bar x$ and $\bar\Omega$ are feasible points of the problems (2) and (5), respectively, then $\bar x$ is an optimal solution of (2), $\bar\Omega$ is an optimal solution of (5), and there is no duality gap between these problems if and only if the following complementarity condition

$$\bar\Omega \bullet G(\bar x) = 0 \qquad (6)$$

holds, where $G(\bar x) = A_0 + \sum_{i=1}^n \bar x_i A_i$. This result can be extended to a certain class of convex semidefinite programming problems [10]. Note that since $\bar\Omega$ is a feasible point of (5), and hence $\bar\Omega \in \mathcal{S}^p_+$, and $\bar x$ is a feasible point of (2), and hence $G(\bar x) \in \mathcal{S}^p_-$, the complementarity condition (6) is equivalent to $\bar\Omega G(\bar x) = O$, where $O$ denotes the $p \times p$ zero matrix.
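For feasible $x$ and $\Omega$ the duality gap of (2) and (5) can be written out explicitly: $c^T x - \Omega \bullet A_0 = -\sum_i x_i\, \Omega \bullet A_i - \Omega \bullet A_0 = -\Omega \bullet G(x) \ge 0$. The NumPy sketch below checks this identity and the matrix form of complementarity (6) on hypothetical $2 \times 2$ data, where $c$ is constructed so that the chosen $\Omega$ is dual feasible. The data and names are illustrative assumptions.

```python
import numpy as np

def G_affine(x, A):
    """G(x) = A_0 + sum_i x_i A_i for A = [A_0, A_1, ..., A_n]."""
    return A[0] + sum(xi * Ai for xi, Ai in zip(x, A[1:]))

# Hypothetical data for the linear SDP (2) with p = n = 2.
A = [np.diag([-1.0, -1.0]), np.diag([0.0, 1.0]), np.diag([1.0, 0.0])]
Omega = np.diag([0.0, 1.0])                            # Omega is positive semidefinite
c = np.array([-np.sum(Omega * Ai) for Ai in A[1:]])    # enforces c_i + Omega . A_i = 0, cf. (5)

def duality_gap(x):
    """Primal objective c^T x minus dual objective Omega . A_0."""
    return float(c @ x - np.sum(Omega * A[0]))

x = np.array([0.5, 0.3])                 # feasible: G(x) = diag(-0.7, -0.5)
Gx = G_affine(x, A)
print(duality_gap(x), -np.sum(Omega * Gx))   # the two numbers coincide and are >= 0

x_bar = np.array([1.0, 0.0])             # G(x_bar) = diag(-1, 0)
G_bar = G_affine(x_bar, A)
print(duality_gap(x_bar), np.allclose(Omega @ G_bar, 0.0))   # zero gap and Omega G = O
```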


First Order Optimality Conditions 

Consider now a (possibly nonlinear) semidefinite programming problem in the form (1), and suppose that $f(x)$ and $G(x)$ are continuously differentiable. Let $\bar x$ be a locally optimal solution of (1). With $\bar x$ is associated the following system of first order optimality conditions:

$$\nabla_x L(\bar x, \Omega) = 0, \qquad \Omega \succeq 0, \qquad \Omega \bullet G(\bar x) = 0. \qquad (7)$$
In case the semidefinite program is linear, the first and second conditions in (7) ensure that $\Omega$ is a feasible point of the dual problem (5), and the last condition corresponds to the complementarity condition (6). Therefore, in that case, if there exists a matrix $\Omega \in \mathcal{S}^p$ satisfying conditions (7), then $\Omega$ is an optimal solution of the corresponding dual problem. For general (possibly nonlinear) semidefinite programming problems, conditions (7) are obtained by linearization of $f(x)$ and $G(x)$ at the point $\bar x$.

It should be noted that existence of a matrix $\Omega$ satisfying the first order optimality conditions (7) is not guaranteed even in the linear case. If such a matrix exists we refer to it as a Lagrange multipliers matrix. It is possible to show that the set of Lagrange multipliers matrices is nonempty and bounded if and only if the following constraint qualification holds: there exists a vector $h \in \mathbb{R}^n$ such that the matrix $G(\bar x) + dG(\bar x)h$ is negative definite (i.e. belongs to the interior of $\mathcal{S}^p_-$), where

$$dG(\bar x)h := \sum_{i=1}^n h_i \frac{\partial G(\bar x)}{\partial x_i}$$

is the differential of $G$ at $\bar x$.

The above constraint qualification can be viewed as an extension of the Mangasarian-Fromovitz constraint qualification [5], used in nonlinear programming. Note that in the linear case the partial-derivatives matrix $\partial G(\bar x)/\partial x_i$ equals $A_i$, $i = 1, \dots, n$, and hence the above constraint qualification is equivalent to the Slater condition: there exists $x^* \in \mathbb{R}^n$ such that the matrix $G(x^*)$ is negative definite. This equivalence also holds for convex semidefinite programming problems. For a discussion of constraint qualifications in cone constrained problems see [6,7,8,13].

It is possible to formulate the above constraint qualification in another equivalent form. Recall that for a convex closed subset $K$ of a finite-dimensional vector space and $y \in K$, the tangent cone $T_K(y)$ to $K$ at $y$ is formed by vectors $z$ such that the distance from $y + tz$ to $K$ is of order $o(t)$, $t > 0$. The condition

$$dG(\bar x)\mathbb{R}^n + T_{\mathcal{S}^p_-}(G(\bar x)) = \mathcal{S}^p \qquad (8)$$

is another equivalent form of the above constraint qualification. It is possible to show that the tangent cone to the set $\mathcal{S}^p_-$ at $G(\bar x)$ can be written in the form

$$T_{\mathcal{S}^p_-}(G(\bar x)) = \{Z \in \mathcal{S}^p : E^T Z E \preceq 0\},$$

where $E$ is a $p \times (p-r)$ matrix of full column rank $p-r$ such that $G(\bar x)E = O$ and $r$ is the rank of $G(\bar x)$, e.g. [10]. (If $r = p$, i.e. $G(\bar x)$ is negative definite and hence belongs to the interior of $\mathcal{S}^p_-$, then $T_{\mathcal{S}^p_-}(G(\bar x))$ coincides with the whole space $\mathcal{S}^p$.)
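The tangent cone formula lends itself to a direct numerical test. In the sketch below (helper names and data are hypothetical), $E$ is taken as an orthonormal basis of the null space of $G(\bar x)$, obtained from an eigendecomposition, and $Z$ belongs to the cone exactly when $E^T Z E$ has no positive eigenvalue. Any other full-column-rank choice of $E$ with the same range gives the same answer.

```python
import numpy as np

def null_basis(G, tol=1e-10):
    """Orthonormal p x (p - r) matrix E with G E = 0, for symmetric G."""
    w, V = np.linalg.eigh(G)
    return V[:, np.abs(w) <= tol]

def in_tangent_cone(Z, G, tol=1e-10):
    """Test Z in T_{S^p_-}(G) = {Z : E^T Z E <= 0}, assuming G is negative semidefinite."""
    E = null_basis(G, tol)
    if E.shape[1] == 0:          # G negative definite: the cone is the whole space S^p
        return True
    return bool(np.linalg.eigvalsh(E.T @ Z @ E).max() <= tol)

G_bar = np.diag([-1.0, 0.0])     # hypothetical G(x_bar) of rank r = 1
print(in_tangent_cone(np.diag([5.0, -1.0]), G_bar))   # E^T Z E = -1
print(in_tangent_cone(np.diag([-5.0, 1.0]), G_bar))   # E^T Z E = +1
```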

Consider now the set $W_r$ of $p \times p$ symmetric matrices of rank $r$. This set forms a smooth manifold in the space $\mathcal{S}^p$, and the tangent space to $W_r$ at the point $G(\bar x)$ is given by

$$T_{W_r}(G(\bar x)) = \{Z \in \mathcal{S}^p : E^T Z E = 0\}.$$

It follows from the above formulas that $T_{W_r}(G(\bar x))$ coincides with the lineality space of the cone $T_{\mathcal{S}^p_-}(G(\bar x))$, i.e. it is the largest linear subspace of $T_{\mathcal{S}^p_-}(G(\bar x))$. It is said that the point $\bar x$ is nondegenerate ([2,10]) if the following condition holds:

$$dG(\bar x)\mathbb{R}^n + T_{W_r}(G(\bar x)) = \mathcal{S}^p. \qquad (9)$$


The above condition (9) means that the mapping $G$ intersects $W_r$ transversally at the point $\bar x$. It immediately follows from the corresponding result in differential geometry that nondegeneracy is, in a certain sense, a generic property (cf. [11]). Since $T_{W_r}(G(\bar x)) \subset T_{\mathcal{S}^p_-}(G(\bar x))$, the nondegeneracy condition (9) implies the constraint qualification (8). Therefore, by the above discussion, if the point $\bar x$ is nondegenerate, then there exists a Lagrange multipliers matrix $\Omega$ satisfying conditions (7). In fact it is not difficult to show that if $\bar x$ is nondegenerate, then such a Lagrange multipliers matrix is unique.
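Because $E$ has full column rank, $E^T Z E$ ranges over all of $\mathcal{S}^{p-r}$ as $Z$ ranges over $\mathcal{S}^p$. Combined with the formula for $T_{W_r}(G(\bar x))$, condition (9) therefore amounts to requiring that the matrices $E^T G_i E$, $i = 1, \dots, n$, with $G_i = \partial G(\bar x)/\partial x_i$ ($= A_i$ in the linear case), span $\mathcal{S}^{p-r}$. The NumPy sketch below tests this rank condition on hypothetical data. It is a numerical illustration under that derivation, not a method taken from the cited references.

```python
import numpy as np

def null_basis(G, tol=1e-10):
    """Orthonormal basis E of the null space of the symmetric matrix G."""
    w, V = np.linalg.eigh(G)
    return V[:, np.abs(w) <= tol]

def svec(S):
    """Upper triangle of a symmetric matrix as a vector (a linear bijection S^k -> R^{k(k+1)/2})."""
    i, j = np.triu_indices(S.shape[0])
    return S[i, j]

def is_nondegenerate(G_bar, G_partials, tol=1e-10):
    """Check that span{E^T G_i E} = S^{p-r}, which is equivalent to condition (9)."""
    E = null_basis(G_bar, tol)
    k = E.shape[1]
    if k == 0:                   # G(x_bar) negative definite: (9) holds trivially
        return True
    M = np.array([svec(E.T @ Gi @ E) for Gi in G_partials])
    return bool(np.linalg.matrix_rank(M) == k * (k + 1) // 2)

# Hypothetical linear example: p = 3, G(x_bar) of rank r = 1, so S^{p-r} has dimension 3.
G_bar = np.diag([-1.0, 0.0, 0.0])
A1, A2 = np.diag([0.0, 1.0, 0.0]), np.diag([0.0, 0.0, 1.0])
A3 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
print(is_nondegenerate(G_bar, [A1, A2]))       # only two directions: degenerate
print(is_nondegenerate(G_bar, [A1, A2, A3]))   # spans S^2: nondegenerate
```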

Let us note now that it follows from the complementarity condition (6) that

$$\operatorname{rank}(\Omega) + \operatorname{rank}(G(\bar x)) \le p.$$

It is said that the strict complementarity condition holds at $\bar x$ if there is a Lagrange multipliers matrix $\Omega$ such that

$$\operatorname{rank}(\Omega) + \operatorname{rank}(G(\bar x)) = p.$$

It is possible to show that, in a certain sense, strict complementarity is also a generic property ([2]). Under the strict complementarity condition, nondegeneracy of $\bar x$ is a necessary and sufficient condition for uniqueness of the corresponding Lagrange multipliers matrix.
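For one given multiplier $\Omega$, the rank condition can be checked numerically, as in the sketch below. The helper names and data are hypothetical, and the numerical rank uses a fixed eigenvalue tolerance. Note that the definition asks whether *some* Lagrange multipliers matrix attains the bound, so a negative answer for one $\Omega$ is not conclusive when $\Lambda(\bar x)$ has more than one element.

```python
import numpy as np

def num_rank(S, tol=1e-9):
    """Numerical rank of a symmetric matrix: count of eigenvalues with |lambda| > tol."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(S)) > tol))

def strict_complementarity(Omega, G_bar, tol=1e-9):
    """For a multiplier satisfying (6), test rank(Omega) + rank(G(x_bar)) = p."""
    if abs(np.sum(Omega * G_bar)) > tol:
        raise ValueError("complementarity condition (6) is violated")
    return num_rank(Omega, tol) + num_rank(G_bar, tol) == G_bar.shape[0]

G_bar = np.diag([-1.0, 0.0, 0.0])                                # rank 1, p = 3
print(strict_complementarity(np.diag([0.0, 2.0, 0.0]), G_bar))   # 1 + 1 < 3
print(strict_complementarity(np.diag([0.0, 2.0, 1.0]), G_bar))   # 2 + 1 = 3
```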


Second Order Optimality Conditions 


In the convex (and, in particular, in the linear) case the first order conditions (7) are also sufficient for a feasible point $\bar x$ to be an optimal solution of the problem (1). However, even in the convex case second order optimality conditions can be an important part of the analysis; e.g. second order conditions are intimately related to stability properties of the corresponding optimal solutions. From now on we assume that $f(x)$ and $G(x)$ are twice continuously differentiable.

Let $\bar x$ be a stationary point of the problem (1), i.e. $\bar x$ is feasible and there exists a Lagrange multipliers matrix $\Omega$ satisfying the first order optimality conditions (7). With $\bar x$ is associated the so-called critical cone

$$C(\bar x) := \left\{ h \in \mathbb{R}^n : h^T \nabla f(\bar x) = 0, \ dG(\bar x)h \in T_{\mathcal{S}^p_-}(G(\bar x)) \right\}.$$

This critical cone represents those directions $h \in \mathbb{R}^n$ for which the first order conditions (7) do not provide information about optimality of $\bar x$. Let us denote by $\Lambda(\bar x)$ the set of all Lagrange multipliers matrices $\Omega$ satisfying conditions (7), and suppose that the constraint qualification (8) holds. Recall that $\Lambda(\bar x)$ is nonempty and bounded if and only if the constraint qualification (8) is satisfied. We can now write second order conditions for $\bar x$ to be a locally optimal solution of the problem (1) as follows ([3,10]).

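• Second order necessary conditions: for any critical direction $h \in C(\bar x)$ there exists a Lagrange multipliers matrix $\Omega \in \Lambda(\bar x)$ such that

$$h^T \nabla^2_{xx} L(\bar x, \Omega) h + h^T H(\bar x, \Omega) h \ge 0. \qquad (10)$$

• Second order sufficient conditions: for any nonzero critical direction $h \in C(\bar x)$ there exists a Lagrange multipliers matrix $\Omega \in \Lambda(\bar x)$ such that

$$h^T \nabla^2_{xx} L(\bar x, \Omega) h + h^T H(\bar x, \Omega) h > 0. \qquad (11)$$

Here $\nabla^2_{xx} L(\bar x, \Omega)$ stands for the Hessian matrix of second order partial derivatives of $L$ with respect to $x$, and $H(\bar x, \Omega)$ denotes the $n \times n$ symmetric matrix whose $ij$-element is

$$[H(\bar x, \Omega)]_{ij} = -2 \operatorname{trace}\left( \Omega\, G_i\, [G(\bar x)]^\dagger\, G_j \right),$$

where $G_i := \partial G(\bar x)/\partial x_i$ and $[G(\bar x)]^\dagger$ denotes the Moore-Penrose pseudo-inverse of the matrix $G(\bar x)$.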

Note that there is no gap between the above second order necessary and sufficient conditions, in the sense that the only difference between them is that the weak inequality sign in (10) is replaced by the strict inequality in (11). The additional term $h^T H(\bar x, \Omega) h$, which appears in the above conditions, represents the curvature of the set $\mathcal{S}^p_-$. The matrix $H(\bar x, \Omega)$ is positive semidefinite, and therefore this 'curvature term' is always nonnegative.
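The curvature matrix $H(\bar x, \Omega)$ can be assembled entry by entry from its definition. The sketch below uses NumPy's `pinv` for the Moore-Penrose pseudo-inverse. The data are hypothetical, chosen so that $\Omega \succeq 0$, $G(\bar x) \preceq 0$ and $\Omega G(\bar x) = O$. The final line illustrates numerically that the result is positive semidefinite.

```python
import numpy as np

def curvature_matrix(Omega, G_bar, G_partials):
    """n x n matrix with entries -2 trace(Omega G_i [G(x_bar)]^+ G_j)."""
    G_pinv = np.linalg.pinv(G_bar)            # Moore-Penrose pseudo-inverse
    n = len(G_partials)
    H = np.empty((n, n))
    for i, Gi in enumerate(G_partials):
        for j, Gj in enumerate(G_partials):
            H[i, j] = -2.0 * np.trace(Omega @ Gi @ G_pinv @ Gj)
    return H

# Hypothetical data: G(x_bar) = diag(-1, 0) and Omega = diag(0, 1) satisfy Omega G(x_bar) = O.
G_bar = np.diag([-1.0, 0.0])
Omega = np.diag([0.0, 1.0])
G_partials = [np.array([[0.0, 1.0], [1.0, 0.0]]), np.diag([1.0, 0.0])]
H = curvature_matrix(Omega, G_bar, G_partials)
print(H)                                      # diag(2, 0) for this data
print(np.linalg.eigvalsh(H).min() >= -1e-12)  # positive semidefinite
```

Here only the first direction, which couples the zero and nonzero eigenspaces of $G(\bar x)$, produces positive curvature.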

In fact, the above second order sufficient conditions (11) are equivalent to the so-called second order growth condition: there exists a constant $\alpha > 0$ such that for all feasible $x$, i.e. satisfying $G(x) \in \mathcal{S}^p_-$, in a neighborhood of $\bar x$ the inequality

$$f(x) \ge f(\bar x) + \alpha \|x - \bar x\|^2 \qquad (12)$$

holds. If the semidefinite programming problem is linear, then all elements of the Hessian matrix $\nabla^2_{xx} L(\bar x, \Omega)$ are zeros, and hence the first term in (10) and (11) vanishes. Nevertheless, even in the linear case the second (curvature) term can be positive, and the second order growth condition (12) can hold.

If the point $\bar x$ is nondegenerate, then there exists a unique Lagrange multipliers matrix $\Omega$, and if, moreover, the strict complementarity condition holds, then the critical cone becomes a linear space and can be written as

$$C(\bar x) = \left\{ h : \sum_{i=1}^n h_i E^T G_i E = 0 \right\}.$$


Stability Analysis 


Suppose now that the considered semidefinite programming problem is subject to perturbations. That is, consider the following family of optimization problems

$$\min_{x \in \mathbb{R}^n} f(x, u) \quad \text{s.t.} \quad G(x, u) \preceq 0, \qquad (13)$$

depending on the parameter vector $u \in \mathbb{R}^m$ guiding perturbations of the above problem. We assume that $f(x, u)$ and $G(x, u)$ are twice continuously differentiable and that for a given value $u = u_0$ of the parameter vector, problem (13) coincides with the 'unperturbed' problem (1). For example, in the linear case we can view some of the matrices $A_i$ as parameters which are subject to perturbations.

Let us denote by $S(u)$ the set of optimal solutions of the parametric problem (13). What can be said about continuity properties of $S(u)$ as a function of the parameter vector $u$? Note that $S(u)$ can have more than one element and can be empty. It is possible to construct examples where the optimal-solutions set $S(u_0)$ of the unperturbed problem is nonempty while $S(u)$ is empty for arbitrarily small changes in $u$. Of course such examples are pathological and unsuitable for numerical analysis, since arbitrarily small perturbations of the data may result in unsolvability of the considered problem.

It is possible to give various conditions ensuring continuity (upper semicontinuity) of $S(u)$. For example, in the convex (linear) case, if $S(u_0) = \{\bar x\}$ is a singleton (i.e. the unperturbed problem has the unique optimal solution $\bar x$) and the Slater condition holds, then any optimal solution $x(u) \in S(u)$ of the parametric problem (13) converges to $\bar x$ as $u \to u_0$. Yet even if $x(u)$ is continuous at $u = u_0$, the rate at which it changes can be much faster than the rate of change of $u$, i.e. small perturbations in $u$ can bring large changes in the corresponding optimal solution. In that case the problem is said to be ill-conditioned and may be difficult to solve numerically.

It is said that an optimal solution $\bar x$ of the unperturbed problem is stable if for all $u$ sufficiently close to $u_0$ the optimal-solutions set $S(u)$ is nonempty and $x(u) \to \bar x$ as $u \to u_0$, for any $x(u) \in S(u)$. If, moreover, there is a positive constant $\kappa$ such that

$$\|x(u) - \bar x\| \le \kappa \|u - u_0\|, \qquad (14)$$

then $\bar x$ is said to be Lipschitz stable.

It is not difficult to show that if the constraint mapping $G(x, u) = G(x)$ does not depend on $u$, and hence the feasible set of (13) is fixed, then the second order growth condition (12) is a sufficient condition for a stable optimal solution $\bar x$ to be Lipschitz stable. The general case is more subtle and, in order to ensure Lipschitz stability of $\bar x$, some additional conditions are required (see [4] for an extensive discussion of results of this type).

Let us finally remark that if the point $\bar x$ is nondegenerate and the strict complementarity condition holds, then the optimal solution $x(u)$ is a continuously differentiable function of the parameter vector ([10]). In that case one can approximately evaluate the constant $\kappa$ in (14) by using the corresponding linear approximation (first order Taylor expansion) of $x(u)$ at $u_0$. It turns out that the rate of change of $x(u)$ is closely related to the conditioning of the matrix $\nabla^2_{xx} L(\bar x, \Omega) + H(\bar x, \Omega)$, used in the corresponding second order optimality conditions, restricted to the critical cone $C(\bar x)$, which is a linear space in that case.
Introduction


Research on ad hoc wireless sensor networks has in- 
creased greatly in recent years [24]. Sensor networks 
usually consist of a large number of sensors which are 
deployed to collect data of interest. Such networks are 
versatile tools which provide a low-cost method of tar- 
get tracking, as well as monitoring seismic activity, tem- 
perature, sound levels, and light [9]. Information gath- 
ered by the sensors is only useful if the positions of the 
sensors are known. However, it is often the case that the use of a GPS system is too costly or consumes too much power, or that the network is being deployed in a location in which GPS is denied [21].

Recently, techniques have been developed which es- 
timate the node locations based on a mixture of dis- 
tance measurements and angle measurements between 
pairs of nodes in the network. This problem is referred to as the Sensor Network Localization Problem (SNLP) and can be formally stated as follows: given the true positions of some of the nodes and the pairwise distances between some nodes, how can the positions of all of the nodes be estimated? [6,9,28].


Organization 


Throughout the article, we will investigate the SNLP. In the following section, we formally define the problem statement, and in Sect. “Methods” we review several solution techniques which appear throughout the literature. In Subsect. “Semidefinite Programming Model”, we describe a semidefinite programming (SDP) model for the problem. We then provide a general assessment of this approach and describe some implementation details. We highlight this method specifically because of its advantages over heuristic methods. In particular, the SDP method is known to localize any network whenever a unique solution exists, and to do so in polynomial time. We provide some concluding remarks in Sect. “Conclusion” and indicate directions of future research. Finally, a list of cross references is provided in Sect. “See also”. We conclude this section with an introduction to some of the symbols that will appear most prevalently throughout the article.


Idiosyncrasies 


Here we briefly introduce some of the symbols and notations we will employ throughout this paper. Define the trace of a symmetric matrix $A$, denoted $\operatorname{Trace}(A)$, as the sum of its diagonal entries. The standard trace inner product of two matrices $A$ and $B$ is given as $\langle A, B \rangle = \operatorname{Trace}(A^T B)$. The 2-norm of a vector $x$ is denoted as $\|x\|$ and is defined to be $\sqrt{\langle x, x \rangle}$. A positive semidefinite matrix $A$ will be denoted as $A \succeq 0$. Let $I_d$ and $0_d$ respectively represent the identity matrix and a vector containing all zeros, both with dimension $d \in \mathbb{Z}$. Finally, we will use italics for emphasis, CALLIGRAPHY to refer to formulations, and Small Caps for problem names. Any other locally used terms and symbols will be defined in the sections in which they appear.


Formulation 


A wireless sensor network is typically made up of a number of densely distributed sensors that collect data. An instance of the SNLP consists of a set of $m$ so-called anchor points whose positions are known a priori [9,12,22]. The objective is to determine the location of $n$ sensor points in the system based upon information obtained from the anchor sensors. Let the anchor points and the sensor points be respectively denoted as $a_1, a_2, \dots, a_m \in \mathbb{R}^d$ and $x_1, x_2, \dots, x_n \in \mathbb{R}^d$. The Euclidean distances $\bar d_{kj}$ between points $a_k$ and $x_j$ for some $k, j$, and $d_{ij}$ between $x_i$ and $x_j$ for some $i < j$ are also given. Let $N_a = \{(k, j) : \bar d_{kj} \text{ is specified}\}$ denote the anchor/sensor pairs and $N_x = \{(i, j) : i < j,\ d_{ij} \text{ is specified}\}$ represent the sensor/sensor pairs. Then the sensor network localization problem as defined in [9] is to find the localization (estimated position) of $x_1, x_2, \dots, x_n \in \mathbb{R}^d$ such that:

$$\text{SNLP:} \quad \|a_k - x_j\|^2 = \bar d_{kj}^2, \ \forall (k, j) \in N_a, \quad \text{and} \quad \|x_i - x_j\|^2 = d_{ij}^2, \ \forall (i, j) \in N_x. \qquad (1)$$


From this seemingly simple formulation, many difficult questions arise. For a given instance of the SNLP, does this instance have a realization in the required dimension? If so, is the realization unique? We should note that these two seemingly related questions are quite different from a computational perspective. It has been shown that whether an instance of the SNLP has a unique realization in $\mathbb{R}^2$ can be determined efficiently under certain assumptions [20]. On the other hand, it remains NP-complete [17] to compute a realization on the plane, even if the instance is known to have a unique realization [5]. This is the main problem of interest in this article.


Methods 


In this section, we review several solution methods which have been applied to the SNLP. In particular, in Subsect. “Semidefinite Programming Model”, we highlight the techniques of Ye et al. [1,8,9,28] and the application of semidefinite programming methods for efficiently computing solutions to large-scale instances of the SNLP under a variety of circumstances.


Review of Solution Approaches 


Several techniques have been applied to the SNLP, 
all having some redeeming qualities [9,16,18]. Several 
techniques involve the use of distance or angle mea- 
surements between the anchor points in order to com- 
pute a localization [13,15,23,25,26,27]. Another com- 
mon technique used by Bulusu et al. [11] and Howard 
et al. [19] is to employ a grid or a set of surveyed points 
whose locations are known. Then, the sensor localiza- 
tion is attempted using the relative distances between 
sensors and the set of beacon points [9]. 

The so-called “DV-Hop” technique of Niculescu 
and Nash [23] is an efficient method in dense topologies 
whereby the anchor nodes flood the network with their 
location information. This allows other points to trian- 
gulate their positions based on the information of the 
anchor nodes. This information is then passed along to other sensor nodes, which use the combined locations to triangulate their positions. However, for widely dispersed and irregular topologies, the relative errors in the node estimation tend to be fairly substantial.

A technique proposed by Savarese et al. [25] uses a method similar to the “DV-Hop” algorithm described above to provide a rough estimate of the location information. These estimates are then improved by applying a least-squares triangulation using these estimates as well as a new collection of estimated positions [9].

The “iterative multilateration” technique of Sav- 
vides et al. [26] is another effective method especially 
when the number of anchor nodes is relatively high. 
This method calculates via triangulation the positions 
of those nodes that are adjacent to at least three anchor 
points. Then these new localized nodes become anchor 
points and the process continues. In the end every node 
in the network has become an anchor. 
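The iterative multilateration idea can be sketched in a few lines. The version below is an illustration under stated assumptions, not the implementation of Savvides et al. [26]. Each position estimate subtracts the first distance equation $\|x - a_0\|^2 = r_0^2$ from the others, which gives the linear least-squares system $2(a_k - a_0)^T x = \|a_k\|^2 - \|a_0\|^2 - r_k^2 + r_0^2$. A node is localized once it has at least $d + 1$ located neighbours (three in the plane, which must not be collinear), and it then acts as an anchor. The dictionary-based network format is hypothetical.

```python
import numpy as np

def trilaterate(anchors, dists):
    """Least-squares position from k >= d + 1 reference points (rows) and their distances."""
    a0, r0 = anchors[0], dists[0]
    A = 2.0 * (anchors[1:] - a0)
    b = np.sum(anchors[1:] ** 2, axis=1) - a0 @ a0 - dists[1:] ** 2 + r0 ** 2
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

def iterative_multilateration(known, neighbours, d=2):
    """known: {node: position}; neighbours: {node: {other: distance}} (hypothetical format)."""
    known = {k: np.asarray(v, dtype=float) for k, v in known.items()}
    progress = True
    while progress:
        progress = False
        for node, nbrs in neighbours.items():
            if node in known:
                continue
            ref = [(known[o], r) for o, r in nbrs.items() if o in known]
            if len(ref) >= d + 1:                    # enough located neighbours
                pts = np.array([p for p, _ in ref])
                rs = np.array([r for _, r in ref])
                known[node] = trilaterate(pts, rs)   # the node now acts as an anchor
                progress = True
    return known

# Hypothetical planar example: s2 hears only two anchors, so it needs s1 to be localized first.
anchors = {"a0": (0.0, 0.0), "a1": (4.0, 0.0), "a2": (0.0, 4.0)}
truth = {"a0": np.array([0.0, 0.0]), "a1": np.array([4.0, 0.0]), "a2": np.array([0.0, 4.0]),
         "s1": np.array([1.0, 1.0]), "s2": np.array([3.0, 2.0])}

def dist(u, v):
    return float(np.linalg.norm(truth[u] - truth[v]))

neighbours = {
    "s1": {o: dist("s1", o) for o in ("a0", "a1", "a2")},
    "s2": {o: dist("s2", o) for o in ("a1", "a2", "s1")},
}
result = iterative_multilateration(anchors, neighbours)
print(result["s1"], result["s2"])
```

With exact distances the estimates are exact. With noisy distances, errors made for early nodes propagate to the nodes that later use them as anchors.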

In [15], the so-called “multidimensional scaling” al- 
gorithm is proposed. The heuristic begins by making 
an initial estimate of the node positions based solely on 
the connectivity and basic distance and angle information of the non-anchor nodes. Then, using a variant of singular value decomposition [29], a map of the relative locations of the nodes is generated. Finally, these estimates are greatly improved and an absolute global map is produced by taking into account the locations of the anchor nodes and updating the estimates accordingly.
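The relative map in the second step can be illustrated with textbook classical multidimensional scaling. The sketch below is not necessarily the variant used in [15], which also has to estimate missing distances from connectivity. It double-centres the squared distance matrix to obtain the Gram matrix of the centred points and keeps its $d$ leading eigenpairs, which for a positive semidefinite matrix coincide with its singular pairs. It assumes a complete, noise-free distance matrix, and it omits the final alignment to the anchors, so the coordinates are defined only up to a rigid motion.

```python
import numpy as np

def classical_mds(D, d=2):
    """Relative coordinates (up to rotation, reflection, translation) from all pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # Gram matrix of the centred points
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]                 # indices of the d largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Hypothetical example: random planar nodes; distances are preserved by the recovered map.
rng = np.random.default_rng(1)
P = rng.standard_normal((6, 2))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
Y = classical_mds(D)
DY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
print(np.allclose(D, DY))
```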

The work of Doherty et al. [13] involves a technique in which linear bounding hyperplanes are used to model the proximity constraints on the nodes which can communicate with each other as convex constraints [1,9]. However, these constraints are often too loose and provide solutions which are not helpful in terms of calculating the unique realization of the sensors.

As we see, the drawback with most sensor network 
localization techniques involving heuristics is that they 
do not always find a unique solution even when it exists, 
or require excessive computation time to do so [28]. 
A recently developed method introduced by So and Ye 
in [28] uses a semidefinite programming (SDP) model 
that guarantees the discovery of a unique solution when 
it exists. Furthermore, the solution can be computed 
in polynomial time. In the following subsection, we 
present the SDP model, discuss the motivation behind 
using this approach and analyze some properties of the 
model. 


Semidefinite Programming Model 


A semidefinite program is a convex optimization prob- 
lem where the objective function is linear and the con- 
straint is defined by a linear matrix inequality. Given 
a vector $c \in \mathbb{R}^m$ and $m + 1$ symmetric matrices $F_0, F_1, \dots, F_m \in \mathbb{R}^{n \times n}$, a semidefinite program can be written in the form:

$$\min_{x \in \mathbb{R}^m} \{ c^T x \mid F(x) \succeq 0 \},$$

where $F(x) = F_0 + \sum_{i=1}^m x_i F_i$, and $F(x) \succeq 0$ means that $F(x)$ is positive semidefinite [14,30]. Hence,
both the objective function and the constraint are con- 
vex, and therefore semidefinite programs are closely re- 
lated to linear programs, and many algorithms for solv- 
ing linear programming problems have generalizations 
that apply to semidefinite programs as well [30]. 

In [28], the authors note that (1) is a non-convex 
optimization problem, which is difficult to solve in 
general. They propose an SDP relaxation by converting the nonconvex quadratic distance constraints into linear constraints as follows. Specifically, let $X = [x_1, x_2, \dots, x_n]$ be the $d \times n$ matrix which we are trying to determine. Let $e_{ij} \in \mathbb{R}^n$ be the vector where the $i$th position is 1, the $j$th position is $-1$, and all other entries are zeros. Then for all $(i, j) \in N_x$, we have that:

$$\|x_i - x_j\|^2 = e_{ij}^T X^T X e_{ij}. \qquad (2)$$

Furthermore, for all $(k, j) \in N_a$ it follows that

$$\|a_k - x_j\|^2 = \begin{pmatrix} a_k \\ e_j \end{pmatrix}^T [I_d \ X]^T [I_d \ X] \begin{pmatrix} a_k \\ e_j \end{pmatrix}, \qquad (3)$$

where $e_j$ is a vector of all zeros except for $-1$ at entry $j$, and $(a_k; e_j) \in \mathbb{R}^{d+n}$ is a vector consisting of $a_k$ “on top of” $e_j$ [9]. Using these definitions, we can reformulate the problem as follows.


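$$\begin{array}{lll} \text{Find} & X \in \mathbb{R}^{d \times n} \text{ and } Y \in \mathbb{R}^{n \times n} & (4) \\ \text{such that} & e_{ij}^T Y e_{ij} = d_{ij}^2, \ \forall (i,j) \in N_x, & (5) \\ & \begin{pmatrix} a_k \\ e_j \end{pmatrix}^T \begin{pmatrix} I_d & X \\ X^T & Y \end{pmatrix} \begin{pmatrix} a_k \\ e_j \end{pmatrix} = \bar d_{kj}^2, \ \forall (k,j) \in N_a, & (6) \\ & Y = X^T X. & (7) \end{array}$$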
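The identities (2) and (3), and the left-hand side of (6) with $Y = X^T X$, can be checked numerically on random data. The sketch below uses 0-based indices and hypothetical helper names (`e_ij`, `stacked`). It is a sanity check of the algebra, not part of the formulation in [28].

```python
import numpy as np

def e_ij(n, i, j):
    """e_ij in R^n: +1 at position i, -1 at position j (0-based indices)."""
    v = np.zeros(n)
    v[i], v[j] = 1.0, -1.0
    return v

def stacked(a_k, n, j):
    """The vector (a_k; e_j) in R^{d+n}, where e_j has -1 at entry j."""
    e = np.zeros(n)
    e[j] = -1.0
    return np.concatenate([a_k, e])

rng = np.random.default_rng(0)
d, n = 2, 4
X = rng.standard_normal((d, n))      # columns are x_1, ..., x_n (random test data)
a_k = rng.standard_normal(d)         # one anchor position (random test data)
i, j = 0, 2

lhs2 = np.sum((X[:, i] - X[:, j]) ** 2)
rhs2 = e_ij(n, i, j) @ X.T @ X @ e_ij(n, i, j)       # identity (2)

IX = np.hstack([np.eye(d), X])                        # the d x (d+n) matrix [I_d X]
v = stacked(a_k, n, j)
lhs3 = np.sum((a_k - X[:, j]) ** 2)
rhs3 = v @ IX.T @ IX @ v                              # identity (3)

Z = np.block([[np.eye(d), X], [X.T, X.T @ X]])        # matrix in (6) with Y = X^T X
print(np.isclose(lhs2, rhs2), np.isclose(lhs3, rhs3), np.isclose(v @ Z @ v, lhs3))
```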


The intuition behind the SDP formulation is to relax constraint (7) to $Y \succeq X^T X$, i.e. to require only that $Y - X^T X$ be positive semidefinite. Boyd et al. [10] among others have shown that $Y - X^T X \succeq 0$ holds if and only if

$$Z = \begin{pmatrix} I_d & X \\ X^T & Y \end{pmatrix} \succeq 0. \qquad (8)$$

Define $Z_{1:d,1:d}$ to be the $d \times d$ principal submatrix of $Z$ formed by its first $d$ rows and columns. Then the SDP relaxation of the SNLP, as given in [28], is to find $Z \in \mathbb{R}^{(d+n) \times (d+n)}$ to:


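$$\begin{array}{lll} \text{SDP:} & \text{maximize} \ 0 & (9) \\ & \text{subject to} & \\ & Z_{1:d,1:d} = I_d, & (10) \\ & \langle (0; e_{ij})(0; e_{ij})^T, Z \rangle = d_{ij}^2, \ \forall (i,j) \in N_x, & (11) \\ & \langle (a_k; e_j)(a_k; e_j)^T, Z \rangle = \bar d_{kj}^2, \ \forall (k,j) \in N_a, & (12) \\ & Z \succeq 0. & (13) \end{array}$$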
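The constraints (10) to (13) are easy to check for a candidate $Z$, using $\langle v v^T, Z \rangle = v^T Z v$. The NumPy sketch below does so on a small hypothetical network. It does not solve the SDP, which would need an SDP solver. It builds $Z$ from the true positions with $Y = X^T X$, confirms that this $Z$ is feasible and has rank $d$, and illustrates (8): when $Y - X^T X$ has a negative eigenvalue, $Z$ is not positive semidefinite.

```python
import numpy as np

def build_Z(X, Y):
    """Z = [[I_d, X], [X^T, Y]] as in (8)."""
    d = X.shape[0]
    return np.block([[np.eye(d), X], [X.T, Y]])

def check_sdp_feasible(Z, d, anchors, Nx, Na, dx, da, tol=1e-8):
    """Check constraints (10)-(13); index pairs in Nx, Na are 0-based (i, j) and (k, j)."""
    n = Z.shape[0] - d
    ok = np.allclose(Z[:d, :d], np.eye(d), atol=tol)              # (10)
    for (i, j), dij in zip(Nx, dx):                                # (11)
        v = np.zeros(d + n)
        v[d + i], v[d + j] = 1.0, -1.0
        ok = ok and abs(v @ Z @ v - dij ** 2) <= tol
    for (k, j), dkj in zip(Na, da):                                # (12)
        v = np.zeros(d + n)
        v[:d], v[d + j] = anchors[k], -1.0
        ok = ok and abs(v @ Z @ v - dkj ** 2) <= tol
    ok = ok and np.linalg.eigvalsh(Z).min() >= -tol                # (13)
    return bool(ok)

# Hypothetical network: three anchors, sensors x_1 = (1, 1) and x_2 = (3, 2).
d = 2
anchors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X_true = np.array([[1.0, 3.0], [1.0, 2.0]])
Nx = [(0, 1)]
Na = [(0, 0), (1, 0), (2, 0), (1, 1), (2, 1)]
dx = [np.linalg.norm(X_true[:, i] - X_true[:, j]) for i, j in Nx]
da = [np.linalg.norm(anchors[k] - X_true[:, j]) for k, j in Na]

Z = build_Z(X_true, X_true.T @ X_true)
print(check_sdp_feasible(Z, d, anchors, Nx, Na, dx, da), np.linalg.matrix_rank(Z))
Z_bad = build_Z(X_true, X_true.T @ X_true - 0.1 * np.eye(2))   # Y - X^T X = -0.1 I
print(np.linalg.eigvalsh(Z_bad).min() < 0)
```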
Notice that by definition, any feasible solution matrix $Z$ must have rank at least $d$ [9]. We can formulate the dual of the SDP relaxation as

$$\begin{array}{ll} \text{SDP-D:} & \text{minimize} \ \langle I_d, V \rangle + \sum_{(i,j) \in N_x} y_{ij} d_{ij}^2 + \sum_{(k,j) \in N_a} w_{kj} \bar d_{kj}^2 \qquad (14) \\ & \text{subject to} \ \begin{pmatrix} V & 0 \\ 0 & 0 \end{pmatrix} + \sum_{(i,j) \in N_x} y_{ij} (0; e_{ij})(0; e_{ij})^T + \sum_{(k,j) \in N_a} w_{kj} (a_k; e_j)(a_k; e_j)^T \succeq 0. \qquad (15) \end{array}$$


Notice that the dual formulation is always feasible. In particular, $V = 0$, $y_{ij} = 0$ for all $(i, j) \in N_x$, and $w_{kj} = 0$ for all $(k, j) \in N_a$ form a feasible solution.

In [28], So and Ye postulate and prove several re- 
sults regarding the above formulations. We will high- 
light the key theorems and provide a basic analysis. For 
detailed proofs and a more in-depth study, see [28]. 

The first result provides a class of instances for which the SDP relaxation is exact, i.e. instances for which the matrix $Z$ has rank $d$. We assume that the distance measurements $d_{ij}$ and $\bar d_{kj}$ are exact for some positions $X = [x_1, \dots, x_n]$, so that formulation SDP is feasible. Then, we have the following result.


Theorem 1 Let $Z$ be a feasible solution for SDP and $U$ be an optimal slack matrix of SDP-D. Then by the duality theorem for semidefinite programming [4], it follows that:

1. $\langle Z, U \rangle = 0$;

2. $\operatorname{rank}(Z) + \operatorname{rank}(U) \le d + n$;

3. $\operatorname{rank}(Z) \ge d$ and $\operatorname{rank}(U) \le n$.


An immediate consequence of this theorem is that if an optimal dual slack matrix $U$ has $\operatorname{rank}(U) = n$, then $\operatorname{rank}(Z) = d$. Therefore, formulation SNLP is equivalent to formulation SDP, implying that the SNLP formulation can be solved optimally in polynomial time [28].

The next theorem establishes the existence of a large 
group of efficiently localizable graphs. 


Theorem 2 Suppose that the network in question is 
connected. Then the following are equivalent: 


1. Problem SNLP is uniquely localizable.

2. The max-rank solution matrix of SDP has rank $d$.

3. The solution matrix of SDP, represented by (8), satisfies $Y = X^T X$.


This theorem has several significant implications. First, as long as SNLP has a unique localization, it can be computed in polynomial time by solving the corresponding semidefinite relaxation. The converse also holds. That is, if the max-rank solution matrix $Z$ of the semidefinite relaxation has rank $d$, then its block $X$ is the unique localization for formulation SNLP [28]. Lastly, as mentioned above, we have the existence of a family of graphs for which the localization can be efficiently computed despite the underlying NP-completeness of the SNLP in general.

The seminal work of So and Ye [28] which we high- 
lighted above provides a baseline to which many exten- 
sions can be made. To begin with, the results presented 
above are based on the assumption that the distance 
measurements are exact. The work of Biswas et al. in [9] 
provides extensions to handle inaccurate and incom- 
plete measurements. This greatly improves the robust- 
ness of the SDP formulation, making the model more 
applicable to real-world scenarios in which inaccuracies 
are inevitable. Furthermore, in [1] the authors provide 
SDP formulations of the SNLP which incorporate angle 
information which can be used alone or in concert with 
distance information to calculate sensor realizations. 
This method is particularly useful when the sensors can 
detect multiple angles [1]. We see that many extensions are possible and that, by using semidefinite programming methods, a large class of sensor network localization problems can be solved more efficiently and effectively than by previous heuristic techniques.


Conclusion 


The focus of this article was the sensor network localization problem (SNLP), with particular attention given to a set of robust solution techniques based on a semidefinite programming model. After an introduction to the problem, we highlighted several solution approaches which have been applied. Next, we presented and analyzed the SDP formulation of So and Ye [28]. The results proved for the SDP relaxation of the SNLP have provided a framework which can be extended to other problems in distance geometry in which angle and distance information is available between pairs of points. Such problems include Euclidean ball packing and, most recently, 3-dimensional molecule conformation problems [7].
See also

> Graph Realization via Semidefinite Programming

> Semidefinite Programming and Determinant Maximization

> Semidefinite Programming: Optimality Conditions and Stability

> Semidefinite Programming and Structural Optimization

> Solving Large Scale and Sparse Semidefinite Programs
Structural design (SD) deals with the optimal design of linearly elastic mechanical constructions. Mathematically, the SD problem is as follows. We are considering a mechanical construction $S$ with finitely many degrees of freedom $M$, so that virtual displacements of $S$ are specified by vectors $w \in \mathbb{R}^M$. We are also given a set $W \subset \mathbb{R}^M$ of kinematically admissible displacements. The energy of elastic deformation of $S$ under a displacement $w$ is a nonnegative quadratic form $w^T B w / 2$ of the displacement. $B$ is a symmetric positive semidefinite matrix characterizing $S$; this matrix is assumed to depend linearly on the vector $t$ of design parameters of the construction: $B = B(t)$.

The construction can be subject to an external load. Mathematically, a load is a vector $f \in \mathbb{R}^M$. The equilibrium displacement $w_f$ caused by the load minimizes the potential energy $w^T B(t) w/2 - f^T w$ over $w \in W$:

$$w_f \in \operatorname{Argmax}_{w \in W}\Big[f^T w - \tfrac12 w^T B(t) w\Big]; \qquad (1)$$

the corresponding optimal value

$$\operatorname{compl}(f;t) = \sup_{w \in W}\Big[f^T w - \tfrac12 w^T B(t) w\Big] \qquad (2)$$

is the compliance of S under load f. It indicates how stiff the construction is with respect to the load (the smaller the compliance, the better).
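When there are no obstacles ($W = \mathbb{R}^M$) and B(t) is positive definite, the supremum in (2) is attained at the solution of $B(t) w = f$. Hence $\operatorname{compl}(f;t) = \frac12 f^T w_f = \frac12 f^T B(t)^{-1} f$. The minimal sketch below assumes a small dense positive definite B. The function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(B: Matrix, f: Vector) -> Vector:
    """Solve B w = f by Gaussian elimination with partial pivoting."""
    n = len(f)
    A = [list(row) + [f[i]] for i, row in enumerate(B)]  # augmented matrix [B | f]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= m * A[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (A[r][n] - sum(A[r][c] * w[c] for c in range(r + 1, n))) / A[r][r]
    return w


def objective(B: Matrix, f: Vector, w: Vector) -> float:
    """f^T w - w^T B w / 2, the concave function maximized in (1) and (2)."""
    Bw = [sum(b * x for b, x in zip(row, w)) for row in B]
    return sum(a * x for a, x in zip(f, w)) - 0.5 * sum(x * y for x, y in zip(w, Bw))


def compliance(B: Matrix, f: Vector) -> float:
    """compl(f; t) for W = R^M and positive definite B = B(t): 0.5 * f^T w_f with B w_f = f."""
    w_f = solve(B, f)  # equilibrium displacement
    return 0.5 * sum(a * x for a, x in zip(f, w_f))


B = [[2.0, 0.0], [0.0, 4.0]]
f = [2.0, 4.0]
print(compliance(B, f))             # 3.0
print(objective(B, f, [1.0, 1.0]))  # 3.0: the supremum is attained at w_f = B^{-1} f
print(objective(B, f, [1.1, 0.9]))  # any other displacement gives less than 3.0
```

With obstacles ($W \ne \mathbb{R}^M$), the supremum in (2) is a constrained concave quadratic program and this closed form no longer applies.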

The SD problem in its general setting is:

• Given a set $F \subset \mathbb{R}^M$ of tentative loads and a set T of admissible values of the design vector t, find $t \in T$ which minimizes the worst-case compliance with respect to loads from F:

$$(P_{\mathrm{ini}})\qquad \min_{t \in T}\ \operatorname{compl}_F(t) = \sup_{f \in F} \operatorname{compl}(f;t).$$


The outlined general setting has two particular cases of special interest.


Truss Design 


A truss is a construction, like an electric mast or the Eiffel Tower, comprised of thin elastic bars linked with each other at nodes. In the standard truss topology design problem the nodes form a given finite set in $\mathbb{R}^d$ (d = 2 for planar and d = 3 for spatial trusses), and all pair connections of the nodes by bars are allowed. Virtual displacements of a node form a given linear subspace of $\mathbb{R}^d$, and the space $\mathbb{R}^M$ of virtual displacements of the truss is the direct product of these subspaces over all nodes. The set W of admissible displacements is cut off $\mathbb{R}^M$ by a number of inequality constraints (typically linear) representing obstacles. These are restrictions on the displacements of the nodes coming from absolutely rigid partial supports.

The design variables of the truss are the volumes $t_\ell$ of all tentative bars, and the corresponding matrix B(t) is $B(t) = \sum_{\ell=1}^N t_\ell b_\ell b_\ell^T$. Here N is the number of tentative bars and the vectors $b_\ell \in \mathbb{R}^M$ are readily given by the geometry of the nodal set. An external load (a collection of physical forces acting at the nodes of the truss) can be identified with a vector $f \in \mathbb{R}^M$. The equilibrium displacement and compliance of the truss associated with load f are given by (1) and (2), respectively.
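The sketch below considers a planar truss whose only free node is hinged to rigid supports. It uses one standard modeling choice, which is assumed here and not stated in the article: $b_\ell = (\sqrt{E}/l_\ell)\,u_\ell$. Here $u_\ell$ is the unit vector along bar $\ell$, restricted to the free node's x, y displacements, $l_\ell$ is the bar length and E is Young's modulus. With this choice, $t_\ell (b_\ell^T w)^2/2$ is the elastic energy of bar $\ell$ with volume $t_\ell$. The sketch assembles $B(t) = \sum_\ell t_\ell b_\ell b_\ell^T$ for a hypothetical two-bar example. It then evaluates the obstacle-free compliance $\frac12 f^T B(t)^{-1} f$ by Cramer's rule.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def bar_vector(support: Point, node: Point, E: float = 1.0) -> Tuple[float, float]:
    """Assumed model: b_l = sqrt(E) / length * unit direction, on the free node's (x, y) coordinates."""
    dx, dy = node[0] - support[0], node[1] - support[1]
    length = math.hypot(dx, dy)
    scale = math.sqrt(E) / length
    return (scale * dx / length, scale * dy / length)


def stiffness(bars: List[Tuple[float, float]], t: List[float]) -> List[List[float]]:
    """B(t) = sum_l t_l b_l b_l^T (a 2 x 2 matrix for one free planar node)."""
    B = [[0.0, 0.0], [0.0, 0.0]]
    for (bx, by), tl in zip(bars, t):
        B[0][0] += tl * bx * bx
        B[0][1] += tl * bx * by
        B[1][0] += tl * by * bx
        B[1][1] += tl * by * by
    return B


def compliance_2x2(B: List[List[float]], f: List[float]) -> float:
    """0.5 * f^T B^{-1} f via Cramer's rule; assumes B is positive definite."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    w0 = (f[0] * B[1][1] - B[0][1] * f[1]) / det
    w1 = (B[0][0] * f[1] - B[1][0] * f[0]) / det
    return 0.5 * (f[0] * w0 + f[1] * w1)


# Hypothetical example: free node at the origin, hinged to rigid supports at (-1, 1) and (1, 1).
bars = [bar_vector((-1.0, 1.0), (0.0, 0.0)), bar_vector((1.0, 1.0), (0.0, 0.0))]
B = stiffness(bars, t=[1.0, 1.0])
print(compliance_2x2(B, f=[0.0, -1.0]))  # compliance under a unit downward load
```

With unit volumes and E = 1, the two bars give $B(t) = \mathrm{diag}(1/2, 1/2)$. A unit downward load therefore has compliance 1, up to rounding.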

The set T of feasible design vectors is always a subset of $\mathbb{R}^N_+$ (bar volumes must be nonnegative) satisfying the resource constraint $\sum_{\ell=1}^N t_\ell \le v$, an upper bound on the weight of the truss. The description of T may also include some other, normally linear, constraints.


Shape Design 


A shape is a construction comprised of material continuously distributed in a given 2D or 3D domain Ω, with the mechanical properties of the material varying from point to point. Thus, a shape is a distributed mechanical system with infinitely many degrees of freedom. In order to get a computationally tractable model, a finite element approximation is applied. Specifically,

• the infinite-dimensional space of displacements of the actual construction (the space of vector fields on Ω) is approximated by its finite-dimensional subspace $\mathbb{R}^M$;

• Ω is partitioned into N cells $C_\ell$, $\ell = 1,\dots,N$, and the mechanical properties of the material are assumed to be constant within every cell.

With this approximation, the energy of elastic deformation of the shape under displacement w is

$$E(w) = \frac12 \sum_{\ell=1}^N \operatorname{Tr}\Big(t_\ell\, \frac{1}{v_\ell}\int_{C_\ell} e_P(w)\, e_P^T(w)\, dP\Big), \qquad (3)$$

where Tr stands for the trace, and

• $v_\ell$ is the d-dimensional volume of $C_\ell$;

• $e_P(w)$ is the strain tensor associated with displacement w at a point $P \in \Omega$. The only property of this tensor important in our context is that $e_P(w)$ is an $L_\infty$ function of P taking values in the Euclidean space $\mathbb{R}^D$ ($D = d(d+1)/2$) and depending linearly on w;

• $v_\ell^{-1} t_\ell$ is the ‘rigidity tensor’ of the material, specifying the mechanical properties of the material in cell $C_\ell$. Mathematically, $t_\ell$ is a symmetric positive semidefinite $D \times D$ matrix.

After finite element approximation:

• the set of kinematically admissible displacements becomes a subset W of $\mathbb{R}^M$;

• an external load acting on a shape can be identified with a vector $f \in \mathbb{R}^M$;

• the equilibrium displacement $w_f$ of the shape minimizes the potential energy $E(w) - f^T w$:

$$w_f \in \operatorname{Argmax}_{w \in W}\big[f^T w - E(w)\big]; \qquad (4)$$

• the rigidity properties of the shape with respect to the load are measured by the compliance

$$\operatorname{compl}(f;t) = \sup_{w \in W}\big[f^T w - E(w)\big]. \qquad (5)$$


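The set T of feasible design vectors is always a subset of the set $(\mathbf{S}^D_+)^N$, where $\mathbf{S}^D_+$ is the cone of positive semidefinite $D \times D$ matrices (the rigidity tensors must be positive semidefinite). Typical additional restrictions defining T are the ‘resource constraints’ imposed on the quantities $\operatorname{Tr}(t_\ell)$. In a sense, these quantities measure densities of the material in the cells. The most important case is

$$T = \Big\{ t = (t_1,\dots,t_N) :\ t_\ell \in \mathbf{S}^D,\ t_\ell \succeq 0,\ \underline\rho_\ell \le \operatorname{Tr}(t_\ell) \le \overline\rho_\ell,\ \ell = 1,\dots,N;\ \sum_{\ell=1}^N \operatorname{Tr}(t_\ell) \le v \Big\},$$
$$0 \le \underline\rho_\ell < \overline\rho_\ell \le v,\quad \ell = 1,\dots,N,\qquad \sum_{\ell=1}^N \underline\rho_\ell < v, \qquad (6)$$

where $\mathbf{S}^D$ is the space of symmetric $D \times D$ matrices. For $A, B \in \mathbf{S}^D$, the relation $A \succeq B$ means that $A - B$ is positive semidefinite.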


‘Standard Case’ of the SD Problem

The truss and the shape problems are covered by a single particular case of $(P_{\mathrm{ini}})$. In this case the ‘design variables’ $t_\ell$ are positive semidefinite symmetric matrices of a certain row dimension D, the constraints defining T are restrictions on the vectors comprised of traces of these matrices, and

$$B(t) = \sum_{\ell=1}^N \sum_{s=1}^S b_{\ell s}\, t_\ell\, b_{\ell s}^T, \qquad (7)$$

where $b_{\ell s}$ are given $M \times D$ matrices.

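Indeed, the truss problem clearly fits the indicated scheme (with $D = S = 1$, $v_\ell = 1$, $\ell = 1,\dots,N$). For the shape problem, there always exist S ‘quadrature grids’ $\{P_{\ell s} \in C_\ell\}_{s=1}^S$ and ‘quadrature weights’ $\omega_{\ell s} \ge 0$, $\ell = 1,\dots,N$, such that

$$\frac{1}{v_\ell}\int_{C_\ell} e_P(w)\, e_P^T(w)\, dP = \sum_{s=1}^S \omega_{\ell s}\, e_{P_{\ell s}}(w)\, e_{P_{\ell s}}^T(w)$$

identically in $w \in \mathbb{R}^M$. Specify the matrices $b_{\ell s}$ according to

$$b_{\ell s}^T w = \omega_{\ell s}^{1/2}\, e_{P_{\ell s}}(w), \qquad \forall w \in \mathbb{R}^M.$$

Then the energy (3) becomes $\frac12 w^T B(t) w$ with B(t) given by (7), and relation (5) becomes (2).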


Standard Case 


The ‘standard case’ of the general SD problem is as follows:

S.1 The space of design vectors is the direct product of N copies of the space $\mathbf{S}^D$ of symmetric $D \times D$ matrices, so that the design vector is $t = (t_1,\dots,t_N)$, $t_\ell \in \mathbf{S}^D$, $\ell = 1,\dots,N$.

S.2 The set T of admissible designs is given by (6).

S.3 The mapping $t \mapsto B(t)$ is given by (7), and the matrix $\sum_{\ell=1}^N \sum_{s=1}^S b_{\ell s} b_{\ell s}^T$ is positive definite.

S.4 The set W of kinematically admissible displacements is polyhedral:

$$W = \{ w \in \mathbb{R}^M : Pw \le p \}, \qquad p \in \mathbb{R}^q, \qquad (8)$$

and the system of constraints $Pw \le p$ satisfies the Slater condition: $\exists \bar w : P\bar w < p$.


The Set of Loads 


In the traditional literature on structural design (see [1,2,3,4,5,7] and references therein), F is either a singleton {f} (‘single-load case’) or, more generally, a finite set:

$$F = \{f_1, \dots, f_k\}. \qquad (9)$$

Recently, a robust setting of the problem was proposed and motivated [6], where F is an ellipsoid:

$$F = \{ f = Qu :\ u \in \mathbb{R}^k,\ u^T u \le 1 \}. \qquad (10)$$


Semidefinite Reformulation of Pini 


The standard case of the multiload SD problem can be naturally posed as a semidefinite program. The same is true for the robust SD problem, provided that there are no obstacles ($W = \mathbb{R}^M$). In fact, there are three semidefinite settings of the standard SD problem: the primal (P), the dual (D) and the equivalent primal (P*). They are related as follows:

• (P) is, basically, a straightforward semidefinite reformulation of $(P_{\mathrm{ini}})$;

• (D) is obtained from the Fenchel dual of (P) by analytic elimination of part of the variables;

• (P*) is obtained from the Fenchel dual of (D) by analytic elimination of part of the variables.
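The step from $(P_{\mathrm{ini}})$ to (P) rests on the Schur complement lemma. Consider the simplest setting: a single load, no obstacles, and positive definite B(t). There the LMI in (P) reduces to requiring that $\begin{pmatrix} 2\tau & f^T\\ f & B(t)\end{pmatrix}$ be positive semidefinite. (The sign of the off-diagonal block, $-f$ in (P), does not affect semidefiniteness.) This holds exactly when $\tau \ge \frac12 f^T B(t)^{-1} f = \operatorname{compl}(f;t)$.

The minimal sketch below checks this numerically. It tests semidefiniteness through principal minors: a symmetric matrix is positive semidefinite iff all its principal minors are nonnegative. That test is only practical for tiny matrices. The data and the tolerance are illustrative assumptions.

```python
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def det(A: Matrix) -> float:
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]]) for j in range(n))


def is_psd(A: Matrix, tol: float = 1e-12) -> bool:
    """A symmetric matrix is PSD iff every principal minor is nonnegative."""
    n = len(A)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            sub = [[A[i][j] for j in idx] for i in idx]
            if det(sub) < -tol:
                return False
    return True


def lmi_matrix(tau: float, f: List[float], B: Matrix) -> Matrix:
    """The block matrix [[2 tau, f^T], [f, B]] of the single-load, obstacle-free LMI."""
    return [[2.0 * tau] + list(f)] + [[fi] + list(row) for fi, row in zip(f, B)]


B = [[2.0, 0.0], [0.0, 4.0]]
f = [2.0, 4.0]  # here compl(f; t) = 0.5 * f^T B^{-1} f = 3
print(is_psd(lmi_matrix(3.0, f, B)))  # True: tau equals the compliance
print(is_psd(lmi_matrix(2.9, f, B)))  # False: tau is below the compliance
```

Interior point methods never test minors like this. The sketch only illustrates why feasibility of the LMI is equivalent to $\operatorname{compl}(f;t) \le \tau$.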

The explicit semidefinite forms of the standard truss and shape design problems are as follows.


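Multiload Truss Design

Here T, W, F are given by (6) (where D = 1), (8) and (9), respectively.

$$\text{(P)}\qquad
\begin{aligned}
&\tau \to \min\\
&\text{s.t.}\quad \begin{pmatrix} 2[\tau - p^T y_i] & (P^T y_i - f_i)^T\\ P^T y_i - f_i & \sum_{\ell=1}^N t_\ell\, b_\ell b_\ell^T \end{pmatrix} \succeq 0,\quad i = 1,\dots,k;\\
&\qquad \underline\rho_\ell \le t_\ell \le \overline\rho_\ell,\quad \ell = 1,\dots,N;\qquad \sum_{\ell=1}^N t_\ell \le v;\qquad y_i \ge 0,\quad i = 1,\dots,k\\
&[\tau, t_\ell \in \mathbb{R},\ y_i \in \mathbb{R}^q]
\end{aligned}$$

$$\text{(D)}\qquad
\begin{aligned}
&-2\sum_{i=1}^k f_i^T w_i + \sum_{\ell=1}^N \big[\overline\rho_\ell \sigma_\ell^+ - \underline\rho_\ell \sigma_\ell^-\big] + v\gamma \to \min\\
&\text{s.t.}\quad \begin{pmatrix} \alpha_1 & & & w_1^T b_\ell\\ & \ddots & & \vdots\\ & & \alpha_k & w_k^T b_\ell\\ b_\ell^T w_1 & \cdots & b_\ell^T w_k & \gamma + \sigma_\ell^+ - \sigma_\ell^- \end{pmatrix} \succeq 0,\quad \ell = 1,\dots,N;\\
&\qquad \sigma_\ell^\pm \ge 0,\quad \ell = 1,\dots,N;\qquad \gamma \ge 0;\qquad P w_i \le \alpha_i p,\quad i = 1,\dots,k;\qquad 2\sum_{i=1}^k \alpha_i = 1\\
&[\alpha_i, \sigma_\ell^\pm, \gamma \in \mathbb{R},\ w_i \in \mathbb{R}^M]
\end{aligned}$$

$$\text{(P}^*\text{)}\qquad
\begin{aligned}
&\tau \to \min\\
&\text{s.t.}\quad \begin{pmatrix} 2[\tau - p^T y_i] & q_{i1} & \cdots & q_{iN}\\ q_{i1} & t_1 & & \\ \vdots & & \ddots & \\ q_{iN} & & & t_N \end{pmatrix} \succeq 0,\quad i = 1,\dots,k;\\
&\qquad \underline\rho_\ell \le t_\ell \le \overline\rho_\ell,\quad \ell = 1,\dots,N;\qquad \sum_{\ell=1}^N t_\ell \le v;\qquad y_i \ge 0,\quad i = 1,\dots,k;\\
&\qquad \sum_{\ell=1}^N q_{i\ell}\, b_\ell = f_i - P^T y_i,\quad i = 1,\dots,k\\
&[\tau, t_\ell, q_{i\ell} \in \mathbb{R},\ y_i \in \mathbb{R}^q]
\end{aligned}$$

The simplest truss design problem can be reduced to a linear programming problem. This is the case of a single load, no obstacles, and trivial bounds on bar volumes ($\underline\rho_\ell = 0$, $\overline\rho_\ell = v$). Indeed, in this case (P*) is the program

$$\frac12 \sum_{\ell=1}^N \frac{q_\ell^2}{t_\ell} \to \min\quad \text{s.t.}\quad t_\ell \ge 0,\ \ell = 1,\dots,N;\quad \sum_{\ell=1}^N t_\ell \le v;\quad \sum_{\ell=1}^N q_\ell b_\ell = f \qquad [t_\ell, q_\ell \in \mathbb{R},\ \ell = 1,\dots,N].$$

Partial minimization with respect to $t_\ell$ results in the program

$$\frac{1}{2v}\Big(\sum_{\ell=1}^N |q_\ell|\Big)^2 \to \min\quad \text{s.t.}\quad \sum_{\ell=1}^N q_\ell b_\ell = f,$$

which is, essentially, an LP program. It is equivalent to minimizing $\sum_\ell |q_\ell|$ subject to linear equality constraints.

Robust Obstacle-Free Truss Design

Here F is the ellipsoid (10), T is given by (6) with D = 1, and $W = \mathbb{R}^M$.

$$\text{(P)}\qquad
\begin{aligned}
&\tau \to \min\\
&\text{s.t.}\quad \begin{pmatrix} 2\tau I_k & Q^T\\ Q & \sum_{\ell=1}^N t_\ell\, b_\ell b_\ell^T \end{pmatrix} \succeq 0;\\
&\qquad \underline\rho_\ell \le t_\ell \le \overline\rho_\ell,\quad \ell = 1,\dots,N;\qquad \sum_{\ell=1}^N t_\ell \le v\\
&[\tau, t_\ell \in \mathbb{R}]
\end{aligned}$$

(from now on, $I_k$ is the $k \times k$ unit matrix),

$$\text{(D)}\qquad
\begin{aligned}
&-2\operatorname{Tr}(Q^T w) + \sum_{\ell=1}^N \big[\overline\rho_\ell \sigma_\ell^+ - \underline\rho_\ell \sigma_\ell^-\big] + v\gamma \to \min\\
&\text{s.t.}\quad \begin{pmatrix} \alpha & w^T b_\ell\\ b_\ell^T w & \gamma + \sigma_\ell^+ - \sigma_\ell^- \end{pmatrix} \succeq 0,\quad \ell = 1,\dots,N;\\
&\qquad \sigma_\ell^\pm \ge 0,\quad \ell = 1,\dots,N;\qquad \gamma \ge 0;\qquad 2\operatorname{Tr}(\alpha) = 1\\
&[\alpha \in \mathbf{S}^k,\ \sigma_\ell^\pm, \gamma \in \mathbb{R},\ w \in \mathbb{R}^{M \times k}]
\end{aligned}$$

$$\text{(P}^*\text{)}\qquad
\begin{aligned}
&\tau \to \min\\
&\text{s.t.}\quad \begin{pmatrix} 2\tau I_k & q_1 & \cdots & q_N\\ q_1^T & t_1 & & \\ \vdots & & \ddots & \\ q_N^T & & & t_N \end{pmatrix} \succeq 0;\\
&\qquad \underline\rho_\ell \le t_\ell \le \overline\rho_\ell,\quad \ell = 1,\dots,N;\qquad \sum_{\ell=1}^N t_\ell \le v;\qquad \sum_{\ell=1}^N b_\ell q_\ell^T = Q\\
&[\tau, t_\ell \in \mathbb{R},\ q_\ell \in \mathbb{R}^k]
\end{aligned}$$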
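The reduction of the simplest single-load truss problem to an LP rests on a closed-form partial minimization. Fix q with $\sum_\ell |q_\ell| > 0$. Then the Cauchy-Schwarz inequality gives $\min\{\frac12\sum_\ell q_\ell^2/t_\ell : t \ge 0,\ \sum_\ell t_\ell \le v\} = (\sum_\ell |q_\ell|)^2/(2v)$. The minimum is attained at $t_\ell = v|q_\ell|/\sum_k |q_k|$. The constraint $\sum_\ell q_\ell b_\ell = f$ does not involve t, so it is unaffected.

The sketch below takes two conventions as assumptions: bars with $q_\ell = 0$ get zero volume, and their $0/0$ term is taken as 0. The numbers are illustrative.

```python
from typing import List, Tuple


def optimal_volumes(q: List[float], v: float) -> Tuple[List[float], float]:
    """Minimize 0.5 * sum(q_l^2 / t_l) over t >= 0 with sum(t) <= v.

    Returns the minimizer t_l = v |q_l| / sum|q| and the optimal value (sum|q|)^2 / (2 v).
    """
    s = sum(abs(ql) for ql in q)
    if s == 0.0:
        return [0.0] * len(q), 0.0
    t = [v * abs(ql) / s for ql in q]
    return t, s * s / (2.0 * v)


def energy_term(q: List[float], t: List[float]) -> float:
    """0.5 * sum(q_l^2 / t_l), skipping bars with q_l = 0 (their term is taken as 0)."""
    return 0.5 * sum(ql * ql / tl for ql, tl in zip(q, t) if ql != 0.0)


q = [1.0, -2.0, 1.0]
t_opt, value = optimal_volumes(q, v=4.0)
print(t_opt, value)                           # [1.0, 2.0, 1.0] 2.0
print(energy_term(q, [4 / 3, 4 / 3, 4 / 3]))  # uniform volumes do worse (about 2.25)
```

Minimizing $\sum_\ell |q_\ell|$ under $\sum_\ell q_\ell b_\ell = f$ becomes an LP after the usual split $q_\ell = q_\ell^+ - q_\ell^-$ with $q_\ell^\pm \ge 0$.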


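Multiload Shape Design

Here T, W, F are given by (6) (where D = 3 for planar and D = 6 for spatial shapes), (8) and (9), respectively.

$$\text{(P)}\qquad
\begin{aligned}
&\tau \to \min\\
&\text{s.t.}\quad \begin{pmatrix} 2[\tau - p^T y_i] & (P^T y_i - f_i)^T\\ P^T y_i - f_i & \sum_{\ell=1}^N \sum_{s=1}^S b_{\ell s}\, t_\ell\, b_{\ell s}^T \end{pmatrix} \succeq 0,\quad i = 1,\dots,k;\\
&\qquad t_\ell \succeq 0,\quad \ell = 1,\dots,N;\qquad \underline\rho_\ell \le \operatorname{Tr}(t_\ell) \le \overline\rho_\ell,\quad \ell = 1,\dots,N;\\
&\qquad \sum_{\ell=1}^N \operatorname{Tr}(t_\ell) \le v;\qquad y_i \ge 0,\quad i = 1,\dots,k\\
&[\tau \in \mathbb{R},\ t_\ell \in \mathbf{S}^D,\ y_i \in \mathbb{R}^q]
\end{aligned}$$

$$\text{(D)}\qquad
\begin{aligned}
&-2\sum_{i=1}^k f_i^T w_i + \sum_{\ell=1}^N \big[\overline\rho_\ell \sigma_\ell^+ - \underline\rho_\ell \sigma_\ell^-\big] + v\gamma \to \min\\
&\text{s.t.}\quad \begin{pmatrix} \alpha_1 I_S & & & B_\ell[w_1]\\ & \ddots & & \vdots\\ & & \alpha_k I_S & B_\ell[w_k]\\ B_\ell^T[w_1] & \cdots & B_\ell^T[w_k] & (\gamma + \sigma_\ell^+ - \sigma_\ell^-) I_D \end{pmatrix} \succeq 0,\quad \ell = 1,\dots,N;\\
&\qquad \sigma_\ell^\pm \ge 0,\quad \ell = 1,\dots,N;\qquad \gamma \ge 0;\qquad P w_i \le \alpha_i p,\quad i = 1,\dots,k;\qquad 2\sum_{i=1}^k \alpha_i = 1\\
&[\alpha_i, \sigma_\ell^\pm, \gamma \in \mathbb{R},\ w_i \in \mathbb{R}^M],
\end{aligned}$$

where $B_\ell[w] = \begin{pmatrix} w^T b_{\ell 1}\\ \vdots\\ w^T b_{\ell S}\end{pmatrix}$.

$$\text{(P}^*\text{)}\qquad
\begin{aligned}
&\tau \to \min\\
&\text{s.t.}\quad \begin{pmatrix} 2[\tau - p^T y_i] & q_{i11}^T & \cdots & q_{iNS}^T\\ q_{i11} & t_1 & & \\ \vdots & & \ddots & \\ q_{iNS} & & & t_N \end{pmatrix} \succeq 0,\quad i = 1,\dots,k;\\
&\qquad \underline\rho_\ell \le \operatorname{Tr}(t_\ell) \le \overline\rho_\ell,\quad \ell = 1,\dots,N;\qquad \sum_{\ell=1}^N \operatorname{Tr}(t_\ell) \le v;\qquad y_i \ge 0,\quad i = 1,\dots,k;\\
&\qquad \sum_{\ell=1}^N \sum_{s=1}^S b_{\ell s}\, q_{i\ell s} = f_i - P^T y_i,\quad i = 1,\dots,k\\
&[\tau \in \mathbb{R},\ t_\ell \in \mathbf{S}^D,\ q_{i\ell s} \in \mathbb{R}^D,\ y_i \in \mathbb{R}^q]
\end{aligned}$$

(in the LMI, the diagonal block paired with $q_{i\ell s}$ is $t_\ell$).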


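Robust Obstacle-Free Shape Design

Here T is given by (6) (where D = 3 for planar and D = 6 for spatial shapes), F is the ellipsoid (10) and $W = \mathbb{R}^M$.

$$\text{(P)}\qquad
\begin{aligned}
&\tau \to \min\\
&\text{s.t.}\quad \begin{pmatrix} 2\tau I_k & Q^T\\ Q & \sum_{\ell=1}^N \sum_{s=1}^S b_{\ell s}\, t_\ell\, b_{\ell s}^T \end{pmatrix} \succeq 0;\\
&\qquad t_\ell \succeq 0,\quad \ell = 1,\dots,N;\qquad \underline\rho_\ell \le \operatorname{Tr}(t_\ell) \le \overline\rho_\ell,\quad \ell = 1,\dots,N;\qquad \sum_{\ell=1}^N \operatorname{Tr}(t_\ell) \le v\\
&[\tau \in \mathbb{R},\ t_\ell \in \mathbf{S}^D]
\end{aligned}$$

$$\text{(D)}\qquad
\begin{aligned}
&-2\operatorname{Tr}(Q^T w) + \sum_{\ell=1}^N \big[\overline\rho_\ell \sigma_\ell^+ - \underline\rho_\ell \sigma_\ell^-\big] + v\gamma \to \min\\
&\text{s.t.}\quad \begin{pmatrix} \alpha & & & w^T b_{\ell 1}\\ & \ddots & & \vdots\\ & & \alpha & w^T b_{\ell S}\\ b_{\ell 1}^T w & \cdots & b_{\ell S}^T w & (\gamma + \sigma_\ell^+ - \sigma_\ell^-) I_D \end{pmatrix} \succeq 0,\quad \ell = 1,\dots,N;\\
&\qquad \sigma_\ell^\pm \ge 0,\quad \ell = 1,\dots,N;\qquad \gamma \ge 0;\qquad 2\operatorname{Tr}(\alpha) = 1\\
&[\alpha \in \mathbf{S}^k,\ \sigma_\ell^\pm, \gamma \in \mathbb{R},\ w \in \mathbb{R}^{M \times k}]
\end{aligned}$$

$$\text{(P}^*\text{)}\qquad
\begin{aligned}
&\tau \to \min\\
&\text{s.t.}\quad \begin{pmatrix} 2\tau I_k & q_{11}^T & \cdots & q_{NS}^T\\ q_{11} & t_1 & & \\ \vdots & & \ddots & \\ q_{NS} & & & t_N \end{pmatrix} \succeq 0;\\
&\qquad \underline\rho_\ell \le \operatorname{Tr}(t_\ell) \le \overline\rho_\ell,\quad \ell = 1,\dots,N;\qquad \sum_{\ell=1}^N \operatorname{Tr}(t_\ell) \le v;\qquad \sum_{\ell=1}^N \sum_{s=1}^S b_{\ell s}\, q_{\ell s} = Q\\
&[\tau \in \mathbb{R},\ t_\ell \in \mathbf{S}^D,\ q_{\ell s} \in \mathbb{R}^{D \times k}]
\end{aligned}$$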


The relations between the original setting of the SD problem $(P_{\mathrm{ini}})$ and its semidefinite forms (P), (D), (P*) are summarized in the following statement:

Theorem 1 Consider the standard truss or shape case of problem $(P_{\mathrm{ini}})$, and assume that the set of loads F is either (9) or (10). In the latter case, assume also that there are no obstacles, i.e., $W = \mathbb{R}^M$. Then

i) A pair $(t, \tau)$ can be extended to a feasible solution of problems (P) and (P*) if and only if $t \in T$ and $\operatorname{compl}_F(t) \le \tau$. Consequently, (P) and (P*) are equivalent semidefinite reformulations of the problem of interest.

ii) All three programs (P), (D), (P*) are strictly feasible and solvable. Strict feasibility means that for each problem there exists a feasible solution satisfying the strict forms of all inequality constraints. The optimal values of (P) and (P*) are equal to each other and to the optimal value of the problem of interest. The optimal value of (D) is minus that of (P) and (P*).

iii) For each program, every level set of its objective (i.e., the part of the feasible set where the objective is $\le$ a constant) is bounded.


Computational Issues 


Semidefinite forms of the standard SD problem are well suited for solving by modern polynomial time interior point methods. The ‘computational bottleneck’ is the huge size of SD problems of actual interest. The limitations imposed by this bottleneck depend heavily on which of the forms (P), (D), (P*) is solved. In many cases (e.g., in truss design) (D) is by far better suited for numerical processing than the other forms of the SD problem. For a detailed discussion of computational issues, see [5,6]. This includes ‘computationally cheap’ techniques for recovering a (nearly) optimal design from a (nearly) optimal solution to (D).


See also 


> Duality for Semidefinite Programming

> Interior Point Methods for Semidefinite Programming

> Semidefinite Programming and Determinant Maximization

> Semidefinite Programming: Optimality Conditions and Stability

> Semi-infinite Programming, Semidefinite Programming and Perfect Duality

> Solving Large Scale and Sparse Semidefinite Programs

> Structural Optimization

> Structural Optimization: History

> Topology of Global Optimization

> Topology Optimization


References 


1. Bendsoe MP (1995) Optimization of structural topology 
shape and material. Springer, Berlin 

2. Bendsoe MP, Ben-Tal A, Zowe J (1994) Optimization meth- 
ods for truss geometry and topology design. Structural Op- 
tim 4:141-159 

3. Bendsoe MP, Guedes JM, Haber RB, Pedersen P, Taylor JE 
(1994) An analytical model to predict optimal material prop- 
erties in the context of Optimal Structural Design. J Appl Me- 
chanics 61:930-937 

4. Ben-Tal A, Bendsoe MP (1993) A new method for optimal 
truss topology design. SIAM J Optim 3:322-358 

5. Ben-Tal A, Nemirovski A (1994) Potential reduction polyno- 
mial time method for truss topology design. SIAM J Optim 
4:596-612 

6. Ben-Tal A, Nemirovski A (1997) Stable truss topology design 
via semidefinite programming. SIAM J Optim 7:991-1016 

7. Ringertz U (1993) On finding the optimal distribution of ma- 
terial properties. Structural Optim 5:265-267 


———— 
Semi-infinite Programming and Applications in Finance

K. O. KORTANEK¹, VLADIMIR G. MEDVEDEV²
¹ University Iowa, Iowa City, USA
² Byelorussian State University,
Minsk, Republic Belarus


MSC2000: 90C34, 91B28 


Article Outline 


Keywords 

Problem Statement 
and Early Numerical Solution Methods 

The Support Problems Method Developed 
in Belarus 

Extended Support Problems Method 

A SIP Approach for Estimating Uncertainty 
in Dynamical Systems 


The Minimax Observation Problem Under Uncertainty 
with Perturbations 


A Prototype: Analog of the Vasicek Model 
with Impulse Perturbations 

Estimating the Spot Rate for Bonds 
with Constant Maturities 

A Nonarbitrage Condition for LDSU 

See also 

References 


Keywords 

Duality equality; Optimality conditions; Support problems method; Dynamical system; Nonstochastic uncertainty; Perturbations; Term structure of interest rates; Term to maturity


Problem Statement 
and Early Numerical Solution Methods 


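Consider the linear semi-infinite programming problem LSIP, where ‘s.t.’ means ‘subject to’ throughout this paper:

$$v_P = \max\ c^T x \quad \text{s.t.}\quad f(x,t) \le 0,\ t \in T,\qquad d_* \le x \le d^*, \qquad (1)$$

where

$$f(x,t) = a^T(t)\,x - b(t),\quad t \in T = [t_*, t^*],\quad x, c, d_*, d^* \in \mathbb{R}^n;\quad a(\cdot) \in \mathbb{R}^n,\ b(\cdot) \in \mathbb{R}$$

are differentiable functions. Its dual semi-infinite program DLSIP is:

$$v_D = \inf\ \int_T b(t)\,d\alpha(t) + \sum_{r=1}^n \big(v_r^+ d_r^* - v_r^- d_{*r}\big)\quad \text{s.t.}\quad \int_T a_r(t)\,d\alpha(t) + v_r^+ - v_r^- = c_r,\quad r = 1,\dots,n,$$

where α is a nonnegative Riemann-Stieltjes measure [13], and $v_r^+, v_r^- \ge 0$.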

Examples in transportation theory and wavelet filter design are given, respectively, in [18,19]. In [4, Thm. 4] it is proved that $v_P = v_D$, but the infimum need not be attained. This equation is termed the duality equality.
Example 1

$$\max\ -x_1\quad \text{s.t.}\quad -t x_1 - t^2 x_2 \le 0\quad \forall t \in [0,1],\qquad -1 \le x_i \le 1,\ i = 1,2.$$

Here the common program value is 0, but it is not attained in the dual.
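The following short derivation, added as an illustration, shows why the dual infimum is not attained. For Example 1, $a(t) = (-t, -t^2)$, $b(t) = 0$, $c = (-1, 0)$, $d_* = (-1,-1)$ and $d^* = (1,1)$, so DLSIP reads

$$v_D = \inf \sum_{r=1}^2 \big(v_r^+ + v_r^-\big)\quad \text{s.t.}\quad -\int_0^1 t\,d\alpha(t) + v_1^+ - v_1^- = -1,\qquad -\int_0^1 t^2\,d\alpha(t) + v_2^+ - v_2^- = 0.$$

For $0 < \varepsilon \le 1$, let $\alpha_\varepsilon$ place mass $1/\varepsilon$ at $t = \varepsilon$. Then $\int t\,d\alpha_\varepsilon = 1$ and $\int t^2\,d\alpha_\varepsilon = \varepsilon$. So $v_1^\pm = 0$, $v_2^+ = \varepsilon$, $v_2^- = 0$ is feasible with objective value $\varepsilon$, and therefore $v_D = 0$.

A dual objective of exactly 0 would force all $v_r^\pm = 0$, i.e., $\int t\,d\alpha = 1$ and $\int t^2\,d\alpha = 0$. The second condition means that α has no mass on $]0,1]$, which contradicts the first.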


In the 1970s, however, algorithms were developed under stronger regularity assumptions.


Definition 2 The linear system (1) satisfies the Slater condition if there exists x, $d_* \le x \le d^*$, for which it is satisfied strictly, i.e., $f(x,t) < 0$ for all $t \in T$.


Under the Slater condition, S.-A. Gustafson [9] supplemented the necessary complementarity conditions (analogous to LP) with the now classical matching-of-derivatives conditions, see also [11, (2.3)-(2.8)]. Rather than reviewing these, we focus on recent extensions (appearing in Russian) of Gustafson's well-known three-phase algorithm, see [8,10]. [12, Fig. 28-1] presents the logic flow of semi-infinite programming algorithms as of 1973.


The Support Problems Method Developed 
in Belarus 


The support problems method was suggested in [6] and further developed and algorithmically implemented in [24]. It is based on the principle of eliminating the subsets of the index set T where the constraints are violated.

The method has two components. The first consists of technical procedures for forming, solving, and analyzing sequences of LP problems having a small number of constraints; that number does not depend on a preassigned accuracy. The second is a finishing procedure which, roughly speaking, employs a Newton procedure on a system of nonlinear necessary conditions (the matching of derivatives). The method is the most general available of the hybrid types [14], because it uses higher order derivatives.

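We assume the set of feasible solutions of LSIP is nonempty, namely,

$$X = \{x \in \mathbb{R}^n :\ f(x,t) \le 0\ \ \forall t \in T,\ \ d_* \le x \le d^*\} \ne \emptyset. \qquad (2)$$

Define

$$J = \{1,\dots,n\},\qquad f^{(s)}(x,t) = \frac{\partial^s f(x,t)}{\partial t^s},\qquad a^{(s)}(t) = \Big(a_j^{(s)}(t) = \frac{d^s a_j(t)}{dt^s},\ j \in J\Big),$$

$$N(q) = \begin{cases} \emptyset & \text{if } q < 0,\\ \{0,1,\dots,q\} & \text{if } q \ge 0,\end{cases}$$

and let $p = p(x,t) \in \{0,1,\dots\}$ be the nonnegative integer satisfying

$$f^{(s)}(x,t) = 0,\ \ s \in N(p-1),\qquad f^{(p)}(x,t) \ne 0.$$

From (2) it follows that

$$p = p(x,t) \in \begin{cases}
\{0,2,4,\dots\},\ \ f^{(p)}(x,t) < 0, & \text{if } t \notin \{t_*, t^*\},\\
\{0,1,2,\dots\},\ \ f^{(p)}(x,t) < 0, & \text{if } t = t_*,\\
\{0,1,2,\dots\},\ \ f^{(p)}(x,t) < 0, & \text{if } p \text{ is even and } t = t^*,\\
\{0,1,2,\dots\},\ \ f^{(p)}(x,t) > 0, & \text{if } p \text{ is odd and } t = t^*.
\end{cases} \qquad (3)$$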
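The sketch below computes $p(x,t)$ exactly, under an assumption made only here: $f(x,\cdot)$ is a polynomial in t. The article itself only requires differentiability. Under that assumption, $p(x,t)$ is the order of the first nonvanishing t-derivative of $f(x,\cdot)$ at t. Coefficients are listed from the constant term up. Exact rational arithmetic avoids spurious nonzero derivatives caused by rounding. The function names are illustrative.

```python
from fractions import Fraction
from typing import List, Sequence


def derivative(coeffs: Sequence[Fraction]) -> List[Fraction]:
    """Coefficients of d/dt of sum_j coeffs[j] * t**j."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def evaluate(coeffs: Sequence[Fraction], t: Fraction) -> Fraction:
    """Horner evaluation of sum_j coeffs[j] * t**j."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def vanishing_order(coeffs: Sequence[Fraction], t: Fraction) -> int:
    """p = smallest s with f^(s)(t) != 0, i.e. f^(s)(t) = 0 for all s in N(p - 1).

    Raises ValueError for the zero polynomial, whose order is undefined.
    """
    poly = list(coeffs)
    s = 0
    while any(c != 0 for c in poly):
        if evaluate(poly, t) != 0:
            return s
        poly = derivative(poly)
        s += 1
    raise ValueError("f(x, .) vanishes identically")


# f(x, t) = -(t - 1/2)^2 = -1/4 + t - t^2 touches 0 at the interior point t = 1/2.
f = [Fraction(-1, 4), Fraction(1), Fraction(-1)]
print(vanishing_order(f, Fraction(1, 2)))  # 2: even order with f''(t) < 0, as (3) requires
print(vanishing_order(f, Fraction(0)))     # 0: the constraint is inactive at t = 0
# g(x, t) = t - 1 on T = [0, 1]: at t* = 1 the order is 1 (odd) and g'(1) = 1 > 0.
print(vanishing_order([Fraction(-1), Fraction(1)], Fraction(1)))  # 1
```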


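Definition 3 The integer q = q(t) is called the motionless degree (see [20]) at the point $t \in T$ in problem (1) if

$$f^{(s)}(x,t) = 0,\quad s \in N(q(t)),\quad \forall x \in X, \qquad (4)$$

and there exists at least one feasible solution $x \in X$ such that

$$f^{(q(t)+1)}(x,t)\ \begin{cases} < 0 & \text{if } t \ne t^*, \text{ or } t = t^* \text{ and } q(t)+1 \text{ is even},\\ > 0 & \text{if } t = t^* \text{ and } q(t)+1 \text{ is odd}.\end{cases} \qquad (5)$$

It follows from Definition 3 that

$$q(t) = p^*(t) - 1 \in \begin{cases} \{-1, 1, 3, \dots\} & \text{if } t \in\, ]t_*, t^*[,\\ \{-1, 0, 1, 2, \dots\} & \text{if } t \in \{t_*, t^*\},\end{cases}$$

where $p^*(t) = \max_{x \in X} p(x,t)$.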
Remark 4 If in problem (1) there exists at least one point $t \in T$ such that $q(t) \ge 0$, then the Slater condition does not hold.


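Following [20], we describe the algorithm for constructing the function q(t), $t \in T$. Define

$$T(x) = \{t \in T : f(x,t) = 0\} = \{t_i,\ i \in I\},\qquad I = I(x) = \{1,\dots,k = k(x)\}.$$

Set $q_i^{(0)} = -1$, $i \in I$; $k = 0$. Denote

$$I_+^{(k)} = \{ i \in I :\ t_i = t^*,\ q_i^{(k)} + 1 \text{ an odd integer}\},$$

$$X^{(k)} = \Big\{ z \in \mathbb{R}^n :\ f^{(s)}(z,t_i) = 0,\ s \in N(q_i^{(k)}),\ i \in I;\ \ f^{(q_i^{(k)}+1)}(z,t_i) \le 0,\ i \in I \setminus I_+^{(k)};\ \ f^{(q_i^{(k)}+1)}(z,t_i) \ge 0,\ i \in I_+^{(k)};\ \ d_* \le z \le d^* \Big\}.$$

For each $j \in I$ let $f_j^{(k)}(z) = f^{(q_j^{(k)}+1)}(z, t_j)$, and let $x^{(j)}$ be an optimal solution of the following LP problem:

$$x^{(j)} = \begin{cases} \arg\min_{z \in X^{(k)}} f_j^{(k)}(z) & \text{if } j \notin I_+^{(k)},\\ \arg\max_{z \in X^{(k)}} f_j^{(k)}(z) & \text{if } j \in I_+^{(k)}.\end{cases} \qquad (6)$$

Since $x \in X^{(k)}$, an optimal solution of (6) always exists. Set $I_0^{(k)} = \{ j \in I : f_j^{(k)}(x^{(j)}) = 0\}$. The following cases occur.

• $I_0^{(k)} \ne \emptyset$. Set

$$q_j^{(k+1)} = \begin{cases} q_j^{(k)}, & j \in I \setminus I_0^{(k)},\\ q_j^{(k)} + 2, & j \in I_0^{(k)},\ t_j \notin \{t_*, t^*\},\\ q_j^{(k)} + 1, & j \in I_0^{(k)},\ t_j \in \{t_*, t^*\},\end{cases}$$

replace k by k + 1 and repeat.

• $I_0^{(k)} = \emptyset$. Set

$$q(t) = \begin{cases} -1, & t \in T \setminus T(x),\\ q_j^{(k)}, & t = t_j,\ j \in I,\end{cases}$$

and stop.

The function q(·) so defined satisfies conditions (1)-(5); the details are given in [20].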


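Let x be a feasible solution of problem (1). Define

$$T(x) = \{t \in T : f(x,t) = 0\} = \{t_i,\ i \in I\},\qquad I = I(x) = \{1,\dots,k = k(x)\},$$

$$p_i = p(x, t_i),\quad q_i = q(t_i),\quad i \in I,$$

$$I_n = \{i \in I :\ q_i + 1 = p_i\},\qquad I_k = I \setminus I_n,$$

$$I^+ = \{i \in I :\ t_i = t^*,\ q_i + 1 \text{ odd}\},\qquad I^- = I \setminus I^+.$$

The following result is proved in [20].

Theorem 5 The feasible solution $x \in X$ is an optimal solution of problem (1) if and only if the vector x is an optimal solution of the following LP problem:

$$\max\ c^T z\quad \text{s.t.}\quad d_* \le z \le d^*,$$

and

$$f^{(s)}(z, t_j) = 0,\quad s \in N(q_j),\quad j \in I, \qquad (7)$$

$$f^{(q_j+1)}(z, t_j)\ \begin{cases} \le 0, & j \in I^- \cap I_k,\\ \ge 0, & j \in I^+ \cap I_k.\end{cases} \qquad (8)$$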


Extended Support Problems Method

We conclude the algorithmic part of the paper with a very general description of the extension to problems not having an interior point (non-Slater problems).

1 Determine a feasible solution x for problem (1).
In this step the LSIP problem with a floating number of variables is solved by the support problems method.

2 Determine the motionless points $t_j \in T(x)$ and the corresponding motionless degrees $q_j$, $j \in I(x)$.
In this step one finds all points in the interval T for which $f(x,t) = 0$ and solves the sequence of LP problems defining the motionless degrees at these points.

3 Determine an optimal solution of (1).
In this step the constraints (7), (8) are added to the constraints of the original problem (1). The resulting problem is solved by the support problems method.

Extended support problems method
We turn now to very recent applications of SIP to finance. They employ semi-infinite programs whose index sets T appear in a partitioned form, corresponding to units of time. A basic structure for LSIP is the following index set-partitioned form:

$$\text{PLSIP}\qquad
\begin{aligned}
\min\ & c^T x\\
\text{s.t.}\ & b_*(t) \le A(t)\,x \le b^*(t),\quad \forall t \in T^L,\quad T^L = \bigcup_{i=1}^L T_i,\quad T_i = [t_{i-1}, t_i],\\
& g_* \le Gx \le g^*,\\
& d_* \le x \le d^*,
\end{aligned} \qquad (9)$$

where:

• $c, x, d^*, d_*$ are n-dimensional vectors;
• $b_*(\cdot), b^*(\cdot)$ are m-dimensional functions;
• $A(\cdot)$ is an $m \times n$ matrix function;
• G is a $p \times n$ matrix, with $g_*, g^*$ p-dimensional vectors.

We shall use the $T^L$ notation throughout for various choices of the integer L. Typically, $T_i$ is the ith day of an observational period.


A SIP Approach for Estimating Uncertainty 
in Dynamical Systems 


Generally, the problem of estimation in nondeterministic systems has been investigated by means of many stochastic models, beginning with the papers of N. Wiener [28] and R.E. Kalman [16]. In the 1970s, nonstochastic observation models (‘minimax’, ‘guaranteed’) under uncertainty were developed in [5,21,22]. During the 1980s, R. Gabasov and F.M. Kirillova developed a new approach for optimization of linear dynamical systems under uncertainty; see [7].

Our approach [17] to modeling uncertainty contrasts with other qualitative approaches, for example, those based on stochastic differential equations. The latter make certain mathematical assumptions about the underlying stochastic processes which may be difficult to verify in real situations, for example, in the financial derivatives and assets markets.

We demonstrate our approach by applying the following general minimax observation problem, stated with unknown parameters under nonstochastic uncertainty, to differential equation models for the interest rate of shortest duration, termed the spot interest rate. The models we develop are analogous to some of the stochastic differential equation models appearing in the literature.


The Minimax Observation Problem 
Under Uncertainty with Perturbations 


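Our main model is the well-known linear dynamic system under nonstochastic uncertainty with perturbations, LDSU, over the time interval $J = [0, T]$:

$$\dot x = Ax + Dw(t),\qquad x(0) = x_0 \in X_0,\qquad t \in J, \qquad (10)$$

$$X_0 = \{x \in \mathbb{R}^n :\ d_* \le x \le d^*\},$$

$$W(t) = \{w(t) \in \mathbb{R}^r :\ w_* \le w(t) \le w^*\},$$

$$D \in \mathbb{R}^{n \times r},\qquad w_*, w^* \in \mathbb{R}^r.$$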


The fundamental matrix F of (10) has the following properties (E is the identity matrix):

$$\dot F = AF,\quad F(0) = E,\quad F(t-s) = F(t-p)\,F(p-s),\quad F(t+s) = F(t)\,F(s),\quad F^{-1}(t) = F(-t).$$

These yield the form of a solution of (10) by the Cauchy formula:

$$x(t) = F(t)\,x_0 + \int_0^t F(t)\,F(-s)\,D\,w(s)\,ds. \qquad (11)$$
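The sketch below covers the scalar case only: $n = r = 1$, $D = 1$ and $A = a \ne 0$ are assumptions made here. Then $F(t) = e^{at}$, and (11) can be evaluated in closed form for a piecewise-constant perturbation w. The sketch compares that closed form with a classical fourth-order Runge-Kutta integration of (10). The breakpoints and levels are illustrative.

```python
import math
from typing import List


def cauchy(a: float, x0: float, breaks: List[float], levels: List[float], t: float) -> float:
    """x(t) = e^{a t} x0 + int_0^t e^{a (t - s)} w(s) ds, with w = levels[i] on [breaks[i], breaks[i+1]]."""
    x = math.exp(a * t) * x0
    for lo, hi, w in zip(breaks[:-1], breaks[1:], levels):
        hi = min(hi, t)
        if hi <= lo:
            break
        # int_lo^hi e^{a (t - s)} ds = (e^{a (t - lo)} - e^{a (t - hi)}) / a
        x += w * (math.exp(a * (t - lo)) - math.exp(a * (t - hi))) / a
    return x


def rk4(a: float, x0: float, breaks: List[float], levels: List[float], t: float, steps: int = 1000) -> float:
    """Integrate x' = a x + w(t) with classical RK4, restarting at each breakpoint of w."""
    x = x0
    for lo, hi, w in zip(breaks[:-1], breaks[1:], levels):
        hi = min(hi, t)
        if hi <= lo:
            break
        h = (hi - lo) / steps
        for _ in range(steps):
            k1 = a * x + w
            k2 = a * (x + 0.5 * h * k1) + w
            k3 = a * (x + 0.5 * h * k2) + w
            k4 = a * (x + h * k3) + w
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


breaks, levels = [0.0, 1.0, 2.0, 3.0], [0.2, -0.1, 0.05]  # breaks[0] = 0 is assumed
print(cauchy(-0.5, 1.0, breaks, levels, 2.5), rk4(-0.5, 1.0, breaks, levels, 2.5))
```

The two values agree to many digits. The Runge-Kutta steps are aligned with the breakpoints so that each step sees a constant w.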


We assume that the state x(t) of the system (10) is estimated from a sensor system of the form:

$$y(t) = h^T x(t) + z(t),\qquad t \in J. \qquad (12)$$

This is a measurement system giving inexact information about the current state of system (10), where z(t) is an unknown piecewise continuous measurement error function.

Let (10), (12) generate a signal $y^*(t)$, $t \in J$, with some measurement error $z^*(t)$, $t \in J$. In our approach we seek x(·) by solving the following minimax observation problem:

$$\min_{(x_0,\, w(\cdot)) \in X_0 \times W(\cdot)}\ \max_{t \in J}\ |z^*(t)|. \qquad (13)$$


We obtain an equivalent infinite linear program from (13) by substituting (11) into (12). This gives an explicit expression for $z^*(\cdot)$:

$$z^*(t) = y^*(t) - h^T F(t)\,x_0 - h^T \int_0^t F(t)\,F(-s)\,D\,w(s)\,ds.$$

Hence (13) is equivalent to:

$$\begin{aligned}
\min\ & v\\
\text{s.t.}\ & -h^T F(t)\,x - h^T\!\int_0^t F(t)\,F(-s)\,D\,w(s)\,ds - v \le -y^*(t),\\
& h^T F(t)\,x + h^T\!\int_0^t F(t)\,F(-s)\,D\,w(s)\,ds - v \le y^*(t),\\
& x \in X_0,\quad v \ge 0,\\
& w_* \le w(t) \le w^*,\quad \forall t \in J.
\end{aligned} \qquad (14)$$

Our main application to state estimation is the following. Suppose $(x^0, w^0, v^0)$ is optimal for (14). Then

$$\hat x(t) = F(t)\,x^0 + \int_0^t F(t)\,F(-s)\,D\,w^0(s)\,ds$$

is an estimate of the state x(t) of the system (10). This estimate gives the minimal possible maximum absolute value of the measurement error, $v^0$. Special purpose algorithms for problems of this type have been developed in [24,25]. To implement this approach we must specify a class of impulse perturbations, and we illustrate one such prototype class next.


A Prototype: Analog of the Vasicek Model 
with Impulse Perturbations 


The financial market setting is that of a default-free discount bond paying one dollar at maturity time T. For convenience, we take the inception to be 0 and let P(t,T) denote the price of this bond at time t, $0 \le t \le T$. In practice the bond may be a 91-day treasury bill issued on 10/1/97 (= $t_0$) and maturing on 12/31/97 = $t_0$ + 91 days. By definition, $P(T,T) = 1$. For $t < T$, the yield to maturity R(t,T) prevailing at time t is the internal rate of return at time t on a bond with maturity date T:

$$R(t,T) = -\frac{1}{T-t}\,\log P(t,T),\qquad 0 \le t < T. \qquad (15)$$
Semi-infinite Programming and Applications in Finance, Figure 1
US treasury yield curve: Constant maturities


The interest rate yield curve is the plot of R against the time to maturity, see [2,15,29]. For $T > t$, as a function of T, R(t,T) is usually called the term structure of interest rates at time t. Figure 1 is an illustration from recent data.

The instantaneous short rate or spot rate prevailing at time t, r(t) (see [26]), is defined as:

$$r(t) = \lim_{T \to t} R(t,T). \qquad (16)$$

Hence,

$$P(t,T) = e^{-\int_t^T r(s)\,ds}. \qquad (17)$$

The spot rate cannot be observed from real data, but it is the focus of various stochastic differential equation models. Our prototype LDSU model is also constructed around the spot rate r. It follows from (15), (17) that

$$R(t,T) = \frac{1}{T-t}\int_t^T r(s)\,ds. \qquad (18)$$
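Relations (15), (17) and (18) are easy to check numerically. The minimal sketch below makes two assumptions that are not from the article. The spot-rate path $r(s) = 0.04 + 0.01s$ is hypothetical, and time is measured in years of 365 days. The sketch prices the 91-day bill by (17) using composite Simpson quadrature, recovers the yield by (15), and compares it with the average spot rate in (18).

```python
import math
from typing import Callable


def simpson(g: Callable[[float], float], lo: float, hi: float, n: int = 200) -> float:
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(lo + j * h)
    return total * h / 3


def bond_price(r: Callable[[float], float], t: float, T: float) -> float:
    """P(t, T) = exp(-int_t^T r(s) ds), formula (17)."""
    return math.exp(-simpson(r, t, T))


def yield_to_maturity(price: float, t: float, T: float) -> float:
    """R(t, T) = -log P(t, T) / (T - t), formula (15)."""
    return -math.log(price) / (T - t)


def r(s: float) -> float:
    return 0.04 + 0.01 * s  # hypothetical deterministic spot-rate path


t, T = 0.0, 91 / 365  # the 91-day bill of the text, in years (365-day convention assumed)
R = yield_to_maturity(bond_price(r, t, T), t, T)
average = simpson(r, t, T) / (T - t)  # right-hand side of (18)
print(R, average)
```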
We illustrate the approach with the classical model of O. Vasicek [26]. There, the standard Brownian motion Z underlies the stochastic differential equation for the spot rate, and α, β, and σ are parameters, see [23,27]:

$$dr = (\alpha + \beta r)\,dt + \sigma\,dZ. \qquad (19)$$


The parameters are employed to capture shifts and volatility of the spot rate [3]. We hypothesize that the spot rate is governed by the following linear dynamic system with unknown parameters and nonstochastic uncertainty. Let N be a positive integer, and

$$\dot r = \alpha + \beta r(t) + w(t),\qquad r(0) = r_0,\qquad \beta \ne 0,\qquad t \in T^N, \qquad (20)$$

where $T^L$ is defined in (9), with L = N. Assume that a priori information about the unknown parameters ω of this LDSU takes the form:

$$\begin{aligned}
&\omega = (r_0, \alpha, \beta, w_i,\ i = 1,\dots,N),\\
&\omega_* = (r_{0*}, \alpha_*, \beta_*, w_{*i},\ i = 1,\dots,N),\\
&\omega^* = (r_0^*, \alpha^*, \beta^*, w_i^*,\ i = 1,\dots,N),\\
&\Omega = \{\omega \in \mathbb{R}^{N+3} :\ \omega_* \le \omega \le \omega^*\},\\
&w(t) = w_i,\quad w_{*i} \le w_i \le w_i^*,\quad t \in T_i,\ i = 1,\dots,N,
\end{aligned} \qquad (21)$$

where the perturbations w(·) are piecewise constant. Other classes of perturbations appear in [17]. System (20)-(21) is our prototype of (10).


Remark 6 We stress the dependence of the spot rate on the parameters in (21) by writing $r(\cdot \mid \omega)$. However, to emphasize the status of ω as an independent variable, we write $r(t \mid \omega) = f(t, \omega)$.


We address next how the general measurement system 
(12) specializes in our Vasicek-based prototype. 


Estimating the Spot Rate for Bonds 
with Constant Maturities 


We define the period of observation to be $T^M$, for a positive integer M, see (9). Let τ be the current time-to-maturity (maturity term). Then, with inception date t, a τ-maturity bond becomes due at date $t + \tau$.

Assume that we have observed values of the treasury yield curve giving a series of yields to maturity $R_i^\tau$, $i = 1,\dots,M$, for some given maturity term τ during M days of observation. The date $t_M$ is the last day of the observation period and becomes the current date. Under this interpretation we build a piecewise constant form of the yield to maturity. This is done for numerical stability reasons and is consistent with LDSU:

$$R(t, t+\tau) = R_i^\tau,\qquad t \in T_i,\ i = 1,\dots,M. \qquad (22)$$


The table below illustrates some observations of US treasury yield curve rates, for successive years.

Date        3-mo       6-mo   9-mo   1-yr
01/03/94    3.16 (←1)  3.39   3.67   4.66
01/18/96    5.11       5.02   5.01
01/19/96    5.1        5.06   5.02   5.03

Observations of treasury yield curves.

Legend: ←1: $R_1^{(3\,\mathrm{mo})} = R_{01/03/94}^{(3\,\mathrm{mo})}$


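Definition 7 By the τ-based yield we mean the averaged integral

$$\rho(t, \omega \mid \tau) = \frac{1}{\tau}\int_t^{t+\tau} r(s \mid \omega)\,ds,\qquad t \in T^M,\ \omega \in \Omega, \qquad (23)$$

where Ω is the set of unknown parameters. The estimation error $\varepsilon(t,\omega)$ is the difference

$$\varepsilon(t,\omega) = \rho(t, \omega \mid \tau) - R(t, t+\tau),\qquad t \in T^M,\ \omega \in \Omega. \qquad (24)$$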


Note that (24) corresponds to the measurement error function in (12). We compute the estimate $\omega^0$ of the unknown parameters ω by minimizing, over $\omega \in \Omega$, the maximum absolute value of the estimation errors $\varepsilon(t,\omega)$ on the interval $T^M$. This leads to the following problem:

$$\min_{\omega \in \Omega}\ \max_{t \in T^M}\ |\varepsilon(t,\omega)|. \qquad (25)$$

Problem (25) may be written as the following nonlinear semi-infinite programming (NSIP) problem, see [14]:

$$\begin{aligned}
\min\ & v_\tau\\
\text{s.t.}\ & \rho(t, \omega \mid \tau) - v_\tau \le R(t, t+\tau),\\
& R(t, t+\tau) \le \rho(t, \omega \mid \tau) + v_\tau,\quad t \in T^M,\\
& \omega \in \Omega,\quad v_\tau \ge 0.
\end{aligned} \qquad (26)$$

We call problem (26) the τ-programmed problem of spot rate estimation, but we must be more specific. We return to our basic model (20)-(21) and apply the Cauchy formula. The solution of the differential equation (20) has the form

$$r(t \mid \omega) = e^{\beta t} r_0 + \frac{\alpha}{\beta}\big(e^{\beta t} - 1\big) + \sum_{k=1}^{i-1} \frac{w_k}{\beta}\big(e^{\beta(t - t_{k-1})} - e^{\beta(t - t_k)}\big) + \frac{w_i}{\beta}\big(e^{\beta(t - t_{i-1})} - 1\big),$$

$$t \in T_i,\quad i = 1,\dots,N,\quad N = M + \tau. \qquad (27)$$


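To derive the explicit form of problem (26), we must first specify the function

$$\rho(t, \omega \mid \tau) = \frac{1}{\tau}\int_t^{t+\tau} r(s \mid \omega)\,ds,\qquad t \in T_i,\ i = 1,\dots,M.$$

After a tedious series of integrations we obtain

$$\rho(t, \omega \mid \tau) = \frac{e^{\beta(t+\tau)} - e^{\beta t}}{\tau\beta}\, r_0 + \alpha\Big(\frac{e^{\beta(t+\tau)} - e^{\beta t}}{\tau\beta^2} - \frac{1}{\beta}\Big) + \sum_{k=1}^{i+\tau} a_k(\beta, \tau \mid t)\, w_k,$$

where

$$a_k(\beta, \tau \mid t) = \begin{cases}
\dfrac{e^{\beta(t - t_{k-1} + \tau)} - e^{\beta(t - t_k + \tau)} + e^{\beta(t - t_k)} - e^{\beta(t - t_{k-1})}}{\tau\beta^2}, & k < i,\\[2ex]
\dfrac{e^{\beta(t - t_{k-1} + \tau)} - e^{\beta(t - t_k + \tau)} - e^{\beta(t - t_{k-1})} + 1}{\tau\beta^2} - \dfrac{t_k - t}{\tau\beta}, & k = i,\\[2ex]
\dfrac{e^{\beta(t - t_{k-1} + \tau)} - e^{\beta(t - t_k + \tau)}}{\tau\beta^2} - \dfrac{t_k - t_{k-1}}{\tau\beta}, & i < k < i + \tau,\\[2ex]
\dfrac{e^{\beta(t - t_{k-1} + \tau)} - 1}{\tau\beta^2} - \dfrac{t + \tau - t_{k-1}}{\tau\beta}, & k = i + \tau.
\end{cases} \qquad (28)$$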
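Formulas (27) and (28) can be cross-checked against direct numerical integration of $r(s \mid \omega)$. The minimal sketch below rests on three assumptions. Days have unit length, so that $T_k = [k-1, k]$, $t_k = k$ and $t_0 = 0$. The maturity term τ is an integer number of days. The parameter values are illustrative. The function names are hypothetical.

```python
import math
from typing import Callable, List


def spot_rate(t: float, r0: float, alpha: float, beta: float, w: List[float]) -> float:
    """r(t|omega) from (27); w[k - 1] is the level w_k on T_k = [k - 1, k] (0 <= t <= len(w))."""
    e = math.exp
    i = min(int(math.floor(t)) + 1, len(w))  # index i with t in T_i
    r = e(beta * t) * r0 + alpha / beta * (e(beta * t) - 1)
    for k in range(1, i):  # days T_1, ..., T_{i-1} already completed
        r += w[k - 1] / beta * (e(beta * (t - (k - 1))) - e(beta * (t - k)))
    return r + w[i - 1] / beta * (e(beta * (t - (i - 1))) - 1)


def tau_yield(t: float, tau: int, r0: float, alpha: float, beta: float, w: List[float]) -> float:
    """rho(t, omega | tau) from (28), for t in T_i and an integer term tau >= 1."""
    e = math.exp
    i = int(math.floor(t)) + 1
    growth = e(beta * (t + tau)) - e(beta * t)
    rho = growth / (tau * beta) * r0 + alpha * (growth / (tau * beta ** 2) - 1 / beta)
    for k in range(1, i + tau + 1):
        tk1, tk = k - 1, k  # t_{k-1}, t_k
        head = e(beta * (t - tk1 + tau)) - e(beta * (t - tk + tau))
        if k < i:
            a = (head + e(beta * (t - tk)) - e(beta * (t - tk1))) / (tau * beta ** 2)
        elif k == i:
            a = (head - e(beta * (t - tk1)) + 1) / (tau * beta ** 2) - (tk - t) / (tau * beta)
        elif k < i + tau:
            a = head / (tau * beta ** 2) - (tk - tk1) / (tau * beta)
        else:  # k == i + tau
            a = (e(beta * (t - tk1 + tau)) - 1) / (tau * beta ** 2) - (t + tau - tk1) / (tau * beta)
        rho += a * w[k - 1]
    return rho


def simpson(g: Callable[[float], float], lo: float, hi: float, n: int = 200) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(lo + j * h)
    return total * h / 3


def tau_yield_numeric(t: float, tau: int, r0: float, alpha: float, beta: float, w: List[float]) -> float:
    """(1/tau) * integral of r(s|omega) over [t, t + tau], split at the day boundaries."""
    cuts = [t] + [float(k) for k in range(1, len(w) + 1) if t < k < t + tau] + [t + tau]
    g = lambda s: spot_rate(s, r0, alpha, beta, w)
    return sum(simpson(g, lo, hi) for lo, hi in zip(cuts[:-1], cuts[1:])) / tau


params = dict(r0=0.05, alpha=0.01, beta=-0.3, w=[0.002, -0.001, 0.003, 0.0, 0.001])
print(tau_yield(0.4, 3, **params), tau_yield_numeric(0.4, 3, **params))
```

Agreement between the two functions confirms the four cases of (28) for these data. Quadrature is split at day boundaries because r is only piecewise smooth there.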


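From the above we obtain the following nonlinear semi-infinite programming problem:

$$\begin{aligned}
\min\ & v_\tau\\
\text{s.t.}\ & \frac{e^{\beta(t+\tau)} - e^{\beta t}}{\tau\beta}\, r_0 + \alpha\Big(\frac{e^{\beta(t+\tau)} - e^{\beta t}}{\tau\beta^2} - \frac{1}{\beta}\Big) + \sum_{k=1}^{i+\tau} a_k(\beta, \tau \mid t)\, w_k - v_\tau \le R(t, t+\tau),\\
& R(t, t+\tau) \le \frac{e^{\beta(t+\tau)} - e^{\beta t}}{\tau\beta}\, r_0 + \alpha\Big(\frac{e^{\beta(t+\tau)} - e^{\beta t}}{\tau\beta^2} - \frac{1}{\beta}\Big) + \sum_{k=1}^{i+\tau} a_k(\beta, \tau \mid t)\, w_k + v_\tau,
\end{aligned} \qquad (29)$$

where

$$t \in T_i,\ i = 1,\dots,M;\qquad \beta_* \le \beta \le \beta^*;\qquad r_{0*} \le r_0 \le r_0^*;\qquad \alpha_* \le \alpha \le \alpha^*;$$
$$v_\tau \ge 0;\qquad w_{*i} \le w_i \le w_i^*,\ i = 1,\dots,N.$$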


Let $(\omega^0, v_\tau^0)$ be the solution of problem (29), (26). Substituting $\omega^0$ for ω in (27) yields a formula that we term the τ-estimate of the spot rate:

$$r_\tau(t) = f(t, \omega^0),\qquad t \in T^M. \qquad (30)$$

It follows from (17) that the function

$$\hat P(t, \bar t) = e^{-\int_t^{\bar t} r_\tau(s)\,ds},\qquad t \in [t_0, t_M],\ \bar t \in [t, t_N], \qquad (31)$$

is an estimate of the price at time t of a discount bond maturing at time $\bar t$.

The function $r_\tau(t) = f(t\mid\omega^0)$, while an estimate in the interval $[t_0, t_M]$, becomes the forecast in the future interval (the time after the current time $t_M$), namely, the forecast interval $[t_M, t_N]$, where $N = M + \tau$. If the function $r_\tau(t\mid\omega^0)$ is defined over $t \in [t_M, t_{M'}]$, where $M + \tau \le M'$, then the predicted price of a discount bond maturing at time $T$ is given by

$$ \hat P(t,T) = e^{-\int_t^T r_\tau(s)\,ds}, \qquad t \in [t_M, t_{M'}],\ T \in [t, t_{M'}], \tag{32} $$

and the predicted yield at time $t$ of a discount bond maturing at date $T$ is

$$ \hat R(t,T) = \frac{1}{T-t}\int_t^T r_\tau(s)\,ds, \qquad t \in [t_M, t_{M'}),\ T \in (t, t_{M'}]. \tag{33} $$
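Once a spot-rate estimate is available, (31)-(33) each reduce to one quadrature. A minimal sketch, assuming the estimate is given as a Python callable `r_hat` (a hypothetical stand-in for $r_\tau$) and using Simpson's rule; it also makes visible the identity $\hat P(t,T) = e^{-(T-t)\hat R(t,T)}$ that links price and yield.

```python
import math

def integrate(f, a, b, n=1000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3

def bond_price(r_hat, t, T):
    """Price at t of a discount bond maturing at T, as in (31)/(32)."""
    return math.exp(-integrate(r_hat, t, T))

def bond_yield(r_hat, t, T):
    """Yield at t of a discount bond maturing at T > t, as in (33)."""
    return integrate(r_hat, t, T) / (T - t)

# hypothetical spot-rate curve (per year), maturity two years ahead
r_hat = lambda s: 0.03 + 0.01 * (1.0 - math.exp(-s))
P, R = bond_price(r_hat, 0.0, 2.0), bond_yield(r_hat, 0.0, 2.0)
print(P, R, math.exp(-2.0 * R))   # the first and last values agree
```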


A Nonarbitrage Condition for LDSU 


Analogous to the stochastic case, we construct a portfolio with two bonds with differing maturities, $T_1$ and $T_2$, holding one unit of the $T_1$ maturity and selling $\Delta$ units of the $T_2$ maturity. The value of the portfolio is (see [29, Sect. 17.5])

$$ \Pi(t) = P(t,T_1) - \Delta P(t,T_2). \tag{34} $$

Analogously, we differentiate $\Pi(t)$ with respect to time $t$, recognizing that all of our parameters are independent of time (see also Remark 6). Hence we obtain

$$ \frac{d\Pi(t)}{dt} = \Pi(t)\,f(t,\omega). \tag{35} $$

Equation (35) states that the return on the portfolio equals the risk-free rate, i.e., the spot rate [29, p. 271]. However, this required condition is not our complete measure of nonarbitrage, because our approach depends on actual observations and real data. Let us be more precise.

If at time $T_1$ one dollar is invested in the risk-free market (as the observations of actual data show) and grows to $M$ dollars at time $T_2$, then $M$ must be compared with what the estimated spot rate returns over this period, where we assume $f(s,\omega) \ge 0$ for all $\omega$, i.e., with

$$ e^{\int_{T_1}^{T_2} f(s,\omega)\,ds}. $$

If $M > e^{\int_{T_1}^{T_2} f(s,\omega)\,ds}$, we borrow one dollar at $T_1$ at the spot rate and invest it in the risk-free market during the period $[T_1, T_2]$. At $T_2$ we make a profit of $M - e^{\int_{T_1}^{T_2} f(s,\omega)\,ds}$.

If $M < e^{\int_{T_1}^{T_2} f(s,\omega)\,ds}$, we borrow one dollar in the risk-free market (supported by observed data) and invest it at the spot rate over the period $[T_1, T_2]$. We make a profit of $e^{\int_{T_1}^{T_2} f(s,\omega)\,ds} - M$. This analysis motivates the following condition.


Assumption 8 Let $f(\cdot,\cdot)$ be a nonnegative function specified as in Remark 6. For any arbitrary $T_1$, $T_2$ and $W$ satisfying $T_1 < T_2$, $W > 0$, there exists an $\omega^* \in \Omega$ such that

$$ \int_{T_1}^{T_2} f(s,\omega^*)\,ds = W. \tag{36} $$


This is a necessary condition to guarantee existence of nonarbitrage in the broader sense of using real observational data for estimation by the rules and models we have introduced.
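The two cases that motivate Assumption 8 amount to a simple decision rule. The sketch below is illustrative only: `arbitrage_trade` is a hypothetical helper, the integral of the estimated spot rate is approximated by the trapezoidal rule, and transaction costs are ignored.

```python
import math

def arbitrage_trade(M, f, T1, T2, n=1000):
    """Compare the observed growth M of one dollar over [T1, T2] with exp(int f ds).

    Returns the trade described in the text and its profit per dollar borrowed.
    """
    h = (T2 - T1) / n
    integral = h * ((f(T1) + f(T2)) / 2 + sum(f(T1 + j * h) for j in range(1, n)))
    G = math.exp(integral)          # growth of one dollar at the estimated spot rate
    if M > G:
        return "borrow at spot rate, invest in market", M - G
    if M < G:
        return "borrow in market, invest at spot rate", G - M
    return "no arbitrage", 0.0

f = lambda s: 0.05                  # hypothetical flat spot-rate estimate
print(arbitrage_trade(1.06, f, 0.0, 1.0))
print(arbitrage_trade(1.04, f, 0.0, 1.0))
```

For an observed growth $M > 1$, Assumption 8 with $W = \ln M$ guarantees a parameter value $\omega^*$ for which the two growth factors coincide, so that neither trade is profitable.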

We conclude by presenting one of the figures obtained from a solution of problem (26), (29) with piecewise-constant perturbations for maturity term $\tau$ = 91 days over the last three months of 1995. Using the notation of (29), Fig. 2 is a three-dimensional plot of $R(t, t+\tau) = p(t,\omega\mid\tau)$ having two boundaries, both with slope 1, that have financial interpretations:

• the left-most boundary is $R(t, t + 91\ \text{days})$, 10/1/95 ≤ t ≤ 12/24/95, depicting the estimated yield curve over the observation period;

• the right-most boundary is $R(t, t)$, 10/1/95 ≤ t ≤ 12/24/95, and depicts the unobservable estimated spot rate over the observation period.


Semi-infinite Programming and Applications in Finance, Figure 2
Estimate of the term structure of interest rates using the analog of the Vasicek model with piecewise-constant perturbations for observations of yield to maturity, τ = 3 mo, made during the observation period 1/8/1995 to 31/12/1995
It is interesting to note that in the stochastic case the market price of risk cannot be estimated uniquely in many cases. This result is demonstrated in [23].
The class of general semi-infinite programming problems may be looked upon as a generalization of the class of optimization problems with finitely many variables and constraints, since in a semi-infinite program either the number of variables or the number of constraints (but not both at the same time) may be infinite. In the present paper we will mainly deal with semi-infinite programs with finitely many variables. The main theoretical as well as practical difficulty is that one needs to verify that a proposed optimal solution satisfies infinitely many inequality constraints. In addition, nonlinear problems may have many local minima, and one needs to verify that a calculated optimum is indeed global. We will discuss the concept of computational equivalence, i.e. that for a fixed computer and software one may construct an optimization problem with finitely many variables and constraints which has the same computer representation as the semi-infinite program whose solution is sought. We will deal in detail with the class of linear problems and give some numerical examples illustrating computational equivalence. We will consider one-sided approximation and approximation in the maximum norm. For the application of global optimization to linear semi-infinite programming and an illustration on the air quality control problem, see Semi-infinite Programming: Methods for Linear Problems. The literature on semi-infinite programs and their applications to problems in science and engineering is extensive. For a general introduction to this field, see [8], [10] and [14]. Recent results are to be found, for instance, in [16].


Nonlinear Semi-Infinite Programs 


We will study the following class of problems:

Definition 1 Let $S$ be a fixed set, and let $F\colon \mathbb{R}^n \to \mathbb{R}$ and $G\colon \mathbb{R}^n \times S \to \mathbb{R}$ be two fixed functions. Then the following task will be called a semi-infinite program:

$$ \min F(y) \tag{1} $$

over all $y \in \mathbb{R}^n$ subject to the constraint

$$ G(y,s) \ge 0, \qquad s \in S. \tag{2} $$

Remark 2 The data of the semi-infinite program (1), (2) are hence the index set $S$, the real-valued function $F$ defined on $\mathbb{R}^n$ and the real-valued function $G$ defined on $\mathbb{R}^n \times S$.

Definition 3 Use the notation of Definition 1. Put

$$ Y = \{y \in \mathbb{R}^n : G(y,s) \ge 0,\ s \in S\}, \tag{3} $$

$$ V = \inf_{y \in Y} F(y). \tag{4} $$

If $Y$ is empty, i.e. the condition (2) is inconsistent, then we define $V = \infty$.


We observe that (1) and (2) define a very general class of problems. If we restrict $S$ to be finite, then we get nonlinear optimization problems with finitely many nonlinear constraints. If $F$ and $G$ are linear with respect to $y$, then we get linear programs if $S$ is finite, and linear semi-infinite programs otherwise. If we require $Y$, the set of feasible solutions, to be nonempty and compact and $F$ to be continuous on $Y$, then the existence of optimal solutions is guaranteed. We need to impose further assumptions in order to show that a proposed computational scheme is efficient. When a linear semi-infinite program is to be solved, a common approach is to discretize the problem, i.e. replace the index set $S$ by a finite subset $T$ and then solve numerically the linear program thereby arising, which can always be done with a finite number of arithmetic operations. For nonlinear problems, the numerical solution of the discretized problem is still difficult, since it may have many local extrema, and the verification that a calculated extremum is the global optimum sought is nontrivial. For this reason discretization has not been the main approach for the treatment of nonlinear semi-infinite programs. Instead one has sought binding constraints, worked with penalty methods and settled for the calculation of a local optimum. See [3,4,6,7,13,14,15] and [16]. Quasirandom methods have also been used for estimating the global optimum. See, e.g., [5].


Computationally Equivalent Semi-Infinite Programs


As is well known, a computer can store only a finite number of symbols. Symbol-manipulating languages like Maple and Mathematica may perform relatively complicated operations exactly by means of formula manipulations, but their capacity is limited to a certain extent. This is illustrated by their treatment of operations on rational numbers, which may be represented exactly as pairs of integers. However, if one tries to solve a linear system of equations with rational coefficients exactly, one finds that the available storage capacity is exceeded already for relatively small systems. We assume from now on that we have a language like Fortran which may manipulate integers exactly, provided that their magnitude is limited by a bound which is known but depends on the computer and the software used. Arithmetic operations on real numbers may be performed with high accuracy, but not exactly, and bounds for the errors may be derived. Since the storage of the computer is limited, the set $\mathbb{R}$ of real numbers must be represented by a finite subset, the computer numbers. Therefore the set $\mathbb{R}$ may be split into a finite number of subsets whose elements have identical representations in the computer. Two reals which have the same computer representation are considered computationally equivalent, since it is not possible to distinguish between them by means of computational operations. As a consequence, the index set $S$ in (2) must be represented by a finite subset $T$ of computer numbers. We introduce


Definition 4 The program

$$ \min F^*(y) \tag{5} $$

over all $y \in \mathbb{R}^n$ subject to the constraint

$$ G^*(y,s) \ge 0, \qquad s \in S, \tag{6} $$

is said to be computationally equivalent to the program (1), (2) if there is a compact set $Y^* \subset \mathbb{R}^n$ which contains all the feasible solutions of both programs such that:

• $F^*(y)$ and $F(y)$ are computationally equivalent on $Y^*$;

• $G^*(y,s)$ and $G(y,s)$ are computationally equivalent on $Y^* \times S$.
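In IEEE 754 double precision, which CPython's `float` uses on common platforms, the partition of $\mathbb{R}$ into classes of computationally equivalent reals can be observed directly. This minimal sketch uses exact rationals from the standard `fractions` module as stand-ins for reals; `computationally_equivalent` is a hypothetical helper name.

```python
from fractions import Fraction

def computationally_equivalent(a: Fraction, b: Fraction) -> bool:
    """Two exact reals are indistinguishable if they round to the same double."""
    return float(a) == float(b)

a = Fraction(1, 10)
b = a + Fraction(1, 10**20)      # a different real number ...
c = a + Fraction(1, 10**15)      # ... and one that is further away
print(a == b, computationally_equivalent(a, b))   # False True
print(computationally_equivalent(a, c))           # False
```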


Remark 5 The requirement that the set of feasible solutions should be compact may seem awkward, since a semi-infinite program may have an unbounded feasible set even if the set of optimal solutions is bounded. Therefore one considers regularized semi-infinite programs which have the constraint $\|y\| \le M$, where the positive number $M$ is chosen so large that this constraint is not binding for at least one optimal solution.


The next issue is to construct an optimization problem with finitely many constraints whose set of feasible solutions is computationally equivalent to that of (1), (2). We will outline the procedure for doing this, which is presented in [1,11,12]. We next introduce


Definition 6 Use the notations and definitions of (1) and (2). Let $T \subset S$ be a finite set with $N$ points: $T = \{t_1,\dots,t_N\}$.

Let further $w_1,\dots,w_N$ be $N$ continuous functions defined on $S$ and having the properties:

$$ w_j(s) \ge 0,\quad s \in S,\ j = 1,\dots,N; \qquad \sum_{j=1}^{N} w_j(s) = 1,\quad s \in S; \qquad w_j(t_i) = \delta_{ij},\quad i,j = 1,\dots,N. \tag{7} $$

Let $f$ be a continuous mapping defined on $S$ such that $f(s)$ is either a real number or a vector in $\mathbb{R}^n$. We now define the nonnegative interpolatory operator based on $T \subset S$ by

$$ (Lf)(s) = \sum_{j=1}^{N} w_j(s)\,f(t_j). $$
Remark 7 We note that $L$ is a linear operator having $f$ as argument. Also note that $(Lf)(s) = f(s)$ for all $s \in T$.
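A standard family satisfying (7) on an interval $S$ is the set of piecewise-linear "hat" functions on a sorted grid, for which $L$ is ordinary linear interpolation (the choice used in Example 9). The sketch below assumes that $S$ is an interval whose endpoints are the first and last grid points; `hat_weights` and `interpolate` are hypothetical names.

```python
import bisect
import math

def hat_weights(grid, s):
    """Values w_1(s), ..., w_N(s) of the hat functions on a sorted grid t_1 < ... < t_N."""
    w = [0.0] * len(grid)
    if s <= grid[0]:
        w[0] = 1.0
    elif s >= grid[-1]:
        w[-1] = 1.0
    else:
        j = bisect.bisect_right(grid, s) - 1        # grid[j] <= s < grid[j + 1]
        lam = (s - grid[j]) / (grid[j + 1] - grid[j])
        w[j], w[j + 1] = 1.0 - lam, lam             # nonnegative, summing to 1
    return w

def interpolate(grid, f, s):
    """(Lf)(s) = sum_j w_j(s) f(t_j)."""
    return sum(wj * f(tj) for wj, tj in zip(hat_weights(grid, s), grid))

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(interpolate(grid, lambda s: 2.0 + 3.0 * s, 0.3))        # linear f is reproduced
print(interpolate(grid, lambda s: math.sqrt(1.0 + s), 0.3))   # approximates sqrt(1.3)
```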


Remark 8 In the case of linear semi-infinite programming one may prove that if we replace the functions defined on $S$ with the outputs of the mapping $L$, then this semi-infinite program has the same feasible set as the linear program obtained by replacing the index set $S$ by the grid $T$ used for defining $L$. See, e.g., [8] or [12].


Example 9 We consider the simple linear semi-infinite program

$$ \min_{y_1,y_2}\; y_1 + \frac{y_2}{2} $$

subject to the constraints

$$ y_1 + y_2 s \ge \sqrt{1+s}, \qquad 0 \le s \le 1. $$

One finds directly that the optimal $y_1$, $y_2$ are defined by the condition that the straight line $y_1 + y_2 s$ should be the tangent to the curve $\sqrt{1+s}$ at the point $s = 1/2$: the objective $y_1 + y_2/2$ is the value of the line at $s = 1/2$, every feasible line lies on or above the curve there, and the tangent line attains this bound because the curve is concave. Assume that we use a computer with working relative accuracy $1.0 \cdot 10^{-8}$. We define here $L$ as the result of equidistant linear interpolation with stepsize $h$. We notice that the functions $a_1(s) = 1$ and $a_2(s) = s$ are interpolated exactly, and one may verify that the maximal relative interpolation error for $\sqrt{1+s}$ is bounded by $h^2/32$. Therefore, if we take $h = 5 \cdot 10^{-4}$, the discretized linear program becomes computationally equivalent to the original one. In this case $T$ has 2001 elements.
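The numbers in Example 9 are easy to reproduce. The sketch below assumes IEEE double arithmetic and takes the tangent point $s = 1/2$ from the argument above; it checks feasibility of the tangent line on the grid $T$, the interpolation error bound $h^2/32$ (sampled at interval midpoints), and the size of $T$. Since $s = 1/2$ belongs to $T$, the same bound argument shows that the tangent line is also optimal for the discretized program.

```python
import math

g = lambda s: math.sqrt(1.0 + s)

# tangent to sqrt(1 + s) at s = 1/2; the objective y1 + y2/2 is the line's value there
y2 = 1.0 / (2.0 * math.sqrt(1.5))
y1 = g(0.5) - 0.5 * y2
objective = y1 + y2 / 2

h = 5e-4
grid = [k * h for k in range(round(1 / h) + 1)]          # the set T
feasible = all(y1 + y2 * s >= g(s) - 1e-12 for s in grid)

# relative error of linear interpolation, sampled at the interval midpoints
max_rel_err = max(
    abs(g((a + b) / 2) - (g(a) + g(b)) / 2) / g((a + b) / 2)
    for a, b in zip(grid, grid[1:])
)
print(len(grid), feasible, objective, max_rel_err, h * h / 32)
```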


Approximation in the Uniform Norm 


We use the same definitions as in (1) and (2). Let $f$ be a function which is continuous on $S$. We define its maximum norm

$$ \|f\| = \max_{s \in S} |f(s)|. $$

Next we consider the problem to determine

$$ \min_{y}\; \max_{s \in S} |G(y,s)|. $$

An equivalent formulation of this task is given by the semi-infinite program

$$ \min\; y_0 $$

subject to the constraints

$$ G(y,s) + y_0 \ge 0, \qquad -G(y,s) + y_0 \ge 0, \qquad s \in S. $$

In this last problem, the variables are the real $y_0$ and the vector $y \in \mathbb{R}^n$. At each point $s \in S$ two inequality conditions are to be satisfied. To calculate the optimal value corresponding to a certain $y$ one needs to perform a global optimization over $S$. We note that the problem is consistent: for each $y$, taking $y_0 = \max_{s \in S}|G(y,s)|$ gives a feasible point. If we discretize the problem, replacing $S$ by a finite subset $T$, this may be interpreted as approximating over $T$.
If $G$ can be written as

$$ G(y,s) = a(s)^{T} y - b(s), \tag{8} $$

then we have a linear approximation problem and seek to approximate $b$ by a linear combination of $a_1,\dots,a_n$, which are real-valued functions on $S$. The corresponding discretized problem becomes a linear program which may be solved by means of the simplex algorithm. See, e.g., [8]. Due to the special structure of these problems, the exchange algorithms by Remez (see [2]) are applicable. If these latter algorithms are used for the original problem, one needs to perform a global optimization in each exchange step. The three-phase algorithm described in [8,9] and [10] may be adapted to the uniform approximation problem as well. The idea is to seek $q$ points $\{s_1,\dots,s_q\} \subset S$ at which the absolute value of $G(y,s)$ achieves its maximum $y_0$. One needs to keep track of the sign of the deviation and of whether the extreme value is achieved at a boundary point of $S$ or in the interior. The number of local extrema as well as their approximate positions may be obtained from a discretized version of the problem in the case of (8), i.e. the linear approximation problem. In the nonlinear case they are generally found by other means. See, e.g., papers in [13] and [16]. We illustrate this with an example.


Example 10 Determine the straight line

$$ y_1 + y_2 s $$

which approximates the function $\exp s$ best in the maximum norm over the real interval $[0,1]$. Thus we seek the solution to the problem

$$ \min_{y_1,y_2}\; \max_{0 \le s \le 1} \left| e^{s} - y_1 - y_2 s \right|. $$

It is easy to verify, e.g., from a simple graph, that the maximum deviation in absolute value occurs at three points $s_1$, $s_2$, $s_3$ and, further, that $s_1 = 0$, $s_3 = 1$. Thus these points are at the boundary of $S$, and $s_2$ is in the interior. Hence we get a condition on the derivative of the pointwise deviation, considered as a function of $s$, at $s_2$. Let $y_0$ be the maximal absolute value of the deviation. Then we obtain the following nonlinear system of equations:

$$ \begin{aligned} e^{s_i} - y_1 - y_2 s_i &= y_0, \quad i = 1,3,\\ -e^{s_2} + y_1 + y_2 s_2 &= y_0,\\ e^{s_2} - y_2 &= 0,\\ s_1 = 0,\quad s_3 &= 1. \end{aligned} $$


This system may be solved by means of Newton's method. Here it was easy to guess the form of the nonlinear system giving the optimal solution. The fact that the infinitely many constraints are satisfied, namely that the absolute value of the deviation at each point is less than or equal to $y_0$, may be verified analytically. In a more general context, the nonlinear system giving the optimal solution may be constructed from a discretized version of the approximation problem. Still, the verification that the optimal approximation has been found may be nontrivial, even in the case of linear approximation problems. In this situation, quasirandom methods may be contemplated. Experiments with this approach will be reported elsewhere.
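The nonlinear system of Example 10 is solved by a few Newton steps. The sketch below is a minimal pure-Python illustration (a small Gaussian elimination stands in for a library solver; all names are hypothetical). The final grid check is only a numerical sanity check of the infinitely many constraints, not the analytical verification mentioned above. Eliminating the variables by hand gives $y_2 = e - 1$, $s_2 = \ln(e-1)$ and $y_0 \approx 0.10593$.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def F(z):
    """Residuals of the system of Example 10 with s1 = 0, s3 = 1 substituted."""
    y0, y1, y2, s2 = z
    return [1.0 - y1 - y0,                        # deviation at s1 = 0 equals  y0
            math.e - y1 - y2 - y0,                # deviation at s3 = 1 equals  y0
            -math.exp(s2) + y1 + y2 * s2 - y0,    # deviation at s2 equals -y0
            math.exp(s2) - y2]                    # derivative of deviation is 0 at s2

def DF(z):
    """Jacobian of F with respect to (y0, y1, y2, s2)."""
    y0, y1, y2, s2 = z
    return [[-1.0, -1.0, 0.0, 0.0],
            [-1.0, -1.0, -1.0, 0.0],
            [-1.0, 1.0, s2, y2 - math.exp(s2)],
            [0.0, 0.0, -1.0, math.exp(s2)]]

z = [0.1, 0.9, 1.7, 0.5]                          # rough guess read off a graph
for _ in range(50):
    dz = solve(DF(z), [-r for r in F(z)])
    z = [zi + di for zi, di in zip(z, dz)]
    if max(abs(d) for d in dz) < 1e-14:
        break
y0, y1, y2, s2 = z

# numerical sanity check (not a proof): |e^s - y1 - y2 s| <= y0 on a fine grid
max_dev = max(abs(math.exp(k / 10**4) - y1 - y2 * k / 10**4) for k in range(10**4 + 1))
print(y0, y1, y2, s2, max_dev)
```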
We have developed over the last twenty years an approach for the study of optimal control and variational problems based on the consideration of measure spaces; see [11,12] and the references there. In many ways this work has been inspired by the work of L.C. Young [19] which, starting in the period between the Wars, opened up a new vista for mathematics; concepts associated with distributions and chains, for instance, are descendants of his work, and so is our contribution.


Integral Relationships 


The applications of measure theory to optimization problems are based on the identification of linear functionals with a class of Radon measures, by Riesz' theorem [11]. We note that this theorem only applies if the underlying space is compact; this will cause trouble when considering unbounded sets of controls, and the handling of infinities there will be done by means of Loeb measures in the setting of nonstandard analysis.

In order to use Riesz’ theorem we need to rewrite 
the relationships associated with optimization prob- 
lems in the form of linear functionals on appropriate 
function spaces. We show how to do this in two specific 
cases. 

We consider first a finite-dimensional control problem, to be referred to as problem P1. Let $x$, $u$ be vectors in the Euclidean spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, $t$ a real variable, $J := [t_0, t_f]$ with $t_0 < t_f$, $A$ a compact subset of $\mathbb{R}^n$, $x_0$, $x_f$ points in $A$, and $U$ a bounded, closed subset of $\mathbb{R}^m$. Further, let $\Omega_1 := J \times A \times U$, and let $g\colon \Omega_1 \to \mathbb{R}^n$ be a continuous function. The control function $t \in J \mapsto u(t) \in U$ is Lebesgue-measurable, and the trajectory function $t \in J \mapsto x(t) \in A$ is the (absolutely continuous) solution of

$$ \dot x(t) = g(t, x(t), u(t)), \qquad t \in J, \tag{1} $$

the differential equation describing the system to be controlled. We assume that the class $\mathcal{F}_1$ of all admissible trajectory-control pairs $p := [x(\cdot), u(\cdot)]$ is nonempty, and seek to minimize the functional $I\colon \mathcal{F}_1 \to \mathbb{R}$,

$$ I(p) = \int_{t_0}^{t_f} f_{01}(t, x(t), u(t))\,dt, \tag{2} $$

for $p \in \mathcal{F}_1$. Here $f_{01}$ is a continuous function defined on $\Omega_1$.

We develop now some equalities that are satisfied by the admissible pairs. Let $B$ be an open ball in $\mathbb{R}^{n+1}$ containing $J \times A$; we denote by $C'(B)$ the space of all real-valued functions on $B$ that are uniformly continuous on $B$ together with their first derivatives. Let $\phi \in C'(B)$; define

$$ \phi^g(t,x,u) := \phi_x(t,x)\,g(t,x,u) + \phi_t(t,x) \tag{3} $$

for all $(t,x,u) \in \Omega_1$. Of course, $\phi^g \in C(\Omega_1)$. If $p = [x(\cdot), u(\cdot)]$ is an admissible pair, then, since $\phi^g(t, x(t), u(t)) = \frac{d}{dt}\phi(t, x(t))$ along the pair,

$$ \int_J \phi^g(t, x(t), u(t))\,dt = \phi(t_f, x_f) - \phi(t_0, x_0) = \Delta\phi, \tag{4} $$

for all $\phi \in C'(B)$. There are two special cases which are of interest. In the first we put $\phi(t,x) := x_j \psi(t)$, with $1 \le j \le n$ and $\psi \in \mathcal{D}(J^\circ)$; see [11]. Then, putting $\psi^j(t,x,u) := x_j \psi'(t) + g_j(t,x,u)\,\psi(t)$, for $1 \le j \le n$ and $\psi \in \mathcal{D}(J^\circ)$, the equality (4) becomes $\int_J \psi^j(t, x(t), u(t))\,dt = 0$, since the (test) functions in $\mathcal{D}(J^\circ)$ are zero at the boundary of $J$. The second case of interest happens when the function $\phi$ is chosen as a differentiable function of the time $t$ only, $\phi(t,x,u) := \theta(t)$, $(t,x,u) \in \Omega_1$; then $\phi^g(t,x,u) = \theta'(t)$, $(t,x,u) \in \Omega_1$. We introduce a subspace of $C(\Omega_1)$, to be denoted by $C_1(\Omega_1)$, consisting of those functions which depend only on the first variable $t$; then the equalities (4) become $\int_J h(t, x(t), u(t))\,dt = a_h$, $h \in C_1(\Omega_1)$, with $a_h$ the Lebesgue integral of $h(\cdot, x, u)$ over $J$, independent of $x$ and $u$. We will now choose for each of these spaces countable sets of functions whose linear combinations are dense in the corresponding spaces in the appropriate topologies. Thus we obtain countable sets of equalities.
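Identity (4) and its first special case can be checked numerically for any admissible pair. The sketch below uses a hypothetical one-dimensional instance of P1 ($n = 1$, $J = [0,1]$, dynamics $g(t,x,u) = u$, a chosen control and a polynomial $\phi$); the test function $\psi(t) = \sin 2\pi t$ vanishes at both ends of $J$, which is the property the argument uses. All names are illustrative only.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3

# hypothetical instance of P1: n = 1, J = [0, 1], g(t, x, u) = u
u = lambda t: 1.0 + math.cos(math.pi * t)             # a chosen control
x = lambda t: t + math.sin(math.pi * t) / math.pi     # its trajectory, x(0) = 0, x(1) = 1

phi = lambda t, xx: t * xx ** 2 + xx                  # a polynomial phi(t, x)
phi_g = lambda t, xx, uu: (2 * t * xx + 1) * uu + xx ** 2   # phi_x * g + phi_t, as in (3)

lhs = simpson(lambda t: phi_g(t, x(t), u(t)), 0.0, 1.0)
delta_phi = phi(1.0, x(1.0)) - phi(0.0, x(0.0))       # right-hand side of (4)

# first special case: phi = x * psi(t) with psi vanishing at both ends of J
psi = lambda t: math.sin(2 * math.pi * t)
dpsi = lambda t: 2 * math.pi * math.cos(2 * math.pi * t)
lhs_psi = simpson(lambda t: x(t) * dpsi(t) + u(t) * psi(t), 0.0, 1.0)
print(lhs, delta_phi, lhs_psi)
```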

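For the space $C'(B)$ we shall choose $\{\phi_i\}$, a set of polynomials in $(t, x_1, \dots, x_n)$; for $\mathcal{D}(J^\circ)$, $\{\psi_j\}$, the sequence of functions of the type $\psi^j$ when the functions $\psi$ are the sine and cosine functions of $2\pi r (t - t_0)/\Delta t$, such as

$$ \sin\left(2\pi r\,\frac{t - t_0}{\Delta t}\right), \qquad \Delta t := t_f - t_0, $$

$r = 1, 2, \dots$, $j = 1, \dots, n$; and for $C_1(\Omega_1)$ the sequence $\{h_k\}$, a set of polynomials in $t$. Thus, finally, we obtain our set of integral equalities:

$$ \begin{aligned} \int_J \phi_i^g(t, x(t), u(t))\,dt &= \Delta\phi_i, & i &= 1,2,\dots,\\ \int_J \psi_j(t, x(t), u(t))\,dt &= 0, & j &= 1,2,\dots,\\ \int_J h_k(t, x(t), u(t))\,dt &= a_{h_k}, & k &= 1,2,\dots \end{aligned} \tag{5} $$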
J 


We will consider now an optimal control problem, to be denoted by P2, associated with a nonlinear diffusion equation. We follow the notation in [7] and [13]; see also [4], where we study a nonlinear wave equation. Take $D$, a bounded domain in $\mathbb{R}^n$ with smooth boundary $\partial D$, and $T$, a positive real number, and define $Q_T := D \times (0,T)$; $\Gamma_T := \partial D \times (0,T)$; $D_0 := D \times \{0\}$; $D_T := D \times \{T\}$.

We also choose some functions: $k\colon \bar Q_T \to \mathbb{R}$, $k \in K$; $f\colon \mathbb{R} \times \mathbb{R}^n \times Q_T \to \mathbb{R}$, $f \in C(\mathbb{R} \times \mathbb{R}^n \times Q_T)$.

Consider the nonlinear diffusion equation

$$ u_t(x,t) - \operatorname{div}\bigl(k(x,t)\,\nabla u(x,t)\bigr) = f\bigl(u(x,t), \nabla u(x,t), x, t\bigr) \tag{6} $$

for $(x,t) \in Q_T$, with the initial condition $u(x,0) = 0$, $x \in D$, and the boundary condition $\nabla u \cdot n|_{\Gamma_T} = v$; here $n$ is the outward normal, and the function $(s,t) \in \Gamma_T \mapsto v(s,t) \in V \subset \mathbb{R}$ is the control function, taking values in a bounded control set $V$.

A pair $(u, v)$ of trajectory function $u$ and control function $v$ is said to be admissible if:

i) The function $(x,t) \mapsto u(x,t)$ is a classical solution of (6), that is, it is in $C^{2,1}(Q_T) \cap C(Q_T \cup \Gamma_T \cup D_0)$ and satisfies (6) together with the initial and boundary conditions above.

ii) The control function is continuous in $\Gamma_T$.

iii) The terminal relationship $u(\cdot, T) = g$ is satisfied; $g$ is a given continuous function on $D_T$.

The set of admissible pairs for this problem will be denoted by $\mathcal{F}_2$, and assumed to be nonempty. Since the control set $V$ is bounded, there are bounded sets $A \subset \mathbb{R}$ and $B \subset \mathbb{R}^n$ such that $u(x,t) \in A$, $\nabla u(x,t) \in B$ for all $(x,t) \in Q_T$. Since there are many such sets $A$ and $B$, we choose from those the minimum sets, that is, either the intersections $\bigcap A$ and $\bigcap B$ of all sets satisfying the relations above, or subsets of them. Thus, every point in our set $A$ (respectively, $B$) will be a state (respectively, a gradient of a state) which can be reached by an admissible control inside the time interval $[0, T]$.

The optimization problem associated with this equation is as follows. Let $f_{02}$, $f_1$ be nonnegative, Lipschitz-continuous real-valued functions on $\mathbb{R}^{2n+2}$ and $\mathbb{R}^{n+1}$, respectively. Then we wish to find a minimizing pair $(u, v)$ of admissible trajectory $u$ and control $v$ for the functional

$$ J(u,v) = \int_{Q_T} f_{02}\bigl(u(x,t), \nabla u(x,t), x, t\bigr)\,dx\,dt + \int_{\Gamma_T} f_1\bigl(v(s,t), s, t\bigr)\,ds\,dt. \tag{7} $$


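We transform now this problem. Let $\psi$ be in $K$. Then one can show [7] that the classical solution of (6), if it exists, satisfies the integral relation

$$ \int_{Q_T} \bigl[u\,\psi_t - k\,\nabla u\cdot\nabla\psi + f\psi\bigr]\,dx\,dt = -\int_{\Gamma_T} k\,\psi\,v\,ds\,dt + \int_{D_T} g\,\psi\,dx, \tag{8} $$

for all $\psi \in K$. To these equalities we must add the equivalent of the last set of equations in (5). If a function $\xi\colon A \times B \times Q_T \to \mathbb{R}$ depends only on $(x,t)$, then $\int_{Q_T} \xi\,dx\,dt = a_\xi$, the Lebesgue integral of $\xi$ over $Q_T$. Also, if a function $\zeta\colon V \times \Gamma_T \to \mathbb{R}$ depends only on $(s,t)$, then $\int_{\Gamma_T} \zeta\,ds\,dt = b_\zeta$, the Lebesgue integral of $\zeta$ over $\Gamma_T$. By choosing countable dense sets as before, $\{\psi_i\}$, $\{\xi_j\}$, $\{\zeta_k\}$, we can obtain a countable set of equalities associated with problem P2:

$$ \begin{aligned} \int_{Q_T} \bigl[u\,\psi_{i,t} - k\,\nabla u\cdot\nabla\psi_i + f\psi_i\bigr]\,dx\,dt &= -\int_{\Gamma_T} k\,\psi_i\,v\,ds\,dt + \int_{D_T} g\,\psi_i\,dx, & i &= 1,2,\dots,\\ \int_{Q_T} \xi_j\,dx\,dt &= a_{\xi_j}, & j &= 1,2,\dots,\\ \int_{\Gamma_T} \zeta_k\,ds\,dt &= b_{\zeta_k}, & k &= 1,2,\dots \end{aligned} \tag{9} $$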
Metamorphosis 


We proceed to transform the control problems defined above; instead of minimizing over sets of admissible trajectory-control pairs, we find that it is possible to minimize over a measure space, in the case of problem P1, and over the product of two measure spaces, in that of problem P2. In general, the minimization of the functionals (2) over the set $\mathcal{F}_1$ and (7) over the set $\mathcal{F}_2$ is not possible: the infima are not attained, and it is not possible, for instance, to write necessary conditions for these problems. We proceed then to transform them.

The advantages of the new formulations are: 

i) An automatic existence theory: there always are 
minimizers for our measure-theoretical problems; 

ii) The new problems are linear, and then one can use 
the whole paraphernalia of linear analysis; 

iii) Also, our minimization is global: the value reached, 
say, numerically is close to what one could reason- 
ably call the global infimum of each problem. 

The price to pay for these advantages is that the final state is reached only asymptotically, that is, as the number of (linear) constraints associated with the measure-theoretical problem tends to infinity.
Let us consider first problem P1. An admissible pair $p := [x(\cdot), u(\cdot)]$ defines a linear, bounded, positive functional

$$ p\colon F \mapsto \int_J F(t, x(t), u(t))\,dt \in \mathbb{R} $$

in the space $C(\Omega_1)$ of continuous real-valued functions $F$, with $\Omega_1 := J \times A \times U$. By Riesz' theorem, the admissible pair $p$ defines a Radon measure $\mu$ on $\Omega_1$ so that (2) becomes

$$ I(\mu) = \mu(f_{01}), \tag{10} $$

while (5) becomes

$$ \begin{aligned} \mu(\phi_i^g) &= \Delta\phi_i, & i &= 1,2,\dots,\\ \mu(\psi_j) &= 0, & j &= 1,2,\dots,\\ \mu(h_k) &= a_{h_k}, & k &= 1,2,\dots, \end{aligned} \tag{11} $$

where we have written $\mu(F) = \int_{\Omega_1} F\,d\mu$. Note that we have not achieved anything new so far; the minimization of the functional (2) over (5) is exactly equivalent to that of the functional $\mu(f_{01})$ over (11). We shall consider below the extension of our problem, the minimization of (10) over the set $S_1$ of all measures $\mu$ in $\mathcal{M}^+(\Omega_1)$ satisfying (11).

In the case of problem P2, we can proceed similarly. A solution of (6) defines a linear, bounded, positive functional

$$ u(\cdot,\cdot)\colon F \mapsto \int_{Q_T} F\bigl(u(x,t), \nabla u(x,t), x, t\bigr)\,dx\,dt $$

in the space $C(\Omega_2)$ of continuous real-valued functions $F$, with $\Omega_2 := A \times B \times Q_T$. Also, a control $v$ defines a linear, bounded, positive functional

$$ v(\cdot,\cdot)\colon G \mapsto \int_{\Gamma_T} G\bigl(v(s,t), s, t\bigr)\,ds\,dt $$

in the space $C(\omega)$ of continuous functions $G$, with $\omega := V \times \Gamma_T$.

By Riesz' theorem, an admissible pair $(u, v)$ defines two Radon measures $\lambda$ and $\nu$, the first on $\Omega_2$, the second on $\omega$, so that (9) becomes

$$ \int_{\Omega_2} F_i\,d\lambda + \int_{\omega} G_i\,d\nu = \int_{D_T} g\,\psi_i\,dx =: a_i, \tag{12} $$

where

$$ \begin{aligned} F_i(u, w, x, t) &= u\,\psi_{i,t}(x,t) - k(x,t)\,w\cdot\nabla\psi_i(x,t) + f(u, w, x, t)\,\psi_i(x,t),\\ G_i(v, s, t) &= k(s,t)\,\psi_i(s,t)\,v. \end{aligned} $$

Thus, the minimization of the functional (7) over $\mathcal{F}_2$ is equivalent to the minimization of

$$ I(\lambda, \nu) = \lambda(f_{02}) + \nu(f_1), \tag{13} $$

where we have written, as above, $\lambda(f)$ for $\int_{\Omega_2} f\,d\lambda$ and $\nu(g)$ for $\int_\omega g\,d\nu$, over the set of measures $(\lambda, \nu)$ corresponding to admissible pairs, which satisfy

$$ \begin{aligned} \lambda(F_i) + \nu(G_i) &= a_i, & i &= 1,2,\dots,\\ \lambda(\xi_j) &= a_{\xi_j}, & j &= 1,2,\dots,\\ \nu(\zeta_k) &= b_{\zeta_k}, & k &= 1,2,\dots \end{aligned} \tag{14} $$

Again, we have not achieved anything new. As in the case of P1, we shall consider the extension of our problem, the minimization of (13) over the set $S_2$ of all pairs of measures $(\lambda, \nu)$ in $\mathcal{M}^+(\Omega_2) \times \mathcal{M}^+(\omega)$ satisfying (14).

We have developed, therefore, infinite- 
dimensional linear programming problems, the min- 
imization of linear forms (10) or (13) over sets S; and 
Sp, respectively, of positive Radon measures satisfying 
countably-infinite sets of linear equalities, (11) or (14). 
In the next section we consider the two main problems 
associated with their usefulness: do they have solutions; 
and how do the solutions help us solve our optimization 
problems. 


We start by choosing finite, but variable, subsets of equalities from the sets (11) and (14); in this way we consider two semi-infinite linear programming problems, with a finite number of equalities, taking place in infinite-dimensional spaces. The first, to be denoted by $LS1 = LS1(M_1, M_2, M_3)$, consists in minimizing the linear functional (10) over the subset $S_1(M_1, M_2, M_3)$ of $\mathcal{M}^+(\Omega_1)$ defined by the equalities

$$ \begin{aligned} \mu(\phi_i^g) &= \Delta\phi_i, & i &= 1,\dots,M_1,\\ \mu(\psi_j) &= 0, & j &= 1,\dots,M_2,\\ \mu(h_k) &= a_{h_k}, & k &= 1,\dots,M_3, \end{aligned} \tag{15} $$

while the second, to be denoted by $LS2 = LS2(N_1, N_2, N_3)$, consists in minimizing the linear functional (13) over the subset $S_2(N_1, N_2, N_3)$ of $\mathcal{M}^+(\Omega_2) \times \mathcal{M}^+(\omega)$ defined by

$$ \begin{aligned} \lambda(F_i) + \nu(G_i) &= a_i, & i &= 1,\dots,N_1,\\ \lambda(\xi_j) &= a_{\xi_j}, & j &= 1,\dots,N_2,\\ \nu(\zeta_k) &= b_{\zeta_k}, & k &= 1,\dots,N_3. \end{aligned} \tag{16} $$


We can prove that our minimization is global [11,12].

Proposition 1
i) As $M_1, M_2, M_3 \to \infty$,

$$ \inf_{S_1(M_1,M_2,M_3)} \mu(f_{01}) \to \inf_{S_1} \mu(f_{01}) \le \inf_{\mathcal{F}_1} I. \tag{17} $$

ii) As $N_1, N_2, N_3 \to \infty$,

$$ \inf_{S_2(N_1,N_2,N_3)} \bigl[\lambda(f_{02}) + \nu(f_1)\bigr] \to \inf_{S_2} \bigl[\lambda(f_{02}) + \nu(f_1)\bigr] \le \inf_{\mathcal{F}_2} J. \tag{18} $$

Thus, we can approach the global infima by taking a large enough number of equalities. The fact that the global infima can be strictly less than the classical ones is discussed in [11].

There are two aspects of these semi-infinite linear programming problems which are of interest to us: their characteristics, such as existence and characterization of solutions, and the relationship of these solutions to the original optimization problems. We examine first the linear programs themselves. The conclusions of the following proposition follow from the weak*-compactness of the sets of measures and Rosenbloom's theorem [10]. We denote by $\delta(z)$ the atomic measure with support $\{z\}$.


Proposition 2

i) The linear programs LS1 and LS2 defined by (10)-(15) and (13)-(16), respectively, have minimizers.

ii) The solution of the program LS1, (10)-(15), has the form

$$ \mu_{\mathrm{opt}} = \sum_{i=1}^{M} \alpha_i\,\delta(t_i, x_i, u_i), $$

with $\alpha_i \ge 0$, $M := M_1 + M_2 + M_3$.

iii) The solution of the program LS2, (13)-(16), has the form

$$ (\lambda_{\mathrm{opt}}, \nu_{\mathrm{opt}}) = \Bigl(\sum_{i=1}^{N} \alpha_i\,\delta(u_i, w_i, x_i, t_i),\ \sum_{i=1}^{N} \beta_i\,\delta(v_i, s_i, t_i)\Bigr), \tag{19} $$

with $\alpha_i, \beta_i \ge 0$, $N := N_1 + N_2 + N_3$.

In part iii) of this Proposition we have identified $(\lambda, \nu)$ with the product $\lambda \times \nu$; other possibilities exist [12].
We study now the other aspect of these solutions: how useful they are. How do we construct suboptimal pairs of trajectories and controls for our functionals, once we have solved the linear programming problems?

We start with the problem LS1, (10)-(15), and shall proceed in several steps:

1) We obtain a solution $\mu_{\mathrm{opt}}$ of the form given above in ii).

2) We obtain a weak*-approximation to this measure by a pair of piecewise-constant functions $(x, u)$. The exact procedure for this construction, to be explained in detail below, involves the use of the support points of $\mu_{\mathrm{opt}}$ as well as the coefficients $\alpha_i$.

3) We use the function $u$ as control in (1) with $x(t_0) = x_0$ and obtain a trajectory $x_a(\cdot)$. The pair $p_a := (x_a, u)$ can be shown to have the following properties:

a) By taking the numbers $M_1$, $M_2$, $M_3$ sufficiently large, one can make $I(p_a)$ as near as desired to $\inf_{S_1} \mu(f_{01})$.

b) The final state $x_a(t_f)$ tends to $x_f$ as $M_1, M_2, M_3 \to \infty$.

c) The constraint $x_a(t) \in A$ tends to be satisfied in a similar manner.
In the case of problem LS2, (13)-(16), we proceed in a similar fashion.

1) Firstly we obtain optimal measures $(\lambda_{\mathrm{opt}}, \nu_{\mathrm{opt}})$ for this problem.

2) We then obtain a weak*-approximation to $(\lambda_{\mathrm{opt}}, \nu_{\mathrm{opt}})$ by a pair of piecewise-constant functions $(u, v)$ by means of results involving weak*-density.

3) The control function $v$ obtained above is in $L_2(\Gamma_T)$, that is, for each $t \in (0, T)$, $v(\cdot, t) \in L_2(\partial D)$, since it is piecewise constant and the set $Q_T$ is bounded. It can then serve as boundary function for a (necessarily weak) solution of the system (6), to be denoted by $u_a$. This solution will be in $H^1(Q_T)$.

4) Borrowing the term from H. Rudolph [17], we shall call the pair $(u_a, v)$ of trajectory and control functions asymptotically admissible if:

a) The control function $v \in L_2(\Gamma_T)$, and $v(s,t) \in V$.

b) The trajectory $u_a$ is the weak solution of (6) corresponding to the control $v \in L_2(\Gamma_T)$.

c) The trajectory function satisfies the appropriate constraints.

d) The final value $u_a(\cdot, T)$ of the trajectory function tends in $L_2(D_T)$ to the prescribed function $g$ as $N_1, N_2, N_3 \to \infty$.

5) If the numbers $N_1$, $N_2$, $N_3$ are sufficiently large, and the weak*-approximation in step 2) above is sufficiently good, then the value $J(u_a, v)$ of the functional $J$ at the pair $(u_a, v)$ is close to $\inf_{S_2} I$, and thus $(u_a, v)$ is a good suboptimal pair. Note that no use is made of the trajectory $u$ obtained in step 2) together with the control $v$.

We proceed in the next section to deal with approximation and examples.


Approximations and Examples 


The pattern should be clear: if one wants to obtain an estimate for a solution to problems such as P1 or P2, one should develop semi-infinite linear programming problems such as LS1 or LS2, find the corresponding solutions (the coefficients $\alpha_i$ and the supports of the atomic measures as in Proposition 1), and then build suboptimal pairs. There are two main ways of estimating solutions of semi-infinite linear programming problems such as LS1 or LS2.

The first consists in using functional-analytic techniques in the space of measures; see [5,17]. Rudolph's method has been used with great success to solve problems such as P1 in a large number of dimensions; see [18] for a most impressive application of Rudolph's method, as well as of our theory.

The second approach consists in approximating the infinite-dimensional problem by a finite-dimensional one, posed in a Euclidean space of large dimension. We first indicate how to do this for Problem P1. In this problem the continuous functions $h_k$, $k = 1, \dots, M_3$, have been replaced in practice by $M_3$ lower semicontinuous pulse-like functions, also denoted by $h_k$. The set $J$ is divided into $M_3$ equal segments, and the function $h_k$ equals 1 in the $k$th such segment and zero elsewhere. This is explained in detail in [11]. It has been done because it brings better stability to the numerical processes.
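The pulse-like functions $h_k$ can be written out in a few lines of Python. This is only a sketch. The name `pulse_functions` is ours, and we take $h_k$ to be the indicator of the open $k$th segment, which makes it lower semicontinuous. The function also returns each pulse's integral over $J$, which is the segment length $(t_f - t_0)/M_3$.

```python
def pulse_functions(t0, tf, M3):
    """Return the M3 pulse functions h_k on J = [t0, tf] and their integrals.

    h_k is 1 on the open k-th segment of J and 0 elsewhere; as the indicator
    of an open set it is lower semicontinuous.
    """
    dt = (tf - t0) / M3
    hs = []
    for k in range(1, M3 + 1):
        a, b = t0 + (k - 1) * dt, t0 + k * dt
        # default arguments freeze this segment's endpoints in the closure
        hs.append(lambda t, a=a, b=b: 1.0 if a < t < b else 0.0)
    # the integral of h_k over J is the segment length dt
    integrals = [dt] * M3
    return hs, integrals


hs, a_h = pulse_functions(0.0, 1.0, 4)
print([h(0.3) for h in hs])  # [0.0, 1.0, 0.0, 0.0]
print(sum(a_h))              # 1.0
```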

A further concept must now be introduced [11]. Note that LS1 is a nonlinear optimization problem, in which the unknowns are the coefficients $\alpha_i$ and the supports $(t_i, x_i, u_i)$, $i = 1, \dots, M$. To find a linear approximation to this problem, we consider $\omega$, a countable dense subset of $\Omega_1 := J \times A \times U$. Taking $Y \gg M$ elements from $\omega$, we can write (10)-(15) as follows. We write $\omega_\ell := (t_\ell, x_\ell, u_\ell)$ and wish to minimize

$$ \sum_{\ell=1}^{Y} \alpha_\ell\, f_{01}(\omega_\ell) \tag{20} $$

on the set defined by the elements $\alpha_\ell \ge 0$, $\ell = 1, \dots, Y$, which further satisfy

$$ \begin{aligned}
\sum_{\ell=1}^{Y} \alpha_\ell\, \varphi_i(t_\ell, x_\ell, u_\ell) &= \Delta\varphi_i, && i = 1, \dots, M_1, \\
\sum_{\ell=1}^{Y} \alpha_\ell\, \eta_j(t_\ell, x_\ell, u_\ell) &= 0, && j = 1, \dots, M_2, \\
\sum_{\ell=1}^{Y} \alpha_\ell\, h_k(t_\ell, x_\ell, u_\ell) &= a_{h_k}, && k = 1, \dots, M_3.
\end{aligned} \tag{21} $$
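Once the supports are fixed, the finite program (20)-(21) can be assembled mechanically. The sketch below uses a hypothetical helper, `build_finite_lp`. It takes the supports to be all points of a product grid in $(t, x, u)$, which is one simple way of picking finitely many elements of $\omega$. It returns the cost vector and the equality-constraint matrix. The toy data in the usage lines are invented for illustration only.

```python
from itertools import product


def build_finite_lp(ts, xs, us, f01, phi, delta_phi, eta, h, a_h):
    """Assemble the finite linear program (20)-(21) on a product grid.

    Supports omega_l = (t_l, x_l, u_l) are all points of ts x xs x us.
    phi, eta and h are lists of callables (t, x, u) -> float for the three
    constraint families; delta_phi and a_h hold their right-hand sides.
    Returns (supports, c, A_eq, b_eq) for the LP
        minimize c . alpha  s.t.  A_eq alpha = b_eq,  alpha >= 0.
    """
    supports = list(product(ts, xs, us))
    c = [f01(*w) for w in supports]
    A_eq, b_eq = [], []
    for fun, rhs in zip(phi, delta_phi):  # first family of (21)
        A_eq.append([fun(*w) for w in supports])
        b_eq.append(rhs)
    for fun in eta:                       # second family, right-hand side 0
        A_eq.append([fun(*w) for w in supports])
        b_eq.append(0.0)
    for fun, rhs in zip(h, a_h):          # third family: pulse functions
        A_eq.append([fun(*w) for w in supports])
        b_eq.append(rhs)
    return supports, c, A_eq, b_eq


# Hypothetical toy data: one function u in the first family (right-hand
# side 0.5), no functions in the second family, and two time pulses.
supports, c, A_eq, b_eq = build_finite_lp(
    ts=[0.25, 0.75], xs=[0.0], us=[-1.0, 1.0],
    f01=lambda t, x, u: u * u,
    phi=[lambda t, x, u: u], delta_phi=[0.5],
    eta=[],
    h=[lambda t, x, u: 1.0 if t < 0.5 else 0.0,
       lambda t, x, u: 1.0 if t > 0.5 else 0.0],
    a_h=[0.5, 0.5],
)
```

If SciPy is available, the result could then be passed to an LP solver, for instance `scipy.optimize.linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))`. A basic optimal solution from a simplex-type method has at most as many nonzero entries as there are equality rows.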


Here, then, the supports $\omega_\ell$ are fixed in $\omega$. The coefficients $\alpha_\ell$, $\ell = 1, \dots, Y$, are the only unknowns, and this is an $M \times Y$ (finite-dimensional) linear program. Of course, as $Y \to \infty$ the support of the optimal measure $\mu_{opt}$ in Proposition 2 can be approximated more and more closely by that of the solution of (20)-(21). Note, further, that at most $M$ of the unknown $\alpha$-s are nonzero; we shall assume that the problem has essential



regularity, and that exactly $M$ of these $\alpha$-s are nonzero; see [11] for a discussion of this point. Once this finite-dimensional linear program is solved, suboptimal pairs can be constructed as explained above. The control function is constructed from the coefficients $\alpha_\ell$ and supports $u_\ell$ as follows.

1) The time set $J = [t_0, t_f]$ has been divided into $M_3$ equal subdivisions $J_k$, $k = 1, \dots, M_3$, each of measure $\Delta t / M_3$, with $\Delta t := t_f - t_0$.

2) There is a total of $M$ indices $\ell$ associated with those values $\alpha_\ell$ that are nonzero. A number of these indices are associated with each subdivision $J_k$ of $J$ defined above. If only one is so associated, then the value of $\alpha_\ell$ equals $\Delta t / M_3$. If more than one are associated, then the corresponding $\alpha_\ell$-s add up to $\Delta t / M_3$.

3) Without loss of generality, let $J_j$, for $1 \le j \le M_3$, be associated with two $\alpha_\ell$-s, $\alpha_{\ell_1}$ and $\alpha_{\ell_2}$, as explained above; this is a typical situation. We then build the curve $u(\cdot)$ on $J_j$ by making $u(t)$, $t \in J_j$, equal to $u_{\ell_1}$ or $u_{\ell_2}$ in each of the two parts of $J_j$, with lengths $\alpha_{\ell_1}$ and $\alpha_{\ell_2}$ respectively.
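Steps 1)-3) can be sketched as follows. The helper `build_control` and its input format are our assumptions: a list of triples $(k, \alpha_\ell, u_\ell)$ naming the subdivision $J_k$ with which each nonzero coefficient is associated. Within each $J_k$ the pieces are laid out in the order given. The function also checks that their lengths add up to $\Delta t / M_3$, as required in step 2).

```python
def build_control(t0, tf, M3, atoms, tol=1e-9):
    """Piecewise-constant control u(t) from the nonzero LP coefficients.

    atoms is a list of (k, alpha_l, u_l): the subdivision J_k (k = 1..M3)
    that index l is associated with, the coefficient alpha_l and the support
    value u_l. Inside J_k the control equals u_l on consecutive pieces of
    length alpha_l, in the order in which the atoms are listed.
    """
    dt = (tf - t0) / M3
    pieces = []  # (start, end, value)
    for k in range(1, M3 + 1):
        start = t0 + (k - 1) * dt
        for kk, alpha, u in atoms:
            if kk == k:
                pieces.append((start, start + alpha, u))
                start += alpha
        if abs(start - (t0 + k * dt)) > tol:
            raise ValueError(f"coefficients in J_{k} do not add up to dt/M3")

    def control(t):
        for a, b, value in pieces:
            if a <= t < b:
                return value
        return pieces[-1][2]  # t = tf: keep the last value

    return control


# Hypothetical data: J_1 carries one atom, J_2 is split between two atoms.
u = build_control(0.0, 1.0, 2, [(1, 0.5, 3.0), (2, 0.25, -1.0), (2, 0.25, 2.0)])
print(u(0.1), u(0.6), u(0.8))  # 3.0 -1.0 2.0
```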

An example of this process will be given in the next section.

In the case of problem P2, a similar construction can be made. Here, however, we have chosen Rudolph's method, which requires an initial finite-dimensional estimate.

It is convenient to work in $\mathcal{M}^+(\Omega \times \omega)$, identifying $(\lambda, \nu) \mapsto \lambda \times \nu$; see [13] for details. The system we have chosen is

$$ \begin{aligned}
& u_t(x,t) - \frac{x^2}{1+t^2}\, \Delta u(x,t) = \|\nabla u(x,t)\|^2 + \frac{1}{1 + x^2 + u(x,t)^2}, \\
& u_x(0,t) = 0, \quad u_x(1,t) = v(t), \qquad t \in (0,1),\ x \in (0,1),
\end{aligned} \tag{22} $$

that is, $k(x,t) = x^2/(1+t^2)$ and $f(u, w, x, t) := \|w\|^2 + 1/(1 + x^2 + u^2)$. We have taken $V = [-10, 10]$ and $g(x) = 0.075$, $x \in [0,1]$, and we wish to minimize


$$ J(u, v) = \int_0^1\!\!\int_0^1 \frac{u(x,t)^2 + \|\nabla u(x,t)\|^2}{1 + \sin^2(t\, u(x,t))}\, dx\, dt, $$

that is, $f_0(u, w, x, t) = (u^2 + \|w\|^2)/(1 + \sin^2(tu))$ and $f_1 = 0$. The boundary $\partial D$ consists of only two points. Only one of them, the point $x = 1$, plays an active role, the control being the heat flow at that point.
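For a candidate trajectory, the functional $J$ above can be approximated by simple quadrature. The sketch assumes the integrand as written above. It replaces $\nabla u$ (here simply $u_x$) by a central difference and uses the midpoint rule on $[0,1]^2$. The name `cost_J` and the grid sizes are illustrative. The sketch does not solve the system (22): `u` is any callable the user supplies.

```python
import math


def cost_J(u, nx=50, nt=50, eps=1e-6):
    """Midpoint-rule approximation of
        J(u, v) = int_0^1 int_0^1 (u^2 + u_x^2) / (1 + sin^2(t u)) dx dt
    for a given trajectory u(x, t); u_x is a central difference of step eps.
    """
    hx, ht = 1.0 / nx, 1.0 / nt
    total = 0.0
    for i in range(nx):
        x = (i + 0.5) * hx
        for j in range(nt):
            t = (j + 0.5) * ht
            val = u(x, t)
            ux = (u(x + eps, t) - u(x - eps, t)) / (2 * eps)
            total += (val ** 2 + ux ** 2) / (1 + math.sin(t * val) ** 2)
    return total * hx * ht


print(cost_J(lambda x, t: 0.0))  # 0.0
print(cost_J(lambda x, t: x))
```

The denominator satisfies $1 \le 1 + \sin^2(\cdot) \le 2$. So for $u(x,t) = x$ the value must lie between $2/3$ and $4/3$, which gives a quick sanity check.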

The functions $\psi$ were chosen of the form $\psi(x,t) = t^p \cos(\ell \pi x) + q(t)$ or $\psi(x,t) = t^p \sin(h \pi x) + q(t)$. The functions $q$ are test functions introduced to improve the behavior at $x = 1$ when determining an initial solution, as explained below. Ten such functions $\psi$ were chosen, with values $p = 1, 2$ and $\ell = 1, 2, 3$, $h = 1, 2$. The 16 functions $\xi$ were chosen by dividing the square $[0,1] \times [0,1]$ into 16 equal squares; they are the characteristic functions of the individual squares. Thus the total number of constraints is $m = 1 + 10 + 16 = 27$. The computational method consists of three steps:

i) The most difficult problem encountered was finding an initial solution from which, in part ii) of the method, one can iterate towards the minimum. This was done here by means of a finite-dimensional linear program, obtained by discretizing all the variables of the problem. The (rather rough) discretization tended to make the problem infeasible, so the initial solution had to be found by first solving for some of the parameters; this is why the functions $q$ were introduced above. Then a finite-dimensional simplex program of rather small size was run, with 677 variables and, of course, 27 constraints. Only the first phase (the one that produces a feasible solution) was run. It is usual in these problems to use a discretized solution as an initial one; see [5,17, Chapts. 5, 6].

ii) Then the simplex algorithm of Rudolph was run us- 
ing the output of step i) as initial solution, and after 
87 iterations it converged to a value of 0.202247; it 
had started with a value of 1.93919. This is a nu- 
merical estimation of the global minimum. 

iii) Once the optimization is performed, a nearly- 
optimal control v can be constructed; the method is 
shown in detail in [13], and follows the same gen- 
eral rules as the previous case. The graph of the re- 
sulting control is shown in Fig. 1. 
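The ten functions $\psi$ and the 16 functions $\xi$ of the 27 constraints can be enumerated in code. In this sketch the factor $\pi$ in the trigonometric terms follows our reading of the formulas. The functions $q$ are not specified in the article, so they appear as a user-supplied placeholder that defaults to zero. The single remaining constraint counted in $m$ is not modeled.

```python
import math


def psi_functions(q=lambda t: 0.0):
    """The ten functions t^p cos(l pi x) + q(t), p in {1, 2}, l in {1, 2, 3},
    and t^p sin(h pi x) + q(t), p in {1, 2}, h in {1, 2}.
    q is a placeholder: the q actually used is not specified."""
    psis = []
    for p in (1, 2):
        for l in (1, 2, 3):
            psis.append(lambda x, t, p=p, l=l: t ** p * math.cos(l * math.pi * x) + q(t))
        for h in (1, 2):
            psis.append(lambda x, t, p=p, h=h: t ** p * math.sin(h * math.pi * x) + q(t))
    return psis


def square_indicators(n=4):
    """Characteristic functions of the n * n equal squares of [0, 1] x [0, 1]."""
    def make(i, j):
        return lambda x, t: (
            1.0 if (i / n <= x < (i + 1) / n and j / n <= t < (j + 1) / n) else 0.0
        )
    return [make(i, j) for i in range(n) for j in range(n)]


psis, xis = psi_functions(), square_indicators()
m = 1 + len(psis) + len(xis)
print(len(psis), len(xis), m)  # 10 16 27
```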


Semi-infinite Programming and Control Problems, Figure 1
Graph of the nearly-optimal control v. Note that the control set is V = [-10, 10]


Unbounded Controls and Nonstandard Methods


In this last section we consider a finite-dimensional control problem just like P1, but with an unbounded control set $U$. We shall use the same notation for P1 as above. The difficulty now is that, since $U$ is unbounded, we may obtain 'impulses' as controls and thus discontinuous trajectories. We need to be able to handle infinities, a role for which nonstandard analysis is well suited. We start by extending the control space: we consider $U$ as a subset of $\bar{\mathbb{R}}^m$, the $m$-dimensional space generated by the extended real line $\bar{\mathbb{R}}$. Not much has changed, but we can now put a topology on $\Omega' := J \times A \times U$ that makes it compact [1]. We now proceed with our nonstandard construction; see [2,12,14]. We will work in a nonstandard framework given by a superstructure $V(W)$, $\mathbb{R} \subseteq W$. The superstructure $V({}^*W)$ is also an enlargement, and $\aleph_1$-saturated. We study integrals of the form


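$$ \int_J f(t, x(t), u(t))\, dt, \tag{23} $$
with $p \in F_1$ and $f \in C(\Omega')$. Then,
$$ \forall p \in F_1: \quad \int_J f(t, x(t), u(t))\, dt \in \mathbb{R}; \tag{24} $$
by transfer,
$$ \forall p \in {}^*F_1: \quad {}^*\!\!\int_{{}^*J} {}^*f(t, x(t), u(t))\, dt \in {}^*\mathbb{R}. \tag{25} $$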


Thus, the nonstandard version of P1 consists in minimizing
$$ {}^*I(p) := {}^*\!\!\int_{{}^*J} {}^*f_{01}(t, x(t), u(t))\, dt \tag{26} $$
on the class ${}^*F_1$ of pairs satisfying
$$ {}^*\!\!\int_{{}^*J} {}^*f_i(t, x(t), u(t))\, dt = b_i, \quad i = 1, \dots, M, \tag{27} $$
where the system (15) has been written in a compact form. Consider now the map suggested by (24). If $p \in F_1$ is fixed, the map
$$ F \mapsto \int_J F(t, x(t), u(t))\, dt \in \mathbb{R}, \quad F \in C(\Omega'), \tag{28} $$
is linear and positive. By Riesz's theorem, there is a measure $\nu_p$ on the Borel sets $\mathcal{B}$ of $\Omega'$ that represents this map; recall that $\Omega'$ is compact. Then $({}^*\Omega', {}^*\mathcal{B}, {}^*\nu_p)$ is a nonstandard measure space, and (see [9]):


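Lemma 3 There is a measure space $({}^*\Omega', \mathcal{A}, \mu_p)$ such that $\mu_p$ is the Loeb measure associated with ${}^*\nu_p$; then
$$ {}^*\!\!\int_{{}^*J} F(t, x(t), u(t))\, dt = {}^*\nu_p(F) = \int_{{}^*\Omega'} F\, d\mu_p, \quad F \in C({}^*\Omega'). \tag{29} $$
The algebra $\mathcal{A}$ is an extension of the algebra ${}^*\mathcal{B}$.

Thus, one can write the optimization problem as the problem of minimizing
$$ J(\mu) := \mu(f_{01}) \tag{30} $$
over the set $\mathcal{M}_1$ of measures of the form $\mu_p$ defined by
$$ \mu(f_i) = b_i, \quad i = 1, \dots, M. \tag{31} $$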


Proposition 4

i) The infima associated with the problems (26)-(27) and (30)-(31) are equal.

ii) For any positive infinitesimal $s \in {}^*\mathbb{R}$, we can find a near-minimizer $\mu_s \in \mathcal{M}_1$ for the functional $J$ in (30) such that
$$ J(\mu_s) = \inf_{\mathcal{M}_1} J + s. \tag{32} $$

Let, then, $s$ be a fixed positive infinitesimal in ${}^*\mathbb{R}$, and $\mu_s$ the corresponding near-minimizer for $J$ on $\mathcal{M}_1$. We can map this measure back to the standard world by means of the standard part map; see [2].


Proposition 5 There is a Baire measure $\mu_{opt}$ on $\Omega'$ such that:

i) If $S$ is a Baire set in $\Omega'$, then
$$ \mu_{opt}(S) = {}^\circ\mu_s\big(\mathrm{st}^{-1}_{\Omega'}(S)\big), $$
where $\mathrm{st}^{-1}_{\Omega'}(S)$ is the union of the monads of the elements of $S$.

ii)
$$ \mu_{opt}(f_{01}) = \int_{\Omega'} f_{01}\, d\mu_{opt} = \inf_{F_1} \int_J f_{01}(t, x(t), u(t))\, dt. \tag{33} $$

iii) The measure $\mu_{opt}$ is a solution of the following optimization problem. Minimize
$$ \mu(f_{01}) \tag{34} $$
over the set $\mathcal{M}^+(\Omega')$ of positive Baire measures on $\Omega'$ satisfying
$$ \mu(f_i) = b_i, \quad i = 1, \dots, M. \tag{35} $$

iv) The support of $\mu_{opt}$ contains subsets of $\Omega'$ in which at least one component of the variable $u$ is either $-\infty$ or $\infty$. The measure $\mu_{opt}$ is defined by a Baire measure on $J \times A \times V$, with $V$ the set of all finite elements of $U$, plus atomic measures on those subsets.


Consider problems of interest in which the function $f_{01}$ tends to infinity at infinity and the infimum is finite. There, elements $(t, x, u) \in \Omega'$ with one or more components of value $\infty$ or $-\infty$ do not really occur in the support of $\mu_{opt}$; note that expressions such as $\infty - \infty$ are not defined for the extended real line. Thus, if $|f_{01}(t, x, u)| = \infty$ whenever a component of $u$ is either $\infty$ or $-\infty$, and the minimum associated with the linear program is finite, such elements are not present in the support of $\mu_{opt}$.

We are now in a strong position to solve our original problem: the optimization problem P1 with unbounded $U$ in the standard world. We proceed as for P1 in previous sections: we solve for $\mu_{opt}$, approximate its support, and build suboptimal pairs.

We have solved a simple problem, taken from [6], with $n = 1$, $f_{01}(t, x, u) :=$ (uw? − 1)'%, and $\dot x = u$; the other parameters are $t_0 = 0$, $t_f = 1$, $x_0 = 0$, $x_f = 2$. The numerical approximation was performed with the following parameters: $Q = 24$, $M_1 = 2$, $M_2 = 8$, $M_3 = 30$, $M = 40$, $N = 27000$. Each of the axes associated with the variables $(t, x, u)$ was divided into 30 parts. The minimum obtained was 0.49587, which should be compared with the minimum of 0.413 obtained for this problem by semiclassical means in [6]. The approximation problems for this kind of optimization problem are fierce. One needs a large value of $Q$ as well as a very fine mesh, and thus very large linear programs. The graphs of the control $u$ and the trajectory $x$ are shown in Fig. 2 and Fig. 3.

Semi-infinite Programming and Control Problems, Figure 2
Graph of the control u

Semi-infinite Programming and Control Problems, Figure 3
Graph of the trajectory x
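The parameter values quoted for this example are mutually consistent. $M = M_1 + M_2 + M_3 = 2 + 8 + 30 = 40$, and dividing each of the three axes $(t, x, u)$ into 30 parts gives $30^3 = 27000$ candidate supports, which matches $N$ if $N$ is read as the number of grid points. The sketch below checks this by building such a grid from cell midpoints. The use of midpoints and the axis ranges are assumptions for illustration only.

```python
from itertools import product


def midpoint_grid(ranges, n_parts):
    """Cell midpoints of a box whose axes are each split into n_parts equal parts."""
    axes = []
    for lo, hi in ranges:
        step = (hi - lo) / n_parts
        axes.append([lo + (i + 0.5) * step for i in range(n_parts)])
    return list(product(*axes))


M1, M2, M3 = 2, 8, 30
M = M1 + M2 + M3  # number of equality constraints of the linear program
# Ranges for (t, x, u) are illustrative assumptions, not the article's values.
supports = midpoint_grid([(0.0, 1.0), (0.0, 2.0), (-10.0, 10.0)], 30)
N = len(supports)
print(M, N)  # 40 27000
```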

We should note that, even in this simple problem, we have achieved something not easily accomplished by more traditional methods. Those methods would have found it extremely hard to deal with the cube of a 'delta function'; they mostly deal with problems in which the control variable, our $u$, appears linearly.

The method employed here appears promising for partial differential equations whose solutions exhibit shocks, such as those studied in [8]. See also [15,16].

We have also extended these methods to the design 
of optimal shapes associated with nonlinear differential 
equations [3]. 


See also 


> Control Vector Iteration 
> Duality in Optimal Control with First Order 
Differential Equations 
Unconstrained Optimal Control

> Dynamic Programming: Optimal Control
Applications
> Robust Control
Methods
> Semi-infinite Programming: Numerical Methods

> Semi-infinite Programming: Second Order 
Optimality Conditions 

> Semi-infinite Programming, Semidefinite 
Programming and Perfect Duality 

> Sequential Quadratic Programming: Interior Point 
Methods for Distributed Optimal Control Problems 

> Suboptimal Control 


References 


1. Berge C (1963) Topological spaces. Oliver and Boyd, Edin- 
burgh 

2. Cutland NJ (ed) (1988) Nonstandard analysis and its applications. Cambridge Univ Press, Cambridge

3. Fakharzadeh A (1997) Shapes, measures and elliptic equa- 
tions. PhD Thesis, Univ Leeds, Leeds 

4. Farahi MH, Rubio JE, Wilson DA (1996) The global control 
of a nonlinear wave equation. Internat J Control 65:1-15 

5. Glashof K, Gustafson S (1983) Linear optimization and ap- 
proximation. Springer, Berlin 

6. Lawden DF (1959) Discontinuous solutions of variational 
problems. J Austral Math Soc 1:27-37 

7. Mikhailov VP (1978) Partial differential equations. MIR, 
Moscow 

8. Oberguggenberger M (1992) Multiplication of distribu- 
tions and applications to partial differential equations. 
Longman, London 


9. Render H (1993) Pushing down Loeb measures. Math Scand 72:61-84

10. Rosenbloom PC (1952) Quelques classes de problèmes extrémaux. Bull Soc Math France 80:183-216

11. Rubio JE (1986) Control and optimization: the linear treat- 
ment of nonlinear problems. MUP and Wiley, New York 

12. Rubio JE (1994) Optimization and nonstandard analysis. M 
Dekker, New York 

13. Rubio JE (1995) The global control of nonlinear diffusion 
equations. SIAM J Control Optim 33:308-322 

14. Rubio JE (1998) The global optimization of variational 
problems with discontinuous solutions. J Global Optim 
12:225-237 

15. Rubio JE (2000) Optimal control problems with un- 
bounded constraint set. Optim 48:191-210 

16. Rubio JE. The global control of shock waves 

17. Rudolph H (1987) Simplexalgorithmus der semiinfiniten linearen Optimierung. Wiss Leuna-Merseburg Tech Hochsch 5:782-806

18. Rudolph H (1990) Global solution in optimal control via 
SILP. In: Sebastian H-J (ed) System modelling and optimiza- 
tion. Lecture Notes Control Inform Sci. Springer, Berlin, pp 
394-402 

19. Young LC (1969) Calculus of variations and optimal control 
theory. WB Saunders, Philadelphia 


Semi-infinite Programming: 
Discretization Methods 


SIP 


REMBERT REEMTSEN 
Brandenburg Technical University Cottbus, 
Cottbus, Germany 


MSC2000: 90C34, 90C05, 90C25, 90C30 


Article Outline 


Keywords 

Convergence of Solutions of Discretized SIP Problems 
Solution of Discretized SIP Problems 

See also 

References 


Keywords


Semi-infinite programming; Optimization algorithms; Discretization of optimization problems


Consider the following optimization problem with respect to $x \in \mathbb{R}^n$, in which $Y \subset \mathbb{R}^m$ is a nonempty compact set and $f \in C(\mathbb{R}^n)$ and $g \in C(\mathbb{R}^n \times Y)$ are given functions:

$$ P[Y]: \quad \min f(x) \quad \text{s.t.} \quad g(x, y) \le 0, \ y \in Y. $$


If $Y$ is an infinite set, this problem has finitely many unknowns and infinitely many inequality constraints. It is therefore called a semi-infinite programming (briefly: SIP) problem. The problem is called linear if $f$ and $g(\cdot, y)$, $y \in Y$, are affine-linear, convex if these functions are convex, and nonlinear in all other cases.

Results that hold for $P[Y]$ under the given general conditions apply similarly to problems with constraints $g_j(x, y) \le 0$, $y \in Y^j$, for $j = 1, \dots, p$, where $Y^j \subset \mathbb{R}^{m_j}$ is nonempty and compact and $g_j \in C(\mathbb{R}^n \times Y^j)$ (e.g. [43]). In this connection, note that an ordinary inequality constraint $\tilde g_j(x) \le 0$ can, for example, be expressed as $g_j(x, y) \le 0$, $y \in \{1\}$, with $g_j(x, y) := y\, \tilde g_j(x)$. Likewise, an equality constraint $h_j(x) = 0$ can be described by the two inequality constraints $\pm h_j(x) \le 0$. This way of including equality constraints, however, is not possible when one requires a point at which all inequality constraints are strictly satisfied. A comprehensive survey of numerical methods for solving such SIP problems can be found in [43].

In the following discussion, $\|\cdot\|$ is an arbitrary norm and $\|\cdot\|_p$, $1 \le p \le \infty$, is the $\ell^p$-norm in some space $\mathbb{R}^k$. The set $\mathbb{N}_0$ equals $\mathbb{N} \cup \{0\}$, the number $|A|$ denotes the cardinality of a set $A$, and

$$ \operatorname{dist}(D, Y) := \sup_{y \in Y} \inf_{z \in D} \|y - z\| $$

is the density of $D \subseteq Y$ in $Y$. Moreover, for each $D \subseteq Y$, a problem $P[D]$ is defined by

$$ P[D]: \quad \min f(x) \quad \text{s.t.} \quad g(x, y) \le 0, \ y \in D. $$
The set of feasible points of $P[D]$ is denoted by

$$ F(D) := \{x \in \mathbb{R}^n : g(x, y) \le 0, \ y \in D\} $$

and its minimal value by

$$ \psi(D) := \inf_{x \in F(D)} f(x). $$


If $D \subset Y$ is finite and $Y$ is infinite, problem $P[D]$ is called a discretized SIP problem.
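To make the density $\operatorname{dist}(D, Y)$ concrete, the sketch below evaluates it for nested equidistant grids on $Y = [0, 1]$. Because $Y$ is infinite, the supremum is approximated by a maximum over a fine sample of $Y$; this is an assumption of the sketch. The helpers `dist`, `grid` and `feasible` are illustrative names, and the constraint function `g` is a made-up example. The last line shows that a point can belong to $F(D)$ for a coarse grid $D$ but not for a finer sample of $Y$.

```python
def dist(D, Y_sample):
    """dist(D, Y) = sup_{y in Y} inf_{z in D} |y - z| for subsets of R.

    The supremum over the infinite set Y is approximated by a maximum over
    the finite sample Y_sample of Y.
    """
    return max(min(abs(y - z) for z in D) for y in Y_sample)


def grid(i):
    """Equidistant grid Y_i on Y = [0, 1] with 2**i + 1 points (nested in i)."""
    return [j / 2 ** i for j in range(2 ** i + 1)]


def feasible(g, x, D):
    """x lies in F(D) iff g(x, y) <= 0 for every y in the finite set D."""
    return all(g(x, y) <= 0 for y in D)


def g(x, y):
    """Hypothetical constraint function."""
    return x - (y - 1.0 / 3.0) ** 2 - 1.0


Y_sample = [j / 1024 for j in range(1025)]
print([dist(grid(i), Y_sample) for i in range(4)])  # [0.5, 0.25, 0.125, 0.0625]
print(feasible(g, 1.01, grid(1)), feasible(g, 1.01, Y_sample))  # True False
```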

A point $x^* \in F(D)$ is called a global solution of $P[D]$ if $f(x^*) = \psi(D)$. It is called a local solution if $f(x^*) \le f(x)$ holds for all $x \in F(D) \cap U(x^*)$, for some open ball $U(x^*)$ centered at $x^*$. Usually, convergence of algorithms to a global solution of some problem $P[D]$ can be guaranteed only for linear and convex problems. Therefore, unless specified otherwise, a 'solution' of $P[D]$ for some $D$ in what follows is a point that can be obtained in practice as the limit point of some sequence generated by an algorithm. This may be a local or a global solution of the problem or, more generally, a point at which some first order optimality condition is satisfied.

One approach to solving a SIP problem $P[Y]$ is to (approximately) solve $P[Y_i]$ for $i = 0, 1, \dots$, where $\{Y_i\}$ is a sequence of finite subsets ('grids') of $Y$ with

$$ \lim_{i \to \infty} \operatorname{dist}(Y_i, Y) = 0. $$

A procedure of this type is called a discretization method. The grid sequence $\{Y_i\}$ it needs is usually prescribed a priori; typically, the grids are equidistant and satisfy $Y_{i+1} \supseteq Y_i$, $i \in \mathbb{N}_0$. Occasionally $\{Y_i\}$ is instead defined successively a posteriori ('adaptively'), so that information obtained on the $i$th discretization level is used to define $Y_{i+1}$.
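A discretization method in its simplest form can be tried on a toy problem in which each $P[Y_i]$ can be solved in closed form. The hypothetical example below minimizes $f(x) = -x$ subject to $x - r(y) \le 0$ for $y \in [0, 1]$, with $r(y) = (y - 1/3)^2 + 1$, on nested dyadic grids. The values $\psi(Y_i)$ increase monotonically towards $\psi(Y) = -1$. Every discretized solution slightly violates the constraints of $P[Y]$, since $y = 1/3$ is never a grid point. All names are ours.

```python
def rhs(y):
    """Hypothetical data: constraint g(x, y) = x - rhs(y) <= 0 on Y = [0, 1]."""
    return (y - 1.0 / 3.0) ** 2 + 1.0


def grid(i):
    """Nested equidistant grids: Y_i has 2**i + 1 points and Y_i is a subset of Y_{i+1}."""
    return [j / 2 ** i for j in range(2 ** i + 1)]


def solve_discretized(D):
    """Global solution of P[D] for min -x: the largest x with x <= rhs(y), y in D."""
    return min(rhs(y) for y in D)


# P[Y] is solved by x* = 1 (minimum of rhs, at y = 1/3), so psi(Y) = -1.
values, violations = [], []
for i in range(8):
    x_i = solve_discretized(grid(i))
    values.append(-x_i)           # psi(Y_i) = f(x*_i)
    violations.append(x_i - 1.0)  # max over Y of g(x*_i, y); > 0 means infeasible for P[Y]
print(values)
print(violations)
```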

Suppose $x^{*i} \in F(Y_i)$ is a solution of $P[Y_i]$, $i \in \mathbb{N}_0$, and accumulation points of $\{x^{*i}\}$ solve $P[Y]$. Then accumulation points of each sequence $\{x^i\}$ of approximate solutions $x^i$ of $P[Y_i]$ also solve $P[Y]$, as long as $\lim_{i \to \infty}(x^i - x^{*i}) = 0$. Thus, two things must be guaranteed: that accumulation points of $\{x^{*i}\}$ solve $P[Y]$, and that the algorithm used to solve $P[Y_i]$ generates such a point $x^i$ after finitely many iterations. Both issues are discussed separately below.

In practice, only a finite number of discretized problems $P[Y_i]$, $i = 0, \dots, I$, can be solved, for some $I \in \mathbb{N}$. Thus a discretization method describes a way to efficiently compute a solution of a (finely) discretized SIP problem $P[Y_I]$. Such a solution serves as an approximate solution of the given SIP problem; its accuracy is usually determined by the density of $Y_I$ in $Y$.
A clear advantage of discretization methods over other SIP methods is that they work exclusively with finite subsets of $Y$. In particular, feasibility of a point $x \in \mathbb{R}^n$ can be checked easily for the finite program $P[Y_i]$, unlike for the SIP problem $P[Y]$ itself. Therefore a discretization method is especially suited for SIP problems with a solution $x^*$ at which $g(x^*, \cdot)$ is (almost) constant on $Y$ or on parts of $Y$. This occurs, for example, in SIP problems originating from complex Chebyshev approximation [41].

A drawback of discretization methods is that they require a huge number of function evaluations. Moreover, the numerical cost of solving the discretized problems, and hence of gaining accuracy with respect to the SIP problem, increases dramatically as the density $\operatorname{dist}(Y_i, Y)$ decreases. Therefore, to keep computing times reasonable, the maximal number of grid points, and hence the accuracy obtainable by discretization methods, is limited in practice. (Grids with at most 50,000 to 100,000 points for problems with fewer than 100 variables are typical.) Furthermore, a solution $x^{*i} \in F(Y_i)$ of $P[Y_i]$ is normally only an outer approximation of a solution of the SIP problem. In particular, it is an approximate solution of $P[Y]$ that is not feasible for $P[Y]$. Observe that, for $Y_i \subseteq Y$, a global solution $x^{*i}$ of $P[Y_i]$ that is feasible for $P[Y]$ also solves $P[Y]$, since

$$ f(x^{*i}) = \psi(Y_i) \le \psi(Y) \le f(x^{*i}). $$

Such equivalence of a SIP problem with a finite optimization problem, however, is the exceptional case [43].

The solution reached by a discretization procedure, however, often suffices for practical purposes. If it does not, then, under certain conditions, the solution can be improved by a locally convergent reduction-based method [43]. Such two-phase procedures currently represent the most promising methods, at least for solving nonlinear SIP problems.


Convergence of Solutions 
of Discretized SIP Problems 


A solution of a discretized SIP problem is not necessarily an approximate solution of the SIP problem (see [23,43] for counterexamples). This approximation property holds only under suitable assumptions. To state them, let $\{Y_i\}$ be a sequence of finite subsets of $Y$ satisfying

$$ \lim_{i \to \infty} \operatorname{dist}(Y_i, Y) = 0 $$

and let

$$ \Lambda(x', D) := F(D) \cap \{x \in \mathbb{R}^n : f(x) \le f(x')\} $$

be a level set with respect to $x' \in F(Y)$ and $D \subseteq Y$.

Suppose that $Y_i \subseteq Y_{i+1} \subseteq Y$ for $i \in \mathbb{N}_0$ and that $\Lambda(x', Y_0)$ is bounded for some $x' \in F(Y)$. Then the following can be shown (see [42] for a more general form of this result). Problem $P[Y_i]$, $i \in \mathbb{N}_0$, possesses a global solution $x^{*i} \in \mathbb{R}^n$. Moreover, $\{x^{*i}\}$ has an accumulation point, each such point is a global solution of $P[Y]$, and $\{\psi(Y_i)\}$ increases monotonically to $\psi(Y)$.

In particular, for convex problems, a bounded set $\Lambda(x', Y_0)$ exists if and only if the SIP problem $P[Y]$ has a nonempty bounded set of solutions [43]. Similar results were also proved for linear problems in [10,11,17], for convex problems in [46], and for nonlinear ones in [12]. Furthermore, for linear problems, the general possibility of discretization is studied in [4,5,6]. Some of these theorems do not require the inclusion $Y_i \subseteq Y_{i+1}$ for all $i \in \mathbb{N}_0$. However, using the solution of $P[Y_i]$ efficiently as a starting vector for $P[Y_{i+1}]$ normally suggests that $Y_{i+1}$ should contain $Y_i$, or at least those points of $Y_i$ that belong to constraints active at this solution.

An extension of the above convergence result guarantees convergence of solutions $x^{*i}$ of problems $P[D_i]$, $i \in \mathbb{N}_0$. Here $D_0$ equals $Y_0$, the set $D_{i+1}$ satisfies $D_i \cup \{y^i\} \subseteq D_{i+1} \subseteq Y_{i+1}$, and $y^i$ is a point with $g(x^{*i}, y^i) = \max_{y \in Y_{i+1}} g(x^{*i}, y)$ [40,42,43]. A variant of this statement that also covers nonlinear problems is derived in [30, p. 464], where $x^{*i}$ only needs to be a certain (approximate) stationary point of $P[D_i]$. Rules that allow some of the constraints in $P[D_i]$ to be dropped were given in [7,26]. In this case, however, the size $|D_i|$ is not easily controlled, and the choice of $\{Y_i\}$ is not obvious ($|Y_i|$ should grow slowly if, e.g., $D_{i+1} := D_i \cup \{y^i\}$).

Another convergence theorem refers to nonlinear SIP problems and the practically relevant case in which $P[Y_i]$ is solved by a sequential quadratic programming (briefly: SQP) type method. For this, let $f \in C^2(\mathbb{R}^n)$ and $g \in C^{2,0}(\mathbb{R}^n \times Y)$. For each $x \in \mathbb{R}^n$, $\rho > 0$, and compact set $D \subseteq Y$, define the exact $L_\infty$-penalty function

$$ L_\infty(x, \rho, D) := f(x) + \rho\, \varphi^+(x, D), $$

where

$$ \varphi^+(x, D) := \max_{y \in D} \max\{g(x, y), 0\}. $$


A stationary, respectively critical, point of $L_\infty(\cdot, \rho, D)$ that is feasible for $P[D]$ is a Karush-Kuhn-Tucker point (KKT point) of $P[D]$. Conversely, a KKT point of $P[D]$ is a stationary point of $L_\infty(\cdot, \rho, D)$ if $\rho$ is sufficiently large. See e.g. [2,9] for details in the finite case, and use the definition of a KKT point of a SIP problem from e.g. [16,17], which also applies to finite problems.
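The exact penalty function, and the need for a sufficiently large $\rho$, can be seen on a one-dimensional toy problem. The names `phi_plus` and `L_inf` and the data are our own. The problem is $\min -x$ subject to $x - (1 + y^2) \le 0$ for $y$ in a three-point set $D$. Its solution $x = 1$ has Lagrange multiplier 1. A grid search over $x$ finds that the penalty function is minimized at $x = 1$ for $\rho = 2$. For $\rho = 0.5$, the function keeps decreasing beyond $x = 1$, and the search ends at the edge of the grid.

```python
def phi_plus(g, x, D):
    """phi^+(x, D) = max over y in D of max{g(x, y), 0}."""
    return max(max(g(x, y), 0.0) for y in D)


def L_inf(f, g, x, rho, D):
    """Exact L_infinity penalty function f(x) + rho * phi^+(x, D)."""
    return f(x) + rho * phi_plus(g, x, D)


def f(x):
    return -x


def g(x, y):
    return x - (1.0 + y * y)  # hypothetical constraint; binding at y = 0


D = [0.0, 0.5, 1.0]
xs = [k / 100 for k in range(301)]  # search grid on [0, 3]
best_large = min(xs, key=lambda x: L_inf(f, g, x, 2.0, D))  # rho = 2 > multiplier 1
best_small = min(xs, key=lambda x: L_inf(f, g, x, 0.5, D))  # rho = 0.5 < multiplier 1
print(best_large, best_small)  # 1.0 3.0
```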

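Suppose that, for all $i \in \mathbb{N}_0$, there exists a stationary point $x^{*i}$ of $L_\infty(\cdot, \rho_i, Y_i)$ for some $\rho_i > 0$, and that $\rho_i \le \rho^*$ for some $\rho^*$. Then the following holds [43]. Each accumulation point $x^*$ of $\{x^{*i}\}$ is a stationary point of $L_\infty(\cdot, \rho^*, Y)$. If a subsequence $\{x^{*i_j}\}$ converges to $x^*$ and $\lim_{j \to \infty} \varphi^+(x^{*i_j}, Y_{i_j}) = 0$, then $x^*$ is also a KKT point of $P[Y]$.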

For example, the SQP type methods in [9] and [24] for solving a discretized SIP problem $P[Y_i]$ (see below) converge to a stationary point $x^{*i}$ of $L_\infty(\cdot, \rho_i, Y_i)$ and to a KKT point of $P[Y_i]$, respectively. For the method in [9], an accumulation point of $\{x^{*i}\}$ is guaranteed to exist if a certain level set of the exact $L_\infty$-penalty function is compact [9].

Only a few rate-of-convergence results are known for a sequence of solutions of discretized SIP problems (cf. [30,41,43]). Note also in this regard that the numerical cost of solving discretized SIP problems normally tends to infinity as the grid density decreases. For a general theory of the discretization of SIP problems, the reader is referred to [29,30].


Solution of Discretized SIP Problems 


Except for small $n$ and $|Y_i|$, it is not advisable to solve a discretized SIP problem $P[Y_i]$ by an arbitrary method for finite programming. Such methods often require solving subproblems with the same number of constraints as the problem itself, and hence do not exploit the fact that the constraints of a SIP problem originate from a continuous function. Moreover, suppose a finely discretized SIP problem $P[Y_I]$ is to be solved (the objective of a discretization method). Then it is much more efficient to solve $P[Y_I]$ via a sequence of problems $P[Y_i]$, $i = 0, \dots, I$, with progressively refined grids $Y_i$, than to approach $P[Y_I]$ directly. (See e.g. [9] for numerical examples showing this.) For the overall efficiency of a discretization method it is, therefore, also important that the (approximate) solution $x^i$ of $P[Y_i]$ (and possibly additional information) can be exploited for solving $P[Y_{i+1}]$. Such a point $x^i$, however, is usually infeasible for $P[Y_{i+1}]$ if $Y_{i+1} \ne Y_i$. Therefore methods that start from a feasible point of $P[Y_i]$ may turn out to be too costly within such a scheme.

A variety of methods have been suggested for solving a discretized SIP problem $P[Y_i]$ efficiently. For linear SIP problems, in particular, the following algorithm has been specified and successfully applied (let $i \in \mathbb{N}_0$ be fixed and choose $k := 0$ if $i = 0$):


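0 Choose $k \in \mathbb{N}_0$ and $D_k \subseteq Y_i$ with $n \le |D_k| < \infty$.

1 Find a solution $x^k \in \mathbb{R}^n$ of $P[D_k]$. Let $A_k := \{y \in D_k : g(x^k, y) = 0\}$.

2 Find $y^k \in Y_i$ such that $g(x^k, y^k) = \max_{y \in Y_i} g(x^k, y)$.

3 IF $g(x^k, y^k) \le 0$, STOP!
ELSE choose $D_{k+1} \subseteq Y_i$ with $D_{k+1} \supseteq A_k \cup \{y^k\}$.

4 Set $k := k + 1$ and go to Step 1.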


If the stopping criterion of the algorithm is fulfilled, the algorithm can be continued with $i := i + 1$, beginning with Step 2 and the current index $k$. In particular, each problem $P[D_k]$ has a global solution if three conditions hold: $\{Y_i\}$ is a sequence of grids with $Y_i \subseteq Y_{i+1} \subseteq Y$, $D_0 := Y_0$, and $\Lambda(x', Y_0)$ is bounded for some $x' \in F(Y)$.

It can be shown that this algorithm stops after finitely many iterations with a global solution $x^k \in F(Y_i)$ of $P[Y_i]$, provided that $\psi(D_{k+1}) = \psi(D_k)$ holds only for finitely many $k \in \mathbb{N}$ in succession [40]. This is guaranteed, for example, if the solution of problem $P[D_k]$ is unique [40], which is almost always the case on a computer. In practice, such a solution should be computed via the dual problem of $P[D_k]$. In that way, $P[D_{k+1}]$ can be solved very efficiently and much computing time is saved (e.g. [43]). The stopping criterion in Step 3 may be replaced by $g(x^k, y^k) \le \varepsilon_i$, where $\{\varepsilon_i\}$ is a sequence of nonnegative numbers converging to zero.
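A minimal sketch of Steps 0-4 for one fixed grid $Y_i$ follows. To keep it self-contained, the linear program in Step 1 is solved by brute-force vertex enumeration. That is reasonable only for $n = 2$ and small $D_k$, and is not what one would use in practice (the text recommends working with the dual). The test problem is our own choice: best uniform approximation of $y^2$ on a grid in $[0, 1]$ by a constant $c$ with error bound $t$. It becomes a linear SIP by pairing each grid point with a sign $s = \pm 1$. In Step 3 the sketch simply keeps all of $D_k$. This is one admissible choice of $D_{k+1} \supseteq A_k \cup \{y^k\}$, since $A_k \subseteq D_k$.

```python
from itertools import combinations


def solve_lp_2d(c, rows, tol=1e-12):
    """Minimize c.x over {x in R^2 : a.x <= b for (a, b) in rows}.

    Brute-force vertex enumeration, a toy stand-in for an LP solver that is
    only sensible for two variables and few constraints. It assumes the
    minimum is finite, so that it is attained at a vertex.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue  # parallel constraint lines: no single intersection point
        # Cramer's rule for a1.x = b1, a2.x = b2
        x = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * x[0] + a[1] * x[1] <= b + 1e-9 for a, b in rows):
            val = c[0] * x[0] + c[1] * x[1]
            if best is None or val < best[0]:
                best = (val, x)
    if best is None:
        raise ValueError("no feasible vertex: LP infeasible or unbounded")
    return best[1]


def exchange_method(rhs, Yi, D0, max_iter=200):
    """Steps 0-4 on one grid Yi for the hypothetical linear SIP
        min t  s.t.  s * (rhs(y) - c) - t <= 0  for all y in Yi, s in {+1, -1},
    i.e. uniform approximation of rhs by a constant c; x = (c, t).
    """
    def g(x, idx):
        y, s = idx
        return s * (rhs(y) - x[0]) - x[1]

    def row(idx):  # g(x, idx) <= 0 written as a.x <= b
        y, s = idx
        return ((-s, -1.0), -s * rhs(y))

    index_set = [(y, s) for y in Yi for s in (1, -1)]
    D = list(D0)                                          # Step 0
    for k in range(max_iter):
        x = solve_lp_2d((0.0, 1.0), [row(d) for d in D])  # Step 1
        yk = max(index_set, key=lambda d: g(x, d))         # Step 2
        if g(x, yk) <= 1e-12:                              # Step 3: stop
            return x, k
        D = D + [yk]  # Step 3: D_{k+1} = D_k plus y^k contains A_k and y^k
        # Step 4: k := k + 1, back to Step 1 on the next loop pass
    raise RuntimeError("no termination within max_iter iterations")


Yi = [j / 100 for j in range(101)]
x, k = exchange_method(lambda y: y * y, Yi, [(0.5, 1), (0.5, -1), (0.25, 1)])
print(x, k)  # (0.5, 0.5) 2
```

For this grid the method stops after two exchanges with $c = t = 0.5$, the half-range of $y^2$ on $[0, 1]$.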
Discretization procedures of this type for linear SIP problems were proposed in [14,17,40], with certain sets

$$ D_{k+1} \supseteq \{y \in Y_i : g(x^k, y) > \eta_k\} $$

and different choices of $\eta_k \le 0$ (see also [15] for an extension to quadratic problems). The cardinality $|D_{k+1}|$ of such a set, however, can become quite large. In practice, a proper subset may then have to be selected through cumbersome bookkeeping. Another promising choice of $D_{k+1}$, not yet tried, would be to add only some or all violating discrete local maximizers of $g(x^k, \cdot)$ on $Y_i$ to the set of active points $A_k$. This choice is motivated by the success of the methods in, e.g., [8,9,28] and [36].

An extension of the algorithm for linear problems from [40] to convex problems was developed in [41,42] and applied to large filter design problems in [35] (see also [36] for computing times and a comparison with another method). As a cutting plane method, this latter method requires knowledge of a compact set $X \supseteq \Lambda(x', Y_0)$, for some $x' \in F(Y)$, that is described by finitely many linear inequality constraints. Typically, in practice, $X$ is given by box constraints $\alpha_j \le x_j \le \beta_j$, $j = 1, \dots, n$, where it is known or assumed that $P[Y]$ has a global solution satisfying these bounds. For SIP problems corresponding to linear complex Chebyshev approximation problems, a way has been found to construct a bounded set $X$ numerically without such a priori knowledge of the solution [41].

An approach to finite convex programs in [13] combines an interior point logarithmic barrier method with a cutting plane technique. It allows constraints to be added and deleted, and hence is capable of solving finely discretized convex SIP problems. In [20] this method was modified and incorporated into a 'dynamic' heuristic discretization procedure for solving SIP problems. (See [43] for a comment on the numerical results in [20].) Further ideas on solving discretized convex SIP problems can be found in [21,22,23,47], and [48].

A number of methods are aimed at solving nonlinear discretized SIP problems. In the 1970s, some authors developed combined methods of feasible directions for nonlinear finite optimization problems that can start from an arbitrary point in $\mathbb{R}^n$ (e.g. [34]). For SIP problems, such methods have been embedded into a discretization scheme. Under relatively weak assumptions, certain obtainable (approximate) stationary points of $P[Y_i]$ can then be proven to converge to a related stationary point, respectively KKT point, of $P[Y]$ [8,28,33]. In particular, in [33], the subproblems arising when solving the discretized problem $P[Y_i]$ contain one constraint for each member of the entire set of ε-most active points

$$ \tilde Y_{i,\varepsilon}(x) := \{y \in Y_i : g(x, y) \ge \varphi^+(x, Y_i) - \varepsilon\} $$

for some $\varepsilon > 0$. In [8], by contrast, the special structure of a discretized SIP problem is exploited, and constraints are needed only for the usually much smaller set of discrete 'left' local maximizers in $\tilde Y_{i,\varepsilon}(x)$. The behavior of this algorithm could be improved by adding further points to the latter set [28]. A numerical comparison of the method in [8] with other methods can be found in [45].

Using first order information only, these algorithms for $p[Y_i]$ normally have at best a linear rate of convergence. Indeed, the r-linear convergence of a modification of the method for finite problems from [34] could be shown [31]; in this method, however, all constraints have to be respected in the construction of the quadratic subproblems. The method was incorporated into an adaptive discretization procedure such that, for certain convex problems, the entire sequence of iterates has the same r-linear rate of convergence as the inner method for the finite subproblems [32]. Another variant of the method from [34], using ε-most active constraints only, can be found in [30, p. 279], but the convergence rate of this method does not seem to be known.

A combined discretization procedure and exact penalty function method using first order information was developed in [30, p. 479]. Another, normally at best linearly convergent, method can be found in [18,19], where the subproblems contain constraints for all elements of the set of ε-global points

$$Y_{i,\varepsilon}(x) = \{\, y \in Y_i : g(x,y) \ge \varphi(x, Y_i) - \varepsilon \,\}$$

at x with respect to $Y_i$ for a certain ε > 0. (Note that $Y_{i,\varepsilon}(x)$ equals $\hat{Y}_{i,\varepsilon}(x)$ for each point $x \in \mathbb{R}^n$ outside or on the boundary of $F(Y_i)$, and that, in particular, $Y_{i,0}(x)$ represents the set of discrete global maximizers of $g(x, \cdot)$ on $Y_i$.) Consult, furthermore, [1] for an unconventional discretization procedure for problems with convex constraints, which generates feasible points with respect to p[Y].
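The two index sets can be compared on a small grid. The Python sketch below is only illustrative: it assumes, consistent with the note above, that $\varphi(x, Y_i) = \max_{y \in Y_i} g(x,y)$ and $\varphi^{+}(x, Y_i) = \max\{0, \varphi(x, Y_i)\}$. The grid, the function g and the helper names are hypothetical.

```python
def eps_global_points(g, x, Y, eps):
    """epsilon-global points {y in Y : g(x, y) >= phi(x, Y) - eps}."""
    phi = max(g(x, y) for y in Y)                   # assumed: phi(x, Y) = max_y g(x, y)
    return [y for y in Y if g(x, y) >= phi - eps]


def eps_most_active_points(g, x, Y, eps):
    """epsilon-most active points {y in Y : g(x, y) >= phi_plus(x, Y) - eps}."""
    phi_plus = max(0.0, max(g(x, y) for y in Y))    # assumed: phi_plus = max(0, phi)
    return [y for y in Y if g(x, y) >= phi_plus - eps]


# Hypothetical data: grid Y_i on [0, 1] and g(x, y) = x - (y - 0.5)^2.
Y = [k / 10 for k in range(11)]


def g(x, y):
    return x - (y - 0.5) ** 2


# x = 0.1 is infeasible (max g = 0.1 > 0): both sets coincide.
print(eps_global_points(g, 0.1, Y, 0.05))        # [0.3, 0.4, 0.5, 0.6, 0.7]
print(eps_most_active_points(g, 0.1, Y, 0.05))   # [0.3, 0.4, 0.5, 0.6, 0.7]
# x = -0.1 is strictly feasible: no grid point is epsilon-most active.
print(eps_most_active_points(g, -0.1, Y, 0.05))  # []
```

For a strictly feasible x the ε-global set stays nonempty while the ε-most active set may be empty, which is why the two sets coincide only outside or on the boundary of the feasible set.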

Some authors have developed SQP type algorithms 
for the solution of nonlinear finite optimization prob- 
lems with large numbers of constraints, as they are 
given by discretized SIP problems. The purpose of such 
developments is to considerably reduce the size of the 
arising quadratic subproblems and the total number of 
gradient evaluations in comparison to standard SQP 
methods and to simultaneously preserve the good con- 
vergence properties of these methods (e. g. [3]). Meth- 
ods of this form were given in [9,24] (with code con- 
tained in the software package CFSQP), [27,44] (with- 
out convergence proofs), and, for linearly constrained 
problems, in [39] (with code in [24]). A special feature 
of the algorithm in [24] is that the iterates remain feasible with respect to $p[Y_i]$. The peculiarity of the algorithm in [9] is that the quadratic subproblem at the kth iteration only needs to include constraints for the usually small set of discrete ε-global local maximizers

$$\tilde{Y}_{i,\varepsilon}(x^k) = \{\, y \in Y_{i,\varepsilon}(x^k) : g(x^k, y) \ge g(x^k, \tilde{y}),\ \tilde{y} \in U_i(y) \,\},$$

where $U_i(y)$ is a discrete neighborhood of y in $Y_i$, consisting of y and neighboring points of y in $Y_i$ (see also [43]). As an example of such SQP type algorithms for the solution of a discretized SIP problem $p[Y_i]$, the algorithm from [9] is given below in a rudimentary form (again let $i \in \mathbb{N}_0$ be fixed and choose k := 0 if i = 0).
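For a one-dimensional grid, the ε-global local maximizers can be extracted as in the following Python sketch. It assumes that the discrete neighborhood $U_i(y)$ consists of y and its immediate left and right grid neighbours; the function names and the test function are hypothetical. In the example, seven ε-global grid points reduce to three local maximizers.

```python
import math


def eps_global_local_maximizers(g, x, Y, eps):
    """Discrete epsilon-global local maximizers on a sorted 1-D grid Y.

    A grid point y is kept if it is epsilon-global, i.e.
    g(x, y) >= max_Y g(x, .) - eps, and if g(x, y) is not smaller than the
    values at its grid neighbours (assumed U_i(y) = {left, y, right})."""
    vals = [g(x, y) for y in Y]
    phi = max(vals)
    result = []
    for k, y in enumerate(Y):
        if vals[k] < phi - eps:
            continue                    # y is not an epsilon-global point
        neighbours = [vals[j] for j in (k - 1, k + 1) if 0 <= j < len(Y)]
        if all(vals[k] >= v for v in neighbours):
            result.append(y)
    return result


# Hypothetical example: g(x, y) = x + cos(4*pi*y) on the grid {0, 1/8, ..., 1}.
def g(x, y):
    return x + math.cos(4 * math.pi * y)


Y = [k / 8 for k in range(9)]
print(eps_global_local_maximizers(g, 0.0, Y, 1.5))  # [0.0, 0.5, 1.0]
```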

Convergence of this algorithm to a stationary point of the exact $L_\infty$-penalty function, respectively a KKT point, of $p[Y_i]$ is guaranteed under standard assumptions. If, in addition, the scheme from [25] for avoiding the Maratos effect is properly incorporated into the algorithm, then, under some additional assumptions usually required in this context, it also converges r-superlinearly. The final data obtained by the algorithm (with an adequate stopping criterion) can normally be reused completely to initialize the algorithm for $p[Y_{i+1}]$, in case a sequence of discretized SIP problems is solved.


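0 Select $\alpha \in (0, 1/2)$, $\beta \in (0, 1)$, $\varepsilon > 0$, and $k \in \mathbb{N}_0$. Choose $\rho_k > 0$, $x^k \in \mathbb{R}^n$, a symmetric positive definite matrix $H_k \in \mathbb{R}^{n \times n}$, and a subset $D_k \subset Y_i$ with $D_k \supseteq \tilde{Y}_{i,\varepsilon}(x^k)$.

1 Compute the unique solution $(d^k, \xi^k) \in \mathbb{R}^n \times \mathbb{R}$ of the quadratic problem

$$\begin{aligned} \min\ & \tfrac{1}{2} d^\top H_k d + \nabla f(x^k)^\top d + \rho_k \xi \\ \text{s.t.}\ & g(x^k, y) + \nabla_x g(x^k, y)^\top d \le \xi, \quad y \in D_k, \\ & \xi \ge 0, \end{aligned}$$

and associated Lagrange multipliers $(\lambda^k, \lambda^k_\xi) \in \mathbb{R}^{|D_k|} \times \mathbb{R}$. IF $\|d^k\| = 0$, STOP.

2 IF $\xi^k = 0$ and $\|\lambda^k\|_1 \ge \rho_k$, set $\rho_k := 1.5\,\|\lambda^k\|_1$.

3 Let $\ell \in \mathbb{N}_0$ be the smallest number such that $t_k = \beta^\ell$ satisfies

$$\Phi(x^k; \rho_k, Y_i) - \Phi(x^k + t_k d^k; \rho_k, Y_i) \ge \alpha\, t_k\, (d^k)^\top H_k d^k,$$

where $\Phi(x; \rho, Y_i) = f(x) + \rho \max\{0, \max_{y \in Y_i} g(x, y)\}$ is the exact $L_\infty$-penalty function of $p[Y_i]$. Set $x^{k+1} = x^k + t_k d^k$ and $\rho_{k+1} := \rho_k$.

4 Compute $H_{k+1}$ by Powell's modification of the BFGS update ([37]) with respect to the Hessian of the Lagrangian

$$L_i(x, \lambda(Y_i)) = f(x) + \sum_{y \in Y_i} \lambda(y)\, g(x, y)$$

of $p[Y_i]$, where $\lambda(Y_i) = (\lambda(y))_{y \in Y_i}$, and choose a set $D_{k+1} \subset Y_i$ with $D_{k+1} \supseteq \tilde{Y}_{i,\varepsilon}(x^{k+1})$. Set k := k + 1 and go to Step 1.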
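Step 1 needs a QP solver and is omitted here, but Steps 2 to 4 are easy to state in code. The following Python sketch is an illustration under stated assumptions, not the code of [9]: it uses the exact $L_\infty$-penalty function as written in Step 3, the standard form of Powell's damped BFGS update (damping threshold 0.2), and hypothetical helper names and test data.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(H, v):
    return [dot(row, v) for row in H]


def linf_penalty(f, g, x, rho, Y):
    """Exact L_inf-penalty function f(x) + rho * max(0, max_{y in Y} g(x, y))."""
    return f(x) + rho * max(0.0, max(g(x, y) for y in Y))


def update_penalty(rho, xi, lam):
    """Step 2: if xi = 0 and ||lambda||_1 >= rho, return 1.5 * ||lambda||_1."""
    norm1 = sum(abs(lam_j) for lam_j in lam)
    return 1.5 * norm1 if xi == 0 and norm1 >= rho else rho


def armijo_step(f, g, x, d, H, rho, Y, alpha=0.1, beta=0.5, max_ell=60):
    """Step 3: t = beta**ell with the smallest ell >= 0 such that
    Phi(x) - Phi(x + t d) >= alpha * t * d^T H d."""
    dHd = dot(d, matvec(H, d))
    phi0 = linf_penalty(f, g, x, rho, Y)
    for ell in range(max_ell):
        t = beta ** ell
        x_new = [xj + t * dj for xj, dj in zip(x, d)]
        if phi0 - linf_penalty(f, g, x_new, rho, Y) >= alpha * t * dHd:
            return t, x_new
    raise RuntimeError("no acceptable step length found")


def powell_bfgs(H, s, y):
    """Step 4: Powell's damped BFGS update of H; y is the change of the
    gradient of the Lagrangian along the step s."""
    Hs = matvec(H, s)
    sHs, sy = dot(s, Hs), dot(s, y)
    # Damping: replace y by r = theta*y + (1-theta)*H s so that s^T r > 0.
    theta = 1.0 if sy >= 0.2 * sHs else 0.8 * sHs / (sHs - sy)
    r = [theta * yi + (1.0 - theta) * hi for yi, hi in zip(y, Hs)]
    sr = dot(s, r)
    n = len(s)
    return [[H[i][j] - Hs[i] * Hs[j] / sHs + r[i] * r[j] / sr for j in range(n)]
            for i in range(n)]


# Hypothetical test problem: f(x) = x1^2 + x2^2, g(x, y) = y * x1 - 1, Y_i = {0, 0.5, 1}.
def f(x):
    return x[0] ** 2 + x[1] ** 2


def g(x, y):
    return y * x[0] - 1.0


Yi = [0.0, 0.5, 1.0]
I2 = [[1.0, 0.0], [0.0, 1.0]]
t, x_new = armijo_step(f, g, [2.0, 0.0], [-8.0, 0.0], I2, 1.0, Yi)
print(t, x_new)   # 0.25 [0.0, 0.0]
```

When the damping is active, $s^\top r = 0.2\, s^\top H s > 0$, so the updated matrix stays positive definite even if $s^\top y \le 0$, which can happen for the Lagrangian of a nonconvex problem.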


Applied to a discretized SIP problem, the algorithm of [9] improves another one from [2, Sect. 4.2] which, instead of $\tilde{Y}_{i,\varepsilon}(x^k)$, employs the superset $Y_{i,\varepsilon}(x^k)$, which in this case is usually much larger. But, although the choice $D_k := \tilde{Y}_{i,\varepsilon}(x^k)$ is suitable in the above algorithm, adding further points to $D_k$ is advised to improve the overall efficiency of the method. In this regard, the selection rules from [24] and [44] have turned out to be valuable.

The SQP algorithm in [9] can similarly be used for the solution of the locally reduced problem in the second phase of a two-phase approach to the solution of p[Y], and it has been shown to yield excellent results in this way. In earlier two-phase approaches, two different methods had always been applied to the problems in the two phases.

In addition to the mentioned methods, also stochas- 
tic discretization procedures have been developed which 
are capable of providing a quasi-optimal solution of 
a finely discretized SIP problem (e.g. [49]). 


See also 


▶ Semi-infinite Programming: Approximation Methods

▶ Semi-infinite Programming and Control Problems

▶ Semi-infinite Programming: Methods for Linear Problems

▶ Semi-infinite Programming: Numerical Methods

▶ Semi-infinite Programming: Second Order Optimality Conditions

▶ Semi-infinite Programming, Semidefinite Programming and Perfect Duality
The class of linear semi-infinite programming problems may be looked upon as a generalization of the class of linear programs. In a semi-infinite program, either the number of variables or the number of constraints (but not both at the same time) may be infinite. In the present paper we will mainly deal with semi-infinite programs of the latter type. The main theoretical as well as practical difficulty is that one needs to verify that a proposed optimal solution satisfies infinitely many linear inequality constraints. We will describe two applications which illustrate this, namely global optimization and an air quality control problem. In a companion paper in this volume we will describe how to solve semi-infinite programs numerically by means of systematic approximation with simpler problems. The literature on linear semi-infinite programs is vast. For a general introduction to this field, see [5,6] and [7]. Recent results are to be found, for instance, in [11].


Definition of Linear Semi-Infinite Programs 


We now introduce, following [6]: 


Definition 1 Let S be a fixed set, $a: S \to \mathbb{R}^n$ and $b: S \to \mathbb{R}$ two fixed functions, and $c \in \mathbb{R}^n$ a fixed vector. Then the following task will be called a primal linear semi-infinite program:

$$\min\ c^\top y \qquad (1)$$

over all $y \in \mathbb{R}^n$ subject to the constraint

$$a(s)^\top y \ge b(s), \quad s \in S. \qquad (2)$$


Remark 2 The data of the semi-infinite program (1) and (2) are hence the index set S, the vector-valued function a and the real-valued function b, both defined on S, as well as the fixed vector $c \in \mathbb{R}^n$.


Definition 3 Use the notation of Definition 1. Put

$$Y = \{\, y \in \mathbb{R}^n : a(s)^\top y \ge b(s),\ s \in S \,\}, \qquad (3)$$

$$V = \inf_{y \in Y} c^\top y. \qquad (4)$$

If Y is empty, i.e. the condition (2) is inconsistent, then we define $V = +\infty$. Y is called the set of feasible solutions to (1), (2).


The following theorem on dual inequality holds: 


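Theorem 4 Use the notations of Definition 1. Assume that there are a subset $\{s_1, \dots, s_q\} \subset S$ and real numbers $x_1, \dots, x_q$ such that

$$\sum_{i=1}^{q} x_i\, a(s_i) = c, \qquad x_i \ge 0, \quad i = 1, \dots, q. \qquad (5)$$

Then the following inequality holds for all y satisfying (2):

$$c^\top y \ge \sum_{i=1}^{q} x_i\, b(s_i).$$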


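Proof Let (2) and (5) be satisfied. Then

$$a(s_i)^\top y \ge b(s_i), \quad i = 1, \dots, q.$$

Multiplying by $x_i \ge 0$ and summing over i we obtain

$$\sum_{i=1}^{q} x_i\, a(s_i)^\top y \ge \sum_{i=1}^{q} x_i\, b(s_i). \qquad (6)$$

The result now follows from (5), since the left-hand side of (6) equals $c^\top y$.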


Remark 5 The right-hand side of (6) gives a lower 
bound for the optimal value of the linear semi-infinite 
program defined by (1) and (2). 
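Theorem 4 can be checked numerically on a small instance. In the hypothetical example below (not taken from the references), S = [0, 1], $a(s) = (1, s)^\top$, $b(s) = e^s$ and $c = (1, 1/2)^\top$, so (1), (2) asks for the linear function $y_1 + y_2 s$ with the smallest integral over [0, 1] that lies above $e^s$. The weights $x_1 = x_2 = 1/2$ at $s_1 = 0$, $s_2 = 1$ satisfy (5); the function name is illustrative.

```python
import math


def dual_lower_bound(a, b, c, points, weights, tol=1e-12):
    """Return sum_i x_i b(s_i), the lower bound of Theorem 4, after checking
    condition (5): x_i >= 0 and sum_i x_i a(s_i) = c."""
    if any(x < 0 for x in weights):
        raise ValueError("the weights x_i must be nonnegative")
    combo = [sum(x * a(s)[k] for s, x in zip(points, weights)) for k in range(len(c))]
    if any(abs(ck - vk) > tol for ck, vk in zip(c, combo)):
        raise ValueError("sum_i x_i a(s_i) differs from c")
    return sum(x * b(s) for s, x in zip(points, weights))


# Hypothetical instance: S = [0, 1], a(s) = (1, s), b(s) = exp(s), c = (1, 1/2).
def a(s):
    return (1.0, s)


def b(s):
    return math.exp(s)


c = (1.0, 0.5)
bound = dual_lower_bound(a, b, c, [0.0, 1.0], [0.5, 0.5])   # equals (1 + e) / 2
for y in [(math.e, 0.0), (1.0, math.e - 1.0)]:              # two points satisfying (2)
    print(c[0] * y[0] + c[1] * y[1] >= bound - 1e-12)       # True, True
```

The chord $y = (1, e - 1)^\top$ attains the bound $(1 + e)/2$, so by Theorem 4 it is optimal for (1), (2), and the chosen $s_i$, $x_i$ give the largest possible lower bound.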


We next consider the task of finding the largest value of 
this lower bound: 


Definition 6 The dual semi-infinite program is defined by: Determine a finite subset $\{s_1, \dots, s_q\} \subset S$ and real numbers $x_1, \dots, x_q$ such that the expression

$$\sum_{i=1}^{q} x_i\, b(s_i) \qquad (7)$$

is maximized, subject to the constraints

$$\sum_{i=1}^{q} x_i\, a(s_i) = c, \qquad x_i \ge 0, \quad i = 1, \dots, q. \qquad (8)$$

Remark 7 If S is a finite set, then (1), (2) define a linear program and (7), (8) is equivalent to its dual.


The problems in Definitions 1 and 6 above admit several different geometric interpretations. Thus (8) means that the vector c shall be written as a nonnegative linear combination of the q vectors $a(s_1), \dots, a(s_q)$. The inequality (2) means that the real-valued function b is to be approximated from above over S by a linear combination of the n functions $a_1, \dots, a_n$.

We note that the feasible set Y defined by (3) is a convex subset of $\mathbb{R}^n$, if it is nonempty. For each fixed $s \in S$ the relation

$$a(s)^\top y = b(s)$$

defines a hyperplane, and Y lies in the closed half-space $\{y : a(s)^\top y \ge b(s)\}$ bounded by it; Y is the intersection of all these half-spaces. Hence program (1), (2) is the task to minimize the linear form $c^\top y$ over Y. The latter problem may be formally rewritten as the equivalent task of minimizing

$$c^\top y$$

subject to the single constraint

$$g(y) \ge 0,$$

where

$$g(y) = \min_{s \in S}\ \bigl(a(s)^\top y - b(s)\bigr).$$


Thus the evaluation of g(y) requires the solution of 
a global minimization problem, which may be nontriv- 
ial if S is infinite. Under fairly general assumptions the 
duality results of linear programming may be extended 
to semi-infinite programs. In particular, under such assumptions the programs (1), (2) and (7), (8) have optimal solutions and the same optimal value; see e.g. [5] and [6]. We state the following theorem on complementary slackness:


Theorem 8 Let y be an optimal solution of (1), (2) and let q, $\{s_1, \dots, s_q\} \subset S$, $x_1, \dots, x_q$ be an optimal solution of (7), (8). Put

$$d(s) = a(s)^\top y - b(s), \quad s \in S.$$

If the two problems also have the same optimal value, the following relations hold:

$$d(s) \ge 0, \quad s \in S, \qquad (9)$$

$$\sum_{i=1}^{q} x_i\, a(s_i) = c, \qquad (10)$$

$$x_i\, d(s_i) = 0, \quad i = 1, \dots, q, \qquad (11)$$

the function d achieves its global minimum over S at each $s_i$ with $x_i > 0$, $i = 1, \dots, q$. (12)


The proof is given e. g. in [6]. 


Remark 9 The relations (9)-(12) give necessary con- 
ditions for optimality. If the set S is finite, then (1), (2) 
and (7), (8) form a dual pair of linear programs and we 
recognize Theorem 8 in this case as the complementary 
slackness result in linear programming. 
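Relations (9)-(12) can be tested for a candidate pair of solutions. The following Python sketch uses a hypothetical instance (S = [0, 1], $a(s) = (1, s)^\top$, $b(s) = e^s$, $c = (1, 1/2)^\top$) and a hypothetical function name. Since (9) and (12) involve all of S, they are only checked on a finite grid, which is an approximation.

```python
import math


def check_optimality_conditions(a, b, c, y, points, weights, grid, tol=1e-9):
    """Check relations (9)-(12) of Theorem 8; (9) and (12) only on a finite grid of S."""
    def d(s):                                   # d(s) = a(s)^T y - b(s)
        return sum(ak * yk for ak, yk in zip(a(s), y)) - b(s)

    d_min = min(d(s) for s in grid)
    combo = [sum(x * a(s)[k] for s, x in zip(points, weights)) for k in range(len(c))]
    return {
        "(9)": d_min >= -tol,
        "(10)": all(abs(ck - vk) <= tol for ck, vk in zip(c, combo)),
        "(11)": all(abs(x * d(s)) <= tol for s, x in zip(points, weights)),
        "(12)": all(d(s) <= d_min + tol for s, x in zip(points, weights) if x > 0),
    }


# Hypothetical instance: S = [0, 1], a(s) = (1, s), b(s) = exp(s), c = (1, 1/2).
def a(s):
    return (1.0, s)


def b(s):
    return math.exp(s)


c = (1.0, 0.5)
grid = [k / 1000 for k in range(1001)]
chord = (1.0, math.e - 1.0)                     # interpolates exp at s = 0 and s = 1
print(check_optimality_conditions(a, b, c, chord, [0.0, 1.0], [0.5, 0.5], grid))
# all four entries are True
print(check_optimality_conditions(a, b, c, (math.e, 0.0), [0.0, 1.0], [0.5, 0.5], grid)["(11)"])
# False: y = (e, 0) satisfies (2) but is not optimal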


If S is infinite, then (10), (11) may be combined into a nonlinear system of equations with $x_i$, $s_i$, $i = 1, \dots, q$, and y as unknown variables. Further relations may be derived from (12) and included in the system, which then has equally many scalar unknowns as it has equations. This requires that q is known, and one must also require that the function d has continuous partial derivatives of the first order. In addition it must be known for each $s_i$ whether this point is on the boundary or in the interior of S. These considerations are used in the three-phase algorithm (see e.g. [5,6] or [7]), whose main ideas are as follows:


Phase 1. Solve the problems (1), (2) and (7), (8) approximately by replacing S with a finite subset T to obtain a dual pair of linear programs.

Phase 2. Determine q and approximate values for the variables $x_i$, $s_i$, $i = 1, \dots, q$, and y from the results of Phase 1. Decide for each $s_i$ whether it is an interior or boundary point of S. Construct a nonlinear system of equations with respect to the variables sought.

Phase 3. Solve the nonlinear system of equations by a suitable numerical scheme. Check the feasibility of the calculated solution, in particular that the infinitely many constraints in (2) are satisfied. In case of failure return to Phase 1 and redo the whole process with a finer grid.


A Global Maximization Problem 


We discuss now a very special instance of the problem (1), (2). The treatment of this test problem illustrates some fundamental properties of semi-infinite programs. Thus we put n = 1, $a_1(s) = 1$, $c_1 = 1$ and consider the task

Example 10

$$\min\ y \qquad (13)$$

subject to the constraint

$$y \ge b(s), \quad s \in S. \qquad (14)$$

In this example y is a real number. We note that (14) may define infinitely many constraints. The dual of the problem (13), (14) may be written

$$\max_{s \in S}\ b(s). \qquad (15)$$
By placing various assumptions on the function b and the set S one may illustrate the different properties a dual pair of semi-infinite programs may have. Thus if S is finite, then both programs achieve their optimal values in accordance with the theory of linear programs. If b is bounded from above on S, then the primal program (13), (14) has the optimal solution

$$y = \sup_{s \in S} b(s). \qquad (16)$$

If in addition S is compact and b continuous, then the dual (15) also achieves its optimal value and we write

$$y = \max_{s \in S} b(s). \qquad (17)$$

For a discussion of the various states of semi-infinite programs, see [5,6] and [7].


Example 10 may also be used to illustrate the properties of some suggested numerical schemes for linear semi-infinite programs. We note that (17) is useful for numerical work only if there is a way to determine all local maxima of b on S. If b has continuous partial derivatives of the first order, we may try to solve by numerical methods the equation

$$\nabla b(s) = 0. \qquad (18)$$

In addition one would need to study the values of b on the boundary of S.

We discuss now the three-phase algorithm as described in [3,5,6,7,9,10]. When applied to the problem (13), (14) this scheme becomes:

Let $T_\ell$, $\ell = 1, 2, \dots$, be a sequence of grids (subsets of S) such that each $T_\ell$ contains finitely many points and

$$\lim_{\ell \to \infty}\ \max_{s \in S}\ \min_{t \in T_\ell} \|s - t\| = 0. \qquad (19)$$

1 Calculate an $s_\ell$ which solves $\max_{t \in T_\ell} b(t)$.

2 Take $s_\ell$ as starting value.

3 To determine the local maxima of b with some numerical
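The scheme for Example 10 can be sketched for an interval S = [lo, hi]. Since the description of step 3 is incomplete, the sketch assumes Newton's method applied to (18) as the numerical method; this choice, the function names and the test function b are assumptions made for illustration.

```python
import math


def grid_then_newton_max(b, db, d2b, lo, hi, n_grid=11, tol=1e-12, max_newton=50):
    """Sketch for Example 10 with S = [lo, hi]: maximize b over a uniform grid,
    start Newton's method for b'(s) = 0 (equation (18)) at the best grid point,
    and finally compare with the boundary points of S."""
    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    s = max(grid, key=b)                       # step 1: s_l solves max_{t in T_l} b(t)
    for _ in range(max_newton):                # steps 2-3: Newton iteration from s_l
        curv = d2b(s)
        if curv >= 0:                          # b not locally concave: stop refining
            break
        step = db(s) / curv
        s = min(max(s - step, lo), hi)         # keep the iterate inside S
        if abs(step) < tol:
            break
    return max([s, lo, hi], key=b)             # the boundary of S is checked separately


# Hypothetical example: b(s) = sin(pi s) exp(-s) on S = [0, 1]; its maximizer
# solves tan(pi s) = pi, i.e. s* = atan(pi) / pi.
def b(s):
    return math.sin(math.pi * s) * math.exp(-s)


def db(s):
    return math.exp(-s) * (math.pi * math.cos(math.pi * s) - math.sin(math.pi * s))


def d2b(s):
    return math.exp(-s) * ((1 - math.pi ** 2) * math.sin(math.pi * s)
                           - 2 * math.pi * math.cos(math.pi * s))


s_star = grid_then_newton_max(b, db, d2b, 0.0, 1.0)
print(abs(s_star - math.atan(math.pi) / math.pi) < 1e-10)  # True
```

With a coarse grid the best grid point may lie in the basin of a non-global local maximum, and Newton's method then converges to the wrong maximizer; Example 11 below shows how fine the grid may have to be.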


Example 11 We consider the following special case of Example 10:

$$b(s) = \sin\frac{1}{s + 0.000001}, \quad s \in [0, 1].$$

In this case there are many local maxima and the grid would need to be very fine if the three-phase method is to locate them all. A direct analytic treatment would of course be easy, since the expression for b is simple, but the example illustrates potential difficulties.

We note that if S has several dimensions, then the 
‘curse of dimensionality’ sets in, making a systematic 
discretization approach inefficient, since it implies that 
b must be tabulated and this table must be large in order to describe b. In a practical situation there is no assurance that the optimal solution has been found, if this cannot be tested analytically. Since one has to accept the possibility that a nonoptimal value is returned, one may consider pseudorandom generation of grids instead of the deterministic approaches described above and in the references given up to now. A major advantage of these approaches is that the computational complexity does not depend crucially on the number of dimensions, and statistical estimates for the uncertainty in the optimal value may be developed. For an introduction to global optimization methods based on random processes, see [12]. The author is working on methods for applying random schemes to the general semi-infinite program (1), (2).


An Air Pollution Control Problem 


The three-phase approach described above may be directly applied to general linear semi-infinite programs. We illustrate this with the air pollution control problem as described in [6, p. 17; 184]. Here the air quality control area is represented by the compact set $A \subset \mathbb{R}^2$. The annual mean concentration of a certain inert pollutant (e.g. sulphur dioxide, SO₂) is represented by a real-valued function p, defined on A. There are many sources emitting pollution and each car, house or power plant may contribute to the pollution. The permissible level of pollution is defined by a given function v. If there is a point $s \in A$ such that the standard v is exceeded, i.e. such that

$$p(s) > v(s), \qquad (21)$$


then the pollution concentration must be reduced. After reduction, the remaining pollution must satisfy the standard at each point $s \in A$, and this implies that infinitely many inequality constraints must be met. We will assume that the sources of pollution may be split into n + 1 classes and that the sources in each class will be regulated in the same way. We further require that the superposition principle is valid, i.e. that the concentration contributions add up. Therefore we write

$$p(s) = \sum_{r=0}^{n} u_r(s), \quad s \in A, \qquad (22)$$


where the concentration contribution from source class r is given by the nonnegative function $u_r$. In particular, $u_0$ represents the concentration contribution from background sources which cannot be regulated. Next, $u_1$ may be the contribution from trucks operating in A, $u_2$ from passenger cars, $u_3$ from residential heating, and so on. The implementation of an abatement policy implies that the contribution from class r is reduced by the fraction $E_r$; hence the remaining concentration contribution from this class is

$$(1 - E_r)\, u_r(s), \qquad 0 \le E_r \le 1.$$


The reduction vector $E \in \mathbb{R}^n$ must therefore satisfy the constraints

$$0 \le E_r \le 1, \quad r = 1, \dots, n,$$

$$u_0(s) + \sum_{r=1}^{n} (1 - E_r)\, u_r(s) \le v(s), \quad s \in A.$$

The last relation may also be written

$$\sum_{r=1}^{n} E_r\, u_r(s) \ge p(s) - v(s), \quad s \in A,$$

which may be interpreted as the requirement that the total removed pollution should amount to at least the difference between the original pollution and the required standard v(s).

Assume that the cost associated with the pollution abatement is given by the linear form K(E), where

$$K(E) = \sum_{r=1}^{n} c_r E_r.$$

The task to determine E such that the air quality standard v is satisfied at each $s \in A$, while the control cost is minimized, is hence the solution to the problem: Determine

$$V = \min_{E \in \mathbb{R}^n} \sum_{r=1}^{n} c_r E_r \qquad (23)$$

subject to the constraints

$$\sum_{r=1}^{n} E_r\, u_r(s) \ge p(s) - v(s), \quad s \in A, \qquad (24)$$

$$E_r \ge 0, \quad r = 1, \dots, n, \qquad (25)$$

$$-E_r \ge -1, \quad r = 1, \dots, n. \qquad (26)$$


Thus the equations (23)-(26) define a linear semi-infinite program of the type (1) and (2). We note that if there is a point $s^* \in A$ such that $u_0(s^*) > v(s^*)$, then the background pollution is above the standard and there is no control policy achieving the desired goal. Hence the program (23)-(26) is inconsistent. Otherwise we conclude the existence of an optimal policy if the functions $u_r$, $r = 0, \dots, n$, are continuous. For the purpose of numerical treatment we replace the area A by a finite grid T in (24) and solve the corresponding linear program. In so doing, we enforce the standard on the finite grid only, and hence it may be violated somewhere between the gridpoints. However, convergence is obtained if we consider a sequence of finer and finer grids as in (19). It is also possible to derive a nonlinear system and use the three-phase algorithm. It is of interest here that Theorem 8 implies that if an optimal control policy has been calculated, there is a subset of points where the standard is met exactly. It has been proposed to put air quality sampling stations at these points. For further details, see [6].
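The discretized program can be illustrated for n = 2, where a linear program in two variables is small enough to be solved by enumerating the vertices of the feasible polygon. All data below (concentration fields, standard, costs, grid) are hypothetical, and vertex enumeration is chosen only to keep the sketch free of external LP solvers; it is not a method from the references.

```python
import math
from itertools import combinations


def solve_lp_2d(cost, rows, tol=1e-9):
    """Minimize cost^T E subject to a^T E >= beta for all (a, beta) in rows,
    E in R^2, by enumerating the vertices of the feasible polygon.
    Only suitable for tiny two-variable illustrations."""
    best, best_val = None, math.inf
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-14:
            continue                                # parallel lines: no vertex
        E = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * E[0] + a[1] * E[1] >= beta - tol for a, beta in rows):
            val = cost[0] * E[0] + cost[1] * E[1]
            if val < best_val:
                best, best_val = E, val
    return best, best_val


# Hypothetical data on A = [0, 1]^2 with n = 2 controllable source classes.
def u0(s):                                          # background, not regulated
    return 0.2


def u1(s):                                          # e.g. traffic, peak at the centre
    return math.exp(-10 * ((s[0] - 0.5) ** 2 + (s[1] - 0.5) ** 2))


def u2(s):                                          # e.g. heating, peak near (0.2, 0.8)
    return 0.8 * math.exp(-8 * ((s[0] - 0.2) ** 2 + (s[1] - 0.8) ** 2))


def v(s):                                           # air quality standard
    return 1.0


cost = (1.0, 2.0)                                   # c_1, c_2 in (23)
T = [(i / 20, j / 20) for i in range(21) for j in range(21)]   # finite grid replacing A
rows = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0),                  # (25): E_r >= 0
        ((-1.0, 0.0), -1.0), ((0.0, -1.0), -1.0)]              # (26): -E_r >= -1
for s in T:
    excess = u0(s) + u1(s) + u2(s) - v(s)           # p(s) - v(s), using (22)
    if excess > 0:                                  # elsewhere (24) holds for every E >= 0
        rows.append(((u1(s), u2(s)), excess))       # constraint (24) at the grid point s
E, K = solve_lp_2d(cost, rows)
print(E, K)
```

Because the costs are positive, at the computed policy at least one grid constraint (24) is active, i.e. the standard is met exactly at some grid point, in line with the remark on Theorem 8 above.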


Concluding Remarks 


There are a large number of other applications of semi- 
infinite programming. One such is the approximation 
of functions in the maximum norm. Special algorithms 
have been developed for classes of such applications. 
For the approximation in the maximum norm, the algorithms of Remez (see [2]) are applicable. But we note also here that a global maximization needs to be carried out at each step of the algorithm. A one-sided approximation problem has been studied by [1]. The solutions of the dual of this problem are associated with quadrature rules of generalized Gauss type. However, the determination of abscissae and weights of such rules, using only the parameters of the corresponding semi-infinite program, is notoriously ill-conditioned. For the so-called classical cases associated with the names of Legendre, Hermite and Jacobi, other analytic information is used, for instance, the coefficients of the three-term recurrence relation formulas for the corresponding orthogonal polynomials. See e.g. [4] and [8]. These methods give an alternative way of finding the solutions of certain special semi-infinite programs, which may be used for testing the accuracy and efficiency of general methods for semi-infinite programs.
The problem

$$P(T):\quad \min\ \{F(z) : z \in K\}, \qquad K = \{\, z \in \mathbb{R}^n : g(z, t) \le 0,\ \forall t \in T \,\}, \qquad (1)$$

is considered, where T is a compact subset of the space $\mathbb{R}^m$, F and $g(\cdot, t)$ are continuous functions from $\mathbb{R}^n$ into $\mathbb{R}$, and $g(z, \cdot)$ is continuous on T. If the functions F and $g(\cdot, t)$ are assumed to be convex, we deal with convex semi-infinite problems, and many important applications lead to linear semi-infinite problems, with linear functions F and $g(\cdot, t)$.

With some additional writing effort, the content of this article can easily be extended to the case that the set of feasible points is given as the intersection of finitely many sets

$$\{\, z \in \mathbb{R}^n : g_s(z, t) \le 0,\ \forall t \in T^s \,\}, \quad s \in S,$$

where, for each $s \in S$, $T^s$ is a compact subset of $\mathbb{R}^{m_s}$ and $g_s$ has the same properties as g.

The notion ‘semi-infinite programming problem’ is 
due to the property that the feasible set K belongs to 
a finite-dimensional space whereas T is an infinite set. 
The references here are not primarily intended to
give a complete documentation of the original papers 
and sources or to reflect the historical development of 
this field. Except for the case that reference is given 
to explicit statements or techniques, it was our aim to 
draw the reader’s attention to the basic approaches. 

Concerning theory and properties of semi-infinite programming, see the monographs [1,19,24], the papers [12,18] and the proceedings volumes [6,33].

Transforming $g(z, t) \le 0$, $\forall t \in T$, into $\max_{t \in T} g(z, t) \le 0$, one could treat semi-infinite problems in the framework of nondifferentiable optimization. We do not adopt this approach here (see the review paper [27]).

The algorithms for semi-infinite problems usually generate a sequence of finitely constrained auxiliary optimization problems suitable for being solved by standard algorithms for finite optimization. We assume the latter to be available. We concentrate on the different ways these auxiliary problems may be generated and confine ourselves to some remarks on properties the finite optimization algorithms should preferably have. Roughly, one can distinguish three classes of methods according to the way the auxiliary (finite) problems are generated: exchange methods, including cutting plane methods (applicable only to convex problems), discretization methods, and methods based on local reduction. Exchange and discretization methods are better suited to a ‘first phase’ of the solution process, whereas the reduction approach is recommendable for a final stage in order to provide a higher accuracy of the solution and a better rate of convergence.

In order to sketch briefly the respective ideas behind the exchange and discretization methods, we suppose that K is a compact set and introduce the finitely constrained auxiliary problems

$$P(T_i):\quad \min\ \{F(z) : z \in K_i\}, \qquad K_i = \{\, z \in Z_0 : g(z, t) \le 0,\ \forall t \in T_i \,\}, \qquad (2)$$

with $Z_0$ a convex compact set, $Z_0 \supseteq K$. Sometimes $Z_0$ is artificially introduced in order to ensure solvability of these problems. If $K \ne \emptyset$, the approximate problem (2) has a solution for every finite set $T_i \subset T$.

Omitting detailed assumptions on the problems and 
methods involved, we concentrate on the main ideas of 
the numerical approaches mentioned above. 


Exchange Methods 


This notion refers to the fact that in step i the set $K_{i+1}$ is obtained from $K_i$ by addition of at least one new constraint and (in many algorithms) deletion of some of the constraints of $K_i$, i.e., an exchange of constraints takes place.

• Step i: Given $T_i \subset T$, $|T_i| < \infty$. Compute (approximately) a solution $z^i$ of $P(T_i)$ and some (or all) local maxima $t^i_1, \dots, t^i_{q_i}$ of the subproblem

$$Q(z^i):\quad \max\ \{g(z^i, t) : t \in T\}.$$

Stop if $g(z^i, t^i_j) \le 0$, $j = 1, \dots, q_i$. Otherwise, choose $T_{i+1}$ such that the following relations hold:

$$T_{i+1} \subset T_i \cup \{t^i_1, \dots, t^i_{q_i}\}, \qquad \max_{t \in T_{i+1}} g(z^i, t) > 0.$$

Necessary for convergence is

$$\max_{1 \le j \le q_i} g(z^i, t^i_j) = \max_{t \in T} g(z^i, t), \qquad (3)$$

i.e. the global solution of the nonconvex optimization problem $Q(z^i)$ is required in each step (or at least in a subsequence of steps), which for $\dim T \ge 2$ may be very costly. Under (3), convergence may be proved, for instance, in the case that constraints are never deleted from $T_i$ (cf. [18]).

In fact, exchange methods are similar to the classical Remez algorithm for solving linear Chebyshev approximation problems. Special algorithms mainly differ in the choice of $T_{i+1}$. In the literature, cf. e.g. [14,23,30,40,41], there are different algorithms which have important additional features: the ability to add several constraints $g(z, t^i) \le 0$, where $t^i$ are some points with $g(z^i, t^i) > 0$ (or to add only the constraint which is maximally violated at $z^i$), and to delete constraints in an efficient way in the case of a convex problem (1). In [23] a cutting plane algorithm for convex problems is suggested, which ensures a linear rate of convergence with respect to the values of the objective functional.
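The following Python sketch shows an exchange method that never deletes constraints, applied to a hypothetical problem whose feasible set is the unit disk, $K = \{z \in \mathbb{R}^2 : z_1 \cos t + z_2 \sin t - 1 \le 0,\ t \in [0, 2\pi)\}$, with $F(z) = -z_1 - 2 z_2$ and $Z_0 = [-2, 2]^2$. For this g the global solution of $Q(z^i)$ is known in closed form, so requirement (3) holds exactly. The auxiliary linear programs are solved by vertex enumeration, which is only sensible for two variables; all names are illustrative.

```python
import math
from itertools import combinations


def solve_lp_2d(c, rows, tol=1e-9):
    """Minimize c^T z subject to a^T z <= beta for all (a, beta) in rows, z in R^2,
    by enumerating the vertices of the feasible polygon (tiny problems only)."""
    best, best_val = None, math.inf
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-14:
            continue                                   # parallel lines: no vertex
        z = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * z[0] + a[1] * z[1] <= beta + tol for a, beta in rows):
            val = c[0] * z[0] + c[1] * z[1]
            if val < best_val:
                best, best_val = z, val
    return best, best_val


def g(z, t):
    """Hypothetical constraint g(z, t) = z1 cos t + z2 sin t - 1, t in [0, 2*pi)."""
    return z[0] * math.cos(t) + z[1] * math.sin(t) - 1.0


def solve_Q(z):
    """Global solution of Q(z) = max_t g(z, t); here available in closed form."""
    t = math.atan2(z[1], z[0]) % (2 * math.pi)
    return t, g(z, t)


# Z_0 = [-2, 2]^2 as four linear constraints; it keeps every P(T_i) bounded.
Z0 = [((1.0, 0.0), 2.0), ((-1.0, 0.0), 2.0), ((0.0, 1.0), 2.0), ((0.0, -1.0), 2.0)]


def exchange_method(c, T0, tol=1e-8, max_iter=100):
    T = list(T0)
    for _ in range(max_iter):
        rows = Z0 + [((math.cos(t), math.sin(t)), 1.0) for t in T]
        z, val = solve_lp_2d(c, rows)          # solution z^i of P(T_i)
        t_new, viol = solve_Q(z)               # global maximizer of Q(z^i), cf. (3)
        if viol <= tol:                        # z^i is feasible for (1) up to tol: stop
            break
        T.append(t_new)                        # add the most violated constraint
    return z, val, T


z, val, T = exchange_method((-1.0, -2.0), [0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
print(z, val, len(T))   # z close to (1, 2)/sqrt(5), val close to -sqrt(5)
```

Each iterate $z^i$ lies outside or on the boundary of K, and the objective values approach the optimal value $-\sqrt{5}$ from below, as is typical of outer approximation by cutting planes.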


Discretization Methods 


Algorithms of this type compute a solution of problem (1) by solving a sequence of problems $P(T_i)$, where $T_i$ is an $h_i$-grid on T, i.e., a finite subset $T_i \subset T$ such that $\sup_{t \in T} \operatorname{dist}(t, T_i) \le h_i$. In the ith step a fixed (usually uniform) grid $T_i$ is considered, generated by $h_i = \gamma_i h_{i-1}$, with $\gamma_i \in (0, 1)$ chosen a priori or defined by the solution procedure itself. In step i only subsets $\tilde{T}_i$ of $T_i$ are used, and in a number of algorithms one has ($k_i \in \mathbb{N}$)

$$\gamma_i = \frac{1}{k_{i-1}} \quad \text{with } k_{i-1} \ge 2,$$

implying $T_{i-1} \subset T_i$ for all i.

• Step i: Given $h_{i-1}$, the last set $\tilde{T}_{i-1} \subset T_{i-1}$ and a solution $z^{i-1}$ of problem $P(\tilde{T}_{i-1})$;

i1) choose $h_i = \gamma_i h_{i-1}$ and generate $T_i$;

i2) select $\tilde{T}_i \subset T_i$;

i3) compute a solution $\bar{z}$ of $P(\tilde{T}_i)$. If $\bar{z}$ is feasible for $P(T_i)$ within a given accuracy, put $z^i := \bar{z}$ (define also $\gamma_{i+1}$ if the sequence $\{\gamma_i\}$ is not chosen a priori) and continue with Step (i + 1). Otherwise repeat i2) for a new choice of $\tilde{T}_i$ enlarging the old one.

With regard to efficiency, it is important to use as much information as possible from former grids in solving $P(T_i)$. Substantial differences among the algorithms of this type lie in the choice of $\tilde{T}_i$. Preferably $\tilde{T}_i$ should be chosen such that the solution of $P(\tilde{T}_i)$ approximately solves also $P(T_i)$. Most of the suggested algorithms select $\tilde{T}_i$ such that

$$\tilde{T}_i \supseteq T_i^{\alpha} = \{\, t \in T_i : g(\bar{z}, t) \ge -\alpha \,\},$$

with $\alpha > 0$ some chosen threshold and $\bar{z}$ the foregoing inner iterate (cf. i3)).

The choice of α is crucial (cf. [15,31,32]): a large value of α leads to many constraints in $P(\tilde{T}_i)$, whereas choosing α too small one should expect a large number of steps i2), i3) for fixed $T_i$. Adaptive strategies for decreasing α are used in [7] and [26].
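A discretization method with nested grids and the selection rule $\tilde{T}_i \supseteq T_i^{\alpha}$ can be sketched on a hypothetical problem whose feasible set is the unit disk, $g(z, t) = z_1 \cos t + z_2 \sin t - 1$, $t \in [0, 2\pi)$, with $F(z) = -z_1 - 2 z_2$ and $Z_0 = [-2, 2]^2$. The choices $\gamma_i = 1/2$, $\alpha = 0.01$, the enlargement rule (adding all violated grid points) and the vertex-enumeration LP solver are assumptions made for the illustration.

```python
import math
from itertools import combinations


def solve_lp_2d(c, rows, tol=1e-9):
    """Minimize c^T z subject to a^T z <= beta for all (a, beta) in rows, z in R^2,
    by enumerating the vertices of the feasible polygon (tiny problems only)."""
    best, best_val = None, math.inf
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-14:
            continue                                   # parallel lines: no vertex
        z = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * z[0] + a[1] * z[1] <= beta + tol for a, beta in rows):
            val = c[0] * z[0] + c[1] * z[1]
            if val < best_val:
                best, best_val = z, val
    return best, best_val


def g(z, t):
    """Hypothetical constraint g(z, t) = z1 cos t + z2 sin t - 1 (unit disk)."""
    return z[0] * math.cos(t) + z[1] * math.sin(t) - 1.0


# Z_0 = [-2, 2]^2 as four linear constraints; it keeps every auxiliary LP bounded.
Z0 = [((1.0, 0.0), 2.0), ((-1.0, 0.0), 2.0), ((0.0, 1.0), 2.0), ((0.0, -1.0), 2.0)]


def discretization_method(c, levels=8, alpha=0.01, tol=1e-9):
    """Grids T_i with 4 * 2**i equidistant points on [0, 2*pi), i.e. gamma_i = 1/2."""
    z = None
    for i in range(levels + 1):
        n_pts = 4 * 2 ** i
        grid = [2 * math.pi * k / n_pts for k in range(n_pts)]          # i1)
        # i2) whole first grid, afterwards the alpha-active part T_i^alpha at z
        subset = list(grid) if z is None else [t for t in grid if g(z, t) >= -alpha]
        while True:
            rows = Z0 + [((math.cos(t), math.sin(t)), 1.0) for t in subset]
            z, val = solve_lp_2d(c, rows)                                 # i3)
            violated = [t for t in grid if g(z, t) > tol and t not in subset]
            if not violated:                   # z is feasible for P(T_i) up to tol
                break
            subset = subset + violated         # enlarge the selected set and repeat
    return z, val, len(subset), n_pts


z, val, n_sel, n_grid = discretization_method((-1.0, -2.0))
print(val, n_sel, n_grid)
```

On the finest grid (1024 points) only a small fraction of the grid points enters the selected subproblem, which is the purpose of choosing $\tilde{T}_i$ close to $T_i^{\alpha}$; the computed value approximates $-\sqrt{5}$ from below.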

In [39] special versions of the method of feasible directions coordinate the choice of the grid and the search direction in order to obtain feasible solutions of (1) in each step. In this and some other papers, information about the last $\tilde{T}_{i-1}$ and $z^{i-1}$ is used in choosing $\gamma_i$ and the initial set $\tilde{T}_i$ at the ith step. In [28] discretization processes are carried out such that, for a certain class of convex semi-infinite programming problems, the rate of convergence of the basic optimization algorithm used for solving the discretized problems is retained for the whole solution procedure. Here only one step of the optimization algorithm is performed in each problem $P(\tilde{T}_i)$ to define $\bar{z}$ (see i3)).


Reduction Methods 


These methods use the fact that under appropriate assumptions the original (infinitely many) constraints

$$g(z, t) \le 0, \quad \forall t \in T,$$

can be replaced by finitely many constraints which locally are sufficient to describe K. In the sequel, we assume that $F \in C^2(\mathbb{R}^n)$, $g \in C^2(\mathbb{R}^n \times T)$. Let $\bar{z} \in \mathbb{R}^n$ be a given point. Let $\bar{t}_1, \dots, \bar{t}_{q(\bar{z})}$, with $q(\bar{z}) < \infty$, be all the local solutions of

$$Q(\bar{z}):\quad \max\ \{g(\bar{z}, t) : t \in T\}.$$

Obviously, $\bar{z} \in K$ if and only if

$$g(\bar{z}, \bar{t}_j) \le 0, \quad j = 1, \dots, q(\bar{z}).$$

Assume now that there exist a neighborhood U of $\bar{z}$ and twice continuously differentiable functions

$$t_j : U \to T, \quad t_j(\bar{z}) = \bar{t}_j, \quad j = 1, \dots, q(\bar{z}),$$

such that for all $z \in U$ the $t_j(z)$ are all local solutions of $Q(z)$. Then, with

$$G_j(z) := g(z, t_j(z)), \quad j = 1, \dots, q(\bar{z}),$$

we have $G_j \in C^2(U)$ and

$$K \cap U = \{\, z \in U : G_j(z) \le 0,\ j = 1, \dots, q(\bar{z}) \,\},$$

i.e. in U we may replace problem P(T) by the finite problem

$$P_{\bar{z}}(T):\quad \min\ \{F(z) : G_j(z) \le 0,\ j = 1, \dots, q(\bar{z})\}.$$

Sufficient for the existence of U and $t_j$ as above is that all solutions $\bar{t}_j$ of $Q(\bar{z})$ are nondegenerate (cf. [17,18] and also ▶ Semi-infinite Programming: Second Order Optimality Conditions). We note that this ‘reducibility in $\bar{z}$’ is a generic property (see [20,44]); thus it holds ‘almost everywhere for almost all problems’.
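The reduction step relies on computing all local solutions of $Q(\bar{z})$ and the resulting functions $G_j$. The Python sketch below does this for a hypothetical index set T = [0, 3] and constraint $g(z, t) = z_1 \sin(6t) + z_2 t - 1$: discrete local maxima on a grid are refined by Newton's method on $\partial_t g(z, t) = 0$, and boundary maxima are kept as they are. The gradient formula uses that $\partial_t g(z, t_j(z)) = 0$ at an interior nondegenerate maximizer (and that a nondegenerate boundary maximizer stays fixed), so the chain rule reduces to $\nabla G_j(z) = \nabla_z g(z, t_j(z))$. All names and data are illustrative.

```python
import math


def g(z, t):
    """Hypothetical constraint function g(z, t) = z1 sin(6t) + z2 t - 1 on T = [0, 3]."""
    return z[0] * math.sin(6 * t) + z[1] * t - 1.0


def g_t(z, t):                      # partial derivative of g with respect to t
    return 6 * z[0] * math.cos(6 * t) + z[1]


def g_tt(z, t):                     # second partial derivative with respect to t
    return -36 * z[0] * math.sin(6 * t)


def grad_z_g(z, t):                 # gradient of g with respect to z
    return (math.sin(6 * t), t)


def local_maximizers(z, lo=0.0, hi=3.0, n=301, tol=1e-13, max_newton=50):
    """Local solutions t_1(z), ..., t_q(z) of Q(z): discrete local maxima on a
    uniform grid, interior ones refined by Newton's method on g_t(z, t) = 0."""
    ts = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    vals = [g(z, t) for t in ts]
    found = []
    for k, t in enumerate(ts):
        left = vals[k - 1] if k > 0 else -math.inf
        right = vals[k + 1] if k < n - 1 else -math.inf
        if vals[k] < left or vals[k] < right:
            continue                # not a discrete local maximum
        if 0 < k < n - 1:           # interior maximum: solve g_t(z, t) = 0
            for _ in range(max_newton):
                step = g_t(z, t) / g_tt(z, t)
                t -= step
                if abs(step) < tol:
                    break
        found.append(t)             # boundary maxima are kept as they are
    return found


def reduced_constraints(z):
    """Triples (t_j, G_j(z), grad G_j(z)) with G_j(z) = g(z, t_j(z))."""
    return [(t, g(z, t), grad_z_g(z, t)) for t in local_maximizers(z)]


for t, G, grad in reduced_constraints((1.0, 0.5)):
    print(round(t, 6), round(G, 6), grad)
```

For z = (1, 0.5) the sketch finds three interior local maximizers and the boundary maximizer t = 3, so locally four constraints $G_j(z) \le 0$ replace the infinitely many constraints.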

Now, reduction methods generally can be described as follows:

• Step i: Given $z^{i-1}$ (not necessarily feasible);

i1) compute all local maxima $t^{i-1}_1, \dots, t^{i-1}_{q_{i-1}}$ of problem $Q(z^{i-1})$;

i2) apply some steps of an algorithm for finite problems to the reduced finite-dimensional problem $P_{z^{i-1}}(T)$:

$$\min\ \{F(z) : G_j(z) \le 0,\ j = 1, \dots, q_{i-1}\},$$

where $G_j(z) = g(z, t_j(z))$ and the functions $t_j(\cdot)$ are defined in a neighborhood of $z^{i-1}$. Let $z^{i-1,k_i}$ be the last iterate of the algorithm mentioned;

i3) set $z^i = z^{i-1,k_i}$ and continue with Step (i + 1).


In order to apply second order methods, second order derivatives are usually required, and to guarantee superlinear convergence a strong second order optimality condition additionally has to be satisfied. Conditions concerning problem (1) which provide locally superlinear convergence of some SQP-methods applied to problem $P_{z^{i-1}}(T)$ are given in [9] (see also ▶ Semi-infinite Programming: Second Order Optimality Conditions). These conditions ensure, in particular, twice continuous differentiability of the functions $G_j$. SQP-methods using augmented Lagrangian functions and quasi-Newton updates of their Hessians have been effective in this context (cf. [9,16,25,29,38,43]). Here, according to the results in [9], an inexact evaluation of the functions $t_j(\cdot)$, and hence of the functions $G_j$, is admitted.

To give a more explicit version of the reduction approach, we specify steps i2) and i3) above, following [9].

Given $z^{i-1,0}=z^{i-1}$ and $B_{i,0}=B_{i-1}$ ($B_0$ is, in principle, an arbitrary symmetric positive definite $n\times n$ matrix). Then, for $l=1,\dots,k_i$, with given $z^{i-1,l-1}$ and $B_{i,l-1}$, do:

• compute a solution $s^l$ and a vector of Lagrange multipliers $\lambda^{i,l}$ of the quadratic programming problem
$$\begin{aligned}\min_s\ &\nabla F(z^{i-1,l-1})^\top s+\tfrac12 s^\top B_{i,l-1}s\\ \text{s.t. }\ &G_j(z^{i-1,l-1})+\nabla G_j(z^{i-1,l-1})^\top s\le 0,\quad j=1,\dots,q_{i-1};\end{aligned}$$

• compute a steplength $\alpha_l$ by a rule providing a decrease of function (4) below (see, for instance, [3]);

• update
$$z^{i-1,l}=z^{i-1,l-1}+\alpha_l s^l,\qquad B_{i,l}=B_{i,l-1}+\frac{y^l(y^l)^\top}{(y^l)^\top s^l}-\frac{B_{i,l-1}s^l\,(B_{i,l-1}s^l)^\top}{(s^l)^\top B_{i,l-1}s^l}$$
(BFGS formula), where


$$y^l=\nabla_z L(z^{i-1,l},\lambda^{i,l},c)-\nabla_z L(z^{i-1,l-1},\lambda^{i,l},c),$$
$$L(z,\lambda,c)=F(z)+\sum_{j=1}^{q_{i-1}}\lambda_jG_j(z)+\frac{c}{2}\sum_{j=1}^{q_{i-1}}G_j(z)^2$$
is the augmented Lagrangian of problem $P_{z^{i-1}}(T)$; concerning the choice of $c>0$, see [18, Assumption 7.8];

• set $B_i=B_{i,k_i}$, $z^i=z^{i-1,k_i}$.

Conditions ensuring local q-superlinear convergence of the sequence $\{(z^{i,l},\lambda^{i,l})\}$ to a Kuhn-Tucker point of problem (1) are given in [9].
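The BFGS formula in the update step can be sketched in plain Python as follows. This is a minimal illustration assuming the vectors $s=s^l$ and $y=y^l$ have already been computed; the helper name `bfgs_update` is hypothetical, and the safeguards used in practical SQP codes when $y^\top s\le 0$ are replaced here by a simple error.

```python
def bfgs_update(B, s, y):
    """BFGS update  B + y y^T / (y^T s) - (B s)(B s)^T / (s^T B s).

    B is a symmetric positive definite matrix given as a list of lists,
    s the step and y the change of the (augmented) Lagrangian gradient.
    """
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    ys = sum(y[i] * s[i] for i in range(n))
    sBs = sum(s[i] * Bs[i] for i in range(n))
    if ys <= 0.0:
        # Curvature condition violated: the update would lose positive definiteness.
        raise ValueError("y^T s must be positive")
    return [[B[i][j] + y[i] * y[j] / ys - Bs[i] * Bs[j] / sBs for j in range(n)]
            for i in range(n)]

B_new = bfgs_update([[1.0, 0.0], [0.0, 1.0]], s=[1.0, 0.5], y=[2.0, 0.3])
print(B_new)
```

By construction the updated matrix satisfies the secant condition $B_{i,l}s^l=y^l$, which is what makes it a quasi-Newton approximation of the Hessian of the augmented Lagrangian.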

In the first instance, SQP methods are only locally convergent. Therefore, hybrid techniques combining robust globally convergent descent algorithms with SQP methods have been developed (cf. [8,10,11]). Globalization techniques for SQP methods were introduced in [13] using the exact penalty function
$$F(z)+d\sum_{j=1}^{q_{i-1}}\max\{0,G_j(z)\}\qquad(4)$$
(with sufficiently large $d>0$) to control the stepsize. In the papers [3,42] and [37], alternative penalty functions with the same goal have been introduced.
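A minimal Python sketch of the exact penalty function (4) together with a very simple step-length rule based on it follows. The halving rule `decrease_step` only enforces a plain decrease of (4); it is a simplified, hypothetical stand-in for the rules referenced above (e.g. [3]), which require a sufficient decrease. All names and the toy problem are illustrative assumptions.

```python
def exact_penalty(F, G_list, d):
    """Return z -> F(z) + d * sum_j max(0, G_j(z)), the exact penalty function (4)."""
    def phi(z):
        return F(z) + d * sum(max(0.0, G(z)) for G in G_list)
    return phi

def decrease_step(phi, z, s, alpha=1.0, shrink=0.5, max_halvings=30):
    """Halve alpha until phi(z + alpha * s) < phi(z); return 0.0 if no decrease is found.

    Simplified stand-in for a proper sufficient-decrease rule.
    """
    f0 = phi(z)
    for _ in range(max_halvings):
        trial = [zi + alpha * si for zi, si in zip(z, s)]
        if phi(trial) < f0:
            return alpha
        alpha *= shrink
    return 0.0

# Hypothetical toy data: minimize z1^2 + z2^2 subject to G_1(z) = 1 - z1 <= 0.
phi = exact_penalty(lambda z: z[0] ** 2 + z[1] ** 2, [lambda z: 1.0 - z[0]], d=10.0)
print(decrease_step(phi, [2.0, 0.0], [-2.0, 0.0]))
```

In this toy case the full step lands at the infeasible point $(0,0)$, where the penalty term dominates, so the rule halves the step and accepts the feasible point $(1,0)$.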

Coupling a penalty technique with a discretization seems a promising approach to semi-infinite programming. Numerically, however, this may be rather difficult: the condition number of the Hessians of the objective functions of the auxiliary problems depends not only on the penalty parameter but also on the combined influence of almost active constraints, whose number usually increases with the number of discretization points. Therefore, exact penalty functions or their smooth approximations have the advantage that one can work with a fixed penalty parameter. Exact penalty functions of integral type for semi-infinite programming have been proposed in [2]. In [4] and [5], exchange techniques are coupled with penalization of the auxiliary problems, which are finally solved by interior point methods. Interior point methods for solving smooth, convex semi-infinite programs are also suggested in [35,36] and [34]. This approach requires computing integrals over T to obtain the components of the gradients and Hessians of certain aggregated logarithmic barrier functions. The solution is reached by following the so-called central path of the problem with a predictor-corrector method. In [21,22], proximal penalty approaches have been proposed, with a special deleting rule for the constraints based on the existence of upper bounds for the Lagrange multipliers of the regularized auxiliary problems, uniform with respect to a sequence of discretizations.
We are concerned with semi-infinite optimization problems
$$\text{SIP}:\qquad\max F(z)\quad\text{s.t. } z\in Z$$
with feasible set
$$Z=\{z\in\mathbb R^n:\ g(z,t)\le 0\ \text{for all } t\in B\},$$
and
$$B=\{t\in\mathbb R^m:\ h_j(t)\le 0,\ j\in J\},$$
J a finite index set, $F\in C^2(\mathbb R^n,\mathbb R)$, $g\in C^2(\mathbb R^n\times\mathbb R^m,\mathbb R)$, $h_j\in C^2(\mathbb R^m,\mathbb R)$, $j\in J$. We assume that B is compact.
Later on we consider generalized semi-infinite problems GSIP, where the index set $B=B(z)$ depends on z, i.e., $B(z)=\{t\in\mathbb R^m:\ h_j(z,t)\le 0,\ j\in J\}$ with given $h_j\in C^2(\mathbb R^n\times\mathbb R^m,\mathbb R)$, $j\in J$. In this case we assume that the range of the set-valued mapping B remains in a compact set $B_0\subset\mathbb R^m$, i.e., $B(z)\subset B_0$ for all $z\in\mathbb R^n$.

As in finite optimization, second order optimality conditions for SIP are an important tool for answering a number of theoretical and practical questions. First order optimality conditions may again be given by a system of nonlinear equations, the so-called Karush-Kuhn-Tucker equations. Strong sufficient second order conditions then imply that this system is regular, and this regularity allows one to answer questions about the stability of solutions, the dependence of solutions on problem parameters, and also the superlinear convergence of some classes of numerical methods (cf. [3] and the references therein). We give the conditions in a form suitable for such applications, following the 'reduction approach' as in [3] and [4].

In the sequel, the parametric optimization problem
$$Q(z):\qquad\max_t\ g(z,t)\quad\text{s.t. } t\in B$$
will play an important role. For $\bar z\in Z$ we define the active index set
$$E(\bar z)=\{t\in B:\ g(\bar z,t)=0\}.$$
Obviously, any point $\bar t$ in $E(\bar z)$ is a (global) maximum of $Q(\bar z)$. Throughout, we assume that $E(\bar z)$ is nonempty. (If $E(\bar z)=\emptyset$, then by the assumptions above the point $\bar z$ is an interior point of Z, and the optimality conditions coincide with the optimality conditions in unconstrained optimization.)

The row vectors of partial derivatives of g with respect to z and t will be denoted by $g_z$, $g_t$. For a function $v:\mathbb R^n\to\mathbb R$ we denote the directional derivative of v at $\bar z$ in the direction $\xi\in\mathbb R^n$ by $Dv(\bar z;\xi)$,
$$Dv(\bar z;\xi)=\lim_{\tau\to 0+}\frac{v(\bar z+\tau\xi)-v(\bar z)}{\tau}.$$
If the derivative $v_z$ is itself directionally differentiable in the direction ξ, we put $D^2v(\bar z;\xi):=Dv_z(\bar z;\xi)\,\xi$.
For brevity, we do not consider vector-valued functions g or equality constraints in the definition of Z and B.
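Value functions such as $z\mapsto\max_t g(z,t)$ are typically only directionally differentiable, which is why $Dv(\bar z;\xi)$ is used above. A minimal Python sketch of the difference quotient in the definition, applied to the hypothetical nonsmooth example $v(z)=\max(z_1,z_2)$; the step size `tau` is an illustrative choice.

```python
def directional_derivative(v, z, xi, tau=1e-7):
    """Forward difference quotient (v(z + tau*xi) - v(z)) / tau, approximating Dv(z; xi)."""
    z_shift = [zi + tau * xii for zi, xii in zip(z, xi)]
    return (v(z_shift) - v(z)) / tau

# v(z) = max(z1, z2) is not differentiable at 0, but has directional derivatives
# Dv(0; xi) = max(xi1, xi2) in every direction.
v = lambda z: max(z[0], z[1])
print(directional_derivative(v, [0.0, 0.0], [1.0, -1.0]))   # Dv(0; xi)
print(directional_derivative(v, [0.0, 0.0], [-1.0, 1.0]))   # Dv(0; -xi)
```

Both values equal 1, so $Dv(0;\xi)+Dv(0;-\xi)\ne 0$: the directional derivative is not linear in ξ, i.e., v has no gradient at 0.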


The Parametric Problem Q(z)

In this section we prepare the 'reduction approach' for obtaining optimality conditions for SIP. We briefly outline a stability result (cf. [5]) for the parametric problem (applying to SIP and GSIP as well)
$$Q(z):\qquad\max_t\ g(z,t)\quad\text{s.t. } t\in B(z),$$
$B(z):=\{t:\ h_j(z,t)\le 0,\ j\in J\}$. Let $\bar z$ and a point $\bar t$ which is feasible for $Q(\bar z)$ be given. We define the active index set of $Q(\bar z)$ as
$$J(\bar z,\bar t)=\{j\in J:\ h_j(\bar z,\bar t)=0\}.$$
We say that the linear independence constraint qualification (LICQ) holds if the vectors
$$\nabla_t h_j(\bar z,\bar t),\quad j\in J(\bar z,\bar t),$$
are linearly independent. The (weaker) Mangasarian-Fromovitz constraint qualification (MFCQ) is said to hold at $\bar t$ for $Q(\bar z)$ if there exists a vector $\eta\in\mathbb R^m$ such that
$$\nabla_t h_j(\bar z,\bar t)\,\eta<0,\quad j\in J(\bar z,\bar t).$$
Let a solution $\bar t$ of $Q(\bar z)$ be given. If the MFCQ is satisfied at $\bar t$, then necessarily the Kuhn-Tucker condition is fulfilled, i.e., there exists a multiplier vector $\gamma\in\mathbb R^{|J(\bar z,\bar t)|}$ such that
$$L^t_t(\bar z,\bar t,\gamma)=0,\qquad\gamma\ge 0,\qquad(1)$$
where $L^t$ denotes the Lagrange function of $Q(\bar z)$ at $(\bar z,\bar t)$,
$$L^t(z,t,\gamma)=g(z,t)-\sum_{j\in J(\bar z,\bar t)}\gamma_jh_j(z,t).$$
We say that strict complementary slackness (SCS) holds if the multiplier $\bar\gamma$ in (1) satisfies
$$\bar\gamma_j>0,\quad j\in J(\bar z,\bar t).$$
Let us introduce the following second order sufficient optimality condition for $Q(\bar z)$:

$A_{\rm red}$) Let a solution $\bar t$ of $Q(\bar z)$ be given. Assume that LICQ is satisfied. (Under LICQ there is a unique multiplier $\bar\gamma$ such that (1) is fulfilled.) We assume that the strong second order condition (SSOC) holds,
$$\eta^\top L^t_{tt}(\bar z,\bar t,\bar\gamma)\,\eta<0\quad\text{for all }\eta\in T(\bar z,\bar t)\setminus\{0\},$$
where $T(\bar z,\bar t)$ is the subspace $T(\bar z,\bar t)=\{\eta\in\mathbb R^m:\ \nabla_t h_j(\bar z,\bar t)\,\eta=0,\ j\in J(\bar z,\bar t)\}$.

Theorem 1 Let $\bar t$ be a solution of $Q(\bar z)$ such that $A_{\rm red}$ is satisfied. Then there exist neighborhoods $U(\bar z)$ of $\bar z$ and $V(\bar t)$ of $\bar t$ and mappings $t:U(\bar z)\to V(\bar t)$, $\gamma:U(\bar z)\to\mathbb R^{|J(\bar z,\bar t)|}$ such that:

• $t(\bar z)=\bar t$,

• $\gamma(\bar z)=\bar\gamma$,

• for all $z\in U(\bar z)$ the value $t(z)$ is the unique local solution of $Q(z)$ in $V(\bar t)\cap B(z)$, with corresponding multiplier $\gamma(z)$, such that LICQ and SSOC are satisfied.


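The functions t, γ are Lipschitz continuous and directionally differentiable. The value function
$$G(z)=g(z,t(z))$$
is continuously differentiable in $U(\bar z)$, and the derivative $G_z$ is Lipschitz continuous and directionally differentiable with:
$$G_z(z)=g_z(z,t(z))-\sum_{j\in J(\bar z,\bar t)}\gamma_j(z)\,\nabla_z h_j(z,t(z))=L^t_z(z,t(z),\gamma(z)),\qquad(2)$$
$$\begin{aligned}D^2G(z;\xi)={}&\xi^\top L^t_{zz}(z,t(z),\gamma(z))\,\xi-Dt(z;\xi)^\top L^t_{tt}(z,t(z),\gamma(z))\,Dt(z;\xi)\\&-2\sum_{j\in J(\bar z,\bar t)}D\gamma_j(z;\xi)\,\nabla_z h_j(z,t(z))\,\xi.\end{aligned}\qquad(3)$$
The directional derivatives $Dt(z;\xi)$ are given as the unique solutions η of the quadratic program
$$P(\xi):\quad\max_\eta\ \Phi(\xi,\eta)\quad\text{s.t. }\ \nabla_z h_j(z,t(z))\,\xi+\nabla_t h_j(z,t(z))\,\eta\ \begin{cases}=0&\text{if }\gamma_j(z)>0,\\ \le 0&\text{if }\gamma_j(z)=0,\end{cases}\quad j\in J(\bar z,\bar t),$$
where
$$\Phi(\xi,\eta)=\tfrac12\,\eta^\top L^t_{tt}(z,t(z),\gamma(z))\,\eta+\eta^\top L^t_{tz}(z,t(z),\gamma(z))\,\xi.$$
The directional derivatives $D\gamma(z;\xi)$ are given by the unique multipliers $\beta(\xi)$ of $P(\xi)$: $D\gamma(z;\xi)=\beta(\xi)$.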
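Remark 2
a) For SIP, i.e., if the functions $h_j$ do not depend on z, the derivatives of G and the problem $P(\xi)$ take the simpler form:
$$G_z(z)=g_z(z,t(z)),$$
$$D^2G(z;\xi)=\xi^\top g_{zz}(z,t(z))\,\xi-Dt(z;\xi)^\top L^t_{tt}(z,t(z),\gamma(z))\,Dt(z;\xi),$$
and
$$P(\xi):\quad\max_\eta\ \tfrac12\,\eta^\top L^t_{tt}(z,t(z),\gamma(z))\,\eta+\eta^\top g_{tz}(z,t(z))\,\xi\quad\text{s.t. }\ \nabla h_j(t(z))\,\eta\ \begin{cases}=0&\text{if }\gamma_j(z)>0,\\ \le 0&\text{if }\gamma_j(z)=0,\end{cases}\quad j\in J(\bar z,\bar t).$$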


b) If in $A_{\rm red}$, in addition to LICQ and SSOC, we assume SCS, then the functions t, γ are continuously differentiable and G is twice continuously differentiable. Consequently, in (3) we can write $Dt(z;\xi)=t_z(z)\,\xi$, $D\gamma(z;\xi)=\gamma_z(z)\,\xi$.
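The formulas of Remark 2a can be checked numerically on a toy instance. The sketch below assumes the hypothetical function $g(z,t)=zt-t^2$ on $B=[-10,10]$, for which $t(z)=z/2$ is an interior maximizer (no active $h_j$, so $\gamma=0$ and $L^t_{tt}=g_{tt}=-2$). It computes $G(z)=\max_t g(z,t)$ by golden-section search and compares finite differences with $G_z=g_z(z,t(z))=t(z)$ and with $D^2G(z;1)=0-\tfrac12(-2)\tfrac12=\tfrac12$. All names are illustrative.

```python
import math

def golden_max(f, a, b, tol=1e-10):
    """Maximize a unimodal function f on [a, b] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:                  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:                         # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def g(z, t):
    """Hypothetical lower-level function; for fixed z it is maximized at t(z) = z/2."""
    return z * t - t ** 2

def t_of(z):
    return golden_max(lambda t: g(z, t), -10.0, 10.0)

def G(z):
    """Value function G(z) = max over t in B = [-10, 10] of g(z, t)."""
    return g(z, t_of(z))

z, h = 1.5, 1e-3
dG_fd = (G(z + h) - G(z - h)) / (2 * h)                # numerical G_z
dG_formula = t_of(z)                                   # g_z(z, t(z)) = t(z) here
d2G_fd = (G(z + h) - 2 * G(z) + G(z - h)) / h ** 2     # numerical D^2 G(z; 1)
# Remark 2a with xi = 1: xi g_zz xi - Dt^T L^t_tt Dt, with g_zz = 0, L^t_tt = -2, Dt = 1/2
d2G_formula = 0.0 - 0.5 * (-2.0) * 0.5
print(dG_fd, dG_formula, d2G_fd, d2G_formula)
```

Although $g_{zz}=0$ here, the value function has $D^2G=\tfrac12>0$; the entire second order information comes from the term $-Dt^\top L^t_{tt}Dt$, which records how the maximizer $t(z)$ moves with z.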


The Reduced Problem 


Let $\bar z$ be feasible for SIP. Assuming that $A_{\rm red}$ is satisfied for all solutions $\bar t$ of $Q(\bar z)$ (i.e., all $\bar t\in E(\bar z)$) and applying Theorem 1, the problem SIP can locally be reduced to an equivalent finite optimization problem.

Theorem 3 Let $\bar z\in Z$ be given such that $A_{\rm red}$ is satisfied for all $\bar t\in E(\bar z)$. Then the set $E(\bar z)$ is finite, $E(\bar z)=\{\bar t^1,\dots,\bar t^r\}$, and there is a neighborhood $U(\bar z)$ of $\bar z$ such that, with the functions
$$G^\ell(z):=g(z,t^\ell(z)),\quad\ell=1,\dots,r,$$
where the $t^\ell$ are the functions of Theorem 1 corresponding to the solutions $\bar t^\ell$ of $Q(\bar z)$, the following holds: $z\in U(\bar z)$ is a local maximizer of SIP if and only if z is a local maximizer of the so-called reduced problem
$$\text{SIP}_{\rm red}(\bar z):\qquad\max F(z)\quad\text{s.t. } z\in U(\bar z),\ G^\ell(z)\le 0,\ \ell=1,\dots,r.$$




By Theorem 1, the functions $G^\ell$ are continuously differentiable in $U(\bar z)$, and $G^\ell_z$ is Lipschitz continuous and directionally differentiable.


Optimality Conditions for SIP 


We consider SIP and assume that, for $\bar z\in Z$, the regularity conditions of Theorem 3 hold. Then, in a neighborhood $U(\bar z)$ of $\bar z$, the problem SIP is equivalent to the finite problem $\text{SIP}_{\rm red}(\bar z)$. Let L denote the (F. John type) Lagrangian of $\text{SIP}_{\rm red}(\bar z)$,
$$L(z,\lambda)=\lambda_0F(z)-\sum_{\ell=1}^r\lambda_\ell G^\ell(z),$$
with $G^\ell(z)=g(z,t^\ell(z))$, and let K be the cone of critical directions for $\text{SIP}_{\rm red}(\bar z)$ at $\bar z$,
$$K=\{\xi\in\mathbb R^n:\ F_z(\bar z)\,\xi\ge 0,\ g_z(\bar z,\bar t^\ell)\,\xi\le 0,\ \ell=1,\dots,r\}.$$
The following dual optimality conditions hold.


Theorem 4 Let $\bar z$ be feasible for SIP such that the assumptions of Theorem 3 are satisfied. Then we have:

a) (Necessary conditions) Suppose $\bar z$ is a local maximizer of SIP. Then, for any $\xi\in K$ there exist multipliers $\lambda_0(\xi),\dots,\lambda_r(\xi)\ge 0$, not all zero, such that (with the expressions for $S_{\rm SIP}(\xi)$, $Q_{\rm SIP}(\xi)$ below)
$$S_{\rm SIP}(\xi)=0\quad\text{and}\quad Q_{\rm SIP}(\xi)\le 0.$$
The multipliers $\lambda_\ell(\xi)$ can be chosen 0 if $g_z(\bar z,\bar t^\ell)\,\xi<0$, and $\lambda_0(\xi)$ can be chosen 0 if $F_z(\bar z)\,\xi>0$.

b) (Sufficient conditions) Suppose that for any $\xi\in K\setminus\{0\}$ there exist multipliers $\lambda_0(\xi),\dots,\lambda_r(\xi)\ge 0$ such that
$$S_{\rm SIP}(\xi)=0\quad\text{and}\quad Q_{\rm SIP}(\xi)<0.$$
Then $\bar z$ is a strict local maximizer of SIP.
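The expressions for $S_{\rm SIP}(\xi)$, $Q_{\rm SIP}(\xi)$ are:
$$S_{\rm SIP}(\xi)=L_z(\bar z,\lambda(\xi))=\lambda_0(\xi)F_z(\bar z)-\sum_{\ell=1}^r\lambda_\ell(\xi)\,g_z(\bar z,\bar t^\ell),\qquad(4)$$
$$\begin{aligned}Q_{\rm SIP}(\xi)={}&\xi^\top\Big(\lambda_0(\xi)F_{zz}(\bar z)-\sum_{\ell=1}^r\lambda_\ell(\xi)\,g_{zz}(\bar z,\bar t^\ell)\Big)\xi\\&+\sum_{\ell=1}^r\lambda_\ell(\xi)\,Dt^\ell(\bar z;\xi)^\top L^t_{tt}(\bar z,\bar t^\ell,\bar\gamma^\ell)\,Dt^\ell(\bar z;\xi).\end{aligned}\qquad(5)$$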

The proof of Theorem 4 follows by applying F. John type optimality conditions of finite optimization to $\text{SIP}_{\rm red}(\bar z)$ (conditions modified to deal with functions whose derivatives are Lipschitz continuous and directionally differentiable, cf. [4]). This leads to the following primal necessary conditions (under the assumptions of Theorem 4): Let $\bar z$ be a local solution of SIP. Then, for any $\xi\in K$ the following system has no solution $d\in\mathbb R^n$:
$$\begin{aligned}&g_z(\bar z,\bar t^\ell)\,d+\xi^\top g_{zz}(\bar z,\bar t^\ell)\,\xi-Dt^\ell(\bar z;\xi)^\top L^t_{tt}(\bar z,\bar t^\ell,\bar\gamma^\ell)\,Dt^\ell(\bar z;\xi)<0,\quad\bar t^\ell\in E(\bar z;\xi),\\&F_z(\bar z)\,d+\xi^\top F_{zz}(\bar z)\,\xi>0,\quad\text{if }F_z(\bar z)\,\xi=0,\end{aligned}\qquad(6)$$
where $E(\bar z;\xi)=\{\bar t^\ell\in E(\bar z):\ g_z(\bar z,\bar t^\ell)\,\xi=0\}$. An appropriate theorem of the alternative implies Theorem 4a. Correspondingly, the sufficient condition in Theorem 4b is equivalent to the nonsolvability of the system (6) with the strict signs replaced by weak signs, $E(\bar z;\xi)$ replaced by $E(\bar z)$, and 'if $F_z(\bar z)\xi=0$' omitted.

In the following remark we discuss optimality conditions under stronger as well as under weaker regularity assumptions.

Remark 5 Let $\bar z$ be feasible for SIP.

a) If in $A_{\rm red}$, in addition to LICQ and SSOC, we assume SCS, then (cf. Remark 2b) the functions $G^\ell$ in the reduced problem $\text{SIP}_{\rm red}(\bar z)$ are twice continuously differentiable, and in Theorem 4 the term $Q_{\rm SIP}(\xi)$ can be replaced by
$$\xi^\top\Big(\lambda_0(\xi)F_{zz}(\bar z)-\sum_{\ell=1}^r\lambda_\ell(\xi)\,g_{zz}(\bar z,\bar t^\ell)+\sum_{\ell=1}^r\lambda_\ell(\xi)\,t^\ell_z(\bar z)^\top L^t_{tt}(\bar z,\bar t^\ell,\bar\gamma^\ell)\,t^\ell_z(\bar z)\Big)\xi.\qquad(7)$$
Such optimality conditions have been derived in [2]. The idea of the 'reduction approach' goes back to [12], where, in addition, it is assumed that LICQ holds, i.e., the vectors $g_z(\bar z,\bar t^\ell)\ (=G^\ell_z(\bar z))$, $\ell=1,\dots,r$, are linearly independent. Then in the terms S and Q we have $\lambda_0=1$ (Kuhn-Tucker condition) and the multipliers $\lambda_1,\dots,\lambda_r$ are unique (independent of ξ), i.e., the second order condition requires the matrix between parentheses in (7) to be negative (semi)definite on K.
b) In [3, Thm. 5.4] a sufficient optimality condition similar to Theorem 4b is proven which does not make use of a reduction assumption. It is based on the idea of replacing, in (5), the directional derivative $Dt^\ell(\bar z;\xi)$ by a solution of the quadratic problem $P^\ell(\xi)$, and it assumes that LICQ holds at all points $\bar t\in E(\bar z)$.

c) In [7,8] H. Kawasaki has given necessary and sufficient optimality conditions under weaker regularity assumptions and without assuming differentiability of g with respect to t. The expressions
$$Dt^\ell(\bar z;\xi)^\top L^t_{tt}(\bar z,\bar t^\ell,\bar\gamma^\ell)\,Dt^\ell(\bar z;\xi),\quad\ell=1,\dots,r,\qquad(8)$$
are replaced by terms $\Omega_\ell(\bar t^\ell,\xi)$ which describe a second order approximation of the feasible set Z at $\bar z$ in the direction ξ. The necessary dual conditions (cf. [8, Thm. 3.1]) make no regularity assumptions on $Q(\bar z)$. A similar sufficient condition without a gap (apart from the strict sign in the second order condition) is obtained in [8, Thm. 4.1] under the assumption that B is convex and under conditions on the behavior of certain functions such as $g(\bar z,t)$ which, in particular, imply that the set $E(\bar z;\xi)$ is finite. In [8] it is shown that for the special case $B=[0,1]$, under the regularity conditions of Theorem 4, the term $\Omega_\ell(\bar t^\ell,\xi)$ coincides with (8). These terms (often called 'shift terms') reflect the 'semi-infinite structure' inasmuch as they describe the dependence of $E(z)$ on z.
d) In [1] the 'shift terms' are given in the form
$$\sigma(\Lambda(\xi),\tau(\xi)),\qquad(9)$$
where Λ is a multiplier function (in the space of measures), $\tau(\xi)$ describes a second order approximation of the feasible set, and σ denotes the support function. There is a gap between the necessary and the sufficient conditions (cf. [1, Thms. 3.1, 3.2]) inasmuch as (apart from the strict sign) in the sufficient condition the term (9) is given with a set $\tau(\xi)$ which can be strictly larger than the corresponding set in the necessary condition. However, under the following assumptions there is no gap: $g(\cdot,t)$ is twice differentiable, $g_{zz}$ is continuous on $\mathbb R^n\times B$, $g(\bar z,\cdot)$ is twice continuously differentiable, $g_z(\bar z,\cdot)$ is continuously differentiable; the constraint qualification (CQ) is fulfilled, i.e., there exists ξ such that $g_z(\bar z,t)\,\xi<0$, $t\in E(\bar z)$; $g(\bar z,t)$ satisfies a certain second order growth condition; and B and $E(\bar z)$ are smooth (compact) manifolds.


Optimality Conditions for GSIP 


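We briefly outline optimality conditions for GSIP. The statements of Theorems 3 and 4 remain true for the generalized problem, with the only modification that, due to the dependence of B (i.e., of the $h_j$) on z, the expressions for $S(\xi)$ and $Q(\xi)$ (cf. Theorem 4) contain extra terms (all terms in (2), (3)). Consequently, Theorem 4 holds for GSIP if the expressions for $S_{\rm SIP}(\xi)$ and $Q_{\rm SIP}(\xi)$ in (4) and (5) are replaced by:
$$S_{\rm GSIP}(\xi)=S_{\rm SIP}(\xi)+\sum_{\ell=1}^r\lambda_\ell(\xi)\sum_{j\in J(\bar z,\bar t^\ell)}\bar\gamma^\ell_j\,\nabla_z h_j(\bar z,\bar t^\ell),$$
$$Q_{\rm GSIP}(\xi)=Q_{\rm SIP}(\xi)+\sum_{\ell=1}^r\lambda_\ell(\xi)\sum_{j\in J(\bar z,\bar t^\ell)}\Big(\bar\gamma^\ell_j\,\xi^\top\nabla^2_{zz}h_j(\bar z,\bar t^\ell)\,\xi+2\,D\gamma^\ell_j(\bar z;\xi)\,\nabla_z h_j(\bar z,\bar t^\ell)\,\xi\Big).$$
We give some information on other optimality conditions for GSIP.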


Remark 6 Under the additional assumption SCS in $A_{\rm red}$, as in the SIP case in Remark 5a, the formula for $Q_{\rm GSIP}(\xi)$ simplifies. Corresponding optimality conditions are implicitly contained in [10] (see also [11]).

A second order sufficient condition not based on a reduction (as in Remark 5b) is to be found in [4, Thm. 5.1]. For first order necessary conditions under weak regularity assumptions, see [6] and [9]. In the first paper no regularity assumption on $Q(\bar z)$ is made, whereas the latter assumes that MFCQ is fulfilled for every $\bar t\in E(\bar z)$.
Duality of the Linear SIP Problem


Linear semi-infinite programming is a natural next level of extension of elementary finite linear programming, in which finitely many variables occur in infinitely many linear inequalities. Some of the earliest papers were by A. Charnes, W.W. Cooper, and K.O. Kortanek [3,4,5,6]. More recent surveys by S.-A. Gustafson and R. Hettich were co-authored with Kortanek, see [8,11], and several symposium volumes on semi-infinite programming have appeared since the late 1970s [7,10].

In this paper we begin with the primal program, denoted by Program I, which makes the analogy to a primal finite linear program apparent.


Program I. Let T be a nonempty set, let $u=(u_1,\dots,u_{n+1})$ be real-valued functions on T, and let $b\in\mathbb R^n$. Find $v_{\rm I}=\sup y^\top b$ from among $y\in\mathbb R^n$ such that $y^\top u(t)\le u_{n+1}(t)$, $\forall t\in T$, where $u(t)=(u_1(t),\dots,u_n(t))^\top$.

Clearly, if we choose a finite subset $\{t_1,\dots,t_k\}$ of T, then we obtain a finite LP approximation to Program I, namely,

Program $\text{I}_k$. Find $v_{{\rm I}_k}=\max y^\top b$ subject to $y\in\mathbb R^n$ and $y^\top u(t_i)\le u_{n+1}(t_i)$, $i=\overline{1,k}$.

Remark 1 Without sufficient regularity conditions, Program $\text{I}_k$ may be a very bad approximation to Program I, and possibly useless. Here we also use the typical Russian notation $i=\overline{1,m}$ for $i=1,\dots,m$.

Constructing the dual of a finite LP is a pleasant task, and in this case it yields the following:

Program $\text{II}_k$. Find $v_{{\rm II}_k}=\min\sum_{i=1}^k u_{n+1}(t_i)\lambda_i$ with $\lambda_i\ge 0$, $i=\overline{1,k}$, such that $\sum_{i=1}^k u(t_i)\lambda_i=b$.
Now any finite LP approximation depends on a selection of a finite subset of the given SIP index set T, and of course there are infinitely many such choices. Any attempt to recover the given infinite problem by this procedure requires that we at least allow the finite subsets of T to be freely chosen. In the dual program this is formulated mathematically with generalized finite sequences, a particular function space. There are close connections between this construction and G.B. Dantzig's concept of generalized linear programming with variable coefficients, and some of these were explored in a paper by Kortanek in the 1970s [12].

Program II. Find $v_{\rm II}=\inf\sum_{t\in{\rm supp}\,\lambda}u_{n+1}(t)\lambda(t)$ from among $\lambda\in\mathbb R^{(T)}$, $\lambda(\cdot)\ge 0$, such that $\sum_{t\in{\rm supp}\,\lambda}u(t)\lambda(t)=b$, where $\lambda\in\mathbb R^{(T)}$ are generalized finite sequences, namely ${\rm supp}\,\lambda=\{t:\ \lambda(t)\ne 0\}$ is a finite set.


From a probabilist's point of view Program II is equivalent to the following:

Program $\text{II}_B$. Find $v_{{\rm II}_B}=\inf\int_T u_{n+1}\,d\alpha$, where α ranges over nonnegative Borel measures on a Borel set $T\subset\mathbb R^n$, subject to $\int_T u_i\,d\alpha=b_i$, $i=\overline{1,n}$, and where the $u_i$ are Borel integrable functions, see [9].

Using algebraic manipulations analogous to those of finite linear programming, one can verify that if y and λ are feasible for Programs I and II, respectively, then
$$y^\top b\le\sum_{s\in{\rm supp}\,\lambda}u_{n+1}(s)\lambda(s).$$
When both programs are feasible, it follows that
$$v_{\rm I}\le v_{\rm II}.\qquad(1)$$


Termed the duality inequality, (1) becomes an equality when certain regularity conditions hold. The most widespread sufficient condition is that the functions $u(\cdot)$ are continuous on a compact set S and the Slater condition holds, namely there exists y such that $y^\top u(t)<u_{n+1}(t)$, $\forall t\in S$. This condition is also termed superconsistency, and it is defined for the dual Program II in a different way. As reviewed in [11, Thm. 6.11], a program is superconsistent if and only if its dual program has bounded level sets. Bounded level sets are necessary and sufficient to overcome the 'danger', noted in Remark 1 below Program $\text{I}_k$, of using any one finite linear program as an approximation to a linear semi-infinite program.
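To make the duality inequality (1) and the Slater condition concrete, consider the hypothetical instance $T=[0,1]$, $u(t)=(1,t)$, $u_3(t)=e^t$, $b=(1,\tfrac12)$. Here $y=0$ satisfies the Slater condition, and by convexity of $e^t$ every tangent line is feasible for Program I. The tangent at $t_0=\tfrac12$ and the unit point mass at $t_0$ give equal objective values, so by (1) both are optimal and $v_{\rm I}=v_{\rm II}=e^{1/2}$. The Python sketch below checks primal feasibility only on a finite grid, which, as Remark 1 warns, is in general merely a necessary condition; all names are illustrative.

```python
import math

b = (1.0, 0.5)                                   # right-hand side b in R^2

def u(t):
    """u(t) = (u_1(t), u_2(t)) = (1, t)  (hypothetical example data)."""
    return (1.0, t)

def u_rhs(t):
    """u_{n+1}(t) = exp(t)."""
    return math.exp(t)

grid = [k / 1000 for k in range(1001)]           # finite check of the constraints on T = [0, 1]

def primal_feasible(y, tol=1e-12):
    """Check y^T u(t) <= u_{n+1}(t) on the grid (a necessary condition only)."""
    return all(y[0] * u(t)[0] + y[1] * u(t)[1] <= u_rhs(t) + tol for t in grid)

def dual_feasible(lam, tol=1e-12):
    """lam: dict t -> lambda(t); check lambda >= 0 and sum_t u(t) lambda(t) = b."""
    moments = [sum(u(t)[i] * w for t, w in lam.items()) for i in range(2)]
    return (all(w >= 0 for w in lam.values())
            and all(abs(m - bi) <= tol for m, bi in zip(moments, b)))

def primal_value(y):
    return y[0] * b[0] + y[1] * b[1]

def dual_value(lam):
    return sum(u_rhs(t) * w for t, w in lam.items())

t0 = 0.5
y_tangent = (math.exp(t0) * (1.0 - t0), math.exp(t0))   # tangent line of exp at t0
lam_point = {t0: 1.0}                                    # unit mass at t0 (finite support)
print(primal_value(y_tangent), dual_value(lam_point))    # equal values: no duality gap
print(primal_value((1.0, 1.0)))                          # a worse feasible y (tangent at 0)
```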


In general, without a constraint qualification a duality gap $v_{\rm I}<v_{\rm II}$ can occur.


Example 2 
2 
vm sup {-y1: = ve yot ta Vte [0, 1]} ‘ 
Here $v_{\rm I}=-1$, but $v_{\rm II}=+\infty$, reflecting the inconsistency of the dual program.


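During the 1970s, affine perturbations of the b-vector of the form $b+\varepsilon v$, where $\varepsilon>0$, $v\in\mathbb R^n$, were constructed, leading naturally to the following dual programs:
$$\text{I}_\varepsilon:\quad v_{{\rm I}_\varepsilon}=\sup y^\top(b+\varepsilon v)\quad\text{s.t. } y^\top u(t)\le u_{n+1}(t),\ \forall t\in T,$$
and
$$\text{II}_\varepsilon:\quad v_{{\rm II}_\varepsilon}=\inf\sum_{t\in{\rm supp}\,\lambda}u_{n+1}(t)\lambda(t)\quad\text{from among }\lambda\in\mathbb R^{(T)},\ \lambda(\cdot)\ge 0,\ \text{s.t. }\sum_{t\in{\rm supp}\,\lambda}u(t)\lambda(t)=b+\varepsilon v.$$
The idea was that $\text{II}_\varepsilon$ would suggest a Program $\text{II}^*$ in perfect duality with Program I. To accomplish this, certain properties were required of the vector v used in the perturbation.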


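Definition 3. A vector $v\in\mathbb R^n$ that satisfies the following property is termed a descent vector for Program I: given any $\varepsilon>0$, there exists a generalized finite sequence $\lambda^\varepsilon(\cdot)$ which is feasible for Program $\text{II}_\varepsilon$ such that, in addition,
$$\sum_t u_{n+1}(t)\lambda^\varepsilon(t)\le v_{\rm I}.\qquad(2)$$

To see how a descent vector is used, let $\delta>0$ be otherwise arbitrary. Let $y^{(\delta)}$ be I-feasible such that
$$v_{\rm I}\ge b^\top y^{(\delta)}>v_{\rm I}-\frac{\delta}{2}.\qquad(3)$$
Let $\varepsilon<\delta/(2|y^{(\delta)\top}v|)$, where without loss of generality $y^{(\delta)\top}v\ne 0$. By Definition 3 there exists $\lambda^\varepsilon(\cdot)$ which is $\text{II}_\varepsilon$-feasible. Hence, by the duality inequality (applied to $\text{I}_\varepsilon$ and $\text{II}_\varepsilon$),
$$\sum_t u_{n+1}(t)\lambda^\varepsilon(t)\ge y^{(\delta)\top}b+\varepsilon\,y^{(\delta)\top}v.\qquad(4)$$
Combining (2)-(4) yields
$$v_{\rm I}\ge\sum_t u_{n+1}(t)\lambda^\varepsilon(t)\ge v_{\rm I}-\frac{\delta}{2}-\frac{\delta}{2}=v_{\rm I}-\delta.\qquad(5)$$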


The interpretation of (5) is that we can obtain a solution of $\text{II}_\varepsilon$ with objective function value as close to $v_{\rm I}$ as desired. The formal perfect dual to Program I is the following one.

Program $\text{II}^*$. $\inf w$ from $w\in\mathbb R$, $(v,v_{n+1})\in\mathbb R^{n+1}$ which satisfy the following constraints: given any $\varepsilon>0$, $\exists\delta$, $0<\delta<\varepsilon$, such that
$$\sum_{t\in{\rm supp}\,\mu}u_i(t)\mu(t)=b_i+\delta v_i,\quad i=\overline{1,n},$$
$$\sum_{t\in{\rm supp}\,\mu}u_{n+1}(t)\mu(t)-w\le\delta v_{n+1}$$
has a solution $\mu\in\mathbb R^{(T)}_+$.

Rather than pursue duality theory in depth, we turn to newly discovered relationships between LSIP and semidefinite programming (SDP), see [14,15,18,19].


Dual Semidefinite Programs 
We begin with the following standard primal SDP program:
$$\text{P}:\qquad\sup c^\top x\quad\text{s.t. }\sum_{i=1}^m x_iQ_i\preceq Q_0,\qquad(6)$$
where $Q_i$, $i=\overline{0,m}$, are $n\times n$ symmetric matrices.

Remark 4 It is convenient to denote the value of a program (P) by v(P), and the value of a particular feasible point/solution, say x, simply by v(x).

The dual program is
$$\text{D}:\qquad\inf U\bullet Q_0\quad\text{s.t. } U\bullet Q_i=c_i,\ i=\overline{1,m},\quad U\succeq 0.\qquad(7)$$

For Programs P and D, we have the following weak duality lemma, also termed the duality inequality.

Lemma 5 If U is D-feasible and x is P-feasible, then $v(x)\le v(U)$.


Now let $V_j$, $j=\overline{1,\bar n}$, be an orthonormal basis for the linear space $S^n$ of $n\times n$ symmetric matrices, $\bar n=\frac12n(n+1)$, with inner product $U\bullet W=\langle U,W\rangle={\rm trace}\,UW$, where $U,W\in S^n$ and UW denotes ordinary, dimension compatible matrix multiplication. Let
$$Q_0=\sum_{j=1}^{\bar n}b_jV_j=b^\top V,\qquad Q_i=\sum_{j=1}^{\bar n}a_{ij}V_j,\quad A=(a_{ij})_{m\times\bar n},\qquad U=\sum_{j=1}^{\bar n}y_jV_j=y^\top V.$$

Definition 6 The convex cone $K\subset\mathbb R^{\bar n}$ is defined with respect to the given orthonormal basis by:
$$z=\{z_j\}\in K\quad\text{if and only if}\quad z^\top V\succeq 0.$$

The dual cone is conveniently expressed as $K^*=\{s\in\mathbb R^{\bar n}:\ s^\top K\subset\mathbb R_+\}$, where $\mathbb R_+$ is the set of nonnegative reals. Here, $K=K^*$. We also use the terminology appearing in [16, Sec. 1.5.1], namely:
$$Q^*:S^n\to\mathbb R^m,\quad Q^*(U)=(U\bullet Q_i)_{i=\overline{1,m}},$$
$$\bar Q^*:S^n\to\mathbb R^{m+1},\quad\bar Q^*(U)=(Q_0\bullet U,\,Q^*(U))^\top.$$
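One concrete choice of the orthonormal basis $V_j$ (an assumption for illustration; the text allows any orthonormal basis) consists of the matrices $E_{ii}$ and $(E_{ij}+E_{ji})/\sqrt2$, $i<j$. With it, the coordinates of a matrix are its inner products with the basis, and $Q_i\bullet U=\sum_j a_{ij}y_j=(Ay)_i$. A minimal Python sketch with hypothetical helper names:

```python
import math

def sym_basis(n):
    """Orthonormal basis of S^n w.r.t. U . W = trace(UW): E_ii and (E_ij + E_ji)/sqrt(2), i < j."""
    basis = []
    for i in range(n):
        for j in range(i, n):
            V = [[0.0] * n for _ in range(n)]
            if i == j:
                V[i][i] = 1.0
            else:
                V[i][j] = V[j][i] = 1.0 / math.sqrt(2.0)
            basis.append(V)
    return basis

def inner(U, W):
    """U . W = trace(UW) = sum_ij U_ij W_ji."""
    n = len(U)
    return sum(U[i][j] * W[j][i] for i in range(n) for j in range(n))

def coords(U, basis):
    """Coordinates y with U = sum_j y_j V_j (valid because the basis is orthonormal)."""
    return [inner(U, V) for V in basis]

B3 = sym_basis(3)
print(len(B3))          # n_bar = n(n+1)/2 basis matrices
```

Because the basis is orthonormal, the inner product of two symmetric matrices equals the Euclidean inner product of their coordinate vectors; this is what makes K self-dual, $K=K^*$.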


Perfect Duality from the View of Linear Semi-Infinite Programming


Program (P) is equivalent to the following LSIP:
$$\sup c^\top x\quad\text{s.t. }\sum_{i=1}^m x_iu_i(t)\le u_0(t),\ \forall t\in B,\qquad B=\{t\in\mathbb R^n:\ \|t\|=1\},\qquad(8)$$
where $u_i(t)=t^\top Q_it$, $i=\overline{0,m}$.
As reviewed in [11, Sec. 6.4], we have the following perfect dual, $\text{D}_C$:
$$v(\text{D}_C)=\inf_{w\in\mathbb R}w\quad\text{s.t. }(c,w)^\top\in{\rm cl}(M_{m+1}),\qquad(9)$$
where
$$M_{m+1}={\rm co}\big(\{(u_1(t),\dots,u_m(t),u_0(t))^\top:\ t\in B\}\big)\subset\mathbb R^{m+1}.$$
Rather than considering arbitrary paths in $M_{m+1}$ converging to $(c,w)^\top$, we need only consider movement along a descent ray [13], associated with the minimization of Program $\text{D}^*$, which generates straight line paths for (9), namely:
$$\text{D}^*:\qquad\inf w\quad\text{over } w\in\mathbb R\ \text{and}\ (v,v_{m+1})\in\mathbb R^{m+1}$$
which satisfy the condition: given $\varepsilon>0$, $\exists\mu\in\mathbb R^{(B)}$, $\mu\ge 0$, such that
$$\sum_{t\in{\rm supp}\,\mu}u_i(t)\mu(t)=c_i+\varepsilon v_i,\quad i=\overline{1,m},\qquad(10)$$
$$\sum_{t\in{\rm supp}\,\mu}u_0(t)\mu(t)-w\le\varepsilon v_{m+1}.$$

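We need another program using descent vectors (for minimization) that is visibly close to the SDP structure:
$$\text{D}_{PV}:\qquad\inf\,(U+W)\bullet Q_0\quad\text{over } U,W\in S^n,\ U\succeq 0,$$
satisfying: given any $k>0$, $\exists\varepsilon_k>0$ such that $\varepsilon_k>\delta>0$ implies $\exists U_\delta\succeq 0$ with
$$Q_i\bullet(U_\delta+\delta W)=c_i,\quad i=\overline{1,m},\qquad(11)$$
$$Q_0\bullet(U_\delta+\delta W)-k\le Q_0\bullet(U+W).\qquad(12)$$

Definition 7 We say that V is feasible for $\text{D}_{PV}$ if there are $U,W\in S^n$ with $U\succeq 0$ such that $V=U+W$ and the constraints of $\text{D}_{PV}$ are satisfied.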


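Our first goal is to show that Programs $\text{D}^*$ and $\text{D}_{PV}$ are equivalent. We need two lemmas for this task.

Lemma 8 If $\mu\in\mathbb R^{(B)}$, $\mu\ge 0$, then there exists a PSD matrix U such that
$$\sum_t u_i(t)\mu(t)=Q_i\bullet U,\quad i=\overline{0,m}.$$
Conversely, if U is PSD, then there exists $\mu\in\mathbb R^{(B)}$, $\mu\ge 0$, such that
$$\sum_t u_i(t)\mu(t)=Q_i\bullet U,\quad i=\overline{0,m}.$$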

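Proof Let ${\rm supp}\,\mu=\{t_1,\dots,t_p\}$ with $t_j=(t_{j1},\dots,t_{jn})^\top$. Then
$$u_i(t_j)=t_j^\top Q_it_j=Q_i\bullet U_j,\quad\text{where } U_j=t_jt_j^\top,\ j=\overline{1,p}.$$
Set $U=\sum_{j=1}^p\mu(t_j)U_j$. Since each $U_j\succeq 0$, $j=\overline{1,p}$, and $\mu\ge 0$, it follows that U is PSD and
$$Q_i\bullet U=\sum_{t\in{\rm supp}\,\mu}u_i(t)\mu(t),\quad i=\overline{0,m}.$$
Conversely, if U is PSD, then there exists an orthogonal matrix $B\in\mathbb R^{n\times n}$, i.e., $B^{-1}=B^\top$, with $BXB^\top=U$, X diagonal. Therefore,
$$Q_i\bullet U=Q_i\bullet BXB^\top={\rm tr}(Q_iBXB^\top)={\rm tr}(B^\top Q_iBX)=B^\top Q_iB\bullet X.$$
Let $X={\rm diag}\{\lambda_1,\dots,\lambda_n\}$, $\lambda_j\ge 0$, $j=\overline{1,n}$. Then
$$Q_i\bullet U={\rm tr}(B^\top Q_iBX)=\sum_{j=1}^n\theta_j^\top B^\top Q_iB\,\theta_j,$$
where $\theta_j$ has $\sqrt{\lambda_j}$ in its jth component and zero elsewhere. For $\lambda_j\ne 0$ let $t_j=B\theta_j/\sqrt{\lambda_j}$ (a unit vector, hence an element of the index set of (8)), and let $\mu(t_j)=\lambda_j$ and $\mu(t)=0$ for all other t. Hence we obtain
$$Q_i\bullet U=B^\top Q_iB\bullet X=\sum_{t\in{\rm supp}\,\mu}\mu(t)\,u_i(t).$$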
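Both directions of Lemma 8 can be illustrated for $n=2$, where the eigendecomposition $U=BXB^\top$ has a closed form. The sketch below (pure Python, hypothetical helper names) builds the generalized finite sequence μ from the eigenpairs of a PSD matrix U, with unit eigenvectors as support points, and checks $\sum_t\mu(t)\,t^\top Qt=Q\bullet U$; the forward direction rebuilds $U=\sum_t\mu(t)\,tt^\top$.

```python
import math

def eig_sym2(U):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, c]] as [(lam, unit_vector), ...]."""
    (a, b), (_, c) = U
    if abs(b) < 1e-15:
        return [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
    m = (a + c) / 2.0
    r = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = m + r, m - r
    v = (lam1 - c, b)                         # eigenvector for lam1 (nonzero since b != 0)
    nv = math.hypot(*v)
    v1 = (v[0] / nv, v[1] / nv)
    v2 = (-v1[1], v1[0])                      # orthogonal unit vector: eigenvector for lam2
    return [(lam1, v1), (lam2, v2)]

def quad(Q, t):
    """u(t) = t^T Q t."""
    return sum(t[p] * Q[p][q] * t[q] for p in range(2) for q in range(2))

def measure_from_psd(U, tol=1e-12):
    """Converse part of Lemma 8: mu(t_j) = lambda_j at the unit eigenvectors t_j."""
    mu = {}
    for lam, t in eig_sym2(U):
        if lam < -tol:
            raise ValueError("U is not positive semidefinite")
        if lam > tol:
            mu[t] = lam
    return mu

def psd_from_measure(mu):
    """Forward part of Lemma 8: U = sum_t mu(t) t t^T."""
    U = [[0.0, 0.0], [0.0, 0.0]]
    for t, w in mu.items():
        for p in range(2):
            for q in range(2):
                U[p][q] += w * t[p] * t[q]
    return U

U = [[2.0, 1.0], [1.0, 3.0]]
Q = [[1.0, -0.5], [-0.5, 4.0]]
mu = measure_from_psd(U)
print(sum(w * quad(Q, t) for t, w in mu.items()))   # equals Q . U = trace(QU)
```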


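For the next result we will need the following program:
$$\text{D}^{**}:\qquad\inf w,$$
where $w\in\mathbb R$ and $W\in S^n$ satisfy: given any $\varepsilon>0$, $\exists U_\varepsilon\in S^n$, $U_\varepsilon\succeq 0$:
$$Q_i\bullet(U_\varepsilon+\varepsilon W)=c_i,\quad i=\overline{1,m},\qquad(14)$$
$$Q_0\bullet(U_\varepsilon+\varepsilon W)\le w.\qquad(15)$$

Lemma 9 Program $\text{D}^*$ is equivalent to Program $\text{D}^{**}$.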


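Proof Let $\text{D}^{**}$ be consistent with feasible point $w\in\mathbb R$, $W\in S^n$. Simply use Lemma 8 and set
$$Q_i\bullet W=-v_i,\quad i=\overline{1,m},\qquad Q_0\bullet W=-v_{m+1}.$$
Hence, any feasible solution of $\text{D}^{**}$ is feasible for $\text{D}^*$.
Suppose now that $(v,v_{m+1},w)$ is feasible for $\text{D}^*$. By Lemma 8, for any $\varepsilon>0$, $\exists U_\varepsilon\succeq 0$ such that
$$Q_i\bullet U_\varepsilon=c_i+\varepsilon v_i,\quad i=\overline{1,m},\qquad(16)$$
$$Q_0\bullet U_\varepsilon-w\le\varepsilon v_{m+1}.\qquad(17)$$
Therefore, for any $\varepsilon>0$ the equation $Ay=c+\varepsilon v$ has solutions, and so does $Ay=-v$ (A is the matrix defined above Definition 6), so $\exists W\in S^n$ such that $Q^*(W)=-v$. We consider two cases.

1) $Q_0\in$ linear span $\{Q_i\}_{i=1}^m$.
Suppose $Q_0=\sum_{i=1}^m\alpha_iQ_i$. Then set $w'=\sum_{i=1}^m\alpha_ic_i$. It follows from (17) that $w'\le w$ and $(W,w')$ is feasible for $\text{D}^{**}$.

2) $Q_0$ is linearly independent of $\{Q_i\}_{i=1}^m$.
In this case there exist $y^1,y^2\in\mathbb R^{\bar n}$ such that
$$\begin{pmatrix}A\\ b^\top\end{pmatrix}y^1=\begin{pmatrix}0\\ -v_{m+1}\end{pmatrix},\qquad\begin{pmatrix}A\\ b^\top\end{pmatrix}y^2=\begin{pmatrix}-v\\ 0\end{pmatrix}.$$
Therefore, $\exists W_i\in S^n$, $i=1,2$, such that
$$Q^*(W_1)=0,\quad Q_0\bullet W_1=-v_{m+1};\qquad Q^*(W_2)=-v,\quad Q_0\bullet W_2=0.$$
Hence, $(W_1+W_2,w)$ is feasible for $\text{D}^{**}$.
Since the objective functions are identical, namely w, it follows that $\text{D}^*$ is equivalent to $\text{D}^{**}$.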


Theorem 10 Programs $\text{D}^*$ and $\text{D}_{PV}$ are equivalent.

Proof It suffices to show that $\text{D}_{PV}$ and $\text{D}^{**}$ are equivalent. We begin by supposing that $\text{D}_{PV}$ is feasible with feasible solution V. This means that $\exists U\succeq 0$, $W\in S^n$ satisfying (11) and (12). Let $w_n=Q_0\bullet(U+W)+1/n$. Then $(W,w_n)$ is feasible for $\text{D}^{**}$, and $v(\text{D}_{PV})\ge v(\text{D}^{**})$.

Conversely, suppose $\text{D}^{**}$ is feasible with feasible solution $(W,w)$. Then $\exists U$ such that
$$Q_i\bullet(U+W)=c_i,\quad i=\overline{1,m},\qquad Q_0\bullet(U+W)\le w.$$
1) $Q_0\in$ linear span $\{Q_i\}_{i=1}^m$.
In this case $Q_0\bullet(U+W)$ is determined by $\{Q_i\bullet(U+W)\}_{i=1}^m$. Therefore, $U+W$ is feasible for $\text{D}_{PV}$ and $v(\text{D}_{PV})\le v(\text{D}^{**})$.
2) $Q_0$ is linearly independent of $\{Q_i\}_{i=1}^m$.
In this case there exists $W_0\in S^n$ satisfying $Q_i\bullet W_0=0$, $i=\overline{1,m}$, $Q_0\bullet W_0=1$. Now, for any $k>0$, $\exists M>0$ with $Mk=w-Q_0\bullet(U+W)$. Hence,
$$w=Q_0\bullet(U+W+MkW_0)\quad\text{and}\quad Q_i\bullet(U+W+MkW_0)=c_i,\quad i=\overline{1,m}.$$
But for any $\delta<1/M\ (=\varepsilon)$, $\exists U_\delta\succeq 0$ such that $U_\delta$ satisfies (14) and (15). Hence, $Q_i\bullet(U_\delta+\delta(W+MkW_0))=c_i$, $i=\overline{1,m}$, and
$$Q_0\bullet(U_\delta+\delta(W+MkW_0))=Q_0\bullet(U_\delta+\delta W)+\delta Mk\le w+\delta Mk.$$
Therefore,
$$Q_0\bullet(U_\delta+\delta(W+MkW_0))-\delta Mk\le Q_0\bullet(U+(W+MkW_0)).$$
So, since $\delta Mk<k$,
$$Q_0\bullet(U_\delta+\delta(W+MkW_0))-k\le Q_0\bullet(U+(W+MkW_0)).$$
Therefore, $U+W+(w-Q_0\bullet(U+W))W_0$ is a feasible solution for $\text{D}_{PV}$, and it is easy to show that $v(\text{D}_{PV})=v(\text{D}^{**})$. Hence, in conclusion, we have obtained the following equivalence:
$$\text{D}_{PV}\Leftrightarrow\text{D}^{**}\Leftrightarrow\text{D}^*.$$

Corollary 11 Program P and Program $\text{D}_{PV}$ are in perfect duality.


The SDP as a Conic Convex Program 


Recently, M.V. Ramana [16], Zhi-Quan Luo, Jos F. Sturm, and Shuzhong Zhang [15], and J.F. Sturm [18] developed a perfect dual of the semidefinite program P, (6), involving only linear and positive semidefiniteness constraints. The authors of [15] state that their regularized program coincides with Ramana's extended Lagrange-Slater dual and that this fact was already recognized in [17]; see also [1,2]. In [14] we investigate these connections by using results for regularized conic convex programs. We develop some relationships between all the perfect duals, including the one presented earlier in this paper, $\text{II}^*$. We refer the interested reader to all of these papers, but here conclude by showing how to restate the SDP program P as a conic convex program.

Earlier the convex cone K and the matrix A were introduced. Clearly, Program P, (6), is equivalent to
$$\sup c^\top x\quad\text{s.t. } b-A^\top x\in K^*\ (=K),\qquad(18)$$
with dual program
$$\inf b^\top y\quad\text{s.t. } Ay=c,\ y\in K.\qquad(19)$$
Program P is equivalent to the following conic convex program:
$$\text{P}':\qquad\inf\,(0,c)^\top(s,t)\quad\text{s.t. }(s,t)\in\mathbb R^{\bar n+m},\ (s,t)-(b,0)=(A^\top x,x)\ \text{for some } x\in\mathbb R^m,\ (s,t)\in K^*\times\mathbb R^m=K\times\mathbb R^m.$$
Program D is equivalent to:
$$\text{D}':\qquad\inf\,(b,0)^\top(y,u)\quad\text{from among }(y,u)\in\mathbb R^{\bar n+m}\ \text{s.t. }(y,u)-(0,c)\in\ker[A\ I],\ (y,u)\in K\times\{0\}.$$
Program D' is a conic convex program which, in the notation used in [15], becomes
$$\text{CP}((0,c),(b,0),\ker[A\ I],K\times\{0\}).$$


In [15] a finite sequence of regularized programs is con- 
structed having certain properties that lead to the con- 
struction of an SDP perfect dual. 
Keywords


Complementarity problem; Sensitivity analysis 


Sensitivity analysis has been extensively developed for a variety of problems in mathematical programming and applied mathematics [2]. In general, it is concerned with investigating properties of solutions to problems with perturbed data. This type of analysis derives its importance from the fact that almost all problems in mathematical programming and applied mathematics are solved for a fixed, specified set of data. As a result, the computed solution may be considerably inaccurate, or even infeasible, if the data are subjected to changes.

Nonlinear complementarity problems were originally formulated in the context of mathematical programming and then shown to be a special case of variational inequality problems ([1,6]). Linear complementarity problems were introduced earlier as extensions of convex quadratic programming problems (and bimatrix games) [1]. Initial sensitivity analysis results were developed for linear complementarity problems by extending corresponding results from linear and quadratic programming [1]. In recent years it has been widely recognized that complementarity problems generalize many economic, game-theoretic and transportation equilibrium problems ([1,6,11]). These equilibrium problems were modeled as finite-dimensional complementarity problems (and variational inequalities), and computational efficiency and sensitivity analysis became important parts of their analysis. The computation of equilibria yields predictions of the behavior of the underlying system. As a result, the modeler needs to analyze the sensitivity of the equilibrium solution to changes in model parameters and data.

The (finite-dimensional) parametric nonlinear complementarity problem is defined as

$$
\mathrm{NCP}(\varepsilon):\qquad \text{find } x \text{ such that } x \ge 0,\quad F(x,\varepsilon) \ge 0,\quad F(x,\varepsilon)^{T}x = 0,
$$


where F is a mapping from R^n × R^k to R^n and ε ∈ R^k is the parameter vector. Observe that NCP(ε) is equivalent to the parametric variational inequality VI(ε) with mapping F and the fixed feasible set X(ε) = K = {x : x ≥ 0}. The parametric linear complementarity problem is defined as

$$
\mathrm{LCP}(M,q):\qquad \text{find } x \text{ such that } x \ge 0,\quad Mx + q \ge 0,\quad (Mx+q)^{T}x = 0,
$$

where M is an n × n matrix and q ∈ R^n (both M and q are considered parameters here).

Let S be the solution point-to-set map for NCP(ε), which assigns to each parameter ε ∈ R^k the set of solutions of NCP(ε):

$$
S(\varepsilon) = \{x \ge 0 : F(x,\varepsilon) \ge 0,\ F(x,\varepsilon)^{T}x = 0\}.
$$

Similarly, let the solution point-to-set map for LCP(M, q) be defined as

$$
S(M,q) = \{x \ge 0 : Mx + q \ge 0,\ (Mx+q)^{T}x = 0\}.
$$
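For small instances, you can check membership in S(M, q) directly from the definition. You can also find all solutions by brute force: for each index set α, set x_i = 0 off α, solve M_αα x_α = −q_α, and keep the candidate if it satisfies the three LCP conditions. The Python sketch below is a minimal illustration under those assumptions. The function names are hypothetical, not from any library. The enumeration is exponential in n and only makes the definition of S(M, q) concrete.

```python
from itertools import combinations


def mat_vec(M, x):
    """Dense matrix-vector product M x for lists of lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def in_solution_set(M, q, x, tol=1e-9):
    """True if x lies in S(M, q): x >= 0, Mx + q >= 0 and (Mx + q)^T x = 0."""
    w = [mx + qi for mx, qi in zip(mat_vec(M, x), q)]
    return (all(xi >= -tol for xi in x)
            and all(wi >= -tol for wi in w)
            and abs(sum(wi * xi for wi, xi in zip(w, x))) <= tol)


def solve_linear(A, b, tol=1e-12):
    """Gaussian elimination with partial pivoting; returns None if A is singular."""
    n = len(A)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < tol:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def enumerate_lcp_solutions(M, q):
    """Brute force over index sets alpha: x_i = 0 for i not in alpha and
    M[alpha, alpha] x_alpha = -q_alpha. Degenerate data may produce the
    same point from several index sets."""
    n = len(q)
    found = []
    for k in range(n + 1):
        for alpha in combinations(range(n), k):
            x = [0.0] * n
            if alpha:
                sub = solve_linear([[M[i][j] for j in alpha] for i in alpha],
                                   [-q[i] for i in alpha])
                if sub is None:
                    continue
                for i, xi in zip(alpha, sub):
                    x[i] = xi
            if in_solution_set(M, q, x):
                found.append(x)
    return found
```

Take the positive definite M = [[2, 1], [1, 2]]. With q = (−1, −1), the only solution is x = (1/3, 1/3). With q = (1, 1), the only solution is x = 0.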


Sensitivity analysis of the parametric nonlinear complementarity problem NCP(ε) is concerned with:

• The existence and (local) uniqueness of solutions of NCP(ε), that is, investigating whether the solution set S(ε) is nonempty and a singleton.

• Continuity properties of the solution set map S, such as Lipschitz continuity, both for multivalued and single-valued S.

• Differentiability properties of the solution set map S, such as directional or Fréchet differentiability of a single-valued map S and differentiability in a generalized sense of a multivalued map S.

The fundamental sensitivity analysis results were obtained for the NCP(ε) problem using the equivalent parametric generalized equation of the form ([12,13]):

$$
\mathrm{GE}_K(\varepsilon):\qquad \text{find } x \text{ such that } 0 \in F(x,\varepsilon) + N_K(x),
$$

where N_K(x) is the normal cone of the set K = {x : x ≥ 0} at x.

S.M. Robinson [13] showed the following. Assume that the mapping F is continuously differentiable and that the solution x* of the unperturbed problem NCP(ε*) is a regular solution. Then the solution set S(ε) of NCP(ε) is (locally) nonempty and a singleton (that is, S(ε) = {x(ε)}) for all ε near ε*. In addition, the solution x(ε) is Lipschitz continuous in ε. Under the same assumptions, it was later shown that x(ε) is directionally differentiable at ε* [7] and, under further assumptions on K, that x(ε) is Fréchet differentiable at ε* (see surveys in [6,8]). The sensitivity analysis results described so far for NCP(ε) apply directly to the perturbed linear complementarity problem LCP(ε), in which M(ε) and q(ε) replace the matrix M and the vector q in the formulation LCP(M, q).
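A one-dimensional LCP shows what these properties look like. The minimal sketch below assumes hypothetical data: a scalar M = m > 0 and q(ε) = ε. The solution is then x(ε) = max(0, −ε/m). It is Lipschitz with constant 1/m. At ε = 0 it is only directionally differentiable, because x = 0 and Mx + q = 0 hold simultaneously there.

```python
def scalar_lcp_solution(m, q):
    """Unique solution of the scalar LCP(m, q): x >= 0, m*x + q >= 0,
    x*(m*x + q) = 0. This closed form assumes m > 0."""
    if m <= 0:
        raise ValueError("this closed form assumes m > 0")
    return max(0.0, -q / m)


def one_sided_derivative(fun, eps0, direction, t=1e-7):
    """Forward-difference approximation of the directional derivative fun'(eps0; direction)."""
    return (fun(eps0 + t * direction) - fun(eps0)) / t


m = 2.0                                                # hypothetical data
x_of_eps = lambda eps: scalar_lcp_solution(m, eps)     # q(eps) = eps (hypothetical)

# Directional derivatives at eps = 0 differ by direction: 0 for +1, 1/m for -1.
right = one_sided_derivative(x_of_eps, 0.0, 1.0)
left = one_sided_derivative(x_of_eps, 0.0, -1.0)
```

The two one-sided rates (0 and 1/m) differ, so x(ε) has a kink at ε = 0. It is still Lipschitz continuous and directionally differentiable there.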

Further extensions of the sensitivity analysis results for NCP(ε) were obtained for situations where the solution set map S(ε) is multivalued. These include conditions for the existence, continuity and differentiability (in a generalized sense) of the point-to-set map S(ε) ([4,9]).

Another approach to the study of perturbed solutions for nonlinear complementarity problems, called stability analysis, was proposed in [5]. In this approach, the mapping F itself is the problem parameter (instead of being dependent on a parameter ε), and results from degree theory are used to obtain conditions for existence and (generalized) continuity of the multivalued solution map S as a function of F. This approach was also applied to linear complementarity problems LCP(M, q), where (M, q) is the problem parameter and conditions for stability are expressed in terms of properties of the matrix M [1]. Recent extensions of stability analysis for linear and nonlinear complementarity problems, using degree theory, can be found in [3,4].

Sensitivity analysis of nonlinear complementarity problems was applied to several equilibrium problems, including the general spatial price equilibrium model [14] and the Cournot-Nash oligopolistic equilibrium model [15]. Another class of problems to which sensitivity analysis results for complementarity problems may be applied is mathematical programs with equilibrium constraints. These arise in game theory, bilevel programming, and network design problems [10]. They include constraints of the type x ∈ S(ε), where, in certain cases, S(ε) may be the solution set of a complementarity problem. Computational methods for solving these problems rely on the ability to calculate the (generalized) derivatives of the solution set map S(ε).
▶ Parametric Global Optimization: Sensitivity

▶ Sensitivity Analysis of Variational Inequality
problems with perturbed data. This type of analysis derives its importance from the fact that almost all problems in mathematical programming and applied mathematics are solved for a fixed, specified set of data. As a result, the computed solution may be considerably inaccurate or even infeasible if the data are subjected to changes.

Variational inequality problems were originally de- 
veloped to model partial differential equations aris- 
ing from applications in mechanics [5]. These prob- 
lems were formulated in infinite-dimensional spaces 
and sensitivity analysis issues were not addressed. How- 
ever, it has been widely recognized in recent years 
that variational inequalities are direct generalizations 
of many economic, game-theoretic and transportation 
equilibrium problems ([4,10]). These equilibrium prob- 
lems were modeled as finite-dimensional variational in- 
equalities and the issues of computational efficiency 
and sensitivity analysis became an important part of 
their analysis. The computation of equilibria results in 
predictions of the behavior of the underlying system. 
As a result, the sensitivity of the equilibrium solution 
to changes in model parameters and data needs to be 
analyzed by the modeler. 

The (finite-dimensional) general parametric variational inequality problem is defined as

$$
\mathrm{VI}(\varepsilon):\qquad \text{find } x \in X(\varepsilon) \text{ such that } F(x,\varepsilon)^{T}(y - x) \ge 0 \quad \text{for all } y \in X(\varepsilon),
$$

where:

• F is a mapping from R^n × R^k to R^n;

• X is a feasible point-to-set map from R^k to R^n (that is, it assigns to each point ε the feasible set X(ε) of VI(ε));

• ε ∈ R^k is the parameter vector.

Let S be the solution point-to-set map which assigns to each parameter ε ∈ R^k the set of solutions

$$
S(\varepsilon) = \{x \in X(\varepsilon) : F(x,\varepsilon)^{T}(y - x) \ge 0 \ \ \forall\, y \in X(\varepsilon)\}.
$$

Sensitivity analysis of the parametric variational inequality VI(ε) problem is concerned with:

• The existence and (local) uniqueness of solutions of VI(ε), that is, investigating whether the solution set S(ε) is nonempty and a singleton.

• Continuity properties of the solution set map S, such as Lipschitz continuity, both for multivalued and single-valued S.

• Differentiability properties of the solution set map S, such as directional or Fréchet differentiability of a single-valued map S and differentiability in a generalized sense of a multivalued map S.

The fundamental sensitivity analysis results were initially obtained for the special VI(ε) problem, where X(ε) = K is a fixed (closed and convex) set, using the equivalent parametric generalized equation of the form ([12,13]):

$$
\mathrm{GE}_K(\varepsilon):\qquad \text{find } x \text{ such that } 0 \in F(x,\varepsilon) + N_K(x),
$$

where N_K(x) is the normal cone of the set K at x.
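For a fixed closed convex K, a standard equivalent of GE_K(ε) is the fixed-point condition x = Π_K(x − F(x, ε)), where Π_K is the Euclidean projection onto K. For a box, Π_K is a componentwise clip. The minimal sketch below assumes K = [0, 1]^2 and the hypothetical mapping F(x, ε) = x − ε. For this mapping the unique solution is x(ε) = Π_K(ε). The helper names are illustrative only.

```python
def project_box(z, lower, upper):
    """Euclidean projection onto the box {x : lower <= x <= upper} (componentwise clip)."""
    return [min(max(zi, li), ui) for zi, li, ui in zip(z, lower, upper)]


def natural_residual(F, x, lower, upper, gamma=1.0):
    """max_i |x_i - [Pi_K(x - gamma*F(x))]_i|. It is zero exactly when
    x solves the variational inequality over the box K (any gamma > 0)."""
    fx = F(x)
    p = project_box([xi - gamma * fi for xi, fi in zip(x, fx)], lower, upper)
    return max(abs(xi - pi) for xi, pi in zip(x, p))


lower, upper = [0.0, 0.0], [1.0, 1.0]                # K = [0, 1]^2 (assumed)
eps = [1.5, -0.3]                                    # hypothetical parameter value
F = lambda x: [xi - ei for xi, ei in zip(x, eps)]    # F(x, eps) = x - eps
x_eps = project_box(eps, lower, upper)               # candidate x(eps) = Pi_K(eps)
```

Here x(ε) = Π_K(ε) is nonexpansive in ε and piecewise linear. This is consistent with the Lipschitz continuity and directional differentiability results for polyhedral K.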

S.M. Robinson [13] showed the following. Assume that the mapping F is continuously differentiable and that the solution x* of the unperturbed problem VI(ε*) is regular. Then the solution set S(ε) is (locally) nonempty and a singleton (that is, S(ε) = {x(ε)}) for all ε near ε*. In addition, the solution x(ε) is Lipschitz continuous in ε. Under the additional assumption that K is polyhedral, it was later shown that x(ε) is directionally differentiable at ε*. Under further assumptions on K, x(ε) is Fréchet differentiable at ε* (see the surveys in [4,7]).

Sensitivity analysis results were then extended to the more general VI(ε) problem, where X(ε) = {x ∈ R^n : g(x, ε) ≥ 0, h(x, ε) = 0} is defined using mappings g and h. For this problem, the equivalent parametric generalized equation is of the form ([7,7]):

$$
\mathrm{GKKT}_0(\varepsilon):\qquad \text{find } (x,u,w) \text{ such that } 0 \in T(x,u,w,\varepsilon) + N_{\mathbb{R}^n \times \mathbb{R}^m_+ \times \mathbb{R}^p}(x,u,w),
$$

where

$$
T(x,u,w,\varepsilon)^{T} = \left[L_F(x,u,w,\varepsilon)^{T},\ g(x,\varepsilon)^{T},\ h(x,\varepsilon)^{T}\right],
$$

with

$$
L_F(x,u,w,\varepsilon) = F(x,\varepsilon) - \nabla_x g(x,\varepsilon)^{T}u + \nabla_x h(x,\varepsilon)^{T}w,
$$

and where m and p denote the numbers of components of g and h. The generalized equation GKKT_0(ε) is equivalent to a system of generalized Karush-Kuhn-Tucker conditions, similar to those in nonlinear programming [2].

By applying the results obtained for VI(ε) with fixed K to GKKT_0(ε), Robinson [13] showed the following. Assume that the mapping F is once continuously differentiable, that the mappings g and h are twice continuously differentiable, and that the solution (x*, u*, w*) of the unperturbed problem GKKT_0(ε*) is regular. Then the solution set S(ε) is (locally) nonempty and a singleton (that is, S(ε) = {(x(ε), u(ε), w(ε))}) for all ε near ε*. In addition, the solution (x(ε), u(ε), w(ε)) is Lipschitz continuous in ε. Under the same assumptions, it was later shown that (x(ε), u(ε), w(ε)) is directionally differentiable at ε* [6]. Under an additional strict complementarity slackness condition, it is Fréchet differentiable at ε* (see the surveys in [4,7]).
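For a single inequality constraint and no h, the inclusion in GKKT_0(ε) reduces to four conditions: L_F = 0, g ≥ 0, u ≥ 0 and u·g = 0. The minimal sketch below checks these for hypothetical data: F(x, ε) = x − (ε, ε) and g(x) = 1 − x_1 − x_2. The solution was derived by hand. It is the projection of (ε, ε) onto the half-plane x_1 + x_2 ≤ 1.

```python
def gkkt_residual(x, u, eps):
    """Largest violation of GKKT_0(eps) for the hypothetical VI with
    F(x, eps) = x - (eps, eps) and X = {x : g(x) = 1 - x1 - x2 >= 0}.
    There is no equality constraint h, hence no multiplier w."""
    F = [x[0] - eps, x[1] - eps]
    grad_g = [-1.0, -1.0]
    g = 1.0 - x[0] - x[1]
    L_F = [Fi - dgi * u for Fi, dgi in zip(F, grad_g)]   # L_F = F - grad_g^T u
    return max(abs(L_F[0]), abs(L_F[1]),
               max(0.0, -g),        # feasibility g >= 0
               max(0.0, -u),        # sign condition u >= 0
               abs(u * g))          # complementarity u * g = 0


def gkkt_solution(eps):
    """Closed-form (x(eps), u(eps)) for this example, derived by hand."""
    if eps <= 0.5:
        return [eps, eps], 0.0          # constraint inactive (just active at 0.5)
    return [0.5, 0.5], eps - 0.5        # constraint active with u > 0
```

At ε = 0.5 both u = 0 and g = 0 hold, so strict complementarity fails there. Correspondingly, x(ε) has a kink at that value: it is directionally, but not Fréchet, differentiable.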

Further extensions of the sensitivity analysis results for VI(ε) were obtained for situations where the solution set map S(ε) is multivalued. These include conditions for the existence, continuity and differentiability (in a generalized sense) of the point-to-set map S(ε) ([3,8,11]).

Sensitivity analysis of variational inequalities was applied to a variety of equilibrium problems. These applications include:

• the traffic assignment or network equilibrium model ([10,11]);

• the general spatial economic equilibrium models ([1,10,14]);

• the spatial competition facility location models, including spatial price equilibrium and Cournot-Nash oligopolistic equilibrium [15].

Another class of problems to which sensitivity analysis results were extensively applied is mathematical programs with equilibrium constraints. These arise in game theory, bilevel programming, and network design problems [9]. They include constraints of the type x ∈ S(ε), where S(ε) is the solution set of a variational inequality VI(ε). Computational methods for solving these problems rely on the ability to calculate the (generalized) derivatives of the solution set map S(ε).
We are interested in what happens to a solution of
a problem when there are changes in the problem data. 
We shall use the term sensitivity analysis to refer to 
the calculation and study of any collection of mea- 
surements associated with any changes, infinitesimally 
small or finite. The measure of change may be ex- 
plicit or implicit, qualitative or quantitative. For exam- 
ple, a qualitative characteristic may be the continuity 
of a change, while a quantitative one might be a nu- 
merical bound. An explicit measure is usually a closed 
form expression while an implicit one is inferred by 
a proof of existence of a property or quantity. Sta- 
ble will mean ‘well-behaved’ or ‘well-conditioned’, in 
a given context. Traditionally, well-conditioned gen- 
erally meant small solution changes from small data 
changes, but many subjective interpretations are pos- 
sible and currently in use. The concept is relative to 
a given property, data value or data change. Stochastic 
changes and effects are certainly possible and frequently 
encountered in applications, but our focus is determin- 
istic. 

Some familiar examples will hopefully clarify the kinds of problems and conclusions that are of interest. Suppose Ax = b, where A is an n × n matrix and x and b are in E^n, n-dimensional Euclidean space. If A is nonsingular, then we know that x = A^{-1}b. Now, if A or b changes, we can track the change in x if we know A^{-1}. We have a closed form, smooth, stable, explicit solution x(A, b) of x as a function of A and b. Even here, however, we may encounter serious computational difficulties, since A^{-1} or b may be difficult to evaluate.

For example, suppose that A and b are functions of some parameter ε ∈ E^k, which we denote by A(ε) and b(ε). If A remains nonsingular and b remains defined for ε ∈ T ⊂ E^k, then x(ε) = A^{-1}(ε)b(ε) for ε ∈ T. In this instance we at least know the general form of the parametric solution. Even so, a closed form expression x(ε) of x as a function of ε might be too difficult to obtain. The issues become more complicated. For example, for what ε is x continuous, differentiable, rational, bounded? Can we calculate x and its derivatives or bounds, relative to ε, or at least make good estimates? If A is singular or b is not in the range of A, then x(ε) may not exist or may be multivalued. Can we characterize the solution set S(ε) for ε ∈ T?
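Differentiating A(ε)x(ε) = b(ε) gives A x′ + A′x = b′. Hence x′(ε) = A^{-1}(ε)[b′(ε) − A′(ε)x(ε)], which is computable once x(ε) is known. The minimal sketch below assumes hypothetical 2 × 2 data, A(ε) = [[2 + ε, 1], [1, 3]] and b(ε) = (1, ε). It solves by Cramer's rule and reports singularity, the case in which x(ε) may fail to exist or be unique.

```python
def solve2(A, b):
    """Solve a 2 x 2 system A x = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-14:
        raise ZeroDivisionError("A(eps) is singular: x(eps) may not exist or be unique")
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


# Hypothetical parametric data A(eps), b(eps) and their eps-derivatives.
def A_of(eps):
    return [[2.0 + eps, 1.0], [1.0, 3.0]]


def b_of(eps):
    return [1.0, eps]


def dA_of(eps):
    return [[1.0, 0.0], [0.0, 0.0]]


def db_of(eps):
    return [0.0, 1.0]


def x_and_derivative(eps):
    """x(eps) = A(eps)^{-1} b(eps) and dx/deps = A^{-1}(db/deps - dA/deps x),
    obtained by differentiating A(eps) x(eps) = b(eps)."""
    x = solve2(A_of(eps), b_of(eps))
    dA = dA_of(eps)
    dAx = [dA[0][0] * x[0] + dA[0][1] * x[1],
           dA[1][0] * x[0] + dA[1][1] * x[1]]
    rhs = [dbi - v for dbi, v in zip(db_of(eps), dAx)]
    return x, solve2(A_of(eps), rhs)
```

At ε = 0 this gives x = (0.6, −0.2) and dx/dε = (−0.56, 0.52). A central finite difference of x(ε) reproduces this derivative.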

This example suggests that we think of a solution as a function of the problem data, though the function may be extremely complicated, multivalued and only implicitly defined. Our central objective is to characterize this functional relationship. Why? Because data are inevitably ‘soft’: subject to error or change, and uncertain. Once a solution has been calculated, it is usually desirable to approximate new solutions and the effects of data changes without calculating a solution for each new data value.

A closed form solution will generally not be available, except at an aggregate level for simple models such as the one noted above. There are many such models in the sciences, in effect whenever we have a formula for a quantity as a closed form expression. Examples include the quadratic formula for the solution of ax^2 + bx + c = 0, where x, a, b, c ∈ E^1, as well as A = πr^2, F = ma and E = mc^2. The reader may not have viewed these formulas as representations of solutions of parametric problems, that is, as solutions expressed in terms of problem data. This is our orientation here, although, as noted, we will not usually have the luxury of a closed form solution at our disposal.

We are interested in a particular problem formulation that has many varied and important applications. The model is the nonlinear programming (NLP) problem, defined as follows. Find a vector x that solves

$$
(\mathrm{P})\qquad \min_x f(x) \quad \text{s.t.}\quad g_i(x) \ge 0 \ (i = 1,\dots,m),\qquad h_j(x) = 0 \ (j = 1,\dots,p).
$$

Since our focus is on the effect of data changes on x, we explicitly introduce a vector ε that represents the data and study the problem of finding a solution x(ε) of

$$
\mathrm{P}(\varepsilon)\qquad \min_x f(x,\varepsilon) \quad \text{s.t.}\quad g_i(x,\varepsilon) \ge 0 \ (i = 1,\dots,m),\qquad h_j(x,\varepsilon) = 0 \ (j = 1,\dots,p).
$$

We assume that x ∈ E^n (Euclidean n-space) and that ε ∈ T, a subset of E^k; all the results we mention have extensions to more general spaces. In this framework, when ε is fixed, the parametric NLP P(ε) becomes a standard NLP of the form (P). We are interested in a solution x(ε), the ‘optimal value’ f[x(ε), ε] and other quantities, when ε changes.

Some simple examples may help to illustrate the problem. Suppose we have the linear program:

$$
\min_x x \quad \text{s.t.}\quad x \ge \varepsilon,\qquad x \in E^1,\ \varepsilon \in E^1.
$$

Then the solution is x(ε) = ε and the optimal value is f[x(ε), ε] = x(ε) = ε, where ε can be any number. This situation is rather ideal. The solution is defined and unique for every ε, and the solution and optimal value are infinitely differentiable (in this case linear) in ε. Everything is completely stable.

As another example of a very well-behaved problem, now an NLP in E^2, consider:

$$
\min_x\ (x_1 - \varepsilon_1^2)^2 + (x_2 - |\varepsilon_2|)^2 \quad \text{s.t.}\quad x_1, x_2 \ge 0,\qquad x \in E^2,\ \varepsilon \in E^2.
$$

It is easy to see that x(ε) = [x_1(ε), x_2(ε)] = (ε_1^2, |ε_2|) and f[x(ε), ε] = 0 for any ε ∈ E^2. Here again, the solution x(ε) is a well-defined, unique, continuous function for all ε in E^2. The component x_1(ε) is infinitely differentiable, but x_2(ε) is not differentiable at ε_2 = 0. The optimal value is perfectly stable, indeed constant. Again, we have x(ε) and f[x(ε), ε] in closed form.
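The loss of differentiability at ε_2 = 0 can be seen numerically. The minimal sketch below solves the example by projecting the unconstrained minimizer (ε_1^2, |ε_2|) onto the nonnegative orthant; for a separable quadratic this projection is exact, and here it changes nothing. It then compares right and left difference quotients of x_2 with respect to ε_2. The function names are hypothetical.

```python
def solution(eps1, eps2):
    """Minimizer of (x1 - eps1**2)**2 + (x2 - abs(eps2))**2 over x1, x2 >= 0:
    clip the unconstrained minimizer to the nonnegative orthant (here a no-op)."""
    target = (eps1 ** 2, abs(eps2))
    return tuple(max(t, 0.0) for t in target)


def one_sided_quotients(eps2, h=1e-6):
    """Right and left difference quotients of x2(eps) with respect to eps2."""
    x2 = lambda e: solution(0.0, e)[1]
    right = (x2(eps2 + h) - x2(eps2)) / h
    left = (x2(eps2 - h) - x2(eps2)) / (-h)
    return right, left
```

At ε_2 = 0 the quotients are +1 and −1, so x_2(ε) is not differentiable there. At ε_2 = 1 both equal 1 up to rounding.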

Unfortunately, things are not always so nice, even for very simple problems. Consider the linear program

$$
\mathrm{LP}(\varepsilon)\qquad \min_x \varepsilon x \quad \text{s.t.}\quad x \ge 0,\qquad x \in E^1,\ \varepsilon \in E^1.
$$

There are three cases:

• ε > 0: x(ε) = 0 solves the problem and f[x(ε), ε] = 0.

• ε = 0: f(x, ε) = 0 for any x ∈ E^1, so LP(0) is solved by any x ≥ 0. The solution set is S(0) = {x ∈ E^1 : x ≥ 0}, and of course f[x(0), 0] = 0.

• ε < 0: things collapse, since f(x, ε) is unbounded below on the feasible set R = {x ∈ E^1 : x ≥ 0}. Symbolically, f(x, ε) = εx → −∞ as x → +∞. We say that the infimum (smallest value of f) is ‘not attained’; nor is the solution (minimizing value of x).

Summarizing, the solution is stable for ε > 0 and unsolvable for ε < 0. It is unstable for ε = 0, since the solution set S(0) can vanish in every small neighborhood of ε = 0.
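The three regimes of LP(ε) can be encoded directly. The minimal sketch below returns the status, the solution set as an interval (lo, hi), or None when the infimum is not attained, and the optimal value. The function name and return format are hypothetical.

```python
import math


def solve_lp_eps(eps):
    """LP(eps): minimize eps * x subject to x >= 0, with x and eps real.
    Returns (status, solution_set, optimal_value)."""
    if eps > 0:
        return "unique", (0.0, 0.0), 0.0          # S(eps) = {0}
    if eps == 0:
        return "multiple", (0.0, math.inf), 0.0   # S(0) = [0, +inf)
    return "unbounded", None, -math.inf           # infimum not attained
```

Evaluating at ε = 1e-9, 0 and −1e-9 shows the abrupt change of S(ε) at ε = 0.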

An equivalent example in E^2 may give more perspective on the geometry of the problem. Consider

$$
\min_x x_2 \quad \text{s.t.}\quad \varepsilon x_1 \le x_2,\quad x_1 \ge 0,\qquad x \in E^2,\ \varepsilon \in E^1.
$$

The reader is encouraged to sketch the feasible region for ε > 0, ε = 0 and ε < 0. Clearly, the smallest value of x_2 for ε ≥ 0 is x_2 = 0, but for ε < 0 the component x_2 is not bounded below. The corresponding values of x_1 are:

• x_1 = 0 for ε > 0;

• any x_1 ≥ 0 for ε = 0;

• x_1 unbounded above as x_2 → −∞ for ε < 0.

Collecting this information, the solution is x(ε) = (0, 0) for ε > 0; x(ε) is any x ∈ S(0) = {x ∈ E^2 : x_1 ≥ 0, x_2 = 0} for ε = 0; and the solution is not attained for ε < 0. The optimal value f[x(ε), ε] = 0 for ε ≥ 0 and is not bounded below for ε < 0. As before, the problem is stable for ε > 0 and unsolvable for ε < 0. It is unstable for ε = 0, since the solution does not exist for some ε in any neighborhood of ε = 0.

What causes such erratic behavior and what guaran- 
tees such stability, for different parameter values? Can 
we identify conditions that imply some sort of stability 
or that warn of the presence of instability? Can changes 
in a solution be predicted within a specified error tol- 
erance for problem data changes, without solving the 
problem again for each new perturbation? A resolution 
and understanding of such questions is the substance of 
sensitivity and stability analysis. Note that in our small 
examples, the problem functions taken individually are 
continuous, differentiable and extremely well behaved. Clearly, we need to capture the effect of parameter changes on their joint collective behavior, at least in a neighborhood of a solution.

Some of the important tools and concepts that have 
been especially useful in obtaining conditions and re- 
sults for sensitivity calculations and stability character- 
izations are: 

i) optimality conditions, 

ii) constraint qualifications, 

iii) implicit function theorems, 

iv) directional derivatives, 

v) point-to-set maps and set and map convergence 
and continuity, 


vi) condition number, 

vii) convexity and convex analysis, and 

viii) duality. 

The list is subjective and far from exhaustive. Some 
of these constructs will be utilized in the preliminary 
results we mention, but we cannot provide additional 
coverage in this brief presentation. 

We next present a few basic results that illustrate the application of some of the mentioned mathematical tools and indicate typical assumptions and conclusions. Consider the important special case of our general problem P(ε) in which the constraints g(x) ≥ 0 and h(x) = 0 are absent. This gives the so-called unconstrained problem

$$
\mathrm{P}_1(\varepsilon)\qquad \min_x f(x,\varepsilon) \quad \text{s.t.}\quad x \in E^n \quad (\varepsilon \in E^k).
$$

Assume that f is twice jointly continuously differentiable in (x, ε). Let ∇_x f(x) = [∂f(x)/∂x_1, ..., ∂f(x)/∂x_n] denote the usual gradient of f at x. Let ∇²_x f(x) = ∇_x[∇_x f(x)^T], with (i, j)th element ∂²f(x)/∂x_i∂x_j, define the matrix called the Hessian of f at x. (Note that ∇_x f is an n-dimensional row vector and ∇²_x f is an n × n matrix.) The following result is well known. Its assumptions will be called the second order sufficient conditions at (x̄, ε̄), denoted by SOSC(x̄, ε̄). Let ε = ε̄, a fixed quantity.


Proposition 1 If ∇_x f(x̄, ε̄) = 0 and ∇²_x f(x̄, ε̄) is positive definite, then x̄ is a strict and locally isolated minimizer of f(x, ε̄).
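SOSC(x̄, ε̄) can be checked mechanically: test that the gradient vanishes, then test positive definiteness of the Hessian by attempting a Cholesky factorization. The minimal sketch below assumes the hypothetical function f(x, ε) = x_1² + x_1x_2 + x_2² − εx_1. Its stationary point, derived by hand, is x(ε) = (2ε/3, −ε/3), so x̄ = (2, −1) at ε̄ = 3.

```python
import math


def grad_x(x, eps):
    """Gradient in x of the hypothetical f(x, eps) = x1**2 + x1*x2 + x2**2 - eps*x1."""
    return [2 * x[0] + x[1] - eps, x[0] + 2 * x[1]]


def hess_x(x, eps):
    """Hessian in x of the same hypothetical f (constant here)."""
    return [[2.0, 1.0], [1.0, 2.0]]


def is_positive_definite(H):
    """Attempt a Cholesky factorization of the symmetric matrix H;
    it succeeds exactly when H is positive definite."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def sosc_holds(x, eps, tol=1e-10):
    """SOSC(x, eps) for the unconstrained problem: zero gradient, PD Hessian."""
    return (max(abs(gi) for gi in grad_x(x, eps)) <= tol
            and is_positive_definite(hess_x(x, eps)))
```

sosc_holds([2.0, -1.0], 3.0) is True. At a non-stationary point such as (0, 0) with ε = 3, it is False.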


This result is of intrinsic interest and is very important 
and well known. What is especially intriguing is that 
sensitivity results follow without additional assump- 
tions. 


Proposition 2 Assume SOSC(x̄, ε̄) as before. Then for ε near ε̄ there exists a unique once continuously differentiable vector function x, with value x(ε), such that x(ε̄) = x̄ and SOSC(x(ε), ε) continues to hold. Hence x(ε) is a locally unique (i.e., isolated) local minimizer of f(x, ε). Furthermore, the optimal value f* is twice continuously differentiable, where we define f*(ε) = f[x(ε), ε].


This may be viewed as a stability result. It is essentially an existence theorem, following directly from a classical implicit function theorem, which guarantees that the assumptions made at (x̄, ε̄) persist near (x̄, ε̄) at [x(ε), ε]. Hence the conclusions persist as well. This suggests an intuitively appealing definition of an ideal form of stability with respect to a given change in the data. We might define ‘assumption stability’ to be the persistence of initial conditions and assumptions in the sense indicated.

Interestingly, we can go even further and derive 
a sensitivity formula from our observations, again with- 
out any new assumptions. 


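Proposition 3 Assume SOSC(x̄, ε̄) as in Proposition 1, and let x(ε) be as in Proposition 2. Then the following relations hold at [x(ε), ε] near (x̄, ε̄):

i) Defining the optimal value f*(ε) = f[x(ε), ε], we conclude from the chain rule for differentiation that

$$
\nabla_\varepsilon f^*(\varepsilon) = \frac{d}{d\varepsilon} f[x(\varepsilon),\varepsilon] = \nabla_x f\,\nabla_\varepsilon x + \nabla_\varepsilon f = \nabla_\varepsilon f(x,\varepsilon)\big|_{x = x(\varepsilon)},
$$

since ∇_x f[x(ε), ε] = 0. Here F(y)|_{y=z} means to evaluate F at y = z. With this understanding, we write ∇_ε f*(ε) = ∇_ε f[x(ε), ε].

ii) Differentiating the last result again with respect to ε, we find that

$$
\nabla^2_\varepsilon f^*(\varepsilon) = \nabla^2_{\varepsilon x} f\,\nabla_\varepsilon x + \nabla^2_\varepsilon f\big|_{x = x(\varepsilon)}.
$$

iii) Differentiating ∇_x f[x(ε), ε] = 0 with respect to ε yields

$$
\nabla^2_x f\,\nabla_\varepsilon x + \nabla^2_{x\varepsilon} f = 0,
$$

from which we conclude that

$$
\nabla_\varepsilon x(\varepsilon) = -\left[\nabla^2_x f[x(\varepsilon),\varepsilon]\right]^{-1} \nabla^2_{x\varepsilon} f[x(\varepsilon),\varepsilon].
$$

Here, ∇²_{xε} f = ∇_ε(∇_x f^T), and similarly ∇²_{εx} f = ∇_x(∇_ε f^T).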
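The three formulas of Proposition 3 need only derivatives of f at the computed minimizer. The minimal sketch below assumes the hypothetical scalar example f(x, ε) = e^x − εx, for which ∇²_x f = e^x > 0. SOSC therefore holds wherever a minimizer exists, and the exact answers are x(ε) = log ε and f*(ε) = ε − ε log ε. The minimizer is found with Newton's method; in line with the remark below on convergence, SOSC is exactly what supports Newton's fast local convergence here.

```python
import math

# Hypothetical example: f(x, eps) = exp(x) - eps * x, minimizer x(eps) = log(eps).


def f_x(x, eps):       # first derivative in x
    return math.exp(x) - eps


def f_xx(x, eps):      # Hessian in x (always positive here)
    return math.exp(x)


def f_eps(x, eps):     # gradient in eps
    return -x


def f_xeps(x, eps):    # mixed second derivative (equals f_epsx for scalar x, eps)
    return -1.0


def f_epseps(x, eps):  # second derivative in eps
    return 0.0


def newton_minimizer(eps, x0=0.0, tol=1e-14, max_iter=200):
    """Newton's method on f_x(x, eps) = 0."""
    if eps <= 0:
        raise ValueError("no minimizer exists for eps <= 0 in this example")
    x = x0
    for _ in range(max_iter):
        step = f_x(x, eps) / f_xx(x, eps)
        x -= step
        if abs(step) < tol:
            break
    return x


def sensitivities(eps):
    x = newton_minimizer(eps)
    dx = -f_xeps(x, eps) / f_xx(x, eps)               # Proposition 3 iii)
    dfstar = f_eps(x, eps)                             # Proposition 3 i)
    d2fstar = f_xeps(x, eps) * dx + f_epseps(x, eps)   # Proposition 3 ii)
    return x, dx, dfstar, d2fstar
```

For ε = 2 this gives x ≈ log 2, dx/dε = 1/2, ∇_ε f* = −log 2 and ∇²_ε f* = −1/2. These match the derivatives of the closed forms log ε and ε − ε log ε.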


Thus, without additional conditions other than the op- 
timality conditions SOSC and the assumption of the 
presence of data in the form of a parameter vector and 
appropriate smoothness, we have a characterization of 
local optimality in Proposition 1, a parametric stability 
and existence theorem in Proposition 2, and sensitiv- 
ity measurements in the form of parameter-derivative 
formulas, for both the optimal value and the local minimizing point, in Proposition 3. Note that the derivatives are computable once we have calculated a solution x(ε) for a given ε, and hence in particular at ε̄, where x(ε̄) = x̄. These results are extremely important and useful. They also dramatize some important principles, i.e., the persistence of assumptions as a key to
stability, and the intimate connection between optimal- 
ity conditions and stability and sensitivity analysis. Of 
course, the conclusions are local, but this is not surpris- 
ing since the results are based on information at one 
point. As a postscript, we also note that SOSC is often 
invoked and generally needed for the optimal conver- 
gence and rate of convergence behavior of numerical 
algorithms, e.g., for the quadratic convergence rate of 
Newton’s method. Thus, we conclude that sensitivity 
and stability analysis, optimality conditions and char- 
acterizations and the convergence properties of compu- 
tational algorithms are extremely closely related. Their 
underlying theory can undoubtedly be unified at a gen- 
eral level. 

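Applications of results such as those given are numerous. Perhaps most directly, we can estimate f*(ε) and x(ε) for small changes in ε by using a truncated Taylor series as follows:

i)
$$
f^*(\varepsilon) \approx f^*(\bar\varepsilon) + \nabla_\varepsilon f^*(\bar\varepsilon)(\varepsilon - \bar\varepsilon) + \tfrac12 (\varepsilon - \bar\varepsilon)^{T}\, \nabla^2_\varepsilon f^*(\bar\varepsilon)\,(\varepsilon - \bar\varepsilon),
$$
where ∇_ε f* and ∇²_ε f* are as given above in i) and ii); and

ii)
$$
x(\varepsilon) \approx x(\bar\varepsilon) + \nabla_\varepsilon x(\bar\varepsilon)(\varepsilon - \bar\varepsilon) = \bar x - \left[\nabla^2_x f(\bar x,\bar\varepsilon)\right]^{-1} \nabla^2_{x\varepsilon} f(\bar x,\bar\varepsilon)(\varepsilon - \bar\varepsilon).
$$

These estimates are available when a good approximation to x̄ has been obtained.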
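The Taylor estimates can be tried on the same hypothetical example, f(x, ε) = e^x − εx, where x(ε) = log ε and f*(ε) = ε − ε log ε are known exactly. The minimal sketch below uses only data at (x̄, ε̄). It then extrapolates to a nearby ε, so the error of each estimate can be read off.

```python
import math

# Hypothetical example: f(x, eps) = exp(x) - eps * x,
# exact x(eps) = log(eps) and f*(eps) = eps - eps * log(eps).


def taylor_estimates(eps_bar, x_bar, eps):
    """Second-order estimate of f*(eps) and first-order estimate of x(eps),
    built only from derivative information at (x_bar, eps_bar)."""
    d = eps - eps_bar
    fstar_bar = math.exp(x_bar) - eps_bar * x_bar   # f*(eps_bar) = f(x_bar, eps_bar)
    grad_fstar = -x_bar                             # grad_eps f at x_bar
    dx = 1.0 / math.exp(x_bar)                      # -(f_xx)^{-1} f_xeps, f_xeps = -1
    hess_fstar = -1.0 * dx                          # f_epsx * dx + f_epseps (= 0)
    fstar_est = fstar_bar + grad_fstar * d + 0.5 * hess_fstar * d * d
    x_est = x_bar + dx * d
    return fstar_est, x_est
```

With ε̄ = 2, x̄ = log 2 and ε = 2.1, the second-order estimate of f* is within about 4 × 10⁻⁵ of the exact value. The first-order estimate of x is within about 1.2 × 10⁻³.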

We conclude this brief expository article by indicat- 
ing a number of other applications, extensions and ref- 
erences. 

There is now an enormous body of literature de- 
voted to sensitivity and stability results in mathemati- 
cal programming, most of which have been published 
starting in the 1970s. The issues surrounding well- 
posedness and resulting definitions, e. g., solvability for 
small data perturbations, continuous solution changes 
for continuous data changes, and others are classical 
however and have been studied in mathematics and 
physics and other disciplines since early times. A rudi- 
mentary general theory has been known for some years. 
The recent burst of activity was undoubtedly stimulated 
by the tremendous advances in NLP in the 1960s and 
1970s, largely resulting from the preceding emergence 
and computational demands of modern optimization 
applications in the sciences, engineering, economics, 
industry, and collectively what came to be known as 
operations research, the advent of modern computers, 
and the development and success of linear programming (LP) methodology.

For LP, developed in the 1940s, there quickly fol- 
lowed what was called post-optimality sensitivity analy- 
sis that included parametric closed form expressions for 
solution changes with data changes and error bounds, 
this having been recently significantly extended. Exis- 
tence theorems and closed form expressions have also 
been recently developed for optimal value and solution 
point parameter-derivatives and directional derivatives, 
as well as parametric bounds, for the general parametric nonlinear program (NLP) of the form P(ε). These are analogous to those given in this article for the unconstrained problem P_1(ε). Significant sensitivity and stability results have been obtained for many important
and well known classes of problems, e.g., geometric 
programs, separable programs, semi-infinite and infi- 
nite programming, control theory, multi-objective op- 
timization, integer programming and stochastic pro- 
gramming. The parametric perturbation theory has 
been significantly extended to more general models 
as well, e.g., variational inequalities and generalized 
equations. Results include many variations in optimal- 
ity conditions, constraint qualifications, definitions of 
continuity and differentiability, generalized convexity 
and other mathematical notions and tools. A significant 
body of theory now also extends to nonsmooth (i.e., 
nondifferentiable) functions, and to perturbations that 
are more general than perturbation of a parameter, e. g., 
function and set perturbations. 

Applications are abundant. They include solution 
and optimal value extrapolation for perturbed data, 
as already noted, model validation, scaling and regu- 
larization, decomposition algorithms, bilevel program- 
ming, optimization involving implicitly defined func- 
tions, duality relations, analysis of convergence prop- 
erties of algorithms, the effect of data perturbations 
on algorithmic performance, approximation of sensi- 
tivity measurements from algorithmic information and 
much more. Computational implementations on ac- 
tual practical real-world problems do exist but are still 
quite limited. Much remains to be done both in theory 
and practice, e.g., the provision of standard software 
for user-friendly sensitivity and stability calculations in 
major NLP solution computer programs, as available in 
LP (though user-friendliness may need more emphasis, 
even here). The theory is rich and every facet of opti- 


mality characterization and algorithmic definition and 
performance is a fertile field for the study of the effect 
of data perturbations. 

The significant contributions to this field are too nu- 
merous to mention and thus we shall not attempt to 
single out key contributors. Instead, we shall cite only 
[1,2,3], references with which the author has been per- 
sonally involved, not to presume to monopolize per- 
sonal credit but because they contain numerous ref- 
erences and state-of-the-art surveys and scholarly pa- 
pers from many established and emerging leaders in the 
field. These books and journals and the tutorial and re- 
search articles and hundreds of references therein will 
hopefully quickly and surely lead the interested reader 
to the central core of the published results in this vital 
area. 
In this article, we describe a number of (for the most part) computable formulas and bounds that can be used to approximate sensitivity and stability measurements for nonlinear programming (NLP) involving a parameter. We draw freely from the definitions and results presented in ▶ Sensitivity and stability in NLP and ▶ Sensitivity and stability in NLP: Continuity and differential stability. The reader is advised to read these before proceeding with the results we present here.

As noted in ▶ Sensitivity and stability in NLP and ▶ Sensitivity and stability in NLP: Continuity and differential stability, the problem of interest is to find a solution x(ε) of


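$$
\mathrm{P}(\varepsilon)\qquad \min_x f(x,\varepsilon) \quad \text{s.t.}\quad g_i(x,\varepsilon) \ge 0 \ (i = 1,\dots,m),\qquad h_j(x,\varepsilon) = 0 \ (j = 1,\dots,p),
$$

where x ∈ E^n and the parameter ε ∈ T ⊂ E^k. We denote a local solution by x(ε), the local optimal value by f*(ε) = f[x(ε), ε], the solution set by S(ε), and the feasible region by R(ε).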


Our focus is on general finite-dimensional deter- 
ministic NLP involving a parameter. The reader may 
know that there is a fairly fine-tuned sensitivity anal- 
ysis methodology for linear programming (LP), of- 
ten referred to as post-optimality analysis, including 
a range-analysis concerned with maximum tolerances 
on changes in the objective function coefficients within 
which a solution and basis does not change, and formu- 
las for right-hand side and constraint coefficient ma- 
trix changes, bounds on changes, and the like. Recent 
work has sharpened the classical theory to allow for 
more simultaneous data changes and extend the param- 
eter change tolerance idea. Likewise, sensitivity approx- 
imation results have been developed for quadratic pro- 
grams, networks, separable programs, geometric pro- 
grams, and many other mathematical programs having 
a special structure. In the direction of more general- 
ity, sensitivity and stability approximation results have 
been developed for semi-infinite programming, infi- 
nite programming, control theory, discrete program- 
ming, stochastic programming, and many other general 
programs, extending further to related generalizations 
such as variational inequalities, generalized equations,
nonsmooth analysis, and more. 

We consider here three types of sensitivity and sta- 
bility measurements at a general level for smooth NLP 
involving a parameter: 

i) parameter-derivatives of a local solution x(ε) and the local optimal value function f*(ε) = f[x(ε), ε];

ii) algorithmic approximations of solution parameter- 
derivatives using a classical and well known barrier 
function (interior point) algorithm; and 

iii) parameter-dependent solution bounds. 


Parameter Derivatives 


In ▶ Sensitivity and stability in NLP: Continuity and differential stability, we defined the Karush-Kuhn-Tucker conditions (KKT) at a feasible point x of P(ε). They require the existence of u ≥ 0 and w such that (conditions KKT(x, u, w, ε)):

$$
\nabla_x L(x,u,w,\varepsilon) = 0,\qquad u_i\, g_i(x,\varepsilon) = 0 \ (i = 1,\dots,m),\qquad h_j(x,\varepsilon) = 0 \ (j = 1,\dots,p),
$$

where

$$
L(x,u,w,\varepsilon) = f(x,\varepsilon) - \sum_{i=1}^{m} u_i\, g_i(x,\varepsilon) + \sum_{j=1}^{p} w_j\, h_j(x,\varepsilon)
$$

is the Lagrangian and u, w are called the Lagrange multipliers associated with x. Definitions of SOSC, LI and SCS were also given in ▶ Sensitivity and stability in NLP: Continuity and differential stability, along with the following result, which we repeat here for convenience. For simplicity, assume that the functions defining P(ε) are twice jointly continuously differentiable in (x, ε) unless otherwise specified.


Proposition 1 Assume that KKT, SOSC, LI and SCS hold at a feasible point $\bar x$ of $P(\bar\epsilon)$ with associated Lagrange multipliers $(\bar u, \bar w)$, where $\bar\epsilon \in$ interior $T$. Then, for $\epsilon$ in a neighborhood of $\bar\epsilon$ in $T$, we have the following consequences:

i) There exists a locally unique, once continuously differentiable vector function $y = (x, u, w)$ such that the assumptions continue to hold at $y(\epsilon) = [x(\epsilon), u(\epsilon), w(\epsilon)]$, where $y(\bar\epsilon) = \bar y = (\bar x, \bar u, \bar w)$.

ii) The point $x(\epsilon)$ is an isolated local minimizer of problem $P(\epsilon)$ with associated unique Lagrange multipliers $[u(\epsilon), w(\epsilon)]$; and

iii) The local optimal value function $f^*$ is twice continuously differentiable in $\epsilon$.


This Proposition follows from the fact that the $(x, u, w)$-Jacobian (i.e., the matrix of partial derivatives with respect to $(x, u, w)$) of the KKT equation system given above is nonsingular at $(\bar x, \bar u, \bar w, \bar\epsilon)$. Thus, a classical implicit function theorem is applicable and the results quickly follow, once it is shown that SOSC persists near $\epsilon = \bar\epsilon$.


Remark 2. G.P. McCormick must be credited with initially identifying and applying the conditions KKT, SOSC, LI and SCS, which imply the existence of the inverse of the KKT $y$-Jacobian. He initiated the use of these assumptions to obtain extrapolation results for classical barrier function methods in terms of the algorithmic parameter. See [6, Chap. 5]. He also used them to obtain results similar to those in Proposition 1, but for a problem with additive perturbations that may be nonlinear in $x$ and linear in $\epsilon$, a special case of problem $P(\epsilon)$. See [6, Thm. 6]. The author showed that KKT, SOSC, LI and SCS suffice for the results for the general problem $P(\epsilon)$ given in Proposition 1, as well as for the results given in the rest of this article. These developments are pursued in detail in [2]. McCormick also stated a partial converse of Proposition 1 as follows: if KKT and a weakened form of SONC hold and the KKT $y$-Jacobian has an inverse at $(\bar x, \bar u, \bar w)$ (for problem $P(\epsilon)$ with $\epsilon$ not involved, i.e., $\epsilon$ fixed), then SOSC must hold at $(\bar x, \bar u, \bar w)$. This is [6, Cor. 7], where SONC (the second-order necessary condition) here means the second-order part of SOSC, weakened to $z^T \nabla^2_x L\, z \ge 0$ for all $z$ as in SOSC. We note that SCS and LI also follow from the assumptions of [6, Cor. 7]. We also note that the following partial converse of Proposition 1 can be shown to hold and is equivalent to [6, Cor. 7] just cited: if $\bar x$ is a local minimizer of problem $P(\epsilon)$ (with $\epsilon$ fixed), KKT holds, and the inverse of the KKT $y$-Jacobian exists at $(\bar x, \bar u, \bar w)$, then all the assumptions of Proposition 1, i.e., KKT, SOSC, LI and SCS, must hold at $(\bar x, \bar u, \bar w)$.


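To obtain parameter derivatives of $(x, u, w)$, we simply note that $\mathrm{KKT}(x, u, w, \epsilon)$ is locally identically satisfied at $[x(\epsilon), u(\epsilon), w(\epsilon)]$ near $\epsilon = \bar\epsilon$, and hence these equations can be differentiated with respect to $\epsilon$. We have the following (conditions $\mathrm{KKT}[x(\epsilon), u(\epsilon), w(\epsilon), \epsilon]$):

$$
\begin{aligned}
\nabla_x L[x(\epsilon), u(\epsilon), w(\epsilon), \epsilon] &= 0, \\
u_i(\epsilon)\, g_i[x(\epsilon), \epsilon] &= 0 \quad (i = 1, \dots, m), \\
h_j[x(\epsilon), \epsilon] &= 0 \quad (j = 1, \dots, p).
\end{aligned}
$$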


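With $F$ representing the vector function of equalities on the left, this has the form $F[y(\epsilon), \epsilon] = 0$, where $y = (x, u, w)$. Since our assumptions imply that $F$ is differentiable in $\epsilon$, we can use the chain rule for differentiation to conclude that $dF/d\epsilon = \nabla_y F\, \nabla_\epsilon y + \nabla_\epsilon F = 0$. Applying this calculation to the KKT equations we get the following:

$$
M(\epsilon)\, \nabla_\epsilon y(\epsilon) + N(\epsilon) = 0,
$$

where

$$
M = \begin{pmatrix}
\nabla^2_x L & -\nabla_x g_1^T & \cdots & -\nabla_x g_m^T & \nabla_x h_1^T & \cdots & \nabla_x h_p^T \\
u_1 \nabla_x g_1 & g_1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
u_m \nabla_x g_m & 0 & \cdots & g_m & 0 & \cdots & 0 \\
\nabla_x h_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
\nabla_x h_p & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}
$$

is the Jacobian of the KKT equations with respect to $y$, and

$$
N = \begin{pmatrix}
\nabla^2_{x\epsilon} L \\ u_1 \nabla_\epsilon g_1 \\ \vdots \\ u_m \nabla_\epsilon g_m \\ \nabla_\epsilon h_1 \\ \vdots \\ \nabla_\epsilon h_p
\end{pmatrix}
$$

is the Jacobian of the KKT equations with respect to $\epsilon$. (Here the gradients $\nabla_x g_i$, $\nabla_\epsilon g_i$, $\nabla_x h_j$, $\nabla_\epsilon h_j$ are row vectors, and $\nabla^2_{x\epsilon} L$ is the $n \times k$ matrix of mixed second partial derivatives.) Since $M$ is nonsingular, as noted, we obtain the result

$$
\nabla_\epsilon y(\epsilon) = M(\epsilon)^{-1} \tilde N(\epsilon), \tag{1}
$$

where we have introduced $\tilde N = -N$ for convenience.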
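Thus, (1) provides a formula for the parameter derivatives of all components of $(x, u, w)$ for $\epsilon$ near $\bar\epsilon$. In particular, once we calculate the local solution $\bar x$ with associated Lagrange multipliers $(\bar u, \bar w)$, we can compute

$$
\nabla_\epsilon y(\bar\epsilon) = M(\bar\epsilon)^{-1} \tilde N(\bar\epsilon), \tag{2}
$$

where $y(\bar\epsilon) = [x(\bar\epsilon), u(\bar\epsilon), w(\bar\epsilon)] = (\bar x, \bar u, \bar w) = \bar y$ and, of course, $\nabla_\epsilon y = [\nabla_\epsilon x^T, \nabla_\epsilon u^T, \nabla_\epsilon w^T]^T$. It also follows that $\nabla_\epsilon y$ is continuous near $\bar\epsilon$, so $\nabla_\epsilon y(\epsilon) \to \nabla_\epsilon y(\bar\epsilon)$ as $\epsilon \to \bar\epsilon$.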
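To make formula (1) concrete, here is a minimal numerical sketch in Python with NumPy (our choice; the article contains no code). It assumes a hypothetical problem, not taken from the article: minimize $(x_1-\epsilon)^2 + (x_2-1)^2$ subject to $g(x) = 1 - x_1^2 - x_2^2 \ge 0$. For $\epsilon \ne 0$ the solution is the projection of $(\epsilon, 1)$ onto the unit disk, so the derivative $\nabla_\epsilon y = M^{-1}\tilde N$ can be checked against central finite differences of the exact solution. The function names `exact_solution` and `sensitivity` are illustrative only.

```python
import numpy as np

# Hypothetical example problem P(eps) (an assumption for illustration):
#   min (x1 - eps)^2 + (x2 - 1)^2   s.t.  g(x) = 1 - x1^2 - x2^2 >= 0.
# For eps != 0: x(eps) = (eps, 1)/s, u(eps) = s - 1, with s = sqrt(eps^2 + 1).

def exact_solution(eps):
    """Return y = (x1, x2, u) at the exact solution of P(eps)."""
    s = np.hypot(eps, 1.0)
    return np.array([eps / s, 1.0 / s, s - 1.0])

def sensitivity(eps):
    """Return grad_eps y = M^{-1} N_tilde, as in (1), at y(eps)."""
    x1, x2, u = exact_solution(eps)
    g = 1.0 - x1**2 - x2**2                  # = 0: the constraint is binding
    # M = Jacobian of F = (grad_x L, u*g) with respect to y = (x1, x2, u),
    # where L = f - u*g, so grad_x L = 2(x - (eps, 1)) + 2u x.
    M = np.array([
        [2.0 * (1.0 + u), 0.0, 2.0 * x1],    # [grad_x^2 L | -grad_x g^T]
        [0.0, 2.0 * (1.0 + u), 2.0 * x2],
        [-2.0 * u * x1, -2.0 * u * x2, g],   # [u grad_x g | g]
    ])
    # N = Jacobian of F with respect to eps: d(grad_x L)/d eps = (-2, 0),
    # and d(u*g)/d eps = 0 because g does not depend on eps.
    N = np.array([-2.0, 0.0, 0.0])
    return np.linalg.solve(M, -N)            # N_tilde = -N

eps_bar, h = 1.0, 1e-6
dy_formula = sensitivity(eps_bar)
dy_fd = (exact_solution(eps_bar + h) - exact_solution(eps_bar - h)) / (2 * h)
print(dy_formula)   # approximately [ 0.3536, -0.3536,  0.7071]
print(dy_fd)
```

The finite-difference column is only a cross-check; in practice only $M(\bar\epsilon)$ and $\tilde N(\bar\epsilon)$ at the computed solution are needed.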

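We can also calculate parameter derivatives of first and second order of the local optimal value function $f^*(\epsilon) = f[x(\epsilon), \epsilon]$, again by repeated use of the chain rule. The results follow. Since the KKT conditions imply that

a) $f^*(\epsilon) = L[x(\epsilon), u(\epsilon), w(\epsilon), \epsilon]$, then, since $\nabla_\epsilon f^* = \nabla_y L\, \nabla_\epsilon y + \nabla_\epsilon L$ and since it can be shown that $\nabla_y L\, \nabla_\epsilon y = 0$, it follows that

b) $\nabla_\epsilon f^*(\epsilon) = \nabla_\epsilon L(x, u, w, \epsilon)\big|_{(x, u, w) = [x(\epsilon), u(\epsilon), w(\epsilon)]}$, and the derivative of this gives $\nabla^2_\epsilon f^*(\epsilon)$, i.e.,

c) $\nabla^2_\epsilon f^*(\epsilon) = \dfrac{d}{d\epsilon}\big[\nabla_\epsilon L[x(\epsilon), u(\epsilon), w(\epsilon), \epsilon]\big]$, where the vertical bar in b) denotes evaluation at the specified point.

As before, these expressions apply for all $\epsilon$ near $\bar\epsilon$, they can be evaluated at $\bar\epsilon$ once $(\bar x, \bar u, \bar w)$ is available, and they are continuous near $\bar\epsilon$.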

We derive a formula for $\nabla^2_\epsilon f^*$ in explicit form in the next section, for the general problem $P(\epsilon)$.

Some special realizations of problem P(e) should be 
mentioned. They are intrinsically important and also 
result in considerably simplified formulas. 

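If problem $P(\epsilon)$ is unconstrained, then the result (as stated in → Sensitivity and stability in NLP) is that $\nabla_\epsilon f^* = \nabla_\epsilon f[x(\epsilon), \epsilon]$, with $\nabla_\epsilon x = -(\nabla^2_x f)^{-1} \nabla^2_{x\epsilon} f$ and $\nabla^2_\epsilon f^* = \nabla^2_{\epsilon x} f\, \nabla_\epsilon x + \nabla^2_\epsilon f$, which can be conveniently expressed as

$$
\nabla^2_\epsilon f^* = (\nabla_\epsilon x^T,\ I)\, \nabla^2 f \begin{pmatrix} \nabla_\epsilon x \\ I \end{pmatrix},
$$

where

$$
\nabla^2 f = \begin{pmatrix} \nabla^2_x f & \nabla^2_{x\epsilon} f \\ \nabla^2_{\epsilon x} f & \nabla^2_\epsilon f \end{pmatrix}.
$$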
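The two expressions for $\nabla^2_\epsilon f^*$ in the unconstrained case can be checked on a quadratic, where everything is available in closed form. The sketch below assumes a hypothetical $f(x, \epsilon) = \tfrac12 x^T A x + x^T B \epsilon + \tfrac12 \epsilon^T C \epsilon$ with $A$ symmetric positive definite (random data chosen only for illustration), for which $x(\epsilon) = -A^{-1}B\epsilon$ and $f^*(\epsilon) = \tfrac12 \epsilon^T (C - B^T A^{-1} B)\epsilon$.

```python
import numpy as np

# Hypothetical unconstrained test problem (an assumption for illustration):
#   f(x, eps) = 0.5 x^T A x + x^T B eps + 0.5 eps^T C eps, A symmetric positive definite.
rng = np.random.default_rng(0)
n, k = 3, 2
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)            # grad_x^2 f (SPD)
B = rng.standard_normal((n, k))        # grad_{x eps}^2 f, an n x k matrix
S = rng.standard_normal((k, k))
C = S + S.T                            # grad_eps^2 f (symmetric)

dx = -np.linalg.solve(A, B)            # grad_eps x = -(grad_x^2 f)^{-1} grad_{x eps}^2 f

# Short form: grad_{eps x}^2 f grad_eps x + grad_eps^2 f
hess_short = B.T @ dx + C
# Block form: (grad_eps x^T, I) grad^2 f (grad_eps x ; I)
full_hess = np.block([[A, B], [B.T, C]])
stack = np.vstack([dx, np.eye(k)])     # (n + k) x k
hess_block = stack.T @ full_hess @ stack
# Exact Hessian of f*(eps) = 0.5 eps^T (C - B^T A^{-1} B) eps
hess_exact = C - B.T @ np.linalg.solve(A, B)

print(np.allclose(hess_short, hess_exact), np.allclose(hess_block, hess_exact))  # True True
```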


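If the constraints of problem $P(\epsilon)$ are present but independent of $\epsilon$, then again $\nabla_\epsilon f^* = \nabla_\epsilon f[x(\epsilon), \epsilon]$ and $\nabla^2_\epsilon f^* = \nabla^2_{\epsilon x} f\, \nabla_\epsilon x + \nabla^2_\epsilon f$, but now $\nabla_\epsilon x$ is obtained from (1) with $\tilde N = [-(\nabla^2_{x\epsilon} f)^T, 0^T]^T$.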

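An extremely important case is the so-called right-hand side problem, an instance of problem $P(\epsilon)$ with form

$$
P_1(\epsilon): \quad \min_x\ f(x) \quad \text{s.t.} \quad g_i(x) \ge \epsilon_i \ (i = 1, \dots, m), \qquad h_j(x) = \epsilon_{j+m} \ (j = 1, \dots, p).
$$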


This was mentioned also in → Sensitivity and stability in NLP: Continuity and differential stability, in the context of the directional derivative $D_z f^*$ of $f^*$, which we found there to be $D_z f^*(\epsilon) = [u(\epsilon), -w(\epsilon)]^T z$. Consistent with this result, we find for problem $P_1(\epsilon)$, under our current assumptions, that

$$
\nabla_\epsilon f^*(\epsilon) = [u(\epsilon)^T, -w(\epsilon)^T], \qquad \nabla^2_\epsilon f^*(\epsilon) = [\nabla_\epsilon u(\epsilon)^T, -\nabla_\epsilon w(\epsilon)^T]^T. \tag{3}
$$
Thus, as noted in → Sensitivity and stability in NLP: Continuity and differential stability, the rate of change of the optimal value with respect to a small change only in the $k$th constraint value is captured by the value of the associated optimal Lagrange multiplier. In applications, since the components of $\epsilon$ may generally be viewed as ‘resource levels’, this establishes a direct connection between a given level and its imputed value (with regard to a marginal change in the optimal value), given by the associated optimal Lagrange multiplier. For this reason, optimal Lagrange multipliers are often called ‘shadow prices’.
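The shadow-price interpretation in (3) can be checked numerically on a small right-hand side problem. The sketch assumes a hypothetical instance of $P_1(\epsilon)$ with $m = 1$, $p = 0$: minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 \ge \epsilon$ with $\epsilon > 0$, so the constraint is binding with $u > 0$. It solves the (linear) KKT system directly and compares $u$ and $\nabla_\epsilon u$ with finite differences of $f^*$; the helper `solve_p1` is an illustrative name.

```python
import numpy as np

# Hypothetical right-hand side problem (an assumption for illustration):
#   P1(eps): min x1^2 + x2^2  s.t.  g(x) = x1 + x2 >= eps,  with eps > 0,
# so that the constraint is binding and u > 0 (SCS holds).

def solve_p1(eps):
    """Solve the KKT system with the constraint binding; return x, u, f*."""
    # Equations: 2*x1 - u = 0, 2*x2 - u = 0, x1 + x2 = eps
    K = np.array([[2.0, 0.0, -1.0],
                  [0.0, 2.0, -1.0],
                  [1.0, 1.0, 0.0]])
    x1, x2, u = np.linalg.solve(K, np.array([0.0, 0.0, eps]))
    return np.array([x1, x2]), u, x1**2 + x2**2

eps_bar, h = 1.5, 1e-3
x_bar, u_bar, f_bar = solve_p1(eps_bar)
_, u_plus, f_plus = solve_p1(eps_bar + h)
_, u_minus, f_minus = solve_p1(eps_bar - h)

df = (f_plus - f_minus) / (2 * h)              # approximates grad_eps f*
d2f = (f_plus - 2 * f_bar + f_minus) / h**2    # approximates grad_eps^2 f*
du = (u_plus - u_minus) / (2 * h)              # approximates grad_eps u

# By (3): grad_eps f* = u and grad_eps^2 f* = grad_eps u
print(f"u = {u_bar:.6f}, df*/deps ~ {df:.6f}")
print(f"du/deps ~ {du:.6f}, d2f*/deps2 ~ {d2f:.6f}")
```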

The Lagrange multipliers are explicitly involved as variables in dual programs, and constitute all the dual variables for linear programming problems. A considerable literature is devoted to duality and dual programs and their relationship. As noted, duality and sensitivity analysis results are closely connected. We do not pursue this here. See [2] or any standard NLP or LP textbook for an introduction.


Applications of Sensitivity Analysis 


There are numerous applications of sensitivity and stability results such as those briefly presented. One of the most immediate is the verification of local stability and the determination of the relative importance of each parameter at a local solution, in accordance with its relative effect on the optimal value, solution point or Lagrange multipliers. This assessment is also relevant to model validation: in determining whether the relative effects noted are reasonable, whether changes are warranted, and whether some parameters should be rescaled or calculated with more precision, or others assumed constant and locally ignored for simplicity because of their small effect.

Another immediate application is the estimation of a solution or optimal value for perturbed data. This was indicated in → Sensitivity and stability in NLP for the unconstrained problem. For the general problem $P(\epsilon)$, under the assumptions of Proposition 1, an approximation of $y(\epsilon)$ in a neighborhood of $\bar\epsilon$ is given by the first-order terms of a Taylor series:

$$
y(\epsilon) \approx y(\bar\epsilon) + \nabla_\epsilon y(\bar\epsilon)(\epsilon - \bar\epsilon).
$$

Using the results of the previous section, particularly (2), this translates into

$$
[x(\epsilon), u(\epsilon), w(\epsilon)] \approx (\bar x, \bar u, \bar w) + M(\bar\epsilon)^{-1} \tilde N(\bar\epsilon)(\epsilon - \bar\epsilon). \tag{4}
$$


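Similarly, we can obtain first- or second-order estimates of the optimal value function near $\epsilon = \bar\epsilon$, respectively, as follows:

$$
f^*(\epsilon) \approx f^*(\bar\epsilon) + \nabla_\epsilon f^*(\bar\epsilon)(\epsilon - \bar\epsilon)
$$

and

$$
f^*(\epsilon) \approx f^*(\bar\epsilon) + \nabla_\epsilon f^*(\bar\epsilon)(\epsilon - \bar\epsilon) + \tfrac{1}{2}(\epsilon - \bar\epsilon)^T \nabla^2_\epsilon f^*(\bar\epsilon)(\epsilon - \bar\epsilon). \tag{5}
$$

These approximations provide computable estimates when a close approximation of $(\bar x, \bar u, \bar w)$ is available, where we can use the relationships indicated in the previous section for $\nabla_\epsilon f^*$, $\nabla^2_\epsilon f^*$, etc., under the various cases that can arise.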
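As a small illustration of the first-order estimate (4) and the second-order value estimate (5), the sketch below assumes a hypothetical one-dimensional unconstrained problem $f(x, \epsilon) = e^x - \epsilon x$ ($\epsilon > 0$), for which $x(\epsilon) = \ln\epsilon$ and $f^*(\epsilon) = \epsilon - \epsilon\ln\epsilon$ are known exactly. The derivatives at $\bar\epsilon = 1$ are computed only from the unconstrained-case formulas and compared with the exact values.

```python
import math

# Hypothetical unconstrained example (an assumption for illustration):
#   f(x, eps) = exp(x) - eps * x,  eps > 0,
# with exact x(eps) = ln(eps) and f*(eps) = eps - eps * ln(eps).
def x_exact(eps):
    return math.log(eps)

def fstar_exact(eps):
    return eps - eps * math.log(eps)

eps_bar = 1.0
x_bar = x_exact(eps_bar)
fxx = math.exp(x_bar)      # grad_x^2 f at (x_bar, eps_bar)
fxe = -1.0                 # grad_{x eps}^2 f
fee = 0.0                  # grad_eps^2 f
dx = -fxe / fxx            # grad_eps x = -(grad_x^2 f)^{-1} grad_{x eps}^2 f
df = -x_bar                # grad_eps f* = grad_eps f = -x
d2f = fxe * dx + fee       # grad_eps^2 f* = grad_{eps x}^2 f grad_eps x + grad_eps^2 f

results = {}
for eps in (1.05, 1.1, 1.2):
    d = eps - eps_bar
    x_est = x_bar + dx * d                   # first-order estimate, as in (4)
    f1 = fstar_exact(eps_bar) + df * d       # first-order value estimate
    f2 = f1 + 0.5 * d2f * d * d              # second-order value estimate (5)
    results[eps] = (x_exact(eps), x_est, fstar_exact(eps), f1, f2)
    print(f"eps={eps}: x {x_exact(eps):.5f} ~ {x_est:.5f}; "
          f"f* {fstar_exact(eps):.5f} ~ {f1:.5f} (1st) / {f2:.5f} (2nd)")
```

As expected, the second-order term noticeably improves the value estimate for moderate perturbations.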

Numerous other applications are known and docu- 
mented and the reader can undoubtedly think of sev- 
eral. We cannot pursue more developments here, but 
we do indicate some computational efficiencies and al- 
gorithmic approximation possibilities in the next sec- 
tion. 


Computational Efficiencies and Algorithmic Approximations


First, we show how the results of Proposition 1 can 
be exploited to greatly simplify the calculation of the 
parameter-derivatives, and also yield some new formu- 
las. Then, we show how the solution point and opti- 
mal value parameter derivatives can be approximated 
by a solution algorithm as a solution is approached. 


Computational Efficiencies 


Above, we noted that the equations $\mathrm{KKT}[y(\epsilon), \epsilon]$, i.e.,

$$
\begin{aligned}
\nabla_x L[x(\epsilon), u(\epsilon), w(\epsilon), \epsilon] &= 0, \\
u_i(\epsilon)\, g_i[x(\epsilon), \epsilon] &= 0 \quad (i = 1, \dots, m), \\
h_j[x(\epsilon), \epsilon] &= 0 \quad (j = 1, \dots, p),
\end{aligned}
$$

hold in a neighborhood of $\bar\epsilon$. Under the assumptions, everything changes continuously with $\epsilon$, locally, and indeed the system is once continuously differentiable. Having calculated $(\bar x, \bar u, \bar w)$ with $\epsilon = \bar\epsilon$, we have valuable additional information. We know which $g_i$ and $u_i$ are 0 or positive at $(\bar y, \bar\epsilon)$, and we know that this status of the $g_i$ and $u_i$ will persist at $y(\epsilon)$ near $\epsilon = \bar\epsilon$. Thus, we can delete the nonbinding-constraint equations and we can divide out the positive $u_i$, without losing any information, locally. We proceed to do this and, for convenience, relabel the binding constraints $\hat g = (g_1, \dots, g_r)^T$ and the associated (all positive) Lagrange multipliers $\hat u = (u_1, \dots, u_r)^T$, so that $g_i > 0$ and $u_i = 0$ for $i = r+1, \dots, m$ can be ignored and suppressed in the subsequent calculation. We redefine the Lagrangian accordingly as $\hat L = f - \sum_{i=1}^{r} u_i g_i + \sum_{j=1}^{p} w_j h_j$ and relabel $\nabla_\epsilon y$ as $\nabla_\epsilon \hat y$, to indicate the reduced derivative, so that $\hat y = (x, \hat u, w)$.
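Given all these simplifications, the KKT system becomes $\mathrm{KKT}[\hat y(\epsilon), \epsilon]$:

$$
\begin{aligned}
\nabla_x \hat L[x(\epsilon), \hat u(\epsilon), w(\epsilon), \epsilon] &= 0, \\
-g_i[x(\epsilon), \epsilon] &= 0 \quad (i = 1, \dots, r), \\
h_j[x(\epsilon), \epsilon] &= 0 \quad (j = 1, \dots, p),
\end{aligned}
$$

where we have written $-g_i$ rather than $+g_i$ to obtain a symmetric $(x, \hat u, w)$-Jacobian of this system and further computational advantage. Now, differentiating with respect to $\epsilon$ yields

$$
\hat M(\epsilon)\, \nabla_\epsilon \hat y(\epsilon) - \hat N(\epsilon) = 0,
$$

and, analogously to (1), we obtain

$$
\nabla_\epsilon \hat y(\epsilon) = \hat M(\epsilon)^{-1} \hat N(\epsilon), \tag{6}
$$

where the Jacobian of KKT with respect to $(x, \hat u, w)$ is

$$
\hat M = \begin{pmatrix} \nabla^2_x \hat L & J \\ J^T & 0 \end{pmatrix},
$$

the negative of the Jacobian with respect to $\epsilon$ is

$$
\hat N = [-(\nabla^2_{x\epsilon} \hat L)^T, \nabla_\epsilon g_1^T, \dots, \nabla_\epsilon g_r^T, -\nabla_\epsilon h_1^T, \dots, -\nabla_\epsilon h_p^T]^T,
$$

and

$$
J = [-\nabla_x g_1^T, \dots, -\nabla_x g_r^T, \nabla_x h_1^T, \dots, \nabla_x h_p^T].
$$

Remark 3 We note that the above system $\mathrm{KKT}[\hat y(\epsilon), \epsilon]$ turns out to be $\nabla_{\hat y} \hat L = 0$, and we get $\hat M = \nabla^2_{\hat y} \hat L$, all evaluated at $[\hat y(\epsilon), \epsilon]$, where we recall that $\hat y = (x, \hat u, w)$. Also, note that $\hat N = -\nabla^2_{\hat y \epsilon} \hat L$.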


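Of course, all of the formulas presented in the first section still hold in terms of this new reduced structure, providing all the relevant and nontrivial formulas for all the cases mentioned there. In addition, we are able to express all of the block components $A_{11}, A_{12}, A_{21}, A_{22}$ of $\hat M^{-1}$, for all cases that can arise, in terms of the original problem derivatives, resulting in enormous computational savings. See [2, Chap. 4, Sect. 2]. Further economies are introduced in the special formulas that apply for the right-hand side perturbation problem $P_1(\epsilon)$ and other particular instances of the general problem $P(\epsilon)$. The optimal value formulas also simplify considerably and, for the general problem, since $f^* = \hat L$ and $\nabla_\epsilon f^* = \nabla_\epsilon \hat L$, we obtain the elegant result

$$
\begin{aligned}
\nabla^2_\epsilon f^* &= \nabla^2_{\epsilon \hat y} \hat L\, \nabla_\epsilon \hat y + \nabla^2_\epsilon \hat L \\
&= -\hat N^T \nabla_\epsilon \hat y + \nabla^2_\epsilon \hat L = -\hat N^T \hat M^{-1} \hat N + \nabla^2_\epsilon \hat L,
\end{aligned} \tag{7}
$$

which can be expressed as

$$
\nabla^2_\epsilon f^* = (\nabla_\epsilon \hat y^T,\ I) \begin{pmatrix} \nabla^2_{\hat y} \hat L & \nabla^2_{\hat y \epsilon} \hat L \\ \nabla^2_{\epsilon \hat y} \hat L & \nabla^2_\epsilon \hat L \end{pmatrix} \begin{pmatrix} \nabla_\epsilon \hat y \\ I \end{pmatrix}.
$$

This expression and the first equation in (7) are new and were not given in [2]. It turns out that $\nabla^2_\epsilon f^*$ can be reduced to the simpler form (presented in [2])

$$
\begin{aligned}
\nabla^2_\epsilon f^* &= (\nabla_\epsilon x^T,\ I)\, \nabla^2_{(x,\epsilon)} \hat L[x(\epsilon), \hat u(\epsilon), w(\epsilon), \epsilon] \begin{pmatrix} \nabla_\epsilon x \\ I \end{pmatrix} \\
&= (\nabla_\epsilon x^T,\ I) \begin{pmatrix} \nabla^2_x \hat L & \nabla^2_{x\epsilon} \hat L \\ \nabla^2_{\epsilon x} \hat L & \nabla^2_\epsilon \hat L \end{pmatrix} \begin{pmatrix} \nabla_\epsilon x \\ I \end{pmatrix}.
\end{aligned} \tag{8}
$$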
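The reduced formulas (6), (7) and (8) can be cross-checked against each other on a toy problem with one binding constraint. The sketch assumes a hypothetical problem, not from the article: minimize $(x_1-\epsilon)^2 + (x_2-1)^2$ over the unit disk $g(x) = 1 - x_1^2 - x_2^2 \ge 0$, whose optimal value is $f^*(\epsilon) = (\sqrt{\epsilon^2+1} - 1)^2$. It also confirms that $\hat M$ is symmetric, as noted in Remark 3.

```python
import numpy as np

# Hypothetical example (an assumption for illustration):
#   min (x1 - eps)^2 + (x2 - 1)^2  s.t.  g(x) = 1 - x1^2 - x2^2 >= 0,
# solution x = (eps, 1)/s, u = s - 1, s = sqrt(eps^2 + 1), f*(eps) = (s - 1)^2.
# The single constraint is binding, so y_hat = (x1, x2, u).
eps = 1.0
s = np.hypot(eps, 1.0)
x1, x2, u = eps / s, 1.0 / s, s - 1.0

H = 2.0 * (1.0 + u) * np.eye(2)           # grad_x^2 L_hat
J = np.array([[2.0 * x1], [2.0 * x2]])    # J = -grad_x g^T (no equality constraints)
M_hat = np.block([[H, J], [J.T, np.zeros((1, 1))]])   # = grad_yhat^2 L_hat
B = np.array([[-2.0], [0.0]])             # grad_{x eps}^2 L_hat
N_hat = np.array([[2.0], [0.0], [0.0]])   # -grad_{yhat eps}^2 L_hat (g independent of eps)
L_ee = np.array([[2.0]])                  # grad_eps^2 L_hat

dy = np.linalg.solve(M_hat, N_hat)        # (6)
hess7 = -N_hat.T @ dy + L_ee              # (7): -N^T M^{-1} N + grad_eps^2 L_hat
X = dy[:2]                                # grad_eps x
full = np.block([[H, B], [B.T, L_ee]])    # grad_{(x, eps)}^2 L_hat
stack = np.vstack([X, np.eye(1)])
hess8 = stack.T @ full @ stack            # (8)
hess_exact = 2.0 * (eps**2 / s**2 + (s - 1.0) / s**3)   # second derivative of (s - 1)^2

print(np.allclose(M_hat, M_hat.T), hess7.item(), hess8.item(), hess_exact)
```

All three values agree (about 1.2929 at $\epsilon = 1$); the point of (8) is that only the $x$-block of the derivative is needed.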


Remark 4 If the given problem $P(\epsilon)$ is jointly convex in $(x, \epsilon)$, i.e., $f$ convex, all $g_i$ concave and all $h_j$ linear-affine in $(x, \epsilon)$, with $T$ convex, then it is well known that $f^*$ is convex. This fact becomes immediately evident from (8), since $P(\epsilon)$ jointly convex implies that $\hat L$ is jointly convex in $(x, \epsilon)$ (recall that $\hat u > 0$). But this implies that $\nabla^2_{(x,\epsilon)} \hat L$ is positive semidefinite, which means that $\nabla^2_\epsilon f^*$ is positive semidefinite, hence $f^*$ must be convex. Of course, we need to assume that problem $P(\epsilon)$ has, for each $\epsilon \in T$, a solution at which the conclusion of Proposition 1 iii) holds, to guarantee that (8) holds for each $\epsilon \in T$.


Algorithmic Approximations 


An intriguing possibility is the simultaneous approx- 
imation of sensitivity and stability analysis informa- 
tion, as a solution is being estimated by an algorithm in 
progress. We illustrate the feasibility of this idea with 
results that have been obtained by a classical combi- 
nation barrier-exterior point algorithm, forerunner of 
a powerful class of interior-exterior point methods that 
are currently among the best methods available, both 
for LP and NLP. 
The classical logarithmic-quadratic barrier-penalty function for problem $P(\epsilon)$ is defined as

$$
W(x, \epsilon, r) = f(x, \epsilon) - r \sum_{i=1}^{m} \log g_i(x, \epsilon) + \frac{1}{2r} \sum_{j=1}^{p} h_j^2(x, \epsilon). \tag{9}
$$


A solution procedure based on $W$ for solving problem $P(\epsilon)$ for any fixed value of $\epsilon$ is briefly described as follows: select $r_k > 0$ such that $r_k \to 0$ as $k \to +\infty$, and solve $\min_x W(x, \epsilon, r_k)$ s.t. $g_i(x, \epsilon) > 0$ (for all $i$) to obtain $x(\epsilon, r_k)$, for $k = 1, 2, \dots$. Then, ideally, $x(\epsilon, r_k) \to x(\epsilon)$, a solution of $P(\epsilon)$, as $k \to +\infty$.
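The procedure just described can be sketched as follows, assuming a hypothetical problem (ours, not the article's) with one inequality and one equality: minimize $x_1^2 + x_2^2$ subject to $g = x_1 - \epsilon \ge 0$ and $h = x_1 + x_2 - 2 = 0$, at $\epsilon = 1.5$, whose solution is $x = (1.5, 0.5)$ with $u = 2$, $w = -1$. Each subproblem $\min_x W$ is solved by a damped Newton method with backtracking that keeps $g > 0$ (one reasonable choice of inner solver among many), warm-started from the previous $x(\epsilon, r_k)$. The multiplier estimates use $u = r/g$ and $w = h/r$, anticipating the relations derived next.

```python
import numpy as np

# Hypothetical test problem (an assumption for illustration):
#   P(eps): min x1^2 + x2^2  s.t.  g(x) = x1 - eps >= 0,  h(x) = x1 + x2 - 2 = 0.
# For eps = 1.5: x = (1.5, 0.5), u = 2, w = -1 (convention L = f - u*g + w*h).
EPS = 1.5

def W(x, r):
    """Logarithmic-quadratic barrier-penalty function (9) for this problem."""
    g, h = x[0] - EPS, x[0] + x[1] - 2.0
    return x @ x - r * np.log(g) + h * h / (2.0 * r)

def grad_hess_W(x, r):
    g, h = x[0] - EPS, x[0] + x[1] - 2.0
    e1 = np.array([1.0, 0.0])                      # grad_x g
    grad = 2.0 * x - (r / g) * e1 + (h / r) * np.ones(2)
    hess = 2.0 * np.eye(2) + (r / g**2) * np.outer(e1, e1) + np.ones((2, 2)) / r
    return grad, hess

def minimize_W(x, r, tol=1e-14, max_iter=100):
    """Damped Newton method on W(., eps, r), keeping g(x) > 0 throughout.
    Stops when the squared Newton decrement grad^T H^{-1} grad is below tol."""
    for _ in range(max_iter):
        grad, hess = grad_hess_W(x, r)
        step = np.linalg.solve(hess, -grad)
        dec2 = -(grad @ step)
        if dec2 < tol:
            break
        t = 1.0
        # Backtrack until the trial point is strictly feasible and W decreases enough.
        while (x[0] + t * step[0] - EPS <= 0.0
               or W(x + t * step, r) > W(x, r) - 1e-4 * t * dec2):
            t *= 0.5
        x = x + t * step
    return x

x = np.array([2.0, 0.0])                           # strictly feasible start (g = 0.5)
for r in [10.0 ** (-k) for k in range(7)]:         # r_k = 1, 0.1, ..., 1e-6
    x = minimize_W(x, r)                           # warm start from the previous minimizer
    u = r / (x[0] - EPS)                           # multiplier estimate u = r/g
    w = (x[0] + x[1] - 2.0) / r                    # multiplier estimate w = h/r
    print(f"r={r:.0e}  x=({x[0]:.6f}, {x[1]:.6f})  u={u:.6f}  w={w:.6f}")
```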

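At a minimizer $x(\epsilon, r)$ of $W(x, \epsilon, r)$, we must have $\nabla_x W = 0$. Since $\nabla_x W = \nabla_x f - \sum_{i=1}^{m} (r/g_i)\nabla_x g_i + \sum_{j=1}^{p} (h_j/r)\nabla_x h_j$, and since the Lagrangian of problem $P(\epsilon)$ is $L = f - \sum_{i=1}^{m} u_i g_i + \sum_{j=1}^{p} w_j h_j$, yielding $\nabla_x L = \nabla_x f - \sum_{i=1}^{m} u_i \nabla_x g_i + \sum_{j=1}^{p} w_j \nabla_x h_j$, it follows that $\nabla_x W = 0$ is equivalent to $\nabla_x L = 0$ with $r/g_i = u_i$ and $h_j/r = w_j$. Hence, combined with the results of Proposition 1, we have the following additional consequences.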


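Proposition 5 Under the assumptions of Proposition 1, for any $(\epsilon, r)$ in a neighborhood of $(\bar\epsilon, 0)$ with $r > 0$, there exists a unique once continuously differentiable vector function $y$, with $y(\epsilon, r) = [x(\epsilon, r), u(\epsilon, r), w(\epsilon, r)]$, where the assumptions of Proposition 1 continue to hold, and such that the conditions $\mathrm{KKT}[y(\epsilon, r), \epsilon, r]$,

$$
\begin{aligned}
\nabla_x L[x(\epsilon, r), u(\epsilon, r), w(\epsilon, r), \epsilon] &= 0, \\
u_i(\epsilon, r)\, g_i[x(\epsilon, r), \epsilon] &= r \quad (i = 1, \dots, m), \\
h_j[x(\epsilon, r), \epsilon] &= r\, w_j(\epsilon, r) \quad (j = 1, \dots, p),
\end{aligned}
$$

hold, with $y(\bar\epsilon, 0) = (\bar x, \bar u, \bar w)$, and such that $x(\epsilon, r)$ is a locally unique unconstrained local minimizer of $W(x, \epsilon, r)$, with all $g_i[x(\epsilon, r), \epsilon] > 0$ and $\nabla^2_x W[x(\epsilon, r), \epsilon, r]$ positive definite.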
Note that the resulting equations in Proposition 5, satisfied at $y(\epsilon, r)$, are a simple perturbation of the KKT equations exhibited above, and we have labeled them accordingly. This is a striking example of the close relationship between optimality conditions and algorithmic theory, and, as we now indicate, sensitivity analysis as well.

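Just as above, we can immediately differentiate the equations $\mathrm{KKT}[y(\epsilon, r), \epsilon, r]$ with respect to $\epsilon$, now to obtain the perturbed version of formula (1) for the parameter derivative $\nabla_\epsilon y(\epsilon)$. The analogous reasoning applies and we obtain

$$
\nabla_\epsilon y(\epsilon, r) = M(\epsilon, r)^{-1} \tilde N(\epsilon, r), \tag{10}
$$

where $M$ and $-\tilde N$ are the Jacobians of the perturbed KKT system with respect to $(x, u, w)$ and $\epsilon$, respectively.

The following results hold for any $\epsilon$ sufficiently close to $\bar\epsilon$ and for $r > 0$ small enough:

i) $y(\epsilon, 0) = y(\epsilon)$,

ii) $\nabla_\epsilon y(\epsilon, 0) = \nabla_\epsilon y(\epsilon)$,

iii) $y(\epsilon, r) \to y(\bar\epsilon)$ as $(\epsilon, r) \to (\bar\epsilon, 0)$, and

iv) $\nabla_\epsilon y(\epsilon, r) \to \nabla_\epsilon y(\bar\epsilon) = M(\bar\epsilon)^{-1} \tilde N(\bar\epsilon)$ as $(\epsilon, r) \to (\bar\epsilon, 0)$.

Thus, in particular, we can take $\epsilon = \bar\epsilon$ and estimate $y(\bar\epsilon)$ and $\nabla_\epsilon y(\bar\epsilon)$ by $y(\bar\epsilon, r)$ and $\nabla_\epsilon y(\bar\epsilon, r)$, for $r > 0$ small.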

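These results lead immediately to optimal value approximations, since we find that, for $\epsilon$ near $\bar\epsilon$ and $r > 0$ small:

i) $W^*(\epsilon, r) \to f^*(\epsilon)$,

ii) $\nabla_\epsilon W^*(\epsilon, r) \to \nabla_\epsilon f^*(\epsilon)$, and

iii) $\nabla^2_\epsilon W^*(\epsilon, r) \to \nabla^2_\epsilon f^*(\epsilon)$ as $r \to 0$, where $W^*(\epsilon, r) = W[x(\epsilon, r), \epsilon, r]$.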

We also note that since $\nabla_x W[x(\epsilon, r), \epsilon, r] = 0$ identically and $\nabla^2_x W$ is positive definite there, differentiating with respect to $\epsilon$ gives $\nabla^2_x W\, \nabla_\epsilon x + \nabla^2_{x\epsilon} W = 0$, yielding

$$
\nabla_\epsilon x(\epsilon, r) = -\left[\nabla^2_x W\right]^{-1} \nabla^2_{x\epsilon} W, \tag{11}
$$

an alternative calculation for $\nabla_\epsilon x$, leading immediately to simple formulas for $\nabla_\epsilon u_i(\epsilon, r)$ and $\nabla_\epsilon w_j(\epsilon, r)$ through the relations $u_i = r/g_i$ and $w_j = h_j/r$. These results provide a basis for approximating solution sensitivity information at an unconstrained minimizer of $W$ by exploiting algorithmic properties and calculations. It is important to note that most of the information required to calculate the parameter derivatives will already be available from the usual implementations of an algorithm based on $W$ that is used to find a solution of problem $P(\epsilon)$.
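Formula (11) and the limits above can be observed on a problem whose barrier minimizer is known in closed form. The sketch assumes a hypothetical problem, not from the article: minimize $x_1^2 + x_2^2$ subject to $g = x_1 + x_2 - \epsilon \ge 0$ ($\epsilon > 0$), with $x(\epsilon) = (\epsilon/2, \epsilon/2)$, $u(\epsilon) = \epsilon$ and $f^*(\epsilon) = \epsilon^2/2$. With $W = x_1^2 + x_2^2 - r\log g$, the minimizer is $x_1 = x_2 = t$ with $t = (\epsilon + \sqrt{\epsilon^2 + 4r})/4$. The quantities $\nabla^2_x W$ and $\nabla^2_{x\epsilon} W$ are exactly those a Newton-based implementation would already have.

```python
import numpy as np

# Hypothetical example (an assumption for illustration):
#   P(eps): min x1^2 + x2^2  s.t.  g(x, eps) = x1 + x2 - eps >= 0,  eps > 0,
# with x(eps) = (eps/2, eps/2), u(eps) = eps, f*(eps) = eps^2 / 2.
# For W = x1^2 + x2^2 - r*log(g), the minimizer is x1 = x2 = t(eps, r).

def barrier_quantities(eps, r):
    t = (eps + np.sqrt(eps**2 + 4.0 * r)) / 4.0
    x = np.array([t, t])
    g = x.sum() - eps                              # g > 0 at the barrier minimizer
    c = r / g**2
    hess_W = 2.0 * np.eye(2) + c * np.ones((2, 2))  # grad_x^2 W
    mixed_W = -c * np.ones(2)                       # grad_{x eps}^2 W
    dx = -np.linalg.solve(hess_W, mixed_W)          # (11)
    u = r / g                                       # u = r/g
    du = -c * (dx.sum() - 1.0)                      # d(r/g)/d eps = -(r/g^2) dg/d eps
    W_star = x @ x - r * np.log(g)                  # W*(eps, r)
    return dx, u, du, W_star

eps = 2.0
for r in (1e-1, 1e-3, 1e-6):
    dx, u, du, W_star = barrier_quantities(eps, r)
    print(f"r={r:.0e}: dx/deps={dx}, u={u:.6f}, du/deps={du:.6f}, W*={W_star:.6f}")
# As r -> 0: dx/deps -> (0.5, 0.5), u -> eps, du/deps -> 1, W* -> eps^2/2
```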

In the next section, we present a few basic results 
that provide upper bounds on a solution point change 
and upper and lower bounds on the change in the op- 
timal value, when the problem data parameter is per- 
turbed. Many extensions and variations are possible 
and many have been reported. Our objective here is to 
give an idea of the kind of bound information that can 
be calculated, once a solution has been determined. 
Bounds
Linear Equations 


In → Sensitivity and stability in NLP, our first example of an important and widely encountered mathematical model is a system of equations,

$$
Ax = b, \tag{12}
$$

where $A$ is an $n \times n$ real matrix and $x$ and $b$ are in $E^n$. We briefly discussed a number of issues that might arise in characterizing a solution $x$ for changes in the data $A$ and $b$. As noted, we may think of a solution or set of solutions as a function of $A$ and $b$, viewed as parameters, exploit the nonsingularity of $A$ to obtain a closed-form expression for $x(A, b)$, and otherwise endeavor to track $x$ as $A$ and $b$ vary, or at least establish existence, continuity, parameter-differentiability properties and the like. Another approach is the development of bounds on a solution change resulting from the data perturbations. The equations model is important in its own right, and it is also directly relevant here, since many of the sensitivity results presented here, in → Sensitivity and stability in NLP and in → Sensitivity and stability in NLP: Continuity and differential stability, as well as elsewhere, reduce the problem to equations (or their linearization) that follow from optimality conditions. Hence, we give a classical result that is both important and instructive in revealing what information is crucial in the calculation of bounds.

Let $\|\cdot\|$ denote a vector norm or its corresponding (induced) matrix norm, as relevant in the following context. Perturbations in the data $A$ and $b$ are denoted by $\delta A$ and $\delta b$, respectively.


Proposition 6 Suppose $A^{-1}$ exists, $b \ne 0$, and $x$ solves $Ax = b$, while $x + \delta x$ solves $(A + \delta A)(x + \delta x) = b + \delta b$. Suppose $\alpha = \|A^{-1}\delta A\| < 1$. Then the corresponding change $\delta x$ in the solution $x$ satisfies the inequality

$$
\frac{\|\delta x\|}{\|x\|} \le \frac{c(A)}{1 - \alpha} \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right), \tag{13}
$$

where $c(A) = \|A\|\, \|A^{-1}\|$ is the so-called condition number of $A$.


This result bounds the relative change in $x$ in terms of the relative changes in the data, for a generally small perturbation of $A$ (since we require $\alpha < 1$). For $\|\delta A\|$ small enough, $\alpha$ will be close to 0, and the crucial factor is the condition number: if $c(A)$ is small enough, then small relative data changes produce small relative solution changes, but if $c(A)$ is too large, then large solution changes may result from small data changes. The former and latter cases correspond to what are often termed ‘well-conditioned’ and ‘ill-conditioned’ systems, respectively. (Generally, the larger the value of $c(A)$, the more difficult it is to solve (12) with prescribed accuracy.) This bound is of intrinsic interest and should provide a useful perspective for the reader. Some of the bounds results that follow for various classes of parametric programs may be seen to be related to some variation of (13). In this context, note that for the Euclidean norm the condition number is $c(A) = \lambda_{\max}/\lambda_{\min} \ge 1$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the positive square roots of the maximum and minimum eigenvalues of $A^T A$, respectively, and we assume that $\lambda_{\min} > 0$. If $A$ is a real positive definite symmetric matrix, then $\lambda_{\min}$ and $\lambda_{\max}$ are simply the minimum and maximum eigenvalues of $A$.
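Bound (13) is easy to check numerically in the Euclidean norm. The sketch below uses arbitrary random data chosen only for illustration; it also confirms that NumPy's 2-norm condition number equals the ratio of the extreme singular values, i.e., $\lambda_{\max}/\lambda_{\min}$ in the notation above.

```python
import numpy as np

# Numerical check of bound (13) in the Euclidean (spectral) norm.
# The matrices and perturbation sizes are arbitrary choices for illustration.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
dA = 1e-3 * rng.standard_normal((n, n))
db = 1e-3 * rng.standard_normal(n)

x = np.linalg.solve(A, b)
dx = np.linalg.solve(A + dA, b + db) - x

def norm(v):
    """Spectral norm for matrices, Euclidean norm for vectors."""
    return np.linalg.norm(v, 2)

alpha = norm(np.linalg.solve(A, dA))        # ||A^{-1} dA||, must be < 1
cond = np.linalg.cond(A, 2)                 # c(A) = ||A|| ||A^{-1}||
lhs = norm(dx) / norm(x)
rhs = cond / (1.0 - alpha) * (norm(dA) / norm(A) + norm(db) / norm(b))
print(f"alpha={alpha:.2e}, c(A)={cond:.2f}, {lhs:.3e} <= {rhs:.3e}")
```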


Solution-Point Bounds for NLP 


We begin with a bound on the perturbed solution of a quadratic program (QP), i.e., a mathematical programming problem having a quadratic objective function and linear constraints:

$$
Q(K, k): \quad \min_x\ \tfrac{1}{2} x^T K x - k^T x \quad \text{s.t.} \quad Cx \le c, \quad Dx = d.
$$

Proposition 7 Assume that $K$ is a real positive definite and symmetric matrix, with minimum eigenvalue $\lambda$. Suppose the data $K$ and $k$ are perturbed to $\hat K$ and $\hat k$, and let $\delta = \max\{\|\hat K - K\|, \|\hat k - k\|\}$. Assume that $x$ solves $Q(K, k)$ and $\hat x$ solves $Q(\hat K, \hat k)$. Then, provided $\delta < \lambda$, it follows that

$$
\|\hat x - x\| \le \frac{\delta}{\lambda - \delta}\, (1 + \|x\|). \tag{14}
$$
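A quick numerical check of (14) is possible in the special case of $Q(K, k)$ with no constraints, where $x = K^{-1}k$ and $\hat x = \hat K^{-1}\hat k$. The sketch below uses arbitrary random data with a symmetric perturbation of $K$, and the spectral and Euclidean norms; it illustrates the bound, it does not prove it.

```python
import numpy as np

# Check of bound (14) for Q(K, k) without constraints (x = K^{-1} k).
# The data are arbitrary illustrative choices; norms are spectral/Euclidean.
rng = np.random.default_rng(2)
n = 4
G = rng.standard_normal((n, n))
K = G @ G.T + np.eye(n)                   # symmetric positive definite
k = rng.standard_normal(n)
E = rng.standard_normal((n, n))
K_hat = K + 1e-2 * (E + E.T)              # symmetric perturbation of K
k_hat = k + 1e-2 * rng.standard_normal(n)

lam = np.linalg.eigvalsh(K).min()         # minimum eigenvalue of K
delta = max(np.linalg.norm(K_hat - K, 2), np.linalg.norm(k_hat - k))
x = np.linalg.solve(K, k)
x_hat = np.linalg.solve(K_hat, k_hat)
lhs = np.linalg.norm(x_hat - x)
rhs = delta / (lam - delta) * (1.0 + np.linalg.norm(x))
print(f"delta={delta:.3e} < lambda={lam:.3f}; {lhs:.3e} <= {rhs:.3e}")
```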


The following results apply to the general parametric problem $P(\epsilon)$ introduced above, under various assumptions and conditions already used here and in → Sensitivity and stability in NLP: Continuity and differential stability for continuity and derivative characterizations. See → Sensitivity and stability in NLP: Continuity and differential stability for the definitions involved.
The next proposition invokes all the same assumptions as Proposition 1, except that we now assume only that the functions of problem $P(\epsilon)$ are twice differentiable in $x$, with first and second partial derivatives in $x$ that are jointly continuous in $(x, \epsilon)$, rather than assuming twice continuous partial derivatives jointly in $(x, \epsilon)$ as was done for Proposition 1. With this understanding, we have the following result.


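Proposition 8 If the conditions $\mathrm{KKT}(\bar x, \bar u, \bar w, \bar\epsilon)$, $\mathrm{SOSC}(\bar x, \bar u, \bar w, \bar\epsilon)$, $\mathrm{LI}(\bar x, \bar\epsilon)$ and $\mathrm{SCS}(\bar x, \bar\epsilon)$ hold for $\bar x \in R(\bar\epsilon)$ at $\epsilon = \bar\epsilon \in$ interior $T$, with the differentiability assumptions weakened as explained in the preceding paragraph, then there exists a locally unique continuous vector function $y = (x, u, w)$ such that $y(\bar\epsilon) = [x(\bar\epsilon), u(\bar\epsilon), w(\bar\epsilon)] = (\bar x, \bar u, \bar w) = \bar y$ and such that these assumptions continue to hold at $y(\epsilon) = [x(\epsilon), u(\epsilon), w(\epsilon)]$ in a neighborhood of $\bar\epsilon$. This implies that $x(\epsilon)$ is an isolated local minimizer with associated unique Lagrange multipliers $[u(\epsilon), w(\epsilon)]$. Furthermore, for any $\lambda \in (0, 1)$ and $\epsilon$ sufficiently near $\bar\epsilon$, the following bound holds:

$$
\|y(\epsilon) - \bar y\| \le (1 - \lambda)^{-1}\, \|M(\bar\epsilon)^{-1}\|\, \|F(\bar y, \epsilon)\|, \tag{15}
$$

where $F = [\nabla_x L, u_1 g_1, \dots, u_m g_m, h_1, \dots, h_p]$, so that $F(y, \epsilon) = 0$ represents the Karush-Kuhn-Tucker conditions, and $M = \nabla_y F$, all as introduced above.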


The real-valued function $\phi$ is said to be Lipschitz continuous on a set $S$ contained in a normed space $X$ if there exists a quantity $\lambda > 0$ such that $|\phi(x) - \phi(y)| \le \lambda \|x - y\|$ for all $x, y \in S$.

In → Sensitivity and stability in NLP: Continuity and differential stability we gave the following result, summarized here for convenience.


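Proposition 9 If the conditions $\mathrm{KKT}(\bar x, \bar u, \bar w, \bar\epsilon)$, $\mathrm{SSOSC}(\bar x, \bar u, \bar w, \bar\epsilon)$ and $\mathrm{LI}(\bar x, \bar\epsilon)$ hold for a feasible $\bar x$ at $\epsilon = \bar\epsilon \in$ interior $T$, then near $\bar\epsilon$ there exists $y = (x, u, w)$, continuous and locally unique, such that $y(\bar\epsilon) = \bar y$ and such that the assumptions persist at $y(\epsilon)$; $x(\epsilon)$ is an isolated local minimizer with associated unique Lagrange multipliers $[u(\epsilon), w(\epsilon)]$; $x$, $u$ and $w$ are locally Lipschitz and directionally differentiable; and $f^*$ is once continuously differentiable and twice directionally differentiable. Therefore, in particular, it follows that for $\epsilon$ near $\bar\epsilon$ there exist $\alpha, \beta, \gamma > 0$ such that

a) $\|x(\epsilon) - \bar x\| \le \alpha \|\epsilon - \bar\epsilon\|$;

b) $\|u(\epsilon) - \bar u\| \le \beta \|\epsilon - \bar\epsilon\|$; and

c) $\|w(\epsilon) - \bar w\| \le \gamma \|\epsilon - \bar\epsilon\|$.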


Thus, we have bounds on the local solution point and associated Lagrange multipliers that strongly regulate the local rate of change of these quantities, ensuring that it is uniform and stable in the sense indicated and providing a sharper result than mere continuity. However, the result is theoretical per se and does not give a prescription for calculating the Lipschitz constants $\alpha$, $\beta$ and $\gamma$. Such issues are important, and work on them is ongoing in the effort to obtain computable solution bounds.

Many such Lipschitz continuity results have been obtained in this area, both for a local solution point and its associated Lagrange multipliers and for the local and global optimal value function. To cite still another, under assumptions that we have encountered in → Sensitivity and stability in NLP: Continuity and differential stability, Proposition 10, we have the following (in addition to the several conclusions given in → Sensitivity and stability in NLP: Continuity and differential stability): if KKT, GSSOSC and MFCQ hold at $(\bar x, \bar u, \bar w)$ for $\epsilon = \bar\epsilon$, then the local solution $x(\epsilon)$ is again locally unique and Lipschitz, although here the Lagrange multipliers associated with $x(\epsilon)$ are not unique in general.

Next, we present a simple general scheme for obtaining computable optimal value bounds for finite parameter perturbations, once a solution point is available. This will be briefly presented for the classes of problems of the form $P(\epsilon)$ whose optimal value function $f^*$ is convex on a convex parameter set $T$. Since a concave function is the negative of a convex function, the results also apply, with the obvious modifications, when $f^*$ is concave. The approach has even been extended to special classes of problems where $f^*$ is neither convex nor concave. To avoid any pathological exceptions, we assume in the remainder of the article solution attainment and constraint regularity sufficient to imply the Karush-Kuhn-Tucker conditions at any local solution point under consideration.


Computable Optimal Value Bounds f* Convex 


We shall first assume that $f^*$ is convex and well-defined for all $\epsilon \in T$, a convex subset of $E^k$. Conditions implying the convexity of $f^*$ for problem $P(\epsilon)$ are well known. For example, in → Sensitivity and stability in NLP: Continuity and differential stability we noted the following result:

If $P(\epsilon)$ is jointly convex in $(x, \epsilon)$, i.e., if $f$ and the $-g_i$ are jointly convex and the $h_j$ are linear-affine in $(x, \epsilon)$ for all $i$ and $j$, and $T$ is convex, then the optimal value $f^*$ is convex on $T$. More general assumptions are known that imply the convexity of $f^*$. Suffice it to say that the class of problems $P(\epsilon)$ for which $f^*$ is convex is large. Many results involving convexity and concavity characterizations of $f^*$ are given in [4] and [7].

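We are assuming that $f^*$ is convex on $T$, a convex nonempty set. Consider any $\epsilon_1, \epsilon_2 \in T$ with $\epsilon_1 \ne \epsilon_2$, and denote by $[\epsilon_1, \epsilon_2]$ the interval from $\epsilon_1$ to $\epsilon_2$. We first show how to obtain bounds on $f^*$ over $[\epsilon_1, \epsilon_2]$. Define $\epsilon(\alpha) = \alpha\epsilon_2 + (1 - \alpha)\epsilon_1$, where $0 \le \alpha \le 1$. Note that for any $\epsilon \in [\epsilon_1, \epsilon_2]$ there exists a unique $\alpha$ such that $\epsilon = \epsilon(\alpha)$. Then we obtain a unique global upper bound $U(\alpha)$ on $f^*$ over $[\epsilon_1, \epsilon_2]$ from the following result, which follows from the definition of a convex function:

$$
f^*[\epsilon(\alpha)] \le \alpha f^*(\epsilon_2) + (1 - \alpha) f^*(\epsilon_1) = U(\alpha). \tag{16}
$$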


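Next, suppose that $\nabla_\epsilon f^*(\epsilon_1)$ and $\nabla_\epsilon f^*(\epsilon_2)$ exist. Then, again from the definition of convexity, it is known that at any $\hat\epsilon \in T$ where $\nabla_\epsilon f^*(\hat\epsilon)$ exists we have

$$
f^*(\epsilon) \ge f^*(\hat\epsilon) + \nabla_\epsilon f^*(\hat\epsilon)(\epsilon - \hat\epsilon), \quad \text{for any } \epsilon \in T. \tag{17}
$$

For $\hat\epsilon = \epsilon_1$ or $\hat\epsilon = \epsilon_2$, we thus obtain two global lower bounds $L_1^*(\epsilon)$ and $L_2^*(\epsilon)$. Thus, $L^*(\epsilon) = \max\{L_1^*(\epsilon), L_2^*(\epsilon)\}$ gives a convex lower bound for any $\epsilon \in T$. We can apply this result to $\epsilon = \epsilon(\alpha)$, with $\hat\epsilon = \epsilon_1$ and $\hat\epsilon = \epsilon_2$, and denote the right-hand side of the inequality by $L_1(\alpha)$ and $L_2(\alpha)$, respectively. Next, let $L(\alpha) = \max\{L_1(\alpha), L_2(\alpha)\}$. Then it follows that

$$
L(\alpha) \le f^*[\epsilon(\alpha)] \le U(\alpha) \quad \text{for any } \alpha \in [0, 1]. \tag{18}
$$

This last inequality gives global upper and lower bounds on the optimal value $f^*$ over the interval $[\epsilon_1, \epsilon_2]$. Note that $U(\alpha)$ can be calculated when we have determined only the value of $f^*$ at $\epsilon_1$ and $\epsilon_2$. The lower bound $L(\alpha)$ can be computed when we have also determined $\nabla_\epsilon f^*$ at $\epsilon_1$ and $\epsilon_2$. Thus, with $f^*$ and $\nabla_\epsilon f^*$ well-defined at $\epsilon_1, \epsilon_2 \in T$, these quantities, associated with only two problems, $P(\epsilon_1)$ and $P(\epsilon_2)$, are enough to provide parametric global upper and lower bounds over any interval $[\epsilon_1, \epsilon_2]$.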
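The bounds (16) to (18) need only $f^*$ and $\nabla_\epsilon f^*$ at the two endpoints. The sketch assumes a hypothetical jointly convex problem (ours, not the article's), minimizing $(x_1-\epsilon)^2 + (x_2-1)^2$ over the unit disk, whose optimal value $f^*(\epsilon) = (\sqrt{\epsilon^2+1} - 1)^2$ (the squared distance from $(\epsilon, 1)$ to the disk) is convex; it tabulates $L(\alpha) \le f^*[\epsilon(\alpha)] \le U(\alpha)$ on $[\epsilon_1, \epsilon_2] = [0.5, 2]$. In a real application, the true $f^*$ column would of course be unknown.

```python
import numpy as np

# Hypothetical convex optimal value function (an assumption for illustration):
# min (x1 - eps)^2 + (x2 - 1)^2 over the unit disk gives f*(eps) = (sqrt(eps^2 + 1) - 1)^2.
def fstar(eps):
    return (np.hypot(eps, 1.0) - 1.0) ** 2

def dfstar(eps):
    s = np.hypot(eps, 1.0)
    return 2.0 * (s - 1.0) * eps / s

def interval_bounds(e1, e2, alpha):
    """Lower bound L (17)-(18) and upper bound U (16) at eps(alpha) = alpha*e2 + (1-alpha)*e1,
    using only f* and its derivative at the endpoints e1 and e2."""
    eps = alpha * e2 + (1.0 - alpha) * e1
    U = alpha * fstar(e2) + (1.0 - alpha) * fstar(e1)   # chord, (16)
    L1 = fstar(e1) + dfstar(e1) * (eps - e1)            # supporting line at e1, (17)
    L2 = fstar(e2) + dfstar(e2) * (eps - e2)            # supporting line at e2, (17)
    return np.maximum(L1, L2), U

alpha = np.linspace(0.0, 1.0, 11)
e1, e2 = 0.5, 2.0
L, U = interval_bounds(e1, e2, alpha)
true = fstar(alpha * e2 + (1.0 - alpha) * e1)           # only for comparison
for a, lo, f, up in zip(alpha, L, true, U):
    print(f"alpha={a:.1f}: {lo:.4f} <= {f:.4f} <= {up:.4f}")
```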
The bounds over $[\epsilon_1, \epsilon_2]$ can in general be significantly improved each time $f^*$ and $\nabla_\epsilon f^*$ are determined for a new value of $\epsilon$ in the interval. For example, suppose we take $\epsilon = \epsilon_3 \in (\epsilon_1, \epsilon_2)$. Once we determine $f^*$ and $\nabla_\epsilon f^*$ at $\epsilon_3$, we can calculate new upper and lower bounds, as above, over $[\epsilon_1, \epsilon_3]$ and $[\epsilon_3, \epsilon_2]$. Because of the convexity of $f^*$, the new bounds can only improve. In fact, continuing this way with new intermediate parameter values in $[\epsilon_1, \epsilon_2]$, it is theoretically possible to calculate upper and lower bounds to any prescribed degree of accuracy, since the bounds will ultimately converge from below and above to the graph of $f^*$ over $[\epsilon_1, \epsilon_2]$.
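The refinement process can be sketched by repeatedly bisecting every subinterval and recording the largest gap between the bounds. This assumes the same hypothetical convex $f^*(\epsilon) = (\sqrt{\epsilon^2+1} - 1)^2$ used for illustration above. One design choice differs slightly from the text: the lower bound is the maximum of all tangent lines computed so far (a valid lower bound by (17)), rather than only the two tangents at the ends of each subinterval; the upper bound is the chord on each subinterval.

```python
import numpy as np

# Same hypothetical convex f*(eps) = (sqrt(eps^2 + 1) - 1)^2, for illustration only.
def fstar(e):
    return (np.hypot(e, 1.0) - 1.0) ** 2

def dfstar(e):
    s = np.hypot(e, 1.0)
    return 2.0 * (s - 1.0) * e / s

def max_gap(points, grid):
    """Largest U - L on the grid when f* and grad f* are known at the given points.
    U: piecewise-linear chords (16); L: maximum of all tangent lines (17)."""
    pts = np.sort(points)
    U = np.interp(grid, pts, fstar(pts))
    tangents = fstar(pts)[:, None] + dfstar(pts)[:, None] * (grid[None, :] - pts[:, None])
    L = tangents.max(axis=0)
    return (U - L).max()

grid = np.linspace(0.5, 2.0, 2001)
points = np.array([0.5, 2.0])             # the two initial problems P(eps_1), P(eps_2)
gaps = []
for _ in range(5):
    gaps.append(max_gap(points, grid))
    # Bisect every subinterval: each midpoint means solving one more problem P(eps).
    pts = np.sort(points)
    points = np.concatenate([pts, 0.5 * (pts[:-1] + pts[1:])])
print(gaps)   # nonincreasing, shrinking roughly by a factor of 4 per bisection
```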

Note that to calculate bounds over $[\epsilon_1, \epsilon_2]$, we need the convexity of $f^*$ only over the interval $[\epsilon_1, \epsilon_2]$, not over the entire set $T$, allowing for significant additional generality, e.g., allowing us to drop the assumption of convexity of $T$. Furthermore, the process can be extended to provide bounds over more general subsets of $T$, as follows. Suppose we are interested in $\epsilon_1, \dots, \epsilon_k \in T$ and suppose we can determine $f^*$ and $\nabla_\epsilon f^*$ at the $\epsilon_i$. Analogously to before, define $\epsilon(\alpha) = \sum_{i=1}^{k} \alpha_i \epsilon_i$, with $\sum_{i=1}^{k} \alpha_i = 1$ and $\alpha_i \ge 0$. Then an upper bound at any such $\epsilon(\alpha)$ is obtained from Jensen's inequality,

$$
f^*[\epsilon(\alpha)] \le \sum_{i=1}^{k} \alpha_i f^*(\epsilon_i) = U(\alpha), \tag{19}
$$

where $\alpha = (\alpha_1, \dots, \alpha_k)$. Lower bounds $L_i^*(\epsilon)$ on $f^*$ over the entire set $T$ are computable at each $\epsilon_i$ using (17) with $\hat\epsilon = \epsilon_i$ $(i = 1, \dots, k)$, yielding $L^*(\epsilon) = \max\{L_1^*(\epsilon), \dots, L_k^*(\epsilon)\}$ as a piecewise-linear convex global lower bound on $f^*$ over $T$. Adapting this to $\epsilon = \epsilon(\alpha)$, we replace $\epsilon$ in the lower bound inequality (17) by $\epsilon(\alpha)$ and $\hat\epsilon$ by $\epsilon_i$ to get $L_i(\alpha)$ $(i = 1, \dots, k)$, finally getting $L(\alpha) = \max\{L_1(\alpha), \dots, L_k(\alpha)\}$. We thus have

$$
L(\alpha) \le f^*[\epsilon(\alpha)] \le U(\alpha) \tag{20}
$$

for any $\alpha = (\alpha_1, \dots, \alpha_k)$ such that $\sum_{i=1}^{k} \alpha_i = 1$ and $\alpha_i \ge 0$ $(i = 1, \dots, k)$.


Remark 10 We note that, unlike before, where ε(α) = α ε_2 + (1 − α) ε_1 and there is only one α for a given ε = ε(α) ∈ [ε_1, ε_2], there may now be a set of values of α such that ε(α) = ε. Thus, for a given ε such that ε = ε(α), the upper bound U(α) may take a range of values. In this instance, we note that the best bound U*(ε) is the optimal value of

$$ \min_{\alpha}\ U(\alpha) \quad \text{s.t.} \quad \sum_{i=1}^{k} \alpha_i \epsilon_i = \epsilon, \quad \sum_{i=1}^{k} \alpha_i = 1, \quad \alpha_i \ge 0 \ (i = 1,\dots,k), $$

a linear program with parameter ε. (The lower bound will have only one value for any such ε, as before.) The upper bound will be single-valued for ε = ε(α) if the set of ε such that ε = ε(α) is a simplex with vertices ε_i in the subset of T that is spanned by the given ε_i. Thus, an effective way to generate upper bounds over any subset S of T is to systematically cover S with contiguous simplexes, using points ε_i of S. The upper and lower bounds over each simplex will then be uniquely determined, and loose bounds over a given simplex can be sharpened by selecting a new parameter vector ε in that simplex, subdividing it into newly determined contiguous simplexes with the new vertex ε, and proceeding as indicated above to obtain upper and lower bounds over each simplex.
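For a scalar parameter, the linear program for U*(ε) can be solved without an LP library. This minimal sketch assumes a one-dimensional ε and uses the fact that the LP has two equality constraints, so some optimal basic solution has at most two nonzero weights; it therefore scans single points and pairs. The function name `best_upper_bound` and the sample data are illustrative only.

```python
# Best Jensen upper bound U*(eps) for a scalar parameter eps:
#   minimize sum_i a_i f*(eps_i) s.t. sum_i a_i eps_i = eps, sum_i a_i = 1, a_i >= 0.
# With two equality constraints, an optimal basic solution has at most two
# nonzero weights, so scanning single points and pairs is exact.

def best_upper_bound(eps, points, values):
    best = float("inf")
    for i, (ei, fi) in enumerate(zip(points, values)):
        if ei == eps:
            best = min(best, fi)                          # one nonzero weight
        for ej, fj in zip(points[i + 1:], values[i + 1:]):
            if min(ei, ej) < eps < max(ei, ej):
                a = (eps - ei) / (ej - ei)                # weight on eps_j
                best = min(best, (1 - a) * fi + a * fj)   # two nonzero weights
    if best == float("inf"):
        raise ValueError("eps is outside the convex hull of the points")
    return best

# Usage: f*(eps) = eps**2 sampled at eps_i = 0, 1, 2, 3 (an assumed example).
pts, vals = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]
print(best_upper_bound(1.5, pts, vals))   # optimal weights sit on eps_i = 1 and 2
print(0.5 * vals[0] + 0.5 * vals[3])      # another admissible alpha, a looser U(alpha)
```

The second line shows the multiple-value issue of Remark 10: a different α with the same ε(α) = 1.5 gives a weaker upper bound.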


Next, we show how to obtain a simpler and sharper upper bound if the constraint functions g_i and h_j are as prescribed when P(ε) is a jointly convex program (i.e., g_i jointly concave and h_j jointly linear-affine, for all i and j).


g_i Jointly Concave, h_j Jointly Linear Affine


With g_i and h_j as indicated, it turns out that if x(ε_1) is a feasible point of P(ε_1) and x(ε_2) is a feasible point of P(ε_2), then x̄(α) = α x(ε_2) + (1 − α) x(ε_1) is a feasible point of P[ε(α)] for any α ∈ [0, 1], where ε(α) = α ε_2 + (1 − α) ε_1. This gives us a feasible point of P(ε) for any ε ∈ [ε_1, ε_2]. It also means that f*[ε(α)] ≤ f[x̄(α), ε(α)] = f̄(α), i.e., the optimal value of problem P[ε(α)] is bounded above by the objective function evaluated at the feasible point x̄(α). This immediately provides an easily computable upper bound on f* over [ε_1, ε_2] once x(ε_1) and x(ε_2) are available. There is no requirement here that f be convex. If x(ε_1) solves P(ε_1) and x(ε_2) solves P(ε_2), then this upper bound is exact at ε_1 and ε_2. (Note that f* will generally be nonconvex if f is not convex.)


P(ε) Jointly Convex


If problem P(ε) is jointly convex, then we not only have the g_i jointly concave and the h_j jointly linear affine, as before, but now also have f jointly convex; hence f* is convex and the upper bound has additional properties. In this case, it turns out that f̄ is also convex over [0, 1]. Furthermore, if x(ε_1) solves P(ε_1), x(ε_2) solves P(ε_2), and x̄(α) = α x(ε_2) + (1 − α) x(ε_1), then it follows that f̄(α) = f[x̄(α), ε(α)] gives a convex upper bound on f* over [ε_1, ε_2] that either agrees with, or is lower and hence sharper than, the linear upper bound U(α) described earlier, i.e.,

$$ f^*[\epsilon(\alpha)] \le \bar f(\alpha) \le U(\alpha) \quad \text{for all } \alpha \in [0,1]. \qquad (21) $$

As noted, f̄(α) is immediately computable once a solution is determined for P(ε_1) and P(ε_2). Following the technique for improving U(α), this bound can be improved by solving P(ε_3) for ε_3 ∈ (ε_1, ε_2), then proceeding as above for the subintervals [ε_1, ε_3] and [ε_3, ε_2].
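The ordering in (21) can be checked on a small problem. The example below is an assumption chosen for illustration, not taken from the text: P(ε) minimizes x² subject to x − ε² ≥ 0, which is jointly convex (convex objective, jointly concave constraint), with solution x(ε) = ε² and optimal value f*(ε) = ε⁴. The sketch interpolates the two solutions, confirms that x̄(α) is feasible, and compares f*, f̄ and U.

```python
# Hypothetical jointly convex example (not from the text):
#   P(eps): minimize f(x, eps) = x**2  subject to  g(x, eps) = x - eps**2 >= 0,
# with solution x(eps) = eps**2 and optimal value f*(eps) = eps**4.

def f(x, eps):
    return x ** 2

def g(x, eps):
    return x - eps ** 2

def x_opt(eps):
    return eps ** 2

def f_star(eps):
    return eps ** 4

def bounds(alpha, e1, e2):
    """Return (f*[eps(alpha)], f_bar(alpha), U(alpha)) for eps(alpha) = alpha*e2 + (1-alpha)*e1."""
    eps = alpha * e2 + (1 - alpha) * e1
    x_bar = alpha * x_opt(e2) + (1 - alpha) * x_opt(e1)  # interpolated solutions
    assert g(x_bar, eps) >= 0, "x_bar must be feasible for P[eps(alpha)]"
    f_bar = f(x_bar, eps)                                 # convex upper bound
    U = alpha * f_star(e2) + (1 - alpha) * f_star(e1)     # linear (Jensen) bound
    return f_star(eps), f_bar, U

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, bounds(alpha, 0.0, 2.0))
```

Only the two solutions x(ε_1) and x(ε_2) are used; no problem is solved at the intermediate parameter values.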


f̄ and x̄ for Subsets of T


The calculation of x̄ and f̄ can be extended to bound f* from above over more general subsets of T as well, proceeding somewhat analogously to the procedure given above for U(α) with f* convex. We shall illustrate this for f̄(α), and assume that P(ε) is jointly convex. Given a set of values {ε_1, ..., ε_k}, where ε_i ∈ T, suppose we obtain solutions x(ε_i) of P(ε_i), i = 1, ..., k. Then, as before, we consider ε(α) = Σ_{i=1}^k α_i ε_i, with Σ_{i=1}^k α_i = 1 and α_i ≥ 0 for all i. If we denote x̄(α) = Σ_{i=1}^k α_i x(ε_i), then it follows that x̄(α) is a feasible point of P[ε(α)], hence again

$$ f^*[\epsilon(\alpha)] \le f[\bar x(\alpha), \epsilon(\alpha)] = \bar f(\alpha) \le U(\alpha), \quad \text{for } \alpha = (\alpha_1,\dots,\alpha_k), \qquad (22) $$

where here U(α) = Σ_{i=1}^k α_i f*(ε_i) as in (19), and f̄(α) is convex in α. We thus have an upper bound on f*(ε) for any ε ∈ T such that ε = ε(α) for α as prescribed. Remarks similar to those given previously apply regarding multiple values of α corresponding to any given ε = Σ_{i=1}^k α_i ε_i. Again, systematically covering T with contiguous simplexes and calculating bounds over each simplex would eliminate this multiple-value problem and result in unique upper and lower bounds; this appears to be a very effective and natural way to proceed.


Remark 11 As in the case of g_i jointly concave and h_j jointly linear affine, we can drop the convexity assumption on f, and we can also simply assume that we have only a feasible point x(ε_i) of P(ε_i) for i = 1, ..., k (rather than a solution point x(ε_i)). We still obtain an upper bound, since we again have x̄(α) ∈ R[ε(α)] and hence f*[ε(α)] ≤ f[x̄(α), ε(α)], where we now have x̄(α) = Σ_{i=1}^k α_i x(ε_i).
We turn last to the calculation of a bound on the distance from a feasible point of P(ε) to a solution of P(ε).


Bounds on the Distance of a Feasible Point 
to a Solution Point 


The first bound is for a given unperturbed problem. 


Proposition 12 Suppose we are given problem P(ε), with feasible region R(ε) convex in x with ε fixed. Assume that x(ε) locally solves P(ε), that the feasible region is not a singleton, and that there exists a number m(ε) > 0 such that z^T ∇²_x f(x, ε) z ≥ m(ε) ||z||² for any x ∈ R(ε) and any z ∈ E^n such that z is a feasible direction of R(ε) from x(ε), i.e., x(ε) + βz ∈ R(ε) for some β̄ > 0 and all β with 0 ≤ β ≤ β̄. Then x(ε) is the unique global minimizer and the following bound holds:

$$ \|x - x(\epsilon)\| \le \left( \frac{2\,[f(x,\epsilon) - f^*(\epsilon)]}{m(\epsilon)} \right)^{1/2} \qquad (23) $$

for any x ∈ R(ε).


Remark 13 The conclusions of this proposition hold in some neighborhood of x(ε) if we assume that the uniform quadratic underestimation of z^T ∇²_x f z holds only near x(ε) (rather than for all x ∈ R(ε)). Hence, Propositions 14 and 15 also hold if the respective distances being bounded are sufficiently small.


Using the fact that ∇_x f[x(ε), ε] z ≥ 0 must hold for every feasible direction z at the minimizer, the inequality (23) follows easily from a second-order Taylor series expansion of f around x(ε). This provides a bound on the distance from any given feasible point of problem P(ε) to the solution x(ε) of P(ε). This bound is of theoretical interest but may not be computable without x(ε), because m, z, and f* depend on R(ε) and x(ε). As for z, if we let z be any z ∈ E^n, then the largest suitable m will generally be smaller than with z restricted, but z will be free of dependency on x(ε), and the results will apply if m > 0. Also, with z unrestricted, we note that the function f is strictly convex in R(ε). Nonetheless, even for z unrestricted, the largest suitable value of m is the optimal value (if positive) of a nonconvex program, min_{x,z} z^T ∇²_x f(x, ε) z s.t. ||z|| = 1, x ∈ R(ε), which may be prohibitive except for special classes of problems. For example, if f is quadratic in x and ∇²_x f is positive definite, then m may be taken to be the minimum eigenvalue of ∇²_x f. We deal with f* by replacing it with a computable lower bound in the sequel.
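The quadratic case can be made concrete. This sketch assumes an unconstrained two-variable quadratic (so R(ε) = E² and every z is a feasible direction) with a hypothetical positive definite matrix Q; m is the smallest eigenvalue of the Hessian Q, computed in closed form for a symmetric 2×2 matrix, and the script checks the distance bound (23) at a few points.

```python
import math

# Hypothetical unconstrained quadratic with eps fixed (R(eps) = E^2):
#   f(x) = 0.5 * x^T Q x - c^T x,  Q symmetric positive definite.
Q = [[4.0, 1.0],
     [1.0, 2.0]]
c = [1.0, 1.0]

def f(x):
    qx = [Q[0][0] * x[0] + Q[0][1] * x[1], Q[1][0] * x[0] + Q[1][1] * x[1]]
    return 0.5 * (x[0] * qx[0] + x[1] * qx[1]) - (c[0] * x[0] + c[1] * x[1])

def min_eigenvalue_2x2(a, b, d):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    return 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + b * b)

# Minimizer x* = Q^{-1} c (Cramer's rule) and optimal value f*.
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
x_star = [(c[0] * Q[1][1] - Q[0][1] * c[1]) / det,
          (Q[0][0] * c[1] - Q[1][0] * c[0]) / det]
f_star = f(x_star)
m = min_eigenvalue_2x2(Q[0][0], Q[0][1], Q[1][1])  # valid m > 0 for every z

def distance_bound(x):
    """Right-hand side of (23): sqrt(2 * (f(x) - f*) / m)."""
    return math.sqrt(max(0.0, 2.0 * (f(x) - f_star) / m))

for x in ([0.0, 0.0], [1.0, -1.0], [3.0, 2.0]):
    dist = math.hypot(x[0] - x_star[0], x[1] - x_star[1])
    print(x, round(dist, 4), "<=", round(distance_bound(x), 4))
```

For a quadratic, f(x) − f* = ½ (x − x*)ᵀ Q (x − x*) ≥ ½ m ||x − x*||², which is exactly the inequality behind (23).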


Next, we allow ε to change and obtain a bound on the distance from a solution x[ε(α)] of P[ε(α)] to the computable feasible point x̄(α), as defined in Remark 11, using the bound (23).


Proposition 14 In addition to the assumptions of Proposition 12, suppose that for any ε = ε(α), the problem constraints g_i are jointly concave in (x, ε) and the h_j are jointly affine, and also suppose that problem P(ε) has a feasible point x(ε_i) for ε = ε_i ∈ T, T convex, where i = 1, ..., k. Then it follows that x̄(α) ∈ R[ε(α)] and

$$ \|\bar x(\alpha) - x[\epsilon(\alpha)]\| \le \left( \frac{2\,\{f[\bar x(\alpha), \epsilon(\alpha)] - f^*[\epsilon(\alpha)]\}}{m[\epsilon(\alpha)]} \right)^{1/2} \qquad (24) $$

for any α = (α_1, ..., α_k) such that Σ_{i=1}^k α_i = 1 and α_i ≥ 0 (i = 1, ..., k), where x̄(α) = Σ_{i=1}^k α_i x(ε_i), ε(α) = Σ_{i=1}^k α_i ε_i, x[ε(α)] solves P[ε(α)], and f*[ε(α)] is the optimal value of P[ε(α)].


The upper bound (24) involves f*[ε(α)], which would generally not be available unless x[ε(α)] were known. We want a computable upper bound not requiring x[ε(α)], and one is available if the optimal value f* of P(ε) is convex over the set of ε = ε(α), using the results for f* convex above. In particular, recall that f* is convex if P(ε) is jointly convex. We obtain the following result, which gives computable bounds once a suitable value for m has been determined.


Proposition 15 In addition to the assumptions of Proposition 14, assume that x(ε_i) solves P(ε_i) (i = 1, ..., k) and x̄(α) = Σ_{i=1}^k α_i x(ε_i), and suppose that we also have f jointly convex in the set of ε = ε(α) (thus making P(ε) jointly convex and hence f* convex). Then, since f*[ε(α)] ≥ L(α) from (20), we have from (24) that

$$ \|\bar x(\alpha) - x[\epsilon(\alpha)]\| \le \left( \frac{2\,\{f[\bar x(\alpha), \epsilon(\alpha)] - L(\alpha)\}}{m[\epsilon(\alpha)]} \right)^{1/2} \qquad (25) $$

and therefore

$$ \|\bar x(\alpha) - x[\epsilon(\alpha)]\| \le \left( \frac{2\,[U(\alpha) - L(\alpha)]}{m[\epsilon(\alpha)]} \right)^{1/2} \qquad (26) $$

for any α = (α_1, ..., α_k) such that Σ_{i=1}^k α_i = 1 and α_i ≥ 0 (i = 1, ..., k), where U(α) and L(α) are given above and we have also used (22).
This provides computable bounds over any ε = ε(α) once the solutions x(ε_i) (i = 1, ..., k) have been determined, provided m can be calculated or bounded below. There are results that are applicable to the calculation of such a number for important classes of problems. For a fixed (x, ε), the value m can be taken to be the minimum eigenvalue of ∇²_x f. For f quadratic, this simplifies, as noted earlier. For this and other important computable cases, see [5].
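Bounds (25) and (26) need only the solutions at the chosen parameter values, their optimal values and gradients, and m. The sketch below reuses an assumed jointly convex example (not from the text), P(ε): minimize x² subject to x − ε² ≥ 0, with x(ε) = ε², f*(ε) = ε⁴, df*/dε = 4ε³, m = 2, and ε_1 = 0, ε_2 = 2. Because the true solution is known here, the script can compare the actual distance with both bounds.

```python
import math

# Assumed jointly convex example (not from the text):
#   P(eps): minimize x**2 subject to x - eps**2 >= 0,
# with x(eps) = eps**2, f*(eps) = eps**4, df*/deps = 4*eps**3 and m = 2
# (the second derivative of x**2), using eps_1 = 0 and eps_2 = 2.
E = [0.0, 2.0]
M = 2.0

def f_star(e):
    return e ** 4

def df_star(e):
    return 4.0 * e ** 3

def distance_bounds(alpha):
    """Return (true distance, bound (25), bound (26)) at eps(alpha)."""
    w = [1.0 - alpha, alpha]                               # weights alpha_i
    eps = sum(wi * ei for wi, ei in zip(w, E))             # eps(alpha)
    x_bar = sum(wi * ei ** 2 for wi, ei in zip(w, E))      # sum alpha_i x(eps_i)
    f_bar = x_bar ** 2                                     # f[x_bar, eps(alpha)]
    U = sum(wi * f_star(ei) for wi, ei in zip(w, E))               # (19)
    L = max(f_star(ei) + df_star(ei) * (eps - ei) for ei in E)     # tangent bounds
    true_dist = abs(x_bar - eps ** 2)                      # uses the known solution
    b25 = math.sqrt(2.0 * (f_bar - L) / M)
    b26 = math.sqrt(2.0 * (U - L) / M)
    return true_dist, b25, b26

for alpha in (0.25, 0.5, 0.75):
    print(alpha, distance_bounds(alpha))
```

Bound (26) is looser than (25) but needs only the stored values f*(ε_i) rather than an evaluation of f at x̄(α).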

Our last result gives bounds under the same conditions as Propositions 12, 14 and 15, but using a first-order bounding condition on the objective function f rather than the second-order condition of Proposition 12. It is a simple consequence of the convexity of f.


Proposition 16 Assume as in Proposition 15, except that we now let ∇_x f[x(ε), ε] z ≥ m(ε) ||z|| replace the condition z^T ∇²_x f(x, ε) z ≥ m(ε) ||z||². Then it follows that

$$ \|x - x(\epsilon)\| \le \frac{f(x,\epsilon) - f^*(\epsilon)}{m(\epsilon)} \qquad (27) $$

for any x ∈ R(ε). Now, adding the other assumptions of Proposition 15, we get the bounds

$$ \|\bar x(\alpha) - x[\epsilon(\alpha)]\| \le \frac{f[\bar x(\alpha), \epsilon(\alpha)] - L(\alpha)}{m[\epsilon(\alpha)]} \le \frac{U(\alpha) - L(\alpha)}{m[\epsilon(\alpha)]}. \qquad (28) $$


These bounds are mainly of theoretical interest, because the verification of our condition involving m requires the solution x(ε). Our condition, ∇_x f z ≥ m ||z||, is a minimal growth condition on the directional derivative of f at a solution x(ε) in any feasible direction. A suitable m would be the infimum at x(ε) of ∇_x f z over ||z|| = 1 and z a feasible direction from x(ε), provided this is positive. This leads to the more relaxed problem (where x = x(ε)): min_z ∇_x f z s.t. ∇_x g_i z ≥ 0 for all i such that g_i = 0, ∇_x h_j z = 0 for all j, and ||z|| = 1. See the remark below. This can be formulated as a linear program and corresponds precisely to the search direction problem required for well-known methods of feasible directions. Thus, we obtain another connection between optimality, algorithmic and sensitivity analysis calculations.
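The linear growth constant can be illustrated on a tiny problem. This sketch assumes a hypothetical LP, minimize x_1 + x_2 subject to x_1 ≥ 0, x_2 ≥ 0, whose solution is the origin with f* = 0. The feasible unit directions there are z = (cos t, sin t), 0 ≤ t ≤ π/2, and m is estimated by sampling them; in general, sampling only approximates the infimum from above, but here the minimum is attained at the sampled endpoints, so the estimate is exact. The script then checks bound (27).

```python
import math

# Hypothetical LP used only for illustration:
#   minimize f(x) = x1 + x2  subject to  x1 >= 0, x2 >= 0,
# with unique solution x(eps) = (0, 0) and f* = 0. The gradient of f is (1, 1),
# and the feasible unit directions at the solution are (cos t, sin t), 0 <= t <= pi/2.

def growth_constant(samples=1000):
    """Sampled estimate of m = min grad f . z over feasible unit directions z."""
    ts = (0.5 * math.pi * s / samples for s in range(samples + 1))
    return min(math.cos(t) + math.sin(t) for t in ts)

m = growth_constant()

def bound_27(x):
    """Right-hand side of (27): (f(x) - f*) / m with f* = 0."""
    return (x[0] + x[1]) / m

for x in ([1.0, 0.0], [1.0, 1.0], [0.3, 2.5]):
    print(x, round(math.hypot(*x), 4), "<=", round(bound_27(x), 4))
```

At x = (1, 0) the bound holds with equality, which shows that (27) can be tight.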


Remark 17 With convexity, the condition ∇_x f z > 0 at a feasible point, for any feasible direction z, is sufficient for unique global minimization at that point. This condition is frequently satisfied at a solution; e.g., it holds at the minimizer of a nondegenerate linear programming problem. We note further that ∇_x f z ≥ 0 for any such z is a sufficient condition for a global minimum of a convex problem and is a necessary condition at a local minimizer of a general (i.e., not necessarily convex) differentiable programming problem. It should also be mentioned that at x = x(ε), the set of unit vectors z such that ∇_x g_i z ≥ 0 for all i such that g_i = 0 and ∇_x h_j z = 0 for all j is the same as or larger than the set of unit vectors z that are feasible directions of R(ε) from x(ε). Thus, the largest acceptable number m = min_z ∇_x f z at x(ε) will be the same or smaller for the former set of z than for the latter set. However, the former set has the advantage of a simple algebraic formulation and computability (as noted, it yields a linear programming problem). Thus, we can strengthen the quadratic growth condition in Propositions 12, 14 and 15, and the linear growth condition in Proposition 16, to the indicated set of z. All of these results will then be valid with respect to the newly resulting bounds. Finally, we should mention another appealing fact: with the set of z so chosen, ∇_x f z > 0 at x(ε) implies that x(ε) is a strict local minimizer for a general programming problem (i.e., without convexity).


Most of the material in this article and many references 
can be found in [2]. References [1] and [3] give state- 
of-the-art tutorials and numerous extensions in a mod- 
ern setting, along with an extensive bibliography. The 
bounds result of Proposition 6 for the solution of a lin- 
ear system of equations can be found in [8, Chap. 6, 
Sect. 6.4]. Other bounds results may be found in [2] and 
[5], and many other optimal value convexity and con- 
cavity results are given in [4] and [7]. 


See also 


> Nonlocal Sensitivity Analysis with Automatic 
Differentiation 

> Parametric Global Optimization: Sensitivity 

> Sensitivity Analysis of Complementarity Problems 

> Sensitivity Analysis of Variational Inequality 
Problems 

> Sensitivity and Stability in NLP 

> Sensitivity and Stability in NLP: Continuity and 
Differential Stability 
In the article "Sensitivity and Stability in NLP" we introduced the parametric nonlinear programming (NLP) problem, gave several examples, mentioned various important applications of sensitivity and stability results for parameter changes, and indicated several tools and concepts that have proved effective. We suggest that the reader peruse these preliminaries before reading this continuation of developments.

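For convenience, we again state the problem of interest:

$$ P(\epsilon):\quad \begin{array}{ll} \min_x & f(x,\epsilon) \\ \text{s.t.} & g_i(x,\epsilon) \ge 0 \quad (i = 1,\dots,m), \\ & h_j(x,\epsilon) = 0 \quad (j = 1,\dots,p), \end{array} $$

where x ∈ E^n and the parameter (data) ε ∈ T ⊆ E^k.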
We present some basic continuity and differentiability results for the parameter-dependent solution point x(ε) and optimal value f*(ε) of problem P(ε). We focus on concrete results for specific classes of problems that are frequently encountered. More general results may be found in more detailed technical treatments elsewhere, and a few publications will be noted for reference and further study. In particular, we do not include nonsmooth results or the notions of point-to-set maps, semicontinuity of maps and functions, generalized gradients, alternative definitions of continuity or differentiability, functional or set perturbations (e.g., f to f̃, g to g̃, h to h̃, x ∈ M to x ∈ M̃, etc.), and other very important mathematical concepts, models or instruments that lead to significant extensions and generalizations of the basic results we present. Our purpose here is to provide a quick and, we hope, unencumbered introduction and a simple exposition of significant rudimentary results.

The results we present require some well-known 
smoothness properties, optimality conditions and con- 
straint regularity assumptions, called constraint qualifi- 
cations. We briefly review these in the next section. 

If f and −g_i are convex in x and h_j is linear-affine in x, for all i, j and any ε ∈ T, the problem P(ε) is said to be convex in x. If these function properties hold jointly in (x, ε) and T is a convex set, then P(ε) is said to be jointly convex. For simplicity, continuous differentiability and continuity are assumed.


Optimality Conditions 
and Constraint Qualification 


Associated with the problem P(ε) are the Lagrangian

$$ L(x,u,w,\epsilon) = f(x,\epsilon) - \sum_{i=1}^{m} u_i\, g_i(x,\epsilon) + \sum_{j=1}^{p} w_j\, h_j(x,\epsilon) $$

and the optimal value function

$$ f^*(\epsilon) = \begin{cases} \inf_x \{ f(x,\epsilon) : x \in R(\epsilon) \}, & \text{if } R(\epsilon) \ne \emptyset, \\ +\infty, & \text{if } R(\epsilon) = \emptyset, \end{cases} $$

where the feasible region is

$$ R(\epsilon) = \{ x \in E^n : g_i(x,\epsilon) \ge 0 \ (i = 1,\dots,m),\ h_j(x,\epsilon) = 0 \ (j = 1,\dots,p) \}. $$

If a local solution x(ε) is known to exist, then it is understood that the optimal value is the local optimal value f*(ε) = f[x(ε), ε] for local results involving x(ε). The solution set S(ε) of P(ε) is defined as

$$ S(\epsilon) = \{ x : x \text{ is a local solution of } P(\epsilon),\ f(x,\epsilon) = f^*(\epsilon) \}. $$


The following concepts and definitions are needed. 
The directional derivative of f* at € in direction z is 


w~ 


a 


fret az) — fe) 


a 


D.f*(©) = li 
ae 


In the next conditions, assume that x is feasible and that ε is fixed at some value in T. Differentiability is assumed as needed.

b) The Karush-Kuhn-Tucker conditions (known to be necessary under appropriate regularity conditions at a local solution x of a differentiable problem) are: there exist u ≥ 0 and w such that (condition KKT(x, u, w, ε))

$$ \nabla_x L(x,u,w,\epsilon) = 0, \qquad u_i\, g_i(x,\epsilon) = 0 \ (i = 1,\dots,m), \qquad h_j(x,\epsilon) = 0 \ (j = 1,\dots,p), $$

where x ∈ R(ε), u = (u_1, ..., u_m) and w = (w_1, ..., w_p) are called Lagrange multipliers, u ≥ 0 means u_i ≥ 0 (i = 1, ..., m), and ∇_x denotes the n-component gradient (row vector) with respect to x. The set of Lagrange multipliers (u, w) that satisfy KKT(x, u, w, ε) is denoted by K(x, ε).

c) The second-order sufficient condition, denoted by SOSC(x, u, w, ε), is as follows: KKT(x, u, w, ε) holds for some (u, w) at x ∈ R(ε) and

$$ z^T \nabla_x^2 L(x,u,w,\epsilon)\, z > 0 $$

for all z ≠ 0, z ∈ Z, where

$$ Z = \left\{ z \in E^n : \begin{array}{ll} \nabla_x g_i(x,\epsilon)\, z \ge 0 & \text{for } i \text{ s.t. } g_i(x,\epsilon) = 0, \\ \nabla_x g_i(x,\epsilon)\, z = 0 & \text{for } i \text{ s.t. } g_i(x,\epsilon) = 0,\ u_i > 0, \\ \nabla_x h_j(x,\epsilon)\, z = 0 & (j = 1,\dots,p) \end{array} \right\}. $$

d) The general second-order sufficient condition, designated GSOSC(x, u, w, ε), requires that SOSC holds for all (u, w) ∈ K(x, ε).

e) The strong second-order sufficient condition, denoted by SSOSC(x, u, w, ε), is the same as SOSC(x, u, w, ε) with one change: the restriction in the set Z that ∇_x g_i(x, ε) z ≥ 0 for i such that g_i(x, ε) = 0 is dropped.

f) The general strong second-order sufficient condition, designated GSSOSC(x, u, w, ε), requires that SSOSC holds for all (u, w) ∈ K(x, ε).

g) Strict complementary slackness at a point x, denoted by SCS(x, ε), is said to hold if the associated KKT multipliers are such that u_i > 0 for all binding g_i (i.e., all i such that g_i(x, ε) = 0). (This term derives from the fact that the condition u_i g_i = 0 in the KKT equations is known as complementary slackness.)
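The KKT and SCS definitions translate directly into a numerical check. This minimal sketch assumes only inequality constraints (p = 0) and the Lagrangian sign convention L = f − Σ u_i g_i for constraints g_i ≥ 0; the test problem, minimize x_2 subject to x_2 − x_1² − ε ≥ 0, and all function names are hypothetical choices for illustration.

```python
# Numerical check of KKT(x, u, w, eps) and SCS(x, eps) for a problem with
# inequality constraints only (p = 0), assuming L = f - sum_i u_i g_i.
# Hypothetical problem: minimize f(x) = x2 subject to g(x, eps) = x2 - x1**2 - eps >= 0.

def grad_f(x, eps):
    return [0.0, 1.0]

def g_list(x, eps):
    return [x[1] - x[0] ** 2 - eps]

def grad_g_list(x, eps):
    return [[-2.0 * x[0], 1.0]]

def check_kkt(x, u, eps, tol=1e-9):
    """Return (KKT holds, SCS holds) at x with multipliers u."""
    g = g_list(x, eps)
    G = grad_g_list(x, eps)
    stationarity = [grad_f(x, eps)[k] - sum(ui * Gi[k] for ui, Gi in zip(u, G))
                    for k in range(len(x))]
    kkt = (all(abs(s) <= tol for s in stationarity)                 # grad_x L = 0
           and all(gi >= -tol for gi in g)                            # x in R(eps)
           and all(ui >= -tol for ui in u)                            # u >= 0
           and all(abs(ui * gi) <= tol for ui, gi in zip(u, g)))      # u_i g_i = 0
    scs = all(ui > tol for ui, gi in zip(u, g) if abs(gi) <= tol)     # u_i > 0 if binding
    return kkt, scs

eps = 0.3
print(check_kkt([0.0, eps], [1.0], eps))   # candidate solution (0, eps) with u = 1
print(check_kkt([0.0, eps], [0.5], eps))   # same point, wrong multiplier
```

A tolerance is used throughout because the conditions are equalities and sign conditions that floating-point data satisfy only approximately.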


There are many constraint qualifications. We shall give results involving the following three:

i) The first is linear independence, at a point x ∈ R(ε), of the binding constraint gradients, designated by LI(x, ε); i.e., ∇_x g_i(x, ε) for i such that g_i(x, ε) = 0 and ∇_x h_j(x, ε) for j = 1, ..., p are linearly independent.

ii) The Mangasarian-Fromovitz constraint qualification holds at x ∈ R(ε), denoted by MFCQ(x, ε), if: a) there exists a vector z such that ∇_x g_i(x, ε) z > 0 for all i such that g_i(x, ε) = 0 and ∇_x h_j(x, ε) z = 0 for j = 1, ..., p; and b) the vectors ∇_x h_j(x, ε) (j = 1, ..., p) are linearly independent.

iii) The generalized Slater constraint qualification, denoted by GS(ε), holds for the convex problem P(ε) if there exists a feasible point x such that g_i(x, ε) > 0 for all i and such that ∇_x h_1(x, ε), ..., ∇_x h_p(x, ε) are linearly independent. For P(ε) convex, MFCQ holding at each feasible point and GS are equivalent.
The following well-known results can now be stated for the standard nonparametric problem P(ε), i.e., with ε fixed:

a) SOSC(x, u, w, ε) implies that x is a strict local minimizer (i.e., the unique global minimizer over the intersection of the feasible region and some neighborhood of x);

b) GSOSC(x, u, w, ε) and MFCQ(x, ε) imply that x is an isolated local minimizer (i.e., the unique local minimizer in the intersection of the feasible region and some neighborhood of x);

c) Although GSSOSC at x implies GSOSC at x and also implies SSOSC(x, u, w, ε), which implies SOSC(x, u, w, ε), the condition GSSOSC alone does not imply that x is an isolated local minimizer. A constraint qualification is needed along with each second-order condition, as in b) above and as in Propositions 8, 9 and 10 below. In fact, even the much stronger conditions KKT(x, u, w, ε) and z^T ∇²_x L(x, u, w, ε) z > 0 for all z ≠ 0 and any (u, w) ∈ K(x, ε) do not imply that x is an isolated local minimizer. (See the counterexample by S.M. Robinson in [2, p. 71].)

d) If x is a local minimizer and LI(x, ε) or MFCQ(x, ε) holds, then KKT(x, u, w, ε) holds. If LI(x, ε) holds, then the Lagrange multipliers (u, w) are unique. MFCQ(x, ε) holds if and only if the set of associated Lagrange multipliers satisfying KKT(x, u, w, ε) is nonempty, compact and convex. If LI(x, ε) holds, then MFCQ(x, ε) holds. Therefore, it follows from b) that SOSC(x, u, w, ε) and LI(x, ε) imply that x is an isolated local minimizer.

Remark 1 A stronger form of MFCQ is necessary and sufficient for a unique Lagrange multiplier, but will not be used here.

e) If problem P(ε) is convex, then any local solution is global and the solution set is convex, and if KKT(x, u, w, ε) holds, then x is a global solution. Also, if GS(ε) holds, then so does KKT(x, u, w, ε) at any local solution x.

With this brief perspective, we present several basic sensitivity and stability results that hold for problem P(ε). We avoid detail and focus only on certain key implications of the assumptions.


Proposition 2 For the once-differentiable problem with a nonempty, uniformly compact feasible region R(ε) for ε near ε̄, the optimal value function f* is continuous at ε̄ if MFCQ holds for some x ∈ S(ε̄).

Proposition 3 The optimal value function f* is convex on T if P(ε) is jointly convex in (x, ε) as defined. Assuming solution attainment, this further implies that f* is continuous and directionally differentiable in the interior of T.

Proposition 4 If R does not vary with ε, f is concave in ε, and T is convex, then f* is concave on T. Again, assuming the solution is attained, this means that f* is continuous and directionally differentiable in the interior of T.

Proposition 5 Suppose R(ε) ≠ ∅ and compact and does not change with ε, and assume f and ∇_ε f are continuous in (x, ε). Then, at any ε ∈ T, it follows that S(ε) ≠ ∅ and compact, and the directional derivative D_z f* exists for any direction z and is given by

$$ D_z f^*(\epsilon) = \min_{x \in S(\epsilon)} \nabla_\epsilon f(x,\epsilon)\, z. \qquad (1) $$


Proposition 6 Assume that the problem P(ε) is convex in x for each ε ∈ T, T convex, and that the problem functions are once continuously differentiable in (x, ε). Then, if ε̄ ∈ interior T and the set of points satisfying KKT(x, u, w, ε̄) is nonempty and bounded, or equivalently, if the solution set S(ε̄) is nonempty and bounded (hence compact) and the Slater constraint qualification GS(ε̄) holds for P(ε) with ε = ε̄ ∈ interior T, then in a neighborhood N(ε̄) of ε̄, GS(ε) holds, S(ε) ≠ ∅ (and S(ε) is convex) for each ε ∈ N(ε̄), and S(ε) is uniformly compact in N(ε̄). Furthermore, f* is continuous and directionally differentiable in N(ε̄) and, in any direction z,

$$ D_z f^*(\epsilon) = \min_{x \in S(\epsilon)} \ \max_{(u,w) \in K(x,\epsilon)} \nabla_\epsilon L(x,u,w,\epsilon)\, z, \qquad (2) $$

where K(x, ε) is the set of (u, w) such that KKT(x, u, w, ε) holds.

Remark 7 As indicated, the assumption GS is equivalent to MFCQ, or to assuming K(x, ε) ≠ ∅ and bounded, at a local solution. Dispensing with convexity but assuming that LI(x, ε) holds for each x ∈ S(ε), rather than GS(ε), we have for each K(x, ε) a singleton set {(u(x, ε), w(x, ε))} and hence

$$ D_z f^*(\epsilon) = \min_{x \in S(\epsilon)} \nabla_\epsilon L[x, u(x,\epsilon), w(x,\epsilon), \epsilon]\, z. \qquad (3) $$
Suppose we assume that the functions defining P(ε) are twice continuously differentiable in (x, ε). Then we have the following second-order results.


Proposition 8 If KKT(x̄, ū, w̄, ε̄), SOSC(x̄, ū, w̄, ε̄), LI(x̄, ε̄) and SCS(x̄, ε̄) hold for x̄ ∈ R(ε̄) at ε = ε̄ ∈ interior T, then there exists a locally unique, once continuously differentiable vector function (x, u, w) such that [x(ε̄), u(ε̄), w(ε̄)] = (x̄, ū, w̄) and such that these assumptions continue to hold at [x(ε), u(ε), w(ε)] in a neighborhood N(ε̄) of ε̄. This implies that x(ε) is an isolated local minimizer with associated unique Lagrange multipliers [u(ε), w(ε)]. Furthermore, f* is continuous, and in fact twice continuously differentiable, in N(ε̄), where f*(ε) = f[x(ε), ε].

Proposition 9 Suppose KKT(x̄, ū, w̄, ε̄), SSOSC(x̄, ū, w̄, ε̄) and LI(x̄, ε̄) hold for x̄ ∈ R(ε̄) and ε̄ ∈ interior T. Then there exists (x, u, w), continuous and locally unique in N(ε̄), such that [x(ε̄), u(ε̄), w(ε̄)] = (x̄, ū, w̄) and such that the assumptions persist at [x(ε), u(ε), w(ε)]; as before, x(ε) is an isolated local minimizer with associated unique Lagrange multipliers [u(ε), w(ε)]. Now, it turns out that for ε ∈ N(ε̄), x, u and w are Lipschitz and directionally differentiable, and f* is continuous, once continuously differentiable and twice directionally differentiable.

Proposition 10 If we again assume KKT(x̄, ū, w̄, ε̄), further strengthen the second-order conditions to GSSOSC(x̄, ū, w̄, ε̄), and now assume MFCQ(x̄, ε̄), for x̄ ∈ R(ε̄) and ε = ε̄ ∈ interior T, it follows that there exists a locally unique vector function x(ε) for ε ∈ N(ε̄) such that x(ε̄) = x̄, and once again the assumptions continue to persist at x(ε); its associated nonempty, compact, convex set of Lagrange multipliers also exists as a result of MFCQ continuing to hold in N(ε̄). Again, x(ε) is an isolated local minimizer, but now we can show only that x is continuous and f* is once directionally differentiable. It follows that for ε ∈ N(ε̄),

$$ D_z f^*(\epsilon) = \max_{(u,w) \in K[x(\epsilon),\epsilon]} \nabla_\epsilon L[x(\epsilon), u, w, \epsilon]\, z. \qquad (4) $$


Propositions 6-10 demonstrate what we have called assumption stability (i.e., persistence of initial assumptions). A unique local solution with the given properties continues to be locally unique and continuous, and sometimes even differentiable, under small continuous data changes, and satisfying characterizations follow. All of the results we have given are now very well known and have been finely tuned, most of them therefore invoking close to 'minimal assumptions'. Weaker conditions will generally change the conclusions significantly. For example, if GSSOSC is replaced by the weaker condition GSOSC in the assumptions of Proposition 10, then the perturbed solution x(ε) need no longer be locally unique, although the initial solution x̄ = x(ε̄) is an isolated local minimizer. Assumption stability is lost, since GSOSC does not persist.

We offer a few observations concerning the rate of change of the optimal value f*(ε) with data changes. See (2). D_z f* is itself an optimal value, to which contributions come either from f or from the g_i and h_j through the Lagrangian, its value generally depending on the problem solution set and the optimal set of Lagrange multipliers. Note that if the constraints do not depend on the parameter ε, then (2), (3) and (4) all collapse to formula (1), D_z f*(ε) = min_{x∈S(ε)} ∇_ε f(x, ε) z, with formula (4) reducing to (1) without the min, as expected from Proposition 5. Note further that this formula does not depend on the Lagrange multipliers, but only on the behavior of f over the solution set. On the other hand, if f does not depend on ε, the dependency is all on the constraints and multipliers. In particular, consider the so-called right-hand-side perturbation problem

$$ P(\epsilon):\quad \begin{array}{ll} \min_x & f(x) \\ \text{s.t.} & g_i(x) \ge \epsilon_i \quad (i = 1,\dots,m), \\ & h_j(x) = \epsilon_{m+j} \quad (j = 1,\dots,p). \end{array} $$

Then, applying the results given in Propositions 6 and 10, assuming for simplicity that the local solution x(ε) and its associated optimal Lagrange multipliers (u(ε), w(ε)) are unique, and using formulas (2), (3) or (4), we find that

$$ D_z f^*(\epsilon) = [u(\epsilon), -w(\epsilon)]^T z. \qquad (5) $$

Thus, the directional rate of change of f* with respect to changes in the constraint values is captured entirely by the optimal Lagrange multipliers; e.g., at the local minimizer x(ε) the rate of change of f*(ε) along a unit vector in the direction of ε_i, 1 ≤ i ≤ m, is given by u_i(ε). This is an extremely important and well-known result. In applications, it translates into the multipliers being shadow prices (i.e., imputed values) of resource levels.
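The shadow-price interpretation of (5) can be verified by finite differences on a small right-hand-side perturbation problem. The problem below is an assumed example, not one from the text: minimize x_1² + x_2² subject to x_1 + x_2 ≥ ε with ε > 0. Solving the KKT system gives x(ε) = (ε/2, ε/2), u(ε) = ε and f*(ε) = ε²/2, so (5) predicts D_z f*(ε) = u(ε) z.

```python
# Assumed right-hand-side perturbation problem (not from the text):
#   P(eps): minimize x1**2 + x2**2  subject to  g(x) = x1 + x2 >= eps,  eps > 0.
# KKT with L = f - u*(g - eps):  2*x1 = u, 2*x2 = u, x1 + x2 = eps
#   =>  x(eps) = (eps/2, eps/2),  u(eps) = eps,  f*(eps) = eps**2 / 2.

def f_star(eps):
    x1 = x2 = eps / 2.0
    return x1 ** 2 + x2 ** 2

def multiplier(eps):
    return eps  # u(eps) from the KKT system above

def directional_derivative(fun, eps, z, t=1e-6):
    """One-sided finite-difference estimate of D_z f*(eps)."""
    return (fun(eps + t * z) - fun(eps)) / t

eps = 1.5
for z in (1.0, -1.0):
    # Compare the numerical rate of change with the prediction u(eps) * z of (5).
    print(z, directional_derivative(f_star, eps, z), multiplier(eps) * z)
```

The agreement illustrates why u(ε) is read as the marginal value of relaxing the resource level ε.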

Two simple examples may help to clarify some of 
these results. The interested reader should graph these 
problems to appreciate the geometry. 


Example 11 (Assume |ε| < 1/2.)

$$ P(\epsilon):\quad \begin{array}{ll} \min_x & \epsilon x_1 + x_2 \\ \text{s.t.} & x_1 \ge 0,\ x_2 \ge 0, \\ & x_1 + x_2 \ge 1,\ x_1^2 + x_2^2 \le 4. \end{array} $$

The solution is S(0) = {x ∈ E^2 : 1 ≤ x_1 ≤ 2, x_2 = 0}, f*(0) = 0; S(ε) = {(1, 0)} and f*(ε) = ε for ε > 0; and S(ε) = {(2, 0)} with f*(ε) = 2ε for ε < 0. It follows that D_z f*(0) = min_{x∈S(0)} ∇_ε f(x, 0) z = min_{x∈S(0)} x_1 z, which equals z if z > 0 and 2z if z < 0; this agrees with the values obtained by direct calculation from the closed-form solution. We also note that the optimal value is concave, in agreement with the results given previously, since f is concave in ε and the feasible region R is fixed. (Note that the problem is convex in x, Slater's condition holds, and S(0) is bounded, so Propositions 4, 5 and 6 all apply.)
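Example 11 can also be checked numerically. This sketch is an illustration only: it minimizes εx_1 + x_2 by exhaustive search over a grid of spacing 1/200 (the candidate solutions (1, 0) and (2, 0) are grid points, so the grid optimum matches the closed form here), then compares one-sided finite differences of f* at ε = 0 with formula (1) evaluated over the grid version of S(0).

```python
# Brute-force check of Example 11 on a grid with spacing 1/N; the solutions
# (1, 0) and (2, 0) are grid points, so the grid optimum is exact here.
N = 200

def feasible_points():
    for i in range(2 * N + 1):
        for j in range(2 * N + 1):
            # x1, x2 >= 0 by construction; x1 + x2 >= 1; x1**2 + x2**2 <= 4
            if i + j >= N and i * i + j * j <= 4 * N * N:
                yield i / N, j / N

POINTS = list(feasible_points())

def f_star(eps):
    """Optimal value of P(eps) over the grid: min eps*x1 + x2."""
    return min(eps * x1 + x2 for x1, x2 in POINTS)

def d_formula_1(z):
    """Formula (1) at eps = 0: min over S(0) of (df/d eps) * z = x1 * z."""
    best = f_star(0.0)
    S0 = [(x1, x2) for x1, x2 in POINTS if abs(x2 - best) < 1e-12]  # f(x, 0) = x2
    return min(x1 * z for x1, _ in S0)

t = 0.01
for z in (1.0, -1.0):
    finite_diff = (f_star(t * z) - f_star(0.0)) / t
    print("z =", z, "finite difference:", finite_diff, "formula (1):", d_formula_1(z))
```

The two directions give different rates, which is the kink at ε = 0 that makes f* concave but not differentiable there.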


The next example is trivial, but illustrative. 


Example 12

$$ P(\epsilon):\quad \min_x\ x_2 \quad \text{s.t.} \quad x_2 - x_1^2 \ge \epsilon. $$

The solution is x(ε) = (0, ε), the optimal value is f*(ε) = ε, and the optimal Lagrange multiplier is u(ε) = 1, for any ε. The problem is jointly convex and Slater's condition holds. The solution is unique, LI and SCS hold, and in fact all the given second-order conditions hold for this problem, so all the results we have given for these conditions will hold. In particular, we see that D_z f*(ε) = u(ε) z = z, as expected. Note that f* is convex and differentiable, and x(ε) and u(ε) are unique and differentiable, all as predicted by the theory.


There are many variations and extensions of the small 
sample of results that we have presented here. We have 
only been able to give the reader a flavor of this inter- 
esting and important subject. 

Many references exist and many would have to be 
cited, if we were to give proper credit to all the many re- 


searchers who have contributed, even to the handful of 
results presented. Rather than attempt this, we refer the 
reader to three references. These references contain nu- 
merous tutorials and hundreds of additional references 
and should provide a quicker introduction to the sub- 
ject than initially attempting to study scholarly papers 
scattered in the literature. 


See also 


> Nonlocal Sensitivity Analysis with Automatic 
Differentiation 

> Parametric Global Optimization: Sensitivity 

> Sensitivity Analysis of Complementarity Problems 

> Sensitivity Analysis of Variational Inequality 
Problems 

> Sensitivity and Stability in NLP 

> Sensitivity and Stability in NLP: Approximation 
The SCP (Sequential Cutting Plane) algorithm [6,7,3] can be used for solving both NLP (Nonlinear Programming) and MINLP (Mixed Integer Nonlinear Programming) problems efficiently. For convex problems the algorithm finds globally optimal solutions. For nonconvex problems, global optimality cannot be guaranteed; nevertheless, the algorithm can also be used on nonconvex problems to find good approximations of the globally optimal solution.

The SCP algorithm presented here uses a branch 
and bound strategy to solve MINLP problems where an 
NLP problem is solved in each integer node of the tree. 
The NLP problems are not solved to optimality, rather 
one iteration step is taken at each integer node of the 
tree and linearizations of the nonlinear constraints are 
added as cuts to the problem. The iteration step consists 
of performing an NLP iteration as described in [6,8], 
where the algorithm solves a sequence of linear programming problems. Note that [7] used a different approach
from the one used here: in [7], an NLP iteration was performed in every node of the branch and
bound tree, which can be inefficient for difficult combinatorial problems where the branch and bound tree is
large.


Formulation 


The SCP algorithm solves problems of the form

$$\min f(x,y) \quad \text{s.t.}\ g_j(x,y) \le 0,\ j = 1,\dots,m,\quad x \in X,\ y \in Y. \tag{1}$$

The set X is a bounded, box-constrained set of the
form $X = \{x \in \mathbb{R}^{n_x} \mid x^{LB} \le x \le x^{UB}\}$ and the set Y is
a finite bounded set $Y = \{y \in \mathbb{Z}^{n_y} \mid y^{LB} \le y \le y^{UB}\}$.
The functions f and g are convex and continuously differentiable.

The performance of the algorithm on a difficult set
of block optimization problems, see [2], is presented.
The ECP (Extended Cutting Plane) algorithm [9,10]
proved to be very efficient in [2] for solving these types
of optimization problems. The SCP algorithm further
enhances the ideas from the Extended Cutting Plane algorithm in order to solve these difficult MINLP problems even more efficiently.


Methods 


The SCP algorithm solves problem (1) by performing a normal branch and bound procedure on a relaxed
version of (1). In each integer node of the tree, the integer variables are fixed, and an NLP iteration is
performed on the NLP subproblem obtained by fixing them. If the iterate is optimal, its objective value
in the integer node is an upper bound on the optimal value of the original problem (1) and can be used for
dropping nodes from the tree. If not, linearizations of the nonlinear constraints at the current iterate
are added as cuts to the relaxed version of (1) and the branching process is continued.


Branch and Bound 


The root node $P^1$ of the tree is constructed by relaxing (1) such that the integer requirements and the
nonlinear constraints from the problem formulation are dropped. If some of the constraints
$g_j(x,y) \le 0$, $j = 1,\dots,m$, are linear, they can be kept in the relaxed problem.

A branch and bound procedure, see [4], is then performed on this linear relaxation until an integer feasible
node $P^k$ is obtained. In this node, the integer variables
$y^k$ are fixed and an NLP iteration is performed in order
to solve the NLP subproblem

$$\min f(x, y^k) \quad \text{s.t.}\ g_j(x, y^k) \le 0,\ j = 1,\dots,m,\quad x \in X. \tag{2}$$

Note that the problem is not solved to optimality. Instead, one NLP iteration is performed in order to get
a good approximation of the optimal solution to the
subproblem. Let $x^k$ be the iterate obtained after the NLP
iteration on (2).

The iterate $(x^k, y^k)$ can be used to add linearizations of the violated and active nonlinear constraints at
$(x^k, y^k)$,

$$g_j(x^k, y^k) + \nabla g_j(x^k, y^k)^T \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \le 0, \qquad j \in \{\, j : g_j(x^k, y^k) \ge 0 \,\},$$

to the set of cuts $\Omega_k$ in order to obtain a new set of cuts
$\Omega_{k+1}$.
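As a minimal sketch of the cut generation step, the hypothetical helper `linearization_cuts` below assumes that the constraint values and gradients at $(x^k, y^k)$ are already available as plain Python lists, with x and y stacked into one vector z. It returns each linearization in the form $a^T z \le b$, with $a = \nabla g_j(z^k)$ and $b = a^T z^k - g_j(z^k)$, which is how cuts are usually handed to an LP or MILP solver; the `tol` parameter for deciding "active" is an assumption, not part of the original description.

```python
def linearization_cuts(z_k, g_vals, g_grads, tol=0.0):
    """Cuts g_j(z_k) + grad g_j(z_k)^T (z - z_k) <= 0 for active or violated j.

    z_k     : stacked point (x^k, y^k)
    g_vals  : constraint values g_j(z_k)
    g_grads : gradients of g_j at z_k (with respect to both x and y)
    Each cut is returned as (a, b), meaning a^T z <= b.
    """
    cuts = []
    for gj, grad in zip(g_vals, g_grads):
        if gj >= -tol:  # active (g_j = 0) or violated (g_j > 0)
            b = sum(a * zi for a, zi in zip(grad, z_k)) - gj
            cuts.append((list(grad), b))
    return cuts


# g_1(x, y) = x^2 + y^2 - 1 is violated at z_k = (1, 1); g_2(x, y) = x - 5 is inactive
cuts = linearization_cuts([1.0, 1.0], [1.0, -4.0], [[2.0, 2.0], [1.0, 0.0]])
print(cuts)  # [([2.0, 2.0], 3.0)]
```

For convex $g_j$, every point with $g_j \le 0$ also satisfies its linearization, so in the convex case the cuts never remove feasible points of (1).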
If $x^k$ is the optimal solution to (2), then $f(x^k, y^k)$ is
an upper bound on the optimal value of (1) and can
be used to drop unexplored nodes from the branch and
bound tree whenever the lower bounds of the nodes are
greater than this upper bound.


NLP Iteration 


The NLP subproblem (2) is solved by performing an
NLP iteration [6,8]. In an NLP iteration, a sequence
of linear programming (LP) subiterations is performed.
In each subiteration $i$ within the NLP iteration, an
LP problem $LP(x^{(i)})$ is generated at the current iterate
$(x^{(i)}, y^k)$. The LP problem $LP(x^{(i)})$ is of the form


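$$
\begin{aligned}
\min_{d,\,t}\ & \nabla f(x^{(i)}, y^k)^T d + C t\\
\text{s.t.}\ & g_j(x^{(i)}, y^k) + \nabla g_j(x^{(i)}, y^k)^T d - t \le 0, \quad j = 1,\dots,m,\\
& (d^{(r)})^T H^{(i)} d = 0, \quad r = 1,\dots,i-1,\ i > 1,\\
& t \ge 0,\\
& x^{(i)} + d_x \in X,\quad d_y = 0,
\end{aligned}
$$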


where $d = (d_x, d_y)$, and $d^{(r)}$, $r = 1,\dots,i-1$, are the
previously obtained search directions within the NLP
iteration. Also, $H^{(i)}$ is the current estimate of the Hessian of the Lagrangian

$$L(x, \lambda) = f(x, y^k) + \sum_{j=1}^{m} \lambda_j g_j(x, y^k).$$


The BFGS update formula was used in the implementation of the algorithm to approximate $H^{(i)}$.
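A minimal sketch of a standard BFGS update of a Hessian approximation B, using NumPy. It assumes that `s` is the step $x^{(i+1)} - x^{(i)}$ and that `dg` is the corresponding change in the gradient of the Lagrangian, formed with a multiplier estimate such as the one from the LP dual; skipping the update when $dg^T s$ is not positive is a common safeguard added here as an assumption, and the function name is hypothetical.

```python
import numpy as np


def bfgs_update(B, s, dg, curvature_tol=1e-12):
    """BFGS update B+ = B - (B s s^T B)/(s^T B s) + (dg dg^T)/(dg^T s).

    s  : step x^(i+1) - x^(i)
    dg : change in the gradient of the Lagrangian over that step
    The update is skipped (B is returned unchanged) when dg^T s <= curvature_tol,
    so that B stays positive definite.
    """
    sy = float(dg @ s)
    if sy <= curvature_tol:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(dg, dg) / sy


B = np.eye(2)
s = np.array([1.0, 0.0])
dg = np.array([2.0, 0.5])
B_new = bfgs_update(B, s, dg)
print(np.allclose(B_new @ s, dg))  # True: the secant condition B_new s = dg holds
```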

The dual optimal solution to the LP problem
$LP(x^{(i)})$ provides a Lagrange multiplier estimate, which
can be used when estimating the Hessian of the Lagrangian. Consequently, the Hessian approximation
can be updated in each LP subiteration.

The solution to each LP problem $LP(x^{(i)})$ provides
a search direction $d^{(i)} = (d_x^{(i)}, 0)$. A line search is performed in the obtained search direction, minimizing
a modified function based on the Lagrangian of (2),

$$\tilde L(x) = f(x, y^k) + \sum_{j=1}^{m} \lambda_j g_j(x, y^k) + \rho \sum_{j=1}^{m} \left(g_j^+(x, y^k)\right)^2,$$

where $g_j^+(x, y^k) = \max(g_j(x, y^k), 0)$ and $\rho > 0$ is
a penalty parameter.

The current iterate $(x^{(i)}, y^k)$ is then updated using the solution $\alpha$ of the line search such that
$x^{(i+1)} = x^{(i)} + \alpha d_x^{(i)}$, where $\alpha$ is the step length
found in the line search.
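The following is a minimal sketch of the merit function $\tilde L$ and the iterate update. Instead of an exact one-dimensional minimization, it simply picks the best step from a fixed trial grid $1, 1/2, 1/4, \dots$; this grid search is a simplification chosen for illustration, and the function names are hypothetical.

```python
def merit(f, gs, lam, rho, x):
    """Modified Lagrangian f + sum_j lam_j g_j + rho * sum_j max(g_j, 0)^2 at x."""
    gvals = [g(x) for g in gs]
    return (f(x) + sum(lj * gj for lj, gj in zip(lam, gvals))
            + rho * sum(max(gj, 0.0) ** 2 for gj in gvals))


def line_search_update(f, gs, lam, rho, x, dx, alphas=None):
    """Return (alpha, x + alpha * dx), with alpha taken from a trial grid
    so that the merit function is smallest among the trial points."""
    if alphas is None:
        alphas = [0.5 ** k for k in range(20)]  # 1, 1/2, 1/4, ...

    def trial(a):
        return [xi + a * di for xi, di in zip(x, dx)]

    alpha = min(alphas, key=lambda a: merit(f, gs, lam, rho, trial(a)))
    return alpha, trial(alpha)


# min x^2 s.t. g(x) = 1 - x <= 0, starting at x = 0 with direction d_x = 1
alpha, x_new = line_search_update(lambda x: x[0] ** 2, [lambda x: 1.0 - x[0]],
                                  [0.0], 10.0, [0.0], [1.0])
print(alpha, x_new)  # 1.0 [1.0]
```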

A new LP problem is then constructed at the updated iterate $(x^{(i+1)}, y^k)$. The new LP problem is constructed
in a similar way as the previous LP problem,
with equality constraints requiring the new search direction to be a conjugate direction to the previously
obtained search directions with respect to the current estimate of the Hessian of the Lagrangian.

The linear equality constraints

$$(d^{(r)})^T H^{(i)} d = 0, \qquad r = 1,\dots,i-1,$$

are cutting hyperplanes ensuring that $d$ will be a conjugate direction to the old directions $d^{(r)}$.
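Because each conjugacy condition is linear in d, it can be passed to an LP solver as an ordinary equality row. A minimal NumPy sketch of building those rows (the helper name is hypothetical):

```python
import numpy as np


def conjugacy_rows(H, previous_directions):
    """Coefficient rows of the constraints (d^(r))^T H d = 0, r = 1, ..., i-1.

    The rows can be appended to the equality constraints of the LP subproblem
    with right-hand side 0.
    """
    n = H.shape[0]
    if len(previous_directions) == 0:
        return np.zeros((0, n))
    D = np.asarray(previous_directions, dtype=float)  # one direction per row
    return D @ H                                       # row r equals (d^(r))^T H


H = np.array([[2.0, 0.0], [0.0, 1.0]])
A_eq = conjugacy_rows(H, [[1.0, 1.0]])
d = np.array([1.0, -2.0])  # a candidate new direction
print(A_eq @ d)            # [0.]: d is H-conjugate to d^(1)
```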


Initialize: Do a number of NLP iterations on
the continuous relaxation of (1) to obtain the
initial iterate for the problem. Construct the
relaxed node $P^1$, insert it into the branch
and bound tree as the top node and set the
upper bound $U = \infty$. Add cutting planes for
the initial iterate to the set of cutting planes
$\Omega_1$. Let $k = 1$.

While (there are unexamined nodes in the tree) do

1. Do normal branching on the branch and
bound tree, including the cutting planes $\Omega_k$
as linear cuts, until an integer solution $(x^k, y^k)$
is found.

2. Fix $y^k$ and perform an NLP iteration on (2).
Let the iterate after the NLP iteration be $x^k$.

3. If $x^k$ is optimal for (2) Then update the
upper bound $U := \min\{U, f(x^k, y^k)\}$.

4. If (2) is found to be infeasible Then let $x^k$ be
the last iterate from the NLP iteration.

5. Let $\Omega_{k+1} = \Omega_k$. Generate cutting planes
at $(x^k, y^k)$ for the active and violated constraints
and add them to $\Omega_{k+1}$. Let $k := k + 1$.

End While

Sequential Cutting Plane Algorithm, Algorithm 1
Pseudo-code for the SCP algorithm
The new LP problem is then solved, a new line
search performed and the iterate updated again. The 
procedure is repeated until the LP problem becomes in- 
feasible, a sufficient number of steps have been taken 
or the current solution to the LP problem is sufficiently 
close to zero. 

Normally, the NLP iteration would then be repeated 
until an optimal point is found. However, in this ver- 
sion of the algorithm only one NLP iteration is per- 
formed in each integer node of the branch and bound 
tree in order to improve the performance of the algo- 
rithm. 

Convergence properties of the NLP version of the 
SCP algorithm have been analyzed in earlier papers, 
see [6] and [8]. 


Algorithm Pseudo-Code 


The algorithm is summarized in Algorithm 1. 


Cases 


The performance of the algorithm is illustrated on 
a set of difficult block optimization problems presented 
in [2]. The paper concerns optimizing the arrangement 
of a number of departments with unequal area require- 
ments. It is possible to formulate the problems as mixed 
integer nonlinear programming problems, where the 
constraints are department and floor area requirements 
as well as department locational restrictions. The opti- 
mization target is to minimize the cost associated with 
the projected interactions between departments. In [2], 
the Extended Cutting Plane method was compared to 
a number of commercial algorithms and proved to be 
superior to the other solvers. 

In Table 1 the results for the solvers in [2] are summarized in terms of the number of problems solved to
optimality, the number of problems for which a feasible solution was obtained and the number of problems for which
no solution was obtained within 12 h of CPU time.
More information about the solvers used can be found
in [1].

Sequential Cutting Plane Algorithm, Table 1
Performance of the solvers on the block layout problem as
reported in [2]

Solution | BARON | DICOPT | MINLPbb | SBB | α-ECP

The results are excellent for the α-ECP algorithm on
these test problems. It failed to solve only one problem
to global optimality, and for that problem it obtained
the best integer feasible solution.

In Fig. 1 the results of the new SCP algorithm are
compared with the α-ECP algorithm using a performance profile [3]. The results are also compared with
a version where each integer node is solved to optimality by repeating the NLP iterations until an optimal
solution to (2) is found. This version of the SCP algorithm
is denoted SCP-NLP. Note that if the integer nodes are
solved to optimality, the procedure is similar to the
method described in [5]. CPLEX was used as the MILP
solver and the NLP part of the algorithm was implemented in MATLAB. A maximum of 12 CPU h for each
problem was used when solving the problems.

As can be seen from the results, the SCP algorithm 
performed very well on the test problems. In more than 
half of the cases it was the fastest solver. Considering 
that some of these problems can take several hours to 
solve, the performance improvement is significant. 

In Fig. 2 the best solutions found by the SCP and
α-ECP solvers, when running the solvers for a maximum of 60 CPU s, are compared with the best solutions
found by the other solvers in [2], when running
these solvers for a maximum of 12 CPU h.

Note that both the SCP and α-ECP solvers return
very good solutions after 60 CPU s in comparison with
the other solvers that were run for a maximum of 12
CPU h. In fact, the SCP algorithm reported the best-known
solution in almost half of the cases. Thus, the
SCP algorithm can also be used for efficiently finding
good solution candidates when the problems are too
difficult to solve to the global optimum.


Conclusions 


The Sequential Cutting Plane algorithm is well-suited
for solving mixed integer nonlinear programming
problems. The numerical results above for a set of challenging block layout problems support this claim. Another
advantage is that the algorithm will efficiently find
a good feasible solution to a problem, even when the
problem is not solved to optimality. Thus, difficult combinatorial
optimization problems in real-world applications
could be solved by combining the SCP algorithm
with some non-deterministic approach such as an evolutionary
algorithm.

Sequential Cutting Plane Algorithm, Figure 1
Performance profile comparing CPU times for solving the block layout problems

Sequential Cutting Plane Algorithm, Figure 2
Performance profile comparing the optimal values of the solvers. Observe that 60 CPU seconds is used for SCP/ECP and
12 CPU hours for the other solvers
For the purpose of this article, distributed optimal con-
trol problems are optimization problems which are 
posed in function space and in which one of the con- 
straints is a partial differential equation. An example of 
such a problem is 


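$$
\begin{aligned}
\min\ & \frac{1}{2}\int_\Omega \left(y(x) - \hat y(x)\right)^2 dx + \frac{\omega}{2}\int_\Gamma u^2(x)\, dx\\
\text{s.t.}\ & -\Delta y(x) + y^3(x) - y(x) = 0 \quad \text{in } \Omega,\\
& \frac{\partial y}{\partial n}(x) = u(x) \quad \text{on } \Gamma,\\
& u_l(x) \le u(x) \le u_u(x) \quad \text{a.e. on } \Gamma,\\
& y_l(x) \le y(x) \le y_u(x) \quad \text{a.e. in } \Omega,
\end{aligned}
$$

where $\Omega$ is a bounded open domain in $\mathbb{R}^d$, $\Gamma$ is the
boundary of $\Omega$, $\omega > 0$ is a given parameter, and the
functions $u_l$, $u_u$, $\hat y$, $y_l$, $y_u$ are given. The functions y
and u are called the states and controls, respectively.
The partial differential equation including the boundary
conditions is called the state equation or the governing
equation.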

Abstractly, a distributed optimal control problem
may be written as

$$\min\ J(y, u), \tag{1}$$
such that
$$c(y, u) = 0, \tag{2}$$
$$u \in U_{ad}, \tag{3}$$
$$y \in Y_{ad}, \tag{4}$$

where $Y$, $C$ are Hilbert spaces, $U$ is a Banach space, $Y_{ad} \subset Y$, $U_{ad} \subset U$ are closed convex sets, and $J: Y \times U \to \mathbb{R}$,
$c: Y \times U \to C$ are twice Fréchet differentiable functions.
In the previous example, $Y = H^1(\Omega)$, $C = (H^1(\Omega))'$, $U = L^2(\Gamma)$, and
$Y_{ad} = \{y \in H^1(\Omega) : y_l \le y \le y_u \text{ a.e. in } \Omega\}$,
$U_{ad} = \{u \in L^2(\Gamma) : u_l \le u \le u_u \text{ a.e. on } \Gamma\}$. Several optimal
control problems that fit into this framework are studied in [4,7,8,12]. This problem formulation also covers
optimal design problems [5,9] and parameter identification problems.

After a discretization of the problem (1)-(4) using,
e.g., finite elements, one often obtains a nonlinear programming problem of the form

$$\min\ J^h(y^h, u^h), \tag{5}$$
such that
$$c^h(y^h, u^h) = 0, \tag{6}$$
$$u_l^h \le u^h \le u_u^h, \tag{7}$$
$$y_l^h \le y^h \le y_u^h, \tag{8}$$

where $y^h \in \mathbb{R}^{n_y}$, $u^h \in \mathbb{R}^{n_u}$ and $J^h: \mathbb{R}^{n_y + n_u} \to \mathbb{R}$, $c^h: \mathbb{R}^{n_y + n_u} \to \mathbb{R}^{n_y}$ are twice differentiable functions. The number
of discretized states $n_y$ tends to be large. Depending on
the type of control, $n_u$ can be small (boundary control)
or large (distributed control).
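To make the step from (1)-(4) to (5)-(8) concrete, here is a minimal sketch for a one-dimensional analogue of the example problem, assuming $\Omega = (0, 1)$ (so $\Gamma = \{0, 1\}$) and using central finite differences with ghost nodes instead of finite elements; both choices, and the function names, are assumptions made for illustration. It shows that boundary control gives a small $n_u$ (here 2) while $n_y$ grows with the grid.

```python
import numpy as np


def state_residual(y, u, h):
    """Discrete state equation c^h(y^h, u^h) for -y'' + y^3 - y = 0 on (0, 1)
    with Neumann boundary control dy/dn = u at x = 0 and x = 1.

    y : nodal values y_0, ..., y_N on a uniform grid of width h (n_y = N + 1)
    u : boundary controls (u_0, u_1), so n_u = 2
    """
    r = np.empty_like(y)
    r[1:-1] = (-y[:-2] + 2.0 * y[1:-1] - y[2:]) / h**2
    # x = 0: outward normal is -d/dx, ghost value y_{-1} = y_1 + 2 h u_0
    r[0] = (2.0 * y[0] - 2.0 * y[1] - 2.0 * h * u[0]) / h**2
    # x = 1: outward normal is +d/dx, ghost value y_{N+1} = y_{N-1} + 2 h u_1
    r[-1] = (2.0 * y[-1] - 2.0 * y[-2] - 2.0 * h * u[1]) / h**2
    return r + y**3 - y


def objective(y, u, y_hat, omega, h):
    """J^h with trapezoidal weights on (0, 1); the boundary integral over
    Gamma = {0, 1} becomes a sum over the two end points."""
    w = np.full(y.size, h)
    w[[0, -1]] = h / 2.0
    return 0.5 * np.sum(w * (y - y_hat) ** 2) + 0.5 * omega * np.sum(u**2)


N = 10
h = 1.0 / N
y = np.ones(N + 1)  # y = 1 solves -y'' + y^3 - y = 0 with zero boundary flux
u = np.zeros(2)
print(np.max(np.abs(state_residual(y, u, h))))  # 0.0
```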

Sequential quadratic programming interior point 
(SQPIP) methods have been used to solve distributed 
optimal control problems. See, e. g., [2,3,5,7,9,10,11]. 
While the various SQPIP methods differ, they share 
some important design features. In each iteration a sub- 
problem is solved that only involves a linearization of 
the state equation (2) or (6) and a quadratic approximation of the Lagrangian
$L(y^h, u^h, \lambda^h) = J^h(y^h, u^h) + \langle c^h(y^h, u^h), \lambda^h \rangle$. All iterates stay strictly feasible with respect to the bounds, but the nonlinear state equation
is only satisfied in the limit. SQPIP methods attempt
to achieve feasibility and optimality at the same time.
Such all-at-once approaches are usually more efficient
than solution approaches that maintain feasibility of the
nonlinear state equation at every iteration. For a general
introduction to SQP methods and interior point methods
for nonlinear programming problems see > Successive
quadratic programming: Solution by active sets
and interior point methods; > Linear programming:
Interior point methods.

This article focuses on the application of SQPIP 
methods to distributed optimal control problems. 


For such problems affine scaling SQPIP methods [3] 
and various versions of primal-dual SQPIP methods 
([2,5,7,9,10,11]) have been used. Primal-dual interior
point methods have been applied to the nonlinear 
problem (5)-(8) directly or as inequality constrained 
quadratic programming subproblem solvers. In all ref- 
erences above, SQPIP methods have been applied to 
discretized optimal control problems (5)-(8). While 
these applications have been successful, there are sev- 
eral open research issues. The development of SQPIP 
methods for large scale nonlinear programming prob- 
lems is a very active research area. Additional research 
issues arise when SQPIP methods are applied to dis- 
tributed optimal control problems. The latter research 
issues and achievements will be described in the follow- 
ing. 

Distributed optimal control problems have a partic- 
ular problem structure. It is derived from the under- 
lying infinite-dimensional problem and from the di- 
vision of optimization variables into states and con- 
trols. Since one is interested in the solution of the 
infinite-dimensional problem (1)-(4), one needs to 
consider sequences of discretized problems (5)-(8). 
While for a fixed discretization the nonlinear pro- 
gramming framework may be applicable, the under- 
lying infinite-dimensional problem is important when 
sequences of successively refined discretizations are 
considered. It is important to understand the conver- 
gence behavior of the SQPIP algorithm in the infinite- 
dimensional setting, because this often dominates the 
convergence behavior of the algorithm for the dis- 
cretized problem when the discretization is fine. Con- 
vergence results in [2,11] for SQPIP algorithms applied 
to elliptic optimal control problems show that in sev- 
eral cases the number of optimization iterations in- 
creases only slowly, if at all, when the discretization is 
refined. This indicates that there might be an under- 
lying infinite-dimensional convergence theory. How- 
ever, such a theory is not yet known for most interior 
point algorithms. Currently, the only analyses of inte- 
rior point methods for infinite-dimensional problems 
are [6,13,14]. 

Most available SQPIP algorithms for nonlinear pro- 
gramming use sparse direct linear algebra to solve sub- 
problems. This is not suitable for many distributed 
optimal control problems. Here, subproblems involve 
the solution of linearized partial differential equations, 
which is best done using problem specific solvers, such
as multigrid or domain decomposition techniques. 
How to adjust the accuracy of iterative subproblem 
solves within SQPIP methods to obtain an efficient, 
globally convergent algorithm is not yet completely un- 
derstood. An analysis of the influence of inexact prob- 
lem information on the convergence of SQPIP methods 
is also important because discretizations of distributed 
optimal control problems can lead to errors in the 
derivative information. For example, this may happen 
when derivative information for the discretized prob- 
lem (5)-(8) is computed by discretizing the Fréchet- 
derivatives of the infinite-dimensional problem (1)-(4). 
In such cases the error in derivative information often goes
to zero as the discretization is refined. An understand- 
ing of the influence of inexact derivative information 
on the convergence of SQPIP methods is necessary for 
the development of efficient and robust algorithms, in- 
cluding the development of grid refinement strategies 
within the optimization. 

When considering SQPIP methods for distributed 
optimal control problems it is useful to distinguish be- 
tween problems with and without state constraints (4), 
(8). Problems (1)-(3) or (5)-(7) which include only 
control constraints are often easier to solve. For control constrained problems there exist more solution
approaches, like projection methods, than for state constrained problems. If the controls u are in $U = L^p$,
$p \in [1, \infty)$ or $p = \infty$, which is, e.g., the case in the example
problem at the beginning of this article, then optimality conditions for the infinite-dimensional problem
can be formulated in a form that is suitable for the development of interior point methods. In particular, under
suitable assumptions it is possible to show that Lagrange multipliers corresponding to (3) are in $L^\infty$. See,
e.g., [13,14]. Development of optimality conditions for
distributed optimal control problems (1)-(4) with state
constraints that are suitable for use in optimization algorithms is an active research area. Existing results are
less general than the ones for control constrained problems. Moreover, the Lagrange multipliers corresponding
to the state constraints (4), which are optimization
variables in most SQPIP methods, are usually only measures that cannot be represented by functions in $L^1$.
See, e.g., the discussion and references in [10]. For the
discretized problem (5)-(7) it is often possible to show
that the partial Jacobian $\partial_y c^h(y^h, u^h)$ is invertible for
all $u_l^h \le u^h \le u_u^h$, $y^h \in \mathbb{R}^{n_y}$. In this case the Jacobian

$$\begin{pmatrix} \partial_y c^h(y^h, u^h) & \partial_u c^h(y^h, u^h) \\ 0 & I_{\mathcal{A}} \end{pmatrix}$$

of the active constraints (6), (7), where the rows of $I_{\mathcal{A}}$ are the rows of the identity matrix that
correspond to the active control bounds, has full row rank, i.e.,
it is appropriate to assume the linear independence constraint qualification for the problem (5)-(7). This
assumption is made in many local convergence analyses for SQPIP methods for nonlinear programming.
For the problem (5)-(8) with state constraints, this assumption is often too restrictive. For example, the linear
independence constraint qualification will be violated if more than $n_u + n_y$ control and state constraints
(7), (8) are active ([1,15]). Finally, the quality of some
preconditioners for the iterative solution of subproblems within SQPIP methods is improved when control
constraints (7) are present, but decreased if state constraints (8) are given (see, e.g., [1]).

Research on SQPIP methods for distributed optimal control
problems is a rapidly developing area. This class of optimization algorithms shows great promise, especially
for problems with state constraints. The reader will find
detailed discussions in the references, and related information on SQP methods and interior point methods
for nonlinear programming problems in > Successive quadratic programming: Solution by active sets and
interior point methods; > Linear programming: Interior point methods.
The sequential simplex method is due to the original
work of W. Spendley, G.R. Hext and F.R. Himsworth
[2]. It was later developed further by J.A. Nelder and
R. Mead [1]. The ideas underlying the method and its
operation are outlined as follows.

The minimization problem considered is: 


min F(x). 


First, a simplex is defined as the geometric object 
formed by a set of n + 1 points in the case of n vari- 
ables (dimensions). Equidistant points form a regular 
simplex. In two dimensions such an object is a triangle, 
in three dimensions it is a tetrahedron and so forth. 

The basic idea of the downhill simplex method (for 
minimization of n-dimensional functions) rests on the 
ability to use the geometric object to move a vertex at 
a time in the direction of descending function values. 
To achieve this firstly one needs to define the initial 
layout of the vertices, and then apply general types of 
moves to modify the object. There are three basic moves,
namely reflection, expansion, and contraction. A suitable
termination criterion is also needed. All these basic
characteristics of the sequential simplex algorithm
are outlined below.


1. Initial simplex construction 
Generally, one can select any set of n + 1 points
to form the initial simplex, as long as the n difference
vectors from one vertex to the remaining n vertices are
linearly independent, so that they span the n-dimensional
space. A simple procedure is, for example, to select an
initial vertex point $x^{(1)}$ and then to array the
remaining n points along the coordinate directions:

$$x^{(i+1)} = x^{(1)} + \lambda_i e^{(i)}, \qquad i = 1,\dots,n,$$

where the $\lambda_i$ are constants which reflect a guess
of the problem's characteristic scale in each of its variables,
and the vectors $e^{(i)}$ are the unit vectors along each
coordinate.
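A minimal sketch of this initial simplex construction, with the scales $\lambda_i$ passed in as a list (the function name is hypothetical). A zero scale would make two vertices coincide, so the sketch rejects it.

```python
def initial_simplex(x1, scales):
    """Vertices x^(1) and x^(i+1) = x^(1) + lambda_i e^(i), i = 1, ..., n."""
    if any(lam == 0 for lam in scales):
        # a zero step would make x^(i+1) coincide with x^(1) (degenerate simplex)
        raise ValueError("all characteristic scales lambda_i must be nonzero")
    vertices = [list(x1)]
    for i, lam in enumerate(scales):
        v = list(x1)
        v[i] += lam  # step of length lambda_i along coordinate i
        vertices.append(v)
    return vertices


print(initial_simplex([0.0, 0.0], [1.0, 2.0]))  # [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
```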


Having constructed the initial simplex, each of the
vertices must be evaluated in terms of the objective
function value, leading to the corresponding values $f_1, \dots, f_{n+1}$. Following this, we denote the corresponding
index, function value and vertex vector for the lowest,
highest and replacing new vertex by the triplets
$\{l, f_l, x^{(l)}\}$, $\{h, f_h, x^{(h)}\}$, $\{r, f_r, x^{(r)}\}$. It is also useful to define
the centroid of the simplex (excluding the highest function value vertex) by

$$x^{(c)} = \frac{1}{n} \sum_{i=1,\, i \ne h}^{n+1} x^{(i)}, \tag{1}$$

thus having a new triplet representing this centroid
given by $\{c, f_c, x^{(c)}\}$.
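A minimal sketch of (1): find the highest vertex h and average the remaining n vertices (the helper name is hypothetical).

```python
def centroid_excluding_highest(vertices, fvals):
    """Return (h, x^(c)), where x^(c) is the mean of all vertices except the
    one with the highest function value, as in (1)."""
    h = max(range(len(fvals)), key=fvals.__getitem__)
    n = len(vertices) - 1
    dim = len(vertices[0])
    c = [sum(v[k] for i, v in enumerate(vertices) if i != h) / n for k in range(dim)]
    return h, c


print(centroid_excluding_highest([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.0, 1.0, 5.0]))
# (2, [0.5, 0.0])
```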


2. Reflection (main operation) 

This corresponds to rejecting vertex h, which has the maximum
function value among all vertices, and replacing
it by a point r reflected through the opposite face of
the simplex. This is based on the expectation that such
a new point will have a smaller value. The reflected
point is constructed by

$$x^{(r)} = x^{(c)} + \alpha_1 \left(x^{(c)} - x^{(h)}\right),$$

where $\alpha_1 > 0$ is the reflection coefficient, determining
how far the new point will be on the far side of the centroid
$x^{(c)}$. Clearly, the definition above places $x^{(r)}$
on the line joining the vectors $x^{(c)}$ and $x^{(h)}$. The
definition of the reflection coefficient is thus

$$\alpha_1 = \frac{\|x^{(r)} - x^{(c)}\|}{\|x^{(c)} - x^{(h)}\|}.$$

If $f_l < f_r < f_h$, then the new vertex r is accepted,
defining a new simplex. A purely reflection-based
operation can, however, easily fall into a cycle without
improvement. This may happen if the
simplex gets positioned in such a way that the reflected
point gives the same function value as the original one
(the highest) which it is replacing. One may introduce
rules to avoid this, such as not allowing returns to already
visited locations, or replacing the vertex with the second
highest value of the simplex by a new point.
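A minimal sketch of the reflection step; the usage line checks that the ratio of distances recovers the reflection coefficient $\alpha_1$ (here `a1`, with the value 1.5 chosen only for illustration).

```python
import math


def reflect(c, xh, a1=1.0):
    """x^(r) = x^(c) + a1 (x^(c) - x^(h)) with reflection coefficient a1 > 0."""
    return [ci + a1 * (ci - hi) for ci, hi in zip(c, xh)]


c, xh = [0.5, 0.0], [0.0, 1.0]
xr = reflect(c, xh, a1=1.5)
# the ratio ||x^(r) - x^(c)|| / ||x^(c) - x^(h)|| recovers a1 (up to rounding)
print(math.dist(xr, c) / math.dist(c, xh))
```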

If the function value is found to be $f_r < f_l$, then the
expansion operation is carried out in step 3. If, on the
other hand, the reflection produces a point for which $f_r > f_i$
holds for all points in the simplex except the highest
value one (point h), then the contraction operation is
carried out in step 4.


3. Expansion 

If during the reflection process above it is found that
the new point is better than all others in the previous
simplex, that is $f_r < f_l$, then the reflection is successful
in producing a new candidate minimum point. In such
a case it is expected that the direction from the replaced
point h through the centroid defined in (1) is a descent
direction. Hence, it is desirable to move further along this
direction by expanding the simplex. The expansion defines
a new point with index e:

$$x^{(e)} = x^{(c)} + \alpha_2 \left(x^{(r)} - x^{(c)}\right),$$

again having as a definition of the expansion coefficient
the ratio

$$\alpha_2 = \frac{\|x^{(e)} - x^{(c)}\|}{\|x^{(r)} - x^{(c)}\|},$$

which indicates how much further the original replacement
step is taken in the expansion phase.

If the procedure satisfies $f_e < f_l$, the point $x^{(h)}$ is replaced
by the point $x^{(e)}$ and another reflection operation
is started. If this does not improve the function value,
that is, $f_e \ge f_l$ holds, then the point $x^{(h)}$ is replaced
by the point $x^{(r)}$ and another reflection process is
started again (going back to step 2).
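A minimal sketch of step 3, assuming the common choice $\alpha_2 = 2$ (an assumption; the text leaves $\alpha_2$ open). The helper returns the point and value that replace $x^{(h)}$: the expanded point if $f_e < f_l$, otherwise the reflected point.

```python
def expand(c, xr, a2=2.0):
    """x^(e) = x^(c) + a2 (x^(r) - x^(c)); a2 > 1 moves beyond x^(r)."""
    return [ci + a2 * (ri - ci) for ci, ri in zip(c, xr)]


def expansion_step(f, c, xr, fr, fl, a2=2.0):
    """Step 3: keep x^(e) if f_e < f_l, otherwise fall back to x^(r)."""
    xe = expand(c, xr, a2)
    fe = f(xe)
    return (xe, fe) if fe < fl else (xr, fr)


f = lambda v: (v[0] - 2.0) ** 2 + (v[1] + 3.0) ** 2
print(expansion_step(f, [0.5, 0.0], [1.0, -1.0], fr=5.0, fl=3.0))  # ([1.5, -2.0], 1.25)
```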


4. Contraction 

If the reflection process has produced a point for which
$f_r > f_i$ holds for all points in the simplex except the
highest value one (point h), then the point $x^{(h)}$ is replaced
by $x^{(r)}$. But this will cause the new highest value
point to be the one just introduced, hence it is necessary
to contract the simplex by the following rule, where s
indicates the new contracted point:

$$x^{(s)} = x^{(c)} + \alpha_3 \left(x^{(h)} - x^{(c)}\right), \tag{2}$$

where the parameter $\alpha_3$ is the contraction coefficient,
such that $0 < \alpha_3 < 1$, and defines the ratio

$$\alpha_3 = \frac{\|x^{(s)} - x^{(c)}\|}{\|x^{(h)} - x^{(c)}\|},$$

defining by how much the direction to the highest value
vertex from the centroid is contracted.

If $f_r > f_h$, then the contraction operation in (2) is
used, without of course replacing the original highest
value point h.

If the point s is such that $f_s < \min\{f_h, f_r\}$, then point
$x^{(h)}$ is replaced by $x^{(s)}$ and reflection operations resume
(going to step 2).

If the contraction has failed, yielding $f_s \ge \min\{f_h, f_r\}$, then it is proposed to replace all points $x^{(i)}$ by moving
them closer to the lowest value point using $(x^{(i)} + x^{(l)})/2$
and restarting reflection operations (going to step 2).
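A minimal sketch of the contraction rule (2) and of the fallback that halves every edge toward the lowest vertex. The default $\alpha_3 = 0.5$ is a common choice assumed here, and the helper names are hypothetical.

```python
def contract(c, xh, a3=0.5):
    """x^(s) = x^(c) + a3 (x^(h) - x^(c)) with 0 < a3 < 1, as in (2)."""
    if not 0 < a3 < 1:
        raise ValueError("contraction coefficient must satisfy 0 < a3 < 1")
    return [ci + a3 * (hi - ci) for ci, hi in zip(c, xh)]


def shrink_toward_lowest(vertices, l):
    """Failed contraction: replace every x^(i) by (x^(i) + x^(l)) / 2.
    The lowest vertex x^(l) itself is left unchanged by this formula."""
    xl = vertices[l]
    return [[(vi + li) / 2 for vi, li in zip(v, xl)] for v in vertices]


print(contract([0.5, 0.0], [0.0, 1.0]))                       # [0.25, 0.5]
print(shrink_toward_lowest([[0, 0], [2, 0], [0, 2]], 0))      # [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```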

5. Termination criteria 

By the construction of the method, termination criteria
have to be based on some 'average' value over all current
vertices in the simplex. A suitable criterion, which may
be tested each time the simplex has been modified by any
of the above three operations, is the following:

$$\left[\frac{1}{n+1} \sum_{i=1}^{n+1} \left(f_i - f_0\right)^2\right]^{1/2} \le \varepsilon,$$

where the value $f_0$ may be set to the function value at the
centroid calculated in (1), or at a centroid that includes all
present points, i.e. even the highest point excluded in the
summation in (1).

The above criterion is a standard deviation measure
of the function values at all simplex points, which should
be less than or equal to some specified tolerance $\varepsilon > 0$.
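A minimal sketch of the termination test, assuming the $1/(n+1)$ normalization shown above and that the caller supplies $f_0$ (for example, the function value at one of the centroids described in the text).

```python
import math


def simplex_converged(fvals, f0, eps):
    """True if sqrt( (1/(n+1)) * sum_i (f_i - f0)^2 ) <= eps over the n+1 vertex values."""
    m = len(fvals)  # m = n + 1 vertices
    return math.sqrt(sum((fi - f0) ** 2 for fi in fvals) / m) <= eps


print(simplex_converged([1.0, 1.0, 1.0], 1.0, 1e-8))  # True
print(simplex_converged([0.0, 1.0, 2.0], 1.0, 0.5))   # False: the spread is about 0.816
```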

The parameters $\alpha_1$, $\alpha_2$, $\alpha_3$ appearing in the
various operations of the algorithm are either fixed, or
can be adapted using line search type operations to find
the best values that will enhance the desired effect of the
operation involved (steps 2-4).

The sequential simplex method generally requires only
function values and is a simple algorithmic procedure.
It is generally not the most efficient derivative-free method,
since it may require many function evaluations for larger
problems (in terms of number of variables). However,
whenever a quick and easy search phase is required for
an unconstrained optimization problem without too
many variables, this method may be quite useful. Another
advantage is that the method does not require the
objective function f(x) to be differentiable, hence it may
be useful for such practical applications.
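Putting steps 1 to 5 together, the following is a minimal, self-contained sketch of the sequential simplex method. Several choices are assumptions rather than part of the description above: the default coefficients $\alpha_1 = 1$, $\alpha_2 = 2$, $\alpha_3 = 0.5$; accepting a plain reflection when $f_r$ is below the second highest value (this resolves the overlap between the acceptance rule of step 2 and the contraction test of step 4); and $f_0$ taken at the centroid of all vertices.

```python
import math


def nelder_mead(f, x1, scales, a1=1.0, a2=2.0, a3=0.5, eps=1e-10, max_iter=10000):
    """Minimal sequential simplex sketch; returns (best vertex, best value)."""
    n = len(x1)
    # Step 1: x^(1) plus steps lambda_i along each coordinate direction
    simplex = [list(x1)]
    for i in range(n):
        v = list(x1)
        v[i] += scales[i]
        simplex.append(v)
    fvals = [f(v) for v in simplex]

    def along(c, p, t):
        # c + t (p - c): reflection uses t = -a1, expansion t = a2, contraction t = a3
        return [ci + t * (pi - ci) for ci, pi in zip(c, p)]

    for _ in range(max_iter):
        order = sorted(range(n + 1), key=fvals.__getitem__)
        l, s, h = order[0], order[-2], order[-1]
        # Step 5: spread of the vertex values about f at the centroid of all points
        c_all = [sum(v[k] for v in simplex) / (n + 1) for k in range(n)]
        f0 = f(c_all)
        if math.sqrt(sum((fi - f0) ** 2 for fi in fvals) / (n + 1)) <= eps:
            break
        # centroid (1) of all vertices except the highest one
        c = [sum(simplex[i][k] for i in range(n + 1) if i != h) / n for k in range(n)]
        xr = along(c, simplex[h], -a1)  # Step 2: reflection
        fr = f(xr)
        if fr < fvals[l]:  # Step 3: expansion
            xe = along(c, xr, a2)
            fe = f(xe)
            simplex[h], fvals[h] = (xe, fe) if fe < fvals[l] else (xr, fr)
        elif fr < fvals[s]:  # reflection accepted
            simplex[h], fvals[h] = xr, fr
        else:  # Step 4: contraction
            if fr < fvals[h]:
                simplex[h], fvals[h] = xr, fr
            xs = along(c, simplex[h], a3)
            fs = f(xs)
            if fs < fvals[h]:  # fvals[h] now equals min{f_h, f_r}
                simplex[h], fvals[h] = xs, fs
            else:  # failed contraction: shrink toward x^(l)
                for i in range(n + 1):
                    if i != l:
                        simplex[i] = [(a + b) / 2 for a, b in zip(simplex[i], simplex[l])]
                        fvals[i] = f(simplex[i])
    best = min(range(n + 1), key=fvals.__getitem__)
    return simplex[best], fvals[best]


x, fx = nelder_mead(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, [0.0, 0.0], [0.5, 0.5])
print(x, fx)  # close to [1, -2] and 0
```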
See also


> Convex-simplex Algorithm

> Cyclic Coordinate Method 

> Lemke Method 

> Linear Complementarity Problem 

> Linear Programming 

> Parametric Linear Programming: Cost Simplex 
Algorithm 

> Powell Method 

> Rosenbrock Method 
We consider the class of problems having the following
structure:

$$\min\ c^T x \quad \text{s.t.}\ Ax \ge e,\quad x \in \{0,1\}^n,$$

where A is an m × n matrix of zeros and ones, $e = (1, \dots, 1)^T$ is a vector of m ones and c is a vector of n (arbitrary)
rational components. This pure 0-1 linear programming
problem is called the set covering problem.
When the inequalities are replaced by equations the
problem is called the set partitioning problem, and when
all of the ≥ constraints are replaced by ≤ constraints,
the problem is called the set packing problem.
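To illustrate how the three variants differ only in the sense of the constraints, here is a brute-force sketch on a tiny made-up instance (rows are elements, columns are sets). Enumeration is only viable for very small n and is not a practical solution method. Since c is arbitrary, a packing problem that maximizes total weight is solved here by minimizing with negated weights.

```python
from itertools import product


def solve_01(A, c, sense):
    """Brute-force min c^T x over x in {0,1}^n with every row of Ax
    satisfying (sense) 1, where sense is ">=" (covering), "=" (partitioning)
    or "<=" (packing). Returns (value, x) or None if infeasible."""
    ops = {">=": lambda s: s >= 1, "=": lambda s: s == 1, "<=": lambda s: s <= 1}
    ok = ops[sense]
    best = None
    for x in product((0, 1), repeat=len(c)):
        if all(ok(sum(a * xj for a, xj in zip(row, x))) for row in A):
            val = sum(cj * xj for cj, xj in zip(c, x))
            if best is None or val < best[0]:
                best = (val, x)
    return best


# sets S1 = {1}, S2 = {1, 2}, S3 = {2, 3}, S4 = {3} over elements 1, 2, 3
A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
c = [3, 1, 1, 4]
print(solve_01(A, c, ">="))                # (2, (0, 1, 1, 0)): covering, element 2 covered twice
print(solve_01(A, c, "="))                 # (4, (1, 0, 1, 0)): partitioning
print(solve_01(A, [-w for w in c], "<="))  # (-7, (1, 0, 0, 1)): packing of maximum weight 7
```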


Applicability of the Problem 


Many applications arise having the packing, partition- 
ing and covering structure. Delivery and routing prob- 
lems, scheduling problems and location problems often 
take on a set covering structure whereby one wishes to 
assure that every customer is served by some location, 
vehicle or person. Other applications include switching
theory, the testing of VLSI circuits, and line balancing.
Similarly, scheduling problems in which one wishes
to satisfy as much demand as possible without creating
conflicts often require the set packing format. Finally,
when every customer must be served by exactly
one server, the problem takes on the set partitioning
format. Commonly cited problems having this structure
include the crew-scheduling problem, where every
flight leg of an airline must be covered by exactly one
cockpit crew; another is the political districting problem,
whereby regions must be divided into voting districts
such that every citizen is assigned to exactly one district.
See the survey [4], which contains a bibliography
on applications.

Recently (as of 2000), a variety of difficult problems
have been solved by reformulating them as either
set-covering or set-partitioning problems having an
extraordinarily large number of variables. Because even
small instances of such reformulations are too large to
be solved explicitly, techniques known as column
generation, which began with the seminal work [16] on
the cutting-stock problem, are employed. An overview
of such transformation methods can be found in [5].
For specific implementations to the vehicle routing
problem, see [11], for the bandwidth packing problem,
see [27], for the generalized assignment problem see
[29] and for alternative column-generation strategies
for solving the cutting-stock problems, see [5].

J. Bramel and D. Simchi-Levi [7] have shown that 
the set-partitioning formulation for the vehicle routing 
problem with time windows is a tight formulation, i.e. 
the relative gap between the fractional linear program- 
ming solution and the global integer solution is small. 
Similar results are obtained for the bin-packing prob- 
lem [9] and for machine scheduling [8]. 


Solution Approaches 


Once the problem has been formulated as a set- 
covering, set-packing or set-partitioning problem, the 
search for an optimal (or near-optimal) solution to this 
NP-hard 0-1 linear programming problem remains. 
Most solution approaches start by considering the linear programming relaxation (LP relaxation) of the respective problem. If the matrix A is a perfect zero-one matrix, see [23], then the LP relaxations of both the set packing and the set partitioning problem have a zero-one optimal solution for all choices of the objective function. Likewise, if the matrix A is an ideal matrix, see [26], then the same holds true for the set covering and the set partitioning problem. Problems arising in practice, however, need not have perfect or ideal matrices. Nevertheless, it has been observed in computational practice that, as long as the problems to be solved are relatively small, linear programming (or linear programming coupled with branch and bound) is likely to provide integer solutions quickly. However, as the problem size increases, the nonintegrality of the linear programming solution increases dramatically, as do the length and size of the branching tree. It is for these larger instances that approximation techniques, reformulations and exact procedures exploiting the underlying structure of the problem have been developed.
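A small instance shows what goes wrong when A is not ideal. The odd-cycle matrix below is an example chosen here, not taken from the text. It has three rows, and each row is covered by two of three unit-cost columns. Summing the three covering constraints gives 2(x1 + x2 + x3) ≥ 3, so the LP optimum is at least 3/2. The point x = (1/2, 1/2, 1/2) attains this bound. The sketch confirms the fractional point is LP-feasible and shows by enumeration that the best 0-1 cover costs 2.

```python
from fractions import Fraction
from itertools import product

# Odd cycle: row i is covered by columns i and (i + 1) mod 3.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
c = [1, 1, 1]

def feasible(x):
    """Covering constraints A x >= 1 (works for fractional x as well)."""
    return all(sum(a * xj for a, xj in zip(row, x)) >= 1 for row in A)

def cost(x):
    return sum(cj * xj for cj, xj in zip(c, x))

# Integer optimum by brute force over all 0-1 vectors.
best_int = min(cost(x) for x in product((0, 1), repeat=3) if feasible(x))

# The fractional point (1/2, 1/2, 1/2) is LP-feasible and strictly cheaper.
half = [Fraction(1, 2)] * 3
assert feasible(half)
print(best_int, cost(half))  # 2 3/2
```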


Reformulation of the Linear Description 
of the Problem 


The natural structure of packing, covering and partitioning problems provides opportunities to automatically remove unnecessary rows or columns, and to remove variables that cannot be part of any optimal solution. Checks for inconsistencies among the constraints are also performed. Reformulation procedures for the set covering problem have been well known for a long time [15], but they were not implemented in a special-purpose code for solving very large scale problems until 1993 [19].
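Three classical set-covering reductions can be sketched as follows:

- A column that is the only one covering some row is fixed to 1.
- A row is dropped as redundant if its set of covering columns contains that of another row.
- A column is deleted if another column covers a superset of its rows at no greater cost.

Each column is stored as the set of rows it covers, and costs are assumed nonnegative. The function is a hypothetical minimal version of these rules, written for illustration; it is not the code of [19].

```python
from typing import Dict, Set, Tuple

def reduce_set_cover(rows: Set[int], cols: Dict[str, Set[int]],
                     cost: Dict[str, float]
                     ) -> Tuple[Dict[str, Set[int]], Set[int], Set[str]]:
    """Apply simple set-covering reductions until none applies.

    rows: rows to be covered; cols: column name -> rows it covers;
    cost: nonnegative column costs. Returns (remaining columns,
    remaining rows, columns fixed to 1). One rule fires per pass.
    """
    rows = set(rows)
    cols = {j: set(r) for j, r in cols.items()}
    fixed: Set[str] = set()
    while True:
        # Restrict columns to live rows; a column covering nothing is useless.
        cols = {j: r & rows for j, r in cols.items() if r & rows}
        cover = {i: {j for j, r in cols.items() if i in r} for i in rows}
        if any(not cs for cs in cover.values()):
            raise ValueError("some row cannot be covered: infeasible")
        # Rule 1: a row covered by exactly one column forces that column to 1.
        forced = {next(iter(cs)) for cs in cover.values() if len(cs) == 1}
        if forced:
            for j in forced:
                rows -= cols.pop(j)
                fixed.add(j)
            continue
        # Rule 2: row i is redundant if the columns covering another row k
        # are a subset of those covering i (covering k then also covers i).
        row = next((i for i in sorted(rows) for k in sorted(rows)
                    if k != i and cover[k] <= cover[i]), None)
        if row is not None:
            rows.discard(row)
            continue
        # Rule 3: column j is dominated if another column covers a superset
        # of its rows at no greater cost.
        col = next((j for j in sorted(cols) for k in sorted(cols)
                    if k != j and cols[j] <= cols[k] and cost[k] <= cost[j]), None)
        if col is not None:
            del cols[col]
            continue
        return cols, rows, fixed

rows = {1, 2, 3, 4}
cols = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 2, 3}}
cost = {"a": 2, "b": 1, "c": 3, "d": 2}
left_cols, left_rows, fixed = reduce_set_cover(rows, cols, cost)
print(sorted(fixed), left_rows)  # ['c', 'd'] set()
```

In this toy instance the reductions alone solve the problem. In practice they usually only shrink it before an exact or heuristic method is applied.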


Heuristics for the Set Partitioning 
and Covering Problems 


Virtually every heuristic approach for solving general 
integer programming problems has been applied to the 
set-covering, packing and partitioning problems. The 
set covering and packing formulations naturally lend 
themselves to greedy starts (i.e. an approach that at 
every iteration myopically chooses the next best step 
without regard for its implications on future moves), 
see e.g. [14]. Interchange approaches have also been 
applied here; a swap of one or more columns is taken 
whenever such a swap improves the objective function 
value. Newer heuristic approaches such as genetic algorithms (cf. also » Genetic algorithms), probabilistic search [13], simulated annealing (cf. also » Simulated annealing methods in protein folding) [5], and neural networks (cf. also » Neural networks for combinatorial optimization) [1] have each been tried. Unfortunately, there has been no comparative testing across such methods to determine under what circumstances a specific method might perform best. J.E. Beasley [6] maintains an extensive test set of problem instances for these important problems.
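The sketch below combines a greedy start with the simplest interchange move. The greedy step uses a ratio rule in the spirit of Chvátal's heuristic: repeatedly pick the column with the lowest cost per newly covered row. The improvement step then drops a chosen column, most expensive first, whenever the remaining columns still cover every row. The function names and the tie-breaking by column name are assumptions made for this example.

```python
from typing import Dict, List, Set

def greedy_cover(rows: Set[int], cols: Dict[str, Set[int]],
                 cost: Dict[str, float]) -> List[str]:
    """Greedy start: pick the column with the lowest cost per newly covered row."""
    uncovered = set(rows)
    chosen: List[str] = []
    while uncovered:
        candidates = [j for j in cols if cols[j] & uncovered]
        if not candidates:
            raise ValueError("instance is infeasible")
        best = min(candidates, key=lambda j: (cost[j] / len(cols[j] & uncovered), j))
        chosen.append(best)
        uncovered -= cols[best]
    return chosen

def drop_redundant(rows: Set[int], cols: Dict[str, Set[int]],
                   cost: Dict[str, float], chosen: List[str]) -> List[str]:
    """Improvement step: remove a chosen column (most expensive first) whenever
    the remaining columns still cover every row."""
    solution = list(chosen)
    for j in sorted(chosen, key=lambda j: -cost[j]):
        rest = [k for k in solution if k != j]
        if set().union(*(cols[k] for k in rest)) >= rows:
            solution = rest
    return solution

rows = {1, 2, 3, 4}
cols = {"A": {1, 2}, "B": {3, 4}, "C": {2, 3}}
cost = {"A": 1.0, "B": 1.0, "C": 0.5}
start = greedy_cover(rows, cols, cost)
print(start, drop_redundant(rows, cols, cost, start))  # ['C', 'A', 'B'] ['A', 'B']
```

The example illustrates the myopia of the greedy rule. The cheap column C is chosen first but later becomes redundant, and the drop step lowers the cost from 2.5 to 2.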

In addition, one can embed heuristics within ex- 
act algorithms so that one iteratively tightens the upper 
bound at the same time that one is attempting to get 
a tight approximation to the lower bound for the prob- 
lem. See [19] for a description of a linear-programming 
based heuristic for the set-partitioning problem with 
side constraints, and [2] for heuristics based on using 
Lagrangian relaxation (cf. also » Integer programming: 
Lagrangian relaxation) embedded within branch and 
bound to solve the set covering problem. 


Exact Solution Approaches to the Set Covering,
Packing and Partitioning Problems


Exact approaches to solving set partitioning, covering 
and packing problems require algorithms that generate 
both good lower and upper bounds on the true mini- 
mum value of the problem instance. One can use any of 
the heuristics mentioned above to obtain a good upper 
bound to these problems. One should note, however, 
that the set covering and packing problems are easier 
problems for heuristic search because for these prob- 
lems, it is, in general, easy to find feasible solutions. The 
set-partitioning problem may create unique concerns for some of these algorithms, specifically because each row must be covered exactly once, so that even finding a feasible solution can be difficult.

In general, the lower bound on the optimal solution value is obtained by solving a relaxation of the optimization problem. That is, one solves another optimization problem with two properties. Its set of feasible solutions contains all feasible solutions of the original problem. Its objective function value is less than or equal to the true objective function value at every point feasible for the original problem. Thus, we replace the ‘true’ problem by one with a larger feasible region that is more easily solved. There are two standard relaxations
for covering, packing and partitioning problems: La- 
grangian relaxation (where the feasible set is usually re- 
quired to maintain 0-1 feasibility, but many if not all 
of the constraints are moved to the objective function 
with a penalty term) and the linear programming re- 
laxation (where only the integrality constraints are re- 
laxed and the objective function remains the original 
function). For the set-covering problem, [12] uses a Lagrangian formulation and subgradient optimization. In [3] various Lagrangian relaxations are tested, including some that incorporate cuts within the formulation and keep a disjoint set of the original linear constraints unrelaxed. In [2], dual and primal heuristics, recursive variable fixing and subgradient optimization are embedded within a branch and bound tree search.
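The Lagrangian relaxation of set covering is easy to write down. Relaxing Ax ≥ 1 with multipliers u ≥ 0 gives L(u) = Σ_i u_i + Σ_j min(0, c_j − Σ_i a_ij u_i). Every such value is a valid lower bound. The sketch below maximizes L(u) by subgradient optimization. It assumes a common step rule, step = λ(UB − L(u))/‖g‖², with λ halved after 20 non-improving iterations. This rule, the starting multipliers and the function name are illustrative choices, not necessarily those of [12] or [2].

```python
from typing import List, Tuple

def lagrangian_bound(A: List[List[int]], c: List[float], upper: float,
                     iters: int = 200, lam: float = 2.0) -> Tuple[float, List[float]]:
    """Subgradient optimization of the Lagrangian dual of set covering.

    `upper` is the cost of any known feasible cover; it is used only in the
    step-size rule. Returns the best lower bound found and its multipliers.
    """
    m, n = len(A), len(c)
    u = [min(c[j] for j in range(n) if A[i][j]) for i in range(m)]  # simple start
    best, best_u, stall = float("-inf"), list(u), 0
    for _ in range(iters):
        red = [c[j] - sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        x = [1 if r < 0 else 0 for r in red]          # minimizer of the relaxation
        value = sum(u) + sum(min(0.0, r) for r in red)
        if value > best + 1e-12:
            best, best_u, stall = value, list(u), 0
        else:
            stall += 1
            if stall >= 20:                          # shrink the step factor
                lam, stall = lam / 2, 0
        g = [1 - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        norm2 = sum(gi * gi for gi in g)
        if norm2 == 0:   # Ax = 1: x is feasible and its cost equals the bound
            break
        step = lam * (upper - value) / norm2
        u = [max(0.0, u[i] + step * g[i]) for i in range(m)]
    return best, best_u

A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]   # odd cycle: LP value 3/2, integer value 2
bound, u = lagrangian_bound(A, [1.0, 1.0, 1.0], upper=2.0)
print(round(bound, 6))  # 1.5
```

Because the relaxed subproblem has the integrality property, the best attainable Lagrangian bound equals the LP bound (here 3/2). It is obtained without calling an LP solver.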

An alternative approach to solving set partitioning, 
packing and set covering problems is branch and cut. 
This method begins by solving the linear programming relaxation of the problem and then tightens the formulation by adding new linear inequalities to the constraint set.


Specifically, it requires finding linear inequalities that are violated by the solution of the current relaxation but are satisfied by all feasible zero-one points. The most successful cutting plane approaches are based on polyhedral theory. That is, they replace the constraint set of an integer programming problem by a convexification of the feasible zero-one points and extreme rays of the problem. Some of the polyhedral cuts useful for set-partitioning problems are clique inequalities and the inequalities derived from odd cycles and from complements of odd cycles in the intersection graph associated with the matrix A. For a complete description of how
such cuts are embedded into a tree search structure that 
also uses heuristics, and reformulation and variable fix- 
ing techniques, see [19]. 
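The intersection graph has one node per column. Two columns are joined by an edge when they share a row. In a packing or partitioning problem at most one column of any clique K can be 1, so Σ_{j∈K} x_j ≤ 1 is valid. The brute-force sketch below builds the graph and lists cliques whose inequality a fractional point violates. It is exponential, meant only for tiny instances, and it is not the separation routine of [19]. In the example the odd-cycle matrix is read as packing rows. The point (1/2, 1/2, 1/2) satisfies every row but violates the 3-clique inequality.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def intersection_graph(A: List[List[int]]) -> Dict[int, Set[int]]:
    """Columns j and k are adjacent iff some row contains both."""
    adj: Dict[int, Set[int]] = {j: set() for j in range(len(A[0]))}
    for row in A:
        support = [j for j, a in enumerate(row) if a]
        for j, k in combinations(support, 2):
            adj[j].add(k)
            adj[k].add(j)
    return adj

def violated_cliques(A: List[List[int]], x: List[float],
                     tol: float = 1e-9) -> List[Tuple[Tuple[int, ...], float]]:
    """Enumerate cliques (size >= 2) of the intersection graph and return those
    with sum_{j in K} x_j > 1, i.e. violated clique inequalities."""
    adj = intersection_graph(A)
    found = []
    for size in range(2, len(x) + 1):
        for K in combinations(range(len(x)), size):
            if all(k in adj[j] for j, k in combinations(K, 2)):
                lhs = sum(x[j] for j in K)
                if lhs > 1 + tol:
                    found.append((K, lhs))
    return found

A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(violated_cliques(A, [0.5, 0.5, 0.5]))  # [((0, 1, 2), 1.5)]
```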

For details on polyhedral structure see [10,22,24,25] 
and [28]. Currently, the polyhedral description of these 
problems is incomplete. As our understanding of the 
mathematical structure of the set partitioning, packing 
and covering polytopes improves, and with the contin- 
uing advancement in computer technology, it is likely 
that many difficult and important problems will be solved as ever larger set partitioning problems are solved to proven optimality.


The Future 


The recent interest (as of 2000) in reformulating hard, 
important scheduling problems into set partitioning 
problems via column generation has reinvigorated re- 
search into both linear and integer programming solu- 
tion techniques. The linear programming relaxations of 
these very large set partitioning problems yield highly 
degenerate problems for the primal simplex method 
to solve. This degeneracy has led to a revisiting of both primal and dual steepest edge methods, see [17]. Because column generation approaches require the solution of many set-partitioning problems, we need to better understand the structure of these problems. More research into polyhedral structure, time-staged network optimization (for the subproblem solutions), and careful attention to computer implementation details are likely to yield successes on problems (such as machine-shop scheduling) that until now have resisted exact solution approaches.


See also 


> Branch and Price: Integer Programming with 
Column Generation 
> Decomposition Techniques for MILP: Lagrangian
Relaxation 

> Graph Coloring 

> Integer Linear Complementary Problem 

> Integer Programming 

> Integer Programming: Algebraic Methods 

> Integer Programming: Branch and Bound Methods 

> Integer Programming: Branch and Cut Algorithms 

> Integer Programming: Cutting Plane Algorithms 

> Integer Programming Duality 

> Integer Programming: Lagrangian Relaxation 

> LCP: Pardalos—Rosen Mixed Integer Formulation 

> Mixed Integer Classification Problems 

> Multi-objective Integer Linear Programming 

> Multi-objective Mixed Integer Programming 

> Multiparametric Mixed Integer Linear 
Programming 

> Parametric Mixed Integer Nonlinear Optimization 

> Simplicial Pivoting Algorithms for Integer 

Nowadays (2000), set-valued optimization means set-valued analysis and its application to optimization; it is an extension of continuous optimization to the set-valued case. In this research area one investigates optimization problems with constraints and/or an objective function described by set-valued maps, or one applies results of set-valued analysis to standard optimization problems. In the last decade there has been increasing interest in set-valued optimization (e.g., see the special issue [7]).

General optimization problems with set-valued con- 
straints or a set-valued objective function are closely 
related to problems in stochastic programming, fuzzy 
programming and optimal control. If the values of 
a given function vary in a specified region, this fact 
could be described using a membership function in the 
theory of fuzzy sets or using information on the dis- 
tribution of the function values. In this general setting 
probability distributions or membership functions are 
not needed because only sets are considered. 

Optimal control problems with differential inclu- 
sions belong to this class of set-valued optimization 
problems as well. Set-valued optimization seems to have the potential to become a bridge between different areas in optimization, and it is a substantial extension of standard optimization theory. Set-valued analysis is the most important tool for such an advancement in continuous optimization. Conversely, the development of set-valued analysis receives important impulses from optimization.


Set-valued optimization problems have been investigated by many authors. For instance, there are papers on optimality conditions (e.g., [3,4,5,6,9,11,14,17,19,21,22]), duality theory (e.g., [8,20,23,24]) and related topics (e.g., [10,15,16,25]). For further developments see [7].

In the following let X, Y and Z be real linear spaces, let Y and Z be partially ordered by convex cones $C_Y \subset Y$ and $C_Z \subset Z$, respectively (then $\le_{C_Y}$ and $\le_{C_Z}$ denote the corresponding partial orderings), let S be a nonempty subset of X, and let $F: S \to 2^Y$ and $G: S \to 2^Z$ be set-valued maps. Throughout this article it is generally assumed that the domain of a set-valued map equals its effective domain, i.e. for every element of the domain the image is a nonempty set.

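Under these assumptions one considers the set-valued optimization problem

$$
\text{(SVOP)}\qquad
\begin{aligned}
&\min\ F(x)\\
&\text{s.t. } G(x) \cap (-C_Z) \neq \emptyset,\\
&\phantom{\text{s.t. }} x \in S.
\end{aligned}
$$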


For simplicity let

$$\hat S := \{x \in S : G(x) \cap (-C_Z) \neq \emptyset\}$$

denote the feasible set of this problem, which is assumed to be nonempty. If G is single-valued, the constraint in (SVOP) reduces to $G(x) \in -C_Z$ or $G(x) \le_{C_Z} 0_Z$, generalizing equality and inequality constraints. If, in addition, F is single-valued, then the problem (SVOP) is a general vector optimization problem.

It is also possible to use a constraint of the form $G(x) \subset -C_Z$. With a simple transformation (see [20]), this type of constraint can be converted into the type of constraint used in problem (SVOP). This transformation has the drawback that convexity and continuity properties of the map may be lost.

As a simple example for problem (SVOP), consider the case that the objective function of a standard optimization problem is not known explicitly. Instead, for every feasible x a lower bound f(x) and an upper bound g(x) are given. In this case one can replace the objective function by a set-valued map F with

$$F(x) := [f(x), g(x)] \quad \text{for all } x \in S.$$

In practice it turns out that set-valued optimization makes sense only for a set-valued map F whose lower bound cannot be described by a function f (because the minimization of F is in a certain sense equivalent to the minimization of f).

Now the actual minimality notion used in set- 
valued optimization is presented. 


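Definition 1 A pair $(\bar x, \bar y)$ with $\bar x \in \hat S$ and $\bar y \in F(\bar x)$ is called a minimizer of problem (SVOP) if $\bar y$ is a minimal element of the set $F(\hat S) := \bigcup_{x \in \hat S} F(x)$, i.e.

$$(\{\bar y\} - C_Y) \cap F(\hat S) \subset \{\bar y\} + C_Y.$$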


It is known from vector optimization that the so-called 
weak minimality notion is the appropriate concept for 
the formulation of necessary and sufficient optimality 
conditions. This fact also holds for the set-valued case. 


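Definition 2 If in addition $\operatorname{int}(C_Y) \neq \emptyset$, then a pair $(\bar x, \bar y)$ with $\bar x \in \hat S$ and $\bar y \in F(\bar x)$ is called a weak minimizer of problem (SVOP) if $\bar y$ is a weakly minimal element of the set $F(\hat S)$, i.e.

$$(\{\bar y\} - \operatorname{int}(C_Y)) \cap F(\hat S) = \emptyset.$$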
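For a finite illustration of Definitions 1 and 2, take Y = ℝ² with C_Y = ℝ²_+, so int(C_Y) ≠ ∅. Assume a finite feasible set and finite image sets F(x). Because ℝ²_+ is pointed, Definition 1 says that no other point of F(Ŝ) lies componentwise below ȳ. Definition 2 only excludes points that are strictly below ȳ in both components. The instance and function names below are made up for the example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def leq(a: Point, b: Point) -> bool:
    """a <=_C b for the ordering cone C = R^2_+ (componentwise)."""
    return a[0] <= b[0] and a[1] <= b[1]

def is_minimal(y: Point, image: List[Point]) -> bool:
    """Definition 1: every p in F(S) with p <= y must also satisfy y <= p."""
    return all(leq(y, p) for p in image if leq(p, y))

def is_weakly_minimal(y: Point, image: List[Point]) -> bool:
    """Definition 2: no p in F(S) lies strictly below y in both components."""
    return not any(p[0] < y[0] and p[1] < y[1] for p in image)

def minimizers(F: Dict[str, List[Point]], weak: bool = False) -> List[Tuple[str, Point]]:
    """All pairs (x, y) with y in F(x) that are (weak) minimizers over F(S)."""
    image = [p for values in F.values() for p in values]  # F(S) = union of the F(x)
    test = is_weakly_minimal if weak else is_minimal
    return [(x, y) for x, values in F.items() for y in values if test(y, image)]

# Hypothetical instance: three feasible points, each with a finite image set.
F = {"x1": [(1, 3), (2, 2)], "x2": [(2, 1), (3, 3)], "x3": [(1, 4)]}
print(minimizers(F))             # [('x1', (1, 3)), ('x2', (2, 1))]
print(minimizers(F, weak=True))  # [('x1', (1, 3)), ('x1', (2, 2)), ('x2', (2, 1)), ('x3', (1, 4))]
```

The point (2, 2) is weakly but not properly minimal, because (2, 1) lies below it without being strictly below in both components. The same is true of (1, 4) relative to (1, 3).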


In order to obtain optimality conditions generalizing 
the known classical conditions a suitable differentiabil- 
ity notion is now introduced. 


Definition 3 [14] In addition, let X and Y be real normed spaces, and let a pair $(\bar x, \bar y)$ with $\bar x \in S$ and $\bar y \in F(\bar x)$ be given. A single-valued map $DF(\bar x, \bar y): X \to Y$ whose epigraph equals the contingent cone (e.g., see [13]) to the epigraph of F at $(\bar x, \bar y)$, i.e.

$$\operatorname{epi}(DF(\bar x, \bar y)) = T(\operatorname{epi}(F), (\bar x, \bar y)),$$

is called the contingent epiderivative of F at $(\bar x, \bar y)$.


Here the epigraph of F is defined as

$$\operatorname{epi}(F) = \{(x, y) \in X \times Y : x \in S,\ y \in F(x) + C_Y\}.$$


The contingent epiderivative is a possible gener- 
alization of the well-known directional derivative in 
the single-valued case. Under convexity assumptions 
the contingent epiderivative is a sublinear map. In set- 
valued optimization convex maps are introduced as fol- 
lows. 


Definition 4 The set-valued map $F: S \to 2^Y$ is called $C_Y$-convex if for all $x_1, x_2 \in S$ and $\lambda \in [0, 1]$

$$\lambda F(x_1) + (1 - \lambda) F(x_2) \subset F(\lambda x_1 + (1 - \lambda) x_2) + C_Y.$$


Using the concept of the contingent epiderivative it is 
also possible to introduce subgradients of a set-valued 
map. 


Definition 5 [2] Let the contingent epiderivative $DF(\bar x, \bar y)$ of F at $(\bar x, \bar y)$ exist with $\bar x \in S$ and $\bar y \in F(\bar x)$. A linear map $L: X \to Y$ with

$$L(x) \le_{C_Y} DF(\bar x, \bar y)(x) \quad \text{for all } x \in X$$

is called a subgradient of F at $(\bar x, \bar y)$. The set $\partial F(\bar x, \bar y)$ of all subgradients L of F at $(\bar x, \bar y)$ is called the subdifferential of F at $(\bar x, \bar y)$.


Theorem 6 [2] In addition to the assumptions, let X and Y be real normed spaces, let S = X, let $C_Y$ be pointed, let Y be order complete, let F be $C_Y$-convex, and let the contingent epiderivative $DF(\bar x, \bar y)$ of F at $(\bar x, \bar y)$ exist with $\bar x \in S$ and $\bar y \in F(\bar x)$. Then the subdifferential $\partial F(\bar x, \bar y)$ is nonempty.


Next, a complete characterization of weak minimizers 
in convex set-valued optimization is presented. 


Theorem 7 [14] In addition to the assumptions, let X and Y be real normed spaces, let S be a convex set, let $\operatorname{int}(C_Y) \neq \emptyset$, let F be $C_Y$-convex, and let the contingent epiderivative $DF(\bar x, \bar y)$ exist at $\bar x \in S$ and $\bar y \in F(\bar x)$. The pair $(\bar x, \bar y)$ is a weak minimizer of problem (SVOP) if and only if

$$DF(\bar x, \bar y)(x - \bar x) \notin -\operatorname{int}(C_Y) \quad \text{for all } x \in S.$$
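Definition 5 and Theorem 7 can be checked numerically in the simplest special case. Take X = Y = ℝ, C_Y = ℝ_+, and F(x) = {f(x)} with f convex and finite. Then DF(x̄, f(x̄)) is the one-sided directional derivative h ↦ f'(x̄; h). The subgradients are the maps h ↦ s·h with s between the left and right derivatives of f at x̄. The condition of Theorem 7 becomes f'(x̄; x − x̄) ≥ 0 for all x ∈ S.

The sketch approximates f'(x̄; h) by a forward difference quotient. For convex f this quotient decreases monotonically to f'(x̄; h) as t ↓ 0. The step t, the tolerance and the sample of S are assumptions, so the result is a numerical check rather than a proof.

```python
from typing import Callable, Iterable, Tuple

def dir_derivative(f: Callable[[float], float], xbar: float, h: float,
                   t: float = 1e-7) -> float:
    """Forward difference quotient approximating f'(xbar; h) for convex f."""
    return (f(xbar + t * h) - f(xbar)) / t

def subdifferential(f: Callable[[float], float], xbar: float,
                    t: float = 1e-7) -> Tuple[float, float]:
    """Slopes s with s*h <= f'(xbar; h) for all h: the interval [f'_-(xbar), f'_+(xbar)]."""
    return -dir_derivative(f, xbar, -1.0, t), dir_derivative(f, xbar, 1.0, t)

def is_weak_minimizer(f: Callable[[float], float], xbar: float,
                      S: Iterable[float], tol: float = 1e-6) -> bool:
    """Theorem 7 in this special case: f'(xbar; x - xbar) >= 0 for all sampled x."""
    return all(dir_derivative(f, xbar, x - xbar) >= -tol for x in S)

S = [i / 10 for i in range(-20, 21)]   # finite sample of S = [-2, 2]
print(subdifferential(abs, 0.0))                                # (-1.0, 1.0)
print(is_weak_minimizer(abs, 0.0, S), is_weak_minimizer(abs, 0.5, S))  # True False
```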


A corresponding result can be shown for the con- 
strained case where G describes an inequality constraint 
as in problem (SVOP). This optimality condition ex- 
tends the Lagrange multiplier rule to the set-valued case. 


Theorem 8 [11] In addition to the assumptions, let X and Y be real normed spaces, let S be a convex set, let $\operatorname{int}(C_Y) \neq \emptyset$, let F be $C_Y$-convex, let G be $C_Z$-convex, let $\bar x \in \hat S$ and $\bar y \in F(\bar x)$ be given, and let the contingent epiderivative of (F, G) at $(\bar x, (\bar y, \bar z))$ exist for an arbitrary $\bar z \in G(\bar x) \cap (-C_Z)$. Moreover, let the regularity assumption

$$\{z : (y, z) \in D(F, G)(\bar x, (\bar y, \bar z))(\operatorname{cone}(S - \{\bar x\}))\} + \operatorname{cone}(C_Z + \{\bar z\}) = Z$$

be satisfied (cone(·) denotes the cone generated by a set [13]). Then $(\bar x, \bar y)$ is a weak minimizer of problem (SVOP) if and only if there are continuous linear functionals $t \in C_{Y^*} \setminus \{0_{Y^*}\}$ and $u \in C_{Z^*}$ with

$$t(y) + u(z) \ge 0 \quad \text{for all } (y, z) = D(F, G)(\bar x, (\bar y, \bar z))(x - \bar x) \text{ with } x \in S$$

and

$$u(\bar z) = 0.$$


General results on set-valued analysis can be found in [1], while [12] and [18] present the theoretical background of vector optimization.


See also 


> Generalized Monotone Multivalued Maps 
> Generalized Monotone Single Valued Maps 

Shape optimization is a part of the larger field called
structural optimization (cf. » Structural optimization; » Structural optimization: History). Structural optimization can be characterized as an applied branch of optimal control theory in which the control variable is related to the geometry of structures. It can be divided into three parts:

i) sizing optimization (optimization of a typical size, e.g. the thickness of beams, plates, etc.);

ii) shape optimization (optimization of the shape of structures, keeping the topology of an initial design);

iii) topology optimization (which makes it possible to change the topology of an initial design).

An abstract formulation of a large class of optimal 
shape design problems reads as follows: 


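$$
\text{(P)}\qquad
\begin{aligned}
&\text{Find } \Omega^* \in \mathcal{O}\\
&\text{s.t. } I(\Omega^*, u(\Omega^*)) \le I(\Omega, u(\Omega)) \quad \forall\, \Omega \in \mathcal{O}.
\end{aligned}
$$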


Here $\mathcal{O}$ is a family of admissible domains, $u(\Omega)$ is a solution of a state problem $(P(\Omega))$ describing the behavior of a structure represented by a domain $\Omega$, and I is a cost functional. The state problem is typically given by partial differential equations (PDEs) or by variational inequalities (VIs). The mathematical analysis of (P) includes:

j) the study of existence of solutions to (P);

jj) discretization of (P) and the convergence analysis;

jjj) sensitivity analysis.

In order to guarantee the existence of solutions to (P) we make several assumptions (note that, in general, the solution need not be unique). First we introduce the concept of convergence in $\mathcal{O}$. That is, if $\{\Omega_n\}$, $\Omega_n \in \mathcal{O}$, is a sequence, we specify the meaning of saying ‘$\Omega_n$ tends to $\Omega$’, written $\Omega_n \to \Omega$. With any $\Omega \in \mathcal{O}$, a Hilbert space $V(\Omega)$ of functions defined in $\Omega$ will be associated (a space of functions with finite energy). If $\Omega_n \to \Omega$ and $y_n \in V(\Omega_n)$, $y \in V(\Omega)$, then one has to define the convergence ‘$y_n \to y$’ (note that the domain of definition of the functions varies). Finally, let $u: \Omega \mapsto u(\Omega) \in V(\Omega)$, $\Omega \in \mathcal{O}$, be a state relation (PDE, VI, etc.) with $u(\Omega)$ being the solution of $(P(\Omega))$. Let $\mathcal{G} = \{(\Omega, u(\Omega)) : \Omega \in \mathcal{O}\}$ be its graph. We suppose that:

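1) $\mathcal{G}$ is compact in the following sense: if $\{\Omega_n\}$, $\Omega_n \in \mathcal{O}$, is an arbitrary sequence, then there exist a subsequence $\{(\Omega_{n_k}, u(\Omega_{n_k}))\}$ and an element $(\Omega, u(\Omega)) \in \mathcal{G}$ such that

$$\Omega_{n_k} \to \Omega, \quad u(\Omega_{n_k}) \to u(\Omega), \quad k \to \infty,$$

in the specified sense;

2) I is lower semicontinuous: if $\{\Omega_n\}$, $\{y_n\}$, where $\Omega_n \in \mathcal{O}$, $y_n \in V(\Omega_n)$, are arbitrary sequences such that $\Omega_n \to \Omega$, $y_n \to y$ with $\Omega \in \mathcal{O}$, $y \in V(\Omega)$, then

$$\liminf_{n \to \infty} I(\Omega_n, y_n) \ge I(\Omega, y).$$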


The following holds (see [4]):


Theorem 1 Let 1)-2) be satisfied. Then (P) has a solu- 
tion. 


Example 2 Let $\mathcal{O}$ be a family of bounded domains satisfying the uniform cone property (see [1]), and let $(P(\Omega))$ be given by the Neumann problem

$$
(P(\Omega))\qquad
\begin{cases}
-\Delta u + u = f & \text{in } \Omega,\\[2pt]
\dfrac{\partial u}{\partial \nu} = 0 & \text{on } \partial\Omega,
\end{cases}
$$

and $I(\Omega, y) = \int_\Omega |y - z_d|^2\,dx$, where $f, z_d \in L^2_{\mathrm{loc}}(\mathbb{R}^n)$ are given functions. We set $V(\Omega) = H^1(\Omega)$, i.e. $V(\Omega)$ is the standard Sobolev space of functions defined in $\Omega$ whose derivatives up to order one are square integrable in $\Omega$, i.e. elements of $L^2(\Omega)$ (see [7]). The weak formulation of $(P(\Omega))$ is given by:

$$
\begin{aligned}
&\text{Find } u(\Omega) \in H^1(\Omega)\\
&\text{s.t. } \int_\Omega \big(\operatorname{grad} u(\Omega) \cdot \operatorname{grad} \varphi + u(\Omega)\,\varphi\big)\,dx = \int_\Omega f \varphi\,dx \quad \forall\, \varphi \in H^1(\Omega).
\end{aligned}
$$

We define:
• $\Omega_n \to \Omega$ if and only if $\chi(\Omega_n) \to \chi(\Omega)$ in $L^2(\mathbb{R}^n)$;
• ‘$y_n \to y$’ if and only if $\tilde y_n \rightharpoonup \tilde y$ (weakly) in $H^1(\hat\Omega)$.

Here $\chi(\Omega)$ is the characteristic function of $\Omega$, $\hat\Omega$ is a domain containing all $\Omega \in \mathcal{O}$, and the symbol ‘~’ stands for the uniform extension of functions from their domain of definition to $\hat\Omega$ (see [1]). One can verify 1)-2) for the convergences introduced above (see [1,4,8]). Thus, the corresponding SO problem has a solution.


To define an approximation of (P) we replace $\mathcal{O}$ by a system $\{\mathcal{O}_h\}$, $h \to 0+$. Any $\mathcal{O}_h$ contains domains $\Omega_h$ whose shapes are described by a finite number of parameters. This number depends on the discretization parameter h (the boundary of $\Omega_h$ is piecewise linear, a spline function, etc.). In what follows we shall suppose that $\mathcal{O}_h \subset \mathcal{O}$ for any $h > 0$. With any $\Omega_h \in \mathcal{O}_h$, a finite-dimensional space $V_h(\Omega_h)$ will be associated (a finite element space, e.g.). The state problem $(P(\Omega))$ will be replaced by a suitable discretization $(P(\Omega_h))_h$ (by using the Ritz-Galerkin method, e.g.). Finally, the cost functional I may be approximated (by $I_h$) as well. The approximation of (P) reads as follows:

$$
(\mathrm{P})_h\qquad
\begin{aligned}
&\text{Find } \Omega_h^* \in \mathcal{O}_h\\
&\text{s.t. } I_h(\Omega_h^*, u_h(\Omega_h^*)) \le I_h(\Omega_h, u_h(\Omega_h)) \quad \forall\, \Omega_h \in \mathcal{O}_h.
\end{aligned}
$$

Here $u_h(\Omega_h)$ is a solution of $(P(\Omega_h))_h$. Problem $(P)_h$, expressed in algebraic form, leads in general to a nonlinear mathematical programming problem.
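The passage from (P) to the algebraic problem (P)_h can be mimicked in one space dimension. This is an illustrative analogue, not the two-dimensional setting of Example 4. The sketch makes the following assumptions:

- The domain is Ω_α = (0, α), with the shape parameter α ranging over a bounded interval.
- The weak form of −u'' + u = f with homogeneous Neumann conditions is discretized by piecewise linear finite elements.
- The mesh has a fixed number N of elements, so the nodes move with α, in the spirit of condition 8) of Example 4.
- The cost I_h(α) = ∫ |u_h − z_d|² dx is evaluated with the P1 mass matrix applied to the nodal interpolant of z_d.
- The discrete cost is minimized over a grid of α values.

The data were chosen so that the answer is known. z_d is the exact state for α = 1, namely u(x) = x + tanh(1/2) cosh x − sinh x for f(x) = x. The value of N and the grid search are further assumptions.

```python
import math
from typing import Callable, List, Tuple

def thomas(sub: List[float], diag: List[float], sup: List[float],
           rhs: List[float]) -> List[float]:
    """Solve a tridiagonal linear system (Thomas algorithm, no pivoting)."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    for i in range(n):
        denom = diag[i] - (sub[i - 1] * cp[i - 1] if i > 0 else 0.0)
        cp[i] = sup[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (sub[i - 1] * dp[i - 1] if i > 0 else 0.0)) / denom
    u = [0.0] * n
    for i in range(n - 1, -1, -1):
        u[i] = dp[i] - (cp[i] * u[i + 1] if i < n - 1 else 0.0)
    return u

def mass_apply(v: List[float], h: float) -> List[float]:
    """Multiply by the P1 mass matrix of a uniform mesh with element length h."""
    w = [0.0] * len(v)
    for e in range(len(v) - 1):
        w[e] += h / 6.0 * (2.0 * v[e] + v[e + 1])
        w[e + 1] += h / 6.0 * (v[e] + 2.0 * v[e + 1])
    return w

def solve_state(alpha: float, f: Callable[[float], float],
                N: int = 100) -> Tuple[List[float], List[float], float]:
    """P1 Galerkin solution of int(u'v' + uv) dx = int f v dx on (0, alpha)."""
    h = alpha / N
    x = [i * h for i in range(N + 1)]
    diag = [0.0] * (N + 1)
    for e in range(N):                       # assemble K + M element by element
        diag[e] += 1.0 / h + h / 3.0
        diag[e + 1] += 1.0 / h + h / 3.0
    off = [-1.0 / h + h / 6.0] * N
    b = mass_apply([f(xi) for xi in x], h)   # load M f_I, exact for piecewise linear f
    return x, thomas(off, diag, off, b), h

def cost(alpha: float, f, zd, N: int = 100) -> float:
    """Discrete cost I_h(alpha) = int_0^alpha |u_h - zd_I|^2 dx via the mass matrix."""
    x, u, h = solve_state(alpha, f, N)
    e = [ui - zd(xi) for ui, xi in zip(u, x)]
    return sum(ei * wi for ei, wi in zip(e, mass_apply(e, h)))

f = lambda x: x
zd = lambda x: x + math.tanh(0.5) * math.cosh(x) - math.sinh(x)  # exact state for alpha = 1
alphas = [0.5 + 0.05 * k for k in range(21)]
best = min(alphas, key=lambda a: cost(a, f, zd))
print(round(best, 6))  # 1.0
```

Even in this one-parameter toy, each cost evaluation requires a full state solve. This is what makes (P)_h a nonlinear program in the shape parameters. A realistic code would replace the grid search with a gradient-based method using shape sensitivities.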

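To establish a relation between solutions to (P) and $(P)_h$ when $h \to 0+$, the following assumptions are needed:

3) for any $\Omega \in \mathcal{O}$ there exists a sequence $\{\Omega_h\}$, $\Omega_h \in \mathcal{O}_h$, such that

$$\Omega_h \to \Omega, \quad h \to 0+;$$

4) for every sequence $\{(\Omega_h, u_h(\Omega_h))\}$, where $\Omega_h \in \mathcal{O}_h$ and $u_h(\Omega_h)$ solves $(P(\Omega_h))_h$, there exist a subsequence $\{(\Omega_{h_k}, u_{h_k}(\Omega_{h_k}))\}$ and an element $(\Omega, u(\Omega)) \in \mathcal{G}$ such that $\Omega_{h_k} \to \Omega$, ‘$u_{h_k}(\Omega_{h_k}) \to u(\Omega)$’, $k \to \infty$;

5) if $\Omega_h \to \Omega$ with $\Omega_h \in \mathcal{O}_h$, $\Omega \in \mathcal{O}$, and ‘$u_h(\Omega_h) \to u(\Omega)$’, then

$$\lim_{h \to 0+} I_h(\Omega_h, u_h(\Omega_h)) = I(\Omega, u(\Omega)).$$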


Then one can prove (see [4]):


Theorem 3 Let 3)-5) be satisfied, and suppose that for any $h > 0$ there exists an optimal pair $(\Omega_h^*, u_h(\Omega_h^*))$ of $(P)_h$. Then there exist a subsequence $\{(\Omega_{h_k}^*, u_{h_k}(\Omega_{h_k}^*))\}$ and an element $(\Omega^*, u(\Omega^*)) \in \mathcal{G}$ such that

6) $\Omega_{h_k}^* \to \Omega^*$, ‘$u_{h_k}(\Omega_{h_k}^*) \to u(\Omega^*)$’, $k \to \infty$,

and $(\Omega^*, u(\Omega^*))$ is an optimal pair for (P). Moreover, any such cluster point $(\Omega^*, u(\Omega^*))$ of a sequence $\{(\Omega_h^*, u_h(\Omega_h^*))\}$ in the sense of 6) is an optimal pair for (P).


Example 4 We describe the approximation of (P) from Example 2, considered in $\mathbb{R}^2$. For any $h > 0$, the family $\mathcal{O}_h$ contains polygonal domains, being piecewise linear approximations of $\Omega \in \mathcal{O}$, such that

$$\partial\Omega_h = \bigcup_{i=1}^{n(h)} A_i A_{i+1} \qquad (A_{n(h)+1} = A_1),$$

with length $|A_i A_{i+1}| \le h$ for any side. Let $\{\mathcal{T}(h, \Omega_h)\}$, $h \to 0+$, be a family of triangulations of $\Omega_h$ (see [2]) satisfying:

7) any $A_i A_{i+1}$ is a side of just one boundary triangle $T \in \mathcal{T}(h, \Omega_h)$;

8) the number of nodes in $\mathcal{T}(h, \Omega_h)$ is the same for any $\Omega_h \in \mathcal{O}_h$ (h being fixed), and the nodes always have the same neighbors;

9) the position of the internal nodes of $\mathcal{T}(h, \Omega_h)$ depends continuously on variations of the boundary nodes $A_i$, $i = 1, \dots, n(h)$;

10) the family $\{\mathcal{T}(h, \Omega_h)\}$ satisfies the uniform angle condition:

$$\exists\, \theta_0 > 0 \text{ such that } \theta_T \ge \theta_0$$

holds for any triangle $T \in \mathcal{T}(h, \Omega_h)$, for any $\Omega_h \in \mathcal{O}_h$ and any $h > 0$, where $\theta_T$ is the minimal interior angle of T.
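Two requirements of Example 4 are easy to check for a given mesh: the uniform angle condition 10) and the side-length bound |A_iA_{i+1}| ≤ h on ∂Ω_h. The minimal sketch below takes triangles as coordinate triples. The function names and the threshold θ0 are illustrative assumptions, and the code says nothing about the connectivity conditions 7)-9).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def min_angle(a: Point, b: Point, c: Point) -> float:
    """Smallest interior angle (radians) of triangle abc, via the law of cosines."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    angles = []
    for opp, s1, s2 in ((la, lb, lc), (lb, la, lc), (lc, la, lb)):
        cosv = (s1 * s1 + s2 * s2 - opp * opp) / (2 * s1 * s2)
        angles.append(math.acos(max(-1.0, min(1.0, cosv))))  # clamp rounding noise
    return min(angles)

def satisfies_angle_condition(triangles: Sequence[Tuple[Point, Point, Point]],
                              theta0: float) -> bool:
    """Condition 10) for one triangulation: theta_T >= theta0 for every T."""
    return all(min_angle(*T) >= theta0 for T in triangles)

def boundary_sides_ok(polygon: List[Point], h: float) -> bool:
    """|A_i A_{i+1}| <= h for all sides of the closed polygon (A_{n+1} = A_1)."""
    n = len(polygon)
    return all(math.dist(polygon[i], polygon[(i + 1) % n]) <= h for i in range(n))

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tris = [((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)), ((0.0, 0.0), (1.0, 1.0), (0.0, 1.0))]
print(satisfies_angle_condition(tris, math.pi / 6), boundary_sides_ok(square, 1.0))  # True True
```

For a family of meshes, the same check must hold with one fixed θ0 for every h. This is what rules out meshes whose triangles degenerate into slivers as h → 0+.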
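With any such $\mathcal{T}(h, \Omega_h)$, the space $V_h(\Omega_h)$ of all piecewise linear functions will be associated (see [2]). Finally, $(P(\Omega))$ is replaced by:

$$
(P(\Omega_h))_h\qquad
\begin{aligned}
&\text{Find } u_h(\Omega_h) \in V_h(\Omega_h)\\
&\text{s.t. } \int_{\Omega_h} \big(\operatorname{grad} u_h(\Omega_h) \cdot \operatorname{grad} \varphi_h + u_h(\Omega_h)\,\varphi_h\big)\,dx = \int_{\Omega_h} f \varphi_h\,dx \quad \forall\, \varphi_h \in V_h(\Omega_h).
\end{aligned}
$$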
It can be shown (see [4]) that 3)-5) are satisfied, provided that 7)-10) hold. Thus, by Theorem 3, (P) and $(P)_h$ are close on subsequences.


Shape sensitivity analysis is a specific field of SO that analyzes the differentiability of the solution of state problems with respect to shape variations. There are several concepts of shape differential calculus: the method of mappings [6], the material derivative approach [9] and the boundary variation technique [8]. Higher order derivatives in SO and their application are studied in [3,5]. By contrast, if the state problem is given by a VI, the differentiability of the mapping $\Omega \mapsto u(\Omega)$ is weakened, because such a mapping is in general only Lipschitz continuous (see [9]). Thus the resulting minimization problem is, in general, nonsmooth. Such problems can be solved either by methods of nonsmooth optimization or by regularizing the state problem (see [4]). The use in SO of global optimization methods based only on function evaluations, combined with the so-called fictitious domain methods, is studied in [4].


See also 


> Topological Derivative in Shape Optimization 

Introduction


Uncertainties in chemical plants appear for a variety of 
reasons. There are internal reasons, such as fluctuations in the values of reaction constants and physical properties, and external reasons, such as the quality and flow rates of feed streams. The need to account for uncertainty in
various stages of plant operations has been identified as 
one of the most important problems in chemical plant 
design and operation [7,8,18]. 

There are two main problems associated with the 
consideration of uncertainty in decision making: the 
quantification of the feasibility and flexibility of a pro- 
cess design and the incorporation of uncertainty within 
a decision stage. The quantification of process feasibil- 
ity is most commonly addressed by utilizing the feasi- 
bility function introduced by Swaney and Grossmann, 
which requires constraint satisfaction over a specified 
uncertainty space, whereas flexibility evaluation is asso- 
ciated with a quantitative measure of the feasible space. 
Halemane and Grossmann [10] proposed a feasibility 
measure for a given design based on the worst points 
for feasible operation, which can be mathematically for- 
mulated as a max-min-max optimization problem: 


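$$
\psi(d, \theta) = \min_{z} \max_{j \in J} f_j(d, z, \theta), \qquad
\chi(d) = \max_{\theta \in T} \psi(d, \theta), \qquad (1)
$$

where $f_j$, $j \in J$, are the process constraints, z are the control variables, and T is the feasible space of $\theta$, described as $T = \{\theta \mid \theta^L \le \theta \le \theta^U\}$, with $\theta^L$ and $\theta^U$ the lower and upper bounds, respectively.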

The general formulation for quantifying flexibility, known as the flexibility index problem, can be defined as the determination of the maximum deviation $\delta$ that a given design d can tolerate such that every point $\theta$ in the uncertain parameter space $T(\delta)$ is feasible [1]. A well-studied case is the hyperrectangle representation $T(\delta) = \{\theta \mid \theta^N - \delta\,\Delta\theta^- \le \theta \le \theta^N + \delta\,\Delta\theta^+\}$. Here $\theta^N$ is the nominal value of $\theta$, $\Delta\theta^+$ and $\Delta\theta^-$ are the expected deviations of the uncertain parameters in the positive and negative directions, and $\delta$ is the deviation along a specified direction. Other descriptions of $T(\delta)$, such as the parametric hyperellipsoid, have also been investigated [15].

The flexibility index can be determined from the 
formulation proposed by Swaney and Grossmann 
[18] as: 


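$$
\begin{aligned}
F = {} & \max\ \delta\\
\text{subject to } & \max_{\theta \in T(\delta)} \psi(d, \theta) \le 0,\\
& \delta \ge 0.
\end{aligned}
\qquad (2)
$$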


One approach to determining the flexibility index is by 
vertex enumeration, in which the maximum displace- 
ment is computed along each vertex direction. This 
scheme is based on the assumption that the critical points ($\theta^c$) lie at the vertices of $T(\delta)$, which holds only under certain convexity conditions. Other existing approaches to quantifying flexibility involve deterministic measures, such as the resilience index (RI) proposed by Saboo et al. [16], and stochastic measures. Examples of the latter are the design reliability proposed by Kubic and Stein [12] and the stochastic flexibility index proposed by Pistikopoulos and Mazzuchi [14] and Straub and Grossmann [17]. Recently Ierapetritou and coworkers [6] introduced a new approach to quantifying process feasibility based on the description of the feasible region
by an approximation of the convex hull. Their ap- 
proach results in an accurate representation of process 
feasibility. 
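Vertex enumeration is exact in the simplest case, where the constraints f_j(θ) = G_j + A_j·θ ≤ 0 are linear in θ and there are no control variables z, so that ψ(θ) = max_j f_j(θ). The feasible region is then convex, so the box T(δ) is feasible exactly when all its vertices are. Along each vertex direction D the largest admissible δ is min_j (−f_j(θ^N))/(A_j·D) over the constraints that tighten along D. F is the minimum over all vertex directions. The same vertex argument gives χ of (1) for a fixed box T. The linear model, the data and the function names are hypothetical simplifications for illustration.

```python
from itertools import product
from typing import List, Sequence

def f_values(G: Sequence[float], A: Sequence[Sequence[float]],
             theta: Sequence[float]) -> List[float]:
    """f_j(theta) = G[j] + A[j] . theta for every constraint j."""
    return [g + sum(a * t for a, t in zip(row, theta)) for g, row in zip(G, A)]

def feasibility_measure(G, A, lower, upper) -> float:
    """chi of (1) without control variables: psi = max_j f_j is convex in theta,
    so its maximum over the box T is attained at a vertex."""
    return max(max(f_values(G, A, v)) for v in product(*zip(lower, upper)))

def flexibility_index(G, A, theta_n, dplus, dminus) -> float:
    """F of (2) for the hyperrectangle T(delta), by vertex enumeration."""
    f_nom = f_values(G, A, theta_n)
    if max(f_nom) > 0:
        raise ValueError("nominal point is infeasible")
    best = float("inf")
    for signs in product((1, -1), repeat=len(theta_n)):
        # Vertex direction: +dplus_i or -dminus_i in each coordinate.
        D = [dp if s > 0 else -dm for s, dp, dm in zip(signs, dplus, dminus)]
        delta = float("inf")
        for fj, row in zip(f_nom, A):
            slope = sum(a * di for a, di in zip(row, D))
            if slope > 0:                    # constraint j tightens along D
                delta = min(delta, -fj / slope)
        best = min(best, delta)
    return best

# Hypothetical design: theta1 + theta2 <= 4, theta1 >= 0, theta2 <= 3.
G = [-4.0, 0.0, -3.0]
A = [[1.0, 1.0], [-1.0, 0.0], [0.0, 1.0]]
print(flexibility_index(G, A, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]))  # 1.0
print(feasibility_measure(G, A, [0.0, 0.0], [2.0, 2.0]))            # 0.0
```

The two results are consistent. T(1) is the box [0, 2]², on which χ = 0, i.e. the design is just feasible. With control variables or nonconvex constraints, the critical points need not be vertices, which is the limitation noted in the text.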

However, the convex hull approach is applicable only to convex and 1-D quasiconvex feasible regions, and its performance deteriorates in the presence of nonconvex constraints. This shortcoming can be overcome by using surface reconstruction ideas to capture the accurate shape of the feasible region.


Definition 

The surface reconstruction problem is, given a set of range points, to reconstruct a manifold that closely approximates the surface of the original model. The range data are a set of discrete points in three-dimensional space sampled from the physical environment, for example by laser scanners that generate data points on the surface of an object. The problem arises naturally in a variety of practical situations, such as range scanning an object from multiple viewpoints, recovery of biological shapes from two-dimensional slices, and interactive surface sketching. Surface reconstruction has extensive applications in automatic mesh generation and geometric modeling, molecular structure, and protein folding analysis.

The problem of feasibility analysis is analogous to 
that of surface reconstruction since the main effort of 
feasibility analysis lies in identifying and accurately es- 
timating the boundary of the feasible region. In previ- 
ous approaches this boundary is approximated by lin- 
ear inequalities, either by incorporating a hyperrect- 
angle [18] or by describing an approximation of the 
convex hull [6] inside the feasible space. These methods can perform satisfactorily for convex, connected feasible regions but are inaccurate for nonconvex or disjoint feasible regions. The surface reconstruction scheme, on the other hand, can successfully describe both nonconvex and disjoint regions by defining the bounding surface through piecewise linear functions. The present work proposes a feasibility analysis scheme based on surface reconstruction ideas, in particular the α-shape methodology for surface reconstruction.


α-Shape Approach


Various approaches are described in the literature for 
determining the shape of a pattern class from sampled 
points. Many of these approaches are concerned with 
efficient construction of convex hulls for a set of points 
in a plane. Jarvis [11] was one of the first to consider the 
problem of computing shape as a generalization of the 
convex hull of a planar point set. 
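Since the α shape for α = ∞ is the convex hull of S, a planar hull routine is the natural baseline. The sketch uses Andrew's monotone chain algorithm, a standard O(n log n) method chosen here for brevity. It is not necessarily the construction used by Jarvis [11].

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:                       # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):             # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # drop duplicated chain endpoints

pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
print(convex_hull(pts))  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```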
A mathematically rigorous definition of shape was later introduced by Edelsbrunner et al. [3] as a natural generalization of convex hulls, referred to as the α hull. The α hull of a point set is based on the notion of generalized discs in a plane. The family of α hulls includes the smallest enclosing circle, the set itself, and an essentially continuous set of enclosing regions between these two extremes.

Edelsbrunner et al. [3] also define a combinatorial variant of the α hull, called the α shape of a planar set, which can be viewed as the boundary of the α hull with curved edges replaced by straight ones. Conceptually, α shapes are a generalization of the convex hull of a point set S, with α varying from 0 to ∞. The α shape of S is a polytope that is neither necessarily convex nor necessarily connected. For α = ∞, the α shape is identical to the convex hull of S. As α decreases, however, the α shape shrinks by gradually developing cavities. When α becomes small enough, the polytope disappears and reduces to the data set itself.

To provide an intuitive notion of the concept, Edelsbrunner describes the space $\mathbb{R}^3$ as filled with styrofoam and the point set S as made of a more solid material, such as rock. Now, if a spherical eraser with radius α carves out the styrofoam at all positions where it does not enclose any of the sprinkled rocks (the point set S), the resulting object is called an α hull. The surface of this object can be straightened by substituting straight edges for circular ones and triangles for spherical caps. The resulting object is the α shape of S. It is a polytope in a fairly general sense: it can be concave and even disconnected; it can contain two-dimensional patches of triangles and one-dimensional strings of edges, and its components can be as small as single points. The parameter α controls the level of detail captured by the α shape.
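The eraser picture translates directly into a test on pairs of points. The sketch below is a minimal, brute-force illustration for the planar case, written under our own assumptions. It follows this article's convention, where the eraser radius is α (Edelsbrunner et al. parametrize by 1/α). An edge between two sample points is treated as part of the α-shape boundary when some disc of radius α passes through both points with no other sample point strictly inside it. The function name `alpha_shape_edges` is hypothetical. The test costs O(n³) and is meant only for small examples; practical α-shape codes filter a Delaunay triangulation instead.

```python
import math
from itertools import combinations

def alpha_shape_edges(points, alpha):
    """Boundary edges (i, j) of the planar alpha shape of `points`.

    Convention (as in the text): the eraser is a disc of radius `alpha`.
    Edge (p, q) is kept when some disc of radius alpha has p and q on its
    boundary and no other sample point strictly inside it.
    Brute force, O(n^3): for small illustrative point sets only.
    """
    eps = 1e-12  # tolerance so points lying on the disc boundary do not count
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (px, py), (qx, qy) = points[i], points[j]
        dx, dy = qx - px, qy - py
        d = math.hypot(dx, dy)
        if d == 0 or d > 2 * alpha:
            continue  # no disc of radius alpha passes through both points
        h = math.sqrt(alpha * alpha - (d / 2) ** 2)  # midpoint-to-centre distance
        mx, my = (px + qx) / 2, (py + qy) / 2
        nx, ny = -dy / d, dx / d  # unit normal to the segment pq
        for sign in (1, -1):      # the two candidate disc centres
            cx, cy = mx + sign * h * nx, my + sign * h * ny
            if all(math.hypot(x - cx, y - cy) >= alpha - eps
                   for k, (x, y) in enumerate(points) if k not in (i, j)):
                edges.append((i, j))
                break
    return edges
```

For the four corners of a unit square, α = 10 returns the four sides (the convex hull). α = 0.4 is smaller than half the side length, so it returns no edges at all, and the shape has shrunk to the points themselves.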

It is possible to generalize all the concepts involved in the construction of α shapes (i.e., α hulls, α complexes, Delaunay triangulations, Voronoi diagrams) to a finite set of points S in $\mathbb{R}^d$ for arbitrary dimension d. This generalization, combined with an extension to weighted points, is developed in Edelsbrunner [2]. However, the implementation details become progressively more complex with increasing dimension, and the worst-case complexity of the problem grows exponentially with d.


Selection of α


The computed α shape of a given set of sample points depends explicitly on the chosen value of α, which controls the level of detail of the constructed surface. Mandal et al. [13] present a systematic methodology for selecting the value of α in $\mathbb{R}^2$. They treat the problem of obtaining the shape of S as a set-estimation problem, in which an unknown set $A \subset \mathbb{R}^2$ is to be estimated from a finite number of points $X_1, X_2, \ldots, X_n \in A$. As n increases, S(n) covers more and more of A, so the value of α used for S(n) should depend on the sample size n; that is, α is a function of n. In addition, α should depend on the interpoint distances of the n points sampled in S(n). To account for this dependence, the authors construct the minimum spanning tree (MST) of the sampled data points. Let $l_n$ denote the sum of the edge weights of the MST, where the weight of an edge is the Euclidean distance between its end points. The appropriate value of α for constructing the α shape is then given by

$$ \alpha = \frac{l_n}{n}, \qquad (3) $$

where n is the total number of sample points.
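Eq. (3), as reconstructed here, only needs the total MST length. The sketch below assumes Euclidean edge weights, as stated in the text, and builds the MST with Prim's algorithm. That algorithm is our choice; the source does not say how the MST is built. The names `mst_length` and `estimate_alpha` are illustrative.

```python
import math

def mst_length(points):
    """Total Euclidean length l_n of the minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each point to the tree
    best[0] = 0.0          # grow the tree from the first point
    total = 0.0
    for _ in range(n):
        # Add the outside point with the cheapest connection to the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Update connection costs of the remaining points via the new vertex u
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total

def estimate_alpha(points):
    """Eq. (3): alpha = l_n / n."""
    return mst_length(points) / len(points)
```

Four collinear points spaced one unit apart give $l_n = 3$ and hence α = 0.75.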

To illustrate how well the α shape captures the shape of an object, a disjoint, nonconvex object is chosen, as illustrated in Fig. 1. The sampled points represent a 2-D object and form the input to the α-shape construction code. The α shape identifies the input points that lie on the boundary of the object, and these points are joined by line segments to describe its surface. Fig. 1 also illustrates how the captured shape depends on the chosen value of α. The α value estimated from the MST is 120, and at this value the α shape captures both the nonconvex and the disjoint nature of the object. As α is increased further, the performance of the α shape deteriorates, and at very large α it becomes the convex hull of the object (Fig. 1b). Hence the level of detail captured by the α shape depends strongly on the chosen value of α. Decreasing α from large values captures the shape progressively more accurately, down to the point where the shape starts to break up into the individual data points.
Shape Reconstruction Methods for Nonconvex Feasibility Analysis, Figure 1

Performance of α shape for different α values: (a) 120 (b) 100,000


Formulation
Feasibility Analysis Using α Shape


The overall aim of feasibility analysis is to determine the range of parameters over which a particular process is feasible. Formally, the problem is to obtain a mathematical description of the region in parameter space bounded by the process constraints. This region can be considered analogous to an object whose shape or surface can be estimated using the α-shape technique. The input to any surface-reconstruction algorithm must be a set of points representing the object whose surface is to be determined. The steps involved in determining the feasible region using surface-reconstruction ideas are as follows:
• Generate sample data points that adequately represent the feasible region under consideration.
• Construct the α shape of the sampled data, using the α estimate obtained from the MST of the data set.
• Join the identified boundary points to obtain a polygonal representation of the feasible region.

Shape Reconstruction Methods for Nonconvex Feasibility Analysis, Figure 2

Point-in-polygon test: an odd number of intersections means the point is inside; an even number of intersections means the point is outside
Having defined the surface or shape of the feasible region, the next step is to determine whether a particular point belongs to it. Since the feasible region has been approximated by a polygon, a simple way to check whether a point is inside is to use one of the point-in-polygon tests [9]. One such test follows from the Jordan curve theorem: a point is inside a polygon if, for any semi-infinite ray from this point, the ray intersects the polygon's edges an odd number of times (Fig. 2). Conversely, a point is outside the polygon if the ray intersects the polygon's edges an even number of times or not at all. Accordingly, whenever a parameter point needs to be checked for feasibility against the polygonal estimate of the feasible region, a semi-infinite ray is drawn from the point in any direction. The parity of the number of intersections then determines whether or not the point is feasible.
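The parity rule can be sketched as follows. This is a minimal version of the standard even-odd (ray-casting) test. It assumes the polygon is given as an ordered list of vertices and casts the ray in the +x direction (any direction would do). Points lying exactly on an edge may be classified either way. The function name `point_in_polygon` is ours.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: cast a ray from (x, y) in the +x direction and count
    how many polygon edges it crosses; an odd count means the point is inside.

    polygon -- list of (x, y) vertices in order (the last edge closes the loop)
    """
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Consider only edges that straddle the horizontal line through y;
        # the half-open comparison avoids counting a shared vertex twice.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:  # crossing lies on the ray, to the right of the point
                inside = not inside
    return inside
```

For a disjoint feasible region, each polygonal component can be tested in turn, and the point is declared feasible if it lies inside any of them.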


Sampling Technique 


The first step in the proposed approach is to obtain a good representation of the feasible region. Most common sampling techniques sample the parameter space according to the distribution of the uncertain parameters. For simplicity of presentation, this distribution is taken to be uniform in the cases considered here, which leads to uniform sampling of the entire parameter space, irrespective of whether or not the sampled points are feasible. Typically, however, the feasible region covers only a small part of the parameter space. Sampling techniques that cover the entire range of the uncertain parameters therefore prove inefficient, particularly when evaluating the process constraints is expensive. A new sampling technique is thus introduced here that takes advantage of the fact that typically only a small section of the parameter space is feasible. The sampling problem is formulated as an optimization problem and solved using a genetic algorithm (GA). A GA proves very efficient for this problem because its search inherently concentrates around regions containing good solutions, which here are the feasible points. This reduces the number of expensive function evaluations.

The formulation of the sampling problem as an op- 
timization problem is given by 


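$$
\begin{aligned}
\max_{\theta}\;& V_{\mathrm{feas}}\\
\text{subject to}\;& f_1(\theta) \le 0\\
& f_2(\theta) \le 0\\
& \quad\vdots\\
& f_m(\theta) \le 0,
\end{aligned}
\qquad (4)
$$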


where $V_{\mathrm{feas}}$ is the volume of the feasible region, evaluated by constructing the α shape of the sampled feasible points. The optimization variables are the parameter values θ, which are sampled by the GA to optimize the objective, and $f_1, f_2, \ldots, f_m$ are the constraints of the feasibility problem evaluated at θ. Note, however, that this formulation has no single optimal value of θ that maximizes the volume. What matters is the entire set of feasible θ values sampled, over which the α shape is constructed to evaluate the volume. Since the objective is to maximize the volume of the feasible space, the volume is reevaluated to update the objective function whenever a sampled θ satisfies the constraint functions. When the point is infeasible, there is no need to reevaluate the volume, since it will not change; instead, the fitness function is penalized by assigning it a small value. Solving this problem with a GA reduces the required number of function evaluations by avoiding unnecessary evaluations in the infeasible part of the parameter space.
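The fitness logic of formulation (4) can be sketched as below. Everything here is illustrative: the name `penalized_fitness` and the default penalty value are hypothetical. The `volume_of` callable is also an assumption; in the article's scheme it would construct the α shape of the feasible points and return its volume. This is not the authors' code.

```python
def penalized_fitness(theta, constraints, feasible_points, volume_of, penalty=1e-6):
    """GA fitness for formulation (4): volume if feasible, small penalty otherwise.

    theta           -- parameter vector proposed by the GA
    constraints     -- callables f_k; theta is feasible when every f_k(theta) <= 0
    feasible_points -- running list of feasible samples (updated in place)
    volume_of       -- callable returning V_feas for a list of points
                       (e.g. the area of their alpha shape; assumed, not given here)
    penalty         -- small fitness assigned to infeasible points (hypothetical value)
    """
    if all(f(theta) <= 0 for f in constraints):
        feasible_points.append(theta)
        return volume_of(feasible_points)  # volume can only change when a point is added
    return penalty  # infeasible: skip the (expensive) volume evaluation
```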


Cases 


The idea of using surface reconstruction for the estima- 
tion of a feasible region is illustrated by a few case stud- 
ies. 

The feasible region is defined by the following sets 
of convex and nonconvex constraints: 


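$$ f_1 = \theta_2 - 2\theta_1 - 15 \le 0, \qquad (5) $$
$$ f_2 = \frac{\theta_1^2}{2} + 4\theta_1 - \theta_2 - 5 \le 0, \qquad (6) $$
$$ f_3 = \theta_2(6 + \theta_1) - 80 \le 0, \qquad (7) $$
$$ f_4 = 10 - \frac{(\theta_1 - 4)^2}{5} - \frac{\theta_2^2}{0.5} \le 0. \qquad (8) $$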


Figure 3 illustrates the actual shape of the feasible region bounded by inequalities (5)-(8), together with the convex hull approximation of the enclosed feasible region.

Shape Reconstruction Methods for Nonconvex Feasibility Analysis, Figure 3

Feasible region bounded by convex and nonconvex constraints and its estimation using convex hull

The first step in the proposed scheme is to sample the feasible space efficiently. The optimization problem for sampling is given by

$$
\begin{aligned}
\max_{\theta_1,\theta_2}\;& V_{\mathrm{feas}}\\
\text{subject to}\;& \theta_2 - 2\theta_1 - 15 \le 0,\\
& \frac{\theta_1^2}{2} + 4\theta_1 - \theta_2 - 5 \le 0,\\
& \theta_2(6 + \theta_1) - 80 \le 0,\\
& 10 - \frac{(\theta_1 - 4)^2}{5} - \frac{\theta_2^2}{0.5} \le 0.
\end{aligned}
\qquad (9)
$$


Both uncertain parameters are considered to vary within the range (−20, 20). To solve the problem using a GA, the parameters $\theta_1$ and $\theta_2$ are encoded as bits, with 7 bits for each parameter, giving rise to a 14-bit chromosome. A population size of 20 is chosen for this problem following the guideline of Edwards et al. [4].

Shape Reconstruction Methods for Nonconvex Feasibility Analysis, Figure 4

Sampling of feasible space using (a) genetic algorithm and (b) random sample
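The article does not give the decoding rule. The sketch below therefore assumes the common choice of plain binary genes mapped linearly onto the parameter range; the function name `decode` is hypothetical. With 7 bits per parameter, each of $\theta_1$ and $\theta_2$ takes one of $2^7 = 128$ equally spaced values in [−20, 20], end points included, for a spacing of 40/127 ≈ 0.31.

```python
def decode(chromosome, n_params=2, bits=7, lo=-20.0, hi=20.0):
    """Decode a binary GA chromosome into real parameter values.

    Assumes plain (non-Gray) binary genes mapped linearly onto [lo, hi];
    with 7 bits per parameter there are 2**7 = 128 levels per parameter.
    """
    if len(chromosome) != n_params * bits or set(chromosome) - {"0", "1"}:
        raise ValueError("chromosome must be a bit string of length n_params * bits")
    steps = 2 ** bits - 1  # 127 intervals between lo and hi
    values = []
    for p in range(n_params):
        gene = chromosome[p * bits:(p + 1) * bits]
        k = int(gene, 2)  # unsigned integer value of this gene
        values.append(lo + (hi - lo) * k / steps)
    return tuple(values)
```

Under this encoding the 14-bit chromosome admits $2^{14} = 16{,}384$ distinct parameter pairs.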

The working principle of a GA is based on generating many good solutions, and the same solutions tend to recur. Evaluating the volume for every feasible parameter value θ would therefore reduce the efficiency of the procedure because of these repetitions. To avoid this, a memory of the sampled parameter values is maintained and updated. For every chromosome generated in the GA population, the stored parameter values are searched to check whether the new solution is unique. If it is unique, the constraints are evaluated; otherwise its result is retrieved from memory. Evolving the chromosomes through 2000 generations requires a total of 40,000 function calls (20 individuals × 2000 generations). Of these, only 3064 are unique and 938 are feasible points, as illustrated in Fig. 4a. The same problem was also solved by drawing random samples in the range (−20, 20) for both uncertain parameters (Fig. 4b), which required 9830 function calls to generate 950 feasible points. In other words, the GA needed roughly one third as many unique constraint evaluations for a comparable number of feasible points. The GA-based procedure is particularly advantageous when the feasible region is a small portion of the entire parameter range; otherwise its performance becomes comparable to that of random sampling over the entire parameter range.
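The memory of sampled parameter values can be sketched as a dictionary keyed by chromosome, so that a repeated individual costs a lookup rather than a model evaluation. The class name and the `decode`/`constraints` callables are illustrative assumptions. The two counters mirror the distinction the text draws between function calls (40,000) and unique evaluations (3064).

```python
class MemoizedFeasibility:
    """Memory of already-evaluated chromosomes, as described in the text.

    decode      -- maps a chromosome (e.g. a bit string) to a parameter tuple
    constraints -- maps a parameter tuple to the values f_k(theta)
    """

    def __init__(self, decode, constraints):
        self.decode = decode
        self.constraints = constraints
        self.memory = {}   # chromosome -> feasibility flag
        self.calls = 0     # function calls requested by the GA
        self.unique = 0    # constraint evaluations actually performed

    def __call__(self, chromosome):
        self.calls += 1
        if chromosome not in self.memory:
            # New solution: evaluate the (possibly expensive) process model once
            self.unique += 1
            theta = self.decode(chromosome)
            self.memory[chromosome] = all(f <= 0 for f in self.constraints(theta))
        return self.memory[chromosome]
```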

In the above formulation, the volume of the feasible region, $V_{\mathrm{feas}}$, is computed by generating an α shape of the sampled points. An alternative formulation for generating the sampled data set is given by


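$$
\begin{aligned}
\max\;& \sum \theta_{\mathrm{feas}}\\
\text{subject to}\;& \theta_2 - 2\theta_1 - 15 \le 0,\\
& \frac{\theta_1^2}{2} + 4\theta_1 - \theta_2 - 5 \le 0,\\
& 10 - \frac{(\theta_1 - 4)^2}{5} - \frac{\theta_2^2}{0.5} \le 0,\\
& \theta_2(6 + \theta_1) - 80 \le 0,
\end{aligned}
\qquad (10)
$$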


where $\theta_{\mathrm{feas}}$ represents a feasible sample point and the objective is to maximize the total number of sampled feasible points. This formulation is computationally less demanding, since it does not require evaluating the volume of the feasible region at every step. However, it lacks a convergence criterion. To overcome this problem, a hybrid of formulations (9) and (10) is used. The main algorithm evolves according to formulation (10), and the volume is evaluated only at intermediate points to check for convergence of the simulation. The overall procedure is illustrated in Fig. 5.

Shape Reconstruction Methods for Nonconvex Feasibility Analysis, Figure 5

Modified algorithm for sampling of feasible region (flowchart: start GA simulation; solve problem (10); evaluate α; construct the α shape; calculate volume of feasible region; check for convergence, then stop or restart GA)
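The loop of Fig. 5 can be sketched as below, under stated assumptions. `run_ga` stands for one GA simulation of problem (10) that returns newly found feasible points, and `estimate_volume` stands for the α-shape volume computation. Convergence is declared when the relative change in volume between successive checks falls below `tol`. All three names and the relative-change criterion are ours; the article does not specify the convergence test.

```python
def hybrid_sampling(run_ga, estimate_volume, tol=0.01, max_rounds=50):
    """Hybrid of formulations (9) and (10), following the flow of Fig. 5.

    run_ga          -- runs one GA simulation of problem (10) and returns the
                       new feasible points it found (hypothetical callable)
    estimate_volume -- returns V_feas for a list of points, e.g. the area of
                       their alpha shape (hypothetical callable)
    tol             -- relative change in volume regarded as converged
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feasible, previous, volume = [], None, None
    for rnd in range(1, max_rounds + 1):
        feasible.extend(run_ga())            # cheap step: maximize feasible count
        volume = estimate_volume(feasible)   # occasional volume evaluation
        if previous is not None and abs(volume - previous) <= tol * abs(previous):
            return feasible, volume, rnd     # converged: stop
        previous = volume                    # otherwise restart the GA
    return feasible, volume, max_rounds
```

Evaluating the volume once per GA run, rather than once per feasible individual, is what makes this hybrid cheaper than formulation (9) alone.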

Once the sampling scheme has produced a good estimate of the feasible region, the surface-reconstruction algorithm is used to determine the points forming the boundary of the feasible region. These points are then joined by straight lines, as illustrated in Fig. 6. The value of α plays a crucial role in determining the degree of detail captured by the α shape. The α value determined by the procedure outlined in Sect. "α-Shape Approach" for the 938 sampled points is 25, which was found to capture the nonconvex nature of the object with adequate accuracy, as illustrated in Fig. 6.

Shape Reconstruction Methods for Nonconvex Feasibility Analysis, Figure 6

Performance of α shape in predicting feasible space using α = 25


Process Operation Example 


This example represents the flow sheet shown in Fig. 7, consisting of a reactor and a heat exchanger [5] in which a first-order exothermic reaction A → B takes place. The existing design has a reactor volume (V) of 4.6 m³ and a heat exchanger area (A) of 12 m². Two uncertain parameters are considered: the feed flow rate $F_0$ and the activation energy E/R. A mathematical model of this process is given by:


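$$
\begin{aligned}
& F_0 (c_{A0} - c_{A1})/c_{A0} = V k_0 \exp\left(-\frac{E}{R T_1}\right) c_{A1}\\
& (-\Delta H) F_0 (c_{A0} - c_{A1})/c_{A0} = F_0 c_p (T_1 - T_0) + Q_{HE}\\
& Q_{HE} = F_1 c_p (T_1 - T_2)\\
& Q_{HE} = F_w c_{pw} (T_{w2} - T_{w1})\\
& Q_{HE} = A U \Delta T_{ln}\\
& \Delta T_{ln} = \frac{(T_1 - T_{w2}) - (T_2 - T_{w1})}{\ln\left[(T_1 - T_{w2})/(T_2 - T_{w1})\right]}\\
& (c_{A0} - c_{A1})/c_{A0} \ge 0.9\\
& 311 \le T_1 \le 389\\
& T_1 - T_2 \ge 0.0, \quad T_{w2} - T_{w1} \ge 0.0\\
& T_1 - T_{w2} \ge 11.1, \quad T_2 - T_{w1} \ge 11.1\\
& T_0 = 333\ \mathrm{K}, \quad T_{w1} = 300\ \mathrm{K}, \quad U = 1635\ \mathrm{kJ/(m^2\,h\,K)}\\
& c_p = 167.4\ \mathrm{kJ/(kmol\,K)}, \quad c_{A0} = 32.04\ \mathrm{kmol/m^3}, \quad -\Delta H = 23260\ \mathrm{kJ/kmol}
\end{aligned}
\qquad (11)
$$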
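As a small worked illustration of the heat-exchanger relations in Eq. (11), the sketch below evaluates $\Delta T_{ln}$ and $Q_{HE} = A U \Delta T_{ln}$. The function names and the sample temperatures used in the check are hypothetical; only the formulas come from Eq. (11). For feasible operating points, the minimum-approach constraints $T_1 - T_{w2} \ge 11.1$ and $T_2 - T_{w1} \ge 11.1$ keep both logarithm arguments positive.

```python
import math

def log_mean_temperature_difference(t1, t2, tw1, tw2):
    """Delta T_ln of Eq. (11) for the reactor-cooler heat exchanger.

    t1, t2   -- process-side temperatures T_1 and T_2 (K)
    tw1, tw2 -- cooling-water temperatures T_w1 and T_w2 (K)
    """
    d_a = t1 - tw2  # approach at one end, T_1 - T_w2
    d_b = t2 - tw1  # approach at the other end, T_2 - T_w1
    if d_a <= 0 or d_b <= 0:
        raise ValueError("temperature approaches must be positive")
    if math.isclose(d_a, d_b):
        return d_a  # limit of the formula when both approaches are equal
    return (d_a - d_b) / math.log(d_a / d_b)

def heat_duty(area, u, t1, t2, tw1, tw2):
    """Q_HE = A U Delta T_ln (units follow those of A and U)."""
    return area * u * log_mean_temperature_difference(t1, t2, tw1, tw2)
```

For example, with illustrative values $T_1 = 380$ K, $T_2 = 350$ K, $T_{w1} = 300$ K and $T_{w2} = 340$ K, the two approaches are 40 K and 50 K, and $\Delta T_{ln} = 10/\ln 1.25 \approx 44.8$ K.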


The range of the uncertain parameters E/R and $F_0$ over which the design remains feasible is illustrated in Fig. 8.

Shape Reconstruction Methods for Nonconvex Feasibility Analysis, Figure 8

Performance of α shape in predicting the feasible space of the reactor-cooler example


The aim is to obtain a description of the range of the parameters E/R and $F_0$ over which the operation remains feasible. Following the proposed approach for feasibility analysis, the feasible space is first sampled by solving the problem at different values of the parameters, which yields a representation of the feasible region. In the next step, the sampled points are analyzed with the α shape to identify the points lying on the boundary of the feasible region. These boundary points are joined by straight lines to obtain a polygonal estimate of the feasible region. Figure 8 compares the actual feasible region with the α-shape estimate obtained with 400 sample points, which was found to be highly accurate. To understand the effect of sampling density on the performance of the α-shape procedure, the feasible region was also evaluated with only 100 and 25 sample points, as illustrated in Fig. 9. In these cases the α shape underpredicted the feasible region, since the sampling density was inadequate to capture the entire region; however, it did not overpredict the nonconvex feasible region. Figure 9b also compares the convex hull with the α shape, showing that even at very low sampling density the α shape still captured the nonconvex nature of the feasible region. The convex hull is constructed by performing line searches toward the vertices of the uncertain space to locate points on the boundary of the feasible region. For sparse sampling, the convex hull covers a larger percentage of the feasible region than the α shape, but it overpredicts the feasible region across the nonconvex constraint. The performance of the α shape thus depends directly on the information captured by sampling the feasible space. It is important to note, however, that in the absence of sufficient information the α shape will be a poor predictor of the feasible space. As observed above, its error is one of underprediction rather than overprediction, so it does not lead to erroneous results.

Shape Reconstruction Methods for Nonconvex Feasibility Analysis, Figure 9

Effect of sampling density on the performance of α shape: (a) 100 points (b) 25 points


Conclusions 


The problem of evaluating the feasible range of a process operation is addressed in this paper using surface-reconstruction ideas. The problem is to evaluate and quantify the range of uncertain parameters over which a process remains feasible. In the present approach the feasible region is viewed as an object, with the process constraints defining the boundary of the object. Surface-reconstruction ideas are used to define the shape of this object. The procedure starts by sampling the feasible region to obtain a representation of the feasible space. An α shape of the sampled points is then constructed, which identifies the points forming the boundary of the object. These points are joined to obtain a polygonal representation of the feasible region. Finally, whether a point is feasible can be determined by a point-in-polygon check. Examples are presented to illustrate the performance of the proposed scheme on nonconvex and even disjoint problems.

Applying the proposed technique in higher dimensions becomes computationally challenging. One way of dealing with this issue is to reduce the dimensionality of the problem. Principal component analysis [19] can be used to map the original uncertainty space onto a reduced space spanned by the most important eigendirections. The α-shape ideas can then be applied in the reduced space and the feasibility information mapped back to the original uncertain space. These ideas are currently being explored by the authors.
Introduction


The use of zeolites as molecular sieves, adsorbents and catalysts is today well established in a wide variety of processes. However, almost all applications involve a very small number of structures with nearly circular windows, such as Linde type A, faujasite and ZSM-5. These structures are usually modified to meet the specific needs of each process. Modification techniques, such as ion exchange or coke deposition, usually result in a distribution of pore sizes and shapes, which reduces the ability of the molecular sieve to be highly selective. On the other hand, a great variety of natural and synthetic zeolites has been developed, but no significant effort has been made to find potential catalysis and separation applications for them. There may well be existing structures that are highly selective in their unmodified state, or after only a small amount of modification, simply because their windows happen to have the proper size and shape. Gounaris et al. [7,8] developed a mathematical framework, based on optimization, to address exactly this issue. The framework can identify shape-selective zeolite structures and provides researchers with a rigorous way to determine the best candidate portals for the process of interest.


Characterization of Molecular Footprints 


For spherical molecules going through circular win- 
dows, such as the noble gases approaching a Linde 
type A window, the molecular shape and the rota- 
tional orientation are not important for penetration 
into a channel since every possible rotation results in 
the same projection. The Lennard-Jones length is often 
used as an order-of-magnitude estimation of the size of 
the molecule [11]. It is also the starting point for several attempts to compare these lengths with nominal zeolite window diameters in order to identify zeolite windows suitable for separating a set of molecules. For instance, zeolite 3A has a window diameter between the sizes of H₂ and O₂; thus, it would be a good candidate for their separation. Since most molecules are not spherical and most windows are not circular, more accurate methods are needed to characterize molecules.

We start from a simple model of a molecule as 
atoms connected by bonds, obtained by a quantum 
mechanics or a molecular mechanics calculation. In 
the hard-sphere model, each atom is represented by a sphere of van der Waals radius, and the bond lengths and angles are considered fixed, which is equivalent to an inflexible molecule at absolute zero temperature.

When a molecule approaches the opening of a channel, different rotational orientations of the molecule give rise to different projections on the horizontal plane. From this ensemble of projections, the ones most favorable for penetration are usually the smallest, which we call "footprints." If the molecule is an ellipsoid with three different axes, it should be oriented so that its longest axis is perpendicular to the plane, and the footprint is the ellipse formed by the two shorter axes. If the molecule is a rectangular parallelepiped, the footprint is the rectangle formed by the two shorter edges. This suggests the following useful quantitative measures of the size of a footprint, each involving no more than two parameters:

(a) The footprint is the projection that can be enclosed by the smallest possible circle, characterized by its radius $\rho$.

(b) The footprint is the projection that can be enclosed by the ellipse with the smallest possible area. This footprint is characterized by its major and minor radii, denoted respectively by $\rho_1$ and $\rho_2$. Its eccentricity is defined as $e = \sqrt{1 - \rho_2^2/\rho_1^2}$.

(c) The footprint is the projection that can be enclosed by the rectangle with the smallest possible area. This footprint is characterized by its major and minor lengths, denoted respectively by $a_1$ and $a_2$. The aspect ratio is defined as $AR = a_1/a_2$.

(d) The footprint is the projection that minimizes the sum of the distances of the projected atomic nuclei from a suitable center. This footprint can be characterized by a major diameter $d_M$, which is the largest distance between two points on the edge of the footprint, and a minor one $d_m$, which is the width in the direction perpendicular to the major diameter. For an exact definition and a detailed description of the calculation of these diameters, see Gounaris et al. [8].

(e) The footprint is the projection that minimizes the sum of the distances of all the projected atomic nuclei from each other. It can be quantified with the same parameters as in measure (d).

The computations of these quantitative measures are 
formulated as nonlinear programming problems or as 
bilevel nonlinear programming problems, and are de- 
scribed in detail in Gounaris et al. [8]. See Gounaris 
et al. [7] for examples of footprints of popular 
molecules and for relevant illustrations. 


Definition of Strain and Calculation 
of Strain Index 


When a guest molecule approaches a host portal, there 
are three possible outcomes: free passage, constrained 
passage, and no passage. When the molecular projec- 
tion of the guest can be entirely contained within the 
portal, there is no hindrance and the passage is free. 
When some of the atomic nuclei in the projection fall outside the window, and no rotation or translation can prevent this, then there is no passage. However, if


all nuclei fall inside the portal but some atomic radii 
extend beyond this area, then there can be constrained 
passage in the sense that the atomic spheres have to be 
squeezed so as to fit in the portal. 

We define the amount of distortion on a single atom as

$$ \delta = \frac{r_0 - r_s}{r_0}, \qquad (1) $$

where $r_0$ is the original atomic radius and $r_s$ is the squeezed atomic radius.

The total strain, S, for a guest to penetrate through 
a host portal is quantified as 


1 1 1 1 
s= 5450-0 (ge- 3) +D (ge) 
t t j if J 


(2) 


where $\delta_i^G$ and $\delta_j^H$ are, respectively, the amounts of distortion on the ith guest (G) and jth host (H) atom. In practice, only the oxygen atoms in the zeolite window and the outer (often hydrogen) atoms of the molecule make significant contributions.

There is a strain associated with every projection of 
the guest molecule, but there is some optimal projec- 
tion that exhibits the minimum possible strain, denoted 
as S*. This optimal projection for one channel may be 
different from that for another channel with a different 
shape. For instance, a molecule in the shape of a cylin- 
der can give a rectangular or a circular projection, de- 
pending on the requirement posed by the shape of the 
portal. 

Since S* values could span a wide range of orders 
of magnitude, the introduction of a logarithmic scale 
is necessary for a better representation. We define the 
“strain index” as 


SI = log(1 + S*). (3) 


The strain index can serve as a measure of the total dis- 
tortion required for penetration to take place. A host- 
guest pair exhibiting a strain index of zero would corre- 
spond to a free passage, while a strain index approach- 
ing infinity would correspond to no passage at all. 
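Eqs. (1) and (3) are simple enough to state directly in code. The sketch assumes a base-10 logarithm in Eq. (3), which the text does not specify, and uses illustrative function names. Computing S* itself requires solving the optimization problem described next.

```python
import math

def distortion(r0, rs):
    """Eq. (1): fractional squeeze of an atom from radius r0 to radius rs."""
    if not 0 <= rs <= r0:
        raise ValueError("expected 0 <= rs <= r0")
    return (r0 - rs) / r0

def strain_index(s_star):
    """Eq. (3): SI = log(1 + S*); a base-10 logarithm is assumed here."""
    if s_star < 0:
        raise ValueError("strain must be non-negative")
    return math.log10(1.0 + s_star)
```

A free passage (S* = 0) gives SI = 0, while S* = 9 and S* = 99 give SI = 1 and SI = 2. This is the compression of scale the logarithm is meant to provide.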

A rigorous algorithmic framework to calculate ro- 
bustly the strain index of a given host-guest pair was de- 
veloped by Gounaris et al. [7,8]. The optimization for- 
mulation that models the problem is described below. 
Let us first present the notation used. 
Indices:

i = 1, 2, ..., M    Atoms in the guest molecule
j = 1, 2, ..., N    Atoms in the host molecule
k = 1, 2, ..., K    Constraints defining host interior

Decision variables:

φ, θ, ψ    Rotation angles
xt, yt    Translation of guest projection on the x-y plane
$\delta_i^G$, $\delta_j^H$    Required distortion of guest and host atoms

Auxiliary variables:

$d_{ij}$    Distance of the ith guest atom projection from the jth host atom
$x_i$, $y_i$    Position of the ith guest atom (after rotation and projection)

Parameters:

$x_i^0$, $y_i^0$, $z_i^0$    Coordinates of the ith guest atom (random orientation)
$xh_j$, $yh_j$    Position of jth host atom on x-y plane
$a_k$, $b_k$, $c_k$    Parameters defining the convex hull of the host
$r_i^G$, $r_j^H$    Effective atomic radius of guest and host atoms


The objective is to minimize the total strain, S, required 
for penetration: 


$$ \min_{\phi,\,\theta,\,\psi,\,xt,\,yt,\,\delta_i^G,\,\delta_j^H} \; S \qquad (4) $$


For every guest atom, the position of its center should 
correspond to a valid rotation that resulted from the 
original conformation provided. According to the “x y 
z” convention for rotation matrices, the coordinates of 
the projected atom nuclei are given by 


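$$
\begin{aligned}
x_i &= \cos\theta\cos\phi\, x_i^0 + \cos\theta\sin\phi\, y_i^0 - \sin\theta\, z_i^0 + xt \quad \forall i\\
y_i &= (\sin\psi\sin\theta\cos\phi - \cos\psi\sin\phi)\, x_i^0 + (\sin\psi\sin\theta\sin\phi + \cos\psi\cos\phi)\, y_i^0\\
&\quad + \cos\theta\sin\psi\, z_i^0 + yt \quad \forall i.
\end{aligned}
\qquad (5)
$$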


The terms xt and yt allow for translation of the projec- 
tion on the x-y plane, so as to obtain a better fit with 
respect to the host. Note that the host-guest conforma- 
tions are provided independently, and there is no re- 
quirement that they use the same reference coordinate 
system. 
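Eq. (5) is the projection of a rigid rotation onto the x-y plane. Its coefficients are the first two rows of the x-y-z rotation matrix; the third (z) row is simply dropped. The sketch below evaluates Eq. (5) for a list of guest atoms. The function name and argument order are illustrative assumptions.

```python
import math

def project_guest(atoms, phi, theta, psi, xt=0.0, yt=0.0):
    """Eq. (5): rotate guest atoms with the x-y-z convention and project onto x-y.

    atoms -- list of (x0, y0, z0) coordinates in an arbitrary orientation
    Returns a list of projected (x, y) positions, shifted by (xt, yt).
    """
    cphi, sphi = math.cos(phi), math.sin(phi)
    cth, sth = math.cos(theta), math.sin(theta)
    cpsi, spsi = math.cos(psi), math.sin(psi)
    # First two rows of the rotation matrix; the third (z) row is discarded
    row_x = (cth * cphi, cth * sphi, -sth)
    row_y = (spsi * sth * cphi - cpsi * sphi,
             spsi * sth * sphi + cpsi * cphi,
             cth * spsi)
    projected = []
    for x0, y0, z0 in atoms:
        x = row_x[0] * x0 + row_x[1] * y0 + row_x[2] * z0 + xt
        y = row_y[0] * x0 + row_y[1] * y0 + row_y[2] * z0 + yt
        projected.append((x, y))
    return projected
```

Because the two rows are orthonormal for any angles, the projection never stretches the molecule. Changing the angles only changes which footprint is exposed to the portal.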

For every pair of host-guest atoms, we impose the 
condition that their effective spheres cannot intersect 
with each other, therefore implying that they have to be 
squeezed to fit: 


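$$
\begin{aligned}
d_{ij} &\ge (1 - \delta_i^G)\, r_i^G + (1 - \delta_j^H)\, r_j^H\\
d_{ij}^2 &= (x_i - xh_j)^2 + (y_i - yh_j)^2 \quad \forall (i,j).
\end{aligned}
\qquad (6)
$$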


To avoid obtaining (otherwise valid) solutions in which the guest lies completely outside the portal area, we must also include a set of constraints that outer-approximates the portal. A set of linear constraints that serves this purpose is the one describing the convex polygon whose vertices coincide with the atom centers of the host:


$$ a_k x_i + b_k y_i \le c_k \quad \forall (i,k). \qquad (7) $$


The parameters $a_k$, $b_k$ and $c_k$ can be easily calculated from the host atom coordinates [ ]. Note that only those atoms that participate in the host's convex outer approximation are used for this calculation; therefore, K is not necessarily equal to N. The two coincide only in the case of convex portals.
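One way to obtain $a_k$, $b_k$ and $c_k$, assumed here rather than taken from the source, is to compute the convex hull of the host atom centers with Andrew's monotone-chain algorithm and write one half-plane per hull edge. Atoms that are not hull vertices drop out automatically, which is why K can be smaller than N. The function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def portal_halfplanes(host_atoms):
    """Coefficients (a_k, b_k, c_k) with a_k*x + b_k*y <= c_k inside the hull."""
    hull = convex_hull(host_atoms)
    planes = []
    for k in range(len(hull)):
        (px, py), (qx, qy) = hull[k], hull[(k + 1) % len(hull)]
        a, b = qy - py, -(qx - px)   # outward normal of edge p -> q (CCW order)
        planes.append((a, b, a * px + b * py))
    return planes
```

In the check below, a fifth host atom placed inside a square portal is ignored, giving K = 4 < N = 5.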

Finally, the following bounds have to be applied to 
the decision variables of the problem: 


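$$ -\pi \le \phi,\ \theta,\ \psi \le +\pi \qquad (8) $$

$$ 0 \le \delta_i^G \le 1 \quad \forall i, \qquad 0 \le \delta_j^H \le 1 \quad \forall j. \qquad (9) $$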


The bounds on the Euler angles are imposed so as not 
to obtain periodic solutions, while the bounds on the 
deltas relate to their definitions. Note that no bound is 
imposed on the translation variables xt and yt, which 
are allowed to vary freely. 

The minimization of the objective function (4), subject to constraints (5)-(9), constitutes a nonlinear programming problem in continuous variables. The problem is nonconvex. The nonconvexities are introduced both by the objective function (the definition of total strain) and by the constraints (projected rotations and atom-atom distances). This problem can be rigorously addressed by deterministic global optimization methods such as αBB [1,2,5,6,10]. For computational efficiency, Gounaris et al. [7,8] chose instead to employ local optimization methods with an insightful initialization scheme that was effective in avoiding convergence to nonglobal solutions.

Complete strain index database

They applied their method to a large database of zeolite portals and molecules. In particular, they considered 38 popular molecules and 123 zeolite structures (corresponding to a total of 217 different windows). For an exact list of the windows considered, see Gounaris et al. [8]. Complete references for all these structures can be found in the Atlas of the International Zeolite Association [3,4]. Figure 1 shows a schematic representation of all the results and can serve as a database of strain indices. Such a database shows the relations between many molecules and zeolite rings. It can be a powerful tool for identifying portal candidates that are selective between two molecules. It offers a systematic screening technique that has an energetic basis and does not rely exclusively on qualitative measures. Once a zeolite structure has been identified as a good candidate for selectively admitting some molecule, experimental studies should be employed to accurately determine diffusion rates or Langmuir isotherms. These results could be further supported by molecular dynamics or Monte Carlo simulations [9,12,13] tailored to the specific sorbent/sorbate systems under consideration.


Strain-Based Screening 


When a set of molecules approaches a zeolite channel 
window, the results can be described as a triage. When 
all the molecules pass, such as in the case of hydrogen, 
nitrogen and oxygen approaching the relatively large
opening of faujasite, there is no selectivity and no sepa- 
ration. When none of the molecules pass, such as the 
case of oleic and linoleic acid approaching the chan- 
nel of SAPO-56, there is also no selectivity and no sep- 
aration. When some of the molecules have strain in- 
dices different from those of others, such as in the case 
of ethane and ethylene approaching ERS-7, then there 
is selectivity and potential for catalysis or separation. 
A higher strain index may reduce the diffusion rate 
or reduce the equilibrium adsorption, instead of com- 
pletely denying passage, but it would nevertheless serve 
the separation scheme. 

When a molecule is being squeezed to fit a host channel, some activation energy is required, which leads to a decrease of the equilibrium concentration in the channel according to the Boltzmann equation:

\frac{C}{C_0} = \exp\left(-\frac{E}{RT}\right) = \exp\left(-\frac{\varepsilon S}{RT}\right), \qquad (10)

where S is the strain index and ε is some “hardness” coefficient. An averaged Lennard-Jones parameter may be used.

Let us define the selectivity between two molecules A and B, γ_AB, as the absolute difference between their reduced equilibrium concentrations:

\gamma_{AB} = \left| \exp\left(-\frac{E_A}{RT}\right) - \exp\left(-\frac{E_B}{RT}\right) \right| . \qquad (11)

If the distortion energies required are similar, both molecules will penetrate to the same relative extent, and therefore C^A/C_0^A ≈ C^B/C_0^B and there is no selectivity (γ_AB → 0). On the other hand, if the energies are substantially different, the penetration levels will be different and high selectivity will be achieved (γ_AB → 1). In the case where E_A → 0, selectivity is at a maximum at very low temperatures, but a rise in temperature can activate molecule B and selectivity will decline.

Fig. 2 Selectivity vs. temperature for the C3/C3= system. RRO RUB-41, TON Theta-1, AEL AlPO-11, MWW MCM-22, CZP chiral zincophosphate, CGS cobalt gallium phosphate-6, AHT AlPO-H2
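To make the temperature dependence in (11) concrete, the following Python sketch evaluates γ_AB for a pair of molecules. The distortion energies used below are hypothetical values chosen for illustration (they are not results from [7,8]); energies are taken in J/mol with R = 8.314 J/(mol K), and the function names are assumptions of this example.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def reduced_concentration(energy, temperature):
    """C/C0 = exp(-E/(R T)) from the Boltzmann relation (10); energy in J/mol."""
    return math.exp(-energy / (R * temperature))


def selectivity(e_a, e_b, temperature):
    """gamma_AB from (11): absolute difference of the reduced concentrations."""
    return abs(reduced_concentration(e_a, temperature)
               - reduced_concentration(e_b, temperature))


# Hypothetical case: A passes freely (E_A = 0), B needs 10 kJ/mol of distortion.
for t in (300.0, 450.0, 600.0):
    print(t, round(selectivity(0.0, 10_000.0, t), 3))
```

With E_A = 0 the expression reduces to 1 − exp(−E_B/RT), which is close to 1 at low temperature and decreases as the temperature rises, in line with the argument above.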
An illustrative example of such calculations is presented in Fig. 2, where selectivity is plotted versus temperature for the propane/propylene system. RUB-41 is identified as the most promising candidate for the separation of the two C3 molecules: it maintains a very high selectivity over the whole range of temperatures considered. Theta-1, AlPO-11 and MCM-22 are also selective at ambient temperature, but their performance deteriorates at higher temperatures. All these structures correspond to the case where one of the two molecules (propylene) enjoys free passage through the portal, while the other (propane) has to undergo some distortion. A different trend holds for chiral zincophosphate (CZP), cobalt gallium phosphate-6 and AlPO-H2, which seem to benefit from an increase in temperature. The potential of RUB-41 to be highly selective for the C3 system can be explained by the high degree of similarity between propylene’s projection and the actual shape of the portal, a similarity reminiscent of the fit between a lock and a key.
Biographical Summary


Naum Zuselevich Shor (1937-2006) is recognized as 
one of the paramount researchers in the field of 
optimization. He is well known for his significant 
contributions to many important areas of optimiza- 
tion, including nonlinear and stochastic programming, 
computational methods for nonsmooth optimization, 
discrete optimization problems, matrix optimization, 
dual quadratic bounds in multiextremal programming 
problems, and numerical algorithms for solving large- 
scale optimization problems. 


Biographical Details 


A renowned Ukrainian mathematician, Shor was born 
on January 1, 1937. His childhood took place during the 
horrific years of World War II. In 1954, Shor entered 
the Mechanics and Mathematics Department of Na- 
tional Taras Shevchenko University in Kiev, Ukraine. 
Two years later, in 1956, the brilliant young scientist Victor Mikhailovich Glushkov moved to Kiev and was appointed director of a newly created Computing Center (formerly one of the laboratories at the Institute of Mathematics) of the National Academy of Sciences of Ukraine (NASU). Shor decided to focus his attention on differential algebra and began working on his diploma thesis under Glushkov’s supervision.

In 1958, Shor graduated from Taras Shevchenko 
University and was invited by Glushkov to join the 
Computing Center, which 4 years later became the In- 
stitute of Cybernetics of NASU. There Shor became an 
active part of the research group guided by another 
talented mathematician, Vladimir Sergeevich Mikhale- 
vich. First, Shor examined the problems of modeling 
and optimization of the reliability of computing de- 
vices, as well as application of noise power spectrum 
analysis to various problems in radiology. By 1960, the 
research team of Mikhalevich had evolved into the De- 
partment for Applied Problems with a focus on optimal 
planning and design. This transformation completely
shifted Shor’s scientific interests toward the field
of optimization, which emerged as a new area of math- 
ematics in the 1940s. Working together on the opti- 
mal selection of design decisions, Shor and Mikhalevich 
constructed a numerical procedure for sequential anal- 
ysis of variants. The proposed procedure represented 
a generalization of dynamic programming algorithms. 
It could easily be employed for solving various applied 
problems of optimal design and planning, including, 
but not limited to, gas supply systems, electrical net- 
works, and transportation route systems. As a result, 
their ingenious numerical scheme received a high num- 
ber of citations. Continuing his work in optimal design, 
Shor also suggested a method for solving optimal
design problems for lengthy objects and treelike struc- 
tures. In addition, Shor was the first to apply the sub- 
gradient descent method to optimization of nonsmooth 
functions in 1962. Specifically, using the subgradient 
scheme, he devised an approach for solving large-scale 
dual network transportation problems by reduction to 
the maximization of a piecewise linear function. His ap- 
proach later became well known as the generalized gra- 
dient descent method. In 1964, Shor defended his Ph.D. 
dissertation entitled “On the Structure of Algorithms 
for Numerical Solution of Problems of Optimal Plan- 
ning and Design.” 

After earning his Ph.D. degree in 1964, Shor continued his work on the application of his generalized gradient descent method to various mathematical programming problems, including block programming and two-stage stochastic programming. In 1967, he also cowrote a book with Mikhalevich on computational approaches to the optimal selection of design decisions. Only 1 year after the book on optimal design had been published, Yuriy Ermoliev and Shor devised a modification of Shor’s subgradient method for solving two-stage stochastic programming problems. This revolutionary approach was later advanced even further by the research team led by Ermoliev and became known as the direct quasi-gradient method for optimization under uncertainty.

Another pioneering approach in optimization, 
which was introduced by Shor in 1970, was based on 
the idea of space transformation known as dilation. Al- 
most concurrently, Shor worked on two methods in- 
volving space dilation. The first technique is the method 
with space dilation in the direction of the subgradient, 
which is used for solving systems of nonlinear equa- 
tions and inequalities. The second method, also known 
as the r-algorithm, utilizes the operation of space di- 
lation in the direction of the difference between two 
consecutive subgradients. The r-algorithm has become 
one of the most efficient procedures for solving com- 
plex optimization problems. Nearly a decade of his re- 
search on the subgradient and subgradient-type meth- 
ods with space transformation was finally summarized 
by Shor in his monograph Minimization Methods for 
Non-Differentiable Functions and Applications, which 
was first published in 1979, and just 6 years later was 
translated into English. 

Remarkably, the famous ellipsoid method, which was independently formulated by A.S. Nemirovsky and D.B. Yudin in 1975, and which was used by L.G. Khachiyan in 1979 to prove that linear programming problems can be solved in polynomial time, is actually a special case of Shor’s method with space dilation in the direction of the subgradient.

In the early 1980s, Shor became captivated by graph theory while working on network optimization problems. Together with his Ph.D. student G.A. Donets, he investigated graph coloring problems. In particular, they formulated a hypothesis on the number of colorings of a planar graph, which is supported by computational experiments. In 1982, their treatise on the algebraic approach to planar graph coloring problems was published.

In the second half of the 1980s, Shor collaborated 
with S.I. Stetsenko on quadratic extremal problems. 
Later he investigated dual estimates in multiextremal 
problems and produced a paper on this subject in 1992. 
The same year, Shor’s joint paper with O.A. Berezovski, 
where they described new procedures for constructing 
optimal inscribed and circumscribed ellipsoids, also ap- 
peared in the press. The scope of his research was con- 
tinuously expanding to other complex areas of opti- 
mization to include minimization of matrix functions 
(1995), generalized set partitioning problems (1996), 
polynomial optimization problems (1998), nonsmooth 
optimization in stochastic programming (1999), and 
Lagrangian bounds for multiextremal polynomial and 
discrete optimization problems (2002). Furthermore, in 
1998, his extensive analysis of polynomial problems using methods of nondifferentiable optimization was published in a monograph. The book included a comprehensive review of techniques used in nondifferentiable optimization as well as their application to various problems, such as discrete optimization and graph optimization problems, polynomial problems, and optimal Lyapunov functions. Polynomial problems were given special consideration. Specifically, Shor discovered that in order for the dual quadratic bound of a polynomial to be equal to the global minimum of the polynomial, it is necessary and sufficient that the difference between the polynomial and its global minimum can be represented as a sum of squares of real polynomials. This result illustrates the connection between nonconvex polynomial optimization and David Hilbert’s 17th problem on the representation of a definite (that is, nonnegative) rational function as a sum of squares of rational functions, which was posed by Hilbert in 1900 and solved by Emil Artin in 1927.

Until his death on February 25, 2006, Shor kept ac- 
tively working on different intricate optimization prob- 
lems. His dedication to research, immense knowledge, 
and unstoppable intellect were manifested in his indisputable achievements in the field of mathematical programming. During his long research career, Shor won numerous awards. Among the most prestigious are the State Prizes in Science and Technology of both the former USSR and Ukraine (1973, 1981, 1993, and 2000). In addition, for his great contribution to computational methods for solving large-scale optimization problems, Shor was recently awarded the Glushkov Prize and the Mikhalevich Prize. In recognition of his lifelong scientific accomplishments, he was elected an Associate Member of the National Academy of Sciences of Ukraine in 1990, and became a full Member of the Academy on December 4, 1997. During his life, Shor published over 180 research papers and nine books and supervised the Ph.D. dissertations of more than 35 students, who now continue his scientific work not only in Ukraine but all over the world.
The shortest path tree problem is a classical and widely studied combinatorial problem ([1,23,24]). This article first gives an extensive treatment of the major classical approaches and then focuses on the auction algorithm and some of its recently developed variants. The theoretical and practical performance of the methods treated is discussed and their effectiveness compared.


Mathematical Model 


The shortest path problem can be posed in more than one way:
1) to find the shortest path from a single source to a single destination;
2) to find the shortest path from each of several sources to each of several destinations;
3) to find the shortest path from one single source to all destinations.

Problems of type 3) are also called shortest path tree problems (SPT).

Before describing the single source-all destinations shortest path problem, we introduce the needed notation and definitions.

Let G = (V, E, C) be a directed graph, where
• V is a set of nodes, numbered 1, ..., n;
• E = {(i, j) : i, j ∈ V} is a set of m edges;
• C : E → ℝ is a function that assigns a length c(i, j) to each edge (i, j) ∈ E;
• a forward path P = {(i_1, i_2), ..., (i_{k−1}, i_k)} is a set of edges, whose length is the sum of the lengths of its edges.

In order to ensure that the shortest path problem admits a solution, it must be assumed that:
• all cycles in the graph have nonnegative length;
• the graph is strongly connected.

The last assumption can be removed by defining the distance between nodes that are not connected as +∞.

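Let moreover s be the label of the source node; then the single source shortest path tree problem (SPT) can be formulated as

\min \sum_{(i,j) \in E} c(i,j)\, x(i,j)
\text{s.t.} \quad \sum_{j : (i,j) \in E} x(i,j) - \sum_{h : (h,i) \in E} x(h,i) = b_i, \quad \forall i \in V, \qquad (1)
\qquad b_i = -1 \ \text{if } i \neq s, \quad b_s = n - 1,
\qquad x(i,j) \ge 0, \quad \forall (i,j) \in E.

The dual problem (DSPT) is:

\max \; (n-1)\,\pi(s) - \sum_{j \in V,\, j \neq s} \pi(j)
\text{s.t.} \quad \pi(i) - \pi(j) \le c(i,j), \quad \forall (i,j) \in E,

where π(i) is the dual variable associated with node i.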


A Generic Shortest Path Algorithm 


Any shortest path algorithm for the single source-all destinations problem maintains and adjusts a vector {π(1), ..., π(n)} of distance labels, each of which is either a scalar or ∞. The optimality of the labels is characterized by the following result.

Proposition 1 Let π(1), ..., π(n) be scalars satisfying

\pi(j) \le \pi(i) + c(i,j), \quad \forall (i,j) \in E, \qquad (3)

and let P be a path starting from a node i_1 and ending at a node i_k. If

\pi(j) = \pi(i) + c(i,j), \quad \forall (i,j) \in P, \qquad (4)

then P is a shortest path from i_1 to i_k.
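As a small illustration of Proposition 1, the following Python sketch checks conditions (3) and (4) for a given label vector and a given path. The function name `satisfies_csc` and the dictionary-based graph representation are assumptions made for this example only.

```python
def satisfies_csc(pi, edges, path):
    """Check conditions (3) and (4) of Proposition 1.

    pi:    dict node -> label pi(node)
    edges: dict (i, j) -> edge length c(i, j)
    path:  list of nodes [i_1, ..., i_k] forming a path in the graph
    """
    # (3): pi(j) <= pi(i) + c(i, j) must hold on every edge of the graph.
    if any(pi[j] > pi[i] + c for (i, j), c in edges.items()):
        return False
    # (4): equality must hold on every edge of the path.
    return all(pi[j] == pi[i] + edges[(i, j)] for i, j in zip(path, path[1:]))


edges = {(1, 2): 1, (2, 3): 2, (1, 3): 5}
labels = {1: 0, 2: 1, 3: 3}
print(satisfies_csc(labels, edges, [1, 2, 3]))  # (3) and (4) both hold
```

When the check succeeds, Proposition 1 certifies that the path is shortest; here [1, 3] would fail (4), since 3 ≠ 0 + 5.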


The conditions (3) and (4) are also known as complementary slackness conditions (CSC), owing to the connection of the shortest path problem with the minimum cost flow problem. The generic shortest path algorithm starts with some vector of labels {π(1), ..., π(n)} and successively selects edges (i, j) that violate (3). For each violating edge it sets

\pi(j) = \pi(i) + c(i,j)

and stops when CSC is satisfied by all edges. Intuitively, the label π(i) can be viewed as the length of some path P_i from the source to node i. Therefore, if π(j) > π(i) + c(i, j), the path obtained by extending P_i by edge (i, j) is shorter than P_j, whose length is π(j). This can be iterated to find successively better paths from the source to various destinations.

The violating edges could be chosen arbitrarily, but a more efficient way is to select nodes in some order from a set L, called the candidate list, and to check violation of the CSC for all of their outgoing edges.

Let the node labeled 1 be the source node; then the pseudocode of a prototype shortest path algorithm is as follows:


Set L = {1}, π(1) = 0, π(i) = ∞ ∀ i ≠ 1
WHILE L ≠ ∅
   select a node i from L and remove it from L
   FOR each outgoing edge (i, j)
      IF π(j) > π(i) + c(i, j)
         set π(j) = π(i) + c(i, j)
         add j to L if j ∉ L
      ENDIF
   ENDFOR
ENDWHILE

Pseudocode of a prototype shortest path algorithm
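A minimal Python sketch of the prototype algorithm follows. It assumes nodes numbered 1, ..., n, an adjacency dictionary mapping each node to a list of (j, c(i, j)) pairs, and no negative cycles (otherwise the loop need not terminate). The selection rule is deliberately arbitrary (`set.pop`); the function and parameter names are illustrative.

```python
import math


def generic_shortest_paths(n, adj, source=1):
    """Prototype label-correcting loop: returns the labels pi as a dict."""
    pi = {i: math.inf for i in range(1, n + 1)}
    pi[source] = 0
    candidates = {source}              # the candidate list L
    while candidates:
        i = candidates.pop()           # arbitrary selection rule
        for j, c in adj.get(i, []):
            if pi[j] > pi[i] + c:      # edge (i, j) violates (3)
                pi[j] = pi[i] + c
                candidates.add(j)      # set semantics: no effect if j is already in L
    return pi


adj = {1: [(2, 4), (3, 1)], 2: [(4, 1)], 3: [(2, 2), (4, 5)]}
print(generic_shortest_paths(4, adj))
```

Replacing the arbitrary `pop` by a minimum-label or FIFO rule gives the label setting and label correcting variants discussed next.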


Implementations of the Generic Algorithm 


The literature contains many implementations of the generic algorithm, which differ in the criterion used to select the next node to be removed from the set L. Traditionally, they are divided into two groups:

1) Label setting methods: the node i removed from L is the one with the minimum label. If the edge lengths are nonnegative, it can be shown that each node enters L at most once and that its label becomes permanent the first time the node is extracted from L. The minimum label over L must be computed at each iteration, and the many implementations of this approach differ in the procedure they use to obtain it.

2) Label correcting methods: the choice of the node i requires less computation, but a node may enter L more than once.


Label Setting Methods 


The first label setting algorithm is due to E.W. Dijkstra in 1959 [20]. In this method the next node to be removed from L is the node i such that i = arg min_{j∈L} π(j). There are different versions of this algorithm, depending on the data structure that represents the set L and facilitates the removal and addition of nodes, as well as the search for the node with the minimum label; this choice is crucial for good practical and theoretical performance. The most famous and efficient Dijkstra-like algorithms are S-HEAP and S-DIAL.


Set L = {1}, π(1) = 0, π(i) = ∞ ∀ i ≠ 1
Set L(1) = 1, L(last) = nil, p_i = nil ∀ i
WHILE L ≠ ∅
   i = L(1)
   replace L(1) by L(last)
   order heap L
   FOR each outgoing edge (i, j)
      IF π(j) > π(i) + c(i, j)
         set π(j) = π(i) + c(i, j)
         set p_j = i
         IF j ∉ L
            insert j into L as L(last)
         ENDIF
         order heap L
      ENDIF
   ENDFOR
ENDWHILE

Pseudocode of S-HEAP


S-HEAP 


The data structure chosen to represent the set L is a binary heap, i.e. a tree whose root corresponds to the node having the minimum label and in which each node has a label not greater than those of its children. With this data structure, the removal of the node L(1) corresponding to the minimum label, the insertion of a new node in the last position L(last) and the correction of the label of a node already in L each have complexity O(log q) ≤ O(log n), where q is the cardinality of L and n the number of nodes. At each iteration the root of the heap L is removed and the labels of some nodes belonging to L may decrease. Therefore, some nodes may have to be repositioned in L, while other nodes may enter L for the first time and have to be put in the right position. Each of these operations takes O(log n) time. The total number of insertions is n, as is the total number of removals. Therefore, the number of operations needed to keep the heap L ordered is O((n + r) log n), where r is the total number of repositioning operations. To get an upper bound on r, it is enough to observe that there is at most one repositioning for each edge, because each edge is examined at most once. Thus, r ≤ m and the total operation count needed to maintain L is O(m log n). Because this dominates the O(m) operation count needed to examine each edge, the worst-case running time of S-HEAP is O(m log n), although experimental results ([9]) indicate that it grows approximately like O(m + n log n), because r is usually a small multiple of n and considerably less than m.
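The following Python sketch mirrors S-HEAP using the standard `heapq` module. Because `heapq` offers no operation to reposition an entry whose label decreases, this sketch swaps in the common lazy-deletion variant: a new entry is pushed and outdated entries are skipped when popped. The heap may then hold up to m entries, which still gives O(m log m) = O(m log n). Nonnegative edge lengths are assumed; names are illustrative.

```python
import heapq
import math


def dijkstra_heap(n, adj, source=1):
    """Binary-heap Dijkstra with lazy deletion; returns (pi, pred)."""
    pi = {i: math.inf for i in range(1, n + 1)}
    pred = {i: None for i in range(1, n + 1)}   # p_i in the pseudocode
    pi[source] = 0
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)              # entry with minimum label
        if d > pi[i]:
            continue                            # stale entry: label already improved
        for j, c in adj.get(i, []):
            if pi[j] > d + c:
                pi[j] = d + c
                pred[j] = i
                heapq.heappush(heap, (pi[j], j))  # stands in for repositioning j
    return pi, pred


adj = {1: [(2, 4), (3, 1)], 2: [(4, 1)], 3: [(2, 2), (4, 5)]}
labels, preds = dijkstra_heap(4, adj)
```

Following `preds` backwards from any node reproduces its shortest path tree branch, e.g. 4 ← 2 ← 3 ← 1 in this example.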


S-DIAL 


This algorithm, due to R.B. Dial in 1969 [19], assumes that all edge lengths are nonnegative integers. L is implemented as a direct-address table, a dynamic array whose number of elements equals the maximum number of different label values. Since every finite label is equal to the length of some path with no cycles, the possible label values range in

[0, (n-1) \max_{(i,j) \in E} c(i,j)].

The entry i of L, also called a bucket, is a doubly linked list containing each node whose label is i. The algorithm starts with the source node 1 in the bucket L(0) and all other buckets empty. At the first iteration the algorithm puts each node i with (1, i) ∈ E in the bucket L(c(1, i)) and then proceeds to examine the bucket L(1). If L(1) is nonempty, it repeats the process, removing from L each node with label 1 and moving other nodes to smaller numbered buckets; otherwise it proceeds to check bucket L(2), and so on.

Checking the emptiness of a bucket and inserting or removing a node from a bucket require O(1) time; searching for the minimum label node requires O(n max_{(i,j)∈E} c(i, j)) time overall, while adjusting node labels and repositioning nodes between buckets require O(m). Therefore, the running time of S-DIAL is pseudopolynomial and is given by O(m + n max_{(i,j)∈E} c(i, j)).


Set L(0) = {1}, π(1) = 0, π(i) = ∞ ∀ i ≠ 1
Set z = 0, p_i = nil ∀ i
WHILE L ≠ ∅
   increase z until L(z) ≠ ∅
   set i equal to the first element of L(z)
   remove i from L(z)
   FOR each outgoing edge (i, j)
      IF π(j) > π(i) + c(i, j)
         IF j ∈ L THEN remove j from L(π(j))
         ENDIF
         set π(j) = π(i) + c(i, j), p_j = i
         insert j into L(π(j))
      ENDIF
   ENDFOR
ENDWHILE

Pseudocode of S-DIAL
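A Python sketch of S-DIAL under the assumption of nonnegative integer edge lengths follows. Python sets stand in for the doubly linked bucket lists (both give constant-time insertion and removal on average), and the bucket array covers the label range [0, (n − 1) max c(i, j)] stated above; names are illustrative.

```python
import math


def dial(n, adj, source=1):
    """Dial's bucket implementation of Dijkstra's method; returns pi."""
    max_c = max((c for edges in adj.values() for _, c in edges), default=0)
    # Bucket z holds the nodes whose current label equals z.
    buckets = [set() for _ in range((n - 1) * max_c + 1)]
    pi = {i: math.inf for i in range(1, n + 1)}
    pi[source] = 0
    buckets[0].add(source)
    for z in range(len(buckets)):          # scan buckets in increasing order
        while buckets[z]:                  # zero-length edges may refill L(z)
            i = buckets[z].pop()
            for j, c in adj.get(i, []):
                if pi[j] > pi[i] + c:
                    if pi[j] < math.inf:
                        buckets[pi[j]].discard(j)  # j is in L: remove it from L(pi(j))
                    pi[j] = pi[i] + c
                    buckets[pi[j]].add(j)
    return pi


adj = {1: [(2, 4), (3, 1)], 2: [(4, 1)], 3: [(2, 2), (4, 5)]}
print(dial(4, adj))
```

The outer loop over z is the source of the pseudopolynomial term n max c(i, j) in the running time.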


Label Correcting Methods 


The label correcting methods require less sophisticated calculations to select the next node to be removed from L, but in exchange they may select a node more than once. All these methods implement the set L as a queue and differ in the particular type of queue they use and in the position in the queue L where newly labeled nodes are inserted. In this section we treat two of the most famous label correcting methods: the Bellman-Ford method and the D’Esopo-Pape method.


Bellman-Ford Method 


The Bellman-Ford method is based on the methods proposed by R. Bellman [3] and L.R. Ford [22]. It uses a FIFO strategy to maintain the queue L: a node is always removed from the top of the queue and is inserted at its bottom. The method proceeds in cycles of iterations: the first cycle consists of iterating on the source node 1, while in each subsequent cycle the nodes that entered L during the previous cycle are removed from L in the same order in which they were inserted.

The Bellman-Ford algorithm solves the SPT in the more general case in which the edge lengths can be negative, and it fails to terminate if and only if there exists a path starting from the source and containing a negative cycle. In the case where all cycles in the graph have nonnegative length, the shortest distance of every node is obtained after at most n − 1 iteration cycles. Since each edge is examined at most once in each iteration cycle, each cycle requires O(m) operations, and the running time of this method is O(nm).

Set L = {1}, π(1) = 0, π(i) = ∞ ∀ i ≠ 1
Set p_i = nil ∀ i
WHILE L ≠ ∅
   set i = top element of the queue L
   remove i from L
   FOR each outgoing edge (i, j)
      IF π(j) > π(i) + c(i, j)
         set π(j) = π(i) + c(i, j), p_j = i
         IF j ∉ L THEN
            insert j into L at the bottom
         ENDIF
      ENDIF
   ENDFOR
ENDWHILE

Pseudocode of the Bellman-Ford method
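The FIFO queue of the Bellman-Ford method maps naturally onto `collections.deque`. The sketch below also adds a safeguard that is an assumption of this example rather than part of the pseudocode: without a negative cycle, a node leaves the queue at most n − 1 times (the source at most once), so a node removed more than n times signals a negative cycle reachable from the source. Names are illustrative.

```python
import math
from collections import deque


def bellman_ford_fifo(n, adj, source=1):
    """FIFO label-correcting method; edge lengths may be negative."""
    pi = {i: math.inf for i in range(1, n + 1)}
    pi[source] = 0
    queue, in_queue = deque([source]), {source}
    removals = {i: 0 for i in range(1, n + 1)}
    while queue:
        i = queue.popleft()              # remove from the top
        in_queue.discard(i)
        removals[i] += 1
        if removals[i] > n:              # only possible with a negative cycle
            raise ValueError("negative cycle reachable from the source")
        for j, c in adj.get(i, []):
            if pi[j] > pi[i] + c:
                pi[j] = pi[i] + c
                if j not in in_queue:    # insert at the bottom
                    queue.append(j)
                    in_queue.add(j)
    return pi


print(bellman_ford_fifo(3, {1: [(2, 4), (3, 2)], 3: [(2, -3)]}))
```

In this example the negative edge (3, 2) corrects the label of node 2 after it has already been removed once, so node 2 is processed twice.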


D’Esopo-Pape Method 


Like the Bellman-Ford method, this method can be used to detect the presence of a negative cycle, and, as in the Bellman-Ford method, a node is always removed from the top of the queue L. However, a node is inserted at the bottom of L if it has never been in L before; otherwise it is put at the top. This insertion strategy stems from the observation that removing a node i and updating its label affects the labels of a subset N_i of neighbor nodes j with (i, j) ∈ E. Therefore, by placing the node at the top of the queue, the labels of the nodes belonging to N_i are updated as quickly as possible.
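A corresponding Python sketch of the D'Esopo-Pape insertion rule follows; the only change from a FIFO queue is that a node re-entering L is pushed on the top with `appendleft`. This sketch assumes there are no negative cycles and omits negative-cycle detection; names are illustrative.

```python
import math
from collections import deque


def desopo_pape(n, adj, source=1):
    """D'Esopo-Pape label-correcting method; returns pi."""
    pi = {i: math.inf for i in range(1, n + 1)}
    pi[source] = 0
    queue = deque([source])
    in_queue, ever_in = {source}, {source}
    while queue:
        i = queue.popleft()                  # always remove from the top
        in_queue.discard(i)
        for j, c in adj.get(i, []):
            if pi[j] > pi[i] + c:
                pi[j] = pi[i] + c
                if j not in in_queue:
                    if j in ever_in:
                        queue.appendleft(j)  # re-entering node: top of L
                    else:
                        queue.append(j)      # first entry: bottom of L
                        ever_in.add(j)
                    in_queue.add(j)
    return pi


adj = {1: [(2, 4), (3, 1)], 2: [(4, 1)], 3: [(2, 2), (4, 5)]}
print(desopo_pape(4, adj))
```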


Auction Algorithms 


The auction approach was proposed by D. Bertsekas [4] (see also [5,6]) for solving the assignment problem. It was then generalized to the transportation problem, the minimum cost flow problem [10,11,12] and the shortest path problem [7]. A complete survey of the auction algorithm can be found in [8, Chapt. 4].

To solve the single source-single destination shortest path problem (SP), the standard forward auction algorithm follows a primal-dual approach and consists of three basic operations: path extension, path contraction and dual price increase.


Let i be the terminal node of P.
IF π(i) < min_{(i,j)∈E} {c(i, j) + π(j)}
   THEN go to Step 1;
   ELSE go to Step 2.
ENDIF

Step 1 (CONTRACT PATH)
   Set π(i) = min_{(i,j)∈E} {c(i, j) + π(j)}.
   IF i ≠ s
      contract P.
   ENDIF
   Go to the next iteration.

Step 2 (EXTEND PATH)
   Extend P by node j_i = arg min_{(i,j)∈E} {c(i, j) + π(j)}.


The algorithm starts with a pair (P, π) satisfying CSC (at the start P may consist only of s, and π may be zero, which satisfies CSC when all edge lengths are nonnegative); it then proceeds in iterations, transforming (P, π) into another pair satisfying CSC. That is, at each iteration a dual feasible solution and a primal (infeasible) solution are available for which complementary slackness holds. Therefore, while maintaining complementary slackness, the algorithm constructs either a new primal solution (not necessarily feasible) or a new dual feasible solution, until a primal feasible (and hence also optimal) solution is obtained. In more detail, at each iteration the candidate path P is either extended by adding a new node at the end of the path or contracted by deleting from P its last inserted node, called the terminal node. At any iteration, if no extension is possible, the value of the dual variable corresponding to the terminal node of P is raised. The algorithm terminates when the candidate path P is extended by the target node in the case of the single source-single destination problem, or when each node has been visited at least once in the case of the single source-all destinations problem.

Although the auction algorithm was introduced for solving the single source-single destination problem, it can easily be adapted to the SPT problem: in this case it terminates when each node has been visited at least once by the algorithm.
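The following Python sketch implements the forward auction iteration above for a single destination t. It assumes, as in the standard setting, that every cycle has positive length and that t is reachable from s; it starts from P = (s) and π = 0, which satisfies CSC only when all edge lengths are nonnegative. Prices of nodes not yet touched default to 0, and a node without outgoing edges receives price +∞; names are illustrative.

```python
import math


def auction_shortest_path(adj, s, t):
    """Forward auction: extend/contract P until its terminal node is t."""
    pi = {}                                 # prices, default 0
    path = [s]
    while path[-1] != t:
        i = path[-1]
        best_j, best = None, math.inf
        for j, c in adj.get(i, []):         # min over (i, j) of c(i, j) + pi(j)
            val = c + pi.get(j, 0)
            if val < best:
                best_j, best = j, val
        if pi.get(i, 0) < best:             # Step 1: raise price of i
            pi[i] = best
            if i != s:
                path.pop()                  # contract P
        else:                               # Step 2: extend P by j_i
            path.append(best_j)
    return path


adj = {1: [(2, 4), (3, 1)], 2: [(4, 1)], 3: [(2, 2), (4, 5)]}
print(auction_shortest_path(adj, 1, 4))
```

Along the returned path the CSC hold with equality, so its length equals π(s) − π(t) for the final prices.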


Graph Reduction in Auction Algorithms 


The original version of the auction algorithm for the shortest path problem was modified by S. Pallottino and M.G. Scutellà [27], who identified conditions under which it is possible to ‘prune’ the original graph. In more detail, they showed in [27] that every time the standard forward auction algorithm reaches a node f, the optimality of the candidate path P allows all edges whose head is f to be deleted from the original graph, except the edge (k, f), where k is the predecessor of f in P. The set V of the nodes is in this way partitioned into the set of nodes never visited by the algorithm and the set of nodes visited at least once, which have only one incoming edge. The algorithm they developed has a strongly polynomial running time of O(m^2), where m is the number of edges in the graph, without requiring that cycle lengths be positive, without assumptions on the topology of the graph, and for any input data.

Strengthening the graph reduction idea by means of upper bounds on the shortest distances of the nodes, Bertsekas, Pallottino and Scutellà developed [13] an auction algorithm whose running time is O(n min{m, n log n}), since it deletes arcs more effectively. The set V of the nodes is now partitioned into three sets: the set of nodes visited at least once by the algorithm (the tree nodes set), the set of nodes never visited but connected through an edge to at least one tree node (the border nodes set), and the set of the remaining nodes, which have never been visited and are not connected through an edge to any tree node. The upper bounds u_j, j ∈ V, on the shortest distances behave exactly like the temporary labels that Dijkstra’s algorithm associates with each node of the graph (see e.g. [8,20,23,28]): the upper bound u_i equals the shortest distance from the source to i as soon as the border node i becomes a tree node. Bertsekas, Pallottino and Scutellà use these upper bounds in order to ‘prune’ the original graph as much as possible. In fact, the algorithm developed in [13] deletes not only the incoming edges of the last visited node i, as in [27], but also every edge (i, j) ∈ E with u_i + c(i, j) ≥ u_j or, otherwise, the edge (k, j) ∈ E for which k is a tree node other than i.


Modified Version 
of the Standard Auction Algorithm 


A modified auction algorithm (MA), due to [16], achieves a substantial improvement in computational time over the standard algorithm. Its distinctive characteristic is that the CSC are no longer required to hold during the algorithm iterations. It proceeds like the standard auction algorithm, but it does not require dual feasibility to be maintained throughout the algorithm; this allows the dual prices to be raised higher than in the standard algorithm and, consequently, the number of path contractions is substantially reduced. More precisely, the dual variable associated with the terminal node i of the candidate path P is raised by the second minimum value in the set

\{ -\left( c(k,i) - \pi(k) + \pi(i) \right) \} \cup \{ c(i,p) - \pi(i) + \pi(p) : (i,p) \in E \},

where k is the node that immediately precedes i in P. The correctness and convergence of MA are proved in [16].


Graph Collapsing In Auction Algorithms 


The graph collapsing auction algorithms of [15] are all based on the following simple idea. When a node of the graph is visited for the first time by the auction algorithm, the shortest path from the source to this node has been found. Moreover, during the subsequent computations, any subpath extracted from an optimal path can be substituted by (collapsed to) a single arc of the same length. Suppose that at the end of the kth iteration of the auction algorithm the node i_k is visited for the first time. This means that i_k is the terminal node of the current candidate path P = {s = i_0, ..., i_k}. The CSC are satisfied, and for the arcs belonging to P it holds that

\pi(i_j) = \pi(i_{j+1}) + c(i_j, i_{j+1}).

Because of this property of the candidate path P, from the point of view of the algorithm, i.e. with respect to the sequence of nodes visited for the first time, it is equivalent to consider the original graph or a graph in which a subpath is replaced by a single arc whose length equals the length of the replaced subpath. The exact meaning of this equivalence has been clarified in [15]. Here the description of the topological transformations has been intentionally left vague, because there are several different ways to realize these transformations, leading to different algorithms that may perform more or less efficiently. In the next section one such approach, which seems particularly fruitful, is described in full detail. Because it also performs the graph ‘pruning’ operation proposed in [27], it returns a solution under the same assumptions made in [27], that is, without requiring positive cycle lengths, without assumptions on the topology of the graph, and for any kind of input data.

Let s and d be the labels of the source and the target node, respectively.
For each i ∈ V let FS(i) be its forward star, and let P be the candidate path.
Let pred(i) be the node k such that (k, i) ∈ E ∩ P.
Step 0: Set pred(i) = oldpred(i) = NIL, ∀ i ∈ V.
        P = {s}
        Choose π ∈ ℝ^{|V|} such that c_lp − π_l + π_p ≥ 0, ∀ (l, p) ∈ E.
Step 1: Let i be the terminal node of the path P.
        IF i = d, THEN stop, ELSE go to Step 2.
Step 2: Compute FS(i) := {(i, p) ∈ E}, k = pred(i).
        IF i = s
           y_1 = min_{(s,p)∈FS(s)} {c_sp + π_p}
           p* = arg min_{(s,p)∈FS(s)} {c_sp + π_p}
           IF pred(p*) = NIL
              remove from E each arc (l, p*), l ≠ s
              oldpred(p*) = s
           pred(p*) = s
           IF |FS(s)| > 1
              π(s) = min_{(s,p)∈FS(s), p≠p*} {c_sp + π_p}
           ELSE π(s) = y_1
           i = p* and go to Step 1.
        ELSE // case i ≠ s //
           IF FS(i) = ∅ // Contraction due to FS(i) = ∅ //
              remove from E each arc (l, i)
              i = pred(i) and go to Step 1.
           ELSE // case FS(i) ≠ ∅ //
              k = pred(i), in = π(k) − c_ki
              y_1 = min_{(i,p)∈FS(i)} {c_ip + π_p}
              IF in < y_1 // Normal contraction //
                 π(i) = y_1, i = k and go to Step 1.
              ELSE
                 p* = arg min_{(i,p)∈FS(i)} {c_ip + π_p}
                 y_2 = min_{(i,p)∈FS(i), p≠p*} {c_ip + π_p}
                 π(i) = y_2
                 IF in > y_2 // Normal extension //
                    IF pred(p*) = NIL
                       remove from E each arc (l, p*), l ≠ i
                       oldpred(p*) = i
                    pred(p*) = i
                 ELSE // Graph collapsing extension //
                    add the arc (k, p*) to the set E
                    set c_kp* = c_ki + c_ip*
                    IF pred(p*) = NIL
                       remove from E each arc (l, p*)
                       oldpred(p*) = i
                    ELSE
                       remove from E the arc (i, p*)
                    pred(p*) = k
                 i = p* and go to Step 1.

Pseudocode of the algorithm GCA2 solving the single source-single destination problem

The main idea of graph reduction seems similar to the one introduced in [13], because in both cases the effectiveness comes from reducing the number of iterations (node visits), but the approaches are somewhat different. While the method of [13] uses several criteria to delete edges that will certainly not belong to a shortest path before passing over them, the approach in [15] deletes edges belonging to shortest paths, replacing a chain of them with a single edge.
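The first-visit property on which graph collapsing relies can be observed on a small example. The following is a minimal Python sketch of the standard forward auction algorithm of Bertsekas for one source-destination pair, written under stated assumptions: positive arc lengths (so every cycle has positive length), all-zero initial prices (which satisfy the CSC π(i) ≤ c_ij + π(j) for nonnegative lengths), and a graph stored as a dict of successor lists. The function name auction_first_visits is hypothetical. The sketch records the length of the candidate path at the moment each node first becomes its terminal node; by the property stated above, this should be the shortest distance from the source.

```python
import math


def auction_first_visits(succ, s, d):
    """Forward auction for a shortest s-d path, recording first visits (sketch).

    succ maps every node to a list of (j, c_ij) pairs; arc lengths are
    assumed positive. Zero initial prices satisfy the CSC.
    Returns (path, first_visit): first_visit[j] is the length of the
    candidate path when j first becomes its terminal node. Returns
    (None, first_visit) if d turns out to be unreachable.
    """
    pi = {i: 0 for i in succ}
    path, length = [s], [0]          # length[h] = length of path[: h + 1]
    first_visit = {s: 0}
    while path[-1] != d:
        i = path[-1]
        if succ[i]:
            j, c_ij = min(succ[i], key=lambda arc: arc[1] + pi[arc[0]])
            best = c_ij + pi[j]
        else:
            best = math.inf          # no outgoing arc: the price goes to infinity
        if pi[i] < best:             # price rise, then contraction unless i = s
            pi[i] = best
            if i == s:
                if best == math.inf:
                    return None, first_visit
            else:
                path.pop()
                length.pop()
        else:                        # extension along the arc minimising c_ij + pi(j)
            path.append(j)
            length.append(length[-1] + c_ij)
            first_visit.setdefault(j, length[-1])
    return path, first_visit


succ = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: [(4, 3)], 4: []}
path, first = auction_first_visits(succ, 0, 4)
print(path)    # [0, 2, 1, 3, 4]
print(first)   # {0: 0, 2: 1, 1: 3, 3: 4, 4: 7}
```

On this five-node example the path to node 2 and node 1 is contracted and re-extended several times, yet the value recorded at each first visit (1, 3, 4 and 7) already equals the shortest distance from node 0; this is what makes it safe to collapse the subpaths found so far.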


Graph Collapsing Auction Algorithm 


This section describes an auction algorithm (GCA2), due to [15], that fruitfully applies the graph collapsing concept to the modified version of the auction algorithm (MA). Note that, because MA does not keep the CSC satisfied along the current candidate path during the computation, it is no longer straightforward to substitute a piece of the path with a single arc. In GCA2 this substitution is performed during the extension step, and only when the second minimum value computed during the price updating phase of MA comes from the incoming arc of the current node, because only in this case are the CSC satisfied with equality.
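Why equality matters can be seen in a short derivation (not given in the article, but following directly from the CSC and from the length c_kp* = c_ki + c_ip* given to the new arc): if the two consecutive path arcs (k, i) and (i, p*) are both tight, the collapsed arc (k, p*) is tight as well, so the substitution changes neither the prices nor the length of the path.

$$\begin{aligned}
\pi(k) &= c_{ki} + \pi(i), \qquad \pi(i) = c_{ip^*} + \pi(p^*) \\
\Longrightarrow\quad \pi(k) &= c_{ki} + c_{ip^*} + \pi(p^*) = c_{kp^*} + \pi(p^*).
\end{aligned}$$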

Besides applying the graph collapsing concept to the modified auction algorithm, GCA2 uses the dual price updating idea even when a graph collapsing occurs. In fact, in Step 2 of GCA2, when a graph collapsing occurs, some computational effort is saved by updating the price of the current node through y_2, the third minimum value, which after the collapsing becomes the second minimum value. In [15] it is shown that the computational complexity of GCA2 is no worse than that of MA, which in turn is no worse than that of Pallottino’s algorithm [27].
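The three-way decision made in Step 2 at a node i ≠ s with a nonempty forward star can be isolated as a small function. This is a hedged Python sketch of that branch only: the function name and argument layout are hypothetical, the value attached to the incoming arc is read as y_in = π(k) − c_ki, and the minimum over an empty set of successors is taken as +∞.

```python
import math


def gca2_step2_case(pi_k, c_ki, values):
    """Classify GCA2 Step 2 at a node i != s with nonempty FS(i) (sketch).

    pi_k   : price of k = pred(i)
    c_ki   : length of the incoming arc (k, i)
    values : dict mapping each successor p of i to c_ip + pi(p)
    Returns (case, p_star, new price of i).
    """
    y_in = pi_k - c_ki                       # value attached to the incoming arc
    ranked = sorted(values.items(), key=lambda item: item[1])
    p_star, y1 = ranked[0]
    if y_in < y1:                            # normal contraction: pi(i) = y1, back to k
        return "contraction", None, y1
    y2 = ranked[1][1] if len(ranked) > 1 else math.inf  # min over empty set = +inf
    if y_in > y2:                            # normal extension to p*
        return "extension", p_star, y2
    # y1 <= y_in <= y2: the incoming arc yields the second minimum, so the
    # arc (k, p*) of length c_ki + c_ip* replaces the subpath k -> i -> p*
    return "collapsing", p_star, y2


print(gca2_step2_case(10, 3, {"a": 9, "b": 12}))   # ('contraction', None, 9)
print(gca2_step2_case(20, 3, {"a": 9, "b": 12}))   # ('extension', 'a', 12)
print(gca2_step2_case(13, 3, {"a": 9, "b": 12}))   # ('collapsing', 'a', 12)
```

In the collapsing case the price of i is set to y_2, which among y_in, y_1 and y_2 is the third smallest value; after collapsing, the incoming arc no longer competes, so y_2 becomes the second minimum, as described above.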


Virtual Source Concept Applied 
in Auction Algorithms 


In [14] two algorithms were proposed, having complexity O(n?), in which the computational effort is reduced by fully exploiting the following property of the auction algorithm: when it reaches a node (the current node) for the first time, the shortest path from the source to the current node has been found. Hence all the subtrees rooted at the current node can be computed by applying the algorithm from a virtual source located on the current node itself, and the complete shortest path tree can be assembled by joining the optimal subpaths so obtained.

Step 0: choose π ∈ R^|V| such that
            c(l, p) − π(l) + π(p) ≥ 0, ∀(l, p) ∈ E.
        pred(i) = NIL and d(i) = +∞ for each i ∈ V
        sort FS(s) in nondecreasing order
        w(s) = π(s) = SELECT_MIN(FS(s))
        w(l) = π(l) for each l ∈ V, l ≠ s
        E = E \ {(l, s), (q, j) : l, q ∈ V}
        Q = {s}, i = s and go to Step 1.
Step 1: IF |Q| = |V| OR w(q) = +∞, ∀q ∈ Q,
            THEN stop.
        IF π(i) = SELECT_MIN(FS(i)) go to Step 2
        ELSE go to Step 3.
Step 2: Let j_i = arg min_{(i,p) ∈ FS(i)} {c(i, p) + π(p)}
        sort FS(j_i) in nondecreasing order
        pred(j_i) = i, d(j_i) = w(i)
        π(j_i) = SELECT_MIN(FS(j_i))
        w(j_i) = w(i) + π(j_i)
        E = E \ {(l, j_i) : l ∈ V}
        INSERT(Q, j_i) and go to Step 3.
Step 3: IF π(j) = +∞, ∀(i, j) ∈ FS(i) OR FS(i) = ∅
            w(i) = π(i) = +∞
        ELSE
            π(i)^old = π(i)
            π(i) = SELECT_MIN(FS(i))
            w(i) = w(i) + (π(i) − π(i)^old)
        UPDATE(Q, i)
        i = SELECT_MIN(Q) and go to Step 1.

Pseudocode for VSA2

To make the joining operation well defined, the algorithm associates a label with each virtual source. In this respect it resembles Dijkstra’s algorithm [20], but the order in which new nodes are explored remains that of the auction algorithm. The virtual source algorithm thus combines the good characteristics of both approaches.

The first version of the virtual source algorithm (VSA1) evolves as the standard auction algorithm with two differences. First, the algorithm maintains a special list Q of the nodes that become virtual sources during the computation. Second, the contraction phase is modified: every time a contraction on a node i is needed, the algorithm updates the value of a parameter w(i), called the weight associated with i, and inserts i in Q if it does not already belong to Q. The next terminal node is the one with the current minimum weight in Q. During the iterations of the algorithm the list Q is kept sorted in nondecreasing order of weights. The crucial characteristic of VSA1 is the way in which the weights of the nodes are updated: at any iteration the weight associated with each node i equals the shortest distance from the source to i plus the current minimum value of c(i, p) + π(p) over the edges (i, p) outgoing from i.

Even if VSA1 is interesting from a theoretical point of view, it can easily be improved further to eliminate completely not only the contraction phases, as VSA1 does, but also the extension phases. The resulting algorithm, called VSA2, performs only insertions of nodes in Q and updates of Q. Since every extension of the auction algorithm is followed by a contraction, and since VSA1 creates a virtual source during a contraction phase, the authors chose to anticipate the placement of a virtual source on a node as soon as that node is discovered by the algorithm. The pseudocode of VSA2 is given above.
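If all initial prices are zero (admissible when arc lengths are nonnegative), the weight w(i) reduces to d(i) plus the cheapest arc from i to a node not yet labelled, and Steps 1-3 can be sketched in Python as below. This is not the implementation of [14]: Q is a binary heap from the standard heapq module rather than the queue of VSA2 or the hash table of the later version, and the function name vsa2_sketch is hypothetical. Discarding arcs that enter already labelled nodes plays the role of E = E \ {(l, j_i)}, and a weight found to be stale is recomputed and pushed back, which plays the role of UPDATE(Q, i).

```python
import heapq
import math


def vsa2_sketch(n, arcs, s):
    """Shortest path tree from s in the spirit of VSA2 (sketch).

    Assumed: nodes 0..n-1, nonnegative arc lengths, zero initial prices, and
    Q as a binary heap keyed by w(i) = d(i) + (cheapest arc from i to an
    unlabelled node).
    """
    fs = [[] for _ in range(n)]
    for i, p, c in arcs:
        fs[i].append((c, p))
    for star in fs:
        star.sort()                    # sort FS(i) in nondecreasing order
    ptr = [0] * n                      # first arc of FS(i) not yet discarded
    d = [math.inf] * n
    pred = [None] * n
    labelled = [False] * n

    def select_min(i):
        # discard arcs entering labelled nodes, then return the cheapest length
        star = fs[i]
        while ptr[i] < len(star) and labelled[star[ptr[i]][1]]:
            ptr[i] += 1
        return star[ptr[i]][0] if ptr[i] < len(star) else math.inf

    d[s], labelled[s] = 0, True
    q = [(select_min(s), s)]           # Q = {s}
    while q:
        w, i = heapq.heappop(q)        # i = SELECT_MIN(Q)
        if w == math.inf:
            break                      # every remaining weight is +inf
        current = d[i] + select_min(i)
        if current > w:                # stale weight: update it and retry
            heapq.heappush(q, (current, i))
            continue
        j = fs[i][ptr[i]][1]           # j_i = arg min over FS(i)
        d[j], pred[j], labelled[j] = w, i, True          # d(j_i) = w(i)
        heapq.heappush(q, (d[j] + select_min(j), j))     # INSERT(Q, j_i)
        heapq.heappush(q, (d[i] + select_min(i), i))     # new weight of i
    return d, pred


d, pred = vsa2_sketch(5, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5), (3, 4, 3)], 0)
print(d)     # [0, 3, 1, 4, 7]
print(pred)  # [None, 2, 0, 1, 3]
```

Each node receives its label d(j_i) = w(i) exactly once, at the moment it is discovered, so no contraction or extension of a candidate path is ever performed; the running time is governed by the heap operations, which is precisely the dependence on the data structure for Q discussed in the next section.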


A New Virtual Source Algorithm 


In the algorithm VSA2 described in the previous section both the contraction and the extension phases are completely eliminated. The computational time depends exclusively on how efficiently the dictionary operations of insertion and updating can be performed on the data structure chosen to represent Q, the set containing the virtual sources.

In order to improve the performance of VSA2, in which Q is implemented as a queue, the new algorithm keeps the virtual source weights in a sorted fashion, exploiting the property that VSA2 assigns nondecreasing weights. The basic idea is similar to that of Dial, who chose a direct-address table as data structure. However, given the memory available on a typical computer, storing a direct-address table of size equal to the maximum length of a path is impractical, or even impossible, if the number of nodes and/or the maximum arc cost are large. In the new algorithm, instead of using the virtual source weights directly as array indices, the array index is computed from the weight; the resulting data structure is a hash table. Since a hash table typically uses an array of size proportional to the number of elements actually stored, it is more effective than direct addressing when, as in our case, the number of elements actually stored is small compared to the size of the direct-address table. While in a direct-address table a node with label k is stored directly in slot k, with hashing the virtual source with weight k is stored in slot g(k), where, if n is the number of nodes of the graph and B = (n − 1) max_{(i,j) ∈ E} c(i, j),

$$g \colon U = \{0, \dots, B\} \to \{0, \dots, m-1\}$$

is a function that maps U into the slots V[0, ..., m − 1].
The efficiency of a hash table depends on the choice of the hash function g. A hash function is ‘good’ if each element is almost equally likely to hash to any of the m slots. The most popular techniques for designing a good hash function are hashing by division, hashing by multiplication and universal hashing. In practice, heuristic techniques can also be used to create hash functions that are likely to perform well. For a detailed analysis of these techniques, see [18].

The hash function adopted in the new algorithm has been designed following the division method, in which a key k is mapped into one of m slots by taking the remainder of k divided by m:

$$g(k) = k \bmod m.$$

In the new virtual source algorithm, m has been chosen so that at most 10 elements are processed in each unsuccessful search.
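As an illustration of the division method, the following Python sketch stores (weight, node) pairs in m chained slots with g(k) = k mod m. The class and function names are hypothetical, and the sizing rule in choose_m is an assumption: it reads "at most 10 elements for each unsuccessful search" as keeping the load factor (stored elements divided by m) at most 10, which under uniform hashing is the expected chain length scanned by an unsuccessful search. The article does not say how the number of stored elements was estimated; here it is simply taken as n.

```python
import math


class DivisionHashTable:
    """Chained hash table for virtual sources keyed by integer weight (sketch).

    Uses the division method g(k) = k mod m; each slot V[0..m-1] holds a chain
    of (weight, node) pairs, so different weights sharing a slot can coexist.
    """

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def g(self, k):
        return k % self.m

    def insert(self, weight, node):
        self.slots[self.g(weight)].append((weight, node))

    def delete(self, weight, node):
        self.slots[self.g(weight)].remove((weight, node))

    def nodes_with_weight(self, weight):
        # a search scans only the chain stored in slot g(weight)
        return [v for k, v in self.slots[self.g(weight)] if k == weight]


def choose_m(expected_entries, max_scan=10):
    """Hypothetical sizing rule: keep the load factor at most max_scan."""
    return max(1, math.ceil(expected_entries / max_scan))


n, c_max = 1000, 10000
B = (n - 1) * c_max               # a direct-address table would need B + 1 slots
table = DivisionHashTable(choose_m(n))
table.insert(3, "a")
table.insert(3 + table.m, "b")    # lands in the same slot as weight 3
print(B + 1, table.m)             # 9990001 100
print(table.nodes_with_weight(3))  # ['a']
```

With these illustrative values (1000 nodes, arc costs up to 10000), direct addressing over U = {0, ..., B} would need about ten million slots, while the chained table uses 100; the price is that a slot may hold several weights, which is why each search compares the stored weight with the key.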

Although this line of research is still being investigated, the hash function g defined above has already led to satisfactory results, as discussed in the next section, which is dedicated to the computational results.


Computational Results 


Detailed results are reported in [15,21], and [14]. Here we briefly describe the results of comparing the two classes of algorithms that seem to be the most competitive: the auction class, from which the GCA algorithms descend, and the class of Dijkstra-like algorithms.

In [15,21], and [14] we considered as representative of the first class the forward modified auction (MA) [16], which in preliminary tests turned out to be the fastest of all its competitors on all instances. For the second class we chose the forward S-HEAP and the forward S-DIAL of [24], because they are widely known state-of-the-art algorithms and because all implementations of Dijkstra’s forward algorithm often behave similarly [17].

The experiments were conducted on a Digital Alpha 4100 running Digital UNIX V4.0B. All the auction programs were written in C and compiled with gcc ver. 2.7.2, while for the S-HEAP and S-DIAL algorithms we used the Fortran implementation due to [25].

To cover most of the instances encountered in practice, different kinds of problems were studied; for each family of problems different sizes were considered, and for each size we recorded the running time of each algorithm averaged over ten different randomly generated instances.

The types of problems taken into account were square and long grid networks, generated using the GRIDGEN code by Y. Lee and J.B. Orlin, and networks generated using the NETGEN code [26] with different densities. For each of these networks we solved the SPT problem.

The following three groups of graphs are considered:

1) Square and long grid-like graphs, generated by using the GRIDGEN code, whose node number n varies from 1000 to 20000. The source is always placed at one end of the grids, so the diameter is large. All grids are planar and square-meshed, with average node degree equal to 4. The length/height ratio of the long grids is approximately 30.

2) General random networks generated by using the NETGEN code [26]. Also in this case n ranges from 1000 to 20000, with arc number m equal to both 4n and 10n.

3) Complete networks, whose number of nodes n ranges from 100 to 500.


In all cases the arc lengths are randomly chosen in the range 0-10000; for each instance we solved the SPP from the node labelled 1 by the network generator to all the other nodes. In all instances GCA2 outperformed MA.

Mean computational time on square grids

Shortest Path Tree Algorithms, Figure 2
Mean computational time on elongated grids

The figures show the mean computational time of the tested algorithms in seconds, on a logarithmic scale.

MA and GCA2 behave similarly, but in every instance GCA2 saves at least 50% of the computational time required by MA.

Recently (as of 1999), we have developed a new implementation of the algorithm VSA2 in which the set containing the virtual sources is a hash table. Although the worst-case complexity has not been improved, the new algorithm reduces the computational time of VSA2 in practice. The testing phase of this new algorithm is at the moment limited to general dense random graphs, but the results obtained suggest that it is competitive with the Dijkstra-like algorithms.

Mean computational time on sparse graphs

Mean computational time on dense graphs

Mean computational time on complete graphs


Conclusions


This article is a brief survey of the most popular algorithms and of some novel approaches for solving shortest path problems. The new methods proposed are variations of the auction technique, based on topological transformations of the graph performed during the iterations of the algorithms. The main idea relies on the property, shared by every auction algorithm for the shortest path problem, that if a node is included in the candidate path at a certain step, then the shortest path to this node has been found. Different realizations of this idea lead to different algorithms: graph collapsing algorithms and virtual source algorithms. Two graph collapsing algorithms are presented. The first one (GCA1) is the direct application of the idea to the standard auction algorithm of Bertsekas as improved in [27], while the second one (GCA2) applies the same concept to the modified auction of [16]. The family of virtual source algorithms was developed by reinforcing the distinctive characteristics of the standard auction method.

For all the algorithms an upper bound on the total number of required operations is derived. An extensive set of numerical tests has been carried out, and the results show that the algorithm GCA1 is of theoretical relevance only. GCA2, instead, proved much more efficient in all cases. It seems to be one of the best algorithms for solving the shortest path problem and extends the applicability of the auction approach. GCA2 narrows the performance gap between auction algorithms and the best Dijkstra-like algorithms, while the most recent virtual source algorithm seems to eliminate it completely.
Introduction


In this chapter, we propose an enhanced state-task 
network (STN) mixed-integer linear programming 
(MILP) model for the short-term scheduling of multi- 
product and multipurpose batch plants with interme- 
diate due dates. The proposed approach extends the 
continuous-time scheduling model which was origi- 
nally developed by Floudas and coworkers [6,7,8,10]. 
This enhanced formulation is able to account for lim- 
ited, renewable resources, various storage policies, in- 
cluding unlimited intermediate storage (UIS), finite in- 
termediate storage (FIS), no intermediate storage (NIS), 


and zero-wait (ZW) conditions, and incorporates sev- 
eral additional features, including variable batch sizes 
and processing times, batch mixing and splitting, and 
sequence-dependent changeover times. The enhanced 
formulation still utilizes a continuous-time representa- 
tion employing a necessary number of event points of 
unknown location corresponding to the activation of 
a task. However, tasks are allowed to continue over sev- 
eral, consecutive event points, enabling resource and 
storage quantities to be correctly determined at each 
task activation. The full mathematical model and ad- 
ditional computational results can be found in [9]. 
There are several other models in the scheduling lit- 
erature which are capable of accounting for resource 
constraints as well as mixed storage policies in short- 
term scheduling problems. Maravelias and Gross- 
mann [11] developed a global event based continuous- 
time MILP model which utilizes the STN approach and 
addresses the general problem of batch scheduling, in- 
cluding resource constraints, variable batch sizes and 
processing times, various storage policies, batch mix- 
ing and splitting, and sequence-dependent changeover 
times. Their model utilizes the idea of task decou- 
pling, eliminates binaries for unit assignment and the 
continuous variables for start times of tasks, proposes 
a new class of valid tightening inequalities, and was 
the first general STN-based model capable of han- 
dling resource considerations. However, owing to the 
continuous-time representation used, which is com- 
mon for all units, their formulation always requires an 
extra event point for the end of the last task, generat- 
ing larger and more complex models than the proposed 
formulation. In addition, Castro et al. [2] presented 
a general, continuous-time MILP model for scheduling of batch processes based on the resource-task network (RTN) representation. It uses a global event-based representation of time and treats all types of resources in a unified way, so that no special sequencing constraints are required. The authors claim that it
is a simpler and less degenerate mathematical model 
than other RTN continuous-time formulations, such 
as the model of Schilling [13]. In later work, Castro 
et al. [3] extended their RTN formulation to consider 
continuous processing tasks, better constraints to han- 
dle ZW conditions for batch tasks, and tighter timing 
constraints. The authors also extended their formula- 
tion to deal with sequence-dependent cleaning tasks. 
Conceptual Model Enhancements:
Splitting of the Tasks 


As previously mentioned, the proposed formulation 
allows all tasks to continue over several consecutive 
event points. This change was implemented so that the 
amount of resource utilized by a task is correctly de- 
termined in relationship to all other tasks which utilize 
the same resource at all instances of time. For exam- 
ple, consider the production schedule and associated 
resource utilization given in Fig. 1 for a process which 
has two tasks that utilize the same resource. Both tasks 
(T1 and T2) have a processing time of 1 and utilize ten 
units of the resource with each batch. The schedule on 
the left, which is determined with the original schedul- 
ing model, does not recognize that task T1 is active 
when task T2 begins. At time 2 when task T2 starts, it 
does not see that task T1 is active and thus determines 
that only ten units of the resource are currently being 
used instead of 20. In the schedule on the right, however, which was generated with the proposed formulation, task T1 is split over two consecutive event
points. Thus, when task T2 starts at time 1, task T1 is 
active at that time point and the calculated resource uti- 
lization is correct. 

Short-Term Scheduling of Batch Processes with Resources, Figure 1
Task splitting for resource utilization (left: schedule and resource profile from the original model; right: from the proposed formulation)

Note that splitting of processing tasks in our continuous-time model is also necessary to account for several other features inherent in scheduling problems in addition to limited, renewable resources. For instance, in order to model FIS for a state that can be produced or consumed by more than one task, it may be necessary to allow the tasks to continue over more than one event point. If the tasks have different processing times or do not start and/or finish at the same time, then task splitting has to be incorporated to ensure that storage limits are maintained. Also, for STNs that employ recycle loops, the intermediate state recycled in the loop is consumed by a task that occurs earlier in the STN than the task that produces the recycled state, creating a complicated time dependence between the two tasks that must be maintained in order to avoid violating material balances. If these related tasks have different processing times, it may be necessary to allow task splitting for the task(s) with longer processing times in order to determine the best possible solutions.

As a consequence of tasks extending over multiple 
event points, each processing task must have two sets of 
binary variables and one set of continuous variables associated with it. The binary variable ws(i, n) indicates that task i starts at event point n, while the binary variable wf(i, n) indicates that task i ends at event point n. In addition, the continuous variable w(i, n) indicates that task i is active at event point n, regardless of whether the task is starting, finishing, or just processing at that event point. In response to this change, the enhanced model has an expanded set of constraints in order to keep accurate track of the utilization of units as well as of the timing and sequence of tasks that have been split. In addition, two new sets of tasks are introduced into the mathematical model. One set, (i^st), represents the storage of intermediate states, while the other set, (u), gives the utilization of a resource. New constraints are then introduced into the
enhanced model in order to relate the timing and se- 
quence of these new tasks with their associated process- 
ing tasks so that specified limits can be enforced. 


Formulation 


The proposed formulation requires the following in- 
dices, sets, parameters, and variables: 


Indices:
$i$          processing tasks;
$i^{st}$     storage tasks;
$j$          units;
$k$          orders;
$n$          event points representing the beginning of a task;
$s$          states;
$u$          utilities;

Sets:
$I$          processing tasks;
$I^{st}_s$   storage tasks for state s;
$I_j$        tasks which can be performed in unit j;
$I_k$        tasks which process order k;
             tasks which are processing or storing;
$I^p_s$      tasks which produce state s;
$I^c_s$      tasks which consume state s;
$I_u$        tasks which consume utility u;
$J$          units;
$J_i$        units which are suitable for performing task i;
$K$          orders;
$K_i$        orders which are processed by task i;
$K_s$        orders which produce state s;
$N$          event points within the time horizon;
$S$          states;
             states with finite intermediate storage;
             states with no intermediate storage;
$S^p$        states which are final products;
$S^r$        states which are raw materials;
$S^{ZW}$     states with ZW constraint;
$U$          utilities;


Parameters:
$\alpha_{ij}$        constant term of processing time of task i in unit j;
$\beta_{ij}$         variable term of processing time of task i in unit j;
$\delta_{iu}$        variable term of consumption of utility u by task i;
$\gamma_{iu}$        constant term of consumption of utility u by task i;
$\rho_{is}$          proportion of state s produced or consumed by task i;
$am_k$               amount of order k;
$av_u$               maximum availability of utility u;
$cap^{max}_{ij}$     maximum capacity for task i in unit j;
$cap^{min}_{ij}$     minimum capacity for task i in unit j;
$cap^{st}_s$         capacity of storage for state s;
$dem_s$              demand of state s;
$due_k$              due date of order k;
$H$                  time horizon;
                     price of state s;
$ST^0_s$             initial available amount of state s;
$ST^{max}_s$         maximum amount of state s;


Continuous variables:
$B(i,j,n)$           amount of material undertaking task i in unit j at event point n;
$B^s(i,j,n)$         amount of material starting processing at event point n;
$B^f(i,j,n)$         amount of material finishing processing at event point n;
$BU(i,u,n)$          amount of utility u consumed by task i at event point n;
$B_u(u,n)$           remaining level of utility u at event point n;
$B_{st}(i^{st},n)$   amount of material stored by storage task i^st at event point n;
$D(s,n)$             amount of state s delivered at event point n;
$MS$                 makespan;
$ST(s,n)$            amount of state s at event point n;
$STF(s)$             final amount of state s at the end of the time horizon;
$ST0(s)$             initial amount of state s at the beginning of the time horizon;
$T^s(i,j,n)$         time at which task i starts in unit j at event point n;
$T^f(i,j,n)$         time at which task i finishes in unit j at event point n;
$T^s(i^{st},n)$      time at which storage task i^st starts at event point n;
$T^f(i^{st},n)$      time at which storage task i^st finishes at event point n;
$T^s(u,n)$           starting time of a change in utility u at event point n;
$T^f(u,n)$           finishing time of a change in utility u at event point n;
$TT^s(j,n)$          starting time of the active task in unit j at event point n;
$TT^f(j,n)$          finishing time of the active task in unit j at event point n;
$w(i,n)$             indicates if task i is activated at event point n;

Binary variables:
$ws(i,n)$            assigns the beginning of task i at event point n;
$wf(i,n)$            assigns the ending of task i at event point n;
$y(k,i,n)$           assigns the delivery of order k through task i at event point n.
On the basis of this notation, the mathematical model for the short-term scheduling of batch plants with mixed storage policy and resource constraints involves the following constraints.


Allocation Constraints 


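$$\sum_{i \in I_j} w(i,n) \le 1, \quad \forall j \in J,\ n \in N \tag{1}$$

$$w(i,n) = \sum_{n' \in N,\ n' \le n} ws(i,n') - \sum_{n' \in N,\ n' < n} wf(i,n'), \quad \forall i \in I,\ n \in N \tag{2}$$

$$\sum_{n \in N} ws(i,n) = \sum_{n \in N} wf(i,n), \quad \forall i \in I \tag{3}$$

$$ws(i,n) \le 1 - \sum_{n' \in N,\ n' < n} ws(i,n') + \sum_{n' \in N,\ n' < n} wf(i,n'), \quad \forall i \in I,\ n \in N \tag{4}$$

$$wf(i,n) \le \sum_{n' \in N,\ n' \le n} ws(i,n') - \sum_{n' \in N,\ n' < n} wf(i,n'), \quad \forall i \in I,\ n \in N \tag{5}$$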


Constraints (1) express the requirement that for each unit j and at each event point n, only one of the tasks that can be performed in this unit (i.e. i ∈ I_j) should take place. Constraints (2) relate the continuous variable w(i, n) to the binary variables ws(i, n) and wf(i, n) so that w(i, n) takes the value 1 if task i is active at event point n. Constraints (3) require that each processing task i must both start and finish during the time horizon. Constraints (4) require that processing task i cannot start at event point n if it has started at an earlier event point n' and has not finished before event point n. Constraints (5) require that processing task i cannot finish at event point n unless it has started at the same or an earlier event point n' and has not finished before event point n.
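For a single task, the allocation relations can be checked on a candidate 0/1 assignment with a few lines of Python. This is an illustrative sketch only: event points are indexed from 0, ws and wf are plain lists for one task, the sums follow the form of (2)-(5) given above (starts counted up to n, finishes counted before n), and the function names are hypothetical. In the actual MILP these relations are constraints handed to a solver, not checks run afterwards.

```python
def activity(ws, wf):
    """w(i, n) from constraint (2) for one task, given 0/1 lists over event points."""
    w, started, finished = [], 0, 0
    for n in range(len(ws)):
        started += ws[n]                 # sum of ws(i, n') for n' <= n
        w.append(started - finished)     # finished = sum of wf(i, n') for n' < n
        finished += wf[n]
    return w


def task_is_feasible(ws, wf):
    """Check constraints (3)-(5) for one processing task."""
    if sum(ws) != sum(wf):                        # (3): every start has an end
        return False
    started = finished = 0
    for n in range(len(ws)):
        if ws[n] > 1 - started + finished:        # (4): no restart while active
            return False
        started += ws[n]
        if wf[n] > started - finished:            # (5): end only after (or at) a start
            return False
        finished += wf[n]
    return True


def unit_is_feasible(activities):
    """Constraint (1): at most one active task per unit at each event point."""
    return all(sum(column) <= 1 for column in zip(*activities))


# A task that starts at the first event point and ends at the second (task splitting)
ws_t1, wf_t1 = [1, 0, 0], [0, 1, 0]
print(activity(ws_t1, wf_t1))                   # [1, 1, 0]
print(task_is_feasible(ws_t1, wf_t1))           # True
print(task_is_feasible([1, 1, 0], [0, 1, 1]))   # False: restarts before finishing
```

The split task is active (w = 1) at both event points it spans, which is exactly what lets the resource and storage constraints see it at the second event point.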


Capacity Constraints 


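$$cap^{min}_{ij}\, w(i,n) \le B(i,j,n) \le cap^{max}_{ij}\, w(i,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{6}$$

$$B_{st}(i^{st},n) \le cap^{st}_s, \quad \forall i^{st} \in I^{st}_s,\ n \in N \tag{7}$$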


The first set of constraints, (6), expresses the requirement for the batch size of a processing task i in unit j, B(i, j, n), to be greater than or equal to the minimum amount of material, cap^min_ij, and less than or equal to the maximum amount of material, cap^max_ij, that can be processed by task i in unit j. Constraints (7) represent the maximum available storage capacity for each storage task i^st at each event point n.


Batch-Size Matching Constraints: Processing Tasks


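$$B(i,j,n) \le B(i,j,n-1) + cap^{max}_{ij}\,[1 - w(i,n-1) + wf(i,n-1)], \quad \forall i \in I,\ j \in J_i,\ n \in N,\ n > 1 \tag{8}$$

$$B(i,j,n) \ge B(i,j,n-1) - cap^{max}_{ij}\,[1 - w(i,n-1) + wf(i,n-1)], \quad \forall i \in I,\ j \in J_i,\ n \in N,\ n > 1 \tag{9}$$

$$B^s(i,j,n) \le B(i,j,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{10}$$

$$B^s(i,j,n) \le cap^{max}_{ij}\, ws(i,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{11}$$

$$B^s(i,j,n) \ge B(i,j,n) - cap^{max}_{ij}\,[1 - ws(i,n)], \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{12}$$

$$B^f(i,j,n) \le B(i,j,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{13}$$

$$B^f(i,j,n) \le cap^{max}_{ij}\, wf(i,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{14}$$

$$B^f(i,j,n) \ge B(i,j,n) - cap^{max}_{ij}\,[1 - wf(i,n)], \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{15}$$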


Constraints (8) and (9) represent the relationship between the batch size of task i in unit j at two consecutive event points n − 1 and n. These constraints are required because tasks can extend over several event points and the batch sizes at these consecutive event points must be consistent: if task i is active at event point n − 1 and does not finish there, the bracketed term vanishes and B(i, j, n) = B(i, j, n − 1); otherwise both constraints are relaxed. Constraints (10)-(12) relate the variables B(i, j, n) and B^s(i, j, n), where B^s(i, j, n) is the amount of material that is starting to be processed at event point n. Similarly, constraints (13)-(15) relate the variables B(i, j, n) and B^f(i, j, n), where B^f(i, j, n) is the amount of material that is finishing being processed at event point n.


Batch-Size Matching Constraints: Utility Tasks 


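$$BU(i,u,n) = \gamma_{iu}\, w(i,n) + \delta_{iu}\, B(i,j,n), \quad \forall u \in U,\ i \in I_u,\ j \in J_i,\ n \in N \tag{16}$$

$$\sum_{i \in I_u} BU(i,u,n) + B_u(u,n) = \sum_{i \in I_u} BU(i,u,n-1) + B_u(u,n-1), \quad \forall u \in U,\ n \in N,\ n > 1 \tag{17}$$

$$\sum_{i \in I_u} BU(i,u,n) + B_u(u,n) = av_u, \quad \forall u \in U,\ n \in N,\ n = 1 \tag{18}$$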


where γ_iu and δ_iu are the constant and variable terms of the amount of utility u consumed by task i in unit j at event point n, BU(i, u, n) is the amount of utility u consumed by task i at event point n, and B_u(u, n) is the amount of utility u available at event point n. Constraints (16) represent the amount of utility required by the unit to process B(i, j, n) of material while performing task i. Constraints (17) express the mass balance on the utilities, requiring that the total amount of utility u (consumed plus still available) at event point n is equal to that at event point n − 1. Constraints (18) express the requirement that the amount of utility u at the first event point, including the amount available, B_u(u, 1), and the amount consumed, Σ_{i ∈ I_u} BU(i, u, 1), must be equal to the original amount of utility u available, av_u.
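Because (17) and (18) keep the total "consumed plus available" equal to av_u at every event point, the remaining level B_u(u, n) is simply av_u minus the consumption given by (16). The Python sketch below evaluates this for one utility on a hypothetical schedule modelled on Figure 1; the availability of 30 units and the batch sizes are illustrative values, not data from the article, and the function name is hypothetical. With T1 split over two event points, the second event point correctly shows 20 units in use.

```python
def utility_levels(av_u, tasks):
    """Remaining level B_u(u, n) of one utility implied by (16)-(18) (sketch).

    tasks: list of (gamma_iu, delta_iu, w, B) for the tasks in I_u, where w[n]
    is w(i, n) and B[n] is the batch size B(i, j, n) in the unit running i.
    Since (17)-(18) keep sum_i BU(i, u, n) + B_u(u, n) equal to av_u,
    B_u(u, n) = av_u - sum_i BU(i, u, n).
    """
    levels = []
    for n in range(len(tasks[0][2])):
        consumed = sum(g * w[n] + dl * b[n] for g, dl, w, b in tasks)   # (16)
        levels.append(av_u - consumed)
    return levels


# Two tasks using 10 units each; T1 is split so it is still active when T2 starts
t1 = (10, 0, [1, 1, 0], [5, 5, 0])
t2 = (10, 0, [0, 1, 0], [0, 5, 0])
levels = utility_levels(30, [t1, t2])
print(levels)                          # [20, 10, 30]
print(all(b >= 0 for b in levels))     # True: availability respected
```

Had T1 not been split (w = [1, 0, 0]), the second event point would report only 10 units in use, which is the undercounting shown on the left of Figure 1.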


Material Balances 


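$$\begin{aligned}
ST(s,n) = {} & ST(s,n-1) - D(s,n) + \sum_{i \in I^p_s} \rho_{is} \sum_{j \in J_i} B(i,j,n-1) + \sum_{i \in I^c_s} \rho_{is} \sum_{j \in J_i} B(i,j,n) \\
& + \sum_{i^{st} \in I^{st}_s} B_{st}(i^{st},n-1) - \sum_{i^{st} \in I^{st}_s} B_{st}(i^{st},n), \qquad \forall s \in S,\ n \in N,\ n > 1
\end{aligned} \tag{19}$$

$$ST(s,n) = ST0(s) + \sum_{i \in I^c_s} \rho_{is} \sum_{j \in J_i} B(i,j,n) - \sum_{i^{st} \in I^{st}_s} B_{st}(i^{st},n), \quad \forall s \in S,\ n \in N,\ n = 1 \tag{20}$$

$$STF(s) = ST(s,n) - D(s,n) + \sum_{i \in I^p_s} \rho_{is} \sum_{j \in J_i} B(i,j,n) + \sum_{i^{st} \in I^{st}_s} B_{st}(i^{st},n), \quad \forall s \in S,\ n \in N,\ n = |N| \tag{21}$$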


According to constraints (19), the amount of material of state s at event point n is equal to that at event point n − 1, increased by any amounts produced or stored at event point n − 1, decreased by any amounts consumed or stored at event point n, and decreased by the amount required by the market at event point n, D(s, n). Since the consumption terms enter (19) and (20) with a plus sign, the proportions ρ_is of consumed states must be negative for the balance to decrease. Constraints (20) and (21) represent the material balance on state s at the first and last event points, respectively. The amount of state s at the first event point is equal to the initial amount, ST0(s), decreased by any amounts consumed or stored at the first event point. The total amount of state s at the end of the last event point, STF(s), is equal to the amount at the beginning of the last event point, ST(s, n), increased by any amounts produced or stored at the last event point and decreased by the amount required by the market at the last event point.


Duration Constraints 
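$$T^f(i,j,n) \ge T^s(i,j,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{22}$$

$$T^f(i,j,n) \le T^s(i,j,n) + H\, w(i,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \tag{23}$$

$$T^s(i,j,n) \le T^f(i,j,n-1) + H\,[1 - w(i,n-1) + wf(i,n-1)], \quad \forall i \in I,\ j \in J_i,\ n \in N,\ n > 1 \tag{24}$$

$$\begin{aligned}
T^f(i,j,n') - T^s(i,j,n) \ge {} & \alpha_{ij}\, ws(i,n) + \beta_{ij}\, B(i,j,n) - H\,[1 - ws(i,n)] - H\,[1 - wf(i,n')] \\
& - H \sum_{n'' \in N,\ n \le n'' < n'} wf(i,n''), \qquad \forall i \in I,\ j \in J_i,\ n \in N,\ n' \in N,\ n \le n'
\end{aligned} \tag{25}$$

$$\begin{aligned}
T^f(i,j,n') - T^s(i,j,n) \le {} & \alpha_{ij}\, ws(i,n) + \beta_{ij}\, B(i,j,n) + H\,[1 - ws(i,n)] + H\,[1 - wf(i,n')] \\
& + H \sum_{n'' \in N,\ n \le n'' < n'} wf(i,n''), \qquad \forall i \in I,\ j \in J_i,\ n \in N,\ n' \in N,\ n \le n'
\end{aligned} \tag{26}$$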


Constraints (22) represent the relationship between the starting and finishing times of task i in unit j
at event point n. Because tasks can extend over multiple
event points, the finishing time is not fixed by the start
time; it must only be greater than or equal to the start time.
Constraints (23) also represent the relationship between the starting and finishing times of task i in unit j
at event point n: if task i does not take place at event
point n, then, together with constraints (22), they set the finishing
time equal to the starting time. Constraints (24)
relate the starting time of task i in unit j at event point
n to the finishing time of the same task in the same unit
at the previous event point, n − 1. These constraints are
relaxed unless task i is active and does not finish processing at event point n − 1. In this case, task i must
extend to the following event point, n, so that this constraint, $T^s(i,j,n) \le T^f(i,j,n-1)$, together with the sequencing constraints (29), $T^s(i,j,n) \ge T^f(i,j,n-1)$,
results in the two times being equal. Constraints (25)
and (26) relate the starting time of task i in unit j at
event point n to its finishing time at a later event
point n'.
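The big-M terms in (25) and (26) are easiest to see numerically. This minimal Python sketch (the function and argument names are hypothetical, and event points are indexed from 0 rather than 1) evaluates the lower and upper bounds that (25) and (26) place on $T^f(i,j,n') - T^s(i,j,n)$ for given binary values. When the task starts at n, finishes at n', and does not finish in between, both bounds collapse to $\alpha_{ij}\, ws(i,n) + \beta_{ij}\, B^s(i,j,n)$.

```python
def duration_bounds(alpha, beta, batch_start, ws, wf, n, n_end, horizon):
    """Bounds that (25)-(26) impose on T^f(i,j,n_end) - T^s(i,j,n), n <= n_end.

    ws, wf: 0/1 values ws(i, .) and wf(i, .) indexed by event point (0-based)
    batch_start: B^s(i, j, n), the batch size at the start of the task
    Returns (lower, upper).
    """
    # Sum of wf(i, n'') for n <= n'' < n_end: positive if the task already
    # finished at an earlier event point, which relaxes both constraints.
    finished_earlier = sum(wf[k] for k in range(n, n_end))
    relax = horizon * ((1 - ws[n]) + (1 - wf[n_end]) + finished_earlier)
    nominal = alpha * ws[n] + beta * batch_start
    return nominal - relax, nominal + relax


print(duration_bounds(1.0, 0.5, 4.0, [1, 0, 0], [0, 0, 1], 0, 2, 12.0))  # (3.0, 3.0)
print(duration_bounds(1.0, 0.5, 4.0, [1, 0, 0], [1, 0, 1], 0, 2, 12.0))  # (-9.0, 15.0)
```

In the second call the task is marked as having already finished at the first event point, so the sum term relaxes both bounds by H and the pair (n, n') no longer constrains the duration.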


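$$T^f(i^{st},n) \ge T^s(i^{st},n), \quad \forall i^{st} \in I^{st},\ n \in N \qquad (27)$$

$$T_u^f(u,n) \ge T_u^s(u,n), \quad \forall u \in U,\ n \in N \qquad (28)$$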


The first set of constraints relate the starting and finishing times of a storage task $i^{st}$ so that the finishing time
must always be greater than or equal to the start time.
The second set of constraints relate the starting and fin- 
ishing times of changes in the amount of utility u so that 
the finish time must always be greater than or equal to 
the start time. 


Sequence Constraints: Processing Tasks 


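$$T^s(i,j,n) \ge T^f(i,j,n-1), \quad \forall i \in I,\ j \in J_i,\ n \in N,\ n > 1 \qquad (29)$$

$$T^s(i,j,n) \ge T^f(i',j,n-1) - H\left[1 - w(i',n-1)\right], \quad \forall j \in J,\ i \in I_j,\ i' \in I_j,\ i \ne i',\ n \in N,\ n > 1 \qquad (30)$$

$$T^s(i,j,n) \ge T^f(i',j',n-1) - H\left[1 - wf(i',n-1)\right], \quad \forall s \in S,\ i \in I_s^c,\ i' \in I_s^p,\ j \in J_i,\ j' \in J_{i'},\ j \ne j',\ n \in N,\ n > 1 \qquad (31)$$

$$T^s(i,j,n) \le T^f(i',j',n-1) + H\left[2 - wf(i',n-1) - ws(i,n)\right], \quad \forall s \in S^{ZW} \cup S^{NIS} \cup S^{FIS},\ i \in I_s^c,\ i' \in I_s^p,\ j \in J_i,\ j' \in J_{i'},\ j \ne j',\ n \in N,\ n > 1 \qquad (32)$$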
The sequence constraints in (29) state that task i starting at event point n should start after the end of the
same task performed in the same unit j that finished at the previous event point, n − 1. The constraints
in (30) are written for tasks i and i' that are performed
in the same unit j at event points n and n − 1, respectively: if task i' is active in unit j at event point n − 1,
task i cannot start before task i' has finished. Constraints (31) relate
tasks i and i' that are performed in different units
j and j' but take place consecutively according to the
production recipe. Constraints (32) are written for different tasks i and i' that take place consecutively with
the "zero-wait" (ZW) condition owing to storage restrictions on the intermediate material. Combined with
constraints (31), these constraints enforce that task i in
unit j at event point n starts immediately after the end
of task i' in unit j' at event point n − 1 if both tasks are
activated.
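The way (29) and (30) combine into a single lower bound on a task's start time can be illustrated with a minimal Python sketch. The function `earliest_start` is hypothetical and assumes that times are nonnegative, so the bound never drops below zero; each predecessor contributes its finishing time only when its activation variable is 1, otherwise the big-M term pushes its contribution below zero.

```python
def earliest_start(prev_tasks, horizon):
    """Lower bound on T^s(i, j, n) implied by constraints (29)-(30).

    prev_tasks: list of (finish_time, active) pairs, one per task i' in unit j
    at event n-1, where finish_time is T^f(i', j, n-1) and active is
    w(i', n-1) in {0, 1}. For the same task i (constraint (29)) pass active=1,
    since (29) has no big-M term.
    Times are assumed to be nonnegative, so the bound never drops below 0.
    """
    bound = 0.0
    for finish_time, active in prev_tasks:
        # T^s(i,j,n) >= T^f(i',j,n-1) - H * [1 - w(i',n-1)]
        bound = max(bound, finish_time - horizon * (1 - active))
    return bound


# Task i' was active and finished at 3.5; task i'' was idle (w = 0)
print(earliest_start([(3.5, 1), (6.0, 0)], horizon=12.0))  # 3.5
```

The idle task's finishing time of 6.0 is discarded because 6.0 − 12.0 is negative, which is exactly the relaxation the big-M term is meant to provide.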


Sequence Constraints: Storage Tasks 


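$$T^s(i,j,n) \ge T^f(i^{st},n-1), \quad \forall s \in S,\ i \in I_s^c,\ j \in J_i,\ i^{st} \in I_s^{st},\ n \in N,\ n > 1 \qquad (33)$$

$$T^s(i,j,n) \le T^f(i^{st},n-1) + H\left[1 - ws(i,n)\right], \quad \forall s \in S^{FIS},\ i \in I_s^c,\ j \in J_i,\ i^{st} \in I_s^{st},\ n \in N,\ n > 1 \qquad (34)$$

$$T^s(i^{st},n) \ge T^f(i',j',n-1) - H\left[1 - wf(i',n-1)\right], \quad \forall s \in S,\ i' \in I_s^p,\ j' \in J_{i'},\ i^{st} \in I_s^{st},\ n \in N,\ n > 1 \qquad (35)$$

$$T^s(i^{st},n) \le T^f(i',j',n-1) + H\left[1 - wf(i',n-1)\right], \quad \forall s \in S^{FIS},\ i' \in I_s^p,\ j' \in J_{i'},\ i^{st} \in I_s^{st},\ n \in N,\ n > 1 \qquad (36)$$

$$T^s(i^{st},n) = T^f(i^{st},n-1), \quad \forall i^{st} \in I^{st},\ n \in N,\ n > 1 \qquad (37)$$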


The first two sets of constraints relate the starting time
of a processing task i at an event point n to the finishing time of a storage task $i^{st}$ at the previous event
point, n − 1. Note that (34) is only written for states s
with FIS. Thus, if task i starts at event point n and
consumes a state s that requires FIS, then we have
$T^s(i,j,n) = T^f(i^{st},n-1)$ for all storage tasks $i^{st}$ for
state s. Constraints (35) and (36) relate the starting time
of a storage task $i^{st}$ at an event point n to the finishing time of the processing task i' at the previous event point, n − 1. Similar
to constraints (33) and (34), these constraints enforce
the timing for a processing task that produces a FIS
state and a storage task that stores the same FIS state.
Constraints (37) relate the starting and finishing times of
a storage task $i^{st}$ at two consecutive event points. Together
with constraints (33)-(36), they ensure that the timing
for storage of FIS states is enforced so that storage limitations are not violated.
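The effect of (33) and (34) on the start time of a consuming task can be checked with a small sketch. The Python function `fis_start_window` below is hypothetical. It returns the interval of start times that (33), (34) and the horizon bound from (48) leave open for a task i that draws an FIS state from a storage task $i^{st}$. When $ws(i,n) = 1$ the interval collapses to a single point, which is the equality stated above.

```python
def fis_start_window(storage_finish, ws, horizon, is_fis=True):
    """Feasible interval for T^s(i, j, n) from (33)-(34) for one storage task.

    storage_finish: T^f(i^st, n-1) of the storage task for the consumed state
    ws: ws(i, n) in {0, 1}, i.e. whether task i starts at event point n
    is_fis: (34) is only written for FIS states; otherwise only (33) and the
            horizon bound T^s(i, j, n) <= H from (48) apply
    """
    lower = storage_finish                                      # (33)
    upper = horizon                                             # (48)
    if is_fis:
        upper = min(upper, storage_finish + horizon * (1 - ws))  # (34)
    return lower, upper


print(fis_start_window(4.0, 1, 12.0))                # (4.0, 4.0)
print(fis_start_window(4.0, 0, 12.0))                # (4.0, 12.0)
print(fis_start_window(4.0, 1, 12.0, is_fis=False))  # (4.0, 12.0)
```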


Sequence Constraints: Utility-Related Tasks 


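$$T^f(i,j,n-1) \ge T_u^s(u,n) - H\left[1 - w(i,n-1) + wf(i,n-1)\right], \quad \forall u \in U,\ i \in I_u,\ j \in J_i,\ n \in N,\ n > 1 \qquad (38)$$

$$T^f(i,j,n-1) \le T_u^s(u,n) + H\left[1 - w(i,n-1)\right], \quad \forall u \in U,\ i \in I_u,\ j \in J_i,\ n \in N,\ n > 1 \qquad (39)$$

$$T_u^s(u,n) \ge T^s(i,j,n) - H\left[1 - w(i,n)\right], \quad \forall u \in U,\ i \in I_u,\ j \in J_i,\ n \in N \qquad (40)$$

$$T_u^s(u,n) \le T^s(i,j,n) + H\left[1 - w(i,n)\right], \quad \forall u \in U,\ i \in I_u,\ j \in J_i,\ n \in N \qquad (41)$$

$$T_u^s(u,n) = T_u^f(u,n-1), \quad \forall u \in U,\ n \in N,\ n > 1 \qquad (42)$$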


The first two sets of constraints relate the finishing time
of a processing task i that utilizes utility u at an event
point n − 1 to the starting time of the usage of utility u
at the next event point. Constraints (40) and (41) relate
the starting time of the usage of utility u at event point
n to the starting time of the processing task i that utilizes utility u at the
current event point. Constraints (42) relate the starting
and the finishing times of the usage of utility u at two
consecutive event points. Together with
constraints (38)-(41), they ensure that the timing of changes in the
utility level is consistent, so that the amounts of utilities used can be monitored exactly and the specified limits
enforced.
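Constraints (38)-(42) keep utility intervals aligned with task start and finish times so that utility levels can be tracked. The Python sketch below shows what that tracking amounts to. It is a post-processing check on a finished schedule, not part of the MILP. Given the start time, finish time and draw rate of each task using one utility, it computes the peak simultaneous demand, which can then be compared with the maximum availability (e.g. 25 kg/min of CW in case 1). The function name and the task data are hypothetical.

```python
def peak_usage(intervals):
    """Peak simultaneous draw of one utility over a schedule.

    intervals: list of (start, finish, rate) for the tasks using the utility;
    each task is assumed to draw `rate` over the half-open interval [start, finish).
    """
    changes = []
    for start, finish, rate in intervals:
        changes.append((start, rate))
        changes.append((finish, -rate))
    # At equal times, releases (negative changes) are applied before new draws.
    changes.sort(key=lambda change: (change[0], change[1]))
    level = peak = 0.0
    for _, delta in changes:
        level += delta
        peak = max(peak, level)
    return peak


# Hypothetical CW draws in kg/min; the limit in case 1 is 25 kg/min
cw_tasks = [(0.0, 2.0, 10.0), (1.0, 3.0, 15.0), (3.0, 4.0, 20.0)]
print(peak_usage(cw_tasks) <= 25.0)  # True
```

Applying releases before new draws at equal times matches the half-open interval assumption: a task that finishes at t = 3 no longer counts against one that starts at t = 3.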


Order Satisfaction Constraints 


The order satisfaction constraints provided here are 
written for problems involving network-represented 
processes. Note that these constraints can easily be 
modified for the case of sequential process problems. 
This is done by relating orders to units in the same 
manner as orders are related to tasks below. 


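$$\sum_{i \in I_k} \sum_{n \in N} y(k,i,n) = 1, \quad \forall k \in K \qquad (43)$$

$$wf(i,n) \ge \sum_{k \in K_i} y(k,i,n), \quad \forall i \in I,\ n \in N \qquad (44)$$

$$D(s,n) = \sum_{k \in K_s} \sum_{i \in I_k} am_k\, y(k,i,n), \quad \forall s \in S^P,\ n \in N \qquad (45)$$

$$T^f(i,j,n) \le due_k + H\left[2 - y(k,i,n) - wf(i,n)\right], \quad \forall s \in S,\ k \in K_s,\ i \in I_k,\ j \in J_i,\ n \in N \qquad (46)$$

$$T^f(i,j,n) \ge due_k - H\left[2 - y(k,i,n) - wf(i,n)\right], \quad \forall s \in S,\ k \in K_s,\ i \in I_k,\ j \in J_i,\ n \in N \qquad (47)$$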


The first set of constraints ensures that each order k is met
exactly once; thus, each order is processed by only one
task i and is delivered at exactly one event point n. Constraints (44) relate the delivery of an order k through
task i to the activation of task i at event point n. Constraints (45) relate the amount of state s delivered at
event point n, D(s,n), to the amount of that state due
through order k. Thus, if state s has two orders associated with it, k1 and k2, of amounts $am_{k1}$ and $am_{k2}$,
respectively, and orders k1 and k2 are both delivered
at event point n, then the delivery of state s at event
point n is $D(s,n) = am_{k1} + am_{k2}$, and
the amount of the state delivered will be equal to the
amount ordered. Constraints (46) and (47) relate the
time that order k is due to the actual time that order k
is delivered.
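The two-order example above can be reproduced with a short Python sketch that checks (43) and builds D(s,n) as in (45) from a given set of order assignments. The function `deliveries_from_orders`, the order amounts and the task names are hypothetical. The assignments play the role of the triples (k, i, n) with y(k,i,n) = 1.

```python
from collections import defaultdict


def deliveries_from_orders(orders, assignments):
    """Check (43) and build D(s, n) as in (45).

    orders: dict mapping order k -> (state s, amount am_k)
    assignments: iterable of (k, i, n) triples with y(k, i, n) = 1
    Returns a dict mapping (s, n) -> D(s, n) for the nonzero deliveries.
    """
    times_met = defaultdict(int)
    delivered = defaultdict(float)
    for k, _task, n in assignments:
        times_met[k] += 1
        state, amount = orders[k]
        delivered[(state, n)] += amount
    # (43): every order must be met exactly once
    for k in orders:
        if times_met[k] != 1:
            raise ValueError(f"order {k} is met {times_met[k]} times")
    return dict(delivered)


orders = {"k1": ("P1", 30.0), "k2": ("P1", 20.0), "k3": ("P2", 10.0)}
y = [("k1", "T1", 3), ("k2", "T2", 3), ("k3", "T3", 5)]
print(deliveries_from_orders(orders, y))  # {('P1', 3): 50.0, ('P2', 5): 10.0}
```

Orders k1 and k2 for state P1 are both delivered at event point 3, so D(P1, 3) = 30 + 20 = 50, mirroring $D(s,n) = am_{k1} + am_{k2}$.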


Bound Constraints 

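$$
\begin{aligned}
&T^f(i,j,n) \le H, && \forall i \in I,\ j \in J_i,\ n \in N\\
&T^s(i,j,n) \le H, && \forall i \in I,\ j \in J_i,\ n \in N\\
&T_u^s(u,n) \le H, && \forall u \in U,\ n \in N\\
&T_u^f(u,n) \le H, && \forall u \in U,\ n \in N\\
&T_u^s(u,n) = 0, && \forall u \in U,\ n \in N,\ n = 1\\
&ST0(s) = 0, && \forall s \notin S^R\\
&ST0(s) \le ST^{max}(s), && \forall s \in S^R\\
&ST(s,n) = 0, && \forall s \in S^{ZW} \cup S^{NIS} \cup S^{FIS},\ n \in N\\
&ST(s,n) \le ST^{max}(s), && \forall s \in S,\ n \in N\\
&D(s,n) = 0, && \forall s \notin S^P,\ n \in N\\
&0 \le w(i,n) \le 1, && \forall i \in I,\ n \in N
\end{aligned}
\qquad (48)
$$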


These constraints represent bounds on several of the 
continuous variables. The starting and finishing times 
of processing tasks and changes in utility level must 
all be within the time horizon. The start time for the 
changes in utilities at the first event point is set to zero. 
The initial amounts of all non-raw-material states are 
set to zero, the intermediate amounts of all ZW, NIS, 
and FIS states are set to zero, and the amounts of all 
nonproduct deliveries are set to zero. Also, the continu- 
ous variable representing the activation of task i in unit 
j at event point n, w(i,n), must fall between zero and 
one. 


Objective Function 


There are several different objective functions that can
be employed with a general short-term scheduling
problem. Three of the most common types are reviewed
below.
Maximization of sales 


$$\max \sum_{s \in S^P} price_s \left[ \sum_{n \in N} D(s,n) + STF(s) \right] \qquad (49)$$


The objective function represents the maximization of 
the value of the final products. 
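Evaluating the sales objective for a given schedule is simple arithmetic. This hypothetical Python helper weights each final product's deliveries over all event points, plus its final amount STF(s), by its price. The product names, prices and amounts in the example are made up.

```python
def sales_value(prices, deliveries, final_amounts):
    """Sales objective: sum over products s of price_s * (sum_n D(s,n) + STF(s)).

    prices: dict product -> price
    deliveries: dict product -> list of D(s, n) over event points
    final_amounts: dict product -> STF(s)
    """
    total = 0.0
    for s, price in prices.items():
        delivered = sum(deliveries.get(s, []))
        total += price * (delivered + final_amounts.get(s, 0.0))
    return total


print(sales_value({"P1": 10.0, "P2": 5.0},
                  {"P1": [0.0, 20.0]},
                  {"P1": 30.0, "P2": 100.0}))  # 1000.0
```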
Minimization of makespan 


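$$\min\ MS \qquad (50)$$

$$\text{s.t.}\quad MS \ge T^f(i,j,n), \quad \forall i \in I,\ j \in J_i,\ n \in N \qquad (51)$$

$$STF(s) \ge dem_s, \quad \forall s \in S^P \qquad (52)$$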
where MS is the makespan. The objective function represents the minimization of the makespan of the process for a fixed demand for each state s, $dem_s$, contained
in the set of final products, $S^P$.

Minimization of order earliness 


$$\min \sum_{k \in K} \left[ due_k - \sum_{i \in I_k} \sum_{j \in J_i} \sum_{n \in N} T^f(i,j,n) \cdot y(k,i,n) \right] \qquad (53)$$


The objective function represents the minimization 
of the total earliness of all orders where the bilinear 
term, which is a product of a continuous and a binary 
variable, can be replaced with a continuous variable 
and supporting constraints using a Glover transforma- 
tion [4,5]. 
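The bilinear term $T^f(i,j,n) \cdot y(k,i,n)$ can be replaced using the standard linearization of a product of a bounded continuous variable and a binary variable. For $z = t \cdot y$ with $0 \le t \le H$ and $y \in \{0,1\}$, it imposes $z \le H y$, $z \le t$, $z \ge t - H(1 - y)$ and $z \ge 0$. The hypothetical Python helper below returns the interval these four inequalities leave for z. For binary y the interval pins z to exactly $t \cdot y$.

```python
def glover_interval(t, y, horizon):
    """Feasible range of z under the linearization of z = t * y.

    Assumes 0 <= t <= horizon:
        z <= horizon * y,  z <= t,  z >= t - horizon * (1 - y),  z >= 0
    """
    upper = min(horizon * y, t)
    lower = max(0.0, t - horizon * (1 - y))
    return lower, upper


print(glover_interval(7.5, 1, 30.0))    # (7.5, 7.5)
print(glover_interval(7.5, 0, 30.0))    # (0.0, 0.0)
print(glover_interval(7.5, 0.5, 30.0))  # (0.0, 7.5)
```

The last call uses a fractional y, as can occur in the LP relaxation. There the interval is no longer a single point, which is why the linearization is exact only at integer solutions.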


Methods 
Number of Event Points 


In this formulation, the number of event points is de- 
termined using the same approach as proposed in [6]. 
First, the problem is solved with a small number of 
event points to obtain a solution. Then, the number of 
event points is increased by one and the problem is re- 
solved to obtain a better solution. This is repeated un- 
til an additional increase in the number of event points 
does not result in any improvement in the objective 
function. 
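The iterative choice of the number of event points described above can be written as a simple loop. In this minimal Python sketch, `solve` is a hypothetical callback that builds and solves the model with a given number of event points and returns the optimal objective (assumed to be maximized, e.g. sales) or `None` if that model is infeasible. For a minimization objective the improvement test would be reversed. The starting value, the upper limit and the tolerance are arbitrary illustrative defaults.

```python
from typing import Callable, Optional, Tuple


def choose_event_points(solve: Callable[[int], Optional[float]],
                        start: int = 2,
                        max_points: int = 30,
                        tol: float = 1e-6) -> Tuple[Optional[int], Optional[float]]:
    """Increase the number of event points until the objective stops improving.

    solve(n) is a hypothetical callback that builds and solves the scheduling
    model with n event points and returns its optimal objective value (assumed
    to be maximized) or None if that model is infeasible.
    """
    best_n: Optional[int] = None
    best_obj: Optional[float] = None
    for n in range(start, max_points + 1):
        obj = solve(n)
        if obj is None:
            continue  # too few event points to find any feasible schedule
        if best_obj is not None and obj <= best_obj + tol:
            break  # adding one more event point gave no improvement
        best_n, best_obj = n, obj
    return best_n, best_obj


# Stand-in for real solves: objective values per number of event points
fake_results = {2: None, 3: 10000.0, 4: 13000.0, 5: 13000.0}
print(choose_event_points(fake_results.get))  # (4, 13000.0)
```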
Tightening Constraints


Following the tightening constraints suggested by Mar- 
avelias and Grossmann [11], similar constraints are in- 
troduced to tighten the relaxed solution of the proposed 
enhanced formulation. Specifically, constraints (54) 
tighten the formulation by enforcing the condition that 
the summation of the processing times of the tasks as- 
signed to a specific unit j should be less than or equal to 
the time horizon. 


$$\sum_{i \in I_j} \sum_{n \in N} \left[ \alpha_{ij}\, ws(i,n) + \beta_{ij}\, B^s(i,j,n) \right] \le H, \quad \forall j \in J \qquad (54)$$


Furthermore, this condition is also enforced for each 
unit j at each event point n as follows: 


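$$\sum_{i \in I_j} \sum_{n' \in N,\ n' \ge n} \left[ \alpha_{ij}\, ws(i,n') + \beta_{ij}\, B^s(i,j,n') \right] \le H - TT^s(j,n), \quad \forall j \in J,\ n \in N \qquad (55)$$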
where $TT^s(j,n)$ and $TT^f(j,n)$ are the starting time and
the finishing time, respectively, of the task active in unit
j at event point n. They are defined as follows:


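$$
\begin{aligned}
&TT^s(j,n) \le T^s(i,j,n) + H\left[1 - ws(i,n)\right], && \forall j \in J,\ i \in I_j,\ n \in N\\
&TT^s(j,n) \ge T^s(i,j,n) - H\left[1 - ws(i,n)\right], && \forall j \in J,\ i \in I_j,\ n \in N
\end{aligned}
\qquad (57)
$$

$$
\begin{aligned}
&TT^f(j,n) \le T^f(i,j,n) + H\left[1 - wf(i,n)\right], && \forall j \in J,\ i \in I_j,\ n \in N\\
&TT^f(j,n) \ge T^f(i,j,n) - H\left[1 - wf(i,n)\right], && \forall j \in J,\ i \in I_j,\ n \in N
\end{aligned}
\qquad (58)
$$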


Thus, constraints (55) enforce the condition that the
sum of the processing times of all tasks starting
in unit j at event point n or later must be less than or
equal to the amount of time remaining. Likewise, constraints (56) enforce the condition that the sum
of the processing times of all tasks finishing in unit j
before event point n must be less than or equal to the
amount of time that has passed up to the beginning of
event point n. Note that constraints (56) are only active
if a task finishes at the previous event point, n − 1; otherwise, $TT^f(j,n)$ will not have an exact value and the
constraint is relaxed.
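To see what (55) checks for a single unit, here is a minimal Python sketch that computes its slack at every event point of a candidate schedule. The helper `remaining_time_slack` is hypothetical. It takes the per-event processing time $\sum_{i \in I_j} [\alpha_{ij}\, ws(i,n) + \beta_{ij}\, B^s(i,j,n)]$ as input, and a negative entry would signal a violation. With $TT^s(j,1) = 0$, the first entry is also the slack of (54).

```python
def remaining_time_slack(proc_times, starts, horizon):
    """Slack in (55) for one unit j at every event point n.

    proc_times[n]: sum over tasks i in I_j of alpha_ij*ws(i,n) + beta_ij*B^s(i,j,n)
    starts[n]: TT^s(j, n), start time of the task active in unit j at event n
    Returns H - TT^s(j, n) - sum_{n' >= n} proc_times[n'] for each n;
    a negative entry means the schedule would violate (55).
    """
    slack = []
    for n in range(len(proc_times)):
        remaining = sum(proc_times[n:])
        slack.append(horizon - starts[n] - remaining)
    return slack


print(remaining_time_slack([2.0, 3.0, 1.5], [0.0, 2.5, 6.0], 12.0))  # [5.5, 5.0, 4.5]
```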

The addition of constraints (54)-(58) leads to re- 
laxed solutions with smaller sums of processing times, 
or smaller durations. This then leads to fewer activated binary variables, ws(i,n) and wf(i,n). Moreover, the continuous variables including the batch sizes
($B^s(i,j,n)$, $B(i,j,n)$, $B^f(i,j,n)$) and the amounts of
states (ST(s,n), STF(s)) are all bounded by the binary 
variables. Finally, because these continuous variables 
appear in the objective function, the addition of these 
constraints results in tighter relaxations. 


Sequential Processes 


Single-stage and multistage sequential processes are 
batch- or order-oriented and thus do not need to in- 
clude tasks or states or any of the constraints involving 
states. The model described in the previous section can 
be applied to sequential processes with a few modifi- 
cations. For instance, there are no defined tasks, states, 
batch sizes, or material amounts and all material bal- 
ances and capacity constraints are unnecessary. Thus, 
the basic constraints (1)-(5), (22)-(26), (29)-(32), and
part of (48) all apply. The order satisfaction constraints 
(43)-(47) need to be modified as previously detailed. 
If storage constraints are to be considered, then con- 
straints (27) and (33)-(37) need to be included and if 
resource constraints are to be considered, then con- 
straints (28) and (38)-(42) should also be included. 
Furthermore, all of the binary and continuous variables 
and their participating constraints should be modified 
similarly to the order satisfaction constraints to reflect 
a dependence on orders associated with units instead of 
tasks associated with units. 


Cases 


In this section, two example problems are presented 
to demonstrate the effectiveness of the proposed ap- 
proach. Both general network-represented and sequen- 
tial processes are considered. Comparisons with previ- 
ously published approaches are also provided. All ex- 
amples are implemented with GAMS 2.5 [1] and are 
solved using CPLEX 8.1 with a 3.00 GHz Linux workstation. The default GAMS/CPLEX options are used in
all runs with the exception that the CPLEX option for 
feasibility is activated and a relative optimality tolerance 
equal to 0.01% was used as the termination criterion. 


Case 1: Resource Constraints, Mixed Storage 
Policies, Variable Batch Sizes, and Processing Times 


This case comes from Maravelias and
Grossmann [11] and involves the STN given in Fig. 2.
This example includes resource constraints, mixed storage policies, and variable batch sizes, processing times,
and utility requirements. The plant consists of six units
involving ten processing tasks and fourteen states. Unlimited intermediate storage (UIS) is available for raw
materials F1 and F2, intermediates I1 and I2, and final
products P1-P3 and WS. Finite intermediate storage
(FIS) is available for states S3 and S4, while no intermediate storage (NIS) is available for states S2 and S6, and
a ZW policy applies to states S1 and S5. There are three
different renewable utilities: cooling water (CW), low-pressure steam (LPS), and high-pressure steam (HPS).
Tasks T2, T7, T9, and T10 require CW; tasks T1, T3, T5,
and T8 require LPS; and tasks T4 and T6 require HPS.
The maximum availabilities of CW, LPS, and HPS are
25, 40, and 20 kg/min, respectively. The corresponding
data for the example can be found in Tables 1 and 2.
The objective function is the maximization of sales, and
time horizons of 12 and 14 h are considered.
State-task network for case 1
Storage restrictions for case 1
Resource utilizations for case 1
Schedule for case 1 with a time horizon of 12 h. HPS high-pressure steam, LPS low-pressure steam, CW cooling water


For a time horizon of 12 h, the optimal sales are
$13,000 and eight event points are required. The production schedule and resource utilization levels can be
seen in Fig. 3. The problem involves 3318 constraints,
110 binary variables, and 1077 continuous variables. Its
optimal solution was found in 222 nodes and 1.71 s. For
a time horizon of 14 h, the optimal sales are $16,350 and
eight event points are required. The production schedule and resource utilization levels can be seen in Fig. 4.
The problem involves 3354 constraints, 109 binary variables, and 1077 continuous variables. Its optimal solution was found in 2869 nodes and 15.65 s. Note that in
both cases the limiting resource is CW, as can be seen
from the resource utilization levels: in both schedules,
tasks T2 and T7 running at the same time require the
maximum amount of CW available.


Case 1 was also solved with the model M* of Maravelias and Grossmann [11] to compare the two formulations. Although this example was solved in their original paper, we have re-solved it here in order to compare
the models using the same computational tools. The
model and solution statistics for both models can be
seen in Table 3. For the time horizon of 12 h using nine
time points, the model involved 2396 constraints, 180
binary variables, and 1408 continuous variables. The
same optimal solution of $13,000 was found in 23,235
nodes and 64.92 s. For the time horizon of 14 h using
ten time points, the model involved 2663 constraints,
200 binary variables, and 1564 continuous variables.
The same optimal solution of $16,350 was found in
22,625 nodes and 112.66 s. Note that the reported numbers of nodes and CPU seconds for both time horizons using our application of model M* are different
from those found by Maravelias and Grossmann [11].
For a time horizon of 12 h, they reported 2107 nodes,
while our application of model M* took 23,235 nodes.
For a time horizon of 14 h, they reported 60,070 nodes,
while our application of model M* took 22,625 nodes.
In addition, for both time horizons, model M* of Maravelias and Grossmann [11] requires at least one more
time point and thus involves more binary and continuous variables. Also, with the time horizon of 12 h it took over
100 times more nodes to solve, while with the time horizon
of 14 h it took nearly eight times more nodes. This
indicates that when a larger number of time points is
considered in a problem, the proposed model performs
better computationally than the model of Maravelias
and Grossmann [11], even when the objective is the
maximization of sales.

Schedule for case 1 with a time horizon of 14 h
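The node and CPU-time ratios behind this comparison can be recomputed directly from the figures reported above and in Table 3. The short Python snippet below only does that arithmetic. The dictionary layout and the helper name `speedups` are arbitrary choices for illustration.

```python
def speedups(stats):
    """Ratio of M* to proposed-model nodes and CPU seconds for each horizon."""
    result = {}
    for horizon, row in stats.items():
        m_nodes, m_cpu = row["M*"]
        p_nodes, p_cpu = row["proposed"]
        result[horizon] = (m_nodes / p_nodes, m_cpu / p_cpu)
    return result


# (nodes, CPU seconds) for case 1 as reported in the text
case1 = {
    "12 h": {"M*": (23235, 64.92), "proposed": (222, 1.71)},
    "14 h": {"M*": (22625, 112.66), "proposed": (2869, 15.65)},
}
for horizon, (nodes_ratio, cpu_ratio) in speedups(case1).items():
    print(f"{horizon}: {nodes_ratio:.1f}x nodes, {cpu_ratio:.1f}x CPU time")
# 12 h: 104.7x nodes, 38.0x CPU time
# 14 h: 7.9x nodes, 7.2x CPU time
```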


Case 2: Sequential Process 
with Order-Dependent Processing Times 


The second case is taken from Pinto and Gross- 
mann [12] and involves a sequential process contain- 
ing one stage with four parallel extruders of unequal 
capacity and with processing times depending on the 
order being processed. A total of 12 orders are due at 
specific times over a 30-day period. The corresponding 
processing rate and due date data for the example can 
be found in Table 4. The objective of the problem is to 
meet all orders while minimizing earliness, as seen in 
constraint (53). 

The optimal processing schedule is given in Fig. 5
with an objective function value of 1.026. The problem was modeled with the formulation of Ierapetritou
and Floudas [6] using only the allocation, duration, and
sequencing constraints (same task in the same unit and different tasks in the same
unit), along with the order satisfaction constraints outlined in (43)-(47) and the objective given in constraint (53).

Table 3
Model and solution statistics for case 1

Formulation | Maravelias and Grossmann [11] | Maravelias and Grossmann [11] | Proposed | Proposed
Horizon | 12 h | 14 h | 12 h | 14 h
Event points | 9 | 10 | 8 | 8
Binary variables | 180 | 200 | 110 | 110
Continuous variables | 1408 | 1564 | 1077 | 1077
Constraints | 2396 | 2663 | 3318 | 3354
LP relaxation | 18423.5 | 22186.7 | 19000 | 19000
Objective | $13000 | $16350 | $13000 | $16350
Nodes¹ | 23235 (2107) | 22625 (60070) | 222 | 2869
CPU time (s) | 64.92 | 112.66 | 1.71 | 15.65

¹Numbers in parentheses represent values reported by Maravelias and Grossmann [11]

Suppose now that, owing to limited manpower,
there is a hard constraint on the number of extruders
that can operate at the same time. We will consider
the case where three extruders may operate simultaneously (type 1) and the case where only two extruders
may operate simultaneously (type 2). For both cases,
we employ the model outlined in Sect. “Sequential Processes”, again using the order satisfaction constraints
and the objective function to minimize the earliness of
the orders. For type 1 with three extruders, the optimal objective function value is 1.895 and the production schedule can be seen in Fig. 6. For type 2, the optimal objective function value is 7.909 and the production schedule can be seen in Fig. 7. Model and solution
statistics for all three cases can be seen in Table 5.

Table 4
Data for case 2

 | Due date (day) | Processing time (days)
 | | j1 | j2 | j3 | j4
Transition | | 0.180 | 0.175 | 0.00 | 0.237

It can be seen from Table 5 that consideration of resource constraints in the form of limited manpower increases the computational complexity of the problem.
Resource considerations require a more complicated
model involving more variables and constraints. For instance, resource considerations require an event point
for every time the resource level changes, or in this
case, for each order. However, a problem without resources only requires as many event points as the maximum number of sequential tasks. For this example,
the simpler problem without resource considerations
only requires four event points, while the more complicated problem with resource constraints requires 12
event points, resulting in many more binary variables
and thus a much more complex problem.

Schedule for case 2 without limited manpower
Schedule for case 2 with three extruders (type 1)
Schedule for case 2 with two extruders (type 2)

In order to test the effectiveness of the proposed
formulation when used with sequential processes, we
performed a computational comparison for this example with the model of Pinto and Grossmann [12]. Although this example was solved in their original paper, the objective function used there was the maximization
of starting times instead of the minimization of earliness. So, we re-solved our model using the maximization of the starting times as the objective. Pinto and
Grossmann [12] reported optimal objectives of 269.10,
268.24, and 264.98 for the three cases of no resources,
resources limited to three extruders, and resources limited to two extruders. Our optimal objective function
values with the same objective were 269.10, 268.82, and
265.74, respectively. Thus, the proposed model found
improved schedules with a better objective function
value for the case where resources are limited to three
extruders and the case where resources are limited to
two extruders. This is not unexpected, however, since
the model used in [12] employs the concept of time slots,
and slot-based formulations restrict the time representation and can therefore result in suboptimal solutions. Note
that the model and solution statistics found in Table 5
for this problem using an objective function of minimization of order earliness are comparable to the model
and solution statistics determined using an objective of
maximization of start times. We do not make a comparison with the model and solution statistics presented
in [12] because the authors did not report the integrality gaps or other optimality criteria used; thus, a direct
comparison would not be meaningful.

Table 5
Model and solution statistics for case 2
Binary variables 458 444
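The size of the improvements over the slot-based model follows directly from the reported objective values. Since the objective here is maximized, a positive difference favors the proposed model. The snippet below simply subtracts the two sets of reported values. The helper name and dictionary keys are arbitrary.

```python
def improvements(reference, candidate, digits=2):
    """Per-case gain of candidate over reference for a maximization objective."""
    return {case: round(candidate[case] - reference[case], digits)
            for case in reference}


# Objective values (maximization of start times) reported for case 2
pinto_grossmann = {"no resources": 269.10, "three extruders": 268.24,
                   "two extruders": 264.98}
proposed = {"no resources": 269.10, "three extruders": 268.82,
            "two extruders": 265.74}
print(improvements(pinto_grossmann, proposed))
# {'no resources': 0.0, 'three extruders': 0.58, 'two extruders': 0.76}
```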


Conclusions 


In this chapter, an enhanced continuous-time formu- 
lation was presented for the short-term scheduling 
of multipurpose batch plants with intermediate due 
dates. The proposed formulation incorporates several 
features, including various storage policies (UIS, FIS, 
NIS, ZW), resource constraints, variable batch sizes and 
processing times, batch mixing and splitting, and se- 
quence-dependent changeover times. The key features 
of the proposed formulation include a continuous-time 
representation utilizing a necessary number of event 
points of unknown location corresponding to the ac- 
tivation of a task. Also, tasks are allowed to continue 
over several event points, enabling resource quantities 
to be correctly determined at the beginning of each resource utilization. Two examples were presented to illustrate the effectiveness of the proposed formulation.
The computational results were compared with those 
in the literature and it was shown that the proposed 
formulation is significantly faster than other general re- 
source-constrained models, especially for problems re- 
quiring many time points. 
Introduction


The scheduling problem of multiproduct and multipur- 
pose continuous plants has received relatively little at- 
tention in the literature despite its practical importance 
in the chemical process industries that produce a vari- 
ety of products using both batch and continuous pro- 
duction modes. Scheduling of a continuous plant typ- 
ically involves handling continuous production, man- 
agement of transitions, and accommodating inventory 
constraints, while meeting demands for final products 
with specified due dates. The problem deals with se- 
quencing of products on each unit, quantitative deter- 
mination of production amounts, optimal start and fin- 
ish times of production, and storage tasks. One of the 
key differences between scheduling of batch processes 
and scheduling continuous processes is in handling the 
processing times. In batch plants, the processing times 
are typically fixed and known a priori, and the produc- 
tion amount depends on the capacity of the batch unit. 
In continuous plants, the processing times are a func- 
tion of unit-dependent processing rates, final product 
demand, and storage limitations. Additionally, in con- 
tinuous plants, the production amount is available con- 
tinuously while it is being produced, unlike in batch 
plants, where the amount is available only after the end 
time for the batch that is being processed. Owing to
these differences, the problem of scheduling of contin- 
uous plants has drawn separate attention. 

Floudas and Lin [3,4] presented extensive re- 
views comparing various discrete-time-based and 
continuous-time-based formulations. During the last 
two decades, numerous formulations have been pro- 
posed in the literature based on continuous-time rep- 
resentation, owing to their established advantages over 
discrete-time representations. On the basis of the 
time representation used, the different continuous-time 
models proposed in the literature can be broadly clas- 
sified into three distinct categories: slot-based, global 
event-based, and unit-specific event-based formula- 
tions for both network-represented and sequential pro- 
cesses. In the slot-based models, the time horizon is 
represented in terms of ordered blocks of unknown, 
variable lengths, or time slots. In addition, alternative 
methods have been developed which define continuous 
variables directly to represent the timings of tasks with- 
out the use of time slots. These methods can be clas- 


sified into two different representations of time, global 
event-based models and unit-specific event-based mod- 
els. Global event-based models use a set of events that 
are common across all units, and the event points are 
defined for either the beginning or the end (or both) of 
each task in each unit. On the other hand, unit-specific 
event-based models define events on a unit basis, allow- 
ing tasks corresponding to the same event point but in 
different units to take place at different times. For se- 
quential processes, other alternative approaches based 
on precedence relationships have also been used. A de- 
tailed comparison of various continuous-time mod- 
els for short-term scheduling of batch plants was per- 
formed recently by Shaik et al. [28]. They concluded 
that, owing to the heterogeneous locations of the event 
points used, the unit-specific event-based models al- 
ways require fewer event points and exhibit favorable 
computational performance compared with both slot- 
based and global event-based models. 

There are two types of demand patterns for schedul- 
ing of continuous processes that have been addressed in 
the literature: cyclic and short-term. In cyclic schedul- 
ing, a cyclic mode is assumed and the product de- 
mands are specified in terms of constant demand 
rates at the end of a specified time horizon, while 
short-term scheduling deals with a general problem 
where the product demands have different sets of due 
dates. Sahinidis and Grossmann [25] proposed a slot- 
based mixed-integer nonlinear programming (MINLP) 
model for cyclic scheduling of single-stage continuous 
parallel production lines that do not share any com- 
mon resources. Pinto and Grossmann [24] modeled 
cyclic scheduling for sequential operation of a multi- 
stage, multiproduct continuous plant based on a slot- 
based continuous-time representation, leading to an 
MINLP model with explicit inventory breakpoints for 
representing intermediate storage. Munawar et al. [23] 
extended the slot-based cyclic scheduling formulation 
of Pinto and Grossmann [24] to hybrid flowshops con- 
sisting of serial and parallel configurations of process- 
ing and storage units. They proposed a modified def- 
inition of the time slot to account for feed losses dur- 
ing product transitions. Zhang and Sargent [30,31] 
proposed a global event-based MINLP model using 
a resource-task network (RTN) for the optimal oper- 
ation of a mixed-production facility consisting of batch 
and continuous processes. Schilling and Pantelides [26] 
also used the RTN framework for their slot-based
continuous-time formulation for short-term schedul- 
ing of batch and continuous processes and proposed 
a special branch and bound solution that branches 
both on discrete and on continuous variables. Karimi 
and McDonald [15,19] developed two slot-based mod- 
els for production planning and short-term schedul- 
ing, which differ in the pre-assignment of slots to 
the time periods for a single-stage multiproduct facil- 
ity consisting of parallel semicontinuous processors. 
Floudas and coworkers [7,8,9,10,11,12,13,14,16,17,18] 
developed unit-specific event-based models for a vari- 
ety of problems involving design, synthesis, short-term 
scheduling, medium-term scheduling, reactive schedul- 
ing, and scheduling under uncertainty. Ierapetritou and 
Floudas [7,8] proposed a novel short-term schedul- 
ing model for batch and continuous processes us- 
ing a state-task-network (STN) framework deploying 
a unit-specific event-based continuous-time represen- 
tation. They used an approximation of the storage-task 
timings for handling different storage requirements 
in their model for continuous processes [8]. Mockus 
and Reklaitis [21,22] proposed a global event-based 
MINLP model for short-term scheduling of batch and 
continuous processes in which both the start and the 
end times of tasks occur at events that are common 
across all units. Their formulation can handle resource 
constraints such as limited availability of utilities and 
manpower. Giannelos and Georgiadis [5,6] proposed 
a unit-specific event-based formulation for short-term 
scheduling of batch and continuous processes using 
an STN representation. Their model is similar to that
of Ierapetritou and Floudas [7,8], but they relaxed 
the task durations using buffer times and implicitly 
eliminated the various big-M constraints of Ierapetritou and Floudas [7,8]. In their models [5,6], the authors introduced special duration and sequencing constraints to ensure feasibility of the material balances. The start times, and likewise the end times, of the tasks producing and consuming the same state were enforced to be equal. While this feature is essential for continuous processes when bypassing of storage is allowed (the reasons are discussed later in this chapter), in all other cases it is very restrictive and may lead to suboptimal solutions. Giannelos and Georgiadis [6] enforced these restrictions in their model for short-term scheduling of batch plants as well, leading to suboptimal solutions, as recently observed by Sundaramoorthy and Karimi [29] and Shaik et al. [28]. Mendez and
Cerda [20] proposed a production-campaign-based al- 
gorithmic approach for short-term scheduling of con- 
tinuous processes, leading to compact models for the 
case when storage bypassing is allowed, with the as- 
sumptions that only one state is produced by each task 
and no initial inventories exist for intermediate states. 
Castro et al. [1] developed a global event-based formulation for short-term scheduling of batch and continuous processes using an RTN representation, where changeovers are treated as additional batch tasks. Most of the above-mentioned models [1,5,8,20] can handle
different storage requirements such as unlimited, finite, 
flexible, dedicated, and no intermediate storage poli- 
cies. However, for an industrial case study of consumer 
goods manufacturing involving making, storage, and 
packing tasks, Ierapetritou and Floudas [8] solved the 
cases of unlimited and finite intermediate storage and 
reported approximate suboptimal solutions. Mendez 
and Cerda [20] and Giannelos and Georgiadis [5] also 
reported suboptimal solutions for the case of finite in- 
termediate storage with no maximum demand limits. 
Castro et al. [1,2] could not find the global optimal solution for the no-intermediate-storage case, even with more event points and long computational times. They classified the problem as intractable and used a decomposition strategy [2] to improve the computational performance.

Recently, Shaik and Floudas [27] proposed an 
improved model for short-term scheduling of con- 
tinuous processes using the unit-specific event-based 
continuous-time representation. They modified and ex- 
tended the formulation of Ierapetritou and Floudas [8] 
and presented improved sequencing constraints to rig- 
orously address the different storage requirements. 
Their formulation is based on the STN representa- 
tion, resulting in a mixed-integer linear programming 
model that accurately accounts for various storage re- 
quirements such as dedicated, finite, unlimited, and no 
intermediate storage policies. The formulation allows for unit-dependent variable processing rates, sequence-dependent changeovers, and operation with or without storage bypassing. The Shaik and Floudas model is presented in Sect. “Formulation” for the cases with and without storage bypassing. In Sect. “Computational Case Study,” different variants of an industrial case study from fast-moving consumer goods manufacturing are presented to demonstrate the capability of the model.


Formulation 


The Shaik and Floudas model for short-term schedul- 
ing of continuous plants is described below for several 
instances of different storage requirements, which in- 
clude (a) unlimited intermediate storage, (b) dedicated 
finite intermediate storage, (c) no intermediate storage, 
and (d) flexible finite intermediate storage. Both options, with and without storage bypassing, are considered: the case in which storage bypassing is allowed is discussed first, and the case without storage bypassing is discussed in Sect. “Dedicated and Flexible Finite Intermediate Storage Without Storage Bypassing.”


Unlimited Intermediate Storage 


Initially, consider the simple case of unlimited intermediate storage. For this case, there is no need to model the storage tasks explicitly, and hence the model is described using only the continuous processing tasks $i_p$. Whether or not storage bypassing is allowed makes no difference to the model, because unlimited intermediate storage is available. The mathematical model consists of the following allocation constraints, capacity constraints, material balances for raw materials and intermediates, and demand, duration, and sequencing constraints:
Allocation Constraints. 


$$\sum_{i \in I_j} w(i,n) \le 1 \qquad \forall j \in J,\ n \in N \tag{1}$$


In each unit, only one task or no task takes place at any 
event as represented by constraint (1). 
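As a minimal sketch (a checker, not part of the MILP itself), constraint (1) can be tested on a candidate set of binary values $w(i,n)$. The unit and task names below are hypothetical, and missing entries are assumed to mean an inactive task.

```python
from typing import Dict, List, Tuple


def satisfies_allocation(
    w: Dict[Tuple[str, int], int],
    unit_tasks: Dict[str, List[str]],
    events: List[int],
) -> bool:
    """Constraint (1): at most one task i in I_j is active in unit j at each event n."""
    for tasks in unit_tasks.values():
        for n in events:
            # Missing keys are treated as w(i, n) = 0 (task inactive).
            if sum(w.get((i, n), 0) for i in tasks) > 1:
                return False
    return True


# Hypothetical data: one mixer that can run two different tasks.
unit_tasks = {"MixerB": ["m2", "m31"]}
events = [1, 2]
one_task_per_event = {("m2", 1): 1, ("m31", 2): 1}
two_tasks_same_event = {("m2", 1): 1, ("m31", 1): 1}

print(satisfies_allocation(one_task_per_event, unit_tasks, events))    # True
print(satisfies_allocation(two_tasks_same_event, unit_tasks, events))  # False
```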
Capacity Constraints for Processing Tasks. 


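$$R^{\min}_{i_p}\bigl(T^f(i_p,n) - T^s(i_p,n)\bigr) \le b(i_p,n) \le R^{\max}_{i_p}\bigl(T^f(i_p,n) - T^s(i_p,n)\bigr) \qquad \forall i_p \in I_p,\ n \in N \tag{2}$$
$$b(i_p,n) = R_{i_p}\bigl(T^f(i_p,n) - T^s(i_p,n)\bigr) \qquad \forall i_p \in I_p,\ n \in N \tag{3}$$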


The amount of material processed by a continuous processing task is constrained by lower and upper bounds in (2). These bounds depend on the duration of task $i_p$ at event $n$, given by the difference between its end time and start time, $T^f(i_p,n) - T^s(i_p,n)$, and on the minimum and maximum processing rates of task $i_p$. For constant processing rates, the amount produced is given by constraint (3).
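The bounds in (2) and the equality in (3) are simple products of a rate and a duration. The sketch below evaluates them for one task at one event; the times and rates are hypothetical values chosen only for illustration.

```python
def amount_bounds(t_start: float, t_finish: float, r_min: float, r_max: float):
    """Constraint (2): R^min * (T^f - T^s) <= b(i_p, n) <= R^max * (T^f - T^s)."""
    duration = t_finish - t_start
    return r_min * duration, r_max * duration


def constant_rate_amount(t_start: float, t_finish: float, rate: float) -> float:
    """Constraint (3): b(i_p, n) = R_{i_p} * (T^f - T^s) for a constant rate."""
    return rate * (t_finish - t_start)


# Hypothetical task active from t = 2.0 h to t = 6.5 h (rates in ton/h).
low, high = amount_bounds(2.0, 6.5, r_min=10.0, r_max=17.5)
print(low, high)                                   # 45.0 78.75
print(constant_rate_amount(2.0, 6.5, rate=17.5))  # 78.75
```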
Material Balances. (A) Raw Materials. 


$$ST_0(s,n) + \sum_{i_p \in I_s} \rho^{c}_{s,i_p}\, b(i_p,n) = 0 \qquad \forall s \in S^R,\ n \in N \tag{4}$$


In constraint (4), the amount of raw material, as and 
when required from the external resources, is related 
to the amounts consumed at the corresponding event 
n. On the other hand, if the entire raw material re- 
quirement is supplied at the beginning of the schedul- 
ing horizon, then constraint (4) is modified as follows: 


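$$ST(s,n) = ST(s,n-1) + \sum_{i_p \in I_s} \rho^{c}_{s,i_p}\, b(i_p,n) \qquad \forall s \in S^R,\ n \in N,\ n > 1 \tag{5}$$
$$ST(s,n) = ST_0(s) + \sum_{i_p \in I_s} \rho^{c}_{s,i_p}\, b(i_p,n) \qquad \forall s \in S^R,\ n = 1 \tag{6}$$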


The total initial amount required from external resources, $ST_0(s)$, appears in (6). Part of it is consumed at the first event $n = 1$, and the remainder stays in storage to be consumed during the subsequent events, as described by (5).
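Constraints (6) and (5) can be rolled forward event by event. The sketch assumes a single consuming task with a negative coefficient $\rho^{c}_{s,i_p}$; the amounts are hypothetical.

```python
from typing import List


def raw_material_profile(st0: float, rho_c: float, amounts: List[float]) -> List[float]:
    """Roll constraints (6) and (5) forward for one raw material s.

    st0     -- ST_0(s), the amount supplied at the start of the horizon
    rho_c   -- consumption coefficient rho^c_{s,i_p} (negative); one consuming task assumed
    amounts -- b(i_p, n) for events n = 1, 2, ...
    """
    levels = []
    level = st0
    for b in amounts:
        level += rho_c * b  # (6) at n = 1, (5) at n > 1
        levels.append(level)
    return levels


print(raw_material_profile(100.0, -1.0, [30.0, 20.0, 50.0]))  # [70.0, 50.0, 0.0]
```

Assuming $ST(s,n) \ge 0$, the smallest feasible $ST_0(s)$ in this single-task example equals the total amount consumed, 100 here.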

(B) Intermediates. 


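$$ST(s,n) = ST(s,n-1) + \sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) + \sum_{i_p \in I_s} \rho^{c}_{s,i_p}\, b(i_p,n) \qquad \forall s \in S^{IN},\ n \in N,\ n > 1 \tag{7}$$
$$ST(s,n) = ST_0(s) + \sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) + \sum_{i_p \in I_s} \rho^{c}_{s,i_p}\, b(i_p,n) \qquad \forall s \in S^{IN},\ n = 1 \tag{8}$$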


Similarly, for the intermediate state $s$, the material balance is written in constraints (7) and (8), where $ST(s,n)$ defines the excess amount of state $s$ at event $n$. In constraint (7), the first term on the right-hand side is the amount of state $s$ held in storage at the previous event $(n-1)$, and the second term is the amount of state $s$ produced by the upstream processing tasks at event $n$. This total amount is either consumed by the downstream processing tasks, as indicated by the third term, or remains in storage at event $n$, as shown on the left-hand side. At the first event, the initial amount of available intermediates is taken into account in (8). These constraints are based on the condition that an intermediate material is allowed to go directly to the downstream processing task, bypassing the storage, because only the excess amount, $ST(s,n)$, is stored. The other case, in which storage bypassing is not allowed, so that material must pass through storage regardless of whether the upstream production exceeds the downstream consumption, is discussed in Sect. “Dedicated and Flexible Finite Intermediate Storage Without Storage Bypassing.”
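As a minimal sketch under stated assumptions, the balance (8)/(7) for one intermediate can be evaluated once production and consumption have been summed over tasks for each event. Because $\rho^{c}_{s,i_p} < 0$, adding the consumption term is the same as subtracting the consumed amount. The values are hypothetical.

```python
from typing import List


def excess_inventory(st0: float, produced: List[float], consumed: List[float]) -> List[float]:
    """Excess amount ST(s, n) of an intermediate state s from (8) (n = 1) and (7) (n > 1).

    produced[k] -- sum of rho^p_{s,i_p} * b(i_p, n) over producing tasks at event n = k + 1
    consumed[k] -- sum of |rho^c_{s,i_p}| * b(i_p, n) over consuming tasks at the same event
    Raises ValueError if the balance would require a negative stored amount.
    """
    levels = []
    level = st0
    for n, (prod, cons) in enumerate(zip(produced, consumed), start=1):
        level = level + prod - cons  # adding rho^c * b equals subtracting |rho^c| * b
        if level < 0:
            raise ValueError(f"event {n}: consumption exceeds available material")
        levels.append(level)
    return levels


print(excess_inventory(0.0, [60.0, 40.0, 0.0], [45.0, 30.0, 25.0]))  # [15.0, 25.0, 0.0]
```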
Demand Constraints. 


$$D^{\min}_s \le \sum_{n \in N} \sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) \le D^{\max}_s \qquad \forall s \in S^P \tag{9}$$


The material balance for the final product state s is given 
in constraint (9), where the total production of state s is 
limited between the specified lower and upper bounds 
on the demands of the final product. 
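Constraint (9) compares the total production of each final product over all events with its demand bounds. In this sketch, the production per event is assumed to be already summed over tasks, and the demand figures are hypothetical.

```python
def meets_demand(produced_per_event, d_min: float, d_max: float) -> bool:
    """Constraint (9): D^min_s <= total production of product s over all events <= D^max_s.

    produced_per_event -- rho^p_{s,i_p} * b(i_p, n) already summed over i_p, one entry per event
    """
    total = sum(produced_per_event)
    return d_min <= total <= d_max


# Hypothetical product with a minimum demand of 220 and no effective upper limit.
print(meets_demand([80.0, 90.0, 60.0], d_min=220.0, d_max=float("inf")))  # True
print(meets_demand([80.0, 90.0], d_min=220.0, d_max=float("inf")))        # False
```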

Duration Constraints for Processing Tasks. 


$$T^f(i_p,n) - T^s(i_p,n) \le H\, w(i_p,n) \qquad \forall i_p \in I_p,\ n \in N \tag{10}$$


The duration constraint for processing tasks is given 
in (10), which states that the duration is zero if the cor- 
responding processing task is not active, otherwise the 
constraint is relaxed. 
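The big-M form of (10) forces a zero duration for an inactive task and has no effect for an active one, because no duration can exceed the horizon $H$. The sketch below evaluates the inequality directly; the times are hypothetical.

```python
def duration_ok(t_start: float, t_finish: float, w: int, horizon: float) -> bool:
    """Constraint (10): T^f - T^s <= H * w(i_p, n)."""
    return t_finish - t_start <= horizon * w


H = 120.0  # horizon in hours, the value used in the case study
print(duration_ok(10.0, 25.0, w=1, horizon=H))  # True: active task lasting 15 h
print(duration_ok(10.0, 25.0, w=0, horizon=H))  # False: an inactive task must have zero duration
print(duration_ok(25.0, 25.0, w=0, horizon=H))  # True
```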

Sequencing Constraints. (A) Same Task in the 
Same Unit. 


$$T^s(i,n+1) \ge T^f(i,n) \qquad \forall i \in I,\ n \in N,\ n \ne N \tag{11}$$


Constraint (11) states that the start time of a task at the next event should be no earlier than the finish time of the same task at the current event.

(B) Different Tasks in the Same Unit. For two different tasks taking place in the same unit, the possible changeover requirements are (i) no setup time required, (ii) a changeover time is required but is independent of the sequence in which the two consecutive tasks take place, and (iii) sequence-dependent changeovers.

For the first case of no changeover times, the two 
constraints for the same task in the same unit and dif- 
ferent tasks in the same unit can be combined into one 
equation as shown in (12): 


$$T^s(i,n+1) \ge T^f(i',n) \qquad \forall j \in J,\ i \in I_j,\ i' \in I_j,\ n \in N,\ n \ne N \tag{12}$$


For the second case of sequence-independent changeovers, the constraint for different tasks in the same unit is modified as shown in (13), where $t^{cl}_j$ is the changeover time in unit $j$:


$$T^s(i,n+1) \ge T^f(i',n) + t^{cl}_j\, w(i',n) \qquad \forall j \in J,\ i, i' \in I_j,\ i \ne i',\ n \in N,\ n \ne N \tag{13}$$


For the third case of sequence-dependent changeovers, 
the constraint for different tasks in the same unit is gen- 
eralized as follows: 


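$$T^s(i,n) \ge T^f(i',n') + t^{cl}_{i',i}\, w(i,n) - H\bigl(1 - w(i',n')\bigr) - H \sum_{i'' \in I_j} \sum_{n' < n'' < n} w(i'',n'') \qquad \forall j \in J,\ i, i' \in I_j,\ i \ne i',\ n, n' \in N,\ n > n' \tag{14}$$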


Note that among constraints (12)-(14), only the last one, (14), contains big-M terms, unlike the previous formulations of Ierapetritou and Floudas [7,8]. This feature is observed to yield better linear programming relaxations, which often improve the computational performance.
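The schedule-level meaning of (11)-(14) is that, within one unit, each active task starts no earlier than the previous active task finishes plus any changeover time. The sketch below checks consecutive active tasks only; it does not reproduce the big-M form of (14). The task names follow the case study, but the changeover times are hypothetical.

```python
from typing import Dict, List, Tuple


def unit_sequence_feasible(
    runs: List[Tuple[str, float, float]],
    changeover: Dict[Tuple[str, str], float],
    tol: float = 1e-9,
) -> bool:
    """Check one unit's timeline against the intent of (11)-(14).

    runs       -- (task, start, finish) of the active tasks in one unit, in event order
    changeover -- sequence-dependent time t^cl_{i',i} from task i' to task i;
                  pairs not listed are assumed to need no changeover
    """
    for (prev_task, _, prev_finish), (task, start, _) in zip(runs, runs[1:]):
        gap = 0.0 if prev_task == task else changeover.get((prev_task, task), 0.0)
        if start + tol < prev_finish + gap:
            return False
    return True


changeover = {("p1", "p6"): 1.0, ("p6", "p1"): 1.0}  # hypothetical times in hours
print(unit_sequence_feasible([("p1", 0.0, 10.0), ("p6", 11.0, 20.0)], changeover))  # True
print(unit_sequence_feasible([("p1", 0.0, 10.0), ("p6", 10.5, 20.0)], changeover))  # False
```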

(C) Different Tasks in Different Units for Process- 
ing Tasks. The start and finish times of different pro- 
cessing tasks i and i,, which produce or consume the 
same state s, need to be aligned, and we have the follow- 
ing two alternative ways of writing these constraints, 
(15) and (16) or (15A) and (16A): 


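$$T^s(i_p,n) \ge T^s(i'_p,n) - H\bigl(1 - w(i'_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_p, i'_p \in I_s,\ i_p \ne i'_p,\ \rho^{p}_{s,i'_p} > 0,\ \rho^{c}_{s,i_p} < 0,\ n \in N \tag{15}$$
$$T^f(i_p,n) \ge T^f(i'_p,n) - H\bigl(1 - w(i'_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_p, i'_p \in I_s,\ i_p \ne i'_p,\ \rho^{p}_{s,i'_p} > 0,\ \rho^{c}_{s,i_p} < 0,\ n \in N \tag{16}$$
$$T^s(i_p,n) \ge T^s(i'_p,n) - H\bigl(2 - w(i'_p,n) - w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_p, i'_p \in I_s,\ i_p \ne i'_p,\ \rho^{p}_{s,i'_p} > 0,\ \rho^{c}_{s,i_p} < 0,\ n \in N \tag{15A}$$
$$T^f(i_p,n) \ge T^f(i'_p,n) - H\bigl(2 - w(i'_p,n) - w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_p, i'_p \in I_s,\ i_p \ne i'_p,\ \rho^{p}_{s,i'_p} > 0,\ \rho^{c}_{s,i_p} < 0,\ n \in N \tag{16A}$$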


In these constraints the start and end times of the 
downstream (consuming) processing tasks are enforced 
to be later than the corresponding times of the up- 
stream (producing) processing tasks that process the 
same state s. Constraints (15) and (16) enforce the con- 
dition that the start and end times of the consuming 
tasks need to be always aligned to those of the corre- 
sponding producing tasks whenever the producing task 
is active; (15A) and (16A) state that the start and end 
times of the corresponding processing tasks need to be 
aligned conditionally, when both tasks are active. 
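The difference between the two variants lies only in the big-M slack: (15)-(16) are relaxed only when the producing task is inactive, whereas (15A)-(16A) are relaxed unless both tasks are active. The sketch evaluates both for one producing/consuming pair at one event; all values are hypothetical.

```python
def aligned_always(ts_c, tf_c, ts_p, tf_p, w_p, horizon):
    """(15)-(16): consumer times >= producer times whenever the producer is active."""
    slack = horizon * (1 - w_p)
    return ts_c >= ts_p - slack and tf_c >= tf_p - slack


def aligned_if_both_active(ts_c, tf_c, ts_p, tf_p, w_c, w_p, horizon):
    """(15A)-(16A): the same inequalities, relaxed unless both tasks are active."""
    slack = horizon * (2 - w_p - w_c)
    return ts_c >= ts_p - slack and tf_c >= tf_p - slack


H = 120.0
# Producer active from 5 h to 20 h; the consumer's time variables sit at 3 h and 18 h,
# but the consumer itself is inactive at this event.
print(aligned_always(3.0, 18.0, 5.0, 20.0, w_p=1, horizon=H))                # False
print(aligned_if_both_active(3.0, 18.0, 5.0, 20.0, w_c=0, w_p=1, horizon=H))  # True
```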
Constraints (15) and (16) are required when the variable $ST(s,n)$ is used in the material balance constraint (7), that is, when storage is not treated as a separate task, as in the unlimited intermediate storage case (or the dedicated finite intermediate storage case discussed later). Otherwise, if $ST(s,n)$ is absent (for instance, in the no-intermediate-storage case discussed later), or if storage is treated as a separate task and the variable $b(i_{st},n)$ is used instead of $ST(s,n)$ in the material balance (for instance, in the dedicated and flexible finite intermediate storage cases discussed later), then constraints (15A) and (16A) are needed. The reason is that when storage is not a separate task and $ST(s,n)$ appears in the material balance constraint (7), there is an implicit assumption that when the downstream consuming task starts at event $n$, the amount stored in $ST(s,n-1)$ is available for consumption. Because of the heterogeneous locations of the events used, this assumption may not hold unless the consuming tasks are always aligned to the producing tasks whenever the producing tasks are active, as enforced by the sequencing constraints (15) and (16). Note also that the approximate solutions reported [1,5,20] for the continuous-process model of Ierapetritou and Floudas [8] result from the lack of the second constraint, (16) or (16A), which relates the end times of the producing and consuming tasks.

Extra Tightening Constraint. The following tight- 
ening constraint gives a better linear programming re- 
laxation: 


$$\sum_{n \in N} \sum_{i \in I_j} \bigl(T^f(i,n) - T^s(i,n)\bigr) \le H - \tau_j \qquad \forall j \in J \tag{17}$$


It states that the sum of the durations of all tasks suitable in unit $j$ is limited by the time available in the horizon, $H - \tau_j$, where $\tau_j$ is a lower bound on the total cleanup time required in unit $j$.
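Any feasible schedule satisfies (17) provided $\tau_j$ really is a lower bound on the cleanup time, which is why the inequality only tightens the relaxation. The sketch sums the task durations in one unit and compares them with $H - \tau_j$; the schedule and $\tau_j$ are hypothetical.

```python
from typing import List, Tuple


def total_busy_time_ok(runs: List[Tuple[float, float]], horizon: float, tau_j: float) -> bool:
    """Constraint (17): sum of (T^f - T^s) over the task slots of unit j <= H - tau_j.

    runs  -- (start, finish) of every task slot in unit j
    tau_j -- lower bound on the total cleanup time needed in unit j
    """
    busy = sum(finish - start for start, finish in runs)
    return busy <= horizon - tau_j


print(total_busy_time_ok([(0.0, 50.0), (52.0, 110.0)], horizon=120.0, tau_j=2.0))  # True
print(total_busy_time_ok([(0.0, 60.0), (61.0, 120.0)], horizon=120.0, tau_j=2.0))  # False
```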


Dedicated Finite Intermediate Storage 
with Storage Bypassing Allowed 


Next, consider the case where a dedicated finite in- 
termediate storage is available for each intermediate 
state s. For this case as well, there is no need to model 
the storage tasks explicitly, because the storage tasks are 
anyway dedicated in nature. We are only interested in 
constraining the finite nature of the intermediate stor- 
age. Hence, the model is described here only using the 
continuous processing tasks ip, and the other case when 
storage is considered as a separate task is discussed in 
Sect. “Flexible Finite Intermediate Storage with Storage 
Bypassing Allowed.” All the above constraints for the 
case of unlimited storage remain the same except for 
the constraints for different tasks in different units for 
processing tasks. Here, since we do not consider stor- 
age as a separate task, we use constraints (15) and (16). 
Additional constraints would be required depending on 
whether storage bypassing is allowed or not. Initially, 
we consider the case when storage bypassing is allowed 
and then discuss the other case in Sect. “Dedicated and 
Flexible Finite Intermediate Storage Without Storage 
Bypassing”. 

Storage Bypassing Allowed. When an intermediate material is allowed to go directly to the downstream processing task, bypassing a finite intermediate storage, or if no intermediate storage is available, then, to ensure the feasibility of the inventory capacity balance, the start and finish times of the processing tasks $i'_p$ and $i_p$, which respectively produce and consume the same state $s$, need to be the same if both tasks are active at the same event point, as shown in the following constraints:


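$$T^s(i_p,n) \le T^s(i'_p,n) + H\bigl(2 - w(i'_p,n) - w(i_p,n)\bigr) \qquad \forall s \in S^{FIS},\ i_p, i'_p \in I_s,\ i_p \ne i'_p,\ \rho^{p}_{s,i'_p} > 0,\ \rho^{c}_{s,i_p} < 0,\ n \in N \tag{18}$$
$$T^f(i_p,n) \le T^f(i'_p,n) + H\bigl(2 - w(i'_p,n) - w(i_p,n)\bigr) \qquad \forall s \in S^{FIS},\ i_p, i'_p \in I_s,\ i_p \ne i'_p,\ \rho^{p}_{s,i'_p} > 0,\ \rho^{c}_{s,i_p} < 0,\ n \in N \tag{19}$$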


So, from constraints (15), (16), (18), and (19), both the 
start and the end times of the producing and the con- 
suming tasks of the same state s would be the same, if 
both tasks are active at event n. If either of the tasks 
is not active, then constraints (18) and (19) are relaxed 
and are trivially satisfied. This zero-wait condition is re- 
quired to ensure the feasibility of the inventory capacity 
balance as illustrated in Shaik and Floudas [27], because 
of the unit-specific nature of the continuous-time rep- 
resentation used. The formulation of Ierapetritou and 
Floudas [8] did not take this into account. In the formulation of Giannelos and Georgiadis [6], this condition was enforced for batch plants as well, which is unrealistic, and hence their formulation led to suboptimal solutions, as observed by Sundaramoorthy and Karimi [29] and Shaik et al. [28].

Now, to constrain the finite nature of the interme- 
diate storage available, the following bounds are added 
for the states that have the finite storage requirements: 


$$ST(s,n) \le ST^{\max}_s \qquad \forall s \in S^{FIS},\ n \in N \tag{20}$$


No Intermediate Storage 


For the case when no intermediate storage is available, the excess amount of state $s$, $ST(s,n)$, is driven to zero at each event $n$, or this variable is simply eliminated. The material balance constraints (7) and (8) then reduce to the following, meaning that the amount of state $s$ produced at an event has to be consumed at the same event:

$$\sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) + \sum_{i_p \in I_s} \rho^{c}_{s,i_p}\, b(i_p,n) = 0 \qquad \forall s \in S^{IN},\ n \in N \tag{21}$$

The condition enforcing the same start and end times for the producing and consuming tasks of the same state $s$, described by constraints (15A), (16A), (18), and (19), again applies because no intermediate storage is available. Constraints (15A) and (16A) are used here, rather than (15) and (16), because the material balance constraint (21) carries no assumption that the consuming task receives material from storage at the previous event; hence there is no need to enforce the alignment unconditionally.
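With no intermediate storage, (21) reduces to an event-by-event equality between the amounts produced and consumed. The sketch checks it with a small floating-point tolerance (an assumption of this illustration); the amounts are hypothetical.

```python
from typing import List


def no_storage_balance_ok(produced: List[float], consumed: List[float], tol: float = 1e-6) -> bool:
    """Constraint (21): at every event, production of s equals consumption of s.

    produced[k], consumed[k] -- rho^p * b and |rho^c| * b summed over tasks at event k + 1
    """
    return len(produced) == len(consumed) and all(
        abs(p - c) <= tol for p, c in zip(produced, consumed)
    )


print(no_storage_balance_ok([40.0, 25.0], [40.0, 25.0]))  # True
print(no_storage_balance_ok([40.0, 25.0], [35.0, 25.0]))  # False: 5 units would need storage
```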


Flexible Finite Intermediate Storage 
with Storage Bypassing Allowed 


This is a general case where finite intermediate storage 
is available and for each material state several suitable 
storage options exist. A material state can be stored 
in all or a limited number of storage units and vice 
versa. To handle this general case, we need to introduce separate tasks for the storage activity, because a storage unit cannot hold more than one state at any time, and we need additional constraints relating the timing of the storage tasks ($i_{st}$) to that of the processing tasks ($i_p$). The nature of these constraints depends on whether storage bypassing is allowed. The case in which storage bypassing is allowed is considered first; the other case is discussed in Sect. “Dedicated and Flexible Finite Intermediate Storage Without Storage Bypassing.” For the dedicated finite intermediate storage case, if we instead choose to consider storage as a separate task, in contrast to the model of Sect. “Dedicated Finite Intermediate Storage with Storage Bypassing Allowed,” then the same model described here also applies.

The additional constraints in the mathematical 
model are described below for the case of flexible finite 
intermediate storage when storage bypassing is allowed. 
The allocation constraint in (1) remains the same ex- 
cept that now it is written over all units (both process- 
ing and storage). 
Allocation Constraints for Storage Tasks.


$$w(i_{st},n+1) \ge w(i_{st},n) + z(i_{st},n) - 1 \qquad \forall i_{st} \in I_{st},\ n \in N \tag{22}$$


Constraint (22) states that if a storage task is active and 
it stores a nonzero amount at event n, then the same 
storage task should be active at the next event n + 1 as 
well. Additionally, this constraint also avoids unneces- 
sary tank-to-tank transfer of material because the same 
storage task would be active at the next event as well. 
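For one storage task, (22) links consecutive events: if the task is active and holds a nonzero amount ($z = 1$), it must remain active at the next event. The sketch checks this over a short hypothetical sequence of binary values.

```python
def storage_allocation_ok(w, z) -> bool:
    """Constraint (22): w(i_st, n+1) >= w(i_st, n) + z(i_st, n) - 1 for one storage task.

    w[k], z[k] -- binary activity and nonzero-amount indicators at event k + 1
    """
    return all(w[k + 1] >= w[k] + z[k] - 1 for k in range(len(w) - 1))


print(storage_allocation_ok(w=[1, 1, 0], z=[1, 0, 0]))  # True: the tank is empty before release
print(storage_allocation_ok(w=[1, 0, 0], z=[1, 0, 0]))  # False: a nonempty tank is dropped
```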


Capacity Constraints for Storage Tasks 


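$$b(i_{st},n) \le V^{\max}_{i_{st}}\, w(i_{st},n) \qquad \forall i_{st} \in I_{st},\ n \in N \tag{23}$$
$$b(i_{st},n) \le V^{\max}_{i_{st}}\, z(i_{st},n) \qquad \forall i_{st} \in I_{st},\ n \in N \tag{24}$$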
The capacity constraints for processing tasks remain the same as in (2) or (3), while for storage tasks the amount of material that can be stored is limited by the available capacity of the corresponding storage unit, as shown in constraint (23). Constraint (24) has the same form as (23) but uses a different binary variable, $z(i_{st},n)$, which should equal 1 only when $b(i_{st},n) \neq 0$. This is achieved through the penalty on the number of active binary variables in the objective function (45).
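For a single storage task at a single event, (23) and (24) are two capacity inequalities that differ only in the binary variable on the right-hand side. The 60 ton capacity below matches the tanks in the case study; the stored amounts are hypothetical.

```python
def storage_capacity_ok(b: float, w: int, z: int, v_max: float) -> bool:
    """Constraints (23)-(24) for one storage task at one event.

    b     -- b(i_st, n), amount held in the tank
    w, z  -- binary activity and nonzero-amount indicators
    v_max -- V^max_{i_st}, tank capacity
    """
    return b <= v_max * w and b <= v_max * z


print(storage_capacity_ok(b=45.0, w=1, z=1, v_max=60.0))  # True
print(storage_capacity_ok(b=45.0, w=1, z=0, v_max=60.0))  # False: z must be 1 when b > 0
print(storage_capacity_ok(b=0.0, w=1, z=0, v_max=60.0))   # True
```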


Material Balances. Constraint (4), for calculating the amount of raw material as and when required from the external resources, remains the same. For the other case, when the entire raw material requirement is supplied at the beginning of the scheduling horizon, constraints (5) and (6) are modified as follows:


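$$\sum_{i_{st} \in I_s} \rho^{p}_{s,i_{st}}\, b(i_{st},n-1) + \sum_{i \in I_s} \rho^{c}_{s,i}\, b(i,n) = 0 \qquad \forall s \in S^R,\ n \in N,\ n > 1 \tag{25}$$
$$ST_0(s) + \sum_{i \in I_s} \rho^{c}_{s,i}\, b(i,n) = 0 \qquad \forall s \in S^R,\ n = 1 \tag{26}$$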


where the set $I_s$ consists of both processing and storage tasks. The variable $ST(s,n)$ is eliminated here because separate storage tasks are defined explicitly. Similarly, the material balances (7) and (8) for the intermediates are modified as given below:


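$$\sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) + \sum_{i_{st} \in I_s} \rho^{p}_{s,i_{st}}\, b(i_{st},n-1) + \sum_{i \in I_s} \rho^{c}_{s,i}\, b(i,n) = 0 \qquad \forall s \in S^{FIS},\ n \in N,\ n > 1 \tag{27}$$
$$\sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) + ST_0(s) + \sum_{i \in I_s} \rho^{c}_{s,i}\, b(i,n) = 0 \qquad \forall s \in S^{FIS},\ n = 1 \tag{28}$$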


where the set $I_s$ consists of both processing and storage tasks. These constraints are based on the same condition that an intermediate material is allowed to go directly to the downstream processing task, bypassing the storage. The demand constraint (9) remains the same.


Duration Constraints for Storage Tasks 


$$T^f(i_{st},n) \ge T^s(i_{st},n) \qquad \forall i_{st} \in I_{st},\ n \in N \tag{29}$$
The duration constraint given in (10) remains the same 
for processing tasks, while for the storage tasks the fin- 
ish times have to be later than the start times as shown 
in (29). 

The sequencing constraints, (11)-(14), (15A), 
(16A), (18), (19) for the same task in the same unit, dif- 
ferent tasks in the same unit, and different tasks in dif- 
ferent units for the processing tasks remain the same. 
Here, because storage is considered as a separate task 
we use constraints (15A) and (16A). The sequencing 
constraints for storage tasks are defined below. 


Sequencing Constraints. Different Tasks in Different Units for Storage Tasks. The start time of a storage task that stores the intermediate state $s$ should be the same as the start time of the processing task that either produces or consumes state $s$, as follows:


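$$T^s(i_{st},n) \ge T^s(i_p,n) - H\bigl(2 - w(i_{st},n) - w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ n \in N \tag{30}$$
$$T^s(i_{st},n) \le T^s(i_p,n) + H\bigl(2 - w(i_{st},n) - w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ n \in N \tag{31}$$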
(31) 
If the corresponding processing task that either produces or consumes state $s$ is not active ($w(i_p,n) = 0$) and there is a nonzero amount in the storage ($z(i_{st},n) = 1$), then the start time of the storage task should be equal to the finish time of the same task at the previous event; this is realized through constraints (32) and (11).


$$T^s(i_{st},n) \le T^f(i_{st},n-1) + H\bigl(1 - z(i_{st},n) + w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ n \in N,\ n > 1 \tag{32}$$
The finish time of the storage task for storing state $s$ should be no earlier than the finish times of both the producing and the consuming tasks of state $s$, as follows:


$$T^f(i_{st},n) \ge T^f(i_p,n) - H\bigl(2 - w(i_{st},n) - w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ n \in N \tag{33}$$
Additionally, if the amount stored in the storage is zero ($z(i_{st},n) = 0$), then the finish time of the storage task should be the same as the finish time of the corresponding consuming processing task; this is realized through constraints (33) and (34).


$$T^f(i_{st},n) \le T^f(i_p,n) + H\bigl(2 - w(i_{st},n) - w(i_p,n)\bigr) + H\, z(i_{st},n) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ \rho^{c}_{s,i_p} < 0,\ n \in N \tag{34}$$
However, if the amount stored in the storage is nonzero ($z(i_{st},n) = 1$), then it should remain in the storage until the start time of the processing task at the next event. Hence the finish time of the storage task should be the same as the start time of the next processing task; this is realized through constraints (35) and (36):


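$$T^f(i_{st},n) \ge T^s(i_p,n+1) - H\bigl(2 - w(i_{st},n) - w(i_p,n+1)\bigr) - H\bigl(1 - z(i_{st},n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ n \in N,\ n \ne N \tag{35}$$
$$T^f(i_{st},n) \le T^s(i_p,n+1) + H\bigl(2 - w(i_{st},n) - w(i_p,n+1)\bigr) + H\bigl(1 - z(i_{st},n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ n \in N,\ n \ne N \tag{36}$$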


The tightening constraint given in (17) is again applica- 
ble. 


Dedicated and Flexible Finite Intermediate Storage 
Without Storage Bypassing 


The nature of constraints is different when storage by- 
passing is not allowed for the cases where production 
must go through finite storage (dedicated or flexible) 
before consumption in the downstream units. This is 
a general case and we do not need to enforce the same 
start and end times for producing or consuming tasks 
of the same state, because the material always goes 
through storage. 


Dedicated Finite Intermediate Storage Case Without Bypassing of Storage. Here again, we have the option of considering storage as a separate task or not. In this section, we describe the model without considering storage as a separate task; the case in which storage is considered as a separate task is discussed in the next section, along with the flexible finite intermediate storage case.

When we do not consider storage as a separate task, 
the model comprises all the constraints, (1)-(14), (15), 
(16), and (17), discussed in Sect. “Unlimited Intermedi- 
ate Storage.” Additionally we need constraint (20) and 
the following material balance constraints for the inter- 
mediate states. 


Material Balance for Intermediates 


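$$ST(s,n-1) + \sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) \le ST^{\max}_s \qquad \forall s \in S^{FIS},\ n \in N,\ n > 1 \tag{37}$$
$$ST_0(s) + \sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) \le ST^{\max}_s \qquad \forall s \in S^{FIS},\ n = 1 \tag{38}$$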


In constraints (37) and (38), the total amount received 
into the dedicated storage at each event is constrained 
to be within the maximum capacity limits. 
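Constraints (38) and (37) bound, at each event, the amount carried over in the dedicated tank plus the amount produced into it. In the sketch, the carried amounts are supplied directly ($ST_0(s)$ for the first event, $ST(s,n-1)$ afterwards) instead of being computed from a balance. The values are hypothetical, and the 60 ton capacity matches the case study.

```python
from typing import List


def dedicated_inflow_ok(previous: List[float], inflow: List[float], st_max: float) -> bool:
    """Constraints (38) (first event) and (37) (later events) for one state s.

    previous[k] -- ST_0(s) for the first event, ST(s, n - 1) for later events
    inflow[k]   -- amount produced into the tank at the same event,
                   sum of rho^p_{s,i_p} * b(i_p, n)
    """
    return all(prev + new <= st_max for prev, new in zip(previous, inflow))


print(dedicated_inflow_ok([0.0, 20.0], [50.0, 35.0], st_max=60.0))  # True
print(dedicated_inflow_ok([0.0, 30.0], [50.0, 35.0], st_max=60.0))  # False: 65 > 60
```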


Flexible Finite Intermediate Storage Case Without Bypassing of Storage. For the dedicated finite intermediate storage case as well, if we instead choose to consider storage as a separate task, in contrast to the model discussed in the previous section, then the model below applies to both the dedicated and the flexible finite storage cases. Constraints (1), (2), (4), (9)-(11), (14), (16A), (17), (22)-(24), (27)-(29), and (32)-(36), discussed in the previous sections, are required. Constraint (15A) is not required, as it is implicitly enforced: consuming tasks are aligned with storage tasks, which in turn are aligned with the producing tasks. Additionally, we need the following allocation, material balance, and sequencing constraints.


Allocation Constraints 


$$\sum_{i_{st} \in I_s} w(i_{st},n) \ge w(i_p,n) \qquad \forall s \in S^{FIS},\ i_p \in I_s,\ \rho^{p}_{s,i_p} > 0,\ n \in N \tag{39}$$


Constraint (39) states that, for all intermediate states that have the finite storage requirement, whenever a producing task of state $s$ is active, one or more suitable storage tasks also need to be active.
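For one producing task at one event, (39) requires at least one suitable storage task to be active whenever the producer is active. The storage-task names in the sketch are hypothetical labels.

```python
from typing import Dict


def storage_available_ok(w_producer: int, w_storage: Dict[str, int]) -> bool:
    """Constraint (39): sum of w(i_st, n) over storage tasks suitable for s >= w(i_p, n)."""
    return sum(w_storage.values()) >= w_producer


# Hypothetical storage tasks for Int1 in tanks A and B.
print(storage_available_ok(1, {"tankA_Int1": 0, "tankB_Int1": 1}))  # True
print(storage_available_ok(1, {"tankA_Int1": 0, "tankB_Int1": 0}))  # False
print(storage_available_ok(0, {"tankA_Int1": 0, "tankB_Int1": 0}))  # True
```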


Material Balance for Intermediates. Here, because the storage tasks are modeled explicitly, constraints (37) and (38) are modified as follows:


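$$\sum_{i_{st} \in I_s} \rho^{p}_{s,i_{st}}\, b(i_{st},n-1) + \sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) \le \sum_{i_{st} \in I_s} V^{\max}_{i_{st}}\, w(i_{st},n) \qquad \forall s \in S^{FIS},\ n \in N,\ n > 1 \tag{40}$$
$$ST_0(s) + \sum_{i_p \in I_s} \rho^{p}_{s,i_p}\, b(i_p,n) \le \sum_{i_{st} \in I_s} V^{\max}_{i_{st}}\, w(i_{st},n) \qquad \forall s \in S^{FIS},\ n = 1 \tag{41}$$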


Sequencing Constraints. Different Tasks in Different 
Units 


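$$T^s(i_{st},n) \ge T^s(i_p,n) - H\bigl(2 - w(i_{st},n) - w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ \rho^{p}_{s,i_p} > 0,\ n \in N \tag{42}$$
$$T^s(i_{st},n) \le T^s(i_p,n) + H\bigl(2 - w(i_{st},n) - w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ \rho^{p}_{s,i_p} > 0,\ n \in N \tag{43}$$
$$T^s(i_p,n) \ge T^s(i_{st},n) - H\bigl(2 - w(i_{st},n) - w(i_p,n)\bigr) \qquad \forall s \in S^{IN},\ i_{st} \in I_s,\ i_p \in I_s,\ \rho^{c}_{s,i_p} < 0,\ n \in N \tag{44}$$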


Constraints (42) and (43) impose the requirement that 
the start time of the storage task is the same as the start 
time of the corresponding processing task that pro- 
duces the intermediate state, if both the storage task and 
the producing task are active. Constraint (44) states that 
the start time of the processing task that consumes the 
intermediate state should be later than the start time of 
the corresponding storage task if both tasks are active. 


Objective: Maximization of Profit 


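$$\max\; C_1 \sum_{s \in S^P} \sum_{n \in N} \sum_{i \in I_s} \text{price}_s\, \rho^{p}_{s,i}\, b(i,n) - C_2 \sum_{i \in I} \sum_{n \in N} w(i,n) - C_3 \sum_{i_{st} \in I_{st}} \sum_{n \in N} z(i_{st},n) \tag{45}$$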


The objective is the maximization of profit from sales of the final products, with additional penalties on the total number of active binary variables, as shown in (45), where $C_1$, $C_2$, and $C_3$ are the corresponding weighting coefficients.
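Once a schedule is fixed, the objective (45) reduces to a weighted revenue minus counts of active binaries. The sketch takes the summed sales per product and the two counts as inputs; the default weights $C_1 = 10$, $C_2 = C_3 = 1$ are the values used in the case study, while the sales and price figures are hypothetical.

```python
from typing import Dict


def profit_objective(
    sales: Dict[str, float],
    price: Dict[str, float],
    n_active_tasks: int,
    n_nonempty_storage: int,
    c1: float = 10.0,
    c2: float = 1.0,
    c3: float = 1.0,
) -> float:
    """Objective (45): C1 * sum_s price_s * sales_s - C2 * sum w - C3 * sum z.

    sales[s]           -- total production of product s, sum over n, i of rho^p_{s,i} * b(i, n)
    n_active_tasks     -- number of binaries w(i, n) equal to 1
    n_nonempty_storage -- number of binaries z(i_st, n) equal to 1
    """
    revenue = sum(price[s] * amount for s, amount in sales.items())
    return c1 * revenue - c2 * n_active_tasks - c3 * n_nonempty_storage


print(profit_objective({"S1": 100.0}, {"S1": 2.0}, n_active_tasks=30, n_nonempty_storage=5))  # 1965.0
```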


Objective: Minimization of Makespan. For the objective of minimizing the makespan ($MS$), the following constraints need to be added:


$$\min\; MS \quad \text{subject to} \quad T^f(i,n) \le MS \qquad \forall i \in I,\ n \in N \tag{46}$$


The tightening constraint in (17) is modified as fol- 
lows: 


$$\sum_{n \in N} \sum_{i \in I_j} \bigl(T^f(i,n) - T^s(i,n)\bigr) \le MS - \tau_j \qquad \forall j \in J \tag{47}$$
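For a fixed schedule, the smallest $MS$ satisfying (46) is the latest finish time, and (47) can then be checked unit by unit. The sketch considers only the finish times it is given (for example, those of the active tasks), which is an assumption of this illustration. All times and $\tau_j$ are hypothetical.

```python
from typing import Dict, List, Tuple


def makespan(finish_times: Dict[Tuple[str, int], float]) -> float:
    """Smallest MS satisfying (46) for the given finish times T^f(i, n)."""
    return max(finish_times.values())


def tightening_47_ok(unit_runs: List[Tuple[float, float]], ms: float, tau_j: float) -> bool:
    """Constraint (47): total processing time in unit j <= MS - tau_j."""
    return sum(finish - start for start, finish in unit_runs) <= ms - tau_j


ms = makespan({("m1", 1): 40.0, ("p1", 2): 72.5})
print(ms)                                                        # 72.5
print(tightening_47_ok([(0.0, 40.0), (41.0, 70.0)], ms, 2.0))    # True: 69 <= 70.5
```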


Computational Case Study 


An industrial case study of a fast-moving consumer goods manufacturing plant that has been extensively studied by several authors [1,2,5,8,20,27] is considered. The plant follows a common production sequence: mixing, storage, and packing. The mixing stage has three parallel mixers (mixers A, B, and C) operating in continuous mode that produce seven intermediates (Int1-Int7) from three different base stocks (bases A, B, and C), which are available as required. These intermediates may be stored in three storage tanks (tanks A, B, and C) or packed directly on five continuous packing lines (lines 1-5), thus producing 15 final products (S1-S15). The STN representation of the plant is shown in Fig. 1 along with the unit suitability for each task.

Short-Term Scheduling of Continuous Processes, Figure 1
State-task-network representation of the plant

The base stocks are available in unlimited initial amounts. For each task suitable on multiple units, a separate task is considered. For instance, two making tasks (m31 and m32) are considered for producing “Int3” using mixers B and C, respectively. The 15 final products are produced using 15 packing tasks (p1-p15). The problem data, such as production rates, task-unit suitability, and cleanup times used in the literature [5,27], are given in Tables 1 and 2. The state-specific data, such as minimum demand specifications and prices for the final products and storage limitations, are shown in Table 3. A time horizon of 120 h is considered.

[Table: Production rates. Columns: task ($i_p$), unit, rate (ton/h)]

The case study with finite intermediate storage is considered. The results are compared with those reported in the literature [1,2,5,8,20], except for the model of Giannelos and Georgiadis [5], for which the comparison is based on our implementation of their model. All the computations in this work were performed on a 3.2 GHz Pentium 4 machine with 1 GB RAM using GAMS (distribution 21.7) and CPLEX 9.0.2, and the case studies were solved with an integrality gap of less than 0.3%.

[Table: Cleanup times per unit, and sequence-dependent changeovers between packing tasks: (p2, p4) ↔ p8; (p3, p5) ↔ (p9, p14); p1 ↔ p6; (p12, p15) ↔ (p13, p7)]

Short-Term Scheduling of Continuous Processes, Table 3
State-specific data

Finite Intermediate Storage. Consider the case of flexible finite intermediate storage for all the intermediates. Three storage tanks (tanks A, B, and C) are available, each with a maximum capacity of 60 ton, and any intermediate can be stored in any of the three tanks. Since there are seven intermediates, 21 additional storage tasks need to be considered. In the Shaik and Floudas model the start and end times of storage tasks are calculated precisely, but in the model of Giannelos and Georgiadis [5] they are not. Instead, those authors appear to determine and/or adjust these timings during postprocessing. In our implementation of their model, we used the following equation to calculate the start times of all active storage tasks during postprocessing:


$$T^{S}_{i_s,t} = T_{i,t} - \delta_{s,t} \qquad \forall s \in S^{IN},\ i_s \in I^{s},\ t \in T \qquad (48)$$


Then, in order to avoid the unnecessary activation of 
storage tasks in our implementation of their model, we 
used the following objective function, which is similar 
to (45) of the Shaik and Floudas model: 


$$\max\; C_1 \sum_{s \in S^{P}} \mathrm{price}_s\, ST(s,t_N) \;-\; C_2 \sum_{i \in I} \sum_{t \in T} x(i,t) \;-\; C_3 \sum_{s \in S^{FIS}} \sum_{t \in T} y(s,t) \qquad (49)$$


The model statistics are reported in Table 4. The objective is maximization of profit as in (45), with C_1 = 10 and C_2 = C_3 = 1. Under this objective, the Shaik and Floudas model requires four events. Its objective function value is 26910.181, found in 465.61 CPU s with an integrality gap of 0.12%, which corresponds to a profit of 2695.32. The Gantt chart for the Shaik and Floudas model is depicted in Fig. 2.

This case is regarded as one of the hardest instances to solve in the literature. The Shaik and Floudas model finds the global optimal solution and requires only four events. By comparison, the global event-based model of Castro et al. [1] reported ten events for this case, as shown in Table 4. The campaign-based algorithmic model of Mendez and Cerda [20] is very compact in terms of problem statistics, but it could not find the global optimal solution for this case (the best solution reported corresponds to a profit of 2670.28). The formulation of Giannelos and Georgiadis [5] also reported a suboptimal solution, corresponding to a profit of 2689.48 using four events. With our implementation of their model, an improved objective value of 2692.06 was found within a maximum CPU time of 60000 s and an integrality gap of 0.12%. However, the Gantt chart showed that the end times of some storage tasks are not precisely calculated, as discussed in Shaik and Floudas [27].

Short-Term Scheduling of Continuous Processes, Table 4
Results for the case study: finite intermediate storage

                        Shaik and      Mendez and    Giannelos and     Castro
                        Floudas [27]   Cerda [20]    Georgiadis [5]    et al. [1]
Events                  4              –             –                 10
Continuous variables    –              –             1127              –
Constraints             4267           –             2113              –
Nonzeros                21130          –             6884              –
RMILP                   26946.72       –             2695.32           2695.32
MILP                    26910.18       2670.28       2692.06           2695.32
Profit                  2695.32        2670.28       2692.06           2695.32
Integrality gap (%)     0.12           –             0.12              –
CPU time (s)^a          465.61         –             60000             –
Nodes                   7144           –             3307486           –

^a CPU time for other models reported for completeness only


Conclusions 


In this study, the formulation of Shaik and Floudas [27] was presented. It is an improved model for short-term scheduling of continuous processes, based on an STN representation and using a unit-specific event-based continuous-time formulation. The model of Ierapetritou and Floudas [8] for continuous processes was modified and extended to account precisely for different storage requirements, such as dedicated, flexible, unlimited, and no intermediate storage policies. The efficacy of the formulation was demonstrated on an industrial case study from a fast-moving consumer goods manufacturing process reported in the literature.


Nomenclature 

Indices 

i      tasks
i_p    processing tasks
i_s    storage tasks
j      units
n      events indicating beginning of a task
s      states


Sets 

I        tasks
I^p      processing tasks
I^s      storage tasks
I_j      tasks which can be performed in unit j
I_s      tasks which process state s and either produce or consume it
J        units (both processing and storage)
J^p      processing units
J^s      storage units
J_i      units which are suitable for performing task i
N        event points within the time horizon
S        states
S^R      states that are raw materials
S^IN     states that are intermediates
S^FIS    intermediate states that have finite storage requirements
S^P      states that are final products

Short-Term Scheduling of Continuous Processes, Figure 2
Gantt chart for the finite intermediate storage case


Parameters 


R_i^min          minimum processing rate of task i, ton/h
R_i^max          maximum processing rate of task i, ton/h
R_i              processing rate of task i if it is constant, ton/h
V_i^max          maximum storage capacity for storage task i, ton
H                time horizon, h
price_s          price of state s
D_s^min          lower bound on demand for state s, ton
D_s^max          upper bound on demand for state s, ton
τ_j              sequence-independent cleanup time required in unit j, h
τ_ii'            sequence-dependent cleanup time required between tasks i and i', h
τ_j^min          minimum total cleanup time required for unit j, h
ρ^p_si, ρ^c_si   proportion of state s produced, consumed from tasks i, respectively, ρ^p_si > 0, ρ^c_si < 0


Variables 
w(i,n)       binary variable for assignment of task i at the beginning of event n
Z(i_s,n)     binary variable to determine if storage task i_s stores a nonzero amount at event n
b(i,n)       amount of material undertaking task i at event n, ton
ST0(s,n)     amount of state s ∈ S^R that is required from external resources at event n, ton
ST(s,n)      excess amount of state s that needs to be stored at event n, ton
T^s(i,n)     time at which task i starts at event n, h
T^f(i,n)     time at which task i ends at event n, h

References 


1. Castro PM, Barbosa-Povoa AP, Matos HA, Novais AQ 
(2004) Simple continuous-time formulation for short-term 
scheduling of batch and continuous processes. Ind Eng 
Chem Res 43:105-118 

2. Castro PM, Barbosa-Povoa AP, Novais AQ (2004) A divide- 
and-conquer strategy for the scheduling of process plants 
subject to changeovers using continuous-time formula- 
tions. Ind Eng Chem Res 43:7939-7950 

3. Floudas CA, Lin X (2004) Continuous-time versus discrete- 
time approaches for scheduling of chemical processes: 
A review. Comput Chem Eng 28:2109-2129 


Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks 




4. Floudas CA, Lin X (2005) Mixed integer linear programming in process scheduling: Modeling, algorithms, and applications. Ann Oper Res 139:131-162

5. Giannelos NF, Georgiadis MC (2002) A novel event-driven formulation for short-term scheduling of multipurpose continuous processes. Ind Eng Chem Res 41:2431-2439

6. Giannelos NF, Georgiadis MC (2002) A simple new continuous-time formulation for short-term scheduling of multipurpose batch processes. Ind Eng Chem Res 41:2178-2184

7. Ierapetritou MG, Floudas CA (1998) Effective continuous-time formulation for short-term scheduling: 1. Multipurpose batch processes. Ind Eng Chem Res 37:4341-4359

8. Ierapetritou MG, Floudas CA (1998) Effective continuous-time formulation for short-term scheduling: 2. Continuous and semi-continuous processes. Ind Eng Chem Res 37:4360-4374

9. Ierapetritou MG, Hene TS, Floudas CA (1999) Effective continuous-time formulation for short-term scheduling: 3. Multiple intermediate due dates. Ind Eng Chem Res 38:3446-3461

10. Janak SL, Floudas CA, Kallrath J, Vormbrock N (2006) Production scheduling of a large-scale industrial batch plant. I. Short-term and medium-term scheduling. Ind Eng Chem Res 45:8234-8252

11. Janak SL, Floudas CA, Kallrath J, Vormbrock N (2006) Production scheduling of a large-scale industrial batch plant. II. Reactive scheduling. Ind Eng Chem Res 45:8253-8269

12. Janak SL, Lin X, Floudas CA (2007) A new robust optimization approach for scheduling under uncertainty: II. Uncertainty with known probability distribution. Comput Chem Eng 31:171-195

13. Janak SL, Lin X, Floudas CA (2004) Enhanced continuous-time unit-specific event-based formulation for short-term scheduling of multipurpose batch processes: Resource constraints and mixed storage policies. Ind Eng Chem Res 43:2516-2533

14. Janak SL, Lin X, Floudas CA (2005) Comments on "Enhanced continuous-time unit-specific event-based formulation for short-term scheduling of multipurpose batch processes: resource constraints and mixed storage policies". Ind Eng Chem Res 44:426

15. Karimi IA, McDonald CM (1997) Planning and scheduling of parallel semi-continuous processes: 2. Short-term scheduling. Ind Eng Chem Res 36:2701-2714

16. Lin X, Floudas CA (2001) Design, synthesis and scheduling of multipurpose batch plants via an effective continuous-time formulation. Comput Chem Eng 25:665-674

17. Lin X, Janak SL, Floudas CA (2004) A new robust optimization approach for scheduling under uncertainty: I. Bounded uncertainty. Comput Chem Eng 28:1069-1085

18. Lin X, Floudas CA, Modi S, Juhasz NM (2002) Continuous-time optimization approach for medium-range production scheduling of a multiproduct batch plant. Ind Eng Chem Res 41:3884-3906


19. McDonald CM, Karimi IA (1997) Planning and scheduling of parallel semi-continuous processes: 1. Production planning. Ind Eng Chem Res 36:2691-2700

20. Mendez CA, Cerda J (2002) An efficient MILP continuous-time formulation for short-term scheduling of multiproduct continuous facilities. Comput Chem Eng 26:687-695

21. Mockus L, Reklaitis GV (1999) Continuous-time representation approach to batch and continuous process scheduling: 1. MINLP formulation. Ind Eng Chem Res 38:197-203

22. Mockus L, Reklaitis GV (1999) Continuous-time representation approach to batch and continuous process scheduling: 2. Computational issues. Ind Eng Chem Res 38:204-210

23. Munawar SA, Bhushan M, Gudi RD, Belliappa AM (2003) Cyclic scheduling of continuous multi-product plants in a hybrid flowshop facility. Ind Eng Chem Res 42:5861-5882

24. Pinto JM, Grossmann IE (1994) Optimal cyclic scheduling of multistage continuous multiproduct plants. Comput Chem Eng 18:797-816

25. Sahinidis NV, Grossmann IE (1991) MINLP model for cyclic multiproduct scheduling on continuous parallel lines. Comput Chem Eng 15:85-103

26. Schilling G, Pantelides CC (1996) A simple continuous-time process scheduling formulation and a novel solution algorithm. Comput Chem Eng 20:S1221-S1226

27. Shaik MA, Floudas CA (2007) Improved unit-specific event-based continuous-time model for short-term scheduling of continuous processes: Rigorous treatment of storage requirements. Ind Eng Chem Res 46:1764-1774

28. Shaik MA, Janak SL, Floudas CA (2006) Continuous-time models for short-term scheduling of multipurpose batch plants: A comparative study. Ind Eng Chem Res 45:6190-6209

29. Sundaramoorthy A, Karimi IA (2005) A simpler better slot-based continuous-time formulation for short-term scheduling in multipurpose batch plants. Chem Eng Sci 60:2679-2702

30. Zhang X, Sargent RWH (1996) The optimal operation of mixed production facilities - A general formulation and some approaches for the solution. Comput Chem Eng 20:897-904

31. Zhang X, Sargent RWH (1998) The optimal operation of mixed production facilities - Extensions and improvements. Comput Chem Eng 22:1287-1295


Short-Term Scheduling, 
Resource Constrained: 
Unified Modeling Frameworks 


MUNAWAR A. SHAIK, CHRISTODOULOS A. FLOUDAS 
Department of Chemical Engineering, 
Princeton University, Princeton, USA 




MSC2000: 90B35, 65K05, 90C90, 90C11 


Article Outline 


Introduction 
Motivation 


Formulation 
Allocation Constraints 
Capacity Constraints 
Material Balances 
Duration Constraints 
Sequencing Constraints 
Tightening Constraint 
Utility Related Constraints 
Bounds on Variables 
Objective Function 


Computational Case Studies 
Conclusions 


Nomenclature 
Indices 
Sets 
Parameters 
Binary Variables 
Positive Variables 


References 


Introduction 


The research area of batch and continuous process 
scheduling has received great attention from both 
academia and industry in the past two decades. Floudas 
and Lin [2,3] presented extensive reviews comparing 
various discrete- and continuous-time-based formu- 
lations. During the last two decades, numerous for- 
mulations have been proposed in the literature based 
on continuous-time representation, due to their estab- 
lished advantages over discrete-time representations. 
On the basis of the time representation used, the differ- 
ent continuous-time models proposed in the literature 
can be broadly classified into three distinct categories: 
slot-based, global event-based, and unit-specific event- 
based formulations. In the slot-based models, the time 
horizon is represented in terms of ordered blocks of 
unknown, variable lengths, or time slots. Global event- 
based models use a set of events that are common across 
all units, and the event points are defined for either the 
beginning or end (or both) of each task in each unit. On 
the other hand, unit-specific event-based models define 
events on a unit basis, allowing tasks corresponding to 
the same event point but in different units to take place 


at different times. For sequential processes, other al- 
ternative approaches based on precedence relationships 
have also been used. 


Motivation 


A detailed comparison of various continuous-time models for short-term scheduling of batch plants was performed recently by Shaik et al. [8]. They concluded that the unit-specific event-based models always require fewer event points than both slot-based and global event-based models, owing to the heterogeneous locations of the event points used. These models also show favorable computational performance. For problems without resource considerations such as utility constraints, it was found [8] that the modified model of Ierapetritou and Floudas [4], as presented in Shaik et al. [8], outperforms the other models. It gives both the smallest problem size and the fastest computational performance. Similarly, for problems with resource constraints, the enhanced model of Janak et al. [5,6] was found [5,6,8] to perform well.

However, for an additional instance of the following 
example involving a recycle stream shown in Fig. 1, it is 
observed that the model of Ierapetritou and Floudas [4] 
yields a suboptimal solution as discussed below. 


Example 1 Consider the second example discussed in Shaik et al. [8], in which two different products are produced through five processing stages: heating, reactions 1, 2, and 3, and separation. These stages are shown in the STN representation of the plant flow sheet in Fig. 1. Since each of the reaction tasks can take place in two different reactors, each reaction is represented by two separate tasks. The processing time of task i on unit j is assumed to be a linear function, α_i + β_i B, of its batch size B. Table 1 gives the relevant data [8]: the constant (α_i) and variable (β_i) coefficients for the processing times of the different tasks (i), the suitable units (j), and their minimum (B_i^min) and maximum (B_i^max) batch sizes. The initial stock level for all intermediates is assumed to be zero, and unlimited storage capacity is assumed for all states. The prices of products 1 and 2 are $10/mu.

For the objective of maximization of profit and a time horizon (H) of 10 h, this example is solved using four models:

- the unit-specific event-based model of Ierapetritou and Floudas [4] (I&F);
- the global event-based model of Castro et al. [1] (CBMN);
- the global event-based model of Maravelias and Grossmann [7] (M&G);
- the slot-based model of Sundaramoorthy and Karimi [10] (S&K).

All the resulting mixed-integer linear programming (MILP) models are solved in GAMS distribution 21.1 using CPLEX 8.1.0, on the same computer as in Shaik et al. [8] (3 GHz Pentium 4 with 2 GB RAM). Table 2 compares these models in terms of problem statistics and solution behavior, including:

- the number of binary and continuous variables;
- the number of constraints;
- the CPU time taken to solve to the specified integrality gap;
- the number of nodes taken to reach the optimal solution;
- the objective function at the relaxed node.

For the S&K model, n event points are shown to represent n−1 slots, which makes the comparison with the other global event-based and unit-specific event-based models valid. The CBMN model has an additional parameter (Δt) that limits the maximum number of events over which a task can occur.

Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Figure 1
State-task network representation for example 1

Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Table 1
Data of coefficients of processing times of tasks, limits on batch sizes of units for example 1

Task (i)      Unit (j)
Heating       Heater
Reaction1     Reactor1
              Reactor2
Reaction2     Reactor1
              Reactor2
Reaction3     Reactor1
              Reactor2
Separation    Separator


For this case, the slot-based and global event-based models require at least eight event points to find the global optimal solution of 1962.7. The unit-specific event-based I&F model, by contrast, gives a suboptimal solution of 1943.2 with six events, and also with more events. The enhanced unit-specific event-based model of Janak et al. [5,6] (JLF) finds the global optimal solution of 1962.7 using six events, as shown in Table 2. The Gantt chart for the JLF model is shown in Fig. 2.


The reason for this exception is that the I&F model does not allow a task to continue over several events. The JLF model, in contrast, is an enhanced version of the I&F model that allows tasks to take place over multiple event points, so that resource considerations such as utility requirements are accounted for accurately. Although there are no resource constraints in this example, the Gantt chart of Fig. 2 for the JLF model shows that this schedule would not be feasible for the I&F model. Under the I&F constraint for different tasks in different units, for state 's5', the consuming task (i = 7) at event 'N5' should start after the end time of the producing task (i = 8) at event 'N4'. This is clearly not the case in the global optimal solution of Fig. 2. The constraint is relaxed in the JLF model because, in the global optimal solution, the producing task does not end at event 'N4' but continues over the next event and ends at event 'N5'.

The models of S&K, M&G, and CBMN allow tasks to take place over multiple events, and hence they are able to find the global optimal solution. Moreover, their events/slots are globally aligned. They therefore do not require the above-mentioned sequencing constraint for different tasks in different units, which is generally required for the unit-specific event-based models.

Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Table 2
Model statistics and computational results for example 1 under maximization of profit

Example 1 (H = 10)
Model            Events   Nodes     MILP ($)
S&K              8        88679     1962.7
M&G              8        184605    1962.7
CBMN (Δt = 1)    8        6449      1860.7^a
CBMN (Δt = 2)    8        194968    1959^a
CBMN (Δt = 3)    8        366226    1962.7
I&F              6        6713      1943.2^a
I&F                       101415    1943.2^a
JLF              6        138714    1962.7

^a Suboptimal solution

Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Figure 2
Gantt chart for example 1 using the JLF model


This example demonstrates that, even without resource constraints, unit-specific event-based models may in some cases also need to allow tasks to take place over multiple events in order to find the global optimal solution. To understand such cases, consider the I&F constraint for different tasks in different units. It states that the consuming task at the current event should start after the end time of the producing task, at the previous event, that processes the same state. This need not hold if there is already sufficient material for the consuming task to start production. That is the case in the instance of Fig. 2: the amount of state 's5' produced by the recycle stream (i = 8) at event 'N4' is not needed to start the consuming task (i = 7) at event 'N5'. The constraint therefore needs to be relaxed depending on whether a sufficient amount is available for the consuming task to start production. Allowing tasks to take place over multiple events achieves this implicitly.
Consider the computational performance of the JLF model in Table 2. Compared with the other competitive models, S&K and CBMN, it has a poor LP relaxation and requires a large number of constraints and nonzeros and more CPU time. The JLF model was originally developed for the more general case of problems with resource constraints, and it does not reduce well, in terms of problem statistics, to the case of no resources. With this motivation, Shaik and Floudas [9] proposed a unified modeling approach using a unit-specific event-based continuous-time representation, which:

(i) can handle problems with resource constraints by allowing tasks to take place over multiple events;
(ii) reduces efficiently to the case of no resources;
(iii) is applicable to batch as well as continuous processes with mixed storage policies.

The unified model of Shaik and Floudas [9] (S&F) for short-term scheduling of batch plants is described in the next section. In Sect. "Computational Case Studies", we consider example problems with and without resources and compare the performance of the S&F model with that of other models.


Formulation 


The nomenclature used in the S&F formulation is given in the Nomenclature section. The three-index binary variable w(i,n,n') defines the assignment of task i that starts at event n and ends at event n' (n' ≥ n). A parameter Δn controls the maximum number of events over which a task is allowed to continue. It is defined such that n ≤ n' ≤ n + Δn, with Δn = 0, 1, .... A task i that starts at event n may therefore end at the same event point n (Δn = 0), which is similar to the I&F model. Alternatively, it may end at a later event up to n + Δn (Δn = 1, ...), which is similar to the model of Janak et al. [5,6]. A similar parameter (Δt) was also defined in the model of Castro et al. [1] (CBMN). However, the S&F model [9] differs from all the slot-based and global event-based models [1,7,10]: in those models, a task that starts at an event cannot end at the same event. By definition, therefore, the unit-specific event-based models [5,6,9] require at least one fewer event than the slot-based and global event-based models.
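The index restriction n ≤ n' ≤ n + Δn determines which three-index variables w(i,n,n') and b(i,n,n') exist. The Python sketch below illustrates this by enumerating the admissible (n, n') pairs for a given number of events. It assumes events are numbered 1, ..., N and that n' cannot exceed the last event. The function name `allowed_pairs` is our own and is not part of the S&F model.

```python
def allowed_pairs(num_events, delta_n):
    """List the (n, n') index pairs for which w(i, n, n') is defined.

    Assumes events are numbered 1..num_events; a task starting at event n
    may end at any event n' with n <= n' <= n + delta_n (capped at the last event).
    """
    pairs = []
    for n in range(1, num_events + 1):
        last_end = min(n + delta_n, num_events)
        for n_end in range(n, last_end + 1):
            pairs.append((n, n_end))
    return pairs


print(allowed_pairs(3, 0))  # [(1, 1), (2, 2), (3, 3)]
print(allowed_pairs(3, 1))  # [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]
```

With Δn = 0 only the diagonal pairs (n, n) remain, which is the I&F-like case. Each unit increase of Δn adds at most one extra end event per start event, so the variable count grows by at most N per step.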

The mathematical model has the following alloca- 
tion, capacity, material balance, duration, sequencing, 
and utility related constraints. 


Allocation Constraints 


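$$\sum_{i \in I_j}\ \sum_{n' \in N,\, n \le n' \le n+\Delta n} w(i,n,n') \le 1 \qquad \forall j \in J,\ n \in N \qquad (1)$$

$$\sum_{i \in I_j}\ \sum_{n \in N,\, n \le n' \le n+\Delta n} w(i,n,n') \le 1 \qquad \forall j \in J,\ n' \in N,\ \Delta n > 0 \qquad (2)$$

$$\sum_{i' \in I_j,\, i' \ne i}\ \sum_{n' \in N,\, n' \le n \le n'+\Delta n} w(i',n',n) \le 1 - \sum_{n' \in N,\, n \le n' \le n+\Delta n} w(i,n,n') \qquad \forall i \in I,\ j \in J_i,\ n \in N,\ \Delta n > 0 \qquad (3)$$

$$\sum_{n' \in N,\, n \le n' \le n+\Delta n} w(i,n,n') \le 1 - \sum_{n' \in N,\, n' < n}\ \sum_{n'' \in N,\, n' \le n'' \le n'+\Delta n}\ \sum_{j \in J_i}\ \sum_{i' \in I_j} w(i',n',n'') + \sum_{n'' \in N}\ \sum_{n' \in N,\, n' < n,\ n'' \le n' \le n''+\Delta n}\ \sum_{j \in J_i}\ \sum_{i' \in I_j} w(i',n'',n') \qquad \forall i \in I,\ n \in N,\ n > 1,\ \Delta n > 0 \qquad (4)$$

$$\sum_{n' \in N,\, n' < n \le n'+\Delta n} w(i,n',n) \le \sum_{n' \in N,\, n' < n}\ \sum_{n'' \in N,\, n' \le n'' \le n'+\Delta n} w(i,n',n'') - \sum_{n'' \in N}\ \sum_{n' \in N,\, n' < n,\ n'' \le n' \le n''+\Delta n} w(i,n'',n') \qquad \forall i \in I,\ n \in N,\ \Delta n > 0 \qquad (5)$$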


Constraint (1) enforces that at most one task can start at each event in each unit, and constraint (2) states that at most one task can end at each event. Constraint (3) states that if a task starts at an event, a different task cannot end at that event, because only the task that starts at an event can end at the same event. Constraint (4) states that a new task can start only if the total number of tasks that started earlier matches the total number of tasks that have ended. Constraint (5) states that a task cannot end unless it started earlier. Note that constraints (2)-(5) are applicable only if Δn > 0.
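The following Python sketch makes the intent of constraints (1), (2), and (4) concrete. It checks a candidate schedule for a single unit, given the (task, n, n') triples for which w(i,n,n') = 1. It is an illustrative post-solution check using our own hypothetical data layout, not part of the MILP itself. Constraint (3) is not tested separately, for two reasons. A different task that started at an earlier event and ends where another starts is already rejected by the idle-unit check for (4). A different task starting at the same event is caught by the check for (1).

```python
from collections import Counter


def unit_allocation_violations(assignments):
    """Check one unit's schedule against the intent of constraints (1), (2), (4).

    `assignments` lists (task, n, n_end) for every w(i, n, n_end) = 1 in the
    unit, with n <= n_end. Returns human-readable violations (empty if none).
    """
    violations = []
    starts = Counter(n for _, n, _ in assignments)
    ends = Counter(n_end for _, _, n_end in assignments)
    # (1): at most one task may start at each event in the unit
    violations += [f"more than one start at event {n}"
                   for n, count in sorted(starts.items()) if count > 1]
    # (2): at most one task may end at each event in the unit
    violations += [f"more than one end at event {n}"
                   for n, count in sorted(ends.items()) if count > 1]
    # (4): a task may start at n only if every task that started earlier
    #      has already ended at an event before n (the unit is idle)
    for task, n, _ in assignments:
        for other, n0, n1 in assignments:
            if n0 < n <= n1:
                violations.append(
                    f"{task} starts at event {n} while {other} "
                    f"(events {n0}-{n1}) is still active")
    return violations


ok = [("r1", 1, 2), ("r2", 3, 3), ("r1", 4, 5)]
bad = [("r1", 1, 3), ("r2", 2, 2)]
print(unit_allocation_violations(ok))   # []
print(unit_allocation_violations(bad))
```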
Capacity Constraints

$$B_i^{\min}\, w(i,n,n') \le b(i,n,n') \le B_i^{\max}\, w(i,n,n') \qquad \forall i \in I,\ n, n' \in N,\ n \le n' \le n+\Delta n \qquad (6)$$


For each task, constraint (6) enforces the minimum and 
maximum batch size. 


Material Balances 


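$$ST(s,n) = ST(s,n-1) + \sum_{i \in I^p_s} \rho^p_{s,i} \sum_{n' \in N,\, n' \le n-1 \le n'+\Delta n} b(i,n',n-1) + \sum_{i \in I^c_s} \rho^c_{s,i} \sum_{n' \in N,\, n \le n' \le n+\Delta n} b(i,n,n') \qquad \forall s \in S,\ n \in N,\ n > 1 \qquad (7)$$

$$ST(s,n) = ST_0(s) + \sum_{i \in I^c_s} \rho^c_{s,i} \sum_{n' \in N,\, n \le n' \le n+\Delta n} b(i,n,n') \qquad \forall s \in S,\ n = 1 \qquad (8)$$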
In the material balance (7), the amount of a state at the 
previous event is adjusted by the amount of the state 
produced by the tasks that are ending at the previous 
event and by the amount of the state being consumed 
by the tasks that are starting at the current event. In 
(8), the initial available amount of the state is taken into 
account. 
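The recursion in (7) and (8) can be traced for a single state with a few lines of Python. This sketch assumes the task-level sums have already been aggregated per event. Under that assumption, `produced[n]` holds the output ρ^p_{s,i} b of tasks ending at event n. Likewise, `consumed[n]` holds the magnitude |ρ^c_{s,i}| b of the input of tasks starting at event n. The names and the explicit nonnegativity check, which mirrors ST(s,n) ≥ 0, are our own.

```python
def state_inventory(st0, produced, consumed, num_events):
    """Trace balances (7)-(8) for one state s.

    st0: initial amount ST0(s).
    produced[n]: output of producing tasks that END at event n.
    consumed[n]: input (as a positive amount) of consuming tasks that START at n.
    Returns {n: ST(s, n)} for n = 1..num_events.
    """
    st = {}
    for n in range(1, num_events + 1):
        if n == 1:
            # (8): initial stock minus what tasks starting at event 1 consume
            st[n] = st0 - consumed.get(1, 0.0)
        else:
            # (7): carry over, add output of tasks that ended at n-1,
            # subtract input of tasks that start at n
            st[n] = st[n - 1] + produced.get(n - 1, 0.0) - consumed.get(n, 0.0)
        if st[n] < 0:
            raise ValueError(f"negative inventory at event {n}")
    return st


print(state_inventory(100.0, {2: 20.0}, {1: 60.0, 3: 50.0}, 3))
# {1: 40.0, 2: 40.0, 3: 10.0}
```

Material produced by a task ending at event n only becomes available to tasks starting at event n + 1. This is why output ending at the last event N has to be added separately in the objective.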


Duration Constraints 


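$$T^{f}(i,n) = T^{s}(i,n) + \alpha_i\, w(i,n,n) + \beta_i\, b(i,n,n) \qquad \forall i \in I,\ n \in N,\ \Delta n = 0 \qquad (9)$$

$$T^{f}(i,n) \ge T^{s}(i,n) \qquad \forall i \in I,\ n \in N,\ \Delta n > 0 \qquad (10)$$

$$T^{f}(i,n') \ge T^{s}(i,n) + \alpha_i\, w(i,n,n') + \beta_i\, b(i,n,n') - M\big(1 - w(i,n,n')\big) \qquad \forall i \in I,\ n, n' \in N,\ n \le n' \le n+\Delta n,\ \Delta n > 0 \qquad (11)$$

$$T^{f}(i,n') \le T^{s}(i,n) + \alpha_i\, w(i,n,n') + \beta_i\, b(i,n,n') + M\big(1 - w(i,n,n')\big) \qquad \forall i \in I,\ n, n' \in N,\ n \le n' \le n+\Delta n,\ \Delta n > 0 \qquad (12)$$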


If Δn = 0, the finish time of a task that started at the same event is calculated from (9). Otherwise, if Δn > 0, the finish time of a task that started at the same or an earlier event is calculated from (10)-(12). The S&F model uses three-index binary and continuous variables. As a result, its duration constraints are simpler and have fewer big-M terms than those in the model of Janak et al. [5,6], and they are observed to yield improved LP relaxations. Janak et al. [5,6] used several additional tightening constraints and bilinear variables to improve the LP relaxation; these are not necessary in the S&F formulation.
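To show how the big-M terms in (11) and (12) switch on and off, the Python sketch below computes the interval these two constraints impose on T^f(i,n'). The numbers in the usage lines are illustrative only and are not data from Table 1. M = H follows the suggestion made later in the text.

```python
def finish_time_bounds(t_start, alpha, beta, batch, w, big_m):
    """Interval that constraints (11)-(12) impose on T^f(i, n').

    t_start: T^s(i, n); alpha, beta: processing-time coefficients alpha_i, beta_i;
    batch: b(i, n, n'); w: value (0 or 1) of w(i, n, n'); big_m: M.
    """
    duration = alpha * w + beta * batch   # alpha_i w + beta_i b
    relax = big_m * (1 - w)               # vanishes when the assignment is active
    return t_start + duration - relax, t_start + duration + relax


H = 10.0  # time horizon used as M
# w = 1: the interval collapses to T^s + alpha + beta * b
print(finish_time_bounds(2.0, 1.5, 0.25, 8.0, 1, H))  # (5.5, 5.5)
# w = 0 (so b = 0 by (6)): the bounds are loose by M on either side
print(finish_time_bounds(2.0, 1.5, 0.25, 0.0, 0, H))  # (-8.0, 12.0)
```

When w = 1 the pair (11)-(12) acts as an equality, reproducing the linear processing-time law α_i + β_i B. When w = 0 both constraints become redundant as long as M is at least H.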


Sequencing Constraints 


Same Task in the Same Unit 


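$$T^{s}(i,n+1) \ge T^{f}(i,n) \qquad \forall i \in I,\ n \in N,\ n < N \qquad (13)$$

$$T^{s}(i,n+1) \le T^{f}(i,n) + M\Big(1 - \sum_{n' \in N,\, n' \le n}\ \sum_{n'' \in N,\, n' \le n'' \le n'+\Delta n} w(i,n',n'') + \sum_{n'' \in N}\ \sum_{n' \in N,\, n' < n,\ n'' \le n' \le n''+\Delta n} w(i,n'',n')\Big) + M \sum_{n' \in N,\, n' \le n \le n'+\Delta n} w(i,n',n) \qquad \forall i \in I,\ n \in N,\ n < N,\ \Delta n > 0 \qquad (14)$$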


If Δn = 0, the constraint for the same task in the same unit is given by (13). Otherwise, if Δn > 0, it is given by (13) and (14). The zero-wait condition of (14) is additionally applied when the task is active at event n but does not end at event n.
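The bracketed sum in (14) is an "activity" indicator: the number of starts of task i up to event n minus the number of ends before n. The same expression reappears in (18)-(23). The Python sketch below evaluates it for one task from a list of (n, n') pairs with w(i,n,n') = 1. It also reports when the zero-wait part of (14) binds. The function names are our own.

```python
def active_at(assignments, n):
    """Activity indicator of one task i at event n.

    assignments: list of (n_start, n_end) pairs with w(i, n_start, n_end) = 1.
    Counts starts at or before n minus ends strictly before n.
    """
    started = sum(1 for n0, _ in assignments if n0 <= n)
    ended = sum(1 for _, n1 in assignments if n1 < n)
    return started - ended


def zero_wait_applies(assignments, n):
    """True when (14) binds: the task is active at n but does not end at n."""
    ending_here = sum(1 for _, n1 in assignments if n1 == n)
    return active_at(assignments, n) == 1 and ending_here == 0


task = [(2, 4)]  # the task starts at event 2 and ends at event 4
print([active_at(task, n) for n in range(1, 6)])          # [0, 1, 1, 1, 0]
print([zero_wait_applies(task, n) for n in range(1, 6)])  # [False, True, True, False, False]
```

For a task spanning events 2 to 4, (14) together with (13) ties T^s(i,3) to T^f(i,2) and T^s(i,4) to T^f(i,3). This chains the event-wise time variables into one uninterrupted processing interval.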


Different Tasks in the Same Unit 


$$T^{s}(i,n+1) \ge T^{f}(i',n) \qquad \forall i, i' \in I_j,\ i \ne i',\ j \in J,\ n \in N,\ n < N \qquad (15)$$
Different Tasks in Different Units

$$T^{s}(i,n+1) \ge T^{f}(i',n) - M\Big(1 - \sum_{n' \in N,\, n' \le n \le n'+\Delta n} w(i',n',n)\Big) \qquad \forall s \in S,\ i \in I^c_s,\ i' \in I^p_s,\ i \in I_j,\ i' \in I_{j'},\ i \ne i',\ j, j' \in J,\ j \ne j',\ n \in N,\ n < N \qquad (16)$$


For different tasks that produce or consume the same 
state, the start time of the consuming task at the next 
event is enforced to be later than the finish time of the 
producing task at the current event, provided the pro- 
ducing task is finishing at the current event. 
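The effect of the big-M term in (16) can be seen in a short Python sketch. It returns the lower bound that (16) places on the consumer's start time, depending on whether the producing task in the other unit finishes at the current event. The function name and arguments are hypothetical illustrations, not part of the published model.

```python
def consumer_start_lower_bound(producer_finish, producer_ends_at_n, big_m):
    """Lower bound that (16) places on T^s(i, n+1) for a consuming task i.

    producer_finish: T^f(i', n) of the producing task i' in another unit.
    producer_ends_at_n: whether sum_{n'} w(i', n', n) equals 1.
    big_m: M (the text suggests using the horizon H).
    """
    ending = 1 if producer_ends_at_n else 0
    return producer_finish - big_m * (1 - ending)


print(consumer_start_lower_bound(6.0, True, 10.0))   # 6.0
print(consumer_start_lower_bound(6.0, False, 10.0))  # -4.0
```

When the producer does not finish at event n, the bound drops to at most zero because T^f(i',n) ≤ H = M. Start times are nonnegative, so in that case the constraint can never bind.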


Tightening Constraint 


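$$\sum_{i \in I_j}\ \sum_{n \in N}\ \sum_{n' \in N,\, n \le n' \le n+\Delta n} \big(\alpha_i\, w(i,n,n') + \beta_i\, b(i,n,n')\big) \le H \qquad \forall j \in J \qquad (17)$$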


The sum of the durations of all tasks suitable in each 
unit should be within the time horizon. 
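The tightening constraint (17) is a simple sum, shown below for one unit as a Python sketch. The tuples stand for active assignments w(i,n,n') = 1 together with their α_i, β_i, and b(i,n,n'). The numbers are made-up illustrations, not example data.

```python
def unit_busy_time(assignments):
    """Left-hand side of (17) for one unit j.

    assignments: (alpha_i, beta_i, b) for every w(i, n, n') = 1 with i in I_j.
    """
    return sum(alpha + beta * batch for alpha, beta, batch in assignments)


def satisfies_tightening(assignments, horizon):
    """True if the total processing time in the unit fits in the horizon."""
    return unit_busy_time(assignments) <= horizon


schedule = [(1.0, 0.25, 8.0), (2.0, 0.5, 4.0)]  # illustrative numbers only
print(unit_busy_time(schedule))              # 7.0
print(satisfies_tightening(schedule, 10.0))  # True
```

Every feasible schedule already satisfies (17), since tasks in a unit cannot overlap. Adding it explicitly cuts off fractional LP solutions that overload a unit.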


Utility Related Constraints 


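$$\sum_{i \in I_u} \bigg[ f_{iu} \Big( \sum_{n' \in N,\, n' \le n}\ \sum_{n'' \in N,\, n' \le n'' \le n'+\Delta n} w(i,n',n'') - \sum_{n'' \in N}\ \sum_{n' \in N,\, n' < n,\ n'' \le n' \le n''+\Delta n} w(i,n'',n') \Big) + g_{iu} \Big( \sum_{n' \in N,\, n' \le n}\ \sum_{n'' \in N,\, n' \le n'' \le n'+\Delta n} b(i,n',n'') - \sum_{n'' \in N}\ \sum_{n' \in N,\, n' < n,\ n'' \le n' \le n''+\Delta n} b(i,n'',n') \Big) \bigg] \le U_u^{\max} \qquad \forall u \in U,\ n \in N \qquad (18)$$

In (18), the consumption of each utility at an event by all suitable active tasks is limited to its maximum availability.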


Sequencing of Utility Related Tasks 


$$T^{s}_{u}(u,n+1) \ge T^{s}_{u}(u,n) \qquad \forall u \in U,\ n \in N,\ n < N \qquad (19)$$

Constraint (19) plays, for each utility, the role that (13) plays for the same task in the same unit.


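$$T^{s}_{u}(u,n) \ge T^{s}(i,n) - M\Big(1 - \sum_{n' \in N,\, n' \le n}\ \sum_{n'' \in N,\, n' \le n'' \le n'+\Delta n} w(i,n',n'') + \sum_{n'' \in N}\ \sum_{n' \in N,\, n' < n,\ n'' \le n' \le n''+\Delta n} w(i,n'',n')\Big) \qquad \forall u \in U,\ i \in I_u,\ n \in N \qquad (20)$$

$$T^{s}_{u}(u,n) \le T^{s}(i,n) + M\Big(1 - \sum_{n' \in N,\, n' \le n}\ \sum_{n'' \in N,\, n' \le n'' \le n'+\Delta n} w(i,n',n'') + \sum_{n'' \in N}\ \sum_{n' \in N,\, n' < n,\ n'' \le n' \le n''+\Delta n} w(i,n'',n')\Big) \qquad \forall u \in U,\ i \in I_u,\ n \in N \qquad (21)$$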


In constraints (20) and (21), at each event, the start times of all suitable active tasks that consume a utility are enforced to be equal, and this common value is assigned to T^s_u(u,n).


$$T^{f}(i,n-1) \ge T^{s}_{u}(u,n) - M\Big(1 - \sum_{n' \in N,\, n' \le n-1 \le n'+\Delta n} w(i,n',n-1)\Big) \qquad \forall u \in U,\ i \in I_u,\ n \in N,\ n > 1 \qquad (22)$$

$$T^{f}(i,n-1) \le T^{s}_{u}(u,n) + M\Big(1 - \sum_{n' \in N,\, n' \le n-1}\ \sum_{n'' \in N,\, n' \le n'' \le n'+\Delta n} w(i,n',n'') + \sum_{n'' \in N}\ \sum_{n' \in N,\, n' < n-1,\ n'' \le n' \le n''+\Delta n} w(i,n'',n')\Big) \qquad \forall u \in U,\ i \in I_u,\ n \in N,\ n > 1 \qquad (23)$$


In constraints (22) and (23), the end times of all suitable active tasks that consume a utility at the previous event are enforced to be no later than T^s_u(u,n). If any of these tasks finish at the previous event, their end times are enforced to be equal to T^s_u(u,n). Note that, unlike in the model of Janak et al. [5,6], the end time of utility consumption, T^f_u(u,n), is not part of the model here. Instead, it is calculated accurately as a parameter after solving the model.


Bounds on Variables 
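$$T^{s}(i,n) \le H;\quad T^{f}(i,n) \le H;\quad T^{s}_{u}(u,n) \le H;\quad b(i,n,n') \le B_i^{\max};\quad ST(s,n) \le ST_s^{\max};\quad ST_0(s) \le ST_s^{0} \qquad (24)$$

$$w(i,n,n') = 0,\quad b(i,n,n') = 0 \qquad \forall n' < n \qquad (25)$$

$$T^{s}_{u}(u,n) = 0 \qquad \forall n = 1 \qquad (26)$$

$$ST_0(s) = 0 \qquad \forall s \notin S^{R} \qquad (27)$$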


In (24), general bounds are added to the different continuous variables. The non-permissible cases of the three-index binary and continuous variables are eliminated in (25). In (26), T^s_u(u,1) at the first event is assigned to the reference start time of the horizon, which is set to zero for simplicity. In (27), the initial amounts of all states except the raw materials should be assigned their appropriate values; here they are set to zero for simplicity. Depending on the STN of the actual process, it may additionally be possible to identify tasks that cannot occur at certain events. The corresponding binary and continuous variables can then be eliminated.


Objective Function 


Maximization of Profit 


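$$\max\ \text{Profit} = \sum_{s \in S^{P}} \mathrm{price}_s \Big( ST(s,N) + \sum_{i \in I^p_s} \rho^p_{s,i} \sum_{n' \in N,\, n' \le N \le n'+\Delta n} b(i,n',N) \Big) \qquad (28)$$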


For the objective of maximization of profit, the total 
amount of the final products produced by the last event 
is considered in (28). In all the constraints involving 
big-M terms, the value of M can be assigned to the time 
horizon, H. 
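Objective (28) values both the final-product stock left at event N and the output of producing tasks that finish at event N. That output has not yet entered any ST(s,n) balance. A minimal Python sketch of this evaluation follows. The product names and amounts are illustrative assumptions, not results from the example. The $10/mu prices follow Example 1.

```python
def profit(final_products, prices):
    """Objective (28): value of each final product available at the last event.

    final_products: {s: (st_last, produced_last)} with st_last = ST(s, N) and
    produced_last = output of producing tasks that end at event N.
    prices: {s: price_s}
    """
    return sum(prices[s] * (st_last + produced_last)
               for s, (st_last, produced_last) in final_products.items())


amounts = {"Product1": (50.0, 30.0), "Product2": (20.0, 10.0)}  # illustrative
print(profit(amounts, {"Product1": 10.0, "Product2": 10.0}))    # 1100.0
```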


Minimization of Makespan (MS) 


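$$\text{Min } MS \tag{29}$$

$$ST(s,N) + \sum_{i \in I_s^P} \sum_{\substack{n' \in N,\ n = N \\ n' \le n \le n' + \Delta n}} \rho_{is}\, b(i,n',n) \ge D_s \qquad \forall\, s \in S^P \tag{30}$$

$$T^f(i,N) \le MS \qquad \forall\, i \in I \tag{31}$$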
For the objective of minimization of makespan, MS, the demand constraints for the final products are given in (30). The makespan must be an upper bound on the end time of each task at the last event, as stated in (31). The parameter H in the tightening constraint (17) must be replaced by MS.
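Constraints (30) and (31) can be checked for a candidate solution as sketched below. This is an assumed post-solution check, not part of the formulation. The left-hand side of (30) is taken as precomputed for each product, and the smallest makespan compatible with (31) is simply the latest end time at the last event.

```python
def check_makespan_solution(ms, end_times_last, delivered, demand):
    """Check constraints (30) and (31) for a candidate makespan ms.

    end_times_last[i] -- T^f(i, N), end time of task i at the last event
    delivered[s]      -- left-hand side of (30) for product s (precomputed)
    demand[s]         -- D_s
    """
    meets_demand = all(delivered.get(s, 0.0) >= d for s, d in demand.items())
    bounds_end_times = all(t <= ms for t in end_times_last.values())
    return meets_demand and bounds_end_times


def smallest_makespan(end_times_last):
    """Tightest MS allowed by (31): the latest end time at the last event."""
    return max(end_times_last.values())
```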


Computational Case Studies 


Example 1 Consider the same example discussed earlier. There are no resource considerations; hence, constraints (18)-(23) disappear. For the objective of maximization of profit, Shaik et al. [8] presented a comparative study for two different time horizons (H = 8 h and H = 12 h) for this example. In this study, we consider two more instances of this example (H = 10 h and H = 16 h) and evaluate the performance of the unified S&F model. The model statistics for the objective of maximization of profit are given in Table 3.

Similar to the CBMN model, for each instance and at each event, the S&F model is solved using increasing values of Δn = 0, 1, ...
Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Table 3
Model statistics and computational results for example 1 under maximization of profit

Model          Events  CPU time (s)    Nodes  RMILP ($)  MILP ($)  Binary variables  Continuous variables  Constraints  Nonzeros
Example 1a (H = 8 h)
S&K               5        0.07            4    1730.9    1498.6          48                235                249          859
M&G               5        0.16           26    1730.9    1498.6          64                360                826         2457
CBMN (Δt = 1)     5        0.01            4    1730.9    1498.6          32                104                114          439
I&F               4        0.03           13    1812.1    1498.6          18                 90                165          485
                  5        0.28          883    2305.3    1498.6          26                115                216          672
S&F (Δn = 0)      4        0.01            9    1730.9    1498.6          18                122                193          511
                  5        0.22          530    2123.3    1498.6          26                155                252          696
Example 1b (H = 10 h)
S&K               8      105.5         88679    2690.6    1962.7          84                433                456         1615
M&G               8      507.64       184605    2690.6    1962.7         112                609               1402         4884
CBMN (Δt = 1)     8        1.82         6449    2690.6    1860.77         56                170                189          760
CBMN (Δt = 2)     8       81.95       194968    3136.3    1959.9         104                218                261         1238
CBMN (Δt = 3)     8      207.43       366226    3136.3    1962.7         144                258                321         1635
I&F               6        2.16         6713    3078.4    1943.29         34                140                267          859
                  7       43.73       101415    3551.8    1943.29         42                165                318         1046
S&F (Δn = 0)      6        2.13         6335    2730.7    1943.29         34                188                311          881
                  7       27.93        64076    2780.2    1943.29         42                221                370         1066
S&F (Δn = 1)      6       14.40        18902    2730.7    1962.7          65                219                692         2206
Example 1c (H = 12 h)

S&K 1.93 1234 | 3002.5 2610.1 72 367 387 

29.63 16678 | 3167.8 2610.3 84 433 456 

561.58 288574 | 3265.2 2646.8 96 499 525 


10 10889.61 


11 > 67000" 17270000 | 3343.4 2646.8? | 120 6. 663 2371 
yf ell) 814 | 3002.5 2610.1 96 5 1210 4019 
8 58.31 17679 | 3167.8 2610.3 | 112 60 1402 4884 
9 2317.38 611206 | 3265.2 2646.8 | 128 6 1594 5805 

8 
3 


colIN|]—-|o 


14.39 


Ke) 


331.72 
1 
71 


4366.09 


coI;N|]—-|o 


N 


5.06 
96.13 


lee) 


3438353 


6018234 


3315.8 


2646.8 


108 
Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Table 3 (continued)

Model            CPU time (s)       Nodes   MILP ($)
Example 1d (H = 16 h)
S&K                  809.58        231810   3738.4
                   38788.48      10065424   3738.4
M&G                 8866.64       1402457   3738.4
                   > 67000        9225164   3738.4
CBMN (Δt = 1)         15.04         33503   3658.12
CBMN (Δt = 2)        206.30        315632   3738.4
CBMN (Δt = 2)      12392.47      15779526   3738.4
                      16.35         32106   3738.4
                     586.47       1057072   3738.4
                      13.68         25814   3738.4
                     340.24        487795   3738.4

ᵃ Suboptimal solution; relative gaps: 1.59%, 3.16%, 5.12%, 28.16%, 2.58%, 5.46%, 7.72%


For the first instance (example 1a, H = 8 h), the unified model (S&F) performs as well as the I&F model. The S&F model requires fewer nodes and gives better RMIP values than the I&F model, with the same number of binary variables. For the second instance (example 1b, H = 10 h), as already discussed in the motivation section, the I&F model fails to find the global optimal solution even with more events. This can be confirmed by solving the S&F model for Δn = 0, which gives the same suboptimal solution: when Δn = 0 (similar to Δt = 1 for the CBMN model), tasks are not allowed to take place over multiple events. For Δn = 0, the S&F model gives better RMIP values than the I&F model. For Δn = 1, using six events, the S&F model finds the global optimal solution in 14.4 CPU s, whereas the CBMN model requires Δt = 3 to find it. Compared to the JLF model from Table 2, the S&F model shows an almost 50% reduction in the RMIP value, the number of constraints and the number of nonzeros. It also needs fewer binary and continuous variables, in addition to its exceptional computational performance. The schedule obtained by the S&F model is similar to that of the JLF model, as shown in Fig. 3.

For the third instance (example 1c, H = 12 h), the unified model (S&F) at Δn = 0 performs slightly faster than the I&F model. The S&F model requires fewer nodes and gives better RMIP values than the I&F model. Among the slot-based/global event-based models, which require at least 11 events, only the M&G model finds the global optimal solution within the specified CPU time. The unit-specific event-based models require only 7 events to find the global optimal solution, with exceptional computational performance. Similar conclusions hold for the fourth instance (example 1d, H = 16 h). The unified model (S&F) at Δn = 0 performs faster than the I&F model, requires fewer nodes and gives better RMIP values. The slot-based/global event-based models require 10 events, while the unit-specific event-based models require only 8 events to find the global optimal solution, with faster computational performance. When an additional event is considered, the S&F model again outperforms all the other models, as seen in Table 3.

For the objective of minimization of makespan, the computational results are given in Table 4. For models involving big-M constraints, a value of M = 50 h is used, as in Shaik et al. [8]. The S&F model gives better RMIP values than the I&F model.

Among the slot-based/global event-based models that require 10 events, the CBMN model takes a total CPU time of 106.6 s. As discussed in Shaik et al. [8], since the CBMN model gives a suboptimal solution at
Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Figure 3
Gantt chart for example 1b using the S&F model

Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Table 4
Model statistics and computational results for example 1 under minimization of makespan


Model Events H CPU Time(s) Nodes 


Example 1 (Dg = Do = 200 mu) 
- 10.98 
- 519.35 


5378 
142108 


18.685 
18.685 


RMILP (h) MILP (h) Binary 


Continuous Constraints Nonzeros 


variables variables 


19.789 96 
19.340 | 108 


556 
622 


528 1936 
59 


9 50 66.55 15674 | 18.685 19.789 | 128 693 1598 5869 


10 50 | 5693.53 1066939 | 18.685 


= 0.71 1809 | 18.685 
= 50.49 134189 | 18.685 


10 = 56.11 109917 | 15.654 


776 1790 
193 216 
215 24 
279 337 


19.340 | 144 
19.789 64 
19.7899 | 72 
19.340 | 136 


il 


8 50 0.78 1008 | 12.738 19.764 45 190 367 1211 


9 50 74.26 111907 | 12.477 


50 1.42 2280 | 18.685 
50 98.60 105673 | 18.685 


ᵃ Suboptimal solution


Δt = 1, we consider the total time for both Δt = 1 and Δt = 2. Similarly, for the S&F model, the total CPU time must also be used when comparing with other models. The unit-specific event-based models require 9 events to find the global optimal solution. Here, the I&F model solves faster than the unified S&F model.
For problems involving no resource considerations, the 


Short-Term Scheduling, Resource Constrained: Unified Mod- 
eling Frameworks, Figure 4 
STN for example 2 


215 418 1398 
254 435 1244 
287 494 1429 


19.340 53 
19.764 45 
19.340 53 


S&F model was found [9] to perform as well as or better than the I&F model.


Example 2 Now consider an example with resource considerations and mixed storage policies. The STN for this example is shown in Fig. 4, and the corresponding data [5,7,8] are given in Tables 5 and 6.


Short-Term Scheduling, Resource Constrained: Unified Mod- 
eling Frameworks, Table 5 
State related data for example 2 


FI F2 I 213) P17 P2 


| s7=%{kg)_| 1000 | 1000 | 200 | 100 | 509 | 1000 | 1000 | 


sre) | 00] «oo o| of of o| | 
forcessme| 0 [| 0 | 0] of 0] 20] 0] 
Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Table 6
Task related data for example 2ᵃ


T1 
(04 


Bmin pmax 


R1|40 | 80 0.025 | 0.75 | 0.0375 


T2 T3 T4 
Vins Sins -— Sicw Vins Sins Vicw Sicw 


R2}25 |50 0.75 | 0.06 


R3|40 | 80 0.25 | 0.0125 


6 0.25 0.3 
4 0.25 0.3 
4 


0.025 


ᵃ B^min and B^max in kg; α in h; β in h/kg; γ in kg/min; and δ in kg/min per kg of batch


There are two types of reactors available for the process (types I and II): two reactors of type I (R1 and R2) and one reactor of type II (R3), in which four reactions can take place. Reactions T1 and T2 require a type I reactor, whereas reactions T3 and T4 require a type II reactor. Additionally, reactions T1 and T3 are endothermic, and the required heat is provided by steam (HS), which is available in limited amounts. Reactions T2 and T4 are exothermic, and the required cooling water (CW) is also available in limited amounts. Each reactor allows variable batch sizes, with the minimum batch size equal to half the capacity of the reactor. The processing times and the utility requirements each consist of a fixed term and a variable term proportional to the batch size. The processing times are set so that the minimum batch size is processed in 60% of the time needed for the maximum batch size. Unlimited storage is available for the raw materials and final products, while finite storage is available for the intermediates. Two cases of this example studied in the literature [5,7,8] are considered, and they differ in resource availability. In the first case (example 2a), the availability of both HS and CW is 40 kg/min, and in the second case (example 2b), it is 30 kg/min.
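A short derivation shows how the 60% rule ties the fixed and variable terms together. If the processing time is α + β·B and B^min = B^max/2, then α + β B^min = 0.6(α + β B^max) gives α = 0.25 β B^max. The sketch below solves for α in general. It then checks the rule against the values that appear in the R1, R2 and R3 rows of Table 6: α = 0.75, 0.75 and 0.25 h; β = 0.0375, 0.06 and 0.0125 h/kg; and capacities of 80, 50 and 80 kg. This reading of the garbled table is an assumption.

```python
def fixed_time_for_ratio(beta, b_max, b_min, ratio=0.6):
    """Fixed term alpha such that alpha + beta*b_min == ratio * (alpha + beta*b_max).

    Rearranging gives alpha * (1 - ratio) = beta * (ratio * b_max - b_min).
    """
    return beta * (ratio * b_max - b_min) / (1.0 - ratio)


def processing_time(alpha, beta, batch):
    """Processing time as a fixed term plus a term proportional to batch size."""
    return alpha + beta * batch


# Reactor R1 as read from Table 6 (assumed): beta = 0.0375 h/kg, B_max = 80 kg.
alpha_r1 = fixed_time_for_ratio(0.0375, 80.0, 40.0)
print(round(alpha_r1, 6))  # 0.75
```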


Two different objective functions are also considered: maximization of profit and minimization of makespan. A comparative study of different continuous-time models for this example was already provided in Janak et al. [5,6] (JLF) and Shaik et al. [8], where additional bounding constraints were added to improve the computational performance of the JLF model. In this study, to investigate the effect of using three-index binary and continuous variables, we compare the performance of the unified S&F model with that of the JLF model without any additional bounding constraints.

Maximization of Profit. For the objective of maxi- 
mization of profit and a time horizon of 8 h, the optimal 
solution is $5904.0 in the first case (example 2a) and 
$5227.778 in the second case (example 2b). The com- 
putational results in terms of the model statistics and 
the CPU times are reported in Table 7 for the models of 
JLF and S&F. 

Minimization of Makespan. For the objective of minimization of makespan, the optimal solution is 8.5 h in the first case (example 2a) and 9.025 h in the second case (example 2b). The computational results in terms of the model statistics and the CPU times are

Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Table 7 
Model statistics and computational results for example 2 under maximization of profit 


Model   Events   CPU time (s)   Nodes   RMILP ($)   MILP ($)   Binary variables   Continuous variables   Constraints   Nonzeros


Example 2a 


Pe ampete 
ire __[eat___[20a8 [119973 [5000 [ae —‘(aa_—=SCS~*~dr Cd 


sF(Gn=t1[6 [203 [240 frovisa [sso00 [efi ——S—S—=~d 9 —=di as _ 


Example 2b 


PS CammpleOSOSC—~—C—~—SCSCSCSC~*dY 
Short-Term Scheduling, Resource Constrained: Unified Modeling Frameworks, Table 8
Model statistics and computational results for example 2 under minimization of makespan

Model   Events   CPU time (s)   Nodes   RMILP (h)   MILP (h)   Binary variables   Continuous variables   Constraints   Nonzeros


Example 2a 
JLF 8.5 60 
S&F (Δn = 1)   7   8.5   54


Example 2b 
9.025 48 
9.025 24 


reported in Table 8. For constraints involving big-M terms, a common value of M = 10 is used.

For both objective functions, the unified S&F model has better RMIP values and faster computational performance than the JLF model. There is also a drastic reduction in the problem statistics in all the instances considered, especially in the number of continuous variables, constraints and nonzeros.

Conclusions

When there are resource considerations such as utility requirements, unit-specific event-based models need formulations, such as that of Janak et al. [5,6], that allow tasks to take place over multiple events. This study demonstrates that, even for short-term scheduling problems of batch plants with no resource considerations, tasks must be allowed to take place over multiple events to ensure that the global optimal solution is achieved. In such cases, the model of Ierapetritou and Floudas [4] yields suboptimal solutions. The model of Janak et al. [5,6], which was originally developed for problems with resource constraints, does not reduce well to the case of no resources in terms of problem statistics. Hence, a unified modeling approach is discussed, based on the unit-specific event-based continuous-time formulation of Shaik and Floudas [9], which uses three-index binary and continuous variables. Their model applies to problems both with and without resources in a unified way. The Shaik and Floudas [9] model is found to perform as well as or better than the I&F model for problems with no resource considerations, and to perform better than the JLF model for problems with resource constraints.

Nomenclature

Indices
i, i'        tasks
j, j'        units
n, n', n''   events
s            states
u            utilities

Sets
I        tasks
I_j      tasks which can be performed in unit j
I_s      tasks which process state s and either produce or consume it
I_s^P    tasks which produce state s
I_s^C    tasks which consume state s
I_u      tasks which consume utility u
J        units
J_i      units which are suitable for performing task i
N        event points within the time horizon
S        states
S^R      states that are raw materials
S^IN     states that are intermediates
S^P      states that are final products
U        utilities

Parameters
B_i^min   minimum capacity (batch size) of task i
B_i^max   maximum capacity (batch size) of task i
ST_s^0    initial amount of state s available
ST_s^max  maximum amount of state s
α_i       coefficient of constant term of processing time of task i
β_i      coefficient of variable term of processing time of task i
γ_iu     coefficient of constant term of consumption of utility u by task i
δ_iu     coefficient of variable term of consumption of utility u by task i
ρ_is     proportion of state s produced (ρ_is > 0) or consumed (ρ_is < 0) by task i
H        time horizon, h
price_s  price of state s
D_s      demand for state s
Δn       limit on the maximum number of events over which a task is allowed to continue
U_u^max  maximum availability of utility u
M        large positive number in big-M constraints

Binary Variables
w(i,n,n')   binary variable for assignment of task i that starts at event n and ends at event n'

Positive Variables
b(i,n,n')   amount of material undertaking task i that starts at event n and ends at event n'
ST_0(s)     initial amount of state s ∈ S^R that is required from external resources
ST(s,n)     excess amount of state s that needs to be stored at event n
T^s(i,n)    time at which task i starts at event n
T^f(i,n)    time at which task i ends at event n
T_u^s(u,n)  start time at which there is a change in the consumption of utility u at event n
consumption of utility u at event n 
Parameters
price(s) = price of state s
ρ^p(s,i), ρ^c(s,i) = proportion of state s produced and consumed by task i, respectively
r(s) = market requirement for state s at the end of the time horizon
Vmin(i,j) = minimum capacity of unit j when processing task i
Vmax(i,j) = maximum capacity of unit j when processing task i
stmax(s) = maximum storage capacity of state s

Variables
wv(i,j,n) = binary variables that assign the beginning of task i in unit j at event point n
b(i,j,n) = amount of material undertaking task i in unit j at event point n
d(s,n) = amount of state s being delivered to the market at event point n
H = time horizon
st(s,n) = amount of state s at event point n
Ts(i,j,n) = starting time of task i in unit j at event point n
Tf(i,j,n) = finishing time of task i in unit j at event point n


Introduction 


There has been a significant amount of work devoted 
to the area of short-term scheduling, which involves 
the determination of the order in which tasks use 
units and various resources and the detailed timing 
of the execution of all tasks so as to achieve the de- 
sired performance. The problem data are usually as- 
sumed to be deterministic in the studies. However, in 
real plants, parameters like raw material availability, 
processing times, and market requirements vary with 
respect to time and are often subject to unexpected de- 
viations. Therefore, the consideration of uncertainty in 
the scheduling problem becomes of great importance in 
order to preserve plant feasibility and viability during 
operations. 

Although there are a large number of papers that 
address uncertainty in process design, much less at- 
tention has been devoted to the issue of uncertainty 
in process planning and scheduling, mainly owing to 
the increased complexity of the deterministic problem. 
Among the work that has appeared in the literature 


is that of Shah and Pantelides [18] that addressed the 
problem of the design of multipurpose batch plants 
considering different schedules for different sets of 
production requirements using a scenario-based ap- 
proach [6] and an approximate solution strategy. Pis- 
tikopoulos and Ierapetritou [14] presented a two-stage 
stochastic programming formulation for the problem 
of batch plant design and operations under uncer- 
tainty. The multiperiod planning and scheduling of 
multiproduct plants under demand uncertainty was ad- 
dressed by Petkov and Maranas [13]. The stochastic 
elements in their proposed model are expressed with 
equivalent deterministic forms, resulting in a convex 
mixed-integer nonlinear programming (MINLP) prob- 
lem. Schmidt and Grossmann [16] considered the op- 
timal scheduling of new product testing tasks and re- 
formulated the initial nonlinear, nonconvex disjunctive 
model as a mixed-integer linear programming (MILP) 
problem using different sets of simplifying assumptions 
that give rise to different models. The uncertainties in 
planning and scheduling problems are generally de- 
scribed through probabilistic models. During the last 
decade, fuzzy set theory has been applied to scheduling 
optimization using heuristic search techniques [8,10]. 
Recently, Balasubramanian and Grossmann [2] devel- 
oped MILP models for flowshop scheduling and new 
product development processing scheduling, based on 
a fuzzy representation of uncertainty. Daniels and Car- 
rillo [4] addressed the problem of B-robust schedul- 
ing in single-stage production facilities with uncer- 
tain processing times. Vin and Ierapetritou [20] pro- 
posed a multiperiod programming model to improve 
the schedule performance of batch plants under de- 
mand uncertainty. Acevedo and Pistikopoulos [1] ad- 
dressed linear process engineering problems under un- 
certainty using a branch and bound algorithm, based 
on solution of multiparametric linear programs at each 
node of the tree, and the evaluation of the uncertain 
parameter space for which a node must be consid- 
ered. 

However, most of the existing approaches can han- 
dle only a certain type of uncertain parameters, mostly 
uncertainty in product demands. More importantly, the additional complexity makes them impractical for realistic applications. In this work, a novel framework is proposed for uncertainty analysis of scheduling problems. It is based on the ideas of sensitivity analysis of the corresponding MILP problem, which is addressed using a branch and bound solution method.


Methods 
Deterministic Scheduling Formulation 


In this work, the mathematical model used for batch 
plant scheduling follows the main idea of the contin- 
uous time formulation proposed by Ierapetritou and 
Floudas [9]. The model involves the following con- 
straints: 


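$$\text{Minimize } H \quad \text{or maximize } \sum_{s}\sum_{n} price(s)\, d(s,n), \tag{1}$$
$$\text{subject to } \sum_{i \in I_j} wv(i,j,n) \le 1, \tag{2}$$
$$st(s,n) = st(s,n-1) - d(s,n) + \sum_{i \in I_s} \rho^p(s,i) \sum_{j \in J_i} b(i,j,n-1) + \sum_{i \in I_s} \rho^c(s,i) \sum_{j \in J_i} b(i,j,n), \tag{3}$$
$$st(s,n) \le stmax(s), \tag{4}$$
$$Vmin(i,j)\, wv(i,j,n) \le b(i,j,n) \le Vmax(i,j)\, wv(i,j,n), \tag{5}$$
$$\sum_{n} d(s,n) \ge r(s), \tag{6}$$
$$Tf(i,j,n) = Ts(i,j,n) + \alpha(i,j)\, wv(i,j,n) + \beta(i,j)\, b(i,j,n), \tag{7}$$
$$Ts(i,j,n+1) \ge Tf(i,j,n) - U\,\big(1 - wv(i,j,n)\big), \tag{8}$$
$$Ts(i,j,n) \ge Tf(i',j,n) - U\,\big(1 - wv(i',j,n)\big), \tag{9}$$
$$Ts(i,j,n) \ge Tf(i',j',n) - U\,\big(1 - wv(i',j',n)\big), \tag{10}$$
$$Ts(i,j,n+1) \ge Ts(i,j,n), \tag{11}$$
$$Tf(i,j,n+1) \ge Tf(i,j,n), \tag{12}$$
$$Tf(i,j,n) \le H, \tag{13}$$
$$Ts(i,j,n) \le H, \tag{14}$$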


where U denotes an upper bound on the makespan when the objective is minimization of the makespan; when maximization of profit is considered, U = H in constraints (8)-(10). As shown in (1), the objective is either to minimize the makespan H or to maximize the total profit. Allocation constraint (2) states that only one task can be performed in each unit at an event point n. Constraint (3) is the material balance: the amount of each material at event point n equals that at event point n − 1, adjusted by any amounts produced and consumed between event points n − 1 and n and by the amount delivered to the market at event point n. The storage and capacity limitations of the production units are expressed by constraints (4) and (5). Constraint (6) ensures that the demands for the final products are satisfied. Constraints (7)-(14) represent time limitations due to task duration and sequence requirements in the same or different production units.
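For a single assignment, the capacity constraint (5), the duration constraint (7) and the same-task sequencing constraint (8) can be evaluated directly, as in the minimal sketch below. The Assignment container and the function names are hypothetical. α(i, j) and β(i, j) are the fixed and batch-proportional processing-time coefficients that appear in (7).

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """Values of one (task i, unit j, event n) assignment (hypothetical container)."""
    wv: int        # binary allocation variable wv(i, j, n)
    batch: float   # b(i, j, n)
    start: float   # Ts(i, j, n)


def finish_time(a, alpha, beta):
    """Constraint (7): Tf = Ts + alpha * wv + beta * b."""
    return a.start + alpha * a.wv + beta * a.batch


def capacity_ok(a, vmin, vmax):
    """Constraint (5): Vmin * wv <= b <= Vmax * wv, so b must be 0 when wv = 0."""
    return vmin * a.wv <= a.batch <= vmax * a.wv


def sequencing_ok(prev_finish, prev_wv, next_start, big_u):
    """Constraint (8): Ts(i,j,n+1) >= Tf(i,j,n) - U * (1 - wv(i,j,n))."""
    return next_start >= prev_finish - big_u * (1 - prev_wv)
```

When wv = 0, the big-U term relaxes (8), so the next start time is unconstrained by an inactive task.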

Although there are a large number of papers 
that deal with uncertainty issues concerning pro- 
cess design and production planning, as reported in 
“Introduction,” the issue of uncertainty is not well stud- 
ied for scheduling problems, mainly owing to the high 
complexity of the deterministic case. 


MILP Sensitivity Analysis 


The formulation presented in “Deterministic Scheduling Formulation” corresponds to an MILP problem in which the binary variables wv(i, j, n) denote the assignment of task i to unit j at event point n throughout the time horizon. Therefore, the effects of operation parameters on plant performance can be investigated through sensitivity analysis of the MILP model of the deterministic scheduling problem.

Although sensitivity analysis theory is well devel- 
oped in linear programming, efforts are still being 
made in order to handle the integer programming case, 
mainly owing to lack of optimality criteria for the in- 
teger optimization problems. Schrage and Wolsey [17] 
examined the effect of a small perturbation on the right- 
hand side or objective function coefficients in an integer 
program by collecting dual information at each node of 
the branch and bound tree while solving the original 
integer program and using a recursive scheme to ob- 
tain an upper bound on the objective function. Their re- 
sults were extended to nonlinear integer programming 
problems by Skorin-Kapov and Granot [19]. Pertsinidis et al. [12] developed a sensitivity analysis algorithm for parametric MILP programs, which also provides the results of the MILP master problem for the MINLP case. For parametric nonlinear integer programming subproblems, they employ an algorithm that provides a sequence of improving parametric lower and upper bounds. An algebraic geometry algorithm for solving integer programming problems was presented by Bertsimas et al. [3]. Their method provides a natural generalization of the Farkas lemma for integer programming and leads to a method of performing sensitivity analysis.

A method of sensitivity analysis for MILP was pre- 
sented by Dawande and Hooker [5], based on the idea 
of inference duality. It shows that any perturbation that satisfies a certain system of linear inequalities will reduce the optimal value by no more than a prespecified amount. The inference-based sensitivity analysis consists of two parts. Dual analysis determines how much the problem can be perturbed while keeping the objective function value in a certain range. Primal analysis gives an upper bound on how much the objective function value will increase if the problem is perturbed by a certain amount. The dual solution is
obtained by using inference methods to generate con- 
straints at every node that is violated by the branching 
cuts. The dual solution can be viewed as a proof of op- 
timality and can be utilized to determine under what 
parameter perturbations the dual solution still provides 
a valid proof. More specifically the main results of the 
inference-based sensitivity analysis are summarized be- 
low. 

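For the general mixed-integer problem

$$\text{Minimize } z = cx \quad \text{subject to } Ax \ge a,\ 0 \le x \le h,\ x_j \text{ integer},\ j = 1, \dots, k, \tag{15}$$

assume a perturbation of all problem parameters such that

$$\text{Minimize } z = (c + \Delta c)x \quad \text{subject to } (A + \Delta A)x \ge a + \Delta a,\ 0 \le x \le h,\ x_j \text{ integer},\ j = 1, \dots, k. \tag{16}$$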


If there exist s_1^p, ..., s_n^p that satisfy the following set of inequalities, the constraint z ≥ z* − Δz remains valid. For the perturbations ΔA and Δa in the parameters on the left-hand side and the right-hand side of the constraints:


eee p tty (17) 


n 
i= —)> gf + Wa—zy+ AZp; 
j=l 
for a perturbation Ac of the coefficients of the objective 
function 


n 
P P(>P P 
Dd Aciuy = 85 (Gj — uj) = rp 
j=l (18) 


P P Bo ars 
sh > —Ac;, sf > gi, fale. 


where q_j^p = λ^p ΔA_j − Δc_j, and p ranges over the leaf nodes where the dual variable of the objective function equals 1. Here u_j^p and h_j^p denote the lower and the upper bound of x_j at node p, respectively; z_p is the objective value at node p; and Δz_p = z* − z_p. Leaf nodes are the nodes at which the branch and bound procedure terminates based on standard fathoming criteria [7]. Thus, by using constraints (17) and (18) in the scheduling problem, one can identify the range of parameters within which the objective stays within certain limits. This range can then be used to evaluate alternative schedules in the branch and bound tree. The analysis also reveals the importance of different constraints and parameters, which can be used to improve future plant operability.
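The full inequalities (17) and (18) combine perturbations of A, a and c. When only the right-hand side a changes, the idea reduces to LP weak duality. The dual solution of each leaf node (λ^p together with the multipliers of the variable bounds) stays dual feasible. The perturbed LP value at leaf p is therefore at least z_p + λ^p Δa, and z ≥ z* − Δz stays valid if this quantity is at least z* − Δz at every leaf. The sketch below implements only that special case, not the complete Dawande and Hooker analysis. It assumes a minimization problem, node bounds kept fixed, and every leaf LP-feasible; leaves pruned by infeasibility would need a separate certificate.

```python
import numpy as np


def rhs_bound_valid(leaves, delta_a, z_star, delta_z):
    """Sufficient check that z >= z* - delta_z survives a change a -> a + delta_a.

    leaves  -- (z_p, lam_p) pairs for the leaf nodes of the branch-and-bound tree
               of a minimization problem; z_p is the LP value at leaf p and lam_p
               the optimal duals of the constraints A x >= a at that leaf.
    Only the right-hand side changes; A, c and the node bounds on x stay fixed,
    and every leaf is assumed LP-feasible.
    """
    delta_a = np.asarray(delta_a, dtype=float)
    # The leaf's dual solution stays feasible, so by weak duality the perturbed
    # LP value at leaf p is at least z_p + lam_p . delta_a.
    worst = min(z_p + float(np.dot(lam_p, delta_a)) for z_p, lam_p in leaves)
    return worst >= z_star - delta_z, worst
```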


Robustness Metric 


In order to improve the schedule flexibility prior to its 
execution, it is important to measure the performance 
of a deterministic schedule under changing conditions 
due to uncertainty. 

Standard deviation (SD) is one of the most commonly used metrics to evaluate the robustness of a schedule. To evaluate the SD, the deterministic model with a fixed sequence of tasks (wv(i, j, n)) is solved for different realizations of the uncertain parameters. These realizations define the set of scenarios k and result in different makespans H_k. The SD is then defined as


$$SD = \sqrt{\frac{\sum_k \left(H_k - H_{avg}\right)^2}{P_{tot} - 1}}, \qquad H_{avg} = \frac{\sum_k H_k}{P_{tot}} \tag{19}$$
where H_avg is the average makespan over all the scenarios, and P_tot denotes the total number of scenarios. A detailed discussion of different robustness metrics can be found in Samsatli et al. [15]. Vin and Ierapetritou [20] proposed a robustness metric that takes infeasible scenarios into consideration. In the case of infeasibility, the problem is solved to meet the maximum possible demand by incorporating slack variables in the demand constraints. The inventories of all raw materials and intermediates at the end of the schedule are then used as the initial conditions in a new problem with the same schedule to satisfy the unmet demand. The makespan under infeasibility (H_corr) is determined as the sum of those two makespans. Their proposed robustness metric is defined as


$$SD_{corr} = \sqrt{\frac{\sum_k \left(H_{act} - H_{avg}\right)^2}{P_{tot} - 1}} \tag{20}$$

where H_act = H_k if scenario k is feasible and H_act = H_corr if scenario k is infeasible.
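Equations (19) and (20) translate directly into code. The sketch below assumes that H_avg in (20) is the average of the H_act values, which the text does not state explicitly. It represents each scenario as a (feasible, H_k, H_corr) tuple; both choices are illustrative.

```python
import math


def sd_makespan(makespans):
    """Eq. (19): sample standard deviation of the makespans (divisor P_tot - 1)."""
    p_tot = len(makespans)
    h_avg = sum(makespans) / p_tot
    return math.sqrt(sum((h - h_avg) ** 2 for h in makespans) / (p_tot - 1))


def sd_corr(scenarios):
    """Eq. (20): use H_corr for infeasible scenarios and H_k otherwise.

    scenarios -- list of (feasible, h_k, h_corr); h_corr is ignored when feasible.
    """
    h_act = [h_k if feasible else h_corr for feasible, h_k, h_corr in scenarios]
    return sd_makespan(h_act)
```

For three scenarios with H_act = 10, 14 and 12 h (the second one infeasible and corrected), H_avg = 12 h and SD_corr = sqrt(8/2) = 2 h.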


Proposed Uncertainty Analysis Approach 


The basic idea of the proposed approach is to utilize 
the information obtained from the sensitivity analysis 
of the deterministic solution to determine (1) the im- 
portance of different parameters and constraints and 
(2) the range of parameters where the optimal solution 
remains unchanged. The main steps of the proposed 
approach are shown in Fig. 1. More specifically, there 
are two parts in the proposed analysis. In the first part, 
important information about the effect of different pa- 
rameters is extracted following the sensitivity analysis 
step, whereas in the second part alternative schedules 
are determined and evaluated for different uncertainty 
ranges. 

First, the deterministic scheduling problem is solved at the nominal values using a branch and bound solution approach, and the dual multipliers λ^p are collected at each leaf node p. Then the inference-based sensitivity analysis described in the previous section is performed for all the important scheduling parameters, including demands, prices, processing times and capacities. Note that only the dual information of the nodes that correspond to nonzero dual variables is required. The results of this analysis answer a number of useful questions about the robustness of the plant to parameter changes.

In particular:

• How does the capacity of the units affect the production objective?

• What is the range of product demand that can be covered, and how much would the profit be affected by such changes?

• What is the effect of a price change on the objective value?

• What is the significance of the constraints involved in the model? Are there any redundant sets of constraints?

The first question can be answered by imposing the 
same perturbation on capacity constraint (5) for the 
different units involved in the production of specific 
products and determining the change in the objective 
function (Az). The unit with the largest effect on the 
objective value is also the most critical one for the pro- 
duction of this product and thus a change in its capac- 
ity will result in the largest production change. Simi- 
larly, the rest of the questions can be answered by an- 
alyzing perturbations at the appropriate constraints to- 
gether with the effects on the objective function. The 
results for two examples are given in the next section. 

In the second part of the analysis, the sensitiv- 
ity information is used to define the range of uncer- 
tain parameters where the schedule is optimal and to 
identify alternative schedules at different uncertainty 
ranges. The set of constraints (17) and (18) are used to 
determine the range of uncertain parameters for cer- 
tain changes in the objective function. The branch and 
bound procedure is then continued on the nodes with 
the objective value within the predicted limits to identify new optimal solutions. The alternative schedules are evaluated in terms of the robustness metric SD_corr defined in “Robustness Metric,” and in terms of the average and nominal schedule performance with respect to the objective function.

Since the entire analysis is based on a single branch and bound tree among a large number of possible branch and bound trees that can be used to solve the MILP, it provides conservative sensitivity ranges. Theoretically, the exact sensitivity ranges can be obtained by investigating an exponential number of branch and bound trees. However, using the above analysis, one can extract useful information regarding the approximate range of the parameter change for a certain objective change and the robustness of the plant to parameter changes, and one can also determine the importance of different parameters, as illustrated in the next section.

Short-Term Scheduling Under Uncertainty: Sensitivity Analysis, Figure 1
Flow chart of proposed approach. B&B branch and bound
[Solve the deterministic scheduling problem using B&B tree → Extract information from the leaf nodes (primal and dual): range of parameter change for certain objective change, plant robustness to parameter changes → Move the bounds of the uncertain parameter range → Identify the feasible schedules by examining the B&B tree → Evaluate the alternative schedules: robustness metric, nominal performance, average performance]

Case Study

The case study [9] considers two different products produced through five processing stages: heating, reactions 1, 2 and 3, and separation of product 2 from impure E, as illustrated in the state-task network (STN) representation in Fig. 2.

Short-Term Scheduling Under Uncertainty: Sensitivity Analysis, Figure 2
State-task network representation for the case

For the first part of the analysis, the problem is solved with the objective of maximizing the profit within the time horizon of 12 h. After the sensitivity
analysis has been performed, the following information 
is obtained. It is found that the most critical task of the 
production line is reaction 2. By decreasing the process- 
ing capacity of reaction 2 in reactor 1 or reactor 2 by 
11 units, the profit will be reduced by 5%, whereas very 
small change or no change at all is observed in the ob- 
jective function with a processing capacity change for 
reaction 1 or reaction 3 in both reactors. The objective 
value is also not sensitive to the change of other param- 
eters, for example, the processing capacity of separation 
in the separator can drop by up to 120 units without the 
profit decreasing. Another important modeling issue 
that can be addressed is the question of constraint re- 
dundancy. Here the importance of storage constraints 
is investigated and it is found that these constraints are 
redundant since they are not active in any of the solu- 
tion branch and bound nodes. More interestingly, the 
duration constraints are also found to be redundant, 
which means that the maximum processing capacities 
are already reached with the current processing times, 
so the profit cannot be improved even with zero pro- 
cessing times assuming a fixed number of event points. 
For the second part of the analysis, the demand of product 2 is considered to be the uncertain parameter, varying within the range [20, 80], and the objective function is modified to minimize the makespan. A branch and bound tree is constructed at the nominal point r('p2') = 50 and the dual information is stored at each node.

Applying the inference duality sensitivity analysis, one obtains the following expression for the range of demand change that follows a specific objective change (ΔH): −0.0297Δd ≤ ΔH, which means that if the demand is increased by Δd, the new makespan becomes at most H_nom + 0.0297Δd. When r('p2') is increased from 50 to 80, schedule 1 becomes infeasible. Then, we solve the linear programming problem at each leaf node with the demand of 80 and check the leaf nodes of the branch and bound tree whose objective value is below 7.89, the limit obtained from this inequality. The new optimal solution is found to be schedule 2, and schedule 3 is one feasible solution, as illustrated in Table 2. The schedules are then evaluated with respect to the mean and nominal makespan and the SD within the demand range [20, 80], and the values are shown in Table 1. Compared with schedule 2, schedule 3 has a larger mean makespan but a lower SD, which means higher robustness; therefore, depending on the decision-maker's attitude towards risk and the expected growth in demand, one can choose schedule 3 over schedule 2, whereas schedule 1 remains a valid alternative if the demand is expected to remain constant.

Short-Term Scheduling Under Uncertainty: Sensitivity Analysis, Table 1
Comparison of alternative schedules for the case (columns: Schedule 1, Schedule 2, Schedule 3)

Short-Term Scheduling Under Uncertainty: Sensitivity Analysis, Table 2
Values of binary variables of optimal schedules (one panel each for schedules 1, 2 and 3; rows (task, unit): (Heating, heater), (Reaction 1, reactor 1), (Reaction 1, reactor 2), (Reaction 2, reactor 1), (Reaction 2, reactor 2), (Reaction 3, reactor 1), (Reaction 3, reactor 2), (Separation, still); columns: event points n0, n1, n2, n3)
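Once each candidate schedule's makespan is known over the uncertain demand range, the final evaluation step reduces to simple statistics. In this sketch the makespan values are hypothetical numbers on a demand grid over [20, 80] (they are not the values of Table 1). SD is taken as the population standard deviation over that grid, and the nominal value is read at a demand of 50. Both choices are assumptions about how the metrics are computed.

```python
import statistics

def evaluate_schedule(makespans, nominal_demand=50):
    """makespans: dict demand -> makespan of one schedule (feasible demand points only)."""
    values = list(makespans.values())
    return {
        "nominal": makespans[nominal_demand],
        "mean": statistics.fmean(values),
        "sd": statistics.pstdev(values),   # population SD over the demand grid
    }

# Hypothetical makespans (h) of two schedules on a demand grid over [20, 80].
schedule_a = {20: 6.0, 35: 6.5, 50: 7.0, 65: 7.5, 80: 8.0}
schedule_b = {20: 7.2, 35: 7.35, 50: 7.5, 65: 7.65, 80: 7.8}
a = evaluate_schedule(schedule_a)
b = evaluate_schedule(schedule_b)
print(a)
print(b)
```

With these made-up numbers, schedule B has the larger mean but the smaller SD. This mirrors the trade-off described above for schedules 3 and 2.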

Note that the proposed uncertainty analysis does 
not substantially increase the problem complexity, because the required information is already obtained from the solution of the deterministic problem.


Conclusions 


An integrated framework was developed in the work 
reported here to handle uncertainty in short-term 
scheduling based on the idea of inference-based sen- 
sitivity analysis for the MILP problem and the utiliza- 
tion of a branch and bound solution method. The pro- 
posed method leads to the determination of the impor- 
tance of different parameters and constraints on the ob- 
jective function and the generation and evaluation of 
a set of alternative schedules given the variability of the 
uncertain parameters. The main advantage of the pro- 
posed method is that no substantial complexity is added 
compared with the solution of the deterministic case 
since the only additional information required is the 
dual information at the leaf nodes of the branch and 
bound tree. One illustrative example was presented to 
highlight the information extracted by the proposed ap- 
proach and the complexity involved. 
Signal processing methodologies based on higher-order statistics or spectra (HOS) of order greater than two have become important signal processing tools in a variety of application areas: digital communications, system identification and spectral analysis, source separation and array processing, time delay estimation, image and speech processing, and biomedical applications, among others, [3,4,9]. The increased popularity of HOS in signal processing applications can be attributed to several attractive properties they possess: preservation of nonminimum phase information, the ability to detect and identify nonlinear behavior, robustness to Gaussian and other forms of noise, etc. It is well known that when a signal is Gaussian there is no benefit in considering its HOS, since all the statistical information is conveyed by its first- and second-order statistics (SOS). However, for non-Gaussian signals the SOS do not provide a complete description, and much important information can be extracted from their HOS, [8,9,11].

The n-th order moment and cumulant sequences of an n-th order stationary random process $\{y(i)\}$, $i = 1, 2, \dots$, are defined as, [9]:

$$m_{y,n}(\tau_1,\dots,\tau_{n-1}) = E\{y_1 y_2 \cdots y_n\},$$
$$C_{y,n}(\tau_1,\dots,\tau_{n-1}) = \sum (-1)^{p-1}(p-1)!\; E\Big\{\prod_{i\in I_1} y_i\Big\} \cdots E\Big\{\prod_{i\in I_p} y_i\Big\}, \qquad (1)$$

where $y_1 = y(i)$, $y_2 = y(i+\tau_1)$, …, $y_n = y(i+\tau_{n-1})$, the summation covers all partitions $(I_1,\dots,I_p)$, $p = 1,\dots,n$, of the set $\{1,\dots,n\}$, $\tau_i = 0, \pm 1, \pm 2, \dots$, and $E\{\cdot\}$ denotes statistical expectation. The second-, third- and fourth-order cumulants of zero-mean processes are often used in practice and take the form


$$C_{y,2}(\tau) = m_{y,2}(\tau), \qquad (2)$$
$$C_{y,3}(\tau_1,\tau_2) = m_{y,3}(\tau_1,\tau_2), \qquad (3)$$
$$C_{y,4}(\tau_1,\tau_2,\tau_3) = m_{y,4}(\tau_1,\tau_2,\tau_3) - m_{y,2}(\tau_1)\,m_{y,2}(\tau_2-\tau_3) - m_{y,2}(\tau_2)\,m_{y,2}(\tau_3-\tau_1) - m_{y,2}(\tau_3)\,m_{y,2}(\tau_1-\tau_2). \qquad (4)$$

Here $\gamma_{y,2} = C_{y,2}(0)$ is the variance, $\gamma_{y,3} = C_{y,3}(0,0)$ is the skewness, and $\gamma_{y,4} = C_{y,4}(0,0,0)$ is the kurtosis of $\{y(i)\}$. For a complex process the definitions in (1) may include conjugation in one or more terms in the products.

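The n-th order polyspectrum (higher-order spectrum) of $\{y(i)\}$ is defined as the $(n-1)$-dimensional discrete Fourier transform of $C_{y,n}(\tau_1,\dots,\tau_{n-1})$, that is

$$S_{y,n}(\omega_1,\dots,\omega_{n-1}) = \sum_{\tau_1=-\infty}^{\infty}\cdots\sum_{\tau_{n-1}=-\infty}^{\infty} C_{y,n}(\tau_1,\dots,\tau_{n-1}) \prod_{l=1}^{n-1} e^{-j\omega_l\tau_l}, \qquad (5)$$

$|\omega_l| \le \pi$, $l = 1,\dots,n-1$, $\big|\sum_{l=1}^{n-1}\omega_l\big| \le \pi$. For n = 2, 3, 4 we obtain the power spectrum, the bispectrum and the trispectrum, respectively.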
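Definition (5) can be checked for n = 3 on a case where the answer is known in closed form. The sketch assumes an FIR filter f(0), …, f(q) driven by i.i.d. input with third-order cumulant γ₃, so that C_{y,3}(τ₁, τ₂) = γ₃ Σ_k f(k) f(k+τ₁) f(k+τ₂). Under that assumption, the double sum in (5) should equal γ₃ F(ω₁) F(ω₂) F(−ω₁−ω₂), where F is the frequency response of f. The taps, γ₃ and the frequencies are arbitrary illustrative values, and the helper names are hypothetical.

```python
import numpy as np

def tap(f, k):
    """Filter coefficient f(k), taken as zero outside 0..q."""
    return f[k] if 0 <= k < len(f) else 0.0

def cumulant3(f, gamma3, t1, t2):
    """C_{y,3}(t1, t2) = gamma3 * sum_k f(k) f(k+t1) f(k+t2) (FIR filter, i.i.d. input)."""
    return gamma3 * sum(tap(f, k) * tap(f, k + t1) * tap(f, k + t2) for k in range(len(f)))

def bispectrum_via_5(f, gamma3, w1, w2):
    """Definition (5) for n = 3; the cumulant vanishes unless |t1|, |t2| <= q."""
    q = len(f) - 1
    return sum(cumulant3(f, gamma3, t1, t2) * np.exp(-1j * (w1 * t1 + w2 * t2))
               for t1 in range(-q, q + 1) for t2 in range(-q, q + 1))

def freq_response(f, w):
    return sum(fk * np.exp(-1j * w * k) for k, fk in enumerate(f))

f, gamma3 = [1.0, -0.5, 0.3], 2.0
w1, w2 = 0.7, -1.9
direct = bispectrum_via_5(f, gamma3, w1, w2)
closed = gamma3 * freq_response(f, w1) * freq_response(f, w2) * freq_response(f, -w1 - w2)
print(direct, closed)  # the two values agree up to rounding
```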

Cumulants are used as measures of 'Gaussianity' and statistical independence because they satisfy the following two important properties:

1) If the set of random variables $\{y(i), y(i+\tau_1), \dots, y(i+\tau_{n-1})\}$ can be divided into two or more mutually independent subsets, then $C_{y,n}(\tau_1,\dots,\tau_{n-1}) = 0$. Therefore, if a random process $\{y(i)\}$ is independent identically distributed, then
$$C_{y,n}(\tau_1,\dots,\tau_{n-1}) = \gamma_{y,n}\,\delta(\tau_1)\cdots\delta(\tau_{n-1}),$$
where $\delta(\tau)$ is the delta function, $\delta(0) = 1$ and $\delta(\tau) = 0$ for $\tau \ne 0$. Also, given that $\{x(i)\}$ and $\{z(i)\}$, $i = 1, 2, \dots$, are two independent processes and $y(i) = x(i) + z(i)$, then
$$C_{y,n}(\tau_1,\dots,\tau_{n-1}) = C_{x,n}(\tau_1,\dots,\tau_{n-1}) + C_{z,n}(\tau_1,\dots,\tau_{n-1}). \qquad (6)$$

2) If the random variables $\{y(i), y(i+\tau_1), \dots, y(i+\tau_{n-1})\}$ are jointly Gaussian, then $C_{y,n}(\tau_1,\dots,\tau_{n-1}) = 0$ for $n \ge 3$.

These properties do not hold for moments. For this reason, cumulants are often more attractive than moments in applications of HOS, [7,9].
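Both properties can be illustrated numerically. This is a minimal sketch that assumes the usual zero-lag sample estimate C_{y,4}(0,0,0) ≈ mean(y⁴) − 3·mean(y²)² for centered data. A Gaussian sample gives a kurtosis near zero (property 2). A unit-variance uniform sample gives a kurtosis near −1.2, and the sum of the two independent samples inherits the kurtosis of the uniform part, as (6) predicts.

```python
import numpy as np

def kurtosis_cumulant(y):
    """Sample fourth-order cumulant at zero lag: m4 - 3 m2^2 (after removing the mean)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    m2 = np.mean(y ** 2)
    return np.mean(y ** 4) - 3.0 * m2 ** 2

rng = np.random.default_rng(0)
N = 400_000
gauss = rng.standard_normal(N)                       # gamma_4 = 0
unif = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N)   # unit variance, gamma_4 = -1.2
print(kurtosis_cumulant(gauss))          # close to 0
print(kurtosis_cumulant(unif))           # close to -1.2
print(kurtosis_cumulant(gauss + unif))   # close to 0 + (-1.2), as in (6)
```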

Often in practice we want to calculate the higher-order cumulants of the process

$$y(k) = s(k) + w(k),$$

where $s(k)$ may be either a deterministic signal or a random process and $w(k)$ is a zero-mean stationary Gaussian noise process independent of $s(k)$. In practice, the estimation of HOS is based on time averaging. To overcome problems of nonstationarity of $y(k)$ in the case where $s(k)$ is a deterministic energy signal, it is necessary to assume multiple realizations $y_j(k)$, $j = 1,\dots,J$, $k = 1,\dots,K$, of sufficient length and to estimate the n-th order moment as follows:

$$\hat m_{y,n}(\tau_1,\dots,\tau_{n-1}) = G\sum_{j=1}^{J}\sum_{k=1}^{K} y_j(k)\,y_j(k+\tau_1)\cdots y_j(k+\tau_{n-1}) \qquad (7)$$

for $\tau_i = 0, \pm 1, \dots, \pm L$, and $G = 1/J$, [9,10]. If $s(k)$ is random and locally stationary or a deterministic power signal, then (7) applies by segmenting $y(k)$ into J possibly overlapping segments (considered as multiple realizations) with $G = 1/(JK)$. The sample estimate $\hat C_{y,n}(\tau_1,\dots,\tau_{n-1})$ is obtained by substituting moment estimates in the definitions of cumulants.
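Estimator (7), followed by the substitution into (2)-(4), can be sketched as follows. The sketch assumes zero-mean data stored as a J × K array, one row per realization or segment, and G = 1/(JK) by default. It also treats samples whose index falls outside 1, …, K as zero, which is one common convention for finite records but not something (7) specifies. The function names are hypothetical.

```python
import numpy as np

def moment_estimate(Y, lags, G=None):
    """Estimator (7): G * sum_j sum_k y_j(k) y_j(k+tau_1) ... y_j(k+tau_{n-1}).

    Y: (J, K) array of zero-mean realizations/segments; lags: (tau_1, ..., tau_{n-1}).
    Samples outside the record are taken as zero. G defaults to 1/(J*K).
    """
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    J, K = Y.shape
    if G is None:
        G = 1.0 / (J * K)
    prod = Y.copy()
    for tau in lags:
        shifted = np.zeros_like(Y)            # holds y_j(k + tau)
        if 0 <= tau < K:
            shifted[:, : K - tau] = Y[:, tau:]
        elif -K < tau < 0:
            shifted[:, -tau:] = Y[:, : K + tau]
        prod *= shifted
    return G * prod.sum()

def cumulant2_estimate(Y, t):                 # (2)
    return moment_estimate(Y, (t,))

def cumulant3_estimate(Y, t1, t2):            # (3)
    return moment_estimate(Y, (t1, t2))

def cumulant4_estimate(Y, t1, t2, t3):        # (4)
    m2 = lambda t: moment_estimate(Y, (t,))
    return (moment_estimate(Y, (t1, t2, t3))
            - m2(t1) * m2(t2 - t3) - m2(t2) * m2(t3 - t1) - m2(t3) * m2(t1 - t2))

Y = [[1.0, -1.0, 1.0, -1.0]]                  # one short zero-mean record (J = 1, K = 4)
print(cumulant2_estimate(Y, 0), cumulant2_estimate(Y, 1), cumulant4_estimate(Y, 0, 0, 0))
# prints: 1.0 -0.75 -2.0
```

For a deterministic energy signal, pass `G=1.0 / J` instead of the default.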

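To gain insight into the utilization of HOS, let us consider the following examples.

Example 1 Let $x(k) = \sum_n e^{j(\omega_n k + \varphi_n)}$, where the $\varphi_n$ are independent identically distributed random variables uniformly distributed in the interval $[-\pi, \pi]$. This is a harmonic stationary process. Let $y(k)$ and $z(k)$ be the responses of a linear and a nonlinear system, respectively, both driven by $x(k)$. Then, for complex coefficients determined by the two systems,

$$y(k) = \sum_n a_n e^{j(\omega_n k + \varphi_n)},$$
$$z(k) = \sum_n b_n e^{j(\omega_n k + \varphi_n)} + \sum_{m,l} b_{ml}\, e^{j((\omega_m+\omega_l)k + \varphi_m + \varphi_l)} + \sum_{m,l,i} b_{mli}\, e^{j((\omega_m+\omega_l+\omega_i)k + \varphi_m + \varphi_l + \varphi_i)}.$$

It can be shown that [3,9]:

$$C_{y,3}(\tau_1,\tau_2) = 0, \qquad C_{z,3}(\tau_1,\tau_2) \ne 0.$$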


In general, the polyspectra of a system output can be 
utilized in various ways in detecting as well as charac- 
terizing various types of nonlinearities in the system, 
e.g., detecting quadratic and cubic phase coupling in 
harmonic processes and identifying nonlinear Volterra 
filters driven by Gaussian processes among others, [9]. 


Example 2 Consider now a linear filtering problem where the linear time invariant (LTI) system with impulse response $\{f(k)\}$ is driven by a stationary random sequence $\{x(k)\}$, $k = 1, 2, \dots$. Assuming that the system is stable, the following expression can be written for the n-th order cumulant of the system output $\{y(k)\}$, [2,7]:

$$C_{y,n}(\tau_1,\dots,\tau_{n-1}) = C_{x,n}(\tau_1,\dots,\tau_{n-1}) * \sum_{k=-\infty}^{\infty} f(k)\,f(k+\tau_1)\cdots f(k+\tau_{n-1}),$$

where * denotes $(n-1)$-dimensional linear convolution. In the special case where $\{x(k)\}$ is independent identically distributed, zero-mean and non-Gaussian, $\{f(k)\}$, $k = 0,\dots,q$, has finite length, and $\{w(k)\}$ is additive stationary zero-mean Gaussian noise statistically independent of $\{x(k)\}$, the following relations hold for the diagonal cumulants ($\tau_i = \tau$ for all i):

$$y(k) = \sum_{n=0}^{q} f(n)\,x(k-n) + w(k),$$
$$C_{y,2}(\tau) = \gamma_{x,2}\sum_{k=0}^{q} f(k)\,f(k+\tau) + C_{w,2}(\tau),$$
$$C_{y,n}(\tau,\dots,\tau) = \gamma_{x,n}\sum_{k=0}^{q} f(k)\,f^{\,n-1}(k+\tau), \qquad n \ge 3,$$
$$\tau = -q,\dots,0,\dots,q.$$


$C_{y,2}(\tau)$ is corrupted by noise. On the other hand, $C_{y,n}(\tau,\dots,\tau)$, $n \ge 3$, are noise free and proportional to the corresponding order correlation of the channel coefficients. Note that the second-order cumulants (power spectrum) above do not preserve the true phase character of $f(k)$ unless the system is minimum phase (that is, all q zeros of its Z-transform are inside the unit circle). Thus, from $C_{y,2}(\tau)$ alone only an equivalent minimum phase system (within all-pass phase ambiguities) can be recovered. On the other hand, cumulants of order greater than two (polyspectra) preserve the true phase character of the system and thus are able to identify the system correctly up to a sign and possibly a constant linear phase term. This property is the reason behind the wide utilization of HOS for blind system identification and deconvolution in seismic signal processing, dispersive communications channels, speech processing and other applications, [4,8].
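The noise-free property of the diagonal third-order slice can be checked by simulation. The sketch makes several assumptions: a hypothetical MA(1) channel f = [1, −0.5], a skewed zero-mean i.i.d. input (a unit-mean exponential variable minus its mean, so γ_{x,2} = 1 and γ_{x,3} = 2), and additive unit-variance Gaussian noise. The sample slice should approach γ_{x,3} Σ_k f(k) f²(k+τ), even though the noise changes C_{y,2}.

```python
import numpy as np

def diag_slice_theory(f, gamma_n, n, tau):
    """gamma_{x,n} * sum_k f(k) f(k+tau)^(n-1), with f zero outside 0..q."""
    q = len(f) - 1
    tap = lambda k: f[k] if 0 <= k <= q else 0.0
    return gamma_n * sum(tap(k) * tap(k + tau) ** (n - 1) for k in range(q + 1))

def diag_slice3_sample(y, tau):
    """Sample C_{y,3}(tau, tau): mean of y(k) y(k+tau)^2 over the valid k."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    if tau >= 0:
        a, b = y[: len(y) - tau], y[tau:]
    else:
        a, b = y[-tau:], y[: len(y) + tau]
    return np.mean(a * b ** 2)

rng = np.random.default_rng(1)
N = 1_000_000
f = [1.0, -0.5]
x = rng.exponential(1.0, N) - 1.0          # i.i.d., zero mean, gamma_{x,3} = 2
w = rng.standard_normal(N)                 # Gaussian noise independent of x
y = np.convolve(x, f)[:N] + w              # y(k) = sum_n f(n) x(k-n) + w(k)
for tau in (-1, 0, 1):
    print(tau, diag_slice_theory(f, 2.0, 3, tau), diag_slice3_sample(y, tau))
```

The theoretical values are −1.0, 1.75 and 0.5 for τ = −1, 0, 1. The sample values scatter around them by an amount that shrinks with N.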


To get a glimpse of optimization procedures that involve HOS, consider the recovery of the coefficients f(k) in the linear filtering problem. The nonlinear least squares approach proposes the minimization of the nonlinear function

$$\sum_{\tau=-q}^{q}\Big[\hat C_{y,n}(\tau,\dots,\tau) - \gamma_{x,n}\sum_{k=0}^{q} f(k)\,f^{\,n-1}(k+\tau)\Big]^2 \qquad (9)$$

with respect to the unknown parameters $\{\gamma_{x,n}, f(k): k = 0,\dots,q\}$, [2,6]. $\hat C_{y,n}(\tau,\dots,\tau)$ is the estimated diagonal cumulant from data samples. In practice the unknown order q must be estimated by means of model order selection criteria.
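The sketch below minimizes (9) for n = 3 with `scipy.optimize.least_squares`. To keep it reproducible, it uses exact (noise-free) diagonal cumulants of a hypothetical MA(2) channel in place of the estimates. It fixes f(0) = 1 to remove the scale and sign ambiguity and starts close to the true parameters. With real estimates and poor starting points, the local-minimum issues discussed below apply.

```python
import numpy as np
from scipy.optimize import least_squares

def diag_slice(f, gamma, tau, n=3):
    """gamma * sum_k f(k) f(k+tau)^(n-1), with f zero outside 0..q."""
    q = len(f) - 1
    tap = lambda k: f[k] if 0 <= k <= q else 0.0
    return gamma * sum(tap(k) * tap(k + tau) ** (n - 1) for k in range(q + 1))

def residuals(params, c_hat, q, n=3):
    """Bracketed terms of (9) for tau = -q..q.
    params = [gamma, f(1), ..., f(q)]; f(0) is fixed to 1."""
    gamma, f = params[0], np.concatenate(([1.0], params[1:]))
    return np.array([c_hat[tau + q] - diag_slice(f, gamma, tau, n)
                     for tau in range(-q, q + 1)])

q = 2
f_true, gamma_true = [1.0, -0.5, 0.3], 2.0
c_hat = np.array([diag_slice(f_true, gamma_true, tau) for tau in range(-q, q + 1)])
fit = least_squares(residuals, x0=[1.5, -0.4, 0.2], args=(c_hat, q))
print(fit.x)  # should approach [gamma, f(1), f(2)] = [2.0, -0.5, 0.3]
```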

Minimization of (9) is a difficult problem which requires demanding numerical search techniques and proper initialization to avoid local minima. Thus, it
is customary to seek solutions based on linear rela- 
tions, [2,9,12]. Usually, a linear relation between the un- 
known parameters of the system model and the higher- 
order cumulants of the observed process is established. 
The solution is obtained by forming and solving an 
overdetermined linear system of equations. A variety 
of such algorithms have been proposed in the literature 
based on various system models. 

Alternatively, we may consider the following deconvolution scenario, [1,3,4]:

$$\hat x(n) = u(n) * y(n) = [u(n) * f(n)] * x(n),$$

where u(n) is an appropriate filter such that $\hat x(n) = A\,x(n-D)$, where D is a constant delay and A a constant phase term. Since linear filtering (i.e., convolution with f(n)) increases the Gaussianity of a random process (central limit theorem), inverse filtering (i.e., deconvolution with u(n)) must decrease the Gaussianity of the process. Based on this idea, deconvolution can rely on maximizing or minimizing an appropriate measure of Gaussianity, such as the kurtosis $\gamma_{\hat x,4}$ of $\hat x(n)$, with respect to the inverse filter coefficients. Indeed, a variety of algorithms have been derived for deconvolution based on the constrained maximization of the objective function (for $m \ne r$, $m, r \ge 2$), [1]:

$$\frac{|\gamma_{\hat x,m}|}{|\gamma_{\hat x,r}|^{m/r}}.$$

To date the utilization of HOS in applications has 
been hampered by: i) the high computational complex- 
ity and the requirement for long data records in obtain- 
ing reliable estimates of HOS, and ii) the hard assump- 
tions made regarding the stationarity and ergodicity of 
the available data. The emergence of faster digital hardware and the introduction of efficient HOS estimators will facilitate the wider utilization of HOS, [5,9]. A detailed coverage of signal processing algorithms and applications with HOS can be found in [9]. A recent extensive bibliography of HOS containing over 1700 entries has been compiled in [11].


See also 


▸ Global Optimization Methods for Harmonic Retrieval
A simple recourse problem is a stochastic linear program with recourse (cf. ▸ Stochastic linear programs with recourse and arbitrary multivariate distributions) for which the recourse action simply involves calculating linear penalties based on the surplus and shortfalls of scarce resources. In general all second-stage parameters may be random.

The general simple recourse problem may be formulated as follows:

$$\min\{cx + E_\xi[Q(x,\xi)] : Ax = b,\ x \ge 0\}, \qquad (1)$$

where

$$Q(x,\xi) = \inf\{q^+(\xi)y^+ + q^-(\xi)y^- : y^+ - y^- = h(\xi) - T(\xi)x,\ y^+, y^- \ge 0\}. \qquad (2)$$

Here $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$ are given. The uncertain parameters are $h: \mathbb{R}^r \to \mathbb{R}^K$, $q^+: \mathbb{R}^r \to \mathbb{R}^K$, $q^-: \mathbb{R}^r \to \mathbb{R}^K$ and $T: \mathbb{R}^r \to \mathbb{R}^{K\times n}$, where ξ is a random variable defined on the probability space $(\Xi, \mathcal{F}, \mu)$ with $\Xi \subseteq \mathbb{R}^r$ the support of the measure μ. In the general case the probability distribution of ξ is continuous or has a very large number of realizations, which makes directly solving the deterministic equivalent very difficult.

One well-known example of a simple recourse 
problem is the newsboy problem. In this problem 
a newsboy must decide how many newspapers to order 
for sale the next day with only probabilistic informa- 
tion about the next day’s demand. Unsold newspapers 
will be sold back to the supplier at a reduced rate and 
additional demand must be supplied to the customers 
but at a higher cost to the newsboy. 
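The newsboy problem is the simplest instance of (1)-(2): one first-stage variable (the order quantity x), h(ξ) = ξ the demand and T = 1. This is a minimal sketch with hypothetical numbers. Ordering costs c per paper, each paper short costs q⁺ (the emergency supply), and each unsold paper costs q⁻, which is negative here because unsold papers are refunded. For a discrete demand with q⁺ + q⁻ > 0 the expected cost is convex and piecewise linear, so it is enough to compare x = 0 and the demand values. The result can be checked against the classical critical-ratio rule, which is a standard result rather than something stated in this entry.

```python
def expected_cost(x, c, q_plus, q_minus, demands, probs):
    """First-stage cost c*x plus E[q+ * shortfall + q- * surplus] (discrete demand)."""
    recourse = sum(p * (q_plus * max(d - x, 0.0) + q_minus * max(x - d, 0.0))
                   for d, p in zip(demands, probs))
    return c * x + recourse

def critical_ratio_order(c, q_plus, q_minus, demands, probs):
    """Smallest demand value whose cumulative probability reaches (q+ - c) / (q+ + q-).

    Assumes q+ > c (otherwise ordering nothing is optimal) and q+ + q- > 0."""
    ratio = (q_plus - c) / (q_plus + q_minus)
    cumulative = 0.0
    for d, p in sorted(zip(demands, probs)):
        cumulative += p
        if cumulative >= ratio:
            return d
    return max(demands)

# Hypothetical data: ordering costs 1.0, a paper short costs 1.5,
# and each unsold paper is refunded 0.4 (a negative surplus cost).
demands, probs = [10.0, 20.0, 30.0, 40.0], [0.2, 0.3, 0.3, 0.2]
c, q_plus, q_minus = 1.0, 1.5, -0.4
costs = {x: expected_cost(x, c, q_plus, q_minus, demands, probs) for x in [0.0] + demands}
best = min(costs, key=costs.get)
print(best, critical_ratio_order(c, q_plus, q_minus, demands, probs))  # both 20.0
```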

Three efficient methods have been developed for 
the case where only the right-hand side parameters are 
random. These methods are: the primal method (see ▸ Simple recourse problem: Primal method); the dual method (see ▸ Simple recourse problem: Dual method);
and a method using the dualplex algorithm [1]. A good 
reference for the primal and dual methods for simple 
recourse problems is [2]. 
Resource Allocation Problems
The dual method for solving simple recourse problems (cf. ▸ Simple recourse problem), devised by A. Prékopa [1], is based on the dual simplex algorithm. As with the primal method (see ▸ Simple recourse problem: Primal method), this method only allows for uncertainty in the right-hand side parameters and a finite, discrete probability distribution, while the stochastic dependence or independence of the random variables is not important.

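This simple recourse problem may be formulated as follows:

$$\min\{cx + E_\xi[Q(x,\xi)] : Ax = b,\ x \ge 0\}, \qquad (1)$$

where

$$Q(x,\xi) = \inf\{q^+y^+ + q^-y^- : y^+ - y^- = \xi - Tx,\ y^+, y^- \ge 0\}. \qquad (2)$$

Here $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, $q^+ \in \mathbb{R}^r$, $q^- \in \mathbb{R}^r$, $q = q^+ + q^- \ge 0$, and $T \in \mathbb{R}^{r\times n}$ are given. ξ is a random variable defined on the probability space $(\Xi, \mathcal{F}, \mu)$ with $\Xi \subseteq \mathbb{R}^r$ the support of the measure μ.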

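Using linear programming duality, $E_\xi[Q(x,\xi)]$ may be rewritten as

$$E_\xi[Q(x,\xi)] = \sum_{i=1}^{r}\Big(q_i^+(\bar\xi_i - T_i x) + q_i\int_{-\infty}^{T_i x} F_i(z)\,dz\Big), \qquad (3)$$

where $\bar\xi_i = E[\xi_i]$, $F_i$ is the distribution function of $\xi_i$, and $T_i$ is the i-th row of T. The objective function of (1) is then a piecewise-linear, convex function with breakpoints derived from the elements of Ξ.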
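Formula (3) can be checked numerically for a single row i. The sketch uses a hypothetical discrete distribution for ξ_i and costs with q_i = q_i⁺ + q_i⁻ ≥ 0. In that case the inner problem (2) is solved by y⁺ = (ξ_i − z)⁺ and y⁻ = (z − ξ_i)⁺, and the integral of the step distribution function up to z equals Σ_j p_ij (z − ξ_ij)⁺. The function names are hypothetical.

```python
def recourse_direct(z, q_plus, q_minus, values, probs):
    """E[Q] for one row from (2): with q+ + q- >= 0 the optimal recourse is
    y+ = (xi - z)^+ and y- = (z - xi)^+."""
    return sum(p * (q_plus * max(v - z, 0.0) + q_minus * max(z - v, 0.0))
               for v, p in zip(values, probs))

def recourse_formula(z, q_plus, q_minus, values, probs):
    """One term of (3): q+ (mean - z) + q * integral_{-inf}^{z} F(t) dt."""
    mean = sum(v * p for v, p in zip(values, probs))
    # For a discrete distribution the integral of the step CDF up to z
    # equals sum_j p_j * max(z - v_j, 0).
    integral = sum(p * max(z - v, 0.0) for v, p in zip(values, probs))
    return q_plus * (mean - z) + (q_plus + q_minus) * integral

values, probs = [0.0, 1.0, 3.0, 5.0], [0.1, 0.4, 0.3, 0.2]   # hypothetical xi_i
q_plus, q_minus = 2.0, 0.5
for z in (-1.0, 0.5, 2.0, 3.7, 6.0):
    print(z, recourse_direct(z, q_plus, q_minus, values, probs),
          recourse_formula(z, q_plus, q_minus, values, probs))
```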

Let $\xi_{i1}, \dots, \xi_{ik_i}$ be the possible values of $\xi_i$ in increasing order, with $p_{i1}, \dots, p_{ik_i}$ the corresponding probabilities. Introduce $\xi_{i0} < \xi_{i1}$ and $\xi_{i,k_i+1} > \xi_{ik_i}$ for $i = 1,\dots,r$ with the property that $\xi_{i0} \le T_i x \le \xi_{i,k_i+1}$ for all x feasible for (1), and put $p_{i0} = p_{i,k_i+1} = 0$. The stochastic linear program (1) is reformulated as an equivalent (deterministic) linear program. The following notation is used:

$$f_{ij} = -q_i^+\xi_{ij} + q_i\big(p_{i0}(\xi_{ij}-\xi_{i0}) + \cdots + p_{i,j-1}(\xi_{ij}-\xi_{i,j-1})\big),$$

for $j = 0,\dots,k_i+1$, $i = 1,\dots,r$, representing the function values at the breakpoints.

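$$\min\ cx + \sum_{i=1}^{r}\sum_{j=0}^{k_i+1} f_{ij}\lambda_{ij}$$
$$\text{s.t. } Ax = b,\qquad T_i x - \sum_{j=0}^{k_i+1}\xi_{ij}\lambda_{ij} = 0,\qquad \sum_{j=0}^{k_i+1}\lambda_{ij} = 1,\quad i = 1,\dots,r,$$
$$x \ge 0,\qquad \lambda_{ij} \ge 0.$$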

This linear program may be efficiently solved using the dual simplex method. All dual feasible bases have the following form. For some s, $0 \le s \le r$, there are $m + s$ basic x variables, $r - s$ of the i in $\{1,\dots,r\}$ have basic variable pairs of the form $(\lambda_{i,j_i}, \lambda_{i,j_i+1})$, and the remaining s of these i have only one basic $\lambda_{i,j_i}$ variable.

The algorithm corresponds to finding an initial dual 
feasible basis then using the dual simplex method. Only 
columns corresponding to variables which might en- 
ter the basis need to be calculated at each step. This 
amounts to all nonbasic x variables and at most two $\lambda_{ij}$ for each $i \in \{1,\dots,r\}$ (the two immediately surrounding the basic λ variables for each i).
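To make the reformulation concrete, the sketch builds the λ-representation LP for a tiny hypothetical instance: one first-stage variable with T = 1, no Ax = b rows, and breakpoints 0 and 50 added so that 0 ≤ x ≤ 50 is assumed. It solves the LP with `scipy.optimize.linprog` using the HiGHS solver, not with the specialized dual simplex scheme described above. The breakpoint values of the expected recourse function are computed directly from the distribution. They may differ from f_ij by a constant, which does not change the minimizer because the λ_ij sum to one.

```python
import numpy as np
from scipy.optimize import linprog

def expected_recourse(z, q_plus, q_minus, values, probs):
    return sum(p * (q_plus * max(v - z, 0.0) + q_minus * max(z - v, 0.0))
               for v, p in zip(values, probs))

c, q_plus, q_minus = 1.0, 1.5, -0.4
values, probs = [10.0, 20.0, 30.0, 40.0], [0.2, 0.3, 0.3, 0.2]
breakpoints = [0.0] + values + [50.0]       # xi_{i0} and xi_{i,k+1}: assumes 0 <= x <= 50
f = [expected_recourse(z, q_plus, q_minus, values, probs) for z in breakpoints]

# Variables: [x, lambda_0, ..., lambda_{k+1}]
obj = np.array([c] + f)
A_eq = np.array([
    [1.0] + [-z for z in breakpoints],      # T x - sum_j xi_j lambda_j = 0
    [0.0] + [1.0] * len(breakpoints),       # sum_j lambda_j = 1
])
b_eq = np.array([0.0, 1.0])
res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(obj), method="highs")
print(res.x[0], res.fun)                    # order quantity and optimal expected cost
```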
The primal method for solving simple recourse problems (cf. also ▸ Simple recourse problem), devised by R.J-B. Wets [3], is based on the concept of a working basis introduced by G.B. Dantzig [1]. The primal method only allows for uncertainty in the right-hand side parameters; however, stochastic dependence or independence of these parameters is not important to the method. A good reference for this algorithm is [2].

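This simple recourse problem may be formulated as follows:

$$\min\{cx + E_\xi[Q(x,\xi)] : Ax = b,\ x \ge 0\}, \qquad (1)$$

where

$$Q(x,\xi) = \inf\{q^+y^+ + q^-y^- : y^+ - y^- = \xi - Tx,\ y^+, y^- \ge 0\}. \qquad (2)$$

Here $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, $q^+ \in \mathbb{R}^r$, $q^- \in \mathbb{R}^r$, $q = q^+ + q^- \ge 0$, and $T \in \mathbb{R}^{r\times n}$ are given. ξ is a random variable defined on the probability space $(\Xi, \mathcal{F}, \mu)$ with $\Xi \subseteq \mathbb{R}^r$ the support of the measure μ.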

The method given here assumes ξ is a discrete random variable with finitely many realizations (i.e., Ξ is finite). For continuous distributions the method may be
efficiently used on successively finer approximating dis- 
tributions and error bounds calculated for each approx- 
imation. 

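Using linear programming duality, $E_\xi[Q(x,\xi)]$ may be rewritten as

$$E_\xi[Q(x,\xi)] = \sum_{i=1}^{r}\Big(q_i^+(\bar\xi_i - T_i x) + q_i\int_{-\infty}^{T_i x} F_i(z)\,dz\Big), \qquad (3)$$

where $\bar\xi_i = E[\xi_i]$, $F_i$ is the distribution function of $\xi_i$, and $T_i$ is the i-th row of T. The objective function of (1) is then a piecewise-linear, convex function with breakpoints derived from the elements of Ξ.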

Let $\xi_{i1}, \dots, \xi_{ik_i}$ be the possible values of $\xi_i$ in increasing order, with $p_{i1}, \dots, p_{ik_i}$ the corresponding probabilities. Introduce $\xi_{i0} < \xi_{i1}$ and $\xi_{i,k_i+1} > \xi_{ik_i}$ for $i = 1,\dots,r$ with the property that $\xi_{i0} \le T_i x \le \xi_{i,k_i+1}$ for all x feasible for (1), and put $p_{i0} = p_{i,k_i+1} = 0$. The stochastic linear program (1) is reformulated as an equivalent (deterministic) linear program. The following notation is used:

$$h_{ij} = \xi_{ij} - \xi_{i,j-1},$$

for $j = 1,\dots,k_i+1$, $i = 1,\dots,r$, representing the lengths of the intervals of the piecewise linear function, and

$$g_{ij} = -q_i^+ + q_i(p_{i0} + \cdots + p_{i,j-1}),$$

for $j = 1,\dots,k_i+1$, $i = 1,\dots,r$, representing the gradients between breakpoints.


$$\min\ cx + gv$$
$$\text{s.t. } Ax = b,$$
$$T_i x - \sum_{j=1}^{k_i+1} v_{ij} = \xi_{i0}, \quad i = 1,\dots,r, \qquad (4)$$
$$v + s = h,$$
$$x \ge 0, \quad 0 \le v, s \le h.$$


This linear program may be solved using the simplex method, considering only bases with the property that for each $i = 1,\dots,r$ there is $\ell_i$ such that $v_{i1},\dots,v_{i\ell_i}$ and $s_{i,\ell_i+1},\dots,s_{i,k_i+1}$ are basic, $s_{i1},\dots,s_{i,\ell_i-1}$ and $v_{i,\ell_i+1},\dots,v_{i,k_i+1}$ are nonbasic, and $s_{i\ell_i}$ may or may not be basic. To reduce the amount of computation involved, only so-called key variables need to be recorded as being basic. These are the basic $x_j$ and the basic $v_{i\ell_i}$ for which $s_{i\ell_i}$ is also basic. There are always $m + r$ key variables, and the working basis, W, is given by the first $m + r$ rows of the columns corresponding to the key variables in the (full) basis.

The algorithm corresponds to finding an initial 
working basis, then using the (upper bounded) simplex 
method with only the working basis inverse stored. For 
this, all reduced costs may be calculated with little extra 
effort beyond that required for a linear program with as 
many variables and only m + r constraints. The choice 
of the pivot column is constrained by the requirement 
of maintaining a basis with the required property. The 
calculation of the pivot row and the pivot step are es- 
sentially the same as for the upper bounded simplex 
method. 
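For a tiny hypothetical instance (one first-stage variable with T = 1, no Ax = b rows, breakpoints 0 and 50 added so that 0 ≤ x ≤ 50 is assumed), the sketch builds h_ij and g_ij as defined above and solves LP (4) with `scipy.optimize.linprog` (HiGHS). It represents 0 ≤ v ≤ h through variable bounds instead of explicit slacks s, and it does not reproduce the working-basis machinery. Because the g_ij are nondecreasing in j, the LP fills the intervals from left to right on its own. The objective of (4) measures the recourse relative to its value at ξ_i0, so that constant is added back for comparison.

```python
import numpy as np
from scipy.optimize import linprog

c, q_plus, q_minus = 1.0, 1.5, -0.4
values, probs = [10.0, 20.0, 30.0, 40.0], [0.2, 0.3, 0.3, 0.2]
xi = [0.0] + values + [50.0]        # xi_0, ..., xi_{k+1}; assumes 0 <= x <= 50
p = [0.0] + probs + [0.0]           # p_0 = p_{k+1} = 0
q = q_plus + q_minus

h = [xi[j] - xi[j - 1] for j in range(1, len(xi))]             # interval lengths h_j
g = [-q_plus + q * sum(p[:j]) for j in range(1, len(xi))]     # gradients g_j

# Variables: [x, v_1, ..., v_{k+1}];  T x - sum_j v_j = xi_0 with T = 1
obj = np.array([c] + g)
A_eq = np.array([[1.0] + [-1.0] * len(h)])
b_eq = np.array([xi[0]])
bounds = [(0, None)] + [(0, hj) for hj in h]
res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")

# Expected recourse at xi_0 (below every realization): q+ * (mean - xi_0).
mean = sum(v * pv for v, pv in zip(values, probs))
constant = q_plus * (mean - xi[0])
print(res.x[0], res.fun + constant)   # order quantity and total expected cost
```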
Simplicial decomposition (SD) can be viewed either as a generalization of the Frank-Wolfe algorithm [6] (cf. ▸ Frank-Wolfe algorithm) or an extension of Dantzig-Wolfe decomposition [5] to nonlinear programs. The term 'simplicial decomposition' is due to B. von Hohenbalken [22], but the essential idea is generally known as column generation and has been called inner linearization/restriction by A.M. Geoffrion [7].
In general, SD addresses the following problem

$$\min_{x\in S} f(x), \qquad (1)$$

where f(x) is pseudoconvex. The set S is typically a nonempty and bounded polyhedron, i.e., $S = \{x \in \mathbb{R}^n : Ax \le b,\ x \ge 0\}$, A is an m × n matrix, and $b \in \mathbb{R}^m$. With S being bounded and polyhedral, problem (1) can be restated as

$$\min\ f\Big(\sum_{i=1}^{n}\lambda_i Y^i\Big) \quad \text{s.t.}\quad \sum_{i=1}^{n}\lambda_i = 1,\quad \lambda_i \ge 0,\ i = 1,\dots,n, \qquad (2)$$

where n is the number of extreme points of S, each of which is represented as $Y^i$. In words, problem (2) finds a convex combination of the extreme points, $Y^i$, that minimizes f(x). For real-world problems, the number of extreme points is generally large and it is impractical to generate all of them a priori. Instead, SD generates extreme points one at a time as follows.


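Step 0: Select $x^1 \in S$ and set $k = 1$.
Step 1: Let $Y^k \in \arg\min_{y\in S} \nabla f(x^k)^T y$.
        IF $\nabla f(x^k)^T(Y^k - x^k) \ge 0$, THEN stop: $x^k$ is an optimal solution.
        ELSE, go to Step 2.
Step 2: Let
$$\lambda^k \in \arg\min\ f\Big(\lambda_0 Z^k + \sum_{i\in I^k}\lambda_i Y^i\Big) \quad \text{s.t.}\quad \lambda_0 + \sum_{i\in I^k}\lambda_i = 1,\quad \lambda_i \ge 0\ \ \forall i \in I^k \cup \{0\},$$
        where $I^k \subseteq \{1,\dots,k\}$, and $Z^k = 0$ or $x^j$ for some $j \in \{1,\dots,k\}$.
        Set $x^{k+1} = \lambda_0^k Z^k + \sum_{i\in I^k}\lambda_i^k Y^i$ and $k = k + 1$.
        Return to Step 1.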
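A compact sketch of the SD loop follows. It rests on several assumptions that are not part of the method description. S is taken to be the box [0, 1]ⁿ (A = I, b = 1), so the Step 1 linear program is solved in closed form by picking each coordinate's bound from the sign of the gradient. f is a convex quadratic, and the master problem is solved with SciPy's SLSQP solver. Finally, Z^k = x^k and I^k keeps the positively weighted extreme points plus the new one. The sketch shows the structure of Steps 1 and 2; it is not an efficient implementation.

```python
import numpy as np
from scipy.optimize import minimize

def box_vertex(grad):
    """Step 1 for the assumed S = [0, 1]^n: min_{y in S} grad^T y is attained
    at the vertex with y_i = 1 where grad_i < 0 and y_i = 0 otherwise."""
    return (grad < 0).astype(float)

def solve_master(f, grad_f, V):
    """Step 2: minimize f(V @ lam) over the unit simplex (columns of V: Z^k, then Y^i)."""
    p = V.shape[1]
    res = minimize(
        lambda lam: f(V @ lam),
        np.full(p, 1.0 / p),
        jac=lambda lam: V.T @ grad_f(V @ lam),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * p,
        constraints=[{"type": "eq", "fun": lambda lam: np.sum(lam) - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return res.x

def simplicial_decomposition(f, grad_f, x1, tol=1e-9, max_iter=100):
    x = np.asarray(x1, dtype=float)
    kept = []                                  # extreme points Y^i, i in I^k
    for _ in range(max_iter):
        g = grad_f(x)
        y = box_vertex(g)
        if g @ (y - x) >= -tol:                # stopping criterion of Step 1
            break
        kept.append(y)
        V = np.column_stack([x] + kept)        # Z^k = x^k, one admissible choice
        lam = solve_master(f, grad_f, V)
        x_new = V @ lam
        if f(x_new) > f(x):                    # safeguard against an inexact master solve
            break
        x = x_new
        kept = [Y for Y, w in zip(kept, lam[1:]) if w > 1e-10]  # drop zero weights
    return x

t = np.array([0.3, 1.4, -0.2])
f = lambda x: 0.5 * np.sum((x - t) ** 2)
grad_f = lambda x: x - t
x_star = simplicial_decomposition(f, grad_f, [0.5, 0.5, 0.5])
print(x_star)  # close to the projection of t onto the box, [0.3, 1.0, 0.0]
```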
Since S is polyhedral, the problem in Step 1 is a linear program, generally called the subproblem. When solved by the simplex algorithm, its solution, $Y^k$, is guaranteed to be an extreme point. In the event that $x^k$ satisfies the stopping criterion, the following argument demonstrates that $x^k$ must be globally optimal:

$$\forall x \in S:\quad \nabla f(x^k)^T(x - x^k) \ge \nabla f(x^k)^T(Y^k - x^k) \ge 0 \;\Longrightarrow\; f(x) \ge f(x^k).$$

The two inequalities follow from the fact that $Y^k$ solves the subproblem and from the stopping criterion, respectively, and the implication follows from the pseudoconvexity of f(x).

The problem in Step 2, or the master problem, is structurally the same as problem (2) and finds a convex combination of $Z^k$ and the extreme points in $I^k$ that minimizes f(x). This convex combination produces a new point, $x^{k+1}$, with a better objective value. To justify this, consider the first-order Taylor series expansion of f(x), i.e.,

$$f(x^k + \lambda(Y^k - x^k)) = f(x^k) + \lambda\nabla f(x^k)^T(Y^k - x^k) + \lambda\|Y^k - x^k\|\,\alpha(x^k; \lambda(Y^k - x^k)),$$

where $\lim_{\lambda\to 0}\alpha(x^k; \lambda(Y^k - x^k)) = 0$. When Step 2 is executed, $\nabla f(x^k)^T(Y^k - x^k) < 0$, and the above expansion implies that there exists a sufficiently small $\hat\lambda \in (0,1)$ such that $f(x^k + \hat\lambda(Y^k - x^k)) < f(x^k)$. When $Z^k$ and $I^k$ are properly defined (see below), $x^k + \hat\lambda(Y^k - x^k)$ lies in the convex hull of $Z^k$ and $Y^i$, for all $i \in I^k$. Since $\lambda^k$ solves the master problem, the following must hold:

$$f(x^{k+1}) = f\Big(\lambda_0^k Z^k + \sum_{i\in I^k}\lambda_i^k Y^i\Big) \le f(x^k + \hat\lambda(Y^k - x^k)) < f(x^k).$$

So, the objective value decreases after each iteration.

Note that $Z^k$ is not necessarily an extreme point of S. However, $Z^k$ and the (index) set $I^k$ provide some flexibility and determine the number of iterations needed to achieve an optimal solution. For example, if $I^k = \{k\}$ and $Z^k = x^k$ in Step 2, then SD reduces to the Frank-Wolfe algorithm, which converges in the limit to an optimal solution.

When $I^0 = \emptyset$,

$$I^k = \{i : i \in I^{k-1} \text{ and } \lambda_i^{k-1} > 0\} \cup \{k\},$$

and $Z^k = 0$, the resulting algorithm is essentially the same as those in [11,22] and [23], and converges after a finite number of iterations. For this choice of $Z^k$ and $I^k$, SD drops or discards extreme points with zero weight, $\lambda_i^k = 0$, to reduce the size of the master problem and, perhaps, to release computational resources for other uses as well. To see finite convergence, note that the number of possible index sets, $I^k$, is finite since S has only a finite number of extreme points. For each $I^k$ generated by the algorithm, there is an associated minimum objective value, $f(x^{k+1})$, that is strictly decreasing for $k \ge 1$. This implies that the algorithm generates a sequence of distinct $I^k$. Since the number of possible $I^k$ is finite, the sequence cannot be infinite, i.e., the algorithm must terminate finitely.

For the above choice of $Z^k$ and $I^k$, the Carathéodory theorem (see, e.g., [2]) guarantees that the cardinality of $I^k$ is at most rank(A) + 1. Thus, allocating computational resources for storing rank(A) + 1 extreme points is sufficient to ensure finite convergence. However, rank(A) + 1 is large for large-scale problems, and allocating such a large amount of resources may be impractical. As an alternative, D.W. Hearn, S. Lawphongpanich, and J.A. Ventura [10] proposed the following extreme point dropping scheme to restrict the cardinality of $I^k$ to at most r, where $r \ge 1$.


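Step 2: When $k = 1$, set $I^1 = \{1\}$ and $Z^1 = x^1$.
        For $k > 1$, let β denote the cardinality of the set $\{i : i \in I^{k-1} \text{ and } \lambda_i^{k-1} > 0\}$.
Step 2a: IF $\beta < r$, THEN set
        $I^k = \{i : i \in I^{k-1} \text{ and } \lambda_i^{k-1} > 0\} \cup \{k\}$ and $Z^k = Z^{k-1}$.
Step 2b: IF $\beta = r$, THEN set
        $I^k = \{i : i \in I^{k-1},\ i \ne i^*, \text{ and } \lambda_i^{k-1} > 0\} \cup \{k\}$ and $Z^k = x^k$,
        where $i^* = \arg\min_i\{\lambda_i^{k-1} : \lambda_i^{k-1} > 0\}$.

An extreme point dropping scheme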


In Step 2, extreme points with zero weight, i.e., $\lambda_i^{k-1} = 0$, are dropped from the master problem in iteration k. When the number of remaining (or positively weighted) extreme points is less than r, the new extreme point, $Y^k$, is added to the master problem (see step 2a). Otherwise, the new extreme point replaces the remaining extreme point with the smallest weight (see step 2b) to keep the cardinality of $I^k$ at r. The choices for $Z^k$ in steps 2a and 2b ensure that $Z^k$ and the extreme points in $I^k$ always form a p-simplex (see [20]), a fact essential for proving finite convergence. With the above extreme point dropping scheme, the resulting SD, known as restricted simplicial decomposition (RSD), converges finitely when problem (1) has a unique solution, $x^*$, and $r \ge \dim(\Phi) + 1$, where $\Phi = \{Y^i : \nabla f(x^*)^T(Y^i - x^*) = 0,\ i = 1,\dots,n\}$. (See [9].) When $r < \dim(\Phi) + 1$, RSD can be shown to converge in the limit to $x^*$ using standard arguments in nonlinear programming (see, e.g., [9] and [15]).
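The index-set bookkeeping of the dropping scheme fits in a small function. The sketch follows steps 2a and 2b as stated above, for k > 1. The function and argument names are hypothetical. A weight counts as positive when it exceeds a small tolerance, which is an implementation assumption. Z and x can be any point objects (NumPy arrays in practice; strings in the usage lines, for readability).

```python
def rsd_update(prev_index, prev_weights, prev_Z, x_k, k, r, tol=1e-12):
    """Return (I^k, Z^k) for k > 1 following steps 2a and 2b.

    prev_index: indices in I^{k-1}; prev_weights: dict i -> lambda_i^{k-1};
    prev_Z: Z^{k-1}; x_k: current iterate x^k; r: maximum cardinality of I^k.
    """
    positive = [i for i in prev_index if prev_weights[i] > tol]
    if len(positive) < r:                               # step 2a: keep Z, add Y^k
        return positive + [k], prev_Z
    # step 2b (beta == r): drop the smallest positive weight and set Z^k = x^k
    i_star = min(positive, key=lambda i: prev_weights[i])
    return [i for i in positive if i != i_star] + [k], x_k

I4, Z4 = rsd_update([1, 2, 3], {1: 0.5, 2: 0.0, 3: 0.5}, "Z3", "x4", k=4, r=3)
print(I4, Z4)   # [1, 3, 4] Z3   (weight of 2 is zero, so step 2a applies)
I5, Z5 = rsd_update([1, 3, 4], {1: 0.2, 3: 0.5, 4: 0.3}, "Z4", "x5", k=5, r=3)
print(I5, Z5)   # [3, 4, 5] x5   (beta = r, so index 1 is replaced, step 2b)
```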

In practice, SD has been more successful in solving large nonlinear multicommodity flow problems (see, e.g., [1]) of the form:

$$\min\ f\Big(\sum_{c=1}^{C} x(c)\Big) \quad \text{s.t.}\quad Ax(c) = b(c),\ \forall c, \qquad x(c) \ge 0,\ \forall c, \qquad (3)$$

where A is the node-arc incidence matrix of a network with m nodes and n arcs, $b(c) \in \mathbb{R}^m$ is a supply/demand vector for each commodity c, $x(c) \in \mathbb{R}^n$ is a flow vector for commodity c, and f(x) is a pseudoconvex travel cost function. In Step 1 of SD, the subproblem for problem (3) decomposes into C problems (one for each commodity c) of the following form:


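$$Y^k(c) \in \arg\min_y\ \nabla f(x^k)^T y \quad \text{s.t.}\quad Ay = b(c),\ y \ge 0, \qquad (4)$$

where $x^k = \sum_{c=1}^{C} x^k(c)$ is the current aggregate flow.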


Problem (4) is a shortest path problem and can be solved efficiently with specialized network algorithms (see, e.g., [1]). J.D. Murchland [18] first discussed SD as a method for solving problem (3), which is generally known as the traffic assignment problem in transportation science. D.G. Cantor and M. Gerla [4] (see also [8]) implemented SD for solving problem (3) to route messages in computer communication networks. Later, the results in [9] and [10] renewed interest in SD by demonstrating empirically that RSD efficiently solves large traffic assignment problems. In [10], the master problem is solved by a method with at least a superlinear convergence rate (e.g., [3] and [17]), and r is relatively small.
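Subproblem (4) for one commodity can be sketched as a shortest path computation followed by loading the whole demand onto that path. This minimal Python sketch assumes, beyond what the text states, that each commodity has a single origin and destination with a given demand, that nodes are numbered 0, ..., m−1, and that the link costs (the components of ∇f(x^k)) are nonnegative so that Dijkstra's algorithm applies; the function name all_or_nothing is hypothetical.

```python
import heapq


def all_or_nothing(n_nodes, arcs, costs, origin, dest, demand):
    """Extreme point y^k(c) of subproblem (4) for a single-OD commodity (sketch).

    arcs:  list of (tail, head) node pairs, one per network arc.
    costs: nonnegative arc costs, e.g. the components of grad f(x^k).
    Returns an arc flow vector sending all of `demand` along one shortest path.
    """
    out = [[] for _ in range(n_nodes)]
    for a, (u, v) in enumerate(arcs):
        out[u].append((v, a))
    dist = [float("inf")] * n_nodes
    pred_arc = [None] * n_nodes
    dist[origin] = 0.0
    heap = [(0.0, origin)]
    while heap:  # Dijkstra's label-setting algorithm
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, a in out[u]:
            nd = d + costs[a]
            if nd < dist[v]:
                dist[v], pred_arc[v] = nd, a
                heapq.heappush(heap, (nd, v))
    if dist[dest] == float("inf"):
        raise ValueError("destination unreachable from origin")
    y = [0.0] * len(arcs)
    node = dest
    while node != origin:  # trace the path back and load it with the demand
        a = pred_arc[node]
        y[a] += demand
        node = arcs[a][0]
    return y
```

For instance, on the arcs (0, 1), (1, 2), (0, 2) with costs 1, 1, 3, a demand of 5 from node 0 to node 2 is sent along 0 → 1 → 2, giving y = [5.0, 5.0, 0.0].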

When applied to nonlinear single commodity (e. g., 
[10] and [17]) or dynamic network flow problems (e. g., 
[19]), SD may not be as efficient as other methods. For 
these problems, the dimension of Φ tends to be large
and each extreme point in ® contributes little as part of 
the convex combination that forms x*. 

In the literature, there are several extensions and 
modifications to SD, restricted or otherwise. First, it 
is claimed in [11] that SD also applies to problems in 
which S is convex, but not necessarily polyhedral. For 
example, S = {x ∈ R^n : g_i(x) ≤ 0, i = 1, ..., m}, where
each g_i(x) is convex on R^n. In this case, a straightforward
application of SD (see [11]) would yield a subproblem 
with nonlinear constraints, a problem as complex as 
the original. Later, Ventura and Hearn [21] proposed 
a modification for RSD in which the subproblem is 
a linear program instead. 

Second, when applied to problem (3), the representation of the extreme points can affect the convergence rate of SD. In particular, the extreme point Y^k can be represented either as Y^k = Σ_{c=1}^{C} y^k(c), an aggregate form, or as (Y^k)^T = (y^k(1)^T, ..., y^k(C)^T), a disaggregate form. The latter renders the master problem larger and more complex, as shown below:


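$$
\begin{aligned}
\min\ & f\Big(\sum_{c=1}^{C}\Big[\lambda_0(c)\,z^k(c) + \sum_{i\in I^k}\lambda_i(c)\,y^i(c)\Big]\Big)\\
\text{s.t.}\ & \lambda_0(c) + \sum_{i\in I^k}\lambda_i(c) = 1, \quad \forall c,\\
& \lambda_i(c) \ge 0, \quad \forall c \text{ and } i \in I^k \cup \{0\}.
\end{aligned}
$$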


Despite the increase in problem complexity, T. Larsson
and M. Patriksson [12] demonstrated empirically
that SD with disaggregate extreme points converges 
faster on several real-world traffic assignment prob- 
lems. 

Third, A. Migdalas [16] introduced an extension to the Frank-Wolfe algorithm called the regularized Frank-Wolfe algorithm, in which the subproblem has a nonlinear term in the objective function to control the distance between Y^k and x^k. For example, one version of the regularized subproblem is

$$ Y^k = \arg\min_{y \in S}\ \nabla f(x^k)^T y + \tfrac{1}{2}\,(y - x^k)^T D^k (y - x^k), $$

where D^k is a positive definite matrix. In [13], Larsson, Patriksson, and C. Rydergren solved the above subproblem approximately by performing several iterations of the Frank-Wolfe algorithm and showed empirically that the regularized subproblem can improve the convergence of SD.

Finally, C.H. Wu and Ventura [24] and Lawphongpanich [14] extended SD to solve problems with side constraints, i.e., S = {x ∈ R^n : Ax ≤ b, Dx ≤ d, x ≥ 0}, where A and b are as defined for problem (1), D is a q × n matrix, and d ∈ R^q. Here, A may have a special structure that can be exploited computationally, whereas D, representing the side constraints, does not.


See also 
Simplicial decomposition (SD) is a class of methods
for solving continuous problems in mathematical pro- 
gramming with convex feasible sets. There are two 
main characteristics of the methods in this class: 

i) an approximation of the original problem is con- 
structed and solved, wherein the original feasible set 
is replaced by a polyhedral subset thereof, that is, an 
inner approximation of it which is spanned by a fi- 
nite set of feasible solutions; and 

ii) this inner approximation is improved (that is, en- 
larged) by generating a vector (or, column) in the 
feasible set through the solution of another approx- 
imation of the original problem wherein the origi- 
nal cost function is approximated (often by a linear 
function). 

As such, the class of SD methods may be placed within the framework of column generation methods. Another characteristic of an SD method, however, is that the sequence of solutions to the inner approximated problems tends to a solution to the original problem in such a way that the cost function (or some merit function) strictly monotonically approaches its optimal value. Therefore, the class of SD methods also falls within the framework of iterative descent (or ascent) algorithms for continuous mathematical programs.

We consider, for the most part, the solution of the differentiable optimization problem

$$ \min\ f(x) \quad \text{s.t.} \quad x \in X, \tag{1} $$

where f: X → R is pseudoconvex on X (that is, for any x, y ∈ X, ∇f(x)^T(y − x) ≥ 0 implies f(x) ≤ f(y)), and where X := {x ∈ R^n : Ax = b; x ≥ 0^n} is a nonempty polyhedral set.

The derivation of the method rests on two classi- 
cal results on the representation of convex sets and of 
points in such sets. The first result is the representation 
theorem (e. g., [2,18]), which states that: 

i) the set of extreme points p^i, i ∈ P, of the polyhedral set X is nonempty and finite;

ii) the set of extreme directions d^i, i ∈ D, is empty if and only if X is bounded, and if X is not bounded, then it is nonempty and finite; finally, and most importantly,

iii) a vector x ∈ R^n belongs to X if and only if it can be represented as a convex combination of the extreme points plus a nonnegative linear combination of the extreme directions, that is, for some vectors λ and μ,

$$
\begin{aligned}
x &= \sum_{i\in P}\lambda_i p^i + \sum_{i\in D}\mu_i d^i, && \text{(2a)}\\
\sum_{i\in P}\lambda_i &= 1, && \text{(2b)}\\
\lambda_i &\ge 0, \quad i \in P, && \text{(2c)}\\
\mu_i &\ge 0, \quad i \in D. && \text{(2d)}
\end{aligned}
$$


Thus, in principle, the polyhedral set X can be given an inner representation in terms of extreme points and directions, and the problem (1) can be cast in the variables λ_i and μ_i instead of in x. The advantage of making this problem transformation is that the inner representation of X is much simpler than its original, outer, representation in terms of linear equalities and inequalities; disregarding the definitional constraints (2a), the set described by (2) is the Cartesian product of a simplex and the nonnegative orthant, an optimization over which often can be made with little more effort than for an unconstrained problem (e.g., [3,4]). Furthermore, the inner representation may also be useful for interpreting the result of an optimization, since the extreme points and directions may have further significance; in applications to network flows, for example, the representation theorem states that a link flow x ∈ X, where the polyhedral set X describes the flow conservation and nonnegativity constraints for the network flow, can equivalently be represented by (or decomposed into) the sum of flows on routes and in cycles (e.g., [1, Thm. 3.5]). The latter also explains the origin of the term simplicial decomposition: the variable transformation decomposes a feasible solution into the sum of variables that (in the bounded case) forms a simplex.

The disadvantage of the transformation is that since 
the number of extreme points and directions of a poly- 
hedral set grows exponentially with its dimension, the 
transformation introduces an impractically large num- 
ber of variables. The practical use of simplicial decom- 
position then hinges on the second basic result in the 
representation of convex sets, Carathéodory’s theorem 
(e.g., [26, Thm. 17.1]). This result states that a point x 
in the convex hull of any subset X of R^n can be rep-
resented as a convex combination of at most as many
elements of X as its dimension, dimX (which is defined 
as the dimension of its affine hull), plus one. (This num- 
ber is not larger than n + 1.) Although Carathéodory’s 
theorem is not stated in terms of extreme points and 
directions, its natural application in the context of sim- 
plicial decomposition is that, in the case of a bounded 
polyhedral set, for example, any feasible point can be 
described as the convex combination of extreme points 
of the set, the total number of which need never exceed 
the dimension of the polyhedron plus one. (This result 
obviously refines the representation theorem.) 

The classical form of the simplicial decomposition method was first described by B. von Hohenbalken [31] (see however the end of this article for some earlier references to similar algorithms) for the problem (1). The algorithm alternates between the solution of two problems. Given known subsets P̂ and D̂ of P and D, respectively, f is minimized over the inner approximation of X which is defined when these subsets replace P and D in (2), in terms of the variables λ̂_i, i ∈ P̂, and μ̂_i, i ∈ D̂. (We will call this problem the restricted master problem (RMP); it is also sometimes referred to as the coordination step.) Notice that we use the notation λ̂ and μ̂ to distinguish the vectors in the RMP from the (longer) vectors λ and μ in the complete master problem, which is equivalent to (1) and is defined by the system (2). Further denoting by Λ the set of vectors (λ̂, μ̂) satisfying the restriction of the system (2b)-(2d) to the known subsets P̂ and D̂, and utilizing (2a) to substitute x for (λ̂, μ̂) (we write x = x(λ̂, μ̂)), the RMP may then be formulated as

$$ \min\ f\big(x(\hat\lambda, \hat\mu)\big) \quad \text{s.t.} \quad (\hat\lambda, \hat\mu) \in \Lambda. \tag{3} $$
Alternately, a profitable extreme point or direction of X is generated through the solution of an approximation of (1), in which f is replaced by its first order, linear, approximation, y ↦ f(x) + ∇f(x)^T(y − x), defined at the solution, x, to the RMP (3), that is, by the problem

$$ \min\ \nabla f(x)^T y \quad \text{s.t.} \quad y \in X; \tag{4} $$

this approximate problem is a linear programming problem, which in general is much easier to solve than the original one. (This is called the column generation subproblem, and corresponds to the decomposition step in some descriptions of column generation methods.) If the solution to this problem lies within the current inner approximation, then the conclusion is that the current solution, x, is optimal in (1), since then ∇f(x)^T(y − x) ≥ 0 must hold for all y ∈ X. Otherwise, P̂ or D̂ is augmented by a new element, the resulting inner approximation is improved (that is, enlarged), and the solution to the new RMP has a strictly lower objective value than the previous one; the latter result follows since the strict inequality ∇f(x)^T d < 0 holds (that is, d defines a direction of descent with respect to f at x), where d denotes either the direction d := y − x towards the new extreme point y or an extreme direction. The iteration is then repeated with the solution of a new column generation subproblem defined at the solution to the RMP. In the method of [31], Carathéodory's theorem is utilized in the validation of a column dropping rule, according to which any extreme point or direction whose weight in the expression of the solution x to the RMP is zero is removed; thanks to the finiteness of P and D and the strictly decreasing values of f, the SD algorithm converges after a finite number of RMPs.
In the case of a convex function f, a finite termination criterion is automatically supplied, on the one hand by the upper bound on the optimal objective value of (1) that is defined by the solution to the RMP, and on the other hand by the lower bound that is supplied by the solution to the column generation subproblem; in fact, letting x be an arbitrary feasible solution to (1), and x* be any optimal solution to (1), we obtain from the optimality of x* and by the convexity of f that

$$
\begin{aligned}
f(x) \ge f(x^*) &\ge f(x) + \nabla f(x)^T (x^* - x)\\
&\ge f(x) + \min_{y \in X}\{\nabla f(x)^T (y - x)\};
\end{aligned}
\tag{5}
$$

note that the problem defined in (5) is precisely the column generation subproblem (4) defined at x. A finite termination criterion for the solution of the RMP can be defined by the analogous lower bound. In this case, due to the simple form of the constraints defined by (2b)-(2d), the lower bound is available directly from the value of the gradient of f with respect to (λ̂, μ̂) at x = x(λ̂, μ̂). Indeed, the lower bound is either (analogously to (5)) given by the current objective value plus

$$ \min_{i \in \hat P}\Big\{\frac{\partial f(x(\hat\lambda,\hat\mu))}{\partial \hat\lambda_i}\Big\} - \nabla_{\hat\lambda} f(x(\hat\lambda,\hat\mu))^T \hat\lambda - \nabla_{\hat\mu} f(x(\hat\lambda,\hat\mu))^T \hat\mu, $$

or it is minus infinity (if ∇f(x)^T d^i < 0 holds for some i ∈ D̂).
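The RMP lower bound just described needs only ∇f(x), because by the chain rule applied to (2a) the partial derivative with respect to λ̂_i is ∇f(x)^T p^i and that with respect to μ̂_i is ∇f(x)^T d^i. The following NumPy sketch assumes the retained extreme points and directions are stored as the columns of two matrices; the function and argument names are hypothetical.

```python
import numpy as np


def rmp_lower_bound(f_val, grad_x, P_cols, D_cols, lam, mu):
    """Lower bound on the RMP optimum for convex f (sketch).

    f_val:  f(x) at x = P_cols @ lam + D_cols @ mu.
    grad_x: gradient of f at that x.
    P_cols: retained extreme points p^i as columns; D_cols: retained directions d^i.
    """
    g_lam = P_cols.T @ grad_x  # partial derivatives with respect to lambda_i
    g_mu = D_cols.T @ grad_x   # partial derivatives with respect to mu_i
    if np.any(g_mu < 0):
        return -np.inf         # some retained direction is a descent direction
    return f_val + g_lam.min() - g_lam @ lam - g_mu @ mu
```

For f(x) = ½‖x − (1, 0)‖² on the segment between the unit vectors of R^2, the point x = (0.5, 0.5) has f = 0.25 and the bound evaluates to −0.25, below the true RMP optimum 0.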

Assume now that the function f is strictly pseudoconvex (that is, for any x, y ∈ X with x ≠ y, ∇f(x)^T(y − x) ≥ 0 implies that f(x) < f(y) holds), so that the optimal solution x* is unique, and for simplicity we also assume that X is bounded. For such problems, an improvement over the original scheme was devised in [10,11]. The basis for the improvement is the observation that a particular feasible solution, such as the optimal one, can often be represented as the convex combination of a much smaller number of extreme points than the dim X + 1 implied by Carathéodory's theorem; in fact, the highest number of extreme points needed to describe the optimal solution x* is dim F* + 1, where F* is the optimal face of X, that is, the face of X of the smallest dimension which contains x*. (In the present context, this set may be described by

$$ F^* = \{\, y \in X : \nabla f(x^*)^T (y - x^*) = 0 \,\}, $$

a set which is spanned by the extreme points of X that solve the linear approximation (4) to (1) defined at the optimal solution.) Based on this observation, they devise a modification of the original scheme, in which the number of extreme points retained never exceeds a positive integer r; when this number of extreme points has been reached, any new extreme point generated replaces the column in P̂ that received the least weight in the solution to the RMP. In order to ensure the convergence of the algorithm, the optimal solution x to the RMP must also be retained as an individual column (however not counted among the r columns). They show that the modified algorithm is finitely convergent in the number of RMPs, provided that r ≥ dim F* + 1. Referred to as restricted simplicial decomposition (RSD), the scheme is shown below, for the case of a bounded set X.


PROCEDURE RSD(r)
(Init):     x^0 ∈ X, p^x := x^0, P̂ := ∅, t := 0.
(Sub):      Solve (4) defined at x^t ⇒ p^{i_t}, i_t ∈ P.
(Augment):  i_t ∈ P̂ ⇒ x^t is optimal.
            |P̂| = r ⇒ replace an element of P̂ by i_t.
            |P̂| < r ⇒ P̂ := P̂ ∪ {i_t}.
(Master):   x^{t+1} minimizes f over the convex
            hull of p^x and p^i, i ∈ P̂.
(Update):   Let p^x := x^{t+1}, t := t + 1.
            Go to the Subproblem.
END
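A minimal runnable sketch of the RSD(r) loop follows. It rests on assumptions that are not in the text: X is the unit simplex in R^n (so its extreme points are unit vectors and the linear subproblem (4) is solved by picking the smallest gradient component), f(x) = ½‖x − c‖² is a strictly convex test function, and each RMP is solved only approximately by a fixed number of projected gradient steps on the weights. Because the RMPs are inexact, optimality is tested with the condition ∇f(x^t)^T(p^{i_t} − x^t) ≥ 0 rather than by membership of i_t in P̂. All names are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto the unit simplex {w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def rsd(c, r, max_iter=50, tol=1e-10, inner_iters=500):
    """RSD(r) sketch for min 0.5*||x - c||^2 over the unit simplex in R^n."""
    n = len(c)
    eye = np.eye(n)
    x = eye[0].copy()            # (Init): x^0 is an extreme point of X
    P_hat = []                   # indices i of the retained extreme points e_i
    weights = np.zeros(0)        # their weights in the last RMP solution
    for _ in range(max_iter):
        grad = x - c             # gradient of f at x^t
        j = int(np.argmin(grad))  # (Sub): LP (4) over the simplex picks e_j
        if grad[j] - grad @ x >= -tol:
            break                # grad^T (e_j - x^t) >= 0: x^t is optimal
        if j not in P_hat:       # (Augment)
            if len(P_hat) == r:  # full: drop the least-weighted column
                P_hat.pop(int(np.argmin(weights)))
            P_hat.append(j)
        # (Master): columns p^x = x^t and e_i, i in P_hat; weights w in a simplex.
        V = np.column_stack([x] + [eye[i] for i in P_hat])
        L = np.linalg.norm(V, 2) ** 2  # Lipschitz constant of the w-gradient
        w = np.zeros(V.shape[1])
        w[0] = 1.0               # warm start at x^t, so f cannot increase
        for _ in range(inner_iters):  # inexact RMP: projected gradient steps
            w = project_simplex(w - V.T @ (V @ w - c) / L)
        x = V @ w                # (Update): x^{t+1} becomes the new p^x
        weights = w[1:]
    return x
```

With c = (0.6, 0.3, 0.1), the optimum lies in the relative interior of X, so dim F* = 2; the sketch with r = 3 should then recover x ≈ c, whereas r = 1 behaves like a Frank-Wolfe method whose master step is a (numerically approximate) line search between x^t and the new vertex.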


The value of r is crucial to the performance of the algorithm. If r ≥ dim F* + 1, then, since the number of RMPs is finite, the local rate of convergence is governed by the local convergence rate of the method chosen for the solution of the RMP; thus, a superlinear or quadratic convergence rate may be attained if a (projected) Newton method is used ([11]). If r < dim F* + 1, however, then the algorithm is only asymptotically convergent, and the rate of convergence is the same as that of the Frank-Wolfe algorithm (or conditional gradient method, which is actually obtained as a special case of RSD when r = 1), that is, the convergence rate is sublinear. Since the threshold value for finite convergence cannot be estimated from the original data and is thus unknown a priori, the proper value of r must in general be based on computational experience.
In column generation methods for linear pro-
grams, such as Dantzig-Wolfe decomposition ([16]), the
columns are generated through the pricing operation of 
the simplex method in linear programming, which uti- 
lizes an estimate of the dual optimal solution. Its ex- 
tension to nonlinear programming, nonlinear Dantzig- 
Wolfe decomposition (e.g., [18]), also utilizes the pric- 
ing operation in the construction of the column gener- 
ation subproblem, which in both cases is equivalent to 
the result of performing a Lagrangian relaxation of the 
original problem using the current estimate of the vec- 
tor of Lagrange multipliers; while the column genera- 
tion subproblem is nonlinear in the latter algorithm, the 
RMPs are in both cases linear programs. In contrast, SD algorithms are column generation methods in which the columns are generated through the solution of the primal, linearized problem (4), and which thus do not utilize dual information. However, it is established in [14] that Dantzig-Wolfe decomposition is in
fact a special case of simplicial decomposition, when the 
latter is applied to a primal-dual (saddle point) refor- 
mulation of the linear program. Also for linearly con- 
strained nonlinear programs of the form (1), simpli- 
cial decomposition may be based on the pricing-out of 
a subset of the linear constraints. Identifying a subset of 
the constraints defining X as complicating, these may 
be priced-out (that is, Lagrangian relaxed) in the col- 
umn generation subproblem, and instead included in 
the master problem, just as in Dantzig-Wolfe decom- 
position for linear and nonlinear programming prob- 
lems. Such methods have been devised in [20,28]. It 
should be noted, however, that just as in the original 
(primal) SD method, the column generation subprob- 
lems in these methods are based on the linearization of 
the original objective function, and are therefore linear 
programs, and their RMP are nonlinear; this is precisely 
the opposite to the case of nonlinear Dantzig-Wolfe de- 
composition. 

The RSD algorithm has been successfully applied 
to large scale, structured nonlinear optimization prob- 
lems, in particular mathematical programming mod- 
els of various nonlinear network flow problems, where 
the column generation subproblem reduces to effi- 
ciently solvable linear network flow problems (e.g., 
[11,15,23]). 

Other special structures in the feasible set X may also be taken into account efficiently in the construction of an SD method. For example, assume that the set X is a Cartesian product of polyhedral sets X_k in smaller dimensions R^{n_k}, with k ∈ K and Σ_{k∈K} n_k = n. (In network flow problems, k could denote a commodity of goods to be transported or a pair of origin and destination in an urban transportation network.) Noting that the linear column generation subproblem decomposes into |K| independent linear column generation problems, it is possible to store extreme points and directions of the individual (smaller-dimensional) sets X_k, rather than extreme points and directions of X. The RMP of such a disaggregate simplicial decomposition (DSD) method ([15]) would then have variables of the form λ_{ki}, i ∈ P_k, k ∈ K, and likewise for μ_{ki}, and |K| convexity constraints (2b) instead of only one as in the SD method. The total number of extreme points and directions is much smaller in the disaggregated representation (Σ_{k∈K} (|P_k| + |D_k|) in the disaggregated case, and Π_{k∈K} (|P_k| + |D_k|) in the aggregated case; [24]). On the other hand, according to Carathéodory's theorem, the total number of columns needed to express an optimal solution is in this case bounded above by Σ_{k∈K} (dim F_k* + 1), with F_k* the optimal face of X_k, which may be a much higher number than dim F* + 1. This result notwithstanding, it has been observed in applications of Dantzig-Wolfe decomposition to linear multicommodity network flow problems ([13]) that a disaggregated representation of the solution (in this case, as commodity route flows instead of as aggregated link flows) speeds up the convergence of the method. The same conclusion has been drawn from applications of the DSD algorithm to nonlinear multicommodity network flow problems ([15]).
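As a small worked illustration of the counting argument (hypothetical numbers, bounded case so that there are no extreme directions): if |K| = 3 and each X_k has 4 extreme points, then, since every extreme point of the product X = X_1 × X_2 × X_3 is a tuple of extreme points of the factors,

$$ \sum_{k\in K} |P_k| = 4 + 4 + 4 = 12, \qquad \prod_{k\in K} |P_k| = 4 \cdot 4 \cdot 4 = 64, $$

so the disaggregated representation has 12 candidate columns where the aggregated one has 64, and the gap widens multiplicatively with each additional factor.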

Experience with the RSD method has shown that it 
makes rapid progress initially, quickly reaching a near- 
optimal solution, especially when relatively large values 
of r are used and when second order methods are used 
for the solution of the RMP, but that it slows down close 
to an optimal solution. It is also relatively less efficient 
for larger values of dimF*. 

The explanation for this behavior is to be found in the construction of the column generation subproblem, which utilizes first order approximations of f. The column generation subproblem of RSD is the same as that of the Frank-Wolfe (FW) method mentioned earlier, whose search directions are known to deteriorate rapidly in quality. The reason is that as the sequence {x^t} tends to an optimal solution, the sequence {∇f(x^t)^T d^t} of directional derivatives of the search directions d^t := y^t − x^t tends to zero whereas {d^t} does not; thus, the search directions rapidly tend to become orthogonal to the gradient of f, and the result of the deteriorating descent property is a decreasing step length in the line search of this algorithm.

We then make the observation that the RSD method is similar to FW, since the same descent direction-generating subproblem is used, and that the only difference between FW and RSD lies in the updating phase, the latter algorithm using a multidimensional search (when r ≥ 2) rather than a one-dimensional search, which one may interpret as a device for reducing the zig-zagging effect inherent in the FW algorithm. It is
a natural conclusion from this discussion that better ap- 
proximations of f could be exploited in the column gen- 
eration phase of SD methods, since then the columns 
generated would be of better quality, thus leading to 
larger improvements in the inner approximations of 
the feasible set. (The counterpart in line search meth- 
ods for (1) is that improved approximations of f in 
a direction-finding subproblem yield better search di- 
rections.) 

An extension of the RSD algorithm was made by T. 
Larsson, M. Patriksson and C. Rydergren [16] based on 
this observation. The motivation behind the nonlinear 
simplicial decomposition (NSD) method is that by gen- 
erating columns based on better approximations of the 
objective function, the sensitivity of the method to the 
dimension of the optimal face will be reduced, fewer 
columns will be needed to describe an optimal solution, 
resulting in fewer iterations, and enabling a smaller 
value of the parameter r to be chosen. 

The NSD method is obtained from the RSD method by replacing the linear column generation subproblem (4) with

$$ \min_{y \in X}\ \{\nabla f(x^t)^T y + \varphi(y, x^t)\}, \tag{6} $$

where φ: X × X → R is a continuous function of the form φ(y, x), convex and continuously differentiable with respect to y for all x ∈ X, and with the property that ∇_y φ(x, x) = 0^n for all x ∈ X. Among the possible choices for φ we mention the following, where diag denotes the diagonal part of the matrix and where γ > 0:

φ(y, x)                                  Subproblem
0                                        Frank-Wolfe
½ (y − x)^T ∇²f(x) (y − x)               Newton
½ (y − x)^T [diag ∇²f(x)] (y − x)        Diag. Newton
(1/(2γ)) (y − x)^T (y − x)               Projection

Even though the finite convergence property will 
be lost (because nonextremal points will be generated), 
one may expect a more rapid convergence of the NSD 
method than the RSD method, both in terms of the 
number of iterations needed to reach a given solution 
accuracy and in terms of the required solution time, 
provided however that the nonlinear column genera- 
tion subproblems can be efficiently solved, at least ap- 
proximately. In numerical experiments performed on 
large scale nonlinear network flow problems, such con-
clusions were indeed drawn. It was particularly observed
that the NSD method is relatively much less sensitive to 
the value of dimF* than is RSD, which permits the use 
of a much smaller value of r in the NSD method. 

Convergence results for SD methods allow for both 
the column generation subproblem and the RMP to be 
solved inexactly, thus facilitating their practical use. In
[11], convergence is established for the RSD method, 
wherein the RMP is solved using one iteration of New- 
ton’s method only. Further developments along these 
lines are found in [8,14] for a general class of SD meth- 
ods and in [25, Chap. 9] for the NSD method. The con- 
vergence results established in the latter reference not 
only validate inexact computations but also quite arbi- 
trary rules both for defining and for dropping columns. 

SD algorithms have been extended to handle non- 
linear constraints as well. In [30], the column gen- 
eration subproblem is made linear by approximating 
the nonlinear constraint functions by piecewise linear 
functions, reminiscent of the Topkis-Veinott scheme
([29]). The NSD method of [16,25] applies to general 
convex sets X directly. A combination of sequential 
quadratic programming (SQP) and NSD is devised in 
[25, Chap. 9]. There, in the column generation sub- 
problem one replaces the nonlinear constraints with 
linear approximations, and dual information about 
these constraints is included in the objective function. 
SD methods for nonlinearly constrained problems are believed to be efficient if the nonlinearity is mild, as concluded from the computational results of [30]; however, few applications of SD methods to nonlinearly constrained problems have been reported.

The class of SD methods has furthermore been extended to the solution of the general variational inequality problem (VIP) of finding x* ∈ X such that

$$ F(x^*)^T (x - x^*) \ge 0, \quad \forall x \in X, \tag{7} $$

where F: X → R^n is a continuous and monotone mapping. If the mapping F satisfies F = ∇f, that is, it is the gradient of f, then the VIP defines the first order optimality conditions for x* in (1), and the SD method for the VIP becomes that for (1). If this is not the case, then F replaces ∇f in the column generation subproblem (4), and the RMP is defined as the restriction of the variational inequality problem (7) to the currently known inner approximation of X. Further, in this case there is no objective function (or merit function) immediately available for monitoring the convergence of the SD method. Column dropping rules in SD methods for the VIP must, however, be based on the improvement of the method in terms of some merit function, without which the method may cycle. (This is evidenced by the nonconvergence of the FW method applied to the VIP; see [9].) S. Lawphongpanich and D.W. Hearn [19] utilize the primal gap function

$$ W(x) := \max_{y \in X}\ F(x)^T (x - y), $$

which is zero at solutions to the VIP and positive elsewhere in X, to guide the dropping of columns. A related merit function is used in [27]. The NSD and NSD/SQP methods are extended to the VIP in [25, Chap. 9], there using the merit function

$$ W(x) = \max_{y \in X}\ \{F(x)^T (x - y) - \varphi(y, x)\}. $$
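The primal gap function is itself a linear program in y, so over a simple set it has a closed form. The sketch below assumes, for illustration, that X is the unit simplex, where a linear function attains its minimum at a vertex e_j, giving W(x) = F(x)^T x − min_j F_j(x); the function name is hypothetical and F(x) is passed in already evaluated.

```python
import numpy as np


def primal_gap_simplex(F_x, x):
    """Primal gap W(x) = max_{y in X} F(x)^T (x - y) for X the unit simplex.

    F_x: the mapping F evaluated at x; x: a point of the simplex.
    The maximum equals F(x)^T x minus the smallest component of F(x).
    """
    return float(F_x @ x - F_x.min())
```

For the monotone (but nonsymmetric) affine mapping F(x) = Mx with M = [[1, 1], [−1, 1]], the point x = (0.5, 0.5) gives F(x) = (1, 0) and a gap of 0.5, so it is not a solution of (7).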


In contrast to the case of the problem (1), the sequence of solutions to the RMP in SD methods applied to (7) does not necessarily yield a monotonically decreasing sequence of values of any merit function for the VIP unless very restrictive assumptions are made on the original data, whence the theoretical properties of SD methods for the VIP are weaker; for example, a consequence of the property just mentioned is that the finite convergence result for the RSD method cannot be transferred to the VIP. The solution methods that have been considered for the RMP for the VIP are generalizations of those used for the RMP of (1); the most popular ones are projection algorithms, due in large part to the simple form of the constraints of the RMP; see, e.g., [22] for numerical investigations of different algorithmic approaches to the RMP.

Larsson, Patriksson and A.-B. Strömberg [17] develop an SD scheme for nondifferentiable convex optimization. There, the gradient ∇f(x) is replaced by the set of subgradients, the subdifferential, ∂f(x), defined by

$$ \partial f(x) = \{\, \xi \in \mathbb{R}^n : f(y) \ge f(x) + \xi^T (y - x),\ \forall y \in \mathbb{R}^n \,\}. $$


It is shown that the utilization of an arbitrary subgradient ξ ∈ ∂f(x) in place of the gradient in the column generation subproblem may lead to the termination of the algorithm at a nonoptimal solution, since not all subgradients define descent directions. A modification of the SD scheme is therefore made, wherein the subgradients evaluated in the course of solving the RMP are averaged with weights proportional to the step lengths used in the solution method for the RMP; this vector of averaged (or ergodic) subgradients is then shown to yield an improved inner approximation. An alternative, and probably computationally much more efficient, means to define a linear column generation subproblem with the required properties is through the generation of (approximately) shortest ε-subgradients.
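The averaging rule itself is a weighted mean. The following sketch covers only that step, assuming the subgradients and positive step lengths from the RMP iterations are available as arrays; how the averaged vector then enters the column generation subproblem, and the step-length rule of the RMP solver, are not modeled. The function name is hypothetical.

```python
import numpy as np


def ergodic_subgradient(subgradients, step_lengths):
    """Average subgradients with weights proportional to the step lengths.

    subgradients: array-like of shape (T, n), one subgradient per RMP iteration.
    step_lengths: array-like of T positive step lengths used in those iterations.
    """
    g = np.asarray(subgradients, dtype=float)
    s = np.asarray(step_lengths, dtype=float)
    return (s[:, None] * g).sum(axis=0) / s.sum()
```

For example, subgradients (1, 0) and (0, 1) taken with step lengths 1 and 3 average to (0.25, 0.75).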

The first simplicial decomposition type methods evolved from the experience of the poor convergence of the FW method (e.g., [5,12,21]). Perhaps the first thorough theoretical investigation of SD methods is due to C.A. Holloway [12]. Their close relationships to column generation methods, however, make it difficult to trace their earliest history; for example, already the first urban transportation planning studies in the 1950s applied heuristics resembling the DSD algorithm (see [24] for an overview of the history of such methods). A related class of methods also exists for least-distance and other problems in quadratic programming (e.g., [7]).

Further surveys on simplicial decomposition meth- 
ods, their history and their relationships to column gen- 
eration, are found in [15,24,25]. 


See also 


> Decomposition Principle of Linear Programming 
> Generalized Benders Decomposition 
> MINLP: Generalized Cross Decomposition

> MINLP: Logic-based Methods 

> Simplicial Decomposition 

> Stochastic Linear Programming: Decomposition 
and Cutting Planes 

> Successive Quadratic Programming: Decomposition 
Methods 
Simplicial path following methods are relatively new in the area of integer programming (cf. > Integer programming). They are based on a triangulation of Euclidean space and a pivoting algorithm which, for a restrictive class of problems, terminates with an integral solution or shows that no such solution exists. The path constructed consists of a sequence of neighboring simplices, the vertices of which are integral lattice points.

Simplicial methods originated in fixed point theory, 
where they are used to approximate fixed points of con- 
tinuous mappings [7,18]. 

In the area of continuous mathematics they have 
been applied successfully in disciplines such as game 
theory [19], the approximation of roots of systems of 
complex polynomials [8,13], economics and economet-
rics [20], and they have provided useful machinery to
prove a variety of intersection lemmas [10,16]. 

Returning to discrete problems again, the essentials 
of the method consist of the following ingredients: 

• triangulations
• labelings
• pivoting
• termination- and noncycling arguments.

The simplicial algorithms developed so far yield a conclusive answer to the feasibility problem "Does a given bounded set in Euclidean space contain a lattice point?" for a restrictive class of sets, the so-called max-closed sets. In fact, for this class, they provide a polynomial time algorithm (for polyhedral cases) which generalizes earlier work of A. Pnueli [17], F. Glover [11], R. Chandrasekaran [1] and R.W. Cottle and A.F. Veinott [3].

In the sequel we shall pay attention to the notions listed above. Also, some intriguing complexity issues which arise when studying unimodular transformations into max-closed form are discussed briefly. Finally, we discuss the use of simplicial methods outside this tractable class of max-closed sets, namely to locate regions of specific interest, and their possible incorporation into branching algorithms.


Triangulations 


A triangulation of Euclidean space is a set of simplices whose union covers the space and in which any two simplices intersect, if at all, in a member of this set.

For our purposes we need a triangulation which uses all lattice points as zero-dimensional elements and triangulates the unit cube together with all its integral translates. There are various ways to triangulate space in such a manner. An extensive study of triangulations and simplicial methods is [4].
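The text does not fix a particular triangulation. As an illustration only, the sketch below assumes Freudenthal's triangulation (also known as Kuhn's or K1), which meets the requirement above. Each translate of the unit cube with lower corner v is split into n! simplices, one for each ordering of the coordinate axes. The function name `freudenthal_simplex` is hypothetical. It returns the lattice vertices of a simplex that contains a given point.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def freudenthal_simplex(x: Sequence[float]) -> List[Point]:
    """Return the n + 1 lattice vertices of a Freudenthal simplex containing x.

    The unit cube with lower corner v = floor(x) is cut into n! simplices, one
    per ordering of the axes; x lies in the simplex whose ordering sorts the
    fractional parts of x in decreasing order.
    """
    v = [math.floor(t) for t in x]
    frac = [t - b for t, b in zip(x, v)]
    # Axes in order of decreasing fractional part (ties broken by index).
    order = sorted(range(len(x)), key=lambda i: -frac[i])
    vertices = [tuple(v)]
    for i in order:
        v[i] += 1                      # one unit step along axis i
        vertices.append(tuple(v))
    return vertices


print(freudenthal_simplex([0.3, 0.7]))   # [(0, 0), (0, 1), (1, 1)]
```

Consecutive vertices differ by a unit vector, and every vertex is a lattice point. This matches the requirement that the triangulation use the lattice points as its zero-dimensional elements.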


Labelings 


The labelings form the most crucial part of the simplicial methods. Through the labeling device, the original problem is translated into a format to which the arguments of the pivoting algorithms apply. There are two cases to consider: integer labeling and vector labeling.


Integer Labeling 


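This part needs some introductory notation and conventions. Given $a \in \mathbb{R}^n$, the following cones play a crucial role:

$P(a) = \{x \in \mathbb{R}^n : x_i \ge a_i \text{ for } i = 1, \dots, n\}$,
$P_k(a) = \{x \in \mathbb{R}^n : x \in P(a) \text{ and } x_k = a_k\}$,
$N(a) = \{x \in \mathbb{R}^n : x_i \le a_i \text{ for } i = 1, \dots, n\}$,
$N_k(a) = \{x \in \mathbb{R}^n : x \in N(a) \text{ and } x_k = a_k\}$.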
Now suppose a set $S \subseteq \mathbb{R}^n$ is given. Following [15], a point $a \in \mathbb{R}^n$ carries label k if

$S \cap N_k(a) = \emptyset$.


According to this definition, a point may carry several labels (in which case the smallest one is chosen, for example), or it may carry no label at all. Drawing a few pictures shows that such a labeling device partitions Euclidean space into n regions. The exceptions are the points of S itself and a part located near the so-called Pareto boundary of S. That latter part may vanish if S has a specific shape, and this is precisely the case for the max-closed sets discussed later on.

It is understood in this context that S can be queried as to whether a given point carries label k. If S is polyhedral, linear programming answers these queries; if S is convex, convex programming is needed. In all cases of interest we assume that membership questions related to S are relatively easy as long as they are not restricted to lattice points of S, that is, as long as they treat S as a continuous set!
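The labeling rule can be made concrete in the smallest interesting case. The sketch below assumes n = 2 and a polyhedral $S = \{x : c \cdot x \le b\}$ given as a list of rows. In that setting the "linear program" behind each label reduces to intersecting intervals for the one free coordinate; for general n an LP solver would be needed instead. The names `meets_face` and `labels` are hypothetical, and labels are numbered 0 and 1 rather than 1 and 2.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[Tuple[float, float], float]   # (c, b) encodes c[0]*x0 + c[1]*x1 <= b
EPS = 1e-9


def meets_face(rows: Sequence[Row], a: Sequence[float], k: int) -> bool:
    """True if S ∩ N_k(a) is nonempty for S = {x in R^2 : c.x <= b}.

    N_k(a) fixes x_k = a_k and bounds the other coordinate t = x_j by t <= a_j,
    so each row only tightens an interval for t.
    """
    j = 1 - k
    lo, hi = -math.inf, float(a[j])
    for c, b in rows:
        rhs = b - c[k] * a[k]          # what is left for c[j] * t
        if c[j] > 0:
            hi = min(hi, rhs / c[j])
        elif c[j] < 0:
            lo = max(lo, rhs / c[j])
        elif rhs < -EPS:               # row does not involve t and is violated
            return False
    return lo <= hi + EPS


def labels(rows: Sequence[Row], a: Sequence[float]) -> List[int]:
    """All k (0-based) such that a carries label k, i.e. S ∩ N_k(a) is empty."""
    return [k for k in range(2) if not meets_face(rows, a, k)]


S = [((-1, 0), -1.2), ((0, -1), -0.5), ((1, 0), 3.7), ((0, 1), 2.6), ((1, -1), 0.3)]
print(labels(S, (3, 3)), labels(S, (2, 3)), labels(S, (2, 2)))   # [0, 1] [1] []
```

The point (2, 2) lies in S and so carries no label, as the definition requires.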

In [5,6] it is shown that if S is a simplex (knapsack 
problem) one can even avoid the use of linear programs 
in establishing labels. An explicit device is available in 
that case. 


Vector Labeling 


Relative to a set $S \subseteq \mathbb{R}^n$, we can associate with any point $a \in \mathbb{R}^n$ the vector which starts at a and ends at a point of S nearest to a. Here S is assumed to be closed, and preferably convex, in which case this vector is unique. The reader may notice that now all points carry a label! This labeling device was introduced in [15]. There it is shown that the device yields results similar to those provided by integer labeling rules. Moreover, it enables a pivoting algorithm to continue where, under integer labeling, it would have terminated for lack of labels!
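For a concrete instance of a vector label, the sketch below assumes S is an axis-aligned box. A box is closed and convex, and its nearest-point projection is coordinatewise clipping, so the vector label is unique and cheap to compute. The function name `vector_label` is hypothetical. For a general convex S, the projection would require solving a small convex program.

```python
from typing import List, Sequence


def vector_label(a: Sequence[float], lower: Sequence[float],
                 upper: Sequence[float]) -> List[float]:
    """Vector from a to its nearest point in the box S = [lower, upper]."""
    nearest = [min(max(ai, lo), hi) for ai, lo, hi in zip(a, lower, upper)]
    return [p - ai for p, ai in zip(nearest, a)]


print(vector_label((0, 5), (1, 1), (3, 3)))   # [1, -2]
print(vector_label((2, 2), (1, 1), (3, 3)))   # [0, 0]: points of S get the zero vector
```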


Pivoting 


We shall explain the pivoting structure only for the case where integer labelings are used; see [15] for the more complicated process involving vectors. We also restrict ourselves here to full-dimensional pivots, which lead from an n-dimensional simplex of the triangulation (with n + 1 vertices) to a neighboring one. Varying dimension pivoting algorithms [6] are essential for speeding up performance, but they involve rather technical details and go beyond the scope of this overview.


Simplicial Pivoting Algorithms for Integer Programming, Figure 1
Pnueli's algorithm

Before going into simplicial pivoting, we first discuss Pnueli's algorithm [17]. For introductory purposes we present a strongly modified form of it.

A set $S \subseteq \mathbb{R}^n$ is specified, together with a lattice point u that is an upper bound for S; this means that $S \subseteq N(u)$. A lower bound for S is any lattice point ℓ for which $N(\ell) \cap S = \emptyset$. Clearly, if S is bounded, it has both upper and lower bounds in this cone-like sense. In the sequel, $e_k$ denotes the kth unit vector. The modified form of Pnueli's algorithm (see Fig. 1 for a two-dimensional example) consists of the following iterative procedure:


1 | let $x_0 = u$
2 | if $x_i$ has label k, let $x_{i+1} = x_i - e_k$ and repeat Step 2


It is not hard to see that, as long as the algorithm runs, all lattice points of S are contained in $N(x_i)$. If $x_i$ carries label k, then $S \cap N_k(x_i) = \emptyset$. So every lattice point of S in $N(x_i)$ has kth coordinate at most $(x_i)_k - 1$, and therefore lies in $N(x_{i+1})$. Consequently, the algorithm terminates in one of three possible states:

1) The iterate $x_i$ is recognized as a lower bound. S is proven to contain no integral solutions.

2) The iterate $x_i$ has no label. If this is because $x_i \in S$, we have located a solution.

3) The iterate $x_i$ has no label and, unfortunately, $x_i \notin S$.
Start | A simplex of the triangulation is located with the following property: all labels 1, ..., n are present as labels of the n + 1 vertices of the simplex involved. Such a simplex can easily be found near an upper bound of S (as is done in Fig. 2). In an arbitrary starting, varying dimension algorithm [6], such a simplex shows up during the execution of the algorithm itself!

Pivot | If a simplex carrying all labels is found, two of its vertices must carry the same label. These two vertices define two neighboring simplices of the triangulation: the two full-dimensional simplices that intersect the given one in one of the two facets opposite those vertices. One of these simplices represents the previous state of the algorithm, and the next state is defined as the other one. Thus a new lattice point is found and its label is calculated. If it has a label, a new simplex is found that carries all labels, one of which occurs at two vertices.


As will become clear later, 3) cannot happen whenever S is max-closed. Pnueli's algorithm in the above form is extremely simple, and it runs (as long as it runs) in polynomial time if S is polyhedral. Each iterate needs at most n linear programs to determine its label, and each step decreases one coordinate by one. Hence it uses at most

$n \sum_{i=1}^{n} (u_i - \ell_i)$

linear programs to travel from an upper bound u to a lower bound ℓ. A disadvantage, however, is that it cannot start at an arbitrary point, such as one near which an integral solution is expected for whatever reason. This remark will turn out to be even more relevant when we deal with unimodular transformations later. Simplicial algorithms are much more flexible with respect to starting conditions, and this is precisely why they deserve attention as a substitute for Pnueli's algorithm.
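The modified procedure and its three terminal states can be sketched for the two-dimensional polyhedral case. The sketch makes several assumptions beyond the text. Labels are computed with an interval test that works only for n = 2. The function names are hypothetical. A lower bound is recognized through a caller-supplied corner `lower` with $S \subseteq P(\text{lower})$: once some $x_k < \text{lower}_k$, every point of $N(x)$ violates $x_k \ge \text{lower}_k$, so $N(x) \cap S = \emptyset$.

```python
import math
from typing import Sequence, Tuple

Row = Tuple[Tuple[float, float], float]   # (c, b) encodes c[0]*x0 + c[1]*x1 <= b
EPS = 1e-9


def meets_face(rows: Sequence[Row], a: Sequence[float], k: int) -> bool:
    """True if S ∩ N_k(a) is nonempty (n = 2: an interval test on x_j, j != k)."""
    j = 1 - k
    lo, hi = -math.inf, float(a[j])
    for c, b in rows:
        rhs = b - c[k] * a[k]
        if c[j] > 0:
            hi = min(hi, rhs / c[j])
        elif c[j] < 0:
            lo = max(lo, rhs / c[j])
        elif rhs < -EPS:
            return False
    return lo <= hi + EPS


def in_S(rows: Sequence[Row], x: Sequence[float]) -> bool:
    return all(c[0] * x[0] + c[1] * x[1] <= b + EPS for c, b in rows)


def pnueli_walk(rows: Sequence[Row], u: Sequence[int], lower: Sequence[float]):
    """Modified Pnueli algorithm started at the upper bound u."""
    x = list(u)
    while True:
        if any(xk < lk for xk, lk in zip(x, lower)):
            return "empty", tuple(x)                       # state 1
        labs = [k for k in range(2) if not meets_face(rows, x, k)]
        if not labs:                                       # state 2 or 3
            return ("solution" if in_S(rows, x) else "undecided"), tuple(x)
        x[labs[0]] -= 1                                    # smallest label k: x - e_k


# Max-closed S: every row has at most one positive coefficient.
S1 = [((-1, 0), -1.2), ((0, -1), -0.5), ((1, 0), 3.7), ((0, 1), 2.6), ((1, -1), 0.3)]
print(pnueli_walk(S1, (4, 3), (1.2, 0.5)))   # ('solution', (2, 2))
# Not max-closed (x0 + x1 <= 1.5 has two positive coefficients): state 3 occurs.
S2 = [((-1, 0), -0.2), ((0, -1), -0.2), ((1, 1), 1.5)]
print(pnueli_walk(S2, (2, 2), (0.2, 0.2)))   # ('undecided', (1, 1))
```

In the second example, S contains no lattice point. The walk nevertheless stops at the unlabeled non-solution (1, 1), which illustrates why max-closedness matters.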

We now turn to the pivoting structure (see Fig. 2). Comparing Fig. 1 and Fig. 2, one might prefer Pnueli's algorithm to the simplicial one. Note, however, that the latter can start anywhere.


Simplicial Pivoting Algorithms for Integer Programming, Fig- 
ure 2 
A simplicial algorithm 


The above construction of a path of neighboring 
simplices forms the basic idea of each simplicial al- 
gorithm. Again we emphasize that in arbitrary start- 
ing, variable dimension algorithms the construction is 
considerably more complicated. However sophisticated 
a simplicial algorithm may be, the ultimate goal is to 
create a sequence of almost solutions, thereby carefully 
avoiding cycling. 
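A single pivot can be sketched under the assumption that the triangulation is Freudenthal's (the text leaves the choice open). For that triangulation it is a standard fact that the vertex $y_i$ leaving the simplex is replaced by $y_{i-1} + y_{i+1} - y_i$, with indices taken cyclically. The second helper applies the rule described under Pivot: after a new vertex enters, the other vertex that carries the same label is the next to leave. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def pivot(simplex: Sequence[Point], i: int) -> List[Point]:
    """Drop vertex i of a Freudenthal simplex and return the neighboring simplex.

    `simplex` lists y_0, ..., y_n so that consecutive vertices differ by a unit
    vector.  The dropped vertex is reflected across its cyclic neighbors; the
    list is rotated when i is 0 or n so that this ordering is preserved.
    """
    m = len(simplex)                                  # m = n + 1 vertices
    prev, nxt = simplex[(i - 1) % m], simplex[(i + 1) % m]
    new = tuple(p + q - r for p, q, r in zip(prev, nxt, simplex[i]))
    if i == 0:
        return list(simplex[1:]) + [new]
    if i == m - 1:
        return [new] + list(simplex[:-1])
    return list(simplex[:i]) + [new] + list(simplex[i + 1:])


def vertex_to_drop(labels: Sequence[int], entered: int) -> int:
    """Index of the other vertex carrying the same label as the vertex that just entered."""
    return next(j for j, lab in enumerate(labels)
                if j != entered and lab == labels[entered])


s = [(0, 0), (0, 1), (1, 1)]
print(pivot(s, 1))                  # [(0, 0), (1, 0), (1, 1)]
print(vertex_to_drop([0, 1, 0], 2)) # 0
```

The new simplex shares with the old one the facet opposite the dropped vertex. After a pivot at i = 0 or i = n, the entering vertex sits at the end or the front of the returned list, so a caller must track its index accordingly.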


Termination and Noncycling Arguments 


An elegant feature of simplicial algorithms is that, if special care is taken in the construction of the sequence of simplices, cycling is impossible. Moreover, they are designed so that they cannot tend to infinity without passing an upper or lower bound of the set S. This means that, as long as the algorithm runs, it creates new candidate solutions at every iteration. Using these arguments one can prove that
1) The algorithm reaches a recognizable lattice point (which can be an upper bound or a lower bound, depending on the starting position and the state of the algorithm), in which case S is proven not to contain any lattice point. The argument is similar to, but slightly more involved than, that for Case 1 of Pnueli's algorithm, and was first used in [5]:
If v is a lattice point of S, the set $P_k(v)$ contains no points carrying label k, since $v \in S \cap N_k(a)$ for every $a \in P_k(v)$. Therefore the algorithm cannot pass this set! To pass it, all labels would have to be present there. As a consequence, the algorithm cannot pass from P(v) to a lower bound, or vice versa, without hitting v or another solution. Hence it cannot run between an upper bound and a lower bound without hitting a solution, when there is one.




2) The algorithm encounters a point carrying no label. 
If S is max-closed (see below) this must be a solu- 
tion. 


Max-Closed Sets 


Following [15], a set S is called max-closed if it satisfies

$x, y \in S \Rightarrow \max(x, y) \in S$,

where

$\max(x, y) = (\max(x_1, y_1), \dots, \max(x_n, y_n))$.


The following general properties are easily verified:

1) Translations map max-closed sets to such sets. 

2) Intersections of max-closed sets are max-closed. 

3) Inequalities of the form $x_i \le \alpha_i$ and $x_i \ge \beta_i$ define max-closed sets.

4) An inequality of the form

$c_1 x_1 + \dots + c_n x_n \le c$

defines a max-closed set whenever at most one of the $c_i$ is positive. Special features of such inequalities in integer programming were already investigated in [1,11,12,17]. Of special interest are the inequalities of this type in two variables: they arise as integer programming reformulations of the simultaneous Diophantine approximation problem [14].

5) If f is a monotone increasing function in each of its variables, the set $\{(x, z) : f(x) \ge z\}$ is max-closed.

6) Max-closed sets need not be polyhedral, nor need 
they be convex. They need not even be connected! 

7) A function g is called max-closed whenever it satisfies

$g(\max(x, y)) \le \max(g(x), g(y))$.

Such functions define max-closed sets by the inequalities $g(x) \le y$.

8) A bounded max-closed set contains a unique coordinatewise maximum point, that is, a u with $S \subseteq N(u)$.
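Property 4) has a one-line proof. Suppose $c_p$ is the only positive coefficient and, without loss of generality, $\max(x_p, y_p) = x_p$. Then $c \cdot \max(x, y) \le c \cdot x \le c$, because $c_i \le 0$ and $\max(x_i, y_i) \ge x_i$ for every $i \ne p$. The finite sanity check below tests the defining implication on integer points of a small box. It is an illustration, not a proof, and the function name is hypothetical.

```python
from itertools import product
from typing import Sequence


def is_max_closed_on_grid(c: Sequence[int], b: int, box: range) -> bool:
    """Check x, y in S => max(x, y) in S for S = {x : c.x <= b},
    restricted to the integer points of box^n."""
    n = len(c)
    pts = [x for x in product(box, repeat=n)
           if sum(ci * xi for ci, xi in zip(c, x)) <= b]
    members = set(pts)
    # max(x, y) stays inside the box, so membership in `members` is the test.
    return all(tuple(max(p, q) for p, q in zip(x, y)) in members
               for x in pts for y in pts)


print(is_max_closed_on_grid((1, -2), 1, range(-3, 4)))   # True: one positive coefficient
print(is_max_closed_on_grid((1, 1), 2, range(0, 3)))     # False: (2,0), (0,2) -> (2,2)
```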

Based on the above properties, the following algorithm solves the integer feasibility problem whenever S is defined by inequalities of the type mentioned in 4):

1 | Solve $\max_{x \in S} \sum_{i=1}^{n} x_i$.
2 | If x solves Step 1, let $u = \lfloor x \rfloor$, the lattice point obtained by rounding the $x_i$ downwards.
3 | Let $S := S \cap N(u)$ and repeat.

The algorithm sketched above can be found in more or less comparable form in [1,11,17]. It uses only

$\sum_{i=1}^{n} (u_i - \ell_i)$

linear programs, but it is subject to the recognition problem: "Is a given set S definable by inequalities of the form mentioned in 4)? Or, more generally, is S max-closed?"
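Steps 1-3 can be sketched in two dimensions under three stated assumptions. First, the LP in Step 1 is replaced by enumerating the vertices of a bounded polygon. Second, the loop stops as soon as u lies in S. Third, an empty feasible set means that S has no lattice point. The function names are hypothetical. The result is meaningful only for max-closed S, because only then is the maximizer of $\sum x_i$ the coordinatewise maximum point.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple

Row = Tuple[Tuple[float, float], float]   # (c, b) encodes c[0]*x0 + c[1]*x1 <= b
EPS = 1e-9


def feasible(rows: Sequence[Row], x: Sequence[float]) -> bool:
    return all(c[0] * x[0] + c[1] * x[1] <= b + EPS for c, b in rows)


def max_sum_2d(rows: Sequence[Row]) -> Optional[Tuple[float, float]]:
    """Maximize x0 + x1 over a bounded polygon by enumerating its vertices
    (a two-dimensional stand-in for an LP solver)."""
    best = None
    for (c, b), (d, e) in combinations(rows, 2):
        det = c[0] * d[1] - c[1] * d[0]
        if abs(det) < EPS:
            continue                                   # parallel constraint lines
        x = ((b * d[1] - c[1] * e) / det, (c[0] * e - b * d[0]) / det)
        if feasible(rows, x) and (best is None or sum(x) > sum(best)):
            best = x
    return best


def max_closed_lattice_point(rows: Sequence[Row]) -> Optional[Tuple[int, int]]:
    """Step 1: maximize sum(x); Step 2: u = floor(x); Step 3: S := S ∩ N(u)."""
    rows = list(rows)
    while True:
        x = max_sum_2d(rows)
        if x is None:
            return None                                # S empty: no lattice point
        u = tuple(math.floor(t + EPS) for t in x)
        if feasible(rows, u):
            return u
        rows += [((1.0, 0.0), u[0]), ((0.0, 1.0), u[1])]


S1 = [((-1, 0), -1.2), ((0, -1), -0.5), ((1, 0), 3.7), ((0, 1), 2.6), ((1, -1), 0.3)]
print(max_closed_lattice_point(S1))   # (2, 2)
```

Each pass either returns or strictly lowers u. If the floor of the new maximizer equalled u, that maximizer would be u itself and would lie in S, so the loop terminates.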

Our modified form presented earlier requires more linear programs to run. However, it gives a correct answer whenever it has run without halting between an upper and a lower bound, and it does so without first checking S for max-closedness. In other words, it avoids the recognition problem.

Max-closed sets have an important analogue in the area of computational logic. There, so-called Horn formulas, which form the basis of database reasoning, play an important algorithmic role. The reader familiar with Horn formulas will certainly recognize property 4) above. Also, Pnueli's algorithm in the form presented above recalls the single-lookahead-unit-resolution procedure; see [2,9,21]. V. Chandru and J.N. Hooker, using results of Chandrasekaran, established this interesting link.

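To summarize: Pnueli's algorithm, or a simplicial substitute for it, conclusively answers the question of whether a max-closed set S contains a lattice point. This is because of the following theorem.


Theorem 1 If S is max-closed, the only points carrying no label are the points of S itself.

Indeed, suppose $a \notin S$ carries no label. Then for each k there is a point $y^k \in S \cap N_k(a)$. By max-closedness, $\max(y^1, \dots, y^n) = a$ lies in S, which is a contradiction.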


We emphasize that max-closedness is a sufficient condition for the above algorithms to run conclusively, but not a necessary one. They may occasionally run conclusively (that is, never encounter an unlabeled point that is not a solution) even when S is not max-closed.
Unimodular Max-Closed Form Transformations


Bounded linear integer programming would be an easy task if polyhedral sets were automatically max-closed. Unfortunately, this is not the case. An important result, however, is that simplices can be brought into max-closed form through a unimodular transformation [17]. This transformation can be found in a polynomial number of arithmetic operations. This is all the more intriguing since finding a lattice point in a simplex (the knapsack problem) is NP-complete. The drawback of this approach is that, after a simplex has been transformed into max-closed form, the number

$\sum_{i=1}^{n} (u_i - \ell_i)$

may grow exponentially with the dimension. For this reason, an arbitrary starting algorithm is certainly to be preferred over one that has to start at an upper bound, at least in cases where solutions exist.

Transforming arbitrary polyhedral sets (unimod- 
ularly) into max-closed form is not possible. There- 
fore, simplicial methods clearly are incomplete methods 
when they are applied to arbitrary sets. They return no 
answer in case a point is encountered which carries no 
label and is not a solution, before they have entered the 
upper or lower bound area. As indicated earlier, vec- 
tor labeling avoids this situation and simplicial meth- 
ods using this kind of labeling keep running until either 
a solution is encountered or a specific region of interest 
is reached: a so called twinplex [15]. 

A twinplex is a simplex of the triangulated region that is very specific in the sense that its n + 1 associated distance vectors point in all possible directions. To be precise, these n + 1 vectors $a_1, \dots, a_{n+1}$ satisfy

$\sum_{i=1}^{n+1} \lambda_i a_i = 0$

for a nontrivial $\lambda \ge 0$. Moreover, these vectors $a_1, \dots, a_{n+1}$ determine n + 1 valid inequalities, each of which is violated by at least one vertex of the simplex involved. Intuitively, a twinplex is located where the set S is locally the fattest. Based on this intuition, [15] suggests cutting S at this place into two or more components, which yields a branching algorithm that incorporates simplicial methods. The various components are then transformed in a way that makes them more likely to be max-closed. The result is a branching tree in which tighter max-closed relaxations of the problem are solved with increasing depth.
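For n = 2, the twinplex condition on the three distance vectors has a simple geometric reading. Normalize a nontrivial λ ≥ 0 to sum to 1; such a λ exists exactly when the origin lies in the triangle spanned by the tips of $a_1, a_2, a_3$. The sketch below computes the origin's barycentric coordinates, assuming the three tips are not collinear. The function name is hypothetical.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float]


def cross(u: Vec, v: Vec) -> float:
    return u[0] * v[1] - u[1] * v[0]


def twinplex_weights(a1: Vec, a2: Vec, a3: Vec,
                     eps: float = 1e-12) -> Optional[Tuple[float, float, float]]:
    """Return λ >= 0 with sum 1 and λ1*a1 + λ2*a2 + λ3*a3 = 0, or None if none exists.

    Uses barycentric coordinates of the origin in the triangle (a1, a2, a3);
    the triangle is assumed to be non-degenerate.
    """
    e1 = (a2[0] - a1[0], a2[1] - a1[1])
    e2 = (a3[0] - a1[0], a3[1] - a1[1])
    d = cross(e1, e2)
    if abs(d) < eps:
        raise ValueError("degenerate triangle: the vector tips are collinear")
    p = (-a1[0], -a1[1])                    # origin relative to a1
    l2 = cross(p, e2) / d
    l3 = cross(e1, p) / d
    lam = (1.0 - l2 - l3, l2, l3)
    return lam if min(lam) >= -eps else None


print(twinplex_weights((1, 0), (0, 1), (-1, -1)))   # all three weights equal 1/3
print(twinplex_weights((1, 0), (0, 1), (1, 1)))     # None: no direction is opposed
```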

Yet another possibility, when meeting a nonsolution point a without (integer) label, is to go into a recursion. If $S \cap N_k(a) \neq \emptyset$, decrease the dimension and determine whether $S \cap N_k(a)$ contains a lattice point. If it does not, a can be assigned label k after all, and the simplicial search continues in full dimension. In this approach it is suggested to select for the recursion an index k that has a large effect on the non-max-closedness of the problem. For example, if S is polyhedral, one selects an index k that appears most often with a nonnegative coefficient. Exploiting twinplexes and incorporating simplicial algorithms into branch and bound search trees is still an area under investigation. In the satisfiability area, however, branching algorithms that tend to create Horn-like subproblems with increasing depth have been studied with success.
Simulated Annealing


For NP-hard optimization problems, computing the optimal solution with exact algorithms is computationally intensive, requiring an effort that increases exponentially with the size of the problem. In practice, exact algorithms are used only for moderately sized problem instances. This has motivated the development of heuristic optimization techniques that provide good-quality solutions in a reasonable amount of computational time. One such popular technique is simulated annealing (SA), which has been widely applied to both discrete and continuous optimization problems ([1,4]). SA is a stochastic search method modeled on the physical annealing process studied in thermodynamics. Annealing refers to the process in which a thermal system is first melted at high temperature and then cooled slowly, by lowering the temperature, until it reaches a stable state (ground state) in which the system has its lowest energy. S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi [7] first proposed an effective connection between simulated annealing and combinatorial optimization (cf. » Evolutionary algorithms in combinatorial optimization), based on the original work in [9].
The main advantage of the SA algorithm is its ability to escape from local optima through a mechanism that allows deterioration in the objective function value (OFV). That is, during the optimization process SA accepts not only solutions better than the previous ones, but also worse solutions, under probabilistic control through the temperature parameter T. More specifically, in the first stages of SA, where T is relatively high, the solution space is widely 'explored' so that different solution directions are identified, and 'bad' solutions are often accepted with high probability. During the course of the algorithm, the temperature T decreases in order to steadily reduce the probability P of accepting solutions that lead to worse objective function values. By allowing controlled 'uphill' moves, the search can avoid entrapment in local optima and, eventually, obtain higher-quality solutions.

Many factors have a strong impact on the performance of the SA algorithm:

• The initial temperature $T_0$. A high value of $T_0$ means that the probability P of accepting inferior solutions is also high, which leads to a diversified search in the first iterations of the algorithm. With low values of the initial temperature, the search becomes more localized.

• The thermal equilibrium. This is the condition in which further improvement in the solution cannot be expected with high probability.

• The annealing schedule, which determines at which points of the algorithm, and by how much, the temperature T is to be reduced.

Now consider a minimization process. Let ΔE denote the change in the OFV between the current state and the state under consideration that occurs as T decreases. This change corresponds to the change in energy level in the analogy with physical annealing. The probability P of accepting a worse solution is then equal to $e^{-\Delta E/(k_B T)}$, where $k_B$ is the Boltzmann constant. Simulated annealing is presented below in pseudocode:


PROCEDURE simulated annealing()
   InputInstance();
   Generate randomly an initial solution;
   initialize T;
   DO T > 0 →
      DO thermal equilibrium not reached →
         Generate a neighbor state randomly;
         evaluate ΔE;
         update current state
            IF ΔE < 0 with new state;
            IF ΔE ≥ 0 with new state with probability e^(−ΔE/(k_B T));
      OD;
      Decrease T using annealing schedule;
   OD;
   RETURN (solution with the lowest energy)
END simulated annealing;


Pseudocode for the simulated annealing procedure
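The pseudocode can be rendered as a short Python sketch. The choices below are assumptions that the pseudocode leaves open:

- $k_B$ is absorbed into T.
- The annealing schedule is geometric, $T \leftarrow \alpha T$.
- "Thermal equilibrium" is approximated by a fixed number of moves per temperature.
- The loop stops at a small positive temperature rather than at T = 0.

All parameter names are hypothetical.

```python
import math
import random
from typing import Callable, TypeVar

State = TypeVar("State")


def simulated_annealing(initial: State,
                        energy: Callable[[State], float],
                        neighbor: Callable[[State, random.Random], State],
                        t0: float = 10.0, alpha: float = 0.9, t_min: float = 1e-3,
                        moves_per_t: int = 100, seed: int = 0) -> State:
    """Minimize `energy`; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current, e_cur = initial, energy(initial)
    best, e_best = current, e_cur
    t = t0
    while t > t_min:                          # outer loop: DO T > 0
        for _ in range(moves_per_t):          # inner loop: stand-in for equilibrium
            cand = neighbor(current, rng)
            delta = energy(cand) - e_cur
            # Downhill moves are always taken; uphill ones with probability exp(-ΔE/T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, e_cur = cand, e_cur + delta
                if e_cur < e_best:
                    best, e_best = current, e_cur
        t *= alpha                            # geometric annealing schedule
    return best


# Toy usage: minimize (x - 3)^2 over the integers with +/-1 moves.
print(simulated_annealing(20, lambda x: (x - 3) ** 2,
                          lambda x, rng: x + rng.choice((-1, 1))))
```

Tracking the best state separately from the current state implements the final RETURN of the lowest-energy solution. The current state itself may end the run at a slightly worse energy.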


The following example, the quadratic assignment problem (QAP), illustrates the basic principles of SA in combinatorial optimization. The QAP is defined as follows.

Given a set N = {1, ..., n} and two (n × n) matrices $F = (f_{ij})$ and $D = (d_{ij})$, find a permutation p of the set N that minimizes the following function:

$\sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} \, d_{p(i)p(j)}$.


In location theory, the QAP formulation is used to model the problem of allocating n facilities to n locations. The objective is to minimize a cost that depends not only on the distance between locations but also on the flow between facilities. F and D are the flow and distance matrices, respectively [11].

Let $T_i$ represent the temperature at stage i of the procedure, and let $T_1 > \dots > T_f$ represent the annealing schedule. The application of SA to the QAP ([2,12]) can then be described by the following steps:

• Start with a feasible solution (permutation). Exchange two randomly selected permutation elements (2-exchange) and evaluate the resulting change ΔE.

• While ΔE ≤ 0, repeat the above step. If ΔE > 0, draw a random variable x from the uniform distribution U(0, 1). Accept the pair exchange if $x < P(\Delta E) = e^{-\Delta E/T_i}$, and repeat the process.

• The system remains at stage i until a fixed number of pair exchanges (equilibrium) has taken place, and then moves to the next stage.

• The procedure stops when all temperatures in the annealing schedule have been used, i.e., when i > f.
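The steps above can be sketched in Python as follows. Both the schedule (a list of decreasing temperatures) and the number of exchanges per stage are caller-supplied assumptions. The sketch also keeps the best permutation visited, and the function names are hypothetical. For a tiny instance, the brute-force optimum over all permutations provides a reference value.

```python
import math
import random
from itertools import permutations
from typing import List, Sequence


def qap_cost(F, D, p) -> float:
    """sum_i sum_j f_ij * d_{p(i) p(j)}: facility i is placed at location p[i]."""
    n = len(p)
    return sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))


def qap_annealing(F, D, temps: Sequence[float], exchanges_per_stage: int,
                  seed: int = 0) -> List[int]:
    """SA for the QAP with random 2-exchanges; returns the best permutation visited."""
    rng = random.Random(seed)
    n = len(F)
    p = list(range(n))
    rng.shuffle(p)                                   # feasible starting permutation
    cost = qap_cost(F, D, p)
    best, best_cost = p[:], cost
    for t in temps:                                  # stages T_1 > ... > T_f
        for _ in range(exchanges_per_stage):         # fixed count plays 'equilibrium'
            i, j = rng.sample(range(n), 2)
            p[i], p[j] = p[j], p[i]                  # tentative 2-exchange
            delta = qap_cost(F, D, p) - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best, best_cost = p[:], cost
            else:
                p[i], p[j] = p[j], p[i]              # rejected: undo the exchange
    return best


F = [[0, 3, 0, 2], [3, 0, 0, 1], [0, 0, 0, 4], [2, 1, 4, 0]]
D = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
best = qap_annealing(F, D, [50 * 0.8 ** k for k in range(30)], 200)
print(qap_cost(F, D, best), min(qap_cost(F, D, q) for q in permutations(range(4))))
```

Recomputing the full cost after each exchange keeps the sketch short. A practical implementation would instead update ΔE incrementally in O(n) per exchange.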

Simulated annealing has been used to solve a wide 
variety of combinatorial optimization problems, such 
as graph partitioning and graph coloring ([5,6]), VLSI 
design [7], quadratic assignment problems [2], image 
processing [3] and many others. In addition, imple- 
mentations of SA in parallel environments have re- 
cently appeared, with applications in cell placement 
problems, traveling salesman problems, quadratic as- 
signment problems, and others [10]. General refer- 
ences on simulated annealing can be found in [1] and 
in [8]. 
Simulated Annealing Methods in Protein Folding


Proteins are essential molecules for the functioning of biological systems. They are linear polymers whose monomeric units are drawn from a set of 20 amino acids. The number of amino acids in a protein ranges from tens to thousands. In their biologically active state, proteins assume, out of the immense number of possible configurations, a unique state, the so-called native state. How the protein chain folds into this native state is the essence of the protein folding problem. Even leaving aside the actual kinetics of folding, predicting the three-dimensional structure by any means is of great interest and practical value. Such a prediction would start only from the amino-acid sequence and the interaction potential between the atoms of the amino acids.

Suppose one adopts the working hypothesis that the native state is the one that minimizes the protein's potential energy. Then, given the potential energy as a function of the positions of the atoms in the protein, the above prediction reduces to an optimization problem. H.A. Scheraga has advocated this view as the ab initio solution of the protein structure prediction problem [7].

How can one estimate the potential energy, which 
will serve as the objective function in the optimiza- 
tion procedure? Commonly used empirical potentials 
for proteins [5] usually have the form 


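$U(r^N) = \sum_{\text{bonded pairs}} \frac{k_b}{2}(b - b_0)^2 + \sum_{\text{bond angles}} \frac{k_\theta}{2}(\theta - \theta_0)^2 + \sum_{\text{dihedral angles}} k_\phi [1 + \cos(n\phi - \delta)] + \sum_{\text{nonbonded pairs } i,j} 4\epsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] + \sum_{\text{nonbonded pairs } i,j} \frac{q_i q_j}{\epsilon r_{ij}}$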


The potential energy $U(r^N)$ of a protein with N atoms in a configuration $r^N$ of the 3N-dimensional configuration space is modeled by several terms:

- harmonic bond terms with force constant $k_b$ and bond length $b_0$;
- bond angle terms with force constant $k_\theta$ and equilibrium angle $\theta_0$;
- dihedral angle terms with force constant $k_\phi$, multiplicity n and phase $\delta$;
- van der Waals terms of the Lennard-Jones type with parameters $\sigma_{ij}$ and $\epsilon_{ij}$;
- Coulombic terms, with $q_i$ the charges and $\epsilon$ the dielectric constant.

The exact values of the parameters depend on the identity of the participating atoms.
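The two nonbonded sums of $U(r^N)$ can be evaluated directly, as the sketch below shows. It assumes that pair parameters are supplied as matrices and that bonded pairs to be skipped are listed explicitly. It also applies no unit-conversion constant to the Coulomb term. All names are hypothetical. Force-field packages handle exclusions, cutoffs and units in their own specific ways.

```python
import math
from typing import FrozenSet, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def nonbonded_energy(coords: Sequence[Vec3], sigma, eps_lj, charges,
                     dielectric: float = 1.0,
                     excluded: FrozenSet[Tuple[int, int]] = frozenset()) -> float:
    """Lennard-Jones plus Coulomb terms of U(r^N), summed over pairs i < j not in `excluded`."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in excluded:
                continue
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma[i][j] / r) ** 6
            total += 4.0 * eps_lj[i][j] * (sr6 * sr6 - sr6)        # van der Waals
            total += charges[i] * charges[j] / (dielectric * r)    # Coulomb
    return total


ones = [[1.0, 1.0], [1.0, 1.0]]
print(nonbonded_energy([(0, 0, 0), (1, 0, 0)], ones, ones, [0, 0]))            # 0.0 at r = sigma
print(nonbonded_energy([(0, 0, 0), (2 ** (1 / 6), 0, 0)], ones, ones, [0, 0]))  # about -1 = -epsilon
```

The two calls check familiar landmarks of the Lennard-Jones term: it vanishes at $r = \sigma$ and reaches its minimum $-\epsilon$ at $r = 2^{1/6}\sigma$.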

Several related physical problems have been shown to be NP-hard, i.e., no known algorithm finds their optimal solution in a simulation time that scales polynomially with the size of the system. Among them are the ground state determination of atomic clusters [24] and of various spin glass models [4]. Demonstrations that the protein folding problem is NP-hard have also been given [13].

Since the problem is NP-hard, heuristic algorithms 
must be sought for its solution. From the variety of ap- 
proaches developed (for a review see [20]), we focus 
on the method of simulated annealing [9] as applied to 
proteins. It has an obvious and physically interpretable 
application. Once a cooling schedule is chosen, repre- 
sentative configurations of the allowed micro-states are 
generated by methods either of the molecular dynam- 
ics (MD) or Monte-Carlo (MC) type [6]. For proteins, 
simulated annealing is traditionally built on an MD ap- 
proach where the dynamics of the system is simulated 
by integrating the Newtonian equations of motion and 
the temperature is controlled through some form of 
coupling to a heat-bath. If the MC approach is used, 
after having drawn a new configuration, it is accepted 
or rejected according to an updating probability of, for 
example, the Metropolis type [12] 


$p = \min[1, \exp(-\beta \Delta V)]$, (1)

where $\beta = 1/kT$ and ΔV is the change in potential energy. This acceptance probability has the desirable features that
i) it obeys detailed balance; and
ii) it reduces to a steepest descent minimizer at low temperature (where only moves which decrease the potential energy are accepted).
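Both properties of (1) can be checked numerically with a few lines of Python. For symmetric trial moves, the ratio of forward to backward acceptance probabilities equals the Boltzmann ratio $\exp(-\beta(V_b - V_a))$, which is detailed balance. At very large β, uphill moves are effectively never accepted. The function name is hypothetical.

```python
import math


def metropolis_accept_prob(delta_v: float, beta: float) -> float:
    """Metropolis acceptance probability p = min[1, exp(-beta * ΔV)] of eq. (1)."""
    return 1.0 if delta_v <= 0 else math.exp(-beta * delta_v)


V_a, V_b, beta = 1.0, 2.5, 2.0
forward = metropolis_accept_prob(V_b - V_a, beta)    # uphill move a -> b
backward = metropolis_accept_prob(V_a - V_b, beta)   # downhill move b -> a
print(forward / backward, math.exp(-beta * (V_b - V_a)))   # equal: detailed balance
print(metropolis_accept_prob(0.1, 1e6))                    # 0.0: steepest-descent limit
```

Returning 1 for downhill moves, before exponentiating, also avoids overflow in `math.exp` when ΔV is large and negative.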
In addition to the standard Metropolis Monte-Carlo 
protocol, several other smarter MC algorithms have 
been designed, based on atomic moves biased by the 
forces acting upon the atoms in the molecule [16,18]. 
Substantial improvements to the method of simulated annealing can be obtained by propagating not just one phase point through configuration space, but a whole distribution of points. In the method called Gaussian density annealing [11], the probability density is approximated by a single Gaussian distribution, which is a product of the individual distributions for each atom. This Gaussian distribution is propagated as the temperature decreases according to the Bloch equation. The dependence of the first two moments of the distribution (i.e., measures of its center position and of its width) is followed as a function of the inverse temperature.

With r the d-dimensional vector of Gaussian random variables, the multivariate probability density function in the spherically symmetric case has the form

$\rho(r, \beta) = (2\pi\sigma^2)^{-d/2} \exp\left[-\frac{(r - r_0)^2}{2\sigma^2}\right]$,

where the center of the packet is at $r_0$ and the second moment is $M_2 = d\sigma^2$. For many degrees of freedom, the many-body density distribution can be approximated as a product of single-body density distributions, each with its own center and variance.

The differential equations describing the evolution explicitly in the inverse temperature, obtained from the reduced Bloch equation, are found to be

$\frac{dr_0}{d\beta} = -\frac{1}{d} M_2 \nabla_{r_0} \langle U \rangle$, (2)

$\frac{dM_2}{d\beta} = -\frac{1}{d^2} M_2^2 \nabla^2_{r_0} \langle U \rangle$, (3)

where ⟨U⟩ is the potential energy averaged by weighting with the Gaussian probability density. Note that the center $r_0$ moves in a steepest descent on the effective potential ⟨U⟩, while the width of the distribution adjusts to the curvature of the effective potential.
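Equations (2) and (3) can be integrated in one dimension, where $M_2 = \sigma^2 = s$. The sketch assumes the tilted double well $U(x) = x^4 - 2x^2 + 0.5x$, whose left well is the deeper one. For this polynomial, the Gaussian average is known in closed form: $\langle U \rangle = r_0^4 + 6 r_0^2 s + 3 s^2 - 2(r_0^2 + s) + 0.5 r_0$. A plain Euler step in β, with hypothetical parameter values, stands in for a proper ODE integrator.

```python
def gda_1d(r0: float, s: float, beta0: float = 0.0, beta1: float = 100.0,
           h: float = 1e-3):
    """Euler integration of eqs. (2)-(3) for d = 1 and U(x) = x**4 - 2*x**2 + 0.5*x."""
    beta = beta0
    while beta < beta1:
        grad = 4 * r0 ** 3 + 12 * r0 * s - 4 * r0 + 0.5    # d<U>/dr0
        curv = 12 * r0 ** 2 + 12 * s - 4                   # d^2<U>/dr0^2
        # eq. (2): center follows steepest descent on <U>, step size M_2 = s
        # eq. (3): width shrinks where <U> is convex, grows where it is concave
        r0, s = r0 - h * s * grad, s - h * s * s * curv
        beta += h
    return r0, s


# Start as a broad packet centered at x = 1, near the shallower right-hand well.
print(gda_1d(1.0, 1.0))
```

A broad initial packet sees a smoothed, single-well ⟨U⟩, which is how such methods can bypass shallow minima. Where the centre finally settles depends on the initial width and the start point; varying them is a simple way to explore this.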

A variant of the Gaussian density annealing has 
been applied by C. Tsoo and C.L. Brooks [23], for the 
study of optimization of water clusters. The popular 
diffusion equation method of Scheraga and coworkers 
[10] exists as a special case of the Gaussian density an- 
nealing method when all Gaussian packets are charac- 
terized by the same variance. 

Of particular interest and promise are two related 
methods. The first is the elegant packet annealing 
method of D. Shalloway and coworkers [15] of which 
the Gaussian density annealing method can be shown 
to be a special case. The second is the locally enhanced 
sampling (LES) method of R. Elber and coworkers [17]. 
The LES method has been applied to complex systems, such as solvated peptides, with excellent results. It


has the additional advantage that it is relatively simple 
to implement. These and other related methods have 
recently been reviewed [20]. 

The potential smoothing algorithms based on 
a Gaussian smoothing transform of the potential energy 
surface are quite effective for a large number of systems. 
For a complicated biomolecular potential energy sur- 
face it is possible to carry out the smoothing transfor- 
mation approximately by fitting the nonbonded poten- 
tial functions to Gaussians or exponentials. However, 
there is computational overhead associated with com- 
puting these transformed functions. It has recently been 
shown that it is possible to derive all the benefits of the 
Gaussian smoothing transform while carrying out no 
explicit transform of the potential energy function. The 
method substitutes a ‘top hat’ (impulse) function for the 
Gaussian in the smoothing transform. In one dimen- 
sion the result is that the force on the smoothed po- 
tential is simply the difference in the potential energy 
evaluated at each side of the top hat divided by twice 
the top hat’s width - that is, a finite difference formula 
for the force. Since the width of the smoothing func- 
tion is not always small, this exact force derived for the 
smoothed potential can be thought of as a ‘bad deriva- 
tive’. Most significantly, it is possible to use this method 
to smooth the Boltzmann probability distribution func- 
tion. This approach is similar in spirit to the packet an- 
nealing algorithm of Shalloway and coworkers, which 
involves a Gaussian smoothing of the Boltzmann dis- 
tribution [15]. However, the method of bad derivatives 
requires no explicit integral transform. The method was 
applied to isolate low lying energy minima for a small 
peptide with excellent results demonstrating the supe- 
riority of the method to standard Gaussian smoothing 
algorithms [3]. 
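The "bad derivative" follows directly from the fundamental theorem of calculus. Take λ as the half-width of the top hat, so the smoothed potential is $\bar U(x) = \frac{1}{2\lambda}\int_{x-\lambda}^{x+\lambda} U(y)\,dy$. Then $-\bar U'(x) = -[U(x+\lambda) - U(x-\lambda)]/(2\lambda)$, and no integral is ever evaluated. The function name below is hypothetical.

```python
from typing import Callable


def top_hat_force(U: Callable[[float], float], x: float, lam: float) -> float:
    """Exact force on the top-hat-smoothed potential: a central difference of width 2*lam."""
    return -(U(x + lam) - U(x - lam)) / (2.0 * lam)


print(top_hat_force(lambda y: y ** 2, 1.5, 0.7))               # -3.0: exact for quadratics
print(top_hat_force(lambda y: y ** 4 - 2 * y ** 2, 1.0, 1.0))  # -4.0
```

For a quadratic potential, the result equals the true force $-2x$ for any width. For the double well $y^4 - 2y^2$, x = 1 is a minimum where the true force vanishes, yet the wide top hat still pushes the point toward the origin. This shows how strong smoothing reshapes the landscape.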

Other recent improvements of the simulated annealing method include the use of Tsallis probability distributions [19]. In the Tsallis formalism, a generalized statistics is built from the generalized entropy

$S_q = k \, \frac{1 - \sum_i p_i^q}{q - 1}$, (4)

where q is a real number and $S_q$ tends to the information entropy

$S = -k \sum_i p_i \ln p_i$ (5)

when q → 1. Maximizing the Tsallis entropy subject to the constraints

$\sum_i p_i = 1$ and $\sum_i p_i^q \epsilon_i = \text{const}$,

where $\epsilon_i$ is the energy spectrum, yields the generalized probability distribution

$p_i = \frac{[1 - (1 - q)\beta \epsilon_i]^{1/(1-q)}}{Z_q}$, (6)

where $Z_q$ is the generalized partition function. This distribution tends to the Gibbs-Boltzmann distribution as q tends to 1. For q > 1, however, the probability distributions have power-law tails and are thus broader than the Gibbs-Boltzmann distributions. This delocalization of the distribution is the essential feature that allows wider exploration of the configuration space and faster cooling schedules.

A typical implementation of a generalized simulated annealing algorithm was proposed by I. Andricioaei and J. Straub [2] (for a variety of MD- and MC-based sampling algorithms of the Tsallis class, see [21]). It has the following structure:

1) generate trial moves by the method of choice;
2) accept trial moves with probability

$p = \min\left\{1, \left[\frac{1 - (1 - q(T))\beta E_{\text{new}}}{1 - (1 - q(T))\beta E_{\text{old}}}\right]^{q(T)/(1 - q(T))}\right\}$ (7)

that obeys detailed balance;

3) reduce the temperature and reduce q such that $\lim_{T \to 0} q(T) = 1$, and go to 1).

At constant temperature, the sampling converges towards the Tsallis equilibrium probability distribution in (6). As the temperature decreases, the parameter q is lowered monotonically along with it. The steepest descent behavior is imposed by starting with a convenient value of q at the initial temperature and by having q tend towards 1 as the temperature decreases during the annealing schedule. Since q → 1 at low T, the desirable reduction to a steepest descent at low temperature is preserved.
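The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [2]: the energy function is one-dimensional and nonnegative (so that both brackets in Eq. (7) stay positive for q > 1), trial moves are uniform, the temperature is lowered geometrically, and q(T) is assumed to fall linearly to 1 with T. All names, the test potential and the schedule parameters are hypothetical.

```python
import math
import random


def gsa_acceptance(e_old, e_new, beta, q):
    """Acceptance probability of Eq. (7); Metropolis rule when q == 1."""
    if e_new <= e_old:
        return 1.0  # downhill moves are always accepted (also avoids float overflow)
    if q == 1.0:
        return math.exp(-beta * (e_new - e_old))
    num = 1.0 - (1.0 - q) * beta * e_new
    den = 1.0 - (1.0 - q) * beta * e_old
    # For q > 1 and nonnegative energies, num > den > 0, so the negative
    # exponent q / (1 - q) yields a probability in (0, 1).
    return (num / den) ** (q / (1.0 - q))


def generalized_annealing(energy, x0, t0=2.0, t_min=0.01, cooling=0.95,
                          q0=2.0, steps_per_t=200, step=0.5, seed=0):
    """Generalized simulated annealing on a scalar variable x."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    while t > t_min:
        # Hypothetical schedule: q falls linearly with T, so q(T) -> 1 as T -> 0.
        q = 1.0 + (q0 - 1.0) * t / t0
        for _ in range(steps_per_t):
            x_new = x + rng.uniform(-step, step)  # step 1: trial move
            e_new = energy(x_new)
            if rng.random() < gsa_acceptance(e, e_new, 1.0 / t, q):  # step 2
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # step 3: lower T, and with it q
    return best_x, best_e
```

With the tilted double well E(x) = (x² − 1)² + 0.3(x + 1)², started in the shallower well at x = 1, the best point returned should lie close to the global minimum at x = −1.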

Interestingly enough, it was shown [1] that when the maximum entropy formalism is applied to the entropy postulated by Tsallis (4), one is able to recover the more general Lévy probability distributions (corresponding to a fractal random walk, the dimension of which is determined by q), which a variational entropic formalism based on the Boltzmann entropy is unable to do. At initial values of q(T) > 1, a Markov chain generated at the initial temperatures would converge towards a Lévy distribution. For example, in the particular case of q = 2 and a harmonic potential, the Lévy distribution is a Cauchy-Lorentz distribution, which is the same distribution used for trial moves in the fast simulated annealing method of H. Szu and R. Hartley [22].

It is important to study how the simulated annealing time depends on the features of the potential energy surface of the proteins. One can derive a simple scaling relation for the optimal cooling schedule in a simulated annealing optimization protocol. The relevant energy scales of U(r) are ΔU, the difference in energy between the ground and first excited state minima, and U*, the highest barrier on the potential surface accessed from the global energy minimum. The final temperature reached in a simulated annealing run must be small enough that, at equilibrium, the mole fraction in the global energy minimum basin is significant.

The time that the trajectory must spend at the lowest temperature to ensure that the equilibrium distribution is sampled is at least τ_min, the time required to surmount the largest barrier separating the global energy minimum from other thermodynamically important states, which can be shown to go as

$$\tau_{\min} \sim \left(\frac{U^*}{\Delta U}\right)^{q/(q-1)}.$$

In the limit q → 1 of Gibbs-Boltzmann statistics, one finds that

$$\tau_{\min} \sim \exp\left(\frac{U^*}{\Delta U}\right).$$


The time for classical simulated annealing increases exponentially as a function of the ratio of the energy scales U*/ΔU. However, for q > 1 the situation is qualitatively different. As a result of the weak temperature dependence of the barrier crossing times, the time for simulated annealing increases only weakly, as a power law [21]. Given the large separation in energy scales on the potential energy surface of proteins, and the large value of U*/ΔU, generalized simulated annealing is expected to be well suited for global optimization problems such as those encountered in protein folding.
The algorithm based on the generalized Tsallis
probability distribution has been recently adopted and 
employed for proteins by U.H.E. Hansmann and Y. 
Okamoto [8]; see [14] for a review of other generalized 
ensembles as applied to the protein folding problem. 
Introduction


In applied work, there is often no a priori unique func- 
tional form and parametrization for models of empiri- 
cal processes, nor are there generally accepted ones. For 
many empirical processes the functional form and the 
parameters of the system are not known and must be 
estimated from the data and the optimal control de- 
rived [10]. 


The model of a phenomenon must consider suit- 
able functional forms for the merit function and for 
the constraints, and determine suitable parameters for 
these and suitable inputs of the control variables to ob- 
tain a specified set of output variables which will render 
the merit function a minimum (maximum), possibly 
global. Since the estimation problem is usually posed as 
an unconstrained optimization problem and determin- 
ing the optimal control is an optimization problem, two 
optimization problems must be solved. 

Except for simple model categories, interactions oc- 
cur between the estimation space, where the values of 
the variables are determined, and the control space, 
where the optimal control is derived [7], so there is 
a variational aspect with regard to the optimal con- 
trol to be analyzed. To avoid severe suboptimization all 
the unknowns must be determined at the same time, 
by solving a single more general optimization prob- 
lem. 

To achieve an efficient control of an empirical process, a mathematical programming problem must be solved simultaneously in the estimation and control spaces, to determine a sufficiently accurate estimate of the functional form and the parameters and the least-cost solution of the optimal control problem.

Statistical conditions have been studied [19], and 
empirical aspects of the approach have been consid- 
ered [2,3]. 


Definitions 


To control phenomena of any type [10], including problems of pathological conditions [3], one considers a set of decisions about the best actions for their control, so as to optimize the merit function. Three approaches can be indicated by which the problem could be specified and solved:

1. Ad hoc methods may be used, such as calibration, 
simulation, experience or intuition, which are lim- 
ited and may lead to wrong formulations of the 
problem. 

2. The classic three-stage approach: determine the 
model to implement, estimate the parameters of the 
form adopted and then solve the quantified opti- 
mization problem. 

3. Solve the estimation and the optimization problem 
simultaneously [19]. 
Whatever the approach, it must be ensured that the esti-
mation of the functional forms and the parameters sat- 
isfies the following statistical properties to ensure that 
the estimates are the best, correct and consistent ones 
that can be formulated. Statistical estimation methods 
are important, because, when implemented correctly, 
with regard to an accurately specified functional form, 
they will provide estimates of parameters that have the 
following properties [1,15]: 
1. The parameter estimates are unbiased:
• The expected values of the estimated parameters equal their true values.

2. The parameter estimates are consistent, which means that they satisfy the following conditions:
• The estimated parameters are asymptotically unbiased.
• The variance of the parameter estimates tends to zero as the size of the data set tends to infinity.

3. The parameter estimates are asymptotically efficient:
• The estimated parameters are consistent.
• The estimated parameters have smaller asymptotic variance than any other consistent estimator.

4. The residuals have minimum variance, which is ensured by:
• The variance of the residuals must be a minimum.
• The residuals must be homoscedastic.
• The residuals must not be serially correlated.

5. The residuals are unbiased (have zero mean).

6. The residuals have a noninformative distribution (usually, a Gaussian distribution).
• If the distribution of the residuals is informative, the extra information could be exploited, reducing the variance of the residuals, their bias, etc., with the result that better estimates are obtained.

Through a correct implementation of statistical estima- 
tion techniques the estimates are as close as possible to 
their true values, all the information that is available is 
applied and the uncertainty surrounding the estimates 
and the data fit is reduced to the maximum extent possi- 
ble. Thus, the estimates of the parameters, which satisfy 
all these conditions, are the “best” possible in a “techni- 
cal sense” [1]. 
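Properties 4 and 5 can be checked numerically on a residual series. The following is a minimal Python sketch with a hypothetical function name; it only reports the sample mean, variance and lag-1 autocorrelation, while homoscedasticity and normality require separate tests.

```python
def residual_diagnostics(residuals):
    """Sample mean, variance and lag-1 autocorrelation of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n  # property 5: should be close to zero
    centered = [r - mean for r in residuals]
    variance = sum(c * c for c in centered) / n  # property 4: should be small
    # Property 4: the lag-1 serial correlation should not differ significantly from zero.
    lag1 = sum(centered[t] * centered[t - 1] for t in range(1, n)) / (n * variance)
    return {"mean": mean, "variance": variance, "lag1_autocorr": lag1}
```

An alternating series such as (1, −1, 1, −1) has zero mean but a strongly negative lag-1 autocorrelation, so it would violate the serial-correlation requirement.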


Definition and Properties 
of the Traditional Estimation Approach 


The estimation and optimization of an empirical model 
by the traditional three-stage statistical method [20] 
considers: 

• A functional form is posited and an error structure for the residuals is assumed.

• Under these hypotheses a data set is used to determine the values of the parameters by solving an unconstrained optimization problem, and the estimates determined are then checked to verify that they satisfy the properties indicated above [13,1]. If this is so, it is possible to proceed to the next stage; otherwise a new functional form must be specified and a new unconstrained optimization problem must be solved.

• An optimization problem is solved in the space of the control variables, to determine the optimal policy.

With regard to empirical processes, which require 
the estimation of nonlinear models, there may be many 
alternative models to represent the behavior of a phe- 
nomenon [4]. Therefore the efficacy of the control pol- 
icy to be adopted will depend on which of the alter- 
native models is chosen and how the parameter values 
and the functional form are selected [12]. This leads to 
a number of problems in applications: 

• For nonlinear and dynamic estimation problems, the likelihood function to be maximized is usually not concave, so there are many local maxima, each leading to a different set of estimation coefficients. Determining the global maximum of the function may not be helpful.

• Certain statistical properties must be satisfied to ensure a statistically correct estimate. Such conditions may hold at some local maxima, but not at others. There is no assurance that the global maximum, even if unique, will satisfy these conditions; thus, all local maxima must be verified.

• Since there may be many alternative models, the model chosen may not have a different optimal control policy, or the optimization may not yield the best policy for the possible parametrization, while other candidate models could satisfy both criteria.

For estimation problems linear in the parameters, the whole process is simplified. In general, however, the stages of such a procedure are interrelated, so enumerative solution techniques over all stationary points must be adopted, which imposes serious limitations on this approach.


The Simultaneous Estimation 
and Optimization Approach 


It would appear more plausible, in order to avoid suboptimization and to balance any imprecision, to solve both problems simultaneously, imposing the statistical conditions that must be satisfied as constraints [19].

Let the data set of a phenomenon consist of sufficient measurements (y_n, x_n, u_n) over n = 1, 2, ..., N periods, where it is assumed that y_n ∈ R^p is a p-dimensional vector to be explained, x_n ∈ R^r is an r-dimensional vector of explanatory or state variables of the dynamic process and u_n is a q-dimensional vector of control variables; a horizon T is indicated, over which the merit function must be optimized and the control variables must be determined. Thus, the whole period considered is composed of a historical period {1, 2, ..., N} and a future period for which policy must be determined, given by {N+1, N+2, ..., T}. Further, let w_i ∈ R^r and v_i ∈ R^p be random variables to be determined with mean zero and finite variance.

It is desired to determine functional forms φ: R^{r+q} → R^r and ψ: R^{r+q} → R^p and a set of suitable coefficients Θ ∈ R^m such that

$$\min J = \sum_{i=N+1}^{T} c(x_i, u_i, y_i), \qquad (1)$$

$$x_{i+1} = \varphi(x_i, u_i, v_i, w_i; \theta_1), \quad i = 1, 2, \ldots, T, \qquad (2)$$

$$y_{i+1} = \psi(x_i, u_i, v_i; \theta_2), \quad i = 1, 2, \ldots, T. \qquad (3)$$


Equation (1) is the objective function for the model and (2) and (3) are the state space formulation of the problem, while a similar representation may be adopted for the input-output formulation [14]. To ensure that
all the statistical properties that the given estimates of 
the residuals must fulfill are satisfied at every itera- 
tion, instead of solving an unconstrained maximum- 
likelihood or least-squares problem [13], the required 
statistical properties of the estimates are set up as con- 
straints, together with the specification of the model of 
the phenomenon and this global optimization problem 
is solved for all the undetermined variables. 


Formulation 


A phenomenon can be represented by a model which will capture all the prescribed input-output relations at a preset level of precision. Suppose that it is desired to determine an optimal control for the system (1)-(3) over a given period i = N+1, ..., T − 1, based on a historical period i = 1, 2, ..., N for which a suitable data set is given.

To achieve this, a set of constraints must be added to 
enforce that the parameter values that will be estimated 
have the required properties. 

The unknowns to be determined are the input and output variables considered and the parameters of the functional form specified in the current iteration, indicated as Θ = {θ1, θ2} ⊂ R^m. Note that m may be much larger than 2r + q + p + 1, the number of variables present in each system, since the system is nonlinear.

The mathematical program is formulated with respect to the residual variables, but it is immediate that for a given functional form the unknown parameters will be specified, and thus the unknowns of the problem will also be defined and available. Hence, the mathematical program is fully specified for each functional form to be considered, as the residual terms can be stated as follows:

$$w_i = \hat{x}_{i+1} - \varphi_h(\hat{x}_i, \hat{u}_i; \theta_1), \quad i = 1, 2, \ldots, N, \quad h \in H, \qquad (4)$$

$$v_i = \hat{y}_{i+1} - \psi_k(\hat{x}_i, \hat{u}_i; \theta_2), \quad i = 1, 2, \ldots, N, \quad k \in K, \qquad (5)$$

where the hat ^, as usual, indicates the historical values of a variable, and thus suitable values of θ1, θ2 must be determined by the mathematical program such that all the constraints expressed in terms of w_i, v_i are specified; H and K are suitable function spaces.
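For a fixed candidate functional form, Eq. (4) is a direct computation on the historical data. A minimal Python sketch for scalar variables follows; the linear candidate form and both function names are hypothetical illustrations, not forms used by the author.

```python
def linear_phi(x, u, theta):
    """Hypothetical candidate form: x_{i+1} = a * x_i + b * u_i."""
    a, b = theta
    return a * x + b * u


def state_residuals(x_hat, u_hat, phi, theta):
    """Residuals of Eq. (4): w_i = x_hat[i+1] - phi(x_hat[i], u_hat[i]; theta)."""
    # 0-based indices here; one residual per consecutive pair of observations.
    return [x_hat[i + 1] - phi(x_hat[i], u_hat[i], theta)
            for i in range(len(x_hat) - 1)]
```

In the simultaneous approach, θ is not fixed in advance as it is here: these residuals enter the constraints of the mathematical program, which determines θ1 and θ2.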

The mathematical programming problem, in the 
notation defined above in “Definitions,” to be solved for 
each functional form in the given sets, with a suitable 
optimization routine, is the following: 


$$\min J = \sum_{i=N+1}^{T} c(x_i, u_i, y_i), \qquad (6)$$

$$x_{i+1} = \varphi(x_i, u_i, v_i, w_i; \theta_1), \qquad (7)$$

$$y_{i+1} = \psi(x_i, u_i, v_i; \theta_2), \qquad (8)$$
The abstract model of the dynamical system, Eqs. (1)-(3), given by the system of Eqs. (7) and (8), is to be optimized with regard to a given merit function, Eq. (6), subject to the constraints that the means of the residuals be zero, Eqs. (9) and (10), and that the sums of squares of the residuals be less than critical values k_w, k_v, Eqs. (11) and (12). These critical values can be decreased by dichotomous search at every iteration, until the problem no longer yields a feasible solution.

The least values obtained for these parameters, while retaining a feasible solution to the whole problem, are equivalent to a minimization of the statistical estimation error and to a maximization of the likelihood, under appropriate distributional assumptions concerning the residuals.
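The dichotomous search on a critical value such as k_w can be sketched as a bisection. In this minimal Python sketch, `is_feasible(k)` is a hypothetical oracle standing for "solve the full program with the sum-of-squares bound set to k and report feasibility". Feasibility is assumed monotone in k, which is reasonable because raising k only relaxes the constraint.

```python
def least_feasible_bound(is_feasible, lo, hi, tol=1e-6):
    """Bisection for the smallest k in [lo, hi] with is_feasible(k) True.

    Assumes monotonicity: if k is feasible, every larger k is feasible too.
    """
    if not is_feasible(hi):
        raise ValueError("the upper bound hi is not feasible")
    if is_feasible(lo):
        return lo
    # Invariant: lo is infeasible and hi is feasible.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Each call to the oracle is a full solve of the constrained program, so in practice the tolerance would be chosen to keep the number of bisection steps small.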

Further, all the serial correlations between the residuals must not be significantly different from zero, as enforced by constraints Eqs. (13)-(23).

To ensure that these conditions hold throughout the possible variation of the independent variables, the residuals must be homoscedastic and thus satisfy Eqs. (23) and (24).

The homoscedasticity condition on the residuals is obtained by regressing the normalized squares of the residuals, indicated respectively by g_w, g_v, on the original variables of the problem, indicated by the data matrix W. This leads to a set of nonlinear equations in the squared residuals to be determined. The χ² test is applied at a confidence level of (1 − α) with m − 1 degrees of freedom, that is, at a significance level of α [6].
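As a stand-in illustration, the classic Breusch-Pagan test carries out this kind of auxiliary regression for a single explanatory variable z. The sketch below is an assumption-labeled example, not the article's constrained formulation: it regresses the squared residuals, normalized by their mean, on z and returns half the explained sum of squares. With one regressor, the statistic would be compared with the χ² critical value for 1 degree of freedom (about 3.841 at α = 0.05).

```python
def breusch_pagan_statistic(residuals, z):
    """Breusch-Pagan LM statistic for a single regressor z (1 degree of freedom)."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    g = [e * e / sigma2 for e in residuals]  # normalized squared residuals
    z_bar = sum(z) / n
    g_bar = sum(g) / n  # equals 1 up to rounding
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szg = sum((zi - z_bar) * (gi - g_bar) for zi, gi in zip(z, g))
    slope = szg / szz  # least-squares slope of g on z
    explained_ss = slope * slope * szz
    return 0.5 * explained_ss
```

Residuals of constant magnitude give a statistic of exactly zero, while residuals whose size grows with z give a positive value.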

Constraints Eqs. (25)-(28) are sample moments of the probability distribution function of the residuals, which are made to assume given values in terms of the variance σ² and its higher powers. These constraints force the residuals to have a noninformative distribution, here a Gaussian.
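For zero-mean Gaussian residuals, the first four moments are 0, σ², 0 and 3σ⁴. A minimal Python sketch (hypothetical function name) computes how far the sample moments are from these targets, using the second sample moment as the estimate of σ². The exact form of Eqs. (25)-(28) is not reproduced here.

```python
import random


def gaussian_moment_gaps(residuals):
    """Gaps between sample moments and the Gaussian targets 0, 0 and 3*sigma^4."""
    n = len(residuals)
    m1, m2, m3, m4 = (sum(e ** k for e in residuals) / n for k in (1, 2, 3, 4))
    # m2 plays the role of sigma^2; for a Gaussian, m4 should be close to 3 * m2**2.
    return {"mean": m1, "third": m3, "fourth": m4 - 3.0 * m2 ** 2}


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    print(gaussian_moment_gaps(sample))  # all three gaps should be small
```

A two-point distribution such as (−1, 1, −1, 1) matches the first three targets but has a fourth-moment gap of −2, so it would be rejected as non-Gaussian.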
This constrained minimization problem Eqs. (6)-
(29) will dominate the solutions obtainable by the tradi- 
tional three-phase procedure, since whenever the latter 
has a solution, the new procedure will also have a solu- 
tion. 


Theorem 1 Let the given optimal control problem as 
described in Eqs. (1)-(3) have a unique solution and let 
all the statistical conditions indicated above hold, so that 
the solution of the maximum-likelihood estimate of the 
unconstrained problem is well defined. Then, the solu- 
tion of optimal control problem Eqs. (1)-(3) will be equal 
to the solution of the constrained optimization problem 
Eqs. (6)-(29). 


Methods and Applications 


Nonlinear optimization routines that use line search methods may run into difficulties, since some of the constraints may be highly nonlinear; trust region methods may therefore be preferable.

A specialized technique based on complementarity 
theory is used to actually solve this problem. 

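Consider the following optimization problem, which represents in summary form the problem Eqs. (6)-(29):

$$\min Z = f(w), \qquad f: \mathbb{R}^n \to \mathbb{R}, \qquad (30)$$

$$g(w) \ge 0, \qquad g: \mathbb{R}^n \to \mathbb{R}^p, \qquad (31)$$

$$h(w) = 0, \qquad h: \mathbb{R}^n \to \mathbb{R}^q. \qquad (32)$$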


The proposed algorithm consists in defining 
a quadratic approximation to the objective function, 
a linear approximation to the constraints and deter- 
mining a critical point of the approximation by solving 
a linear complementarity problem, as given in [18]. 

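Expanding the functions in a Taylor series at the given iteration point w*, one may eliminate the equality constraints simply by converting them into q + 1 inequality constraints. Thus,

$$h(w^*) + \nabla h(w^*)\,(w - w^*) \ge 0, \qquad (33)$$

$$-e^{\mathsf{T}}\bigl(h(w^*) + \nabla h(w^*)\,(w - w^*)\bigr) \ge 0, \qquad (34)$$

where e is the vector of ones: the q linearized components are nonnegative and their sum is nonpositive, so all of them must vanish.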
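The conversion in Eqs. (33)-(34) can be checked with a small Python sketch. The function names are hypothetical, and the example constraint h(w) = w₁² + w₂ − 1, linearized at w* = (1, 0) where h(w*) = 0 and ∇h(w*) = (2, 1), is an illustration only.

```python
def linearized_h(h_star, jac_star, w, w_star):
    """First-order model h(w*) + grad h(w*) (w - w*) used in Eqs. (33)-(34)."""
    return [h_j + sum(g_jk * (w_k - ws_k)
                      for g_jk, w_k, ws_k in zip(row, w, w_star))
            for h_j, row in zip(h_star, jac_star)]


def as_inequalities(h_lin):
    """The q + 1 quantities that must all be >= 0: each h_j and -(sum of h_j)."""
    return list(h_lin) + [-sum(h_lin)]


def satisfies_equalities(h_lin, tol=1e-12):
    """True exactly when every linearized equality holds (up to tol)."""
    return all(v >= -tol for v in as_inequalities(h_lin))
```

Any vector with a nonzero component fails at least one of the q + 1 inequalities: either that component is negative, or the sum is positive and the last inequality is violated.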


Unconstrained variables must be transformed into nonnegative variables for the linear complementarity problem (LCP) algorithm. So let

$$\zeta = \inf\{\, w_i \mid w \in \{ w : g(w) \ge 0,\ h(w) = 0 \} \,\}, \qquad (35)$$

where ζ is a suitable lower bound to the unrestricted variables, which will be expressed as

$$x_i = w_i - \zeta \ge 0. \qquad (36)$$

Should there be no lower bound specifiable for a variable, then, as is well known, the variable can be represented as the difference of two nonnegative variables.

A set of trust region constraints can be imposed on 
the problem as a system of linear inequalities centered 
around the iteration point, to limit the change in the 
possible solution. Note that such a set of inequalities 
has properties quite different from those of the usual 
trust region constraint imposed in nonlinear optimiza- 
tion [8]: 

$$Dx + d \ge 0, \qquad (37)$$

where D ∈ R^{n×n} is a suitable matrix, which may be changed at every iteration, and d ∈ R^n is a suitable vector. These can be included in the inequalities.

The resulting quadratic problem is easily trans- 
formed into a linear complementarity problem, which 
can be solved by linear programming routines [18]. 
A new solution point will always exist whenever the 
trust region is included in the problem, and will lie ei- 
ther inside the trust region or on a trust region con- 
straint. 
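To make the quadratic-to-LCP step concrete, consider the simplest case (an assumption for illustration): the quadratic program min ½zᵀMz + qᵀz subject to z ≥ 0, whose optimality conditions are exactly the LCP "find z ≥ 0 with w = Mz + q ≥ 0 and zᵀw = 0". The article solves its LCP with linear programming routines [18]. The Python sketch below instead uses projected Gauss-Seidel, a different and simpler method that is guaranteed to converge when M is symmetric positive definite.

```python
def solve_lcp_pgs(M, q, iters=500, tol=1e-12):
    """Projected Gauss-Seidel for LCP(M, q): z >= 0, w = M z + q >= 0, z.w = 0.

    Convergence is guaranteed for symmetric positive definite M.
    """
    n = len(q)
    z = [0.0] * n
    for _ in range(iters):
        max_change = 0.0
        for i in range(n):
            r = q[i] + sum(M[i][j] * z[j] for j in range(n))  # current w_i
            new = max(0.0, z[i] - r / M[i][i])  # Gauss-Seidel step, projected onto z_i >= 0
            max_change = max(max_change, abs(new - z[i]))
            z[i] = new
        if max_change < tol:
            break
    w = [q[i] + sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
    return z, w
```

For M = [[2, 1], [1, 2]] and q = (−1, −1), the solution is interior (z = (1/3, 1/3), w = 0). For M = 2I and q = (−2, 4), the second bound is active (z = (1, 0), w₂ = 4).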

Whenever this point occurs inside the trust region, 
it is an approximate stationary point. If the solution 
point occurs on a trust region constraint and the solu- 
tion is feasible while a reduction in the objective func- 
tion has occurred, the solution point is taken as the new 
starting point and a new iteration is started. Otherwise, 
if the new point is infeasible, the trust region is reduced. 
Finally, if there has been an increase in the objective 
function, the trust region is enlarged and the iteration 
is repeated, with suitable safeguards to provoke a reduc- 
tion in the objective function. 
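The update rules just described can be written as a small decision function. This Python sketch mirrors the rules as stated in the text. The function name and the shrink and expand factors are hypothetical choices, and the safeguards mentioned in the text are not modeled.

```python
def trust_region_update(radius, inside, feasible, f_old, f_new,
                        shrink=0.5, expand=2.0):
    """Return (action, new_radius) following the rules stated in the text."""
    if inside:
        # Solution strictly inside the trust region: approximate stationary point.
        return "stationary", radius
    if not feasible:
        # Infeasible point on the trust region boundary: reduce the region.
        return "reduce", radius * shrink
    if f_new < f_old:
        # Feasible with a reduction in the objective: accept as the new start point.
        return "accept", radius
    # Objective increased: the text prescribes enlarging the region and repeating.
    return "enlarge", radius * expand
```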

If the objective function is bounded for all values of 
the variables which satisfy the constraints, then a local 
minimum point will be found. Thus, the convergence of 
the algorithm can be proved without difficulty by stan- 
dard techniques [19]. 
Models


The data sets of empirical processes of phenomena may be of various types, such as dynamic, cross-sectional or both cross-sectional and dynamic, but the method is suitable and accurate for all these types of data: the dynamic model indicated in Eqs. (2) and (3) is simply replaced by an appropriate model of the phenomenon, suitable for the data available and the purposes of the application.

In Table 1 three well-known examples are indi- 
cated and their salient estimation characteristics are 
presented. The model for naphthalene arose in an in- 
vestigation of the oxidation of naphthalene to phthalic 
anhydride [5,11]. The model fitted was a full quadratic 
cross-sectional model. As can be seen, the results for the 
two estimations are identical and all the statistical con- 
ditions are satisfied, so confirming Theorem 1. 

The model for the tire composition studies the effect of three process variables. The data are very sparse, but the full quadratic model applied indicated that the model as a whole was significant, although the coefficients of the squared terms and the cross-product terms are not significant under the null hypothesis. Heteroscedasticity is found to be significant by the appropriate test. Our algorithm indicates that lagged variables should be included, and the estimates of such a model have a very low residual.

The model specified for the Constant Elasticity of 
Substitution (CES) function in economics is a function 
which is nonlinear in the parameters although it can be 


almost linearized by a logarithm transformation. Log 
transformations gave good results for traditional rou- 
tines, while no transformations were required with this 
algorithm, which is a definite advantage. 

Further computational experiments are given in Ta- 
ble 2 for some important chemical processes and sim- 
ilar results are reported. For example, in the chlorides 
experiment it is indicated that more variables are re- 
quired by defining suitable cross products and lagged 
terms of the original set of variables. 

In Table 3 a time series implementation is presented 
with a number of variants. The experiment is very well 
known and the original data were analyzed [9] and 
a polynomial model was fitted. A dynamic model with 
an exponential term was subsequently fitted [16,17] and 
autocorrelated terms were added to the series, in the 
third instance. Traditional techniques provide limited 
results, whereas this algorithm solves the problem well. 


Cases 


While in the previous section the application of vari- 
ous models was presented and the results were shown 
in all cases to be good, here we shall consider dynamic 
models with exogenous and endogenous variables and 
examine their performance. 

Two general instances will be considered: financial 
prediction models and optimal control in drilling for 
oil. 


Simultaneous Estimation and Optimization of Nonlinear Problems, Table 1
SAS Institute statistical package and Socrates algorithm: performance comparison of statistical conditions for examples nonlinear in the variables
Examples (estimation methods): Naphthalene (Socrates, SAS); Tire composition (Socrates); CES function (SAS nonlinear, SAS logs, Socrates)
Rows: No. of observations; No. of parameters; No. of iterations; Dynamic; Mean of residuals; Residuals variance; Heteroscedasticity; Autocorrelation; Lack of normality
S: significant, NS: not significant at confidence level 0.95, U: unverifiable
Simultaneous Estimation and Optimization of Nonlinear Problems, Table 2
SAS Institute statistical package and Socrates algorithm: performance comparison of statistical conditions for examples nonlinear in the variables
Examples: Chemical inversion, Isomerization, Chlorides (estimated with Socrates, SAS nonlinear and SAS logs)
Rows: No. of observations; No. of parameters; No. of iterations; Dynamic; Mean of residuals; Residuals variance; Heteroscedasticity; Autocorrelation; Lack of normality
S: significant, NS: not significant at confidence level 0.95, U: unverifiable


Simultaneous Estimation and Optimization of Nonlinear Problems, Table 3
SAS Institute statistical package and Socrates algorithm: performance comparison of statistical conditions for dynamic systems
Models: Exponential model, Polynomial model, Autocorrelated (estimated with SAS and Socrates)
Rows: No. of observations; No. of parameters; No. of iterations; Mean of residuals; Residuals variance; Heteroscedasticity; Autocorrelation; Lack of normality
S: significant, NS: not significant at confidence level 0.95, U: undeterminable


Financial Prediction 


A number of chronological series of quotations on various stock exchanges were gathered, and predictions one period ahead were attempted for each series. The series considered are:

1. The Standard and Poor 500 common stock index (SPX).
2. The Dow Jones Euro Stock Index, consisting of 50 stocks expressed in euros (SX5E).
3. The Nikkei 225 stock average (NKY).
4. US Government bonds 10-year index (USGG10YR).
5. The 10-year US dollar fixed swap rate (USSWAP10).
6. German government bonds 10-year index (GDBR10).
7. Deutsche mark-euro exchange rate (DM-EUR).
8. Gold spot price (GOLDS).
9. Chicago Board Options Exchange: Volatility Index (VIX).
10. Reuters Jefferies CRB Futures Price Index (CRY).

The results of fitting suitable dynamical models to these data, available for over 12 years, are given in Table 4.

Note that the estimation system alone (here there is no concurrent optimization system), through the constraints that impose the statistical properties on the estimates, yields very precise predictions one period ahead.
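The one-period-ahead setup behind Table 4 (single lagged variable models) can be illustrated with a minimal Python sketch. It fits y[t+1] = a + b·y[t] by plain ordinary least squares, which is a swapped-in technique: unlike the method described here, it imposes none of the statistical conditions as constraints. The function names are hypothetical.

```python
def fit_single_lag(series):
    """OLS fit of y[t+1] = a + b * y[t] (no statistical constraints imposed)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def one_step_predictions(series, a, b):
    """Predict each value from its predecessor, one period ahead."""
    return [a + b * v for v in series[:-1]]


def absolute_mean_error(series, preds):
    """Mean absolute error of the one-step predictions against the actual values."""
    actual = series[1:]
    return sum(abs(p - y) for p, y in zip(preds, actual)) / len(actual)
```

A series generated exactly by y[t+1] = 1 + 0.5·y[t] is recovered with a = 1, b = 0.5 and zero absolute mean error. Real financial series would, of course, leave residuals to be checked against the statistical conditions.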




Simultaneous Estimation and Optimization of Nonlinear Problems, Table 4
Variances and residual variances for the single lagged variable dynamic systems, 647 periods

Name of financial series | Series mean value | Series standard deviation | Residuals mean value | Residuals standard deviation | Absolute mean error | χ² value
SPX | 1055.0 | 270.909 | 1.1328e-04 | 0.0836 | 0.0612 | 2.8622
SX5E | 1032.8 | 3082.0 | 1.1017e-04 | 0.4617 | 0.3057 | 31.9584
NKY | 14816.0 | 3703.1 | -0.0137 | 1.4617 | 1.0236 | 62.5824
USGG10YR | 5.2399 | 0.9479 | -4.7074E-06 | 2.6441E-04 | 1.9255E-04 | 5.58760E-03
USSWAP10 | 5.8226 | 0.9990 | -4.9551E-06 | 2.7227E-04 | 2.0133E-04 | 5.31960E-03
GDBR10 | 4.8036 | 1.0002 | -2.0893E-06 | 2.3075E-04 | 1.7287E-04 | 4.68289E-03
DEM-EUR | [illegible] | 0.2526 | -1.1848E-07 | 5.1837E-05 | 4.2083E-05 | 6.33564E-04
GOLDS | 369.5799 | 101.1419 | -2.4613E-05 | 0.0237 | 0.0167 | 6.53197E-01
VIX | 19.6402 | 6.7262 | 6.3632E-05 | 0.0093 | 0.0070 | 2.00086
CRY | 245.4010 | 41.9699 | -1.0608E-05 | 0.0085 | 0.0068 | 0.12247


Results have also been obtained for predictions from two up to five periods ahead, with comparable accuracy, which indicates that the algorithm is very robust.


Optimal Control in Drilling of Oil 


Determining optimal control policies for petroleum drilling is a very interesting problem, since the underlying empirical process can be studied in detail through the mudlogging databases that are compiled for every well.

In Table 5 the actual time series of the drilling pro- 
cess is compared with the best predictions obtained 
from an endogenous model for each process. The best 
model indicated a differing number of endogenous 
variables, but for all 16 processes the model with five 
lags was the most accurate. 

The determination of optimal control policies in processes for the extraction of oil from underground sources requires that they be formulated as formal procedures. For each process, over 1 week, the original mudlogging data set, recorded at elementary periods of 5 s, was resampled every 30 elementary periods, so the fundamental period considered was 150 s.
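The time bookkeeping can be checked with a few lines of Python. The 5 s records, the resampling factor of 30, the one-week span and the 192-period horizon come from the text. Resampling is assumed here to mean keeping every 30th record; the text does not say whether values were averaged instead. Names are hypothetical.

```python
ELEMENTARY_PERIOD_S = 5  # spacing of the mudlogging records (from the text)
RESAMPLE_EVERY = 30      # keep one record out of every 30 (from the text)
HORIZON_PERIODS = 192    # control horizon in fundamental periods (from the text)


def resample(records, every=RESAMPLE_EVERY):
    """Subsample by keeping every `every`-th record (assumed; averaging is an alternative)."""
    return records[::every]


fundamental_period_s = ELEMENTARY_PERIOD_S * RESAMPLE_EVERY     # 150 s
horizon_hours = HORIZON_PERIODS * fundamental_period_s / 3600   # 8.0 h
week_records = list(range(7 * 24 * 3600 // ELEMENTARY_PERIOD_S))  # 120960 records
week_periods = len(resample(week_records))                          # 4032 periods
```

One week of 5 s records thus yields 4032 fundamental periods, of which the 192-period control horizon covers 8 h.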

The period over which to determine the optimal 
controls was chosen to be 192 periods, or 8 h, while 
the historical period was relatively long. The results are 
shown in Table 6, in which the optimal increment de- 
termined on the basis of this algorithm with a closed- 
loop policy is compared with what actually happened. 


In Table 6 the six instances of determining optimal controls are indicated, and each entry concerns the drilling experience of the given well during that week over the given period. From the active perforation intervals an initial period was selected randomly, and the optimal control was defined for the next 192 periods (8 h). On average, the predicted optimal increment exceeded the actual increment by more than 30%.


Conclusions 


Simultaneous estimation and optimization provides very robust and versatile algorithms for nonlinear estimation and optimization of all types of empirical data sets. Here a single, more complex optimization problem was used instead of a succession of optimization problems, the first being unconstrained and the second being a constrained decision or allocation optimization problem.

This approach always imposes the satisfaction of 
the required statistical properties, so as to ensure that 
at every iteration a statistically correct solution is de- 
termined for that functional form. By iterating on 
the functional forms and on the specification of the 
problem, one will always obtain better solutions, until 
a lower bound is reached. 

The lower bound indicates the pure noise of the em- 
pirical process, but as can be seen it is usually much 
lower than the noise components determined by tradi- 
tional methods. 
Simultaneous Estimation and Optimization of Nonlinear Problems, Table 5
Best endogenous prediction results of five lag models for the drilling processes 


No. of exo- 
genous variables 


Process 


84002408 


Residual variances 2 


No. of degrees Time (s) 
of freedom 


782709.488774 470.89 


0.11496404E-02 


0.002329 982.77 


0.41461787E-01 


0.311272 5073.40 


0.10572470E-01 


0.012292 1056.66 


0.27926209 


5.930955 1839.50 


0.12768791E-01 


0.057533 2069.71 


0.76346211E-03 


0.000425 575.76 


0.30245159E-02 


0.014091 1518.98 


0.19508212E-03 


0.000238 10601.91 


0.63263313E-03 


0.001045 8576.97 


0.88143523E-02 


0.007543 1278.91 


0.58546597E-01 


0.132707 658.40 


4 
4 
8 
4 
6 
8 
4 
6 
6 
6 
8 
6 
6 


0.55842518 


1.509173 478.83 


0.99603892E-03 


0.002172 987.23 


0.27631189E-02 
0.21829931E-02 
0.84573218E-03 


Simultaneous Estimation and Optimization of Nonlinear Problems, Table 6
Optimal predicted versus actual increment for six oil wells over 192 periods (8 h)

Well and week | Optimal increment | Real increment | Difference (%)
FTO2D 9
FTO2D 16
FTO2D 23
GX01 3
GX01 11
BEO1 1


Thus, the simultaneous estimation and optimiza- 
tion algorithm for nonlinear and dynamic problems is 
an extremely powerful instrument. 


See also 


> Complementarity Algorithms in Pattern 
Recognition 

> Generalizations of Interior Point Methods for the 
Linear Complementarity Problem 

> Mathematical Programming Methods in Supply 
Chain Management 


0.005723 
0.001563 
0.000488 


1254.82 
problem


In 1857 the British mathematician J.J. Sylvester, in a one-sentence article [28], posed the problem:


It is required to find the least circle which shall 
contain a given set of points in the plane. 


From then until the 1960s, the problem attracted the 
occasional interest of mathematicians who proposed al- 
gorithms [1,5,21], applications [21,29] and related the- 
ory [17,26], both for the problem in the plane and for 
the minimum sphere problem in higher dimensions. 
The references, especially [1,14,26], cover the history in 
more detail. 

Starting in 1971, the problem attracted greater in- 
terest in the context of location theory because find- 
ing the center of the circle of minimum radius is 
equivalent to locating a central facility for which the 
maximum distance to any service point is minimized 
[8,9,11,12,16,23]. The problem was also introduced 
in the 1970s computer science literature as one of 
the closest point problems of computational geometry 
[27]. This article provides an optimization formulation, 
characterization of the solution, one of the primary al- 
gorithms for the problem in the plane and references to 
extensions. 

A minimax statement of the minimum sphere problem in $\mathbb{R}^n$ is:

$$\min_{x\in\mathbb{R}^n}\ \max_{i=1,\dots,m}\ \|x-a_i\|,$$

where there are m given points $a_i\in\mathbb{R}^n$ and $\|\cdot\|$ denotes the Euclidean norm. Converting this to a constrained optimization problem with differentiable functions is accomplished by squaring the norm term and introducing a new variable s for the squared radius of the minimum sphere:

$$\begin{aligned}\min_{(s,x)}\ & s\\ \text{s.t.}\ & \|x-a_i\|^2\le s,\quad i=1,\dots,m,\\ & (s,x)\in\mathbb{R}^{n+1}.\end{aligned}$$

In this form the problem is a convex program for which the Karush-Kuhn-Tucker conditions [20] are both necessary and sufficient. Applying these conditions shows that there exist nonnegative multipliers $u_i$, $i=1,\dots,m$, such that

$$\sum_{i=1}^m u_i a_i = x^*,\qquad \sum_{i=1}^m u_i = 1,$$

$$u_i\bigl(s^*-\|x^*-a_i\|^2\bigr)=0,\qquad u_i\ge 0,\qquad \|x^*-a_i\|^2\le s^*,\qquad i=1,\dots,m.$$
Thus x*, the center of the minimum sphere with radius $\sqrt{s^*}$, is a convex combination of points on its boundary since, for points strictly inside the sphere, the associated $u_i$ must be zero. Further, by Carathéodory's theorem (cf. ► Carathéodory theorem), at most n + 1 of the given points are required for this convex combination. In the plane, therefore, the minimum circle is defined by either
• two points of maximum separation, or
• three of the points which form a nonobtuse triangle.
This characterization of the solution is also evident from geometrical arguments.

Given this characterization, a finite procedure could be devised in which all circles defined by two or three points are constructed, discarding any three points that form an obtuse triangle. From the circles constructed, the smallest covering circle would be chosen. However, this total enumeration would not be effective for large values of m, because the number of two- and three-point combinations grows rapidly with the number of points. If, for example, m = 100, over 160,000 combinations would be considered.
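To make the enumeration and the count concrete, here is a minimal Python sketch of the total enumerative approach just described. It assumes planar points given as (x, y) tuples and uses a small tolerance `eps` in the covering test; all function names are hypothetical and the code is an illustration, not a reference implementation.

```python
import math
from itertools import combinations


def diameter_circle(p, q):
    """Circle having segment pq as its diameter: (center_x, center_y, radius)."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)


def circumcircle(p, q, r):
    """Circle through three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def is_nonobtuse(p, q, r):
    """True if no angle of triangle pqr exceeds 90 degrees."""
    s = sorted([math.dist(p, q) ** 2, math.dist(q, r) ** 2, math.dist(p, r) ** 2])
    return s[0] + s[1] >= s[2]


def covers(circle, points, eps=1e-9):
    cx, cy, rad = circle
    return all(math.dist((cx, cy), p) <= rad + eps for p in points)


def enumerate_min_circle(points):
    """Smallest covering circle among all 2-point and nonobtuse 3-point circles."""
    candidates = [diameter_circle(p, q) for p, q in combinations(points, 2)]
    for p, q, r in combinations(points, 3):
        if is_nonobtuse(p, q, r):          # obtuse triples are discarded
            c = circumcircle(p, q, r)
            if c is not None:
                candidates.append(c)
    feasible = [c for c in candidates if covers(c, points)]
    return min(feasible, key=lambda c: c[2])


# Number of candidate circles for m = 100: C(100, 2) + C(100, 3)
print(math.comb(100, 2) + math.comb(100, 3))   # 166650
```

For example, `enumerate_min_circle([(0, 0), (4, 0), (1, 3), (2, 1)])` returns the circumcircle of the first three points, centered at (2, 1) with radius √5. The cubic growth of the candidate list is exactly why the method below is preferred.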

Below is an outline of a more efficient method due to J. Elzinga and D.W. Hearn [8]:

0. Choose any two of the given points and go to Step 1.
1. Let these two points define the diameter of a circle.
   IF this circle covers all points THEN STOP
   ELSE find an outside point and go to Step 2 with these three points.
2. Solve the minimum circle problem for these three points.
   IF the minimum circle is defined by two points THEN go to Step 1.
   IF three points define it THEN go to Step 3.
3. IF the circle defined by three points covers all points THEN STOP
   ELSE find an outside point (e.g., the farthest one) and go to Step 4.
4. Solve the minimum circle problem for these four points.
   IF it is defined by two points THEN go to Step 1.
   IF defined by three points THEN go to Step 3.

Elzinga-Hearn algorithm
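The outline can be sketched in Python as follows. This is a simplified illustration, not the implementation of [8] or [15]: the small subproblems of Steps 2 and 4 are solved by brute force over pairs and triples of at most four points, and the farthest outside point is taken in both Step 1 and Step 3. Helper names are hypothetical. The loop terminates because the radius strictly increases each time an outside point is added.

```python
import math
from itertools import combinations


def _circle_two(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)


def _circle_three(p, q, r):
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:                      # collinear: no circumcircle
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def _inside(circle, p, eps=1e-9):
    return math.dist(circle[:2], p) <= circle[2] + eps


def small_min_circle(pts):
    """Minimum circle of a few points and the 2 or 3 points that define it."""
    best = None
    for k in (2, 3):
        for sub in combinations(pts, k):
            c = _circle_two(*sub) if k == 2 else _circle_three(*sub)
            if c is not None and all(_inside(c, p) for p in pts):
                if best is None or c[2] < best[0][2]:
                    best = (c, list(sub))
    return best


def elzinga_hearn(points):
    circle, support = small_min_circle(list(points[:2]))   # Steps 0-1
    while True:
        far = max(points, key=lambda p: math.dist(circle[:2], p))
        if _inside(circle, far):                # circle covers all points: STOP
            return circle, support
        # Steps 2/4: add the outside point, re-solve, keep only defining points
        circle, support = small_min_circle(support + [far])
```

On the points (0, 0), (4, 0), (1, 3), (2, 1) the first diameter circle misses (1, 3); adding it yields the acute triangle's circumcircle (center (2, 1), radius √5), which covers everything, so the loop stops after one update.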


This outline omits many details of how the different steps are executed, as well as efficiencies such as reducing the given set of m points to their convex hull. See [2,8,14] for a discussion of those details and [15] for a computer code. In computational testing on random data, the most effective versions compute the solution of a problem with 100 points in fewer than 10 iterations [14]. Empirically, the computational effort grows linearly with the number of points; however, an example given in [6] requires O(m²) time. In theory, there are methods of lower complexity, e.g., methods based on the construction of Voronoi diagrams [25,27] and the O(m) method of N. Megiddo [22], but effective implementations of these methods have not been developed. E. Welzl [30] gives a randomized method of expected complexity O(m) and reports on the implementation of an effective heuristic variation.
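For comparison, here is a hedged sketch of a randomized incremental method in the spirit of Welzl's approach, written in the nested-loop form common in computational geometry textbooks; it is not necessarily the formulation or the heuristic variation reported in [30]. Points are processed in random order, and whenever a point falls outside the current circle it must lie on the boundary of the new one. The function names and the `seed` parameter are illustrative assumptions.

```python
import math
import random


def _circle_two(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)


def _circle_three(p, q, r):
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        # (nearly) collinear: fall back to the widest pair as a diameter
        return max((_circle_two(p, q), _circle_two(q, r), _circle_two(p, r)),
                   key=lambda c: c[2])
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def _inside(c, p, eps=1e-9):
    return math.dist(c[:2], p) <= c[2] + eps


def randomized_min_circle(points, seed=None):
    """Expected O(m) minimum covering circle (needs at least one point)."""
    pts = list(points)
    random.Random(seed).shuffle(pts)            # random insertion order
    c = (pts[0][0], pts[0][1], 0.0)
    for i in range(1, len(pts)):
        if _inside(c, pts[i]):
            continue
        c = (pts[i][0], pts[i][1], 0.0)         # pts[i] lies on the boundary
        for j in range(i):
            if _inside(c, pts[j]):
                continue
            c = _circle_two(pts[i], pts[j])     # pts[i], pts[j] on the boundary
            for k in range(j):
                if not _inside(c, pts[k]):
                    c = _circle_three(pts[i], pts[j], pts[k])
    return c
```

The random order is what makes the expensive inner loops rare, giving the expected linear running time.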

The Elzinga-Hearn algorithm can be classified as a dual procedure [14]: only the final circle covers all points and is therefore feasible. Primal methods also exist, in particular the first algorithm, due to G. Chrystal [5] and, independently, to B. Peirce [29], and their most efficient implementations are competitive with that of Elzinga and Hearn. In [14] these implementations are described and a classification scheme is given which shows the equivalence of a number of other proposed algorithms.

In the location theory context positive weights indi- 
cating relative importance may be associated with the 
given service points [11,14]. Then the problem is one 
of minimizing the maximum weighted distance and it 
loses its covering circle interpretation [14]. The algo- 
rithm given above has been extended to this weighted 
problem in [13,14]. 

By a change of variables of the form $s = v + x^Tx$, and since $\|x-a_i\|^2 = x^Tx - 2a_i^Tx + a_i^Ta_i$, the above constrained problem becomes

$$\begin{aligned}\min_{(v,x)}\ & v + x^Tx\\ \text{s.t.}\ & v + 2a_i^Tx - a_i^Ta_i \ge 0,\quad i=1,\dots,m,\\ & (v,x)\in\mathbb{R}^{n+1},\end{aligned}$$

which is recognized to be a convex quadratic program [9,19,23]. Thus there are many general-purpose algorithms which can solve the minimum sphere problem.
Single Facility Location: Circle Covering Problem

A pivoting method reminiscent of the revised simplex method for linear programming, with storage requirements that depend only on n, is given in [9].

Extensions of the basic problem include covering 
a convex polyhedron defined by algebraic inequalities 
[10,18] and minimax location on a sphere or hemi- 
sphere using great circle distances [7,24]. See also the 
O(m log m) method of G. Xue and S. Sun [31]. 


See also 


> Combinatorial Optimization Algorithms in 
Resource Allocation Problems 

> Competitive Facility Location 

> Facility Location with Externalities 

> Facility Location Problems with Spatial Interaction 

> Facility Location with Staircase Costs 

> Global Optimization in Weber’s Problem with 
Attraction and Repulsion 

> MINLP: Application in Facility Location-allocation 

> Multifacility and Restricted Location Problems 

> Network Location: Covering Problems 

> Optimizing Facility Location with Rectilinear 
Distances 
The problem of single facility location can be stated as follows: determine the location of a single new facility with respect to a number of existing facilities that minimizes an appropriately defined total cost function, chosen to be proportional to distance. Typical examples are the location of a new:

a) machine in a manufacturing facility; 

b) warehouse relative to production; 


c) pump in chemical operations; 

d) well in an oil field development. 

A generalization of this problem is the multifacility location-allocation problem [5]. A mathematical formulation of the single facility problem is as follows: m existing facilities are located at known distinct points $P_1,\dots,P_m$, a new facility is to be located at a point X, and costs of a ‘transportation’ nature are incurred that are directly proportional to an appropriately defined distance between the new facility and the existing ones. Let $d(X,P_i)$ represent the distance between points X and $P_i$, and let $w_i$ represent the cost of transportation between the new facility and existing facility i at $P_i$. Then the total ‘transportation’ cost between the new facility and the existing facilities is given by:

$$f(X)=\sum_{i=1}^m w_i\, d(X,P_i),$$


where the terms $w_i$ are referred to as ‘weights’. The single facility location problem is to determine the location of a new facility, say X*, that minimizes f(X). In many applications the cost per unit distance is constant, so the minimization problem often reduces to determining the location that minimizes distance. The $l_p$ norm is a popular distance measure in facility location theory. If the coordinates of the new facility are $x_1, x_2$, so that $X=(x_1,x_2)$, and the coordinates of existing facility i are $a_i, b_i$, so that $P_i=(a_i,b_i)$, the $l_p$ distance is:

$$l_p(X,P_i)=\bigl(|x_1-a_i|^p+|x_2-b_i|^p\bigr)^{1/p},\qquad p\ge 1.$$

Typically in the literature it is assumed that p = 1 or 2, for which we obtain the rectangular (rectilinear) and Euclidean norms, respectively. Examples of facility location problems where Euclidean distance applies are network location problems and pipeline design problems. Examples where rectilinear distance applies are machine location problems, where transportation occurs along a set of aisles arranged in a rectangular pattern, and the well location problem in an oil field, where the no-flow boundaries are located midway between the extraction wells [6].
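As a small illustration of the formula, the following sketch evaluates the $l_p$ distance and the total cost f(X) for a hypothetical instance. The function names and data are assumptions made for the example; p = 1 gives the rectilinear cost and p = 2 the Euclidean one.

```python
import math


def lp_distance(X, P, p=2.0):
    """l_p distance between X = (x1, x2) and P = (a, b), for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for l_p to be a norm")
    return (abs(X[0] - P[0]) ** p + abs(X[1] - P[1]) ** p) ** (1.0 / p)


def total_cost(X, facilities, weights, p=2.0):
    """f(X) = sum_i w_i * l_p(X, P_i)."""
    return sum(w * lp_distance(X, P, p) for P, w in zip(facilities, weights))


facilities = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]   # hypothetical P_i
weights = [1.0, 2.0, 1.0]                            # hypothetical w_i
X = (1.0, 1.0)
print(total_cost(X, facilities, weights, p=1))   # 13.0 (rectilinear)
print(total_cost(X, facilities, weights, p=2))   # sqrt(2) + 2*sqrt(10) + sqrt(5)
```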
Mathematical Model


The Euclidean distance location problem can be stated mathematically as:

$$\min f(X)=\sum_{i=1}^m w_i\bigl((x_1-a_i)^2+(x_2-b_i)^2\bigr)^{1/2}. \qquad (1)$$

It is variously referred to as the Steiner-Weber problem or the general Fermat problem. The first important property of the problem is that, since $w_i\, l_2(X,P_i)$ is a convex function of X and the sum of convex functions is convex, f(X) is a convex function. Hence every local minimum is a global minimum of problem (1) [8]. Writing $(a_{i1},a_{i2})$ for the coordinates $(a_i,b_i)$ of $P_i$ and $l_2(X,a_i)$ for the Euclidean distance between X and $P_i$, we are assured that the following extremal equations for f(X) can produce only global optima of problem (1):

$$\frac{\partial f(X)}{\partial x_k}=\sum_{i=1}^m \frac{w_i(x_k-a_{ik})}{l_2(X,a_i)}=0,\qquad k=1,2. \qquad (2)$$

One difficulty with the above equation is that the derivatives are undefined if $l_2(X,a_i)=0$. Therefore, if the optimal location for the new facility coincides with that of an existing facility, equation (2) cannot be used to check optimality. However, each existing facility can easily be checked for optimality using the following property [8]: f(X) is minimized at the rth existing facility location $(a_{r1},a_{r2})$ if and only if

$$\left[\left(\sum_{i=1,\,i\ne r}^m\frac{w_i(a_{r1}-a_{i1})}{l_2(a_r,a_i)}\right)^2+\left(\sum_{i=1,\,i\ne r}^m\frac{w_i(a_{r2}-a_{i2})}{l_2(a_r,a_i)}\right)^2\right]^{1/2}\le w_r. \qquad (3)$$
Solution Approach 


An iterative procedure, known as the Weiszfeld procedure, has been proposed for the solution of problem (1). It is based on the following equation, which is obtained by solving (2) for $x_k$:

$$x_k=\frac{\sum_{i=1}^m w_i a_{ik}/l_2(X,a_i)}{\sum_{i=1}^m w_i/l_2(X,a_i)},\qquad k=1,2. \qquad (4)$$

Note that $x_k$ also appears on the right-hand side of (4), through $l_2(X,a_i)$, so an iterative scheme should be employed to solve (4). The following form holds at iteration l + 1:

$$x_k^{l+1}=\frac{\sum_{i=1}^m w_i a_{ik}/l_2(X^l,a_i)}{\sum_{i=1}^m w_i/l_2(X^l,a_i)},\qquad k=1,2. \qquad (5)$$

A good initial point comes from the solution of the squared Euclidean problem, which is the same as the Euclidean location problem except that each distance $l_2(X,a_i)$ is squared. The solution to this problem is the center of gravity location given by:

$$x_k^0=\frac{\sum_{i=1}^m w_i a_{ik}}{\sum_{i=1}^m w_i},\qquad k=1,2. \qquad (6)$$

Except in the degenerate case where an iterate lands on an existing facility, the iterative procedure converges to the optimal location; convergence is discussed in [7].
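The procedure can be sketched in Python as below: first test every existing facility with condition (3), then start from the center of gravity (6) and apply iteration (5). This is a minimal sketch assuming distinct facility locations; the tolerance, iteration cap and function name are illustrative choices, and the degenerate case of an iterate hitting a facility is simply returned, not handled.

```python
import math


def weiszfeld(points, weights, tol=1e-10, max_iter=10_000):
    """Minimise sum_i w_i * ||X - a_i||_2 (problem (1)) for distinct points a_i."""
    # Optimality test (3) at each existing facility r
    for r, (ar, wr) in enumerate(zip(points, weights)):
        gx = gy = 0.0
        for i, (ai, wi) in enumerate(zip(points, weights)):
            if i != r:
                d = math.dist(ar, ai)
                gx += wi * (ar[0] - ai[0]) / d
                gy += wi * (ar[1] - ai[1]) / d
        if math.hypot(gx, gy) <= wr:
            return ar
    # Initial point: center of gravity (6)
    W = sum(weights)
    x = (sum(w * a[0] for a, w in zip(points, weights)) / W,
         sum(w * a[1] for a, w in zip(points, weights)) / W)
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for a, w in zip(points, weights):
            d = math.dist(x, a)
            if d == 0.0:        # degenerate: iterate sits on a facility (not handled here)
                return x
            num_x += w * a[0] / d
            num_y += w * a[1] / d
            den += w / d
        new = (num_x / den, num_y / den)   # iteration (5)
        if math.dist(new, x) < tol:
            return new
        x = new
    return x
```

For four equally weighted points (0, 0), (3, 3), (4, 0), (0, 2), no facility passes test (3) and the iterates approach (4/3, 4/3). With weights 5, 1, 1 on (0, 0), (1, 0), (0, 1), test (3) already certifies the heavy facility (0, 0) as optimal.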


Duality 
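A strictly convex approximation of f(X), which eliminates the discontinuities of its derivatives, is the following:

$$\min f_\varepsilon(X)=\sum_{i=1}^m w_i\bigl((x_1-a_i)^2+(x_2-b_i)^2+\varepsilon\bigr)^{1/2},\qquad \varepsilon>0. \qquad (7)$$

Based on this problem one can derive [8] the dual in the limit as $\varepsilon\to 0$:

$$\begin{aligned}\max\ & D(U)=-\sum_{i=1}^m\bigl(a_{i1}u_i+a_{i2}v_i\bigr)\\ \text{s.t.}\ & \sum_{i=1}^m u_i=0,\qquad \sum_{i=1}^m v_i=0,\\ & (u_i^2+v_i^2)^{1/2}\le w_i,\quad i=1,\dots,m.\end{aligned} \qquad (8)$$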


The inequality constraints can be equivalently posed as $u_i^2+v_i^2\le w_i^2$ to produce a differentiable problem that can be solved using standard nonlinear programming algorithms. The optimal Lagrange multipliers of problem (8) solve the original problem (1) [8]. It is of interest to notice the geometric interpretation of optimal facility location. Figure 1 gives the case of four existing facilities at points A, B, C and D. For the case of equal weights, suppose that the four points can be arranged in pairs (A, B) and (C, D) so that the straight lines AB and CD intersect. The intersection point X* is the optimum location for the new facility. However, if a point, for example D, lies within the triangle ABC, then the optimal location of the new facility coincides with the location D.

Single Facility Location: Multi-objective Euclidean Distance Location, Figure 1
Graphical solution for the Euclidean distance problem
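The geometric rule and the dual can be checked together on a small hypothetical instance with equal weights: A = (0, 0), B = (3, 3), C = (4, 0), D = (0, 2). The sketch below intersects lines AB and CD to get X*, builds the candidate dual point $(u_i,v_i)=w_i(X^*-a_i)/\|X^*-a_i\|$ (an assumption based on the sign convention of (8) as written here, not a procedure quoted from [8]), and compares primal and dual objective values.

```python
import math

A, B, C, D = (0.0, 0.0), (3.0, 3.0), (4.0, 0.0), (0.0, 2.0)   # hypothetical data
pts, w = [A, B, C, D], [1.0, 1.0, 1.0, 1.0]


def intersect(p1, p2, p3, p4):
    """Intersection of line p1p2 with line p3p4 (assumed not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


X = intersect(A, B, C, D)                       # candidate optimum X*
primal = sum(wi * math.dist(X, a) for a, wi in zip(pts, w))

# Dual point built from X*: (u_i, v_i) = w_i (X* - a_i) / ||X* - a_i||
U = [(wi * (X[0] - a[0]) / math.dist(X, a), wi * (X[1] - a[1]) / math.dist(X, a))
     for a, wi in zip(pts, w)]
dual = -sum(a[0] * u + a[1] * v for a, (u, v) in zip(pts, U))

print(X)                                           # approx (1.3333, 1.3333)
print(sum(u for u, _ in U), sum(v for _, v in U))  # both ~0: constraints of (8) hold
print(primal, dual)                                # agree up to rounding
```

Because the unit vectors toward A and B (and toward C and D) cancel pairwise at the intersection point, the dual constraints hold and both objectives equal 3√2 + 2√5.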


Discrete Location Problem 


In the discrete location problem [4], the new facility is to be located by making a single choice from among a finite number of sites, say n. The mathematical model for this assignment problem is the following:

$$\begin{aligned}\min\ & f(x)=\sum_{i=1}^n\sum_{j=1}^n c_{ij}x_{ij}\\ \text{s.t.}\ & \sum_{i=1}^n x_{ij}=1,\quad j=1,\dots,n,\\ & \sum_{j=1}^n x_{ij}=1,\quad i=1,\dots,n,\\ & x_{ij}=0\ \text{or}\ 1,\quad \forall i,j.\end{aligned} \qquad (9)$$


As an example of the use of the assignment model in a context similar to the continuous location problem, suppose that there are p existing facilities, that $d_{kj}$ represents the Euclidean distance between existing facility k and site j, and that $w_{ik}$ represents a total cost per unit distance incurred in transporting items between new facility i and existing facility k. Then, for a given assignment, $w_{ik}\sum_{j=1}^n d_{kj}x_{ij}$ represents the total cost incurred transporting items between new facility i and existing facility k. The total cost of transportation of all items is then given by:

$$\sum_{i=1}^n\sum_{k=1}^p w_{ik}\sum_{j=1}^n d_{kj}x_{ij}=\sum_{i=1}^n\sum_{j=1}^n c_{ij}x_{ij}, \qquad (10)$$

where $c_{ij}=\sum_{k=1}^p w_{ik}d_{kj}$. With the above transformation the location problem can then be solved as an assignment problem [4].
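The transformation (10) is easy to reproduce. The sketch below builds $c_{ij}$ from hypothetical weights and distances and then solves the tiny assignment problem (9) by checking every permutation. The exhaustive search is only a stand-in suitable for very small n; a practical implementation would use a dedicated assignment algorithm such as the Hungarian method.

```python
import math
from itertools import permutations


def assignment_costs(w, d):
    """Transformation (10): c[i][j] = sum_k w[i][k] * d[k][j]."""
    p, n_sites = len(d), len(d[0])
    return [[sum(w_i[k] * d[k][j] for k in range(p)) for j in range(n_sites)] for w_i in w]


def solve_assignment_bruteforce(c):
    """Minimise sum_i c[i][perm[i]] over all one-to-one assignments (small n only)."""
    n = len(c)

    def cost(perm):
        return sum(c[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=cost)
    return best, cost(best)


# Hypothetical data: existing facilities E0, E1 and candidate sites S0, S1 on a line
existing = [(0.0, 0.0), (6.0, 0.0)]
sites = [(1.0, 0.0), (5.0, 0.0)]
d = [[math.dist(e, s) for s in sites] for e in existing]   # d[k][j]
w = [[3.0, 1.0],   # new facility 0 interacts mostly with E0
     [1.0, 2.0]]   # new facility 1 interacts mostly with E1
c = assignment_costs(w, d)
print(c)                                  # [[8.0, 16.0], [11.0, 7.0]]
print(solve_assignment_bruteforce(c))     # ((0, 1), 15.0)
```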


Objectives 


In the previous sections the objective considered for the location problem was the most commonly used one, minimization of cost, which translates into minimization of distance since cost has been considered directly proportional to distance. However, many other objectives are used in the literature on the location problem; a complete review can be found in [3]. One can distinguish three different objective categories:

• the push objectives,

• the pull objectives, and

• the balancing objectives.

The pull objectives are based on the assumption that the facility to be located is desirable, so the distance from the customers has to be minimized. This category includes the objectives mentioned before, as well as profit maximization objectives where a price is associated with each demand site. In the push objectives the facility is undesirable, for example a facility that is dangerous because of possible leaks, and so it is located to maximize the distance from the customer sites. The balancing objectives try to balance the distances between the new facility and the customers. If we consider the distribution of all facility-customer distances for any given solution, push and pull objectives optimize some function of the mean, i.e., the first moment of this distribution. In contrast, most balancing objectives aim at minimizing the variability of the distribution of distances, i.e., its second central moment. A number of criteria exist to evaluate the selection of the balancing objectives [9]. Among them are the scale invariance criterion, according to which the optimal facility location remains the same irrespective of the type of measure applied to the problem, and the principle of transfers, which states that the distribution becomes suboptimal if a facility from above the average distance is transferred to one below the average. Other criteria include analytical tractability, normalization of measures, appropriateness, sensitivity and Pareto optimality. To illustrate the latter, consider a single facility location problem where the three customer points are almost collinear. The distances to the three points are equal if the points lie on the circumference of a circle centered at the facility. However, the closer the points are to a line, the larger the radius of this circle. Customers at all three locations would benefit from moving the facility closer to them, until it reaches the central point. This is an example where the equality objective alone is not meaningful. Because of the problems associated with balancing objectives, a number of authors [1] have considered the trade-offs between equality and efficiency in the objective function. In conclusion, the facility location problem is usually multi-objective in nature, whereas location under any one of the aforementioned objectives addresses only a single objective. Specialized solution approaches can be used to guide the optimization when more than one objective is considered [2].
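The near-collinear example can be quantified with a short sketch. For three hypothetical customers (0, 0), (2, 0) and (1, 0.1), it compares the point equidistant from all three (the circumcenter) with the central point (1, 0), reporting the mean distance (the quantity pull and push objectives work with) and the variance (a typical balancing measure). The data and function names are illustrative.

```python
import math
from statistics import mean, pvariance


def circumcenter(p, q, r):
    """Center of the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    return ((a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d,
            (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d)


def distance_stats(X, customers):
    """Mean (pull/push measure) and variance (balancing measure) of the distances."""
    dist = [math.dist(X, c) for c in customers]
    return mean(dist), pvariance(dist)


customers = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]   # almost collinear (hypothetical)
equal_point = circumcenter(*customers)             # all three distances equal
central_point = (1.0, 0.0)
print(equal_point, distance_stats(equal_point, customers))      # ~(1, -4.95): mean ~5.05, variance ~0
print(central_point, distance_stats(central_point, customers))  # mean ~0.7, variance ~0.18
```

The perfectly "equal" location is about five units from everyone, while the central point is much closer to every customer, which is why equality alone is not a meaningful objective here.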
The problem of single facility location can be stated as follows: determine the location of a single new facility with respect to a number of existing facilities that minimizes an appropriately defined total cost function, chosen to be proportional to distance. A mathematical formulation of the single facility problem is as follows: m existing facilities are located at known distinct points $P_1,\dots,P_m$, a new facility is to be located at a point X, and costs of a ‘transportation’ nature are incurred that are directly proportional to an appropriately defined distance between the new facility and the existing ones. Let $d(X,P_i)$ represent the distance between points X and $P_i$, and let $w_i$ represent the cost of transportation between the new facility and existing facility i at $P_i$; then the total ‘transportation’ cost between the new facility and the existing facilities is given by:

$$f(X)=\sum_{i=1}^m w_i\, d(X,P_i),$$

where the terms $w_i$ are referred to as ‘weights’. The single facility location problem is to determine the location of a new facility, say X*, that minimizes f(X). In many applications the cost per unit distance is constant, so the minimization problem often reduces to determining the location that minimizes distance. The appropriate distance can be the straight-line (i.e., Euclidean) distance [7] or the rectilinear distance. Examples of facility location problems where Euclidean distance applies are network location problems and pipeline design problems. If the coordinates of the new facility are x, y, so that X = (x, y), and the coordinates of existing facility i are $a_i, b_i$, so that $P_i=(a_i,b_i)$, the rectilinear distance between X and $P_i$ is defined as:

$$d(X,P_i)=|x-a_i|+|y-b_i|.$$


Examples of facility location problems where rectilinear distance applies are machine location problems, where transportation occurs along a set of aisles arranged in a rectangular pattern, and the well location problem in an oil field, where the no-flow boundaries are located midway between the extraction wells. A distinct difference between rectilinear and Euclidean distance is illustrated in Fig. 1, which shows that there are several different paths between X and $P_i$ with the same rectilinear distance. The number of such paths is, of course, infinite. This is not the case with Euclidean distance, where the shortest path is unique.

Location, Figure 1
Different rectilinear paths between X and $P_i$


Mathematical Model 


The rectilinear distance location problem can be stated mathematically as:

$$\min f(x,y)=\sum_{i=1}^m w_i\bigl(|x-a_i|+|y-b_i|\bigr), \qquad (1)$$

which is equivalently stated as:

$$\min_x\sum_{i=1}^m w_i|x-a_i| + \min_y\sum_{i=1}^m w_i|y-b_i|,$$

where each quantity can be treated as a separate optimization problem [4]:

$$\min_x\sum_{i=1}^m w_i|x-a_i|,\qquad \min_y\sum_{i=1}^m w_i|y-b_i|.$$


Some properties of the optimum solution are:

a) The x-coordinate of the new facility can always be chosen to be the same as the x-coordinate of some existing facility. Similarly, the y-coordinate of the new facility can be chosen to coincide with the y-coordinate of some existing facility. It is not necessary, however, that both coordinates be those of the same existing facility.

b) The optimum x-coordinate (y-coordinate) of the new facility is a (weighted) median location.
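Property a) already yields a simple, if slow, solution method: search each coordinate separately over the existing facilities' coordinates. The sketch below does exactly that for hypothetical data; it costs O(m²) and is meant only to illustrate separability and property a), not as an efficient algorithm.

```python
def coordinate_cost(t, coords, weights):
    """sum_i w_i |t - c_i| for one coordinate."""
    return sum(w * abs(t - c) for c, w in zip(coords, weights))


def rectilinear_location(points, weights):
    """Solve (1) coordinate by coordinate, trying only existing coordinates (property a)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x = min(xs, key=lambda t: coordinate_cost(t, xs, weights))
    y = min(ys, key=lambda t: coordinate_cost(t, ys, weights))
    return (x, y), coordinate_cost(x, xs, weights) + coordinate_cost(y, ys, weights)


points = [(1, 5), (4, 1), (6, 3), (9, 2)]   # hypothetical existing facilities
weights = [2, 1, 3, 1]
print(rectilinear_location(points, weights))   # ((6, 3), 22)
```

Note that the optimal x-coordinate comes from the third facility and the optimal y-coordinate as well here, but in general they may come from different facilities.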
A different way of addressing the same problem is proposed in [10], where the problem is formulated as a linear programming model. Consider the absolute distance between the new facility and existing facility i, split into two nonnegative parts (the other part being zero):

$$d_{i1}=a_i-x\ \text{if}\ x<a_i,\qquad d_{i2}=x-a_i\ \text{if}\ x>a_i,\qquad d_{i1},d_{i2}\ge 0,\quad i=1,\dots,m.$$

Using these expressions,

$$x+d_{i1}-d_{i2}=a_i,\qquad d_{i1}\times d_{i2}=0,\qquad d_{i1},d_{i2}\ge 0,\qquad i=1,\dots,m,$$

and the location problem (1), for the x-coordinate, becomes a linear programming problem:

$$\begin{aligned}\min\ & \sum_{i=1}^m w_i(d_{i1}+d_{i2})\\ \text{s.t.}\ & x+d_{i1}-d_{i2}=a_i,\\ & d_{i1},d_{i2}\ge 0,\quad i=1,\dots,m.\end{aligned} \qquad (2)$$
Note that the condition $d_{i1}\times d_{i2}=0$, which is not included in the above formulation, is always satisfied at the optimal solution of problem (2). For example, let the current values of $d_{i1}, d_{i2}$ be $d'_{i1}, d'_{i2}$, where $d'_{i1}>d'_{i2}$ and $d'_{i2}\ne 0$. Then (for $w_i>0$) a solution which reduces the value of the objective function is given by $d_{i1}=d'_{i1}-d'_{i2}$ and $d_{i2}=0$. Problem (2) is a linear programming problem with 2m + 1 variables and m nontrivial constraints. The dual of problem (2) is:


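$$\begin{aligned}\max\ & \sum_{i=1}^m a_is_i - s_{m+1}\sum_{i=1}^m a_i\\ \text{s.t.}\ & \sum_{i=1}^m (s_i-s_{m+1})=0,\\ & -w_i\le s_i-s_{m+1}\le w_i,\quad i=1,\dots,m.\end{aligned}$$

By defining a new variable $z_i=s_i-s_{m+1}+\max_j[w_j]$, $i=1,\dots,m$, the dual becomes:

$$\begin{aligned}\max\ & \sum_{i=1}^m\bigl(z_i-\max_j[w_j]\bigr)a_i\\ \text{s.t.}\ & \sum_{i=1}^m\bigl(z_i-\max_j[w_j]\bigr)=0,\\ & \max_j[w_j]-w_i\le z_i\le w_i+\max_j[w_j],\quad i=1,\dots,m.\end{aligned} \qquad (3)$$

Problem (3) has all but one of its constraints of the bounded-variable type and can therefore be solved efficiently by linear programming techniques.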

See [8] for a proof that the function $W_k(x_k)=\sum_{i=1}^m w_i|x_k-a_{ik}|$, where k = 1, 2 for two-dimensional problems, is convex, and for the following optimality conditions: with the existing facilities numbered so that $a_{1k}\le a_{2k}\le\dots\le a_{mk}$, the optimum is characterized by an index t* at which

$$\sum_{i=1}^{t^*-1}w_i-\sum_{i=t^*}^m w_i<0, \qquad (4)$$

$$\sum_{i=1}^{t^*}w_i-\sum_{i=t^*+1}^m w_i\ge 0 \qquad (5)$$

are satisfied. If condition (5) is met as a strict inequality, then $x_k^*=a_{t^*k}$. If condition (5) is met as an equality, then $x_k^*\in[a_{t^*k},a_{t^*+1,k}]$. Based on the above conditions, [8] proposes an iterative procedure to determine t*.
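Conditions (4) and (5) translate directly into a one-pass weighted median rule. The sketch below, a hedged illustration rather than the procedure of [8], sorts the coordinates and returns the first index t* at which (5) holds; (4) then holds automatically because (5) failed at t* - 1. It returns an interval so that the equality case is covered.

```python
def weighted_median(coords, weights):
    """Return (lo, hi): every x in [lo, hi] minimises sum_i w_i |x - a_i|."""
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    total = sum(weights)
    left = 0  # sum of weights up to and including the current sorted index
    for pos, i in enumerate(order):
        left += weights[i]
        balance = left - (total - left)                # left-hand side of (5)
        if balance > 0:
            return coords[i], coords[i]                # strict inequality: x* = a_{t*}
        if balance == 0 and pos + 1 < len(order):
            return coords[i], coords[order[pos + 1]]   # equality: whole interval
    raise ValueError("need at least one facility with positive weight")


print(weighted_median([1, 4, 6, 9], [2, 1, 3, 1]))   # (6, 6)
print(weighted_median([1, 2, 3, 4], [1, 1, 1, 1]))   # (2, 3)
```

Applied to each coordinate separately, this gives the optimal location of the rectilinear problem (1) after a single sort.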


Uncertainty 


Uncertainty may appear in the destinations (also called regional demand) and in the weights. Regional demand is modeled by a continuous spatial distribution of one or more destinations. The location objective then corresponds to minimizing the expected value of the distance from the facility to the random destinations. Analytical evaluation of this integral-type objective is possible only in the simple cases of rectangular or circular regions, which, however, lead to objectives that are not easy to optimize [1]. Let us consider the single facility location problem with rectangular distances. Let each of the m existing facilities have random coordinates $(Y_{j1},Y_{j2})$, described by a bivariate normal probability density function $f(y_{j1},y_{j2})$. We want to find a facility location $(x_1^*,x_2^*)$ that minimizes the expected sum of weighted rectangular distances. The total expected cost is:


$$EW(X)=E_Y\Bigl[\sum_{j=1}^m w_j\,d(X,Y_j)\Bigr], \qquad (6)$$

where $Y_j=(Y_{j1},Y_{j2})$ and $Y=(Y_{11},Y_{12},\dots,Y_{m1},Y_{m2})$. It follows that:

$$\begin{aligned}EW(X)&=\sum_{j=1}^m w_j\,E_Y\bigl[d(X,Y_j)\bigr]\\&=\sum_{j=1}^m w_j\,E_{Y_{j1}}|x_1-Y_{j1}|+\sum_{j=1}^m w_j\,E_{Y_{j2}}|x_2-Y_{j2}|\\&=EW_1(x_1)+EW_2(x_2),\end{aligned}$$

which means that the location problem becomes:

$$\min EW(X)=\min_{x_1}EW_1(x_1)+\min_{x_2}EW_2(x_2). \qquad (7)$$

Using the density function we can evaluate each of these terms:

$$EW_1(x_1)=\sum_{j=1}^m w_j\int_{-\infty}^{\infty}|x_1-y_{j1}|\,f_{j1}(y_{j1})\,dy_{j1}, \qquad (8)$$

where $f_{j1}$ denotes the marginal density of $Y_{j1}$.


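Let $Y_{j1}$ have mean $\mu_{j1}$ and standard deviation $\sigma_{j1}$. Then:

$$f_{j1}(y_{j1})=\frac{1}{\sigma_{j1}\sqrt{2\pi}}\exp\Bigl(-\frac12\Bigl(\frac{y_{j1}-\mu_{j1}}{\sigma_{j1}}\Bigr)^2\Bigr). \qquad (9)$$

Since the derivative of $EW_1(x_1)$ is easily evaluated as

$$EW_1'(x_1)=\sum_{j=1}^m w_j\bigl(1-2P(Y_{j1}\ge x_1)\bigr)=\sum_{j=1}^m w_j\Bigl(2\Phi\Bigl(\frac{x_1-\mu_{j1}}{\sigma_{j1}}\Bigr)-1\Bigr), \qquad (10)$$

where Φ is the standard normal distribution function, we may use a method such as interval bisection to find $x_1^*$.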
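A minimal sketch of that bisection, assuming independent normal coordinates as in (9): $EW_1$ is convex, so its derivative (10) is nondecreasing and changes sign exactly at the optimum. Φ is computed with `math.erf`; the bracketing interval of ten standard deviations and the function names are illustrative choices.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dEW(x, w, mu, sigma):
    """Derivative (10): sum_j w_j * (2 * Phi((x - mu_j) / sigma_j) - 1)."""
    return sum(wj * (2.0 * phi((x - m) / s) - 1.0) for wj, m, s in zip(w, mu, sigma))


def optimal_coordinate(w, mu, sigma, tol=1e-10):
    """Bisection on the nondecreasing derivative of the convex function EW_1."""
    lo = min(m - 10 * s for m, s in zip(mu, sigma))   # derivative < 0 here
    hi = max(m + 10 * s for m, s in zip(mu, sigma))   # derivative > 0 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dEW(mid, w, mu, sigma) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two equally weighted destinations with means 0 and 10: by symmetry x1* = 5
print(round(optimal_coordinate([1, 1], [0.0, 10.0], [1.0, 1.0]), 6))   # 5.0
```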

Another approach to handling regional demand is to replace each region by a representative point, a centroid, and solve the resulting classical location problem [2]. Although this approach is relatively simple, it raises questions regarding the aggregation error introduced by the arbitrary way in which the centroids are selected. In the case of uncertainty in the weights, the most typical question is to determine all the points which may be optimal for some choice of weights. Assuming that the distribution of each weight is known, H.C.L.W. Williams [11] derived the probability of each destination being optimal. The degree of nonoptimality of a given site is also important information that can be extracted from the uncertainty analysis; it is useful when considering a possible relocation of a facility [5].


Constraints 


Many practical applications need methods able to han- 
dle feasible regions of any shape even disconnected 


ones. See [6], for a special method for location problems 
with /,-distance within a finite union of convex poly- 
gons; this method was subsequently extended in [9] to 
general objectives and arbitrary polygonal shaped feasi- 
ble regions. 


Dynamic Location 


Let us assume that a facility is expected to serve over r periods of time, during which it may be repeatedly relocated. The problem of dynamic facility location is to find a location for the single facility in each of the r periods. Let the weights $w_{jk}$ be the present value of the cost per unit distance between the new facility and existing facility j in period k, and let $c_k$ be the present value of the cost of relocating the facility in period k. The objective is then to find a series of locations $X_k=(x_{1k},x_{2k})$, k = 1, ..., r, that minimizes the present value of the location plan. The dynamic location problem to be considered is [8]:

$$\min\sum_{k=1}^r\sum_{j=1}^m w_{jk}\,d(X_k,a_j)+\sum_{k=2}^r c_kz_k, \qquad (11)$$

where $z_k=1$ if $X_k\ne X_{k-1}$ is allowed and $z_k=0$ otherwise. The variables $z_k$ indicate whether the facility is permitted to move from where it was in the previous period.
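To show how the trade-off in (11) between per-period transport cost and relocation cost $c_k$ works, here is a sketch under an extra assumption that is not part of the model: every $X_k$ is restricted to a finite list of candidate sites, so the problem can be solved by dynamic programming over periods. Distances are rectilinear, periods are indexed from 0 in the code, and all names and data are hypothetical.

```python
def dynamic_location(sites, facilities, w, c):
    """Minimise objective (11) when every X_k must be one of the candidate sites.

    w[k][j]: weight of existing facility j in period k (k = 0..r-1)
    c[k]:    cost of relocating at the start of period k (c[0] is unused)
    """
    def period_cost(k, s):
        return sum(wkj * (abs(s[0] - a[0]) + abs(s[1] - a[1]))   # rectilinear d
                   for wkj, a in zip(w[k], facilities))

    n = len(sites)
    best = [period_cost(0, sites[s]) for s in range(n)]   # cheapest plan ending at s
    back = []                                              # predecessor site per period
    for k in range(1, len(w)):
        arg = min(range(n), key=lambda s: best[s])         # best site to move from
        new, prev = [], []
        for s in range(n):
            stay, move = best[s], best[arg] + c[k]         # z_k = 0 versus z_k = 1
            new.append(min(stay, move) + period_cost(k, sites[s]))
            prev.append(s if stay <= move else arg)
        best = new
        back.append(prev)
    s = min(range(n), key=lambda i: best[i])
    total = best[s]
    plan = [s]
    for prev in reversed(back):
        s = prev[s]
        plan.append(s)
    return [sites[i] for i in reversed(plan)], total


sites = [(0, 0), (10, 0)]
demand = [[1, 0], [0, 1], [0, 1]]   # demand shifts from facility 0 to facility 1
print(dynamic_location(sites, sites, demand, [0, 3, 3]))    # ([(0, 0), (10, 0), (10, 0)], 3)
print(dynamic_location(sites, sites, demand, [0, 30, 30]))  # ([(10, 0), (10, 0), (10, 0)], 10)
```

With a cheap move the facility follows the demand; with an expensive one it stays put for the whole horizon.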


Objectives 


The problem of facility location is usually a multi-objective problem, since more than one objective has to be optimized simultaneously. The models described above produce a location in which only the objective of minimizing a distance function is considered. However, a number of alternative objectives have been considered in the literature; they can be classified into pull objectives, push objectives and balancing objectives [7]. The first category involves objectives that minimize the distance to the new facility, assuming that it is a desirable unit, whereas the push objectives maximize the corresponding distances, assuming the location of an undesirable unit. The balancing objectives try to balance the distances from the new facility to the existing ones. An alternative to the objective of minimizing distance is the maxmin objective [8], which has the following form:


$$\max_X f(X),\qquad f(X)=\min_{j=1,\dots,m} w_j\,d(X,a_j).$$


In order to incorporate more than one objective in the facility location problem, multi-objective optimization techniques should be applied [3]. The basic idea of these techniques is the systematic generation of the Pareto optimal solution set, i.e., the set of points at which one objective can be improved only at the expense of other objectives.
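The Pareto idea can be made concrete for a finite set of candidate locations. The sketch below keeps the nondominated candidates for two hypothetical objectives: a pull objective (total distance to the customers, minimized) and a push objective (distance to the nearest customer, maximized, hence negated). It is a plain dominance filter for illustration, not one of the generation techniques of [3].

```python
import math


def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the (label, objectives) pairs not dominated by any other candidate."""
    return [(lab, f) for lab, f in candidates
            if not any(dominates(g, f) for _, g in candidates)]


customers = [(0.0, 0.0), (4.0, 0.0)]                     # hypothetical data
sites = [(2.0, 0.0), (2.0, 3.0), (6.0, 0.0), (0.0, 0.0)]
candidates = []
for s in sites:
    dist = [math.dist(s, c) for c in customers]
    # objective 1 (pull): total distance; objective 2 (push): -min distance
    candidates.append((s, (sum(dist), -min(dist))))
print([lab for lab, _ in pareto_front(candidates)])      # [(2.0, 0.0), (2.0, 3.0)]
```

Neither surviving site is better in both objectives: (2, 0) is cheaper to serve from, while (2, 3) keeps more distance from the nearest customer.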
Introduction


Optimization problems for a finite-dimensional de- 
cision variable under infinitely many inequality con- 
straints are called semi-infinite (see [8,19] for reviews). 
Smoothing Methods for Semi-Infinite Optimization


For smoothing methods one considers them in the form

$$\mathrm{SIP}:\quad \min f(x)\quad\text{subject to}\quad g(x,y)\ge 0\ \text{for all}\ y\in Y,$$

with the objective function $f\in C^2(\mathbb{R}^n,\mathbb{R})$, the constraint function $g\in C^2(\mathbb{R}^n\times\mathbb{R}^m,\mathbb{R})$, and a nonempty and compact index set $Y\subset\mathbb{R}^m$. Moreover, Y is assumed to be described by finitely many inequality constraints,

$$Y=\{y\in\mathbb{R}^m \mid v(y)\ge 0\},$$

with $v\in C^2(\mathbb{R}^m,\mathbb{R}^s)$ and $s\in\mathbb{N}$. The feasible set of SIP is denoted by

$$M=\{x\in\mathbb{R}^n \mid g(x,y)\ge 0\ \text{for all}\ y\in Y\}.$$


A basic problem in semi-infinite optimization is to check whether a point $x\in\mathbb{R}^n$ is feasible, since this involves the verification of infinitely many inequality constraints.

The semi-infinite constraint in SIP is equivalent to

$$G(x):=\min_{y\in Y} g(x,y)\ge 0,$$

which means that the feasible set M is the upper level set $\{x \mid G(x)\ge 0\}$ of the, in general, nonsmooth function G. Smoothing methods try to replace G by a smooth function while keeping important properties of the original problem under this modification. More explicitly, one wishes that, under weak assumptions, a nonempty and compact feasible set M can be approximated arbitrarily well by a level set of a single smooth function with certain regularity properties. Moreover, there should be a correspondence between Karush-Kuhn-Tucker points of the original and of the smoothed problem, along with their Morse indices.


Smoothing Approaches Motivated 
by Nonlinear Programming 


A smoothing procedure for finite optimization problems is given in [12]. There the main idea is to use the logarithmic barrier approach to approximate finitely many inequality constraints $g_i(x)\ge 0$, $i\in I$, $|I|<\infty$, by one smooth and nondegenerate constraint $\sum_{i\in I}\ln(g_i(x))\ge\ln(\varepsilon)$ for $\varepsilon>0$. A similar approach is taken in [5] to smooth finite maximum functions.


However, obvious generalizations of this approach to 
semi-infinite programming are not successful. 

In fact, there are two standard arguments which 
connect SIP to finite optimization problems. First, 
a sufficiently fine discretization of the index set leads 
to an arbitrarily accurate outer approximation of M 
by finitely many inequality constraints which could, in 
a next step, be smoothed by the logarithmic barrier ap- 
proach. Unfortunately, the so-called second-order shift 
terms of semi-infinite programs are ignored by the dis- 
cretized problem, so correspondences of Morse indices 
cannot even be established between the original and the 
discretized problem, let alone the smoothed discretized 
problem. 
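The outer-approximation effect of discretization is easy to observe on a hypothetical one-dimensional example: $g(x,y)=x+\sin(2\pi y)$ on $Y=[0,1]$, for which $G(x)=x-1$ exactly (the minimizer is y = 3/4), so M = [1, ∞). A grid minimum can never be smaller than the true minimum, so the discretized feasible set contains M, but a coarse grid may also accept infeasible points.

```python
import math


def g(x, y):
    """Hypothetical constraint function; exactly, G(x) = min_y g(x, y) = x - 1."""
    return x + math.sin(2 * math.pi * y)


def G_grid(x, n):
    """Discretized G: minimum of g(x, .) over n equally spaced points of Y = [0, 1]."""
    return min(g(x, i / (n - 1)) for i in range(n))


x = 0.99                      # infeasible, since G(0.99) = -0.01 < 0
print(G_grid(x, 10) >= 0)     # True: the coarse grid misses the minimiser y = 3/4
print(G_grid(x, 101) >= 0)    # False: the grid now contains y = 0.75
```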

Second, assuming the so-called reduction ansatz at 
some point x € M, the feasible set can locally be de- 
scribed by finitely many smooth inequality constraints. 
The logarithmic barrier approach for this locally re- 
duced SIP is used in [10]. While Morse indices are mod- 
eled well in this approach, the assumption of the reduc- 
tion ansatz in the whole feasible set is not generic [16]. 

Another obvious generalization of the approach from finite programming is to use the barrier term $\int_Y \ln(g(x,y))\,dy$ directly for SIP. For infinite quadratic programming problems a related interior point approach is presented in [18]. For nonlinear SIP, however, this logarithmic barrier term is neither self-concordant nor does it necessarily enforce interior points, as an example from [10] shows. The main problem is that in some situations even the singularity of the logarithm is smoothed by the integral, and boundary points can become feasible for the approximation (note that $\int \ln(y)\,dy = y\ln(y) - y$ can be continuously extended to $y = 0$ with value $0$).
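The smoothing of the logarithmic singularity can be observed numerically. The Python sketch below assumes a hypothetical index set $Y = [0, 1]$ and $g(x, y) = (y - x)^2$. Then $\min_{y \in Y} g(x, y) = 0$, so the constraint is active for every $x \in [0, 1]$. A midpoint rule nevertheless returns a finite value of $\int_Y \ln(g(x,y))\,dy$. For $x = 1/2$ the exact value is $2(\ln\frac12 - 1) \approx -3.386$, obtained from the antiderivative quoted above.

```python
import math


def barrier_integral(x, n=100_000):
    """Midpoint-rule approximation of the integral of ln((y - x)^2) over y in [0, 1].

    Hypothetical lower-level data: Y = [0, 1] and g(x, y) = (y - x)^2.
    """
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * h  # for x = 0.5 and even n, no midpoint hits y == x
        total += math.log((y - x) ** 2)
    return total * h


value = barrier_integral(0.5)
exact = 2.0 * (math.log(0.5) - 1.0)  # from y*ln(y) - y, extended continuously to y = 0
print(value, exact)                  # both close to -3.386: the barrier term stays finite
```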


Smoothing Approaches for Semi-Infinite Programs 


An approximation of the feasible set in semi-infinite 
optimization by a quadratic distance function is pre- 
sented in [6]. While smoothness of the approximating 
problem is shown, it is inherently degenerate, so no re- 
sults on Morse indices can be expected from this ap- 
proach. 

A smoothing method for semi-infinite programs 
adhering to all the above criteria is given in [14,15]. 
There the function G is smoothed by mollification, as 
is explained in the remainder of this contribution. 


Smoothing Methods for Semi-Infinite Optimization 
Definitions


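The auxiliary function $G$ is the optimal value function of the so-called lower level problem
$$Q(x):\quad \min_{y \in \mathbb{R}^m} g(x,y) \quad \text{subject to} \quad v(y) \ge 0,$$
that is, $G(x) = \min_{y \in Y} g(x,y)$ with the index set $Y = \{ y \in \mathbb{R}^m \mid v(y) \ge 0 \}$.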

Points $x$ from the topological boundary $\partial M$ of $M$ satisfy $G(x) = 0$, and the corresponding globally minimal points of $Q(x)$ are denoted by
$$Y_0(x) = \{ y \in Y \mid g(x,y) = 0 \}.$$


The set $Y_0(x)$ is also called the active index set of $x$ for SIP.
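The definitions of $G$ and $Y_0(x)$ can be sketched on a hypothetical example with $n = m = 1$. The example takes $Y = [-1, 1]$ (described by $v(y) = 1 - y^2 \ge 0$) and $g(x, y) = x - y^2$. Then $G(x) = x - 1$ and $M = [1, \infty)$. Replacing $Y$ by a finite grid is an approximation made only for this illustration. At the boundary point $x = 1$ the active index set consists of the two points $y = \pm 1$.

```python
def g(x, y):
    # Hypothetical function g(x, y) defining the semi-infinite constraint g(x, y) >= 0
    return x - y * y


# Finite grid on Y = [-1, 1], an approximation of the compact index set
Y_GRID = [-1.0 + 2.0 * k / 2000 for k in range(2001)]


def optimal_value(x):
    """G(x) = min over y in Y of g(x, y), evaluated on the grid."""
    return min(g(x, y) for y in Y_GRID)


def active_index_set(x, tol=1e-12):
    """Y_0(x) = {y in Y : g(x, y) = 0}, up to a tolerance."""
    return [y for y in Y_GRID if abs(g(x, y)) <= tol]


print(optimal_value(1.0))      # 0.0: x = 1 lies on the boundary of M
print(active_index_set(1.0))   # [-1.0, 1.0]
print(optimal_value(1.5))      # 0.5: an interior point of M
```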


The Extended Mangasarian-Fromovitz Constraint 
Qualification 


A nice topological structure of $M$ at its boundary points can be guaranteed under constraint qualifications. Since $G$ is directionally differentiable [1], according to [21] the natural extension of the well-known and basic Mangasarian-Fromovitz constraint qualification [17] at a zero $\bar x$ of $G$ is
$$\{ d \in \mathbb{R}^n \mid G'(\bar x, d) > 0 \} \neq \emptyset.$$
With the formula for the directional derivative from [1], one obtains the following explicit condition, which is well known for semi-infinite programs [9,16].


Definition 1 At $\bar x \in M$ the extended Mangasarian-Fromovitz constraint qualification (EMFCQ) is said to hold if there exists some vector $d \in \mathbb{R}^n$ with
$$D_x g(\bar x, y)\, d > 0 \quad \text{for all } y \in Y_0(\bar x).$$
Here $D_x g$ denotes the row vector of partial derivatives of $g$ with respect to $x$. In [16] it is shown that EMFCQ holds generically in semi-infinite programming and is, thus, a weak assumption.


The Reduction Ansatz 
and Nondegenerate KKT Points 


For theoretical as well as numerical purposes it is of crucial importance to keep track of the elements of the active index set $Y_0(x)$ for varying $x$. Recall that each $y \in Y_0(\bar x)$ is a global minimizer of $Q(\bar x)$. The reduction ansatz [7,22] is said to hold at $\bar x \in M$ if all global minimizers of $Q(\bar x)$ are nondegenerate in the sense of Jongen et al. [11]. Since nondegenerate minimizers are isolated and $Y$ is a compact set, the closed set $Y_0(\bar x)$ can only contain finitely many points, say, $Y_0(\bar x) = \{\bar y^1, \dots, \bar y^p\}$ with $p \in \mathbb{N}$. By a result from [3] the local variation of these points with $x$ can be described by the implicit function theorem.

In fact, for $x$ locally around $\bar x$ there exist continuously differentiable functions $y^i(x)$, $1 \le i \le p$, with $y^i(\bar x) = \bar y^i$ such that $y^i(x)$ is the locally unique local minimizer of $Q(x)$ around $\bar y^i$. It turns out that the functions $G_i(x) := g(x, y^i(x))$ are even $C^2$ in a neighborhood of $\bar x$.

A major consequence of the reduction ansatz is the so-called reduction lemma: if the reduction ansatz holds at $\bar x$, then for all $x$ from a neighborhood $U$ of $\bar x$ we have
$$G(x) = \min_{1 \le i \le p} G_i(x).$$


This means that $M$ can locally be described by finitely many $C^2$ constraints, that is, SIP is locally equivalent to the smooth finite optimization problem
$$\mathrm{SIP}_{\mathrm{red}}:\quad \min_{x \in U} f(x) \quad \text{subject to} \quad G_i(x) \ge 0,\ 1 \le i \le p.$$
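The reduction lemma can be checked numerically on a hypothetical example with $x \in \mathbb{R}^2$, $Y = [-2, 2]$ and $g(x, y) = (y^2 - 1)^2 + x_1 y + x_2$. For small $|x_1|$ the lower level problem has two nondegenerate local minimizers near $y = \pm 1$. The Python sketch follows each of them with Newton's method, standing in for the implicit-function-theorem branches $y^i(x)$, and forms $G_i(x) = g(x, y^i(x))$. It then compares $\min_i G_i(x)$ with a brute-force grid minimum of $g(x, \cdot)$ over $Y$. The example data, the Newton iteration and the grid are assumptions made for illustration only.

```python
def g(x, y):
    # Hypothetical lower-level objective with two wells near y = -1 and y = +1
    return (y * y - 1.0) ** 2 + x[0] * y + x[1]


def dg_dy(x, y):
    return 4.0 * y ** 3 - 4.0 * y + x[0]


def d2g_dy2(y):
    return 12.0 * y * y - 4.0


def local_minimizer(x, y_start, iters=50):
    """Newton's method on dg/dy = 0, started at a nondegenerate minimizer of Q(0)."""
    y = y_start
    for _ in range(iters):
        y -= dg_dy(x, y) / d2g_dy2(y)
    return y


def reduced_constraints(x):
    """G_i(x) = g(x, y^i(x)) for the two local minimizer branches."""
    return [g(x, local_minimizer(x, y0)) for y0 in (-1.0, 1.0)]


def grid_optimal_value(x, n=200_000):
    """Brute-force G(x): minimum of g(x, y) over a grid on Y = [-2, 2]."""
    return min(g(x, -2.0 + 4.0 * k / n) for k in range(n + 1))


x = (0.3, 0.1)
G_i = reduced_constraints(x)
print(G_i, min(G_i), grid_optimal_value(x))  # min(G_i) and the grid value agree closely
```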


Examples show that the reduction ansatz cannot be expected to hold everywhere in the feasible set of a generic semi-infinite program [16]. As nondegeneracy in the sense of Jongen et al. is a local property, one can, however, define a nondegenerate KKT point of SIP via the locally reduced problem $\mathrm{SIP}_{\mathrm{red}}$. Let
$$L(x, \lambda) = f(x) - \sum_{i=1}^{p} \lambda_i G_i(x)$$
denote the Lagrangian of $\mathrm{SIP}_{\mathrm{red}}$ with multiplier vector $\lambda \in \mathbb{R}^p$.


Definition 2 A point $\bar x \in M$ is called a nondegenerate Karush-Kuhn-Tucker point of SIP if the reduction ansatz holds at $\bar x$ and if $\bar x$ is a nondegenerate Karush-Kuhn-Tucker point of $\mathrm{SIP}_{\mathrm{red}}$, that is, the following three conditions hold:

1. The linear independence constraint qualification holds at $\bar x$, and there exists a (unique) multiplier vector $\bar\lambda \ge 0$ with $D_x L(\bar x, \bar\lambda) = 0$.

2. The multipliers satisfy $\bar\lambda_i > 0$, $i = 1, \dots, p$.

3. The matrix $D_x^2 L(\bar x, \bar\lambda)|_{T_{\bar x} M_{\mathrm{red}}}$, that is, the Hessian of the Lagrangian restricted to the tangent space to $M_{\mathrm{red}}$ at $\bar x$, is nonsingular.

The number of negative eigenvalues of the matrix in condition 3 is called the Morse index of $\bar x$.
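Definition 2 can be illustrated on a hypothetical reduced problem with $n = 2$ and a single reduced constraint ($p = 1$). The example uses $f(x) = -x_1^2 + x_2$ and $G_1(x) = 1 - x_1^2 - x_2^2$ at $\bar x = (0, -1)$. The sketch solves $D_x L(\bar x, \lambda) = 0$ for the multiplier and checks $\bar\lambda_1 > 0$. It then evaluates the Hessian of the Lagrangian on the tangent space, which here is spanned by a single vector orthogonal to $\nabla G_1(\bar x)$. The data are assumptions for illustration only. The restricted Hessian is negative, so $\bar x$ is a nondegenerate KKT point with Morse index 1 rather than a local minimizer.

```python
# Hypothetical reduced problem at x_bar = (0, -1):
#   f(x) = -x1^2 + x2,   G1(x) = 1 - x1^2 - x2^2 >= 0   (p = 1, n = 2)
x_bar = (0.0, -1.0)

grad_f = (-2.0 * x_bar[0], 1.0)
grad_G = (-2.0 * x_bar[0], -2.0 * x_bar[1])
hess_f = ((-2.0, 0.0), (0.0, 0.0))
hess_G = ((-2.0, 0.0), (0.0, -2.0))

# Condition 1: D_x L = grad_f - lam * grad_G = 0, solved for lam in the least-squares sense
lam = (grad_f[0] * grad_G[0] + grad_f[1] * grad_G[1]) / (grad_G[0] ** 2 + grad_G[1] ** 2)
residual = (grad_f[0] - lam * grad_G[0], grad_f[1] - lam * grad_G[1])

# Hessian of the Lagrangian L(x, lam) = f(x) - lam * G1(x)
hess_L = [[hess_f[i][j] - lam * hess_G[i][j] for j in range(2)] for i in range(2)]

# Condition 3: restrict to the tangent space, spanned by t orthogonal to grad_G
t = (-grad_G[1], grad_G[0])
restricted = sum(t[i] * hess_L[i][j] * t[j] for i in range(2) for j in range(2))

nondegenerate = lam > 0 and restricted != 0.0   # conditions 2 and 3
morse_index = 1 if restricted < 0 else 0        # negative eigenvalues of a 1 x 1 matrix
print(lam, residual, restricted, nondegenerate, morse_index)
```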


For generic SIP all Karush-Kuhn-Tucker points are 
nondegenerate [20,24]. In this sense, nondegeneracy of 
KKT points for SIP is a weak assumption. 


Mollifiers 


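With the Euclidean norm $\|\cdot\|_2$ on $\mathbb{R}^n$ the standard mollifier [2] is the $C^\infty$ function
$$\eta(x) = \begin{cases} C \exp\left( \dfrac{1}{\|x\|_2^2 - 1} \right), & \|x\|_2 < 1, \\ 0, & \|x\|_2 \ge 1, \end{cases}$$
where $C > 0$ is chosen such that $\int_{\mathbb{R}^n} \eta(x)\,dx = 1$. For $\varepsilon > 0$ put
$$\eta_\varepsilon(x) = \frac{1}{\varepsilon^n}\, \eta\!\left( \frac{x}{\varepsilon} \right).$$
The function $\eta_\varepsilon$ is also $C^\infty$, it satisfies $\int_{\mathbb{R}^n} \eta_\varepsilon(x)\,dx = 1$, and its support is the closed ball $B(0,\varepsilon)$ with $B(0,\varepsilon) = \{ x \in \mathbb{R}^n \mid \|x\|_2 \le \varepsilon \}$.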


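Definition 3 For $\varepsilon > 0$ the $\varepsilon$-mollification of a locally integrable function $F: \mathbb{R}^n \to \mathbb{R}$ is the convolution $F^\varepsilon = \eta_\varepsilon * F$ on $\mathbb{R}^n$, that is,
$$F^\varepsilon(x) = \int_{\mathbb{R}^n} \eta_\varepsilon(x - z)\, F(z)\, dz = \int_{B(0,\varepsilon)} \eta_\varepsilon(z)\, F(x - z)\, dz$$
for all $x \in \mathbb{R}^n$.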


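Theorem 1 ([2])

1. For all $\varepsilon > 0$, the $\varepsilon$-mollification $F^\varepsilon$ is in $C^\infty(\mathbb{R}^n, \mathbb{R})$.

2. If $F$ is continuous on $\mathbb{R}^n$, then $F^\varepsilon$ converges to $F$ uniformly on compact sets for $\varepsilon \to 0$.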
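For $n = 1$, the mollifier and the $\varepsilon$-mollification can be approximated with a midpoint quadrature rule, as in the following Python sketch. It computes the normalizing constant $C$ numerically and mollifies the continuous but nonsmooth function $F(x) = |x|$. The quadrature, its resolution and the choice of $F$ are assumptions made for illustration. Because $\eta_\varepsilon$ is symmetric, $F^\varepsilon(x) = |x|$ whenever $|x| \ge \varepsilon$, while $0 < F^\varepsilon(0) \le \varepsilon$. The uniform error therefore shrinks with $\varepsilon$, in line with Theorem 1.

```python
import math

N = 4000  # number of quadrature nodes on the support of the mollifier


def eta_unnormalized(x):
    """exp(1 / (x^2 - 1)) on (-1, 1), zero elsewhere (n = 1)."""
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1.0 else 0.0


# Midpoint nodes on (-1, 1) and the normalizing constant C with integral(eta) = 1
NODES = [-1.0 + (j + 0.5) * 2.0 / N for j in range(N)]
C = 1.0 / (sum(eta_unnormalized(u) for u in NODES) * 2.0 / N)


def eta_eps(x, eps):
    """eta_eps(x) = eta(x / eps) / eps, supported on [-eps, eps]."""
    return C * eta_unnormalized(x / eps) / eps


def mollify(F, x, eps):
    """F^eps(x) = integral over B(0, eps) of eta_eps(z) F(x - z) dz (midpoint rule)."""
    h = 2.0 * eps / N
    return sum(eta_eps(eps * u, eps) * F(x - eps * u) for u in NODES) * h


for eps in (0.5, 0.1, 0.02):
    print(eps, mollify(abs, 0.0, eps), mollify(abs, 0.7, eps))
```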


Formulation 


To formulate the smoothing method, the following 
three weak assumptions are made in [14,15]. 


Assumption 1 The feasible set M of SIP is nonempty 
and compact. 


Assumption 2 The EMFCQ holds everywhere in M. 


Assumption 3 All KKT points of SIP are nondegener- 
ate. 


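The smoothing approach is based on the mollification of the optimal value function $G$:
$$G^\varepsilon(x) = \int_{B(0,\varepsilon)} \eta_\varepsilon(z)\, G(x - z)\, dz.$$
In view of Theorem 1, the function $G^\varepsilon$ is $C^\infty$ for each $\varepsilon > 0$, and $G^\varepsilon$ converges to $G$ uniformly on compact sets for $\varepsilon \to 0$.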

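Intuitively, for sufficiently small $\varepsilon > 0$ the set
$$M^\varepsilon = \{ x \in \mathbb{R}^n \mid G^\varepsilon(x) \ge 0 \}$$
and the smooth finite optimization problem
$$\mathrm{SIP}^\varepsilon:\quad \min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad G^\varepsilon(x) \ge 0$$
should be strongly related to $M$ and SIP, respectively. This statement is made precise in the following theorems.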
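A rough numerical picture of $G^\varepsilon$ and $M^\varepsilon$ can be obtained for a hypothetical one-dimensional SIP with $Y = [0, 1]$ and $g(x, y) = y\,x + (1 - y)(1 - x)$. Then $G(x) = \min\{x, 1 - x\}$ and $M = [0, 1]$. The Python sketch below evaluates $G$ on a grid of $Y$ and mollifies it with a normalized midpoint rule; both discretizations are chosen only for illustration. Since $G$ is concave and $\eta_\varepsilon$ is a symmetric probability density, $G^\varepsilon \le G$, so the kink at $x = 1/2$ is smoothed from below. On the other hand, $G^\varepsilon(0) = G(0) = 0$ for $\varepsilon < 1/2$, because $G$ is affine near $0$.

```python
import math

N = 2000
NODES = [-1.0 + (j + 0.5) * 2.0 / N for j in range(N)]   # midpoints of (-1, 1)
Y_GRID = [k / 50 for k in range(51)]                      # finite grid on Y = [0, 1]


def eta_unnormalized(u):
    """Standard mollifier on R^1 without its normalizing constant."""
    return math.exp(1.0 / (u * u - 1.0)) if abs(u) < 1.0 else 0.0


WEIGHTS = [eta_unnormalized(u) for u in NODES]
TOTAL = sum(WEIGHTS)   # dividing by TOTAL gives the discrete kernel mass 1


def g(x, y):
    # Hypothetical lower-level function, affine in y
    return y * x + (1.0 - y) * (1.0 - x)


def G(x):
    """Optimal value function min_y g(x, y) on the grid; equals min(x, 1 - x)."""
    return min(g(x, y) for y in Y_GRID)


def G_eps(x, eps):
    """Midpoint-rule approximation of the eps-mollification of G at x."""
    return sum(w * G(x - eps * u) for w, u in zip(WEIGHTS, NODES)) / TOTAL


eps = 0.02
for x in (-0.05, 0.0, 0.25, 0.5, 1.0):
    value = G_eps(x, eps)
    print(x, G(x), value, value >= -1e-12)   # last column: does x belong to M^eps?
```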


Theorem 2 ([14]) $M^\varepsilon$ converges to $M$ in the Hausdorff distance for $\varepsilon \to 0$.


Theorem 3 ([14]) For all sufficiently small $\varepsilon > 0$, EMFCQ holds everywhere in the set $M^\varepsilon$.


Theorem 4 ([14]) For all sufficiently small $\varepsilon > 0$, the set $M^\varepsilon$ is homeomorphic with $M$.


Theorem 5 ([14,15])

1. The set $\mathrm{KKT}(f, M)$ of Karush-Kuhn-Tucker points of SIP is finite.

2. For each $\bar x \in \mathrm{KKT}(f, M)$ let $U(\bar x)$ be some neighborhood of $\bar x$. Then outside the sets $U(\bar x)$, $\bar x \in \mathrm{KKT}(f, M)$, the problem $\mathrm{SIP}^\varepsilon$ has no KKT points for sufficiently small $\varepsilon > 0$.

3. The neighborhoods $U(\bar x)$, $\bar x \in \mathrm{KKT}(f, M)$, from part 2 can be chosen such that each $U(\bar x)$ contains exactly one KKT point $x^\varepsilon$ of $\mathrm{SIP}^\varepsilon$ for sufficiently small $\varepsilon > 0$. Moreover, $x^\varepsilon$ is nondegenerate, and the Morse index of $\bar x$ in SIP and the Morse index of $x^\varepsilon$ in $\mathrm{SIP}^\varepsilon$ coincide.


Conclusions 


As an application of smoothing by mollifiers, Jongen and Stein [14] showed an important topological property of semi-infinite optimization problems. Specifically, assume that at any $x \in M$ one can define ascent and descent directions for $f$. Then these define ascent and descent flows for $f$, respectively. For compact $M$ suppose that all local minima and maxima of $f$ on $M$ are isolated critical points. Starting in a neighborhood of a local minimum, one follows the ascent flow and might reach a local maximum. From there one steps downwards via the descent flow and might reach a local minimum. This latter minimum may differ from the former one, and the procedure is then repeated. In this way one obtains a kind of "bang-bang" path in $M$ which connects certain local minima and local maxima. The main question is whether all local minima can be reached via such a bang-bang strategy or, equivalently, whether a certain "min-max digraph" is strongly connected [12]. Of course, $M$ has to be assumed to be connected, since only local information is used.
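Once a min-max digraph is known, its strong connectivity is a purely combinatorial question. In the following Python sketch the nodes and edges are hypothetical. An edge from a local minimum to a local maximum means that the ascent flow started near the minimum reaches that maximum. An edge from a maximum to a minimum refers to the descent flow in the same way. A digraph is strongly connected if and only if every node is reachable from one fixed node both in the digraph and in its reverse.

```python
from collections import deque


def reachable(start, edges):
    """Set of nodes reachable from start, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in edges.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def strongly_connected(nodes, edges):
    """True if every node can reach every other node."""
    if not nodes:
        return True
    start = next(iter(nodes))
    reverse = {}
    for u, succs in edges.items():
        for v in succs:
            reverse.setdefault(v, []).append(u)
    return reachable(start, edges) >= set(nodes) and reachable(start, reverse) >= set(nodes)


# Hypothetical min-max digraph: ascent edges min -> max, descent edges max -> min
nodes = {"min1", "min2", "max1", "max2"}
edges = {"min1": ["max1"], "max1": ["min2"], "min2": ["max2"], "max2": ["min1"]}
print(strongly_connected(nodes, edges))          # True

edges_broken = {"min1": ["max1"], "max1": ["min2"], "min2": ["max2"], "max2": ["min2"]}
print(strongly_connected(nodes, edges_broken))   # False: min1 is never reached again
```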

Even for finitely many constraints, in general the 
answer to the latter question is negative. A two- 
dimensional counterexample was given in [23], and 
the general mechanism which generates obstructions is 
presented in [4]. On the other hand, a special global 
adaptation of the metric, constructed in [12], gives 
a positive result. Moreover, Jongen and Stein [13] pre- 
sented an automatic adaptation of the metric based on 
local information which generically gives a positive re- 
sult. 

Smoothing by mollifiers allows one to derive a similar result for SIP. In fact, for generic Riemannian metrics and sufficiently small $\varepsilon > 0$ the corresponding min-max digraph of $\mathrm{SIP}^\varepsilon$ is strongly connected [14]. In view of Theorem 5 the corresponding KKT points (especially the local minima and maxima) of $\mathrm{SIP}^\varepsilon$ are arbitrarily close to those of the unperturbed SIP. This shows that SIP can be approximated arbitrarily well by a smooth finite problem $\mathrm{SIP}^\varepsilon$ with a strongly connected min-max digraph.
Smooth optimization problems can be considered in the form
$$\begin{aligned} \min\ & f(x) \\ \text{s.t.}\ & h_j(x) = 0, \quad j = 1, \dots, n-k, \\ & g_i(x) \le 0, \quad i = 1, \dots, m, \\ & x \in \mathbb{R}^n, \end{aligned} \qquad (1)$$
where $k > 0$, $\mathbb{R}^n$ is the $n$-dimensional Euclidean space, $f$, $h_j$, $j = 1, \dots, n-k$, and $g_i$, $i = 1, \dots, m$, are at least twice continuously differentiable real valued functions, and the aim is to find a solution point and/or the optimal value of the objective function $f$. Note that the underlying space can be more general than the Euclidean one, e.g., a Hilbert space. A smooth optimization problem (1) is nonlinear if the objective function $f$ or at least one constraint function is nonlinear. Problem (1) is nonconvex if at least one function from $f$, $g_i$, $i = 1, \dots, m$, is not convex or at least one function from $h_j$, $j = 1, \dots, n-k$, is not affine. An important class of problems (1) is that of the unconstrained optimization problems, where the constraints $g_i$ and $h_j$ are not present or where every point in the domain of $f$ is feasible, i.e., satisfies the constraints. If the number of constraints is infinite, then (1) results in semi-infinite optimization problems, and if the variables are restricted to a subset of the integers, then integer optimization problems are obtained. Since maximizing $f$ is equivalent to minimizing $-f$, only minimization problems need to be considered in (1), without loss of generality. The practical applications of nonlinear optimization are extremely broad; moreover, smooth nonlinear optimization is well suited both to structural investigations and to efficient computation.

Problem (1) can be considered a representation of models providing tools to describe real-life constraints of different types. For theoretical investigations, other representations could be helpful. Let $h$ denote the map from $\mathbb{R}^n$ into $\mathbb{R}^{n-k}$ with components $h_j$, $j = 1, \dots, n-k$; furthermore, assume that the following regularity condition holds: $0$ is a regular value of $h$, i.e., the Jacobian matrix $Jh(x) \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^{n-k})$ of $h$ at $x$ is of full rank $(n-k)$ for all $x \in M = \{ x \in \mathbb{R}^n : h_j(x) = 0,\ j = 1, \dots, n-k \}$. Under this assumption, the feasible set
$$A = \{ x \in M : g_i(x) \le 0,\ i = 1, \dots, m \} \qquad (2)$$
is a subset of the $k$-dimensional submanifold $M$ of class $C^2$ in $\mathbb{R}^n$, which can be endowed with a Riemannian metric (e.g., the one induced by the Euclidean structure of $\mathbb{R}^n$). Assume, furthermore, that $A$ is connected.
In order to better see the structure of problem (1), we reformulate it into the following form:
$$\min f(x) \quad \text{s.t.} \quad x \in A \subseteq M \subseteq \mathbb{M}, \qquad (3)$$
where $M$ is a $k$-dimensional Riemannian manifold and $\mathbb{M}$ is the $n$-dimensional differentiable manifold $\mathbb{R}^n$ endowed with the Riemannian metric $G_1(x) = I$, $x \in \mathbb{R}^n$, which induces the Riemannian metric of $M$ defined as the restriction of the $n \times n$ identity matrix to all the tangent spaces of $M$. The speciality of problem (1) is that the representation of the manifold $M$ is not a curvilinear coordinate system in the sense of differential geometry and the essential condition $M \subseteq \mathbb{M}$ holds, which motivates the investigation of the common curvilinear coordinate representations of $\mathbb{M}$ and $M$ from the point of view of nonlinear optimization.

A local minimizer of problem (1) or (3) is a feasible point $x^* \in A$ such that $f(x^*) \le f(x)$ for all $x$ in a feasible neighborhood of $x^*$. If $f(x^*) \le f(x)$ for all feasible $x$, then $x^*$ is called a global minimizer. If $x^*$ is a local minimizer and $f(x^*) < f(x)$ for all $x \neq x^*$ in a feasible neighborhood of $x^*$, then $x^*$ is called a strict local minimizer. If $x^*$ is the only local minimizer in some feasible neighborhood of $x^*$, it is called an isolated local minimizer.

A fundamental result due to K. Weierstrass is the fact that a continuous function $f$ attains a global minimum on a nonempty and compact feasible region $A$. If $f$ is once continuously differentiable and $x^*$ is a local unconstrained minimizer, then the gradient $\nabla f(x^*) = 0$. If $f$ is twice continuously differentiable and $x^*$ is a local unconstrained minimizer, then $\nabla f(x^*) = 0$ and the Hessian matrix $Hf(x^*)$ is positive semidefinite. If $\nabla f(x^*) = 0$ and $Hf(x^*)$ is positive definite, then $x^*$ is an isolated (hence, also strict) local minimizer.
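The unconstrained first- and second-order conditions can be checked numerically. The following Python sketch approximates $\nabla f$ and $Hf$ by central finite differences and tests positive definiteness by attempting a Cholesky factorization. The test functions, step sizes and tolerances are assumptions for illustration, and finite differences only give approximate answers.

```python
import math


def gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = len(x)

    def shifted(i, j, si, sj):
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return f(y)

    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1) - shifted(i, j, -1, 1)
              + shifted(i, j, -1, -1)) / (4.0 * h * h) for j in range(n)] for i in range(n)]


def is_positive_definite(H):
    """A symmetric matrix is positive definite iff its Cholesky factorization succeeds."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def f(x):
    # Hypothetical test function with minimizer x* = (1, -0.5)
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2 + 0.5 * (x[0] - 1.0) * (x[1] + 0.5)


def saddle(x):
    return x[0] ** 2 - x[1] ** 2


x_star = [1.0, -0.5]
print(gradient(f, x_star), is_positive_definite(hessian(f, x_star)))
print(is_positive_definite(hessian(saddle, [0.0, 0.0])))
```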

In the case of a finite number of equalities, the lo- 
cal optimality conditions are deduced by the Lagrange 
multiplier rule, [9,10]. This classical rule was indepen- 
dently extended to constraints including a finite num- 
ber of equalities and inequalities in [2,3,6,7,8]. A trans- 
parent description of the smooth local optimality con- 
ditions can be found, e. g., in [1,4,5,11]. By improving 
these rules [13], global optimality conditions can be ob- 
tained which are formulated as follows. 

Let $x^* \in A$ be a given point, $I(x^*)$ the index set of the active inequality constraints at $x^*$, $|I(x^*)|$ the number of active constraints, and $g_{I(x^*)}: \mathbb{R}^n \to \mathbb{R}^{|I(x^*)|}$ the mapping of the active constraints at $x^*$. (An inequality constraint is active at $x^*$ if equality holds.)

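Let us introduce:

• $M_{x^*}$ as the set
$$M_{x^*} = \left\{ (x, z) \in \mathbb{R}^{n + |I(x^*)|} \;\middle|\; \begin{array}{ll} h_j(x) = 0, & j = 1, \dots, n-k, \\ g_i(x) + \tfrac{1}{2} z_i^2 = 0, & i \in I(x^*) \end{array} \right\}, \qquad (4)$$

• the set $TM_{x^*}(x, z)$:
$$TM_{x^*}(x, z) = \left\{ (v_1, v_2) \in \mathbb{R}^{n + |I(x^*)|} \;\middle|\; \begin{array}{ll} \nabla h_j(x)\, v_1 = 0, & j = 1, \dots, n-k, \\ \nabla g_i(x)\, v_1 + z_i v_{2i} = 0, & i \in I(x^*) \end{array} \right\}, \quad (x, z) \in M_{x^*}, \qquad (5)$$

• a regularity condition
$$\operatorname{rank} \begin{pmatrix} Jh(x) & 0 \\ Jg_{I(x^*)}(x) & D_z \end{pmatrix} = n - k + |I(x^*)|, \quad (x, z) \in M_{x^*}, \qquad (6)$$

where $Jh$ and $Jg_{I(x^*)}$ are the Jacobian matrices of the mappings $h: \mathbb{R}^n \to \mathbb{R}^{n-k}$ and $g_{I(x^*)}: \mathbb{R}^n \to \mathbb{R}^{|I(x^*)|}$, respectively, and $D_z = \operatorname{diag}(z_1, \dots, z_{|I(x^*)|})$ is the diagonal matrix with the components of the vector $z$.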
Here, problems satisfying (6), i.e., regular problems, are considered, for which the inequality $n > k$ holds. It is well known that Lagrange multipliers need not exist at a local minimum of an irregular problem. Instead of (1) or (3), let us consider the problem
$$\min_{(x, z) \in M_{x^*}} f(x). \qquad (7)$$
A point $x^* \in A$ is a local optimal solution of problem (1) if and only if $(x^*, 0) \in M_{x^*}$ is a local optimal solution of (7). Moreover, the orthogonal projection of $M_{x^*}$ to $\mathbb{R}^n$ with respect to the Euclidean metric contains $A$, so a point $x^* \in A$ is a global optimal solution of problem (1) if $(x^*, 0) \in M_{x^*}$ is a global optimal solution of (7). Therefore we deal with this latter problem only.
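The regularity condition (6) can be checked by computing the rank of the block matrix at a point of $M_{x^*}$. The Python sketch below does this by Gaussian elimination with a tolerance, for two hypothetical examples with $n = 2$. The regular example has the equality $h(x) = x_1 + x_2 - 1$ and the active inequality $g_1(x) = -x_1 \le 0$ at $x^* = (0, 1)$, $z = 0$. The irregular example has no equalities ($k = n$) and the two active inequalities $g_1(x) = x_2 \le 0$, $g_2(x) = -x_2 \le 0$ at $x^* = (0, 0)$, $z = (0, 0)$.

```python
def matrix_rank(rows, tol=1e-12):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    m, n = len(a), len(a[0]) if a else 0
    rank, col = 0, 0
    while rank < m and col < n:
        pivot = max(range(rank, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= tol:
            col += 1
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, m):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n):
                a[r][c] -= factor * a[rank][c]
        rank += 1
        col += 1
    return rank


def regularity_matrix(Jh, Jg, z):
    """Block matrix [[Jh, 0], [Jg, D_z]] from condition (6); Jh, Jg are lists of rows."""
    p = len(z)
    top = [row + [0.0] * p for row in Jh]
    bottom = [row + [z[i] if c == i else 0.0 for c in range(p)] for i, row in enumerate(Jg)]
    return top + bottom


# Regular example: n = 2, k = 1, one equality and one active inequality
A = regularity_matrix(Jh=[[1.0, 1.0]], Jg=[[-1.0, 0.0]], z=[0.0])
print(matrix_rank(A), "required:", 1 + 1)

# Irregular example: n = k = 2, two active inequalities with opposite gradients
B = regularity_matrix(Jh=[], Jg=[[0.0, 1.0], [0.0, -1.0]], z=[0.0, 0.0])
print(matrix_rank(B), "required:", 0 + 2)
```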

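Let the Lagrangian function associated with $f$ and $M_{x^*}$ be defined as
$$L(x, z, \mu(x), \lambda(x, z)) = f(x) - \sum_{j=1}^{n-k} \mu_j(x)\, h_j(x) - \sum_{i \in I(x^*)} \lambda_i(x, z) \left( g_i(x) + \tfrac{1}{2} z_i^2 \right), \quad (x, z) \in M_{x^*}, \qquad (8)$$
$$\mu: \mathbb{R}^n \to \mathbb{R}^{n-k}, \qquad \lambda: \mathbb{R}^{n + |I(x^*)|} \to \mathbb{R}^{|I(x^*)|},$$
where
$$\mu(x)^T = \nabla f(x)\, Jh(x)^T \left[ Jh(x)\, Jh(x)^T \right]^{-1},$$
$$\lambda(x, z)^T = \nabla f(x)\, Jg_{I(x^*)}(x)^T \left[ Jg_{I(x^*)}(x)\, Jg_{I(x^*)}(x)^T + D_z^2 \right]^{-1}.$$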
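For equality constraints, $\mu(x)$ is the least-squares solution of $Jh(x)^T \mu = \nabla f(x)^T$. The vector $\nabla f(x) - \mu(x)^T Jh(x)$ is then the orthogonal projection of the gradient onto the tangent space of $M$. The Python sketch below illustrates this for the hypothetical problem $f(x) = x_1^2 + x_2^2$, $h(x) = x_1 + x_2 - 1$ (so $n = 2$, $k = 1$). The example and the small linear solver are assumptions for illustration.

```python
def solve(A, b):
    """Solve the small nonsingular linear system A u = b by Gaussian elimination."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    u = [0.0] * n
    for i in reversed(range(n)):
        u[i] = (M[i][n] - sum(M[i][k] * u[k] for k in range(i + 1, n))) / M[i][i]
    return u


def multipliers(grad_f, Jh):
    """mu with mu^T = grad_f Jh^T [Jh Jh^T]^{-1}, i.e. [Jh Jh^T] mu = Jh grad_f^T."""
    JJt = [[sum(a * b for a, b in zip(ri, rj)) for rj in Jh] for ri in Jh]
    rhs = [sum(a * b for a, b in zip(row, grad_f)) for row in Jh]
    return solve(JJt, rhs)


def geodesic_gradient(grad_f, Jh):
    """grad_f - sum_j mu_j grad h_j, the projection of grad_f onto the tangent space."""
    mu = multipliers(grad_f, Jh)
    return [gi - sum(mu[j] * Jh[j][i] for j in range(len(Jh))) for i, gi in enumerate(grad_f)]


def grad_f_at(x):
    return [2.0 * x[0], 2.0 * x[1]]   # hypothetical f(x) = x1^2 + x2^2


Jh = [[1.0, 1.0]]                      # hypothetical h(x) = x1 + x2 - 1 (constant Jacobian)

print(multipliers(grad_f_at([0.5, 0.5]), Jh))         # [1.0]
print(geodesic_gradient(grad_f_at([0.5, 0.5]), Jh))   # [0.0, 0.0]: a stationary point on M
print(geodesic_gradient(grad_f_at([1.0, 0.0]), Jh))   # [1.0, -1.0]: tangent to M
```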


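Let the geodesic gradient vector and the geodesic Hessian matrix of the Lagrangian function (8) be defined as
$$\nabla^G_x L(x, z, \mu(x), \lambda(x, z)) = \nabla f(x) - \sum_{j=1}^{n-k} \mu_j(x) \nabla h_j(x) - \sum_{i \in I(x^*)} \lambda_i(x, z) \nabla g_i(x), \quad (x, z) \in M_{x^*},$$
$$\nabla^G_z L(x, z, \mu(x), \lambda(x, z)) = - \sum_{i \in I(x^*)} \lambda_i(x, z)\, z_i e_i^T, \quad (x, z) \in M_{x^*},$$
where $e_i$, $i = 1, \dots, |I(x^*)|$, are the unit vectors, and
$$H^G_{(x,z)} L(x, z, \mu(x), \lambda(x, z)) = \left. \begin{pmatrix} Hf(x) - \sum_{j=1}^{n-k} \mu_j(x)\, Hh_j(x) - \sum_{i \in I(x^*)} \lambda_i(x, z)\, Hg_i(x) & 0 \\ 0 & -D_\lambda \end{pmatrix} \right|_{TM_{x^*}}, \quad (x, z) \in M_{x^*}, \qquad (9)$$
where the symbol $|_{TM_{x^*}}$ denotes the restriction to the tangent spaces of $M_{x^*}$ and $D_\lambda$ is the diagonal matrix with components $\lambda_i(x, z)$, $i = 1, \dots, |I(x^*)|$, at $(x, z)$.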

Now, the global Lagrange multiplier rule is formulated for the case of equality and inequality constraints. First, the definition of geodesic convex sets is recalled, where the geodesic is used in the classical meaning. If $M$ is a Riemannian manifold, then a set $A \subseteq M$ is geodesic convex if any two points of $A$ are joined by a geodesic belonging to $A$; moreover, a singleton is geodesic convex. It is emphasized that every Riemannian metric generates a geodesic convexity notion. In optimization theory related to the Lagrange multiplier rule, the induced Riemannian metric seems to be the most important.


Theorem 1 (Global Lagrange multiplier rule) If the point $(x^*, 0) \in M_{x^*}$ is a (strict) local or global minimum of problem (7), then
$$\nabla^G_{(x,z)} L(x^*, 0, \mu(x^*), \lambda(x^*, 0)) = 0,$$
$$(v_1, v_2)^T H^G_{(x,z)} L(x^*, 0, \mu(x^*), \lambda(x^*, 0))\, (v_1, v_2) \ge (>)\ 0, \quad (v_1, v_2) \in TM_{x^*}(x^*, 0) \quad ((v_1, v_2) \neq 0).$$
If $\widehat{A} \subseteq M_{x^*}$ is an open geodesic convex set with respect to the induced Riemannian metric and
$$\nabla^G_{(x,z)} L(x^*, 0, \mu(x^*), \lambda(x^*, 0)) = 0,$$
$$(v_1, v_2)^T H^G_{(x,z)} L(x, z, \mu(x), \lambda(x, z))\, (v_1, v_2) \ge (>)\ 0, \quad (v_1, v_2) \in TM_{x^*}(x, z) \quad ((v_1, v_2) \neq 0), \quad (x, z) \in \widehat{A}, \qquad (10)$$
then the point $(x^*, 0)$ is a (strict) global minimum of the function $f$ on $\widehat{A}$. Moreover,
$$\nabla^G_{(x,z)} L(x, z, \mu(x), \lambda(x, z)) = Df(x, z), \quad (x, z) \in M_{x^*}, \qquad (11)$$
$$H^G_{(x,z)} L(x, z, \mu(x), \lambda(x, z)) = D^2 f(x, z), \quad (x, z) \in M_{x^*}, \qquad (12)$$
where $Df$ and $D^2 f$ are the first and second covariant derivatives of the function $f$ with respect to the induced Riemannian metric of the manifold $M_{x^*}$, respectively.
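The conditions of Theorem 1 can be checked by hand on a small hypothetical problem. Minimize $f(x) = x_1^2 + x_2^2$ subject to the single inequality $g_1(x) = 1 - x_1 - x_2 \le 0$, which is active at $x^* = (1/2, 1/2)$, so that $M_{x^*} = \{(x, z) : 1 - x_1 - x_2 + \frac12 z^2 = 0\}$. The Python sketch first computes the multiplier from the $x$-part of the stationarity condition. It then forms the block Hessian of (9) at $(x^*, 0)$ and restricts it to a basis of $TM_{x^*}(x^*, 0)$. The example and the choice of basis are assumptions for illustration. The multiplier comes out negative, which matches the sign convention $g_i(x) \le 0$ used in (8), and the $z$-block $-D_\lambda$ is then positive.

```python
x_star = (0.5, 0.5)
z_star = 0.0

grad_f = (2.0 * x_star[0], 2.0 * x_star[1])   # hypothetical f(x) = x1^2 + x2^2
grad_g = (-1.0, -1.0)                          # hypothetical g1(x) = 1 - x1 - x2
hess_f = ((2.0, 0.0), (0.0, 2.0))
hess_g = ((0.0, 0.0), (0.0, 0.0))

# x-part of the geodesic gradient: grad f - lam * grad g1 = 0 (least squares for lam)
lam = sum(a * b for a, b in zip(grad_f, grad_g)) / sum(b * b for b in grad_g)
grad_x = [grad_f[i] - lam * grad_g[i] for i in range(2)]
grad_z = -lam * z_star                         # z-part of the geodesic gradient

# Block Hessian (9) in the variables (x1, x2, z)
H = [[hess_f[i][j] - lam * hess_g[i][j] for j in range(2)] + [0.0] for i in range(2)]
H.append([0.0, 0.0, -lam])

# Basis of TM(x*, 0): grad g1 . v1 + z * v2 = 0 with z = 0, i.e. v1 orthogonal to (1, 1)
basis = [(1.0, -1.0, 0.0), (0.0, 0.0, 1.0)]
R = [[sum(u[i] * H[i][j] * v[j] for i in range(3) for j in range(3)) for v in basis]
     for u in basis]

# A symmetric 2 x 2 matrix is positive definite iff its leading principal minors are positive
positive_definite = R[0][0] > 0 and R[0][0] * R[1][1] - R[0][1] * R[1][0] > 0
print(lam, grad_x, grad_z, R, positive_definite)
```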


Because of the linear independence of the active gradients, the first order optimality condition is equivalent to the classical one, and in the case of equality constrained problems, the second order optimality conditions also coincide with the classical ones at the stationary points. Moreover, a geodesic convex feasible neighborhood always exists around a stationary point in a Riemannian manifold, so this latter condition does not impose a new restriction. If inequality constraints are present, then the classical optimality conditions and the nonpositivity of the Lagrange multipliers at the stationary points can be deduced from (5) and (9). In this approach, neither Farkas' lemma in the necessary part nor the strict complementarity assumption ($\lambda_i(x^*, 0) \neq 0$, $i \in I(x^*)$) in the sufficiency part is used; they are replaced by the regularity condition and the smoothness of the functions.

By the global Lagrange multiplier rule, the necessary and the sufficient optimality conditions are given by the same tensor formulae based on the first and second covariant derivatives; only the domains where the second order formula holds are different. The second order conditions (10) define a class of functions on geodesic convex sets with respect to the induced Riemannian metric, namely the geodesic convex functions with respect to the same metric, introduced in optimization theory in [12]. Recall that if $M$ is a Riemannian manifold and $A \subseteq M$ a geodesic convex set, then a function $f: A \to \mathbb{R}$ is geodesic (strictly) convex if its restrictions to all geodesic arcs belonging to $A$ are (strictly) convex in the arc length parameter. From the point of view of geometry, the existence of a constrained minimum is equivalent to the existence of a geodesic convex function with respect to the induced Riemannian metric. It follows that the Lagrange method can be considered as the transformation of a constrained problem in $\mathbb{R}^n$ into an unconstrained problem on the constraint submanifold with the Riemannian metric induced from $\mathbb{R}^n$. In the case of a Euclidean space, geodesic convexity coincides with the classical notion of convexity. The application of Riemannian geometry highlights the geometric background of smooth optimization and provides it with strong mathematical tools to study structural properties (e.g., geodesic convexity) and to deepen the theory of algorithms (e.g., variable metric methods).
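Geodesic convexity depends on the manifold and its metric, and a one-dimensional example makes this visible. On the unit circle with the metric induced from $\mathbb{R}^2$, geodesics are arcs $t \mapsto (\cos t, \sin t)$ parameterized by arc length. A function is therefore geodesic convex on a short arc exactly when $t \mapsto f(\cos t, \sin t)$ is convex there. The Python sketch below tests this with second differences for the linear function $f(x) = x_1$, which is convex on $\mathbb{R}^2$. The example, the sampling and the tolerance are assumptions for illustration. The restriction $\cos t$ is convex on the arc $t \in [2, 4]$, where $\cos t < 0$, but not on $t \in [-1, 1]$.

```python
import math


def restricted(f, t):
    """f restricted to the unit-circle geodesic t -> (cos t, sin t), t = arc length."""
    return f(math.cos(t), math.sin(t))


def convex_along_arc(f, t0, t1, samples=200, tol=1e-12):
    """Check discrete convexity (nonnegative second differences) of f on the arc [t0, t1]."""
    h = (t1 - t0) / samples
    for k in range(1, samples):
        t = t0 + k * h
        second_diff = restricted(f, t + h) - 2.0 * restricted(f, t) + restricted(f, t - h)
        if second_diff < -tol:
            return False
    return True


def f(x1, x2):
    return x1   # linear, hence convex in the Euclidean sense


print(convex_along_arc(f, 2.0, 4.0))    # True: cos t is convex where cos t < 0
print(convex_along_arc(f, -1.0, 1.0))   # False: cos t is concave where cos t > 0
```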


See also 


> αBB Algorithm

> Continuous Global Optimization: Models, Algorithms and Software

> Global Optimization in Batch Design Under Uncertainty

> Global Optimization in Generalized Geometric Programming

> Global Optimization Methods for Systems of Nonlinear Equations

> Global Optimization in Phase and Chemical Reaction Equilibrium

> Interval Global Optimization

> MINLP: Branch and Bound Global Optimization Algorithm

> MINLP: Global Optimization with αBB
Problem Formulation and Basic Facts


Let $X$ be a nonempty convex set in the real $n$-dimensional space $\mathbb{R}^n$, and let $G: X \to \Pi(\mathbb{R}^n)$ be a multivalued mapping. Here and below $\Pi(A)$ denotes the family of all nonempty subsets of a set $A$. Then one can define the multivalued or generalized variational inequality (GVI) problem, which is to find an element $x^* \in X$ such that
$$\exists g^* \in G(x^*): \quad \langle g^*, y - x^* \rangle \ge 0 \quad \forall y \in X. \qquad (1)$$
If the cost mapping $G$ is single-valued, GVI (1) reduces to the following usual variational inequality (VI) problem: Find an element $x^* \in X$ such that
$$\langle G(x^*), y - x^* \rangle \ge 0 \quad \forall y \in X, \qquad (2)$$
where $G: \mathbb{R}^n \to \mathbb{R}^n$ is a given mapping.
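A candidate solution can be verified with the well-known fixed-point characterization of VIs: $x^*$ solves VI (2) if and only if $x^* = \pi_X[x^* - G(x^*)]$, where $\pi_X$ is the Euclidean projection onto $X$. In the multivalued case (1) this must hold for at least one $g^* \in G(x^*)$. The Python sketch below evaluates the residual $\|x - \pi_X[x - g]\|$ for a box $X$, where the projection is a componentwise clip. The box and the test points are assumptions for illustration, as is the mapping $G = \partial f$ of the hypothetical function $f(x) = |x_1 - \tfrac12| + |x_2|$.

```python
import math


def project_box(x, lower, upper):
    """Euclidean projection onto the box {lower <= x <= upper} (componentwise clip)."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def residual(x, g, lower, upper):
    """||x - pi_X(x - g)||; it is zero iff <g, y - x> >= 0 for all y in the box."""
    p = project_box([xi - gi for xi, gi in zip(x, g)], lower, upper)
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))


lower, upper = [0.0, 0.0], [1.0, 1.0]
x_star = [0.5, 0.0]
# At x_star the subdifferential of f(x) = |x1 - 0.5| + |x2| is {0} x [-1, 1].
print(residual(x_star, [0.0, 0.3], lower, upper))    # 0.0: this g* certifies x_star
print(residual(x_star, [0.0, -0.3], lower, upper))   # about 0.3: this element does not
```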

VIs are now regarded as very useful and powerful tools for the investigation and solution of various equilibrium-type problems arising in economics, engineering, operations research and mathematical physics. Many such applied problems involve multivalued mappings with rather weak continuity properties. This article is devoted to the construction of solution methods for VIs with multivalued cost mappings. This class includes, in particular, multivalued inclusions, complementarity and fixed-point problems, nonsmooth optimization and game problems, and mixed VIs (MVIs). Problem (1) was originally considered by Browder [5].

It is well known that multivaluedness creates certain difficulties in establishing the convergence of many iterative methods that are applied successfully to single-valued problems. This makes it necessary to construct new solution methods. In this article, we briefly outline the current situation and describe some new advances in this field.

First we consider some existence results for GVI (1) 
which are based on certain continuity-type properties 
of multivalued mappings; see, e. g., [11,16,17]. 


Definition 1 A multivalued mapping $Q: \mathbb{R}^n \to \Pi(\mathbb{R}^n)$ is said to be a K (Kakutani) mapping on $X$ if it is upper semicontinuous on $X$ and has nonempty, convex, and compact values.


Proposition 1 Let $G: X \to \Pi(\mathbb{R}^n)$ be a K-mapping. Suppose at least one of the following assumptions holds:

(a) The set $X$ is bounded;

(b) there exists a nonempty bounded subset $Y$ of $X$ such that for every $x \in X \setminus Y$ there is $y \in Y$ with
$$\langle g, x - y \rangle > 0 \quad \forall g \in G(x).$$

Then GVI (1) has a solution.


The solution of GVI (1) is closely related to that of the corresponding dual (or Minty) GVI (DGVI) problem, which is to find a point $x^* \in X$ such that
$$\forall x \in X \text{ and } \forall g \in G(x): \quad \langle g, x - x^* \rangle \ge 0. \qquad (3)$$
We denote by $X^*$ (respectively, by $X^d$) the solution set of problem (1) (respectively, problem (3)). Recall certain monotonicity-type properties for multivalued mappings.


Definition 2 Let $Q: \mathbb{R}^n \to \Pi(\mathbb{R}^n)$ be a multivalued mapping. The mapping $Q$ is said to be

(a) strongly monotone on $X$ with constant $\tau > 0$ if for each pair of points $x, y \in X$ and for all $q' \in Q(x)$, $q'' \in Q(y)$, we have
$$\langle q' - q'', x - y \rangle \ge \tau \|x - y\|^2;$$

(b) strictly monotone on $X$ if for all distinct $x, y \in X$ and for all $q' \in Q(x)$, $q'' \in Q(y)$, we have
$$\langle q' - q'', x - y \rangle > 0;$$

(c) monotone on $X$ if for each pair of points $x, y \in X$ and for all $q' \in Q(x)$, $q'' \in Q(y)$, we have
$$\langle q' - q'', x - y \rangle \ge 0;$$

(d) pseudomonotone on $X$ if for each pair of points $x, y \in X$ and for all $q' \in Q(x)$, $q'' \in Q(y)$, we have
$$\langle q'', x - y \rangle \ge 0 \quad \text{implies} \quad \langle q', x - y \rangle \ge 0.$$


From the definitions we have the following implications:
$$\text{(a)} \Longrightarrow \text{(b)} \Longrightarrow \text{(c)} \Longrightarrow \text{(d)}.$$
The reverse assertions are not true in general.
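For an affine single-valued mapping $Q(x) = Ax + b$ one has $\langle Q(x) - Q(y), x - y \rangle = (x - y)^T A (x - y)$, so these notions can be probed by sampling. The Python sketch below uses hypothetical matrices. A skew-symmetric $A$ gives a mapping that is monotone but not strictly monotone, since the inner product vanishes identically. By contrast, $A = I + S$ with a skew-symmetric $S$ is strongly monotone with $\tau = 1$. Random sampling can refute such properties but never prove them.

```python
import random


def affine(A, b):
    """Return the single-valued mapping x -> A x + b."""
    def Q(x):
        return [sum(A[i][j] * x[j] for j in range(len(x))) + b[i] for i in range(len(b))]
    return Q


def min_monotonicity_ratio(Q, dim=2, trials=1000, seed=0):
    """Smallest sampled value of <Q(x) - Q(y), x - y> / ||x - y||^2."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        y = [rng.uniform(-5, 5) for _ in range(dim)]
        d = [xi - yi for xi, yi in zip(x, y)]
        dq = [qx - qy for qx, qy in zip(Q(x), Q(y))]
        worst = min(worst, sum(a * b for a, b in zip(dq, d)) / sum(a * a for a in d))
    return worst


skew = affine([[0.0, 1.0], [-1.0, 0.0]], [1.0, -2.0])   # monotone, not strictly monotone
shifted = affine([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0])  # strongly monotone with tau = 1
print(min_monotonicity_ratio(skew))      # approximately 0
print(min_monotonicity_ratio(shifted))   # approximately 1
```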
Now we give an extension of the Minty lemma for 
the multivalued case. 


Proposition 2

(i) The set $X^d$ is convex and closed.

(ii) If $G$ is a K-mapping, then $X^d \subseteq X^*$.

(iii) If $G$ is pseudomonotone, then $X^* \subseteq X^d$.


We also recall some conditions under which GVI (1) 
has a unique solution. 


Proposition 3

(i) If $G$ is strictly monotone, then GVI (1) has at most one solution.

(ii) If $G$ is a strongly monotone K-mapping, then GVI (1) has a unique solution.


Of course, there exist various modifications and exten- 
sions of the above results; see, e. g., [3,16] for more de- 
tails. 


Projection Methods for GVIs 


We observe that the existence and uniqueness results for multivalued problems are very similar to those for single-valued VIs, but in general this is not the case for solution methods. That is, proving convergence and deriving rates of convergence for iterative methods applied to multivalued problems is considerably more difficult than in the single-valued case. This substantially reduces the number of approaches available for creating efficient solution methods. To illustrate this assertion, we first outline the behavior of projection-type methods.

Unless otherwise stated, throughout the article we suppose that

(C1) $X$ is a nonempty, convex and closed subset of the real $n$-dimensional space $\mathbb{R}^n$, and $G: X \to \Pi(\mathbb{R}^n)$ is a K-mapping.


Projection Method 


Let us consider the standard projection method
$$x^{k+1} = \pi_X[x^k - \lambda_k g^k], \quad g^k \in G(x^k), \quad \lambda_k > 0, \qquad (4)$$
where $\pi_X(\cdot)$ denotes the projection mapping onto $X$. Usually, during the computation process we can find at least one element of $G(x^k)$ at the current point $x^k$, but the whole set $G(x^k)$ is not determined explicitly. The problem is to find a suitable rule for choosing the step size $\lambda_k > 0$ which provides convergence under mild assumptions and a good rate of convergence. We recall that in the single-valued case, where (C1) means that $G$ is continuous, the above method is rewritten as follows:
$$x^{k+1} = \pi_X[x^k - \lambda_k G(x^k)], \quad \lambda_k > 0, \qquad (5)$$
and its convergence requires either integrability, or strengthened monotonicity (co-coercivity, strong monotonicity) and Lipschitz continuity assumptions. For instance, if $G$ is of the form $G = \nabla f$, where $f$ is a given function, the step size $\lambda_k$ can be chosen in conformity with the known exact or inexact (Armijo-type) rules. Then method (5) generates a sequence whose limit points are solutions of VI (2) with $G = \nabla f$; moreover, it attains a linear rate of convergence if $G$ is strongly monotone and Lipschitz continuous. A superlinear rate of convergence can be obtained within the conjugate gradient approach. However, this is not the case if $G$ is not integrable. In fact, the same method (5) does not converge even in the nonintegrable monotone case, for instance, when $G(x) = Ax + b$ with $A$ skew-symmetric, regardless of the step-size choice. Therefore, we have to utilize different step-size rules and impose certain additional assumptions, such as co-coercivity.
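The failure of method (5) for skew-symmetric mappings is easy to see for $X = \mathbb{R}^2$, $b = 0$ and $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, choices made here for illustration. Since $\langle x, Ax \rangle = 0$ and $\|Ax\| = \|x\|$, each step gives $\|x^{k+1}\|^2 = (1 + \lambda^2)\|x^k\|^2$. The iterates therefore move away from the unique solution $x^* = 0$ for every positive step size. The Python sketch below confirms this numerically for a few fixed step sizes.

```python
import math


def G(x):
    # G(x) = A x with the skew-symmetric A = [[0, 1], [-1, 0]]; VI (2) on X = R^2 has x* = 0
    return [x[1], -x[0]]


def projection_method(x0, step, iters):
    """Method (5) with X = R^2, so the projection is the identity; returns all norms."""
    x = list(x0)
    norms = [math.hypot(*x)]
    for _ in range(iters):
        gx = G(x)
        x = [xi - step * gi for xi, gi in zip(x, gx)]
        norms.append(math.hypot(*x))
    return norms


for step in (1.0, 0.1, 0.01):
    norms = projection_method([1.0, 0.0], step, 50)
    print(step, norms[-1], math.sqrt(1 + step * step) ** 50)   # the two columns agree
```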


Definition 3 A mapping $Q: \mathbb{R}^n \to \mathbb{R}^n$ is said to be co-coercive with constant $\mu > 0$ on $X$ if for each pair of points $x, y \in X$ we have
$$\langle Q(x) - Q(y), x - y \rangle \ge \mu \|Q(x) - Q(y)\|^2.$$
In fact, if $G$ is co-coercive and VI (2) is solvable, then method (5) with the fixed step size $\lambda_k = \lambda \in (0, 2\mu)$ generates a sequence $\{x^k\}$ which converges to a solution of VI (2); see [15]. At the same time, the co-coercivity of $G$ implies the single-valuedness and even the Lipschitz continuity of $G$.
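Consider the hypothetical mapping $G(x) = Qx - c$ with $Q = \mathrm{diag}(1, 4)$ and $c = (2, -1)$ on the box $X = [0, 1]^2$. As the gradient of a convex quadratic, $G$ is co-coercive with $\mu = 1/\lambda_{\max}(Q) = 1/4$, so any fixed step $\lambda \in (0, 1/2)$ is admissible. The Python sketch below runs method (5) with $\lambda = 0.4$. The data are assumptions for illustration; for them the solution of VI (2) is $x^* = (1, 0)$.

```python
def G(x):
    # Hypothetical co-coercive mapping: gradient of 0.5 x^T Q x - c^T x, Q = diag(1, 4), c = (2, -1)
    return [1.0 * x[0] - 2.0, 4.0 * x[1] + 1.0]


def project_box(x, lower=0.0, upper=1.0):
    """Projection onto X = [lower, upper]^n."""
    return [min(max(xi, lower), upper) for xi in x]


def projection_method(x0, step, iters=100):
    """Method (5): x^{k+1} = pi_X[x^k - step * G(x^k)] with a fixed step."""
    x = list(x0)
    for _ in range(iters):
        x = project_box([xi - step * gi for xi, gi in zip(x, G(x))])
    return x


print(projection_method([0.0, 1.0], step=0.4))   # [1.0, 0.0]
```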

Nevertheless, method (4) becomes convergent if we utilize the divergent series step-size rule
$$\lambda_k = \alpha_k / \|g^k\|, \quad \alpha_k > 0, \quad \sum_{k=0}^{\infty} \alpha_k = \infty, \quad \sum_{k=0}^{\infty} \alpha_k^2 < \infty, \qquad (6)$$
and replace monotonicity of $G$ with the acute angle condition:

(C2') GVI (1) is solvable, and for every $x^* \in X^*$ it holds that
$$\forall x \in X \setminus X^*,\ \forall g \in G(x): \quad \langle g, x - x^* \rangle > 0; \qquad (7)$$
see [15]. This property has a clear geometric sense: the angle between $-G(x)$ and $x^* - x$ has to be acute at each nonoptimal point $x$. For instance, (7) holds if $G$ is strictly monotone.
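The following Python sketch runs method (4) with step-size rule (6) on a hypothetical nonsmooth example: $X = [-1, 2]^2$ and $G = \partial f$ for $f(x) = |x_1 - 0.5| + |x_2 - 0.25|$. Since $f$ is convex with the unique minimizer $x^* = (0.5, 0.25)$ on $X$, one has $\langle g, x - x^* \rangle \ge f(x) - f(x^*) > 0$ for all $x \neq x^*$ and $g \in G(x)$. The acute angle condition (7) therefore holds. The sequence $\alpha_k = 1/(k+1)$ satisfies both series conditions in (6). The data and the particular subgradient selection are assumptions for illustration.

```python
import math


def subgradient(x):
    """One element of the subdifferential of f(x) = |x1 - 0.5| + |x2 - 0.25|."""
    def sign(t):
        return (t > 0) - (t < 0)
    return [float(sign(x[0] - 0.5)), float(sign(x[1] - 0.25))]


def project_box(x, lower=-1.0, upper=2.0):
    """Projection onto X = [lower, upper]^2."""
    return [min(max(xi, lower), upper) for xi in x]


def divergent_series_method(x0, iters):
    """Method (4) with lambda_k = alpha_k / ||g^k|| and alpha_k = 1 / (k + 1)."""
    x = list(x0)
    for k in range(iters):
        g = subgradient(x)
        norm_g = math.hypot(*g)
        if norm_g == 0.0:          # 0 lies in G(x), so x solves the GVI
            return x
        step = (1.0 / (k + 1)) / norm_g
        x = project_box([xi - step * gi for xi, gi in zip(x, g)])
    return x


for iters in (10, 100, 1000, 10000):
    x = divergent_series_method([2.0, -1.0], iters)
    print(iters, x, math.hypot(x[0] - 0.5, x[1] - 0.25))   # distance to x* shrinks slowly
```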

However, rule (6) leads to very slow convergence and prevents the method from attaining a linear rate of convergence. For this reason, other approaches are needed in order

• to utilize more efficient step-size rules,
• to attain more rapid convergence,
• to weaken sufficient conditions for convergence.


Basic Solution Methods for GVIs 


So, we intend to describe some other solution methods for GVI (1). First of all, in addition to (C1) we will utilize the following weakened condition

(C2) GVI (1) is solvable, and for every $x^* \in X^*$ it holds that
$$\forall x \in X,\ \forall g \in G(x): \quad \langle g, x - x^* \rangle \ge 0 \qquad (8)$$
(cf. (3) and (7)), or the somewhat more restrictive, but simplified version

(C3) GVI (1) is solvable, and $G$ is monotone.

Clearly, we have the following implications
$$\text{(C2')} \Longrightarrow \text{(C2)} \quad \text{and} \quad \text{(C3)} \Longrightarrow \text{(C2)},$$
but the reverse assertions are not true. Owing to Proposition 2, we see that (C2) is equivalent to
$$X^* = X^d \neq \emptyset$$
for GVI (1) if (C1) is fulfilled. Also note that (8) may in principle be called the nonobtuse angle condition; see [26] for more details.

We divide the basic solution methods into the following families:
• Averaging methods;
• Center-type methods;
• Combined relaxation methods;
• Proximal point methods;
• Regularization methods.
In the next sections, we describe properties of these general approaches, which aim to enhance the convergence properties of method (4) with step-size rule (6).


Averaging and Regularization Type Methods 


We now consider the methods which utilize modifica- 
tions of the initial problems or some other kind of con- 
vergence. 


Averaging Method 


The idea of the averaging method consists in replacing the usual convergence of $\{x^k\}$ with an ergodic convergence. It utilizes the same divergent series step-size rule (6). In fact, the sequence
$$z^k = \sum_{i=0}^{k} \alpha_i x^i \Big/ \sum_{i=0}^{k} \alpha_i \qquad (9)$$
enjoys stronger convergence properties than $\{x^k\}$. This idea leads to the so-called averaging method, which is due to Bruck [8].

Method (AVR). Choose a point $x^0 \in X$ and a positive sequence $\{\alpha_k\}$. Set $z^0 := x^0$, $\beta_0 := \alpha_0$. At the $k$th iteration, $k = 0, 1, \dots$, set
$$x^{k+1} := \pi_X(x^k - \alpha_k g^k), \quad g^k \in G(x^k);$$
$$\beta_{k+1} := \beta_k + \alpha_{k+1}, \quad \tau_{k+1} := \alpha_{k+1} / \beta_{k+1};$$
$$z^{k+1} := \tau_{k+1} x^{k+1} + (1 - \tau_{k+1}) z^k.$$
From the description it follows that the sequence $\{z^k\}$ generated by (AVR) satisfies (9).


Theorem 1 Suppose that (C1) and (C3) hold, that the sequences $\{x^k\}$ and $\{z^k\}$ are constructed by (AVR), and that the sequence $\{\alpha_k\}$ satisfies the following conditions:
$$\sum_{k=0}^{\infty} \alpha_k = \infty, \qquad \sum_{k=0}^{\infty} (\alpha_k \|g^k\|)^2 < \infty.$$
Then there exist limit points of $\{z^k\}$ and all these points belong to $X^*$.


The rate of convergence of the averaging method was investigated by several authors. It was shown by Nemirovskii [40] that $\|z^k - x^*\| = O(1/k)$, where $x^* \in X^*$.


Regularization Methods 


The idea of the earliest and most popular regularization method consists in replacing the initial GVI (1) with a sequence of the following auxiliary problems: Find a point $x^\varepsilon \in X$ such that

$$\exists g^\varepsilon \in G(x^\varepsilon), \quad \langle g^\varepsilon + \varepsilon x^\varepsilon, x - x^\varepsilon \rangle \ge 0 \quad \forall x \in X, \qquad (10)$$

where $\varepsilon > 0$ is a regularization parameter. It was first proposed by Tikhonov [46] and was adjusted to VIs by Browder [7]. Suppose that (C1) and (C3) hold. Then $G$ is monotone, $G + \varepsilon I$ is strongly monotone and, by Proposition 3, (10) has a unique solution, which can be found within a given accuracy by one of the versions of the above projection method. The basic approximation property of the exact regularization method is formulated as follows:


Theorem 2 Suppose that (C1) and (C3) are fulfilled and that the sequence $\{x^{\varepsilon_k}\}$ is obtained from (10) with $\{\varepsilon_k\} \searrow 0$. Then the following assertions are true:

(i) each auxiliary GVI (10) has a unique solution;

(ii) the sequence $\{x^{\varepsilon_k}\}$ converges to the solution $x^*$ of (1) nearest to the origin.
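Theorem 2 can be observed on a small example. The sketch below is only an illustration under stated assumptions: the hypothetical map $G$ is the gradient of $\frac{1}{2}(x_1 + x_2 - 1)^2$ (monotone, with a whole segment of solutions in $X = [-2,2]^2$), and each regularized problem (10) is solved approximately by projection iterations, which converge because $G + \varepsilon I$ is strongly monotone. The step size and iteration count are choices made for this example.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]


def solve_regularized_vi(G, eps, x0, lo, hi, step=0.2, iters=20000):
    """Approximate solution of the regularized problem (10) on X = [lo, hi]^n.

    Uses the projection iteration x <- pi_X(x - step * (G(x) + eps * x)); since
    G + eps*I is strongly monotone, it converges for a Lipschitz G and a small step.
    """
    x = list(x0)
    for _ in range(iters):
        g = G(x)
        x = project_box([xi - step * (gi + eps * xi) for xi, gi in zip(x, g)], lo, hi)
    return x


def G(x):
    """Hypothetical monotone map: the gradient of 0.5 * (x1 + x2 - 1)^2.

    Its solution set in X = [-2, 2]^2 is the segment x1 + x2 = 1, whose point
    nearest to the origin is (0.5, 0.5).
    """
    s = x[0] + x[1] - 1.0
    return [s, s]


for eps in (1.0, 0.1, 0.01):
    x_eps = solve_regularized_vi(G, eps, [2.0, -1.0], -2.0, 2.0)
    print(eps, x_eps)   # both components equal 1 / (2 + eps), which tends to 0.5
```

For this data the regularized solution is available in closed form, $x^\varepsilon = (t, t)$ with $t = 1/(2+\varepsilon)$, so the sequence visibly approaches the least-norm solution $(0.5, 0.5)$ as $\varepsilon \searrow 0$, independently of the starting point.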


We can also replace (C3) with (C2) and obtain similar convergence properties, even though the cost mapping in (10) then need not be monotone. We present a strengthened version of the result from [34].


Theorem 3 Suppose that (C1) and (C2) are fulfilled and that the sequence $\{x^{\varepsilon_k}\}$ is obtained from (10) with $\{\varepsilon_k\} \searrow 0$. Then the following assertions are true:

(i) each auxiliary GVI (10) has a solution;
(ii) $\{x^{\varepsilon_k}\}$ converges to the solution $x^*$ of (1) nearest to the origin.


Moreover, we can obtain convergence results for the regularization method (RM) under even weaker conditions, which are utilized for providing existence results for GVIs; see [33]. Namely, let us consider the following coercivity condition (see, e.g., [3]):

(C2'') There exists a number $r > 0$ such that for any point $x \in X \setminus X_r$ there is a point $y \in X$ with $\|y\| < \|x\|$ such that $\langle g, y - x \rangle < 0$ for all $g \in G(x)$, where

$$X_r = \{x \in X \mid \|x\| < r\}.$$


The basic approximation properties of the regular- 
ization method are then formulated as follows: 


Theorem 4 Suppose conditions (C1) and (C2'') are fulfilled. Then:

(i) GVI (1) has a solution;

(ii) GVI (10) has a solution for each $\varepsilon > 0$;

(iii) each sequence $\{x^{\varepsilon_k}\}$ of solutions of GVI (10) has limit points and, if $\{\varepsilon_k\} \searrow 0$, all these limit points are solutions of GVI (1).


The regularization approach allows various modifica- 
tions. One of them was proposed by Bakushinskii and 
Polyak [2] and is called the iterative regularization 
method. The idea of this approach consists in simulta- 
neous changes of the regularization parameters and the 
step sizes of an approximation method, i.e., it is inter- 
mediate between the above averaging and regulariza- 
tion methods, and has similar convergence properties. 


Proximal Point Method 


The idea of the proximal point method, which was sug- 
gested by Martinet [39], also consists in replacing the 
initial GVI (1) with a sequence of auxiliary problems. 
The essential features of the proximal point method 
are that the regularization parameter may in principle 
be fixed and that the perturbed mapping depends on 
the previous iteration point. We first recall the conver- 
gence result for the exact version of the proximal point 
method applied to monotone problems. 


Theorem 5 Suppose that (C1) and (C3) are fulfilled and that a sequence $\{x^k\}$ is generated in conformity with the rules

$$\exists g^{k+1} \in G(x^{k+1}), \quad \langle g^{k+1} + \theta^{-1}(x^{k+1} - x^k), y - x^{k+1} \rangle \ge 0 \quad \forall y \in X, \qquad (11)$$

where $\theta > 0$ is a regularization parameter. Then the following assertions are true:

(i) each auxiliary GVI (11) has a unique solution;

(ii) the sequence $\{x^k\}$ converges to a solution of GVI (1).
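The following sketch of the exact proximal point method reuses the hypothetical monotone map of the regularization example above, with a fixed $\theta = 1$. Each auxiliary problem (11) is solved approximately by inner projection iterations; the inner step size and the iteration counts are assumptions chosen for this particular example, not part of the method.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]


def prox_subproblem(G, xk, theta, lo, hi, step=0.25, iters=200):
    """Approximately solve (11): find x in X = [lo, hi]^n with
    <G(x) + (x - xk) / theta, y - x> >= 0 for all y in X (projection iterations;
    the step is assumed small relative to the Lipschitz constant of G + I/theta)."""
    x = list(xk)
    for _ in range(iters):
        g = G(x)
        x = project_box([xi - step * (gi + (xi - xki) / theta)
                         for xi, gi, xki in zip(x, g, xk)], lo, hi)
    return x


def proximal_point(G, x0, theta, lo, hi, outer_iters=50):
    """Sketch of the proximal point method with a fixed parameter theta > 0."""
    x = list(x0)
    for _ in range(outer_iters):
        x = prox_subproblem(G, x, theta, lo, hi)
    return x


def G(x):
    """Hypothetical monotone map with solution set {x1 + x2 = 1} in [-2, 2]^2."""
    s = x[0] + x[1] - 1.0
    return [s, s]


x = proximal_point(G, [2.0, 1.0], theta=1.0, lo=-2.0, hi=2.0)
print(x)   # close to (1.0, 0.0), a solution of GVI (1) but not the least-norm one
```

Unlike the Tikhonov sketch, the limit here depends on the starting point: for this linear example every proximal step preserves $x_1 - x_2$, so the iterates converge to the projection $(1, 0)$ of $x^0 = (2, 1)$ onto the solution set.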


In fact, since (C1) and (C3) hold, the cost mapping 
in (11) is strongly monotone and, by Proposition 3, (11) 
has a unique solution. However, we can replace (C3) 
with (C2) and obtain similar convergence properties; 
see, e.g., [1,4,12,28]. We now give such a strengthened
result for the proximal point method. 


Theorem 6 Suppose that (C1) and (C2) are fulfilled and that a sequence $\{x^k\}$ is generated in conformity with the rules in (11) with $\theta > 0$. Then the following assertions are true:

(i) each auxiliary GVI (11) has a solution;

(ii) the sequence $\{x^k\}$ converges to a solution of GVI (1).


Observe that the cost mapping in (11) need not be strongly monotone, but (11) is still solvable. Under an additional Lipschitz continuity type condition on $G$, we can choose $\theta^{-1}$ large enough for the cost mapping in (11) to be strongly monotone, thus also ensuring the uniqueness of a solution.

The exact proximal point method attains linear and even superlinear convergence rates, as was shown by Rockafellar [44]. At the same time, the total rates of convergence of both the proximal point method and the regularization method, which include the expense of approximately solving the auxiliary problems, require further investigation.


Direct Iterative Methods for GVIs 


We now present iterative methods for solving GVI (1) 
without any explicit monotonicity assumptions. 


Center-Type Methods 


The best known of the center-type methods is the famous ellipsoid method, which was first proposed by Yudin and Nemirovskii [47] and by Shor [45] for convex programming and afterwards adjusted for saddle point problems and VIs [40]. In this method, each iterate $x^k$ is associated with an ellipsoid $U_k$ centered at $x^k$ and containing at least one solution point. After finding a half-space $H_k$ containing this solution point, the next ellipsoid $U_{k+1}$ is precisely the smallest ellipsoid containing the set $U_k \cap H_k$. Set

$$P(z) = \begin{cases} G(z) & \text{if } z \in X, \\ \{p \in \mathbb{R}^n \mid \langle p, y - z \rangle \le 0 \ \ \forall y \in X\} & \text{if } z \notin X. \end{cases}$$
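Method (EM). Choose a point $x^0 \in \mathbb{R}^n$ and a number $A > 0$ such that $\|x^0 - x^*\| \le A$ for some $x^* \in X^*$, and set $A_0 := A^2 I$. At the $k$th iteration, $k = 0,1,\ldots$, choose $p^k \in P(x^k)$ and set

$$x^{k+1} := x^k - \frac{1}{n+1}\,\frac{A_k p^k}{\sqrt{(p^k)^\top A_k p^k}},$$

$$A_{k+1} := \frac{n^2}{n^2-1}\left(A_k - \frac{2}{n+1}\,\frac{A_k p^k (A_k p^k)^\top}{(p^k)^\top A_k p^k}\right),$$

and $k := k+1$.

If the basic assumptions (C1) and (C2) are fulfilled, the process is well defined. In this case,

$$U_k = \{x \in \mathbb{R}^n \mid \langle A_k^{-1}(x - x^k), x - x^k \rangle \le 1\}$$

and

$$H_k = \{x \in \mathbb{R}^n \mid \langle p^k, x - x^k \rangle \le 0\}.$$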


The implementation of (EM) is similar to that of variable metric methods. It is well known that the volumes of $U_k$ tend to zero at a linear rate, which depends on the dimensionality of the problem. These properties yield the convergence of the sequence $\{x^k\}$ to a solution.
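A sketch of (EM) in Python, with the matrices stored as nested lists, is given below. It assumes a single-valued $G$, a user-supplied feasibility test and a separating vector for infeasible centers (the role of $P(z)$ for $z \notin X$). Reporting the feasible center with the smallest $\|G(x^k)\|$ is an additional assumption of the sketch, and the box-constrained test problem is hypothetical.

```python
import math


def ellipsoid_method(G, in_X, sep, x0, radius, iters):
    """Sketch of (EM) with single-valued G.

    in_X(z) tests z in X; sep(z) must return p != 0 with <p, y - z> <= 0 for all
    y in X whenever z is not in X. Returns the feasible center with the smallest
    ||G(x^k)|| (a reporting rule assumed here; (EM) itself only generates centers).
    """
    n = len(x0)
    x = list(x0)
    A = [[radius ** 2 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A_0
    best, best_val = None, math.inf
    for _ in range(iters):
        if in_X(x):
            p = G(x)
            val = math.sqrt(sum(pi * pi for pi in p))
            if val < best_val:
                best, best_val = list(x), val
        else:
            p = sep(x)
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        pAp = sum(pi * api for pi, api in zip(p, Ap))
        if pAp <= 0.0:       # p = 0: the current center already solves the problem
            break
        root = math.sqrt(pAp)
        x = [xi - api / ((n + 1) * root) for xi, api in zip(x, Ap)]
        c = n * n / (n * n - 1.0)
        A = [[c * (A[i][j] - 2.0 / (n + 1) * Ap[i] * Ap[j] / pAp) for j in range(n)]
             for i in range(n)]
    return best


# Hypothetical test problem: X = [-5, 5]^2, G(x) = x - (1, 2); unique solution (1, 2).
def G(x):
    return [x[0] - 1.0, x[1] - 2.0]


def in_X(z):
    return all(-5.0 <= zi <= 5.0 for zi in z)


def sep(z):
    """Outward normal of the most violated face of the box."""
    i = max(range(len(z)), key=lambda j: abs(z[j]))
    return [(1.0 if z[i] > 0 else -1.0) if j == i else 0.0 for j in range(len(z))]


best = ellipsoid_method(G, in_X, sep, [0.0, 0.0], radius=4.0, iters=100)
```

In two dimensions each update multiplies $\det A_k$, and hence the volume of $U_k$, by exactly $(4/3)^2 \cdot (1/3) = 16/27$, which illustrates the linear, dimension-dependent volume reduction mentioned above.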

The idea of various proximal-level methods is rather 
close to that of the center methods [14,18,36]. In fact, 
the methods are based on sequential updating of a poly- 
hedral approximation of a nonsmooth merit function 
for GVI and computing the prox-center of the corre- 
sponding level sets. These methods possess similar con- 
vergence properties. 


Combined Relaxation Methods 


The idea of combined relaxation methods consists in defining the next iterate $x^{k+1}$ as the projection of the current iterate $x^k$ onto a hyperplane $H_k$ which strictly separates $x^k$ from the solution set and is computed by an auxiliary procedure. This approach to solving VIs was first proposed by Konnov [19], where it was also noticed that the parameters of the hyperplane $H_k$
can be found with the help of an iteration of any 
relaxation method. Afterwards, combined relaxation 
methods were developed in several directions. In [20], 
a combined relaxation method for GVI of form (1) 
was proposed and linear convergence rates were estab- 
lished. All these methods ensure convergence to a solu- 
tion of GVI under assumption (C2) or (C3). Within the 
general combined relaxation framework, different rules 
for determining the separating hyperplane and auxil- 
iary procedures were presented; see [26] and references 
therein. 


Combined Relaxation Method for GVIs We now consider a combined relaxation method for solving GVI (1) with explicit usage of constraints [20,22]. In addition to the basic assumptions (C1) and (C2), we suppose that

• $X$ is defined by

$$X = \{x \in \mathbb{R}^n \mid h(x) \le 0\},$$

where $h: \mathbb{R}^n \to \mathbb{R}$ is a convex and subdifferentiable function;

• the Slater condition is satisfied, i.e., there exists a point $\bar{x}$ such that $h(\bar{x}) < 0$.

Let us define the mapping $Q: \mathbb{R}^n \to \Pi(\mathbb{R}^n)$ by

$$Q(x) = \begin{cases} G(x) & \text{if } h(x) \le 0, \\ \partial h(x) & \text{if } h(x) > 0. \end{cases}$$


Definition 4 A mapping $P: \mathbb{R}^n \to \mathbb{R}^n$ is said to be a pseudo-projection onto $X$ if for every $y \in \mathbb{R}^n$ it holds that

$$P(y) \in X \quad \text{and} \quad \|P(y) - x\| \le \|y - x\| \quad \forall x \in X.$$

We denote by $\mathcal{F}$ the class of all pseudo-projection mappings onto $X$. Clearly, $\pi_X \in \mathcal{F}$. Thus the pseudo-projection is a weaker notion than the projection, but it can be implemented even when $h$ is essentially nonlinear; see [26] for more details.

Method (CRM). Step 0 (initialization): Choose a point $x^0 \in X$, bounded positive sequences $\{\varepsilon_l\}$ and $\{\eta_l\}$, and a sequence of mappings $\{P_k\}$, where $P_k \in \mathcal{F}$. Also choose numbers $\theta \in (0,1)$ and $\gamma \in (0,2)$. Set $k := 0$, $l := 1$.

Step 1 (auxiliary procedure):

Step 1.1: Choose $q^0$ from $Q(x^k)$, set $i := 0$, $p^i := q^i$, $w^{k,0} := x^k$.

Step 1.2: If $\|p^i\| \le \eta_l$, set $x^{k+1} := x^k$, $k := k+1$, $l := l+1$ and go to Step 1. (null step)

Step 1.3: Set $w^{k,i+1} := w^{k,0} - \varepsilon_l p^i/\|p^i\|$, choose $q^{i+1} \in Q(w^{k,i+1})$. If $\langle q^{i+1}, p^i \rangle > \theta \|p^i\|^2$, then set $y^k := w^{k,i+1}$, $g^k := q^{i+1}$, and go to Step 2. (descent step)

Step 1.4: Set $p^{i+1} := \mathrm{Nr}\,\mathrm{conv}\{p^i, q^{i+1}\}$, $i := i+1$ and go to Step 1.2.

Step 2 (main iteration): Set $\omega_k := \langle g^k, x^k - y^k \rangle$, $x^{k+1} := P_k[x^k - \gamma(\omega_k/\|g^k\|^2) g^k]$, $k := k+1$ and go to Step 1.

Here $\mathrm{Nr}\,S$ denotes the element of $S$ nearest to the origin. We will call one increase of the index $i$ an inner step, so that the number of inner steps gives the number of computations of elements from $Q(\cdot)$ at the corresponding points.
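To make the interplay of null steps, descent steps and the main projection step concrete, here is a simplified Python sketch of (CRM) for a single-valued $G$. Its assumptions are: $X$ is the ball $\{x : h(x) = \|x\|^2 - 4 \le 0\}$, $P_k = \pi_X$ for all $k$, $\varepsilon_l = \nu^l \varepsilon_0$ and $\eta_l = \nu^l \eta_0$ as in Theorem 8, and the cap on inner steps is a safeguard that (CRM) itself does not have. The test data $G(x) = x - c$ is hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def nearest_to_origin(p, q):
    """Nr conv{p, q}: the point of the segment [p, q] with the smallest norm."""
    d = [pi - qi for pi, qi in zip(p, q)]
    dd = sum(di * di for di in d)
    if dd == 0.0:
        return list(p)
    t = min(max(sum(pi * di for pi, di in zip(p, d)) / dd, 0.0), 1.0)
    return [pi - t * di for pi, di in zip(p, d)]


def crm(Q, proj, x0, eps0, eta0, nu, theta, gamma, iters, max_inner=100):
    """Simplified sketch of (CRM) for a single-valued Q, with
    eps_l = nu**l * eps0, eta_l = nu**l * eta0 and P_k = proj for all k."""
    x, l = list(x0), 1
    for _ in range(iters):
        eps, eta = eps0 * nu ** l, eta0 * nu ** l
        p = Q(x)                                   # Step 1.1: p^0 = q^0
        descent = None
        for _ in range(max_inner):                 # cap = safeguard, not in (CRM)
            if norm(p) <= eta:                     # Step 1.2: null step
                break
            w = [xi - eps * pi / norm(p) for xi, pi in zip(x, p)]   # Step 1.3
            q = Q(w)
            if sum(qi * pi for qi, pi in zip(q, p)) > theta * norm(p) ** 2:
                descent = (w, q)                   # y^k := w, g^k := q
                break
            p = nearest_to_origin(p, q)            # Step 1.4
        if descent is None:                        # null step: x^{k+1} = x^k
            l += 1
            continue
        y, g = descent                             # Step 2
        omega = sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
        s = gamma * omega / norm(g) ** 2
        x = proj([xi - s * gi for xi, gi in zip(x, g)])
    return x


# Hypothetical problem: X = {x : ||x||^2 - 4 <= 0}, G(x) = x - C with C inside X.
C = (0.5, 0.3)


def Q(x):
    if x[0] ** 2 + x[1] ** 2 - 4.0 <= 0.0:
        return [x[0] - C[0], x[1] - C[1]]          # G(x) when h(x) <= 0
    return [2.0 * x[0], 2.0 * x[1]]                # dh(x) = {2x} when h(x) > 0


def proj_ball(x):
    """pi_X for the ball of radius 2; it belongs to the class F."""
    r = norm(x)
    return list(x) if r <= 2.0 else [2.0 * xi / r for xi in x]


x = crm(Q, proj_ball, [0.0, 0.0], eps0=0.1, eta0=0.1, nu=0.5, theta=0.5,
        gamma=1.5, iters=200)
```

For this data each descent step moves $x^k$ a distance $\gamma\varepsilon_l$ toward $C$, and a null step occurs once $x^k$ is within about $2\varepsilon_l$ of $C$; the shrinking of $\varepsilon_l$ after every null step is what drives the iterates to the solution.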

Theorem 7 Let a sequence $\{x^k\}$ be generated by (CRM) and let $\{\varepsilon_l\}$ and $\{\eta_l\}$ satisfy the following relations:

$$\{\varepsilon_l\} \searrow 0, \qquad \{\eta_l\} \searrow 0. \qquad (12)$$

Then:
(i) The number of inner steps at each iteration is finite.
(ii) It holds that

$$\lim_{k \to \infty} x^k = x^* \in X^*.$$
Given a starting point $x^0$ and a number $\delta > 0$, we define the complexity of the method, denoted by $N(\delta)$, as the total number of inner steps $t$ which ensures finding a point $\bar{x} \in X$ such that

$$\|\bar{x} - x^*\| / \|x^0 - x^*\| \le \delta.$$

Since the computational expense per inner step can be evaluated for each problem under examination, this estimate in fact gives the total amount of work. We thus proceed to obtain an upper bound for $N(\delta)$.


Theorem 8 (Konnov [26], Theorem 2.3.3) Suppose $G$ is monotone and there exists $x^* \in X^*$ such that, for every $x \in X$ and for every $g \in G(x)$,

$$\langle g, x - x^* \rangle \ge \mu \|x - x^*\|$$

for some $\mu > 0$. Let a sequence $\{x^k\}$ be generated by (CRM), where

$$\varepsilon_l = \nu^l \varepsilon', \quad \eta_l = \nu^l \eta', \quad l = 0,1,\ldots; \quad \nu \in (0,1).$$

Then there exist some constants $\bar{\varepsilon} > 0$ and $\bar{\eta} > 0$ such that
N(6) < Byv?(In(Bo/5)/ Invot + ibe 


where $0 < B_0, B_1 < \infty$, whenever $0 < \varepsilon' \le \bar{\varepsilon}$ and $0 < \eta' \le \bar{\eta}$, $B_0$ and $B_1$ being independent of $\nu$.


The assertion of Theorem 8 remains valid without the additional monotonicity assumption on $G$ if $X = \mathbb{R}^n$. Thus, (CRM) attains a logarithmic complexity estimate, which corresponds to a linear rate of convergence with respect to inner steps. We can give a similar upper bound for $N(\delta)$ in the single-valued case.


Theorem 9 (Konnov [26], Theorem 2.3.4) Suppose that $X = \mathbb{R}^n$ and that $G$ is strongly monotone and Lipschitz continuous. Let a sequence $\{x^k\}$ be generated by (CRM), where

$$\varepsilon_l = \nu^l \varepsilon', \quad \eta_l = \nu^l \eta', \quad l = 0,1,\ldots; \quad \varepsilon' > 0, \ \eta' > 0; \ \nu \in (0,1).$$

Then,
N(6) < Byv°(In(Bo/5)/ Invot + 1), 


where $0 < B_0, B_1 < \infty$, $B_0$ and $B_1$ being independent of $\nu$.


Combined Relaxation Method for Multivalued Inclusions To solve GVI (1), we can also apply (CRM) for finding stationary points of the mapping $P$ defined as follows:

$$P(x) = \begin{cases} G(x) & \text{if } h(x) < 0, \\ \mathrm{conv}\{G(x) \cup \partial h(x)\} & \text{if } h(x) = 0, \\ \partial h(x) & \text{if } h(x) > 0. \end{cases} \qquad (13)$$

Such a method need not include (pseudo)projections and is based on the following observations [21,26].

We note that $P$ in (13) is a K-mapping. Next, GVI (1) is equivalent to the multivalued inclusion

$$0 \in P(x^*). \qquad (14)$$

We denote by $S^*$ the solution set of problem (14). In order to apply (CRM) to this problem we have to show that its dual problem is solvable. Namely, let us consider the problem of finding a point $x^*$ of $\mathbb{R}^n$ such that

$$\forall x \in \mathbb{R}^n, \ \forall p \in P(x): \quad \langle p, x - x^* \rangle \ge 0,$$

which can be viewed as the dual problem of (14). We denote by $S^d$ the solution set of this problem.


Theorem 10 (Konnov [26], Theorem 2.3.1 and Proposition 2.4.1) It holds that

(i) $X^* = S^*$,

(ii) $X^d = S^d$.


Therefore, we can apply (CRM) to the multivalued inclusion (14) by replacing $G$, $X$, and $P_k$ by $P$, $\mathbb{R}^n$, and $I$, respectively, under the same blanket assumptions. We call this modification (CRMIS).


Theorem 11 Let a sequence $\{x^k\}$ be generated by (CRMIS) and let $\{\varepsilon_l\}$ and $\{\eta_l\}$ satisfy (12). Then:

(i) The number of inner steps at each iteration is finite.
(ii) It holds that

$$\lim_{k \to \infty} x^k = x^* \in S^* = X^*.$$


Iterative Methods 
for Generalized Complementarity Problems 


It is well known that taking into account additional peculiarities of the problem under examination can yield more efficient solution methods than those designed for the general case. We intend to describe several recent results for certain classes of multivalued VIs.

Let us consider problem (1) where the feasible set $X$ coincides with the nonnegative orthant $\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x_i \ge 0 \ \forall i = 1,\ldots,n\}$. Then it can be rewritten in the equivalent generalized complementarity problem (GCP) format:

$$x^* \ge 0, \quad \exists g^* \in G(x^*), \ g^* \ge 0, \quad \langle g^*, x^* \rangle = 0. \qquad (15)$$


Owing to the special form of the constraint sets of these problems, existence and uniqueness results for their solutions can be based upon rather weak order monotonicity properties instead of the norm monotonicity properties used above [9,10,17]. We recall several order monotonicity properties of single-valued mappings.

Definition 5 A mapping $F: X \to \mathbb{R}^n$ is said to be
(a) a $P_0$-mapping, if for each pair of distinct points $x', x'' \in X$ there exists an index $i$ such that $x'_i \neq x''_i$ and

$$(x'_i - x''_i)(F_i(x') - F_i(x'')) \ge 0;$$

(b) a P-mapping, if for each pair of points $x', x'' \in X$ such that $x' \neq x''$ it holds that

$$\max_{1 \le i \le n} (x'_i - x''_i)[F_i(x') - F_i(x'')] > 0;$$

(c) a Z-mapping, if for each pair of points $x', x'' \in X$ such that $x' \ge x''$ it holds that $F_k(x') \le F_k(x'')$ for each index $k$ with $x'_k = x''_k$.


Clearly, each monotone (respectively, strictly monotone) mapping is a $P_0$-mapping (respectively, P-mapping).
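For an affine mapping $F(x) = Mx + q$ on $X = \mathbb{R}^n$, the properties of Definition 5 reduce to classical matrix classes: $F$ is a P-mapping if and only if all principal minors of $M$ are positive, a $P_0$-mapping if and only if they are all nonnegative, and a Z-mapping if and only if all off-diagonal entries of $M$ are nonpositive. The small checker below relies on these characterizations; it uses exact rational arithmetic and cofactor expansion, which is adequate only for small matrices, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 0:
        return Fraction(1)
    if n == 1:
        return Fraction(M[0][0])
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * Fraction(M[0][j]) * det(minor)
    return total


def principal_minors(M):
    """Yield all principal minors of the square matrix M."""
    n = len(M)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            yield det([[M[i][j] for j in idx] for i in idx])


def is_P_matrix(M):        # F(x) = Mx + q is a P-mapping
    return all(m > 0 for m in principal_minors(M))


def is_P0_matrix(M):       # F(x) = Mx + q is a P_0-mapping
    return all(m >= 0 for m in principal_minors(M))


def is_Z_matrix(M):        # F(x) = Mx + q is a Z-mapping (off-diagonal antitone)
    n = len(M)
    return all(M[i][j] <= 0 for i in range(n) for j in range(n) if i != j)


A = [[2, -1], [-1, 2]]     # strongly monotone and off-diagonal antitone
S = [[0, 1], [-1, 0]]      # skew-symmetric: monotone, but neither strictly monotone nor Z
print(is_P_matrix(A), is_P0_matrix(A), is_Z_matrix(A))
print(is_P_matrix(S), is_P0_matrix(S), is_Z_matrix(S))
```

The skew-symmetric matrix illustrates the remark above: a monotone mapping is always $P_0$, but without strict monotonicity it need not be a P-mapping.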

One of the most useful and fruitful concepts is that of the Z-mapping (or off-diagonal antitone mapping). However, creating efficient solution methods, and even generalizing this concept to multivalued mappings, meets considerable difficulties.

Following [29] and [32], we consider some kinds of multivalued Z-mappings and discuss their properties. For a rather general class of GCPs of form (15), we suggest an extension of the Jacobi algorithm and obtain its convergence to a solution, thus presenting an existence result.


Properties of Multivalued Z-Mappings 


We present streamlined extensions of the above con- 
cepts for the multivalued case. 


Definition 6 A multivalued mapping $G: X \to \Pi(\mathbb{R}^n)$ is said to be

(a) a $P_0$-mapping, if for each pair of distinct points $x', x'' \in X$ and for each pair of vectors $g' \in G(x')$, $g'' \in G(x'')$ there exists an index $i$ such that $x'_i \neq x''_i$ and

$$(x'_i - x''_i)(g'_i - g''_i) \ge 0;$$

(b) a P-mapping, if for each pair of points $x', x'' \in X$ such that $x' \neq x''$ and for each pair of vectors $g' \in G(x')$, $g'' \in G(x'')$ there exists an index $i$ such that

$$(x'_i - x''_i)(g'_i - g''_i) > 0;$$

(c) a Z-mapping, if for each pair of points $x', x'' \in X$ such that $x' \ge x''$, $x' \neq x''$, it holds that $g'_k \le g''_k$ for all $g' \in G(x')$, $g'' \in G(x'')$ and for each index $k$ such that $x'_k = x''_k$.


Note that the additional condition $x' \neq x''$ cannot be dropped in Definition 6c, since otherwise the Z-mapping becomes single-valued. Hence, the above concept of the Z-mapping may appear too restrictive.


Definition 7 A mapping $G: \mathbb{R}^n \to \Pi(\mathbb{R}^n)$ is said to be
(a) diagonal if $G(x) = \prod_{i=1}^{n} G_i(x_i)$;

(b) quasi-diagonal if $G(x) = \prod_{i=1}^{n} G_i(x)$.

Clearly, (a) $\Rightarrow$ (b). Moreover, each single-valued mapping is quasi-diagonal. Next, observe that each diagonal single-valued mapping is a Z-mapping, but this is not the case if it is multivalued; hence, various compositions of multivalued diagonal and Z-mappings may not possess the Z property either.

We now present modified order monotonicity con- 
cepts of multivalued Z-mappings which enable us to re- 
move these difficulties. 


Definition 8 A mapping $G: X \to \Pi(\mathbb{R}^n)$ is said to be an upper (a lower) Z-mapping if for each pair of points $x', x'' \in X$ such that $x' \ge x''$ and for each $g' \in G(x')$ there exists $g'' \in G(x'')$ (respectively, for each $g'' \in G(x'')$ there exists $g' \in G(x')$) such that $g'_k \le g''_k$ for every index $k$ such that $x'_k = x''_k$.


Obviously, these concepts extend the similar one from Definition 6, and the condition $x' \neq x''$ is now unnecessary. They are also additive. Moreover, each diagonal mapping is both an upper and a lower Z-mapping.


Extended Jacobi Algorithm 

for Multivalued Mixed Complementarity Problems 
Let us consider GCP (15), where $G: X \to \Pi(\mathbb{R}^n)$ is of the form

$$G(x) = \sum_{s=1}^{l} F^{(s)} \circ H^{(s)}(x), \qquad (16)$$

where $F^{(s)}: \mathbb{R}^n \to \Pi(\mathbb{R}^n)$ is a quasi-diagonal, an upper Z- and a K-mapping on some rectangle containing $H^{(s)}(X)$, and $H^{(s)}: X \to \Pi(\mathbb{R}^n)$ is a diagonal monotone K-mapping for each $s = 1,\ldots,l$. Let us introduce the auxiliary set for GCP (15) and (16) as follows:

$$Q = \{x \ge 0 \mid \exists g \in G(x), \ g \ge 0\}.$$

Given a vector $x \in \mathbb{R}^n$ and a number $y_i$, we set

$$(x_{-i}, y_i) = (x_1, \ldots, x_{i-1}, y_i, x_{i+1}, \ldots, x_n).$$


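Algorithm (Jacobi). Choose a point $\bar{x} \in Q$ and, beginning from the point $x^0 = \bar{x}$, construct a sequence $\{x^k\}$ in conformity with the following rules.

At the $k$th iteration, $k = 0,1,\ldots$, we have a point $x^k \in Q$ such that $x^k \le x^0$ and that there exists $g^k \in \sum_{s=1}^{l} F^{(s)}(h^{(s),k})$ for some $h^{(s),k} \in H^{(s)}(x^k)$, $s = 1,\ldots,l$, such that $g^k \ge 0$. For each separate index $i = 1,\ldots,n$, we determine numbers $x_i^{k+1}$ and $p_i^{(s)}$, $s = 1,\ldots,l$, such that

$$0 \le x_i^{k+1} \le x_i^k, \quad \exists \tilde{g}_i^k \in \sum_{s=1}^{l} F_i^{(s)}\big(h_{-i}^{(s),k}, p_i^{(s)}\big), \quad \tilde{g}_i^k \ge 0, \quad x_i^{k+1} \tilde{g}_i^k = 0, \qquad (17)$$

$$p_i^{(s)} \in H_i^{(s)}(x_i^{k+1}), \quad p_i^{(s)} \le h_i^{(s),k} \quad \text{for } s = 1,\ldots,l, \qquad (18)$$

with the help of the bisection procedure below. Afterwards, set $h^{(s),k+1} = p^{(s)}$ for $s = 1,\ldots,l$ and go to the $(k+1)$th iteration.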

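Procedure (Bisection). It is applied when the indices $k$ and $i$ are fixed and consists of the following sequence of steps.

Step 1: If $g_i^k = 0$ or $x_i^k = 0$, set $x_i^{k+1} = x_i^k$, $\tilde{g}_i^k = g_i^k$, $p_i^{(s)} = h_i^{(s),k}$ for $s = 1,\ldots,l$ and stop. Otherwise go to Step 2.

Step 2: Choose $p_i^{(s)} \in H_i^{(s)}(0)$, $s = 1,\ldots,l$, and compute an element $\tilde{g}_i \in \sum_{s=1}^{l} F_i^{(s)}(h_{-i}^{(s),k}, p_i^{(s)})$. If $\tilde{g}_i \ge 0$, then set $x_i^{k+1} = 0$, $\tilde{g}_i^k = \tilde{g}_i$ and stop. Otherwise set $x'_i = 0$, $x''_i = x_i^k$, $p'^{(s)}_i = p_i^{(s)}$, $p''^{(s)}_i = h_i^{(s),k}$ for $s = 1,\ldots,l$.

Step 3: Generate a sequence of inscribed segments $[x'_i, x''_i]$ contracting to a point $z_i$ by choosing $y_i = \frac{1}{2}(x'_i + x''_i)$, $\bar{p}_i^{(s)} \in H_i^{(s)}(y_i)$ for $s = 1,\ldots,l$ and $g_i \in \sum_{s=1}^{l} F_i^{(s)}(h_{-i}^{(s),k}, \bar{p}_i^{(s)})$, and setting $x''_i = y_i$, $p''^{(s)}_i = \bar{p}_i^{(s)}$ for $s = 1,\ldots,l$ if $g_i \ge 0$, and $x'_i = y_i$, $p'^{(s)}_i = \bar{p}_i^{(s)}$ for $s = 1,\ldots,l$ if $g_i < 0$.

Step 4: Set $x_i^{k+1} = z_i$ and compute numbers $p_i^{(s)} \in H_i^{(s)}(z_i)$ for $s = 1,\ldots,l$ such that conditions (17) and (18) are satisfied.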


We present a convergence result for the Jacobi algo- 
rithm. 


Theorem 12 Suppose that the set $Q$ is nonempty. Then the Jacobi algorithm with the bisection procedure is well defined and generates a sequence $\{x^k\}$ converging to a solution $x^*$ of GCP (15) and (16) such that $0 \le x^* \le \bar{x}$.


Clearly, the corresponding modification of the Gauss–Seidel algorithm will possess similar convergence properties. Note that the above theorem in fact also contains an existence result.
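The simplest instance of the Jacobi algorithm is the single-valued case $l = 1$ with $H^{(1)}$ the identity, so that $G = F$. The sketch below covers only that case and assumes that $F$ is a Z-mapping, that each $F_i$ is nondecreasing in $x_i$, and that the starting point lies in $Q$; the bisection follows Steps 1 to 4 of the procedure, stopping at a tolerance and keeping the upper endpoint so that $F_i \ge 0$ is preserved. The linear complementarity data are hypothetical.

```python
def jacobi_z(F, x_bar, iters, tol=1e-13):
    """Single-valued sketch of the Jacobi algorithm with bisection for the GCP
    x >= 0, F(x) >= 0, <F(x), x> = 0.

    Assumes F is a Z-mapping, each F_i(x) is nondecreasing in x_i, and
    x_bar lies in Q = {x >= 0 : F(x) >= 0}.
    """
    x = list(x_bar)
    for _ in range(iters):
        new_x = []
        for i in range(len(x)):
            def g(t, i=i, x=x):            # F_i(x^k_{-i}, t), other components frozen
                y = list(x)
                y[i] = t
                return F(y)[i]
            if x[i] == 0.0 or g(x[i]) == 0.0:   # Step 1: keep x_i^k
                new_x.append(x[i])
                continue
            if g(0.0) >= 0.0:                   # Step 2: x_i^{k+1} = 0
                new_x.append(0.0)
                continue
            lo, hi = 0.0, x[i]                  # Step 3: g(lo) < 0 <= g(hi)
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if g(mid) >= 0.0:
                    hi = mid
                else:
                    lo = mid
            new_x.append(hi)   # Step 4 (approximate): hi keeps F_i >= 0
        x = new_x
    return x


# Hypothetical LCP with a Z-matrix: F(x) = Mx + q, unique solution x* = (1, 1).
M = [[2.0, -1.0], [-1.0, 2.0]]
q = [-1.0, -1.0]


def F(x):
    return [sum(M[i][j] * x[j] for j in range(2)) + q[i] for i in range(2)]


x = jacobi_z(F, [3.0, 3.0], 60)   # (3, 3) is in Q since F(3, 3) = (2, 2) >= 0
```

Starting from $\bar{x} = (3, 3)$, the iterates decrease monotonically, $x^{k+1}_i \approx (x^k_j + 1)/2$, toward the least solution $(1, 1)$, in agreement with $0 \le x^* \le \bar{x}$ in Theorem 12.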


Corollary 1 If the set Q is nonempty, then GCP (15) 
and (16) has a solution. 


It was also shown in [32] that the auxiliary set $Q$ is a meet semisublattice, i.e., for each pair of points $x, y \in Q$ it contains their minimal point (meet) $z = \min\{x, y\}$ with $z_i = \min\{x_i, y_i\}$ for $i = 1,\ldots,n$, if (16) is replaced by

$$G(x) = F \circ H(x) + V(x),$$

where $V: X \to \Pi(\mathbb{R}^n)$ is a quasi-diagonal, an upper Z- and a K-mapping, $H: X \to \Pi(\mathbb{R}^n)$ is a diagonal monotone K-mapping, and $F: \mathbb{R}^n \to \Pi(\mathbb{R}^n)$ is a quasi-diagonal, an upper Z- and a K-mapping on a rectangle containing $H(X)$. Hence, the set $Q$ has the least element $\min Q$, which is a solution of the GCP.

The above Jacobi algorithm can be extended to a more general class of problems. In fact, we can consider problem (1) where the feasible set $X$ is defined as follows:

$$X = \{x \in \mathbb{R}^n \mid -\infty \le a_i \le x_i \le b_i \le +\infty, \ i = 1,\ldots,n\}.$$

It is called the generalized mixed complementarity problem (GMCP) and can also be equivalently rewritten as follows: Find a point $x^* \in X$ such that

$$\exists g^* \in G(x^*): \quad g_i^* \ge 0 \text{ if } x_i^* = a_i, \quad g_i^* = 0 \text{ if } a_i < x_i^* < b_i, \quad g_i^* \le 0 \text{ if } x_i^* = b_i, \quad i = 1,\ldots,n. \qquad (19)$$

Then we should define the auxiliary set for the GMCP as follows:

$$Q = \{x \in X \mid \exists g \in G(x), \ x_i < b_i \Rightarrow g_i \ge 0\},$$

and all the above results remain true.

We can enhance the assertions of the theorems from 
Sects. “Regularization Methods” and “Proximal Point 
Method” for the regularization-type methods applied 
to GCP (15) or to GMCP (19) with order monotonicity ($P_0$) properties. In fact, if $G$ is $P_0$, then the auxiliary
mappings in (10) and (11) are P and the correspond- 
ing auxiliary problems will have a unique solution, thus 
strengthening assertions (i) of Theorem 3, (ii) of Theo- 
rem 4, and (i) of Theorem 6 [1,30,34]. 


Iterative Methods for MVIs 


Let $Q: \mathbb{R}^n \to \mathbb{R}^n$ be a continuous single-valued mapping and $f: \mathbb{R}^n \to \mathbb{R}$ be a convex, proper and lower semicontinuous function. The MVI problem is the problem of finding a point $x^* \in X$ such that

$$\langle Q(x^*), x - x^* \rangle + f(x) - f(x^*) \ge 0 \quad \forall x \in X. \qquad (20)$$


In this section, we denote by $X^*$ the solution set of problem (20). Problem (20) was originally considered by Lescarret [37] and Browder [6] and has been studied by many authors owing to its various applications. In the case $f \equiv 0$, it corresponds to the usual VI (2). If $f$ is subdifferentiable, MVI (20) becomes equivalent to the problem of finding $x^* \in X$ such that

$$\exists h^* \in \partial f(x^*), \quad \langle Q(x^*) + h^*, x - x^* \rangle \ge 0 \quad \forall x \in X, \qquad (21)$$

i.e., to GVI (1) with $G = Q + \partial f$, where $\partial f$ denotes the subdifferential mapping of $f$. Also, GVI (21) is a particular case of the problem: Find $x^* \in X$ such that

$$\exists h^* \in H(x^*), \quad \langle Q(x^*) + h^*, x - x^* \rangle \ge 0 \quad \forall x \in X, \qquad (22)$$

where $H: X \to \Pi(\mathbb{R}^n)$ is a monotone multivalued mapping. In turn, GVI (22) is a particular case of GVI (1) with $G = Q + H$.


In order to construct an efficient solution method for multivalued GVI (22) (or (21)), we can utilize the so-called splitting approach as a basis. In fact, if the mapping $H$ in GVI (22) (respectively, $\partial f$ in GVI (21)) is rather easy to invert, then one can apply the forward-backward splitting method, which is due to Lions and Mercier [38] and consists in constructing a sequence $\{x^k\}$ as follows: $x^{k+1} \in X$ is such that

$$\exists h^{k+1} \in H(x^{k+1}), \quad \langle Q(x^k) + \theta^{-1}(x^{k+1} - x^k) + h^{k+1}, y - x^{k+1} \rangle \ge 0 \quad \forall y \in X, \qquad (23)$$

where $\theta > 0$, i.e., each iteration is explicit with respect to $Q$ and implicit with respect to $H$. Method (23) is clearly simpler than the general proximal point method with respect to $Q + H$, but it requires strengthened monotonicity (co-coercivity) assumptions on $Q$ for convergence [13]. The combined averaging and splitting method (see [41]) allows for establishing convergence if $Q$ is only monotone, but it also utilizes the divergent series step-size rule.
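When $f(x) = \lambda\|x\|_1$ and $X$ is a box, the implicit part of (23) has a closed form: it is a coordinatewise soft-thresholding followed by clipping. The sketch below assumes exactly this setting, with hypothetical data $Q(x) = Mx + q$ where the symmetric part of $M$ is positive definite, so that $Q$ is strongly monotone and Lipschitz, hence co-coercive; the step $\theta = 0.2$ is chosen with this data in mind.

```python
def soft_clip(z, tau, lo, hi):
    """argmin over y in [lo, hi] of tau*|y| + 0.5*(y - z)**2: soft-threshold, then clip."""
    y = max(abs(z) - tau, 0.0) * (1.0 if z >= 0.0 else -1.0)
    return min(max(y, lo), hi)


def forward_backward(Q, lam, x0, lo, hi, theta, iters):
    """Iteration (23) for MVI (20) with f(x) = lam * ||x||_1 (so H = df) and
    X = [lo, hi]^n: an explicit step in Q followed by an implicit (proximal) step in f."""
    x = list(x0)
    for _ in range(iters):
        qx = Q(x)
        x = [soft_clip(xi - theta * qi, theta * lam, lo, hi) for xi, qi in zip(x, qx)]
    return x


# Hypothetical data: Q(x) = Mx + q, where M + M^T is positive definite.
M = [[2.0, 0.5], [-0.5, 1.0]]
q = [-1.0, -0.5]


def Q(x):
    return [M[i][0] * x[0] + M[i][1] * x[1] + q[i] for i in range(2)]


x = forward_backward(Q, 0.1, [0.0, 0.0], -1.0, 1.0, theta=0.2, iters=300)
print(x)   # approximately (14/45, 5/9), the solution of Mx + q + 0.1*(1, 1) = 0
```

For this data the solution lies in the interior of $X$ and has positive components, so it solves $Mx + q + \lambda(1,1) = 0$, i.e. $x^* = (14/45, 5/9) \approx (0.3111, 0.5556)$; the same data is reused in the sketches that follow.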


Descent Methods for MVIs 


In order to enhance the step-size rule, we can utilize the descent approach with respect to some artificial merit (or gap) function, which enables one to convert the MVI problem into an optimization problem.


Gap Function Approach for MVIs The simplest regularized gap function can be defined as follows:

$$\varphi_\alpha(x) = \max_{y \in X} \Phi_\alpha(x, y),$$

where

$$\Phi_\alpha(x, y) = \langle Q(x), x - y \rangle - 0.5\alpha\|x - y\|^2 + f(x) - f(y), \quad \alpha > 0.$$

The function $\Phi_\alpha(x, \cdot)$ is strongly concave; hence, there exists a unique element $y_\alpha(x) \in X$ such that $\Phi_\alpha(x, y_\alpha(x)) = \varphi_\alpha(x)$. Observe that the computation of $y_\alpha(x)$ is equivalent to an iteration of the forward-backward splitting method (23) with $H = \partial f$ and $\theta = 1/\alpha$, applied to MVI (20).

From the definition we have that the following properties of a point $x \in X$ are equivalent:

(a) $\varphi_\alpha(x) = 0$,

(b) $x = y_\alpha(x)$,

(c) $x$ is a solution of problem (20),

i.e., $\varphi_\alpha$ is a gap function for MVI (20), and MVI (20) is equivalent to the optimization problem

$$\min_{x \in X} \varphi_\alpha(x). \qquad (24)$$
Despite the fact that $\varphi_\alpha$ is nondifferentiable and nonconvex, we can describe descent methods with respect to $\varphi_\alpha$ without computation of its derivatives. Such a descent method with exact linesearch was proposed by Patriksson [42]. Moreover, it generates a sequence which converges to the unique solution of MVI (20) if the mapping $Q$ is strongly monotone. At the same time, inexact linesearch procedures are more suitable for implementation. For this reason we describe a descent method with an inexact Armijo-type linesearch procedure.

Method (DIG). Choose a point $x^0 \in X$ and numbers $\alpha > 0$, $\beta \in (0,1)$, and $\gamma \in (0,1)$.

At the $k$th iteration, $k = 0,1,\ldots$, we have a point $x^k \in X$; compute $y_\alpha(x^k)$ and set $d^k = y_\alpha(x^k) - x^k$. If $d^k = 0$, stop. Otherwise, we find $m$ as the smallest nonnegative integer such that

$$\varphi_\alpha(x^k + \gamma^m d^k) \le \varphi_\alpha(x^k) - \beta\gamma^m\|d^k\|^2,$$

set $\lambda_k = \gamma^m$, $x^{k+1} = x^k + \lambda_k d^k$ and go to the next iteration.


Theorem 13 If the mapping $Q$ is continuously differentiable and strongly monotone with constant $\tau$, and $\beta < \tau$, then (DIG) generates a sequence $\{x^k\}$ which converges to the unique solution of MVI (20).
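The sketch below evaluates $y_\alpha(x)$ and $\varphi_\alpha(x)$ and runs (DIG) on the same hypothetical data as the forward-backward example ($f = 0.1\|x\|_1$, $X = [-1,1]^2$, strong monotonicity constant $\tau = 1$, and $\beta = 0.5 < \tau$ as Theorem 13 requires). The bound on the number of backtracking trials is a safeguard against rounding errors that the method itself does not contain.

```python
def soft_clip(z, tau, lo, hi):
    """argmin over y in [lo, hi] of tau*|y| + 0.5*(y - z)**2."""
    y = max(abs(z) - tau, 0.0) * (1.0 if z >= 0.0 else -1.0)
    return min(max(y, lo), hi)


M = [[2.0, 0.5], [-0.5, 1.0]]     # hypothetical strongly monotone data (tau = 1)
q = [-1.0, -0.5]
LAM, LO, HI = 0.1, -1.0, 1.0      # f(x) = LAM * ||x||_1, X = [LO, HI]^2


def Q(x):
    return [M[i][0] * x[0] + M[i][1] * x[1] + q[i] for i in range(2)]


def f(x):
    return LAM * sum(abs(v) for v in x)


def y_alpha(x, alpha):
    """Maximizer of Phi_alpha(x, .) over X (a forward-backward step with theta = 1/alpha)."""
    return [soft_clip(xi - qi / alpha, LAM / alpha, LO, HI) for xi, qi in zip(x, Q(x))]


def phi(x, alpha):
    """Regularized gap function phi_alpha(x) = Phi_alpha(x, y_alpha(x))."""
    y, qx = y_alpha(x, alpha), Q(x)
    return (sum(qi * (xi - yi) for qi, xi, yi in zip(qx, x, y))
            - 0.5 * alpha * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
            + f(x) - f(y))


def dig(x0, alpha=1.0, beta=0.5, gamma=0.5, iters=500, max_m=60):
    """Sketch of (DIG) with the Armijo rule; max_m is a safeguard against rounding."""
    x = list(x0)
    for _ in range(iters):
        d = [yi - xi for yi, xi in zip(y_alpha(x, alpha), x)]
        dd = sum(di * di for di in d)
        if dd == 0.0:
            break
        fx, t = phi(x, alpha), 1.0
        for _ in range(max_m):
            trial = [xi + t * di for xi, di in zip(x, d)]
            if phi(trial, alpha) <= fx - beta * t * dd:
                break
            t *= gamma
        else:
            break            # no acceptable step found (e.g. rounding errors dominate)
        x = trial
    return x


x = dig([0.0, 0.0])
```

Since $x^{k+1}$ is a convex combination of $x^k \in X$ and $y_\alpha(x^k) \in X$, the iterates stay feasible without any projection.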


D-Gap Function Approach for MVIs For the usual VI (2), Peng [43] introduced the so-called D-gap function, which allows one to convert it into an unconstrained optimization problem. Following this approach, Konnov [23] proposed the D-gap function for MVI (20) and showed that, unlike the usual gap functions, it becomes differentiable if the mapping $Q$ is so, regardless of the properties of the function $f$. Hence, we can apply rapidly convergent derivative-based algorithms in order to find a solution of the initial MVI.
The D-gap function is defined as follows:

$$\psi_{\alpha\beta}(x) = \varphi_\alpha(x) - \varphi_\beta(x),$$

where $0 < \alpha < \beta$. It follows that MVI (20) is equivalent to the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} \psi_{\alpha\beta}(x).$$
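The nonnegativity of the D-gap function on all of $\mathbb{R}^n$ follows in one line from the definitions: $\varphi_\alpha(x) \ge \Phi_\alpha(x, y_\beta(x)) = \varphi_\beta(x) + \frac{1}{2}(\beta - \alpha)\|x - y_\beta(x)\|^2$. The sketch below evaluates $\psi_{\alpha\beta}$ on the same hypothetical data as before, with the assumed values $\alpha = 0.5$ and $\beta = 2$, including at points outside $X$.

```python
def soft_clip(z, tau, lo, hi):
    """argmin over y in [lo, hi] of tau*|y| + 0.5*(y - z)**2."""
    y = max(abs(z) - tau, 0.0) * (1.0 if z >= 0.0 else -1.0)
    return min(max(y, lo), hi)


M = [[2.0, 0.5], [-0.5, 1.0]]     # hypothetical data, as in the (DIG) sketch
q = [-1.0, -0.5]
LAM, LO, HI = 0.1, -1.0, 1.0      # f(x) = LAM * ||x||_1, X = [LO, HI]^2


def Q(x):
    return [M[i][0] * x[0] + M[i][1] * x[1] + q[i] for i in range(2)]


def f(x):
    return LAM * sum(abs(v) for v in x)


def y_alpha(x, alpha):
    return [soft_clip(xi - qi / alpha, LAM / alpha, LO, HI) for xi, qi in zip(x, Q(x))]


def phi(x, alpha):
    y, qx = y_alpha(x, alpha), Q(x)
    return (sum(qi * (xi - yi) for qi, xi, yi in zip(qx, x, y))
            - 0.5 * alpha * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
            + f(x) - f(y))


def psi(x, alpha=0.5, beta=2.0):
    """D-gap function psi_{alpha beta}(x) = phi_alpha(x) - phi_beta(x), 0 < alpha < beta."""
    return phi(x, alpha) - phi(x, beta)


x_star = [14 / 45, 5 / 9]          # solution of this hypothetical MVI
for x in ([0.0, 0.0], [3.0, -2.0], x_star):
    print(x, psi(x))               # nonnegative everywhere, zero at x_star
```

The point $(3, -2)$ lies outside $X$, which illustrates that $\psi_{\alpha\beta}$ is minimized over the whole space, with no feasibility constraint.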


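Next, if $Q$ is continuously differentiable, so is $\psi_{\alpha\beta}$, and

$$\nabla\psi_{\alpha\beta}(x) = \nabla Q(x)[y_\beta(x) - y_\alpha(x)] + \beta(x - y_\beta(x)) - \alpha(x - y_\alpha(x)).$$

If $\nabla Q(x)$ is positive definite on $\mathbb{R}^n$, then MVI (20) is equivalent to the equation

$$\nabla\psi_{\alpha\beta}(x) = 0.$$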


Utilizing the above properties, we can describe a descent method with respect to $\psi_{\alpha\beta}$ without computation of its derivatives.

Method (DIDG). Choose a point $x^0 \in \mathbb{R}^n$ and numbers $\beta > \alpha > 0$, $\rho > 0$, $\gamma \in (0,1)$, $\sigma > 0$.

At the $k$th iteration, $k = 0,1,\ldots$, we have a point $x^k$; compute $y_\alpha(x^k)$ and $y_\beta(x^k)$, set

$$r(x^k) := y_\alpha(x^k) - y_\beta(x^k),$$

$$s(x^k) := \alpha(x^k - y_\alpha(x^k)) - \beta(x^k - y_\beta(x^k)),$$

and $d^k := r(x^k) + \rho s(x^k)$. If $d^k = 0$, stop. Otherwise, we compute $m$ as the smallest nonnegative integer such that

$$\psi_{\alpha\beta}(x^k + \gamma^m d^k) \le \psi_{\alpha\beta}(x^k) - \gamma^m\sigma\left(\|r(x^k)\| + \rho\|s(x^k)\|\right)^2,$$

set $\lambda_k = \gamma^m$, $x^{k+1} = x^k + \lambda_k d^k$ and go to the next iteration.

If the mapping $Q$ is strongly monotone, (DIDG) also generates a sequence $\{x^k\}$ which converges to the unique solution of MVI (20).

In [25], this approach was extended to MVI (20) with order monotonicity (P) properties. When the mapping $Q$ is only monotone (or $P_0$) but not strongly monotone, the above descent methods can be combined with either regularization or proximal point methods, being used to solve their auxiliary subproblems approximately.


Combined Relaxation Methods for MVIs 


We describe a combined relaxation method for solving monotone MVI (20) which utilizes an iteration similar to that of the forward-backward splitting method as an auxiliary procedure [24,26]. In this subsection we suppose that $X$ is a nonempty, closed and convex subset of $\mathbb{R}^n$, $Q: \mathbb{R}^n \to \mathbb{R}^n$ is a continuous monotone mapping and $f: \mathbb{R}^n \to \mathbb{R}$ is a convex and subdifferentiable function. For the sake of clarity, we describe a simplified version of the method.

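Method (CRS). Step 0 (initialization): Choose a point $x^0 \in \mathbb{R}^n$ and a sequence of $n \times n$ symmetric matrices $\{A_k\}$ such that

$$\tau'\|p\|^2 \le \langle A_k p, p \rangle \le \tau''\|p\|^2 \quad \forall p \in \mathbb{R}^n, \quad 0 < \tau' \le \tau'' < \infty. \qquad (25)$$

Choose numbers $\alpha \in (0,1)$, $\beta \in (0,1)$, and $\gamma \in (0,2)$. Set $k := 0$.

Step 1 (auxiliary procedure):

Step 1.1: Determine $m$ as the smallest nonnegative integer such that

$$\langle Q(x^k) - Q(z^{k,m}), x^k - z^{k,m} \rangle \le (1-\alpha)\beta^{-m}\langle A_k(z^{k,m} - x^k), z^{k,m} - x^k \rangle,$$

where $z^{k,m}$ is a solution of the auxiliary problem: Find $z^{k,m} \in X$ such that

$$\langle Q(x^k) + \beta^{-m}A_k(z^{k,m} - x^k), x - z^{k,m} \rangle + f(x) - f(z^{k,m}) \ge 0 \quad \forall x \in X. \qquad (26)$$

Step 1.2: Set $\theta_k := \beta^m$, $y^k := z^{k,m}$. If $x^k = y^k$, stop. Otherwise set

$$g^k := Q(y^k) - Q(x^k) - \theta_k^{-1}A_k(y^k - x^k), \qquad \omega_k := \langle g^k, x^k - y^k \rangle.$$

Step 2 (main iteration): Set

$$x^{k+1} := x^k - \gamma(\omega_k/\|g^k\|^2)g^k,$$

$k := k+1$ and go to Step 1.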

Obviously, there are a number of rules for choosing the sequence $\{A_k\}$ satisfying condition (25). The simplest is $A_k \equiv I$, which yields the usual forward-backward splitting iteration.
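With $A_k \equiv I$, the auxiliary problem (26) is a forward-backward step with step size $\theta = \beta^m$, so on the hypothetical data used above ($f = 0.1\|x\|_1$, $X = [-1,1]^2$) it has the same closed form. The following is a simplified sketch of (CRS) under these assumptions; the parameter values and the cap on backtracking trials are illustrative choices.

```python
def soft_clip(z, tau, lo, hi):
    """argmin over y in [lo, hi] of tau*|y| + 0.5*(y - z)**2."""
    y = max(abs(z) - tau, 0.0) * (1.0 if z >= 0.0 else -1.0)
    return min(max(y, lo), hi)


M = [[2.0, 0.5], [-0.5, 1.0]]     # hypothetical monotone data, as above
q = [-1.0, -0.5]
LAM, LO, HI = 0.1, -1.0, 1.0      # f(x) = LAM * ||x||_1, X = [LO, HI]^2


def Q(x):
    return [M[i][0] * x[0] + M[i][1] * x[1] + q[i] for i in range(2)]


def crs(x0, alpha=0.5, beta=0.5, gamma=1.5, iters=3000, max_m=60):
    """Simplified sketch of (CRS) with A_k = I."""
    x = list(x0)
    for _ in range(iters):
        qx = Q(x)
        for m in range(max_m):                      # Step 1.1: backtracking on m
            theta = beta ** m
            # z^{k,m} solves (26): a forward-backward step with step size theta
            z = [soft_clip(xi - theta * qi, theta * LAM, LO, HI) for xi, qi in zip(x, qx)]
            qz = Q(z)
            lhs = sum((a - b) * (xi - zi) for a, b, xi, zi in zip(qx, qz, x, z))
            rhs = (1.0 - alpha) / theta * sum((zi - xi) ** 2 for zi, xi in zip(z, x))
            if lhs <= rhs:
                break
        y, qy = z, qz                               # Step 1.2: theta_k = theta
        if y == x:
            break                                   # x^k solves MVI (20)
        g = [a - b - (yi - xi) / theta for a, b, yi, xi in zip(qy, qx, y, x)]
        omega = sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:
            break
        x = [xi - gamma * omega / gg * gi for xi, gi in zip(x, g)]   # Step 2
    return x


x = crs([0.0, 0.0])
```

The main step is a relaxed projection onto the hyperplane $\{z : \langle g^k, z - y^k \rangle = 0\}$, which separates $x^k$ from the solution set; unlike (23), the new iterate $x^{k+1}$ is not projected back onto $X$.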


Theorem 14 Let a sequence $\{x^k\}$ be constructed by (CRS). If the method terminates at the $k$th iteration, then $x^k \in X^*$. Otherwise, if $\{x^k\}$ is infinite, then

$$\lim_{k \to \infty} x^k = x^* \in X^*.$$


Note that problem (20) has a unique solution if $Q$ is strongly monotone. Then (CRS) converges at least linearly.

Theorem 15 Suppose that $Q$ is strongly monotone. If (CRS) generates an infinite sequence $\{x^k\}$, then $\{x^k\}$ converges to a solution of problem (20) at a linear rate.


This approach admits various extensions and modifications. For instance, we can adjust the previous combined relaxation method to problem (20) with the function $f$ having the form

$$f(x) = \max_{i=1,\ldots,m} f_i(x), \qquad (27)$$

where $f_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1,\ldots,m$, are continuously differentiable convex functions [27]. In this method, the function $f$ in (26) is replaced by its lower approximation

$$\tilde{f}_k(x) = \max_{i=1,\ldots,m}\{f_i(x^k) + \langle \nabla f_i(x^k), x - x^k \rangle\}.$$

Hence, if the feasible set $X$ is defined by affine functions, then the auxiliary problem is equivalent to a convex quadratic programming problem and can be solved exactly by a finite algorithm. At the same time, the modified method possesses the same convergence properties.

The combined relaxation methods described above, which are based on auxiliary splitting iterations, can be applied to nonmonotone multivalued GVIs of form (22) and to nonmonotone mixed equilibrium problems; see [24,31,35] for more details.

Hemivariational inequalities are generalizations of variational inequalities. They are used to mathematically model problems from mechanics, engineering and economics whenever nonconvex energy functionals are present. Typical applications in mechanics are, e.g., contact problems of elastic bodies involving nonmonotone friction laws or adhesive contact laws, and the delamination of adhesively connected plates. The basic hemivariational inequality is of the following form:


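Find $u \in K$ ($K$ is a convex subset of a Hilbert space $X$) such that

$$a(u, v - u) + \int_\Omega j^\circ(u; v - u)\,dx \ge \langle F, v - u \rangle \quad \forall v \in K, \qquad (1)$$

where $a: X \times X \to \mathbb{R}$ is a bilinear form, $\Omega$ a subset of $\mathbb{R}^N$, $j^\circ(\cdot\,;\cdot)$ the Clarke generalized directional derivative of the locally Lipschitz function $j: \mathbb{R}^m \to \mathbb{R}$ defined in [3] by

$$j^\circ(\xi; \eta) = \limsup_{\zeta \to \xi, \ \lambda \downarrow 0} \frac{j(\zeta + \lambda\eta) - j(\zeta)}{\lambda},$$

$\langle \cdot, \cdot \rangle$ the duality pairing between $X$ and $X^*$ ($X^*$ is the dual space of $X$) and $F \in X^*$. If the function $j$ is convex, then the hemivariational inequality (1) reduces to the classical variational inequality: Find $u \in K$ such that

$$a(u, v - u) + \int_\Omega j(v)\,dx - \int_\Omega j(u)\,dx \ge \langle F, v - u \rangle \quad \forall v \in K.$$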


The concept of the hemivariational inequality was in- 
troduced by P.D. Panagiotopoulos. The mathematical 
theory and the applications are studied in [14,15]. 


Discrete Problem 


The hemivariational inequalities are discretized by the finite element method [2]. This leads to the following fully discrete problem: Find $u\in K$ such that

$$ u^T A(v-u) + \sum_{i\in I} c_i\, j^\circ(u_i; v_i - u_i) \ \ge\ F^T(v-u) \quad \forall v\in K, \qquad (2) $$

where $A$ is the stiffness matrix, $F$ the load vector, $c_i$ the coefficients of the appropriate numerical integration formula, $K$ a convex subset of the finite element space $X_h\subset X$, $h$ the discretization parameter connected to the mesh size of the triangulation of $\Omega$, and $I$ the set of the components of $u$ for which the function $j$ has an effect. The following stability and convergence result has been proved for the above approximation scheme [9,10]:


Theorem 1 The problems (2) are solvable for every h. 
Further, their solutions converge in subsequences to the 
solutions of the continuous problem (1). 
Due to the nonconvex character of the function $j$, the solutions of the continuous problem (1) and of the discrete problems (2) are in general not unique. This also explains why the above result guarantees convergence only for subsequences and why no convergence rate estimate has been proved.


Numerical Realization 


Because of the nonsmoothness and the nonconvexity of $j$, the numerical realization of (2) is a challenging problem. Different approaches exist (see, e.g., [15]). The most obvious one is to regularize the nonsmoothness and use methods for smooth problems. Another possibility is to approximate (2) by a convex, possibly nonsmooth, problem and apply numerical methods for classical variational inequalities. Both approaches are typically iterative: in the former the problem is solved for many regularization parameters, and in the latter the convex approximation is updated in every iteration step. To solve (2) directly in its original form, one can use nonsmooth nonconvex optimization methods.

Next we explain in detail how the discrete hemivariational inequality (2) can be transformed into a nonsmooth optimization problem. The following concepts from nonsmooth analysis are needed [3]:

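• Suppose that $f:\mathbb{R}^m\to\mathbb{R}$ is locally Lipschitz continuous. Then $\xi^*$ is called a substationary point of $f$ on $K$ if

$$ 0 \in \partial f(\xi^*) + N_K(\xi^*), $$

where $N_K(\xi)$ is the normal cone of $K$ at $\xi$, defined by

$$ N_K(\xi) = \operatorname{cl}\bigcup_{\lambda>0} \lambda\,\partial d_K(\xi), $$

$d_K$ the distance function of $K$, and $\partial f(\xi)$ is the Clarke subdifferential, defined by

$$ \partial f(\xi) = \{\eta\in\mathbb{R}^m : f^\circ(\xi;\zeta) \ge \eta^T\zeta \ \text{ for all } \zeta\in\mathbb{R}^m\}. $$

• The function $f:\mathbb{R}^m\to\mathbb{R}$ is said to be upper semismooth if for any $\xi\in\mathbb{R}^m$, $\zeta\in\mathbb{R}^m$ and sequences $\{\eta_i\}\subset\mathbb{R}^m$ and $\{t_i\}\subset(0,\infty)$ satisfying $\eta_i\in\partial f(\xi + t_i\zeta)$ and $t_i\downarrow 0$, one has

$$ \limsup_{i\to\infty} \eta_i^T\zeta \ \ge\ \liminf_{i\to\infty}\frac{f(\xi + t_i\zeta) - f(\xi)}{t_i}. $$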


In the sequel it is assumed that the stiffness matrix $A$ is symmetric. Then the following discrete energy function can be defined:

$$ f(u) = \tfrac12 u^T A u + \sum_{i\in I} c_i\, j(u_i) - F^T u, $$

and, consequently, the optimization problem

$$ \min_{u\in K} f(u) \qquad (3) $$

is formulated.
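As a minimal sketch, the discrete energy $f(u)$ can be evaluated directly from $A$, $F$, the weights $c_i$ and the index set $I$. The function `discrete_energy` and the choice $j(t)=\min\{t^2,1\}$ (locally Lipschitz, nonsmooth and nonconvex) are hypothetical and only illustrate the formula; $A$ is assumed symmetric as stated above.

```python
from typing import Callable, Sequence


def discrete_energy(
    A: Sequence[Sequence[float]],
    F: Sequence[float],
    c: Sequence[float],
    I: Sequence[int],
    j: Callable[[float], float],
    u: Sequence[float],
) -> float:
    """Hypothetical helper: f(u) = 1/2 u^T A u + sum_{i in I} c_i j(u_i) - F^T u."""
    n = len(u)
    quad = 0.5 * sum(u[r] * A[r][s] * u[s] for r in range(n) for s in range(n))
    nonsmooth = sum(c[i] * j(u[i]) for i in I)
    linear = sum(F[r] * u[r] for r in range(n))
    return quad + nonsmooth - linear


def j_example(t: float) -> float:
    # Hypothetical nonconvex, nonsmooth, locally Lipschitz choice of j.
    return min(t * t, 1.0)


A = [[2.0, -1.0], [-1.0, 2.0]]  # symmetric "stiffness matrix" (made-up data)
F = [1.0, 1.0]
c = [0.5, 0.5]
val = discrete_energy(A, F, c, I=[0, 1], j=j_example, u=[1.0, 1.0])
print(val)  # 0.0
```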

The main question now is how the optimization problem (3) is related to the inequality problem (2). Under reasonable assumptions, which are generally satisfied in real applications, the subdifferential of $f$ is

$$ \partial f(u) = Au + \sum_{i\in I} c_i\, \partial j(u_i) - F. $$

The following result then follows from the definition of the subdifferential and the upper semismoothness (see the proofs in [2,3]):


Theorem 2 Every substationary point of f on K is 
a solution of the discrete hemivariational inequality (2). 
Moreover, the functional f is upper semismooth. 


This result gives the theoretical basis and the motivation for the use of nonsmooth optimization methods for the numerical solution of hemivariational inequalities. In what follows, optimization methods are introduced for nonsmooth nonconvex functionals which are convergent under the condition that the functional is upper semismooth. Furthermore, some observations from the numerical tests performed in [11,12] are presented.


Nonsmooth Optimization Methods 


The methods for solving the nonsmooth optimization 
problem (3) can be divided into two main classes: sub- 
gradient methods and bundle methods. The basic idea 
behind the subgradient methods is to generalize the 
smooth methods by replacing the gradient by an arbi- 
trary subgradient (see [17]). Due to this simple struc- 
ture they are widely used, but suffer from some theoret- 
ical and numerical drawbacks. 

Bundle methods have their origin in cutting plane methods and can be considered, at the moment, the most efficient and promising methods in nonsmooth optimization (see [13]). The aim is to produce a sequence $\{u_k\}_{k=1}^\infty\subset\mathbb{R}^N$ converging to a local minimum of (3), which is also a substationary point of $f$ on $K$. Suppose that, in addition to the current iteration point $u_k$, there exist some trial points $y_j\in\mathbb{R}^N$ (from past iterations) and subgradients $g_j\in\partial f(y_j)$ for $j\in J_k$, where the index set $J_k$ is a nonempty subset of $\{1,\dots,k\}$.

The idea behind the bundle methods is to approximate the objective function from below by a piecewise linear function; in other words, $f$ is replaced by the so-called cutting plane model

$$ \hat f_k(u) = \max_{j\in J_k}\{f(y_j) + g_j^T(u - y_j)\}, \qquad (4) $$

which can equivalently be written in the form

$$ \hat f_k(u) = \max_{j\in J_k}\{f(u_k) + g_j^T(u - u_k) - \alpha_j^k\}, $$

with the linearization error

$$ \alpha_j^k = f(u_k) - f(y_j) - g_j^T(u_k - y_j). \qquad (5) $$


If $f$ is convex, then $\hat f_k(u)\le f(u)$ for all $u\in\mathbb{R}^N$ and $\alpha_j^k\ge 0$ for all $j\in J_k$. In other words, the cutting plane model $\hat f_k$ is an underestimate for $f$, and the nonnegative linearization error $\alpha_j^k$ measures how good an approximation the model is to the original problem. In the nonconvex case these facts are no longer valid, and thus the linearization error (5) is replaced by the so-called subgradient locality measure (cf. [5])

$$ \beta_j^k = \max\{|\alpha_j^k|,\ \gamma\|u_k - y_j\|^2\}, \qquad (6) $$

where $\gamma\ge 0$ ($\gamma = 0$ if $f$ is convex). With $\alpha_j^k$ replaced by $\beta_j^k$ in the model $\hat f_k$, obviously $\min_u \hat f_k(u)\le \hat f_k(u_k)\le f(u_k)$ and $\beta_j^k\ge 0$ for all $j\in J_k$. The search direction is then calculated by

$$ d_k = \arg\min_{d\in\mathbb{R}^N}\Big\{\hat f_k(u_k + d) + \tfrac12 d^T M_k d\Big\}. \qquad (7) $$


The role of the stabilizing term $\tfrac12 d^T M_k d$ is to guarantee the existence of the solution $d_k$ and to keep the approximation local enough. The $N\times N$ matrix $M_k$ is intended to accumulate some second order information about the curvature of $f$ around $u_k$.
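The quantities (5) and (6) and the resulting cutting plane model are straightforward to compute once the bundle $\{(y_j, g_j)\}$ is stored. The Python sketch below assumes a user-supplied objective `f` and stored subgradients `gs`; all function names are hypothetical. The example uses the convex function $f(u)=|u|$ with $\gamma=0$, so $\beta_j^k=\alpha_j^k$.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def locality_measures(
    f: Callable[[Vector], float],
    uk: Vector,
    ys: Sequence[Vector],
    gs: Sequence[Vector],
    gamma: float,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: linearization errors (5) and locality measures (6)."""
    fuk = f(uk)
    alphas, betas = [], []
    for y, g in zip(ys, gs):
        diff = [a - b for a, b in zip(uk, y)]            # u_k - y_j
        alpha = fuk - f(y) - dot(g, diff)                 # (5)
        beta = max(abs(alpha), gamma * dot(diff, diff))   # (6)
        alphas.append(alpha)
        betas.append(beta)
    return alphas, betas


def cutting_plane_model(
    f: Callable[[Vector], float],
    uk: Vector,
    gs: Sequence[Vector],
    betas: Sequence[float],
    u: Vector,
) -> float:
    """Hypothetical helper: max_j { f(u_k) + g_j^T (u - u_k) - beta_j^k }."""
    fuk = f(uk)
    step = [a - b for a, b in zip(u, uk)]
    return max(fuk + dot(g, step) - beta for g, beta in zip(gs, betas))


# Convex example f(u) = |u| with bundle points y_1 = 1, y_2 = -1 and u_k = 1.
f = lambda u: abs(u[0])
uk = [1.0]
ys = [[1.0], [-1.0]]
gs = [[1.0], [-1.0]]
alphas, betas = locality_measures(f, uk, ys, gs, gamma=0.0)
print(alphas, betas)                                 # [0.0, 2.0] [0.0, 2.0]
print(cutting_plane_model(f, uk, gs, betas, [0.0]))  # 0.0
```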

The different bundle methods differ mostly in the choice of $M_k$. Roughly speaking, the following methods can be distinguished:

• Cutting plane method [4] with $M_k = 0$.
• Conjugate subgradient method [18] with $M_k = I$ and $\beta_j^k = 0$.
• ε-steepest descent [7] and generalized cutting plane method [5] with $M_k = I$.
• Bundle trust region [16] and proximal bundle method [6] with $M_k = \lambda_k I$.
• Variable metric bundle method [1] with $M_k$ as a full matrix.
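For the proximal bundle choice $M_k=\lambda_k I$, problem (7) can be solved exactly in one dimension, which is enough to illustrate the direction finding step. The sketch below is a hypothetical 1-D helper, not the quadratic programming solver of any particular code. It represents each cutting plane by an offset $a_j=f(u_k)-\beta_j^k$ and a slope $g_j$, and uses the fact that the convex piecewise quadratic objective is minimized either at a stationary point $-g_j/\lambda$ of one piece or at a kink where two planes cross.

```python
from itertools import combinations
from typing import List, Tuple


def proximal_direction_1d(pieces: List[Tuple[float, float]], lam: float) -> float:
    """Hypothetical helper: solve min_d max_j (a_j + g_j d) + (lam/2) d^2 exactly (lam > 0).

    Each piece (a_j, g_j) encodes the plane f(u_k) + g_j d - beta_j^k, i.e.
    a_j = f(u_k) - beta_j^k; lam * I plays the role of M_k.
    """

    def phi(d: float) -> float:
        return max(a + g * d for a, g in pieces) + 0.5 * lam * d * d

    # Stationary points of each smooth quadratic piece ...
    candidates = [-g / lam for _, g in pieces]
    # ... and kinks where two cutting planes intersect.
    for (a1, g1), (a2, g2) in combinations(pieces, 2):
        if g1 != g2:
            candidates.append((a2 - a1) / (g1 - g2))
    return min(candidates, key=phi)


# Planes from f(u) = |u| at u_k = 1 (f(u_k) = 1, beta = [0, 2]).
pieces = [(1.0, 1.0), (-1.0, -1.0)]
d = proximal_direction_1d(pieces, lam=1.0)
v = max(a + g * d for a, g in pieces) - 1.0  # v_k = hat f_k(u_k + d_k) - f(u_k)
print(d, v)  # -1.0 -1.0
```

Here the step $u_k + d_k = 0$ lands on the minimizer of $|u|$, and the predicted descent $v_k$ is negative as required.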

Although the more advanced bundle methods try to accumulate second order information, they are based on first order (sub)gradient information and thus have to be considered first order methods. The 'real' second order method, called the bundle-Newton method, was derived in [8]. Instead of the piecewise linear cutting plane model (4), a quadratic model was introduced in the form

$$ \tilde f_k(u) = \max_{j\in J_k}\Big\{f(y_j) + g_j^T(u - y_j) + \tfrac12 (u - y_j)^T M_j (u - y_j)\Big\}, $$

where $M_j\approx\nabla^2 f(y_j)$. The search direction finding problem (7) is then replaced by the problem

$$ d_k = \arg\min_{d\in\mathbb{R}^N}\{\tilde f_k(u_k + d)\}. \qquad (8) $$


Next, the problem of determining the stepsize in the search direction $d_k$ is considered. Assume that $m_L\in(0,\tfrac12)$, $m_R\in(m_L,1)$ and $\bar t\in(0,1]$ are fixed line search parameters. First, search for the largest number $t_L^k\in[0,1]$ such that $t_L^k\ge\bar t$ and

$$ f(u_k + t_L^k d_k) \le f(u_k) + m_L t_L^k v_k, \qquad (9) $$

where $v_k$ is the predicted amount of descent

$$ v_k = \hat f_k(u_k + d_k) - f(u_k) < 0. $$

If such a parameter exists, take a long serious step

$$ u_{k+1} = u_k + t_L^k d_k \quad\text{and}\quad y_{k+1} = u_{k+1}. $$

Otherwise, if (9) holds but $0 < t_L^k < \bar t$, take a short serious step

$$ u_{k+1} = u_k + t_L^k d_k \quad\text{and}\quad y_{k+1} = u_k + t_R^k d_k, $$
and if $t_L^k = 0$, take a null step

$$ u_{k+1} = u_k \quad\text{and}\quad y_{k+1} = u_k + t_R^k d_k, $$

where $t_R^k > t_L^k$ is such that

$$ -\beta_{k+1}^{k+1} + g_{k+1}^T d_k \ge m_R v_k. \qquad (10) $$

In short serious steps and null steps there is a discontinuity in the gradient of $f$. The requirement (10) then ensures that $u_k$ and $y_{k+1}$ lie on opposite sides of this discontinuity, and the new subgradient $g_{k+1}\in\partial f(y_{k+1})$ will force a significant modification of the next search direction finding problem. The iteration is terminated if $|v_k|$ is small enough.
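The step classification in the line search can be sketched as follows. This is a simplified 1-D illustration under stated assumptions: instead of computing the exact largest $t_L^k$ satisfying (9), it tests the halving sequence $t = 1, 1/2, 1/4, \dots$, and it neither computes $t_R^k$ nor checks (10). The function name `classify_step` and its default parameter values are hypothetical.

```python
from typing import Callable, Tuple


def classify_step(
    f: Callable[[float], float],
    u: float,
    d: float,
    v: float,
    m_L: float = 0.1,
    t_bar: float = 0.5,
    t_min: float = 1e-6,
) -> Tuple[str, float]:
    """Hypothetical helper: return (step type, t_L) using the descent test (9) in 1-D.

    Tries t = 1, 1/2, 1/4, ... down to t_min instead of the exact largest t_L.
    """
    fu = f(u)
    t = 1.0
    while t >= t_min:
        if f(u + t * d) <= fu + m_L * t * v:  # descent test (9)
            return ("long serious" if t >= t_bar else "short serious"), t
        t *= 0.5
    return "null", 0.0  # t_L = 0: u_{k+1} = u_k, only a new trial point y_{k+1} is added


print(classify_step(abs, 1.0, -1.0, -1.0))  # ('long serious', 1.0)
print(classify_step(abs, 1.0, -4.0, -1.0))  # ('short serious', 0.25)
print(classify_step(abs, 1.0, 1.0, -1.0))   # ('null', 0.0)
```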

The pseudocode of a general bundle method is the following:


PROCEDURE bundle method()
  InputInstance();
  Generate an initial solution u_1;
  Initialize the bundle J_1 and v_1;
  Set k = 1;
  DO WHILE |v_k| > ε
    Generate the search direction d_k;
    Find the stepsizes t_L^k and t_R^k;
    Update u_k and J_k;
    Set k = k + 1;
    Evaluate f(u_k) and g_k ∈ ∂f(u_k);
  OD;
  RETURN (final solution u_k)
END bundle method;


Numerical Experience 


The numerical tests in [12] indicate the applicability of bundle methods to hemivariational inequalities. In particular, the second order bundle-Newton method based on the piecewise quadratic model works very reliably and efficiently. This is natural, since the optimization problem arising from hemivariational inequalities has a dominant quadratic part. The most promising feature of the bundle-Newton method discovered was that the numbers of iterations and function evaluations were independent of the dimension of the problem, even in the large scale case.
The standard semidefinite program has the form:

$$ (\mathrm{SDP})\qquad \begin{array}{ll} \min & C\bullet X\\ \text{s.t.} & A_i\bullet X = b_i,\quad i = 1,\dots,m,\\ & X\succeq 0, \end{array} $$

where the given matrices $A_i\in\mathbb{R}^{n\times n}$ and $C\in\mathbb{R}^{n\times n}$ are symmetric, $b\in\mathbb{R}^m$, and the unknown $X\in\mathbb{R}^{n\times n}$ is also symmetric. Furthermore, $C\bullet X = \operatorname{tr} C^T X = \sum_{j,k} C_{jk}X_{jk}$, and $X\succeq 0$ means that $X$ is positive semidefinite. In most applications, $A_i = a_i a_i^T$, $a_i\in\mathbb{R}^n$, is a rank-one matrix and $C$ is sparse.

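The dual of (SDP) can be written as:

$$ (\mathrm{DSDP})\qquad \begin{array}{ll} \max & b^T y\\ \text{s.t.} & \sum_{i=1}^m y_i A_i + S = C,\\ & S\succeq 0, \end{array} $$

where $y_i$, $i = 1,\dots,m$, are scalar variables.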


This pair of semidefinite programs can be solved in 
‘polynomial time’. There are actually several polyno- 
mial algorithms. One is the primal-scaling algorithm 
([1,13,16,17]), which is the analogue of the primal 
potential reduction algorithm for linear program- 
ming. This algorithm uses X to generate the next it- 
erate direction. Another is the dual-scaling algorithm 
([2,9,16,17]), which is the analogue of the dual-scaling algorithm for linear programming. The dual-scaling algorithm uses only S to generate the iterate direction. The third is the primal-dual scaling algorithm, which uses both X and S to generate iterate directions, including the Alizadeh-Haeberly-Overton, Helmberg-Rendl-Vanderbei-Wolkowicz/Kojima-Shindoh-Hara/Monteiro, Nesterov-Todd, Gu, and Toh directions, as well as directions called the MTW and Half directions (see [6,15], and references therein). All these algorithms possess $O(\sqrt{n}\log(1/\varepsilon))$ iteration complexity to yield duality gap accuracy $\varepsilon$.

Although they are ‘polynomially’ solvable, semidef- 
inite programs with dimension n above 1000 have been 
extremely hard to solve in practice, due to the density 
of matrices involved in computation. Thus, exploiting 
the structure and sparsity characteristic of large scale 
semidefinite programs becomes critical to the efficient 
computation of their solution. 

Many large scale semidefinite programs, such as 
the relaxations of combinatorial and quadratic opti- 
mization problems, have features which make the dual- 
scaling algorithm the most suitable choice: 

1) For large scale problems, S tends to be very sparse and structured since it is a linear combination of C and the $A_i$'s. This sparsity allows considerable savings in both memory and computation time. On the other hand, X, the primal matrix, may be much less sparse and its structure not known beforehand. Thus, primal or primal-dual algorithms cannot fully exploit the sparseness and structure of the data.

2) Many problems under consideration require less accuracy than some other applications. Therefore, the superlinear convergence exhibited by the primal-dual algorithm may not be utilized in our applications. The dual-scaling linear programming algorithm has been shown to perform equally well when only a lower precision answer is required.

3) In most combinatorial applications, we need only a lower bound for the optimal objective value of (SDP). Solving (DSDP) alone would be sufficient to provide such a lower bound. Moreover, in most applications an interior-feasible point is available to start with. Thus, we may not need to generate and store X at all.

4) Even if an optimal primal solution is necessary, the dual-scaling algorithm can generate a sparsely structured optimal X at the termination of the algorithm.

The dual-scaling algorithm, which is an extension of the linear programming algorithm, reduces the Tanabe-Todd-Ye primal-dual potential function

$$ \psi(X,S) = \rho\ln(X\bullet S) - \ln\det X - \ln\det S, $$

where $\rho\ge n+\sqrt n$, by a constant at each iteration. Since

$$ n\ln(X\bullet S) - \ln\det X - \ln\det S \ge n\ln n, $$

we have $\psi(X,S)\ge(\rho - n)\ln(X\bullet S) + n\ln n$, so the reduction of the potential drives the duality gap $X\bullet S$ to 0.
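Let

$$ \mathcal{A}(X) = \begin{pmatrix} A_1\bullet X\\ \vdots\\ A_m\bullet X\end{pmatrix} \quad\text{and}\quad \mathcal{A}^T(y) = \sum_{i=1}^m y_i A_i, $$

and let $z = C\bullet X$ for some feasible $X$. Consider the dual potential function

$$ \psi(y,z) = \rho\ln(z - b^T y) - \ln\det S. $$

Note the relation between the two potential functions:

$$ \psi(X,S) = \psi(y,z) - \ln\det X. $$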


The gradient of $\psi$ with respect to $y$ is:

$$ \nabla\psi(y,z) = -\frac{\rho}{z - b^T y}\, b + \mathcal{A}(S^{-1}). $$

Each step in the dual-scaling algorithm minimizes the linearized dual potential function subject to an ellipsoidal constraint that keeps $S\succ 0$ and the quadratic error term small. More precisely, beginning with a strictly feasible dual point $(y^k, S^k)$ and a $z^k$, each iteration solves the problem:

$$ \begin{array}{ll} \min & \nabla\psi^T(y^k,z^k)(y - y^k)\\ \text{s.t.} & \big\|(S^k)^{-1/2}\,\mathcal{A}^T(y - y^k)\,(S^k)^{-1/2}\big\| \le \alpha, \end{array} \qquad (1) $$

where $\alpha$ is a constant in $(0,1)$. All matrix norms used here are Frobenius norms.
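Define

$$ m(i,j) = \operatorname{tr}\big((S^k)^{-1}A_i(S^k)^{-1}A_j\big) $$

and the positive definite matrix

$$ M^k = \begin{pmatrix} m(1,1) & \cdots & m(1,m)\\ \vdots & \ddots & \vdots\\ m(m,1) & \cdots & m(m,m) \end{pmatrix}. \qquad (2) $$

In particular, if $A_i = a_i a_i^T$ is a rank-one matrix, where $a_i\in\mathbb{R}^n$, for $i = 1,\dots,m$, then

$$ M^k = \begin{pmatrix} (a_1^T(S^k)^{-1}a_1)^2 & \cdots & (a_1^T(S^k)^{-1}a_m)^2\\ \vdots & \ddots & \vdots\\ (a_m^T(S^k)^{-1}a_1)^2 & \cdots & (a_m^T(S^k)^{-1}a_m)^2 \end{pmatrix}. $$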
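For rank-one constraints, $M^k$ only requires the vectors $(S^k)^{-1}a_j$, that is, one linear solve with $S^k$ per constraint. The pure-Python sketch below uses hypothetical helper names and dense Gaussian elimination, whereas a large-scale implementation would presumably exploit the sparsity of $S^k$. It computes $m(i,j) = (a_i^T(S^k)^{-1}a_j)^2$.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(S: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Hypothetical helper: solve S w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in S[r]] + [float(rhs[r])] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def rank_one_M(S: Sequence[Sequence[float]], a_vectors: Sequence[Sequence[float]]) -> Matrix:
    """Hypothetical helper: M^k with m(i, j) = (a_i^T S^{-1} a_j)^2 for A_i = a_i a_i^T."""
    W = [solve(S, a) for a in a_vectors]  # W[j] = S^{-1} a_j
    return [[sum(x * y for x, y in zip(ai, wj)) ** 2 for wj in W] for ai in a_vectors]


S = [[2.0, 0.0], [0.0, 4.0]]
a_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
M = rank_one_M(S, a_vectors)
for row in M:
    print(row)
# [0.25, 0.0, 0.25]
# [0.0, 0.0625, 0.0625]
# [0.25, 0.0625, 0.5625]
```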


The minimal solution $y^{k+1}$ of (1) is given by

$$ y^{k+1} - y^k = \frac{\alpha}{\|P(z^k)\|}\, d(z^k)_y, $$

where

$$ d(z^k)_y = -(M^k)^{-1}\nabla\psi(y^k,z^k). \qquad (3) $$
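Let

$$ P(z^k) = (S^k)^{-1/2}\,\mathcal{A}^T\big((M^k)^{-1}\nabla\psi(y^k,z^k)\big)\,(S^k)^{-1/2} = (S^k)^{-1/2}\,\mathcal{A}^T\big(-d(z^k)_y\big)\,(S^k)^{-1/2}. $$

Then

$$ \nabla\psi^T(y^k,z^k)\, d(z^k)_y = -\|P(z^k)\|^2, \qquad \nabla\psi^T(y^k,z^k)\,(y^{k+1} - y^k) = -\alpha\|P(z^k)\|, $$

and the reduction in the potential function satisfies the inequality

$$ \psi(y^{k+1},z^k) - \psi(y^k,z^k) \le -\alpha\|P(z^k)\| + \frac{\alpha^2}{2(1-\alpha)}. $$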
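Combining (2), (3) and the gradient formula, one dual step can be sketched as follows for rank-one $A_i = a_i a_i^T$. This is an illustrative dense pure-Python version under assumptions (hypothetical function names, no primal update, no sparsity). It also lets one check numerically that $\nabla\psi^T d(z^k)_y = -\|P(z^k)\|^2$, where $\|P(z^k)\|^2 = d^T M^k d$ for the Frobenius norm.

```python
import math
from typing import List, Sequence, Tuple


def solve(A: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Hypothetical helper: dense Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in A[r]] + [float(rhs[r])] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))) / aug[r][r]
    return w


def dual_scaling_step(C, a_vectors, b, y, z, rho, alpha) -> Tuple[List[float], List[float], List[float], float]:
    """Hypothetical helper: y^{k+1} = y^k + (alpha / ||P(z^k)||) d(z^k)_y for A_i = a_i a_i^T."""
    n, m = len(C), len(b)
    # S^k = C - A^T(y) = C - sum_i y_i a_i a_i^T
    S = [[C[r][c] - sum(y[i] * a_vectors[i][r] * a_vectors[i][c] for i in range(m))
          for c in range(n)] for r in range(n)]
    W = [solve(S, a) for a in a_vectors]                                        # S^{-1} a_i
    G = [[sum(p * q for p, q in zip(ai, wj)) for wj in W] for ai in a_vectors]  # a_i^T S^{-1} a_j
    M = [[G[i][j] ** 2 for j in range(m)] for i in range(m)]                    # M^k, eq. (2)
    gap = z - sum(bi * yi for bi, yi in zip(b, y))                              # z^k - b^T y^k
    grad = [-rho / gap * b[i] + G[i][i] for i in range(m)]                      # A(S^{-1})_i = a_i^T S^{-1} a_i
    d = solve(M, [-g for g in grad])                                            # d(z^k)_y, eq. (3)
    norm_P = math.sqrt(sum(d[i] * M[i][j] * d[j] for i in range(m) for j in range(m)))
    y_new = [yi + alpha / norm_P * di for yi, di in zip(y, d)]
    return y_new, d, grad, norm_P


# Made-up instance: C = I, A_i = e_i e_i^T, b = (1, 1); X^0 = I is feasible, so z^0 = C . X^0 = 2.
C = [[1.0, 0.0], [0.0, 1.0]]
a_vectors = [[1.0, 0.0], [0.0, 1.0]]
y_new, d, grad, norm_P = dual_scaling_step(C, a_vectors, b=[1.0, 1.0], y=[0.0, 0.0],
                                           z=2.0, rho=4.0, alpha=0.5)
print(d, grad)  # [1.0, 1.0] [-1.0, -1.0]
print(abs(sum(g * di for g, di in zip(grad, d)) + norm_P ** 2) < 1e-12)  # True
```

The choice $\rho = 4$ satisfies $\rho\ge n+\sqrt n$ for $n = 2$.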


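Focusing on the expression of $P(z^k)$, it can be rewritten as

$$ P(z^k) = -\frac{\rho}{z^k - b^T y^k}\,(S^k)^{1/2}X(z^k)(S^k)^{1/2} + I $$

with

$$ X(z^k) = \frac{z^k - b^T y^k}{\rho}\,(S^k)^{-1}\big(\mathcal{A}^T(d(z^k)_y) + S^k\big)(S^k)^{-1}. \qquad (4) $$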

Note that $\mathcal{A}(X(z^k)) = b$, and $X(z^k)$ is a primal feasible solution if and only if $X(z^k)\succeq 0$. Furthermore, from the multiplicative structure of (4), $X(z^k)\succeq 0$ if and only if

$$ \mathcal{A}^T(d(z^k)_y) + S^k \succeq 0, $$

which is a sparse matrix in many applications. Also note that

$$ C\bullet X(z^k) = S^k\bullet X(z^k) + b^T y^k = \frac{z^k - b^T y^k}{\rho}\Big(\mathcal{A}^T(d(z^k)_y)\bullet(S^k)^{-1} + n\Big) + b^T y^k, $$

which can be efficiently computed.

One can show that, when $\|P(z^k)\|$ is small, $(X(z^k), y^k, S^k)$ is in the neighborhood of the central path and $C\bullet X(z^k) < z^k$. Thus, we can decrease $z^k$ to $C\bullet X(z^k)$. Moreover, $\psi(X(z^k), S^k)$ is reduced from $\psi(X^k, S^k)$ by a constant.

The theoretical algorithm can be stated as follows.


Given A(X^0) = b, X^0 ≻ 0, z^0 = C • X^0, S^0 = C − A^T(y^0) ≻ 0, and k := 0,
do the following:

WHILE z^k − b^T y^k > ε DO

1 | Compute the matrix M^k of (2).

2 | Solve (3) for the dual step direction d(z^k)_y.

3 | IF z^k > C • X(z^k) AND A^T(d(z^k)_y) + S^k ≻ 0
  | THEN X^{k+1} = X(z^k) and z^{k+1} = C • X^{k+1}
  | ELSE X^{k+1} = X^k and z^{k+1} = z^k
  | ENDIF

4 | Let y^{k+1} = y^k + (α / ‖P(z^{k+1})‖) d(z^{k+1})_y and S^{k+1} = C − A^T(y^{k+1}).

5 | Set k := k + 1 and return to Step 1.


Dual algorithm


The algorithm occasionally updates the primal solution X and its objective value, but it does not need X in the computation. We can derive the following potential reduction theorem:

Theorem 1

$$ \psi(X^{k+1}, S^{k+1}) \le \psi(X^k, S^k) - \delta, $$

where $\delta > 1/20$ for a suitable $\alpha$.
This theorem leads to


Corollary 2 Let $\rho\ge n+\sqrt n$. Then the algorithm terminates in at most $O\big((\rho - n)\log(X^0\bullet S^0/\varepsilon)\big)$ iterations.


To accelerate the convergence of the algorithm, one may increase the value of $\rho$ in practice and consider a bigger stepsize $\alpha$, see [4]. The stopping criterion is often

$$ \frac{z^k - b^T y^k}{1 + |b^T y^k|} \le \varepsilon, $$

that is, the algorithm stops when the relative duality gap is less than the prescribed accuracy $\varepsilon$.

The dual-scaling algorithm described above has been implemented for solving semidefinite programs arising from maximum-cut and 'box'-constrained quadratic optimization, with dimension up to 10000. The computational results are promising.


See also 


> ABS Algorithms for Linear Equations and Linear 
Least Squares 

> Cholesky Factorization 

> Duality for Semidefinite Programming 

> Interior Point Methods for Semidefinite 
Programming 

> Interval Linear Systems 

> Large Scale Trust Region Problems 

> Large Scale Unconstrained Optimization 

> Linear Programming 

> Orthogonal Triangularization 

> Overdetermined Systems of Linear Equations 

> QR Factorization 

> Semidefinite Programming and Determinant 
Maximization 

> Semidefinite Programming: Optimality Conditions 
and Stability 

> Semidefinite Programming and Structural 
Programming and Perfect Duality
Spatial price equilibrium modeling and computation
are concerned with the prediction of commodity trade 
flow patterns between spatially separated supply and 
demand markets as well as the market commodity 
prices. The distinctive character of spatial price prob- 
lems lies in the recognition of transportation costs as- 
sociated with shipping the commodities between pro- 
ducing and consuming locations or regions. Such mod- 
els are perfectly competitive partial equilibrium models; 
perfectly competitive in the sense that it is assumed that 
there are many producers and consumers with no indi- 
vidual being able to affect the market prices and partial 
(as opposed to general) in the sense that only a subset 
of the commodities in the economy is assumed to be 
modeled. 

In particular, in the spatial price equilibrium prob- 
lem, one seeks to compute the commodity supply 
prices, demand prices, and trade flows satisfying the 
equilibrium condition that the demand price is equal 
to the supply price plus the cost of transportation, 
if there is trade between the pair of supply and de- 
mand markets; if the demand price is less than the sup- 
ply price plus the transportation cost, then there will 
be no trade. Spatial price equilibrium problems arise 


in agricultural markets, energy markets, and financial 
markets and such models provide the basis for inter- 
regional and international trade modeling (see, e.g., 
[14,16,19,20,21,30]). 

The first reference in the literature to such prob- 
lems was by A. Cournot [2] in 1838, who considered 
two spatially separated markets. S. Enke [8], more than 
a century later, used an analogy between spatial price 
equilibrium and electronic circuits to give the first com- 
putational approach, albeit analogue, to such problems, 
in the case of linear and separable supply and demand 
functions. 

P.A. Samuelson [28] subsequently initiated the rig- 
orous treatment of such problems by establishing that 
the solution to the spatial price equilibrium problem, as 
posed by Enke, could be obtained by solving an opti- 
mization problem in which the objective function, al- 
though artificial, had the interpretation of a net so- 
cial pay-off function. The spatial price equilibrium, in 
this case, coincided with the Kuhn-Tucker conditions 
of the appropriately constructed optimization problem. 
Samuelson also related Enke’s specification to a stan- 
dard problem in linear programming, the Hitchcock- 
Koopmans transportation problem and noted that the 
spatial price equilibrium problem was more general 
in the sense that the supplies and demands were not 
known a priori. Finally, Samuelson also identified the 
network structure of such problems. 

T. Takayama and G.C. Judge [29,30] further ex- 
panded on the work of Samuelson [28] and showed 
that the prices and commodity flows satisfying the spa- 
tial price equilibrium conditions could be determined 
by solving a quadratic programming problem in the 
case of linear supply and demand price functions for 
which the Jacobians were symmetric and not necessar- 
ily diagonal. This theoretical advance enabled not only 
the qualitative study of equilibrium patterns, but also 
opened up the possibility for the development of effective computational procedures based on convex programming, as well as the exploitation of the network structure (see [5,12,15]).

As noted in Takayama and Judge [30], who devel- 
oped a variety of spatial price equilibrium models, dis- 
tinct model formulations are needed, in particular, both 
quantity and price formulations, depending upon the 
availability and format of the data. In a quantity formulation it is assumed that the supply price functions and demand price functions are given (and these are functions, respectively, of the quantities produced and consumed), whereas in a price formulation it is assumed that the supply and demand functions are given and these are functions, respectively, of the supply and demand prices. Moreover, Takayama and Judge [30] realized that a pure optimization framework was not sufficient to handle, for example, multicommodity spatial price equilibrium problems in which the Jacobians of the supply and demand price functions were no longer symmetric.

Towards that end, new formulations were proposed 
for the spatial price equilibrium problem under more 
general settings, including fixed point, complementar- 
ity, and variational inequality formulations. J.G. MacK- 
innon [18] gave a fixed point formulation which was 
then used by H.W. Kuhn and MacKinnon [17] for com- 
putational purposes. R. Asmuth, B.C. Eaves, and E.L. 
Peterson [1] considered the linear asymmetric spatial 
price equilibrium problem formulated as a linear com- 
plementarity problem and proposed Lemke’s algorithm 
for the computation of the spatial price equilibrium. 
J.S. Pang and P.L. Lee [27] developed special-purpose 
algorithms based on the complementarity formulation 
of the problem. M. Florian and M. Los [10] and S.C. 
Dafermos and A. Nagurney [4] addressed the varia- 
tional inequality formulations of general spatial price 
equilibrium models with the latter authors providing 
sensitivity analysis results. The interrelationships be- 
tween variational inequality, complementarity, and ex- 
tremal formulations of spatial price equilibrium prob- 
lems are given in [11]. 

Dafermos and Nagurney [5] established the equiv- 
alence of the spatial price equilibrium problem with 
the traffic network equilibrium problem. This identifi- 
cation stimulated further research in network equilibria 
(cf. [9,21], and the references therein) and in algorithm 
development for such problems. Computational test- 
ing of different algorithms for spatial price equilibrium 
problems can be found in [11] and [20]. Spatial price 
equilibrium models have also been used for policy anal- 
ysis (see, e. g., [16,23,26], and the references therein). 

Since spatial price equilibrium problems can be large scale in practice, parallel computational approaches have been implemented to solve such problems (cf. [13,22,23]). Recently, general dynamic spatial price equilibrium models have been developed (cf.


[24,25]), based on the connection between solutions 
to variational inequality problems and the stationary 
points of projected dynamical systems (cf. [7]), and 
solved using parallel computers. 

For definiteness, we first present the quantity model 
and then the price model and provide the variational 
inequality formulations of the governing equilibrium 
conditions. We then present a dynamic quantity model. 
For additional background, including qualitative and 
computational results, see [21] and [25]. 


The Quantity Model 


Consider the spatial price equilibrium problem in 
quantity variables with M supply markets and N de- 
mand markets involved in the production and con- 
sumption of a homogeneous commodity under perfect 
competition. Denote a typical supply market by i and 
a typical demand market by $j$. Let $s_i$ denote the supply and $\pi_i$ the supply price of the commodity at supply market $i$. Let $d_j$ denote the demand and $p_j$ the demand price at demand market $j$. Group the supplies and supply prices, respectively, into a column vector $s\in\mathbb{R}^M$ and a row vector $\pi\in\mathbb{R}^M$. Similarly, group the demands and demand prices, respectively, into a column vector $d\in\mathbb{R}^N$ and a row vector $p\in\mathbb{R}^N$. Let $Q_{ij}$ denote the nonnegative commodity shipment between the supply and demand market pair $(i,j)$, and let $c_{ij}$ denote the unit transaction cost associated with trading the commodity between $(i,j)$. The unit transaction costs are assumed to include the unit costs of transportation from supply markets to demand markets, and, depending upon the application, may also include a tax/tariff, duty, or subsidy incorporated into these costs. Group the commodity shipments into a column vector $Q\in\mathbb{R}^{MN}$ and the transaction costs into a row vector $c\in\mathbb{R}^{MN}$. The network structure of the problem is depicted in Fig. 1.

Assume that the supply price at any supply market may, in general, depend upon the supply of the commodity at every supply market, that is, $\pi = \pi(s)$, where $\pi$ is a known smooth function. Similarly, the demand price at any demand market may depend upon, in general, the demand of the commodity at every demand market, that is, $p = p(d)$, where $p$ is a known smooth function. The unit transaction cost between a pair of supply and demand markets may depend upon the shipments of the commodity between every pair of markets, that is, $c = c(Q)$, where $c$ is a known smooth function.

[Spatial Price Equilibrium, Figure 1: Network structure of spatial market problem (links from the supply markets to the demand markets)]

The supplies, demands, and shipments of the com- 
modity, in turn, must satisfy the following feasibility 
conditions, which are also referred to as the conserva- 
tion of flow equations: 


$$ s_i = \sum_{j=1}^N Q_{ij}, \quad i = 1,\dots,M, $$

$$ d_j = \sum_{i=1}^M Q_{ij}, \quad j = 1,\dots,N, $$

$$ Q_{ij}\ge 0, \quad i = 1,\dots,M;\ j = 1,\dots,N. $$


In other words, the supply at each supply market 
is equal to the commodity shipments out of that sup- 
ply market to all the demand markets. Similarly, the 
demand at each demand market is equal to the com- 
modity shipments from all the supply markets into that 
demand market. 


Definition 1 (spatial price equilibrium) Following [28] and [30], the supply, demand, and commodity shipment pattern $(s^*, Q^*, d^*)$ constitutes a spatial price equilibrium if it is feasible and, for all pairs of supply and demand markets $(i,j)$, it satisfies the conditions:

$$ \pi_i(s^*) + c_{ij}(Q^*) \begin{cases} = p_j(d^*), & \text{if } Q_{ij}^* > 0,\\ \ge p_j(d^*), & \text{if } Q_{ij}^* = 0. \end{cases} $$

Hence, if the commodity shipment between a pair of supply and demand markets is positive at equilibrium, then the demand price at the demand market must be equal to the supply price at the originating supply market plus the unit transaction cost. If the commodity shipment is zero in equilibrium, then the supply price plus the unit transaction cost can exceed the demand price.
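Definition 1 can be checked mechanically for a given shipment pattern. In the sketch below (hypothetical function name), the supplies and demands are derived from $Q$ through the conservation of flow equations, so feasibility reduces to $Q\ge 0$. The one-market example with $\pi(s)=s+1$, $c=1$ and $p(d)=5-d$ is made up for illustration; its equilibrium shipment is $Q=1.5$.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def is_spatial_price_equilibrium(
    Q: Matrix,
    pi: Callable[[List[float]], List[float]],
    c: Callable[[Matrix], List[List[float]]],
    p: Callable[[List[float]], List[float]],
    tol: float = 1e-9,
) -> bool:
    """Hypothetical helper: check Definition 1 for an M x N shipment matrix Q."""
    M, N = len(Q), len(Q[0])
    if any(Q[i][j] < 0 for i in range(M) for j in range(N)):
        return False  # shipments must be nonnegative
    s = [sum(Q[i]) for i in range(M)]                       # s_i = sum_j Q_ij
    d = [sum(Q[i][j] for i in range(M)) for j in range(N)]  # d_j = sum_i Q_ij
    supply_price, cost, demand_price = pi(s), c(Q), p(d)
    for i in range(M):
        for j in range(N):
            excess = supply_price[i] + cost[i][j] - demand_price[j]
            if Q[i][j] > 0 and abs(excess) > tol:
                return False  # positive flow requires pi_i + c_ij = p_j
            if Q[i][j] == 0 and excess < -tol:
                return False  # zero flow requires pi_i + c_ij >= p_j
    return True


# Hypothetical one-by-one market: pi(s) = s + 1, c = 1, p(d) = 5 - d.
pi = lambda s: [s[0] + 1.0]
c = lambda Q: [[1.0]]
p = lambda d: [5.0 - d[0]]
print(is_spatial_price_equilibrium([[1.5]], pi, c, p))  # True
print(is_spatial_price_equilibrium([[1.0]], pi, c, p))  # False
```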

The spatial price equilibrium can be formulated as 
a variational inequality problem (cf. [3,10], and [21] for 
proofs). Precisely, we have 


Theorem 2 (variational inequality formulation) A commodity supply, shipment, and demand pattern $(s^*, Q^*, d^*)\in K$ is a spatial price equilibrium if and only if it satisfies the following variational inequality problem:

$$ \langle\pi(s^*), s - s^*\rangle + \langle c(Q^*), Q - Q^*\rangle + \langle -p(d^*), d - d^*\rangle \ \ge\ 0, \quad \forall (s,Q,d)\in K, $$

where

$$ K = \{(s,Q,d) : \text{the feasibility conditions hold}\}, $$

and $\langle\cdot,\cdot\rangle$ denotes the inner product.


Example 3 For illustrative purposes, we now present a small example. Consider the spatial price equilibrium problem consisting of two supply markets and two demand markets. Assume that the functions are as follows:

$$ \begin{aligned} \pi_1(s) &= 5s_1 + s_2 + 1, & \pi_2(s) &= 4s_2 + s_1 + 2,\\ c_{11}(Q) &= 2Q_{11} + Q_{12} + 3, & c_{12}(Q) &= Q_{12} + 5,\\ c_{21}(Q) &= 3Q_{21} + Q_{11} + 5, & c_{22}(Q) &= 3Q_{22} + 2Q_{21} + 9,\\ p_1(d) &= -2d_1 - d_2 + 21, & p_2(d) &= -5d_2 - 3d_1 + 29. \end{aligned} $$

It is easy to verify that the spatial price equilibrium pattern is given by:

$$ s_1^* = 2,\ s_2^* = 1;\quad Q_{11}^* = 1,\ Q_{12}^* = 1,\ Q_{21}^* = 1,\ Q_{22}^* = 0;\quad d_1^* = 2,\ d_2^* = 1. $$


In one of the simplest models, in which the Jacobians of the supply price functions, $[\partial\pi/\partial s]$, the transportation (or transaction) cost functions, $[\partial c/\partial Q]$, and minus the demand price functions, $-[\partial p/\partial d]$, are diagonal and positive definite, the spatial price equilibrium pattern coincides with the Kuhn-Tucker conditions of the strictly convex optimization problem:

$$ \min_{Q\in\mathbb{R}^{MN}_+}\ \sum_{i=1}^M\int_0^{\sum_{j=1}^N Q_{ij}}\pi_i(x)\,dx + \sum_{i=1}^M\sum_{j=1}^N\int_0^{Q_{ij}} c_{ij}(y)\,dy - \sum_{j=1}^N\int_0^{\sum_{i=1}^M Q_{ij}} p_j(z)\,dz. $$


The Price Model 


We now briefly describe the price model. The notation is as for the quantity model, except that we now consider the situation where the supplies at the supply markets, denoted by the row vector $s$, may, in general, depend upon the column vector of supply prices $\pi$, that is, $s = s(\pi)$. Similarly, assume that the demands at the demand markets, denoted by the row vector $d$, may, in general, depend upon the column vector of demand prices $p$, that is, $d = d(p)$. The transaction/transportation costs are of the same form as in the quantity model.

The spatial equilibrium conditions now take the following form: for all pairs of supply and demand markets $(i,j)$, $i = 1,\dots,M$; $j = 1,\dots,N$:

$$ \pi_i^* + c_{ij}(Q^*) \begin{cases} = p_j^*, & \text{if } Q_{ij}^* > 0,\\ \ge p_j^*, & \text{if } Q_{ij}^* = 0, \end{cases} $$

where

$$ s_i(\pi^*) \begin{cases} = \sum_{j=1}^N Q_{ij}^*, & \text{if } \pi_i^* > 0,\\ \ge \sum_{j=1}^N Q_{ij}^*, & \text{if } \pi_i^* = 0, \end{cases} $$

and

$$ d_j(p^*) \begin{cases} = \sum_{i=1}^M Q_{ij}^*, & \text{if } p_j^* > 0,\\ \le \sum_{i=1}^M Q_{ij}^*, & \text{if } p_j^* = 0. \end{cases} $$

The first equilibrium condition is as in the quantity model with the exception that the prices are now variables. The other two conditions allow for the possibility that if the equilibrium prices are zero, then one may have excess supply and/or excess demand at the respective market(s). If the prices are positive, then the markets will clear.

The variational inequality formulation of the equilibrium conditions governing the price model is now given (for a proof, see [21]).


Theorem 4 (variational inequality formulation) The vector $x^* = (Q^*, \pi^*, p^*)\in K = \mathbb{R}^{MN+M+N}_+$ is an equilibrium shipment and price vector if and only if it satisfies the variational inequality:

$$ \langle F(x^*), x - x^*\rangle \ge 0, \quad \forall x\in K, $$

where $F: K\to\mathbb{R}^{MN+M+N}$ is the row vector $F(x) = (T(x), S(x), D(x))$, where $T:\mathbb{R}^{MN+M+N}\to\mathbb{R}^{MN}$, $S:\mathbb{R}^{MN+M}\to\mathbb{R}^M$, and $D:\mathbb{R}^{MN+N}\to\mathbb{R}^N$ are defined by:

$$ T_{ij} = \pi_i + c_{ij}(Q) - p_j, \qquad S_i = s_i(\pi) - \sum_{j=1}^N Q_{ij}, \qquad D_j = \sum_{i=1}^M Q_{ij} - d_j(p). $$


A Dynamic Model 


We now present the projected dynamical system model of the latter spatial price problem. For additional background, qualitative properties, as well as computational results, see [25] and the references therein. In view of the variational inequality governing the price model, we may write the dynamical system as:

$$\begin{pmatrix} \dot{Q} \\ \dot{\pi} \\ \dot{\rho} \end{pmatrix} = \Pi_K\left( \begin{pmatrix} Q \\ \pi \\ \rho \end{pmatrix}, \begin{pmatrix} -T(Q, \pi, \rho) \\ -S(Q, \pi) \\ -D(Q, \rho) \end{pmatrix} \right),$$

where, assuming that the feasible set $K$ is a convex polyhedron (as is the case here), and given $x \in K$ and $v \in \mathbb{R}^{MN+M+N}$, we define the projection of the vector $v$ at $x$ (with respect to $K$) by

$$\Pi_K(x, v) = \lim_{\delta \to 0} \frac{P_K(x + \delta v) - x}{\delta},$$

where $P_K$ is defined as:

$$P_K(x) = \arg\min_{z \in K} \|x - z\|,$$

and $\|\cdot\|$ denotes the Euclidean norm.
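When $K$ is the nonnegative orthant, as in the price model, the limit defining $\Pi_K(x, v)$ has a simple componentwise form. Each $v_i$ is kept, except when $x_i = 0$ and $v_i < 0$, in which case that component becomes zero. The sketch below assumes NumPy, and its function names are hypothetical. It implements this closed form and compares it with the difference quotient from the definition, evaluated at a small $\delta$.

```python
import numpy as np

def project_orthant(x):
    """P_K for K = nonnegative orthant: componentwise max(x, 0)."""
    return np.maximum(x, 0.0)

def pi_orthant(x, v):
    """Closed form of Pi_K(x, v) for x in the nonnegative orthant:
    a component sitting at zero cannot move downward; all others follow v."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    return np.where((x <= 0.0) & (v < 0.0), 0.0, v)

def pi_difference_quotient(x, v, delta=1e-8):
    """(P_K(x + delta v) - x) / delta, the expression whose limit defines Pi_K."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    return (project_orthant(x + delta * v) - x) / delta

x = np.array([0.0, 2.0, 0.0])
v = np.array([-1.0, -3.0, 4.0])
print(pi_orthant(x, v), pi_difference_quotient(x, v))
```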




More explicitly, if the demand price at a demand 
market exceeds the supply price plus the unit transac- 
tion cost associated with shipping the commodity be- 
tween a pair of supply and demand markets, then the 
commodity shipment between this pair of markets will 
increase. On the other hand, if the supply price plus unit 
transaction cost exceeds the demand price, then the 
commodity shipment between the pair of supply and 
demand markets will decrease. If the supply at a supply 
market exceeds (is exceeded by) the commodity ship- 
ments out of the market, then the supply price will de- 
crease (increase). In contrast, if the demand at a de- 
mand market exceeds (is exceeded by) the commodity 
shipments into the market, then the demand price will 
increase (decrease). 

However, if at the boundary the vector field $-F$ points ‘out’ of the feasible set, the right-hand side of the ordinary differential equation becomes the projection of $-F$ onto the boundary. In other words, if the commodity shipments, and/or the supply prices, and/or the demand prices are driven to be negative, then the projection ensures that the commodity shipments and the prices remain nonnegative, by setting the values equal to zero. The solution to the projected dynamical system then evolves along a ‘section’ of the boundary of the feasible set. At a later time, the solution may re-enter the interior of the constraint set, or it may enter a lower-dimensional part of its boundary. Ultimately, the spatial price equilibrium conditions are reached at a stationary point, that is, when $\dot{x} = 0$.
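A common way to approximate trajectories of such a projected dynamical system numerically is the projected Euler step $x^{\tau+1} = P_K(x^\tau - h F(x^\tau))$ with a small $h > 0$. We do not claim that this is the scheme used in [25]. The sketch below assumes NumPy and applies the step to a hypothetical single-market instance with $s(\pi) = \pi$, $d(\rho) = 10 - \rho$ and $c(Q) = Q + 1$, whose equilibrium is $Q^* = 3$, $\pi^* = 3$, $\rho^* = 7$. For this instance $F$ is affine with Jacobian $J$ satisfying $J + J^T = 2I$ and $\|J\|^2 = 3$. The map is therefore a contraction for $h = 0.1$, with squared factor $1 - 2h + 3h^2 = 0.83$, so the iterates approach the stationary point.

```python
import numpy as np

def F(x):
    """F = (T, S, D) for the hypothetical 1 x 1 instance, with x = (Q, pi, rho)."""
    Q, pi, rho = x
    T = pi + (Q + 1.0) - rho     # pi + c(Q) - rho
    S = pi - Q                   # s(pi) - Q
    D = Q - (10.0 - rho)         # Q - d(rho)
    return np.array([T, S, D])

def projected_euler(F, x0, h=0.1, steps=500):
    """Iterate x <- max(x - h F(x), 0): projected Euler on the nonnegative orthant."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = np.maximum(x - h * F(x), 0.0)
    return x

x = projected_euler(F, [0.0, 0.0, 0.0])
print(np.round(x, 6))   # approximately [3, 3, 7]
```

Starting from zero shipments and prices, the first steps show the behavior described above. The shipment component is pushed toward negative values and held at zero by the projection, while the demand price rises.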


See also 


> Equilibrium Networks 

> Financial Equilibrium 

> Generalized Monotonicity: Applications to 
Variational Inequalities and Equilibrium Problems 

> Oligopolistic Market Equilibrium 

> Traffic Network Equilibrium 

> Walrasian Price Equilibrium 
Introduction 


Cauchy’s steepest descent algorithm [22] is the oldest method for multidimensional unconstrained minimization. Given $f$, a real smooth function defined on $\mathbb{R}^n$, the idea is to iterate according to:

$$x_{k+1} = x_k - \alpha_k \nabla f(x_k), \tag{1}$$

with the expectation that the sequence $\{x_k\}$ would approximate a minimizer of $f$. The greedy choice of the steplength $\alpha_k$ is

$$f(x_k - \alpha_k \nabla f(x_k)) = \min_{\alpha > 0} f(x_k - \alpha \nabla f(x_k)). \tag{2}$$


The poor practical behavior of (1)-(2) has been known for many years. If the level sets of $f$ resemble long valleys, the sequence $\{x_k\}$ displays a typical zig-zagging trajectory and the speed of convergence is very slow. In the simplest case, in which $f$ is a strictly convex quadratic, the method converges to the solution with a Q-linear rate of convergence whose factor tends to 1 when the condition number of the Hessian tends to infinity.
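For a strictly convex quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$, the exact steplength in (2) has the closed form $\alpha_k = g_k^T g_k / g_k^T A g_k$, where $g_k = \nabla f(x_k)$. The sketch below assumes NumPy. It applies this step to a hypothetical two-dimensional example with condition number $\kappa = 100$, started from a point where the method zig-zags. In exact arithmetic, each iteration there reduces the distance to the minimizer by exactly the factor $(\kappa - 1)/(\kappa + 1) = 99/101$.

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, iters):
    """Cauchy's method (1) with the exact steplength (2) on f(x) = 0.5 x^T A x - b^T x."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = A @ x - b                      # gradient of the quadratic
        gg = g @ g
        if gg == 0.0:
            break
        x = x - (gg / (g @ (A @ g))) * g   # exact minimizer along -g
    return x

A = np.diag([1.0, 100.0])
b = np.zeros(2)                            # the minimizer is the origin
x = steepest_descent_quadratic(A, b, [1.0, 0.01], iters=100)
print(np.linalg.norm(x))   # still about 0.135 after 100 iterations
```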

Nevertheless, the structure of the iteration (1) is very attractive, especially for large-scale (many variables) problems. Each iteration only needs the computation of the gradient $\nabla f(x_k)$, and the number of algebraic operations is linear in $n$. As a consequence, a simple paper by Barzilai and Borwein published in 1988 [4] attracted some justified attention. Barzilai and Borwein discovered that, for some choices of $\alpha_k$, Cauchy’s method converges superlinearly to the solution if $f: \mathbb{R}^2 \to \mathbb{R}$ is a convex quadratic. Some members of the optimization community began to believe that an efficient method for large-scale minimization based only on gradient directions could exist.

In 1993, Raydan [60] proved the convergence of the Barzilai-Borwein method for arbitrary strictly convex quadratics. He showed that the method was far more efficient than the steepest descent algorithm (1)-(2), although it was not competitive with the Conjugate Gradient method of Hestenes and Stiefel [49] for quadratic problems. Fletcher’s work [40] (see also [60]) ruled out the possibility of obtaining superlinear convergence for arbitrary $n$. In addition, a bizarre behavior of the method seemed to discourage its application to general (not necessarily quadratic) unconstrained minimization: the sequence of functional values $f(x_k)$ did not decrease monotonically and, sometimes, monotonicity was severely violated.

However, starting with the work by Grippo, Lampariello and Lucidi [47], nonmonotone strategies for function minimization began to gain popularity. These strategies made it possible to define globally convergent algorithms without monotone decrease requirements. The philosophy behind nonmonotone strategies is that the first trial point chosen by a minimization algorithm often carries a lot of wisdom about the problem structure, and that imposing decrease can destroy such knowledge. For example, if one applies Newton’s method to a problem in which several components of the gradient are linear, these components vanish at the first trial point of each iteration, but the objective function value does not necessarily decrease at this trial point.

The conditions were thus in place for implementing the Barzilai-Borwein method for general unconstrained minimization with the help of a nonmonotone strategy. Raydan [61] defined this method in 1997 using the GLL strategy [47]. He proved global convergence and presented numerical experiments showing that, perhaps surprisingly, the method was more efficient than classical conjugate gradient methods for minimizing general functions. These favorable comparative results were possible because the generalizations of the Conjugate Gradient method of Hestenes and Stiefel to general functions hardly inherited its efficiency, even though the method itself remained the method of choice for solving many convex quadratic problems. There was therefore ample room for variations of the Barzilai-Borwein idea.

The Spectral Projected Gradient (SPG) method [16,17,18] was born from the marriage of the Barzilai-Borwein (spectral) nonmonotone ideas with classical projected gradient strategies [7,46,53]. This method is applicable to convex constrained problems in which the projection onto the feasible set is easy to compute. Since its appearance, the method has been intensively used in applications [3,6,10,14,15,19,20,24,26,35,42,50,59,63,64,65,69]. Moreover, it has been the object of several spectral-parameter modifications. Alternative nonmonotone strategies have been suggested, its convergence and stability properties have been elucidated, and it has been combined with other algorithms for different optimization problems.


Method 
The Secant Connection 


Quasi-Newton secant methods for unconstrained optimization [36,37] obey the recursive formula

$$x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f(x_k). \tag{3}$$

The sequence of matrices $\{B_k\}$ satisfies the secant equation

$$B_{k+1} s_k = y_k, \tag{4}$$

where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. By (4), it can be shown that, at the trial point $x_k - B_k^{-1} \nabla f(x_k)$, the affine approximation of $\nabla f(x)$ that interpolates the gradient at $x_k$ and $x_{k-1}$ vanishes for all $k \ge 1$.

Now assume that we want a matrix $B_{k+1}$ with a very simple structure that satisfies (4). More precisely, we wish

$$B_{k+1} = \sigma_{k+1} I,$$

with $\sigma_{k+1} \in \mathbb{R}$. Then (4) becomes:

$$\sigma_{k+1} s_k = y_k.$$

In general, this equation cannot be solved. However, accepting the least-squares solution that minimizes $\|\sigma s_k - y_k\|_2^2$, we obtain:

$$\sigma_{k+1} = \frac{s_k^T y_k}{s_k^T s_k}. \tag{5}$$
S, Sk 
This formula defines the most popular Barzilai-Borwein method [61]. Namely, the method for unconstrained minimization is of the form (3) with $B_k = \sigma_k I$, so that, at each iteration, the search direction is

$$d_k = -\frac{1}{\sigma_k} \nabla f(x_k),$$

and formula (5) is used to generate the coefficients $\sigma_k$, provided that they are bounded away from zero and are not too large. In other words, the method uses safeguards $0 < \sigma_{\min} < \sigma_{\max} < \infty$ and defines, at each iteration:

$$\sigma_{k+1} = \min\left\{\sigma_{\max}, \max\left\{\sigma_{\min}, \frac{s_k^T y_k}{s_k^T s_k}\right\}\right\}.$$
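The sketch below assumes NumPy. It runs the unconstrained Barzilai-Borwein iteration $x_{k+1} = x_k - \sigma_k^{-1} \nabla f(x_k)$, with $\sigma_{k+1}$ taken from (5) and clipped to $[\sigma_{\min}, \sigma_{\max}]$, on a strictly convex quadratic. This is the setting in which Raydan [60] established convergence without any line search. The initial coefficient $\sigma_0$, the safeguard values and the stopping tolerance are illustrative assumptions. On a quadratic $y_k = A s_k$, so the clipping is inactive whenever $[\sigma_{\min}, \sigma_{\max}]$ contains the spectrum of $A$.

```python
import numpy as np

def bb_quadratic(A, b, x0, sigma0=1.0, sigma_min=1e-10, sigma_max=1e10,
                 tol=1e-10, max_iter=10_000):
    """Barzilai-Borwein iteration on f(x) = 0.5 x^T A x - b^T x (no line search)."""
    x = np.asarray(x0, dtype=float).copy()
    g = A @ x - b
    sigma = sigma0
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            return x, k
        x_new = x - g / sigma                 # step of length 1/sigma along -g
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g
        # Safeguarded spectral coefficient, formula (5).
        sigma = min(sigma_max, max(sigma_min, (s @ y) / (s @ s)))
        x, g = x_new, g_new
    return x, max_iter

A = np.diag(np.linspace(1.0, 100.0, 10))
x, iters = bb_quadratic(A, np.zeros(10), np.ones(10))
print(iters, np.linalg.norm(x))
```

The function values along the way need not decrease monotonically, which is the behavior described in the Introduction.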
Si Sk 


By the Mean-Value Theorem of Integral Calculus, one has:

$$y_k = \left( \int_0^1 \nabla^2 f(x_k + t s_k)\, dt \right) s_k.$$

Therefore, formula (5) defines a Rayleigh quotient relative to the average Hessian matrix $\int_0^1 \nabla^2 f(x_k + t s_k)\, dt$. This coefficient lies between the minimum and the maximum eigenvalue of the average Hessian, which motivates the name Spectral Method [16].

Writing the secant equation as $H_{k+1} y_k = s_k$, which is also standard in the Quasi-Newton tradition, and fitting $H_{k+1} = \sigma_{k+1}^{-1} I$ in the least-squares sense, we arrive at a different spectral coefficient: $\sigma_{k+1} = \frac{y_k^T y_k}{s_k^T y_k}$. Curiously, both this dual coefficient and the primal one had been used for many years in practical quasi-Newton methods to define the initial matrices $B_0$ [58].
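For a quadratic with Hessian $A$, $y_k = A s_k$ and the average Hessian is $A$ itself, so both coefficients can be checked against its spectrum. The sketch below assumes NumPy and uses random illustrative data for the matrix and the step. It computes the primal coefficient (5) and the dual coefficient $y_k^T y_k / s_k^T y_k$. By the Cauchy-Schwarz inequality, the primal value never exceeds the dual one when $s_k^T y_k > 0$, and for a symmetric positive definite $A$ both lie in $[\lambda_{\min}, \lambda_{\max}]$.

```python
import numpy as np

def spectral_coefficients(s, y):
    """Primal coefficient (5) and the dual coefficient obtained from H y = s."""
    sy = s @ y
    return sy / (s @ s), (y @ y) / sy

rng = np.random.default_rng(0)
n = 6
Qm, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix
eigs = np.linspace(0.5, 20.0, n)
A = Qm @ np.diag(eigs) @ Qm.T                       # SPD Hessian with known spectrum
s = rng.standard_normal(n)
y = A @ s                                           # exact for a quadratic
primal, dual = spectral_coefficients(s, y)
print(eigs.min() <= primal <= dual <= eigs.max())   # True
```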


The Line Search 


The Spectral Projected Gradient method (SPG) aims to minimize $f$ on a closed and convex set $\Omega$. The method, as well as its unconstrained counterpart [61], has the form

$$x_{k+1} = x_k + \alpha_k d_k. \tag{6}$$

The search direction $d_k$ has been defined in [16] as

$$d_k = P\left(x_k - \frac{1}{\sigma_k} \nabla f(x_k)\right) - x_k,$$

where $P$ denotes the Euclidean projection onto $\Omega$. A related method with approximate projections has been defined in [18]. The direction $d_k$ is a descent direction, which means that, if $d_k \ne 0$, one has $f(x_k + \alpha d_k) < f(x_k)$ for $\alpha$ small enough. This means that, in principle, one could define convergent methods imposing sufficient decrease at every iteration. However, this leads to disastrous practical results. For this reason, the spectral methods employ a nonmonotone line search that does not impose functional decrease at every iteration. In [16,17,18] the GLL [47] search is used. This line search depends on an integer parameter $M \ge 1$ and imposes a functional decrease every $M$ iterations (if $M = 1$, then the GLL line search reduces to a monotone line search).

The line search is based on a safeguarded quadratic interpolation and aims to satisfy an Armijo-type criterion with a sufficient decrease parameter $\gamma \in (0, 1)$. The safeguarding procedure acts when the minimum of the one-dimensional quadratic lies outside $[\sigma_1, \sigma_2 \alpha]$, and not when it lies outside $[\sigma_1 \alpha, \sigma_2 \alpha]$ as usually implemented. This means that, when interpolation tends to reject 90% (for $\sigma_1 = 0.1$) of the original search interval $[0, 1]$, we judge that its prediction is not reliable and we prefer the more conservative bisection. The complete line search procedure is described below.


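Algorithm 3.1: Line Search

Compute $f_{\max} = \max\{f(x_{k-j}) \mid 0 \le j \le \min\{k, M-1\}\}$, $x_+ \leftarrow x_k + d_k$, $\delta \leftarrow \langle \nabla f(x_k), d_k \rangle$ and set $\alpha \leftarrow 1$.

Step 1. Test nonmonotone Armijo-like criterion
If $f(x_+) \le f_{\max} + \gamma \alpha \delta$, then set $\alpha_k \leftarrow \alpha$ and stop.

Step 2. Compute a safeguarded new trial steplength
Compute $\alpha_{\mathrm{tmp}} \leftarrow -\frac{1}{2} \alpha^2 \delta / (f(x_+) - f(x_k) - \alpha \delta)$.
If $\alpha_{\mathrm{tmp}} \ge \sigma_1$ and $\alpha_{\mathrm{tmp}} \le \sigma_2 \alpha$, then set $\alpha \leftarrow \alpha_{\mathrm{tmp}}$. Otherwise, set $\alpha \leftarrow \alpha / 2$.

Step 3. Compute $x_+ \leftarrow x_k + \alpha d_k$ and go to Step 1.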


Remark. In the case of rejection of the first trial point, 
the next ones are computed along the same direc- 
tion. As a consequence, the projection operation is per- 
formed only once per iteration. 
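Algorithm 3.1 can be transcribed into Python as sketched below, assuming NumPy. The function name, the argument list and the defaults $\gamma = 10^{-4}$, $\sigma_1 = 0.1$, $\sigma_2 = 0.9$ are illustrative assumptions. The list f_hist holds $f(x_0), \dots, f(x_k)$, so its last $M$ entries give $f_{\max}$. Step 2 needs no guard against a zero denominator. Indeed, it is reached only when $f(x_+) > f_{\max} + \gamma\alpha\delta \ge f(x_k) + \gamma\alpha\delta$, and with $\delta < 0$ this gives $f(x_+) - f(x_k) - \alpha\delta > (\gamma - 1)\alpha\delta > 0$.

```python
import numpy as np

def nonmonotone_line_search(f, x, d, g, f_hist, M=10, gamma=1e-4,
                            sigma1=0.1, sigma2=0.9):
    """Algorithm 3.1: returns (alpha_k, x_plus, f(x_plus)).
    d must be a descent direction (g @ d < 0); f_hist[-1] is f(x_k)."""
    f_max = max(f_hist[-M:])            # max of f(x_{k-j}), 0 <= j <= min(k, M-1)
    fk = f_hist[-1]
    delta = g @ d
    alpha = 1.0
    x_plus = x + d
    f_plus = f(x_plus)
    # Step 1: nonmonotone Armijo-like test.
    while f_plus > f_max + gamma * alpha * delta:
        # Step 2: minimizer of the quadratic interpolating f(x_k), delta and f(x_plus).
        alpha_tmp = -0.5 * alpha**2 * delta / (f_plus - fk - alpha * delta)
        if sigma1 <= alpha_tmp <= sigma2 * alpha:
            alpha = alpha_tmp
        else:
            alpha = alpha / 2.0
        # Step 3: new trial point along the same direction (no new projection).
        x_plus = x + alpha * d
        f_plus = f(x_plus)
    return alpha, x_plus, f_plus

f = lambda z: float(z @ z)
x = np.array([1.0]); g = 2.0 * x; d = -g
print(nonmonotone_line_search(f, x, d, g, [f(x)], M=1)[0])        # 0.5
print(nonmonotone_line_search(f, x, d, g, [5.0, f(x)], M=2)[0])   # 1.0
```

In the first call ($M = 1$, monotone) the full step is rejected and interpolation gives $\alpha = 0.5$. In the second call an older, larger function value raises $f_{\max}$, so the same trial point is accepted with $\alpha = 1$.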


General Form and Convergence 


The Spectral Projected Gradient method SPG aims to solve the problem

$$\text{Minimize } f(x) \text{ subject to } x \in \Omega, \tag{7}$$

where $f$ admits continuous first derivatives and $\Omega \subset \mathbb{R}^n$ is closed and convex. We say that a point $x \in \Omega$ is stationary if

$$\nabla f(x)^T d \ge 0$$

for all $d \in \mathbb{R}^n$ such that $x + d \in \Omega$.

In [18], the SPG method has been presented as a member of a wider family of “Inexact Variable Metric” methods for solving (7). Let $\mathcal{B}$ be the set of $n \times n$ positive definite matrices such that $\|B\| \le L$ and $\|B^{-1}\| \le L$. Therefore, $\mathcal{B}$ is a compact set of $\mathbb{R}^{n \times n}$. In the spectral gradient approach, the matrices will be of the form $\sigma I$, with $\sigma \in [\sigma_{\min}, \sigma_{\max}]$.


Algorithm 4.1: Inexact Variable Metric Method
Assume $\eta \in (0, 1]$, $\gamma \in (0, 1)$, $0 < \tau_1 < \tau_2 < 1$, and let $M \ge 1$ be an integer. Let $x_0 \in \Omega$ be an arbitrary initial point. We denote $g_k = \nabla f(x_k)$ for all $k = 0, 1, 2, \dots$ Given $x_k \in \Omega$, $B_k \in \mathcal{B}$, the steps of the $k$th iteration of the algorithm are:


Step 1. Compute the search direction
Consider the subproblem

$$\text{Minimize } Q_k(d) \text{ subject to } x_k + d \in \Omega, \tag{8}$$

where

$$Q_k(d) = \frac{1}{2} d^T B_k d + g_k^T d.$$

Let $\bar{d}_k$ be the minimizer of (8). (This minimizer exists and is unique by the strict convexity of the subproblem (8), but does not need to be computed.)

Let $d_k$ be such that $x_k + d_k \in \Omega$ and

$$Q_k(d_k) \le \eta\, Q_k(\bar{d}_k).$$

If $d_k = 0$, stop the execution of the algorithm, declaring that $x_k$ is a stationary point.


Step 2. Compute the steplength

Compute $f_{\max} = \max\{f(x_{k-j}) \mid 0 \le j \le \min\{k, M-1\}\}$, $\delta \leftarrow \langle g_k, d_k \rangle$ and set $\alpha \leftarrow 1$.

If

$$f(x_k + \alpha d_k) \le f_{\max} + \gamma \alpha \delta, \tag{9}$$

set $\alpha_k = \alpha$, $x_{k+1} = x_k + \alpha_k d_k$ and finish the iteration. Otherwise, choose $\alpha_{\mathrm{new}} \in [\tau_1 \alpha, \tau_2 \alpha]$, set $\alpha \leftarrow \alpha_{\mathrm{new}}$ and repeat test (9).


Remarks. (a) Algorithm 3.1 is a particular case of Step 2 of Algorithm 4.1. (b) In the definition of Algorithm 4.1, the possibility $\eta = 1$ corresponds to the case in which the subproblem (8) is solved exactly.
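With $B_k = \sigma_k I$, the SPG direction solves subproblem (8) exactly. Completing the square gives $Q_k(d) = \frac{\sigma_k}{2}\|x_k + d - (x_k - g_k/\sigma_k)\|^2 - \|g_k\|^2/(2\sigma_k)$, so $\bar{d}_k = P(x_k - g_k/\sigma_k) - x_k$, and SPG corresponds to $\eta = 1$. The sketch below assumes NumPy, and its function names are hypothetical. The box $\Omega = [0, 1]^n$ and the random data are illustrative assumptions. The code checks the optimality of $\bar{d}_k$ against random feasible directions and implements the acceptance test $Q_k(d_k) \le \eta\, Q_k(\bar{d}_k)$ of Step 1.

```python
import numpy as np

def Q(d, g, sigma):
    """Q_k(d) = 0.5 d^T (sigma I) d + g^T d."""
    return 0.5 * sigma * (d @ d) + g @ d

def spg_direction(x, g, sigma, proj):
    """Exact minimizer of (8) when B_k = sigma I."""
    return proj(x - g / sigma) - x

def acceptable(d, d_bar, g, sigma, eta):
    """Inexactness test of Step 1 in Algorithm 4.1."""
    return Q(d, g, sigma) <= eta * Q(d_bar, g, sigma)

rng = np.random.default_rng(1)
n, sigma = 5, 3.0
proj = lambda z: np.clip(z, 0.0, 1.0)           # Omega = [0, 1]^n (assumption)
x = rng.uniform(0.2, 0.8, n)
g = rng.standard_normal(n)
d_bar = spg_direction(x, g, sigma, proj)
others = rng.uniform(0.0, 1.0, (1000, n)) - x   # random feasible directions
print(all(Q(d, g, sigma) >= Q(d_bar, g, sigma) - 1e-12 for d in others))  # True
```

The null direction $d = 0$ never passes the test for a nonstationary $x_k$, because $Q_k(\bar{d}_k) < 0 = Q_k(0)$.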


The main theoretical results [18] are stated below. 
Firstly, it is shown that an iteration necessarily finishes 
and then it is shown that every limit point of a sequence 
generated by the algorithm is stationary. 


Theorem 4.1. The algorithm is well defined. 


Theorem 4.2. Assume that the level set $\{x \in \Omega \mid f(x) \le f(x_0)\}$ is bounded. Then, either the algorithm stops at some stationary point $x_k$, or every limit point of the generated sequence is stationary.
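The pieces above can be combined into a compact sketch of SPG, assuming NumPy. It is Algorithm 4.1 with $B_k = \sigma_k I$, the exact direction $d_k = P(x_k - g_k/\sigma_k) - x_k$, the Step 2 rule of Algorithm 3.1 and the safeguarded coefficient (5). The stopping test on $\|P(x_k - g_k) - x_k\|_\infty$, the choice $\sigma_0 = 1$ and the other defaults are illustrative assumptions, not the settings of [16,17,18]. The example is a hypothetical separable box-constrained quadratic whose solution is known in closed form.

```python
import numpy as np

def spg(f, grad, proj, x0, M=10, gamma=1e-4, sigma1=0.1, sigma2=0.9,
        sigma_min=1e-10, sigma_max=1e10, tol=1e-8, max_iter=10_000):
    x = proj(np.asarray(x0, dtype=float))
    g = grad(x)
    f_hist = [f(x)]
    sigma = 1.0                                              # sigma_0 (assumption)
    for k in range(max_iter):
        if np.linalg.norm(proj(x - g) - x, np.inf) <= tol:   # stationarity measure
            return x, k
        d = proj(x - g / sigma) - x                          # spectral projected direction
        f_max, fk, delta = max(f_hist[-M:]), f_hist[-1], g @ d
        alpha, x_new = 1.0, x + d
        f_new = f(x_new)
        while f_new > f_max + gamma * alpha * delta:         # nonmonotone Armijo test
            a_tmp = -0.5 * alpha**2 * delta / (f_new - fk - alpha * delta)
            alpha = a_tmp if sigma1 <= a_tmp <= sigma2 * alpha else alpha / 2.0
            x_new = x + alpha * d                            # same direction, one projection
            f_new = f(x_new)
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sigma = min(sigma_max, max(sigma_min, (s @ y) / (s @ s)))   # formula (5)
        x, g = x_new, g_new
        f_hist.append(f_new)
    return x, max_iter

# Hypothetical test problem: f(x) = 0.5 * sum a_i (x_i - c_i)^2 on [0, 1]^n.
a = np.linspace(1.0, 50.0, 20)
c = np.linspace(-0.5, 1.5, 20)
x, iters = spg(lambda z: 0.5 * float(a @ (z - c) ** 2),
               lambda z: a * (z - c),
               lambda z: np.clip(z, 0.0, 1.0),
               np.zeros(20))
print(iters, np.max(np.abs(x - np.clip(c, 0.0, 1.0))))
```

Because the objective is separable, its minimizer over the box is simply $c$ clipped to $[0, 1]$, which makes the result easy to check.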


Numerical Example 


In [17] a family of location problems was introduced. Given a set of npol disjoint polygons $P_1, \dots, P_{\mathrm{npol}}$ in $\mathbb{R}^2$, we wish to find the point $y$ that minimizes the sum of the distances to those polygons. Therefore, the problem is

$$\min_{z^i,\, y} \sum_{i=1}^{\mathrm{npol}} \|z^i - y\|_2 \quad \text{subject to } z^i \in P_i, \; i = 1, \dots, \mathrm{npol}.$$

Let us write $x = (z^1_1, z^1_2, \dots, z^{\mathrm{npol}}_1, z^{\mathrm{npol}}_2, y_1, y_2)^T$. We observe that the problem has $2 \times (\mathrm{npol} + 1)$ variables. The number of (linear inequality) constraints is $\sum_{i=1}^{\mathrm{npol}} \nu_i$, where $\nu_i$ is the number of vertices of the polygon $P_i$. Each constraint defines a half-plane in $\mathbb{R}^2$. Figure 1 shows the solution of a small five-polygon problem.

To project $x$ onto the feasible set, observe that we only need to project each $z^i$ separately onto the corresponding polygon $P_i$. In the projection subroutine we consider the half-planes that define the polygon. If $z^i$ belongs to all these half-planes, then $z^i$ is its own projection onto $P_i$. Otherwise, we consider the set of half-planes to which $z^i$ does not belong. We project $z^i$ onto these half-planes and discard the projected points that do not belong to $P_i$. Let $A_i$ be the (finite) set of nondiscarded half-plane projections and let $V_i$ be the set of vertices of $P_i$. Then, the projection of $z^i$ onto $P_i$ is the point of $A_i \cup V_i$ that is closest to $z^i$. The projection subroutines are included in the test driver for the SPG method [17].
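The projection procedure just described can be sketched as follows for one convex polygon given by its vertices in counter-clockwise order. The sketch assumes NumPy. The function names, the vertex-list input format and the tolerance are assumptions, not the interface of the driver in [17]. The procedure is correct because the projection lies either at a vertex or in the relative interior of an edge. In the edge case, $z^i$ violates that edge's half-plane, and projecting onto the half-plane recovers the point.

```python
import numpy as np

def polygon_halfplanes(V):
    """Rows a_j and offsets b_j with P = {w : a_j . w <= b_j}, for CCW vertices V."""
    E = np.roll(V, -1, axis=0) - V                 # edge vectors v_{j+1} - v_j
    A = np.column_stack([E[:, 1], -E[:, 0]])       # outward normals (CCW orientation)
    b = np.einsum("ij,ij->i", A, V)
    return A, b

def project_onto_polygon(z, vertices, tol=1e-9):
    V = np.asarray(vertices, dtype=float)
    z = np.asarray(z, dtype=float)
    A, b = polygon_halfplanes(V)
    viol = A @ z - b
    if np.all(viol <= tol):                        # z lies in every half-plane
        return z.copy()
    candidates = list(V)                           # the vertex set V_i
    for a, r in zip(A[viol > tol], viol[viol > tol]):
        p = z - (r / (a @ a)) * a                  # projection onto a violated half-plane
        if np.all(A @ p - b <= tol):               # keep it only if it lies in P_i
            candidates.append(p)
    C = np.array(candidates)
    return C[np.argmin(np.linalg.norm(C - z, axis=1))]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(project_onto_polygon([2.0, 0.5], square), project_onto_polygon([2.0, 2.0], square))
```

For the point $(2, 2)$, both half-plane projections fall outside the square and are discarded, so the nearest vertex $(1, 1)$ is returned.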

Several test problems were generated by varying npol and choosing randomly the location of the polygons and the number of vertices of each one, and they were solved by the SPG method in [17]. The biggest problem had 48,126 polygons, 96,254 variables and 578,648 constraints. Using the origin as initial approximation, the SPG method solved it in 17 iterations, using 19 function evaluations and 12.97 s of CPU time on a Sun SparcStation 20 with the following main characteristics: 128 Mbytes of RAM, 70 MHz, 204.7 mips, 44.4 Mflops. (Codes were in Fortran 77 and the compiler option adopted was “-O”.)

Spectral Projected Gradient Methods, Figure 1
Five-polygons problem


Further Developments 


Developments on spectral gradient and spectral projected gradient ideas include:

1. Application and implementation of the spectral methods to particular optimization problems: Linear inequality constraints were considered in [1]. In [38] the SPG method was used to solve Augmented Lagrangian subproblems. In [56], the spectral gradient method solves the subproblems that arise when an exponential penalty method is applied to linearly constrained optimization.

2. Preconditioning: The necessity of preconditioning for very ill-conditioned problems has been recognized in several works [5,23,45,54,57].

3. Extensions: The spectral residual direction was defined in [51] to introduce a new method that aims to solve nonlinear systems of equations using only the vector of residuals. See also [48,52,70]. The SPG method has been extended for solving nondifferentiable convex constrained problems [25].

4. Association with other methods: The SPG method has been used in the context of active-set methods for box-constrained optimization in [2,13,12]. Namely, SPG iterations are used in these methods for abandoning faces, whereas Newtonian iterations are employed inside the faces of the feasible region. The opposite idea was used in [44], where spectral directions were used inside the faces and a different orthogonal strategy was employed to modify the set of current active constraints (see also [9]). Spectral ideas were also used in association with conjugate gradients in [11]. Combinations of the spectral gradient method with other descent directions were suggested in [21,28].

5. Nonmonotone alternative rules: Dai and Fletcher [30] observed that, in some cases, even the GLL strategy was very conservative and, so, more chance should be given to the pure spectral method ($\alpha_k = 1$ for all $k$). However, they showed that, for quadratic minimization with box constraints, the pure method is not convergent. Therefore, alternative tolerant nonmonotone rules were suggested. Dai and Zhang [31] noted that the behavior of spectral methods depends heavily on the choice of the parameter $M$ of the GLL search and proposed an adaptive nonmonotone search independent of $M$. Over- and under-relaxations of the spectral step were studied by Raydan and Svaiter [62].

6. Alternative choices of the spectral parameter: In [43] it was observed that the convergence theory of the spectral gradient method for quadratics remains valid when one uses Rayleigh coefficients originating in retarded iterations (see also [55]). In [32], for unconstrained optimization problems, the same stepsize is reused for $m$ consecutive iterations (CBB method). This cyclic method is locally linearly convergent to a local minimizer with positive definite Hessian. Numerical evidence indicates that when $m > n/2 \ge 3$, where $n$ is the problem dimension, CBB is locally superlinearly convergent. In the special case $m = 3$, $n = 2$, the convergence rate is, in general, no better than linear [32].
   In [34] the stepsize in the spectral gradient method was interpreted from the point of view of interpolation, and two competitive modified spectral-like gradient methods were defined. Yuan [68] defined a new stepsize for unconstrained optimization that seems to possess spectral gradient properties while preserving monotonicity.

7. Asymptotic behavior analysis: A careful consideration of the asymptotic practical and theoretical behavior of the Barzilai-Borwein method may be found in [41]. Dai and Fletcher [29] showed interesting transition properties of the spectral gradient method for quadratic functions, depending on the number of variables. Dai and Liao [33] proved the R-linear convergence of the spectral gradient method for general functions and, as a consequence, established that the spectral stepsize is always accepted by the nonmonotone line search when the iterate is close to the solution. The convergence of the inexact SPG method was established in [66,67] under assumptions different from those used in [18]. Assuming Lipschitz continuity of the objective functions, these authors eliminated the bounded-level-set assumption of [18].
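The sketch below illustrates the cyclic idea of the CBB method mentioned above, assuming NumPy. It reuses each Barzilai-Borwein steplength $1/\sigma$ from (5) for $m$ consecutive iterations on a strictly convex quadratic; with $m = 1$ it reduces to the plain Barzilai-Borwein iteration. The initial steplength, the cycle length and the test matrix are illustrative assumptions. No line search is used, so this covers only the quadratic setting, where reusing a Rayleigh coefficient from an earlier iteration falls under the retarded-coefficient theory of [43]. It is not the globalized CBB method of [32].

```python
import numpy as np

def cyclic_bb_quadratic(A, b, x0, m=3, step0=1e-2, tol=1e-10, max_iter=20_000):
    """Gradient iteration on 0.5 x^T A x - b^T x whose BB step is refreshed every m iterations."""
    x = np.asarray(x0, dtype=float).copy()
    g = A @ x - b
    step = step0                               # 1/sigma used until the first refresh
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            return x, k
        x_new = x - step * g
        g_new = A @ x_new - b
        if (k + 1) % m == 0:                   # end of a cycle: recompute 1/sigma from (5)
            s, y = x_new - x, g_new - g
            step = (s @ s) / (s @ y)
        x, g = x_new, g_new
    return x, max_iter

A = np.diag(np.linspace(1.0, 100.0, 10))
x, iters = cyclic_bb_quadratic(A, np.zeros(10), np.ones(10), m=3)
print(iters, np.linalg.norm(x))
```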
Splitting methods were originally proposed as a generalization of the classical SOR method for solving a system of linear equations [8,25], and in the late 1970s they were extended to the linear complementarity problem (LCP; cf. » Linear complementarity problem) [1,2, Chap. 5], [10,13,18]. These methods are iterative and are best suited for problems in which exploitation of sparsity is important, such as large sparse linear programs and the discretization of certain elliptic boundary value problems with obstacle.


To describe the splitting methods, we formulate the LCP (with bound constraints) as the problem of finding an $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ solving the following system of nonlinear equations:

$$x = \max[l, \min[u, x - (Mx + q)]], \tag{1}$$

where $M = [m_{ij}]_{i,j=1,\dots,n} \in \mathbb{R}^{n \times n}$, $q = (q_1, \dots, q_n) \in \mathbb{R}^n$, and the lower bound $l = (l_1, \dots, l_n)$ and upper bound $u = (u_1, \dots, u_n)$ are given. (Here max and min are understood to be taken componentwise, and we allow $l_i = -\infty$ or $u_i = \infty$ for some $i$. The case of $l_i = 0$ and $u_i = \infty$ for all $i$ corresponds to the standard LCP.) In the splitting methods, we express

$$M = B + C$$

for some $B \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{n \times n}$. Then, starting with any $x \in \mathbb{R}^n$, we iteratively update $x$ by solving the following equation for $x'$:

$$x' = \max[l, \min[u, x' - (Bx' + Cx + q)]] \tag{2}$$

and then replacing $x$ with $x'$. Thus, at each iteration, we effectively replace $M$ and $q$ in the original problem by, respectively, $B$ and $Cx + q$, and solve the resulting problem to obtain the new iterate.

A key to the performance of the splitting methods lies in the choice of the matrix $B$. We should choose $B$ to be a good approximation of $M$ so that the methods converge rapidly and, at the same time, such that $x'$ is easy to compute at each iteration (e.g., $B$ is diagonal or upper/lower triangular). The best known choice, corresponding to the SOR method of C. Hildreth [3,7], is

$$B = \frac{1}{\omega} D + L, \tag{3}$$

where $D$ and $L$ denote, respectively, the diagonal and the strictly lower-triangular part of $M$, and $\omega \in (0, 2)$ is a relaxation parameter (see [2, p. 397], [13]). For this choice of $B$, and assuming $M$ has positive diagonal entries, the components of $x'$ can be computed using an $n$-step backsolve:

$$x'_i = \max\left\{ l_i, \min\left\{ u_i,\; x_i - \frac{\omega}{m_{ii}} \left( \sum_{j < i} m_{ij} x'_j + \sum_{j \ge i} m_{ij} x_j + q_i \right) \right\} \right\}, \quad i = 1, \dots, n.$$
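The componentwise formula translates directly into a projected SOR sweep. The sketch below assumes NumPy. The function name, the starting point $x = \max[l, \min[u, 0]]$ and the stopping rule on successive iterates are illustrative choices. The array $x$ is updated in place, so when component $i$ is computed the entries $j < i$ already hold $x'_j$. The example is a standard LCP with symmetric positive definite $M$ whose solution, $x = (0.25, 0, 0.75)$, can be verified by hand.

```python
import numpy as np

def projected_sor(M, q, l, u, omega=1.2, tol=1e-12, max_sweeps=10_000):
    """Splitting method (2) with B = D/omega + L, via the componentwise backsolve."""
    M, q = np.asarray(M, dtype=float), np.asarray(q, dtype=float)
    l, u = np.asarray(l, dtype=float), np.asarray(u, dtype=float)
    x = np.maximum(l, np.minimum(u, 0.0))
    for sweep in range(1, max_sweeps + 1):
        x_old = x.copy()
        for i in range(len(q)):
            # M[i] @ x uses x'_j for j < i and x_j for j >= i, matching the formula.
            r = M[i] @ x + q[i]
            x[i] = max(l[i], min(u[i], x[i] - omega * r / M[i, i]))
        if np.max(np.abs(x - x_old)) <= tol:
            break
    return x, sweep

M = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
q = np.array([-1.0, 2.0, -3.0])
x, sweeps = projected_sor(M, q, l=np.zeros(3), u=np.full(3, np.inf))
print(x, sweeps)
```

With $l_i = -\infty$ and $u_i = \infty$, the same function performs classical SOR on $Mx + q = 0$.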
In the case where $l_i = -\infty$ and $u_i = \infty$ for all $i$, the above iteration reduces to the classical SOR method for solving the system of linear equations $Mx + q = 0$. More generally, we can choose $B$ to be block-lower/upper-triangular, e.g.,

$$B = \begin{pmatrix} B_{11} & & & \\ B_{21} & B_{22} & & \\ \vdots & & \ddots & \\ B_{p1} & B_{p2} & \cdots & B_{pp} \end{pmatrix}$$

for some $1 \le p \le n$, with the diagonal and triangular blocks possibly coming from $M$. Then, each block of components of $x'$ can be computed recursively by solving an LCP of dimension equal to the block size. Other choices of $B$ are discussed in [2, Chap. 5], [13] and below. Computation with the (block) SOR method on solving sparse linear programs and LCP with symmetric positive definite $M$ is investigated in [1,4,14,16].

An original application of the SOR method is to the solution of convex quadratic programs of the form

$$\min_y \ \frac{1}{2} y^T y \quad \text{s.t. } Ay \le b,$$

where $A \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^n$ are given, with $A$ having nonzero rows. (Here, $T$ denotes the transpose.) Specifically, by attaching nonnegative Lagrange multipliers (cf. also » Lagrangian multipliers methods for convex programming) $x = (x_1, \dots, x_n)$ to the constraints $Ay \le b$, we obtain the following dual problem in $x$:

$$\max_{x \ge 0} \min_y \left\{ \frac{1}{2} y^T y + x^T (Ay - b) \right\} = \max_{x \ge 0} \left\{ -\frac{1}{2} x^T A A^T x - b^T x \right\},$$

whose optimal solution, related to the optimal solution of the original problem by $y + A^T x = 0$ [7], solves the LCP (1) with $M = AA^T$, $q = b$ and $l_i = 0$, $u_i = \infty$ for all $i$ [7, p. 4]. In this case, $M$ is symmetric positive semidefinite with positive diagonal entries, and the $x'$ computed in the SOR method is alternatively given by the formula:

$$\lambda_i = \max\left\{ -x_i,\; \omega \frac{A_i y^i - b_i}{\|A_i\|^2} \right\}, \qquad x'_i = x_i + \lambda_i, \qquad y^{i+1} = y^i - A_i^T \lambda_i, \qquad i = 1, \dots, n,$$

where $y^1 = -A^T x$ and $A_i$ denotes the $i$th row of $A$. The above iteration is reminiscent of the Agmon-Motzkin-Fourier relaxation method for solving the inequalities $Ay \le b$. In fact, it differs from the latter only in that the term $-x_i$, rather than zero, appears inside the max.
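The $y$-based form of the SOR method can be sketched as below, assuming NumPy. The number of sweeps and the starting point $x = 0$ are illustrative choices. The code keeps $y = -A^T x$ up to date, so each update touches one row of $A$ rather than one row of $M = AA^T$. In the small example the constraints are $y_1 \le -1$, $y_2 \le 2$, $y_1 + y_2 \le 0$, and their minimum-norm feasible point is $y = (-1, 0)$.

```python
import numpy as np

def hildreth_sor(A, b, omega=1.0, sweeps=200):
    """SOR on the dual of min 0.5 y^T y s.t. Ay <= b, in the lambda / y form above."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    x = np.zeros(A.shape[0])                 # multipliers, one per inequality
    y = -A.T @ x                             # y^1 = -A^T x
    row_norm2 = np.einsum("ij,ij->i", A, A)  # ||A_i||^2
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            lam = max(-x[i], omega * (A[i] @ y - b[i]) / row_norm2[i])
            x[i] += lam                      # x'_i = x_i + lambda_i
            y -= lam * A[i]                  # y^{i+1} = y^i - A_i^T lambda_i
    return x, y

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([-1.0, 2.0, 0.0])
x, y = hildreth_sor(A, b)
print(x, y)
```

Replacing `-x[i]` by `0.0` inside the max would give the Agmon-Motzkin-Fourier relaxation step mentioned above, which does not keep track of multipliers.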
Convergence of the splitting methods, despite their relatively long history, was more fully analyzed only in the last ten years. In particular, if $M$ is symmetric (not necessarily positive semidefinite) and the function

$$f(x) = \frac{1}{2} x^T M x + q^T x \tag{4}$$

is bounded below on the box $l \le x \le u$, then it is known that the $x$ generated by the splitting method (2) converges to a solution of the LCP at a linear rate (in the root sense [17]), provided that $(B, C)$ is a regular Q-splitting in the sense that

$B - C$ is positive definite and for every $x$ there exists a solution $x'$ to (2)

[12, Thm. 3.2]. (Earlier results of this kind that further assumed $M$ is positive semidefinite or nondegenerate can be found in [2, Chap. 5], [5,11,19,20] and references therein.) For the SOR method, corresponding to $B$ given by (3) with $\omega \in (0, 2)$, it can be verified that $(B, C)$ is a regular Q-splitting provided $M$ has positive diagonal entries. The proof of the above convergence result uses two key facts about the LCP. First, $f(x)$ assumes only a finite number of values on the solution set. Second, the distance from any point $x$ near the solution set to that set is of the order of the ‘residual’ at $x$, defined to be the difference between the two sides of (1). In addition, the function $f(x)$ can be used in a line-search strategy to accelerate convergence of the splitting methods [2, Sec. 5.5].
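The claim about SOR can be checked numerically. For symmetric $M$, $B - C = 2B - M = (2/\omega - 1)D + L - L^T$. Since $L - L^T$ is skew-symmetric, $x^T (B - C) x = (2/\omega - 1)\, x^T D x$, which is positive for all $x \ne 0$ when $\omega \in (0, 2)$ and $D$ has positive entries. The sketch below assumes NumPy, and its helper name and example matrix are hypothetical. It returns the smallest eigenvalue of the symmetric part of $B - C$, which is positive exactly when $B - C$ is positive definite. The test matrix is an indefinite symmetric $M$ with positive diagonal.

```python
import numpy as np

def sor_splitting_margin(M, omega):
    """Smallest eigenvalue of the symmetric part of B - C for B = D/omega + L, C = M - B."""
    M = np.asarray(M, dtype=float)
    B = np.diag(np.diag(M)) / omega + np.tril(M, k=-1)
    C = M - B
    K = B - C
    return np.linalg.eigvalsh(0.5 * (K + K.T)).min()

M = np.array([[1.0, 3.0], [3.0, 1.0]])     # symmetric, indefinite, positive diagonal
print(sor_splitting_margin(M, 1.5))         # about 1/3 > 0
print(sor_splitting_margin(M, 2.5))         # negative: omega outside (0, 2)
```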

If $M$ is not symmetric but positive semidefinite, it is known that the $x$ generated by the splitting method (2) converges to a solution of the LCP at a linear rate (in the root sense), provided that

$B - M$ is symmetric positive definite

[2, Thm. 5.6.1], [24, Cor. 5.3]. One choice of $B$ that satisfies the above assumption is

$$B = M + D - L - L^T,$$

where $L$ denotes the strictly lower-triangular part of $M$ and $D$ is any $n \times n$ diagonal matrix such that $D - L - L^T$ is positive definite. This choice of $B$ is upper-triangular, and hence the corresponding $x'$ can be computed in the order of $n^2$ arithmetic operations using an $n$-step backsolve [22, Sec. 6], [24]. Computationally, the asymmetry of $M$ makes it difficult to incorporate line-search strategies, since no ‘natural’ merit function analogous to (4) is known. As a result, on problems where $M$ is highly asymmetric, such as the LCP formulation of linear programs, the convergence of the splitting methods can be slow. Thus, accelerating convergence of the splitting methods on asymmetric problems remains a challenge. In this direction, we point out related methods based on projection or operator splitting (see [6,21] and references therein). These methods are applicable to the case where $M$ is positive semidefinite (not necessarily symmetric). The major part of their iterations also involves solving a matrix-splitting equation of the form (2), except that the solution $x'$ must undergo additional transformations to yield the new iterate $x$. These methods, which may be viewed as a hybrid form of the splitting methods, admit some forms of line search and show good promise in computation.
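One way to choose the diagonal $D$ in $B = M + D - L - L^T$ is to make $D - L - L^T$ strictly diagonally dominant, which makes it positive definite by Gershgorin's theorem. This particular rule, the helper name and the example are illustrative assumptions. Because $B$ is upper-triangular, equation (2) can be solved from the last component to the first. Each step is a one-dimensional problem whose solution is a clipped ratio; this requires $b_{ii} > 0$, which holds here since $m_{ii} \ge 0$ for positive semidefinite $M$ and $d_{ii} > 0$. The sketch assumes NumPy. It uses the positive semidefinite but nonsymmetric $M = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}$ and $q = (-1, -1)$, whose standard LCP solution is $x = (0, 1)$.

```python
import numpy as np

def asymmetric_splitting(M, q, l, u, iters=500):
    """Splitting method (2) with the upper-triangular B = M + D - L - L^T."""
    M, q = np.asarray(M, dtype=float), np.asarray(q, dtype=float)
    l, u = np.asarray(l, dtype=float), np.asarray(u, dtype=float)
    n = len(q)
    S = np.tril(M, k=-1) + np.tril(M, k=-1).T       # L + L^T
    D = np.diag(np.abs(S).sum(axis=1) + 1.0)        # makes D - L - L^T diagonally dominant
    B = M + D - S                                   # upper-triangular
    C = M - B
    x = np.maximum(l, np.minimum(u, 0.0))
    for _ in range(iters):
        rhs = C @ x + q
        x_new = np.empty(n)
        for i in range(n - 1, -1, -1):              # backsolve, last component first
            r = B[i, i + 1:] @ x_new[i + 1:] + rhs[i]
            x_new[i] = max(l[i], min(u[i], -r / B[i, i]))
        x = x_new
    return x, B

M = np.array([[1.0, 2.0], [-2.0, 1.0]])
x, B = asymmetric_splitting(M, np.array([-1.0, -1.0]), np.zeros(2), np.full(2, np.inf))
print(x)
print(B)
```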

In summary, building on the early work of Hildreth, H.B. Keller and others, splitting methods have been well developed in the last twenty years to solve the LCP (1) when the matrix $M$ is either symmetric or positive semidefinite. Computationally, these methods are best suited when $M$ is symmetric, possibly having some sparsity structure (e.g., $M = AA^T$ with $A$ sparse), and the function (4) is used in a line-search strategy to accelerate convergence. Extensions of these methods to problems where the box $l \le x \le u$ is replaced by a general polyhedral set have also been studied [2, Sec. 5.5], [6,12,22,23]. Special cases include the extended linear/quadratic programming problem of R.T. Rockafellar and R.J-B. Wets and the quadratic program formulation of the LCP with row sufficient matrix. Inexact computation of $x'$ is discussed in [2, Sec. 5.7], [9,12,15]. Acceleration of the methods in the case where $M$ is not symmetric remains an open issue. In fact, if $M$ is neither symmetric nor positive semidefinite, convergence of the splitting methods is known only for the case where $M$ is an H-matrix with positive diagonal entries and $B$ is likewise, with the comparison matrix of $B$ having a contractive property [2, p. 418], [18,19]. Thus, even if $M$ is a P-matrix, it is not known whether the splitting methods converge for some practical choice of $B$.


See also 
SSC algorithms are a class of local minimization algorithms developed within the framework of supervisor and searcher cooperation (SSC). One of the distinct characteristics of this class of algorithms is that they are both efficient and robust, and therefore suitable for minimization problems with strong noise. Some of these algorithms have been studied in [5] and [6].

Most of the existing fast minimization algorithms are not suitable for noisy minimization problems. One of the main reasons seems to be that the effectiveness of the line search procedures usually incorporated into these algorithms is rather sensitive to the accuracy of function or gradient evaluations. One may increase the accuracy of these evaluations by, for example, using an averaging process to diminish the effects of stochastic noise. However, this will normally increase the computational work required.

Most of the existing robust minimization algorithms, such as the stochastic approximation (SA) method, the N-M simplex method, genetic algorithms and the Hooke-Jeeves method, do not use the classical line search, see [7] and [2]. Consequently, they have been widely employed to tackle noisy optimization problems. Unfortunately, these algorithms are, in general, very slow and inefficient even in the noise-free case. The stochastic approximation method, for example, is very robust, and its global convergence has been established under various assumptions on the noise. However, even for deterministic problems without noise, SA is known to be very slow in comparison with efficient gradient algorithms like the conjugate gradient (CG) method and the GBB algorithm (see [8]). This is mainly due to the use of pre-assigned stepsizes in the search for optimizers.

The SSC framework provides an effective mecha- 
nism to address the above issues. The essential idea 
adopted in SSC algorithms is to combine the desir- 
able characteristics of two algorithms: a supervisor al- 
gorithm (SR) and a searcher (or search engine) algo- 
rithm (SE). The former is used to ensure global con- 
vergence and to enhance robustness, while the latter 
is employed to increase the efficiency of the solution 
searching work. The SSC framework can be viewed as 
a systematic way of exploring possible combinations 
of some existing minimization algorithms. The result- 
ing algorithms are a class of piecewise or hybrid algo- 
rithms. 
Assume that we wish to minimize an objective function $f(x)$ on $\mathbb{R}^n$. Assume, for given $x_0, \dots, x_l$, that we have an iterative algorithm, called search engine (SE):

$$x_{k+1} = x_k - se_k(x_0, \dots, x_k, f).$$

This particular form of $se_k$ may depend on $k$ and on the values (or estimates) of $f(x_i)$, $\nabla f(x_i)$, $H(x_i)$, etc., where $H(x)$ is the Hessian matrix of $f$ at the point $x$. Suppose that this algorithm converges to a local minimizer of $f$ provided the starting points are close enough to the minimizer.

Assume that we also have a supervisor algorithm (SR): given $x_0, \ldots, x_l$,

$$x_{k+1} = x_k - \mathit{sr}_k(x_0, \ldots, x_k, f), \qquad k = l, l+1, \ldots$$

In general, an SR is slow but robust, while an SE is fast but only locally convergent. For the two algorithms to work together efficiently, a principle is needed to regulate their relationship. In [5] the following framework was introduced, based on the idea of cooperation between the supervisor and the searcher (SSC): the supervisor acts only if it judges the performance of the search engine to be unsatisfactory, while the search engine does most of the solution searching work.

There are many different ways of designing new algorithms in this framework. In [5] the following simple implementation has been proposed. Assume $f > 0$. Given $x_0, \ldots, x_l$, define ($k = l, l+1, \ldots$) the following (SSC) algorithm:

$$x_{k+1} = \begin{cases} x_k - \mathit{sr}_k, & \text{if } T_k f(x_k - \mathit{sr}_k) < f(x_k - \mathit{se}_k),\\ x_k - \mathit{se}_k, & \text{otherwise,} \end{cases}$$

where $\{T_k\}$ is a given sequence of nonnegative real numbers.
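The switching rule can be sketched in Python as follows. This is a minimal illustration, not the implementation of [5]: the names `ssc_minimize`, `sr` and `se` are hypothetical, and for simplicity the step functions see only the current point and the iteration counter rather than the full history $x_0, \ldots, x_k$.

```python
from typing import Callable, List

Vector = List[float]
StepFn = Callable[[Vector, int], Vector]


def ssc_minimize(f: Callable[[Vector], float], sr: StepFn, se: StepFn,
                 x0: Vector, T: Callable[[int], float], iters: int = 50) -> Vector:
    """Supervisor-searcher cooperation for f > 0 (illustrative sketch).

    sr(x, k) and se(x, k) return the steps sr_k and se_k; the supervisor's
    point is taken only when T_k * f(x_k - sr_k) < f(x_k - se_k).
    """
    x = list(x0)
    for k in range(iters):
        x_sr = [xi - si for xi, si in zip(x, sr(x, k))]
        x_se = [xi - si for xi, si in zip(x, se(x, k))]
        x = x_sr if T(k) * f(x_sr) < f(x_se) else x_se
    return x


def f(x: Vector) -> float:
    """Strictly positive test objective 1 + |x|^2."""
    return 1.0 + sum(xi * xi for xi in x)


def sa_step(x: Vector, k: int) -> Vector:
    """Supervisor: gradient step 2x with a fixed small steplength t = 0.1."""
    return [0.1 * 2.0 * xi for xi in x]


def newton_step(x: Vector, k: int) -> Vector:
    """Search engine: Newton step; the Hessian is 2I, so H^{-1} g = x."""
    return list(x)


def overshooting_step(x: Vector, k: int) -> Vector:
    """A deliberately poor search engine: x - 3x = -2x moves away from 0."""
    return [3.0 * xi for xi in x]


fast = ssc_minimize(f, sa_step, newton_step, [3.0, -4.0], T=lambda k: 3.0)
safe = ssc_minimize(f, sa_step, overshooting_step, [3.0, -4.0], T=lambda k: 1.0)
print(fast, safe)
```

With the good search engine the SE point is always preferred and the minimizer is reached in one step. With the diverging search engine and $T_k = 1$, the supervisor's point is always the better one, so the iterates still contract by the factor 0.8 per step.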

Note that as far as minimization is concerned, one can always assume that $f > 0$ by adding a positive constant to the original function. Alternatively, one can use the following SSC algorithm for the general case:

$$x_{k+1} = \begin{cases} x_k - \mathit{sr}_k, & \text{if } T_k^{\operatorname{sign} f(x_k - \mathit{sr}_k)}\, f(x_k - \mathit{sr}_k) < f(x_k - \mathit{se}_k),\\ x_k - \mathit{se}_k, & \text{otherwise,} \end{cases}$$

where $\{T_k\}$ is a given sequence of nonnegative real numbers.

Other forms of implementation and extensions have also been studied in [5,6]. For example, the following nonmonotone version of the SSC algorithm was proposed in [6] (NOMSCC algorithm): for a fixed positive integer $m$ and a real number $\delta > 0$, define $x_{k+1}$ as follows:

$$x_{k+1} = \begin{cases} x_k - \mathit{se}_k, & \text{if } f(x_k - \mathit{se}_k) < \max\limits_{0 \le j \le m(k)} \{f(x_{k-j})\} - \epsilon_k |g_k| \ \text{ or if } \ f(x_k - \mathit{se}_k) < T_k f(x_k - \mathit{sr}_k),\\ x_k - \mathit{sr}_k, & \text{otherwise,} \end{cases}$$

where $g_k = \nabla f(x_k)$, $m(k) = \min(k, m)$ and $\{\epsilon_k\}$ is a given sequence of nonnegative real numbers such that $\epsilon_k \ge \delta$ ($k = 0, 1, \ldots$).
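The NOMSCC test compares the search engine's trial value with the largest of the last $m(k)+1$ function values rather than with $f(x_k)$ alone. The Python sketch below implements only this acceptance test; the function name and argument layout are hypothetical, and the tolerance term uses $\epsilon_k |g_k|$ as in the formula above.

```python
from typing import Sequence


def nomscc_accepts_se(f_se: float, f_sr: float, f_hist: Sequence[float],
                      eps_k: float, gnorm_k: float, T_k: float, m: int) -> bool:
    """Return True if the NOMSCC rule accepts the search-engine point.

    f_se = f(x_k - se_k), f_sr = f(x_k - sr_k), f_hist = [f(x_0), ..., f(x_k)],
    gnorm_k = |g_k|.  The reference value is max_{0<=j<=m(k)} f(x_{k-j}).
    """
    k = len(f_hist) - 1
    mk = min(k, m)                      # m(k) = min(k, m)
    reference = max(f_hist[k - mk:])    # the last m(k) + 1 function values
    return f_se < reference - eps_k * gnorm_k or f_se < T_k * f_sr


# x_{k+1} = x_k - se_k if the test passes, otherwise x_k - sr_k.
hist = [10.0, 8.0, 9.0]
print(nomscc_accepts_se(9.4, 5.0, hist, eps_k=0.1, gnorm_k=5.0, T_k=1.0, m=2))  # True
print(nomscc_accepts_se(9.4, 5.0, hist, eps_k=0.1, gnorm_k=5.0, T_k=1.0, m=0))  # False
```

With a memory of $m = 2$ the trial value 9.4 is compared with $\max(10, 8, 9) - 0.5 = 9.5$ and accepted; with $m = 0$ only $f(x_k) = 9$ is used and the same value is rejected.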

It was found that in some cases the nonmonotone 
version performs better than the simpler original ver- 
sion (see [5]). 

In principle, any globally convergent iterative algorithm, e.g. CG, BFGS, etc., could be used as a supervisor. In general, however, supervisors should have simple structures to increase robustness. Two classes of supervisors have been examined in [6].

The first class uses pre-assigned steplengths as in SA. Let $\{t_k\}$ ($k = 0, 1, \ldots$) be such that
i) $t_k > 0$;
ii) $\sum_{k=0}^{\infty} t_k = +\infty$.

Let $x_k \in \mathbb{R}^n$ ($k = 0, 1, \ldots$) and let $g_k = \nabla f(x_k)$. Let $l \ge 0$ and let $\{d_k(x_0, \ldots, x_k, f)\}$ be a sequence of vectors in $\mathbb{R}^n$ for which there exist $c, C > 0$ such that the following assumptions hold:

$$\text{(D)} \qquad d_k^\top g_k \ge c\,|g_k|^2, \qquad |d_k| \le C\,|g_k|, \qquad k = 0, 1, \ldots$$

The most frequently used choice is $d_k = g_k$, though other choices are possible as well.

The first class of supervisors has the following form: for given $x_0, \ldots, x_l$,

$$x_{k+1} = x_k - t_k d_k, \qquad k = l, l+1, \ldots$$

When $d_k = g_k$, this is the SA algorithm.
The second class of supervisors has the following form: for given $x_0, \ldots, x_l$,

$$x_{k+1} = x_k - t_k d_k, \qquad k = l, l+1, \ldots,$$

where $\{t_k\}$ and $\{d_k\}$ are chosen so that there exists an $\alpha > 0$, independent of $k$, such that

$$f(x_k - t_k d_k) - f(x_k) \le -\alpha \left(\frac{d_k^\top \nabla f(x_k)}{|d_k|}\right)^2 \quad \text{and} \quad \sum_{k=l}^{\infty} \cos^2(\theta_k) = +\infty,$$

where

$$\cos(\theta_k) = \frac{d_k^\top g_k}{|g_k|\,|d_k|}, \qquad k = l, l+1, \ldots$$

For instance, let $d_k = g_k$ and let $\{t_k\}$ be generated using the Wolfe line search procedure. Then the above assumptions hold. Clearly other line search methods can also be considered in the same way.
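A steplength satisfying the Wolfe conditions is one concrete way to build a second-class supervisor. The sketch below pairs $d_k = g_k$ with a simple bisection search for the weak Wolfe conditions, a standard textbook procedure that is not necessarily the one used in [6]; the function names and the parameters $c_1 = 10^{-4}$, $c_2 = 0.9$ are illustrative assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def wolfe_step(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
               x: Vector, d: Vector, c1: float = 1e-4, c2: float = 0.9,
               max_iter: int = 60) -> float:
    """Bisection search for t satisfying the weak Wolfe conditions along -d."""
    lo, hi, t = 0.0, math.inf, 1.0
    fx = f(x)
    slope = -dot(grad(x), d)            # derivative of f(x - t d) at t = 0
    for _ in range(max_iter):
        x_new = [xi - t * di for xi, di in zip(x, d)]
        if f(x_new) > fx + c1 * t * slope:          # not enough decrease: shrink
            hi = t
        elif -dot(grad(x_new), d) < c2 * slope:     # slope still too steep: grow
            lo = t
        else:
            return t
        t = 0.5 * (lo + hi) if hi < math.inf else 2.0 * lo
    return lo                           # lo always satisfies sufficient decrease


def supervisor_wolfe(f, grad, x0: Vector, iters: int = 50) -> Vector:
    """Second-class supervisor with d_k = g_k and Wolfe steplengths t_k."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        if math.sqrt(dot(g, g)) < 1e-12:
            break
        t = wolfe_step(f, grad, x, g)
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x


def f(x: Vector) -> float:
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x: Vector) -> Vector:
    return [x[0], 10.0 * x[1]]


print(wolfe_step(f, grad, [1.0, 1.0], grad([1.0, 1.0])))   # 0.125
print(f(supervisor_wolfe(f, grad, [1.0, 1.0])))
```

Because every accepted step satisfies the sufficient-decrease condition, the supervisor's objective values are monotonically nonincreasing, which is the robustness property the SSC framework relies on.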

Again, in principle, any locally convergent iterative algorithm could be used as a search engine. Some possible examples, which have already been investigated, are as follows:

1) Newton search engine: given $x_0$,
$$x_{k+1} = x_k - \mathit{se}_k, \qquad k = 0, 1, \ldots,$$
where $\mathit{se}_k = H_k^{-1} g_k$, and $H_k$ and $g_k$ are the Hessian matrix and the gradient of $f$ at the point $x_k$. If $H_k$ is not invertible, then the SSC switches to SR.

2) Quasi-Newton search engine: given $x_0$,
$$x_{k+1} = x_k - \mathit{se}_k, \qquad k = 0, 1, \ldots,$$
where $\mathit{se}_k = B_k g_k$ and $B_k$ is given by a quasi-Newton recursive formula (see [4] and [3] for the details). It has been found that a straightforward use of this class of search engines may lead to poor performance of the resulting algorithms.

3) BB search engine: given $x_0$,
$$x_{k+1} = x_k - \mathit{se}_k, \qquad k = 0, 1, \ldots,$$
where $\mathit{se}_k = \alpha_k g_k$, with $\alpha_0 = 1$ and, for $k \ge 1$,
$$\alpha_k = \frac{|y_k|^2}{y_k^\top s_k},$$
where
$$y_k = x_k - x_{k-1}, \qquad s_k = \nabla f(x_k) - \nabla f(x_{k-1}).$$
This steplength $\alpha_k$ was first proposed in [1], where the BB algorithm was introduced. This search engine has been studied in detail in [5].
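The BB steplength needs only the last two iterates and gradients. A minimal Python sketch of the BB search-engine step follows (the function name is hypothetical; it has no safeguard against $y_k^\top s_k \le 0$, which can occur for nonconvex $f$):

```python
from typing import List, Optional

Vector = List[float]


def bb_search_step(x: Vector, g: Vector, x_prev: Optional[Vector] = None,
                   g_prev: Optional[Vector] = None) -> Vector:
    """BB search engine step se_k = alpha_k g_k (alpha_0 = 1 when no history)."""
    if x_prev is None or g_prev is None:
        alpha = 1.0
    else:
        y = [a - b for a, b in zip(x, x_prev)]      # y_k = x_k - x_{k-1}
        s = [a - b for a, b in zip(g, g_prev)]      # s_k = grad f(x_k) - grad f(x_{k-1})
        alpha = sum(a * a for a in y) / sum(a * b for a, b in zip(y, s))
    return [alpha * gi for gi in g]


# On f(x) = 2 x^2 (gradient 4x), alpha_k = 1/4 reproduces the exact Newton step.
print(bb_search_step([1.0], [4.0], x_prev=[3.0], g_prev=[12.0]))  # [1.0]
```

For a quadratic the ratio $|y_k|^2 / y_k^\top s_k$ is an inverse Rayleigh quotient of the Hessian, which is why it recovers the Newton step in one dimension.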

Three SSC algorithms have been studied and tested in [6]: SSC-SABB, SSC-SANEWTON, and SSC-SABFGS, which use SA as the supervisor and BB, Newton's, and BFGS as the search engines, respectively. For example, the SSC-SABB algorithm is as follows.

Let $x_0 \in \mathbb{R}^n$ be given and $\alpha_0 = 1$. Let $T_k > 0$, $k = 0, 1, \ldots$, be given, and assume that $l = 0$. Then (the SSC-SABB algorithm):

$$x_{k+1} = \begin{cases} x_k - t_k g_k, & \text{if } T_k f(x_k - t_k g_k) < f(x_k - \alpha_k g_k),\\ x_k - \alpha_k g_k, & \text{otherwise,} \end{cases}$$

where, for $k \ge 1$,

$$\alpha_k = \frac{|y_k|^2}{y_k^\top s_k},$$

with

$$y_k = x_k - x_{k-1}, \qquad s_k = \nabla f(x_k) - \nabla f(x_{k-1}).$$
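Putting the pieces together, SSC-SABB can be sketched as below. This is an illustrative reconstruction, not the code of [6]: the steplength $t_k = \min(0.01, 1/(k+1))$ (shifted by one to avoid $1/0$ at $k = 0$), the fallback $\alpha_k = 1$ when $y_k^\top s_k \le 0$, and the gradient-norm stopping test are assumptions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def ssc_sabb(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
             x0: Vector, T: float = 3.0, iters: int = 2000,
             tol: float = 1e-8) -> Tuple[Vector, int]:
    """SSC-SABB sketch: SA step t_k g_k (supervisor) versus BB step alpha_k g_k.

    Assumes f > 0.  t_k = min(0.01, 1/(k+1)) and the fallback alpha_k = 1
    when y_k^T s_k <= 0 are illustrative choices.
    """
    x, g = list(x0), grad(x0)
    x_prev: Vector = []
    g_prev: Vector = []
    alpha = 1.0                                            # alpha_0 = 1
    for k in range(iters):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            return x, k
        if k >= 1:
            y = [a - b for a, b in zip(x, x_prev)]         # y_k
            s = [a - b for a, b in zip(g, g_prev)]         # s_k
            ys = sum(a * b for a, b in zip(y, s))
            alpha = sum(a * a for a in y) / ys if ys > 0 else 1.0
        t_k = min(0.01, 1.0 / (k + 1))
        x_sr = [xi - t_k * gi for xi, gi in zip(x, g)]     # supervisor (SA)
        x_se = [xi - alpha * gi for xi, gi in zip(x, g)]   # search engine (BB)
        x_prev, g_prev = x, g
        x = x_sr if T * f(x_sr) < f(x_se) else x_se
        g = grad(x)
    return x, iters


def f(x: Vector) -> float:
    return 1.0 + 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)     # positive, minimum 1 at 0


def grad(x: Vector) -> Vector:
    return [x[0], 10.0 * x[1]]


x_min, n_iter = ssc_sabb(f, grad, [1.0, 1.0])
print(x_min, n_iter)
```

On this example the first BB trial point overshoots badly, so the supervisor's small SA step is taken once; afterwards the BB steps are accepted and do almost all of the work, which is the division of labour described above.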


These algorithms were found to be efficient and very robust. For instance, SSC-SABB was able to solve difficult stochastic optimization problems efficiently, while in the noise-free case it was comparable with some of the fastest gradient algorithms, such as CG and GBB (see [5]). In [6], it was also reported that SSC-SANEWTON and SSC-SABFGS can solve difficult optimization problems using rough approximations of the gradient and the Hessian, while they are very fast when the errors are small. In these experiments, steplength sequences of the form $t_k = \min(0.01, \cdot)$, including $t_k = \min(0.01, 1/k)$, were tested, while $T_k$ was fixed at a constant $T > 0$. For example, one may take $T = 3$ or $T = 5$ when there is little noise in an optimization problem, and one has to let $T = 1$ if the noise is strong.

In the noise-free case the following global convergence result has been established in [6] for SSC algorithms.

Let $f$ be twice continuously differentiable and bounded below. Assume that $\nabla f$ is Lipschitz with a global Lipschitz constant. Let $\{x_k\}$ be generated by the SSC algorithm, and assume that $\prod_{k=l}^{\infty} \max(T_k, 1) < \infty$. If the supervisor belongs to the first class, then there is an $\epsilon(f) > 0$ such that $\{\sum_{i=l}^{k} t_i |\nabla f(x_i)|^2\}$ converges as $k \to \infty$ for any sequence $\{t_k\}$ for which there is an $N > 0$ with $t_k \le \epsilon(f)$ for all $k > N$. If the supervisor belongs to the second class, then $\{\sum_{i=l}^{k} \cos^2(\theta_i) |\nabla f(x_i)|^2\}$ converges as $k \to \infty$.

It should be noted that this convergence result is independent of the search engine used. The result can be extended further. For instance, SSC-SABB has been shown to be globally convergent when only finitely many $T_k$ equal 1 and the remaining $T_k = T > 1$. It was also shown that the convergence rate of SSC-SABB is R-linear.

In general, the SSC algorithms are at least as fast as their SE, provided $T_k \ge T > 1$ for all sufficiently large $k$. Therefore, the global convergence of SSC algorithms is ensured by the SR, while their speed is largely determined by the SE. More details can be found in [6].

The above basic SSC algorithms have been extended to different applications, such as the training of feedforward neural networks (FNNs). For example, the following extended version of the SSC algorithms, which forces the search engine to run $m$ iterations before switching is attempted, has been proposed. Let $x_0 \in \mathbb{R}^n$ be given. Let $T_k > 0$ be given for $k = 0, 1, \ldots$, and let $m > 0$. Assume that $f > 0$ and that we have $x_k$. Define $y_{k+m}$ by $y_k = x_k$ and

$$y_{k+i+1} = y_{k+i} - \mathit{se}_{k+i}, \qquad i = 0, \ldots, m-1.$$

Then define

$$x_{k+m} = \begin{cases} x_k - \mathit{sr}_k, & \text{if } T_k f(x_k - \mathit{sr}_k) < f(y_{k+m}),\\ y_{k+m}, & \text{otherwise.} \end{cases}$$

The purpose of this extension is to create a 'memory' effect, which may be important for the efficient performance of some search engines.
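One way to code this block scheme, following the formulas above, is sketched next. The function name and the idea of passing the search engine as a callable (which may keep internal state, such as BB history, across the $m$ inner steps) are illustrative assumptions.

```python
from typing import Callable, List

Vector = List[float]
Step = Callable[[Vector], Vector]


def ssc_extended(f: Callable[[Vector], float], sr: Step, se: Step,
                 x0: Vector, T: float, m: int, blocks: int) -> Vector:
    """Extended SSC sketch: m search-engine steps, then one comparison with SR."""
    x = list(x0)
    for _ in range(blocks):
        y = list(x)                                    # y_k = x_k
        for _ in range(m):                             # y_{k+i+1} = y_{k+i} - se_{k+i}
            y = [a - b for a, b in zip(y, se(y))]
        w = [a - b for a, b in zip(x, sr(x))]          # supervisor point x_k - sr_k
        x = w if T * f(w) < f(y) else y                # x_{k+m}
    return x


def f(x: Vector) -> float:
    return 1.0 + sum(v * v for v in x)


def sr(x: Vector) -> Vector:
    return [0.2 * v for v in x]                        # gradient step with t = 0.1


def halving(y: Vector) -> Vector:
    return [0.5 * v for v in y]                        # a contracting search engine


print(ssc_extended(f, sr, halving, [8.0], T=1.0, m=3, blocks=2))   # [0.125]
```

Each block lets the search engine run three steps undisturbed ($8 \to 1 \to 0.125$) before the single comparison with the supervisor's candidate.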

The supervisor-searcher cooperation framework appears to offer a promising way of combining the characteristics of existing algorithms. The new algorithms seem able to retain the desirable properties of their 'parents', so that they are both efficient and robust. This makes them suitable for both deterministic and stochastic minimization problems. This feature is largely due to the fact that the algorithms neither rely on line searches nor use pre-assigned stepsizes all the time. Finally, the additional computational work required is very light.


See also 


> Equality-constrained Nonlinear Programming: KKT 
Necessary Optimality Conditions 

> Inequality-constrained Nonlinear Optimization 

> SSC Minimization Algorithms for Nonsmooth and 
Stochastic Optimization 
SSC algorithms are a class of local minimization algorithms developed within the framework of supervisor and searcher cooperation; see ► SSC minimization algorithms for definitions. Some of these algorithms are both efficient and robust, and therefore suitable for minimization problems with strong noise. Further studies of SSC algorithms can be found in [4,5], and [7].

Nonsmooth optimization problems, in which the objective function may not be differentiable, arise frequently in many important applications. There exist some minimization algorithms that are applicable to certain types of nonsmooth optimization problems, such as the subgradient method, the conjugate subgradient method, and the cutting plane method (see [3,8], and [9]). However, more research is still needed to develop fast minimization algorithms for nonsmooth optimization problems.

Some of the SSC algorithms have been extended to nonsmooth optimization problems, see [6]. The SSC framework described in ► SSC minimization algorithms can clearly be used to design minimization algorithms for nonsmooth optimization problems, though it is not straightforward to select suitable supervisors and searchers. In [6], the SSC-SABB algorithm (see ► SSC minimization algorithms) was extended to nonsmooth optimization: the supervisor was replaced by the subgradient method, while the BB search engine was retained. This gives the SSC-SBB algorithm, described below.

Assume that we wish to minimize an objective function $f(x)$ on $\mathbb{R}^n$, and assume that $f > 0$ is Lipschitz. Let $t_k > 0$ ($k = 0, 1, \ldots$) be such that $\sum_{k=0}^{\infty} t_k = \infty$ and $\sum_{k=0}^{\infty} t_k^2 < \infty$. Define $\partial f(x)$, the generalized gradient at $x$, as in [9]:

$$\partial f(x) = \overline{\operatorname{conv}}\big(V_f(x)\big),$$

with

$$V_f(x) = \Big\{\xi \in \mathbb{R}^n : \xi = \lim_{i \to \infty} \nabla f(x_i),\ x_i \to x, \text{ where } \nabla f(x_i) \text{ exists}\Big\},$$

where $\overline{\operatorname{conv}}(V_f(x))$ denotes the closed convex hull of $V_f(x)$. Since every Lipschitz function on $\mathbb{R}^n$ is differentiable almost everywhere, the above generalized gradient is well defined. The SSC-SBB algorithm is then defined as follows. For given $x_0$ and $T_k > 0$ ($k = 0, 1, \ldots$), define


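$$x_{k+1} = \begin{cases} x_k - t_k \xi_k, & \text{if } T_k f(x_k - t_k \xi_k) < f(x_k - \alpha_k \xi_k),\\ x_k - \alpha_k \xi_k, & \text{otherwise,} \end{cases}$$

where $\xi_k \in V_f(x_k)$, $\alpha_0 = 1$, and for $k \ge 1$

$$\alpha_k = \frac{|y_k|^2}{y_k^\top s_k},$$

where

$$y_k = x_k - x_{k-1}, \qquad s_k = \xi_k - \xi_{k-1}.$$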


Note that in the above definition, $\xi_k$ is taken from $V_f(x_k)$ instead of $\partial f(x_k)$ as in the subgradient method. In general, this causes no extra computational work. For instance, in the most common case $f(x) = \max_{1 \le i \le J} f_i(x)$, where each $f_i$ ($i = 1, \ldots, J$) is smooth, one can simply use $\xi_k = \nabla f_j(x_k)$, where $j$ is such that $f_j(x_k) = \max_{1 \le i \le J} f_i(x_k)$. In fact, it is also possible to simply require $\xi_k \in \partial f(x_k)$ with $|\xi_k| = 1$ in the above definition, as in the subgradient method. The formula for $\alpha_k$ then has to be modified because of the normalization of $\xi_k$.
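The max-of-smooth case makes SSC-SBB easy to try out. The sketch below uses a hypothetical test function $f(x) = 2 + \max_i a_i^\top x$ with three linear pieces whose gradients contain $0$ in their convex hull, so that $\min f = 2$ is attained at the origin. It takes $\xi_k$ as the gradient of an active piece, $t_k = 1/(k+1)$, $T_k = 1$ and the fallback $\alpha_k = 1$ when $y_k^\top s_k \le 0$; these choices are assumptions for illustration, and $T = 1$ with unnormalized $\xi_k$ lies outside the convergence result stated below.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# f(x) = 2 + max_i a_i^T x; 0 lies in the convex hull of the a_i, so min f = 2 at 0.
PIECES: Sequence[Tuple[float, float]] = [(2.0, 1.0), (-1.0, 1.0), (0.0, -1.0)]


def f(x: Vector) -> float:
    return 2.0 + max(a * x[0] + b * x[1] for a, b in PIECES)


def xi(x: Vector) -> Vector:
    """Gradient of an active piece f_j, i.e. an element of V_f(x)."""
    vals = [a * x[0] + b * x[1] for a, b in PIECES]
    return list(PIECES[vals.index(max(vals))])


def ssc_sbb(x0: Vector, T: float = 1.0, iters: int = 200) -> Tuple[Vector, float]:
    """SSC-SBB sketch: subgradient supervisor versus BB search engine."""
    x, g = list(x0), xi(x0)
    best_x, best_f = list(x), f(x)
    x_prev: Vector = []
    g_prev: Vector = []
    alpha = 1.0                                          # alpha_0 = 1
    for k in range(iters):
        if k >= 1:
            y = [a - b for a, b in zip(x, x_prev)]       # y_k = x_k - x_{k-1}
            s = [a - b for a, b in zip(g, g_prev)]       # s_k = xi_k - xi_{k-1}
            ys = sum(a * b for a, b in zip(y, s))
            alpha = sum(a * a for a in y) / ys if ys > 0 else 1.0
        t_k = 1.0 / (k + 1)                  # sum t_k = inf, sum t_k^2 < inf
        x_sr = [a - t_k * b for a, b in zip(x, g)]
        x_se = [a - alpha * b for a, b in zip(x, g)]
        x_prev, g_prev = x, g
        x = x_sr if T * f(x_sr) < f(x_se) else x_se
        g = xi(x)
        if f(x) < best_f:
            best_x, best_f = list(x), f(x)
    return best_x, best_f


best_x, best_f = ssc_sbb([1.0, 1.0])
print(best_x, best_f)
```

Since the method is not monotone, the sketch reports the best point visited rather than the last iterate.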
Some global convergence results have been established in [6] for the SSC-SBB algorithm. The following is just an example. Assume that
a) $f$ is continuous and strictly convex;
b) $x^*$ is the global minimizer of $f$, with $f(x^*) = \min_{x \in \mathbb{R}^n} f(x) > 0$; and
c) there exist constants $c, m > 0$ such that for all $x \in \mathbb{R}^n$
$$c\,|x - x^*| \le f(x) - f(x^*) \le m\,|x - x^*|.$$

Assume that $\{\beta_k\}$ is a sequence of real numbers. Let $x_0$ be given and let $T_k = T > 0$ ($k = 0, 1, \ldots$). Let $\{x_k\}$ be defined by ($k = 1, 2, \ldots$)

$$x_{k+1} = \begin{cases} x_k - t_k \xi_k, & \text{if } T f(x_k - t_k \xi_k) < f(x_k - \beta_k \xi_k),\\ x_k - \beta_k \xi_k, & \text{otherwise,} \end{cases}$$

where $\xi_k \in \partial f(x_k)$ with $|\xi_k| = 1$, and $t_k > 0$ with $\sum_{k=0}^{\infty} t_k = \infty$ and $\sum_{k=0}^{\infty} t_k^2 < \infty$. Then $\lim_{k \to \infty} |x_k - x^*| = 0$ if $T < c/m$. If, in addition, $U = \sup_{x,y} \big(f(x) - f(y)\big) < \infty$, then for any $0 < T < 1$ there exists a $C_T > 0$ such that $\lim_{k \to \infty} |x_k - x^*| = 0$, where $x_k$ is generated by the above SSC-SBB algorithm applied to any shifted objective function $f_C = f + C$ with $C > C_T$.

The above result basically says that if 0 < T < 1, then 
by adding a suitable positive number C to the objec- 
tive function f (this does not change the minimizer of 
f), the SSC-SBB algorithm will converge. It does not, 
however, give any estimate for the constant C. In [6], 
convergence was observed for all C > 0. More general 
convergence results have also been established in [6], 
where some of the above conditions have been further 
relaxed. 

Numerical experiments indicated that SSC-SBB is quite fast when $\{x_k\}$ is far away from a minimizer. It then slows down when the approximations are near the minimizer, if the objective function happens not to be differentiable at that point. The main reason seems to be that the BB search engine is frequently redundant in the computation when $x_k$ is near a nonsmooth minimizer. This problem is yet to be solved. Nevertheless, in some cases a considerable improvement in overall speed over the subgradient method has been observed in [6].

It has been found that some SSC algorithms seem quite suitable for stochastic optimization:

$$\min f(x), \qquad \text{with } f(x) = F(x) + \epsilon,$$

where $F$ is the smooth underlying exact mathematical model, and $\epsilon$ is stochastic noise, which may depend on $x$. Of course, here we really mean to find a minimizer of $F$. Among these algorithms, a stochastic version of SSC-SABB has been much studied and tested, see [4] and [7].

Assume that we have some estimators for $F(x)$ and $\nabla F(x)$. Whenever we refer to $f(x)$ and $\nabla f(x)$, we mean the estimators of the value or the gradient of $F$ at the point $x$. When $\epsilon = 0$, all the estimators are assumed to be identical to the exact values.

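Then the stochastic version of SSC-SABB is as follows. Let $x_0 \in \mathbb{R}^n$ be given and $\alpha_0 = 1$. For $k = 0, 1, \ldots$, let $T_k > 0$ be given. Define the stochastic SSC-SABB algorithm

$$x_{k+1} = \begin{cases} x_k - t_k g_k, & \text{if } T_k f(x_k - t_k g_k) < f(x_k - \alpha_k g_k),\\ x_k - \alpha_k g_k, & \text{otherwise,} \end{cases}$$

where $g_k = \nabla f(x_k)$, and for $k \ge 1$,

$$\alpha_k = \frac{|y_k|^2}{y_k^\top s_k},$$

with

$$y_k = x_k - x_{k-1}, \qquad s_k = \nabla f(x_k) - \nabla f(x_{k-1}).$$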
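A small simulation illustrates the stochastic version. The sketch is assumption-laden and is not the experimental setup of [4] or [7]: the test function $F(x) = 1 + |x|^2$, the Gaussian noise level, the difference step $h = 0.1$, $t_k = \min(0.01, 1/(k+1))$ and the fallback $\alpha_k = 1$ are all hypothetical choices. The gradient estimator is a central difference built from noisy function values, and $T = 1$ follows the recommendation for noisy problems.

```python
import random
from typing import Callable, List

Vector = List[float]


def make_noisy_problem(sigma: float, seed: int = 0) -> Callable[[Vector], float]:
    """F(x) = 1 + |x|^2 observed with additive N(0, sigma^2) noise (hypothetical)."""
    rng = random.Random(seed)

    def f(x: Vector) -> float:
        return 1.0 + sum(v * v for v in x) + rng.gauss(0.0, sigma)

    return f


def cd_gradient(f: Callable[[Vector], float], x: Vector, h: float = 0.1) -> Vector:
    """Central difference estimate of grad F from noisy values of f."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


def stochastic_ssc_sabb(f: Callable[[Vector], float], x0: Vector,
                        T: float = 1.0, iters: int = 300) -> Vector:
    x = list(x0)
    g = cd_gradient(f, x)
    x_prev: Vector = []
    g_prev: Vector = []
    alpha = 1.0                                            # alpha_0 = 1
    for k in range(iters):
        if k >= 1:
            y = [a - b for a, b in zip(x, x_prev)]
            s = [a - b for a, b in zip(g, g_prev)]
            ys = sum(a * b for a, b in zip(y, s))
            alpha = sum(a * a for a in y) / ys if ys > 0 else 1.0
        t_k = min(0.01, 1.0 / (k + 1))
        x_sr = [a - t_k * b for a, b in zip(x, g)]
        x_se = [a - alpha * b for a, b in zip(x, g)]
        x_prev, g_prev = x, g
        x = x_sr if T * f(x_sr) < f(x_se) else x_se       # comparison of noisy values
        g = cd_gradient(f, x)
    return x


def F(x: Vector) -> float:
    return 1.0 + sum(v * v for v in x)                    # exact model, minimum 1


f = make_noisy_problem(sigma=1e-3)
x_final = stochastic_ssc_sabb(f, [3.0, -4.0])
print(F(x_final))
```

Note that the switching test itself is evaluated with noisy function values, so a step is occasionally chosen "wrongly"; with $T = 1$ such mistakes cost at most about the noise level per step.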


It has exactly the same form as the deterministic version of SSC-SABB (see ► SSC minimization algorithms), but with a different interpretation of $f$ and $\nabla f$. More importantly, the parameters have to be selected differently. For example, it has been found in [4] that if one takes $t_k = \min(c, 1/k)$ or a similar sequence of the form $\min(c, \cdot)$, then the value of $c$ needed to ensure fast convergence may be strongly problem dependent. It was reported in [4] and [7] that $T_k = 1$ was always a safe choice for the stochastic problems studied there. One may use any estimator for $F$ or $\nabla F$ in the above algorithm. In [4] and [7], linear or quadratic approximations of $F$ and the second-order central difference estimator of $\nabla F$ were tested.

The performance of the algorithm was found to be promising. A considerable improvement over the stochastic approximation algorithm (SA) was observed in [4] and [7], and the algorithm was able to solve some hard stochastic optimization problems efficiently. This improvement seems due to its adaptive (switching) way of calculating steplengths: they are neither pre-assigned nor determined by a line search.
However, this feature makes it difficult to establish convergence for the algorithm (indeed for this whole class of algorithms) in the stochastic case. In the literature, most existing convergence results for SA-type algorithms are established under the assumption that the steplengths tend to zero. For instance, in the stochastic approximation algorithm the steplength at the $k$th iteration is just $t_k$, which goes to zero as $k \to \infty$. This assumption certainly does not hold for the stochastic version of SSC-SABB, since at the $k$th iteration the steplength is either $t_k$ or $\alpha_k$. In fact, it was observed that the steplengths in the stochastic version of SSC-SABB often jumped, and this actually sped up the algorithm.

Some convergence results were established for this algorithm in [2], based on the observation that in the shifted minimization problem $\min_x (f + C)$ the noise is weaker relative to the value of $F + C$ when $C$ is larger. One of these convergence results is briefly described below. For ease of exposition, we only state the result for the one-dimensional case.

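Assume that $\{\xi_k\}_{k \ge 1}$, $\{\eta_k\}_{k \ge 1}$, $\{\zeta_k\}_{k \ge 1}$ and $\{\chi_k\}_{k \ge 1}$ are independent random sequences defined on the probability space $(\Omega, \mathcal{F}, P)$ with bounded variances. For $n = 1, 2, \ldots$, define the sub-$\sigma$-algebra $\mathcal{F}_n$ by

$$\mathcal{F}_n = \sigma\big(\{\xi_k\}_{k \le n}, \{\eta_k\}_{k \le n}, \{\zeta_k\}_{k \le n}, \{\chi_k\}_{k \le n}\big).$$

Assume that $\chi_k \sim N(0, \gamma(k))$ and $\zeta_k \sim N(0, \theta(k))$ for $\gamma(k) > 0$ and $\theta(k) > 0$, respectively. Assume that the function $F \in C^2(\mathbb{R})$ has bounded second derivatives. Let $\{t_k\}$ be defined as before, let $\{\beta_k\}$ be a sequence of positive numbers, and let $T > 0$. Then for a given starting point $X_0 = x_0$, define the random process ($k = 1, 2, \ldots$) by

$$X_{k+1} = Y_{k+1} \quad \text{if} \quad T\Big[F\big(X_k - t_k(\nabla F(X_k) + \xi_{k+1})\big) + \zeta_{k+1}\Big] < \Big[F\big(X_k - \beta_k(\nabla F(X_k) + \eta_{k+1})\big) + \chi_{k+1}\Big],$$

and otherwise by $X_{k+1} = Z_{k+1}$, where

$$Y_{k+1} = X_k - t_k\big(\nabla F(X_k) + \xi_{k+1}\big), \qquad Z_{k+1} = X_k - \beta_k\big(\nabla F(X_k) + \eta_{k+1}\big).$$

Suppose that $\{\xi_k, \mathcal{F}_k\}$ is a sequence of martingale differences. Then it was shown in [2] that for any given $T < 1$,

$$\lim_{k \to \infty} F(X_k)$$

exists almost surely, and

$$P\Big(\liminf_{k \to \infty} \operatorname{dist}(X_k, \mathcal{Z}) = 0\Big) = 1,$$

provided $C$ is large enough and

$$\gamma(k) + \theta(k) < \frac{[(1 - T)C]^2}{10 \log(k + 1)},$$

where

$$\mathcal{Z} = \{x : \nabla F(x) = 0\}.$$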


Convergence has also been established when all the noises are bounded, see [2]. It should be noted that the above convergence result is independent of the choice of the sequence $\{\beta_k\}$; however, the speed of the algorithm depends strongly on this choice. The assumptions of the results are in fact quite weak and can be met easily, e.g. by averaging two or three samples of $\chi_k$ and $\zeta_k$, or by taking a larger value of $C$. For instance, in [4], $C$ was fixed at 250 when $F > 0$.


See also 


> Equality-constrained Nonlinear Programming: KKT
Necessary Optimality Conditions

> Inequality-constrained Nonlinear Optimization
Decomposition methods for stochastic linear programming
problems often use cutting planes to develop
piecewise linear approximations of the objective func- 
tion. This article provides a summary of techniques that 
can be used to stabilize these algorithms. By providing 
a unified treatment of deterministic as well as stochas- 
tic cutting plane algorithms, this article provides a com- 
mon ground for both classes of methods. 


Introduction 


Stochastic programming problems often give rise to 
highly structured optimization problems that are solved 


using decomposition techniques. Many of these meth- 
ods rely on cutting plane algorithms that generate suc- 
cessive approximations of the objective function. By 
and large, these algorithms generate deterministic ap- 
proximations (e.g. [1,14]). More recently, new ran- 
domized versions of cutting plane methods have also 
been studied ([6] and [10]). The approximations used 
by such algorithms (deterministic or stochastic) are 
piecewise linear functions, which have the potential 
to provide arbitrarily close approximations, especially 
near an optimum solution. However, these basic meth- 
ods suffer from the following drawback: cutting planes 
are derived in each iteration, and proofs of convergence 
are often based on retaining all cuts generated during 
the course of the algorithm. Since the cuts are often dense linear inequalities, their unchecked proliferation can exhaust computer memory. On
the other hand, deleting cutting planes indiscriminately 
leads to instability in the approximations, and conse- 
quently in the solutions as well. This article is devoted 
to the discussion of algorithmic methods to curtail the 
size of the approximations without a degradation in the 
convergence properties of the algorithm. 

Cutting plane algorithms for the solution of 
stochastic linear programming (SLP) problems may use 
one of a class of alternative representations of the SLP 
objective function. There is a great deal of flexibil- 
ity in the manner in which cutting plane algorithms 
are designed. For stochastic programming problems, 
each iteration may generate anywhere from 1 to S cut- 
ting planes, where S is the number of possible out- 
comes associated with the random events of interest. 
Our discussion of stabilization methods will focus on two basic tools: incumbent solutions and regularization ([12,13] and [7]). These tools are then used within deterministic as well as stochastic cutting plane algorithms to provide well-defined schemes for the deletion of cutting planes that do not compromise the integrity of the algorithm. For problems with $n$ decision variables, deterministic algorithms that generate $q$ cuts per iteration ($1 \le q \le S$) can be shown to converge while retaining at most $n + 2q$ cuts. For a stochastic cutting plane algorithm that uses $q$ cuts per iteration, convergence results can be obtained while retaining at most $n + 3q$ cuts. We conclude this article with an illustration of these stabilization techniques.
Alternative Methods
for Approximating the Recourse Function


A two-stage stochastic linear program (SLP) may be stated as

$$\text{(SLP)} \qquad \min\ c^\top x + E[h(x, \tilde\omega)] \quad \text{s.t. } x \in X,$$

where $\tilde\omega$ is a random variable and

$$h(x, \omega) = \min\ g^\top y \quad \text{s.t. } W y = r(\omega) - T(\omega) x,\ \ y \ge 0,$$

and $X \subseteq \mathbb{R}^n$ is a convex polyhedral set. Note that $h$ is the value function of a linear program. For the sake of simplicity in this presentation, we assume that for all $x \in X$, $h(x, \tilde\omega) < \infty$ with probability one.

Most deterministic methods for SP work with the entire sample space of scenarios. For these methods it is therefore customary to assume that the random variable $\tilde\omega$ is discrete, so that the possible outcomes may be numbered $\{\omega_s\}_{s=1}^{S}$. In such cases, the expectation in the objective function may be written as

$$E[h(x, \tilde\omega)] = \sum_{s=1}^{S} h(x, \omega_s)\, p_s,$$

where $p_s = P\{\tilde\omega = \omega_s\}$ is the probability of scenario $s$.
The structure of the SLP problem is well suited for 
Benders’ decomposition which, in the SLP literature, 
goes under the name of the L-shaped method ([14]). 
At each iteration of the L-shaped method one constructs a supporting hyperplane of the recourse function $E[h(x, \tilde\omega)]$. This hyperplane is then added to the collection of previously generated hyperplanes, and the method proceeds in this manner until a stopping rule is satisfied. These hyperplanes are called 'cuts', and they provide a piecewise linear lower bounding envelope of $E[h(x, \tilde\omega)]$. Some variants of this method develop cutting plane approximations of the value functions $\{h(x, \omega_s)\}_{s=1}^{S}$, so that $S$ cuts are generated at a time (see [2,13]). These types of cutting plane algorithms are often called multicut methods, since each iteration requires the development of as many cuts as there are scenarios. Since these multicut methods use sums of piecewise linear approximations, rather than one aggregated approximation (as in the L-shaped method), they provide better lower bounds than the L-shaped method, although there is no similar guarantee regarding the quality of the upper bounds derived.


Two Basic Tools: Incumbent Solutions 
and Regularization 


Consider optimization problems of the form $\min\{f(x) : x \in X\}$, where $f : \mathbb{R}^n \to \mathbb{R}$ is convex and $X \subseteq \mathbb{R}^n$ is a convex set. At each iteration, we assume that a point $x^k$ is given, and we generate a lower bounding linearization $\alpha_k + \beta_k^\top x$, where the cut coefficients satisfy $\beta_k \in \partial f(x^k)$ and $\alpha_k = f(x^k) - \beta_k^\top x^k$. (Since we are dealing with convex functions, the Clarke subdifferential [3] will suffice for the purposes of this exposition.) The collection of all previously generated cuts defines a piecewise linear lower bounding function, denoted $f_k(x)$ and represented as

$$f_k(x) = \max_{t \in J_k}\ \alpha_t + \beta_t^\top x, \qquad (1)$$

where $J_k$ is an index set, and $\alpha_t \in \mathbb{R}$, $\beta_t \in \mathbb{R}^n$ were generated in iteration $t$.

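For the case of SLP, the cut coefficients are obtained by recognizing that, as long as $h(x^t, \omega)$ is finite, LP duality implies that

$$h(x, \omega) = \max_{\pi}\ [r(\omega) - T(\omega) x]^\top \pi \quad \text{s.t. } W^\top \pi \le g.$$

Letting $\pi^t(\omega)$ denote an optimal dual solution at $x = x^t$, a subgradient of $h$ is

$$-T(\omega)^\top \pi^t(\omega) \in \partial h(x^t, \omega).$$

Hence the cut coefficients for the SLP problem are defined as

$$\beta_t = c - E\big[T(\tilde\omega)^\top \pi^t(\tilde\omega)\big], \qquad \alpha_t = E\big[r(\tilde\omega)^\top \pi^t(\tilde\omega)\big]. \qquad (2)$$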


A more detailed explanation of this cut is provided in ► Stochastic linear programming: Decomposition and cutting planes. It is worth noting at this point that one of the major handicaps associated with deterministic cutting plane algorithms for SLP arises from the calculations necessary in (2), which involve multidimensional integration. For problems involving a large number of outcomes, this operation becomes computationally burdensome, and one must then resort to sampling-based methods such as those discussed in the next section.
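To make (2) concrete, consider a hypothetical one-dimensional simple-recourse (newsvendor-type) instance: $n = 1$, $W = [1, -1]$, $g = (q^+, q^-)$, $r(\omega) = \omega$, $T(\omega) = 1$, with three scenarios. The second-stage dual is then solvable by inspection ($\pi = q^+$ if $\omega > x$, otherwise $\pi = -q^-$), so the sketch below computes $\alpha_t$ and $\beta_t$ without an LP solver and checks that the cut touches $c x + E[h(x, \tilde\omega)]$ at $x^t$ and lies below it elsewhere. All names and numbers are illustrative.

```python
from typing import List, Tuple

Scenario = Tuple[float, float]   # (omega_s, p_s)


def dual_multiplier(x: float, omega: float, q_plus: float, q_minus: float) -> float:
    """Optimal dual of h(x, omega) = min{q+ y+ + q- y- : y+ - y- = omega - x, y >= 0}."""
    return q_plus if omega > x else -q_minus


def objective(x: float, c: float, scen: List[Scenario],
              q_plus: float, q_minus: float) -> float:
    """c x + E[h(x, omega)] for the simple-recourse example."""
    return c * x + sum(p * (q_plus * max(w - x, 0.0) + q_minus * max(x - w, 0.0))
                       for w, p in scen)


def cut(x_t: float, c: float, scen: List[Scenario],
        q_plus: float, q_minus: float) -> Tuple[float, float]:
    """Cut coefficients (2) with r(omega) = omega and T(omega) = 1."""
    pis = [(w, p, dual_multiplier(x_t, w, q_plus, q_minus)) for w, p in scen]
    beta = c - sum(p * pi for _, p, pi in pis)        # c - E[T^T pi]
    alpha = sum(p * w * pi for w, p, pi in pis)       # E[r^T pi]
    return alpha, beta


scen = [(10.0, 0.25), (20.0, 0.5), (30.0, 0.25)]
alpha, beta = cut(15.0, 1.0, scen, q_plus=4.0, q_minus=0.5)
print(alpha, beta, alpha + beta * 15.0, objective(15.0, 1.0, scen, 4.0, 0.5))
# 68.75 -1.875 40.625 40.625
```

The expectation is the finite sum over scenarios given earlier, which is exactly the multidimensional integration that becomes burdensome when the number of outcomes is large.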

The iterates obtained by cutting plane methods may be generated as

$$x^{k+1} \in \arg\min\{f_k(x) : x \in X\}. \qquad (3)$$

We will designate these points as 'candidate' solutions. Since $f_k$ has the form shown in (1), problem (3) can be written as a linear programming problem. In the language of Benders' decomposition ([1]), this problem is called a master problem (program).
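For $n = 1$ and $X = [\ell, u]$, problem (3) can be solved without an LP code, which allows a compact illustration of the basic, unstabilized cutting plane method. The sketch below keeps every cut ($J_k = \{1, \ldots, k\}$); the function names and the test function $(x - 2)^2$ are hypothetical.

```python
from typing import Callable, List, Tuple

Cut = Tuple[float, float]   # (alpha_t, beta_t)


def master_argmin(cuts: List[Cut], lo: float, hi: float) -> float:
    """Minimize f_k(x) = max_t alpha_t + beta_t x over [lo, hi] (problem (3), n = 1).

    A convex piecewise linear function attains its minimum over an interval at
    an endpoint or at an intersection of two cuts, so checking those suffices.
    """
    cands = [lo, hi]
    for i, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[i + 1:]:
            if b1 != b2:
                x = (a2 - a1) / (b1 - b2)
                if lo <= x <= hi:
                    cands.append(x)
    return min(cands, key=lambda x: max(a + b * x for a, b in cuts))


def kelley(f: Callable[[float], float], df: Callable[[float], float],
           x1: float, lo: float, hi: float, iters: int = 30) -> Tuple[float, int]:
    """Basic cutting plane method that keeps every cut."""
    cuts: List[Cut] = []
    x, best = x1, x1
    for _ in range(iters):
        b = df(x)
        cuts.append((f(x) - b * x, b))      # alpha_k = f(x^k) - beta_k x^k
        if f(x) < f(best):                  # track the best candidate seen
            best = x
        x = master_argmin(cuts, lo, hi)     # candidate solution x^{k+1}
    return best, len(cuts)


best, n_cuts = kelley(lambda x: (x - 2.0) ** 2, lambda x: 2.0 * (x - 2.0), -5.0, -5.0, 5.0)
print(best, n_cuts)
```

The candidates jump around (here $5, 0, 2.5, 1.25, \ldots$) and the number of cuts grows by one per iteration, which illustrates both criticisms raised next.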

There are several criticisms that may be leveled against cutting plane methods. For example, in the earliest versions of these methods (including [11]) the index set $J_k$ was defined as $\{1, \ldots, k\}$; consequently, the size of the LP and its data structures would grow indefinitely. In order to curtail the growth of the LP, one may resort to dropping some of the inequalities. As one might expect, it is important to derive mathematically justifiable cut dropping rules, for otherwise the sequence generated by (3) may not converge to an optimal solution. In this article we summarize stabilization schemes that guarantee convergence while ensuring that there is a fixed upper limit on the size of the master program. Another criticism is that the sequence $\{f(x^k)\}$ need not be decreasing. As a result, it is advisable to monitor decreases in $f(x^k)$ and record points that suggest significant improvements. Such iterates will be designated as 'incumbent' solutions, so that the sequence of incumbent solutions is a subsequence of the sequence of candidate solutions.

We now proceed to a discussion of regularization, in which the piecewise linear approximation in (1) is augmented with a strictly convex casting function [15] or an auxiliary function [4]. The most commonly used casting function is a quadratic proximal term (see [12,13]). In iteration $k$, let $\bar x^k$ denote the incumbent solution. The regularized master program is defined as

$$x^{k+1} \in \arg\min\Big\{f_k(x) + \frac{\sigma}{2}\,|x - \bar x^k|^2 : x \in X\Big\},$$

where $\sigma > 0$ is a parameter that may be chosen during the course of the algorithm. Note that if we define

$$p_k(x; \sigma) = f_k(x) + \frac{\sigma}{2}\,|x - \bar x^k|^2,$$

then, since the gradient of the proximal term vanishes at $\bar x^k$,

$$\partial p_k(\bar x^k; \sigma) = \partial f_k(\bar x^k).$$

Hence, if $0 \in \partial p_k(\bar x^k; \sigma)$, then $0 \in \partial f_k(\bar x^k)$. Moreover, if $\partial f_k(\bar x^k) \subseteq \partial f(\bar x^k)$, then $0 \in \partial p_k(\bar x^k; \sigma)$ implies $0 \in \partial f(\bar x^k)$, the first order optimality condition at $\bar x^k$.


Regularization of Deterministic Cutting Plane 
Methods 


In the following we shall work with the understanding that $X$ is a convex polyhedral set represented in the form $\{x : Ax \le b\}$. For the remainder of the development, it will be convenient to rewrite the approximation in terms of the displacement $d$ from the incumbent solution $\bar x^k$. Let $x = \bar x^k + d$. In order to write the cuts as functions of $d$, we define $f_t^k = \alpha_t + \beta_t^\top \bar x^k$; that is, $f_t^k$ denotes the value of the cut $\alpha_t + \beta_t^\top x$ at the incumbent point $\bar x^k$. Hence the approximation in terms of the direction vector $d$ may be written as

$$v_k(d) = f_k(\bar x^k + d) = \max_{t \in J_k}\{f_t^k + \beta_t^\top d\}. \qquad (4)$$

The primal master problem (direction finding problem) can now be written as

$$\text{(PRM}^k\text{)} \qquad \begin{aligned} \min\ & v + \frac{\sigma}{2}\,|d|^2 \\ \text{s.t. } & v - \beta_t^\top d \ge f_t^k, \quad \forall t \in J_k,\\ & \bar x^k + d \in X. \end{aligned}$$


Since this section is devoted to a deterministic algorithm, we work with the assumption that

$$v_k(0) = f_k(\bar x^k) = f(\bar x^k). \qquad (5)$$


We now state a prototypical regularized deterministic cutting plane algorithm.

This algorithm has several important properties. First note from (4) and (5), and the definition of $d^k$ (a solution to (PRM$^k$), which is at least as good as the feasible point $d = 0$), that

$$v_k(d^k) = f_k(\bar x^k + d^k) \le f(\bar x^k) - \frac{\sigma}{2}\,|d^k|^2. \qquad (6)$$

Hence $\Delta_{k+1} \ge 0$, with strict inequality if $d^k \ne 0$. Consequently, the sequence of incumbents defined in Step 2 yields a descending sequence of objective values. Furthermore, the stopping rule is based on the observation that if $\Delta_{k+1} < \epsilon$, then $\sigma |d^k|^2 / 2 < \epsilon$. Thus, with an appropriate choice of $\epsilon$, the stopping rule tests whether the direction $d^k$ has a sufficiently small norm.
0 (Initialize)
0a (Problem parameters) A convex function $f : \mathbb{R}^n \to \mathbb{R}$ and a convex polyhedral set $X = \{x : Ax \le b\}$ are given. For SLP problems, $f(x) = c^\top x + E[h(x, \tilde\omega)]$.
0b (Algorithm parameters) $\sigma > 0$, $\gamma \in (0, 1)$ and $\epsilon > 0$ are given.
0c $x^1 \in X$ is given; $k \leftarrow 0$, $J_0 \leftarrow \emptyset$, $\bar x^0 \leftarrow x^1$, $\Delta_1 \leftarrow \infty$.

1 (Define/Update the piecewise linear approximation)
$k \leftarrow k + 1$. Evaluate $\beta_k \in \partial f(x^k)$ and $\alpha_k = f(x^k) - \beta_k^\top x^k$. (For SLP these quantities are calculated as in (2).) $J_k = \{t \in J_{k-1} : \theta_t^{k-1} > 0\} \cup \{k\}$.

2 (Update the incumbent)
IF $f(x^k) \le f(\bar x^{k-1}) - \gamma \Delta_k$, THEN $\bar x^k \leftarrow x^k$, ELSE $\bar x^k \leftarrow \bar x^{k-1}$.

3 (Solve the regularized master)
Let $x^{k+1} = \bar x^k + d^k$, where $d^k$ denotes an optimal solution to (PRM$^k$). Similarly, let $\theta^k$ denote the optimal Lagrange multipliers associated with the cuts indexed by $t \in J_k$.

4 (Stopping rule)
Let $\Delta_{k+1} = f(\bar x^k) - v_k(d^k)$. IF $\Delta_{k+1} < \epsilon$, THEN stop, ELSE repeat from Step 1.

A regularized deterministic cutting plane algorithm

In Step 4, $\Delta_{k+1}$ is defined using the objective function approximation $v_k(d^k)$. Thus, the decision to update the incumbent is based on how well $\Delta_{k+1}$ predicts the actual change in the objective value. For higher values of $\gamma$ an incumbent change is accepted only when the prediction is accurate, whereas smaller values of $\gamma$ yield a less stringent criterion.
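The algorithm box can be sketched in one dimension with $X = [\ell, u]$, where (PRM$^k$) is solved by enumerating candidate points instead of calling a QP solver. This is an illustration under stated assumptions: instead of computing the multipliers $\theta^k$, it keeps every cut that is active at $d^k$ (every cut with $\theta_t^k > 0$ is active, so this retains a superset of the cuts kept in Step 1), removes duplicate cuts, and uses hypothetical parameter values $\sigma = 1$, $\gamma = 0.5$, $\epsilon = 10^{-8}$.

```python
import math
from typing import Callable, List, Tuple

Cut = Tuple[float, float]   # (alpha_t, beta_t)


def solve_prm(cuts: List[Cut], xbar: float, sigma: float,
              lo: float, hi: float) -> Tuple[float, float, List[Cut]]:
    """Solve (PRM^k) for n = 1, X = [lo, hi]: min_d v_k(d) + sigma/2 d^2.

    The objective is convex, so its minimizer is a bound, a kink of v_k or a
    stationary point -beta_t/sigma of one piece; checking those points suffices.
    """
    F = [a + b * xbar for a, b in cuts]              # f_t^k = alpha_t + beta_t * xbar

    def model(d: float) -> float:                    # v_k(d), equation (4)
        return max(Ft + b * d for Ft, (_, b) in zip(F, cuts))

    dlo, dhi = lo - xbar, hi - xbar                  # keeps xbar + d inside X
    cands = [dlo, dhi] + [-b / sigma for _, b in cuts]
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            bi, bj = cuts[i][1], cuts[j][1]
            if bi != bj:
                cands.append((F[j] - F[i]) / (bi - bj))
    cands = [min(max(d, dlo), dhi) for d in cands]
    d = min(cands, key=lambda d: model(d) + 0.5 * sigma * d * d)
    v = model(d)
    active = [c for c, Ft in zip(cuts, F)
              if v - (Ft + c[1] * d) <= 1e-12 * (1.0 + abs(v))]
    return d, v, active


def regularized_cutting_plane(f: Callable[[float], float], df: Callable[[float], float],
                              x1: float, lo: float, hi: float, sigma: float = 1.0,
                              gamma: float = 0.5, eps: float = 1e-8,
                              max_iter: int = 100) -> Tuple[float, int]:
    x, xbar, delta = x1, x1, math.inf                # Step 0
    cuts: List[Cut] = []
    for k in range(1, max_iter + 1):
        b = df(x)                                    # Step 1: new cut at x^k
        cuts = list(dict.fromkeys(cuts + [(f(x) - b * x, b)]))
        if f(x) <= f(xbar) - gamma * delta:          # Step 2: incumbent test
            xbar = x
        d, v, active = solve_prm(cuts, xbar, sigma, lo, hi)   # Step 3
        delta = f(xbar) - v                          # Step 4: predicted decrease
        if delta < eps:
            return xbar, k
        cuts = active          # assumption: keep the cuts active at d^k
        x = xbar + d
    return xbar, max_iter


def f(x: float) -> float:
    return max(2.0 - x, 2.0 * x - 4.0)               # minimum 0 at x = 2


def df(x: float) -> float:
    return -1.0 if 2.0 - x >= 2.0 * x - 4.0 else 2.0


print(regularized_cutting_plane(f, df, 0.0, -10.0, 10.0))   # (2.0, 4)
```

In the last iteration the step $x^4 = 3$ fails the incumbent test, the incumbent stays at $\bar x^4 = 2$, and the new cut closes the model so that the predicted decrease $\Delta_5$ is zero.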

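The analysis of this algorithm relies heavily on a dual statement of the regularized master (PRM$^k$). In fact, [12] states the algorithm in terms of the dual of the master problem specified above. To write the dual, let $F_k$ denote the vector of scalars $\{f_t^k\}_{t \in J_k}$, and similarly, let $B_k$ denote a matrix whose rows are given by the cut coefficients $\{\beta_t^\top\}_{t \in J_k}$. Furthermore, let $b_k$ denote the vector $A \bar x^k - b$. Then the dual of (PRM$^k$) is

$$\text{(DRM}^k\text{)} \qquad \begin{aligned} \max\ & F_k^\top \theta + b_k^\top \lambda - \frac{1}{2\sigma}\,\big|B_k^\top \theta + A^\top \lambda\big|^2 \\ \text{s.t. } & \mathbf{1}^\top \theta = 1, \quad \theta \ge 0, \quad \lambda \ge 0. \end{aligned}$$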


One of the important relationships between the primal and dual optimal solutions for the pair (PRM$^k$) and (DRM$^k$) is

$$d^k = -\frac{1}{\sigma}\left(B_k^\top\theta^k + A^\top\lambda^k\right). \qquad (7)$$
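Relationship (7) can be checked by a short Lagrangian calculation. The derivation below assumes that (PRM$^k$), whose statement precedes this section, has the epigraph form shown in the first line. Under that assumption:

- Stationarity in $\eta$ gives the convexity constraint.
- Stationarity in $d$ gives (7).
- Substituting (7) back yields the objective of (DRM$^k$), with $F_k$ and $b_k$ as defined above.

```latex
\begin{aligned}
&\text{(PRM}^k\text{, assumed form)}\quad \min_{d,\eta}\ \eta + \tfrac{\sigma}{2}\|d\|^2
  \ \ \text{s.t.}\ \ \eta \ge \alpha_t + \beta_t^\top(\bar{x}^k + d)\ (t \in J_k),\ \ A(\bar{x}^k + d) \le b,\\
&L(d,\eta,\theta,\lambda) = \eta + \tfrac{\sigma}{2}\|d\|^2
  + \sum_{t \in J_k} \theta_t\bigl(\alpha_t + \beta_t^\top(\bar{x}^k + d) - \eta\bigr)
  + \lambda^\top\bigl(A(\bar{x}^k + d) - b\bigr),\\
&\partial L/\partial\eta = 1 - \mathbf{1}^\top\theta = 0 \ \Rightarrow\ \mathbf{1}^\top\theta = 1,\\
&\nabla_d L = \sigma d + B_k^\top\theta + A^\top\lambda = 0
  \ \Rightarrow\ d = -\tfrac{1}{\sigma}\bigl(B_k^\top\theta + A^\top\lambda\bigr) \quad (7),\\
&u := B_k^\top\theta + A^\top\lambda:\quad
  L = F_k^\top\theta + b_k^\top\lambda + \tfrac{1}{2\sigma}\|u\|^2 - \tfrac{1}{\sigma}\|u\|^2
    = F_k^\top\theta + b_k^\top\lambda - \tfrac{1}{2\sigma}\|u\|^2 .
\end{aligned}
```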


Finally, we note that the total number of cutting planes retained in Step 1 cannot exceed $n + 2$. To see this, note that since $\sigma > 0$, the primal and dual master problems are strictly convex programs, and consequently both have unique optimal solutions.

Suppose now that $d^k$ and $(\theta^k, \lambda^k)$ are the optimal solutions for the primal-dual pair. Clearly these solutions must satisfy (7). In combination with the convexity constraint in (DRM$^k$), this yields $n + 1$ linear equations involving the dual variables $(\theta, \lambda)$.

The optimal dual solution $(\theta^k, \lambda^k)$ must be an extreme point of the resulting polyhedron. If it were not, it could be written as a convex combination of two other points with the same dual objective value, which contradicts the uniqueness of the dual optimum. Hence $(\theta^k, \lambda^k)$ must be an extreme point of the polyhedron determined by (7) and the dual feasibility restrictions.

Since there are at most $n + 1$ equations, at most $n + 1$ components of $\theta^k$ can be positive. Thus, by including the new cut, we conclude that the cardinality of $J_k$ cannot exceed $n + 2$.

We should mention that similar benefits can also be realized in multicut methods. For instance, if we use the sum of $q$ piecewise linear approximations as above, then one needs to carry at most $n + q$ cuts from one iteration to the next. Accounting for the $q$ cuts generated in each iteration of a multicut method, the size of the master problem can be restricted to at most $n + 2q$ cuts. In applying this class of methods to SLP problems, the regularized decomposition algorithm [13] uses $q = S$, the total number of outcomes. Hence the resulting master problem requires $n + 2S$ cuts.


A Regularized Stochastic Decomposition 
Algorithm 


In contrast to the previous sections, this one is dedicated to SLP problems. For this class of problems, the calculation of multidimensional integrals in (2) creates a computational bottleneck for deterministic algorithms. In order to overcome this bottleneck, the Stochastic Decomposition (SD) algorithm embeds sampling within a cutting plane algorithm. In this section we discuss a regularized version of SD.

It is clear that the primary change in going from the deterministic method of the previous section to the stochastic method is the inclusion of sampling. Referring to the algorithmic statement of the previous section, we augment the statement of Step 1 with the inclusion of a sampled outcome, generated independently of previous observations.

Further differences between SD and other cutting plane algorithms arise from two sources:

- the manner in which the approximations are generated and updated (Step 1);
- the rule for accepting an incumbent (Step 2).

The process for creating the cutting plane coefficients is presented in the article "Stochastic linear programming: Decomposition and cutting planes". We do not reproduce those details here. Instead, we summarize the changes to Step 1 as follows:


1 (Define/Update the piecewise linear approximation)

$k \leftarrow k+1$. Generate $\omega^k$, an outcome of $\tilde{\omega}$. Evaluate cut coefficients $(\alpha_k^k, \beta_k^k)$ for a new cut, and update coefficients of previously generated cuts. Denote the updated cuts by $(\alpha_t^k, \beta_t^k)$. Update $J_k$.


Next we explain why $J_k$ is updated in a manner that is slightly different from that used in a deterministic algorithm. We also argue the need to update the incumbent (in Step 2) with a slightly different, though analogous, rule.

Unlike the regularized deterministic algorithm, which uses at most $n + 2$ cuts in the master problem, convergence results for the regularized SD method require $n + 3$ cuts in the master program. This is because of the inherent inaccuracy of the objective function estimates used in a sampling algorithm. Recall that Step 2 of the deterministic method uses the objective function values $f(x^k)$ and $f(\bar{x}^{k-1})$ to decide whether the incumbent needs to be updated. In a sample-based algorithm these quantities cannot be calculated, and the choice of an incumbent is necessarily based on sampled information.

In order to prove asymptotic results regarding the sequence of incumbent solutions, one needs asymptotic accuracy of a subgradient and of the function value at an accumulation point of the incumbent sequence. One way to accomplish this is to require that the cut associated with an incumbent becomes asymptotically accurate (with probability one). This property may be attained algorithmically by periodically updating the cut associated with the incumbent solution.

To clarify the process, suppose that at iteration $k$ we know that the most recent iteration at which the incumbent cut was generated (or updated) was $i_k < k$. Furthermore, suppose that we intend to update the incumbent cut every $\tau$ iterations ($\infty > \tau \ge 1$). Then, at an iteration in which $k = i_k + \tau$, the incumbent cut is updated. The update reflects the impact of outcomes, as well as of dual vertices, that may have been generated since iteration $i_k$. These updates guarantee that the sample size used for the incumbent cut grows indefinitely, as required by the law of large numbers.

As suggested by the above discussion, the rule for cut retention ought to maintain a cut associated with the incumbent solution. At iteration $k$, let $i_k$ denote the index associated with a cut generated at the incumbent solution. Then the rule used for cut retention is the following:

$$J_k = \{t \in J_{k-1} : \theta_t^{k-1} > 0\} \cup \{k, i_k\}.$$

Hence the maximum number of cuts used in the regularized master for SD is $n + 3$. With these changes included in the definition of the approximation $f_k$, we now provide the rule used for updating incumbents in Step 2.
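The retention rule and the refresh schedule of the incumbent cut amount to simple bookkeeping. The following Python sketch is illustrative only. The function names and the representation of the multipliers as a dictionary `theta` (cut index to $\theta_t^{k-1}$) are our own assumptions. The cut coefficients themselves and their sample-based updates are not modeled.

```python
def sd_retained_cuts(prev_cuts, theta, k, incumbent_cut):
    """Cut retention rule J_k = {t in J_{k-1} : theta_t > 0} union {k, i_k}.

    prev_cuts:     indices in J_{k-1}
    theta:         dict t -> optimal multiplier theta_t of the last master
    k:             index of the cut generated in the current iteration
    incumbent_cut: index i_k of the cut associated with the incumbent
    """
    kept = {t for t in prev_cuts if theta.get(t, 0.0) > 0.0}
    return kept | {k, incumbent_cut}


def incumbent_cut_due(k, last_update, tau):
    """True if the incumbent cut, last generated or updated at iteration
    last_update (i_k in the text), must be refreshed at iteration k."""
    if tau < 1:
        raise ValueError("tau must be at least 1")
    return k == last_update + tau


# Usage with hypothetical multipliers: only cuts 2 and 3 are binding.
J = sd_retained_cuts({1, 2, 3, 4}, {2: 0.3, 3: 0.7}, k=5, incumbent_cut=1)
print(sorted(J), incumbent_cut_due(8, 5, 3))
```

Since at most $n + 1$ multipliers can be positive, adding the new cut $k$ and the incumbent cut $i_k$ gives the bound of $n + 3$ cuts.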


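2 (Update the incumbent)

IF $k = 1$, THEN put $\Delta_k = \infty$, ELSE $\Delta_k = f_{k-1}(\bar{x}^{k-1}) - v_{k-1}(d^{k-1})$.

IF $f_k(x^k) < f_k(\bar{x}^{k-1}) - \gamma \Delta_k$, THEN $\bar{x}^k = x^k$, ELSE $\bar{x}^k = \bar{x}^{k-1}$.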
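For illustration, the incumbent test can be written as a small Python function. It assumes the rule has the following form:

- $\Delta_k = \infty$ for $k = 1$, and $\Delta_k = f_{k-1}(\bar{x}^{k-1}) - v_{k-1}(d^{k-1})$ otherwise.
- $x^k$ becomes the incumbent iff $f_k(x^k) < f_k(\bar{x}^{k-1}) - \gamma\Delta_k$.

The sampled approximations and the master objective are passed in as hypothetical callables.

```python
import math


def sd_update_incumbent(k, x_k, xbar_prev, f_k, f_prev, v_prev, d_prev, gamma):
    """Incumbent test of regularized SD (assumed form, see the lead-in).

    f_k, f_prev: callables for the sampled approximations f_k and f_{k-1}
    v_prev:      callable for the objective v_{k-1} of the previous master
    Returns (incumbent xbar^k, Delta_k).
    """
    delta = math.inf if k == 1 else f_prev(xbar_prev) - v_prev(d_prev)
    if f_k(x_k) < f_k(xbar_prev) - gamma * delta:
        return x_k, delta           # sampled decrease is large enough: accept x^k
    return xbar_prev, delta         # otherwise keep the previous incumbent


# Usage with toy (hypothetical) approximations.
fk = lambda x: abs(x)
print(sd_update_incumbent(2, 0.1, 1.0, fk, fk, lambda d: 0.5, -0.9, gamma=0.5))
```

The only difference from the deterministic Step 2 is that the true objective $f$ is replaced by the sampled approximations $f_k$ and $f_{k-1}$.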


Finally, there is one additional issue that arises whenever one uses sampled information. In some cases, as in SD, sampling is incorporated within the decomposition algorithm. In other cases, sampling is undertaken prior to using the optimization algorithm. In either case, since sampled data are used, one should explore the possibility of error. We refer the reader to two articles in this area. The first of these ([5]) is based




Stabilization of Cutting Plane Algorithms for Stochastic Linear Programming Problems 


on designing stopping rules that work with ‘in-sample’ 
information, and is tailored for the SD algorithm. An- 
other approach, using ‘out-of-sample’ scenarios is pro- 
vided in [8]. 


Conclusions 


We conclude this article with two examples taken from [9]. The illustrative application for these examples deals with an electricity reserve planning problem that finds an optimal trade-off between the cost of reserve capacity and the cost of unmet demand [9].

Fig. 1 illustrates two trajectories: the solid line is the trajectory of incumbent solutions, and the dotted line is the trajectory of candidate solutions. In order to isolate the impact of using an incumbent solution from the impact of the quadratic term, we used only the LP master in generating Fig. 1. The solid line of incumbent solutions clearly provides a far more stable trajectory.

Next we discuss the impact of the quadratic term with reference to the same reserve planning example. In order to isolate the impact of the quadratic term, we examine the candidate sequences from two implementations: one in which the weight $\sigma$ on the quadratic term is very small, and another in which $\sigma = 1$. The candidate solutions from the former implementation are depicted by dotted lines in Fig. 2, whereas the candidates from the latter implementation are shown by the solid line. Once again, the solid line, representing the impact of regularization, exhibits a more stable trajectory than the dotted line.

Before we close the article, we should mention one acceleration technique that often helps speed up regularized algorithms in practice. The idea is to change the magnitude of $\sigma$ dynamically:

- $\sigma$ is reduced when the incumbent changes and $\|x^k - \bar{x}^{k-1}\|$ increases;
- $\sigma$ is increased when the incumbent does not change.

This allows the algorithm to take smaller steps (higher $\sigma$) when the approximation seems to be poor, whereas the steps are allowed to be more 'daring' (lower $\sigma$) when progress is rapid.
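The $\sigma$-adjustment heuristic described above can be sketched as a single update function. The multiplicative factors `shrink` and `grow`, the clipping bounds, and the comparison of the current step length $\|x^k - \bar{x}^{k-1}\|$ with the previous one are illustrative assumptions. They are not values given in the text.

```python
def update_sigma(sigma, incumbent_changed, step, prev_step,
                 shrink=0.5, grow=2.0, sigma_min=1e-6, sigma_max=1e6):
    """Heuristic update of the regularization weight sigma.

    step, prev_step: ||x^k - xbar^{k-1}|| at this and at the previous iteration.
    shrink, grow and the clipping bounds are illustrative values only.
    """
    if incumbent_changed and step > prev_step:
        sigma *= shrink    # rapid progress: allow more 'daring' (longer) steps
    elif not incumbent_changed:
        sigma *= grow      # approximation seems poor: take smaller steps
    return min(max(sigma, sigma_min), sigma_max)


# Usage: the incumbent changed and the step grew, so sigma is halved.
print(update_sigma(1.0, incumbent_changed=True, step=2.0, prev_step=1.0))
```

Clipping keeps $\sigma$ strictly positive, which the strict convexity argument for the master problem requires.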
Abstract


In this article, we provide an overview of how the maximum weighted stable set problem can be solved exactly with Branch & Cut techniques. In addition, we provide selected references to other exact methods. We start with a brief introduction to the stable set problem and a few basic definitions, assuming that the reader is already familiar with the basic concepts.

The main emphasis of this article lies in two parts: the review of polyhedral results for the stable set polytope in Sect. "Stable Set Polytope", and the discussion of separation procedures in Sect. "Separation". In addition to strong separation routines, an efficient Branch & Cut algorithm needs a good branching strategy. This is discussed in Sect. "Branching". At the end, some implementation aspects are considered.


Keywords 


Stable set; Independent set; Maximum clique; Vertex 
packing; Branch & Cut; Separation; Exact method 


Introduction 


Let $G = (V, E)$ be an undirected graph consisting of a nonempty finite set $V$, the node set, and a finite set $E$, the edge set, of unordered pairs of distinct elements of $V$. A stable set of graph $G$ is defined as a set of nodes $S$ with the property that the nodes of $S$ are pairwise nonadjacent; two nodes are called adjacent if there is an edge in $E$ connecting them. In the literature, a stable set is also called an independent set, vertex packing, co-clique or anticlique.

If each node $v_i$ of a graph $G$ is assigned a weight $c_i$, then the graph is called weighted. In this case, the maximum weighted stable set problem looks for a stable set $S$ which maximizes the sum of the weights of the nodes in $S$, $\sum_{v_i \in S} c_i$. When $G$ is not weighted, or all $c_i = 1$, we are interested in a stable set with the maximum number of nodes, which is called a maximum cardinality stable set. The size of a maximum cardinality stable set is called the stability number of graph $G$ and is denoted by $\alpha(G)$.

Throughout this article, references to the maximum stable set problem, or just the stable set problem, refer to the weighted case unless otherwise noted.

Owing to its general definition, the stable set problem has many applications in various fields [15]. In particular, whenever "conflicts" between some objects occur, this indicates that the stable set problem is applicable. Next to the Traveling Salesman Problem, it is one of the most important combinatorial optimization problems.

It is well known that it is NP-hard to determine a maximum stable set in an arbitrary graph [37]; this holds even for the cardinality case. Furthermore, the stability number is also hard to approximate. It can be shown that, for any fixed $\varepsilon > 0$, there is no polynomial time algorithm that approximates the stability number within a factor of $|V|^{1-\varepsilon}$, under the assumption that $\mathrm{P} \ne \mathrm{NP}$ [3,36].

Let us briefly and informally introduce some polyhedral terminology. In our case it is sufficient to define a polyhedron as the solution set of a system of linear inequalities. If the solution set is bounded, it is called a polytope. Geometrically speaking, a polytope in $\mathbb{R}^n$ is full-dimensional if it contains an $n$-dimensional ball. In two dimensions, a full-dimensional polytope therefore cannot be empty, a single point, or a line segment.

A linear inequality $b^\top x \le b_0$ is valid with respect to a polyhedron $P$ if $P \subseteq \{x \mid b^\top x \le b_0\}$. We call a set $F \subseteq P$ a facet of $P$ if two conditions hold: there is a valid inequality $b^\top x \le b_0$ for $P$ such that $F = \{x \in P \mid b^\top x = b_0\}$, and this inequality is not dominated by any other valid inequality. This inequality is called a facet-defining inequality for $P$. If, for some valid inequality, the set $\{x \in P \mid b^\top x = b_0\}$ consists of a single point $v$, we call $v$ a vertex of $P$.

Now, let $P$ be a polytope and $x^*$ a given point. The separation problem for the polytope $P$ is the task of deciding whether this point lies in $P$ and, if not, finding a valid inequality $b^\top x \le b_0$ for $P$ which is violated by $x^*$.

The convex hull of points $y_1, \dots, y_n \in \mathbb{R}^d$ is the set of points $x$ satisfying $x = \sum_{i=1}^n \lambda_i y_i$ with $\sum_{i=1}^n \lambda_i = 1$ and $\lambda_i \ge 0$ for all $i$. It is denoted by $\operatorname{conv}\{y_1, \dots, y_n\}$. More details can be found, for instance, in [20,42,82].

In addition to the definitions above, we now introduce several graph theoretic definitions and notations needed throughout this article.

A node $v$ is incident to an edge $e$ if $e = uv$ for some node $u$. The two nodes incident to an edge are its endnodes. A node is isolated if it has no neighbor in the graph, which means that it is not an endnode of any edge of the graph. The neighborhood of a node $v$ is the collection of all its neighbors and is denoted by $\Gamma(v)$. A graph is connected if every pair of its nodes is joined by a path.

A graph is said to be complete if it contains an edge connecting each pair of its nodes. A clique is the node set of a complete subgraph. If a clique has three nodes, it is also called a triangle. The cardinality of a graph $G$ is denoted by $|G|$ and is the number of nodes in the graph. The complement graph $\bar{G}$ of the graph $G$ has the same node set as $G$ and contains an edge between two nodes iff these nodes are not adjacent in $G$.

We call a graph $G$ bipartite if its node set $V$ can be partitioned into two disjoint sets $V_1$, $V_2$ with $V = V_1 \cup V_2$ such that neither two nodes of $V_1$ nor two nodes of $V_2$ are neighbors. We call $H = (W, F)$ a subgraph of $G$, and write $H \subseteq G$, when $W \subseteq V$ and $F \subseteq E$ is the set of edges of $G$ with both endnodes in $W$. Two graphs $G = (V, E)$ and $H = (W, F)$ are called isomorphic if there is a bijection $\phi: V \to W$ such that $uv \in E \Leftrightarrow \phi(u)\phi(v) \in F$.

A matching is a collection of pairwise disjoint edges. If in a matching $M$ every node of $G$ is incident with exactly one edge of $M$, then it is a perfect matching. More about graph theory can be found in [33,78].

We do not describe the Branch & Cut algorithm in 
general here, as we assume the reader is familiar with 
its basic ideas. For more information we refer to [40, 
46,61,79]. 

Before we discuss some aspects of a Branch & Cut 
algorithm to solve the maximum stable set problem, 
we provide a list of some other exact solution meth- 
ods. Clearly, this list does not claim to be complete. 
A more detailed list of exact methods can be found 
in [15,22]. In this context, we want to point out that 
the stable set problem is equivalent to the maximum 
clique problem in the complement graph. Hence, each 
method solving the maximum clique problem can also 
be used to solve the stable set problem. For polynomial
time algorithms for some special classes of graphs,
see [2,6,8,14,17,32,48,54,55,59,63] and Sect. “Stable Set 
Polytope”. Algorithms finding all maximum stable sets 
in a graph are considered in [19,29,44,49,73]. In the lit- 
erature, many variants of Branch & Bound algorithms 
have been discussed, [4,12,35,51,60,65,71,76,80]. Other 
methods using, for instance, continuous formulations, 
column generation or constraint programming can be 
found in [5,16,18,21,25,53,64,68,75,77]. 

Benchmark instances are provided by the second 
DIMACS Challenge, [34], from 1992/1993 and by the 
BHOS library from 2000, [13]. Note that some of these 
stable set instances are still unsolved. A test case gener- 
ator was introduced by Hasselberg et al. [45]. 


Method 


Let us now formulate the maximum stable set problem as an integer linear program. One choice is to introduce a variable $x_i$ for each node $v_i \in V$, which has value one if node $v_i$ is in a stable set, say $S$, and zero otherwise. Such a vector is called an incidence vector. Obviously, for each edge, at most one endnode can be in a stable set, and hence we get the so-called edge inequalities

$$x_i + x_j \le 1 \quad \forall\, ij \in E. \qquad (1)$$

It is easy to see that if the vector $x$ has a binary domain, each vector satisfying inequalities (1) induces a stable set and vice versa. Hence, if $c$ denotes the (positive) weight vector of the nodes, one gets the following integer program:

$$\begin{aligned} \max\;& c^\top x \\ \text{s.t.}\;& x_i + x_j \le 1 \quad \forall\, ij \in E, \\ & x \in \{0,1\}^{|V|}. \end{aligned} \qquad (2)$$

This program solves the maximum weighted stable set problem. Note that the formulation has only $|E|$ constraints and $|V|$ variables, so it is quite compact. Unfortunately, the binary constraints on $x$ make this integer program hard to solve. We will discuss some relaxations of this problem in the next section, which is mainly based on [9,39,42,58,70].
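For very small graphs, the integer program (2) can be solved by brute force, which is a convenient way to check the formulation. The Python sketch below works as follows:

- It enumerates all $x \in \{0,1\}^{|V|}$, which takes exponential time, so it is only a didactic tool and not a practical method.
- It keeps the best vector that satisfies the edge inequalities (1).
- Nodes are numbered $0, \dots, |V|-1$ rather than $v_1, \dots, v_{|V|}$.

```python
from itertools import product


def max_weight_stable_set(n, edges, c):
    """Solve (2) by enumerating all x in {0,1}^n (tiny graphs only)."""
    best_val, best_x = float("-inf"), None
    for x in product((0, 1), repeat=n):
        if all(x[i] + x[j] <= 1 for i, j in edges):      # edge inequalities (1)
            val = sum(ci * xi for ci, xi in zip(c, x))   # objective c^T x
            if val > best_val:
                best_val, best_x = val, x
    return best_val, best_x


# Usage: on the path 0-1-2 with weights (1, 3, 1), the middle node alone is best.
print(max_weight_stable_set(3, [(0, 1), (1, 2)], [1, 3, 1]))
```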


Stable Set Polytope 


The stable set polytope of graph $G = (V, E)$ is defined as the convex hull of the incidence vectors of all stable sets in $G$. It is denoted by

$$P_{STAB}(G) := \operatorname{conv}\{\chi^S \mid S \subseteq V \text{ stable set}\},$$

where $\chi^S$ is the incidence vector of set $S$. From the integer program formulation (2) we see that $P_{STAB}(G)$ is a polyhedron. As it is bounded by the $|V|$-dimensional unit cube, it is indeed a polytope.

The definition of a stable set implies that the unit vectors are always incidence vectors of stable sets, namely of single nodes. Trivially, the zero vector is also the incidence vector of a stable set, the empty set. Therefore, the stable set polytope is full-dimensional. This implies that $P_{STAB}(G)$ can be described by facet-defining inequalities alone, and hence we do not have to consider equations [58,67].

Let us now discuss some relaxations of the integer program formulation (2), which will also give us relaxations of the stable set polytope. The obvious idea is to relax the binary condition on $x$ and make the variables continuous instead, which leads to

$$0 \le x_i \le 1 \quad \forall\, v_i \in V. \qquad (3)$$

The integer problem reduces to a linear program, which can be solved in polynomial time. This relaxation leads to the so-called stable set polytope relaxation

$$P_{FSTAB}(G) := \{x \in \mathbb{R}^{|V|} \mid x_i + x_j \le 1 \ \forall\, ij \in E,\ 0 \le x \le 1\}. \qquad (4)$$


From its construction, we get that $P_{STAB}(G) \subseteq P_{FSTAB}(G)$. For a complete graph with cardinality $\ge 3$, the vector $x$ with value $1/2$ in each component is a vertex of $P_{FSTAB}(G)$. However, it cannot be contained in $P_{STAB}(G)$. A maximum cardinality stable set in a complete graph has cardinality one, so every point of $P_{STAB}(G)$ satisfies $\sum_i x_i \le 1$, whereas this vector has $\sum_i x_i = |V|/2 > 1$.

This example shows that the relaxation above is very weak. Note that the vector whose entries are all $1/2$ is always contained in $P_{FSTAB}(G)$, independent of the structure of the graph $G$. The following corollary generalizes this observation. It was first indicated by Balinski [11].

Corollary 1 The vertices of $P_{FSTAB}(G)$ are $\{0, \frac{1}{2}, 1\}$-valued.
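The weakness of relaxation (4) on complete graphs can be checked with exact rational arithmetic. The sketch below uses $K_4$ as an example of our own choosing. The all-$1/2$ vector satisfies all constraints of (4), yet its component sum is 2, while every point of $P_{STAB}(K_4)$ has component sum at most $\alpha(K_4) = 1$.

```python
from fractions import Fraction
from itertools import combinations


def in_fractional_stab(x, edges):
    """Membership test for the stable set polytope relaxation (4)."""
    return (all(0 <= xi <= 1 for xi in x)
            and all(x[i] + x[j] <= 1 for i, j in edges))


n = 4
edges = list(combinations(range(n), 2))   # complete graph K_4
half = [Fraction(1, 2)] * n
print(in_fractional_stab(half, edges), sum(half))   # True 2
```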


We saw that for a complete graph $G$ with cardinality $\ge 3$, $P_{STAB}(G) \subsetneq P_{FSTAB}(G)$. The next theorem shows that the non-negativity and edge inequalities suffice to describe the stable set polytope only for bipartite graphs without isolated nodes. For bipartite graphs, the stable set polytope and the stable set polytope relaxation are equal, that is, $P_{STAB}(G) = P_{FSTAB}(G)$.

Theorem 2 [42] The non-negativity inequalities $x_i \ge 0$ $\forall v_i \in V$, together with the edge inequalities (1), are sufficient to describe $P_{STAB}(G)$ iff $G$ is bipartite and has no isolated nodes.


Theorem 2 has the following important implication: the maximum stable set problem for bipartite graphs can be solved in polynomial time by solving the stable set problem over (4). As a consequence, a Branch & Cut algorithm using the stable set polytope relaxation (4) terminates for bipartite graphs in the root node of the branching tree after solving one linear program.

However, we have already indicated that the stable set polytope relaxation is very weak and hence not a good choice in a Branch & Cut framework for general graphs. Whether a graph is bipartite can be checked in linear time. Exact polynomial time algorithms for bipartite graphs can be found in [42,47].
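A standard way to test bipartiteness in linear time is breadth-first search with 2-coloring. The sketch below is one such implementation and is not taken from [42,47]. It returns False as soon as an edge joins two nodes of the same color, which certifies that the graph contains an odd cycle.

```python
from collections import deque


def is_bipartite(n, edges):
    """O(|V| + |E|) bipartiteness test by BFS 2-coloring (nodes 0..n-1)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    color = [-1] * n
    for s in range(n):                 # handle every connected component
        if color[s] != -1:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if color[w] == -1:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False       # edge inside one color class: odd cycle exists
    return True


print(is_bipartite(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))           # even cycle
print(is_bipartite(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))   # odd cycle
```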

As bipartite graphs form a very restricted class, we want to find ways to strengthen the stable set polytope relaxation. One idea is to add further valid inequalities to $P_{FSTAB}(G)$. To this end, let us consider Fig. 1. It shows a graph with the five nodes $v_1, v_2, \dots, v_5$. Such a graph is called an odd cycle.

In general, we call a (sub)graph $H$ an odd cycle if it has an odd number of nodes, say $n$, and its edge set contains $n$ edges forming a closed path through all these nodes, so that each node is incident to exactly two of these edges. Notice that an odd cycle can have more than $n$ edges; in this case, any additional edge is called a chord.

For the graph of Fig. 1, the stable set polytope relaxation allows the fractional solution with all entries equal to $1/2$, as illustrated. This solution is optimal. For the cardinality stable set problem, its objective function value is $5/2$, which is greater than the cardinality two of a maximum stable set. Now, summing up the edge inequalities corresponding to the five edges in the graph, one gets


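$$2x_1 + 2x_2 + 2x_3 + 2x_4 + 2x_5 \le 5.$$

In this case each node is incident to exactly two edges, giving the coefficients of the variables, and there are five edge inequalities, providing the right-hand side. This inequality can be divided by two. As all variables are binary, the left-hand side is an integer, so the right-hand side can be rounded down, giving

$$x_1 + x_2 + x_3 + x_4 + x_5 \le \left\lfloor \tfrac{5}{2} \right\rfloor = 2.$$

This inequality can be generalized to the so-called odd-cycle inequalities

$$\sum_{v_i \in V(C)} x_i \le \frac{|C| - 1}{2} \quad \text{for each odd cycle } C \subseteq G, \qquad (5)$$

where $V(C)$ denotes the node set of $C$ and $|C|$ its number of nodes.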


From the construction above, it is obvious that the odd-cycle inequalities are valid for the stable set polytope. If, in addition to (1) and (3), all odd-cycle inequalities of $G$ are imposed, we obtain the so-called cycle-constrained stable set polytope, which is denoted by

$$P_{CSTAB}(G) := \{x \in \mathbb{R}^{|V|} \mid x \text{ satisfies (1), (3) and (5)}\}.$$
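The 5-cycle example can be verified numerically. The sketch below renumbers the nodes of Fig. 1 from 0. It confirms that the all-$1/2$ point satisfies every edge inequality (1) but violates the odd-cycle inequality $x_1 + \dots + x_5 \le 2$, so it is cut off by (5).

```python
from fractions import Fraction

# The 5-cycle of Fig. 1, with nodes v_1..v_5 renumbered 0..4.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
x = [Fraction(1, 2)] * 5            # the fractional point discussed above

edge_ok = all(x[i] + x[j] <= 1 for i, j in cycle)   # edge inequalities (1)
lhs = sum(x)                                         # left-hand side of (5)
rhs = (len(x) - 1) // 2                              # (|C| - 1) / 2 for odd |C|
print(edge_ok, lhs, rhs, lhs <= rhs)                 # True 5/2 2 False
```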


If you consider, again, a complete graph, you see that there is no constant which relates the optimal solution over $P_{CSTAB}(G)$ to an optimal stable set. For example, the vector with all entries $1/3$ satisfies (1), (3) and (5) for $K_n$ and has objective value $n/3$, whereas $\alpha(K_n) = 1$.

Graphs for which $P_{CSTAB}(G) = P_{STAB}(G)$ are called t-perfect; the "t" stands for "trou", the French word for "hole". Two examples of t-perfect graphs are bipartite graphs and almost bipartite graphs. A graph $G$ is called almost bipartite if there is a node $v$ such that $G$ without $v$ is bipartite. The problem of checking whether a graph is t-perfect belongs to co-NP. The special structure of t-perfect graphs helps to find a maximum stable set, as stated by the next corollary.

Corollary 3 The maximum stable set problem in a t-perfect graph can be solved in polynomial time.
We will see in Sect. "Separation" that the odd-cycle inequalities can be separated in polynomial time. Together with the equivalence of optimization and separation, this proves Corollary 3. Polynomial time algorithms for the class of t-perfect graphs can be found in [42,43].

We are mainly interested in facets of $P_{STAB}(G)$, since they are not dominated by any other valid inequality of $P_{STAB}(G)$. The odd-cycle inequalities can only be facet-defining if their odd cycles are chordless. If there is a chord, it splits the cycle into a smaller odd cycle and an even cycle. The smaller odd-cycle inequality, together with the edge inequalities of the even part, dominates the original odd-cycle inequality, which shows that the latter cannot be facet-defining.

A graph which is a chordless cycle is called a hole. If an odd cycle induces an odd hole, the corresponding odd-cycle inequality is called an odd-hole inequality. Consider the following

Corollary 4 [57] Let $G = (V, E)$ be an odd hole. Then

$$\sum_{v_i \in V} x_i \le \frac{|V| - 1}{2}$$

is facet-defining for $P_{STAB}(G)$.


A counterpart of the odd-hole inequalities are the antihole inequalities. They are valid for antiholes, which are the complement graphs of odd holes with at least five nodes (see Fig. 2).

Each node of an antihole with $n$ nodes is adjacent to exactly $n - 3$ nodes, namely all nodes except itself and its two neighbors on the hole. Hence two nodes are nonadjacent in the antihole only if they are consecutive on the hole, and each stable set of the antihole contains at most two nodes. Therefore, the following inequalities hold:

$$\sum_{v_i \in V(A)} x_i \le 2 \quad \text{for each antihole } A \subseteq G. \qquad (6)$$

Note that an antihole with 5 nodes is isomorphic to an odd hole with 5 nodes. It is not known whether the separation problem for the antihole inequalities belongs to P.
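The bound of two in (6) can be confirmed by brute force on a small instance. The sketch below builds the antihole on 7 nodes as the complement of the hole $C_7$ (our example) and computes its stability number by enumeration. It also shows that every node has degree $n - 3 = 4$.

```python
from itertools import combinations


def antihole(n):
    """Edges of the complement of the hole C_n with nodes 0..n-1."""
    hole = {frozenset((i, (i + 1) % n)) for i in range(n)}
    return [(i, j) for i, j in combinations(range(n), 2)
            if frozenset((i, j)) not in hole]


def stability_number(n, edges):
    """alpha(G) by brute force, trying the largest sizes first (tiny graphs only)."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for S in combinations(range(n), size):
            if all(frozenset(p) not in adjacent for p in combinations(S, 2)):
                return size
    return 0


A = antihole(7)
degrees = [sum(v in e for e in A) for v in range(7)]
print(stability_number(7, A), sorted(set(degrees)))   # 2 [4]
```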

Another group of inequalities for the stable set polytope are the clique inequalities. From Fig. 3 we get

$$\sum_{v_i \in Q} x_i \le 1 \quad \text{for each clique } Q. \qquad (7)$$

In 1979, Padberg showed the following important 


Theorem 5 [62] Let $G$ be a graph with node set $V$ and $Q \subseteq V$ a clique. Inequality (7) is valid for $P_{STAB}(G)$. The inequality $\sum_{v_j \in Q} x_j \le 1$ is a facet of $P_{STAB}(G)$ iff $Q$ is a maximal clique in $G$.


Theorem 5 shows that an edge inequality (1) is facet-defining for $P_{STAB}(G)$ only if the corresponding edge forms a maximal clique. Otherwise, it is dominated by a clique inequality. We will use this observation later in Sect. "Implementation". Note that for triangles, the clique inequality and the odd-cycle inequality coincide. We define the so-called clique-constrained stable set polytope as

$$P_{QSTAB}(G) := \{x \in \mathbb{R}^{|V|} \mid x \text{ satisfies (1), (3) and (7)}\}.$$

A graph $G$ is called perfect if $P_{QSTAB}(G) = P_{STAB}(G)$. Originally, in 1961, Berge called a graph perfect if, for each of its induced subgraphs, the chromatic number equals the clique number. This definition is equivalent to the polyhedral one given above.

The maximum clique problem and the stable set problem are very closely related. Therefore, it is not surprising that it is NP-hard to separate the clique inequalities in an arbitrary graph. Given this fact, it is quite remarkable that the following theorem holds.


Theorem 6 [42] The maximum stable set problem for 
perfect graphs can be solved in polynomial time. 


We do not go into the details of the proof here, but we nevertheless give a rough explanation. It is possible to generalize the clique inequalities to a class of so-called orthonormal representation constraints, which are polynomially separable. The convex set of all vectors satisfying them and the non-negativity inequalities forms the so-called theta body [50,81]. In the case of perfect graphs, this theta body is a polytope which equals $P_{STAB}(G)$. This implies Theorem 6. However, in general it is even NP-hard to determine an optimal solution over $P_{QSTAB}(G)$. More about perfect graphs can be found, for instance, in [30,31,66].

Stable Set Problem: Branch & Cut Algorithms, Figure 4
Odd wheel

If we consider Fig. 4, we recognize an odd hole with cardinality seven together with an additional node that is adjacent to all other nodes. Such a graph is called a wheel, and the additional node $u$ is its hub. If the hub $u$ is contained in a stable set, no other node of the wheel can be contained in it. Hence, we get the following odd-wheel inequalities:

$$\sum_{v_i \in V(C)} x_i + \frac{|C| - 1}{2}\, x_u \le \frac{|C| - 1}{2} \quad \text{for each odd wheel in } G \text{ with odd cycle } C \text{ and hub } u. \qquad (8)$$

From its construction, inequality (8) is valid for $P_{STAB}(G)$. It defines a facet if $G$ is isomorphic to an odd wheel. Note that the wheel inequality dominates the odd-hole inequality. Generalizations of the wheel inequalities can be found in [27].
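Validity and tightness of (8) for the 7-node-rim wheel of Fig. 4 can be checked by enumerating all stable sets. In this sketch, rim nodes are 0 to 6 and the hub is node 7 (our numbering). The maximum left-hand side of (8) over all stable sets equals the right-hand side $(|C|-1)/2 = 3$.

```python
from itertools import combinations


def odd_wheel(k):
    """Edges of a wheel with odd rim 0..k-1 (a k-cycle) and hub k."""
    rim = [(i, (i + 1) % k) for i in range(k)]
    spokes = [(i, k) for i in range(k)]
    return rim + spokes


def stable_sets(n, edges):
    """Yield all stable sets of a graph on nodes 0..n-1 (brute force)."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            if all(frozenset(p) not in adjacent for p in combinations(S, 2)):
                yield set(S)


k, hub = 7, 7
rhs = (k - 1) // 2                     # (|C| - 1) / 2
lhs = [len(S - {hub}) + rhs * (hub in S) for S in stable_sets(k + 1, odd_wheel(k))]
print(max(lhs), rhs)                   # 3 3: inequality (8) is valid and tight
```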

Another class of inequalities are the web and antiweb inequalities. Let $p$ and $q$ be integers satisfying $p \ge 2q + 1$ and $q \ge 1$. A graph $G$ is called a web if $G$ is isomorphic to the graph consisting of the nodes $\{v_1, \dots, v_p\}$ with an edge $v_iv_j$ iff $q \le |i - j| \le p - q$. A web is abbreviated by $W(p, q)$. A graph is called an antiweb, denoted by $AW(p, q)$, iff its complement $\overline{AW(p, q)}$ is isomorphic to $W(p, q)$. Examples can be seen in Figs. 5 and 6, respectively.

Stable Set Problem: Branch & Cut Algorithms, Figure 5
(7,3)-web

Stable Set Problem: Branch & Cut Algorithms, Figure 6
(7,3)-antiweb

The following inequalities are called web inequalities and antiweb inequalities, respectively:

$$\sum_{v_i \in W(p,q)} x_i \le q, \qquad (9)$$

$$\sum_{v_i \in AW(p,q)} x_i \le \left\lfloor \frac{p}{q} \right\rfloor. \qquad (10)$$

Both types of inequalities are valid for $P_{STAB}(G)$.

- The web inequalities (9) define facets if $G = W(p, q)$ and $p$ and $q$ are relatively prime. Two natural numbers are called relatively prime if their greatest common divisor is 1, in formula $\gcd(p, q) = 1$.
- The antiweb inequalities (10) are facet-defining for $P_{STAB}(AW(p, q))$ if there is no $k \in \mathbb{N}$ with $p = k \cdot q$, that is, if $q$ does not divide $p$.

More details can be found, for instance, in [28,74].
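Assuming the web definition given above (an edge $v_iv_j$ iff $q \le |i-j| \le p-q$), the right-hand sides of (9) and (10) for the $(7,3)$ examples of Figs. 5 and 6 can be checked by brute force. The check confirms $\alpha(W(7,3)) = 3 = q$ and $\alpha(AW(7,3)) = 2 = \lfloor 7/3 \rfloor$. Nodes are numbered from 0 in this sketch.

```python
from itertools import combinations


def web(p, q):
    """W(p, q): edge between nodes i < j iff q <= j - i <= p - q."""
    return [(i, j) for i, j in combinations(range(p), 2) if q <= j - i <= p - q]


def antiweb(p, q):
    """AW(p, q): the complement of W(p, q)."""
    w = set(web(p, q))
    return [e for e in combinations(range(p), 2) if e not in w]


def stability_number(n, edges):
    """alpha(G) by brute force, trying the largest sizes first (tiny graphs only)."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for S in combinations(range(n), size):
            if all(frozenset(pair) not in adjacent for pair in combinations(S, 2)):
                return size
    return 0


print(stability_number(7, web(7, 3)), stability_number(7, antiweb(7, 3)))   # 3 2
```

For $q = 2$ and odd $p$, $AW(p, 2)$ is the odd hole $C_p$ and $W(p, 2)$ is an antihole, so (10) and (9) reduce to the odd-hole and antihole inequalities.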

Now, consider the following class of inequalities for a graph $G = (V, E)$ and $W \subseteq V$:

$$x(W) := \sum_{v_i \in W} x_i \le \alpha(G[W]), \qquad (11)$$

where $G[W]$ denotes the subgraph of $G$ induced by $W$. These are called rank inequalities. From their construction, inequalities (11) are valid for $P_{STAB}(G)$. The edge, odd-cycle, clique, antihole, web and antiweb inequalities belong to this class. Since the class is so broad, rank inequalities are not facet-defining for $P_{STAB}(G)$ in general. Conversely, not every facet is a rank inequality: for instance, the inequality of an odd wheel with 5 or more nodes is not a rank inequality. Let us now have a closer look at the separation of the discussed inequalities.
Separation


In order to separate the odd-cycle inequalities (5) for a graph $G$ and a vector $x^*$, one has to find an odd cycle for which $x^*$ violates the corresponding inequality, or prove that no such cycle exists. In other words, we have to find a minimum-weight odd cycle in a graph with an appropriate weighting function. If this cycle satisfies the corresponding inequality (5), all odd-cycle inequalities are proven to be satisfied. Otherwise, one has found a most violated odd-cycle inequality. Therefore, we first note

Proposition 7 A minimum-weight odd cycle in a graph $G$ with nonnegative edge weights can be computed in $O(|V| \cdot |E| \cdot \log |V|)$.


The idea is to construct an auxiliary bipartite graph $H$. The node set of $H$ consists of two copies of the node set of the original graph $G$; let $v'$ and $v''$ denote the two copies of node $v$. Each edge $uv$ of $G$ gives rise to the two edges $u'v''$ and $u''v'$ of $H$, and the edge weights are copied with the edges.

From the construction of $H$, a path in $H$ from $v'$ to $v''$ corresponds to a closed walk through $v$ in $G$ with an odd number of edges, and vice versa. Since the weights are nonnegative, the minimum weight of an odd closed walk equals the minimum weight of an odd cycle. Hence, calculating a shortest path in $H$ from each node of the original graph $G$ to its copy gives a minimum-weight odd cycle in $G$. Computing the shortest paths with Dijkstra's algorithm yields the running time of Proposition 7.

Now, define the following edge weighting of graph $G$ depending on $x^*$:

$$c(v_iv_j) := \frac{1 - x_i^* - x_j^*}{2} \quad \forall\, v_iv_j \in E. \qquad (12)$$


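With this weighting, it can be shown that an odd-cycle inequality in $G$ is violated by the vector $x^*$ if and only if there is an odd cycle $C \subseteq G$, with edge set $E(C)$, such that

$$\sum_{v_iv_j \in E(C)} c(v_iv_j) < \frac{1}{2}.$$

Indeed, with $c(v_iv_j) = (1 - x_i^* - x_j^*)/2$, every odd cycle $C$ satisfies $\sum_{v_iv_j \in E(C)} c(v_iv_j) = |C|/2 - \sum_{v_i \in V(C)} x_i^*$. The condition is therefore equivalent to $\sum_{v_i \in V(C)} x_i^* > (|C| - 1)/2$. This leads directly to the following theorem.

Theorem 8 The separation problem for the class of odd-cycle inequalities can be solved in $O(|V| \cdot |E| \cdot \log |V|)$.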


Note that $x^*$ has to satisfy all inequalities (1) and (3) before the odd-cycle inequalities can be separated with the above procedure. Otherwise, the weights defined in (12) can become negative, and the shortest paths can no longer be calculated with Dijkstra's algorithm. More details and the description of the algorithms can be found in [27,38,67]. It is interesting to note that, in general, there are exponentially many odd cycles in a graph, but their separation is nevertheless polynomial. We also mention that Grötschel and Pulleyblank introduced in 1981 another method to separate the odd-cycle inequalities, based on perfect matchings, resulting in a running time of
O(\E|*), [41]. 

Finding a maximum clique is NP-hard, while an arbitrary maximal clique as well as an arbitrary maximal stable set can be computed in linear time. Note the distinction between maximum and maximal: a clique (stable set) is maximal if it is not contained in any larger clique (stable set). Maximal can therefore be seen as locally best, while maximum is globally best. The separation problem for the clique inequalities asks either for a clique inequality in a particular graph G that is violated by a given linear programming solution, or for a proof that all clique inequalities are satisfied. This is equivalent to finding a maximum-weight clique in G with the linear programming solution as node weighting c. Hence, the separation problem for the clique inequalities is NP-hard. Computational tests show that the clique inequalities are very important for polyhedral approaches to the stable set problem, cf. [69]. As exact separation is out of reach in general, one idea is to fix the size of the cliques to be separated, since the problem then becomes polynomially solvable. Another observation is that it suffices to consider maximal cliques, because the inequality of a maximal clique dominates the inequalities of all cliques it contains. In practice, it is more successful to separate over larger classes of inequalities that contain the clique inequalities; we discuss this later for the case of rank inequalities. However, the best computational results so far have been achieved with heuristic separation methods.
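The following Python sketch illustrates heuristic clique separation. It greedily grows a clique from each start node, visiting nodes in order of decreasing LP value x*, and reports the heaviest clique whose inequality Σ_{v ∈ K} x_v ≤ 1 is violated. This is a simple hypothetical heuristic (the name `greedy_violated_clique` and the tolerance `eps` are our own), not the specific routine of [67] or [69]. Because every node is offered to every clique, each clique it builds is maximal, in line with the remark that maximal cliques suffice.

```python
def greedy_violated_clique(adj, x, eps=1e-6):
    """Greedy heuristic for separating clique inequalities sum_{v in K} x_v <= 1.

    adj -- dict mapping each node to the set of its neighbours
    x   -- dict mapping each node to its value in the LP solution x*
    Returns (weight, sorted clique) for the most violated clique found,
    or None; None only means that the heuristic found no violated clique.
    """
    order = sorted(adj, key=lambda v: -x[v])  # high LP values first
    best = None
    for start in order:
        clique = [start]
        for v in order:
            # add v if it is adjacent to every node already in the clique
            if v != start and all(v in adj[u] for u in clique):
                clique.append(v)
        # every node was offered once and members are never removed,
        # so no node outside the clique can still be added: it is maximal
        weight = sum(x[v] for v in clique)
        if weight > 1 + eps and (best is None or weight > best[0]):
            best = (weight, sorted(clique))
    return best


# triangle 0-1-2 with a pendant node 3, fractional LP point x* = 1/2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(greedy_violated_clique(adj, {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}))  # (1.5, [0, 1, 2])
```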

A separation algorithm for the wheel inequalities is given by Cheng and Cunningham, [26]. The appealing idea is to treat each node of the graph as a possible hub of an odd wheel and to define appropriate weights for all nodes adjacent to this hub. Then, on the resulting graph, the odd-cycle inequalities can be separated. Hence, this algorithm has a total running time of O(|V|² · |E| · log|V|) or O(|V|⁴), depending on the shortest path algorithm. For practical Branch & Cut algorithms, this complexity is already too high compared to the number of violated inequalities one can expect. The separation routines for the web and antiweb inequalities are even more sophisticated; they are discussed by Cheng and de Vries, [28]. Although these inequalities can also be separated in polynomial time, the complexity of the separation algorithms is again too high for practical use. In addition, these types of inequalities usually occur in graphs with high density, e.g. greater than 0.7, which makes their separation questionable for graphs with low density. Here the density of a graph G = (V, E) is defined as

$$\frac{2\,|E|}{|V|\,(|V|-1)}.$$
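As a quick check of the density formula, the following helper (hypothetical name `density`) evaluates it for a simple undirected graph with at least two nodes.

```python
def density(num_nodes: int, num_edges: int) -> float:
    """Density 2|E| / (|V| (|V| - 1)) of a simple undirected graph."""
    if num_nodes < 2:
        raise ValueError("density is undefined for fewer than two nodes")
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# A 5-cycle uses 5 of the 10 possible edges.
print(density(5, 5))  # 0.5
```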
Besides the special types of inequalities for the stable set polytope discussed in the Sect. “Stable Set Polytope”, one can use general classes of inequalities. In this context, we mention two such types. The first are the so-called mod-k cuts, which belong to the Chvátal-Gomory cuts of rank one. The appealing idea of the mod-k cuts is to find a multiplier vector μ such that the inequality obtained by multiplying a particular inequality system Ax ≤ b with μ can be strengthened by dividing it by a positive integer k. Therefore, let k > 1 be integral, and suppose that we are given a system of linear inequalities Ax ≤ b with integral coefficients. Let μ be a vector with nonnegative integer entries and appropriate dimension such that

$$\mu^\top A \equiv 0 \pmod{k}, \qquad \mu^\top b \equiv k - 1 \pmod{k}.$$

From this, one obtains the mod-k inequality

$$\frac{1}{k}\,\mu^\top A x \le \frac{\mu^\top b - (k-1)}{k}.$$


Examples for k = 2 are the odd-cycle inequalities and the wheel inequalities. The separation of the mod-k inequalities and further details can be found in [23,67].
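The arithmetic of a mod-k cut is easy to check mechanically. The sketch below (hypothetical helper `mod_k_cut`, with A and b given as dense integer lists) verifies the two congruences for a given multiplier μ and returns the coefficients and right-hand side of the mod-k inequality. It does not search for μ, which is the hard part treated in [23]. With k = 2, summing the three edge inequalities of a triangle recovers the inequality x_1 + x_2 + x_3 ≤ 1.

```python
def mod_k_cut(A, b, mu, k):
    """Return (coefficients, rhs) of the mod-k cut for multiplier mu, or None.

    A, b -- integer data of the system A x <= b (list of rows, list)
    mu   -- nonnegative integer multipliers, one per row of A
    k    -- integer modulus, k >= 2
    """
    if k < 2 or len(mu) != len(A) or any(m < 0 for m in mu):
        raise ValueError("need k >= 2 and one nonnegative multiplier per row")
    n = len(A[0])
    lhs = [sum(mu[i] * A[i][j] for i in range(len(A))) for j in range(n)]
    rhs = sum(m * bi for m, bi in zip(mu, b))
    # the congruences mu^T A = 0 (mod k) and mu^T b = k - 1 (mod k)
    if any(c % k != 0 for c in lhs) or rhs % k != k - 1:
        return None
    # (1/k) mu^T A x <= (mu^T b - (k - 1)) / k, integral by construction
    return [c // k for c in lhs], (rhs - (k - 1)) // k


# Edge inequalities x1 + x2 <= 1, x2 + x3 <= 1, x1 + x3 <= 1 of a triangle
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
b = [1, 1, 1]
print(mod_k_cut(A, b, [1, 1, 1], 2))  # ([1, 1, 1], 1)
```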

The second class of general inequalities that we briefly discuss here are the so-called local cuts. The principal idea was introduced by Applegate et al. in 2001 for the Traveling Salesman Problem, [1]. Suppose we are given the set of all m feasible solutions to the stable set problem, for instance of a subgraph with n nodes. The idea is to check whether a given (solution) vector lies in their convex hull and, if it lies outside, to compute a facet that separates this point from the convex hull. This can be achieved by solving a linear program with m variables and n constraints. Its optimal objective function value indicates whether the point lies inside the polytope, and the optimal dual variables give the coefficients of the separating inequality, called a local cut. Obviously, this method has some weaknesses.
One first has to find a ‘good’ subgraph and then enu- 
merate all stable sets. In addition, the linear program 
to be solved can be large, as the number of stable sets 
in a graph can be exponential. Nevertheless, the result- 
ing cuts can be quite strong, especially if all other sep- 
aration procedures do not bring new cuts. More details 
and computational results can be found in [67]. 
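The enumeration step behind local cuts can be sketched as follows for a small subgraph. Every bit pattern of the n nodes is tested, so the cost is O(2^n · n), which illustrates why only small subgraphs are practical. The helper name `stable_set_vectors` is hypothetical. The subsequent linear program, which either expresses the given point as a convex combination of these vectors or yields a separating facet, is left to an LP solver and not shown.

```python
def stable_set_vectors(n, edges):
    """All incidence vectors of stable sets (including the empty set)
    of the graph with nodes 0..n-1 and the given edge list."""
    neighbours = [0] * n  # bitmask of the neighbours of each node
    for u, v in edges:
        neighbours[u] |= 1 << v
        neighbours[v] |= 1 << u
    vectors = []
    for subset in range(1 << n):
        # subset is stable iff no chosen node has a chosen neighbour
        if all(not ((subset >> v) & 1) or not (neighbours[v] & subset)
               for v in range(n)):
            vectors.append([(subset >> v) & 1 for v in range(n)])
    return vectors


cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(len(stable_set_vectors(5, cycle5)))  # 11
```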

At the end of this subsection, we introduce the idea of separating rank inequalities. We do not go into full detail here but instead focus on the principal ideas of the beautiful results of Mannino and Sassano, [52]. The appealing idea is to reduce the size of the graph and to make it denser at the same time. In general, an edge of the graph is selected and its two endnodes are removed from the graph. In addition, new so-called false edges are added to the graph, and some other nodes may also be removed. This procedure is called edge projection. After a few iterations of this procedure, when the graph is small enough, one can separate any type of rank inequality, for instance, the clique inequalities or the odd-cycle inequalities. If a violated inequality has been found, it must be projected back so that it is valid for the polytope of the original graph. This process is called anti-projection. Note that an anti-projected inequality may no longer be violated by the solution vector, even if it was violated in the projected graph. Edge projection and anti-projection can be done in linear time, which makes this method very fast. Let us
now consider a small example, shown in Fig. 7. Fig. 7a shows a small graph with six nodes: an odd hole with the additional node v4. If we select the edge e = v3v5 to project on, then nodes v3, v5 and, in addition, v4 are removed from the graph (because v4 is a common neighbor of nodes v3 and v5), together with all edges incident with any of these nodes. False edges are added between the set of nodes that are neighbors of v3 but not of v5 and the set of nodes that are neighbors of v5 but not of v3. Hence, the false edge v2v6 is added, and one gets the triangle shown in Fig. 7b. We recognize that the resulting graph is smaller and, indeed, denser.

Stable Set Problem: Branch & Cut Algorithms, Figure 7
Edge projection

We realize that the triangle inequality


$$x_1 + x_2 + x_6 \le 1$$

is valid for the stable set polytope of the projected graph but not for P_STAB(G). In most cases, the anti-projection adds the deleted nodes to the inequality and increases the right-hand side by one. In this case we get

$$x_1 + x_2 + x_3 + x_4 + x_5 + x_6 \le 2,$$

which is a valid inequality for P_STAB(G). We recognize that it is a lifted odd-hole inequality, which is facet-defining for P_STAB(G); here, lifting means extending a valid inequality for a polytope P to a valid inequality for a higher-dimensional polytope P'. In general, facet-defining inequalities for the polytope of the projected graph need not be facet-defining for the polytope of the original graph, and vice versa. The method was successfully developed and implemented by Rossi and Smriglio, [69]. More details and polyhedral results can be found in [67].
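The projection step of the example can be sketched in Python. The function `edge_project` (a hypothetical name) follows only the simplified description given above. It deletes both endnodes of the chosen edge together with their common neighbours, and it joins every neighbour exclusive to one endnode with every neighbour exclusive to the other by a false edge. The conditions under which Mannino and Sassano allow a projection, and the bookkeeping needed for anti-projection, are not modelled. The exact edges of Fig. 7 are not recoverable here, so the usage example assumes a graph in which v4 is adjacent to v2, v3, v5 and v6. This assumption keeps the lifted inequality x_1 + ... + x_6 ≤ 2 valid.

```python
def edge_project(adj, u, v):
    """Simplified edge projection on the edge uv (see text).

    adj -- dict mapping each node to the set of its neighbours (undirected)
    Returns (projected adjacency dict, set of false edges as frozensets).
    """
    if v not in adj[u]:
        raise ValueError("u and v must be adjacent")
    common = adj[u] & adj[v]
    removed = {u, v} | common
    only_u = adj[u] - adj[v] - {v}  # neighbours of u that are not neighbours of v
    only_v = adj[v] - adj[u] - {u}  # neighbours of v that are not neighbours of u
    projected = {w: nb - removed for w, nb in adj.items() if w not in removed}
    false_edges = set()
    for a in only_u:
        for c in only_v:
            if c not in projected[a]:
                projected[a].add(c)
                projected[c].add(a)
                false_edges.add(frozenset((a, c)))
    return projected, false_edges


# Assumed graph: odd hole v1-v2-v3-v5-v6-v1 plus v4 adjacent to v2, v3, v5, v6.
edges = [(1, 2), (2, 3), (3, 5), (5, 6), (6, 1), (4, 2), (4, 3), (4, 5), (4, 6)]
adj = {i: set() for i in range(1, 7)}
for a, c in edges:
    adj[a].add(c)
    adj[c].add(a)
projected, false_edges = edge_project(adj, 3, 5)
print(projected)    # {1: {2, 6}, 2: {1, 6}, 6: {1, 2}}
print(false_edges)  # {frozenset({2, 6})}
```

The result is the triangle on v1, v2 and v6 with the false edge v2v6, as in Fig. 7b.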


Branching 


The branching strategy in a Branch & Cut algorithm strongly influences the overall performance of the algorithm. In general, it is very difficult to find a good strategy. Various techniques have been explored, and none of them always outperforms the others. For special problems, however, there are dedicated strategies that help to reduce the size of the Branch & Bound tree and to speed up the algorithm. We start this section by motivating the need for special techniques and then present the branching idea of Balas and Yu from 1986, [7]. It still remains the most successful strategy in practice.


[Fig. 7b: Graph G after edge-projection of edge v3v5]


One standard branching idea for a problem with binary variables is to generate two subproblems: one variable is set to one in one subproblem and to zero in the other. However, this branching strategy leads to a very unbalanced Branch & Bound tree for the maximum stable set problem. The reason is that setting a variable to one sets all variables of its neighborhood to zero, whereas setting a variable to zero has no consequence for the other variables of the graph. To avoid this drawback, one could try to set at least one variable to one in each subproblem of the tree. This is essentially the key property of the branching strategy by Balas and Yu.

Let G' = (V', E') be the subgraph induced by the set of nodes that are not fixed in the current subproblem. In each subproblem, the goal is to find a maximum stable set in the particular subgraph given by the tree, or to prove that α(G') ≤ LB, where LB is the lower bound. Let W ⊆ V', and assume that we can show that α(G[W]) ≤ LB. Clearly, if W = V', the subproblem can be fathomed. Otherwise, if α(G') > LB, any maximum stable set must contain at least one node of the set Z := V' \ W = {v_1, ..., v_p}. Based on this observation, Balas and Yu showed that every maximum stable set whose weight exceeds the lower bound must be contained in one of the sets

$$V'_i := V' \setminus \bigl(N(v_i) \cup \{v_{i+1}, \dots, v_p\}\bigr)$$

for i = 1, ..., p, where N(v_i) denotes the neighborhood of v_i. Note that this strategy also applies to the weighted case with c ≠ 1. This branching leads to p new subproblems in one branching step. In each subproblem, node v_i is set to one, and all nodes of N(v_i) ∪ {v_{i+1}, ..., v_p} are set to zero.
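The branching rule can be written down directly. The following Python sketch (hypothetical name `balas_yu_subproblems`, with neighbourhoods given as sets) lists, for an ordered set Z = (v_1, ..., v_p), the nodes fixed to one and to zero in each of the p subproblems. How W and the order of Z are chosen is not modelled here.

```python
def balas_yu_subproblems(adj, free_nodes, Z):
    """Fixings of the p subproblems created by Balas-Yu branching.

    adj        -- dict mapping each node to the set of its neighbours
    free_nodes -- the node set V' of the current subproblem
    Z          -- ordered list [v_1, ..., v_p] = V' minus W
    Returns a list of (fixed_to_one, fixed_to_zero) pairs, one per v_i.
    """
    subproblems = []
    for i, v in enumerate(Z):
        later = set(Z[i + 1:])                  # {v_{i+1}, ..., v_p}
        zero = (adj[v] & free_nodes) | later     # N(v_i) | later, within V'
        subproblems.append(({v}, zero))
    return subproblems


# 4-cycle 0-1-2-3-0, all nodes free, Z = [0, 1]
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(balas_yu_subproblems(adj, {0, 1, 2, 3}, [0, 1]))
# [({0}, {1, 3}), ({1}, {0, 2})]
```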
Let us now discuss some properties of this strategy. The size of W and the ordering of the nodes in Z can affect the total number of subproblems to be solved. Of course, the larger W is, the fewer subproblems are generated at that node of the tree. The size of W is strongly affected by the quality of the computed lower bound. Determining W is crucial and can be done for the cardinality stable set problem, for instance, with a clique covering, cf. [7,69]. Other methods, such as matchings or holes [71], have also been considered. In addition, the choice of the branching variable has a great impact on the number of subproblems to be solved. The size of the tree can be reduced by branching on nodes of high degree, as shown empirically by Carraghan and Pardalos [25]. The reason is the previously mentioned observation that the neighborhood of the branching node is fixed as well. Sorting the nodes of each subproblem in ascending order of degree seems computationally expensive, as the degrees of all nodes have to be recomputed in each branching step prior to sorting. However, Sewell [71] showed experimentally that for sparse graphs this is still worthwhile.
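One simple way to obtain W from a clique covering assigns the nodes greedily to at most LB cliques. This follows the spirit of [7], but not necessarily their exact procedure. Since a stable set contains at most one node of each clique, α(G[W]) ≤ LB holds for the covered set W, and the uncovered nodes form Z. The helper name `cover_by_cliques` and the node order are assumptions for illustration.

```python
def cover_by_cliques(adj, nodes, lb):
    """Greedily cover nodes by at most lb cliques.

    Returns (W, Z): W is covered by <= lb cliques, so alpha(G[W]) <= lb;
    Z lists the remaining nodes, which become the branching candidates.
    """
    cliques = []
    Z = []
    for v in nodes:
        for clique in cliques:
            if all(v in adj[u] for u in clique):
                clique.append(v)  # v extends an existing clique
                break
        else:
            # no clique can take v: open a new one if allowed, else v goes to Z
            if len(cliques) < lb:
                cliques.append([v])
            else:
                Z.append(v)
    W = sorted(u for clique in cliques for u in clique)
    return W, Z


# 5-cycle 0-1-2-3-4-0 with lower bound LB = 2
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(cover_by_cliques(adj, range(5), 2))  # ([0, 1, 2, 3], [4])
```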


Implementation Aspects 


In general, two things are crucial when implementing a solver for the stable set problem. First, one has to obtain a good lower bound, which in the case of maximization means a good feasible solution. One should use one of the many heuristics suggested in the literature; we refer to the article “Heuristics for Maximum Clique and Independent Set”. Second, strong preprocessing is recommended. This becomes even more important when the graphs arise from applications. Many contributions have been made for this purpose, among them fixing of nodes, fixing of cliques and treating connected components separately, [56,67,72].

For a Branch & Cut algorithm in particular, one first needs a formulation of the stable set problem. We discussed some of the relaxations in Sect. “Stable Set Polytope”. For practical efficiency, it is not recommended to start with the optimization over P_FSTAB(G), the relaxation given by the edge inequalities. This relaxation is very weak and contains relatively many constraints. A better idea is to start with a set of maximal cliques that together cover all edges. Such a covering can be found in linear time. The resulting relaxation is stronger, as each maximal clique inequality is facet-defining and dominates all the edge inequalities it contains. Note that for bipartite graphs both relaxations coincide.
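A covering of all edges by maximal cliques can be obtained greedily, for example as in the following Python sketch (hypothetical name `maximal_clique_edge_cover`). Each edge not yet covered is extended to a maximal clique by adding common neighbours one at a time. The greedy order is an assumption, and this simple version makes no attempt to achieve the linear running time mentioned above. The inequalities Σ_{v ∈ K} x_v ≤ 1 of the returned cliques K then replace the edge inequalities in the initial relaxation.

```python
def maximal_clique_edge_cover(adj):
    """Cover every edge of an undirected graph by maximal cliques.

    adj -- dict mapping each (orderable) node to the set of its neighbours
    Returns a list of cliques, each a sorted list of nodes.
    """
    covered = set()
    cliques = []
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u > v or frozenset((u, v)) in covered:
                continue
            clique = [u, v]
            # any extension must be adjacent to both endnodes of the edge
            for w in sorted(adj[u] & adj[v]):
                if all(w in adj[c] for c in clique):
                    clique.append(w)
            # every candidate was offered once, so the clique is maximal
            for a in clique:
                for b in clique:
                    if a < b:
                        covered.add(frozenset((a, b)))
            cliques.append(sorted(clique))
    return cliques


# triangle 0-1-2 with a pendant edge 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(maximal_clique_edge_cover(adj))  # [[0, 1, 2], [2, 3]]
```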

If one decides to separate several classes of inequal- 
ities within a Branch & Cut framework, one needs an 
order in which the separation routines are called. It 
is recommended to first call the polynomial separa- 
tion routines, then the ones which take higher com- 
putational effort. However, practical tests show that 
the clique inequalities are very important. Therefore, 
a Branch & Cut solver should focus on fast heuristic 
separation of the clique inequalities combined with the 
very powerful tool of edge-projection. This leads to the 
best computational results so far. 

Moreover, it is recommended to focus on facet-defining inequalities. Therefore, each clique should be extended to a maximal clique before its inequality is added to the formulation. Accordingly, each odd cycle should be checked for chords; if it has a chord, a smaller odd cycle should be added instead. These transformations can be done in linear time.
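The chord check for odd cycles can be sketched as follows (hypothetical name `chordless_odd_cycle`; the cycle is given as a list of nodes in cyclic order). A chord splits an odd cycle into two shorter cycles whose node counts add up to the original length plus two, so exactly one of them is odd. The function keeps that one and repeats. This straightforward version checks all node pairs in every round and is therefore far from the linear running time mentioned above.

```python
def chordless_odd_cycle(adj, cycle):
    """Shrink an odd cycle along chords until it is chordless.

    adj   -- dict mapping each node to the set of its neighbours
    cycle -- list of nodes in cyclic order, odd length >= 3
    Returns a chordless odd cycle (a triangle or an odd hole) whose node
    set is a subset of the input cycle.
    """
    if len(cycle) < 3 or len(cycle) % 2 == 0:
        raise ValueError("expected an odd cycle")
    while True:
        k = len(cycle)
        chord = None
        for i in range(k):
            for j in range(i + 2, k):
                if i == 0 and j == k - 1:
                    continue  # first and last node are consecutive on the cycle
                if cycle[j] in adj[cycle[i]]:
                    chord = (i, j)
                    break
            if chord is not None:
                break
        if chord is None:
            return cycle
        i, j = chord
        # The chord splits the cycle into cycles with j - i + 1 and
        # k - j + i + 1 nodes; these add up to k + 2 (odd), so one is odd.
        part_a = cycle[i:j + 1]
        part_b = cycle[j:] + cycle[:i + 1]
        cycle = part_a if len(part_a) % 2 == 1 else part_b


# 5-cycle 0-1-2-3-4-0 with the chord 0-2
adj = {0: {1, 2, 4}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {0, 3}}
print(chordless_odd_cycle(adj, [0, 1, 2, 3, 4]))  # [0, 1, 2]
```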

More details regarding implementation and Branch 
& Cut modules for the stable set problem can be found, 
for instance, in [10,24,67,69]. 


Conclusion 


Many contributions have been made to solve the stable set problem exactly. One of the exact algorithms is based on the Branch & Cut method. However, the stable set polytope is not yet fully understood. The known inequalities are either easy to separate but of little impact, or they are crucial for polyhedral approaches to the stable set problem but can only be separated with a large amount of computational effort. Therefore, it is not surprising that there are stable set instances with fewer than 1000 nodes that cannot be solved exactly with current methods.


See also 


> Heuristics for Maximum Clique and Independent Set

> Integer Programming

> Integer Programming: Branch and Bound Methods

> Integer Programming: Branch and Cut Algorithms

> Integer Programming: Cutting Plane Algorithms

> Lovász Number

> Simplicial Pivoting Algorithms for Integer Programming


References 


1. Applegate D, Bixby R, Chvátal V, Cook W (2001) TSP Cuts Which Do Not Conform to the Template Paradigm. Comput Comb Optim LNCS, vol 2241:157-222

2. Alekseev VE (2003) On easy and hard hereditary classes of graphs with respect to the independent set problem. Discret Appl Math 132(1-3):17-26

3. Arora S, Safra S (1992) Probabilistic Checking of Proofs; a new Characterization of NP. In: Proceedings 33rd IEEE Symposium on Foundations of Computer Science, pp 2-13. IEEE Computer Society, Los Angeles

4. Babel L, Tinhofer G (1990) A branch and bound algorithm for the maximum clique problem. ZOR-Methods Models Oper Res 34:207-217

5. Balas E, Xue J (1996) Weighted and Unweighted Maximum Clique Algorithms with Upper Bounds from Fractional Coloring. Algorithmica 15(5):397-412

6. Balas E, Yu CS (1989) On graphs with polynomially solvable maximum-weight clique problem. Networks 19(2):247-253

7. Balas E, Yu CS (1986) Finding a Maximum Clique in an Arbitrary Graph. SIAM J 14(4):1054-1068

8. Balas E, Chvátal V, Nešetřil J (1987) On the Maximum Weight Clique Problem. Math Oper Res 12(3):522-535

9. Balas E, Padberg MW (1976) Set Partitioning: A Survey. SIAM Rev 18(4):710-761

10. Balas E, Ceria S, Cornuéjols G, Pataki G (1996) Polyhedral methods for the maximum clique problem. In: Johnson DS, Trick MA (eds) American Mathematical Society. DIMACS vol 26, pp 11-28

11. Balinski ML (1970) On Maximum Matching, Minimum Covering and their Connections. In: Kuhn HW (ed) Proceedings of the Princeton symposium on mathematical programming. Princeton University Press, Princeton, pp 303-312

12. Barnes ER (2000) A Branch-and-Bound Procedure for the Largest Clique in a Graph. Approximation and Complexity in Numerical Optimization: Continuous and Discrete Problems. Kluwer, Boston

13. BHOSLIB (2000) Benchmarks with Hidden Optimum Solutions for Graph Problems (Maximum Clique, Maximum Independent Set, Minimum Vertex Cover and Vertex Coloring) - Hiding Exact Solutions in Random Graphs. http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graph-benchmarks.htm

14. Bhattacharya BK, Kaller D (1997) An O(m + n log n) Algorithm for the Maximum-Clique Problem in Circular-Arc Graphs. J Algorithms 25(3):336-358

15. Bomze IM, Budinich M, Pardalos PM, Pelillo M (1999) The Maximum Clique Problem. Handbook of Combinatorial Optimization. Kluwer, Boston


16. Bomze IM, Stix V (1999) Genetic engineering via negative fitness: Evolutionary dynamics for global optimization. Annals Oper Res 89:297-318

17. Bonomo F, Duran G, Lin MC, Szwarcfiter JL (2005) On Balanced Graphs. Math Program 105(2-3):233-250

18. Bourjolly J-M, Laporte G, Mercure H (1997) A combinatorial column generation algorithm for the maximum stable set problem. Oper Res Lett 20(1):21-29

19. Bron C, Kerbosch J (1973) Algorithm 457: Finding all cliques of an undirected graph. Commun ACM 16:575-577

20. Brøndsted A (1983) An Introduction to Convex Polytopes. Graduate Texts in Mathematics, vol 90. Springer, New York

21. Burer S, Monteiro RDC, Zhang Y (2002) Maximum stable set formulations and heuristics based on continuous optimization. Math Program 94(1):137-166

22. Butenko S (2003) Maximum Independent Set and Related Problems, with Applications. PhD thesis, University of Florida

23. Caprara A, Fischetti M, Letchford AN (2000) On the Separation of Maximally Violated mod-k Cuts. Math Program 87(1):37-56

24. Carr RD, Lancia G, Istrail S (2000) Branch-and-Cut Algorithms for Independent Set Problems: Integrality Gap and An Application to Protein Structure Alignment. Technical report, Sandia National Laboratories, Albuquerque, US; Sandia National Laboratories, Livermore

25. Carraghan R, Pardalos PM (1990) An exact algorithm for the maximum clique problem. Oper Res Lett 9(6):375-382

26. Cheng E, Cunningham WH (1995) Separation problems for the stable set polytope. In: Balas E, Clausen J (eds) The 4th Integer Programming and Combinatorial Optimization Conference Proceedings. pp 65-79

27. Cheng E, Cunningham WH (1997) Wheel Inequalities for Stable Set Polytopes. Math Program 77:389-421

28. Cheng E, de Vries S (2002) Antiweb-wheel inequalities and their separation problems over the stable set polytopes. Math Program 92(1):153-175

29. Chiba N, Nishizeki T (1985) Arboricity and subgraph listing algorithms. SIAM J 14:210-223

30. Chudnovsky M, Cornuéjols G, Liu X, Seymour P, Vuskovic K (2005) Recognizing Berge Graphs. Comb 25(2):143-186

31. Chudnovsky M, Robertson N, Seymour P, Thomas R (2004) The strong perfect graph theorem. Ann Math 164:51-229

32. Cogis O, Thierry E (2005) Computing maximum stable sets for distance-hereditary graphs. Discret Optim 2(3):185-188

33. Diestel R (2000) Graph Theory. Electronic Edition 2000. Springer, New York

34. Second DIMACS Challenge, 1992/1993. http://mat.gsia.cmu.edu/challenge.html

35. Fahle T (2002) Simple and Fast: Improving a Branch-And-Bound Algorithm for Maximum Clique, vol 2461/2002 Lecture Notes in Computer Science pp 485-498

36. Fujisawa K, Morito S, Kubo M (1995) Experimental Analyses of the Life Span Method for the Maximum Stable Set Problem. The Institute of Statistical Mathematics Cooperative Research Report 75:135-165

37. Garey MR, Johnson DS (1979) Computers and Intractability, A guide to the Theory of NP-Completeness. In: Klee V (ed) A series of books in the mathematical sciences. Freeman WH and Company, New York

38. Gerards AMH, Schrijver A (1986) Matrices with the Edmonds-Johnson property. Comb 6(4):365-379

39. Giandomenico M, Letchford AN (2006) Exploring the Relationship Between Max-Cut and Stable Set Relaxations. Math Program 106(1):159-175

40. Grötschel M, Jünger M, Reinelt G (1984) A Cutting Plane Algorithm for the Linear Ordering Problem. Oper Res 32:1195-1220

41. Grötschel M, Pulleyblank WR (1981) Weakly Bipartite Graphs and the Max-cut Problem. Oper Res Lett 1(1):23-27

42. Grötschel M, Lovász L, Schrijver A (1988) Geometric Algorithms and Combinatorial Optimization. Algorithms and Combinatorics 2. Springer, Berlin

43. Grötschel M, Lovász L, Schrijver A (1981) The Ellipsoid Method and Its Consequences in Combinatorial Optimization. Comb 1:169-197

44. Harary F, Ross IC (1957) A procedure for clique detection using the group matrix. Sociom 20:205-215

45. Hasselberg J, Pardalos PM, Vairaktarakis G (1993) Test case generators and computational results for the maximum clique problem. J Glob Optim 3(4):463-482

46. Kallrath J, Wilson JM (1997) Business Optimization using Mathematical Programming. Macmillan, New York

47. Lawler E (2001) Combinatorial Optimization: Networks and Matroids. Reprint of the 1976 original. Dover Publications, Inc., Mineola

48. Lehmann KA, Kaufmann M, Steigele S, Nieselt K (2006) On the maximal cliques in c-max-tolerance graphs and their application in clustering molecular sequences. Algorithm Molecular Biol 1:9:1-17

49. Loukakis E, Tsouros C (1981) A depth first search algorithm to generate the family of maximal independent sets of a graph lexicographically. Comput 27:249-266

50. Lovász L (1979) On the Shannon capacity of a graph. IEEE Trans Inform Theory 25(1):1-7

51. Mannino C, Sassano A (2005) An exact algorithm for the maximum stable set problem. Comput Optim Appl 3(3):243-258

52. Mannino C, Sassano A (1996) Edge Projection and the Maximum Cardinality Stable Set Problem. DIMACS Series Discret Math Theor Comput Sci 26:249-261

53. Mannino C, Stefanutti E (1999) An augmentation algorithm for the maximum weighted stable set problem. Comput Optim Appl 14(3):367-381

54. Masuda S, Nakajima K, Kashiwabara T, Fujisawa T (1990) Efficient algorithms for finding maximum cliques of an overlap graph. Networks 20(2):157-171

55. Mosca R (1997) Polynomial algorithms for the maximum stable set problem on particular classes of P5-free graphs. Inf Process Lett 61(3):137-143

56. Nemhauser GL, Trotter LE Jr (1975) Vertex Packings: Structural Properties and Algorithms. Math Program 8:232-248

57. Nemhauser GL, Trotter LE Jr (1974) Properties of Vertex Packing and Independence System Polyhedra. Math Program 6:48-61

58. Nemhauser GL, Wolsey LA (1988) Integer and Combinatorial Optimization. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York

59. Olariu S (1989) Weak bipolarizable graphs. Discret Math 74(1-2):159-171

60. Östergård PRJ (2002) A fast algorithm for the maximum clique problem. Discret Appl Math 120(1-3):197-207

61. Padberg MW, Rinaldi G (1987) Optimization of a 532 City Symmetric Traveling Salesman Problem by Branch and Cut. Oper Res Lett 6:1-7

62. Padberg MW (1973) On the Facial Structure of Set Packing Polyhedra. Math Program 5:199-215

63. Papadimitriou CH, Yannakakis M (1981) The clique problem for planar graphs. Inf Process Lett 13(4-5):131-133

64. Pardalos PM, Phillips AT (1990) A global optimization approach for solving the maximum clique problem. Int J Comput Math 33(3-4):209-216

65. Pardalos PM, Rodgers GP (1992) A branch and bound algorithm for the maximum clique problem. Comput Oper Res 19(5):363-375

66. Ramírez-Alfonsín JL, Reed BA (eds) (2001) Perfect Graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York

67. Rebennack S (2006) Maximum Stable Set Problem: A Branch & Cut Solver. Diplomarbeit, Ruprecht-Karls Universität Heidelberg, Heidelberg, Germany

68. Régin J-C (2003) Using Constraint Programming to Solve the Maximum Clique Problem. Lecture Notes in Computer Science. Springer, Berlin, pp 634-648

69. Rossi F, Smriglio S (2001) A Branch-and-Cut Algorithm for the Maximum Cardinality Stable Set Problem. Oper Res Lett 28:63-74

70. Schrijver A (2003) Combinatorial Optimization: Polyhedra and Efficiency, vol 24 of Algorithms and Combinatorics. Springer, Berlin

71. Sewell EC (1998) A Branch and Bound Algorithm for the Stability Number of a Sparse Graph. INFORMS J Comput 10(4):438-447

72. Strijk T, Verweij B, Aardal K (2000) Algorithms for maximum independent set applied to map labelling. Technical Report UU-CS-2000-22, http://citeseer.ist.psu.edu/article/strijk00algorithms.html

73. Tomita E, Tanaka A, Takahashi H (1988) The worst-time complexity for finding all the cliques. Technical report, University of Electro-Communications, Tokyo, Japan

74. Trotter LE (1975) A class of facet producing graphs for vertex packing polyhedra. Discret Math 12(4):373-388




75. Verweij B, Aardal K (1999) An Optimisation Algorithm for 
Maximum Independent Set with Applications in Map La- 
belling, vol 1643/1999 Lecture Notes in Computer Science, 
pp 426-437 

76. Warren JS, Hicks IV (2006) Combinatorial Branch-and- 
Bound for the Maximum Weight Independent Set Prob- 
lem. working paper, August 7 

77. Warrier D, Wilhelm WE, Warren JS, Hicks IV (2005) A branch-and-price approach for the maximum weight independent set problem. Networks 46(4):198-209

78. West DB (2000) Introduction to Graph Theory, 2nd edn. 
Prentice Hall 

79. Wolsey LA (1998) Integer Programming. Wiley-Inter- 
science Series in Discrete Mathematics and Optimization. 
Wiley-Interscience, New York 

80. Wood DR (1997) An algorithm for finding a maximum 
clique in a graph. Oper Res Lett 21(5):211-217 

81. Yildirim EA, Fan-Orzechowski X (2006) On Extracting Max- 
imum Stable Sets in Perfect Graphs Using Lovasz’s Theta 
Function. Comput Optim Appl 33(2-3):229-247 

82. Ziegler GM (1995) Lectures on Polytopes. Graduate Texts in Mathematics. Springer, New York


Standard Quadratic Optimization 
Problems: Algorithms 


IMMANUEL M. BOMZE 
University Vienna, Wien, Austria 


MSC2000: 90C20 


Article Outline 


Keywords 
See also 
References 


Keywords 


Global optimization; Interior point; Copositivity; 
Escape step; Replicator dynamics 


A standard quadratic optimization problem (StQP) consists of finding (global) maximizers of a quadratic form over the standard simplex Δ in n-dimensional Euclidean space R^n,

$$\Delta = \{x \in \mathbb{R}^n : x_i \ge 0 \text{ for all } i \in N,\ e^\top x = 1\},$$

where N = {1, ..., n}; ⊤ denotes transposition; and e = [1, ..., 1]^⊤ ∈ R^n. Hence a StQP can be written as a (global) quadratic optimization problem of the form

$$\max\{\, f(x) = x^\top R x : x \in \Delta \,\}, \tag{1}$$

where R is an arbitrary symmetric n × n matrix.
Quadratic optimization problems like (1) are NP-hard [2], even as regards the detection of local solutions. Nevertheless, there are several procedures that try to exploit favorable data constellations in a systematic way and to avoid the worst-case behavior whenever possible. Examples of such algorithms are described below.

First we concentrate on the evolutionary approach to finding local solutions of StQPs. To this end, consider the following dynamical system operating on Δ:

$$\dot{x}_i(t) = x_i(t)\,\bigl[(Rx(t))_i - x(t)^\top R x(t)\bigr], \qquad i \in N, \tag{2}$$

where a dot signifies the derivative with respect to time t, and a discrete-time version

$$x_i(t+1) = x_i(t)\,\frac{[Rx(t)]_i}{x(t)^\top R x(t)}, \qquad i \in N. \tag{3}$$

The stationary points under (2) and (3) coincide, and all local solutions of (1) are among these (see below). A stationary point x̄ is said to be asymptotically stable if every solution to (2) or (3) which starts close enough to x̄ converges to x̄ as t ↗ ∞. Now the following results hold (for proofs and further characterization results linking optimization theory, evolutionary game theory, and qualitative theory of dynamical systems, see [1] and the references therein):

• the objective f(x(t)) increases strictly along nonconstant trajectories of (2) and (3);

• all trajectories converge to a stationary point;

• all Karush-Kuhn-Tucker points and hence all local solutions of (1) are stationary points under (2) and (3);

• if no principal minor of R = R^⊤ vanishes, then with probability one (regarding the choice of x(0), the starting point), any trajectory of (2) converges to one of the strict local solutions x̄ of (1), which coincide with the asymptotically stable points under (2) and (3);

• further, y^⊤ R y < x̄^⊤ R x̄ for all y ∈ Δ with y ≠ x̄ but y_i = 0 if x̄_i = 0.
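The discrete-time dynamics (3) is straightforward to iterate. The Python sketch below (hypothetical name `replicator`; the iteration limit and tolerance are arbitrary choices) assumes that all entries of R are positive, so that the denominator in (3) stays positive. This is no loss of generality: replacing R with R + γee^⊤ only adds the constant γ to the objective on Δ (since e^⊤x = 1) and leaves the maximizers unchanged. The sketch also records f(x(t)), so the monotone increase stated above can be observed.

```python
def replicator(R, x0, iterations=1000, tol=1e-12):
    """Iterate x_i <- x_i (R x)_i / (x^T R x), the discrete dynamics (3).

    R  -- symmetric matrix with positive entries, as a list of rows
    x0 -- starting point in the standard simplex
    Returns (final point, list of objective values f(x(t)) = x^T R x).
    """
    n = len(R)
    x = list(x0)
    values = []
    for _ in range(iterations):
        Rx = [sum(R[i][j] * x[j] for j in range(n)) for i in range(n)]
        f = sum(x[i] * Rx[i] for i in range(n))
        values.append(f)
        new_x = [x[i] * Rx[i] / f for i in range(n)]  # stays in the simplex
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x, values


# On the simplex f(x) = 1 + x_1^2, maximized at the vertex x = (1, 0).
R = [[2.0, 1.0], [1.0, 1.0]]
x, values = replicator(R, [0.5, 0.5])
```

Starting from the barycenter, as in step 1 of the replicator dynamics algorithm below, the iterates approach the vertex (1, 0) while f(x(t)) increases from 1.25 towards 2.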
1 | Given a local solution x̄ to (1), remove all alleles which are not unfit, i.e. all i ∈ S = {i ∈ N : x̄_i > 0};

2 | determine a (local) fitness minimizer ȳ in the reduced problem, i.e. consider problem (1) with R replaced by

$$\bar R = [\gamma_S - r_{ij}]_{i,j \in N \setminus S}, \qquad \text{where } \gamma_S = \max_{i,j \in N \setminus S} r_{ij};$$

3 | with a local solution ȳ of this auxiliary problem, put

$$J = \{i \in N \setminus S : \bar y_i > 0\}$$

and denote by m the cardinality of J;

4 | for all s ∈ S and t ∈ J, consider the reduced problem P_{t→s}, i.e. problem (1) in n − m variables for the (n − m) × (n − m) matrix R_{t→s} obtained from R as follows: replace row and column s with row and column t, and remove all rows and columns indexed by J;

5 | x̄ is a global solution to the master problem (1) if and only if for all (s, t) ∈ S × J, the maximum of P_{t→s} does not exceed the current best value x̄^⊤ R x̄;

6 | in the negative, i.e. if u^⊤ R_{t→s} u > x̄^⊤ R x̄ for some u ∈ Δ ⊂ R^{n−m}, and if j ∈ J is chosen such that for all q ∈ J

$$\sum_{p \in N \setminus (J \cup \{s\})} (r_{jp} - r_{qp})\, u_p \;\ge\; \frac{1}{2}\,(r_{qq} - r_{jj})\, u_s ,$$

then a strictly improving feasible point x̃ is obtained as follows:

$$\tilde x_q = \begin{cases} u_s & \text{if } q = j,\\ 0 & \text{if } q \in (J \cup \{s\}) \setminus \{j\},\\ u_q & \text{if } q \in N \setminus (J \cup \{s\}). \end{cases}$$

GENF procedure to escape from inefficient local solutions in StQPs


Although strictly increasing objective values are guaranteed as trajectories under (2) or (3) are followed, one could get stuck in an inefficient local solution of (1). One possibility to escape is the genetic engineering via negative fitness (GENF) approach [1] described in the sequel. From the properties above, a strict local solution x of (1) must be a global one if all $x_i > 0$. Consequently, at an inefficient local solution necessarily $x_i = 0$ for some i. In the usual genetic interpretation of the dynamics (2) and (3), this means that some alleles die out during the natural selection process and are therefore unfit in the environment currently prevailing. The escape step artificially re-introduces some alleles which would have gone extinct during natural selection, and restarts with a smaller subproblem which yields an improvement if x is inefficient; see the table above.

In view of the possible combinatorial explosion in effort with an increasing number of variables, this dimension-reducing strategy seems promising: if k is the size of S, the above result yields a series of km StQPs in n − m variables rather than in n. We are now ready to describe the algorithm, which stops after finitely many repetitions since it yields strict local solutions with strictly increasing objective values [1].


1 | Start with $x(0) = [1/n, \dots, 1/n]^\top$ or nearby, and iterate (3) until convergence;

2 | the limit $x = \lim_{t \to \infty} x(t)$ is a strict local solution with probability one; call the GENF procedure to improve the objective, if possible; denote the improving point by $\tilde x$;

3 | repeat 1), starting with $x(0) = \tilde x$.

Replicator dynamics algorithm for StQPs
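Steps 1 and 2 of the algorithm can be sketched in Python. The sketch makes two assumptions. First, it takes the dynamics (3) referred to above to be the discrete-time replicator dynamics $x_i(t+1) = x_i(t)\,(Rx(t))_i / x(t)^\top R x(t)$. Second, it requires a symmetric R with positive entries. The GENF call of step 2 is omitted, and the function name and stopping rule are illustrative choices.

```python
import numpy as np


def replicator_dynamics(R, x0=None, tol=1e-12, max_iter=100_000):
    """Discrete-time replicator dynamics on the standard simplex (assumed form of (3)).

    Iterates x_i <- x_i * (R x)_i / (x^T R x). Assumes R is symmetric with
    strictly positive entries, so every iterate stays in the simplex.
    """
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    # Step 1: start at the barycenter [1/n, ..., 1/n] unless a start is given.
    x = np.full(n, 1.0 / n) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Rx = R @ x
        x_new = x * Rx / (x @ Rx)          # the objective x^T R x never decreases
        if np.abs(x_new - x).sum() < tol:  # stop when the iterates settle
            return x_new
        x = x_new
    return x


R = np.array([[2.0, 1.0], [1.0, 1.0]])
x = replicator_dynamics(R)
print(x, x @ R @ x)  # approaches the vertex e_1, where the objective is 2
```

The multiplicative update keeps $e^\top x = 1$ automatically, which is why no projection onto Δ is needed.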


A different approach towards global solutions of StQPs uses familiar branch and bound schemes. For ease of exposition, now consider the minimization StQP

$$\min\{x^\top Q x : x \in \Delta\}, \qquad (4)$$

and assume without loss of generality that Q has only positive entries (otherwise replace Q with $Q + \gamma ee^\top$, where γ is suitably large). If one applies a usual simplicial partition [2] (cf. also > Simplicial decomposition) to Δ, all subproblems are again StQPs. To obtain lower bounds for these problems, convex minorants of the objective $x^\top Q x$ on Δ may be used, e.g. quadratic forms $x^\top F x$ with F positive semidefinite (or some related matrix F which ensures that the minorant is convex over Δ only), where F is chosen such that the gap between the objectives is small. This can be accomplished by requiring $\operatorname{diag} F = \operatorname{diag} Q$ and that $\sum_{i,j \in N} [q_{ij} - f_{ij}]$ is small, while the minorant condition is guaranteed by the requirement $f_{ij} \le q_{ij}$ for all $i, j \in N$. Therefore one arrives at a semidefinite optimization problem (SDP; cf. > Semidefinite programming: Optimality conditions and stability), which can be solved by the usual methods. The resulting matrix F then gives a convex problem, so that the desired lower bound for (4), $\min\{x^\top F x : x \in \Delta\}$, can be obtained efficiently, e.g. via local search techniques or linear complementarity approaches (cf. also > Interval analysis: Eigenvalue bounds of interval matrices). For details and results see [3].
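Below is a minimal sketch of how such a lower bound might be computed once F is available. The SDP step that produces F is not shown, and the function name is hypothetical. The sketch uses Frank–Wolfe iterations, which is a swapped-in technique for the convex subproblem rather than the local-search or linear-complementarity methods named above. It returns the certified bound $f(x) + \min_i \nabla_i f(x) - \nabla f(x)^\top x$, which holds at every iterate x because $f(x) = x^\top F x$ is convex.

```python
import numpy as np


def convex_lower_bound(F, iters=500):
    """Certified lower bound for min{x^T F x : x in the simplex}, F positive semidefinite.

    Frank-Wolfe with exact line search; by convexity,
    min f >= f(x) + min_i grad_i - grad^T x at every iterate x.
    """
    F = np.asarray(F, dtype=float)
    n = F.shape[0]
    x = np.full(n, 1.0 / n)                 # start at the barycenter
    best = -np.inf
    for _ in range(iters):
        grad = 2.0 * F @ x
        i = int(np.argmin(grad))            # best vertex e_i of the linearization
        best = max(best, x @ F @ x + grad[i] - grad @ x)
        d = -x.copy()
        d[i] += 1.0                         # direction e_i - x
        curv = d @ F @ d
        # Exact minimizer of f(x + s d) over s in [0, 1]; grad @ d <= 0 by choice of i.
        step = 1.0 if curv <= 0 else min(1.0, max(0.0, -(grad @ d) / (2.0 * curv)))
        x = x + step * d
    return best
```

Because $f_{ij} \le q_{ij}$ and $x \ge o$ on Δ, we have $x^\top F x \le x^\top Q x$. The returned value therefore also bounds (4) from below, and more iterations can only tighten it.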


See also 


> Complexity Theory: Quadratic Programming 

> Quadratic Assignment Problem 

> Quadratic Fractional Programming: Dinkelbach 
Method 

> Quadratic Knapsack 

> Quadratic Programming with Bound Constraints 

> Quadratic Programming Over an Ellipsoid 

> Standard Quadratic Optimization Problems: 
Applications 

> Standard Quadratic Optimization Problems: Theory 
A standard quadratic optimization problem (StQP) consists of finding (global) maximizers of a quadratic form over the standard simplex in n-dimensional Euclidean space $\mathbb{R}^n$,

$$\Delta = \{x \in \mathbb{R}^n : x_i \ge 0 \text{ for all } i \in N,\ e^\top x = 1\},$$

where $N = \{1, \dots, n\}$; $^\top$ denotes transposition; and $e = \sum_{i \in N} e_i = [1, \dots, 1]^\top \in \mathbb{R}^n$, with $\{e_i : i \in N\}$ the vertices of Δ. Hence a StQP can be written as a (global) quadratic optimization problem of the form

$$\max\{x^\top R x : x \in \Delta\}, \qquad (1)$$

where R is an arbitrary symmetric n × n matrix.

An important application for StQPs is the search for a maximum weight clique arising in computer vision, pattern recognition and robotics (see [2] for a more detailed account). Consider an undirected graph $G = (N, \mathcal{E})$ with n nodes, and a weight vector $w = [w_1, \dots, w_n]^\top$ of positive weights $w_i$ associated to the nodes $i \in N$. A clique S is a subset of N which corresponds to a complete subgraph of G (i.e. any pair of different nodes in S is an edge in $\mathcal{E}$). A clique S is said to be maximal if there is no larger clique containing S. Every clique S in G has a weight $W(S) = \sum_{i \in S} w_i$. The maximum weight clique problem (MWCP) consists of finding a clique in the graph which has largest total weight. The classical (unweighted) maximum clique problem is a special case with w = e. To reformulate the MWCP as a StQP, one may exploit an idea of L. Lovász in considering the following class of symmetric n × n matrices: let

$$\mathcal{C}(\mathcal{E}) = \{C = [c_{ij}]_{i,j \in N} : c_{ij} = 0 \text{ if } (i,j) \in \mathcal{E}\},$$

as well as $\mathcal{C}_+(G) = \{C \in \mathcal{C}(\mathcal{E}) : C^\top = C \text{ and } c_{ij} \ge c_{ii} + c_{jj} \text{ if } i \neq j \text{ and } (i,j) \notin \mathcal{E}\}$, and form the class

$$\mathcal{C}(G, w) = \left\{C \in \mathcal{C}_+(G) : c_{ii} = \frac{1}{2w_i} \text{ for all } i \in N\right\}.$$

Now consider the (minimization) StQP

$$\min\{x^\top C x : x \in \Delta\} \qquad (2)$$

for some matrix $C \in \mathcal{C}(G, w)$. Given a subset $S \subseteq N$, define the S-face of Δ as

$$\Delta_S = \{x \in \Delta : x_i = 0 \text{ if } i \notin S\}$$

and its weighted barycenter as $x^S = \sum_{i \in S} (w_i / W(S))\, e_i \in \Delta_S$. Then the following assertions hold [2]:

• A point $x \in \Delta$ is a local solution to problem (2) if and only if $x = x^S$, where S is a maximal clique.

• A vector $x \in \Delta$ is a global solution to problem (2) if and only if $x = x^S$, where S is a maximum weight clique.

• Moreover, all local (and hence global) solutions to (2) are strict.

Note that a different class used in [3] does not share these properties. The class $\mathcal{C}(G, w)$ is isomorphic to the positive orthant in $\binom{n}{2} - |\mathcal{E}|$ dimensions, where $|\mathcal{E}|$ is the cardinality of $\mathcal{E}$. This class is a polyhedral pointed cone with its apex given by the matrix $C^{0,w}$ with entries

$$c^{0,w}_{ij} = \begin{cases} \dfrac{1}{2w_i} & \text{if } i = j,\\[4pt] \dfrac{1}{2w_i} + \dfrac{1}{2w_j} & \text{if } i \neq j \text{ and } (i,j) \notin \mathcal{E},\\[4pt] 0 & \text{otherwise.} \end{cases}$$

In the unweighted case w = e, we get $C^{0,e} = \tfrac12 I + A_{\bar G}$, where $A_{\bar G}$ is the adjacency matrix of the complement graph $\bar G$ and I is the n × n identity matrix. Therefore (2) can be seen as a regularized generalization of the original approach of T.S. Motzkin and E.G. Straus [7].
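The apex matrix and the barycenters can be checked numerically. The Python sketch below builds $C^{0,w}$ from the formula above and evaluates the objective of (2) at a clique barycenter; the function names are hypothetical, and nodes are numbered from 0. Since $c_{ij} = 0$ on edges, a short calculation gives $(x^S)^\top C x^S = \sum_{i \in S} (w_i / W(S))^2 / (2w_i) = 1/(2W(S))$. Smaller objective values therefore correspond to heavier cliques.

```python
import numpy as np


def apex_matrix(w, edges):
    """Apex C^{0,w} of C(G, w) for positive weights w and an edge list (0-based nodes)."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    adjacent = np.zeros((n, n), dtype=bool)
    for i, j in edges:
        adjacent[i, j] = adjacent[j, i] = True
    half_inv = 1.0 / (2.0 * w)                    # the values 1/(2 w_i)
    C = half_inv[:, None] + half_inv[None, :]     # 1/(2w_i) + 1/(2w_j) off the diagonal
    C[adjacent] = 0.0                             # c_ij = 0 on edges
    np.fill_diagonal(C, half_inv)                 # c_ii = 1/(2 w_i)
    return C


def barycenter(S, w):
    """Weighted barycenter x^S of the S-face: x_i = w_i / W(S) for i in S."""
    w = np.asarray(w, dtype=float)
    idx = list(S)
    x = np.zeros(len(w))
    x[idx] = w[idx] / w[idx].sum()
    return x


w = [1.0, 2.0, 1.0]                 # path graph 0-1-2
C = apex_matrix(w, [(0, 1), (1, 2)])
x = barycenter([0, 1], w)
print(x @ C @ x)                    # 1/(2 W(S)) = 1/6 up to rounding
```

For the path 0–1–2 with w = [1, 2, 1], both maximal cliques {0, 1} and {1, 2} have weight 3, so the objective at either barycenter is 1/6.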

Another application of StQPs is concerned with the mean/variance portfolio selection problem (see, e.g. [6]; > Portfolio selection and multicriteria analysis), which can be formalized as follows. Suppose there are n securities to invest in, at amounts expressed in relative shares $x_i \ge 0$ of an investor's budget. Thus, the budget constraint reads $e^\top x = 1$, and the set of all feasible portfolios (investment plans) is given by Δ. Now, given the expected return $m_i$ of security i during the forthcoming period, and an n × n covariance matrix V across all securities, the investor faces the multi-objective problem to maximize the expected return $m^\top x$ and simultaneously minimize the risk $x^\top V x$ associated with her decision x.

One of the most popular approaches to this type of problem in general applications is that the user prespecifies a parameter β which, in her eyes, balances the benefit of high return and low risk, i.e. consider the parametric QP

$$\max\{f_\beta(x) = m^\top x - \beta\, x^\top V x : x \in \Delta\}.$$


For fixed β, this is a StQP. However, the question remains how to choose β. In finance applications, the notion of market portfolio is used to determine a reasonable value for this parameter. This emerges more or less from an exogenous artefact, namely by introducing a completely risk-free asset which is used to scale return versus risk [5]. An alternative, purely endogenous derivation of the market portfolio could use a result of M.J. Best and B. Ding [1], who consider the problem


1 
ae ple (x), (3) 
and show how optimal solutions $(\beta^*, x^*)$ for (3) emerge from a single StQP (1) with, e.g., $R = 2mm^\top - V$.
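The reason a fixed β yields a StQP can be made explicit. On Δ we have $e^\top x = 1$, hence $m^\top x = (m^\top x)(e^\top x) = x^\top (m e^\top) x$. Symmetrizing gives $R_\beta = \tfrac12(m e^\top + e m^\top) - \beta V$ in (1). The following is a minimal Python sketch of this homogenization; the function name is hypothetical.

```python
import numpy as np


def portfolio_stqp_matrix(m, V, beta):
    """Symmetric R_beta with x^T R_beta x = m^T x - beta * x^T V x whenever e^T x = 1."""
    m = np.asarray(m, dtype=float)
    e = np.ones_like(m)
    # On the simplex m^T x = (m^T x)(e^T x) = x^T (m e^T) x; symmetrize m e^T.
    return 0.5 * (np.outer(m, e) + np.outer(e, m)) - beta * np.asarray(V, dtype=float)
```

Any StQP routine for (1) can then be run on $R_\beta$ for each candidate β.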

A general application of StQPs arises if one applies branch and bound schemes with simplicial partitions [4] (cf. also > Simplicial decomposition) to general quadratic problems of the form

$$\max\left\{g(x) = \tfrac12 x^\top Q x + c^\top x : x \in M\right\},$$

where $M = \{x \in \mathbb{R}^n : Ax \le b\}$ with A an m × n matrix and Q a symmetric n × n matrix. A subproblem then is of the form $\max\{g(x) : x \in P\}$ with $P \cap M \neq \emptyset$ and $P = \operatorname{conv}\{v_0, \dots, v_n\}$ for some points $v_i \in \mathbb{R}^n$ (if all vertices of M are easy to determine, one could even take them rather than the $v_i$). With the n × (n + 1) matrix $U = [v_0, \dots, v_n]$, the subproblem reduces to the StQP

$$\max\left\{\tfrac12 y^\top R y : y \in \Delta\right\},$$

where $R = U^\top Q U + e c^\top U + U^\top c e^\top$ is a symmetric (n + 1) × (n + 1) matrix and $\Delta \subset \mathbb{R}^{n+1}$. Efficient bounds can thus be obtained with one of the algorithms for obtaining (local) solutions to a StQP.
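The reduction to a StQP can be checked numerically. With $x = Uy$ and $e^\top y = 1$, one has $y^\top (e c^\top U + U^\top c e^\top) y = 2 c^\top x$, hence $g(Uy) = \tfrac12 y^\top R y$. The following is a minimal Python sketch; the function name is hypothetical.

```python
import numpy as np


def simplex_subproblem_matrix(Q, c, U):
    """R = U^T Q U + e c^T U + U^T c e^T for the subproblem over conv{v_0, ..., v_n}.

    The columns of U are the points v_i; then g(U y) = 0.5 * y^T R y on the simplex.
    """
    Q = np.asarray(Q, dtype=float)
    c = np.asarray(c, dtype=float)
    U = np.asarray(U, dtype=float)
    e = np.ones(U.shape[1])
    Utc = U.T @ c                      # the vector U^T c
    return U.T @ Q @ U + np.outer(e, Utc) + np.outer(Utc, e)
```

For example, with $v_0 = (0, 0)$, $v_1 = (1, 0)$ and $v_2 = (0, 1)$, U is [[0, 1, 0], [0, 0, 1]]. The identity then holds for every y in the simplex of $\mathbb{R}^3$.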
A standard quadratic optimization problem (StQP) consists of finding (global) maximizers of a (possibly indefinite) quadratic form over the standard simplex Δ in n-dimensional Euclidean space $\mathbb{R}^n$,

$$\Delta = \{x \in \mathbb{R}^n : x_i \ge 0 \text{ for all } i \in N,\ e^\top x = 1\},$$

where $N = \{1, \dots, n\}$; $^\top$ denotes transposition; and $e = [1, \dots, 1]^\top \in \mathbb{R}^n$. Hence a StQP can be written as a (global) quadratic optimization problem of the form

$$\max\{x^\top R x : x \in \Delta\}, \qquad (1)$$

where R is an arbitrary symmetric n × n matrix. Since the maximizers of (1) remain the same if R is replaced with $R + \gamma ee^\top$, where γ is an arbitrary constant, one may assume without loss of generality that all entries of R are positive. Furthermore, the question of finding maximizers of a general quadratic function $x^\top Q x + 2c^\top x$ over Δ can be homogenized in a similar way by considering the rank-two update $R = Q + ec^\top + ce^\top$ in (1), which has the same objective values on Δ.

StQPs arise in procedures which enable an escape from inefficient local solutions of general quadratic optimization problems (QPs). Consider the general quadratic maximization problem

$$\max\left\{f(x) = \tfrac12 x^\top Q x + c^\top x : x \in M\right\}, \qquad (2)$$

where $M = \{x \in \mathbb{R}^n : Ax \le b\}$ with A an m × n matrix and Q a symmetric n × n matrix. To formulate a characterization of global optimality of a Karush–Kuhn–Tucker point x for (2), first add a trivial nonbinding constraint, i.e. the most elementary strict inequality 0 < 1, to obtain slacks u as follows. Denote by $a_i^\top$ the ith row of A and put $a_0 = o$. Similarly put $b_0 = 1$ and enrich $A = [a_0 \,|\, A^\top]^\top = [o, a_1, \dots, a_m]^\top$ as well as $b = [b_0 \,|\, b^\top]^\top = [1, b_1, \dots, b_m]^\top$. Finally, define $u = b - Ax \ge o$. Then perform, for any $i \in \{0, \dots, m\}$, a rank-one update of A and a rank-two update of Q, using the current gradient $g = \nabla f(x) = Qx + c$ of the objective:

$$D_i = u a_i^\top - u_i A, \qquad Q_i = -a_i g^\top - g a_i^\top - u_i Q.$$

This gives a symmetric n × n matrix $Q_i$ and a matrix $D_i$ which is effectively m × n since its ith row is zero.

Denoting by J(x) the set of all nonbinding constraints, the following result is proved in [1]:


Theorem 1 A Karush–Kuhn–Tucker point x of (2) is a global solution if and only if for all $i \in J(x) = \{i \in \{0, \dots, m\} : u_i > 0\}$,

$$v^\top Q_i v \ge 0 \quad \text{if } D_i v \ge o. \qquad (3)$$

If $v^\top Q_i v < 0$ for some v with $D_i v \ge o$, then

$$\tilde x = x + \lambda v \qquad (4)$$

is an improving feasible point for $\lambda = u_i / (a_i^\top v)$ (if $a_i^\top v = 0$, i.e. $\lambda = +\infty$, this means that (2) is unbounded).
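Theorem 1 translates into a simple certificate check. The Python sketch below enriches A and b with the trivial constraint 0 < 1, forms $D_i$ and $Q_i$ as defined above, and returns the improving point (4) when v violates (3) for the index i. The function name `escape_step` and the tolerance are illustrative assumptions. The sketch does not search for such a v; finding one is the StQP subproblem discussed below.

```python
import numpy as np


def escape_step(Q, c, A, b, x, i, v, tol=1e-12):
    """Return x + lam*v if v certifies that (3) fails for index i, else None."""
    Q, c, A, b, x, v = (np.asarray(t, dtype=float) for t in (Q, c, A, b, x, v))
    A_ = np.vstack([np.zeros(A.shape[1]), A])   # enriched A: row 0 is a_0 = o
    b_ = np.concatenate([[1.0], b])             # enriched b: b_0 = 1 (constraint 0 < 1)
    u = b_ - A_ @ x                             # slacks u = b - Ax >= o
    g = Q @ x + c                               # gradient of f at x
    a = A_[i]
    D = np.outer(u, a) - u[i] * A_              # D_i = u a_i^T - u_i A
    Qi = -np.outer(a, g) - np.outer(g, a) - u[i] * Q
    if u[i] <= 0 or np.any(D @ v < -tol) or v @ Qi @ v >= -tol:
        return None                             # i not in J(x), or v is no certificate
    if abs(a @ v) <= tol:
        raise ValueError("lambda = +inf: problem (2) is unbounded along v")
    return x + (u[i] / (a @ v)) * v             # improving feasible point (4)
```

For instance, consider maximizing $\tfrac12 x^2$ over $[-1, 2]$, i.e. A = [[1], [−1]] and b = [2, 1]. The KKT point x = −1 is not global. With i = 1 and v = 1 one gets $u_1 = 3$, $v^\top Q_1 v = -1 < 0$ and λ = 3, which gives the improving point x̃ = 2.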


Determining whether or not (3) is satisfied amounts to the question whether or not $\max\{v^\top (-Q_i) v : D_i v \ge o\} \le 0$. Now this homogeneous problem is decomposable [2,3] into problems of the form $\max\{x^\top R x : x \ge o\}$, where the constraint $e^\top x = \sum_i x_i = 1$ can be added without loss of generality, rendering a StQP. In fact, in order to determine an improving feasible direction (4), it is not necessary to solve the latter problem to optimality; it suffices to determine a feasible point $x \in \Delta$ with $x^\top R x > 0$.

If the original problem (2) is itself already a StQP, then all checks of (3) can be reduced to a single one: if $x \in \Delta$ is any feasible point, then x is a global maximizer of $x^\top Q x$ over Δ if and only if the matrix $\bar Q = (x^\top Q x)\, ee^\top - Q$ satisfies $v^\top \bar Q v \ge 0$ whenever $v \ge o$, i.e. $\bar Q$ is copositive.
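Testing copositivity is hard in general, but a crude refutation is easy to sketch. For every $v \in \Delta$ we have $v^\top \bar Q v = x^\top Q x - v^\top Q v$. Any grid point v on the simplex with $v^\top \bar Q v < 0$ is therefore a strictly better feasible point. The Python sketch below scans the points whose coordinates are multiples of 1/k; the function names and the grid resolution k are illustrative assumptions. It can refute global optimality but never certify it, because a finite grid may miss negative directions.

```python
import itertools

import numpy as np


def simplex_grid(n, k):
    """Yield all points of the standard simplex whose coordinates are multiples of 1/k."""
    for bars in itertools.combinations(range(n + k - 1), n - 1):   # stars and bars
        cuts = (-1,) + bars + (n + k - 1,)
        yield np.array([cuts[j + 1] - cuts[j] - 1 for j in range(n)], dtype=float) / k


def copositivity_counterexample(M, k=20, tol=1e-12):
    """Return a grid point v >= o with v^T M v < 0, or None if the grid finds none."""
    M = np.asarray(M, dtype=float)
    for v in simplex_grid(M.shape[0], k):
        if v @ M @ v < -tol:
            return v
    return None


def refutes_global_max(Q, x, k=20):
    """Search for v with v^T Qbar v < 0, where Qbar = (x^T Q x) e e^T - Q."""
    Q = np.asarray(Q, dtype=float)
    x = np.asarray(x, dtype=float)
    Qbar = (x @ Q @ x) * np.ones_like(Q) - Q
    return copositivity_counterexample(Qbar, k)
```

For Q = diag(1, 2), the vertex $e_1$ is refuted by $v = e_2$, whereas for $x = e_2$ the matrix $\bar Q$ has only nonnegative entries and no counterexample exists.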

The close connection between StQPs and copositivity becomes evident if the usual semidefinite programming (SDP) approach is enlarged to recast a StQP into a linear optimization problem on a cone K. This cone is the (pre)dual of the cone K* of all copositive symmetric n × n matrices, with respect to the duality $\langle R, S\rangle = \operatorname{trace}(RS)$ operating on pairs (R, S) of such matrices. This formulation allows one to employ interior point algorithms (cf. also > Sequential quadratic programming: Interior point methods for distributed optimal control problems; > Interior point methods for semidefinite programming), similar to the methods used in SDP. Both cones K* and K have nonempty interiors, and the latter can be described as follows [4,6]:

$$K = \operatorname{conv}\{xx^\top : x \in \mathbb{R}^n,\ x \ge o\},$$

the convex hull of all symmetric rank-one matrices, i.e. dyadic products, generated by nonnegative vectors. Note that dropping the nonnegativity requirement, we would arrive at the positive semidefinite case. Now let $E = ee^\top$ be the n × n matrix of all ones. The extremal points of the set $\mathcal{L} = \{X \in K : \langle E, X\rangle = 1\}$ are exactly the dyadic products $xx^\top$ with $x \in \Delta$. Hence a maximizer of $\langle R, X\rangle$ over $\mathcal{L}$ can be found which is of this form, and the StQP (1) is equivalent to the linear problem

$$\max\{\langle R, X\rangle : X \in K,\ \langle E, X\rangle = 1\}.$$


It is easy to see that the dual formulation [5] of this problem is

$$\min\{\gamma \in \mathbb{R} : \gamma E - R \in K^*\},$$

which is the task to find the smallest γ such that $\gamma E - R$ is copositive. Thus the dual problem is related to the question of eigenvalue bounds (replace E with the identity matrix and 'copositive' with 'positive semidefinite').
Introduction


Historically, the use of new technologies in agriculture and related sciences has lagged behind that in the industrial sector. A technology that is already part of the mainstream in the industrial sector usually takes time to be accepted by the community of researchers in agriculture-related areas. One of the reasons for this technological gap between the industrial and agricultural sectors could be the modest investment made in agriculture compared with the considerable amounts and effort the industrial sector invests in new technologies. Another reason could be the relatively slow pace at which university departments that prepare future specialists in agriculture update their curricula with new technologies [19].

The level of complexity of the problems that researchers in agriculture-related areas need to address is constantly increasing. The advent of the Internet forces researchers to move their models and applications to new programming platforms. As agriculture occurs in time and space, researchers need to take into account the spatial and temporal aspects of a problem in addition to its technical issues. Therefore, researchers in agriculture-related fields are obliged to address more and more complex problems. Solving them requires wide collaboration between specialists of different disciplines and the integration of different technologies. Thus, the software systems they need to develop and maintain are complex and challenging.

In order to successfully overcome the challenges of 
developing flexible and complex agricultural systems, 
researchers are required to master and use modern soft- 
ware engineering disciplines. The following is a short 
inventory of some of the most advanced software engi- 
neering techniques used in developing software systems 
in agriculture and related fields. 


The Unified Modeling Language 


The Unified Modeling Language (UML) was born as 
a support for modeling software using the object- 
oriented programming paradigm. Before UML, several 
object-oriented modeling languages were used, each 
with its own set of notations, and there was some con- 
fusion among the object-oriented community about 
which language to use [4]. By the mid-1990s, an impor- 
tant event had occurred that impacted the development 
of object-oriented modeling languages in a very positive 
manner. Grady Booch, Ivar Jacobson and James Rumbaugh joined Rational Software (http://www-306.ibm.com/software/rational/) with the goal of creating a standard modeling language for specifying, visualizing, constructing, and documenting all the artifacts of a software system [4].

According to Wikipedia, in the field of software 
engineering, the UML is a nonproprietary specifica- 
tion language for object modeling. UML is a general- 
purpose modeling language that includes a standard- 
ized graphical notation used to create an abstract 
model of a system, referred to as a UML model. 
UML is extendable, offering the following mechanisms for customization: profiles and stereotypes (http://en.wikipedia.org/wiki/Unified_Modeling_Language). The current version, UML 2.0, contains 13 types of diagrams that can be grouped into three categories: structure, behavior, and interaction diagrams. These diagrams are used to express static and dynamic aspects of the system under study. UML is the Object Management Group's most-used specification, and the way the world models not only application structure, behavior, and architecture, but also business processes and data structures.
Examples of UML Models in Agriculture


The use of UML in modeling agricultural systems is 
a recent phenomenon. Initially, the use of UML was 
to make a general presentation of the application’s 
model. Hutchings used a simple class diagram to rep- 
resent relationships between classes in a framework 
for grazing livestock. Drouet and Pages [7] discussed 
the benefits of using the object-oriented paradigm and 
UML to express the relationships between growth and 
assimilate partitioning from plant organs to the whole 
plant. This use of UML was limited, as the diagrams presented lacked many details, which makes it difficult to understand the role of classes/objects and their behavior. A good model should represent not only the relationships between classes, but also their structure and behavior and the role of each class involved in an association.

Later, a number of authors made UML part of their modeling approach. The authors of [21] used UML to analyze several irrigation-scheduling models and water-balance models to identify common elements and relationships in order to propose a general template for creating new models and maintaining existing ones. Papajorgji and
Pardalos [19] presented a detailed UML and object- 
oriented approach to model software for agricultural 
systems. Pinet et al. [22] used UML and Object Con- 
straint Language (OCL) to model spatial constraints of 
an environmental information system monitoring the 
spreading of organic matter. Hasenohr and Pinet [12] 
used UML and OCL to develop a spatial decision sup- 
port system to implement common agricultural pol- 
icy. Martin and Vigler [13] used UML to set up a shared 
geographic information system for agricultural qual- 
ity and environmental management. Miralles [15] used 
UML to present geographic information system (GIS) 
patterns that express relationships between spatial and 
temporal concepts and to automatically generate the 
corresponding code. Figure 1 shows the class dia- 
gram of the irrigation-scheduling model as presented 
in [21]. 


The OCL 


OCL is a notational language, a subset of UML that 
allows specifying constraints over entities represent- 
ing concepts from the problem domain [17,27]. It in- 
tegrates notations close to a spoken language to ex- 


press constraints. OCL was first developed by a group 
of IBM scientists around 1995 during a business mod- 
eling project. It was influenced by Syntropy, which is 
an object-oriented modeling language that makes heavy 
use of mathematical concepts. OCL is now part of the 
UML standard supported by the OMG and it plays 
a crucial role in the model-driven architecture (MDA) 
approach [24]. 


Examples of Using OCL in Agriculture 


OCL is used to express spatial constraints in an 
environmental information system developed by re- 
searchers at Cemagref, France, and is described in detail 
in [22]. This system monitors the spreading of organic 
matter in France. 

Spreading on croplands is an excellent way of recycling organic matter (manure, sewage sludge, etc.), but the agricultural practices used require a demanding monitoring system. An excessive and ill-planned spreading practice could damage soils through pollution. It is therefore very important to model a set of spatial constraints that define precisely where organic matter can be spread; for example, organic matter can never be spread inside certain protected natural areas. Designing an environmental information system that controls the spreading of organic matter requires that some spatial constraints be strictly respected. Figure 2 shows the UML model for the spreading problem.

The Allowed_Area class models the area on which 
the regulation allows the spreading of organic matter. 
The Spread_Area class models the area on which the 
spreading has already been carried out by the farm- 
ers. In the ideal case, the organic matter is organized 
into groups before being spread on the fields; each 
group has an ID in order to improve traceability. The 
spreading model presented in this example includes 
only one of the potential organic matter providers (Pu- 
rification_Station). 

Constraints can be expressed using OCL. The fol- 
lowing constraint says that a spread area should not 
overlap with its associated allowed area: 
context Spread_Area inv: 
not (self.geometry.overlap(self.spread_on.geometry)) 

The following constraint says that all allowed areas 
must be spatially disjoint from built areas: 
context Allowed_Area inv:
1. Built_Area.allInstances->forAll(built_area_instance |
2.   built_area_instance.geometry->forAll(building |
3.     self.geometry.disjoint(building)))

State of the Art in Modeling Agricultural Systems, Figure 1
Class diagram representation of the irrigation-scheduling model

State of the Art in Modeling Agricultural Systems, Figure 2
Agricultural spreading model

In the above constraint, set-based operations are needed because the geometry of a built area can be composed of several simple regions (i.e., several buildings). A complete expression of this constraint in natural language is “1. for each built_area_instance in the Built_Area class and 2. for each building in the built_area_instance geometry, 3. the geometry of an Allowed_Area instance (denoted by self) must always be spatially disjoint from building.” Code can be automatically generated from OCL constraints [6,22]. This code also allows the quality of the data stored in the database to be evaluated, i.e., it verifies whether the stored data satisfy the constraints defined using OCL.
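The Allowed_Area invariant can be mimicked in ordinary code to see what such a generated data-quality check amounts to. The Python sketch below is hypothetical and is not the code produced by the tools cited above. It replaces the Region type by axis-aligned rectangles, where two rectangles count as disjoint when their interiors do not intersect, and it renders the nested forAll of the OCL constraint as nested generator expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for the Region type: an axis-aligned rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def disjoint(self, other):
        # Interiors do not intersect (shared edges are allowed).
        return (self.xmax <= other.xmin or other.xmax <= self.xmin
                or self.ymax <= other.ymin or other.ymax <= self.ymin)


@dataclass
class BuiltArea:
    geometry: list  # several simple regions, one per building


@dataclass
class AllowedArea:
    aa_id: str
    geometry: Rect


def allowed_area_inv(area, built_areas):
    """context Allowed_Area inv: 'area' plays the role of OCL's self."""
    return all(area.geometry.disjoint(building)              # 3.
               for built_area_instance in built_areas        # 1.
               for building in built_area_instance.geometry) # 2.


def violations(allowed_areas, built_areas):
    """Data-quality check: ids of stored allowed areas that break the invariant."""
    return [a.aa_id for a in allowed_areas if not allowed_area_inv(a, built_areas)]
```

Running `violations` over all stored Allowed_Area records lists exactly the records that fail the constraint.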


The Design Patterns 


Well before software engineers started using patterns, 
an architect named Christopher Alexander wrote two 
books that describe the use of patterns in building ar- 
chitecture and urban planning. The first book is ti- 
tled A Pattern Language: Towns, Buildings, Construc- 
tion [1], published in 1977. The second one is titled The 
Timeless Way of Building [2], published in 1979. These 
two books not only changed the way structures were 
built, but they had a significant impact in another not 
closely related field, the field of software engineering. 
According to Alexander [1], a pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that one can use this solution a million times over, without ever doing it the same way twice. Although Alexander refers to buildings and towns, his conclusion can be successfully applied to the process of object-oriented design.

Design patterns are well-proven solutions to a large number of recurring problems, devised by experienced designers so that they can easily be used by novice programmers. They came into use in the mid-1990s, when a group of four software engineers [10] wrote the book Design Patterns: Elements of Reusable Object-Oriented Software. The book had a significant impact on the way software design was carried out. A design pattern names, abstracts, and identifies the key aspects of a common design structure that make it useful for creating a reusable object-oriented design [10]. An architect uses prefabricated blocks to build complex constructions, and in the same way a programmer uses patterns to develop complex software. Using patterns makes the process of designing complex systems easier.

Design patterns are divided into three categories: 
creational, structural, and behavioral patterns. Cre- 
ational patterns deal with the process of creating ob- 
jects. They describe optimal ways of creating new ob- 
jects. Structural patterns describe how to compose 
classes or objects. Behavioral patterns describe how to 
distribute behavior among classes and how classes in- 
teract with each other. 


Example 1 of Using Design Patterns in Agriculture 


Very often programmers have to solve the same problem in different applications, regardless of the problem domain. An example of this type of problem is providing an application with the same type of data from different data sources, where the system has to decide at run time which data source to use. Such a problem can be solved using the strategy pattern.

The intent of the strategy pattern is to define a family of algorithms, encapsulate each one, and make them interchangeable [10]; therefore, algorithms can vary independently from the clients that use them. This pattern is useful in cases where several strategies are available and the right strategy is chosen at run time. To better understand the context in which the strategy pattern can be used, let us consider a simple simulation model as presented in [20]. In a crop simulation model, the weather data can be obtained from different sources, such as a text file, a database system, or an on-line system of weather stations. In a system developed in a traditional programming language such as FORTRAN, the ability to choose between several options requires the use of complex if-then-else or switch statements. All the available options are hardwired into the code, and as new data sources become available, using them requires changes to the code. Therefore, traditional programming languages offer rather limited and rigid solutions to this problem. A well-designed system should not only provide access to several sources of data, but should also provide ways of accessing new sources as they become available in the future, without affecting the existing system.

The object-oriented paradigm and the design pat- 
terns solve this problem by offering a flexible and ele- 
gant solution. Figure 3 shows classes that are involved 
in the strategy pattern as described in [19]. The Simu- 
lationController is a client that uses the weather data. 
The IWeatherDataProvider is an interface that repre- 
sents the common behavior of all classes providing 
access to a particular source of weather data. Simu- 
lationController has a unidirectional association with 
IWeatherDataProvider. The multiplicity of this associ- 
ation allows one controller to use one or no weather 
data provider. Classes WeatherDataFromFile, WeatherDataFromStation, and WeatherDataFromDatabase provide behavior for extracting data from a particular source such as a text file, a network of weather stations, or a database. These classes implement the same interface, IWeatherDataProvider; therefore, any one of them can be used to provide the weather data requested by SimulationController. Note that the user of the data, in this case SimulationController, does not have access to or knowledge of the concrete data providers; therefore, the data providers can change the data extraction algorithm without affecting the data user.

State of the Art in Modeling Agricultural Systems, Figure 3
Class diagram for the strategy pattern
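The structure in Figure 3 can be sketched in Python, with an abstract base class playing the role of the IWeatherDataProvider interface. Only the class names come from the figure. The method `daily_rainfall`, the `weather` table, and the callable standing in for the station network are hypothetical placeholders.

```python
import io
import sqlite3
from abc import ABC, abstractmethod


class IWeatherDataProvider(ABC):
    """Common interface of every weather data source (the strategy)."""

    @abstractmethod
    def daily_rainfall(self):
        """Return a list of daily rainfall values."""


class WeatherDataFromFile(IWeatherDataProvider):
    def __init__(self, stream):
        self.stream = stream  # any text stream holding one value per line

    def daily_rainfall(self):
        return [float(line) for line in self.stream if line.strip()]


class WeatherDataFromDatabase(IWeatherDataProvider):
    def __init__(self, connection):
        self.connection = connection  # DB-API connection with a 'weather' table

    def daily_rainfall(self):
        rows = self.connection.execute("SELECT rain FROM weather ORDER BY day")
        return [rain for (rain,) in rows]


class WeatherDataFromStation(IWeatherDataProvider):
    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable wrapping the station network

    def daily_rainfall(self):
        return list(self.fetch())


class SimulationController:
    """Client: uses zero or one provider and never depends on a concrete class."""

    def __init__(self, provider=None):
        self.provider = provider

    def total_rainfall(self):
        if self.provider is None:
            raise RuntimeError("no weather data provider configured")
        return sum(self.provider.daily_rainfall())


if __name__ == "__main__":
    controller = SimulationController(WeatherDataFromFile(io.StringIO("1.5\n2.0\n")))
    print(controller.total_rainfall())
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE weather (day INTEGER, rain REAL)")
    db.executemany("INSERT INTO weather VALUES (?, ?)", [(1, 3.0), (2, 0.5)])
    controller.provider = WeatherDataFromDatabase(db)  # swap the strategy at run time
    print(controller.total_rainfall())
```

Swapping the data source is a one-line change on the controller, and adding a new source means adding one class that implements the interface.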


Example 2 of Using Design Patterns in Agriculture 


While developing a GIS, designers pay particular atten- 
tion to the spatial properties of thematic concepts such 
as Plot, Spread_Area, etc. In GIS-based systems the spa- 
tial concepts that are manipulated the most are Point, 
Line, and Polygon. These concepts have their own char- 
acteristics and some of them can be combined to cre- 
ate other concepts; as an example, a Line can be repre- 
sented as a set of Point and a Polygon can be considered 
as a set of Point or as a succession of Line. Furthermore, 
the nature of the relationship between Point, Line, and 
Polygon is static; it never changes over time. The rela- 
tionship between these spatial concepts can be repre- 
sented by design patterns as described in [10]. 

Miralles [15] described a GIS design-pattern-based model that is a recurrent model for 1D, 2D, and 3D coordinate systems. Figure 4 shows the specific design pattern for a 2D coordinate system. This GIS model is structured around two composite patterns. The first composite pattern depicts the relationship between the spatial properties Point and Line. A Line is composed of at least two Linear Components, each of which can be a Point or a Line, so a Line may mix Points and Lines. The Point has two properties: an abscissa and an ordinate. The Line has its length as a property. The second composite pattern provides the possibility of representing a Polygon as a Polygonal Component, which can be either a Linear Component or a Polygon, or even a combination of the two. The polygonal properties are the perimeter and the surface. Since the simplest polygon (a triangle) is composed of three points or three line segments, the cardinality of the Polygonal Arrangement association must be at least 3. With this cardinality alone, however, the polygonal arrangement of two polygons cannot be represented. To allow it, the Two Polygons Arrangement association has been added.

The GIS design pattern based model describing the 
spatial properties currently used for GIS modeling 
could be considered as a structural pattern. Considering 
the static nature of the relationships among concepts 
involved in the pattern, code can be easily generated in 
any programming language. 
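As an illustration of how directly the first composite pattern maps to code, here is a minimal Python sketch. Class names follow Figure 4. The method names and the use of the shoelace formula for the surface are illustrative assumptions. Polygon is simplified to be built from linear components only, so the Two Polygons Arrangement and polygons nested inside polygons are omitted.

```python
import math
from abc import ABC, abstractmethod


class LinearComponent(ABC):
    @abstractmethod
    def points(self):
        """Ordered vertices of this component."""


class Point(LinearComponent):
    def __init__(self, x, y):
        self.x, self.y = x, y  # abscissa and ordinate

    def points(self):
        return [self]


class Line(LinearComponent):
    """Composite: a Line is made of at least two linear components."""

    def __init__(self, *components):
        if len(components) < 2:
            raise ValueError("a Line needs at least two linear components")
        self.components = components

    def points(self):
        return [p for comp in self.components for p in comp.points()]

    @property
    def length(self):
        pts = self.points()
        return sum(math.dist((a.x, a.y), (b.x, b.y)) for a, b in zip(pts, pts[1:]))


class Polygon:
    """Simplified: built from linear components, closed back to the first vertex."""

    def __init__(self, *components):
        self.vertices = [p for comp in components for p in comp.points()]
        if len(self.vertices) < 3:
            raise ValueError("a Polygon needs at least three points")

    def _edges(self):
        ring = self.vertices + self.vertices[:1]
        return zip(ring, ring[1:])

    @property
    def perimeter(self):
        return sum(math.dist((a.x, a.y), (b.x, b.y)) for a, b in self._edges())

    @property
    def surface(self):
        # Shoelace formula for a simple polygon.
        return abs(sum(a.x * b.y - b.x * a.y for a, b in self._edges())) / 2
```

Because Line accepts other Lines as components, nested structures such as `Line(Line(p, q), r)` work without any special casing. This is the main benefit of the composite pattern here.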


The MDA Approach 


The MDA is a framework for software development defined by the OMG [25]. At the center of this approach are models; the software development process is driven by constructing models representing the software under development. The MDA approach is often referred to as a model-centric approach, as it focuses on the business logic rather than on the implementation technicalities of the system in a particular programming environment. This separation allows both business knowledge and technology to continue to be developed without necessitating a complete rework of existing systems [14].

State of the Art in Modeling Agricultural Systems, Figure 4
Geographical information system design pattern based model for a 2D coordinate system

MDA uses UML to construct visual representations of models. UML is an industry standard for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system [4], and it has several advantages that make it well suited to be the heart of the MDA approach. First, by its nature, UML allows for developing models that are platform-independent. These models depict concepts from the problem domain and the relationships between them and then represent the concepts as objects provided with the appropriate data and behavior. A model specified with UML can be translated into any implementation environment. The valuable business and systems knowledge captured in models can then be leveraged, reused, shared, and implemented in any programming language [5]. A second advantage is that UML has built-in extension mechanisms that allow the creation of specialized, UML-based languages referred to as UML profiles [8]. If modeling agricultural systems requires special modeling artifacts, then an agricultural UML profile can be created and plugged into the UML core system.


The MDA approach consists of three levels of models, as shown in Fig. 5. As the figure shows, a set of transformations is needed to transform a model from the current level to the next one.

The approach starts with the construction of a conceptual diagram that represents our knowledge of the problem domain expressed through concepts, abstractions, and their relationships. Conceptual diagrams are the result of an activity referred to as conceptual modeling. Conceptual modeling can be defined as the process of organizing our knowledge of an application domain into hierarchical rankings or orderings of abstractions, in order to obtain a better understanding of the phenomena under consideration [7]. Conceptual diagrams
have the advantage of presenting concepts and rela- 
tionships in an abstract way, independent of any com- 
puting platform or programming language that may 
be used for their implementation. During this phase, 
the focus is on depicting the concepts of the system 
and providing them with the right data and behavior. 
The fact that the implementation technology may be 
Java, a relational database, or .NET is irrelevant at this 
point. Therefore, the intellectual capital invested in the 
model is not affected by changes in the implementation 
technologies. A conceptual model thus is a platform- 
independent model (PIM). 

State of the Art in Modeling Agricultural Systems, Figure 5
Transformations are applied to a model level to obtain the next level. PSM: platform-specific model

Because of the nature of a PIM (no implementation details are considered at this phase) and because the model construction is done visually using UML, the participation of domain specialists in the model construction process is greatly facilitated. The MDA approach frees domain specialists from the necessity of knowing a programming language in order to be active participants. PIMs are developed in UML, which is visual and uses plain English that can be easily understood by programmers and nonprogrammers alike [21]. A PIM is the only model that developers will have to create “by hand.” Executable models will be obtained automatically by applying a set of transformations to the PIM.

State of the Art in Modeling Agricultural Systems, Figure 6
Conceptual model for the crop simulation model

Figure 6 shows a PIM for a simple crop simulation model. Details on the implementation of this model can be found at http://mda.ifas.ufl.edu. Concepts from the simulation domain are depicted in an abstract manner and their relationships are presented. At the center of the model is the Simulator object, which has access to the entity objects Plant, Soil, and Weather. The nature of these relationships is a composition, meaning that it is Simulator’s responsibility to create instances of these objects and destroy them at the end of the simulation. Simulator is provided with a method named simulate(list of parameters) that runs the simulation using the list of parameters as initial values. Simulator plays the role of a control class; it sends the right message to the right object to carry out the simulation [19].
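The composition and control-class roles can be illustrated with a minimal Python sketch. The class names come from Fig. 6, and the message names mirror calculateRate and integrate. Everything else is hypothetical: the parameter keys, the constant temperature, and the leaf-appearance equation are placeholders, not the model behind the cited implementation.

```python
from dataclasses import dataclass


@dataclass
class Weather:
    temperature: float = 20.0      # hypothetical constant daily mean temperature (deg C)


@dataclass
class Soil:
    water_content: float = 0.3     # placeholder; the PIM only says Simulator owns a Soil


class Plant:
    def __init__(self, leaf_rate: float, base_temperature: float):
        self.leaf_rate = leaf_rate                  # hypothetical leaves per degree-day
        self.base_temperature = base_temperature
        self.leaves = 0.0
        self._daily_leaves = 0.0

    def calculate_rate(self, weather: Weather, soil: Soil) -> None:
        # Placeholder rate equation: leaf appearance proportional to thermal time.
        thermal_time = max(0.0, weather.temperature - self.base_temperature)
        self._daily_leaves = self.leaf_rate * thermal_time

    def integrate(self) -> None:
        self.leaves += self._daily_leaves


class Simulator:
    """Control class: creates its entities (composition) and sends them messages."""

    def simulate(self, parameters: dict) -> float:
        weather = Weather(parameters["temperature"])
        soil = Soil()
        plant = Plant(parameters["leaf_rate"], parameters["base_temperature"])
        for _day in range(parameters["days"]):
            plant.calculate_rate(weather, soil)
            plant.integrate()
        # The entity objects are local, so they are discarded when simulate() returns.
        return plant.leaves
```

Because Plant, Soil, and Weather are created inside `simulate` and never escape it, their lifetime is bounded by the simulation run, which is what the composition relationship prescribes.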

Providing objects of the conceptual diagram with 
behavior is one of the most exciting features of the 
MDA approach. In the world of the simulation mod- 
els, most of the behavior that objects should provide is 
expressed in the form of equations. Equations are con- 
structed in a declarative way using attributes of objects 
participating in the conceptual diagram. 

The simulation process is controlled by the behavior 
of the object Plant. A state-transition diagram is used 
to model the behavior of Plant [4]. This diagram shows 
the valid execution order of the services of the class and 
the set of possible lifecycles of Plant. Figure 7 shows 
the state-transition diagram of Plant. The diagram has two types of elements: states and transitions. States represent the different situations through which an object of type Plant can pass, depending on the value of
its attributes. Transitions represent executed services, 
events, or transactions, which produce state changes 
and modify the value of the object’s attributes. 
According to the state-transition diagram, Plant 
will remain in the state vegetative and will continue to 
receive messages calculateRate and integrate as long as 
the guard condition number of leaves > maximum num- 
ber of leaves is not satisfied. When the guard condition 
is satisfied Plant will move to state reproductive. For this 
transition, the source state is vegetative and the target 
state is reproductive. Plant will remain in the state re- 
productive as long as the guard condition cumulative 
thermal time > reproductive thermal time is not satis- 
fied. When the guard condition is satisfied, Plant will 
move to state mature and the simulation will terminate. 
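The state chart can be read as a small state machine. The Python sketch below encodes the three states and the two guard conditions described above. The rate parameters, and the assumption that cumulative thermal time is counted only from the start of the reproductive state, are ours and are not taken from the cited model.

```python
from enum import Enum


class Stage(Enum):
    VEGETATIVE = "vegetative"
    REPRODUCTIVE = "reproductive"
    MATURE = "mature"


class Plant:
    def __init__(self, max_leaves: float, reproductive_thermal_time: float,
                 leaf_rate: float = 0.1, base_temperature: float = 10.0):
        self.stage = Stage.VEGETATIVE
        self.max_leaves = max_leaves
        self.reproductive_thermal_time = reproductive_thermal_time
        self.leaf_rate = leaf_rate                 # hypothetical parameter
        self.base_temperature = base_temperature   # hypothetical parameter
        self.leaves = 0.0
        self.cumulative_thermal_time = 0.0
        self._daily_thermal_time = 0.0

    def calculate_rate(self, temperature: float) -> None:
        self._daily_thermal_time = max(0.0, temperature - self.base_temperature)

    def integrate(self) -> None:
        if self.stage is Stage.VEGETATIVE:
            self.leaves += self.leaf_rate * self._daily_thermal_time
            # Guard of the transition vegetative -> reproductive.
            if self.leaves > self.max_leaves:
                self.stage = Stage.REPRODUCTIVE
        elif self.stage is Stage.REPRODUCTIVE:
            self.cumulative_thermal_time += self._daily_thermal_time
            # Guard of the transition reproductive -> mature (final state).
            if self.cumulative_thermal_time > self.reproductive_thermal_time:
                self.stage = Stage.MATURE


def run(plant: Plant, temperatures) -> int:
    """Send calculateRate/integrate once per day until Plant is mature; return the day."""
    for day, temperature in enumerate(temperatures, start=1):
        plant.calculate_rate(temperature)
        plant.integrate()
        if plant.stage is Stage.MATURE:
            return day
    return -1  # did not mature within the given weather series
```

With `max_leaves=3.0`, `reproductive_thermal_time=25.0`, and a constant 20 °C, the plant becomes reproductive on day 4 and mature on day 7.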
MDA-based tools provide ample capabilities to check the correctness of the conceptual model, the behavior of its objects, and the relationships between them.

State of the Art in Modeling Agricultural Systems, Figure 7
State chart describing the behavior of Plant

An XML file is created that contains a detailed specification of the model, which code engines can use to generate code in several programming languages. Several scenarios are possible, since different parts of the system can be implemented in different languages. For example, the user can choose the C# environment for developing the user interface and a CORBA-EJB, Java-based environment for the implementation of the server. Because the conceptual model is detailed and precise, code generators can find all the information needed to translate the model into several programming environments. Besides the code representing objects of the conceptual model, code generators will provide all the wiring code that links the client and the server applications.
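The following is a toy illustration of how a code engine could consume such an XML specification. The XML layout below is invented for the example and is much simpler than the formats that real MDA tools exchange. A per-platform type table plays the role of the platform-specific mapping, and here the output is Python source; a second engine with a different table would target another language.

```python
import xml.etree.ElementTree as ET

# Hypothetical, much simplified model specification (not a real tool's format).
SPEC = """
<model name="crop">
  <class name="Plant">
    <attribute name="leaves" type="real"/>
    <attribute name="stage" type="string"/>
  </class>
  <class name="Weather">
    <attribute name="temperature" type="real"/>
  </class>
</model>
"""

# Platform-specific type mapping for one target language (Python).
PYTHON_TYPES = {"real": "float", "string": "str", "integer": "int"}


def generate_python(spec: str) -> str:
    """Translate every <class> element of the specification into Python source code."""
    root = ET.fromstring(spec.strip())
    out = []
    for cls in root.findall("class"):
        attrs = [(a.get("name"), PYTHON_TYPES[a.get("type")]) for a in cls.findall("attribute")]
        params = "".join(f", {name}: {typ}" for name, typ in attrs)
        out.append(f"class {cls.get('name')}:")
        out.append(f"    def __init__(self{params}):")
        body = [f"        self.{name} = {name}" for name, _ in attrs]
        out.extend(body or ["        pass"])   # classes without attributes still compile
        out.append("")
    return "\n".join(out)
```

Because all the information the generator needs is in the specification, the generated source can be executed directly, for instance with `exec(generate_python(SPEC), namespace)`.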

In the domain of GIS there are several modeling for- 
malisms to express the spatial properties of thematic 
concepts: Aigle [16], CONGOO [18], GeoFrame [9], 
MADS [23], Perceptory [3], POLLEN [11], etc. Some 
of these formalisms use a visual language based on pic- 
tograms (Perceptory, MADS, Aigle, etc.). The visual 
language was introduced to improve the communica- 
tion between the GIS designer and users. 

Similarly, Miralles [15] has implemented the visual language of Perceptory in a professional CASE tool using the profile mechanism, a mechanism that extends the UML metamodel. Pictograms are attached as annotations to the UML notation of Class. For example, the polygonal geometry of the thematic concept Spread_Area shown in Fig. 8 can be expressed by a polygonal pictogram. Figure 8 shows the UML notation for class Spread_Area using the Perceptory language [3]. The polygonal pictogram used in Spread_Area shows that Spread_Area is a kind of Polygon.

State of the Art in Modeling Agricultural Systems, Figure 8
Spread_Area concept annotated with a polygonal pictogram

The GIS design formalisms are used during the analysis phase of GIS development, the phase during which the model is created “by hand.” Hence, the resulting model is a PIM. Miralles [15] provided two transformations on the PIM to convert a pictogram into model elements used by code generators. The first one is a transformation that generates the GIS design pattern based model presented in “The Design Patterns” (Fig. 4). It automatically creates classes (Point, Line, and Polygon), the corresponding attributes (X, Y, length, perimeter, and surface), and also the generalization and the association links (Linear Arrangement, Polygonal Arrangement, and Two Polygons Arrangement). This transformation is called GIS design pattern based model generation. At this step, the thematic concept Spread_Area and the spatial concept Polygon are totally disassociated. The second transformation implements the relationship between these two concepts and is also a PIM/PIM transformation. This transformation is referred to as the pictogram translation mapping technique. Its goal is to automatically establish an association, referred to as Spatial Characteristic, between Spread_Area and Polygon (Fig. 9).

By default, the role of Polygon is set to Geometry and its cardinality is set to 1. At the other end of the Spatial Characteristic association, the class and its role share the same name, Spread_Area, and the cardinality is set to 0..1 (0 or 1). These default values can be modified later by the designer if necessary. Once the association has been created, the pictogram is no longer used, as the information it conveys becomes redundant.
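A minimal sketch of the pictogram translation mapping, written as a PIM-to-PIM transformation over a tiny in-memory model, follows. The model classes and the pictogram-to-class table are hypothetical stand-ins for the CASE tool's metamodel. The defaults (role Geometry with cardinality 1 on the spatial side, role named after the thematic class with cardinality 0..1 on the other side) and the removal of the pictogram follow the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ModelClass:
    name: str
    pictogram: Optional[str] = None   # hypothetical encoding, e.g. "polygon"


@dataclass
class AssociationEnd:
    class_name: str
    role: str
    cardinality: str


@dataclass
class Association:
    name: str
    ends: List[AssociationEnd]


# Hypothetical mapping from pictogram kind to a class of the design pattern model.
PICTOGRAM_TO_SPATIAL_CLASS = {"point": "Point", "line": "Line", "polygon": "Polygon"}


def translate_pictograms(classes: Dict[str, ModelClass]) -> List[Association]:
    """Replace each pictogram by a Spatial Characteristic association with default ends."""
    associations = []
    for cls in classes.values():
        if cls.pictogram is None:
            continue
        spatial = PICTOGRAM_TO_SPATIAL_CLASS[cls.pictogram]
        associations.append(Association(
            name="Spatial Characteristic",
            ends=[
                AssociationEnd(spatial, role="Geometry", cardinality="1"),
                AssociationEnd(cls.name, role=cls.name, cardinality="0..1"),
            ],
        ))
        cls.pictogram = None   # the pictogram is now redundant and is dropped
    return associations
```

The designer can still edit the returned association ends afterwards, mirroring the statement that the default values may be modified later.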

The two transformations described above, GIS design pattern based model generation and the pictogram translation mapping technique, are examples of the application of MDA principles in the domain of GIS.


State of the Art in Modeling Agricultural Systems, Figure 9
Association Spatial Characteristic created by the pictogram translation mapping technique


Conclusion


Developing a successful software project in agriculture requires the collaboration of researchers from different scientific domains with different scientific backgrounds. Therefore, it is very important for a team of different backgrounds to have a common communication language. UML is an excellent tool for analyzing, designing, and documenting software projects. Models are developed visually using plain English (or any other spoken language for that matter) and can be understood by programmers and nonprogrammers alike. Thus, collaboration between team members is greatly improved by increasing the number of specialists involved in project development. Furthermore, the advent of MDA makes the process of design and analysis more accessible to specialists, as MDA is a specialist-centric approach. The model is developed visually using knowledge from the problem domain, thus making the domain specialist the center of application development.
Introduction


This review paper considers the resource constrained project scheduling problem (RCPSP) with static nature and renewable resources and aims to provide a recent survey of related work and the heuristics employed. ‘Static scheduling’ refers to determining a solution to a scheduling problem instance with fixed resources and precedence constraints. ‘Renewable resources’ implies that resources may be used during the whole scheduling process and planning timespan without degradation in capacity or work pace; thus, the resources are not of the single-use type. The solutions basically consist of the starting times of a known set of activities. For an introduction and overview of different formulations of project scheduling problems the reader is referred to [7,16,17,25]. Ozdamar and Ulusoy [30] provide an elaborate review of the RCPSP with both renewable and non-renewable resource constraints and time/cost based objectives.

The RCPSP is known to be NP-hard. Thus only instances with a very limited number of activities can currently be solved to optimality. For larger problems, heuristics are used that provide robust, high-performance, extensible and easy-to-apply solutions. Some benchmark results are also available in the literature; among those, the most recent are the ones supplied by [1,15,23,24]. Bouleimen and Lecocq [6] (along with [1] and [13]) report the best-performing results on the supplied benchmark instances.

The activity-on-node (AON) based flow represen- 
tation of the RCPSP is given in the next section. Fol- 
lowing the formulation, a brief summary of the recent 
approaches proposed is provided. 


Formulation 


Artigues et al. [2] provide the following AON-flow network based formulation for the static case of the RCPSP with renewable resources:

It is assumed that a project composed of a set of activities $V = \{1, \dots, n\}$ has to be scheduled on a set of renewable resources $\mathcal{R} = \{1, \dots, m\}$. Each resource $k \in \mathcal{R}$ has a finite capacity $R_k$. Precedence constraints of activities within the project are modeled by a set of project arcs $E$ such that $(i, j) \in E$ means that activity $j$ has to start after completion of activity $i$. Each activity $i \in V$ requires a non-negative amount $r_{ik}$ of each resource $k \in \mathcal{R}$ and has a duration $p_i$. The scheduling problem lies in characterizing an $n$-tuple $S = (S_1, \dots, S_n)$, where $S_i$ is the starting time of activity $i$, while minimizing the total project duration (makespan), denoted by $C_{\max}$. This problem, known as the RCPSP, may be defined by the triple $(V, E, \mathcal{R})$. Its difficulty comes from the resource limitation constraints that prevent some activities requiring the same resources from being scheduled simultaneously. These constraints can be modeled by defining each resource $k$ as the union of $R_k$ resource units, such that a given resource unit cannot be allocated to more than one activity at the same time. In other words, each resource unit is assumed to have a capacity of one activity; a resource with a capacity greater than one activity is divided into several units, each with a capacity of one. Hence, in any feasible solution, a resource unit allocated to an activity $i$ has to be directly transferred after the completion of $i$ to a unique activity $j$. However, since all the units of the same resource are equivalent, one only has to know the number of units directly transferred from one activity to another. For an elaborate discussion of the model, the graphical representation, and the mathematical formulation of the problem, refer to [2].
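To make the definitions concrete, here is a minimal Python check of a candidate schedule $S$ against the precedence arcs $E$ and the capacities $R_k$, together with the makespan $C_{\max} = \max_i (S_i + p_i)$. It assumes integer start times and durations and checks resource usage period by period. This is not the flow-based model of [2], only a direct evaluation of the same constraints. The three-activity instance used in the test is invented for illustration.

```python
from typing import Dict, List, Tuple

Schedule = Dict[int, int]   # activity i -> start time S_i


def makespan(start: Schedule, duration: Dict[int, int]) -> int:
    """C_max = max_i (S_i + p_i)."""
    return max(start[i] + duration[i] for i in start)


def is_feasible(start: Schedule, duration: Dict[int, int], arcs: List[Tuple[int, int]],
                demand: Dict[int, Dict[int, int]], capacity: Dict[int, int]) -> bool:
    """Check precedence and renewable-resource constraints of a schedule.

    arcs holds E, demand[i][k] = r_ik and capacity[k] = R_k; integer data assumed.
    """
    # Precedence: (i, j) in E means j may start only after i completes.
    if any(start[j] < start[i] + duration[i] for i, j in arcs):
        return False
    # Resources: in every unit period t, the activities in progress must fit into R_k.
    for t in range(makespan(start, duration)):
        running = [i for i in start if start[i] <= t < start[i] + duration[i]]
        for k, cap in capacity.items():
            if sum(demand[i].get(k, 0) for i in running) > cap:
                return False
    return True
```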

There are two main classifications for the objective function: time based and cost based. Time based objectives in the literature include minimizing makespan, mean lateness, mean completion time, and weighted tardiness in an environment where multiple projects are dealt with simultaneously. However, cost based objectives do not necessarily yield the same results as time based objective functions. Maximizing the net present value of a project, minimizing the total cost of a project (considering all costs, including variable costs due to resource consumption and other overhead, summed with tardiness costs), and maximizing the efficient usage of cash over the project span are some instances of cost based objectives found in the contemporary literature. Ozdamar and Ulusoy [30] study these different objective functions in greater detail.

Brucker et al. [7] introduce a classification scheme and a common notation for the RCPSP. The motivation for such an effort is to close the widening gap between the contemporary machine scheduling literature and the RCPSP literature in terms of notation and classification. Indeed, both problems have so many commonalities that one may be converted to the other with ease (both are NP-hard in nature). Due to these commonalities, notation and heuristics developed for one can easily be adapted for the other. Brucker et al. [7] also try to form a standard structural base so that future research remains within a coherent literature.


Methods 


The surveys by Herroelen et al. [16] and Kolisch and Padman [25] provide detailed descriptions of the characteristics, representations and classification schemes for the solution approaches proposed for the RCPSP. Bouleimen and Lecocq [6] group the suggested solution methods into three classes, as follows:

1. priority rules ([4,12,21,22,26]);

2. exact methods ([8,10,11,28]); and

3. metaheuristics such as tabu search ([2,3,31]), genetic algorithms ([1,13,20]), and simulated annealing ([5,6]).

Brucker et al. [7] claim that the first heuristic meth- 
ods are the priority-rule based scheduling methods. In 
these methods, the main idea is extending a partially 
generated schedule by stepwise insertions of new ac- 
tivities either in sequential or parallel order. At each 
step, a set of feasible nodes for insertion is generated 
based on starting time constraints, priority constraints, 
or other resource constraints. The selection of the next 
activity for insertion (from the decision set) is based on 
a priority assessment mechanism, usually specific to the 
problem type and objectives. Brucker et al. [7] empha- 
size the advantage of priority-based heuristics as being 
intuitive, easy to implement and fast in computational 
effort. However, a shortcoming of these methods is that 
they do not excel with respect to the average deviation 
from the optimal objective function value. Brucker et 
al. [7] also point out that recent effort has shifted to 
exact methods, local search [32] and meta-heuristics. 
They also provide an overview of computational results 
by reporting the size of the problems solved and giving 
details about computational specifications. 
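A priority-rule based method in its sequential (serial) form can be sketched in a few lines of Python. At each step the decision set contains the unscheduled activities whose predecessors are all scheduled. The activity with the highest priority value is inserted at its earliest start that respects both precedence and resources. The priority values are an input here; a concrete rule (for example, latest finish time) would compute them, and that choice is left open. Integer durations and $r_{jk} \le R_k$ are assumed.

```python
from typing import Dict, List, Tuple


def serial_sgs(duration: Dict[int, int], arcs: List[Tuple[int, int]],
               demand: Dict[int, Dict[int, int]], capacity: Dict[int, int],
               priority: Dict[int, float]) -> Dict[int, int]:
    """Priority-rule based serial schedule generation scheme; returns start times."""
    preds = {j: [i for i, b in arcs if b == j] for j in duration}
    horizon = sum(duration.values())              # a serial schedule never ends later
    usage = {k: [0] * horizon for k in capacity}  # resource profile per unit period
    start: Dict[int, int] = {}
    while len(start) < len(duration):
        # Decision set: unscheduled activities whose predecessors are all scheduled.
        eligible = [j for j in duration
                    if j not in start and all(i in start for i in preds[j])]
        j = max(eligible, key=lambda a: priority[a])
        t = max((start[i] + duration[i] for i in preds[j]), default=0)
        # Delay j until the profile has room for it during its whole duration.
        while any(usage[k][u] + demand[j].get(k, 0) > capacity[k]
                  for k in capacity for u in range(t, t + duration[j])):
            t += 1
        for k in capacity:
            for u in range(t, t + duration[j]):
                usage[k][u] += demand[j].get(k, 0)
        start[j] = t
    return start
```

Each step is a single greedy insertion, which is why such methods are fast and easy to implement. The same greedy nature explains why their average deviation from the optimum can be large.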

Among the exact algorithms proposed for the RCPSP, the most promising progress has been attained using the branch-and-bound mechanism. Herroelen et al. [16] highlight seven points in the conclusion of their review on the use of branch-and-bound methods for solving the RCPSP. They comment that these seven points reveal a number of desirable attributes of an efficient optimal solution procedure for the RCPSP, and they add that the seven points constitute the very basis of the computational efficiency of a method they introduce for the RCPSP. Herroelen et al. [16] and Icmeli et al. [19] list instances of exact applications from the literature while providing an extensive review of solution methods used for different versions of the RCPSP.

Kolisch and Padman [25] cluster heuristic approaches for the RCPSP with the makespan minimization objective into four different solution methodologies: (1) priority-rule based scheduling, (2) truncated branch-and-bound, (3) disjunctive arc concepts, and (4) meta-heuristic techniques. In the remainder of this review, owing to their efficiency, robustness and improvement potential, heuristic algorithms and meta-heuristic approaches (rather than exact algorithms) are addressed.

Meta-heuristic techniques for solving combinatorial optimization problems have emerged in recent years. Kolisch and Padman [25] observe that all heuristic approaches encode the solution as a list with length equal to the number of jobs. The generated list can be mapped into a schedule using priority-based approaches. For a detailed study of encoding schemes one may refer to Kolisch and Hartmann [24].

In the procedure of Sampson and Weiss [32] each 
element of the generated list is an integer. In their 
scheduling mechanism, each element starts at the max- 
imum of the completion times of its immediate prede- 
cessors plus a specific integer value. This ensures fea- 
sibility in the time domain. To prevent excess usage of 
renewable resources they also add a penalizing mecha- 
nism. 
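The decoding idea described for the procedure of Sampson and Weiss [32] can be illustrated as follows. Each activity starts at the largest completion time of its immediate predecessors plus its own integer value from the list. Resource overuse is not forbidden but added to the objective. The linear penalty form and its weight are our assumptions for this sketch; the source does not specify them. The topological ordering uses Python's `graphlib` (3.9+).

```python
from graphlib import TopologicalSorter


def decode_with_delays(duration, arcs, delay, demand, capacity, penalty_weight=10.0):
    """Decode one integer delay per activity into start times and a penalized makespan.

    S_j = max over predecessors i of (S_i + p_i), plus delay[j]. Resource overuse
    (per period and resource) is added to the makespan with an assumed linear weight.
    """
    preds = {j: [i for i, b in arcs if b == j] for j in duration}
    start = {}
    # static_order() yields every activity after all of its predecessors.
    for j in TopologicalSorter(preds).static_order():
        start[j] = max((start[i] + duration[i] for i in preds[j]), default=0) + delay[j]
    cmax = max(start[j] + duration[j] for j in duration)
    overuse = 0
    for t in range(cmax):
        for k, cap in capacity.items():
            used = sum(demand[j].get(k, 0) for j in duration
                       if start[j] <= t < start[j] + duration[j])
            overuse += max(0, used - cap)
    return start, cmax + penalty_weight * overuse
```

Since every decoded start respects the precedence arcs by construction, feasibility in the time domain is automatic, and only the resource side needs the penalty.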

Hartmann [14], Leon and Ramamoorthy [27], 
Naphade et al. [29], Lee and Kim [26], Cho and Kim [9], 
and Kohlmorgen et al. [20] basically encode the solu- 
tion as a list of numbers that assigns each task a priority 
value. By using these priority values within a schedule- 
generating scheme, one obtains a feasible schedule and 
the associated objective function value. This encod- 
ing has the potential to be applied to meta-heuristics 
such as simulated annealing, tabu search and genetic 
algorithms. 

Baar et al. [3], Bouleimen and Lecocq [6], Hartmann [13], and Pinson et al. [31] use an ‘activity list’, where a schedule is generated by scheduling the activities in the order prescribed by the list. Baar et al. [3] use two different neighborhood search mechanisms for a tabu-search procedure. The first one encodes a solution as an activity list, which is mapped to a schedule with the serial scheduling scheme. The neighborhood is defined as all activity lists that can be reached by shifting a resource-critical job to a new position. The second neighborhood builds on the exact solution procedure of Brucker et al. [8]. Essentially, activity pairs are either forced or allowed to be processed in parallel via the so-called ‘parallelity relations’. For a fixed parallelity relation, a schedule is obtained by forward recursion. Bouleimen and Lecocq [6] use simulated annealing together with a shift operator. Hartmann [13] uses a genetic algorithm with two-point crossover. Pinson et al. [31] propose a tabu search with pairwise interchange and shift within a neighborhood.
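A shift move on an activity list removes one activity and reinserts it elsewhere. To keep the list precedence-feasible, the new position must lie after the activity's last predecessor and before its first successor. The Python sketch below enumerates all distinct neighbors generated this way. It is a generic version of the operator and does not restrict moves to resource-critical jobs, as in Baar et al. [3]. It assumes the input list is already precedence-feasible. Each neighbor would then be decoded into a schedule, for instance with the serial scheme.

```python
def shift_neighbors(activity_list, arcs):
    """All distinct precedence-feasible lists reachable by moving one activity."""
    seen, neighbors = {tuple(activity_list)}, []
    for pos, j in enumerate(activity_list):
        rest = activity_list[:pos] + activity_list[pos + 1:]
        preds = [i for i, b in arcs if b == j]
        succs = [b for i, b in arcs if i == j]
        # j must stay after its last predecessor and before its first successor.
        lo = max((rest.index(i) + 1 for i in preds), default=0)
        hi = min((rest.index(s) for s in succs), default=len(rest))
        for new_pos in range(lo, hi + 1):
            candidate = rest[:new_pos] + [j] + rest[new_pos:]
            if tuple(candidate) not in seen:   # skip the original list and duplicates
                seen.add(tuple(candidate))
                neighbors.append(candidate)
    return neighbors
```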

An alternative objective function for the RCPSP 
is to maximize the net present value of the project. 
Kolisch and Padman [25] provide a classification of 
heuristics developed for the RCPSP. They group heuris- 
tics into three categories: (1) optimization guided, 
(2) parameter based, and (3) meta-heuristic ap- 
proaches. 

Icmeli and Erenguc [18] apply a tabu-search pro- 
cedure to a starting feasible solution generated using 
a simple, single-pass algorithm. They improve the ini- 
tial solution over several iterations by moving each ac- 
tivity one time unit early or late from its current com- 
pletion time without violating the earliest and latest 
completion time constraints for the activity. They also test the use of a long-term memory concept in their algorithm. The computational results are found to be both efficient and close to optimal.
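The sketch below illustrates the two ingredients of such a net-present-value search: the objective and the one-time-unit moves. It assumes that each activity's cash flow is received at its completion time and discounted per period at a fixed rate. Moves are restricted only by each activity's earliest and latest completion times, as described above; precedence and resource checks, and the tabu mechanism itself, are omitted.

```python
def npv(completion, cash_flow, rate):
    """Net present value, assuming cash flow c_i is received at completion time C_i."""
    return sum(cash_flow[i] / (1.0 + rate) ** completion[i] for i in completion)


def unit_moves(completion, earliest, latest):
    """Yield (activity, neighbor) pairs obtained by moving one completion time
    one unit earlier or later within its [earliest, latest] window."""
    for i, c in completion.items():
        for new_c in (c - 1, c + 1):
            if earliest[i] <= new_c <= latest[i]:
                neighbor = dict(completion)
                neighbor[i] = new_c
                yield i, neighbor
```

Picking the best neighbor by NPV favors moving positive cash flows earlier and negative ones later, which is the intuition behind this move set.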

Zhu and Padman [33,34] introduce the notion of replacing single-pass, complex, optimization-based heuristics with a blend of multiple but simple heuristics. They report that their method outperforms other works employing a single but more complex heuristic. In their initial work [33], they use a multi-agent based approach with six simple rules applied in random order to exploit changing conditions of the project environment. In their later work [34], they use distributed computation concepts through an Asynchronous Team (A-team) approach. This approach facilitates cooperation of multiple heuristic algorithms so that together they produce better results than if they were acting alone.
Kolisch and Hartmann [24] provide an updated and extended version of their previous review [23] on solution methods for project scheduling problems. They review existing solution methods under priority-rule based methods, classical metaheuristics, non-standard metaheuristics, and other heuristics. They test a selection of algorithms from the literature on generated problem sets and report the average deviation from the critical path lower bound.
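The critical path lower bound is the length of the longest precedence path, obtained when resource constraints are ignored. A short Python sketch of it follows, together with the average percentage deviation $100\,(C_{\max} - LB)/LB$ over a set of instances. The function names are ours, and the instance data in the test are made up.

```python
from graphlib import TopologicalSorter
from statistics import mean


def critical_path_length(duration, arcs):
    """Longest precedence path, ignoring resources (the critical path lower bound)."""
    preds = {j: [i for i, b in arcs if b == j] for j in duration}
    finish = {}
    # Forward pass: an activity finishes after its latest predecessor plus its duration.
    for j in TopologicalSorter(preds).static_order():
        finish[j] = max((finish[i] for i in preds[j]), default=0) + duration[j]
    return max(finish.values())


def average_deviation(results):
    """Mean percentage deviation of makespans from lower bounds.

    results: iterable of (makespan, lower_bound) pairs, one per instance.
    """
    return mean(100.0 * (cmax - lb) / lb for cmax, lb in results)
```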


Conclusion 


In this study, the reader is provided with a brief introduction to the RCPSP and with the most recent improvements in solution mechanisms based on heuristics. With the potential of providing high quality solutions within reasonable time frames, heuristics and meta-heuristics seem to be one step ahead of exact algorithms. Thus, this work focuses its inquiry on the meta-heuristics field. To keep the content brief, the reader is given pointers to more elaborate and widely cited references.
Stochastic programming is the science that offers solutions for problems in connection with stochastic systems, where the resulting numerical problem to be solved is a mathematical programming problem. When formulating a stochastic programming problem, in most cases we start from a deterministic mathematical programming problem that we call the base problem or underlying deterministic problem. Then, observing that some of the parameters in it are random, we formulate another decision problem, the stochastic programming problem, by taking into account the probability distribution of the random variables involved.

Any stochastic programming problem formulation depends on a decision-observation scheme that tells us in what order decisions and observations follow each other. If this scheme is decision making on the system design or control variables (usually contained in the decision vector $x$), followed by observation of the random variables influencing the system performance, then the model is called static. If there is at least one observation of random variables followed by decision making, then the model is called dynamic. From another point of view, a stochastic programming problem (static or dynamic) may contain a reliability provision, or it may allow for the violation of the constraints with some penalty that is included in the objective function. The reliability provision typically manifests itself in the use of probabilistic constraint(s), where we prescribe that the random constraint(s) should hold (when the random variables realize and can be observed) with prescribed probability (probabilities). The first type of stochastic programming model is called a probabilistic constrained model, while the second type is called a recourse model. The two model construction principles can be used simultaneously in a hybrid model. The use of probabilistic constraints is an old statistical decision principle. For example, A. Wald used it in the sequential analysis context [6]. Its combined use with mathematical programming, however, appeared first in [1]. A static probabilistic constrained model has the general form:


$$\begin{aligned} \min\ & h(x) \\ \text{s.t.}\ & h_0(x) \ge 0,\ h_1(x) \ge 0,\ \dots,\ h_m(x) \ge 0, \end{aligned}$$

where $h, h_1, \dots, h_m$ are some functions of $x \in \mathbb{R}^n$,

$$h_0(x) = P\big(g_1(x, \xi) \ge 0, \dots, g_r(x, \xi) \ge 0\big) - p,$$

$\xi \in \mathbb{R}^q$ is a random vector, $g_1, \dots, g_r$ are functions defined on $\mathbb{R}^{n+q}$, and $p$ is a prescribed probability. In practice, $p$ is chosen near 1, e.g., $p = 0.9, 0.95, 0.99$.

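In the simplest case $g_i(x, \xi) = T_i x - \xi_i$, $i = 1, \dots, r$, where $T_i$ is the $i$th row of an $r \times n$ matrix $T$, and $h, h_1, \dots, h_m$ are linear functions. This model can be written as

$$\begin{aligned} \min\ & c^T x \\ \text{s.t.}\ & Ax \ge b,\quad x \ge 0, \\ & P(Tx \ge \xi) \ge p. \end{aligned}$$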
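For a fixed decision $x$, the joint constraint $P(Tx \ge \xi) \ge p$ can be checked by Monte Carlo simulation. This sketch only evaluates a given $x$ and does not solve the model. The sampler and the distribution used in the example (two independent standard normal components) are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence


def joint_probability(T: Sequence[Sequence[float]], x: Sequence[float],
                      sample_xi: Callable[[random.Random], List[float]],
                      n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Tx >= xi) for a fixed decision vector x."""
    rng = random.Random(seed)
    Tx = [sum(t * xj for t, xj in zip(row, x)) for row in T]
    hits = 0
    for _ in range(n_samples):
        xi = sample_xi(rng)
        # The joint constraint holds only if all rows are satisfied simultaneously.
        if all(lhs >= v for lhs, v in zip(Tx, xi)):
            hits += 1
    return hits / n_samples
```

With $T = I$ and two independent $N(0, 1)$ components, $x = (1.6449, 1.6449)$ gives each row a probability of about 0.95. The joint probability, however, is only about $0.95^2 \approx 0.9025$. A joint constraint is therefore stricter than the corresponding individual ones.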


The probabilistic constraints in the above models are joint constraints. Sometimes, instead of $P(Tx \ge \xi) \ge p$, the individual probabilistic constraints $P(T_i x \ge \xi_i) \ge p_i$, $i = 1, \dots, r$, are used. This was the case in the originating paper [1]. A joint probabilistic constraint was first used by L.B. Miller and H. Wagner [2]. They assumed, however, that in $P(Tx \ge \xi)$ the components of the random vector $\xi$ are stochastically independent. General probabilistic constrained stochastic programming models have been formulated in [3,4].
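An individual constraint $P(T_i x \ge \xi_i) \ge p_i$ is $F_i(T_i x) \ge p_i$, where $F_i$ is the distribution function of $\xi_i$. For a continuous $F_i$ this is the linear constraint $T_i x \ge F_i^{-1}(p_i)$. The sketch below assumes normally distributed $\xi_i$, purely for illustration, and uses the quantile function of Python's `statistics.NormalDist`.

```python
from statistics import NormalDist
from typing import List, Sequence


def individual_rhs(mu: Sequence[float], sigma: Sequence[float],
                   p: Sequence[float]) -> List[float]:
    """Quantiles F_i^{-1}(p_i), assuming xi_i ~ N(mu_i, sigma_i^2)."""
    return [NormalDist(m, s).inv_cdf(q) for m, s, q in zip(mu, sigma, p)]


def satisfies_individual(T, x, mu, sigma, p) -> bool:
    """Check the deterministic equivalents T_i x >= F_i^{-1}(p_i) for every row i."""
    rhs = individual_rhs(mu, sigma, p)
    return all(sum(t * xj for t, xj in zip(row, x)) >= b for row, b in zip(T, rhs))
```

For $\xi_i \sim N(10, 2^2)$ and $p_i = 0.95$, the right-hand side is about $10 + 2 \cdot 1.645 \approx 13.29$.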

A related model construction contains maximization of a probability subject to some constraints. Programming under probabilistic constraints and maximizing a probability under constraints have many applications in engineering (power systems, water resources, telecommunication, engineering structures, etc.) and economic (insurance, finance, economic planning, etc.) problems. For the mathematical theory, solution techniques and applications of these models, see [5].
Given the constraints $g_i(x, \xi) \ge 0$, $i = 1, \dots, r$, where $\xi$ is a random vector, one way to create part of a stochastic programming problem based on them is to introduce constraints involving conditional expectations:

$$E\{-g_i(x, \xi) \mid g_i(x, \xi) < 0\} \le d_i, \quad i = 1, \dots, r,$$

where $d_i$, $i = 1, \dots, r$, are some bounds chosen by ourselves. In the simplest and, from the practical point of view, most important case, $g_i(x, \xi) = T_i x - \xi_i$, where $T_i$ is the $i$th row of a matrix. In this case the above constraints take the form:

$$g_i(T_i x) = E\{\xi_i - T_i x \mid \xi_i - T_i x > 0\} \le d_i, \quad i = 1, \dots, r.$$


The practical meaning of these constraints is that violations of the stochastic constraints $T_i x \ge \xi_i$, $i = 1, \dots, r$, are allowed, but the average magnitude of violation, given that violation occurs, is bounded from above in each constraint. If, e.g., $T_i x \ge \xi_i$ means in a diet problem that the meal composition should satisfy the demand for the $i$th nutrient in a population (where the randomness of the nutrient demand is due to the inhomogeneous nature of the population), then $E\{\xi_i - T_i x \mid \xi_i - T_i x > 0\}$ is the average unsatisfied demand for nutrient $i$ among those whose demands are not satisfied. Constraints of this type were first introduced in [2]; see also [3]. The conditional expectation constraint is closely related to the expected residual lifetime in reliability theory or the total remaining life in insurance, these being defined as $g(t) = E\{\eta - t \mid \eta - t > 0\}$ in connection with the random lifetime $\eta$. It is well known (see, e.g., [3]) that if $\eta$ has a continuous distribution with logconcave probability distribution function, then $g(t)$ is a decreasing function. Using this fact, we can convert the conditional expectation constraints into linear ones, provided that $\xi_i$ has a continuous distribution with logconcave probability distribution function and $g_i(x, \xi) = T_i x - \xi_i$ for every $i = 1, \dots, r$. The equivalent constraints are:

$$T_i x \ge g_i^{-1}(d_i), \quad i = 1, \dots, r.$$
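To illustrate the conversion, take a normally distributed $\xi_i \sim N(\mu, \sigma^2)$ (our assumption; its distribution function is logconcave). Then, with $z = (t - \mu)/\sigma$, the mean excess is $g(t) = \sigma\big(\varphi(z)/(1 - \Phi(z)) - z\big)$. Since $g$ is decreasing, $g^{-1}(d_i)$ can be found by bisection. The bracket $[\mu - 10\sigma, \mu + 10\sigma]$ is an assumption: a bound $d_i$ outside $g$'s range on that interval returns an endpoint.

```python
import math


def mean_excess(t: float, mu: float, sigma: float) -> float:
    """g(t) = E{xi - t | xi - t > 0} for xi ~ N(mu, sigma^2)."""
    z = (t - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z), accurate in the upper tail
    return sigma * (pdf / survival - z)


def conditional_expectation_rhs(mu: float, sigma: float, d: float) -> float:
    """Solve g(t) = d by bisection; since g decreases, g(T_i x) <= d iff T_i x >= t."""
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma   # assumed bracket for the root
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean_excess(mid, mu, sigma) > d:
            lo = mid          # bound still violated at mid: the root lies to the right
        else:
            hi = mid
    return hi
```

For a standard normal $\xi_i$, $g(0) = 2\varphi(0) \approx 0.798$, so $d_i = 0.798$ yields the linear constraint $T_i x \ge 0$.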


A closely related stochastic programming constraint formulation, based on the stochastic constraints $T_i x - \xi_i \ge 0$, $i = 1, \dots, r$, provides us with the following:

$$E\{\xi_i - T_i x \mid \xi_i - T_i x > 0\}\, P(\xi_i - T_i x > 0) \le d_i, \quad i = 1, \dots, r.$$

These are equivalent to

$$L_i(T_i x) = \int_{T_i x}^{\infty} \big(1 - F_i(z)\big)\, dz \le d_i, \quad i = 1, \dots, r,$$

where $F_i(z)$ is the distribution function of the random variable $\xi_i$, $i = 1, \dots, r$. The new constraints are called integrated probabilistic constraints and were introduced in [1]. In the above-mentioned diet problem these constraints mean that the average unsatisfied demand is taken over the whole population and is limited from above, for each nutrient.

The advantage of the integrated probabilistic constraints is that the functions $L_i$, $i = 1, \dots, r$, are decreasing, regardless of the type of probability distributions involved, and the constraints are equivalent to

$$T_i x \ge L_i^{-1}(d_i), \quad i = 1, \dots, r.$$
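Since $\int_t^\infty (1 - F(z))\,dz = E[(\xi - t)^+]$, the function $L_i$ is the expected shortfall below zero of the random constraint. For a normally distributed $\xi_i$ (an illustrative assumption) it has the closed form $L(t) = \sigma\big(\varphi(z) - z(1 - \Phi(z))\big)$ with $z = (t - \mu)/\sigma$. Because $L$ is decreasing, $L^{-1}(d_i)$ can again be found by bisection, here on the assumed bracket $[\mu - 10\sigma, \mu + 10\sigma]$.

```python
import math


def integrated_shortfall(t: float, mu: float, sigma: float) -> float:
    """L(t) = integral from t to infinity of (1 - F(z)) dz = E[(xi - t)^+], xi ~ N(mu, sigma^2)."""
    z = (t - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z)
    return sigma * (pdf - z * survival)


def integrated_rhs(mu: float, sigma: float, d: float) -> float:
    """Return t with L(t) = d (bisection), so that L(T_i x) <= d iff T_i x >= t."""
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma   # assumed bracket for the root
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if integrated_shortfall(mid, mu, sigma) > d:
            lo = mid
        else:
            hi = mid
    return hi
```

Only the closed form of $L$ depends on the normality assumption. For another distribution, one would evaluate the integral of $1 - F_i$ directly, and the monotonicity argument carries over unchanged.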
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Statistical classification is used when it is of interest to 
partition a set of subjects or observations into groups or 
categories, based on observed attributes that are associ- 
ated with each of the subjects. For example, a lending 
institution may wish to partition a set of loan applicants 
into one of the two categories of probable payers and 
defaulters, based on observed characteristics for the ap- 
plicants. The characteristics might include: size of the 
loan requested, total available income, total amount of 
credit available to the applicant at other sources, num- 
ber of years with current employer, and others. 

To give a formal definition to the problem, there are
$n$ subjects to be partitioned into $k$ categories, based on
$m$ different observed characteristics. The proper category of classification is assumed to be known for each
of these subjects, and $X_{ij}$ denotes the measured value
of characteristic $j$ for subject $i$. A classification function
is to be obtained for each category of classification to
represent the strength of association of any subject with
that category. Let $F_a(X_{i\cdot})$ denote the classification function for a given category $a$ with $1 \le a \le k$, for any given
subject, say for subject $i$. We assume a linear form for
$F_a(X_{i\cdot})$:

$$F_a(X_{i\cdot}) = c_0^a + \sum_{j=1}^{m} c_j^a X_{ij}.$$


The $n$ known observations are used as a training set to
obtain coefficients for the classification functions that
accurately model the relationship between the strength
of association given by the classification functions and the
actual group membership of the given observations.
These functions are then used to classify future
subjects whose proper category is not known
and for which a prediction of category membership is
sought. In particular, a future observation $i$, with
associated measured values $X_{ij}$, will be predicted
to be a member of category $a$ when $F_a(X_{i\cdot}) > F_b(X_{i\cdot})$
for all $1 \le b \le k$ with $b \ne a$. R.A. Fisher [1] and
C.A.B. Smith [3] developed classical statistical techniques to approach this problem, with standard
assumptions about the distributions of the $X_{ij}$.
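The prediction rule can be sketched in Python as follows. The sketch assumes that each category $a$ is described by a pair `(c0, c)` holding $c_0^a$ and the list $c_1^a, \dots, c_m^a$. Categories are indexed from 0 rather than 1, and ties are broken in favor of the lowest index, a choice the text does not specify.

```python
def score(coef, x):
    """F_a(x) = c0 + sum_j c_j * x_j for coef = (c0, [c_1, ..., c_m])."""
    c0, c = coef
    return c0 + sum(cj * xj for cj, xj in zip(c, x))


def classify(coeffs, x):
    """Index of the category with the largest classification function value."""
    values = [score(coef, x) for coef in coeffs]
    return max(range(len(values)), key=values.__getitem__)


# Two hypothetical categories with m = 2 characteristics
coeffs = [(0.0, [1.0, -1.0]), (0.5, [-1.0, 1.0])]
print(classify(coeffs, [2.0, 0.0]), classify(coeffs, [0.0, 2.0]))  # prints: 0 1
```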

More recently, optimization approaches have been
used to develop techniques to obtain coefficients for
the classification functions that directly maximize the
number of correct classifications in the training set.
A. Stam [4] presents an exhaustive survey of most of
the early work in this area. Most of these approaches
are based on mathematical programming techniques.
These approaches are of interest since they maximize
the number of correct classifications in the training
set, which standard statistical approaches do not
necessarily do. In addition, the mathematical programming techniques are very useful when standard
statistical assumptions about the distributions of the $X_{ij}$
are not valid.

W.V. Gehrlein [2] presents elementary mathematical programming formulations of the generalized classification problem to obtain classification functions that
directly maximize the number of correct classifications
in the training set. The primary decision variables in these formulations are the coefficients $c_j^a$ to be
used in the classification functions.


Each of the observations in the training set will have
a binary (0-1) variable, $I_i$, associated with it, such that
observation $i$ will be correctly categorized when $I_i = 0$
and observation $i$ will be incorrectly categorized when
$I_i = 1$. The objective function is given by

$$\text{Minimize } \sum_{i=1}^{n} I_i.$$

There are $k - 1$ constraints that are associated with
the categorization of each observation. Observation $i$ is
known to be a member of some category, say $a$. Then
for each $b$ with $1 \le b \le k$ and $b \ne a$ there is a constraint
of the form

$$c_0^a + \sum_{j=1}^{m} c_j^a X_{ij} - c_0^b - \sum_{j=1}^{m} c_j^b X_{ij} + M I_i \ge \varepsilon,$$

in which $M$ is a very large number and $\varepsilon$ is a very small
number. The values of $M$ and $\varepsilon$ remain the same in all
constraints. By the nature of $M$ and $\varepsilon$, this constraint
will be met trivially if $I_i = 1$, and we must have $F_a(X_{i\cdot}) > F_b(X_{i\cdot})$ if $I_i = 0$.
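For fixed coefficients, the smallest feasible value of each $I_i$ in this formulation is easy to compute, which shows how the big-$M$ constraints encode misclassification. The sketch below uses hypothetical names and gives $M$ and $\varepsilon$ illustrative values. It sets $I_i = 0$ exactly when $F_a(X_{i\cdot}) - F_b(X_{i\cdot}) \ge \varepsilon$ for every $b \ne a$, and then checks that every constraint holds. It does not solve the mixed-integer program itself, which would also require optimizing over the coefficients.

```python
def score(coef, x):
    """F_a(x) = c0 + sum_j c_j * x_j for coef = (c0, [c_1, ..., c_m])."""
    c0, c = coef
    return c0 + sum(cj * xj for cj, xj in zip(c, x))


def min_indicators(coeffs, data, labels, M=1e6, eps=1e-6):
    """Smallest feasible 0-1 vector I for fixed coefficients."""
    k = len(coeffs)
    indicators = []
    for x, a in zip(data, labels):
        gaps = [score(coeffs[a], x) - score(coeffs[b], x) for b in range(k) if b != a]
        I = 0 if all(gap >= eps for gap in gaps) else 1
        # every big-M constraint F_a - F_b + M * I >= eps must hold
        if any(gap + M * I < eps for gap in gaps):
            raise ValueError("M is too small for these data")
        indicators.append(I)
    return indicators


coeffs = [(0.0, [1.0, -1.0]), (0.5, [-1.0, 1.0])]
data = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
labels = [0, 1, 0]
I = min_indicators(coeffs, data, labels)
print(I, "objective =", sum(I))  # prints: [0, 0, 1] objective = 1
```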

It is also possible to develop a classification procedure that has only one classification function. The
category of group membership is then determined by
where the computed value of the classification function falls on the number line. That is, the number
line is partitioned into $k$ segments, with each of the segments corresponding to an associated group membership. The line segment for each group,
say $a$, is given by an upper limit $UL_a$ and a lower limit $LL_a$.
As above, a binary variable, $I_i$, is associated with each
observation such that observation $i$ is correctly categorized when $I_i = 0$ and observation $i$ will be incorrectly
categorized when $I_i = 1$. The objective function remains
the same as above. With this formulation there are only
two constraints that are associated with the categorization of each observation. Observation $i$ is known to be
a member of some category, say $a$, and the associated
constraints are of the form

$$c_0 + \sum_{j=1}^{m} c_j X_{ij} - M I_i \le UL_a,$$

$$c_0 + \sum_{j=1}^{m} c_j X_{ij} + M I_i \ge LL_a.$$
j=l 


As above, $M$ is a very large number, and $\varepsilon$ will be a very
small number. These constraints will be met trivially if
$I_i = 1$, and we must have $LL_a \le F(X_{i\cdot}) \le UL_a$ whenever $I_i = 0$, where $F(X_{i\cdot}) = c_0 + \sum_j c_j X_{ij}$ is the single classification function. Additional constraints are needed to ensure
the logical consistency of the line segment partition. For
each $a$ with $1 \le a \le k$, we need a constraint of the form

$$UL_a - LL_a \ge \varepsilon.$$


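To be certain that there is no overlap of the line segments, each pair of groups, say $a$ and $b$, has
two binary variables $I_{ab}$ and $I_{ba}$ defined for them, with
associated constraints of the form

$$LL_a - UL_b + M I_{ab} \ge \varepsilon,$$

$$LL_b - UL_a + M I_{ba} \ge \varepsilon,$$

$$I_{ab} + I_{ba} = 1.$$

Since exactly one of $I_{ab}$ and $I_{ba}$ equals 0, one of the first two constraints is binding, so the segment of group $a$ lies entirely above that of group $b$ or vice versa.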
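The single-function scheme can be checked in the same spirit. In this sketch (illustrative names and values), the segment limits `LL[a]` and `UL[a]` are given, the consistency constraints $UL_a - LL_a \ge \varepsilon$ and the pairwise non-overlap condition are verified directly, and $I_i = 0$ exactly when the value of the classification function falls in the segment of the known category.

```python
def segment_indicators(c0, c, LL, UL, data, labels, eps=1e-6):
    """0-1 indicators for the single-function formulation with fixed c and segments."""
    k = len(LL)
    for a in range(k):
        if UL[a] - LL[a] < eps:
            raise ValueError(f"segment {a} is too short")
    for a in range(k):
        for b in range(a + 1, k):
            # exactly one of I_ab, I_ba is 0, so one ordering must hold with gap eps
            if not (LL[a] - UL[b] >= eps or LL[b] - UL[a] >= eps):
                raise ValueError(f"segments {a} and {b} overlap")
    indicators = []
    for x, a in zip(data, labels):
        f = c0 + sum(cj * xj for cj, xj in zip(c, x))
        indicators.append(0 if LL[a] <= f <= UL[a] else 1)
    return indicators


I = segment_indicators(0.0, [1.0], LL=[0.0, 2.0], UL=[1.0, 3.0],
                       data=[[0.5], [2.5], [1.5]], labels=[0, 1, 1])
print(I)  # prints: [0, 0, 1]
```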


Extensions of these elementary formulations of opti- 
mization techniques go in several different directions. 
A particularly useful variation deals with the notion 
of minimizing the total cost of misclassification, when 
there are different costs or penalties that are associ- 
ated with the different ways in which a subject could 
be incorrectly classified. Multiple stage classification 
schemes are also considered, in which a subject is ini- 
tially either placed in a category with existing informa- 
tion, or no classification is made. If a classification is 
not made in the first stage, then additional information 
is used to make a classification in a second stage. In ad- 
dition, specialized heuristic techniques have been de- 
veloped to obtain solutions to these mathematical pro- 
gramming formulations in an efficient manner, when 
the number of possible categories of classification is re- 
stricted. Much of the current work on these extensions 
is given in [5]. 
Introduction


This article considers the application of the notion of 
statistical convergence in turnpike theory. The first re- 
sults have been obtained recently [14,15,19]. We briefly 
discuss the importance of this conjunction, present 
some results obtained and, finally, we formulate a chal- 
lenging problem for future investigations. 

We will consider discrete dynamical systems. Trajectories of these systems are sequences of real
numbers. The turnpike property, in its simplest form, states
that there is a certain stationary point that attracts all
optimal trajectories regardless of the initial state.
In other words, all optimal trajectories converge to this
stationary point. We can say that any optimal trajectory spends “almost” all time in any $\varepsilon$-neighborhood
of that point. The term “almost” in this case means that
only a finite number of elements of an optimal trajectory
may remain outside the $\varepsilon$-neighborhood of that point
($\varepsilon > 0$ is any small number).

It turns out that for some practical problems this
property does not hold. Relaxing the term “almost” might help to extend the class of problems
for which the turnpike property holds. One such relaxation
is the use of statistical convergence instead of ordinary convergence. In this case, an infinite number
of elements of an optimal trajectory may remain outside
the $\varepsilon$-neighborhood of a stationary point; however, the
number of these elements in comparison with the number of elements in the $\varepsilon$-neighborhood is so small that
we can say the optimal trajectory “almost” remains in
this neighborhood.

This article adopts the notion of statistical conver- 
gence to describe the turnpike property. 


Turnpike Theory 


Turnpike theory studies the asymptotic behavior (often
stability) of optimal trajectories of dynamical systems.
It has many applications in economics and engineering. 
We refer to [8,16,18,25] for more detailed information 
about this theory and its various applications. 

The first result in this area was obtained by J. von
Neumann in 1945. However, the significance of this
result, which led to the turnpike property, was recognized by
Paul A. Samuelson in 1948-1949, who also introduced
this terminology. A clearer description of this property
was provided by Dorfman et al. [3] in the chapter “Ef- 
ficient Programs of Capital Accumulation” of Linear 
Programming and Economic Analysis. The following is 
the famous quote from [3], p. 331, that describes the 
meaning of the turnpike property: 

“Thus in this unexpected way, we have found a real 
normative significance for steady growth - not steady 
growth in general, but maximal von Neumann growth. 
It is, in a sense, the single most effective way for the sys- 
tem to grow, so that if we are planning long-run growth, 
no matter where we start and where we desire to end up 
it will pay in the intermediate stages to get into a growth 
phase of this kind. It is exactly like a turnpike paralleled 
by a network of minor roads. There is a fastest route
between any two points; and if the origin and destination are close together and far from the turnpike, the best
route may not touch the turnpike. But if origin and des- 
tination are far enough apart, it will always pay to get 
on to the turnpike and cover distance at the best rate of 
travel, even if this means adding a little mileage at either 
end. The best intermediate capital configuration is one 
which will grow most rapidly, even if it is not the desired 
one, it is temporarily optimal”. 

Since this book, theorems about the asymptotic behavior of optimal (or efficient) trajectories of dynamical
systems have been called “turnpike theorems.” Asymptotic behavior of optimal trajectories may be described in different ways.

In this article, we consider trajectories that are sequences of points of $\mathbb{R}^m$. The turnpike property
in this case can be formulated as the convergence of optimal trajectories to some stationary point. We reformulate this property using statistical convergence instead
of ordinary convergence.


Statistical Cluster Points 
and Statistical Convergence 


The idea of statistical convergence was introduced by 
Steinhaus [23] and also independently by Fast [4] and 
Buck [1] for sequences of real and complex numbers. 
Later, this notion was developed by Salat [22], Mad- 
dox [7], Connor [2], Fridy [5,6] and others. 

Fridy [6] introduced the notion of a statistical limit 
point and a statistical cluster point and gave some prop- 
erties of a set of statistical limit and cluster points. In 
particular, it was shown that the set of statistical cluster 
points of a bounded sequence is not empty; moreover, 
if this set consists of one point, then the sequence is sta- 
tistically convergent to this point. Because of this prop- 
erty, the notion of statistical cluster points, and, conse- 
quently, the statistical convergence, became a suitable 
tool that could be used in turnpike theory. 

First we present some notation. We denote by $|A|$
the cardinality of a subset $A \subset \{1, 2, \dots\}$. Consider
a sequence $(x_k)$, where $x_k \in \mathbb{R}^m$, $k = 1, 2, \dots$


Definition 1 The sequence $(x_k)$ is said to be statistically convergent to $x^* \in \mathbb{R}^m$ if for every $\varepsilon > 0$

$$\limsup_{n \to \infty} \frac{1}{n} \bigl|\{k \le n : \|x_k - x^*\| \ge \varepsilon\}\bigr| = 0.$$

We use the notation $\text{st-}\lim_{k \to \infty} x_k = x^*$ in this case.
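Definition 1 can be probed numerically by counting, for finite $n$, the fraction of indices $k \le n$ whose terms lie at distance at least $\varepsilon$ from the candidate limit. The sequence used below is a hypothetical illustration: $x_k = 1$ when $k$ is a power of 2 and $x_k = 1/k$ otherwise. It has infinitely many terms outside every small neighborhood of 0, so it does not converge, yet the exceptional fraction tends to 0, consistent with $\text{st-}\lim x_k = 0$. A finite computation only suggests, and does not prove, the limit.

```python
def x(k):
    """x_k = 1 at powers of two (k & (k - 1) == 0), 1/k otherwise."""
    return 1.0 if k & (k - 1) == 0 else 1.0 / k


def exceptional_fraction(seq, limit, eps, n):
    """(1/n) * |{k <= n : |x_k - limit| >= eps}|."""
    return sum(1 for k in range(1, n + 1) if abs(seq(k) - limit) >= eps) / n


for n in (2**8, 2**12, 2**16):
    print(n, exceptional_fraction(x, 0.0, 0.01, n))
```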


Definition 2 $\xi \in \mathbb{R}^m$ is said to be a statistical cluster point of the
sequence $(x_k)$ if for every $\varepsilon > 0$

$$\limsup_{n \to \infty} \frac{1}{n} \bigl|\{k \le n : \|x_k - \xi\| < \varepsilon\}\bigr| > 0.$$

Given a sequence $x = (x_k)$, we denote by $\Gamma_x$ the set of
all statistical cluster points. For a bounded sequence, if this set consists of one
point, then the sequence is statistically convergent to
that point [6].

The set of ordinary limit points is defined as

$$L_x = \{\xi \in \mathbb{R}^m : \text{there exists a subsequence } x_{k_l} \to \xi \text{ as } k_l \to \infty\}.$$

It is clear that if $x$ is a bounded sequence, then $L_x$ is
a nonempty compact set and for every $\varepsilon > 0$ there exists
a number $N_\varepsilon < +\infty$ such that

$$\rho(L_x, x_k) < \varepsilon \quad \text{for every } k > N_\varepsilon.$$

Here $\rho(A, \xi) = \min_{y \in A} \|y - \xi\|$ is the distance from $\xi$
to the closed set $A$.

It turns out that the set $\Gamma_x$ possesses a similar property. The following is a very useful and important result
proved in [6] for the case $m = 1$. It is not difficult to
generalize it to $m > 1$.


Lemma 1. Assume that $x = (x_k)$ is a bounded sequence. Then:
1. There exists a sequence $y = (y_k)$ such that
   • $\Gamma_x = L_y$,
   • $\lim_{n \to \infty} \frac{1}{n} |\{k \le n : x_k \ne y_k\}| = 0$.
2. The set of statistical cluster points $\Gamma_x$ is nonempty
and compact.
3. $\lim_{n \to \infty} \frac{1}{n} |\{k \le n : \rho(\Gamma_x, x_k) \ge \varepsilon\}| = 0$ for all
$\varepsilon > 0$.


Let $a = (a_k)$ be a bounded sequence of real numbers
and $\Gamma_a$ be the set of statistical cluster points of this sequence. From Lemma 1 we know that the set $\Gamma_a$ has
a minimal element. We denote by $C\text{-}\liminf_{k \to \infty} a_k$
the minimal element of $\Gamma_a$.

This notation is similar to the notion of
$\liminf_{k \to \infty} a_k$ being equal to the minimal number of
the set of ordinary limit points $L_a$.

Let $g: \mathbb{R}^m \to \mathbb{R}$ be a continuous function and
$x = (x_k)$, $x_k \in \mathbb{R}^m$, be a given bounded sequence. Then
the sequence of real numbers $y = (g(x_k))$ is bounded.
We define the following functional

$$J(x) = C\text{-}\liminf_{k \to \infty} g(x_k) = \min \Gamma_y \tag{1}$$

as the minimal number of $\Gamma_y$. We have the following
useful representation: given any bounded sequence
$x = (x_k)$,

$$C\text{-}\liminf_{k \to \infty} g(x_k) = \min_{\xi \in \Gamma_x} g(\xi). \tag{2}$$
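The difference between the $C$-lim inf and the ordinary lim inf can be seen on a hypothetical bounded sequence: $x_k = 2$ when $k$ is a perfect square and $x_k = (-1)^k$ otherwise, with $g(x) = -x^2$. Here $L_x = \{-1, 1, 2\}$ but $\Gamma_x = \{-1, 1\}$, because the perfect squares have density zero. By the representation $\min_{\xi \in \Gamma_x} g(\xi)$, the $C$-lim inf of $g(x_k)$ is therefore $-1$, while the ordinary lim inf is $-4$. The sketch approximates Definition 2 with a finite $n$ and an arbitrary density floor. This is a heuristic, since a finite sample cannot settle a lim sup.

```python
import math


def x(k):
    return 2.0 if math.isqrt(k) ** 2 == k else (-1.0) ** k


def g(v):
    return -v * v


def estimated_cluster_points(seq, candidates, eps, n, density_floor):
    """Keep candidates whose eps-neighbourhood holds more than density_floor of x_1..x_n."""
    values = [seq(k) for k in range(1, n + 1)]
    return [xi for xi in candidates
            if sum(abs(v - xi) < eps for v in values) / n > density_floor]


n = 10_000
gamma = estimated_cluster_points(x, [-1.0, 1.0, 2.0], eps=0.5, n=n, density_floor=0.05)
c_liminf = min(g(xi) for xi in gamma)                  # min of g over Gamma_x
tail_min = min(g(x(k)) for k in range(n // 2, n + 1))  # proxy for the ordinary lim inf
print(gamma, c_liminf, tail_min)  # prints: [-1.0, 1.0] -1.0 -4.0
```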


Below we consider two problems where the turn- 
pike theorems are formulated in terms of statistical con- 
vergence. For details see [14,15,19]. 


Problem 1 


Consider the problem

$$x_{k+1} \in a(x_k) + x_k, \quad k = 1, 2, \dots, \tag{3}$$

$$J(x) = C\text{-}\liminf_{k \to \infty} g(x_k) \to \max. \tag{4}$$

We assume that the set-valued mapping $a: \mathbb{R}^m \to \Pi_c(\mathbb{R}^m)$ is continuous in the Hausdorff metric and
$g: \mathbb{R}^m \to \mathbb{R}$ is a continuous function. Here $\Pi_c(\mathbb{R}^m)$
stands for the set of all compact subsets of $\mathbb{R}^m$.

A sequence $x = (x_k)$ satisfying (3) will be called
a trajectory of this system. From Lemma 1 we know
that the functional (4) is well defined for bounded trajectories.


Definition 3. $\xi \in \mathbb{R}^m$ is called a stationary point if

$$0 \in a(\xi).$$

Note that if $\xi$ is a stationary point, then the sequence
$(x_k)$, where $x_k = \xi$ for all $k = 1, 2, \dots$, is a stationary
trajectory of system (3). Throughout this article, we denote the set of stationary points by $M$:

$$M = \{x : 0 \in a(x)\}.$$

The set $M$ may be empty or unbounded. If it is not
empty, then it is a closed set, as the mapping $a$ is continuous. Denote

$$J^* = \sup_{\xi \in M} g(\xi).$$

Definition 4 Trajectory $x = (x_k)$ is called optimal if
$J(x) \ge J(\tilde{x})$ holds for all trajectories $\tilde{x}$ starting from the
same initial state: $\tilde{x}_1 = x_1$.
We use the notation $a(A) = \bigcup_{x \in A} a(x)$. The following
is the main condition imposed on mapping $a$.
Condition A: Given any set $A \subset \mathbb{R}^m$,

$$\text{if } 0 \in \operatorname{co} a(A) \text{ then } 0 \in a(\operatorname{co} A), \tag{5}$$

where $\operatorname{co}$ denotes the convex hull. If mapping $a$ has a convex graph then this condition
holds. The main results are presented in the following
two theorems.


Theorem 1 Assume that Condition A holds and $g$ is
concave. Then for every bounded trajectory $x = (x_k)$ of
system (3) the inequality $J(x) \le J^*$ is satisfied.

Now assume that the set $M$ is convex and bounded, and
function $g$ is strictly concave. Then there is a unique
point $\xi^*$ such that $g(\xi^*) = J^*$.

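Condition B: The set $a(\xi^*)$ is a strictly convex
body; that is, $\operatorname{int} a(\xi^*) \ne \emptyset$, and for every two distinct points
$\xi_1, \xi_2 \in \partial a(\xi^*)$ and for all $\lambda \in (0, 1)$ the following holds:

$$\lambda \xi_1 + (1 - \lambda) \xi_2 \in \operatorname{int} a(\xi^*).$$

Here $\partial(\cdot)$ and $\operatorname{int}(\cdot)$ stand for the boundary and the
interior of a set, respectively.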


Theorem 2 Assume that $M$ is convex and bounded,
function $g$ is strictly concave and Conditions A and
B hold. If a bounded trajectory $x = (x_k)$ satisfies
$J(x) = J^*$, then $\Gamma_x = \{\xi^*\}$; that is, trajectory $x$ is statistically convergent to $\xi^*$: $\text{st-}\lim_{k \to \infty} x_k = \xi^*$.


If $J(x) = J^*$, then from the first theorem it follows that
trajectory $x = (x_k)$ is optimal. The second theorem
provides the turnpike property: all optimal trajectories
satisfying $J(x) = J^*$ statistically converge to $\xi^*$.

The proofs of these theorems are based on techniques
developed for continuous systems in [9,10,11,12].
These studies did not use an assumption similar to
Condition B. The following example shows that Condition B is necessary when dealing with discrete systems.


Example 1. Let mapping $a$ and function $g$ be defined
on the box $\{(x_1, x_2) : |x_i| \le 1, i = 1, 2\}$ as follows:

$$a(x_1, x_2) = \{(y_1, y_2) : y_1 = x_2(x_2 - 1),\ y_2 \in [-2x_2, 1 - 2x_2]\},$$

$$g(x_1, x_2) = -x_1^2 - (1 - x_2)^2.$$

We have

$$M = \{(x_1, x_2) : |x_1| \le 1,\ x_2 = 0\}, \qquad J^* = \max_{\xi \in M} g(\xi) = g(0, 0) = -1, \qquad \xi^* = (0, 0).$$

It is not difficult to see that all the conditions of Theorem 2 hold except Condition B. Consider the sequence $x = (x^k)$ where $x^k = (0, 0)$ for $k = 1, 3, 5, \dots$,
and $x^k = (0, 1)$ for $k = 2, 4, 6, \dots$ It is a trajectory of
(3) because $(0, 1) \in a(0, 0)$ and $(0, -1) \in a(0, 1)$. Moreover, the set $\Gamma_x = \{(0, 0), (0, 1)\}$ consists of two points.
We have

$$J(x) = \min_{\xi \in \Gamma_x} g(\xi) = -1 = J^*;$$

however, $x^k$ is not statistically convergent to
$\xi^*$: $\text{st-}\lim_{k \to \infty} x^k \ne \xi^*$.
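The claims of Example 1 are easy to check mechanically. The sketch below (hypothetical helper names) tests whether each increment $x^{k+1} - x^k$ lies in $a(x^k)$, as system (3) requires, evaluates $J(x)$ as the minimum of $g$ over $\Gamma_x = \{(0,0), (0,1)\}$, and confirms that $(0,1)$ is not a stationary point while $(0,0)$ is.

```python
def in_a(x, y, tol=1e-12):
    """Is y in a(x) for the mapping a of Example 1?"""
    x1, x2 = x
    y1, y2 = y
    return (abs(y1 - x2 * (x2 - 1.0)) <= tol
            and -2.0 * x2 - tol <= y2 <= 1.0 - 2.0 * x2 + tol)


def g(x):
    x1, x2 = x
    return -x1 ** 2 - (1.0 - x2) ** 2


# x^k = (0, 0) for odd k and (0, 1) for even k, k = 1, ..., 100
traj = [(0.0, 0.0) if k % 2 == 1 else (0.0, 1.0) for k in range(1, 101)]
is_trajectory = all(
    in_a(p, (q[0] - p[0], q[1] - p[1])) for p, q in zip(traj, traj[1:])
)
gamma = [(0.0, 0.0), (0.0, 1.0)]  # each point is visited with density 1/2
J = min(g(xi) for xi in gamma)    # J(x) as the minimum of g over Gamma_x
print(is_trajectory, J, in_a((0.0, 1.0), (0.0, 0.0)))  # prints: True -1.0 False
```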
Problem 2 
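Consider the problem

$$x_{k+1} = f(x_k, u_k), \quad x_1 = \xi^0, \quad u_k \in U, \tag{6}$$

$$J(x) = C\text{-}\liminf_{k \to \infty} g(x_k) \to \max. \tag{7}$$

Here $\xi^0$ is a fixed initial point, the function $f(x, u): \mathbb{R}^m \times \mathbb{R}^r \to \mathbb{R}^m$ is continuous, $U \subset \mathbb{R}^r$ is a compact
set and $g: \mathbb{R}^m \to \mathbb{R}$ is a continuous function.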

The pair $(u, x)$ is called a process if the sequences
$x = (x_k)$ and $u = (u_k)$ satisfy (6) for all $k = 1, 2, \dots$;
$x = (x_k)$ is called a trajectory and $u = (u_k)$ is called
a control.

We assume that there is a bounded closed set
$C \subset \mathbb{R}^m$ such that $x_k \in C$ for all trajectories; that is, we
assume that trajectories are uniformly bounded.


Definition 5 $\xi \in \mathbb{R}^m$ is called a stationary point if
there exists $u \in U$ such that $f(\xi, u) = \xi$.

We denote the set of stationary points by $M$. It is clear
that $M$ is a closed set.
We formulate the main conditions as follows:
Condition 1. Function $g$ has a unique maximizer on the
set $M$, denoted by $\xi^*$: $\max_{\xi \in M} g(\xi) = g(\xi^*)$.
Condition 2. There exists a process $(u^*, x^*)$ such
that $x^*_k \to \xi^*$ as $k \to \infty$.
Denote $B = \{\xi \in C : g(\xi) \ge g(\xi^*)\}$.
Condition 3. There exists a vector $p \in \mathbb{R}^m$ such that
$p f(x, u) < p x$ for all $x \in B$, $x \ne \xi^*$, and $u \in U$.

The turnpike property is formulated in the follow- 
ing theorem. 


Theorem 3 Let Conditions 1-3 hold and $(u, x)$ be an
optimal process in problem (6), (7). Then

1. $\Gamma_x = \{\xi^*\}$; that is, $\text{st-}\lim_{k \to \infty} x_k = \xi^*$.

2. If $u^* \in U$ is the unique point in $U$ such that
$f(\xi^*, u^*) = \xi^*$, then we also have $\text{st-}\lim_{k \to \infty} u_k = u^*$.


In this theorem, the turnpike property is established not
only for optimal trajectories but also for optimal controls. In both cases this property is satisfied in terms of
statistical convergence.


A Challenging Problem 


The functional in the above problems is defined by statistical cluster points (see (4)). The following functional
would be of great interest in terms of turnpike theory
and statistical convergence:

$$J(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} g(x_k) \to \max. \tag{8}$$


In the literature on turnpike theory many functionals
have been considered, including terminal functionals and integral (summation) functionals with and without
discount factors [3,8,9,10,11,12,13,16,18,20,21,24,25].
They are usually defined by utility functions ($g$ in our
case).

The functional (8) also has a useful meaning: it aims
to maximize the limit of average utilities. However, this
functional is not considered in the literature, and the reason
is very simple: for functional (8) the turnpike property
in terms of (ordinary) convergence is in general not true!

We explain how this may happen in the following 
example. 


Example 2. Consider the system $x_{k+1} \in a(x_k)$,
$k = 1, 2, \dots$, where $x_k \in (-\infty, +\infty)$. We only require
that the sets $a(0)$ and $a(1)$ contain at least the points 0 and
1: $\{0, 1\} \subset a(0)$, $\{0, 1\} \subset a(1)$. Function $g$ is defined as
$g(x) = -x^2$.

It is clear that the stationary trajectory $\bar{x}_k = 0$, $k = 1, 2, \dots$, is an optimal trajectory and $J^* = 0$. For any other
trajectory $\tilde{x} = (\tilde{x}_k)$ we have $J(\tilde{x}) \le J^*$.


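Consider a sequence $x = (x_k)$, where $x_k = 1$ for all
$k = i^2$, $i = 1, 2, \dots$, and $x_k = 0$ otherwise. We know
that

$$\lim_{k \to \infty} x_k \text{ does not exist; however, } \text{st-}\lim_{k \to \infty} x_k = 0.$$

It is easy to see that this sequence is a trajectory of the
system. Moreover, since exactly $\lfloor \sqrt{n} \rfloor$ of the indices $k \le n$ are perfect squares, we have

$$\frac{1}{n} \sum_{k=1}^{n} g(x_k) = -\frac{\lfloor \sqrt{n} \rfloor}{n} \to 0 \quad \text{as } n \to \infty.$$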


Therefore, x is an optimal trajectory and it does not 
converge to 0; meanwhile, the statistical convergence to 
0 is valid. 
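The averages in Example 2 can be computed directly. Taking $x_k = 1$ exactly at the perfect squares, $g(x_k) = -1$ at those indices and 0 elsewhere, so the running average equals $-\lfloor\sqrt{n}\rfloor/n$. The sketch evaluates it both by summation and by this closed form.

```python
import math


def average_utility(n):
    """(1/n) * sum_{k<=n} g(x_k) with g(x) = -x^2 and x_k = 1 exactly at squares."""
    total = 0.0
    for k in range(1, n + 1):
        x_k = 1.0 if math.isqrt(k) ** 2 == k else 0.0
        total += -x_k * x_k
    return total / n


for n in (10**2, 10**4, 10**5):
    print(n, average_utility(n), -math.isqrt(n) / n)
```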


This example shows that the turnpike property for 
functional (8) should use something different from or- 
dinary convergence. We believe that the statistical con- 
vergence will be suitable for this aim. 

To prove the turnpike property, in terms of statisti- 
cal convergence, for a wide range of systems with func- 
tional (8) would be a challenging problem. 


See also 


> Turnpike Theory: Stability of Optimal Trajectories 
Introduction


The careful observation of protein data banks [6] 
has been one of the motivations for modelling the 
biomolecular structure. The present development of 
this subject led to the conviction that the placement of 
some atoms in the structure of a protein is the same as 
that of Steiner points of a minimal Steiner Tree [3,4]. 
The experimental internal radius of a DNA molecule 
and a molecular aggregate like the tobacco mosaic virus 
as well as the pitch of the helices in a helical model of 
the placement of their atoms have also been in good 
agreement with this Steiner modelling. It seems that 
there is a deep correlation between the potential en- 
ergy of the molecular configuration and the length of 
the Steiner Tree. The search for the minima of the en- 
ergy could then be conducted by solving the associated 
Steiner problem. Even molecular clusters can be stud- 
ied with this approach by starting from an existing cor- 
relation of their potential energies with the length of 
a generic Fermat problem. It can be thought that Na- 
ture is following mathematical principles of local en- 
ergy minimization in order to build the present form of 
these structures and to keep them looking for stability 
through unstable stages of molecular evolution [5]. 


The Steiner Ratio of a Metric Manifold 


We consider a finite set of points $A$ in a metric manifold $M$. Let us consider the subsets of $A$ such that
each pair of their points can be connected by an
edge of minimal length. These edges are
geodesic arcs of the manifold $M$. A tree is a collection
of points and their connecting edges. A tree that connects all the points of a subset is a spanning tree (SP)
of this subset. Among all the possible SPs $s$ of a set $A$
with length $l_{SP}(s, A)$, there is at least one whose overall length is minimum. This will be the minimal SP of
set $A$, MST(A). Its length will be given by

$$l_{MST}(A) = \min_{(s\text{-trees})} l_{SP}(s, A). \tag{1}$$

If we allow for the introduction of additional points
of the manifold $M$ for each set $A$, we get SPs of smaller
overall length. A Steiner tree (ST) is obtained with the
additional requirement that exactly three geodesic edges meet at each
additional (Steiner) point, their tangent lines forming angles of 120°. Among all these Steiner trees
$t$ of a set $A$ with length $l_{ST}(t, A)$, there is one whose
overall length is minimum. This is called the Steiner
minimal tree of set $A$, SMT(A). Its length is

$$l_{SMT}(A) = \min_{(t\text{-trees})} l_{ST}(t, A). \tag{2}$$


The minimum spanning tree MST(A) is the worst
approximation to the Steiner minimal tree, SMT(A), or
the “worst cut” for each set $A \subset M$. A common measure of this approximation is the Steiner ratio of the set
$A \subset M$:

$$\rho(A) = \frac{l_{SMT}(A)}{l_{MST}(A)}. \tag{3}$$

The Steiner ratio $\rho_M$ of the manifold $M$ is then defined as the infimum of all values $\rho(A)$ over all sets $A$, or

$$\rho_M = \inf_{A \subset M} \rho(A). \tag{4}$$


We henceforth adopt the three-dimensional Eu- 
clidean space as the metric manifold M. 
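For a small point set, definition (3) can be evaluated directly. The Python sketch below computes the MST length with Prim's algorithm. For three points whose triangle has all angles below 120°, it takes the SMT length as the sum of distances to the Fermat (Torricelli) point, which it locates with Weiszfeld's iteration. Both techniques are standard choices made for this illustration, not methods taken from the text. For the unit equilateral triangle in $\mathbb{R}^3$ one expects $\rho(A) = \sqrt{3}/2$.

```python
import math


def mst_length(points):
    """Length of a minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        i = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        in_tree[i] = True
        total += best[i]
        for j in range(n):
            if not in_tree[j]:
                best[j] = min(best[j], math.dist(points[i], points[j]))
    return total


def fermat_point(points, iterations=500):
    """Weiszfeld iteration for the point minimizing the sum of distances."""
    p = [sum(c) / len(points) for c in zip(*points)]  # start at the centroid
    for _ in range(iterations):
        w = [1.0 / max(math.dist(p, q), 1e-15) for q in points]
        p = [sum(wi * q[d] for wi, q in zip(w, points)) / sum(w) for d in range(len(p))]
    return p


def steiner_ratio_triangle(points):
    """rho(A) = l_SMT / l_MST for three points with all angles below 120 degrees."""
    f = fermat_point(points)
    return sum(math.dist(f, q) for q in points) / mst_length(points)


triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, math.sqrt(3.0) / 2.0, 0.0)]
print(steiner_ratio_triangle(triangle), math.sqrt(3.0) / 2.0)
```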


Evenly Spaced Consecutive Points — 
Spanning and Steiner Trees 


For each set $A$ of points in $\mathbb{R}^3$ we suppose that a continuous
and differentiable curve passes through all these points. If the
points along the curve are evenly spaced in terms of the
Euclidean metric, we have for their position vectors

$$\|\vec r_{j+2} - \vec r_{j+1}\| = \|\vec r_{j+1} - \vec r_j\|, \tag{5}$$

where $\|\cdot\|$ represents the Euclidean norm.
A convenient representation of these vectors will
be

$$\vec r_j = \bigl(r(\omega)\cos(j\omega),\ r(\omega)\sin(j\omega),\ j h(\omega)\bigr), \quad 0 \le j \le n - 1. \tag{6}$$

The functions $r(\omega)$ and $h(\omega)$ are continuous and twice
differentiable.

The position vectors $\vec r_j$ above have an interesting
property: four of them are enough to generate all the
others, or

$$\vec r_{j+4} = \mu \vec r_{j+3} + \nu \vec r_{j+2} + \xi \vec r_{j+1} + \zeta \vec r_j, \quad 0 \le j \le n - 5, \tag{7}$$

where $\mu$, $\nu$, $\xi$, $\zeta$ are functions to be found.
This is a well-posed problem with a unique solution.


We write the corresponding relations for the coordinates of the vectors in Eq. (7) as

$$(j + 4)h = \mu (j + 3)h + \nu (j + 2)h + \xi (j + 1)h + \zeta j h, \quad 0 \le j \le n - 5, \tag{8}$$

and, with an Argand representation in the $x^1$-$x^2$
plane,

$$r z_{j+4} = \mu r z_{j+3} + \nu r z_{j+2} + \xi r z_{j+1} + \zeta r z_j, \quad 0 \le j \le n - 5, \tag{9}$$

where

$$z_j = (z)^j = e^{i j \omega}. \tag{10}$$

From Eq. (9) we have

$$z^4 - \mu z^3 - \nu z^2 - \xi z - \zeta = 0. \tag{11}$$

We write Eq. (8) for two points $j$ and $j + 1$, and we get

$$\mu + \nu + \xi + \zeta = 1. \tag{12}$$

From Eqs. (12) and (8) we can write

$$\mu + 2\nu + 3\xi + 4\zeta = 0. \tag{13}$$


The last two equations are enough for the existence of
a double root $z = 1$ of Eq. (11), or

$$(z - 1)^2 \bigl(z^2 + (2 - \mu) z + 3 - 2\mu - \nu\bigr) = 0. \tag{14}$$

For complex roots of unit modulus, $|z| = 1$, according
to Eq. (10), we have

$$\mu^2 + 4\mu + 4\nu - 8 < 0, \tag{15}$$

$$\mu = 2(1 + \cos\omega), \quad 0 < \mu < 4. \tag{16}$$
Equations (15) and (16) lead us to write

$$2\mu + \nu = 2. \tag{17}$$

From Eqs. (11), (12), (13) and (17) we get

$$\nu = 2 - 2\mu, \quad \xi = \mu, \quad \zeta = -1. \tag{18}$$

We can then write

$$\vec r_{j+4} = \mu \vec r_{j+3} + 2(1 - \mu)\vec r_{j+2} + \mu \vec r_{j+1} - \vec r_j. \tag{19}$$
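Relation (19) can be verified numerically for the helix (6). The sketch below uses arbitrary illustrative values of $r$, $h$ and $\omega$, sets $\mu = 2(1 + \cos\omega)$ as in (16), and reports the largest deviation of $\vec r_{j+4}$ from the right-hand side of (19) over all coordinates. It should be at rounding-error level, whereas a mismatched $\mu$ leaves a visible residual.

```python
import math


def helix_points(n, r, h, omega):
    """Position vectors (6): (r cos(j w), r sin(j w), j h), j = 0, ..., n-1."""
    return [(r * math.cos(j * omega), r * math.sin(j * omega), j * h) for j in range(n)]


def recurrence_residual(points, omega):
    """Max |r_{j+4} - [mu r_{j+3} + 2(1 - mu) r_{j+2} + mu r_{j+1} - r_j]|."""
    mu = 2.0 * (1.0 + math.cos(omega))
    worst = 0.0
    for j in range(len(points) - 4):
        for d in range(3):
            rhs = (mu * points[j + 3][d] + 2.0 * (1.0 - mu) * points[j + 2][d]
                   + mu * points[j + 1][d] - points[j][d])
            worst = max(worst, abs(points[j + 4][d] - rhs))
    return worst


omega_R = math.pi - math.acos(2.0 / 3.0)  # one of the values in (20)
print(recurrence_residual(helix_points(30, 1.3, 0.7, omega_R), omega_R))
```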


This relation motivates us to think of tetrahedra as the fundamental pieces of this modelling. There are $n_T = n - 3$ tetrahedra for $n$ points.

The values

$$\omega = \omega_R = \pi \pm \arccos\left(\frac{2}{3}\right) \tag{20}$$

correspond to vertices of two sequences of regular tetrahedra joined together at common faces. In
the literature, this structure is usually known by the name
“3-sausage” [2]. Actually, these two values correspond
to the same structure. It is itself chiral for $n \ge 6$, whereas
for $n = 3, 4, 5$ (0, 1, 2 tetrahedra) there is a two-fold
($n = 3$) or a three-fold ($n = 4, 5$) axis of symmetry.
The structures corresponding to $\omega$ and $-\omega$, with $\omega \ne \omega_R$,
are sequences of non-regular tetrahedra; they are
chiral themselves and chiral to each other for all $n > 3$.

After this digression, we go back to the problem
of constructing SPs for the set of $n$ vertices of Eq. (5).
A first candidate is the sequence itself. Its total length is
given by

$$l_{SP} = (n - 1)\bigl[h^2 + r^2(A + 1)\bigr]^{1/2}, \tag{21}$$

where

$$A = 1 - 2\cos\omega. \tag{22}$$


There is a necessary restriction on this spanning tree if
we require that the STs to be formed below be full STs
($n - 2$ Steiner points). The smallest angle between consecutive edges with the points $\vec r_j$ of Eq. (6) as vertices
should be less than 120°, that is, its cosine (the left-hand side below) should exceed $-1/2$. We write

$$\frac{r^2(A^2 - 1) - 2h^2}{2\bigl[h^2 + r^2(A + 1)\bigr]} > -\frac{1}{2}, \tag{23}$$

or

$$h^2 < r^2 A (A + 1). \tag{24}$$
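Equations (21) and (24) can be checked against a direct computation. The sketch below (illustrative parameter values) sums the consecutive edge lengths of the helix (6), compares the sum with (21), and computes the angle between consecutive edges at a vertex, confirming that it is below 120° exactly when $h^2 < r^2 A(A+1)$.

```python
import math


def helix_points(n, r, h, omega):
    return [(r * math.cos(j * omega), r * math.sin(j * omega), j * h) for j in range(n)]


def path_length(points):
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def l_sp_closed_form(n, r, h, omega):
    A = 1.0 - 2.0 * math.cos(omega)                         # Eq. (22)
    return (n - 1) * math.sqrt(h * h + r * r * (A + 1.0))  # Eq. (21)


def vertex_angle_deg(r, h, omega):
    """Angle at r_1 between the edges to r_0 and r_2."""
    p0, p1, p2 = helix_points(3, r, h, omega)
    u = [a - b for a, b in zip(p0, p1)]
    v = [a - b for a, b in zip(p2, p1)]
    cos_t = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(cos_t))


def condition_24(r, h, omega):
    A = 1.0 - 2.0 * math.cos(omega)
    return h * h < r * r * A * (A + 1.0)


omega_R = math.pi - math.acos(2.0 / 3.0)
for h in (0.5, 3.0):
    print(h, vertex_angle_deg(1.0, h, omega_R), condition_24(1.0, h, omega_R))
```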


This is the first restriction imposed on the position vectors $\vec r_j$. Actually, $r(\omega)$ is an arbitrary function for the
present modelling, as will be seen in the forthcoming
development.

A generalization of the last formula can be obtained
by introducing subsequences of evenly spaced but nonconsecutive points [4]. These subsequences are given
by

$$(P_j)_{m,\, l_{P_j\max}}:\ \vec r_j,\ \vec r_{j+m},\ \vec r_{j+2m},\ \dots,\ \vec r_{j+lm},\ \dots,\ \vec r_{j + l_{P_j\max} m}, \tag{25}$$

where $(m - 1)$ is the number of skipped points necessary to form this sequence. The indices of the subsequences above should be restricted by

$$j + l m \le n - 1. \tag{26}$$

We can write

$$l_{P_j\max} = \left[\frac{n - j - 1}{m}\right]. \tag{27}$$

The square brackets $[x]$ stand for the greatest integer
value $\le x$.

There are $m$ subsequences $P_j$, $0 \le j \le m - 1$, and
each subsequence has $(l_{P_j\max} + 1)$ points. We now define a new sequence as the union of the sequences
above:

$$P_m = \bigcup_{j=0}^{m-1} (P_j)_{m,\, l_{P_j\max}}. \tag{28}$$

As a check of the consistency of this scheme we can
see that each sequence $P_m$ has $n$ points, like the original
sequence, Eq. (6). From Eq. (27) and a mathematical
identity we have

$$\sum_{j=0}^{m-1} \bigl(l_{P_j\max} + 1\bigr) = m + \sum_{j=0}^{m-1} \left[\frac{n - j - 1}{m}\right] = m + (n - m) = n. \tag{29}$$

For completeness, the scheme should also include
the original sequence of consecutive points, given by
Eq. (6). It is effectively given by $P_1 = (P_0)_{1,\, n-1}$.

Each sequence $P_m$ (Eq. 28) has an associated SP. We
shall now proceed to the calculation of its length. We
stress that, as far as the calculation of length is concerned, the union of subsequences in the definition
of $P_m$ is accomplished by joining two subsequences by
an edge of consecutive points.

The scheme is valid for a generic set of points on
a given curve. We should write a generic Ansatz for the
coordinates of the subsequences instead of Eq. (6). We
have

$$\vec r_{j+lm} = \bigl(r(\omega)\cos((j + lm)\omega),\ r(\omega)\sin((j + lm)\omega),\ (j + lm)h(\omega)\bigr). \tag{30}$$

With the prescriptions above, the length of the SP for
a sequence $P_m$ is

$$l_{SP}^{(m)} = \bigl[m^2 h^2 + r^2(A_m + 1)\bigr]^{1/2} \sum_{j=0}^{m-1} l_{P_j\max} + (m - 1)\bigl[h^2 + r^2(A_1 + 1)\bigr]^{1/2}, \tag{31}$$

where

$$A_1 = A \quad \text{and} \quad A_m = 1 - 2\cos(m\omega). \tag{32}$$

After using the mathematical identity in the last equality of Eq. (29), we get

$$l_{SP}^{(m)} = (n - m)\bigl[m^2 h^2 + r^2(A_m + 1)\bigr]^{1/2} + (m - 1)\bigl[h^2 + r^2(A_1 + 1)\bigr]^{1/2}. \tag{33}$$


There is an analogous restriction to Eq. (24) on the
angles between edges of subsequences. It is written as

$$m^2 h^2 < r^2 A_m (A_m + 1), \quad \forall m. \tag{34}$$

The length of the MST will be given by

$$l_{MST} = \min_{(m)} \bigl\{ l_{SP}^{(m)} \bigr\}. \tag{35}$$

The $\min_{(m)}\{\dots\}$ process above should be understood in the sense of forming a piecewise
function from the functions corresponding to the values
$m = 1, 2, 3, \dots$
We can apply the same scheme to Steiner points and
their connecting edges. The original sequence is

$$\vec s_k = \bigl(R(\omega)\cos k\omega,\ R(\omega)\sin k\omega,\ k H(\omega)\bigr), \quad 1 \le k \le n - 2, \tag{36}$$

where $R(\omega)$ and $H(\omega)$ are also continuous and twice
differentiable functions.


We now form the subsequences

$$(S_k)_{m,\, l_{S_k\max}}:\ \vec s_k,\ \vec s_{k+m},\ \vec s_{k+2m},\ \dots,\ \vec s_{k + l_s m},\ \dots,\ \vec s_{k + l_{S_k\max} m}, \tag{37}$$

where $(m - 1)$ is the number of skipped points.
The restriction on the indices is

$$k + l_s m \le n - 2, \tag{38}$$

and we have

$$l_{S_k\max} = \left[\frac{n - k - 2}{m}\right]. \tag{39}$$

We also have $m$ subsequences $S_k$, $1 \le k \le m$, each
of which has $(l_{S_k\max} + 1)$ points. We define a new sequence of Steiner points as the union of the sequences $S_k$, $1 \le k \le m$:

$$S_m = \bigcup_{k=1}^{m} (S_k)_{m,\, l_{S_k\max}}. \tag{40}$$

These new sequences have $(n - 2)$ points each. This can
be checked by using Eq. (38), or

$$\sum_{k=1}^{m} \bigl(l_{S_k\max} + 1\bigr) = m + \sum_{k=1}^{m} \left[\frac{n - k - 2}{m}\right] = m + n - m - 2 = n - 2. \tag{41}$$

The scheme trivially includes the original sequence of
Eq. (36). It is given by $S_1 = (S_1)_{1,\, n-3}$.

The coordinates of these subsequences should be
written generically as

$$\vec s_{k + l_s m} = \bigl(R_m(\omega)\cos((k + l_s m)\omega),\ R_m(\omega)\sin((k + l_s m)\omega),\ (k + l_s m) H_m(\omega)\bigr), \tag{42}$$

where $R_m(\omega)$ and $H_m(\omega)$ are continuous and twice differentiable functions.


Steiner Trees 


The ST for each subsequence will be organized as
follows: first, the points $\vec s_{k + l_s m}$ of the subsequence
$(S_k)_{m,\, l_{S_k\max}}$ for a given $m$ will be connected consecutively to each other. The first of these points, $\vec s_k$, will
be connected to the first two points, $\vec r_j$ and $\vec r_{j+m}$, of the
subsequence $(P_j)_{m,\, l_{P_j\max}}$. The last point, $\vec s_{k + l_{S_k\max} m}$, will
be connected to the last two points, $\vec r_{j + (l_{P_j\max} - 1)m}$ and
$\vec r_{j + l_{P_j\max} m}$. All the intermediary points $\vec s_{k + l_s m}$ will be
connected to the intermediary points $\vec r_{j + l m}$, for $j = k$.
This means that we assume a path topology [2] for each
subsequence.

The requirement of edges meeting at an angle of 120° at each Steiner point leads to the following relations:

$$ H_m = h; \qquad m^2 H_m^2 = R_m^2 A_m(A_m+1), \qquad \forall m. \qquad (43) $$

From Eqs. (34) and (43) we have, trivially,

$$ R_m \le r. \qquad (44) $$


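From Eq. (44), the length of the ST corresponding to the sequences P_m and S_m and a path topology will be given by

$$ l_{ST}^{(m)} = (r-R_m)\sum_{k=1}^{m}\left(1+\left\lfloor\frac{n-k-2}{m}\right\rfloor\right) + \left[m^2H_m^2 + R_m^2(A_m+1)\right]^{1/2}\sum_{k=1}^{m}\left\lfloor\frac{n-k-2}{m}\right\rfloor + 2\left[m^2H_m^2 + (r-R_m)^2 + rR_m(A_m+1)\right]^{1/2}. \qquad (45) $$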


From Eqs. (43) and the mathematical identity used in Eq. (41) we write

$$ l_{ST}^{(m)} = (n-2)(r-R_m) + (n-m-2)R_m(A_m+1) + 2\left[(r-R_m)^2 + R_m(r + R_m A_m)(A_m+1)\right]^{1/2}. \qquad (46) $$


Actually, we have used Eq. (43) for the two ends of the tree. Their contribution to the length is the last term in Eq. (46). In order to satisfy the condition of edges meeting at 120° there, we need to take the special limit R_m → r. It is worthwhile for future modelling applications to notice that this procedure leads to the same result as the limit of large numbers n of atoms. This is easy to see from Eq. (46):

$$ l_{ST}^{(m)}\big(n \gg 1 \ \text{or}\ \underbrace{R_m \to r}_{\text{at ends}}\big) = nr + \left[(n-m)A_m - m\right]R_m. \qquad (47) $$
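A short numerical check of the two limits may be useful. The sketch below evaluates Eq. (46) and the asymptotic form of Eq. (47); it reads the first term of Eq. (46) as (n − 2)(r − R_m), which is what Eq. (45) gives after the identity of Eq. (41). Under that reading the two expressions coincide exactly at R_m = r and agree to relative order 1/n for large n. The parameter values are arbitrary test inputs.

```python
import math

def A(m, omega):
    # Eq. (32): A_m = 1 - 2 cos(m*omega)
    return 1 - 2 * math.cos(m * omega)

def l_ST(n, m, r, R, omega):
    # Eq. (46); first term (n - 2)(r - R_m) from Eqs. (41) and (45)
    a = A(m, omega)
    return ((n - 2) * (r - R) + (n - m - 2) * R * (a + 1)
            + 2 * math.sqrt((r - R) ** 2 + R * (r + R * a) * (a + 1)))

def l_ST_limit(n, m, r, R, omega):
    # Eq. (47): nr + [(n - m) A_m - m] R_m
    return n * r + ((n - m) * A(m, omega) - m) * R

# ends limit R_m -> r: both equal (n - m) r (A_m + 1)
print(l_ST(50, 2, 1.0, 1.0, 2.0), l_ST_limit(50, 2, 1.0, 1.0, 2.0))
# large-n limit with R_m < r: the ratio approaches 1
print(l_ST(10**6, 2, 1.0, 0.6, 2.0) / l_ST_limit(10**6, 2, 1.0, 0.6, 2.0))
```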


The Steiner Ratio Function 


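We follow the prescription of Eq. (3) for writing an expression for the Steiner ratio function. It will be given by

$$ \rho = \frac{\min_{(m)}\left\{ n + \left[A_m(A_m+1)\right]^{-1/2}\left[(n-m)A_m - m\right] m F(\omega) \right\}}{\min_{(m)}\left\{ (n-m)\left[A_m + 1 + m^2F(\omega)^2\right]^{1/2} + (m-1)\left[A_1 + 1 + F(\omega)^2\right]^{1/2} \right\}}, \qquad (48) $$

where F(ω) = h/r is a function restricted by

$$ F(\omega) \le \min_{(m)} \frac{1}{m}\left[A_m(A_m+1)\right]^{1/2}. \qquad (49) $$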


The application to protein modelling could be done by classifying the minimum energy values (minima of ρ) of protein structures in protein data banks [6] after choosing a convenient function F(ω).

For application to the Steiner ratio problem of discrete mathematics, we can see that for a very large set of points (n ≫ 1), the ratio function of Eq. (48) can be written

$$ \rho(n \gg 1) = \frac{\min_{(m)}\left\{ 1 + mF(\omega)\left(\frac{A_m}{A_m+1}\right)^{1/2} \right\}}{\min_{(m)}\left\{ \left[A_m + 1 + m^2F(\omega)^2\right]^{1/2} \right\}}. \qquad (50) $$


Let the function F(ω) be chosen such that the min_(m) process in the numerator of Eq. (50) is dominated by the term corresponding to a fixed value m = m̄. The problem of Eq. (50) will then be solved by

$$ \rho(\omega, F, \bar m) = \max_{(m)} \frac{1 + \bar m F(\omega)\left(\frac{A_{\bar m}}{A_{\bar m}+1}\right)^{1/2}}{\left[A_m + 1 + m^2F(\omega)^2\right]^{1/2}}. \qquad (51) $$


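For m̄ = 1 the function of Eq. (51) has the global minimum

$$ \omega_R = \pi - \arccos\frac{2}{3}, \qquad F(\omega_R) = \frac{\sqrt{30}}{9}, \qquad (52) $$

and

$$ \rho(\omega_R, F(\omega_R), \bar m = 1) = \frac{1}{10}\left(3\sqrt{3} + \sqrt{7}\right) = 0.78419037337. $$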
This is the value that was conjectured in the literature [2] as the best upper bound of the Steiner ratio for three-dimensional Euclidean space.
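The closed form can be checked numerically. Because the numerator of Eq. (51) does not depend on m, the maximum over m equals the numerator divided by the minimum of the denominator. The sketch below evaluates this with m̄ = 1 at ω_R = π − arccos(2/3) and F(ω_R) = √30/9, taking the minimum over a finite range m = 1, ..., m_max (an assumption of the sketch, harmless here because the m²F² term makes the denominator grow for large m), and compares the result with (3√3 + √7)/10.

```python
import math

def A(m, omega):
    # Eq. (32): A_m = 1 - 2 cos(m*omega)
    return 1 - 2 * math.cos(m * omega)

def rho_large_n(omega, F, m_bar=1, m_max=50):
    # Eq. (51): numerator fixed at m = m_bar; max over m of the ratio
    # equals the numerator divided by the smallest denominator
    a = A(m_bar, omega)
    num = 1 + m_bar * F * math.sqrt(a / (a + 1))
    den = min(math.sqrt(A(m, omega) + 1 + (m * F) ** 2)
              for m in range(1, m_max + 1))
    return num / den

omega_R = math.pi - math.acos(2 / 3)
F_R = math.sqrt(30) / 9
rho = rho_large_n(omega_R, F_R)
closed_form = (3 * math.sqrt(3) + math.sqrt(7)) / 10
print(rho, closed_form)  # both ≈ 0.7841903733771
```

At these values the denominators for m = 1, 2, 3 all equal (100/27)^{1/2}, so the minimum is attained three times; larger m only increase it.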

We also report that the value m̄ = m = 1 can be selected by a linear function F(ω) = aω, which means right circular helices for the geometrical locus of the points r_j and S_k of our modelling. Furthermore, the function ρ obtained by also taking m̄ = m = 1 in the denominator is a convex envelope in this case.


Concluding Remarks 


The successful application of the scheme introduced in the foregoing pages reinforces the idea of studying its consequences as well as classifying biomolecular structures in terms of associated Steiner trees. It is a geometrical approach to the fundamental problem of energy minimization of these structures, and it can shed some light on the problem of biomolecular formation and evolution. Some additional knowledge of protein structure [1,7] should be introduced into our analysis, such as amide planes and their twisting angles, and a residual Fermat problem already solved by Nature for the placement of α-carbon atoms. The only information we used was the placement of carbon and nitrogen atoms as Steiner points. We hope that the scheme developed here can be extended to create a useful definition of molecular chirality and a well-posed optimization problem.
The Steiner tree problem is an NP-hard combinatorial optimization problem [50] with a long history [11,66,93]. The
study of Steiner trees received great attention in the 
1990s since many important open problems, including 
the Gilbert-Pollak conjecture on the Euclidean Steiner
ratio, the existence of better approximation, and the 
existence of polynomial time approximation schemes 
(PTAS), have been solved with influence in the gen- 
eral theory of designs and analysis of approximation 
algorithms for combinatorial optimization, and also, 
many new important applications in VLSI designs, op- 
tical networks, wireless communications, etc. have been 
discovered and studied extensively. Those applications 
usually require some modifications on classical Steiner 
tree problems and hence require new techniques for 
solving them. Therefore, studying various variations of 
Steiner trees became an exciting activity recently. In this 
article, we will review important developments in the 
1990s and discuss some open problems which may mo- 
tivate important developments in this century. 


On the Proof of Gilbert-Pollak’s Conjecture 


Given a set of points in a metric space, the problem is to find a shortest network interconnecting the points in the set. Such a shortest network is called a Steiner minimum tree on the point set. The Steiner tree problem can be seen as a generalization of Fermat's problem. In the seventeenth century, P. Fermat proposed the problem of finding a point that minimizes the total distance from this point to three given points in the Euclidean plane; its solution is exactly the Steiner minimum tree on the three points.
The general form of Steiner minimum tree problem 
was proposed by C.F. Gauss [26]. However, R. Courant 
and H. Robbins [27] referred to it as the Steiner prob- 
lem. The popularity of their book was responsible for 
bringing the Steiner tree problem to people’s attention. 
Two important papers in the 1960s further laid a solid 
groundwork for additional study. Z.A. Melzak [75] first 
gave a finite algorithm for the Euclidean Steiner trees. 
E.N. Gilbert and H.O. Pollak [52] produced an excellent survey of the problem, raised many new topics including the Steiner ratio problem, and extended the problem to other metric spaces. Since then, more than three
hundred research papers have been written contribut- 
ing to the Steiner tree problem. For an excellent survey, 
see [55]. 

An important development on the Steiner tree 
problem that took place in the beginning of the 1990s 
is the proof of Gilbert-Pollak’s conjecture on the Eu- 
clidean Steiner ratio [32,33]. This new development is 


based on the discovery of a new approach with a new 
minimax theorem. 

A minimum spanning tree on a set of points is the 
shortest network interconnecting the points in the set 
with all edges between the points. While the Steiner tree 
problem is intractable, the minimum spanning tree can be computed efficiently, in polynomial time. The Steiner ratio in a metric space is the largest lower bound for the ratio between the lengths of a minimum Steiner tree and a minimum spanning tree for the same set of points in the metric space; it measures the performance of the minimum spanning tree as a polynomial time approximation of the minimum Steiner tree. Determining the Steiner ratio in each metric space is a traditional problem on Steiner trees. In 1976, F.K. Hwang [54] determined that the Steiner ratio in the rectilinear plane is 2/3. However, it took 22 years to complete the story of determining the Steiner ratio in the Euclidean plane. In 1968, Gilbert and Pollak conjectured that the Steiner ratio in the Euclidean plane is √3/2. Through efforts made
by several authors [14,24,31,31,36,48,53,80,86,87], and 
[23], the conjecture was finally proved by D.-Z. Du and 
Hwang [32,33] in 1990. The significance of their proof 
stems also from the potential applications of the new 
approach included in the proof. 
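For three points the Steiner minimum tree is the solution of Fermat's problem, so the Euclidean value √3/2 can be seen on the simplest example. The sketch below locates the Fermat point with Weiszfeld's fixed-point iteration (a standard geometric-median method swapped in here for illustration; it is not the technique of [32,33]) and compares the resulting star with the minimum spanning tree. It assumes a triangle whose angles are all below 120°, so that the Steiner minimum tree is the star at the Fermat point; for the equilateral triangle the ratio is exactly √3/2.

```python
import math

def weiszfeld(points, iters=10_000, tol=1e-13):
    # Geometric median (the Fermat point for three points) by Weiszfeld's
    # fixed-point iteration, started at the centroid.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        wsum = wx = wy = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < tol:  # landed on an input point; stop there
                return x, y
            wsum += 1.0 / d
            wx += px / d
            wy += py / d
        nx, ny = wx / wsum, wy / wsum
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y

def steiner_over_mst(tri):
    fx, fy = weiszfeld(tri)
    steiner = sum(math.hypot(fx - px, fy - py) for px, py in tri)
    sides = sorted(math.dist(tri[i], tri[j]) for i, j in ((0, 1), (0, 2), (1, 2)))
    return steiner / (sides[0] + sides[1])  # MST = the two shortest sides

equilateral = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(steiner_over_mst(equilateral), math.sqrt(3) / 2)  # both ≈ 0.8660254
```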

In their approach, the central part is a new minimax 
theorem about minimizing the maximum value of sev- 
eral concave functions over a simplex as follows. 


Theorem 1 (Du-Hwang minimax theorem) Let f(x) = max_{i∈I} g_i(x), where I is a finite set and each g_i(x) is a continuous, concave function on a polytope X. Then the minimum value of f(x) over the polytope X is achieved at some critical point, namely, a point satisfying the following property:
(*) There exists an extreme subset Y of X such that x ∈ Y and the index set M(x) (= {i : f(x) = g_i(x)}) is maximal over Y.
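A one-dimensional toy instance may help in reading Theorem 1. In the sketch below X = [0, 1], and the g_i are three hypothetical affine functions (affine functions are concave). In this instance the critical points are the two vertices of X (each is an extreme subset by itself) and the two interior points where two pieces tie at the top of f; minimizing f over these candidates reproduces the minimum found by a brute-force grid search.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Hypothetical affine (hence concave) pieces g_i(x) = a*x + b on X = [0, 1].
G = [(Fr(-2), Fr(2)), (Fr(1), Fr(1, 2)), (Fr(3), Fr(-1))]

def f(x):
    return max(a * x + b for a, b in G)

def active(x):
    # M(x) = {i : f(x) = g_i(x)}
    fx = f(x)
    return {i for i, (a, b) in enumerate(G) if a * x + b == fx}

def critical_candidates(lo=Fr(0), hi=Fr(1)):
    cands = {lo, hi}  # vertices of X
    for (i, (a1, b1)), (j, (a2, b2)) in combinations(enumerate(G), 2):
        if a1 != a2:
            x = (b2 - b1) / (a1 - a2)  # where g_i and g_j cross
            if lo < x < hi and {i, j} <= active(x):
                cands.add(x)  # both pieces are on top here
    return cands

best = min(critical_candidates(), key=f)
grid_min = min(f(Fr(k, 1000)) for k in range(1001))
print(best, f(best), grid_min)  # 1/2 1 1
```

Exact fractions are used so that ties in M(x) are detected without rounding error.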


The Steiner ratio problem is first transferred to such a minimax problem (g_i(x) = (the length of a Steiner tree) − (the Steiner ratio)·(the length of a spanning tree with graph structure i), where x is a vector whose components are edge-lengths of the Steiner tree) and the
minimax theorem reduces the minimax problem to the 
problem of finding the minimax value of the concave 
functions at critical points. Then each critical point is 
transferred back to an input set of points with special 
geometric structure; it is a subset of a lattice formed by equilateral triangles. This special structure enables us to verify the conjecture, which corresponds to the nonnegativity of the minimax value of the concave functions.

Clearly, in order to use the minimax approach, for 
each problem three questions will be addressed: 

1) How do we transfer the problem to such a minimax 
problem meeting the condition that the functions 
are concave? 

2) How do we determine the critical geometric struc- 
ture? 

3) How do we verify the function value on the critical 
structure? 

Developing techniques for answering these three questions will enable us to solve more open problems. Let us explain this with some examples.


Chung-Gilbert’s Conjecture 


Steiner trees in Euclidean spaces have an application in 
constructing phylogenetic trees [17]. It was also con- 
jectured by Gilbert and Pollak [52] that in any Eu- 
clidean space the Steiner ratio is achieved by the ver- 
tex set of a regular simplex. F.R.K. Chung and Gilbert 
[22] constructed a sequence of Steiner trees on regular simplices. The ratios between the lengths of the constructed Steiner trees and of the corresponding minimum spanning trees decrease to √3/(4 − √2). Although the constructed trees are not known to be Steiner minimum trees, Chung and Gilbert conjectured that √3/(4 − √2) is the best lower bound for Steiner ratios in Euclidean spaces. Clearly, if √3/(4 − √2) is the limiting Steiner ratio in d-dimensional Euclidean space as d goes to infinity, then Chung-Gilbert's conjecture is a corollary of Gilbert and Pollak's general conjecture. However, this general conjecture of Gilbert and Pollak has been disproved by J.M. Smith [92] for dimensions three to nine and by Du and Smith for dimensions larger than two. Now, interesting questions which arise in this situ-
ation are about Chung and Gilbert’s conjecture. Could 
Chung-Gilbert’s conjecture also be false? If the conjec- 
ture is not false, can we prove it by the minimax ap- 
proach? 

First, we claim that Chung-Gilbert’s conjecture 
could be true. In fact, we could get rid of Gilbert-Pol- 
lak’s general conjecture, and use another way to reach 
the conclusion that the limiting Steiner ratio for regular 
simplex is the best lower bound for Steiner ratios in Euclidean spaces. To support our viewpoint, let us analyze
a possible proof of such a conclusion as follows. 

Consider n points in (n − 1)-dimensional Euclidean space. Then all n(n − 1)/2 distances between the
n points are independent. Suppose that we could do 
a similar transformation and the minimax theorem 
could apply to these n points to obtain a similar result in 
the proof of Gilbert-Pollak’s conjecture for Euclidean 
plane, i.e. a point set with critical geometric structure 
has the property that the union of all minimum span- 
ning trees contains as many equilateral triangles as pos- 
sible. Then such a critical structure must be a regular 
simplex. 

The above observation tells us two facts:
a) Chung-Gilbert's conjecture can follow from the following two conjectures.

Conjecture 2. The Steiner ratio for n points in a Euclidean space is not smaller than the Steiner ratio for the vertex set of an (n − 1)-dimensional regular simplex.

Conjecture 3 (Smith's conjecture [92]) √3/(4 − √2) is the limiting Steiner ratio for regular simplices.

b) It may be possible to prove Conjecture 2 by the minimax approach if we could find the right transformation.

One may wonder why we need to find a right transformation. What happens to the transformation used in the proof of Gilbert-Pollak's conjecture in the Euclidean plane? Here, we remark that such a transformation does not work for Conjecture 2. In fact, in the Euclidean plane, with a fixed graph structure, all edge-lengths of a full Steiner tree determine the set of original points, and furthermore the length of a spanning tree for a fixed graph structure is a convex function of the edge-lengths of the Steiner tree. However, in Euclidean spaces of dimension more than two, edge-lengths of a full Steiner tree are not enough to determine the set of original points. Moreover, adding other parameters may destroy the convexity of the length of a spanning tree as a function of the parameters.

Smith [92] showed by an exhaustive computation 
that for d = 3, ..., 7, the Steiner trees constructed by 
Chung and Gilbert are actually minimum Steiner trees, 
but, for d = 8, their Steiner tree is not minimum. He 
also conjectured that the trees of Chung and Gilbert are minimum if d is of the form d = 3·2^k. Conjecture 3 is a corollary of this more specific conjecture.
From the above, we see that proving Chung-Gilbert's conjecture requires a further development of the minimax approach.


Graham-Hwang’s Conjecture 


A Steiner tree with rectilinear distance is called a rec- 
tilinear Steiner tree. While rectilinear Steiner trees in 
plane have many applications on CAT and VLSI, rec- 
tilinear Steiner trees in high-dimensional space can be 
found in biology [17,47] and optimal traffic multicas- 
ting for some communication networks [13,18]. Al- 
though the Steiner ratio in rectilinear plane was deter- 
mined by Hwang [54] in an earlier stage of the study of 
Steiner trees, there is still (as of 2000) no progress on the Steiner ratio in rectilinear spaces of higher dimension. The Steiner ratio in a d-dimensional rectilinear space was conjectured to be d/(2d − 1) by Graham and Hwang [53].
The difficulty for extending Hwang’s approach to prov- 
ing Graham-Hwang’s conjecture is due to the lack of 
knowledge on the full rectilinear Steiner trees in high- 
dimensional spaces. (A full Steiner tree has a property 
that all original points are leaves.) In fact, for a full rec- 
tilinear Steiner tree in plane, all Steiner points lie on 
a path. However, it is not known whether a similar re- 
sult holds for full rectilinear Steiner trees in a space of 
dimension more than two. 
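A simple configuration shows that the conjectured value is at least an upper bound. For the 2d points ±e_i every pairwise rectilinear distance equals 2, so a minimum spanning tree has length 2(2d − 1), while the star joining them through the origin has length 2d. Since a Steiner minimum tree is no longer than this star, the Steiner ratio in d-dimensional rectilinear space is at most d/(2d − 1). The sketch below checks the arithmetic with a small Prim's algorithm; it does not claim that the star is a Steiner minimum tree, and the helper names are illustrative.

```python
from fractions import Fraction

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def mst_length(points, dist):
    # Prim's algorithm in O(n^2); adequate for small point sets
    n = len(points)
    best = [None] * n
    best[0] = 0
    in_tree = [False] * n
    total = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i] and best[i] is not None),
                key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if best[v] is None or d < best[v]:
                    best[v] = d
    return total

def cross_polytope(d):
    # the 2d points +e_i and -e_i
    pts = []
    for i in range(d):
        for s in (1, -1):
            p = [0] * d
            p[i] = s
            pts.append(tuple(p))
    return pts

def star_over_mst(d):
    pts = cross_polytope(d)
    star = sum(l1(p, (0,) * d) for p in pts)  # one Steiner point at the origin
    return Fraction(star, mst_length(pts, l1))

for d in range(2, 6):
    print(d, star_over_mst(d))  # d/(2d - 1): 2/3, 3/5, 4/7, 5/9
```

For d = 2 this gives 2/3, the value Hwang proved for the rectilinear plane.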

Graham-Hwang’s conjecture can be easily trans- 
ferred to a minimax problem required by our minimax 
approach. For example, choose lengths of all straight 
segments of a Steiner tree. When the connection pat- 
tern of the Steiner tree is fixed, the set of original 
points can be determined by such segments-lengths, 
the length of the Steiner tree is a linear function and 
the length of a spanning tree is a convex function of 
such segment-lengths, so that g_i is a concave function of such segment-lengths. However, for this transformation, it is hard to determine the critical structure. To
explain the difficulty, we notice that in general the crit- 
ical points could exist in both the boundary and inte- 
rior of the polytope. (See the minimax theorem.) In the 
proof of Gilbert-Pollak’s conjecture in plane, a crucial 
fact is that only interior critical points need to be considered in a contradiction argument. The critical structure of interior critical points is relatively easy to determine. However, for the current transformation of Graham-Hwang's conjecture, we have to consider some critical points on the boundary. This requires a new technique, either to determine the critical structure for such critical points or to eliminate them from our consideration.

One possible idea is to combine the minimax ap- 
proach and Hwang’s method. In fact, by the minimax 
approach, we may get a useful condition on the set of 
original points. With such a condition, the point set can 
have only certain types of full Steiner trees. This may
reduce the difficulty of extending Hwang’s method to 
high dimension. 

The significance of developing techniques for de- 
termining critical structure corresponding to critical 
points on the boundary is not only for solving Graham- 
Hwang’s conjecture, but also for solving some other 
problems. For example, it can be immediately applied 
to some packing problems. One of the typical pack- 
ing problems is to find the maximum number of ob- 
jects which can be put in a certain container. When the 
objects are discs or spheres, the problem can be transferred to a minimax problem that meets our requirement. To determine such a number exactly, we also have to deal with critical points on the boundary of the polytope.


The Steiner Ratio in Banach Spaces 


Examining the proof of Gilbert-Pollak’s conjecture in 
the Euclidean plane, we observe that nothing in the proof depends on properties of the Euclidean norm except
the last part, verification of the conjecture on point sets 
of critical structure. This means that using the minimax 
approach to determine the Steiner ratio in Minkowski 
plane (2-dimensional Banach space), we would have no 
problem in finding a transformation and determining 
critical structures. We would meet only a problem on 
verification for point sets with critical structure. 

Steiner minimum trees in Minkowski planes have 
been studied by [1,25,30,34,70,91]. In these papers, 
some fundamental properties of Steiner minimum trees 
in Minkowski planes have been established. Two nice 
conjectures about the Steiner ratio in Minkowski planes 
were proposed respectively by [25,30] and [30] as fol- 
lows: 


Conjecture 4 In any Minkowski plane, the Steiner ratio is between 2/3 and √3/2.

Conjecture 5 The Steiner ratio in a Minkowski plane equals that in its dual plane.


With new techniques in the critical structures, B. Gao, 
Du and Graham [49] proved the first half of Conjec- 
ture 4 that in any Minkowski plane, the Steiner ratio 
is at least 2/3, and P.-J. Wan, Du and Graham [97] 
showed that Conjecture 5 is true for three, four, and five 
points. With a different approach, Du and others [30] 
also proved that in any Minkowski plane, the Steiner 
ratio is at most 0.8766. 

The Chung-Gilbert conjecture and Conjecture 5 
can be extended to high-dimensional Banach spaces as 
follows. 


Conjecture 6 In any infinite-dimensional Banach space, the Steiner ratio is between 1/2 and √3/(4 − √2).


Conjecture 7 The Steiner ratio in any Banach space 
equals that in its dual space. 


Significant results on these two conjectures could be 
produced by further developments of minimax ap- 
proach from a successful application in two-dimen- 
sional problems to high-dimension. 


On Better Approximations 


Starting from a minimum spanning tree, improve it by 
adding Steiner points. This is a natural idea to obtain 
an approximation solution for the Steiner minimum 
tree. Every approximation solution obtained in this way 
would have a performance ratio at most the inverse of 
the Steiner ratio. The problem is how much better than 
the inverse of the Steiner ratio one can make. 

From the 1980s onwards numerous heuristics 
[6,13,19,44,61,63,64,65,67,94,100] for Steiner mini- 
mum trees have been proposed for points in various 
metric spaces. Their superiority over minimum spanning trees was often claimed on the basis of computational experiments. But no theoretical proof of superiority was ever
given. It was a long-standing problem whether there 
exists a polynomial time approximation with a perfor- 
mance ratio better than the inverse of the Steiner ratio 
or not. For simplicity, a polynomial time approxima- 
tion with performance ratio smaller than the inverse of 
the Steiner ratio will be called a better approximation. 
The first significant work on better approximations was 


made by M.W. Bern [10]. He proved that for the rec- 
tilinear metric and Poisson distributed regular points, 
a greedy approximation obtained by a very simple im- 
provement over a minimum spanning tree has a shorter 
average length. Later, Hwang and Y.C. Yao [56] ex- 
tended this result to the usual case when the number 
of regular points is fixed. 

In 1991, A.Z. Zelikovsky [102] made the first break- 
through to the problem by giving a better heuristic for 
the Steiner minimum trees in graphs. This is the second
important development on Steiner trees in 1990s. To 
explain his idea and review further development from 
his work, let us start from comparing his work with 
a previous work with a similar idea. 


Chang’s Idea 


Chang [18,19] proposed the following approximation 
algorithm for Steiner minimum trees in the Euclidean 
plane: Start from a minimum spanning tree and at each 
iteration choose a Steiner point such that using this 
Steiner point to connect three vertices in the current 
tree could replace two edges in the minimum spanning 
tree and this replacement achieves the maximum saving 
among such possible replacements. 

Smith, D.T. Lee and J.S. Liebman [90] also use the 
idea of the greedy improvement. But, they start with 
Delaunay triangulation instead of a minimum span- 
ning tree. Since every minimum spanning tree is con- 
tained in Delaunay triangulation, the performance ratio 
of their approximation algorithm can also be bounded 
by the inverse of the Steiner ratio. The advantage 
of Smith-Lee-Liebman algorithm is on the running 
time. While Chang’s algorithm runs in O(n*) time, 
the Smith-Lee-Liebman algorithm runs in only O(n log n) time.

A. Kahng and G. Robins [60] proposed an approxi-
mation algorithm for Steiner minimum trees in the rec- 
tilinear plane by using the same idea as that of Chang. 
For these three algorithms, it can be proved that for 
any particular set of points, the ratio of lengths of the 
approximation solution and the Steiner minimum tree 
is smaller than the inverse of the Steiner ratio. Some 
experimental results also show that the approximation solutions obtained by these algorithms are very good.
However, no proof has been found to show any one of 
them being a better approximation. 
Zelikovsky’s Idea


Zelikovsky’s idea [102] is based on the decomposition of
a Steiner tree (namely, a tree, not necessarily minimum, 
interconnecting original points): An original point in 
a Steiner tree can be either a leaf or a junction. In the 
latter case, the Steiner tree can be decomposed at this 
point. In this way, every Steiner tree can be decomposed 
into edge-disjoint union of several Steiner trees for sub- 
sets of original points; each of them has no junction be- 
ing an original point. A Steiner tree with no original 
point being a junction is called a full Steiner tree. The 
full Steiner trees in the decomposition are called full 
components. The size of a full component is the num- 
ber of original points in the component. 
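The decomposition into full components can be sketched as a traversal that may pass through Steiner points but never through an original point. The helper below is hypothetical: it takes a tree as an edge list together with the set of original points and returns the edge sets of the full components; the size of a component is the number of original points it contains.

```python
from collections import defaultdict

def full_components(edges, terminals):
    """Split a tree at original points (terminals) of degree >= 2.

    Growth of a component may pass through Steiner points but stops at
    terminals, so every terminal is a leaf of each component it lies in.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), []
    for u, v in edges:
        if frozenset((u, v)) in seen:
            continue
        seen.add(frozenset((u, v)))
        comp, stack = [], [(u, v)]
        while stack:
            a, b = stack.pop()
            comp.append((a, b))
            for x in (a, b):
                if x in terminals:
                    continue  # never extend through an original point
                for y in adj[x]:
                    e = frozenset((x, y))
                    if e not in seen:
                        seen.add(e)
                        stack.append((x, y))
        components.append(comp)
    return components

def size(component, terminals):
    # number of original points in the full component
    return len({p for edge in component for p in edge if p in terminals})

terminals = {1, 2, 3, 4, 5, 6}
tree = [("s1", 1), ("s1", 2), ("s1", 3), (3, "s2"), ("s2", 4), ("s2", 5), (5, 6)]
comps = full_components(tree, terminals)
print(sorted(size(c, terminals) for c in comps))  # [2, 3, 3]
```

In this example the original points 3 and 5 are junctions, so the tree splits into two full components of size 3 (around Steiner points s1 and s2) and one of size 2 (the edge 5-6).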

Clearly, for any k ≥ 3, a k-size Steiner minimum tree
usually has shorter length compared with a minimum 
spanning tree. It is natural to think about using a min- 
imum k-size Steiner tree to approximate the Steiner 
minimum tree. However, this does not work because 
computing a k-size Steiner minimum tree is still an in- 
tractable problem. Zelikovsky’s idea is to approximate 
the Steiner minimum tree by a 3-size Steiner tree gen- 
erated by a polynomial time greedy algorithm. The key 
fact is that the length of such a heuristic is at most the arithmetic mean of the lengths of a minimum spanning tree and a 3-size Steiner minimum tree; that is, the performance ratio of his approximation satisfies

$$ PR \le \frac{\rho_2^{-1} + \rho_3^{-1}}{2}, $$

where ρ_k is the k-Steiner ratio (so that ρ_2, the ratio for spanning trees, is the usual Steiner ratio). Thus, if the 3-Steiner ratio ρ_3 is bigger than the Steiner ratio ρ_2, then this greedy algorithm is a better approximation for the Steiner minimum tree. Zelikovsky was able to prove that the 3-Steiner ratio in graphs is at least 3/5, which is bigger than 1/2, the Steiner ratio in graphs [61]. So, he solved the better approximation problem in graphs. Zelikovsky's idea has been extensively studied in the literature.
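The arithmetic for graphs is short. Since ρ_3 ≥ 3/5 gives ρ_3^{-1} ≤ 5/3, the bound becomes (2 + 5/3)/2 = 11/6, which is below the inverse Steiner ratio 2. A minimal sketch with exact fractions (the function name is illustrative):

```python
from fractions import Fraction

def zelikovsky_bound(rho2, rho3):
    # PR <= (1/rho_2 + 1/rho_3) / 2
    return (1 / rho2 + 1 / rho3) / 2

graphs = zelikovsky_bound(Fraction(1, 2), Fraction(3, 5))
print(graphs, float(graphs))  # 11/6 1.8333333333333333
```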

Du, Zhang, and Q. Feng [40] generalized Zelikovsky's idea to the k-size Steiner tree. They showed that a generalized Zelikovsky algorithm has performance ratio

$$ PR \le \frac{(k-2)\rho_2^{-1} + \rho_k^{-1}}{k-1}. $$


P. Berman and V. Ramaiyer [9] employed a different idea to generalize Zelikovsky's result. They obtained an algorithm with the performance ratio satisfying

$$ PR \le \rho_2^{-1} - \sum_{i=3}^{k} \frac{\rho_{i-1}^{-1} - \rho_i^{-1}}{i-1}. $$
They also showed that in the rectilinear plane, the 3- 
Steiner ratio is at least 72/94 which is bigger than 2/3 
[54], the Steiner ratio in rectilinear plane. So, they 
solved the better heuristic problem in rectilinear plane. 
Du, Zhang, and Feng [40] proved a lower bound 
for the k-Steiner ratio in any metric space. This lower 
bound goes to one as k goes to infinity. So, in any met- 
ric space with the Steiner ratio less than one, there ex- 
ists a k-Steiner ratio bigger than the Steiner ratio. Thus, 
they proved that the better heuristic exists in any metric 
space satisfying the following conditions: 
1) the Steiner ratio is smaller than one; 
2) the Steiner minimum tree on any fixed number of 
points can be computed in polynomial time. 
These metric spaces include Euclidean plane and Eu- 
clidean spaces. 
Zelikovsky [104] used a different potential function 
in his greedy approximation and obtained an approxi- 
mation with performance ratio satisfying 


$$ PR \le \rho_k^{-1}\left(1 - \ln\frac{\rho_2}{\rho_k}\right). $$


Although Zelikovsky’s idea starts from a point dif- 
ferent from Chang’s one, the two approximations are 
actually similar. To see this, let us describe Zelikovsky’s 
algorithm as follows: Start from a minimum spanning 
tree and at each iteration choose a Steiner point such 
that using this Steiner point to connect three regular 
points could replace two edges in the minimum span- 
ning tree and this replacement achieves the maximum 
saving among such possible replacements. 

Clearly, they both start from a minimum spanning 
tree and improve it step by step, using a greedy principle to choose a Steiner point to connect a triple of vertices. The only difference is that this triple in Chang's algorithm may contain some Steiner points, while it contains only regular points in Zelikovsky's algorithm. This difference makes Chang's approximation difficult to analyze. Which one will give a better approximation solution? This is an interesting problem.
The k-Steiner Ratio ρ_k


While the determination of the k-Steiner ratio plays an important role in estimating the performance ratio of several recent better approximations, A. Borchers and Du [15] completely determined the k-Steiner ratio in graphs: for k = 2^r + h ≥ 2 with 0 ≤ h < 2^r,

$$ \rho_k = \frac{r\,2^r + h}{(r+1)\,2^r + h}, $$

and Borchers, Du, Gao, and Wan [16] completely determined the k-Steiner ratio in the rectilinear plane: ρ_2 = 2/3, ρ_3 = 4/5, and ρ_k = (2k − 1)/(2k) for k ≥ 4. However, the k-Steiner ratio in the Euclidean plane for k ≥ 3 is still (as of 2000) an open problem. Du, Zhang, and Feng [40] conjectured that the 3-Steiner ratio in the Euclidean plane is


(1+ J3)/2 
14+ 724+ 73 


They also analyzed that the k-Steiner ratio in the Eu- 
clidean plane might be determined in a similar way to 
the proof of Gilbert-Pollak conjecture. The difficulty 
appears only in the description of ‘critical structure’. 
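Plugging the known k-Steiner ratios into the performance bounds gives concrete numbers. The sketch below encodes the Borchers-Du formula for graphs and the rectilinear values of Borchers, Du, Gao and Wan, and evaluates the Du-Zhang-Feng bound ((k − 2)ρ_2^{-1} + ρ_k^{-1})/(k − 1) and the Berman-Ramaiyer bound ρ_2^{-1} − Σ_{i=3}^{k}(ρ_{i−1}^{-1} − ρ_i^{-1})/(i − 1). For graphs the Berman-Ramaiyer bound is 11/6 at k = 3 and 16/9 at k = 4. The function names are illustrative.

```python
from fractions import Fraction

def rho_graph(k):
    # Borchers-Du: write k = 2**r + h with 0 <= h < 2**r
    r = k.bit_length() - 1
    h = k - 2 ** r
    return Fraction(r * 2 ** r + h, (r + 1) * 2 ** r + h)

def rho_rectilinear(k):
    # Borchers-Du-Gao-Wan values for the rectilinear plane
    if k == 2:
        return Fraction(2, 3)
    if k == 3:
        return Fraction(4, 5)
    return Fraction(2 * k - 1, 2 * k)

def dzf_bound(rho, k):
    # Du-Zhang-Feng: ((k - 2)/rho_2 + 1/rho_k) / (k - 1)
    return ((k - 2) / rho(2) + 1 / rho(k)) / (k - 1)

def berman_ramaiyer_bound(rho, k):
    # 1/rho_2 - sum_{i=3}^{k} (1/rho_{i-1} - 1/rho_i) / (i - 1)
    return 1 / rho(2) - sum((1 / rho(i - 1) - 1 / rho(i)) / (i - 1)
                            for i in range(3, k + 1))

for k in (3, 4, 5):
    print(k, rho_graph(k), dzf_bound(rho_graph, k), berman_ramaiyer_bound(rho_graph, k))
print(dzf_bound(rho_rectilinear, 3))  # 11/8 in the rectilinear plane
```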


Variable Metric Method 


Berman and Ramaiyer [9] introduced an interesting approach to generalize Zelikovsky's greedy approximation. Let us call the Steiner minimum tree for a subset of k regular points a k-tree. Their approach consists of two steps. The first step processes all i-trees, 3 ≤ i ≤ k, sequentially in the following way: for each i-tree T with positive saving in the current graph, put T in a stack, and if two leaves x and y of T are connected by a path p in a minimum spanning tree without passing any other leaf of T, then put an edge between x and y with weight equal to the length of the longest edge in p minus the saving of T. In the second step, it repeatedly pops i-trees from the stack, modifying the original minimum spanning tree for all regular points and keeping only i-trees with currently positive saving. Adding weighted edges to a point set changes the metric on the point set. Let E be an arbitrary set of weighted edges such that adding them to the input metric space makes all i-trees for 3 ≤ i ≤ k have nonpositive saving in the resulting metric space M_E. Denote by t_k(P) the supremum of the length of a minimum spanning tree for the point set P in metric space M_E over all such E (t_i(P) is defined in the same way with k replaced by i). Then Berman-Ramaiyer's algorithm produces a k-size Steiner tree with total length at most

$$ t_2(P) - \sum_{i=3}^{k} \frac{t_{i-1}(P) - t_i(P)}{i-1}. $$


The bound for the performance ratio of Berman-Ramaiyer's approximation above is obtained from this bound and the fact that t_i(P) ≤ ρ_i^{-1} SMT(P), where SMT(P) is the length of the Steiner minimum tree for the point set P.

Based on the above observation, we may have the 
following questions. Could we find another way to vary 
metric for a better bound? Could we forget the greedy 
idea and design a better approximation with only a variable metric idea? Answering these questions requires a deeper understanding of the variable metric method.
We attempt to obtain new algorithms from this study. 

M. Karpinski and Zelikovsky [62] proposed a pre- 
processing procedure to improve existing better ap- 
proximations. First, they use this procedure to choose 
some Steiner points and then run a better approxima- 
tion algorithm on the union of the set of regular points 
and the set of chosen Steiner points. This preprocessing 
improves the performance ratio for every known better 
approximation that we mentioned previously. 

The preprocessing procedure is similar to the al- 
gorithm of Berman and Ramaiyer. But, it uses a ‘re- 
lated gain’ instead of the saving as the greedy function. 
One of our current ideas is to modify Chang’s algo- 
rithm in the following way: At each iteration, if a Steiner 
point is introduced, compute its related gain, and later consider only triples of regular points and Steiner
points with positive related gain. Would this approxi- 
mation perform better? We attempt to get the answer. 

Although many better approximations have been 
found in recent years, none of them has performance 
ratio smaller than the inverse of the 3-Steiner ratio. The inverse of the 3-Steiner ratio seems to be the limit that the performance ratio of polynomial time approximations for Steiner minimum trees can reach.

S. Arora and others [5] conjectured that their backtrack greedy technique gives a polynomial time approximation scheme for 3-size Steiner minimum trees. If their conjecture is true, then their algorithms also give approximations for Steiner minimum trees with performance ratio approaching the inverse of the 3-Steiner ratio. This is probably the best possible performance ratio. Thus, the conjecture of Arora and others is an attractive problem for further research.

A more accurate analysis [62,101,104] of the performance ratios of Berman-Ramaiyer's algorithm and Karpinski-Zelikovsky's preprocessing requires bounds for $t_k$ and a similar quantity $t_k^*$. The techniques in [15,16] for determining the k-Steiner ratio seem very promising for establishing tight upper bounds for $t_k$ and $t_k^*$.

Finding a lower bound for the performance ratio is an open problem (as of 2000). It is known only that for Steiner minimum trees in graphs, if NP ≠ P, then a lower bound larger than one exists, because the problem in this case is MAX SNP-complete [12].


On PTAS 


T. Jiang and others [58,59] brought to Steiner minimum trees an idea quite different from the previous ones.
They decompose the set of regular points based on 
the lengths of edges in a minimum spanning tree. By 
an interesting analysis, they proved that if the ratio of 
lengths between the longest edge and the shortest edge 
in a minimum spanning tree is bounded by a constant, 
then there is a polynomial time approximation scheme 
(PTAS) for Steiner minimum trees in the rectilinear 
plane and in the Euclidean plane. This idea can also 
be used in other geometric optimization problems, in 
particular, some variations of Steiner tree problems de- 
scribed in the next section. 

In 1995, Arora and J.S.B. Mitchell independently discovered powerful techniques for establishing polynomial time approximation schemes for geometric optimization problems, including the Euclidean and rectilinear Steiner tree problems. Their results constitute the third important development on Steiner trees in the 1990s. Their significance lies not only in Steiner trees but also in the design and analysis of approximation algorithms in combinatorial optimization. Let us review these two remarkable techniques in the following.


Arora’s PTAS 


It is quite interesting to note that Arora [4] appeared only one week before Mitchell [76]. In any case, they use very different techniques to reach the same goal, so both are of interest. Arora's technique is based on recursive partition. In Jiang and others [58,59], although the partition can be shifted in parallel, the size of each cell is fixed; it cannot vary according to local information about the distribution of terminals. Therefore, the partition works well only when the terminals are distributed almost evenly. This is why the condition that the ratio of lengths between the longest edge and the shortest edge in a minimum spanning tree be bounded by a constant is required.

In Arora's recursive partition, by contrast, each big cell is partitioned into small cells independently of the other big cells; how a cell is cut depends only on the situation inside it. This advantage enables him to discard the condition required in Jiang and others [58,59].


Mitchell’s PTAS 


Mitchell's technique originated in the study of a minimum length rectangular partition problem. Given
a rectilinear region R surrounded by a rectilinear poly- 
gon and some rectilinear holes, a rectangular partition 
of R is a set of segments in R, which divide R into small 
rectangles each of which does not contain any hole in 
its interior. The problem is to find such a rectangular 
partition with the minimum total length. This problem 
is NP-hard. 

Du and others [38] introduced a concept of guillo- 
tine subdivision. A guillotine subdivision is a sequence 
of cuts performed recursively such that each cut par- 
titions a piece into at least two. Du and others [38] 
showed that the minimum length guillotine rectan- 
gular partition can be computed in polynomial time. 
However, they were only able to show that this guillo- 
tine subdivision is an approximation of the minimum 
length rectangular partition problem with performance 
ratio two in the special case where the region R is bounded by a rectangle and contains some points as holes. Mitchell [77] showed that this is actually true in general. He also successfully used this technique to obtain constant-factor approximations for other geometric optimization problems. With the same technique, C. Mata [74] obtained a constant-factor approximation algorithm for the red-blue separation problem, improving the previous O(log n) result.

Inspired by this success, Mitchell [76] extended 
guillotine subdivision to m-guillotine subdivision, 
a rectangular polygonal subdivision such that there ex- 
ists a cut whose intersection with the subdivision edges 
consists of a small number (O(m)) of connected com- 
ponents and the subdivisions on either side of the 
cut are also m-guillotine. With a minor change of the 
proof of [77], Mitchell established a PTAS for min- 
imum length rectangular partition problem. Mitchell 
[78,79] further extended this m-guillotine subdivision 
technique to other geometric optimization problems, 
including Euclidean and rectilinear Steiner tree prob- 
lems, and obtained PTAS for them. 


Variations of Steiner Trees 


Successful research on classical Steiner tree problems has encouraged extensive study of variations of Steiner trees with various application backgrounds. Currently, these variations form a quite active research direction in Steiner trees.

In VLSI design, one considers several sets of terminals and seeks a minimum total length packing of Steiner trees for these sets under the following conditions [82]: the edges of the Steiner trees are required to lie in channels between cells, and each channel has a capacity that bounds how many edges can run through it.

A complicated computer network usually consists of several nets of different speeds. The following problem was proposed against this background. Consider an undirected network with multiple edge weights $(c_1(e), \ldots, c_k(e))$, where $c_1(e) > \cdots > c_k(e)$. Given a subset N of vertices and a partition $\{N_1, \ldots, N_k\}$ of N with $|N| > 2$, find a subnetwork interconnecting N with minimum total weight such that the length of any edge e on a path between a pair of vertices in $N_i$ is at least $c_i(e)$ [43,57].

To construct roads of minimum total length to in- 
terconnect n highways under the constraint that the 
roads can intersect each highway only at one point in 
a designated interval which is a line-segment, a gener- 
alization of Euclidean Steiner trees has been proposed 
and studied. Du, Hwang, and Xue [35] presented a set 
of optimality conditions for the problem and showed 
how to construct a solution to meet this set of optimal- 
ity conditions. 


Constructing phylogenetic trees is an important topic in computational biology. One formulation is as follows. For a fixed alphabet A, let d denote the Hamming distance on $A^n$, i.e. $d((a_1, \ldots, a_n), (b_1, \ldots, b_n))$ equals the number of indices i such that $a_i \ne b_i$. Given a set P of points in the metric space $(A^n, d)$, find a Steiner minimum tree for P. This problem is known to be NP-hard (see [47]).
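Computing the Steiner minimum tree in $(A^n, d)$ is NP-hard, but the metric itself is easy to evaluate. The minimal sketch below (function names and the three sequences are illustrative assumptions) computes d and the length of a minimum spanning tree over P, which is a feasible Steiner tree with no Steiner points. For the chosen P, adding the single Steiner point AAA gives a shorter tree, which shows why Steiner points matter in this metric.

```python
def hamming(a, b):
    """Number of positions at which equal-length sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def mst_length(points, dist):
    """Prim's algorithm on the complete graph over `points`, O(n^2) time."""
    n = len(points)
    if n == 0:
        return 0
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest connection of each vertex to the tree
    best[0] = 0
    total = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], dist(points[u], points[v]))
    return total


P = ["AAC", "ACA", "CAA"]   # hypothetical aligned sequences over A = {A, C}
print(mst_length(P, hamming))            # spanning tree without Steiner points: 4
print(sum(hamming("AAA", p) for p in P))  # star through Steiner point AAA: 3
```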

When a new customer is outside the existing telephone network, the company has to build a new line to connect the customer to the network. This situation leads to the following on-line Steiner tree problem. Assume that a sequence of points in a metric space is given step by step; in the ith step, only the locations of the first $n_i$ points in the sequence are known. The problem is to construct a short network at each step based on the network constructed in previous steps. The study of on-line problems was initiated by [89] and [73]. A criterion for the performance of an on-line algorithm is to compare the solution it generates with the solution of the corresponding off-line problem. In the Euclidean plane, the worst-case ratio of lengths between the on-line solution and the off-line solution is known to lie between $\Omega(\log n / \log\log n)$ and $O(\log n)$ [2,96,99].
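One simple on-line strategy is greedy: when a point arrives, join it to the nearest point already in the network and never change earlier edges. The sketch below illustrates this strategy in the Euclidean plane. It is offered as an example of an on-line algorithm, not as the specific algorithm analysed in [2,96,99], and the function name and sample points are hypothetical.

```python
import math


def greedy_online_tree(points):
    """Process points in arrival order; connect each new point to its nearest
    predecessor. Earlier edges are never changed, as the on-line setting requires.
    Returns (edges, total_length)."""
    edges, total = [], 0.0
    for i in range(1, len(points)):
        # Nearest point among those that have already arrived.
        j = min(range(i), key=lambda k: math.dist(points[i], points[k]))
        edges.append((j, i))
        total += math.dist(points[i], points[j])
    return edges, total


arrivals = [(0, 0), (4, 0), (2, 0), (2, 3)]
edges, total = greedy_online_tree(arrivals)
print(edges, total)
```

Here the greedy network has length 9, while an off-line spanning tree on the same points has length 7 (edges of length 2, 2 and 3): the first edge of length 4 is committed before the point (2, 0) arrives.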

Listing all variations and reviewing each of them is beyond the scope of this short article. Therefore, we next review a few for which significant results have been obtained recently.


Steiner Arborescence 


Given a weighted directed graph G, a vertex r, and a subset P of n vertices, a Steiner arborescence is a directed tree with root r such that for each $x \in P$ there exists a path from r to x. The shortest Steiner arborescence is also called a minimum Steiner arborescence. Computing a minimum Steiner arborescence is an NP-hard problem. Moreover, if NP ≠ P, then the best possible performance ratio of a polynomial time approximation for this problem is $\Omega(\log n)$. This means that although the minimum arborescence (the shortest arborescence without Steiner points) can, like the minimum spanning tree, be computed in polynomial time, the Steiner ratio in directed graphs (the greatest lower bound for the ratio of lengths between the minimum Steiner arborescence and the minimum arborescence for the same set of given points) is zero. Z. Dai and others [20,28] applied Arora's techniques to this problem and obtained the best known result: for any $\varepsilon > 0$ there exists a polynomial time approximation with performance ratio $O(n^{\varepsilon})$. Closing the gap between the lower bound and the upper bound for the performance ratio remains an open problem (as of 2000).

A version of this problem in the rectilinear plane is of great interest in VLSI design and has an interesting story in the literature. Given a set P of n points in the first quadrant of the rectilinear plane, a rectilinear Steiner arborescence is a directed tree rooted at the origin, consisting of paths from the root to all points in P, with horizontal edges oriented left to right and vertical edges oriented bottom to top. What is the complexity of computing the minimum rectilinear arborescence? At first, it was claimed that a polynomial time algorithm had been found. However, S.K. Rao, P. Sadayappan, Hwang, and P.W. Shor [84] found a serious flaw in this algorithm. Although they could not show the NP-completeness of the problem, they pointed out the difficulties of computing the minimum rectilinear arborescence in polynomial time. They also showed that while the ratio of lengths between a minimum arborescence and a minimum Steiner tree for the same set of points tends to infinity, there is a polynomial time approximation with performance ratio two. Recently (2000), W. Shi and C. Su [88] showed that computing the minimum rectilinear arborescence is NP-hard. B. Lu and L. Ruan [72] showed, by employing Arora's techniques, that there is a polynomial time approximation scheme for the problem.


Edge-length and Number of Steiner Points 


In wavelength-division multiplexing (WDM) optical network design [68,83], suppose we need to connect n sites located at $p_1, \ldots, p_n$ with a WDM optical network. Due to the limit in transmission power, signals can travel only a limited distance (say R) for guaranteed correct transmission. If some of the intersite distances are greater than R, we need to place amplifiers or receivers/transmitters at some locations in order to break long links into shorter pieces. This situation requires us to consider minimizing the maximum edge-length and the number of Steiner points in the design of WDM optical networks. To do so, two variations of Steiner trees have been studied.

The first is to minimize the number of Steiner points under an upper bound on edge-length. That is, given a set of n terminals $X = \{p_1, \ldots, p_n\}$ in the Euclidean plane $\mathbb{R}^2$ and a positive constant R, the problem is to compute a tree T spanning a superset of X such that each edge in the tree has length no more than R and the number C(T) of points other than those in X, called Steiner points, is minimum. This problem is called the Steiner tree problem with minimum number of Steiner points, denoted STP-MSP for short. G.-L. Lin and G.H. Xue [69] showed that STP-MSP is NP-hard. They also showed that the approximation obtained from the minimum spanning tree by simply breaking each edge into small pieces within the upper bound (called the steinerized spanning tree) has a worst-case performance ratio of at most five. D. Chen and others [21] showed that this approximation has performance ratio exactly four. They also presented a new polynomial time approximation with performance ratio at most three and a polynomial time approximation scheme under certain conditions. Lu and others [71] studied STP-MSP in the rectilinear plane. They showed that in the rectilinear plane the steinerized spanning tree has performance ratio exactly three and that there exists a polynomial time approximation with performance ratio two.
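The steinerized spanning tree is simple to compute: build a minimum spanning tree, then split every edge of length L into ⌈L/R⌉ pieces, which needs ⌈L/R⌉ − 1 Steiner points on that edge. A minimal sketch for the Euclidean plane follows; the function names and sample instance are illustrative, and Prim's algorithm is just one possible choice for the spanning tree.

```python
import math


def mst_edge_lengths(points):
    """Prim's algorithm on the complete Euclidean graph; returns the lengths
    of the n - 1 spanning tree edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    lengths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:                 # the root contributes no edge
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return lengths


def steinerized_mst_steiner_points(points, R):
    """Steiner points used when each MST edge of length L is cut into
    ceil(L / R) pieces of length <= R. (Ratios that are integers up to
    rounding error may need a tolerance in practice.)"""
    return sum(max(math.ceil(L / R) - 1, 0) for L in mst_edge_lengths(points))


sites = [(0, 0), (5, 0), (5, 2.5)]
print(steinerized_mst_steiner_points(sites, R=2.0))
```

With R = 2 the spanning tree edges have lengths 5 and 2.5, needing 2 and 1 Steiner points respectively.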

The second is to minimize the maximum edge-length under an upper bound on the number of Steiner points. That is, given a set $P = \{p_1, \ldots, p_n\}$ of n terminals and a positive integer k, we want to find a Steiner tree with at most k Steiner points such that the length of the longest edge in the tree is minimized. This is one of the bottleneck Steiner tree problems. Wang and Du [98] showed that:


a) if NP ≠ P, then the performance ratio of any polynomial time approximation for the problem in the Euclidean plane is at least $\sqrt{2}$;

b) if NP ≠ P, then the performance ratio of any polynomial time approximation for the problem in the rectilinear plane is at least two;

c) there exists a polynomial time approximation with performance ratio two for the problem in both the rectilinear and Euclidean planes.


Multiphase


Given an edge-weighted complete graph with vertex set X (|X| = n) and subsets $X_1, \ldots, X_m$ of vertices, the problem is to find a minimum weighted subgraph G such that for every i = 1, ..., m, G contains a spanning tree for $X_i$. This problem is called subset interconnection design or the multiphase spanning network problem [29,37]. Du and others [41] showed that if NP ≠ P, then the best performance ratio of a polynomial time approximation for this problem is $\ln n + O(1)$.
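Feasibility of a candidate subgraph G is easy to test even though optimizing it is hard: G contains a spanning tree for $X_i$ exactly when the edges of G with both endpoints in $X_i$ connect all of $X_i$. A minimal union-find sketch follows; the function names and the small instance are illustrative assumptions.

```python
def induces_connected(edges, subset):
    """True if the edges with both endpoints in `subset` connect all of it,
    i.e. the subgraph contains a spanning tree for `subset`."""
    parent = {v: v for v in subset}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        if u in parent and v in parent:      # edge lies inside the subset
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components <= 1


def is_feasible(edges, subsets):
    """Feasibility test for the multiphase spanning network problem."""
    return all(induces_connected(edges, set(X)) for X in subsets)


phases = [{1, 2, 3}, {3, 4}]
G1 = [(1, 2), (2, 3), (3, 4)]
G2 = [(1, 3), (2, 4), (3, 4)]   # vertex 2 is not joined to 1 or 3 inside phase 1
print(is_feasible(G1, phases), is_feasible(G2, phases))
```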

Given an edge-weighted graph B with vertex set X and subsets $X_1, Y_1, \ldots, X_m, Y_m$ of X with $X_i \cap Y_i = \emptyset$, the problem is to find a minimum weighted subgraph G such that for every i = 1, ..., m, G contains a Steiner tree for $X_i$ without using vertices outside $X_i \cup Y_i$. This problem is called the multiphase Steiner network problem. Both the multiphase spanning network and Steiner network problems arose in communication network design [81] and vacuum system design [37]. For the former, when the solution is a forest, the system $(X_1, \ldots, X_m)$ is called a subtree hypergraph. Such a system has various applications in computer database schemes [7] and statistics. It is also related to chordal graphs [42,45]. R.E. Tarjan and M. Yannakakis [95] gave an O(m + n)-time algorithm to tell whether a set system is a subtree hypergraph.

Comparing the phylogenetic tree problem with the multiphase Steiner network problem, we find some similarities between them if we regard each coordinate as a phase. For the multiphase Steiner tree problem, if the solution is a tree, then there is either a good heuristic or a polynomial time exact algorithm [37]. This suggests that studying the relationship between the two problems may lead to new constructions of phylogenetic trees.

L. Ruan and others [85] found that the multiweight Steiner tree problem can be transformed into the multiphase Steiner tree problem. This opens a new line of study for both problems.


See also 


> Auction Algorithms 

> Bottleneck Steiner Tree Problems 

> Communication Network Assignment Problem 
> Directed Tree Networks 

> Dynamic Traffic Networks 


> Equilibrium Networks 

> Evacuation Networks 

> Generalized Networks 

> Maximum Flow Problem 

> Minimum Cost Flow Problem 

> Network Design Problems 

> Network Location: Covering Problems 

> Nonconvex Network Flow Problems 

> Piecewise Linear Network Flow Problems 

> Shortest Path Tree Algorithms 

> Stochastic Network Problems: Massively Parallel 
Solution 

> Survivable Networks 

> Traffic Network Equilibrium 
A stochastic bilevel program (SBP) is a generalization of an ordinary bilevel program (BP; cf. > Bilevel programming: Introduction, history and overview) which allows the uncertainty in the values of the problem parameters to be expressed by a probability distribution on some or all of the variables of the model. Introducing these random variables into the BP modifies some of its properties: both its mathematical properties and the computation time needed to find a satisfactory solution.

Nevertheless, a stochastic BP is essentially a bilevel 
program, and it is therefore important to summarize 
the essential features of this class of models before mov- 
ing on to the effects of incorporating uncertainty. 


Principal Features of Bilevel Programs 


Bilevel programs are optimization problems with two 
objectives, one at each level, which interact through the 
sharing of some of the problem variables. BP can be 
perhaps most easily understood through comparison 
with the leader-follower (or Stackelberg) paradigm in 
game theory. In this context, the leader is represented 
through the upper level optimization problem. In par- 
ticular, the leader seeks to optimize a function of two 
vectors, one which he explicitly controls, x, and another 
vector which describes the reactions of the followers to 
his actions. These reactions are described by y. Since the 
reactions of the followers have an impact on the objec- 
tive of the leader, the leader’s optimization problem is 
to solve $\min f(x, y)$ subject to $x \in X$, $y \in Y$. Note that
the leader may be subject to constraints on his action, 
given by X. In addition, in some cases, there is more 
than one possible reaction y from the followers in re- 
sponse to a given x; if this happens, the leader may take 
different strategies for accepting a single y when he de- 
cides on the optimal x, or y will be restricted to a set, Y. 

Just as the leader seeks to minimize a function, f(x, 
y), the followers’ behavior is also described by an opti- 
mization process, t(x, y). In this case, the followers ac- 
cept the leader’s decision, x, as a parameter, and seek 
to optimize their objective over y, subject to some con- 
straints, described by the set Z(x), which may also depend on the parameter x. That is, the lower-level (followers') problem is $\min_y t(x, y)$ subject to $y \in Z(x)$, which is clearly a parametric program with x as a parameter.
In summary, then, BP is expressed by a pair of opti- 
mization programs, coupled through the passing of an 
upper-level variable as a parameter to the lower level 
problem. One can express quite generally the vector y as 
a possibly nonunique solution to the lower-level prob- 
lem, whose (parametric) solution set is given by S(x): 


$$\min_{x \in X,\; y \in Y} f(x, y) \quad \text{s.t.} \quad y \in S(x),$$

where $S(x) = \{y : y \in \arg\min_{z \in Z(x)} t(x, z)\}$.

The definition of BP in terms of a generic lower-level parametric solution set S(x) allows us to introduce further generality into the model: in many interesting applications, the lower-level (followers') behavior cannot be expressed as an optimization problem, but can be described by an equilibrium process, which is given mathematically by a variational inequality problem (VIP). That is, $S(x) = \{y \in Z(x) : T(x, y)^\top (y - \bar{y}) \le 0,\ \forall \bar{y} \in Z(x)\}$. These bilevel programs are often referred to as mathematical programs with equilibrium constraints (MPEC). In fact, MPEC can be considered a more general form of BP, since a lower-level optimization problem can be expressed as a VIP through its first-order optimality conditions (when t is differentiable and convex in y), but the converse is true only when $T(x, \cdot) = \nabla_y t(x, \cdot)$.

Bilevel programs have a number of mathematical 
and computational particularities with respect to stan- 
dard one-level optimization programs. The most strik- 
ing characteristic of BP is that the upper-level function is in general not differentiable, even in the case where the lower-level response vector, y, is unique as a function of x, that is, y = S(x). Indeed, y is only an implicit function of x. A further observation is that, for each evaluation of f(x, y), it is necessary to solve the lower-level problem just to obtain an iterate for y. The computational complexity of BP is thus increased dramatically, since each iteration of the upper-level problem requires solving the lower-level problem.
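The nested structure can be made concrete with a deliberately small, hypothetical pricing instance: a leader chooses a toll x in [0, 4], followers choose a usage level y in Z(x) = [0, 10] minimizing t(x, y) = y²/2 − (4 − x)y, and the leader minimizes f(x, y) = −xy (negative revenue). The sketch solves the lower level numerically by golden-section search for every candidate x on a grid, so each evaluation of f costs a full lower-level solve. Grid enumeration is used only for transparency; it is not presented as a recommended BP algorithm.

```python
import math

Y_MAX = 10.0          # assumed lower-level feasible set Z(x) = [0, Y_MAX]
BASE_UTILITY = 4.0    # assumed followers' willingness to pay


def golden_section_min(g, lo, hi, tol=1e-9):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if g(c) < g(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


def t(x, y):
    """Followers' convex cost in y for a given toll x."""
    return 0.5 * y * y - (BASE_UTILITY - x) * y


def f(x, y):
    """Leader's objective: negative toll revenue (to be minimized)."""
    return -x * y


def follower_response(x):
    """Lower level: y in argmin_{y in Z(x)} t(x, y), solved numerically."""
    return golden_section_min(lambda y: t(x, y), 0.0, Y_MAX)


def solve_by_enumeration(xs):
    """Upper level by enumeration: every evaluation of f solves the lower level."""
    best = None
    for x in xs:
        y = follower_response(x)
        val = f(x, y)
        if best is None or val < best[2]:
            best = (x, y, val)
    return best


x_grid = [i / 100 for i in range(401)]   # X = [0, 4]
best_x, best_y, best_f = solve_by_enumeration(x_grid)
print(f"toll={best_x:.2f}, usage={best_y:.3f}, revenue={-best_f:.3f}")
```

For this instance the lower-level solution is y(x) = max(4 − x, 0), so the leader's revenue x(4 − x) on [0, 4] peaks at x = 2 with y = 2; the grid search recovers this point.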


Examples of Bilevel Programs 


While the leader-follower paradigm is useful for illustrating the hierarchical nature of the two levels in a bilevel program, it does not convey the full range of problems included in the class of bilevel programs. Indeed, bilevel programs describe problems
in many areas of engineering and management, as well 
as problems in game theory. Following are a few exam- 
ples of bilevel programs, which will furthermore help 
illustrate how uncertainty can be explicitly taken into 
account. (See also [10].) 


Example 1 (Optimal pricing problem) In a number 
of application areas, especially the transportation and 
telecommunications sectors, a central operator seeks to 
maximize profit, given that the market that he is tar- 
geting is competitive, and furthermore that his poten- 
tial clients can refuse to participate should the price be 
set too high, or service quality too low. These problems 
have an inherent bilevel form. Determining the optimal 
tolls to set on a highway, or the price of a long-distance 
phone service are problems of this type. 

The upper-level describes the manager’s problem: it 
may consist in determining the prices so as to achieve 
profit maximization, or the problem of determining the 
level of service to offer on the infrastructure which op- 
timizes a given performance criterion, perhaps taking 
into account the cost of offering such a service level. The 
policy instrument, x, is then the price to be set by the 
manager for use of the infrastructure, and/or the service 
level of the infrastructure (capacity, travel time, or ac- 
cess time improvements, etc.). The feasible set X gener- 
ally contains bounds on the possible values of x. When 
the manager seeks to determine the optimal level of ca- 
pacity improvements to an existing infrastructure, the 
resulting bilevel problem is known as the network de- 
sign problem. 

The lower-level problem describes the users’ re- 
sponses to the prices and/or service levels set by the 
manager. The users’ response is given by the level of 
use on each link of the infrastructure, y. In general, one 
assumes that the users’ behavior follows an equilibrium 
principle, given by a cost operator (or utility function),
T(x, y). The cost operator gives the cost of using the in- 
frastructure as a function of the price vector x and the 
amount of use y. The interpretation of (Nash) equilib- 
rium is the following: A usage pattern y is in equilib- 
rium (stable) if no single user can reduce his own cost 
(or increase his utility) by modifying his current usage 
pattern. The lower-level feasible set, Z(x) will then in- 
clude demand satisfaction constraints (where the total 
demand level may be a function of the price or service 
level, x) and may include capacity constraints on the infrastructure.

The deterministic bilevel pricing model is then given by:

$$\max_{x,\, y} f(x, y) \quad \text{s.t.} \quad y \in S(x),$$

where $S(x) := \{y \in Z(x) : T(x, y)^\top (y - \bar{y}) \le 0,\ \forall \bar{y} \in Z(x)\}$.

Permitting the incorporation of random variables 
into the optimal infrastructure pricing model can be 
quite useful for increasing the realism of the model. In 
particular, the demand for the use of the infrastructure 
can be estimated through historical data or surveys, but 
generally with some random error. In this case, demand 
can be expressed as a random variable, and then $Z(x) := Z(x, \omega)$, where $\omega \in \Omega$ and $(\Omega, \mathcal{A}, P)$ is a probability space. Similarly, the maximum usage level of the infrastructure, that is, the link capacities included in Z(x), can depend on a number of factors which cannot be described precisely, but whose effect is to modify upper usage limits according to a known probabilistic law. In addition, the users' cost function on the infrastructure, $T(x, y) = T(x, y, \omega)$, can vary according to a known distribution.


Example 2 (Stackelberg-Nash equilibrium) The game 
theoretic model of a Stackelberg-Nash (or Stackel- 
berg-Nash-Cournot) equilibrium permits representing 
a number of important market phenomena. The model 
assumes a market in which N firms produce a single 
good, each competing for maximum market share. In 
addition, there is a single firm (or government) that also 
produces the good, but is capable of reacting to the N 
other firms’ production when determining how to set 
its own production level. 

This paradigm is successfully applied, for example, to utility markets, in which both private and public firms compete to sell the same utility; power generation is one such example. The upper-level optimization problem, $\max_x f(x, y)$, represents the leader's profit maximization, with x the leader's production level. The lower level represents the Nash equilibrium problem among the N firms, known as the followers, with $y := y(x)$ being the followers' production levels given the leader's production decision. Each follower i then solves $\max_{y_i} t_i(x, y)$, and the equilibrium of the noncooperative Nash game can therefore be expressed as a VIP, as in the example above.

The incorporation of uncertainty in the Stackel- 
berg-Nash model would then take the form of a ran- 
domly varying demand from the market, and possibly 
uncertainty in the profit functions themselves. 


Example 3 (Structural optimization) Among the large 
number of potential applications of bilevel optimiza- 
tion in engineering, one which has been the subject of 
much research attention is that of finding the design of 
a mechanical structure which has the best performance 
under the influence of external forces: structural opti- 
mization. 

As is the case in the examples above, structural opti- 
mization problems also have an inherent bilevel form. 
The upper level objective function f(x, y) measures 
some characteristic of the structure, such as its con- 
struction cost, weight, or stiffness, with y represent- 
ing the performance measure. This objective function 
is optimized by selecting design parameters, x, which 
express the shape of the structure, and the choice and 
amount of material to be used in the design. In addition, 
the structure may be subject to behavioral constraints 
within the upper-level problem, such as bounds on the 
displacements, stresses and contact forces. These con- 
straints define the feasible region Y. Budget limits on 
the amount of available material, if present, would be 
included in a set X. 

The lower-level problem describes the behavior of 
the structure given the choices of the design variables, 
possible contact conditions with foundations or bound- 
aries, (the set Z), and the external forces acting on it, 
F. For elastic structures, the behavior is given by the equilibrium law of minimal potential energy, $\Pi(x, y)$, which determines the values of the (lower-level) state variables, that is, the displacements, stresses and contact forces, y. The matrix K(x) represents the stiffness of the material and is symmetric and positive semidefinite.

The deterministic structural optimization problem thus described is then:

$$\min_{x \in X,\; y \in Y} f(x, y) \quad \text{s.t.} \quad y \in \arg\min_{z \in Z} \Pi(x, z), \qquad (2)$$

where $\Pi(x, y) := \tfrac{1}{2}\, y^\top K(x)\, y - F^\top y$.
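A two-spring model makes the lower level of (2) concrete. Assume, purely for illustration, a chain of two springs fixed at one end, design variables $x_1, x_2$ equal to the spring stiffnesses, a unit load F at the free end, no contact constraints, a material budget $x_1 + x_2 = 1$ playing the role of X, and compliance $F^\top y$ as the upper-level stiffness measure f. Because K(x) is then positive definite, minimizing $\tfrac{1}{2} y^\top K(x) y - F^\top y$ amounts to solving $K(x)\,y = F$. The sketch assumes NumPy is available.

```python
import numpy as np


def displacements(x1, x2, load):
    """Lower level: minimize 0.5 y^T K y - F^T y. For positive definite K the
    unique minimizer solves K y = F."""
    # Spring 1 joins the support to node 1; spring 2 joins node 1 to node 2.
    K = np.array([[x1 + x2, -x2],
                  [-x2, x2]])
    return np.linalg.solve(K, load)


F = np.array([0.0, 1.0])   # assumed unit load on the free end (node 2)
VOLUME = 1.0               # assumed budget: x1 + x2 = VOLUME

best = None
for i in range(1, 1000):
    x1 = VOLUME * i / 1000
    x2 = VOLUME - x1
    y = displacements(x1, x2, F)        # one lower-level solve per design
    compliance = float(F @ y)           # upper-level objective f(x, y) = F^T y
    if best is None or compliance < best[1]:
        best = (x1, compliance)

best_x1, best_compliance = best
print(f"x1={best_x1:.3f}, x2={VOLUME - best_x1:.3f}, compliance={best_compliance:.4f}")
```

Analytically the compliance here is $1/x_1 + 1/x_2$, minimized at $x_1 = x_2 = 1/2$ with value 4, which the grid recovers.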
The introduction of uncertainty into the structural
optimization model can be of great use in permitting 
the modeler to take into account variations in external 
forces (due for example to wind and other weather con- 
ditions, varying traffic on a bridge, etc.) and variations 
in the material properties. In the first case, the variation 
in external forces can be described by setting $F = F(\omega)$,
where $\omega \in \Omega$. Note that a failure to take this variability
into account may lead to structures with unwanted vi- 
bration under certain weather conditions, as the topol- 
ogy optimization will assign bars only where needed to 
sustain the described forces; hence random forces left 
out of the optimization will not be accounted for by 
the resulting structure. Note further that the structure 
that will result from this stochastic bilevel optimization 
will be one that responds best ‘on average’ to the range 
of possible forces. In certain cases, where any failure 
is unacceptable (e. g., when designing bridges), it may 
be preferable to use a worst-case approach, which will result in more costly designs that, however, minimize the risk of failure (rather than minimizing cost or some other characteristic of the structure).

Uncertainty in other problem parameters can be 
accommodated in a similar manner. Taking into ac- 
count the variability in material properties, one would set $K(x) = K(x, \omega)$.


Properties of the Stochastic Bilevel Program 


The presence of random variables in the bilevel pro- 
gram means that one can no longer calculate exactly 
the vectors x and y, since their values depend on pa- 
rameters which vary randomly. Instead, one can calcu- 
late the values of x and y that optimize f on average; the 
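objective is then to minimize the expected value of $f(x, y, \omega)$. The stochastic bilevel program is defined below, with the more general lower-level VIP:

$$\text{(SBP)} \qquad \min_{x,\, y(\cdot)} \ \mathbb{E}_{\omega}\big[f(x, y(\omega), \omega)\big] \quad \text{s.t.} \quad x \in X, \quad y(\omega) \in S(x, \omega),\ \ \omega \in \Omega,$$

where $S(x, \omega) := \{y \in Z(x, \omega) : T(x, y, \omega)^\top (y - \bar{y}) \le 0,\ \forall \bar{y} \in Z(x, \omega)\}$ denotes the set of solutions to the lower-level variational inequality defined by the parameterized mapping $T(x, \cdot, \omega)$ and feasible set $Z(x, \omega)$ (presumed convex). The random variable $\omega$ is defined on a probability space $(\Omega, \mathcal{A}, P)$.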


In its general form, the objective function of the stochastic bilevel program can be a multiple integral when $\omega$ is a vector of continuous random variables. That is, $\mathbb{E}[f(x, y, \omega)] := \int_{\Omega} f(x, y, \omega)\, dF(\omega)$. However, this integral is in most cases very difficult to evaluate. For that reason, as in the majority of stochastic programs at this time, a discretization of the random distributions is used: one lets $\mathcal{L}$ represent the discrete set of random observations obtained from $\Omega$, numbered $\ell = 1, \ldots, |\mathcal{L}|$, and $p_\ell$ the probability of each scenario $\ell \in \mathcal{L}$, with $\sum_{\ell \in \mathcal{L}} p_\ell = 1$. This allows one to express the expected value in the objective function as a sum: $\mathbb{E}[f(x, y_\ell)] := \sum_{\ell \in \mathcal{L}} p_\ell\, f(x, y_\ell)$. The resulting problem is referred to as SBP-L.
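The discretized objective can be illustrated with a small, hypothetical toll-pricing instance: one upper-level decision x is shared by all scenarios, while each scenario has its own lower-level response. Here scenario l carries a willingness to pay a_l with probability p_l (values chosen for illustration), the followers' problem in that scenario is $\min_{y \ge 0} \tfrac{1}{2}y^2 - (a_\ell - x)y$ with closed-form solution $y_\ell = \max(a_\ell - x, 0)$, and f(x, y) = −xy is the negative toll revenue.

```python
# Hypothetical scenarios: (willingness to pay a_l, probability p_l).
SCENARIOS = [(3.0, 0.25), (4.0, 0.5), (5.0, 0.25)]
assert abs(sum(p for _, p in SCENARIOS) - 1.0) < 1e-12


def follower_response(x, a):
    """Closed-form minimizer of 0.5*y^2 - (a - x)*y over y >= 0 (scenario lower level)."""
    return max(a - x, 0.0)


def expected_objective(x):
    """SBP-L objective: sum over scenarios of p_l * f(x, y_l), f(x, y) = -x*y."""
    return sum(p * (-x * follower_response(x, a)) for a, p in SCENARIOS)


x_grid = [i / 100 for i in range(601)]      # candidate tolls in [0, 6]
best_x = min(x_grid, key=expected_objective)
print(best_x, -expected_objective(best_x))  # toll and expected revenue
```

For tolls up to 3 every scenario has positive demand, so the expected revenue is x(4 − x), maximized at x = 2 with value 4; higher tolls lose the low-demand scenario and earn less.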

In what follows, it is assumed that the set of random 
observations has been expressed as a discrete set. (For 
information on the additional assumptions needed in 
the case of a continuous distribution, see [1,12,16]). 

Consider the following assumptions: 

i) X is nonempty and closed. 

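ii) The lower-level constraint set is of the form $Z_\ell(x) := \{y : g_{\ell i}(x, y) \le 0,\ i = 1, \dots, k\}$, $\ell \in \mathcal{L}$, where each function $g_{\ell i}$ is continuous and convex in $y$ for each $x \in X$. Further, either $g_{\ell i}(x, \cdot) = g_{\ell i}(\cdot)$, $i = 1, \dots, k$, $\ell \in \mathcal{L}$, that is, $Z_\ell(x) = Z_\ell$, or for each $x \in X$, $\ell \in \mathcal{L}$, there is a $y$ such that $g_{\ell i}(x, y) < 0$, $i = 1, \dots, k$.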

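iii) There exists an $(x, y_\ell) \in P_\ell := \{(x, y) \in \operatorname{gr} S_\ell : x \in X\}$ with $f(x, y_\ell) < \infty$ for all $\ell \in \mathcal{L}$, where $\operatorname{gr} S_\ell := \{(x, y) : y \in S_\ell(x)\}$ is the graph of $S_\ell$.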

iv) (Inf-compactness) $f$ is lower semicontinuous, proper, and has bounded level sets on $\bigcup_{\ell \in \mathcal{L}} P_\ell$.


Perhaps the first property of interest is that of the exis- 
tence of a solution. 


Theorem 4 (Existence of optimal solutions to SBP-$\mathcal{L}$) Let assumption ii) hold, and let the mapping $T_\ell$ be continuous. Then the graph of $S_\ell$, $\operatorname{gr} S_\ell$, is closed for each $\ell \in \mathcal{L}$. Hence, under the additional assumptions i), iii), and iv), there exists at least one optimal solution to SBP-$\mathcal{L}$.


Proof If $\operatorname{gr} S_\ell$ is closed for each $\ell \in \mathcal{L}$, then the assumptions imply the inf-compactness of the extended function formed by $f$ plus the indicator function of the feasible set, where $Z := \bigcup_{\ell \in \mathcal{L}} Z_\ell$. From this, the existence of a solution follows from Weierstrass' theorem. But by condition ii), either $Z_\ell(x) = Z_\ell$, in which case the closedness of $\operatorname{gr} S_\ell$ follows from the continuity of $T_\ell$, or the Slater condition holds, in which case the closedness of $Z_\ell$ follows from [7, Lemma 1].


The conditions required in the preceding existence result are weaker than some previously considered requirements on bilevel model formulations and, as such, may be particularly interesting for a number of important applications. One such example can be found in (stochastic or deterministic) structural optimization problems; in this case, for example, it can be shown that there exists an optimal solution even in the presence of zero design bounds. (See [2] for further details.)

A second property of interest is whether or not the problem is a convex optimization problem. In most cases, neither deterministic nor stochastic bilevel problems will be convex. However, there is a special form of the stochastic bilevel program which may possess the desired convexity property.

Consider the following special case of SBP, in which 
the upper-level objective function f depends on the 
lower-level solution only in the sense of its optimal 
value. The deterministic form of this problem has been 
analyzed in great detail in [15], and is defined as follows: 


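$$
\text{(SBPOV)}\qquad
\begin{aligned}
\min\ & E_\omega\bigl[f(x, \varphi(x, \omega))\bigr]\\
\text{s.t. } & x \in X,\\
& \varphi(x, \omega) := \inf_{y \in Z(x, \omega)} t(x, y, \omega).
\end{aligned}
$$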


The discretized formulation, SBPOV-$\mathcal{L}$, of this special case is analogous to that of the general bilevel program.


Theorem 5 (Convexity of SBPOV-$\mathcal{L}$) In addition to the assumptions i)-iv), assume, for each $\ell \in \mathcal{L}$, that $t_\ell$ is convex and continuous, and that $g_{\ell i}$, $i = 1, \dots, k$, are convex. Then each function $\varphi_\ell(x) := \inf_{y \in Z_\ell(x)} t_\ell(x, y)$ is convex on $X$. Further, assume that $X$ is convex, and that the function $f$ is convex and increasing in its second argument. Then the implicit upper-level objective function $x \mapsto \sum_{\ell \in \mathcal{L}} p_\ell f(x, \varphi_\ell(x))$ is convex on $X$, so that SBPOV-$\mathcal{L}$ is a convex problem.


Proof One needs only to establish the convexity of $\varphi_\ell$ on $X$ for every $\ell \in \mathcal{L}$, but this result follows from the assumptions and [5, Thm. 5].
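The passage from convexity of $\varphi_\ell$ to convexity of the implicit objective is the standard composition argument. A short derivation, reading "convex" as jointly convex in $(x, \varphi)$ and "increasing" as nondecreasing in the second argument, and assuming $\varphi_\ell$ is finite on $X$: for $x, x' \in X$, $\lambda \in [0, 1]$ and $x_\lambda := \lambda x + (1 - \lambda) x'$,

$$
\begin{aligned}
f\bigl(x_\lambda, \varphi_\ell(x_\lambda)\bigr)
&\le f\bigl(x_\lambda,\ \lambda \varphi_\ell(x) + (1 - \lambda)\varphi_\ell(x')\bigr)
&& \text{($\varphi_\ell$ convex, $f$ nondecreasing in its 2nd argument)}\\
&\le \lambda f\bigl(x, \varphi_\ell(x)\bigr) + (1 - \lambda) f\bigl(x', \varphi_\ell(x')\bigr)
&& \text{($f$ jointly convex)}.
\end{aligned}
$$

Multiplying by $p_\ell \ge 0$ and summing over $\ell \in \mathcal{L}$ preserves the inequality, which gives convexity of $x \mapsto \sum_{\ell \in \mathcal{L}} p_\ell f(x, \varphi_\ell(x))$ on the convex set $X$.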


Note that two-stage stochastic programs are in fact equivalent to the problem SBPOV: two-stage stochastic linear programs (2S-SLP) are obtained as a particular form of this problem, namely the right-hand side perturbation model discussed in [15, p. 189], in which the upper-level variable appears only on the right-hand side of the lower-level constraints. Consequently, the convexity of two-stage stochastic programs can be obtained directly from the preceding result on SBPOV. (These and other such relations are explored in [12].)
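As a toy instance of the right-hand side perturbation model, one can take a scalar recourse problem in which $x$ enters only the right-hand side of the lower-level constraint: $\varphi_\ell(x) = \min\{q y : y \ge \xi_\ell - x,\ y \ge 0\} = q \max(\xi_\ell - x, 0)$ for $q > 0$, with upper-level function $f(x, \varphi) = c x + \varphi$. The data $\xi_\ell$, $p_\ell$, $c$, $q$ below are hypothetical; the sketch evaluates the implicit objective of SBPOV-$\mathcal{L}$ and checks midpoint convexity on a grid, as Theorem 5 predicts:

```python
def recourse(x: float, xi: float, q: float) -> float:
    """phi_l(x) = min{ q*y : y >= xi - x, y >= 0 } (q > 0), solved in closed form."""
    return q * max(xi - x, 0.0)


def implicit_objective(x: float, c: float, q: float, scenarios) -> float:
    """x -> sum_l p_l * f(x, phi_l(x)) with f(x, phi) = c*x + phi (increasing in phi)."""
    return sum(p * (c * x + recourse(x, xi, q)) for xi, p in scenarios)


scenarios = [(1.0, 0.3), (2.0, 0.5), (4.0, 0.2)]  # (xi_l, p_l): hypothetical data
c, q = 1.0, 3.0
grid = [0.25 * i for i in range(25)]  # x in [0, 6]

# Midpoint convexity: F((a+b)/2) <= (F(a) + F(b))/2 for all grid pairs.
convex_on_grid = all(
    implicit_objective(0.5 * (a + b), c, q, scenarios)
    <= 0.5 * (implicit_objective(a, c, q, scenarios)
              + implicit_objective(b, c, q, scenarios)) + 1e-12
    for a in grid
    for b in grid
)
print(convex_on_grid)  # True
```

A grid check is of course no proof; it only illustrates the conclusion of Theorem 5 on this example.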
A third property of interest for the resolution of the model concerns the differentiability of the upper-level objective function.

Let us first present some additional (and stronger) assumptions that, among other things, will guarantee the uniqueness of the lower-level solution $y_\ell$ for each $x \in X$ and each $\ell \in \mathcal{L}$.

a) f is continuously differentiable. 

b) The lower-level constraint set is of the form $Z_\ell(x) := \{y : g_{\ell i}(x, y) \le 0,\ i = 1, \dots, k\}$, where each function $g_{\ell i}$ is twice continuously differentiable and convex in $y$ for each $x \in X$, $\ell \in \mathcal{L}$. Furthermore, for each $x \in X$, $\ell \in \mathcal{L}$, $Z_\ell(x) \ne \emptyset$, and $Z_\ell(x) \subseteq B_\ell$ for some open and bounded set $B_\ell$.

c) Let $I_\ell(x, y) := \{i = 1, \dots, k : g_{\ell i}(x, y) = 0\}$. Then, for each $x \in X$, $\ell \in \mathcal{L}$, and $y \in S_\ell(x)$, the partial gradients $\nabla_y g_{\ell i}(x, y)$, $i \in I_\ell(x, y)$, are linearly independent.

d) $T_\ell$ is continuously differentiable and strongly monotone in $y$ for each $x \in X$, $\ell \in \mathcal{L}$.


Theorem 6 (Directional differentiability of SBP-$\mathcal{L}$) Suppose that SBP-$\mathcal{L}$ has a solution and let the assumptions a)-d) be satisfied. Then the implicit function $x \mapsto \sum_{\ell \in \mathcal{L}} p_\ell f(x, y_\ell(x))$ of SBP-$\mathcal{L}$ is locally Lipschitz continuous and directionally differentiable on $X$.


Proof By [13, Thm. 2.1], the assumptions imply that the implicit mapping $x \mapsto S_\ell(x)$ is locally Lipschitz continuous for each $\ell \in \mathcal{L}$. The result then follows directly from [14].


Algorithms for Stochastic Bilevel Programs 


The last essential ingredient needed in order to study and apply stochastic bilevel programs to real problems is an efficient algorithm for solving the model. As mentioned earlier, the deterministic bilevel problem is already quite difficult and time-consuming to solve; introducing a discrete random distribution on some or all of its parameters increases the problem size even further. For this reason, the development of efficient methods is of primary importance, as is the use of decomposition and parallel strategies whenever possible.

Next, one method for generating subgradients of $f$ is presented (based on [11]). Note that the algorithm utilizes the local Lipschitz continuity and directional differentiability of the implicit function $f$, and thus requires the additional assumptions presented above. Then, a penalty method making use of a merit function reformulation of the VIP is discussed, followed by a perturbation method which uses the Karush-Kuhn-Tucker (KKT) conditions of the VIP.

For a given $x \in X$ and $\ell \in \mathcal{L}$, let $y_\ell$ be the (unique) solution to the lower-level problem, and let $I(x, \ell) := \{i = 1, \dots, m : g_{\ell i}(x, y_\ell) = 0\}$. One then introduces the subsets $I_+(x, \ell)$ and $I_0(x, \ell)$ of $I(x, \ell)$ for the active constraints whose (unique) multipliers satisfy $\lambda_i > 0$ and $\lambda_i = 0$, respectively. In the event that $I_0(x, \ell)$ is nonempty (that is, strict complementarity does not hold at $y_\ell$), one can further introduce a subset $J(x, \ell)$ of $I(x, \ell)$ for which the requirement is that $I(x, \ell) \supseteq J(x, \ell) \supseteq I_+(x, \ell)$ holds. Let $g_J$ and $d_J$ denote the subvector of $g$ and the subvector of $d$ in which only the rows $i \in J$ are included.

Applying the analysis of [11], a subgradient of f can 
be calculated as follows. 

First, for each $\ell \in \mathcal{L}$, solve the following linear system of equations:


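$$
\begin{pmatrix}
\nabla_y L_\ell(x, y_\ell, \lambda_\ell) & \nabla_y g_{J(x,\ell)}(x, y_\ell)^\top\\
\nabla_y g_{J(x,\ell)}(x, y_\ell) & 0^{|J(x,\ell)| \times |J(x,\ell)|}
\end{pmatrix}
\begin{pmatrix} d_{y_\ell}\\ d_{\lambda_\ell} \end{pmatrix}
=
\begin{pmatrix} \nabla_y f(x, y_\ell)\\ 0^{|J(x,\ell)|} \end{pmatrix}
$$

in order to obtain $(d_{y_\ell}, d_{\lambda_\ell})$, where $L_\ell(x, y_\ell, \lambda_\ell) := T_\ell(x, y_\ell) + \nabla_y g_\ell(x, y_\ell)^\top \lambda_\ell$ and $y_\ell = y_\ell(x)$.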

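Then, a subgradient of $f$ at $x$ is given by the formula

$$
\xi(x) = \sum_{\ell \in \mathcal{L}} p_\ell \Bigl[\nabla_x f(x, y_\ell) + \nabla_x L_\ell(x, y_\ell, \lambda_\ell)^\top d_{y_\ell} - \nabla_x g_{J(x,\ell)}(x, y_\ell)^\top d_{\lambda_\ell}\Bigr].
$$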
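For each scenario, the directions come from a single linear solve with a saddle-point (KKT) matrix. A small dense Python sketch, assuming the blocks $H = \nabla_y L_\ell$ and $G = \nabla_y g_{J(x,\ell)}$ and the gradient $\nabla_y f$ have already been evaluated at $(x, y_\ell, \lambda_\ell)$; plain Gaussian elimination stands in for whatever sparse or warm-started solver one would use in practice, and the numbers in the example are hypothetical:

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_dense(M: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(M)
    A = [row[:] + [bi] for row, bi in zip(M, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-14:
            raise ValueError("singular KKT matrix")
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x


def kkt_directions(H: Matrix, G: Matrix,
                   grad_y_f: List[float]) -> Tuple[List[float], List[float]]:
    """Solve [[H, G^T], [G, 0]] (d_y, d_lam) = (grad_y f, 0).

    H: n x n block nabla_y L_l; G: |J| x n block nabla_y g_J. The matrix is
    nonsingular when H is positive definite and G has full row rank
    (cf. assumptions c) and d)).
    """
    n, m = len(H), len(G)
    K = [H[i][:] + [G[j][i] for j in range(m)] for i in range(n)]
    K += [G[j][:] + [0.0] * m for j in range(m)]
    sol = solve_dense(K, list(grad_y_f) + [0.0] * m)
    return sol[:n], sol[n:]


d_y, d_lam = kkt_directions([[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0]], [1.0, 3.0])
print(d_y, d_lam)  # [-0.5, 0.5] [2.0]
```

Because the systems for different scenarios are independent, these solves can be distributed across processors.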

The subgradient can be used in an algorithm for the heuristic solution of the problem or embedded within a more sophisticated algorithm. The following subgradient projection algorithm utilizes the fact that $f$ is differentiable almost everywhere, and in particular when the $y_\ell$, $\ell \in \mathcal{L}$, are strictly complementary:


Given $x \in X$:

1) an initial step is taken in the direction of an arbitrary element $-\xi(x) \in -\partial f(x)$, followed by a Euclidean projection onto $X$;

2) a backtracking line search on this steplength is made so that either the resulting feasible solution has a sufficiently lower objective value, or a predetermined steplength is applied, whichever is greater.

Subgradient projection method
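A minimal Python sketch of this scheme, under assumptions the text leaves open: $X$ is a box, so the Euclidean projection is componentwise clipping; "sufficiently lower" is read as an Armijo-type test $f(x^+) \le f(x) - \sigma t \|\xi\|^2$; and the predetermined steplength is the diminishing rule $a/(k+1)$. A small nonsmooth convex function stands in for the implicit bilevel objective, and all parameter values are hypothetical:

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def project_box(x: Sequence[float], lo: float, hi: float) -> Vec:
    """Euclidean projection onto the box [lo, hi]^n: componentwise clipping."""
    return [min(max(xi, lo), hi) for xi in x]


def subgradient_projection(
    f: Callable[[Vec], float],
    subgrad: Callable[[Vec], Vec],
    x0: Sequence[float],
    lo: float,
    hi: float,
    iters: int = 100,
    t0: float = 1.0,      # initial trial steplength
    beta: float = 0.5,    # backtracking factor
    sigma: float = 1e-4,  # sufficient-decrease constant
    a: float = 1.0,       # predetermined steplength a / (k + 1)
) -> Tuple[Vec, float]:
    x = project_box(x0, lo, hi)
    best_x, best_f = x, f(x)
    for k in range(iters):
        g = subgrad(x)            # an arbitrary element of the subdifferential
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:             # zero subgradient: x is stationary, stop
            break
        fx, t_pred, t = f(x), a / (k + 1), t0
        # Backtrack until sufficient decrease, or until t falls to t_pred.
        while t > t_pred:
            trial = project_box([xi - t * gi for xi, gi in zip(x, g)], lo, hi)
            if f(trial) <= fx - sigma * t * gg:
                break
            t *= beta
        step = max(t, t_pred)     # whichever steplength is greater
        x = project_box([xi - step * gi for xi, gi in zip(x, g)], lo, hi)
        fx_new = f(x)
        if fx_new < best_f:       # the method is not monotone: keep the best point
            best_x, best_f = x, fx_new
    return best_x, best_f


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def f_toy(x: Vec) -> float:
    # Hypothetical nonsmooth test function with minimizer (1, -0.5).
    return abs(x[0] - 1.0) + 2.0 * abs(x[1] + 0.5)


def subgrad_toy(x: Vec) -> Vec:
    return [sign(x[0] - 1.0), 2.0 * sign(x[1] + 0.5)]


print(subgradient_projection(f_toy, subgrad_toy, [-2.0, 2.0], -2.0, 2.0))
# ([1.0, -0.5], 0.0)
```

Taking the maximum with the predetermined steplength keeps the iteration well defined at kinks, where $-\xi(x)$ need not be a descent direction and the backtracking alone could shrink the step to zero.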


Note that, at points of nondifferentiability, the tra- 
ditional projection method may break down because 
the negative of the subgradient may then not be a de- 
scent direction; in order to obtain a well-defined itera- 
tion at such points, one therefore utilizes a steplength 
which is the maximum of the one supplied by the back- 
tracking line search and the result of a predetermined 
steplength formula used in traditional subgradient opti- 
mization techniques. 

See [11] for the deterministic analog of the above analysis for calculating subgradients in a bundle method for the solution of a deterministic bilevel problem. This can be viewed as a more advanced technique than the one above, one which ensures convergence to a stationary point.

Consider the following parallel resolution strategy for this model. In some cases, one may identify a cluster of scenarios with similar values. By allocating these to the same processor, one may then solve the corresponding lower-level problems using efficient reoptimization procedures once any one of them has been solved to optimality, since the optimal solution to any one of them is feasible and near-optimal for all the others. Further, for scenarios with slightly differing sets $J(x, \ell)$, consider sorting the scenarios in each set so that $|J(x, \ell)|$ is increasing. One may then solve the preceding linear systems in sequence, expanding the matrix with the necessary rows and columns and using the solution to the former system as a starting point in the search for the next. The fact that the choice of $J(x, \ell)$ is arbitrary within the range of active constraints may be used to minimize the number of scenarios with distinct values of $J(x, \ell)$.

By reformulating the variational inequality in the lower level through the use of a merit (or gap) function and introducing a penalty parameter, it is possible to devise a different method for solving the stochastic bilevel program. The merit function is defined as follows. Assume that for each scenario $\ell \in \mathcal{L}$ one can find a continuous function $\psi_\ell$ that satisfies the following criteria:

$$
\psi_\ell(x, y) \ge 0 \quad \text{for every } x \in X \text{ and all } y,
\qquad
\psi_\ell(x, y) = 0 \iff y \in S_\ell(x).
$$


Then the method is based on including the new function $\psi_\ell$ in the objective with a penalty parameter forcing it to zero; that is, for $\eta_\ell > 0$, one solves

$$
\min \sum_{\ell \in \mathcal{L}} p_\ell \bigl[f(x, y_\ell) + \eta_\ell\, \psi_\ell(x, y_\ell)\bigr]
\quad \text{s.t. } x \in X.
$$


Note that the objective function remains separable with 
respect to the scenarios and can thus be decomposed 
and solved in parallel. For more details on this method 
and possible merit functions see [4,6,8,9]. 
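One standard choice of $\psi_\ell$, not necessarily one of those studied in [4,6,8,9], is the squared natural residual $\psi_\ell(x, y) = \|y - \Pi_{Z_\ell(x)}(y - T_\ell(x, y))\|^2$, where $\Pi$ is the Euclidean projection; it is nonnegative and vanishes exactly at solutions of the lower-level VIP. A one-dimensional Python sketch, assuming a hypothetical interval $Z = [0, 1]$ and the hypothetical mapping $T_\ell(x, y) = y - x - \omega_\ell$, whose VIP solution is $y_\ell = \Pi_{[0,1]}(x + \omega_\ell)$:

```python
from typing import Callable, Sequence


def proj_interval(v: float, lo: float, hi: float) -> float:
    """Euclidean projection onto Z = [lo, hi]."""
    return min(max(v, lo), hi)


def natural_residual(T: Callable[[float, float], float], x: float, y: float,
                     lo: float = 0.0, hi: float = 1.0) -> float:
    """psi(x, y) = |y - Proj_Z(y - T(x, y))|^2 >= 0, zero iff y solves the VIP."""
    r = y - proj_interval(y - T(x, y), lo, hi)
    return r * r


def penalized_objective(f, Ts: Sequence[Callable], x: float, ys: Sequence[float],
                        probs: Sequence[float], eta: float) -> float:
    """sum_l p_l [f(x, y_l) + eta * psi_l(x, y_l)]: separable across scenarios."""
    return sum(p * (f(x, y) + eta * natural_residual(T, x, y))
               for T, y, p in zip(Ts, ys, probs))


omegas = [0.125, 0.5, 1.0]                          # hypothetical scenario data
Ts = [lambda x, y, w=w: y - x - w for w in omegas]  # T_l(x, y) = y - x - omega_l
x = 0.25
y_star = [proj_interval(x + w, 0.0, 1.0) for w in omegas]  # [0.375, 0.75, 1.0]
print([natural_residual(T, x, y) for T, y in zip(Ts, y_star)])  # [0.0, 0.0, 0.0]
print(natural_residual(Ts[0], x, 0.5))  # 0.015625: y = 0.5 is not a solution
```

The penalized objective is a sum of per-scenario terms, which is what makes the scenario-wise decomposition mentioned above possible.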

A still different approach for solving MPEC or bilevel programs involves rewriting the solution set mapping of the lower-level variational inequality in terms of its KKT conditions. Letting $\lambda_\ell$ be the (unique, under the assumptions above) vector of multipliers for the lower-level constraint set $Z_\ell(x)$, the KKT conditions are, for every $\ell \in \mathcal{L}$:


$$
T_\ell(x, y) + \nabla_y g_\ell(x, y)^\top \lambda_\ell = 0,
\qquad
g_\ell(x, y) \le 0,\quad \lambda_\ell \ge 0,\quad \lambda_\ell^\top g_\ell(x, y) = 0.
$$


Then the stochastic bilevel program is written as before, with the constraints above replacing the constraints $y_\ell \in S_\ell(x)$.

The resulting program is an equivalent one-level reformulation of SBP, but it is intractable due to the presence of the complementarity constraints. In [3], the above model was reformulated by expressing the lower-level constraints as $g_\ell(x, y) \ge 0$, $\ell \in \mathcal{L}$, and then writing the complementarity constraints as

$$
g_\ell(x, y) - z_\ell = 0, \qquad \min(z_\ell, \lambda_\ell) = 0,
$$

for every $\ell \in \mathcal{L}$, where $z_\ell$ is a vector of the same dimension as $\lambda_\ell$, and the min operator is applied to the vectors componentwise.


The resulting problem is tractable but nonsmooth, due to the min operator. The authors in [3] then reformulated the nonsmooth optimization problem by using a perturbative approximation to the min operation. This results in a sequence of smooth optimization problems converging to the nonsmooth problem as the perturbation parameters tend to zero.

Since the equations above are all separable with respect to the scenarios $\ell \in \mathcal{L}$, the same decomposition approaches can be applied to this method.
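A commonly used smoothing of the min function, shown here only as an illustration and not necessarily the exact approximation used in [3], is $\phi_\mu(a, b) = a + b - \sqrt{(a - b)^2 + 4\mu^2}$. It is smooth for $\mu \ne 0$, equals $2\min(a, b)$ at $\mu = 0$, differs from it by at most $2|\mu|$, and for $\mu \ne 0$ the equation $\phi_\mu(a, b) = 0$ forces $a > 0$, $b > 0$ and $ab = \mu^2$. A Python sketch (function names are hypothetical):

```python
import math
from typing import List, Sequence


def smoothed_min2(a: float, b: float, mu: float) -> float:
    """phi_mu(a, b) = a + b - sqrt((a - b)^2 + 4 mu^2); equals 2*min(a, b) at mu = 0."""
    return a + b - math.sqrt((a - b) ** 2 + 4.0 * mu * mu)


def smoothed_complementarity(z: Sequence[float], lam: Sequence[float],
                             mu: float) -> List[float]:
    """Componentwise smooth surrogate of the constraint min(z, lam) = 0."""
    return [smoothed_min2(zi, li, mu) for zi, li in zip(z, lam)]


# The approximation error |phi_mu - 2*min| is at most 2*|mu| and vanishes as mu -> 0.
for mu in (1.0, 0.1, 0.01, 0.0):
    err = abs(smoothed_min2(3.0, 1.0, mu) - 2.0 * min(3.0, 1.0))
    print(mu, err, err <= 2.0 * abs(mu))
```

Driving $\mu$ to zero along a sequence of smooth problems, each solvable by standard nonlinear programming codes, is the essence of this class of perturbation methods.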

It should be noted that decomposition across sce- 
narios may still prove insufficient for permitting the 
resolution of realistic stochastic bilevel programs. The 
use of random sampling, such as has been used in 
stochastic quasigradient methods and stochastic decom- 
position, as well as the development of approximation 
strategies, are lines of research that should be pursued 
in the future. 
A well-designed global optimization algorithm is usually equipped with three main components:

1) a sampling (or global) phase, whose aim is to inves- 
tigate as thoroughly as possible the feasible region; 

2) a local phase, designed to approximate good local 
optima; 

3) a stopping rule, through which the algorithm is ter- 
minated either with a certificate of optimality (or an 
estimate of the error incurred) or with some kind of 
probabilistic measure of the error itself. 

Many deterministic algorithms for global optimization problems are equipped with stopping criteria which make it possible to prove global optimality or to give a precise error bound. Unfortunately, on the one hand, those algorithms are applicable only to strongly structured problems, e.g., the optimization of Lipschitz-continuous functions (with known Lipschitz constant), the minimization of concave functions (cf. Concave programming), or of functions which are explicitly representable as the difference of two convex functions (cf. D.C. programming); thus those stopping rules, based upon duality results and lower-bounding techniques, as well as the algorithms designed for those problems, cannot be applied to problems which do not possess that specific structure. On the other hand, even for those strongly structured problems, it has frequently been observed that, in practice, an algorithm is quite likely to find the global optimum relatively quickly, while the vast majority of computational time is devoted to the proof of optimality. So, in some situations, it might be advisable to relax the requirement of certified optimality, and to stop an algorithm as soon as there is sufficient evidence that the optimum has been found.

Stopping criteria based upon this idea are usually built on a stochastic model and are inspired by classical stopping rules developed within the field of statistical decision theory. Most, if not all, of the research in statistical stopping rules is based on the assumption that the algorithm used is either pure random search (the pure Monte-Carlo method) or multistart. The former is the most basic global optimization algorithm, which consists only in generating a uniform random sample in the feasible region and keeping track of the best observed function value (the record). The latter, multistart, differs from pure random sampling in that it prescribes applying a local optimization routine from each sampled point (thus it implements an effective, although computationally inefficient, local phase). Extending the stopping rules built for these algorithms to different methods, e.g. other two-phase methods known as multilevel single-linkage or simple linkage (cf. also Stochastic global optimization: Two-phase methods), has to be considered just a heuristic.

It is possible to distinguish three main types of stopping rules, which differ mainly in the criteria for stopping and the sophistication of the underlying stochastic model:

1) global exploration: stop as soon as there is sufficient 
evidence that all of the feasible region has been sam- 
pled; 

2) enumeration of local optima: stop as soon as there 
is sufficient evidence that all local optima have been 
observed; 

3) enumeration of good local optima: stop as soon as 
there is sufficient evidence that no local optimum 
better than the best so far will be observed. 

It is easy to see that the last criterion is the most desirable one, but its practical realization poses very difficult modeling problems. In what follows, a brief account of the main models and methods for each of the three types will be given.

Let $S \subset \mathbb{R}^d$ ($d \in \mathbb{N}$) be the feasible region for the global optimization problem

$$
f^* = \min_{x \in S} f(x). \qquad (1)
$$


It is assumed that the Lebesgue measure $\mu(S)$ of $S$ is finite and strictly positive. From elementary probability it is easy to derive that, after $N$ uniform points have been sampled, the probability that at least one point in the sample falls within a prescribed subset of $S$ whose Lebesgue measure is $\varepsilon$ is given by

$$
1 - \left(1 - \frac{\varepsilon}{\mu(S)}\right)^{N}; \qquad (2)
$$

thus, given a prefixed probability level $p \in (0, 1)$, if sampling is terminated as soon as

$$
N \ge \frac{\log(1 - p)}{\log\left(1 - \dfrac{\varepsilon}{\mu(S)}\right)}, \qquad (3)
$$


then every region of volume greater than $\varepsilon$ will contain a sample point with probability at least $p$. Letting

$$
L^{\eta} := \{x \in S : f(x) \le f^* + \eta\} \qquad (4)
$$

be the $\eta$-level set of $f$, it is in principle possible, although extremely difficult in practice, once an error level $\eta$ has been chosen, to let $\varepsilon = \mu(L^{\eta})$. The main disadvantage of this very simple stopping rule is that it usually prescribes stopping very late, when every region of volume $\varepsilon$, and not just $L^{\eta}$, has been observed. In principle, this method might be applied with success to multistart: this algorithm can be seen as a pure random search method applied to the composition of the objective function $f$ with the mapping of $S$ into itself which associates to each feasible point $x \in S$ the point obtained by starting a local optimization routine from $x$. Since the resulting composite function is piecewise constant, a (usually much) larger value of $\varepsilon$ can be safely used. Unfortunately, no practical method is known for approximating the correct value of $\varepsilon$. Some attempts have been reported in the literature (see e.g. [6]), but the rules proposed, although quite simple, seem to be very inefficient in providing a quick and reliable stopping time.
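The sample-size bound (3) is straightforward to evaluate. A minimal Python sketch (the function name and the example values are hypothetical); the loop illustrates why the rule tends to stop late, since for a cube of side 0.1 inside the unit cube the target volume is $10^{-d}$ and the required $N$ grows roughly like $10^d$:

```python
import math


def samples_needed(eps: float, mu_S: float, p: float) -> int:
    """Smallest N with 1 - (1 - eps/mu_S)**N >= p, i.e. the bound (3)."""
    if not (0.0 < eps <= mu_S) or not (0.0 < p < 1.0):
        raise ValueError("need 0 < eps <= mu(S) and 0 < p < 1")
    if eps == mu_S:  # the region is all of S: one sample suffices
        return 1
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - eps / mu_S))


print(samples_needed(0.01, 1.0, 0.95))  # 299
for d in (1, 2, 3):
    # a cube of side 0.1 inside the unit cube has volume 10**-d
    print(d, samples_needed(0.1 ** d, 1.0, 0.95))
```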

The second class of methods for stopping, designed for the complete enumeration of local optima through multistart, originates from ideas introduced in [9]. If the starting points in multistart are stochastically independent then, letting the local optima be arbitrarily numbered as $1, \dots, T$, the probability of hitting $n_1$ times the first, $n_2$ times the second, and so on, is given by

$$
\frac{n!}{n_1!\, n_2! \cdots n_T!}\, \prod_{j=1}^{T} \theta_j^{n_j}, \qquad n = \sum_{j=1}^{T} n_j. \qquad (5)
$$
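Formula (5) is the multinomial probability of the observed hit counts. A direct Python evaluation, assuming the shares $\theta_j$ (the probability that a local search reaches optimum $j$) were known, which in practice they are not; the counts and shares in the example are hypothetical:

```python
import math
from typing import Sequence


def hit_probability(counts: Sequence[int], shares: Sequence[float]) -> float:
    """Probability (5) that local optimum j is hit exactly counts[j] times."""
    if len(counts) != len(shares) or abs(sum(shares) - 1.0) > 1e-12:
        raise ValueError("need one share per local optimum, summing to 1")
    n = sum(counts)
    coef = math.factorial(n)
    for nj in counts:
        coef //= math.factorial(nj)  # exact integer multinomial coefficient
    prob = float(coef)
    for nj, theta in zip(counts, shares):
        prob *= theta ** nj
    return prob


# Three local searches, two optima with equal regions of attraction:
print(hit_probability([2, 1], [0.5, 0.5]))  # 0.375
```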
Parameters $\theta_j$ in (5) represent the share of the $j$th local optimum, that is, the relative volume of its region of attraction or, equivalently, the probability that a local search started from a point in the sample leads to the $j$th local minimum. In practice, neither $T$ nor the shares are known. A Bayesian decision-theoretic framework is used, in which a prior probability distribution is given on the unknown parameters of the probabilistic model; after each observation, an a posteriori distribution is quite easily computed through Bayesian updating. Based upon this model, several stopping rules can be obtained, depending upon different cost structures. Following a standard procedure in Bayesian sequential decision theory, a loss function is defined which gives a measure of the trade-off between stopping and continuing sampling; then, sequentially, at each iteration the posterior expected loss is computed and the alternatives of stopping and continuing are compared. For example, one might define a loss by specifying a price to be paid either for stopping before all local optima have been found or for continuing after the global optimum has been observed. Alternatively, one can define a cost for each local search and a gain (a negative loss) incurred if a new local optimum is discovered. Stopping rules derived from this framework have the advantage of being very simple to implement; unfortunately, the model and the resulting stopping rules depend only on the number of different local optima discovered. The value of the objective function at the local optima never plays a role, so these criteria tend to be quite conservative and highly sensitive to the total number of local optima, irrespective of their value. Well-known and widely used results in this framework can be found in [3]; an ingenious trick, presented in [8], consists in ordering the observed function values at local optima, so that a term accounting for the desire to stop if no optimum better than the best so far is observed can be included in the loss. Here again, function values enter neither the model nor the stopping rules explicitly, but the relative rank of each local optimum is considered. The resulting rules, although more complex than those in [3], display significantly better behavior.

In all of the above methods for stopping, a prior dis- 
tribution has to be given on the total number of local 
optima and the shares; while it is quite natural to as- 
sume a Dirichlet distribution for the shares, the ques- 
tion of choosing a sensible prior for T is still to be an- 
swered; a seemingly interesting choice, consisting of an 
improper prior, giving constant weight to all positive 
integers, was proven to give incorrect results in [8]. 

The latest generation of stopping rules for multistart derives from even more complex models. The idea of explicitly including function values in the model raises extremely difficult theoretical problems. In fact, it is very hard to identify a sensible stochastic model for the observed function values at randomly chosen points or, even worse, at local optima obtained by starting a local optimization routine from a random starting point. Again, the best one can hope for is to define a prior model and adapt the resulting probability distribution by means of Bayes' theorem. An attempt in this direction can be found in [1] and [2]; there it is assumed that function values at local optima obtained from multistart follow a probability distribution which is largely unknown. Thus this probability distribution is modeled as one out of an infinite class of possible probability distributions: the prior knowledge is made explicit in the definition of a prior probability distribution over a class of distribution functions. From the literature on Bayesian nonparametric inference, the so-called simple homogeneous process [5] was selected, due to its representativeness (its realizations are dense in the space of continuous distributions) and to its computational manageability. Accordingly, stopping rules were derived which prescribe stopping as soon as


$$
\int_{-\infty}^{f_n^*} \bigl(f_n^* - y\bigr)\, dF_n(y) \le c, \qquad (6)
$$

where $f_n^*$ is the record value observed after the first $n$ samples (more precisely, after the first $n$ local searches), $F_n$ is the expected posterior distribution, computed through the prior process and the $n$ observations of local optima, and $c$ is a threshold. The left-hand side in (6) is the expected improvement after the next observation with respect to the current record. The effectiveness of the resulting stopping rules has been tested and very good computational results have been reported; unfortunately, the rules are quite cumbersome to implement and require the setting of several parameters in the definition of the prior.
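Evaluating the left-hand side of (6) is simple once $F_n$ is available in a computable form. The Python sketch below assumes, purely for illustration, that $F_n$ has been approximated by a discrete distribution with hypothetical atoms and weights; it does not implement the posterior of the simple homogeneous process used in [1,2], which is where the real difficulty lies:

```python
from typing import Sequence


def expected_improvement(record: float, atoms: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Left-hand side of (6) for a discrete F_n: sum_i w_i * max(record - v_i, 0)."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return sum(w * max(record - v, 0.0) for v, w in zip(atoms, weights))


def should_stop(record: float, atoms: Sequence[float],
                weights: Sequence[float], c: float) -> bool:
    """Stopping test (6): stop once the expected improvement is at most c."""
    return expected_improvement(record, atoms, weights) <= c


atoms, weights = [0.5, 1.0, 2.0], [0.1, 0.3, 0.6]  # hypothetical F_n
print(expected_improvement(1.0, atoms, weights))     # 0.05
print(should_stop(1.0, atoms, weights, c=0.01))      # False
```

Plugging in the plain empirical distribution of the observed local optima would give an expected improvement of exactly zero, since no observation lies below the record; this is one way to see why a prior model is needed at all.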

An alternative approach is described in [4], where the problem of specifying a prior distribution on the set of probability distributions is simplified by assuming that function values are discretized; this way the underlying probabilistic model becomes a parametric one, although with a possibly huge number of parameters, namely the global minimum and global maximum values of $f$, and the probability of observing any of the discretized values between them. Although the resulting rules are attractively simple, the idea of discretizing the range is far from satisfactory, as a discretization which is too coarse might lead to incorrect decisions on the global optimum, while a fine one enlarges the dimension of the parameter space. Again, the problem of specifying a prior on the parameters, in particular on the global minimum and maximum, is a difficult one; moreover, the sensitivity of the rules to function values is questionable: the authors prove that, using a loss function which is the difference between a cost associated with sampling and the ratio between the possible improvement over the record value and the range of values of $f$, the optimal stopping rule depends only on the iteration number. Thus no information on the observed function values is used, which can easily lead to incorrect decisions.

Unfortunately, the main difficulty in stopping a stochastic global optimization algorithm is none of those mentioned above, such as the difficulty of choosing parameters, the insensitivity of some rules to function values, or the cumbersome implementation. The real weakness of all the above-mentioned approaches is that all of them are based upon the analysis of stochastically independent samples in the feasible region. While this assumption is natural for pure random search and for multistart, it becomes false as soon as more refined methods are used, notably two-phase methods. Thus all of these rules become simply heuristic stopping criteria. What is worse, they do not provide a reliable estimate of the error incurred after stopping; the user is thus left with a heuristic rule with no guarantee. It is not a surprise that most users of stochastic global optimization simply let their algorithm run until some time limit is exhausted. Unfortunately, deriving stopping rules for more sophisticated algorithms is an extremely hard task; some attempts have been reported in [7] when dealing with Bayesian global optimization methods (cf. Bayesian global optimization), but even if good stopping rules can be derived in that framework, the results are again only heuristic, as they are based upon models of the objective function which are usually not justifiable.

In conclusion, while stopping is a crucial component in stochastic global optimization and late stopping is generally the main cause of inefficiency, research in this field seems to have stalled in recent years; there is still a need for good criteria for general algorithms, capable of producing a reliable estimate of the error.
Two-phase methods are global optimization algorithms which consist of sampling (global phase) coupled with refinement or approximation of local optima (local phase). Although this definition is general enough to encompass virtually all known methods of global optimization, the term ‘two-phase’ is generally used in connection with methods based upon random sampling and local optimization started from selected points in the sample.

To fix the notation, let the problem under consideration be that of finding

$$
f^* = \min_{x \in S} f(x), \qquad (1)
$$

where $S \subset \mathbb{R}^d$ is a $d$-dimensional set (usually closed, often compact) and $f\colon S \to \mathbb{R}$ is a continuous objective function. No other special assumption is generally made on $f$, except that a local search algorithm is available which is capable of producing a local optimum when started from a feasible point. Thus, depending on the local optimization routine employed, $f$ might, for example, be required to be continuously differentiable.
Two-phase methods are best suited for global optimization problems with no special structure, while different, often deterministic, methods are preferred when dealing with strongly structured global optimization problems, e.g., concave minimization (cf. also Concave programming), d.c. problems (cf. also D.C. programming), and Lipschitz continuous problems; for a general reference on these particular classes of structured global optimization problems, besides this volume, the reader might wish to consult [3].
Two-phase methods display sufficiently good behavior when the following conditions are met:

• sampling in the feasible set is not too difficult (frequently $S$ is assumed to be a hyper-rectangle);

• the dimension $d$ is not too high: until a few years ago $d = 10$ already seemed to be a high dimension; more recent developments have pushed this limit to more than 60;

• the Lebesgue measure of the region of attraction of the global optima is not too small. The ‘region of attraction’ of a local (and, hence, also the global) optimum is defined as the maximal subset $A \subseteq S$ characterized by the fact that a local search started from any point in $A$ leads to that local optimum;

• the computational cost of evaluating $f$ at a feasible point is substantially lower than that of performing a local search, and it is not extremely expensive by itself.

For low-dimensional problems, when the last condition is not met and function evaluation is a computationally demanding task, it is preferable to switch to methods based upon approximate models of the objective function, the best-known representative of which is the class of Bayesian algorithms (cf. also Bayesian global optimization).

What has come to be known as the class of two- 
phase methods consists of the following very general 
optimization scheme: 


PROCEDURE two-phase()
  k := 1;
  WHILE (StoppingRule() == FALSE) DO:
    choose N_k > 0;
    (Begin Phase I):
      sample a set S_k of N_k points from S;
      let S^k := S_1 ∪ ... ∪ S_k;
    (Begin Phase II):
      choose S* ⊆ S^k;
      FOR EACH x ∈ S* DO
        start a local search from x;
      END FOR;
    let k := k + 1;
  END WHILE;
END two-phase;
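The scheme can be made concrete in a few lines. The Python sketch below is a one-dimensional illustration under stated assumptions: $S$ is an interval, Phase I draws $N_k$ uniform points, the local search is a simple compass search with step halving (a hypothetical stand-in for any local optimizer), a fixed iteration budget replaces StoppingRule(), and the choice of $S^*$ among the new points implements three of the strategies described below (pure random search, best start, multistart):

```python
import random
from typing import Callable, List, Optional, Tuple

Func = Callable[[float], float]


def local_search(f: Func, x: float, lo: float, hi: float,
                 h: float = 0.1, tol: float = 1e-9) -> Tuple[float, float]:
    """Compass search on [lo, hi]: move to a better neighbour x -/+ h, else halve h."""
    fx = f(x)
    while h > tol:
        for cand in (max(lo, x - h), min(hi, x + h)):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:  # no improving neighbour at this step size
            h /= 2.0
    return x, fx


def two_phase(f: Func, lo: float, hi: float, n_per_iter: int = 10,
              max_iter: int = 5, strategy: str = "multistart",
              seed: int = 0) -> Tuple[Optional[float], float]:
    rng = random.Random(seed)
    best_x: Optional[float] = None
    best_f = float("inf")
    sample_record = float("inf")          # best sampled value so far
    for _ in range(max_iter):             # stand-in for StoppingRule()
        # Phase I: uniform sample S_k of N_k points
        new = [(x, f(x)) for x in (rng.uniform(lo, hi) for _ in range(n_per_iter))]
        # Phase II: choose the starting points S* among the new points
        starts: List[float]
        if strategy == "pure_random":
            starts = []
        elif strategy == "multistart":
            starts = [x for x, _ in new]
        elif strategy == "best_start":
            starts = []
            for x, fx in new:             # new record points only
                if fx < sample_record:
                    starts.append(x)
                    sample_record = fx
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        for x, fx in new:                 # sampled points also count as candidates
            sample_record = min(sample_record, fx)
            if fx < best_f:
                best_x, best_f = x, fx
        for x0 in starts:
            x, fx = local_search(f, x0, lo, hi)
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f


def f_demo(x: float) -> float:
    # Hypothetical double-well test function; global minimum near x = -1.036.
    return (x * x - 1.0) ** 2 + 0.3 * x


for s in ("pure_random", "best_start", "multistart"):
    print(s, two_phase(f_demo, -2.0, 2.0, strategy=s))
```

With the same seed, all three strategies see the same Phase I sample, so the comparison isolates the effect of the choice of $S^*$; multistart spends the most local searches and can never do worse than pure random search here.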


In practice Phase I, or global phase, aims at explor- 
ing as thoroughly as possible the feasible region, while 
in Phase II, or local phase, the approximation of good 
local optima is carried out. It can be easily seen that 
most methods for global optimization can be seen as 
special instances of the above scheme. 

Phase I usually consists of uniform random sampling, although some attempts have been made to use quasirandom sequences which have, among other things, the advantage of producing smaller gaps between sampled points.

As the main computational burden is assumed to 
be connected with local searches, it is natural that the 
main distinction between different two-phase methods 
lies in the decision regarding the choice of S*, the set 
of starting points for local optimization. The following 
are some of the most well-known strategies for defining 
such a set: 

• pure random search: $S^* = \emptyset$, that is, no local searches are performed. This method is a basic, although very inefficient, global optimization strategy. It is also known as the pure Monte-Carlo method;

• best start: $S^* \subseteq S_k$ is equal to the set of new ‘record points’, i.e. to the set of points at which, during the current iteration, a function value strictly lower than the best so far has been observed;

• multistart: $S^* = S_k$. This method prescribes that a local search is started from every sampled point;

e clustering: S* is defined as the set of record points 
in suitable subsets of the sample built according to 
a clustering procedure. The idea of clustering for 
global optimization dates back to two papers in the 
1970s, [2] and [10], where the idea of concentrat- 
ing the sample in order to approximate the regions 
of attraction of low-level local optima was intro- 
duced. Concentration of the sample is achieved ei- 
ther by discarding a fixed fraction of the points 
with highest function value, or by performing a sin- 
gle, or just a few, descent steps from points in 
the sample. These procedures transform the sam- 
ple into a nonuniform one: clustering techniques 
are then employed to identify subsets of the sample 
with a higher-than-average concentration of points. 
This idea is further exploited in density clustering methods (see [9] and [7]), where clusters are grown around suitable ‘seed points’ by progressively enlarging an ellipsoidal set until the relative density of points of the transformed sample which fall inside the ellipsoid becomes smaller than a threshold;

• multilevel single-linkage (in short: MLSL). In this method the user specifies a constant sample size $N_k = N$ and two positive reals $\sigma$ and $\nu$. Then $S^*$ is defined as follows: a point $x \in \bar S_k$ (the whole sample) is included in $S^*$ if no point $y \neq x$ in the sample $\bar S_k$ exists such that

$$ f(y) \le f(x) \quad\text{and}\quad \|x - y\| < r_{k,\sigma}. $$

It is moreover required that points in $S^*$ are ‘sufficiently far’ from the boundary, that is, given the threshold $\nu > 0$, it is required that

$$ \operatorname{dist}(x, \partial S) > \nu \qquad \forall x \in S^*, $$

and that no local search has already been started from $x$ and $x$ is not too close to a previously discovered local optimum.

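Here dist is the Euclidean distance and $\partial S$ denotes the boundary of $S$; the variable threshold $r_{k,\sigma}$ is defined as:

$$ r_{k,\sigma} := \pi^{-1/2}\left(\Gamma\!\left(1+\frac{d}{2}\right)\mu(S)\,\sigma\,\frac{\log kN}{kN}\right)^{1/d}. \tag{2} $$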


In (2), $\Gamma$ is the gamma function, $\mu$ represents the Lebesgue measure, and $\sigma$ is a positive constant. The basic idea of this method, which was analyzed in [8], is that, instead of building clusters with a prescribed shape (e.g., ellipsoidal), points are clustered by means of a distance criterion. In particular, a point is clustered to another point of the sample if the latter is close enough and has a better function value. Local searches are started only from unclustered points. By linking points which are clustered together, a forest is built and local searches are started only from the root of each tree; it is hoped that each tree in the forest connects points within the region of attraction of a single local optimum. As the threshold in (2) decreases to 0, a single tree will eventually be broken into two or more connected components, and local searches may then also be started from points which, in previous iterations, were not considered to be good candidates.

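• simple linkage (in short: SL). Here $N_k = 1$ and either $S^* = \emptyset$ or $S^* = S_k$ (a single point). In particular, a local search will be started from the most recently sampled point $x$ if and only if no point $y \neq x$ in the sample $\bar S_k$ exists such that

$$ f(y) < f(x) + \varepsilon \quad\text{and}\quad \|x - y\| < r_{k,\sigma}, $$

where $r_{k,\sigma}$ is defined as

$$ r_{k,\sigma} := \pi^{-1/2}\left(\Gamma\!\left(1+\frac{d}{2}\right)\mu(S)\,\sigma\,\frac{\log k}{k}\right)^{1/d}. \tag{3} $$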
Symbols used in (3) have the same meaning as before; $\varepsilon$ is a small positive constant. This method was introduced in [5] and theoretically compared with MLSL in [4]. The main differences from MLSL are that a single point is sampled at each iteration and that a local search can be started only from the most recently sampled point, instead of reconsidering the whole sample at every iteration;
• Topographical search. $S^*$ is defined as the set of local record points in the sample: given an integer $m > 1$, a point is a local record if its function value is lower than that of its $m$ nearest neighbors. A variant of this idea was presented in [11].
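The thresholds (2) and (3) and the MLSL and SL acceptance tests can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, points are lists of floats, `dist_to_boundary` is a user-supplied function (for the unit hypercube, the distance to the nearest face), and the MLSL requirement that $x$ not be too close to a previously discovered local optimum is omitted.

```python
import math


def critical_distance(k, N, d, sigma, measure_S=1.0):
    """Threshold r_{k,sigma} of (2); with N = 1 it coincides with (3) for SL."""
    kN = k * N
    inner = math.gamma(1 + d / 2) * measure_S * sigma * math.log(kN) / kN
    return math.pi ** -0.5 * inner ** (1 / d)


def mlsl_starts(points, values, r, nu, dist_to_boundary, started=()):
    """Indices of sample points from which MLSL would start a local search."""
    starts = []
    for i, (x, fx) in enumerate(zip(points, values)):
        if i in started or dist_to_boundary(x) <= nu:
            continue  # already used as a start, or too close to the boundary
        linked = any(
            j != i and values[j] <= fx and math.dist(x, points[j]) < r
            for j in range(len(points))
        )
        if not linked:  # x is the root of its tree in the forest
            starts.append(i)
    return starts


def sl_start(points, values, r, eps):
    """True if SL starts a local search from the most recently sampled point."""
    x, fx = points[-1], values[-1]
    return not any(
        values[j] < fx + eps and math.dist(x, points[j]) < r
        for j in range(len(points) - 1)
    )
```

For $d = 1$, $N = 1$, $\sigma = 2$ and $\mu(S) = 1$, formula (3) reduces to $r_{k,\sigma} = \log k / k$, which gives a quick sanity check of the implementation.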
In all these methods, provided that sampling produces a dense set of points in $S$, convergence of the best observed function value to the global optimum is achieved with probability 1. It was proven in [1] that producing a dense set of observations is, in a certain sense, also necessary. Moreover, if a prefixed accuracy $\eta > 0$ is given and if it is assumed that the level set

$$ L_\eta = \{x \in S : f(x) \le f^* + \eta\} $$

has positive Lebesgue measure, then all of the above algorithms will almost surely place an observation in $L_\eta$ in a finite number of iterations. What distinguishes poorly performing methods like Pure Random Search from MLSL or Simple Linkage is that, through local searches, the latter methods attempt to place observations in $L_\eta$ as soon as a point is sampled in the (hopefully much larger) region of attraction of the global optimum.
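For Pure Random Search with uniform sampling, the statement above can be quantified by an elementary calculation: if $p = \mu(L_\eta)/\mu(S) > 0$, the probability that none of $n$ independent uniform samples falls in $L_\eta$ is $(1-p)^n$, which tends to 0, and $\lceil \log\delta / \log(1-p) \rceil$ samples suffice to hit $L_\eta$ with probability at least $1-\delta$. A small Python sketch of this arithmetic (the function names are illustrative):

```python
import math


def miss_probability(p, n):
    """Probability that n i.i.d. uniform samples all miss a set of relative measure p."""
    return (1.0 - p) ** n


def samples_needed(p, delta):
    """Smallest n with miss_probability(p, n) <= delta, for 0 < p < 1 and 0 < delta < 1."""
    return math.ceil(math.log(delta) / math.log(1.0 - p))
```

For example, if $L_\eta$ occupies 1% of $S$, 299 uniform samples are needed for a 95% chance of at least one observation in $L_\eta$.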
Multistart succeeds in reaching this goal by starting a local search from every sampled point. This has the negative effect of wasting a huge amount of computational power, both on local searches leading to local (nonglobal) optima and on discovering each local optimum more than once. Both MLSL and Simple Linkage try to reach a compromise between Pure Random Search and Multistart by starting a few local searches only from a selected set of promising points. The main theoretical properties of MLSL and SL are the following:

• the probability of starting a local search at iteration $k$ decreases to 0 provided that $\sigma > 2$ in MLSL and $\sigma > 0$ in SL;

• the total expected number of local searches started, even if the algorithm is run forever without stopping, is finite, provided that $\sigma > 4$ in MLSL and $\sigma > 24/d$ in SL (these results hold when $S$ is the $d$-dimensional hypercube).

These two properties are crucial in evaluating the efficiency of MLSL and SL: they state that the total effort devoted to local searches is kept at a low level.

The different conditions imposed on $\sigma$ in the last property come from an important difference in the assumptions of MLSL and SL: while the former forbids local searches to be started from sampled points which are within a prefixed distance from the boundary, the latter allows local searches to be started from any point in $S$, including the boundary. For a fixed $\nu > 0$, as prescribed in MLSL, the probability of sampling a point whose distance from the boundary is less than $\nu$ increases with the dimension $d$. This might help to explain why MLSL was successfully applied only to quite low-dimensional global optimization problems. On the other hand, SL was applied with success to high-dimensional global optimization problems such as the minimization of the Lennard-Jones potential energy function, a classical test for global optimization derived from computational chemistry, characterized by a number of local optima which increases exponentially with the dimension of the problem.
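The boundary effect can be made concrete when $S$ is the unit hypercube $[0,1]^d$: a uniformly sampled point is farther than $\nu$ from the boundary exactly when all its coordinates lie in $(\nu, 1-\nu)$, so, for $\nu < 1/2$, the probability of falling within $\nu$ of the boundary is $1-(1-2\nu)^d$. The Python sketch below evaluates this formula and checks it by simulation; the function names are illustrative.

```python
import random


def near_boundary_probability(d, nu):
    """P(dist(x, boundary) < nu) for x uniform on [0, 1]^d, assuming 0 < nu < 1/2."""
    return 1.0 - (1.0 - 2.0 * nu) ** d


def simulated_fraction(d, nu, n=20000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        if min(min(x), 1.0 - max(x)) < nu:  # distance to the nearest face
            hits += 1
    return hits / n
```

With $\nu = 0.05$ the probability is 0.1 for $d = 1$ but already about 0.65 for $d = 10$, so in higher dimensions most of a uniform sample would be excluded by the MLSL boundary rule.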

Recent developments in two-phase methods aim at:

• analyzing their finite-time behavior (all of the properties mentioned above are asymptotic), possibly leading to different definitions of the thresholds used for deciding whether or not to start local searches;

• improving the sampling phase, in order to avoid the possibility that a local search is started from a point just because no other point was sampled in a suitable neighborhood.

This last point can be tackled by building the sample in such a way that large gaps between sampled points are avoided; as an example, quasirandom sequences might be employed (for a general reference see [6]). It should be observed, however, that replacing pseudorandom points with quasirandom ones requires a thorough redefinition of the criteria used for starting local searches. Alternatively, the very idea of using a threshold might be abandoned, leading to methods similar to Topographical Search.
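As one example of a quasirandom construction, the sketch below generates points of the Halton sequence, whose $j$th coordinate is the radical inverse of the point index in the $j$th prime base; this is one standard choice among several (see [6] for the general theory), not necessarily the one used in the works cited above. The helper `largest_gap` is an illustrative one-dimensional gap measure used to compare the base-2 coordinate (the van der Corput sequence) with pseudorandom points; all names are hypothetical.

```python
import random


def radical_inverse(i, base):
    """Reflect the base-`base` digits of the integer i about the radix point."""
    inv, scale = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        inv += digit * scale
        scale /= base
    return inv


def halton(n, dim, primes=(2, 3, 5, 7, 11, 13, 17, 19)):
    """First n points of the dim-dimensional Halton sequence (dim <= len(primes))."""
    return [[radical_inverse(i, primes[j]) for j in range(dim)] for i in range(1, n + 1)]


def largest_gap(xs):
    """Largest gap between consecutive points of xs in [0, 1], endpoints included."""
    pts = sorted([0.0, 1.0] + list(xs))
    return max(b - a for a, b in zip(pts, pts[1:]))


n = 63
qmc_gap = largest_gap(p[0] for p in halton(n, 1))
rng = random.Random(0)
mc_gap = largest_gap(rng.random() for _ in range(n))
```

With 63 points, the base-2 coordinate splits $[0,1]$ into 64 equal gaps of length $1/64$, while 63 pseudorandom points necessarily leave some larger gap.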
As a concluding remark, it should be observed that all the effort in these methods is devoted to placing, at relatively low computational cost, observations in a small neighborhood of the global optimum. This effort is wasted if the algorithm is not equipped with a good stopping rule (cf. also ► Stochastic global optimization: Stopping rules). Unfortunately, no stopping rule is sensible for such poorly structured problems; the only possibility up to now has been to use heuristic stopping criteria, some of them quite complex, derived from simplified stochastic models. Regrettably, research on stopping rules for two-phase methods seems to have stalled in the last few years, probably as a consequence of the low confidence users place in statistical stopping rules, which can never give a guarantee, or even an estimate, of the error in approximating the global optimum.
As in many other branches of mathematical optimization, stochastic programming theory and algorithms have to be rethought completely when integer requirements are included. Among the stochastic integer programs studied so far, the linear two-stage model is the best understood, both structurally and algorithmically. It is the problem

$$ \min\{cx + Q(x,\mu) : Bx = b,\ x \in \mathbb{Z}_+^{m_1} \times \mathbb{R}_+^{m_2}\}, $$

where

$$ Q(x,\mu) = \int \Phi(z - Ax)\,\mu(d(z,A)) $$

and

$$ \Phi(t) = \min\{qy + q'y' : Wy + W'y' = t,\ y \in \mathbb{Z}_+^{\bar m},\ y' \in \mathbb{R}_+^{m'}\}. $$


Structural properties of the above model are mainly determined by the interaction of the second-stage value function $\Phi$ and the integrating probability measure $\mu$. Due to the integer requirement on $y$, the value function $\Phi$ is no longer convex, as it would have been in the integer-free counterpart model. Studies of the mixed-integer value function $\Phi$ date back to the 1970s [5]. Under mild assumptions, including in particular the rationality of $W$ and $W'$, it holds that $\Phi(t) \in \mathbb{R}$ for all $t$, and the function $\Phi$ is lower semicontinuous. Moreover, the following properties of $\Phi$ (established in [3,5]) are useful prerequisites for the analysis of the above model:

1) There exists a countable partition $\bigcup_{i=1}^{\infty} T_i$ of the domain of $\Phi$ such that the restrictions of $\Phi$ to $T_i$ are piecewise linear and Lipschitz continuous with a uniform constant $L > 0$ not depending on $i$.

2) Each of the sets $T_i$ has a representation $T_i = \{t_i + K\} \setminus \bigcup_{j=1}^{N}\{t_{ij} + K\}$, where $K$ denotes the polyhedral cone $W'(\mathbb{R}_+^{m'})$, $t_i$, $t_{ij}$ are suitable points from the argument space, and $N$ does not depend on $i$.

3) There exist positive constants $\beta$, $\gamma$ such that $|\Phi(t_1) - \Phi(t_2)| \le \beta\|t_1 - t_2\| + \gamma$ for arbitrary $t_1$, $t_2$.

Although discontinuous, the function $\Phi$ is hence not ‘too bad’: discontinuities only occur in subsets of hyperplanes, in its continuity regions the function is even Lipschitzian with uniform modulus, and the overall growth of the function is bounded by an affine expression.
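A one-dimensional instance makes these properties visible. Taking, purely for illustration, $q = 1$, $q' = 0$, $W = 1$ and $W' = -1$ gives $\Phi(t) = \min\{y : y - y' = t,\ y \in \mathbb{Z}_+,\ y' \in \mathbb{R}_+\} = \max(\lceil t \rceil, 0)$. Here $K = W'(\mathbb{R}_+) = (-\infty, 0]$, and the sets $T_i = (i-1, i] = \{i + K\} \setminus \{(i-1) + K\}$, $i = 1, 2, \dots$, together with $(-\infty, 0] = \{0 + K\}$ partition the real line; $\Phi$ is constant on each piece, jumps at the positive integers, is lower semicontinuous, and satisfies property 3) with $\beta = \gamma = 1$. The Python sketch below (hypothetical names) compares the closed form with a brute-force minimization over integer $y$, valid for $t \le$ `y_max`.

```python
import math


def phi(t):
    """Closed form of min{y : y - y' = t, y in Z_+, y' in R_+}."""
    return max(math.ceil(t), 0)


def phi_bruteforce(t, y_max=1000):
    """Same value by enumeration: y' = y - t >= 0, so y is feasible iff y >= t."""
    return min(y for y in range(y_max + 1) if y - t >= 0)
```

The jump at $t = 1$ can be observed directly: $\Phi(1) = 1$ and values slightly to the left also equal 1, while values slightly to the right equal 2, which is exactly lower (but not upper) semicontinuity.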

The combination of these properties with tools from probability theory leads to statements on the joint continuity of $Q$ as a function both of the decision variable $x$ and of the integrating probability measure $\mu$. The latter, of course, needs a proper convergence notion in the space of probability measures. Here weak convergence of probability measures [4] has turned out to be sufficiently broad to cover relevant applications and sufficiently strict to allow substantial conclusions. Continuity of $Q$ both in $x$ and $\mu$ has direct consequences for the stability of the stochastic integer program when perturbing the underlying probability measure $\mu$. Such perturbations are motivated by two reasons. First, in practical modeling the probability distribution of the random parameters is always subjective. The modeler hence wants to be sure that slight modifications of the distribution do not lead to drastic changes in the solution. Secondly, the integral defining $Q$ is multivariate, with a dimension governed by the dimension of the underlying random vector, which usually is quite big. Numerical integration hence fails if $\mu$ is nondiscrete. Approximating $\mu$ by discrete measures then turns integrals into sums, which are numerically feasible. Of course, this has to be accompanied by the safeguard that ‘close’ approximations of model data (the measure $\mu$) end up in ‘close’ approximations of the model output (the optimal value and the solution set).
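To illustrate the replacement of integrals by sums, assume (for this example only) $\Phi(t) = \max(\lceil t \rceil, 0)$, i.e., $q = 1$, $q' = 0$, $W = 1$, $W' = -1$, fix $A = 1$, and let $z$ be uniformly distributed on $[0,2]$. Then $Q(x,\mu) = E[\Phi(z - x)]$; for instance $Q(0,\mu) = 1 \cdot P(0 < z \le 1) + 2 \cdot P(1 < z \le 2) = 1.5$ and $Q(0.5,\mu) = 0 \cdot 0.25 + 1 \cdot 0.5 + 2 \cdot 0.25 = 1$. The sketch below replaces $\mu$ by the empirical measure of a pseudorandom sample, turning the integral into a finite sum; it illustrates the idea rather than any specific method from the cited works, and the names are hypothetical.

```python
import math
import random


def phi(t):
    """Illustrative integer value function max(ceil(t), 0)."""
    return max(math.ceil(t), 0)


def q_empirical(x, zs):
    """Q(x, mu_n) for the empirical measure mu_n of the sample zs (with A = 1)."""
    return sum(phi(z - x) for z in zs) / len(zs)


rng = random.Random(42)
zs = [2.0 * rng.random() for _ in range(20000)]  # sample from Uniform[0, 2)
```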

Under mild technical assumptions that basically ensure that $\Phi$ and $Q$ are well-defined real-valued functions, the following results on continuity, stability and rates of convergence for stochastic integer programs are known.

Fatou’s lemma and the lower semicontinuity of $\Phi$ imply lower semicontinuity of $Q(\cdot,\mu)$ [11,13]. Via Lebesgue’s dominated convergence theorem this extends to continuity of $Q(\cdot,\mu)$ at a given $x$, provided the exceptional set $E(x)$ of all $(z, A)$ such that $\Phi$ is discontinuous at $z - Ax$ has $\mu$-measure zero [11,13]. Since the discontinuities of $\Phi$ are located in a set of Lebesgue measure zero (cf. property 2) above), the condition on $E(x)$ is fulfilled if $\mu$ has a density. This also covers the first continuity result in the field, obtained by L. Stougie [15]. Adding boundedness and monotonicity assumptions on densities of one-dimensional marginal distributions of linear transforms of $\mu$ leads to Lipschitz continuity of $Q(\cdot,\mu)$ [12,13]. Here the above properties 1) and 3) of $\Phi$ are essential for the proof.

A particular problem class is given by two-stage stochastic programs with simple integer recourse. Here, the function $Q$ is much better understood, since it enjoys separability properties and essential parts of the analysis can be done in dimension one. Results comprise sufficient conditions for differentiability of $Q(\cdot,\mu)$, an algorithm for constructing the convex hull of the epigraph of $Q(\cdot,\mu)$, and convexification procedures for $Q(\cdot,\mu)$ based on proper modifications of $\mu$ [6,7,8,16].

Continuity of $Q$ as a function jointly in $x$ and $\mu$ can be obtained by adding elements from the theory of weak convergence of probability measures [4]. Using Rubin’s theorem on weak convergence of image measures induced by discontinuous transformations [4], it is possible to show that $Q(\cdot,\cdot)$ is continuous at $(x,\mu)$ if the above-mentioned exceptional set $E(x)$ has $\mu$-measure zero [13]. In [1] the authors study upper and lower semicontinuity of integral functionals with discontinuous integrands, of which $Q(\cdot,\cdot)$ is a special case. The role of the discontinuity set $E(x)$ is then taken by properly defined exceptional sets of missing upper and lower semicontinuity. Semicontinuity of the integral functional then essentially follows if the corresponding exceptional set has $\mu$-measure zero.

When heading for rates of quantitative continuity of $Q$ as a function of $\mu$, e.g., Lipschitz or Hölder continuity, it is essential to select a metric on the space of probability measures (probability metric, [9]) that, on the one hand, fits the discontinuous integrand and, on the other hand, metrizes weak convergence of probability measures under mild assumptions. In [14] a specific variational distance meeting these requirements is proposed and a Hölder continuity result for $Q(x,\cdot)$ is established. The polyhedral cone $K$ arising in the above property 2) of $\Phi$ enters the definition of the variational distance as a crucial ingredient.

By standard arguments from the stability analysis 
of optimization problems with parameters in general 
topological or metric spaces [2,10] the above continu- 
ity statements for Q can be turned into stability results 
for the optimal value and the set of optimal solutions 
of the underlying stochastic integer program. In particular, results of this type were obtained for (Hölder)
continuity of the optimal value and for upper semicon- 
tinuity of the solution set mapping [1,11,13,14]. 
A stochastic linear program with recourse is a program of the form

$$ \min_{x \in X}\ cx + \mathcal{Q}(x), $$

where $\mathcal{Q}(x) = E_\xi Q(x,\xi)$, $Q(x,\xi) = \min_{y \in Y(\xi)} q(\xi)y$, and $E_\xi$ denotes the mathematical expectation with respect to $\xi$. $X$ and $Y(\xi)$ are usually polyhedral convex sets. In recourse programs, some decisions ($x$), called first-stage decisions, must be taken before knowing the particular values taken by the random variables ($\xi$), while other decisions ($y(\xi)$), called second-stage decisions or recourse actions, can be taken after the realizations of the random variables are known. In this representation, $Q(x,\xi)$ is the second-stage value function for a given $\xi$, and $\mathcal{Q}(x)$ is the expected value function or expected recourse.

A stochastic integer program with recourse (SIP) is a stochastic program where some of the decisions are restricted to be integer, either in the first- or in the second-stage problem. It is an extension of integer programming or combinatorial optimization where some of the problem data are random variables. Any application of combinatorial optimization can thus be extended to a stochastic integer program. Typical applications are in the energy sector [16], resource acquisition [1], location problems [10], stochastic scheduling [2], and stochastic knapsack for yield management [15]. We concentrate here on the recourse formulation of SIP, although some models instead incorporate probabilistic constraints.

SIPs are notoriously difficult unless the integrality requirements are restricted to the first stage. Indeed, the expected recourse function is known to be nonconvex and discontinuous (with some exceptions when the random variables are absolutely continuous [10,13]). Similarly, the set of first-stage decisions that yield second-stage feasible solutions is in general nonconvex. It follows that available methods and properties for this problem are scarce. Research has thus concentrated on some specific problems.

One major situation where some properties are available is the simple integer recourse problem, defined as follows:

$$ Q(x,\xi) = \min\{q^+y^+ + q^-y^- : y^+ \ge h - Tx,\ y^- \ge Tx - h,\ y^+, y^- \ge 0 \text{ and integer}\}, $$

where $\xi$ is formed by the stochastic components of $q^+$, $q^-$, $h$ and $T$. Here, any difference in $h - Tx$ with respect to zero must be compensated by an integer quantity $y^+$ or $y^-$. This compensation is calculated componentwise.
The expected value of a simple integer recourse prob- 
lem can be computed either exactly or by an approx- 
imation whose error bound can be controlled. More- 
over, a componentwise convexity property can be de- 
rived between points that are at an integer distance so 
that an exact algorithm can be obtained, in particular in 
the case where the first-stage variables are integer [11]. 
Also, in several cases, the convex hull of the expected 
recourse can be obtained [6]. 
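Because the constraints on $y^+$ and $y^-$ decouple, for nonnegative costs $q^+$, $q^-$ each component has the closed form $y^+ = \max(\lceil h - Tx \rceil, 0)$ and $y^- = \max(\lceil Tx - h \rceil, 0)$, so for a finitely distributed $\xi$ the expected recourse is a finite weighted sum. The Python sketch below assumes a single component with fixed $q^+$, $q^-$ and $T$ and a discrete distribution of $h$; these simplifications and the function names are illustrative, and the brute-force routine only serves to check the closed form.

```python
import math


def sir_value(s, q_plus, q_minus):
    """min{q+ y+ + q- y- : y+ >= s, y- >= -s, y+, y- >= 0 integer}, with s = h - Tx."""
    return q_plus * max(math.ceil(s), 0) + q_minus * max(math.ceil(-s), 0)


def sir_bruteforce(s, q_plus, q_minus, y_max=50):
    """Same minimum by enumerating all integer pairs up to y_max."""
    return min(
        q_plus * yp + q_minus * ym
        for yp in range(y_max + 1)
        for ym in range(y_max + 1)
        if yp >= s and ym >= -s
    )


def expected_sir(x, T, q_plus, q_minus, scenarios):
    """Expected recourse; scenarios is a list of (probability, h) pairs."""
    return sum(p * sir_value(h - T * x, q_plus, q_minus) for p, h in scenarios)
```

For example, with $q^+ = 2$, $q^- = 1$, $T = 1$, $x = 2$ and $h \in \{1.5, 3\}$ with equal probability, the two scenarios need one unit of $y^-$ and one unit of $y^+$ respectively, giving an expected recourse of $0.5 \cdot 1 + 0.5 \cdot 2 = 1.5$.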

Another line of approach is to use the hierarchy 
between aggregate level decisions, which are typically 
those restricted to be integer, and detailed level deci- 
sions, which are very often continuous. Hierarchy has 
been used either through Benders decomposition [9] or 
within the framework of asymptotic analysis [8]. 

Benders’ decomposition has also been applied to the case when the first-stage variables are binary [7]. Those methods sometimes use the terminology ‘integer L-shaped’ to stress the similarity with what is done in linear (continuous) stochastic programs. Applications have mainly been in the routing area, as many examples exist where the expected second-stage recourse function is computable. Of particular importance is the possibility to develop lower bounding functionals that are also valid at fractional solutions (see ► Fractional combinatorial optimization).

A description of Benders’ decomposition for SIP in
a more general framework is available in [5]. An inter- 
esting alternative is to combine dual decomposition and 
Lagrangian relaxation [4]. 

Clearly, this field is only in its infancy (as of 1999) 
so that we may expect many more results in the coming 
years. A bibliography is available in [14] and a general 
presentation in [3]. 
Stochastic linear programs are typically characterized
as extremely large scale linear programs. In general, 
they cannot be solved without the use of specialized 
methods that are designed to exploit their special struc- 
ture. In this chapter, we describe a commonly used de- 
composition technique due to J.F. Benders, and discuss 
the manner in which it is used to solve stochastic linear 
programs. We also discuss the manner in which statis- 
tical techniques can be used in combination with Ben- 
ders’ decomposition. This combination forms the ba- 
sis of the stochastic decomposition algorithm, which is 
a powerful mechanism for solving large scale stochastic 
linear programs. 


Introduction 


In deterministic activity analysis, planning consists of 
choosing activity levels which satisfy resource con- 
straints while maximizing total profit (or minimizing 
total cost). Note that all the information necessary for 
decision making is assumed to be available at the time 
of planning. Under uncertainty, not all the information 
is available, and parameters such as resources are of- 
ten modeled by random variables. However, in the ab- 
sence of appropriate modeling and algorithmic tools for 
planning under uncertainty, practitioners have often 
resorted to using deterministic methodology by replac- 
ing the random variables by their expected values. In 
general, this is inappropriate. In circumstances where not all the information is known with certainty, it is advisable to plan only those activities that cannot be postponed to a future date, while others may be postponed until better information becomes available. Since
information is revealed sequentially over time, decision 
making under uncertainty naturally becomes a multi- 
stage process. The earliest LP models for planning un- 
der uncertainty may be credited to G.B. Dantzig [3] and 
E.M.L. Beale [1], and are often referred to as two-stage
stochastic programs with recourse. 


A two-stage stochastic linear program with recourse may be stated as

$$ \min\{cx + E[h(x,\tilde\omega)] : Ax = b,\ x \ge 0\}, \tag{SLP} $$

where $\tilde\omega$ is a random variable defined on the probability space $(\Omega, \mathcal{A}, P)$, and for each $\omega \in \Omega$,

$$ h(x,\omega) = \min\{g_\omega y : W_\omega y = r_\omega - T_\omega x,\ y \ge 0\}. \tag{1} $$


Note that the randomness in data elements appears in 
the second stage, whereas data in the first stage is as- 
sumed to be known with certainty. 

Two-stage stochastic linear programs arise in a vari- 
ety of settings. They commonly appear in situations in 
which the first-stage decision, x, corresponds to a long 
term, or ‘planning’, decision that must be made immediately (i.e., prior to any realization of $\tilde\omega$). Following
the implementation of this decision, one is faced with 
a collection of short term, or ‘operational’, decisions, 
which vary with the outcome of $\tilde\omega$. Thus, for example,
in a manufacturing environment one might make deci- 
sions regarding the acquisition of productive capacity. 
These, of course, are long term decisions made prior to 
knowing the precise nature of the demand profile. Ac- 
tual production decisions are made after information 
regarding demand has been revealed. As such, these de- 
cisions are short term. Naturally, the objective function 
will attempt to strike a balance between the two types of 
costs. 

Note that the explicit representation of $E[h(x,\tilde\omega)]$ requires the solution of (1) for each possible outcome of $\tilde\omega$. Thus, problems such as SLP are typically quite large: too large to solve directly as linear programs.
It follows that the efficient solution of SLP requires the 
development of specialized solution procedures that ex- 
ploit the structure of the problem. In order to bring the 
problem to a computationally viable size, decomposi- 
tion schemes are used. As problem size increases, these 
schemes are often used in conjunction with statistically 
motivated methods. Thus, in this article, we will ex- 
plore the development of cutting plane methods based 
on a Benders’ decomposition of SLP, and will discuss 
the development of deterministic cutting planes, as well
as their statistically motivated counterparts. 


Decomposition of SLP 


For brevity of exposition, we assume throughout that $h(x,\tilde\omega) < \infty$ with probability one. This is equivalent to assuming that for all $x$ satisfying $Ax = b$, $x \ge 0$, (1) is almost surely feasible. In the stochastic programming literature, this property is known as ‘relatively complete recourse’. As stated in (1), the function $h(x,\omega)$ is easily verified to be a convex function of $x$ (see [9]). It follows that the recourse function,

$$ E[h(x,\tilde\omega)] = \int_\Omega h(x,\omega)\,P(d\omega), $$

is also a convex function of $x$. As such, it lends itself to Kelley’s cutting plane algorithm [7]. Define

$$ f(x) = cx + E[h(x,\tilde\omega)], \qquad X = \{x : Ax = b,\ x \ge 0\}. $$


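Assuming $X$ is bounded, Kelley’s algorithm may be stated as:

0. $x^1 \in X$ is given, $f_0(x) = -\infty$, $k \leftarrow 0$.
1. $k \leftarrow k + 1$. Find $(\alpha^k, \beta^k)$ such that $f(x) \ge \alpha^k + (c + \beta^k)x$ for all $x \in X$, and $f(x^k) = \alpha^k + (c + \beta^k)x^k$.
2. $f_k(x) = \max\{f_{k-1}(x),\ \alpha^k + (c + \beta^k)x\}$.
3. Solve $\min_{x \in X} f_k(x)$ to obtain $x^{k+1}$.
4. Repeat from Step 1.

Kelley’s cutting plane algorithm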
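For a scalar decision variable the steps above can be sketched in Python as follows. This is a minimal illustration under assumptions not made in the text: $X$ is an interval $[lo, hi]$, the user supplies an oracle returning $f(x)$ and a subgradient $g$ at $x$ (so that $f(x^k) + g(x - x^k)$ plays the role of $\alpha^k + (c + \beta^k)x$), and Step 3 is solved exactly by comparing the interval endpoints with all pairwise intersections of the cuts, which is where the minimum of a one-dimensional convex piecewise-linear function must lie. A gap-based stopping test is added for convenience, although the statement above deliberately omits one; the name `kelley_1d` is hypothetical.

```python
def kelley_1d(oracle, lo, hi, x1, tol=1e-9, max_iter=200):
    """Kelley's cutting plane method for a convex f on the interval [lo, hi]."""
    cuts = []                    # pairs (a, b) of affine minorants a + b * x

    def model(z):                # Step 2: f_k(z), the max of all cuts so far
        return max(a + b * z for a, b in cuts)

    x = x1                       # Step 0
    best_x, best_f = x1, float("inf")
    for _ in range(max_iter):
        fx, g = oracle(x)        # Step 1: value and subgradient at x^k
        if fx < best_f:
            best_x, best_f = x, fx
        cuts.append((fx - g * x, g))  # supporting line through (x^k, f(x^k))
        # Step 3: minimize f_k over [lo, hi]; candidates are the endpoints
        # and the pairwise intersections of the cuts inside the interval
        candidates = [lo, hi]
        for i, (a1, b1) in enumerate(cuts):
            for a2, b2 in cuts[:i]:
                if b1 != b2:
                    z = (a2 - a1) / (b1 - b2)
                    if lo <= z <= hi:
                        candidates.append(z)
        x = min(candidates, key=model)
        if best_f - model(x) <= tol:  # optional stopping test, not in the text
            break
    return best_x, best_f        # Step 4 is the loop itself
```

On a piecewise-linear function such as $\max(1 - x, 2x - 2)$ the method terminates after three cuts at the exact minimizer $x = 1$; on smooth functions it converges more slowly, which is one motivation for the refinements discussed later in this article.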


While it is not difficult to include a stopping rule in 
Kelley’s method, we have not done so in the above state- 
ment because we wish to draw parallels between deter- 
ministic and stochastic cutting plane methods. We note 
that stopping rules for stochastic methods are beyond 
the scope of this article. 

The manner in which $(\alpha^k, \beta^k)$ are specified in Step 1 of the algorithm is critical to ensuring that an optimal solution to $\min\{f(x) : x \in X\}$ is eventually identified through Step 3 of the algorithm. Coefficients of the supporting hyperplane required in Step 2 may be obtained from a dual solution of (1). That is, assuming that (1) has a finite optimum, we have


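$$ h(x,\omega) = \max\{\pi(r_\omega - T_\omega x) : \pi W_\omega \le g_\omega\}. \tag{2} $$

Thus, if $\bar x \in X$ is given and if we let $\bar\pi_\omega \in \arg\max\{\pi(r_\omega - T_\omega \bar x) : \pi W_\omega \le g_\omega\}$, then

$$ h(x,\omega) \ge \bar\pi_\omega(r_\omega - T_\omega x) \qquad \forall x \in X, $$

with equality ensured if $x = \bar x$. Note that $\bar\pi_\omega$ actually depends on both $\bar x$ and $\omega$. Thus, given $x^k$ and $\omega$, let

$$ \pi_\omega^k \in \arg\max\{\pi(r_\omega - T_\omega x^k) : \pi W_\omega \le g_\omega\} $$

and note that we may obtain the subgradient coefficients required in Step 2 of Kelley’s method using

$$ \alpha^k = \int_\Omega \pi_\omega^k r_\omega\,P(d\omega), \qquad \beta^k = -\int_\Omega \pi_\omega^k T_\omega\,P(d\omega), $$

or equivalently

$$ \alpha^k = E[\pi_{\tilde\omega}^k r_{\tilde\omega}], \qquad \beta^k = -E[\pi_{\tilde\omega}^k T_{\tilde\omega}]. \tag{3} $$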
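As an illustration of (3), consider an assumed second stage with a scalar first-stage variable and simple recourse: $W_\omega = (1, -1)$, $g_\omega = (q^+, q^-)$ with $q^+, q^- \ge 0$, $T_\omega = T$, and finitely many scenarios for $r_\omega$. The dual feasible set $\{\pi : \pi W_\omega \le g_\omega\}$ is then the interval $[-q^-, q^+]$, so an optimal dual solution is $\pi_\omega^k = q^+$ if $r_\omega - Tx^k > 0$ and $\pi_\omega^k = -q^-$ otherwise, and (3) becomes a finite weighted sum. The Python sketch below, with hypothetical names, computes $(\alpha^k, \beta^k)$ in this setting.

```python
def dual_solution(residual, q_plus, q_minus):
    """An optimal pi for max{pi * residual : -q_minus <= pi <= q_plus}."""
    return q_plus if residual > 0 else -q_minus


def recourse(x, r, T, q_plus, q_minus):
    """h(x, omega) for the simple-recourse second stage described above."""
    s = r - T * x
    return q_plus * max(s, 0.0) + q_minus * max(-s, 0.0)


def cut_coefficients(xk, scenarios, T, q_plus, q_minus):
    """(alpha^k, beta^k) from (3); scenarios is a list of (probability, r) pairs."""
    alpha = beta = 0.0
    for p, r in scenarios:
        pi = dual_solution(r - T * xk, q_plus, q_minus)
        alpha += p * pi * r
        beta -= p * pi * T
    return alpha, beta
```

With $q^+ = 2$, $q^- = 1$, $T = 1$, $x^k = 2$ and $r \in \{1, 3\}$ with equal probability, the cut is $2.5 - 0.5x$, which touches $E[h(x,\tilde\omega)]$ at $x = 2$ (value 1.5) and lies below it elsewhere.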


Note that this algorithm can be interpreted as a de- 
composition method for block angular linear programs. 
Hence it is sometimes referred to as Benders’ decompo- 
sition [2]. In the stochastic programming literature, this 
is also known as the L-shaped method [8]. 

In order to appreciate the computational challenges 
inherent to the solution of SLP, it is important to recog- 
nize the magnitude of the requirements associated with 
(3). Specifically, note that the subgradient coefficients 
specified in (3) require the implicit solution of the linear 
program in (1)-(2) for every possible realization of the 
random variable @. If there are only a few possible real- 
izations, this poses no computational burden. However, 
in most cases this exact evaluation easily exceeds com- 
putational capabilities. For example, if there are only 10 
independent random variables with 3 outcomes each, 
then there are a total of $3^{10}$, or 59,049, possible outcomes.
This figure represents the number of linear programs 
that would have to be solved in each iteration of Kel- 
ley’s method. To solve problems of realistic sizes, it is 
necessary to resort to approximations of the subgradient coefficients. Given their representation as expected
values in (3), it is natural to resort to statistically based 
approximation schemes. 


Statistical Representation 
of Cutting Plane Coefficients 


The most simplistic use of sampled data within the cutting plane coefficients is quite straightforward and easily implemented. Unfortunately, it is prone to substantial error and should not, in general, be used except with caution. We present it here only as an introduction to a more stable methodology.

We begin by noting that one may obtain statistical estimates of the cutting plane coefficients by using randomly sampled observations of $\tilde\omega$ and computing the appropriate sample means. That is, suppose that $\{\omega^t\}_{t=1}^{n}$ is a collection of independent and identically distributed observations of $\tilde\omega$ and $\pi_t^k \in \arg\max\{\pi(r^t - T^t x^k) : \pi W^t \le g^t\}$, where $(r^t, T^t, g^t, W^t) = (r_{\omega^t}, T_{\omega^t}, g_{\omega^t}, W_{\omega^t})$. Then the sample means

$$ \alpha_n^k = \frac{1}{n}\sum_{t=1}^{n} \pi_t^k r^t, \qquad \beta_n^k = -\frac{1}{n}\sum_{t=1}^{n} \pi_t^k T^t $$

can be used as estimates of the cutting plane coefficients. Application of Kelley’s method using these estimated cut coefficients is equivalent to solving

$$ \min\Big\{cx + \frac{1}{n}\sum_{t=1}^{n} h(x,\omega^t) : Ax = b,\ x \ge 0\Big\}. \tag{4} $$


If we let $x_n$ denote an optimal solution to (4), then it is clear that $x_n$ need not be an optimal solution to SLP. Moreover, $x_n$ will depend on the sample used: different sets of observations will lead to different solutions.
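The sample means above can be computed directly once a subproblem is fixed. The sketch below assumes, purely for illustration, a scalar simple-recourse second stage ($W_\omega = (1,-1)$, $g_\omega = (q^+, q^-)$, scalar $T$, so that an optimal dual is $q^+$ when $r_\omega - Tx^k > 0$ and $-q^-$ otherwise) with $r_{\tilde\omega}$ uniform on $[0,4]$. With $q^+ = 2$, $q^- = 1$, $T = 1$ and $x^k = 2$ the exact coefficients from (3) are $\alpha^k = 2.5$ and $\beta^k = -0.5$; two independent samples give two different estimates, which is the source of the sample dependence of $x_n$. The name `sampled_cut` is hypothetical.

```python
import random


def sampled_cut(xk, n, seed, T=1.0, q_plus=2.0, q_minus=1.0):
    """Sample-mean estimates (alpha_n^k, beta_n^k) with r ~ Uniform[0, 4] (assumed)."""
    rng = random.Random(seed)
    alpha = beta = 0.0
    for _ in range(n):
        r = 4.0 * rng.random()
        pi = q_plus if r - T * xk > 0 else -q_minus  # optimal dual of the subproblem
        alpha += pi * r
        beta -= pi * T
    return alpha / n, beta / n
```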

The drawback of simply solving (4) in place of SLP lies in the inability to judge the quality of the solutions produced. That is, if $x^*$ denotes an optimal solution to SLP, one would naturally be interested in assessing $f(x_n) - f(x^*)$. This turns out to be a fairly difficult undertaking, and can be computationally intensive (see [5]). In addition, we note that cutting plane algorithms commonly generate cuts in early iterations which support the objective function ‘peripherally’ (i.e., in regions that are far removed from the optimal solution). Consequently, it is possible to ease the computational effort required by using less accurate cuts in the early iterations. The stochastic decomposition algorithm [4] was designed to circumvent these drawbacks.


Stochastic Decomposition 


Recognizing that cutting planes derived early in the iterative process will tend to have little bearing on the optimal solution, stochastic decomposition (SD) iterates with a variable sample size. As iterations progress, the sample size used increases. That is, in the $k$th iteration, $k$ observations are used. This requires the generation of one new observation of $\tilde\omega$ in each iteration. In addition, SD creates computational efficiencies by using an approximation of the subproblem (2).

In the $k$th iteration, the SD algorithm approximates a support of the following function:

$$ \frac{1}{k}\sum_{t=1}^{k} h(x,\omega^t). $$


Note that cuts generated in earlier iterations were based on fewer observations of $\tilde\omega$. In order to ensure that all cuts are asymptotically valid (i.e., underestimate the actual objective function, as required in Step 2 of Kelley’s method), the SD algorithm updates previous cuts by including a lower bounding constant. For example, suppose that $h(x,\tilde\omega) \ge 0$ (with probability 1). Then

$$ \frac{1}{k}\sum_{t=1}^{k} h(x,\omega^t) = \frac{k-1}{k}\left(\frac{1}{k-1}\sum_{t=1}^{k-1} h(x,\omega^t)\right) + \frac{1}{k}\,h(x,\omega^k) \ge \frac{k-1}{k}\left(\frac{1}{k-1}\sum_{t=1}^{k-1} h(x,\omega^t)\right). \tag{5} $$

Now, suppose that in the first k — 1 iterations we have 
accumulated a collection of cutting plane coefficients, 
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{(ak-!, BE )} C1) such that 


k-1 
> h(x,w') > ak) + pty, 
t=1 


Vx EX, t=1,...,k-1. 


Combining the above inequality with (5), we have 


ie kai 
7 h(x, w') > a as! + pe. 
t=1 


Vx EX, t=1,...,k-1. 


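This leads to a simple mechanism for updating previously derived cutting plane coefficients which preserves the required lower bounding nature as iterations progress (and hence, as the sample size increases). That is, we simply require

$$\alpha_t^{k} = \frac{k-1}{k}\alpha_t^{k-1}, \qquad \beta_t^{k} = \frac{k-1}{k}\beta_t^{k-1}, \qquad t = 1,\dots,k-1.$$

(More generally, if $h(x,\omega) \ge L$ with probability 1, the same argument gives $\alpha_t^{k} = \frac{k-1}{k}\alpha_t^{k-1} + \frac{L}{k}$, and the update of $\beta_t^{k}$ is not altered.)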
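The cut update above can be sketched in a few lines. This is a minimal illustration under the assumption $h(x,\omega) \ge 0$ used in (5). The function name `rescale_cuts` and the representation of a cut as a pair `(alpha, beta)`, with `beta` a list of coefficients, are our own hypothetical choices and not part of the SD specification.

```python
from typing import List, Tuple

Cut = Tuple[float, List[float]]  # (alpha, beta): the affine function alpha + beta . x


def rescale_cuts(cuts: List[Cut], k: int) -> List[Cut]:
    """Rescale cuts valid for the (k-1)-sample average so that they remain
    valid lower bounds for the k-sample average (1/k) * sum_t h(x, omega^t).

    Assumes h(x, omega) >= 0 with probability 1, so by (5) multiplying both
    coefficients by (k - 1) / k preserves the lower-bounding property.
    """
    factor = (k - 1) / k
    return [(factor * alpha, [factor * b for b in beta]) for alpha, beta in cuts]


# Usage: the cut 2 - x, valid after two observations, rescaled for k = 3.
print(rescale_cuts([(2.0, [-1.0])], 3))
```

Only the existing coefficients are touched. The new observation $\omega^k$ enters through fresh cuts, which is what keeps the update cheap.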

To illustrate the subproblem approximation, suppose that $g_\omega$ and $W_\omega$ are constant, so that $g_\omega = g$ and $W_\omega = W$ for all $\omega \in \Omega$. In this case, the set of dual feasible solutions in (2) is the same for all $\omega \in \Omega$, so that

$$h(x,\omega) = \max\{\pi(r_\omega - T_\omega x) : \pi W \le g\}.$$

Noting that we may restrict our attention to extreme point solutions, let $V$ denote the set of extreme points of $\{\pi : \pi W \le g\}$. Then

$$h(x,\omega) = \max\{\pi(r_\omega - T_\omega x) : \pi \in V\}.$$

SD iteratively constructs a subset of $V$ based upon observed dual solutions. Thus, if $V_k \subseteq V$ is the subset as it appears in the $k$th iteration, then SD estimates the cutting plane coefficients using

$$\pi_t^k \in \arg\max\{\pi(r^t - T^t x^k) : \pi \in V_k\},$$

where $r^t$ and $T^t$ denote $r_\omega$ and $T_\omega$ at the observation $\omega^t$.
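The following is a sketch of how the stored dual vertices $V_k$ might be turned into a cut. It assumes that each observation is given as a pair `(r_t, T_t)` (a list and a list of matrix rows) and that the duals are stored as tuples. The averaging step, $\alpha = \frac{1}{k}\sum_t \pi_t^k r^t$ and $\beta = -\frac{1}{k}\sum_t \pi_t^k T^t$, is our reading of how the selected duals enter the cut. It yields a valid lower bound because any $\pi \in V$ gives $h(x,\omega^t) \ge \pi(r^t - T^t x)$ by weak duality. The function name `sd_cut` is hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def sd_cut(V_k: List[Vector], observations: List[Tuple[Vector, Matrix]],
           x_k: Vector) -> Tuple[float, List[float]]:
    """Build a cut alpha + beta . x for (1/k) * sum_t h(x, omega^t) at x = x_k.

    For each observation (r^t, T^t), pick pi_t^k in argmax{pi (r^t - T^t x_k)}
    over the stored dual vertices V_k, then average the resulting affine
    lower bounds pi_t^k (r^t - T^t x) over t = 1, ..., k.
    """
    k, n = len(observations), len(x_k)
    alpha, beta = 0.0, [0.0] * n
    for r_t, T_t in observations:
        # Right-hand side r^t - T^t x_k of the second-stage problem.
        rhs = [r_i - sum(row[j] * x_k[j] for j in range(n))
               for r_i, row in zip(r_t, T_t)]
        pi = max(V_k, key=lambda p: sum(p_i * d_i for p_i, d_i in zip(p, rhs)))
        alpha += sum(p_i * r_i for p_i, r_i in zip(pi, r_t)) / k
        for j in range(n):
            beta[j] -= sum(p_i * row[j] for p_i, row in zip(pi, T_t)) / k
    return alpha, beta


# Usage: scalar example with dual feasible set [0, 1], i.e. h(x, w) = max(w - x, 0).
V = [(0.0,), (1.0,)]
obs = [([3.0], [[1.0]]), ([1.0], [[1.0]]), ([2.0], [[1.0]])]
print(sd_cut(V, obs, [1.5]))
```

Because only the argmax over the small set $V_k$ is needed, no second-stage LP is solved for the older observations. This is the computational saving the text refers to.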

Unlike the simplistic sample-based method previously described, stochastic decomposition provides specific guarantees of optimality. The expository details associated with the required safeguards can be somewhat lengthy; we therefore refer the reader to [4] and [6] for a detailed explanation.


Conclusions 


The representation of uncertainty in linear program- 
ming models easily leads to problems of extremely large 
magnitude. As such, the ability to decompose stochas- 
tic linear programs is critical to the development of vi- 
able solution procedures. Indeed, Benders’ decompo- 
sition, and the cutting planes derived from it, lie at 
the heart of a wide variety of stochastic linear pro- 
gramming solution methods. Moreover, as the num- 
ber of possible outcomes associated with the random 
variables increases, it is necessary to incorporate addi- 
tional techniques. One of the most promising avenues 
of exploration to date involves the incorporation of ran- 
dom sampling methods within a decomposition proce- 
dure. This provides a mechanism for combining proven 
methods for obtaining computational efficiencies. That 
is, the benefits of using decomposition techniques for 
large scale linear programs have been well established, 
as have the benefits of using statistical summaries of 
sampled data in the estimation of expected values. In 
the solution of large scale stochastic linear programs, 
their combination proves to be quite powerful. 
Stochastic linear programs (SLPs) can be seen as a generalization of linear programming problems where at least some coefficients in the objective function and/or the constraints are random. The motivation for such a formulation is that in many practical applications, the problem data are not known with certainty, for instance because they represent information about the future.

As an example, consider a problem statement from the area of production planning. Today (at a first stage), a decision maker has to decide upon an input plan $x$ which yields goods $Tx$ by means of some technological process in order to meet the uncertain demand $h$ in the future (a second stage). Since the number of produced goods is likely to differ from the demand, a recourse action $y$ is required that compensates for the discrepancy $h - Tx$ as soon as the demand is known. Such a correction induces an additional cost $q^\top y$ in the second stage. The objective is to find a decision $x$ that minimizes the direct cost $c^\top x$ of the first stage plus the expected cost $Q(x)$ induced by $x$ for compensations. Formally, this can be written as


$$\min\ c^\top x + Q(x) \quad \text{s.t.}\quad Ax = b,\ \ x \ge 0, \qquad (1)$$

where $Q(x) = \int_\Xi Q(x,\xi)\,dF(\xi)$ is the expectation of the recourse function

$$Q(x,\xi) = \min\{q^\top y : Wy = h(\xi) - T(\xi)x,\ y \ge 0\}. \qquad (2)$$


The second-stage cost (2) depends on $x$ and the realization of some $K$-dimensional random vector $\xi$ with values in the compact convex set $\Xi \subset \mathbb{R}^K$ and distribution function $F$. Note that $Q(x,\cdot)\colon \Xi \to \mathbb{R}$ is convex for all $x$ satisfying the constraints in (1). While the recourse action $y$ may be different for each $\xi$, the first-stage decision $x$ is independent of which event actually occurs. This property is known as nonanticipativity: the current decision is based only on what is known today.

Throughout this article, it is assumed that the cost coefficients $q$ as well as the recourse matrix $W$ are fixed, since the penalization for compensating the discrepancy $h(\xi) - T(\xi)x$ is likely to be of a deterministic nature. The case of a nonrandom $W$ is known as fixed recourse. In particular, $W = (-I, I)$, with $I$ being the identity matrix, is called simple recourse, meaning that each deviation of $h(\xi) - T(\xi)x$ from zero is penalized in proportion to its absolute value. However, in a more general formulation both $q$ and $W$ may also depend on the realization of $\xi$ (see [1,12] for a more thorough discussion). Furthermore, it is assumed that the demand $h$ and the technology matrix $T$ are linear-affine in $\xi$, i.e.


$$h(\xi) = h_0 + h_1\xi_1 + \cdots + h_K\xi_K,$$
$$T(\xi) = T_0 + T_1\xi_1 + \cdots + T_K\xi_K.$$


If the distribution of $\xi$ is discrete, the stochastic linear program with recourse given by (1) and (2) can be written as a large deterministic problem in which the expectations become finite sums and all second-stage constraints are duplicated for each realization of $\xi$. The resulting deterministic equivalent problem may be solved by straightforward application of standard linear programming methods (provided that the discrete set of possible outcomes for $\xi$ is of relatively low cardinality). However, it exhibits a typical block structure that can be exploited by special decomposition algorithms (see also [1,12] or [13]).
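For simple recourse in one dimension, the second-stage problem has a closed-form solution. This makes the "finite sum" form of the deterministic equivalent easy to illustrate. The sketch below assumes $h(\xi) = \xi$, a scalar technology coefficient `T`, and separate unit costs `q_minus` and `q_plus` for the two columns of $W = (-1, 1)$, with `q_minus + q_plus >= 0` so that the recourse problem is bounded. All names and numbers are illustrative.

```python
from typing import List, Tuple


def simple_recourse_cost(d: float, q_minus: float, q_plus: float) -> float:
    """Q for W = (-1, 1): min q_minus*y1 + q_plus*y2 s.t. -y1 + y2 = d, y >= 0.

    The optimum is y2 = max(d, 0), y1 = max(-d, 0) when q_minus + q_plus >= 0.
    """
    return q_plus * max(d, 0.0) + q_minus * max(-d, 0.0)


def expected_objective(x: float, c: float, T: float,
                       scenarios: List[Tuple[float, float]],
                       q_minus: float, q_plus: float) -> float:
    """c*x + sum_s p_s * Q(x, xi_s) for a discrete distribution {(xi_s, p_s)}."""
    return c * x + sum(p * simple_recourse_cost(xi - T * x, q_minus, q_plus)
                       for xi, p in scenarios)


# Demand takes three values; surplus costs 0.5 per unit, shortage 3 per unit.
scenarios = [(1.0, 0.2), (2.0, 0.5), (3.0, 0.3)]
best = min((1.0, 2.0, 3.0),
           key=lambda x: expected_objective(x, 1.0, 1.0, scenarios, 0.5, 3.0))
print(best)
```

With `T = 1`, the objective is convex and piecewise linear with breakpoints at the scenario values. Comparing those breakpoints therefore suffices in this small instance; here the minimum is at `x = 2.0`. For general $W$, each scenario would instead contribute its own copy of the second-stage constraints to one block-structured LP.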

Otherwise, if the distribution of $\xi$ is continuous, one can use approximation techniques ([2,10,11]) where the original random vector $\xi$ is replaced by another one, $\tilde\xi$. Typically, $\tilde\xi$ is discrete, and the problem reduces to a discretely distributed stochastic program. These techniques take advantage of the convexity of the recourse function, yielding upper and lower bounds for the expected recourse cost. This makes it possible to quantify the accuracy of the approximation and, if it is not sufficient, to improve it by constructing a better approximation to $Q(x)$.


The Jensen Lower Bound 


In order to outline the basic concepts (see e.g. [12]), assume that $\xi$ is one-dimensional and denote the expectation by $\bar\xi := E\xi = \int_\Xi \xi\,dF(\xi)$. Recall that $\varphi(\xi) := Q(x,\xi)$ is convex in $\xi$ for any fixed $x$. Therefore, it can be bounded from below by a linear function $\psi(\xi)$ that supports $\varphi$ at some point $\hat\xi$, i.e. (assuming that $\varphi$ is (sub)differentiable in $\hat\xi$)

$$\psi(\xi) = \varphi(\hat\xi) + \varphi'(\hat\xi)(\xi - \hat\xi).$$

Due to linearity, the expected value of this lower bound is given by

$$E\psi(\xi) = \varphi(\hat\xi) + \varphi'(\hat\xi)(\bar\xi - \hat\xi) = \psi(\bar\xi).$$


Obviously, the best lower bound is given by $\hat\xi = \bar\xi$, since no linear function supporting $\varphi$ has a value larger than $\varphi(\bar\xi)$ in $\bar\xi$. This is stated by Jensen's inequality $\varphi(E\xi) \le E\varphi(\xi)$ for convex functions of random variables. Applied to the situation where $\varphi(\xi)$ is convex over the domain of $\xi$, one obtains a lower bound to the expected recourse function from

$$\varphi(\bar\xi) \le \int_\Xi \varphi(\xi)\,dF(\xi). \qquad (3)$$

Stochastic Linear Programs with Recourse and Arbitrary Multivariate Distributions, Figure 1


Note that (3) also holds for multidimensional $\xi \in \Xi \subset \mathbb{R}^K$, $K > 1$, regardless of any correlation between the components of $\xi$.
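Inequality (3) is easy to check numerically. The sketch below uses a hypothetical convex, piecewise-linear stand-in for $\varphi = Q(x,\cdot)$ at a fixed $x$. The distribution is discrete and given as `(value, probability)` pairs; both choices are assumptions made only for illustration.

```python
from typing import Callable, List, Tuple

Outcomes = List[Tuple[float, float]]  # (value of xi, probability)


def jensen_lower_bound(phi: Callable[[float], float], outcomes: Outcomes) -> float:
    """phi(E xi): a lower bound on E phi(xi) when phi is convex, cf. (3)."""
    mean = sum(p * xi for xi, p in outcomes)
    return phi(mean)


def expectation(phi: Callable[[float], float], outcomes: Outcomes) -> float:
    """E phi(xi) for a discrete distribution."""
    return sum(p * phi(xi) for xi, p in outcomes)


def phi(xi: float) -> float:
    # Hypothetical recourse-like function: shortage cost 1, surplus cost 0.5.
    return max(xi - 1.0, 0.0) + 0.5 * max(1.0 - xi, 0.0)


outcomes = [(0.0, 0.25), (1.0, 0.25), (2.0, 0.25), (4.0, 0.25)]
print(jensen_lower_bound(phi, outcomes), expectation(phi, outcomes))
```

Evaluating the Jensen bound needs only one evaluation of $\varphi$, that is, a single second-stage problem at $\bar\xi$.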

For one-dimensional random variables, the lower bounding function $\psi(\xi)$ is illustrated in Fig. 1, together with an upper bound which can be derived as follows.


The Edmundson-Madansky Upper Bound (Independent Case)


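For simplicity, the one-dimensional case is considered first. Let the support of $\xi$ be given by the interval $\Xi = [a_0, a_1] \subset \mathbb{R}$. The idea of the Edmundson-Madansky upper bound is to introduce a discrete random variable $\tilde\xi$ with the same expectation, i.e. $E\tilde\xi = \bar\xi$, attaining the values $a_0$ and $a_1$ with probabilities

$$p_{a_0} = P(\tilde\xi = a_0) = \frac{a_1 - \bar\xi}{a_1 - a_0}, \qquad p_{a_1} = P(\tilde\xi = a_1) = \frac{\bar\xi - a_0}{a_1 - a_0}.$$

Obviously, the linear function $\Psi(\xi)$ through the points $(a_0, \varphi(a_0))$ and $(a_1, \varphi(a_1))$ lies above $\varphi(\xi)$ for all $\xi \in \Xi$ due to convexity. This implies

$$\varphi(\xi) \le \frac{a_1 - \xi}{a_1 - a_0}\varphi(a_0) + \frac{\xi - a_0}{a_1 - a_0}\varphi(a_1)$$

for all $\xi \in \Xi$, and integrating this inequality yields

$$\int_\Xi \varphi(\xi)\,dF(\xi) \le \frac{a_1 - \bar\xi}{a_1 - a_0}\varphi(a_0) + \frac{\bar\xi - a_0}{a_1 - a_0}\varphi(a_1) = p_{a_0}\varphi(a_0) + p_{a_1}\varphi(a_1) = E\varphi(\tilde\xi)$$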


as an upper bound for the expectation $E\varphi(\xi)$. The EM-bound on intervals can be extended easily to multivariate distributions, where $\Xi = [a_{10}, a_{11}] \times \cdots \times [a_{K0}, a_{K1}] \subset \mathbb{R}^K$ is the rectangular support of $\xi$, if either $Q(x,\cdot)$ is separable in the components of $\xi$ or the elements of $\xi$ are stochastically independent. In the former case, the bound may be applied to each component separately. Here, the more general latter case is of interest.
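The following is a numerical check of the one-dimensional Edmundson-Madansky bound. It uses the same kind of hypothetical convex stand-in for $\varphi$ and a discrete distribution supported in $[a_0, a_1] = [0, 4]$; both are our own assumptions. Together with the Jensen value $\varphi(\bar\xi)$, it brackets the true expectation.

```python
from typing import Callable


def em_upper_bound_1d(phi: Callable[[float], float], a0: float, a1: float,
                      mean: float) -> float:
    """Edmundson-Madansky bound on [a0, a1]: p_a0*phi(a0) + p_a1*phi(a1)."""
    p_a0 = (a1 - mean) / (a1 - a0)
    p_a1 = (mean - a0) / (a1 - a0)
    return p_a0 * phi(a0) + p_a1 * phi(a1)


def phi(xi: float) -> float:
    # Hypothetical convex stand-in for Q(x, .) at a fixed x.
    return max(xi - 1.0, 0.0) + 0.5 * max(1.0 - xi, 0.0)


outcomes = [(0.0, 0.25), (1.0, 0.25), (2.0, 0.25), (4.0, 0.25)]
mean = sum(p * xi for xi, p in outcomes)
exact = sum(p * phi(xi) for xi, p in outcomes)
upper = em_upper_bound_1d(phi, 0.0, 4.0, mean)
print(phi(mean), exact, upper)
```

The gap between `upper` and `phi(mean)` is the accuracy measure used later when refining the partition.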

Denote the vertices of $\Xi$ by $a^\nu$, $\nu = (\nu_1, \dots, \nu_K)$, $\nu_i \in \{0,1\}$, such that $a_i^\nu = a_{i\nu_i}$. Analogously to the above, $\tilde\xi$ is a discrete random vector with independent components and $E\tilde\xi_i = \bar\xi_i$, attaining values $a^\nu$ with probability

$$p(a^\nu) := P(\tilde\xi = a^\nu) = \prod_{i=1}^{K} \frac{(-1)^{\nu_i}\,(a_{i\bar\nu_i} - \bar\xi_i)}{a_{i1} - a_{i0}}, \qquad (4)$$

where $\bar\nu_i = 1 - \nu_i$, and the EM-inequality contains the product of all combinations of each interval bound, i.e.

$$\int_\Xi \varphi(\xi)\,dF(\xi) \le \sum_{\nu} p(a^\nu)\,\varphi(a^\nu) = E\varphi(\tilde\xi).$$


The Edmundson-Madansky Upper Bound (Dependent Case)


If the components of $\xi$ are dependent, the EM-bound is more difficult to evaluate (unfortunately, the notation is also somewhat cumbersome). For $\mathcal{B} := \{A : A \subseteq \{1, \dots, K\}\}$ and $\delta_A(\nu) := \prod_{j \in A} (-1)^{\nu_j}$, $A \in \mathcal{B}$,

$$m_A := \int_\Xi \Big(\prod_{i \in A} \xi_i\Big)\,dF(\xi)$$

are the cross-moments of $\xi$ for any $A \in \mathcal{B}$, and $\rho_A := m_A - \prod_{i \in A} \bar\xi_i$. Using these definitions and $\bar A := \{1, \dots, K\} \setminus A$, it has been shown in [6] that the probabilities of the discrete outcomes $a^\nu$ are determined by

$$p(a^\nu) = \Big[\prod_{i=1}^{K} (a_{i1} - a_{i0})\Big]^{-1} \Big[\prod_{i=1}^{K} (-1)^{\nu_i}(a_{i\bar\nu_i} - \bar\xi_i) + \sum_{A \in \mathcal{B}} (-1)^{|A|}\,\delta_{\bar A}(\nu)\,\delta_A(\nu) \prod_{i \in \bar A} a_{i\bar\nu_i}\,\rho_A\Big].$$

Obviously, $\rho_A = 0$ if the components of $\xi$ are independent, in which case this formula reduces to (4). The calculation of the EM-bound requires evaluating $\varphi$ at $2^K$ points and, in the dependent case, the same number of cross-moments. Therefore, for higher dimensions not only the computational effort but also the number of discrete outcomes increases exponentially, yielding a deterministic equivalent problem that might be too large to be solved.
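For a discrete, possibly dependent, distribution on the box, the vertex probabilities can be computed without forming the cross-moments explicitly. $p(a^\nu)$ is the expectation of the product of the one-dimensional weights used in (4), and expanding that product term by term gives the cross-moment formula above. The sketch below takes this route; the function names are hypothetical. It loops over all $2^K$ vertices, which also makes the exponential cost visible.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Outcomes = List[Tuple[Sequence[float], float]]  # (xi vector, probability)


def em_vertex_probs(lo: Sequence[float], hi: Sequence[float],
                    outcomes: Outcomes) -> Dict[Tuple[int, ...], float]:
    """p(a^nu) = E prod_i w_i(nu_i, xi_i) for every vertex nu in {0,1}^K."""
    K = len(lo)
    probs = {}
    for nu in product((0, 1), repeat=K):
        total = 0.0
        for xi, p in outcomes:
            w = 1.0
            for i in range(K):
                width = hi[i] - lo[i]
                # Weight of endpoint a_{i0} (nu_i = 0) or a_{i1} (nu_i = 1).
                w *= (hi[i] - xi[i]) / width if nu[i] == 0 else (xi[i] - lo[i]) / width
            total += p * w
        probs[nu] = total
    return probs


def em_upper_bound(phi: Callable[[List[float]], float], lo: Sequence[float],
                   hi: Sequence[float], outcomes: Outcomes) -> float:
    """Sum over vertices of p(a^nu) * phi(a^nu)."""
    probs = em_vertex_probs(lo, hi, outcomes)
    return sum(p * phi([hi[i] if nu[i] else lo[i] for i in range(len(lo))])
               for nu, p in probs.items())


# Perfectly correlated components: (0, 0) or (1, 1), each with probability 1/2.
outcomes = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
print(em_vertex_probs((0.0, 0.0), (1.0, 1.0), outcomes))
```

Under independence, (4) would assign $1/4$ to each vertex. In this example the cross-moment term $\rho_{\{1,2\}} = 1/2 - 1/4 = 1/4$ shifts all of the mass to $(0,0)$ and $(1,1)$.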


Bounds on Simplices 


Instead of rectangles, one can use a regular simplex $\Delta \supseteq \Xi$ containing the support of $\xi$. In this case, Jensen's inequality (3) with $\Xi$ replaced by $\Delta$ can be applied immediately to obtain a lower bound. To derive an upper bound, the affinely independent vertices $v_0, \dots, v_K$ of $\Delta$ are considered as discrete outcomes. Since these are only $K + 1$ points, the complexity is no longer exponential in the dimension of $\xi$. Note that for any $\xi \in \Delta$, the system of linear equations

$$p_0(\xi) + \cdots + p_K(\xi) = 1,$$
$$v_0\,p_0(\xi) + \cdots + v_K\,p_K(\xi) = \xi,$$

or briefly

$$V p(\xi) = \begin{pmatrix} 1 \\ \xi \end{pmatrix} \quad\text{with}\quad V = \begin{pmatrix} 1 & \cdots & 1 \\ v_0 & \cdots & v_K \end{pmatrix},$$

has the unique solution

$$p(\xi) = V^{-1}\begin{pmatrix} 1 \\ \xi \end{pmatrix}.$$

Analogously to the above, a discrete random variable $\tilde\xi$ is constructed attaining values $v_0, \dots, v_K$ with probabilities $p_0, \dots, p_K$, so that the expectation of the discrete distribution equals that of the original one, i.e. $E\tilde\xi = p_0 v_0 + \cdots + p_K v_K = \bar\xi$. This yields the following version of the Edmundson-Madansky inequality:

$$\int_\Delta \varphi(\xi)\,dF(\xi) \le \sum_{i=0}^{K} p_i\,\varphi(v_i),$$

where the vector of probabilities is given by

$$p = V^{-1}\begin{pmatrix} 1 \\ \bar\xi \end{pmatrix}.$$

Note that this equation holds for both the independent and the dependent case.
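The simplex probabilities come from a single $(K+1) \times (K+1)$ linear solve. The sketch below writes that solve out in plain Python so that it stays self-contained; the solver and function names are our own. The example simplex in $\mathbb{R}^2$ and the test function are hypothetical.

```python
from typing import Callable, List, Sequence


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A p = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def simplex_probs(vertices: Sequence[Sequence[float]],
                  mean: Sequence[float]) -> List[float]:
    """p = V^{-1} (1, mean) for V = [[1, ..., 1], [v_0, ..., v_K]]."""
    K = len(mean)
    V = [[1.0] * len(vertices)] + [[v[d] for v in vertices] for d in range(K)]
    return solve_linear(V, [1.0] + list(mean))


def simplex_em_bound(phi: Callable[[Sequence[float]], float],
                     vertices: Sequence[Sequence[float]],
                     mean: Sequence[float]) -> float:
    """Upper bound sum_i p_i * phi(v_i) from the simplex EM inequality."""
    return sum(p * phi(v) for p, v in zip(simplex_probs(vertices, mean), vertices))


vertices = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(simplex_probs(vertices, (0.5, 0.5)))
```

Only $K + 1$ evaluations of $\varphi$ are needed, compared with $2^K$ for the rectangular bound.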


Improving Bounds 


The advantageous feature of approximation techniques is that the accuracy can be quantified by the difference between the Jensen and the Edmundson-Madansky bound. If it is not sufficient, the approximation can be improved by dividing the rectangular support $\Xi$ (or the simplex $\Delta$, respectively) into disjoint convex subsets, as in Fig. 2.

A finite collection of such subsets is called a partition. Dividing an element of an existing partition yields a 'refined' partition, and the associated bound is at least as good as the former one (monotonicity of bounds). If the subsets become arbitrarily small, the approximated recourse function converges to the original one. However, for computational reasons the partition cannot become arbitrarily fine. Also, dividing $\Xi$ (or $\Delta$, respectively) without a strategy may increase the computational effort dramatically. Hence, sophisticated refinement strategies are required that analyze the accuracy of the bounds for each subset and determine those that should be further divided. This yields a sequential approximation to the original problem.
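A toy version of sequential refinement in one dimension is sketched below. Each cell contributes $P(\text{cell})\,\varphi(E[\xi \mid \text{cell}])$ to the Jensen bound and the analogous two-point term to the EM bound. The assumptions are ours: a discrete distribution (so that cell probabilities and conditional means are exact), bisection at the midpoint, and a "largest gap first" rule. These are illustrative choices, not the refinement strategies of the cited literature.

```python
from typing import Callable, List, Tuple

Outcomes = List[Tuple[float, float]]  # (xi, probability)


def cell_bounds(phi: Callable[[float], float], a: float, b: float,
                pts: Outcomes) -> Tuple[float, float]:
    """Jensen and EM contributions P(cell)*phi(mean) and P(cell)*EM(cell)."""
    mass = sum(p for _, p in pts)
    if mass == 0.0:
        return 0.0, 0.0
    mean = sum(p * xi for xi, p in pts) / mass
    lower = mass * phi(mean)
    upper = mass * ((b - mean) * phi(a) + (mean - a) * phi(b)) / (b - a)
    return lower, upper


def refine(phi: Callable[[float], float], a: float, b: float,
           outcomes: Outcomes, steps: int) -> List[Tuple[float, float]]:
    """Return the (Jensen, EM) bounds after 0, 1, ..., steps refinements.

    Each step bisects the cell with the largest gap between its bounds.
    """
    cells = [(a, b, list(outcomes))]
    history = []
    for step in range(steps + 1):
        bounds = [cell_bounds(phi, *cell) for cell in cells]
        history.append((sum(lo for lo, _ in bounds), sum(up for _, up in bounds)))
        if step == steps:
            break
        worst = max(range(len(cells)), key=lambda i: bounds[i][1] - bounds[i][0])
        ca, cb, pts = cells.pop(worst)
        mid = 0.5 * (ca + cb)
        cells.append((ca, mid, [(x, p) for x, p in pts if x < mid]))
        cells.append((mid, cb, [(x, p) for x, p in pts if x >= mid]))
    return history


def phi(xi: float) -> float:
    # Hypothetical convex stand-in for Q(x, .) at a fixed x.
    return max(xi - 1.0, 0.0) + 0.5 * max(1.0 - xi, 0.0)


outcomes = [(0.0, 0.25), (1.0, 0.25), (2.0, 0.25), (4.0, 0.25)]
for lower, upper in refine(phi, 0.0, 4.0, outcomes, 3):
    print(lower, upper)
```

In this example the lower bound rises and the upper bound falls until both meet the exact expectation. This reflects the monotonicity of bounds stated above.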


Generalizations and Alternative Methods 


So far, only randomness in the right-hand side has been taken into account when deriving the upper and lower bounds. If the right-hand side is deterministic (i.e. $h(\xi) := h_0$) but the coefficients $q(\xi)$ in the objective are uncertain, one can apply the same procedure to the dual problem. Since this is a maximization problem with a concave recourse function, the Jensen inequality provides an upper bound, while the Edmundson-Madansky inequality yields a lower bound. For uncertainty in both the objective and the right-hand side of the constraints, extensions of the approximation scheme described above are required (see e.g. [7]). It should be mentioned that there are other concepts apart from the bounds derived here, which are also applicable to random data with noncompact support [4] or which derive sharper lower bounds by exploiting second-moment information [3].

Alternatively, one may approximate the original 
problem by sampling from continuous distributions to 
obtain a deterministic equivalent, or one may use Ben- 
ders decomposition together with variance reduction 
techniques to handle a large number of scenarios [9]. 
Other sophisticated approaches like stochastic quasi- 
gradient methods [5] or stochastic decomposition [8] 
combine sampling with techniques known from convex 
optimization, for example subgradient or cutting plane 
procedures. 
Stochastic programming provides a framework for sequential decision making, or planning, under uncertainty. Uncertainty naturally arises when the future consequences of present decisions are unknown, and is typically represented by a set of scenarios covering possible future events. A stochastic program (SP) seeks to find a decision which is, in some sense, optimal with respect to the scenario set. Depending upon the number of scenarios, and upon the size of the underlying (deterministic) model, an SP may become very large and computationally challenging to solve. Hence, there is an ongoing effort to devise efficient algorithms tailored to the special structures of SPs, and to exploit novel, parallel computer architectures in their solution.

This article focuses on two- and multistage SPs with 
generalized network structure. The two-stage SP is the 
simplest program which captures the dynamic decision 
process, while the network structure arises naturally in 
many applications, such as financial decision making 
(e.g., [11]). This problem structure was studied already 
in [24] and [25]. We first describe algorithms for the so- 
lution of network problems with strictly convex objec- 
tive functions. These algorithms are then adapted to the 
specially structured stochastic network problems, and 
are also extended to general, convex (such as linear) ob- 
jective functions. 


Problem Formulation 


A stochastic program models a situation where a decision maker must make a decision here and now (time 0), facing future uncertainty (the first stage). At later time points $t_n$, further corrective decisions are made. These decisions depend on prior decisions and on the actual realizations of uncertain events between times 0 and $t_n$, but are made in the face of further uncertainty about events after $t_n$. The two-stage SP consists of the initial and one corrective decision, and is naturally generalized to the multistage SP with an arbitrary number of decisions.

The two-stage SP where uncertainty is represented by a finite scenario set $S = \{1, \dots, N\}$ with probabilities $p^s > 0$ can be formulated mathematically in the deterministic equivalent form [26]:

$$\begin{aligned}
\min\ & f(x) + \sum_{s \in S} p^s g^s(y^s) \\
\text{s.t.}\ & Ax = b, \\
& C^s x + B^s y^s = d^s, && \forall s \in S, \\
& 0 \le x \le u, \\
& 0 \le y^s \le v^s, && \forall s \in S.
\end{aligned}$$


The first- and second-stage variables are denoted $x$ and $y^s$, respectively. Any (scenario-independent) constraints on the first-stage decision are represented by the constraint $Ax = b$. The second-stage decisions $y^s$ depend upon both the scenario (hence the superscript $s$) and the first-stage decisions $x$ through the matrix $C^s$. All decision variables may be subject to simple bounds. The objective functions $f$ and $g^s$ are assumed to be convex and continuously differentiable.

The L-shaped decomposition algorithm [21], based on Benders decomposition, applies directly to SP and is well suited for coarse-grained parallel solution [17]. We consider here the equivalent split-variable formulation. It is obtained by replicating the first-stage variables $x$ into copies $x^s$ for each scenario, then adding nonanticipativity constraints (NA constraints)

$$x^1 = x^s, \qquad \forall s \in \{2, \dots, N\} \qquad (1)$$

to force the replicated variables to have the same value:

$$[\text{RNLP}]\quad \begin{aligned}
\min\ & \sum_{s \in S} p^s \big(f(x^s) + g^s(y^s)\big) \\
\text{s.t.}\ & Ax^s = b, && \forall s \in S, \\
& C^s x^s + B^s y^s = d^s, && \forall s \in S, \\
& 0 \le x^s \le u, && \forall s \in S, \\
& 0 \le y^s \le v^s, && \forall s \in S, \\
& x^1 = x^s, && \forall s \in \{2, \dots, N\}.
\end{aligned}$$


The advantage of this formulation is that the problem decomposes into $N$ independent subproblems when the NA constraints are ignored. This fact is exploited algorithmically in the progressive hedging ([10,20]) and the diagonal quadratic approximation ([1,9]) algorithms, as well as in the row-action algorithms discussed below.


Row-Action Algorithm 


The row-action algorithm (RA algorithm) [4] is a primal-dual algorithm for solving the general nonlinear optimization problem

$$[\text{NLP}]\quad \begin{aligned}
\min\ & F(z) \\
\text{s.t.}\ & Hz = r, \\
& 0 \le z \le u,
\end{aligned}$$

where the objective function $F(z)$ is strictly convex and continuously differentiable, $z \in \mathbb{R}^n$ and $H \in \mathbb{R}^{m \times n}$.

A solution to [NLP] consists of real vectors $z \in \mathbb{R}^n$, $\pi \in \mathbb{R}^m$ and $\lambda \in \mathbb{R}^n$, which satisfy the standard optimality conditions:

• primal feasibility: $Hz = r$ and $0 \le z \le u$;
• dual feasibility: $\nabla F(z) = -H^\top \pi - \lambda$;
• complementary slackness: for $j = 1, \dots, n$, $\lambda_j < 0 \Rightarrow z_j = 0$, and $\lambda_j > 0 \Rightarrow z_j = u_j$.

Starting from an initial primal-dual point (z, 7, A) 
that satisfies complementary slackness, the row-action 
algorithm iteratively operates on a single constraint (or 
row) at a time, simultaneously updating the primal vari- 
ables occurring in the constraint, and the constraint’s 
dual price. This update causes the constraint to be satis- 
fied (primal feasibility) while maintaining complemen- 
tary slackness. The algorithm terminates when primal 
feasibility is satisfied for all constraints, within some 
tolerance. The order in which the constraints are op- 
erated on is not formally important as long as no con- 
straint is ignored indefinitely, although the ordering 
may influence the rate of convergence. 

The algorithmic step can be viewed as a projection of the current primal iterate upon the hyperplane defined by a nonsatisfied constraint, with a simultaneous update of the constraint's dual price so that the three optimality conditions stated above are satisfied for that constraint. Indeed, let $h_i$ denote the $i$th row of $H$, so that $h_i z = r_i$ is the $i$th constraint from $Hz = r$. Then the update is defined as the solution $z^{\nu+1} \in \mathbb{R}^n$, $\beta \in \mathbb{R}$ to the system

$$\nabla F(z^{\nu+1}) = \nabla F(z^\nu) + \beta\,h_i^\top, \qquad (2)$$
$$h_i z^{\nu+1} = r_i, \qquad (3)$$

where $\nu$ is the iteration counter and $z^\nu$ is the current primal point. The Bregman parameter $\beta$ determines the change in the constraint's dual price $\pi_i$ (the price is decreased by $\beta$). Under mild conditions (strict convexity and zone consistency of $F$, see [4]), this system can be shown to have a uniquely determined solution that lies in the domain of $F$. The projection upon a simple bounds constraint is similar: for instance, if $z_j^\nu > u_j$, the system

$$\nabla F(z^{\nu+1}) = \nabla F(z^\nu) + \beta\,e_j, \qquad z_j^{\nu+1} = u_j$$

defines the next iterate. The algorithm is summarized below, where $e_i$ denotes the $i$th unit vector.


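Initialization:
  Set $\nu = 0$. Initialize $(z^0, \pi^0, \lambda^0)$ such that $\nabla F(z^0) = -H^\top \pi^0 - \lambda^0$.
Projection on equality constraint:
  To project on the $i$th equality constraint, $h_i z = r_i$, solve the equations
  $\nabla F(z^{\nu+1}) = \nabla F(z^\nu) + \beta\,h_i^\top$, $\ h_i z^{\nu+1} = r_i$
  for $z^{\nu+1} \in \mathbb{R}^n$ and $\beta \in \mathbb{R}$.
  Update the dual price: $\pi^{\nu+1} = \pi^\nu - \beta\,e_i$.
Projection on bounds:
  To project on the $j$th simple bound constraint, $0 \le z_j \le u_j$:
  If $z_j^\nu < 0$, let $\beta$ and $z^{\nu+1}$ solve $\nabla F(z^{\nu+1}) = \nabla F(z^\nu) + \beta\,e_j$, $\ z_j^{\nu+1} = 0$.
  If $z_j^\nu > u_j$, let $\beta$ and $z^{\nu+1}$ solve $\nabla F(z^{\nu+1}) = \nabla F(z^\nu) + \beta\,e_j$, $\ z_j^{\nu+1} = u_j$.
  If $0 < z_j^\nu < u_j$, let $\beta$ and $z^{\nu+1}$ solve $\nabla F(z^{\nu+1}) = \nabla F(z^\nu) + \beta\,e_j$, $\ \beta = \lambda_j^\nu$.
  Let $\lambda^{\nu+1} = \lambda^\nu - \beta\,e_j$.
Termination:
  IF convergence: STOP.
  ELSE set $\nu \leftarrow \nu + 1$ and continue.

Row-action algorithm for [NLP]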


We note that the initialization step is usually trivial: set $\lambda^0 = -\nabla F(z^0) - H^\top \pi^0$ for any $z^0$ in the domain of $F$ and any $\pi^0$. The convergence test is based on a measure of violation of primal feasibility.
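For a separable quadratic $F(z) = \sum_j \tfrac12 w_j z_j^2 + q_j z_j$ with $w_j > 0$, the systems in the algorithm above have closed-form solutions, which gives a compact sketch of the RA iteration. Two choices here are ours. First, constraints are visited in a fixed cyclic order. Second, the bound step is a single clipped move: release $\lambda_j$ to zero, clip $z_j$ to $[0, u_j]$, and store the clipped amount in $\lambda_j$. This is our own compact variant of the three cases above, chosen because it keeps complementary slackness exact in every case. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def row_action_qp(w: Sequence[float], q: Sequence[float],
                  H: Sequence[Sequence[float]], r: Sequence[float],
                  u: Sequence[float], tol: float = 1e-10,
                  max_sweeps: int = 10_000) -> Tuple[List[float], List[float], List[float]]:
    """RA sketch for min sum_j 0.5*w_j*z_j**2 + q_j*z_j s.t. H z = r, 0 <= z <= u.

    Maintains dual feasibility w_j*z_j + q_j = -(H^T pi)_j - lambda_j throughout.
    """
    n, m = len(w), len(H)
    pi, lam = [0.0] * m, [0.0] * n
    z = [-q[j] / w[j] for j in range(n)]  # dual feasible for pi = 0, lambda = 0
    for _ in range(max_sweeps):
        for i in range(m):  # projection on h_i z = r_i, eqs. (2)-(3)
            h = H[i]
            beta = ((r[i] - sum(h[j] * z[j] for j in range(n)))
                    / sum(h[j] ** 2 / w[j] for j in range(n)))
            for j in range(n):
                z[j] += beta * h[j] / w[j]
            pi[i] -= beta
        for j in range(n):  # projection on 0 <= z_j <= u_j
            target = z[j] + lam[j] / w[j]  # where z_j would be with lambda_j = 0
            z[j] = min(max(target, 0.0), u[j])
            lam[j] = w[j] * (target - z[j])
        viol = max((abs(r[i] - sum(H[i][j] * z[j] for j in range(n)))
                    for i in range(m)), default=0.0)
        if viol < tol:
            break
    return z, pi, lam


# min 0.5*|z|^2 + z_3  s.t.  z_1 + z_2 + z_3 = 1,  0 <= z <= 1
z, pi, lam = row_action_qp([1.0, 1.0, 1.0], [0.0, 0.0, 1.0],
                           [[1.0, 1.0, 1.0]], [1.0], [1.0, 1.0, 1.0])
print(z, pi, lam)
```

In this example the third variable ends at its lower bound with a negative $\lambda_3$, as complementary slackness requires. The stopping test uses only primal feasibility, as described above.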

The RA algorithm has found application in a variety 
of areas, such as matrix estimation, image reconstruc- 
tion and multicommodity network flow problems. For 
an extensive textbook treatment of these and related 
topics, and for further references, see [6]. 


Specialization to Quadratic Generalized Network Problems


We now specialize the RA algorithm to the case where [NLP] is a network problem with a quadratic objective function,

$$F(z) = \tfrac{1}{2} z^\top W z + q^\top z.$$

Let the network structure be defined by a graph $G = (V, E)$ with a set $V$ of nodes (or vertices) and a set $E \subseteq V \times V$ of arcs (or edges). Let $\delta_i^+ = \{j \in V : (i,j) \in E\}$ be the set of nodes having an arc coming from $i$, and $\delta_j^- = \{i \in V : (i,j) \in E\}$ be the set of nodes having an arc going to $j$. The decision variables are then the flows $z_{ij}$ from node $i$ to node $j$, for $(i,j) \in E$. We allow the network to be generalized with arc multipliers $m_{ij} > 0$:

$$[\text{QNP}]\quad \begin{aligned}
\min\ & \sum_{(i,j) \in E} \tfrac{1}{2} w_{ij} z_{ij}^2 + q_{ij} z_{ij} \\
\text{s.t.}\ & \sum_{j \in \delta_i^+} z_{ij} - \sum_{k \in \delta_i^-} m_{ki} z_{ki} = r_i, && \forall i \in V, \\
& 0 \le z_{ij} \le u_{ij}, && \forall (i,j) \in E.
\end{aligned}$$

The elements of the (diagonal) matrix $W$ are here denoted by $w_{ij}$, and $r_i$ is the supply at node $i$ (demands are represented as negative supplies).

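The algorithmic steps of the RA algorithm can now be stated for [QNP]. For the flow conservation constraint for node $i$,

$$\sum_{j \in \delta_i^+} z_{ij} - \sum_{k \in \delta_i^-} m_{ki} z_{ki} = r_i,$$

(2)-(3) leads to the updating formula for $z$:

$$z_{ij}^{\nu+1} = z_{ij}^\nu + \frac{\beta}{w_{ij}} \quad \text{for } j \in \delta_i^+, \qquad z_{ki}^{\nu+1} = z_{ki}^\nu - \frac{\beta\,m_{ki}}{w_{ki}} \quad \text{for } k \in \delta_i^-,$$

where

$$\beta = \frac{r_i - \sum_{j \in \delta_i^+} z_{ij}^\nu + \sum_{k \in \delta_i^-} m_{ki} z_{ki}^\nu}{\sum_{j \in \delta_i^+} 1/w_{ij} + \sum_{k \in \delta_i^-} m_{ki}^2/w_{ki}}. \qquad (4)$$

The dual variable for node $i$ is updated by subtracting $\beta$ from its current value, $\pi_i^{\nu+1} = \pi_i^\nu - \beta$.

Similarly, the simple bound constraints lead to the updates:

• if $z_{ij}^\nu < 0$: let $z_{ij}^{\nu+1} = 0$ and $\beta = -w_{ij} z_{ij}^\nu$;
• if $z_{ij}^\nu > u_{ij}$: let $z_{ij}^{\nu+1} = u_{ij}$ and $\beta = w_{ij}(u_{ij} - z_{ij}^\nu)$;
• if $0 < z_{ij}^\nu < u_{ij}$: let $z_{ij}^{\nu+1} = z_{ij}^\nu + \lambda_{ij}^\nu / w_{ij}$ and $\beta = \lambda_{ij}^\nu$.

In each case update $\lambda_{ij}^{\nu+1} = \lambda_{ij}^\nu - \beta$.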
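The node update (4) can be sketched with arcs stored as `(tail, head)` tuples and per-arc dictionaries for the flows $z$, weights $w$ and multipliers $m$. This data layout and the function name are our own assumptions. Self-loops are assumed absent, so each arc is either outgoing from or incoming to node $i$, never both.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, int]


def project_node(i: int, arcs: List[Arc], z: Dict[Arc, float],
                 w: Dict[Arc, float], m: Dict[Arc, float],
                 r: Dict[int, float], pi: Dict[int, float]) -> float:
    """One RA projection on the flow-conservation row of node i in [QNP].

    Updates z and pi in place and returns the Bregman parameter beta of (4).
    """
    out_arcs = [a for a in arcs if a[0] == i]  # arcs (i, j), j in delta_i^+
    in_arcs = [a for a in arcs if a[1] == i]   # arcs (k, i), k in delta_i^-
    residual = (r[i] - sum(z[a] for a in out_arcs)
                + sum(m[a] * z[a] for a in in_arcs))
    beta = residual / (sum(1.0 / w[a] for a in out_arcs)
                       + sum(m[a] ** 2 / w[a] for a in in_arcs))
    for a in out_arcs:
        z[a] += beta / w[a]
    for a in in_arcs:
        z[a] -= beta * m[a] / w[a]
    pi[i] -= beta
    return beta


arcs = [(0, 1), (1, 2)]
z = {a: 0.0 for a in arcs}
w = {a: 1.0 for a in arcs}
m = {(0, 1): 0.9, (1, 2): 1.0}
r = {0: 0.0, 1: 1.0, 2: -1.0}
pi = {0: 0.0, 1: 0.0, 2: 0.0}
beta = project_node(1, arcs, z, w, m, r, pi)
print(beta, z)
```

After the call, node 1 satisfies its conservation equation exactly. Only the incident arcs and the node's own price were touched, which is what makes the per-node parallel mapping discussed later possible.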
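The stochastic quadratic network problem is now obtained by adding NA constraints (1) to [QNP]:

$$[\text{SQNP}]\quad \begin{aligned}
\min\ & \sum_{s \in S} p^s \sum_{(i,j) \in E} \Big(\tfrac{1}{2} w_{ij}^s (z_{ij}^s)^2 + q_{ij}^s z_{ij}^s\Big) \\
\text{s.t.}\ & \sum_{j \in \delta_i^+} z_{ij}^s - \sum_{k \in \delta_i^-} m_{ki}^s z_{ki}^s = r_i^s, && \forall i \in V,\ s \in S, \\
& z_{ij}^1 = z_{ij}^s, && \forall s \in \{2, \dots, N\},\ (i,j) \in F, \\
& 0 \le z_{ij}^s \le u_{ij}^s, && \forall (i,j) \in E,\ s \in S.
\end{aligned}$$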

As in [RNLP], the superscripts $s$ denote scenario-dependent quantities. The NA constraints apply to the subset $F$ of (replicated) first-stage variables. A specialization of (2)-(3) upon one such constraint moves the two components of the current iterate, $z_{ij}^{1,\nu}$ and $z_{ij}^{s,\nu}$, to the common value

$$z_{ij}^{1,\nu+1} = z_{ij}^{s,\nu+1} = \frac{p^1 z_{ij}^{1,\nu} + p^s z_{ij}^{s,\nu}}{p^1 + p^s},$$

which is their probability-weighted average. However, it was shown in [13] that the values of first-stage variable $(i,j)$ across all scenarios can be updated in a single step to their probability-weighted average (equivalent to an infinite number of projections upon the NA constraints for $z_{ij}$):

$$z_{ij}^{s,\nu+1} = \sum_{s' \in S} p^{s'} z_{ij}^{s',\nu}, \qquad \forall s \in S. \qquad (5)$$

This observation leads to considerably faster agreement among scenarios.
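The sketch below contrasts the single-step rule (5) with repeated pairwise projections for one first-stage variable. Each pairwise step preserves the probability-weighted sum $\sum_s p^s z^s$, so cycling through the pairs drives all copies toward the same value that (5) produces at once. The function names and the example data are hypothetical. The average divides by `sum(probs)` only as a guard against probabilities that do not sum exactly to one in floating point.

```python
from typing import List, Sequence


def na_average(values: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Single-step NA projection (5): every scenario copy gets the
    probability-weighted average."""
    avg = sum(p * v for p, v in zip(probs, values)) / sum(probs)
    return [avg] * len(values)


def na_pairwise(values: Sequence[float], probs: Sequence[float],
                sweeps: int) -> List[float]:
    """Repeated pairwise projections on x^1 = x^s, s = 2, ..., N."""
    z = list(values)
    for _ in range(sweeps):
        for s in range(1, len(z)):
            common = (probs[0] * z[0] + probs[s] * z[s]) / (probs[0] + probs[s])
            z[0] = z[s] = common
    return z


values, probs = [1.0, 4.0, 10.0], [0.5, 0.3, 0.2]
print(na_average(values, probs), na_pairwise(values, probs, 50))
```

The single step costs one weighted sum per variable, whereas the pairwise version needs many sweeps to reach the same agreement.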


Proximal Minimization with D-functions 


The row-action algorithm only applies to problems with a strictly convex objective function. Consider the linear program

$$[\text{LP}]\quad \min\ c^\top z \quad \text{s.t.}\quad z \in X,$$

where the feasible region is denoted by

$$X = \{z \in \mathbb{R}^n : Hz = r \text{ and } 0 \le z \le u\}.$$


The proximal minimization with D-functions algorithm 
(PMD) solves linear programs by perturbing the objec- 
tive into a strictly convex form, then solving the result- 
ing subproblem using RA. The process is repeated with 
updated perturbations until the solution to the original 
LP is approached. PMD was proposed in [5], where its 
convergence was established. 

Let $S \neq \emptyset$ be an open convex set. Let $f\colon \Lambda \subseteq \mathbb{R}^n \to \mathbb{R}$ be an auxiliary function. We assume that $\bar S \subseteq \Lambda$, where $\bar S$ is the closure of $S$, and that $f$ is strictly convex and continuous on $\bar S$ and continuously differentiable on $S$. The set $S$ is called the zone of $f$. The D-function corresponding to $f$ is defined as

$$D_f(x, y) = f(x) - f(y) - \nabla f(y)^\top (x - y). \qquad (6)$$


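For some suitable choice of the auxiliary function $f$ and a positive nondecreasing sequence $\{\gamma^k\}_{k=0}^{\infty}$ with $\liminf_{k \to \infty} \gamma^k = \gamma < \infty$, the proximal minimization algorithm with D-functions proceeds from an arbitrary starting point $z^0 \in S$ with the following iteration (the PMD algorithm):

$$z^{k+1} = \arg\min_{z \in X \cap \bar S} \Big\{ F(z) + \frac{1}{\gamma^k} D_f(z, z^k) \Big\}, \qquad (7)$$

where $F(z) = c^\top z$ is the objective of [LP].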

PMD was used in [14] and [15] to solve linear and stochastic network problems using two auxiliary functions: the quadratic function

$$f_Q(x) = \tfrac{1}{2} x^\top x, \qquad (8)$$

with zone $\mathbb{R}^n$ and D-function

$$D_{f_Q}(x, y) = \tfrac{1}{2}(x - y)^\top (x - y),$$

and the negative of Shannon's entropy function

$$f_E(x) = \sum_{j=1}^{n} x_j \log x_j, \qquad (9)$$

with the positive orthant $S = \{x \in \mathbb{R}^n : x_j > 0\}$ as zone and with D-function

$$D_{f_E}(x, y) = \sum_{j=1}^{n} x_j \Big(\log\frac{x_j}{y_j} - 1\Big) + \sum_{j=1}^{n} y_j.$$
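Definition (6) can be evaluated directly from $f$ and its gradient. The sketch below, with hypothetical helper names, does this for the two auxiliary functions (8) and (9). The test cross-checks the results against the closed-form D-functions given above.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]


def d_function(f: Callable[[Vec], float], grad_f: Callable[[Vec], List[float]],
               x: Vec, y: Vec) -> float:
    """D_f(x, y) = f(x) - f(y) - grad f(y)^T (x - y), eq. (6)."""
    return f(x) - f(y) - sum(g * (xi - yi) for g, xi, yi in zip(grad_f(y), x, y))


def f_q(x: Vec) -> float:  # quadratic auxiliary function (8)
    return 0.5 * sum(v * v for v in x)


def grad_q(x: Vec) -> List[float]:
    return list(x)


def f_e(x: Vec) -> float:  # negative Shannon entropy (9); zone x > 0
    return sum(v * math.log(v) for v in x)


def grad_e(x: Vec) -> List[float]:
    return [math.log(v) + 1.0 for v in x]


x, y = [1.0, 2.0], [2.0, 1.0]
print(d_function(f_q, grad_q, x, y), d_function(f_e, grad_e, x, y))
```

The entropic D-function is defined only on the positive orthant. As a result, PMD with $f_E$ keeps its iterates strictly positive, in contrast to the quadratic choice.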


With the quadratic auxiliary function $f_Q$, PMD specializes to the familiar quadratic proximal point algorithm (QPP algorithm) of [18,19]:

$$z^{k+1} = \arg\min_{z \in X} F_Q(z) = c^\top z + \frac{1}{2\gamma^k}\,\|z - z^k\|^2.$$

QPP hence consists of solving a series of quadratic programs (using, e.g., RA) while iteratively updating the proximal point $z^k$.
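To keep the following sketch self-contained, we drop the equality constraints $Hz = r$ and keep only the bounds. Each QPP subproblem $\min_{0 \le z \le u} c^\top z + \frac{1}{2\gamma}\|z - z^k\|^2$ then separates by coordinate and is solved exactly by clipping $z^k - \gamma c$ to the box, instead of by RA. This simplification, the constant $\gamma^k = \gamma$, and the function name are our own assumptions.

```python
from typing import List, Sequence


def qpp_box_lp(c: Sequence[float], u: Sequence[float], z0: Sequence[float],
               gamma: float = 0.1, iters: int = 100) -> List[float]:
    """Quadratic proximal point iterations for min c^T z s.t. 0 <= z <= u.

    Each subproblem min c^T z + ||z - z^k||^2 / (2 gamma) over the box is
    solved exactly by clipping z^k - gamma * c to [0, u].
    """
    z = list(z0)
    for _ in range(iters):
        z = [min(max(zj - gamma * cj, 0.0), uj) for zj, cj, uj in zip(z, c, u)]
    return z


print(qpp_box_lp([1.0, -2.0, 0.5], [1.0, 1.0, 1.0], [0.5, 0.5, 0.5]))
```

The iterates reach the LP vertex with $z_j = 0$ for $c_j > 0$ and $z_j = u_j$ for $c_j < 0$. The proximal term only limits how far each step may move from $z^k$.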

QPP and the corresponding algorithm obtained from PMD by using $f_E$, the entropic proximal point algorithm EPP [7], were implemented and compared
with exact algorithms for deterministic network prob- 
lems in [15]. It was found that while neither algo- 
rithm was comparable to specialized, exact network al- 
gorithms (based on simplex or relaxation) for small 
or medium-sized problems, the PMD algorithms were 
able to solve extremely large problems, with up to 16 
million variables, which could not be solved using the 
exact algorithms. 


Parallelism in the Row-Action Algorithm 


The RA algorithm lends itself naturally to parallel ex- 
ecution on a computer with a large number of inter- 
connected processors. In the context of stochastic net- 
work problems the potential for parallelism manifests 
itself at several levels. The key to parallelizing an algo- 
rithm is to identify parts of the algorithm which can be 
executed simultaneously without interfering with each 
other. Hence, two calculations can be executed simulta- 
neously (by different processors) if they do not depend 
upon each other’s results, that is, there are no data de- 
pendencies between them. 


Simple Bounds 


The projection on simple bounds constraints is the sim- 
plest example of natural parallelism: Each projection 
changes the value of (at most) one primal and one dual 
variable. All the n bounds projections of a problem hav- 


ing n variables can be executed in parallel without data 
dependencies. 


Disjoint Constraints 


By disjoint constraints we mean constraints that do not have primal variables in common. Projections (2)-(3)
upon a set of such constraints can be performed simul- 
taneously since the data involved in each projection do 
not depend upon the other projections. For network 
problems, equality constraints correspond to nodes and 
are disjoint for sets of nonadjacent nodes, i. e., nodes 
that have no arcs in common. The identification of such 
sets is a graph-coloring problem [28], where nodes with 
the same ‘color’ can all be updated simultaneously. 
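Any valid coloring will do for this purpose. The greedy heuristic below is our choice, not necessarily the method of [28]: it colors nodes one at a time with the smallest color not yet used by an already colored neighbour. It does not minimize the number of colors in general. The function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def greedy_node_coloring(nodes: Iterable[Hashable],
                         arcs: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Assign colors so that nodes joined by an arc get different colors."""
    nodes = list(nodes)
    adj = {v: set() for v in nodes}
    for i, j in arcs:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    color: Dict[Hashable, int] = {}
    for v in nodes:
        used = {color[x] for x in adj[v] if x in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def color_classes(color: Dict[Hashable, int]) -> List[List[Hashable]]:
    """Group nodes by color; each group can be projected on in parallel."""
    classes: Dict[int, List[Hashable]] = {}
    for v, c in color.items():
        classes.setdefault(c, []).append(v)
    return [classes[c] for c in sorted(classes)]


arcs = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(color_classes(greedy_node_coloring(range(4), arcs)))
```

A sweep of the algorithm then processes one color class after another. Within a class, all node projections are independent.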


Stochastic Problems 


For a stochastic network it is evident that nodes belong- 
ing to different scenario subproblems are independent. 
This is true even for first-stage nodes within our frame- 
work of variable-splitting as in RNLP (and is our pri- 
mary reason for using splitting). The NA projections 
(5) can in turn be executed in parallel for each (set of replicated) first-stage variable $z_{ij}^s$, $(i,j) \in F$.


Jacobi Algorithms 


The kinds of parallelism mentioned so far all define al- 
gorithms which are equivalent to the strictly sequen- 
tial RA algorithm, i. e., Gauss-Seidel algorithms. In con- 
trast, Jacobi algorithms allow simultaneous operations 
on constraints even if they are not disjoint. This in- 
creases the potential for parallelism because projections 
on all the primal constraints can be calculated in par- 
allel. However, projections on nondisjoint constraints 
will generally lead to conflicting updates of the (pri- 
mal) variables common to the constraints. The con- 
flict can be resolved by solving first for the projections 
for all constraints simultaneously, but retaining only 
the dual solutions and discarding the primal variables. 
Then, common values of the primal variables are cal- 
culated from the duals using dual feasibility. The con- 
vergence of this algorithm, using suitable underrelax- 
ation, is established in [2]. Jacobi algorithms generally 
allow more parallelism than Gauss-Seidel algorithms 
but have poorer convergence properties because they 
operate on partially outdated data. 
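The following is a minimal sketch of the Jacobi variant for an equality-constrained separable quadratic. Simple bounds are omitted for brevity, so $\lambda = 0$. All row parameters $\beta_i$ are computed from the same, possibly outdated, primal point, and only the dual prices are kept, with the under-relaxation factor $1/m$. The primal point is then recomputed from dual feasibility. The factor $1/m$ is our choice: it makes the new dual point the average of the $m$ individual row-action updates, so by concavity of the dual function the dual objective never decreases. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def jacobi_qp(w: Sequence[float], q: Sequence[float],
              H: Sequence[Sequence[float]], r: Sequence[float],
              iters: int = 500) -> Tuple[List[float], List[float]]:
    """Jacobi row-action sketch for min sum_j 0.5*w_j*z_j**2 + q_j*z_j s.t. H z = r."""
    n, m = len(w), len(H)
    pi = [0.0] * m
    z = [-q[j] / w[j] for j in range(n)]
    for _ in range(iters):
        # All projections use the same z; keep only the dual parameters beta_i.
        betas = [(r[i] - sum(H[i][j] * z[j] for j in range(n)))
                 / sum(H[i][j] ** 2 / w[j] for j in range(n)) for i in range(m)]
        for i in range(m):
            pi[i] -= betas[i] / m  # under-relaxed dual update
        # Recover a common primal point from w_j z_j + q_j = -(H^T pi)_j.
        z = [(-q[j] - sum(H[i][j] * pi[i] for i in range(m))) / w[j]
             for j in range(n)]
    return z, pi


# Two overlapping (nondisjoint) rows sharing the variable z_2.
z, pi = jacobi_qp([1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                  [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], [1.0, 1.0])
print(z, pi)
```

The two rows share $z_2$, so a Gauss-Seidel sweep would have to process them one after the other. Here they are handled simultaneously, at the price of a slower contraction per iteration.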
Massively Parallel Implementations


The RA algorithm was implemented on the massively parallel Connection Machine CM-2 [8], which is a single-instruction, multiple-data (SIMD) computer having up to 65536 processors implemented as 4096 physical chips organized as a 12-dimensional hypercube. Each processor has 32 Kbytes of local memory, and there is a floating-point unit for each cluster of 32 processors. The processors, which operate at 7 MHz, can each simulate a number of virtual processors (VPs). This allows the programmer to address the machine as if it had the number of processors required by a specific parallel algorithm.

Stochastic network problems were mapped upon 
the machine following the scheme of [27] for each sce- 
nario network problem. This mapping was found in 
[12] to be the most efficient data structure. A linearly- 
organized cluster of VPs were assigned to each node i, 
consisting of |5*| + |8;| + 1 VPs which calculate, in 
parallel, dual feasible flows on the node’s incident arcs 
(satisfying arc bounds), and cooperate efficiently in cal- 
culating the resulting node surplus/deficit, and f in (4), 
leading to an updated dual node price. Clusters of pro- 
cessors associated with adjacent nodes then exchange 
dual prices through a global send operation. This oper- 
ation is the only operation that does not use the efficient 
hypercube communication pattern. 

Each scenario subproblem is represented the same 
way but in such a way that processors associated with 
corresponding variables in different scenarios have di- 
rect communication links. This allows for efficient cal- 
culation of the NA projections (5) across scenarios. The 
algorithm hence alternates between flow conservation 
and bounds constraints projections within each sce- 
nario network — in parallel across nodes and scenar- 
ios — and enforcement of nonanticipativity constraints 
across scenarios. Experimentation with the balancing between these two constraint types is reported in [13] and [16]; generally, 25 to 100 network iterations between nonanticipativity projections worked well. [16] also reports on the choice of the penalty parameter values in (7).
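A minimal sketch of an NA projection, assuming the common two-stage setting in which every scenario carries its own copy of the first-stage vector and nonanticipativity requires all copies to coincide. Under a probability-weighted Euclidean norm, the projection replaces every copy by the probability-weighted average. The function name `na_project_two_stage` is hypothetical, and the data layout is an illustrative choice, not the CM-2 data structure described above.

```python
def na_project_two_stage(probs, first_stage_copies):
    """Project scenario copies of the first-stage vector onto the NA subspace.

    probs: scenario probabilities p_s (assumed positive).
    first_stage_copies: one list of floats per scenario, all of equal length.
    Returns, for every scenario, the common vector sum_s p_s x_s / sum_s p_s,
    which minimizes sum_s p_s * ||z - x_s||^2 over z.
    """
    total = sum(probs)
    dim = len(first_stage_copies[0])
    mean = [sum(p * x[i] for p, x in zip(probs, first_stage_copies)) / total
            for i in range(dim)]
    return [list(mean) for _ in first_stage_copies]
```

For example, with probabilities (0.5, 0.25, 0.25) and copies (1, 2), (3, 4), (5, 6), every scenario receives (2.5, 3.5). In an alternating scheme like the one above, a step of this kind would follow each batch of scenario-network iterations.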


Extension to Multistage SPs 


Although two-stage stochastic programs go a long way toward properly incorporating uncertainty, they suffer from the problem of 'anticipativity': at the time of the second-stage decision, all uncertainties, even those beyond the second stage, are known to the program, which in effect permits a super-optimal decision. Addressing the realistic requirement that there should be more than two stages, so that decisions at any stage but the last are still made under further uncertainty, leads to multistage SPs (MSPs).

A T-stage stochastic programming problem can be 
formulated as follows [3]: 


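$$
\begin{aligned}
\min_{x_1}\; & c_1 x_1 + E_{\xi_2}\Big[\min_{x_2} c_2 x_2 + E_{\xi_3 \mid \xi_2}\Big[\min_{x_3} c_3 x_3 + \cdots + E_{\xi_T \mid \xi_2, \dots, \xi_{T-1}}\Big[\min_{x_T} c_T x_T\Big]\cdots\Big]\Big] \\
\text{s.t.}\; & A_1 x_1 = b_1, \\
& B_2 x_1 + A_2 x_2 = b_2, \\
& B_3 x_2 + A_3 x_3 = b_3, \\
& \qquad \vdots \\
& B_T x_{T-1} + A_T x_T = b_T, \\
& 0 \le x_t \le u_t \quad \text{for } t = 1, \dots, T,
\end{aligned}
$$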
where

$$\xi_t = (A_t, B_t, b_t, c_t), \quad t = 2, \dots, T,$$

are random variables, i.e., $\mathcal{F}_t$-measurable functions $\xi_t \colon \Omega_t \to \mathbb{R}^{M_t}$ on some probability spaces $(\Omega_t, \mathcal{F}_t, P_t)$. The decision variables $x_t \in \mathbb{R}^{n_t}$, for $t = 2, \dots, T$, are stochastic variables measurable on the σ-field generated by $\xi_t$. The notation $E_{\xi_2}$ denotes mathematical expectation with respect to $\xi_2$, and $E_{\xi_3 \mid \xi_2}$ similarly denotes conditional expectation. The sequential nature of the decision process is apparent from this formulation: at stage $t$, $x_t$ is decided so as to minimize the expected cost of the subsequent stages, conditional upon the events realized up to that stage.


[Figure: scenario tree with levels for stage 1 (t = 1), stage 2 (t = 2) and stage 3 (t = 3 = T)]
A scenario tree can be used to represent the way in which the stochastic variables $\xi_t$ evolve: the root of the tree corresponds to the immediately observable, deterministic data $A_1$, $B_1$, $b_1$ and $c_1$. The nodes of the tree at level $t \ge 2$ correspond to possible realizations of $\xi_t$. In this tree, a scenario corresponds to a complete path from the root to a leaf. As in the two-stage case, variables belonging to stages prior to the last are replicated for each scenario, and appropriate NA constraints are used to enforce the proper correspondence among scenarios:


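[Figure: split-variable representation with stages 1, 2 and 3 from top to bottom and one column of decision-variable nodes per scenario; double lines mark the NA constraints]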

The nodes here represent sets of decision variables 
belonging to various stages (top to bottom) and scenar- 
ios (left to right). The double lines represent NA con- 
straints; cf. the scenario tree above. This problem repre- 
sentation maps naturally onto a rectangular communication pattern, as on the CM-2, and was solved in [16], as in the two-stage case, by alternating between iterations on the scenario networks and on the NA constraints.
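The correspondence enforced by the double lines can be sketched in a few lines. This is a minimal illustration, assuming that each scenario is encoded as the tuple of tree-node labels it visits at stages 2 to T and that decisions are scalars. Two scenarios must share their stage-(t+1) decision exactly when their paths share the first t labels, and the projection averages each such group with probability weights. The function name and encoding are hypothetical.

```python
from collections import defaultdict


def na_project_tree(paths, probs, x):
    """Project stage-wise decisions onto the NA constraints of a scenario tree.

    paths[s]: tuple of node labels visited by scenario s at stages 2..T.
    probs[s]: probability of scenario s (assumed positive).
    x[s][t]:  decision of scenario s at stage t + 1 (scalars for brevity).
    Scenarios whose paths share the prefix paths[s][:t] pass through the same
    stage-(t+1) node and must share that decision.
    """
    n_stages = len(x[0])
    out = [list(row) for row in x]
    for t in range(n_stages):
        groups = defaultdict(list)
        for s, path in enumerate(paths):
            groups[path[:t]].append(s)  # t = 0: empty prefix, i.e. the root
        for members in groups.values():
            mass = sum(probs[s] for s in members)
            avg = sum(probs[s] * x[s][t] for s in members) / mass
            for s in members:
                out[s][t] = avg
    return out
```

For a three-stage tree with stage-2 nodes A and B, each with two leaves, the stage-1 decision is averaged over all four scenarios, the stage-2 decision over the two scenarios below A (and below B), and the stage-3 decision is left free.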


Computational Experiments 


The algorithms covered in this article were subjected 
to extensive numerical experimentation on the major 
types of stochastic network problems: 


Nonlinear Two-Stage Problems


The row-action algorithm was implemented on the CM-2 and used to solve large-scale quadratic problems in [13]. The authors report that the algorithm scales very effectively in problem size and number of processors: for instance, doubling both results in virtually the same solution time. The largest problem solved had 8,192 scenarios and a deterministic, nonlinear equivalent of 868,367 constraints and 2,747,017 variables. It was solved to a tight primal tolerance in about 11 minutes using 32K processors, achieving a computational rate of 276 MFLOPS. The algorithm's performance is sensitive to the range of multipliers occurring in the generalized networks, and it deteriorates as this range increases.

Problems of this size could not be solved using any 
other available algorithm. The RA algorithm was, how- 
ever, competitive with standard algorithms on smaller 
problems. 


Linear Two-Stage Problems 


The PMD algorithm in conjunction with the row-action algorithm was used in [14] on two-stage problems with linear objectives. The authors conclude that the relative advantage of this algorithm over standard codes (MINOS 5.3 simplex and OSL interior point) lies in the solution of large and very large problems. Solving the largest problems, with 2,048 scenarios (deterministic equivalent of 217,103 constraints and 618,529 variables), took more than 3 hours of 32K CM-2 processing time, but a problem of this size could not be solved
using the simplex or interior point algorithms. It also 
appears that the PMD/RA solution times scale nearly 
linearly in problem size, whereas the comparison codes 
had close to quadratic time complexity. It is apparent 
that the solution of linear, as opposed to strictly convex, 
problems takes substantially longer due to the overhead 
of the nested PMD/RA algorithms. 


Multistage Problems 


Finally, linear multistage problems with up to 9 stages 
were solved in [16]. Results mirror those stated above, 
namely effective scalability and the ability to make 
progress on very large scale problems, even with the 
complex nonanticipativity structure of a 9-stage prob- 
lem. OSL is generally superior to the PMD/RA imple- 
mentation for small to medium-sized problems, but 
cannot solve the large instances. 

For further material on the parallel and massively 
parallel solution of large scale stochastic programs, see 
also [22] and [23]. 
Introduction


A significant number of optimal stopping problems of practical interest can only be solved through numerical schemes. As many of them have surfaced in the area of mathematical finance, illustrations drawn from that field will be used to describe some of these numerical approaches. Specifically, we consider problems in the expected-value maximization form

$$\sup_{\tau \in \mathcal{T}} E[f(X_\tau, \tau)], \tag{1}$$

where $\mathcal{T}$ is a set of stopping times, $f$ a measurable function and $\{X_t\}_{t \in I} = X$ a Markov process, where $I$ is a time index set that can be either discrete or continuous (see AitSahlia [1] for additional details).

Under technical conditions for its existence, a solution of (1) consists of
• the value function $V(x, t) = \sup_{\tau \in \mathcal{T}_t} E[f(X_\tau, \tau) \mid X_t = x]$, where $\mathcal{T}_t$ is the set of stopping times subsequent to $t$ in $\mathcal{T}$;
• the optimal stopping time $\tau_t^* = \arg\max_{\tau \in \mathcal{T}_t} E[f(X_\tau, \tau)]$.
In this context, with $E$ denoting the state space of $X$, the set $E \times I$ is partitioned into a closed set $S$ and its complement $C$, labeled, respectively, the stopping and continuation regions. Then

$$\tau_t^* = \inf\{s \ge t : X_s \in S\}. \tag{2}$$
Discrete-Time Models


When $I = \{0, 1, \dots, N\}$ for some given $N < \infty$, the most straightforward numerical device is the backward recursive dynamic programming algorithm


$$V(x, N) = f(x, N), \tag{3}$$

$$V(x, n) = \max\{E[V(X_{n+1}, n+1) \mid X_n = x],\ f(x, n)\}, \quad 0 \le n \le N - 1. \tag{4}$$
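To make the recursion (3)-(4) concrete, here is a minimal sketch that applies it to the American put used as an example later in this entry, with $X$ approximated by a Cox-Ross-Rubinstein binomial tree. The tree, its parameters and the function name are illustrative assumptions, not part of this entry. Because $f(x, t) = e^{-rt}\max(K - x, 0)$ already carries the discount factor, the conditional expectation in (4) is taken under the risk-neutral tree probability without any further discounting.

```python
import math


def american_put_binomial(s0, k, r, sigma, t, n):
    """Backward recursion (3)-(4) on a CRR binomial tree (illustrative sketch).

    The payoff f(x, m) = exp(-r * m * dt) * max(k - x, 0) is already
    discounted, so E[V(X_{m+1}, m+1) | X_m = x] is a plain q-weighted average.
    """
    dt = t / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability

    def f(x, step):
        return math.exp(-r * step * dt) * max(k - x, 0.0)

    def price(step, j):  # stock price after j up-moves out of `step` moves
        return s0 * u**j * d**(step - j)

    # (3): V(x, N) = f(x, N)
    v = [f(price(n, j), n) for j in range(n + 1)]
    # (4): V(x, m) = max{E[V(X_{m+1}, m+1) | X_m = x], f(x, m)}
    for step in range(n - 1, -1, -1):
        v = [max(q * v[j + 1] + (1.0 - q) * v[j], f(price(step, j), step))
             for j in range(step + 1)]
    return v[0]
```

With $S_0 = K = 100$, $r = 0.05$, $\sigma = 0.2$, $T = 1$ and 500 steps, the result should be approximately 6.09, slightly above the corresponding European put value.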


A difficulty with the above is implementing a proper numerical scheme to estimate $E[V(X_{n+1}, n+1) \mid X_n = x]$, especially in light of the so-called curse of dimensionality, which makes this algorithm inefficient in high dimensions. There are potentially two remedies to this problem: one, for finite $I$, based on Monte Carlo simulation, and another, for infinite $I$, based on large-scale linear programming (LP). For the former, an efficient and popular algorithm is that of Longstaff and Schwartz [10], which is now viewed within the wider context of approximate dynamic programming (see also [3,11]). The basic idea of this algorithm is to use Monte Carlo simulation and least-squares regression to estimate $E[V(X_{n+1}, n+1) \mid X_n = x]$.
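A minimal sketch of the least-squares Monte Carlo idea for the same American put, assuming geometric Brownian motion paths, exercise dates on an equally spaced grid, and a quadratic polynomial in $S/K$ as the regression basis. The basis, the restriction to in-the-money paths and all names are illustrative choices in the spirit of [10], not a reproduction of it.

```python
import numpy as np


def lsm_american_put(s0, k, r, sigma, t, n_steps, n_paths, seed=0):
    """Least-squares Monte Carlo for an American put (illustrative sketch).

    Exercise is allowed at dt, 2*dt, ..., t and at time 0. The continuation
    value is regressed on 1, x, x^2 with x = S / K, over in-the-money paths.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_incr = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    s = s0 * np.exp(np.cumsum(log_incr, axis=1))  # s[:, i] = price at (i+1)*dt
    disc = np.exp(-r * dt)

    cash = np.maximum(k - s[:, -1], 0.0)  # cash flow if held to maturity
    for i in range(n_steps - 2, -1, -1):
        cash *= disc  # cash flows now valued at time (i+1)*dt
        exercise = np.maximum(k - s[:, i], 0.0)
        itm = exercise > 0.0
        if itm.sum() > 3:
            x = s[itm, i] / k
            coeffs = np.polyfit(x, cash[itm], 2)  # regression estimate of E[.|X]
            continuation = np.polyval(coeffs, x)
            idx = np.flatnonzero(itm)[exercise[itm] > continuation]
            cash[idx] = exercise[idx]  # exercise now on these paths
    price = disc * cash.mean()  # discount from time dt back to time 0
    return max(price, k - s0)  # immediate exercise is also allowed
```

The regression only decides when to exercise. The price is then the average of realized discounted cash flows, which typically makes the estimate slightly low-biased.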

For infinite $I$, the value function is time-homogeneous when $X$ and $f$ are. In this case the value function solves

$$V(x) = \sup_{\tau \in \mathcal{T}_t} E[f(X_\tau) \mid X_t = x] \tag{5}$$

for all $t \in \{0, 1, \dots\}$ and may be obtained through an LP algorithm thanks to its Snell envelope characterization [1]. Assuming a transition matrix $P$ for $X$ and a finite state space, which might be genuine or the result of a truncation, the resulting LP is

$$
\begin{aligned}
\text{minimize} \quad & \sum_x V(x) \\
\text{subject to} \quad & V(x) \ge \sum_y P_{xy} V(y), \\
& V(x) \ge f(x), \\
& V(x) \ge 0 \quad \text{for all states } x.
\end{aligned}
$$


See Cinlar [4] and Dynkin and Yushkevich [5] for 
proofs and further details. 
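A minimal sketch of this LP using SciPy's `linprog`, assuming a small finite chain given as a dense transition matrix. The function name is hypothetical. The constraints $V \ge PV$ and $V \ge f$ are rewritten in the `A_ub @ V <= b_ub` form that `linprog` expects.

```python
import numpy as np
from scipy.optimize import linprog


def snell_envelope_lp(P, f):
    """Smallest excessive majorant of f for a finite Markov chain, via LP.

    minimize sum_x V(x)  s.t.  V >= P V,  V >= f,  V >= 0.
    """
    P = np.asarray(P, dtype=float)
    f = np.asarray(f, dtype=float)
    n = len(f)
    eye = np.eye(n)
    # (P - I) V <= 0 encodes V >= P V;  -V <= -f encodes V >= f
    A_ub = np.vstack([P - eye, -eye])
    b_ub = np.concatenate([np.zeros(n), -f])
    res = linprog(c=np.ones(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * n, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x
```

For a symmetric random walk on $\{0, \dots, 4\}$, absorbed at both ends, with $f = (0, 2, 1, 3, 0)$, the LP returns $V = (0, 2, 2.5, 3, 0)$. This is the smallest concave majorant of $f$, as expected for this chain. The stopping region is $\{0, 1, 3, 4\}$, the set of states where $V = f$.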


Continuous-Time Models 


When both $X$ and its time index $I$ are continuous, there are a number of numerical schemes to generate solutions of (1). Broadly, they approximate either the underlying diffusion process $X$ by a discrete version, or the value function and its derivatives in a characterizing expression (e.g., an integral representation or a partial differential equation).

• Weak-convergence approximation approach: The most general scheme in this approach is to approximate the infinitesimal operator $\mathcal{L}$ of $X$ in the free-boundary problem that characterizes the solution of (1). For example, a finite-difference approximation of the derivatives in the free-boundary problem

$$
\begin{aligned}
\mathcal{L}V &= 0 && \text{in } C, \\
V &= f && \text{on } E \times \{T\}, \\
\frac{\partial V}{\partial x} &= \frac{\partial f}{\partial x} && \text{on } \partial S
\end{aligned}
$$

leads to the formulation of an optimal stopping problem for a Markov chain (see Kushner and Dupuis [8]).
If the process $X$ is explicitly expressed in terms of Brownian motion, then random-walk approximations can be applied directly to the latter. This is a fairly well understood procedure for which rates of convergence have been established (see Lamberton [9]).
• Integral equation approach: In this scheme, one makes use of the Doob-Meyer decomposition formula for submartingales (see Karatzas and Shreve [7]) to express the value function $V$ in terms of the boundary, which itself solves a related integral equation. For example, consider a case in American option pricing, with horizon $T$, payoff function $f(x, t) = e^{-rt}\max(K - x, 0)$, and $X_t = X_0 \exp\{(r - \sigma^2/2)t + \sigma W_t\}$, where $K > 0$, $r > 0$ and $\sigma > 0$ are given and $\{W_t\}_t$ is a standard Brownian motion started at 0. Then the value function $V$ can be decomposed as

$$V(x, t) = U(x, t) + \int_t^T rK e^{-rs}\, \Phi\big(-d(x, B(s), s - t)\big)\, ds, \tag{6}$$

where $\Phi$ is the cumulative standard normal distribution function,

$$d(x, y, t) = \frac{\ln(x/y) + (r + \sigma^2/2)t}{\sigma\sqrt{t}} - \sigma\sqrt{t},$$

and $U(x, t) = Ke^{-rT}\Phi(-d(x, K, T - t)) - xe^{-rt}\Phi(-d(x, K, T - t) - \sigma\sqrt{T - t})$ is the discounted value of the corresponding European put. This formula requires knowledge of the boundary $B = \partial S$, where $S$ is the stopping region that identifies the optimal stopping time (2). The boundary is obtained as the solution of the integral equation

$$e^{-rt}(K - B(t)) = U(B(t), t) + \int_t^T rK e^{-rs}\, \Phi\big(-d(B(t), B(s), s - t)\big)\, ds. \tag{7}$$

Efficient and accurate spline approximations of $B$ can be found in AitSahlia and Lai [2].

• Linear complementarity approach: An alternative that does not require the explicit determination of the optimal stopping boundary relies on the variational inequality formulation

$$
\begin{aligned}
\min\{-\mathcal{L}V,\ V - f\} &= 0 && \text{on } E \times [0, T), \\
V &= f && \text{on } E \times \{T\}.
\end{aligned}
$$

Finite-difference approximations then lead to a linear complementarity problem (see Huang and Pang [6] and Wilmott et al. [12]).
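A minimal sketch of the linear complementarity route for the American put under the Black-Scholes dynamics of the integral-equation example. Several choices here are assumptions made for illustration only: undiscounted time-$t$ prices, so the operator carries the $-rV$ term; fully implicit finite differences in time to maturity on a truncated grid $[0, S_{\max}]$; and projected successive over-relaxation (PSOR), a common solver for the resulting linear complementarity problem. The function name and default grid sizes are hypothetical.

```python
def american_put_psor(s0, k, r, sigma, t, s_max=200.0, m=200, n=100,
                      omega=1.2, tol=1e-8):
    """Implicit finite differences + projected SOR for the American put LCP.

    Undiscounted prices V(S, tau), tau = time to maturity, on S_i = i * ds.
    Each time step solves  A v >= rhs,  v >= payoff,
    (A v - rhs) * (v - payoff) = 0  componentwise.
    """
    ds = s_max / m
    dtau = t / n
    payoff = [max(k - i * ds, 0.0) for i in range(m + 1)]
    # Fully implicit Black-Scholes stencil:
    # -alpha_i v_{i-1} + beta_i v_i - gamma_i v_{i+1} = rhs_i
    alpha = [0.5 * dtau * (sigma**2 * i**2 - r * i) for i in range(m + 1)]
    beta = [1.0 + dtau * (sigma**2 * i**2 + r) for i in range(m + 1)]
    gamma = [0.5 * dtau * (sigma**2 * i**2 + r * i) for i in range(m + 1)]

    v = payoff[:]  # value at maturity (tau = 0)
    for _ in range(n):
        rhs = v[:]
        v[0], v[m] = k, 0.0  # boundary values at S = 0 and S = s_max
        while True:  # PSOR sweeps, warm-started from the previous time step
            err = 0.0
            for i in range(1, m):
                gs = (rhs[i] + alpha[i] * v[i - 1] + gamma[i] * v[i + 1]) / beta[i]
                new = max(payoff[i], v[i] + omega * (gs - v[i]))  # projection
                err += (new - v[i]) ** 2
                v[i] = new
            if err < tol:
                break
    j = min(int(s0 / ds), m - 1)  # linear interpolation at s0
    w = (s0 - j * ds) / ds
    return (1.0 - w) * v[j] + w * v[j + 1]
```

The projection `max(payoff[i], ...)` is what turns an ordinary SOR sweep into a solver for the complementarity conditions. No exercise boundary is tracked explicitly, in line with the text above.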
Introduction


A typical stochastic optimal stopping problem of practical interest consists of the following optimization:

$$\sup E[f(X_\tau, \tau)] \quad \text{s.t. } \tau \in \mathcal{T}, \tag{1}$$

where $\{X_t\} = X$ is a stochastic process known as the state process, $E$ its associated expectation operator, $f$ a function measurable with respect to the probability law induced by $X$, and $\mathcal{T}$ a set of stopping times to be defined shortly. In many applications $f(X_\tau, \tau)$ is interpreted as the gain resulting from stopping at time $\tau$ when the state value is $X_\tau$.

A financial example that has been the subject of great interest in mathematical finance/financial engineering is one with $f(x, t) = e^{-rt}\max(K - x, 0)$, where $K > 0$ is given, and $X$ is the geometric Brownian process

$$X_t = X_0 \exp\{(r - \sigma^2/2)t + \sigma B_t\},$$

where $r$ and $\sigma$ are given positive constants and $\{B_t\}$ is a standard Brownian motion started at 0. In finance, $f(X_\tau, \tau)$ represents the discounted payoff that results from the exercise at time $\tau$ of a put stock option by its holder, who is allowed to sell this stock at share price $K$ when it is traded at price $X_\tau$. The option holder's problem is to find the best time to exercise this option, thus maximizing its payoff, a problem that is mathematically expressed as (1). As will be made precise soon, this optimal exercise time (or, more generally, stopping time) must be determined only on the basis of past observations. It should be mentioned here that $\{B_t\}$ can also be considered the state process, instead of $X$.

Note that the payoff function $f$ as expressed in (1) does include path-dependent payoffs, through the usual introduction of additional variables to render a problem Markovian. For example, still in the financial realm, one may consider the payoffs $e^{-rt}(M_t - X_t)$ or $e^{-rt}\max(K - A_t, 0)$, which depend on the maximum process $M_t = \max_{s \le t} X_s$ or the average process $A_t = (1/t)\int_0^t X_s\, ds$.

Stochastic optimal stopping theory, or optimal stop- 
ping as it is customarily known, is a specialized type 
of the (stochastic) dynamic programming approach de- 
vised by Bellman [1] in the 1950s. However, actual 
optimal stopping problems originated in Wald’s work 
on sequential statistical inference (Wald [4]), where 
the problem is to determine sequentially the sample 
size that will decide between two statistical hypothe- 
ses. Ever since these early days, this field has experi- 
enced several developments in both theory and appli- 
cations as described for example in the book of Peskir 
and Shiryaev [3]. 

Optimal stopping problems are generally ap- 
proached from a probabilistic perspective through mar- 
tingales and Markov processes. When the underlying 
process X in (1) is a diffusion, they also lead to free- 
boundary problems for partial differential equations. 
Optimal stopping problems are rarely solved in closed
form, and numerical methods abound, a topic addressed
in a companion entry in this Encyclopedia.


Definitions 


This section sets up basic definitions that lead to the 
notion of stopping time. As mentioned before, the deci- 
sion to stop at time t must be based only on information 
available up to $t$. In this respect, the concept of an information set in the form of a filtration is first formally presented, followed by that of a stopping time.
• Discrete-time filtration
Given a probability space $(\Omega, \mathcal{F}, P)$, a discrete-time filtration is a collection $(\mathcal{F}_n)_{n \ge 0}$ where each $\mathcal{F}_n$ is a σ-algebra of subsets of $\Omega$ such that $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}$. $\mathcal{F}_n$ represents the information available up to time $n$. It generally contains at least the events that have been determined by the realized values of $X_k$ up to time $n$. The latter form the natural filtration of $X$, which is often augmented to form $(\mathcal{F}_n)_{n \ge 0}$.
• Continuous-time filtration
Here the definition is essentially identical to the previous one, modulo a technical condition. Given a probability space $(\Omega, \mathcal{F}, P)$, a continuous-time filtration is a collection $(\mathcal{F}_t)_{t \ge 0}$ where each $\mathcal{F}_t$ is a σ-algebra of subsets of $\Omega$ such that $\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$ for $s \le t$. As in the discrete-time case, $\mathcal{F}_t$ represents the information up to time $t$. Additionally, it is assumed that each $\mathcal{F}_t$ contains all $P$-null sets in $\mathcal{F}$ and that $(\mathcal{F}_t)_{t \ge 0}$ is right-continuous, i.e., $\mathcal{F}_t = \bigcap_{s > t} \mathcal{F}_s$ for all $t \ge 0$.
• Stopping time
Let $I = \{0, 1, 2, \dots\}$ and $I = [0, \infty)$ when $X$ is, respectively, a discrete-time process and a continuous-time process. A random variable $\tau \colon \Omega \to I$ is a stopping time if $P\{\tau < \infty\} = 1$ and $\{\tau \le t\} \in \mathcal{F}_t$ for all $t \ge 0$. Often the set $I$ is bounded, and the former condition then holds trivially. The latter condition expresses the fact that the decision to stop at time $t$ must be based solely on information up to time $t$. In this sense $\tau$ is adapted to the filtration $(\mathcal{F}_t)_{t \ge 0}$.


Solution Methods 


There are generally two approaches to solving (1): one based on probabilistic tools and another on partial differential equation (PDE) techniques. However, both start by using the dynamic programming principle of optimality to derive the so-called Bellman equation. When the interval $I$ is of the form $[0, T]$ or $\{1, 2, \dots, N\}$, define $\mathcal{T}_t$ to be, respectively, the set of stopping times in $[t, T]$ and $\{t, t+1, \dots, N\}$. When $I$ is infinite, $\mathcal{T}_t$ is defined as the set of stopping times in $I$ that are $\ge t$. Then solving (1) is tantamount to determining
• the value function $V(x, t) = \sup_{\tau \in \mathcal{T}_t} E[f(X_\tau, \tau) \mid X_t = x]$, and
• the optimal stopping time $\tau_t^* = \arg\max_{\tau \in \mathcal{T}_t} E f(X_\tau, \tau)$.
A sufficient condition guaranteeing the finiteness of the expectation in (1) is

$$E\Big(\sup_{t \in I} |f(X_t, t)|\Big) < \infty,$$

which can in fact be relaxed in a number of ways.


(I) The Probabilistic Approach: Martingales

When $I = \{0, 1, \dots, N\}$, then by the optimality principle of dynamic programming we can write the recursion

$$V(x, N) = f(x, N), \tag{2}$$

$$V(x, n) = \max\{E[V(X_{n+1}, n+1) \mid X_n = x],\ f(x, n)\}, \quad 0 \le n \le N - 1. \tag{3}$$

The solution of the system (2)-(3) induces a sequence of random variables $S_n = V(X_n, n)$ that satisfies the following properties:
(i) $S_n = \max\{E[S_{n+1} \mid \mathcal{F}_n],\ f(X_n, n)\}$;
(ii) $(S_k)_{n \le k \le N}$ is the smallest supermartingale that dominates the gain process $(G_k)_{n \le k \le N} = (f(X_k, k))_{n \le k \le N}$ (i.e., $S_k \ge G_k$ $P$-a.s.);
(iii) the stopping time $\tau_n^* = \inf\{n \le k \le N : S_k = G_k\}$ is optimal for $0 \le n \le N$;
(iv) the stopped sequence $(S_{k \wedge \tau_n^*})_{n \le k \le N}$ is a martingale.
We recall here that a discrete-time process $(M_n)_n$ is a martingale with respect to a filtration $(\mathcal{F}_n)_n$ (martingale for short) if $E|M_n| < \infty$ for $n \ge 0$ and $E(M_{n+1} \mid \mathcal{F}_n) = M_n$, $P$-a.s., for $n \ge 0$. Correspondingly, $(M_n)_n$ is a supermartingale if $E(M_{n+1} \mid \mathcal{F}_n) \le M_n$, $P$-a.s., $n \ge 0$. The process $(S_n)_{0 \le n \le N}$ is called the Snell envelope, and the above characterization is particularly useful for obtaining the value function $V$ through linear programming when the state space is finite (see Cinlar [2], p. 212).
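Properties (i) to (iv) are easy to check numerically. The following is a minimal sketch that assumes a simple symmetric random walk $X_n = x_0 + \xi_1 + \cdots + \xi_n$ with fair $\pm 1$ steps and takes its natural filtration. The time-decaying gain $f(x, n) = 0.9^n \max(x, 0)$ used in the example is hypothetical. The code computes $S_n$ by (2)-(3) and then reads off $\tau^*$ along a given path. All names are illustrative.

```python
def snell_envelope_random_walk(x0, n_steps, f):
    """S[n][j] = V(X_n, n) from recursion (2)-(3) for a simple symmetric
    random walk started at x0; node (n, j) has j up-moves, X_n = x0 + 2j - n."""
    S = [None] * (n_steps + 1)
    S[n_steps] = [f(x0 + 2 * j - n_steps, n_steps) for j in range(n_steps + 1)]
    for n in range(n_steps - 1, -1, -1):
        S[n] = [max(0.5 * S[n + 1][j] + 0.5 * S[n + 1][j + 1],  # E[S_{n+1} | F_n]
                    f(x0 + 2 * j - n, n))                        # gain G_n
                for j in range(n + 1)]
    return S


def optimal_stopping_time(S, moves, x0, f):
    """tau* = inf{n : S_n = G_n} along one path; moves[n] in {+1, -1}."""
    j = 0  # number of up-moves so far
    for n in range(len(S)):
        if S[n][j] == f(x0 + 2 * j - n, n):
            return n
        if moves[n] == 1:
            j += 1
    return len(S) - 1  # not reached: S_N = G_N always holds
```

With $x_0 = 0$ and $N = 1$, for instance, $S_0 = \max\{0.5 \cdot 0.9 \cdot 1 + 0.5 \cdot 0,\ 0\} = 0.45$. For longer horizons, every node satisfies $S_n \ge G_n$ and $S_n \ge E[S_{n+1} \mid \mathcal{F}_n]$, as (ii) requires.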
The generalization of the above result to the case where $I$ is countably infinite requires that the sequence $S_n = V(X_n, n)$ be characterized differently, through the concept of essential supremum defined below, which in some sense generalizes that of the deterministic supremum.
Essential Supremum. Let $I$ be an arbitrary set and $(Z_n)_{n \in I}$ be a collection of random variables defined on the same probability space. Then there exists a countable subset $J \subseteq I$ such that $Z^* = \sup_{n \in J} Z_n$ satisfies
(a) $Z_n \le Z^*$ $P$-a.s. for each $n \in I$;
(b) for any other random variable $Z$ such that $Z_n \le Z$ $P$-a.s. for each $n \in I$, we have $Z^* \le Z$ $P$-a.s.
The random variable $Z^*$ is called the essential supremum and is denoted by $\operatorname{ess\,sup}_{n \in I} Z_n$.
As a consequence, we can now rewrite the above Snell envelope when $I = \{1, 2, \dots, N\}$ as

$$S_n = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_n} E[f(X_\tau, \tau) \mid \mathcal{F}_n],$$

where $\mathcal{T}_n$ is the set of stopping times in $\{n, n+1, \dots, N\}$. When $I$ is countably infinite, $S_n$ is correspondingly defined with $\mathcal{T}_n$ as the set of stopping times in $\{n, n+1, \dots\}$. Similarly, $S_n$ satisfies both conditions (a) and (b) and the optimality property (i) above for all $n \ge 0$.

For the continuous-time case, where $I$ is an interval, the value function for problem (1) is the Snell envelope of the gain process $(f(X_t, t))_t$, defined as

$$S_t = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_t} E[f(X_\tau, \tau) \mid \mathcal{F}_t],$$

where $\mathcal{T}_t$ is the set of stopping times in $[t, T]$ for a finite-horizon problem, or in $[t, \infty)$ otherwise. The Bellman equation in its discrete form (3) is now replaced by

$$V(x, t) \ge \max\{E[V(X_s, s) \mid X_t = x],\ f(x, t)\} \quad \text{for } s \ge t,\ t \in I.$$

Formulation (1) has cast the problem of optimal stopping in a Markovian framework. This is in fact the most common situation in practice, and the set-up is not too restrictive, as it mirrors well the generic martingale situation fully described in Peskir and Shiryaev [3].

(II) The Probabilistic Approach: Markov Property and Stopping Boundary

When $X$ is a Markov process (in discrete or continuous time) with state space $E$, the optimal stopping time is defined as

$$\tau^* = \inf\{t \in I : X_t \in S\},$$

where $S$ is a closed subset of $I \times E$. $S$ and its complement $C$ in $I \times E$ are such that

$$V(x, t) > f(x, t) \ \text{on } C, \qquad V(x, t) = f(x, t) \ \text{on } S.$$

$C$ and $S$ are respectively called the continuation and stopping regions. The intersection $B$ of their closures is called the stopping boundary. It is time-dependent when $I$ is bounded and time-homogeneous when $I$ is unbounded. When $I$ is finite and $E$ is discrete, $B$ can be obtained through the backward recursion (2)-(3).
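For the put example of the Introduction, here is a minimal sketch of how $B$ can be read off the backward recursion once $X$ is replaced by a Cox-Ross-Rubinstein binomial tree. The tree, the parameters and the function name are illustrative assumptions. At each step, the largest node price at which immediate exercise is optimal approximates $B(t)$. Undiscounted time-$t$ prices are used, which identifies the same stopping region as the discounted payoff $e^{-rt}(K - x)^+$ because both the gain and the continuation value are scaled by the same factor $e^{-rt}$.

```python
import math


def put_exercise_boundary(s0, k, r, sigma, t, n):
    """Approximate stopping boundary B(t_m), m = 0..n, of an American put on a
    CRR tree: the largest node price where exercising is optimal (None if the
    stopping region contains no node at that step)."""
    dt = t / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)

    def price(step, j):
        return s0 * u**j * d**(step - j)

    v = [max(k - price(n, j), 0.0) for j in range(n + 1)]
    boundary = [None] * (n + 1)
    boundary[n] = k  # at maturity the put is exercised whenever x < K
    for step in range(n - 1, -1, -1):
        new_v, b = [], None
        for j in range(step + 1):
            cont = disc * (q * v[j + 1] + (1.0 - q) * v[j])  # continuation value
            ex = k - price(step, j)                          # exercise value
            if ex > 0.0 and ex >= cont:                      # node lies in S
                b = price(step, j) if b is None else max(b, price(step, j))
            new_v.append(max(cont, ex))
        boundary[step] = b
        v = new_v
    return boundary
```

On a lattice the boundary is only resolved up to the node spacing, so the computed values zigzag slightly from step to step around the true curve $B(t)$.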
(III) The PDE Approach

When the state process $X$ is a diffusion, the boundary $B$ and the value function $V$ can be obtained by solving a free-boundary problem. Alternatively, when only the value function is of interest, it can be obtained as the solution of a variational inequality. If we let $\mathcal{L}$ denote the infinitesimal operator associated with $X$, then, assuming regularity and differentiability where necessary, the free-boundary problem when $I = [0, \infty)$ is stated as

$$\mathcal{L}V = 0 \ \text{in } C, \qquad \frac{\partial V}{\partial x} = \frac{\partial f}{\partial x} \ \text{on } B. \tag{6}$$

The latter condition is called smooth fit. It is, in a sense, the condition that characterizes the optimality of a solution $V$ of the PDE (6). When $I = [0, T]$ the free-boundary problem becomes

$$
\begin{aligned}
\mathcal{L}V &= 0 && \text{in } C, \\
V &= f && \text{on } E \times \{T\}, \\
\frac{\partial V}{\partial x} &= \frac{\partial f}{\partial x} && \text{on } B.
\end{aligned} \tag{7}
$$

One way to avoid reference to the free boundary $B$ is through the use of a variational inequality. For example, the latter problem with finite horizon $T$ can be re-expressed as

$$
\begin{aligned}
\min\{-\mathcal{L}V,\ V - f\} &= 0 && \text{on } E \times [0, T), \\
V &= f && \text{on } E \times \{T\}.
\end{aligned}
$$
In many applications of stochastic programming there is some uncertainty about the probability distribution $P$ of the random parameters. The incomplete knowledge of the probability distribution can be described by assuming that $P$ belongs to a specified class $\mathcal{P}$ of probability distributions. This in turn suggests the use of the minimax decision rule.

The first results were concerned with stochastic linear programs with recourse; they can be treated within the following more general framework:

$$\min F(x; P) := E_P f(x; \omega) \quad \text{on the set } X \subset \mathbb{R}^n, \tag{1}$$

with $X$ a given set of decisions, $P$ a probability distribution on $(\Omega, \mathcal{B})$, $\Omega \subset \mathbb{R}^m$, and $P$ known to belong to a class $\mathcal{P}$. The random outcome of a decision $x \in X$ is quantified by a function $f$ defined on $X \times \Omega$; $E_P$ denotes the expectation under $P$.

These results were formulated in terms of the two-person zero-sum game

$$(X, \mathcal{P}, F(x; P)). \tag{2}$$


M. Iosifescu and R. Theodorescu [11] suggested using an optimal mixed strategy of the first player in the game (2). J. Žáčková [18] introduced the notion of a minimax solution as an optimal pure strategy of the first player in the game (2). Under quite general assumptions on $\mathcal{P}$ and $F$, a minimax solution exists and

$$\inf_{x \in X} \max_{P \in \mathcal{P}} F(x; P) = \max_{P \in \mathcal{P}} \inf_{x \in X} F(x; P). \tag{3}$$


The minimax decision rule can also be applied in cases where the minimax theorem for the game (2) does not hold. It means solving the problem

$$\min_{x \in X} \max_{P \in \mathcal{P}} F(x; P), \tag{4}$$

hence, applying the best possible decision obtained for the most adverse circumstances considered. This provides a tool for the worst-case analysis of program (1) and allows for constructing bounds for the optimal value of (1) valid for all $P \in \mathcal{P}$.

Applicability of the results depends on the assumed form of the class $\mathcal{P}$, which describes the level of the available information about the probability distribution of the random parameters, and also on the properties of the random objective function $f(x; \omega)$. Let us list some of the most frequent choices of $\mathcal{P}$:

• $\mathcal{P}$ consists of probability distributions carried by $\Omega \subset \mathbb{R}^m$ which fulfill certain moment conditions, e.g.,

$$\mathcal{P} = \{P : E_P g_j(\omega) = y_j,\ j = 1, \dots, J\} \tag{5}$$

with prescribed values $y_j$, $\forall j$ [4,6,8,17,18].

• $\mathcal{P}$ contains probability distributions on $(\Omega, \mathcal{B})$ with fixed marginals [15].

• Additional qualitative information, such as unimodality of $P$, is taken into account [6,8].

• $\mathcal{P}$ consists of probability distributions $P$ with known finite support, i.e., to specify $P$ means to fix the probabilities of the considered atoms (scenarios), taking into account prior knowledge about their partial ordering, etc.; see, e.g., [2].

• $\mathcal{P}$ is a neighborhood of a hypothetical probability distribution $P_0$ [4].

• In principle, $\mathcal{P}$ can also be a parametric family of probability distributions with incomplete knowledge of parameter values.

For convex, compact $\mathcal{P}$, the expectation $F(x; P) = E_P f(x; \omega)$, being linear in $P$, attains its maximal (and minimal) value at extremal points of $\mathcal{P}$. The extremal probability distributions can be characterized independently of the form of the random objective $f$. However, a worst-case probability distribution, say $P^* \in \mathcal{P}$, that is independent of $f$ (and thus independent of the decisions $x$) appears only exceptionally. When this happens, the objective function in (4), $\max_{P \in \mathcal{P}} F(x; P) = F(x; P^*)$, is just the objective function of a standard stochastic program, which is relatively easy to solve due to the relatively simple structure of $P^*$. There are also instances when one can succeed in getting the explicit form of $\max_{P \in \mathcal{P}} F(x; P)$ [6,12]. They relate to classes of one-dimensional probability distributions and to special functions $f$.

The general methodology for solving the inner optimization problem $\max_{P \in \mathcal{P}} F(x; P)$ for a fixed decision $x$ has been elaborated in detail for the classes of probability distributions defined by moment conditions (5), in the form of both equations and inequalities. The extremal probability distributions have finite supports, cf. [14,17], and the solution of the inner problem

$$\max_P \int_\Omega f(x; z)\, dP(z) \tag{6}$$

subject to

$$\int_\Omega dP(z) = 1, \qquad \int_\Omega g_j(z)\, dP(z) = y_j, \quad j = 1, \dots, J, \tag{7}$$

reduces to the solution of a generalized linear program (cf. [3,4,7,9,17]), provided that $\Omega$ is compact and $f(x; \cdot)$ and $g_j$, $\forall j$, are continuous on $\Omega$. The procedure provides both the atoms of the sought worst-case probability distribution and their probabilities. In some cases, it is expedient to analyze the dual program to (6), (7), which reads

$$\min_{u} \sum_{j=1}^{J} u_j y_j + u_0 \tag{8}$$

subject to

$$u_0 + \sum_{j=1}^{J} u_j g_j(z) \ge f(x; z) \quad \forall z \in \Omega. \tag{9}$$

For details and various applications consult [3,4,5,6,7,8,9,13,17].

As an example, let $f(x; \cdot)$ be a convex function on a bounded convex polyhedron $\Omega \subset \mathbb{R}^m$, say $\Omega = \operatorname{conv}\{\omega^{(1)}, \dots, \omega^{(K)}\}$, and

$$\mathcal{P} = \{P : P(\Omega) = 1,\ E_P \omega_j = y_j,\ j = 1, \dots, m\} \tag{10}$$

with $y$ a given interior point of $\Omega$. The constraints of (9),

$$u_0 + \sum_{j=1}^{m} u_j z_j \ge f(x; z),$$

hold for all $z \in \Omega$ if and only if they are fulfilled at the extremal points $\omega^{(1)}, \dots, \omega^{(K)}$. Duality properties imply that only suitable subsets of the set of extremal points of $\Omega$ need to be considered in constructing the finite supports of the worst-case distributions. The generalized linear program (6), (7) reduces to the linear program

$$\max_p \sum_{k=1}^{K} p_k f(x; \omega^{(k)}) \tag{11}$$

subject to

$$\sum_{k=1}^{K} p_k \omega_j^{(k)} = y_j,\ j = 1, \dots, m, \qquad \sum_{k=1}^{K} p_k = 1, \qquad p_k \ge 0\ \ \forall k. \tag{12}$$

Convexity of $f$ with respect to $\omega$ is essential for the above result. Generalization to piecewise convex functions $f(x; \cdot)$ (cf. [5]) is possible. On the other hand, for $f$ concave in $\omega$, the worst-case probability distribution from the class (10) is the degenerate distribution concentrated at the prescribed expected value $E_P \omega$. This degenerate distribution provides the best (i.e., the minimal possible) expectation for convex functions $f(x; \cdot)$ under $P$ belonging to the class $\mathcal{P}$; compare with the Jensen inequality.
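The linear program (11)-(12) is small enough to solve directly once the vertex values $f(x; \omega^{(k)})$ are known for a fixed decision $x$. The following is a minimal sketch using SciPy's `linprog`. It maximizes by minimizing the negated objective. The function name and the examples are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def worst_case_on_vertices(vertices, y, f_vals):
    """Solve LP (11)-(12): max sum_k p_k f_k s.t. sum_k p_k w^(k) = y,
    sum_k p_k = 1, p >= 0, where f_vals[k] = f(x; w^(k)) for a fixed x."""
    V = np.asarray(vertices, dtype=float)  # K x m array of extremal points
    K = V.shape[0]
    A_eq = np.vstack([V.T, np.ones((1, K))])  # moment rows, then total mass
    b_eq = np.concatenate([np.asarray(y, dtype=float), [1.0]])
    res = linprog(c=-np.asarray(f_vals, dtype=float), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * K, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, res.x  # worst-case expectation and atom probabilities
```

On the unit square with $y = (0.5, 0.5)$ and the convex function $f(\omega) = \max(\omega_1, \omega_2)$, whose vertex values are $(0, 1, 1, 1)$, the worst-case expectation is 1, while the Jensen value $f(y)$ is 0.5. Since the square has four vertices in $\mathbb{R}^2$, the feasible set of (12) is not a singleton. On a triangle (a simplex) it is a singleton: the barycentric coordinates of $y$.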

If the set of feasible solutions of (12) is a singleton, the worst-case distribution $P^*$ does not depend on $f$, and we obtain bounds for the optimal values of the stochastic programs (1) under an arbitrary probability distribution $P$ from the class (10) and an arbitrary function $f$ which is convex in $\omega$:

$$\min_{x \in X} f(x; E_P \omega) \le \min_{x \in X} E_P f(x; \omega) \le \min_{x \in X} E_{P^*} f(x; \omega) \quad \forall P \in \mathcal{P}, \tag{13}$$

provided that the minima exist. Such bounds are numerically tractable and tight, and they provide information about the sensitivity of the optimal value of stochastic program (1) to the choice of a probability distribution $P$ belonging to the considered class $\mathcal{P}$. A well-known instance is the class of probability distributions carried by a closed interval $[a, b]$ on the real line with a prescribed value $y \in (a, b)$ of the expectation $E_P \omega$. The worst-case distribution is carried by the endpoints of the given interval $[a, b]$, and the only solution of the system

$$p_1 a + p_2 b = y, \qquad p_1 + p_2 = 1, \qquad p_1, p_2 \ge 0$$

is $p_1 = (b - y)/(b - a)$ and $p_2 = 1 - p_1$. The result agrees with the well-known Edmundson-Madansky inequality, and the minimax approach guarantees that this bound is tight within the considered class of probability distributions and for convex functions $f(x; \cdot)$.
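The two-point bound can be checked numerically. The following is a minimal sketch with a hypothetical helper name. For a fixed decision, it computes the Jensen lower bound $f(y)$ and the Edmundson-Madansky upper bound $p_1 f(a) + p_2 f(b)$ for a convex function of one variable on $[a, b]$ with prescribed mean $y$.

```python
def jensen_and_edmundson_madansky(f, a, b, y):
    """Bounds on E_P f(w) over all P on [a, b] with E_P w = y, f convex.

    lower: Jensen bound f(y), attained by the point mass at y;
    upper: Edmundson-Madansky bound, attained by the two-point distribution
           on {a, b} with p1 = (b - y) / (b - a) and p2 = 1 - p1.
    """
    if not a < y < b:
        raise ValueError("need a < y < b")
    p1 = (b - y) / (b - a)
    p2 = 1.0 - p1
    return f(y), p1 * f(a) + p2 * f(b), (p1, p2)
```

For $f(w) = w^2$ on $[0, 1]$ with $y = 0.3$, the bounds are 0.09 and 0.3, with worst-case weights $(0.7, 0.3)$. With $y = 0.5$, the uniform distribution gives $E w^2 = 1/3$, which lies between the bounds 0.25 and 0.5.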

There is a host of papers devoted to designing vari- 
ous bounds for the objective function F(x, P) of stochas- 
tic programs (1) under various assumptions about the 
class P and the function f(x, -); for a review of the re- 
lated results see [3,4,6,13,17] and the references therein. 
These bounds proved to be useful also in designing al- 
gorithms and this is at present the main field of success- 
ful applications of the minimax approach. 

On the other hand, obtaining minimax decisions is rather demanding, as it requires the solution of the full minimax problem (4). Except for simple special cases, such as a unique feasible discrete distribution that fulfills (7) or an explicit form of the optimal value of the objective function (8), one has to rely on special numerical procedures, such as the stochastic quasi-gradient methods designed for this purpose in [9,10]. These numerical difficulties explain why, in spite of a sound motivation, real-life applications of the minimax approach have been rare and have consisted of simple special cases, e.g., [1,2,6,8,15,16].
We consider an objective function $h(x, \xi)$, to be maximized in a mathematical programming problem, where $\xi$ is a random vector. There are two prominent proposals for incorporating it into a stochastic programming model formulation. The first one is due to H.M. Markowitz [2,3]. It advocates the maximization, with respect to $x$ and subject to the given constraints, of the function

$$\alpha E[h(x, \xi)] - \beta \sqrt{\operatorname{Var}[h(x, \xi)]},$$

where $\alpha > 0$ and $\beta > 0$ are constants. We may choose $\alpha = 1$. Let $\mu(x) = E[h(x, \xi)]$ and $\sigma(x) = \sqrt{\operatorname{Var}[h(x, \xi)]}$. An optimal solution $x_0$ can be characterized by the following two statements:

a) there is no feasible $x$ such that $\mu(x) = \mu(x_0)$, $\sigma(x) < \sigma(x_0)$;

b) there is no feasible $x$ such that $\sigma(x) = \sigma(x_0)$, $\mu(x) > \mu(x_0)$.

We say that the pair $(\mu(x_0), \sigma(x_0))$ is an efficient point among all pairs $(\mu(x), \sigma(x))$.

An important special case is the random objective function $h(x, \xi) = \xi^\top x$, which comes up in portfolio composition problems, where the components of $\xi$ are the random returns of the assets. If we introduce the notations $\mu = E(\xi)$ and $C = E[(\xi - \mu)(\xi - \mu)^\top]$, then the objective function of the stochastic programming problem takes the form

$$\mu^\top x - \beta \sqrt{x^\top C x}.$$

Sometimes $\sqrt{x^\top C x}$ is replaced by $x^\top C x$, in order to obtain a convex quadratic programming problem.

Sometimes $\mu^\top x$ is fixed (or a lower bound is prescribed for it) and $x^\top C x$ is minimized, or $x^\top C x$ is fixed (or an upper bound is prescribed for it) and $\mu^\top x$ is maximized.
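A minimal sketch of the quadratic variant $\mu^\top x - \beta x^\top C x$ for two assets. For illustration only, it assumes a fully invested portfolio without short sales, $x = (w, 1 - w)$ with $0 \le w \le 1$, and maximizes by a plain grid search. The function name and the numbers are hypothetical. A real problem would use a QP solver instead.

```python
def best_two_asset_weight(mu, cov, beta, steps=100):
    """Grid search for max_w mu^T x - beta * x^T C x with x = (w, 1 - w)."""
    best_w, best_val = None, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        x = (w, 1.0 - w)
        mean = mu[0] * x[0] + mu[1] * x[1]
        var = sum(x[a] * cov[a][b] * x[b] for a in range(2) for b in range(2))
        val = mean - beta * var  # quadratic (variance) penalty
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val
```

With $\mu = (0.10, 0.05)$, $C = \operatorname{diag}(0.04, 0.01)$ and $\beta = 1$, the objective reduces to $0.04 + 0.07w - 0.05w^2$. The maximum is therefore at $w = 0.7$, with value $0.0645$, which the grid search recovers.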

The second principle to handle $h(x, \xi)$ is due to S. Kataoka [1]. In this case we formulate the problem

$$\begin{aligned} \max\ & d \\ \text{s.t.}\ & P(h(x, \xi) \ge d) \ge p, \\ & x \in D, \end{aligned}$$

where $p$ is a prescribed probability and $D$ is the set of feasible solutions in the original problem. In the special case $h(x, \xi) = \xi^\top x$, and under the assumption that $\xi$ has a normal distribution (so that $\xi^\top x$ is normal with mean $\mu^\top x$ and variance $x^\top C x$), the above problem can be shown to be equivalent to

$$\max \left\{ \mu^\top x + \Phi^{-1}(1 - p) \sqrt{x^\top C x} \right\} \quad \text{s.t. } x \in D,$$

where $\Phi$ is the univariate standard normal probability distribution function.
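The deterministic equivalent can be checked numerically. The sketch below assumes $\xi \sim N(\mu, C)$ with $x^\top C x > 0$, uses Python's standard-library `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$, computes the optimal $d$ for a fixed $x$, and confirms that $P(\xi^\top x \ge d)$ equals the prescribed $p$. The function names and the data are hypothetical.

```python
import math
from statistics import NormalDist


def mean_std(x, mu, C):
    """Mean mu^T x and standard deviation sqrt(x^T C x) of the return xi^T x."""
    n = len(x)
    mean = sum(mu[i] * x[i] for i in range(n))
    var = sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))
    return mean, math.sqrt(var)


def kataoka_d(x, mu, C, p):
    """Largest d with P(xi^T x >= d) >= p, i.e. mu^T x + Phi^{-1}(1 - p) * sqrt(x^T C x)."""
    mean, std = mean_std(x, mu, C)
    return mean + NormalDist().inv_cdf(1.0 - p) * std


mu = [0.10, 0.05]
C = [[0.04, 0.01], [0.01, 0.02]]
x = [0.5, 0.5]
p = 0.95
d = kataoka_d(x, mu, C, p)
mean, std = mean_std(x, mu, C)
# Probability that the random return xi^T x is at least d:
prob = 1.0 - NormalDist(mean, std).cdf(d)
print(f"d = {d:.4f}, P(xi^T x >= d) = {prob:.4f}")
```

For $p > 1/2$ the factor $\Phi^{-1}(1 - p)$ is negative, so the Kataoka objective penalizes the standard deviation much as the Markowitz objective does, with a weight fixed by the probability level.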
Decision making under uncertainty can often be formalized as a stochastic program, constrained not merely in material terms (by bounded resources, capacities, technological possibilities etc.), but also by limited information (nonanticipativity). The former type of constraints, accounting for material bounds, is usually described by inequalities required to hold almost surely. The latter type, reflecting informational restrictions, often assumes the form of linear equations involving conditional expectation operators. Each sort of constraint generates its own Lagrange multipliers. These Lagrange multipliers have various applications. In particular, they play key roles in algorithms for solving stochastic programs employing decomposition (cf. also ► Stochastic linear programming: Decomposition and cutting planes) or constraint relaxation techniques [12,13,17,19]. Besides their computational role, these auxiliary variables also figure prominently in optimality conditions, duality theory and stability analysis (cf. also ► Stochastic integer programming: Continuity, stability, rates of convergence). In the presence of randomness, they take on specific features [1,3,4,9,11,15,16,18], which are reviewed in this article.


The Basic Model 


Stochastic programs often assume the following form. Minimize a functional $f(x) = f(x(\cdot))$ over some given set in a linear space $\mathcal{L}$ containing finite sequences $x(\cdot) = (x_1(\cdot), \dots, x_T(\cdot))$ of random vectors $x_t(\omega) \in \mathbb{R}^{n_t}$. These vectors represent constrained choices (made sequentially, one at each stage or time $t = 1, \dots, T < \infty$) under imperfect knowledge about the state $\omega \in \Omega$ of the world. Although $\omega$ is not known a priori, its probability distribution, $P$, is given exogenously and defined on some sigma-field $\mathcal{F}_{T+1}$ in $\Omega$. The knowledge of $\omega$ increases over time. Specifically, there is an expanding family $\mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_{T+1}$ of sigma-fields which describes the information flow. At time $t$ one may ascertain for any event in $\mathcal{F}_t$ (and such events only) whether it has happened or not. In particular, a finite $\mathcal{F}_t$ partitions $\Omega$ into minimal events (atoms, information sets, decision nodes) on each of which $x_t$ must be constant. The inclusion $\mathcal{F}_t \subseteq \mathcal{F}_{t+1}$, $t \le T$, reflecting progressive acquisition of knowledge, says that the partition becomes finer as time evolves.

At time $t$ the decision-maker implements the part $x_t$ of his overall decision $x = (x_1, \dots, x_T)$. That part is supposed to be an $\mathcal{F}_t$-measurable strategy (policy, behavioral rule) $x_t \colon \Omega \to \mathbb{R}^{n_t}$. This means that only available information is used at any stage; decisions are based on realized rather than future events. If so, the process $x = (x_1, \dots, x_T)$ is called nonanticipative with respect to the filtration $\mathbb{F} = (\mathcal{F}_t)_{t=1}^T$, and we write $x \in \mathbb{F}$ for brevity. For example, let $\theta_1, \dots, \theta_T$ be a stochastic process defined on $\Omega$, and let $\mathcal{F}_t$ be generated by $\theta_1, \dots, \theta_t$. Then $x \in \mathbb{F}$ means that $x_t$ depends on $\theta_1, \dots, \theta_t$ only.

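Besides the informational limitation $x \in \mathbb{F}$, there are other restrictions $x \in G \cap X$, 'material' in nature, which are defined as follows: $x \in \mathcal{L}$ belongs to the set $G$ (and is said to satisfy the phase constraints) if and only if

$$g_t(\omega, x(\omega)) = g_t(\omega, x_1(\omega), \dots, x_t(\omega)) \in -K_t(\omega) \quad \text{almost surely (a.s.)}; \tag{1}$$

$x \in \mathcal{L}$ belongs to $X$ if and only if

$$x_t(\omega) \in X_t(\omega) \quad \text{a.s.} \tag{2}$$

for all $t$. Here, $g_t \colon \Omega \times \mathbb{R}^{n_1 + \dots + n_t} \to \mathbb{R}^{m_t}$ is $\mathcal{F}_t \times \mathcal{B}$-measurable, and $K_t(\omega) \subseteq \mathbb{R}^{m_t}$, $X_t(\omega) \subseteq \mathbb{R}^{n_t}$ are $\mathcal{F}_t$-measurable random sets (see, e.g., [2] and [1]); $\mathcal{B}$ stands for the Borel $\sigma$-algebra. Both $X_t(\omega)$ and $K_t(\omega)$ are nonempty and closed; $K_t(\omega)$ is a convex cone. Define the relation $\le_{t,\omega}$ on $\mathbb{R}^{m_t}$ by $a \le_{t,\omega} b \Leftrightarrow b - a \in K_t(\omega)$. Then (1) can be written in the form

$$g_t(\omega, x(\omega)) \le_{t,\omega} 0 \quad \text{a.s.} \tag{3}$$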
The basic optimization problem is stated as follows.

$$\text{Find } \inf f(x) \quad \text{s.t. } x \in \mathbb{F} \cap G \cap X. \tag{P}$$

Problem (P) is supposed to be feasible (i.e., $\mathbb{F} \cap G \cap X \neq \emptyset$) with finite optimal value.

Important examples of objective functionals include integral functionals of the form

$$f(x) := E\,F(\omega, x(\omega)) = \int F(\omega, x(\omega))\, P(d\omega), \tag{4}$$

where $F$ is some $\mathcal{F}_{T+1} \times \mathcal{B}$-measurable integrand for which $f(x)$, $x \in X$, is well-defined and finite.

In the general setting, the objective $f$ is a real-valued functional on the set $X \subseteq \mathcal{L}$, where $\mathcal{L}$ is the given linear space of $\mathcal{F}_{T+1}$-measurable vector functions $x(\omega) = (x_1(\omega), \dots, x_T(\omega))$. As $\mathcal{L}$, one often takes $L^p(\mathcal{F}_{T+1}, P; \mathbb{R}^n)$, $n := \sum_t n_t$, with $p \in [1, +\infty]$. This space consists of (equivalence classes of) $\mathcal{F}_{T+1}$-measurable functions $x \colon \Omega \to \mathbb{R}^n$ with finite norm $\|x\|_p := \left[\int |x(\omega)|^p P(d\omega)\right]^{1/p}$ if $p \in [1, +\infty)$, and $\|x\|_\infty := \operatorname{ess\,sup} |x(\omega)|$, where $|\cdot|$ is any fixed norm on a finite-dimensional vector space.

It will be convenient to assume that all the $\sigma$-algebras $\mathcal{F}_t$ are completed by all subsets of null-sets in $\Omega$. In many applications, this assumption does not lead to a significant loss in generality.


Results 


For the most part, it will not be supposed that problem 
(P) has optimal solutions. For completeness, however, 
an existence result is provided. 


Theorem 1 (Existence of optimal solutions.) Suppose that, for each $\omega$, the set

$$A(\omega) := \{ a = (a_1, \dots, a_T) \colon a_t \in X_t(\omega),\ g_t(\omega, a) \in -K_t(\omega),\ \forall t \}$$

is closed, convex and bounded in the norm $|\cdot|$ by some number $B(\omega) > 0$, where $E B < \infty$. Assume $\mathcal{L}$ contains an $x \in \mathbb{F}$ such that $x(\omega) \in A(\omega)$ a.s. Also, suppose $f$ is convex and lower semicontinuous with respect to $L^1$ convergence. Then problem (P) admits an optimal solution.


Proof The feasible set $\mathbb{F} \cap G \cap X$ is convex, closed in $L^1$ and uniformly integrable, hence weakly sequentially compact, while $f$ is weakly lower semicontinuous [6].


Some notation must now be fixed. If $Y$ is a topological vector space, we write $y^* \in Y^*$ when $y^*$ is a continuous linear mapping of $Y$ into $\mathbb{R}$. The value of $y^* \in Y^*$ at $y \in Y$ is denoted by $y^* y$. If $K \subseteq Y$ is a convex cone, its dual cone is defined as $K^* := \{ y^* \in Y^* \colon y^* y \ge 0,\ \forall y \in K \}$.

The following general fact will serve as the basis for further presentation.


Proposition 2 (A Fritz John rule) Consider an optimization problem:

$$\begin{aligned} \text{Find } & \inf f(x) \\ \text{s.t. } & x \in X, \\ & g(x) \in -K, \\ & h(x) = 0, \end{aligned} \tag{P1}$$

having finite optimal value $\inf(\mathrm{P1})$ for the given functions $f \colon X \to \mathbb{R}$, $g \colon X \to Y$ and $h \colon X \to Z$. Here $X$ is a set, $K \subseteq Y$ is a convex cone, and $Y$, $Z$ are Hausdorff linear topological spaces. Suppose the convex hull $\operatorname{conv} C$ of the set

$$C := \left\{ (r, y, z) \in \mathbb{R} \times Y \times Z \colon f(x) < \inf(\mathrm{P1}) + r,\ g(x) \in -K + y,\ h(x) = z \text{ for some } x \in X \right\}$$

has nonempty interior and $(0, 0, 0)$ at its boundary. Then there exists a nonzero continuous linear functional $(r^*, y^*, z^*) \in \mathbb{R}_+ \times K^* \times Z^*$ such that

$$r^* \inf(\mathrm{P1}) = \inf \{ r^* f(x) + y^* g(x) + z^* h(x) \colon x \in X \}. \tag{5}$$

If $Y$, $Z$ are both finite-dimensional, it suffices for (5) that $(0, 0, 0)$ lies at the boundary of $\operatorname{conv} C$.


Proof The convex hull of $C$ has a closed supporting hyperplane through its boundary point $(0, 0, 0)$. Hence there is a nonzero $(r^*, y^*, z^*) \in \mathbb{R} \times Y^* \times Z^*$ such that $r^* r + y^* y + z^* z \ge 0$ for all $(r, y, z) \in C$. It is straightforward to see that $r^* \ge 0$, and that $y^*$ must belong to the dual cone $K^*$. Thus $r^*[f(x) - \inf(\mathrm{P1})] + y^* g(x) + z^* h(x) \ge 0$ for all $x \in X$, implying $\inf\{ r^* f(x) + y^* g(x) + z^* h(x) \colon x \in X \} \ge r^* \inf(\mathrm{P1})$. The reverse inequality holds trivially, since $y^* g(x) \le 0$ and $h(x) = 0$ for every feasible $x$.

The above result can be employed, in particular, if

a) $C$ is convex, and

b) the interior $\operatorname{int} C$ of the set $C$ is not empty.

Observe that $(0, 0, 0)$ is always on the boundary of $C$. Condition a) is fulfilled if, for any $x^i \in X$, $y^i \in g(x^i) + K$, $z^i = h(x^i)$, $i = 1, 2$, and $\rho \in [0, 1]$, there exists $x \in X$ such that $f(x) \le \rho f(x^1) + (1 - \rho) f(x^2)$, $g(x) \in -K + \rho y^1 + (1 - \rho) y^2$, and $h(x) = \rho z^1 + (1 - \rho) z^2$. In turn, this property holds if $X$ is a convex set in some linear space, $f$ is a convex functional on $X$, $h \colon X \to Z$ is affine, and the mapping $g \colon X \to Y$ is convex with respect to the cone $K$, i.e. $\rho g(x^1) + (1 - \rho) g(x^2) - g(\rho x^1 + (1 - \rho) x^2) \in K$ for all $x^1, x^2 \in X$ and $\rho \in [0, 1]$.

In this article, Proposition 2 is applied to versions of problem (P1) in which one of the constraints $g(x) \in -K$ or $h(x) = 0$ is not present. Observe that $\operatorname{int} C \neq \emptyset$ in any of the following cases:

i) $h$ is absent, and $f$ is bounded above on a set $X_1 \subseteq X$ with $\operatorname{int}[g(X_1) + K] \neq \emptyset$ (this is so if $\operatorname{int} K \neq \emptyset$);

ii) $g$ is absent and $f$ is bounded above on some $X_2 \subseteq X$ for which $\operatorname{int} h(X_2) \neq \emptyset$.

In the applications below, the functional $f$ and the set $X$ will be those involved in the basic model (see the previous section). Furthermore, fix some $q \in [1, +\infty]$ and set

$$\begin{aligned} Y &:= \prod_{t=1}^T L^q(\mathcal{F}_t, P; \mathbb{R}^{m_t}), \\ g(x) &:= (g_t(\cdot, x(\cdot)))_{t=1}^T, \\ K &:= \{ y \in Y \colon y_t(\omega) \in K_t(\omega) \text{ a.s., } \forall t \}. \end{aligned} \tag{6}$$

Suppose $\|g(x)\|_q < \infty$ for all $x \in X$. Also, following [16], define

$$h(x) := x - (E_1 x_1, \dots, E_T x_T), \tag{7}$$

and $Z := h(\mathcal{L})$, assuming that $h$ (when this operator comes into action) is well-defined on $\mathcal{L} \supseteq X$. Here $E_t$ stands for the conditional expectation given $\mathcal{F}_t$. Observe that $x \in \mathbb{F}$ if and only if $h(x) = 0$.
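On a finite probability space, $E_t$ reduces to a probability-weighted average over the atoms of $\mathcal{F}_t$, so the operator $h$ of (7) and the test $x \in \mathbb{F} \Leftrightarrow h(x) = 0$ can be computed directly. The sketch below assumes that $\mathcal{F}_t$ is generated by $\theta_1, \dots, \theta_t$ (as in the example of the basic model), that each scenario $\omega$ is given by its path $(\theta_1(\omega), \dots, \theta_T(\omega))$, and that each $x_t$ is scalar; the function names and data are illustrative only.

```python
def cond_expectation(probs, paths, values, t):
    """E_t of a scalar random variable: average over scenarios sharing theta_1..theta_t.

    Stages are 0-based here, so index t corresponds to F_{t+1} in the text.
    """
    mass, total = {}, {}
    for pw, path, v in zip(probs, paths, values):
        atom = tuple(path[: t + 1])          # the atom of F_t containing this scenario
        mass[atom] = mass.get(atom, 0.0) + pw
        total[atom] = total.get(atom, 0.0) + pw * v
    return [total[tuple(path[: t + 1])] / mass[tuple(path[: t + 1])] for path in paths]


def h(probs, paths, x):
    """h(x) = x - (E_1 x_1, ..., E_T x_T); x[w][t] is the stage-t decision in scenario w."""
    T = len(paths[0])
    Ex = [cond_expectation(probs, paths, [xw[t] for xw in x], t) for t in range(T)]
    return [[x[w][t] - Ex[t][w] for t in range(T)] for w in range(len(x))]


def is_nonanticipative(probs, paths, x, tol=1e-12):
    """x belongs to F iff h(x) = 0 (up to a numerical tolerance)."""
    return all(abs(v) <= tol for row in h(probs, paths, x) for v in row)


# Four equally likely scenarios generated by two binary stages theta_1, theta_2.
paths = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [0.25] * 4
x_ok = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0]]   # x_1 uses theta_1 only
x_bad = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # x_1 peeks at theta_2
print(is_nonanticipative(probs, paths, x_ok), is_nonanticipative(probs, paths, x_bad))
```

Because $E_t$ is a projection, $h$ is the complementary projection; on a scenario tree it measures how far each decision deviates from the average over its decision node.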

In the subsequent analysis, the following hypothesis regarding problem (P) will be used:

FJ) There exists a nonzero $(r^*, y^*, z^*) \in \mathbb{R}_+ \times K^* \times Z^*$ such that the Fritz John condition (5) holds for (P) in place of (P1), with $\inf(\mathrm{P})$ finite.

It is understood that if some constraint $g(x) \in -K$ or $h(x) = 0$ is absent in (P), or automatically satisfied, then the corresponding part of $(y^*, z^*)$ should be omitted. Conditions which guarantee applicability of Proposition 2, and hence the truth of FJ), are presented above.

Hypothesis FJ) is especially well motivated for convex problems. Absent convexity, FJ) frequently holds when $X$ is a neighborhood of some local optimum.


Lagrange Multipliers for Phase Constraints 


Throughout this subsection, it is assumed that $\mathcal{L} \subseteq \mathbb{F}$. The information constraint is thus satisfied automatically, and problem (P) reduces to finding $\inf f$ over $G \cap X$. Associate the Lagrangian

$$L(x, \lambda) := f(x) + \lambda g(x), \quad x \in X,\ \lambda \in K^*,$$

to constraint (3).


Theorem 3 (Lagrange multipliers for the phase constraints.) Assume FJ) and the following strict feasibility condition: For any $y = (y_t)_{t=1}^T \in Y$ belonging to some neighborhood of 0, one can find $x \in X$ satisfying

$$g_t(\omega, x(\omega)) \le_{t,\omega} y_t(\omega) \quad \text{a.s., } \forall t.$$

Then there exists $\lambda = (\lambda_t)_{t=1}^T \in K^*$ such that

$$\inf(\mathrm{P}) = \inf L(\cdot, \lambda). \tag{8}$$

Proof The strict feasibility condition ensures that the number $r^*$ involved in FJ) is strictly positive. Divide (5) by $r^*$ and set $\lambda := y^*/r^*$.
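A toy instance shows what (8) asserts. The sketch assumes a hypothetical one-stage problem on two equally likely states: minimize $f(x) = E[x(\omega)^2]$ subject to the phase constraint $g(\omega, x) = b(\omega) - x(\omega) \le 0$ (so $K_1(\omega) = \mathbb{R}_+$), with $b = (1, 2)$. Here $\inf(\mathrm{P}) = E[b^2] = 2.5$, attained at $x = b$. Because $\Omega$ is finite, the multiplier can be written as a function $\lambda(\omega) \ge 0$, and $\inf L(\cdot, \lambda)$ is obtained by minimizing $x^2 + \lambda(b - x)$ separately in each state, with minimizer $x = \lambda/2$.

```python
def lagrangian_inf(probs, b, lam):
    """inf over x of E[x^2 + lam * (b - x)]; the pointwise minimizer is x(w) = lam(w) / 2."""
    return sum(p * (l * bw - l * l / 4.0) for p, bw, l in zip(probs, b, lam))


def primal_value(probs, b):
    """inf of E[x^2] subject to x(w) >= b(w); for this toy problem x = max(b, 0)."""
    return sum(p * max(bw, 0.0) ** 2 for p, bw in zip(probs, b))


probs, b = [0.5, 0.5], [1.0, 2.0]
lam_star = [2.0 * bw for bw in b]   # from the pointwise condition 2x - lam = 0 at x = b
print(primal_value(probs, b), lagrangian_inf(probs, b, lam_star))
print(lagrangian_inf(probs, b, [1.0, 1.0]))  # any other lam >= 0 only gives a lower bound
```

The multiplier $\lambda^* = 2b$ closes the duality gap, while any other nonnegative $\lambda$ yields a value below $\inf(\mathrm{P})$, which is the weak-duality half of (8).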


It is often important to obtain an integral representation of $\lambda$,

$$\lambda y = E\,\lambda(\omega) y(\omega), \quad y \in Y, \tag{9}$$

with an appropriate function $\lambda(\omega)$. This is immediate if $q \in [1, +\infty)$ because then $L^q(\mathcal{F}_t, P; \mathbb{R}^{m_t})^* = L^{q^*}(\mathcal{F}_t, P; \mathbb{R}^{m_t})$, where $q^* := q/(q - 1)$, $1^* = +\infty$. However, not every functional in the dual of $L^\infty$ is of the form (9) (those which admit representation (9) with $\lambda(\cdot) \in L^1$ are called absolutely continuous). Therefore the case $q = +\infty$ requires special consideration. The analysis of that case is based on the following continuity property of $f$:

• For any pair $x, \bar{x} \in X$ and any sequence of $\mathcal{F}_{T+1}$-measurable indicators $\chi^k \colon \Omega \to \{0, 1\}$ satisfying $E\chi^k \to 0$, we have

$$f(\chi^k \bar{x} + (1 - \chi^k) x) \to f(x), \tag{10}$$

provided $\chi^k \bar{x} + (1 - \chi^k) x \in X$.

Clearly (10) holds when $f$ is of the form (4) (with finite values).
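One speaks of complete recourse if:

CR) for any $x \in X$ and $1 \le t \le T$, there exists $\tilde{x}_t \in \mathcal{F}_t X_t$ such that

$$g_t(\omega, x_1(\omega), \dots, x_{t-1}(\omega), \tilde{x}_t(\omega)) \le_{t,\omega} 0 \quad \text{a.s.}, \tag{11}$$

and $(x_1, \dots, (1 - 1_S) x_t + 1_S \tilde{x}_t, \dots, x_T) \in X$ for each $S \in \mathcal{F}_t$.

Here $1_S(\omega) = 1$ if $\omega \in S$ and $0$ otherwise; the notation $\tilde{x}_t \in \mathcal{F}_t X_t$ means that $\tilde{x}_t$ is $\mathcal{F}_t$-measurable and satisfies (2).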


Theorem 4 (Absolutely continuous Lagrange multipliers for essentially bounded phase constraints.) Consider the problem

$$\min f \quad \text{s.t. } x \in G \cap X.$$

Assume FJ), CR), strict feasibility, and the continuity property (10). Then there exists $\lambda = (\lambda_t)_{t=1}^T \in K^*$ with $\lambda_t \in L^1(\mathcal{F}_t, P; \mathbb{R}^{m_t})$ satisfying (8).


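Proof By virtue of the Yosida-Hewitt theorem [20], any $\lambda \in (L^\infty)^*$ admits a unique decomposition $\lambda = \lambda^a + \lambda^s$ into an absolutely continuous component $\lambda^a \in L^1$ and a singular component $\lambda^s$. The last notion means that there exist measurable sets $S^1, S^2, \dots$ such that $P(S^k) \to 0$ and $\lambda^s(\cdot) = \lambda^s(\chi^k \cdot)$, $\forall k$, where $\chi^k := 1_{S^k}$.

Consider $\lambda = (\lambda_t) \in K^*$ satisfying (8) and decompose $\lambda_t$ into $\lambda_t^a + \lambda_t^s$. We claim that, in (8), one may replace $\lambda$ by $\lambda^a = (\lambda_t^a)$, i.e., set all $\lambda_t^s = 0$. Indeed, by way of induction, suppose $\lambda_t^s$ may be set to $0$ for all $t > \bar{t}$, that is,

$$f(x) + \sum_{t \le \bar{t}} \lambda_t g_t(x) + \sum_{t \ge \bar{t} + 1} \lambda_t^a g_t(x) \ge \inf(\mathrm{P}) \tag{12}$$

for any $x \in X$, where $1 \le \bar{t} \le T$. (If $\bar{t} = T$, the last sum is zero.) Then the analogous inequality also holds for $\bar{t} - 1$. To prove this, select a sequence $S^1, S^2, \dots$ of $\mathcal{F}_{\bar{t}}$-measurable sets such that $P(S^k) \to 0$ and $\lambda_{\bar{t}}^s(\cdot) = \lambda_{\bar{t}}^s(\chi^k \cdot)$, $\forall k$, where $\chi^k$ is the indicator of $S^k$. Fix any $x \in X$. Consider the function $\tilde{x}_{\bar{t}} \in \mathcal{F}_{\bar{t}} X_{\bar{t}}$ described in CR). Let $x^k \in X$ be obtained from $x$ by substituting $\chi^k \tilde{x}_{\bar{t}} + (1 - \chi^k) x_{\bar{t}}$ in place of $x_{\bar{t}}$ (in coordinate $\bar{t}$ only). Then

$$\begin{aligned} f(x^k) &+ \sum_{t \le \bar{t} - 1} \lambda_t g_t(x^k) + \sum_{t \ge \bar{t}} \lambda_t^a g_t(x^k) \\ &\ge f(x^k) + \sum_{t \le \bar{t} - 1} \lambda_t g_t(x^k) + \lambda_{\bar{t}} g_{\bar{t}}(x^k) + \sum_{t \ge \bar{t} + 1} \lambda_t^a g_t(x^k) \ge \inf(\mathrm{P}), \end{aligned}$$

because $\lambda_{\bar{t}}^s g_{\bar{t}}(x^k) = \lambda_{\bar{t}}^s(\chi^k g_{\bar{t}}(x^k)) \le 0$. To obtain (12) for $\bar{t} - 1$, it suffices to pass to the limit, employing the continuity property (10).

Finally, observe that $\lambda^a = (\lambda_t^a) \in K^*$, since, for any $y_t$ with $y_t(\omega) \in K_t(\omega)$ a.s., we have $0 \le \lambda_t[(1 - \chi^k) y_t] = \lambda_t^a[(1 - \chi^k) y_t] \to \lambda_t^a y_t$, so that $\lambda_t^a y_t \ge 0$.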


Methods using similar arguments were first proposed by A.Ya. Dubovitskii and A.A. Milyutin [5] in (deterministic) optimal control theory. Applications to stochastic extremal problems were developed in [8,9,10,14,15,16], and [1], where various versions and extensions of Theorem 4 can be found. Another approach, relying on a direct analysis of perturbation functions in $L^\infty$ and yielding Lagrange multipliers representable as functions in $L^1$, was suggested by E.B. Dynkin [7].


Lagrange Multipliers 
for Nonanticipativity Constraints 


In this subsection, let $\mathcal{L} := L^p(\mathcal{F}_{T+1}, P; \mathbb{R}^n)$ for some $p \in [1, +\infty]$ and $Z := \{ z \in \mathcal{L} \colon (E_1 z_1, \dots, E_T z_T) = 0 \} = h(\mathcal{L})$, where $h$ is given by (7).

The next goal is to relax the constraint $x \in \mathbb{F}$ ($\Leftrightarrow h(x) = 0$). Constraints of this type were first examined systematically by R.T. Rockafellar and R.J-B. Wets [16]. To separate different issues, suppose here that all the phase constraints of type (3) are absent, or already relaxed, as described above. Then (P) reduces to minimizing $f$ over $\mathbb{F} \cap X$. To deal with the constraint $h(x) = 0$, consider a different Lagrangian

$$\Lambda(x, \pi) := f(x) + \pi h(x), \quad x \in X,\ \pi \in Z^*. \tag{13}$$


Theorem 5 (Lagrange multipliers for the nonanticipativity constraints.) Consider the problem of minimizing $f$ over $\mathbb{F} \cap X$. Assume FJ). If $X$ has nonempty interior, then there is a linear functional $\pi \in Z^*$ such that

$$\inf(\mathrm{P}) = \inf \Lambda(\cdot, \pi). \tag{14}$$

Proof Evidently the linear mapping $h \colon \mathcal{L} \to Z$ defined in (7) is surjective. Both spaces $\mathcal{L}$, $Z$ are Banach. Since the set $U := \operatorname{int} X \neq \emptyset$ is open, the open mapping theorem implies that $h(U)$ is open. Furthermore, $0 \in h(U)$ (see Remark 6), and so $0 \in \operatorname{int} h(X)$. As a result, $r^*$ must be strictly positive. Divide (5) by $r^*$ and set $\pi := z^*/r^*$.


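Remark 6 Note that if $\operatorname{int} X \neq \emptyset$ and $p < \infty$, then $X_t(\omega) = \mathbb{R}^{n_t}$ a.s., and so $X = \mathcal{L}$. Furthermore, if $\operatorname{int} X \neq \emptyset$, then, for any $p$, $\mathbb{F} \cap \operatorname{int} X \neq \emptyset$, which implies $0 \in h(\operatorname{int} X)$.

Remark 7 By the Hahn-Banach theorem, the functional constructed in Theorem 5 can be extended to a continuous functional $\pi \in \mathcal{L}^*$.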

Again, it is of importance to obtain an integral representation of $\pi$. As before, when $p \in [1, +\infty)$, this representation is immediate, since $\mathcal{L}^* = L^{p^*}$ with $p^* = p/(p - 1)$, $1^* = +\infty$. Suppose $p = +\infty$.

Theorem 8 (Absolutely continuous multipliers for the nonanticipativity constraints: the $L^\infty$ case.) Consider the problem

$$\min f \quad \text{over } \mathbb{F} \cap X.$$

Assume FJ) together with the continuity condition (10). Suppose that $X$ is convex and $\operatorname{int} X \neq \emptyset$. Then there exists an absolutely continuous functional on $\mathcal{L} = L^\infty(\mathcal{F}_{T+1}, P; \mathbb{R}^n)$ satisfying (14).


Proof Let $\pi$ be the functional described in Theorem 5 and Remark 7. By the Yosida-Hewitt theorem, each coordinate $\pi_t$ of $\pi$ decomposes into the sum $\pi_t^a + \pi_t^s$, where $\pi_t^a \in L^1$ and $\pi_t^s$ is singular. Let $S^1, S^2, \dots \in \mathcal{F}_{T+1}$, $P(S^k) \to 0$ and $\pi_t^s(\cdot) = \pi_t^s(\chi^k \cdot)$, $\forall k$, where $\chi^k = 1_{S^k}$. We may assume, additionally, that $\|(1 - \chi^k) E_t \chi^k\|_\infty \to 0$ [9]. Consider any $x \in X$. Construct $x^k$ from $x$ by substituting $\chi^k E_t x_t + (1 - \chi^k) x_t$ in place of $x_t$, in coordinate $t$ only. Then $x^k \in X$ by virtue of the convexity and $\mathcal{F}_t$-measurability of $X_t(\omega)$ (see [1, App. II]; $X_t(\omega)$ is convex a.s. since $X$ is convex). Finally, $f(x^k) \to f(x)$ and $\pi_t(x_t^k - E_t x_t^k) \to \pi_t^a(x_t - E_t x_t)$, because $\pi_t^s(x_t^k - E_t x_t^k) = \pi_t^s[E_t x_t (\chi^k - E_t \chi^k)]$, where $E_t|\chi^k - E_t \chi^k| \le 2 E_t[(1 - \chi^k) E_t \chi^k] \to 0$ in $\|\cdot\|_\infty$. This shows that $\Lambda(x, \pi^a) \ge \inf(\mathrm{P})$ for all $x \in X$.


Remark 9 If $\pi = (\pi_t)$ admits an integral representation, then the Lagrangian (13) can be written

$$\Lambda(x, \pi) = f(x) + E \sum_{t=1}^T \pi_t (x_t - E_t x_t) = f(x) + E \sum_{t=1}^T \left[ (\pi_t - E_t \pi_t) x_t \right].$$

This allows one to interpret $\pi_t - E_t \pi_t$ as a 'shadow price' of information [3,4,9,16].
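On a finite $\Omega$ the penalty term of Remark 9 can be evaluated stage by stage. The sketch below, for one stage $t$ with scalar $x_t$ and a hypothetical integral multiplier $\pi_t(\omega)$, computes $E[(\pi_t - E_t \pi_t) x_t]$. The term vanishes whenever $x_t$ is constant on the atoms of $\mathcal{F}_t$, so the shadow price charges only decisions that use information not yet available; summing over $t$ gives the full term in $\Lambda(x, \pi)$. The atom labels and all data are illustrative assumptions.

```python
def information_charge(probs, atoms, pi, x):
    """E[(pi - E_t pi) * x] for one stage; atoms[w] labels the F_t-atom of scenario w."""
    mass, total = {}, {}
    for p, a, v in zip(probs, atoms, pi):
        mass[a] = mass.get(a, 0.0) + p
        total[a] = total.get(a, 0.0) + p * v
    # E_t pi on atom a is total[a] / mass[a]
    return sum(p * (v - total[a] / mass[a]) * xw
               for p, a, v, xw in zip(probs, atoms, pi, x))


probs = [0.25] * 4
atoms = ["L", "L", "R", "R"]          # F_t cannot distinguish scenarios within an atom
pi = [1.0, 3.0, 0.0, 4.0]             # hypothetical multiplier values pi_t(w)
print(information_charge(probs, atoms, pi, [5.0, 5.0, 7.0, 7.0]))  # x_t is F_t-measurable
print(information_charge(probs, atoms, pi, [0.0, 1.0, 0.0, 0.0]))  # x_t anticipates
```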


Synthesis 


Combination of the results presented above allows one to examine both the phase and the nonanticipativity constraints simultaneously. The next theorem provides a criterion of optimality in terms of pointwise minimization of a Lagrangian associated with the two constraints. Consider problem (P) with $\mathcal{L} = L^p(\mathcal{F}_{T+1}, P; \mathbb{R}^n)$ and $f$ defined by (4). Suppose that the following hypotheses hold:

C) For each $\omega$, the set $X(\omega) := X_1(\omega) \times \dots \times X_T(\omega)$ and the function $F(\omega, a)$, $a \in X(\omega)$, are convex; the mapping $g(\omega, a)$, $a \in X(\omega)$, is convex with respect to the cone $K_1(\omega) \times \dots \times K_T(\omega)$.

G) The functional $f(x)$ is bounded above on some set $X_1 \subseteq X \cap \mathbb{F}$ with $\operatorname{int}[g(X_1) + K] \neq \emptyset$; furthermore, $0 \in \operatorname{int}[g(X \cap \mathbb{F}) + K]$.

H) For any integral linear functional $\lambda \in K^*$, $f(x) + \lambda g(x)$ is bounded above on some $X_2 \subseteq X$ with $\operatorname{int} h(X_2) \neq \emptyset$, and we have $\operatorname{int} X \neq \emptyset$.

The sets of interior points involved in G), $\operatorname{int} h(X_2)$, and $\operatorname{int} X$ are defined in terms of the spaces $Y$, $Z = h(\mathcal{L})$, and $\mathcal{L}$, respectively (for the definition of $Y$ and $K$ see (6)).

Additionally, if $q = \infty$, assume CR).


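Theorem 10 (Pointwise optimality.) Let $x \in X \cap \mathbb{F} \cap G$. Then $x$ is a solution to (P) if and only if there exist functionals $\lambda \in K^*$ and $\pi \in L^{p^*}$ of integral form (9) such that a.s.

$$x(\omega) \in \operatorname*{argmin}_{a_t \in X_t(\omega),\ \forall t} \left\{ F(\omega, a) + \lambda(\omega) g(\omega, a) + \sum_t [\pi_t(\omega) - E_t \pi_t(\omega)] a_t \right\} \tag{15}$$

and $\lambda(\omega) g(\omega, x(\omega)) = 0$.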


Proof Let $x$ be a solution to (P). It is sufficient to construct functionals $\lambda \in K^*$ and $\pi \in \mathcal{L}^*$ of integral form such that

$$f(x) \le E \left\{ F(\omega, x'(\omega)) + \lambda(\omega) g(\omega, x'(\omega)) + \sum_t [\pi_t(\omega) - E_t \pi_t(\omega)] x'_t(\omega) \right\}$$

for all $x' \in X$. Then a suitable measurable selection argument (see, e.g., [1, App. I]) yields (15). To construct $\lambda$, use Theorems 3 and 4. Then, to prove the existence of $\pi$, apply Theorems 5 and 8 to a modified optimization problem with the objective functional $f(x) + \lambda g(x)$. (The truth of FJ) follows from i) and ii).) The 'if' statement is straightforward.


Stochastic programming, as presented above, can easily accommodate integral constraints of the form $\int \gamma(\omega, x(\omega)) P(d\omega) \in M$, where $\gamma \colon \Omega \times \mathbb{R}^n \to \mathbb{R}^k$ is an integrand satisfying appropriate conditions and $M$ is a cone in $\mathbb{R}^k$.

Of course this article is only a brief glance at a large and rapidly developing field of study. Many relevant aspects have not been discussed. Perhaps the most important of these is the tight connection between the theory of stochastic Lagrange multipliers and the theory of stochastic economic models. The former provides technical tools for the latter. The latter serves as a source of problems and often as a 'proving ground' for new developments regarding stochastic Lagrange multipliers. For an introduction into this subject, see [1].
Uncertainty is pervasive in many decision making problems, in which it often plays a key role. Uncertainty in the input data of mathematical programs is generally modeled by postulating a probability distribution (usually discrete) for the unknown parameters, which is then incorporated in an appropriate optimization model. Based on this approach, stochastic programs (SP) provide a constructive and prescriptive framework for incorporating, ex ante, uncertainty in decision making models. Stochastic programming has evolved into an effective framework for modeling sequential decision problems under uncertainty in diverse applications: e.g., investment management, production and logistics, capacity and operational planning for electric power generation, management of natural resources, network design, etc.

Attention is focused here on two-stage stochastic linear programs with recourse, which address the following situation: Certain decisions must be made at present in the face of uncertainty. At a later time uncertainty is resolved by observing a joint realization (outcome) of the values of all uncertain parameters. At that time, further corrective (recourse) actions can be taken in response to the outcome that materializes. Each postulated realization of the uncertain parameters constitutes a particular scenario. The objective is to minimize the expected value of a total cost functional, which includes the direct cost of the initial decisions and the expected cost of the recourse actions.


Problem Formulation 


Two-stage SP with recourse distinguish between two sets of decision variables:

• $x_0 \in \mathbb{R}^{n_0}$ denotes the first-stage decisions. These decisions are made before the values of the random variables are observed, but they should anticipate the consequent cost of recourse actions.

• $y_s \in \mathbb{R}^{n_1}$ denotes the second-stage decisions under a particular scenario $s$. These are the adaptive decisions, representing recourse actions that are taken after the random variables have been observed. They depend on the first-stage decisions and on the realization of the random variables.

Uncertainty is represented by a discrete set of scenarios $\mathcal{S} = \{1, \dots, S\}$ with associated probabilities $p_s > 0$, $\sum_{s=1}^S p_s = 1$. The two-stage SP with recourse can then be stated in the following deterministic equivalent program [13]:

$$\min_{x_0 \in \mathbb{R}^{n_0}_+,\ y_s \in \mathbb{R}^{n_1}_+} \; c^\top x_0 + \sum_{s=1}^S p_s q_s^\top y_s \tag{1}$$

such that

$$A_0 x_0 = b, \tag{2}$$

$$T_s x_0 + W_s y_s = h_s, \quad \forall s \in \mathcal{S}. \tag{3}$$

Any deterministic constraints on the first-stage decisions are depicted by equation (2), the coefficients of which (i.e., the $m_0 \times n_0$ matrix $A_0$ and the vector $b \in \mathbb{R}^{m_0}$) are scenario invariant. Each scenario $s \in \mathcal{S}$ is associated with a corresponding instance of the input data, that is, the $m_1 \times n_0$ technology matrix $T_s$, the $m_1 \times n_1$ recourse matrix $W_s$, and the vectors $q_s \in \mathbb{R}^{n_1}$ and $h_s \in \mathbb{R}^{m_1}$.

In this compact representation of the deterministic equivalent program, the constraints matrix has a dual block-angular structure:

$$\begin{pmatrix} A_0 & & & \\ T_1 & W_1 & & \\ \vdots & & \ddots & \\ T_S & & & W_S \end{pmatrix} \tag{4}$$
The above formulation has $n = n_0 + S \cdot n_1$ variables and $m = m_0 + S \cdot m_1$ constraints. Hence, the inclusion of a large number of scenarios, so as to account for many possible contingencies, inevitably leads to very large optimization programs. Substantial research effort has been directed toward the development of effective solution methods that exploit the special block structures of SP and capitalize on the capabilities of high-performance computing systems, including parallel multiprocessors.

Reformulations are sometimes applied to yield structures that are more suitable for some parallel algorithms. These reformulations employ variable splitting to replicate the first-stage solutions into distinct vectors $x_s \in \mathbb{R}^{n_0}$ for each scenario $s \in \mathcal{S}$. Explicit nonanticipativity constraints are then added to the program in order to ensure that the values of these distinct vectors are scenario-invariant. Nonanticipativity constraints can be of a staircase form:

$$x_s - x_{s-1} = 0, \quad s = 2, \dots, S. \tag{5}$$

Similarly, the distinct vectors $x_s$ may be equated with an auxiliary vector $x_0$, yielding a primal block-angular structure:

$$x_s - x_0 = 0, \quad s = 1, \dots, S. \tag{6}$$

Recent reviews of alternative parallel algorithms for solving stochastic programs can be found in [2,12].


Interior Point Algorithm 


Interior point algorithms directly address the deterministic equivalent program (1)-(3). Let us focus on the primal-dual, path following interior point method (e.g., [11]), which simultaneously solves the following pair of dual programs:

$$\text{(P)} \quad \min\ c^\top x \quad \text{s.t. } Ax = b,\ x \ge 0,$$

$$\text{(D)} \quad \max\ b^\top y \quad \text{s.t. } A^\top y + z = c,\ z \ge 0.$$

The $m \times n$ constraint matrix $A$ is assumed to have full row rank. The method applies a logarithmic barrier to enforce the nonnegativity constraints. Each iteration involves a Newton step for the system of equations that represents first-order conditions for a critical point of the associated Lagrangian functions for the barrier forms of (P) and (D). The algorithm is given below.

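$X$ and $Z$ are the $n \times n$ positive definite diagonal matrices $X = \operatorname{diag}(x_1, \dots, x_n)$, $Z = \operatorname{diag}(z_1, \dots, z_n)$. The steplengths $\alpha_P$, $\alpha_D$ are computed so as to keep the primal variables ($x$) and the dual slack iterates ($z$) positive:

$$\alpha_P = \delta \cdot \min \{ \alpha_j \colon x_j + \alpha_j \Delta x_j = 0,\ \Delta x_j < 0 \}, \tag{7}$$

$$\alpha_D = \delta \cdot \min \{ \alpha_j \colon z_j + \alpha_j \Delta z_j = 0,\ \Delta z_j < 0 \}, \tag{8}$$

where $\delta \in (0, 1)$. A typical value of $\delta$ used in practice is 0.9995. An updating formula for setting the barrier parameter $\mu^\nu$ that works well in practice is:

$$\mu^\nu = \frac{c^\top x^\nu - b^\top y^\nu}{n^2}.$$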

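Initialization

Set $\nu = 0$. Start with an interior point ($x^0 > 0$ and $z^0 > 0$ in $\mathbb{R}^n$, $y^0 \in \mathbb{R}^m$), and $\mu^0 > 0$.

Iterative step

Solve for the dual step $\Delta y$:

$$(A \Theta A^\top) \Delta y = \psi, \tag{9}$$

where $\Theta = X Z^{-1}$, $\psi = \rho + A\Theta(\sigma - X^{-1}\phi)$, $\rho = b - Ax$, $\sigma = c - A^\top y - z$, $\phi = \mu e - XZe$, and $e$ is a conformable vector of ones.

Compute the primal step $\Delta x$ and the slack variable step $\Delta z$ from

$$\Delta x = -\Theta(\sigma - X^{-1}\phi - A^\top \Delta y), \tag{10}$$

$$\Delta z = X^{-1}(\phi - Z \Delta x). \tag{11}$$

Update:

$$x^{\nu+1} = x^\nu + \alpha_P \Delta x, \tag{12}$$

$$y^{\nu+1} = y^\nu + \alpha_D \Delta y, \tag{13}$$

$$z^{\nu+1} = z^\nu + \alpha_D \Delta z, \tag{14}$$

where $\alpha_P, \alpha_D \in (0, 1)$ are steplengths. Reduce $\mu^\nu$ to $\mu^{\nu+1}$, and increment the iteration counter $\nu \leftarrow \nu + 1$.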
Hence, the barrier parameter is kept large when far from the optimum (as measured by a large duality gap in the numerator), and the search direction points away from the boundary of the feasible region, so as to allow large steps. The barrier parameter is reduced as the optimum is approached so that the iterates may approach the boundary of the feasible region. Practically the same computations are applied, with fairly minor extensions, to solve separable, convex quadratic programs (see, e.g., [5,11]).
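The iteration can be sketched in plain Python for small dense problems. The code follows (9)-(14) literally, forms $A\Theta A^\top$ explicitly and factors it by Cholesky, and uses the barrier update $\mu = (c^\top x - b^\top y)/n^2$. A feasible interior starting point is assumed (so that the duality gap stays positive), and capping the steplengths at 1 is an assumption of this sketch. The function names and the tiny LP are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cholesky_solve(M, r):
    """Solve M d = r for symmetric positive definite M via M = L L^T."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    u = [0.0] * n
    for i in range(n):                      # forward substitution L u = r
        u[i] = (r[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i]
    d = [0.0] * n
    for i in reversed(range(n)):            # back substitution L^T d = u
        d[i] = (u[i] - sum(L[k][i] * d[k] for k in range(i + 1, n))) / L[i][i]
    return d


def step_length(v, dv, delta):
    """delta times the ratio-test step of (7)/(8), capped at 1."""
    ratios = [-vj / dvj for vj, dvj in zip(v, dv) if dvj < 0]
    return min(1.0, delta * min(ratios)) if ratios else 1.0


def primal_dual_ipm(A, b, c, x, y, z, delta=0.9995, tol=1e-9, max_iter=200):
    m, n = len(A), len(c)
    for _ in range(max_iter):
        gap = dot(c, x) - dot(b, y)
        if gap < tol:
            break
        mu = gap / n ** 2                                    # barrier parameter
        rho = [b[i] - dot(A[i], x) for i in range(m)]        # rho = b - A x
        sigma = [c[j] - sum(A[i][j] * y[i] for i in range(m)) - z[j] for j in range(n)]
        phi = [mu - x[j] * z[j] for j in range(n)]           # phi = mu e - X Z e
        theta = [x[j] / z[j] for j in range(n)]              # Theta = X Z^{-1}
        w = [sigma[j] - phi[j] / x[j] for j in range(n)]     # sigma - X^{-1} phi
        psi = [rho[i] + sum(A[i][j] * theta[j] * w[j] for j in range(n)) for i in range(m)]
        M = [[sum(A[i][j] * theta[j] * A[k][j] for j in range(n)) for k in range(m)]
             for i in range(m)]
        dy = cholesky_solve(M, psi)                                                       # (9)
        dx = [-theta[j] * (w[j] - sum(A[i][j] * dy[i] for i in range(m))) for j in range(n)]  # (10)
        dz = [(phi[j] - z[j] * dx[j]) / x[j] for j in range(n)]                           # (11)
        a_p, a_d = step_length(x, dx, delta), step_length(z, dz, delta)
        x = [x[j] + a_p * dx[j] for j in range(n)]                                        # (12)
        y = [y[i] + a_d * dy[i] for i in range(m)]                                        # (13)
        z = [z[j] + a_d * dz[j] for j in range(n)]                                        # (14)
    return x, y, z


# Tiny LP: min x1 + 2 x2 + 3 x3  s.t.  x1 + x2 + x3 = 1, x >= 0  (optimum x = (1, 0, 0)).
A, b, c = [[1.0, 1.0, 1.0]], [1.0], [1.0, 2.0, 3.0]
x, y, z = primal_dual_ipm(A, b, c, x=[1 / 3] * 3, y=[0.0], z=[1.0, 2.0, 3.0])
print([round(v, 6) for v in x], round(y[0], 6))
```

Forming $A\Theta A^\top$ densely, as done here, is exactly what becomes expensive for the block-structured matrices of SP.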

The major effort involves the solution of the $m \times m$ symmetric, positive definite system of linear equations $(A\Theta A^\top)\Delta y = \psi$. This system is commonly encountered in interior point methods, so the parallel matrix factorization procedures discussed here are directly applicable in other interior point algorithms as well. Interior point methods for stochastic programs are reviewed in [10].

The constraints matrix $A$ of a two-stage SP has the dual block-angular structure (4). In the discussion above, the vector $x$ encompasses all primal decision variables, that is, the first-stage decisions $x_0$ as well as the recourse decisions $y_s$ for all scenarios. Also, the scenario probabilities $p_s$ are incorporated by scaling the objective coefficients. Despite the sparsity of the constraints matrix $A$, the product matrix $A\Theta A^\top$ can be very dense due to the presence of the coupling columns associated with the first-stage variables (see (4)). Hence, the direct application of interior point methods to stochastic programs is not particularly effective.
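The fill-in caused by the coupling columns can be seen from sparsity patterns alone. The sketch below builds the nonzero pattern of (4), under the assumption that every block $A_0$, $T_s$, $W_s$ is dense, and compares the pattern of $A\Theta A^\top$ (for positive diagonal $\Theta$, ignoring numerical cancellation) with and without the $T_s$ blocks; it also confirms the dimensions $m = m_0 + S m_1$ and $n = n_0 + S n_1$. The block sizes are arbitrary illustrative values.

```python
def block_angular_pattern(S, m0, n0, m1, n1, coupling=True):
    """Nonzero pattern of the dual block-angular matrix (4), assuming dense blocks.

    Rows: m0 first-stage rows, then S groups of m1 rows; columns: n0 first-stage
    columns, then S groups of n1 recourse columns. coupling=False drops the T_s blocks.
    """
    m, n = m0 + S * m1, n0 + S * n1
    P = [[False] * n for _ in range(m)]
    for i in range(m0):
        for j in range(n0):
            P[i][j] = True                                  # A_0
    for s in range(S):
        for i in range(m0 + s * m1, m0 + (s + 1) * m1):
            if coupling:
                for j in range(n0):
                    P[i][j] = True                          # T_s (coupling columns)
            for j in range(n0 + s * n1, n0 + (s + 1) * n1):
                P[i][j] = True                              # W_s
    return P


def product_pattern(P):
    """Pattern of A Theta A^T: entry (i, k) is nonzero iff rows i and k share a column."""
    return [[any(a and b for a, b in zip(ri, rk)) for rk in P] for ri in P]


def density(P):
    return sum(map(sum, P)) / (len(P) * len(P[0]))


S, m0, n0, m1, n1 = 10, 2, 3, 4, 5
A = block_angular_pattern(S, m0, n0, m1, n1)
print(len(A), len(A[0]), round(density(A), 3))           # m, n, and density of A
print(round(density(product_pattern(A)), 3))              # with coupling columns
print(round(density(product_pattern(block_angular_pattern(S, m0, n0, m1, n1, False))), 3))
```

With dense $T_s$ every row shares the first-stage columns, so $A\Theta A^\top$ is completely dense even though $A$ is sparse; without them the product would be block diagonal.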

A first approach to overcoming this problem focuses on staircase formulations (cf. (5)). This significantly reduces fill-in and produces banded product matrices which can be factorized efficiently [8]. Schur complements have also been tested as a means for overcoming the problem with the dense columns of the first-stage variables [3,8]. These procedures can substantially improve the performance of interior point algorithms on SP. However, they cannot be effectively parallelized. Moreover, the Schur complement approach suffers from numerical instabilities, particularly in problems with many dense columns [3].

An alternative is to directly parallelize the matrix operations in interior point methods. Such procedures typically treat the optimization programs as fully dense and cannot exploit the sparse block structure of SP. Consequently, they are effective only for moderate-size problems [6,9].

A third approach is to specialize a matrix factorization procedure so as to capitalize on the structure of SP. The method is based on a generalization of the Sherman-Morrison-Woodbury formula. It was proposed for stochastic programs by J.R. Birge and L. Qi [4], and was further extended by Birge and D.F. Holmes [3], who also reported numerical experiments. Implementations of this factorization procedure on hypercubes and other parallel computers are reported in [7,14].


Parallel Matrix Factorization 


Partition the vectors $\Delta y$ and $\psi$ in (9) into subvectors $[\Delta y_0^\top, \Delta y_1^\top, \dots, \Delta y_S^\top]^\top$ and $[\psi_0^\top, \psi_1^\top, \dots, \psi_S^\top]^\top$, respectively, with $\Delta y_l, \psi_l \in \mathbb{R}^{m_l}$, for $l = 0, \dots, S$. Here $\Delta y_0$ represents the dual step corresponding to the first-stage constraints, and $\Delta y_l$ represents the dual step corresponding to the second-stage constraints for the $l$th scenario. Hence, $m_l = m_1$ for $l = 1, \dots, S$; also denote $n_l = n_1$ for $l = 1, \dots, S$. The matrix factorization procedure that solves for the dual step $\Delta y$ in (9) is based on the following lemma; for a proof of the lemma, see [4].


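Lemma 1 Let $M = A\Theta A^\top$, where $\Theta$ is diagonal, and $R = \operatorname{diag}(R_0, R_1, \dots, R_S)$, where $R_0 = I$ is an $m_0 \times m_0$ identity matrix, $R_l = W_l \Theta_l W_l^\top \in \mathbb{R}^{m_l \times m_l}$, $l = 1, \dots, S$, and $\Theta_l \in \mathbb{R}^{n_l \times n_l}$ is the (diagonal) submatrix of $\Theta$ corresponding to the $l$th block. Also, let

$$G_1 = \Theta_0^{-1} + A_0^\top A_0 + \sum_{l=1}^S T_l^\top R_l^{-1} T_l, \qquad G = \begin{pmatrix} G_1 & A_0^\top \\ -A_0 & 0 \end{pmatrix}, \tag{15}$$

$$U = \begin{pmatrix} A_0 & I \\ T_1 & 0 \\ \vdots & \vdots \\ T_S & 0 \end{pmatrix}, \qquad V = \begin{pmatrix} A_0 & -I \\ T_1 & 0 \\ \vdots & \vdots \\ T_S & 0 \end{pmatrix}.$$

If $A_0$ and $W_l$, $l = 1, \dots, S$, have full row rank then $M$ and $G_2 = -A_0 G_1^{-1} A_0^\top$ are invertible, and

$$M^{-1} = R^{-1} - R^{-1} U G^{-1} V^\top R^{-1}. \tag{16}$$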


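Equation (16) indicates that the solution of the linear system $M\Delta y = (A\Theta A^\top)\Delta y = \psi$ can be expressed as $\Delta y = p - r$, where $p$ solves $Rp = \psi$, and $r$ is obtained from the system

$$Gq = V^\top p, \qquad Rr = Uq. \tag{17}$$

The vector $p$ can be computed componentwise by solving $R_l p_l = \psi_l$, for $l = 0, \dots, S$. The block structure of $G$ is exploited in solving for $q$. One can write:

$$\begin{pmatrix} G_1 & A_0^\top \\ -A_0 & 0 \end{pmatrix} \begin{pmatrix} q^1 \\ q^2 \end{pmatrix} = \begin{pmatrix} \hat{p}^1 \\ \hat{p}^2 \end{pmatrix}, \tag{18}$$

where

$$\begin{pmatrix} \hat{p}^1 \\ \hat{p}^2 \end{pmatrix} = V^\top p = \begin{pmatrix} A_0^\top p_0 + \sum_{l=1}^S T_l^\top p_l \\ -p_0 \end{pmatrix}.$$

Hence,

$$q^2 = -G_2^{-1}(\hat{p}^2 + A_0 G_1^{-1} \hat{p}^1), \tag{19}$$

$$q^1 = G_1^{-1}(\hat{p}^1 - A_0^\top q^2). \tag{20}$$

Once $q$ is known, $r$ can be computed componentwise from (17).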

The required operations rely extensively on matrix subblock computations that can be performed independently of one another. A parallel procedure for computing the dual step $\Delta y$ is summarized in the following box (denote by $A_{i\cdot}$ the $i$th row of $A$, and by $A_{\cdot j}$ the $j$th column of $A$).

Interprocessor data communication is necessary at only three points. After forming the terms $T_l^\top R_l^{-1} T_l$ in Steps 2a-2b the processors communicate to form the matrix $G_1$, and the vectors $\hat{p}^1$ and $\hat{p}^2$, in Step 2b. The results can be accumulated at a single master processor which executes serially the computations involving the dense matrices $G_1$ and $G_2$ in Steps 2c-2e. The computed vector $q$ is then broadcast to all other processors. Steps 3 and 4 require only the distributed data $R_l$, $T_l$, and $p_l$ on the $l$th processor and can be carried out with full parallelism. A final communication step accumulates the partial vectors $\Delta y_l$ at the master processor to form $\Delta y$. This vector can then be made available to all processors for use in the subsequent calculation of the directions $\Delta x$ and $\Delta z$; these computations involve only matrix-vector products and vector additions that can be parallelized in a rather straightforward manner [14]. This algorithm is suitable for implementation on


Begin with the following data distribution. 
Processor | holds W, Tj, ©), and wy, 
1 = 1,..., S. A designated master processor 
also holds Ag, Ro, @o, and Wo. 

1 (Parallel solution of Rp = .) 
The master processor sets Ropo=Ipo=Wo. 
Processors | = 1,...,S, form R; = W,0, Al 
and solve Rj p; = Wi for pi. 

2 (Solution of G, = V' p.) 

2a Processors 1 = 1,...,S, solve Rj(u;)! = (T)).; 


for (uj)', i = 1,...,o, thus computing the 
columns of the matrix u; = [(u))!,..., 
Gel =e iy. 

2b) Processors ? = 15. , Spmultiply yy — fla, 


to form v; = saa RT; and also compute ¢; = 
Tp, Communicate v; € R"*"° and c; € R” 
to form G (cf. (15)) and p! (cf. (18)) on the 
master processor. 
The master processor sets jie = —po. 

2c The master processor solves Gju = p! for u 
and sets v = p* + Agu (cf. (19)). 

2d The master processor forms G) by solving 


(G:)wi = (A). for w', i = 1, ..., mo, and 
qais @y = SA, coc WPL 

2e The master processor solves G2q* = —v for q* 
(cf. (19)), and solves Gyq! = p!— A] q? for q! 
(cf. (20)). 
Communicate to distribute q' € R” to all pro- 
cessors. 


3 (Parallel solution of Rr = Ug.) 
The master processor sets ry = Aoq! + q’. 
Processors ] = 1,..., S, solve Rjr; = 
for r). 

4 (Form Ay in parallel.) 
The master processor sets Ayo = po — ro. Pro- 
Gass | = Ml, coog Sy Set AW = fDi = ie 
Communicate to gather the vector Ay on the 
master processor. 


Tq! 


Parallel matrix factorization for dual step calculation 


distributed memory, as well as on shared memory mul- 
tiprocessors. 

An alternative parallel implementation is to distribute the matrices $G_1$ and $G_2$ to all processors, and let the processors proceed locally (and redundantly) with all calculations involving these matrices. The master processor approach uses an all-to-one communication step to accumulate the dense matrices at the master, followed by a one-to-all communication step to distribute the results to the processors. The alternative approach combines these two communication steps into a single all-to-all communication step. That step distributes the dense matrices to all processors, which then compute their own copies of the vector $q$ locally (and redundantly). Both of these alternatives are very efficient on present-day distributed memory machines with high-bandwidth interconnections.

Yet another alternative is to distribute the dense matrices across processors, and use parallel dense linear algebra techniques for all calculations that involve these matrices. This approach is more suitable for shared-memory, tightly coupled multiprocessors, and when the dense matrices $G_1$ and $G_2$ are large.

A.J. Berger et al. [1] proposed another matrix factorization procedure that exploits the block-structure form
of stochastic programs in interior point algorithms. The 
method, termed tree dissection, operates on the split- 
variable formulation and is applicable to multistage 
stochastic programs with recourse and convex, block- 
separable objective functions. A serial implementation 
of the method demonstrated very competitive compu- 
tational performance on large scale problems in com- 
parison to direct applications of interior point algo- 
rithms. 


Computational Experience 


Interior point algorithms can be applied to solve SP 
with linear or separable, convex objective functions. 
Separability is important to maintain sparsity in the 
projection matrices. Nonseparable problems can yield 
full projection matrices, thus dramatically increasing 
the computational complexity. In such cases, it is possi- 
ble to treat a problem as fully dense and directly paral- 
lelize the matrix operations involved in interior point 
methods without regard to problem structure [6,9]. 
However, such an approach is effective only for moder- 
ate size problems due to its substantial computational 
and storage requirements. 

Interior point algorithms have proved very ro- 
bust on two-stage SP with linear, or separable, convex 
quadratic objectives. The required number of iterations 
is neither particularly influenced in going from linear 
to separable quadratic SP, nor is it significantly affected 


by the size of the problem or by the conditioning of 
the objective function. Hence, the algorithms can solve 
to a high accuracy very large SP in a moderate num- 
ber of iterations. However, they may exhibit numeri- 
cal difficulties if the constraint matrix does not have full 
row rank. Even if the entire constraint matrix has full 
row rank, the parallel factorization procedure may suf- 
fer from numerical instabilities if the recourse matrices 
$W_l$ do not have full row rank as well. Thus, care must
be exercised in implementations to test and account for 
situations in which the recourse matrices are rank defi- 
cient. 

The parallel matrix factorization procedure pre- 
sented in this article has been subjected to extensive nu- 
merical experimentation on hypercubes and other par- 
allel computing systems [7,14], exhibiting a high level of 
scalability on large scale problems. In the implementa- 
tions, each parallel task factorized the part of the projec- 
tion matrix corresponding to a scenario. Moreover, the 
matrix operations involved in the factorization proce- 
dure, and throughout the interior point algorithm, can 
be executed efficiently on vector processors, or be fur- 
ther parallelized on massively parallel multiprocessors. 
Particularly, the operations on the small dense matrices 
that constitute the coordination step of the algorithm 
can be vectorized or parallelized to effectively eliminate 
any serial bottleneck. 
The simple integer recourse model with fixed technology matrix is defined as

$$\inf\{cx + Q(x) : Ax = b,\ x \in \mathbb{R}^{n_1}_+\},$$

where the expected value function $Q$ is

$$Q(x) = E_\xi\, v(\xi - Tx),$$

and $v$ is the value function of the second-stage problem

$$v(s) = \inf_y\{q^+ y^+ + q^- y^- : y^+ \ge s,\ y^- \ge -s,\ y^+, y^- \in \mathbb{Z}^m_+\}$$

for $s \in \mathbb{R}^m$. Here $c$, $A$, $b$, $q^+$, $q^-$ and $T$ are vectors/matrices of the appropriate size, $q^+, q^- \ge 0$, $q^+ + q^- > 0$, and $\xi$ is a random vector in $\mathbb{R}^m$.

As suggested by the name, this model has the same structure as the well-known continuous simple recourse model, in which the second-stage decision variables $y = (y^+, y^-)$ are non-negative reals. These models are indeed the simplest recourse models, both analytically and conceptually. In both models the recourse actions (that is, the compensations for observed deviations from the constraints $Tx = \xi$) are straightforward. For example, let $Tx$ represent production to meet uncertain demand $\xi$. In the continuous model, the recourse actions may then represent buying or selling any shortage or surplus. In the integer recourse model, buying and selling is only possible in batches of a certain size. In both models the objective function reflects the direct costs $cx$ and the expected recourse costs $Q(x)$.

Using separability, which is due to the simple recourse structure, $Q$ is completely characterized by the one-dimensional generic function $Q$, given by

$$Q(z) = q^+ E_\xi \lceil \xi - z \rceil^+ + q^- E_\xi \lfloor \xi - z \rfloor^-, \quad z \in \mathbb{R},$$

with $q^+, q^- \in \mathbb{R}_+$, $q^+ + q^- > 0$, $\xi$ a random variable, and $\lceil s \rceil^+ = \max\{0, \lceil s \rceil\}$, $\lfloor s \rfloor^- = \max\{0, -\lfloor s \rfloor\}$, $s \in \mathbb{R}$. Below we present results for the one-dimensional function $Q$; the extension to the $m$-dimensional case is straightforward.

In [11] structural properties of the function $Q$ are presented. A closed-form expression for $Q$ is given,

$$Q(z) = q^+ \sum_{k=0}^{\infty} \Pr\{\xi > z + k\} + q^- \sum_{k=0}^{\infty} \Pr\{\xi < z - k\}, \quad z \in \mathbb{R},$$

and conditions for (Lipschitz) continuity and (one-sided) differentiability are derived. (Corresponding results for the model in which the technology matrix $T$ is also random are given in [4].) The function $Q$ is continuous if and only if $\xi$ follows a continuous distribution. If $\xi$ is a discrete random variable with realizations $\xi^j$, $j = 1, \dots, r$, then $Q$ is lower semicontinuous with discontinuity points $\bigcup_j \{\xi^j + \mathbb{Z}\}$. Moreover, even if $Q$ is continuous, it is non-convex in general. This has led to the study of conditions on the distribution of $\xi$ such
that the function $Q$ is convex, and to the construction
of convex approximations of Q. 
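For a discretely distributed $\xi$, the closed-form series can be checked directly against the defining expectation. The plain-Python sketch below does this. The support points, probabilities and cost coefficients are arbitrary illustrations. The series is truncated at an index beyond which every term is zero, which is valid only because the support is bounded.

```python
import math


def Q_expectation(z, support, probs, q_plus, q_minus):
    """q+ E ceil(xi - z)^+ + q- E floor(xi - z)^- for a discrete xi."""
    total = 0.0
    for xi, pr in zip(support, probs):
        up = max(0, math.ceil(xi - z))        # ceil(s)^+  = max{0, ceil(s)}
        down = max(0, -math.floor(xi - z))    # floor(s)^- = max{0, -floor(s)}
        total += pr * (q_plus * up + q_minus * down)
    return total


def Q_series(z, support, probs, q_plus, q_minus):
    """Closed form: q+ sum_k Pr{xi > z+k} + q- sum_k Pr{xi < z-k}."""
    # All terms vanish once k exceeds max |xi - z| (bounded support assumed).
    kmax = math.ceil(max(abs(xi - z) for xi in support)) + 1
    plus = minus = 0.0
    for k in range(kmax + 1):
        plus += sum(p for xi, p in zip(support, probs) if xi > z + k)
        minus += sum(p for xi, p in zip(support, probs) if xi < z - k)
    return q_plus * plus + q_minus * minus


support, probs = [0.3, 1.7, 2.5], [0.2, 0.5, 0.3]
for z in (-1.2, 0.0, 0.9, 2.5, 4.1):
    a = Q_expectation(z, support, probs, 2.0, 1.5)
    b = Q_series(z, support, probs, 2.0, 1.5)
    print(f"z={z:5.2f}  expectation={a:.4f}  series={b:.4f}")
```

Evaluating either function on a fine grid of $z$ also makes the jumps at $\xi^j + \mathbb{Z}$ visible, as described above.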

Since $Q$ is continuous (and finite) precisely if $\xi$ is continuously distributed (with finite mean value), it follows that $Q$ can only be convex if $\xi$ has a probability density function, say $f$. In [9] it is shown that, under some mild technical conditions, $Q$ is convex if and only if $f(s) = G(s + 1) - G(s)$, $s \in \mathbb{R}$, where $G$ is an arbitrary cumulative distribution function with finite mean value. For example, if $G$ corresponds to the degenerate distribution in 1, then $f$ is a probability density function of the uniform distribution on $[0, 1]$. Formulated in terms of random variables, $Q$ is convex if and only if there exists a random variable $\eta$ with finite mean value such that, for all $s \in \mathbb{R}$, the conditional distribution of $\xi$ given $\eta = s$ is uniform on $[s - 1, s]$. From this we see that the uniform distribution with unit support plays a central role here.

In [7] it is shown that any reasonable convex 
approximation of the function Q can be represented as 
a one-dimensional expected value function of a con- 
tinuous simple recourse model, with a random right- 
hand side parameter whose distribution is known. Con- 
sequently, given a convex approximation of Q, the 
integer recourse model can be solved (at least ap- 
proximately) by well-known algorithms for continuous 
simple recourse models (see e. g. [1,3,12,13,19]). 

For the case that $\xi$ follows a finite discrete distribution, a strongly polynomial algorithm to construct the
convex hull of the function Q is given in [8]. In [6] it 
is shown that if the matrix T has full row rank, then 
the resulting one-dimensional functions can be used as 
building blocks for the convex hull of the n-dimen- 
sional expected value function Q. If this condition is not 
satisfied, a convex lower bound for Q is obtained. 

If $\xi$ is a continuous random variable, convex approximations of $Q$ can be obtained by perturbing its distribution. In [9] a class of such approximations is analyzed, defined by their probability density functions $f_\alpha(s) = F(\lfloor s \rfloor_\alpha + 1) - F(\lfloor s \rfloor_\alpha)$, $s \in \mathbb{R}$. Here $F$ is the cumulative distribution function of $\xi$, and $\alpha \in [0, 1)$ is a shift parameter. The notation $\lfloor \cdot \rfloor_\alpha$ denotes round down with respect to the set $\{\alpha + \mathbb{Z}\}$; the case $\alpha = 0$ corresponds to the usual integer round down. For each $\alpha \in [0, 1)$, consider the function $Q_\alpha(z) = q^+ E\lceil \xi_\alpha - z \rceil^+ + q^- E\lfloor \xi_\alpha - z \rfloor^-$, $z \in \mathbb{R}$, where the random variable $\xi_\alpha$ is distributed according to $f_\alpha$. This function is a piecewise linear convex approximation of $Q$. It is shown that
$$\|Q_\alpha - Q\|_\infty \le (q^+ + q^-)\,\frac{|\Delta| f}{4},$$

where $|\Delta| f$ is the total variation of $f$. Taking convex combinations can improve this uniform error bound by at most a factor of two. That improvement is obtained by using $f_{\alpha\beta} = (f_\alpha + f_\beta)/2$ with $|\alpha - \beta| = 1/2$ as the approximating distribution. For many distributions the total variation of $f$ decreases as the variance of the distribution increases. In these cases the approximation becomes better accordingly.
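The perturbed density $f_\alpha$ is easy to tabulate. On each interval $[\alpha + k, \alpha + k + 1)$ it is constant and equal to the probability that $\xi$ falls in that interval, so it integrates to one. The sketch below evaluates $f_\alpha$ for a normally distributed $\xi$. The normal distribution, its parameters and the helper names are illustrative assumptions, not taken from the source.

```python
import math


def normal_cdf(s, mu=0.0, sigma=1.0):
    """Cumulative distribution function F of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((s - mu) / (sigma * math.sqrt(2.0))))


def floor_alpha(s, alpha):
    """Round s down to the shifted lattice {alpha + Z}."""
    return alpha + math.floor(s - alpha)


def f_alpha(s, alpha, cdf=normal_cdf):
    """Perturbed density f_alpha(s) = F(floor_alpha(s) + 1) - F(floor_alpha(s))."""
    lo = floor_alpha(s, alpha)
    return cdf(lo + 1.0) - cdf(lo)


# f_alpha is piecewise constant, so a midpoint rule recovers its total mass.
alpha, h = 0.25, 1e-3
grid = [-12.0 + (i + 0.5) * h for i in range(int(24.0 / h))]
mass = sum(f_alpha(s, alpha) for s in grid) * h
print(round(mass, 6))
```

Shifting `alpha` moves the lattice of breakpoints without changing the interval masses, so $f_\alpha$ and $f_\beta$ with $|\alpha - \beta| = 1/2$ interleave. This is what the averaged density $f_{\alpha\beta}$ exploits.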

The continuous simple recourse representations of 
the approximations presented above have discretely 
distributed right-hand side parameters, and can there- 
fore be solved efficiently. Algorithms to compute these 
distributions (and standard solution methods) are im- 
plemented in the model management system SLP- 
IOR [2]. 

Most results referred to above can be found in [17]. 
For an overview of the field of stochastic (mixed-)in- 
teger programming beyond the simple recourse case, 
we refer to [5,10,14,15,16]. An extensive bibliography 
of stochastic programming in general can be found 
in [18], which also contains a separate listing of stochas- 
tic integer programming literature. 
A two-stage stochastic linear program with recourse (SLPR) with random right-hand sides and objective costs is normally written as follows:

$$\min_{x \ge 0}\ cx + Q(x) \quad \text{s.t.} \quad Ax = b,$$

where

$$Q(x) = \int_\Omega Q(x, \omega)\, g(\omega)\, d\omega,$$

and

$$Q(x, \omega) = \min_{y \ge 0}\{q(\eta)\, y : Wy = h(\xi) - T(\xi)\, x\}.$$


Here, $g(\omega)$ is the density function for the joint random vector $\omega := (\xi, \eta)$, whose support is the set $\Omega$, and $W$ is an $(m \times n)$ matrix. This way of formulating a two-stage stochastic program is motivated partly by solution procedures, and partly by the time structure of the problem. For this article, the former is more important. The interpretation of the problem is that first (now) we make a decision $x$, then we observe a value of the joint vector $\omega$, and finally we make a recourse decision $y$ based on our earlier decision $x$ and the observed value of $\omega$.

In a direct approach for solution of the above problem, such as Benders decomposition [1] (or equivalently the L-shaped decomposition [12]), a master problem is created to determine a first stage solution $x$ (say $x_0$), along with a subproblem to determine the second stage value function $Q(x_0)$ by integrating:

$$\int_\Omega Q(x_0, \omega)\, g(\omega)\, d\omega = \int_\Omega f(\omega)\, g(\omega)\, d\omega, \qquad f(\omega) := Q(x_0, \omega).$$
For most functions $f$ and densities $g$ this is impossible to do exactly. Numerical integration when the dimension of the random vector $\omega$ is beyond 5 or 6 is computationally intractable. Hence, one has to resort to bounds or approximations. Normally, the integral is replaced by a lower bounding expression, either by replacing $f(\omega)$ by some simpler $L(\omega)$, $g(\omega)$ by some simpler $l(\omega)$, or both. Benders decomposition may then be applied to this simplified problem to arrive at an optimal lower bounding solution $x_0$. To check if this solution is good enough as an optimal solution to the given problem, an upper bound $U$ to $Q(x_0)$ must be found. Such upper bounds are usually determined in one of two ways. One relies on the solution of a certain moment problem, which would essentially yield a discretization of the support $\Omega$. The other uses a functional approximation $U(\omega)$ that serves as an upper bounding function to $f(\omega)$. The latter case is discussed in this article. In this case, the resulting approximations may require either univariate integration on marginal domains, or simple discretizations of the support to allow efficient computation.

Upper bounds of interest here can be categorized 
depending on whether SLPR has randomness only in 
the right-hand sides, or it has randomness in both the 
objective costs and right-hand sides. In the former case, 
the function Q is convex in the random vector, while 
in the latter case, it is convex-concave in the random 
vectors. 

We first consider the convex case, by restricting $\eta$ to be a degenerate random vector, and thus using the notation $\xi = \omega$ and $q = q(\eta)$. An easy upper bound
in this case is available due to H.P. Edmundson and A. 
Madansky. 


The Edmundson-Madansky Upper Bound 


The Edmundson-Madansky upper bound (EM-bound) 
is based on articles by Edmundson [7] and Madansky 
[9]. This bound can be interpreted in terms of a mo- 
ment problem with first moment condition, as well as 
in terms of an upper bounding function $U(\xi)$ on $f(\xi)$.
We consider the latter construction, as illustrated in 
Fig. 1. 

The upper bound $U(\xi)$ can be written as $U(\xi) = r\xi + s$, with $r = (f(b) - f(a))/(b - a)$ and $s = b f(a)/(b - a) - a f(b)/(b - a)$. Upon integration, one obtains the upper bound as

$$\int_a^b U(\xi)\, g(\xi)\, d\xi = p f(a) + (1 - p) f(b).$$

In other words, integrating $U(\xi)$ instead of $f(\xi)$ gives an upper bound that amounts to just evaluating the function $f$ at the extreme values of the support $[a, b]$, using the weights

$$p = \frac{b - E\xi}{b - a} \quad \text{and} \quad 1 - p = \frac{E\xi - a}{b - a}.$$

Fig. 1 The basis for the EM-bound
If the random vector $\xi$ has $K$ independent random components, the above reduction will leave us with a total of $2^K$ points to evaluate. Hence, with more than about 10 random variables, this approximation scheme becomes computationally unattractive, and there is a need for an upper bound whose complexity is not exponential in the number of random variables.
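The EM-bound needs only the support endpoints and the means. The sketch below assumes independent components. Under that assumption, applying the one-dimensional bound coordinate by coordinate gives product weights at the $2^K$ vertices of the support box, which is exactly the exponential growth noted above. Here `f` stands for any convex function supplied by the caller, for instance a recourse value computed by an LP solver, which is not included.

```python
from itertools import product


def em_bound(f, lows, highs, means):
    """Edmundson-Madansky upper bound on E f(xi) for convex f and
    independent components xi_k supported on [lows[k], highs[k]]."""
    K = len(lows)
    total = 0.0
    for corner in product((0, 1), repeat=K):   # the 2^K vertices
        point, weight = [], 1.0
        for k, upper in enumerate(corner):
            a, b, m = lows[k], highs[k], means[k]
            if upper:
                point.append(b)
                weight *= (m - a) / (b - a)    # weight 1 - p on the endpoint b
            else:
                point.append(a)
                weight *= (b - m) / (b - a)    # weight p on the endpoint a
        total += weight * f(point)
    return total


# One-dimensional check: xi uniform on [0, 1], f(xi) = xi^2.
# E f = 1/3, while the EM-bound is p f(0) + (1 - p) f(1) = 0.5.
print(em_bound(lambda v: v[0] ** 2, [0.0], [1.0], [0.5]))  # prints 0.5
```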


The Piecewise Linear Upper Bound 


The piecewise linear upper bound (PL-bound) is based on the articles independently developed by S.W. Wallace [13] and by J.R. Birge and R.J-B. Wets [4], later combined in [3]. Assume now that $f(\xi)$ is given by

$$f(\xi) = \min_y\{qy : Wy = b + H\xi,\ 0 \le y \le c\},$$

that is, $T(\xi) = T$, $H$ is a deterministic matrix with $n_1$ columns, the $n_1$ random variables are independent, and there is no randomness in the upper limits $c$. For simplicity, assume that the support of $\xi$ is $[0, B]$, with $B = (B_1, \dots, B_{n_1})$. More complex situations can also be treated. The goal is to find an upper bounding function

$$U(\xi) = f(0) + \sum_{i=1}^{n_1} d_i \xi_i.$$

This function $U(\xi)$ is useful in that for most $g(\xi)$ it is simple to integrate. We first solve

$$f(0) = \min_y\{qy : Wy = b,\ 0 \le y \le c\} = qy^0.$$

This is the base case, and we define $\alpha^1 = -y^0$ and $\beta^1 = c - y^0$. Define a counter $r$ and let $r := 1$. Now, solve (letting $H_r$ be column $r$ of the matrix $H$)

$$\min_y\{qy : Wy = H_r B_r,\ \alpha^r \le y \le \beta^r\} = qy^r = d_r B_r.$$

Then, update the bounds to obtain

$$\alpha^{r+1} = \alpha^r - \min\{y^r, 0\}, \qquad \beta^{r+1} = \beta^r - \max\{y^r, 0\}.$$

Now, increment $r$ by one and repeat until all $n_1$ random variables have been treated. The PL-bound, as outlined here, requires the solution of $n_1 + 1$ linear programs, in contrast to the EM-bound, which needed $2^{n_1}$ linear programs in the same setting. Many other versions of this bound exist; see, for example, [8, Sect. 3.4.4; 6.5.1].
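Here is a sketch of the PL-bound recursion above. It assumes SciPy's `scipy.optimize.linprog` with the HiGHS backend as the LP solver; the function name and the tiny test instance are illustrative. Each call solves one of the $n_1 + 1$ linear programs, and the bound vectors $\alpha^r$, $\beta^r$ shrink exactly as in the update rule.

```python
import numpy as np
from scipy.optimize import linprog


def pl_bound(q, W, b, H, B, c):
    """Return f(0) and slopes d with U(xi) = f(0) + sum_r d_r xi_r, for
    f(xi) = min{qy : Wy = b + H xi, 0 <= y <= c} and xi in [0, B]."""
    q, c = np.asarray(q, float), np.asarray(c, float)
    base = linprog(q, A_eq=W, b_eq=b, bounds=list(zip(np.zeros_like(c), c)),
                   method="highs")
    if base.status != 0:
        raise ValueError("base LP f(0) was not solved")
    y0 = base.x
    alpha, beta = -y0, c - y0                 # alpha^1 = -y^0, beta^1 = c - y^0
    d = []
    for r in range(H.shape[1]):
        res = linprog(q, A_eq=W, b_eq=H[:, r] * B[r],
                      bounds=list(zip(alpha, beta)), method="highs")
        if res.status != 0:
            d.append(np.inf)                  # infeasible step: bound is +infinity
            continue
        yr = res.x
        d.append(res.fun / B[r])              # q y^r = d_r B_r
        alpha = alpha - np.minimum(yr, 0.0)
        beta = beta - np.maximum(yr, 0.0)
        beta = np.maximum(beta, alpha)        # guard against round-off only
    return base.fun, np.array(d)


# Illustrative instance: one row y1 + y2 = xi1 + xi2, y1 cheap but capped at 2.
f0, d = pl_bound(q=[1.0, 3.0], W=np.array([[1.0, 1.0]]), b=np.array([0.0]),
                 H=np.array([[1.0, 1.0]]), B=[2.0, 2.0], c=[2.0, 10.0])
print(f0, d)  # expected by hand: f(0) = 0 and d close to [1, 3]
```

In this instance the first random variable uses up the cheap capacity of $y_1$. The second variable then receives slope 3 although $f$ itself has slope 1 near the origin. This shows how reserving the upper limits makes the bound depend on the processing order.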

Note that the EM-bound and the PL-bound are not comparable, in the sense that either one can be better than the other. The PL-bound may be infinite even if the true expected value is finite, whereas the EM-bound is finite if and only if the true value is finite. If the function $f(\xi)$ turns out to be linear in $\xi$, both bounds are exact.


Restrictions 


Upper bounds can be found by adding restrictions to 
the solution set of a problem. The PL-bound above is 
an example of that. In that case the restriction amounts 
to reserving certain parts of the upper limits for certain 
random variables. Another type of thinking about restrictions can be found in [10], where it is pointed out that

$$\int_a^b f(\xi)\, g(\xi)\, d\xi = \int_a^b \min_{y \ge 0}\{qy : Wy = b + h(\xi)\}\, g(\xi)\, d\xi \le \min_{y \ge 0} \int_a^b \{qy : Wy = b + h(\xi)\}\, g(\xi)\, d\xi.$$

The logic of this bound is that if we allow only one $y$ for all values of $\xi$, rather than a function of $\xi$, we restrict the problem, and hence obtain an upper bound. The usefulness of this bound depends on our ability to evaluate the final expression. In [11] this expression is used in connection with another very useful observation to arrive at a restricted-recourse bound. Let

$$z_\xi = \min_{y \ge 0}\{qy : Wy \ge \xi\},$$

and define $\pi^*$ to be an optimal dual solution to this problem. With $\pi \ge \pi^*$, we get that $z_\xi = z_\xi^\pi$, where

$$z_\xi^\pi = \min_{y \ge 0}\{qy + \pi^T(\xi - Wy)^+\}.$$

Combining the two results, we get that the following yields an upper bound:

$$\min_{y \ge 0}\left\{qy + \pi^T \int_a^b (\xi - Wy)^+\, g(\xi)\, d\xi\right\}.$$


Solving this problem amounts to solving a stochastic 
program with simple recourse, and is not particularly 
hard. The quality of the bound depends to a large extent 
on the ability to find tight dual solutions. Note that this 
bound does not require convexity, namely, the matrix 
W is allowed to have random elements, and that it can 
be used for much more general situations than here. 

As D.P. Morton and R.K. Wood noted, the re- 
stricted recourse bounds provide improvements over 
the penalty-based aggregation bounds developed in [2] 
and [6]. Restrictions are also used to bound a multistage 
problem in [14]. 


Upper Bounds for a Convex-Concave Case 


When the right-hand side and objective cost vectors (of the second stage problem) are dependent random vectors, upper bounds are developed in [5]. The first essential idea is to enclose the joint support $\Omega$ within a known $(K + L)$-dimensional simplex $\hat\Omega$, where $K$ and $L$ are, respectively, the numbers of $\xi$ and $\eta$ random variables. Such a simplex can generally be determined quite easily; however, the quality of the bound can be affected by the particular choice. Let the chosen simplex $\hat\Omega$ have the extreme points $w^i = (u^i, v^i)$ for $i = 1, \dots, K + L + 1$.

The second important idea is to develop an upper bounding function $U(x, \omega)$ to $Q(x, \omega)$ by using the convexity property of $Q$ in the $\xi$ vector, given a fixed first stage decision $x$. Towards this, let $\Omega_\eta$ be the conditional domain of $\Omega$ for any fixed $\eta$ value. Under convexity, the following inequality holds:

$$Q(x, \omega) \le U(x, \omega) = \sum_{i=1}^{K+L+1} p_i(\omega)\, Q(x, u^i, \eta), \quad \text{for } \omega \in \Omega_\eta,$$

where the nonnegative multipliers $p_i(\omega)$ satisfy the convexity constraints for any $\omega \in \Omega$:

$$\sum_{i=1}^{K+L+1} w^i p_i(\omega) = \omega, \qquad \sum_{i=1}^{K+L+1} p_i(\omega) = 1.$$


Taking expectations under the conditioning argument with respect to the 'true' density $g(\omega)$ yields $Q(x) \le EU(x, \omega)$. However, $EU(x, \omega)$ in itself is not easy to evaluate. Hence, a simple inequality is utilized to upper bound the latter expectation. Notice that

$$EU(x, \omega) = \sum_{i=1}^{K+L+1} \int_\Omega \phi(\omega, x, i)\, g(\omega)\, d\omega,$$

where

$$\phi(\omega, x, i) = \min_{y^i \ge 0}\{p_i(\omega)\, q(\eta)\, y^i : Wy^i = h(u^i) - T(u^i)\, x\},$$

in which each minimization involves only random objective coefficients. Then, apply Jensen's inequality on the inner minimization to obtain the upper bound as

$$EU(x, \omega) \le \sum_{i=1}^{K+L+1} \min_{y^i \ge 0}\{\bar p_i\, q(\tilde\eta^i)\, y^i : Wy^i = h(u^i) - T(u^i)\, x\},$$

provided the certainty equivalents $\bar p_i$ and $\tilde\eta^i$ satisfy the condition:

$$\bar p_i\, q(\tilde\eta^i) = \int_\Omega p_i(\omega)\, q(\eta)\, g(\omega)\, d\omega.$$


N.C.P. Edirisinghe [5] shows that the latter equivalent representation can be uniquely determined when the objective cost vector $q(\eta)$ is affine in $\eta$. To compute these certainty equivalents, consider the (nonsingular) vertex matrix $V$ of the simplex $\hat\Omega$, whose $i$th column is given by

$$(u^i_1, \dots, u^i_K, v^i_1, \dots, v^i_L, 1)^T.$$

The inverse matrix of $V$ is denoted by $V^{-1}$, whose $i$th row is $V^{-1}_i$. Then, it can be shown that

$$\bar p_i = V^{-1}_i \left(E[\xi_1], \dots, E[\xi_K], E[\eta_1], \dots, E[\eta_L], 1\right)^T.$$

Moreover, $\tilde\eta^i$ is evaluated for each coordinate $l$ ($l = 1, \dots, L$) by $\tilde\eta^i_l = r_i / \bar p_i$, where $r_i$ is the $i$th element of the $(K + L + 1)$-column vector

$$V^{-1}\left(E[\xi_1 \eta_l], \dots, E[\xi_K \eta_l], E[\eta_1 \eta_l], \dots, E[\eta_L \eta_l], E[\eta_l]\right)^T.$$


Consequently, the upper bound requires all first moments and the second-order moments, including the variance information, of $\eta$. The upper bound computations require solving only $K + L + 1$ linear programs. Therefore, the complexity of the bound does not grow exponentially with the number of random variables. This bound is illustrated in Fig. 2.

Fig. 2 Convex-concave upper bound in two dimensions

It can be verified through counterexamples that this upper bound is not associated with the solution of a moment problem having the concerned moment conditions. In contrast, upper bounds in the convex-concave case are generally associated with solutions to moment problems. Also, when applied to the case $K = 1$ and $L = 0$ (when only the right-hand side is random), this upper bound reduces to the previously discussed Edmundson-Madansky upper bound.
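The certainty equivalents need only the inverse vertex matrix and a few moments. The NumPy sketch below computes $\bar p_i$ and $\tilde\eta^i_l$ from sample moments for a triangle (the case $K = L = 1$). The sampling scheme, the triangle and the function name are illustrative assumptions. Barycentric coordinates are affine in $\omega$, so the computed $\bar p_i$ should coincide with the sample means of the coordinates $p_i(\omega)$. Likewise, $\bar p_i \tilde\eta^i$ should equal the sample mean of $p_i(\omega)\,\eta$. This is the certainty-equivalence condition for affine $q(\eta)$.

```python
import numpy as np


def certainty_equivalents(vertices, xi, eta):
    """vertices: (K+L+1, K+L) simplex extreme points w^i = (u^i, v^i);
    xi: (N, K) and eta: (N, L) samples.
    Returns p_bar with shape (K+L+1,) and eta_tilde with shape (K+L+1, L)."""
    V = np.vstack([vertices.T, np.ones(len(vertices))])    # ith column (u^i, v^i, 1)
    Vinv = np.linalg.inv(V)
    omega1 = np.hstack([xi, eta, np.ones((len(xi), 1))])   # rows (xi, eta, 1)
    p_bar = Vinv @ omega1.mean(axis=0)                      # V^{-1}(E xi, E eta, 1)
    L = eta.shape[1]
    eta_tilde = np.empty((len(vertices), L))
    for l in range(L):
        # moment vector (E[xi_k eta_l], E[eta_j eta_l], E[eta_l])
        r = Vinv @ (omega1 * eta[:, [l]]).mean(axis=0)
        eta_tilde[:, l] = r / p_bar
    return p_bar, eta_tilde


rng = np.random.default_rng(1)
vertices = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])   # triangle in (xi, eta)
raw = rng.uniform(size=(1000, 3))
bary = raw / raw.sum(axis=1, keepdims=True)                 # barycentric coordinates
pts = bary @ vertices                                        # points inside the triangle
p_bar, eta_tilde = certainty_equivalents(vertices, pts[:, :1], pts[:, 1:])
print(p_bar, eta_tilde[:, 0])
```

With $\bar p_i$ and $\tilde\eta^i$ in hand, the bound requires one LP per vertex, with cost vector $\bar p_i\, q(\tilde\eta^i)$ and right-hand side $h(u^i) - T(u^i)x$.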
Introduction


Traditional deterministic optimization methods are used for well-defined objective and constraint functions, i.e., when it is possible to calculate exactly the function $F_0(x)$ to be minimized (or maximized) and to verify the constraints

$$F_i(x) \le 0, \quad i = 1, \dots, m, \qquad (1)$$

for each decision vector $x = (x_1, \dots, x_n) \in X$, where the set $X$ has a "simple" structure (for example, defined by linear constraints). Usually it is also assumed that gradients or subgradients (for nonsmooth functions) $F_{ix}$ of the functions $F_i$, $i = 0, 1, \dots, m$, are easily calculated. Stochastic Quasigradient (SQG) methods have been developed for solving general optimization problems without exact calculation of $F_i$, $F_{ix}$. They incorporate basic ideas of standard optimization methods, random search procedures, stochastic approximation and statistical estimation. There are at least three main application areas for SQG methods:
• Deterministic problems for which the calculation of descent directions is difficult (large-scale, nonsmooth, distributed, and nonstationary optimization models).
• Multiextremal problems where it is important to bypass locally optimal solutions.
• Problems involving uncertainties and/or difficulties in the evaluation of functions and their subgradients (stochastic, spatial, and dynamic optimization problems with multidimensional integrals, simulation and other analytically intractable models).

Thus, SQG methods are used in situations where the structure of the problem does not permit the application of one of the many tools of deterministic optimization. They require only modest computer resources per iteration and reach the vicinity of optimal solutions with reasonable speed. The accuracy reached is sufficient for many applications. Further details on SQG methods can be found in the references.

The main idea of the SQG methods as proposed in [1,2,3] (see also [5,6,7,8,9]) is to use statistical (biased and unbiased) estimates of objective and constraint functions and/or their gradients (subgradients). In other words, a sequence of approximate solutions $x^0, x^1, \dots$ is constructed by using random variables $\eta_i(k)$ and random vectors $\xi^i(k)$, $i = 0, \dots, m$. These are chosen so that the conditional mathematical expectation for a given "history" $B_k$ (say, $(x^0, \dots, x^k)$) satisfies

$$E[\eta_i(k) \mid B_k] = F_i(x^k) + a_i(k), \qquad (2)$$

$$E[\xi^i(k) \mid B_k] = F_{ix}(x^k) + b^i(k), \qquad (3)$$

where $a_i(k)$, $b^i(k)$ are "errors" (bias) of the estimates $\eta_i(k)$, $\xi^i(k)$. For the exact convergence of the sequence $\{x^k\}$ to optimal solutions, $a_i(k)$ and $b^i(k)$ must tend (in some sense) to 0 as $k \to \infty$. Vectors $\xi^i(k)$ are called stochastic quasigradients. If $b^i(k) = 0$, then they are also called stochastic gradients for continuously differentiable $F_i(x)$ and stochastic subgradients (generalized gradients) for nonsmooth $F_i(x)$. In what follows, the notations $F(x)$, $\eta(k)$, $\xi(k)$ are also used instead of $F_0(x)$, $\eta_0(k)$, $\xi^0(k)$.

Consider the simplest SQG method. Assume that there are no constraints (1), and that $X$ is a closed bounded (compact) convex set such that the orthogonal projection $\Pi_X(y)$ of a point $y$ on $X$ is easily calculated: $\Pi_X(y) = \arg\min\{\|y - x\|^2 : x \in X\}$. For example, $\Pi_{a \le x \le b}(y) = \max[a, \min\{y, b\}]$ (componentwise). The SQG projection method is defined iteratively as follows:

$$x^{k+1} = \Pi_X\left[x^k - \rho_k \xi(k)\right], \quad k = 0, 1, \dots, \qquad (4)$$

where $\rho_k$ is a positive stepsize and $x^0$ is an arbitrary initial approximation (guess).
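Here is a minimal sketch of the projection method (4) on a box. It assumes a toy objective $F(x) = E\|x - \omega\|^2$ with normally distributed $\omega$, so that $\xi(k) = 2(x^k - \omega^k)$ is an unbiased stochastic gradient. The stepsize $\rho_k = 1/(k+1)$ is one common choice satisfying $\sum_k \rho_k = \infty$ and $\sum_k \rho_k^2 < \infty$. It is an assumption of this sketch, not a rule prescribed by the text.

```python
import numpy as np


def project_box(y, lo, hi):
    """Orthogonal projection onto {lo <= x <= hi}: max[lo, min{y, hi}]."""
    return np.maximum(lo, np.minimum(y, hi))


def sqg_projection(sample_grad, x0, lo, hi, iters, rng):
    """Iterate x^{k+1} = Pi_X[x^k - rho_k xi(k)] with rho_k = 1/(k+1)."""
    x = project_box(np.asarray(x0, float), lo, hi)
    for k in range(iters):
        rho = 1.0 / (k + 1)
        x = project_box(x - rho * sample_grad(x, rng), lo, hi)
    return x


# Toy problem: F(x) = E||x - omega||^2, omega ~ N(m, I), X = [-1, 1]^2.
# The minimizer over X is the projection of m onto the box.
m = np.array([2.0, -0.3])


def grad(x, rng):
    omega = m + rng.standard_normal(2)
    return 2.0 * (x - omega)                  # E[grad] = F_x(x) = 2(x - m)


rng = np.random.default_rng(42)
x_final = sqg_projection(grad, [0.0, 0.0], -1.0, 1.0, 20000, rng)
print(x_final)  # close to the projected mean (1.0, -0.3)
```

Each iteration uses one simulated $\omega$ and one projection, so the cost per step is tiny. That is the trade-off described in the introduction: modest work per iteration, at the price of only gradually approaching the optimum.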


Calculation of SQG 


Let us consider some important typical examples of 
SQG. 


Example 1. Monte Carlo Optimization Various practical problems are so complicated that only a Monte Carlo simulation model is available [2,4,6] to indicate how the system might react to any given choice of the decision variable $x$. We can always view a given simulation run of such a model as the observation of an "environment" $\omega$ from a sample space $\Omega$. To simplify matters, let us assume that only a single quantity $f(x, \omega)$ summarizes the output of the simulation $\omega$ for given $x$. The problem is then to minimize the expected performance (cost, risk, profit, a "distance" from given goals or a reference point, etc.):

$$F(x) = Ef(x, \omega). \qquad (5)$$

This is a typical stochastic optimization problem. Exact values of $F(x)$ are not known explicitly. The information available at each current solution $x^k$ and simulation run $\omega$ is $\eta(k) = f(x^k, \omega)$, which satisfies (2) with $a(k) = 0$. The vector $\xi(k)$ can be calculated as in standard stochastic approximation procedures. At each step $k$, for given $x^k$, simulate random outcomes $f(x^k, \omega^{k0})$ and $f(x^k + \Delta_k e^j, \omega^{kj})$, $j = 1, \dots, n$, where $\Delta_k$ is a positive increment in the direction $e^j$ of the $j$th coordinate axis; calculate


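$$\xi(k) = \sum_{j=1}^{n} \Delta_k^{-1}\left[f(x^k + \Delta_k e^j, \omega^{kj}) - f(x^k, \omega^{k0})\right] e^j. \qquad (6)$$

Simulations $\omega^{k0}, \omega^{k1}, \dots, \omega^{kn}$ are not necessarily independent. One possibility is to use only one simulation $\omega^k$ at each step $k$: $\omega^{k0} = \dots = \omega^{kn} = \omega^k$. The variance of such a single-run estimate of SQG converges to 0 as $k \to \infty$, whereas for independent simulations it goes to $\infty$. Since

$$E[\xi(k) \mid x^k] = \sum_{j=1}^{n} \Delta_k^{-1}\left[F(x^k + \Delta_k e^j) - F(x^k)\right] e^j,$$

then for continuously differentiable $F(\cdot)$:

$$E[\xi(k) \mid x^k] = F_x(x^k) + C(k)\Delta_k, \qquad (7)$$

where $\|C(k)\| \le \text{const} < \infty$ for all $x^k$ from a bounded set $X$.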
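The effect of common random numbers in (6) is easy to see on a toy simulation with additive noise, $f(x, \omega) = \|x\|^2 + \omega$; this toy model is an illustrative assumption. With one simulation $\omega^k$ shared by all $n + 1$ runs, the noise cancels in every difference. With independent runs, each component carries noise of variance $2/\Delta_k^2$, which blows up as $\Delta_k \to 0$.

```python
import numpy as np


def fd_quasigradient(f, x, delta, rng, common=True):
    """Finite-difference SQG (6):
    sum_j [f(x + delta e^j, w^{kj}) - f(x, w^{k0})] / delta * e^j."""
    n = x.size
    w0 = rng.standard_normal()
    base = f(x, w0)
    xi = np.empty(n)
    for j in range(n):
        wj = w0 if common else rng.standard_normal()   # w^{kj} = w^{k0}, or a fresh run
        step = np.zeros(n)
        step[j] = delta
        xi[j] = (f(x + step, wj) - base) / delta
    return xi


def f(x, w):
    return float(x @ x) + w        # F(x) = E f(x, w) = ||x||^2, so F_x(x) = 2x


rng = np.random.default_rng(7)
x, delta = np.array([1.0, -2.0, 0.5]), 1e-2
common = np.array([fd_quasigradient(f, x, delta, rng, True) for _ in range(200)])
indep = np.array([fd_quasigradient(f, x, delta, rng, False) for _ in range(200)])
print(common.var(axis=0))   # essentially zero: each estimate equals 2x + delta
print(indep.var(axis=0))    # of order 2 / delta^2 = 2e4
```

In both cases the mean is $2x + \Delta_k$, matching (7) with a bias term of order $\Delta_k$. Only the variance differs.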


Example 2. Optimization by random search Suppose that $F(x)$ can be evaluated exactly but this is time consuming, say, because $F(x)$ is defined on solutions of differential equations or on solutions of other optimization problems. A purely random trial-and-error method may also be time consuming. In such a method, $x^{k+1} \in X$ is drawn at random until $F(x^{k+1}) < F(x^k)$, and so on. The difficulty is that the probability of "hitting" at random even a subspace as large as the nonnegative orthant of $n$-dimensional Euclidean space is $2^{-n}$. The traditional finite difference approximation

$$F_x(x^k) \approx \sum_{j=1}^{n} \Delta_k^{-1}\left[F(x^k + \Delta_k e^j) - F(x^k)\right] e^j \qquad (8)$$

requires $n + 1$ evaluations of $F(\cdot)$, and this also may be time consuming. The SQG

$$\xi(k) = \tfrac{3}{2}\,\Delta_k^{-1}\left[F(x^k + \Delta_k \zeta^k) - F(x^k)\right]\zeta^k, \qquad (9)$$

where $\zeta^k$ has independent components uniformly distributed on $[-1, 1]$, requires only two evaluations of $F(x)$, at the points $x^k$ and $x^k + \Delta_k \zeta^k$, independently of the dimensionality $n$. It is easy to see that vector (9) satisfies (7) for continuously differentiable $F(x)$.


Example 3. Finite difference approximations of subgradients The finite difference approximations (6), (8), (9) cannot be used for nondifferentiable functions, e.g., for stochastic two-stage and minimax problems. SQG methods allow one to develop simple finite-difference subgradient approximations for general (deterministic and stochastic) nonsmooth optimization problems. Roughly speaking, (6), (8), (9) are slightly randomized by substituting for the current point $x^k$ a random point $\tilde x^k = x^k + v^k$. Here the random vector $v^k$ has a density and $\|v^k\| \to 0$ with probability 1. This randomization ensures convergence even for locally Lipschitz and discontinuous functions [2,4,5], pp. 151, 320, [6,7].

Assume that $F(x)$ is a locally integrable (possibly discontinuous) function and the vector $v^k$ has a sufficiently smooth density concentrated in a bounded set.



Stochastic Quasigradient Methods 


Then

$$\xi(k) = \tfrac{3}{2}\,\Delta_k^{-1}\left[F(\tilde x^k + \Delta_k \zeta^k) - F(\tilde x^k)\right]\zeta^k, \qquad (10)$$

$$\xi(k) = \tfrac{3}{2}\,\Delta_k^{-1}\left[f(\tilde x^k + \Delta_k \zeta^k, \omega^{k1}) - f(\tilde x^k, \omega^{k0})\right]\zeta^k \qquad (11)$$

are SQG of $F(k, x) = EF(x + v^k)$, the so-called stochastic mollifier quasigradients (SMQG) of $F(x)$. The function $F(k, x)$ converges (in some sense) to $F(x)$, and $F_x(k, x)$ converges [4,6] to the set of subgradients $F_x(x)$. We have

$$E[\xi(k) \mid x^k] = F_x(k, x^k) + C(k)\Delta_k, \qquad (12)$$

where $\|C(k)\| \le \text{const} < \infty$ for all $x^k$ from a bounded set. The analysis of convergence of $x^k$ involves general ideas of nonstationary optimization (see Example 5). The important advantage of this approach is that $F(k, x)$ smooths out rapid oscillations of $F(x)$ and reflects the general trend of $F(x)$. In this sense $F(k, x)$ provides a "bird's eye" view of the "landscape" $F(x)$, enabling $\{x^k\}$ to bypass inessential local solutions. Large enough $v^k$ force the procedure to concentrate on essential (global) solutions.


Example 4. Global optimization. The simplest way to introduce “inertia” into a gradient-type procedure, so that it bypasses some local solutions, is to perturb the gradient $F_x(x^k)$ by a random vector $v^k$, i.e., to consider $\xi(k) = F_x(x^k) + v^k$, $E v^k = 0$. A special choice of $v^k$ corresponds to simulated annealing. Another approach is to cut off local solutions by sequential convex approximations [6].


Example 5. Nonstationary optimization. Many applied problems, [2,5], pp. 152–156, [6], such as those in Example 3, can be formulated as optimization problems with an objective function $F_0(k, x)$ and constraint functions $F_i(k, x)$ that change at each step k = 0, 1, .... In this case an SQG method performs, at step k, one step of the minimization of $F_0(k, x)$ using estimates of $F_{ix}(k, x)$, i = 0, ..., m. An important case arises when $F_i(k, x) \to F_i(x)$ in some sense. It is then possible to prove that $F_0(k, x^k) \to \min F_0(x)$. In the general case, one can specify a wide variety of situations for which $|F_0(k, x^k) - \min_x F_0(k, x)| \to 0$ as $k \to \infty$.


Convergence Properties 


SQG methods generate random sequences of approximate solutions $\{x^k(\omega)\}$ and values $\{F(x^k(\omega))\}$ indexed by ω from an appropriately defined probability space. Most important, from a practical point of view, is the convergence of $x^k(\omega)$ (or $F(x^k)$) to the set of local (in general) solutions $X^*$ (the set $F^* = F(X^*)$) for almost all ω (with probability 1). The convergence with probability 1 of the sequence $\{F(x^k)\}$ to the set $F^*$ was proved for rather general nonsmooth (generalized differentiable, locally Lipschitz, and even semicontinuous) functions covering a wide range of applications. The limit points of $\{x^k(\omega)\}$ for each ω form a connected set in $X^*$. The convergence $x^k(\omega) \to X^*$ with probability 1 takes place under “convexity” assumptions. Global convergence in the general case requires special stochastic mechanisms [2,6]. In all cases convergence requires a special choice of the stepsize $\rho_k$. Due to the complexity of the problems, $\rho_k$ cannot be chosen in a way that guarantees the monotonic decrease of F(x), $F(x^{k+1}) < F(x^k)$, k = 0, 1, .... A relatively flexible requirement that often guarantees the convergence of the sequence $\{F(x^k)\}$ with probability 1 is that, with probability 1,

$$\rho_k \ge 0, \qquad \sum_{k=0}^{\infty} \rho_k = \infty, \qquad \sum_{k=0}^{\infty} E\left[\rho_k \|b(k)\| + \rho_k^2 \|\xi(k)\|^2\right] < \infty, \qquad (13)$$

where b(k) is the bias term of the SQG ξ(k).


For example, consider (6) with dependent observations $\omega^{k0} = \omega^{k1} = \dots = \omega^{kn} = \omega^k$ and $\|f_x(x, \omega)\| \le \text{const} < \infty$. In this case we can always assume in practice that $\|\xi(k)\| \le \text{const} < \infty$. Then condition (13) leads to the requirement $\rho_k \ge 0$, $\sum_{k=0}^{\infty} \rho_k = \infty$, $\sum_{k=0}^{\infty} (\rho_k \Delta_k + \rho_k^2) < \infty$, which is satisfied for $\rho_k = C/(k+1)$, $\Delta_k = D/(k+1)$ with constants C, D. In practice C, D are usually adjusted [2,6] at each step by taking into account the history $B_k$, for example by using the values $\bar F(k) = (k+1)^{-1} \sum_{s=0}^{k} f(x^s, \omega^s)$. Various adaptive SQG methods with adjustment of $\rho_k$ as a function of $B_k$ have been studied in [5], pp. 373–385, 316–322, [11]. One idea is to choose $\rho_k$ more or less so as to minimize $E[F(x^k - \rho\,\xi(k)) \mid x^k]$. This leads to adaptive modifications of $\rho_k$ that are proportional to the inner product $\langle \xi(k+1), \xi(k) \rangle$. The important issues of determining the moment of termination (stopping time) and the confidence intervals for approximate solutions have also been studied [5], pp. 353–373.
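To see the stepsize rules $\rho_k = C/(k+1)$, $\Delta_k = D/(k+1)$ at work, here is a minimal Python sketch of the finite-difference method with dependent observations (one sample $\omega^k$ shared by all n + 1 evaluations at step k). The forward-difference form, the test problem $F(x) = E\|x - \omega\|^2$ with $\omega \sim N(m, I)$, and the function and argument names are our own illustrative assumptions.

```python
import numpy as np

def kiefer_wolfowitz_crn(f, x0, sample, n_iter, C=0.5, D=1.0, rng=None):
    """Finite-difference SQG method with dependent observations (sketch).

    At step k a single sample omega^k is used for all n + 1 evaluations of
    f(x, omega), and rho_k = C/(k+1), Delta_k = D/(k+1). The callables
    f(x, omega) and sample(rng) are hypothetical user-supplied functions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    for k in range(n_iter):
        rho, delta = C / (k + 1), D / (k + 1)
        omega = sample(rng)                    # omega^{k0} = ... = omega^{kn} = omega^k
        f0 = f(x, omega)
        xi = np.empty(n)
        for j in range(n):                     # forward difference along e^j
            e = np.zeros(n)
            e[j] = delta
            xi[j] = (f(x + e, omega) - f0) / delta
        x = x - rho * xi
    return x

# Hypothetical test problem: F(x) = E||x - omega||^2, omega ~ N(m, I); minimizer x = m.
m = np.array([3.0, -1.0])
x_hat = kiefer_wolfowitz_crn(
    f=lambda x, w: float(np.sum((x - w) ** 2)),
    x0=np.zeros(2),
    sample=lambda rng: rng.normal(m, 1.0),
    n_iter=20_000,
    rng=np.random.default_rng(1),
)
```

For this quadratic, each coordinate estimate equals $2(x_j - \omega_j) + \Delta_k$, so with C = 0.5 the iterate is essentially a running mean of the samples, shifted by a bias of order $D \log k / k$; with independent samples in the two evaluations the estimate would instead have variance of order $\Delta_k^{-2}$.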

SQG methods require appropriate techniques to prove their convergence. These techniques ([2,5], pp. 155–156, [7,9]) can be viewed as a stochastic Lyapunov method for the stability analysis of nonsmooth dynamic systems. The main idea is to show that, for each ω, $\{x^k(\omega)\}$ leaves any neighborhood of points that do not belong to $X^*$ with decreasing values of some (in general nonsmooth) Lyapunov function.


Nonsmooth Problems 


Common-sense arguments in using SQG methods for nonsmooth problems may be misleading (see applications of SQG methods). An exception, in a sense, is the class of problems with so-called generalized differentiable (GD) functions [4,7]. The class of GD functions is closed with respect to the min and max operators and superpositions. The following important formula holds for the set ∂f of subgradients:

$$\partial \max\{f_1(x), f_2(x)\} = \mathrm{co}\left\{\partial f_i(x) : f_i(x) = \max(f_1(x), f_2(x))\right\}, \qquad (14)$$


and subgradients of a composite function $\Psi(f_1, \dots, f_l)$ are calculated by intuitively obvious chain rules. The class of GD functions is also closed with respect to the expectation operator:

$$\partial F(x) = E\,\partial f(x, \omega), \qquad F(x) = E f(x, \omega), \qquad (15)$$

where f(·, ω) is a GD function.

Formulas (14), (15) provide a useful tool for calculating subgradients. Unfortunately, for general classes of nonsmooth functions their direct use becomes tedious, and in some cases (14), (15) are invalid. The most promising approach seems to be the use of SMQG similar to (10), (11).


Averaging Operations 


The method (4) and many other SQG methods have the same basic structure as their deterministic counterparts. The following stochastic linearization method possesses an essentially new feature. Consider again the minimization of F(x), x ∈ X. Assume that F(x) is a continuously differentiable function and that X is a convex compact set. The method is defined iteratively by


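$$x^{k+1} = x^k + \rho_k(\bar x^k - x^k), \qquad 0 \le \rho_k \le 1, \qquad (16)$$

$$\bar\xi(k+1) = \bar\xi(k) + \delta_k\left(\xi(k+1) - \bar\xi(k)\right), \qquad (17)$$

$$\bar x^k \in \arg\min\left\{\langle \bar\xi(k), x \rangle \mid x \in X\right\},$$

where $x^0$ is an arbitrary initial approximation from X. The well-known deterministic counterpart has $\delta_k \equiv 1$, $\xi(k) = F_x(x^k)$. Simple examples show that without the averaging operation (17), i.e., with $\delta_k \equiv 1$, method (16) does not converge. For convergence it is required, in addition to (13), that

$$\delta_k \ge 0, \qquad \rho_k/\delta_k \to 0, \qquad \sum_{k=0}^{\infty} E\,\delta_k^2 < \infty. \qquad (18)$$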


Method (16) is generally used when X is defined by linear constraints. In this case a linear subproblem is solved at each step k. In contrast, the projection method (4) requires the solution of a quadratic subproblem. Note that in both cases only small perturbations occur at each step in the objective functions of the subproblems; therefore, only small adjustments of the preceding solutions are needed. This method can be modified for nondifferentiable functions and constraints (1). In particular, it is possible to use SMQG as in (10), (11) (stochastic finite-difference approximations) for locally Lipschitz functions.
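A minimal Python sketch of the stochastic linearization method (16)–(17) for the special case where X is a box, so that the linear subproblem is solved in closed form by picking a vertex. The box, the stepsizes $\rho_k = 1/(k+2)$ and $\delta_k = (k+1)^{-0.6}$ (chosen so that $\rho_k/\delta_k \to 0$), and the quadratic test problem are illustrative assumptions, not taken from the article.

```python
import numpy as np

def stochastic_linearization(grad_sample, lower, upper, n_iter, rng):
    """Sketch of method (16)-(17) when X is the box [lower, upper]."""
    x = (lower + upper) / 2.0
    xi_bar = grad_sample(x, rng)                        # xi_bar(0) = xi(0)
    for k in range(n_iter):
        # Linear subproblem: argmin <xi_bar, x> over a box is a vertex.
        x_bar = np.where(xi_bar > 0, lower, upper)
        x = x + (1.0 / (k + 2)) * (x_bar - x)           # step (16), rho_k = 1/(k+2)
        delta = (k + 1) ** -0.6                         # rho_k / delta_k -> 0
        xi_bar = xi_bar + delta * (grad_sample(x, rng) - xi_bar)   # averaging (17)
    return x

# Hypothetical test problem: F(x) = E||x - omega||^2, omega ~ N(m, I), X = [0, 1]^2.
# The constrained minimizer is (0.3, 1.0).
m = np.array([0.3, 2.0])
grad_sample = lambda x, rng: 2.0 * (x - rng.normal(m, 1.0))
x_lin = stochastic_linearization(grad_sample, np.zeros(2), np.ones(2), 20_000,
                                 np.random.default_rng(0))
```

Setting `delta = 1.0` in the loop removes the averaging, so each linear subproblem is driven by a single noisy sample; the vertex choice then flips almost at random wherever the gradient is small relative to the noise.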

The use of averaging operations similar to (17) is 
often crucial for the convergence of SQG methods as 
well as their efficiency and robustness. 

This operation is also applied to the directions ξ(k) in (4); i.e., instead of ξ(k) the vector $\bar\xi(k)$ is used, where for k = 0, 1, ...

$$\bar\xi(k+1) = \bar\xi(k) + \delta_k\left[\xi(k+1) - \bar\xi(k)\right], \qquad \bar\xi(0) = \xi(0).$$


It gives procedure (4) inertia, or “heavy ball” properties, in addition to the global features it inherits from its stochastic mechanisms. It may also reduce the variance of the SQG. Averaging the approximate solutions $x^k$ may also improve the asymptotic properties [10].
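The projection method with averaged directions can be sketched as follows. The box X, the stepsizes $\rho_k = 0.5/(k+1)$ and $\delta_k = (k+1)^{-0.6}$, and the test problem are hypothetical choices made for illustration; the projection onto a box is just coordinatewise clipping.

```python
import numpy as np

def projection_with_averaged_directions(grad_sample, lower, upper, x0, n_iter, rng):
    """Sketch of the projection method (4) driven by averaged directions xi_bar."""
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    xi_bar = grad_sample(x, rng)                        # xi_bar(0) = xi(0)
    for k in range(n_iter):
        rho = 0.5 / (k + 1)                             # assumed stepsize
        x = np.clip(x - rho * xi_bar, lower, upper)     # projection onto the box X
        delta = (k + 1) ** -0.6                         # assumed averaging weight
        xi_bar = xi_bar + delta * (grad_sample(x, rng) - xi_bar)
    return x

# Hypothetical test problem: F(x) = E||x - omega||^2, omega ~ N((0.3, 2.0), I),
# X = [0, 1]^2; the constrained minimizer is (0.3, 1.0).
m = np.array([0.3, 2.0])
grad_sample = lambda x, rng: 2.0 * (x - rng.normal(m, 1.0))
x_avg = projection_with_averaged_directions(grad_sample, 0.0, 1.0, np.zeros(2), 20_000,
                                            np.random.default_rng(0))
```

Because $\bar\xi$ is an exponentially weighted average of past SQG, the step direction changes slowly from one iteration to the next, which is the inertia the text refers to.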
General Constraints


Constraints (1) for which the functions $F_i(x)$ are not known explicitly can be treated by using penalty functions, the averaging operation, and Lagrange multipliers. Consider the minimization of $F_0(x)$, x ∈ X, subject to constraints (1). Instead we can minimize a penalty function, for example


$$W(x, c) = F_0(x) + \frac{c}{2}\sum_{i=1}^{m} \left(\max\{0, F_i(x)\}\right)^2 \qquad (19)$$

on X, where c is a large number. If exact values of $F_i(x)$ are not available, then $\max\{0, F_i(x)\}$ is unknown. Note that if the $F_i$ are GD functions, then W(·, c) is also a GD function with subgradient $W_x = F_{0x} + c\sum_{i=1}^{m} \max\{0, F_i\}\, F_{ix}$, where $F_{0x}$, $F_{ix}$ are subgradients of $F_0$, $F_i$.

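Assume that statistical estimates $\xi^0(k)$, $\xi^i(k)$ satisfying (3) are available. Consider the SQG procedure with the embedded averaging operation (21):

$$x^{k+1} = \Pi_X\left[x^k - \rho_k\left(\xi^0(k) + c\sum_{i=1}^{m} \max\{0, \bar F_i(k)\}\, \xi^i(k)\right)\right], \qquad (20)$$

$$\bar F_i(k+1) = \bar F_i(k) + \delta_k\left[\eta_i(k) - \bar F_i(k)\right], \qquad i = 1:m, \qquad (21)$$

where $\eta_i(k)$ are statistical estimates of $F_i(x^k)$.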
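Here is a small Python sketch of the penalty procedure (20)–(21) with one constraint. The test problem (a quadratic objective with a noisy linear constraint), the box X = [−2, 2]², the penalty constant c = 10 and the stepsizes are all hypothetical; the direction uses the factor $\max\{0, \bar F_1(k)\}$ exactly as in (20), which corresponds to a quadratic penalty.

```python
import numpy as np

def penalty_sqg(n_iter, c=10.0, seed=0):
    """Sketch of (20)-(21) on a hypothetical test problem:
    minimize F0(x) = E||x - omega||^2, omega ~ N(0, 0.3^2 I), over X = [-2, 2]^2,
    subject to F1(x) = E[eta] - x1 - x2 <= 0, eta ~ N(1, 0.3^2).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    grad_F1 = np.array([-1.0, -1.0])                  # xi^1(k), exact for this F1
    F1_bar = rng.normal(1.0, 0.3) - x.sum()           # initial estimate of F1(x^0)
    for k in range(n_iter):
        rho, delta = 0.05 * (k + 1) ** -0.8, (k + 1) ** -0.6   # rho_k / delta_k -> 0
        xi0 = 2.0 * (x - rng.normal(0.0, 0.3, size=2))          # xi^0(k)
        direction = xi0 + c * max(0.0, F1_bar) * grad_F1        # bracket in (20)
        eta = rng.normal(1.0, 0.3) - x.sum()                    # eta_1(k), estimate of F1(x^k)
        F1_bar += delta * (eta - F1_bar)                        # averaging (21)
        x = np.clip(x - rho * direction, -2.0, 2.0)             # projection Pi_X
    return x

x_pen = penalty_sqg(20_000)
t_exact = 10.0 / (2.0 * (1.0 + 10.0))   # penalized minimizer: x1 = x2 = c / (2 (1 + c))
```

Averaging matters here because $\max\{0, \cdot\}$ is nonlinear: plugging a single noisy sample $\eta_1(k)$ into it would give a biased estimate of $\max\{0, F_1(x^k)\}$. As c grows, the penalized minimizer approaches the constrained solution (0.5, 0.5).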

The convergence of this method with probability 1 requires conditions similar to those in (13), (18). The following procedure also converges under conditions similar to (13), (18):


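$$x^{k+1} = \Pi_X\left[x^k - \rho_k\, \xi(k)\right], \qquad \xi(k) = \begin{cases} \xi^0(k), & \text{if } \max_i \bar F_i(k) = \bar F_{i_k}(k) \le 0, \\ \xi^{i_k}(k), & \text{if } \bar F_{i_k}(k) > 0. \end{cases} \qquad (22)$$

Assume that $F_i(x)$, i = 0, 1, ..., m, are convex functions and X is a convex compact set. The SQG Lagrange multiplier method is characterized by the relations

$$x^{k+1} = \Pi_X\left[x^k - \rho_k\left(\xi^0(k) + \sum_{i=1}^{m} \lambda_i(k)\, \xi^i(k)\right)\right], \qquad (23)$$

$$\lambda_i(k+1) = \min\left[\max\{0, \lambda_i(k) + \delta_k\, \eta_i(k)\},\ C\right], \qquad (24)$$

where $\eta_i(k)$, $\xi^i(k)$ are estimates of $F_i(x^k)$, $F_{ix}(x^k)$ as in (2), (3); $F_{ix}$ are subgradients of $F_i(x)$; C is a large enough number; $\rho_k$, $\delta_k$ are stepsizes.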

The procedure (23) can be interpreted in the context of nonstationary optimization: $x^{k+1}$ is the result of one step of procedure (4) applied to the nonstationary function $W(k, x) = F_0(x) + \sum_{i=1}^{m} \lambda_i(k) F_i(x)$ with SQG $\xi^0(k) + \sum_{i=1}^{m} \lambda_i(k)\, \xi^i(k)$. It was proved [9] that $\min_{s \le k} F_0(x^s)$ converges to $\min F_0(x)$ (over the feasible set) with probability 1, provided that $F_0(x)$ is strictly convex, $\rho_k = \delta_k$, and (13) is satisfied with $\|\xi(k)\|^2$ replaced by $\sum_{i=0}^{m} \|\xi^i(k)\|^2$. Convergence for convex functions $F_0(x)$, not necessarily strictly convex, was established under additional assumptions on $\delta_k$ similar to (18). The convergence of the sequences $\sum_{s=0}^{k} \rho_s x^s / \sum_{s=0}^{k} \rho_s$ and $\sum_{s=0}^{k} \rho_s \lambda(s) / \sum_{s=0}^{k} \rho_s$, where $\lambda(s) = (\lambda_1(s), \dots, \lambda_m(s))$, to the saddle points of the Lagrange function does not require the strict convexity of $F_0(x)$, and it allows different stepsizes $\rho_k$, $\delta_k$ (see stochastic quasigradient methods in minimax problems).
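The Lagrange multiplier method (23)–(24) can be sketched in Python on a small hypothetical problem: minimize $E\|x - \omega\|^2$ with $\omega \sim N(0, 0.3^2 I)$ over X = [−2, 2]², subject to $E\eta - x_1 - x_2 \le 0$ with $\eta \sim N(1, 0.3^2)$. The objective is strictly convex, and the sketch uses $\rho_k = \delta_k$ as in the convergence result quoted above; the particular stepsize sequence and the bound C = 10 are our assumptions.

```python
import numpy as np

def lagrange_sqg(n_iter, C=10.0, seed=0):
    """Sketch of (23)-(24) with rho_k = delta_k on the hypothetical problem above."""
    rng = np.random.default_rng(seed)
    x, lam = np.zeros(2), 0.0
    grad_F1 = np.array([-1.0, -1.0])                        # xi^1(k), exact for this F1
    for k in range(n_iter):
        rho = delta = 0.5 * (k + 1) ** -0.6
        xi0 = 2.0 * (x - rng.normal(0.0, 0.3, size=2))      # xi^0(k)
        eta = rng.normal(1.0, 0.3) - x.sum()                # eta_1(k), estimate of F1(x^k)
        x = np.clip(x - rho * (xi0 + lam * grad_F1), -2.0, 2.0)   # (23)
        lam = min(max(0.0, lam + delta * eta), C)                  # (24)
    return x, lam

x_lag, lam_lag = lagrange_sqg(20_000)
# The exact solution is x = (0.5, 0.5) with multiplier lambda = 1.
```

Unlike the penalty approach, the multiplier update (24) uses the noisy constraint estimate $\eta_1(k)$ linearly, so no averaging of constraint values is needed to avoid bias.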
Let us consider some optimization problems that require SQG methods:

$$\min\left\{F_0(x) \mid F_i(x) \le 0,\ i = 1:m,\ x \in X\right\}. \qquad (1)$$


Maximization via Ordering Rules 


Realistic decision problems involve multiple objectives and inherent uncertainty. Generally, it is not possible to optimize several objectives simultaneously, for instance, to minimize cost while maximizing reliability. Therefore, it is necessary to strike a balance between the various objectives, and if we can specify some (utility) function U(x) that combines all objectives into a scalar index of preferability, then the problem of decision making can be cast in the format of the standard optimization problem of maximizing U(x). Unfortunately, finding such a function may be a very difficult task. It is often much easier to arrive at a preference ordering, [4], p. 176, among feasible decisions (based on some rules or direct judgements by decision makers). Therefore, let us assume that instead of U(x) there is a given consistent “mechanism” (ordering ≻) that can verify whether a vector x is preferred to y (x ≻ y), and whose outcomes are equivalent to those of an unknown continuous function U: x ≻ y if and only if U(x) > U(y). Let us define F(x) = −U(x) and


$$\xi(k) = \begin{cases} -\zeta(k), & \text{if } x^k + \Delta_k \zeta(k) \succ x^k, \\ \zeta(k), & \text{if } x^k + \Delta_k \zeta(k) \prec x^k, \end{cases}$$

where $\Delta_k \to 0$ and ζ(k), k = 0, 1, ..., are independent samples of the random vector ζ uniformly distributed over the unit sphere. Then, for continuously differentiable U(x), $E[\xi(k) \mid x^k] = -\alpha\, U_x(x^k)/\|U_x(x^k)\|$ up to a bias that vanishes as $\Delta_k \to 0$, where α is a positive number; i.e., the vector ξ(k) estimates the direction of the gradient $F_x(x^k)$ and can be used in SQG methods (see Stochastic quasigradient methods) to minimize F(x), that is, to maximize U(x), without knowing this function.
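The comparison-based direction estimate can be tried out with a hypothetical decision maker. In this Python sketch the oracle `prefers(y, z)` hides a utility $U(x) = -\|x - a\|^2$ that the algorithm never evaluates; ties are treated as “not preferred”, and the stepsizes $\rho_k = 0.5(k+1)^{-0.7}$, $\Delta_k = 0.1(k+1)^{-0.5}$ are illustrative assumptions.

```python
import numpy as np

def ordering_direction(prefers, x, delta, rng):
    """Estimate of the direction of F_x = -U_x from a single comparison (sketch).

    prefers(y, z) is a hypothetical oracle returning True when y is preferred
    to z; "not preferred" (including ties) is treated as y < z here.
    """
    zeta = rng.normal(size=x.size)
    zeta /= np.linalg.norm(zeta)                  # uniform direction on the unit sphere
    return -zeta if prefers(x + delta * zeta, x) else zeta

# A hypothetical decision maker whose hidden utility is U(x) = -||x - a||^2.
a = np.array([1.0, -0.5])
prefers = lambda y, z: np.sum((y - a) ** 2) < np.sum((z - a) ** 2)

rng = np.random.default_rng(0)
x = np.zeros(2)
for k in range(5000):
    rho, delta = 0.5 * (k + 1) ** -0.7, 0.1 * (k + 1) ** -0.5
    x = x - rho * ordering_direction(prefers, x, delta, rng)   # SQG step on F = -U
```

Each step has unit length, so only the direction carries information; convergence comes from the decreasing stepsizes, much as in sign-based stochastic approximation.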


Expected Utility Maximization 


In practice, a given decision x often results in different outcomes $g(x, \omega) = (g_1(x, \omega), g_2(x, \omega), \dots, g_l(x, \omega))$ affected by uncertainty ω (“environment”, parameters). Using either objective or subjective probability, it is possible to treat ω as a stochastic variable characterized by the prior probability measure P(·). The expected utility is the evaluation

$$U(x) = E\,u(g_1(x, \omega), g_2(x, \omega), \dots, g_l(x, \omega)) = \int u(g(x, \omega))\, P(d\omega), \qquad (2)$$

which is linear with respect to P; i.e., if ω is a mixture of ω′ and ω″ with probabilities α and 1 − α, 0 < α < 1, then

$$U(x) = \alpha\, E u(g(x, \omega')) + (1 - \alpha)\, E u(g(x, \omega'')). \qquad (3)$$
The maximization of (2) is a special case of the general stochastic optimization (STO) problem, which does not necessarily satisfy (3). Expected utility theory neglects the major difficulties involved in the maximization of U(x): exact evaluation of U(x) as the integral (2), analytically or numerically, is possible only in exceptional cases. Consequently, applications suffer from highly restrictive assumptions, for example, that ω has a discrete probability distribution with a small number N of possible states ω = 1, ..., N. SQG methods avoid the calculation of the integrals (2). Take F(x) = −E u(g(x, ω)). Assume that the functions $u(z_1, \dots, z_l)$, $z_i = g_i(x, \omega)$, are known explicitly and that the gradients $u_z(\cdot)$, $g_{ix}(\cdot)$ are calculated exactly for given z, x, ω. Then an SQG of F(x) is


$$\xi(k) = (\xi_1(k), \dots, \xi_n(k)), \qquad \xi_j(k) = -\sum_{i=1}^{l} u_{z_i}\left(g_1(x^k, \omega^k), \dots, g_l(x^k, \omega^k)\right) g_{ix_j}(x^k, \omega^k). \qquad (4)$$


Example 1. Portfolio Selection Problems 


The advantage of using (4) is evident even in a simple single-period model. Assume that at the beginning of the period (week, month, year, etc.) an investor allocates funds among different investment alternatives with random rates of return. He may, for example, be in charge of investing the foreign currency reserve of a central bank, decide on project financing, or manage a mutual fund. Let j = 1, ..., n denote assets (or classes of investment) with random rates of return $\omega_j$; $x_j$ is the share of asset j to be included in the portfolio; $c_j$ is its current price; W is the initial fund. The net future value of the portfolio is then $g(x, \omega) = W - \sum_{j=1}^{n} c_j x_j + \sum_{j=1}^{n} \omega_j x_j$, where $x = (x_1, \dots, x_n)$ satisfies feasibility constraints. The expected utility is $U(x) = E\,u\!\left(W + \sum_{j=1}^{n} (\omega_j - c_j) x_j\right) = \int \cdots \int u(g(x, \omega))\, P(d\omega)$, where u(z) is assumed to be a monotonically increasing function of z. If each $\omega_j$ is characterized by a finite number M of states, the expectation U(x) reduces to a sum of $N = M^n$ terms. The number $M^n$ is astronomically large even for M = 10, n = 10 ($10^{10}$ terms); i.e., although ω is characterized by a finite number N of states, the exact evaluation of U(x) is still a tedious task. The vector ξ(k) in (4) has components $\xi_j(k) = -u'\!\left(W + \sum_{i=1}^{n} (\omega_i^k - c_i)\, x_i^k\right)(\omega_j^k - c_j)$, where $\omega^k$, k = 0, 1, ..., are independent samples of ω from its probability distribution.
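The portfolio SQG can be exercised on a case whose optimum is known in closed form. The assumptions below are ours, chosen only to make the sketch checkable: exponential utility $u(z) = 1 - e^{-az}$, independent normal future prices $\omega_j \sim N(c_j + \mu_j, \sigma_j^2)$, W = 0 (for exponential utility W only rescales $u'$), and box constraints $0 \le x_j \le 3$ in place of a budget constraint. Under these assumptions the maximizer of U is $x_j^* = \mu_j/(a\sigma_j^2)$.

```python
import numpy as np

def portfolio_sqg(mu, sigma, prices, a, n_iter, x_max=3.0, seed=0):
    """SQG for max E u(W + sum_j (omega_j - c_j) x_j), u(z) = 1 - exp(-a z), W = 0 (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(len(mu))
    for k in range(n_iter):
        omega = rng.normal(prices + mu, sigma)          # sample of future prices omega^k
        excess = omega - prices                         # omega_j^k - c_j
        u_prime = a * np.exp(-a * (excess @ x))         # u'(W + ...) with W = 0
        xi = -u_prime * excess                          # components xi_j(k) of (4)
        x = np.clip(x - (k + 1) ** -0.6 * xi, 0.0, x_max)   # step on F = -U, then project
    return x

mu, sigma = np.array([0.2, 0.1]), np.array([0.5, 0.4])
x_port = portfolio_sqg(mu, sigma, prices=np.array([1.0, 1.0]), a=1.0, n_iter=30_000)
x_closed_form = mu / (1.0 * sigma ** 2)                 # [0.8, 0.625] in this Gaussian case
```

Each iteration needs one sample of all returns and one evaluation of $u'$, instead of the $M^n$-term sum required for exact evaluation of U(x).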


Stochastic Optimization Problems 


It is often impossible to summarize the outcomes g(x, ω) of a decision x into a single index of preferability (2). Such cases lead to the following general STO problem. Given a probability space describing the possible environments ω ∈ Ω with associated probability measure P, a stochastic optimization (STO) problem is to find x ∈ X ⊆ $\mathbb{R}^n$ such that the constraints

$$F_i(x) = E[f_i(x, \omega)] = \int f_i(x, \omega)\, P(x, d\omega) \le 0, \qquad i = 1:m, \qquad (5)$$

are satisfied and the objective function

$$F_0(x) = E f_0(x, \omega) = \int f_0(x, \omega)\, P(x, d\omega) \qquad (6)$$

is minimized. The functions $f_i(x, \omega)$ and $F_i(x)$, i = 0, 1, ..., m, are called sample and expectation functions, respectively. In some problems the functions $f_i(x, \omega)$ depend, [4], pp. 17, 173, not only on the outcomes g(x, ω) but also on their expectations E g(x, ω):

$$f_i(x, \omega) = \Psi_i(x, E g(x, \omega), \omega). \qquad (7)$$

In this case, the calculation of $f_i(x, \omega)$ requires the calculation of the expectation E g(x, ω); i.e., the functions $f_i(x, \omega)$ themselves are not known explicitly.

The functions $f_i(x, \omega)$, and even $F_i(x)$ in (5), are often discontinuous. For example, if we set $f_i(x, \omega) = 1 - r_i$ when an outcome $g_i(x, \omega) \ge 0$, and $f_i(x, \omega) = -r_i$ otherwise, then for the given i constraint (5) corresponds to the safety or chance constraint

$$F_i(x) = \Pr[g_i(x, \omega) \ge 0] - r_i \le 0, \qquad (8)$$

where $0 < r_i < 1$ is a safety level (risk factor). For a discrete distribution of ω, the function $F_i(x)$ in (8) is discontinuous.

A rather general approach, besides SQG methods, consists of approximating the distribution P in (5), (6) by a discrete distribution $p^N = (p_1, \dots, p_N)$, e.g., by the empirical distribution $p_s = 1/N$, s = 1, ..., N. As a result, the integrals in (5), (6) are replaced by sums, i.e., the functions $F_i(x)$ are approximated by $F_i^N(x) = \sum_{s=1}^{N} p_s f_i(x, \omega_s)$, and the resulting problem could be solved, if possible, by standard deterministic methods. This approach can be used only when P does not depend on x, N is a small number (as in Example 1), and $f_i(\cdot, \omega)$ are analytically tractable functions. Besides, the deterministic approximations $F_i^N(x)$ may destroy the smoothness, the continuity, and even the convexity [3] of the functions $F_i(x)$. The convergence of $\min F_0^N$ to $\min F_0(x)$ as N → ∞ is established in practically all important cases. Despite this, the number of discontinuities and local optimal solutions of the approximating problems, as can be seen from (8) and further examples, may tend to ∞ without any connection to the solutions of the original problem. SQG methods have advantages in such cases. They deal directly with the functions $F_i(x)$, which allows them to exploit a remarkable feature of many STO problems: despite nonsmooth, discontinuous, and even nonconvex sample functions $f_i(\cdot, \omega)$, and hence $F_i^N(x)$, the functions $F_i(x)$ are often continuous and convex.

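If P in (5) does not depend on x and the subgradients $f_{ix}(x, \omega)$ are calculated exactly, then, under certain regularity assumptions that ensure the interchangeability of differentiation and integration, $f_{ix}(x^k, \omega^k)$ is an SQG of $F_i(x)$ at $x = x^k$. In case $P(x, d\omega) = p(x, \omega)\, d\omega$ and $p(x, \omega)$, $f_i(x, \omega)$ are calculated exactly, an SQG of $F_i(x)$ is computed as

$$\xi^i(k) = f_{ix}(x^k, \omega^k) + f_i(x^k, \omega^k)\, \frac{p_x(x^k, \omega^k)}{p(x^k, \omega^k)}, \qquad (9)$$

where $\omega^k$ is drawn from $P(x^k, d\omega)$.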
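Estimator (9) is the likelihood-ratio (score-function) form of the SQG. The following Python sketch checks it on a hypothetical case where the answer is known: $F(x) = E[\omega^2]$ with $\omega \sim N(x, 1)$, so that $F(x) = x^2 + 1$ and $F'(x) = 2x$; here $f$ does not depend on x directly, so all of the derivative comes from the density term. The function name and the test case are our own assumptions.

```python
import numpy as np

def likelihood_ratio_sqg(f, f_x, p, p_x, x, omega):
    """SQG (9): f_x + f * p_x / p, for omega drawn from the density p(x, .)."""
    return f_x(x, omega) + f(x, omega) * p_x(x, omega) / p(x, omega)

# Hypothetical check: F(x) = E[omega^2], omega ~ N(x, 1), so F'(x) = 2x.
pdf = lambda x, w: np.exp(-0.5 * (w - x) ** 2) / np.sqrt(2.0 * np.pi)
pdf_x = lambda x, w: (w - x) * pdf(x, w)            # derivative of the density in x
rng = np.random.default_rng(0)
x = 1.5
omegas = rng.normal(x, 1.0, size=200_000)           # omega^k ~ P(x, d omega)
xi = likelihood_ratio_sqg(lambda x, w: w ** 2, lambda x, w: np.zeros_like(w),
                          pdf, pdf_x, x, omegas)
estimate = xi.mean()                                # close to F'(1.5) = 3.0
```

Note that the naive pathwise term $f_x$ alone would be identically zero here, which shows why the density term in (9) is needed when P depends on x.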
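If P(x, dω) is not known but the function $\int f_i(x, \omega)\, P(y, d\omega)$ is continuously differentiable with respect to y at y = x, then the following SQG has a bias $\|b_i(k)\| \le \text{const}\,\Delta_k$ for all $x^k$ from a bounded set:

$$\xi^i(k) = f_{ix}(x^k, \omega^{k0}) + \sum_{j=1}^{n} \Delta_k^{-1}\left[f_i(x^k, \omega^{kj}) - f_i(x^k, \omega^{k0})\right] e^j, \qquad (10)$$

where $\omega^{kj}$, j = 1, ..., n, are independent observations of ω from $P(x^k + \Delta_k e^j, d\omega)$, $\omega^{k0}$ is an observation from $P(x^k, d\omega)$, and $e^j$ is the jth unit coordinate vector.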

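For general nonsmooth $F_i(x)$, the SQG are calculated by using random points $\tilde x^k$ instead of $x^k$, as in (11), (12) in Stochastic quasigradient methods:

$$\xi^i(k) = \sum_{j=1}^{n} \Delta_k^{-1}\left[f_i(\tilde x^k + \Delta_k e^j, \omega^{kj}) - f_i(\tilde x^k, \omega^{k0})\right] e^j, \qquad (11)$$

$$\xi^i(k) = \frac{3}{2\Delta_k}\left[f_i(\tilde x^k + \Delta_k \zeta^k, \omega^{k1}) - f_i(\tilde x^k, \omega^{k0})\right] \zeta^k, \qquad (12)$$

where $\zeta^k$ has independent components uniformly distributed on [−1, 1].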

The choice (11) with $\tilde x^k = x^k$ corresponds to the standard stochastic approximation originating from the classical papers of Robbins–Monro and Kiefer–Wolfowitz (see, for example, [10,11]). It was proposed for the unconstrained minimization of $F(x) = \int f(x, \omega)\, P(x, d\omega)$, where F(x) is a twice continuously differentiable convex function.

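Vectors (10)–(12) have (for fixed $x^k$, $\tilde x^k$) unbounded variance, $\mathrm{Var}\,\xi(k) = O(\Delta_k^{-2}) \to \infty$ as k → ∞, assuming $\mathrm{Var}\, f_i(x, \omega) \le \text{const}$. If P does not depend on x, then for dependent observations $\omega^{k0} = \dots = \omega^{kn}$ (single-run SQG) they have bounded variance, $\mathrm{Var}\,\xi(k) = O(1)$ as k → ∞.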
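The variance contrast between independent and dependent (single-run) observations is easy to reproduce numerically. In this one-dimensional Python sketch, with the hypothetical sample function $f(x, \omega) = (x - \omega)^2$ and $\omega \sim N(0, 1)$, the forward difference with a shared sample equals $2(x - \omega) + \Delta$ and has variance 4, while independent samples give a variance that grows like $\Delta^{-2}$.

```python
import numpy as np

def forward_difference(f, x, delta, n_samples, common, rng):
    """n_samples forward-difference estimates of F'(x), F(x) = E f(x, omega)."""
    w0 = rng.normal(size=n_samples)                     # omega^{k0}
    w1 = w0 if common else rng.normal(size=n_samples)   # omega^{k1}: shared or independent
    return (f(x + delta, w1) - f(x, w0)) / delta

f = lambda x, w: (x - w) ** 2          # hypothetical sample function; F'(x) = 2x
rng = np.random.default_rng(0)
dependent = forward_difference(f, 1.0, 0.01, 100_000, True, rng)
independent = forward_difference(f, 1.0, 0.01, 100_000, False, rng)
```

With Δ = 0.01 the independent-sample estimates have a variance on the order of $10^5$, against about 4 for the dependent ones, even though both are estimates of the same quantity $F'(1) + \Delta = 2.01$.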

An averaging operation similar to (17) in Stochastic quasigradient methods is used to cope with the complexity of the sample function (7). Assume that $\Psi_i(x, y, \omega)$ is calculated exactly for a given (x, y, ω), and consider the sequence


$$\bar g(k+1) = \bar g(k) + \delta_k\left(g(x^k, \omega^k) - \bar g(k)\right), \qquad k = 0, 1, \dots,$$

where $x^k$ is the current approximate solution and $\omega^k$, k = 0, 1, ..., are independent samples of ω. Then, under general requirements on $\delta_k$, with probability 1 $\|\bar g(k) - E[g(x^k, \omega^k) \mid x^k]\| \to 0$ as k → ∞. Therefore, $\Psi_i(x^k, \bar g(k), \omega^k)$ can be used as an estimate of $f_i(x^k, \omega^k)$, and $\xi^i(k)$ can be calculated by chain rules such as

$$\xi^i(k) = \Psi_{ix}(x^k, \bar g(k), \omega^k) + \Psi_{iy}(x^k, \bar g(k), \omega^k)\, g_x(x^k, \omega^k) \qquad (13)$$

for GD functions, or by using finite-difference approximations of $\Psi_{ix}$, $\Psi_{iy}$, $g_x$ as in (11), (12). Single-run SQG of type (11), (12) with dependent $\omega^{k0} = \dots = \omega^{kn}$ provide, surprisingly, much more accurate estimates [2].
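A minimal Python sketch of the averaging $\bar g(k)$ combined with the chain rule (13), on a hypothetical problem of type (7): $F(x) = (E[\omega]\,x - 1)^2$ with $\omega \sim N(2, 0.5^2)$, i.e. $g(x, \omega) = \omega x$ and $\Psi(x, y, \omega) = (y - 1)^2$, whose minimizer is x = 0.5. The stepsizes $\delta_k = (k+1)^{-0.6}$ and $\rho_k = 0.1(k+1)^{-0.9}$ are our assumptions (chosen so that $\rho_k/\delta_k \to 0$).

```python
import numpy as np

def nested_sqg(n_iter, seed=0):
    """SQG with the running estimate g_bar of E g(x, omega) and chain rule (13) (sketch)."""
    rng = np.random.default_rng(seed)
    x, g_bar = 0.0, 0.0
    for k in range(n_iter):
        omega = rng.normal(2.0, 0.5)
        # Chain rule (13): Psi_x = 0, Psi_y = 2 (g_bar(k) - 1), g_x(x, omega) = omega.
        xi = 2.0 * (g_bar - 1.0) * omega
        g_bar += (k + 1) ** -0.6 * (omega * x - g_bar)    # g_bar(k+1), delta_k = (k+1)^-0.6
        x -= 0.1 * (k + 1) ** -0.9 * xi                   # rho_k = 0.1 (k+1)^-0.9
    return x

x_nested = nested_sqg(20_000)
```

Replacing `g_bar` by the single sample `omega * x` would make $\Psi_y$ a product of two correlated random quantities, and the resulting direction would be a biased estimate of $F'(x)$.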
“Hit-or-Miss” Decision Problems


This problem [3] illustrates typical difficulties in dealing with the optimization of continuously differentiable expectation functions F(x) when the sample functions f(x, ω) are nonsmooth. Assume that at some point in the evolution of a system (ecosystem, nuclear power plant, economic system) a catastrophe could occur at a random time ω if the decision maker does not intervene and control ongoing processes at a time moment x ∈ [0, T]. The profit in the absence of a catastrophe, x < ω, is proportional to τ = min(x, ω), but ω ≤ x leads to high losses b. Suppose that ω is distributed on the interval [0, T] with a probability density function μ(ω), and the sample performance function is

$$f(x, \omega) = \begin{cases} -a x, & 0 \le x < \omega, \\ b - a\omega, & \omega \le x \le T. \end{cases}$$


The function f(x, ω) is discontinuous with respect to both variables. The expected performance function has the form

$$F(x) = E f(x, \omega) = E[f(x, \omega)\, I_{x < \omega}] + E[f(x, \omega)\, I_{x \ge \omega}], \qquad (14)$$

where $I_A = I_A(\omega)$ is the indicator function of the event A: $I_A(\omega) = 1$ if ω ∈ A and $I_A(\omega) = 0$ otherwise. The gradient of f(·, ω) exists everywhere except at x = ω.
Define

$$f_x(x, \omega) = \begin{cases} -a, & 0 \le x < \omega, \\ 0, & \omega < x \le T. \end{cases}$$

Obviously, the expectation $E f_x(x, \omega)$ exists, but the “interchangeability” formula is not valid: $F_x(x) \ne E f_x(x, \omega)$. Indeed, direct differentiation of both sides of (14) yields $F_x(x) = (f(x, x) - f(x, x+0))\,\mu(x) + E f_x(x, \omega)$, where $f(x, x+0) = \lim_{y \to x+0} f(x, y)$ (here $f(x, x) - f(x, x+0) = b$). Therefore, the discontinuity of f(x, ω) produces an additional term, beyond $f_x(x, \omega)$, in the SQG

$$\xi(k) = \left(f(x^k, x^k) - f(x^k, x^k + 0)\right)\mu(x^k) + f_x(x^k, \omega^k). \qquad (15)$$

It is clear that the approximations $F^N(x)$ of function (14) have an increasing number of discontinuities and local optimal solutions.
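The role of the jump term in (15) shows up clearly in a simulation. This Python sketch assumes, for illustration only, that ω is uniform on [0, T], so μ = 1/T and the jump term equals b/T; then $F'(x) = b/T - a(1 - x/T)$ and the minimizer is $x^* = T - b/a$. The stepsizes and parameter values are hypothetical.

```python
import numpy as np

def hit_or_miss_sqg(a, b, T, n_iter, use_jump_term=True, seed=0):
    """Projected SQG for the hit-or-miss problem with omega ~ Uniform[0, T] (sketch).

    With the assumed density mu = 1/T, the extra term of (15) is
    (f(x, x) - f(x, x + 0)) * mu(x) = b / T.
    """
    rng = np.random.default_rng(seed)
    x = T / 2.0
    for k in range(n_iter):
        omega = rng.uniform(0.0, T)
        f_x = -a if x < omega else 0.0                   # pathwise derivative of f(., omega)
        xi = (b / T if use_jump_term else 0.0) + f_x     # SQG (15), or the naive E f_x
        x = min(max(x - 5.0 * (k + 1) ** -0.7 * xi, 0.0), T)   # projection onto [0, T]
    return x

x_sqg = hit_or_miss_sqg(a=1.0, b=2.0, T=10.0, n_iter=20_000)
x_naive = hit_or_miss_sqg(a=1.0, b=2.0, T=10.0, n_iter=20_000, use_jump_term=False)
# F'(x) = b/T - a(1 - x/T) vanishes at x* = T - b/a = 8.
```

Without the jump term every step direction is nonpositive, so the naive method drifts to the boundary x = T and ignores the catastrophe losses; with the term from (15) it settles near $x^* = 8$.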


Pollution Control 


A feature common to most of the models used to design pollution control policies is the use of transfer coefficients $a_{ji}$ linking the amount of pollution $x_i$ emitted by source i to the resulting pollution concentrations $y_j$ at location j as $y_j = \sum_{i=1}^{n} a_{ji} x_i$, j = 1, ..., m. The coefficients $a_{ji}$ are often computed by means of Gaussian-type diffusion equations. These equations are solved over all possible meteorological conditions, and the outputs are then weighted by the frequencies of the meteorological inputs over a given time interval, yielding average coefficients $\bar a_{ji}$. The deterministic models determine cost-effective emission strategies subject to achieving exogenously specified environmental goals, such as ambient standards $q_j$ at receptors: $y_j \le q_j$. A natural improvement of the deterministic models is the inclusion of constraints that account for the random nature of the coefficients $a_{ji}$ in order to reduce the occurrence of extreme events:


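$$F_j(x) = \sum_{i=1}^{n} \bar a_{ji} x_i + \gamma_j\, E \max\left\{0, \sum_{i=1}^{n} x_i (a_{ji} - \bar a_{ji})\right\} - q_j \le 0, \qquad j = 1:m, \qquad \bar a_{ji} = E a_{ji},$$

where $\gamma_j$ is a risk coefficient that makes the constraints reduce the chance of actual deposition exceeding the average value. The function $F_j(x)$ does not satisfy the linearity requirement (3) and, in general, is not continuously differentiable (although it is a convex function). Its SQG is

$$\xi^j_i(k) = \begin{cases} \bar a_{ji}, & \text{if } \sum_{l=1}^{n} x_l^k (a_{jl}^k - \bar a_{jl}) \le 0, \\ \bar a_{ji} + \gamma_j (a_{ji}^k - \bar a_{ji}), & \text{otherwise}, \end{cases}$$

where $a^k$, k = 0, 1, ..., are independent observations of the coefficients a.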
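The SQG of the pollution constraint can be checked against an exact gradient in a hypothetical Gaussian case. If $a_{ji} = \bar a_{ji} + s\,\varepsilon_i$ with $\varepsilon \sim N(0, I)$, then $E\max\{0, \sum_i x_i(a_{ji} - \bar a_{ji})\} = s\|x\|/\sqrt{2\pi}$, so $\nabla F_j(x) = \bar a_j + \gamma_j s\, x/(\|x\|\sqrt{2\pi})$. The data below are illustrative only.

```python
import numpy as np

def pollution_sqg(x, a_bar_j, a_obs_j, gamma_j):
    """SQG xi^j(k) of F_j at x from one observation a_obs_j of row j (sketch)."""
    dev = a_obs_j - a_bar_j
    if dev @ x <= 0.0:
        return a_bar_j.copy()
    return a_bar_j + gamma_j * dev

# Hypothetical check with a_ji = a_bar_ji + s * eps_i, eps ~ N(0, I).
rng = np.random.default_rng(0)
a_bar = np.array([0.5, 1.0, 0.2])
x = np.array([1.0, 2.0, 2.0])
gamma, s = 0.8, 0.3
mean_xi = np.mean([pollution_sqg(x, a_bar, a_bar + s * rng.normal(size=3), gamma)
                   for _ in range(50_000)], axis=0)
exact_grad = a_bar + gamma * s * x / (np.linalg.norm(x) * np.sqrt(2.0 * np.pi))
```

Each SQG needs only one sample of the transfer coefficients and no evaluation of the expectation in $F_j$, which is what makes the risk-adjusted constraint usable inside an SQG method.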


Queuing Networks 


A typical situation with implicitly given nonsmooth sample functions f(x, ω) occurs in the optimization of queuing networks [3]. A network consists of nodes (devices) that “serve” flows of “messages”. At any moment, device i = 1, 2, ... serves only one message, which is then transferred to another node in accordance with a certain routing procedure defining a destination node for the jth message served at the ith device. If the device is busy, then the message waits in the queue and is served according to the rule: first come, first served. Let $\tau_{ij}(x, \omega)$ be the (random) service time of message j at node i, depending on some control parameter x and an uncontrolled (random) parameter ω; $\alpha_{ij}(x, \omega)$ the arrival time of message j at node i; $\beta_{ij}(x, \omega)$ the time when i starts to serve j; $\gamma_{ij}(x, \omega)$ the time when i finishes serving j. The logic of a node's operation is described by the recurrence relations $\gamma_{ij} = \beta_{ij} + \tau_{ij}$, $\beta_{ij} = \max\{\gamma_{i(j-1)}, \alpha_{ij}\}$, j = 1, 2, .... It follows that various important indicators of network performance (waiting times, queue lengths, node loads, etc.) can be expressed through the functions $\tau_{ij}(x, \omega)$ by max and min operations; i.e., they are GD functions, assuming $\tau_{ij}(\cdot, \omega)$ are such functions. The calculation of SQG can be based on (15) in “Stochastic quasigradient methods”.


Stochastic Dynamic Systems 


Stochastic dynamic systems are usually defined by implicitly given sample performance functions $f_i(x, \omega)$. The decision vector x represents a sequence of decisions (control actions) x(t) over a given time horizon [0, T]: x = (x(0), ..., x(T − 1)). In addition to x, there may also be a group of state variables z = (z(0), ..., z(T)) that record the state of the system. The variables x, z are connected through a system of equations:

$$z(t+1) = g(t, z(t), x(t), \omega), \qquad t = 0, 1, \dots, T - 1, \qquad z(0) = z^0. \qquad (16)$$
Objective and constraint functions are defined as expectations of some sample performance functions $h_i(z, x, \omega)$, i = 0, 1, ..., m.

Due to (16), the variables z are implicit functions of (x, ω), i.e., z = z(x, ω). Therefore, the $h_i$ are also implicit functions of (x, ω): $f_i(x, \omega) = h_i(z(x, \omega), x, \omega)$, i = 0, 1, ..., m, and the resulting stochastic dynamic optimization problem can be viewed as a stochastic optimization problem of type (5)–(6) with implicitly given sample performance functions $f_i(x, \omega)$. One way to solve this problem is to use the SQG (11), (12). In particular, (12) requires the calculation of only two “trajectories” z(t), t = 0, 1, ..., T, at each step of an SQG procedure. If the functions g(·, ω), $h_i(\cdot, \omega)$ have a well-defined analytical structure and the probability measure P does not depend on x, then the subgradients $f_{ix}(x^k, \omega)$ are calculated (for fixed ω) using analytical formulas from nonsmooth analysis [1], [4], pp. 17, 175.


Optimization of Discrete Event Dynamic Systems 


The well-defined analytical structure of the functions g(·, ω), $h_i(\cdot, \omega)$ in (16) is typical of applications in mechanics and physics. Important problems in operations research, economics, ecology, finance, reliability theory, and communication networks [3,4,8,9] deal with cases where these functions and the probability measure are composed of so many components involving logical variables (as in queuing systems) that no explicit “smooth” analytical expression can be derived. Discrete events may change the state of a discrete event dynamic system in a discontinuous fashion, implying that the functions g(·, ω), $h_i(\cdot, \omega)$ are nonsmooth. This often rules out the interchangeability of differentiation and integration, as in the “hit-or-miss” decision problems. Nonetheless, it is possible to develop various techniques for calculating stochastic quasigradients [2,3,8]. Let us consider a typical situation.


Example 2. Managing Catastrophic Risks 


The increasing likelihood of extreme catastrophic events that may affect large territories and communities dominates the discussion of global change processes. The analysis of robust catastrophic risk management decisions [6] requires new approaches based on explicit analysis of endogenous risk processes involving various agents such as governments, insurers, and individuals. Risk processes describing the ruin of an agent or the depletion of resources have a similar structure. Consider a typical simple example. At time t = 0, 1, ..., the risk reserve of an insurer is characterized as R(x, t) = M(t) + xt − S(t), t = 0, 1, ..., where M(t) is the “normal” part of the reserve, associated with ordinary (noncatastrophic) claims; a catastrophic scenario occurs at time t with probability p; $S(t) = \sum_{l=1}^{N_t} D_{t_l}$ is the accumulated catastrophic claim from catastrophes at random time moments $t_1, t_2, \dots$; xt is the accumulated premium (inflow) from catastrophic risk. In more general risk processes, the inflows xt and outflows S(t) at t are described by some random functions I(x, t), O(x, t) dependent on a vector x of feasible decisions [6]. The long-term stability can be characterized by the probability of ruin q(x) = Pr[R(x, t) < 0 for some t], or $q(x) = E\, I_{R(x,\tau) < 0}$, where $I_{R<0} = 1$ if R < 0, $I_{R<0} = 0$ otherwise, and τ is the first moment t when R(x, t) < 0. Assume d(x) is the welfare generated by x. The calculation of d(x) requires considering all relevant agents [6]. The problem is to find x ≥ 0 maximizing a trade-off between the profit d(x) and the risk, $F(x) = d(x) + \gamma E[I_{R(x,\tau) < 0}]$, where γ is a risk coefficient. The function $f(x, \omega) = d(x) + \gamma I_{R(x,\tau) < 0}$ is an implicit function of x and ω, and it is also discontinuous. Assume that the probability $V_t(y) = \Pr[M(t) < y]$ is an explicitly known function. By taking the conditional expectation for given $D_{t_1}, D_{t_2}, \dots$, the function F(x) can be written as


$$F(x) = d(x) + \gamma\, E \sum_{t=1}^{\infty} p^{N_t}(1-p)^{t-N_t}\, V_t\!\left(\sum_{l=1}^{N_t} D_{t_l} - x t\right). \qquad (17)$$


An SQG of F(x) in (17) can be calculated [6] by using auxiliary random variables. At step k, sample a random variable $t_k \in \{1, 2, \dots\}$ distributed according to an arbitrary ψ(t), $\sum_{t=1}^{\infty} \psi(t) = 1$, ψ(t) > 0, sample $D_{t_l}$, l = 1, ..., $N_{t_k}$, and take

$$\xi(k) = d'(x^k) - \gamma\, p^{N_{t_k}}(1-p)^{t_k - N_{t_k}}\, V'_{t_k}\!\left(\sum_{l=1}^{N_{t_k}} D_{t_l} - x^k t_k\right) \frac{t_k}{\psi(t_k)},$$

where d′, $V'_t$ are the derivatives of d(·), $V_t(\cdot)$. It is easy to see that $E[\xi(k) \mid x^k] = F_x(x^k)$. More general situations are discussed in [5].


Neural Nets 


These models emerge in image processing, classification, and the behavioral sciences. From a formal point of view, the training of a neural net is equivalent to the minimization of the error function $F(x) = \sum_{i=1}^{N} H(i, x)$, where each function H(i, x) corresponds to one training object. At each step k = 0, 1, ..., it is possible to experiment with only one object. Assume that at step k the object i = i(k) is chosen with probability μ(i) > 0 among N alternatives. The most frequently used algorithm is so-called back propagation, in which the current vector $x^k$ of parameters x is adjusted [7] in the direction opposite to $\xi(k) = \mu(i(k))^{-1} H_x(i(k), x^k)$. It is easy to see that $E[\xi(k) \mid x^k] = F_x(x^k)$.
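The one-object-per-step rule $\xi(k) = \mu(i(k))^{-1} H_x(i(k), x^k)$ can be sketched in Python. To keep the example small and checkable, a single linear unit with squared error $H(i, x) = (a_i \cdot x - b_i)^2$ stands in for the network (an assumption), objects are sampled with $\mu(i) \propto \|a_i\|^2$, and a constant stepsize $\rho = 1/(2\|A\|_F^2)$ is used; with these particular choices each step coincides with a randomized Kaczmarz projection onto the hyperplane $a_i \cdot x = b_i$.

```python
import numpy as np

def sampled_gradient_training(A, b, n_iter, seed=0):
    """Sketch: minimize F(x) = sum_i H(i, x), H(i, x) = (a_i . x - b_i)^2, one object per step.

    xi(k) = H_x(i(k), x^k) / mu(i(k)) satisfies E[xi(k) | x^k] = F_x(x^k).
    """
    rng = np.random.default_rng(seed)
    row_sq = np.sum(A ** 2, axis=1)
    mu = row_sq / row_sq.sum()                 # assumed sampling rule: mu(i) ~ ||a_i||^2
    rho = 1.0 / (2.0 * row_sq.sum())           # assumed constant stepsize
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        i = rng.choice(len(b), p=mu)           # choose one training object
        h_x = 2.0 * (A[i] @ x - b[i]) * A[i]   # gradient of H(i, .) at x
        x = x - rho * h_x / mu[i]
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 3))
x_true = np.array([1.0, -2.0, 0.5])
x_fit = sampled_gradient_training(A, A @ x_true, n_iter=5000)
```

Because the data here are consistent (b = A x_true), the sampled gradients vanish at the solution and the iterates converge to x_true even with a constant stepsize; with noisy data a decreasing stepsize would be needed.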


Automaton Learning Problem 


Let J = {1, ..., n} be the automaton action set and B(j) be the random response to action j. The distribution of B(j) depends on j, but it is unknown. The automaton attempts to improve its behavior (current action) based on the random responses to the particular actions chosen. In other words, the goal is to find an action with the largest expected outcome E B(j). This problem can be formulated as a rather simple stochastic optimization problem: maximize $F(x) = \sum_{j=1}^{n} E B(j)\, x_j$ subject to $\sum_{j=1}^{n} x_j = 1$, $x_j \ge 0$. Let $x^k = (x_1^k, \dots, x_n^k)$ be the current approximate solution to this problem. Choose an action i = i(k) with probability $x_i^k$, observe the response $B^k(i(k))$, and calculate $\xi(k) = (0, \dots, 0, B^k(i(k))/x^k_{i(k)}, 0, \dots, 0)$, with the nonzero entry in position i(k). Then $E[\xi(k) \mid x^k] = F_x(x^k)$.
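The automaton update can be sketched as projected stochastic ascent on the probability simplex. Assumptions beyond the text: Bernoulli responses, a floor $x_j \ge \varepsilon$ (so the ratio $B/x_i$ stays bounded, since the estimator divides by the selection probability), stepsizes $0.1(k+1)^{-0.6}$, and a standard sort-based Euclidean projection onto the simplex; all names are hypothetical.

```python
import numpy as np

def project_simplex(y, total=1.0):
    """Euclidean projection of y onto {x >= 0, sum(x) = total} (sort-based)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - total
    idx = np.arange(1, y.size + 1)
    r = idx[u - css / idx > 0][-1]
    theta = css[r - 1] / r
    return np.maximum(y - theta, 0.0)

def automaton_sqg(p_success, n_iter, eps=0.01, seed=0):
    """Projected SQG ascent for max sum_j E B(j) x_j over the simplex (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(p_success)
    x = np.full(n, 1.0 / n)
    for k in range(n_iter):
        i = rng.choice(n, p=x)                          # action i(k) with probability x_i^k
        reward = float(rng.random() < p_success[i])     # response B^k(i(k)), Bernoulli here
        xi = np.zeros(n)
        xi[i] = reward / x[i]                           # E[xi | x] = (E B(1), ..., E B(n))
        y = x + 0.1 * (k + 1) ** -0.6 * xi              # ascent step
        x = eps + project_simplex(y - eps, total=1.0 - n * eps)   # keep x_j >= eps
    return x

x_auto = automaton_sqg(np.array([0.2, 0.5, 0.8]), n_iter=10_000)
```

The objective is linear, so the iterates move toward the vertex of the best action; the floor ε keeps every action explored with a small probability.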
Introduction


Stochastic quasigradient (SQG) methods are applicable to both deterministic and stochastic minimax (SMM) problems. SMM problems, which are similar to two-stage stochastic programming problems, have nested nonsmooth sample (random) objective functions. Some examples of SMM applications are discussed in “Stochastic quasigradient methods: Applications”. An important class of SMM problems takes the form [4,9], pp. 165–168, [14]: minimize the expectation function

$$F(x) = E \max_{y \in Y} g(x, y, \omega), \qquad x \in X, \qquad (1)$$

where $f(x, \omega) = \max_{y \in Y} g(x, y, \omega)$ is the sample (random) objective function, X ⊆ $\mathbb{R}^n$, Y ⊆ $\mathbb{R}^r$, and ω is an element of a probability space (Ω, 𝒜, P), i.e., ω ∈ Ω, and 𝒜 is the set of events (subsets of Ω) measurable with respect to the probability measure P.
If Ω contains only one point, then the minimization of (1) corresponds to the standard deterministic minimax problem. Besides deterministic constraints of the type x ∈ X, problem (1) may have general constraints of STO problems given in terms of expectation functions, some of which may have the same structure as the expectation function F(x) in (1). The set Y may also depend on (x, ω), as in two-stage stochastic programming. The functions f(x, ω) in (1) often have a more general nested structure, as in catastrophic risk management [7]. First of all, note that the sample function f(·, ω) is an implicitly given nonsmooth function even for linear g(·, y, ω). Therefore, all general-purpose SQG methods developed for nonsmooth problems (such as stochastic finite-difference approximations) are applicable to problem (1). Specific SQG methods exploit the structure of the sample function f(x, ω) using the following idea. Let y(x, ω) be a solution of the inner maximization problem in (1), i.e., f(x, ω) = g(x, y(x, ω), ω). Often it is possible to show that g(·, y, ω) is an analytically tractable generalized differentiable (GD) function and, hence, $f(\cdot, \omega) = \max_{y \in Y} g(\cdot, y, \omega)$ is also a generalized differentiable function, and its subgradient

$$f_x(x, \omega) = g_x(x, y(x, \omega), \omega) \tag{2}$$


is an SQG of $F(x)$. For example, if $g(\cdot, y, \omega)$ is a convex function, then $f(\cdot, \omega)$ is also a convex function and (2) defines its subgradient. Let us note that although the two-stage model has a similar objective function $f(x, \omega) = \min_{y \in Y} g(x, y, \omega)$, in this case the convexity of $f(\cdot, \omega)$ requires much stronger assumptions: $g(x, y, \omega)$ has to be a convex function jointly in both variables $(x, y)$. As discussed further, these two classes of models are oriented toward rather different decision situations under uncertainty. The vector $f_x(x, \omega)$ or its approximation can be used in various SQG methods. For example, if $f(\cdot, \omega)$ is a GD function, then the SQG projection method is defined as


$$x^{k+1} = \Pi_X\left[x^k - \rho_k f_x(\tilde{x}^k, \omega^k)\right], \quad k = 0, 1, \ldots, \tag{3}$$

where $\Pi_X$ denotes the projection onto $X$, $\tilde{x}^k = x^k + \epsilon^k$, and $\epsilon^0, \epsilon^1, \ldots$ are independent random vectors with densities, $\rho_k \ge 0$, $\epsilon^k \to 0$ for $k \to \infty$ with probability 1, and $\omega^0, \omega^1, \ldots$ are independent samples of $\omega$. The vector $f_x(\tilde{x}^k, \omega^k)$ is a stochastic mollifier quasigradient of $F(x)$ at $x = x^k$.

If $\rho_k \ge 0$, $\sum_{k=0}^{\infty} \rho_k = \infty$ with probability 1, $\sum_{k=0}^{\infty} E\rho_k^2 < \infty$, and $X$ is a convex compact set, then $\{F(x^k)\}$ converges with probability 1 and the cluster points of the random sequence $\{x^k\}$ belong with probability 1 to a connected set of local solutions [8,11].
The convergence of (3) for $\epsilon^k = 0$ takes place for so-called [4,9], p. 151, [15] weakly convex functions $F(x)$, i.e., functions such that $F(y) - F(x) \ge \langle F_x(x), y - x \rangle + r(x, y)$, where $r(x, y)/\|x - y\| \to 0$ as $x \to z$, $y \to z$. For convex $f(\cdot, \omega)$, the sequence $\{x^k\}$ converges with probability 1 to the set of optimal solutions for $\epsilon^k = 0$. If the probability distribution of $\omega$ is concentrated at one point, then (1) reduces to the standard deterministic minimax problem and (3) is an SQG procedure for this problem.
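A minimal sketch of method (3) follows. It takes $\epsilon^k = 0$, which the text allows when the sample function is convex. It uses a hypothetical two-dimensional instance: $Y = \{\pm e_1, \pm e_2\}$ and $g(x, y, \omega) = \langle y, x - \omega \rangle$, so that $f(x, \omega) = \|x - \omega\|_\infty$. By (2), a subgradient is then the maximizing $y$ itself. The distribution of $\omega$ (uniform on $[2, 4] \times [-2, 0]$), the set $X = [-5, 5]^2$, the steps $\rho_k = 0.5/(k+1)^{0.6}$ and all function names are illustrative assumptions. Since $\omega$ is symmetric about $c = (3, -1)$ and $F$ is convex, $c$ minimizes $F$.

```python
import random

def linf_subgradient(x, omega):
    """Subgradient of f(x, omega) = max_{y in Y} <y, x - omega> with
    Y = {+e_1, -e_1, ..., +e_n, -e_n}, i.e. f(x, omega) = ||x - omega||_inf.
    By (2) it is g_x(x, y*, omega) = y* for a maximizing y*."""
    diffs = [xi - wi for xi, wi in zip(x, omega)]
    j = max(range(len(x)), key=lambda i: abs(diffs[i]))
    y_star = [0.0] * len(x)
    y_star[j] = 1.0 if diffs[j] >= 0 else -1.0
    return y_star

def project_box(x, lo, hi):
    """Orthogonal projection onto the box X = [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]

def sqg_projection(x0, sample_omega, iters, rng, lo=-5.0, hi=5.0):
    """Method (3) with epsilon^k = 0 and deterministic steps rho_k."""
    x = list(x0)
    for k in range(iters):
        rho = 0.5 / (k + 1) ** 0.6          # sum rho_k = inf, sum rho_k^2 < inf
        omega = sample_omega(rng)            # independent sample omega^k
        g = linf_subgradient(x, omega)
        x = project_box([xi - rho * gi for xi, gi in zip(x, g)], lo, hi)
    return x

c = (3.0, -1.0)                              # center of the hypothetical distribution

def sample_omega(rng):
    return [c[0] + rng.uniform(-1.0, 1.0), c[1] + rng.uniform(-1.0, 1.0)]

x_final = sqg_projection([0.0, 0.0], sample_omega, 20_000, random.Random(1))
print(x_final)
```

The only problem-specific ingredient is the inner maximizer $y^*$. The outer loop is identical for any $g$ whose inner maximization can be solved.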


Example 1. Production planning under uncertainty. As long as there is uncertainty about future demand, prices, input-output coefficients, available resources, etc., the choice of a production level $\bar{x} \ge x \ge 0$ for a foreseeable demand $\omega$ is a “hit-or-miss” type decision problem. The cost $f(x, \omega)$ associated with overestimation and underestimation of $\omega$ is, in the simplest case, a random piecewise linear function


$$f(x, \omega) = \max\{\alpha(x - \omega), \beta(\omega - x)\},$$


where $\alpha$ is the unit surplus cost and $\beta$ is the unit shortage cost. The problem is to find the level $x$ that is “optimal”, in a sense, for all foreseeable demands $\omega$, rather than a function $x = x(\omega)$ specifying what the optimal production level should be in every possible “scenario” $\omega$. The expected cost criterion leads to the minimization of


$$F(x) = E\max\{\alpha(x - \omega), \beta(\omega - x)\} \tag{4}$$


subject to $\bar{x} \ge x \ge 0$ for a given upper bound $\bar{x}$. This stochastic minimax problem can also be reformulated as a two-stage stochastic programming model known as the newsboy problem.

The function $F(x)$ in (4) is convex; therefore method (3) with $\epsilon^k = 0$ reduces to

$$x^{k+1} = \min\{\max\{0, x^k - \rho_k \xi(k)\}, \bar{x}\}, \quad k = 0, 1, \ldots, \tag{5}$$

where $\xi(k) = \alpha$ if the current level of production $x^k$ exceeds the observed demand $\omega^k$ ($x^k > \omega^k$), and $\xi(k) = -\beta$ otherwise. The method (5) can be viewed as an adaptive procedure which is able to learn the optimal production level through sequential adjustments of its current levels $x^0, x^1, \ldots$ to observable demands $\omega^0, \omega^1, \ldots$. Let us note that the optimal solution of (4) and of more general SMM problems [6] defines quantile-type characteristics of solutions, e.g., CVaR risk measures [16] (see also “Two-stage stochastic programming: Stochastic Quasigradient methods”). For example, if the distribution of $\omega$ has a density, $\alpha > 0$ and $\beta > 0$, then the optimal solution $x$ minimizing (4) is the quantile defined by $\Pr[\omega \le x] = \beta/(\alpha + \beta)$. Therefore, the process (5) is a constrained sequential estimation procedure as in [9]. Problem (4) illustrates the essential difference between so-called scenario analysis, aiming at the straightforward calculation of $x(\omega)$, and the STO optimization approach. Instead of producing trivial optimal “bang-bang” solutions $x(\omega) = \omega$ for each scenario (a Pareto optimal solution with respect to potential $\omega$), an STO model such as (4) produces one solution that is optimal (“robust”) against all possible $\omega$.
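Method (5) takes only a few lines of code. The following sketch assumes, purely for illustration, a demand uniformly distributed on $[0, 10]$, $\alpha = 1$, $\beta = 3$, $\bar{x} = 10$ and step sizes $\rho_k = 2/(k+1)^{0.75}$. The function name `newsboy_sqg` is hypothetical. By the quantile characterization above, the iterates should approach the $x$ with $\Pr[\omega \le x] = 3/4$, i.e. $x = 7.5$.

```python
import random

def newsboy_sqg(alpha, beta, x_bar, sample_demand, iters, rng, x0=0.0):
    """Projected SQG method (5) for minimizing
    F(x) = E max{alpha (x - w), beta (w - x)} over 0 <= x <= x_bar."""
    x = x0
    for k in range(iters):
        rho = 2.0 / (k + 1) ** 0.75             # sum rho = inf, sum rho^2 < inf
        w = sample_demand(rng)                   # observed demand w^k
        xi = alpha if x > w else -beta           # stochastic quasigradient xi(k)
        x = min(max(0.0, x - rho * xi), x_bar)   # step, then project onto [0, x_bar]
    return x

# Hypothetical data: demand uniform on [0, 10], alpha = 1, beta = 3.
x_hat = newsboy_sqg(1.0, 3.0, 10.0, lambda r: r.uniform(0.0, 10.0),
                    50_000, random.Random(2))
print(x_hat)
```

The procedure never uses the demand distribution explicitly. It only reacts to observed demands, which is what makes it an adaptive learning rule.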


Example 2. Stochastic facility location model. This model [6,9], pp. 413-435, generalizes Example 1 and illustrates the possibly implicit character of the underlying probability distributions. Assume that customers living in district $i$ choose their destination $j$ at random with probability $p_{ij}$ related to the cost of travel between $i$ and $j$ and (or) other factors. Let $\varepsilon_{ij}$ be the random number of customers traveling from $i$ to $j$, and let $\tau_j$ be the total number of customers attracted by $j$: $\tau_j = \sum_{i=1}^{m} \varepsilon_{ij}$, where $\sum_{j} \varepsilon_{ij} = a_i$, $i = 1{:}m$, is the number of customers living in district $i$. The actual number $\tau_j$ of customers attracted by $j$ may not be equal to $x_j$. The random cost connected with overestimating or underestimating the demand $\tau_j$ in district $j$ may be a convex function $\alpha_j(x_j - \tau_j)$ for $x_j \ge \tau_j$ or $\beta_j(\tau_j - x_j)$ for $x_j < \tau_j$. The problem is to determine the sizes $x_j$ that minimize the expected cost

$$F(x) = \sum_{j} E\max\{\alpha_j(x_j - \tau_j), \beta_j(\tau_j - x_j)\},$$

where $\bar{x}_j \ge x_j \ge 0$. The SQG procedure for solving this problem is similar to (5). It is remarkable that applications of SQG methods to spatial minimax allocation problems [12] do not destroy their convexity, in contrast to discrete approximation schemes.
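The phrase “similar to (5)” can be made explicit with a hypothetical sketch. It uses two districts with 30 and 20 customers, two destinations, the choice probabilities in `choice_prob`, $\alpha_j = \beta_j = 1$ and $\bar{x}_j = 60$, all of which are illustrative assumptions. The distribution of $\tau_j$ is never written down. Each step simulates the customers' random choices, observes $\tau^k$, and applies the componentwise analogue of (5). With $\alpha_j = \beta_j$, each term is minimized at a median of $\tau_j$, which is estimated separately by simulation for comparison.

```python
import random

a = [30, 20]                      # customers living in districts 1, 2 (hypothetical)
choice_prob = [[0.7, 0.3],        # choice_prob[i][j]: probability that a customer
               [0.4, 0.6]]        # of district i chooses destination j (hypothetical)
alpha = [1.0, 1.0]                # unit surplus costs
beta = [1.0, 1.0]                 # unit shortage costs
x_bar = [60.0, 60.0]              # upper bounds on the sizes x_j

def sample_tau(rng):
    """Simulate every customer's destination and count the arrivals tau_j."""
    tau = [0] * len(choice_prob[0])
    for i, n_i in enumerate(a):
        for _ in range(n_i):
            j = rng.choices(range(len(choice_prob[i])), weights=choice_prob[i])[0]
            tau[j] += 1
    return tau

def facility_sqg(iters, rng):
    """Componentwise projected SQG steps of type (5)."""
    x = [0.0] * len(x_bar)
    for k in range(iters):
        rho = 5.0 / (k + 1) ** 0.7
        tau = sample_tau(rng)                          # observed demands tau^k
        for j in range(len(x)):
            xi = alpha[j] if x[j] > tau[j] else -beta[j]
            x[j] = min(max(0.0, x[j] - rho * xi), x_bar[j])
    return x

rng = random.Random(3)
x_hat = facility_sqg(5_000, rng)

# Reference values: empirical medians of tau_j from independent simulations.
samples = [sample_tau(rng) for _ in range(5_001)]
medians = [sorted(s[j] for s in samples)[2_500] for j in range(2)]
print(x_hat, medians)
```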
Decision Making Under Extreme Events


The standard theory of extreme events studies the behavior of the maxima $\max(\theta_1, \ldots, \theta_n)$ for an iid sequence $\theta_1, \ldots, \theta_n$, $n \ge 2$. The objective function (1) is, in general, defined by the maxima of mutually dependent random variables $g(x, y, \omega)$, $y \in Y$, which also depend on the decision variables $x$. Problem (1) can be viewed as a model for decision making under extreme events, when two types of uncertain variables, $y$ and $\omega$, affect the result $g(x, y, \omega)$ of decision $x$. The uncertainty with respect to $y$ is evaluated from the extreme random scenario, whereas $\omega$ is considered as a random variable. Therefore (1) is a hybrid between a purely deterministic minimax approach, which takes the form of minimizing $F(x) = \max\{g(x, y, \omega) \mid y \in Y, \omega \in \Omega\}$, and the purely probabilistic Bayesian approach of minimizing the expectation function $F(x) = E g(x, y, \omega)$ for some joint probability distribution of $(y, \omega)$. SQG procedures such as (3) can be viewed as an adaptive search for robust decisions by learning from environmental responses (simulations) $\omega^0, \omega^1, \ldots$.


Asymmetric Information


The following interpretation leads to various important generalizations of the SMM problem (1). Consider two agents and the objective function $g(x, y, \omega)$. Agent 1 chooses his action $x \in X$ without knowing the choice $y \in Y$ of agent 2 or the state of nature (environment) $\omega$. Agent 2 chooses his action $y$ after agent 1 and is fully informed about $x$ and $\omega$. $F(x)$ in (1) is the guaranteed expected result of agent 1 for action $x$. If agent 2 does not know the state $\omega$ before choosing his action $y$, the problem for agent 1 is to minimize


$$F(x) = \max_{y \in Y} E g(x, y, \omega). \tag{6}$$


The function $F(x)$ in (6) is nonlinear in the probability measure $P$, i.e. it differs from the expected utility. The calculation of $F$ at any point requires the solution of an STO subproblem, i.e. it is a nested STO problem, which often can be solved by using SQG methods with the averaging operation.
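A two-scenario toy example shows how the information structure changes the value and why (6), unlike (1), is nonlinear in $P$. The payoff table `g` is hypothetical and the decision $x$ is held fixed. Under each one-point measure both models give 1, but under their 50/50 mixture (6) gives only 0.5, whereas (1) stays at 1, the average of its values.

```python
def expected_max(g, probs):
    """Model (1): agent 2 sees omega before choosing y, giving E max_y g."""
    return sum(p * max(row[w] for row in g) for w, p in enumerate(probs))

def max_expected(g, probs):
    """Model (6): agent 2 chooses y before omega is revealed, giving max_y E g."""
    return max(sum(p * row[w] for w, p in enumerate(probs)) for row in g)

# g[y][w]: hypothetical values g(x, y, omega_w) for a fixed decision x.
g = [[1.0, 0.0],
     [0.0, 1.0]]

p1, p2 = [1.0, 0.0], [0.0, 1.0]
mix = [0.5, 0.5]                     # the mixture 0.5 * P1 + 0.5 * P2

print(expected_max(g, p1), expected_max(g, p2), expected_max(g, mix))  # 1.0 1.0 1.0
print(max_expected(g, p1), max_expected(g, p2), max_expected(g, mix))  # 1.0 1.0 0.5
```

In general $E\max_y g \ge \max_y E g$, so the value of (6) never exceeds that of (1) for the same $x$.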


Example 3. Exact penalty function. A particular case of problem (6) is the minimization of the exact penalty function

$$F(x) = E f_0(x, \omega) + C \sum_{i=1}^{m} \max\{0, E f_i(x, \omega)\}$$

for general STO problems, since each term $\max\{0, E f_i(x, \omega)\}$ equals $\max_{0 \le y_i \le 1} E\, y_i f_i(x, \omega)$. See also Stochastic Quasigradient Methods for a discussion of the method.


Convex-Concave Expectation 


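Assume that the function $g(x, y) = E g(x, y, \omega)$ is convex in $x$ and concave in $y$, and $X$, $Y$ are compact convex sets. Let

$$x^{k+1} = \Pi_X\left[x^k - \rho_k g_x(x^k, y^k, \omega^k)\right], \qquad y^{k+1} = \Pi_Y\left[y^k + \rho_k g_y(x^k, y^k, \omega^k)\right], \quad k = 0, 1, \ldots, \tag{7}$$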


where $\Pi_X$ and $\Pi_Y$ denote the projections onto $X$ and $Y$, and $g_x$, $g_y$ are subgradients of $g(x, y, \omega)$ with respect to $x$ and $y$, respectively. This is an SQG projection method for the search of saddle points of $g(x, y)$ in $X \times Y$. It is a stochastic version of the Arrow-Hurwicz method. The convergence of the sequences $\{x^k\}$, $\{y^k\}$ to a saddle point requires rather strong assumptions on $g(x, y)$, such as strict convex-concavity. It is possible to show that for linear functions $g(\cdot, y)$, $g(x, \cdot)$ the sequences $\{x^k\}$, $\{y^k\}$ do not converge under any choice of the step-size multipliers $\rho_k$, except in some special cases. The convergence of the averaged sequences

$$\bar{x}^k = \sum_{s=0}^{k} \rho_s x^s \Big/ \sum_{s=0}^{k} \rho_s, \qquad \bar{y}^k = \sum_{s=0}^{k} \rho_s y^s \Big/ \sum_{s=0}^{k} \rho_s$$

with probability 1 to a saddle point of $g(x, y)$ takes place under standard assumptions on $\rho_k$ ensuring the convergence of the SQG projection method for convex problems [14,17]. Another possibility is to modify (7) by using general ideas of the proximal method or its variations [3,13].
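A minimal sketch of (7) under strict convex-concavity uses a hypothetical instance: $g(x, y, \omega) = (x - \omega_1)^2/2 + xy - (y - \omega_2)^2/2$ with $\omega_1 = 1 + U(-1, 1)$ and $\omega_2 = 0.5 + U(-1, 1)$. It also assumes $X = Y = [-2, 2]$ and $\rho_k = 1/(k+1)^{0.7}$; the function names are illustrative. Setting the partial derivatives of the expectation to zero gives $x - 1 + y = 0$ and $x - y + 0.5 = 0$, so the saddle point is $(0.25, 0.75)$.

```python
import random

def clip(v, lo, hi):
    """Projection of a scalar onto the interval [lo, hi]."""
    return min(max(v, lo), hi)

def stochastic_arrow_hurwicz(iters, rng, lo=-2.0, hi=2.0):
    x, y = 0.0, 0.0
    for k in range(iters):
        rho = 1.0 / (k + 1) ** 0.7
        w1 = 1.0 + rng.uniform(-1.0, 1.0)      # sample of omega_1
        w2 = 0.5 + rng.uniform(-1.0, 1.0)      # sample of omega_2
        gx = (x - w1) + y                      # partial derivative of g in x
        gy = x - (y - w2)                      # partial derivative of g in y
        # Simultaneous update as in (7): descent in x, ascent in y, projection.
        x, y = clip(x - rho * gx, lo, hi), clip(y + rho * gy, lo, hi)
    return x, y

x_k, y_k = stochastic_arrow_hurwicz(20_000, random.Random(4))
print(x_k, y_k)
```

Here the last iterates converge because the $-(y - \omega_2)^2/2$ and $(x - \omega_1)^2/2$ terms make $g$ strictly convex-concave. Dropping them leaves the bilinear case, in which, as the text warns, only the averaged sequences should be expected to converge.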


Example 4. Finite set $Y$. Consider problem (6) with a finite set $Y$, i.e. assume that $F(x) = \max_{1 \le j \le l} E g_j(x, \omega)$. This problem is equivalent to the minimization of $F(x) = \max_{y \in Y} E \sum_{j=1}^{l} y_j g_j(x, \omega)$, with convex-concave expectation, where now $Y = \{y : y_j \ge 0, \sum_{j=1}^{l} y_j = 1\}$.
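To run (7) on the equivalent problem of Example 4, the $y$-step needs the Euclidean projection onto the simplex $Y$. The sketch below uses the standard sort-based projection, a swapped-in technique that the article does not prescribe. It is applied to a hypothetical instance with $l = 2$, $g_j(x, \omega) = (x - c_j - \omega)^2$, $c = (0, 2)$, $\omega$ uniform on $[-0.5, 0.5]$ and $X = [-5, 5]$. Since $E g_j(x, \omega) = (x - c_j)^2 + \mathrm{Var}\,\omega$, the minimax solution is $x = 1$ with weights $y = (0.5, 0.5)$.

```python
import random

def project_simplex(v):
    """Euclidean projection of v onto {y : y_j >= 0, sum_j y_j = 1}
    (sort-based method)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:          # holds exactly for a prefix of the sorted vector
            theta = t
    return [max(vj - theta, 0.0) for vj in v]

c = [0.0, 2.0]                  # hypothetical centers c_j

def sqg_finite_max(iters, rng, lo=-5.0, hi=5.0):
    x, y = 0.0, [0.5, 0.5]
    for k in range(iters):
        rho = 0.5 / (k + 1) ** 0.7
        w = rng.uniform(-0.5, 0.5)                        # sample omega^k
        g = [(x - cj - w) ** 2 for cj in c]               # g_j(x^k, omega^k)
        gx = sum(yj * 2.0 * (x - cj - w) for yj, cj in zip(y, c))
        # Both steps use (x^k, y^k), as in (7); the y-subgradient of
        # sum_j y_j g_j(x, omega) is the vector g itself.
        x = min(max(x - rho * gx, lo), hi)
        y = project_simplex([yj + rho * gj for yj, gj in zip(y, g)])
    return x, y

x_k, y_k = sqg_finite_max(20_000, random.Random(5))
print(x_k, y_k)
```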
Further refinements of stochastic minimax problems are possible. A hybrid of the models (1) and (6) is the minimization of

$$F(x) = \max_{y \in Y} E \max_{z \in Z} g(x, y, z, \omega). \tag{8}$$

If $g(x, y, z, \omega)$ is convex in $x$ and concave in $y$, and $X$, $Y$ are convex compacts, then the procedure (7)




Stochastic Quasigradient Methods in Minimax Problems 


is also applicable for solving (8) with $g_x(x^k, y^k, \omega^k)$, $g_y(x^k, y^k, \omega^k)$ replaced by $g_x(x^k, y^k, z^k, \omega^k)$, $g_y(x^k, y^k, z^k, \omega^k)$, where $z^k$ is a solution of the deterministic subproblem $g(x^k, y^k, z^k, \omega^k) = \max_{z \in Z} g(x^k, y^k, z, \omega^k)$.


Stochastic Nash Equilibrium 


A stochastic equilibrium of an $N$-person game on $X = X_1 \times X_2 \times \cdots \times X_N$ can be defined by using pay-off functions $\Psi_i(x) = E\psi_i(x, \omega)$, $x \in X$. Let us denote $(x_1, \ldots, x_{i-1}, y_i, x_{i+1}, \ldots, x_N)$ by $(y_i | x)$. The point $x^* = (x_1^*, \ldots, x_N^*) \in X$ is referred to as the Nash equilibrium if $\Psi_i(x^*) = \min\{\Psi_i(y_i | x^*) \mid (y_i | x^*) \in X\}$, $i = 1, \ldots, N$. Let us introduce the Ky Fan function $L(x, y) = \sum_{i=1}^{N} [\Psi_i(x) - \Psi_i(y_i | x)]$, $y = (y_1, \ldots, y_N)$, $x = (x_1, \ldots, x_N)$. A point $x^* \in X$ is a normalized equilibrium if $\max_{y \in X} L(x^*, y) = 0$. Since $\max_{y \in X} L(x, y) \ge 0$ (take $y = x$), the search for a stochastic normalized equilibrium reduces to an SMM problem: minimize

$$F(x) = \max_{y \in X} L(x, y), \quad x \in X.$$


The important additional information that $\min_{x \in X} F(x) = 0$ can be effectively utilized in the search for global solutions by SQG methods. Assume now that $L(x, y)$ is a convex-concave function for $x \in X$, $y \in X$. Then the procedure (7) takes the form

$$x_i^{k+1} = \Pi_{X_i}\left[x_i^k - \rho_k \psi_{i x_i}(x^k, \omega^k)\right], \quad i = 1{:}N, \; k = 0, 1, \ldots, \tag{9}$$

where $\psi_{i x_i}$ is a subgradient of $\psi_i(x, \omega)$ with respect to $x_i$ and $\Pi_{X_i}$ is the projection onto $X_i$. The convergence conditions are similar to those of method (7). This method is an adaptive adjustment procedure for learning a Nash equilibrium [10,17].


Example 5. Stochastic Cournot oligopoly. The classical oligopoly model of Cournot [1,10] remains a key model within modern theories of industrial organization. Generalized to comprise different goods and uncertainty, the model takes the following form. Firm $i$ produces the commodity bundle $x_i \in \mathbb{R}^n$, thus incurring a convex random production cost $c_i(x_i, \omega)$ and gaining random market revenues $\langle p(\sum_{j=1}^{N} x_j, \omega), x_i \rangle$, where $p(Q, \omega)$ is the price at which the total demand $Q$ equals the aggregate supply $\sum_{j=1}^{N} x_j$. Suppose that $p(Q, \omega) = a - A(\omega) Q$, where $a \in \mathbb{R}^n$ and $A(\omega)$ is an $n \times n$ positive semidefinite matrix (for almost all $\omega$). Then for $\psi_i(x, \omega) = c_i(x_i, \omega) - \langle a, x_i \rangle + \langle A(\omega) \sum_{j=1}^{N} x_j, x_i \rangle$ the function $L(x, y)$ is convex-concave.
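A scalar ($n = 1$), two-firm instance of Example 5 illustrates procedure (9). Every number in it is a hypothetical choice: $a = 10$, random linear costs $c_i(x_i, \omega) = \gamma_i(\omega) x_i$ with $E\gamma = (1, 2)$, a random slope $A(\omega) = 1 + U(-0.2, 0.2)$, $X_i = [0, 10]$ and $\rho_k = 0.5/(k+1)^{0.7}$. For these data the first-order conditions $E\gamma_i - a + EA\,(x_1 + x_2 + x_i) = 0$ give the equilibrium $x^* = (10/3, 7/3)$.

```python
import random

a = 10.0                  # price intercept (hypothetical)
mean_cost = [1.0, 2.0]    # E gamma_i, expected unit production costs (hypothetical)

def cournot_learning(iters, rng, lo=0.0, hi=10.0):
    x = [0.0, 0.0]
    for k in range(iters):
        rho = 0.5 / (k + 1) ** 0.7
        A = 1.0 + rng.uniform(-0.2, 0.2)                       # random slope A(omega^k)
        gamma = [m + rng.uniform(-0.5, 0.5) for m in mean_cost]
        Q = sum(x)                                             # aggregate supply
        # psi_{i x_i}(x^k, omega^k) = gamma_i - a + A (Q + x_i): each firm
        # moves against the gradient of its own random cost, as in (9).
        grads = [gamma[i] - a + A * (Q + x[i]) for i in range(2)]
        x = [min(max(x[i] - rho * grads[i], lo), hi) for i in range(2)]
    return x

x_eq = cournot_learning(20_000, random.Random(6))
print(x_eq)
```

Each firm uses only its own observed marginal cost and price data, which is why (9) can be read as a decentralized learning rule.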


Stochastic Optimization 
with Unknown Distributions 


The probability measure $P$ of the standard STO problem is assumed to be well defined on subsets $A$ of $\Omega$ in the sense that it is possible to generate samples $\omega^0, \omega^1, \ldots$ of the random variables $\omega$ from $P$. In practice the probability measure $P$ may not be known exactly [2]: there is only information on some of its characteristics, in particular bounds for the mean value or other moments. Such information can often be written in terms of constraints

$$Q_s(x, P) = \int_\Omega q_s(x, \omega) P(d\omega) \le 0, \quad s = 1{:}K, \tag{10}$$

$$\int_\Omega P(d\omega) = 1, \tag{11}$$

where $q_s(x, \omega)$ are known functions (which often do not depend on $x$), for example, as in constraints on joint moments $c_{r_1 \ldots r_l} \le E\,\omega_1^{r_1} \cdots \omega_l^{r_l} \le C_{r_1 \ldots r_l}$ with known constants $c$, $C$. If the unknown probability measure is evaluated from the worst-case perspective within constraints (10), (11), then the STO problem is formalized as an SMM problem: find a vector $x$ that minimizes

$$F(x) = \max_{P \in \mathcal{P}} \int_\Omega f(x, \omega) P(d\omega), \tag{12}$$


where $\mathcal{P}$ is the family of probability measures defined by (10), (11). The solution of the “inner” subproblem in (12) defines an implicit probability measure $P^*(x, d\omega)$. The important fact is that $P^*(x, \cdot)$ is concentrated at not more than $K + 1$ points [4,5], and this fact can be utilized effectively in the design of solution procedures. Another important approach is to use the following duality relations [5]

$$F(x) = \min_{u \ge 0} \max_{\omega \in \Omega} \Big[ f(x, \omega) - \sum_{s=1}^{K} u_s q_s(x, \omega) \Big] \tag{13}$$

for the inner maximization problem in (12).
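Both facts can be checked on a small hypothetical instance with $K = 1$. Here $\Omega$ is a grid on $[0, 1]$, $f(x, \omega) = \omega^2$ (no dependence on $x$), and the single moment constraint is $E\omega \le 0.3$, i.e. $q_1(\omega) = \omega - 0.3$. The inner problem is a linear program in $P$ with one inequality and one equality, so its optimum is attained at a measure with at most $K + 1 = 2$ atoms, and enumerating one- and two-point measures solves it exactly. The dual value $\min_{u \ge 0} \max_\omega [f(\omega) - u\,q_1(\omega)]$ is approximated by a grid search over $u$. Since $\omega^2 \le \omega$ on $[0, 1]$, the analytic answer is $0.3$, with mass $0.7$ at $0$ and $0.3$ at $1$. The function names are illustrative.

```python
def worst_case_expectation(omegas, f, q):
    """max sum_w f(w) P(w) over probability vectors P on the grid `omegas`
    subject to sum_w q(w) P(w) <= 0, found by enumerating the vertices of
    the feasible set: one-point measures with q <= 0, and two-point
    measures on which the constraint is active."""
    best, best_measure = float("-inf"), None
    for w in omegas:
        if q(w) <= 0 and f(w) > best:
            best, best_measure = f(w), {w: 1.0}
    for u in omegas:
        for v in omegas:
            if q(u) < 0 < q(v):
                pv = -q(u) / (q(v) - q(u))        # makes (1 - pv) q(u) + pv q(v) = 0
                val = (1 - pv) * f(u) + pv * f(v)
                if val > best:
                    best, best_measure = val, {u: 1 - pv, v: pv}
    return best, best_measure

def dual_value(omegas, f, q, us):
    """Grid approximation of min_{u >= 0} max_w [f(w) - u q(w)]."""
    return min(max(f(w) - u * q(w) for w in omegas) for u in us)

omegas = [i / 100 for i in range(101)]
f = lambda w: w * w
q = lambda w: w - 0.3
primal, measure = worst_case_expectation(omegas, f, q)
dual = dual_value(omegas, f, q, [i / 100 for i in range(301)])
print(primal, measure, dual)
```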

Other important cases arise with further speci- 
fication of uncertainties associated with measure P. 
Assume that random parameters can be separated 
into two groups $(\omega, v)$ with a joint distribution function $H(\omega, v)$ of the form $dH(\omega, v) = h(\omega, v) P(d\omega) \mu(dv)$, where $P(d\omega)$ is not known exactly and satisfies (10), (11). If $P$ is taken from the worst case, then the problem is formulated as the minimization of

$$F(x) = \int \Big[ \max_{P \in \mathcal{P}} \int_\Omega f(x, \omega, v) h(\omega, v) P(d\omega) \Big] \mu(dv). \tag{14}$$

By using duality relations similar to (13), this can be reformulated as the minimization of a function of type (1):

$$F(x) = \int \max_{\omega \in \Omega} \Big[ f(x, \omega, v) h(\omega, v) - \sum_{s=1}^{K} u_s q_s(x, \omega, v) \Big] \mu(dv), \quad u \ge 0. \tag{15}$$
s=1 

Example 6. Incomplete information on cost functions. Consider (12) for the cost function $F(x) = \langle Ec, x \rangle$, $x \in X$, where $c$ is a random vector, $c = c(\omega, v)$, and the functions $q_s$ in (10) do not depend on $x$. Then problem (15) is formulated as the minimization of

$$F(x) = \int \max_{\omega \in \Omega} \Big[ \langle c(\omega, v), x \rangle h(\omega, v) - \sum_{s=1}^{K} u_s q_s(\omega) \Big] \mu(dv), \quad u \ge 0.$$

Here $F(x)$ is a convex function, and an SQG is defined by formula (2).


The complexity of the SMM problems discussed here is due to the nested structure of their objective functions. Many of them involve deterministic optimization subproblems under the sign of mathematical expectation. In applications, these subproblems often have a special structure; for example, the feasible set may be reduced to a finite number of alternatives (as in Example 4). In the case of infinite feasible sets they can be adaptively approximated by random finite sets with a constant number of elements at each step $k = 0, 1, \ldots$ of SQG procedures, as proposed in [5]. For practical applications it is important to realize that models (1), (6), (8), (12), (14) are formulated, in fact, under different assumptions on the worst-case scenario. For example, in (12) the evaluation of uncertainty is taken from the worst-case expected outcomes, whereas (1) deals with the worst-case random scenarios.


See also 


> Minimax Theorems 

> Nondifferentiable Optimization: Minimax Problems 

> Stochastic Quasigradient Methods 

> Stochastic Quasigradient Methods: Applications 

> Two-Stage Stochastic Programming: Quasigradient 
Method 

> Two-Stage Stochastic Programs with Recourse 
Introduction


The field of stochastic scheduling is motivated by prob- 
lems of priority assignment arising in a variety of sys- 
tems where jobs with random features (e. g., arrival or 
processing times) vie over time for access to shared 
service resources. Two prominent application areas 
are the dynamic scheduling of flexible manufacturing 
and computer-communication systems. Think, e. g., of 
a manufacturing workstation whose capacity is shared 
by multiple part types. Or consider a packet-switched 
communication network’s channel whose bandwidth is 
shared by multiple traffic classes. Another rich set of 
applications is furnished by problems concerning the 
dynamic scheduling of multiple projects, whose states 
evolve randomly over time (e. g., research and develop- 
ment projects, or clinical trials). 


The performance of such systems, as evaluated by 
a criterion such as the average time that jobs stay in 
the system (flowtime), can be significantly affected by 
the scheduling policy adopted to prioritize over time the 
access of jobs to resources. This motivates the interest in finding scheduling policies that optimize performance objectives of concern (e. g., minimizing the average flowtime). Yet, the high degree of discretionality allowed in the design of such policies gives rise to
a combinatorial explosion rendering intractable an ex- 
haustive search to determine an optimal policy. Instead, 
a goal of practical interest is to design relatively sim- 
ple scheduling policies that achieve an optimal or nearly 
optimal performance. 

The theory of stochastic scheduling addresses such 
a goal in the idealized setting of stochastic system mod- 
els. Real-world random features such as job interar- 
rival or processing times are thus modeled by specify- 
ing their probability distributions. Model assumptions 
vary across several dimensions, including the class of 
scheduling policies considered to be admissible, job in- 
terarrival and processing-time distributions, type and 
arrangement of service resources, and performance ob- 
jective to be optimized. Typically, admissible policies 
are required to be nonanticipative, meaning that they 
cannot make use of future information, such as the un- 
known total duration of a job whose processing has not 
yet finished. 

Regarding solution methods and techniques, it 
seems fair to say that no unified and practical approach 
is yet available to design and analyze optimal or near- 
optimal policies across the entire range of stochas- 
tic scheduling models. Although many such models 
can be cast in the framework of dynamic program- 
ming, straightforward application of this technique typ- 
ically results in intractable formulations (curse of di- 
mensionality). Classical results in the area were ob- 
tained through insightful ad hoc ideas, often based on 
interchange arguments (cf. [41]), whose extension to 
seemingly close model variations is hard or elusive. Yet 
the last two decades have witnessed major advances in 
promising research fronts, such as the use of Brown- 
ian or of fluid approximations, the use of mathemat- 
ical programming formulations, and the development 
of priority-index policies. 

Stochastic scheduling problems can be classified into three broad types, which have evolved with substantial autonomy: problems concerning the scheduling of a batch of stochastic jobs, multi-armed bandit problems, and problems concerning the scheduling of queueing systems.

The historical development of each such area has 
followed a similar three-stage pattern. In the first, ear- 
lier stage, researchers elucidated the optimal policies 
in relatively simple models. Such policies were often 
found to be based on priority indices: an index is com- 
puted for each job type or project, as a function of 
its state; then, at each decision epoch jobs or projects 
with larger index values are awarded higher priority 
for access to service. In the second stage, research ef- 
forts focused on identifying optimal policies in more 
complex models, often at the expense of introducing 
rather restrictive conditions on model parameters, such 
as symmetry assumptions. In the third, current stage, 
the main focus has shifted to developing computationally tractable methods capable of addressing large-scale models, which yield guidelines for designing good dynamic scheduling policies.


Models 
Scheduling a Batch of Stochastic Jobs 


In models of this class, a set of m machines is avail- 
able to process a batch of n jobs with random pro- 
cessing times having known distributions, in order to 
optimize a given performance objective. The simplest 
such problem is to sequence a set of n stochastic jobs 
on a single machine ($m = 1$) to minimize the expected weighted flowtime. Job processing times are independent random variables, having a general distribution $G_j(\cdot)$ with mean $p_j$ for job $j$. Admissible scheduling policies are required to be nonanticipative and nonpreemptive (processing of a job, once started, must proceed uninterruptedly to completion). Let $w_j > 0$ denote the cost rate incurred per unit time in the system (waiting or being processed) for job $j$, and let $C_j$ denote its random completion time. Let $\Pi$ denote the class of all admissible policies, and let $E^\pi[\cdot]$ denote expectation under policy $\pi \in \Pi$. The problem can be formulated as


$$\min_{\pi \in \Pi} E^\pi\Big[\sum_{j=1}^{n} w_j C_j\Big]. \tag{1}$$


In the special case where job durations are deterministic, Smith first showed in [60] that it is optimal to sequence jobs in nonincreasing order of the priority index $w_j/p_j$. Such a rule is also optimal in the general stochastic case (1), as shown in [57]. References [36,37] identify conditions under which such an index rule is optimal when there are multiple identical parallel machines and processing times are exponentially distributed.
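For a fixed nonpreemptive sequence on one machine, linearity of expectation gives $E[C_j] = \sum_{i \preceq j} p_i$, the sum of the mean processing times of job $j$ and of the jobs sequenced before it. The objective in (1) therefore depends only on the means $p_j$. The sketch below ranks jobs by $w_j/p_j$ and checks the result against brute-force enumeration of all sequences. The weights, means and function names are hypothetical.

```python
from itertools import permutations

def expected_weighted_flowtime(order, w, p):
    """E[sum_j w_j C_j] for a fixed nonpreemptive sequence: by linearity of
    expectation, E[C_j] is the sum of the means up to and including job j."""
    t, total = 0.0, 0.0
    for j in order:
        t += p[j]
        total += w[j] * t
    return total

def smith_rule(w, p):
    """Sequence jobs in nonincreasing order of the index w_j / p_j."""
    return sorted(range(len(w)), key=lambda j: w[j] / p[j], reverse=True)

w = [3.0, 1.0, 4.0, 2.0]      # hypothetical cost rates
p = [2.0, 1.0, 5.0, 0.5]      # hypothetical mean processing times

best = min(expected_weighted_flowtime(o, w, p) for o in permutations(range(len(w))))
order = smith_rule(w, p)
print(order, expected_weighted_flowtime(order, w, p), best)
```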

The model extension where policies are allowed to 
be preemptive (processing of a job may be interrupted 
at any time, to be later resumed) was solved by Sev- 
cik in [58]. The optimal policy is again characterized 
by a priority index for each job, which in this case is 
a function of the cumulative processing time received 
so far. 

Optimal index policies have also been identified for 
scheduling a batch of jobs on identical parallel ma- 
chines, yet only under rather stringent conditions. The 
main performance objectives investigated in such a set- 
ting are: (i) minimize the total expected flowtime, 


$$\min_{\pi \in \Pi} E^\pi\Big[\sum_{j=1}^{n} C_j\Big]; \tag{2}$$

and (ii) minimize the expected makespan (time to finish the last job),

$$\min_{\pi \in \Pi} E^\pi\Big[\max_{1 \le j \le n} C_j\Big]. \tag{3}$$

The index rule that assigns higher priority to jobs 
with shorter expected processing time (SEPT) has been 
shown to be optimal for (2) under the following as- 
sumptions: when job processing time distributions are 
exponential [15,29,72]; when jobs have the same gen- 
eral processing time distribution (having possibly re- 
ceived different amounts of processing prior to start) 
with a nondecreasing hazard rate function [65]; and, 
more generally, when job processing time distributions 
are stochastically ordered [67]. 

As for the expected makespan objective (3), the in- 
dex rule that assigns higher priority to jobs with longer 
expected processing times (LEPT) has been shown to 
be optimal in the following cases: under exponential 
processing time distributions [15,72]; and when jobs 
have a common processing time distribution (with pos- 
sibly different amounts of processing prior to start) with 
a nonincreasing hazard rate function [65]. 
Other models incorporate more complex features.
Thus, the optimality of the preemptive version of 
Smith’s index rule is extended in [53] to models with 
stochastic release dates or due dates. Also, in models 
with uniform parallel machines, which differ in speed 
rates, researchers have characterized optimal policies 
exhibiting a threshold structure: see [1,55] for the prob- 
lem of expected flowtime minimization, and [18] for 
the problem of expected makespan minimization. An 
optimal policy for the problem of scheduling a batch of 
stochastic jobs in a flow shop (with m machines in se- 
ries) is identified in [75]. 

The optimality of the simple policies identified in 
the work reviewed above typically does not extend 
to models that violate the required assumptions [19]. 
Finding an optimal policy in such cases appears to be 
a computationally intractable goal (see [50] for a study 
on the complexity of decision-making problems under 
uncertainty, such as stochastic scheduling). This fact 
has motivated the analysis of suboptimal heuristic in- 
dex policies. 

For example, it has been shown in [71] that, under 
mild assumptions, the suboptimality gap for Smith’s 
rule, when used as a heuristic for stochastic scheduling 
on parallel machines, is bounded above by a quantity 
that is independent of the number of jobs. Therefore, 
as the latter grows to infinity the rule’s relative subopti- 
mality gap vanishes, yielding a form of asymptotic op- 
timality. An earlier asymptotic optimality result in the 
same vein for a model of parallel machines stochastic 
scheduling with in-tree precedence constraints was ob- 
tained in [51]. 

A recent line of work uses optimal solutions to 
linear programming relaxations to design and analyze 
scheduling rules with performance guarantees for hard 
stochastic scheduling problems [40]. 


Multi-Armed Bandits 


Models in this class are concerned with optimally al- 
locating effort over time to a collection of projects, 
which change state in a random fashion depending on 
whether they are engaged or not. A classic example is 
the multi-armed bandit problem which, in its discrete- 
time version, can be described as follows: there is a col- 
lection of K projects labeled by k = 1,..., K, exactly 
one of which must be engaged at each discrete time 


period $t = 0, 1, \ldots$. Project $k$ can be in a finite number of states $i_k \in N_k$, where $N_k$ is the project's state space. If at period $t$ project $k$ occupies state $i_k$ and is engaged, an active reward $R_k^1(i_k)$ is earned, geometrically discounted by factor $0 < \beta < 1$; then the project state changes in a Markovian fashion to $j_k$, with active transition probability $p_k^1(i_k, j_k)$ for $j_k \in N_k$. Projects not engaged do not earn rewards (i.e., passive rewards are $R_k^0(i_k) = 0$) and remain frozen. The problem is to find a nonanticipative scheduling policy for selecting the project to be engaged at each period, so as to maximize the total expected discounted reward earned over an infinite horizon. Denoting by $\Pi$ the class of such admissible policies, and denoting by $X_k(t)$ and by $a_k(t)$ the state and the action ($a_k(t) = 1$: active; $a_k(t) = 0$: passive) for project $k$ at period $t$, the problem can be formulated as


$$\max_{\pi \in \Pi} E^\pi\Big[\sum_{t=0}^{\infty} \sum_{k=1}^{K} \beta^t R_k^{a_k(t)}\big(X_k(t)\big)\Big].$$


Such a classic problem, whose name refers to a slot 
machine with multiple arms, one of which must be 
pulled at each time, has its origins in problems of se- 
quential design of experiments [56,62]. After being long 
considered intractable, the problem was solved in a cel- 
ebrated result by Gittins and Jones [28]. The optimal 
policy is given by an index rule: an index $\gamma_k(i_k)$ is defined for each project $k$ as a function of its state $i_k$; then, at each time a project with currently largest index is engaged, breaking ties arbitrarily. The Gittins index generalizes that introduced in [7] for Bayesian Bernoulli
bandits, which in turn was based on the index intro- 
duced in [13] for finite-horizon Bayesian bandits. 

The optimality of such an index rule, for the orig- 
inal model and extensions, has a rich history of proofs 
yielding complementary insights. Such proofs are based 
on different techniques, including interchange argu- 
ments [26,28,64,70], dynamic programming [73], in- 
tuitive arguments [66], induction arguments [63], and 
conservation laws/linear programming [8]. See [27] for 
a comprehensive reference. For efficient methods to 
compute the Gittins index, see [17,35,46]. 
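One way to compute the index numerically is the restart-in-state characterization of Katehakis and Veinott, shown here only as an illustrative sketch and not as the specific methods of [17,35,46]. For a given state $i$, solve the single-project problem in which, at every period, one may either continue the project or restart it from state $i$. The Gittins index (in reward-rate form) is then $(1 - \beta)$ times the optimal value at $i$. Value iteration suffices for small state spaces. The function name and the two-state test project are hypothetical. In that project, state 0 pays 0 and moves to state 1, which pays 1 forever, so with $\beta = 0.9$ the indices are $\beta = 0.9$ and $1$.

```python
def gittins_index(R, P, beta, state, tol=1e-12, max_iter=100_000):
    """Gittins index (reward-rate form) of `state` for a single project with
    rewards R[i] and transition matrix P[i][j], via the restart-in-state
    problem solved by value iteration."""
    n = len(R)
    V = [0.0] * n

    def continue_value(i, V):
        return R[i] + beta * sum(P[i][j] * V[j] for j in range(n))

    for _ in range(max_iter):
        restart = continue_value(state, V)       # value of jumping back to `state`
        V_new = [max(continue_value(i, V), restart) for i in range(n)]
        diff = max(abs(u - v) for u, v in zip(V_new, V))
        V = V_new
        if diff < tol:
            break
    return (1.0 - beta) * V[state]

R = [0.0, 1.0]
P = [[0.0, 1.0], [0.0, 1.0]]
print(round(gittins_index(R, P, 0.9, 0), 6), round(gittins_index(R, P, 0.9, 1), 6))  # 0.9 1.0
```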

The important model extension where projects 
not engaged continue to evolve, typically with differ- 
ent transition probabilities, was introduced by Whittle 
in [74]. Its greatly improved modeling power comes, 
however, at the cost of tractability [52]. In the setting
of a time-average version of such a multi-armed restless 
bandit problem, he deployed a Lagrangian approach to 
obtain a heuristic index rule that reduces to Gittins’ in 
the classic model. His conjecture regarding the asymp- 
totic optimality of such an index rule as both the num- 
ber of projects and the required number of projects to 
be engaged grow to infinity in a constant ratio was es- 
tablished, under appropriate conditions, in [68]. Yet his 
proposed index for restless bandits only exists for a re- 
stricted class of bandits, termed indexable, which raises 
the issue of finding sufficient conditions for indexabil- 
ity. 

The results in [74] were based on introduction of 
a tractable problem relaxation, which also yields useful 
bounds on the optimal value. Improved bounds based 
on a hierarchy of linear programming relaxations were 
introduced in [11]. 

A framework for the analysis and computation of 
restless bandit indices, leading to the unifying con- 
cept of marginal productivity index, has been re- 
cently developed and deployed in several applications 
in [42,43,44,45]. See [47] for an accessible review of 
such a line of research. 

The incorporation of penalties (costs or delays) 
for switching projects also yields an important yet in- 
tractable model extension of classic bandits [34], as it 
is no longer solved by index policies [5]. Yet, [3] intro- 
duced an intuitive index that partially characterizes op- 
timal policies. An efficient algorithm to compute such 
an index, based on the natural formulation of classic 
bandits with switching costs as restless bandits with- 
out them, along with extensive computational experi- 
ence showing that the resulting index policy is nearly 
optimal, is reported in [49]. 


Scheduling Queueing Systems 


Models in this class concern the design of optimal poli- 
cies for dynamic allocation of jobs to servers, where jobs 
arrive over time according to given stochastic processes. The main class of models in this setting is that of multiclass queueing networks (MQNs), widely applied as versatile models of computer-communication and manufacturing systems.

The simplest types of MQNs involve scheduling a number of job classes in a single server. As in the two problem categories discussed above, simple priority-index rules have been shown to be optimal for a variety of such models. Consider the case of a multiclass M/G/1 queue, where $K$ job classes vie for the attention of a single server: jobs of class $k = 1, \ldots, K$ arrive at the system as a Poisson process with rate $\lambda_k$, and their service times are drawn independently from a common distribution $G_k(\cdot)$ with mean $1/\mu_k$. Class $k$ jobs incur linear holding costs at rate $c_k > 0$ per unit time that a job resides in the system (waiting or in service). The goal is to find a nonanticipative and nonpreemptive scheduling policy prescribing which job class to serve at each decision epoch, in order to minimize the long-run average holding cost rate per unit time. Let $\Pi$ denote the class of all such admissible policies, and let $E^\pi[L_k]$ denote the expected number of class $k$ jobs in the system under policy $\pi \in \Pi$. The problem can be stated as


K 
min cp E” [Lx] . 
min 2 KE” [Lx] 


Its solution is given by the classic cj-rule [21], which 
is the same as the Smith index rule discussed above: 
award highest service priority at each time to a job with 
largest index c,j1z. The cy-rule is also optimal among 
preemptive policies when service times are exponential. 
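To make the cμ-rule concrete, the following Python sketch ranks classes by $c_k \mu_k$ and, as a sanity check, compares that static priority order against every other order. It assumes a nonpreemptive M/G/1 queue and evaluates each order with Cobham's formula for mean waiting times under nonpreemptive priorities; the three-class data and the helper names (`c_mu_order`, `avg_holding_cost`) are hypothetical.

```python
from itertools import permutations

def c_mu_order(c, mu):
    """Classes sorted by decreasing priority index c_k * mu_k (the c-mu rule)."""
    return sorted(range(len(c)), key=lambda k: c[k] * mu[k], reverse=True)

def avg_holding_cost(order, lam, mu, c, s2):
    """Average holding-cost rate sum_k c_k E[L_k] of a nonpreemptive static
    priority rule in an M/G/1 queue, via Cobham's mean-waiting-time formula.
    order[0] is the highest-priority class; s2[k] = E[S_k^2]."""
    rho = [l / m for l, m in zip(lam, mu)]
    assert sum(rho) < 1, "the queue must be stable"
    w0 = sum(l * m2 for l, m2 in zip(lam, s2)) / 2.0   # mean residual work
    cost, sigma_prev = 0.0, 0.0
    for k in order:
        sigma = sigma_prev + rho[k]
        wait = w0 / ((1.0 - sigma_prev) * (1.0 - sigma))  # mean time in queue
        cost += c[k] * lam[k] * (wait + 1.0 / mu[k])      # Little's law: E[L_k]
        sigma_prev = sigma
    return cost

# Hypothetical three-class example with exponential service, E[S^2] = 2/mu^2.
lam, mu, c = [0.2, 0.3, 0.1], [1.0, 2.0, 0.5], [1.0, 0.4, 4.0]
s2 = [2.0 / m ** 2 for m in mu]
best = min(permutations(range(3)), key=lambda o: avg_holding_cost(o, lam, mu, c, s2))
```

In this toy instance the brute-force search over all 3! priority orders should find no order cheaper than the cμ order, in line with the optimality result cited above.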

The optimality of an index policy for the model extension that incorporates Markovian job feedback (when a class k job completes service it changes class to l with probability $p_{kl}$, and leaves the system with probability $1 - \sum_{l=1}^{K} p_{kl}$) was established by Klimov in [38].
The optimal priority index is efficiently computed by 
the K-step Klimov algorithm. The result was extended 
to the discounted criterion in [61]. 

An account of these results based on the achievable 
region method, which seeks to characterize the region 
of achievable performance vectors (e.g., mean queue 
lengths for each class) by means of linear programming 
constraints that formulate conservation laws, has been 
given in [8,20,25,59] (in increasing levels of generality). 
The performance of Klimov’s index rule, when used as 
a heuristic in the model extension that includes iden- 
tical parallel servers, has been analyzed using such an 
approach in [31]: a relaxed linear programming for- 
mulation of the performance region is shown to yield 
closed-form suboptimality bounds, which imply the 
rule’s asymptotic optimality in heavy-traffic. 
More general MQN models involve features such
as changeover times for changing service from one job 
class to another [39], or multiple processing stations, 
which provide service to corresponding nonoverlap- 
ping subsets of job classes. Due to the intractability of 
such models, researchers have aimed to design rela- 
tively simple heuristic policies which achieve a perfor- 
mance close to optimal. The accomplishment of such 
a goal has been hindered by formidable technical chal- 
lenges, including the stability problem for multiclass 
queueing networks with multiple stations [14,24]: in 
general it is not known what conditions on model pa- 
rameters ensure that a given policy is stable (the time- 
average number of jobs in the system is finite). As 
a result, computer simulation remains the most widely 
used tool in applications of these models. Theoretical approaches currently under active development include heuristic scheduling policies based on: diffusion approximations of the original system under heavy-traffic conditions [6,32,33,54,69]; fluid approximations [4,16,23]; mathematical programming formulations [9,10,12,22,30,31]; and restless bandit indexation [2,43,44,45,47,48].
The basic transportation problem (a minimal cost network flow problem in a bipartite graph) is a very well-known problem, which can be efficiently solved with existing methods. It is also an important problem in practice: planners often face the task of transporting goods from a set of supply points (factories) to a set of demand points (customers) so as to minimize transportation costs.

However, in practice the demand of the customers 
is often not known exactly. In many cases it is best seen 
as a stochastic amount, with a certain probability func- 
tion and a certain expected value. Models of this sit- 
uation also exist in the literature, and quite efficient 
solution methods have been developed, see for exam- 
ple [1,10] and [9]. The problem is called the stochastic 
transportation problem, (STP), and is a transportation 
problem with the demand constraints replaced by non- 
linear convex costs. 

Considering the other end of the transportation 
problem, there are facility location models, which deal 
with the question of whether or not a certain supply 
point should be utilized. In such problems, a supply 
point (facility) is available only if a certain fixed cost is 
paid. Such models, with linear costs for transportation, 
are also well known and several efficient solution methods exist, see for example [3,6,17], and [21]. (Other variants of location problems are treated in ▶ Facility location with staircase costs and ▶ Facility location problems with spatial interaction.)

Obviously both these aspects can be interesting to 
consider simultaneously, i.e. planning the location of 
supply facilities and transportation of goods to the 
customers, while considering the demand as stochas- 
tic. This is what we call the stochastic transportation and 
location problem. This problem has received little at- 
tention until now. Only a few suggestions for solution 
methods can be found in the literature, [2,11,12,14]. 
The latter two papers actually consider a further generalization with general concave costs at the supply points, together with the convex costs at the demand points.
The Problem


Let m supply points (facilities) and n demand points (customers) be given. The variables are $x_{ij}$, the amount shipped from supply point i to demand point j, $z_i$, the amount shipped out of supply point i, and $y_j$, the amount shipped into demand point j. The maximal capacity at supply point i is $S_i$, and the cost of production is $g_i(z_i)$, where $g_i(0) = 0$, and $g_i(z_i)$ is assumed to be lower semicontinuous, nondecreasing and concave due to economies of scale. Assuming a stochastic demand $d_j$ at demand point j, we get a penalty cost $\psi_j(y_j, d_j)$, such that the expected penalty $f_j(y_j) = E[\psi_j(y_j, d_j)]$ is a convex function, see [1].

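Let us assume a probability density function $\phi_j(d_j)$, which gives an expected demand $E[d_j] = \int_0^{\infty} v\,\phi_j(v)\,dv$ and a continuous distribution function $F_j(d_j) = \int_0^{d_j} \phi_j(v)\,dv$. There are unit holding costs $\alpha_j > 0$ and unit shortage costs $\beta_j > 0$, which give a total cost of

$$f_j(y_j) = \beta_j \int_{y_j}^{\infty} (v - y_j)\,\phi_j(v)\,dv + \alpha_j \int_0^{y_j} (y_j - v)\,\phi_j(v)\,dv = \beta_j\,(E[d_j] - y_j) + (\alpha_j + \beta_j) \int_0^{y_j} F_j(v)\,dv,$$

which can be shown to be a convex function (its second derivative is $(\alpha_j + \beta_j)\,\phi_j(y_j) \ge 0$).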
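The equivalence of the two expressions for $f_j$ can be checked numerically. The sketch below evaluates both forms with the midpoint rule; the exponential demand density, the truncation point `upper` and the cost parameters are illustrative assumptions, not data from the article.

```python
import math

def penalty_direct(y, alpha, beta, pdf, upper, n=20000):
    """f_j(y) from its definition: beta*E[(d - y)^+] + alpha*E[(y - d)^+],
    integrated with the midpoint rule on [0, upper] (upper truncates the tail)."""
    h = upper / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * h
        total += (beta * max(v - y, 0.0) + alpha * max(y - v, 0.0)) * pdf(v) * h
    return total

def penalty_via_cdf(y, alpha, beta, mean, cdf, n=20000):
    """Second form: beta*(E[d] - y) + (alpha + beta) * integral_0^y F(v) dv."""
    h = y / n
    integral = sum(cdf((k + 0.5) * h) for k in range(n)) * h
    return beta * (mean - y) + (alpha + beta) * integral

# Illustrative exponential demand with rate 0.5 (mean 2).
rate = 0.5
pdf = lambda v: rate * math.exp(-rate * v)
cdf = lambda v: 1.0 - math.exp(-rate * v)
direct = penalty_direct(1.5, 1.0, 3.0, pdf, upper=80.0)
via_cdf = penalty_via_cdf(1.5, 1.0, 3.0, 1.0 / rate, cdf)
```

Both values should agree to within the quadrature error; the second form needs only the distribution function on $[0, y_j]$.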

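The costs for transportation are linear, with unit cost $c_{ij}$ from supply point i to demand point j. The problem to minimize the total costs is:

$$
\text{(P)}\quad
\begin{aligned}
v^* = \min\ & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} + \sum_{i=1}^{m} g_i(z_i) + \sum_{j=1}^{n} f_j(y_j)\\
\text{s.t. } & \sum_{i=1}^{m} x_{ij} = y_j, \quad j = 1,\dots,n,\\
& \sum_{j=1}^{n} x_{ij} = z_i, \quad i = 1,\dots,m,\\
& 0 \le z_i \le S_i, \quad i = 1,\dots,m,\\
& x_{ij} \ge 0 \quad \forall i,j.
\end{aligned}
$$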


The objective function is a sum of three terms: a lin- 
ear transportation cost, a convex shortage penalty and 
a concave production cost. It is neither convex nor concave, but a d.c. function, i.e. a function that can be represented as a difference of two convex functions [13].
Minimizing such a function under linear constraints is 
a nonconvex global optimization problem, which may 
have multiple local minima. 
We can add the following redundant constraints,

$$\sum_{j=1}^{n} y_j \le S_{TOT}, \qquad y_j \le S_{TOT}\ \ \forall j, \qquad x_{ij} \le S_i\ \ \forall i,j,$$

where $S_{TOT} = \sum_{i=1}^{m} S_i$, to ensure that the feasible set is bounded.

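The problem (P) is quite general, and here we are mainly interested in the stochastic transportation-location problem (STLP), which is the special case of (P) that occurs if $g_i$ consists of a fixed charge (and possibly a linear cost). This problem can also be formulated as

$$
\text{(STLP)}\quad
\begin{aligned}
v^* = \min\ & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} + \sum_{i=1}^{m} r_i\delta_i + \sum_{j=1}^{n} f_j(y_j)\\
\text{s.t. } & \sum_{i=1}^{m} x_{ij} = y_j, \quad j = 1,\dots,n,\\
& \sum_{j=1}^{n} x_{ij} \le S_i\delta_i, \quad i = 1,\dots,m,\\
& \delta_i \in \{0,1\}, \quad i = 1,\dots,m,\\
& y_j \ge 0\ \ \forall j, \qquad x_{ij} \ge 0\ \ \forall i,j,
\end{aligned}
$$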

where $r_i$ is the fixed cost for starting production at supply point i, and $\delta_i$ is equal to 1 if something is produced at supply point i and 0 if not. This problem has been solved by a heuristic approach in [14] and by generalized Benders decomposition [5] in [2].

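A much simpler special case of (P) occurs if $g_i(z_i)$ is linear (and thus can be included in the transportation costs), namely the stochastic transportation problem, see [15]:

$$
\text{(STP)}\quad
\begin{aligned}
v^* = \min\ & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} + \sum_{j=1}^{n} f_j(y_j)\\
\text{s.t. } & \sum_{i=1}^{m} x_{ij} = y_j, \quad j = 1,\dots,n,\\
& \sum_{j=1}^{n} x_{ij} \le S_i, \quad i = 1,\dots,m,\\
& x_{ij} \ge 0\ \ \forall i,j, \qquad y_j \ge 0\ \ \forall j.
\end{aligned}
$$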


The objective function of (STP) is convex, so the problem can be solved efficiently by methods for convex problems.


Solution Method 


We will here describe the method proposed in [12], which solves both (P) and (STLP). It exploits the facts that the constraints are of transportation type and the objective function is separable. Furthermore, of the mn + m + n variables, only the m variables $z_i$ appear in nonconvex functions. Therefore one can reduce the problem to a much smaller d.c. optimization problem in z only. This reduced problem can be solved by a branch and bound procedure in which branching is performed by partitioning the space into rectangles, while bounding is based on linearization of the concave function $g(z) = \sum_{i=1}^{m} g_i(z_i)$.
One can show [12] that (P) is equivalent to

$$\text{(P*)}\quad \min\{\, g(z) + \varphi(z) : z \in \Omega \,\}$$

in the sense that the two problems have equal optimal values, and if $(x^*, y^*, z^*)$ solves (P), then $z^*$ solves (P*); conversely, if $z^*$ solves (P*), then $(x^*, y^*, z^*)$ solves (P), where $(x^*, y^*)$ is an optimal solution of $(Q(z^*))$, with

$$
(Q(z))\quad
\begin{aligned}
\varphi(z) = \min\ & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} + \sum_{j=1}^{n} f_j(y_j)\\
\text{s.t. } & \sum_{i=1}^{m} x_{ij} = y_j, \quad j = 1,\dots,n,\\
& \sum_{j=1}^{n} x_{ij} = z_i, \quad i = 1,\dots,m,\\
& x_{ij} \ge 0 \quad \forall i,j,
\end{aligned}
$$

for a given z in the rectangle $\Omega = \{z : 0 \le z \le S := (S_1, \dots, S_m)\}$. One can show that $\varphi(z)$ is a convex function.
(P*) is still a d.c. optimization problem, but much smaller than (P). Below we present a branch and bound solution method for (P*), based on the following:
• Rectangular subdivision of the feasible domain Ω.
• Linearization of the concave costs, yielding a subproblem similar to (STP).
Let M = [p, q] be a rectangle contained in Ω. Any point w ∈ M, together with an index i ∈ {1, ..., m}, determines a subdivision of M into two subrectangles $\{z : p_i \le z_i \le w_i,\ p_t \le z_t \le q_t\ (t \ne i)\}$ and $\{z : w_i \le z_i \le q_i,\ p_t \le z_t \le q_t\ (t \ne i)\}$. This subdivision is called a subdivision via (w, i).

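For any rectangle M = [p, q] contained in the feasible domain Ω, let $L_{M,i}(\cdot)$ be the affine function which agrees with $g_i(\cdot)$ at the endpoints of the segment $[p_i, q_i]$, i.e.

$$L_{M,i}(z_i) = g_i(p_i) + \frac{g_i(q_i) - g_i(p_i)}{q_i - p_i}\,(z_i - p_i),$$

and let $L_M(z) = \sum_{i=1}^{m} L_{M,i}(z_i)$. Since each $g_i$ is concave, $L_{M,i}(z_i) \le g_i(z_i)$ for $p_i \le z_i \le q_i$.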
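The underestimation property $L_{M,i} \le g_i$ on $[p_i, q_i]$, which is what makes the bound below valid, is easy to check numerically. The concave cost `g` in this sketch is a hypothetical example.

```python
import math

def secant(g, p, q):
    """Affine function L_{M,i} that agrees with g at p and q (requires q > p)."""
    slope = (g(q) - g(p)) / (q - p)
    return lambda z: g(p) + slope * (z - p)

g = lambda t: 10.0 + 2.0 * math.sqrt(t)   # hypothetical concave production cost
L = secant(g, 1.0, 9.0)
grid = [1.0 + 8.0 * k / 100 for k in range(101)]
gaps = [g(z) - L(z) for z in grid]        # g - L on [p, q]; zero at both endpoints
```

Every entry of `gaps` is nonnegative, and the largest gap marks where the linearization is weakest, which is the quantity the branching rule looks at.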
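Define the convex problem

$$
(CP(M))\quad
\begin{aligned}
\beta(M) = \min\ & \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}x_{ij} + \sum_{j=1}^{n} f_j(y_j) + \sum_{i=1}^{m} L_{M,i}(z_i)\\
\text{s.t. } & \sum_{i=1}^{m} x_{ij} = y_j, \quad j = 1,\dots,n,\\
& \sum_{j=1}^{n} x_{ij} = z_i, \quad i = 1,\dots,m,\\
& p \le z \le q, \qquad x_{ij} \ge 0 \quad \forall i,j.
\end{aligned}
$$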


Clearly $\beta(M) \le \min\{g(z) + \varphi(z) : z \in M\}$, and if an optimal solution $(x(M), y(M), z(M))$ of (CP(M)) satisfies $L_M(z(M)) = g(z(M))$, then $\beta(M) = \min\{g(z) + \varphi(z) : z \in M\}$.

Since the convex problem (CP(M)) is an STP with the additional constraints z ∈ M, i.e. p ≤ z ≤ q, the lower bounding subproblem for each rectangle can be solved by slightly modified versions of algorithms for (STP), see [9].

The subrectangle $M_k$ in which the approximation is to be improved is chosen to be the subrectangle with the smallest lower bound,

$$M_k \in \arg\min\{\beta(M) : M \in \mathcal{W}_k\},$$

where $\mathcal{W}_k$ denotes the current partition. This implies $\beta(M_k) \le \min\{g(z) + \varphi(z) : z \in \Omega\}$, and if $z^k = z(M_k)$ satisfies $L_{M_k}(z^k) = g(z^k)$, then equality holds, i.e. $z^k$ solves (P*). Otherwise, $g_i(z_i^k) - L_{M_k,i}(z_i^k) > 0$ for at least one i, and we subdivide $M_k$ via $(z^k, i_k)$, where $i_k$ is the index i that achieves the maximal difference $g_i(z_i^k) - L_{M_k,i}(z_i^k)$ between the actual cost and the linear approximation. This subdivision rule ensures that this maximal difference eventually tends to zero, and the lower bound $\beta(M_k)$ tends to the exact minimum of the objective function on $M_k$.


The method is given in algorithmic form:

Initialization
Choose ε ≥ 0 and a subrectangle $M_1$ of Ω which is known to contain an optimal solution of (P*). Let $\bar z^1$ be the best feasible point available, and $\gamma_1 = g(\bar z^1) + \varphi(\bar z^1)$. Set $\mathcal{W}_1 = \mathcal{P}_1 = \{M_1\}$, k = 1.
Iteration k = 1, 2, ...

i) For each $M \in \mathcal{P}_k$ solve (CP(M)), yielding β(M) and (x(M), y(M), z(M)). Update the incumbent $\bar z^k$ and its value $\gamma_k$.

ii) Delete all $M \in \mathcal{W}_k$ such that $\beta(M) \ge \gamma_k - \varepsilon$, and let $\mathcal{R}_k$ be the remaining members of $\mathcal{W}_k$. IF $\mathcal{R}_k = \emptyset$ THEN terminate: $\bar z^k$ is a global ε-optimal solution of (P*). ELSE choose $M_k \in \arg\min\{\beta(M) : M \in \mathcal{R}_k\}$.

iii) Let $z^k = z(M_k)$. Select $i_k \in \arg\max_i \{g_i(z_i^k) - L_{M_k,i}(z_i^k)\}$. IF $g_{i_k}(z_{i_k}^k) - L_{M_k,i_k}(z_{i_k}^k) = 0$ THEN terminate: $z^k$ is a global optimal solution. ELSE divide $M_k$ via $(z^k, i_k)$. Let $\mathcal{P}_{k+1}$ be the partition of $M_k$ and $\mathcal{W}_{k+1} = (\mathcal{R}_k \setminus \{M_k\}) \cup \mathcal{P}_{k+1}$. Set k ← k + 1 and go back to i).
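The following Python sketch runs the best-first branch and bound on a toy instance. To keep it self-contained it replaces the STP value function φ(z) by a hypothetical separable convex quadratic $\sum_i (a_i z_i^2 - b_i z_i)$, so that the bounding problem can be solved coordinatewise in closed form; in the actual method (CP(M)) is an STP-type problem solved by one of the algorithms discussed below. The square-root production costs, the data and the heap bookkeeping are illustrative choices, not the implementation of [12].

```python
import heapq
import math

def lower_bound(M, g, a, b):
    """Toy version of CP(M): each concave g_i is replaced by its secant on [p_i, q_i].
    The toy objective is separable, so every coordinate is a 1-D convex quadratic."""
    z, bound, gaps = [], 0.0, []
    for i, (p, q) in enumerate(M):
        slope = (g[i](q) - g[i](p)) / (q - p) if q > p else 0.0
        zi = min(max((b[i] - slope) / (2.0 * a[i]), p), q)  # clamp to [p_i, q_i]
        lin = g[i](p) + slope * (zi - p)                     # L_{M,i}(z_i)
        z.append(zi)
        bound += a[i] * zi ** 2 - b[i] * zi + lin
        gaps.append(g[i](zi) - lin)                          # >= 0 since g_i is concave
    return bound, z, gaps

def objective(z, g, a, b):
    return sum(a[i] * z[i] ** 2 - b[i] * z[i] + g[i](z[i]) for i in range(len(z)))

def branch_and_bound(S, g, a, b, eps=1e-4, max_iter=20000):
    root = [(0.0, s) for s in S]                 # Omega = [0, S]
    beta, z, gaps = lower_bound(root, g, a, b)
    best_z, best_val = z, objective(z, g, a, b)  # incumbent and its value gamma
    heap, tie = [(beta, 0, root, z, gaps)], 1    # candidate rectangles keyed by beta(M)
    for _ in range(max_iter):
        if not heap or heap[0][0] >= best_val - eps:
            break                                # every rectangle fathomed: eps-optimal
        _, _, M, z, gaps = heapq.heappop(heap)   # M_k with the smallest lower bound
        i = max(range(len(M)), key=lambda t: gaps[t])   # i_k: largest secant gap
        if gaps[i] <= 1e-12:
            continue                             # bound is exact on M_k
        for part in ((M[i][0], z[i]), (z[i], M[i][1])):  # subdivision via (z^k, i_k)
            child = M[:i] + [part] + M[i + 1:]
            c_beta, c_z, c_gaps = lower_bound(child, g, a, b)
            val = objective(c_z, g, a, b)
            if val < best_val:
                best_z, best_val = c_z, val
            heapq.heappush(heap, (c_beta, tie, child, c_z, c_gaps))
            tie += 1
    return best_z, best_val

# Hypothetical two-facility instance with square-root (concave) production costs.
g = [lambda t: 3.0 * math.sqrt(t), lambda t: 2.0 * math.sqrt(t)]
a, b, S = [1.0, 1.0], [4.0, 3.2], [3.0, 2.5]
z_best, v_best = branch_and_bound(S, g, a, b)
```

Each toy coordinate has a local minimum at 0 and a better interior minimum, so a local method started at 0 would stop there; the rectangle bounds steer the search to the interior minima instead.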


In [12] convergence of this algorithm is proved. If $g_i(t)$ is discontinuous at t = 0, then the problem does not change if $g_i(t)$ is replaced by a continuous function $\tilde g_i(t)$ that is linear for $0 \le t \le \tau_i$ and equal to $g_i(t)$ for $t \ge \tau_i$, where $\tau_i$ is a certain positive number (for details, see [12]).

Let us now discuss the case when the $g_i(t)$ are concave piecewise linear nondecreasing functions. First we can, as mentioned above, assume that they are continuous at t = 0, hence continuous at every point. Furthermore, in Step iii) of the algorithm, instead of dividing $M_k$ via $(z^k, i_k)$, we can divide it via the breakpoint $u_{k,i_k}$ of $g_{i_k}(t)$ nearest to $z_{i_k}^k$ that satisfies

$$g_{i_k}(u_{k,i_k}) - L_{M_k,i_k}(u_{k,i_k}) \ge g_{i_k}(z_{i_k}^k) - L_{M_k,i_k}(z_{i_k}^k).$$

Since the number of breakpoints of each function $g_i(t)$ is finite, the algorithm must terminate after finitely many steps. In this case the method has similarities to the method proposed in [19].
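A sketch of this breakpoint rule, assuming the piecewise linear cost is given as a callable together with its list of breakpoints (a hypothetical interface). Because $g - L$ is concave piecewise linear, zero at both interval endpoints, its maximum over $[p, q]$ is attained at a breakpoint, so a qualifying breakpoint exists whenever the gap at $z$ is positive.

```python
def pick_breakpoint(g, breakpoints, p, q, z):
    """Breakpoint u of a concave piecewise linear g with p < u < q, nearest to z,
    such that g(u) - L(u) >= g(z) - L(z), where L is the secant of g on [p, q]."""
    slope = (g(q) - g(p)) / (q - p)
    gap = lambda t: g(t) - (g(p) + slope * (t - p))
    candidates = [u for u in breakpoints if p < u < q and gap(u) >= gap(z)]
    return min(candidates, key=lambda u: abs(u - z))

# Hypothetical concave nondecreasing cost with breakpoints at t = 1 and t = 4.
g = lambda t: min(4.0 * t, 2.0 + 2.0 * t, 6.0 + t)
```

For example, on [0, 6] a relaxed solution at z = 0.5 leads to a split at the breakpoint 1, and z = 5 leads to a split at 4.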

If $g_i(t)$ is of fixed charge type, e.g. $g_i(0) = 0$ and $g_i(t) = r_i + p_i t > 0$ for t > 0, it can be replaced by a continuous concave piecewise linear nondecreasing function with one breakpoint $u_i$ arbitrarily near to 0. The subdivision of the interval $[0, S_i]$ via $u_i$ then amounts to branching over the Boolean variable $\delta_i$, with $[0, u_i]$ corresponding to $\delta_i = 0$, and $[u_i, S_i]$ corresponding to $\delta_i = 1$.
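A minimal sketch of such a replacement: the function follows the chord from the origin to the fixed-charge cost at a hypothetical breakpoint `u`, and coincides with $r_i + p_i t$ beyond it (the slope $p_i$ is called `slope` in the code). The numbers are illustrative.

```python
def fixed_charge(r, slope):
    """g(0) = 0 and g(t) = r + slope*t for t > 0 (discontinuous at 0)."""
    return lambda t: 0.0 if t <= 0.0 else r + slope * t

def continuous_replacement(r, slope, u):
    """Concave piecewise linear function with one breakpoint u > 0: the chord from
    (0, 0) to (u, r + slope*u), then identical to the fixed-charge cost."""
    return lambda t: (r + slope * u) * t / u if t <= u else r + slope * t

g = fixed_charge(5.0, 2.0)
g_tilde = continuous_replacement(5.0, 2.0, u=0.01)
```

The two functions differ only on the open interval (0, u), and the slope of the first piece, (r + slope*u)/u, exceeds `slope`, which is what keeps the replacement concave.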


How to Solve the Subproblem 


There are several possibilities for solving the subproblem (CP(M)), which is a problem of the type (STP). It can be solved by the Frank-Wolfe method [4] (cf. also ▶ Frank-Wolfe algorithm), as in [1] and [16]; by cross decomposition [20], as in [10]; by separable programming, as in [7]; by the forest iteration method, as in [18]; and by mean value cross decomposition [9], as well as by combinations of separable programming, Lagrangian relaxation with subgradient optimization and mean value cross decomposition, as in [9].

In computational tests in [9] separable program- 
ming combined with mean value cross decomposition, 
here denoted by SM, is found to be the quickest method, 
followed by a modified version of the Frank-Wolfe 
method, [16], here denoted by FW. 
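A plain Frank-Wolfe iteration is easy to write for (STP) because, with the demand constraints replaced by convex costs, the linearized subproblem separates by supply point: each supply point sends its whole capacity to the destination with the most negative reduced cost $c_{ij} + f_j'(y_j)$, or nothing if no reduced cost is negative. The sketch below uses an exact line search by ternary search and an assumed exponential-demand penalty; it is not the modified method of [16].

```python
import math

def exp_penalty_pair(alpha, beta, lam):
    """Assumed exponential-demand penalty f_j and its derivative f_j'."""
    f = lambda y: alpha * (y - 1.0 / lam) + (alpha + beta) / lam * math.exp(-lam * y)
    fp = lambda y: alpha - (alpha + beta) * math.exp(-lam * y)
    return f, fp

def stp_cost(x, c, f):
    """Objective of (STP) with y_j = sum_i x_ij."""
    m, n = len(x), len(x[0])
    y = [sum(x[i][j] for i in range(m)) for j in range(n)]
    return (sum(c[i][j] * x[i][j] for i in range(m) for j in range(n))
            + sum(f[j](y[j]) for j in range(n)))

def frank_wolfe_stp(c, S, f, fprime, iters=200):
    """Plain Frank-Wolfe with exact line search for (STP)."""
    m, n = len(c), len(c[0])
    x = [[0.0] * n for _ in range(m)]          # feasible start: nothing shipped

    def move(s, t):                            # point x + t (s - x) on the segment
        return [[x[i][j] + t * (s[i][j] - x[i][j]) for j in range(n)] for i in range(m)]

    for _ in range(iters):
        y = [sum(x[i][j] for i in range(m)) for j in range(n)]
        s = [[0.0] * n for _ in range(m)]      # extreme point of the linearized problem
        for i in range(m):
            reduced = [c[i][j] + fprime[j](y[j]) for j in range(n)]
            j = min(range(n), key=lambda t: reduced[t])
            if reduced[j] < 0.0:
                s[i][j] = S[i]
        lo, hi = 0.0, 1.0                      # ternary search: convex along the segment
        for _ in range(60):
            t1, t2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
            if stp_cost(move(s, t1), c, f) <= stp_cost(move(s, t2), c, f):
                hi = t2
            else:
                lo = t1
        x = move(s, 0.5 * (lo + hi))
    return x

f, fp = exp_penalty_pair(1.0, 5.0, 0.5)
x_single = frank_wolfe_stp([[1.0]], [10.0], [f], [fp])   # one supply point, one customer
```

With one supply point and one customer the optimum solves $c + f'(y) = 0$, here $y = 2\ln 3$. Because each iterate is a convex combination of feasible points, nonnegativity and the capacity constraints hold throughout.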

In [12] two branch and bound methods, BB-SM 
where the subproblem is solved by the SM method, and 
BB-FW, where the subproblem is solved by the FW 
method, are compared. BB-FW is found to be rather 
stable, while BB-SM is less so. A possible reason for this is the scaling and rounding required in the primal subproblem in SM, if the linear minimal cost network flow code used requires integer costs and capacities. If, after branching, the interval $[p_i, q_i]$ is small, the rounding may have large effects on the dual solution (used as input to subsequent subproblems).


Computational Results 


In [12] a number of randomly generated problems with up to 100 origins and 500 destinations have been solved. The probability density functions used are exponential, with rate $\lambda_j$ (so that $E[d_j] = 1/\lambda_j$), which yields

$$f_j(y_j) = \alpha_j\left(y_j - \frac{1}{\lambda_j}\right) + \frac{\alpha_j + \beta_j}{\lambda_j}\exp(-\lambda_j y_j).$$
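Setting $f_j'(y_j) = -\beta_j + (\alpha_j + \beta_j)F_j(y_j)$ to zero shows that, taken on its own, the penalty is smallest where $F_j(y_j) = \beta_j/(\alpha_j + \beta_j)$, the familiar newsvendor critical fractile; for exponential demand this gives $y_j = \ln((\alpha_j + \beta_j)/\alpha_j)/\lambda_j$. The sketch checks this against the closed-form penalty; parameter values are illustrative.

```python
import math

def exp_penalty_value(y, alpha, beta, lam):
    """Closed-form f_j(y) for exponential demand with rate lam (mean 1/lam)."""
    return alpha * (y - 1.0 / lam) + (alpha + beta) / lam * math.exp(-lam * y)

def critical_fractile_point(alpha, beta, lam):
    """Solves F_j(y) = beta / (alpha + beta) with F_j(y) = 1 - exp(-lam*y)."""
    return math.log((alpha + beta) / alpha) / lam

alpha, beta, lam = 1.0, 3.0, 0.5
y_star = critical_fractile_point(alpha, beta, lam)   # ln(4)/0.5, about 2.77
```

A quick consistency check: at $y_j = 0$ the penalty equals $\beta_j E[d_j]$, since all demand is short.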


The concave cost functions are chosen to be of the form 


biz if z; > 0, 


Zi) = 


Stochastic Transportation and Location Problems, Figure 1


The relation between the convex costs and the concave costs is determined by a weight $c_g$ on the concave part, which seems to affect the difficulty of the problem very much. For small values of $c_g$ the concave part of the cost is dominated by the convex part, and the problems are easy. (For $c_g = 0$, we get the (STP).) For large values of $c_g$, the concave part dominates, and these problems are also easy (solvable in fractions of a second). However, somewhere between these extremes, we find a sharp increase in difficulty. These effects are illustrated for two groups of problems in Fig. 1.

A general conclusion of the computational tests is that the branch and bound trees are fairly limited in size, in spite of the risk of unnecessary branching due to the asymptotic convergence of the methods used for the subproblem. This indicates that the bounding subproblems are strong enough.
The deterministic vehicle routing problem (VRP) is defined on a graph G = (V, E), where $V = \{v_1, \dots, v_n\}$ is a vertex set and $E = \{(v_i, v_j) : v_i, v_j \in V,\ i < j\}$ is an edge set. Vertex $v_1$ represents a depot at which m identical vehicles of capacity D are based. The other vertices are customers requiring a visit, which may consist of either collection or delivery of goods, or of providing some service, as in the repair industry. The VRP consists of designing a set of m least cost routes starting and ending at the depot, such that each customer is visited exactly once. In practice, additional constraints need to be satisfied. Two important examples are the following:

• capacity constraints: with each customer $v_i$ is associated a demand $d_i$. Then the total demand of a vehicle route may not exceed D.

• time constraints: with each customer $v_i$ is associated a service time $s_i$. Also, with each edge $(v_i, v_j)$ is associated a travel time $t_{ij}$. Then, the total duration of each route, including service and travel times, may not exceed a given bound B.
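For a single route, both constraints can be checked directly. The sketch assumes the depot is vertex 0 (the article's $v_1$), travel times stored in a symmetric matrix, and hypothetical data.

```python
def route_feasible(route, demand, service, travel, D, B):
    """Check the capacity and duration constraints for one vehicle route.
    route lists customer indices in visiting order; the depot is vertex 0."""
    load = sum(demand[v] for v in route)
    stops = [0] + list(route) + [0]                       # start and end at the depot
    duration = sum(travel[a][b] for a, b in zip(stops, stops[1:]))
    duration += sum(service[v] for v in route)
    return load <= D and duration <= B

travel = [[0, 2, 3], [2, 0, 4], [3, 4, 0]]   # hypothetical symmetric travel times
demand, service = [0, 5, 7], [0, 1, 1]
```

Here the route visiting customers 1 and 2 carries a load of 12 and takes 2 + 4 + 3 units of travel plus 2 units of service, so it is feasible exactly when D ≥ 12 and B ≥ 11.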

Several other constraints can be encountered in practical applications. For a recent survey and a bibliography on the deterministic VRP, see [5] and [10].

In many situations, some components of the prob- 
lem are random. Then, the problem becomes a stochas- 
tic vehicle routing problem (SVRP). At the moment 
(1999), three main cases have been considered: 

1) stochastic customers: customer $v_i$ is present with probability $p_i$ and absent with probability $1 - p_i$.

2) stochastic demands: the demand $d_i$ of customer $v_i$ is a random variable.

3) stochastic times: the service time $s_i$ of customer $v_i$ and the travel time $t_{ij}$ of edge $(v_i, v_j)$ are random variables.

This randomness may apply to only some customers 
or edges. When some data are random, it is no longer 
possible to require that all constraints be satisfied for 
all realizations of the random variables. As is classical 
in stochastic programming (see also [3]), two main ap- 
proaches are considered. 

In chance constraint programming (CCP), the deci- 
sion maker requires that the constraints must be sat- 
isfied with a given probability, typically 90% or 95%. 
This line of research for the SVRP was initiated in [18]. 
While in general, the approach in CCP is to obtain 
a (usually nonlinear) deterministic equivalent of the 
probabilistic constraint, it turns out that, for the SVRP, 
linear constraints can be found to eliminate routes that 
violate the probabilistic constraints. Those constraints 
are stronger for the SVRP with stochastic demands (as 
they apply to sets of customers) and, in general, weaker 
in the case of stochastic travel times (as they only apply to the routes traveled). These constraints are usually embedded within a branch and cut (cf. also ▶ Integer programming: Branch and cut algorithms) procedure, in exactly the same way as this is done for the subtour elimination constraints in the deterministic VRP [13].
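As an illustration of a deterministic equivalent, assume (hypothetically) that customer demands are independent and normally distributed. The total demand on a route is then normal, and the requirement P(total demand ≤ D) ≥ 0.95 reduces to a single inequality on the route's customer set. The sketch only checks a given route; it does not reproduce the cut generation described above, and the data are invented.

```python
from statistics import NormalDist

def route_meets_chance_constraint(route, mu, sigma, D, prob=0.95):
    """Independent normal demands: the route's total demand is
    N(sum mu, sum sigma^2), so P(total <= D) >= prob holds iff
    sum mu + z_prob * sqrt(sum sigma^2) <= D."""
    mean = sum(mu[v] for v in route)
    std = sum(sigma[v] ** 2 for v in route) ** 0.5
    return mean + NormalDist().inv_cdf(prob) * std <= D

mu, sigma = [0.0, 40.0, 30.0], [0.0, 6.0, 8.0]   # hypothetical demand data
```

With these numbers the route {1, 2} has mean demand 70 and standard deviation 10, so it needs a capacity of about 86.4 to meet the 95% requirement.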

In a stochastic program with recourse (SPR), the set of decisions is divided into two groups: decisions made before the realizations of the random variables are known are called first-stage decisions, and those made after the realizations are known are called second-stage (or recourse, or corrective) actions. In the SVRP, the
first-stage decision typically consists of planning the 
various routes. In the second stage, the routes are fol- 
lowed as planned, with simple rules for possible cor- 
rective actions. When customers are present with some 
probability, the recourse action consists of skipping absent customers (this problem is then known as the probabilistic traveling salesman problem, or PTSP). For the
case where demands are random, while following the 
planned route to make the collections, the vehicle may 
be unable to load some customer’s demand as the vehi- 
cle becomes full. The recourse action may then consist 
of a return trip to the depot to unload, such that the 
vehicle may be able to resume its trip. Similarly in the 
SVRP with stochastic travel times, the recourse action 
may simply consist of paying some charge (or penalty) 
when the effective travel time exceeds B. Such situations 
do not occur in a deterministic setting where demands 
(or travel times) are supposed to take precisely the value 
forecasted for planning the route. The aim of an SPR 
is to find a solution of least expected total length (for 
the PTSP) or least expected total cost (for SVRP with 
stochastic demands or times). The framework of plan- 
ning the routes in the first-stage and having a simplified 
recourse policy in the second-stage is known as a priori 
optimization. 
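For the PTSP with the skipping rule, the expected length of an a priori tour has a closed form: the edge between two tour positions is traversed exactly when both endpoints are present and every customer between them (in tour order) is absent. The sketch below computes this sum and cross-checks it by enumerating all presence scenarios; the depot is assumed always present, and the coordinates and probabilities are hypothetical.

```python
import math
from itertools import product

def expected_tour_length(tour, p, dist):
    """Expected length of a priori tour `tour` (tour[0] = depot, p = 1) when
    absent customers are skipped."""
    n, total = len(tour), 0.0
    for s in range(n):
        skip_prob = 1.0      # probability that every vertex strictly between is absent
        for step in range(1, n):
            a, b = tour[s], tour[(s + step) % n]
            total += dist[a][b] * p[a] * p[b] * skip_prob
            skip_prob *= 1.0 - p[b]
    return total

def expected_tour_length_enum(tour, p, dist):
    """Brute force over all 2^(n-1) presence patterns of the customers."""
    total = 0.0
    for present in product([False, True], repeat=len(tour) - 1):
        prob, route = 1.0, [tour[0]]
        for v, here in zip(tour[1:], present):
            prob *= p[v] if here else 1.0 - p[v]
            if here:
                route.append(v)
        total += prob * sum(dist[a][b] for a, b in zip(route, route[1:] + route[:1]))
    return total

pts = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 2)]          # vertex 0 is the depot
dist = [[math.dist(u, v) for v in pts] for u in pts]
p = [1.0, 0.9, 0.5, 0.7, 0.3]
tour = [0, 1, 2, 3, 4]
```

The closed form needs O(n²) work, whereas enumeration grows as 2^(n-1); when every presence probability is 1, it reduces to the ordinary tour length.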

Solution methodologies include the asymptotic 
analysis of a priori optimization [2,9], heuristics such 
as the modified savings algorithm [4], metaheuristics 
such as the tabu search [7] and exact algorithms [12]. 
Based on the integer L-shaped method [11], exact algo- 
rithms assume the capability of computing the expected 
recourse (or penalty) function and the availability of 
a lower bound on this function. Efficiency is greatly 
improved when lower bounding functionals can be de- 
rived that also apply at fractional solutions. Such is the 
case for the PTSP [14] and for the SVRP with stochastic 
demands [8]. 
More elaborate versions of SPR can easily be modeled. They typically include more diversified recourse policies or even multistage recourse policies. They are in general much more difficult to solve. Examples of successful solutions are available in dynamic vehicle allocation [16], dynamic routing [17] and re-optimization strategies [1]. A survey is available in [6].
Structural optimization is a discipline dealing with optimal design of load-carrying structures. Some examples
of such problems are to find an optimal shape of a sus- 
pension arm in a car, to find an optimal material distri- 
bution in the wall of a centrifugal separator, to find op- 
timal thicknesses of composite material layers in wing 
panels of an aircraft, or to find optimal cross sectional 
dimensions of the different beams in a new Eiffel tower! 

Structural optimization problems are often modeled as nonlinear programming problems of the form

$$
\begin{aligned}
\text{minimize } & f_0(x)\\
\text{subject to } & f_i(x) \le b_i, \quad i = 1,\dots,m,\\
& x \in X.
\end{aligned}
\tag{1}
$$

Here, $x = (x_1, \dots, x_n)^T$ is the vector of design variables, like cross sectional dimensions of bars and beams, thicknesses of membranes and plates, variables describing the shape of the structure, etc. The objective function $f_0(x)$ is typically the structural weight, while the inequalities $f_i(x) \le b_i$ model constraints on displacements, stresses, moments of inertia, eigenfrequencies, etc. Finally, X is a given rectangular subset of $\mathbb{R}^n$, defining simple lower and upper bounds on the variables.

The functions $f_i(x)$ are not explicit. Instead, they are typically of the form

$$f_i(x) = h_i(x, u(x)), \tag{2}$$

where the $h_i(x, u)$ are explicit functions, while the "state vector" u depends implicitly on the design variable vector x through some system of state equations. Often, u is the nodal displacement vector in a finite element model and the state equations are of the form

$$K u = p, \tag{3}$$


where K = K(x) is the stiffness matrix of the structure 
(in the finite element model), while p = p(x) is a vector 
describing the loads on the structure. This means that 
each time the constraint functions should be evaluated 
at some point x, the state equations must be generated 
and solved, typically by some finite element package. 
The function evaluations could therefore become very 
time consuming. 

An encouraging fact, however, is that for many problems of this type it is possible to calculate gradients of the constraint functions in an efficient way. Since the possibility of calculating gradients is a key point for solving structural optimization problems, we now describe in some detail a so-called adjoint method for such calculations, assuming that the considered problem is of the above form.

First, the chain rule gives

$$\frac{\partial f_i}{\partial x_j} = \frac{\partial h_i}{\partial x_j} + q_i \frac{\partial u}{\partial x_j}, \tag{4}$$

where the components of the row vector $q_i$ are the partial derivatives of $h_i$ with respect to the components of u, calculated at the current point (x, u).

Next, with $q_i$ from above, let the vectors $v_i$ be obtained from the systems

$$K v_i = q_i^T, \quad i = 1, \dots, m. \tag{5}$$


When the system (3) was recently solved for obtaining the displacement vector u corresponding to the current x, the matrix K was generated and factorized, typically by a banded Cholesky method. This factorization should naturally be used again when solving (5).
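By differentiating both sides of (3), one obtains

$$K \frac{\partial u}{\partial x_j} = \frac{\partial p}{\partial x_j} - \frac{\partial K}{\partial x_j} u, \tag{6}$$

from which it follows, after multiplying by $v_i^T$ and using that K is symmetric, so that $v_i^T K = q_i$, that

$$q_i \frac{\partial u}{\partial x_j} = v_i^T \frac{\partial p}{\partial x_j} - v_i^T \frac{\partial K}{\partial x_j} u. \tag{7}$$

Together with (4), this implies that

$$\frac{\partial f_i}{\partial x_j} = \frac{\partial h_i}{\partial x_j} + v_i^T \frac{\partial p}{\partial x_j} - v_i^T \frac{\partial K}{\partial x_j} u, \tag{8}$$

where all the terms on the right hand side are fairly straightforward to calculate.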
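The adjoint formula (8) can be tried on a hypothetical chain of three springs in series, fixed at one end with a unit load at the free end. Its stiffness matrix is $K(x) = \sum_j x_j K_j$, the load p does not depend on x, and the tip displacement is $f(x) = \sum_j 1/x_j$. With $h(x, u) = q\,u$ and $\partial p/\partial x_j = 0$, (8) reduces to $\partial f/\partial x_j = -v^T K_j u$. The sketch uses NumPy's dense solver for brevity; a finite element code would instead reuse one (banded) Cholesky factorization for both solves, as described above.

```python
import numpy as np

# Element matrices K_j of three springs in series (node 0 is fixed and eliminated).
K_parts = [
    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
    np.array([[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 0.0], [0.0, 1.0, -1.0], [0.0, -1.0, 1.0]]),
]
p = np.array([0.0, 0.0, 1.0])   # unit load at the free end
q = np.array([0.0, 0.0, 1.0])   # h(x, u) = q u: displacement of the free end

def value_and_adjoint_gradient(x):
    K = sum(xj * Kj for xj, Kj in zip(x, K_parts))    # K(x), linear in x
    u = np.linalg.solve(K, p)                         # state equations (3)
    v = np.linalg.solve(K, q)                         # adjoint system (5)
    grad = np.array([-v @ Kj @ u for Kj in K_parts])  # eq. (8) with dh/dx = dp/dx = 0
    return q @ u, grad

f, grad = value_and_adjoint_gradient(np.array([1.0, 2.0, 4.0]))
```

For x = (1, 2, 4) the tip displacement is 1.75, and the adjoint gradient should match the analytic values $-1/x_j^2$ as well as central finite differences.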

In most applications, the considered structure should be able to carry several different loads. This means that instead of a single load vector p in (3), there are several given load vectors $p_1, \dots, p_L$, and corresponding displacement vectors $u_1, \dots, u_L$. After obvious modifications of the notation, the above description of gradient calculations remains valid.

The major computational work when calculating 
function values and gradients thus consists in solving 
the systems (3) and (5) for a possibly large number of 
given right hand side vectors. However, the stiffness 
matrix K(x) only has to be factorized once for the cur- 
rent x. 

Because function evaluations are very expensive, and because gradients can be calculated almost at the same time as function values, the following iterative approximation approach has become well established for solving a large class of structural optimization problems of the above form.


Step 0. Choose a starting point $x^{(1)} \in X$ and set the iteration index k = 1.
Step 1. Given $x^{(k)}$, calculate $f_i(x^{(k)})$ and the gradients $\nabla f_i(x^{(k)})$ for i = 0, 1, ..., m.
Step 2. Generate an approximating subproblem of the form

$$
\begin{aligned}
\text{minimize } & g_0^{(k)}(x)\\
\text{subject to } & g_i^{(k)}(x) \le b_i, \quad i = 1,\dots,m,\\
& x \in X^{(k)},
\end{aligned}
\tag{9}
$$

where the $g_i^{(k)}(x)$ are explicit functions which approximate the implicit functions $f_i(x)$, while $X^{(k)}$ is a rectangular subset of X containing $x^{(k)}$.
Step 3. Solve this explicit subproblem with some suitable method and let the optimal solution be the next iteration point $x^{(k+1)}$. Then set k = k + 1 and go to Step 1 again.


The process is terminated when some reasonable 
convergence criteria have been fulfilled, or (in practice) 
when the marginal improvements from the latest itera- 
tions have become so small that the user does not find it 
worthwhile to continue. As mentioned above, each sin- 
gle iteration may take a considerable time. 

A crucial step in this approach is to make a clever choice of approximating functions $g_i^{(k)}(x)$. The main information available for doing that consists of the calculated function values and gradients, at the current iteration point $x^{(k)}$ as well as at previous points. In addition, some relevant properties of the considered problem may be known. As an example, it is known that the normal stress in a truss element decreases approximately as $1/x_j$ if the cross section area $x_j$ of the element increases, and a related type of behavior holds also for the nodal displacements. This kind of general information could be most valuable when the approximating functions are chosen. Finally, it is important that the subproblem (9) does not become too hard to solve numerically. For this reason, and to avoid nonglobal local optima of the subproblem, it is preferable that the chosen approximating functions $g_i^{(k)}(x)$ are convex. It should be noted, though, that the original functions $f_i(x)$ may very well be nonconvex. This implies that the optimality conditions used for terminating the process do not, in general, guarantee a global optimum of the original problem.

The approach above, Steps 0-3, will now be exemplified by a specific method, further discussed in e.g. [2], which is well suited for problems of the following type: the design variables $x_j$ are assumed to be transverse sizes of structural elements, such as cross section areas of truss elements or thicknesses of membrane elements. This makes the stiffness matrix K(x) linear in x. It is further assumed that there are strictly positive lower bounds defined for these variables, which implies that K(x) is always positive definite. The objective function is assumed to be the structural weight, which is linear in x. Finally, the constraint functions are assumed to model given limitations on stresses and displacements at different given points in the structure, for different given load vectors.

For this type of problems, the constraint functions $f_i$ satisfy the relation $f_i(\alpha x) = (1/\alpha) f_i(x)$ for every vector x with strictly positive components and every scalar α > 0. This makes it reasonable to approximate each constraint function by a linearization in the inverse design variables $1/x_j$. The approximating functions $g_i^{(k)}(x)$ are thus chosen as follows, for i = 1, ..., m:

$$g_i^{(k)}(x) = f_i(x^{(k)}) + \sum_{j=1}^{n} a_{ij}\left(\frac{1}{x_j} - \frac{1}{x_j^{(k)}}\right), \tag{10}$$

where

$$a_{ij} = -\left(x_j^{(k)}\right)^2 \frac{\partial f_i}{\partial x_j}, \tag{11}$$

calculated at $x = x^{(k)}$.
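A short sketch of (10)-(11): the coefficients $a_{ij}$ are the gradient components rescaled by $-(x_j^{(k)})^2$. For a function of the form $\sum_j c_j/x_j$ (such as the tip displacement of springs in series) the approximation is exact, and for any function with $f_i(\alpha x) = f_i(x)/\alpha$ Euler's theorem gives $f_i(x^{(k)}) = \sum_j a_{ij}/x_j^{(k)}$. The test functions are hypothetical.

```python
def reciprocal_coefficients(grad, xk):
    """Eq. (11): a_ij = -(x_j^(k))^2 * df_i/dx_j evaluated at x^(k)."""
    return [-(xj ** 2) * gj for xj, gj in zip(xk, grad)]

def reciprocal_approx(x, fk, a, xk):
    """Eq. (10): f_i(x^(k)) + sum_j a_ij * (1/x_j - 1/x_j^(k))."""
    return fk + sum(aj * (1.0 / xj - 1.0 / xkj) for aj, xj, xkj in zip(a, x, xk))

# Hypothetical f(x) = 1/x1 + 1/x2 + 1/x3 (three springs in series), expanded at xk.
f = lambda x: sum(1.0 / xj for xj in x)
grad_f = lambda x: [-1.0 / xj ** 2 for xj in x]
xk = [1.0, 2.0, 4.0]
a = reciprocal_coefficients(grad_f(xk), xk)        # equals [1.0, 1.0, 1.0]
```

The Euler identity also holds for non-separable functions of this kind, e.g. $1/(x_1 + x_2)$, where the approximation itself is no longer exact away from $x^{(k)}$.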
There is no need to approximate the objective function, since it is already linear in x. Thus, with positive constants $c_j$,

$$g_0^{(k)}(x) = f_0(x) = \sum_{j=1}^{n} c_j x_j. \tag{12}$$

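Therefore, the subproblem (9) becomes as follows, where a temporary change of variables to $y_j = 1/x_j$ has been made (by Euler's theorem for homogeneous functions, $f_i(x^{(k)}) = \sum_j a_{ij}/x_j^{(k)}$, so the constant terms in (10) cancel):

$$
\begin{aligned}
\text{minimize } & \sum_{j=1}^{n} \frac{c_j}{y_j}\\
\text{subject to } & \sum_{j=1}^{n} a_{ij} y_j \le b_i, \quad i = 1,\dots,m,\\
& y \in Y^{(k)},
\end{aligned}
\tag{13}
$$

where $Y^{(k)}$ is the box corresponding to $X^{(k)}$ in the new variables.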


This is a tractable problem with a separable and strictly convex objective function, linear inequality constraints and simple bounds on the variables. One of several possible ways of solving this subproblem is to form the corresponding dual problem, which is of the form

$$
\begin{aligned}
\text{maximize } & \varphi(\lambda)\\
\text{subject to } & \lambda_i \ge 0, \quad i = 1,\dots,m,
\end{aligned}
\tag{14}
$$

where φ(λ) is a concave, continuously differentiable, explicit function. By solving this dual problem by some suitable method, like a modified Newton method, one also obtains the optimal solution of the primal subproblem (13).
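With a single constraint (m = 1) the dual approach fits in a few lines. For fixed λ the Lagrangian separates: each $y_j$ minimizes $c_j/y_j + \lambda a_j y_j$ over its interval in closed form, and the dual derivative is the constraint slack $\sum_j a_j y_j(\lambda) - b$. The sketch finds the dual maximizer by bisection on this derivative, a simpler substitute for the modified Newton method mentioned above. It assumes $c_j > 0$, strictly positive lower bounds, a feasible subproblem, and invented data.

```python
import math

def lagrangian_minimizer(lam, c, a, lo, hi):
    """For fixed lam >= 0, minimise c_j/y_j + lam*a_j*y_j over [lo_j, hi_j] for each j."""
    y = []
    for cj, aj, l, h in zip(c, a, lo, hi):
        s = lam * aj
        # upper bound if the term is decreasing in y, else its stationary point
        yj = h if s <= 0.0 else math.sqrt(cj / s)
        y.append(min(max(yj, l), h))
    return y

def solve_single_constraint_dual(c, a, b, lo, hi, iters=100):
    """Maximise the dual of: min sum c_j/y_j s.t. sum a_j y_j <= b, lo <= y <= hi."""
    slack = lambda lam: sum(aj * yj for aj, yj in zip(a, lagrangian_minimizer(lam, c, a, lo, hi))) - b
    if slack(0.0) <= 0.0:
        return 0.0, lagrangian_minimizer(0.0, c, a, lo, hi)   # constraint inactive
    lam_lo, lam_hi = 0.0, 1.0
    while slack(lam_hi) > 0.0:          # bracket the root of the dual derivative
        lam_lo, lam_hi = lam_hi, 2.0 * lam_hi
        if lam_hi > 1e12:
            raise ValueError("subproblem appears to be infeasible")
    for _ in range(iters):              # bisection on the nonincreasing slack
        mid = 0.5 * (lam_lo + lam_hi)
        if slack(mid) > 0.0:
            lam_lo = mid
        else:
            lam_hi = mid
    return lam_hi, lagrangian_minimizer(lam_hi, c, a, lo, hi)

lam, y = solve_single_constraint_dual(c=[4.0, 1.0], a=[1.0, 1.0], b=2.0, lo=[0.1, 0.1], hi=[10.0, 10.0])
```

In this example the stationarity conditions give $y_j = \sqrt{c_j/\lambda}$ with $y_1 + y_2 = 2$, hence λ = 2.25 and y = (4/3, 2/3); the optimal design variables are then $x_j = 1/y_j$.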

If somewhat more elaborate approximating functions $g_i^{(k)}(x)$ are used, the above approach of solving a sequence of separable convex subproblems can in fact be made globally convergent, so that the sequence of generated iteration points $x^{(k)}$ always converges towards the set of KKT points of the original problem (1), see [3].

An expanding subfield of structural optimization is
so-called topology optimization, where the holes in the
structure are of central interest. In addition to the
optimal outer shape of the structure, one now also
searches for the optimal number, location and shape of
these holes. The corresponding optimization models
typically involve a large number of binary variables,
which indicate presence ($x_j = 1$) or absence ($x_j = 0$)
of material at various points of the considered
structure. For an excellent survey on topology
optimization, see [1].


References 


1. Bendsøe MP, Sigmund O (2003) Topology optimization: theory,
methods and applications. Springer, Berlin

2. Fleury C (1979) A unified approach to structural weight min- 
imization. Comput Method Appl Mech Eng 20:17-38 

3. Svanberg K (2002) A class of globally convergent optimiza- 
tion methods based on conservative convex separable ap- 
proximations. SIAM J Optim 12:555-573 


—————— 
Structural Optimization: History 


RAPHAEL T. HAFTKA¹,

JAROSLAW SOBIESZCZANSKI-SOBIESKI²

¹ University Florida, Gainesville, USA

² NASA Langley Research Center, Hampton, USA


MSC2000: 90C90, 90C26 


Article Outline 


Keywords 
See also 
References 




Keywords 


Structural optimization; Structural shape optimization; 
Structural topology optimization; Sequential 
approximate optimization; Sensitivity derivatives 


The term structural optimization is commonly used
for the optimization of engineering structures, such as
building, automobile, or airplane structures, for
improved strength or stiffness properties and reduced
weight or cost. Before computer-based optimization
became widely used, structural components, such as
beams and plates, were optimized by using the calculus
of variations [5].

The computerized analysis of structures, via models
that discretize the structure into a large number of
pieces, known as finite elements, became prevalent
in the 1960s. Numerical optimization based on finite
element models was started in the early 1960s by L. Schmit
and his students [6]. The early years were characterized
by applications to civil engineering truss structures,
with the design variables being the cross-sectional
areas of the elements. Later these variables were
generalized to cross-sectional dimensions of beams and
thicknesses of plates. This class of design variables, the
so-called sizing variables, has the distinction that the
optimization can be carried out with only superficial
changes in the finite element model.

More recently, structural optimization research has 
focused on changing the shape (geometry) and topology 
of the structural configuration. Geometrical changes re- 
quire redefinition of the finite element mesh. Topologi- 
cal changes, which consist of adding or removing parts 
as well as creating holes, pose even more difficult chal- 
lenges in converting the structural design into a man- 
ageable optimization problem [1,2]. 

A major driving force in the development of structural
optimization methodology has been the need to
accommodate a very large number of design variables
(hundreds or thousands), while a single structural
analysis (evaluation of objective function and constraints)
requires the solution of thousands to hundreds of
thousands of algebraic equations derived from the finite
element method. This computational challenge has been
addressed by several devices, many of them unique to
structural optimization.

• For optimization problems subject only to stress
limit constraints, an intuitive optimality criterion
has been employed, which stipulates that each part of
the structure is stressed to its limit, imposed by
material properties or by buckling, under at least one
loading condition. This optimality condition is
accompanied by techniques that remove material from
regions that are understressed and add material to
regions that are overstressed, without the need to
calculate derivatives. This approach is often termed
fully stressed design.

• In many problems the number of active constraints,
aside from lower limits on the design variables, can
be made much smaller than the number of design
variables. Various dual optimization formulations
then become more effective than direct formulations
[1].

• The most popular approach for solving structural
optimization problems of high dimension is sequential
approximate optimization, a generalization of
sequential linear programming. In this approach
the objective function and constraints are replaced
by approximations using first derivatives, which are
linear in either the design variables or their
reciprocals. Convex approximations are particularly
popular [3].

• Efficient calculation of derivatives of structural
response with respect to design variables is a major
field of active research. Methods that differentiate
the continuum equations and then discretize
compete with methods that differentiate the
discretized finite element equations. Adjoint derivative
methods are usually superior when the number of
differentiated response quantities is less than the
number of design variables.

• For topology optimization problems, where the
number of design variables is extremely large, global
compliance is often used as a single measure of
structural performance. This allows the development
of efficient specialized algorithms, or at least
extremely cheap calculation of derivatives.

• In the structural design problem, the overall design
is usually carried out first, using coarse analysis
and optimization models to determine the overall
material distribution, and possibly shape and
topology. This is followed by detailed design of different
parts of the structure, for example, individual spars
and panels in aircraft structures. In principle, there
ought to be feedback from the second stage to the
first, but the iterations implied by such feedback
are often impractical for cost and time reasons.
Unfortunately, there is still no completely satisfactory
method for carrying out such two-stage design in a
rigorous and computationally efficient way.
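The comparison of adjoint and direct derivative methods can be made concrete on a hypothetical chain of axial springs with stiffness $K(x) = \sum_j x_j K_j$, used here as a stand-in for a finite element model. For a response $\psi = c^T u$ with $K u = f$, the direct method solves $K\, \partial u/\partial x_j = -K_j u$ once per design variable. The adjoint method instead solves $K \lambda = c$ once and sets $\partial \psi / \partial x_j = -\lambda^T K_j u$. The compliance $f^T u$ is self-adjoint ($\lambda = u$), so its derivatives need no extra solve at all. The function names and data below are illustrative assumptions.

```python
import numpy as np

def spring_derivative_matrix(j, n):
    """dK/dx_j for spring j in a chain of n springs; node 0 is clamped, DOF i is node i + 1."""
    Kj = np.zeros((n, n))
    dofs = (j - 1, j)                      # spring j joins nodes j and j + 1
    local = np.array([[1.0, -1.0], [-1.0, 1.0]])
    for p_loc, p in enumerate(dofs):
        for q_loc, q in enumerate(dofs):
            if p >= 0 and q >= 0:          # skip the clamped node
                Kj[p, q] += local[p_loc, q_loc]
    return Kj

def response_sensitivities(x, f, c):
    """Derivatives of psi = c^T u and of the compliance f^T u, where K(x) u = f."""
    n = len(x)
    Ks = [spring_derivative_matrix(j, n) for j in range(n)]
    K = sum(xj * Kj for xj, Kj in zip(x, Ks))
    u = np.linalg.solve(K, f)
    # Adjoint method: one extra solve, independent of the number of design variables.
    lam = np.linalg.solve(K, c)            # K is symmetric, so K^T lam = c is K lam = c
    adjoint = np.array([-lam @ Kj @ u for Kj in Ks])
    # Direct method: one solve per design variable.
    direct = np.array([c @ np.linalg.solve(K, -Kj @ u) for Kj in Ks])
    # Compliance is self-adjoint (lam = u): no extra solve needed.
    compliance = np.array([-u @ Kj @ u for Kj in Ks])
    return u, adjoint, direct, compliance

x = np.array([2.0, 1.0, 3.0])              # hypothetical spring stiffnesses (design variables)
f = np.array([0.0, 0.0, 1.0])              # unit load at the free end
c = np.array([0.0, 1.0, 0.0])              # psi = displacement of the middle free node
u, adjoint, direct, compliance = response_sensitivities(x, f, c)
```

For this series chain the results can be checked by hand. The middle displacement is $1/x_1 + 1/x_2$ (springs numbered from 1 at the clamp), and the compliance is $\sum_j 1/x_j$.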

While most structural optimization problems are
formulated as continuous problems, there is also
substantial interest in discrete design variables, and these
fall into two categories: those that appear as continuous in
the analysis but are available for actual implementation
only in limited sets, and those that appear as discrete in the
analysis. An example of the first category arises in civil
engineering applications: beam cross-sectional shapes and
dimensions are readily available only in standardized sets,
and using other shapes increases the cost substantially.
Examples of the second category are choices of material and
topology. The increasing usage of fiber reinforced laminated
composite materials also introduces variables of both
categories, creating discrete and combinatorial problems.
This is due to the fact that thicknesses have to be integer
multiples of the basic ply thickness, and fiber angles are
usually limited to a small discrete set by the availability
of test data. Genetic algorithms have been popular for such
applications [4].
Frequently, optimal control of engineering processes is
difficult to achieve, or the resulting structure of the
optimal control policy may not be in an appropriate form
for application. This leads us to make certain
approximations to the formulation of the problem, to simplify
the model describing the process, or to impose some
structure on the nature of the control policy. Therefore,
instead of solving the original optimal control problem,
the optimal control policy is established for a closely
related problem. The solution to the related problem is
said to be a suboptimal control policy for the original
problem.


Suboptimal Control 




Time Suboptimal Control 


To illustrate suboptimal control in one important
area, let us consider the time optimal control problem,
where the system is described by the differential
equation

$$\frac{dx}{dt} = f(x, u), \quad \text{with } x(0) \text{ given}, \qquad (1)$$

where x is an n-dimensional state vector and u is an
r-dimensional control vector bounded by

$$\alpha_j \le u_j \le \beta_j, \quad j = 1, 2, \dots, r. \qquad (2)$$


The time optimal control problem is to determine
the control u in the time interval $0 \le t < t_f$, so that the
origin $x(t_f) = 0$ is reached in minimum time $t_f$. The
origin is considered reached when

$$|x_i(t_f)| \le \varepsilon_i, \quad i = 1, \dots, n, \qquad (3)$$

where $\varepsilon_i$ is some specified tolerance, such as the accuracy of
measurement of the state variables.

For the case of scalar control (r = 1) and n = 2,
this time optimal control problem is easily solved by
establishing the switching curves in the phase plane,
then following the particular trajectory from the
given initial condition to the switching curve, and then
following the switching curve to the origin. This is
illustrated in [8, pp. 146-150].
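A standard illustration of the n = 2, r = 1 case is the double integrator $dx_1/dt = x_2$, $dx_2/dt = u$, $|u| \le 1$. It is assumed here for concreteness and is not necessarily the example treated in [8]. Its switching curve is $x_1 = -x_2|x_2|/2$. The sketch below implements the switching-curve feedback, the closed-form minimum time, and an Euler simulation that stops inside a tolerance box in the spirit of (3). All function names are hypothetical.

```python
import math

def switching_function(x1, x2):
    """s = x1 + x2|x2|/2; s = 0 is the switching curve of x1'' = u, |u| <= 1."""
    return x1 + 0.5 * x2 * abs(x2)

def bang_bang_control(x1, x2):
    """u = -1 above the switching curve, u = +1 below it."""
    s = switching_function(x1, x2)
    if s > 0 or (s == 0 and x2 > 0):
        return -1.0
    return 1.0

def minimum_time(x1, x2):
    """Closed-form minimum time to reach the origin (one switch at most)."""
    if switching_function(x1, x2) >= 0:
        return x2 + 2.0 * math.sqrt(x1 + 0.5 * x2 * x2)
    return -x2 + 2.0 * math.sqrt(-x1 + 0.5 * x2 * x2)

def simulate(x1, x2, eps=1e-3, dt=1e-4, t_max=20.0):
    """Euler simulation under the feedback until |x_i| <= eps for both states, cf. (3)."""
    t = 0.0
    while t < t_max:
        if abs(x1) <= eps and abs(x2) <= eps:
            return t
        u = bang_bang_control(x1, x2)
        x1, x2 = x1 + dt * x2, x2 + dt * u
        t += dt
    raise RuntimeError("tolerance region not reached")

t_exact = minimum_time(1.0, 0.0)   # start at rest at x1 = 1: u = -1 for 1 time unit, then u = +1
t_sim = simulate(1.0, 0.0)
```

The simulated time is slightly below the exact value because the trajectory is stopped on entering the tolerance box rather than at the mathematical origin. The same effect, on a much larger scale, is discussed later in this article.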

It is of interest to solve the time optimal control problem
for higher-dimensional systems. For the general
case the problem is very difficult, even with iterative
dynamic programming (IDP) [3,4]. The special case of
time optimal control of a linear system with n = 6 and
r = 2 was solved in [9] by using linear programming
on the discretized form of the system to seek the
minimum number of time steps that provides a feasible
solution. The computational aspects and results of using
linear programming on a 6-plate gas absorber are given in
[8, pp. 212-223] and [1]. The optimal control policy
involves switching the two control variables from bound
to bound several times. Therefore, we may be willing to
allow the final time to increase somewhat if we get
a 'more stable' control policy, one that would
not be very sensitive to modeling errors.

In order to stabilize the control policy for the
discrete-time version of the problem and still drive the
system to the origin, R. Koepcke and L. Lapidus [7]
suggested the construction of a positive definite
quadratic function of the state

$$V(k) = x^T(k)\, Q\, x(k) \qquad (4)$$

and choosing the control policy to minimize the forward
difference

$$\Delta V(k) = V(k+1) - V(k). \qquad (5)$$


If any of the calculated control variables are beyond the
bounds specified by (2), then the boundary values are
used. Such a clipping technique is widely used in optimal
control to handle control constraints. If $\Delta V(k)$ is
negative, then we have the added benefit of asymptotic
stability, and V(k) is called a Lyapunov
function.
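A minimal sketch of the Koepcke-Lapidus idea for a discrete linear system $x(k+1) = \Phi x(k) + \Gamma u(k)$. Since $V(k)$ is fixed at step k, minimizing $\Delta V(k)$ is the same as minimizing $V(k+1)$. That is a quadratic in $u(k)$ with the closed-form minimizer used below, followed by clipping to the bounds. The two-state $\Phi$, $\Gamma$, Q and the bounds are hypothetical; the gas absorber data are not reproduced here. For a scalar control, clipping gives the exact constrained minimizer of $V(k+1)$; for vector controls it is only a heuristic.

```python
import numpy as np

def lyapunov_control(x, Phi, Gamma, Q, u_min, u_max):
    """Choose u(k) minimizing V(k+1) = x(k+1)^T Q x(k+1), then clip it to the bounds."""
    M = Gamma.T @ Q @ Gamma                     # assumed nonsingular
    u = np.linalg.solve(M, -Gamma.T @ Q @ Phi @ x)
    return np.clip(u, u_min, u_max)

def run(x0, Phi, Gamma, Q, u_min, u_max, steps):
    """Simulate the closed loop and record V(k) = x(k)^T Q x(k)."""
    x = np.asarray(x0, dtype=float)
    V = [float(x @ Q @ x)]                      # V(0)
    for _ in range(steps):
        u = lyapunov_control(x, Phi, Gamma, Q, u_min, u_max)
        x = Phi @ x + Gamma @ u
        V.append(float(x @ Q @ x))
    return x, V

Phi = np.array([[0.9, 0.2], [0.0, 0.8]])         # hypothetical discrete-time model
Gamma = np.array([[0.0], [1.0]])
Q = np.eye(2)
x_final, V = run([5.0, -3.0], Phi, Gamma, Q, u_min=-1.0, u_max=1.0, steps=50)
```

In this hypothetical example $\Phi$ is itself contracting in the Q = I norm, so u = 0 is always admissible and already decreases V. The clipped minimizer can only do better, and $\Delta V(k) < 0$ at every step. With an unstable $\Phi$ no such guarantee follows from clipping alone.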

There is a certain amount of freedom in choosing
an appropriate positive definite matrix Q. Although Q
may be chosen to be the identity matrix, such a choice is
not the best. For a six-plate gas absorber model, Lapidus
and R. Luus [8, pp. 363-369] suggested the use of the
diagonal matrix

$$Q = \operatorname{diag}[1,\ 7.39,\ 230,\ 230,\ 7.39,\ 1] \qquad (6)$$

rather than an identity matrix. The elements in Q were
chosen to put more weight on the inner stages to
counterbalance the logarithmic damping produced by each
stage. Thus, $7.39 \approx e^2$ and $230 \approx (e^2)^e$, where e is the
base of the natural logarithm. The criterion that $|x_i(t_f)|
< 0.001$, i = 1, ..., 6, was satisfied with $t_f = 9.0$
minutes, which is reasonably close to the value $t_f = 6.0$
minutes obtained by linear programming for the original
problem [1,9]. To improve the result obtained by the
suboptimal procedure, different values for Q were
examined [2,5]. Instead of using Rosenbrock's hillclimbing
procedure [24], Luus [12] found that the use of direct
search [16] gave surprisingly good results. The results
were surprising because the value $t_f = 4.8$ minutes that was
obtained was better than the previously accepted value of
6.0 minutes. This apparent paradox, where the suboptimal
control yielded better results than the optimal control, was
resolved in [23], where the high sensitivity of $t_f$ to the
final state specification was shown. With optimal control
we were driving the system to the mathematical origin,
rather than to the practical origin where all variables had
to be less than 0.001 in value. When this relaxed condition
was incorporated into the linear programming algorithm,
a minimum time of $t_f = 4.5$ minutes resulted.


Therefore, the suboptimal control here served as a
good means to check the results obtained in solving the
original problem, and provided a nice means of
reinterpreting the original optimal control problem. The use
of the quadratic function $V = x^T(k) Q x(k)$ in general
optimal control problems to simplify the formulation
and to provide good suboptimal results has been
reported [6,11,25,27].

Another approach to time suboptimal control is
simply to carry out a search on the elements of the feedback
gain matrix [10], or to choose the gain matrix elements so
that the eigenvalues of the linearized system are shifted
as far left as possible in the complex plane
[18]. With direct search optimization this search can be
readily accomplished. These two approaches were
combined into a two-step procedure in [20] to enable the
desired state to be approached very rapidly, yet in a
stable manner.
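Shifting the eigenvalues left by direct search can be sketched as minimizing the spectral abscissa (the largest real part of the eigenvalues) of $A - BK$ over a box of gains. The search used here, random sampling around the best point with a shrinking region, is an assumed stand-in for the direct search procedures cited above, not a transcription of them. A, B and the gain bounds are hypothetical. The bounds matter: without them the eigenvalues could be pushed arbitrarily far left.

```python
import numpy as np

def spectral_abscissa(A, B, K):
    """Largest real part among the eigenvalues of the closed-loop matrix A - B K."""
    return float(np.max(np.linalg.eigvals(A - B @ K).real))

def random_direct_search(A, B, k_lo, k_hi, n_iter=150, n_pts=50, shrink=0.95, seed=0):
    """Random search with a shrinking region (an assumed stand-in for the cited direct search)."""
    rng = np.random.default_rng(seed)
    best_K = np.zeros((B.shape[1], A.shape[0]))
    best_f = spectral_abscissa(A, B, best_K)
    radius = 0.5 * (k_hi - k_lo)
    for _ in range(n_iter):
        for _ in range(n_pts):
            step = radius * rng.uniform(-1.0, 1.0, size=best_K.shape)
            trial = np.clip(best_K + step, k_lo, k_hi)   # respect the gain bounds
            f = spectral_abscissa(A, B, trial)
            if f < best_f:              # keep the best gain found so far
                best_K, best_f = trial, f
        radius *= shrink                # contract the search region
    return best_K, best_f

A = np.array([[0.0, 1.0], [-1.0, -0.2]])   # hypothetical lightly damped plant
B = np.array([[0.0], [1.0]])
K_best, f_best = random_direct_search(A, B, k_lo=-5.0, k_hi=5.0)
```

For this plant the open-loop eigenvalues have real part -0.1. With both gains limited to [-5, 5], the best attainable abscissa is about -2.45, reached near $k_1 = 5$, $k_2 \approx 4.7$, where the two closed-loop poles coincide.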


Use of Suboptimal Control in Complex Systems 


When the given system is very complex, or of very high
dimension, very good results can be obtained by reducing
the dimensionality or complexity of the system before
attempting to determine the optimal control policy.
This is similar to the idea in nonlinear analysis
where averaging techniques may be used to average out
the noncontributing terms and yet provide accurate
stability information [17]. The optimal control policy
obtained for the simplified system is then the suboptimal
control for the original system. To get good results, one
usually tries to obtain the best simplified or reduced model.
Optimization has been used to obtain excellent models
even when the original system has been of very high
dimension [13,32,33].

Another useful approach is to use orthogonal collo- 
cation to change a partial differential equation into a 
set of ordinary differential equations [21,29,31] or by 
using coordinate transformation [30]. K.T. Wong and 
Luus [28] showed that a very good simplified model for 
a staged system, such as a gas absorber, that is mod- 
eled as a large number of differential equations, can be 
established by first converting the ordinary differential
equations into a partial differential equation and then
using orthogonal collocation to yield a small number
of ordinary differential equations that depict the
behavior of the system quite accurately.


Suboptimal Control in Other Situations 


Suppose that the optimal control policy requires the use 
of all the state variables in the control law, but it is im- 
practical to measure all the variables. To handle that sit- 
uation we can establish a control law which uses only 
the variables that can be measured. We have in essence 
an incomplete state feedback, and it is important to ex- 
amine what is lost by not measuring all the state vari- 
ables. This is an important area, and good progress has 
been made [26]. 

In time-delay systems, if the time delay is small, then, 
as was shown in [21], Taylor series approximation may 
be used to convert the time-delay system into a set of or- 
dinary differential equations. The establishment of op- 
timal control for the latter is much easier, but results in 
suboptimal control for the original system. The degree 
of suboptimality for realistic values for the delay terms 
was found to be quite small [21]. Such an approach was
also used in [15,19] to obtain a suboptimal control policy
for a time-delay system. This suboptimal control policy
was then used as the initial control policy for
iterative dynamic programming with piecewise linear
control.
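A minimal sketch of the small-delay Taylor approximation for a hypothetical scalar system $dx/dt = a x(t) + b x(t - \tau)$. Replacing $x(t - \tau)$ by $x(t) - \tau\, dx/dt$ gives the ordinary differential equation $dx/dt = \frac{a + b}{1 + b\tau} x$, assuming $1 + b\tau \ne 0$. The code compares it with an Euler simulation of the delay equation using a constant initial history. The function names and the values of a, b and $\tau$ are illustrative assumptions.

```python
import math

def simulate_delay(a, b, tau, t_end, dt=1e-4, history=1.0):
    """Euler integration of dx/dt = a x(t) + b x(t - tau), with x(t) = history for t <= 0."""
    lag = int(round(tau / dt))
    xs = [history]                    # xs[k] approximates x(k * dt)
    for k in range(int(round(t_end / dt))):
        x_delayed = xs[k - lag] if k >= lag else history
        xs.append(xs[k] + dt * (a * xs[k] + b * x_delayed))
    return xs[-1]

def taylor_ode(a, b, tau, t_end, x0=1.0):
    """Solution of dx/dt = (a + b) / (1 + b tau) x, from x(t - tau) ~ x(t) - tau dx/dt."""
    rate = (a + b) / (1.0 + b * tau)
    return x0 * math.exp(rate * t_end)

a, b, tau = -1.0, -0.5, 0.1            # hypothetical system with a small delay
x_dde = simulate_delay(a, b, tau, t_end=1.0)
x_ode = taylor_ode(a, b, tau, t_end=1.0)
```

For this small delay the two values at t = 1 agree to within about one percent. Increasing $\tau$ makes the first-order Taylor replacement progressively worse, consistent with the restriction to small delays in the text.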

Another area where suboptimal control has been
used is in the choice of the final time $t_f$. When $t_f$ is
relatively small, the choice of $t_f$ is very important,
because the optimal control policy is quite sensitive to $t_f$
[14]. If, however, $t_f$ is relatively large, the system is
linear and the performance index is quadratic, the choice
$t_f = \infty$ simplifies the solution of the optimal control
problem, since the resulting Riccati equation is then an
algebraic equation and not a differential equation that
has to be integrated backward.
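The simplification for $t_f = \infty$ can be checked numerically. The sketch below integrates the Riccati differential equation backward in time, written in the time-to-go variable $\tau = t_f - t$ and started from a zero terminal weight (an assumption). For a long horizon it settles at the solution of the algebraic Riccati equation. The double integrator with Q = I and R = 1 is a hypothetical test case whose algebraic solution is $P = \begin{bmatrix} \sqrt{3} & 1 \\ 1 & \sqrt{3} \end{bmatrix}$; function names are illustrative.

```python
import numpy as np

def riccati_rhs(P, A, B, Q, R_inv):
    """dP/dtau in time-to-go tau = t_f - t: A^T P + P A - P B R^-1 B^T P + Q."""
    return A.T @ P + P @ A - P @ B @ R_inv @ B.T @ P + Q

def integrate_riccati(A, B, Q, R, horizon, h=0.01):
    """Classical RK4 integration from P(tau = 0) = 0 (zero terminal weight) to tau = horizon."""
    R_inv = np.linalg.inv(R)
    P = np.zeros_like(A, dtype=float)
    for _ in range(int(round(horizon / h))):
        k1 = riccati_rhs(P, A, B, Q, R_inv)
        k2 = riccati_rhs(P + 0.5 * h * k1, A, B, Q, R_inv)
        k3 = riccati_rhs(P + 0.5 * h * k2, A, B, Q, R_inv)
        k4 = riccati_rhs(P + h * k3, A, B, Q, R_inv)
        P = P + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return P

def are_residual(P, A, B, Q, R):
    """Left-hand side of the algebraic Riccati equation (zero at its solution)."""
    return A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P_short = integrate_riccati(A, B, Q, R, horizon=0.5)
P_long = integrate_riccati(A, B, Q, R, horizon=20.0)
```

For a short horizon P still depends strongly on the remaining time, which is the sensitivity to $t_f$ mentioned above. For a long horizon it is, to working precision, the constant algebraic solution.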

Suboptimal control therefore serves a very useful 
role in the optimal control field. 


See also 


> Control vector iteration 

> Duality in optimal control with first order 
differential equations 

> Dynamic programming: Continuous-time optimal 
control 

> Dynamic programming and Newton’s method in 
unconstrained optimal control 

> Dynamic programming: Optimal control 
applications 
> Hamilton-Jacobi-Bellman equation

> Infinite horizon control and dynamic games 

> MINLP: Applications in the interaction of design 
and control 

> Multi-objective optimization: Interaction of design 
and control 

> Optimal control of a flexible arm 

> Optimization strategies for dynamic systems 

> Robust control 

> Robust control: Schur stability of polytopes of 
polynomials 

> Semi-infinite programming and control problems 

> Sequential quadratic programming: Interior point 
methods for distributed optimal control problems 
Successive quadratic programming (or SQP) methods
are a class of methods for finding a local optimum
to nonlinearly constrained optimization (or nonlinear
programming) problems. Introduced by R.B. Wilson
[19] in the early 1960s, followed by variants by W.
Murray [13] and M.C. Biggs [2], and then popularized
and refined by S.P. Han [10] and M.J.D. Powell
[16], SQP methods are based on the recursive use of
quadratic programming to calculate iterative
improvements to the estimates of the constrained optimum and
the corresponding Lagrange and Kuhn-Tucker multipliers.
This use of recursive quadratic programming can be
thought of as a means of balancing the tasks of
satisfying the nonlinear constraints and optimizing the
objective function. That is, starting estimates of the unknown
variables and the iterates do not have to satisfy the
nonlinear constraints at each iteration, as they do in many
other methods for nonlinearly constrained optimization.
Rather, the nonlinear constraints are satisfied as
the iterates approach the optimum. This property and
the use of analytical and/or quasi-Newton
estimates of the (Hessian) matrix of second derivatives
of the Lagrangian function, which account for nonlinear
constraint curvature, are often cited as the two primary
reasons why SQP methods are more reliable and more
efficient (usually requiring fewer function and gradient
evaluations) than other nonlinear programming
techniques in solving nonlinearly constrained optimization
problems. In more recent years, successive quadratic
programming methods have been extended to large scale
and nonconvex, nonlinearly constrained optimization
[11,14] and applied successfully in various mathematical,
scientific, and engineering disciplines [1,11,17].
The fundamental building blocks of any SQP
method generally include:
• methods for calculating or estimating the Hessian
matrix of the Lagrangian function,
• procedures for solving the successive quadratic
programs, and
• stabilization techniques for forcing convergence
from 'poor' starting points.


Some Nonlinear Programming Basics 


Successive quadratic programming and other local
constrained optimization techniques seek to find a local
solution to the following nonlinear programming (NLP)
problem:

$$\min_{x}\ f(x) \quad \text{s.t.} \quad c_e(x) = 0, \quad c_i(x) \le 0,$$

where x is a vector of length n which represents an
estimate of the local minimum, f(x) is a twice
continuously differentiable objective function, and
$c(x) = (c_e(x), c_i(x))$, the vector of equality and/or inequality
constraints, is also twice continuously differentiable and
nonlinear. This constrained optimization problem is commonly
recast in terms of the Lagrangian function, L(x), defined by


$$L(x) = f(x) + \lambda^T c_e(x) + \mu^T c_i(x),$$


where λ and μ are vectors of Lagrange and
Kuhn-Tucker multipliers associated with the equality and
inequality constraints, respectively. The conditions that
define local solutions to the original nonlinearly
constrained optimization problem are called Kuhn-Tucker
(or Karush-Kuhn-Tucker) conditions and are
represented by stationarity of the Lagrangian function.
The Kuhn-Tucker conditions for the nonlinearly
constrained optimization problem shown above are


$$g_L(x) = g(x) + J_e(x)^T \lambda + J_i(x)^T \mu = 0,$$

$$c_e(x) = 0, \quad c_i(x) \le 0, \quad \mu^T c_i(x) = 0, \quad \mu \ge 0,$$

where g(x) is the gradient of the objective function,
$J_e(x)$ is the Jacobian matrix (or matrix of first partial
derivatives) of the equality constraints, $J_i(x)$ is the
Jacobian matrix of the inequality constraints, $g_L(x)$ is the
gradient of the Lagrangian function, and the
complementarity conditions, $\mu^T c_i(x) = 0$, for the inequality
constraints are interpreted as follows, for each inequality
$c_{i,j}(x) \le 0$ separately: if $c_{i,j}(x) = 0$, then $\mu_j \ge 0$, and if
$c_{i,j}(x) < 0$, then $\mu_j = 0$. Successive quadratic programming
methods can be thought of as an application of Newton or
quasi-Newton methods to the NLP Kuhn-Tucker conditions,
with one important difference. Direct application
of Newton or quasi-Newton methods to the Kuhn-Tucker
conditions for the nonlinear program requires
a priori knowledge of the set of active constraints (i.e.,
the equalities plus the inequalities that hold as
equalities), and this is further complicated by the fact that the
active set can change from iteration to iteration. What
Wilson [19] recognized was that the active set (and
therefore the Lagrange and Kuhn-Tucker multipliers)
and the Newton correction in the x variables at any
iteration could be determined simultaneously by solving an
appropriately posed quadratic programming subproblem.
This iterative quadratic programming subproblem,
which is based on a quadratic approximation to the
Lagrangian function subject to linearized constraints, is
given by


$$\begin{aligned} \min_{\Delta x_k} \quad & g(x_k)^T \Delta x_k + \tfrac{1}{2} \Delta x_k^T B_k \Delta x_k \\ \text{s.t.} \quad & c_e(x_k) + J_e(x_k) \Delta x_k = 0, \\ & c_i(x_k) + J_i(x_k) \Delta x_k \le 0, \end{aligned}$$

where $\Delta x_k$ is the change in the unknown variables and
$B_k$ is some approximation to the true Hessian matrix
of the Lagrangian function,
$H(x) = H_f(x) + \sum_j \lambda_j H_{e,j}(x) + \sum_j \mu_j H_{i,j}(x)$,
in which $H_f(x)$, $H_{e,j}(x)$ and $H_{i,j}(x)$ are the true Hessian
matrices of the objective function, the jth equality
constraint and the jth inequality constraint, respectively.
Solving this quadratic programming subproblem produces
precisely the same change in the unknown variables and
the same estimates of the Lagrange and Kuhn-Tucker
multipliers as does Newton's (or a quasi-Newton) method
applied to the Kuhn-Tucker conditions for the original
nonlinear program, provided the active set is known.
Therefore, successive quadratic programming has the
distinct advantage of not requiring a priori knowledge of
the active set. However, perhaps the single biggest
disadvantage of SQP methods is that linearization of the
constraints (or the use of trust regions) can sometimes
make these quadratic programming subproblems infeasible
(i.e., result in an empty feasible region for the
linearized constraints).
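With equality constraints only, the active set is known. Each QP subproblem then reduces to one linear Kuhn-Tucker system, which is exactly the Newton step on the Kuhn-Tucker conditions. The sketch below uses the exact Hessian of the Lagrangian as $B_k$, as Wilson originally suggested, on the toy problem $\min\ x_1 + x_2$ s.t. $x_1^2 + x_2^2 - 2 = 0$. Its solution is $x = (-1, -1)$ with $\lambda = 1/2$. The function names are hypothetical. No line search or trust region is included, so the iteration is only locally convergent.

```python
import numpy as np

def sqp_equality(grad_f, c, jac_c, hess_lag, x0, lam0, iters=20, tol=1e-12):
    """Local SQP for min f(x) s.t. c(x) = 0 with exact Hessian of L = f + lam^T c."""
    x = np.asarray(x0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    for _ in range(iters):
        g, cx, J = grad_f(x), c(x), jac_c(x)
        B = hess_lag(x, lam)
        n, m = x.size, cx.size
        # Kuhn-Tucker system of the QP subproblem:
        #   B dx + J^T lam_new = -g,   J dx = -c
        K = np.block([[B, J.T], [J, np.zeros((m, m))]])
        sol = np.linalg.solve(K, np.concatenate([-g, -cx]))
        dx, lam = sol[:n], sol[n:]
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x, lam

x_opt, lam_opt = sqp_equality(
    grad_f=lambda x: np.array([1.0, 1.0]),
    c=lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0]),
    jac_c=lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]]),
    hess_lag=lambda x, lam: 2.0 * lam[0] * np.eye(2),   # H_f = 0, H_c = 2I
    x0=[-1.5, -0.5],
    lam0=[1.0],
)
```

Note that the iterates are not feasible until convergence, which illustrates the balancing of constraint satisfaction and optimization described at the start of this article.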


The Fundamental Building Blocks 
of Successive Quadratic Programming 


Computational tools for the implementation of succes- 
sive quadratic programming methods require means 
of estimating the Hessian matrix of the Lagrangian 
function (i.e., by analytical, finite difference, or quasi- 
Newton second derivatives or a mixture thereof), meth- 
ods for solving the recursive quadratic programming 
subproblems (i.e., active set or interior point methods),
and stabilization techniques (i.e., line searching or trust
region methods) for forcing convergence from ‘poor’ 
starting points. 


Estimating the Hessian Matrix 
of the Lagrangian Function 


Wilson [19] originally suggested the use of analytical 
second derivatives of the objective function and the 
constraints for approximating the Hessian matrix of 
the Lagrangian function. However, in the 1960s and 
1970s, quasi-Newton methods for approximating sec- 
ond derivatives were introduced and became popular. 
This led to superlinear convergence results for a
number of quasi-Newton updates, including the
Davidon-Fletcher-Powell (DFP) and Powell-symmetric-Broyden
(PSB) updates [10], and eventually to the use of a
modified Broyden-Fletcher-Goldfarb-Shanno (BFGS) update
[16] for calculating a new approximation to the
Hessian matrix of the Lagrangian function, given by

$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{\eta_k \eta_k^T}{\eta_k^T s_k},$$

where the new approximation to the Hessian matrix,
$B_{k+1}$, is computed from the old approximation, $B_k$, the
change in the unknown or x variables, $s_k = \Delta x_k$, and
the vector $\eta_k = \theta y_k + (1 - \theta) B_k s_k$, where
$y_k = g_L(x_{k+1}) - g_L(x_k)$ and θ depends on the relative size of
$s_k^T y_k$ and $s_k^T B_k s_k$. In particular, θ = 1 unless
$s_k^T y_k < 0.2\, s_k^T B_k s_k$, in which case
$\theta = 0.8\, s_k^T B_k s_k / (s_k^T B_k s_k - s_k^T y_k)$, as suggested in [16].
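A direct transcription of the damped update above into NumPy; this is a sketch and the function name is hypothetical. The damping guarantees $\eta_k^T s_k \ge 0.2\, s_k^T B_k s_k > 0$, so $B_{k+1}$ stays symmetric positive definite even when $s_k^T y_k < 0$. The secant relation $B_{k+1} s_k = \eta_k$ holds by construction.

```python
import numpy as np

def damped_bfgs_update(B, s, y):
    """Powell-damped BFGS update of B given s = x_{k+1} - x_k and y = change in grad of L."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    eta = theta * y + (1.0 - theta) * Bs
    B_new = B - np.outer(Bs, Bs) / sBs + np.outer(eta, eta) / (eta @ s)
    return B_new, eta

B = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([-1.0, 0.5])        # s^T y < 0: the undamped BFGS update would be indefinite
B_new, eta = damped_bfgs_update(B, s, y)
```

Here θ = 0.4, η = (0.2, 0.2) and $B_{k+1}$ = [[0.2, 0.2], [0.2, 1.2]], which is positive definite. With θ = 1 the same data would give an indefinite matrix.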

Presently, analytical, quasi-Newton, and finite difference
second derivatives of the objective function and the
nonlinear constraints are all used for estimating H(x).
However, every method for estimating second derivatives
has advantages and disadvantages. Quasi-Newton updates
are computationally inexpensive and can be suitably
modified to maintain the positive definiteness of the
approximation to H(x), which makes the recursive quadratic
programs convex (bowl-shaped). This, in turn, guarantees
that each recursive quadratic program has a unique solution
and that descent in a suitably chosen stabilization procedure,
such as line searching, can be maintained in order to force
convergence. However, quasi-Newton methods provide
only two-step R-superlinear convergence under reasonable
conditions, and the particular use of the modified BFGS
update forces the approximation to the Hessian matrix of
the Lagrangian function to be positive definite in the full
space of the variables, which is unnatural. Usually the
true Hessian matrix is indefinite and only positive
definite on the tangent subspace (i.e., a hyperplane)
defined by the linearized constraints, even at a local
constrained minimum. Moreover, SQP methods that use
quasi-Newton derivatives generally require more function
and gradient evaluations than SQP methods based
on analytical second derivatives. The use of analytical
(or finite difference) second derivatives to approximate
the Hessian matrix of the Lagrangian function provides
faster, quadratic convergence, but only at a price. That
is, SQP methods that use analytical or finite difference
second derivatives usually converge in fewer function
and gradient evaluations than SQP methods that use
quasi-Newton approximations to H(x), but, in doing
so, they sacrifice the guaranteed convexity of the
recursive quadratic programs. This, in turn, can introduce
multiple Kuhn-Tucker points into the quadratic
programming subproblems and result in the loss of the
descent properties associated with suitably chosen line
search functions. Some application-based SQP methods
[11] have used judicious mixtures of analytical and
quasi-Newton second derivative information; these
methods have the same properties as SQP methods
based on analytical or finite difference second derivatives,
except that the rate of convergence is still theoretically
two-step R-superlinear.

More recent SQP methods for solving large scale 
problems are either based on sparsity-preserving esti- 
mates of H(x), so-called full space methods, or range 
and null space decomposition (RND), which elimi- 
nate x variables by substitution using the equality con- 
straints and require approximations of the projection 
of H(x) onto the linear (tangent) subspace (i.e., the 
hyperplane) defined by the linearized constraints. Full 
space methods ([11,14]), as their name implies, re- 
sult in quadratic programming subproblems in the full 
space of the x variables and often use analytical, fi- 
nite difference, quasi-Newton or a mixture of second 
derivatives. RND methods, on the other hand, result in 
smaller quadratic programming subproblems because 
they eliminate x variables using the equality constraints. 
However, RND methods [15] must use quasi-Newton 
updates to build approximations to the projection of 
H(x) on the tangent subspace, which tend to track cur- 
vature less effectively than analytical second derivative 
approximations. 
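The difference between full space and null space (RND-style) computation of a step can be sketched for an equality-constrained QP subproblem. The null space version splits $\Delta x = Y d_y + Z d_z$ with $JZ = 0$, so only the reduced Hessian $Z^T B Z$ has to be positive definite. The example deliberately uses an indefinite B whose projection onto the tangent subspace is positive definite, mirroring the remark above about the true Hessian. Exact B is used for simplicity, whereas practical RND codes approximate $Z^T B Z$ by quasi-Newton updates. The function names and data are hypothetical.

```python
import numpy as np

def full_space_step(B, g, J, c):
    """Solve the QP Kuhn-Tucker system [B J^T; J 0][d; lam] = [-g; -c] directly."""
    n, m = B.shape[0], J.shape[0]
    K = np.block([[B, J.T], [J, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, -c]))
    return sol[:n], sol[n:]

def null_space_step(B, g, J, c):
    """Range/null space decomposition: d = Y d_y + Z d_z with J Z = 0."""
    m = J.shape[0]
    Qfull, _ = np.linalg.qr(J.T, mode="complete")
    Y, Z = Qfull[:, :m], Qfull[:, m:]            # range of J^T and null space of J
    d_y = np.linalg.solve(J @ Y, -c)             # satisfies the linearized constraints
    reduced_hessian = Z.T @ B @ Z                # only this must be positive definite
    d_z = np.linalg.solve(reduced_hessian, -Z.T @ (g + B @ Y @ d_y))
    return Y @ d_y + Z @ d_z, reduced_hessian

B = np.diag([2.0, 1.0, -1.0])                    # indefinite, hypothetical
g = np.array([1.0, -1.0, 0.3])
J = np.array([[0.0, 0.0, 1.0]])
c = np.array([0.5])
d_full, lam_full = full_space_step(B, g, J, c)
d_null, Zt_B_Z = null_space_step(B, g, J, c)
```

Both routes give the step (-0.5, 1.0, -0.5). The null space route never factorizes the indefinite B itself; it only works with the 2 x 2 reduced Hessian, which here has eigenvalues 1 and 2.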


Methods for Solving 
the Recursive Quadratic Programs 


Given $B_k$, an approximation to H(x), the quadratic
program can be assembled, and the corresponding
Kuhn-Tucker conditions for the quadratic programming
subproblem,

$$B_k \Delta x_k + J_e^T \lambda_k + J_i^T \mu_k = -g(x_k)$$

and

$$\begin{bmatrix} J_e \\ J_i \end{bmatrix} \Delta x_k = -\begin{bmatrix} c_e(x_k) \\ c_i(x_k) \end{bmatrix},$$

where $J_e$ is the Jacobian matrix of the equality
constraints and $J_i$ is the Jacobian matrix of the active
inequality constraints (with $c_i$ restricted to those
constraints), can be solved for $\Delta x_k$, $\lambda_k$ and $\mu_k$.
Methods for solving quadratic programming problems 
can be divided into two broad categories, active set 
methods and interior point methods. 
Active Set Methods


Active set methods iteratively determine the
inequalities that hold as equalities in order to solve the
associated quadratic program Kuhn-Tucker conditions for
$\Delta x_k$, $\lambda_k$, and $\mu_k$. This is typically accomplished by
repeatedly applying a set of rules for adding and deleting
constraints from the estimate of the active set (i.e., an
active set strategy) until a valid Kuhn-Tucker point (or
solution) is determined. Moreover, the Kuhn-Tucker
conditions for the quadratic program constitute a set
of symmetric linear equations, which can be solved using
a variety of symmetric matrix factorization methods,
including Cholesky factorization, and the matrix
factors can be modified (or updated) to accommodate
changes in the active set without the need for
complete refactorization at each quadratic programming
iteration. Complete refactorization is only required at
each SQP iteration. When the quadratic program is
convex, descent in the quadratic program can be
maintained, the associated linear Kuhn-Tucker conditions
have a positive definite projection of the Hessian
matrix of the Lagrangian function onto the subspace
defined by any active set of constraints and are reasonably
easy to solve, and some guarantee of convergence to
a unique quadratic programming solution (i.e.,
Kuhn-Tucker point) can be given. When the quadratic
program is indefinite (e.g., due to nonconvexities), special
factorization methods must be used to handle the
potential indefiniteness of the projected Hessian matrix
of the Lagrangian function, new rules for adding and
deleting constraints to and from the active set are
required, and usually no guarantees of convergence can
be made [3,12]. Active set methods can also suffer
from a potential combinatorial explosion in
computational overhead under certain circumstances,
particularly on large quadratic programming problems.
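A compact sketch of a primal active-set method for a convex QP. It is written for constraints in the form $a_j^T x \ge b_j$, i.e., $b_j - a_j^T x \le 0$ in this article's convention. Each iteration solves the equality-constrained QP on the current working set. It then either drops the constraint with the most negative multiplier or takes the longest feasible step and adds the blocking constraint. Unlike the practical codes described above, this sketch refactorizes from scratch every time instead of updating factors, and it has no anti-cycling safeguards. The function name and the small test problem are hypothetical.

```python
import numpy as np

def active_set_qp(G, c, A, b, x0, working, max_iter=50, tol=1e-10):
    """Primal active-set method for min 0.5 x^T G x + c^T x s.t. A x >= b (G positive definite).

    x0 must be feasible; `working` lists indices of constraints active at x0.
    """
    x = np.asarray(x0, dtype=float)
    W = list(working)
    n = x.size
    for _ in range(max_iter):
        g = G @ x + c
        m = len(W)
        # Kuhn-Tucker system of the equality-constrained subproblem on the working set:
        #   G p - A_W^T lam = -g,   A_W p = 0
        K = np.zeros((n + m, n + m))
        K[:n, :n] = G
        if m:
            AW = A[W]
            K[:n, n:] = -AW.T
            K[n:, :n] = AW
        sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(m)]))
        p, lam = sol[:n], sol[n:]
        if np.linalg.norm(p) < tol:
            if m == 0 or lam.min() >= -tol:
                return x, dict(zip(W, lam))     # Kuhn-Tucker point found
            W.pop(int(np.argmin(lam)))          # drop constraint with most negative multiplier
            continue
        # Longest step along p that keeps all constraints outside W satisfied.
        alpha, blocking = 1.0, None
        for i in range(len(b)):
            if i in W:
                continue
            ap = A[i] @ p
            if ap < -tol:
                step = (b[i] - A[i] @ x) / ap
                if step < alpha:
                    alpha, blocking = step, i
        x = x + alpha * p
        if blocking is not None:
            W.append(blocking)                  # add the blocking constraint
    raise RuntimeError("active-set iteration limit reached")

# min (x1 - 1)^2 + (x2 - 2.5)^2 over a small polygon (hypothetical test problem)
G = 2.0 * np.eye(2)
c = np.array([-2.0, -5.0])
A = np.array([[1.0, -2.0], [-1.0, -2.0], [-1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-2.0, -6.0, -2.0, 0.0, 0.0])
x_opt, mult = active_set_qp(G, c, A, b, x0=[2.0, 0.0], working=[2, 4])
```

Starting from the vertex (2, 0), the method drops both initial constraints, moves until the first constraint blocks, and stops at (1.4, 1.7) with a single active constraint and multiplier 0.8.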


Interior Point Methods 


Recently, interior point methods have been suggested for quadratic programming [8,9,20] because they have the potential to solve problems with many variables. However, only limited experience with these methods on quadratic programming problems is currently available, and even less for large-scale problems. Interior point methods for quadratic programming are primal-dual path following algorithms that employ a logarithmic barrier function, are based on Newton's method, and permit the use of iterative methods to solve the associated Kuhn-Tucker conditions for the quadratic program. However, convexity of the quadratic program is generally required. Typical iterative linear equation-solving techniques used to solve the associated Kuhn-Tucker conditions include preconditioned conjugate gradient, generalized minimum residuals (GMRES) and other so-called Krylov (or expanding) subspace methods, and the preconditioning techniques are often based on some partial LU factorization of the Hessian matrix. These linear equation-solving methods are particularly advantageous for large-scale problems because they avoid fill-in in the coefficient matrix (i.e., zero elements turned into nonzero elements by the elimination process) and thus, in principle, reduce both storage and overall computational workload.
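A primal-dual path-following iteration of the kind described above can be sketched for a small convex quadratic program, $\min\, \tfrac{1}{2} x^T Q x + c^T x$ subject to $A x \le b$. In this minimal NumPy sketch (an illustration under stated assumptions, not a production method), slack variables $s > 0$ and multipliers $z > 0$ are introduced, the barrier parameter $\mu = s^T z / m$ is reduced by a fixed centering factor `sigma` (an assumed value of 0.1), and each Newton step is obtained from the condensed system $(Q + A^T \mathrm{diag}(z/s) A)\,\Delta x = \text{rhs}$. A dense direct solve is used for brevity, $Q$ is assumed positive definite, and the function name is hypothetical.

```python
import numpy as np


def interior_point_qp(Q, c, A, b, tol=1e-9, max_iter=100, sigma=0.1):
    """Primal-dual path-following sketch for min 1/2 x'Qx + c'x s.t. Ax <= b."""
    n, m = Q.shape[0], A.shape[0]
    x = np.zeros(n)
    s = np.ones(m)                        # slacks: A x + s = b, s > 0
    z = np.ones(m)                        # inequality multipliers, z > 0
    for _ in range(max_iter):
        r_d = Q @ x + c + A.T @ z         # dual (stationarity) residual
        r_p = A @ x + s - b               # primal residual
        mu = s @ z / m                    # current barrier parameter
        if max(np.linalg.norm(r_d), np.linalg.norm(r_p), mu) < tol:
            return x, z
        r_c = s * z - sigma * mu          # perturbed complementarity residual
        # Newton step on the perturbed KKT conditions, with ds and dz
        # eliminated: (Q + A' diag(z/s) A) dx = rhs.
        M = Q + A.T @ (A * (z / s)[:, None])
        rhs = -r_d + A.T @ ((r_c - z * r_p) / s)
        dx = np.linalg.solve(M, rhs)
        ds = -r_p - A @ dx
        dz = -(r_c + z * ds) / s
        # Fraction-to-boundary rule keeps s and z strictly positive.
        alpha = 1.0
        for v, dv in ((s, ds), (z, dz)):
            neg = dv < 0
            if np.any(neg):
                alpha = min(alpha, 0.99 * np.min(-v[neg] / dv[neg]))
        x, s, z = x + alpha * dx, s + alpha * ds, z + alpha * dz
    raise RuntimeError("interior point method did not converge")
```

When $Q$ is positive definite the condensed matrix is symmetric positive definite, so in a large sparse setting the `np.linalg.solve` call is the natural place to substitute a preconditioned conjugate gradient solver.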


Global Convergence and Stabilization Techniques 


Often, the starting point chosen for initiating SQP computations is not within the theoretical region of convergence for a given local solution. Thus it has become standard practice in the use of SQP methods, as well as in nonlinear programming in general, to use some technique for forcing convergence from these so-called 'poor' starting points. The most common stabilization techniques used in SQP methods are based on either line searching or trust regions.


Line Searching 


The underlying concept of using line search functions in successive quadratic programming is to generate a monotonically decreasing sequence of line search (or merit) function values that maintains a compromise between satisfying the constraints and minimizing the objective function, and that guarantees convergence to a stationary point of the Lagrangian function. Many line searching techniques in SQP methods are based on the application of Armijo's rule to exact $\ell_1$ penalty functions and augmented Lagrangian merit functions. For example, Powell [16] uses the nondifferentiable line search function (with $\mathcal{E}$ and $\mathcal{I}$ indexing the equality and inequality constraints)

$$P(x, \lambda, \mu) = f(x) + \sum_{j \in \mathcal{E}} |\lambda_j|\,|c_j(x)| + \sum_{j \in \mathcal{I}} |\mu_j| \max[0, c_j(x)],$$
which was suggested in [10], and chooses the line search parameter, $\alpha$, as the first number in the sequence (1, 0.1, 0.001, ...) for which $P(x_k + \alpha \Delta x_k, \lambda, \mu) < P(x_k, \lambda, \mu)$, where $\Delta x_k$, $\lambda$ and $\mu$ are the solution of the current quadratic programming subproblem. Local superlinear convergence occurs when $\alpha = 1$ is chosen in the neighborhood of the solution. A further modification of this exact penalty function, which chooses $|\bar{\lambda}_j| = \max[|\lambda_j|, (|\lambda'_j| + |\lambda_j|)/2]$, where $\lambda'_j$ is the Lagrange multiplier for the $j$th constraint on the previous iteration, has been suggested in [16] in order to avoid placing too much emphasis on satisfying the constraints in the line search function. While this modification frequently gives good numerical performance, convergence guarantees cannot be given because upper bounds on the multipliers are required but not known in advance. Cycling has been observed in SQP methods using nondifferentiable line search functions [5], and this, in turn, has led to the development of such things as the watchdog technique [6], which allows the line search function values to increase on some iterations.
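The $\ell_1$ merit function and the step-length rule described above can be sketched as follows. The sketch assumes the constraints are written as $c_E(x) = 0$ and $c_I(x) \le 0$, takes the step-size sequence to be the geometric sequence 1, 0.1, 0.01, ... (an assumption), and uses the simple-decrease test stated in the text rather than a sufficient-decrease (Armijo) condition; `powell_weights` implements the safeguard $\max[|\lambda_j|, (|\lambda'_j| + |\lambda_j|)/2]$. All function names are hypothetical.

```python
import numpy as np


def l1_merit(f, c_eq, c_in, x, lam_w, mu_w):
    """Exact l1 merit: f + sum lam_w*|c_eq| + sum mu_w*max(0, c_in)."""
    return (f(x)
            + np.sum(lam_w * np.abs(c_eq(x)))
            + np.sum(mu_w * np.maximum(0.0, c_in(x))))


def powell_weights(mult, prev_mult):
    """Safeguarded weights max(|mult_j|, (|prev_mult_j| + |mult_j|) / 2)."""
    a = np.abs(mult)
    return np.maximum(a, 0.5 * (np.abs(prev_mult) + a))


def backtrack(merit, x, dx, ratio=0.1, max_tries=20):
    """Return the first alpha in 1, ratio, ratio**2, ... that lowers merit."""
    p0 = merit(x)
    alpha = 1.0
    for _ in range(max_tries):
        if merit(x + alpha * dx) < p0:   # simple decrease, as in the text
            return alpha
        alpha *= ratio
    raise RuntimeError("no decrease in the merit function along dx")
```

The weights are recomputed once per SQP iteration from the new and previous multipliers and then held fixed while `backtrack` searches along $\Delta x_k$.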

A monotonically decreasing sequence of line search function values can also be maintained through a property known as descent (i.e., $g_\phi^T \Delta x_k < 0$, where $g_\phi$ is the gradient of the line search function). This requires that the line search function be differentiable and has led to the use of differentiable augmented Lagrangian type line search functions [17], given by

$$G(x, \lambda, \mu, r) = f(x) - [\lambda^T c(x) - r\, c(x)^T c(x)] - P(x, \mu, r),$$

where $r$ is a penalty parameter that is adjusted iteratively, $P(x, \mu, r) = \sum_i p_i(x, \mu, r)$, and $p_i(x, \mu, r) = \mu_i c_i(x) - r\, c_i(x)^2$ if the $i$th inequality is active and $p_i(x, \mu, r) = \mu_i^2 / r$ if it is inactive.


Trust Region Methods 


Trust region methods for nonlinear equality-constrained optimization [4,18] are another way of forcing convergence from 'poor' starting points. More recent work can be found in [7]. These techniques add a single trust region bound to the set of linearized equality constraints and solve the modified quadratic programming subproblem defined by

$$\min_{\Delta x_k}\; g(x_k)^T \Delta x_k + \tfrac{1}{2} \Delta x_k^T B_k \Delta x_k \quad \text{s.t.} \quad c(x_k) + J(x_k) \Delta x_k = 0, \quad \|\Delta x_k\| \le \Delta_k,$$

where $\Delta_k$ is a trust region radius that is adjusted using information gathered from one iteration to the next. The primary advantages of trust region methods are that they are relatively straightforward to implement and permit the Hessian matrix of the Lagrangian function to be indefinite. While often successful, trust region methods have the particular disadvantage that they can produce infeasible linearized constraints. That is, when the trust region is too small, it may not be possible to satisfy the linearized constraints within the trust region. Other trust region methods for constrained optimization, which define the trust region in terms of a set of bounds on each variable, have been suggested [11] and are thus applicable to problems involving inequality constraints. However, these can also lead to infeasible linearized constraint sets.
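The infeasibility problem can be checked directly: the shortest step satisfying the linearized equality constraints $c(x_k) + J(x_k)\Delta x_k = 0$ is their minimum-norm solution, and if even that step is longer than $\Delta_k$, no step inside the trust region satisfies them. This sketch assumes a Euclidean trust region and uses `np.linalg.lstsq`, which returns the minimum-norm solution of a consistent underdetermined system; the function name is hypothetical.

```python
import numpy as np


def linearized_constraints_feasible(c, J, radius):
    """Can c + J dx = 0 be met with ||dx||_2 <= radius?

    Returns (feasible, dx), where dx is the minimum-norm (least-squares)
    solution of J dx = -c.
    """
    dx, *_ = np.linalg.lstsq(J, -c, rcond=None)
    consistent = np.allclose(J @ dx, -c)   # False if J dx = -c has no solution
    return bool(consistent and np.linalg.norm(dx) <= radius), dx
```

For the single constraint $2 + \Delta x_1 + \Delta x_2 = 0$, the shortest feasible step is $(-1, -1)$ with length $\sqrt{2}$, so any radius below about 1.414 makes the subproblem infeasible.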


A Generic SQP Algorithm 


A typical, generic successive quadratic programming algorithm is shown below.

1) Initialize $x_0$ and $B_0$; define a convergence tolerance, $\epsilon > 0$, and set $k = 0$.

2) Evaluate $f(x_k)$, $g(x_k)$, $c(x_k)$ and $J(x_k)$.

3) If $\|[\nabla_x L(x_k), c(x_k)]^T\|_2 < \epsilon$, $c(x_k) \le 0$ and $\mu \ge 0$, then stop; else go to step 4.

4) Construct the quadratic program

$$\min_{\Delta x_k}\; g(x_k)^T \Delta x_k + \tfrac{1}{2} \Delta x_k^T B_k \Delta x_k \quad \text{s.t.} \quad c(x_k) + J(x_k) \Delta x_k \le 0$$

and solve it for $\Delta x_k$, $\lambda$ and $\mu$.

5) Determine $x_{k+1}$ from $x_k$ and $\Delta x_k$ by either line searching, trust regions or some other means.

6) Calculate a new approximation to the Hessian matrix, $B_{k+1}$, from analytical, finite difference, quasi-Newton or mixed second derivatives.

7) Set $k = k + 1$ and go to step 2.
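For equality constraints only, steps 2 to 7 reduce to solving one symmetric Kuhn-Tucker system per iteration, which allows a compact sketch. The version below assumes exact Lagrangian Hessians supplied by the caller (standing in for step 6), takes full steps in step 5 (no line search or trust region), and handles no inequalities, which would need a quadratic programming solver such as an active set method; `sqp_equality` and its arguments are hypothetical names.

```python
import numpy as np


def sqp_equality(grad_f, cons, jac, hess_lag, x0, lam0, tol=1e-10, max_iter=50):
    """Local SQP sketch for min f(x) s.t. c(x) = 0 with full steps."""
    x = np.asarray(x0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    for k in range(max_iter):
        g, c, J = grad_f(x), cons(x), jac(x)              # step 2
        kkt_res = np.concatenate([g + J.T @ lam, c])      # [grad L, c]
        if np.linalg.norm(kkt_res) < tol:                 # step 3
            return x, lam, k
        # Step 6 in the text builds B_{k+1} after the step; with an exact
        # Hessian this is the same as evaluating it at (x_k, lambda_k) here.
        B = hess_lag(x, lam)
        n, m = x.size, c.size
        # Step 4: the QP  min g'd + 1/2 d'Bd  s.t.  c + J d = 0
        # is solved through its symmetric KKT system.
        K = np.block([[B, J.T], [J, np.zeros((m, m))]])
        sol = np.linalg.solve(K, np.concatenate([-g, -c]))
        x = x + sol[:n]                                   # step 5: full step
        lam = sol[n:]                                     # QP multipliers
    raise RuntimeError("SQP did not converge")
```

For instance, minimizing $x_1 + x_2$ subject to $x_1^2 + x_2^2 = 2$ from $(-1.2, -0.8)$ with $\lambda = 0.5$ converges locally to $(-1, -1)$ with $\lambda = 0.5$.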


Some Brief Comments 
on Numerical Performance 


The usual measures of numerical performance in nonlinear programming, as well as in other areas of numerical mathematics, are algorithmic reliability and efficiency. Reliability refers to the ability of an algorithm to find a local optimum from starting points that are 'far away' from the solution. That is, the issue of reliability is equivalent to the question: will the SQP method converge to a local constrained optimum if the starting point is 'far away' from this solution, for whatever reasons? Clearly the answer to this question is strongly related to the use of stabilization techniques, and it is the primary motivation for the interest in the global convergence characteristics of SQP methods. Efficiency, on the other hand, is usually measured in terms of function and gradient evaluations (or iterations) and is related to the local convergence properties of SQP methods. When the SQP algorithm gets close to the solution, rapid convergence to the optimum, requiring few function and gradient evaluations, is desired. Many numerical studies by the principal authors of SQP methods, as well as by others in the mathematical sciences [17] and in various branches of engineering [1,11], have demonstrated that successive quadratic programming methods are among the most reliable and efficient algorithms presently available for solving nonlinearly constrained optimization problems.

However, there are many subtle and interrelated issues (e.g., sparsity, nonconvexity, constraint infeasibility, as well as problem-specific issues like model inconsistencies and limitations) that bear on numerical performance, and therefore considerable care must be exercised during both implementation and problem solving. Often trade-offs between advantages and disadvantages must be accepted out of necessity.


See also 


> Feasible Sequential Quadratic Programming 

> Optimization with Equilibrium Constraints: 
A Piecewise SQP Approach 

> Sequential Quadratic Programming: Interior Point 
Methods for Distributed Optimal Control 
Problems 

> Successive Quadratic Programming: Applications in 
Distillation Systems 

> Successive Quadratic Programming: Applications in 
the Process Industry 

> Successive Quadratic Programming: Decomposition 
Methods 

> Successive Quadratic Programming: Full Space 
Methods 

> Successive Quadratic Programming: Solution by 
Active Sets and Interior Point Methods 
The separation of the chemical components of a mixture into a variety of useful product streams is a common task in the chemical and petroleum industries, and distillation is the most common method for this task. Some distillation products may be sold while others may be sent to different parts of the same chemical plant for further processing. Distillation takes place in a distillation column (or large tower) that usually has a number of stages (or trays) on which vapor flowing up the column is brought into contact with liquid flowing down the column. It is this contact between liquid and vapor that effects the separation. Unfortunately, distillation is often very energy intensive and thus very costly. As a result, optimal operation of a distillation column or a sequence of distillation columns is desirable.

In a mathematical programming framework, the optimal design of a distillation column is a mixed integer nonlinear program (MINLP) because it involves both discrete variables (e.g., the number of stages in the column, the feed tray location, etc.) and continuous variables (e.g., flow rates, compositions, temperatures, pressures, etc. on all trays). Once the design (or discrete) variables are fixed, the optimal operation of a given column configuration becomes a nonlinear programming (NLP) problem, since it then involves only continuous variables. These NLP problems tend to be highly nonlinear and nonconvex due to the nature of the energy conservation and phase equilibrium equations.

In recent years, there has been some work on the ap- 
plication of full space [7,8,9] and decomposition meth- 
ods [2,11] of successive quadratic programming (SQP) 
to distillation. In particular, A. Kumar and A. Lucia [6] 
proposed a full space SQP method based on thermodynamically consistent quasi-Newton formulae that exploit the homogeneity of the second derivatives of the
energy balance and phase equilibrium equations. These 
thermodynamically consistent quasi-Newton updates 
were used to build appropriate parts of the Hessian ma- 
trix of the Lagrangian function and shown to result in 
better numerical performance than traditional secant 
updates on a number of distillation examples. Lucia and 
J. Xu [8] developed an indefinite quadratic program- 
ming method based on Bunch and Parlett factoriza- 
tion [3] of the entire coefficient matrix of the Kuhn- 
Tucker conditions, an active set strategy, and the use of 
trust regions to address the strong constraint noncon- 
vexities inherent in distillation optimization problems. 
Results from this work showed that permitting indefiniteness often provides a better local quadratic model
for the Lagrangian function and that the resulting algo- 
rithms were capable of easily solving distillation exam- 
ples with which [7] had difficulty. In [9] a refined active 
set strategy, constrained pivoting and numerical matrix 
factor updating for indefinite quadratic programming 
were proposed and good numerical performance was 
obtained for a family of SQP methods on a set of 15 dis- 
tillations, some of which were extractive and azeotropic 
distillations. 

In contrast, L.T. Biegler and C. Schmid [2,11] have 
applied range and null space decomposition (RND) 
SQP methods to distillation optimization problems. 
They ‘tailor’ an RND method for use with an existing 
simulation model through the use of an interface in or- 
der to illustrate that decomposition SQP methods can 
be easily applied to process models like distillation. In 
particular, [2] reports good numerical performance for 
a set of four distillation examples involving ideal binary 
and ternary mixtures, along with a discussion of issues 
such as the need for preprocessing and quadratic programming constraint infeasibility. See [2] for a discussion of why range space curvature information is needed and how it can be obtained.

The issues that are important in the application of 
SQP methods to distillation systems include: 

• formulation;

• the mathematical model;

• Hessian matrix approximations, sparsity and other exploitable properties;

• initialization procedures for the unknown variables and multipliers; and

• algorithmic and other implementation issues.


SQP Formulation 


Distillation optimization problems usually contain between 100 and 500 unknown variables, roughly the same number of equality constraints and twice that number of inequality constraints. Thus the number of degrees of freedom is often small compared to the number of unknowns. As a result, both full space and decomposition SQP methods can be used, and the question of which approach is better remains open. Regardless of the approach, a general mathematical representation of the distillation optimization problem is given by

$$\min_x\; f(x) \quad \text{s.t.} \quad c(x) \le 0,$$


where $x$ is a vector of unknown variables of length $n$ which represents an estimate of the local minimum, $f(x)$ is a twice continuously differentiable objective function and $c(x)$, the vector of equality and/or inequality constraints, is also twice continuously differentiable and nonlinear. This constrained optimization problem is commonly recast in terms of the Lagrangian function, $L(x)$, defined by

$$L(x) = f(x) + \lambda^T c_E(x) + \mu^T c_I(x),$$

where $\lambda$ and $\mu$ are vectors of Lagrange and Kuhn-Tucker multipliers associated with the equality constraints $c_E(x)$ and the inequality constraints $c_I(x)$, respectively. The Kuhn-Tucker conditions are solved iteratively using a recursive quadratic programming formulation to define the change in the unknown variables and multipliers. This iterative quadratic programming subproblem, which is based on a quadratic approximation to the Lagrangian function subject to linearized constraints, is given by

$$\min_{\Delta x_k}\; g(x_k)^T \Delta x_k + \tfrac{1}{2} \Delta x_k^T B_k \Delta x_k \quad \text{s.t.} \quad c(x_k) + J(x_k) \Delta x_k \le 0,$$


where $\Delta x_k$ is the change in the unknown variables, $g(x_k)$ is the gradient of the objective function, $J(x_k)$ is the Jacobian matrix of the constraint functions, and $B_k$ is some approximation to the true Hessian matrix of the Lagrangian function, $H(x) = H_f(x) + \sum_i \lambda_i H_{ci}(x) + \sum_i \mu_i H_{ci}(x)$, where $H_f(x)$ and $H_{ci}(x)$ refer to the true Hessian matrices of the objective function and the $i$th constraint, respectively. When full space SQP methods are used, this recursive quadratic programming problem is solved directly. When range and null space decomposition (RND) methods are used, the equality constraints are used to 'eliminate' variables, and a 'reduced' quadratic programming problem, given by

$$\min_{\Delta z_k}\; \left( Z_k^T g(x_k) - Z_k^T B_k Y_k [J_k Y_k]^{-1} c(x_k) \right)^T \Delta z_k + \tfrac{1}{2} \Delta z_k^T (Z_k^T B_k Z_k) \Delta z_k$$

$$\text{s.t.} \quad x_L - x_k + Y_k [J_k Y_k]^{-1} c(x_k) \le Z_k \Delta z_k \le x_U - x_k + Y_k [J_k Y_k]^{-1} c(x_k),$$

is solved. Here $Y_k$ and $Z_k$ represent bases for the range and null space of the Jacobian matrix, respectively; $\Delta x_k = Y_k \Delta y_k + Z_k \Delta z_k$, where $\Delta z_k$ and $\Delta y_k = -[J_k Y_k]^{-1} c(x_k)$ are the changes in the unknown variables in the null space and range, respectively; $J_k$ is the Jacobian matrix of the equality constraints; $(Z_k^T B_k Z_k)$ is the symmetric projection of the Hessian matrix of the Lagrangian function onto the tangent subspace of the linearized constraints; and $x_L$ and $x_U$ are lower and upper bounds on the $x$ variables.
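The decomposition can be sketched for the equality-constrained case without the bounds $x_L$ and $x_U$ (an assumption made for brevity). Here $Y_k$ and $Z_k$ are taken from a complete QR factorization of $J_k^T$, which is one common choice of orthonormal bases but not necessarily the one used in the RND codes cited above. The resulting step $\Delta x_k = Y_k \Delta y_k + Z_k \Delta z_k$ coincides with the solution of the full Kuhn-Tucker system, and multipliers follow from $Y_k^T J_k^T \lambda = -Y_k^T (g + B_k \Delta x_k)$. The function name is hypothetical.

```python
import numpy as np


def rnd_step(g, B, J, c):
    """Range/null space step for min g'dx + 1/2 dx'B dx s.t. c + J dx = 0."""
    m, n = J.shape
    Qf, _ = np.linalg.qr(J.T, mode="complete")   # J^T = Qf R, Qf is n x n
    Y, Z = Qf[:, :m], Qf[:, m:]                  # range of J^T, null space of J
    dy = -np.linalg.solve(J @ Y, c)              # dy = -[J Y]^{-1} c
    reduced_grad = Z.T @ g + Z.T @ B @ (Y @ dy)  # Z'g - Z'BY[JY]^{-1}c
    reduced_hess = Z.T @ B @ Z                   # projected Hessian Z'BZ
    dz = -np.linalg.solve(reduced_hess, reduced_grad)
    dx = Y @ dy + Z @ dz
    # Range-space part of stationarity: Y'J'lambda = -Y'(g + B dx).
    lam = np.linalg.solve((J @ Y).T, -Y.T @ (g + B @ dx))
    return dx, lam
```

Only the $(n - m) \times (n - m)$ matrix $Z_k^T B_k Z_k$ has to be factored, which is why the reduced approach pays off when, as in distillation, the number of degrees of freedom is small.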


Mathematical Model 


The objective function in distillation optimization can be linear or nonlinear, while the equality constraints are usually a mixture of highly nonlinear and linear algebraic equations. The inequality constraints, on the other hand, are usually simple bounds on the variables, although other nonlinear inequalities can occur.


Objective Function 


The objective function for a typical distillation optimization usually represents a balance between the energy-related (or other operating) costs of the column and the profit obtained from the sale (or credits) of products. One example, taken from [7], might be

$$\min\; f = c_s(-Q^C + Q^R) - (v_{lk}^C + l_{hk}^R),$$

where $Q^C$ and $Q^R$ are the condenser and reboiler heat duties (or energy demands), respectively, and $v_{lk}^C$ and $l_{hk}^R$ are the component flow rates of the overhead and bottoms products. Here the subscripts $lk$ and $hk$ denote the light key component (or low boiling component) and heavy key component (or high boiling component), respectively, the superscripts $C$ and $R$ denote the condenser (or top stage) and reboiler (or bottom stage) of the column, and $c_s$ is a scaling factor that helps balance the scale between the energy costs and product flow variables. Usually the energy demands consist of cooling water requirements for the condenser and steam demands for the reboiler. Moreover, the negative sign on the condenser duty in the above objective function merely accounts for the thermodynamic convention associated with heat transfer; because the value of $Q^C$ is always negative, it does not represent subtraction of the condenser duty. Other objective functional forms exist as well.
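The sign convention can be made explicit in a short function. This sketch evaluates the example objective for given duties and flows; the function and argument names are hypothetical, and the check that $Q^C \le 0$ merely encodes the convention described in the text.

```python
def distillation_objective(q_cond, q_reb, v_lk_top, l_hk_bottom, c_s):
    """f = c_s * (-Q^C + Q^R) - (v_lk^C + l_hk^R).

    q_cond is the condenser duty, negative because heat is removed, so
    -q_cond adds the condenser energy demand to the reboiler demand q_reb.
    """
    if q_cond > 0:
        raise ValueError("condenser duty is expected to be negative")
    energy_cost = c_s * (-q_cond + q_reb)   # scaled total energy demand
    product_credit = v_lk_top + l_hk_bottom # key-component product flows
    return energy_cost - product_credit
```

With $Q^C = -5$, $Q^R = 6$, $c_s = 0.1$ and key flows of 2 and 3, the energy term is $0.1 \times 11 = 1.1$ and the objective is $-3.9$.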


Equality Constraints 


The equality constraints in distillation optimization consist of the conservation of mass and energy as well as the phase equilibrium equations, the latter of which relate the composition of the vapor to that of the liquid leaving each stage. The equations for the $j$th equilibrium stage are the component mass balances

$$f_{ij} + l_{i,j-1} - l_{ij} - v_{ij} + v_{i,j+1} = 0,$$

the phase equilibrium relationships

and the conservation of energy

$$\Big(\sum_i f_{ij}\Big) H_j^F + \Big(\sum_i l_{i,j-1}\Big) H_{j-1} - \Big(\sum_i l_{ij}\Big) H_j - \Big(\sum_i v_{ij}\Big) h_j + \Big(\sum_i v_{i,j+1}\Big) h_{j+1} + Q_j = 0.$$

In these equations, $l_{ij}$ and $v_{ij}$ are the flow rates of component $i$ in the liquid and vapor, respectively, on the $j$th stage, $K_{ij}$ is the equilibrium ratio (or K-value) for the $i$th component on the $j$th stage, $H_j$ and $h_j$ are the corresponding liquid and vapor enthalpies ($H_j^F$ being the feed enthalpy), $f_{ij}$ is the $i$th component feed flow rate to the $j$th stage, $Q_j$ is the heat duty to the $j$th stage and $n_c$ is the number of components. Moreover, the equilibrium ratios, $K_{ij}$, and enthalpies, $H_j$ and $h_j$, are strongly nonlinear functions of the component flow rates, temperature, $T_j$, and pressure, $p_j$, on the $j$th stage. Finally, for a column with $n_s$ equilibrium stages, there are $n_s(2n_c + 1)$ equality constraints and $n_s(2n_c + 2)$ unknown variables (i.e., the $l_{ij}$s, $v_{ij}$s, $T_j$s and $Q_j$s). Usually, the pressure on each stage is fixed in some manner, so pressure is not an unknown variable. Thus there are $n_s$ degrees of freedom. However, it is usual, though not strictly necessary, to assume adiabatic operation (i.e., no heat withdrawn or added) for all trays except the top and bottom trays. This gives the additional equality constraints

$$Q_j = 0, \quad j = 2, \ldots, n_s - 1,$$

and results in just two degrees of freedom for the simplest and most common column configuration. Columns with heat withdrawal or addition would necessarily have more degrees of freedom.
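The degree-of-freedom count above is simple bookkeeping, sketched here for a column with $n_s$ stages and $n_c$ components; the function name is hypothetical, and stage pressures are assumed fixed as in the text.

```python
def column_degrees_of_freedom(n_stages, n_comp, adiabatic_interior=True):
    """Return (equations, unknowns, degrees of freedom) for a simple column."""
    # Per stage: n_c mass balances, n_c phase equilibria, 1 energy balance.
    n_eq = n_stages * (2 * n_comp + 1)
    # Per stage: n_c liquid flows, n_c vapor flows, T_j and Q_j.
    n_var = n_stages * (2 * n_comp + 2)
    if adiabatic_interior:
        n_eq += n_stages - 2        # Q_j = 0 for j = 2, ..., n_s - 1
    return n_eq, n_var, n_var - n_eq
```

For a 10-stage column with three components this gives 78 equations, 80 unknowns and 2 degrees of freedom; without the adiabatic assumption it gives 10 degrees of freedom, one per stage.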


Inequality Constraints 


The inequality constraints in distillation optimization problems usually consist of simple bounds on variables and product flows. In particular, for the $j$th equilibrium stage there are nonnegativity bounds on the component flow rates

$$l_{ij},\ v_{ij} \ge 0, \quad i = 1, \ldots, n_c,$$

upper and lower bounds on the temperature

$$T_{\min} \le T_j \le T_{\max}$$

in order to keep the calculations of $K_{ij}$, $H_j$ and $h_j$ physically meaningful, and in some cases explicit bounds on product component flow rates

$$l_{i n_s},\ v_{i1} \le \sum_j f_{ij}, \quad i = 1, \ldots, n_c,$$

to ensure that the mass balance around the column is satisfied.


Lagrangian Hessian Matrix Approximations 


The Hessian matrix of the Lagrangian function requires that second derivatives of the objective function and nonlinear constraints be approximated, and this can be done in a number of ways. When full space SQP methods are used to solve distillation optimization problems, analytical second derivatives, finite difference second derivatives, quasi-Newton approximations or a mixture of analytical and quasi-Newton derivatives of the objective function and constraints (hybrid methods) can be used. In contrast, when decomposition SQP methods are used [1], the modified Broyden-Fletcher-Goldfarb-Shanno (BFGS) update (see [10]) is usually used to approximate the projection of the Lagrangian Hessian matrix, $Z_k^T B_k Z_k$, in order to avoid explicit representation of $B_k$ and the computation of a matrix triple product.
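The text does not spell out which BFGS modification is meant. One widely used choice is Powell's damping, sketched below as an assumption: when the curvature $s^T y$ is too small relative to $s^T B s$, $y$ is replaced by a convex combination $r$ of $y$ and $Bs$, so that $s^T r > 0$ and positive definiteness is preserved. In an RND setting, $B$ would be the reduced Hessian approximation, $s = \Delta z_k$ and $y$ a change in reduced gradients of the Lagrangian; the function name and the 0.2/0.8 constants are the usual textbook values, not taken from the cited work.

```python
import numpy as np


def damped_bfgs(B, s, y):
    """Powell-damped BFGS update; keeps B symmetric positive definite."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    # Damping factor: theta = 1 gives the ordinary BFGS update.
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs      # damped change in gradients
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (s @ r)
```

Even with negative curvature along $s$ (for example $B = I$, $s = e_1$, $y = -e_1$), the damped update returns the positive definite matrix $\mathrm{diag}(0.2, 1)$, which is what keeps the reduced subproblems convex.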

In full space SQP methods, all techniques for estimating the Hessian matrix of the Lagrangian function can be put in the form

$$B_k = C(x_k) + A_k,$$

where $C(x_k)$ is called the computed part of the Hessian matrix and is calculated from analytical derivatives, and $A_k$, the approximated part, can be computed either analytically, from finite differences or from an appropriate quasi-Newton formula [6]. Note that the objective function and the phase equilibrium and energy balance constraints can have both computed and approximated parts; in particular, the second derivatives of the phase equilibrium and energy balance constraints have a natural division into thermodynamically ideal and nonideal (or excess) parts. The ideal parts are readily available in analytical form and constitute much of $C(x_k)$, while the nonideal parts depend on 'models' for the activity coefficient and/or fugacity coefficient and excess enthalpy and are usually contained in $A_k$. Furthermore, because of the stagewise structure of distillation columns, both the computed and approximated parts are sparse and tend to consist of many small dense blocks. Thus quasi-Newton formulas such as the Powell-symmetric-Broyden (PSB) update can be used to build the second derivative approximations of each block [6,7]. Also, certain parts of the equilibrium ratios, $K_{ij}$, and enthalpies, $H_j$ and $h_j$, are homogeneous functions of the unknown variables (i.e., the component molar flow rates), and this gives rise to other matrix constraints that can be exploited when building quasi-Newton approximations to the blocks of $A_k$. That is, the equilibrium ratio is commonly defined by


$$K_{ij} = \frac{\gamma_{ij}\, f_{ij}^0}{\phi_{ij}\, p_j},$$

where $\gamma_{ij}$ is the liquid activity coefficient, $f_{ij}^0$ is the pure component fugacity and $\phi_{ij}$ is the vapor fugacity coefficient for the $i$th component on the $j$th stage. The liquid and vapor molar enthalpies, on the other hand, usually have the form

$$H_j = H_j^{ID} + H_j^E, \qquad h_j = h_j^{ID} + h_j^E,$$

where the superscripts ID and E denote the ideal and excess (or nonideal) parts, respectively. The functions $\ln \gamma_{ij}$ and $H_j^E$ are homogeneous functions of the liquid component molar flow rates on stage $j$, while $\ln \phi_{ij}$ and $h_j^E$ are homogeneous functions of the vapor component molar flow rates from the $j$th stage. Because these thermodynamic properties are homogeneous functions, they give rise to the matrix conditions

$$[\nabla^2 \ln \gamma_{ij}]\, l_j = -\nabla \ln \gamma_{ij}, \qquad [\nabla^2 H_j^E]\, l_j = 0$$

for the liquid phase on the $j$th stage and

$$[\nabla^2 \ln \phi_{ij}]\, v_j = -\nabla \ln \phi_{ij}, \qquad [\nabla^2 h_j^E]\, v_j = 0,$$


for the vapor phase on stage $j$, where $l_j$ and $v_j$ denote the vectors of liquid and vapor component molar flow rates on stage $j$. In [6] it is suggested that these thermodynamic constraints be used, in conjunction with traditional secant conditions, to build better approximations of the appropriate blocks of $A_k$, and that one iterated projection [4] from the space of secant matrices to the space of thermodynamically constrained matrices be used to approximate the second derivatives of the activity coefficients, fugacity coefficients and excess enthalpies. In [9] a variety of techniques was used for approximating the blocks of $A_k$, including a partial Newton strategy, in which each block of $A_k$ is zero at each SQP iteration; a secant only hybrid (SOH) method, in which each block of $A_k$ is approximated by the PSB formula using only the secant information associated with that block; a thermodynamically constrained hybrid (TCH) method, in which both secant and thermodynamic constraints are used in conjunction with iterated projections for each block of $A_k$; and Newton's (or Wilson's [12]) method, in which analytical or finite difference second derivatives of each block of $A_k$ are used.
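The PSB formula used for the blocks of $A_k$ in the SOH approach can be sketched in a few lines. Unlike BFGS, the PSB update keeps the matrix symmetric and enforces the secant condition $A_{k+1} s = y$ without requiring positive definiteness, which suits blocks with no sign definiteness. This minimal NumPy version, with a hypothetical function name, updates a single dense block; the thermodynamic (homogeneity) constraints and iterated projections of the TCH method are not included.

```python
import numpy as np


def psb_update(A, s, y):
    """Powell-symmetric-Broyden update of one block; A_new @ s == y."""
    r = y - A @ s                 # secant residual of the current block
    ss = s @ s
    return (A
            + (np.outer(r, s) + np.outer(s, r)) / ss
            - (r @ s) * np.outer(s, s) / ss ** 2)
```

In a block-wise hybrid scheme, each small dense block would be updated independently with the step and gradient change restricted to the variables of that block.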

Biegler et al. [2,11] use the modified BFGS formula exclusively to approximate the projection of the Lagrangian Hessian matrix on the tangent subspace defined by the linearized constraints in RND SQP methods. Curvature information in the range space is generally neglected; however, [2] suggests the use of range space curvature and techniques for obtaining it.


Initialization of the Unknown Variables 
and Multipliers 


The initialization of the unknown variables and La- 
grange and Kuhn-Tucker multipliers is an extremely 
important aspect of the successful implementation and 
application of SQP methods to distillation optimiza- 
tion, regardless of whether full space or decomposition 
SQP methods are used, and can represent the differ- 
ence between success and failure in problem solving. 
'Good' initial values of the unknowns and multipliers often prevent infeasible quadratic programming subproblems. In many cases, a base design or simulation is available, in which the equality constraints are solved for a given set of specifications for the column (i.e., additional, usually two, equality constraints that exhaust the number of degrees of freedom); see [9]. This base
case simulation provides both feasible and qualitatively 
correct initial estimates of the unknown variables and, 
while feasibility is not strictly necessary, it does usually 
significantly improve the numerical performance of full 
space SQP methods on distillation optimization prob- 
lems. This is because much of the strong nonlinearity 
in distillation optimization is contained in the equality 
constraints (i. e., phase equilibrium and energy balance 
equations) and feasible starting points usually result in 
iterates that track the constraint surface more closely 
than infeasible starting points. Both full space and de- 
composition SQP methods benefit from feasible start- 
ing points. The base case simulation also identifies any 
active inequalities at the feasible starting point. 

Decomposition methods for distillation optimiza- 
tion can also make use of simulations to initialize the 
unknown variables. However, [11] uses an initializa- 
tion procedure from the simulation program to give an 
infeasible but linearly consistent starting point for dis- 
tillation optimization problems solved by the ‘tailored’ 
RND SQP method. 

Good initial estimates of the Lagrange and Kuhn-Tucker multipliers are also important to the application of SQP methods to distillation optimization. In [6] all initial Kuhn-Tucker multipliers are set to zero (unless the base case simulation suggests otherwise), and the Lagrange multipliers are initialized by solving the equations

$$J_E J_E^T \lambda = -J_E\, g(x_k),$$

where again $J_E$ is the part of the Jacobian matrix corresponding to the equality (or active) constraints. Because the number of equality constraints can be large and because the sparsity of $J_E J_E^T$ need not be anything like the sparsity of $J_E$, [6] suggests that the above equation be solved for a small 'model' column consisting of a condenser (top stage), reboiler (bottom stage), all feed stages and one stage between the condenser, reboiler and each feed tray, and that the resulting multiplier values then be distributed by equation type and section throughout the actual larger column. Thus for a 'conventional' column with one feed stage, there is one rectifying stage (between the feed and the condenser) and one stripping tray (between the feed and the reboiler), giving five stages in total. Lagrange multipliers for this 'model' column are calculated and then distributed by equation type within different sections of the column. That is, the Lagrange multipliers for the mass balance, energy balance and phase equilibrium equations for the condenser, reboiler and all feed stages are assigned the values calculated for the model column, while the Lagrange multipliers for the mass balance, energy balance and phase equilibrium equations in the rectifying section are all assigned the same respective values calculated for the single rectifying tray in the model column. The same distribution procedure is used for all stages in the stripping section of the column.
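Both parts of this initialization can be sketched. The first function solves $J_E J_E^T \lambda = -J_E\, g(x_k)$, which is intended for the small model column; the second copies model-column multipliers to a larger column with a single feed. The sketch assumes stage 1 is the condenser, stage $n_s$ the reboiler, and that each model stage's multipliers are stored as one array covering its mass balance, energy balance and phase equilibrium equations; the function names and the dictionary layout are hypothetical.

```python
import numpy as np


def initial_multipliers(J_E, g):
    """Least-squares multiplier estimate from J_E J_E^T lam = -J_E g."""
    return np.linalg.solve(J_E @ J_E.T, -J_E @ g)


def distribute_model_multipliers(model, n_stages, feed_stage):
    """Copy model-column multipliers stage by stage to the full column.

    model maps 'condenser', 'rectifying', 'feed', 'stripping' and
    'reboiler' to the multiplier array of the corresponding model stage.
    Stages are numbered 1 (condenser) to n_stages (reboiler).
    """
    stages = []
    for j in range(1, n_stages + 1):
        if j == 1:
            key = "condenser"
        elif j == n_stages:
            key = "reboiler"
        elif j == feed_stage:
            key = "feed"
        elif j < feed_stage:
            key = "rectifying"      # every rectifying tray gets the same values
        else:
            key = "stripping"       # every stripping tray gets the same values
        stages.append(np.asarray(model[key], dtype=float))
    return stages
```

For example, a seven-stage column fed on stage 4 receives, stage by stage, the condenser, rectifying, rectifying, feed, stripping, stripping and reboiler multipliers of the model column.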

Initial estimates of the Lagrange multipliers in decomposition SQP methods are not strictly required, since the initial projected Hessian matrix, $Z_k^T B_k Z_k$, does not require these values to be known. Moreover, the Lagrange multipliers, as well as any active Kuhn-Tucker multipliers, can easily be calculated from the relationship

$$Y_k^T J_E^T \lambda = -Y_k^T g(x_k)$$

once the unknown variables have been initialized.


Algorithmic and Other Implementation Issues 


The sparsity of the constraint Jacobian matrix, charac- 
teristics of the resulting quadratic programming sub- 
problems and the use of stabilization techniques such 
as line searching are also important in assembling the 
correct set of computer tools for the application of SQP 
methods to distillation optimization problems. 

The Jacobian matrix for the equality constraints in distillation optimization usually has a block tridiagonal structure, unless there are pumparounds. The inequality constraints, on the other hand, result in a diagonal structure for their part of the constraint Jacobian matrix. As a result, both full space and decomposition SQP methods must exploit the sparsity of the Jacobian matrix in distillation problems to keep storage (i.e., fill-in) and computational effort (arithmetic operations) tractable. Exploiting the sparsity of the constraint Jacobian matrix is necessary in full space SQP methods in order to store efficiently the linear operators used in solving the large recursive quadratic programming subproblems that occur. That is, sparsity must be exploited in the matrix factorizations of active set methods or in the natural operators of interior point methods for solving large quadratic programming problems. In decomposition SQP methods, on the other hand, the use of sparse matrix techniques significantly reduces both the storage of the Jacobian matrix and the storage and computational effort required to form (the factors of) the matrices $Y_k$ (i.e., the basis for the range), $[J_E Y_k]^{-1}$ and $Y_k [J_E Y_k]^{-1}$. D. Goldfarb [5] provides a good set of general guidelines for many of the issues related to the sparsity of both the constraint Jacobian and Hessian matrices in recursive quadratic programming.

In decomposition SQP methods, the projection of 
the Lagrangian Hessian matrix is almost always approx- 
imated by the modified BFGS formula [10] and thus 
the ‘reduced’ quadratic programming subproblems are 
positive definite and have a unique solution at each SQP 
iteration. This is a significant advantage in some re- 
spects but the BFGS formula can give slower conver- 
gence than desired at the SQP level of the computa- 
tions in distillation optimization because it often has 
difficulty tracking the strong curvature of the noncon- 
vex constraint surface. In full space SQP methods, the 
projected Hessian Lagrangian matrix can be either pos- 
itive definite or indefinite depending on the way in 
which it is approximated. If Levenberg-Marquardt or
modified Schur complements are used in conjunction
with sparse factorizations such as Cholesky factorization,
positive definiteness can be maintained and convex
quadratic programs result. On the other hand, when
a hybrid approach is used, the blocks of $A_k$ usually have
no sign definiteness in distillation optimization because 
the phase equilibrium and energy balance equations are 
nonconvex and strongly nonlinear. Thus the resulting 
(projection of the) Hessian matrix of the Lagrangian 
function can be indefinite and indefinite quadratic pro- 
grams result, which can be difficult to solve. 

Stabilization techniques such as line searching [2] 
and trust region methods [8,9] have been used in distil- 
lation optimization. Biegler [2] suggests the use of line 
searching techniques such as the Armijo rule (see [10]) 
but gives few details on associated numerical perfor- 
mance. Lucia et al. [9] recommend the use of ‘asymmet- 
ric’ trust region methods in distillation optimization to 
improve numerical performance and also alleviate dif- 
ficulties associated with infeasible quadratic programs. 
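The Armijo rule mentioned above accepts a trial step only if it produces a sufficient decrease in the function being monitored. A minimal backtracking sketch follows; the sufficient-decrease constant `c1 = 1e-4` and the halving factor are common textbook defaults assumed here, not values taken from [2] or [10], and `phi` stands for whatever merit or line search function the SQP code monitors.

```python
def armijo_backtracking(phi, x, d, dphi0, alpha0=1.0, c1=1e-4, shrink=0.5, max_iter=50):
    """Backtrack from alpha0 until the Armijo sufficient-decrease test holds:
    phi(x + alpha*d) <= phi(x) + c1 * alpha * dphi0,
    where dphi0 < 0 is the directional derivative of phi at x along d.
    """
    if dphi0 >= 0:
        raise ValueError("d is not a descent direction for phi")
    phi0 = phi(x)
    alpha = alpha0
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if phi(trial) <= phi0 + c1 * alpha * dphi0:
            return alpha
        alpha *= shrink  # step too long: shrink it and try again
    raise RuntimeError("no acceptable steplength found")


# Usage: phi(x) = x**4 from x = 1 along the steepest descent direction d = -4.
alpha = armijo_backtracking(lambda v: v[0] ** 4, [1.0], [-4.0], dphi0=-16.0)
```

Here the full step and the half step overshoot the minimizer, and the rule settles on alpha = 0.25.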
Comments on the Numerical Performance
of SQP Methods on Distillation Problems 


There are a limited number of papers in the literature 
on the optimization of distillation systems using SQP 
methods. Lucia and Kumar [6] minimized the operating costs of a methanol recovery column with 10 equilibrium stages and two components using SQP methods in which the Lagrangian Hessian matrix was approximated by a partial Newton method, the SOH and TCH methods, and all analytical second derivatives (i.e., Wilson's method). They report failure for all methods except the thermodynamically constrained hybrid
method on this relatively small problem involving 50 
unknown variables. They [7] subsequently applied the 
same SQP methods, with the exception of the partial 
Newton method, to a set of five distillation examples 
ranging in size from 35 to 176 unknown variables and 
report good numerical performance for the thermo- 
dynamically constrained hybrid method. In particular, 
they discuss the advantages of using feasible starting 
points for the unknown variables and 'good' qualitative estimates of the Lagrange multipliers. Reference [8] reiterates the need for feasible starting points and also discusses the line searching difficulties and uphill search directions that occurred in the examples studied in [7]. See [9] for the numerical performance of the partial Newton, SOH, TCH, Wilson and range and null space decomposition (RND) SQP methods on a set of 15 examples, some of which contain strongly nonideal (and therefore nonlinear) extractive and azeotropic distillations. In this study, most methods performed quite well, with a slight advantage going to Wilson's method over the TCH method. In terms of reliability and efficiency, the RND method performed worst of all on this set of examples, followed by the partial Newton method and then the SOH and TCH methods. See [9] also for the failure of line searching techniques such as Armijo's rule and an augmented Lagrangian line search function, the occurrence of infeasible linearized constraint sets, and the usefulness of 'asymmetric' trust regions in forcing convergence when difficulties arise. See, on the other hand, [11] for numerical results for two binary and two ternary distillations, ranging in size from 60 to 252 unknown variables. Few numerical details are presented with regard to the physical properties models used, although the mixtures studied can be considered ideal, and some discussion of infeasible quadratic programs arising from 'poor' starting points is given.


See also 


> Feasible Sequential Quadratic Programming 

> Optimization with Equilibrium Constraints: 
A Piecewise SQP Approach 

> Sequential Quadratic Programming: Interior Point 
Methods for Distributed Optimal Control Problems 

> Successive Quadratic Programming 

> Successive Quadratic Programming: Applications in 
the Process Industry 

> Successive Quadratic Programming: Decomposition 
Methods 

> Successive Quadratic Programming: Full Space 
Methods 

> Successive Quadratic Programming: Solution by 
Active Sets and Interior Point Methods 
This article discusses the use of successive quadratic
programming (SQP) in industry, together with tech- 
niques for mathematical optimization and process 
modeling to improve economic performance of plants 
in process industries. First, different types of flowsheet- 
ing optimization problems based on SQP methods and 
process models are introduced briefly. Then a number 
of process optimization formulations and strategies are discussed, along with how the SQP algorithm needs to be developed and extended to take advantage of large-scale systems. In particular, the development of reduced Hessian SQP (rSQP) is presented along with different variants. Finally, literature on industrial and academic applications of SQP and rSQP is given.


Introduction 


Complex engineering models can be formulated as large systems of differential and algebraic equations, constructed by linking smaller submodels. Linking these models into a larger system leads to the flowsheet optimization problem, whose general structure, with its different models and connections, is shown schematically in Fig. 1.

The mathematical representation of the flowsheet optimization and simulation problem shown symbolically in Fig. 1 can be written as

$$
\begin{aligned}
\min \quad & \sum_{i} f(w_{M_i}, y_{M_i}, u_{M_i}) \\
\text{s.t.} \quad & M_1(w_{M_1}, y_{M_1}, u_{M_1}) = 0 \\
& \qquad \vdots \\
& M_n(w_{M_n}, y_{M_n}, u_{M_n}) = 0 \\
& C(w_{M_i}, y_{M_i}, u_{M_i}) = 0, \quad i = 1, \dots, \#\text{units} \\
& w \in W,\; y \in Y,\; u \in U,
\end{aligned}
\tag{1}
$$

where $M_i$ are the chemical process models, which can be solved with specialized solution strategies. Also, $w_{M_i}$ are the internal variables inside each model $M_i$, $y_{M_i}$ are the input stream variables, and $u_{M_i}$ are the decision variables. Here $C(w, y, u) = 0$ includes the additional constraints that arise from the coupling of models, and the sets $W$, $Y$ and $U$ represent lower and upper bounds on their respective variables.

In these optimization problems, the overall equation system usually results in very large systems of algebraic equations and variables (typically, $10^4$ to $10^6$) with relatively few degrees of freedom (typically, fewer than 100). The solution and optimization of these models are frequently carried out by calculation procedures that exploit their equation structure. Because it requires fewer iterations and because of its flexibility in interfacing to process models, successive quadratic programming has arguably become the most popular method for solving these nonlinearly constrained optimization problems.

Fig. 1: General representation of flowsheet optimization problem

The objective of this article is to characterize the use of SQP in different techniques for process flowsheet optimization problems and to give an overview of industrial applications of these techniques. We also point out some remaining difficulties that still prevent the application of simultaneous optimization to very large systems. In Section 2 we present the SQP algorithm and discuss how it is interfaced to process models. Section 3 discusses some large-scale extensions to SQP and focuses on the rSQP technique. The flowsheeting modes, modular and equation oriented (EO), are described in Section 4 in order to provide more detail on process optimization problems. This is followed in Section 5 by some industrial examples of both off-line and on-line optimization. Conclusions and directions for future research are then given in Section 6.


Successive Quadratic Programming 


In this section we examine the underlying ideas of 
the SQP method and the theory that establishes it as 
a framework from which effective algorithms can be de- 
rived. In addition, an excellent review of the develop- 
ment of the SQP algorithm can be found in [9]. For pro- 
cess optimization we describe the most popular mani- 
festations of the method, discuss the theoretical proper- 
ties, and comment on their practical implementations. 
The nonlinear programming problem to be solved can be formulated as

$$
(\mathrm{NLP}) \quad
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{s.t.} \quad & c(x) = 0 \\
& x^L \le x \le x^U,
\end{aligned}
$$

where $f\colon \mathbb{R}^n \to \mathbb{R}$ is the objective function and $c\colon \mathbb{R}^n \to \mathbb{R}^m$ are the equality constraints; any nonlinear inequality constraints can be expressed through simple bounds and additional equality constraints by adding slack variables. The great strength of the SQP method is its ability to solve problems with nonlinear constraints. For this reason, it is assumed that (NLP) contains at least one nonlinear constraint function.

The basic idea of SQP is to model (NLP) at a given point, say $x_k$, by a quadratic programming subproblem, and then to use the solution to this subproblem to construct a better approximation $x_{k+1}$. This process is iterated to create a sequence of approximations that, it is hoped, will converge to a solution $x^*$. The key to understanding the performance and theory of SQP is the fact that, with an appropriate choice of quadratic subproblem, the method can be viewed as the natural extension of Newton and quasi-Newton methods [16] to the constrained optimization setting. Thus one would expect SQP methods to share the characteristics of Newton-like methods, namely, rapid convergence when iterates are close to the solution, but possibly erratic behavior, which needs to be carefully controlled, when iterates are far from a solution. While this correspondence is valid in general, the presence of constraints makes both the analysis and implementation of SQP methods significantly more complex.

Two additional properties of SQP methods should be pointed out. First, SQP is not a feasible-point method; that is, neither the initial point nor any of the subsequent iterates needs to be feasible (a feasible point satisfies all of the constraints of (NLP)). This is a major advantage, since finding a feasible point for nonlinear constraints may be nearly as hard as solving (NLP) itself. SQP methods can easily be modified so that linear constraints, including simple bounds, are always satisfied. Second, the success of SQP methods depends on the existence of rapid and accurate algorithms for solving quadratic programs. Fortunately, there are good procedures for solving quadratic programs. Indeed, when there are only equality constraints, the solution to a quadratic program reduces to the solution of a linear system of equations. When there are inequality constraints, a sequence of these systems must, in principle, be solved.
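The reduction of an equality-constrained QP to a single linear system can be sketched as follows. The QP has the form of subproblem (2) without bounds, with `A` holding the constraint gradients as columns and the Lagrangian convention $L = f + \lambda^T c$; the function name and the small data set are hypothetical.

```python
import numpy as np


def solve_equality_qp(W, g, A, c):
    """Solve  min_d g^T d + 0.5 d^T W d  s.t.  c + A^T d = 0  via its KKT system.

    A is n x m with the constraint gradients as columns. With the Lagrangian
    convention L = f + lam^T c, stationarity reads W d + A lam = -g.
    """
    n, m = A.shape
    kkt = np.block([[W, A], [A.T, np.zeros((m, m))]])
    rhs = np.concatenate([-g, -c])
    sol = np.linalg.solve(kkt, rhs)  # one linear solve, no iteration needed
    return sol[:n], sol[n:]


# Hypothetical data: W = I, g = (1, 1), single constraint d1 + d2 - 1 = 0.
d, lam = solve_equality_qp(np.eye(2), np.array([1.0, 1.0]),
                           np.array([[1.0], [1.0]]), np.array([-1.0]))
```

By symmetry the step is d = (0.5, 0.5), and the stationarity condition then gives the multiplier -1.5.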

A successful SQP algorithm also needs adaptive safeguards that deal with the difficulties arising in general problems. The algorithmic details for overcoming such difficulties, as well as more mundane questions (how to choose parameters, how to recognize convergence, and how to carry out the numerical linear algebra), are lumped under the term 'implementation'. Some SQP implementations are described in [6].

The basic algorithm for the SQP method can be 
summarized as follows: 


For each iteration k:

1. Evaluate the objective and constraint functions, $f(x_k)$ and $c(x_k)$, and their gradients.

2. Solve a quadratic programming (QP) problem to determine a search direction $d_k$ for the variables $x_k$. If a termination criterion is satisfied (i.e., $x_k$ is a KKT point), STOP.

3. Find a steplength that leads to a sufficient improvement toward the solution of (NLP). This is done either by a trust region or by a line search algorithm. In the case of a line search, set $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ is a steplength parameter. For the trust region method, constrain $d_k \in \Delta$, where the trust region $\Delta$ is adjusted, and set $x_{k+1} = x_k + d_k$.


Basic algorithm for the SQP method 
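The three steps above can be sketched on a small hypothetical problem, min $x_1 + x_2$ subject to $x_1^2 + x_2^2 = 2$, whose solution is $x^* = (-1, -1)$ with multiplier 0.5. This minimal sketch assumes exact second derivatives for $W(x_k)$, has no bounds so the QP in step 2 reduces to one KKT linear solve, and takes full steps $\alpha_k = 1$ in step 3 instead of a line search or trust region. It therefore only illustrates the fast local (Newton-like) behavior and carries no global convergence guarantee; all function names are illustrative.

```python
import numpy as np


def f_grad(x):                      # gradient of f(x) = x1 + x2
    return np.array([1.0, 1.0])


def cons(x):                        # c(x) = x1^2 + x2^2 - 2
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0])


def cons_grad(x):                   # n x m matrix A(x) of constraint gradients
    return np.array([[2.0 * x[0]], [2.0 * x[1]]])


def lagrangian_hessian(x, lam):     # W(x) = Hessian of f + lam^T c
    return 2.0 * lam[0] * np.eye(2)


def basic_sqp(x, lam, tol=1e-10, max_iter=20):
    for k in range(max_iter):
        g, c, A = f_grad(x), cons(x), cons_grad(x)          # step 1
        # Termination test: KKT residual of the current iterate.
        if np.linalg.norm(g + A @ lam) + np.linalg.norm(c) < tol:
            return x, lam, k
        W = lagrangian_hessian(x, lam)
        n, m = A.shape
        K = np.block([[W, A], [A.T, np.zeros((m, m))]])
        sol = np.linalg.solve(K, np.concatenate([-g, -c]))  # step 2: equality QP
        d, lam = sol[:n], sol[n:]                            # new multiplier estimate
        x = x + d                                            # step 3 with alpha_k = 1
    raise RuntimeError("SQP did not converge")


x, lam, iters = basic_sqp(np.array([-1.2, -0.8]), np.array([0.6]))
```

Started close to the solution, the iterates converge in a handful of iterations; from a poor starting point a merit function and line search or trust region would be needed.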


Turning to the main components of this algorithm, the QP subproblem for problem (NLP) can be formulated as

$$
\begin{aligned}
\min_{d} \quad & g(x_k)^T d + \tfrac{1}{2} d^T W(x_k) d \\
\text{s.t.} \quad & c(x_k) + A(x_k)^T d = 0 \\
& x^L \le x_k + d \le x^U,
\end{aligned}
\tag{2}
$$

where $g$ denotes the gradient of $f$, $W(x)$ denotes the Hessian of the Lagrangian function $L(x, \lambda) = f(x) + \lambda^T c(x)$, and $A$ denotes the $n \times m$ matrix of constraint gradients,

$$A(x_k) = [\nabla c_1(x_k), \dots, \nabla c_m(x_k)].$$


To establish global convergence for constrained opti- 
mization algorithms, i.e., convergence to KKT points 
from poor starting points, a way of measuring progress 
towards a solution is needed. For SQP this is done by 
constructing a merit function, a reduction in which implies that an acceptable step has been taken. For determination of the steplength, with either a trust region or a line search method, a merit function is used to balance the two goals of decreasing the objective function and satisfying the constraints of the nonlinear program (NLP). Choices for the merit function include the nondifferentiable $\ell_1$ merit function

$$\phi_\mu(x) = f(x) + \mu \|c(x)\|_1 \tag{3}$$

from [24], and the augmented Lagrangian function

$$\phi_\rho(x) = f(x) + \lambda(x)^T c(x) + \frac{\rho}{2} \|c(x)\|^2 \tag{4}$$

from [18].
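The two merit functions can be written directly as code. In this sketch the multiplier estimate $\lambda(x)$ of the augmented Lagrangian is passed in as a callable, and the quadratic term is taken as the squared Euclidean norm with a $\rho/2$ weighting; both choices, the function names and the problem data are assumptions for illustration.

```python
import numpy as np


def l1_merit(f, c, x, mu):
    """Exact penalty phi_mu(x) = f(x) + mu * ||c(x)||_1."""
    return f(x) + mu * np.sum(np.abs(c(x)))


def augmented_lagrangian(f, c, lam, x, rho):
    """phi_rho(x) = f(x) + lam(x)^T c(x) + (rho / 2) * ||c(x)||_2^2."""
    cx = c(x)
    return f(x) + lam(x) @ cx + 0.5 * rho * (cx @ cx)


# Hypothetical problem: f(x) = x1 + x2, c(x) = x1^2 + x2^2 - 2, evaluated at (1, 0).
def f(x):
    return x[0] + x[1]


def c(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0])


def lam_estimate(x):
    return np.array([0.5])  # fixed multiplier estimate, for illustration only


x0 = np.array([1.0, 0.0])
phi1 = l1_merit(f, c, x0, mu=2.0)
phi_aug = augmented_lagrangian(f, c, lam_estimate, x0, rho=10.0)
```

At $x = (1, 0)$ the constraint violation is $-1$, so the $\ell_1$ function adds $\mu$ to $f = 1$, while the augmented Lagrangian adds $-0.5 + 5$.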

Using these components, S.P. Han [24] proved that if a line search stepsize, $\alpha_k$, is chosen by decreasing an exact penalty function along the QP-computed search direction $d_k$, and the QPs are convex, solvable and bounded below, then the SQP algorithm converges to a KKT point from any starting point, implying global convergence. However, employing this
line search function often led to very small stepsizes, 
and consequently, slow convergence rates in the neigh- 
borhood of the solution. M.J.D. Powell [34] modified 
this procedure to introduce a less stringent line search 
function. However, this function was neither globally 
convergent, nor locally superlinear. Several researchers 
have considered the line search strategies for SQP. K. 
Schittkowski [37,38] and H. Yamashita [48] proposed 
an augmented Lagrangian merit function for the line 
search. 

R.H. Byrd and J. Nocedal [11] give an analysis of 
the two merit functions and their convergence proper- 
ties for a reduced Hessian algorithm. The augmented 
Lagrangian function has been used widely, but its performance is sensitive to the multiplier estimates and the penalty parameter $\rho$. On the other hand, the $\ell_1$ merit function can suffer from the Maratos effect (slow convergence) near the solution, although a nonmonotonic line search such as the watchdog technique [13] can be used to avoid this effect. This merit function also has the advantage of not requiring estimates of multiplier values at each iteration, even though the penalty parameter is usually based on Lagrange multiplier estimates. For this reason, a simpler measure is considered in [8] that does not require Lagrange multiplier estimates but still maintains descent properties for $\phi_\mu(x)$, as discussed later.

Finally, Byrd and Nocedal [11] summarize and ex- 
tend local convergence properties of SQP. In particular, 
if full steps are taken in the neighborhood of the solu- 
tion, a variety of superlinear convergence rates can be 
classified and these depend on how W(x) is calculated 
or approximated. 

Efficient SQP algorithms in the large scale case de- 
pend on carefully addressing many factors. Problems 
are considered large if, to solve them efficiently, either 
their structure must be exploited or the storage and ma- 
nipulation of the matrices involved must be handled in 
a special manner. The most obvious structure, and the 
most commonly considered, is the sparsity of the matri- 
ces. Typically in large scale problems most constraints 
depend only on a few of the variables and the objective 
function is ‘partially separable’, i.e., it is made of a sum 
of functions each of which depends only on a few of the 
variables. In such cases matrices are sparse. 

The formulation of the problem in terms of SQP 
and QP usage is the same as shown in (2). But the solu- 
tion strategy and passing of information about the Hes- 
sian make a significant difference. For the SQP method 
described above, the Hessian matrix is usually supplied 
in a dense form, since it is frequently approximated 
by a quasi-Newton updating formula. The search di- 
rection is determined by using the dense Hessian ap- 
proximation. When the problem becomes very large, 
passing this information and solving the QP can become
prohibitively expensive. 

For large-scale process optimization problems, we can distinguish two significant kinds of SQP algorithms: full space and reduced space methods. The first approach applies sparse, full space QP solvers, where natural problem structure can be exploited [28] based on analytic first and second derivative matrices. For large process models, however, second derivatives may be difficult to obtain, and there are generally few decision variables (i.e., degrees of freedom) despite the large model size. We therefore consider a reduced space SQP (rSQP) decomposition strategy, as described in the next section.


Reduced Hessian SQP Methods 


The reduced Hessian methods approximate only the portion of the Hessian relevant to a subspace of the variables. The advantages of these methods are that positive definite quasi-Newton updates can be used and that the dimension of the problem is reduced to $n - m$ (possibly a significant reduction). Several versions of a reduced Hessian type of algorithm have been proposed; they differ in the ways the multiplier vectors are chosen and the way the reduced Hessian approximation is updated.

In rSQP, the quadratic programming (QP) subproblem is reduced to a smaller QP in the space of the independent variables by introducing a nonsingular matrix of order $n$, which consists of two basis matrices and is written as $[Y_k \;\; Z_k]$, where $Y_k \in \mathbb{R}^{n \times m}$, $Z_k \in \mathbb{R}^{n \times (n-m)}$, and it is assumed that $A_k^T Z_k = 0$. Thus the matrix $Z_k$ is a basis for the tangent space of the constraints, and the solution can be expressed as

$$d_k = Y_k p_Y + Z_k p_Z \tag{5}$$

for vectors $p_Y \in \mathbb{R}^m$ and $p_Z \in \mathbb{R}^{n-m}$. Substituting (5) into the linear constraint defined in (2) and using $A_k^T Z_k = 0$, the constraint becomes

$$c_k + A_k^T Y_k p_Y = 0. \tag{6}$$

If $A_k$ is assumed to have full column rank, then the nonsingularity of $[Y_k \;\; Z_k]$ implies that the matrix $A_k^T Y_k$ is nonsingular, so that $p_Y$ can be determined from (6) as

$$p_Y = -[A_k^T Y_k]^{-1} c_k. \tag{7}$$

Loosely speaking, the $p_Y$ step serves to improve the solution of the equality constraints, while $p_Z$ acts in the null space of these constraints and serves to minimize the objective function.
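The decomposition (5) to (7) can be checked numerically. This sketch takes $Y$ and $Z$ from a complete QR factorization of $A$, an orthonormal choice assumed here purely for illustration (the text allows any nonsingular $[Y \; Z]$ with $A^T Z = 0$); the function name and data are hypothetical. Whatever $p_Z$ is chosen, the assembled step satisfies the linearized constraints.

```python
import numpy as np


def range_null_step(A, c, p_Z):
    """Return d = Y p_Y + Z p_Z with p_Y = -(A^T Y)^{-1} c, following (5)-(7).

    Y and Z come from a complete QR factorization of A (orthonormal bases,
    an illustrative choice). A (n x m) is assumed to have full column rank.
    """
    m = A.shape[1]
    Q, _ = np.linalg.qr(A, mode="complete")
    Y, Z = Q[:, :m], Q[:, m:]              # range-space and null-space bases
    p_Y = -np.linalg.solve(A.T @ Y, c)     # (7)
    return Y @ p_Y + Z @ p_Z, Y, Z         # (5)


A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # hypothetical n = 3, m = 2
c = np.array([1.0, -2.0])
d1, Y, Z = range_null_step(A, c, np.array([0.7]))
d2, _, _ = range_null_step(A, c, np.array([-3.0]))
```

Both `d1` and `d2` satisfy $c + A^T d = 0$; they differ only by a move along the null space, which is the freedom the reduced QP uses to lower the objective.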

There are a number of ways to form the basis matrices $Y$ and $Z$. One of the cheapest, which is well suited to large-scale decomposition, is to partition the variables $x$ into $m$ basic or dependent variables (which are reordered to be the last $m$ variables) and $n - m$ decision variables. This induces the partition

$$A(x)^T = [N(x) \;\; C(x)], \tag{8}$$

where the $m \times m$ basis matrix $C(x)$ is assumed to be nonsingular. $Z(x)$ and $Y(x)$ are now defined by

$$Z(x)^T = [\,I \;\; -N^T C^{-T}\,], \qquad Y(x)^T = [\,0 \;\; I\,]. \tag{9}$$

This choice is particularly popular [18,32] and advantageous when $A(x)$ is large and sparse, because a sparse LU decomposition of $C(x)$ can often be computed efficiently.
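The coordinate basis (9) can be built directly from the partition (8). In this sketch the dependent variables are assumed to be ordered last and dense solves stand in, for clarity, for the sparse LU factorization of $C$ that a large-scale code would reuse; the function name and the constraint matrix are hypothetical. The check $A^T Z = N - C C^{-1} N = 0$ confirms that $Z$ spans the null space, and $A^T Y = C$ is the nonsingular matrix needed in (7).

```python
import numpy as np


def coordinate_basis(A):
    """Build Z and Y of (9) from the partition A^T = [N C] of (8).

    The m dependent variables are assumed to be ordered last, and the
    m x m basis matrix C must be nonsingular.
    """
    n, m = A.shape
    AT = A.T
    N, C = AT[:, : n - m], AT[:, n - m :]
    Z = np.vstack([np.eye(n - m), -np.linalg.solve(C, N)])  # Z^T = [I, -N^T C^{-T}]
    Y = np.vstack([np.zeros((n - m, m)), np.eye(m)])         # Y^T = [0, I]
    return Z, Y, C


# Hypothetical 2 constraints in 4 variables; the last two columns of A^T form C.
AT = np.array([[1.0, 2.0, 1.0, 0.0],
               [3.0, 1.0, 0.0, 2.0]])
Z, Y, C = coordinate_basis(AT.T)
```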

If the variables are partitioned into independent ($x_I$) and dependent ($x_D$) variables, then the corresponding search direction can be defined as follows:

$$
\begin{aligned}
d^T &= [\,d_I^T \;\; d_D^T\,], && (10) \\
d_I &= p_Z, && (11) \\
d_D &= p_Y - C^{-1} N d_I, && (12)
\end{aligned}
$$

where $p_Y$ corresponds to a Newton step for the $m$ dependent variables and $m$ equality constraints, and $p_Z$ is computed by solving a QP subproblem much smaller than the original problem, as given below. The QP subproblem can then be expressed exclusively in terms of the variable $p_Z$:

$$
\begin{aligned}
\min_{p_Z} \quad & (Z_k^T g_k + w_k)^T p_Z + \tfrac{1}{2} p_Z^T B_k p_Z \\
\text{s.t.} \quad & x^L - x_k - Y_k p_Y \le Z_k p_Z \le x^U - x_k - Y_k p_Y,
\end{aligned}
\tag{13}
$$

where the reduced Hessian $Z_k^T W_k Z_k$ and the vector $Z_k^T W_k Y_k p_Y$ are given (or approximated) by $B_k$ and $w_k$, respectively. This decomposition reduces the Hessian matrix from order $n$ to order $n - m$, but we are still left with $n - m$ simple bounds on the variables $p_Z$ and $m$ bounds from the dependent variables, which are projected into the space of the independent variables. Details of the reduced Hessian SQP can be found in [11,27] and in [8,39].
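The reduced-space step (10) to (13) can be sketched under strong simplifying assumptions: no bound in (13) is active, so the reduced QP is unconstrained, and $B_k$ and $w_k$ are the exact quantities $Z_k^T W_k Z_k$ and $Z_k^T W_k Y_k p_Y$ rather than quasi-Newton approximations. Under those assumptions the step should coincide with the full-space equality-QP step, which the check below confirms. The function name and data are hypothetical.

```python
import numpy as np


def rsqp_step(W, g, A, c):
    """Reduced-space step from (9)-(13) with the coordinate basis.

    Assumes the dependent variables are ordered last, C is nonsingular,
    the exact reduced Hessian B = Z^T W Z is positive definite, and no
    bound in (13) is active, so the reduced QP is unconstrained.
    """
    n, m = A.shape
    N, C = A.T[:, : n - m], A.T[:, n - m :]
    CinvN = np.linalg.solve(C, N)
    Z = np.vstack([np.eye(n - m), -CinvN])
    Y = np.vstack([np.zeros((n - m, m)), np.eye(m)])
    p_Y = -np.linalg.solve(C, c)              # (7), since A^T Y = C here
    B = Z.T @ W @ Z                           # reduced Hessian B_k
    w = Z.T @ W @ Y @ p_Y                     # cross term w_k
    p_Z = -np.linalg.solve(B, Z.T @ g + w)    # unconstrained minimizer of (13)
    d_I = p_Z                                 # (11)
    d_D = p_Y - CinvN @ d_I                   # (12)
    return np.concatenate([d_I, d_D])         # (10)


# Hypothetical data: n = 3 variables, one constraint with gradient (1, 1, 1).
W = np.diag([2.0, 1.0, 3.0])
g = np.array([1.0, -1.0, 0.5])
A = np.ones((3, 1))
c = np.array([0.3])
d = rsqp_step(W, g, A, c)
```

Only a $2 \times 2$ reduced system and solves with $C$ are needed here, which is the point of the decomposition when $n - m$ is small.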


Multiplier-Free Reduced Hessian SQP

In the conventional rSQP method (see, e.g., [11]), Lagrange multipliers are calculated by

$$\lambda_k = -(Y_k^T A_k)^{-1} Y_k^T (g_k + \pi_k), \tag{14}$$

where $\pi_k$ are the bound multipliers. In process optimization models where the model equations, variables and constraint gradient matrices are not directly accessible, we can develop a nonlinear programming method that requires neither second derivatives nor Lagrange multiplier estimates for the model equations. This condition is taken into account in [7], where the 'multiplier-free' reduced Hessian algorithm is derived and presented formally for problem (NLP). In this approach, $f$ and $c$ are assumed to be smooth functions with $n, m \gg n - m$, and the first derivatives of $f$ and $c$ are available. At iterate $x_k$, the SQP method given by (7)-(13) generates a search direction $d_k$ by solving the QP subproblem, and the steplength is found with an exact penalty line search. Generally the condition $\mu_k > \|\lambda_k\|$ is used to ensure a descent property [11]. Instead, the multiplier-free approach can be used by noting from (7) and (14) that

$$\lambda_k^T c_k = (g_k + \pi_k)^T Y_k p_Y. \tag{15}$$

So to ensure a descent property, one only needs to choose

$$\mu_k > \frac{|(g_k + \pi_k)^T Y_k p_Y|}{\|c_k\|_1}. \tag{16}$$

Finally, current SQP methods incorporate either trust 
region or line search strategies to promote global con- 
vergence behavior. The line search is more efficient in 
determining proper steplengths while the trust region 
is essential to avoid poor search directions. Because of 
the trade-offs in using either method, D. Ternet and 
L.T. Biegler [42] incorporated a combined line search 
and trust region approach. Details of the application of trust region and line search methods can be found in [42].


Solving the QP Subproblem 


At the heart of the SQP optimization algorithm is the formulation and solution of the quadratic program (2) (or its reduced counterpart (13)). This step strongly influences the accuracy and efficiency of the algorithm. Aside from the effort required to evaluate the function values and gradients, this step is usually the most computationally intensive step in the SQP or rSQP algorithm.

QP algorithms based on active set strategies can be classified into primal and dual approaches. In the primal space approach, a feasible point is determined first and succeeding directions are then taken to reduce the quadratic objective function. This approach requires a positive definite projected Hessian. An early primal code was VE02AD, developed by R. Fletcher [18] and incorporated in the Harwell subroutine library in 1972 [25]. A popular and very reliable primal QP code, QPSOL, was developed by P.E. Gill and W. Murray [21], who also extended this strategy in the codes LSSOL (1988) and QPOPT (1996).

A very efficient dual space QP strategy was developed by D. Goldfarb and A. Idnani [22]. Here, a positive definite Hessian matrix is required, but no initial feasible point is needed; instead, a dual feasible point is calculated first. This can save considerable effort in the SQP or rSQP algorithms. This approach was incorporated into two QP codes, the Harwell code VE17AD by Powell and QPKWIK [39]. In addition, QPKWIK allows for direct updating of the inverse Cholesky factor of the reduced Hessian matrix. This and other features within QPKWIK allow the reduced Hessian method to perform better with QPKWIK than with QPSOL or VE17AD as $n - m$ becomes larger.


Modeling Modes for Process Flowsheets 


Using the process models described in (1) and in Fig. 1, there are three problem types frequently considered by process engineers: simulation, design and optimization problems. In the simulation problem, the variables associated with the feed streams and the design variables of the units ($u$ in the constraints in (1)) are specified. The unknowns are the remaining variables ($y$ and $w$ in the constraints in (1)). In this procedure, it is implicitly assumed that the number of variables to be determined is equal to the number of equations, so that the system can be solved; any adjustment of the remaining decision variables can be left to an outer optimization loop. In addition, design problems require the specification of additional constraints (such as production rates, product yields and purities) in the flowsheet and the freeing up of additional decision variables ($u$) to satisfy these constraints.

In the optimization problem for process flowsheets, variables associated with the feed streams and design variables may be left unspecified and a cost function is added to the model in (1). The unspecified variables ($u$) are determined so as to minimize the cost function. In this case, both equality and inequality constraints may be present, and their number may differ from the number of unspecified parameters. At the simplest (and least efficient) level, the optimization approach is an iterative procedure consisting of the following steps:

a) Fix the $n - m$ degrees of freedom (independent variables, $u$).

b) Solve the $m$ equations for the $m$ remaining variables ($w$ and $y$). This is the flowsheet simulation problem.

c) Evaluate the objective function in (1), and adjust the $n - m$ variables to minimize it and satisfy the bounds in (1).

d) Repeat from step a).
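The black-box procedure a) to d) can be mimicked with a one-variable toy flowsheet. In this hypothetical sketch the simulation in step b) solves the single model equation $y^3 + y - u = 0$ for $y$ by Newton's method, the cost $(y - 1)^2 + 0.01 u^2$ is invented for illustration, and a golden-section search over the bounds $0 \le u \le 5$ stands in for the outer optimizer of steps c) and d). Every objective evaluation reconverges the model, which is the repeated-simulation cost of the black-box approach.

```python
import math


def simulate(u, y0=1.0, tol=1e-12, max_iter=50):
    """Step b): solve the (hypothetical) model equation y**3 + y - u = 0 for y by Newton's method."""
    y = y0
    for _ in range(max_iter):
        r = y ** 3 + y - u
        if abs(r) < tol:
            return y
        y -= r / (3.0 * y ** 2 + 1.0)
    raise RuntimeError("simulation failed to converge")  # no model solution at this u


def objective(u):
    y = simulate(u)                        # converge the model at fixed u
    return (y - 1.0) ** 2 + 0.01 * u ** 2  # step c): evaluate the (hypothetical) cost


def golden_section(fun, lo, hi, tol=1e-8):
    """Outer loop (steps c and d): adjust the decision variable within its bounds."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = fun(x1), fun(x2)
    while b - a > tol:
        if f1 < f2:                        # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = fun(x1)
        else:                              # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = fun(x2)
    return 0.5 * (a + b)


u_opt = golden_section(objective, 0.0, 5.0)
```

In this toy the model always has a solution; in a real flowsheet, step b) can fail at intermediate values of $u$, which is the weakness of the approach noted in the text.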

Since the variables are determined by the solution of the equations in step b), the equations themselves are satisfied exactly within the convergence tolerance of the solution procedure. We may regard the equations as serving to eliminate $m$ variables from the optimization problem. This approach treats the process model as a 'black box' and requires the repeated solution of the simulation problem. However, this approach can fail if decision variables are chosen at intermediate points for which the simulation model has no solution.

The advantage of SQP is that it can be interfaced with process models more flexibly than the black-box strategy, and it allows simultaneous optimization and simulation. This can be seen by considering two different modes of model formulation and solution, the modular and equation oriented approaches. These are illustrated in Fig. 2 and described next.


Modular (Closed Form) Approach 


With this approach, the modeling equations are grouped according to the individual units in the process, and specialized solution strategies are applied to each unit. Calculation then proceeds sequentially from one unit to the next. With the modular approach, there are many widely tested models and procedures. Solution procedures are unit specific and locally robust. Initialization is also straightforward. However, this modeling mode requires tearing of the recycle streams. Here one iterates on tear variables that are sufficient to permit the remaining variables to be calculated. Since the flowsheet consists of self-contained modules, recycle convergence is usually performed by slowly converging techniques. Moreover, the calculation of derivatives from individual process models involves perturbing and resimulating the entire flowsheet with respect to the decision variables. This process is both time-consuming and subject to errors due to possible internal convergence failures during the model solution stage.
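Tearing a recycle loop turns flowsheet convergence into a fixed-point problem on the tear variables. The sketch below uses direct (successive) substitution, assumed here as the simplest example of the slow convergence techniques mentioned above, on a hypothetical mixer-reactor-separator loop with one torn stream, the recycle flow R. Each pass re-runs the three unit calculations in sequence; the error shrinks by the loop gain (here 0.9 × 0.4 = 0.36) per pass, so convergence is only linear. The converged recycle flow is 0.36 · 100 / 0.64 = 56.25.

```python
def converge_tear_stream(feed=100.0, conversion=0.6, recycle_frac=0.9,
                         tol=1e-10, max_iter=200):
    """Direct (successive) substitution on a single tear stream, the recycle flow R.

    Hypothetical loop: mixer (F + R) -> reactor (fraction 'conversion' reacts)
    -> separator (fraction 'recycle_frac' of the unreacted flow returns as R).
    """
    R = 0.0                                     # initial guess for the torn stream
    for it in range(1, max_iter + 1):
        mixed = feed + R                        # mixer module
        unreacted = (1.0 - conversion) * mixed  # reactor module
        R_new = recycle_frac * unreacted        # separator module
        if abs(R_new - R) < tol:
            return R_new, it
        R = R_new
    raise RuntimeError("tear stream did not converge")


R, iterations = converge_tear_stream()
```

Roughly 25 passes through the whole loop are needed for this tolerance, which illustrates why nesting such a loop inside every objective evaluation is expensive.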

The modular approach is the most common simulation technique employed in industrial environments for off-line design and analysis. For example, in three common process simulation codes (ASPEN, PRO/II and HYSYS), the optimization problem is solved by first calculating the process models before evaluating the constraints and objective function. This black-box technique is referred to as the feasible path (FP) approach and represents a two-tiered strategy to optimization. The optimization problem is solved in an outer loop, while the simulation equations are converged in an inner loop. Note that the recycle equations need to be solved every time the objective function is evaluated.

On the other hand, these tear stream equations and variables can also be added as constraints in the optimization problem, which leads to a more efficient NLP strategy than the black-box approach. Termed the infeasible path approach, this strategy performs convergence of the recycle loops simultaneously with optimization of the flowsheet. This capability has been added to a number of process simulators (see Table 2), and the optimizer has enjoyed some success in industry for the optimization of novel process and equipment designs. A detailed derivation of this approach, along with a description of several flowsheeting case studies, is given in [6].


Equation Oriented (Open Form) Approach 


With the equation oriented approach, the process model equations are considered as a single large set of equations to be solved with a large-scale nonlinear algorithm. For process optimization, B.A. Murtagh [31] offered the viewpoint in which the optimization is embedded within the solution procedure. Here the nonlinear equations describing the entire system become a set of nonlinear equality constraints, giving rise to a large nonlinear programming problem with a mixed set of large sparse linear and nonlinear constraints. To distinguish it from the modular or closed form approach, it is called the simultaneous or open form approach. This approach involves the simultaneous linearization of all the equations and iteration on all the variables, using the Newton-Raphson method or some variation thereof. In this case, we must be able to solve very large systems of sparse linear equations efficiently. In solving such systems, the use of sparse matrix techniques is a necessity.

The EO approach, when used with multiple mod- 
els, does not exploit individual model structure and the 




Successive Quadratic Programming: Applications in the Process Industry 


entire burden for solution is on a general-purpose Newton solver. Initialization might be difficult, but very efficient methods exist for partitioning and tearing large sets of algebraic equations. Also, objective function and constraint derivatives are usually available analytically from the large system. Therefore, the advantage of the equation oriented approach is that it avoids multiple levels of iteration, one for solving the equations describing the system and one for optimization. On the other hand, the describing equations are not necessarily satisfied exactly until convergence is approached (although it is possible to make allowance for instances where this causes difficulty).

Several EO programs were developed, including ASCEND (Carnegie Mellon Univ.), QUASILIN (Cambridge Univ.), FLOWSIM (Univ. Connecticut), and SPEEDUP (Imperial College). Given recent advances in software engineering and object oriented structures, equation oriented process simulation packages have also been made commercially available (e.g., RTOPT, SPEEDUP, NOVA, gProms). However, because equation-based process models are harder to set up and initialize, these packages are generally more difficult to use than modular simulators.

With these two simulation modes, we see a number of trade-offs. The modular mode deals with large, detailed models; convergence of the optimization problem occurs at multiple levels and can be time-consuming. However, initialization and problem formulation are generally easy and intuitive for the process engineer, and the solution strategies are robust. Consequently, this approach is used as a general purpose optimization strategy for off-line design and analysis of large scale chemical processes.

On the other hand, the equation oriented strategy provides a truly simultaneous approach to process optimization and can be much more efficient. Nevertheless, initialization and process modeling are somewhat specialized to the process application, and general-purpose detailed models may often be simplified. As a result, EO approaches are more common for on-line optimization, including refineries, olefin plants and power stations.

In the next section, we provide a brief history of the application of SQP to these problem types. We then describe a number of examples for both on-line and off-line process optimization.


Application of SQP Optimization 
in Industrial Problems 


The first appearance of SQP can be traced back to [45] and [4], but numerical difficulties hampered widespread application. In particular, this was due to conceptual weaknesses, such as lack of global convergence, nonconvex QPs and unreliable QP solvers. As a result there was little initial development of SQP until the late seventies. Nevertheless, in 1968, J.D. Simon developed a general purpose nonlinear optimizer within Exxon, based upon the successive solution of quadratic programming (QP) problems approximating the given problem. The program, called ECO (Exxon Computerized Optimizer), was put into production status in 1969 and made available to Exxon's worldwide affiliates. In 1970, the code was revised to handle a gas field optimization problem with over 300 variables for the Exxon Production Research Company. The code proved so successful that a special version was linked to a reservoir simulation system and marketed outside of Exxon. In 1972, a revised Exxon production version was released that incorporated additional features and was more user friendly. However, this software was not pursued during the 1980s due to lack of economic justification [40].

On the other hand, by 1977, the application of 
quasi-Newton methods and analysis of exact penalty 
functions led to the efficient SQP algorithms by Han 
and Powell. From this starting point, the next decade 
saw algorithmic developments by A. Conn, Fletcher, 
Gill, Murray, Nocedal and many others, which led to 
advanced features including convergence properties for 
a variety of merit functions, applications of trust region 
and line search globalizations for constrained optimiza- 
tion, and efficient factorization and decomposition for 
large scale problems. 

Applications of SQP in process engineering began
in 1980 and include contributions from A. Westerberg
and coworkers, R.W.H. Sargent, Biegler, A. Lucia, S. 
Macchietto, M.A. Stadtherr, W. Morton, B. Kalitvent- 
zeff and others. New algorithms have been developed, 
existing ones have been refined, some good software 
has been developed, and there has been some computa- 
tional experience and practical applications. All of these 
academic efforts have paved the way for effective large 
scale, on-line optimization strategies, discussed next. 


On-line Process Optimization


On-line (or real-time) optimization requires the solu- 
tion of nonlinear programs that describe the steady 
state operation of a chemical process. This problem 
can be solved every few hours and operating condi- 
tions (e.g., setpoints to the control system) can be 
updated in the process to improve operation based 
on a profit function. Current industrial applications of model-based real time optimization (RTO) address complex chemical plants. T.E. Marlin and A.N. Hrymak [29] list the following features of a plant that favor the application of RTO:

1) adjustable optimization variables exist after higher 
priority safety, quality and production rate objec- 
tives have been achieved; 

2) profit changes significantly as values of the opti- 
mization variables are changed; 

3) disturbances occur frequently enough for real-time 
adjustments to be required; 

4) determining the proper values for the optimization 
variables is too complex to be achieved by selecting 
from several standard operating procedures. 

Systems for on-line optimization have been developed and used since about 1980, but success or failure in industrial applications largely went unnoticed until the 1990s. After that, a few publications appeared in the literature (e.g., [3]; [2]). Still, much of the mathematical programming technology has not been documented outside of industrial corporations. Therefore, it is impossible to fully survey and discuss all the industrial applications here. Rather, we highlight a few major areas in which progress is being made, and point out a few references for further detail.

In particular, applications grew in scope and size 
as computing power to support such activities became 
available. From 1990 onwards, there has been a signif- 
icant growth in the application of on-line optimization 
systems in the process industries. The following application milestones show the growth of SQP-based applications for real-time optimization [2]:

• 1980s: in-house developments at DSM, ICI, Shell (> 20000 equations);

• 1986: Shell ‘Opera’ package for ethylene plants;

• 1988: first DMO application: SUNOCO hydrocracker;

• 1991: Lyondell integrated refinery;

• 1994: Mobil and Mitsubishi Chemical applications (over 200000 equations);

• 1996: Aspen/DMC/Setpoint mergers.

Note that with the 1996 merger, 80-90% of real-time optimization applications are implementations by Aspen Technology, Inc. A recent and comprehensive review of the issues related to real-time optimization (RTO) may be found in [29]. The components of current research on RTO of large scale continuous processes based on steady state models are reviewed in [33]. In that article, the issues involved in the design of RTO systems are discussed, particularly with respect to structural decisions, e.g., the choice of measurements to be used to monitor plant performance and update the optimization model, and the level of model complexity to be used in the RTO system. From published studies, summarized in Table 1, a successful RTO application delivers about 3% of the value added by the plant in economic benefits ([15]; [20,35,43]; [26]). However, it should be noted that published applications cover only a small part of the full range of manufacturing plants employed in the process industries, namely large scale continuous plants in the petroleum and petrochemical sectors [33].

Table 1
Some industrial case studies with SQP optimization (real-time optimization)

Company                        Application                     Results
Shell Oil (1986)               Ethylene Plant                  $4.0M/yr
Wilton (1988)                  Power Station                   2-6%
Amoco (1990)                   Gas Plant                       $4.0M/yr
British Petroleum              Refinery                        $2.5M/yr
Chevron USA (1990)             Ethylene Plant                  5-10%
Star Enterprise (1990)         Crude Unit                      $3.0M/yr
Shell Oil                      Refinery                        9% in gasoline production
Texaco (1990)                  Refinery                        $4.0M/yr
Lyondell (1991)                Ethylene Plant                  9 month payout
OMV Deutschland (1991)         Ethylene Plant                  1-3%
Conkwright (1994)              Petroleum Crude Distillation    $1-2M/yr
Ic                             Industrial Steam & Power        $1.5M/yr
DMC                            Ethylene Plant                  Payback in less than one year
Sunoco (1995)                  Hydrocracker                    $1M/yr
Divekar & Lepore (1991, [17])  Ethylbenzene/Styrene            $1-2.6M/yr

As seen in Table 1, many successful industrial applications of RTO have been reported. For instance, impressive optimizations were implemented at the SUNOCO refinery in Sarnia, Canada, which was recognized by a 1995 Computerworld Smithsonian Award for innovative information technology in manufacturing. The examples in Table 1, together with their economic benefits, suggest the wide range of processes on which RTO has been successful.


Off-Line Process Optimization 


Optimization for process design is a difficult and com- 
plicated task. Here, most discussion of process opti- 
mization in the literature focuses on the problem: given 
certain operating objectives such as throughput, utilities 
availabilities, product specifications, what are the best 
sizes of equipment and operating conditions to mini- 
mize an appropriate combination of capital and oper- 
ating costs? 

To aid in the design task, detailed, comprehensive simulation platforms have been developed (see [6] for a review). Moreover, over the past two decades there has been an almost complete shift from in-house development and maintenance of simulation packages, e.g., within an operating petrochemical company, to vendor-supplied software. Table 2 presents a short summary of current process simulation tools with SQP optimization. All but the last three are in-house packages; the last three entries are vendor software, which commands most of the usage for design and optimization.

Unlike real-time optimization, these off-line process simulation programs are now part of every process engineer's toolkit and have also been widely integrated into the chemical engineering academic curriculum. Moreover, while RTO models remain specialized applications with only a small group of model developers, process simulation tools are available on every engineer's desk, at least in large operating companies, and are used for most day-to-day modeling tasks. These include design of new processes, retrofits of existing ones, debottlenecking of process operations, and analysis for operability and control.

Table 2
SQP optimization for process design (off-line process optimization)

Developer     Simulator     Optimization strategy
ICI           Flowpack      SQP with modular mode
              Genesys       SQP with modular mode
              QUIKBAL       SQP with modular mode
              PRO/II        SQP with modular mode
Hyprotech     HYSYS         SQP with modular mode
              ASPEN Plus    SQP with modular mode

While both RTO and off-line process simulation represent steady state process models, off-line models tend to be much more detailed and rigorous. This is because these models need to serve much more general applications and because there are no on-line data with which to adjust parameters. As a result, these models are much more difficult to solve, and a robust modular mode is preferred, particularly if detailed sizing and costing programs are involved.

Broadly speaking, modular process simulation tools 
can be classified into four levels: 

1) At the lowest level, basic physical properties and 
thermodynamic relationships (e.g., phase equilib- 
rium, energy balance terms, transport relationships) 
have been incorporated. These contain the vast ma- 
jority of process equations and these are solved with 
specialized solution algorithms. 

2) At the next level are the basic building blocks for the process units, including distillation, heat exchange, reaction and material transfer. These blocks consist of mass and energy balances as well as constitutive equations, solved with specialized procedures; they also rely heavily on underlying physical property equations.

3) This level deals with the convergence of the over- 
all flowsheet. Here process units are sequenced, tear 
streams are chosen, their values are updated and recycle loops are converged. It is at this level that flowsheet optimization is introduced, since the SQP algorithm extends the overall convergence function by
incorporating tear equations as equality constraints 
in the optimization problem. Often this problem 
is fairly small (fewer than 100 variables and con- 
straints) and a dense SQP algorithm without decom- 
position is usually satisfactory. Here the dominant 
computational cost for the optimization lies in the 
evaluation of the objective and constraint functions 
(and gradients) from the process model. 

4) The process simulator is capped with a graphical 
user interface that communicates with the process 
engineer in setting up and solving the simulation 
and optimization problem. 
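The level-3 idea, in which tear equations become equality constraints of a small, dense SQP problem, can be sketched on a hypothetical mixer, reactor and splitter loop. The sketch assumes SciPy and uses its `minimize(method="SLSQP")`, a readily available dense SQP code, as a stand-in; it is not the optimizer of any simulator in Table 2. All unit models, numbers and function names (`flowsheet_pass`, `tear_equation`) are illustrative assumptions. Because the tear equation is only a constraint, the recycle loop need not be converged at intermediate iterates.

```python
import numpy as np
from scipy.optimize import minimize

FRESH_FEED = 100.0     # hypothetical fresh feed flow
RECYCLE_FRAC = 0.9     # hypothetical fraction of unreacted material recycled

def flowsheet_pass(t, u):
    """One sequential pass through hypothetical unit modules.

    t: guessed tear (recycle) stream flow; u: reactor conversion (decision variable).
    Returns the product flow and the recomputed recycle flow t_calc.
    """
    mixed = FRESH_FEED + t                 # mixer
    product = u * mixed                    # reactor: fraction u converted to product
    unreacted = (1.0 - u) * mixed
    t_calc = RECYCLE_FRAC * unreacted      # splitter: the remainder is purged
    return product, t_calc

def objective(v):
    t, u = v
    product, _ = flowsheet_pass(t, u)
    return -product + 50.0 * u**2          # hypothetical: -(product value) + reactor cost

def tear_equation(v):
    t, u = v
    _, t_calc = flowsheet_pass(t, u)
    return t - t_calc                      # equality constraint: tear stream converged

res = minimize(objective, x0=np.array([0.0, 0.5]), method="SLSQP",
               bounds=[(0.0, None), (0.1, 0.9)],
               constraints=[{"type": "eq", "fun": tear_equation}])
t_opt, u_opt = res.x
```

Only two variables reach the optimizer here, while the unit modules do their own internal work, which mirrors why a dense SQP without decomposition is usually adequate at this level.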

For off-line process optimization, the list of successful applications is too numerous to mention, as it is currently a routine task, distributed across all sectors of the chemical industry. A number of case studies for flowsheet optimization have been summarized in [6]. Moreover, the user guides for ASPEN+, PRO/II and HYSYS provide ample documentation and examples on the use of their SQP-based optimization tools.


Conclusions And Future Work 


This article provides a brief review of nonlinear pro- 
gramming strategies and applications in chemical pro- 
cess optimization. In many industrial applications, the 
NLP algorithm of choice is successive quadratic pro- 
gramming (SQP) and a description of the algorithm, 
and its variations, is provided. In particular, we develop 
the basic SQP algorithm and then concentrate on large 
scale extensions. For process optimization, we take ad- 
vantage of two model characteristics: these problems 
have few decision variables (< 100) despite their large 
model size and, despite advances in software develop- 
ment, second derivatives are often hard to evaluate. As 
a result, reduced space decompositions for SQP (rSQP) 
have been developed for a number of industrial appli- 
cations. 

Process models that are formulated for optimization can be classified into modular and equation oriented modes. In the first mode, function evaluations are expensive, as most of the process equations are solved internally with specialized solution procedures. As a result, the optimization problem seen by SQP is relatively small and can be solved without decomposition. In the equation oriented mode, the process equations are integrated into the optimization problem and the burden of the solution is passed on to the NLP solver. Here, decomposition strategies such as rSQP are essential for efficient process optimization.

Finally, we classify process applications as off-line, devoted to design and analysis studies, and on-line, devoted to monitoring and optimization of an operating process in real time. Currently, off-line optimization tasks are often performed with modular simulation tools that incorporate SQP strategies without decomposition. In contrast, on-line process optimization is performed almost entirely with equation oriented models and requires the implementation of decomposition strategies like rSQP. A number of applications in both categories are cited in this article.

Future work related to NLP applications in pro- 
cess optimization deals with further development of the 
SQP algorithm, extension of large scale decomposition 
strategies and larger, more sophisticated problem for- 
mulations for process application. 

Fundamental development of SQP algorithms deals 
with improving the local and global convergence prop- 
erties of the algorithm. These properties have been 
strengthened through the analysis of trust region strate- 
gies as well as additional safeguards in dealing with rank 
deficiency and inequality constraints. Related to this is the application of interior point (IP) strategies that improve the efficiency and reliability of large scale, highly
constrained NLPs. These IP (or barrier) methods can 
be applied at the level of the QP subproblem (see [47]; 
[44]) or the barrier terms can be applied directly to the 
NLP problem [10]. Since the computational effort of 
barrier methods (either at the QP or the NLP level) does 
not increase greatly with an increase in the number of 
inequality constraints, this approach seems to be essen- 
tial to deal with ever increasing problem sizes. 

Moreover, decomposition strategies for large scale 
NLP can be considered in two categories. For full space 
SQP, decomposition occurs at the QP and the linear 
algebra level, and effective strategies have been devel- 
oped to factorize indefinite, sparse systems. These also 
require first and second derivatives from the process 
model. Future developments in process modeling sys- 
tems need to provide these capabilities. Also, further 
conceptual development is needed to deal with large, 
full space QPs with indefinite Hessian matrices. 

For reduced space methods, the multiplier free approach can be applied to a tailoring of process models, where existing modular models (if solved with Newton-based procedures) are solved simultaneously with the NLP, using rSQP. With this approach, the best of the modular and equation oriented modes can be achieved: reliable, detailed models with specialized initializations and solvers can be optimized quickly and simultaneously. This approach has been demonstrated in a number of process applications in [12,39,41] and [1].

Finally, with the development of improved NLP 
solvers and decomposition strategies there are a num- 
ber of process applications that extend beyond process 
optimization, both for on-line and off-line optimization. For on-line optimization, the current challenge lies in combining the control and RTO layers in a chemical process. The resulting formulation is a differential-algebraic optimization problem, which can be posed as
a large scale NLP with many decision variables. These 
problems require novel decomposition approaches that 
are beyond the scope of this article (see e. g., [5]). Re- 
lated to control and optimization are the problems of 
state estimation and parameter estimation. These tasks 
are essential to identify the optimization model and 
have the same structure as the differential-algebraic op- 
timization problem. 

For off-line optimization, a number of capabilities 
are required that extend beyond process optimization. 
Once an optimal flowsheet has been found, a number of 
questions still need to be answered, before the solution 
can be implemented. These issues can be summarized 
by the following items: 

• Sensitivity of optimal flowsheets: How does the optimum flowsheet change with changes in input conditions and model uncertainty? ([46])

• Design under uncertainty: What is the optimal process that can accommodate a range of uncertainties? ([23])

• Operability and flexibility analysis of flowsheets: Over what range of uncertainty does an existing process function? ([19])

• Integration of dynamic considerations and controllability: How well can the designed or existing process reject disturbances and move from one desired setpoint to another? ([33])


As a result of these open questions, process optimiza- 
tion still appears to be an active and fertile area for fu- 
ture research. 

▶ Optimization with Equilibrium Constraints: A Piecewise SQP Approach
▶ Successive Quadratic Programming
▶ Successive Quadratic Programming: Applications in Distillation Systems
▶ Successive Quadratic Programming: Full Space
▶ Successive Quadratic Programming: Solution by Active Sets and Interior Point Methods

Decomposition methods of successive quadratic programming (SQP) are methods that reduce the size of the recursive quadratic programming (QP) subproblems by using the equality constraints to ‘eliminate’ variables. Decomposition methods are particularly useful when the number of degrees of freedom, $n - m_{eq}$, is small, where $n$ is the number of unknown variables and $m_{eq}$ is the number of equality constraints, because this results in small and tractable quadratic programming subproblems in which there is little need to be concerned with sparsity, fill-in and other issues in large scale quadratic and successive quadratic programming. Thus quasi-Newton updates like the modified Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula can be used to maintain hereditary positive definite approximations to the projection of the Hessian matrix of the Lagrangian function on the linearized constraint surface (tangent plane), and the solutions to the recursive quadratic programs are unique. However,
when decomposition methods are used, it is often nec- 
essary to use quasi-Newton approximations to the La- 
grangian Hessian matrix and thus the asymptotic rate 
of convergence is at best two-step Q-superlinear as op- 
posed to quadratic if analytical or finite difference sec- 
ond derivatives are used. Moreover, some curvature (or 
second derivative) information is ultimately lost as the 
linearized constraint surface orientation changes be- 
cause second derivative information is only being gath- 
ered or approximated on the tangent subspace, while it 
is neglected in directions orthogonal to the tangent sub- 
space. Additional techniques for recovering this ‘lost’ 
curvature information and for preserving sparsity in the 
Jacobian matrix of the constraints have also been pro- 
posed. 

All decomposition methods are based on a choice 
of basis for the vector space defined by the x variables 
in the optimization problem and are best suited for 
nonlinear programming problems with equality con- 
straints and simple bounds on variables. More general 
nonlinear inequalities are usually handled by convert- 
ing these inequalities to equalities using slack variables. 
Some early decomposition methods [1,6] in engineering used canonical bases, while more recent range and null space decomposition (RND) methods [7] choose basis vectors that align with the range and null space of the Jacobian matrix of the constraints. Range and null space decomposition methods were introduced by W. Murray and M.H. Wright in the late 1970s (see [4]). The null space, $Z$, of the Jacobian matrix, $J_k$, is a vector space that satisfies the condition $J_k w_z = 0$ for any vector $w_z = \sum_i \alpha_i z_i$, where $\{z_i\}$ are the basis vectors for the null space and the dimension of the null space is equal to the number of basis vectors $z_i$. The null space of the Jacobian matrix of the constraints is the tangent subspace of the constraints. The range space, $Y$, of the Jacobian matrix, on the other hand, is the space orthogonal to $Z$ (the space spanned by the rows of $J_k$, i.e., the range of $J_k^T$), such that the direct sum $Y \oplus Z = \mathbb{R}^n$, where $n$ denotes the number of $x$ variables. This is where the name range and null space decomposition (RND) comes from, and this particular choice of basis also leads to some simplified algebra. RND methods are the decomposition methods currently in use.
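The range/null space split can be checked numerically. The following minimal sketch, assuming NumPy and SciPy, uses a hypothetical $2 \times 4$ Jacobian and SciPy's `null_space` and `orth` to build orthonormal bases; taking $Y$ as an orthonormal basis of the range of $J_k^T$ is one valid choice among many.

```python
import numpy as np
from scipy.linalg import null_space, orth

# Hypothetical (m_eq x n) constraint Jacobian with m_eq = 2, n = 4
J = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])

Z = null_space(J)    # n x (n - m_eq): orthonormal basis of the tangent subspace
Y = orth(J.T)        # n x m_eq: orthonormal basis of range(J^T), orthogonal to Z
print(Z.shape, Y.shape)   # (4, 2) (4, 2)
```

Together the columns of `Y` and `Z` span all of $\mathbb{R}^4$, which is the direct-sum property stated above.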

The issues that are central to decomposition methods of successive quadratic programming include:
• the choice of basis for the linearized constraint surface;
• decoupling, simplifying assumptions and other related concerns;
• methods for approximating the projection of the Lagrangian Hessian matrix;
• the methods used in factoring the Jacobian and Lagrangian Hessian matrices.


Nonlinear Programming 


Successive quadratic programming methods address the problem of finding a local solution to the following nonlinear programming (NLP) problem:

$$\min_x \ f(x) \quad \text{s.t.} \quad c_{eq}(x) = 0, \quad c_{in}(x) \le 0,$$

where $x$ is a vector of length $n$ which represents an estimate of the local minimum, $f(x)$ is a twice continuously differentiable objective function and $c(x) = (c_{eq}(x), c_{in}(x))$, the vector of equality and inequality constraints, is also twice continuously differentiable and nonlinear. The associated Lagrangian function for this nonlinear programming problem is

$$L(x, \lambda, \mu) = f(x) + \lambda^T c_{eq}(x) + \mu^T c_{in}(x),$$

where $\lambda$ and $\mu$ are vectors of Lagrange and Kuhn-Tucker multipliers associated with the equality and inequality constraints, respectively. The corresponding gradient of the Lagrangian function, $g_L(x)$, is defined by

$$g_L(x) = g(x) + \nabla c_{eq}(x)\,\lambda + \nabla c_{in}(x)\,\mu,$$

where $g(x)$ is the gradient of the objective function and $\nabla c(x)$ is the matrix of constraint gradients (first partial derivatives of the constraint functions), with one column per constraint.


Decomposition of the Quadratic Program 


All successive quadratic programming methods solve nonlinear programs by recursively solving quadratic programming subproblems based on a quadratic approximation of the Lagrangian function, and decomposition methods are no different. Consider then the recursive quadratic program on the $k$th SQP iteration given by

$$\begin{aligned} \min_{\Delta x_k} \ & g(x_k)^T \Delta x_k + \tfrac{1}{2}\, \Delta x_k^T B_k \Delta x_k \\ \text{s.t.} \ & c(x_k) + J_k \Delta x_k = 0, \\ & x_L \le \Delta x_k \le x_U, \end{aligned}$$

where $g(x_k)$ is the gradient of the nonlinear programming objective function, $f(x)$, evaluated at $x_k$, $c(x_k)$ is the set of equality constraints, $J_k$ is the $(m_{eq} \times n)$ Jacobian (or first partial derivative) matrix for the equality constraints, $B_k$ is an $(n \times n)$ approximation to the Hessian (or second partial derivative) matrix of the Lagrangian function, $x_L$ and $x_U$ are the lower and upper bounds on the variables, respectively, and $\Delta x_k$, $\lambda_k$ and $\mu_k$ represent the desired solution, i.e., the change in the unknown variables and the Lagrange and Kuhn-Tucker multipliers, respectively. Remember, general inequalities are converted into equalities using slack variables, so they are present as part of the set of equalities in this formulation. Only the bounds on variables remain as inequalities.
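As a reference point for the decomposition methods that follow, the QP subproblem can be solved directly in the full space when no bound is active. This minimal NumPy sketch assembles the QP's Kuhn-Tucker system, $B_k \Delta x_k + J_k^T \lambda_k = -g(x_k)$ and $J_k \Delta x_k = -c(x_k)$, for small hypothetical data. The function name `qp_step_full_space` and the assumption that bounds are inactive are illustrative choices.

```python
import numpy as np

def qp_step_full_space(g, B, J, c):
    """Full-space QP step, assuming no bound is active.

    Solves  B dx + J^T lam = -g,  J dx = -c  (the QP's Kuhn-Tucker system).
    """
    n, m = B.shape[0], J.shape[0]
    K = np.block([[B, J.T],
                  [J, np.zeros((m, m))]])
    rhs = -np.concatenate([g, c])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]          # (delta x_k, lambda_k)

# Hypothetical data: n = 3 variables, m_eq = 1 equality constraint
B = np.diag([2.0, 1.0, 3.0])
J = np.array([[1.0, 1.0, 1.0]])
g = np.array([1.0, -2.0, 0.5])
c = np.array([0.3])
dx, lam = qp_step_full_space(g, B, J, c)
```

This $(n + m_{eq})$-dimensional system is exactly what decomposition methods avoid forming when $n - m_{eq}$ is small.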

Decomposition methods are based on the idea of splitting (or decomposing) the unknown variables, $x$, into two groups, $m_{eq}$ dependent variables, say $y$, and $n - m_{eq}$ independent variables, $z$. Once this framework is established, the change in the unknown variables, $\Delta x_k$, can be represented by the matrix equation

$$\Delta x_k = Y_k \Delta y_k + Z_k \Delta z_k,$$

where the matrices $Y_k$ and $Z_k$ are $n \times m_{eq}$ and $n \times (n - m_{eq})$, respectively. Substitution of this expression for $\Delta x_k$ into the quadratic program gives

$$\begin{aligned} \min \ & g(x_k)^T [Y_k \Delta y_k + Z_k \Delta z_k] + \tfrac{1}{2} [Y_k \Delta y_k + Z_k \Delta z_k]^T B_k [Y_k \Delta y_k + Z_k \Delta z_k] \\ \text{s.t.} \ & c(x_k) + J_k [Y_k \Delta y_k + Z_k \Delta z_k] = 0, \\ & x_L \le Y_k \Delta y_k + Z_k \Delta z_k \le x_U. \end{aligned}$$

The reformulated equality constraints can be rearranged to give

$$\Delta y_k = -[J_k Y_k]^{-1} [c(x_k) + J_k Z_k \Delta z_k],$$

which in turn can be substituted into the quadratic approximation of the Lagrangian function to give a ‘reduced’ objective function expression and a ‘reduced’ quadratic program in the $n - m_{eq}$ variables $\Delta z_k$ only. That is, substitution of this last expression for $\Delta y_k$ gives the ‘reduced’ quadratic program defined by

$$\begin{aligned} \min_{\Delta z_k} \ & g(x_k)^T \big[ -Y_k [J_k Y_k]^{-1} [c(x_k) + J_k Z_k \Delta z_k] + Z_k \Delta z_k \big] \\ & + \tfrac{1}{2} \big[ -Y_k [J_k Y_k]^{-1} [c(x_k) + J_k Z_k \Delta z_k] + Z_k \Delta z_k \big]^T B_k \big[ -Y_k [J_k Y_k]^{-1} [c(x_k) + J_k Z_k \Delta z_k] + Z_k \Delta z_k \big] \\ \text{s.t.} \ & x_L + Y_k [J_k Y_k]^{-1} [c(x_k) + J_k Z_k \Delta z_k] \le Z_k \Delta z_k \le x_U + Y_k [J_k Y_k]^{-1} [c(x_k) + J_k Z_k \Delta z_k]. \end{aligned}$$

Note that the bounds on $\Delta y_k$ affect the bounds on $\Delta z_k$ through the $Y_k [J_k Y_k]^{-1} [c(x_k)]$ term.
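The elimination step can be checked numerically. The sketch below, assuming NumPy, uses a hypothetical Jacobian and constraint vector together with a canonical basis, forms $\Delta y_k$ from an arbitrary $\Delta z_k$, and confirms that $\Delta x_k = Y_k \Delta y_k + Z_k \Delta z_k$ satisfies the linearized equality constraints. That is why only the $n - m_{eq}$ variables $\Delta z_k$ remain free in the reduced QP. Bounds are not handled here, and the helper names are illustrative.

```python
import numpy as np

# Hypothetical linearized equality constraints c(x_k) + J_k dx = 0, with n = 4, m_eq = 2
J = np.array([[2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])
c = np.array([0.5, -1.0])

# Canonical basis: the first m_eq variables are dependent (y), the rest independent (z)
m, n = J.shape
Y = np.vstack([np.eye(m), np.zeros((n - m, m))])
Z = np.vstack([np.zeros((m, n - m)), np.eye(n - m)])

def dependent_step(dz):
    """Delta y_k = -[J_k Y_k]^{-1} [c(x_k) + J_k Z_k Delta z_k]."""
    return -np.linalg.solve(J @ Y, c + J @ Z @ dz)

def full_step(dz):
    """Assemble Delta x_k = Y_k Delta y_k + Z_k Delta z_k for a given Delta z_k."""
    return Y @ dependent_step(dz) + Z @ dz

dx = full_step(np.array([0.2, -0.7]))
print(np.allclose(c + J @ dx, 0.0))   # True: linearized constraints hold for any dz
```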


Quadratic Programming Kuhn-Tucker Conditions 
for Decomposition Methods 


For decomposition methods, the Kuhn-Tucker conditions for the original quadratic program are given by the generalized stationarity conditions for the ‘reduced’ quadratic program

$$\begin{aligned} & [C_k + Z_k]^T g(x_k) + [C_k + Z_k]^T B_k a_k + [C_k^T B_k C_k + C_k^T B_k Z_k + Z_k^T B_k C_k + Z_k^T B_k Z_k] \Delta z_k + J_r^T \mu_k = 0, \\ & J_r \Delta z_k = 0, \\ & \mu_k \ge 0, \end{aligned}$$

where the vector $a_k = -Y_k [J_k Y_k]^{-1} c(x_k)$ and the $(n \times (n - m_{eq}))$-matrix $C_k = -Y_k [J_k Y_k]^{-1} J_k Z_k$, $J_r$ is the Jacobian matrix for the active bounds on $\Delta z_k$, and $\mu_k$ are the Kuhn-Tucker multipliers associated with those active bounds.

There are also the equations defining the change in the dependent variables

$$\Delta y_k = -[J_k Y_k]^{-1} [c(x_k) + J_k Z_k \Delta z_k]$$

and the conditions defining the Lagrange multipliers for the equality constraints

$$Y_k^T B_k [Y_k \Delta y_k + Z_k \Delta z_k] + Y_k^T J_k^T \lambda_k = -Y_k^T g(x_k).$$

This last condition comes from the Kuhn-Tucker conditions for the quadratic program formulated in terms of both $\Delta y_k$ and $\Delta z_k$.
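The following minimal NumPy sketch solves the reduced stationarity conditions for the case where no bound is active ($\mu_k = 0$). It uses $a_k$ and $C_k$, so that $\Delta x_k = a_k + (C_k + Z_k) \Delta z_k$, and then recovers $\Delta y_k$. All data and the name `reduced_qp_step` are hypothetical. The result should coincide with the full-space QP step, because both parametrize the same set of linearly feasible steps.

```python
import numpy as np

def reduced_qp_step(g, B, J, c, Y, Z):
    """Solve the reduced-QP stationarity conditions, assuming no bound is active (mu_k = 0).

    With a_k = -Y [J Y]^{-1} c and C_k = -Y [J Y]^{-1} J Z, dx = a_k + (C_k + Z_k) dz.
    """
    JY = J @ Y
    a = -Y @ np.linalg.solve(JY, c)            # a_k
    C = -Y @ np.linalg.solve(JY, J @ Z)        # C_k
    P = C + Z
    # Reduced Hessian (C+Z)^T B (C+Z) = C^T B C + C^T B Z + Z^T B C + Z^T B Z
    W = P.T @ B @ P
    dz = np.linalg.solve(W, -(P.T @ g + P.T @ B @ a))
    dy = -np.linalg.solve(JY, c + J @ Z @ dz)  # change in the dependent variables
    return dz, dy, Y @ dy + Z @ dz

# Hypothetical data (n = 4, m_eq = 2) with a canonical basis
B = np.diag([4.0, 2.0, 3.0, 1.0])
J = np.array([[2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])
g = np.array([1.0, -1.0, 0.5, 2.0])
c = np.array([0.5, -1.0])
Y = np.vstack([np.eye(2), np.zeros((2, 2))])
Z = np.vstack([np.zeros((2, 2)), np.eye(2)])
dz, dy, dx = reduced_qp_step(g, B, J, c, Y, Z)
```

With active bounds, the $J_r^T \mu_k$ term would enter and a small bound-constrained QP in $\Delta z_k$ would have to be solved instead of a linear system.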


Choice of Basis 


Choices for the matrices $Y_k$ and $Z_k$ give different decomposition methods, some of which are more convenient algebraically than others. For example, the decomposition method in [1] uses a canonical basis: the Jacobian matrix of the equality constraints, $J_k$, is partitioned into $(J_y \; J_z)$ such that the $(m_{eq} \times m_{eq})$-submatrix $J_y$ is invertible (nonsingular), with the partitions ordered consistently with the matrix-vector and matrix-matrix products, and then $Y_k = (I \; 0)^T$ and $Z_k = (0 \; I)^T$. For range and null space decomposition (RND) methods [7], the columns of $Z_k$ are chosen to span the null space of $J_k$, for example $Z_k = \begin{pmatrix} -J_y^{-1} J_z \\ I \end{pmatrix}$, so that $J_k Z_k = [0]$, while the columns of $Y_k$ span the orthogonal range space. The condition $J_k Z_k = [0]$ means that the matrix product $J_k Z_k$ gives the zero matrix, and therefore each column of the matrix $Z_k$ is carried to the zero vector by the Jacobian matrix.
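The two kinds of basis can be compared numerically. This NumPy sketch, with a hypothetical Jacobian, builds a canonical pair $Y_k = (I \; 0)^T$, $Z_k = (0 \; I)^T$ and a null-space pair with $Z_k = (-J_y^{-1} J_z ; I)$ and $Y_k = J_k^T$, the latter being one common choice for the range space. It then evaluates $C_k = -Y_k [J_k Y_k]^{-1} J_k Z_k$ for each. For the canonical pair, $C_k + Z_k$ reproduces the null-space basis, while for the null-space pair $C_k$ vanishes. The helper `c_matrix` is an illustrative name.

```python
import numpy as np

# Hypothetical Jacobian partitioned as J = (J_y  J_z) with J_y nonsingular
J = np.array([[2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])
m, n = J.shape
Jy, Jz = J[:, :m], J[:, m:]

def c_matrix(Y, Z):
    """C_k = -Y [J Y]^{-1} J Z; it vanishes exactly when J Z = 0."""
    return -Y @ np.linalg.solve(J @ Y, J @ Z)

# Canonical (coordinate) basis
Y_can = np.vstack([np.eye(m), np.zeros((n - m, m))])
Z_can = np.vstack([np.zeros((m, n - m)), np.eye(n - m)])

# Null-space basis Z = (-J_y^{-1} J_z ; I), with Y = J^T spanning the range of J^T
Z_null = np.vstack([-np.linalg.solve(Jy, Jz), np.eye(n - m)])
Y_rng = J.T

C_can = c_matrix(Y_can, Z_can)     # nonzero: the reduced QP keeps the C_k terms
C_null = c_matrix(Y_rng, Z_null)   # zero: the Kuhn-Tucker conditions simplify
```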

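Use of the null space condition $J_k Z_k = [0]$ simplifies the reduced quadratic program Kuhn-Tucker conditions to

$$\begin{aligned} & Z_k^T g(x_k) - Z_k^T B_k Y_k [J_k Y_k]^{-1} c(x_k) + Z_k^T B_k Z_k \Delta z_k + (Z_k^T \;\; -Z_k^T)\, \mu_k = 0, \\ & Z_k \Delta z_k - x_U - Y_k [J_k Y_k]^{-1} [c(x_k)] \le 0, \\ & -Z_k \Delta z_k + x_L + Y_k [J_k Y_k]^{-1} [c(x_k)] \le 0, \end{aligned}$$

since the $(n \times (n - m_{eq}))$-matrix $C_k = -Y_k [J_k Y_k]^{-1} J_k Z_k = [0]$. The remaining conditions become

$$\Delta y_k + [J_k Y_k]^{-1} [c(x_k)] = 0$$

and

$$Y_k^T B_k [Y_k \Delta y_k + Z_k \Delta z_k] + Y_k^T J_k^T \lambda_k = -Y_k^T g(x_k).$$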
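The simplified, but still coupled, conditions can be solved in sequence when no bound is active. First $\Delta y_k$ comes from the constraints, then $\Delta z_k$ from the projected system, and finally $\lambda_k$ from the range-space condition. This sketch assumes NumPy and SciPy, takes $Z_k$ from `scipy.linalg.null_space` and $Y_k = J_k^T$, and uses hypothetical data. Because no approximation has yet been made, the result should match the full-space QP step and multipliers exactly.

```python
import numpy as np
from scipy.linalg import null_space

def rnd_step(g, B, J, c):
    """Range/null-space step with J Z = 0 and Y = J^T, assuming no active bounds."""
    Z = null_space(J)                       # columns span the null space: J @ Z = 0
    Y = J.T                                 # columns span the range of J^T
    JY = J @ Y
    dy = -np.linalg.solve(JY, c)            # Delta y_k = -[J Y]^{-1} c(x_k)
    W = Z.T @ B @ Z                         # projected Hessian Z^T B Z
    rhs = -(Z.T @ g) + Z.T @ B @ Y @ np.linalg.solve(JY, c)
    dz = np.linalg.solve(W, rhs)
    dx = Y @ dy + Z @ dz
    # Coupled multiplier condition: Y^T B dx + Y^T J^T lam = -Y^T g
    lam = np.linalg.solve(JY.T, -(Y.T @ g) - Y.T @ B @ dx)
    return dx, lam

# Hypothetical data (n = 4, m_eq = 2)
B = np.diag([4.0, 2.0, 3.0, 1.0])
J = np.array([[2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])
g = np.array([1.0, -1.0, 0.5, 2.0])
c = np.array([0.5, -1.0])
dx, lam = rnd_step(g, B, J, c)
```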


Decoupling the Stationary Conditions


The last four equations defining $\Delta z_k$, $\Delta y_k$, $\lambda_k$ and $\mu_k$ are coupled because of the last condition and require projected second derivative information from both the range space ($Y_k^T B_k Y_k$) and the cross product of the range and null spaces ($Y_k^T B_k Z_k$). To remove the need for this information and thereby decouple the stationarity conditions, simplifying assumptions are necessary. In particular, it is assumed that because $\Delta z_k$ and $\Delta y_k$ are zero at any constrained local minimum, $\Delta y_k = 0$ and $\Delta z_k = 0$ can be used in this condition at all iterations, which reduces the last equation to

$$Y_k^T J_k^T \lambda_k = -Y_k^T g(x_k).$$

This decouples $\Delta z_k$ and $\mu_k$ from $\Delta y_k$ and $\lambda_k$. However, it should be pointed out that this simplification results in a loss of curvature information in the range space of the Jacobian matrix and slightly slower asymptotic rates of convergence. That is, second derivative information associated with $Y_k^T B_k Y_k$ and $Y_k^T B_k Z_k$, which is not constant for general nonlinear constraints, is lost, and the resulting rate of convergence is two-step Q-superlinear.
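The decoupled condition $Y_k^T J_k^T \lambda_k = -Y_k^T g(x_k)$ gives a multiplier estimate that needs only first derivatives. In the NumPy sketch below, with hypothetical data and the assumed choice $Y_k = J_k^T$, the estimate recovers the exact multipliers at a Kuhn-Tucker point of an equality-constrained problem, where $g = -J^T \lambda^*$. Away from such a point it reduces to the least-squares multiplier estimate. The function name `multiplier_estimate` is illustrative.

```python
import numpy as np

def multiplier_estimate(g, J, Y):
    """Decoupled estimate from Y^T J^T lam = -Y^T g (range-space curvature terms dropped)."""
    return np.linalg.solve((J @ Y).T, -(Y.T @ g))

J = np.array([[2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])
Y = J.T                                   # range-space basis assumed here

# At a Kuhn-Tucker point of the equality-constrained problem, g = -J^T lam*
lam_true = np.array([0.7, -1.2])
lam_kkt = multiplier_estimate(-J.T @ lam_true, J, Y)

# Away from such a point, with Y = J^T this is the least-squares estimate
g_other = np.array([1.0, -1.0, 0.5, 2.0])
lam_ls = multiplier_estimate(g_other, J, Y)
```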

There are also other concerns related to the bounds on the change in the dependent variables, $\Delta y_k$, that require attention. The relationship $\Delta y_k = -[J_k Y_k]^{-1} [c(x_k)]$ that defines the change in the dependent variables as a function of the change in the independent variables also directly affects the bounds on the independent variables in the reduced quadratic program, which are given by

$$x_L + Y_k [J_k Y_k]^{-1} [c(x_k)] \le Z_k \Delta z_k \le x_U + Y_k [J_k Y_k]^{-1} [c(x_k)].$$

Some care must be exercised to avoid conflicting bounds and infeasible reduced quadratic programs.


Methods for Approximating 
the Projected Lagrangian Hessian Matrix 


With the above simplifications, the only second derivative information required in the Kuhn-Tucker conditions for the reduced quadratic program is the $((n - m_{eq}) \times (n - m_{eq}))$-matrix $Z_k^T B_k Z_k$, which is the projection of the full Lagrangian Hessian matrix onto the tangent subspace defined by the linearized constraints.


Moreover, because the matrix $Z_k^T B_k Z_k$ must be positive definite at any constrained local minimum and can be much smaller in dimension than $B_k$ in many applications, $Z_k^T B_k Z_k$ is often approximated using the BFGS formula or some other suitable quasi-Newton update that preserves hereditary positive definiteness [7]. The primary advantage in doing this is that the change in the independent variables and the associated Kuhn-Tucker multipliers for any bounds can be determined by solving a smaller, convex quadratic program. This results in computational savings as well as guaranteeing a unique $\Delta z_k$ at each iteration. However, the trade-off is a loss of curvature information in the range space and two-step Q-superlinear convergence, which is a bit slower than Q-superlinear or quadratic convergence.
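As a minimal sketch (not the specific update used in [7]), the BFGS formula applied to the small matrix $Z_k^T B_k Z_k$ can be written with NumPy. Here `H` stands for the current reduced Hessian approximation, `s` for $\Delta z_k$, and `y` for a reduced gradient difference supplied by the caller; how `y` is formed varies between methods and is left as an assumption. Skipping the update when $s^T y$ is not positive is one common safeguard for hereditary positive definiteness. The function name is hypothetical.

```python
import numpy as np

def reduced_bfgs_update(H, s, y, curvature_tol=1e-10):
    """BFGS update of H, an approximation to the projected Hessian Z^T B Z.

    s: step in the independent variables (Delta z_k).
    y: corresponding change in a reduced gradient (formed by the caller).
    The update is skipped unless s^T y > curvature_tol, which keeps H
    symmetric positive definite if it was so before.
    """
    sy = float(s @ y)
    if sy <= curvature_tol:
        return H  # skipping preserves hereditary positive definiteness
    Hs = H @ s
    return H - np.outer(Hs, Hs) / float(s @ Hs) + np.outer(y, y) / sy
```

When the update is applied, the new matrix satisfies the secant condition `H_new @ s == y` (up to rounding).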

Using analytical or finite difference second derivatives in full space methods is straightforward; using analytical or finite difference second derivatives in decomposition methods is not. With quasi-Newton approximations, the projection $Z_k^T B_k Z_k$ can be easily formed and stored as a small dense $((n - m_{eq}) \times (n - m_{eq}))$-matrix. Moreover, there is no need to explicitly calculate or store $B_k$. On the other hand, to use analytical or finite difference second derivatives in decomposition methods, the matrix triple product $Z_k^T B_k Z_k$ must be explicitly formed, and therefore the entire Hessian matrix of the Lagrangian function, $B_k$, must be evaluated and stored. Clearly this runs counter to the overall purpose of decomposition.


Factoring the Jacobian 
and Projected Lagrangian Hessian Matrices 


The iterative computation of the range and null spaces of the Jacobian matrix is normally accomplished by QR factorization. That is, the Jacobian matrix of the constraints is factored at each iteration according to the rule

$$J_E(x_k) = Q(x_k)\begin{pmatrix} R(x_k) \\ 0 \end{pmatrix} = (Y_k \;\; Z_k)\begin{pmatrix} R_k \\ 0 \end{pmatrix},$$

where $Q(x_k)$, which is the product of Householder transformation matrices, is an orthogonal matrix partitioned into $Y_k$ and $Z_k$, and where $R(x_k)$ is an upper triangular $(m_{eq} \times m_{eq})$-matrix. In general, the matrices $Q(x_k)$ and $R(x_k)$ will be dense and therefore not well suited for large problems in which $J_E(x_k)$ is sparse. Other factorizations, such as the LQ factorization, can be used, and it is also possible to update the sparse factors of the Jacobian matrix to reduce storage requirements [5].
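A short NumPy sketch of the factorization above, assuming $J_E(x_k)$ is stored as an $n \times m_{eq}$ array whose columns are the constraint gradients and that it has full column rank. `numpy.linalg.qr` with `mode="complete"` returns the full orthogonal factor (computed with Householder reflections by LAPACK), which is split into $Y_k$ and $Z_k$. This dense version only illustrates the algebra and does not exploit sparsity; the name `range_null_bases` is hypothetical.

```python
import numpy as np

def range_null_bases(J):
    """Range/null space bases from a complete QR factorization of J (n x m_eq).

    J = Q [R; 0] = (Y Z) [R; 0], so J^T Z = 0 and J^T Y = R^T.
    Returns Y (n x m_eq), Z (n x (n - m_eq)) and the triangular R (m_eq x m_eq).
    """
    n, m = J.shape
    Q, R = np.linalg.qr(J, mode="complete")  # Q: n x n, R: n x m
    Y, Z = Q[:, :m], Q[:, m:]
    return Y, Z, R[:m, :]
```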

When $Z_k^T B_k Z_k$ is approximated using the BFGS or some equivalent hereditary positive definite quasi-Newton update, it can be factored reliably and efficiently using $LDL^T$ factorization, where L and D are lower triangular and diagonal factors, respectively. In fact, techniques exist for updating the quasi-Newton lower triangular and diagonal factors directly to avoid the expense of factorization altogether [5].
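For a small dense positive definite matrix, such as a BFGS approximation of $Z_k^T B_k Z_k$, the $LDL^T$ factorization can be sketched without pivoting as follows. The helper name `ldlt` is hypothetical, and the sketch refactors from scratch rather than updating the factors as the methods of [5] do; an indefinite matrix would need a pivoting variant.

```python
import numpy as np

def ldlt(A):
    """Return unit lower triangular L and diagonal d with A = L diag(d) L^T.

    No pivoting: intended for symmetric positive definite A, for which every
    pivot d[j] is positive. Raises LinAlgError otherwise.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        if d[j] <= 0.0:
            raise np.linalg.LinAlgError("matrix is not positive definite")
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d
```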


An SQP Algorithm for Decomposition Methods 


A generic successive quadratic programming algorithm using decomposition methods is shown below:

1) Initialize x, λ, μ and $Z_0^T B_0 Z_0$; define a convergence tolerance, ε > 0, and set k = 0.

2) Evaluate $f(x_k)$, $g(x_k)$, $c(x_k)$ and $J_E(x_k)$.

3) If $\|(g_r(x_k)\;\; c(x_k))^T\|_\infty < \varepsilon$, $\ldots(x_k) = 0$ and $\mu_k \ge 0$, then stop. Otherwise, go to step 4.

4) Factor $J_E(x_k)$ by QR (or some other equivalent) factorization so that $J_E(x_k) = (Y_k\;\; Z_k)(R_k\;\; 0)^T$.

5) Define the dependent variables, $y_k$, and independent variables, $z_k$.

6) Determine the change in the dependent variables from $\Delta y_k + [J_E^T Y_k]^{-1} c(x_k) = 0$ and set $y_{k+1} = y_k + \Delta y_k$.

7) Solve the equation $Y_k^T J_E \lambda_k = -Y_k^T g(x_k)$ for the Lagrange multipliers $\lambda_k$.

8) Construct the reduced quadratic program

$$\min_{\Delta z_k}\; \left(Z_k^T g(x_k) - Z_k^T B_k Y_k [J_E^T Y_k]^{-1} c(x_k)\right)^T \Delta z_k + \tfrac{1}{2}\,\Delta z_k^T (Z_k^T B_k Z_k)\,\Delta z_k$$

$$\text{s.t.}\quad x_L + Y_k [J_E^T Y_k]^{-1} c(x_k) \le Z_k \Delta z_k \le x_U + Y_k [J_E^T Y_k]^{-1} c(x_k)$$

and solve it for $\Delta z_k$ and $\mu_k$. Set $z_{k+1} = z_k + \Delta z_k$.

9) Determine $x_{k+1} = (y_{k+1}, z_{k+1})$ by line searching, trust regions or some other means.

10) Calculate a new approximation to the projected Hessian matrix, $Z_{k+1}^T B_{k+1} Z_{k+1}$, using the BFGS formula or some equivalent hereditary positive definite quasi-Newton update.

11) Set k = k + 1 and go to step 2.
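The core linear algebra of steps 4, 6, 7 and 8 can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not a complete decomposition method: no bounds are active, so the reduced quadratic program of step 8 becomes a linear solve; the cross term $Z_k^T B_k Y_k [J_E^T Y_k]^{-1} c(x_k)$ is dropped; and the step is returned as $\Delta x_k = Y_k \Delta y_k + Z_k \Delta z_k$ instead of through an explicit partition into dependent and independent variables. The name `rnd_step` is hypothetical.

```python
import numpy as np

def rnd_step(g, c, J, H_red):
    """Simplified range/null space step for equality constraints only.

    g: gradient g(x_k); c: constraint values c(x_k) (length m_eq);
    J: Jacobian J_E(x_k), n x m_eq, columns = constraint gradients;
    H_red: approximation to Z_k^T B_k Z_k.
    Assumes no active bounds and drops the cross term Z^T B Y [J^T Y]^{-1} c.
    """
    m = J.shape[1]
    Q, R = np.linalg.qr(J, mode="complete")      # step 4: J = (Y Z) [R; 0]
    Y, Z, R = Q[:, :m], Q[:, m:], R[:m, :]
    dy = -np.linalg.solve(R.T, c)                 # step 6: J^T Y = R^T
    lam = -np.linalg.solve(R, Y.T @ g)            # step 7: Y^T J = R
    dz = -np.linalg.solve(H_red, Z.T @ g)         # step 8 without bounds
    return Y @ dy + Z @ dz, lam                   # dx = Y dy + Z dz, lambda_k
```

For the illustrative problem of minimizing $\tfrac12\|x\|^2$ subject to $x_1 + x_2 + x_3 = 3$, the exact reduced Hessian is the identity (because $Z_k$ has orthonormal columns), and one step from $x = (3, 0, 0)$ reaches $(1, 1, 1)$ with multiplier estimate $-1$.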


Some Comments on Numerical Performance 
of Decomposition Methods 


Decomposition methods have been applied to a variety of small and large scale nonlinearly constrained optimization problems in both the mathematical sciences [7] and engineering [1,2,3]. Most agree that decomposition methods are best suited for applications in which the number of degrees of freedom, or number of independent variables (i.e., $n - m_{eq}$), is small compared to the total number of variables, a situation that occurs in many practical applications. Good numerical results have been reported for mathematical benchmark problems [7] and for small and large chemical process engineering problems [2]. In particular, J. Nocedal and M.L. Overton [7] report that RND methods compare favorably with the full space SQP method of M.J.D. Powell [8] on a set of small mathematical benchmark problems involving up to eight variables and four equality constraints. L.T. Biegler [2] and H.S. Chen and M.A. Stadtherr [3] show that decomposition methods work well on a variety of chemical process problems, including multicomponent, multistage distillation optimization problems involving up to 1000 variables.

Full space SQP methods are successive quadratic programming methods that build approximations to the
Hessian matrix of the Lagrangian function and solve 
the resulting quadratic programming subproblems in 
the full space of the unknown variables. The original 
SQP methods of R.B. Wilson [25], S.P. Han [12] and 
M.J.D. Powell [21] are formulated in terms of all of the 
x variables and are therefore full space methods. Usu- 
ally, however, ‘full space methods’ refers to those SQP 
methods that operate in the full space of the x variables 
when the number of variables, say n, is large and, as 
a result, are simultaneously concerned with the spar- 
sity (the relative number of zero and nonzero elements) 
of the matrix of second partial derivatives of the La- 
grangian function and the Jacobian matrix of the con- 
straint functions, as well as techniques for factoring, up- 
dating and solving the Kuhn-Tucker conditions for the 
recursive quadratic programs and other related issues. 
A sparse matrix is one in which the number of nonzero 
elements is a small fraction of the total, and performing 
arithmetic operations with only these nonzero elements 
reduces the overall computational workload. Clearly, 
the SQP methods of Han and Powell were not intended 
for problems in which the number of x variables is 
large. Full space SQP methods can be further catego- 
rized as ‘convex’ or ‘nonconvex’ and this characteriza- 
tion has bearing on the techniques used to estimate the 
Hessian matrix of the Lagrangian function, its resulting curvature, the nature of the recursive quadratic programs (i.e., whether they are positive definite or indefinite), and the methods needed to factor and solve the
Kuhn-Tucker conditions for the quadratic program- 
ming subproblems. In particular, nonconvexity (or in- 
definiteness) requires special factorization techniques 
and more complex active set methods, can give rise 
to indefinite quadratic programs and multiple Kuhn- 
Tucker points (or solutions) to the recursive quadratic 
programming problems, and can result in a loss of de- 
scent in the parent nonlinear programming problem 
causing line searching and other difficulties. Virtually 
all of these difficulties disappear when convexity can be 
guaranteed. 
Thus the issues that are important when the number 
of x variables is large are: 
• estimating the Hessian matrix of the Lagrangian function,
• various aspects of solving the Kuhn-Tucker conditions for the quadratic program,
• solution (Kuhn-Tucker point) multiplicity in the quadratic program,
• loss of descent in the parent nonlinear program, and
• initializing the unknown variables.


Nonlinearly Constrained Optimization 


The general nonlinear programming (NLP) problem is given by

$$\min f(x) \quad \text{s.t.} \quad c(x) \le 0,$$

where x is a vector of length n which represents an estimate of the local minimum, f(x) is a twice continuously differentiable objective function and c(x), the vector of equality and/or inequality constraints, is also twice continuously differentiable and nonlinear. Successive quadratic programming methods are based on a quadratic approximation of the Lagrangian function, L(x), defined by

$$L(x) = f(x) + \lambda^T c_E(x) + \mu^T c_I(x),$$

where λ and μ are vectors of Lagrange and Kuhn-Tucker multipliers associated with the equality constraints $c_E$ and inequality constraints $c_I$, respectively. These methods attempt to solve the NLP by recursively solving a quadratic programming subproblem

$$\min\; g(x_k)^T \Delta x_k + \tfrac{1}{2}\,\Delta x_k^T B_k \Delta x_k \quad \text{s.t.} \quad c(x_k) + J(x_k)^T \Delta x_k \le 0,$$

where $\Delta x_k$ is the change in the unknown variables, $B_k$ is some approximation to the true Hessian matrix of the Lagrangian function, $H(x) = H_f(x) + \sum_i \lambda_i H_{c_i}(x) + \sum_i \mu_i H_{c_i}(x)$, where $H_f(x)$ and $H_{c_i}(x)$ are the true Hessian matrices of the objective function and the ith constraint, respectively, and J is the Jacobian matrix of the constraints.


Kuhn-Tucker Conditions 
for the Quadratic Program 


The Kuhn-Tucker conditions that define stationarity in the recursive quadratic program are given by

$$B_k \Delta x_k + J_E \lambda_k + J_I \mu_k = -g(x_k),$$

$$[J_E \;\; J_I]^T \Delta x_k = -c(x_k),$$

$$\mu_k \ge 0,$$

where $g(x_k)$ is the gradient of the nonlinear programming objective function, f(x), evaluated at $x_k$, $c(x_k)$ is the set of active constraints (i.e., equalities plus inequalities that hold as equalities), $J_E$ and $J_I$ are the Jacobian (or first partial derivative) matrices for the equality and active inequality constraints, respectively, $B_k$ is an approximation to the Hessian (or second partial derivative) matrix of the Lagrangian function, and $\Delta x_k$, $\lambda_k$ and $\mu_k$ represent the desired solution, that is, the change in the unknown variables and the Lagrange and Kuhn-Tucker multipliers, respectively. Remember, the number of inequalities in the active set can change from one quadratic programming iteration to the next as well as from one SQP iteration to the next. In many larger applications, the matrices $B_k$, $J_E$ and $J_I$ have relatively few nonzero elements (i.e., are sparse), with a sparsity pattern that is often naturally banded with wide bandwidth. Efforts to account for this sparsity to reduce both storage and computation give rise to many auxiliary issues that must be resolved in order to produce reliable and efficient full space SQP methods.
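For a fixed active set, the Kuhn-Tucker conditions above form a square linear system in $\Delta x_k$ and the multipliers. The sketch below assembles and solves it densely with NumPy, assuming the active constraint gradients are linearly independent; large sparse problems would instead rely on sparse (possibly symmetric indefinite) factorizations. Checking $\mu_k \ge 0$ for the inequality part is left to the caller, as in an active set loop. The name `solve_qp_kkt` is hypothetical.

```python
import numpy as np

def solve_qp_kkt(B, g, J_active, c_active):
    """Solve the QP Kuhn-Tucker conditions for a fixed active set.

    J_active: n x m matrix [J_E J_I] of active constraint gradients (columns).
    c_active: values of those constraints at x_k.
    Solves  [B    J] [dx ]   [-g]
            [J^T  0] [mul] = [-c]
    and returns (dx, mul), where mul stacks lambda_k and mu_k.
    """
    n, m = J_active.shape
    K = np.block([[B, J_active], [J_active.T, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([g, c_active]))
    return sol[:n], sol[n:]
```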


Estimating the Hessian Matrix
of the Lagrangian Function 


When the number of variables is small, rank-two quasi-Newton updating formulas like the Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Davidon-Fletcher-Powell (DFP) updates, which preserve the desired property of positive definiteness, can be used to approximate the Hessian matrix of the Lagrangian function because sparsity is of no concern. On the other hand, when n is large, these and other ‘full’ updates cannot be used because they result in Hessian matrix approximations that have essentially all nonzero elements, increasing both storage requirements and the associated computational effort. Moreover, while both the sparsity of the constraint Jacobian matrix and that of the Hessian matrix of the Lagrangian function must be considered in developing full space SQP methods for large scale problems, it is the characteristics of the second derivatives of the Lagrangian function that most strongly affect the nature of the resulting recursive quadratic programs and the techniques that must be used to solve them.

In order to account for the sparsity of the Hessian matrix of the Lagrangian function, second partial derivatives of the objective function and nonlinear constraints are usually estimated using analytical or finite difference derivatives, sparse quasi-Newton updates like the sparse Powell-symmetric-Broyden (PSB) update [23], or a mixture of analytical and quasi-Newton derivatives [13,15]. In fact, all techniques for estimating the Hessian matrix of the Lagrangian function can be put within a common framework [5] by representing the matrix $B_k$ in the form

$$B_k = C(x_k) + A_k,$$

where $C(x_k)$ is called the computed part of the Hessian matrix, and is calculated from analytical derivatives, and $A_k$, the approximated part, can be computed either analytically, from finite differences or from an appropriate quasi-Newton formula. This division of the Hessian matrix of the Lagrangian function is both natural and convenient in many applications and allows the approximation of the Lagrangian Hessian matrix to be tailored for any given situation. When n is small, $C(x_k) = 0$ and $A_k$ is updated by the ‘full’ DFP or full modified BFGS updates, the resulting SQP methods are those of Han and Powell. Wilson’s method, which can be used for either small or large scale problems, results when $C(x_k)$ and $A_k = A(x_k)$ are calculated from analytical and/or finite difference second derivatives. On the other hand, $C(x_k) = 0$ and $A_k$ can be updated by the sparse PSB update [23] to give a full space SQP (quasi-Newton) method that accounts for sparsity. Finally, hybrid SQP methods [15], which have their foundation in nonlinear least squares [5], result when $C(x_k)$ is computed from analytical second derivatives and $A_k$ is calculated using some quasi-Newton update. For example, A. Kumar and A. Lucia [13] suggest a number of hybrid full space SQP methods that calculate $C(x_k)$ from analytical second derivatives and build approximations to each of the many small dense blocks of $A_k$ by the full PSB update and iterated projections [6], using both traditional secant and auxiliary (thermodynamic) matrix constraint information. Like Wilson’s method, these hybrid methods for approximating the Hessian matrix of the Lagrangian function can be used for both small and large scale problems.
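The full PSB update mentioned above can be sketched for the approximated part $A_k$ of $B_k = C(x_k) + A_k$. The structured secant target used here, $A_{k+1} s = y - C(x_{k+1}) s$ with $s = x_{k+1} - x_k$ and $y$ the change in the Lagrangian gradient, is one common choice and an assumption of this sketch. The sparse PSB update of [23] additionally restricts the correction to the known sparsity pattern, which this dense version does not do. The function name is hypothetical.

```python
import numpy as np

def psb_update(A, s, y_sharp):
    """Dense Powell-symmetric-Broyden update of the approximated part A_k.

    Enforces the secant condition A_new @ s = y_sharp, where (assumption)
    y_sharp = y - C(x_{k+1}) s. The result is symmetric but, unlike BFGS,
    not necessarily positive definite.
    """
    r = y_sharp - A @ s  # secant residual
    ss = float(s @ s)
    return (A + (np.outer(r, s) + np.outer(s, r)) / ss
            - float(r @ s) * np.outer(s, s) / ss**2)
```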


Convexity 


For small problems, the use of the modified BFGS update to approximate $B_k$ gives hereditary positive definite approximations to the Hessian matrix of the Lagrangian function and is preferred because this guarantees that the projection of the Lagrangian Hessian matrix is positive definite on the tangent subspace defined by the linearized constraints. As a result, the recursive quadratic programs are convex (bowl-shaped) and have unique solutions, and these unique solutions usually provide descent in the nonlinear programming line search function. In contrast, when analytical second derivatives and/or other quasi-Newton updates are used to build iterative approximations of the Hessian matrix of the Lagrangian function, regardless of whether the problem size is small or large, it is often difficult to guarantee hereditary positive definiteness unless the problem has certain intrinsic convexity properties or the Hessian matrix is ‘corrected’ to force positive definiteness. The most common type of correction for forcing positive definiteness is one in which a scalar multiple of the identity matrix, I, is added to the current approximation to the Hessian matrix of the Lagrangian function. That is, if $B_k$ (or its projection) is determined to be indefinite (by monitoring the appropriate factors), then it is corrected by the Levenberg-Marquardt rule

$$\bar{B}_k = B_k + \gamma I,$$

where γ is a scalar determined from the diagonal factor of $B_k$. Other modifications or corrections to ensure that $B_k$ is positive definite using Schur complements have also been suggested in [10] and applied in [1]. However, it has been shown that forcing positive definiteness can often lead to convergence to undesired (trivial) solutions [19].
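A simple way to apply the correction $\bar{B}_k = B_k + \gamma I$ is sketched below. Note that it swaps in a different rule for choosing γ than the one described above: instead of reading γ off the diagonal factor, it attempts a Cholesky factorization (`numpy.linalg.cholesky` raises `LinAlgError` when the matrix is not positive definite) and increases γ geometrically until one succeeds. The starting value `gamma0` and the growth `factor` are arbitrary illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def shift_to_positive_definite(B, gamma0=1e-3, factor=10.0, max_tries=60):
    """Return (B + gamma*I, gamma) with B + gamma*I positive definite.

    gamma = 0 if B is already positive definite; otherwise gamma starts at
    gamma0 and is multiplied by factor until a Cholesky factorization succeeds.
    """
    n = B.shape[0]
    gamma = 0.0
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(B + gamma * np.eye(n))
            return B + gamma * np.eye(n), gamma
        except np.linalg.LinAlgError:
            gamma = gamma0 if gamma == 0.0 else gamma * factor
    raise np.linalg.LinAlgError("could not make B positive definite")
```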


Nonconvexity 


In many cases, the true Hessian matrix of the La- 
grangian function is positive definite on the tangent 
subspace defined by the constraints at a local con- 
strained minimum but indefinite in the full space of 
the variables. To build a ‘better’ approximation to the 
Hessian matrix of the Lagrangian function, it is possible to allow $B_k$ to be indefinite. However, this can also
cause the projection of the Hessian matrix onto the tan- 
gent subspace of the linearized constraints to be indef- 
inite. Consequently, the resulting recursive quadratic 
programs can be indefinite and require special tech- 
niques [2,16] or global optimization methods to be 
solved reliably. Moreover, loss of descent in the nonlin- 
ear program can still occur even though these indefinite 
quadratic programs are solved successfully. 
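Whether $B_k$ is positive definite on the tangent subspace, even when it is indefinite in the full space, can be checked by computing the inertia of $Z_k^T B_k Z_k$. The sketch below does this with an eigenvalue computation for clarity; practical codes read the inertia from factors they already compute. The helper name and the tolerance are assumptions.

```python
import numpy as np

def projected_inertia(B, J, tol=1e-10):
    """Inertia (n_pos, n_neg, n_zero) of Z^T B Z, Z an orthonormal null space basis of J^T.

    J is n x m with constraint gradients as columns (assumed full column rank).
    """
    m = J.shape[1]
    Q, _ = np.linalg.qr(J, mode="complete")
    Z = Q[:, m:]
    w = np.linalg.eigvalsh(Z.T @ B @ Z)
    return int(np.sum(w > tol)), int(np.sum(w < -tol)), int(np.sum(np.abs(w) <= tol))
```

For example, $B = \mathrm{diag}(1, -1)$ is indefinite, but with the single constraint gradient $(0, 1)^T$ its projection is the positive scalar 1, so the inertia is (1, 0, 0).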

Thus in many large scale applications, where spar- 
sity must be exploited and some combination of an- 
alytical, finite difference and/or quasi-Newton second 
derivatives must be used, considerable attention must 
be paid to the resulting convexity or nonconvexity im- 
plied by the approximations of the Hessian matrix of 
the Lagrangian function because this will have signif- 
icant ramifications, both in the methods used to solve 
the recursive quadratic programs and in the use of stabilization techniques to force convergence from ‘poor’
(or remote) starting points. 


Solving the Kuhn-Tucker Conditions 
for Quadratic Programming Subproblems 


Full space SQP methods give rise to recursive quadratic programs in the full space of the x variables. When n is small and sparsity is of little concern, active set strategies are usually used to solve these small but full recursive quadratic programs. Moreover, when the approximation to the Hessian matrix of the Lagrangian function is hereditary positive definite, the quadratic program is convex, the rules for constraint addition and deletion are simple, both feasibility and descent in the quadratic program can be maintained under mild restrictions, and convergence to the unique solution of the quadratic program can be guaranteed [8]. This is why full quasi-Newton updates like the modified BFGS formula are preferred for small problems. Even when n is large, there is still strong incentive to ‘correct’ the Hessian matrix approximations for positive definiteness so that the recursive quadratic program is convex, although techniques exist both for large scale convex and indefinite quadratic programming and for solving the recursive quadratic programs by active set (direct) and interior point (iterative) methods.


Active Set Methods 


Current active set strategies are exchange-type methods [8] and are usually based on some factorization of the coefficient matrix of the Kuhn-Tucker conditions for the quadratic program, depending on the properties of $B_k$. This means that inequalities are brought in and out of the active set (or exchanged) as part of the procedure for solving the given recursive quadratic program. This is accomplished using a combination of ‘line searching’ in the direction given by the current estimate of the quadratic program solution and monitoring the signs of the Kuhn-Tucker multipliers for the inequalities. When the Hessian matrix of the Lagrangian function is positive definite, an $LDL^T$ or Cholesky factorization of $B_k$ (or its projection) is often used, and the factors are ‘updated’ both symbolically and numerically as the active set within a given recursive quadratic program changes. Updating avoids complete refactorization at each quadratic program iteration, and diagonal pivoting can be used to maintain numerical stability. This is true for both small and large problems. Moreover, at the beginning of any given recursive quadratic program, an initial feasible solution to the quadratic program is often generated using linear programming. While this may be desirable in some sense, feasible starting points for the quadratic program are not strictly required. In fact, the usual strategy of initializing the active set for the current recursive quadratic program to be the final active set on the previous SQP iteration often generates an infeasible starting point for the quadratic program, particularly when the SQP iterates are far from the optimal solution.
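To make the exchange mechanism concrete, here is a minimal primal active set method for a convex quadratic program $\min \tfrac12 x^T G x + c^T x$ subject to $A x \le b$. It is a textbook-style sketch with strong assumptions: G is symmetric positive definite, the starting point is feasible, and the working set stays linearly independent. Unlike the methods described above, it re-solves the equality-constrained subproblem from scratch at every iteration instead of updating factors. A constraint is added when it blocks the step and deleted when its multiplier is negative. The function name is hypothetical.

```python
import numpy as np

def active_set_qp(G, c, A, b, x0, tol=1e-10, max_iter=100):
    """Primal active set method for  min 1/2 x^T G x + c^T x  s.t.  A x <= b.

    Sketch assumptions: G symmetric positive definite, x0 feasible, and the
    rows of A in the working set linearly independent. Returns the solution
    and a dict {constraint index: Kuhn-Tucker multiplier} for the final
    working set.
    """
    x = np.array(x0, dtype=float)
    n = x.size
    W = []  # working set: indices of constraints treated as equalities
    for _ in range(max_iter):
        g = G @ x + c
        m = len(W)
        if m:
            Aw = A[W]
            K = np.block([[G, Aw.T], [Aw, np.zeros((m, m))]])
            sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(m)]))
            p, lam = sol[:n], sol[n:]
        else:
            p, lam = np.linalg.solve(G, -g), np.zeros(0)
        if np.linalg.norm(p) <= tol:
            if m == 0 or lam.min() >= -tol:
                return x, dict(zip(W, lam))       # Kuhn-Tucker point reached
            W.pop(int(np.argmin(lam)))            # delete most negative multiplier
            continue
        alpha, blocking = 1.0, None               # longest feasible step along p
        for i in range(A.shape[0]):
            ap = A[i] @ p
            if i not in W and ap > tol:
                step = (b[i] - A[i] @ x) / ap
                if step < alpha:
                    alpha, blocking = step, i
        x = x + alpha * p
        if blocking is not None:
            W.append(blocking)                    # add blocking constraint
    raise RuntimeError("active set method did not converge")
```

With G = I, c = (-2, -2) and the single constraint $x_1 + x_2 \le 1$, starting from the origin, the sketch stops at (0.5, 0.5) with multiplier 1.5.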

When the Hessian matrix of the Lagrangian function is permitted to be indefinite on the tangent subspace defined by the linearized constraints, $LDL^T$ or Cholesky factorization is no longer necessarily stable and therefore cannot be used. To correctly handle projected indefiniteness in solving the Kuhn-Tucker conditions for a given choice of active set, symmetric indefinite factorization [3] has been suggested [2] and modified for sparsity [14,16]. A number of pivoting strategies also exist, including threshold [14] and constrained pivoting [18], to both reduce fill-in and ensure numerical stability. Within the active set strategy, projected indefiniteness often causes ‘incorrect’ constraint addition and deletion, places restrictions on the order in which inequalities can be added to and/or deleted from the active set (even among members of the true final active set), can cause redundant active sets, and gives rise to multiple Kuhn-Tucker points. Thus more complicated logic is required within the active set strategy when projected indefiniteness is permitted. In particular, projected indefiniteness must be monitored from one quadratic programming iteration to the next and used as a guide for constraint addition and deletion, line searching must be permitted in both the positive and negative directions, and an inequality should not necessarily be deleted from the active set just because its associated Kuhn-Tucker multiplier is negative. See [19].


Interior Point Methods 


Interior point methods for solving the recursive quadratic programs have been suggested [9,11,26]. They differ from active set methods in that they usually use iterative methods with some type of preconditioning (or scaling of the elements of the Jacobian and Hessian matrices) to solve the Kuhn-Tucker conditions for a modified quadratic program. In the context of quadratic programming, interior point methods are usually primal-dual (sometimes called predictor-corrector) path following algorithms that use logarithmic barrier functions. The use of a logarithmic barrier function changes the linear Kuhn-Tucker conditions for the quadratic program into a set of parameterized nonlinear Kuhn-Tucker conditions. These nonlinear stationary conditions are usually solved using some type of (truncated) Newton method, whose linear subproblems are often solved by iterative linear equation-solving techniques. Approximate linear solutions are often used in the beginning, and the accuracy is ‘tightened’ as the solution to the nonlinear equations is approached. Convexity in the quadratic program is also generally required, and preconditioning of the linear equations is needed for numerical stability. Typical iterative linear equation-solving techniques used to determine the (truncated) Newton corrections include preconditioned conjugate gradients, generalized minimal residual (GMRES) and other so-called Krylov (or expanding) subspace methods, while common preconditioning techniques are often based on some partial LU factorization of the coefficient matrix. The use of iterative linear equation-solving methods is particularly advantageous in solving large scale problems because fill-in (i.e., turning zero elements into nonzero elements) of the coefficient matrix is avoided, thereby reducing both storage and overall computational workload.
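The sketch below shows preconditioned conjugate gradients, one of the Krylov methods named above, with a simple diagonal (Jacobi) preconditioner swapped in for the partial LU preconditioners mentioned in the text. Conjugate gradients requires a symmetric positive definite matrix, so in an interior point setting it would apply to a reduced or otherwise positive definite form of the Newton system; indefinite forms call for methods such as GMRES. The loose-then-tight accuracy strategy corresponds to passing a larger `tol` early on. The function name is hypothetical.

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=None):
    """Jacobi-preconditioned conjugate gradients for symmetric positive definite A.

    Stops when ||r|| <= tol * ||b||; a larger tol gives the cheap,
    approximate (truncated) solves used early in a Newton iteration.
    """
    b = np.asarray(b, dtype=float)
    n = b.size
    max_iter = max_iter or 10 * n
    m_inv = 1.0 / np.diag(A)          # diagonal preconditioner M^{-1}
    x = np.zeros(n)
    r = b.copy()                      # residual b - A x for x = 0
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```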


Multiple QP Kuhn-Tucker Points 
and Related Issues 


When the projection of the Lagrangian Hessian ma- 
trix is hereditary positive definite on the tangent sub- 
space defined by the linearized constraints, the re- 
cursive quadratic programs have unique solutions (or 
Kuhn-Tucker points). Perhaps the single biggest dif- 
ficulty associated with projected indefiniteness of the 
Hessian matrix of the Lagrangian function in full space 
SQP methods is the potential for multiple solutions 
to the recursive (indefinite) quadratic programs. There 
can be local and global minima of the quadratic programming objective function on the linearized constraint surface, as well as saddle point solutions, each corresponding to a different set of active constraints.
Moreover, the particular solution that is found is very 
often a function of the way in which the quadratic pro- 
gramming calculations are initiated (or the initial active 
set chosen for the quadratic program). 

More importantly, many of these multiple solutions (even the global solution) to a quadratic program may not be descent directions for a variety of common line search functions used in nonlinearly constrained optimization, including an $\ell_1$ exact penalty function and an augmented Lagrangian function. Thus finding the global solution to a given recursive quadratic program using global optimization techniques is unjustified [17].


Loss of Descent 
in the Nonlinear Program 


Given a valid Kuhn-Tucker point from an indefinite re- 
cursive quadratic program, loss of descent (or an uphill 
search direction) in the nonlinear program can still oc- 
cur. However, this is a problem that has received rel- 
atively little attention in the literature because many 
full space SQP methods modify the Lagrangian Hessian 
matrix so that it is positive definite on the tangent sub- 
space defined by the linearized constraints, thereby pro- 
viding descent. 

Descent in any nonlinear programming algorithm means that $g_{LS}(x_k)^T \Delta x_k < 0$, where $g_{LS}(x_k)$ is the gradient of some suitably chosen merit function that balances satisfying the constraints with minimizing the objective function. Loss of descent means that $g_{LS}(x_k)^T \Delta x_k \ge 0$. When line searching is used as the stabilization technique, it is essential that the change in the unknown variables provided by the quadratic programming subproblem be a descent direction of the chosen line search merit function at each SQP iteration. Failure to provide a descent direction in the context of line searching often leads to termination of the SQP algorithm because no ‘better’ point can be found with regard to the line search function. Thus, it seems appropriate to modify the Hessian matrix of the Lagrangian function for (projected) positive definiteness to avoid subsequent failure in the line searching phase of a full space SQP method, as is done in [1,20].

In contrast, when trust region methods are used for stabilization [4,7,16,22,24], descent at each SQP iteration is not strictly required but is desirable. Thus, many trust region methods for full space SQP methods still use positive definite Hessian matrix approximations, as well as a merit function on the trust region [22], to help ensure descent even though uniform boundedness is all that is required. Other trust region methods [17] use a linear programming subproblem to recapture descent at any given SQP iteration. For example, Lucia and J. Xu [17] solve the linear programming problem given by

$$\min\; g(x_k)^T \Delta x_k \quad \text{s.t.} \quad J \Delta x_k \le c(x_k),$$


where the Jacobian matrix, J, is comprised of the first 
partial derivatives for all active constraints from the 
quadratic programming subproblem. While this ap- 
proach does sacrifice curvature information, it does 
have the distinct advantages of having a unique solu- 
tion and providing information on how to adjust the 
trust region to obtain descent. 


Initializing the Unknown Variables 


Many studies of full space SQP methods give little at- 
tention to the way in which initial values for the un- 
known variables and Lagrange and Kuhn-Tucker mul- 
tipliers are initialized. In fact, remote or ‘poor’ initial 
values for the unknown variables are often chosen in 
mathematical studies involving ‘small’ problems in or- 
der to test the global convergence properties of the 
SQP algorithm. While this provides very useful infor- 
mation, in large scale application-based optimization 
problems, the choice of starting point is an impor- 
tant issue with a slightly different perspective that often 
represents the difference between success and failure. 
Qualitatively correct physical information is frequently 
just as important as, or even more important than, the 
numerical values used for the unknown variables. In 
many physically-based optimization applications, the 
mathematical model (i.e., the constraints and the ob- 
jective function) can wander into regions in which the 
model is not properly defined when initial values are 
chosen arbitrarily and lead to difficulties such as infea- 
sibility, singularity, and other related problems. In ad- 
dition, many physically-based nonlinear programming 
problems can exhibit multiple optima. However, some 
of these solutions are clearly undesirable in the sense 
that they do not represent the desired operational state 
of the model. Some solutions may represent local op- 
tima or saddle point solutions when a global optimum 
is the desired solution. To improve the chances of cal- 
culating desired optima, ‘better’ initial values are often 
used and/or coupled with the use of global optimization 
techniques. 
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Despite the fact that SQP methods are ‘infeasible 
path’ algorithms since they do not usually satisfy the 
nonlinear constraints at each iteration, it is often help- 
ful to at least initiate the computations using a set of ini- 
tial values that satisfies the constraint set and, if possi- 
ble, represents something ‘close’ to the desired optimal 
solution. In many applications, this is often possible by 
solving a ‘simulation’ problem, in which the degrees af 
freedom in the optimization problem are exhausted by 
defining a set of ‘specifications’ (i.e., by adding simple 
equality constraints that fix values of certain variables) 
to give a set of m constraint equations in n unknown 
variables. The solution to this set of nonlinear algebraic 
equations often provides useful and physically mean- 
ingful initial values for the unknown variables. The ini- 
tial values for the Lagrange and Kuhn-Tucker multi- 
pliers are also important in a problem-solving setting 
because their values effect the Hessian matrix of the 
Lagrangian function, the quadratic programming solu- 
tion, and the line search function (if one is used). Ini- 
tial values for the Lagrange and Kuhn-Tucker multipli- 
ers are often determined by computing a least squares 
solution to the Kuhn-Tucker conditions for the non- 
linear programming using the initial values of the un- 
known variables and some knowledge or assumption 
of the initial active set of constraints. That is, initial La- 
grange and Kuhn-Tucker multipliers can be obtained 
by solving the set of equations given by 


JJ" A = —Ie(xx), 


where J is the Jacobian matrix of the active constraints and λ represents the associated Lagrange and Kuhn-Tucker multipliers. Special techniques are also often employed to reduce the size and storage associated with this system of equations [16].
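A minimal sketch of this multiplier estimate, assuming NumPy and dense matrices; the function name `initial_multipliers` and the numbers in the example are hypothetical. Rather than forming J J^T explicitly, it solves the least-squares problem min ‖J^T λ + ∇f(x_k)‖, whose normal equations are exactly the system above.

```python
import numpy as np

def initial_multipliers(J, grad_f):
    """Least-squares multiplier estimate for grad_f + J^T lam = 0.

    J      : (m, n) Jacobian of the constraints assumed active at x_k
    grad_f : (n,) gradient of the objective at x_k
    The minimizer of ||J^T lam + grad_f|| satisfies J J^T lam = -J grad_f.
    """
    lam, *_ = np.linalg.lstsq(J.T, -grad_f, rcond=None)
    return lam

# Hypothetical data: f(x) = x1^2 + x2^2 at x_k = (1, 2) with the single
# active constraint x1 + x2 - 3 = 0.
x_k = np.array([1.0, 2.0])
grad_f = 2.0 * x_k
J = np.array([[1.0, 1.0]])
lam = initial_multipliers(J, grad_f)
print(lam)
```

In this example the gradient (2, 4) is not parallel to the constraint normal (1, 1), so stationarity cannot hold exactly and λ = −3 is the best compromise in the least-squares sense.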


Related Numerical Studies 


There have been relatively few numerical studies of full space SQP methods specifically directed at large scale problems. Application areas have included mathematical problems [19], chemical process optimization [16,18] and aerospace problems [1], to name a few. The number of unknown variables in these studies has ranged from 5 to 60 in the mathematical studies, from 100 to 500 in the chemical process optimization problems, and up to 13,000 in the aerospace examples. A variety of techniques for estimating the Hessian matrix have been used, including analytical second derivatives, finite differences, quasi-Newton updates and mixtures of these. In some studies positive definiteness of the Lagrangian Hessian matrix has been enforced [1,20], while in others indefiniteness of the projected Hessian matrix has been permitted [16,17,18,19].
Quadratic programming (QP) subproblems arise in both full space and decomposition methods of successive quadratic programming (SQP), where the solution to the QP is used to define the step in the unknown variables for the nonlinear programming (NLP) phase of the calculations. Thus reliable and efficient methods for solving quadratic programs are needed. Early methods for quadratic programming were based on a linear programming (LP) approach [1,3,17], but current methods are either active set strategies [2,5,13] or interior point methods [7,11,17]. An active set is a set of equality and/or inequality constraints that hold as equalities, and active set methods [5] for quadratic programming are based on a strategy for moving inequality constraints in and out of the active set during the solution of the quadratic program until a valid Kuhn-Tucker point (solution) is found. Sometimes feasible starting points are used to initiate the QP calculations, but this is not strictly necessary and is often impractical in the context of SQP methods. Inequalities are added to the active set based on constraint violations in the current Kuhn-Tucker direction using a ‘line searching’ procedure, while inequalities are deleted from the active set based on the signs of the Kuhn-Tucker multipliers. Constraint addition, deletion and/or exchange usually occur one at a time. Furthermore, many active set methods require convexity (or a positive definite projection of the matrix of the quadratic objective function); however, QP methods for indefinite quadratic programs [2,13] and large scale problems [13,14] also exist. For large scale problems, the sparsity of the coefficient (Jacobian) matrix of the constraints and of the second-derivative matrix needs to be exploited. In contrast, interior point methods convert a convex quadratic program into a set of nonlinear equations using logarithmic barrier functions, and solve the resulting nonlinear system of equations using Newton's method and iterative linear equation-solving techniques. The use of iterative methods to solve the linearized equations usually requires preconditioning but incurs no fill-in, making it possible to solve very large problems. Convexity is also a strict requirement for current interior point methods.

Issues that are important in solving quadratic programs by either active set methods or interior point methods include
• the rules for constraint addition, deletion and exchange, cycling, infeasibility, redundancy;
• the use of matrix factorizations, updating techniques and sparsity considerations;
• convexity and nonconvexity.


The Quadratic Program 


Recursive quadratic programming problems arise from the application of SQP methods to the following nonlinear programming problem

$$\min_x \; f(x) \quad \text{s.t.} \quad c(x) \le 0,$$


where x is a vector of length n which represents an esti- 
mate of the local minimum, f(x) is a twice continuously 
differentiable objective function and c(x), the vector of 
equality and/or inequality constraints, is also twice con- 
tinuously differentiable and nonlinear. A Lagrangian 
formulation 


$$L(x) = f(x) + \lambda^{\top} c_E(x) + \mu^{\top} c_I(x),$$

where L(x) is the Lagrangian function, c_E(x) and c_I(x) denote the equality and inequality components of c(x), and λ and μ are vectors of Lagrange and Kuhn-Tucker multipliers associated with the equality and inequality constraints, respectively, gives rise to the following successive quadratic program:

$$\min_{\Delta x_k} \; g(x_k)^{\top} \Delta x_k + \tfrac{1}{2} \Delta x_k^{\top} B_k \Delta x_k \quad \text{s.t.} \quad c(x_k) + J \Delta x_k \le 0,$$

where Δx_k is the change in the unknown variables, g(x_k) is the gradient of the objective function, J is the first partial derivative (or Jacobian) matrix of the constraint functions, and B_k is some approximation to the true Hessian matrix of the Lagrangian function,

$$H(x) = H_f(x) + \sum_i \lambda_i H_{c_i}(x) + \sum_i \mu_i H_{c_i}(x),$$

where H_f(x) and H_{c_i}(x) refer to the true Hessian matrices of the objective function and the ith constraint, respectively. It is this recursive quadratic program that is solved by active set or interior point methods; its solution is complicated by the fact that any inequalities that hold as equalities must be determined as part of the solution procedure.


Linear Kuhn-Tucker Conditions 
for the Quadratic Program 


The solution of a quadratic program is defined by the stationarity (or linear Kuhn-Tucker) conditions for the Lagrangian function of the quadratic program, given by

$$B_k \Delta x_k + J_E^{\top} \lambda_k + J_I^{\top} \mu_k = -g(x_k)$$
and
$$\begin{bmatrix} J_E \\ J_I \end{bmatrix} \Delta x_k = -c(x_k)$$

with μ_k ≥ 0, where J_E is the Jacobian matrix of the equality constraints, J_I is the Jacobian matrix of the active inequality constraints, c(x_k) is restricted to the equality and active inequality constraints, and the unknown variables in this set of linear equations are Δx_k, λ_k and μ_k. For convex quadratic programs, the projection of the Hessian matrix onto the constraint surface is positive definite and the solution to the quadratic programming problem is unique. On the other hand, for nonconvex (or indefinite) quadratic programs multiple solutions can exist. Therefore, reliable active set or interior point methods for solving quadratic programs must find the global solution to convex QPs and at least a local solution to indefinite quadratic programming problems.
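For one fixed active set these conditions form a single symmetric linear system. A minimal sketch, assuming small dense matrices and a nonsingular Kuhn-Tucker matrix (active Jacobian of full row rank); the name `solve_qp_kkt` and the example data are hypothetical, and the sketch does not itself enforce μ_k ≥ 0, which is the job of the active set logic.

```python
import numpy as np

def solve_qp_kkt(B, g, J_act, c_act):
    """Solve the linear Kuhn-Tucker system of the QP for one fixed active set:

        [ B      J_act^T ] [ dx  ]   [ -g     ]
        [ J_act  0       ] [ mul ] = [ -c_act ]

    J_act stacks J_E and the rows of J_I in the active set, c_act holds the
    matching constraint values at x_k.  Returns the step dx and the stacked
    multipliers (lambda_k for equalities, mu_k for active inequalities)."""
    n, m = B.shape[0], J_act.shape[0]
    K = np.block([[B, J_act.T], [J_act, np.zeros((m, m))]])
    rhs = np.concatenate([-g, -c_act])
    sol = np.linalg.solve(K, rhs)       # assumes K is nonsingular
    return sol[:n], sol[n:]

# Hypothetical QP: B = I, g = (-1, -1), one active inequality -1 + dx1 + dx2 <= 0
dx, mul = solve_qp_kkt(np.eye(2), np.array([-1.0, -1.0]),
                       np.array([[1.0, 1.0]]), np.array([-1.0]))
print(dx, mul)
```

Here the multiplier of the active inequality is 0.5 ≥ 0, so this active set is consistent; a negative entry would mark that inequality as a candidate for deletion.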


Active Set Methods 


Equality constraints are always members of the active set on all QP iterations. If the active inequality constraints at the solution were known, the linear Kuhn-Tucker conditions for the quadratic program would only have to be solved once (per SQP iteration). However, the active inequalities, and therefore J_I, are not known until the solution is found, and this is what makes solving quadratic programs something more than just solving linear systems of equations. The linear Kuhn-Tucker conditions for the quadratic program must be solved repeatedly with different numbers of equations and variables, both of which reflect the number of active inequalities.


Initialization 


Many active set methods for quadratic programming use a feasible starting point to initiate the calculations, which is usually determined by solving a phase I linear programming problem [5,6]. The reasons for this are that:

i) constraint feasibility can then be maintained 
throughout the iteration procedure; 

ii) iterative comparisons of the quadratic program- 
ming objective function are meaningful since all it- 
erates are feasible and 

iii) descent from one iteration to the next can be en- 
forced. 

However, the use of a feasible starting point is not strictly necessary for convex quadratic programs and is actually impractical in SQP methods. In fact, it is common practice in SQP methods to choose the active set for each QP subproblem to be the final active set from the previous SQP iteration, and this usually gives an infeasible starting point, particularly during early nonlinear programming iterations. Nevertheless, reusing the active set from the previous SQP iteration has been found to work well, so calculating a feasible starting point at each SQP iteration is an unnecessary computational expense. Direct comparison of feasible and infeasible QP starting points in SQP calculations has also shown this to be true [13].


Addition and Deletion of Inequality Constraints 


The heart of any active set method for quadratic programming is the set of rules for adding, deleting or exchanging inequalities, and in most active set strategies one ‘iteration’ consists of either the addition or deletion of a single inequality or the exchange of one inequality for another [5,6]. However, strategies for adding and deleting multiple inequalities have also been suggested.

Constraint addition, deletion and exchange usually occur in the following way. Given some active set, say {A_j}, which is simply a collection of inequality constraint indices, and a solution Δx_k, λ_k and μ_k to the associated quadratic programming Kuhn-Tucker conditions for that active set, a search in the direction Δx_k is performed to determine whether any of the inequalities not in the current active set are violated. This is easily accomplished by comparing each element of the product J Δx_k with the corresponding element of the vector −c(x_k) for each inequality not in the current active set. If J_i Δx_k > −c_i(x_k) for i ∉ {A_j}, where J_i denotes the ith row of J, then that inequality constraint is violated and is temporarily flagged as one that must be considered for addition to the active set. Otherwise, it is not violated. Once all inequalities not in the active set have been tested, the most violated inequality (or the one for which the step-length ratio is smallest) is selected as the inequality constraint to add to the current active set. Constraint deletion, on the other hand, is more straightforward. The Kuhn-Tucker multipliers for all inequalities in the current active set are checked, and the one with the most negative multiplier is identified as the inequality that should be removed from the active set. For iterations in which both addition and deletion are indicated, an exchange is usually made in which the inequality to be added replaces the one to be deleted from the current active set. These rules define the next active set {A_{j+1}}. See [5] for a more detailed description of the standard rules for constraint addition, deletion and exchange.
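The standard rules can be sketched as follows, assuming NumPy arrays and the ‘most violated’ variant of the addition test (the ratio variant would rank violated constraints by step length instead). The function names and data are hypothetical; `mu[k]` is taken to be the multiplier of inequality `active[k]`.

```python
import numpy as np

def select_addition(J, c, dx, active, tol=1e-12):
    """Most violated inactive inequality at the full step, or None.
    Row i is violated when J[i] @ dx > -c[i], i.e. c[i] + J[i] @ dx > 0."""
    violation = c + J @ dx
    best, worst = None, tol
    for i, v in enumerate(violation):
        if i not in active and v > worst:
            best, worst = i, v
    return best

def select_deletion(active, mu, tol=1e-12):
    """Active inequality with the most negative multiplier, or None."""
    if len(active) == 0:
        return None
    k = int(np.argmin(mu))
    return active[k] if mu[k] < -tol else None

def next_active_set(J, c, dx, active, mu):
    """One add / delete / exchange step applied to the index list `active`."""
    add = select_addition(J, c, dx, active)
    drop = select_deletion(active, mu)
    new = [i for i in active if i != drop]
    if add is not None:
        new.append(add)        # add and drop together form an exchange
    return sorted(new)

# Hypothetical inequalities c_i(x_k) + J_i dx <= 0 with c = (-1, -1, -1)
J = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
c = np.array([-1.0, -1.0, -1.0])
dx = np.array([2.0, 0.5])
print(next_active_set(J, c, dx, [], np.array([])))        # [2]
print(next_active_set(J, c, dx, [0], np.array([-0.3])))   # [2]
```

In the second call inequality 2 is added while inequality 0 (negative multiplier) is dropped, which is the exchange case described above.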

For nonconvex or indefinite quadratic program- 
ming problems, A. Lucia et al. [13] have shown that 
standard rules can determine incorrect active sets when 
the projection of the matrix B_k onto the linear constraint surface is indefinite. In particular, nonconvexity together with the standard rules can lead to the addition of inequalities not in the final active set and/or
the deletion of inequalities that truly belong to the fi- 
nal active set. This often leads to cycling, in which the 
same estimate of the active set occurs periodically af- 
ter one or more active set changes, and subsequently 
to failure of the active set strategy and therefore the 
SQP method. This is further complicated by the fact 
that there can be multiple Kuhn-Tucker points to in- 
definite quadratic programs, each of which corresponds 
to a different active set. Thus the rules for constraint ad- 
dition, deletion and exchange for indefinite or noncon- 
vex quadratic programming problems are more com- 
plicated and often include: 

i) monitoring the projected Hessian matrix and using 
projected indefiniteness to guide the addition of in- 
equalities into the active set; 

ii) permitting line searching for negative values of the 
line search parameter when the current QP Kuhn- 
Tucker point is in a direction of nondescent; and 

iii) deleting an inequality only if the projected Hessian 
matrix is positive definite or the degrees of freedom 
are exhausted [13]. 
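Safeguard iii) depends on the inertia of the projected Hessian Z^T B_k Z, where the columns of Z span the null space of the active constraint Jacobian. A minimal sketch, assuming dense NumPy matrices; the SVD-based null-space basis, the absolute tolerance and the function names are illustrative choices, not the specific procedure of [13].

```python
import numpy as np

def null_space_basis(J, tol=1e-10):
    """Orthonormal basis Z of the null space of J (so J @ Z = 0), via the SVD."""
    _, s, Vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

def deletion_allowed(B, J_active):
    """Safeguard iii): allow a deletion only if the projected Hessian
    Z^T B Z is positive definite or no degrees of freedom remain."""
    Z = null_space_basis(J_active)
    if Z.shape[1] == 0:
        return True                       # degrees of freedom exhausted
    return bool(np.linalg.eigvalsh(Z.T @ B @ Z).min() > 0.0)

B = np.diag([1.0, -1.0])                  # indefinite B_k
print(deletion_allowed(B, np.array([[0.0, 1.0]])))  # True:  Z spans x1, curvature +1
print(deletion_allowed(B, np.array([[1.0, 0.0]])))  # False: Z spans x2, curvature -1
```

The same B_k passes or fails the test depending on which constraints are active, which is why the projected matrix, not B_k itself, is monitored.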


Infeasible and Redundant Constraint Sets 


Many active set strategies also contain safeguards for infeasible constraint sets and constraint redundancy to avoid singularity in iterative estimates of the active set. An infeasible constraint set is one in which the collection of points satisfying the constraints is empty; thus there is no solution to the quadratic program. It is in this regard that phase I linear programming techniques for generating feasible initial values for the quadratic program offer advantages, since they readily identify constraint infeasibilities; however, other techniques for identifying constraint infeasibilities have been proposed [13]. SQP calculations in which infeasible constraint sets have been encountered are usually continued by using some ‘least error’ solution for the step in the unknown variables; otherwise, the calculations are simply terminated [15]. Redundant constraints, on the other hand, give rise to singularity in the constraint Jacobian matrix for the active set, and techniques for ‘trapping’ and removing linearly dependent constraints are available.
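The text does not specify a trapping technique. One simple option, sketched here under the assumption of a small dense Jacobian, is to keep each active constraint only if it increases the rank of the rows kept so far; production codes would instead use a rank-revealing (pivoted) factorization. The function name and data are hypothetical.

```python
import numpy as np

def split_redundant(J, tol=1e-10):
    """Greedily keep rows of the active-set Jacobian J that increase its rank.
    Rows that do not are linearly dependent on earlier kept rows."""
    kept, redundant = [], []
    for i in range(J.shape[0]):
        trial = kept + [i]
        if np.linalg.matrix_rank(J[trial], tol=tol) == len(trial):
            kept.append(i)
        else:
            redundant.append(i)
    return kept, redundant

# Hypothetical active set: the third row is the sum of the first two.
J = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
kept, redundant = split_redundant(J)
print(kept, redundant)
```

Dropping the flagged rows leaves a Jacobian of full row rank, so the Kuhn-Tucker matrix for the reduced active set is no longer singular on that account.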


Matrix Factorizations and Updating 


The linear Kuhn-Tucker conditions for the quadratic program are usually solved using matrix factorizations. When the quadratic program is small and is guaranteed to be convex, QR, LQ or TU factorization can be used to factor the Jacobian matrix of the constraints, and Cholesky factorization is used to factor either the Hessian matrix, B_k, or its projection onto the constraint surface, Z^T B_k Z [9], where the columns of Z span the null space of the active constraint Jacobian. Note, however, that B_k does not have to be positive definite in order for Z^T B_k Z to be positive definite, but some care must be exercised since Cholesky factorization requires positive definiteness. For large scale quadratic programs, both the constraint Jacobian matrix and the Hessian matrix are usually sparse (i.e., contain relatively few nonzero elements). To exploit sparsity, QR factorization cannot be used to factor the constraint Jacobian matrix, since Q formed from Householder transformations is usually a full matrix and Z can destroy any sparsity in B_k. In this case, TU factorization is used for the Jacobian matrix and Cholesky factorization is used for the projected Hessian matrix when it is positive definite [10]. For indefinite quadratic programs, Bunch and Parlett factorization is usually used in place of Cholesky factorization of the Hessian matrix or its projection [2] to maintain numerical stability. Lucia, J. Xu and K.M. Layn [13] use Bunch and Parlett factorization for the entire coefficient matrix of the Kuhn-Tucker conditions by exploiting the sparsity of the constraint Jacobian and Hessian matrices. See [10] for guidelines on choosing factorization techniques in quadratic programming.
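For the small dense convex case, the QR and Cholesky combination can be sketched as a null-space method: a complete QR factorization of J^T supplies range-space and null-space bases Y and Z, and Cholesky factors Z^T B_k Z. A minimal sketch assuming NumPy, a full-row-rank active Jacobian and a positive definite projected Hessian (Cholesky raises `LinAlgError` otherwise); the function name and data are hypothetical.

```python
import numpy as np

def null_space_qp_step(B, g, J, c):
    """Solve min g^T dx + 1/2 dx^T B dx  s.t.  J dx = -c  by the null-space method.
    Assumes J (m x n) has full row rank and Z^T B Z is positive definite."""
    m, n = J.shape
    Q, R = np.linalg.qr(J.T, mode="complete")    # J^T = Q R, Q is n x n
    Y, Z, R1 = Q[:, :m], Q[:, m:], R[:m, :]       # range / null-space bases
    py = np.linalg.solve(R1.T, -c)                # J Y = R1^T, so J (Y py) = -c
    rhs = -Z.T @ (g + B @ (Y @ py))
    L = np.linalg.cholesky(Z.T @ B @ Z)           # projected Hessian factor
    pz = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
    dx = Y @ py + Z @ pz
    lam = np.linalg.solve(R1, -Y.T @ (g + B @ dx))  # from J^T lam = -(g + B dx)
    return dx, lam

# Hypothetical data: same QP as a direct Kuhn-Tucker solve would use
B = np.eye(2)
g = np.array([-1.0, -1.0])
J = np.array([[1.0, 1.0]])
c = np.array([-1.0])
dx, lam = null_space_qp_step(B, g, J, c)
print(dx, lam)
```

Only Z^T B_k Z, not B_k, is passed to Cholesky, which is why an indefinite B_k can still be handled when its projection is positive definite.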

In order for the procedure of repeatedly solving the linear Kuhn-Tucker conditions to be efficient, complete refactorization of the associated matrices must be avoided. When constraint addition, deletion and exchange are limited to changing one or at most two inequalities from one active set to the next, techniques that modify the necessary factors using relatively few arithmetic operations are available. R. Fletcher [5] gives recursion relations for modifying the fundamental linear operators in quadratic programming. Other explicit and implicit formulas for modifying Cholesky and Bunch and Parlett factors [12] exist, as well as updating techniques based on the Schur complement [8].
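The specific formulas of [12] are not reproduced here. As a generic illustration of modifying a factor in O(n²) operations instead of refactoring in O(n³), the classical rank-one Cholesky update is sketched below, assuming NumPy; the function name is hypothetical.

```python
import numpy as np

def chol_update(L, x):
    """Given lower-triangular L with A = L L^T, return L1 with
    L1 L1^T = A + x x^T in O(n^2) operations (no refactorization)."""
    L = np.array(L, dtype=float)          # work on copies
    x = np.array(x, dtype=float)
    n = x.size
    for k in range(n):
        r = np.hypot(L[k, k], x[k])
        c, s = r / L[k, k], x[k] / L[k, k]
        L[k, k] = r
        L[k + 1:, k] = (L[k + 1:, k] + s * x[k + 1:]) / c
        x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L

A = np.array([[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]])
x = np.array([1.0, 0.5, -1.0])
L1 = chol_update(np.linalg.cholesky(A), x)
print(np.allclose(L1 @ L1.T, A + np.outer(x, x)))   # True
```

Each column is touched once, which is where the O(n²) cost comes from; downdates (subtracting x x^T) need a separate, less stable variant.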


Interior Point Methods 


Most interior point methods are primal-dual (some- 
times called predictor-corrector) path following algo- 
rithms that use logarithmic barrier functions to change 
the linear Kuhn-Tucker conditions for the quadratic 
program into a set of parameterized nonlinear Kuhn- 
Tucker conditions [7,11,18]. The resulting nonlinear 
equations are solved using some type of (truncated) 
Newton method, whose linear subproblems are, in 
turn, solved by iterative linear equation-solving tech- 
niques. That is, approximate linear solutions are of- 
ten used in the beginning and the accuracy ‘tight- 
ened’ as the solution to the nonlinear equations is ap- 
proached. Convexity in the quadratic program is also 
generally required and preconditioning of the linear 
equations is needed for numerical stability. Iterative 
linear equation-solving techniques used to determine 
the (truncated) Newton corrections include precondi- 
tioned conjugate gradient, generalized minimum resid- 
uals, and other so-called Krylov (or expanding) sub- 
space methods, while preconditioning is often based on 
some incomplete (or partial) LU factorization of the co- 
efficient matrix. The use of iterative linear equation- 
solving methods is particularly advantageous in solv- 
ing large scale quadratic programming problems be- 
cause fill-in (i.e., turning zero elements into nonzero 
elements) is avoided in the coefficient matrix, thereby 
reducing both storage and overall computational work- 
load. 


Interior Point Formulation 


Interior point methods use logarithmic barrier functions to convert a quadratic program of the form

$$\min_{\Delta x_k} \; g(x_k)^{\top}\Delta x_k + \tfrac{1}{2}\Delta x_k^{\top} B_k \Delta x_k \quad \text{s.t.} \quad c(x_k) + J\Delta x_k = 0, \quad \Delta x_k \ge 0$$

into the nonlinear program

$$\min_{\Delta x_k} \; g(x_k)^{\top}\Delta x_k + \tfrac{1}{2}\Delta x_k^{\top} B_k \Delta x_k - P \sum_i \log(\Delta x_i)_k \quad \text{s.t.} \quad c(x_k) + J\Delta x_k = 0,$$

where (Δx_i)_k denotes the ith element of the vector Δx on the kth iteration and P is a positive penalty parameter that tends to zero as the solution to the quadratic program is approached. Note that the inequality bounds on the variables appear as part of the objective function and that the purpose of the logarithmic functions is to add a penalty to the objective function as the variables Δx_i approach their bounds. Since the logarithm of a small positive number is large and negative, the term −P log(Δx_i)_k for any variable near its bound is large and positive, growing without bound as (Δx_i)_k → 0. This increases the value of the objective function and thus places a ‘barrier’ in the way of the iterates, which prevents them from hitting the boundaries and keeps them in the ‘interior’ of the feasible region. Also, general inequalities can be converted to equality constraints using slack variables. Because the logarithmic barrier functions are nonlinear, the resulting Kuhn-Tucker conditions

$$B_k\Delta x_k + J^{\top}\lambda_k - P\,X_k^{-1}e = -g(x_k)$$
and
$$J\Delta x_k = -c(x_k),$$

where X_k = diag((Δx_1)_k, ..., (Δx_n)_k) and e is the vector of ones, are also nonlinear because of the terms 1/(Δx_i)_k. These equations are then converted into an equivalent formulation using dual variables z_k, defined componentwise by (z_i)_k = P/(Δx_i)_k, and the resulting system of nonlinear equations

$$B_k\Delta x_k + g(x_k) + J^{\top}\lambda_k - z_k = 0,$$
$$J\Delta x_k + c(x_k) = 0$$
and
$$(\Delta x_i)_k\,(z_i)_k = P, \quad i = 1,\dots,n,$$

which defines a primal-dual central path, is solved using some form of Newton's method.


Solution of the Interior Point 
Optimality Conditions 


This last set of nonlinear equations, which defines optimality for an interior point formulation of a quadratic program, must be solved iteratively, and usually Newton's method is used for this task (i.e., these conditions for the interior point formulation are linearized and corrections to the variables Δx_k, λ_k and z_k are calculated). In practice, these equations are usually solved in a predictor-corrector fashion, in which predictor steps account for scaling and feasibility and corrector steps perform centering to keep the iterates interior to the feasible region. Within these predictor and corrector iterations, line searching (i.e., taking a fraction of the Newton step) is also used to maintain feasibility, and the rules for choosing the appropriate stepsizes are such that full Newton steps are taken in the limit.
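A minimal sketch of a basic primal-dual path-following iteration for the equations above, assuming a problem small enough to factor the full Newton matrix densely with NumPy. It is deliberately simpler than the methods described: there is no separate predictor step and no iterative linear solver. The names `barrier_qp` and `step_to_boundary`, the fraction-to-boundary factor 0.995 and the centering rule P = σ(x·z)/n with σ = 0.1 are assumptions for illustration. In the code, x stands for Δx_k, q for g(x_k), A for J and b for −c(x_k).

```python
import numpy as np

def step_to_boundary(v, dv, tau=0.995):
    """Largest step in (0, 1] keeping v + alpha*dv strictly positive,
    shortened by the fraction-to-boundary factor tau."""
    neg = dv < 0
    if not np.any(neg):
        return 1.0
    return min(1.0, tau * float(np.min(-v[neg] / dv[neg])))

def barrier_qp(B, q, A, b, x0, sigma=0.1, tol=1e-10, max_iter=200):
    """Primal-dual Newton iterations for
        min q^T x + 1/2 x^T B x   s.t.  A x = b,  x >= 0,
    driving  B x + q + A^T lam - z = 0,  A x - b = 0,  x_i z_i = P  with P -> 0.
    x0 must be strictly positive; A x0 = b is not required."""
    n, m = B.shape[0], A.shape[0]
    x, z, lam = x0.astype(float), np.ones(n), np.zeros(m)
    for _ in range(max_iter):
        r_d = B @ x + q + A.T @ lam - z        # stationarity residual
        r_p = A @ x - b                        # equality residual
        gap = x @ z                            # complementarity measure
        if max(np.abs(r_d).max(), np.abs(r_p).max(initial=0.0), gap) < tol:
            break
        P = sigma * gap / n                    # target point on the central path
        r_c = x * z - P                        # centering residual
        K = np.block([
            [B, A.T, -np.eye(n)],
            [A, np.zeros((m, m)), np.zeros((m, n))],
            [np.diag(z), np.zeros((n, m)), np.diag(x)],
        ])
        d = np.linalg.solve(K, -np.concatenate([r_d, r_p, r_c]))
        dx, dlam, dz = d[:n], d[n:n + m], d[n + m:]
        # one common step size for primal and dual variables
        alpha = min(step_to_boundary(x, dx), step_to_boundary(z, dz))
        x, lam, z = x + alpha * dx, lam + alpha * dlam, z + alpha * dz
    return x, lam, z

B = np.eye(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, lam, z = barrier_qp(B, np.zeros(2), A, b, np.array([1.0, 1.0]))
print(np.round(x, 6), np.round(lam, 6))
x2, lam2, z2 = barrier_qp(B, np.array([-1.0, 1.0]), A, b, np.array([1.0, 1.0]))
print(np.round(x2, 6), np.round(z2, 6))
```

In the first problem both variables stay strictly inside their bounds, so z tends to zero; in the second the solution has the second variable at its bound, and its dual variable converges to a positive value instead.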

The nonlinear optimality conditions for an interior 
point formulation can be solved accurately at each it- 
eration. However, to improve the efficiency of interior 
point methods, truncated Newton methods [4] have 
been suggested for approximately solving these con- 
ditions. That is, optimality conditions are only solved 
to a ‘loose’ tolerance during early iterations in order 
to produce acceptable corrections in the variables and 
this tolerance is tightened as the solution to the inte- 
rior point optimality conditions is approached. The pri- 
mary justification for this approach is that there is no 
apparent need to solve the nonlinear optimality condi- 
tions accurately when the iterates are far from the solu- 
tion. 

To further improve efficiency, the linearized equations that come from the application of Newton's method to the optimality conditions for interior point methods are often solved using iterative linear equation-solving techniques such as preconditioned conjugate gradients or other Krylov subspace methods, such as the Generalized Minimum RESiduals (GMRES) techniques of [16]. These iterative methods usually store only a small number of basis vectors, to conserve storage, and are ‘restarted’ each time the number of iterations exceeds the number of basis vectors being stored. In addition, preconditioning is usually required for numerical stability. Preconditioning techniques are intended to improve the condition number of the coefficient matrix (for a symmetric matrix, the ratio of the largest to the smallest eigenvalue in absolute value), and typically some form of incomplete LU factorization is used for this purpose. The primary advantage of using iterative methods to solve the linearized equations is that they incur no fill-in and thus keep storage requirements at an acceptable level, even for very large problems.
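A minimal sketch of preconditioned conjugate gradients, assuming NumPy and one plain substitution: a Jacobi (diagonal) preconditioner stands in for the incomplete LU factorization mentioned above, to keep the example short. CG applies only to symmetric positive definite systems, for instance B_k + X_k^{-1}Z_k obtained by eliminating the dual corrections when there are no equality constraints and B_k is positive semidefinite; the indefinite Kuhn-Tucker matrix itself would need a method such as GMRES or MINRES. The relative tolerance `tol` is the knob a truncated Newton method would loosen in early iterations. The function name is hypothetical.

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=None):
    """Jacobi-preconditioned conjugate gradients for symmetric positive
    definite A. Stops when ||r|| <= tol * ||b||."""
    n = b.size
    max_iter = 10 * n if max_iter is None else max_iter
    m_inv = 1.0 / np.diag(A)             # M = diag(A); needs a nonzero diagonal
    x = np.zeros(n)
    r = b.astype(float).copy()           # residual b - A x for x = 0
    y = m_inv * r                        # preconditioned residual M^{-1} r
    p = y.copy()
    ry = r @ y
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            break
        Ap = A @ p
        alpha = ry / (p @ Ap)            # exact minimizer along p
        x += alpha * p
        r -= alpha * Ap
        y = m_inv * r
        ry_new = r @ y
        p = y + (ry_new / ry) * p        # new M-conjugate search direction
        ry = ry_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b)
print(x)
```

Only products with A and the diagonal of A are needed, so A itself is never factored and no fill-in occurs.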


Penalty Parameter Updating 


The penalty parameter, P, must tend to zero as the solution to the quadratic program is approached, and to achieve rapid convergence some ‘updating’ procedure is usually used. That is, given some initial penalty parameter, say P_0, iterative values of the penalty parameter are calculated using some formula that drives P to zero quickly. To accomplish this, generic updating formulas of the form

$$P_{k+1} = \beta P_k, \quad 0 < \beta < 1,$$

are usually used, where β is a function of the stepsizes determined during the corrector phase of the solution to the optimality conditions.
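The text leaves β unspecified. The sketch below, in plain Python, uses a hypothetical linear rule chosen only to show the intended behavior: long steps shrink P aggressively, blocked steps leave it nearly unchanged. It is not a published rule; Mehrotra's heuristic, which sets the centering factor from the affine-scaling step, is one well-known alternative.

```python
def update_penalty(P, alpha_primal, alpha_dual, beta_min=0.1):
    """One hypothetical instance of P_{k+1} = beta * P_k.

    beta depends on the step sizes accepted in the last corrector step:
    a full step (alpha = 1) gives the aggressive reduction beta = beta_min,
    a blocked step (alpha near 0) leaves P almost unchanged."""
    alpha = min(alpha_primal, alpha_dual)       # the more restrictive step size
    beta = 1.0 - (1.0 - beta_min) * alpha       # linear between 1 and beta_min
    return beta * P

P = 1.0
for alphas in [(1.0, 1.0), (0.5, 0.8), (1.0, 0.9)]:
    P = update_penalty(P, *alphas)
    print(P)
```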


Numerical Studies 


Limited numerical results for quadratic programming can be found in the papers that introduce various aspects of active set methods [2,5,13]. Few general numerical results for quadratic programming in SQP methods exist, and those again usually deal with specific concerns such as infeasibility, factorizations and matrix updating. There are even fewer numerical results regarding the use of interior point methods, although interest in them is beginning to grow; and although much has been written comparing active set and interior point methods, no definitive study has been published.


See also 


▶ Entropy Optimization: Interior Point Methods

▶ Feasible Sequential Quadratic Programming
Introduction


During the last two decades, many new strategies in logistics management have emerged. With the introduction of these strategies, not only have the concepts of logistics engineering been broadened, but new concepts such as “logistics integration” have also been introduced. As these new concepts were introduced, many companies began to realize that in optimizing their logistics costs it is not sufficient to focus only on the organization itself. Rather, it is necessary to include other companies that have direct or indirect relationships with the organization. The challenge for logistics managers has become the integration of all operations across all facets of the business to improve overall performance. This concept has come to be known as supply chain management (SCM).

As companies began managing their supply chains, they realized that they needed tools that would measure the combined performance of the supply chain instead of their organization alone. These much-needed
tools can be referred to as “supply chain performance 
measurement systems” [6]. When traditional perfor- 
mance measurement systems are reviewed, it is seen 
that performance metrics are usually based on account- 
ing systems which are not sufficient to measure overall 
supply chain performance. In order to overcome this 
insufficiency, several performance measurement sys- 
tems have been developed during the last decade. 


Applications 


Kaplan and Norton [17] developed the concept of the 
balanced scorecard (BSC) that complements financial 
measures of past performance with measures of the 
drivers of future performance. According to Kaplan 
and Norton [17], financial and nonfinancial measures 
must be part of an information system that is available
to employees at all levels in an organization. Based on 
this approach, the BSC provides executives with a com- 
prehensive framework that matches a company’s vision 
and strategy with a set of performance measures which 
are organized into four different perspectives: finan- 
cial perspective, customer perspective, internal business 
process perspective, and learning and growth perspec- 
tive. 

The Supply Chain Operations Reference (SCOR) [9] model links process elements, metrics, best practices, and the features associated with the execution of a supply chain in a unique format and provides a balanced approach to measure overall supply chain performance. The SCOR model is hierarchical with specific boundaries in regard to scope and gives definitions of every performance measure included in the model. In version 5.0 of the model, the performance measures are also intended to be hierarchical. Under the SCOR model, performance measures are classified into five groups: reliability, responsiveness, flexibility, cost, and assets.

Beamon [2] categorized performance measures into 
two groups, qualitative and quantitative measures, and used these performance measures for supply chain design and analysis. Qualitative performance measures
are those for which there is no single direct numeri- 
cal measurement. Quantitative performance measures 
are the measures that may be directly described numer- 
ically. Beamon [2] also categorized quantitative mea- 
sures as objectives based on cost and objectives based 
on customer responsiveness. Fill rate, product lateness, 
customer response time, and lead time are examples 
of measures based on customer responsiveness, while 
cost, sales, profit, inventory investment, and return on 
investment are defined as measures based on cost. 

According to Gunasekaran et al. [14], companies 
often lack the insight for the development of effective 
measures and metrics to achieve fully integrated sup- 
ply chains. The lack of insight is not only the result of 
an unbalanced approach between financial and nonfi- 
nancial performance measures, but is also because of an 
insufficient distinction among metrics at strategic, tac- 
tical, and operational levels. The authors also present 
a framework for measuring the performance of a sup- 
ply chain after discussing some of the most appropri- 
ate performance metrics and measures. The metrics 
discussed in this framework are classified into strate- 
gic, tactical, and operational levels of management. The 
metrics are also distinguished as financial and nonfi- 
nancial so that a suitable costing method based on ac- 
tivity analysis can be applied. 

Beamon and Chen [4] categorized performance 
measures into three groups: resource, output, and flex- 
ibility. The resource performance measures determine 
the level of resources in the system that are used to 
meet the objectives. The output performance measures 
show the effectiveness of the supply chain. The flexi- 
bility measures describe the range of possible operating 
conditions that are profitably achievable in the supply 
chain. Beamon and Chen [4] defined performance mea- 
sures for each category and also ran simulations con- 
cerning the performance behavior of conjoined supply 
chains. In their paper, a conjoined supply chain is de- 
fined as a combination of divergent and arborescent 
structures. The simulation results show that the system 
stock-out risk, the probability distribution of the demand, and the transportation time are the most important metrics in determining the effectiveness of a supply
chain. 

Basu [1] made a comparison of performance mea- 
sures when performance criteria shift from the en- 
terprise itself to the integrated supply chain. Accord- 
ing to the author, the collaborative culture of the in- 
tegrated supply chain has triggered the emergence of 
new measures, especially in five areas: external focus, 
power to customer, value-based competition, network 
performance, and intellectual capital. The author also 
recommends a six-step cycle to implement a perfor- 
mance management system and sustain the benefits of 
a performance management system with the new mea- 
sures. 

Beamon [3] evaluated and identified the limitations 
of supply chain performance measures such as cost, 
activity time, responsiveness, and flexibility. She also 
evaluated the use of single performance measures. Ac- 
cording to Beamon [3], the single supply chain per- 
formance measures are attractive because of their sim- 
plicity. In addition, the author claims that current 
supply chain performance measurement systems are 
inadequate since they rely on the use of cost as a pri- 
mary measure; they are not inclusive; they are often 
inconsistent with the strategic goals of the organiza- 
tion; and they do not consider the effects of uncertainty. 
On the basis of these insufficiencies, the author pro- 
posed a framework for measuring supply chain per- 
formance that relates supply chain performance mea- 
sures to strategic goals. In addition, Beamon [3] gave 
a list of supply chain performance measures and their 
respective definitions. The author also presented a quantitative approach to flexibility measurement and stated
that flexibility measures are different from cost, activity 
time, and responsiveness measures. 

Ramdas and Spekman [21] measured supply chain 
performance using a set of variables that capture the 
impact of SCM on both system-wide revenues and 
costs. Their evaluation was based on responses to a sur- 
vey of 22 extended supply chains across five indus- 
try groups, which included life sciences, oil and gas, 
consumer products, agricultural and food processing 
utilities, and manufacturing (high-tech electronic and 
automotive). The authors defined six variables that re- 
flect different approaches to measure the supply chain 
performance. These variables are inventory, time, order fulfillment, quality, customer focus, and customer
satisfaction. Ramdas and Spekman [21] also compared 
functional and innovative respondents and concluded 
that functional product supply chains and innovative 
product supply chains differ significantly in thinking 
and practices. 

Stewart [24] claims that the integration of a sup- 
ply chain requires philosophical, operational, and sys- 
tems changes. The author also claims that the objective 
of an integrated supply chain structure is minimizing 
non-value-adding activities and their associated struc- 
ture. The author suggests that during the integration of 
a supply chain, four categories of operational change 
must be considered. These categories are structure, 
policy, systems, and organization. Systems should en- 
able performance measurement. In addition, the author 
pointed out that the business performance metrics must 
support a balanced view, and that a “balanced metric” 
framework is necessary to measure supply chain perfor- 
mance. Stewart [24] also provided PRTM’s Third An- 
nual Supply Chain Performance Benchmarking Study 
results. The data collected for the benchmarking study 
cover four areas: delivery performance, flexibility and 
responsiveness, logistics cost, and asset management. 
The author identified “keys” to unlock the supply chain 
excellence. 

Stainer [23] included productivity in the context of 
logistics operations and showed how productivity can 
be measured in this context. The author states that pro- 
ductivity can be seen as management of resource uti- 
lization and then proposes a framework for logistics 
productivity analysis that consists of five distinct di- 
mensions of service performance: tangibles, reliability, 
responsiveness, assurance, and empathy. In addition, 
Stainer [23] states that the dimensions must be incor- 
porated in the strategic thinking. 

Bowersox and Closs [5] discussed logistics perfor- 
mance measures. The authors offer a framework for 
measuring integrated supply chain performance and 
benchmarking across an organization. They propose 
three objectives for developing and implementing per- 
formance measurement systems: monitoring measures, 
controlling measures, and directing measures. In addi- 
tion, they defined activity-based measures and process 
measures. While activity-based measures focus on an 
individual task or process, process measures focus on 
the overall process throughout the supply chain. Bowersox
and Closs [5] defined three levels of performance
measurement: internal performance measurement, ex- 
ternal performance measurement, and comprehensive 
supply chain measurement. Each of these measurement 
systems is classified into subcategories, and the logis- 
tics performance measures are placed into these sub- 
categories. 

Miller [19] presented a hierarchical framework for 
capturing and linking all key performance measures. In 
this framework, the author differentiates measures both 
by their individual level in the hierarchy and by their 
focus. There are three hierarchical levels: strategic, tac- 
tical, and operational. Within these hierarchical levels, 
performance measures are differentiated into two cat- 
egories: internal and external measures. While internal 
measures focus on efficiency and productivity, external 
measures focus on the effectiveness of an activity. On 
the basis of the framework developed, at the strategic 
level, a few key performance metrics will be measured 
to assess a company’s overall performance; at the tacti- 
cal level, the performance of each function will be mon- 
itored; and at the operational level, the performance of 
each subfunction will be monitored. 

Handfield and Nichols [15] discussed the key ele- 
ments in establishing a successful supply chain reengi- 
neering effort and an effective performance measure- 
ment. They defined the properties of an effective sup- 
ply chain performance measurement system and gave 
an example framework of the BSC approach for a sup- 
ply chain performance measurement system. 

Lapide [18] identified that companies generally fall 
into the following developmental stages: 

1. Functional excellence - a stage in which a company 
needs to develop excellence within each of its operat- 
ing units, such as manufacturing, customer service, 
and logistics departments. 

2. Enterprise-wide integration - a stage in which 
a company needs to develop excellence in its cross- 
functional processes rather than within its individ- 
ual functional departments. 

3. Extended enterprise integration - a stage in which 
a company needs to develop excellence in interenter- 
prise processes. 

Another important aspect of performance measurement
is setting the correct performance targets, which
should always be set jointly in the context of strategic
objectives. Lapide [18] identified four methods that can
be used to set performance targets: historical data based
targets, external benchmark, internal benchmark, and
theoretical target setting.

Hausman [16] gave information about the effects 
of the Internet on the supply chain. The author claims 
that new performance metrics should capture costs 
and benefits of the Internet. Hausman [16] claims that 
a supply chain needs to perform on three key dimen- 
sions: service, assets, and speed. In addition, the au- 
thor emphasizes that supply chain performance metrics 
must be aligned with the business strategy. 

Rolstadas [22] states that although many different
performance definitions exist in the literature, these
definitions can be described in terms of three dimensions:

1. Effectiveness: to what extent are customers’ needs 
met. 
2. Efficiency: how economically are the resources of the 
company utilized. 
3. Changeability: to what extent is the company prepared
for future changes.

Chan et al. [8] identified some in-depth problems of
performance measurement systems in the supply chain 
context in their literature review. These problems are 
(1) the lack of a balanced approach in integrating finan- 
cial and nonfinancial measures, (2) the lack of system 
thinking, in which a supply chain must be viewed as 
one whole entity and the measurement system should 
span the entire supply chain, and (3) loss of a supply 
chain context. Thus, the authors conclude that the ex- 
isting performance measurement systems lead to local 
optimization. To overcome local optimization, Chan 
et al. [8] propose a supply chain performance measure- 
ment system with the assistance of the “analytic hier- 
archy process” (AHP) method. The proposed system is 
supposed to assess the performance of all the nodes in- 
volved along the supply chain on the basis of the core 
process in the simplified supply chain model. The au- 
thors propose an eight-step method that identifies and 
decomposes the processes involved and measures the 
performance. Chan et al. [7] extended the previously 
proposed supply chain performance measurement sys- 
tem by using the fuzzy set theory. 

Research papers related to supply chain perfor- 
mance measurement have also appeared in many books 
for different supply chain environments, such as Ge- 
unes et al. [11,13], Geunes and Pardalos [12], and 
Pardalos and Tsitsiringos [20]. 
Similar to previous researchers, Capar [6] also proposes
a supply chain performance measurement framework.
In addition to customer satisfaction and financial
perspectives, the author presented a new perspective. 
The new perspective, referred to as the supply chain col- 
laboration perspective, considers new trends in SCM. 
Capar [6] also presented performance measures for the 
supply chain collaboration perspective. The metrics are 
classified as strategic, tactical, and operational in or- 
der to determine the corresponding management level 
that deals with the metrics. Furthermore, the author 
discussed an appropriate supply chain performance 
measurement system for a large Turkish automotive 
company that manufactures passenger cars, light com- 
mercial vehicles, and related components. 


Conclusions 


According to a multiyear study of supply chain excellence
at Michigan State University, performance measurement
is one of the top four drivers of supply chain
excellence [10]. The study also highlights the importance
of supply chain performance measurement, which is often
neglected during supply chain transformation efforts,
and concludes that successful supply chain transformation
efforts supported by effective supply chain performance
measurement are increasing.

Easton et al. [10] also mentioned that sufficient per- 
formance measurement systems should possess the fol- 
lowing properties: 

1. Measures should be directly tied to operational ef- 
fectiveness and efficiency. 

2. Measures should relate important strategic objec- 
tives and nonfinancial performance. 

3. Measures should provide a forward-looking per- 
spective. 

As the authors state, efficiency is the most weighted 
dimension of the majority of measurement systems, 
and it is intuitive for companies to focus on efficiency. 
However, companies should not neglect the measure- 
ment of the integrated supply chain performance. 
As modern telecommunications require networks that
support fast and reliable data transmission, optical
fiber networks (SONET: Synchronous Optical Network)
are widely used and are replacing the old copper-based
networks. Given the characteristics of optical fiber
networks, designing a network so that it survives any
failure, whether caused by a node or by a link, is an
important issue [1].

Network malfunctions result from node or link failures,
which in turn are caused by natural disasters such as
earthquakes or by incidents such as cables cut during
ground digging, or fires. Since much of modern business
depends on telecommunication networks, a network
should remain operational even if some part of it is
damaged. A survivable network is a network that can
perform its function properly even when some of its
nodes or links fail. In practice, a node represents
a telecommunication center, a switching point, or a city,
and a link denotes a cable between a pair of nodes.


For an undirected graph $G = (V, E)$, where $V$ is a set
of vertices (nodes) and $E$ is a set of edges (links), the
survivability of the network is defined as the number
$r_{ij}$ of node-disjoint paths between each pair of nodes
$i, j$. In other words, communication between $i$ and $j$
(in either direction) remains possible as long as at most
$r_{ij} - 1$ of the other nodes of the network are
malfunctioning.

Edge-disjoint paths can be defined similarly. If two
edge-disjoint paths are found between $i$ and $j$, the two
paths share no edge, but they may pass through the same
node. Hence, the node-disjoint constraint is stricter than
the edge-disjoint constraint: if there are $r_{ij}$ node-disjoint
paths between $i$ and $j$, there are always $r_{ij}$ edge-disjoint
paths between them, but the converse is not true.

In Fig. 1, for example, the three paths a-b-h, a-c-e-f-h,
and a-d-e-g-h between a and h are edge-disjoint, and
a-b-h and a-d-e-g-h are node-disjoint. This means that at
least three edge failures or two node failures are needed
to interrupt telecommunication service between a and h.

In general, the number of node-disjoint (edge-disjoint)
paths between two specific nodes $a$ and $b$ can easily be
found using max-flow algorithms. If the flow capacities of
all edges are infinite and those of all nodes other than $a$
and $b$ are 1, the maximum flow obtained by a max-flow
algorithm equals the number of node-disjoint paths
between $a$ and $b$ (provided $a$ and $b$ are not adjacent). If
the flow capacities of all edges are 1 and those of all
nodes are infinite, the maximum flow equals the number
of edge-disjoint paths between $a$ and $b$.
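The two max-flow constructions can be sketched in a few lines of Python. This is a minimal sketch, not the method of any cited reference: the Edmonds-Karp max-flow routine is our choice, the node capacity of 1 is modeled by splitting each intermediate node v into an arc (v, "in") -> (v, "out"), and the source and sink are assumed not to be directly linked (an infinite-capacity direct link would make the flow unbounded). The names max_flow, edge_disjoint_paths, node_disjoint_paths and FIG1_EDGES are hypothetical; FIG1_EDGES contains only the links named on the three paths of Fig. 1, not necessarily the whole figure.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(cap, source, sink):
    """Edmonds-Karp: augment along shortest residual paths until none is left.

    cap[u][v] holds the residual capacity of arc u -> v and is modified in place.
    """
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:        # breadth-first search
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                            # no augmenting path remains
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] for u, v in path)    # bottleneck capacity
        for u, v in path:
            cap[u][v] -= delta
            cap[v][u] += delta                     # residual reverse arc
        flow += delta


def edge_disjoint_paths(edges, source, sink):
    """Every link gets capacity 1; nodes are uncapacitated."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cap[u][v] += 1        # an undirected link becomes two opposite arcs
        cap[v][u] += 1
    return max_flow(cap, source, sink)


def node_disjoint_paths(edges, source, sink):
    """Every node other than source and sink gets capacity 1; links are uncapacitated."""
    def head(v):  # where arcs entering v end
        return v if v in (source, sink) else (v, "in")

    def tail(v):  # where arcs leaving v start
        return v if v in (source, sink) else (v, "out")

    cap = defaultdict(lambda: defaultdict(int))
    for v in {x for e in edges for x in e} - {source, sink}:
        cap[head(v)][tail(v)] = 1                  # v can carry only one path
    for u, v in edges:
        cap[tail(u)][head(v)] = INF
        cap[tail(v)][head(u)] = INF
    return max_flow(cap, source, sink)


# Hypothetical edge list: only the links on the three paths named for Fig. 1.
FIG1_EDGES = [("a", "b"), ("b", "h"), ("a", "c"), ("c", "e"), ("e", "f"),
              ("f", "h"), ("a", "d"), ("d", "e"), ("e", "g"), ("g", "h")]

print(edge_disjoint_paths(FIG1_EDGES, "a", "h"))  # 3
print(node_disjoint_paths(FIG1_EDGES, "a", "h"))  # 2
```

On this edge list the two counts differ because both a-c-e-f-h and a-d-e-g-h pass through node e, so removing b and e disconnects a from h.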

From now on, we consider only node-disjoint constraints,
since they imply the corresponding edge-disjoint constraints.
Node-disjoint (edge-disjoint) constraints may also be called
connectivity constraints. If $r_{ij} = n - 1$ for all $i, j \in V$,
where $n$ is the number of nodes in the graph, the survivable
network is a complete graph, and there are $n - 1$
node-disjoint paths, including the direct link, between
every pair of nodes. However, such a network is too
expensive to build. Since a network with more links than
needed to provide the desired survivability is not
cost-effective, the trade-off between cost and network
redundancy has to be considered when designing a
network topology.

Given $G = (V, E)$, a cost matrix $c(i, j)$, $i, j = 1, \dots, n$,
and the required numbers $r_{ij}$ of node-disjoint paths between
each pair of nodes $i, j \in V$, the survivable network design
problem (SNDP) is to find a minimum-cost edge set that
guarantees that the number of node-disjoint paths between
$i$ and $j$ is at least $r_{ij}$ for all $i, j = 1, \dots, n$. If $r_{ij} = 2$ for
all $i, j = 1, \dots, n$, a ring topology is a feasible solution:
since every pair of nodes in a ring has two node-disjoint
paths, the network survives any single node failure. For
designing such ring structures, Net Solver was developed
by L.M. Gardner, I.H. Sudborough, and I.G. Tollis [2].

Many studies have addressed these problems, and they
can be divided into two classes:

• algorithms that obtain an optimal solution;

• heuristic algorithms that obtain a near-optimal solution.

For the former, various integer programming (IP) and
linear programming (LP) techniques are used; for the
latter, local search methods have been studied.

M. Grötschel and C.L. Monma [4] formulated the
problems as integer programs and studied the integer
polyhedra of k-edge (node) connected network design
problems. Grötschel, Monma, and M. Stoer [5] also
presented computational results obtained by solving the
SNDP with a cutting plane method for the IP formulation.
They considered the SNDP with low connectivity
constraints, i.e., $r_{ij} \in \{0, 1, 2\}$ for all $i, j$. M.X. Goemans and
D.J. Bertsimas [3] established the parsimonious property
of the LP relaxation of these problems.

K. Steiglitz, P. Weiner, and D.J. Kleitman [9] proposed
a local search method for the general SNDP. Monma and
D.F. Shallcross [6] used several heuristic techniques for
obtaining initial solutions and for improving the
solutions of problems with two-connected survivability
constraints. T.S. Wu explained network survivability in
detail in [10], and the book also covers many hardware
aspects.


SNDP with Traffic Capacity 


In practice, telecommunication traffic on the network
also has to be considered in addition to the network
topology. To guarantee steady service when some node or
link failures occur, the cable capacity of each link has to
be decided. For some pairs of large cities, the traffic may
be higher than for pairs of small cities. Consider the
following situation: a high-capacity link between two
large cities A and B breaks down, and a secondary path
provided by the survivable network is used instead. If the
secondary path uses a link that does not have enough
capacity for the traffic between A and B, the service
between A and B is interrupted and the network is no
longer safe. Hence, link capacities should be determined
together with survivability when designing the network.

Many approaches have been proposed for this kind of
problem. J. Yamada [11] proposes a heuristic algorithm
for designing efficient spare path networks that aims at
near-optimal solutions. J. Shi and J. Fonseka [8] studied
a class of traffic-based survivability measures and the
survivability analysis of telecommunications networks.
I. Ouveysi and A. Wirth [7] used a maximum spanning tree
with traffic requirements and could provide a survivable
network that can operate with at most two link failures.


A Heuristic for SNDP 


We now describe in detail a heuristic method [9]
(called Alg1) for the general SNDP.

Alg1 is composed of two parts:
1) a stage that obtains initial feasible solutions;
2) a stage of local improvements of the initial solutions.

Since the SNDP has many local minima, Alg1 performs
a local search from several initial solutions, obtains
many local minima, and may thereby reach a near-optimal
solution of the SNDP. The local search method tries to
find a neighboring point with a lower objective value. If
such a point is found and it is feasible, the method moves
to it and repeats the procedure until no further
improvement can be made. Because the SNDP has multiple
local minima, the procedure may stop at a local minimum
from which no further improvement is possible; this local
minimum is recorded, and the whole procedure starts
again from another initial feasible solution. The method
is called local search because it only explores the
neighborhood of the current solution. To increase the
chance of finding the global minimum, the whole feasible
region would have to be searched, which requires many
initial solutions. Since investigating the whole feasible
region is not cost-effective, there is a trade-off between
computation time and solution quality. The detailed
procedure of Alg1 is as follows.


DO k = 1 TO (number of local minima to be found)
    find an initial feasible solution
    WHILE local improvements are possible
        DO X-change (local improvement)
        move to the new feasible solution
    END WHILE
    keep the local minimum
END DO
RETURN the lowest-cost solution among the local minima


Program Alg1


To obtain an initial solution, Alg1 defines a requirement
array $p$ with $p_i = \max_j r_{ij}$, $i = 1, \dots, n$, and selects at
least $p_i$ edges incident to each node $i$. This uses the
property that, to guarantee $r_{ij}$ node-disjoint paths
between nodes $i$ and $j$, the degree of node $i$ must be at
least the maximum of the requirements between $i$ and all
other nodes. By randomizing the order of the node
numbers, Alg1 can generate many distinct initial
solutions. These initial solutions may or may not satisfy
the node connectivity constraints. The constraints can be
tested with a max-flow algorithm: if the max-flow $f$
between nodes $i$ and $j$ satisfies $f \ge r_{ij}$ for every pair
$i, j$, the initial solution satisfies the node connectivity
constraints, becomes an initial feasible solution, and is
used in the next stage, local improvements.
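The degree-based construction of initial solutions can be sketched as follows. This is a minimal sketch under assumptions that are ours rather than [9]'s: requirements and link costs are given as n × n lists, and the rule for choosing which links to add (visit the nodes in a random order and link each node to its cheapest neighbors until its degree reaches $p_i$) is hypothetical. The functions requirement_array and initial_solution and the 5-node instance are illustrative names and data only. The resulting edge set still has to pass the max-flow feasibility test before it can serve as an initial feasible solution.

```python
import random


def requirement_array(r):
    """p_i = max_j r_ij: node i needs at least p_i incident links."""
    n = len(r)
    return [max(r[i][j] for j in range(n) if j != i) for i in range(n)]


def initial_solution(r, cost, seed=None):
    """Return one candidate edge set in which every node i has degree >= p_i.

    Hypothetical greedy rule: visit the nodes in random order and link each one
    to its cheapest neighbours until its degree reaches p_i.
    """
    n = len(r)
    p = requirement_array(r)
    edges, degree = set(), [0] * n
    order = list(range(n))
    random.Random(seed).shuffle(order)           # randomized node sequence
    for i in order:
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: cost[i][j])
        for j in neighbours:
            if degree[i] >= p[i]:
                break
            e = (min(i, j), max(i, j))
            if e not in edges:
                edges.add(e)
                degree[i] += 1
                degree[j] += 1
    return edges, p


# Hypothetical 5-node instance: every pair needs 2 node-disjoint paths,
# and the cost of a link grows with the distance between node numbers.
R5 = [[0 if i == j else 2 for j in range(5)] for i in range(5)]
COST5 = [[abs(i - j) for j in range(5)] for i in range(5)]
edges, p = initial_solution(R5, COST5, seed=1)
print(p)              # [2, 2, 2, 2, 2]
print(sorted(edges))  # candidate links; feasibility must still be checked by max-flow
```

Different seeds give different node orders and hence, in general, different candidate solutions, which is what the multi-start structure of Alg1 relies on.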

The local improvements mainly consist of a series of
X-changes (see Fig. 2).

In Fig. 2, the lines between nodes denote the edges
selected for cabling. If $c(a, b) + c(c, d) > c(a, d) + c(b, c)$,
the X-change, which replaces edges $(a, b)$ and $(c, d)$ with
$(a, d)$ and $(b, c)$, can be performed and the total cost
decreases. However, since two links are deleted, which can
affect the number of node-disjoint paths between some
pairs of nodes, it must be checked that the network still
satisfies the node connectivity constraints after the
X-change. Testing the constraints for all pairs of nodes
$i, j = 1, \dots, n$ requires $O(n^2)$ tests and is time consuming.
However, according to [9, Thm. 2], only some of the $O(n^2)$
pairs need to be tested after an X-change to ensure that
the node connectivity constraints still hold.
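The following is a minimal, self-contained sketch of the Alg1 loop with X-changes, under strong simplifying assumptions that are ours, not [9]'s: all requirements are $r_{ij} = 2$, initial solutions are random rings (feasible in that case), link costs are hypothetical Euclidean distances between node coordinates, and feasibility after an X-change is tested by brute force (delete each node and check connectivity) instead of by max-flow with the pair reduction of [9, Thm. 2]. The names alg1_sketch, try_x_change and two_node_connected are hypothetical.

```python
import itertools
import math
import random


def norm(u, v):
    """Store every undirected link as a sorted pair."""
    return (u, v) if u < v else (v, u)


def connected(nodes, edges):
    """True if the links among `nodes` connect all of them (nodes must be non-empty)."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes


def two_node_connected(nodes, edges):
    """r_ij = 2 for all pairs (n >= 3): connected, and still connected after any one node fails."""
    return connected(nodes, edges) and all(
        connected(set(nodes) - {v}, edges) for v in nodes)


def total_cost(edges, cost):
    return sum(cost(u, v) for u, v in edges)


def try_x_change(nodes, edges, cost, eps=1e-12):
    """Return an improving X-change that keeps feasibility, or None if there is none."""
    for (a, b), (c, d) in itertools.combinations(sorted(edges), 2):
        if len({a, b, c, d}) < 4:
            continue                                  # the two links share a node
        for p, q in (((a, d), (b, c)), ((a, c), (b, d))):
            p, q = norm(*p), norm(*q)
            if p in edges or q in edges:
                continue                              # would create a duplicate link
            if cost(a, b) + cost(c, d) > cost(*p) + cost(*q) + eps:
                candidate = (edges - {(a, b), (c, d)}) | {p, q}
                if two_node_connected(nodes, candidate):
                    return candidate
    return None


def alg1_sketch(points, starts=5, seed=0):
    """Multi-start local search for r_ij = 2, starting from random rings."""
    rng = random.Random(seed)
    nodes = sorted(points)

    def cost(u, v):                                   # hypothetical Euclidean link cost
        return math.dist(points[u], points[v])

    best, best_cost = None, math.inf
    for _ in range(starts):
        order = nodes[:]
        rng.shuffle(order)                            # a random ring is feasible for r_ij = 2
        edges = {norm(order[i], order[(i + 1) % len(order)]) for i in range(len(order))}
        while (better := try_x_change(nodes, edges, cost)) is not None:
            edges = better                            # move to the new feasible solution
        if total_cost(edges, cost) < best_cost:       # keep the best local minimum
            best, best_cost = edges, total_cost(edges, cost)
    return best, best_cost


SQUARE = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
ring, ring_cost = alg1_sketch(SQUARE, starts=3)
print(sorted(ring), ring_cost)  # [(0, 1), (0, 3), (1, 2), (2, 3)] 4.0
```

Trying both reconnections covers the two ways the endpoints in Fig. 2 can be labeled. The brute-force feasibility test costs n + 1 connectivity checks per candidate, which is acceptable only for small illustrative instances like the four-node square, where every crossing ring is uncrossed into the perimeter.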

For the SNDP with connectivity constraints $r_{ij} \in \{1, 2\}$
for all $i, j \in V$, Monma and Shallcross [6] proposed
a specific method for obtaining initial solutions and
performing local improvements. For the SNDP with node
connectivity $r_{ij} > 2$ for some pairs of nodes $i, j$, no
special method for initial solutions and local improvements
has been studied so far.

As much of modern life depends on telecommunications,
the importance of survivable networks is increasing
rapidly. The structure of the SNDP will also become more
complex in order to model the complicated requirements
of real-world telecommunications. Hence, new classes of
SNDP will be introduced, and they deserve the attention
of many researchers.
Let $A$ be an $n \times n$ symmetric matrix and $b$ a column
vector of length $n$. Then the system of linear equations


Ax = b, 


where x is a column vector of length n, is a symmetric 
system of linear equations. To demonstrate what a sym- 
metric matrix is, consider the following two matrices, 


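$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & -1 & -4 \\ 3 & -4 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 2 & 3 \\ 7 & -1 & -4 \\ 5 & -3 & 1 \end{pmatrix}.$$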


The matrix on the left is symmetric because the element
in row $i$ and column $j$ equals the element in row $j$ and
column $i$, that is, $A = A^T$. The matrix on the right does
not have this property and is therefore not symmetric.
Although the Chinese investigated systems of linear
equations around 250 BC, the modern study of systems of
linear equations began in the late 17th century with
G.W. Leibniz. The solution techniques of that time were
developed through the use of determinants, and the idea
of a matrix was not introduced until 1850, by J.J.
Sylvester. In 1855, the English mathematician A. Cayley
published the first article concerned with the algebra of
matrices, and it was Cayley who defined what it means for
a matrix to be symmetric.

Symmetric systems of linear equations often arise 
when dealing with optimization problems. For exam- 
ple, many common optimization algorithms, such as 
gradient descent, quasi-Newton methods, and Newton’s 
method, use a solution to a symmetric linear system to 
decide a direction in which to search for the next iterate. 
Consider the unconstrained optimization problem 


$$\min_{x \in \mathbb{R}^n} f(x),$$

where $f$ is a twice continuously differentiable function
from $\mathbb{R}^n$ to $\mathbb{R}$. For any $x \in \mathbb{R}^n$, the search direction $\Delta x$
that solves

$$S\,\Delta x = -\nabla f(x)$$

leads to gradient descent if $S$ is the identity matrix,
Newton's method if $S$ is the Hessian of $f$ at $x$, and
quasi-Newton methods if $S$ is an approximation of the
Hessian of $f$. In all of these cases $S$ is symmetric, so
common optimization routines require the solution of
a symmetric system of linear equations. Another
optimization problem requiring the solution of a symmetric
system of linear equations is the unconstrained, full-rank
least squares problem

$$\min_{x} \|Bx - b\|_2,$$

where $B$ is an $m \times n$ matrix, $m \ge n$, and the rank of $B$
is $n$. The unique solution of this problem is the solution
of the symmetric system

$$B^T B x = B^T b.$$
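Both symmetric systems above can be set up and solved in a few lines. This sketch assumes NumPy is available and uses a hypothetical random convex quadratic $f(x) = \tfrac{1}{2}x^T H x - c^T x$ and random least-squares data; it checks that the gradient direction ($S = I$) is a descent direction, that one Newton step ($S = H$) on a quadratic lands on the minimizer, and that the normal equations agree with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical convex quadratic f(x) = 0.5 x^T H x - c^T x; its Hessian H is symmetric positive definite.
M = rng.standard_normal((3, 3))
H = M @ M.T + 3 * np.eye(3)
c = rng.standard_normal(3)


def grad_f(x):
    return H @ x - c


x = np.zeros(3)
dx_gradient = -grad_f(x)                       # S = I: gradient descent direction
dx_newton = np.linalg.solve(H, -grad_f(x))     # S = Hessian: Newton direction
print(grad_f(x) @ dx_gradient < 0)             # True: a descent direction
# For a quadratic, one full Newton step reaches the minimizer, which solves H x = c.
print(np.allclose(x + dx_newton, np.linalg.solve(H, c)))  # True

# Full-rank least squares: solve the symmetric normal equations B^T B x = B^T b.
B = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x_normal = np.linalg.solve(B.T @ B, B.T @ b)
x_lstsq, *_ = np.linalg.lstsq(B, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))          # True
```

Forming $B^T B$ squares the condition number of $B$, which is why library least-squares routines such as np.linalg.lstsq work with $B$ directly rather than with the normal equations.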


Symmetric matrices have several desirable eigenvalue and
eigenvector properties. One of them is that the 2-norm of
a symmetric matrix equals its spectral radius. In other
words, if $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the symmetric
matrix $A$, then

$$\|A\|_2 = \max_{1 \le i \le n} |\lambda_i|,$$

which coincides with the largest eigenvalue $\lambda_{\max}$ when $A$ is
positive semidefinite.
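A quick numerical check of this property, assuming NumPy is available, on the symmetric example matrix given at the beginning of this article. That matrix is indefinite and its eigenvalue of largest magnitude is negative, so the 2-norm matches $\max_i |\lambda_i|$ rather than the largest eigenvalue.

```python
import numpy as np

# The symmetric example matrix from the beginning of this article.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, -1.0, -4.0],
              [3.0, -4.0, 1.0]])

eigenvalues = np.linalg.eigvalsh(A)            # real eigenvalues, ascending
two_norm = np.linalg.norm(A, 2)                # largest singular value
print(np.isclose(two_norm, np.max(np.abs(eigenvalues))))  # True: spectral radius
print(np.isclose(two_norm, eigenvalues[-1]))               # False: A is indefinite
```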


Two of the most important eigenvalue properties, shown
by Ch. Hermite in 1855, are that symmetric matrices have
real eigenvalues and are orthogonally similar to a diagonal
matrix whose main diagonal contains the eigenvalues of
the matrix. The latter result relies on the fact that the
eigenvectors of an $n \times n$ symmetric matrix $A$ can be chosen
to form an orthonormal basis of $\mathbb{R}^n$. Using such a basis as
the columns of a matrix $Q$, we have the similarity relation

$$A = QDQ^T,$$

where $D$ is the diagonal matrix of eigenvalues.


If this factorization of $A$ is known, then the system
$Ax = b$ can easily be solved by first solving

$$Dy = Q^T b,$$

and then setting $x = Qy$ (this works because $Q^T Q = I$,
so $Ax = b$ is equivalent to $D(Q^T x) = Q^T b$). Although
this diagonal factorization is in general expensive to
compute, it is useful when dealing with quasi-Newton
methods. These methods often update the current
approximation of the Hessian by adding a rank-one matrix
so that, in the following iteration, the new matrix
produces a ‘better’ search direction. Hence, if $S$ is the
symmetric approximation to the Hessian of $f$ in the
current iteration, the next iteration uses $S + vv^T$, where
$v$ is defined by the algorithm. The result needed to
analyze and implement these quasi-Newton methods is
due to Ch. Loewner. The theorem is known as the
interlocking eigenvalue theorem, and it shows the
relationship between the eigenvalues of the symmetric
matrices $S$ and $S + vv^T$. It states that if the eigenvalues
of $S$ are

$$\lambda_1 \le \lambda_2 \le \dots \le \lambda_n,$$

and the eigenvalues of $S + vv^T$ are

$$\sigma_1 \le \sigma_2 \le \dots \le \sigma_n,$$

then

$$\lambda_1 \le \sigma_1 \le \lambda_2 \le \sigma_2 \le \dots \le \lambda_n \le \sigma_n.$$


Essentially, this shows that if a symmetric matrix is
formed as the sum of two symmetric matrices, one of
which has rank one, then the eigenvalue structure remains
largely intact. This leads to preconditioning and scaling
routines that make quasi-Newton methods robust.
Furthermore, subsequent results show how to efficiently
obtain the diagonal factorization of $S + vv^T$ from the
factorization of $S$. This means that solving the symmetric
system of equations for the next iteration is relatively
cheap when the diagonal factorization of $S$ is known.
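The diagonal factorization, the solve through $Dy = Q^Tb$, and the interlocking of eigenvalues under a rank-one update can all be checked numerically. This is a minimal sketch assuming NumPy; the random matrix $S$, right-hand side $b$ and update vector $v$ are hypothetical data. np.linalg.eigh returns the eigenvalues in ascending order and the orthonormal eigenvectors as the columns of $Q$; the sketch also assumes $S$ is nonsingular, which holds for such random data with probability one.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
S = (M + M.T) / 2                  # hypothetical symmetric, generally indefinite matrix
b = rng.standard_normal(5)

# Diagonal factorization S = Q D Q^T: eigh returns ascending eigenvalues and orthonormal Q.
lam, Q = np.linalg.eigh(S)
print(np.allclose(Q @ np.diag(lam) @ Q.T, S))   # True

# Solve S x = b via D y = Q^T b (a division, since D is diagonal), then x = Q y.
y = (Q.T @ b) / lam
x = Q @ y
print(np.allclose(S @ x, b))                    # True

# Rank-one update S + v v^T: its eigenvalues sigma interlock with lam.
v = rng.standard_normal(5)
sigma = np.linalg.eigvalsh(S + np.outer(v, v))
tol = 1e-10
print(np.all(lam <= sigma + tol))               # True: lambda_i <= sigma_i
print(np.all(sigma[:-1] <= lam[1:] + tol))      # True: sigma_i <= lambda_{i+1}
```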

Many methods other than the diagonal factorization used
above have been suggested for solving symmetric systems
of linear equations. When $A$ is positive definite, the
factorization of choice is usually the Cholesky
factorization. When the matrix is indefinite, however,
other schemes must be used. In 1846 C.G. Jacobi showed
how to use rotations to solve a symmetric system of linear
equations; this method has received recent attention
because it is inherently parallel. Another method,
presented by J. Aasen in 1971, permutes the rows and
columns of $A$ and then decomposes the resulting matrix
into a tridiagonal matrix instead of a diagonal one. The
factorization is

$$PAP^T = LTL^T,$$

where $P$ is a permutation matrix, $L$ is a lower triangular
matrix, and $T$ is a tridiagonal matrix. To solve $Ax = b$, one
then solves the sequence of problems $Lz = Pb$, $Tw = z$,
$L^T y = w$, and sets $x = P^T y$.
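The solve sequence can be checked with hand-picked factors. This sketch assumes NumPy and does not compute Aasen's factorization itself: it builds $A$ from a hypothetical permutation $P$, unit lower triangular $L$ and symmetric tridiagonal $T$, so that $PAP^T = LTL^T$ holds by construction, and then recovers $x$. Since $LTL^T(Px) = Pb$, the intermediate vector is $y = Px$, and the last step undoes the permutation with $P^T$ ($P$ is orthogonal). The helper names are hypothetical, and np.linalg.solve stands in for a dedicated tridiagonal solver, which would be cheaper.

```python
import numpy as np


def forward_sub(L, r):
    """Solve L z = r for lower-triangular L."""
    z = np.zeros_like(r)
    for i in range(len(r)):
        z[i] = (r[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z


def back_sub(U, r):
    """Solve U y = r for upper-triangular U."""
    y = np.zeros_like(r)
    for i in reversed(range(len(r))):
        y[i] = (r[i] - U[i, i + 1:] @ y[i + 1:]) / U[i, i]
    return y


def solve_with_ltlt(P, L, T, b):
    """Solve A x = b given P A P^T = L T L^T."""
    z = forward_sub(L, P @ b)            # L z = P b
    w = np.linalg.solve(T, z)            # T w = z (T is tridiagonal)
    y = back_sub(L.T, w)                 # L^T y = w, so y = P x
    return P.T @ y                       # x = P^T y


# Hypothetical factors, used to build A = P^T L T L^T P.
n = 4
P = np.eye(n)[[2, 0, 3, 1]]              # a permutation matrix
L = np.eye(n) + np.tril(np.arange(1.0, n * n + 1).reshape(n, n) / 10, k=-1)
T = (np.diag([2.0, -1.0, 3.0, -2.0])
     + np.diag([1.0, 0.5, 1.0], 1) + np.diag([1.0, 0.5, 1.0], -1))
A = P.T @ L @ T @ L.T @ P                # symmetric and indefinite
b = np.arange(1.0, n + 1)
x = solve_with_ltlt(P, L, T, b)
print(np.allclose(A @ x, b))             # True
```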


See also 


> ABS Algorithms for Linear Equations and Linear 
Least Squares 

> Cholesky Factorization 

> Gauss, Carl Friedrich 

> Interval Linear Systems 

> Large Scale Trust Region Problems 

> Large Scale Unconstrained Optimization 


> Linear Programming 

> Orthogonal Triangularization 

> Overdetermined Systems of Linear Equations 
> QR Factorization 
In a general format, a theorem of the alternative (TA)
claims that, between two given propositions, say S 
and S*, one and only one is true. In mathematics S 
and S* are, in general, systems of equalities or in- 
equalities. A TA for linear algebraic systems was es- 
tablished as early as 1873 by P. Gordan [11]; then 
there was the celebrated Farkas lemma in 1902 [7] (cf. 
also » Farkas lemma; » Farkas lemma: Generaliza- 
tions); indeed, such a lemma does not appear as a TA, 
but an obvious reformulation shows it as a TA. Some 
further important TAs were established in 1915 by E.
Stiemke [22], in 1936 by T.S. Motzkin [19], in 1951 by
M.L. Slater [21], in 1956 by A.W. Tucker, and in 1956
by R.J. Duffin (see [17]). Subsequently, mainly due to
the development of optimization theory, there has
been a flourishing of TAs; they have been extended to not
necessarily algebraic systems, to systems in an infinite- 
dimensional space, to systems in a complex space, and 
even to systems for point-to-set maps. TA (sometimes 
called transposition theorems) have been conceived as 
tools for proving some theorems of linear algebra (this 
is the reason why the Farkas TA is known as a lemma) 
or to prove the existence and uniqueness of solutions of 
differential and integral equations [24]. 

It is interesting to note that, a few years later, in
a completely different field of mathematics, ideas matured
that led to the so-called separation theorems (ST). Here
too, the first important result does not look like an ST:
building on some ideas of E. Helly in 1912 [14], S. Banach
in 1925 [2] and H. Hahn in 1927 [12] independently
established the celebrated Hahn-Banach linear extension
theorem, which an obvious reformulation shows to be an
ST. Here too, the purpose was to obtain lemmas for
proving other theorems, in functional analysis and
geometry.

For many years TA and ST were developed as disjoint
theories. Recently, thanks to the great development of
optimization and to the increasing use of TA and ST in
optimization theory, it has been recognized that TA and ST
are different ‘languages’ for expressing the same
‘structural’ property (this does not imply that one of
them should be dropped; on the contrary, different
languages let us obtain more properties) and, above all,
that they are not only tools for proving theorems; indeed,
they have become the basis of the theory of constrained
extrema.

After a short review of some TAs, their application to
proving fundamental theorems of optimization will be
shown. Then we will briefly describe the recent approach
to the theory of constrained extrema, which is based on
TA and ST. Matrices and vectors will be real-valued.


Farkas Lemma 


Let $A$ be an $m \times n$ matrix, $a$ a row $n$-vector, and $x$
a column $n$-vector. Then $Ax \ge 0$ implies $ax \ge 0$ if and only
if there exists a nonnegative row $m$-vector $z$ such that
$zA = a$.

This lemma has a useful vector interpretation.
The rows of $A$ can be seen as vectors of $\mathbb{R}^n$; let $C$ be the
(convex) cone generated by them, and set
$C^* := \{x \in \mathbb{R}^n : Ax \ge 0\}$. The lemma states that the elements
of $C$ are the only vectors having a nonnegative scalar
product with every vector of $C^*$; hence, if $ax \ge 0$ for all
$x \in C^*$, then $a$ must belong to $C$.

Farkas lemma can be equivalently formulated as a
TA:

Theorem 1 Let us adopt the same notation as in Farkas
lemma, and let $z$ be a row $m$-vector. Between the systems
(in the unknowns $x$ and $z$)

$$S_1:\ Ax \ge 0,\ ax < 0$$
and
$$S_1^*:\ zA = a,\ z \ge 0,$$

one and only one has solutions. 
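When $A$ is square and nonsingular, the alternative of Theorem 1 can be decided with two linear solves, which makes the "one and only one" statement easy to see: $zA = a$ has the unique solution $z = aA^{-1}$; if $z \ge 0$, then $S_1^*$ holds; otherwise, for any $k$ with $z_k < 0$, the vector $x = A^{-1}e_k$ satisfies $Ax = e_k \ge 0$ and $ax = z(Ax) = z_k < 0$, so $S_1$ holds. This is a minimal sketch assuming NumPy; the function name farkas_alternative and the example data are hypothetical, and a general (non-square or singular) $A$ needs a linear programming solver instead.

```python
import numpy as np


def farkas_alternative(A, a, tol=1e-12):
    """Decide which system of Theorem 1 has a solution, for square nonsingular A only.

    Returns ("S1*", z) with z >= 0 and zA = a, or ("S1", x) with Ax >= 0 and ax < 0.
    """
    A = np.asarray(A, dtype=float)
    a = np.asarray(a, dtype=float)
    z = np.linalg.solve(A.T, a)      # zA = a  <=>  A^T z^T = a^T (unique for nonsingular A)
    if np.all(z >= -tol):
        return "S1*", z
    k = int(np.argmin(z))            # a component with z_k < 0
    e_k = np.zeros(len(a))
    e_k[k] = 1.0
    x = np.linalg.solve(A, e_k)      # Ax = e_k >= 0, and ax = z(Ax) = z_k < 0
    return "S1", x


A = np.array([[2.0, 1.0], [0.0, 1.0]])
print(farkas_alternative(A, [2.0, 3.0]))   # a = 1*(2, 1) + 2*(0, 1) lies in the cone C
print(farkas_alternative(A, [2.0, -1.0]))  # a lies outside C: a separating x is returned
```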


System $S_1^*$ introduces a new variable and a new space,
namely the one where $z$ runs, which can be called the dual
space of the one where $x$ runs, as we will see later.

From Theorem 1 we immediately deduce another
TA.


Theorem 2 Let $A$ and $B$ be matrices of orders $m \times n$ and
$p \times n$, respectively, $u$ a row $m$-vector, and $v$ a row
$p$-vector. Between the systems (in the unknowns $x$ and
$(u, v)$)

$$S_2:\ Ax \le 0,\ Bx < 0$$
and
$$S_2^*:\ uA + vB = 0,\ u \ge 0,\ v \ge 0,\ v \ne 0,$$

one and only one has solutions.


If both $S_2$ and $S_2^*$ were possible, then, since $u \ge 0$, $v \ge 0$
and $v \ne 0$, the inequality $(uA + vB)x < 0$ would hold, which
contradicts the equation in $S_2^*$. Now let $e$ be the column
$p$-vector whose entries all equal 1. The impossibility of $S_2$
is equivalent to that of the system $Ax \le 0$, $Bx + et \le 0$,
$t > 0$ (with $t \in \mathbb{R}$), which is easily identified as a system of
type $S_1$ in the unknowns $(x, t)$. Hence, by Theorem 1, the
impossibility of $S_2$ implies the possibility of the system (we
set $z = (u, v)$) $-uA - vB = 0$, $-ve = -1$, $u \ge 0$, $v \ge 0$,
which shows the possibility of $S_2^*$.

A vector interpretation quite analogous to the one above
can be given for Theorem 2. For $A = 0$, Theorem 2 becomes
the TA stated by Gordan. Now let us show, by means of
classic instances, how TAs have been exploited to prove
fundamental theorems on constrained extrema. To this end,
consider the following minimization problem with bilateral
(equality) constraints:

$$P_1:\quad \min f(x) \quad \text{s.t.} \quad g(x) = 0,$$

where the function $f : \mathbb{R}^n \to \mathbb{R}$ and the column vector
function $g : \mathbb{R}^n \to \mathbb{R}^m$ are differentiable at least at $\bar{x} \in \mathbb{R}^n$.
Let $\nabla f(\bar{x})$ denote the row $n$-vector gradient of $f$ at $\bar{x}$ and
$\nabla g(\bar{x})$ the $m \times n$ Jacobian matrix of $g$ at $\bar{x}$.

It is well known (see [17,20]) that, under suitable assumptions, if x̄ is a (local) minimum point for P₁, then the directional derivative of f is nonnegative along each direction d of the linear manifold ∇g(x̄)d = 0. Such assumptions are, e.g., that ∇g(x̄) has maximum rank and that ∇g(x) is continuous around x̄. This manifold is tangent, at x̄, to the (nonlinear) manifold g(x) = 0, and d is a column n-vector. Besides, the derivative is of course also nonnegative along each feasible direction. This means that ∇g(x̄)d = 0 implies ∇f(x̄)d ≥ 0, or

$$\begin{pmatrix} \nabla g(\bar x) \\ -\nabla g(\bar x) \end{pmatrix} d \ge 0 \quad \text{implies} \quad \nabla f(\bar x)\, d \ge 0.$$

Set A equal to the 2m × n matrix on the left-hand side and a = ∇f(x̄). Farkas' lemma can then be applied. Its thesis now means the existence of a nonnegative row (2m)-vector, say y = (y′, y″) with y′, y″ ∈ ℝᵐ, such that (y′ − y″)∇g(x̄) = ∇f(x̄), or

$$\nabla f(\bar x) = \lambda \nabla g(\bar x),$$

where λ := y′ − y″. Hence, by means of a TA, we have achieved the existence of a vector λ such that the pair (x̄, λ) is a stationary point of the Lagrangian function L(x; λ) := f(x) − λg(x). The elements of λ are known as Lagrange multipliers.
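Here is a small worked instance, chosen for illustration and not taken from the source. Minimize f(x) = x₁ + x₂ subject to g(x) = x₁² + x₂² − 2 = 0. The minimum is at x̄ = (−1, −1). There ∇f(x̄) = (1, 1) and ∇g(x̄) = (−2, −2), which has maximum rank, so ∇f(x̄) = λ∇g(x̄) gives λ = −1/2. The sketch computes λ by least squares and checks that (x̄, λ) is a stationary point of L(x; λ) = f(x) − λg(x). The helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical example of P1: minimize f(x) = x1 + x2 subject to g(x) = x1**2 + x2**2 - 2 = 0.
def grad_f(x):
    """Row gradient of f(x) = x1 + x2 (constant)."""
    return [Fraction(1), Fraction(1)]

def grad_g(x):
    """Row gradient (1 x 2 Jacobian) of g(x) = x1**2 + x2**2 - 2."""
    return [2 * Fraction(x[0]), 2 * Fraction(x[1])]

def multiplier(x):
    """Least-squares lambda for grad_f(x) = lambda * grad_g(x) with a single constraint."""
    gf, gg = grad_f(x), grad_g(x)
    return sum(a * b for a, b in zip(gf, gg)) / sum(b * b for b in gg)

def lagrangian_gradient(x, lam):
    """Gradient in x of L(x; lambda) = f(x) - lambda * g(x)."""
    return [a - lam * b for a, b in zip(grad_f(x), grad_g(x))]

x_bar = [-1, -1]
lam = multiplier(x_bar)
print("lambda =", lam, "grad L =", lagrangian_gradient(x_bar, lam))
```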
Now, consider the following minimization problem with unilateral constraints:

$$P_2:\quad \min f(x) \quad \text{s.t.}\quad g(x) \ge 0,$$

where f and g are as in P₁.

In 1948 F. John proved the following necessary condition (see [17,20]), under the assumption that f and g are differentiable at least at x̄. If x̄ is a (local) minimum point for P₂, then there exist θ ∈ ℝ₊ and λ ∈ ℝᵐ₊ with (θ, λ) ≠ 0 such that (θ, λ) is a solution of the following system, in the unknowns θ ∈ ℝ and the row vector λ ∈ ℝᵐ:

$$FJ:\quad \theta \nabla f(\bar x) - \lambda \nabla g(\bar x) = 0,\quad \lambda g(\bar x) = 0.$$

To show this condition, let us set g(x)ᵀ = (g₁(x), …, gₘ(x)) and λ = (λ₁, …, λₘ), where the superscript T denotes transposition. Let us also introduce the sets

$$I := \{1, \dots, m\},\qquad I^0 := \{i \in I : g_i(\bar x) = 0\}.$$


If the thesis is false, then the (linear homogeneous) system

$$\theta \nabla f(\bar x) - \sum_{i \in I^0} \lambda_i \nabla g_i(\bar x) = 0,\qquad \theta \ge 0,\quad \lambda_i \ge 0,\ i \in I^0,\qquad (\theta, \lambda_i,\ i \in I^0) \ne 0 \tag{1}$$

has no solution. Indeed, suppose (1) had a solution, say (θ*, λᵢ*, i ∈ I⁰). Then, by setting λᵢ* = 0 for i ∈ I∖I⁰ and λ* = (λ₁*, …, λₘ*), the pair (θ*, λ*) would be a solution of FJ. With the positions

$$A = 0,\qquad B = \begin{pmatrix} \nabla f(\bar x) \\ -\nabla g_i(\bar x),\ i \in I^0 \end{pmatrix},\qquad v = (\theta, \lambda_i,\ i \in I^0),$$

(1) is identified with S₂*. Hence, from Theorem 2 we deduce that S₂ has solutions, that is, there exists a column n-vector ȳ such that

$$\nabla f(\bar x)\,\bar y < 0,\qquad \nabla g_i(\bar x)\,\bar y > 0,\quad i \in I^0.$$

These inequalities would mean that ȳ is a feasible direction along which the directional derivative of f at x̄ is negative. According to a well-known Linearization Lemma [1,17], this fact contradicts the assumption that x̄ is a minimum point.


The above Lagrange and John necessary optimality conditions show how a TA has been classically used, and hence why TA have been conceived. However, TA (and ST) possess a much greater potential than the one exploited for proving theorems. To explain this aspect briefly, let us consider again P₂. Assume now that f is convex and g is concave, but not necessarily differentiable, so that P₂ is a convex problem (minimization of a convex function over a convex domain). Let us set φ(x̄; x) := f(x̄) − f(x). By the very definition of minimum, a feasible x̄ ∈ ℝⁿ is trivially a minimum point for P₂ if and only if the following system, in the unknown x, is impossible:

$$S_3:\quad \varphi(\bar x; x) > 0,\quad g(x) \ge 0,\quad x \in \mathbb{R}^n.$$

Equivalently,

$$S_3':\quad H \cap K(\bar x) = \emptyset,$$

where H := {(u, v) ∈ ℝ × ℝᵐ : u > 0, v ≥ 0}, K(x̄) := {(u, v) ∈ ℝ × ℝᵐ : u = φ(x̄; x), v = g(x), x ∈ ℝⁿ} = F(x̄; ℝⁿ), and F(x̄; x) := (φ(x̄; x), g(x)). It is easy to see that S₃′ holds if and only if

$$S_3'':\quad H \cap \bigl(K(\bar x) - \operatorname{cl} H\bigr) = \emptyset,$$

where the difference is in the vector sense and cl denotes closure. K(x̄) is called the image of P₂. The set K(x̄) − cl H is convex [9], as is H. In the image space (u, v), consider the family of hyperplanes ℋ defined by the equation

$$\ell(u, v; \theta, \lambda) := \theta u + \lambda v = 0,\qquad \theta \ge 0,\quad (\theta, \lambda) \ne 0,$$

where the scalar θ and the row m-vector λ are the parameters that describe the family. Denote by ℋ⁻ the closed halfspace defined by the nonpositive level set of ℓ, and by ℋ⁺ the open halfspace defined by the positive level set of ℓ. We would like to claim that S₃″ (and thus S₃′) holds if and only if there exists a hyperplane ℋ such that K(x̄) − cl H ⊆ ℋ⁻ (and thus K(x̄) ⊆ ℋ⁻). The necessity is an obvious consequence of the convexity of H and K(x̄) − cl H. Unfortunately, the sufficiency does not hold. In fact, if θ = 0, the above inclusion does not exclude either of two situations. First, elements of K(x̄) − cl H, and thus of K(x̄) (which belongs to ℋ⁻), may belong to H, more precisely to H ∩ frt ℋ⁻, where frt denotes frontier. Second, it may happen that H ∩ F(x̄; X(λ)) ≠ ∅, where

$$X(\lambda) := \{x \in \mathbb{R}^n : \varphi(\bar x; x) > 0,\ g(x) \ge 0,\ \lambda g(x) = 0\}.$$

This can be expressed by saying that the above inclusion assures separation, but not disjunctive separation, between H and K(x̄) − cl H (and thus K(x̄)). Instead, if θ = 1, the above inclusion implies disjunctive separation, so that the sufficiency holds. (If θ > 0, θ can be reduced to 1 thanks to the homogeneity of ℓ.) Therefore, in general the sufficiency does not hold. This drawback can be overcome in two ways.

In the former (a priori), we restrict the class of functions φ, g in order to guarantee the existence of a separating hyperplane having θ > 0. This is done by making suitable assumptions. They are called constraint qualifications (see [17]) if they involve only g; an example is Slater's condition, which requires the existence of x̂ ∈ ℝⁿ such that g(x̂) > 0. They are called regularity conditions if they involve both φ and g (see [9,17,18]).

In the latter (a posteriori), we must check that H ∩ F(x̄; X(λ)) = ∅.

In the preceding claim we have considered the inclusion in ℋ⁻, since it is a closed halfspace. Now let φ and g be affine and θ = 0, and replace ℋ⁻ with the negative level set of ℓ, say ℋ⁻⁻. Then the sufficient part of the above claim becomes self-evident, since ℋ⁻⁻ ∩ H = ∅. The necessity holds since S₃′ and the "parallelism" between K(x̄) and the u-axis (the case θ = 0) imply K(x̄) ⊆ ℋ⁻⁻.

We have obtained the following theorem. In the sequel we leave implicit the dependence of φ and K (and hence of F) on x̄. This stresses the fact that Theorem 3 holds independently of x̄: it holds whatever the concave function φ(x̄; ·) may be, and not only when φ(x̄; x) = f(x̄) − f(x). In fact, x̄ plays no substantial role in going from S₃ to S₃′, S₃″ or S₃*. A change of x̄ merely translates K in the direction of the u-axis and does not affect the conclusion.


Theorem 3 Let φ: ℝⁿ → ℝ and g: ℝⁿ → ℝᵐ.

i) Assume that φ and g are affine. S₃ is impossible if and only if there exist θ ∈ ℝ and λ ∈ ℝᵐ such that

$$S_3^*:\quad \theta \varphi(x) + \lambda g(x) \le 0\quad \forall x \in \mathbb{R}^n,\qquad \theta \ge 0,\quad \lambda \ge 0,\quad (\theta, \lambda) \ne 0,$$

where the first inequality must be verified in the strict sense if θ = 0.

ii) Assume that φ and g are concave, and that there exists x̂ ∈ ℝⁿ such that g(x̂) > 0. S₃ is impossible if and only if there exists λ ∈ ℝᵐ such that

$$S_3^{*\prime}:\quad \varphi(x) + \lambda g(x) \le 0\quad \forall x \in \mathbb{R}^n,\qquad \lambda \ge 0.$$

iii) S₃ is impossible if and only if there exist θ ∈ ℝ and λ ∈ ℝᵐ such that

$$\theta \varphi(x) + \lambda g(x) \le 0\quad \forall x \in \mathbb{R}^n,$$

with θ ≥ 0, λ ≥ 0, (θ, λ) ≠ 0, and X(λ) = ∅ when θ = 0.


Before touching on the consequences of the above approach for P₂, let us show how Theorem 3 can be used as a source for deriving TA. We do this by deducing some classic linear TA from Theorem 3, even though, historically, these were established directly.

With the notation of Theorem 1, set φ(x) = −ax and g(x) = Ax, so that S₁ becomes a particular case of S₃. Theorem 3i) can be applied. At θ = 0, S₃* becomes λ ≥ 0, λAx < 0 ∀x ∈ ℝⁿ, which is obviously impossible (take x = 0). At θ = 1, S₃* becomes λ ≥ 0, −ax + λAx ≤ 0 ∀x ∈ ℝⁿ. This holds if and only if λ ≥ 0 and −a + λA = 0, which is equivalent to S₁*. Theorem 1 follows from Theorem 3.

With the notation of Theorem 2 and its proof, S₂ turns out to be equivalent to the system t > 0, Ax ≤ 0, Bx + et ≤ 0. This is easily identified as a particular case of S₃ where x is replaced by (x, t), φ(x) by t, and g(x) by

$$-\begin{pmatrix} A & 0 \\ B & e \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}.$$

Theorem 3i) can be applied. Set λ = (u, v). At θ = 0, S₃* is obviously impossible. At θ = 1 it becomes

$$u \ge 0,\quad v \ge 0,\quad t - (uA + vB)x - vet \le 0\quad \forall x \in \mathbb{R}^n,\ \forall t \in \mathbb{R},$$

which holds if and only if u ≥ 0, v ≥ 0, ve = 1, uA + vB = 0. This is equivalent to S₂*. Theorem 2 follows from Theorem 3.
Theorem 4 (Stiemke's theorem) Let A be an m × n matrix. Between the systems (in the unknowns x and u):

$$S_4:\quad Ax \ge 0,\quad Ax \ne 0$$

and

$$S_4^*:\quad uA = 0,\quad u > 0,$$

one and only one has solutions.

Observe that S₄* is equivalent to Aᵀuᵀ ≤ 0, −Aᵀuᵀ ≤ 0, −uᵀ < 0. Hence S₄* can be seen as a special case of S₂, and applying Theorem 2 quickly leads to the thesis.


Theorem 5 (Motzkin's theorem) Let A, B, C be matrices of the orders m × n, p × n, q × n, respectively. Between the systems (in the unknowns x and (u, v, y)):

$$S_5:\quad Ax > 0,\quad Bx \ge 0,\quad Cx = 0$$

and

$$S_5^*:\quad uA + vB + yC = 0,\quad u \ge 0,\quad u \ne 0,\quad v \ge 0,$$

one and only one has solutions.

Let e be a column m-vector whose entries equal 1. It is immediate to see that S₅ is impossible if and only if the following system, in the unknown (x, t) ∈ ℝⁿ × ℝ, is impossible: t > 0, Ax ≥ et, Bx ≥ 0, Cx = 0. This is easily identified as a particular case of S₃ where x is replaced by (x, t), φ(x) by t, and g(x) by

$$\begin{pmatrix} A & -e \\ B & 0 \\ C & 0 \\ -C & 0 \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}.$$

Theorem 3i) can be applied. Set λ = (u, v, y′, y″). At θ = 0, S₃* is obviously impossible. At θ = 1 it becomes

$$u \ge 0,\quad v \ge 0,\quad y' \ge 0,\quad y'' \ge 0,\qquad t + [uA + vB + (y' - y'')C]x - uet \le 0\quad \forall x \in \mathbb{R}^n,\ \forall t \in \mathbb{R}.$$

With y := y′ − y″, this holds if and only if u ≥ 0, v ≥ 0, uA + vB + yC = 0, ue = 1, which is equivalent to S₅*. Theorem 5 follows from Theorem 3.


Theorem 6 (Slater's theorem) Let A, B, C, D be matrices of the orders m × n, p × n, q × n, r × n, respectively. Between the systems (in the unknowns x and (u, v, y, z)):

$$S_6:\quad Ax > 0,\quad Bx \ge 0,\quad Bx \ne 0,\quad Cx \ge 0,\quad Dx = 0$$

and

$$S_6^*:\quad uA + vB + yC + zD = 0\quad \text{and either}\quad u \ge 0,\ u \ne 0,\ v \ge 0,\ y \ge 0\quad \text{or}\quad u \ge 0,\ v > 0,\ y \ge 0,$$

one and only one has solutions.

It is easy to see that S₆ is equivalent to the following system in the unknown (x, t):

$$S_6':\quad t > 0,\quad Ax \ge e_m t,\quad Bx \ge 0,\quad e_p Bx \ge t,\quad Cx \ge 0,\quad Dx = 0\quad (t \in \mathbb{R}).$$

Here e_m is a column m-vector and e_p is a row p-vector, both with entries equal to 1. S₆′ is quickly identified as a particular case of S₃ where x is replaced by (x, t), φ(x) by t, and g(x) by

$$\begin{pmatrix} A & -e_m \\ B & 0 \\ e_p B & -1 \\ C & 0 \\ D & 0 \\ -D & 0 \end{pmatrix} \begin{pmatrix} x \\ t \end{pmatrix}.$$

Theorem 3i) can be applied. Set λ = (u, v′, v₀, y, z′, z″). At θ = 0, S₃* is obviously impossible. At θ = 1, S₃* becomes

$$u \ge 0,\ v' \ge 0,\ v_0 \ge 0,\ y \ge 0,\ z' \ge 0,\ z'' \ge 0,\qquad t + [uA + (v' + v_0 e_p)B + yC + (z' - z'')D]x - (u e_m + v_0)t \le 0\quad \forall x \in \mathbb{R}^n,\ \forall t \in \mathbb{R}.$$

With v := v′ + v₀e_p and z := z′ − z″, this holds if and only if

$$u \ge 0,\quad v_0 \ge 0,\quad v' \ge 0,\quad y \ge 0,\qquad uA + vB + yC + zD = 0,\qquad u e_m + v_0 = 1.$$

This system is equivalent to S₆*, since v₀ = 0 ⇒ u ≠ 0 and v₀ > 0 ⇒ v > 0. Theorem 6 follows from Theorem 3.


Theorem 7 (Tucker's theorem) Let A, B, C be matrices of the orders m × n, p × n, q × n, respectively. Between the systems (in the unknowns x and (u, v, y)):

$$S_7:\quad Ax \ge 0,\quad Ax \ne 0,\quad Bx \ge 0,\quad Cx = 0$$

and

$$S_7^*:\quad uA + vB + yC = 0,\quad u > 0,\quad v \ge 0,$$

one and only one has solutions.

Let e be a row m-vector whose entries equal 1. It is immediate to see that S₇ is possible if and only if the system eAx > 0, Ax ≥ 0, Bx ≥ 0, Cx = 0 is possible. This is easily identified as a particular case of S₃ where φ(x) = eAx and

$$g(x) = \begin{pmatrix} A \\ B \\ C \\ -C \end{pmatrix} x.$$

Theorem 3i) can be applied. Set λ = (u′, v, y′, y″). At θ = 0, S₃* is obviously impossible. At θ = 1 it becomes

$$eAx + [u'A + vB + (y' - y'')C]x \le 0\quad \forall x \in \mathbb{R}^n,\qquad u' \ge 0,\ v \ge 0,\ y' \ge 0,\ y'' \ge 0.$$

With y := y′ − y″, this holds if and only if

$$u' \ge 0,\quad v \ge 0,\quad (e + u')A + vB + yC = 0,$$

which is equivalent to S₇*, since u := e + u′ > 0. Theorem 7 follows from Theorem 3.

At B = 0 and C = 0, Theorem 7 collapses to Theorem 4.


Theorem 8 (Duffin's theorem) Let A be an m × n matrix, b a column m-vector, a a row n-vector, and α a scalar. The system (in the unknown x)

$$S_8:\quad ax > \alpha,\quad Ax \le b$$

is impossible if and only if at least one of the following systems (in the unknown λ) is possible:

$$S_8^*:\quad \lambda A = a,\quad \lambda b \le \alpha,\quad \lambda \ge 0$$

and

$$S_8^{**}:\quad \lambda A = 0,\quad \lambda b < 0,\quad \lambda \ge 0.$$

S₈ is easily identified as a particular case of S₃ where φ(x) = ax − α and g(x) = b − Ax. Theorem 3i) can be applied. At θ = 0, S₃* becomes λ ≥ 0, λb − λAx < 0 ∀x ∈ ℝⁿ, which is equivalent to S₈**. At θ = 1, S₃* becomes λ ≥ 0, ax − α + λ(b − Ax) ≤ 0 ∀x ∈ ℝⁿ, which is equivalent to S₈*. Theorem 8 follows from Theorem 3.
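Certificates for Duffin's theorem can likewise be checked mechanically. The sketch below uses hypothetical helper names and exact arithmetic. The instance x > 1, x ≤ 0 is impossible, and λ = 1 certifies this through S₈*. The constraint 0·x ≤ −1 is impossible whatever a and α are, and λ = 1 certifies this through S₈**.

```python
from fractions import Fraction

def lam_times_A(lam, A):
    """Row vector lam A, for A given as a list of rows."""
    return [sum(Fraction(l) * Fraction(row[j]) for l, row in zip(lam, A))
            for j in range(len(A[0]))]

def dot(p, q):
    """Inner product of two vectors with exact arithmetic."""
    return sum(Fraction(s) * Fraction(t) for s, t in zip(p, q))

def solves_S8(A, b, a, alpha, x):
    """True if x satisfies S8: ax > alpha and Ax <= b."""
    return dot(a, x) > alpha and all(dot(row, x) <= bi for row, bi in zip(A, b))

def certifies_S8_star(A, b, a, alpha, lam):
    """True if lam satisfies S8*: lam A = a, lam b <= alpha, lam >= 0."""
    return (all(l >= 0 for l in lam)
            and lam_times_A(lam, A) == [Fraction(c) for c in a]
            and dot(lam, b) <= alpha)

def certifies_S8_star_star(A, b, lam):
    """True if lam satisfies S8**: lam A = 0, lam b < 0, lam >= 0."""
    return (all(l >= 0 for l in lam)
            and all(c == 0 for c in lam_times_A(lam, A))
            and dot(lam, b) < 0)

# x > 1 together with x <= 0: impossible, certified by lam = 1 through S8*.
print(certifies_S8_star([[1]], [0], [1], 1, [1]))
# 0 * x <= -1: impossible, certified by lam = 1 through S8**.
print(certifies_S8_star_star([[0]], [-1], [1]))
```

Either certificate rules out S₈ by the argument behind Theorem 8. For example, with S₈*, every x satisfying Ax ≤ b gives ax = λAx ≤ λb ≤ α.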

In quite similar ways other TA can be obtained from 
Theorem 3. 

Theorem 3 can be stated directly, i.e., without introducing the image space (u, v) and separation, as is classically done in [6]. This is satisfactory if a TA is only meant to play the role of a lemma for some theorems. The approach outlined above goes further: it makes TA the basis for developing most of the topics of the Theory of Optimization, and for obtaining TA under weaker assumptions than those of Theorem 3. A few comments will now be added on these two aspects.

When the assumptions of Theorem 3ii) are not fulfilled, S₃ and S₃*′ are of course not necessarily in alternative. However, the geometric meaning of S₃*′ shows that its feasibility is a sufficient condition, without any assumption on φ and g, for S₃ to be impossible. Indeed, if g(x) ≥ 0 then φ(x) ≤ −λg(x) ≤ 0. Hence, when φ(x̄; x) = f(x̄) − f(x), the feasibility of S₃*′ is sufficient for x̄ to be a minimum point of P₂. These facts lead us to a generalization of Theorem 3. Indeed, any condition that assures the convexity of K − cl H (like the concavity of φ and g) clearly assures the existence of a separating hyperplane between H and K. It has been proved [23] that K − cl H is convex if and only if the (1 + m)-vector function F(x) := −(φ(x), g(x)) is convexlike. Here F is a convexlike function if and only if, ∀x′, x″ ∈ ℝⁿ and ∀α ∈ [0, 1], there exists z ∈ ℝⁿ such that F(z) ≤ (1 − α)F(x′) + αF(x″). Therefore, we have:


Theorem 9 Let the (1 + m)-vector function F be convexlike, and suppose that there exists x̂ ∈ ℝⁿ such that g(x̂) > 0. Then, between S₃ and S₃*′, one and only one has solutions.

If φ and g are concave (in particular affine), then F is convexlike; thus Theorem 3ii) is a special case of Theorem 9.

Now, let us briefly show how S₃*′ originates most of the topics of the theory of optimization. We again use the position φ(x̄; x) = f(x̄) − f(x). The feasibility of S₃*′ implies that x̄ is a minimum point for P₂, and S₃*′ is equivalent to

$$(\Lambda):\quad \inf_{\lambda \ge 0}\, \sup_{x \in \mathbb{R}^n}\, [f(\bar x) - f(x) + \lambda g(x)] \le 0.$$

We are therefore led to consider the problem

$$P_2^*:\quad \max_{\lambda \ge 0}\, \inf_{x \in \mathbb{R}^n} L(x; \lambda),$$

where L(x; λ) := f(x) − λg(x) is called the Lagrangian function. The elements of a vector λ that fulfills the above inequality, and thus S₃*′, are called Lagrangian multipliers. They receive an interesting interpretation as elements of the gradient of the hyperplane that separates the image K of P₂ from H. Up to an obvious transformation, the Lagrangian function is closely related to such a hyperplane. P₂* is known as the dual problem of P₂, which in turn is called the primal problem. The space of the (linear) separation functionals ℓ is called the dual space. Since we are in a finite-dimensional setting, it is isomorphic to the space ℝᵐ where λ runs. This is why the elements of the vector λ are often identified as dual variables. Without any assumption, it is possible to prove the following inequality [9]:

$$PD:\quad \sup_{\lambda \ge 0}\, \inf_{x \in \mathbb{R}^n} L(x; \lambda) \le \inf_{x \in \mathbb{R}^n}\, \sup_{\lambda \ge 0} L(x; \lambda) = \inf_{x \in R} f(x),$$

where the equality holds if R := {x ∈ ℝⁿ : g(x) ≥ 0} ≠ ∅.
The difference between the second and the first term of the above inequality is called the duality gap and is always nonnegative. Under the assumptions of Theorem 9 it is possible to prove [9,23] that the duality gap between P₂ and P₂* is zero. When f and g are affine, this result recovers the well-known duality theorem for linear programming (see [4,17]).

P₂* has proved very useful for improving algorithms for solving P₂. A classic instance is the so-called Hitchcock linear transportation problem (see [4]), where the use of dual variables (i.e., λ) drastically reduces the computational steps of the simplex algorithm. Another classic instance is the dual decomposition methods for solving the so-called mixed integer linear programs (see [4]). These methods heavily exploit the dual problem under the assumption that the duality gap is zero.

Theory and solution algorithms are not the only fields of application of duality. A solution of the dual problem always contains an important piece of information, often even more important than that of the primal problem, as classic instances illustrate. When P₂ is the format for finding the maximum flow in a network, the dual variables give the potentials at nodes and arcs (see [4]), which are crucial information for the design and management of the network. When P₂ is the format for finding the optimal production in an industry, the dual variables represent the so-called shadow prices. These give deep information on how the resources are exploited in the production process (see [4]). Many other applications might be mentioned. In all cases the introduction of the dual problem leads to a deep mathematical analysis that would have been inconceivable if only the primal problem had been introduced.
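A one-dimensional instance, chosen for illustration, makes PD concrete. Minimize f(x) = x² subject to g(x) = x − 1 ≥ 0. For fixed λ ≥ 0, L(x; λ) = x² − λ(x − 1) is minimized at x = λ/2, so the dual function is λ − λ²/4. This is maximal at λ = 2 with value 1, which equals the primal optimum f(1) = 1. The duality gap is therefore zero, as Theorem 9 predicts here: f is convex, g is affine, and g(2) > 0. The sketch evaluates both sides on grids with exact fractions. The grids are an assumption of the sketch and are not an optimization method.

```python
from fractions import Fraction

# Hypothetical instance of P2: minimize f(x) = x**2 subject to g(x) = x - 1 >= 0.
def f(x):
    return x * x

def g(x):
    return x - 1

def lagrangian(x, lam):
    """L(x; lambda) = f(x) - lambda * g(x), as in the text."""
    return f(x) - lam * g(x)

def dual_function(lam):
    """inf over x of L(x; lambda): the convex parabola in x is minimized at x = lambda / 2."""
    return lagrangian(lam / 2, lam)

lams = [Fraction(k, 100) for k in range(401)]             # grid of lambda in [0, 4]
feasible_xs = [1 + Fraction(k, 100) for k in range(201)]  # grid of feasible x in [1, 3]

dual_best = max(dual_function(lam) for lam in lams)
primal_best = min(f(x) for x in feasible_xs)
print("dual:", dual_best, "primal:", primal_best, "gap:", primal_best - dual_best)
```

Weak duality (every dual value is at most every feasible primal value) holds for any λ and feasible x. The zero gap is specific to this convex instance.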

Now, let us go back to (Λ). From this condition it is possible to derive the classic necessary and sufficient condition for x̄ to be a minimum point of P₂, expressed in terms of generalized multipliers [17,20]. Condition (Λ) has stimulated the development of several other theories, like minimax theory and game theory [5], saddle point theory [9], and penalization theory [9].

When K − cl H is not convex, S₃*′ may fail to hold even if S₃′ does. In such a case, the above separation scheme suggests introducing a nonlinear functional to replace ℓ(u, v; θ, λ). In other words, if linear separation fails, we can try to show S₃′ by means of nonlinear separation (see > Image space approach to optimization). Nonlinear separation has led to generalizations of all the above results [9]. It has also allowed TA to be extended to more general situations: systems of point-to-set maps [8,10], systems in a complex space, and systems in an infinite-dimensional space. For the latter, the first contribution is due to J. Farkas [7].


See also 


> Farkas Lemma 
> Linear Optimization: Theorems of the Alternative 
Given a list of cities, the classical traveling salesman problem is aimed at finding the least cost tour through the cities. The time-dependent traveling salesman problem is a generalization of the traveling salesman problem in which the cost of travel between cities also depends on the order in which they are traversed. We now provide a more formal description of the two problems.

Consider a set of cities N = {1, …, n} and a mapping D: N × N → ℝ. This mapping associates with each ordered city-pair the cost incurred by a travel/transition that starts from the first city and ends at the second city. The data may be pictured on a complete directed graph of n nodes, where the nodes represent the cities and each arc is labeled with the transition cost of its incident node pair. The cost function may be extended from single transitions to paths by summing the cost of travel over the arcs comprising the path. As there is a unique arc joining any two nodes, a path in the graph may be identified with a sequence of nodes.

Let us then restrict our attention to simple circuits that pass through all the nodes. These are the Hamiltonian cycles of the directed graph. Cyclic permutations of the node set produce these Hamiltonian cycles, so there are (n − 1)! possible candidates. The classical traveling salesman problem (TSP) is aimed at finding the cyclic permutation/Hamiltonian cycle of cities with the minimum travel cost.

Given a Hamiltonian cycle P, associate with every arc its ordinality in P. A variant of the TSP arises if the contribution of a transition on an arc to the cost of P depends not only on the ordered node-pair incident to the arc but also on the ordinality of the arc in P. The ordinality of the arc will be referred to as the time-period of the associated transition. This follows the intuition that the cost of travel between cities varies with time, assuming that it takes one time unit to travel between any two cities. In other words, the cost of a transition is specified as a mapping C: N × N × T → ℝ, where T is the set of possible ordinal values for an arc in any Hamiltonian cycle P. Note that |T| = |N|, since we restrict attention to Hamiltonian cycles. The time-dependent traveling salesman problem (TDTSP) is aimed at finding the minimum cost Hamiltonian cycle under the cost structure defined above. The TSP is a special case of the TDTSP, obtained by taking C(i, j, t) = D(i, j) for all t ∈ T.
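The definitions above can be made concrete with a small brute-force sketch. It is a hypothetical helper for illustration, not a method from the cited works. It assumes 0-based city and period indices and a cost array C[i][j][t] giving the cost of traversing arc (i, j) as the t-th arc of the cycle. As in the linearized models discussed below, it also assumes the tour starts at city 0. Fixing the start matters for the TDTSP: rotating a cycle shifts every arc to a different time-period, so rotations are no longer equivalent as they are in the TSP. Enumeration is practical only for very small n.

```python
from itertools import permutations

def tour_cost(tour, C):
    """Cost of the Hamiltonian cycle visiting `tour` in order; the arc leaving
    position t (including the closing arc back to tour[0]) is charged C[i][j][t]."""
    n = len(tour)
    return sum(C[tour[t]][tour[(t + 1) % n]][t] for t in range(n))

def solve_tdtsp_brute_force(C):
    """Enumerate the (n-1)! tours that start at city 0; return (cost, tour) of the cheapest."""
    n = len(C)
    best = None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        cost = tour_cost(tour, C)
        if best is None or cost < best[0]:
            best = (cost, tour)
    return best

# The TSP is the special case C[i][j][t] = D[i][j] for every period t.
D = [[0, 5, 1],
     [1, 0, 5],
     [5, 1, 0]]
C_tsp = [[[D[i][j]] * 3 for j in range(3)] for i in range(3)]
print(solve_tdtsp_brute_force(C_tsp))
```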

Computational experience indicates that the TDTSP is a significantly more difficult problem than the TSP. However, the flexibility of the more elaborate cost structure allows one to model additional interesting applications.

Applications of the TDTSP have been proposed in: 
sequence dependent scheduling with time-dependent 
set-up costs [10], scheduling with precedence con- 
straints [3] and timetabling [2]. 

Studies on the TDTSP have been made by J.C. Picard and M. Queyranne [10]. Exact solution approaches for a special case of the TDTSP, namely the delivery man problem, have been reported by A. Lucena [7]. Time-dependent vehicle routing problems have been studied by C. Malandraki and M.S. Daskin [9]. Various formulations for the TDTSP have been compared by L. Gouveia and S. Voß [4]. Benders' partitioning scheme has been used to derive an exact algorithm for the TDTSP [15]. Heuristics to accelerate the convergence of such an algorithm are discussed in [14].

In this article, we present existing formulations and 
solution methodologies for the TDTSP. First, we review 
various formulations of the TDTSP and a comparison 
of these formulations in regards to the tightness of their 
relaxations. We then present a technique of construct- 
ing tight relaxations by employing convex and concave 
envelopes of product terms. On the algorithmic side, 
we outline a modification of the Benders decomposi- 
tion algorithm for the TDTSP. Then, using multistage 
network optimization we present an acceleration tech- 
nique for the above algorithm. Finally, a variable depth 
search heuristic is briefly outlined. 


Formulations of TDTSP 


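The TDTSP is a special case of the quadratic assignment problem (QAP). The formulation for the QAP may hence be employed to model the TDTSP as follows:

$$\min\ y^{\top} Q y \quad \text{s.t.}\quad y \in AP_n,$$

where AP_n is the assignment polytope for n assignments,

$$Q = \begin{pmatrix}
0 & Q_{12} & \cdots & Q_{1,n-1} & Q_{1n} \\
0 & 0 & \cdots & Q_{2,n-1} & Q_{2n} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & Q_{n-1,n} \\
0 & 0 & \cdots & 0 & 0
\end{pmatrix},$$

and where each block Q_ik ∈ ℝⁿˣⁿ has the form

$$Q_{ik} = \begin{pmatrix}
0 & c_{ik1} & 0 & 0 & \cdots & 0 & 0 & 0 & c_{ikn} \\
c_{ki1} & 0 & c_{ik2} & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & c_{ki2} & 0 & c_{ik3} & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & c_{ki,n-3} & 0 & c_{ik,n-2} & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & c_{ki,n-2} & 0 & c_{ik,n-1} \\
c_{kin} & 0 & 0 & 0 & \cdots & 0 & 0 & c_{ki,n-1} & 0
\end{pmatrix}$$

in terms of the mapping C: N × N × T → ℝ.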
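In order to derive linearized versions of the above formulation, we define two sets of binary variables:

$$Y_{it} = \begin{cases} 1 & \text{city } i \text{ visited in period } t, \\ 0 & \text{otherwise}, \end{cases}\qquad X_{ijt} = \begin{cases} 1 & \text{transition from city } i \text{ to city } j \text{ in period } t, \\ 0 & \text{otherwise}. \end{cases}$$

The above formulation then takes the following form:

$$(Q)\quad \begin{aligned} \min\ & \sum_i \sum_j \sum_t C_{ijt}\, Y_{i,t-1} Y_{jt} \\ \text{s.t.}\ & \sum_t Y_{it} = 1, && i \in N, \\ & \sum_i Y_{it} = 1, && t \in T, \\ & Y_{it} \in \{0, 1\}, && i \in N,\ t \in T. \end{aligned}$$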

Linearized formulations are derived by introducing the transition variables X. Most linearized models assume that the tour begins and ends in city 1. This condition does not pose any additional restriction: a dummy city may be added with zero transition costs to and from the other cities and then be treated as the starting city.

In [5], a linearized reformulation of the TDTSP was derived by using the X variables:

$$\begin{aligned} \min\ & \sum_i \sum_j \sum_t C_{ijt} X_{ijt} \\ \text{s.t.}\ & \sum_t Y_{it} = 1, && i \in N, && (1) \\ & \sum_i Y_{it} = 1, && t \in T, && (2) \\ & Y_{i,t-1} + Y_{jt} \ge 2 X_{ijt}, && i, j \in N,\ t \in T, && (3) \\ & \sum_i \sum_j \sum_t X_{ijt} = n, && && (4) \\ & Y_{it} \in \{0, 1\}, && i \in N,\ t \in T, && (5) \\ & X_{ijt} \in \{0, 1\}, && i, j \in N,\ t \in T. && (6) \end{aligned}$$

Constraints (1) and (2) are the assignment constraints. Constraints (3) state that no transition from city i to city j can take place in time-period t unless city i was visited in time-period t − 1 and city j is visited in time-period t. This constraint could be tightened by replacing it with the two constraints X_ijt ≤ Y_{i,t−1} and X_ijt ≤ Y_jt, which are equivalent to it for binary variables. Constraint (4) just states that there are n transitions in any feasible solution.

The linear formulation presented by Picard and Queyranne [10] is described below:

$$\begin{aligned} \min\ & \sum_i \sum_j \sum_t C_{ijt} X_{ijt} \\ \text{s.t.}\ & \sum_j X_{1j1} = 1, && && (7) \\ & \sum_j X_{ijt} = \sum_j X_{ji,t-1}, && i \in N,\ t \in T \setminus \{1\}, && (8) \\ & \sum_i \sum_t X_{ijt} = 1, && j \in N, && (9) \\ & X_{ijt} \in \{0, 1\}, && i, j \in N,\ t \in T. && (10) \end{aligned}$$

Constraint (7) fixes the starting city as 1. Constraints (8) require an entry to any city to be followed by an exit from it in the following time-period. Constraints (9) allow only one entry to a city.

Another model for the TDTSP has been proposed by K. Fox, B. Gavish and S. Graves [3], based on the assumption that the tour begins and ends in city 1:

$$\begin{aligned} \min\ & \sum_i \sum_j \sum_t C_{ijt} X_{ijt} \\ \text{s.t.}\ & \sum_j \sum_t X_{ijt} = 1, && i \in N, && (11) \\ & \sum_t \sum_i X_{ijt} = 1, && j \in N, && (12) \\ & \sum_i \sum_j X_{ijt} = 1, && t \in T, && (13) \\ & \sum_j \sum_{t=2}^{n} t X_{ijt} - \sum_j \sum_{t=1}^{n-1} t X_{jit} = 1, && i \in N \setminus \{1\}, && (14) \\ & X_{ijt} \in \{0, 1\}, && i, j \in N,\ t \in T. && (15) \end{aligned}$$

Constraints (11), (12) and (13) are assignment relationships. They state, respectively, that each city is left exactly once, that each city is entered exactly once, and that exactly one transition takes place in each time-period. Constraints (14) are subtour elimination constraints. They force each city except the starting one to be left in the time-period following the one in which it is entered.

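We present below a linear formulation of the TDTSP proposed in N.V. Sahinidis and I.E. Grossmann [11] and R.J. Vander Wiel and Sahinidis [14]:

$$(P)\quad \begin{aligned} \min\ & \sum_i \sum_j \sum_t C_{ijt} X_{ijt} \\ \text{s.t.}\ & \sum_t Y_{it} = 1, && i \in N, \\ & \sum_i Y_{it} = 1, && t \in T, \\ & \sum_i X_{ijt} = Y_{jt}, && j \in N,\ t \in T, \\ & \sum_j X_{ijt} = Y_{i,t-1}, && i \in N,\ t \in T, \\ & Y_{it} \in \{0, 1\}, && i \in N,\ t \in T, \\ & X_{ijt} \ge 0. \end{aligned}$$

This last formulation does not require the X variables to take integral values, since the constraints enforce the integrality of X whenever the Y variables take integral values.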
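The claim that integral Y forces integral X in (P) can be checked on an example. The sketch below sets X_ijt = Y_{i,t−1}Y_jt, which is the product relation of the bilinear constraint (16) in the next section. It then verifies the two flow constraints of (P) and confirms that exactly n transitions occur. The helper names are hypothetical, indices are 0-based, and period t − 1 is taken cyclically.

```python
def assignment_matrix(tour):
    """Y[i][t] = 1 if city i is visited in period t (periods 0..n-1), else 0."""
    n = len(tour)
    Y = [[0] * n for _ in range(n)]
    for t, city in enumerate(tour):
        Y[city][t] = 1
    return Y

def transitions_from_Y(Y):
    """X[i][j][t] = Y[i][t-1] * Y[j][t], with period t-1 taken cyclically."""
    n = len(Y)
    return [[[Y[i][(t - 1) % n] * Y[j][t] for t in range(n)]
             for j in range(n)] for i in range(n)]

def satisfies_P_flow_constraints(X, Y):
    """Check sum_i X[i][j][t] = Y[j][t] and sum_j X[i][j][t] = Y[i][t-1] for all indices."""
    n = len(Y)
    inflow_ok = all(sum(X[i][j][t] for i in range(n)) == Y[j][t]
                    for j in range(n) for t in range(n))
    outflow_ok = all(sum(X[i][j][t] for j in range(n)) == Y[i][(t - 1) % n]
                     for i in range(n) for t in range(n))
    return inflow_ok and outflow_ok

Y = assignment_matrix([0, 2, 1, 3])
X = transitions_from_Y(Y)
print(satisfies_P_flow_constraints(X, Y))
```

The check succeeds because Σᵢ Y_{i,t−1} = 1 and Σⱼ Y_jt = 1 for an assignment Y. The constraints of (P) therefore leave the products as the only integral choice.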

The strengths of the various formulations are com- 
pared in [4]. It turns out that formulation P has the 
tightest linear relaxation amongst all formulations. The 
formulation in [10] achieves the same objective func- 
tion value. However, its feasible region is larger. The 
formulations in [3] and [5] are dominated by formu- 
lation (P). 


Envelopes and Tight Formulations 


We provide some insight into the tight relaxations 
for the TDTSP. By introducing the X variables in the 
TDTSP formulation (Q), we obtain the following math- 
ematical program: 


min yyy Gets 
i 7 4 

st) oYe=1, ieN, 
t 
yo neHi teT, 
i 


Xijt = Yi t-1Yit, 
Y;,€ {0,1}, i€N,teT. 


(16) 


Note that, in the above formulation, there are no in- 
tegrality restrictions on the X variables. These are au- 
tomatically enforced by the bilinear constraints (16) in 
the formulation. However, these constraints are non- 
convex and therefore the mathematical program given 
above is a nonconvex nonlinear program. A convex 
relaxation may be developed by replacing the bilinear 
constraints by convex constraints that properly contain 
all the feasible points to the above program. In particu- 
lar, if constraints (16) are replaced by linear constraints 
then the linear programming relaxations of TDTSP are 
obtained. 

We present a general methodology for construct- 
ing tight linear relaxations of 0-1 programs contain- 
ing product terms of 0-1 variables. Product terms of 0- 
1 variables have linear concave and convex envelopes 
over the unit hypercube as shown in [13]. Therefore, 
we may introduce new variables for the product terms 
and restrict them to lie in the convex set formed by 
the concave and convex envelope of the corresponding 
product term. Note that the product terms take integral 
values at the extreme points. However, as the convex 
and concave envelopes are exact at the extreme points 
of the unit hypercube, the integrality restrictions on the 
newly introduced variables become redundant and may 
be dropped. This technique may be used in conjunction 
with the reformulation-linearization technique (RLT) 
introduced by H.D. Sherali and W.P. Adams [12]. 

In the case of the TDTSP, the product terms are bilinear. The convex envelope is therefore given by

$$X_{ijt} \ge \max\{Y_{i,t-1} + Y_{jt} - 1,\ 0\}.$$

The concave envelope is given by

$$X_{ijt} \le \min\{Y_{i,t-1},\ Y_{jt}\}.$$


It was shown in [13] that the above constraints are im- 
plied in the formulation (P) described above. Further- 
more, this formulation may be derived by using the 
scheme described in this section. 
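The sketch below is a quick numeric check of these envelopes, using exact fractions and hypothetical function names. At 0-1 points both bounds coincide with the product, which is why the integrality restrictions on the new variables can be dropped. At fractional points the bounds only bracket the product.

```python
from fractions import Fraction
from itertools import product

def convex_envelope(a, b):
    """Convex envelope of a*b on [0, 1]^2: max{a + b - 1, 0}."""
    return max(a + b - 1, 0)

def concave_envelope(a, b):
    """Concave envelope of a*b on [0, 1]^2: min{a, b}."""
    return min(a, b)

# At the 0-1 vertices of the unit square both envelopes equal the product.
for a, b in product((0, 1), repeat=2):
    assert convex_envelope(a, b) == a * b == concave_envelope(a, b)

# At an interior point they only bracket the product.
half = Fraction(1, 2)
print(convex_envelope(half, half), half * half, concave_envelope(half, half))
```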


Network Interpretation of TDTSP 


We now provide a network interpretation of the 
TDTSP. If the Y variables in the linear relaxation of (P) 
are fixed, the formulation (P) reduces to a network flow 
model. Formally, the problem is:

$$(S)\quad \begin{aligned} \min\ & \sum_i \sum_j \sum_t C_{ijt} X_{ijt} \\ \text{s.t.}\ & \sum_i X_{ijt} = Y_{jt}, && j \in N,\ t \in T, \\ & \sum_j X_{ijt} = Y_{i,t-1}, && i \in N,\ t \in T, \\ & X_{ijt} \ge 0. \end{aligned}$$

It follows from the problem definition that the above problem decomposes by time-periods. In each time-period, (S) takes the form of a transportation problem. Hence, a series of transportation problems may be used to solve the above problem. An alternative way of visualizing this structure is to juxtapose the transportation problems to form a 2n-partite graph. An illustration of the graph appears as Fig. 1. The destination node representing city i in time-period t − 1 is connected to the source node representing city i in time-period t by an arc with capacity interval [Y_{i,t−1}, Y_{i,t−1}]. In this framework, the problem reduces to a feasible circulation problem. For more on network flow problems, see [1].

Fixing the Y variables amounts to identifying 
a Hamiltonian cycle of the 2n-partite graph. Hence, the 
TDTSP reduces to the problem of identifying the min- 
imum cost Hamiltonian cycle on this graph. Note that 
it is possible to combine the ith destination node in period t − 1 and the ith source node in period t to produce an n-partite graph.


Decomposition Algorithm for TDTSP 


Based on the network interpretation of the problem de- 
scribed above, it is possible to arrive at a decomposi- 
tion algorithm to solve the dual of the linear relaxation 
of (P). 

This algorithm employs ideas of Benders decompo- 
sition. The master problem is defined in the space of 
the Y variables. Once the Y variables are fixed, the sub- 
problem is a set of n transportation problems. They are 
solved and one of their dual optimal solutions is picked 
to construct a cut for the master problem. This proce- 
dure is iterated producing a series of master problems 
that are increasingly tighter approximations of the pro- 
jection of (P) on the space of the Y variables. When the 
upper bound from the subproblem and the lower bound from the master problem converge, the solution to (P) is obtained. For a detailed description of the algorithm, see [15] and [13].

Note that the cutting plane introduced into the Ben- 
ders master problem depends on the optimal dual solu- 
tion selected from the subproblem to construct it. T.L. 
Magnanti and R.T. Wong [8] proved that Pareto opti- 
mal solutions to a Benders problem may be constructed 
by solving a second-stage optimization problem on the 
set of optimal dual solutions from the subproblem. It 
was shown in [13] that, whenever the subproblem is 
a network-flow problem, the Pareto optimal problem 
can be recast as a network-flow problem. Using this 
idea, the linear programming relaxation may be solved 
by introducing Pareto optimal cuts at each iteration. 
Computational experience shows that this methodol- 
ogy gives faster convergence characteristics for the Ben- 
ders algorithm. 

Once we have a solution methodology for the dual 
of the linear relaxation, we can incorporate it in the 
branch and bound framework to derive an exact algo- 
rithm for the TDTSP. Note that the dual of the linear 
relaxation does not need to be solved to optimality to 
produce a valid lower bound to construct the enumera- 
tion tree. 


Heuristics for TDTSP 


Heuristics for the TDTSP are natural extensions of heuristics for the TSP. Probably one of the most successful heuristics for the TSP is the k-opt heuristic developed by S. Lin and B.W. Kernighan [6] and extended to the TDTSP by Vander Wiel and Sahinidis [14].

We describe the variable k-opt heuristic as applied to the TDTSP. This is an improvement heuristic and assumes that an initial tour has already been constructed.

• Step 1 selects a transition (j, k) for removal.

• Step 2 identifies an arc (k, i) to replace arc (j, k).

• The selection is done in a way that maximizes a certain estimate of cost improvement which takes into account the time dependence of the transition costs.

• The path between j and k is then reversed and arc (i, j) is added to complete the tour.

• Step 5 iterates, trying to accomplish a similar reduction with (j, i) as the starting arc instead of (j, k).

For details, see [14].
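The effect of time dependence on an improvement move can be seen in a simpler neighbourhood than the variable k-opt scheme of [14]. The sketch below is a plain first-improvement 2-opt. It assumes a hypothetical cost array `cost[t][a][b]` for travelling from city `a` to city `b` as the `t`-th transition of the tour. Reversing a segment changes when, and in which direction, each of its transitions is traversed. For that reason every candidate tour is re-priced in full rather than by the usual four-arc delta. This costs O(n) per move but keeps the sketch obviously correct.

```python
import random

def tour_cost(tour, cost):
    """Total cost of the closed tour; transition t (t = 0..n-1) is priced by cost[t]."""
    n = len(tour)
    return sum(cost[t][tour[t]][tour[(t + 1) % n]] for t in range(n))

def two_opt_time_dependent(tour, cost):
    """First-improvement 2-opt with the first city kept in position 0.

    Every candidate is re-priced in full: reversing a segment changes the
    period (and direction) of every transition inside it."""
    best = list(tour)
    best_cost = tour_cost(best, cost)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_cost = tour_cost(cand, cost)
                if cand_cost < best_cost:
                    best, best_cost, improved = cand, cand_cost, True
    return best, best_cost

# Hypothetical random instance: COST[t][a][b] for period t, from city a to city b
rng = random.Random(0)
N = 7
COST = [[[0 if a == b else rng.randint(1, 20) for b in range(N)] for a in range(N)]
        for _ in range(N)]
start = list(range(N))
improved_tour, improved_cost = two_opt_time_dependent(start, COST)
print(tour_cost(start, COST), "->", improved_cost, improved_tour)
```

With integer costs every accepted move strictly lowers the cost, so the loop terminates at a tour that no single segment reversal can improve.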
[Figure: cities 1, 2, … in time periods 1 to n, split as sources and destinations for time periods k and k + 1]

Time-Dependent Traveling Salesman Problem, Figure 1
The 2n-partite TDTSP graph


Conclusion 


TDTSP is a computationally difficult problem. Since the linear programming relaxations of the formulations for this problem are large, it is important to identify structured constraints and solve the problems efficiently using decomposition-based ideas. We presented one
such algorithm that exploits the network substructure 
of the TDTSP formulation. Furthermore, valid inequal- 
ities for the TDTSP polytope may help improve its for- 
mulation and allow us to develop more efficient solu- 
tion techniques. However, as of today, the TDTSP con- 
tinues to be an intractable problem for large instances. 
Introduction


Topological derivatives of shape functionals are intro- 
duced in [35] for elliptic boundary value problems. The 
construction is based on the technique [15,24] of sin- 
gular perturbations of geometrical domains, and the 
mathematical framework for topological differentiabil- 
ity in the general case can be found in [27] within the 
method of compound asymptotic expansions. There 
are numerous applications of topological derivatives to 
the resolution of shape optimization and inverse prob- 
lems. The asymptotic analysis of boundary value prob- 
lems in singularly perturbed geometrical domains is 
performed in the monographs [15,24,26], and, e.g., in 
the papers [17,28,29,30]. The derivation of topological 
derivatives for integral functionals is presented, e. g., in 
the papers [11,19,20,21,22,23,27,34,35,36,37,38,40,41], 
and in the Ph.D. dissertations [18,31]. 

In this chapter we perform an asymptotic analy- 
sis for boundary value problems in elasticity in two 
and three spatial dimensions. The results are borrowed 
from papers of the authors, [35,38,40,41]; see also [27], 
where the complete proofs of the presented results can 
be found. 

Numerical methods of optimization with topo- 
logical derivatives are considered in, e.g., the pa- 
pers [1,2,3,4,6,7,9,10,13,14,16,18,34]. 


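Definitions
The topological derivative for a shape functional is defined in the following way.

Assume that $\Omega \subset \mathbb{R}^N$, $N = 2, 3$, is an open set and that there is given a shape functional
$$\mathcal{J}\colon \Omega \setminus K \to \mathbb{R}$$
for any compact subset $K \subset \Omega$. We denote by $B_\rho(x)$, $x \in \Omega$, the open ball of radius $\rho > 0$ around $x$, and $\omega_\rho(x) = \overline{B_\rho(x)}$. The domain with a void will be denoted $\Omega(\rho, x) = \Omega \setminus \omega_\rho(x)$. Assume that the following limit exists:
$$\mathcal{T}(x) = \lim_{\rho \downarrow 0} \frac{\mathcal{J}(\Omega(\rho, x)) - \mathcal{J}(\Omega)}{|\omega_\rho(x)|},$$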
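which can be defined in an equivalent way by
$$\mathcal{T}(x) = \lim_{\rho \downarrow 0} \frac{1}{|\partial\omega_\rho(x)|}\,\frac{d}{d\rho}\,\mathcal{J}(\Omega(\rho, x)).$$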
The function $\mathcal{T}(x)$, $x \in \Omega$, is called the topological derivative of $\mathcal{J}(\Omega)$. It provides information on the infinitesimal variation of the shape functional $\mathcal{J}$ when a small hole is created at $x \in \Omega$. We shall show in the sequel that the method is constructive, i.e., the topological derivative can be evaluated for shape functionals depending on solutions of elasticity equations defined in $\Omega$.

The partial differential equation for $u_\rho = u(\rho, x)$ is called the state equation for the shape optimization problems under consideration. We show that for a class of shape functionals it is sufficient to solve, in the unperturbed domain $\Omega$, the state equation as well as the appropriate adjoint state equation in order to evaluate the topological derivative $\mathcal{T}(x)$, $x \in \Omega$. This means that the derivative can be used in shape optimization for broad classes of shape functionals and partial differential equations. Some examples where the derivative is given explicitly for model problems are provided.

Our results can be described in the form of the following expansion:
$$\mathcal{J}(\Omega(\rho, x)) = \mathcal{J}(\Omega) + |\omega_\rho(x)|\,\mathcal{T}(x) + o(\rho^N).$$
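A quick numerical check of the definition is possible on a functional simple enough that the answer is known. Consider the hypothetical volume functional $\mathcal{J}(\Omega) = \int_\Omega f\,dx$ with smooth $f$. Removing $\omega_\rho(x)$ subtracts $\int_{\omega_\rho(x)} f\,dx$, so $\mathcal{T}(x) = -f(x)$. The sketch below evaluates the difference quotient in the definition of $\mathcal{T}$ in two dimensions with a polar midpoint rule. It involves no elasticity and only illustrates the limit, not the adjoint-based evaluation developed in this article. The integrand `f` and the quadrature resolution are assumptions.

```python
import math

def f(x, y):
    """Hypothetical smooth density for the toy functional J(Omega) = int_Omega f dx."""
    return 1.0 + x * x + y * y

def disk_integral(fun, cx, cy, rho, nr=200, nt=200):
    """Midpoint rule in polar coordinates for the integral of fun over B_rho((cx, cy))."""
    hr, ht = rho / nr, 2.0 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * hr
        for j in range(nt):
            t = (j + 0.5) * ht
            total += fun(cx + r * math.cos(t), cy + r * math.sin(t)) * r
    return total * hr * ht

def topological_quotient(fun, cx, cy, rho):
    """(J(Omega(rho, x)) - J(Omega)) / |omega_rho(x)| for J = int f: the hole removes its own integral."""
    return -disk_integral(fun, cx, cy, rho) / (math.pi * rho ** 2)

for rho in (0.1, 0.05, 0.01):
    print(rho, topological_quotient(f, 0.5, 0.0, rho))   # tends to -f(0.5, 0) = -1.25
```

For $f = 1 + |x|^2$ and $x = (0.5, 0)$ the exact quotient is $-(1.25 + \rho^2/2)$. The printed values therefore approach $\mathcal{T}(x) = -1.25$, consistent with the expansion above with $N = 2$.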


In the very special case of the energy functional, the 
so-called compliance functional in linear elasticity, the 
topological derivative is in fact considered in [8]. The 
derivative is used, for the first time, in numerical meth- 
ods of optimal design for the specific choice of shape 
functional [8]. In order to differentiate the energy func- 
tional with respect to the variations of the boundary 
of the domain of integration, knowledge of the shape 
derivative of the state equation with respect to the 
boundary variations is not required. Therefore, the re- 
sults obtained for the particular case of the energy func- 
tional cannot be directly generalized to the case of an 
arbitrary shape functional. 

In the sequel we shall drop $x$ from the notation, assuming that the cavity surrounds $x = \mathcal{O} \in \Omega$.


Formulation 


Three-Dimensional Anisotropic Elastic Body 
with a Small Cavity 


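Let us consider the elasticity problem written in the matrix/column form
$$L u = D(-\nabla_x)^T A\, D(\nabla_x) u = 0 \quad \text{in } \Omega(\rho), \tag{1}$$
$$N u = D(n)^T A\, D(\nabla_x) u = g \quad \text{on } \partial\Omega, \tag{2}$$
$$N^{\rho} u = D(n)^T A\, D(\nabla_x) u = 0 \quad \text{on } \partial\omega_\rho, \tag{3}$$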


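where $A$ is a symmetric positive definite $6 \times 6$ matrix consisting of the elastic material moduli (the Hooke's matrix), $\alpha = 1/\sqrt{2}$, and $D(\nabla_x)$ is a $6 \times 3$ matrix of first-order differential operators ($\xi_j = \partial/\partial x_j$):
$$D(\xi)^T = \begin{pmatrix} \xi_1 & 0 & 0 & 0 & \alpha\xi_3 & \alpha\xi_2 \\ 0 & \xi_2 & 0 & \alpha\xi_3 & 0 & \alpha\xi_1 \\ 0 & 0 & \xi_3 & \alpha\xi_2 & \alpha\xi_1 & 0 \end{pmatrix}, \tag{4}$$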


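$u$ is the displacement column, and $n = (n_1, n_2, n_3)^T$ is the unit outward normal vector on $\partial\Omega(\rho)$, i.e., a unit column. In this notation the strain and stress columns are given, respectively, by $\varepsilon(u) = D(\nabla_x) u$ and $\sigma(u) = A\, D(\nabla_x) u$, which gives
$$\varepsilon(u) = (\varepsilon_{11}, \varepsilon_{22}, \varepsilon_{33}, \sqrt{2}\,\varepsilon_{23}, \sqrt{2}\,\varepsilon_{31}, \sqrt{2}\,\varepsilon_{12})^T,$$
$$\sigma(u) = (\sigma_{11}, \sigma_{22}, \sigma_{33}, \sqrt{2}\,\sigma_{23}, \sqrt{2}\,\sigma_{31}, \sqrt{2}\,\sigma_{12})^T.$$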


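The load $g$ is supposed to be self-equilibrated in order to assure the existence of a solution to the elasticity problem,
$$\int_{\partial\Omega} d(x)^T g(x)\, ds_x = 0 \in \mathbb{R}^6, \tag{5}$$
where
$$d(x) = \begin{pmatrix} 1 & 0 & 0 & 0 & -\alpha x_3 & \alpha x_2 \\ 0 & 1 & 0 & \alpha x_3 & 0 & -\alpha x_1 \\ 0 & 0 & 1 & -\alpha x_2 & \alpha x_1 & 0 \end{pmatrix} \tag{6}$$
represents rigid body motion.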
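The $\sqrt2$ convention in $D(\xi)$ and the rigid motions in (6) can be checked numerically. The sketch below is an illustration, not part of the original text. It builds the strain column $\varepsilon(u) = D(\nabla_x)u$ from a displacement gradient $G_{ij} = \partial u_i/\partial x_j$. It then confirms two things: any displacement $u(x) = d(x)c$ produces zero strain, and $\varepsilon^T\varepsilon$ equals the tensor contraction $\varepsilon:\varepsilon$, which is why $\alpha = 1/\sqrt2$ is used. Function names are hypothetical. Gradients are taken by central differences, which are exact up to rounding for the linear fields used here.

```python
import math

ALPHA = 1.0 / math.sqrt(2.0)

def d_matrix(x):
    """3 x 6 matrix d(x) of (6); its columns are rigid translations and rotations."""
    x1, x2, x3 = x
    a = ALPHA
    return [[1.0, 0.0, 0.0, 0.0, -a * x3, a * x2],
            [0.0, 1.0, 0.0, a * x3, 0.0, -a * x1],
            [0.0, 0.0, 1.0, -a * x2, a * x1, 0.0]]

def rigid_displacement(x, c):
    """u(x) = d(x) c for a coefficient column c in R^6."""
    d = d_matrix(x)
    return [sum(d[i][k] * c[k] for k in range(6)) for i in range(3)]

def gradient(u, x, h=1e-5):
    """G[i][j] = du_i/dx_j by central differences (exact up to rounding for linear u)."""
    G = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        up, um = u(xp), u(xm)
        for i in range(3):
            G[i][j] = (up[i] - um[i]) / (2.0 * h)
    return G

def strain_column(G):
    """epsilon(u) = D(grad) u = (e11, e22, e33, sqrt2 e23, sqrt2 e31, sqrt2 e12), cf. (4)."""
    a = ALPHA
    return [G[0][0], G[1][1], G[2][2],
            a * (G[1][2] + G[2][1]),
            a * (G[0][2] + G[2][0]),
            a * (G[0][1] + G[1][0])]

c = [0.3, -1.0, 2.0, 0.7, -0.4, 1.1]
G_rigid = gradient(lambda x: rigid_displacement(x, c), [0.2, -0.5, 1.3])
print(max(abs(e) for e in strain_column(G_rigid)))   # rounding-level: zero strain
```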

The general theory presented in the article can be 
applied to a broad class of shape functionals; however, 
to fix the ideas we deal only with one representative ex- 
ample. 

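Let us consider the functional
$$\mathcal{J}_\rho(u) = \int_{\Omega(\rho)} \sigma(u; x)^T B(x)\, \sigma(u; x)\, dx. \tag{7}$$
Functional (7) looks like the elastic energy functional
$$\mathcal{E}(u; \Omega(\rho)) = \frac12 \int_{\Omega(\rho)} \varepsilon(u; x)^T A\, \varepsilon(u; x)\, dx = \frac12 \int_{\Omega(\rho)} \sigma(u; x)^T A^{-1} \sigma(u; x)\, dx \tag{8}$$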
but can contain a certain symmetric $6 \times 6$ matrix function $B$. In the case of a constant diagonal matrix $B$, functional (7) is related to the square of the $L^2(\Omega)$-norm of the stress tensor or of its components. On the other hand, since $\sigma^T B \sigma = \varepsilon^T A B A\, \varepsilon$, if $A(x) B(x) A(x)$ is a constant diagonal matrix for our choice of $B$, then (7) yields the analogous strain norms. For problem (1)–(3) an explicit dependence of the integrand on the displacement vector $u(\rho, x)$ makes no sense, since such a displacement field is defined only up to rigid motions.

From condition (5) it follows that both problems, problem (1)–(3) in the body $\Omega(\rho)$ with the cavity $\omega_\rho$, and the first limit problem in the entire body $\Omega$,
$$D(-\nabla_x)^T A\, D(\nabla_x) v = 0 \ \text{in } \Omega, \qquad D(n)^T A\, D(\nabla_x) v = g \ \text{on } \partial\Omega, \tag{9}$$
admit the solutions $u(\rho, x) \in C^{2,\gamma}(\Omega(\rho))^3$ and $v \in C^{2,\gamma}(\Omega)^3$, respectively, under the loading $g \in C^{1,\gamma}(\partial\Omega)^3$. Freedom in the selection of such solutions up to rigid motions has no influence on functional (7) and can therefore be neglected (we recall only that, using additional conditions, we can pass to uniquely solvable problems).

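Before presenting the result for functional (7), we recall some facts. First of all, the adjoint state $W \in C^{2,\gamma}(\Omega)^3$ is the solution of
$$\begin{aligned} D(-\nabla_x)^T A\, D(\nabla_x) W &= -2\, D(\nabla_x)^T A B A\, D(\nabla_x) v && \text{in } \Omega, \\ D(n)^T A\, D(\nabla_x) W &= 2\, D(n)^T A B A\, D(\nabla_x) v && \text{on } \partial\Omega. \end{aligned} \tag{10}$$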


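Furthermore, we define the special functions $z^j$ solving the exterior elasticity problem
$$D(-\nabla_\xi)^T A\, D(\nabla_\xi) z^j = 0 \quad \text{in } G = \mathbb{R}^3 \setminus \overline{\omega}, \tag{11}$$
$$D(n(\xi))^T A\, D(\nabla_\xi) z^j = g^j \quad \text{on } \partial\omega \tag{12}$$
with the special right-hand sides
$$g^j(\xi) = -D(n(\xi))^T A\, e^j, \tag{13}$$
where $j = 1, \ldots, 6$ and $e^j = (\delta_{1,j}, \ldots, \delta_{6,j})^T$ is an element of the canonical basis in $\mathbb{R}^6$.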


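The final formula has the following form.

Theorem 1 The following formula holds true:
$$\begin{aligned} \mathcal{J}_\rho(u) = \mathcal{J}_0(v) + \rho^3 \Big\{ & |\omega|\, \varepsilon^0(v)^T A B A\, \varepsilon^0(v) + \big(A B A\, D(\nabla_\xi) z\, \varepsilon^0(v),\, D(\nabla_\xi) z\, \varepsilon^0(v)\big)_{\mathbb{R}^3 \setminus \omega} \\ & + \big(\varepsilon^0(W) - 2 B A\, \varepsilon^0(v)\big)^T m^\omega \varepsilon^0(v) \Big\} + O(\rho^4), \end{aligned} \tag{14}$$
where $\varepsilon^0(v) = D(\nabla_x) v(\mathcal{O})$ and $\varepsilon^0(W) = D(\nabla_x) W(\mathcal{O})$ are strain columns evaluated at the point $x = \mathcal{O}$ for the solutions of problems (9) and (10); $m^\omega$ is the $6 \times 6$ polarization matrix for the cavity $\omega$ in the elastic space with Hooke's matrix $A$, and $z = (z^1, \ldots, z^6)$ is the row composed of the special solutions to the homogeneous exterior elasticity problem (11)–(12).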


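In the particular case of $B(x) = \frac12 A^{-1}$, functional (7) coincides with the elastic energy (8). In addition, we have $W = v$. Thus $\varepsilon^0(W) - 2BA\,\varepsilon^0(v) = 0$ and the last term in parentheses in (14) vanishes, while by (16) the sum of the first two terms equals $\frac12 \varepsilon^0(v)^T m^\omega \varepsilon^0(v)$. Thus, we have the relation
$$\mathcal{E}(u; \Omega(\rho)) = \mathcal{E}(v; \Omega) + \frac12 \rho^3\, \varepsilon^0(v)^T m^\omega \varepsilon^0(v) + O(\rho^4). \tag{15}$$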


The $6 \times 6$ polarization matrix may be computed explicitly using the result given below.

Theorem 2 The following integral representation holds true:
$$m^\omega_{jk} = \big(A\, D(\nabla_\xi) z^j,\, D(\nabla_\xi) z^k\big)_{\mathbb{R}^3 \setminus \omega} + A_{jk}\, |\omega|. \tag{16}$$


We consider only the operator $L(\nabla_x)$ with constant coefficients; however, the main results of the article remain the same for operators with variable coefficients.


Contact Problem for Plane Elasticity 


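We consider the two-dimensional elasticity problem in the plane stress formulation. Unlike in (1)–(3), we assume the clamped condition $u = 0$ on a part $\Gamma_0$ of $\partial\Omega$. On the part $\Gamma_1$ we prescribe the load $N^0 u = \{\sigma_{ij} n_j\}_{i=1,2} = g$, and on the part $\Gamma_c$ the condition of frictionless contact
$$u_n \ge 0, \quad \sigma_n \le 0, \quad \sigma_n u_n = 0, \quad \sigma_\tau = \boldsymbol{\sigma}_n - \sigma_n n = 0. \tag{17}$$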
Here $u_n = u_i n_i$, $\sigma_n = n_i \sigma_{ij} n_j$, $\boldsymbol{\sigma}_n = \{\sigma_{ij} n_j\}_{i=1,2}$. We also define the ring $C(R, \rho) = \omega_R \setminus \omega_\rho$ with $R > \rho$ and such that $\omega_R \subset \Omega$.

In contrast to the previous section, it is now impos- 
sible to compute topological derivatives of shape func- 
tionals by means of adjoint variables without additional 
assumption of strict complementarity for unknown solutions. Therefore, we shall derive a method for computing the perturbation caused by $\omega_\rho$ in the solution itself.

The bilinear form corresponding to the elastic energy may be written as
$$a(\rho; u, v) = \frac12 \int_{\Omega(\rho)} \sigma(u)^T \varepsilon(v)\, dx \tag{18}$$
for $u, v \in H^1(\Omega)^2$, and the linear form responsible for the work of external forces is
$$L(u) = \int_{\Gamma_1} u^T g\, ds. \tag{19}$$
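We will also use the Steklov–Poincaré operator $\mathcal{A}_\rho$, defined in the following way. Consider the boundary value problem
$$L w = 0 \ \text{in } C(R, \rho), \qquad N^0 w = 0 \ \text{on } \partial\omega_\rho, \qquad w = v \ \text{on } \partial\omega_R. \tag{20}$$
Then we set
$$\mathcal{A}_\rho(v) = \boldsymbol{\sigma}_n(w) \quad \text{on } \partial\omega_R. \tag{21}$$
Thus $\mathcal{A}_\rho$ is a mapping
$$\mathcal{A}_\rho\colon H^{1/2}(\partial\omega_R)^2 \to H^{-1/2}(\partial\omega_R)^2. \tag{22}$$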


In the latter part of the article it will be demonstrated constructively that
$$\mathcal{A}_\rho = \mathcal{A}_0 + \rho^2 \mathcal{B} + O(\rho^4) \tag{23}$$


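in the linear operator norm corresponding to (22). Using this notation we have
$$a(\rho; u, u) = \frac12 \int_{\Omega \setminus \omega_R} \sigma^T(u)\, \varepsilon(u)\, dx + \frac12 \int_{C(R,\rho)} \sigma^T(u)\, \varepsilon(u)\, dx \tag{24}$$
as well as
$$\frac12 \int_{C(R,\rho)} \sigma^T(u)\, \varepsilon(u)\, dx = \frac12 \langle \mathcal{A}_\rho u, u \rangle_{\partial\omega_R} = \frac12 \langle \mathcal{A}_0 u, u \rangle_{\partial\omega_R} + \frac12 \rho^2 \langle \mathcal{B} u, u \rangle_{\partial\omega_R} + R(u, u), \tag{25}$$
where $R(u, u)$ is of the order $O(\rho^4)$ on bounded sets in $H^{1/2}(\partial\omega_R)^2$.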
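With $\mathcal{B}$ we associate the bilinear form
$$b(u, v) = \frac12 \langle \mathcal{B} u, v \rangle_{\partial\omega_R} \tag{26}$$
and observe that
$$a(0; u, u) = \frac12 \int_{\Omega \setminus \omega_R} \sigma^T(u)\, \varepsilon(u)\, dx + \frac12 \langle \mathcal{A}_0 u, u \rangle_{\partial\omega_R} \tag{27}$$
corresponds to the internal elastic energy in the entire domain. Denote also by $u_0$ the solution to the contact problem in the domain without a hole. We thus have the approximation of the energy form
$$a(\rho; u, u) \approx a(0; u, u) + \rho^2 b(u, u). \tag{28}$$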


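Let also
$$H^1_{\Gamma_0}(\Omega) = \{ v \in H^1(\Omega)^2 \mid v = 0 \text{ on } \Gamma_0 \}$$
and let $K$ be the convex cone
$$K = \{ v \in H^1_{\Gamma_0}(\Omega) \mid v_n \ge 0 \text{ on } \Gamma_c \}.$$
Then the following variational inequality solves our contact problem in $\Omega(\rho)$:
$$u \in K\colon \quad a(\rho; u, v - u) \ge L(v - u) \quad \forall v \in K. \tag{29}$$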


Taking into account approximation (28) and using ab- 
stract results on the differentiability of metric pro- 
jection onto the polyhedric convex sets in Dirichlet 
space [33] we have the following result. 


Theorem 3 For $\rho$ sufficiently small we have on $\Omega_R = \Omega \setminus \overline{\omega_R}$ the following expansion of the solution $u$ with respect to the parameter $\rho$ at $0^+$:
$$u = u_0 + \rho^2 q + o(\rho^2) \quad \text{in } H^1(\Omega_R)^2, \tag{30}$$
where the topological derivative $q$ of the solution $u$ to the contact problem is given by the unique solution of the following variational inequality:
$$q \in S_K(u_0)\colon \quad a(0; q, v - q) + b(u_0, v - q) \ge 0 \quad \forall v \in S_K(u_0), \tag{31}$$
where
$$S_K(u) = \{ v \in H^1_{\Gamma_0}(\Omega) \mid v_n \ge 0 \text{ on } \Xi(u),\ a(0; u, v) = L(v) \}. \tag{32}$$
The coincidence set
$$\Xi(u) = \{ x \in \Gamma_c \mid u_n(x) = 0 \}$$
is well defined [33,42] for any function $u \in H^1(\Omega)^2$, and $u_0 \in K$ is the solution of variational inequality (29) for $\rho = 0$.

The perturbation $q$ gives an approximation of $u$ outside $\omega_R$. In the ring $C(R, \rho)$ one can, as we shall see, compute the solution separately.
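A finite-dimensional analogue shows the structure of (29) and the role of the coincidence set. The sketch below assumes a symmetric positive definite stiffness matrix (here a 1-D finite-difference string, not the elasticity operator) and a sign constraint on chosen components. It solves the resulting discrete variational inequality by projected Gauss–Seidel, a standard iterative method for such problems that is not taken from this article. At convergence the components with $u_i = 0$ form the discrete coincidence set. Function and variable names are hypothetical.

```python
def projected_gauss_seidel(stiffness, load, constrained, iters=10000, tol=1e-12):
    """Solve the discrete variational inequality
        u_i >= 0, r_i >= 0, u_i * r_i = 0   for i in `constrained`,
        r_i = 0                             otherwise,
    where r = stiffness @ u - load and `stiffness` is symmetric positive definite."""
    n = len(load)
    u = [0.0] * n
    for _ in range(iters):
        change = 0.0
        for i in range(n):
            # Unconstrained Gauss-Seidel value for component i ...
            s = load[i] - sum(stiffness[i][j] * u[j] for j in range(n) if j != i)
            new = s / stiffness[i][i]
            # ... projected onto the admissible set u_i >= 0 where constrained
            if i in constrained:
                new = max(0.0, new)
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            break
    return u

# Hypothetical 1-D example: a string above an obstacle (u >= 0), pushed away
# from it near the ends and towards it in the middle
n = 9
stiffness = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
             for i in range(n)]
load = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
u = projected_gauss_seidel(stiffness, load, constrained=set(range(n)))
coincidence = [i for i in range(n) if u[i] == 0.0]
print([round(x, 6) for x in u], coincidence)
```

For this load the middle node comes to rest on the obstacle ($u_4 = 0$). The computed solution is close to $(1.6, 2.2, 1.8, 0.4, 0, 0.4, 1.8, 2.2, 1.6)$.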


Solution of the Elasticity System in the Ring 


Let us consider the plane elasticity problem in the ring $C(R, \rho)$. We use polar coordinates $(r, \theta)$ with $e_r$ pointing outward and $e_\theta$ perpendicular to it in the counterclockwise direction. Assume that the displacement on the outer boundary is given, while the inner boundary is free. We want to compare the solution to such a problem with the one defined in the full disc with the same displacement data. To this end we shall construct the exact representation of both solutions, using the complex variable method of [25]. It was shown there that
$$\begin{aligned} \sigma_{rr} - i\sigma_{r\theta} &= 2\,\mathrm{Re}\,\varphi' - e^{2i\theta}\big(\bar z \varphi'' + \psi'\big), \\ \sigma_{rr} + \sigma_{\theta\theta} &= 4\,\mathrm{Re}\,\varphi', \\ 2\mu(u_r + i u_\theta) &= e^{-i\theta}\big(\kappa\varphi - z\overline{\varphi'} - \overline{\psi}\big), \end{aligned} \tag{33}$$
where $\varphi$ and $\psi$ are given by the complex series
$$\varphi = A\log(z) + \sum_{k=-\infty}^{+\infty} a_k z^k, \qquad \psi = -\kappa \bar A \log(z) + \sum_{k=-\infty}^{+\infty} b_k z^k. \tag{34}$$
Here $\mu$ is the Lamé constant, $\nu$ is the Poisson ratio, $\kappa = 3 - 4\nu$ in the plane strain case, and $\kappa = (3 - \nu)/(1 + \nu)$ for plane stress.


The displacement data are given in the form of Fourier series
$$2\mu(u_r + i u_\theta) = \sum_{k=-\infty}^{+\infty} A_k e^{ik\theta}. \tag{35}$$


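The traction-free condition on a circle means $\sigma_{rr} = \sigma_{r\theta} = 0$. From (33) and (34) we get the following formula for the displacements:
$$2\mu(u_r + i u_\theta) = 2\kappa A \log(r)\, e^{-i\theta} - \bar A e^{i\theta} + \sum_{p=-\infty}^{+\infty} \big[\kappa a_{p+1} r^{p+1} - (1 - p)\bar a_{1-p} r^{1-p} - \bar b_{-p-1} r^{-p-1}\big] e^{ip\theta}. \tag{36}$$
Similarly we obtain a representation of the tractions on a circle
$$\sigma_{rr} - i\sigma_{r\theta} = \frac{2A}{r} e^{-i\theta} + \frac{(\kappa + 1)\bar A}{r} e^{i\theta} + \sum_{p=-\infty}^{+\infty} (1 - p)\big[(1 + p) a_{p+1} r^p + \bar a_{1-p} r^{-p} + b_{p-1} r^{p-2}\big] e^{ip\theta}. \tag{37}$$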
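Denote $d_0 = \kappa a_0 - \bar b_0$. For the full disc we must eliminate singularities, i.e., $b_{-k} = a_{-k} = A = 0$ for $k = 1, 2, \ldots$. Then, using (36), we obtain
$$\begin{aligned} \mathrm{Re}\, a_1^0 &= \frac{\mathrm{Re}\, A_0}{(\kappa - 1)R}, \qquad \mathrm{Im}\, a_1^0 = \frac{\mathrm{Im}\, A_0}{(\kappa + 1)R}, \qquad d_0^0 = A_{-1} + \frac{2}{\kappa}\bar A_1, \\ a_k^0 &= \frac{A_{k-1}}{\kappa R^k}, \quad k \ge 2, \qquad b_k^0 = -\frac{1}{R^k}\Big[\frac{k + 2}{\kappa} A_{k+1} + \bar A_{-k-1}\Big], \quad k \ge 1. \end{aligned} \tag{38}$$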


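In further analysis we consider the ring and take, for the sake of simplicity, $R = 1$ as well as $\rho < 0.5$. This is only a rescaling and does not diminish generality. Then from (36) for $r = R = 1$ and (37) for $r = \rho$ we get $A = 0$ and
$$\begin{aligned} d_0 &= A_{-1} + \frac{2R^4}{\kappa R^4 + \rho^4}\bar A_1, & a_2 &= \frac{R^2}{\kappa R^4 + \rho^4} A_1, \\ \mathrm{Re}\, a_1 &= \frac{R}{(\kappa - 1)R^2 + 2\rho^2}\mathrm{Re}\, A_0, & \mathrm{Im}\, a_1 &= \frac{1}{(\kappa + 1)R}\mathrm{Im}\, A_0, \\ b_{-1} &= -2\rho^2\, \mathrm{Re}\, a_1 = -\frac{2\rho^2 R}{(\kappa - 1)R^2 + 2\rho^2}\mathrm{Re}\, A_0, & b_{-2} &= -\rho^4 \bar a_2 = -\frac{\rho^4 R^2}{\kappa R^4 + \rho^4}\bar A_1. \end{aligned}$$
Observe that
$$\begin{aligned} d_0 - d_0^0 &= -\rho^4 \frac{2}{\kappa(\kappa R^4 + \rho^4)}\bar A_1, \\ \mathrm{Re}\, a_1 - \mathrm{Re}\, a_1^0 &= -\rho^2 \frac{2}{(\kappa - 1)R\big((\kappa - 1)R^2 + 2\rho^2\big)}\mathrm{Re}\, A_0, \\ a_2 - a_2^0 &= -\rho^4 \frac{1}{\kappa R^2(\kappa R^4 + \rho^4)} A_1. \end{aligned} \tag{39}$$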

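Again using (36) and (37) we obtain for $k \ge 2$
$$\begin{pmatrix} a_{-(k-1)} \\ b_{-(k+1)} \end{pmatrix} = T_k(\rho) \begin{pmatrix} \bar a_{k+1} \\ \bar b_{k-1} \end{pmatrix}, \qquad T_k(\rho) = \begin{pmatrix} -(k+1)\rho^{2k} & -\rho^{2(k-1)} \\ -k^2 \rho^{2(k+1)} & -(k-1)\rho^{2k} \end{pmatrix}, \tag{40}$$
and the system, which may be rewritten as
$$S_k(\rho) \begin{pmatrix} a_{k+1} \\ b_{k-1} \end{pmatrix} = \begin{pmatrix} A_k \\ \bar A_{-k} \end{pmatrix} \tag{41}$$
with entries
$$\begin{aligned} S_k(\rho)_{11} &= \kappa R^{k+1} - (k^2 - 1) R^{1-k} \rho^{2k} + k^2 R^{-(k+1)} \rho^{2(k+1)}, \\ S_k(\rho)_{12} &= -(k - 1)\big(R^{1-k} \rho^{2(k-1)} - R^{-(k+1)} \rho^{2k}\big), \\ S_k(\rho)_{21} &= -(k + 1)\big(R^{k+1} + \kappa R^{1-k} \rho^{2k}\big), \\ S_k(\rho)_{22} &= -R^{k-1} - \kappa R^{1-k} \rho^{2(k-1)}. \end{aligned} \tag{42}$$
In fact formulas (40) and (42) are correct also for $k = 0, 1$ and $\rho > 0$. Together with the initial values $d_0, a_1, a_2, b_{-1}, b_{-2}$ they allow us to compute all $a_k, b_k$ for any $-\infty < k < +\infty$.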
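Given (41)–(42), each pair $(a_{k+1}, b_{k-1})$ follows from a $2 \times 2$ complex linear solve, and (40) then supplies $a_{1-k}$ and $b_{-k-1}$. The sketch below does this by Cramer's rule. It is a hypothetical helper written for illustration, taking the data $A_k$, $A_{-k}$, $\kappa$, $\rho$, $R$ as inputs. At $\rho = 0$ it reproduces the full-disc values $a_{k+1}^0 = A_k/(\kappa R^{k+1})$ of (38). For $\rho > 0$ its output can be substituted back into the coefficients of (36) at $r = R$ and of (37) at $r = \rho$ as a consistency check.

```python
def ring_coefficients(k, A_k, A_mk, kappa, rho, R=1.0):
    """Solve (41) for (a_{k+1}, b_{k-1}) and apply (40) to get (a_{1-k}, b_{-k-1}).

    A_k and A_mk are the Fourier data A_k and A_{-k} of (35) (complex numbers)."""
    # Entries of S_k(rho), eq. (42)
    s11 = (kappa * R ** (k + 1) - (k * k - 1) * R ** (1 - k) * rho ** (2 * k)
           + k * k * R ** (-(k + 1)) * rho ** (2 * (k + 1)))
    s12 = -(k - 1) * (R ** (1 - k) * rho ** (2 * (k - 1)) - R ** (-(k + 1)) * rho ** (2 * k))
    s21 = -(k + 1) * (R ** (k + 1) + kappa * R ** (1 - k) * rho ** (2 * k))
    s22 = -R ** (k - 1) - kappa * R ** (1 - k) * rho ** (2 * (k - 1))
    r1, r2 = complex(A_k), complex(A_mk).conjugate()   # right-hand side of (41)
    det = s11 * s22 - s12 * s21
    a_next = (r1 * s22 - s12 * r2) / det                # a_{k+1}
    b_prev = (s11 * r2 - s21 * r1) / det                # b_{k-1}
    # Eq. (40): [a_{1-k}, b_{-k-1}] = T_k(rho) [conj(a_{k+1}), conj(b_{k-1})]
    ca, cb = a_next.conjugate(), b_prev.conjugate()
    a_neg = -(k + 1) * rho ** (2 * k) * ca - rho ** (2 * (k - 1)) * cb
    b_neg = -k * k * rho ** (2 * (k + 1)) * ca - (k - 1) * rho ** (2 * k) * cb
    return a_next, b_prev, a_neg, b_neg

print(ring_coefficients(2, 1 + 2j, -0.5 + 0.25j, kappa=2.0, rho=0.3))
```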

The matrix $S_k(\rho)$ is a perturbation of $S_k(0)$, which would produce the solution for the full disc, namely $a_k^0, b_k^0$. Observe that $T_k(0) = 0$. Direct computations lead to the estimates
$$|a_3 - a_3^0| \le C\big(|A_2| + |A_{-2}|\big)\rho^2 \tag{43}$$
and, for $k = 4, 5, \ldots$,
$$|a_k - a_k^0| \le C\big(|A_{k-1}| + |A_{1-k}|\big)\rho^{k/2}, \tag{44}$$
where the exponent $k/2$ has been used to counteract the growth of $k^2$ in terms like $k^2\rho^{2k}$. Similarly,
$$|b_1 - b_1^0| \le C\big(|A_2| + |A_{-2}|\big)\rho^2 \tag{45}$$
and, for $k = 2, 3, \ldots$,
$$|b_k - b_k^0| \le C\big(|A_{k+1}| + |A_{-k-1}|\big)\rho^{k/2}. \tag{46}$$
From relation (40) we get the further estimates
$$|a_{-k}| \le C\rho^{2k}\big(|A_{k+1}| + |A_{-k-1}|\big), \quad k = 1, 2, \ldots, \tag{47}$$
and
$$|b_{-k}| \le C\rho^{2(k-1)}\big(|A_{k-1}| + |A_{1-k}|\big), \quad k = 3, 4, \ldots \tag{48}$$
Here $C$ is a constant independent of $\rho$ and $A_k$. Observe that corrections proportional to $\rho^2$ are present only in $a_1$, $a_3$, $a_{-1}$, $b_1$ and $b_{-1}$. The rest is at least of the order $O(\rho^3)$ (in fact $O(\rho^4)$).


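Explicit Expansion of Elastic Energy The elastic energy contained in the ring has the form
$$2E(\rho, R) = \int_{C(\rho,R)} \sigma(u_\rho) : \varepsilon(u_\rho)\, dx = \int_{\partial\omega_R} u_\rho \cdot \sigma(u_\rho) n\, ds. \tag{49}$$
Since $u_\rho = u$ on $\partial\omega_R$,
$$2E(\rho, R) = \int_{\partial\omega_R} u \cdot \sigma(u_\rho) n\, ds. \tag{50}$$
Now $\sigma(u_\rho)$ is in fact of the form $\sigma(u_\rho) = \sigma_\rho(u)$, because $u_\rho = u$ on $\partial\omega_R$, which means that $u_\rho = u_\rho(u)$. If we split $\sigma_\rho$ into
$$\sigma_\rho(u) = \sigma^0(u) + \rho^2 \sigma^1(u) + O(\rho^4), \tag{51}$$
then
$$E(\rho, R) = E(0, R) + \frac{\rho^2}{2} \int_{\partial\omega_R} u \cdot \sigma^1(u) n\, ds + O(\rho^4). \tag{52}$$
Thus the problem of defining the operator $\mathcal{B}$ reduces to finding $\sigma^1(u)$.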

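From (33) and (34) we know that $\sigma_\rho(u)$ is a linear function of the infinite vectors $a, b$, while $\sigma^0(u)$ is the same function of $a^0, b^0$. Here $a^0, b^0$ are computed for $\omega_R$, while $a, b$ correspond to $C(\rho, R)$. To obtain $\sigma^1(u)$ it is enough to express $a, b$ as
$$a = a^0 + \rho^2 a^1 + O(\rho^4), \qquad b = b^0 + \rho^2 b^1 + O(\rho^4), \tag{53}$$
because then $\sigma^1(u) = \sigma(a^1, b^1; u)$, i.e., the same linear function evaluated at $a^1, b^1$.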


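Let us observe as well that
$$\int_{\partial\omega_R} u \cdot \sigma^1(u) n\, ds = R \int_0^{2\pi} \big(\sigma^1_{rr} u_r + \sigma^1_{r\theta} u_\theta\big)\, d\theta = R \int_0^{2\pi} \mathrm{Re}\big[(\sigma^1_{rr} - i\sigma^1_{r\theta})(u_r + i u_\theta)\big]\, d\theta. \tag{54}$$
The analysis of formulae (38) for $a^0, b^0$ and their counterparts $a, b$ leads to the conclusion that the only nonzero terms in $a^1, b^1$ will be $a^1_{-1}, a^1_1, a^1_3, b^1_{-1}, b^1_1$.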

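Taking into account that $A = 0$ in (34) for our problem,
$$\varphi = \varphi^0 + \rho^2 \varphi^1 + O(\rho^4), \qquad \psi = \psi^0 + \rho^2 \psi^1 + O(\rho^4), \tag{55}$$
where
$$\varphi^1 = a^1_{-1} \frac{1}{z} + a^1_1 z + a^1_3 z^3, \qquad \psi^1 = b^1_{-1} \frac{1}{z} + b^1_1 z. \tag{56}$$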


Using all the results collected so far gives the final ex- 
pression for B: 


1 f20¢=2),.. 
[vote ds = =| Ge maya (Aol 
~ (+ D)AaP 14, 
K 


7 6(K + 1) 657) 
K 


(AA) 


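Taking into account the formulae for the Fourier coefficients
$$A_2 = \frac{\mu}{\pi} \int_0^{2\pi} (u_r + i u_\theta) e^{-2i\theta}\, d\theta, \qquad A_0 = \frac{\mu}{\pi} \int_0^{2\pi} (u_r + i u_\theta)\, d\theta, \qquad A_{-2} = \frac{\mu}{\pi} \int_0^{2\pi} (u_r + i u_\theta) e^{2i\theta}\, d\theta, \tag{58}$$
we conclude that $\mathcal{B}$ is indeed a well-defined bilinear form, which contains squares of integrals of $u$ over $\partial\omega_R$. In addition, the theorem below follows from (56).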


Theorem 4 If $u \in H^{1/2}(\partial\omega_R)^2$, which is equivalent to
$$\sum_{k=-\infty}^{+\infty} \sqrt{1 + k^2}\, |A_k|^2 \le M_0 < \infty, \tag{59}$$
then the rest $R(u, u)$ in formula (25) is uniformly bounded by some constant depending only on $M_0$.


The derivation sketched above allows one also to ob- 
tain a higher-order expansion of the Steklov-Poincaré 
operator. 
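In practice the coefficients $A_k$ in (35) and (58) come from displacement values sampled on $\partial\omega_R$. The following minimal sketch assumes equispaced samples of $u_r + i u_\theta$ on the circle; the trapezoidal rule is then exact for trigonometric polynomials of low enough degree. It computes $A_k$ and the weighted sum appearing in (59) over the computed coefficients. Function names and the synthetic test data are illustrative.

```python
import cmath
import math

def fourier_coefficients(samples, mu, ks):
    """A_k = (mu/pi) * integral_0^{2pi} (u_r + i u_theta) e^{-ik theta} d theta  (cf. (58)),
    by the trapezoidal rule on m equispaced samples w_j = (u_r + i u_theta)(2 pi j/m)."""
    m = len(samples)
    h = 2.0 * math.pi / m
    return {k: (mu / math.pi) * h * sum(w * cmath.exp(-1j * k * j * h)
                                        for j, w in enumerate(samples))
            for k in ks}

def weighted_sum(coeffs):
    """sum_k sqrt(1 + k^2) |A_k|^2 over the computed coefficients, as in (59)."""
    return sum(math.sqrt(1 + k * k) * abs(a) ** 2 for k, a in coeffs.items())

# Synthetic data (hypothetical): 2 mu (u_r + i u_theta) = sum_k A_k e^{ik theta}
mu = 2.0
true_A = {0: 1 + 0.5j, 2: 0.3 - 0.2j, -2: -0.7j}
m = 64
samples = [sum(a * cmath.exp(1j * k * 2.0 * math.pi * j / m) for k, a in true_A.items()) / (2.0 * mu)
           for j in range(m)]
est = fourier_coefficients(samples, mu, range(-3, 4))
print({k: round(abs(v), 12) for k, v in est.items()})
```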


Cases
Plane Isotropic Elasticity System
Let us consider the isotropic elasticity equations in the plane
$$L u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \Gamma_0, \qquad N^0 u = h \ \text{on } \Gamma_1,$$
and the same system for $u$ in the domain with the circular hole $\omega_\rho$ centered at $x_0 \in \Omega$, with the additional condition
$$N^0 u = 0 \quad \text{on } \partial\omega_\rho.$$
Observe the presence of the volume forces denoted by $f$. Isotropy means here that the matrix of material coefficients has the particular form
$$A = \begin{pmatrix} \lambda + 2\mu & \lambda & 0 \\ \lambda & \lambda + 2\mu & 0 \\ 0 & 0 & 2\mu \end{pmatrix}.$$
We introduce the yield functional
$$\mathcal{J}_\sigma(\rho) = \int_{\Omega(\rho)} \sigma(u)^T S\, \sigma(u)\, dx, \tag{60}$$
where $S$ is an isotropic matrix. Again, isotropy means that $S$ may be expressed as follows:
$$S = [s_{ij}] = \begin{pmatrix} l + 2m & l & 0 \\ l & l + 2m & 0 \\ 0 & 0 & 2m \end{pmatrix},$$
where $l, m$ are any real constants. Their values depend on the particular yield criterion (e.g., maximal shear stress or Huber). The following assumption assures that the problem is well defined.

(A) The domain $\Omega$ has a piecewise smooth boundary, but pure cracks are admissible, even with different types of boundary conditions prescribed on the two edges (i.e., tractions and displacements). Then $g, h$ must be compatible with $u \in H^1(\Omega)^2$.

The interior regularity of u in 92 is determined 
by the regularity of the right-hand side f of the elas- 
ticity system. For such a problem the formulae given 
in Sect. “Three-Dimensional Anisotropic Elastic Body 
with a Small Cavity” may be computed exactly, even 
in the more general case of the presence of volume 
forces [35]. 

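In this case the adjoint state $v \in H^1_{\Gamma_0}(\Omega)^2$ satisfies, for all test functions $\varphi \in H^1_{\Gamma_0}(\Omega)^2$, the following integral identity:
$$-\int_\Omega \big(D(\nabla_x) v\big)^T A\, D(\nabla_x)\varphi\, dx = 2 \int_\Omega \sigma(u)^T S\, \sigma(\varphi)\, dx. \tag{61}$$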
Denote n = 1(1 + 4A; + 43) + 2m(1 + 243). Now 
we may formulate the following result: 


Theorem 5 Assume that the distributed force is sufficiently regular, $f \in C^1(\Omega)^2$, and that (A) holds; then the topological derivative of the functional $\mathcal{J}_\sigma$ is given by


T(x) = —[n(a? + 2b?) + 2f Ty 


1 
++ panty + 2b,,b, cos 25)| (62) 


x=X0 


Here 


naI(14+40= +4u—) +2m(1+42=) . 


Some of the terms in (62) require explanation. In the reference frame tied to the principal stress directions for the displacement fields $u$ and $v$, they are given by the expressions
$$a_u = \sigma_{11}(u) + \sigma_{22}(u), \quad b_u = \sigma_{11}(u) - \sigma_{22}(u), \quad a_v = \sigma_{11}(v) + \sigma_{22}(v), \quad b_v = \sigma_{11}(v) - \sigma_{22}(v).$$
Finally, the angle $\delta$ in (62) denotes the angle between the principal stress directions for the displacement fields $u$ and $v$, and $E$, $\nu$ stand for Young's modulus and the Poisson ratio. By principal stress directions we mean, as usual, the coordinate system in which the stress tensor is diagonal.
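The quantities $a$, $b$ and $\delta$ can be computed from stress components given in any Cartesian frame, which is how they would be obtained from a numerical solution. The sketch below (hypothetical helper names) uses the standard Mohr-circle formulas. The trace gives $a$, and the principal-stress difference gives $b \ge 0$. The angle $\theta = \tfrac12\operatorname{atan2}(2\sigma_{12}, \sigma_{11} - \sigma_{22})$ is the direction of the larger principal stress. As a check, $b_u b_v \cos 2\delta$ equals $(\sigma_{11} - \sigma_{22})(u)\,(\sigma_{11} - \sigma_{22})(v) + 4\sigma_{12}(u)\sigma_{12}(v)$ in every frame.

```python
import math

def principal_quantities(s11, s22, s12):
    """Return (a, b, theta) for a 2-D stress state given in any Cartesian frame:
    a = sigma_I + sigma_II (trace), b = sigma_I - sigma_II >= 0, and theta the
    angle of the sigma_I principal direction measured from the x1-axis."""
    a = s11 + s22
    b = math.hypot(s11 - s22, 2.0 * s12)
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    return a, b, theta

def cross_term(stress_u, stress_v):
    """b_u * b_v * cos(2 delta), with delta the angle between principal directions."""
    _, b_u, theta_u = principal_quantities(*stress_u)
    _, b_v, theta_v = principal_quantities(*stress_v)
    return b_u * b_v * math.cos(2.0 * (theta_u - theta_v))

# Principal stresses 5 and 1 rotated by 0.3 rad (hypothetical data)
phi = 0.3
c, s = math.cos(phi), math.sin(phi)
rotated = (5 * c * c + s * s, 5 * s * s + c * c, 4 * s * c)
print(principal_quantities(*rotated))   # approximately (6.0, 4.0, 0.3)
```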


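Three-Dimensional Isotropic Elasticity Systems
Now we consider the same system as in Sect. “Plane Isotropic Elasticity System”, only in $\mathbb{R}^3$. The isotropic matrix of material (Lamé) coefficients is now
$$A = \begin{pmatrix} \lambda + 2\mu & \lambda & \lambda & 0 & 0 & 0 \\ \lambda & \lambda + 2\mu & \lambda & 0 & 0 & 0 \\ \lambda & \lambda & \lambda + 2\mu & 0 & 0 & 0 \\ 0 & 0 & 0 & 2\mu & 0 & 0 \\ 0 & 0 & 0 & 0 & 2\mu & 0 \\ 0 & 0 & 0 & 0 & 0 & 2\mu \end{pmatrix}.$$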


The yield functional is similar,
$$\mathcal{J}_\sigma(\rho) = \int_{\Omega(\rho)} \sigma(u)^T S\, \sigma(u)\, dx, \tag{63}$$
where $S$ is an isotropic matrix. Isotropy means here again that $S$ may be expressed as follows:
$$S = \begin{pmatrix} l + 2m & l & l & 0 & 0 & 0 \\ l & l + 2m & l & 0 & 0 & 0 \\ l & l & l + 2m & 0 & 0 & 0 \\ 0 & 0 & 0 & 2m & 0 & 0 \\ 0 & 0 & 0 & 0 & 2m & 0 \\ 0 & 0 & 0 & 0 & 0 & 2m \end{pmatrix},$$
where $l, m$ are real constants. Some yield criteria fit into this framework, but not the maximal shear stress criterion. The following assumption assures that $\mathcal{J}_\sigma$ is well defined for solutions of the elasticity system.

(A) The domain $\Omega$ has a piecewise smooth boundary, which may have reentrant corners with angle $\alpha < 2\pi$ created by the intersection of two planes. In addition, $g, h$ must be compatible with $u \in H^1(\Omega)^3$. With respect to $f$ we assume that it is a continuous vector field and
fe (2y. 

The interior regularity of the displacement field $u$ in $\Omega$ is determined by the regularity of the right-hand side $f$ of the elasticity system. To be precise, let $\delta > 0$ be given and $\Omega^\delta = \Omega \setminus (\Gamma + B_\delta(0))$. Then $u \in H^3(\Omega^\delta)^3$, see [26].

The adjoint state equation for the shape functional $\mathcal{J}_\sigma$ takes exactly the same form as (61).

Now we may formulate the following result, giving a constructive method for computing the topological derivative:


Theorem 6 Assume that the distributed force is sufficiently regular, $f \in C^1(\Omega; \mathbb{R}^3)$, and (A) is satisfied; then


T Jo(xo) = —ZIK(S:o(w),o(u)) +4nflv 


+ K(A7;0(u),0(V))]x=x9 » (64) 


where $v \in H^1_{\Gamma_0}(\Omega)^3$ is the adjoint state satisfying the integral identities (61).


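The function $K$ is defined as an integral over the unit sphere $\partial B_1(0) = \{x \in \mathbb{R}^3 \mid \|x\| = 1\}$ of the following functions:
$$K\big(S; \sigma(u(x_0)), \sigma(u(x_0))\big) = \int_{\partial B_1(0)} \sigma^\infty(u(x_0); x)^T S\, \sigma^\infty(u(x_0); x)\, dS,$$
$$K\big(A^{-1}; \sigma(u(x_0)), \sigma(v(x_0))\big) = \int_{\partial B_1(0)} \sigma^\infty(u(x_0); x)^T A^{-1} \sigma^\infty(v(x_0); x)\, dS.$$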


The symbol $\sigma^\infty(u(x_0); x)$ denotes the explicit solution to the exterior elasticity problem, constructed from the so-called Leon solutions [12] in the way specified below. It satisfies the following boundary conditions in the infinite exterior domain $\mathbb{R}^3 \setminus B_1(0)$:

• No tractions are applied on the surface $\partial B_1(0)$ of the ball;

• The stresses $\sigma^\infty(u(x_0); x)$ tend to the constant value $\sigma(u(x_0))$ as $\|x\| \to \infty$.

In this notation $\sigma^\infty(u(x_0); x)$ is a function of the space variables depending on the functional parameter $u(x_0)$, while $\sigma(u(x_0))$ is the value of the stress tensor computed at the point $x_0$ for the displacement field $u$. The dependence between $\sigma^\infty(u(x_0); x)$ and $\sigma(u(x_0))$ results from the boundary condition at infinity listed above. The method for obtaining such solutions (and the displacement field $u^\infty$), based on [12], is given in [36].

The main difficulty is related to the computation of the values of the functions denoted above as $K(S; \sigma(u(x_0)), \sigma(u(x_0)))$ and $K(A^{-1}; \sigma(u(x_0)), \sigma(v(x_0)))$, which cannot be obtained in closed form, in contrast with the two-dimensional case. This is due to the fact that the principal stress directions for $u$ and $v$ may be rotated with respect to each other, and this rotation is not specified by a single parameter $\delta$.

Therefore we must approximate these functions using numerical quadrature formulae. This is possible because we may calculate the values of the integrands defining $K$ at any point on the sphere. This makes the computations more involved but does not increase the numerical complexity in comparison with evaluating a single closed-form expression in the case of two dimensions. The detailed procedure is given in [36].
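A product rule on the unit sphere is one simple way to carry out the quadrature mentioned above; the procedure actually used in [36] is not reproduced here. The sketch assumes the integrand, for instance $\sigma^\infty(u(x_0); x)^T S\, \sigma^\infty(u(x_0); x)$, is available as a Python callable of the point $(x, y, z)$. It uses Gauss–Legendre nodes in $t = \cos\vartheta$ and the trapezoidal rule in the azimuth. With $n$ nodes this integrates polynomials of degree up to $2n - 1$ exactly. The tests use monomials with known integrals in place of the elasticity integrands.

```python
import math

def legendre(n, x):
    """Return (P_n(x), P_n'(x)) via the three-term recurrence; n >= 1, |x| < 1."""
    p0, p1 = 1.0, x
    for k in range(2, n + 1):
        p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
    return p1, n * (x * p1 - p0) / (x * x - 1.0)

def gauss_legendre(n):
    """Gauss-Legendre nodes and weights on [-1, 1] by Newton's method on P_n."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))   # standard initial guess
        for _ in range(100):
            p, dp = legendre(n, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def sphere_integral(g, n=8):
    """Integral of g(x, y, z) over the unit sphere: Gauss-Legendre in t = cos(polar angle),
    trapezoidal rule with 2n points in the azimuth. Exact for polynomials of degree <= 2n - 1."""
    nodes, weights = gauss_legendre(n)
    m = 2 * n
    h = 2.0 * math.pi / m
    total = 0.0
    for t, w in zip(nodes, weights):
        s = math.sqrt(1.0 - t * t)
        for j in range(m):
            phi = j * h
            total += w * h * g(s * math.cos(phi), s * math.sin(phi), t)
    return total

# Placeholder integrand; in the setting of the text it would evaluate, e.g.,
# sigma_inf(u(x0); x)^T S sigma_inf(u(x0); x) at the point x on the sphere.
print(sphere_integral(lambda x, y, z: x * x), 4.0 * math.pi / 3.0)
```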


Conclusions 


We list some applications of topological deriva- 
tives in numerical methods of resolution for shape- 
optimization and inverse problems. 

A numerical coupling of two methods, boundary 
variations by a level set method and topological deriva- 
tives, in shape and topology optimization of structures 
is proposed in [1,4,7,9,10,13]. On the one hand, the 
level set method, based on the classical shape deriva- 
tive, is known to easily handle boundary propagation 
with topological changes. However, in practice it does 
not allow for the nucleation of new holes (at least in two 
spatial dimensions). On the other hand, the topological 
derivative method is precisely designed for introduc- 
ing new holes in the optimization process. Therefore, 
the coupling of these two methods yields an efficient al- 
gorithm that can escape from local minima in a given 
topological class of shapes. Both methods rely on a no- 
tion of gradient computed through an adjoint analysis 
and have a low CPU cost since they capture a shape on 
a fixed Eulerian mesh. The main advantage of our cou- 
pled algorithm is to make the resulting optimal design 
largely independent of the initial guess. 

The paper [2] is devoted to minimum stress design 
in structural optimization. The efficient numerical al- 
gorithm for shape and topology optimization is based 
on the level set method coupled with the topological
derivative. Several numerical examples in two and three 
spatial dimensions are discussed. 

The Brazilian group has been working on topolog- 
ical derivatives since early 2000. Novotny in his Ph.D. 
dissertation proposes a method to calculate the topo- 
logical derivative based on classical shape sensitivity 
analysis [31]. 

The topological derivatives for partial differential 
equations on graphs are introduced in [19]. 

Topological sensitivity analysis can be performed 
in the framework of the piecewise constant Mum- 
ford-Shah functional. Topological and shape deriva- 
tives can be combined to derive a fast algorithm for im- 
age segmentation, without any initialization required. 
The general Mumford-Shah functional is also investi- 
gated, see, e. g., [14], see also [5]. 
Complementarity theory is dedicated to the study of complementarity problems. The concept of complementarity is fundamental to the study of many optimization problems and to the analysis and computation of equilibria in the physical and economic sense. It is well known that complementarity theory also has many remarkable applications in engineering, elasticity, mechanics, game theory, etc. The solution set of a complementarity problem can be empty or nonempty, stable or unstable. When the solution set is nonempty, the question is how to compute a solution. The classical existence results for complementarity problems were proved using the Hartman–Stampacchia theorem, Karamardian's theorem, some fixed point theorems and, for the linear complementarity problem, algebraic tools. A class of powerful methods used recently in complementarity theory is the class of topological methods. By topological methods we can prove existence theorems, study the stability of the solution set, or study particular topological properties of the solution set. In what follows, we present some known topological methods.


Preliminaries 


Denote by $(\mathbb{R}^n, \langle \cdot, \cdot \rangle)$ the $n$-dimensional Euclidean space, by $(H, \langle \cdot, \cdot \rangle)$ a Hilbert space and by $(E, \| \cdot \|)$ a real Banach space. If $(E, \| \cdot \|)$ is a Banach space, denote by $E^*$ the topological dual of $E$ and by $\langle E, E^* \rangle$ a duality defined by a canonical bilinear form $\langle \cdot, \cdot \rangle$ defined on $E \times E^*$. We say that $K \subset E$ is a pointed closed convex cone if and only if $K$ is a closed subset and the following properties are satisfied:

1) $K + K \subseteq K$;

2) $\lambda K \subseteq K$ for all $\lambda \in \mathbb{R}_+$;

3) $K \cap (-K) = \{0\}$.

Whenever a pointed closed convex cone $K \subset E$ is defined, we have an ordering on $E$ defined by $x \le y$ if and only if $y - x \in K$. By definition the dual of $K$ is
$$K^* = \{ y \in E^* : \langle x, y \rangle \ge 0 \text{ for all } x \in K \}.$$
Note that $K^*$ is also a closed convex cone. We say that the ordered Banach space $(E, \| \cdot \|, K)$ is a vector lattice if and only if for every pair $(x, y)$ of elements of $E$, the infimum $x \wedge y$ and the supremum $x \vee y$ both exist in $E$. If $(H, \langle \cdot, \cdot \rangle, K)$ is an ordered Hilbert space, we say that the inner product $\langle \cdot, \cdot \rangle$ is $K$-local if whenever $x \wedge y = 0$ ($x, y \in K$), we have $\langle x, y \rangle = 0$. If $(H, \langle \cdot, \cdot \rangle)$ is a Hilbert space and $K \subset H$ is a closed convex cone, we denote the projection onto $K$ by $P_K$. The projection $P_K$ is defined for every $x \in H$ by $\| x - P_K(x) \| = \min_{y \in K} \| x - y \|$.

If $E = \mathbb{R}^n$ and $\langle \cdot, \cdot \rangle$ is the Euclidean inner product, the cone $\mathbb{R}^n_+$ is closed, pointed, self-adjoint (i.e., $(\mathbb{R}^n_+)^* = \mathbb{R}^n_+$) and the inner product $\langle \cdot, \cdot \rangle$ is $\mathbb{R}^n_+$-local. The ordered space $(\mathbb{R}^n, \langle \cdot, \cdot \rangle, \mathbb{R}^n_+)$ is a vector lattice. Let $\langle E, E^* \rangle$ be a duality of Banach spaces and let $K \subset E$ be a pointed closed convex cone. Given the mappings $f\colon E \to E^*$ and $g\colon E \to E$, consider the following implicit complementarity problem:


find x, EE 
ICP(f,g,K) 4s g(xo) © K, f(xo) € K*, and 
(g(xo), f(xo)) = 0. 


If $g(x) = x$ for all $x \in E$, we obtain the nonlinear complementarity problem:

$$\mathrm{NCP}(f, K):\quad \text{find } x_0 \in K \text{ such that } f(x_0) \in K^* \text{ and } \langle x_0, f(x_0)\rangle = 0.$$

If $E$ is a vector lattice with respect to the ordering defined by $K$ and $f_1, \dots, f_m$ are mappings from $E$ into $E$, we consider the general order complementarity problem:

$$\mathrm{GOCP}(\{f_i\}_{i=1}^m, K):\quad \text{find } x_0 \in K \text{ such that } \wedge\big(f_1(x_0), \dots, f_m(x_0)\big) = 0.$$


Topological Degree 
and Complementarity Problems 


A powerful topological method used in complementarity theory is based on the concept of the topological degree of a continuous mapping. A standard reference for degree theory is [23]. To a bounded open set $\Omega \subset \mathbb{R}^n$, a continuous function $f: \overline{\Omega} \to \mathbb{R}^n$, and an $n$-vector $y \notin f(\partial\Omega)$, we associate an integer denoted by $\deg(f, \Omega, y)$, called the degree of $f$ at $y$ relative to $\Omega$. In our applications we always take $y = 0$. The topological degree has the following properties:

1) (Existence property). If $\deg(f, \Omega, 0) \ne 0$, then the equation $f(x) = 0$ has a solution in $\Omega$.

2) (Nearness property). If $\deg(f, \Omega, 0)$ is defined and $g \in C(\overline{\Omega}, \mathbb{R}^n)$ is such that $\sup_{x \in \overline{\Omega}} \|f(x) - g(x)\| < \mathrm{dist}(0, f(\partial\Omega))$, then $\deg(g, \Omega, 0)$ is defined and $\deg(g, \Omega, 0) = \deg(f, \Omega, 0)$.

3) (Homotopy invariance property). If $H: [0,1] \times \overline{\Omega} \to \mathbb{R}^n$ is continuous and $0 \notin H(t, \partial\Omega)$ for all $t \in [0, 1]$, then $\deg(H(0, \cdot), \Omega, 0) = \deg(H(1, \cdot), \Omega, 0)$.

4) (Excision property). Suppose that $\deg(f, \Omega, 0)$ is defined and $D$ is a compact subset of $\overline{\Omega}$ such that there are no solutions of $f(x) = 0$ in $D$. Then $\deg(f, \Omega, 0) = \deg(f, \Omega \setminus D, 0)$.

5) (Domain decomposition property). If $\deg(f, \Omega, 0)$ is defined and $\Omega$ is a disjoint union of a finite number of open sets $\Omega_i$, then

$$\deg(f, \Omega, 0) = \sum_i \deg(f, \Omega_i, 0).$$

6) (Index at a zero). Let $x_*$ be an isolated solution of the equation $f(x) = 0$. Then $\deg(f, \Omega, 0)$ is the same for any bounded open set $\Omega$ containing $x_*$ with the property that $\overline{\Omega}$ contains no other solution of $f(x) = 0$. In this case we call $\deg(f, \Omega, 0)$ the index of $f$ at $x_*$, i.e., $\mathrm{index}(f, x_*) = \deg(f, \Omega, 0)$. If $f$ is differentiable at $x_*$ with a nonsingular Jacobian matrix $f'(x_*)$, then $\mathrm{index}(f, x_*) = \operatorname{sgn} \det f'(x_*)$.
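As a small numerical illustration of properties 5) and 6), here is a sketch in Python (standard library only). It uses the hypothetical example map $f(x_1, x_2) = (x_1^2 - x_2^2 - 1,\ 2x_1x_2)$, i.e. $z \mapsto z^2 - 1$ in complex notation, on the open disc $\Omega$ of radius 2. Its only zeros are $(\pm 1, 0)$, and both have a nonsingular Jacobian, so the degree should equal the sum of the indices $\operatorname{sgn}\det f'(x_*)$. As an independent cross-check, the code also computes the winding number of $f$ along $\partial\Omega$, which for planar maps coincides with the Brouwer degree. All function names are our own.

```python
import math

def f(p):
    # hypothetical example map: z -> z**2 - 1 written as a map R^2 -> R^2
    x, y = p
    return (x * x - y * y - 1.0, 2.0 * x * y)

def jacobian_det(p):
    # det of f'(p) = [[2x, -2y], [2y, 2x]]
    x, y = p
    return (2.0 * x) * (2.0 * x) - (-2.0 * y) * (2.0 * y)

def index_at_zero(p):
    # property 6): index(f, x*) = sgn det f'(x*) for a nonsingular Jacobian
    det = jacobian_det(p)
    if det == 0.0:
        raise ValueError("singular Jacobian: the sign formula does not apply")
    return 1 if det > 0 else -1

def winding_number(F, radius, n=2000):
    # winding number of F around 0 along the circle |x| = radius
    total, prev = 0.0, None
    for k in range(n + 1):
        t = 2.0 * math.pi * k / n
        u, v = F((radius * math.cos(t), radius * math.sin(t)))
        ang = math.atan2(v, u)
        if prev is not None:
            d = ang - prev
            # unwrap the angle increment into (-pi, pi]
            while d > math.pi:
                d -= 2.0 * math.pi
            while d <= -math.pi:
                d += 2.0 * math.pi
            total += d
        prev = ang
    return round(total / (2.0 * math.pi))

zeros = [(1.0, 0.0), (-1.0, 0.0)]           # all solutions of f(x) = 0
assert all(f(z) == (0.0, 0.0) for z in zeros)
degree_from_indices = sum(index_at_zero(z) for z in zeros)
print(degree_from_indices, winding_number(f, 2.0))  # 2 2
```

If the radius is shrunk to 0.5, both zeros fall outside the disc and the winding number drops to 0.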
The topological degree defined above is the Brouwer degree, which can be extended to the infinite-dimensional case by means of the Leray–Schauder degree. Clearly, when the problem $\mathrm{NCP}(f, K)$ is equivalent to an equation of the form $\Phi(x) = 0$ and $\deg(\Phi, \Omega, 0)$ is well defined, we can obtain existence theorems for the problem $\mathrm{NCP}(f, K)$. This is the situation when $\mathrm{NCP}(f, K)$ is defined on a Hilbert space $(H, \langle\cdot,\cdot\rangle)$ ordered by a closed pointed convex cone $K \subset H$. It is known [13,15,25,26] that in this case the problem $\mathrm{NCP}(f, K)$ is equivalent to the solvability of the equation

$$\Phi_1(x) = x - P_K\big(x - f(x)\big) = 0. \qquad (1)$$
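For $K = \mathbb{R}^n_+$ the projection is simply $P_K(z) = \max(z, 0)$ componentwise, so $\Phi_1$ is cheap to evaluate. The sketch below takes a hypothetical affine map $f(x) = Mx + q$ with $M$ symmetric positive definite. It runs a basic projection iteration $x \leftarrow P_K(x - \tau f(x))$, which is a standard numerical technique swapped in here for illustration and is not one of the degree arguments cited above, and then checks that $\Phi_1$ vanishes at the limit. The step size $\tau = 0.2$ and the data $M$, $q$ are assumptions, chosen so that the iteration is a contraction.

```python
def proj_nonneg(z):
    # P_K for K = R^n_+: componentwise max(z_i, 0)
    return [max(zi, 0.0) for zi in z]

def affine_map(M, q):
    # hypothetical f(x) = M x + q
    def f(x):
        return [sum(M[i][j] * x[j] for j in range(len(x))) + q[i]
                for i in range(len(q))]
    return f

def phi1(f, x):
    # natural residual Phi_1(x) = x - P_K(x - f(x)), see (1)
    fx = f(x)
    p = proj_nonneg([xi - fi for xi, fi in zip(x, fx)])
    return [xi - pi for xi, pi in zip(x, p)]

def projection_method(f, x0, tau=0.2, iters=500):
    # fixed-point iteration x <- P_K(x - tau f(x)); its fixed points solve NCP(f, K)
    x = list(x0)
    for _ in range(iters):
        fx = f(x)
        x = proj_nonneg([xi - tau * fi for xi, fi in zip(x, fx)])
    return x

M = [[2.0, 1.0], [1.0, 2.0]]
q = [-1.0, 1.0]
f = affine_map(M, q)
x = projection_method(f, [0.0, 0.0])
print([round(v, 6) for v in x])            # [0.5, 0.0]
print(max(abs(r) for r in phi1(f, x)))     # essentially 0
```

At $x = (0.5, 0)$ one has $f(x) = (0, 1.5)$, so $x \ge 0$, $f(x) \ge 0$ and $\langle x, f(x)\rangle = 0$, as required by $\mathrm{NCP}(f, \mathbb{R}^2_+)$.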


In [7,25,26] several results were proved using (1) and the topological degree in Euclidean space. Suppose that the ordered Hilbert space $(H, \langle\cdot,\cdot\rangle, K)$ is a vector lattice, $K$ is self-adjoint (i.e., $K = K^*$) and the inner product $\langle\cdot,\cdot\rangle$ is $K$-local. In this case the problem $\mathrm{NCP}(f, K)$ is equivalent to the equation

$$\Phi_2(x) = x \wedge f(x) = 0. \qquad (2)$$

This is the case when $(H, \langle\cdot,\cdot\rangle, K)$ is the Euclidean space $(\mathbb{R}^n, \langle\cdot,\cdot\rangle, \mathbb{R}^n_+)$. The topological degree of the mapping $\Phi_2$ can then be used.
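In $(\mathbb{R}^n, \langle\cdot,\cdot\rangle, \mathbb{R}^n_+)$ the two reformulations (1) and (2) coincide. Componentwise, $x_i - \max(x_i - f_i(x), 0) = \min(x_i, f_i(x))$, so $\Phi_1 = \Phi_2$. Here is a quick randomized check in Python; the helper names are ours, and random vectors stand in for $f(x)$.

```python
import random

def phi1(x, fx):
    # Phi_1 = x - P_K(x - f(x)) with P_K(z) = max(z, 0) componentwise
    return [xi - max(xi - fi, 0.0) for xi, fi in zip(x, fx)]

def phi2(x, fx):
    # Phi_2 = x ∧ f(x): the lattice infimum in R^n is the componentwise min
    return [min(xi, fi) for xi, fi in zip(x, fx)]

def count_mismatches(trials=1000, n=4, seed=0):
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        fx = [rng.uniform(-5.0, 5.0) for _ in range(n)]  # stands for f(x)
        if any(abs(a - b) > 1e-12 for a, b in zip(phi1(x, fx), phi2(x, fx))):
            bad += 1
    return bad

print(count_mismatches())  # 0
```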

More generally, the problem $\mathrm{GOCP}(\{f_i\}_{i=1}^m, K)$ can be studied using the topological degree and the equation

$$\Phi_3(x) = \wedge\big(f_1(x), \dots, f_m(x)\big) = 0. \qquad (3)$$

Using (2), (3) and the topological degree, many results were proved in [6,7,8,25,26].
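In $\mathbb{R}^n$ ordered by $\mathbb{R}^n_+$, the lattice operation $\wedge$ in (3) is again the componentwise minimum. Thus $\Phi_3(x) = 0$ means that all $f_i(x) \ge 0$ and that, in each coordinate, at least one $f_i(x)$ vanishes. Below is a minimal sketch with three hypothetical maps: $f_1(x) = x$, $f_2(x) = Mx + q$ (toy data) and $f_3(x) = x + 1$.

```python
M = [[2.0, 1.0], [1.0, 2.0]]
q = [-1.0, 1.0]

def f1(x):
    return list(x)

def f2(x):
    # hypothetical affine map M x + q
    return [M[i][0] * x[0] + M[i][1] * x[1] + q[i] for i in range(2)]

def f3(x):
    return [xi + 1.0 for xi in x]

def phi3(fs, x):
    # Phi_3(x) = ∧(f_1(x), ..., f_m(x)): componentwise minimum in R^n
    values = [fi(x) for fi in fs]
    return [min(col) for col in zip(*values)]

print(phi3([f1, f2, f3], [0.5, 0.0]))  # [0.0, 0.0]: x0 = (0.5, 0) solves GOCP
print(phi3([f1, f2, f3], [1.0, 1.0]))  # [1.0, 1.0]: not a solution
```

With only the two maps $f_1(x) = x$ and $f_2$, the function `phi3` reduces to $\Phi_2$ in (2).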

The particular case of affine functions (i.e., the case of linear complementarity problems) has been considered in many papers, for example [4,6,7,8,11,12,24,27,28,29]. The topological degree can also be used to study the cardinality of the solution set [7,21,22], the stability of solutions [7,10], or the connectedness of the solution set [9,18]. Finally, we note the paper [5], in which the topological degree is applied to the study of a particular complementarity problem that is important in elasticity theory.


Exceptional Families of Elements 
and Complementarity Problems 


Let $(\mathbb{R}^n, \langle\cdot,\cdot\rangle)$ be the Euclidean space, $K \subset \mathbb{R}^n$ a closed pointed convex cone and $f: \mathbb{R}^n \to \mathbb{R}^n$ a function.


Definition 1 We say that the family of elements $\{x^r\}_{r>0} \subset K$ is an exceptional family of elements for $f$ with respect to $K$ if and only if for every real number $r > 0$ there exists a real number $\mu_r > 0$ such that the vector $u_r = f(x^r) + \mu_r x^r$ satisfies the following conditions:

1) $u_r \in K^*$;

2) $\langle u_r, x^r\rangle = 0$;

3) $\|x^r\| \to +\infty$ as $r \to +\infty$.


This notion was introduced in [1] and [19]; it is a new variant of a similar notion introduced initially in [15]. Using the topological degree, one can prove the following alternative.


Theorem 2 For any continuous function $f: \mathbb{R}^n \to \mathbb{R}^n$ there exists either a solution of the problem $\mathrm{NCP}(f, K)$ or an exceptional family of elements for $f$ with respect to $K$.


Proof The proof is in [1,15] or [19].


Corollary 3 If a continuous function $f: \mathbb{R}^n \to \mathbb{R}^n$ has no exceptional family of elements with respect to $K$, then the problem $\mathrm{NCP}(f, K)$ is solvable.
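Here is a one-dimensional illustration of the alternative in Theorem 2. It is our own toy example, not taken from the cited papers. Take $n = 1$, $K = K^* = \mathbb{R}_+$ and the constant map $f(x) \equiv -1$. Then $\mathrm{NCP}(f, K)$ has no solution, because $f(x_0) = -1 \notin K^*$. By Theorem 2 an exceptional family must therefore exist, and one can indeed be written down explicitly:

```latex
x^r = r \in K, \qquad \mu_r = \frac{1}{r} > 0, \qquad
u_r = f(x^r) + \mu_r x^r = -1 + \frac{1}{r}\, r = 0 \in K^*, \qquad
\langle u_r, x^r \rangle = 0, \qquad \|x^r\| = r \to +\infty \quad (r \to +\infty).
```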


In the papers [1,15,16,17] and [19], several existence theorems for explicit and implicit complementarity problems are proved on the basis of Corollary 3 of Theorem 2. We note that Theorem 2 can be extended to infinite-dimensional Hilbert spaces by replacing the function $f$ with a compact field.


Homotopy Continuation Method 


Let $(\mathbb{R}^n, \langle\cdot,\cdot\rangle)$ be the Euclidean space ordered by $\mathbb{R}^n_+$, and let $f: \mathbb{R}^n \to \mathbb{R}^n$ be a continuous function. The homotopy continuation method is the following. Let $D(x) = \mathrm{diag}(x)$ be the diagonal $(n \times n)$-matrix with the coordinates of $x \in \mathbb{R}^n$ on its diagonal. Define the mapping $\Phi: \mathbb{R}^{2n}_+ \to \mathbb{R}^n_+ \times \mathbb{R}^n$ by $\Phi(z) = (D(x)y,\ y - f(x))$ for every $z = (x, y) \ge 0$. The problem $\mathrm{NCP}(f, \mathbb{R}^n_+)$ is equivalent to the system of equations

$$\Phi(z) = 0 \quad \text{and} \quad z = (x, y) \ge 0. \qquad (4)$$

Consider the family of systems of equations

$$\Phi(z) = tc \quad \text{and} \quad z = (x, y) \ge 0, \qquad (5)$$

where $c = (a, b) \in (\mathbb{R}^n_+ \setminus \{0\}) \times \mathbb{R}^n$ and $t \in \mathbb{R}_+$. Let $C = \{tc : t > 0\}$. Under certain assumptions $\Phi^{-1}(C)$ exists and forms a trajectory, a one-dimensional curve $\{z(t) : t > 0\}$. Furthermore, $z(t)$ leads to a solution of the system (4) as $t$ tends to 0. The homotopy continuation method allows us to study the existence of the trajectory $\Phi^{-1}(C)$ and to develop numerical methods for tracing it. For results obtained by this method, the reader is referred to [20] and its references.
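For $n = 1$ and a hypothetical affine map $f(x) = mx + q$ with $m > 0$, the system $\Phi(z) = tc$ in (5) can be solved in closed form, which makes the trajectory $z(t)$ easy to inspect. Substituting $y = mx + q + tb$ into $xy = ta$ gives the quadratic $mx^2 + (q + tb)x - ta = 0$. When $a > 0$, its positive root is $x(t)$. This toy sketch is not the path-following algorithm of [20]. It evaluates $z(t)$ for decreasing $t$ with assumed data $m = 1$, $q = -1$, $c = (1, 0)$, and the path should approach the solution $(x, y) = (1, 0)$ of (4).

```python
import math

def path_point(m, q, a, b, t):
    """Point z(t) = (x, y) with x*y = t*a, y - (m*x + q) = t*b, x, y > 0.

    Toy 1-D case of (5); assumes m > 0, a > 0 and t > 0.
    """
    B = q + t * b
    disc = math.sqrt(B * B + 4.0 * m * t * a)
    # positive root of m*x^2 + B*x - t*a = 0, written to avoid cancellation
    if B >= 0.0:
        x = 2.0 * t * a / (B + disc)
    else:
        x = (-B + disc) / (2.0 * m)
    y = m * x + B
    return x, y

for t in [1.0, 0.1, 1e-3, 1e-6]:
    x, y = path_point(1.0, -1.0, 1.0, 0.0, t)
    print(f"t={t:g}  x={x:.6f}  y={y:.3e}  x*y={x * y:.3e}")
```

The two root formulas are algebraically equal. Choosing between them by the sign of $B$ merely keeps the computation accurate as $t \to 0$.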


Zero-Epi Mappings 
and Complementarity Problems 


The concept of zero-epi mapping was introduced in [3]; it is simpler and more refined than the topological degree [3,13]. With this notion we can obtain the solvability of a nonlinear equation in a Banach space even when the topological degree is zero. Let $(E, \|\cdot\|)$ and $(F, \|\cdot\|)$ be Banach spaces and $\Omega \subset E$ a bounded open subset.


Definition 4 We say that a continuous mapping $f: \overline{\Omega} \to F$ is zero-epi (shortly 0-epi) if and only if:

1) $0 \notin f(\partial\Omega)$ (i.e., $f$ is 0-admissible);

2) for any continuous compact mapping $h: \overline{\Omega} \to F$ such that $h(x) = 0$ for every $x \in \partial\Omega$, the equation $f(x) = h(x)$ has a solution in $\Omega$.

If $f: \overline{\Omega} \to F$ is $p$-admissible (for $p \in F$), i.e., $p \notin f(\partial\Omega)$, and the mapping $f - p$, defined by $(f - p)(x) = f(x) - p$, is 0-epi, we say that $f$ is $p$-epi.


Properties 


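i) Existence property. If $f: \overline{\Omega} \to F$ is $p$-epi, then the equation $f(x) = p$ has a solution in $\Omega$.
ii) Normalization property. The inclusion $i: \overline{\Omega} \to E$ is $p$-epi if and only if $p \in \Omega$.
iii) Localization property. If $f: \overline{\Omega} \to F$ is 0-epi, $\Omega_1 \subset \Omega$ is an open set and $f^{-1}(0) \subset \Omega_1$, then the restriction of $f$ to $\overline{\Omega}_1$, i.e., $f|_{\overline{\Omega}_1}: \overline{\Omega}_1 \to F$, is 0-epi.
iv) Homotopy property. Let $f: \overline{\Omega} \to F$ be 0-epi and let $h: \overline{\Omega} \times [0,1] \to F$ be a continuous and compact mapping such that $h(x, 0) = 0$ for any $x \in \overline{\Omega}$. If $f(x) + h(x, t) \ne 0$ for all $x \in \partial\Omega$ and for any $t \in [0, 1]$, then the mapping $f(\cdot) + h(\cdot, 1): \overline{\Omega} \to F$ is 0-epi.
v) Boundary dependence property. If $f: \overline{\Omega} \to F$ is 0-epi and $g: \overline{\Omega} \to F$ is a continuous compact mapping such that $g(x) = 0$ for all $x \in \partial\Omega$, then $f + g: \overline{\Omega} \to F$ is 0-epi.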


Remark 5 If $f: \overline{\Omega} \to E$ is a $p$-admissible compact vector field and the Leray–Schauder degree $\deg(f, \Omega, p) \ne 0$, then $f$ is $p$-epi. The converse is not true.


In [13, Chap. 3] several existence theorems are proved for the (implicit and explicit) nonlinear complementarity problem using the concept of 0-epi mapping. The connectedness of the solution set of a nonlinear complementarity problem depending on several parameters can also be studied using the concept of 0-epi mapping [13]. The next result extends the main result proved in [26] to the infinite-dimensional case and to the case when the topological degree is zero.


Theorem 6 Let $(H, \langle\cdot,\cdot\rangle)$ be a Hilbert space, $K \subset H$ a closed pointed convex cone and $f, g: H \to H$ completely continuous mappings. Suppose we are given a completely continuous mapping $\phi: H \to H$ and a bounded open set $\Omega \subset H$ such that:

1) the mapping $\Psi: \overline{\Omega} \to H$ defined by $\Psi(x) = g(x) - P_K[g(x) - \phi(x)]$ for all $x \in \overline{\Omega}$ is 0-epi;

2) for every $\lambda > 0$ and $x \in \partial\Omega \cap g^{-1}(K)$ we have $f(x) + \lambda \phi(x) \notin (K - g(x))^*$.

Then the problem $\mathrm{ICP}(f, g, K)$ has a solution $x_0 \in \Omega$.


Proof A proof of this result is in [13]. 


Conclusions 


The application of topological methods to the study of complementarity problems is probably the most recent direction of activity in complementarity theory. Further support for this view is provided by the topological index on cones used recently in [14].


See also 


> Convex-simplex Algorithm 

> Equivalence Between Nonlinear Complementarity 
Problem and Fixed Point Problem 

> Generalized Nonlinear Complementarity Problem 

> Integer Linear Complementary Problem 

> LCP: Pardalos–Rosen Mixed Integer Formulation

> Lemke Method 

> Linear Complementarity Problem 

> Linear Programming 

> Order Complementarity 

> Parametric Linear Programming: Cost Simplex
Algorithm

> Principal Pivoting Methods for Linear
Complementarity Problems

> Sequential Simplex Method


References 


1. Bulavski VA, Isac G, Kalashnikov V (1998) Application of topological degree to complementarity problems. In: Migdalas A (eds) Multilevel Optimization: Algorithms and Applications. Kluwer, Dordrecht, pp 333-358

2. Carbone A, Isac G (1998) The generalized order complementarity problem. Applications to economics. An existence result. Nonlinear Stud 5(2):129-151

3. Furi M, Martelli M, Vignoli A (1980) On the solvability of nonlinear operator equations in normed spaces. Ann Mat Pura Appl 124:321-343

4. Garcia CB, Gould FJ, Turnbull TR (1983) Relations between PL maps, complementarity cones and degree in linear complementarity problems. In: Eaves, Gould, Peitgen, Todd (eds) Homotopy Methods and Global Convergence. Plenum, New York, pp 91-144

5. Goeleven D, Nguyen VH, Théra M (1993) Nonlinear eigenvalue problems governed by a variational inequality of von Karman's type: A degree theoretic approach. Topol Meth Nonlinear Anal 2:253-276

6. Gowda MS (1991) A degree formula of Stewart. Math Res Report Univ Maryland

7. Gowda MS (1993) Applications of degree theory to linear complementarity problems. Math Oper Res 18(4):868-879

8. Gowda MS, Sznajder R (1994) The generalized order linear complementarity problem. SIAM J Matrix Anal Appl 15(3):779-795

9. Gowda MS, Sznajder R (1997) Weak univalence and connectedness of inverse images of continuous functions. Preprint Univ Maryland, Baltimore County

10. Ha CD (1987) Application of degree theory in stability of the complementarity problem. Math Oper Res 12:368-376

11. Howe R (1983) On a class of linear complementarity problems of variable degree. In: Eaves, Gould, Peitgen, Todd (eds) Homotopy Methods and Global Convergence. Plenum, New York, pp 155-177

12. Howe R, Stone RE (1983) Linear complementarity and the degree of mappings. In: Eaves, Gould, Peitgen, Todd (eds) Homotopy Methods and Global Convergence. Plenum, New York, pp 179-223

13. Hyers DH, Isac G, Rassias TM (1997) Topics in nonlinear analysis and applications. World Sci., Singapore

14. Isac G (1996) The fold complementarity problem and the order complementarity problem. Topol Meth Nonlinear Anal 8:343-358

15. Isac G, Bulavski V, Kalashnikov V (1997) Exceptional families, topological degree and complementarity problems. J Global Optim 10:207-225

16. Isac G, Carbone A (1999) Exceptional families of elements for continuous functions. Some applications to complementarity theory. J Global Optim 15:181-196

17. Isac G, Obuchowska T (1998) Functions without exceptional family of elements and complementarity problems. J Optim Th Appl 99:147-163

18. Jones C, Gowda MS (1997) On the connectedness of solution set in linear complementarity problems. Preprint Univ Maryland, Baltimore County

19. Kalashnikov V (1995) Complementarity problem and the generalized oligopoly model. Habilitation Thesis CEMI, Moscow

20. Kojima M, Megiddo N, Noma T (1991) Homotopy continuation method for nonlinear complementarity problems. Math Oper Res 16(4):754-774

21. Kojima M, Saigal R (1979) On the number of solutions to a class of linear complementarity problems. Math Program 17:136-139

22. Kojima M, Saigal R (1981) On the number of solutions to a class of complementarity problems. Math Program 21:190-203

23. Lloyd NG (1978) Degree theory. Cambridge Univ. Press, Cambridge

24. Morris WD (1990) On the maximum degree of an LCP map. Math Oper Res 15:423-429

25. Pang JS (1995) Complementarity problems. In: Horst R, Pardalos PM (eds) Handbook Global Optim. Kluwer, Dordrecht, pp 271-338

26. Pang JS, Yao JC (1995) On a generalization of a normal map and equation. SIAM J Control Optim 33(1):168-184

27. Sridhar R (1996) The degree of an exact order matrix. Math Oper Res 21(2):427-441

28. Stewart DE (1991) A degree theory approach to degeneracy of LCPs. Res Report Dept Math Univ Queensland, Australia 4072

29. Sznajder R (1994) Degree-theoretic analysis of the vertical and horizontal linear complementarity problems. PhD Thesis, Graduate School Univ. Maryland


Topology of Global Optimization 


HUBERTUS TH. JONGEN¹, ALINA RUIZ JHONES²
¹ Department Math., Aachen University Technol.,
Aachen, Germany
² Fac. Math. and Computer Sci.,
University Havana San Lazaro y L,
Ciudad Habana, Cuba


MSC2000: 90C30, 58E05 




Article Outline 


Keywords 

Introduction, Critical Points, Nondegeneracy 
Relations Between KKT Points: Morse Relations 
Projected Gradients 

Global Gradient Flows: Equality Constraints Only
Global Gradient Flows: The General Case 

See also 

References 


Keywords 


Morse theory; Karush-Kuhn-Tucker point; Morse 
relations; Euler formula; Global optimization; 
Continuous selection of functions; Min-max graph; 
Min-max digraph; Projected positive gradient; 
Projected negative gradient; Ascent flow; Descent flow 


Introduction, Critical Points, Nondegeneracy 


In this article we describe the basic idea of Morse the- 
ory in finite-dimensional smooth optimization. This is 
concerned with critical points (in particular, Karush- 
Kuhn-Tucker points) and relations between them. An 
extension to certain nonsmooth problems is indicated. 
Then, we turn to gradient flows and focus on the fun- 
damental problem: how to get from one local minimum 
to (all) other ones. 

In this article we consider optimization problems of the type

$$(P)\qquad \min f \ \text{ on the feasible set } M, \qquad M = \{\, x \in \mathbb{R}^n : h_i(x) = 0,\ i \in I,\ \ g_j(x) \ge 0,\ j \in J \,\},$$

where $f, h_i, g_j: \mathbb{R}^n \to \mathbb{R}$ are $C^2$-functions, $|I| < n$, $|J| < \infty$.

For simplicity we assume that $M$ is compact and that the linear independence constraint qualification (LICQ) is satisfied at all points of $M$. The LICQ is said to hold at $\bar x \in M$ if the vectors $Dh_i(\bar x)$, $i \in I$, $Dg_j(\bar x)$, $j \in J_0(\bar x)$, are linearly independent. Here, $Dh$ stands for the row vector of partial derivatives of $h$ and $J_0(\bar x) = \{ j \in J : g_j(\bar x) = 0 \}$.

By virtue of LICQ we can take the constraint functions $h_i$, $i \in I$, $g_j$, $j \in J_0(\bar x)$, as new coordinates in a neighborhood of $\bar x$. In these coordinates, the set $M$ locally takes the form $H^p \times \mathbb{R}^q$, where $p = |J_0(\bar x)|$, $q = n - |I| - p$, and $H^p = \{ y \in \mathbb{R}^p : y_i \ge 0,\ i = 1, \dots, p \}$.
A point $\bar x \in M$ is called a critical point for $f|_M$ if there exist real numbers $\bar\lambda_i$, $\bar\mu_j$ such that

$$Df(\bar x) = \sum_{i \in I} \bar\lambda_i\, Dh_i(\bar x) + \sum_{j \in J_0(\bar x)} \bar\mu_j\, Dg_j(\bar x).$$

A critical point is called a Karush–Kuhn–Tucker point (shortly, KKT point) if $\bar\mu_j \ge 0$, $j \in J_0(\bar x)$. Moreover, a critical point is said to be nondegenerate if the following two conditions hold:

ND1) (linear) $\bar\mu_j \ne 0$, $j \in J_0(\bar x)$.

ND2) (quadratic) $D^2L(\bar x)|_{T_{\bar x}M}$ is nonsingular.

The matrix $D^2L$ stands for the Hessian of the Lagrange function $L$,

$$L(x) = f(x) - \sum_{i \in I} \bar\lambda_i h_i(x) - \sum_{j \in J_0(\bar x)} \bar\mu_j g_j(x),$$

and $T_{\bar x}M$ denotes the tangent space at $\bar x$,

$$T_{\bar x}M = \{\, \xi \in \mathbb{R}^n : Dh_i(\bar x)\xi = 0,\ i \in I,\ \ Dg_j(\bar x)\xi = 0,\ j \in J_0(\bar x) \,\}.$$

Condition ND2) means that the matrix $V^\top D^2L(\bar x) V$ is nonsingular, where $V$ is some matrix whose columns form a basis for the tangent space $T_{\bar x}M$.
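The indices LI and QI introduced next can be computed directly from these definitions. The following is a minimal NumPy sketch; the function and argument names are our own. It proceeds in four steps:

1. It recovers the multipliers $\bar\lambda, \bar\mu$ from $Df(\bar x)$ by least squares, which is exact under LICQ at a critical point.
2. It builds $D^2L(\bar x)$.
3. It takes a basis $V$ of $T_{\bar x}M$ from the SVD of the active constraint gradients.
4. It counts the negative multipliers $\bar\mu_j$ and the negative eigenvalues of $V^\top D^2L(\bar x)V$.

The test problems are hypothetical: $f(x) = x_1 + x_2^2$, $x_1 - x_2^2$ and $-x_1 + x_2^2$, each with the single constraint $g(x) = x_1 \ge 0$ at $\bar x = 0$.

```python
import numpy as np

def morse_indices(grad_f, hess_f, act_grads, act_hess, n_eq, tol=1e-9):
    """Return (LI, QI) at a nondegenerate critical point x_bar.

    act_grads: rows Dh_i(x_bar), i in I (first n_eq rows), then Dg_j(x_bar), j in J0(x_bar).
    act_hess:  the matching Hessians D^2 h_i(x_bar), D^2 g_j(x_bar).
    """
    g = np.asarray(grad_f, dtype=float)
    n = g.size
    A = np.asarray(act_grads, dtype=float).reshape(-1, n)
    if A.shape[0] > 0:
        # multipliers from Df = sum_k mult_k * row_k(A)
        mult = np.linalg.lstsq(A.T, g, rcond=None)[0]
    else:
        mult = np.zeros(0)
    if not np.allclose(A.T @ mult, g, atol=tol):
        raise ValueError("not a critical point")
    mu = mult[n_eq:]
    if np.any(np.abs(mu) <= tol):
        raise ValueError("ND1 violated: zero multiplier")
    # Hessian of L = f - sum_k mult_k * (constraint_k)
    H = np.asarray(hess_f, dtype=float).copy()
    for m_k, Hk in zip(mult, act_hess):
        H -= m_k * np.asarray(Hk, dtype=float)
    # basis V of T_xbar M = null space of A
    if A.shape[0] > 0:
        _, s, Vt = np.linalg.svd(A)
        rank = int(np.sum(s > tol))
        V = Vt[rank:].T
    else:
        V = np.eye(n)
    eig = np.linalg.eigvalsh(V.T @ H @ V) if V.shape[1] > 0 else np.zeros(0)
    if np.any(np.abs(eig) <= tol):
        raise ValueError("ND2 violated: singular restricted Hessian")
    return int(np.sum(mu < 0)), int(np.sum(eig < 0))

Dg, zero = [[1.0, 0.0]], [np.zeros((2, 2))]
print(morse_indices([1.0, 0.0], [[0, 0], [0, 2]], Dg, zero, n_eq=0))   # (0, 0): local minimum
print(morse_indices([1.0, 0.0], [[0, 0], [0, -2]], Dg, zero, n_eq=0))  # (0, 1): KKT point, QI = 1
print(morse_indices([-1.0, 0.0], [[0, 0], [0, 2]], Dg, zero, n_eq=0))  # (1, 0): not a KKT point
```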

In a neighborhood of a nondegenerate critical point $\bar x$ there exist new $C^1$-coordinates such that $M$ locally takes the form $H^p \times \mathbb{R}^q$ and $f|_M$ becomes (equivariant Morse lemma; see [11]):

$$f = f(\bar x) - \sum_i y_i + \sum_j y_j - \sum_k y_k^2 + \sum_l y_l^2,$$

where the coordinates $y_i$ and $y_j$ in the first two sums are nonnegative. The number of negative/positive linear terms corresponds to the number of negative/positive multipliers $\bar\mu_r$, $r \in J_0(\bar x)$, whereas the number of negative/positive squares is equal to the number of negative/positive eigenvalues of $V^\top D^2L(\bar x) V$. The number of negative linear (quadratic) terms is called the linear index LI (quadratic index QI). In particular, a nondegenerate critical point is a KKT point (local minimum) if and only if LI = 0 (LI = QI = 0). Basic references for this article are [11,12,15].


Relations Between KKT Points: Morse Relations


From now on we assume that all critical points of $f|_M$ are nondegenerate with pairwise different $f$-values. For details we refer to [11,16]. In order to study relations between critical points, we consider the lower level sets $M^a = \{x \in M : f(x) \le a\}$ for increasing values of $a$. If the intermediate set $M_a^b = \{x \in M : a \le f(x) \le b\}$ does not contain a KKT point, then the set $M^b$ can be continuously deformed into the lower level set $M^a$. The crucial point is the following: a point $x \in M$ is not a KKT point if and only if there exists a feasible direction of linear descent for $f$ at $x$. The desired deformation can then be accomplished via such descent directions.

Next, suppose that the intermediate set $M_a^b$ contains exactly one KKT point $\bar x$. Moreover, let $a < f(\bar x) < b$ and QI = $k$. Then the lower level set $M^b$ has the homotopy type of $M^a \cup D^k$. Here, the notation $M^a \cup D^k$ means that a $k$-dimensional ball $D^k$ is attached (glued) to the set $M^a$ along its boundary $\partial D^k$.

In particular, if $k = 0$ (local minimum!), then $M^a \cup D^0$ is just the disjoint union of $M^a$ and a point (hence, a new component is created). Next, the 1-dimensional ball $D^1$ is an interval, and its boundary $\partial D^1$ consists of two points. There are two possibilities. Either the two boundary points are glued onto two different components of $M^a$ (hence, the number of connected components decreases by one), or both boundary points are mapped onto the same component of $M^a$ (now, the number of 2-dimensional 'holes' increases by one). Speaking in terms of holes, we have the following general alternative when passing the value of a KKT point with QI = $k$:

• either the number of $k$-dim holes of $M^a$ goes down by one; or

• the number of $(k+1)$-dim holes of $M^a$ goes up by one.

To be precise: by a $k$-dim hole of a topological space $X$ we mean a generator of $H_{k-1}(X)$, the $(k-1)$st singular homology space of $X$ over the real number field; in particular, the dimension of $H_0(X)$ counts the number of path-connected components of $X$.

The number of k-dim holes is invariant under con- 
tinuous deformations. Hence, that number can only 
change when passing a functional level corresponding 
to a KKT point. 


Let $r_k$ (the $k$th Betti number) denote the number of $(k+1)$-dim holes of the feasible set $M$. Moreover, let $c_k^-$ ($c_k^+$) be the number of KKT points with QI = $k$ at whose level the number of $k$-dim holes of $M^a$ goes down (the number of $(k+1)$-dim holes goes up) for increasing values of $a$. When we reach the global maximum value of $f|_M$, then $M^a = M$, and we must have created precisely $r_k$ holes of dimension $k+1$, $k = 0, 1, 2, \dots$. Consequently, if (in between) more than $r_k$ holes of dimension $k+1$ are created, then some of these holes must be closed before the global maximum value of $f|_M$ is reached. Together with the aforementioned alternative, this results in the following topological balance equations (Morse relations):

$$c_k^+ - c_{k+1}^- = r_k, \qquad k = 0, 1, 2, \dots, \qquad (1)$$

where $c_0^+ = c_0$ and $0 = c_s := c_s^+ + c_s^-$ for $s > n - |I|$ (the dimension of $M$). In general, $c_k = c_k^+ + c_k^-$ is the number of KKT points with QI = $k$.

If $M$ is connected, then $r_0 = 1$ and the first relation in (1) becomes $c_0 - c_1^- = 1$.

This guarantees the existence of at least $c_0 - 1$ KKT points with QI = 1 (mountain pass theorem). For this reason we call the KKT points with QI = 1 of $(-)$ type decomposition points. In fact, when the functional level is lowered past a decomposition point, the corresponding component of the lower level set splits up into two components, thereby separating the local minima contained in them.

We can get rid of the $(+)$ and $(-)$ signs in (1) by adding all equations in (1) with alternating signs. This leads to the equation ($s = n - |I|$):

$$\sum_{k=0}^{s} (-1)^k c_k = c_0 - c_1 + c_2 - \dots + (-1)^s c_s = r_0 - r_1 + r_2 - \dots + (-1)^s r_s. \qquad (2)$$
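As a quick sanity check of (1) and (2), consider the hypothetical example of the unit circle $M = \{x \in \mathbb{R}^2 : x_1^2 + x_2^2 - 1 = 0\}$ with $f(x) = x_2$. Here $n = 2$, $|I| = 1$ and $s = 1$. The function $f|_M$ has exactly two critical points: the minimum $(0, -1)$ with QI = 0 and the maximum $(0, 1)$ with QI = 1. Hence $c_0 = c_1 = 1$, and for the circle $r_0 = r_1 = 1$. The relations then read:

```latex
k = 0:\quad c_0^+ - c_1^- = r_0 \;\Rightarrow\; 1 - c_1^- = 1 \;\Rightarrow\; c_1^- = 0,\ \ c_1^+ = 1, \\
k = 1:\quad c_1^+ - c_2^- = 1 - 0 = 1 = r_1, \\
(2):\quad c_0 - c_1 = 1 - 1 = 0 = r_0 - r_1 .
```

So passing the maximum closes no hole. Instead, it creates the single 2-dim hole of the circle.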


Remark 1 In the deformation part (along feasible directions of linear descent) we can weaken the LICQ assumption. For example, the Mangasarian–Fromovitz constraint qualification suffices (see [7]). Also, nonsmooth aspects can be taken into account (see [4,9,13]). Since only KKT points play a role in the Morse relations (1), the nondegenerate KKT points can be replaced by strongly stable stationary points (in the sense of Kojima); see [7].


Remark 2 In case $M$ is a polytope $P$, relation (2) reflects the famous Euler formula. In fact, using the logarithmic barrier function, a function $f$ can be constructed such that each $k$-dim face of the polytope $P$ has exactly one KKT point for $f|_P$ with QI = $k$. For this particular function $f$, the number $c_k$ in (2) equals the number of $k$-dim faces of $P$. Since a polytope can be continuously deformed into a point, we have $r_0 = 1$ and $r_i = 0$, $i \ge 1$. Altogether, formula (2) then becomes Euler's formula for polytopes (see [8]).
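The combinatorial content of Remark 2 is easy to check numerically. The sketch below is our own illustration. It uses the standard face counts of the $d$-cube, $\binom{d}{k}2^{d-k}$, and of the $d$-simplex, $\binom{d+1}{k+1}$, counting the polytope itself as its $d$-dim face. It evaluates the alternating sum on the left-hand side of (2) with $c_k$ equal to the number of $k$-dim faces; the result should equal $r_0 = 1$.

```python
from math import comb

def cube_faces(d):
    # number of k-dimensional faces of the d-cube, k = 0..d
    return [comb(d, k) * 2 ** (d - k) for k in range(d + 1)]

def simplex_faces(d):
    # number of k-dimensional faces of the d-simplex, k = 0..d
    return [comb(d + 1, k + 1) for k in range(d + 1)]

def alternating_sum(c):
    # left-hand side of (2): c_0 - c_1 + c_2 - ...
    return sum((-1) ** k * ck for k, ck in enumerate(c))

print(cube_faces(3), alternating_sum(cube_faces(3)))  # [8, 12, 6, 1] 1
for d in range(1, 8):
    assert alternating_sum(cube_faces(d)) == 1
    assert alternating_sum(simplex_faces(d)) == 1
```

For $d = 3$, moving the single 3-dim face to the right-hand side recovers the familiar $V - E + F = 8 - 12 + 6 = 2$.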


Remark 3 The ideas of deformation and cell attachment can be generalized to functions of maximum type (cf. [2]) and minimum type (cf. [3]). Both cases are special continuous selections of functions. In fact, let $\Psi = \mathrm{CS}(f_1, \dots, f_s)$, where $f_1, \dots, f_s: \mathbb{R}^n \to \mathbb{R}$ are $C^2$-functions and CS means 'continuous selection'. Note that $\Psi$ is nonsmooth in general. In [13] the concept of a nondegenerate critical point for $\Psi$ is introduced. It is shown that, locally around such a point $\bar z$, there exist new continuous coordinates such that $\Psi$ takes the form

$$\Psi(\bar z) + \mathrm{CS}\Big(y_1, \dots, y_r, -\sum_{i=1}^{r} y_i\Big) - \sum_{j=r+1}^{r+k} y_j^2 + \sum_{l=r+k+1}^{n} y_l^2. \qquad (3)$$

It is easily seen that, for $(y_1, \dots, y_r) \ne 0$,

$$\max\Big(y_1, \dots, y_r, -\sum_{i=1}^{r} y_i\Big) > 0 \quad \text{and} \quad \min\Big(y_1, \dots, y_r, -\sum_{i=1}^{r} y_i\Big) < 0.$$

Now, consider the lower level set of $\Psi$ when passing the value $\Psi(\bar z)$. Then, in case $\Psi$ is of max (resp. min) type, a $k$-cell (resp. $(k+r)$-cell) will be attached to the lower level set.

If $\Psi$ is not of max (or min) type, the situation becomes more complicated: with respect to the 'linear part', several cells are expected to be attached simultaneously. The negative squares in (3) will raise the dimension of these cells. A precise study is presented in [1].
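The cell dimensions in Remark 3 rest on the sign pattern of the linear part $(y_1, \dots, y_r, -\sum_i y_i)$ in (3). Its $r+1$ entries sum to zero, so for $y \ne 0$ the maximum is strictly positive and the minimum strictly negative. The max-type linear part therefore has a strict local minimum at the origin, while the min-type linear part has a strict local maximum there, which is consistent with the $k$-cell versus $(k+r)$-cell statement. Below is a randomized check of the sign pattern; the helper names are ours.

```python
import random

def linear_part(y):
    """Entries (y_1, ..., y_r, -sum y_i) of the linear part in (3); they sum to zero."""
    return list(y) + [-sum(y)]

def check_sign_pattern(r, trials=10000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        y = [rng.uniform(-1.0, 1.0) for _ in range(r)]
        if all(v == 0.0 for v in y):
            continue
        entries = linear_part(y)
        # entries sum to zero and are not all zero, so max > 0 and min < 0
        if not (max(entries) > 0.0 and min(entries) < 0.0):
            return False
    return True

print(all(check_sign_pattern(r) for r in range(1, 6)))  # True
```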


Projected Gradients 


A symmetric positive definite $(n, n)$-matrix $R$ defines a scalar product $\langle\cdot,\cdot\rangle_R$, where $\langle x, y\rangle_R := x^\top R y$. The gradient $\mathrm{grad}_R f(x)$ of $f$ with respect to $R$ is defined to be the vector solving the system $\langle v, \mathrm{grad}_R f(x)\rangle_R = Df(x) v$, $v \in \mathbb{R}^n$. It follows that $\mathrm{grad}_R f(x) = R^{-1} D^\top f(x)$.

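For $x \in M$ let

$$C_x M = \{\, \xi \in \mathbb{R}^n : Dh_i(x)\xi = 0,\ i \in I,\ \ Dg_j(x)\xi \ge 0,\ j \in J_0(x) \,\}$$

denote the tangent cone of $M$ at $x$. The projected positive gradient $(+)\mathrm{grad}_{R,M} f(x)$ at $x \in M$ is defined to be the unique solution vector of the following 'primal' optimization problem:

$$\min_{\xi} \ \|\xi - \mathrm{grad}_R f(x)\|_R \quad \text{s.t.} \quad \xi \in C_x M, \qquad \text{where } \|v\|_R = \sqrt{\langle v, v\rangle_R}.$$

We point out that $(+)\mathrm{grad}_{R,M} f(x)$ is equal to the vector obtained by inserting the solution $(\bar\lambda, \bar\mu)$ of the 'dual' problem

$$\min_{\lambda, \mu} \ \Big\| \mathrm{grad}_R \Big( f + \sum_{i \in I} \lambda_i h_i + \sum_{j \in J_0(x)} \mu_j g_j \Big)(x) \Big\|_R \quad \text{s.t.} \quad \mu \ge 0$$

into the expression $\mathrm{grad}_R \big( f + \sum_{i} \lambda_i h_i + \sum_{j} \mu_j g_j \big)(x)$.

In case $J_0(x) = \emptyset$, we have the formula

$$(+)\mathrm{grad}_{R,M} f(x) = \big( A - AH (H^\top A H)^{-1} H^\top A \big) D^\top f(x),$$

where the columns of $H$ are formed by the vectors $D^\top h_i(x)$, $i \in I$, and $A = R^{-1}$.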
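The formula for $J_0(x) = \emptyset$ translates directly into a few lines of NumPy. In the sketch below, the metric $R$, the constraint gradient matrix $H$ and the vector $D^\top f(x)$ are hypothetical data. The checks confirm that the result lies in the tangent space ($H^\top \xi = 0$) and that the residual $\mathrm{grad}_R f(x) - \xi$ is $R$-orthogonal to the tangent space, as it must be for an $R$-orthogonal projection. Using `np.linalg.solve` instead of forming $(H^\top A H)^{-1}$ explicitly is a design choice made for numerical stability.

```python
import numpy as np

def projected_gradient_eq(R, H, Df):
    """(+)grad_{R,M} f(x) = (A - A H (H^T A H)^{-1} H^T A) D^T f(x), with A = R^{-1}."""
    A = np.linalg.inv(R)
    g = A @ Df                                    # grad_R f(x) = R^{-1} D^T f(x)
    correction = A @ H @ np.linalg.solve(H.T @ A @ H, H.T @ g)
    return g - correction

R = np.array([[2.0, 1.0], [1.0, 3.0]])            # hypothetical metric
H = np.array([[1.0], [1.0]])                      # D^T h for h(x) = x1 + x2 - 1
Df = np.array([1.0, 2.0])                         # hypothetical D^T f(x)
xi = projected_gradient_eq(R, H, Df)
tangent = np.array([1.0, -1.0])                   # spans T_x M = {xi : xi1 + xi2 = 0}
print(xi)                                         # approx. [-1/3, 1/3]
print(H.T @ xi, tangent @ R @ (np.linalg.solve(R, Df) - xi))  # both ~ 0
```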

The projected negative gradient $(-)\mathrm{grad}_{R,M} f$ is defined to be the projected positive gradient corresponding to the function $(-f)$.

We note that $(-)\mathrm{grad}_{R,M} f(x) = 0$ if and only if $x$ is a KKT point for $f|_M$. Moreover,

$$(+)\mathrm{grad}_{R,M} f(x) = -(-)\mathrm{grad}_{R,M} f(x) \quad \text{if } J_0(x) = \emptyset.$$
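With inequality constraints the tangent cone is no longer a subspace. In the simplest case, a single active inequality $g(x) \ge 0$ and no equality constraints, the $R$-orthogonal projection onto $C_xM = \{\xi : Dg(x)\xi \ge 0\}$ has a closed form. This gives a tiny sketch of the KKT test $(-)\mathrm{grad}_{R,M} f(x) = 0$. The example data ($R$, $g(x) = x_1$, $\bar x = 0$, $Df = (\pm 1, 0)$) are hypothetical.

```python
import numpy as np

def project_onto_halfspace_cone(v, b, R):
    """R-orthogonal projection of v onto {xi : b @ xi >= 0} (one active inequality)."""
    s = b @ v
    if s >= 0.0:
        return v                                  # v already lies in the cone
    Ab = np.linalg.solve(R, b)                    # A b with A = R^{-1}
    return v - (s / (b @ Ab)) * Ab                # lands on the face b @ xi = 0

def neg_projected_gradient(R, Df, b):
    # (-)grad_{R,M} f = projected positive gradient of -f
    return project_onto_halfspace_cone(-np.linalg.solve(R, Df), b, R)

R = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 0.0])                          # Dg(x_bar) for g(x) = x1 >= 0
for Df in (np.array([1.0, 0.0]), np.array([-1.0, 0.0])):
    d = neg_projected_gradient(R, Df, b)
    print(Df, d, "KKT" if np.allclose(d, 0.0) else "not KKT")  # first KKT, then not KKT
```

The verdict does not depend on the choice of the positive definite $R$, in line with the statement that $(-)\mathrm{grad}_{R,M} f(x)$ vanishes exactly at KKT points.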

A $C^k$-Riemannian metric (or variable metric) $R: x \mapsto R(x)$ is a $C^k$-mapping from $\mathbb{R}^n$ into the space of symmetric positive definite $(n, n)$-matrices. It induces (pointwise) a projected positive (negative) gradient field of $f$ on $M$.


Global Gradient Flows: Equality Constraints Only


Here we assume that there are no inequality constraints, i.e., $J = \emptyset$.

Let all functions $f$, $h_i$, $i \in I$, be smooth, i.e., of class $C^\infty$, and let $R$ be a smooth Riemannian metric. Now, $M$ is a smooth manifold without boundary and the concepts of critical point and KKT point coincide. We assume, in addition, that all critical points of $f|_M$ are nondegenerate with pairwise different $f$-values. Consider the vector field $(+)\mathrm{grad}_{R,M} f$ on $M$. It defines a smooth flow [12] $\Psi: \mathbb{R} \times M \to M$, where $\Psi(t, x)$ is the point reached from $x$ when integrating the vector field during the time $t$. Let $\bar x \in M$ be a critical point. Then the stable manifold $W^s_{\bar x} := \{x \in M : \lim_{t \to +\infty} \Psi(t, x) = \bar x\}$ and the unstable manifold $W^u_{\bar x} := \{x \in M : \lim_{t \to -\infty} \Psi(t, x) = \bar x\}$ are well defined. From the fundamental work of S. Smale [17] we know that for generic Riemannian metrics (resp. for generic $f$) all stable and unstable manifolds corresponding to critical points intersect transversally.

Now we focus on the fundamental question: how to get from one local minimum to (all) other ones. To this aim we introduce two bipartite graphs:

• The 0-1-0 graph. The set of nodes is partitioned into the set of local minima of $f|_M$ and the set of critical points of $f|_M$ with QI = 1. There exists an edge between $\bar x$ (local minimum) and $\bar y$ (critical point with QI = 1) if and only if $W^u_{\bar x} \cap W^s_{\bar y} \ne \emptyset$ (i.e., if there exists a trajectory of $(+)\mathrm{grad}_{R,M} f$ which connects the local minimum $\bar x$ and the critical point $\bar y$).

• The min-max graph. The set of nodes is partitioned into the set of local minima and the set of local maxima of $f|_M$. There exists an edge between $\bar x$ (local minimum) and $\bar y$ (local maximum) if and only if $W^u_{\bar x} \cap W^s_{\bar y} \ne \emptyset$ (i.e., if there exists a trajectory of $(+)\mathrm{grad}_{R,M} f$ connecting the local minimum $\bar x$ and the local maximum $\bar y$).


Theorem 4 ([10,11]) Let M be connected. Then both 
the 0-1-0 graph and the min-max graph are generically 
connected. 


The connectedness of the 0-1-0 graph follows from the fact that the subgraph of local minima and decomposition points is already (generically) connected. This also induces the connectedness of the min-max graph. In fact, let the decomposition point $\bar x$ connect the different local minima $\bar y_1$ and $\bar y_2$. The unstable manifold $W^u_{\bar x}$ (generically) intersects $W^s_{\bar z}$ for some local maximum $\bar z$. But then $W^u_{\bar y_i} \cap W^s_{\bar z} \ne \emptyset$, $i = 1, 2$.
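Once the edges are known, checking the connectedness of such a bipartite graph is a plain graph-search task. The sketch below runs breadth-first search on a hypothetical 0-1-0 graph with local minima m1, m2, m3 and decomposition points s1, s2. In practice the edges would come from (numerically) tracing the trajectories described above, which the code does not attempt.

```python
from collections import deque

def is_connected(nodes, edges):
    """Breadth-first search test of connectedness for an undirected graph."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(adj), None)
    if start is None:
        return True
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

minima, saddles = ["m1", "m2", "m3"], ["s1", "s2"]     # hypothetical critical points
edges = [("m1", "s1"), ("m2", "s1"), ("m2", "s2"), ("m3", "s2")]
print(is_connected(minima + saddles, edges))            # True
print(is_connected(minima + saddles, edges[:2]))        # False: m3 and s2 are cut off
```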

We emphasize that the connectedness of the afore- 
mentioned graphs lies at the heart of the problem of 
global optimization. 


Global Gradient Flows: The General Case 


The appearance of inequality constraints makes things much more difficult and, up to now, the theory of global flows is far from complete. We are now dealing with two types of differential equations on $M$:

$$\dot x = (+)\mathrm{grad}_{R,M} f(x) \qquad \text{(the ascent flow)} \qquad (4)$$

$$\dot x = (-)\mathrm{grad}_{R,M} f(x) \qquad \text{(the descent flow)} \qquad (5)$$

Both equations (4) and (5) may have discontinuities in the right-hand side along the boundary $\partial M$. A solution of (4), (5) is a function $x(\cdot)$ which is absolutely continuous on compact time intervals and which satisfies (4), (5) almost everywhere. Uniqueness for the associated initial value problems can only be guaranteed for positive time intervals (for details see [5,6]). Hence, we integrate (4), (5) only in positive time, and then the functional value increases (decreases). We note that KKT points on $\partial M$ may be reached (via the descent flow) in finite time and that integral curves can be tangent to the boundary $\partial M$. Moreover, an integral curve may move along the boundary $\partial M$, thereby changing the active constraints.
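The behaviour of the descent flow (5) near $\partial M$ can be seen in a toy computation. The sketch assumes the Euclidean metric $R = I$, the box $M = [0, 1]^2$ (constraints of the form $x_i \ge 0$, $1 - x_i \ge 0$) and the hypothetical linear objective $f(x) = x_1 + 2x_2$. For a box with $R = I$, the projection onto the tangent cone acts componentwise: it simply zeroes velocity components that would leave the box. A forward Euler step with clipping is used, a crude scheme chosen for brevity. The trajectory from $(0.5, 0.5)$ hits $x_2 = 0$ at $t = 0.25$, then moves along the boundary and reaches the KKT point $(0, 0)$ in finite time, at $t = 0.5$.

```python
def tangent_cone_projection_box(x, v, lo, hi, tol=1e-12):
    # R = I and a box: drop velocity components that point out of an active bound
    p = list(v)
    for i in range(len(x)):
        if x[i] <= lo[i] + tol and p[i] < 0.0:
            p[i] = 0.0
        if x[i] >= hi[i] - tol and p[i] > 0.0:
            p[i] = 0.0
    return p

def descent_flow(grad, x0, lo, hi, dt=1e-3, t_end=1.0):
    """Forward Euler for x' = (-)grad_{I,M} f(x) on a box; stops at a KKT point."""
    x, t = list(x0), 0.0
    while t < t_end:
        d = tangent_cone_projection_box(x, [-gi for gi in grad(x)], lo, hi)
        if all(abs(di) < 1e-12 for di in d):
            return x, t                           # projected gradient vanishes: KKT point
        x = [min(max(xi + dt * di, l), h) for xi, di, l, h in zip(x, d, lo, hi)]
        t += dt
    return x, None

def grad_f(x):
    # gradient of the hypothetical objective f(x) = x1 + 2*x2
    return [1.0, 2.0]

x_end, t_hit = descent_flow(grad_f, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
print([round(v, 6) for v in x_end], round(t_hit, 3))  # [0.0, 0.0] 0.5
```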

Now, let us focus on the concept of a min-max graph. We assume again that all critical points of $f|_M$ are nondegenerate. Let $\bar x_1, \dots, \bar x_p$ and $\bar y_1, \dots, \bar y_q$ be the local minima and the local maxima of $f|_M$, respectively. Choose small neighborhoods (germs) $U_{\bar x_1}, \dots, U_{\bar y_q}$ of $\bar x_1, \dots, \bar y_q$ in $M$. These neighborhoods will be kept fixed in the sequel. The min-max digraph is defined to be the following directed bipartite graph:

• The min-max digraph. The set of nodes is partitioned into the set of local minima $\{\bar x_1, \dots, \bar x_p\}$ and the set of local maxima $\{\bar y_1, \dots, \bar y_q\}$. There exists an arc from $\bar x_i$ to $\bar y_j$ (from $\bar y_j$ to $\bar x_i$) if the ascent flow (descent flow) connects some point of $U_{\bar x_i}$ ($U_{\bar y_j}$) with a point of $U_{\bar y_j}$ ($U_{\bar x_i}$).

Note: In case of equality constraints only, an arc from $\bar x_i$ to $\bar y_j$ always generates an arc from $\bar y_j$ to $\bar x_i$ (just by reversing the integration time).


Now, let $M$ be connected. In contrast to the theorem on min-max graphs, the min-max digraph need not be strongly connected (i.e., connected as a directed graph) in the presence of inequality constraints. Moreover, the disconnectedness may be stable (with respect to small $C^1$-perturbations of $Df$ or $R$).

The simplest example of this phenomenon can be constructed on the 2-dimensional disc $M$ [18]. The function $f|_M$ should have five critical points: two local minima and two local maxima (all of them on the boundary $\partial M$), and one saddle point (in the interior of $M$). Moreover, the separatrices of the saddle point should intersect $\partial M$ at points outside the chosen neighborhoods of the local minima (maxima).

Although this result seems disappointing at first glance, a different Riemannian metric may be constructed such that the associated min-max digraph becomes strongly connected.

In fact, consider the example above. By adapting the Riemannian metric, one might move the four points of intersection of the saddle separatrices with the boundary $\partial M$ towards the set of local minima/maxima. But then the associated min-max digraph becomes strongly connected.
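Strong connectedness of a digraph can be tested with two graph searches from any node, one along the arcs and one against them. The sketch below applies this test to hypothetical arc sets that mimic the situation just described; they are illustrative and not computed from an actual flow. In the first arc set, each local minimum is linked only to "its own" maximum. In the second, arcs assumed to appear after a change of metric make the digraph strongly connected.

```python
from collections import deque

def _reach(adj, start):
    # nodes reachable from start in the directed graph adj
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def strongly_connected(nodes, arcs):
    """True iff every node is reachable from nodes[0] and can reach nodes[0]."""
    if not nodes:
        return True
    fwd, bwd = {}, {}
    for u, v in arcs:
        fwd.setdefault(u, []).append(v)
        bwd.setdefault(v, []).append(u)
    everyone = set(nodes)
    return _reach(fwd, nodes[0]) >= everyone and _reach(bwd, nodes[0]) >= everyone

nodes = ["x1", "x2", "y1", "y2"]                       # minima x_i, maxima y_j
arcs_before = [("x1", "y1"), ("y1", "x1"), ("x2", "y2"), ("y2", "x2")]
arcs_after = arcs_before + [("x1", "y2"), ("y2", "x1")]
print(strongly_connected(nodes, arcs_before))          # False
print(strongly_connected(nodes, arcs_after))           # True
```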

We end with the following theorem [14].
Theorem 5 For connected $M$ (and given $f$) there exists a smooth Riemannian metric $R$ such that the resulting min-max digraph is strongly connected.


The term “topology optimization” is usually used for a certain type of problem appearing in structural optimization (> structural optimization; > structural optimization: history) in which the choice of design variables allows for a prediction of a general distribution of material in space. Alternatively, this type of problem is called a generalized shape design problem or a layout design problem. The concepts of the area are not, however, restricted to problems in structural mechanics, and much of the basic understanding of the variational nature of these problems in a continuum setting derives from studies of conduction problems (heat or electric current). Also, much current research is related to the application of these techniques in multiphysics settings [4,5].

Inherently, topology design problems are large-scale (infinite) discrete optimization problems. Most work in the field has been concerned with formulations where the prediction of topology can be performed in the framework of differentiable optimization. Initial studies were performed in the early twentieth century for problems where layout is described in terms of densities of fields of stringers at the plastic limit, with variational calculus being the setting of the mathematical analysis [9]. Work using the tools of mathematical programming was initiated in the 1960s, based on similar mechanical models for truss structures, leading to linear programming problems, and with the fundamental solutions being the basis for obtaining considerable insight into the mechanical nature of optimal topologies [8].

In the last two decades (as of 2007), work in the 
area of shape design in a variational setting has led 
to a revival of topology design and it is now one of 
the most active areas of design optimization. Since it is broadly recognized that structural layout has an immense influence on structural performance, the technology is now standard in industrial contexts, typically built on finite-element software. In discrete form the problems treated are similar
in structure to other structural optimization problems, 
i.e., with objective and constraint functions given in 
terms of the design variables and correlated state vari- 
ables, which in turn are given implicitly as solutions to 
variational problems depending on the design variables 
(that is, the problem is a mathematical programming 
problem with equilibrium constraints, and in a contin- 
uum mechanics formulation it is a so-called partial dif- 
ferential equation constrained optimization problem). 
In topology design the number of design variables is large, which usually forces simplifications in the number of constraints and in the complexity of the variational problem defining the state.
In mathematical programming terms, sequential con- 
vex approximations and dual methods play an impor- 
tant role [10], while for problems of special structure 
(see later), interior point methods for semidefinite pro- 
gramming problems and methods of nonsmooth op- 
timization have led to efficient computational proce- 
dures [3]. 

In its general continuum setting the mathematical 
analysis of topology design problems is based on the 
tools of variational analysis, as seen in problems of op- 
timal control of partial differential equations. Methods 
of variational convergence (G-convergence, Γ-convergence, etc.) and relaxation are central to the area, and as relaxed controls can be understood in terms of composite materials, a close interaction between the area and
the field of theoretical material science has been fruitful 
and of mutual benefit [2,7]. 

Seen from a mathematical programming perspec- 
tive, the most thoroughly studied topology design prob- 
lem is the so-called truss topology problem. Here the 
optimization of the geometry and topology of trusses 
can conveniently be formulated in terms of the well- 
known ground structure method. In this approach the 
layout of the truss structure is found by allowing a certain set of connections between a fixed set of nodal
points as potential structural or vanishing members 
(bars in tension or compression only). Allowing for 
continuously varying cross-sectional bar areas, includ- 
ing the possibility of zero bar areas, gives a continu- 
ous optimization problem which permits the predic- 
tion of topology; that is, the prediction of which bars 
should be part of an optimal structure. A similar prob- 
lem structure can be achieved in some finite-element 
versions of continuum problems, such as variable thick- 
ness membrane problems and problems involving the 
prediction of topology as well as material. If we consider 
the simplest possible optimal design problem, namely, 
the minimization of compliance (maximization of stiff- 
ness) for a given total mass of the structure, the topol- 
ogy optimization problem has a linear objective func- 
tion in the vector of nodal displacements u and a bilin- 
ear constraint equation determining the displacements 
as functions of the design variables t_i, resulting in the
following problem statement: 


$$
\min_{t \in \mathbb{R}^m,\ u \in \mathbb{R}^n} f^{\mathsf T} u
\quad \text{subject to} \quad
\Big(\sum_{i=1}^{m} t_i K_i\Big) u = f, \quad
\sum_{i=1}^{m} t_i = V, \quad
t \ge 0, \tag{1}
$$

where K_i is the positive semidefinite element stiffness matrix of the ith element, the vector f denotes the given external loads, and V is the prescribed total volume of material. For this problem it is possible, through duality principles, to derive a number of equivalent problem statements, for example, in semidefinite programming form. With these formulations at hand it is possible to devise algorithms that can handle large-scale problems, as well as algorithms for finding global optima [1].
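To make problem (1) concrete, the Python/NumPy sketch below evaluates the compliance f^T u of a design t by assembling K(t) = Σ_i t_i K_i and solving the equilibrium equation. The two-bar structure, its unit stiffness matrices and the grid search over volume-feasible designs are hypothetical illustrations of our own; they are not the duality-based or global algorithms of [1] and [3].

```python
import numpy as np


def compliance(t, element_stiffness, f):
    """Return f^T u, where u solves (sum_i t_i K_i) u = f."""
    K = sum(ti * Ki for ti, Ki in zip(t, element_stiffness))
    u = np.linalg.solve(K, f)
    return float(f @ u)


# Hypothetical two-bar chain: node 0 is fixed, nodes 1 and 2 are free (one dof each).
K1 = np.array([[1.0, 0.0], [0.0, 0.0]])    # bar 1 joins the support to node 1
K2 = np.array([[1.0, -1.0], [-1.0, 1.0]])  # bar 2 joins node 1 to node 2
f = np.array([0.0, 1.0])                   # unit load at node 2
V = 1.0                                    # prescribed total volume: t1 + t2 = V

print(compliance([0.5, 0.5], [K1, K2], f))  # approximately 4.0

# Scan feasible designs t = (s, V - s). The end points are skipped because a zero
# bar area breaks the only load path and makes K(t) singular.
grid = np.linspace(0.05, 0.95, 19)
values = [compliance([s, V - s], [K1, K2], f) for s in grid]
best = grid[int(np.argmin(values))]
print(best)  # approximately 0.5
```

For this series structure the compliance equals 1/t_1 + 1/t_2, so the even split is optimal; in a genuine ground structure several bar areas typically vanish at the optimum, which is how the topology is predicted.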

A survey of the area can be found in [6]; this refer- 
ence also includes an extensive bibliography. 


See also 

> Semidefinite Programming and Structural Optimization
> Structural Optimization
> Structural Optimization: History
> Topology of Global Optimization


References 


1. Achtziger W, Stolpe M (2007) Truss topology optimization with discrete design variables - Guaranteed global optimality and benchmark examples. Struct Multidiscip Optim 34:1-20

2. Allaire G (2002) Shape Optimization by the Homogenization Method. Springer, New York

3. Ben-Tal A, Kočvara M, Nemirovski A, Zowe J (2000) Free material design via semidefinite programming: the multiload case with contact conditions. SIAM Rev 42:695-715

4. Bendsøe MP (2006) Computational Challenges for Multi-Physics Topology Optimization. In: Mota Soares CA et al (eds) Computational Mechanics - Solids, Structures and Coupled Problems. Springer, Dordrecht, pp 1-20

5. Bendsøe MP, Lund E, Olhoff N, Sigmund O (2005) Topology Optimization - broadening the areas of application. Control Cybern 34:7-35

6. Bendsøe MP, Sigmund O (2003) Topology Optimization - Theory, Methods and Applications. Springer, Heidelberg

7. Cherkaev AV (2000) Variational Methods for Structural Optimization. Springer, New York

8. Dorn W, Gomory R, Greenberg M (1964) Automatic design of optimal structures. J Mécanique 3:25-52

9. Hemp WS (1973) Optimum structures. Clarendon Press, Oxford

10. Svanberg K (2002) A Class of Globally Convergent Optimization Methods Based on Conservative Convex Separable Approximations. SIAM J Optim 12:555-573


Traffic Network Equilibrium 
TNE 


ANNA NAGURNEY 
University Massachusetts, Amherst, USA 


MSC2000: 90B06, 90B20, 91B50 


Article Outline 


Keywords 
Traffic Network Equilibrium 
with Travel Disutility Functions 
Elastic Demand Traffic Network Problems 
with Known Travel Demand Functions 
Fixed Demand Traffic Network Problems 
See also 
References 




Keywords 


User-optimization; System-optimization; Traffic 
assignment; Variational inequality formulations; 
Equilibration; Congested network; Braess paradox 


The traffic network equilibrium problem, sometimes 
also referred to as the traffic assignment problem, ad- 
dresses the problem of users of a congested transporta- 
tion network seeking to determine their minimal cost 
travel paths from their origins to their respective des- 
tinations. It is a classical network equilibrium problem 
and was studied by A.C. Pigou [29], who considered 
a two-node, two-link (or path) transportation network, 
and was further developed by F.H. Knight [21]. The 
congestion on a link is modeled by having the travel 
cost as perceived by the user be a nonlinear function; 
in many applications the cost is convex or monotone. 

The main objective in the study of traffic network 
equilibria is the determination of traffic patterns characterized by the property that, once established, no user
or potential user may decrease his travel cost or disu- 
tility by changing his travel arrangements. The traf- 
fic network equilibrium conditions were stated by J.G. 
Wardrop [33] through two principles: 

First principle: The journey times of all routes actu- 
ally used are equal, and less than those which would be 
experienced by a single vehicle on any unused route. 

Second principle: The average journey time is mini- 
mal. 

In the standard traffic equilibrium problem, the 
travel cost on a link depends solely upon the flow on 
that link whereas the travel demand associated with an 
O/D pair may be either fixed, that is, given, or elastic,
that is, it depends upon the travel cost associated with 
the particular origin/destination (O/D) pair. M.J. Beck- 
mann, C.B. McGuire, and C.B. Winsten [2] in their 
seminal work showed that the equilibrium conditions 
in the case of separable (and increasing) functions co- 
incided with the optimality conditions of an appropri- 
ately constructed convex optimization problem. Such 
a reformulation also holds in the nonseparable case 
provided that the Jacobian of the functions is symmet- 
ric. The reformulation of the equilibrium conditions in 
the symmetric case as a convex optimization problem 
was also done in the case of the spatial price equilib- 
rium problem by P.A. Samuelson [30]. 


S.C. Dafermos and F.T. Sparrow [14] coined the 
terms user-optimized and system-optimized transporta- 
tion networks to distinguish between two distinct sit- 
uations. In the user-optimized problem users act uni- 
laterally, in their own self-interest, in selecting their 
routes, and the equilibrium pattern satisfies Wardrop’s 
first principle, whereas in the system-optimized problem users select routes according to what is optimal from a societal point of view, in that the total costs in the system are minimized. In the latter problem, the marginal total costs rather than the average user costs are equilibrated. Dafermos also introduced equilibration algorithms based on the path formulation of the problem which exploited the network structure of the problem (see also [23]). Another algorithm that is widely used in practice for the symmetric TNE problem is the Frank-Wolfe algorithm (cf. > Frank-Wolfe Algorithm) [19].
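The difference between the user-optimized and the system-optimized problem can be seen on a network of two parallel links joining one origin to one destination. The Python sketch below assumes linear link costs c_i(f) = a_i f + b_i and a fixed demand: the user optimum equalizes the costs of the used links, while the system optimum equalizes the marginal total costs 2 a_i f + b_i. The data of the usage example (c_1 = 1, c_2 = f, unit demand) are the instance commonly attributed to Pigou and are assumed here rather than taken from [29].

```python
def link1_flow(a, b, demand, system_optimal=False):
    """Flow on link 1 for two parallel links with costs c_i(f) = a[i] * f + b[i].

    User optimum: equalize c_1 and c_2 on the used links (Wardrop's first principle).
    System optimum: equalize the marginal total costs 2 * a[i] * f + b[i].
    Assumes a[0] + a[1] > 0.
    """
    k = 2.0 if system_optimal else 1.0
    # Solve k*a0*f1 + b0 = k*a1*(demand - f1) + b1 for f1, then clip to [0, demand];
    # clipping leaves the more expensive link unused.
    f1 = (k * a[1] * demand + b[1] - b[0]) / (k * (a[0] + a[1]))
    return min(max(f1, 0.0), demand)


def total_cost(a, b, demand, f1):
    f2 = demand - f1
    return f1 * (a[0] * f1 + b[0]) + f2 * (a[1] * f2 + b[1])


a, b, demand = (0.0, 1.0), (1.0, 0.0), 1.0   # c_1(f) = 1, c_2(f) = f
for so in (False, True):
    f1 = link1_flow(a, b, demand, system_optimal=so)
    print("system" if so else "user", f1, total_cost(a, b, demand, f1))
# user optimum: f1 = 0.0, total cost 1.0; system optimum: f1 = 0.5, total cost 0.75
```

In the user optimum every traveler takes link 2, whose cost never exceeds 1; routing half of the demand over the constant-cost link lowers the total cost, although those travelers are individually no better off.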

Such a symmetry assumption was limiting, how- 
ever, from both modeling and application standpoints. 
The discovery of Dafermos [6] that the traffic equilib- 
rium conditions as formulated by M.J. Smith [31] de- 
fined a variational inequality problem allowed for such 
modeling extensions as: asymmetric link travel costs, 
link interactions, and multiple modes of transportation 
and classes of users. It also stimulated the development 
of rigorous algorithms for the computation of solutions 
to such problems as well as the qualitative study of equi- 
librium patterns in terms of the existence and unique- 
ness of solutions in addition to sensitivity analysis and 
stability issues. 

Algorithms that have been applied to solve gen- 
eral traffic network equilibrium problems include pro- 
jection and relaxation methods (cf. [1,3,4,6,7,8,9,17,22, 
25,27]) and simplicial decomposition (cf. [16], and the 
references therein). Projection and relaxation methods 
resolve the variational inequality problem into a se- 
ries of convex optimization problems, with projec- 
tion methods yielding quadratic programming prob- 
lems and relaxation methods, typically, nonlinear pro- 
gramming problems. Hence, the overall effectiveness of 
a variational inequality-based method for the computa- 
tion of traffic network equilibria will depend upon the 
algorithm used at each iteration. 

Sensitivity analysis for traffic networks was con- 
ducted by M.A. Hall [20] and R. Steinberg and W. 
Zangwill [32] and in a variational inequality frame- 
work by Dafermos and A. Nagurney [10,11,12]. Some
of the work was, in part, an attempt to explain traffic
network paradoxes such as the Braess paradox [5] (see 
also [15,18,24]) in which the addition of a link results in 
all users of the transportation network being worse off. 
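The Braess paradox can be reproduced with a small computation. The Python/NumPy sketch below is a simplified path-equilibration scheme of our own, in the spirit of the path-based equilibration algorithms mentioned above but not Dafermos's published procedure: for one O/D pair with fixed demand and linear separable link costs, it repeatedly shifts flow from the costliest used path to the cheapest path until their costs are equal. The network data (link costs 10f, f + 50, f + 50, 10f and, for the added link, f + 10, with demand 6) are the textbook instance usually used to illustrate the paradox and are assumed here.

```python
import numpy as np


def equilibrate(delta, slope, free, demand, iters=1000, tol=1e-10):
    """Pairwise path equilibration for one O/D pair with fixed demand and linear,
    separable link costs c_a(f_a) = slope[a] * f_a + free[a].
    delta[a, p] = 1 if link a lies on path p."""
    hess = delta.T @ np.diag(slope) @ delta        # dC_p / dx_q for linear costs
    x = np.zeros(delta.shape[1])
    x[np.argmin(free @ delta)] = demand            # all-or-nothing start at zero flow

    def path_costs(x):
        return (slope * (delta @ x) + free) @ delta  # C_p = sum_a c_a(f_a) delta_ap

    for _ in range(iters):
        cost = path_costs(x)
        used = np.flatnonzero(x > 0)
        j = used[np.argmax(cost[used])]            # costliest used path
        i = int(np.argmin(cost))                   # cheapest path
        gap = cost[j] - cost[i]
        if gap <= tol:
            break
        curvature = hess[i, i] + hess[j, j] - 2 * hess[i, j]
        shift = min(gap / curvature, x[j])         # equalize C_i and C_j, keep x_j >= 0
        x[j] -= shift
        x[i] += shift
    return x, path_costs(x)


# Links a, b, c, d, e; paths p1 = (a, c), p2 = (b, d), p3 = (a, e, d).
slope = np.array([10.0, 1.0, 1.0, 10.0, 1.0])
free = np.array([0.0, 50.0, 50.0, 0.0, 10.0])
delta = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 0],
                  [0, 1, 1],
                  [0, 0, 1]], dtype=float)

x_before, cost_before = equilibrate(delta[:4, :2], slope[:4], free[:4], 6.0)
x_after, cost_after = equilibrate(delta, slope, free, 6.0)
print(x_before, cost_before)  # flows (3, 3); both paths cost 83
print(x_after, cost_after)    # flows close to (2, 2, 2); all three paths cost about 92
```

Adding link e raises the equilibrium cost of every traveler from 83 to 92, which is exactly the paradox described above.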

For background and additional material, including 
theoretical results, algorithms, and computational ex- 
amples, see [16,25,26], and [28]. 

A variety of traffic network equilibrium models are 
now presented along with the variational inequality for- 
mulations of the governing equilibrium conditions. 


Traffic Network Equilibrium 
with Travel Disutility Functions 


The first model described here is due to Dafermos [7]. 
In this model, the travel demands are not known and 
fixed but are variables. The spatial price equilibrium 
problem (cf. [13]) is equivalent to this problem. 

We consider a network [N, L] consisting of nodes [N] and directed links [L]. Let a denote a link of the network connecting a pair of nodes, and let p denote a path (assumed to be acyclic) consisting of a sequence of links connecting an O/D pair w. P_w denotes the set of paths connecting the O/D pair w, with n_{P_w} paths. We let W denote the set of O/D pairs and P the set of paths in the network. We assume that there are J O/D pairs, n_A links, and n_P paths.

Let x_p represent the flow on path p and let f_a denote the load on link a. The following conservation of flow equation must hold for each link a:

$$ f_a = \sum_p x_p \delta_{ap}, $$

where δ_ap = 1 if link a is contained in path p, and 0 otherwise. Hence, the load on a link a is equal to the sum of all the path flows on paths that contain the link a.

Moreover, if we let d_w denote the demand associated with an O/D pair w, then we must have that for each O/D pair w:

$$ d_w = \sum_{p \in P_w} x_p, $$

where x_p ≥ 0 for all p, that is, the sum of all the path flows on paths connecting the O/D pair w must be equal to the demand d_w. We refer to this expression as the demand feasibility condition. Let x denote the column vector of path flows with dimension n_P.


Let c_a denote the user cost associated with traversing link a, and let C_p denote the user cost associated with traversing path p. Then

$$ C_p = \sum_a c_a \delta_{ap}. $$

In other words, the cost of a path is equal to the sum of the costs on the links comprising that path. We group the link costs into the row vector c with n_A components, and the path costs into the row vector C with n_P components. We also assume that we are given a travel disutility function λ_w for each O/D pair w. We group the travel disutilities into the column vector λ with J components.
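The conservation of flow equation, the demand feasibility condition and the path cost formula are linear maps of the path flows, so they can be written with the link-path incidence matrix [δ_ap] and an O/D-path matrix (the matrix B used later in this entry). The small network in this Python/NumPy sketch is hypothetical.

```python
import numpy as np

# Hypothetical network: links a, b, c; O/D pair w1 is served by paths p1 = (a) and
# p2 = (b, c), O/D pair w2 by path p3 = (c).
delta = np.array([[1, 0, 0],    # link a lies on p1
                  [0, 1, 0],    # link b lies on p2
                  [0, 1, 1]],   # link c lies on p2 and p3
                 dtype=float)
B = np.array([[1, 1, 0],        # paths connecting w1
              [0, 0, 1]],       # paths connecting w2
             dtype=float)

x = np.array([4.0, 2.0, 3.0])   # path flows x_p >= 0
c = np.array([1.0, 2.0, 5.0])   # link costs c_a, stored as a row vector

f = delta @ x                   # link loads  f_a = sum_p x_p delta_ap      -> [4, 2, 5]
d = B @ x                       # demands     d_w = sum_{p in P_w} x_p      -> [6, 3]
C = c @ delta                   # path costs  C_p = sum_a c_a delta_ap      -> [1, 7, 5]
print(f, d, C)
```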

We assume that, in general, the cost associated with a link may depend upon the entire link load pattern, that is, c_a = c_a(f), and that the travel disutility associated with an O/D pair may depend upon the entire demand pattern, that is, λ_w = λ_w(d), where f is the n_A-dimensional column vector of link loads and d is the J-dimensional column vector of travel demands.


Definition 1 (traffic network equilibrium; [2,7]) A vector x* ∈ R^{n_P}_+, which induces a vector d* through the demand feasibility condition, is a traffic network equilibrium if for each path p ∈ P_w and every O/D pair w:

$$
C_p(x^*) \begin{cases} = \lambda_w(d^*) & \text{if } x_p^* > 0, \\ \ge \lambda_w(d^*) & \text{if } x_p^* = 0. \end{cases}
$$


In equilibrium, only those paths connecting an O/D 
pair that have minimal user costs are used, and their 
costs are equal to the travel disutility associated with 
traveling between the O/D pair. 
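The conditions of Definition 1 can be checked directly once path costs, path flows and disutilities are known. The following Python function is a hypothetical helper, and the numerical tolerance tol is our own assumption; the first call uses the values reported in Example 3 below, the second a deliberately altered cost for p3.

```python
def is_equilibrium(path_costs, path_flows, paths_of_pair, disutility, tol=1e-9):
    """Check Definition 1: C_p = lambda_w on used paths, C_p >= lambda_w on unused ones."""
    for w, paths in paths_of_pair.items():
        for p in paths:
            gap = path_costs[p] - disutility[w]
            if path_flows[p] > tol and abs(gap) > tol:
                return False          # a used path whose cost differs from lambda_w
            if path_flows[p] <= tol and gap < -tol:
                return False          # an unused path that is cheaper than lambda_w
    return True


pairs = {"w": ["p1", "p2", "p3"]}
flows = {"p1": 10.0, "p2": 5.0, "p3": 0.0}
print(is_equilibrium({"p1": 136.0, "p2": 136.0, "p3": 161.0}, flows, pairs, {"w": 136.0}))  # True
print(is_equilibrium({"p1": 136.0, "p2": 136.0, "p3": 130.0}, flows, pairs, {"w": 136.0}))  # False
```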

The equilibrium conditions have been formulated 
as a variational inequality problem by Dafermos [7]. In 
particular, we have: 


Theorem 2 (cf. [7]) (x*, d*) ∈ K^1 is a traffic network equilibrium pattern, that is, satisfies the equilibrium conditions, if and only if it satisfies the variational inequality problem:

path flow formulation

$$ \langle C(x^*), x - x^* \rangle - \langle \lambda(d^*), d - d^* \rangle \ge 0, \quad \forall (x, d) \in K^1, $$

where K^1 = {(x, d) : x ≥ 0, and the demand feasibility condition holds}, or, equivalently, (f*, d*) ∈ K^2 satisfies the variational inequality problem:

link flow formulation

$$ \langle c(f^*), f - f^* \rangle - \langle \lambda(d^*), d - d^* \rangle \ge 0, \quad \forall (f, d) \in K^2, $$

where K^2 = {(f, d) : x ≥ 0, and the conservation of flow and demand feasibility conditions hold} and ⟨·, ·⟩ denotes the inner product.


Traffic Network Equilibrium, Figure 1
A traffic network equilibrium example


Example 3 For illustrative purposes, we now present an example, which is illustrated in Fig. 1. Assume that there are 4 nodes and 5 links in the network as depicted in the figure and a single O/D pair w = (1, 4). Define the paths connecting the O/D pair w: p1 = (a, d), p2 = (b, e), and p3 = (a, c, e).

Assume that the link travel cost functions are given by:

$$
\begin{aligned}
c_a(f) &= 5f_a + 5f_c + 5, \\
c_b(f) &= 10f_b + f_a + 5, \\
c_c(f) &= 10f_c + 5f_b + 10, \\
c_d(f) &= 7f_d + 2f_e + 1, \\
c_e(f) &= 10f_e + f_c + 21,
\end{aligned}
$$

and the travel disutility function is given by:

$$ \lambda_w(d) = -3d_w + 181. $$


The equilibrium path flow pattern is: x*_{p1} = 10, x*_{p2} = 5, x*_{p3} = 0, with induced link loads: f*_a = 10, f*_b = 5, f*_c = 0, f*_d = 10, f*_e = 5, and the equilibrium travel demand: d*_w = 15.

The incurred travel costs are: C_{p1} = C_{p2} = 136, C_{p3} = 161, and the incurred travel disutility λ_w = 136.
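The equilibrium of Example 3 can be reproduced numerically. Since the demand is simply d_w = x_{p1} + x_{p2} + x_{p3}, the variational inequality of Theorem 2 reduces to one over the path flows x ≥ 0 alone, and the Python/NumPy sketch below applies a basic projection method, x ← max(0, x − α(C(x) − λ_w(d))). The step size α = 0.003 and the iteration count are our own choices, small enough for this strongly monotone example; the sketch is a generic illustration, not one of the specific algorithms of the cited references.

```python
import numpy as np

# Link-path incidence for Example 3: rows are links a..e,
# columns are paths p1 = (a, d), p2 = (b, e), p3 = (a, c, e).
DELTA = np.array([[1, 0, 1],   # a
                  [0, 1, 0],   # b
                  [0, 0, 1],   # c
                  [1, 0, 0],   # d
                  [0, 1, 1]],  # e
                 dtype=float)


def link_costs(f):
    fa, fb, fc, fd, fe = f
    return np.array([5 * fa + 5 * fc + 5,
                     10 * fb + fa + 5,
                     10 * fc + 5 * fb + 10,
                     7 * fd + 2 * fe + 1,
                     10 * fe + fc + 21])


def disutility(d):
    return -3 * d + 181


def path_costs(x):
    return link_costs(DELTA @ x) @ DELTA       # C_p = sum_a c_a(f) delta_ap


def projection_method(alpha=0.003, iters=5000):
    x = np.zeros(3)
    for _ in range(iters):
        # project x - alpha * (C(x) - lambda_w(d)) onto the nonnegative orthant
        x = np.maximum(0.0, x - alpha * (path_costs(x) - disutility(x.sum())))
    return x


x = projection_method()
print(np.round(x, 6), np.round(path_costs(x), 4), round(disutility(x.sum()), 4))
# path flows about (10, 5, 0), path costs about (136, 136, 161), disutility about 136
```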

In the special case (cf. [2]) where the user link cost functions are separable, that is, c_a = c_a(f_a), and the travel disutility functions are also separable, that is, λ_w = λ_w(d_w), the traffic network equilibrium pattern can be obtained as the solution to the optimization problem:

$$ \min_{(x, d) \in K^1} \sum_a \int_0^{f_a} c_a(x)\,dx - \sum_w \int_0^{d_w} \lambda_w(y)\,dy. $$
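For a separable instance this statement can be checked numerically. The Python/NumPy sketch below uses a hypothetical network with one O/D pair and two parallel links (so links and paths coincide), with linear costs c_1(f) = f + 10, c_2(f) = 2f + 5 and λ_w(d) = 40 − d, for which the integrals have closed forms. A grid search over the path flows locates the minimizer of the objective, and the path costs there equal the travel disutility, as Definition 1 requires.

```python
import numpy as np


def objective(x1, x2):
    d = x1 + x2
    link_part = (0.5 * x1**2 + 10 * x1) + (x2**2 + 5 * x2)   # integrals of c_1 and c_2
    demand_part = 40 * d - 0.5 * d**2                         # integral of lambda_w
    return link_part - demand_part


grid = np.arange(0.0, 20.5, 0.5)
best = min(((a, b) for a in grid for b in grid), key=lambda xy: objective(*xy))
x1, x2 = best
print(x1, x2)                                # 11.0 8.0
print(x1 + 10, 2 * x2 + 5, 40 - (x1 + x2))   # 21.0 21.0 21.0
```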


Elastic Demand Traffic Network Problems 
with Known Travel Demand Functions 


We now consider elastic demand traffic network prob- 
lems in which the travel demand functions rather than 
the travel disutility functions are assumed to be given. 
The model is due to Dafermos and Nagurney [12]. We retain the notation of the preceding model except for the following changes. We assume now that the demand d_w, associated with traveling between O/D pair w, is, in general, a function of the travel disutilities associated with traveling between all the O/D pairs, that is, d_w = d_w(λ). We assume now that the vector d is a row vector and the vector λ is a column vector.

Note that the expression relating the link loads to 
the path flows is still valid, as is the nonnegativity as- 
sumption on the path flows. In addition, the link cost 
and path cost functions are as defined previously. 

The traffic network equilibrium conditions are now 
the following (cf. [2] and [12]): 


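Definition 4 (traffic network equilibrium) A path flow pattern x* together with a travel disutility pattern λ* is a traffic network equilibrium pattern if, for every O/D pair w and each path p ∈ P_w, the following equalities and inequalities hold:

$$
C_p(x^*) - \lambda_w^* \begin{cases} = 0 & \text{if } x_p^* > 0, \\ \ge 0 & \text{if } x_p^* = 0, \end{cases}
$$

and

$$
d_w(\lambda^*) - \sum_{p \in P_w} x_p^* \begin{cases} = 0 & \text{if } \lambda_w^* > 0, \\ \le 0 & \text{if } \lambda_w^* = 0. \end{cases}
$$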
The first system of equalities and inequalities above is analogous to the traffic network equilibrium conditions for the preceding model, where now the equilibrium travel disutilities λ* are to be determined, rather than the equilibrium travel demand d*.

The second set of equalities and inequalities, in turn, 
has the following interpretation: if the travel disutil- 
ity (or price) associated with traveling between an O/D 
pair w is positive, then the ‘market’ clears for that O/D 
pair, that is, the sum of the path flows on paths connecting that O/D pair is equal to the demand associated with that O/D pair; if the travel disutility (or price)
is zero, then the sum of the path flows can exceed the 
demand. 

Here we can immediately write down the govern- 
ing variational inequality formulation in path flow and 
travel disutility variables (see, also, e. g., [12,25]). 


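Theorem 5 (variational inequality formulation) (x*, λ*) ∈ R^{n_P+J}_+ is a traffic network equilibrium if and only if it satisfies the variational inequality problem:

$$
\sum_{w} \sum_{p \in P_w} \big[C_p(x^*) - \lambda_w^*\big] \times \big[x_p - x_p^*\big]
- \sum_{w} \Big[d_w(\lambda^*) - \sum_{p \in P_w} x_p^*\Big] \times \big[\lambda_w - \lambda_w^*\big] \ge 0,
\quad \forall (x, \lambda) \in \mathbb{R}^{n_P + J}_+,
$$

or, in vector form:

$$
\big\langle (C(x^*) - B^{\mathsf T}\lambda^*)^{\mathsf T}, x - x^* \big\rangle
- \big\langle (d(\lambda^*) - Bx^*)^{\mathsf T}, \lambda - \lambda^* \big\rangle \ge 0,
\quad \forall (x, \lambda) \in \mathbb{R}^{n_P + J}_+,
$$

where B is the (J × n_P)-dimensional matrix with element (w, p) = 1 if p ∈ P_w, and 0 otherwise.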
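Because the variational inequality of Theorem 5 is posed on the nonnegative orthant R^{n_P+J}_+, a basic projection method only needs componentwise clipping at zero. The Python/NumPy sketch below applies it jointly to the path flows and the travel disutilities for a hypothetical network with one O/D pair and two parallel paths, with costs C_1 = x_1 + 10, C_2 = 2x_2 + 5 and demand function d_w(λ) = 40 − λ_w; the step size and iteration count are our own assumptions.

```python
import numpy as np

B = np.array([[1.0, 1.0]])     # one O/D pair w, served by both paths


def path_costs(x):
    return np.array([x[0] + 10.0, 2.0 * x[1] + 5.0])


def demand(lam):
    return np.array([40.0 - lam[0]])


def projection_method(alpha=0.05, iters=3000):
    x, lam = np.zeros(2), np.zeros(1)
    for _ in range(iters):
        g_x = path_costs(x) - B.T @ lam       # C(x) - B^T lambda
        g_lam = B @ x - demand(lam)           # -(d(lambda) - B x)
        # simultaneous projected step on (x, lambda) >= 0
        x, lam = np.maximum(0.0, x - alpha * g_x), np.maximum(0.0, lam - alpha * g_lam)
    return x, lam


x, lam = projection_method()
print(np.round(x, 6), np.round(lam, 6))   # approximately [11, 8] and [21]
```

At the computed point both paths cost 21, which equals λ*_w, and the market clears because d_w(21) = 19 = x_1 + x_2, in agreement with Definition 4.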


Fixed Demand Traffic Network Problems 


We now present the path flow and link load variational 
inequality formulations of the traffic network equilib- 
rium conditions in the case of fixed travel demands, in- 
troduced in [31] and [6]. 

We retain the notation of the preceding two mod- 
els. However, in contrast, it is assumed now that there is 
a fixed and known travel demand associated with traveling between each O/D pair in the network. Let d_w denote the traffic demand between O/D pair w, which is assumed to be known and fixed. The demand must satisfy, for each w ∈ W,

$$ d_w = \sum_{p \in P_w} x_p, $$

where x_p ≥ 0, ∀p, that is, the sum of the path flows between an O/D pair w must be equal to the demand d_w; such a path flow pattern is termed feasible.

Following [33] and [2], the traffic network equilib- 
rium conditions are given as follows. 


Definition 6 (fixed demand traffic network equilibrium) A path flow pattern x*, which satisfies the demand, is a traffic network equilibrium if, for every O/D pair w and each path p ∈ P_w, the following equalities and inequalities hold:

$$
C_p(x^*) \begin{cases} = \lambda_w & \text{if } x_p^* > 0, \\ \ge \lambda_w & \text{if } x_p^* = 0, \end{cases}
$$

where λ_w is the travel disutility incurred in equilibrium.


Again, as in the elastic demand models, in equilibrium, 
only those paths connecting an O/D pair that have min- 
imal user travel costs are used, and those paths that are 
not used have costs that are higher than or equal to 
these minimal travel costs. However, here the demands 
and travel disutilities are no longer functions. 

The equilibrium conditions have been formulated 
as a variational inequality problem by Smith [31] and 
Dafermos [6]. In particular, we present two formula- 
tions, in path flows and link loads, respectively. 


Theorem 7 (variational inequality formulation in path flows) x* ∈ K^3 is a traffic network equilibrium in path flows if and only if it solves the following variational inequality problem:

$$ \langle C(x^*), x - x^* \rangle \ge 0, \quad \forall x \in K^3, $$

where K^3 = {x ∈ R^{n_P}_+ : the path flow pattern is feasible}.


Theorem 8 (variational inequality formulation in link loads) f* ∈ K^4 is a traffic network equilibrium in link loads if and only if it satisfies the following variational inequality problem:

$$ \langle c(f^*), f - f^* \rangle \ge 0, \quad \forall f \in K^4, $$

where K^4 = {f : ∃x ≥ 0 such that the path flow pattern is feasible and induces the link load pattern f}.
In the case where the Jacobian of the link travel cost functions is symmetric, i.e.,

$$ \frac{\partial c_a(f)}{\partial f_b} = \frac{\partial c_b(f)}{\partial f_a} $$

for all links a, b ∈ L, then by Green's lemma the vector c(f) is the gradient of the line integral ∫_0^f c(x) dx. Moreover, if the Jacobian is positive semidefinite, then the traffic equilibrium pattern f* coincides with the solution of the convex optimization problem:

$$ \min_{f \in K^4} \int_0^f c(x)\,dx. $$

In particular, when the link travel cost functions c_a are separable, that is, c_a = c_a(f_a) for all links a, one obtains the problem

$$ \min_{f \in K^4} \sum_a \int_0^{f_a} c_a(x)\,dx, $$

which is the classical and standard traffic network equilibrium problem with fixed travel demands (cf. [2]).
Introduction


The Traveling Salesman Problem (TSP) is perhaps the 
most studied discrete optimization problem. Its popu- 
larity is due to the facts that TSP is easy to formulate, 
difficult to solve, and has a large number of applica- 
tions. It appears that K. Menger [31] was the first re- 
searcher to consider the Traveling Salesman Problem 
(TSP). He observed that the problem can be solved by 
examining all permutations one by one. Realizing that 
the complete enumeration of all permutations was not 
possible for graphs with a large number of vertices, he 
looked at the most natural nearest neighbor strategy and 
pointed out that this heuristic, in general, does not pro- 
duce the shortest route. (In fact, the nearest neighbor 
heuristic will generate the worst possible route for some 
problem instances of each size [17].) For interesting 
overviews of TSP history, see [20,40]. 
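Menger's two observations are easy to reproduce on a tiny instance. The Python sketch below enumerates all (n − 1)! tours that start at vertex 0 and compares the best of them with the nearest neighbor tour. The 4-vertex weight matrix is a hypothetical example, chosen so that the greedy choices force an expensive closing edge.

```python
from itertools import permutations


def tour_weight(W, tour):
    """Weight of the closed tour that visits the vertices in the given order."""
    n = len(tour)
    return sum(W[tour[i]][tour[(i + 1) % n]] for i in range(n))


def nearest_neighbor(W, start=0):
    tour, unvisited = [start], set(range(len(W))) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: W[last][j])   # closest unvisited vertex
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def brute_force(W):
    """Examine all (n-1)! tours starting at vertex 0 and return a lightest one."""
    rest = permutations(range(1, len(W)))
    return min(([0, *p] for p in rest), key=lambda t: tour_weight(W, t))


W = [[0, 1, 2, 10],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [10, 2, 1, 0]]
nn, opt = nearest_neighbor(W), brute_force(W)
print(nn, tour_weight(W, nn))    # [0, 1, 2, 3] 13
print(opt, tour_weight(W, opt))  # [0, 1, 3, 2] 6
```

Enumeration is only feasible for very small n, which is precisely Menger's point about complete enumeration.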


Basic Definitions and Notation 


In applications, both the symmetric and asymmetric versions of the TSP are important. In the Symmetric TSP (STSP), given a complete (undirected) graph K_n with weights on the edges, our aim is to find a Hamiltonian cycle in K_n of minimum weight (the weight of a cycle is the sum of the weights of its edges). In the Asymmetric TSP (ATSP), given a complete directed graph K_n^* with weights on the arcs, we seek a Hamiltonian cycle in K_n^* of minimum weight. The Euclidean TSP is a special case of STSP in which the vertices are points in the Euclidean plane and the weight on each edge is the Euclidean distance between its endpoints. A Hamiltonian cycle in K_n or K_n^* is often called a tour. Notice that ATSP has (n-1)! tours, but STSP has (n-1)!/2 tours (changing the direction of a tour in STSP does not change the tour). By TSP we refer to both STSP and ATSP simultaneously.

Throughout this entry, the set [n] = {1, 2, ..., n} denotes the vertices of K_n or K_n^* or any other n-vertex graph under consideration. The weight of an edge (arc) ij is denoted by w_ij or w(i, j). We also call w_ij the distance from i to j and the length of ij. A cycle factor is a collection of vertex-disjoint cycles in K_n^* covering all vertices of K_n^*.


Computational Complexity 


The Hamiltonian cycle problem on an n-vertex graph G can be transformed into STSP by converting G to an edge-weighted K_n as follows: assign weight 0 to each edge of G, and assign weight 1 to each edge in the complement of G. Then G is Hamiltonian if and only if the optimal tour has weight 0. A similar transformation can be used for digraphs and ATSP. This implies that TSP is NP-hard; using the weights 1 and 2 instead of 0 and 1, which satisfy the triangle inequality, shows that TSP remains NP-hard even if the triangle inequality holds. By replacing the weights 0 by 1 and the weights 1 by 1 + nr in this transformation, we obtain the following result:


Proposition 1 For an arbitrary constant r, unless P = 
NP, there is no polynomial time algorithm that always 
produces a tour of total weight at most r times the opti- 
mal. 
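The transformation above can be tried out on small graphs. In the Python sketch below (the two 4-vertex graphs are hypothetical examples, with vertices numbered from 0 for convenience), the optimal tour has weight 0 exactly when the graph is Hamiltonian. With the weights 1 and 1 + nr, a Hamiltonian graph yields an optimal tour of weight n, whereas every tour of a non-Hamiltonian graph uses at least one edge of weight 1 + nr and so weighs more than rn; a polynomial algorithm with ratio r would therefore decide the Hamiltonian cycle problem.

```python
from itertools import permutations


def reduction_weights(n, edges, inside=0, outside=1):
    """Weighted K_n built from a graph G on {0, ..., n-1}: weight `inside` on the
    edges of G and weight `outside` on the non-edges."""
    E = {frozenset(e) for e in edges}
    return [[0 if i == j else (inside if frozenset((i, j)) in E else outside)
             for j in range(n)] for i in range(n)]


def optimal_tour_weight(W):
    """Brute force over all tours starting at vertex 0 (only for tiny n)."""
    n = len(W)
    return min(sum(W[t[i]][t[(i + 1) % n]] for i in range(n))
               for t in ((0, *p) for p in permutations(range(1, n))))


n, r = 4, 2
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # Hamiltonian
star = [(0, 1), (0, 2), (0, 3)]            # not Hamiltonian
for name, G in (("cycle", cycle), ("star", star)):
    print(name,
          optimal_tour_weight(reduction_weights(n, G)),
          optimal_tour_weight(reduction_weights(n, G, 1, 1 + n * r)))
# cycle: 0 and 4 (= n); star: 2 and 20 (> r * n = 8)
```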


It was proved in [12,38] that even Euclidean TSP is 
NP-hard. Despite this result, there was a feeling among 
some researchers that the Euclidean TSP is somewhat 
simpler than the general STSP, in the sense that Proposition 1 does not hold for the Euclidean TSP. This was confirmed by Arora [1] in 1996, see Theorem 2.
Mitchell [33] independently made a similar discovery 
a few months later (see [2]). 


Theorem 2 For every ε > 0, there is a polynomial time algorithm A_ε that, for any instance of the Euclidean TSP, finds a tour at most 1 + ε times longer than the optimal one.


As of this writing, the fastest algorithm A_ε has time complexity O(n log n + n/poly(ε)) [42]. These A_ε algorithms have been implemented, but, in their current form, they are not competitive with the best TSP heuristics [2].

Arora’s result can be generalized to d-dimensional 
Euclidean space for any constant d. However, the next 
theorem limits the scope of this generalization. 


Theorem 3 [45] There exists a constant r>1 such 
that, for the Euclidean TSP in O(log n)-dimensional Eu- 
clidean space, the problem of finding a tour that is at 
most r times longer than the optimal tour is NP-hard. 


We finish this subsection with a result from [16] that in- 
dicates another limitation for ‘approximation’ ATSP al- 
gorithms. The domination number of an ATSP heuris- 
tic H is the maximum d(n) such that for each instance 
of ATSP on n vertices, H produces a tour T which is not worse than at least d(n) tours including T itself.


Theorem 4 Unless P = NP, there is no polynomial time ATSP heuristic of domination number at least (n-1)! - ⌈n - n^α⌉! for any constant α < 1.


Formulations 


Perhaps the simplest combinatorial formulation of ATSP is as follows: given an n × n matrix W = [w_ij], find a permutation π of [n] that minimizes the sum

$$ w_{\pi(n),\pi(1)} + \sum_{i=1}^{n-1} w_{\pi(i),\pi(i+1)}. $$

For STSP, we require that W is symmetric.

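The earliest (and very useful) integer programming formulation of ATSP is due to Dantzig, Fulkerson and Johnson [10]. Define n^2 - n zero-one variables x_ij by x_ij = 1 if the tour traverses arc ij, and x_ij = 0 otherwise. Then ATSP can be expressed as:

$$
\begin{aligned}
\min\ z = {} & \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_{ij} \\
\text{such that}\ & \sum_{i=1}^{n} x_{ij} = 1, \quad j \in [n], \\
& \sum_{j=1}^{n} x_{ij} = 1, \quad i \in [n], \\
& \sum_{i \in S} \sum_{j \in S} x_{ij} \le |S| - 1 \quad \text{for all } S \subset [n],\ |S| < n, \\
& x_{ij} = 0 \text{ or } 1, \quad i \ne j \in [n].
\end{aligned}
$$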
The first set of constraints ensures that a tour must come into vertex j exactly once, and the second set of constraints indicates that a tour must leave every vertex i exactly once. These two sets of constraints ensure that there are two arcs adjacent to each vertex, one in and one out. However, this does not prevent non-Hamiltonian cycles. Instead of having one tour, the solution can consist of two or more vertex-disjoint cycles (called sub-tours), i.e., be a cycle factor with t ≥ 2 cycles. The third set of constraints, called sub-tour elimination constraints, requires that no proper subset S of vertices contains |S| arcs with both end-vertices in S.
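For an integral solution of the first two sets of constraints, each vertex has exactly one successor, so the solution is a cycle factor, and a violated sub-tour elimination constraint can be read off from its cycles. The Python sketch below uses hypothetical helpers (vertices 1, ..., n as in [n]); it returns the vertex set S of one sub-tour, which contains |S| > |S| − 1 arcs, or None when the solution is a tour.

```python
def cycles_of(successor):
    """Split an assignment (vertex -> successor, a permutation of [n]) into its cycles."""
    unseen, cycles = set(successor), []
    while unseen:
        v = min(unseen)
        cycle = []
        while v in unseen:
            unseen.remove(v)
            cycle.append(v)
            v = successor[v]
        cycles.append(cycle)
    return cycles


def violated_subtour_set(successor):
    """Vertex set S of a sub-tour (sum of x_ij over i, j in S equals |S|), or None."""
    cycles = cycles_of(successor)
    return None if len(cycles) == 1 else set(cycles[0])


two_cycles = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}
tour = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}
S = violated_subtour_set(two_cycles)
arcs_inside = sum(1 for i in S if two_cycles[i] in S)
print(S, arcs_inside, len(S) - 1)    # {1, 2, 3} 3 2
print(violated_subtour_set(tour))    # None
```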

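For STSP, we can get the following similar formulation (here x_ij and x_ji denote the same variable):

$$
\begin{aligned}
\min\ z = {} & \sum_{1 \le i < j \le n} w_{ij} x_{ij} && (1) \\
\text{such that}\ & \sum_{i \ne j} x_{ij} = 2, \quad j \in [n] && (2) \\
& \sum_{i \in S} \sum_{j \notin S} x_{ij} \ge 2 \quad \text{for all } S \subset [n],\ 3 \le |S| \le n/2 && (3) \\
& 0 \le x_{ij} \le 1, \quad i \ne j \in [n] && (4) \\
& x_{ij} \text{ is integral for all } i \ne j \in [n]. && (5)
\end{aligned}
$$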


While the Dantzig-Fulkerson-Johnson formulation of ATSP has an exponential number of sub-tour elimination constraints and, thus, of all constraints, there are other integer programming ATSP formulations that contain only a polynomial number of constraints. One such example is the formulation of Miller, Tucker and Zemlin [32]. In this formulation, we use (n-1)(n-2) additional constraints and n-1 additional variables. The following constraints replace the sub-tour elimination constraints in the Dantzig-Fulkerson-Johnson formulation:

$$ (n-1)x_{ij} + u_i - u_j \le n - 2 \quad \text{for all } i \ne j \in \{2, 3, \dots, n\}, $$

where u_i, i = 2, 3, ..., n, are unrestricted real variables. If a solution is not a tour, it contains a cycle C without vertex 1. Adding the inequalities above for all k arcs ij of C, the terms u_i - u_j cancel and we obtain k(n-1) ≤ k(n-2), a contradiction.
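Both halves of this argument can be checked mechanically. In the Python sketch below (hypothetical helpers, vertices 1, ..., n), setting u_j to the position of j on a tour that starts at vertex 1 satisfies all Miller-Tucker-Zemlin constraints, while for a cycle avoiding vertex 1 the summed left-hand sides equal k(n − 1) for any choice of u (the values in u_any are arbitrary), which exceeds the summed right-hand sides k(n − 2).

```python
def mtz_satisfied(n, arcs, u):
    """Check (n-1) x_ij + u_i - u_j <= n - 2 for all i != j in {2, ..., n}."""
    arcset = set(arcs)
    return all((n - 1) * ((i, j) in arcset) + u[i] - u[j] <= n - 2
               for i in range(2, n + 1) for j in range(2, n + 1) if i != j)


def summed_lhs(n, cycle, u):
    """Sum of the MTZ left-hand sides over the arcs of a cycle that avoids vertex 1."""
    arcs = zip(cycle, cycle[1:] + cycle[:1])
    return sum((n - 1) + u[i] - u[j] for i, j in arcs)


n = 5
tour = [1, 3, 2, 5, 4]
u = {v: pos for pos, v in enumerate(tour)}              # u_j = position of j on the tour
tour_arcs = list(zip(tour, tour[1:] + tour[:1]))
print(mtz_satisfied(n, tour_arcs, u))                   # True

sub_tour = [3, 4, 5]                                    # cycle without vertex 1
u_any = {2: 7.5, 3: -1.0, 4: 3.25, 5: 0.0}              # arbitrary values
k = len(sub_tour)
print(summed_lhs(n, sub_tour, u_any), k * (n - 2))      # 12.0 9
```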
Notice that the Dantzig-Fulkerson-Johnson for- 
mulation of ATSP is stronger than the Miller-Tucker- 
Zemlin formulation in the following sense: the opti- 
mal value of the linear relaxation of the former is larger 


than that of the latter [37]. As for the STSP, there are two other formulations that are as strong as the Dantzig-Fulkerson-Johnson formulation for STSP, but only the latter has been used in computational practice. For more information on various formulations of
TSP, see [40]. 


Applications 


It appears that the most natural and well-studied appli- 
cation area of the TSP is machine scheduling. A simple 
scheduling application can be described as follows. Suppose there are n jobs 1, 2, ..., n to be processed sequentially on a machine. Let w_ij be the setup cost required for processing job j immediately after job i. When all
the jobs are processed, the machine is reset to its initial 
state at a cost of wj, where j is the last job processed. 
The aim of the Sequencing Problem is to find an order 
in which the jobs are to be processed so as to minimize 
the total setup cost. Observe that finding a permutation 
x of [n] that minimizes wa(n)na) + SoD) Waati+1) 
solves the problem. Thus, the Sequencing Problem is 
equivalent to ATSP. 
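A minimal Python sketch of the cyclic cost above, with jobs renumbered $0, \ldots, n-1$ and a brute-force search over processing orders (practical only for tiny $n$). The setup-cost matrix is made up for illustration.

```python
from itertools import permutations


def sequence_cost(w, order):
    """Total setup cost w[pi(n)][pi(1)] + sum_i w[pi(i)][pi(i+1)] of a job order."""
    n = len(order)
    return sum(w[order[i]][order[(i + 1) % n]] for i in range(n))


def best_sequence(w):
    """Brute force; job 0 is fixed first because the cyclic cost is rotation invariant."""
    n = len(w)
    rest = min(permutations(range(1, n)), key=lambda p: sequence_cost(w, (0,) + p))
    order = (0,) + rest
    return order, sequence_cost(w, order)


# Hypothetical setup costs: w[i][j] = cost of running job j right after job i.
w = [[0, 1, 9],
     [9, 0, 1],
     [1, 9, 0]]
print(best_sequence(w))  # ((0, 1, 2), 3)
```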

Now consider a more interesting application introduced and studied by Gutin et al. [15]. The Seismic Vessel Problem (SVP) is defined by a set of line segments (survey lines) on the plane, all of which need to be traversed exactly once. Some lines can be traversed in either direction; others have directional constraints imposed on them. The objective is to minimize the travel time between lines by choosing an optimal ordering of lines (and specifying in which direction each line has to be traversed). The function that defines the travel time between lines can be of arbitrary complexity and in general is defined as a matrix of 'line change' weights for all combinations of pairs of lines and traversing directions.

More formally, SVP can be stated as follows. We are given a weighted complete digraph $K^*$, whose vertices are partitioned into pairs $P$ (representing survey lines). Each pair $\{u,v\} \in P$ is assigned a set $F_{uv}$ such that $\emptyset \ne F_{uv} \subseteq \{uv, vu\}$. If $F_{uv} = \{uv\}$, then we must traverse the survey line corresponding to $uv$ from $u$ to $v$, and if $F_{uv} = \{uv, vu\}$, either traversing direction is possible. Let $F = \{uv : \{u,v\} \in P,\ uv \in F_{uv}\}$. Every arc in $F$ is assigned weight zero (as we must traverse all survey lines and we assume that the time of traversal of a survey line in both directions is the same). We are required to find a minimum weight Hamilton cycle in $K^*$ that traverses one arc from $F_{uv}$ for every pair $\{u,v\} \in P$.

The Stacker Crane Problem (SCP) studied in [18,21] is a special case of SVP. In SCP, $F_{uv}$ consists of one arc for every pair $\{u,v\} \in P$. To see that SCP is equivalent to ATSP it suffices to contract all arcs of $F$.

Consider SVP. In order to enforce the requirement that a Hamilton cycle has to traverse one arc in $F_{uv}$ for each pair $\{u,v\} \in P$, we apply a transformation which results in a weighted complete undirected graph. Solving STSP on the transformed graph provides a solution to the original problem. The transformation replaces each pair $\{u,v\} \in P$ with a graph. We consider only the more interesting case when the line $\{u,v\}$ is undirected (that is, $|F_{uv}| = 2$). In this case, the two vertices are replaced with the so-called diamond graph $D_8$ with $V(D_8) = \{N, W, E, S, a, b, c, d\}$ and $E(D_8) = \{Na, aW, Nb, bE, Wc, cS, Ed, dS, bc\}$. The diamond graph can be traversed in two possible ways, between $N$ and $S$ and between $W$ and $E$ (see Chapter 19 in [39]). These correspond to traversing the original pair of vertices $\{u,v\}$ via arcs $uv$ and $vu$, respectively. To make the weight of the tour consistent with the original graph:

• We set the weights of edges incident to $W$ to be the same as the weights of the corresponding original arcs entering vertex $v$;

• The weights of edges incident to $E$ are taken to be the same as the weights of arcs leaving $u$;

• The weights of edges incident to $N$ are taken to be the same as the weights of arcs entering $u$;

• The weights of edges incident to $S$ are taken to be the same as the weights of arcs leaving $v$;

• Since arcs $uv$ and $vu$ have zero weight, all edges inside the diamond graph have their weight set to 0.

• The vertices $a, b, c, d$ are not adjacent to any vertices outside their copy of $D_8$.
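The claim that the diamond graph can be traversed in only two ways can be checked by brute force. The Python sketch below enumerates the Hamiltonian paths of the diamond graph and keeps those whose two ends are terminals ($N$, $W$, $E$, $S$), since the internal vertices $a, b, c, d$ have no neighbours outside the diamond. It assumes the tour passes through the diamond in a single segment; the names are illustrative only.

```python
from itertools import permutations

# Diamond graph: terminals N, W, E, S and internal vertices a, b, c, d.
EDGES = {frozenset(e) for e in ["Na", "aW", "Nb", "bE", "Wc", "cS", "Ed", "dS", "bc"]}
VERTICES = "NWESabcd"
TERMINALS = set("NWES")


def hamiltonian_paths():
    """Yield every vertex ordering that is a Hamiltonian path of the diamond graph."""
    for order in permutations(VERTICES):
        if all(frozenset(order[k:k + 2]) in EDGES for k in range(len(order) - 1)):
            yield order


# A tour must enter and leave the diamond through terminals.
endpoint_pairs = {frozenset((p[0], p[-1])) for p in hamiltonian_paths()
                  if p[0] in TERMINALS and p[-1] in TERMINALS}
print(sorted("".join(sorted(pair)) for pair in endpoint_pairs))  # ['EW', 'NS']
```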

For more TSP applications, see, e. g., [40]. 


Methods 


The methods to solve TSP can be divided into two large classes: exact algorithms that solve the problem or its special cases to optimality, and algorithms that normally provide non-optimal tours. The members of the second class are called TSP heuristics or TSP approximation algorithms (the latter term is often used if there is some kind of approximation guarantee). Exact algorithms are used when we want to obtain an optimal tour. This may not be possible, as exact algorithms may well require several hours or days of running time even for instances of moderate size (for example, the authors of [11] found that no state-of-the-art exact algorithm could solve some ATSP instances with 316 vertices within the limit of 10* sec.). When running time is limited or the data of the instance are not exact, one can use TSP heuristics. For a discussion of TSP software implementing both exact algorithms and heuristics, see [30] and the site http://www.or.deis.unibo.it/research.html.


Exact Algorithms 


The brute-force method of explicitly examining all possible TSP tours is impractical even for moderately sized problem instances because there are $(n-1)!/2$ different tours in $K_n$ and $(n-1)!$ different tours in $K_n^*$. The well-known dynamic programming algorithm of Held and Karp [19] reduces the running time to $O(n^2 2^n)$, but this time complexity is still far too large to solve even TSP instances of moderate size. On the other hand, branch-and-bound, branch-and-cut and other branching algorithms have proved to be quite efficient in practice; branch-and-bound algorithms will be discussed in this subsection.
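For concreteness, here is a sketch of the Held-Karp recursion in Python, assuming 0-indexed vertices with vertex 0 as the fixed start and subsets stored as bitmasks (an implementation choice of this sketch). The brute-force function is included only to check the result, and the weight matrix is invented.

```python
from itertools import combinations, permutations


def held_karp(w):
    """Optimal tour weight for a (possibly asymmetric) weight matrix w, vertices 0..n-1.

    dp[(S, j)]: minimum weight of a path that starts at vertex 0, visits exactly the
    vertices in bitmask S (vertex 0 excluded) and ends at j in S.
    """
    n = len(w)
    if n == 1:
        return 0
    dp = {(1 << j, j): w[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = sum(1 << v for v in subset)
            for j in subset:
                prev = S & ~(1 << j)
                dp[(S, j)] = min(dp[(prev, k)] + w[k][j] for k in subset if k != j)
    full = (1 << n) - 2  # every vertex except 0
    return min(dp[(full, j)] + w[j][0] for j in range(1, n))


def brute_force(w):
    """Enumerate all (n-1)! tours starting at vertex 0 (for checking only)."""
    n = len(w)
    return min(
        sum(w[t[k]][t[(k + 1) % n]] for k in range(n))
        for t in ((0,) + p for p in permutations(range(1, n)))
    )


W = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]
print(held_karp(W), brute_force(W))  # 21 21
```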

While every STSP instance can be considered as an ATSP instance and, thus, solved using ATSP algorithms, STSP-specialized algorithms are normally used for STSP instances, as such algorithms are often more efficient than their ATSP counterparts (partially because they exploit the more special structure of STSP and partially because STSP algorithms have received significantly more attention than ATSP algorithms). Moreover, in many cases ATSP instances are transformed into STSP instances and subsequently solved using STSP algorithms. (This situation may change in the future once advanced ATSP solvers have been developed.)

In this subsection, we will consider two ATSP-to- 
STSP transformations and basic ideas behind STSP 
branch-and-bound algorithms. We will not consider 
special polynomial-time solvable cases of TSP; instead 
we refer the reader to [5,26] which are excellent surveys 
on the topic. 
The following are well-known ATSP-to-STSP transformations:


The 2-node transformation: Replace every vertex $i$ of $K_n^*$ by a pair $i^-, i^+$ of vertices to form $K_{2n}$. The weights of edges of $K_{2n}$ are defined as follows: all weights are equal to $+\infty$ apart from $w(i^-, i^+) = 0$ and $w(i^+, j^-) = w(i,j) + M$ for all $i \ne j \in [n]$, where $w(i,j)$ is the weight of arc $ij$ in $K_n^*$ and $M$ is a sufficiently large constant. The transformation value $nM$ has to be subtracted from the STSP optimal weight to obtain the ATSP optimal weight. The transformation was introduced by Jonker and Volgenant [24].

The 3-node transformation: Replace every vertex $i$ of $K_n^*$ by a triple $i^-, i^0, i^+$ of vertices to form $K_{3n}$. The weights of edges of $K_{3n}$ are defined as follows: all weights are equal to $+\infty$ apart from $w(i^-, i^0) = w(i^0, i^+) = 0$ and $w(i^+, j^-) = w(i,j)$ for all $i \ne j \in [n]$, where $w(i,j)$ is the weight of arc $ij$ in $K_n^*$. The transformation was introduced by Karp [28].
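The two transformations can be checked on a toy instance. The Python sketch below builds both symmetric weight matrices, using `float("inf")` for $+\infty$ and the vertex numbering $i^- = 2i$, $i^+ = 2i+1$ (2-node) or $i^- = 3i$, $i^0 = 3i+1$, $i^+ = 3i+2$ (3-node), which are our own conventions, and compares brute-force optima. The value $M = 100$ is an assumption that is large enough for this particular instance only.

```python
from itertools import permutations

INF = float("inf")


def opt_tour_weight(c):
    """Brute-force optimal tour weight (vertex 0 fixed first); tiny instances only."""
    n = len(c)
    return min(
        sum(c[t[k]][t[(k + 1) % n]] for k in range(n))
        for t in ((0,) + p for p in permutations(range(1, n)))
    )


def inf_matrix(size):
    return [[INF] * size for _ in range(size)]


def two_node(w, M):
    """2-node transformation: i^- = 2i, i^+ = 2i + 1."""
    n = len(w)
    c = inf_matrix(2 * n)
    for i in range(n):
        c[2 * i][2 * i + 1] = c[2 * i + 1][2 * i] = 0
        for j in range(n):
            if i != j:
                c[2 * i + 1][2 * j] = c[2 * j][2 * i + 1] = w[i][j] + M
    return c


def three_node(w):
    """3-node transformation: i^- = 3i, i^0 = 3i + 1, i^+ = 3i + 2."""
    n = len(w)
    c = inf_matrix(3 * n)
    for i in range(n):
        c[3 * i][3 * i + 1] = c[3 * i + 1][3 * i] = 0
        c[3 * i + 1][3 * i + 2] = c[3 * i + 2][3 * i + 1] = 0
        for j in range(n):
            if i != j:
                c[3 * i + 2][3 * j] = c[3 * j][3 * i + 2] = w[i][j]
    return c


W = [[0, 5, 1],
     [2, 0, 8],
     [7, 3, 0]]
n, M = len(W), 100
print(opt_tour_weight(W))                       # 6 (ATSP optimum)
print(opt_tour_weight(two_node(W, M)) - n * M)  # 6
print(opt_tour_weight(three_node(W)))           # 6
```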


Each transformation has its pros and cons, see [11,21]. 

Now we consider basic ideas behind TSP branch- 
and-bound and branch-and-cut algorithms using the 
Dantzig-Fulkerson-Johnson formulation of STSP. The 
formulation allows us to treat STSP as an integer pro- 
gramming problem. If we drop (5), we will get a lin- 
ear programming problem whose solution will give us 
a lower bound to STSP. The linear program is called 
the linear relaxation of STSP. A branch-and-bound al- 
gorithm for STSP could be as follows. 


Step 1 A list $L$ of problems to solve is initialized by including in it the linear program discussed above. This problem is called the root problem.

Step 2 If $L = \emptyset$, then the best known feasible solution (tour) is optimal. Otherwise, choose a problem $P$ and delete it from the list.

Step 3 (a) Solve the linear relaxation of $P$. If the solution is integral, return to Step 2 after updating, if necessary, the best known integral solution and the best known solution value.

(b) If the value of the objective function exceeds that of the best known feasible solution, return to Step 2.

(c) Otherwise, using some linear inequality, partition the current problem into two new problems which are added to $L$. The union of the feasible (integral) solutions to each of these two problems contains all the feasible solutions of the problem that has been partitioned. This is commonly done by choosing a variable $x_{ij}$ with a current fractional value and imposing $x_{ij} = 1$ in one problem and $x_{ij} = 0$ in the other. Return to Step 2.


Due to computer memory limitations, the branch-and-bound algorithm is appropriate for an STSP formulation with a polynomial number of constraints, but this is not the case for the Dantzig-Fulkerson-Johnson formulation of STSP. Thus, we need to use a method that allows us to store only a small number of constraints at any given moment. One such method is row generation. Using row generation, we replace Step 1 of the above algorithm by the following: we initially solve the problem consisting of (1), (2) and (4), obtaining a solution $x$. Now we try to find a set $S \subset [n]$ such that $\sum_{i \in S} \sum_{j \notin S} x_{ij} < 2$, i.e., a violated constraint (3). To do that we can use an efficient algorithm for computing a minimum cut in a weighted undirected graph applied to $K_n$ with weight function $x : E(K_n) \to \mathbb{R}$ (see, e.g., [7,25]). If a desired set $S$ is found, we add the corresponding constraint to the current linear program, solve it to find a new vector $x$, and continue as above. If no desired set $S$ has been found, the linear relaxation is solved.
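The text leaves the choice of minimum cut algorithm open. As one possible choice (not necessarily the algorithm of [7,25]), the Python sketch below implements the Stoer-Wagner global minimum cut on a symmetric matrix of current LP values $x_{ij}$; if the returned weight is below 2, the returned set $S$ (or its complement) yields a violated sub-tour elimination constraint. The LP solution used in the example is hypothetical.

```python
def stoer_wagner(x):
    """Global minimum cut of an undirected graph with symmetric weight matrix x.

    Returns (cut_weight, S) with S one shore of a minimum cut (simple O(n^3) version).
    """
    n = len(x)
    w = [list(row) for row in x]            # working copy; merged vertices get summed rows
    groups = [{v} for v in range(n)]        # original vertices represented by each vertex
    active = list(range(n))
    best_weight, best_set = float("inf"), set()
    while len(active) > 1:
        # Maximum adjacency ordering: repeatedly add the most tightly connected vertex.
        start = active[0]
        order = [start]
        conn = {v: w[start][v] for v in active if v != start}
        cut_of_phase = 0
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)    # weight between nxt and all earlier vertices
            order.append(nxt)
            for v in conn:
                conn[v] += w[nxt][v]
        s, t = order[-2], order[-1]
        if cut_of_phase < best_weight:
            best_weight, best_set = cut_of_phase, set(groups[t])
        # Merge the last vertex t into s.
        groups[s] |= groups[t]
        for v in active:
            w[s][v] += w[t][v]
            w[v][s] = w[s][v]
        w[s][s] = 0
        active.remove(t)
    return best_weight, best_set


# LP solution made of two disjoint "triangles" {0,1,2} and {3,4,5} with x_ij = 1 on each edge.
X = [[0] * 6 for _ in range(6)]
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    X[a][b] = X[b][a] = 1
weight, S = stoer_wagner(X)
if weight < 2:
    print("violated sub-tour elimination constraint for S =", sorted(S))
```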

Similarly, one can solve other problems from the 
list L. In practice, we need to apply a minimum cut algo- 
rithm very few times, see, e. g., [36]. The solution found 
in Step 1 usually provides a good lower bound called the 
Held-Karp bound in the literature. 

STSP computational practice indicates that while 
branch-and-bound algorithms are fairly efficient in 
solving STSP, branch-and-cut algorithms are normally 
much more efficient. For an excellent overview of STSP 
branch-and-cut algorithms, see [36]. 


TSP Heuristics TSP heuristics can be roughly par- 
titioned into two classes: construction heuristics, and 
improvement heuristics. Both classes and their per- 
formances in computational experiments are discussed 
below. More comprehensive overviews of TSP heuris- 
tics can be found in [14,21] and [22]. Notice that [21] 
and [22] discuss families of instances on which many 
TSP heuristics have been tested. Many new heuristics 
are now tested using these families. This allows one to 
compare new heuristics with many known ones without too much effort.

We conclude the Heuristics subsection with a brief discussion of the approximation analysis of TSP heuristics.


Construction Heuristics Construction heuristics build a tour from scratch and stop when one is produced. The simplest and most obvious construction heuristic is nearest neighbor (NN): the tour starts at any vertex $x$ of the complete directed or undirected graph; we repeat the following loop until all vertices have been included in the tour: add to the tour a vertex (among vertices not yet in the tour) closest to the vertex last added to the tour. The greedy algorithm is based on the observation that a vertex-disjoint collection of paths in $K_n^*$ ($K_n$) can be extended to a tour in $K_n^*$ ($K_n$). In the ATSP greedy algorithm, we order all arcs $a_1, a_2, \ldots, a_{n(n-1)}$ such that $w(a_i) \le w(a_{i+1})$ for each $i = 1, 2, \ldots, n(n-1)-1$, set $C := \emptyset$ and, in the $i$th iteration, check whether the arcs of $C$ and $a_i$ form a vertex-disjoint collection of paths or a tour; if so, we add $a_i$ to $C$.
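A minimal Python sketch of NN and the ATSP greedy algorithm, assuming a dense weight matrix `w` with vertices $0, \ldots, n-1$ (at least two) and ties broken by index, which is an arbitrary choice. The greedy acceptance test follows the description above: an arc is kept only if its tail has no outgoing arc, its head has no incoming arc, and it does not close a cycle before all vertices lie on one path.

```python
def tour_weight(w, tour):
    return sum(w[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def nearest_neighbor(w, start=0):
    """NN: repeatedly append the closest vertex not yet in the tour."""
    n = len(w)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda v: (w[last][v], v))  # ties broken by index
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def greedy_atsp(w):
    """Scan arcs by nondecreasing weight; keep an arc if C stays a vertex-disjoint
    collection of paths, or becomes a tour (possible only with the last arc)."""
    n = len(w)
    arcs = sorted((w[i][j], i, j) for i in range(n) for j in range(n) if i != j)
    succ, pred = {}, {}
    for _, i, j in arcs:
        if i in succ or j in pred:
            continue                  # i already has an out-arc or j an in-arc
        end = j
        while end in succ:            # walk to the end of the path starting at j
            end = succ[end]
        if end == i and len(succ) < n - 1:
            continue                  # would close a cycle that is not yet a tour
        succ[i], pred[j] = j, i
        if len(succ) == n:
            break
    tour, v = [0], succ[0]
    while v != 0:
        tour.append(v)
        v = succ[v]
    return tour


W = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]
print(nearest_neighbor(W), tour_weight(W, nearest_neighbor(W)))  # [0, 1, 3, 2] 33
print(greedy_atsp(W), tour_weight(W, greedy_atsp(W)))            # [0, 2, 3, 1] 21
```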

Computational experiments in [21] indicate that, 
in fact, on most real-world-like problem instances of 
ATSP, NN performs better than the greedy algorithm; 
the greedy algorithm fails completely on one family 
of instances, where the average greedy-tour is more 
than 2000% above the optimum. Computational exper- 
iments for STSP in [22] show that both the greedy al- 
gorithm and NN perform relatively well on Euclidean 
instances and perform poorly for general STSP. The 
greedy algorithm appears to perform better than NN 
for STSP. 

Vertex insertion (VI) is another type of TSP construction heuristic. For ATSP, the insertion algorithm begins with a cycle of length 2, and in each iteration, inserts a new vertex into the cycle. For STSP, the algorithm begins with a cycle of length 3. We describe only the ATSP vertex insertion, but the STSP algorithm is similar. Let $C$ be a cycle in $K_n^*$, and let $v$ be a vertex not on $C$. For any arc $ab$ on cycle $C$, the insertion of vertex $v$ at arc $ab$ is the operation of replacing arc $ab$ with the arcs $av$ and $vb$. The resulting cycle is denoted $C(a,v,b)$. Observe that the difference between the weights of $C(a,v,b)$ and $C$ equals $w(a,v) + w(v,b) - w(a,b)$. The VI algorithm always inserts a vertex $v$ at an arc $ab$ of $C$ for which $w(a,v) + w(v,b) - w(a,b)$ is minimum.


Random vertex insertion (RVI), nearest vertex insertion (NVI), and farthest vertex insertion (FVI), which are defined below, are three different versions of algorithm VI. Each one of them is determined by how it chooses the vertex $v$ to be inserted into the current cycle $C$. Given a vertex $v$ and a cycle $C$ in $K_n^*$, $d(v,C)$ denotes the distance from $v$ to $C$, that is, $d(v,C) = \min\{w(v,x) : x \in V(C)\}$. The algorithm RVI chooses vertex $v$ randomly. The algorithm NVI chooses vertex $v$ so that its distance to cycle $C$ is a minimum, that is, $d(v,C) = \min\{d(u,C) : u \notin V(C)\}$. The algorithm FVI chooses vertex $v$ so that its distance to cycle $C$ is a maximum, that is, $d(v,C) = \max\{d(u,C) : u \notin V(C)\}$.
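The three selection rules differ only in how $v$ is chosen, so one Python sketch covers RVI, NVI and FVI. It assumes that the starting 2-cycle is $(0, 1)$, that there are at least two vertices, and that ties are broken by vertex index; these are arbitrary choices of this sketch, and the function name is hypothetical.

```python
import random


def tour_weight(w, cycle):
    return sum(w[cycle[k]][cycle[(k + 1) % len(cycle)]] for k in range(len(cycle)))


def vertex_insertion(w, rule="nearest", seed=0):
    """ATSP vertex insertion; rule is "random" (RVI), "nearest" (NVI) or "farthest" (FVI)."""
    n = len(w)
    rng = random.Random(seed)
    cycle = [0, 1]                                   # starting 2-cycle (arbitrary choice)
    rest = set(range(2, n))
    while rest:
        dist = {v: min(w[v][x] for x in cycle) for v in rest}   # d(v, C)
        if rule == "random":
            v = rng.choice(sorted(rest))
        elif rule == "nearest":
            v = min(rest, key=lambda u: (dist[u], u))
        elif rule == "farthest":
            v = max(rest, key=lambda u: (dist[u], -u))
        else:
            raise ValueError(f"unknown rule: {rule}")
        m = len(cycle)

        def increase(k):
            # Inserting v at arc cycle[k] -> cycle[k+1] adds w(a,v) + w(v,b) - w(a,b).
            a, b = cycle[k], cycle[(k + 1) % m]
            return w[a][v] + w[v][b] - w[a][b]

        k = min(range(m), key=increase)
        cycle.insert(k + 1, v)
        rest.remove(v)
    return cycle


W = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]
for rule in ("random", "nearest", "farthest"):
    c = vertex_insertion(W, rule)
    print(rule, c, tour_weight(W, c))
```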

The vertex insertion heuristics described above per- 
form quite well for Euclidean TSP (see [22]). Com- 
putational experiments with RVI for ATSP in [13] 
show that RVI is good only for instances close to 
Euclidean. 

The following heuristic was initially suggested, in a different form, for the Vehicle Routing Problem by Clarke and Wright [9]. In the savings heuristic, we choose one vertex, say, $n$, and compute new weights $w'(i,j) = w(i,j) - w(i,n) - w(n,j)$ for all $i \ne j \in [n-1]$. Then the greedy algorithm is applied with the new weights in $K_n^* - n$ ($K_n - n$) until all vertices of $[n-1]$ are included in a single path. Then $n$ is added to the path to form a tour. The savings heuristic showed very good results for STSP in the computational experiments discussed in [22], in which the heuristic clearly outperformed the greedy algorithm, NN, RVI, NVI, FVI and a large number of other heuristics. (The savings heuristic has not been tested for ATSP in [21].)
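A sketch of the savings heuristic in its directed (ATSP) form, assuming 0-indexed vertices with the last vertex playing the role of $n$. The greedy step never closes a cycle and stops once the vertices of $[n-1]$ form a single path; the hub vertex then closes the tour. This is an illustration under those assumptions, not the implementation tested in [22].

```python
def savings_tour(w):
    """Savings heuristic (directed form); the last vertex is the hub."""
    n = len(w)
    hub = n - 1
    others = range(hub)
    # w'(i, j) = w(i, j) - w(i, hub) - w(hub, j); most negative = largest saving.
    arcs = sorted(
        (w[i][j] - w[i][hub] - w[hub][j], i, j)
        for i in others for j in others if i != j
    )
    succ, pred = {}, {}
    for _, i, j in arcs:
        if len(succ) == hub - 1:
            break                     # the non-hub vertices already form one path
        if i in succ or j in pred:
            continue
        end = j
        while end in succ:
            end = succ[end]
        if end == i:
            continue                  # never close a cycle: we want a single path
        succ[i], pred[j] = j, i
    head = next(v for v in others if v not in pred)
    path = [head]
    while path[-1] in succ:
        path.append(succ[path[-1]])
    return [hub] + path               # hub -> head ... tail -> hub


W = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]
print(savings_tour(W))  # [3, 1, 0, 2]
```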

The only heuristic some versions of which could successfully compete with the savings heuristic in the experiments in [22] was the well-known Christofides heuristic [8]. The Christofides heuristic is designed only for STSP and proceeds as follows. First we find a minimum weight spanning tree $T$ in $K_n$. Let $X$ be the set of vertices of odd degree in $T$. It is well known that $|X|$ is even and, thus, the subgraph $G$ of $K_n$ induced by $X$ has a perfect matching. We compute a minimum weight perfect matching $M$ in $G$. The edges of $T$ and $M$ form an Eulerian multigraph $H$, as all vertex degrees are even. We find a closed Euler trail $R$ of $H$ and 'short-cut' it, i.e., delete all repetitions of the same vertex in $R$. As a result, we obtain a tour. The way of short-cutting is very important for getting good quality tours [22].
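A compact Python sketch of the five steps (spanning tree, odd-degree set, matching, Euler trail, short-cutting). To stay self-contained it finds the minimum weight perfect matching by exhaustive search, which is exponential and usable only on tiny instances; practical implementations use a polynomial matching algorithm. Short-cutting here keeps the first occurrence of each vertex, which is only one of the possible ways of short-cutting.

```python
from functools import lru_cache


def christofides(c):
    """Christofides heuristic for a symmetric weight matrix c with the triangle inequality."""
    n = len(c)
    # 1. Minimum weight spanning tree T (Prim's algorithm from vertex 0).
    best = {v: (c[0][v], 0) for v in range(1, n)}
    edges = []
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, parent = best.pop(v)
        edges.append((parent, v))
        for u in best:
            if c[v][u] < best[u][0]:
                best[u] = (c[v][u], v)
    # 2. Set X of odd-degree vertices of T.
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    odd = tuple(v for v in range(n) if deg[v] % 2 == 1)

    # 3. Minimum weight perfect matching M on X (exhaustive search, memoised).
    @lru_cache(maxsize=None)
    def matching(rest):
        if not rest:
            return 0, ()
        a = rest[0]
        return min(
            (cost + c[a][rest[k]], pairs + ((a, rest[k]),))
            for k in range(1, len(rest))
            for cost, pairs in [matching(rest[1:k] + rest[k + 1:])]
        )

    edges += matching(odd)[1]
    # 4. Closed Euler trail of the multigraph H = T + M (Hierholzer's algorithm).
    adj = {v: [] for v in range(n)}
    for idx, (a, b) in enumerate(edges):
        adj[a].append((b, idx))
        adj[b].append((a, idx))
    used, stack, trail = set(), [0], []
    while stack:
        v = stack[-1]
        while adj[v] and adj[v][-1][1] in used:
            adj[v].pop()
        if adj[v]:
            u, idx = adj[v].pop()
            used.add(idx)
            stack.append(u)
        else:
            trail.append(stack.pop())
    # 5. Short-cut: keep only the first occurrence of every vertex.
    tour = []
    for v in trail:
        if v not in tour:
            tour.append(v)
    return tour


# Four corners of a unit square with Manhattan distances (a metric instance).
C = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
print(christofides(C))  # [0, 1, 2, 3]
```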




According to [21], the best ATSP construction heuristics are based on finding a minimum weight cycle factor (a vertex-disjoint collection of cycles covering all vertices of $K_n^*$) and merging the cycles (a process often called patching in the literature) to obtain a tour. The operation of patching two cycles $C$ and $Z$ deletes an arc in each of the cycles and adds an arc from $C$ to $Z$ and an arc from $Z$ to $C$ such that we obtain a cycle containing all vertices of $C$ and $Z$. Often patching of cycles $C = i_1 i_2 \ldots i_s i_1$ and $Z = j_1 j_2 \ldots j_t j_1$ is done optimally, i.e., we delete arcs $i_p i_{p+1}$ and $j_q j_{q+1}$ such that the cycle

$$i_1 \ldots i_p\, j_{q+1} \ldots j_t\, j_1 \ldots j_q\, i_{p+1} \ldots i_s\, i_1$$

is of minimum possible weight.

The following simple yet very successful patching heuristic was introduced by Karp and Steele [29]. In the Karp-Steele heuristic, we always choose a pair of cycles (in the current cycle factor) with the maximum number of vertices and patch them optimally. The Karp-Steele heuristic does not perform as well when the minimum weight cycle factor has many cycles with just two vertices. In such cases, another patching heuristic, contract-or-patch (COP), gives better results [21]. COP partitions the cycles of the cycle factor into short and long cycles (a short cycle has at most $t$ vertices for some fixed $t$). COP deletes the heaviest arc from each short cycle and contracts each resulting path using the operation of path-contraction defined shortly. COP finds a minimum cost cycle factor in the new complete digraph and continues as above until the current cycle factor has no short cycles. Once this happens, COP applies the Karp-Steele heuristic, computes a tour and 'extends' it to a tour in $K_n^*$ in the obvious way. For a directed path $P = x_1 x_2 \ldots x_p$ in $K_n^*$, the operation of path-contraction (see [3] for the case of general weighted digraphs) consists of replacing all vertices of $P$ in $K_n^*$ with a single new vertex $v$ and assigning weights in the new digraph $K_{n-p+1}^*$ as follows: the weight of an arc between two vertices other than $v$ is the same as in $K_n^*$, the weight $w(v,u)$ in $K_{n-p+1}^*$ equals $w(x_p, u)$ in $K_n^*$, and the weight $w(u,v)$ in $K_{n-p+1}^*$ equals $w(u, x_1)$ in $K_n^*$ for each $u \in V(K_n^*) \setminus V(P)$. The contract-or-patch heuristic was introduced by Glover et al. [13].
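The optimal patching step and the Karp-Steele rule can be sketched in Python as follows, assuming cycles are given as vertex lists and that a cycle factor is supplied by the caller (computing a minimum weight cycle factor, an assignment problem, is not shown). The example cycle factor is chosen by hand and is not claimed to be of minimum weight.

```python
def cycle_weight(w, cyc):
    return sum(w[cyc[k]][cyc[(k + 1) % len(cyc)]] for k in range(len(cyc)))


def patch(w, C, Z):
    """Optimal patching of cycles C = i1..is and Z = j1..jt (vertex lists).

    Deleting arcs i_p i_{p+1}, j_q j_{q+1} and adding i_p j_{q+1}, j_q i_{p+1}
    gives the cycle i1..i_p j_{q+1}..j_t j1..j_q i_{p+1}..i_s.
    """
    s, t = len(C), len(Z)
    best = None
    for p in range(s):
        for q in range(t):
            ip, ip1 = C[p], C[(p + 1) % s]
            jq, jq1 = Z[q], Z[(q + 1) % t]
            delta = w[ip][jq1] + w[jq][ip1] - w[ip][ip1] - w[jq][jq1]
            if best is None or delta < best[0]:
                best = (delta, p, q)
    _, p, q = best
    return C[:p + 1] + Z[q + 1:] + Z[:q + 1] + C[p + 1:]


def karp_steele(w, cycle_factor):
    """Repeatedly patch optimally the two cycles with the most vertices."""
    cycles = [list(c) for c in cycle_factor]
    while len(cycles) > 1:
        cycles.sort(key=len, reverse=True)   # stable: ties keep their current order
        C, Z = cycles.pop(0), cycles.pop(0)
        cycles.append(patch(w, C, Z))
    return cycles[0]


W = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]
tour = karp_steele(W, [[0, 1], [2, 3]])
print(tour, cycle_weight(W, tour))  # [0, 2, 3, 1] 21
```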


Improvement Heuristics Improvement heuristics start 
from a tour normally obtained using a construction 
heuristic and iteratively improve it by changing some 


parts of it at each iteration. Improvement heuristics are 
typically much faster than the exact algorithms, yet of- 
ten produce solutions very close to the optimal one. 

It appears that currently the best improvement heuristics are based on local search, on the genetic algorithm approach, or on a mixture of the two, which is often called a memetic algorithm. The most developed TSP improvement algorithms are local search algorithms that use edge exchange, in which a tour is improved by replacing $k$ of its edges with $k$ edges not in the tour. For STSP, the 2-opt algorithm starts from an initial tour $T$ and tries to improve $T$ by replacing two of its non-adjacent edges with two other edges to form another tour. Once an improvement is obtained, it becomes the new $T$. The procedure is repeated as long as an improvement is possible (or until a time limit is exceeded). For $k \ge 3$, the $k$-opt algorithm is the same as 2-opt except that $k$ edges are replaced at each iteration.
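A minimal 2-opt sketch in Python for a symmetric weight matrix. It uses first improvement and performs the exchange by reversing the tour segment between the two removed edges; this is a common way to implement the move, but not the only one, and the instance is invented.

```python
def tour_weight(c, tour):
    return sum(c[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def two_opt(c, tour):
    """Replace non-adjacent edges (a,b), (d,e) by (a,d), (b,e) while this shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue              # these two edges share the vertex tour[0]
                a, b = tour[i], tour[i + 1]
                d, e = tour[j], tour[(j + 1) % n]
                if c[a][d] + c[b][e] < c[a][b] + c[d][e]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])  # reverse b..d
                    improved = True
    return tour


# Four corners of a unit square (Manhattan distances); [0, 2, 1, 3] crosses itself.
C = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
start = [0, 2, 1, 3]
best = two_opt(C, start)
print(tour_weight(C, start), best, tour_weight(C, best))  # 6 [0, 1, 2, 3] 4
```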

The best local search algorithms use a variable $k$-opt search called the Lin-Kernighan local search, where at each iteration the actual value of $k$ varies depending on which value of $k$ gives the best improvement; for details see, e.g., [43]. Although the Lin-Kernighan local search can be applied only to STSP, ATSP can be transformed into STSP (see above). However, ejection chain methods, a family that includes the Lin-Kernighan search, are applicable to ATSP. Recently, Rego et al. [44] developed a new ejection chain method, the doubly-rooted Stem-and-Cycle method, that can be directly applied to ATSP. Computational experiments in [44] clearly demonstrated the high efficiency of the new method. One interesting aspect of the method indicated in [44] is the fact that it allows one to construct tours, in polynomial time, that are better than an exponential number of other tours.

The main problem with any kind of local search is that no further improvement is possible once we have found a local optimum. To get around this problem, one can restart the local search from another tour and repeat this many times. In the end, the best of all found tours gives us a solution. In practice, two ways to obtain restarting tours have been used. In the first (called iterated local search), a restarting tour is produced by a construction heuristic as before. In the second (chained local search), a kind of perturbation is applied to the current or previous local optimum to obtain a restarting tour. It seems Baum [6] was the first to introduce chained local search; this method proved to be significantly better than iterated local search for large instances of STSP (see, e.g., Johnson and McGeoch [22]).

Genetic algorithms operate with a large number of tours at any given time. They produce an initial population of tours and consecutively several other populations such that the best tour in the current population is not worse than the best tour in the previous population. Genetic operators that change tours include mutations (a mutation makes small changes to a single tour) and crossovers. A crossover selects two tours and produces a new tour from them. It appears that the currently most efficient crossovers are variations of the edge assembly crossover (EAX) introduced by Nagata and Kobayashi [35]. In EAX, we identify a set $A$ of edges from the first tour and a set $B$ of edges from the second tour such that $A \cup B$ forms a collection of alternating cycles (i.e., cycles in which edges alternate between the first and second tours) and replace all edges from $A$ by the edges of $B$, resulting in a cycle factor. Then an operation of patching is applied to the cycle factor. Recently, Nagata [34] reported very impressive results for large instances of STSP achieved by a genetic algorithm using a new version of EAX and no local search.


Worst Case Analysis of Heuristics While computa- 
tional experiments are important in the evaluation of 
heuristics, they cannot cover all possible families of in- 
stances of TSP and, in particular, they normally do 
not cover the most difficult instances. Moreover, cer- 
tain applications may produce families of instances that 
are much harder than those normally used in com- 
putational experiments. For example, such instances 
can arise when the Generalized TSP is transformed 
into TSP. Thus, theoretical analysis of the worst pos- 
sible cases is also important in evaluating and compar- 
ing TSP heuristics. One way to analyze worst cases of 
heuristics is Domination Analysis, see its entry in this 
book. 

We provide only a brief overview of the second approach to the worst case analysis of heuristics, Approximation Analysis. For the STSP with triangle inequality (i.e., $w_{ij} + w_{jk} \ge w_{ik}$ for all vertices $i, j, k$), the best known approximation is $3/2$, provided by the Christofides algorithm discussed earlier. The performance guarantee $3/2$ means that a tour produced by the heuristic has weight which is at most 50% larger than that of an optimal tour. For the Euclidean TSP, we can obtain much better approximation, as we saw earlier. For ATSP with triangle inequality, no algorithm with constant approximation guarantee is known. The best approximation ratio so far was obtained by Kaplan et al. [27]: $0.841 \cdot \log n$. Recently, Blaeser et al. [4] obtained a constant approximation guarantee when a strengthened triangle inequality holds: for some $\gamma \in [1/2, 1)$ we have $\gamma \cdot (w_{ij} + w_{jk}) \ge w_{ik}$ for all vertices $i, j, k$. The authors of [4] proved that their algorithm always produces a tour at most $(1+\gamma)/(2 - \gamma - \gamma^2)$ times longer than an optimal one.

We saw above that, if no triangle inequality is imposed, there is no polynomial-time TSP algorithm with a constant approximation guarantee (unless P=NP). We can overcome this inapproximability by using another measure of performance guarantee. One such measure was defined by Zemel [46], who provided some mathematical arguments to show that his measure is better, in some sense, than the traditional performance (approximation) ratio. Let $A$ be a heuristic for TSP and $I$ a problem instance. Then $w_{\min}(I)$, $w_{\max}(I)$ and $w_A(I)$ denote the weights, respectively, of an optimal tour, a heaviest tour, and a tour produced by $A$ for instance $I$. The Zemel measure of $A$, denoted $\rho_z(A)$, is the supremum of $(w_A(I) - w_{\min}(I))/(w_{\max}(I) - w_{\min}(I))$, taken over all TSP instances $I$ for which $w_{\max}(I) \ne w_{\min}(I)$. The following theorem was proved by Hassin and Khuller [18].

Theorem 5 There is a polynomial-time heuristic $A$ for ATSP with $\rho_z(A) \le 1/2$, and one for STSP with $\rho_z(A) \le 1/3$.
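For a single instance $I$, the ratio inside the supremum can be computed by enumerating all tours, which gives a lower bound on the Zemel measure of the heuristic that produced the tour. The Python sketch below does this by brute force; the instance and the heuristic tour are made up for illustration.

```python
from itertools import permutations


def tour_weight(w, t):
    return sum(w[t[k]][t[(k + 1) % len(t)]] for k in range(len(t)))


def zemel_ratio(w, heuristic_tour):
    """(w_A(I) - w_min(I)) / (w_max(I) - w_min(I)) for one instance, by enumeration."""
    n = len(w)
    weights = [tour_weight(w, (0,) + p) for p in permutations(range(1, n))]
    w_min, w_max = min(weights), max(weights)
    if w_max == w_min:
        raise ValueError("all tours have the same weight")
    return (tour_weight(w, heuristic_tour) - w_min) / (w_max - w_min)


W = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]
# Tour weights range from 21 to 34; the hypothetical heuristic tour below has weight 33.
print(zemel_ratio(W, [0, 1, 3, 2]))
```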


See also 


▶ Domination Analysis in Combinatorial Optimization

▶ Evolutionary Algorithms in Combinatorial Optimization

▶ Heuristic and Metaheuristic Algorithms for the Traveling Salesman Problem
In a sequence of path-breaking papers, A.W. Tucker and A.J. Goldman systematically investigated the relation between the theory of linear programming (cf. ▶ Linear programming) on the one hand, and theorems of the alternative (cf. ▶ Linear optimization: Theorems of the alternative) on the other hand [3,4,5,12]. In these papers they develop a comprehensive theory, covering many old and new results, often with new proofs. Thus they sharpen and consolidate the classical theorems of the alternative of J. Farkas [2], P. Gordan [6], E. Stiemke [11] and T.S. Motzkin [8], as well as the duality theory for linear optimization as first developed by G.B. Dantzig, and by J. von Neumann and O. Morgenstern. What is new is the emphasis they put on the property of complementary slackness. They derive the above results from properties of homogeneous systems of linear equality and inequality relations.

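In its most general form such a so-called dual system consists of two systems, as follows:

$$
\begin{array}{ll}
u \ \text{unrestricted} & -Ax - By = 0 \\
v \ge 0 & -Cx - Dy \ge 0 \\
A^{T}u + C^{T}v \ge 0 & x \ge 0 \\
B^{T}u + D^{T}v = 0 & y \ \text{unrestricted}
\end{array}
\qquad (1)
$$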
The matrices and vectors in (1) are such that all expressions are well-defined; in particular, the matrices $A$ and $C$ have the same number of columns, and similarly for the matrices $B$ and $D$. In the left system the variables are the entries in the vectors $u$ and $v$, and in the right system these are the entries in the vectors $x$ and $y$. Note that the lines in (1) define a natural one-to-one correspondence between the variables in one system and the relations in the other system. Also, if a relation is of equality type then the corresponding variable is unrestricted (or free), and if it is of inequality type then the corresponding variable is nonnegative.

One easily verifies that any solution of (1) will satisfy

$$u^{T}(-Ax - By) = 0, \qquad (2)$$

$$v^{T}(-Cx - Dy) \ge 0, \qquad (3)$$

$$x^{T}(A^{T}u + C^{T}v) \ge 0, \qquad (4)$$

$$y^{T}(B^{T}u + D^{T}v) = 0. \qquad (5)$$

Adding (2) and (4), and also (3) and (5), one gets

$$-u^{T}By + v^{T}Cx \ge 0, \qquad u^{T}By - v^{T}Cx \ge 0.$$

So $u^{T}By = v^{T}Cx$. Combining this with (2) and (5), one obtains

$$-u^{T}Ax = u^{T}By = v^{T}Cx = -v^{T}Dy.$$

This implies that (3) and (4) hold with equality:

$$v^{T}(-Cx - Dy) = 0, \qquad (6)$$

$$x^{T}(A^{T}u + C^{T}v) = 0. \qquad (7)$$


These relations are called the complementary slackness 
relations. They imply that if one of the nonnegative 
variables is positive then the corresponding inequality 
in the system necessarily holds with equality. 

Note that it is not excluded that a nonnegative vari- 
able is zero and the corresponding inequality in the sys- 
tem holds with equality. In general this may certainly 
occur. For example, the trivial solution $x = y = u = v = 0$
has this behavior. The main result in [12, Thm. 4], how- 
ever, states that there exists a solution of (1) with the 
property that a nonnegative variable is positive if and 
only if the corresponding inequality in the system holds 
with equality. Such a solution is called strictly comple- 
mentary and can be characterized by the fact that it sat- 
isfies the strictly complementary conditions: 


$$v - Cx - Dy > 0, \qquad (8)$$

$$x + A^{T}u + C^{T}v > 0. \qquad (9)$$


In [12] Tucker proves this result in a number of steps. 
Only the first step is nontrivial; the other steps consist 
of rather elementary algebraic arguments. 

In the first step, he considers the simple dual system 


$$A^{T}u \ge 0, \quad Ax = 0, \quad x \ge 0. \qquad (10)$$


Adapting arguments of D. Gale in an unpublished proof of the fundamental theorem of H. Weyl [14] (that the convex hull of finitely many halflines is the intersection of finitely many halfspaces), he shows the existence of a solution of (10) such that the first coordinate of the vector $x + A^{T}u$ is positive. This result is basic for the rest of Tucker's paper [12]. As Tucker shows, it already implies Farkas' lemma and, as he remarks, it recurs in geometric form in [4] as the theorem stating that 'a polyhedral cone is the polar of its polar' and in [3] as the separation theorem for a polyhedral convex cone and an individual vector. Tucker's proof exploits only algebraic arguments and uses induction on the number of columns of $A$. It may be noted that he could also have used Farkas' lemma (cf. ▶ Farkas lemma). Indeed, if there does not exist a solution with $x_1 > 0$, then, writing $x_1 = e_1^{T}x$, where $e_1$ denotes the first unit vector, the system $Ax = 0$, $x \ge 0$, $-e_1^{T}x < 0$ does not have a solution; then Farkas' lemma states that the system $A^{T}z \le -e_1$ has a solution. Hence, with $u = -z$ (and $x = 0$), one has a solution of (10) such that the first coordinate of $x + A^{T}u$ is positive.

Of course, there is nothing special about the first coordinate of $x + A^{T}u$. For each of the other coordinates one can also obtain solutions $x$ and $u$ such that this particular coordinate of $x + A^{T}u$ is positive. Since the solutions of (10) are closed under addition and every coordinate of $x + A^{T}u$ is nonnegative for each of them, adding these solutions gives a solution of (10) for which all coordinates of $x + A^{T}u$ are positive, i.e., such that

$$x + A^{T}u > 0. \qquad (11)$$
Thus the main result has now been proved for the special case of system (10). At this stage Tucker shows that the Stiemke and Gordan transposition theorems easily follow. Indeed, if there is no $u$ such that $A^{T}u \ge 0$, $A^{T}u \ne 0$, then there must exist an $x > 0$ with $Ax = 0$, which is Stiemke's theorem; and if there is no nonzero $x \ge 0$ such that $Ax = 0$, then there must exist a $u$ such that $A^{T}u > 0$, which is Gordan's theorem.
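A small numerical illustration of (10) and (11): the Python sketch below checks whether given vectors $x$ and $u$ satisfy (10), whether they satisfy (11), and computes $x^{T}A^{T}u$, which complementary slackness forces to be zero. The matrices and vectors are our own examples: for $A = (1\ {-1})$ a solution with $x > 0$ exists (the Stiemke side), and for $A = (1\ 1)$ one with $A^{T}u > 0$ exists (the Gordan side).

```python
def check_system_10(A, x, u):
    """Return (satisfies (10), satisfies (11), x^T A^T u) for a matrix A given as a list of rows."""
    m, n = len(A), len(A[0])
    Atu = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]   # A^T u
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]    # A x
    feasible = all(v >= 0 for v in Atu) and all(v == 0 for v in Ax) and all(v >= 0 for v in x)
    strict = all(xj + aj > 0 for xj, aj in zip(x, Atu))              # x + A^T u > 0
    slack = sum(xj * aj for xj, aj in zip(x, Atu))                   # x^T A^T u
    return feasible, strict, slack


print(check_system_10([[1, -1]], [1, 1], [0]))  # Stiemke side: x > 0
print(check_system_10([[1, 1]], [0, 0], [1]))   # Gordan side: A^T u > 0
print(check_system_10([[1, -1]], [0, 0], [0]))  # trivial solution: not strictly complementary
```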

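When applying the above result with the matrix $A$ replaced by $(A\ B\ C\ {-C})$, it immediately follows that the system

$$A^{T}u \ge 0, \quad B^{T}u \ge 0, \quad C^{T}u = 0, \qquad (12)$$

$$Ax + By + Cz = 0, \quad x \ge 0, \quad y \ge 0 \qquad (13)$$

has a solution such that

$$x + A^{T}u > 0, \quad y + B^{T}u > 0.$$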


Hence, if every solution $u$ of (12) satisfies $A^{T}u = 0$, then (13) must have a solution with $x > 0$. By the complementary slackness property, each solution will satisfy $x^{T}A^{T}u = 0$. Therefore, either the system

$$A^{T}u \ge 0, \quad A^{T}u \ne 0, \quad B^{T}u \ge 0, \quad C^{T}u = 0$$

has a solution or the system

$$Ax + By + Cz = 0, \quad x > 0, \quad y \ge 0$$

has a solution, but not both. This result is known as Tucker's transposition theorem [7]. On the other hand, and in a similar way, it follows that either the system

$$A^{T}u > 0, \quad B^{T}u \ge 0, \quad C^{T}u = 0$$

or the system

$$Ax + By + Cz = 0, \quad x \ge 0, \quad y \ge 0, \quad x \ne 0$$

has a solution, but not both. This is Motzkin's transposition theorem. When $C$ is vacuous, these results are also known as 'theorems of the alternative' for the pair $A$, $B$ of matrices [1].

When replacing the matrix A in (10) by $(I\; K)$ one obtains that the system

$$K^T u \ge 0, \quad -Kx \ge 0, \quad u \ge 0, \quad x \ge 0 \qquad (14)$$

has a solution such that

$$u - Kx > 0, \quad x + K^T u > 0.$$


Tucker notes that by applying this result to the pay-off matrix of a 'fair' zero-sum two-person game, one may easily derive a well-known theorem of von Neumann and Morgenstern [9]. One easily sees that the following alternatives hold:

$$K^T u \ne 0 \ \text{or}\ x > 0, \qquad K^T u > 0 \ \text{or}\ x \ne 0, \qquad (15)$$
$$u > 0 \ \text{or}\ -Kx \ne 0, \qquad u \ne 0 \ \text{or}\ -Kx > 0. \qquad (16)$$

The above alternatives are mutually exclusive because $u^T K x = 0$ for all solutions of (14). Indeed, $u^T K x \ge 0$ since $K^T u \ge 0$ and $x \ge 0$, and $u^T K x \le 0$ since $-Kx \ge 0$ and $u \ge 0$. The alternatives (15) and (16) are dual forms of the theorem of the alternative for matrices in [9]. It also follows that if the system $-Kx \ge 0$, $x \ge 0$ has no nonzero solution, then the system $K^T u > 0$, $u \ge 0$ has a solution; this result is due to J. Ville [13].
The existence of a strictly complementary solution of the most general dual system, as given by (1), now straightforwardly follows by replacing the matrix K in (14) by the matrix

$$\begin{pmatrix} -A & -B & B \\ A & B & -B \\ C & D & -D \end{pmatrix}.$$

This yields the existence of nonnegative vectors $u_1$, $u_2$, v, x, $y_1$, and $y_2$ such that

$$-A^T u_1 + A^T u_2 + C^T v \ge 0,$$
$$-B^T u_1 + B^T u_2 + D^T v \ge 0,$$
$$B^T u_1 - B^T u_2 - D^T v \ge 0,$$
$$x - A^T u_1 + A^T u_2 + C^T v > 0,$$

and

$$Ax + By_1 - By_2 \ge 0,$$
$$-Ax - By_1 + By_2 \ge 0,$$
$$-Cx - Dy_1 + Dy_2 \ge 0,$$
$$v - Cx - Dy_1 + Dy_2 > 0.$$

Take $u = u_2 - u_1$ and $y = y_1 - y_2$. Then

$$A^T u + C^T v \ge 0, \quad B^T u + D^T v = 0, \quad x + A^T u + C^T v > 0,$$

and

$$Ax + By = 0, \quad -Cx - Dy \ge 0, \quad v - Cx - Dy > 0.$$

This shows that u, v, x and y solve the dual system (1) and also satisfy the strict complementarity conditions (8) and (9).

An interesting and important special case of the dual system (14) occurs when the matrix K is skew-symmetric. Then the conditions on x and u are the same and the system becomes a selfdual system. Taking $z = x + u$ and replacing K by $K^T$, it then follows that there exists a vector z such that

$$Kz \ge 0, \quad z \ge 0, \quad z + Kz > 0. \qquad (17)$$

In fact, the result for this special case is strong enough to recover the more general result for the system (14). If K is an arbitrary matrix, then one simply applies (17) to the skew-symmetric matrix

$$\begin{pmatrix} 0 & -K \\ K^T & 0 \end{pmatrix},$$

with $z = (u, x)$; the components of (17) then reproduce the inequalities of (14).
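The embedding of an arbitrary K into the skew-symmetric matrix with zero diagonal blocks, $-K$ in the upper right and $K^T$ in the lower left, can be checked numerically. The following is a minimal sketch in plain Python with no external libraries. The helper names and the 2 × 3 matrix K are hypothetical, chosen only for illustration. The sketch confirms that the block matrix is skew-symmetric. It also shows that for $z = (u, x)$ the product stacks $-Kx$ and $K^T u$, which are the two inequality blocks of system (14).

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def skew_embedding(K):
    """Return the (m+n) x (m+n) block matrix [[0, -K], [K^T, 0]] for an m x n matrix K."""
    m, n = len(K), len(K[0])
    KT = transpose(K)
    top = [[0] * m + [-k for k in K[i]] for i in range(m)]
    bottom = [KT[j] + [0] * n for j in range(n)]
    return top + bottom


K = [[1, -2, 0],
     [3, 1, -1]]          # hypothetical 2 x 3 example
M = skew_embedding(K)

# Skew-symmetry: M^T = -M
assert transpose(M) == [[-entry for entry in row] for row in M]

u, x = [2, 1], [1, 0, 4]
z = u + x                 # z = (u, x)
Mz = matvec(M, z)
minus_Kx = [-v for v in matvec(K, x)]
KT_u = matvec(transpose(K), u)
assert Mz == minus_Kx + KT_u   # M z = (-K x, K^T u)
print(Mz)                      # [-1, 1, 5, -3, -1]
```

Because the embedded matrix is skew-symmetric, $z^T M z = 0$ for every z. Combined with $z \ge 0$ and $Mz \ge 0$, this is what forces each product $z_i (Mz)_i$ to vanish.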


The existence of a strictly complementary solution 
to a selfdual system is used in [5] “as an omnibus means 
of proving the basic duality and existence theorems of 
linear programming”; a new proof is given in [10], where
this result is used for the same purpose. The derivation 
of the duality theorem goes as follows. 

For a given matrix A and column vectors b and c 
of appropriate size consider the pair of dual linear pro- 
grams 


$$(P)\quad \min\ c^T x \quad \text{s.t.}\ Ax \ge b,\ x \ge 0,$$
$$(D)\quad \max\ b^T y \quad \text{s.t.}\ A^T y \le c,\ y \ge 0.$$


If x is feasible for (P) and y for (D), then

$$c^T x \ge y^T A x \ge b^T y.$$

The first inequality uses $x \ge 0$ and $A^T y \le c$; the second uses $y \ge 0$ and $Ax \ge b$. This is known as the weak duality result for linear optimization. The strong duality result states that if one of the two problems (P) and (D) has an optimal solution, then so does the other, and the optimal values coincide.
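As a concrete check of the weak duality chain, the sketch below uses a small hypothetical instance with A = [[1, 2], [3, 1]], b = (4, 6) and c = (1, 1). It uses exact rational arithmetic from Python's standard `fractions` module. The feasible points x = (4, 0) and y = (1/5, 1/5) are illustrative choices, not taken from the text.

```python
from fractions import Fraction as F

A = [[1, 2], [3, 1]]
b = [4, 6]
c = [1, 1]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):                 # A x
    return [dot(row, v) for row in M]


def rmatvec(M, y):                # A^T y
    return [dot(col, y) for col in zip(*M)]


def primal_feasible(x):           # A x >= b, x >= 0
    return all(xi >= 0 for xi in x) and all(r >= bi for r, bi in zip(matvec(A, x), b))


def dual_feasible(y):             # A^T y <= c, y >= 0
    return all(yi >= 0 for yi in y) and all(r <= ci for r, ci in zip(rmatvec(A, y), c))


x = [F(4), F(0)]
y = [F(1, 5), F(1, 5)]
assert primal_feasible(x) and dual_feasible(y)

chain = (dot(c, x), dot(y, matvec(A, x)), dot(b, y))
assert chain[0] >= chain[1] >= chain[2]        # c^T x >= y^T A x >= b^T y
print(" >= ".join(str(v) for v in chain))      # 4 >= 16/5 >= 2
```

For this instance the gap $c^T x - b^T y = 2$ shrinks to zero at the optimal pair x = (8/5, 6/5), y = (2/5, 1/5), where both objective values equal 14/5.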
Define the skew-symmetric matrix K by

$$K := \begin{pmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{pmatrix}.$$


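Applying (17) to K with $z = (y, x, t)$ one obtains nonnegative vectors y and x and a nonnegative scalar t such that

$$Ax - tb \ge 0, \qquad (18)$$
$$-A^T y + tc \ge 0, \qquad (19)$$
$$b^T y - c^T x \ge 0, \qquad (20)$$
$$y + Ax - tb > 0, \qquad (21)$$
$$x - A^T y + tc > 0, \qquad (22)$$
$$t + b^T y - c^T x > 0. \qquad (23)$$

Recall that these relations imply the complementarity relations. Since K is skew-symmetric, $z^T K z = 0$, and with $z \ge 0$ and $Kz \ge 0$ every term $z_i (Kz)_i$ must vanish. The complementarity relations are therefore

$$y^T (Ax - tb) = 0, \qquad (24)$$
$$x^T (-A^T y + tc) = 0, \qquad (25)$$
$$t\,(b^T y - c^T x) = 0. \qquad (26)$$

Note that (24) and (25) are equivalent to

$$y^T A x = t\, b^T y = t\, c^T x, \qquad (27)$$

and these relations imply (26).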

The relations (18)-(23) are homogeneous in t, x and 
y. Hence, a solution with t > 0 exists if and only if a so- 
lution with t = 1 exists. Thus two cases have to be dis- 
tinguished: either t = 0 or t= 1. 

If t = 0, then by (18), (19) and (23) one has

$$Ax \ge 0, \quad A^T y \le 0, \quad b^T y - c^T x > 0.$$

Thus one has either $b^T y > 0$ or $c^T x < 0$ or both. First consider the case $b^T y > 0$. Then (P) cannot have a feasible solution $\bar x$, for this would yield the contradiction

$$0 \ge \bar x^T (A^T y) = (A \bar x)^T y \ge b^T y > 0.$$

Moreover, if (D) has a feasible solution $y'$, then $A^T y \le 0$ implies that $y' + \alpha y$ is feasible for (D) for any nonnegative α. From

$$b^T (y' + \alpha y) = b^T y' + \alpha\, b^T y,$$

it follows that the dual objective value can attain arbitrarily large values, since $b^T y > 0$. The dual problem (D) is unbounded in this case. Thus, if $b^T y > 0$, then (P) is infeasible and (D) is either infeasible or unbounded.
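The case t = 0 with $b^T y > 0$ can be made concrete on a tiny hypothetical instance: A = [[1], [-1]], b = (1, 0), c = (0). Then (P) asks for x ≥ 1 and -x ≥ 0, which is impossible, while y = (1, 1) satisfies $A^T y \le 0$ and $b^T y > 0$. The sketch below is plain Python, and the instance is illustrative only. It checks these sign conditions and shows the dual objective growing along the ray $y' + \alpha y$ from the feasible point $y' = 0$.

```python
A = [[1], [-1]]     # 2 x 1 matrix, stored as rows
b = [1, 0]
c = [0]


def at_y(y):
    """Compute A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dual_objective(y):
    return sum(bi * yi for bi, yi in zip(b, y))


def dual_feasible(y):
    return min(y) >= 0 and all(v <= cj for v, cj in zip(at_y(y), c))


ray = [1, 1]                        # plays the role of y in the case t = 0
assert all(v <= 0 for v in at_y(ray)) and dual_objective(ray) > 0

# Certificate of primal infeasibility: A x >= b with x >= 0 would give
# 0 = ray^T A x >= ray^T b = 1, a contradiction.
y_prime = [0, 0]                    # a feasible point of (D)
for alpha in (1, 10, 100):
    y_alpha = [yp + alpha * r for yp, r in zip(y_prime, ray)]
    assert dual_feasible(y_alpha)
    print(alpha, dual_objective(y_alpha))   # the objective equals alpha
```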

If $c^T x < 0$, similar arguments can be used to show that (D) is infeasible and (P) is either infeasible or unbounded.

If t = 1, then by (18) and (19) x is feasible for (P) and y is feasible for (D). Moreover, (20) together with weak duality gives $c^T x = b^T y$, proving that x is an optimal solution for (P) and y is an optimal solution for (D). Hence, the duality theorem for linear optimization has been proved.
The above approach to the duality theory for linear optimization yields a little more than the classical approach, namely: if (P) and (D) are feasible, then there exist strictly complementary optimal solutions x and y ([5, Coroll. 2A]). This is due to (21) and (22), which give (for t = 1):

$$y + (Ax - b) > 0,$$
$$x + (c - A^T y) > 0.$$
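The construction can be tried on a small example. The sketch below is plain Python with exact fractions. It assumes A = [[1, 2], [3, 1]], b = (4, 6), c = (1, 1), a hypothetical example not taken from the text. It builds the skew-symmetric matrix K = [[0, A, -b], [-A^T, 0, c], [b^T, -c^T, 0]] and evaluates it at z = (y, x, t) with the optimal pair x = (8/5, 6/5), y = (2/5, 1/5) and t = 1. For this instance Kz = 0, so (18)-(20) hold with equality. In addition z + Kz > 0, so (21)-(23) hold and the pair is strictly complementary.

```python
from fractions import Fraction as F


def build_K(A, b, c):
    """Skew-symmetric matrix [[0, A, -b], [-A^T, 0, c], [b^T, -c^T, 0]]."""
    m, n = len(A), len(A[0])
    rows = [[0] * m + list(A[i]) + [-b[i]] for i in range(m)]
    rows += [[-A[i][j] for i in range(m)] + [0] * n + [c[j]] for j in range(n)]
    rows.append(list(b) + [-cj for cj in c] + [0])
    return rows


A = [[1, 2], [3, 1]]
b = [4, 6]
c = [1, 1]
K = build_K(A, b, c)
size = len(K)
assert all(K[i][j] == -K[j][i] for i in range(size) for j in range(size))

y = [F(2, 5), F(1, 5)]      # optimal for (D)
x = [F(8, 5), F(6, 5)]      # optimal for (P)
t = F(1)
z = y + x + [t]             # z = (y, x, t) >= 0

Kz = [sum(k * zj for k, zj in zip(row, z)) for row in K]
assert all(v >= 0 for v in Kz)                          # (18)-(20)
assert all(zi + v > 0 for zi, v in zip(z, Kz))          # (21)-(23)
assert sum(zi * v for zi, v in zip(z, Kz)) == 0         # z^T K z = 0
```

In this instance both optimal vectors are strictly positive and all constraints are tight, so strict complementarity holds in its simplest form. In a strictly complementary solution of a degenerate instance, some components of z would be zero and the corresponding components of Kz positive.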


See also 


> Farkas Lemma 

> Linear Optimization: Theorems of the Alternative 
> Linear Programming 

> Motzkin Transposition Theorem 
Introduction


This article considers the problem of asymptotical sta- 
bility of optimal trajectories of dynamical systems de- 
scribed by differential inclusions. In the literature, the 
results obtained in this area are called “turnpike theo- 
rems.” 

Turnpike theory has many applications in eco- 
nomics and engineering. We refer to [9,17,18,25] for 
more detailed information about this theory and its var- 
ious applications. 
The first result in this area was obtained by J. von Neumann in 1945. However, the significance of this result, which led to the turnpike property, was recognized by Paul A. Samuelson in 1948-1949, who also introduced this terminology. These results were obtained for optimal trajectories of models of economic dynamics determined by convex processes (the von Neumann model).
A clearer description of this property was provided by 
Dorfman et al. [1] in the Chap. “Efficient Programs 
of Capital Accumulation” of Linear Programming and 
Economic Analysis. The following is the famous quote 
from [1], p. 331, that describes the meaning of the turn- 
pike property: 

“Thus in this unexpected way, we have found a real 
normative significance for steady growth - not steady 
growth in general, but maximal von Neumann growth. 
It is, in a sense, the single most effective way for the sys- 
tem to grow, so that if we are planning long-run growth, 
no matter where we start and where we desire to end up 
it will pay in the intermediate stages to get into a growth 
phase of this kind. It is exactly like a turnpike paralleled 
by a network of minor roads. There is a fastest route 
between any two points; and if the origin and destina- 
tion are close together and far from the turnpike, the best 
route may not touch the turnpike. But if origin and des- 
tination are far enough apart, it will always pay to get 
on to the turnpike and cover distance at the best rate of 
travel, even if this means adding a little mileage at either 
end. The best intermediate capital configuration is one 
which will grow most rapidly, even if it is not the desired 
one, it is temporarily optimal”. 

In a simple case, when the trajectories of the sys- 
tem under consideration are uniformly bounded, the 
following formulation could be considered as a turn- 
pike property. 

Let $\{x_T(t)\}$ be a set of optimal trajectories defined on the intervals $[0, T]$, $T > 0$, and let $x^*$ be a fixed point. In the applications, $x^*$ is usually an optimal stationary point.

Turnpike property: For any ε > 0 there is a finite number $K_\varepsilon > 0$ such that for all T > 0 the following inequality holds:

$$\operatorname{meas}\{t \in [0, T] : \|x_T(t) - x^*\| \ge \varepsilon\} \le K_\varepsilon.$$

The meaning of this statement is as follows: the time that optimal trajectories spend outside the ε-neighborhood of $x^*$ is bounded by some finite number $K_\varepsilon$ that depends neither on T nor on the optimal trajectories. If the system is considered on the interval $[0, \infty)$, the turnpike property can be formulated as the convergence of all optimal trajectories to $x^*$.
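The quantity bounded by $K_\varepsilon$ is easy to estimate numerically for a sampled trajectory. The sketch below uses a hypothetical family $x_T(t) = e^{-t} + e^{-(T-t)}$ with $x^* = 0$. Each path starts away from $x^*$, approaches it quickly, stays close, and moves away again only near t = T. The family is not derived from any particular control problem. A midpoint Riemann sum approximates $\operatorname{meas}\{t \in [0, T] : |x_T(t) - x^*| \ge \varepsilon\}$. For this family the measure is at most $2\ln(2/\varepsilon)$ for every T, since one of the two exponentials must then be at least ε/2. So $K_\varepsilon = 2\ln(2/\varepsilon)$ works.

```python
import math


def x_T(t, T):
    """Hypothetical path on [0, T]: close to x* = 0 except near both ends."""
    return math.exp(-t) + math.exp(-(T - t))


def time_outside(T, eps, x_star=0.0, n=100_000):
    """Midpoint Riemann-sum estimate of meas{t in [0, T] : |x_T(t) - x*| >= eps}."""
    dt = T / n
    count = sum(1 for k in range(n) if abs(x_T((k + 0.5) * dt, T) - x_star) >= eps)
    return count * dt


eps = 0.01
K_eps = 2 * math.log(2 / eps)
for T in (5, 20, 100, 400):
    m = time_outside(T, eps)
    print(T, round(m, 3))
    assert m <= K_eps        # the bound does not depend on T
```

For large T the estimate settles near $2\ln(1/\varepsilon)$ regardless of the horizon. This is the behavior the turnpike property describes.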
Historically, the turnpike theory was first studied for optimal control problems in discrete time. The general formulation of these problems can be presented as

$$\text{Maximize } J(\{x_t\}_{t=1}^{T}) \quad \text{subject to } x_{t+1} \in a(x_t), \quad t = 1, \ldots, T. \qquad (1)$$

The set-valued mapping $a: \Omega \to \Pi_c(\mathbb{R}^n)$ is usually assumed to be continuous in the Hausdorff metric. Here $\Omega \subset \mathbb{R}^n$ (in a particular case $\Omega = \mathbb{R}^n$), and $\Pi_c(\mathbb{R}^n)$ stands for the set of all compact subsets of $\mathbb{R}^n$. The graph of the mapping a is defined by

$$\operatorname{graph} a = \{(x, y) : x \in \Omega,\ y \in a(x)\}.$$


The objective functional $J(\{x_t\}_{t=1}^{T})$ can be defined in different forms. It is usually defined by some utility function $u(x_t, x_{t-1})$. In some cases it is assumed that $u(x_t, x_{t-1}) = u(x_t)$; that is, the utility function u does not depend on $x_{t-1}$.

Terminal and integral type functionals with and 
without discount factors are most commonly used in 
the literature. 

Many approaches have been developed to study the 
turnpike property for different classes of problems (1). 
Good surveys of these approaches developed by the 
1970s can be found in [9,17]. The main achievement 
of these approaches can be summarized as follows. The 
turnpike property is valid under the following convex- 
ity assumptions: 


$$\operatorname{graph} a \text{ is convex, and the function } u \text{ is strictly concave}. \qquad (2)$$

We say that problem (1) is convex if condition (2) holds. Many problems arising in economics are convex problems. Thus, the methods developed for discrete systems in the form of (1) can be successfully applied to such practical problems.

The study of turnpike property for continuous sys- 
tems started in the 1970s. It turned out that the methods 
developed for discrete systems were not applicable for 
continuous systems; thus, new methods were required. 
In order to prove the turnpike property for continuous 
systems, together with the convexity assumptions (2), 
very restrictive additional assumptions were used. Since then, proving the turnpike property under the convexity conditions (2) alone has been a difficult and challenging problem. This article gives a brief survey of the results obtained in this area.


Definitions 


Let $x^0 \in \Omega$ be a given initial point. We will consider the following optimal control problem:

$$\text{Maximize } J(x(\cdot)) \quad \text{subject to } \dot x \in a(x),\ x(0) = x^0. \qquad (3)$$

We assume that the set-valued mapping $a: \Omega \to \Pi_c(\mathbb{R}^n)$ has compact images and is continuous in the Hausdorff metric. Here $\Omega \subset \mathbb{R}^n$ or $\Omega = \mathbb{R}^n$.


Definition 1 An absolutely continuous function x(·) is called a trajectory defined on the interval [0, T] if the inclusion $\dot x(t) \in a(x(t))$ holds for almost all $t \in [0, T]$.

Definition 2 A point $x \in \mathbb{R}^n$ is called a stationary point if $0 \in a(x)$.

Stationary points play an important role in the study of the asymptotic behavior of optimal trajectories. Throughout this article, we denote the set of stationary points by M:

$$M = \{x \in \Omega : 0 \in a(x)\}.$$

If the mapping a(x) is continuous, then M is a compact set. We note that this set may be empty.

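We will consider different classes of functionals J(x(·)). The following are the most commonly used functionals in the literature:

$$\liminf_{t\to\infty} u(x(t)), \qquad \liminf_{t\to\infty} u(x(t), \dot x(t)); \qquad (4)$$
$$\int_0^T u(x)\,dt, \qquad \int_0^T u(x, \dot x)\,dt; \qquad (5)$$
$$\int_0^\infty e^{-\delta t} u(x)\,dt, \qquad \int_0^\infty e^{-\delta t} u(x, \dot x)\,dt, \qquad (6)$$

where δ > 0 is a discount rate. In (4), $\liminf_{t\to\infty} u(x(t), \dot x(t))$ is taken over the points t where $\dot x(t)$ exists.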

Several approaches have been developed to study 
turnpike property for continuous systems. These ap- 
proaches use the Hamiltonian of problem (3) and the 
necessary conditions of optimality in various versions. 


We especially mention the approaches developed by
Rockafellar [20,21] and Scheinkman [22,23]. They con- 
sidered a convex problem with integral functionals 
(with and without a discount factor). The additional as- 
sumptions (together with convexity) that were used in 
these approaches involve the derivatives of the Hamil- 
tonian. That is why these assumptions are very difficult 
to check in practical problems. 

Among the other approaches developed for prob- 
lem (3), we mention the results of Gusev and 
Yakubovich [4,5], Panasyuk and Panasyuk [18] and 
Zelikina [26]. They considered some special classes of 
optimal control problems defined by differential equa- 
tions and the turnpike property was established under 
some restrictive assumptions. The approaches devel- 
oped in [4,5,26] were based on Pontryagin’s maxi- 
mum principle. The results obtained by Panasyuk and 
Panasyuk [18] found some interesting applications in 
engineering where the corresponding restrictive as- 
sumptions hold. 

Summarizing these results, we observe that the tech- 
niques developed for continuous systems in the form 
of (3) have not been successful in the establishment of 
turnpike property for convex problems. The additional 
assumptions were too restrictive in terms of application 
to a wide range of practical problems. 

However, it was the common opinion that this flaw 
was due to the drawbacks of the techniques developed. 
As mentioned above, these techniques were based on 
the necessary conditions of optimality. We think that 
the use of necessary conditions (for example, Pontrya- 
gin’s maximum principle) generates serious difficulties 
in the proof of turnpike property. New techniques were 
required that could allow us to avoid the use of neces- 
sary conditions. 

We note that there are many studies in the literature 
that aimed to study the behavior of optimal trajecto- 
ries for different systems. We refer to [25] for more in- 
formation and references. For example, Zaslavski [25] 
obtained important results for variational problems re- 
garding the turnpike behavior of optimal trajectories. 

In the following we present some results obtained 
in [10,11,12,13,15]. These studies introduced new tech- 
niques for problem (3) that did not use necessary con- 
ditions. In this way, we succeeded in establishing the 
turnpike property not only for convex problems but 
also for some classes of nonconvex problems. 
Turnpike Theorems for Terminal Functionals


Functional (4) was first introduced by Lyapunov [8] in 1983 for discrete systems. It can be considered as an analog of the terminal functional reformulated for the interval $[0, \infty)$. This functional turned out to be very convenient for studying the turnpike property.

The turnpike property for this functional was established in [10,11], even for nonconvex problems in continuous time. These results were obtained on the basis of newly developed techniques, which have been applied to some nonconvex practical optimal control problems. Some applications of these techniques to discrete-time systems can be found in [14,16,19].

We consider the system $\dot x \in a(x)$ on the interval $[0, \infty)$.

Definition 3 A trajectory x(t) is called optimal if $J(x(\cdot)) \ge J(\tilde x(\cdot))$ holds for all trajectories $\tilde x(t)$ starting from the same initial state, $\tilde x(0) = x(0)$.

Definition 4 The set

$$\mathfrak{M} = \{x \in \mathbb{R}^n : 0 \in \operatorname{co} a(x)\}$$

is called the set of generalized stationary points. Here "co" stands for the convex hull. Clearly $M \subset \mathfrak{M}$.

We will also use the notation

$$a(A) = \bigcup_{x \in A} a(x).$$


Functional $\liminf_{t\to\infty} u(x(t))$

In this section we consider the problem

$$\dot x \in a(x), \qquad J(x(\cdot)) = \liminf_{t\to\infty} u(x(t)) \to \max. \qquad (7)$$


The main condition that will be imposed on the mapping a is the following.

Condition A: Given any set $A \subset \mathbb{R}^n$,

$$\text{if } 0 \in \operatorname{co} a(A) \text{ then } 0 \in \operatorname{co} a(x) \text{ for some } x \in \operatorname{co} A. \qquad (8)$$

If the mapping a has convex images a(x) for all x, then (8) can be reformulated as

$$\text{if } 0 \in \operatorname{co} a(A) \text{ then } 0 \in a(\operatorname{co} A). \qquad (9)$$

We denote by $\mathfrak{A}$ the class of continuous set-valued mappings a satisfying Condition A. The following lemma shows that $\mathfrak{A}$ contains the class of mappings having a convex graph.

Lemma 1 If graph a is convex, then Condition A holds.

Denote $J^* = \max_{x \in \mathfrak{M}} u(x)$. Let $x^* \in \mathfrak{M}$ be a point for which $u(x^*) = J^*$. If the set $\mathfrak{M}$ is convex and u(x) is strictly concave, then the point $x^*$ is unique.

The main results are combined in the following the- 
orems. 


Theorem 1 Assume that $a \in \mathfrak{A}$ and the function u(x) is concave. Then the inequality $J(x(\cdot)) \le J^*$ holds for all trajectories x(t).

Theorem 2 Assume that $a \in \mathfrak{A}$, the function u(x) is strictly concave, and $\mathfrak{M}$ is convex and compact. If a trajectory x(t) is such that $J(x(\cdot)) = J^*$, then $\lim_{t\to\infty} x(t) = x^*$.

If $J(x(\cdot)) = J^*$, then it follows from the first theorem that the trajectory x(t) is optimal. The second theorem provides the turnpike property: all optimal trajectories satisfying $J(x(\cdot)) = J^*$ converge to $x^*$.

It is important to note that, in this way, the turnpike property is established for a special class $\mathfrak{A}$ of nonconvex set-valued mappings a. This class contains the mappings a having convex graphs. Therefore, for the convex problem (7), the turnpike property holds without any additional assumptions.


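Functional $\liminf_{t\to\infty} u(x(t), \dot x(t))$

Now we consider the problem

$$\dot x \in a(x), \qquad J_1(x(\cdot)) = \liminf_{t\to\infty} u(x(t), \dot x(t)) \to \max. \qquad (10)$$

The main condition in this case is the following.

Condition B: Given any set $Q \subset \operatorname{graph} a$,

$$\text{if } \operatorname{co} Q \cap (\mathbb{R}^n \times \{0\}) \ne \emptyset \text{ then } \operatorname{co} Q \cap (M \times \{0\}) \ne \emptyset. \qquad (11)$$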


We denote by $\mathfrak{B}$ the class of continuous set-valued mappings a satisfying Condition B. We have the following properties.

Lemma 2 If graph a is convex, then Condition B holds. If Condition A holds, then Condition B holds too.

Denote the set of continuous set-valued mappings with a convex graph by $\mathfrak{C}$. From Lemmas 1 and 2 we have the relation

$$\mathfrak{C} \subset \mathfrak{A} \subset \mathfrak{B}.$$

Denote $J_1^* = \max_{x \in \mathfrak{M}} u(x, 0)$, where $\mathfrak{M}$ is defined in Definition 4.

Let $x^* \in \mathfrak{M}$ be a point for which $u(x^*, 0) = J_1^*$. If the set $\mathfrak{M}$ is convex and compact and u(x, y) is strictly concave, then the point $x^*$ is unique.

Similarly to Theorems 1 and 2, we have the following results.

Theorem 3 Assume that $a \in \mathfrak{B}$ and the function u(x, y) is concave. Then the inequality $J_1(x(\cdot)) \le J_1^*$ holds for all trajectories x(t).

Theorem 4 Assume that $a \in \mathfrak{B}$, the function u(x, y) is strictly concave, and $\mathfrak{M}$ is convex and compact. If a trajectory x(t) is such that $J_1(x(\cdot)) = J_1^*$, then $\lim_{t\to\infty} x(t) = x^*$.

Therefore, for problem (10), the turnpike property is established for a special class $\mathfrak{B}$ of nonconvex set-valued mappings a. This class contains the mappings a having convex graphs.

Now we present some interesting examples related to Theorems 1-4 and the classes $\mathfrak{A}$ and $\mathfrak{B}$.


Example 1 Let A be an n × n matrix, B an n × r matrix, and $V \subset \mathbb{R}^r$ a closed set (not necessarily convex). Then the mapping defined by

$$a(x) = \{Ax + Bv : v \in V\}$$

belongs to the class $\mathfrak{A}$ (and consequently, by Lemma 2, to $\mathfrak{B}$).


Example 2 Let $x = (x_1, x_2) \in \mathbb{R}^2$ and $a(x) = \{-(x_1, x_2),\ (x_1, 0)\}$. It is not difficult to show that $a \in \mathfrak{A}$.


The following example shows that, in Theorem 4, the convergence $\dot x(t) \to 0$ may fail even though $x(t) \to x^*$.


Example 3 Let x € Ru(x,y) = /x + //y +1, and 
a(x) = [−1, 1] if x ∈ [0, 1], and a(x) = {1} if x > 1. We have $\mathfrak{M} = [0, 1]$ and $J_1^* = 2$. Consider the trajectory x(t) defined as follows. On each interval [m, m + 1], m = 1, 2, ..., we set x(t) = t − m if t ∈ [m, m + 1/(2m)], and x(t) = −t/(2m − 1) + (m + 1)/(2m − 1) if t ∈ [m + 1/(2m), m + 1]. It is not difficult to show that $J_1(x(\cdot)) = 2 = J_1^*$ (i.e., the turnpike property holds: x(t) → 0). However, $\dot x(t)$ does not converge to 0.
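The behavior of the trajectory in Example 3 can be checked directly. The sketch below is plain Python and assumes t ≥ 1. It implements the piecewise-linear x(t) defined above and its derivative. The checks confirm three things: $\dot x(t) \in a(x(t)) = [-1, 1]$, the peak value 1/(2m) on [m, m + 1] tends to 0, and $\dot x(t) = 1$ on every rising segment. The sketch examines the trajectory only; it does not evaluate u or $J_1$.

```python
import math


def x_path(t):
    """Trajectory of Example 3 for t >= 1."""
    m = int(math.floor(t))
    switch = m + 1 / (2 * m)
    if t <= switch:
        return t - m                                  # rising part, slope +1
    return -t / (2 * m - 1) + (m + 1) / (2 * m - 1)   # falling part


def x_dot(t):
    """Derivative of x_path away from the switching points."""
    m = int(math.floor(t))
    return 1.0 if t < m + 1 / (2 * m) else -1 / (2 * m - 1)


for m in (1, 5, 50, 500):
    peak = x_path(m + 1 / (2 * m))
    assert math.isclose(peak, 1 / (2 * m))                  # max of x on [m, m+1]
    assert math.isclose(x_path(m + 1), 0.0, abs_tol=1e-12)  # back to 0
    assert x_dot(m + 1 / (4 * m)) == 1.0                    # slope 1 on the rise
    # the dynamics stay admissible: x in [0, 1] and x_dot in a(x) = [-1, 1]
    for k in range(1, 100):
        t = m + k / 100
        assert 0.0 <= x_path(t) <= 1.0 and -1.0 <= x_dot(t) <= 1.0
```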


Turnpike Theorems for Integral Functionals 


In this section we consider problem (3) with integral functionals. For the sake of simplicity, we will only consider the following problem:

$$\dot x \in a(x), \quad x(0) = x^0, \qquad J_T(x(\cdot)) = \int_0^T u(x(t))\,dt \to \max. \qquad (12)$$

We denote by $X_T$ the set of trajectories defined on the interval [0, T]. Let

$$J_T^* = \sup_{x(\cdot) \in X_T} J_T(x(\cdot)).$$

Definition 5 A trajectory x(·) is called optimal if $J_T(x(\cdot)) = J_T^*$, and it is called ξ-optimal (ξ > 0) if

$$J_T(x(\cdot)) \ge J_T^* - \xi.$$

Definition 6 A point $x^* \in M$ is called an optimal stationary point if

$$u(x^*) = u^* := \max_{x \in M} u(x).$$

Here M is the set of all stationary points. We assume that the set M is not empty.

The turnpike theorem is proved under two main 
conditions: Conditions M and H given below. The first 
condition concerns the existence of “good” trajectories 
starting from the initial state x°. The second is the main 
condition that provides the turnpike property. 

Condition M: There exists $b < +\infty$ such that for every T > 0 there is a trajectory $x(\cdot) \in X_T$ satisfying the inequality

$$J_T(x(\cdot)) \ge T u^* - b.$$

Set

$$B = \{x \in \Omega : u(x) \ge u^*\}.$$

We fix $p \in \mathbb{R}^n$, $p \ne 0$, and define a support function

$$c(x) = \max_{y \in a(x)} py.$$

Here the notation py means the scalar product of the vectors p and y. By |c| we denote the absolute value of c. We also define the function

$$\varphi(x, y) = \frac{u(x) - u^*}{|c(x)|} + \frac{u(y) - u^*}{c(y)}.$$
Condition H: There exists a vector $p \in \mathbb{R}^n$ such that
H1 $c(x) < 0$ for all $x \in B$, $x \ne x^*$;
H2 there exists a point $\bar x \in \Omega$ such that $p\bar x = px^*$ and $c(\bar x) > 0$;
H3 for all points x, y for which px = py, c(x) < 0 and c(y) > 0, the inequality φ(x, y) < 0 holds. Moreover, if

$$x_k \to x^*, \quad y_k \to y' \ne x^*, \quad px_k = py_k, \quad c(x_k) < 0, \quad c(y_k) > 0,$$

then $\limsup_{k\to\infty} \varphi(x_k, y_k) < 0$.

Now we formulate the main result.


Theorem 5 Assume that Conditions M and H are satisfied and the optimal stationary point $x^*$ is unique. Then:
1) There exists $C < +\infty$ such that

$$\int_0^T (u(x(t)) - u^*)\,dt \le C$$

for all T > 0 and all trajectories $x(\cdot) \in X_T$.
2) For every ε > 0 there exists $K_{\varepsilon,\xi} < +\infty$ such that

$$\operatorname{meas}\{t \in [0, T] : \|x(t) - x^*\| > \varepsilon\} \le K_{\varepsilon,\xi}$$

for all T > 0 and all ξ-optimal trajectories $x(\cdot) \in X_T$.
3) If x(·) is an optimal trajectory and $x(t_1) = x(t_2) = x^*$, then $x(t) = x^*$ for all $t \in [t_1, t_2]$.


This theorem has two major advantages compared with the results obtained by others, including [20,21,22,23]:

1. Theorem 5 does not use the Hamiltonian. It uses conditions imposed directly on the mapping a and the function u. Thus, these conditions can be verified for a given particular problem.

2. The main condition in Theorem 5 is H3. It can be considered as the relation between the mapping a and the function u that provides the turnpike property.

We will see below that Conditions H1 and H3 hold if the graph of the mapping a is a convex set (in $\mathbb{R}^n \times \mathbb{R}^n$) and the function u is strictly concave. On the other hand, Condition H may hold for mappings a having nonconvex graphs and for functions u that are not strictly concave. Therefore, Theorem 5 establishes the turnpike property for nonconvex problems.


Convex Problems 

Now we consider problem (12) assuming that the graph of the mapping a is a convex set and the function $u: \Omega \to \mathbb{R}$ is strictly concave. Let

$$J_T^* = \sup_{x(\cdot) \in X_T} J_T(x(\cdot)).$$

In this section, we present a result showing that Theorem 5 is valid for a convex problem without assuming Condition H. In particular, this means that the turnpike property holds for the convex problem (12) without any restrictive additional assumptions.
We have the following result.


Lemma 3 Assume that graph a is a compact set, the function u is strictly concave, and

$$0 \in \operatorname{int} a(x) \quad \text{for some } x \in M. \qquad (13)$$

Then Conditions H1 and H3 hold.

We note that Condition H2 may not be satisfied even if condition (13) holds. This can be seen from the following example.

Example 4 Let $\Omega = [-1, 1] \subset \mathbb{R}^1$ and $a(x) = [-1, \xi(x)]$,

where


4 1\? 1 
sa) =- (+5) +2, elas, 


Consider the function $u(x) = 1 - (x - 1)^2$.

For this problem the function u is strictly concave and the graph of the mapping a is a convex set. We have M = [−1, 0], $u^* = \max_{x \in M} u(x) = 0$ and $x^* = 0$.

It is not difficult to observe that condition (13) holds for the point x = −1/2.

Consider Condition H. We have B = [0, 1]. Condition H1 is satisfied for the points $p \in \mathbb{R}^1$, p > 0 (for p < 0 it fails, since then c(x) = −p > 0).

Now we check Condition H2. Take any $p \in \mathbb{R}^1$, $p \ne 0$. If $p\bar x = px^*$, then $\bar x = x^* = 0$ and

$$c(\bar x) = \max_{y \in a(\bar x)} py = \max_{y \in [-1, 0]} py,$$

which equals 0 for every p > 0.

Therefore, Condition H2 is not satisfied for any p for which H1 holds, so Condition H fails. The main result of this section is the following.
Theorem 6 Assume that the function u is strictly concave and Conditions M and (13) hold. Then an optimal stationary point $x^*$ exists, is unique, and all assertions of Theorem 5 are true.

Condition (13) is important. Example 5 shows that if this condition does not hold, then Theorem 6 may fail.


Example 5 Let $\Omega = [-1, 1] \subset \mathbb{R}^1$,

$$a(x) = \begin{cases} \{-x^4 + v : v \in [x^4 - 1, 0]\} & \text{if } 0 < x \le 1, \\ \{v : v \in [-1, 0]\} & \text{if } -1 \le x \le 0, \end{cases}$$

and $u(x) = 1 - (x - 1)^2$. Here v is the control.

It is clear that the function u is strictly concave and the graph of the mapping a is a convex set. We have M = [−1, 0], $u^* = \max_{x \in M} u(x) = 0$ and $x^* = 0$.

It is not difficult to observe that condition (13) is not satisfied. We will show that Theorem 6 does not hold in this case.

We take the initial point $x^0 = 1$ and consider the trajectory corresponding to the control v(t) = 0. This trajectory can be calculated as the solution of the differential equation

$$\dot x = -x^4, \qquad x(0) = 1.$$

We have $x(t) = (3t + 1)^{-1/3}$. Clearly $0 < x(t) \le 1$ and $x(t) \to 0$ as $t \to \infty$. Therefore, x(t) is a trajectory. Since $1 - (x - 1)^2 = x(2 - x) \ge x$ for $0 < x \le 1$, we have

$$\int_0^T (u(x(t)) - u^*)\,dt = \int_0^T [1 - (x(t) - 1)^2]\,dt \ge \int_0^T x(t)\,dt = \int_0^T (3t + 1)^{-1/3}\,dt = \frac{(3T + 1)^{2/3} - 1}{2} \to +\infty$$

as $T \to \infty$. Therefore, the first assertion of Theorem 5, which Theorem 6 claims, is not true.
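The divergence in Example 5 is easy to reproduce numerically. The sketch below is plain Python, and the step counts are arbitrary choices. It performs four checks:

- It encodes the interval a(x) and confirms that 0 ∈ a(x) exactly for x ≤ 0, but never in the interior of a(x).
- It integrates $\dot x = -x^4$ by the explicit Euler method.
- It compares the Euler result with $x(t) = (3t + 1)^{-1/3}$.
- It evaluates $\int_0^T (u(x(t)) - u^*)\,dt$ by the midpoint rule and compares it with the lower bound $((3T + 1)^{2/3} - 1)/2$.

```python
def a_interval(x):
    """Endpoints (lo, hi) of the interval a(x) in Example 5, for x in [-1, 1]."""
    if 0 < x <= 1:
        return (-1.0, -x ** 4)      # {-x^4 + v : v in [x^4 - 1, 0]}
    return (-1.0, 0.0)              # {v : v in [-1, 0]}


def x_exact(t):
    return (3 * t + 1) ** (-1 / 3)


def euler(x0, T, n):
    """Explicit Euler for x' = -x**4 (the control v = 0)."""
    h, x = T / n, x0
    for _ in range(n):
        x += h * (-x ** 4)
    return x


def u(x):
    return 1 - (x - 1) ** 2          # u* = max over M = [-1, 0] is u(0) = 0


def integral_excess(T, n=100_000):
    """Midpoint rule for the integral of u(x(t)) - u* over [0, T]."""
    h = T / n
    return h * sum(u(x_exact((k + 0.5) * h)) for k in range(n))


def lower_bound(T):
    return ((3 * T + 1) ** (2 / 3) - 1) / 2


grid = [i / 10 for i in range(-10, 11)]
stationary = [x for x in grid if a_interval(x)[0] <= 0 <= a_interval(x)[1]]
interior = [x for x in grid if a_interval(x)[0] < 0 < a_interval(x)[1]]
print(min(stationary), max(stationary), interior)     # -1.0 0.0 []
print(abs(euler(1.0, 10.0, 100_000) - x_exact(10.0)))
for T in (10, 100, 1000):
    print(T, integral_excess(T), lower_bound(T))
```

The integral keeps growing with T, so no constant C as in the first assertion of Theorem 5 can exist.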


Other Results 


Some generalizations of the results presented above, 
involving functionals in (5) and (6), can be found 
in [11,12,13,15]. Moreover the case when the optimal 
stationary point is not unique is also considered. 


See also 


> Statistical Convergence and Turnpike Theory 
Anticipation, Learning, and Adaptation


Two-stage stochastic programming models incorporate 
three major mechanisms facilitating our response (or 
survival) to uncertainty and changing conditions: an- 
ticipation, learning, and adaptation. Uncertainty and 
potential abrupt changes are pervasive characteristics 
of most on-going socio-economic and environmental 
changes. In order to manage such processes we must 
develop robust strategies incorporating all these mech- 
anisms: the long term anticipative (forward looking, 
ex-ante) actions (policy setting, allocation of resources, 
engineering design, pre-disaster planning, etc.); learning (by doing, research, observations); and the short-term adaptive adjustments (defensive driving, marketing, inventory, control, post-disaster adaptation, etc.). The standard expected utility theory considers these mechanisms independently, suggesting either anticipative (risk averse) or adaptive (risk prone) decisions. This decision paradigm often directs real policy debate, e.g., on CO₂ stabilization strategies, emphasizing either immediate actions or wait-and-see adaptation after full information becomes available.

The following simple example illustrates that, according to the two-stage modeling approach, in general only a part of the risk is managed by anticipative decisions. The other part is managed by properly designed adaptive decisions connected with them. The example shows that strong interdependencies among ex-ante and ex-post decisions induce endogenous risk aversion even in linear models. It also illustrates potential advantages of SQG methods.


Safety Constraints and CVaR Risk Measures 


A stylized climate stabilization (two-stage stochastic programming) problem [8] can be formulated as follows. Let x denote an amount of emission reduction, and let a random variable β denote an uncertain critical level of required emission reduction. Ex-ante emission reductions x ≥ 0 with costs cx may underestimate β. A linear total adaptation cost is dy, where y ≥ 0 is an ex-post adaptation. Let us assume that ex-post adaptive capacity is unlimited (in general, it must be developed ex-ante) and that c < d. The two-stage model is formulated as the minimization of the expected total cost cx + dEy subject to the constraint x + y ≥ β. This problem is equivalent to the minimization of the function F(x) = cx + E min{dy | x + y ≥ β, y ≥ 0}, or F(x) = cx + dE max{0, β − x}, which is a simple minimax problem.

Optimality conditions for these types of problems show ([4,5], pp 107, 416) that the optimal ex-ante solution is the critical quantile $x^* = \beta_p$, assuming the distribution of β has a density. This quantile satisfies the safety constraint $\Pr[x \ge \beta] \ge p$ for $p = 1 - c/d$. This is a remarkable result: a highly nonlinear and often even discontinuous safety (or chance) constraint is derived (justified) from an explicit introduction of ex-post second-stage decisions y. In other words, although the two-stage model is linear in the variables x, y, strong risk aversion is induced among the ex-ante decisions, characterized by the critical quantile $\beta_p$. Only the slice $\beta_p$ of the risk is managed ex-ante, whereas the rest, $y^* = \max\{0, \beta - \beta_p\}$, is adapted ex-post.

It is easy to see that the optimal value is $F(x^*) = dE\,\beta I(\beta > x^*)$, where I(·) is the indicator function. Since $\Pr[\beta > x^*] = c/d$, this equals $c\,E[\beta \mid \beta > \beta_p]$. That is, it is c times the expected shortfall or Conditional Value-at-Risk (CVaR) risk measure [9].
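The quantile rule and the CVaR form of the optimal value can be verified in closed form when β is uniformly distributed on [0, 1]. This distribution is assumed here only for illustration. Under it:

- E max{0, β − x} = (1 − x)²/2,
- the p-quantile is β_p = p,
- E[β I(β > x)] = (1 − x²)/2,
- E[β | β > β_p] = (1 + p)/2.

The sketch below uses exact fractions with the hypothetical costs c = 1 and d = 4, so that p = 3/4.

```python
from fractions import Fraction as Fr


def F(x, c, d):
    """F(x) = c x + d E max{0, beta - x} for beta ~ Uniform(0, 1), 0 <= x <= 1."""
    return c * x + d * (1 - x) ** 2 / 2


c, d = Fr(1), Fr(4)
p = 1 - c / d                     # critical probability 1 - c/d = 3/4
x_star = p                        # beta_p = p for the uniform distribution

expected_tail = (1 - x_star ** 2) / 2        # E[beta I(beta > x*)]
cvar = (1 + p) / 2                           # E[beta | beta > beta_p]

print(F(x_star, c, d), d * expected_tail, c * cvar)   # 7/8 7/8 7/8
grid = [Fr(k, 100) for k in range(101)]
assert min(grid, key=lambda x: F(x, c, d)) == x_star
```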

In more realistic models, β is defined as a rather complex process dependent on scenarios of the future global energy system, land use changes, demographic dynamics, etc. In these cases it is practically impossible to derive the distribution of β analytically. Instead, only random scenarios of β, $\beta^0, \beta^1, \beta^2, \ldots$, can be generated, providing sufficient information for SQG methods. F(x) is a convex and nonsmooth function. Its SQG is $\xi = \xi(x, \beta) = c - d$ for β > x, and ξ = c otherwise. Therefore, the SQG projection method for k = 0, 1, ... is defined as follows:

$$x^{k+1} = \max\{0,\ x^k - \rho_k\, \xi(x^k, \beta^k)\}. \qquad (1)$$
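The projection method (1) takes only a few lines. The sketch below is a plain-Python illustration. For the example only, it assumes β ~ Uniform(0, 1), c = 1, d = 4 and the step sizes $\rho_k = 1/(k + 1)$; none of these choices come from the text. With these values the iterates should settle near the critical quantile $\beta_p$ with p = 1 − c/d = 0.75.

```python
import random


def sqg_projection(c, d, sample_beta, n_iter=100_000, x0=0.0, seed=0):
    """Stochastic quasigradient projection method for F(x) = c x + d E max{0, beta - x}."""
    rng = random.Random(seed)
    x = x0
    for k in range(n_iter):
        beta = sample_beta(rng)               # one scenario beta^k
        xi = c - d if beta > x else c         # SQG xi(x^k, beta^k)
        rho = 1.0 / (k + 1)                   # step size rho_k (an assumption)
        x = max(0.0, x - rho * xi)            # projection onto x >= 0
    return x


x_hat = sqg_projection(1.0, 4.0, lambda rng: rng.random())
print(x_hat)    # close to 0.75, the 0.75-quantile of Uniform(0, 1)
```

Only scenario samples of β are used; the distribution of β itself is never evaluated. This matches the situation described above for complex, simulation-based models of β.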


General Model 


The model incorporates two types of independent decisions. The ex-ante (risk averse, anticipative) decision $x \in \mathbb{R}^n$ of the first stage is made on the basis of a priori information about random uncertain variables ω. The second-stage ex-post (risk prone, adaptive) decision $y \in \mathbb{R}^r$ is chosen after an additional observation of ω. For known ω, the decisions x, y are evaluated by some functions $g_i(x, y, \omega)$, i = 0, 1, ..., m, which define the constraints

$$g_i(x, y, \omega) \le 0, \quad i = 1:m \qquad (2)$$

and the objective function $g_0(x, y, \omega)$. The ex-ante decision x is chosen before ω is observed. It therefore cannot properly anticipate ω and, hence, cannot satisfy (2) exactly. The ex-post decision y makes it possible to fulfill (2) after information on ω is revealed. It minimizes $g_0(x, y, \omega)$ for given x, ω subject to (2) and some additional constraints y ∈ Y, such as y ≥ 0. Let us denote the feasible set of this standard deterministic problem as Y(x, ω) and an optimal solution as y(x, ω). In various important applications y(x, ω) is easily calculated, and its existence can be easily ensured by introducing some auxiliary variables. The function $g_0(x, y, \omega)$ reflects a trade-off between choosing some options x now and postponing other options y until full information on ω becomes available. The general two-stage problem is to find $x \in X \subset \mathbb{R}^n$ minimizing

$$F(x) = E f(x, \omega), \qquad (3)$$

where $f(x, \omega) = g_0(x, y(x, \omega), \omega)$. Besides deterministic constraints of type x ∈ X, there may also be general constraints of STO problems formulated in terms of some other random functions $f_i(x, \omega)$, i = 1, 2, ....

The random objective function f(x, ω) in (3) is a rather general, implicitly defined nonsmooth function, even when the functions $g_i(x, y, \omega)$, i = 0, 1, ..., m, are linear in (x, y). Hence F(x) is, in general, also a nonsmooth function, and general-purpose SQG methods designed for nonsmooth optimization problems are applicable for minimizing (3). In fact, as the following sections show, there are fundamental obstacles to using other solution techniques even for problems with linear functions. Now consider specific SQG methods that exploit the structure of the function (3).


Convex Case 


Assume that $g_i(x, y, \omega)$, $i = 0, \ldots, m$, are convex functions of $(x, y)$ and that a solution $\lambda(x, \omega)$ dual to the solution $y(x, \omega)$ is given. Let $(g_{ix}, g_{iy})$ be a subgradient of the function $g_i(\cdot, \omega)$ in the variables $(x, y)$ at the point $(x^k, y^k)$, $k = 0, 1, \ldots$. A stochastic subgradient of the function (3) takes the form [2,5], pp 16, 171,

$$\xi(k) = g_{0x}(x^k, y^k, \omega^k) + \sum_{i=1}^{m} \lambda_i(x^k, \omega^k)\, g_{ix}(x^k, y^k, \omega^k), \tag{4}$$

where $y^k = y(x^k, \omega^k)$ and $\omega^0, \ldots, \omega^k$ are independent samples of $\omega$. This vector can now be used in various SQG methods.
Example 1 Linear functions Assume that $x \in X$, $Y = \mathbb{R}^r_+$, and $g_0 = (c, x) + (d, y)$, $g_i = (a^i, x) + (W^i, y) - b_i$, where $d, b, a^i, W^i$ are random vectors, i.e., $\omega = \{d, b, a^i, W^i,\ i = 1{:}m\}$. Let us introduce the matrices $A = (a^1, \ldots, a^m)$, $W = (W^1, \ldots, W^m)$. By using SQG projection methods with $\xi(k)$ defined by (4), we obtain the following procedure. Let $x^0$ be an arbitrary initial approximate solution and let $\omega^0, \ldots, \omega^k$ be independent observations of $\omega$, $\omega^k = (d^k, b^k, A^k, W^k)$, where $d^k, b^k, A^k, W^k$ are observations of the random vectors $d, b$ and matrices $A, W$. Solve the linear problem (for given $x^k, \omega^k$) $\min\{(d^k, y) : W^k y \le b^k - A^k x^k,\ y \ge 0\}$; calculate the dual variables $\lambda(x^k, \omega^k)$ and $\xi(k) = c + \lambda(x^k, \omega^k) A^k$, and set

$$x^{k+1} = \Pi_X\left[x^k - \rho_k \xi(k)\right], \tag{5}$$

where $\Pi_X$ denotes projection onto $X$, $\rho_k > 0$ is the step size, and $k = 0, 1, \ldots$. This method was first proposed in [1,2] (see also references in [5], pp 169-171, [7], pp 213-215). It is important to note that the SQG method (5) can be regarded as a stochastic decomposition procedure for extremely large-scale problems, which often cannot be solved by conventional deterministic techniques [6].
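The projection procedure (5) can be sketched on a toy instance of Example 1. This is a minimal illustration under stated assumptions, not the original authors' code. The data are hypothetical: a scalar $x \in X = [0, 10]$ with cost $c = 1$, and a single recourse constraint $-x - y + \omega \le 0$ with cost $q = 2$ per unit of $y$ and $\omega$ uniform on $[0, 10]$. The optimal dual of that constraint is $q$ when $\omega > x$ and $0$ otherwise, so (4) reduces to $\xi(k) = c - \lambda$. The step rule $\rho_k = 5/(k+1)$ is an assumption that satisfies the usual conditions $\sum \rho_k = \infty$, $\sum \rho_k^2 < \infty$.

```python
import random


def sqg_projection(c, q, sample_w, lo, hi, x0, n_iter, step0, seed=0):
    """Projected SQG method x^{k+1} = Pi_X[x^k - rho_k * xi(k)] on a toy two-stage LP.

    Hypothetical one-dimensional instance of Example 1:
      first stage : cost c * x, X = [lo, hi]
      second stage: min q * y  s.t.  -x - y + w <= 0,  y >= 0
    The optimal dual variable of the recourse constraint is q if w > x, else 0,
    so formula (4) gives xi(k) = c + lam * (-1) = c - lam.
    """
    rng = random.Random(seed)
    x = x0
    for k in range(n_iter):
        w = sample_w(rng)                    # independent observation omega^k
        lam = q if w > x else 0.0            # dual variable lambda(x^k, omega^k)
        xi = c - lam                         # stochastic quasigradient xi(k)
        rho = step0 / (k + 1)                # divergent sum, summable squares
        x = min(hi, max(lo, x - rho * xi))   # projection onto X = [lo, hi]
    return x


# With w uniform on [0, 10], c = 1, q = 2, optimality requires P(w > x) = c / q,
# i.e. x* = 5 for this hypothetical data.
x_est = sqg_projection(1.0, 2.0, lambda r: r.uniform(0.0, 10.0),
                       0.0, 10.0, 0.0, 20000, 5.0)
print(round(x_est, 2))
```

Each iteration solves only one small second-stage problem for a single sample. This is why (5) behaves as a stochastic decomposition scheme.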


Stochastic Decomposition Techniques 


Assume that $\omega = (d, b, A, W)$ has only a finite number of possible states (scenarios) $\omega^s = (d^s, b^s, A^s, W^s)$, $s = 1{:}N$, with probabilities $p_s$, $\sum_{s=1}^{N} p_s = 1$. Then the problem with linear functions and $X = \mathbb{R}^n_+$ is equivalent to the following deterministic large-scale linear problem: minimize

$$(c, x) + p_1(d^1, y(1)) + \cdots + p_N(d^N, y(N)) \tag{6}$$

subject to

$$A^1 x + W^1 y(1) \le b^1, \quad \ldots, \quad A^N x + W^N y(N) \le b^N,$$

$$x \ge 0, \quad y(1) \ge 0, \ \ldots, \ y(N) \ge 0.$$


The number $N$ may be very large: if only the vector $b = (b_1, \ldots, b_m)$ is random and each component $b_1, \ldots, b_m$ has two independent outcomes, then $N = 2^m$. Hence the deterministic problem (6) cannot be solved by standard optimization techniques even with a small number of constraints, say $m = 100$, and a general random matrix $A$. The SQG procedure (5) is also applicable to other deterministic problems with an arbitrary block-diagonal structure of type (6), since any objective function $(a, x) + (B^1, y(1)) + \cdots + (B^N, y(N))$ can be rewritten in the form of the expectation (6) with $c = a$,

$$d^s = B^s / p_s, \quad p_s > 0, \quad \sum_{s=1}^{N} p_s = 1.$$
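The growth $N = 2^m$ can be made concrete by enumerating the scenarios directly. The sketch below is illustrative only. It assumes each $b_i$ independently takes one of two values, and it evaluates the expected second-stage cost for a hypothetical simple-recourse special case $y_i(s) = \max(b_i^s - x_i, 0)$. The helper names `enumerate_scenarios` and `expected_recourse` are invented for this example.

```python
from itertools import product


def enumerate_scenarios(low, high, p_low):
    """All N = 2**m scenarios when each b_i independently equals low[i]
    (probability p_low[i]) or high[i] (probability 1 - p_low[i])."""
    scenarios = []
    for pattern in product((0, 1), repeat=len(low)):
        prob, b = 1.0, []
        for i, bit in enumerate(pattern):
            prob *= p_low[i] if bit == 0 else 1.0 - p_low[i]
            b.append(low[i] if bit == 0 else high[i])
        scenarios.append((prob, b))
    return scenarios


def expected_recourse(x, d, scenarios):
    """sum_s p_s (d, y(s)) for the hypothetical simple recourse y_i(s) = max(b_i^s - x_i, 0)."""
    total = 0.0
    for prob, b in scenarios:
        total += prob * sum(di * max(bi - xi, 0.0) for di, bi, xi in zip(d, b, x))
    return total


scen = enumerate_scenarios([0.0] * 3, [1.0] * 3, [0.5] * 3)
print(len(scen), expected_recourse([0.0] * 3, [1.0] * 3, scen))
print(2 ** 100)   # number of scenario blocks in (6) when m = 100
```

Exact enumeration is feasible only for small $m$. Procedure (5) instead samples one scenario per iteration and never forms (6).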


Example 2 Managing agricultural risks This example illustrates the nonsmooth character of the objective function (3), which prohibits the use of standard stochastic approximation procedures. The main issue is to evaluate the need for an irrigation system. If the river water level is characterized by its average value, the decision to use irrigation is trivial and depends, in particular, on whether the profit $d_1$ per hectare of irrigated area is greater than the profit $d_3$ from a hectare without irrigation. The stochastic variation of the river water level creates essential difficulties. In situations of low water levels, the land prepared in advance can only be partially supplied with additional water, resulting in a profit $d_2$ per hectare on the remainder of that land. Besides this, the situation may also be affected by variations in water prices: it is easy to imagine a scenario in which, in a dry season, the use of irrigation water becomes unprofitable although irrigation is profitable under average conditions. Now suppose that $Q$ is the level of available water and $q$ is the amount of water required to irrigate one hectare. Denote by $x$, $x \le a$, the area that must be prepared in advance for irrigation, where $a$ is the total irrigable acreage. There may be two types of risk: when $Q < xq$, there is the risk of forgoing the profit per hectare of irrigated land; when $Q > xq$, there is the risk of forgoing the profit per hectare of land not prepared in advance for irrigation. These risks depend on the choice of the ex-post decisions $y = (y_1, y_2, y_3)$, where $y_1$ is the use of irrigated land, $y_2$ is the use of land that was prepared for irrigated cultivation but cannot be irrigated, and $y_3$ is the use of land that was not prepared for irrigation. Let $\omega = (Q, d_1, d_2, d_3)$, and let $c$ be the cost per hectare of irrigated land. The ex-ante and ex-post decisions $x, y_1, y_2, y_3$ are connected by the relations $y_1 + y_2 \le x$, $0 \le x \le a$, $y_1 + y_2 + y_3 \le a$, $Q/q \ge y_1$, $y_1 \ge 0$, $y_2 \ge 0$, $y_3 \ge 0$. The decision vector $y(x, \omega)$ maximizes the profit $d_1 y_1 + d_2 y_2 + d_3 y_3$ subject to these constraints. The sample objective function can be defined as $f(x, \omega) = -r(x, \omega)$, where $r(\cdot)$ is the revenue function $r(x, \omega) = -cx + d_1 y_1(x, \omega) + d_2 y_2(x, \omega) + d_3 y_3(x, \omega)$. This function has a complex nonsmooth character because $y_i(x, \omega)$, $i = 1, 2, 3$, are discontinuous functions. Thus if, by chance, $Q > xq$ and $d_1 > d_2$, then $y_1(x, \omega) = x$, $y_2(x, \omega) = 0$, $y_3(x, \omega) = a - x$. But if $Q > xq$ and $d_1 < d_2$, then $y_1(x, \omega) = 0$, $y_2(x, \omega) = x$, $y_3(x, \omega) = a - x$. In the case $Q < xq$, $d_1 \ge d_2$, the values are $y_1(x, \omega) = Q/q$, $y_2(x, \omega) = x - Q/q$, $y_3(x, \omega) = a - x$. The SQG method is defined by (5).
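The case analysis above translates directly into a recourse function and a one-dimensional SQG iteration. This is a minimal sketch under explicit assumptions. All profits are taken as positive, so all available land is used. Only the water level $Q$ is random, and the profits are fixed. The data ($q = 1$, $a = 100$, $c = 1$, $(d_1, d_2, d_3) = (5, 2, 3)$, $Q$ uniform on $[0, 100]$) and the function names are hypothetical. For these data the condition $P(Q > x)(d_1 - d_2) = c + d_3 - d_2$ gives $x^* = 100/3$.

```python
import random


def recourse(x, Q, q, d1, d2, a):
    """Ex-post land use y = (y1, y2, y3) from the case analysis of Example 2.
    Assumes all profits are positive, so all available land is used."""
    y1 = min(x, Q / q) if d1 >= d2 else 0.0   # irrigated prepared land
    y2 = x - y1                               # prepared land used without water
    y3 = a - x                                # land not prepared for irrigation
    return y1, y2, y3


def revenue(x, w, q, c, a):
    """r(x, w) = -c x + d1 y1 + d2 y2 + d3 y3 with w = (Q, d1, d2, d3)."""
    Q, d1, d2, d3 = w
    y1, y2, y3 = recourse(x, Q, q, d1, d2, a)
    return -c * x + d1 * y1 + d2 * y2 + d3 * y3


def quasigradient(x, w, q, c):
    """A (one-sided) derivative of f(x, w) = -r(x, w) with respect to x."""
    Q, d1, d2, d3 = w
    marginal = d1 if (d1 >= d2 and x * q < Q) else d2  # value of one more prepared hectare
    return -(-c + marginal - d3)


def sqg(x0, sample_w, q, c, a, n_iter, step0, seed=0):
    """Projected SQG iteration (5) on X = [0, a] with steps step0 / (k + 1)."""
    rng = random.Random(seed)
    x = x0
    for k in range(n_iter):
        xi = quasigradient(x, sample_w(rng), q, c)
        x = min(a, max(0.0, x - step0 / (k + 1) * xi))
    return x


sample = lambda r: (r.uniform(0.0, 100.0), 5.0, 2.0, 3.0)   # hypothetical w = (Q, d1, d2, d3)
x_star = sqg(50.0, sample, q=1.0, c=1.0, a=100.0, n_iter=50000, step0=100.0)
print(round(x_star, 1))
```

The quasigradient jumps from $-1$ to $2$ as $xq$ crosses $Q$. This kink is the nonsmoothness that rules out classical stochastic approximation.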


Dynamic Two-Stage Problem 


It must be emphasized that the “stages” of the two-stage problem do not necessarily refer to two time units [2,5], pp 16-20, [7]. The vectors $x, y$ may represent sequences of actions $x = (x(0), x(1), \ldots)$, $y = (y(0), y(1), \ldots)$ over a given time horizon, and in addition to the decision variables $x, y$, there may also be a group of variables $z = (z(0), z(1), \ldots)$ that record the state of the system at $t = 0, 1, \ldots$. The variables $x, y, z, \omega$ are often connected through a system of equations $z(t+1) = z(t) + h(t, z(t), x(t), y(t), \omega)$, $t = 0, \ldots, T-1$. The essential new feature of such a dynamic two-stage stochastic programming problem is that the variables $z$ are implicit functions of $x, y, \omega$, in addition to the already rather complex implicit structure of $y(x, \omega)$. This often rules out the use of deterministic optimization techniques.


Example 3 Optimal investments Consider a typical problem of optimal investment under uncertainty. Let $x_i(t)$ be the new capacity made available for electricity-producing technology $i$ at time $t$, and let $z_i(t)$ be the total capacity of $i$ at time $t$. Obviously $z_i(t) = z_i(t-1) + x_i(t) - x_i(t - L_i)$, where $L_i$ is the lifetime of $i$. If $d_j(t)$ is the demand in possible demand mode (scenario) $j$ at time $t = 0, 1, \ldots, T$, and $y_{ij}(t)$ is the capacity of $i$ (effectively) used at time $t$ in mode $j$, then $\sum_j y_{ij}(t) \le z_i(t)$ and $\sum_i y_{ij}(t) = d_j(t)$. Let $c_i(t)$ be the unit investment cost for $i$ at time $t$, and let $q_i(t)$ be the unit production cost. The future costs and total demand can be considered truly random, i.e., the elements forming $\omega$ are $d_j(t)$, $q_i(t)$. The resulting random objective function is the sum of investment and production costs: $f(x, \omega) = \sum_{i,t} c_i(t) z_i(t) + \min \sum_{i,j,t} q_i(t) y_{ij}(t)$. The nonnegative variables $z_i(t)$ are uniquely defined by the variables $x_i(t)$, i.e., $f(x, \omega)$ is an implicit function of $x$. The general scheme for calculating the SQG is as follows. Assume for simplicity that $x_i(t - L_i) = 0$, $t = 0, \ldots, L_i - 1$.


Suppose that at step $k$ we have arrived at approximate ex-ante decision variables $x_i^k(t)$, $t = 0, \ldots, T$. Next, simulate $\omega^k$ composed of $d_j^k(t)$, $q_i^k(t)$; calculate $z_i^k(t)$ and the ex-post variables $y_{ij}^k(t)$. Let $\lambda_i^k(t)$ be the dual variables for the constraints $\sum_j y_{ij}(t) \le z_i^k(t)$. Here we suppose that the demand constraints can always be fulfilled by introducing a fictitious unlimited energy source with a high operating cost. An SQG of $f(x, \omega)$ at $x = x^k$ is defined by using adjoint variables $u_i^k(t)$ (commonly used in control theory) for the dynamic equations for $z_i(t)$ [2], pp 173-175. In our case they obey simple equations: $u_i^k(T) = -c_i(T)$, $u_i^k(t) = u_i^k(t+1) - c_i(t) + \lambda_i^k(t)$ for $t = T-1, \ldots, 1, 0$. The SQG $\xi^k$ consists of the components $\xi_i^k(t) = u_i^k(t + L_i) - u_i^k(t)$ for $t = 0, 1, \ldots, T - L_i$ and $\xi_i^k(t) = -u_i^k(t)$ for $t = T - L_i + 1, \ldots, T$.
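The adjoint recursion can be coded for a single technology $i$, exactly as stated above. This is a sketch that assumes the duals $\lambda_i^k(t)$ are already available from the second-stage LPs; the numbers in the usage lines are made up. Telescoping the recursion shows what the first group of components means. For $t \le T - L_i$, $\xi_i^k(t) = \sum_{s=t}^{t+L_i-1} (c_i(s) - \lambda_i^k(s))$, which is the investment cost minus the marginal value of capacity over the lifetime of a unit installed at $t$.

```python
def adjoint_sqg(c, lam, L):
    """Components xi(t), t = 0..T, of the SQG for one technology.

    c[t]   : unit investment cost c_i(t), t = 0..T
    lam[t] : dual variables lambda_i^k(t) of the capacity constraints, t = 0..T
    L      : lifetime L_i, 1 <= L <= T
    Adjoint recursion as stated: u(T) = -c(T), u(t) = u(t+1) - c(t) + lam(t).
    """
    T = len(c) - 1
    u = [0.0] * (T + 1)
    u[T] = -c[T]
    for t in range(T - 1, -1, -1):            # backward sweep t = T-1, ..., 0
        u[t] = u[t + 1] - c[t] + lam[t]
    xi = []
    for t in range(T + 1):
        if t <= T - L:
            xi.append(u[t + L] - u[t])         # unit retires inside the horizon
        else:
            xi.append(-u[t])                   # unit still alive at T
    return xi


c = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]             # hypothetical costs, T = 5
lam = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]           # hypothetical dual variables
xi = adjoint_sqg(c, lam, L=2)
print(xi)
```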


Decision Processes with Rolling Horizon 


In the dynamic two-stage problem, the learning (observation of $\omega = (\omega(0), \ldots, \omega(t), \ldots)$) takes place in only one step, before the ex-post decision $y = (y(0), \ldots, y(t), \ldots)$ is made. In reality the learning and decision-making processes may be sequential. At step $t = 0, 1, \ldots$ some uncertainties $\omega(t)$ are revealed, followed by ex-post decisions $y(t, x, \omega)$ that are chosen to adapt to the new information. The whole decision process proceeds in alternating steps: decision, learning, decision, and so on. The dependence of $y(t, x, \omega)$ on $x$ is highly nonlinear, i.e., these functions do not, in general, possess the separability properties necessary to permit the use of the conventional recursive equations of dynamic programming. There are even more serious obstacles to the use of such recursive equations: a tremendous increase in dimensionality and the computation of mathematical expectations. The dynamic two-stage model provides a powerful approach to dynamic decision-making problems under uncertainty. At time $t = 0$ an optimal long-term ex-ante strategy $x[0, T-1]$ is computed by using a priori information about uncertainty within the interval $[0, T-1]$. The decision $x(0)$ from $x[0, T-1]$ is implemented at $t = 0$, and new a priori information is designed for the interval $[1, T]$, conditioned on the learned $\omega(0)$; a new ex-ante strategy $x[1, T]$ is computed, the decision $x(1)$ from $x[1, T]$ is implemented at $t = 1$, and so on. This approach to decision making with a rolling horizon avoids computing the decision at time $t$ as a function of all decisions prior to $t$, which enormously reduces the computational burden of the recursive dynamic programming equations and of multistage stochastic programs. The decision path (strategy) $x[t, T+t-1]$ for each $t = 0, 1, \ldots$ can be viewed as a robust strategic plan over a time horizon of duration $T$ (weeks, months, years). At each $t = 0, 1, \ldots$ this plan is revised to adaptively incorporate new information and a new time horizon.

The duration $T$ must be properly defined in order to justify strategies that may turn into benefits over long and uncertain time horizons. For example, how can we justify investments, say, in a flood defense system to cope with foreseen extreme 100-, 250-, 500-, and 1000-year floods? In such cases $T$ can be a random variable, a so-called stopping time, associated with the occurrence of a catastrophic event. SQG methods make it possible to design adaptive Monte Carlo optimization procedures (learning by simulation) that combine fast generators of catastrophes with adaptive adjustment of robust risk management decisions [3].
Many real-world decision problems involve some uncertainty. Typical examples are production/inventory problems with uncertain future demands or energy models with uncertain future fuel prices. Models that take uncertainty into account are known as stochastic models, to differentiate them from deterministic models, which assume all data to be known with certainty. Stochastic programming is precisely the field of mathematical programming in which some of the data are random variables.

In two-stage stochastic programs, some decisions $x$, called first-stage decisions, must be taken before the particular values taken by the random variables are known, while other decisions $y(\xi)$, called second-stage decisions or corrective actions, can be taken after the realizations of the random variables are known. In this representation, the first and second stages are differentiated as the periods of time before and after the random data are known.

A two-stage stochastic linear program, or stochastic linear program with recourse, is a mathematical program of the form

$$\min\ c^T x + Q(x) \quad \text{s.t. } x \in X,$$

where $Q(x) = E_\xi Q(x, \xi)$, $Q(x, \xi) = \min_{y \in Y(\xi)} q(\xi)^T y(\xi)$, and $E_\xi$ denotes the mathematical expectation with respect to the random vector $\xi$. $X$ and $Y(\xi)$ are usually polyhedral convex sets. In this representation, $Q(x, \xi)$ is the second-stage value function for a given $\xi$, and $Q(x)$ is the expected value function, or expected recourse. It measures the impact, in the second stage and in expected terms, of a first-stage decision.

Many different situations can be represented by a recourse program. Two extreme situations for the random variables are the following. First, the random vector may represent a limited number of well-studied scenarios. These are obtained as the best judgment experts can form about the future. In its simplest version, this may correspond to an optimistic scenario, a pessimistic scenario and a mean scenario. The stochastic solution will hedge against these scenarios, to find a solution that performs well under all three scenarios, although only one will be realized. At the other extreme, the random vector may represent uncertainties that recur frequently on a short-term basis. Then the expectation represents a mean over possible values, many of which actually occur, so that the expectation closely matches, e.g., the mean yearly revenue or cost.

Much of the difficulty of solving a two-stage program depends on the properties of the expected recourse function and of the so-called second-stage feasibility set, denoted by $K_2$, which represents the set of first-stage decisions yielding feasible decisions in the second stage. In the case where the random vectors are described by discrete distributions, $Q(x)$ is a piecewise linear convex function of $x$ and $K_2$ is convex and polyhedral, so that classical decomposition techniques may apply (see ▸ L-shaped method for two-stage stochastic programs with recourse). When the random variables are not discrete, some technicalities may occur that result in difficulties with feasibility sets [5]. Those situations are fortunately infrequent. In the case of $W$ being fixed and under weak assumptions, $Q(x)$ is convex. It is also differentiable if $\xi$ has an absolutely continuous cumulative distribution, so that techniques from nonlinear programming can be applied. Note that continuous random variables may be approximated by discrete ones. For this process, known as discretization, see ▸ Semi-infinite programming: Discretization methods.

Even when decision makers recognize the existence of uncertainty, in practice they may choose to solve a deterministic model, because stochastic models are seen as more difficult to solve. However, since perfect forecasting does not exist, real data very often differ from the data used in the models, and this results in poor decisions. It is thus very often advisable to develop smaller models that include some stochastic elements, instead of very large, detailed deterministic ones that neglect the presence of uncertainty.

Measures have been developed to quantify the importance of solving a stochastic program instead of a deterministic one. The expected value of perfect information (EVPI) measures the maximum amount a decision maker would be ready to pay in return for complete information about the future. This concept was developed in the context of decision analysis. It compares the expected objective when all decisions can be taken after the random vector is observed (the so-called wait-and-see solutions) with the two-stage situation. The value of the stochastic solution (VSS) measures how much would be lost by not solving the recourse problem but, instead, solving some substitute deterministic model. These concepts are studied in [1]. Unfortunately, they can only be calculated a posteriori, so it is usually not possible to evaluate beforehand the benefit of solving a stochastic program.
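A small example makes EVPI and VSS concrete. The sketch is illustrative and assumes a hypothetical simple-recourse model: order $x$ at unit cost $c$, then buy any shortfall $\max(\xi - x, 0)$ at unit cost $q > c$, with three discrete demand scenarios. For a minimization problem, with RP the optimal recourse-problem value, WS the wait-and-see value and EEV the expected cost of the expected-value solution, the measures are $\text{EVPI} = \text{RP} - \text{WS}$ and $\text{VSS} = \text{EEV} - \text{RP}$. Because $c x + Q(x)$ is piecewise linear and convex with kinks at the scenario values, the sketch searches only those values.

```python
def total_cost(x, c, q, scenarios):
    """c*x + Q(x) with Q(x) = sum_s p_s Q(x, xi_s) and the hypothetical simple
    recourse Q(x, xi) = q * max(xi - x, 0) (buy any shortfall at unit cost q)."""
    return c * x + sum(p * q * max(xi - x, 0.0) for p, xi in scenarios)


def solve_rp(c, q, scenarios):
    """For q > c > 0 the objective is piecewise linear and convex with kinks at
    the scenario values, so a minimizer lies among them."""
    x_best = min((xi for _, xi in scenarios), key=lambda x: total_cost(x, c, q, scenarios))
    return x_best, total_cost(x_best, c, q, scenarios)


def evpi_vss(c, q, scenarios):
    _, rp = solve_rp(c, q, scenarios)                                      # recourse problem
    ws = sum(p * solve_rp(c, q, [(1.0, xi)])[1] for p, xi in scenarios)    # wait-and-see
    mean = sum(p * xi for p, xi in scenarios)
    x_ev, _ = solve_rp(c, q, [(1.0, mean)])                                # expected-value solution
    eev = total_cost(x_ev, c, q, scenarios)                                # its expected cost
    return rp, ws, eev, rp - ws, eev - rp                                  # ..., EVPI, VSS


scenarios = [(0.4, 2.0), (0.4, 4.0), (0.2, 12.0)]   # (probability, demand), hypothetical
rp, ws, eev, evpi, vss = evpi_vss(c=1.0, q=3.0, scenarios=scenarios)
print(rp, ws, eev, evpi, vss)
```

All three quantities require solving the stochastic model. This illustrates why the measures are available only a posteriori.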

A classical alternative to two-stage or recourse programs is to require that the constraints be satisfied with some level of probability. This is known as chance-constrained or probabilistic programming. See [4] for an extensive treatment.

Finally, one can observe that other fields also include uncertainty in their models. Examples are decision analysis, Markov decision processes and stochastic optimal control. To illustrate the difference, we may say that, typically, a two-stage stochastic program is an extension of a linear mathematical program. It involves many decision variables and constraints, discrete time periods, linear expectation functionals for the objective, and known distributions for the random variables. For a general presentation of stochastic programming, see [2] or [3].
Unconstrained optimization methods seek a minimizing point of a nonlinear function $f: \mathbb{R}^n \to \mathbb{R}$, where $f$ is smooth. The classical techniques named for I. Newton and A.-L. Cauchy view this fundamental problem from complementary perspectives, model-based and metric-based, respectively. They provide a coherent framework that relates the basic algorithms of the subject to one another and reveals hitherto unexplored avenues for further algorithmic development.


Notation 


Lowercase boldface letters denote vectors, e.g., x, and uppercase boldface letters denote matrices, e.g., M. A matrix that is necessarily positive definite and symmetric has a + superscript attached, e.g., $D^+$. Calligraphic letters, e.g., $\mathcal{H}_k$, denote certain distinguished matrix variables.


Model-Based Perspective 


Model-based methods approximate $f$ at a current iterate $x_k$ by a local approximating model, or direction-finding problem (DfP), which is used to obtain an improving point.


Newton’s Method 
In Newton’s method the DfP is as follows:

$$\min\ g_k^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T H_k (x - x_k) \quad \text{s.t. } \|x - x_k\|_{D^+} \le \delta_k, \tag{1}$$

where $g_k$ denotes the gradient vector of $f$ at $x_k$, and $H_k$ denotes its (possibly indefinite) Hessian matrix, i.e., the $n \times n$ matrix of second partial derivatives $\partial^2 f / \partial x_i \partial x_j$ at $x_k$. The points $x$ that satisfy the quadratic constraint form the trust region. The quantity $\|\cdot\|_{D^+}$ denotes a vector norm defined by a positive definite, symmetric matrix $D^+$ that determines the scaling of variables, i.e., $\|z\|_{D^+} = (z^T D^+ z)^{1/2}$ for any vector $z$. Common choices include the Euclidean norm, $D^+ = I$ (where $I$ denotes the $n \times n$ identity matrix), and the norm defined by a fixed diagonal scaling matrix (independent of $k$). The quantity $\delta_k$ is an adaptively updated parameter that defines the size of the trust region.

It can be shown that a point $x_k^*$ is the global solution of (1) if and only if there is a scalar $\lambda_k \ge 0$ (Lagrange multiplier) such that

$$(H_k + \lambda_k D^+)(x_k^* - x_k) = -g_k, \qquad \lambda_k \left(\|x_k^* - x_k\|_{D^+} - \delta_k\right) = 0, \tag{2}$$

with $(H_k + \lambda_k D^+)$ positive semidefinite.

For convenience of discussion, assume that the constraint holds as an equality at $x_k^*$ and that the matrix $(H_k + \lambda_k D^+)$ is positive definite. (An ‘easy’ case occurs when $x_k^*$ is in the interior of the trust region, so that the DfP is essentially unconstrained with $\lambda_k = 0$. The other, infrequent, so-called ‘hard case’ arises when the matrix is only positive semidefinite, and it requires a deeper analysis and refinement of the algorithmic techniques. For details, see [12].) Then the optimal multiplier is the solution of the following one-dimensional nonlinear equation in the variable $\lambda \ge 0$, which is derived directly from (2), namely,

$$\|w(\lambda)\|_{D^+} = \delta_k, \qquad w(\lambda) = -(H_k + \lambda D^+)^{-1} g_k.$$
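The one-dimensional equation for the multiplier can be solved numerically. The sketch below uses plain bisection on a 2 × 2 example, chosen for transparency; practical trust-region codes typically use a safeguarded Newton iteration instead. It assumes the constraint is active and that the bracket lies in the region where $H_k + \lambda D^+$ is positive definite, so $\|w(\lambda)\|_{D^+}$ decreases in $\lambda$. The indefinite Hessian, gradient and radius are hypothetical.

```python
import math


def solve2(A, b):
    """Solve the 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def w_of(lam, H, D, g):
    """w(lambda) = -(H + lambda D)^{-1} g."""
    A = [[H[i][j] + lam * D[i][j] for j in range(2)] for i in range(2)]
    return solve2(A, [-g[0], -g[1]])


def d_norm(z, D):
    """||z||_D = (z^T D z)^{1/2}."""
    return math.sqrt(sum(z[i] * D[i][j] * z[j] for i in range(2) for j in range(2)))


def tr_multiplier(H, D, g, delta, lam_lo, lam_hi, tol=1e-12):
    """Bisection for ||w(lambda)||_D = delta on (lam_lo, lam_hi).

    Assumes H + lambda D is positive definite for lambda > lam_lo and that
    ||w||_D - delta changes sign on the bracket (active constraint)."""
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        if d_norm(w_of(mid, H, D, g), D) > delta:
            lam_lo = mid          # step too long: increase the multiplier
        else:
            lam_hi = mid          # step inside the region: decrease it
    lam = 0.5 * (lam_lo + lam_hi)
    return lam, w_of(lam, H, D, g)


H = [[1.0, 0.0], [0.0, -1.0]]     # indefinite Hessian (hypothetical)
D = [[1.0, 0.0], [0.0, 1.0]]      # Euclidean trust region, D+ = I
g = [1.0, 1.0]
lam, w = tr_multiplier(H, D, g, delta=0.5, lam_lo=1.0, lam_hi=100.0)
print(lam, w, d_norm(w, D))
```

The resulting step $w(\lambda_k)$ satisfies the first equation of (2) by construction and lies on the trust-region boundary.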


Also, the vector $x_k^* - x_k$ is a direction of descent at the point $x_k$. A variety of strategies can be devised for defining the new current iterate $x_{k+1}$. A pure trust region strategy (TR strategy) evaluates the function at $x_k^*$. If it is not suitably improving, then the current iterate is not updated, $\delta_k$ is reduced, and the procedure is repeated. If $x_k^*$ is improving, then $x_{k+1} = x_k^*$ and $\delta_k$ is updated (usually by comparing the function reduction predicted by the model against the actual reduction). Alternatively, the foregoing strategy can be augmented by a line search along the direction of descent $d_k = x_k^* - x_k$ to find an improving point, and again $\delta_k$ is revised (TR/LS strategy). See also [16] for strategies that explicitly use the dual of (1).


Quasi-Newton Method 


When $H_k$ is unavailable or too expensive to compute, it can be approximated by an $n \times n$ symmetric matrix, say $M_k$, which is used in the foregoing model-based approach (1) in place of $H_k$. This approximation is then revised as follows. Suppose the next iterate is $x_{k+1}$ and the corresponding gradient vector is $g_{k+1}$, and define $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. A standard mean value theorem for vector-valued functions states that

$$\left[\int_0^1 H(x_k + \theta s_k)\, d\theta\right] s_k = y_k, \tag{3}$$

i.e., the averaged Hessian matrix over the current step transforms the vector $s_k$ into $y_k$. In revising $M_k$ to incorporate new information, it is natural to require that the updated matrix $M_{k+1}$ have the same property, i.e., that it satisfy the so-called quasi-Newton relation or secant relation:

$$M_{k+1} s_k = y_k. \tag{4}$$


The symmetric rank-one update (SR1 update) makes the simplest possible modification to $M_k$, adding to it a matrix $\kappa u u^T$, where $\kappa$ is a real number and $u$ is an $n$-vector. The unique matrix $M_{k+1}$ of this form that also satisfies (4) is as follows:

$$M_{k+1} = M_k + \frac{(y_k - M_k s_k)(y_k - M_k s_k)^T}{(y_k - M_k s_k)^T s_k}. \tag{5}$$

This update can be safeguarded when the denominator in the last expression is close to zero. A local approximating model analogous to (1) can be defined using the Hessian approximation in place of $H_k$. The resulting model-based method is called the symmetric rank-one quasi-Newton method (SR1 quasi-Newton method). For additional detail, see [3].
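Update (5) can be coded with one common form of safeguard: skip the update when $|(y_k - M_k s_k)^T s_k| \le r \|s_k\| \|y_k - M_k s_k\|$ for a small $r$. This is a sketch; the threshold $r = 10^{-8}$ is an assumption, not taken from the text. On a quadratic with Hessian $A$ we have $y_k = A s_k$ exactly, and after two linearly independent steps in $\mathbb{R}^2$ the SR1 matrix reproduces $A$, provided both updates are defined. The matrix $A$ below is hypothetical.

```python
import math


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sr1_update(M, s, y, r=1e-8):
    """Symmetric rank-one update (5).

    Safeguard (an assumed, commonly used rule): return an unchanged copy of M
    when |(y - M s)^T s| <= r * ||s|| * ||y - M s||."""
    v = [yi - mi for yi, mi in zip(y, matvec(M, s))]   # y_k - M_k s_k
    denom = dot(v, s)
    if abs(denom) <= r * math.sqrt(dot(s, s)) * math.sqrt(dot(v, v)):
        return [row[:] for row in M]
    n = len(s)
    return [[M[i][j] + v[i] * v[j] / denom for j in range(n)] for i in range(n)]


A = [[4.0, 1.0], [1.0, 3.0]]          # Hessian of a hypothetical quadratic
M = [[1.0, 0.0], [0.0, 1.0]]          # initial approximation M_0 = I
for s in ([1.0, 0.0], [0.0, 1.0]):
    M = sr1_update(M, s, matvec(A, s))
print(M)
```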


Limited-Memory Approach 


When storage is at a premium and it is not possible to store an $n \times n$ matrix, a limited-memory symmetric rank-one approach (L-SR1 approach) uses the current step and a remembered set of prior steps (usually much fewer than $n$ in number), along with their associated gradient changes, to form a compact representation of the approximated Hessian. We will denote this approximation by L-$M_k$. Details can be found in [2]. An alternative approach, called a limited-memory affine reduced Hessian or successive affine reduction (SAR) technique, develops Hessian information in an affine subspace defined by the current gradient vector, the current step and a set of zero, one or more previous steps. Curvature estimates can be obtained in a Newton or a quasi-Newton sense. The associated methods are identified by the acronyms L-RH-N and L-RH-SR1, and the corresponding Hessian approximations by L-RH-$H_k$ and L-RH-$M_k$. The underlying updates can be patterned after analogous techniques described in [14] and [8], but they have not been fully explored to date (1999).


Modified Cauchy Approach 


Finally, when Hessian approximations in (1) are restricted to (possibly indefinite) diagonal matrices $D_k$, whose elements are obtained by finite differences or updating techniques, one obtains a simple method that has also not been fully explored to date (1999). We attach the name modified Cauchy method to it for reasons that will become apparent in the next section.


Summary 


Each of the foregoing model-based methods utilizes a DfP at the current iterate $x_k$ of the form

$$\min\ g_k^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T \mathcal{H}_k (x - x_k) \quad \text{s.t. } \|x - x_k\|_{D^+} \le \delta_k, \tag{6}$$

where $\mathcal{H}_k$ is one of the following: the Hessian matrix $H_k$; an SR1 approximation $M_k$ to the Hessian; a compact representation L-$M_k$, L-RH-$H_k$ or L-RH-$M_k$; a diagonal matrix $D_k$. (The other quantities in (6) were defined earlier.) This DfP is used in a TR or TR/LS strategy to obtain an improving point.


Metric-Based Perspective 


Metric-based methods explicitly or implicitly perform a transformation of variables (or reconditioning) and employ a steepest descent search vector in the transformed space. Use of the negative gradient (steepest descent) direction to obtain an improving point was originally proposed by Cauchy.

Consider a change of variables, $\tilde{x} = R x$, where $R$ is any $n \times n$ nonsingular matrix. Then $g$, the gradient vector at the point $x$, transforms to $\tilde{g} = R^{-T} g$, which is easily verified by the chain rule. (Henceforth, we attach the symbol ‘tilde’ to transformed quantities, and whenever it is necessary to explicitly identify the matrix used to define the transformation, we write $\tilde{x}[R]$ or $\tilde{g}[R]$.) The steepest descent direction at the current iterate $\tilde{x}_k$ in the transformed space is $-R^{-T} g_k$, and the corresponding direction in the original space is $-[R^T R]^{-1} g_k$.


Cauchy Method 


If $R$ is taken to be a nonsingular diagonal matrix $D_k^+$, corresponding to a rescaling of the variables that is either fixed or varied at each iteration, this defines a Cauchy method. A line search along the search direction $-(D_k^+)^{-2} g_k$ yields an improving point, and the procedure is then repeated.
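A Cauchy method with a fixed diagonal rescaling can be sketched in a few lines. This is a minimal illustration that assumes a backtracking (Armijo) line search, which is one of many possible line-search choices. The badly scaled quadratic and the scaling $D^+ = \operatorname{diag}(10, 1)$ are hypothetical. With that scaling the transformed function has circular contours, so a single unit step along $-(D^+)^{-2} g$ reaches the minimizer.

```python
def scaled_cauchy(f, grad, x, dscale, n_iter=100, c1=1e-4, shrink=0.5, gtol=1e-12):
    """Cauchy method with the diagonal rescaling x~ = D x, D = diag(dscale).

    Search direction -(D^2)^{-1} g, backtracking (Armijo) line search."""
    x = list(x)
    for _ in range(n_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < gtol:
            break
        d = [-gi / (di * di) for gi, di in zip(g, dscale)]
        slope = sum(gi * di for gi, di in zip(g, d))        # g^T d < 0
        t, fx = 1.0, f(x)
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + c1 * t * slope:
            t *= shrink                                      # backtrack
        x = [xi + t * di for xi, di in zip(x, d)]
    return x


f = lambda x: 0.5 * (100.0 * x[0] ** 2 + x[1] ** 2)
grad = lambda x: [100.0 * x[0], x[1]]
x_scaled = scaled_cauchy(f, grad, [1.0, 1.0], dscale=[10.0, 1.0])   # well scaled
x_plain = scaled_cauchy(f, grad, [1.0, 1.0], dscale=[1.0, 1.0])     # unscaled steepest descent
print(x_scaled, x_plain)
```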


Variable Metric Method 


Consider next the case when the matrix defining the transformation of variables is an $n \times n$ matrix $R_k$ that can be changed at each iteration. Suppose a line search procedure along the corresponding direction $-[R_k^T R_k]^{-1} g_k$ yields a step to an improving point $x_{k+1}$, and again define $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. How should we revise the reconditioner $R_k$ to $R_{k+1}$ in order to reflect new information? Ideally, the transformed function, say $\tilde{f}$, should have concentric contour lines, i.e., in the metric defined by an ‘ideal reconditioner’ $R_{k+1}$, the next iterate, obtained by a unit step from the transformed point $\tilde{x}_{k+1}$ along the steepest descent direction, should be independent of where $\tilde{x}_{k+1}$ lies along $\tilde{s}_k$. Such a reconditioner will not normally exist when $f$ is nonquadratic, but it should at least have the aforementioned property at the two points $\tilde{x}_k$ and $\tilde{x}_{k+1}$. Thus, it is reasonable to require that $R_{k+1}$ be chosen to satisfy

$$\tilde{x}_k[R_{k+1}] - \tilde{g}_k[R_{k+1}] = \tilde{x}_{k+1}[R_{k+1}] - \tilde{g}_{k+1}[R_{k+1}]. \tag{7}$$

This equation can be re-expressed in the original variables as follows:

$$R_{k+1} s_k = R_{k+1}^{-T} y_k. \tag{8}$$

For a matrix $R_{k+1}$ satisfying (8) to exist, it is necessary and sufficient that $y_k^T s_k > 0$, which can always be ensured by a line search procedure. Since $R_{k+1}^T R_{k+1} s_k = y_k$, we see that (8) is equivalent to the quasi-Newton relation (4) when we impose the added restriction that the Hessian approximation is positive definite.
Consider the question of how to revise $R_k$. The so-called BFGS update (named after the first letters of the surnames of its four co-discoverers) makes the simplest possible augmentation of $R_k$ by adding to it a matrix $u v^T$ of rank one. The updated matrix $R_{k+1}$ is required to satisfy (8) and is chosen as close as possible to $R_k$, as measured by the Frobenius or spectral norm of the difference $(R_{k+1} - R_k)$. This update can be shown to be as follows:

$$R_{k+1} = R_k + \frac{R_k s_k}{\|R_k s_k\|}\left(\frac{y_k}{(y_k^T s_k)^{1/2}} - \frac{R_k^T R_k s_k}{\|R_k s_k\|}\right)^T, \tag{9}$$

where $\|\cdot\|$ denotes the Euclidean vector norm. The descent search direction is defined as before by

$$d_{k+1} = -[R_{k+1}^T R_{k+1}]^{-1} g_{k+1},$$

and a line search along it will yield an improving point. The foregoing BFGS algorithm is an outgrowth of seminal variable-metric ideas pioneered by W.C. Davidon [4] and clarified by R. Fletcher and M.J.D. Powell [6]. For other original references, see, for example, the bibliographies in [1] or [5].
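The factored update (9) can be implemented and checked numerically against (8). This is a sketch. The matrix $R_k$ and the pair $(s_k, y_k)$ are hypothetical, chosen so that $y_k^T s_k = 4 > 0$. Writing $u = R_k s_k / \|R_k s_k\|$ and $v$ for the bracketed vector in (9), a short calculation gives $R_{k+1} s_k = (y_k^T s_k)^{1/2} u$ and $R_{k+1}^T u = y_k / (y_k^T s_k)^{1/2}$. Hence $R_{k+1}^T R_{k+1} s_k = y_k$, which is what the last line prints.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def matTvec(A, v):
    """A^T v."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def bfgs_factor_update(R, s, y):
    """R_{k+1} = R_k + u v^T from (9); requires y^T s > 0."""
    ys = dot(y, s)
    if ys <= 0.0:
        raise ValueError("update (9) requires y^T s > 0")
    Rs = matvec(R, s)
    nrm = math.sqrt(dot(Rs, Rs))
    u = [a / nrm for a in Rs]                                      # R_k s_k / ||R_k s_k||
    RtRs = matTvec(R, Rs)                                          # R_k^T R_k s_k
    v = [yi / math.sqrt(ys) - w / nrm for yi, w in zip(y, RtRs)]   # bracket in (9)
    n = len(u)
    return [[R[i][j] + u[i] * v[j] for j in range(n)] for i in range(n)]


R = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 3.0]]
s = [1.0, -1.0, 0.5]
y = [3.0, 0.0, 2.0]                                                # y^T s = 4 > 0
R1 = bfgs_factor_update(R, s, y)
print(matTvec(R1, matvec(R1, s)))                                  # compare with y
```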

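Let $M_{k+1}^+ = R_{k+1}^T R_{k+1}$ and $W_{k+1}^+ = [M_{k+1}^+]^{-1}$. These quantities can be updated directly as follows:

$$M_{k+1}^+ = M_k^+ - \frac{M_k^+ s_k s_k^T M_k^+}{s_k^T M_k^+ s_k} + \frac{y_k y_k^T}{y_k^T s_k}, \tag{10}$$

$$W_{k+1}^+ = E_k W_k^+ E_k^T + \frac{s_k s_k^T}{y_k^T s_k}, \tag{11}$$

where

$$E_k = I - \frac{s_k y_k^T}{y_k^T s_k}.$$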
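A quick numerical check confirms that (10) and (11) update a matrix and its inverse consistently. This is an illustrative sketch with hypothetical data: start from $M_0^+ = \operatorname{diag}(2, 1)$ and its inverse, apply both updates with $y_k^T s_k = 4 > 0$, and verify $M_{k+1}^+ W_{k+1}^+ = I$ together with the secant relations $M_{k+1}^+ s_k = y_k$ and $W_{k+1}^+ y_k = s_k$. The second relation follows because $E_k^T y_k = 0$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def outer(u, v):
    return [[a * b for b in v] for a in u]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B, beta=1.0):
    """A + beta * B."""
    return [[a + beta * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def bfgs_direct(M, s, y):
    """Update (10) for the Hessian approximation (M symmetric)."""
    Ms = matvec(M, s)
    return add(add(M, outer(Ms, Ms), -1.0 / dot(s, Ms)), outer(y, y), 1.0 / dot(y, s))


def bfgs_inverse(W, s, y):
    """Update (11) for the inverse Hessian approximation, E = I - s y^T / (y^T s)."""
    ys = dot(y, s)
    E = add(identity(len(s)), outer(s, y), -1.0 / ys)
    return add(matmul(matmul(E, W), transpose(E)), outer(s, s), 1.0 / ys)


M0, W0 = [[2.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 1.0]]
s, y = [1.0, 1.0], [3.0, 1.0]
M1, W1 = bfgs_direct(M0, s, y), bfgs_inverse(W0, s, y)
print(matmul(M1, W1))              # should be close to the identity
```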


Limited-Memory Approach 


When storage is at a premium, a version of the BFGS algorithm (L-BFGS) preserves steps and corresponding changes in gradients over a limited number of prior iterations, and defines the matrix $M_k^+$ or $W_k^+$ implicitly in terms of these vectors instead of explicitly by forming a square matrix. Consider the simplest case, where a single step and gradient change are preserved (one-step memory). The update is then defined implicitly by (11) with $W_k^+ = \gamma_k I$, where $\gamma_k$ is a scaling constant often taken to be $y_k^T s_k / y_k^T y_k$. Thus the search direction is defined by

$$d_{k+1} = -\left[E_k(\gamma_k I)E_k^T + \frac{s_k s_k^T}{y_k^T s_k}\right] g_{k+1}.$$

Under the assumption of exact line searches, i.e., $g_{k+1}^T s_k = 0$, it follows immediately that the search direction is parallel to the following vector:

$$-g_{k+1} + \frac{y_k^T g_{k+1}}{y_k^T s_k}\, s_k. \tag{12}$$

This is the search direction used in the conjugate gradient method, pioneered by M.R. Hestenes and E.L. Stiefel [9] and later suitably adapted to nonlinear optimization by Fletcher and C. Reeves [7]. We say that L-BFGS is a CG-related algorithm.
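The claim that memoryless BFGS reproduces (12) under exact line searches can be checked numerically. The sketch assumes a hypothetical convex quadratic $f(x) = \tfrac{1}{2} x^T A x$, on which an exact steepest descent step is available in closed form. Since $g_{k+1}^T s_k = 0$, we get $E_k^T g_{k+1} = g_{k+1}$. The memoryless BFGS direction then equals $\gamma_k$ times the vector in (12).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def memoryless_bfgs_direction(g, s, y, gamma):
    """d = -[E (gamma I) E^T + s s^T / (y^T s)] g with E = I - s y^T / (y^T s)."""
    ys = dot(y, s)
    Etg = [gi - yi * dot(s, g) / ys for gi, yi in zip(g, y)]   # E^T g
    v = [gamma * a for a in Etg]
    Ev = [vi - si * dot(y, v) / ys for vi, si in zip(v, s)]    # E v
    return [-(a + si * dot(s, g) / ys) for a, si in zip(Ev, s)]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]      # SPD, hypothetical
x0 = [1.0, 2.0, 3.0]
g0 = matvec(A, x0)
alpha = dot(g0, g0) / dot(g0, matvec(A, g0))                 # exact line search along -g0
x1 = [a - alpha * b for a, b in zip(x0, g0)]
g1 = matvec(A, x1)
s = [a - b for a, b in zip(x1, x0)]
y = [a - b for a, b in zip(g1, g0)]
gamma = dot(y, s) / dot(y, y)                                # scaling gamma_k

d_bfgs = memoryless_bfgs_direction(g1, s, y, gamma)
d_hs = [-gi + dot(y, g1) / dot(y, s) * si for gi, si in zip(g1, s)]   # vector (12)
print(dot(g1, s), d_bfgs, [gamma * v for v in d_hs])
```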

More generally, a set of prior steps and gradient changes can be preserved and the update defined recursively (see [11]). Key implementation issues are addressed in [8]. An alternative compact representation for L-BFGS is given in [2]. Henceforth, let us denote the Hessian and inverse Hessian approximations in L-BFGS by L-$M_k^+$ and L-$W_k^+$, respectively.
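One standard way to apply the recursively defined L-BFGS inverse approximation to a vector, without forming any matrix, is the "two-loop recursion". The sketch below shows that technique, assuming $W_0 = \gamma I$ and pairs stored oldest first. It is offered as an illustration, not as the specific scheme of [11]. Two checks follow. With one stored pair it matches the explicit form (11) with $W_k^+ = \gamma_k I$. With several pairs it satisfies the secant relation $W y = s$ for the newest pair. The quadratic data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def two_loop(g, pairs, gamma):
    """Return W g for the L-BFGS inverse approximation built from
    pairs = [(s_i, y_i), ...] (oldest first), starting from W0 = gamma * I."""
    q = list(g)
    alphas = []
    for s, y in reversed(pairs):                 # newest to oldest
        a = dot(s, q) / dot(y, s)
        alphas.append(a)
        q = [qi - a * yi for qi, yi in zip(q, y)]
    r = [gamma * qi for qi in q]                 # apply W0 = gamma * I
    for (s, y), a in zip(pairs, reversed(alphas)):   # oldest to newest
        b = dot(y, r) / dot(y, s)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return r


def one_pair_explicit(g, s, y, gamma):
    """[E (gamma I) E^T + s s^T / (y^T s)] g with E = I - s y^T / (y^T s), as in (11)."""
    ys = dot(y, s)
    v = [gamma * (gi - yi * dot(s, g) / ys) for gi, yi in zip(g, y)]   # gamma E^T g
    return [vi - si * dot(y, v) / ys + si * dot(s, g) / ys for vi, si in zip(v, s)]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical SPD Hessian
s1, s2 = [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]
y1, y2 = [dot(row, s1) for row in A], [dot(row, s2) for row in A]
gamma = dot(y2, s2) / dot(y2, y2)
pairs = [(s1, y1), (s2, y2)]
print(two_loop(y2, pairs, gamma))             # should reproduce s2
```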

A limited-memory reduced-Hessian or successive affine reduction version of the BFGS algorithm develops curvature approximations in an affine subspace defined by the current gradient vector and a set of prior steps. We will denote this algorithm by L-RH-BFGS and its Hessian approximation by L-RH-$M_k^+$. It too can be shown to be CG-related. For details, see [8,14] and references given therein.


Modified Newton Method 


If $H_k$ is available and possibly indefinite, it can be modified to a positive definite matrix, $H_k^+$, in a variety of ways (see [1]). This modified matrix can be factored as $H_k^+ = R_k^T R_k$ with $R_k$ nonsingular, for example, by using a Cholesky factorization or an eigendecomposition. The factor $R_k$ then defines a metric-based algorithm as above, called a modified Newton method (MN).
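One of the many possible modifications is to add a multiple $\tau I$ of the identity, increasing $\tau$ until a Cholesky factorization $H_k + \tau I = R_k^T R_k$ succeeds. The sketch below uses that rule. The starting value and the doubling of $\tau$ are assumptions, not prescriptions from the text. It then computes the metric-based direction $-[R_k^T R_k]^{-1} g_k$ by two triangular solves. The indefinite $2 \times 2$ Hessian is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cholesky(A):
    """Upper-triangular R with A = R^T R, or None if A is not positive definite."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(R[k][j] ** 2 for k in range(j))
        if s <= 0.0:
            return None
        R[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            R[j][i] = (A[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R


def modified_newton_direction(H, g, beta=1e-3):
    """Find tau >= 0 with H + tau I = R^T R, then solve R^T R d = -g."""
    n = len(H)
    tau = 0.0
    while True:
        R = cholesky([[H[i][j] + (tau if i == j else 0.0) for j in range(n)] for i in range(n)])
        if R is not None:
            break
        tau = max(2.0 * tau, beta)               # assumed increase rule
    z = [0.0] * n                                # forward solve R^T z = -g
    for i in range(n):
        z[i] = (-g[i] - sum(R[k][i] * z[k] for k in range(i))) / R[i][i]
    d = [0.0] * n                                # back solve R d = z
    for i in reversed(range(n)):
        d[i] = (z[i] - sum(R[i][k] * d[k] for k in range(i + 1, n))) / R[i][i]
    return d, tau, R


H = [[1.0, 2.0], [2.0, 1.0]]     # eigenvalues 3 and -1 (indefinite)
g = [1.0, 0.0]
d, tau, R = modified_newton_direction(H, g)
print(tau, d, dot(g, d))
```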

A limited-memory modified Newton algorithm analogous to L-RH-BFGS can also be formulated. For details, see [14]. Denote this CG-related algorithm by L-RH-MN and its Hessian approximation by L-RH-$H_k^+$.
Summary


The steepest descent direction in each of the foregoing methods is the direction $d_k$ that minimizes $g_k^T d$ over all vectors $d$ of constant length $\delta_k$ in an appropriate metric. (Typically $\delta_k = 1$.) Let $d = x - x_k$. Then this DfP can equivalently be stated as follows:

$$\min\ g_k^T (x - x_k) \quad \text{s.t. } \|x - x_k\|_{M_k^+} \le \delta_k, \tag{13}$$

where $M_k^+$ is given by one of the following: $I$; $(D_k^+)^2$; $M_k^+$; L-$M_k^+$; L-RH-$H_k^+$; L-RH-$M_k^+$; $H_k^+$. The quantity $\delta_k$ determines the length of the initial step along the search direction, and a line search (LS) strategy yields an improving point.
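Problem (13) has a closed-form solution. By the Cauchy-Schwarz inequality in the $M_k^+$ inner product, $g_k^T d \ge -\delta_k (g_k^T W_k^+ g_k)^{1/2}$ whenever $\|d\|_{M_k^+} \le \delta_k$, with equality for $d_k = -\delta_k W_k^+ g_k / (g_k^T W_k^+ g_k)^{1/2}$, where $W_k^+ = [M_k^+]^{-1}$. The sketch below computes this direction for a hypothetical $2 \times 2$ metric. It then compares the result against directions sampled on the boundary of the $M_k^+$-ball.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve2(A, b):
    """Solve the 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def m_norm(d, M):
    """||d||_M = (d^T M d)^{1/2}."""
    return math.sqrt(dot(d, [dot(row, d) for row in M]))


def metric_steepest_descent(M, g, delta=1.0):
    """Minimizer of g^T d s.t. ||d||_M <= delta: d = -delta W g / sqrt(g^T W g), W = M^{-1}."""
    Wg = solve2(M, g)
    scale = delta / math.sqrt(dot(g, Wg))
    return [-scale * v for v in Wg]


def boundary_directions(M, delta, n):
    """n directions of M-length delta, for comparison."""
    out = []
    for k in range(n):
        v = [math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)]
        out.append([delta * vi / m_norm(v, M) for vi in v])
    return out


M = [[4.0, 1.0], [1.0, 2.0]]       # hypothetical M_k^+ (symmetric positive definite)
g = [1.0, -3.0]
d = metric_steepest_descent(M, g)
print(d, dot(g, d), min(dot(g, v) for v in boundary_directions(M, 1.0, 360)))
```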

Let us denote the inverse of $M_k^+$ by $W_k^+$. It is sometimes computationally more efficient to maintain the latter matrix, for example, within a limited-memory BFGS algorithm.


Newton-Cauchy Framework 


A simple and elegant picture emerges from the development in the two sections above, which is summarized by Fig. 1 (see also [15]). It is often convenient to straddle this ‘two-lane highway’, so to speak, and to formulate algorithms based on a ‘middle-of-the-road’ approach. We now describe the traditional synthesis, based on positive definite, unconstrained models, and a new synthesis, called the NC method, based on $M_k^+$-metric trust regions.


Positive Definite Quadratic Models 


At the current iterate $x_k$, use an unconstrained DfP of
the following form:

$$ \min_x \; g_k^T (x - x_k) + \tfrac{1}{2} (x - x_k)^T H_k^+ (x - x_k), \qquad (14) $$

where $H_k^+$ is one of the following: $I$; $(D_k^+)^2$; $M_k^+$; L-$M_k^+$;
L-RH-$H_k^+$; L-RH-$M_k^+$; $H_k^+$. Note that the DfP uses the
options available in (13), and indeed is a Lagrangian
relaxation of the latter. Often these options for $H_k^+$ are
derived directly in model-based terms. The search
direction obtained from (14) is $d_k = -(H_k^+)^{-1} g_k$, and
a line search along it yields an improving iterate. A good
discussion of this line of development can be found
in [1].


[Figure 1: two-column diagram. Model-based column: Newton (N); SR1; L-SR1; L-RH-N; L-RH-SR1; modified Cauchy (MC). Metric-based column: modified Newton (MN); BFGS; L-BFGS; L-RH-MN; L-RH-BFGS; Cauchy (C).]

Unconstrained Nonlinear Optimization: Newton-Cauchy
Framework, Figure 1
Newton-Cauchy framework. Legend: entries are marked as CG-related or as not fully explored.


NC Method 


Substantial order can be brought to computational un- 
constrained nonlinear minimization by recognizing the 
existence of a single underlying method, henceforth 
called the Newton-Cauchy or NC method, which is 
based on a model of the form (6), but with its trust 
region now employing a metric corresponding to (13). 
This DfP takes the form 


$$ \min_x \; g_k^T (x - x_k) + \tfrac{1}{2} (x - x_k)^T H_k (x - x_k) \quad \text{s.t.} \quad \| x - x_k \|_{M_k^+} \le \delta_k, \qquad (15) $$

where the matrix $H_k$ is one of the following: the zero
matrix 0; $H_k$; $M_k$; L-$M_k$; L-RH-$H_k$; L-RH-$M_k$; $D_k$. Note
that the objective is permitted to be linear. The matrix
$M_k^+$ is one of the following: $I$; $(D_k^+)^2$; $M_k^+$; L-$M_k^+$;
L-RH-$H_k^+$; L-RH-$M_k^+$; $H_k^+$. Despite numerical drawbacks,
it is sometimes computationally more convenient to
maintain $W_k^+$; also note that $H_k = M_k^+$ gives a model
equivalent to $H_k = 0$.

The underlying theory for trust region subproblems
of the form (15) and techniques for computing the
associated multiplier $\lambda_k$ can be derived, for example, from
[13] or [16], and the next iterate is obtained by a TR/LS
strategy as discussed in the first Section; in particular,
a line search can ensure that $y_k^T s_k > 0$, which is needed
whenever a variable metric is updated. Note also that when
the objective function is linear, (15) reduces to (13),
and the TR/LS strategy reduces to a line-search strategy.
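A minimal sketch of how the multiplier $\lambda_k$ in (15) can be found: the step solves $(H_k + \lambda M_k^+) d = -g_k$, and $\lambda$ is chosen by bisection so that $\|d\|_{M_k^+} = \delta_k$ whenever the unconstrained minimizer is infeasible or does not exist. The sketch assumes $M_k^+$ is symmetric positive definite and ignores the so-called hard case, which a robust solver must treat; the function name and tolerances are ours. With $H_k = 0$ it reproduces the closed-form step of (13).

```python
import numpy as np

def nc_trust_region_step(g, H, M, delta, tol=1e-10, max_iter=200):
    """Approximately solve min g^T d + 0.5 d^T H d  s.t.  sqrt(d^T M d) <= delta.

    Easy case only: bisection on lam in (H + lam*M) d = -g. Returns (d, lam).
    """
    def step(lam):
        return np.linalg.solve(H + lam * M, -g)

    def mnorm(d):
        return np.sqrt(d @ M @ d)

    # Smallest generalized eigenvalue of (H, M): eigenvalues of L^{-1} H L^{-T}, M = L L^T.
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    lam_min = np.linalg.eigvalsh(Linv @ H @ Linv.T)[0]
    lo = max(0.0, -lam_min)          # H + lam*M is positive definite for lam > lo

    if lam_min > 0:                  # the unconstrained minimizer exists; use it if feasible
        d = step(0.0)
        if mnorm(d) <= delta:
            return d, 0.0

    hi = lo + 1.0                    # grow an upper bracket until the step fits
    while mnorm(step(hi)) > delta:
        hi *= 2.0
    for _ in range(max_iter):        # ||d(lam)||_M decreases as lam increases
        mid = 0.5 * (lo + hi)
        if mnorm(step(mid)) > delta:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return step(hi), hi
```

Returning the step at the upper end of the bracket keeps the result feasible, at the cost of being slightly inside the boundary when the bisection stops.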

The NC method can be formulated into a large
variety of individual NC algorithms. These include all
the standard ones in current use, along with many
new and potentially useful algorithms, each a
particular algorithmic expression of the same underlying
method. A sample of a few algorithms from
amongst the many possibilities (combinations of $H_k$
and $M_k^+$) is given in Fig. 2. The last column identifies
each algorithm, and the symbol * indicates that it is
new.

[Figure 2: table listing, for each example, the choice of $H_k$, the choice of $M_k^+$, the strategy (e.g. TR) and the algorithm name (e.g. Newton).]

Unconstrained Nonlinear Optimization: Newton-Cauchy
Framework, Figure 2
Examples of NC algorithms


We see that even in this relatively mature field there
are still ample opportunities for new algorithmic
contributions, and for associated convergence and rate of
convergence analysis, numerical experimentation and
the development of quality software.


See also 


Automatic Differentiation: Calculation of Newton Steps
Broyden Family of Methods and the BFGS Update
Dynamic Programming and Newton's Method in Unconstrained Optimal Control
Interval Newton Methods
Large Scale Unconstrained Optimization
Nondifferentiable Optimization: Newton Method
Numerical Methods for Unary Optimization
Unconstrained Optimization in Neural Network Training


References 


1. Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Sci., Belmont, MA
2. Byrd RH, Nocedal J, Schnabel RB (1994) Representations of quasi-Newton matrices and their use in limited-memory methods. Math Program 63:129-156
3. Conn AR, Gould NIM, Toint PhL (1991) Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math Program 50:177-196
4. Davidon WC (1991) Variable metric method for minimization. SIAM J Optim 1:1-17 (Original (with different preface): Argonne Nat. Lab. Report ANL-5990 (Rev., Argonne, Illinois))
5. Dennis JE, Schnabel RB (1983) Numerical methods for unconstrained optimization and nonlinear equations. Prentice-Hall, Englewood Cliffs, NJ
6. Fletcher R, Powell MJD (1963) A rapidly convergent descent method for minimization. Comput J 6:163-168
7. Fletcher R, Reeves C (1964) Function minimization by conjugate gradients. Comput J 7:149-154
8. Gilbert JC, Lemaréchal C (1989) Some numerical experiments with variable-storage quasi-Newton algorithms. Math Program B 45:407-435
9. Hestenes MR, Stiefel EL (1952) Methods of conjugate gradients for solving linear systems. J Res Nat Bureau Standards (B) 49:409-436
10. Leonard MW (1995) Reduced Hessian quasi-Newton methods for optimization. PhD Diss Univ Calif, San Diego, CA
11. Liu DC, Nocedal J (1989) On the limited memory BFGS method for large-scale optimization. Math Program B 45:503-528
12. Moré JJ (1983) Recent developments in algorithms and software for trust region methods. In: Bachem A, Grötschel M, Korte B (eds) Mathematical Programming: The State of the Art (Bonn, 1982). Springer, Berlin, pp 258-287
13. Moré JJ (1993) Generalizations of the trust region problem. Optim Methods Softw 2:189-209
14. Nazareth JL (1986) Conjugate gradient algorithms less dependent on conjugacy. SIAM Rev 28:501-511
15. Nazareth JL (1994) The Newton-Cauchy framework: A unified approach to unconstrained nonlinear minimization. Lecture Notes Computer Sci, vol 769. Springer, Berlin
16. Rendl F, Wolkowicz H (1997) A semidefinite framework for trust region subproblems with applications to large scale minimization. Math Program 77:273-300


Unconstrained Optimization
in Neural Network Training
UONNT


LUIGI GRIPPO 
Rome University La Sapienza, Rome, Italy 


MSC2000: 90C30, 90C30, 90C52, 90C53, 90C55, 
65K05, 68T05 




Article Outline 


Keywords 
See also 
References 


Keywords 


Neural networks; Unconstrained optimization; 
Training algorithms 


The training (or supervised learning) problem for
a neural network [5,13] of given topology can be
formulated as the problem of determining the network
parameters by minimizing some measure of the error
between the desired output and the actual output for
a given set of input data. More specifically, consider
a static network and suppose that a set of desired
input/output patterns (training set) is given:

$$ T = \{ (\xi_j, t_j) : j = 1, \dots, M \}, $$

where it is assumed that $\xi_j \in \mathbb{R}^p$ and $t_j \in \mathbb{R}$.

Denoting by $w \in \mathbb{R}^n$ the vector of network
parameters and by $y(w; \xi) \in \mathbb{R}$ the output of the network in
response to an input $\xi$, the training problem can be
formulated, for instance, as the (nonlinear) least squares
problem (cf. also » Least squares problems):

$$ \min_{w \in \mathbb{R}^n} E(w) = \sum_{j=1}^{M} \big( y(w; \xi_j) - t_j \big)^2. $$


As an example, consider a simple two-layer
feedforward network with one input $\xi \in \mathbb{R}$ and one output
$y \in \mathbb{R}$, having a 'hidden layer' constituted by two neural
units with a sigmoidal input-output function
('activation function') defined by

$$ g(s) = \frac{1}{1 + e^{-s}}, $$

and an output layer consisting of one linear unit.

Let $u_1, u_2$ be the weights on the input connections,
let $\theta_1, \theta_2$ be the 'thresholds' associated with the hidden
units and let $v_1, v_2$ be the weights on the output
connections. Then the input/output map realized by the
network is given by

$$ y = v_1 \, g(u_1 \xi - \theta_1) + v_2 \, g(u_2 \xi - \theta_2). $$


Therefore, given a set $\{(\xi_j, t_j)\}$ of training pairs, the
training problem may consist in determining the
parameter vector

$$ w = (u_1, u_2, \theta_1, \theta_2, v_1, v_2) $$

that minimizes the error function:

$$ E(w) = \sum_{j=1}^{M} \big( v_1 \, g(u_1 \xi_j - \theta_1) + v_2 \, g(u_2 \xi_j - \theta_2) - t_j \big)^2. $$
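The example network and its error function can be written out directly. The gradient in the sketch below is obtained by applying the chain rule backwards from the residuals, through the linear output unit and then through the sigmoids using $g'(s) = g(s)(1 - g(s))$; this backward pass is the computation that gives backpropagation its name. Array and function names are ours; `xi` and `t` hold the $M$ training inputs and targets.

```python
import numpy as np

def g(s):
    """Sigmoidal activation g(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def network(w, xi):
    """y = v1*g(u1*xi - th1) + v2*g(u2*xi - th2) for w = (u1, u2, th1, th2, v1, v2)."""
    u1, u2, th1, th2, v1, v2 = w
    return v1 * g(u1 * xi - th1) + v2 * g(u2 * xi - th2)

def error_and_gradient(w, xi, t):
    """Return E(w) = sum_j (y(w; xi_j) - t_j)^2 and grad E(w), chain rule applied backwards."""
    u1, u2, th1, th2, v1, v2 = w
    h1 = g(u1 * xi - th1)                 # hidden-unit outputs, one entry per pattern
    h2 = g(u2 * xi - th2)
    r = v1 * h1 + v2 * h2 - t             # residuals y(w; xi_j) - t_j
    E = np.sum(r ** 2)
    dy = 2.0 * r                          # dE/dy_j
    dz1 = dy * v1 * h1 * (1.0 - h1)       # back through the sigmoids: g' = g(1 - g)
    dz2 = dy * v2 * h2 * (1.0 - h2)
    grad = np.array([
        np.sum(dz1 * xi), np.sum(dz2 * xi),   # dE/du1, dE/du2
        -np.sum(dz1), -np.sum(dz2),           # dE/dth1, dE/dth2
        np.sum(dy * h1), np.sum(dy * h2),     # dE/dv1, dE/dv2
    ])
    return E, grad
```

A central finite-difference check of `error_and_gradient` is a cheap way to validate such hand-written gradients before using them in any of the training methods below.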


Problems of this form constitute challenging
unconstrained optimization problems, which typically exhibit
almost all the difficulties that may be encountered in the
minimization of a nonlinear function, and in particular:
• multiple local minima;
• steep-sided valleys;
• extensive flat regions;
• singularities in the Hessian matrix.

Moreover, the number n of unknown parameters can 
be large in many practical applications and the number 
M of error terms can be huge, since it corresponds to 
the number of training examples used in the learning 
process. 

It is also important to realize that the minimization
of the error function E itself is not the ultimate goal of
network training. In fact, the quality of learning depends
rather on the network's ability to make good
predictions for new inputs (not considered in the
training data), that is, on the 'generalization capability' of the
network [5,13]. Thus, a considerable amount of
experimentation may be required in order to choose the
model complexity properly in relation to the available
data and to evaluate the results of the training phase.
In practice, this implies that very often the
minimization process has to be repeated for
different architectures, different training sets, different
stopping rules, and possibly also different error
functions. Thus, although learning cannot simply
be reduced to an optimization problem, the
availability of efficient optimization algorithms can be
crucial for successful learning.

Deterministic iterative methods that attempt to find
a minimizer of E can be categorized into:
• batch methods, which use at each step information
on the global error function E;
• on-line methods, which use information on a single
error term

$$ E_j(w) = \big( y(w; \xi_j) - t_j \big)^2 $$

at a time.

Batch methods can be used only for off-line learning,
that is, when the whole training set is available before
learning starts, whereas on-line methods can
be employed both for off-line learning and for on-line
learning, that is, when the training set is formed during
real-time operation of the network.

In the neural network literature the best known 
training method is the so-called backpropagation 
method (BP) [22], which owes its name essentially to 
the technique used for computing the derivatives of E 
in a multilayer network, a technique based on an effi- 
cient use of the chain rule, which can be identified with 
the reverse mode technique of automatic differentiation 
[23] (cf. also » Automatic differentiation: Introduction, 
history and rounding error estimation). The BP method 
has been implemented both in batch mode ('batch BP'
or 'off-line BP') and in an on-line mode ('on-line BP',
'stochastic BP' or 'pattern mode training').

In the batch version, the BP method can be viewed
as a heuristic implementation of the gradient method
(or steepest descent method) and it can be described by
the iteration:

$$ w^{k+1} = w^k - \alpha \nabla E(w^k), \qquad (1) $$


where $\nabla E$ is the gradient of E and the stepsize $\alpha > 0$ is
termed the 'learning rate'. Global convergence and rate
of convergence analyses of iteration (1) with a constant
stepsize can be found, for instance, in [2]. In particular,
it is known that a convergent implementation would
require the stepsize to satisfy the bound

$$ 0 < \alpha < \frac{2}{L}, $$

under the assumption that $\nabla E$ is Lipschitz-continuous
on $\mathbb{R}^n$ with constant L. As the value of L is unknown,
it may be difficult to find an appropriate value for $\alpha$.
This may suggest the use of an inexact line search
technique for computing the stepsize along the search
direction. With a suitable implementation, the increase
in the computational cost per iteration due to the line
searches would be compensated by a definite
improvement in the overall efficiency and reliability.
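The line-search remedy can be sketched as follows. The text does not prescribe a particular inexact line search, so the sketch assumes an Armijo backtracking rule: start from a trial step `alpha0` and halve it until $E(w - \alpha \nabla E(w)) \le E(w) - \gamma \alpha \|\nabla E(w)\|^2$. The constants `alpha0`, `gamma` and `shrink` are illustrative; no knowledge of L is needed.

```python
import numpy as np

def gradient_method(E, grad, w0, alpha0=1.0, gamma=1e-4, shrink=0.5, tol=1e-8, max_iter=5000):
    """Batch gradient method w <- w - alpha * grad E(w) with Armijo backtracking (assumed rule)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        gk = grad(w)
        if np.linalg.norm(gk) < tol:          # approximately stationary
            break
        a, Ew = alpha0, E(w)
        # Backtrack until the sufficient-decrease (Armijo) condition holds.
        while E(w - a * gk) > Ew - gamma * a * (gk @ gk):
            a *= shrink
        w = w - a * gk
    return w
```

On an ill-conditioned quadratic such as $E(w) = \tfrac12(100 w_1^2 + w_2^2)$, a fixed learning rate would have to respect $\alpha < 2/100$, whereas the backtracking rule finds acceptable steps on its own at the price of extra evaluations of E.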

Modifications of the BP method and heuristic rules
for choosing and updating $\alpha$ have also been extensively
studied in the neural network literature [7,25].

An improved version of the basic BP method is the
so-called momentum updating rule [22], which consists
of the iteration:

$$ w^{k+1} = w^k - \alpha \nabla E(w^k) + \beta (w^k - w^{k-1}), \qquad (2) $$

where $\beta > 0$ is a suitable parameter. In the optimization
literature, this method is known as the heavy ball
method [20], because of a physical analogy with the
motion of a body in a potential field, subject to an energy
loss caused by friction. Under appropriate assumptions,
it can be shown [21] that the convergence rate of this
method is superior to that of the gradient method, but
there is again the difficulty of choosing suitable values
for the parameters $\alpha$ and $\beta$.
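Iteration (2) needs only one extra vector of storage, as the sketch below shows; it takes $w^{-1} = w^0$, so the first step is a plain gradient step (our convention).

```python
import numpy as np

def heavy_ball(grad, w0, alpha, beta, iters):
    """Momentum / heavy ball iteration: w^{k+1} = w^k - alpha*grad E(w^k) + beta*(w^k - w^{k-1})."""
    w_prev = w = np.asarray(w0, dtype=float)   # w^{-1} := w^0
    for _ in range(iters):
        # The right-hand side is evaluated with the old (w, w_prev) before reassignment.
        w, w_prev = w - alpha * grad(w) + beta * (w - w_prev), w
    return w
```

For a quadratic whose extreme Hessian eigenvalues m and L are known, the classical choice $\alpha = 4/(\sqrt{L} + \sqrt{m})^2$, $\beta = ((\sqrt{L} - \sqrt{m})/(\sqrt{L} + \sqrt{m}))^2$ gives an asymptotic contraction factor of about $(\sqrt{L} - \sqrt{m})/(\sqrt{L} + \sqrt{m})$ per step, versus $(L - m)/(L + m)$ for the best constant-step gradient method. In network training m and L are not known, which is precisely the difficulty noted above.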

On the other hand, when batch training is adopted,
many well-known unconstrained minimization
methods are available for computing stationary points of E,
such as
• conjugate gradient methods (CGM; cf. » Conjugate-gradient methods);
• quasi-Newton methods;
• Newton-type methods.
Moreover, as the training problem is a nonlinear least
squares problem, some modified form of the
Gauss-Newton method is also suitable, at least when
small residuals are expected.

It can be observed that an iteration of the CGM can
be placed equivalently into the form

$$ w^{k+1} - w^k = -\alpha^k \nabla E(w^k) + \frac{\alpha^k \beta^k}{\alpha^{k-1}} \, (w^k - w^{k-1}), $$

where $\alpha^k$ is the stepsize (to be computed through a line
search) and $\beta^k$ is the parameter appearing in the
particular CGM formula adopted. This corresponds to
a momentum updating rule in which the parameters
$\alpha^k$ and $\beta^k$ are chosen on a sound theoretical basis. In fact,
the use of the CGM has been suggested since the early
papers on neural networks, and good results have been
obtained [6], for instance, by employing the
Polak-Ribière CGM, which corresponds to the choice:

$$ \beta^k = \frac{\nabla E(w^k)^T \big( \nabla E(w^k) - \nabla E(w^{k-1}) \big)}{\| \nabla E(w^{k-1}) \|^2}. $$
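To see that the momentum form is exactly CG, the sketch below runs Polak-Ribière CG on a strictly convex quadratic $E(w) = \tfrac12 w^T A w - b^T w$ and performs every update in momentum form, with coefficient $\alpha^k \beta^k / \alpha^{k-1}$. The exact line search used here exists in closed form only because E is quadratic; that is an assumption made to keep the example short, and a training error would need an inexact line search. On such a quadratic with exact line searches, PR coincides with linear CG and terminates in at most n steps.

```python
import numpy as np

def pr_cg_quadratic(A, b, w0, tol=1e-10):
    """Polak-Ribiere CG on E(w) = 0.5 w^T A w - b^T w, with updates written in momentum form."""
    w = np.asarray(w0, dtype=float)
    g = A @ w - b                          # gradient of E
    w_prev = w.copy()                      # w^{-1} := w^0, so the momentum term starts at zero
    g_prev, alpha_prev = g, 1.0
    for k in range(len(b)):
        if np.linalg.norm(g) < tol:
            break
        beta = 0.0 if k == 0 else g @ (g - g_prev) / (g_prev @ g_prev)   # Polak-Ribiere beta^k
        d = -g + beta * (w - w_prev) / alpha_prev      # d^k = -grad + beta^k d^{k-1}
        alpha = -(g @ d) / (d @ A @ d)                 # exact minimizer along d (quadratic E)
        # Momentum form: -alpha^k grad + (alpha^k beta^k / alpha^{k-1}) (w^k - w^{k-1})
        w_new = w - alpha * g + (alpha * beta / alpha_prev) * (w - w_prev)
        w_prev, w, g_prev, g, alpha_prev = w, w_new, g, A @ w_new - b, alpha
    return w
```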


Various applications and adaptations of the CGM can
be found in the recent neural network literature [18,25].
For large-scale training problems, a viable alternative
can also be the use of some reduced-memory
quasi-Newton method [19]. In particular, training algorithms
employing a memoryless BFGS method (also known as
the Shanno conjugate gradient method [24]) have been
considered [1]. This method can be defined through the
iteration

$$ w^{k+1} = w^k + \alpha^k d^k, $$

where the stepsize $\alpha^k$ is computed through an (inexact)
line search and the search direction $d^k$ is obtained
by taking initially $d^0 = -\nabla E(w^0)$ and then computing,
for $k \ge 1$, the vectors

$$ s^k = w^k - w^{k-1}, \qquad y^k = \nabla E(w^k) - \nabla E(w^{k-1}), $$

and letting

$$ d^k = -\nabla E(w^k) + a^k y^k + b^k s^k, $$

where

$$ a^k = \frac{(s^k)^T \nabla E(w^k)}{(s^k)^T y^k}, \qquad b^k = -\left( 1 + \frac{(y^k)^T y^k}{(s^k)^T y^k} \right) \frac{(s^k)^T \nabla E(w^k)}{(s^k)^T y^k} + \frac{(y^k)^T \nabla E(w^k)}{(s^k)^T y^k}. $$

It can be shown that the search directions are descent
directions, provided that

$$ (s^k)^T y^k > 0, $$

which can be enforced through an appropriate line
search.
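The direction formula is the product $d^k = -H^k \nabla E(w^k)$, where $H^k$ is the BFGS update of the identity matrix using the single pair $(s^k, y^k)$. The sketch below computes $d^k$ from the two scalars $a^k$ and $b^k$ only, so it needs O(n) work and storage and never forms $H^k$; the function name and the error on $(s^k)^T y^k \le 0$ are our choices.

```python
import numpy as np

def memoryless_bfgs_direction(g, s, y):
    """Memoryless BFGS (Shanno) direction d = -grad + a*y + b*s.

    g = grad E(w^k), s = w^k - w^{k-1}, y = grad E(w^k) - grad E(w^{k-1}).
    """
    sy = s @ y
    if sy <= 0:                                    # descent is only guaranteed for s^T y > 0
        raise ValueError("s^T y must be positive; strengthen the line search")
    a = (s @ g) / sy
    b = -(1.0 + (y @ y) / sy) * (s @ g) / sy + (y @ g) / sy
    return -g + a * y + b * s
```

For testing, the result can be compared against the explicit matrix $H^k = (I - \rho s y^T)(I - \rho y s^T) + \rho s s^T$ with $\rho = 1/(s^T y)$; the two agree up to rounding.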


Several experiments have also been made using
Newton-type methods employing second-order
derivatives of E. In particular, truncated Newton methods
appear to be valuable for large-dimensional training
problems [8], but second-order methods may be less
convenient when the Hessian matrix of E is singular
at the solutions, since the superlinear convergence
rate usually associated with Newton-type methods
may be lost. However, singularities are most likely
to occur when the number of free parameters is too
large in relation to the available data, a situation which
would suggest the need for 'pruning' the network.

On the whole, whenever batch learning is viable,
it can be safely said that unconstrained minimization
methods are some orders of magnitude faster than
early heuristic training methods, and the CGM
appears to be the technique of choice. Special
care is required in the choice of the starting point
$w^0$, which, in principle, should be such that the level set

$$ \mathcal{L} = \{ w : E(w) \le E(w^0) \} $$

is bounded. A useful device may be that of minimizing
a modified objective function of the form

$$ \tilde{E}(w) = E(w) + \epsilon \| w \|^2, $$

where $\epsilon > 0$ is a small parameter. Since $E \ge 0$, this ensures
that all level sets are compact, and it may prevent the
algorithms from reaching flat regions corresponding to very
large values of w. The addition of the term $\epsilon \| w \|^2$ may
also have an important motivation in the context of
learning theory, since it corresponds, in essence, to
regularizing the error function by introducing a 'complexity
penalty' that 'encourages the excess weights to assume values closer
to zero, and thereby improve generalization' [13].
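The regularization only adds a linear term to the gradient, $\nabla \tilde{E}(w) = \nabla E(w) + 2\epsilon w$, so it can wrap any existing training code. A minimal sketch; `regularized` is a hypothetical helper name, and E and its gradient are supplied by the caller.

```python
import numpy as np

def regularized(E, grad, eps):
    """Wrap (E, grad E) into (E~, grad E~) with E~(w) = E(w) + eps * ||w||^2."""
    def E_tilde(w):
        return E(w) + eps * (w @ w)
    def grad_tilde(w):
        return grad(w) + 2.0 * eps * w     # gradient of eps * ||w||^2 is 2 * eps * w
    return E_tilde, grad_tilde
```

For an error term with a flat region at large w, such as a single sigmoid $1/(1 + e^{w})$ that keeps decreasing as w grows, the original gradient never vanishes, whereas the wrapped gradient vanishes at a finite w, so a descent method stops at a bounded point.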

Moreover, in association with a local method, some 
form of multistart method may be required for search- 
ing a global minimizer or, alternatively, the use of a de- 
terministic technique of global optimization can be 
attempted. Training algorithms employing homotopic 
methods and tunneling methods are described in [25]. 

However, batch methods are not suitable in on- 
line learning problems, since the objective function is 
not known when training is started, and hence on-line 
methods have to be adopted. 
In the on-line version, the BP method can be
described as a sequence of cycles ('epochs'), each
consisting of M iterations that update the vector w by
employing the negative gradient of a single error term $E_j$ at a time
(possibly in a random order). This can be described by
means of the following simplified scheme.

1. Choose $w^0$ and set $k = 0$;
2. Set $y_0 = w^k$ and choose $\alpha^k$;
3. For $j = 1, \dots, M$:
   set $y_j = y_{j-1} - \alpha^k \nabla E_j(y_{j-1})$;
4. Set $w^{k+1} = y_M$;
5. Set $k = k + 1$ and return to 2.

On-line backpropagation
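The simplified scheme can be transcribed almost line by line. In the sketch below, `grad_j(w, j)` (returning $\nabla E_j(w)$) and the iterable of per-epoch stepsizes are assumed interfaces, not taken from the source; passing a NumPy random generator visits the error terms in a random order within each epoch.

```python
import numpy as np

def online_bp(grad_j, M, w0, alphas, rng=None):
    """On-line BP: one sweep over the M error terms E_j per epoch, stepsize alpha^k per epoch."""
    w = np.asarray(w0, dtype=float)
    for alpha in alphas:                      # Step 2: y_0 = w^k, stepsize alpha^k
        y = w.copy()
        order = rng.permutation(M) if rng is not None else range(M)
        for j in order:                       # Step 3: y_j = y_{j-1} - alpha^k grad E_j(y_{j-1})
            y = y - alpha * grad_j(y, j)
        w = y                                 # Step 4: w^{k+1} = y_M
    return w
```

With a fixed stepsize the epochs typically settle near, but not at, a minimizer of E; on $E(w) = \sum_j (w - t_j)^2$ with $t = (1, 2, 3)$ and $\alpha = 0.005$, the fixed-order sweep ends about 0.007 away from the minimizer 2.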


It is also possible to introduce a 'momentum term'
by replacing, for $j > 1$, the update considered at Step 3
of the preceding scheme with an update of the form:

$$ y_j = y_{j-1} - \alpha^k \nabla E_j(y_{j-1}) + \beta (y_{j-1} - y_{j-2}), $$

or else by introducing a memory of the preceding step
also when passing to a new epoch.

For on-line BP, the problem of selecting an
appropriate value for $\alpha$ becomes more critical, since the
method may not converge with a fixed stepsize. In spite
of this, various heuristic versions of on-line BP have
been employed with satisfactory results in many neural
network applications, and, interestingly, on-line BP
employing simple heuristics is often superior
to more sophisticated batch unconstrained
optimization methods even for off-line learning, at least when
the number of training pairs is very large and the
training set is highly redundant [28]. In fact, in this case, it
could be wasteful to spend much time computing the
exact gradient of E when far from the solution. In
addition, it would seem that an on-line procedure
introduces some sort of randomness that may help in
escaping from local minimizers. In fact, on-line BP can
be viewed as a stochastic process that introduces a
random error on the gradient, which may prevent the
algorithm from being trapped at irrelevant local
minimizers.

Neural network applications have stimulated
considerable interest in the field of unconstrained
optimization in the convergence analysis of on-line
methods (often termed incremental gradient methods)
and the development of new algorithms. From a
deterministic point of view, incremental gradient methods
can be viewed as algorithms for minimizing a sum of
M differentiable functions, in which the computation
of derivatives is split into a set of M (or more)
consecutive steps. An early contribution to the study of this
problem (with a different motivation) was given in
[14], where the case of a convex objective function is
studied. More recently, convergence results have been
obtained by giving rules for the stepsizes, which, under
suitable assumptions, may ensure convergence towards
stationary points. In particular, stepsize rules for
nonconvex problems have been established in [9,29], by
using stochastic approximation ideas, and in [15,16], by
employing deterministic approaches.

A thorough analysis of incremental gradient meth- 
ods and a description of an incremental version of 
the Gauss-Newton method, which leads to a discrete 
version of the extended Kalman filter, can be found 
in [4]. 

In particular, in the case of on-line BP, under the
assumption that the gradients $\nabla E_j$ are Lipschitz
continuous and that the sequence $\{w^k\}$ generated by on-line
BP is bounded (or else that $\|\nabla E_j\|$ grows at most
linearly with $\|\nabla E\|$), it can be shown that every limit point
of $\{w^k\}$ is a stationary point of E, provided that the
stepsizes $\alpha^k$ are such that

$$ \sum_{k=0}^{\infty} \alpha^k = \infty \quad \text{and} \quad \sum_{k=0}^{\infty} (\alpha^k)^2 < \infty. $$

As an example, the choice

$$ \alpha^k = \frac{c}{k+1}, $$

for some $c > 0$, which is considered in the stochastic
approximation literature, would satisfy these
conditions. Similar convergence results have also been
established in connection with the momentum updating
rule [16].
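The effect of a diminishing stepsize can be seen on a toy problem of our choosing: $E(w) = \sum_j (w - t_j)^2$ with $t = (1, 2, 3)$, whose minimizer is the mean 2. The sketch compares a fixed stepsize with the rule $\alpha^k = c/(k+1)$, which satisfies both series conditions; the value $c = 0.5$ and the epoch count are illustrative assumptions.

```python
import numpy as np

def incremental_gradient(grad_j, M, w0, stepsize, epochs):
    """On-line BP with an epoch-dependent stepsize alpha^k = stepsize(k), fixed visiting order."""
    w = np.asarray(w0, dtype=float)
    for k in range(epochs):
        alpha = stepsize(k)
        for j in range(M):
            w = w - alpha * grad_j(w, j)
    return w

t = np.array([1.0, 2.0, 3.0])                 # toy data; E(w) = sum_j (w - t_j)^2, minimizer 2
grad_j = lambda w, j: 2.0 * (w - t[j])

c = 0.5
w_dim = incremental_gradient(grad_j, 3, [0.0], lambda k: c / (k + 1), 10000)   # diminishing rule
w_const = incremental_gradient(grad_j, 3, [0.0], lambda k: 0.005, 10000)        # fixed stepsize
```

With the fixed stepsize the iterates stall at a point whose distance from 2 is of the order of the stepsize itself, while with $\alpha^k = c/(k+1)$ the remaining error keeps shrinking as k grows, though only slowly, which is the inefficiency discussed next.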

In any case, the stepsize must, in principle, be
driven to zero to ensure convergence
of on-line methods, and this could be highly inefficient
in comparison with a batch gradient method. Some
attempts have been made to overcome this
difficulty. A possible compromise is the use of hybrid
on-line-batch BP techniques, called bold-driver methods in
the neural network literature, where on-line BP is used
with a fixed stepsize for one (or more) epochs, but the
stepsize is revised periodically by evaluating the
behavior of the global error function E [27]. Globally
convergent algorithms based on this idea have also been
considered in [10] and [26]. These techniques may improve
the performance of on-line BP in case of off-line
learning with a moderately large and redundant training set,
but they still have the disadvantage that the whole objective
function has to be evaluated, which is expensive when
M is very large and is unsuitable in case of 'truly'
on-line learning. A different approach can be that of
reducing the degree of incrementalism as the method
progresses, gradually switching from the incremental
gradient method, which can be quite effective at early
stages of the process, to the steepest descent method,
which has a much better ultimate convergence rate [3].
Still another approach can be that of constructing
multiple copies of the network, each trained (possibly in
a batch mode) with different data blocks (added as they
become available), and then penalizing the
disagreement between the various solutions in a way that
ultimate convergence can be achieved [11,12]. However,
extensive computational testing of these approaches
is not available, and research on incremental gradient
methods is still an active field, so that further progress
may be expected.
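The text does not fix the revision rule of bold-driver methods, so the sketch below assumes one common variant: after each on-line epoch with a fixed $\alpha$, evaluate E; if it decreased, accept the epoch and enlarge $\alpha$ by the factor `up`, otherwise discard the epoch and shrink $\alpha$ by the factor `down`. The factors 1.1 and 0.5 are illustrative. The call `E(y)` is the full pass over the training set whose cost the paragraph above warns about.

```python
import numpy as np

def bold_driver(E, grad_j, M, w0, alpha=0.1, up=1.1, down=0.5, epochs=100):
    """Hybrid on-line/batch BP: fixed-alpha on-line epochs, alpha revised from the global error E.

    Assumed rule: accept the epoch and set alpha *= up if E decreased; otherwise
    discard the epoch and set alpha *= down.
    """
    w = np.asarray(w0, dtype=float)
    Ew = E(w)
    for _ in range(epochs):
        y = w.copy()
        for j in range(M):                 # one on-line BP epoch with fixed alpha
            y = y - alpha * grad_j(y, j)
        Ey = E(y)                          # full evaluation of E over all M terms
        if Ey < Ew:
            w, Ew, alpha = y, Ey, alpha * up
        else:
            alpha *= down
    return w, alpha
```

Because an epoch is kept only when E decreases, the accepted error values form a monotonically decreasing sequence, which is the property that globally convergent versions build on.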

Suggested general references on the application of 
optimization methods to neural network training are 
[4,5,13,25], and [17]. 
Introduction


Variable neighborhood search (VNS) is a metaheuris- 
tic, or framework for building heuristics, aimed at solv- 
ing combinatorial and global optimization problems. 
Its basic idea is systematic change of a neighborhood 
combined with a local search. Since its inception, VNS
has undergone many developments and been applied in
numerous fields. We review here the basic rules of VNS
and of its main extensions. Moreover, some of the most
successful applications are briefly summarized.
Pointers to many others are given in the reference list.

A deterministic optimization problem may be
formulated as

$$ \min \{ f(x) \mid x \in X, \; X \subseteq S \}, \qquad (1) $$

where S, X, x and f denote, respectively, the solution
space, the feasible set, a feasible solution and a
real-valued objective function. If S is a finite but
large set, a combinatorial optimization problem is
defined. If $S = \mathbb{R}^n$, we talk about continuous optimization.
A solution $x^* \in X$ is optimal if

$$ f(x^*) \le f(x), \quad \forall x \in X; $$

an exact algorithm for problem (1), if one exists, finds
an optimal solution $x^*$, together with the proof of its
optimality, or shows that there is no feasible solution,
i.e., $X = \emptyset$. Moreover, in practice, the time to do so
should be finite (and not too large); if one deals with
a continuous function, one must admit a degree of
tolerance, i.e., stop when a feasible solution $x^*$ has been
found such that

$$ f(x^*) < f(x) + \varepsilon, \quad \forall x \in X \qquad \text{or} \qquad \frac{f(x^*) - f(x)}{f(x^*)} < \varepsilon, \quad \forall x \in X, $$

for some small positive $\varepsilon$.

Many practical instances of problems of the
form (1), arising in operations research and other fields,
are too large for an exact solution to be found in
reasonable time. It is well known from complexity
theory [46,85] that thousands of problems are
nondeterministic polynomial-time hard (NP-hard), that no
algorithm with a number of steps polynomial in the size
of the instances is known for solving any of them, and
that finding one would entail obtaining one for each
and all of them. Moreover, in some cases where a
problem admits a polynomial algorithm, the power of this
polynomial may be so large that realistic size instances
cannot be solved in reasonable time in the worst case,
and sometimes also in the average case or most of the
time.

So one is often forced to resort to heuristics, which
quickly yield an approximate solution, or sometimes an
optimal solution but without proof of its optimality.
Some of these heuristics have a worst-case guarantee,
i.e., the solution $x_h$ obtained satisfies

$$ \frac{f(x_h) - f(x)}{f(x_h)} < \varepsilon, \quad \forall x \in X, \qquad (2) $$

for some $\varepsilon$, which is, however, rarely small. Moreover,
this $\varepsilon$ is usually much larger than the error observed in
practice and may therefore be a bad guide in selecting
a heuristic. In addition to avoiding excessive computing
time, heuristics address another problem: local optima.
A local optimum $x_L$ of (1) is such that

$$ f(x_L) \le f(x), \quad \forall x \in N(x_L) \cap X, \qquad (3) $$

where $N(x_L)$ denotes a neighborhood of $x_L$ (ways to
define such a neighborhood will be discussed later). If
there are many local minima, the range of values they
span may be large. Moreover, the globally optimum
value $f(x^*)$ may differ substantially from the average
value of a local minimum, or even from the best such
value among many, obtained by some simple heuristic
such as multistart (a phenomenon called the
Tchebycheff catastrophe in [7]). There are, however, many
ways to get out of local optima and, more precisely, the
valleys which contain them (or sets of solutions from
which the descent method under consideration leads to
them).

Metaheuristics are general frameworks to build 
heuristics for combinatorial and global optimization 
problems. For discussion of the best known of them 
the reader is referred to surveys [17,49,91]. Some of the 
many successful applications of metaheuristics are also 
mentioned there. 

Variable neighborhood search (VNS) [55,56,59,78]
is a metaheuristic which systematically exploits
the idea of neighborhood change, both in descent to
local minima and in escape from the valleys which
contain them. VNS relies heavily upon the following
observations:


Fact 1 A local minimum with respect to one neighbor- 
hood structure is not necessarily so for another; 

Fact 2 A global minimum is a local minimum with re- 
spect to all possible neighborhood structures; 

Fact 3 For many problems local minima with respect 
to one or several neighborhoods are relatively 
close to each other. 


This last observation, which is empirical, implies that
a local optimum often provides some information
about the global one. This may, for instance, be
several variables with the same value in both. However,
it is usually not known which variables these are. An
organized study of the neighborhood of this local optimum
is therefore in order, until a better one is found.

Unlike many other metaheuristics, the basic
schemes of VNS and its extensions are simple and
require few, and sometimes no, parameters. Therefore
in addition to providing very good solutions, often in 
simpler ways than other methods, VNS gives insight 
into the reasons for such a performance, which in turn 
can lead to more efficient and sophisticated implemen- 
tations. 


Background 


VNS embeds a local search heuristic for solving
combinatorial and global optimization problems and
allows changes of the neighborhood structures within
this search. There are predecessors of this idea. In this
section we give a brief introduction to the variable metric
algorithm for solving continuous convex problems and to
local search heuristics for solving combinatorial and
global optimization problems.


Variable Metric Method 


The variable metric method for solving the unconstrained
continuous optimization problem (1) was suggested by
Davidon [27] and Fletcher and Powell [43]. The idea is
to change the metric (and thus the neighborhood) in
each iteration such that the search direction (steepest
descent with respect to the current metric) adapts
better to the local shape of the function. In the first
iteration a Euclidean unit ball in n-dimensional space is used
and the steepest descent (antigradient) direction is found;
in the next iterations, ellipsoidal balls are used and the
steepest descent direction with respect to a new metric
is obtained after a linear transformation. The purpose
of such changes is to build up, iteratively, a good
approximation to the inverse $A^{-1}$ of the Hessian matrix A
of f, that is, to construct a sequence of matrices $H_i$ with
the property

$$ \lim_{i \to \infty} H_i = A^{-1}. $$

In the convex quadratic programming case, with exact line
searches, the limit is achieved after n iterations instead of
infinitely many. In that way the so-called Newton search
direction is obtained. The advantages are: (1) it is not
necessary to find the inverse of the Hessian (which requires
$O(n^3)$ operations) in each iteration; (2) second-order
information is not required.

Assume that the function f(x) is approximated by
its Taylor series

$$ f(x) = \tfrac{1}{2} x^T A x - b^T x \qquad (4) $$

with positive-definite matrix A ($A \succ 0$). Applying the
first-order condition $\nabla f(x) = Ax - b = 0$, we have
$A x_{\mathrm{opt}} = b$, where $x_{\mathrm{opt}}$ is a minimum point. At the
current point we have $A x_i = \nabla f(x_i) + b$. We will not
rigorously derive here the Davidon-Fletcher-Powell
algorithm for taking $H_i$ to $H_{i+1}$. Let us just mention that
subtracting these two equations and multiplying (from
the left) by the inverse matrix $A^{-1}$, we have

$$ x_{\mathrm{opt}} - x_i = -A^{-1} \nabla f(x_i). $$

Subtracting this equation written at $x_{i+1}$ from the same
equation at $x_i$ gives

$$ x_{i+1} - x_i = A^{-1} \big( \nabla f(x_{i+1}) - \nabla f(x_i) \big). \qquad (5) $$

Having made the step from $x_i$ to $x_{i+1}$, we might
reasonably want to require that the new approximation $H_{i+1}$
satisfies (5) as if it were actually $A^{-1}$, that is,

$$ x_{i+1} - x_i = H_{i+1} \big( \nabla f(x_{i+1}) - \nabla f(x_i) \big). \qquad (6) $$


We might also assume that the updating formula for 
matrix H; should be of the form Hj+,; = H; + U, 
where U is a correction. It is possible to get different 


Function VarMetric(x); 
1 let x € R” be an initial solution 
2 H<—I;g<——Vf(x) 
3 fori=1tondo 
4 a* <— argming f(x +a- Hg) 
5S = RO cle e— VW if@) 
6 H<H+U 

end 
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Variable metric algorithm 


Function BestImprovement(x)
1 repeat
2   x' ← x
3   x ← arg min_{y ∈ N(x)} f(y)
  until (f(x) ≥ f(x'));

Variable Neighborhood Search Methods, Algorithm 2
Best improvement (steepest descent) heuristic


updating formulas for $U$, and thus for $H_{i+1}$, that keep $H_{i+1}$ positive-definite ($H_{i+1} \succ 0$). In fact, there exists a whole family of updates, the Broyden family. From practical experience, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method seems to be the most popular (see [48] for details). Pseudo-code is given in Algorithm 1.
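To make Algorithm 1 concrete for the quadratic (4), here is a minimal Python sketch. It assumes that the correction U is the BFGS update of the inverse-Hessian approximation and that the line search in step 4 is exact, which for a quadratic has the closed form α* = gᵀd / dᵀAd with d = Hg. The function and helper names are illustrative, not taken from [48]. For A ≻ 0 this combination is known to reach $x_{opt}$ after n iterations, with H equal to $A^{-1}$ up to rounding.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def var_metric_quadratic(A: Matrix, b: Vector, x: Vector) -> Tuple[Vector, Matrix]:
    """Variable metric method for f(x) = 0.5 x^T A x - b^T x with A positive definite.
    Assumption: U is the BFGS correction of the inverse-Hessian approximation H."""
    n = len(x)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # H <- I
    grad = lambda z: [az - bz for az, bz in zip(matvec(A, z), b)]  # grad f = Ax - b
    g = [-v for v in grad(x)]  # g <- -grad f(x), as in Algorithm 1
    for _ in range(n):
        d = matvec(H, g)  # search direction Hg
        dAd = dot(d, matvec(A, d))
        if dAd <= 0.0:  # d = 0: x is already the minimum
            break
        alpha = dot(g, d) / dAd  # exact minimizer of f(x + alpha * d)
        s = [alpha * di for di in d]  # s = x_{i+1} - x_i
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [-v for v in grad(x)]
        y = [go - gn for go, gn in zip(g, g_new)]  # y = grad f(x_{i+1}) - grad f(x_i)
        sy = dot(s, y)
        if sy <= 0.0:
            break
        Hy = matvec(H, y)
        c = (sy + dot(y, Hy)) / sy ** 2
        # BFGS inverse update H <- H + U; afterwards H y = s, i.e. condition (6)
        H = [[H[i][j] + c * s[i] * s[j] - (Hy[i] * s[j] + s[i] * Hy[j]) / sy
              for j in range(n)] for i in range(n)]
        g = g_new
    return x, H


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_opt, H = var_metric_quadratic(A, b, [0.0, 0.0])
# x_opt is approximately [1/11, 7/11]; H is approximately A^{-1} = [[3, -1], [-1, 4]] / 11
```

For a nonquadratic f the closed-form step would have to be replaced by an approximate line search, and termination after n iterations is no longer guaranteed.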

From the above one can conclude that, even in solving a convex program, a change of metric, and thus a change of the neighborhoods induced by that metric, may produce more efficient algorithms. Thus, using the idea of neighborhood change for solving NP-hard problems could well lead to even greater benefits.


Local Search 


A local search heuristic consists of choosing an initial solution x, finding a direction of descent from x within a neighborhood N(x), and moving to the minimum of f(x) within N(x) along that direction; if there is no direction of descent, the heuristic stops, otherwise it is iterated. Usually the steepest descent direction, also referred to as best improvement, is used. This set of rules is summarized in Algorithm 2, where we assume that an initial solution x is given. The output consists of a local minimum, also denoted by x, and its value.
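A minimal Python sketch of Algorithm 2 follows, assuming 0-1 vectors and the neighborhood N(x) of all vectors obtained by complementing one component (the example mentioned in the next paragraph). The names `one_flip_neighbors` and `best_improvement` and the toy objective are hypothetical. Unlike the literal pseudocode, it tests for improvement before moving, so the returned x is itself the local minimum.

```python
from typing import Callable, Iterator, List

Vec = List[int]


def one_flip_neighbors(x: Vec) -> Iterator[Vec]:
    """N(x): every 0-1 vector obtained from x by complementing one component."""
    for i in range(len(x)):
        y = x.copy()
        y[i] = 1 - y[i]
        yield y


def best_improvement(x: Vec, f: Callable[[Vec], float]) -> Vec:
    """Steepest descent: repeatedly move to the best neighbor while it improves f."""
    while True:
        best = min(one_flip_neighbors(x), key=f)  # explore N(x) completely
        if f(best) >= f(x):  # no descent direction: x is a local minimum
            return x
        x = best


w = [3, -2, 5, -1]
f = lambda x: sum(wi * xi for wi, xi in zip(w, x))
x_loc = best_improvement([0, 0, 0, 0], f)  # [0, 1, 0, 1], with f(x_loc) == -3
```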

Observe that a neighborhood structure $N(x)$ is defined for all $x \in X$; in discrete optimization problems it
Function FirstImprovement(x)
1 repeat
2   x' ← x; i ← 0
3   repeat
4     i ← i + 1
5     x ← arg min{f(x), f(x_i)}, x_i ∈ N(x)
    until (f(x) < f(x_i) or i = |N(x)|);
  until (f(x) ≥ f(x'));

Variable Neighborhood Search Methods, Algorithm 3
First improvement heuristic


usually consists of all vectors obtained from x by some simple modification, e.g., complementing one or two components of a 0-1 vector. Then, at each step, the neighborhood N(x) of x is explored completely. As this may be time-consuming, an alternative is to use the first descent heuristic. Vectors $x_i \in N(x)$ are then enumerated systematically and a move is made as soon as a descent direction is found. This is summarized in Algorithm 3.
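The first descent rule of Algorithm 3 can be sketched in Python as follows, again assuming 0-1 vectors and the one-flip neighborhood enumerated in index order; the function name and the toy objective are illustrative assumptions.

```python
from typing import Callable, List

Vec = List[int]


def first_improvement(x: Vec, f: Callable[[Vec], float]) -> Vec:
    """First descent: scan the one-flip neighbors x_i in a fixed order, move as
    soon as one improves f, and stop when a full scan finds no improvement."""
    improved = True
    while improved:
        improved = False
        fx = f(x)
        for i in range(len(x)):
            y = x.copy()
            y[i] = 1 - y[i]  # neighbor obtained by complementing component i
            if f(y) < fx:
                x, improved = y, True
                break  # move immediately, then rescan N(x) of the new x
    return x


w = [3, -2, 5, -1]
f = lambda x: sum(wi * xi for wi, xi in zip(w, x))
x_loc = first_improvement([0, 0, 0, 0], f)  # [0, 1, 0, 1]
```

On this separable objective both heuristics reach the same point, but first improvement evaluates fewer neighbors per move.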


Basic Schemes 


Let us denote by $N_k$ ($k = 1, \ldots, k_{max}$) a finite set of preselected neighborhood structures, and by $N_k(x)$ the set of solutions in the kth neighborhood of x. We will also use the notation $N'_k$, $k = 1, \ldots, k'_{max}$, when describing local descent. Neighborhoods $N_k$ or $N'_k$ may be induced from one or more metric (or quasi-metric) functions introduced into a solution space S. An optimal solution $x_{opt}$ (or global minimum) is a feasible solution where a minimum of (1) is reached. We call $x' \in X$ a local minimum of (1) with respect to $N_k$ if there is no solution $x \in N_k(x') \subseteq X$ such that $f(x) < f(x')$.

In order to solve (1) by using several neighborhoods, facts 1–3 can be used in three different ways: (i) deterministic; (ii) stochastic; (iii) both deterministic and stochastic. We first give in Algorithm 4 the steps of the neighborhood change function that will be used later.

The function NeighborhoodChange() com- 
pares the new value f(x’) with the incumbent value 
f(x) obtained in the neighborhood k (line 1). If an im- 
provement is obtained, k is returned to its initial value 
and the new incumbent updated (line 2). Otherwise, the 
next neighborhood is considered (line 3). 
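Because Python cannot update the caller's x and k in place as the pseudocode does, this sketch of Algorithm 4 returns the new pair instead. The name `neighborhood_change` and the explicit objective argument f are assumptions of the sketch.

```python
from typing import Callable, List, Tuple

Vec = List[int]


def neighborhood_change(x: Vec, x_new: Vec, k: int,
                        f: Callable[[Vec], float]) -> Tuple[Vec, int]:
    """Algorithm 4: return the (possibly updated) incumbent and neighborhood index."""
    if f(x_new) < f(x):  # line 1: compare with the incumbent
        return x_new, 1  # line 2: make a move, restart from the first neighborhood
    return x, k + 1      # line 3: try the next neighborhood


print(neighborhood_change([1, 1], [0, 1], 3, sum))  # ([0, 1], 1)
print(neighborhood_change([0, 1], [1, 1], 2, sum))  # ([0, 1], 3)
```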


Function NeighborhoodChange(x, x', k)
1 if f(x') < f(x) then
2   x ← x'; k ← 1      /* Make a move */
  else
3   k ← k + 1          /* Next neighborhood */
  end

Variable Neighborhood Search Methods, Algorithm 4
Neighborhood change or move or not function


Function VND(x, k'_max)
1 repeat
2   k ← 1
3   repeat
4     x' ← arg min_{y ∈ N'_k(x)} f(y)   /* Find the best neighbor in N'_k(x) */
5     NeighborhoodChange(x, x', k)      /* Change neighborhood */
    until k = k'_max;
  until no improvement is obtained;

Variable Neighborhood Search Methods, Algorithm 5
Steps of the basic variable neighborhood descent (VND)


Variable Neighborhood Descent 


The variable neighborhood descent (VND) method is 
obtained if the change of neighborhoods is performed 
in a deterministic way. Its steps are presented in Algo- 
rithm 5. In the descriptions of all algorithms that follow 
we assume that an initial solution x is given. 

Most local search heuristics use in their descents a single or sometimes two neighborhoods ($k'_{max} \le 2$). Note that the final solution should be a local minimum with respect to all $k'_{max}$ neighborhoods, and thus the chances of reaching a global one are larger when using VND than with a single neighborhood structure. Besides this sequential order of neighborhood structures in VND above, one can develop a nested strategy. Assume, e.g., that $k'_{max} = 3$; then a possible nested strategy is: perform VND from Algorithm 5 for the first two neighborhoods, in each point x' that belongs to the third ($x' \in N'_3(x)$). Such an approach is applied, e.g., in [14,57].
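A minimal Python sketch of VND, assuming $N'_k(x)$ is the set of 0-1 vectors obtained by complementing exactly k components. It collapses the two nested loops of Algorithm 5 into one: when the loop ends, no $N'_k$ with $k \le k'_{max}$ contains a better point, so a second outer pass would find no improvement. It also treats $k \le k'_{max}$ as the continuation test, so the last neighborhood is searched too. The names and the four-point objective table are hypothetical.

```python
from itertools import combinations
from typing import Callable, List

Vec = List[int]


def best_k_flip(x: Vec, k: int, f: Callable[[Vec], float]) -> Vec:
    """Best vector of N'_k(x): complement exactly k components of x (1 <= k <= len(x))."""
    best = None
    for idx in combinations(range(len(x)), k):
        y = x.copy()
        for i in idx:
            y[i] = 1 - y[i]
        if best is None or f(y) < f(best):
            best = y
    return best


def vnd(x: Vec, f: Callable[[Vec], float], k_max_prime: int = 2) -> Vec:
    """Deterministic neighborhood change: after a move go back to N'_1,
    otherwise try N'_{k+1}; stop when no N'_k, k <= k'_max, improves x."""
    k_max_prime = min(k_max_prime, len(x))
    k = 1
    while k <= k_max_prime:
        x_new = best_k_flip(x, k, f)  # line 4: best neighbor in N'_k(x)
        if f(x_new) < f(x):           # line 5: NeighborhoodChange
            x, k = x_new, 1
        else:
            k += 1
    return x


table = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): -1}
f = lambda x: table[tuple(x)]
print(vnd([0, 0], f))  # [1, 1]; with k'_max = 1 the search would stay at [0, 0]
```

The toy table illustrates the point of the paragraph above: [0, 0] is a local minimum for the one-flip neighborhood alone, but not for the two neighborhoods together.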


Reduced VNS 


The reduced VNS (RVNS) method is obtained if random points are selected from $N_k(x)$ and no descent
Function RVNS(x, k_max, t_max)
1 repeat
2   k ← 1
3   repeat
4     x' ← Shake(x, k)
5     NeighborhoodChange(x, x', k)
    until k = k_max;
6   t ← CpuTime()
  until t > t_max;

Variable Neighborhood Search Methods, Algorithm 6
Steps of the reduced variable neighborhood search (RVNS)


is made. Rather, the values of these new points are compared with that of the incumbent, and updating takes place in the case of improvement. We assume that a stopping condition has been chosen among various possibilities, e.g., the maximum CPU time allowed $t_{max}$, or the maximum number of iterations between two improvements. To simplify the description of the algorithms we always use $t_{max}$ below. Therefore, RVNS uses two parameters: $t_{max}$ and $k_{max}$. Its steps are presented in Algorithm 6. With the function Shake in line 4, we generate a point x' at random from the kth neighborhood of x, i.e., $x' \in N_k(x)$.

RVNS is useful for very large instances for which local search is costly. It is observed that the best value for the parameter $k_{max}$ is often 2. In addition, the maximum number of iterations between two improvements is usually used as a stopping condition. RVNS is akin to a Monte Carlo method, but is more systematic (see [80], where results obtained by RVNS were 30% better than those of the Monte Carlo method in solving a continuous min-max problem). When applied to the p-median problem, RVNS gave solutions as good as those of the fast interchange heuristic of [104] in 20 to 40 times less time [60].
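A minimal Python sketch of RVNS (Algorithm 6), assuming 0-1 vectors, a Shake that complements k distinct random components, and a fixed number of outer passes in place of the CPU-time limit $t_{max}$. The names, the seed and the toy objective are illustrative assumptions.

```python
import random
from typing import Callable, List

Vec = List[int]


def shake(x: Vec, k: int, rng: random.Random) -> Vec:
    """Random point of N_k(x): complement k distinct, randomly chosen components.
    Requires k <= len(x)."""
    y = x.copy()
    for i in rng.sample(range(len(x)), k):
        y[i] = 1 - y[i]
    return y


def rvns(x: Vec, f: Callable[[Vec], float], k_max: int = 2,
         max_iter: int = 100, seed: int = 0) -> Vec:
    """Reduced VNS: compare random points of N_k(x) with the incumbent; no local search.
    The max_iter outer passes stand in for the stopping rule t > t_max."""
    rng = random.Random(seed)
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            x_new = shake(x, k, rng)  # line 4
            if f(x_new) < f(x):       # line 5: NeighborhoodChange
                x, k = x_new, 1
            else:
                k += 1
    return x


w = [3, -2, 5, -1]
f = lambda x: sum(wi * xi for wi, xi in zip(w, x))
x_rvns = rvns([0, 0, 0, 0], f)
```

Since only improving points are accepted, the returned value is never worse than the starting one; how close it gets to the optimum depends on the random draws.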


Basic VNS 


The basic VNS method [78] combines deterministic 
and stochastic changes of neighborhood. Its steps are 
given in Algorithm 7. 

Often successive neighborhoods $N_k$ will be nested. Observe that point x' is generated at random in step 4 in order to avoid cycling, which might occur if a deterministic rule were applied. In step 5 the first improvement local search (Algorithm 3) is usually adopted;


Function VNS(x, k_max, t_max)
1 repeat
2   k ← 1
3   repeat
4     x' ← Shake(x, k)               /* Shaking */
5     x'' ← FirstImprovement(x')     /* Local search */
6     NeighborhoodChange(x, x'', k)  /* Change neighborhood */
    until k = k_max;
7   t ← CpuTime()
  until t > t_max;

Variable Neighborhood Search Methods, Algorithm 7
Steps of the basic variable neighborhood search (VNS)


however, it can be replaced with best improvement (Al- 
gorithm 2). 
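The basic VNS of Algorithm 7 can be sketched in Python by combining a random shake with the first improvement local search. The sketch assumes 0-1 vectors, one-flip local search, and a fixed number of outer passes instead of the CPU-time test; all names are hypothetical.

```python
import random
from typing import Callable, List

Vec = List[int]


def shake(x: Vec, k: int, rng: random.Random) -> Vec:
    """Random point of N_k(x): complement k distinct components (k <= len(x))."""
    y = x.copy()
    for i in rng.sample(range(len(x)), k):
        y[i] = 1 - y[i]
    return y


def first_improvement(x: Vec, f: Callable[[Vec], float]) -> Vec:
    """First-improvement local search in the one-flip neighborhood."""
    improved = True
    while improved:
        improved = False
        fx = f(x)
        for i in range(len(x)):
            y = x.copy()
            y[i] = 1 - y[i]
            if f(y) < fx:
                x, improved = y, True
                break
    return x


def vns(x: Vec, f: Callable[[Vec], float], k_max: int = 2,
        max_iter: int = 50, seed: int = 0) -> Vec:
    """Basic VNS; max_iter outer passes replace the CPU-time limit t_max."""
    rng = random.Random(seed)
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            x1 = shake(x, k, rng)          # step 4: shaking
            x2 = first_improvement(x1, f)  # step 5: local search
            if f(x2) < f(x):               # step 6: NeighborhoodChange
                x, k = x2, 1
            else:
                k += 1
    return x


w = [3, -2, 5, -1]
f = lambda x: sum(wi * xi for wi, xi in zip(w, x))
print(vns([0, 0, 0, 0], f))  # [0, 1, 0, 1]
```

Swapping `first_improvement` for a best improvement routine changes only step 5, as the text notes.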


General VNS 


Note that the local search step 5 may also be replaced by VND (Algorithm 5). Using this general VNS (VNS/VND) approach led to the most successful applications reported [3,14,18,20,21,22,23,57,61,94,96]. The steps of the general VNS are given in Algorithm 8.
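A compact Python sketch of GVNS (Algorithm 8): shaking in $N_k$, followed by VND over the assumed local-descent neighborhoods $N'_k$ ("complement exactly k components"). Outer passes replace the CPU-time limit, and the objective table is a hypothetical toy example.

```python
import random
from itertools import combinations
from typing import Callable, List

Vec = List[int]


def shake(x: Vec, k: int, rng: random.Random) -> Vec:
    y = x.copy()
    for i in rng.sample(range(len(x)), k):  # N_k(x): k random flips, k <= len(x)
        y[i] = 1 - y[i]
    return y


def vnd(x: Vec, f: Callable[[Vec], float], k_max_prime: int) -> Vec:
    """VND over N'_k = 'complement exactly k components', k = 1..k'_max."""
    k = 1
    while k <= min(k_max_prime, len(x)):
        candidates = []
        for idx in combinations(range(len(x)), k):
            y = x.copy()
            for i in idx:
                y[i] = 1 - y[i]
            candidates.append(y)
        x_new = min(candidates, key=f)  # best neighbor in N'_k(x)
        if f(x_new) < f(x):
            x, k = x_new, 1
        else:
            k += 1
    return x


def gvns(x: Vec, f: Callable[[Vec], float], k_max_prime: int = 2,
         k_max: int = 2, max_iter: int = 20, seed: int = 0) -> Vec:
    """General VNS: shake in N_k, then VND as the local search."""
    rng = random.Random(seed)
    for _ in range(max_iter):  # stands in for t > t_max
        k = 1
        while k <= k_max:
            x2 = vnd(shake(x, k, rng), f, k_max_prime)  # steps 4 and 5
            if f(x2) < f(x):                            # step 6
                x, k = x2, 1
            else:
                k += 1
    return x


table = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): -1}
f = lambda x: table[tuple(x)]
print(gvns([0, 0], f))  # [1, 1]
```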


Skewed VNS 


The skewed VNS (SVNS) method [53] addresses the problem of exploring valleys far from the incumbent solution. Indeed, once the best solution in a large region has been found, it is necessary to go quite far to obtain an improved one. Solutions drawn at random in far-away neighborhoods may differ substantially from the incumbent, and VNS can then degenerate, to some extent, into the multistart heuristic (in which descents are made iteratively from solutions generated at random, and which is known not to be very efficient). So some compensation for distance from the incumbent must be made, and a scheme called SVNS is proposed for that purpose. Its steps are presented in Algorithms 9 and 10, where the KeepBest(x, x') function simply keeps the better of x and x': if f(x') < f(x) then x ← x'. SVNS makes use of a function $\rho(x, x'')$ to measure the distance between the incumbent solution x and the local optimum found, x''. The distance used to define


[Figure legend: global minimum; local minimum]

Variable Neighborhood Search Methods, Figure 1
General VNS (GVNS)


Function GVNS(x, k'_max, k_max, t_max)
1 repeat
2   k ← 1
3   repeat
4     x' ← Shake(x, k)
5     x'' ← VND(x', k'_max)
6     NeighborhoodChange(x, x'', k)
    until k = k_max;
7   t ← CpuTime()
  until t > t_max;

Variable Neighborhood Search Methods, Algorithm 8
Steps of the GVNS


Function NeighborhoodChangeS(x, x'', k, α)
1 if f(x'') − αρ(x, x'') < f(x) then
2   x ← x''; k ← 1
  else
3   k ← k + 1
  end

Variable Neighborhood Search Methods, Algorithm 9
Steps of neighborhood change for the skewed VNS (SVNS)


Function SVNS(x, k_max, t_max, α)
1 repeat
2   k ← 1; x_best ← x
3   repeat
4     x' ← Shake(x, k)
5     x'' ← FirstImprovement(x')
6     KeepBest(x_best, x'')
7     NeighborhoodChangeS(x, x'', k, α)
    until k = k_max;
8   x ← x_best
9   t ← CpuTime()
  until t > t_max;

Variable Neighborhood Search Methods, Algorithm 10
Steps of the SVNS


the $N_k$, as in the above examples, could also be used for this purpose. The parameter $\alpha$ must be chosen so as to accept exploring valleys far from x when f(x'') is larger than f(x), but not too much larger (otherwise one will always leave x). A good value is to be found experimentally in each case. Moreover, in order to avoid frequent moves from x to a close solution, one may take a large value for $\alpha$ when $\rho(x, x'')$ is small. More sophisticated choices for a function of $\alpha\rho(x, x'')$ could be made through some learning process.
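A minimal Python sketch of SVNS (Algorithms 9 and 10). It assumes 0-1 vectors, the Hamming distance as $\rho$, a constant $\alpha$, and outer passes instead of the CPU-time test. Because skewed moves can accept worse solutions and reset k, two local optima could keep replacing each other; the sketch therefore adds a cap on inner steps, which is not part of the pseudocode. All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vec = List[int]


def shake(x: Vec, k: int, rng: random.Random) -> Vec:
    y = x.copy()
    for i in rng.sample(range(len(x)), k):  # k random flips, k <= len(x)
        y[i] = 1 - y[i]
    return y


def first_improvement(x: Vec, f: Callable[[Vec], float]) -> Vec:
    improved = True
    while improved:
        improved = False
        fx = f(x)
        for i in range(len(x)):
            y = x.copy()
            y[i] = 1 - y[i]
            if f(y) < fx:
                x, improved = y, True
                break
    return x


def rho(x: Vec, y: Vec) -> int:
    """Assumed distance: number of components in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def neighborhood_change_s(x: Vec, x2: Vec, k: int, alpha: float,
                          f: Callable[[Vec], float]) -> Tuple[Vec, int]:
    """Algorithm 9: accept x2 if f(x2) - alpha * rho(x, x2) < f(x)."""
    if f(x2) - alpha * rho(x, x2) < f(x):
        return x2, 1
    return x, k + 1


def svns(x: Vec, f: Callable[[Vec], float], alpha: float = 0.5, k_max: int = 2,
         max_iter: int = 50, seed: int = 0) -> Vec:
    rng = random.Random(seed)
    for _ in range(max_iter):                      # stands in for t > t_max
        k, x_best = 1, x                           # line 2
        steps = 0
        while k <= k_max and steps < 10 * k_max:   # guard: skewed moves may cycle
            steps += 1
            x2 = first_improvement(shake(x, k, rng), f)  # lines 4-5
            if f(x2) < f(x_best):                        # line 6: KeepBest
                x_best = x2
            x, k = neighborhood_change_s(x, x2, k, alpha, f)  # line 7
        x = x_best                                 # line 8
    return x


w = [3, -2, 5, -1]
f = lambda x: sum(wi * xi for wi, xi in zip(w, x))
print(svns([0, 0, 0, 0], f))  # [0, 1, 0, 1]
```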


Some Extensions of Basic VNS 


Several easy ways to extend the basic VNS are now discussed. The basic VNS is a descent, first improvement method with randomization. Without much additional effort it could be transformed into a descent-ascent method: in the NeighborhoodChange() function, also set $x \leftarrow x''$ with some probability even if the solution is worse than the incumbent (or the best solution found so far). It could also be changed into a best improvement method: make a move to the best neighborhood $k^*$ among all $k_{max}$ of them. Its steps are given in Algorithm 11.

Another variant of the basic VNS could be to find a solution x' in step 4 as the best among b (a parameter) randomly generated solutions from the kth neighborhood. There are two possible variants of this extension: (i) perform only one local search from the best point among b; (ii) perform all b local searches and then choose the best. We now give an algorithm of the second type, suggested by Fleszar and Hindi [41]. There, the value of parameter b is set to k. In that way no new parameter is introduced (see Algorithm 12).
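A minimal Python sketch of the Fleszar–Hindi variant (Algorithm 12), with b = k shakes, each followed by a local search, in neighborhood k. It keeps the best of the k local optima in a separate variable before the neighborhood change, which is our reading of lines 7 and 8. As before, 0-1 vectors, one-flip local search, and outer passes instead of CPU time are assumptions, and the names are hypothetical.

```python
import random
from typing import Callable, List

Vec = List[int]


def shake(x: Vec, k: int, rng: random.Random) -> Vec:
    y = x.copy()
    for i in rng.sample(range(len(x)), k):  # k random flips, k <= len(x)
        y[i] = 1 - y[i]
    return y


def first_improvement(x: Vec, f: Callable[[Vec], float]) -> Vec:
    improved = True
    while improved:
        improved = False
        fx = f(x)
        for i in range(len(x)):
            y = x.copy()
            y[i] = 1 - y[i]
            if f(y) < fx:
                x, improved = y, True
                break
    return x


def fh_vns(x: Vec, f: Callable[[Vec], float], k_max: int = 3,
           max_iter: int = 30, seed: int = 0) -> Vec:
    rng = random.Random(seed)
    for _ in range(max_iter):  # stands in for t > t_max
        k = 1
        while k <= k_max:
            best_local = None
            for _trial in range(k):  # b = k local searches from random points of N_k(x)
                x2 = first_improvement(shake(x, k, rng), f)
                if best_local is None or f(x2) < f(best_local):
                    best_local = x2  # KeepBest among the k local optima
            if f(best_local) < f(x):  # NeighborhoodChange
                x, k = best_local, 1
            else:
                k += 1
    return x


w = [3, -2, 5, -1]
f = lambda x: sum(wi * xi for wi, xi in zip(w, x))
print(fh_vns([0, 0, 0, 0], f))  # [0, 1, 0, 1]
```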

It is also possible to introduce $k_{min}$ and $k_{step}$, two parameters that control the neighborhood change process: in the previous algorithms, instead of $k \leftarrow 1$ set $k \leftarrow k_{min}$, and instead of $k \leftarrow k + 1$ set $k \leftarrow k + k_{step}$. Steps of jump VNS are given in Algorithms 13 and 14.
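The jump rule of Algorithm 14 can be sketched in Python as below. The sketch returns the new pair instead of updating in place, and it uses $k \le k_{max}$ as the continuation test (an assumption: with $k_{step} > 1$, k may jump past $k_{max}$ without ever equalling it). Both function names are hypothetical.

```python
from typing import Callable, List, Tuple

Vec = List[int]


def neighborhood_change_jump(x: Vec, x_new: Vec, k: int, k_min: int, k_step: int,
                             f: Callable[[Vec], float]) -> Tuple[Vec, int]:
    """Algorithm 14: on improvement restart from k_min, otherwise jump by k_step."""
    if f(x_new) < f(x):
        return x_new, k_min
    return x, k + k_step


def neighborhoods_without_improvement(k_min: int, k_step: int, k_max: int) -> List[int]:
    """Neighborhood indices visited in one pass when no move is ever accepted."""
    return list(range(k_min, k_max + 1, k_step))


print(neighborhood_change_jump([1, 1], [0, 1], 9, 3, 3, sum))  # ([0, 1], 3)
print(neighborhood_change_jump([0, 1], [1, 1], 9, 3, 3, sum))  # ([0, 1], 12)
print(neighborhoods_without_improvement(3, 3, 12))  # [3, 6, 9, 12]
```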


Variable Neighborhood Decomposition Search 


While the basic VNS is clearly useful for approximate 
solution of many combinatorial and global optimiza- 
tion problems, it remains difficult or takes a long time 
to solve very large instances. Often the size of the prob- 
lems considered is limited in practice by the tools avail- 
able to solve them more than by the needs of poten- 
tial users of these tools. Hence, improvements appear to 
be highly desirable. Moreover, when heuristics are ap- 
plied to really large instances their strengths and weak- 
nesses become clearly apparent. Three improvements 
of the basic VNS for solving large instances are now 
considered. 


Function BI-VNS(x, k_max, t_max)
1 repeat
2   k ← 1; x_best ← x
3   repeat
4     x' ← Shake(x, k)
5     x'' ← FirstImprovement(x')
6     KeepBest(x_best, x'')
7     k ← k + 1
    until k = k_max;
8   x ← x_best
9   t ← CpuTime()
  until t > t_max;

Variable Neighborhood Search Methods, Algorithm 11
Steps of the basic best improvement VNS (BI-VNS)


Function FH-VNS(x, k_max, t_max)
1 repeat
2   k ← 1
3   repeat
4     for ℓ = 1 to k do
5       x' ← Shake(x, k)
6       x'' ← FirstImprovement(x')
7       KeepBest(x, x'')
      end
8     NeighborhoodChange(x, x'', k)
    until k = k_max;
9   t ← CpuTime()
  until t > t_max;

Variable Neighborhood Search Methods, Algorithm 12
Steps of the Fleszar–Hindi extension of the basic VNS (FH-VNS)


The variable neighborhood decomposition search (VNDS) method [60] extends the basic VNS into a two-level VNS scheme based upon decomposition of the problem. Its steps are presented in Algorithm 15, where $t_d$ is an additional parameter that represents the running time given for solving decomposed (smaller-sized) problems by VNS.

For ease of presentation, but without loss of generality, we assume that the solution x represents a set of some elements. In step 4 we denote by y a set of k solution attributes present in x' but not in x ($y = x' \setminus x$). In step 5 we find the local optimum y' in the space of y; then we denote by x'' the corresponding solution in the whole space S ($x'' = (x' \setminus y) \cup y'$). We noticed
Function J-VNS(x, k_min, k_step, k_max, t_max)
1 repeat
2   k ← k_min
3   repeat
4     x' ← Shake(x, k)
5     x'' ← FirstImprovement(x')
6     NeighborhoodChanged(x, x'', k, k_min, k_step)
    until k = k_max;
7   t ← CpuTime()
  until t > t_max;

Variable Neighborhood Search Methods, Algorithm 13
Steps of the jump VNS (J-VNS)


Function NeighborhoodChanged(x, x', k, k_min, k_step)
1 if f(x') < f(x) then
2   x ← x'; k ← k_min
  else
3   k ← k + k_step
  end

Variable Neighborhood Search Methods, Algorithm 14
Neighborhood change or move or not function


Function VNDS(x, k_max, t_max, t_d)
1 repeat
2   k ← 2
3   repeat
4     x' ← Shake(x, k); y ← x' \ x
5     y' ← VNS(y, k, t_d); x'' = (x' \ y) ∪ y'
6     x''' ← FirstImprovement(x'')
7     NeighborhoodChange(x, x''', k)
    until k = k_max;
  until t > t_max;

Variable Neighborhood Search Methods, Algorithm 15
Steps of variable neighborhood decomposition search (VNDS)


that exploiting some boundary effects in a new solution can significantly improve the solution quality. That is why, in step 6, we find the local optimum x''' in the whole space S using x'' as an initial solution. If this is time-consuming, then at least a few local search iterations should be performed.

VNDS can be viewed as embedding the classi- 
cal successive approximation scheme (which has been 
used in combinatorial optimization at least since the 
1960s [50]) in the VNS framework. 


Parallel VNS 


Parallel VNS methods are another extension. Several ways of parallelizing VNS have recently been proposed [26,71] for solving the p-median problem. In [71] three of them were tested: (i) parallelize local search; (ii) augment the number of solutions drawn from the current neighborhood and do local search in parallel from each of them; and (iii) do the same as in (ii) but update the information on the best solution found. The second version gave the best results. It was shown in [26] that assigning different neighborhoods to each processor and interrupting their work as soon as an improved solution is found gives very good results: the best known solutions have been found on several large instances taken from TSPLIB [92]. Three parallel VNS strategies are also suggested for solving the traveling purchaser problem in [83].


Primal—Dual VNS 


For most modern heuristics the difference in value be- 
tween the optimal solution and the one obtained is 
completely unknown. Guaranteed performance of the 
primal heuristic may be determined if a lower bound on 
Function PD-VNS(x, k'_max, k_max, t_max)
1 BVNS(x, k'_max, k_max, t_max)   /* Solve primal by VNS */
2 DualFeasible(x, y)              /* Find (infeasible) dual such that f_P = f_D */
3 DualVNS(y)                      /* Use VNS to decrease infeasibility */
4 DualExact(y)                    /* Find exact (relaxed) dual */
5 BandB(x, y)                     /* Apply branch-and-bound method */

Variable Neighborhood Search Methods, Algorithm 16
Steps of the basic primal–dual VNS (PD-VNS)


the objective function value is known. To that end, the standard approach is to relax the integrality condition on the primal variables, based on a mathematical programming formulation of the problem. However, when the dimension of the problem is large, even the relaxed problem may be impossible to solve exactly by standard commercial solvers. Therefore, it seems a good idea to solve dual relaxed problems heuristically as well. In that way we get guaranteed bounds on the primal heuristic's performance. The next problem arises if we want to get an exact solution within a branch-and-bound framework, since having the approximate value of the relaxed dual does not allow us to branch in an easy way, e.g., by exploiting complementary slackness conditions. Thus, the exact value of the dual is necessary.

In primal–dual VNS [52] we propose one possible general way to get both the guaranteed bounds and the exact solution. Its steps are given in Algorithm 16.

In the first stage a heuristic procedure based on VNS is used to obtain a near-optimal solution. In [52] we showed that VNS with decomposition is a very powerful technique for large-scale simple plant location problems (SPLP) with up to 15,000 facilities × 15,000 users. In the second phase, our approach is designed to find an exact solution of the relaxed dual problem. For solving SPLP, this is accomplished in three stages: (i) find an initial dual solution (generally infeasible) using the primal heuristic solution and complementary slackness conditions; (ii) improve the solution by applying VNS on the unconstrained nonlinear form of the dual; (iii) finally, solve the dual exactly using a customized “sliding simplex” algorithm that applies “windows” on the dual variables to reduce substantially the size of the problem. In all problems tested, including instances much larger than previously reported in the literature, our procedure was able to find the exact dual solution in reasonable computing time. In the third and final phase, armed with tight upper and lower bounds obtained, respectively, from the heuristic primal solution in phase 1 and the exact dual solution in phase 2, we apply a standard branch-and-bound algorithm to find an optimal solution of the original problem. The lower bounds are updated with the dual sliding simplex method, and the upper bounds whenever new integer solutions are obtained at the nodes of the branching tree. In this way we were able to solve exactly problem instances with up to 7,000 × 7,000 for uniform fixed costs and 15,000 × 15,000 otherwise.


Variable Neighborhood Formulation Space Search 


Traditional ways to tackle an optimization problem 
consider a given formulation and search in some way 
through its feasible set S. The fact that the same prob- 
lem may often be formulated in different ways allows 
us to extend search paradigms to include jumps from 
one formulation to another. Each formulation should 
lend itself to some traditional search method, its “lo- 
cal search” that works totally within this formulation, 
and yields a final solution when started from some ini- 
tial solution. Any solution found in one formulation 
should easily be translatable to its equivalent formula- 
tion in any other formulation. We may then move from 
one formulation to another using the solution resulting 
from the former’s local search as the initial solution for 
the latter’s local search. Such a strategy will of course only be useful if local searches in different formulations behave differently.

This idea was recently investigated in [81] using 
an approach that systematically changes formulations 
for solving circle packing problems (CPP). It is shown 
there that a stationary point of a nonlinear program- 
ming formulation of CPP in Cartesian coordinates is 
not necessarily also a stationary point in a polar coordinate system. The method called reformulation descent, which alternates between these two formulations until the final solution is stationary with respect to both, is suggested. The results obtained were comparable with the best known values, but they were achieved some 150
Function FormulationChange(x, x', φ, φ', ℓ)
1 if f(φ', x') < f(φ, x) then
2   φ ← φ'; x ← x'; ℓ ← ℓ_min
  else
3   ℓ ← ℓ + ℓ_step
  end

Variable Neighborhood Search Methods, Algorithm 17
Formulation change function


Function VNFSS(x, φ, ℓ_max)
1 repeat
2   ℓ ← 1                                  /* Initialize formulation in F */
3   while ℓ ≤ ℓ_max do
4     ShakeFormulation(x, x', φ, φ', ℓ)   /* Take (φ', x') ∈ (N_ℓ(φ), N(x)) at random */
5     FormulationChange(x, x', φ, φ', ℓ)  /* Change formulation */
    end
  until some stopping condition is met;

Variable Neighborhood Search Methods, Algorithm 18
Reduced variable neighborhood formulation space search (VNFSS)


times faster than by an alternative single formulation 
approach. In that same paper the idea suggested above 
of formulation space search was also introduced, using 
more than two formulations. Some research in that di- 
rection has been reported in [64,75,84]. One algorithm 
that uses the variable neighborhood idea in search- 
ing through the formulation space is given in Algo- 
rithms 17 and 18. 

In Fig. 2 we consider the CPP case with n = 50. The formulation set consists of all mixed formulations, in which some circle centers are given in Cartesian coordinates, while the others are given in polar coordinates. The distance between two formulations is then the number of centers whose coordinates are expressed in different systems in each formulation. Our formulation space search starts with the reformulation descent solution, i.e., with $r = 0.121858$. The values of $k_{min}$ and $k_{step}$ are set to 3 and the value of $k_{max}$ is set to $n = 50$. We did not get an improvement with $k_{curr} = 3$, 6 and 9. The next improvement was obtained for $k_{curr} = 12$. This means that a “mixed” formulation with 12 polar and 38 Cartesian coordinates is used. Then we turn again to the formulation with three randomly chosen circle centers, which was unsuccessful, but obtained a better solution with six, etc. After 11 improvements we ended up with a solution with radius $r_{max} = 0.125798$.


Applications 


Applications of VNS or of hybrids of VNS and other 
metaheuristics are diverse and numerous. We next re- 
view some of them. Considering first industrial applica- 
tions, the oil industry provided many problems. These 
include scheduling of workover rigs for Petrobras [2],
the design of an offshore pipeline network [13] and 
the pooling problem [5]. Other design problems in- 
clude cable layout [24], synchronous digital hierarchy/ 
wavelength-division multiplexing networks [73], sur- 
face acoustic wave filters [100], topological design of 
a yottabit-per-second lattice network [29], the ring star 
problem [31], distribution networks [67] and supply 
chain planning [68]. Location problems have also at- 
tracted much attention. Among discrete models the p- 
median has been the most studied [15,26,44,54,71,76] 
together with its variants [32,37]; the p-center prob- 
lems [77] and the maximum capture problem [10] have 
also been examined. Among continuous models the 
multisource Weber problem is addressed in [14]. Use 
of VNS to solve the quadratic assignment problem is 
discussed in [33,34,106]. 

VNS proved to be a very efficient tool in clus- 
ter analysis. In particular, the J-means heuristic com- 
bined with VNS appears to be the state of the art for 
heuristic solution of minimum sum-of-square cluster- 
ing [8,9,57]. Combined with stabilized column gener- 
ation [36] it leads to the presently most efficient exact 
algorithm for that problem [35]. 

Other combinatorial optimization problems on 
graphs to which VNS has been applied include the 
degree-constrained spanning tree problem [16,94,102], 
the clique problem [62], the max-cut problem [38], 
the median cycle problem [86] and vertex color- 
ing [6,64]. Some further discrete combinatorial opti- 
mization problems, unrelated to graphs, to which VNS 
has been applied are the linear ordering problem [45], 
bin packing [42] and the multidimensional knapsack 
problem [89]. 

Heuristics may help to find a feasible solution or an 
improved and possibly optimal solution to large and 
difficult mixed-integer programs. The local branching 
[Figure: RD result, r = 0.121858]

Variable Neighborhood Search Methods, Figure 2
Reduced formulation space search for the circle packing problem and n = 50


method of Fischetti and Lodi [39] does that, in the spirit 
of VNS. For further developments see [40,61]. 

Timetabling and related manpower organization 
problems can be well solved with VNS. They include 
the team-orienteering problem [4], the examination 
proximity problem [25], the design of balanced MBA 
student teams [30] and apportioning the European Par- 
liament [103]. 

Various vehicle-routing problems were solved by 
VNS or hybrids [12,63,66,87,88,93,97,105]. This led to 
interesting developments such as the reactive VNS of 
Braysy [12]. Use of VNS to solve machine scheduling 
problems was studied in many papers [11,27,28,41,51, 
70,82,90,98]. 

Miscellaneous other problems solved with VNS in- 
clude study of the dynamics of handwriting [19], the 
capacitated lot-sizing problem with setup times [65], 
the location-routing problem with nonlinear costs [72] 
and continuous time-constrained optimization prob- 
lems [101]. 

In all these applications VNS is used as an optimization tool. It can also lead to results in “discovery science,” i.e., help in the development of theories. This has been done for graph theory in a long series of papers reporting on the development and applications of the system AutoGraphiX [20,21]. See also [22,23] for applications to chemistry and [1] for a survey with many further references. This system addresses the following problems:

• Find a graph satisfying given constraints;
• Find optimal or near-optimal graphs for an invariant subject to constraints;
• Refute a conjecture;
• Suggest a conjecture (or sharpen one);
• Suggest a proof.

This is done by applying VNS to find extremal graphs, using a VND with many neighborhoods defined by modifications of the graphs such as removal or addition of an edge, rotation of an edge, and so forth. Once a set of extremal graphs, parametrized by their order, has been found, their properties are explored with various data-mining techniques, leading to conjectures, refutations and simple proofs or ideas of proof.

Note finally that a series of papers on VNS presented 
at the 18th EURO Mini-Conference on Variable Neigh- 
borhood Search, Tenerife, November 2005, will appear 
soon in special issues of the European Journal of Oper- 
ational Research, IMA Journal of Management Mathe- 
matics and Journal of Heuristics. 
Equilibrium is a fundamental concept in the study of competitive problems arising in such fields as operations research and management science, engineering, economics, regional science, and finance. Methodologies that have been applied to the study of equilibrium problems include systems of equations, optimization theory, complementarity theory, and fixed point theory. Variational inequality theory, in particular, has become a powerful technique for equilibrium analysis and computation.


Variational inequalities were introduced by P. Hart- 
man and G. Stampacchia [3], principally, for the 
study of partial differential equation problems drawn 
from mechanics. That research focused on infinite- 
dimensional variational inequalities. An exposition of 
infinite-dimensional variational inequalities and refer- 
ences can be found in [4]. 

M.J. Smith [9] provided a formulation of the traffic
network equilibrium problem which was then shown 
by S.C. Dafermos [2] to satisfy a finite-dimensional 
variational inequality problem. This connection al- 
lowed for the construction of more realistic mod- 
els as well as rigorous computational techniques for 
equilibrium problems including: traffic network equi- 
librium problems, spatial price equilibrium problems, 
oligopolistic market equilibrium problems, as well as 
economic and financial equilibrium problems (cf. [5,6], 
and the references therein). 

Many mathematical problems can be formulated as 
variational inequality problems and, hence, this for- 
mulation is particularly convenient since it allows for 
a unified treatment of equilibrium and optimization 
problems. 


Definition 1 (variational inequality problem) The finite-dimensional variational inequality problem, VI(F, K), is to determine a vector $x^* \in K \subset \mathbb{R}^n$ such that

$$\langle F(x^*)^{T}, x - x^* \rangle \ge 0, \quad \forall x \in K,$$

where $F$ is a given continuous function from $K$ to $\mathbb{R}^n$, $K$ is a given closed convex set, and $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathbb{R}^n$.


We now discuss some basic problem types and their re- 
lationships to the variational inequality problem. We 
also provide examples. Proofs of the theoretical results 
may be found in [4,5]. For algorithms for the computa- 
tion of variational inequalities, see also [1,5,7,8]. 

We begin with systems of equations, which have 
been used to formulate certain equilibrium problems. 
We then discuss optimization problems, both uncon- 
strained and constrained, as well as complementarity 
problems. We conclude with a fixed point problem and 
its relationship with the variational inequality problem. 
Problem Classes


We here briefly review certain problem classes, which 
appear frequently in equilibrium modeling, and iden- 
tify their relationships to the variational inequality 
problem. 


Systems of Equations 


Systems of equations are common in equilibrium analysis, expressing, for example, that the demand is equal to the supply of various commodities at the equilibrium price levels. Let $K = \mathbb{R}^n$ and let $F: \mathbb{R}^n \to \mathbb{R}^n$ be a given function. A vector $x^* \in \mathbb{R}^n$ is said to solve a system of equations if

$$F(x^*) = 0.$$


The relationship to a variational inequality problem is 
stated in the following 


Proposition 2 Let $K = \mathbb{R}^n$ and let $F: \mathbb{R}^n \to \mathbb{R}^n$ be a given vector function. Then $x^* \in \mathbb{R}^n$ solves the variational inequality problem VI(F, K) if and only if $x^*$ solves the system of equations

$$F(x^*) = 0.$$


Example 3 (Market equilibrium with equalities only) As an illustration, we now present an example of a system of equations. Consider m consumers, with a typical consumer denoted by j, and n commodities, with a typical commodity denoted by i. We let p denote the n-dimensional column vector of the commodity prices with components $\{p_1, \ldots, p_n\}$.

Assume that the demand for a commodity i, $d_i$, may, in general, depend upon the prices of all the commodities, that is,

$$d_i(p) = \sum_{j=1}^{m} d_i^j(p),$$

where $d_i^j(p)$ denotes the demand for commodity i by consumer j at the price vector p.

Similarly, the supply of a commodity i, $s_i$, may, in general, depend upon the prices of all the commodities, that is,

$$s_i(p) = \sum_{j=1}^{m} s_i^j(p),$$

where $s_i^j(p)$ denotes the supply of commodity i by consumer j at the price vector p.

We group the aggregate demands for the commodities into the n-dimensional column vector d with components $\{d_1, \dots, d_n\}$ and the aggregate supplies of the commodities into the n-dimensional column vector s with components $\{s_1, \dots, s_n\}$.

The market equilibrium conditions that require that the supply of each commodity must be equal to the demand for each commodity at the equilibrium price vector $p^*$ are equivalent to the following system of equations:

$$s(p^*) - d(p^*) = 0.$$


Clearly, this expression can be put into the standard nonlinear equation form if we define the vectors $x = p$ and $F(x) = s(p) - d(p)$.

Note, however, that the problem class of nonlinear equations is not sufficiently general to guarantee, for example, that $x^* \ge 0$, which may be desirable in this example in which the vector x refers to prices.
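To make the last remark concrete, here is a small hypothetical two-commodity instance with affine supply and demand functions (the coefficients are invented for illustration and do not come from the text). With $F(p) = s(p) - d(p)$ affine, solving $F(p^*) = 0$ is a single linear solve, and nothing in that formulation keeps the components of $p^*$ nonnegative.

```python
import numpy as np

# Hypothetical affine supply and demand for n = 2 commodities:
#   s(p) = S p + s0,   d(p) = -D p + d0   (coefficients invented for illustration)
S = np.array([[2.0, 0.5],
              [0.5, 3.0]])
s0 = np.array([1.0, 4.0])
D = np.eye(2)
d0 = np.array([6.0, 1.0])

def F(p):
    """Excess supply F(p) = s(p) - d(p)."""
    return (S @ p + s0) - (-D @ p + d0)

# F(p) = (S + D) p + (s0 - d0), so F(p*) = 0 is a linear system in p*.
p_star = np.linalg.solve(S + D, d0 - s0)
print(p_star)      # second component is negative: not an admissible price
print(F(p_star))   # residual is zero up to rounding
```

The complementarity formulation of Example 9 below is one way to rule out such negative prices.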


Optimization Problems 


Optimization problems, on the other hand, consider explicitly an objective function to be minimized (or maximized), subject to constraints that may consist of both equalities and inequalities. Let f be a continuously differentiable function, where $f\colon K \to \mathbb{R}$. Mathematically, the statement of an optimization problem is:

$$\min f(x) \quad \text{s.t.} \quad x \in K.$$


The relationship between an optimization problem and 
a variational inequality problem is now highlighted. 


Proposition 4 Let $x^*$ be a solution to the optimization problem:

$$\min f(x) \quad \text{s.t.} \quad x \in K,$$

where f is continuously differentiable and K is closed and convex. Then $x^*$ is a solution of the variational inequality problem:

$$\langle \nabla f(x^*)^T, x - x^* \rangle \ge 0, \quad \forall x \in K.$$

Furthermore, we have the following:


Proposition 5 If f(x) is a convex function and $x^*$ is a solution to VI($\nabla f$, K), then $x^*$ is a solution to the above optimization problem.


If the feasible set $K = \mathbb{R}^n$, then the unconstrained optimization problem is also a variational inequality problem.

On the other hand, in the case where a certain sym- 
metry condition holds, the variational inequality prob- 
lem can be reformulated as an optimization problem. In 
other words, in the case that the variational inequality 
formulation of the equilibrium conditions underlying 
a specific problem is characterized by a function with 
a symmetric Jacobian, then the solution of the equilib- 
rium conditions and the solution of a particular opti- 
mization problem are one and the same. We first intro- 
duce the following definition and then fix this relation- 
ship in a theorem. 


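Definition 6 An $n \times n$ matrix $M(x)$, whose elements $m_{ij}(x)$, $i = 1, \dots, n$, $j = 1, \dots, n$, are functions defined on the set $S \subset \mathbb{R}^n$, is said to be positive semidefinite on S if

$$v^T M(x) v \ge 0, \quad \forall v \in \mathbb{R}^n,\ x \in S.$$

It is said to be positive definite on S if

$$v^T M(x) v > 0, \quad \forall v \ne 0,\ v \in \mathbb{R}^n,\ x \in S.$$

It is said to be strongly positive definite on S if

$$v^T M(x) v \ge \alpha \|v\|^2, \quad \text{for some } \alpha > 0,\ \forall v \in \mathbb{R}^n,\ x \in S.$$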


Note that if $\gamma(x)$ is the smallest eigenvalue, which is necessarily real, of the symmetric part of $M(x)$, that is, $[M(x) + M(x)^T]/2$, then it follows that:

i) $M(x)$ is positive semidefinite on S if and only if $\gamma(x) \ge 0$, for all $x \in S$;

ii) $M(x)$ is positive definite on S if and only if $\gamma(x) > 0$, for all $x \in S$;

iii) $M(x)$ is strongly positive definite on S if and only if $\gamma(x) \ge \alpha > 0$, for all $x \in S$.
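For a constant matrix M (so the dependence on x drops out), the eigenvalue test in items i) to iii) can be sketched in a few lines. The function name and the tolerance are illustrative choices. Because eigenvalues are computed in floating point, a small tolerance is needed to separate "$\ge 0$" from "$> 0$".

```python
import numpy as np

def classify(M, tol=1e-12):
    """Classify a constant square matrix by the smallest eigenvalue
    gamma of its symmetric part (M + M^T) / 2, as in items i)-iii)."""
    M = np.asarray(M, dtype=float)
    sym = 0.5 * (M + M.T)
    gamma = np.linalg.eigvalsh(sym).min()   # eigvalsh: real eigenvalues of a symmetric matrix
    if gamma > tol:
        # For constant M this also gives strong positive definiteness with alpha = gamma.
        return "positive definite", gamma
    if gamma >= -tol:
        return "positive semidefinite", gamma
    return "not positive semidefinite", gamma

# An asymmetric matrix can still be positive definite in this sense:
print(classify([[1.0, 3.0], [-3.0, 1.0]]))   # symmetric part is the identity
print(classify([[1.0, 1.0], [1.0, 1.0]]))    # eigenvalues 0 and 2
print(classify([[0.0, 2.0], [0.0, 0.0]]))    # symmetric part has eigenvalues -1 and 1
```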


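Theorem 7 Assume that F(x) is continuously differentiable on K and that the Jacobian matrix

$$\nabla F(x) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial F_n}{\partial x_1} & \cdots & \frac{\partial F_n}{\partial x_n} \end{pmatrix}$$

is symmetric and positive semidefinite. Then there is a real-valued convex function $f\colon K \to \mathbb{R}^1$ satisfying

$$\nabla f(x) = F(x),$$

with $x^*$ the solution of VI(F, K) also being the solution of the mathematical programming problem:

$$\min f(x) \quad \text{s.t.} \quad x \in K.$$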


Hence, although the variational inequality problem encompasses the optimization problem, a variational inequality problem can be reformulated as a convex optimization problem only when the symmetry condition and the positive semidefiniteness condition hold.

Therefore, the variational inequality is the more 
general problem in that it can also handle a function 
F(x) with an asymmetric Jacobian. Historically, many 
equilibrium problems were reformulated as optimiza- 
tion problems, under precisely such a symmetry as- 
sumption. The assumption, however, in terms of appli- 
cations was restrictive and precluded the more realis- 
tic modeling of multiple commodities, multiple modes 
and/or classes in competition. Moreover, the objective 
function that resulted was sometimes artificial, without 
a clear economic interpretation, and simply a mathe- 
matical device. 


Complementarity Problems 


The variational inequality problem also contains the 
complementarity problem as a special case. Comple- 
mentarity problems are defined on the nonnegative or- 
thant. 

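Let $\mathbb{R}^n_+$ denote the nonnegative orthant in $\mathbb{R}^n$, and let $F\colon \mathbb{R}^n \to \mathbb{R}^n$. The nonlinear complementarity problem over $\mathbb{R}^n_+$ is a system of equations and inequalities stated as:

Find $x^* \ge 0$

$$\text{s.t.} \quad F(x^*) \ge 0 \quad \text{and} \quad \langle F(x^*)^T, x^* \rangle = 0.$$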


Whenever the mapping F is affine, that is, whenever $F(x) = Mx + b$, where M is an $n \times n$ matrix and b an $n \times 1$ vector, the problem is then known as the linear complementarity problem.

The relationship between the complementarity
problem and the variational inequality problem is as 
follows. 


Proposition 8 VI($F$, $\mathbb{R}^n_+$) and the complementarity problem have precisely the same solutions, if any.
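For a linear complementarity problem with very few variables, a brute-force solver is enough to see Proposition 8 at work. For each guess of which components of $x^*$ are positive, complementarity forces $F_i(x^*) = 0$ on that set, which is a linear solve. This enumeration is exponential in n and is not a method from the text. It is shown only as a sketch for tiny problems, and the data M and b are hypothetical.

```python
import itertools
import numpy as np

def solve_lcp_by_enumeration(M, b, tol=1e-10):
    """Find x >= 0 with F(x) = M x + b >= 0 and <F(x), x> = 0 by trying
    every candidate support (set of components allowed to be positive)."""
    M, b = np.asarray(M, float), np.asarray(b, float)
    n = len(b)
    for support in itertools.product([False, True], repeat=n):
        idx = [i for i in range(n) if support[i]]
        x = np.zeros(n)
        if idx:
            try:
                # Complementarity forces F_i(x) = 0 on the support.
                x[idx] = np.linalg.solve(M[np.ix_(idx, idx)], -b[idx])
            except np.linalg.LinAlgError:
                continue                         # singular block: skip this guess
        Fx = M @ x + b
        if np.all(x >= -tol) and np.all(Fx >= -tol) and abs(x @ Fx) <= tol:
            return x
    return None                                  # no solution among the supports

M = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([-1.0, 1.0])
x = solve_lcp_by_enumeration(M, b)
# x = (0.5, 0): x_1 > 0 with F_1(x) = 0, and x_2 = 0 with F_2(x) = 1.5 >= 0
```

Consistent with Proposition 8, the same x also satisfies $\langle F(x)^T, y - x \rangle \ge 0$ for every $y \ge 0$.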


Example 9 (Market equilibrium with equalities and inequalities) We now present a nonlinear complementarity formulation of market equilibrium. We assume that the prices must now be nonnegative in equilibrium. Hence, we consider the following situation, in which the demand functions are given as previously, as are the supply functions. Now, however, instead of the market equilibrium conditions, which are represented by a system of equations, we have the following equilibrium conditions: For each commodity i, $i = 1, \dots, n$:

$$s_i(p^*) - d_i(p^*) \begin{cases} = 0 & \text{if } p_i^* > 0, \\ \ge 0 & \text{if } p_i^* = 0. \end{cases}$$


Note that these equilibrium conditions state that if the price of a commodity is positive in equilibrium, then the supply of that commodity must be equal to the demand for that commodity. On the other hand, if the price of a commodity at equilibrium is zero, then there may be an excess supply of that commodity at equilibrium, that is, $s_i(p^*) - d_i(p^*) > 0$, or the market clears. Furthermore, this system of equalities and inequalities guarantees that the prices of the commodities do not take on negative values, which may occur in the system of equations expressing the market clearing conditions.

We now give the nonlinear complementarity formulation of this problem:

Determine $p^* \in \mathbb{R}^n_+$ satisfying

$$s(p^*) - d(p^*) \ge 0, \qquad \langle (s(p^*) - d(p^*))^T, p^* \rangle = 0.$$


Moreover, since a nonlinear complementarity problem is a special case of a variational inequality problem, we may rewrite the nonlinear complementarity formulation of the market equilibrium problem above as a variational inequality problem:

Determine $p^* \in \mathbb{R}^n_+$, such that

$$\langle (s(p^*) - d(p^*))^T, p - p^* \rangle \ge 0, \quad \forall p \in \mathbb{R}^n_+.$$


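Note, first, that in the special case of demand functions and supply functions which are separable, the Jacobians of these functions are symmetric since they are diagonal and given, respectively, by

$$\nabla s(p) = \begin{pmatrix} \frac{\partial s_1}{\partial p_1} & 0 & \cdots & 0 \\ 0 & \frac{\partial s_2}{\partial p_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{\partial s_n}{\partial p_n} \end{pmatrix}, \qquad \nabla d(p) = \begin{pmatrix} \frac{\partial d_1}{\partial p_1} & 0 & \cdots & 0 \\ 0 & \frac{\partial d_2}{\partial p_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{\partial d_n}{\partial p_n} \end{pmatrix}.$$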


Indeed, in this special case model, the supply of a com- 
modity depends only upon the price of that commodity 
and, similarly, the demand for a commodity depends 
only upon the price of that commodity. 

Hence, in this special case, the price vector p* that 
satisfies the equilibrium conditions can be obtained by 
solving the following optimization problem: 


$$\min \sum_{i=1}^{n} \int_0^{p_i} s_i(x)\,dx - \sum_{i=1}^{n} \int_0^{p_i} d_i(y)\,dy \quad \text{s.t.} \quad p_i \ge 0, \quad i = 1, \dots, n.$$
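In the separable case the minimization above decouples by commodity. Assuming, purely for illustration, affine functions $s_i(p_i) = a_i p_i + b_i$ and $d_i(p_i) = -c_i p_i + e_i$ with $a_i + c_i > 0$, the i-th term of the objective is $\tfrac12 (a_i + c_i) p_i^2 + (b_i - e_i) p_i$. Its minimizer over $p_i \ge 0$ is the root of $s_i - d_i$ clipped at zero. The coefficients below are invented, and the sketch checks the equilibrium conditions of Example 9 at the result.

```python
import numpy as np

# Hypothetical separable affine data for n = 3 commodities:
#   s_i(p_i) = a_i p_i + b_i,   d_i(p_i) = -c_i p_i + e_i
a = np.array([1.0, 2.0, 0.5])
b = np.array([1.0, 5.0, 0.0])
c = np.array([1.0, 1.0, 1.5])
e = np.array([5.0, 2.0, 4.0])

def objective(p):
    """sum_i int_0^{p_i} s_i - sum_i int_0^{p_i} d_i, evaluated in closed form."""
    return np.sum(0.5 * (a + c) * p**2 + (b - e) * p)

# Each term is a convex quadratic in p_i; its minimizer over p_i >= 0 is the
# root of s_i(p_i) - d_i(p_i) = (a_i + c_i) p_i + (b_i - e_i), clipped at zero.
p_star = np.maximum(0.0, (e - b) / (a + c))
excess = (a * p_star + b) - (-c * p_star + e)      # s(p*) - d(p*)

print(p_star)    # the second commodity has a zero price
print(excess)    # zero where p_i* > 0, nonnegative where p_i* = 0
```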


Note that one also obtains an optimization reformulation of the equilibrium conditions provided that the following symmetry condition holds: $\partial s_i / \partial p_k = \partial s_k / \partial p_i$ and $\partial d_i / \partial p_k = \partial d_k / \partial p_i$ for all commodities i, k. In other words, the price of a commodity k affects the supply of a commodity i in the same way that the price of a commodity i affects the supply of a commodity k. A similar situation must hold for the demands for the commodities.

However, such conditions are limiting from the application standpoint. Hence the appeal of the variational inequality problem, which enables the formulation and, ultimately, the computation of equilibria where such restrictive symmetry assumptions on the underlying functions need no longer hold. Indeed, such symmetry assumptions were not imposed in the variational inequality problem.


Example 10 (Market equilibrium with equalities and inequalities and policy interventions) We now provide a generalization of the preceding market equilibrium model to allow for price policy interventions in the form of price floors and ceilings. In particular, we let $p_i^C$ denote the imposed price ceiling on the price of commodity i, and we let $p_i^F$ denote the imposed price floor on the price of commodity i. Then we have the following equilibrium conditions: For each commodity i, $i = 1, \dots, n$:

$$s_i(p^*) - d_i(p^*) \begin{cases} \le 0 & \text{if } p_i^* = p_i^C, \\ = 0 & \text{if } p_i^F < p_i^* < p_i^C, \\ \ge 0 & \text{if } p_i^* = p_i^F. \end{cases}$$


Note that these equilibrium conditions state that if the price of a commodity in equilibrium lies between the imposed price floor and ceiling, then the supply of that commodity must be equal to the demand for that commodity. On the other hand, if the price of a commodity at equilibrium is at the imposed floor, then there may be an excess supply of that commodity at equilibrium, that is, $s_i(p^*) - d_i(p^*) > 0$, or the market clears. In contrast, if the price of a commodity in equilibrium is at the imposed ceiling, then there may be an excess demand for the commodity in equilibrium.


We now provide a variational inequality formulation of the governing equilibrium conditions:

Determine $p^* \in K$, such that

$$\langle (s(p^*) - d(p^*))^T, p - p^* \rangle \ge 0, \quad \forall p \in K,$$

where the feasible set $K = \{p \mid p^F \le p \le p^C\}$, and $p^F$ and $p^C$ denote, respectively, the n-dimensional column vectors of imposed price floors and ceilings.
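The three cases of the equilibrium conditions translate directly into a componentwise test of a candidate price vector. The following is a minimal sketch. The function name and tolerance are illustrative, and the excess supply $s(p) - d(p)$ is assumed to have already been evaluated at the candidate p.

```python
import numpy as np

def check_floor_ceiling_equilibrium(p, excess, p_floor, p_ceiling, tol=1e-9):
    """Check the equilibrium conditions of Example 10, where
    excess[i] = s_i(p) - d_i(p):
      at the ceiling:          excess <= 0  (possible excess demand)
      strictly between bounds: excess == 0  (market clears)
      at the floor:            excess >= 0  (possible excess supply)
    """
    p, excess = np.asarray(p, float), np.asarray(excess, float)
    p_floor, p_ceiling = np.asarray(p_floor, float), np.asarray(p_ceiling, float)
    if np.any(p < p_floor - tol) or np.any(p > p_ceiling + tol):
        return False                                   # p must lie in K
    at_floor = np.abs(p - p_floor) <= tol
    at_ceiling = np.abs(p - p_ceiling) <= tol
    interior = ~(at_floor | at_ceiling)
    ok_floor = np.all(excess[at_floor & ~at_ceiling] >= -tol)
    ok_ceiling = np.all(excess[at_ceiling & ~at_floor] <= tol)
    ok_interior = np.all(np.abs(excess[interior]) <= tol)
    return bool(ok_floor and ok_ceiling and ok_interior)

# Floor with excess supply, interior with clearing, ceiling with excess demand:
print(check_floor_ceiling_equilibrium([0.0, 0.5, 1.0], [2.0, 0.0, -3.0],
                                      np.zeros(3), np.ones(3)))
```

A commodity whose floor equals its ceiling (a fixed price) places no sign restriction on its excess supply in this sketch.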


Fixed Point Problems 


We now turn to a discussion of fixed point problems in 
conjunction with variational inequality problems. We 
also provide the geometric interpretation of the varia- 
tional inequality problem and its relationship to a fixed 
point problem. 

We first define a projection. For a graphical depiction, see Fig. 1.


Definition 11 (a projection) Let K be a closed convex set in $\mathbb{R}^n$. Then for each $x \in \mathbb{R}^n$, there is a unique point $y \in K$, such that

$$\|x - y\| \le \|x - z\|, \quad \forall z \in K,$$

and y is known as the orthogonal projection of x on the set K with respect to the Euclidean norm, that is,

$$y = P_K(x) = \arg\min_{z \in K} \|x - z\|.$$

In other words, the closest point to x lying in the set K is given by y.

Variational Inequalities, Figure 1
The projection y of x on the set K

Variational Inequalities, Figure 2
Geometric depiction of the variational inequality problem and its fixed point equivalence (with $\gamma = 1$)

We now present a property of the projection operator that is useful both in the qualitative analysis of equilibria and in their computation. Let K again be a closed convex set. Then the projection operator $P_K$ is nonexpansive, that is,

$$\|P_K x - P_K x'\| \le \|x - x'\|, \quad \forall x, x' \in \mathbb{R}^n.$$
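For two common closed convex sets, a box and a Euclidean ball, the projection $P_K$ has a closed form. The sketch below implements both and checks nonexpansiveness empirically on random pairs of points. The particular sets and the random sampling are illustrative choices, and a numerical check of this kind illustrates the property without proving it.

```python
import numpy as np

def project_box(x, lower, upper):
    """Euclidean projection onto the box {z : lower <= z <= upper}."""
    return np.clip(x, lower, upper)

def project_ball(x, center, radius):
    """Euclidean projection onto the closed ball {z : ||z - center|| <= radius}."""
    d = x - center
    norm = np.linalg.norm(d)
    return x if norm <= radius else center + (radius / norm) * d

# Empirical check of ||P_K x - P_K x'|| <= ||x - x'|| on random pairs.
rng = np.random.default_rng(0)
lower, upper = np.zeros(3), np.ones(3)
center, radius = np.zeros(3), 1.0
worst = 0.0
for _ in range(1000):
    x, xp = 3.0 * rng.normal(size=3), 3.0 * rng.normal(size=3)
    for P in (lambda v: project_box(v, lower, upper),
              lambda v: project_ball(v, center, radius)):
        ratio = np.linalg.norm(P(x) - P(xp)) / np.linalg.norm(x - xp)
        worst = max(worst, ratio)
print(worst)   # never exceeds 1 (up to rounding)
```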


The relationship between a variational inequality and 
a fixed point problem can now be stated (see Fig. 2). 


Theorem 12 Assume that K is closed and convex. Then $x^* \in K$ is a solution of the variational inequality problem VI(F, K) if and only if $x^*$ is a fixed point of the map $P_K(I - \gamma F)\colon K \to K$, for $\gamma > 0$, that is,

$$x^* = P_K(x^* - \gamma F(x^*)).$$
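Theorem 12 suggests the basic projection method: iterate $x \leftarrow P_K(x - \gamma F(x))$ until the fixed-point residual is small. The sketch below applies it to a hypothetical instance of the price floor/ceiling model of Example 10. Here K is a box, so $P_K$ is componentwise clipping. The affine excess supply $F(p) = Mp + q$ with an asymmetric M, the bounds, and the step-size rule are all assumptions made for the illustration. For a general F this simple iteration need not converge. In this instance the symmetric part of M is positive definite, and the step is chosen small enough for the iteration map to be a contraction.

```python
import numpy as np

# Hypothetical affine excess supply F(p) = s(p) - d(p) = M p + q, asymmetric M.
M = np.array([[3.0, 1.0],
              [-1.0, 2.0]])
q = np.array([-2.0, -5.0])
p_floor = np.array([0.0, 0.0])
p_ceiling = np.array([1.0, 1.0])

def F(p):
    return M @ p + q

def projection_method(F, project, x0, gamma, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration x <- P_K(x - gamma F(x)) suggested by Theorem 12."""
    x = x0
    for k in range(max_iter):
        x_new = project(x - gamma * F(x))
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k + 1
        x = x_new
    raise RuntimeError("no convergence within max_iter")

# Step size below alpha / L^2 (alpha: smallest eigenvalue of the symmetric
# part of M, L: spectral norm of M), so the iteration map is a contraction.
alpha = np.linalg.eigvalsh(0.5 * (M + M.T)).min()
L = np.linalg.norm(M, 2)
gamma = 0.5 * alpha / L**2

p_star, iters = projection_method(
    F, lambda x: np.clip(x, p_floor, p_ceiling), np.zeros(2), gamma)
print(p_star, iters)   # approximately [1/3, 1]: commodity 2 sits at its ceiling
```

At the computed point the first market clears ($F_1 = 0$ with $p_1^*$ strictly inside its bounds) and the second shows excess demand at its ceiling ($F_2 < 0$), matching the cases of Example 10.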
Let V be a Hilbert space with the norm $\|\cdot\|$ and $V'$ its dual space with the duality pairing denoted by $(\cdot, \cdot)$. Let $a\colon V \times V \to \mathbb{R}^1$ be a bilinear form satisfying:


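boundedness: $\exists M = \text{const} > 0$:

$$|a(u, v)| \le M \|u\| \|v\|, \quad \forall u, v \in V;$$

V-ellipticity: $\exists \alpha = \text{const} > 0$:

$$a(v, v) \ge \alpha \|v\|^2, \quad \forall v \in V. \qquad (\mathrm{H})$$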


Finally, let K be a nonempty, closed and convex subset of V and $f \in V'$ be given.

By an abstract variational inequality of elliptic type we mean a triple $\{K, a, f\}$; the element $u \in K$ satisfying

$$(\mathrm{P})\qquad a(u, v - u) \ge (f, v - u), \quad \forall v \in K,$$

is called its solution. It is known (see [7]) that if a satisfies (H), then (P) has a unique solution for any $f \in V'$. If, moreover, a is symmetric in V, i.e. $a(u, v) = a(v, u)$ for any $u, v \in V$, then (P) is equivalent to the following minimization problem:


$$(\mathrm{P}')\qquad \text{Find } u \in K\colon\ J(u) \le J(v), \quad \forall v \in K,$$

where $J(v) = \tfrac12 a(v, v) - (f, v)$ is the quadratic functional.

In order to define the approximation of (P), we introduce a family $\{V_h\}$ of finite-dimensional subspaces $V_h \subset V$, $\dim V_h = n(h)$, where $h > 0$ is a discretization parameter and $n(h) \to +\infty$ when $h \to 0+$. Let $K_h \subset V_h$ be a nonempty, closed and convex set, not necessarily a subset of K.

By the approximation of (P) we call the problem

$$(\mathrm{P}_h)\qquad \text{Find } u_h \in K_h\colon\ a(u_h, v_h - u_h) \ge (f, v_h - u_h), \quad \forall v_h \in K_h,$$

or, in the case when a is symmetric:

$$(\mathrm{P}_h')\qquad \text{Find } u_h \in K_h\colon\ J(u_h) \le J(v_h), \quad \forall v_h \in K_h.$$

Such an approach is known as the Ritz-Galerkin method for the approximation of (P).

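Let $\{\phi_1, \dots, \phi_{n(h)}\}$ be a basis of $V_h$ and denote by $\pi$ the isomorphism between $V_h$ and $\mathbb{R}^{n(h)}$, $\pi(V_h) = \mathbb{R}^{n(h)}$, defined in the standard way. Then $K_h$ can be identified with a nonempty, closed and convex subset $\mathcal{K} \subset \mathbb{R}^{n(h)}$, where $\mathcal{K} = \pi(K_h)$, and problem $(\mathrm{P}_h)$ can be written in the following algebraic form:

$$(\mathcal{P})\qquad \text{Find } \mathbf{x} \in \mathcal{K}\colon\ (\mathbf{A}\mathbf{x}, \mathbf{y} - \mathbf{x}) \ge (\mathbf{F}, \mathbf{y} - \mathbf{x}), \quad \forall \mathbf{y} \in \mathcal{K},$$

where $\mathbf{A} = (a_{ij})_{i,j=1}^{n(h)}$ is the matrix with the elements $a_{ij} = a(\phi_j, \phi_i)$, $\mathbf{F} = (F_i)_{i=1}^{n(h)} \in \mathbb{R}^{n(h)}$ with $F_i = (f, \phi_i)$, $i = 1, \dots, n(h)$, and $(\cdot, \cdot)$ stands for the scalar product in $\mathbb{R}^{n(h)}$. In addition, if a is symmetric in V, then $(\mathcal{P})$ is equivalent to the constrained minimization problem:

$$(\mathcal{P}')\qquad \text{Find } \mathbf{x} \in \mathcal{K}\colon\ \mathcal{J}(\mathbf{x}) \le \mathcal{J}(\mathbf{y}), \quad \forall \mathbf{y} \in \mathcal{K},$$

where

$$\mathcal{J}(\mathbf{y}) = \tfrac12 (\mathbf{A}\mathbf{y}, \mathbf{y}) - (\mathbf{F}, \mathbf{y}).$$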
A natural question arises, namely how to estimate the error between u and $u_h$. It holds:

Theorem 1 Let u and $u_h$ be the solutions to (P) and $(\mathrm{P}_h)$, respectively, and let (H) be satisfied. Then

$$\alpha \|u - u_h\|^2 \le a(u - u_h, u - u_h) \le (f, u - v_h) + (f, u_h - v) + a(u_h - u, v_h - u) + a(u, v - u_h) + a(u, v_h - u), \quad \forall v \in K,\ \forall v_h \in K_h. \qquad (1)$$


For the proof, see [3,5]. 
Remark 2 If $K_h \subset K$ for any $h > 0$, then choosing $v = u_h$ in (1) we obtain:

$$\alpha \|u - u_h\|^2 \le a(u - u_h, u - u_h) \le (f, u - v_h) + a(u_h - u, v_h - u) + a(u, v_h - u), \quad \forall v_h \in K_h. \qquad (2)$$


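In order to guarantee that $\|u_h - u\| \to 0$, $h \to 0+$, the following properties of the system $\{K_h\}$ are needed:

$$\forall v \in K\ \exists \{v_h\},\ v_h \in K_h\colon\ v_h \to v, \quad h \to 0+, \qquad (3)$$

if $\{v_h\}$, $v_h \in K_h$, is such that

$$v_h \rightharpoonup v \ \text{(weakly) in } V, \text{ then } v \in K. \qquad (4)$$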


Then one has: 


Theorem 3 Let (H) and (3), (4) be satisfied. Then the Ritz-Galerkin method is convergent, i.e.

$$\|u - u_h\| \to 0, \quad h \to 0+.$$

The proof easily follows from (1) and (2).

Remark 4 If $K_h \subset K$ for any $h > 0$, then the condition (4) is automatically satisfied.


In practice, the sets $V_h$ and $K_h$ are constructed by using finite element methods. To illustrate such a construction we consider the following model example.


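Example 5 Let $\{K, a, f\}$ be the variational inequality with

$$K = \{v \in H_0^1(\Omega)\colon v \ge \varphi \text{ a.e. in } \Omega\},$$

$$a(u, v) = \int_\Omega \operatorname{grad} u \cdot \operatorname{grad} v\,dx, \qquad (f, v) = \int_\Omega f v\,dx,$$

where $\varphi \in C(\bar\Omega)$ is a given function, $\varphi \le 0$ on $\partial\Omega$, $f \in L^2(\Omega)$ and $H_0^1(\Omega)$ is the standard Sobolev space of functions vanishing on the boundary $\partial\Omega$.

Since a is bounded and elliptic in $H_0^1(\Omega)$ and K is a nonempty, closed convex subset of $H_0^1(\Omega)$, $\{K, a, f\}$ has a unique solution $u \in K$:

$$\int_\Omega \operatorname{grad} u \cdot \operatorname{grad}(v - u)\,dx \ge \int_\Omega f (v - u)\,dx, \quad \forall v \in K. \qquad (5)$$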


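Let us suppose that $\Omega$ is a plane polygonal domain. Let $\{\mathcal{T}_h\}$ be a regular family of triangulations of $\bar\Omega$ (see [2]). With any $\mathcal{T}_h$ we associate the space of piecewise linear functions $V_h \subset H_0^1(\Omega)$:

$$V_h = \{v_h \in C(\bar\Omega)\colon v_h|_T \in P_1(T),\ \forall T \in \mathcal{T}_h,\ v_h = 0 \text{ on } \partial\Omega\}$$

and its closed convex subset $K_h$:

$$K_h = \{v_h \in V_h\colon v_h(A_i) \ge \varphi(A_i),\ \forall A_i \in \mathcal{N}_h\},$$

where $\mathcal{N}_h$ is the set of all interior nodes of $\mathcal{T}_h$. Note that $K_h \not\subset K$, in general.

The approximation of (5) is defined by

$$\text{Find } u_h \in K_h\colon\ \int_\Omega \operatorname{grad} u_h \cdot \operatorname{grad}(v_h - u_h)\,dx \ge \int_\Omega f (v_h - u_h)\,dx, \quad \forall v_h \in K_h. \qquad (6)$$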


Since a is symmetric, $u_h$ can be equivalently characterized by

$$u_h \in K_h\colon\ J(u_h) \le J(v_h), \quad \forall v_h \in K_h, \qquad (7)$$

where

$$J(v_h) = \frac12 \int_\Omega |\operatorname{grad} v_h|^2\,dx - \int_\Omega f v_h\,dx.$$
Q Q 


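The algebraic form of (7) reads as follows:

$$\text{Find } \mathbf{x} \in \mathcal{K}\colon\ \mathcal{J}(\mathbf{x}) \le \mathcal{J}(\mathbf{y}), \quad \forall \mathbf{y} \in \mathcal{K}, \qquad (8)$$

where

$$\mathcal{K} = \{\mathbf{y} = (y_1, \dots, y_{n(h)}) \in \mathbb{R}^{n(h)}\colon y_i \ge \varphi(A_i),\ i = 1, \dots, n(h)\}, \qquad n(h) = \operatorname{card} \mathcal{N}_h,$$

$$\mathcal{J}(\mathbf{y}) = \tfrac12 (\mathbf{A}\mathbf{y}, \mathbf{y}) - (\mathbf{F}, \mathbf{y}),$$

with

$$\mathbf{A} = (a_{ij})_{i,j=1}^{n(h)}, \quad a_{ij} = \int_\Omega \operatorname{grad}\phi_i \cdot \operatorname{grad}\phi_j\,dx, \qquad \mathbf{F} = (F_i)_{i=1}^{n(h)}, \quad F_i = \int_\Omega f \phi_i\,dx,$$

and $\{\phi_i\}_{i=1}^{n(h)}$ being the basis of $V_h$. Using Theorems 1 and 3 one can prove the following convergence result: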


Theorem 6 It holds:

i) if $\varphi \in H^2(\Omega)$ and $u \in H^2(\Omega) \cap K$, then

$$\|u - u_h\|_{H^1(\Omega)} \le c h,$$

where $c > 0$ does not depend on h;

ii) if $\varphi \in C(\bar\Omega)$, then

$$\|u - u_h\|_{H^1(\Omega)} \to 0, \quad h \to 0+,$$

without any regularity assumption on u.
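The algebraic problem (8) is a convex quadratic program with simple lower bounds on the nodal values, so a simple iterative solver such as projected Gauss-Seidel applies. That solver is a standard technique named here for illustration and is not prescribed by the text. To keep the sketch short it uses a one-dimensional analogue on $\Omega = (0, 1)$ instead of a polygonal domain. There, $\mathbf{A}$ is the tridiagonal stiffness matrix of piecewise linear elements on a uniform mesh. The obstacle, the constant load and the mesh size are hypothetical choices. The sketch also checks the discrete complementarity conditions $\mathbf{A}\mathbf{x} - \mathbf{F} \ge 0$, $\mathbf{x} \ge \varphi$, and $(\mathbf{A}\mathbf{x} - \mathbf{F}) \cdot (\mathbf{x} - \varphi) = 0$.

```python
import numpy as np

# Hypothetical 1D analogue of Example 5 on Omega = (0, 1):
#   minimize 1/2 (A y, y) - (F, y)  subject to  y_i >= obstacle(x_i),
# with piecewise linear elements on a uniform mesh (N interior nodes).
N = 31
h = 1.0 / (N + 1)
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h        # stiffness matrix a_ij
f = -10.0                                      # constant load (assumed)
F = h * f * np.ones(N)                         # F_i = int f phi_i, exact for constant f
obstacle = -0.5 * np.ones(N)                   # obstacle values at the interior nodes

def projected_gauss_seidel(A, F, lower, tol=1e-12, max_iter=20_000):
    """Minimize 1/2 y^T A y - F^T y over y >= lower (A symmetric positive
    definite) by cyclic exact coordinate minimization, each coordinate
    projected onto its bound."""
    y = np.maximum(lower, 0.0)                 # feasible starting point
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(len(F)):
            # Unconstrained minimizer in coordinate i with the others fixed:
            r = F[i] - A[i] @ y + A[i, i] * y[i]
            new = max(lower[i], r / A[i, i])
            max_change = max(max_change, abs(new - y[i]))
            y[i] = new
        if max_change < tol:
            return y
    raise RuntimeError("no convergence within max_iter")

u_h = projected_gauss_seidel(A, F, obstacle)
multiplier = A @ u_h - F                       # >= 0, and zero away from the contact set
```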


To release the constraint $v \in K$, the duality approach may be used. Such a formulation, which involves Lagrange multipliers in addition to the primal variable, is the basis for the so-called mixed finite element methods.

Let Y be another Hilbert space, let $\Lambda \subset Y$ be a closed, convex cone containing the zero element of Y, and let $g \in Y'$ be an element of the dual space to Y. The duality pairing between $Y'$ and Y is denoted by $[\cdot, \cdot]$. Let us suppose that the convex set K is characterized by:

$$K = \{v \in V\colon b(v, \mu) \ge [g, \mu],\ \forall \mu \in \Lambda\},$$

where $b\colon V \times Y \to \mathbb{R}^1$ is a continuous bilinear form.
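We shall define the new problem by:

(M) Find $(u, \lambda) \in V \times \Lambda$ such that

$$a(u, v) - b(v, \lambda) = (f, v), \quad \forall v \in V,$$
$$b(u, \mu - \lambda) \ge [g, \mu - \lambda], \quad \forall \mu \in \Lambda,$$

where $a\colon V \times V \to \mathbb{R}^1$ is the bilinear form satisfying (H).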
Problem (M) will be called the mixed variational formulation of (P). In order to guarantee the existence and the uniqueness of its solution we suppose that (see [1]):

$$\exists \beta > 0\colon\ \sup_{v \in V,\ v \ne 0} \frac{b(v, \mu)}{\|v\|} \ge \beta \|\mu\|_Y, \quad \forall \mu \in Y. \qquad (9)$$


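Remark 7 If a is symmetric on V, then (M) is equivalent to the following saddle-point formulation (see [5]):

(M') Find $(u, \lambda) \in V \times \Lambda$ such that

$$\mathcal{L}(u, \mu) \le \mathcal{L}(u, \lambda) \le \mathcal{L}(v, \lambda), \quad \forall (v, \mu) \in V \times \Lambda,$$

where $\mathcal{L}(v, \mu) = J(v) - b(v, \mu) + [g, \mu]$.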


Let $\{V_h\}$, $\{Y_H\}$ be two families of finite-dimensional subspaces of V and Y, respectively. Let $\Lambda_H \subset Y_H$ be a closed, convex cone containing the zero element of Y.

By the approximation of (M) we call the problem

$(\mathrm{M}_{hH})$ Find $(u_h, \lambda_H) \in V_h \times \Lambda_H$ such that:

$$a(u_h, v_h) - b(v_h, \lambda_H) = (f, v_h), \quad \forall v_h \in V_h,$$
$$b(u_h, \mu_H - \lambda_H) \ge [g, \mu_H - \lambda_H], \quad \forall \mu_H \in \Lambda_H.$$


One can formulate conditions under which the sequence $\{(u_h, \lambda_H)\}$ of solutions to $(\mathrm{M}_{hH})$ tends to the solution $(u, \lambda)$ of (M) (see [3,4]). Such a mixed formulation is useful since:

i) there are no constraints imposed on the primal variable u;

ii) it makes it possible to approximate not only the primal but also the dual variable $\lambda$.


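Example 8 Let us consider the variational inequality $\{K, a, f\}$, where:

$$K = \{v \in H^1(\Omega)\colon v \ge 0 \text{ on } \partial\Omega\},$$

$$a(u, v) = \int_\Omega (\operatorname{grad} u \cdot \operatorname{grad} v + uv)\,dx,$$

$$(f, v) = \int_\Omega f v\,dx, \quad f \in L^2(\Omega).$$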


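Then the convex set K can be equivalently characterized as follows:

$$K = \{v \in H^1(\Omega)\colon [v, \mu] \ge 0,\ \forall \mu \in H^{-1/2}(\partial\Omega),\ \mu \ge 0\},$$

where $H^{-1/2}(\partial\Omega)$ is the dual space to

$$H^{1/2}(\partial\Omega) = \{\varphi\colon \partial\Omega \to \mathbb{R}^1\colon \exists v \in H^1(\Omega),\ v = \varphi \text{ on } \partial\Omega\}.$$

The symbol $[\cdot, \cdot]$ stands for the corresponding duality and the ordering '$\ge$' is defined in the usual way: $\mu \ge 0$ if and only if $[v, \mu] \ge 0$ for any $v \in K$. Denote by $\Lambda$ the convex cone of all nonnegative functionals over $H^{1/2}(\partial\Omega)$. The mixed formulation of $\{K, a, f\}$ is given by: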

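$$\begin{aligned} &\text{Find } (u, \lambda) \in H^1(\Omega) \times \Lambda \\ &\text{s.t. } a(u, v) - [v, \lambda] = (f, v), \quad \forall v \in H^1(\Omega), \\ &\phantom{\text{s.t. }} [u, \mu - \lambda] \ge 0, \quad \forall \mu \in \Lambda. \end{aligned} \qquad (10)$$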


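The approximation of (10) will be defined by a finite element method.

To this end we suppose that $\Omega \subset \mathbb{R}^2$ is a polygonal domain. Let $\{\mathcal{T}_h\}$, $h \to 0+$, be a regular family of triangulations of $\bar\Omega$ and let

$$V_h = \{v_h \in C(\bar\Omega)\colon v_h|_T \in P_1(T),\ \forall T \in \mathcal{T}_h\}$$

be the space of piecewise linear functions over $\mathcal{T}_h$. Further, let $\{\mathcal{T}_H\}$ be a regular family of partitions of $\partial\Omega$ into segments $\Gamma$, the length of which does not exceed the number $H > 0$. We define

$$\Lambda_H = \{\mu_H \in L^2(\partial\Omega)\colon \mu_H|_\Gamma \in P_0(\Gamma),\ \forall \Gamma \in \mathcal{T}_H,\ \mu_H \ge 0 \text{ on } \partial\Omega\},$$

i.e. $\Lambda_H$ is the set of all nonnegative piecewise constant functions over $\mathcal{T}_H$. The approximation of (10) is defined as follows:

$$\begin{aligned} &\text{Find } (u_h, \lambda_H) \in V_h \times \Lambda_H \\ &\text{s.t. } a(u_h, v_h) - (v_h, \lambda_H)_{0,\partial\Omega} = (f, v_h), \quad \forall v_h \in V_h, \\ &\phantom{\text{s.t. }} (u_h, \mu_H - \lambda_H)_{0,\partial\Omega} \ge 0, \quad \forall \mu_H \in \Lambda_H, \end{aligned} \qquad (11)$$

where $(u, \mu)_{0,\partial\Omega} = \int_{\partial\Omega} u \mu\,ds$. The relation between (10) and (11) is studied in [4].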


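Since a is symmetric, problem (11) is equivalent to the saddle-point formulation:

$$\begin{aligned} &\text{Find } (u_h, \lambda_H) \in V_h \times \Lambda_H \\ &\text{s.t. } \mathcal{L}(u_h, \mu_H) \le \mathcal{L}(u_h, \lambda_H) \le \mathcal{L}(v_h, \lambda_H), \quad \forall (v_h, \mu_H) \in V_h \times \Lambda_H, \end{aligned} \qquad (12)$$

where

$$\mathcal{L}(v_h, \mu_H) = \frac12 \int_\Omega (|\operatorname{grad} v_h|^2 + v_h^2)\,dx - \int_{\partial\Omega} v_h \mu_H\,ds - \int_\Omega f v_h\,dx.$$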
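A finite-dimensional analogue of (12) is to minimize $\tfrac12 u^T A u - f^T u$ subject to $Bu \ge g$. Its Lagrangian is $\mathcal{L}(u, \mu) = \tfrac12 u^T A u - f^T u - \mu^T (Bu - g)$ with $\mu \ge 0$, with the same sign convention as in Remark 7. One classical way to compute its saddle point is Uzawa's method: minimize $\mathcal{L}(\cdot, \mu)$ exactly in u, then take a projected ascent step in $\mu$. The method is named here as an illustration only, since the text does not prescribe a solver. The matrices, the step $\rho$, and the stopping rule below are hypothetical. For $B = I$, convergence is expected for $0 < \rho < 2\lambda_{\min}(A)$, and the chosen $\rho = 1$ satisfies this for the data used.

```python
import numpy as np

def uzawa(A, f, B, g, rho, tol=1e-10, max_iter=100_000):
    """Saddle point of L(u, mu) = 1/2 u^T A u - f^T u - mu^T (B u - g)
    over u in R^n and mu >= 0 (A symmetric positive definite)."""
    mu = np.zeros(B.shape[0])
    for _ in range(max_iter):
        # Exact minimization in u: A u - f - B^T mu = 0.
        u = np.linalg.solve(A, f + B.T @ mu)
        # Projected gradient ascent in mu; the gradient of L in mu is g - B u.
        mu_new = np.maximum(0.0, mu + rho * (g - B @ u))
        if np.linalg.norm(mu_new - mu) < tol:
            return u, mu_new
        mu = mu_new
    raise RuntimeError("no convergence within max_iter")

# Hypothetical data: the unconstrained minimizer violates u_1 >= 0.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
f = np.array([-2.0, 1.0])
B = np.eye(2)          # constraints u_i >= 0, i.e. B u >= g with g = 0
g = np.zeros(2)
u, mu = uzawa(A, f, B, g, rho=1.0)
# Expected: u = (0, 1) with multiplier mu = (2.5, 0); the first constraint is active.
```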


The approximation of elliptic variational inequalities describing problems in mechanics of solids (contact problems, problems involving friction, different models of plasticity) can be found in [4,5,6]. The approximation of time-dependent variational inequalities is studied in [3].

Variational inequality theory is a powerful tool in the
qualitative analysis of equilibria. Here we provide a geo- 
metric interpretation of the variational inequality prob- 
lem and conditions for existence and uniqueness of 
solutions. For proofs of the theoretical results stated 
herein, see [1,2]. For stability and sensitivity analysis of 
variational inequalities, including applications, see [2], 
and the references therein. 

In particular, here we consider the finite-dimensional variational inequality problem VI(F, K): Determine $x^* \in K$, such that

$$\langle F(x^*)^T, x - x^* \rangle \ge 0, \quad \forall x \in K,$$

where $K \subset \mathbb{R}^n$ is a closed convex set, F is the vector function $F\colon K \to \mathbb{R}^n$, and $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathbb{R}^n$.

From the definition one can deduce that the necessary and sufficient condition for $x^*$ to be a solution to VI(F, K) is that

$$-F(x^*) \in C(x^*),$$

where C(x) denotes the normal cone of K at x, defined by

$$C(x) = \{y \in \mathbb{R}^n\colon \langle y^T, x' - x \rangle \le 0,\ \forall x' \in K\}.$$

A geometric depiction of the variational inequality problem is given in Fig. 1.

Variational Inequalities: Geometric Interpretation, Existence and Uniqueness, Figure 1
Geometric interpretation of VI(F, K)

Existence of a solution to a variational inequality 
problem follows from continuity of the function F en- 
tering the variational inequality, provided that the fea- 
sible set K is compact. Indeed, we have the following: 


Theorem 1 If K is a compact convex set and F(x) is 
continuous on K, then the variational inequality prob- 
lem admits at least one solution x*. 


In the case of an unbounded feasible set K, this theo- 
rem is no longer applicable; the existence of a solution 
to a variational inequality problem can, nevertheless, be 
established under the subsequent condition. 

Let $B_R(0)$ denote a closed ball with radius R centered at 0 and let $K_R = K \cap B_R(0)$. $K_R$ is then bounded. We denote by $\mathrm{VI}_R$ the variational inequality problem:

Determine $x_R^* \in K_R$, such that

$$\langle F(x_R^*)^T, y - x_R^* \rangle \ge 0, \quad \forall y \in K_R.$$

We now state

Theorem 2 VI(F, K) admits a solution if and only if there exists an $R > 0$ and a solution $x_R^*$ of $\mathrm{VI}_R$ such that $\|x_R^*\| < R$.


Although $\|x_R^*\| < R$ may be difficult to check, one may be able to identify an appropriate R based on the particular application.

Existence of a solution to a variational inequality
problem may also be established under the coercivity 
condition, as in the subsequent corollary. 


Corollary 3 Suppose that F(x) satisfies the coercivity condition:

$$\frac{\langle (F(x) - F(x_0))^T, x - x_0 \rangle}{\|x - x_0\|} \to \infty$$

as $\|x\| \to \infty$ for $x \in K$ and for some $x_0 \in K$. Then VI(F, K) always has a solution.


Corollary 4 Suppose that $x^*$ is a solution of VI(F, K) and $x^* \in K^0$, the interior of K. Then $F(x^*) = 0$.


Qualitative properties of existence and uniqueness be- 
come easily obtainable under certain monotonicity 
conditions. First we outline the definitions and then 
present the results. 


Definition 5 (monotonicity) F(x) is monotone on K if

$$\langle (F(x^1) - F(x^2))^T, x^1 - x^2 \rangle \ge 0, \quad \forall x^1, x^2 \in K.$$

Definition 6 (strict monotonicity) F(x) is strictly monotone on K if

$$\langle (F(x^1) - F(x^2))^T, x^1 - x^2 \rangle > 0, \quad \forall x^1, x^2 \in K,\ x^1 \ne x^2.$$

Definition 7 (strong monotonicity) F(x) is strongly monotone if for some $\alpha > 0$

$$\langle (F(x^1) - F(x^2))^T, x^1 - x^2 \rangle \ge \alpha \|x^1 - x^2\|^2, \quad \forall x^1, x^2 \in K.$$

Definition 8 (Lipschitz continuity) F(x) is Lipschitz continuous if there exists an $L > 0$, such that

$$\|F(x^1) - F(x^2)\| \le L \|x^1 - x^2\|, \quad \forall x^1, x^2 \in K.$$


Similarly, one may define local monotonicity (strict monotonicity, strong monotonicity) if one restricts the points $x^1, x^2$ to a neighborhood of a certain point x. Let B(x) denote a ball in $\mathbb{R}^n$ centered at x.


Definition 9 (local monotonicity) F(x) is locally monotone at x if

$$\langle (F(x^1) - F(x^2))^T, x^1 - x^2 \rangle \ge 0, \quad \forall x^1, x^2 \in K \cap B(x).$$

Definition 10 (local strict monotonicity) F(x) is locally strictly monotone at x if

$$\langle (F(x^1) - F(x^2))^T, x^1 - x^2 \rangle > 0, \quad \forall x^1, x^2 \in K \cap B(x),\ x^1 \ne x^2.$$

Definition 11 (local strong monotonicity) F(x) is locally strongly monotone at x if for some $\alpha > 0$

$$\langle (F(x^1) - F(x^2))^T, x^1 - x^2 \rangle \ge \alpha \|x^1 - x^2\|^2, \quad \forall x^1, x^2 \in K \cap B(x).$$


A uniqueness result is presented in the subsequent the- 
orem. 


Theorem 12 Suppose that F(x) is strictly monotone on 
K. Then the solution is unique, if one exists. 


Similarly, one can show that if F is locally strictly mono- 
tone on K, then VI(F, K) has at most one local solution. 

Monotonicity is closely related to positive definite- 
ness. 


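Theorem 13 Suppose that F(x) is continuously differentiable on K and the Jacobian matrix

$$\nabla F(x) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial F_n}{\partial x_1} & \cdots & \frac{\partial F_n}{\partial x_n} \end{pmatrix},$$

which need not be symmetric, is positive semidefinite (positive definite). Then F(x) is monotone (strictly monotone).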


Proposition 14 Assume that F(x) is continuously dif- 
ferentiable on K and that V F(x) is strongly positive def- 
inite. Then F(x) is strongly monotone. 


One obtains a stronger result in the special case where 
F(x) is linear. 


Corollary 15 Suppose that $F(x) = Mx + b$, where M is an $n \times n$ matrix and b is a constant vector in $\mathbb{R}^n$. The function F is monotone if and only if M is positive semidefinite. F is strongly monotone if and only if M is positive definite.


Proposition 16 Assume that $F\colon K \to \mathbb{R}^n$ is continuously differentiable at x. Then F(x) is locally strictly (strongly) monotone at x if $\nabla F(x)$ is positive definite (strongly positive definite), that is,

$$v^T \nabla F(x) v > 0, \quad \forall v \in \mathbb{R}^n,\ v \ne 0,$$

$$v^T \nabla F(x) v \ge \alpha \|v\|^2, \quad \text{for some } \alpha > 0,\ \forall v \in \mathbb{R}^n.$$


The following theorem provides a condition under 
which both existence and uniqueness of the solution to 
the variational inequality problem are guaranteed. Here 
no assumption on the compactness of the feasible set K 
is made. 


Theorem 17 Assume that F(x) is strongly monotone. 
Then there exists precisely one solution x* to VI(F, K). 


Hence, in the case of an unbounded feasible set K, 
strong monotonicity of the function F guarantees both 
existence and uniqueness. If K is compact, then exis- 
tence is guaranteed if F is continuous, and only the 
strict monotonicity condition needs to hold for unique- 
ness to be guaranteed. 

Assume now that F(x) is both strongly monotone and Lipschitz continuous. Then the projection $P_K[x - \gamma F(x)]$ is a contraction with respect to x, that is, we have the following:


Theorem 18 Fix $0 < \gamma < \alpha / L^2$, where $\alpha$ and L are the constants appearing, respectively, in the strong monotonicity and the Lipschitz continuity condition definitions. Then

$$\|P_K(x - \gamma F(x)) - P_K(y - \gamma F(y))\| \le \beta \|x - y\|$$

for all $x, y \in K$, where

$$(1 - \gamma\alpha)^{1/2} \le \beta < 1.$$


An immediate consequence of the theorem and the Banach fixed point theorem is:

Corollary 19 The operator $P_K(x - \gamma F(x))$ has a unique fixed point $x^*$.
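The contraction bound of Theorem 18 can be checked numerically for an affine map $F(x) = Mx + b$. For such a map one may take $\alpha$ as the smallest eigenvalue of the symmetric part of M and L as the spectral norm of M. The sketch below samples random pairs in a box K and compares the worst observed ratio with $(1 - \gamma\alpha)^{1/2}$. The matrix, the box, the factor 0.9 in $\gamma$, and the sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
M = np.array([[4.0, 1.0, 0.0],
              [-1.0, 3.0, 0.5],
              [0.0, -0.5, 2.0]])
b = np.array([1.0, -2.0, 0.5])
lower, upper = -np.ones(3), np.ones(3)               # K = [-1, 1]^3

alpha = np.linalg.eigvalsh(0.5 * (M + M.T)).min()    # strong monotonicity modulus
L = np.linalg.norm(M, 2)                             # Lipschitz constant of F
gamma = 0.9 * alpha / L**2                           # 0 < gamma < alpha / L^2
beta = np.sqrt(1.0 - gamma * alpha)

def T(x):
    """The map x -> P_K(x - gamma F(x)) of Theorem 18, with P_K a box projection."""
    return np.clip(x - gamma * (M @ x + b), lower, upper)

worst = 0.0
for _ in range(2000):
    x, y = rng.uniform(lower, upper), rng.uniform(lower, upper)
    worst = max(worst, np.linalg.norm(T(x) - T(y)) / np.linalg.norm(x - y))
print(worst, beta)   # the worst observed ratio stays below beta < 1
```

Because T is a contraction, repeatedly applying it from any starting point in K converges to the unique fixed point of Corollary 19.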
A plethora of equilibrium problems, including network equilibrium problems, can be uniformly formulated and studied as finite-dimensional variational inequality problems (cf. [11] and the references therein).
Indeed, it was precisely the traffic network equilib- 
rium problem, as stated by M. Smith [15], and identi- 
fied by S.C. Dafermos [3] to be a variational inequal- 
ity problem, that gave birth to the ensuing research ac- 
tivity in variational inequality theory and applications 
in transportation science, regional science, operations 
research/management science, and, more recently, in 
economics. 

Usually, using this methodology, one first formu- 
lates the governing equilibrium conditions as a varia- 
tional inequality problem. Qualitative properties of ex- 
istence and uniqueness of solutions to a variational in- 
equality problem can then be studied using the stan- 
dard theory (cf. [9]) or by exploiting problem structure 
(cf. [11]). Finally, a variety of algorithms for the compu- 
tation of solutions to finite-dimensional variational in- 
equality problems are now available (see, e. g., [1,4,11], 
and the references therein). 

Finite-dimensional variational inequality theory by itself, however, provides no framework for the study of the dynamics of competitive systems. Rather, it captures the system at its equilibrium state and, hence, the focus of this tool is static in nature.

Recently, P. Dupuis and A. Nagurney [6] proved 
that, given a variational inequality problem, there is 
a naturally associated dynamical system, the stationary 
points of which correspond precisely to the solutions of 
the variational inequality problem. This association was 
first noted by Dupuis and H. Ishii [5]. This dynamical 
system, first referred to as a projected dynamical system 
by D. Zhang and Nagurney [16], is nonclassical in that 
its right-hand side, which is a projection operator, is 
discontinuous. The discontinuities arise because of the 
constraints underlying the variational inequality prob- 
lem modeling the application in question. Hence, clas- 
sical dynamical systems theory (cf. [2,7,8,10,13]) is no 
longer applicable. 

Nevertheless, as demonstrated rigorously in [6], 
a projected dynamical system may be studied through 
the use of the Skorokhod problem [14], a tool originally 
introduced for the study of stochastic differential equa- 
tions with a reflecting boundary condition. Existence 
and uniqueness of a solution path, which is essential for 
the dynamical system to provide a reasonable model, 
were also established therein. 

Here we present some recent results in the develop- 
ment of a new tool for the study of equilibrium prob- 
lems in a dynamic setting, which has been termed pro- 
jected dynamical systems theory (cf. [16]). One of the 
notable features of this tool, whose rigorous theoreti- 
cal foundations were laid in [6], is its relationship to 
the variational inequality problem. Projected dynami- 
cal systems theory, however, goes further than finite- 
dimensional variational inequality theory in that it ex- 
tends the static study of equilibrium states by introduc- 
ing an additional time dimension in order to allow for 
the analysis of disequilibrium behavior that precedes 
the equilibrium. 

In particular, we associate with a given variational 
inequality problem, a nonclassical dynamical system, 
called a projected dynamical system. The projected 
dynamical system is interesting both as a dynamical 
model for the system whose equilibrium behavior is de- 
scribed by the variational inequality, and, also, because 
its set of stationary points coincides with the set of solutions to a variational inequality problem. In this framework, the feasibility constraints in the variational inequality problem correspond to discontinuities in the
right-hand side of the differential equation, which is 
a projection operator. Consequently, the projected dy- 
namical system is not amenable to analysis via the clas- 
sical theory of dynamical systems. 

We first recall the variational inequality problem. 
We then present the definition of a projected dynam- 
ical system, which evolves within a constraint set K. Its 
stationary points are identified with the solutions to the 
corresponding variational inequality problem with the 
same constraint set. We then state in a theorem the fundamental properties of such a projected dynamical system with regard to the existence and uniqueness of solution paths to the governing ordinary differential equation. We subsequently provide an interpretation of the
ordinary differential equation that defines the projected 
dynamical system, along with a description of how the 
solutions may be expected to behave. 

For additional qualitative results, in particular, sta- 
bility analysis results, see [16]. For a discussion of the 
general iterative scheme and proof of convergence, see 
[6]. For applications to dynamic spatial price equilib- 
rium problems, oligopolistic market equilibrium prob- 
lems, and traffic network equilibrium problems, see 
[12], and the references therein. 


The Variational Inequality Problem
and a Projected Dynamical System


We now present the definition of a variational inequal- 
ity problem (VI) and that of a projected dynamical sys- 
tem (PDS). 


Definition 1 (variational inequality problem) For a closed convex set $K \subset \mathbb{R}^n$ and vector function $F: K \to \mathbb{R}^n$, the variational inequality problem, VI(F, K), is to determine a vector $x^* \in K$ such that

$$\langle F(x^*)^T, x - x^* \rangle \geq 0, \quad \forall x \in K,$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathbb{R}^n$.


As is well-known, the variational inequality has been 
used to formulate a plethora of equilibrium problems 
ranging from traffic network equilibrium problems to 
spatial oligopolistic market equilibrium problems (cf. 
[11] and the references therein). 

Finite-dimensional variational inequality theory, however, provides no framework for studying the underlying dynamics of systems, since it considers only equilibrium solutions in its formulation. Hence, in a sense, it provides a static representation of a system at its ‘steady state’. One would, therefore, like a theoretical framework that permits one to study a system not only at its equilibrium point, but also in a dynamical setting.

The definition of a projected dynamical system 
(PDS) is given with respect to a closed convex set K, 
which is usually the constraint set underlying a partic- 
ular application, such as, for example, network equilib- 
rium problems, and a vector field F whose domain con- 
tains K. As noted in [6], it is expected that such pro- 
jected dynamical systems will provide mathematically 
convenient approximations to more ‘realistic’ dynamical models that might be used to describe nonstatic behavior. The relationship between a projected dynamical
system and its associated variational inequality problem 
with the same constraint set is then highlighted. For 
completeness, we also recall the fundamental proper- 
ties of existence and uniqueness of the solution to the 
ordinary differential equation (ODE) that defines such 
a projected dynamical system. 

Let $K \subset \mathbb{R}^n$ be closed and convex. Denote the boundary and interior of K, respectively, by $\partial K$ and $K^0$. Given $x \in \partial K$, define the set of inward normals to K at x by

$$N(x) = \{\gamma : \|\gamma\| = 1, \text{ and } \langle \gamma^T, x - y \rangle \leq 0, \ \forall y \in K\}.$$

We define $N(x)$ to be $\{\gamma : \|\gamma\| = 1\}$ for x in the interior of K.

When K is a convex polyhedron (for example, when K consists of linear constraints), K takes the form $K = \bigcap_i K_i$, where each $K_i$ is a closed half-space with inward normal $N_i$. Let $P_K$ be the norm projection. Then $P_K$ projects onto K ‘along N’, in that if $y \in K$, then $P_K(y) = y$, and if $y \notin K$, then $P_K(y) \in \partial K$ and $P_K(y) - y = \alpha\gamma$ for some $\alpha > 0$ and $\gamma \in N(P_K(y))$.
Definition 2 Given $x \in K$ and $v \in \mathbb{R}^n$, define the projection of the vector v at x (with respect to K) by

$$\Pi_K(x, v) = \lim_{\delta \to 0} \frac{P_K(x + \delta v) - x}{\delta}.$$


The class of ordinary differential equations that are of interest here takes the following form:

$$\dot{x} = \Pi_K(x, -F(x)),$$

where K is a closed convex set, corresponding to the constraint set in a particular application, and F(x) is a vector field defined on K.

Note that a classical dynamical system, in contrast, is of the form

$$\dot{x} = -F(x).$$


We have the following results (cf. [6]):
i) If $x \in K^0$, then
$$\Pi_K(x, -F(x)) = -F(x).$$
ii) If $x \in \partial K$, then
$$\Pi_K(x, -F(x)) = -F(x) + \beta(x) N^*(x),$$
where
$$N^*(x) = \arg\max_{N \in N(x)} \langle (-F(x))^T, -N \rangle,$$
and
$$\beta(x) = \max\{0, \langle (-F(x))^T, -N^*(x) \rangle\}.$$
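As a hedged numerical illustration of Definition 2 and of results i) and ii), the Python sketch below assumes that K is the nonnegative orthant $\mathbb{R}^n_+$, whose norm projection $P_K$ simply clips each coordinate at zero. It approximates $\Pi_K(x, v)$ by the difference quotient with a small fixed $\delta > 0$ (a stand-in for the limit) and compares the result with the coordinatewise closed form for this particular K, which agrees with results i) and ii): coordinates with $x_i > 0$ keep $v_i$, while on an active face $x_i = 0$ a component pointing out of K is cancelled. All function names are hypothetical.

```python
def project_orthant(y):
    """Norm projection P_K onto K = R^n_+ (clip each coordinate at zero)."""
    return [max(0.0, yi) for yi in y]


def pi_limit(x, v, delta=1e-8):
    """Difference quotient (P_K(x + delta*v) - x) / delta from Definition 2,
    with a small fixed delta > 0 standing in for the limit."""
    moved = project_orthant([xi + delta * vi for xi, vi in zip(x, v)])
    return [(mi - xi) / delta for mi, xi in zip(moved, x)]


def pi_orthant(x, v):
    """Closed form for the orthant: keep v_i where x_i > 0; where x_i = 0
    (an active face), drop a component that would push x out of K."""
    return [vi if xi > 0 else max(0.0, vi) for xi, vi in zip(x, v)]


x = [1.0, 0.0, 0.0]    # boundary point: the faces x_2 = 0 and x_3 = 0 are active
v = [-0.5, -2.0, 3.0]  # plays the role of -F(x)
print(pi_orthant(x, v))  # [-0.5, 0.0, 3.0]
print(pi_limit(x, v))    # agrees with the closed form up to rounding error
```

Only the second coordinate changes: $v_2 = -2$ would push x below the face $x_2 = 0$, so the projection removes it, while $v_3 = 3$ points into K and is kept.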


Note that since the right-hand side of the ordinary dif- 
ferential equation is associated with a projection opera- 
tor, it is discontinuous on the boundary of K. Therefore, 
one needs to explicitly state what one means by a solu- 
tion to an ODE with a discontinuous right-hand side. 


Definition 3 We say that the function $x: [0, \infty) \to K$ is a solution to the equation $\dot{x} = \Pi_K(x, -F(x))$ if $x(\cdot)$ is absolutely continuous and $\dot{x}(t) = \Pi_K(x(t), -F(x(t)))$, save on a set of Lebesgue measure zero.


In order to distinguish the pertinent ODEs from the classical ODEs with continuous right-hand sides, we refer to the above as ODE(F, K).


Definition 4 (initial value problem) For any $x_0 \in K$ as an initial value, we associate with ODE(F, K) an initial value problem, IVP(F, K, $x_0$), defined as:

$$\dot{x} = \Pi_K(x, -F(x)), \quad x(0) = x_0.$$

Note that if there is a solution $\phi_{x_0}(t)$ to the initial value problem IVP(F, K, $x_0$), with $\phi_{x_0}(0) = x_0 \in K$, then $\phi_{x_0}(t)$ always stays in the constraint set K for $t \geq 0$.

We now present the definition of a projected dy- 
namical system, governed by such an ODE(F, K), 
which, correspondingly, will be denoted by PDS(F, K).

Figure 1
A trajectory of a projected dynamical system that evolves both on the interior and on the boundary of the constraint set K


Definition 5 (projected dynamical system) Define the projected dynamical system PDS(F, K) as the map $\Phi: K \times \mathbb{R} \to K$ where

$$\Phi(x, t) = \phi_x(t)$$

solves IVP(F, K, x), that is,

$$\dot{\phi}_x(t) = \Pi_K(\phi_x(t), -F(\phi_x(t))), \quad \phi_x(0) = x.$$


The behavior of the dynamical system is now described. One may refer to Fig. 1 for an illustration of this behavior. If $x(t) \in K^0$, then the evolution of the solution is directly given in terms of F: $\dot{x} = -F(x)$. However, if the vector field $-F$ drives x to $\partial K$ (that is, for some t one has $x(t) \in \partial K$ and $-F(x(t))$ points ‘out’ of K), the right-hand side of the ODE becomes the projection of $-F$ onto $\partial K$. The solution to the ODE then evolves along a ‘section’ of $\partial K$, e.g., $\partial K_i$ for some i. At a later time the solution may re-enter $K^0$, or it may enter a lower-dimensional part of $\partial K$, e.g., $\partial K_i \cap \partial K_j$. Depending on the particular vector field F, it may then evolve within the set $\partial K_i \cap \partial K_j$, re-enter $\partial K_i$, enter $\partial K_j$, etc.

We now define a stationary or an equilibrium point. 


Definition 6 (stationary point or equilibrium point) The vector $x^* \in K$ is a stationary point or an equilibrium point of the projected dynamical system PDS(F, K) if

$$0 = \Pi_K(x^*, -F(x^*)).$$

In other words, we say that $x^*$ is a stationary point or an equilibrium point if, once the projected dynamical system is at $x^*$, it will remain at $x^*$ for all future times.

From the definition it is apparent that $x^*$ is an equilibrium point of the projected dynamical system PDS(F, K) if the vector field F vanishes at $x^*$. The converse, however, holds only when $x^*$ is an interior point of the constraint set K. Indeed, when $x^*$ lies on the boundary of K, we may have $F(x^*) \neq 0$.

Note that for classical dynamical systems, the necessary and sufficient condition for an equilibrium point is that the vector field vanish at that point, that is, that $0 = -F(x^*)$.

The following theorem states a basic connection be- 
tween the static world of finite-dimensional variational 
inequality problems and the dynamic world of pro- 
jected dynamical systems. 


Theorem 7 [6] Assume that K is a convex polyhedron. Then the equilibrium points of the PDS(F, K) coincide with the solutions of VI(F, K). Hence, every $x^* \in K$ satisfying

$$0 = \Pi_K(x^*, -F(x^*))$$

also satisfies

$$\langle F(x^*)^T, x - x^* \rangle \geq 0, \quad \forall x \in K.$$
This theorem establishes the equivalence between the 
set of equilibria of a projected dynamical system and 
the set of solutions of a variational inequality problem. 
Moreover, it provides a natural underlying dynamics 
(out of equilibrium) of such systems. 
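To see Definition 6 and Theorem 7 at work at a boundary point, the sketch below takes $K = \mathbb{R}^2_+$ (a convex polyhedron, as Theorem 7 requires) and a hypothetical affine field $F(x) = x - c$ with $c = (1, -2)$. At $x^* = (1, 0)$ the field does not vanish, yet $\Pi_K(x^*, -F(x^*)) = 0$ and the variational inequality holds. The sampled check of the inequality is only a numerical illustration, not a proof, and all names are hypothetical.

```python
import random


def pi_orthant(x, v):
    """Pi_K(x, v) for K = R^n_+ (keep v_i if x_i > 0, else max(0, v_i))."""
    return [vi if xi > 0 else max(0.0, vi) for xi, vi in zip(x, v)]


def F(x):
    """Hypothetical affine field F(x) = x - c with c = (1, -2)."""
    c = (1.0, -2.0)
    return [xi - ci for xi, ci in zip(x, c)]


def is_stationary(x, tol=1e-12):
    """Definition 6: 0 = Pi_K(x, -F(x))."""
    return all(abs(r) <= tol for r in pi_orthant(x, [-fi for fi in F(x)]))


def vi_holds_on_samples(x_star, samples, tol=1e-12):
    """Check <F(x*), x - x*> >= 0 for every sampled x in K (not a proof)."""
    fx = F(x_star)
    return all(sum(fi * (xi - si) for fi, xi, si in zip(fx, x, x_star)) >= -tol
               for x in samples)


rng = random.Random(0)
samples = [[rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)] for _ in range(1000)]
x_star = [1.0, 0.0]
print(F(x_star))  # [0.0, 2.0]
print(is_stationary(x_star), vi_holds_on_samples(x_star, samples))  # True True
```

Here $\langle F(x^*), x - x^* \rangle = 2x_2 \geq 0$ on K, so the inequality in fact holds everywhere on K; by contrast, the point $(0, 0)$ fails both tests.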

Before stating the fundamental theorem about pro- 
jected dynamical systems, we introduce the following 
assumption needed for the theorem. 


Assumption 8 (linear growth condition) There exists a $B < \infty$ such that the vector field $-F: \mathbb{R}^n \to \mathbb{R}^n$ satisfies the linear growth condition $\|F(x)\| \leq B(1 + \|x\|)$ for $x \in K$, and also

$$\langle (-F(x) + F(y))^T, x - y \rangle \leq B\|x - y\|^2, \quad \forall x, y \in K.$$


Theorem 9 (existence, uniqueness, and continuous dependence) Assume that the linear growth condition holds. Then
i) For any $x_0 \in K$, there exists a unique solution $x_0(t)$ to the initial value problem.
ii) If $x_k \to x_0$ as $k \to \infty$, then $x_k(t)$ converges to $x_0(t)$ uniformly on every compact set of $[0, \infty)$.


The second statement of this theorem is sometimes called the continuous dependence of the solution path to ODE(F, K) on the initial value. By virtue of Theorem 9, PDS(F, K) is well-defined and inhabits K whenever Assumption 8 holds.

Lipschitz continuity is a condition that plays an im- 
portant role in the study of variational inequality prob- 
lems. It also is a critical concept in the classical study of 
dynamical systems. 


Definition 10 (Lipschitz continuity) $F: K \to \mathbb{R}^n$ is locally Lipschitz continuous if for every $x \in K$ there are a neighborhood $\eta(x)$ and a positive number $L(x) > 0$ such that

$$\|F(x') - F(x'')\| \leq L(x)\|x' - x''\|, \quad \forall x', x'' \in \eta(x).$$

When this condition holds uniformly on K for some constant $L > 0$, that is,

$$\|F(x') - F(x'')\| \leq L\|x' - x''\|, \quad \forall x', x'' \in K,$$

then F is said to be Lipschitz continuous on K.


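Lipschitz continuity of F on K implies Assumption 8 (for any fixed $x_0 \in K$, $\|F(x)\| \leq \|F(x_0)\| + L\|x - x_0\|$, and the inner-product bound follows from the Cauchy-Schwarz inequality) and is, therefore, a sufficient condition for the fundamental properties of projected dynamical systems stated in Theorem 9.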


Example 11 (Tatonnement or adjustment process) Consider the market equilibrium model in which there are n commodities. We denote the price of commodity i by $p_i$, and group the prices into the n-dimensional column vector p. The supply of commodity i is denoted by $s_i(p)$, and the demand for commodity i is denoted by $d_i(p)$. We are interested in determining the equilibrium pattern that satisfies the following market equilibrium conditions: For each commodity $i = 1, \ldots, n$:

$$s_i(p^*) - d_i(p^*) \begin{cases} = 0, & \text{if } p_i^* > 0, \\ \geq 0, & \text{if } p_i^* = 0. \end{cases}$$

For this problem we propose the following adjustment or tatonnement process: For each commodity $i = 1, \ldots, n$:

$$\dot{p}_i = \begin{cases} d_i(p) - s_i(p), & \text{if } p_i > 0, \\ \max\{0, d_i(p) - s_i(p)\}, & \text{if } p_i = 0. \end{cases}$$


In other words, the price of a commodity will increase if the demand for that commodity exceeds its supply, and will decrease if the demand is less than the supply. However, if the price of a commodity is equal to zero and its supply exceeds its demand, then the price will not change, since prices cannot become negative under the equilibrium conditions.
In vector form, we may express the above as

$$\dot{p} = \Pi_K(p, d(p) - s(p)),$$

where $K = \mathbb{R}^n_+$, s(p) is the n-dimensional column vector of supply functions, and d(p) is the n-dimensional column vector of demand functions. Note that this adjustment process can be put into the standard form of a PDS if we define the column vectors $x = p$ and $F(x) = s(p) - d(p)$.
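The tatonnement process can be simulated with a projected Euler step, $p \leftarrow P_K(p + h\,(d(p) - s(p)))$ with $K = \mathbb{R}^n_+$. This simple discretization, the step size h, and the linear supply and demand functions below are assumptions made for illustration only. With $d_i(p) = a_i - p_i$ and $s_i(p) = b_i + p_i$, commodity 1 settles at a positive price where supply equals demand, while commodity 2 is driven to the boundary $p_2 = 0$ with excess supply, matching the equilibrium conditions of Example 11.

```python
def excess_demand(p):
    """d(p) - s(p) for a hypothetical linear model with
    d_i(p) = a_i - p_i and s_i(p) = b_i + p_i."""
    a = (4.0, 1.0)
    b = (0.0, 2.0)
    return [(ai - pi) - (bi + pi) for ai, bi, pi in zip(a, b, p)]


def tatonnement(p, h=0.1, iters=200):
    """Projected Euler steps p <- P_K(p + h * (d(p) - s(p))) with K = R^n_+."""
    for _ in range(iters):
        z = excess_demand(p)
        # Clipping at zero is the norm projection onto the nonnegative orthant.
        p = [max(0.0, pi + h * zi) for pi, zi in zip(p, z)]
    return p


p_star = tatonnement([1.0, 1.0])
print([round(pi, 6) for pi in p_star])  # [2.0, 0.0]
```

At the computed prices, commodity 1 has zero excess demand with $p_1^* > 0$, and commodity 2 has $s_2 - d_2 = 1 \geq 0$ with $p_2^* = 0$, as the equilibrium conditions require.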

On the other hand, if we do not constrain the commodity prices to be nonnegative, then $K = \mathbb{R}^n$, and the above tatonnement process would take the form

$$\dot{p} = d(p) - s(p).$$

This would then be an example of a classical dynamical system.

In the context of the example, we then have that, according to Theorem 7, the stationary points of prices $p^*$, that is, those prices that satisfy

$$0 = \Pi_K(p^*, d(p^*) - s(p^*)),$$

also satisfy the variational inequality problem

$$\langle (s(p^*) - d(p^*))^T, p - p^* \rangle \geq 0, \quad \forall p \in K.$$


Hence, there is a natural underlying dynamics for the prices, and the equilibrium point satisfies the variational inequality problem; equivalently, it is a stationary point of the projected dynamical system.

The theory of variational principles is a branch of the mathematical sciences with a wide range of applications in industry and in the physical, social, regional and engineering sciences. Research in this theory has revealed important and novel connections with all areas of pure and applied science. The general theory of the calculus of variations started soon after the introduction of differential and integral calculus by I. Newton and G.W. Leibniz, although some individual optimization problems had been investigated before that, for example, the determination of the path of light by P. Fermat. To be more specific, the brothers Jakob Bernoulli and Johann Bernoulli (1697) were the first to consider variational problems in mathematical terms. It is worth mentioning that the first phase of the development of the calculus of variations was characterized by a combination of philosophical concepts, mathematical methods and physical problems. L. Euler (eighteenth century) created the new branch of mathematics known as the calculus of variations. Motivated by geometrical considerations, he derived its first principle, which is now referred to as Euler’s differential equation for the determination of maximizing or minimizing arcs. By variational principles we mean maximum and minimum problems arising in game theory, approximation theory, mechanics, geometrical optics, general relativity theory, economics, transportation, differential geometry and related areas. In fact, the history of variational principles comprises the following distinct stages:

1) The basic search for solutions of variational problems, which, through the work of Euler, J.L. Lagrange, A.M. Legendre, C.G. Jacobi, K. Weierstrass and many others, developed along the lines of differential and integral equations as well as functional analysis.

2) Hamilton-Jacobi theory represents a general framework for the mathematical description of the propagation of actions in nature and the optimal modeling of control processes in daily life. Using the ideas and techniques of Hamilton-Jacobi theory in mechanics, E. Cartan introduced differential geometry and his exterior calculus into the calculus of variations. Many basic equations of mathematical physics result from variational problems. It is known that gauge field theories are a continuation of Einstein’s concept of describing physical effects mathematically in terms of differential geometry. These theories play a fundamental role in the modern theory of elementary particles and are the right tool for building up a unified theory of elementary particles that includes all kinds of known interactions. For example, the Weinberg-Salam theory unifies the weak and electromagnetic interactions. It is also known that the variational formulation of field theories allows for a degree of unification absent from their formulations in terms of differential equations. Variational principles play an important part in the existence and stability of solitons, which occur in almost every branch of physics.

3) Optimization, which came into being because of equilibrium problems arising in economics and transportation from the 1950s onwards, for example, linear optimization, Kuhn-Tucker theory, Bellman’s dynamic optimization, Ekeland’s principle and its variant forms.

4) Variational and quasivariational inequality theory, with its applications to mathematical physics and to pure and applied sciences, which was introduced in 1964. The theory of variational inequalities provides a simple, natural, efficient and unified framework for studying a wide class of seemingly unrelated problems. This theory combines the theory of extremal problems and that of monotone operators under a unified viewpoint. Note that not every monotone operator is a potential operator.

A last problem of great interest is the so-called inverse problem of the calculus of variations. A detailed exposition of the single integral problem shows the crucial role of the concept of variational selfadjointness. Selfadjointness of linear differential operators is well known to be the key property in the inverse problems of the calculus of variations. However, E. Tonti [23] has emphasized the role played by the inner product with regard to selfadjointness. Variational principles for nonsymmetric nonpotential operators have not been widely used either by mathematicians or in applications. One of the basic reasons for this is apparently the complexity of a constructive approach to the necessary symmetrizing operators. After Hilbert’s paper (see [7,24]), variational methods for investigating boundary value problems for partial differential equations were developed and received theoretical justification. It is known that if, for a linear nonsymmetric and nonpositive operator T, there exists an inverse operator $T^{-1}$ on a Hilbert space H, then there exists an infinite number of auxiliary operators g such that T is g-symmetric and g-positive. For the theoretical foundation of the formulation and investigation of variational principles for both linear and nonlinear equations, see [7,23], where it is shown that the construction of a variational principle is closely related to the choice of the classes of functionals and the space.

The direct methods for solving primal variational problems provide only upper bounds, whereas the solution of the dual (complementary) problem gives lower bounds. The idea of transforming the original variational problem of minimizing a functional into a corresponding maximization problem, and of obtaining a posteriori estimates of approximate solutions, goes back to C. Zaremba, E. Trefftz and K. Friedrichs, see [7,24]; it incidentally forms the basis of three directions for obtaining dual variational principles: geometric, operator and functional. In recent years (as of 2000), with the help of operator theory, interesting and important results have been obtained in the applications of dual variational principles. The significance of obtaining dual variational principles by means of the Fenchel-Rockafellar inequality has been emphasized in [8], where the basic drawbacks of other techniques are also pointed out. It has been shown in [8] that dual techniques have more favorable properties than the primal ones for nonlinear and nonsmooth systems.

It is perhaps part of the fascination of the subject 
that so many branches of pure and applied sciences are 
involved. The task of becoming conversant with a wide 
spectrum of knowledge is indeed a real challenge. The 
framework chosen should be seen as a model setting for 
more general results. In this article, we consider variational-like inequalities, describe some results in the setting of Hilbert space and list some interesting open (as of 1999) problems for future research.


Variational-Like Inequalities 


Let H be a real Hilbert space whose inner product and norm are denoted by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, respectively. Let K be a nonempty subset of H and $\eta: K \times K \to H$ be a single-valued operator. Let $F: K \to \mathbb{R}$ be a function. We now recall the following concepts and results, see, for example, [13,14,15].


Definition 1 Let $u \in K$. Then, the set K is said to be invex at u with respect to $\eta$ if, for each $v \in K$ and $t \in [0, 1]$, $u + t\eta(v, u) \in K$. K is said to be invex with respect to $\eta$ if K is invex at each $u \in K$.


From now onward, the set K is an invex set, unless oth- 
erwise specified. 


Definition 2 The function $F: K \to \mathbb{R}$ is said to be pre-invex with respect to $\eta$ if, for all $u, v \in K$ and $t \in [0, 1]$,

$$F(u + t\eta(v, u)) \leq (1 - t)F(u) + tF(v).$$


Definition 3 The differentiable function $F: K \to \mathbb{R}$ is said to be an invex function with respect to $\eta$ if, for all $u, v \in K$,

$$F(v) - F(u) \geq \langle F'(u), \eta(v, u) \rangle,$$

where $F'(u)$ is the differential of F at u.
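A hedged one-dimensional illustration of Definition 3 (taking $H = \mathbb{R}$ for simplicity): $F(x) = 1 - e^{-x^2}$ is not convex, but its only stationary point, $x = 0$, is a global minimizer, and the kernel $\eta(v, u) = (F(v) - F(u))/F'(u)$ for $F'(u) \neq 0$, with $\eta(v, u) = 0$ otherwise, makes the inequality of Definition 3 hold for all u, v. This kernel is a standard construction assumed here for illustration, not one taken from the article, and the grid check is a numerical illustration rather than a proof.

```python
import math


def F(x):
    """Nonconvex function whose only stationary point (x = 0) is a global minimizer."""
    return 1.0 - math.exp(-x * x)


def dF(x):
    return 2.0 * x * math.exp(-x * x)


def eta(v, u):
    """Hypothetical kernel: equality in Definition 3 wherever F'(u) != 0."""
    g = dF(u)
    return 0.0 if g == 0.0 else (F(v) - F(u)) / g


grid = [i / 10.0 for i in range(-30, 31)]
invex_ok = all(F(v) - F(u) >= dF(u) * eta(v, u) - 1e-12 for u in grid for v in grid)
convex_ok = F(1.5) <= 0.5 * (F(0.0) + F(3.0))  # midpoint test for convexity
print(invex_ok, convex_ok)  # True False
```

The failed midpoint test shows that F is not convex, consistent with Remark 4: invexity is a genuinely weaker requirement than convexity.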


Remark 4 It is known that every differentiable pre-invex function is an invex function, but the converse is not true. However, if $\eta(v, u) = v - u$, then both pre-invex and invex functions are convex functions and the invex set is a convex set. If F is a differentiable pre-invex function and $\varphi$ is a pre-invex function, then it is known [13,15] that the minimum u of the functional $I[v]$, where

$$I[v] = F(v) + \varphi(v) \quad \text{for all } v \in K, \tag{1}$$

on the invex set K in H can be characterized by the variational-like inequality

$$\langle F'(u), \eta(v, u) \rangle + \varphi(v) - \varphi(u) \geq 0 \quad \text{for all } v \in K. \tag{2}$$


It is well known that in many important applications variational-like inequalities of the form (2) occur that do not arise as a result of extremum problems. This motivates the study of problems like (2) in their own right, that is, without assuming a priori that they come out as the Euler inequality of an extremum problem.


For a given nonlinear operator $T: H \to H$, we consider the problem of finding $u \in H$ such that

$$\langle Tu, \eta(v, u) \rangle + \varphi(v) - \varphi(u) \geq 0 \quad \text{for all } v \in H. \tag{3}$$


Clearly problem (2) is a special case of problem (3). We now discuss some important special cases. In particular, if $\eta(v, u) = v - u$, then problem (3) is equivalent to finding $u \in H$ such that

$$\langle Tu, v - u \rangle + \varphi(v) - \varphi(u) \geq 0 \quad \text{for all } v \in H, \tag{4}$$


which is known as the mixed variational inequality. Here the function $\varphi: H \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and lower semicontinuous, so its subdifferential $\partial\varphi$ is a maximal monotone operator. For applications of problem (4), see [3,4,5,6,9,10,12,13,14,15,17,18] and the references therein. Problem (4) can be written in the equivalent form: Find $u \in H$ such that

$$0 \in Tu + \partial\varphi(u), \tag{5}$$

which is equivalent to finding $u \in H$ such that

$$u = J_\varphi[u - \rho Tu], \tag{6}$$

where $J_\varphi = (I + \rho\,\partial\varphi)^{-1}$ is the resolvent operator associated with the maximal monotone operator $\partial\varphi$, the subdifferential of the proper, convex and lower semicontinuous function $\varphi$, and $\rho > 0$ is a constant. Problem (5) is also known as the variational inclusion; see [16] and the references therein for more details.
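The fixed-point form (6) suggests the iteration $u_{k+1} = J_\varphi[u_k - \rho T u_k]$. The Python sketch below assumes $H = \mathbb{R}^2$, a hypothetical strongly monotone affine operator $Tu = Au - b$, and $\varphi = \lambda\|\cdot\|_1$, for which $J_\varphi = (I + \rho\,\partial\varphi)^{-1}$ is componentwise soft thresholding at level $\rho\lambda$. The step size $\rho$ is chosen small enough for the map to be a contraction in this example; this is an illustration under these assumptions, not a general-purpose solver.

```python
import math


def soft_threshold(y, t):
    """Resolvent (I + rho * d phi)^(-1) for phi = lam * ||.||_1, with t = rho * lam."""
    return [math.copysign(max(abs(yi) - t, 0.0), yi) for yi in y]


A = [[2.0, 0.5], [0.5, 1.0]]  # symmetric positive definite (assumed)
b = [1.0, -0.2]


def T(u):
    """Hypothetical strongly monotone affine operator T u = A u - b."""
    return [sum(aij * uj for aij, uj in zip(row, u)) - bi for row, bi in zip(A, b)]


def solve_mixed_vi(lam=0.3, rho=0.4, iters=500):
    """Fixed-point iteration u <- J_phi[u - rho * T u], following (6)."""
    u = [0.0, 0.0]
    for _ in range(iters):
        tu = T(u)
        u = soft_threshold([ui - rho * ti for ui, ti in zip(u, tu)], rho * lam)
    return u


u = solve_mixed_vi()
print([round(ui, 4) for ui in u])     # [0.3714, -0.0857]
print([round(ti, 4) for ti in T(u)])  # [-0.3, 0.3]
```

At the computed u, $(Tu)_1 = -\lambda$ and $(Tu)_2 = \lambda$, which is exactly the inclusion (5) for a point with $u_1 > 0$ and $u_2 < 0$.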

If $\varphi^*$ is the conjugate function of $\varphi$, then its subdifferential $\partial\varphi^*$ is also a maximal monotone operator and $J_{\varphi^*} = (I + \partial\varphi^*)^{-1}$ is the resolvent operator associated with $\partial\varphi^*$. From the definitions of the resolvent operators (taking $\rho = 1$ in $J_\varphi$), we have, for all $u, v \in H$,

$$u = J_\varphi(u + v) \iff v \in \partial\varphi(u) \iff \varphi^*(v) + \varphi(u) = \langle v, u \rangle \iff u \in \partial\varphi^*(v) \iff v = J_{\varphi^*}(u + v).$$

From this result, we have

$$z = J_\varphi(z) + J_{\varphi^*}(z) \quad \text{for all } z \in H,$$

a beautiful and useful relationship between the resolvent operators. It has been shown [14] that problem (4) is equivalent to finding $z \in H$ such that

$$\rho T J_\varphi z + J_{\varphi^*} z = 0. \tag{7}$$

Equations (7) are called the resolvent equations. This equivalence has played an important part in suggesting various iterative methods for solving mixed variational inequalities. If $\varphi$ is the indicator function of a closed convex set K in H, then $J_\varphi = P_K$, the projection of H onto K; as a consequence, the resolvent equations are equivalent to the Wiener-Hopf equations, introduced and studied in [21] and [20] in connection with classical variational inequalities. See [14,17,18] for the physical formulation and numerical methods of the Wiener-Hopf equations.
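The identity $z = J_\varphi(z) + J_{\varphi^*}(z)$ (with $\rho = 1$; it is often called Moreau's decomposition) can be checked numerically. The sketch assumes $H = \mathbb{R}^n$ and $\varphi = \|\cdot\|_1$: then $J_\varphi$ is soft thresholding at level 1, and $\varphi^*$ is the indicator function of the unit $\ell_\infty$ ball, so $J_{\varphi^*}$ is the projection onto that ball, an instance of the fact that the resolvent of an indicator function is a projection. Function names are hypothetical.

```python
import math


def resolvent_l1(z):
    """J_phi with rho = 1 for phi = ||.||_1: soft thresholding at level 1."""
    return [0.0 if abs(zi) <= 1.0 else zi - math.copysign(1.0, zi) for zi in z]


def resolvent_conjugate(z):
    """J_{phi*}: phi* is the indicator of the unit l_inf ball, so its
    resolvent is the projection onto [-1, 1]^n."""
    return [min(1.0, max(-1.0, zi)) for zi in z]


z = [2.5, -0.4, -3.0, 1.0]
parts = [a + b for a, b in zip(resolvent_l1(z), resolvent_conjugate(z))]
print(parts)       # [2.5, -0.4, -3.0, 1.0]
print(parts == z)  # True
```

Each coordinate is split into a "shrunk" part and a "clipped" part that add back up to the original value.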


Remark 5 Above, we have tried to emphasize the role played by the concepts of invexity theory in variational-like inequalities. Unfortunately, up to now all the existence theory for variational-like inequalities has been developed in the setting of standard convexity. It is the right time to study variational-like inequalities in the context of invex functions and invex sets. We would like to point out that the projection and resolvent equations techniques cannot be extended or modified to obtain existence results or to suggest iterative methods for variational-like inequalities, due to the presence of the function $\eta$ and the nonlinear pre-invex function $\varphi$. See [13,15] for the auxiliary principle technique, which suggests a general iterative method and a merit function for solving variational-like inequalities.


Open Problems 


In this section, we list a number of open problems which can play an important role in the development of variational-like inequalities.

1) Is the subdifferential of a preinvex function a maxi- 
mal monotone operator? 

2) Does there exist a resolvent (projection) operator associated with the subdifferential of a proper, pre-invex (invex) and lower semicontinuous function?

3) There are a number of merit (gap) functions for 
variational inequalities and complementarity prob- 
lems. Is it possible to construct similar merit (gap) 
functions for variational-like inequalities? M.A. 
Noor [15] has constructed a merit (gap) function for 
variational-like inequalities under some conditions. 

4) Study the sensitivity analysis for variational-like in- 
equalities. 

5) Can one apply the Ky Fan inequality or any other 
minimax theory to study the existence of a solution 
of variational-like inequalities in the context of in- 
vexity theory? 

6) In recent years (as of 2000), Ekeland’s principle has 
played a significant part in various branches of pure 
and applied sciences, see, for example, [2,6,12] and 
the references therein. Is it possible to find a simi- 
lar variational principle for pre-invex (invex) func- 
tions? 

In this article, we have given only a brief introduction to variational-like inequalities. This theory does not yet appear to have developed to the extent that it provides a complete framework for studying various problems. The field continues to foster new and innovative applications. The interested reader is advised to explore this fascinating field further and discover interesting and significant applications.

It is not practical to quote a sufficient number of up-to-date references. We have therefore restricted ourselves to various references with which the authors have recently (as of 1999) been associated. Perhaps some of these point to future possibilities.

Vector optimization is the discipline which studies approaches for selecting optimal decisions from a given admissible (feasible) set in the presence of two (bicriteria) or more (multicriteria) conflicting objectives. The increasing interest in this discipline over recent decades is mainly due to the fact that, in optimization processes, it seems more realistic to accept the presence of more than one objective. This happens, for instance, in economic systems (maximization of profit and minimization of risk in portfolio selection problems), in engineering systems and in physical systems (see, for example, [23]).

Starting from the pioneering works of V. Pareto and M.-E.-L. Walras at the end of the nineteenth century, in which the authors introduced a (vector) equilibrium concept for economic systems, vector optimization gained mathematical recognition with the Kuhn-Tucker definition of vector maximum given in 1951. Such a definition, as we will see later, makes clear that the mathematical foundations of this discipline are to be found in the fundamental works of G. Cantor and F. Hausdorff concerning orderings and ordered sets in vector spaces. The birth of the discipline in the economic context explains the presence of many typical terms such as utility function, decision process, preference order and equilibrium model.

The first mathematical step consists in introducing a preference in a set, which will be called the decision set D. Mathematically speaking, this can be translated into considering a binary relation R (i.e., a subset of $D \times D$) on D. For economists, a preference is a partial order R, i.e., a binary relation satisfying the following two properties:
a) $(x, x) \in R$, $\forall x \in D$ (reflexivity);
b) $(x, y) \in R, (y, z) \in R \Rightarrow (x, z) \in R$ (transitivity).

In order to have compatibility between the partial order R and the vector space structure of D, it is common to assume that the following two axioms hold:
c) $\forall \alpha > 0$: $(x, y) \in R \Rightarrow (\alpha x, \alpha y) \in R$;
d) $[(x, y) \in R, (z, w) \in R] \Rightarrow (x + z, y + w) \in R$.
A fundamental property, when it holds, is the antisymmetry of R:
e) $(x, y) \in R, (y, x) \in R \Rightarrow x = y$.
The following theorem shows the close relationship between partial orders and cones, and it explains the common phrase ‘the ordering cone is ...’.


Theorem 1
a) If R is a partial order, then
$$C = \{x \in D : (x, 0) \in R\}$$
is a convex cone; if, in addition, R is antisymmetric, then C is pointed.
b) If C is a convex cone, then
$$R = \{(x, y) \in D \times D : x - y \in C\}$$
is a partial order on D; if, in addition, C is pointed, then R is antisymmetric.
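Part b) of Theorem 1 can be illustrated on a small grid of points in $\mathbb{R}^2$. The sketch assumes the closed cone $C = \mathbb{R}^2_+$ (which contains 0, as reflexivity requires), builds R from it and checks reflexivity, transitivity, antisymmetry, and axioms c) and d) on the sampled pairs. A finite check like this illustrates the theorem but does not prove it; the helper names are hypothetical.

```python
from itertools import product


def in_cone(x):
    """Membership in the closed orthant C = R^2_+ (contains 0, needed for reflexivity)."""
    return all(xi >= 0 for xi in x)


def related(x, y):
    """Theorem 1 b): (x, y) belongs to R exactly when x - y lies in C."""
    return in_cone([xi - yi for xi, yi in zip(x, y)])


pts = list(product([-1, 0, 1], repeat=2))  # small sample grid of D = R^2
pairs = [(x, y) for x, y in product(pts, repeat=2) if related(x, y)]

reflexive = all(related(x, x) for x in pts)
transitive = all(related(x, z) for (x, y) in pairs for (y2, z) in pairs if y == y2)
antisymmetric = all(x == y for (x, y) in pairs if related(y, x))
# Axiom c) with alpha = 2.5 and axiom d) on all sampled pairs.
scaling = all(related([2.5 * a for a in x], [2.5 * b for b in y]) for (x, y) in pairs)
additive = all(related([a + c for a, c in zip(x, z)], [b + d for b, d in zip(y, w)])
               for (x, y) in pairs for (z, w) in pairs)
print(reflexive, transitive, antisymmetric, scaling, additive)  # True True True True True
```

Antisymmetry holds here because $\mathbb{R}^2_+$ is pointed, as part b) predicts.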


For the sake of simplicity, from now on we shall suppose that $D \subset \mathbb{R}^n$ and we shall consider, in this context, the most often used and best known ordering cone in $\mathbb{R}^n$, which is called the Paretian cone, that is, $C = \mathbb{R}^n_+ \setminus \{0\} = \{x \in \mathbb{R}^n : x_i \geq 0, \ i = 1, \ldots, n\} \setminus \{0\}$. Naturally, many other types of convex cones have been considered in the literature in different situations, but our treatment can easily be generalized to those cases.

The following definition is crucial in this frame- 
work. 


Definition 2 y is a minimum of the set A with respect to the cone C, and we will write $y \in \min_C A$, if and only if the system

$$y - x \in C, \quad x \in A,$$

is impossible.


The above definition leads naturally to the following.

Definition 3 Given $f: \mathbb{R}^n \to \mathbb{R}^s$ and a subset $D \subset \mathbb{R}^n$, a point $\bar{x} \in D$ is called a Pareto (or efficient) solution of

$$\min_{x \in D} f(x) \tag{P}$$

if and only if $f(\bar{x})$ is a minimum of $f(D)$ with respect to the cone C.


These minimum points are called Pareto (or efficient) points. If we replace C with int C, we obtain the classical relaxed definition of weak Pareto (or weakly efficient) points. The notion of efficiency has also been restricted to proper efficiency in order to avoid some undesirable situations.
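For a finite set A, Definition 2 and its weak variant can be applied by brute force. The sketch below assumes the Paretian cone, so that $y - x \in C$ means x is componentwise no larger than y and differs from it, while $y - x \in \operatorname{int} C$ means x is strictly smaller in every component. The sample points are hypothetical; the point (2, 4) turns out to be weakly Pareto but not Pareto.

```python
def dominates(x, y, weak=False):
    """True if y - x lies in the ordering cone.

    Default: C = R^n_+ minus {0} (x <= y componentwise and x != y).
    weak=True: int C (x < y in every component)."""
    if weak:
        return all(xi < yi for xi, yi in zip(x, y))
    return all(xi <= yi for xi, yi in zip(x, y)) and x != y


def pareto_minima(points, weak=False):
    """Points y for which no x in the set has y - x in the cone (Definition 2)."""
    return [y for y in points if not any(dominates(x, y, weak) for x in points)]


S = [(1, 4), (2, 2), (3, 3), (4, 1), (2, 4)]  # hypothetical objective vectors
print(pareto_minima(S))             # [(1, 4), (2, 2), (4, 1)]
print(pareto_minima(S, weak=True))  # [(1, 4), (2, 2), (4, 1), (2, 4)]
```

The point (2, 4) is dominated by (1, 4) with respect to $C$, but no point is strictly smaller in both components, so it survives under $\operatorname{int} C$.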

Unfortunately, there are several different definitions of proper efficiency [4,6,11], which makes the development of the analysis of this aspect more difficult.

Once the definition of a minimum point has been given, many relevant questions arise:

1) Under what conditions can we ensure the existence 
of a solution of problem (P)? 

2) What conditions can be established for a mini- 
mum point (necessary or sufficient optimality con- 
ditions)? 

3) How can we determine the minimum (when it ex- 
ists)? 

To present the fundamental ideas for answering the above questions, we can restrict ourselves to the most classical case, in which

$D = \{x \in \mathbb{R}^n : g(x) \le 0,\ h(x) = 0\},$

where $g: \mathbb{R}^n \to \mathbb{R}^m$ and $h: \mathbb{R}^n \to \mathbb{R}^p$.
A first classical theorem for the existence of the min- 
imum needs the following definition. 


Definition 4 f is called $\mathbb{R}^\ell_+$-upper semicontinuous at $x_0 \in D$ if and only if, for every neighborhood V of $f(x_0)$, there exists a neighborhood I of $x_0$ such that $f(x) \in V - \mathbb{R}^\ell_+$ for all $x \in I \cap D$.


Now we are able to state the following: 


Theorem 5 Suppose that D is compact and $-f$ is $\mathbb{R}^\ell_+$-upper semicontinuous on D. Then the set of optimal solutions of (P) is nonempty.


This theorem is a generalization to vector optimization 
of the well known Weierstrass theorem. Many gener- 
alizations of it can be found in the literature (see, for 
example, [18]). 

Necessary optimality conditions of Lagrangian type 
can be established under classical assumptions of con- 
tinuous differentiability of f, g and h. The following the- 
orem holds. 


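Theorem 6 Suppose that (P) satisfies the classical Kuhn–Tucker constraint qualification (or some generalization of it) at $\bar{x} \in D$ [2,21]. Then a necessary condition for $\bar{x}$ to be a weak Pareto solution of (P) is that there exist

$\theta \in \mathbb{R}^\ell, \quad \lambda \in \mathbb{R}^m, \quad \mu \in \mathbb{R}^p$

such that

$(\theta, \lambda, \mu) \ne (0, 0, 0)$

and

(A) $\langle \theta, \nabla f(\bar{x}) \rangle + \langle \lambda, \nabla g(\bar{x}) \rangle + \langle \mu, \nabla h(\bar{x}) \rangle = 0,$
$\langle \lambda, g(\bar{x}) \rangle = 0,$
$\theta \ge 0, \quad \lambda \ge 0.$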


Adding convexity to the assumptions of the theorem leads to sufficient optimality conditions:


Theorem 7 If all $f_i$ and $g_j$ are convex and all $h_k$ are affine, then condition (A) in Theorem 6 is sufficient for $\bar{x} \in D$ to be a weak Pareto solution of (P).


Theorems 6 and 7 are classical results and the starting point of the field of optimality conditions. Developments of these theorems can be found in the literature (see, for example, [16,18,21]). The generalizations go in several different directions:

a) to remove the assumptions of differentiability; 

b) to weaken the constraint qualification assumptions;

c) to strengthen the optimality conditions for other 
types of optimal solutions. 

Another research field is the characterization of efficient points. The best known characterizations of efficient points use scalarization by means of weight vectors belonging to the polar cone of the ordering cone. This leads to an 'equivalent' scalar optimization problem in the following sense:


Theorem 8 Suppose that all $f_i$ are convex. Then $\bar{x}$ is a weak Pareto solution if and only if there exists $\theta \in \mathbb{R}^\ell_+$, $\theta \ne 0$, such that $\bar{x}$ is a minimum point of the function $\langle \theta, f \rangle$ on the set D.
Note that, in Theorem 8, the convexity of all $f_i$ is needed only in the proof of necessity, not in the proof of sufficiency.
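Both directions of Theorem 8 can be illustrated on finite image sets. The sketch below uses assumed data: a finite set of objective vectors stands in for f(D), and the function names are hypothetical. Any minimizer of $\langle \theta, y \rangle$ with $\theta \in \mathbb{R}^\ell_+ \setminus \{0\}$ is never strictly dominated, which is the sufficiency direction. On a nonconvex set, however, a Pareto point can be missed by every weight vector, so necessity fails without convexity.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def weighted_sum_minimizer(F: List[Point], theta: Sequence[float]) -> Point:
    """Minimize the scalarized objective <theta, y> over a finite image set F."""
    if not (all(t >= 0 for t in theta) and any(t > 0 for t in theta)):
        raise ValueError("theta must lie in R^l_+ and be nonzero")
    return min(F, key=lambda y: sum(t * yi for t, yi in zip(theta, y)))


def strictly_dominated(y: Point, F: List[Point]) -> bool:
    """True if some x in F satisfies y - x in int R^l_+, i.e. y is not weak Pareto."""
    return any(all(yi > xi for yi, xi in zip(y, x)) for x in F)


# Sufficiency (no convexity needed): every weighted-sum minimizer is weak Pareto.
F = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 4.0)]
for theta in [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]:
    assert not strictly_dominated(weighted_sum_minimizer(F, theta), F)

# Necessity can fail on a nonconvex set: (1.5, 1.5) is Pareto, yet for every
# theta = (s, 1 - s) one of the other two points has a smaller weighted sum.
G = [(0.0, 2.0), (2.0, 0.0), (1.5, 1.5)]
missed = all(weighted_sum_minimizer(G, (s / 10, 1 - s / 10)) != (1.5, 1.5)
             for s in range(11))
print(missed)  # True
```

For $\theta = (s, 1-s)$ the other two points give $\min(2s, 2(1-s)) \le 1 < 1.5$, so the grid over s reflects the general situation.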

When the objective functions and the constraints are linear or affine, we have so-called multi-objective linear programming. In this case the set of efficient points is connected, and a generalization of the simplex method can be derived to locate the entire set of efficient points.

Finally, it is possible to develop a duality theory for vector optimization, as in the scalar case.

In fact, it is well known from scalar optimization that, under suitable assumptions, a minimization problem can be associated with a maximization problem such that the optimal values of the two problems coincide. This scheme is called duality. It provides useful tools for a deeper understanding of the given problem and important information for developing algorithms to solve it. A similar general duality principle holds for vector optimization problems, and it can be specialized to linear vector problems.


See also 


> Image Space Approach to Optimization 
The vector variational inequality is a mathematical model designed to describe equilibrium situations in which multicriteria considerations are important. The concept of a vector variational inequality was introduced in [5]. In recent years, the vector variational inequality problem has received extensive attention and has found many applications in vector optimization and vector network equilibrium problems. The theory of vector variational inequalities has been summarized in the edited book [7] and in one chapter of the monograph [1].

Let X and Y be Hausdorff topological vector spaces. By L(X, Y) we denote the set of all continuous linear functions from X into Y. For $l \in L(X,Y)$, the value of the linear function l at x is denoted by $\langle l, x \rangle$. Let $C \subset Y$ be a nonempty, pointed, closed and convex cone with $\operatorname{int} C \ne \emptyset$. For convenience, we write $C_0 := C \setminus \{0\}$, and $\operatorname{int} C$ denotes the interior of C. Then (Y, C) is an ordered Hausdorff topological vector space with the partial ordering defined, for $y_1, y_2 \in Y$, by

$y_1 \le_C y_2 \iff y_2 - y_1 \in C.$

Moreover, we also define

$y_1 \le_{C_0} y_2 \iff y_2 - y_1 \in C_0; \qquad y_1 \not\le_C y_2 \iff y_2 - y_1 \notin C,$

and the relations $\le_{\operatorname{int} C}$, $\not\le_{\operatorname{int} C}$ and $\not\le_{C_0}$ are defined analogously. These orderings can also be applied to sets, in which case the ordering is understood element-wise.
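Let $K \subset X$ be a nonempty, closed and convex subset and let $T: K \to L(X,Y)$.
A weak vector variational inequality (WVVI) is a problem of finding $x^* \in K$ such that

$\langle T(x^*), x - x^* \rangle \not\le_{\operatorname{int} C} 0, \quad \forall x \in K. \qquad \text{(WVVI)}$

A vector variational inequality (VVI) is a problem of finding $x^* \in K$ such that

$\langle T(x^*), x - x^* \rangle \not\le_{C_0} 0, \quad \forall x \in K. \qquad \text{(VVI)}$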


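It is clear that '$\not\le_{\operatorname{int} C}$' is a closed ordering, that is, $x_t \not\le_{\operatorname{int} C} 0$ and $x_t \to x$ imply $x \not\le_{\operatorname{int} C} 0$, but '$\not\le_{C_0}$' is not. Consequently, the solution set of (WVVI) is closed, whereas the solution set of (VVI) need not be. When $Y = \mathbb{R}$ and $X = \mathbb{R}^n$, (WVVI) and (VVI) reduce to the variational inequality; see [8].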
Consider a vector optimization problem:

$\min_{x \in K} f(x), \qquad (\mathrm{VOP})_K$

where $f: X \to Y$ is a vector-valued function. The point $x^* \in K$ is said to be a weakly minimal solution of f on K if and only if $f(K) \not\le_{\operatorname{int} C} f(x^*)$, and a minimal solution of f on K if and only if $f(K) \not\le_{C_0} f(x^*)$.

Let $X = \mathbb{R}^n$, $Y = \mathbb{R}^\ell$ and $C = \mathbb{R}^\ell_+$, and let $f(x) = (f_1(x), \dots, f_\ell(x))^T$. A point $x^* \in K$ is said to be a Geoffrion properly minimal solution of $(\mathrm{VOP})_K$ if and only if there exists a scalar $M > 0$ such that, for each i,

$\dfrac{f_i(x^*) - f_i(x)}{f_j(x) - f_j(x^*)} \le M$

for some j such that $f_j(x) > f_j(x^*)$ whenever $x \in K$ and $f_i(x) < f_i(x^*)$. Every Geoffrion properly minimal solution is a minimal solution.

$f: X \to Y$ is C-convex on K if and only if, for any $x_1, x_2 \in K$ and $\lambda \in [0,1]$,

$f(\lambda x_1 + (1-\lambda) x_2) \le_C \lambda f(x_1) + (1-\lambda) f(x_2).$


The following proposition summarizes the relationships between (WVVI)/(VVI) and the vector optimization problem $(\mathrm{VOP})_K$; see [2,13].


Proposition 1 Assume that f is Gateaux differentiable with Gateaux derivative Df, and let T = Df. Then:

(i) If $x^*$ is a weakly minimal solution of $(\mathrm{VOP})_K$, then $x^*$ solves (WVVI).

(ii) If f is C-convex and $x^*$ solves (WVVI), then $x^*$ is a weakly minimal solution of $(\mathrm{VOP})_K$.

(iii) If f is C-convex and $x^*$ solves (VVI), then $x^*$ is a minimal solution of $(\mathrm{VOP})_K$.

(iv) If $-f$ is C-convex and $x^*$ is a minimal solution of $(\mathrm{VOP})_K$, then $x^*$ solves (VVI).

(v) Let $T(x) = \nabla f(x) := (\nabla f_1(x), \dots, \nabla f_\ell(x))^T$ be the Jacobian (an $\ell \times n$ matrix) of the vector-valued function f at x. If f is $\mathbb{R}^\ell_+$-convex and $x^*$ is a Geoffrion properly minimal solution of $(\mathrm{VOP})_K$, then $x^*$ solves (VVI).


Without the C-convexity of $-f$, (iv) may fail. Let $X = \mathbb{R}$, $Y = \mathbb{R}^2$ and $C = \mathbb{R}^2_+$, and consider the problem $\min_C f(x)$ subject to $x \in [-1, 0]$, where $f(x) = (x, x^2 + 1)$. Every $x \in [-1, 0]$ is clearly a minimal solution of the problem. However, $x = 0$ is not a solution of (VVI): here $T(0) = (1, 0)^T$, and taking $x = -1$ gives $\langle T(0), -1 - 0 \rangle = (-1, 0) \le_{C_0} 0$. The solution sets of (WVVI) and (VVI) are $[-1, 0]$ and $[-1, 0)$, respectively.
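The claims in this example can be checked numerically. The sketch below is an illustration only. It replaces $K = [-1, 0]$ by a uniform grid, which is an assumption: a grid check can refute solvability but not prove it. For a candidate $x^*$, it tests whether some grid point x puts $\langle T(x^*), x - x^* \rangle$ in $-C_0$ (VVI fails) or in $-\operatorname{int} C$ (WVVI fails), with $T(x) = \nabla f(x) = (1, 2x)^T$.

```python
from typing import Callable, List, Sequence, Tuple


def T(x: float) -> Tuple[float, float]:
    """Jacobian of f(x) = (x, x**2 + 1), a 2 x 1 matrix stored as a pair."""
    return (1.0, 2.0 * x)


def in_minus_C0(v: Sequence[float]) -> bool:
    """v in -(R^2_+ minus {0}): all components <= 0 and at least one < 0."""
    return all(vi <= 0 for vi in v) and any(vi < 0 for vi in v)


def in_minus_int_C(v: Sequence[float]) -> bool:
    """v in -int R^2_+: all components < 0."""
    return all(vi < 0 for vi in v)


def solves(x_star: float, grid: List[float],
           violates: Callable[[Sequence[float]], bool]) -> bool:
    """True if no grid point x makes <T(x*), x - x*> violate the inequality."""
    t = T(x_star)
    return not any(violates([ti * (x - x_star) for ti in t]) for x in grid)


grid = [-1.0 + k / 100 for k in range(101)]  # discretization of K = [-1, 0]

print(solves(0.0, grid, in_minus_C0))     # False: (VVI) fails at x* = 0
print(solves(0.0, grid, in_minus_int_C))  # True: (WVVI) holds at x* = 0
print(solves(-0.5, grid, in_minus_C0))    # True: (VVI) holds at x* = -0.5
```

For $x^* < 0$ the two components $x - x^*$ and $2x^*(x - x^*)$ have opposite signs, so no grid point violates (VVI). At $x^* = 0$ the second component vanishes, and $x = -1$ violates it.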

A Minty vector variational inequality (Minty VVI, in short) is a problem of finding $x^* \in K$ such that

$\langle T(x), x - x^* \rangle \not\le_{C_0} 0, \quad \forall x \in K. \qquad (1)$

The following result shows that, when $C = \mathbb{R}^\ell_+$, the minimal solutions of $(\mathrm{VOP})_K$ are completely characterized by the Minty VVI.


Theorem 1 [6] Let $X = \mathbb{R}^n$, $Y = \mathbb{R}^\ell$ and $C = \mathbb{R}^\ell_+$, and let $T(x) = \nabla f(x)$. Let f be $\mathbb{R}^\ell_+$-convex and v-hemicontinuous on K. Then $x^*$ is a minimal solution of $(\mathrm{VOP})_K$ if and only if it is a solution of the Minty VVI.


The following generalized linearization lemma and the Knaster–Kuratowski–Mazurkiewicz theorem (KKM theorem, in short) have played a key role in establishing the existence of a solution for (WVVI).


Lemma 1 (Generalized Linearization Lemma) Let the mapping $T: X \to L(X,Y)$ be monotone and v-hemicontinuous. Then the following two problems are equivalent:

1. $x \in K$, $\langle T(x), y - x \rangle \not\le_{\operatorname{int} C} 0$, $\forall y \in K$;
2. $x \in K$, $\langle T(y), y - x \rangle \not\le_{\operatorname{int} C} 0$, $\forall y \in K$.


Lemma 2 (KKM Theorem) Let K be a subset of a topological vector space V. For each $x \in K$, let a closed and convex set F(x) in V be given such that F(x) is compact for at least one $x \in K$. If the convex hull of every finite subset $\{x_1, x_2, \dots, x_k\}$ of K is contained in the corresponding union $\bigcup_{i=1}^{k} F(x_i)$, then $\bigcap_{x \in K} F(x) \ne \emptyset$.


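Assume that K is compact. We set

$F_1(y) = \{x \in K : \langle T(x), y - x \rangle \not\le_{\operatorname{int} C} 0\}, \quad y \in K,$
$F_2(y) = \{x \in K : \langle T(y), y - x \rangle \not\le_{\operatorname{int} C} 0\}, \quad y \in K.$

It can be shown that the convex hull of every finite subset $\{x_1, x_2, \dots, x_k\}$ of K is contained in the corresponding union $\bigcup_{i=1}^{k} F_1(x_i)$. Since $F_1(y) \subseteq F_2(y)$ for all $y \in K$, this is also true for $F_2$. By Lemma 1, we have

$\bigcap_{y \in K} F_1(y) = \bigcap_{y \in K} F_2(y).$

For each $y \in K$, $F_2(y)$ is a (weakly) compact subset of K. By Lemma 2, we have

$\bigcap_{y \in K} F_1(y) = \bigcap_{y \in K} F_2(y) \ne \emptyset.$

Hence, there exists an $x^* \in K$ such that

$\langle T(x^*), x - x^* \rangle \not\le_{\operatorname{int} C} 0, \quad \forall x \in K.$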


Assume that K is unbounded and $T: K \to L(X,Y)$ is weakly coercive on K, that is, there exist $x_0 \in K$ and $c \in \operatorname{int} C^*$ such that

$\dfrac{\langle c \circ T(x) - c \circ T(x_0),\ x - x_0 \rangle}{\|x - x_0\|} \to \infty$

whenever $x \in K$ and $\|x\| \to +\infty$. In a similar way, one can show that $\bigcap_{y \in K} F_1(y) \ne \emptyset$.

This yields the following result, where the weak topology of X and the norm topology of Y are used.


Theorem 2 [2] Assume that X is a reflexive Banach space and $K \subset X$ is convex. Assume that (Y, C) is an ordered Banach space with $\operatorname{int} C \ne \emptyset$ and $\operatorname{int} C^* \ne \emptyset$. Let the mapping $T: K \to L(X,Y)$ be monotone and v-hemicontinuous, and let T(y) be completely continuous on X for any $y \in K$. If

1. K is compact, or

2. K is closed and T is weakly coercive on K,

then the weak vector variational inequality (WVVI) is solvable.


The KKM theorem cannot be applied to establish the existence of a solution for (VVI), because the sets $F_1(x)$ and $F_2(x)$ with $\not\le_{\operatorname{int} C}$ replaced by $\not\le_{C_0}$ are no longer closed.

Only very recently has the existence of a solution for (VVI) been obtained, by means of the Browder fixed point theorem.


Theorem 3 [4] Assume that X is a reflexive Banach space and $K \subset X$ is convex. Assume that (Y, C) is an ordered Banach space with $\operatorname{int} C \ne \emptyset$ and $\operatorname{int} C^* \ne \emptyset$. Let $T: K \to L(X,Y)$. If

1. K is compact and, for each $y \in K$, the set $\{x \in K : \langle T(x), y - x \rangle \le_{C_0} 0\}$ is open in K, or

2. K is closed and T is continuous and weakly coercive on K,

then the vector variational inequality (VVI) is solvable.


Vector variational inequalities have also been studied through another model of similar form, using conjugate functions of vector-valued functions. Here such a model is called a primitive of a VVI. This model was called a dual VVI in [12] and an inverse VVI in [1].

Let $T: X \to L(X,Y)$ and $h: X \to Y$ be functions. The $(\mathrm{VVI}_h)$ problem consists in finding $x^* \in X$ such that

$\langle T(x^*), x - x^* \rangle \not\le_{C_0} h(x^*) - h(x), \quad \forall x \in X.$


Assume that T is one-to-one (injective). Define $T': L(X,Y) \to X$ as follows:

$T'(l) := -T^{-1}(-l), \quad \forall l \in \operatorname{Domain}(T') = -\operatorname{Range}(T).$

If T is linear, then $T' = T^{-1}$.
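The primitive of the $(\mathrm{VVI}_h)$ problem is defined as finding $l^* \in \operatorname{Domain}(T')$ such that

$\langle l - l^*, T'(l^*) \rangle \not\le_{C_0} h^*_C(l^*) - h^*_C(l), \quad \forall l \in L(X,Y), \qquad (\mathrm{IVVI}_h)$

where $h^*_C(l) := \operatorname{Max}_C\{\langle l, x \rangle - h(x) : x \in X\}$ is the vector conjugate function of h.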

Let $h: X \to Y$ and $x^* \in X$. We define the subgradient of h at $x^*$ by

$\partial_C h(x^*) = \{l \in L(X,Y) : h(x) - h(x^*) \not\le_{C_0} \langle l, x - x^* \rangle,\ \forall x \in X\}.$


Theorem 4 [12] Let X be a Hausdorff topological vector space and (Y, C) an ordered Hausdorff topological vector space. Suppose that T is one-to-one and $h: X \to Y$ is continuous, and assume that $h^*_C(l) \ne \emptyset$ for all $l \in L(X,Y)$.

(i) If $x^*$ is a solution of $(\mathrm{VVI}_h)$, then $l^* = -T(x^*)$ is a solution of $(\mathrm{IVVI}_h)$ and the following relation is satisfied:
Pe ene ie Ho) 

(ii) If $l^*$ is a solution of $(\mathrm{IVVI}_h)$, C is connected, i.e., $C \cup (-C) = Y$, and $\partial_C h(x^*) \ne \emptyset$, where $x^* = -T'(l^*)$, then $x^*$ is a solution of $(\mathrm{VVI}_h)$.

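Consider the (VOP) with $X = K = \mathbb{R}^n$, $Y = \mathbb{R}^\ell$ and f differentiable. Let $h: \mathbb{R}^n \rightrightarrows \mathbb{R}^\ell$ be a set-valued function and $x^* \in X$. We define the weak subgradient of h at $x^*$ by

$\partial^w h(x^*) = \{l \in L(\mathbb{R}^n, \mathbb{R}^\ell) : h(x) - h(x^*) \not\le_{\operatorname{int} C} \langle l, x - x^* \rangle,\ \forall x \in X\}.$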


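Let $\phi: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^\ell$ be a perturbation function satisfying

$\phi(x, 0) = h(x), \quad \forall x \in \mathbb{R}^n,$

and

$W(u) = -\operatorname{Max}_C\{-\phi(x,u) : x \in \mathbb{R}^n\}.$

We now construct the dual problem (DVOP, for short) of (VOP) as follows:

$\min\ -\phi^*_C(0, l_u), \quad \text{subject to } l_u \in L(\mathbb{R}^m, \mathbb{R}^\ell).$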


Proposition 2 Assume that W has a weak subgradient at $u = 0$ and C is connected. If $x^*$ is a solution of (VOP), then there exists $l_u^* \in L(\mathbb{R}^m, \mathbb{R}^\ell)$ such that $l^* = -\nabla f(x^*)$ is a solution of the primitive of a vector variational inequality, $l_u^*$ is a solution of (DVOP), and they satisfy the inclusion

$(l^*, l_u^*) \in \partial^w \phi(x^*, 0).$


The concept of a gap function is well known both in convex optimization and in variational inequalities. Minimizing a gap function is a viable approach to solving variational inequalities.

A set-valued function $\phi_w: K \rightrightarrows Y$ is said to be a gap function of (WVVI) if and only if (i) $0 \in \phi_w(x^*)$ if and only if $x^*$ solves (WVVI), and (ii) $0 \not\le_{\operatorname{int} C} \phi_w(x)$ for all $x \in K$. A set-valued function $\phi: K \rightrightarrows Y$ is said to be a gap function of (VVI) if and only if (i) $0 \in \phi(x^*)$ if and only if $x^*$ solves (VVI), and (ii) $0 \not\le_{C_0} \phi(x)$ for all $x \in K$.


Proposition 3 Let C be a pointed and convex cone in Y. Then:

(i) The set-valued function $\phi_w(x) := \operatorname{Max}_{\operatorname{int} C}\langle T(x), x - K \rangle$ is a gap function for (WVVI).
(ii) The set-valued function $\phi(x) := \operatorname{Max}_C\langle T(x), x - K \rangle$ is a gap function for (VVI).


The above gap functions are set-valued. Special single-valued gap functions can be constructed by means of nonlinear scalarization functions. Given a fixed $e \in \operatorname{int} C$ and $a \in Y$, the nonlinear scalarization function is defined by

$\xi_{ea}(y) = \min\{t \in \mathbb{R} : y \in a + te - C\}, \quad y \in Y.$
Proposition 4 Let $e \in \operatorname{int} C$. Then $x^* \in K$ solves (WVVI) if and only if the non-positive function $g(x) = \min_{y \in K} \xi_{e0}(\langle T(x), y - x \rangle)$ has a zero at $x^*$.


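In the special case where $Y = \mathbb{R}^\ell$, $C = \mathbb{R}^\ell_+$ and $T(x) = [T_1(x), \dots, T_\ell(x)]^T$, the nonlinear scalarization function may be written in the equivalent form

$\xi_{ea}(y) = \max_{1 \le i \le \ell} \dfrac{y_i - a_i}{e_i}.$

Thus, taking $e = (1, \dots, 1)^T$, we get $g(x) = \min_{y \in K} \max_{1 \le i \le \ell} \langle T_i(x), y - x \rangle$, $x \in K$. Evaluating g(x) at each point amounts to solving a linear minimax optimization problem.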
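The sketch below illustrates Proposition 4 in this special case by evaluating $\xi_{ea}$ and $g(x)$ for a scalar decision variable ($X = \mathbb{R}$, $K = [lo, hi]$). The scalar setting is an assumption that lets the inner minimax problem be solved exactly without a linear programming solver. The map $y \mapsto \max_i T_i(x)(y - x)$ is convex and piecewise linear, and every piece vanishes at $y = x$, so its minimum over $[lo, hi]$ is attained at lo, at hi, or at x. The test objective $f(x) = (x^2, (x-1)^2)$ on $K = [-1, 2]$ is a hypothetical example. This f is $\mathbb{R}^2_+$-convex and its weakly minimal set is [0, 1], so by Proposition 1 (i) and (ii), g should vanish exactly there.

```python
from typing import Callable, Sequence


def xi(y: Sequence[float], e: Sequence[float], a: Sequence[float]) -> float:
    """Nonlinear scalarization for C = R^l_+ and e in int C: max_i (y_i - a_i) / e_i."""
    return max((yi - ai) / ei for yi, ai, ei in zip(y, a, e))


def gap(T: Callable[[float], Sequence[float]], x: float, lo: float, hi: float) -> float:
    """g(x) = min over y in [lo, hi] of xi_{e0}(T(x)(y - x)), with e = (1, ..., 1)."""
    coeffs = T(x)
    ones = [1.0] * len(coeffs)
    zero = [0.0] * len(coeffs)

    def phi(y: float) -> float:
        return xi([c * (y - x) for c in coeffs], ones, zero)

    # phi is convex and piecewise linear with every piece vanishing at y = x,
    # so its minimum over [lo, hi] is attained at lo, at hi, or at y = x.
    return min(phi(lo), phi(hi), phi(x))


def T_example(x: float) -> Sequence[float]:
    """Jacobian of the hypothetical f(x) = (x**2, (x - 1)**2)."""
    return (2.0 * x, 2.0 * (x - 1.0))


for x in [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0]:
    print(x, gap(T_example, x, -1.0, 2.0))
```

For a polyhedral K in higher dimension, the inner problem would instead be posed as a linear program (minimize t subject to $t \ge \langle T_i(x), y - x \rangle$ for all i, $y \in K$).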

Next we construct a gap function for a set-valued WVVI.

Let $Y = \mathbb{R}^\ell$, $C = \mathbb{R}^\ell_+$, and let $K \subset X$ be a compact subset. Assume that $T: K \rightrightarrows L(X, \mathbb{R}^\ell)$ is a set-valued mapping such that T(x) is compact for each x.

Consider the set-valued WVVI with the set-valued mapping T [9], which consists in finding $x^* \in K$ and $t \in T(x^*)$ such that

$\langle t, x - x^* \rangle \not\le_{\operatorname{int} C} 0, \quad \forall x \in K. \qquad (2)$


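Let $x, y \in K$ and $t \in T(x)$. Denote

$\langle t, y \rangle = (\langle t, y \rangle_1, \dots, \langle t, y \rangle_\ell),$

where $\langle t, y \rangle_i$ is the i-th component of $\langle t, y \rangle$, $i = 1, \dots, \ell$. We define two mappings $\phi_1: K \times L(X, \mathbb{R}^\ell) \to \mathbb{R}$ and $\phi: K \to \mathbb{R}$ as follows:

$\phi_1(x, t) = \min_{y \in K} \max_{1 \le i \le \ell} \langle t, y - x \rangle_i \qquad (3)$

and

$\phi(x) = \max\{\phi_1(x, t) \mid t \in T(x)\}. \qquad (4)$

Since K is compact, $\phi_1(x, t)$ is well defined. If X is a Hausdorff topological vector space, then $\phi_1(x, t)$ is lower semicontinuous in x. Since T(x) is compact, $\phi(x)$ is well defined.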


Theorem 5 $\phi(x)$ defined by (4) is a gap function of the set-valued WVVI.


By Theorem 5, solving the set-valued WVVI is equivalent to finding a global solution $x^*$ of the following generalized semi-infinite programming problem:

$\max\ s$
s.t. $\phi_1(x, t) \le s, \quad \forall t \in T(x),$
$\phi_1(x, t_1) = s, \quad \exists t_1 \in T(x),$
$x \in K.$
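The following sketch evaluates (3) and (4) for a set-valued mapping whose values T(x) are finite sets. As before, it assumes $X = \mathbb{R}$ and $K = [lo, hi]$ so that $\phi_1$ can be computed exactly by the same endpoint argument. The mapping $T(x) = \{(1, 1), (-1, -1)\}$ on $K = [0, 1]$ is a hypothetical example. Here $x^* = 0$ solves (2) with $t = (1, 1)$ and $x^* = 1$ solves it with $t = (-1, -1)$, while no interior point does, so $\phi$ should vanish only at the two endpoints.

```python
from typing import Callable, List, Sequence


def phi1(x: float, t: Sequence[float], lo: float, hi: float) -> float:
    """phi_1(x, t) = min over y in [lo, hi] of max_i t_i (y - x), for X = R."""
    def val(y: float) -> float:
        return max(ti * (y - x) for ti in t)
    # Convex, piecewise linear and zero at y = x: checking lo, hi and x suffices.
    return min(val(lo), val(hi), val(x))


def phi(T: Callable[[float], List[Sequence[float]]], x: float,
        lo: float, hi: float) -> float:
    """phi(x) = max of phi_1(x, t) over the finite set T(x), as in (4)."""
    return max(phi1(x, t, lo, hi) for t in T(x))


def T_example(x: float) -> List[Sequence[float]]:
    """Hypothetical set-valued mapping with two elements, independent of x."""
    return [(1.0, 1.0), (-1.0, -1.0)]


for x in [0.0, 0.25, 0.5, 1.0]:
    print(x, phi(T_example, x, 0.0, 1.0))
```

In this example $\phi(x) = \max(-x, x - 1)$, which is negative on the open interval (0, 1) and zero at its endpoints, as Theorem 5 predicts.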


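The concept of vector complementarity problems was introduced in [2,11]. If K = D is a convex cone of X, then letting x = 0 and x = 2x* in (WVVI), respectively, gives

$0 \not\le_{\operatorname{int} C} \langle T(x^*), x^* \rangle \not\le_{\operatorname{int} C} 0, \qquad (5)$

and letting $x = y + x^*$ with $y \in D$ gives

$\langle T(x^*), y \rangle \not\le_{\operatorname{int} C} 0, \quad \forall y \in D. \qquad (6)$

Together, (5) and (6) are called a weak vector complementarity problem (WVCP). Let the weak C-dual cone $D^{w*}_C$ of D be defined by

$D^{w*}_C = \{g \in L(X,Y) : \langle g, x \rangle \not\le_{\operatorname{int} C} 0,\ \forall x \in D\}.$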


Then (WVCP) can be rewritten as the problem of finding $x^* \in D$ such that

$0 \not\le_{\operatorname{int} C} \langle T(x^*), x^* \rangle, \quad T(x^*) \in D^{w*}_C.$

Thus a solution of (WVVI) is a solution of (WVCP), but simple examples show that the converse is false in general. Nevertheless, the converse can be guaranteed by the usual positiveness property of T. Indeed, let the strong C-dual cone $D^{s*}_C$ of D be defined by

$D^{s*}_C = \{g \in L(X,Y) : \langle g, x \rangle \ge_C 0,\ \forall x \in D\}.$

The positive vector complementarity problem (PVCP) is the problem of finding an $x^* \in D$ such that

$0 \not\le_{\operatorname{int} C} \langle T(x^*), x^* \rangle, \quad T(x^*) \in D^{s*}_C.$


Clearly $D^{w*}_C$ and $D^{s*}_C$ are nonempty, since the null linear function in L(X, Y) belongs to both. It is easy to prove that $D^{s*}_C \subseteq D^{w*}_C$ if C is pointed. When $Y = \mathbb{R}$, the weak and strong C-dual cones of D reduce to the dual cone $D^*$ of D. The weak and strong C-dual cones of D can be shown to be algebraically closed, and the strong C-dual cone of D is convex.
Thus, if C is pointed, then a solution of (PVCP) is a solution of (WVCP). Moreover, a solution of (PVCP) is a solution of (WVVI). This follows from the ordering implication $0 \le_C a,\ 0 \not\le_{\operatorname{int} C} b \Rightarrow a - b \not\le_{\operatorname{int} C} 0$, applied with $a = \langle T(x^*), x \rangle$ and $b = \langle T(x^*), x^* \rangle$.


References 


1. Chen GY, Huang XX, Yang XQ (2005) Vector optimization. 
Set-valued and variational analysis. Lecture Notes in Eco- 
nomics and Mathematical Systems, 541. Springer, Berlin 

2. Chen GY, Yang XQ (1990) The vector complementary prob- 
lem and its equivalences with vector minimal element in 
ordered spaces. J Math Anal Appl 153:136-158 

3. Chen GY, Yen ND (1993) On the variational inequality model for network equilibrium. Internal Report, Department of Mathematics, University of Pisa, 3.196 (724)

4. Fang YP, Huang NJ (2006) Strong vector variational in- 
equalities in Banach spaces. Appl Math Lett 19:362-368 

5. Giannessi F (1980) Theorems of alternative, quadratic programs and complementarity problems. In: Cottle RW, Giannessi F, Lions JL (eds) Variational Inequalities and Complementarity Problems. Wiley, New York

6. Giannessi F (1998) On Minty variational principle. In: Giannessi F, Komlósi S, Rapcsák T (eds) New Trends in Mathematical Programming. Kluwer, Boston, pp 93-99

7. Giannessi F (ed) (2000) Vector Variational Inequalities and 
Vector Equilibrium. Kluwer, Dordrecht, Boston, London 

8. Harker PT, Pang JS (1990) Finite-dimensional variational in- 
equality and nonlinear complementarity problems: a sur- 
vey of theory, algorithms and applications. Math Program 
48(2 Ser B):161-220 

9. Konnov IV, Yao JC (1997) On the generalized vector varia- 
tional inequality problem. J Math Anal Appl 206(1):42-58 

10. Lee GM, Kim DS, Lee BS, Yen ND (1998) Vector variational 
inequality as a tool for studying vector optimization prob- 
lems. Nonlinear Anal 34(5):745-765 

11. Yang XQ (1993) Vector complementarity and minimal ele- 
ment problems. J Optim Theory Appl 77(3):483-495 

12. Yang XQ (1993) Vector variational inequalities and its du- 
ality. Nonlinear Anal, TMA 21:867-877 

13. Yang XQ, Goh CJ (1997) On vector variational inequali- 
ties: application to vector equilibria. J Optim Theory Appl 
95:431-443 


Vehicle Routing 


JEAN-YVES POTVIN 
University Montréal, Montréal, Canada 


MSC2000: 90B06 


Article Outline 


Keywords 
Node Routing 


Static Deterministic Problems 
Static Stochastic Problems 
Dynamic Problems 
Methodologies 


Arc Routing 
Chinese Postman Problem 
Rural Postman Problem 
Capacitated Arc Routing Problem 
Methodologies 


See also 
References 


Keywords 


Network; Node routing; Arc routing; Static; Dynamic; 
Exact methods; Metaheuristics 


Vehicle routing consists in determining optimal collec- 
tion or delivery routes for a fleet of vehicles on a trans- 
portation network [7,11,14,15]. The customers to be 
serviced may be associated with vertices (node routing) 
or arcs (arc routing) of the network. Problems are de- 
terministic or stochastic depending on the certainty or 
uncertainty associated with the data. They are static or 
dynamic depending on the time of availability of the 
data. When all information is known in advance, a so- 
lution can be constructed beforehand and the problem 
is said to be static. Conversely, when new information 
(e. g., new customer requests) is continuously revealed 
over time, the problem is said to be dynamic. 

In the following, we examine both node and arc 
routing problems. 


Node Routing 


These NP-hard problems are found in transportation 
activities where the service occurs at the nodes (cus- 
tomer sites) of the transportation network. Along the 
logistics chain, they are associated with movement of 
raw material from suppliers to plants, movement of 
finished products from plants to warehouses or de- 
pots, and delivery of products to final customers. In 
the service sector, they are found in dial-a-ride systems 
(e.g., transportation-on-demand for people with spe- 
cific needs), school bus routing, courier services, etc. 


The various classes of node routing problems are now 
presented. 


Static Deterministic Problems 


This is the most widely studied class of problems in the vehicle routing literature. It may be formally described as follows. Let G = (V, A) be a graph where V = {1, ..., n} is the set of vertices, vertex 1 is the depot and A is the set of arcs. A nonnegative cost $c_{ij}$ is associated with every arc (i, j), i ≠ j. This cost may be interpreted as a travel distance or travel time, depending on the context. A fleet of m vehicles is based at the depot, with m either fixed or variable. The vehicle routing problem (VRP) then consists in determining a set of least-cost vehicle routes such that:

• each vertex, apart from vertex 1, is visited exactly once by exactly one vehicle;

• all vehicle routes start and end at vertex 1;

• side constraints may have to be satisfied.

When there are no side constraints, an m-traveling salesman problem (m-TSP) is obtained, with the classical TSP corresponding to the special case m = 1 [17]. When side constraints are present, they may include one or more of the following:

• capacity constraints (capacitated vehicle routing problem, CVRP): a nonnegative demand or load $q_i$ is associated with each vertex i (apart from vertex 1), and the total load on a route cannot exceed the vehicle capacity Q;

• distance or travel time constraints (distance-constrained vehicle routing problem, DVRP): the length or travel time of a route must not exceed a prespecified bound;

• time windows (vehicle routing problem with time windows, VRPTW): each vertex i must be visited within a time interval $[a_i, b_i]$; the upper bound $b_i$ may be soft or hard, and waiting is typically allowed if the vehicle arrives before $a_i$;

• precedence constraints: a partial ordering may be imposed on the sequence of vertices to be serviced. For example, a subset of vertices may have to be visited before another subset, as in the vehicle routing problem with backhauls (VRPB). There may also be a precedence relationship between pairs of vertices, as in the subscriber dial-a-ride problem (for transportation-on-demand services), where each transportation request includes both a pick-up and a delivery point and the pick-up must precede the delivery.
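The side constraints listed above can be checked route by route. The minimal Python sketch below makes several assumptions: the cost and travel-time matrix is indexed from 0, with vertex 0 playing the role of the depot (vertex 1 in the text); time windows are hard; service times are zero; and the return-to-depot window is not checked. All function names and the small instance are hypothetical. The sketch verifies the covering requirement, the capacity Q, and the time windows $[a_i, b_i]$ with waiting allowed.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def route_cost(route: Sequence[int], c: Matrix, depot: int = 0) -> float:
    """Cost of depot -> route[0] -> ... -> route[-1] -> depot."""
    path = [depot, *route, depot]
    return sum(c[i][j] for i, j in zip(path, path[1:]))


def capacity_ok(route: Sequence[int], q: Sequence[float], Q: float) -> bool:
    """Total load on the route must not exceed the vehicle capacity Q."""
    return sum(q[i] for i in route) <= Q


def time_windows_ok(route: Sequence[int], t: Matrix,
                    windows: Sequence[Tuple[float, float]],
                    depot: int = 0, start: float = 0.0) -> bool:
    """Hard windows [a_i, b_i]; the vehicle waits if it arrives before a_i.

    Service times and the return-to-depot window are ignored (simplification).
    """
    time, prev = start, depot
    for i in route:
        a, b = windows[i]
        time = max(time + t[prev][i], a)  # travel, then wait until a_i if early
        if time > b:
            return False
        prev = i
    return True


def covers_each_once(routes: List[List[int]], n: int, depot: int = 0) -> bool:
    """Each non-depot vertex is visited exactly once by exactly one vehicle."""
    visited = sorted(i for r in routes for i in r)
    return visited == [i for i in range(n) if i != depot]


c = [[0, 2, 4, 3],
     [2, 0, 3, 5],
     [4, 3, 0, 2],
     [3, 5, 2, 0]]
q = [0, 3, 4, 2]
windows = [(0, 100), (0, 5), (6, 10), (0, 7)]
routes = [[1], [3, 2]]
print(covers_each_once(routes, 4),
      all(capacity_ok(r, q, 6) for r in routes),
      all(time_windows_ok(r, c, windows) for r in routes),
      sum(route_cost(r, c) for r in routes))  # True True True 13
```

In this instance, reversing the second route to [2, 3] keeps the same cost but violates the window of vertex 3. Route orientation therefore matters once time windows are present.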

Time-constrained vehicle routing and scheduling problems have been widely studied in the literature (see [4] for a comprehensive survey). Pick-up and delivery problems are studied in [20]. Other, more complex, variants are also reported in the literature. Without being exhaustive, we mention the following [7]:

• multiple depot problems, where vehicles may start (end) their routes from (at) different depots;

• mixed fleet problems, where vehicles have different characteristics (e.g., different capacities);

• location-routing problems, where strategic decisions about the location of different facilities (e.g., depots, warehouses) must be taken concurrently with the determination of the delivery routes;

• period routing, where daily routes are determined over a horizon that spans a few days (e.g., a 5-day week) and where each customer must receive deliveries at a designated frequency;

• inventory routing, where each customer holds an inventory of a product and a distributor must determine delivery routes so that no customer runs out of the product.


Static Stochastic Problems 


Stochastic vehicle routing problems typically involve 
a stochastic demand or stochastic customers [16]. 
These problems also belong to the class of static prob- 
lems since all information is known in advance, even if 
this information is a probability distribution. Further- 
more, recourse actions are predefined in case of fail- 
ure (e. g., when the vehicle capacity is exceeded or when 
a customer does not show). Hence, a solution that mini- 
mizes some expected measure, like expected total travel 
distance, can be constructed beforehand. 


Dynamic Problems 


Dynamic problems emerge when information about 
the problem is continuously revealed over time [18]. 
Usually, such problems occur when new customer re- 
quests must be dispatched in ‘real-time’ into the current 
routes. These problems are found in many different ap- 
plication domains, like delivery of petroleum products 
and industrial gases, truckload and less-than-truckload 
trucking, dial-a-ride systems, courier services, emergency services, etc. [10]. Dynamic problems exhibit distinctive features with respect to their static counterparts. In particular, time is crucial, because the system must react promptly to newly occurring events. A discussion of this topic may be found in [19].


Methodologies 


Exact methods for solving vehicle routing problems include [14,21]:

• direct tree search methods (e.g., branch and bound);

• dynamic programming;

• integer linear programming (e.g., set partitioning and column generation).

Approximate methods may be classified as follows [2]:

• constructive methods:

- cluster-first, route-second: clusters or groups of customers are first identified; then, these customers are sequenced within each route;

- route-first, cluster-second: a large route that includes all customers is first constructed; then, it is partitioned into a number of smaller feasible routes;

- savings/insertion methods;

• improvement methods based on exchange procedures;

• mixed methods, which include both a constructive and an improvement phase. This last category includes metaheuristics like tabu search, simulated annealing, genetic algorithms, GRASP, ant systems and hybrids [8,9,12].
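As an example of the savings methods mentioned above, here is a minimal sketch of the classical savings heuristic of Clarke and Wright (parallel version) for the CVRP. This heuristic is our illustrative choice and is not a procedure described in this article. The sketch assumes a symmetric cost matrix with the depot at index 0, and the instance is hypothetical. It starts with one route per customer. It then merges routes in decreasing order of the saving $s_{ij} = c_{0i} + c_{0j} - c_{ij}$, whenever i and j are route endpoints and the merged load does not exceed Q.

```python
from typing import Dict, List, Sequence


def clarke_wright(c: Sequence[Sequence[float]], q: Sequence[float],
                  Q: float, depot: int = 0) -> List[List[int]]:
    """Parallel savings heuristic; assumes a symmetric cost matrix c."""
    customers = [i for i in range(len(c)) if i != depot]
    routes: Dict[int, List[int]] = {i: [i] for i in customers}  # route id -> sequence
    route_of = {i: i for i in customers}                          # customer -> route id
    load = {i: q[i] for i in customers}
    savings = sorted(((c[depot][i] + c[depot][j] - c[i][j], i, j)
                      for i in customers for j in customers if i < j),
                     reverse=True)
    for s, i, j in savings:
        if s <= 0:
            break
        ri, rj = route_of[i], route_of[j]
        if ri == rj or load[ri] + load[rj] > Q:
            continue
        a, b = routes[ri], routes[rj]
        # i and j must be route endpoints so that the edge (i, j) can join them.
        if a[-1] == i and b[0] == j:
            merged = a + b
        elif a[0] == i and b[-1] == j:
            merged = b + a
        elif a[-1] == i and b[-1] == j:
            merged = a + b[::-1]
        elif a[0] == i and b[0] == j:
            merged = a[::-1] + b
        else:
            continue
        routes[ri], load[ri] = merged, load[ri] + load[rj]
        for k in b:
            route_of[k] = ri
        del routes[rj], load[rj]
    return list(routes.values())


c = [[0, 2, 4, 3],
     [2, 0, 3, 5],
     [4, 3, 0, 2],
     [3, 5, 2, 0]]
q = [0, 3, 4, 2]
print(clarke_wright(c, q, Q=6))   # [[1], [2, 3]]
print(clarke_wright(c, q, Q=10))  # [[1, 2, 3]]
```

Reversing a route when merging is valid only because costs are assumed symmetric. With asymmetric costs, or with time windows, only orientation-preserving merges would be allowed.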

Parallel implementations of the above methods, in particular metaheuristics, are also reported in the literature [3]. Exploiting multiple processors in parallel is vital for dynamic vehicle routing problems, where fast response times are required.


Arc Routing 


As opposed to node routing, the key service activity in 
arc routing problems is to cover arcs of a transporta- 
tion network [1,5,6]. These problems are found in real- 
world applications like street maintenance, garbage 
collection, snow plowing, meter reading, etc. In the 
following, we examine the various subclasses of prob- 
lems. 


Chinese Postman Problem 


The Chinese postman problem (CPP) is the canonical 
problem in arc routing [13]. It is defined as follows. Let $G = (V, E \cup A)$ be a graph where V = {1, ..., n} is the set of vertices, E is a set of undirected edges and A is a set of directed arcs. A nonnegative cost $c_{ij}$ is associated with every edge or arc (i, j), i ≠ j. The CPP then consists in determining a least-cost traversal of all edges and arcs of G. Several special cases should be mentioned:

• the undirected CPP, when $A = \emptyset$;

• the directed CPP, when $E = \emptyset$;

• the mixed CPP, when $A \ne \emptyset$ and $E \ne \emptyset$;

• the windy CPP, when $A = \emptyset$ but the cost of traveling an edge differs between its two directions;

• the hierarchical CPP, when $A \cup E$ is partitioned into several classes and a precedence relationship is defined among the classes. If a class $C_p$ precedes another class $C_q$, then all edges in $C_p$ must be visited before those in $C_q$.


Rural Postman Problem 


In the rural postman problem (RPP), only a subset of $E \cup A$ is required to be serviced, although other edges or arcs may also appear in the solution. As for the CPP, different variants may be considered, depending on whether the underlying graph is directed, undirected or mixed.


Capacitated Arc Routing Problem 


In the capacitated arc routing problem (CARP), a nonnegative quantity $q_{ij}$ is associated with each arc or edge (i, j). A fleet of m vehicles, each of capacity Q, must visit all edges or arcs of the graph subject to the capacity constraint.


Methodologies 


Polynomial algorithms have been developed for the undirected and directed variants of the CPP. The other variants are NP-hard, although polynomially solvable cases have been identified. Solution methods for the undirected CPP are based on the following observation: a connected graph contains a cycle that traverses each edge exactly once if and only if all its vertices have even degree (such a graph is said to be Eulerian or unicursal). Thus, the basic problem is to find a least-cost way of adding edges to the graph to make it unicursal. This is done by computing the shortest paths between odd-degree vertices and using these costs to determine a least-cost matching of the odd-degree vertices. The same augmentation problem appears in the directed CPP, where the in-degree and out-degree of each vertex must be balanced; it is solved as a minimum cost network flow problem. Similar approaches are reported for the RPP, although in this case they produce approximate solutions.
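The undirected CPP procedure described above can be sketched as follows: shortest paths between odd-degree vertices, then a least-cost matching of those vertices. This Python version rests on stated assumptions. The graph is connected and undirected, with vertices 0..n−1. Shortest paths come from Floyd–Warshall. The minimum-cost perfect matching is found by exhaustive recursion, which is exponential and only suitable for a handful of odd vertices; a polynomial matching algorithm would be used in practice. The function returns the optimal tour cost, not the tour itself.

```python
import math
from functools import lru_cache
from typing import List, Tuple


def chinese_postman_cost(n: int, edges: List[Tuple[int, int, float]]) -> float:
    """Least cost of a closed walk traversing every edge of a connected undirected graph."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    degree = [0] * n
    total = 0.0
    for u, v, w in edges:
        total += w                      # every edge is traversed at least once
        degree[u] += 1
        degree[v] += 1
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):                  # Floyd-Warshall shortest paths
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    odd = tuple(v for v in range(n) if degree[v] % 2 == 1)

    @lru_cache(maxsize=None)
    def matching(rest: Tuple[int, ...]) -> float:
        """Minimum-cost perfect matching of the odd vertices in `rest`."""
        if not rest:
            return 0.0
        first, others = rest[0], rest[1:]
        return min(d[first][v] + matching(others[:k] + others[k + 1:])
                   for k, v in enumerate(others))

    # Duplicating the shortest paths of the matching makes the graph unicursal.
    return total + matching(odd)


square_with_diagonal = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]
print(chinese_postman_cost(4, square_with_diagonal))          # 6.0
print(chinese_postman_cost(3, [(0, 1, 2.0), (1, 2, 3.0)]))    # 10.0
```

On a path graph every edge must be traversed twice, which is why the second example costs 2 × 5 = 10. An Eulerian graph needs no augmentation, so its cost is just the sum of the edge costs.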

Many heuristic methods have been proposed for 
the CARP. These methods are often derived from 
their VRP counterparts: insertion methods, con- 
structive methods, improvement methods and mixed 
methods [6]. 

Exact algorithms based on branch and bound tech- 
niques are also reported in the literature for solving 
some NP-hard variants [1,5,6]. 


See also 


> General Routing Problem 
> Stochastic Vehicle Routing Problems 
> Vehicle Scheduling 
Vehicle Routing Problem with Simultaneous Pickups and Deliveries


Introduction


Since it was first formulated in 1959 by Dantzig and Ramser [5], the vehicle routing problem (VRP) has attracted much attention from the operations research community. The VRP is similar to the traveling salesman problem (TSP). In the TSP, the goal is to find the shortest trip for a salesman who must visit all customers (nodes) exactly once, starting at an initial node and returning to it. The VRP is the name given to the class of problems that includes the TSP and adds multiple vehicles, each with the same limited capacity.

The term VRP designates a wide range of extensions of the original TSP rather than a single specific problem. Several versions of the problem may be defined, depending on the factors, constraints, and objectives considered. It is therefore important to clarify the scope and content of the problem before developing an analytical solution approach. The main attributes that define most published VRP configurations are as follows:

• Number of vehicles: The upper limit on the number of vehicles available for routing.

• Vehicle homogeneity/heterogeneity: Whether the capacity is the same for all vehicles or not.

• Time windows: The time constraints imposed on servicing a customer.

• Backhauls: Besides receiving its delivery, the customer loads the truck with goods to be carried back to the depot.

• Splitting/unsplitting of load: In the splitting case, the load to be delivered or picked up at a node may be divided into any number of parts; in the unsplitting case this is strictly forbidden. Splitting thus allows a node to be visited on several trips rather than on a single one.

• Single depot/multiple depots: The distribution or collection process is organized around a single depot or several depots; the distribution and collection centers may even be different.

• Static/dynamic service needs: The demand values are either fully known or partly unknown before the route of the service vehicle is established.

• Precedence/coupling constraints: If a node's demand must be satisfied with goods picked up at a node other than the depot, a coupling constraint links the two nodes. The pickup node must then be served before the delivery node (precedence constraint).

The VRP with deliveries and pickups (VRPDP) is a subclass of the general VRP in which the nodes require both delivery and pickup services, sometimes even simultaneously, rather than only one of them. Salhi and Nagy [15] clearly identified three types of problem that may be addressed within the VRPDP, depending on the service provided:

1. VRP with backhauls (VRPB): The nodes are identified either as linehaul nodes (with deliveries originating from the depot) or as backhaul nodes (with items to be picked up and brought to the main depot). The linehaul nodes are served before the backhaul nodes. Chen and Wu [4] explain this rule by the design of older trucks, which allowed only rear loading.

2. VRP with mixed load (VRPM): With the introduction of side loading on trucks, rearranging loads on board became easier, so linehaul customers no longer had to be served first to free space. As a result, service routes were introduced in which backhaul nodes may appear at any position, including before the last linehaul node. Establishing routes composed of a mixed sequence of linehaul and backhaul nodes constitutes the VRPM.

3. VRP with simultaneous delivery and pickup (VRPSDP): The VRPM setting was inconvenient when some nodes requested both a pickup and a delivery service. Such nodes had to be visited twice, which increased the tour length and led to inferior routing results. The VRPSDP setting lets each node receive a delivery and a pickup service during the same stop, with the delivery preceding the pickup. The VRPM is a special case of the VRPSDP in which either the delivery or the pickup quantity at each node is zero.
The VRPSDP model differs from the m-dial-a-ride problem with nonunit capacity in that the transport of goods between nodes other than the depot is strictly excluded. In the latter problem, m routes are created using vehicles with multiunit capacity to transport items from and to nodes on a route.

Montane and Galvao [10] introduce a fourth type of VRPDP, referred to as the express delivery problem, which constructs separate delivery routes and pickup routes to serve customers with both pickup and delivery requirements. The delivery routes, which are traversed first, need not cover the same nodes as the pickup routes. Thus, each node is visited twice, possibly by two different vehicles, to satisfy the two types of service.

In this review, only the VRPSDP configuration is discussed. The problem comprises many customers, or “nodes”, to be served by a fleet of homogeneous vehicles of limited capacity. The vehicles deliver items from the depot to the customers and collect pickup loads, which are brought back to the depot at the end of the trip. Picked-up and delivered items have identical unit sizes and consume the same amount of capacity on each truck. However, the amounts picked up and delivered at a location need not be the same. Delivery and pickup locations are unique, and supplying a customer with anything picked up at a node other than the main depot is strictly avoided. The objective is to minimize the total distance covered by the fleet. Examples of this type of problem can be found in the distribution of bottled spring water in returnable containers, industrial gas distribution/collection in refillable tanks, the distribution of liquefied petroleum gas in commercial containers from wholesalers to retailers, and crew transportation between the mainland and offshore oil rigs by helicopter.


Formulation 


The graph-theoretical definition of the VRPSDP is as follows.

Instance: A graph $G = (V, E)$ with edge weights $w_e$ for all $e \in E$ and vertex weights $d_v$ (delivery) and $p_v$ (pickup) for all $v \in V$; a distinguished node, the depot $d$; a parameter $k$ denoting the upper limit on the number of vehicles available; and a parameter $C$ denoting the uniform capacity of each truck.

Objective: Find a partition of the nodes in $V \setminus \{d\}$ into $V_1, \ldots, V_k$ and subsets of edges $T_h \subseteq E$ forming $k$ tours, each containing node $d$ and each node of $V_h$ exactly once, so that $\sum_{h=1}^{k} \sum_{e \in T_h} w_e$ is minimized without violating

$$\sum_{j \in V_h} d_j \le C, \qquad \sum_{j \in V_h} p_j \le C \qquad \text{for } h \in \{1, \ldots, k\},$$

$$p^*_{vh} + d^*_{vh} + p_v \le C \qquad \text{for } v \in V_h,\ h \in \{1, \ldots, k\},$$

where $p^*_{vh}$ denotes the total load picked up in partition $V_h$ before node $v \in V_h$, and $d^*_{vh}$ denotes the total load still to be delivered in partition $V_h$ after node $v \in V_h$.
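The capacity condition can be checked for a single tour once its visiting order is fixed. The sketch below assumes that a route is a hypothetical list of customer ids (depot omitted) and that delivery and pickup quantities are given in dictionaries. The vehicle leaves the depot carrying all deliveries of the route, and at each stop the delivery is unloaded before the pickup is loaded, so the load after serving v equals the pickups collected before v, plus the deliveries still pending after v, plus the pickup at v.

```python
from typing import List, Mapping, Sequence


def route_loads(route: Sequence[int], delivery: Mapping[int, int],
                pickup: Mapping[int, int]) -> List[int]:
    """Load on board when leaving the depot and after each customer stop."""
    load = sum(delivery[v] for v in route)  # all deliveries are loaded at the depot
    loads = [load]
    for v in route:
        # Unload d_v first, then load p_v (delivery precedes pickup).
        load = load - delivery[v] + pickup[v]
        loads.append(load)
    return loads


def route_is_feasible(route: Sequence[int], delivery: Mapping[int, int],
                      pickup: Mapping[int, int], capacity: int) -> bool:
    # The load only drops while unloading, so checking departure loads suffices.
    return max(route_loads(route, delivery, pickup)) <= capacity
```

With delivery = {1: 3, 2: 0, 3: 2} and pickup = {1: 1, 2: 4, 3: 0}, the order [1, 2, 3] gives loads [5, 3, 7, 5] and fits a capacity of 7, while the order [2, 1, 3] peaks at 9. The visiting order, not only the partition, decides feasibility.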


Applications 


Very little attention has been paid to the VRPSDP. The problem was first introduced in the literature by Min [9], who studied book distribution and recollection between a central library and 22 remote libraries in a county in Ohio. Every day, the central depot supplies the remote libraries with ordered books and collects previously delivered books from them in return. Two trucks of limited capacity are assigned to this distribution and recollection activity. The article also provides a symmetric cost matrix of distances between the library locations. Min [9] reports a solution value of 94 for his problem. He clusters the nodes into two groups and then solves a relaxation of the problem using branch and bound. After identifying constraint violations, he penalizes moves leading to such violations and solves the relaxed problem iteratively until no constraint violations are observed.

Halse [8] studied this special case of the VRPSDP as well as many other problems from the VRP literature. His work considers cases with a single depot, multiple vehicles, and between 22 and 150 nodes. Halse [8] used a Lagrangian relaxation and a column-generation approach. He also developed a cluster-first-route-second heuristic in which nodes are first assigned to vehicles and the resulting routing problems are then solved with a 3-opt approach.
Angelelli and Mansini [1] studied the VRPSDP with time window constraints. They implemented a branch-and-price approach based on a set-covering formulation of the master problem. A relaxation of the elementary shortest path problem with time windows and capacity constraints was used as the pricing problem, and branch and bound was applied to obtain integer solutions. Angelelli and Mansini [1] also gave further guidance on exact algorithms based on column generation and branch and price.

Gendreau et al. [7] studied the VRPSDP for the single-vehicle case. They derived 26 problem instances from previously published instances and compared the performance of their two newly developed heuristics with that of heuristics previously introduced in the VRP literature. In their instances, the number of nodes ranged from six to 261, including the depot. The first algorithm presented by the authors constructs a sequence and serves nodes with positive demand (defined as the case in which the pickup quantity exceeds the delivery quantity) until the truck capacity would be violated. In other words, if the residual capacity is insufficient for the next customer with positive demand, the truck skips that customer and serves the next available customer with negative demand. When enough room becomes available, the truck returns to the node where the capacity violation occurred, and the next node with positive demand is served.

The main difficulty in Lagrangian relaxation approaches is the large number of relaxed constraints, which does not allow all of them to be included explicitly in the objective function. To overcome this difficulty, Toth and Vigo [17] proposed to include initially only a limited set of the relaxed constraints and to add iteratively those constraints that are violated by the current solution of the Lagrangian problem. In addition, to keep the objective function manageable, they proposed to purge relaxed constraints from the Lagrangian relaxation when they become slack for the current solution. This process is repeated until no violated constraints are detected (hence, feasibility is obtained) or a prefixed number of subgradient iterations has been executed.
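The constraint-management idea can be illustrated on a generic 0-1 covering problem rather than on the VRPSDP relaxation itself, which is far larger. In the sketch below, the problem data, the diminishing step-size rule, and the purge criterion are assumptions: every row A[k]·x ≥ b[k] is relaxed, but only rows in an active pool carry multipliers. Rows violated by the current Lagrangian solution are added to the pool, rows that are slack with a zero multiplier are purged, and the loop stops when no row is violated or the iteration limit is reached.

```python
from typing import Dict, List, Sequence, Tuple


def lagrangian_constraint_pool(c: Sequence[float], A: Sequence[Sequence[float]],
                               b: Sequence[float], iterations: int = 50,
                               step: float = 0.5) -> Tuple[float, List[int]]:
    """Subgradient optimization for min c.x, x in {0,1}^n, s.t. A[k].x >= b[k].

    All rows are relaxed, but only rows in an active pool carry multipliers.
    Returns the best Lagrangian lower bound and the last Lagrangian solution.
    """
    n = len(c)
    pool: Dict[int, float] = {}          # row index -> multiplier (>= 0)
    best_bound = float("-inf")
    x: List[int] = [0] * n
    for it in range(iterations):
        # With only 0-1 bounds left, the Lagrangian problem separates by variable.
        reduced = [c[i] - sum(lam * A[k][i] for k, lam in pool.items())
                   for i in range(n)]
        x = [1 if r < 0 else 0 for r in reduced]
        bound = (sum(lam * b[k] for k, lam in pool.items())
                 + sum(min(r, 0.0) for r in reduced))
        best_bound = max(best_bound, bound)

        # Subgradient: g_k = b_k - A_k x (positive means row k is violated).
        g = [b[k] - sum(A[k][i] * x[i] for i in range(n)) for k in range(len(A))]
        violated = [k for k, gk in enumerate(g) if gk > 0]
        if not violated:
            break                        # x satisfies every row: stop
        for k in violated:
            pool.setdefault(k, 0.0)      # add newly violated rows to the pool

        t = step / (it + 1)              # diminishing step size (an assumption)
        for k in list(pool):
            pool[k] = max(0.0, pool[k] + t * g[k])
            if pool[k] == 0.0 and g[k] < 0:
                del pool[k]              # purge rows that are slack with zero multiplier
    return best_bound, x
```

For c = [1, 1, 1] with rows x0 + x1 ≥ 1 and x1 + x2 ≥ 1, the loop stops after three iterations with bound 1.0 and x = [0, 1, 0], which in this tiny case happens to be optimal. In general the Lagrangian solution need not be optimal; only the bound is guaranteed.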

Dethloff [6] also studied the VRPSDP. In his study, Dethloff used dynamic programming to calculate the net savings attainable by taking into account the future steps and the actions to follow during those steps. The aim is to keep higher residual capacities on the vehicles, so as to retain more freedom for serving future nodes while dealing with the current node. Higher residual capacities can be achieved by serving customers with a small (large) delivery amount and a large (small) pickup amount late (early) in the route. Each such residual capacity is more advantageous if it is valid for a long part of the route. Additionally, residual capacities are potentially more advantageous if the yet unvisited customers, which are candidates for future insertions, have a higher cumulative demand for delivery and pickup. In his work, Dethloff developed 40 VRPSDP instances to test his algorithm. He also reported an improvement on the solution given by Min [9]. He then compared the results of his algorithm with those of Salhi and Nagy [15], based on their problem instances and problem structure. In the problems given in [15], each customer is split into a delivery node and a pickup node at zero distance from each other, and each node receives either a delivery or a pickup service, but not both. Thus, when each pair of nodes is merged back into a single customer, that customer may turn out to be visited more than once. Moreover, the problem imposes a limit on the maximum route length and considers multiple depots rather than a single depot.

The mathematical formulation of the VRPSDP is omitted here; the interested reader is referred to [6]. A relaxation of the VRPSDP may be obtained by separating the pickup and delivery processes so that at any node either a pickup or a delivery occurs. This relaxation is noted in [12] to be at least as hard as an NP-hard problem. The VRPSDP itself is NP-hard in the strong sense, since it contains the capacitated VRP (all pickup quantities equal to zero) as a special case.

The algorithm presented by Montane and Galvao [10] for the VRPSDP starts with two well-known heuristics: tour partitioning and sweep. The original problem is divided into TSP subproblems with simultaneous delivery and pickup, which are solved using cycle, minimum spanning tree, and cheapest insertion heuristics. Node exchange operators are used to overcome route infeasibilities and improve solution quality. By combining these methods, eight heuristics were generated and tested on 27 problems with between 32 and 80 nodes. Minimum, maximum, and average solution values were reported rather than the best result for each problem instance.
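For illustration, the basic sweep idea adapted to simultaneous pickups and deliveries might look as follows. This is a hypothetical minimal sketch, not the procedure of [10]: it assumes Euclidean customer coordinates, appends customers to the current route in order of their polar angle around the depot, closes the route when the peak load would exceed the capacity, and omits the TSP subproblem and node exchange phases.

```python
import math
from typing import Dict, List, Mapping, Tuple

Point = Tuple[float, float]


def peak_load(route: List[int], delivery: Mapping[int, int],
              pickup: Mapping[int, int]) -> int:
    """Largest load on board along a VRPSDP route (delivery before pickup)."""
    load = sum(delivery[v] for v in route)
    peak = load
    for v in route:
        load = load - delivery[v] + pickup[v]
        peak = max(peak, load)
    return peak


def sweep_routes(depot: Point, coords: Dict[int, Point],
                 delivery: Mapping[int, int], pickup: Mapping[int, int],
                 capacity: int) -> List[List[int]]:
    """Visit customers in order of polar angle around the depot, closing a
    route whenever appending the next customer would exceed the capacity."""
    def angle(v: int) -> float:
        x, y = coords[v]
        return math.atan2(y - depot[1], x - depot[0])

    routes: List[List[int]] = []
    current: List[int] = []
    for v in sorted(coords, key=angle):
        if peak_load([v], delivery, pickup) > capacity:
            raise ValueError(f"customer {v} cannot be served by a single vehicle")
        if peak_load(current + [v], delivery, pickup) <= capacity:
            current.append(v)
        else:
            routes.append(current)       # close the route, start a new one
            current = [v]
    if current:
        routes.append(current)
    return routes
```

With the depot at the origin, four customers at (1, 0), (0, 1), (-1, 0), (0, -1), deliveries of 2, pickups of 1, and capacity 4, the sweep starts at the customer at angle -π/2 and produces two routes of two customers each.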

Vural [18] proposed a genetic algorithm for the VRPSDP as described above. That work combines previously published genetic structures and mechanisms into a well-performing algorithm. It was tested on the Dethloff [6] instances, and the results were found to be comparatively better. The algorithm uses the “random keys method” of Bean [2] to build the initial population of chromosomes and a modified version of a crossover mechanism introduced by Topcuoglu and Sevilmis [16]. These two mechanisms aim at computational efficiency and reduced complexity. Further route improvement is performed by a local search based on Or-opt [14].
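The appeal of random keys is that any vector of keys decodes into a valid visiting order, so recombination never produces an infeasible permutation. The sketch below illustrates only that property: the uniform crossover and its `bias` parameter are hypothetical stand-ins, not the Topcuoglu and Sevilmis operator used in [18], and the splitting of the decoded order into vehicle routes is omitted.

```python
import random
from typing import List, Sequence


def random_key_chromosome(n: int, rng: random.Random) -> List[float]:
    """One random key in [0, 1) per customer."""
    return [rng.random() for _ in range(n)]


def decode(keys: Sequence[float]) -> List[int]:
    """Customer visiting order: indices sorted by ascending key."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def uniform_crossover(a: Sequence[float], b: Sequence[float],
                      rng: random.Random, bias: float = 0.5) -> List[float]:
    """Take each key from parent a with probability `bias`, else from b.
    Any mix of keys still decodes to a valid permutation."""
    return [x if rng.random() < bias else y for x, y in zip(a, b)]
```

For instance, the keys [0.7, 0.1, 0.4] decode to the order [1, 2, 0]: customer 1 has the smallest key and is visited first.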

Nagy and Salhi [13] revisited, extended, and updated their previous research. They studied the VRPSDP together with the VRPDP, again considering both single- and multiple-depot instances, and provided a list of VRPDP articles with their main features. They improved their previous [15] constructive heuristics by adding more node operators, which leads to better solution refinement. They introduced three new heuristic methods and compared their performance with that of three other methods and of the best-performing method from their previous study.

Chen and Wu [4] provided a recent study on the VRPSDP. They developed two algorithms: an insertion-based heuristic, which also provides the initial solution for the second algorithm, and a hybrid metaheuristic that works like simulated annealing but replaces the probabilistic moves with a deterministic rule. The algorithm also employs a tabu mechanism to avoid revisiting previous local optima and finally refines the solutions using node swap and exchange operators. The algorithm runs as long as an improvement is found within a specified number of trials. It was run on 14 Nagy and Salhi [13] instances without an upper limit on route length. The authors claim better results than those of Salhi and Nagy [15], but they provided no comparison with either the improved results of [13] or those of Dethloff [6]. They further generated problem instances by modifying Solomon instances using the method of Salhi and Nagy [15].

The study presented by Montane and Galvao [11] was inspired by helicopter transportation between mainland Brazil and offshore oil platforms. Since the distance a helicopter can fly is restricted by factors such as fuel, the VRPSDP comes with an additional constraint on the maximum distance that can be traveled. Transporting people between platforms is avoided, which makes this problem a genuine real-life instance of the VRPSDP. Montane and Galvao [11] used modified versions of the sweep and tour partitioning heuristics. Tours are extended with nodes or closed depending on the net change of the total load and on the maximum distance constraint; four different selection rules are devised for this phase. The initial solutions are the input to a tabu search, which stops either when no feasible move exists or when an upper limit on the number of iterations is reached. The procedure then creates new initial solutions followed by a new tabu phase, and this cycle is repeated a fixed number of times. The authors tested their method on 87 instances, provided by Dethloff [6], Salhi and Nagy [15], and Min [9], and 18 newly modified Solomon and extended Solomon instances from the literature.

Bianchessi and Righini [3] applied local search and tabu search algorithms to self-generated random instances of the VRPM. They also applied their algorithms to the Dethloff [6] instances and compared their average values with those of Dethloff, reporting an improvement. However, they did not compare their results with those of previous researchers for the same instances. They first constructed initial solutions using four different node selection rules, based on tolerance to capacity violations and overall tour feasibility, and then applied local search by node exchanges in several neighborhoods they defined. Although they applied a variable neighborhood search technique, they did not use it to generate VRPSDP solutions.
The vehicle scheduling problem (VSP) is concerned with combining a number of (passenger) trips, given by a timetable, into (vehicle) blocks. These blocks are sequences of trips operated by one vehicle after leaving the (origin or) home depot with a pull-out trip until its return to the same depot with a pull-in trip. For public transit companies, solving the VSP is of great importance for efficient planning, both for vehicle operations and for manpower planning. The maximal number of blocks operated at the same time (which usually occurs during the morning peak hours) gives a lower bound on the number of vehicles required; consequently, this bound strongly influences the fleet size a company has to maintain. Furthermore, the number of vehicles and the total duration of all blocks also largely determine the manpower requirements.
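A simple way to compute such a lower bound is to count the maximum number of trips in operation simultaneously, since each of them needs its own vehicle. The sweep-line sketch below assumes trips given as hypothetical (start, end) pairs in minutes and ignores layovers, buffer times, and deadhead trips, all of which can only raise the number of vehicles actually needed.

```python
from typing import List, Sequence, Tuple


def peak_vehicle_bound(trips: Sequence[Tuple[float, float]]) -> int:
    """Maximum number of trips in operation at the same time.

    trips: (start, end) times. A trip ending at time t and one starting at t
    are not counted as overlapping.
    """
    events: List[Tuple[float, int]] = []
    for start, end in trips:
        events.append((start, +1))
        events.append((end, -1))
    events.sort()                        # at equal times, -1 (end) sorts before +1 (start)
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak
```

For the trips (0, 10), (5, 15), (10, 20), and (12, 18), three trips run at once between 12 and 15, so at least three vehicles are needed.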

The basic problem is to assign each trip of a given timetable to exactly one block in accordance with in-company policies and various operational restrictions, such that a given objective function is minimized [1,9]. For real-world problems, the restrictions to be considered usually vary from company to company, and in many cases they lead to a highly complex problem structure. That is, these optimization problems tend to be NP-hard (cf. » Complexity theory; » Complexity classes in optimization).

A range of basic characteristics can be identified which mainly specify the (company-dependent) problem structures of a VSP [5,6,18].

• Number of depots. Single-depot vehicle scheduling problems (SDVSP) and multi-depot vehicle scheduling problems (MDVSP) have to be distinguished. However, an MDVSP arises only if the service areas assigned to the depots intersect. Otherwise, a number of independent SDVSPs have to be solved. Using a cluster-first, schedule-second strategy, an MDVSP can also be partitioned into a number of SDVSPs.

• Assignment of vehicles to a depot. Each vehicle is usually assigned to a specific depot. Therefore, after finishing a block the vehicle has to return to its origin. However, in some cases interchanges between different depots may be allowed as exceptions.

• Number of trips. The number of (passenger) trips is fixed by a given timetable. In some cases alterations (trip shortening, trip shifting, trip cancellation) are possible or necessary to reduce the number of vehicles required or to adhere to capacity constraints.

• Assignment of lines/trips to depots. In multi-depot cases, the lines and/or (single) trips have to be assigned to the depots, preferably in accordance with the spatial structure. If no overlaps occur, a number of SDVSPs are generated; otherwise an MDVSP has to be solved.

• Multiple types of vehicles and type-dependent assignment of lines/trips. Various restrictions (e.g., demand structure, differentiation of service, technical conditions) may necessitate type-dependent assignments of lines/trips.

• Type-independent and type-dependent capacity constraints. At the depots, certain restrictions usually arise from the number of vehicles stationed there. However, depending on the planning results, interchanges between different vehicle types may be allowed in some cases.

• Technical restrictions. For means of transport operating in a guided or tracked mode (trolley bus, tram, light rail, etc.), additional technical restrictions have to be considered. These restrictions mainly result from the limited room for passing.

• Manpower capacity restrictions. Besides quantitative manpower restrictions, the qualification of the drivers has to be taken into consideration. Of special interest are the varying levels of knowledge of the line networks, which often occur in larger public transit areas. In addition, in many cases the drivers are not trained on every type of vehicle.

• In-company restrictions. These include the different layovers (defined as the minimum time between the end of a trip i and the beginning of the following trip j carried out by the same vehicle). The duration of a layover results from the attributes describing the types of trips to be linked. Furthermore, other specific in-company targets can be included, e.g., interlining (within one block there should not be trips of only one line) or a given range for the length of a block.

This short overview clearly shows the difficulty of describing a real-world situation in a formal model, which is a necessary basis for applying quantitative methods to these scheduling problems. The availability of efficient algorithms usually depends on formulating a model with explicitly or implicitly included constraints. However, in some cases a simplified description can be used, which can significantly reduce the complexity of the formal model [9,16].

Besides the in-company framework, the operational restrictions have to be considered, which determine whether two trips i and j may be linked within a block. To determine whether a link is admissible, the following conditions based on the fundamentals of the (standard) VSP have to be met:

$$s_i + d_i + W_i + \delta_{ij} + \Delta_{ij} \le s_j, \qquad (1)$$

$$s_j - (s_i + d_i) \le T_{\max}, \qquad (2)$$

with $W_i$ as a buffer time to cover possible delays, $\delta_{ij}$ as the running time of a possibly required deadhead trip, $\Delta_{ij}$ or $\Delta_{ji}$ as a (trip-dependent) layover, and $T_{\max}$ as the (given) maximum idle time between the end of trip i (expressed as the starting time $s_i$ of trip i plus its duration $d_i$) and the departure of the following trip j.
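Conditions (1) and (2) translate directly into a test that can be used to build the set of admissible links. The sketch below assumes a hypothetical Trip record with times in minutes, and that the deadhead running time δ_ij and the layover Δ_ij are supplied by the caller; it follows the reading of condition (1) in which all of these times are added to the end of trip i.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    start: float          # s_i: departure time
    duration: float       # d_i
    buffer: float = 0.0   # W_i: buffer time for possible delays


def link_admissible(i: Trip, j: Trip, deadhead: float, layover: float,
                    t_max: float) -> bool:
    """Can trip j follow trip i in the same block?"""
    end_i = i.start + i.duration
    # Condition (1): buffer, deadhead and layover fit between end of i and s_j.
    reachable = end_i + i.buffer + deadhead + layover <= j.start
    # Condition (2): the idle time between the two trips is at most T_max.
    idle_ok = j.start - end_i <= t_max
    return reachable and idle_ok
```

For example, a trip departing at 480 with duration 45 and buffer 5 can be followed by a trip departing at 545 when the deadhead takes 10 and the layover 3 (525 + 5 + 10 + 3 = 543 ≤ 545), provided the allowed idle time is at least 20.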
In the modeling process, an (individual) weight has to be defined for every admissible link, based on the quantifiable idle time as well as on the various nonquantifiable in-company restrictions. From the planning staff's point of view, the valuation of idle times often leads to a conflict. To attain efficient vehicle schedules, idle time should usually be minimized; in the duty scheduling process, however, longer idle times may be favorable for placing the necessary breaks in a duty. Therefore, it has to be decided whether vehicle scheduling should be regarded as an independent step in the planning process, followed by a (separate) duty scheduling process [16], or as part of an integrated process of simultaneous vehicle and duty scheduling [25,48].

Based on these formal restrictions and on different in-company targets, usually two basic objectives can be formulated:

• minimization of the number of operating vehicles;

• minimization of the nonproductive idle times (e.g., layovers or running times of deadhead trips).

These two objectives are complementary only in a small number of specific cases [40]. Usually, a lexicographic order of these objectives has to be determined. Since the required fleet size has the greatest influence on operational cost, as mentioned above, the main objective has to be the minimization of the number of vehicles needed to operate a given timetable, especially during peak hours.

Based on these various and complex in-company structures, different models of the VSP have been formulated and appropriate solution procedures developed [15,21,22,46,49,50]. The most important of these models are described in the following. Unless another formulation is explicitly given, the objective for these problems is to minimize the number of vehicles needed to operate a given timetable or to minimize the overall operational cost.

• Single-depot vehicle scheduling problem (SDVSP). In the SDVSP, all trips are operated by vehicles stationed at a single depot. The first solution procedures for these problems were based on assignment models [38,41] and transportation models [28,29]. Later, network flow formulations [9,34,35], matching formulations [2,3], and quasi-assignment formulations [42,43] were used.


• Vehicle scheduling problems with a fixed number of vehicles (p-VSP). The p-VSP arises in three different cases [18]. First, it occurs in a two-phase solution process for an SDVSP, where the overall operational cost has to be minimized given the optimal number of vehicles. In the second case, more vehicles are available than the calculated optimal number, so the additional requirement that all vehicles be used has to be considered. The third case deals with a fleet that is not large enough to operate the given timetable, so that a certain number of trips cannot be performed. In the first two cases, the p-VSP can be solved using a transportation model, a network flow model, a quasi-assignment model, or a matching model, whereas the third case can only be handled with a quasi-assignment model.

• Multi-depot vehicle scheduling problem (MDVSP). In the MDVSP, the trips have to be operated by vehicles stationed at a number of depots with given vehicle and manpower capacities. Each vehicle used has to start and end at its home depot. Based on defined depot groups, which represent virtual depots with a unique allocation of lines and/or trips, different solution approaches have been developed. The earliest efficient version, a two-stage schedule-first, cluster-second procedure using an assignment model, is described in [16,38]. However, most known approaches start from a multicommodity flow formulation [34]. To solve the MDVSP based on this formulation, various optimization strategies are employed, notably branch and bound [8,24,37,45], set partitioning [4,23,33,34,45], and Lagrangian relaxation [3,30,32,34,36] methods.

• Vehicle scheduling with trip shifting (TSVSP). As opposed to classical optimization approaches, in the TSVSP the input data are modified within the solution process. The main advantage of such a strategy is the additional degree of freedom for the combinatorial process, which leads to better results, especially with respect to minimizing the number of vehicles during peak hours. Various approaches are described in [7,11,12,14,17,20,31,47].

• Vehicle scheduling problems with multiple types of vehicles (MTVSP). The MTVSP takes into account that varying demand on the lines and/or on (single) trips does not necessarily require a single type of vehicle. Therefore, if a fleet with different vehicle capacities is available, the VSP may be extended. In this case the objective is to determine a schedule that minimizes the operational cost, where the assignment of vehicle types to trips is not fixed but part of the optimization process. Solution procedures for the MTVSP are given in [10,13].

• Integrated vehicle and duty scheduling problems (VDSP). In the VDSP, two steps of the planning process, vehicle scheduling and duty scheduling, are integrated and solved simultaneously. These problems, which arise mainly in extra-urban transit planning, result in solutions where each block usually corresponds to exactly one duty. Solution methods for this problem are described in [2,25,27,39,44,48]. Similar to the VDSP are vehicle scheduling problems with time constraints (TCVSP) [26]. In this case, a time constraint may arise from technical restrictions (fuel capacity, etc.) or from legal and in-company restrictions such as the maximum length of a duty period.
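For the SDVSP with vehicle minimization as the only objective, the assignment view can be made concrete: every link from a trip to a successor trip saves one vehicle, so the minimum number of blocks equals the number of trips minus a maximum matching in the bipartite predecessor/successor graph of admissible links (a minimum path cover). The sketch below is a textbook version of this idea, not necessarily the model of [38,41]; it assumes a single hypothetical turnaround time in place of conditions (1) and (2), uses Kuhn's augmenting-path algorithm, and ignores operating cost.

```python
from typing import List, Sequence, Tuple


def min_vehicles(trips: Sequence[Tuple[float, float]], turnaround: float = 0.0) -> int:
    """Minimum number of blocks covering all trips (single depot).

    Trip j may directly follow trip i if end_i + turnaround <= start_j.
    Minimum number of blocks = number of trips - maximum matching.
    """
    n = len(trips)
    succ: List[List[int]] = [
        [j for j in range(n) if j != i and trips[i][1] + turnaround <= trips[j][0]]
        for i in range(n)
    ]
    pred_of: List[int] = [-1] * n        # pred_of[j]: trip matched to precede j

    def augment(i: int, seen: List[bool]) -> bool:
        # Kuhn's augmenting-path search starting from trip i.
        for j in succ[i]:
            if not seen[j]:
                seen[j] = True
                if pred_of[j] == -1 or augment(pred_of[j], seen):
                    pred_of[j] = i
                    return True
        return False

    matched = sum(augment(i, [False] * n) for i in range(n))
    return n - matched
```

For the trips (0, 10), (5, 15), (10, 20), and (12, 18), only trip (0, 10) can be followed by another trip, so three vehicles suffice; with a turnaround of 3 minutes that link disappears and four vehicles are needed.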

The above overview presents the basics of the VSP and some specific characteristics, especially with respect to the various company-dependent problem structures. Starting several decades ago with very simple solution procedures, which resulted from limitations in computer technology and in the availability of algorithms, increasingly better results in vehicle scheduling have been attained. So far, the procedure proposed in [16,19,38] leads to the best results for real-world problems with a large number of trips. However, network flow formulations combined with branch and bound, set partitioning, and Lagrangian relaxation methods seem to allow some improvements [34]. Moreover, current research activities focus on the TSVSP and the VDSP.
Polytopes and Volume


A convex polytope P in dimension d can be described in two ways. First, it is the convex hull of a finite set of points $\{v_1, \ldots, v_n\} \subset \mathbb{R}^d$. Second, it is the bounded intersection of a finite number of half-spaces, i.e., given as $\{x : Ax \le b\}$ for a matrix $A \in \mathbb{R}^{m \times d}$ and a vector $b \in \mathbb{R}^m$. In the first case, the polytope is given by its V-representation, in the second case by its H-representation.

Only full-dimensional polytopes, i.e., polytopes which are not contained in any hyperplane of $\mathbb{R}^d$, are of interest in the context of volume computation. If P is such a polytope, then its volume $\mathrm{Vol}(P) = \mathrm{Vol}_d(P)$, defined as usual by the d-dimensional Lebesgue measure, is not zero. A minimal V-representation of P is unique and given by the vertices of P. A minimal H-representation is also unique (up to multiplication of rows of A and the corresponding entries of b by positive scalars), and the sets $P \cap \{x : a_i x = b_i\}$, where $a_i$ is the ith row of A in such a minimal representation, define the facets of P.

There are numerous applications of polytope vol- 
ume computation, ranging from estimating the size of 
the solution space of a linear program to counting the 
number of roots of a system of complex polynomial 
equations. To date, most practical applications concern
low dimensions. This is due partly to the fact that
the complexity of volume computation algorithms
increases exponentially with the dimension, and partly
to limited experience with these algorithms.

This contribution is concerned with exact algo- 
rithms for computing polytope volumes which have 
been implemented successfully. Randomized approx- 
imation algorithms as described in [7] are omitted 
from the discussion because so far they have not been 
programmed. 


Complexity Results 


Simple examples show that the minimal V-representation
of a polytope can have exponential size with
respect to its minimal H-representation, and vice versa.
Hypercubes, for instance, have $2d$ facets and $2^d$ vertices
in dimension $d$, and the converse holds for their duals,
the cross polytopes (see below for more about polytope
duality). Hence it is not surprising that the complexity
of volume computation depends on the polytope
representation. The complexity results of this section are
valid for polytopes represented over $\mathbb{Z}$ or $\mathbb{Q}$ and are
formulated with respect to the bit length of the input.
Detailed proofs can be found in [6].

For some classes of polytopes there are polynomial-time
algorithms computing their volumes. An important
such class is formed by the polytopes in fixed
dimension, given either in H- or V-representation, which
reflects the fact that geometric 2- or 3-dimensional
problems can be solved efficiently in practice.

Moreover, there are polynomial algorithms for
polytopes $P$ satisfying one of the following conditions:
$P$ is given in H-representation, and there is a constant
$\delta$ such that each facet of $P$ contains at most $\delta$
vertices (the near-simplicial case); or $P$ is given in
V-representation, and there is a constant $\delta$ such that each
vertex of $P$ is contained in at most $\delta$ facets (the
near-simple case). In particular, the case $\delta = d$ covers
simplicial H- and simple V-polytopes. However, this result
seems to be of little practical value. For instance, simple
polytopes are usually given by their H-representation,
and their V-representation already tends to be
exponential in size. Examples of this phenomenon are
hypercubes and polytopes constructed from randomly
chosen half-spaces.

All known algorithms are exponential in $d$ or $\delta$,
respectively. This is inevitable for H-polytopes, since the
binary size of the volume of a polytope may be
exponential in the size of its H-representation. Some of the
triangulation methods described below (boundary
triangulation, for instance) show that the volume of a polytope
has polynomial size with respect to the V-representation.
Even in this case, however, volume computation is not an
easy task. Indeed, the problem is #P-hard for polytopes
in either H- or V-representation (see [4]).

The complexity of the volume computation prob- 
lem is unknown when both representations are given. 
This problem might even be solvable in polynomial 
time with one of the known algorithms. 


Basic Approaches and Duality 


All deterministic volume computation methods
decompose a given polytope into polytopes whose
volumes are easier to compute. Especially apt for volume
computation purposes are simplices, i.e., $d$-dimensional
polytopes with $d+1$ vertices. Their volume is given by
a determinant; precisely,

$$\mathrm{Vol}(\mathrm{conv}(\{v_0, \dots, v_d\})) = \frac{|\det(v_1 - v_0, \dots, v_d - v_0)|}{d!}.$$
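As a quick check of the determinant formula, here is a minimal Python sketch. It assumes integer or rational vertex coordinates and uses exact `fractions.Fraction` arithmetic; the helper names `det` and `simplex_volume` are illustrative and not taken from any of the implementations cited in this article.

```python
from fractions import Fraction
from math import factorial


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)              # singular matrix
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result                # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):         # clear the entries below the pivot
            f = M[r][col] / M[col][col]
            M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return result


def simplex_volume(vertices):
    """Vol(conv{v_0, ..., v_d}) = |det(v_1 - v_0, ..., v_d - v_0)| / d!."""
    v0, rest = vertices[0], vertices[1:]
    d = len(v0)
    if len(rest) != d:
        raise ValueError("a d-simplex needs exactly d + 1 vertices")
    rows = [[Fraction(a) - Fraction(b) for a, b in zip(v, v0)] for v in rest]
    return abs(det(rows)) / factorial(d)


print(simplex_volume([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # 1/6
```

Exact arithmetic is a deliberate choice here: a degenerate simplex yields exactly zero rather than a tiny rounding residue.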


Depending on the decomposition into simplices, two 
basic classes of algorithms can be distinguished. 


Triangulations 


A triangulation of a polytope $P$ is a finite collection
$\{\Delta_i : i = 1, \dots, s\}$ of simplices such that $P = \bigcup_{i=1}^s \Delta_i$
and the intersection of any two simplices is their
common face. (This is the usual definition of a
triangulation; for volume computation purposes it would in fact
be enough to demand that no two simplices share an
interior point.) Then clearly

$$\mathrm{Vol}(P) = \sum_{i=1}^s \mathrm{Vol}(\Delta_i).$$


For example, if $P$ is simplicial, i.e., all its facets are
$(d-1)$-dimensional simplices, then joining these facets to
a fixed interior point, as illustrated in Fig. 1, yields a
triangulation of the whole polytope, called the boundary
triangulation. Other algorithms are presented below.
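Boundary triangulation of a simplicial polytope reduces to summing such determinants. The sketch below is illustrative only: it assumes the facets are already known as tuples of vertex indices (for instance from a convex hull code, which is not shown) and that the chosen apex lies in the interior of $P$; `boundary_triangulation_volume` is a hypothetical name.

```python
from fractions import Fraction
from math import factorial


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return result


def boundary_triangulation_volume(vertices, facets, apex):
    """Sum of Vol(conv(apex, F)) over the facets F of a simplicial polytope.

    vertices: coordinate tuples; facets: tuples of d vertex indices;
    apex: a point assumed to lie in the interior of the polytope.
    """
    d = len(apex)
    total = Fraction(0)
    for facet in facets:
        if len(facet) != d:
            raise ValueError("every facet of a simplicial polytope has d vertices")
        rows = [[Fraction(x) - Fraction(a) for x, a in zip(vertices[i], apex)]
                for i in facet]
        total += abs(det(rows)) / factorial(d)
    return total


# The cross-polytope |x| + |y| + |z| <= 1 (an octahedron) is simplicial.
octa_vertices = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
octa_facets = [(x, y, z) for x in (0, 1) for y in (2, 3) for z in (4, 5)]
print(boundary_triangulation_volume(octa_vertices, octa_facets, (0, 0, 0)))  # 4/3
```

Any interior apex gives the same total, since the cones over the facets tile $P$ without overlap.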


Signed Decompositions 


The restriction that no two simplices may share an
interior point is in fact not necessary. If it is violated, then
some parts of the polytope are covered more than once,
which must be corrected by subtracting their respective
volumes. A signed decomposition of $P$ is thus a finite
collection $\{(\Delta_i, \sigma_i) : i = 1, \dots, s\}$ of simplices $\Delta_i$ and signs
$\sigma_i \in \{\pm 1\}$ with the following property: if a point of $P$
does not lie on the boundary of any simplex, then the
number of positive simplices containing it exceeds the
number of negative simplices containing it by 1. A point
outside $P$ is contained in as many positive as negative
simplices. It follows that

$$\mathrm{Vol}(P) = \sum_{i=1}^s \sigma_i \mathrm{Vol}(\Delta_i).$$


An example of a signed decomposition is given by the
algorithm of J. Lawrence [9]; see Fig. 2. Let $P$ be simple,
and introduce an additional hyperplane $e$ not parallel
to any edge of the polytope. The signed decomposition
consists of one simplex for each vertex, formed
by $e$ and the hyperplanes incident with this vertex. How
the sign is determined will be described below. In Fig. 2,
the signed decomposition is $\{(ade, +), (abe, -), (cde, -), (bce, +)\}$.
In fact, triangulations are special cases of
signed decompositions with all signs equal to $+1$. Because
of the duality results below, it is nevertheless reasonable
to distinguish between the two classes.
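The simplest signed decomposition in the plane uses an arbitrary point $o$ as a common apex: each edge $v_i v_{i+1}$ of a convex polygon, listed counterclockwise, yields the triangle $(o, v_i, v_{i+1})$, counted positively when it is oriented counterclockwise and negatively otherwise. This is the shoelace formula, not Lawrence's construction; the sketch below, with the hypothetical helper `signed_fan_area`, only illustrates how regions covered twice cancel when $o$ lies outside $P$.

```python
from fractions import Fraction


def signed_fan_area(polygon, origin=(0, 0)):
    """Area of a convex polygon as a signed sum of triangles conv(origin, v_i, v_{i+1}).

    polygon lists integer or rational vertices in counterclockwise order. A
    triangle counts +1 when (origin, v_i, v_{i+1}) is counterclockwise and -1
    when it is clockwise, so regions covered twice cancel.
    """
    ox, oy = origin
    total = Fraction(0)
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        cross = (x1 - ox) * (y2 - oy) - (x2 - ox) * (y1 - oy)   # twice the signed area
        total += Fraction(cross, 2)
    return total


triangle = [(0, 0), (4, 0), (0, 3)]
print(signed_fan_area(triangle, origin=(10, 10)))  # 6, from the terms 20, -29 and 15
```

With the apex outside the triangle, one of the three terms is negative, exactly as for the negative simplices of a signed decomposition.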


Duality 


Assume that the origin lies in the interior of $P$, which
can always be achieved by translating an interior point to the origin.
Then the polar dual of $P$ is defined to be $P^\circ = \{x \in \mathbb{R}^d : y^\top x \le 1 \text{ for all vertices } y \text{ of } P\}$.
Hence the vertices of $P$ are
in bijection with the facets of $P^\circ$, and the converse also
holds. This means that the V-representation of $P$
corresponds to the H-representation of $P^\circ$ (with the
right-hand side $b$ normalized to the all-one vector), and vice
versa. For instance, the polytopes of Fig. 1 and Fig. 2 are
dual to each other.
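Reading off the polar dual from an H-representation only requires normalizing the right-hand side. The sketch below assumes the origin lies in the interior of $P$ (so every $b_i > 0$) and that the H-representation is irredundant, in which case the points $a_i / b_i$ are exactly the vertices of $P^\circ$; `polar_vertices` is a hypothetical helper.

```python
from fractions import Fraction


def polar_vertices(A, b):
    """Vertices a_i / b_i of the polar dual of P = {x : A x <= b}.

    Assumes the origin is an interior point of P (so every b_i > 0) and that
    no inequality is redundant; a redundant row would give a point inside the
    polar rather than a vertex.
    """
    if any(b_i <= 0 for b_i in b):
        raise ValueError("the origin must lie in the interior of P")
    # Dividing row i by b_i gives the normalized form a_i^T x <= 1.
    return [tuple(Fraction(a) / b_i for a in row) for row, b_i in zip(A, b)]


# The square [-2, 2]^2 has the diamond |x| + |y| <= 1/2 as its polar.
print(polar_vertices([[1, 0], [-1, 0], [0, 1], [0, -1]], [2, 2, 2, 2]))
```

The example also shows the hypercube/cross-polytope exchange of facets and vertices mentioned in the complexity discussion.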

Polarity reverses inclusion, so that the duals of
simple polytopes are simplicial and vice versa, and the
combinatorial structure of $P^\circ$ depends only on the
combinatorial structure of $P$. The exact geometry of $P^\circ$ and its
volume, however, depend on the location of the origin
in $P$. A result due to P. Filliman [5] shows that to each
triangulation of $P$ corresponds a signed decomposition
of $P^\circ$, if the origin is located conveniently. Precisely, let
$\mathcal{T}$ be a triangulation of $P$ such that the origin does not lie
on any hyperplane spanned by the vertices of the
triangulation. Then the following procedure yields a signed
decomposition of $P^\circ$: for a simplex $\Delta \in \mathcal{T}$, let its
separation number $s$ be the number of its facets separating it
from the origin. Let $\Delta'$ be the simplex bounded by the
hyperplanes corresponding by polarity to the vertices of
$\Delta$, and let its sign be $\sigma = (-1)^s$.

In this sense, Lawrence’s decomposition of $P^\circ$ in
Fig. 2 is induced by the boundary triangulation of $P$ in
Fig. 1.

Suppose now that the complexity of a decomposition
is measured by the number of simplices it generates
(a simplified model, since it does not take
the sizes of the occurring numbers into account). Let
$\mathcal{P}$ be a class of polytopes containing the origin in their
interiors, and let $\mathcal{P}^\circ$ consist of the duals of the
elements of $\mathcal{P}$. Then for each triangulation algorithm on
$\mathcal{P}$ (working with the V-, H- or both representations),
there is a signed decomposition algorithm on $\mathcal{P}^\circ$
(working with the H-, V- or both representations) with the
same complexity. This explains why most of the
complexity results above are symmetric with respect to
duality, and why asymmetry can occur only when the
binary lengths of the numbers involved are taken into
account.


Algorithms 


In this section, some volume computation algorithms 
are presented in more detail. While the following list is 
not exhaustive, all of the described methods have been 
implemented and tested on different examples (see [2]). 


Delaunay Triangulation 


The classical Delaunay triangulation is obtained by
lifting the polytope vertices onto the surface of a
$(d+1)$-dimensional convex body. Precisely, let $f : \mathbb{R}^d \to \mathbb{R}$ be
a strictly convex function (traditionally, $f(x) = \sum_{i=1}^d x_i^2$).
Lift each vertex $v$ to $(v, f(v))$. Then the lifted vertices
will usually be in general position, i.e., their convex hull
will be simplicial. Interpreting the lower facets of this
hull in terms of the original vertices yields a triangulation. If
the lifted points are not in general position, which is
frequent for the traditional lifting function on
0/1-polytopes, then perturbation or the choice of a
different lift resolves the problem. The algorithm needs
the V-representation. Its time and space complexity
and its numerical behavior depend on the convex hull
algorithm used for computing the facets of the lifted
polytope.


Boundary Triangulation 


The algorithm for simplicial polytopes has been de- 
scribed above. If P is not simplicial itself, then per- 
turbation of its vertices leads to a simplicial polytope, 
whose facets can be determined by any convex hull 
algorithm. Identifying the perturbed vertices with the 
original ones leads to a triangulation of P. Again, only 
the V-representation is needed, and the behavior of the 
algorithm is determined by the underlying method for 
computing the convex hull. 


Triangulation by Cohen and Hickey 


An obvious improvement of any boundary triangulation
is obtained by using a vertex instead of an interior
point, thus dropping all simplices of the boundary
triangulation containing this vertex. If in Fig. 1, for
instance, $e$ were translated into $a$, then only the two
simplices $abc$ and $acd$ would be left over. Together with
a very efficient scheme of handling nonsimpliciality,
this observation is the basis of the triangulation
algorithm described by J. Cohen and T. Hickey [3]. Suppose
that $e_1, \dots, e_m$ are the facets of $P$, given by the sets of
vertices they contain. (This requirement is equivalent to
knowing both the V- and the H-representation
of $P$.) Fix a vertex $v(P)$. Then the set $\{\mathrm{conv}(v(P), e_i) : v(P) \notin e_i\}$
is a decomposition of $P$ into pyramids with bases
$e_i$ and apex $v(P)$. If the polytope is simplicial, then the
pyramids are in fact simplices. Otherwise their bases
are triangulated recursively, and the resulting
$(d-1)$-dimensional simplices, together with the fixed vertex
$v(P)$, form a triangulation of $P$. More precisely, let $b$
be such a base, again represented by a set of vertices.
Fix a vertex $v(b) \in b$. As above, $\{\mathrm{conv}(v(b), e) : v(b) \notin e\}$,
where $e$ varies over the facets of $b$, i.e., the
$(\dim b - 1)$-dimensional faces of $P$ contained in $b$, forms a
decomposition of $b$ into pyramids. The recursion continues
with the $e$’s until a simplicial face is reached, which is
the case in dimension 1 at the latest. While the $e$’s are
not known a priori for $b \ne P$, they are exactly the
$(\dim b - 1)$-dimensional sets within $\{b \cap e_i : v(b) \notin e_i\}$. Instead
of testing the dimensions of these sets, which turns out
to be too costly in practice, one can simply continue the
algorithm with all of them. If the dimension of $b \cap e_i$
is lower than needed, then this recursion branch will
end prematurely with an empty face and not contribute
a simplex to the triangulation.

Since Cohen and Hickey’s triangulation scheme is 
purely combinatorial - it uses only incidence informa- 
tion and not the vertex coordinates to obtain a triangu- 
lation - it raises no numerical problems. Storing faces 
as sets of their vertices facilitates the detection of sim- 
plicial faces, as a face in (supposed) dimension d’ is sim- 
plicial if and only if it has d’+ 1 vertices. Similarly, faces 
with supposed dimension d’ and less than d’+ 1 ver- 
tices can be dropped, as their dimension is in fact lower 
than d’. 
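The purely combinatorial recursion can be sketched in a few lines of Python. This is a simplified reading of the scheme described above, not Cohen and Hickey's code: faces are frozensets of vertex indices, the apex $v(b)$ is taken to be the smallest index, and coordinates are used only at the end to add up simplex volumes exactly. A lower-dimensional intersection that happens to have enough vertices may survive as a flat "simplex"; its determinant is zero, so the volume is unaffected. The names `triangulate` and `cohen_hickey_volume` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return result


def triangulate(face, dim, facets):
    """Recursion on a face (frozenset of vertex indices) of supposed dimension dim.

    facets are the facets of P as frozensets; only incidences are used.
    Returns simplices as frozensets of dim + 1 vertex indices.
    """
    if dim < 0 or len(face) < dim + 1:
        return []                           # too few vertices: dimension is lower
    if len(face) == dim + 1:
        return [face]                       # simplicial face
    apex = min(face)                        # the fixed vertex v(b)
    # Candidate facets of the face: intersections with facets of P that miss
    # the apex. Duplicates merge in the set; low-dimensional ones die out.
    candidates = {face & e for e in facets if apex not in e}
    result = []
    for sub in candidates:
        for simplex in triangulate(sub, dim - 1, facets):
            result.append(simplex | {apex})
    return result


def cohen_hickey_volume(vertices, facets):
    """Return (volume, number of simplices) of conv(vertices)."""
    d = len(vertices[0])
    facets = [frozenset(e) for e in facets]
    simplices = triangulate(frozenset(range(len(vertices))), d, facets)
    total = Fraction(0)
    for s in simplices:
        v0, *rest = [vertices[i] for i in sorted(s)]
        rows = [[Fraction(a) - Fraction(b) for a, b in zip(v, v0)] for v in rest]
        total += abs(det(rows)) / factorial(d)   # flat leftovers contribute 0
    return total, len(simplices)


# Unit cube: vertex i has coordinates given by the bits of i.
cube = [(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]
cube_facets = [[i for i in range(8) if ((i >> axis) & 1) == side]
               for axis in range(3) for side in (0, 1)]
print(cohen_hickey_volume(cube, cube_facets))  # (Fraction(1, 1), 6)
```

For the cube, the apex $v(P) = 0$ misses three square facets, each split into two triangles, which gives the six simplices of volume $1/6$.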


Lawrence’s Signed Decomposition 


Assume first that $P$ is simple. To apply the scheme
presented in the section about signed decompositions, one
must know how to compute the volume of the
simplex associated with a vertex $v$ of $P$. Lawrence [9]
derives the following formula. Let the additional
hyperplane be $\{x : c^\top x = 0\}$, where $x \mapsto c^\top x$ is nonconstant
on every edge of $P$. Denote by $A_v$ the $d \times d$ matrix whose
rows correspond to the facets containing $v$. (This matrix can
be derived if both the H- and the V-representation of $P$
are known.) Then $A_v$ is invertible, $\gamma^v = (A_v^\top)^{-1} c$ is well
defined up to permutations of its entries, and none of
its entries is zero. Finally,

$$\mathrm{Vol}(P) = \sum_{v} \frac{(c^\top v)^d}{d!\,|\det A_v| \prod_{i=1}^d \gamma_i^v}.$$


If $P$ is not simple, then a vertex $v$ is contained in more
than $d$ facets, and $A_v$ is no longer square. In this
case one perturbs the H-representation
lexicographically and runs a vertex enumeration algorithm. This
results in possibly several new simple vertices for each
vertex of the original polytope. Each new vertex
corresponds to a lexicographically feasible cobasis $A_v$ of
the original vertex and contributes a term to the sum
above. The same procedure must be applied if only the
H-representation of $P$ is known.
An unresolved problem is the choice of the
additional hyperplane, i.e., of $c$. For rational polytopes it
is possible to choose $c$ of size polynomial in the
H-representation, so that each term of the sum has
polynomial size. However, the nontrivial denominators
often result in prohibitively large numbers after a few
terms are summed up. On the other hand, using
floating-point arithmetic often leads to severe numerical
instabilities when the denominator is close to zero at
some vertices, so that numbers of drastically different
magnitudes are summed up.
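For a simple polytope, Lawrence's formula can be evaluated directly. The sketch below is a toy version under stated assumptions: $P$ is simple and given by an H-representation, vertices are found by brute force over all $d$-subsets of constraints (far slower than a real vertex enumeration code), $c$ is generic so that no $\gamma^v_i$ vanishes, and exact `Fraction` arithmetic is used, which sidesteps the instability discussed above at the price of growing denominators. The names `_gauss` and `lawrence_volume` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def _gauss(M, rhs):
    """Solve M y = rhs exactly by Gauss-Jordan elimination.

    Returns (y, det M), or (None, 0) if M is singular.
    """
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None, Fraction(0)
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        det *= aug[col][col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)], det


def lawrence_volume(A, b, c):
    """Volume of the simple polytope {x : A x <= b} by Lawrence's formula."""
    m, d = len(A), len(A[0])
    total = Fraction(0)
    for rows in combinations(range(m), d):
        A_v = [A[i] for i in rows]
        v, det_A_v = _gauss(A_v, [b[i] for i in rows])
        if v is None:
            continue                        # these d hyperplanes meet in no single point
        if any(sum(Fraction(a) * x for a, x in zip(A[i], v)) > b[i] for i in range(m)):
            continue                        # intersection point lies outside P
        A_v_T = [list(col) for col in zip(*A_v)]
        gamma, _ = _gauss(A_v_T, c)         # gamma^v = (A_v^T)^{-1} c
        denom = factorial(d) * abs(det_A_v)
        for g in gamma:
            denom *= g                      # nonzero when c is generic
        c_v = sum(Fraction(ci) * xi for ci, xi in zip(c, v))
        total += c_v ** d / denom
    return total


square_A = [[-1, 0], [0, -1], [1, 0], [0, 1]]   # 0 <= x, y <= 1
print(lawrence_volume(square_A, [0, 0, 1, 1], [1, 2]))  # 1
```

For the unit square with $c = (1, 2)$ the four vertex terms are $0$, $-1/4$, $-1$ and $9/4$: individual terms can be negative and larger than the volume, which is the cancellation that hurts floating-point evaluations.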


Lasserre’s Signed Decomposition 


Like Cohen and Hickey’s triangulation, the algorithm
of J.B. Lasserre [8] is based on a decomposition of
the polytope into pyramids and proceeds by recursion.
However, it uses only the H-representation. Let $a_i$ be
the $i$th row of $A$, $b_i$ the $i$th entry of $b$, and
$e_i = \{x \in P : a_i^\top x = b_i\}$. Suppose that whenever $e_i$ is parallel
to $e_j$ for some $i \ne j$, the normals $a_i$ and $a_j$ point in
opposite directions. (This is no restriction, since otherwise one
of the two inequalities is redundant, but it has to be tested in an
implementation.) Then


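$$\mathrm{Vol}_d(P) = \frac{1}{d} \sum_{i=1}^m \frac{b_i}{\|a_i\|} \mathrm{Vol}_{d-1}(e_i),$$

where $|b_i|/\|a_i\|$ is the distance of the origin from the
hyperplane $\{x : a_i^\top x = b_i\}$ containing $e_i$.
A negative sign occurs for negative $b_i$, i.e., when the
hyperplane corresponding to $e_i$ separates the origin from
the polytope. This formula does not yet allow a
recursive implementation, since the $(d-1)$-dimensional
volumes of facets embedded in $d$-dimensional space are
needed. Thus $e_i$ is projected onto a suitable subspace of
$d-1$ coordinates. Let $a_{ij} \ne 0$. Substituting
$x_j = (b_i - \sum_{k \ne j} a_{ik} x_k)/a_{ij}$ in the system of linear inequalities $Ax \le b$
yields a new system $A^{(i)} x^{(i)} \le b^{(i)}$ with $m-1$
inequalities in $d-1$ variables, which describes the projection of
$e_i$ onto the coordinate subspace $\{x : x_j = 0\}$. Since this
projection shrinks $(d-1)$-dimensional volumes by the factor
$|a_{ij}|/\|a_i\|$, the following formula, which can be
implemented recursively, is derived:

$$\mathrm{Vol}_d(P) = \frac{1}{d} \sum_{i=1}^m \frac{b_i}{|a_{ij}|} \mathrm{Vol}_{d-1}\bigl(\{x^{(i)} \in \mathbb{R}^{d-1} : A^{(i)} x^{(i)} \le b^{(i)}\}\bigr),$$

where for each $i$ the index $j$ is chosen with $a_{ij} \ne 0$.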


In the same way as Cohen and Hickey’s triangula- 
tion improves on boundary triangulation, the compu- 
tational effort for Lasserre’s algorithm can be reduced 
by translating each intermediate polytope into one of 
its vertices. In fact, the algorithm then implicitly con- 
structs Cohen and Hickey’s triangulation, with the de- 
terminant computation spread over the recursion tree. 
Since finding a vertex is a rather costly linear
programming step, one instead translates into a (possibly
infeasible) basic solution, which preserves the character of
a signed decomposition.

Noticing that the same face may be considered at 
several places in the recursion tree, the efficiency of the 
algorithm can be increased by storing the volume of 
a face when it is computed for the first time, and retriev- 
ing the volume when a face reappears. Some care must 
be taken, however, when a face is projected onto differ- 
ent coordinate subspaces; then a determinant compu- 
tation is required. For details, see [2]. 

Unlike in Cohen and Hickey’s triangulation algo- 
rithm, there is no cheap way of testing whether an in- 
termediate face is empty or of a too low dimension. 
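Lasserre's recursion translates almost literally into code. The following sketch uses exact rational arithmetic and assumes the polyhedron is bounded; it implements only the basic recursion, without the translation into basic solutions or the caching of face volumes described above. As a safeguard it rescales every row by a positive factor and merges duplicate constraints at each level, and it treats rows that become identically zero after substitution as either vacuous or infeasible. `lasserre_volume` is a hypothetical name.

```python
from fractions import Fraction


def lasserre_volume(A, b, d=None):
    """Volume of the bounded set {x in R^d : A x <= b} by Lasserre's recursion."""
    if d is None:
        d = len(A[0])
    rows = []
    for a_i, b_i in zip(A, b):
        a_i, b_i = [Fraction(a) for a in a_i], Fraction(b_i)
        nonzero = [abs(a) for a in a_i if a != 0]
        if not nonzero:                     # the row reads 0 <= b_i
            if b_i < 0:
                return Fraction(0)          # infeasible: empty set
            continue                        # vacuous: drop it
        s = nonzero[0]                      # positive rescaling makes duplicates equal
        rows.append((tuple(a / s for a in a_i), b_i / s))
    rows = list(dict.fromkeys(rows))        # merge duplicate constraints
    if not rows:
        raise ValueError("the set is unbounded")

    if d == 1:                              # an interval [lo, hi], possibly empty
        lows = [bi / ai[0] for ai, bi in rows if ai[0] < 0]
        highs = [bi / ai[0] for ai, bi in rows if ai[0] > 0]
        if not lows or not highs:
            raise ValueError("the set is unbounded")
        return max(Fraction(0), min(highs) - max(lows))

    total = Fraction(0)
    for i, (a_i, b_i) in enumerate(rows):
        j = next(k for k, a in enumerate(a_i) if a != 0)   # any index with a_ij != 0
        sub_A, sub_b = [], []
        for l, (a_l, b_l) in enumerate(rows):
            if l == i:
                continue
            # substitute x_j = (b_i - sum_{k != j} a_ik x_k) / a_ij into row l
            f = a_l[j] / a_i[j]
            sub_A.append([a_l[k] - f * a_i[k] for k in range(d) if k != j])
            sub_b.append(b_l - f * b_i)
        total += b_i / abs(a_i[j]) * lasserre_volume(sub_A, sub_b, d - 1)
    return total / d


# Unit square translated away from the origin: some b_i are negative.
print(lasserre_volume([[-1, 0], [0, -1], [1, 0], [0, 1]], [-1, -1, 2, 2]))  # 1
```

In the translated square, the two facets facing the origin contribute $-1$ each and the two far facets $+2$ each, which is the signed character of the decomposition in action.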


Hybrid Orthonormalization Technique 


In [2] an algorithm is described which combines the
advantages of Cohen and Hickey’s and Lasserre’s
methods. On the one hand, it exploits the information of both
the H- and the V-representation; on the other hand, it
stores and retrieves intermediate results.

Consider again the dissection into pyramids

$$\{\mathrm{conv}(v(P), e_i) : v(P) \notin e_i\}$$

of Cohen and Hickey’s triangulation algorithm, where
as before the $e_i$’s are given as the sets of their vertices.
The volume formula for pyramids yields

$$\mathrm{Vol}_d(P) = \sum_{v(P) \notin e_i} \frac{1}{d}\, \mathrm{dist}(v(P), \mathrm{aff}(e_i))\, \mathrm{Vol}_{d-1}(e_i),$$

where $\mathrm{dist}(v(P), \mathrm{aff}(e_i))$ denotes the distance of the
pyramid apex to the affine subspace corresponding to
the pyramid base. Suppose that $\mathrm{Vol}_{d-1}(e_i)$ and an
orthonormal basis of the linear subspace associated with
$e_i$ are known. Then the required distance, as well as an
orthonormal basis of the linear subspace corresponding
to the pyramid with base $e_i$, can be computed easily.
Since the volume and an orthonormal basis of a
one-dimensional polytope are trivial to determine, a
recursive approach analogous to Cohen and Hickey’s
algorithm can be adopted.

So far, there is no advantage over the triangula- 
tion scheme. (In fact, the overhead of basis computa- 
tion makes this algorithm slower in practice.) How- 
ever, the strategy of storing and reusing intermediate 
results, which has been successfully employed to accel- 
erate Lasserre’s algorithm, is applicable now. Volumes 
and orthonormal bases of already visited faces can be 
stored and retrieved as soon as the face is reconsidered. 
As the bases require an enormous amount of storage 
space, it is reasonable to store only the face volumes and 
to recompute orthonormal bases from scratch when 
necessary. 

The algorithm requires both the H- and the
V-representation. Unlike all other methods presented
above, it can only be implemented using floating point 
arithmetic and not rational arithmetic because or- 
thonormalization involves square roots. Care must be 
taken to choose a numerically stable orthonormaliza- 
tion technique, e.g. using Householder transforma- 
tions. 
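The core step of the hybrid technique, obtaining the apex distance and the enlarged orthonormal basis from the basis of the base, is one Gram-Schmidt step. The sketch below uses floating point and, for brevity, modified Gram-Schmidt rather than the Householder transformations recommended above; `extend_basis` and `pyramid_volume` are hypothetical helpers, and the full face recursion with volume caching is left out.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def extend_basis(apex, base_point, base_basis):
    """One Gram-Schmidt step.

    base_basis is an orthonormal basis of the linear subspace parallel to
    aff(base), and base_point is any point of the base. Returns the distance
    from apex to aff(base) and an orthonormal basis of the subspace parallel
    to aff(pyramid). Projections are subtracted one at a time from the running
    residual (modified Gram-Schmidt).
    """
    r = [float(a - p) for a, p in zip(apex, base_point)]
    for q in base_basis:
        proj = dot(r, q)
        r = [x - proj * y for x, y in zip(r, q)]
    dist = math.sqrt(dot(r, r))
    if dist == 0.0:
        raise ValueError("apex lies in aff(base): degenerate pyramid")
    return dist, base_basis + [[x / dist for x in r]]


def pyramid_volume(apex, base_point, base_basis, base_volume):
    """Vol_k(pyramid) = dist(apex, aff(base)) * Vol_{k-1}(base) / k."""
    dist, basis = extend_basis(apex, base_point, base_basis)
    k = len(basis)                          # dimension of the pyramid
    return dist * base_volume / k, basis


# Build up segment -> triangle -> tetrahedron, reusing each basis.
seg_basis = [[1.0, 0.0, 0.0]]               # segment (0,0,0)-(1,0,0), length 1
area, tri_basis = pyramid_volume((0, 1, 0), (0, 0, 0), seg_basis, 1.0)
vol, _ = pyramid_volume((0, 0, 1), (0, 0, 0), tri_basis, area)
print(area, vol)
```

Each call returns the basis needed one level up, so only volumes (and, if memory permits, bases) of visited faces would have to be cached in a full implementation.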


Choosing an Algorithm 


Experience with volume computation in higher dimen- 
sions is very limited; to date, only one study has been 
published [2]. Hence, while the following recommen- 
dations reflect the state of the art, they may soon be 
affected by algorithmic progress or new experimental 
results. 


Low Dimension 


In accordance with the theoretical complexity results, 
volume computation is a simple task in practice if one 
restricts oneself to low dimensions (up to about 5). An 
algorithm should be chosen which works with the rep- 
resentation in which the polytopes are given. 


Near-Simple and Near-Simplicial Polytopes 


It can be observed that in practice triangulation
methods behave particularly well on near-simplicial
V-polytopes, and that signed decomposition methods
behave particularly well on near-simple H-polytopes.
This is in accordance with the general duality result
above, but at first sight surprising when compared to
the complexity result stating that these problems are
polynomial when the polytopes are given by their
converse representations. However, the polynomial
complexity is obtained by solving a large number of
linear programs, which is apparently not competitive for
problems of a tractable size.

Lawrence’s method, which generates a signed
decomposition with especially few simplices for
near-simple H-polytopes, suffers from numerical
instabilities as outlined above, so that Lasserre’s algorithm is
preferable in general.


Double Representation 


If both representations are known, then the hybrid or- 
thonormalization technique proves to be the most effi- 
cient algorithm in practice. Although it is closer in spirit 
to a triangulation method, it is usually faster even than 
signed decomposition algorithms on near-simple poly- 
topes since it efficiently exploits the additional struc- 
tural information from the V-representation. 


Representation Conversion 


The cases left over as difficult are those of polytopes
given by the ‘wrong’ representation. An experimental
finding is that V-polytopes with a large ratio $n/m$ of
vertices to facets and H-polytopes with a small ratio
$n/m$ pose problems. Unfortunately, most algorithms for
converting between the representations have an ‘easy’
and a ‘hard’ direction, and exactly the hard direction
is needed here. It has been observed, however, that on
suitable classes of polytopes both directions have
essentially the same complexity; the result is obtained by using
the easy transformation as an oracle for the hard one
[1]. This technique will probably make it possible to apply
the hybrid orthonormalization technique efficiently to those
polytopes whose volumes are particularly hard to
compute today.


See also 


> Quadratic Programming Over an Ellipsoid 
Young man, in mathematics you don’t understand
things, you just get used to them.


John von Neumann 


A highly gifted mathematician and scientist, John
von Neumann (1903-1957) worked on set
theory, game theory, economic behavior, operator
algebras, quantum mechanics, computer science, neural
networks, and the theory of automata. He was also one of
the first five professors at the Institute for Advanced
Study (IAS), whose purpose was ‘the pursuit of
advanced learning and exploration in fields of pure
science and high scholarship’ [5].

Von Neumann was born in Budapest on December
28, 1903. He was a mathematical prodigy with
extraordinarily fast thinking and an effectively ‘photographic
memory’. In 1911 he was sent to study at the Lutheran
Gymnasium (Agostai Hitvallasu Evangelikus
Fogimnazium), one of three well-respected high schools in
Budapest [1]. During von Neumann’s eight years of
high school, his father arranged for a professional
mathematician to tutor him at home so that his
remarkable mathematical talent would be advanced.

At the age of 18, von Neumann had his first mathe- 
matical paper published, with M. Fekete, his tutor. This 
paper showed how to solve a problem on the location of 
zeroes of certain minimal polynomials. He was awarded 
‘excellence in mathematics and scientific reasoning’ in 
a nationwide high school competition [4]. 

In 1921, von Neumann enrolled in both the Univ. 
of Budapest and the Univ. of Berlin to study mathe- 
matics. In 1925, he was awarded the Bachelor degree 
in chemical engineering from the Eidgenossische Tech- 
nische Hochschule (ETH) or Swiss Federal Institute of 
Technology. A year later, he was awarded the Ph.D. in 
mathematics at the Univ. of Budapest. 

During the 1920s, while von Neumann was in
Europe, he focused his work on two main areas: ‘set theory
and logical foundations of mathematics’, and ‘Hilbert
space theory, operator theory, and the mathematical
foundations of quantum mechanics’ [7]. He gained
a reputation in particular for his work on set
theory and on quantum mechanics, especially the theory of
measurement [1].

In 1930, von Neumann was invited to work at
Princeton Univ., which was recognized as a global
center for mathematicians during the early 1930s [3]. A few
years later, he was appointed to a professorship at the IAS. Like
other mathematicians of that time, von Neumann was
fascinated by Hilbert’s research. One of his important
works on Hilbert space was the theory of rings of
operators (W*-algebras or von Neumann algebras), which
are algebras of bounded operators on a separable Hilbert
space [8]. He applied ‘modern algebra to algebras of
operators in a Hilbert space’. Later on, working with F.J.
Murray, von Neumann examined the continuous
geometry associated with the ring of operators [2,9,10].

Von Neumann was also interested in the theory 
of games [12]. His great achievement in this area was 
the book [14], written with the Austrian economist O. 
Morgenstern. The book contains the mathematics of 
game theory and its application to a variety of economic 
problems. 

During World War II, the need for advanced
computing technology increased within various military
research programs. Many scientists, physicists, and
mathematicians who worked in those programs faced
problems that could not be solved analytically. As a
result, experiments or numerical methods were used
to determine solutions to these difficult problems. Like other
mathematicians, von Neumann also participated in
those research programs as a consultant. He was
involved in aerodynamics, high explosives, the atomic bomb,
electronics, the development of high-speed calculating
machines, etc. Von Neumann wanted a powerful
calculating system that could solve nonlinear partial
differential equations in more than one independent
variable.

At the Univ. of Pennsylvania’s Moore School of
Electrical Engineering, the ENIAC (Electronic
Numerical Integrator and Computer), a programmable
electronic calculator, was designed and built. The
ENIAC, regarded as the first modern computer, was
able to work ‘at electronic speed’. As soon as von
Neumann heard about this machine, he rushed to
see the ENIAC even though it had not yet been completed.
During his visits to the Moore School, he discussed
with the staff the design of the EDVAC
(Electronic Discrete Variable Automatic Computer),
a stored-program computer, and decided to
participate in developing this machine. While working on the
EDVAC project, von Neumann wrote [6] in 1945 to
describe the stored-program computer, particularly
its logical control [1,2]. His report was well received
throughout the United States and Britain.

After the war, von Neumann directed his attention
to the Electronic Computer Project because he expected
the need for computers in scientific research to increase.
One of his concerns about computing systems was their
speed. In [13], he and his co-author defined factors that
affect overall speed [1,2]. With A. Burks and H.
Goldstine, von Neumann also wrote the report ‘Preliminary
Discussion of the Logical Design of an Electronic
Computing Instrument’, on the logical design now known as the von
Neumann architecture [1,2].

As the power of computers increased, numerical
analysis, which had been in decline during the 1930s, was
revived. Although the computer gave mathematical
scientists an opportunity to study larger and
more complex systems of linear equations, partial
differential equations, etc., existing iterative methods were
ineffective for solving such problems on a computer. For this
reason, von Neumann began examining more efficient
and more reliable algorithms for the computer [11].
The Monte Carlo method and the duality theorem
of linear programming are the two most distinguished
results he contributed to computer-oriented
numerical analysis.

As von Neumann predicted, the computer has become
a necessary tool for much scientific and engineering
research. He used computers to solve problems in fluid
dynamics, meteorology, atomic and nuclear physics, partial differential
equations, numerical analysis, linear programming, etc.
[11,12]. Von Neumann’s interest in computers never seemed to
lessen. The theory of natural and artificial
automata [11] is also one of his computing research
accomplishments during the last years of his life. It
examines general solutions to the problems of organization,
structure, language, information, and control [1,2].

Von Neumann’s problem-solving ability and
wide-ranging interests enabled him to make a large
number of contributions to various fields. Moreover, his
fast thinking and effective memory allowed him to
reduce the complexity of many problems. He never
stopped thinking about mathematics. Thus it is not
surprising that von Neumann quickly became a leading
figure in both pure and applied mathematics.


See also 


> Duality Theory: Biduality in Nonconvex 
Optimization 

> Duality Theory: Monoduality in Convex 
Optimization 
Facility Location Problems


Let $T$ be a convex polygon in the plane $\mathbb{R}^2$, and let
$S = \{P_1, \dots, P_n\}$ be a set of $n$ points in $T$. Let us denote by $d(P, Q)$
the Euclidean distance between $P$ and $Q$. The following
are typical location problems.


Problem 1 Find a point $P = P^*$ that attains

$$\max_{P \in T} \min_{P_i \in S} d(P, P_i). \tag{1}$$

Problem 2 Find a point $P = P^{**}$ that attains

$$\min_{P \in \mathbb{R}^2} \max_{P_i \in S} d(P, P_i). \tag{2}$$


Problem 1 is called the largest empty-circle problem. 
This is because the solution P* of Problem 1 gives the 
center of the largest circle that does not contain any 
point of S in the interior while the center is in T; the 
value of (1) is the radius of the largest empty circle. 
Problem 2, on the other hand, is called the smallest
enclosing-circle problem, because the solution $P^{**}$ of
Problem 2 gives the center of the smallest circle
containing all the points in $S$; the value of (2) is the radius
of the smallest enclosing circle.

These problems can be considered as facility
location problems in the following sense (cf. also
Multifacility and restricted location problems). Suppose that
$T$ represents the shape of a city, and that $P_1, \dots, P_n$
represent the locations of hospitals in the city. We assume
that citizens go to the nearest hospital when they need
medical care. Then Problem 1 is to find the inhabitant
who is farthest from the nearest hospital. Hence, if
the city has a budget to build another hospital, the
solution $P^*$ of Problem 1 gives the optimal location for
it, in the sense that the least conveniently located person
benefits the most from the new hospital.

Next, suppose that the city government wants to
build a blood-supply center that keeps all types of
blood and delivers them to the hospitals when needed.
Then the solution $P^{**}$ of Problem 2 gives the optimal
location for it, in the sense that the longest distance
to a hospital is minimized.
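Problem 2 can be solved by brute force for small inputs, since the smallest enclosing circle is determined either by two points on a diameter or by three points on its boundary. The sketch below is an $O(n^4)$ illustration with a floating-point tolerance `eps` (an assumed parameter), not one of the efficient algorithms from the literature; the helper names are hypothetical.

```python
import math
from itertools import combinations


def _circle_from_two(p, q):
    center = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return center, math.dist(p, center)


def _circle_from_three(p, q, r):
    """Circumcircle of three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist(p, (ux, uy))


def smallest_enclosing_circle(points, eps=1e-9):
    """Return (P**, radius) by checking every candidate circle."""
    candidates = [(tuple(p), 0.0) for p in points]          # covers n = 1
    candidates += [_circle_from_two(p, q) for p, q in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = _circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for center, radius in candidates:
        if all(math.dist(center, p) <= radius + eps for p in points):
            if best is None or radius < best[1]:
                best = (center, radius)
    return best


print(smallest_enclosing_circle([(0, 0), (2, 0), (1, 0.5)]))  # ((1.0, 0.0), 1.0)
```

In the example the circle on the diameter from $(0,0)$ to $(2,0)$ already contains the third point, so it beats the larger circumcircle of all three.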


Voronoi Diagrams 


For a given set $S = \{P_1, \ldots, P_n\}$ of n points, we define the region $R(S; P_i)$ by

$$R(S; P_i) = \{ P \in \mathbb{R}^2 : d(P, P_i) < d(P, P_j) \text{ for any } j \neq i \}. \qquad (3)$$

$R(S; P_i)$ consists of points P such that, among S, $P_i$ is the nearest point from P. $R(S; P_i)$ is called the Voronoi region of $P_i$.
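A minimal Python sketch of definition (3), assuming Euclidean distance and points given as coordinate pairs: the hypothetical helper `voronoi_region` returns the index i of the Voronoi region containing a query point by brute-force comparison with all generators (O(n) per query; no diagram is constructed). Points on a region boundary, which (3) assigns to no region because of the strict inequality, are attributed here to the lowest-indexed nearest generator.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def voronoi_region(p: Point, generators: Sequence[Point]) -> int:
    """Index i such that p lies in R(S; P_i), i.e. P_i is the nearest generator.

    Brute force over all generators; on a tie (p on a Voronoi edge or
    Voronoi point) the smallest index is returned.
    """
    best_i, best_d = -1, math.inf
    for i, g in enumerate(generators):
        d = math.dist(p, g)  # Euclidean distance d(P, P_i)
        if d < best_d:
            best_i, best_d = i, d
    return best_i


if __name__ == "__main__":
    S = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    print(voronoi_region((1.0, 1.0), S))  # 0
    print(voronoi_region((3.5, 0.5), S))  # 1
```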

The plane $\mathbb{R}^2$ is partitioned into the Voronoi regions $R(S; P_1), \ldots, R(S; P_n)$ and their boundaries. This partition is called the Voronoi diagram for S. Elements of S are called the generators of the Voronoi diagram. Figure 1 shows an example of a Voronoi diagram, where small dots represent elements of S and solid lines represent the Voronoi diagram.
Voronoi Diagrams in Facility Location, Figure 1
Voronoi diagram and empty circles


A line, a half line or a line segment shared by the 
boundaries of two Voronoi regions is called a Voronoi 
edge, and a point shared by the boundaries of three or 
more Voronoi regions is called a Voronoi point. 

The following properties are direct consequences of the definition [1,7].


Lemma 3 A Voronoi edge is on the perpendicular bisector of the associated two generators.


Lemma 4 A Voronoi point is the center of the circle that passes through the associated three or more generators, and there is no generator in the interior of this circle.


An example of such a circle is shown by circle $c_1$ in Fig. 1.

The Voronoi diagram can be constructed by many efficient algorithms. Among them, the divide-and-conquer algorithm [10] and the plane-sweep algorithm [3] require O(n log n) time, and this time complexity is worst-case optimal. The incremental algorithm requires O(n) time on average for a wide class of distributions of the generators [6]. Numerically robust algorithms have also been developed [7,11].

Now let us return to Problem 1. Suppose that we 
start with a circle with radius 0 centered at an arbitrary 
point in T, and try to make the circle as large as possi- 
ble by changing the radius and the center continuously 
provided that the center is in T and the circle contains 
no element of S in the interior. The situations in which 
we cannot make the circle larger can be classified into 
three types. 

The first type is that the circle hits three points in S, as circle $c_1$ in Fig. 1. The second type is that the circle hits two points and the center is on the boundary of T, as circle $c_2$ in Fig. 1. The third type is that the circle hits one point and the center is at a corner of T, as circle $c_3$ in Fig. 1.

Thus, the solution P* of Problem 1 is either: 

i) a Voronoi point; 

ii) a point of intersection between a Voronoi edge and 
the boundary of T; or 

iii) a vertex of the polygon T. 

Hence, we can solve Problem 1 by first constructing the Voronoi diagram and then checking all the candidate points. Since the numbers of Voronoi points and Voronoi edges are both O(n), Problem 1 can be solved in O(n log n) time if the number of vertices of the polygon T is O(n).
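The candidate-point argument can be made concrete with a brute-force Python sketch. Instead of constructing the Voronoi diagram, it enumerates a superset of the candidates i) to iii): circumcenters of all triples of generators, intersections of all perpendicular bisectors with the edges of T, and the vertices of T. It then keeps the candidate in T that maximizes $\min_j d(P, P_j)$. This costs O(n^4) time instead of O(n log n) and is meant only as an illustration; it assumes Euclidean distance, T given by its vertices in counterclockwise order, and a tolerance `EPS` chosen arbitrarily. All function names are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
EPS = 1e-9


def inside_convex(p: Point, poly: Sequence[Point]) -> bool:
    """True if p lies in the convex polygon poly (vertices counterclockwise)."""
    m = len(poly)
    for k in range(m):
        (ax, ay), (bx, by) = poly[k], poly[(k + 1) % m]
        # negative cross product: p is strictly to the right of edge a -> b
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) < -EPS:
            return False
    return True


def circumcenter(a: Point, b: Point, c: Point) -> Optional[Point]:
    """Center of the circle through a, b, c (None if they are collinear)."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < EPS:
        return None
    sa, sb, sc = a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy)


def bisector_edge_points(pi: Point, pj: Point, a: Point, b: Point) -> List[Point]:
    """Intersection of the perpendicular bisector of pi, pj with segment ab."""
    wx, wy = pj[0] - pi[0], pj[1] - pi[1]
    c = (pj[0] ** 2 + pj[1] ** 2 - pi[0] ** 2 - pi[1] ** 2) / 2  # bisector: X.w = c
    denom = (b[0] - a[0]) * wx + (b[1] - a[1]) * wy
    if abs(denom) < EPS:
        return []  # edge parallel to the bisector
    t = (c - (a[0] * wx + a[1] * wy)) / denom
    if -EPS <= t <= 1 + EPS:
        return [(a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))]
    return []


def largest_empty_circle(S: Sequence[Point], T: Sequence[Point]) -> Tuple[Point, float]:
    """Brute-force Problem 1: returns (P*, radius of the largest empty circle)."""
    cands: List[Point] = list(T)                       # type iii): vertices of T
    m = len(T)
    for pi, pj in itertools.combinations(S, 2):        # type ii)
        for k in range(m):
            cands += bisector_edge_points(pi, pj, T[k], T[(k + 1) % m])
    for a, b, c in itertools.combinations(S, 3):       # type i)
        cc = circumcenter(a, b, c)
        if cc is not None:
            cands.append(cc)
    best, best_r = None, -1.0
    for p in cands:
        if inside_convex(p, T):
            r = min(math.dist(p, q) for q in S)        # min_j d(P, P_j)
            if r > best_r:
                best, best_r = p, r
    return best, best_r
```

For example, with generators at (±2, 0) and (0, ±2) and T the square with corners (±1, ±1), the sketch returns the origin (a Voronoi point, type i) with radius 2.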
Farthest-Point Voronoi Diagram


Reversing the inequality in (3), we define another region $R_f(S; P_i)$ by

$$R_f(S; P_i) = \{ P \in \mathbb{R}^2 : d(P, P_i) > d(P, P_j) \text{ for any } j \neq i \}. \qquad (4)$$

$R_f(S; P_i)$ consists of points P such that, among S, $P_i$ is the farthest point from P. The plane is partitioned into $R_f(S; P_1), \ldots, R_f(S; P_n)$ and their boundaries. This partition is called the farthest-point Voronoi diagram for S. The farthest-point Voronoi diagram for the same set of points as in Fig. 1 is shown in Fig. 2.

The next property is a direct consequence of the def- 
inition [7]. 


Lemma 5 A Voronoi point of the farthest-point 
Voronoi diagram is the center of the circle that passes 
through the associated three generators, and this circle 
contains all the elements of S. 


An example of such a circle is shown by circle $c_1$ in Fig. 2.

Suppose that we choose an arbitrary circle containing S, and shrink it by changing both the radius and the center continuously without violating the condition that the circle contains all the elements of S. The locally minimal circle thus obtained can be classified into two types. One type is a circle hitting three or more points in S, as is shown by the circle $c_1$ in Fig. 2, and the other is a circle hitting two points that form a diameter of the circle.

Voronoi Diagrams in Facility Location, Figure 2
Farthest-point Voronoi diagram and the candidates of the smallest enclosing circle
Therefore, the solution P** of Problem 2 is either:
i) a Voronoi point of the farthest-point Voronoi diagram for S; or
ii) the midpoint of two vertices on the boundary of the convex hull of S, where the convex hull of S is defined as the smallest convex region containing S.
Both the convex hull and the farthest-point Voronoi diagram can be constructed in O(n log n) time [2], and consequently Problem 2 can be solved in O(n log n) time.
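The two candidate types for P** can likewise be checked by brute force. The Python sketch below, assuming Euclidean distance, skips the farthest-point Voronoi diagram and the convex hull and simply tries the circumcenters of all triples (a superset of the farthest-point Voronoi points) and the midpoints of all pairs (a superset of type ii); for each candidate center the radius needed to enclose S is $\max_j d(P, P_j)$, and the smallest such radius is kept. It runs in O(n^4) time and is only an illustration; the names are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
EPS = 1e-12


def circumcenter(a: Point, b: Point, c: Point) -> Optional[Point]:
    """Center of the circle through a, b, c (None if they are collinear)."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < EPS:
        return None
    sa, sb, sc = a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy)


def smallest_enclosing_circle(S: Sequence[Point]) -> Tuple[Point, float]:
    """Brute-force Problem 2: returns (P**, radius of the smallest enclosing circle)."""
    if len(S) == 1:
        return S[0], 0.0
    # type ii): midpoints of pairs
    cands: List[Point] = [((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
                          for a, b in itertools.combinations(S, 2)]
    # type i): circumcenters of triples
    for a, b, c in itertools.combinations(S, 3):
        cc = circumcenter(a, b, c)
        if cc is not None:
            cands.append(cc)
    best, best_r = None, math.inf
    for p in cands:
        r = max(math.dist(p, q) for q in S)  # radius needed to enclose S from p
        if r < best_r:
            best, best_r = p, r
    return best, best_r
```

Because every candidate is evaluated with the radius that makes its circle enclose S, no separate containment test is needed; the minimum over the candidate set is optimal because P** is among them.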


Variations in the Distance 


We have considered the facility location in the frame- 
work of the Euclidean distance. Sometimes, however, 
other distances are more realistic. For many variants of 
the distance, the above discussion can be applied with 
slight changes. 

Let $(x_i, y_i)$ be the coordinates of $P_i$, and (x, y) the coordinates of P. Typical variants of the distance are the following:

• $L_1$-distance (also called the Manhattan distance)

$$d(P, P_i) = |x - x_i| + |y - y_i|; \qquad (5)$$

• $L_\infty$-distance

$$d(P, P_i) = \max\{|x - x_i|, |y - y_i|\}; \qquad (6)$$

• $L_p$-distance

$$d(P, P_i) = \left( |x - x_i|^p + |y - y_i|^p \right)^{1/p}; \qquad (7)$$

• elliptic distance

$$d(P, P_i) = \sqrt{a(x - x_i)^2 + 2b(x - x_i)(y - y_i) + c(y - y_i)^2}, \qquad (8)$$

where $a > 0$, $ac - b^2 > 0$.
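The distance variants (5) to (8) translate directly into code. The following Python sketch uses hypothetical function names and represents points as coordinate pairs; the positive-definiteness check in `d_elliptic` mirrors the conditions $a > 0$, $ac - b^2 > 0$ stated above.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def d_l1(u: Point, v: Point) -> float:
    """L1 (Manhattan) distance, Eq. (5)."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def d_linf(u: Point, v: Point) -> float:
    """L-infinity distance, Eq. (6)."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


def d_lp(u: Point, v: Point, p: float) -> float:
    """L_p distance, Eq. (7); p >= 1 gives a genuine metric."""
    return (abs(u[0] - v[0]) ** p + abs(u[1] - v[1]) ** p) ** (1.0 / p)


def d_elliptic(u: Point, v: Point, a: float, b: float, c: float) -> float:
    """Elliptic distance, Eq. (8); requires a > 0 and ac - b^2 > 0."""
    if not (a > 0 and a * c - b * b > 0):
        raise ValueError("quadratic form must be positive definite")
    dx, dy = u[0] - v[0], u[1] - v[1]
    return math.sqrt(a * dx * dx + 2 * b * dx * dy + c * dy * dy)


if __name__ == "__main__":
    P, Q = (0.0, 0.0), (3.0, 4.0)
    print(d_l1(P, Q), d_linf(P, Q), d_lp(P, Q, 2), d_elliptic(P, Q, 1, 0, 1))
    # 7.0 4.0 5.0 5.0
```

With p = 2 the $L_p$-distance reduces to the Euclidean one, and for large p it approaches the $L_\infty$-distance.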


The $L_1$-distance is a good approximation of the actual cost to move along the avenues in the North-South direction and along the streets in the East-West direction, just as we do in Manhattan in New York City or in the central area of Kyoto City. In this distance, the 'circle' is a square whose sides are slanted in the 45° direction with respect to the x and y axes.
Voronoi Diagrams in Facility Location, Figure 3
Elliptic-distance Voronoi diagram and the largest empty ellipse


The $L_\infty$-distance can be observed in a mechanical x-y plotter. In the x-y plotter, a pen is moved by two step motors, one for the x direction and the other for the y direction; these two motors are controlled independently when the pen is moved in the pen-up mode. Hence, the time required to move the pen from one position to another is proportional to the $L_\infty$-distance.
A similar distance can also be found in the parts-supply 
system on the ceiling of a production factory. 

The $L_p$-distance represents a general distance, including the Euclidean distance for p = 2, the Manhattan distance for p = 1 and the $L_\infty$-distance as the limit for $p \to \infty$.

For each of these distances, Problems 1 and 2 are 
defined. Also the Voronoi diagram and the farthest- 
point Voronoi diagram can be defined similarly [1,7], 
and hence can be used to solve Problems 1 and 2. 

It might be difficult to find an example of the elliptic distance in the real world, but this distance is also useful for some location problems. An example is
shown in Fig. 3. Suppose that, as shown in (a), we are 
given a map, and we want to insert the name of a place 
in this map in such a way that the name should be as 
large as possible while it should not intersect the exist- 
ing symbols or lines. To solve this problem, we first find 
an ellipse enclosing the name text as shown in (a), next 
construct the Voronoi diagram with respect to the ellip- 
tic distance as shown in (b), and finally solve the largest 
empty-ellipse problem using this Voronoi diagram. 

Note that the elliptic-distance Voronoi diagram can be constructed easily: we first transform the plane by the affine transformation that maps the given ellipse to a circle, next construct the Euclidean-distance Voronoi diagram, and finally map it back to the original plane by the inverse transformation.
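One way to make the affine transformation explicit, assuming the elliptic distance (8) with matrix $M = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$, is the Cholesky factorization $M = L L^T$: the map $\varphi(P) = L^T P$ satisfies $|\varphi(P) - \varphi(Q)| = d(P, Q)$, so nearest-generator relations, and hence Voronoi regions, are preserved. The Python sketch below (hypothetical names) builds $\varphi$ and its inverse and checks that the nearest generator is the same in both planes; constructing the Euclidean Voronoi diagram itself is left to any standard routine.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def d_elliptic(u: Point, v: Point, a: float, b: float, c: float) -> float:
    """Elliptic distance (8)."""
    dx, dy = u[0] - v[0], u[1] - v[1]
    return math.sqrt(a * dx * dx + 2 * b * dx * dy + c * dy * dy)


def to_euclidean(a: float, b: float, c: float) -> Tuple[Callable[[Point], Point],
                                                        Callable[[Point], Point]]:
    """Affine map phi = L^T (M = L L^T) and its inverse; |phi(P) - phi(Q)| = d(P, Q)."""
    if not (a > 0 and a * c - b * b > 0):
        raise ValueError("need a > 0 and ac - b^2 > 0")
    s = math.sqrt(a)          # L[0][0]
    r = b / s                 # L[1][0]
    t = math.sqrt(c - r * r)  # L[1][1], positive because ac - b^2 > 0

    def phi(p: Point) -> Point:
        return (s * p[0] + r * p[1], t * p[1])

    def phi_inv(q: Point) -> Point:
        y = q[1] / t
        return ((q[0] - r * y) / s, y)

    return phi, phi_inv


def nearest(p: Point, gens: Sequence[Point], dist: Callable[[Point, Point], float]) -> int:
    """Index of the generator nearest to p under the given distance."""
    return min(range(len(gens)), key=lambda i: dist(p, gens[i]))


if __name__ == "__main__":
    a, b, c = 2.0, 0.5, 1.0
    phi, phi_inv = to_euclidean(a, b, c)
    gens = [(0.0, 0.0), (1.0, 1.0), (2.0, -1.0)]
    query = (0.9, 0.2)
    i_ell = nearest(query, gens, lambda u, v: d_elliptic(u, v, a, b, c))
    i_euc = nearest(phi(query), [phi(g) for g in gens], math.dist)
    print(i_ell == i_euc)  # True
```

Since $\varphi$ is affine, the straight Voronoi edges of the transformed plane map back to straight edges in the original plane.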


Multiple-Facility Location 


So far we have considered the optimal location of a single new facility. Another type of location problem is to find a given number, say n, of points that together attain a certain optimality. This type of problem arises when two or more facilities can be constructed simultaneously.

A typical problem of this type is the following.


Problem 6 For a given convex region T and a given number n, find the locations of n points $P_1 = P_1^*, \ldots, P_n = P_n^*$ that attain

$$\min_{P_1, \ldots, P_n \in T} \; \max_{P \in T} \; \min_{i \in \{1, \ldots, n\}} d(P, P_i). \qquad (9)$$

This problem may arise in the situation that the city T has no hospital, and the city government wants to construct n hospitals (simultaneously or one by one) in such a way that, when all the n hospitals are built, the maximum distance from an inhabitant to the nearest hospital is minimized.

A more general situation is that the city already has 
k hospitals and the government wants to construct n 
more hospitals. Thus, we get the next problem. 


Problem 7 For a given convex region T, a number n, and k points $P_{n+1}, \ldots, P_{n+k} \in T$, find the locations of n points $P_1 = P_1^*, \ldots, P_n = P_n^*$ that attain

$$\min_{P_1, \ldots, P_n \in T} \; \max_{P \in T} \; \min_{i \in \{1, \ldots, n+k\}} d(P, P_i). \qquad (10)$$
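For fixed facility locations, the inner part of (9) and (10), $\max_{P \in T} \min_i d(P, P_i)$, is the value of a largest empty-circle problem. The Python sketch below only approximates it by sampling T on a grid, assuming T is the unit square and d is Euclidean; the existing facilities of Problem 7 simply join the inner minimum. Because only sample points are checked, the result is a lower bound on the exact value that a Voronoi-based computation as in Problem 1 would give. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def unit_square_grid(m: int) -> List[Point]:
    """(m + 1) x (m + 1) sample points on T = [0, 1]^2, corners included."""
    return [(i / m, j / m) for i in range(m + 1) for j in range(m + 1)]


def sampled_objective(new: Sequence[Point], existing: Sequence[Point],
                      samples: Sequence[Point]) -> float:
    """Sampled value of max_{P in T} min_i d(P, P_i) in (9) and (10)."""
    facilities = list(new) + list(existing)  # existing is empty for Problem 6
    return max(min(math.dist(s, f) for f in facilities) for s in samples)


if __name__ == "__main__":
    grid = unit_square_grid(20)
    print(sampled_objective([(0.5, 0.5)], [], grid))            # about 0.7071, at the corners
    print(sampled_objective([(1.0, 1.0)], [(0.0, 0.0)], grid))  # 1.0
```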

Still another situation is the following. 

Let $\mu(P)$ denote the population density at P in the city T, and let $d(P, P_i)$ be the cost for an inhabitant at P to go to the hospital at $P_i$. Assume that all the citizens need medical care with equal probability, and the government wants to build n hospitals in such a way that the total cost for the citizens to go to the hospitals is minimized. Thus we get the next problem.


Problem 8 For given T, $\mu$, n, find the locations of n points $P_1 = P_1^*, \ldots, P_n = P_n^*$ that attain

$$\min_{P_1, \ldots, P_n \in T} \int_T \mu(P) \min_{i \in \{1, \ldots, n\}} d(P, P_i) \, dT. \qquad (11)$$
Note that Expressions (9), (10), (11) have the same form:

$$\min_{P_1, \ldots, P_n \in T} F(P_1, \ldots, P_n). \qquad (12)$$

However, the objective function $F(P_1, \ldots, P_n)$ is nonconvex, and the three problems are in general difficult to solve exactly. Nevertheless, we can find an approximate solution using Voronoi diagrams in the following way.

As we have seen, $F(P_1, \ldots, P_n)$ in Problems 6 and 7 can be obtained by solving the largest empty-circle problem, and hence the Voronoi diagram associated with the distance d can be used. Moreover, (11) can be rewritten as

$$\min_{P_1, \ldots, P_n \in T} \sum_{i=1}^{n} \int_{R(S; P_i) \cap T} \mu(P) \, d(P, P_i) \, dT, \qquad (13)$$

where $S = \{P_1, \ldots, P_n\}$ and $R(S; P_i)$ denotes the Voronoi region of $P_i$ in the associated Voronoi diagram. Hence, the objective function $F(P_1, \ldots, P_n)$ in Problem 8 can also be computed via the Voronoi diagram.

Thus, we have the following iterative strategy to solve Problems 6, 7 and 8 approximately. First, we choose $P_1, \ldots, P_n$ in T arbitrarily. Next we construct the Voronoi diagram for $\{P_1, \ldots, P_n\}$ with respect to the given distance d, and compute the value of $F(P_1, \ldots, P_n)$ together with $\partial F/\partial x_i$, $\partial F/\partial y_i$ (i = 1, ..., n). Then, we move $P_1, \ldots, P_n$ in the direction that decreases the objective function. We repeat this until $F(P_1, \ldots, P_n)$ converges.
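A minimal sketch of this strategy for Problem 8, under simplifying assumptions that are not from the source: T is the unit square, $\mu \equiv 1$, d is Euclidean, and the integrals in (13) are replaced by sums over grid-cell centers, each assigned to its nearest facility (a discretized Voronoi diagram). For this discretized objective, $\partial F/\partial x_i = \sum_{s \in R_i} w \, (x_i - s_x)/d(s, P_i)$ and similarly for $y_i$, so a plain fixed-step gradient descent with clipping to T can be used. This is not the specific method of [4,8,12]; the grid size, step size and iteration count are arbitrary illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def grid_samples(m: int) -> List[Point]:
    """Cell centers of an m x m grid on the unit square T."""
    h = 1.0 / m
    return [((i + 0.5) * h, (j + 0.5) * h) for i in range(m) for j in range(m)]


def cost_and_grad(P: Sequence[Point], samples: Sequence[Point], w: float):
    """Discretized objective (13) and its partial derivatives dF/dx_i, dF/dy_i."""
    F = 0.0
    grad = [[0.0, 0.0] for _ in P]
    for s in samples:
        # discrete Voronoi region: the sample belongs to its nearest facility
        i = min(range(len(P)), key=lambda k: math.dist(s, P[k]))
        d = math.dist(s, P[i])
        F += w * d
        if d > 0.0:  # d(s, P_i) is not differentiable at s = P_i
            grad[i][0] += w * (P[i][0] - s[0]) / d
            grad[i][1] += w * (P[i][1] - s[1]) / d
    return F, grad


def descend(P: Sequence[Point], m: int = 30, step: float = 0.05, iters: int = 100):
    """Fixed-step gradient descent on the discretized F; returns (P, history of F)."""
    samples = grid_samples(m)
    w = 1.0 / len(samples)  # mu = 1 and each cell has area 1/m^2
    P = list(P)
    history = []
    for _ in range(iters):
        F, g = cost_and_grad(P, samples, w)
        history.append(F)
        # move every P_i against its gradient and clip back into T
        P = [(min(1.0, max(0.0, x - step * gx)), min(1.0, max(0.0, y - step * gy)))
             for (x, y), (gx, gy) in zip(P, g)]
    return P, history


if __name__ == "__main__":
    P, hist = descend([(0.05, 0.05), (0.1, 0.05), (0.05, 0.1)])
    print(f"F: {hist[0]:.4f} -> {hist[-1]:.4f}", P)
```

Starting from three facilities crowded into one corner, the sampled objective drops as the points spread over T; a line search or a decreasing step size would be natural refinements.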

The detailed descriptions of the strategies and their experimental evaluations for individual types of the problems can be found in [4,8,12]. For the fast automatic differentiation method to compute $\partial F/\partial x_i$ and $\partial F/\partial y_i$, refer to [5].


Concluding Remarks 


We have seen typical problems in facility location for which the concept of the Voronoi diagram and its generalizations are useful. There are many other variants of facility location problems. For details, see also the surveys [7,9].


See also 


> Combinatorial Optimization Algorithms in 
Resource Allocation Problems 

> Competitive Facility Location 

> Facility Location with Externalities 

> Facility Location Problems with Spatial Interaction 

> Facility Location with Staircase Costs 

> Global Optimization in Weber’s Problem with 
Attraction and Repulsion 

> MINLP: Application in Facility Location-allocation 

> Multifacility and Restricted Location Problems 

> Network Location: Covering Problems 

> Optimizing Facility Location with Rectilinear 
Distances 

> Production-distribution System Design Problem 

> Resource Allocation for Epidemic Control 

> Single Facility Location: Circle Covering Problem 

> Single Facility Location: Multi-objective Euclidean 
Distance Location 

> Single Facility Location: Multi-objective Rectilinear 
Distance Location 

> Stochastic Transportation and Location Problems 

> Warehouse Location Problem 


References 


1. Aurenhammer F (1991) Voronoi diagram: A survey of a fun- 
damental geometric data structure. ACM Computing Sur- 
veys 23:345-405 

2. Edelsbrunner H (1987) Algorithms in combinatorial geom- 
etry. Springer, Berlin 

3. Fortune S (1987) A sweepline algorithm for Voronoi dia- 
grams. Algorithmica 2:153-174 

4. Iri M, Murota K, Ohya T (1984) A fast Voronoi diagram algorithm with applications to geographical optimization problems. In: Thoft-Christensen P (ed) Proc. IFIP Conf. System Modelling and Optimization (1983, Copenhagen), Lecture Notes Control Inform Sci. Springer, Berlin, 273-288

5. Kubota K, Iri M (1991) Estimates of rounding errors with fast automatic differentiation and interval analysis. J Inform Process 14:508-515

6. Ohya T, Iri M, Murota K (1984) Improvements of the in- 
cremental method for the Voronoi diagram with compu- 
tational comparison of various algorithms. J Oper Res Soc 
Japan 27:306-336 

7. Okabe A, Boots B, Sugihara K, Chiu SN (2000) Spatial tessellations: Concepts and applications of Voronoi diagrams. Wiley, New York

8. Okabe A, Suzuki A (1987) Stability of spatial competition 
for a large number of firms on a bounded two-dimensional 
space. Environm Plan A 19:1067-1082 




9. Okabe A, Suzuki A (1997) Locational optimization problems solved through Voronoi diagrams. Europ J Oper Res 98:445-456

10. Preparata FP, Shamos MI (1985) Computational geometry: An introduction. Springer, Berlin

11. Sugihara K, Iri M (1994) A robust topology-oriented incremental algorithm for Voronoi diagrams. Internat J Comput Geom Appl 4:179-228

12. Suzuki A, Drezner Z (1996) The p-center location problem in an area. Location Sci 4:69-82


Walrasian Price Equilibrium 




WPE 


ANNA NAGURNEY 
University Massachusetts, Amherst, USA 


MSC2000: 91B50 


Article Outline 


Keywords 

The Iterative Scheme 
The Projection Method 
The Relaxation Method 
See also 

References 


Keywords 


Pure exchange; Pure trade; General economic 
equilibrium; Perfect competition; Walras law; 
Aggregate excess demand function; Variational 
inequality formulation; Projection method; Relaxation 
method 


The Walrasian price or pure exchange equilibrium 
problem is a general as opposed to partial equilibrium 
problem in that all commodities in the economy are 
treated. In addition, it is an example of perfect com- 
petition in that it is assumed that producers take the 
prices as given and cannot individually influence the
prices. This problem has been extensively studied in 
the economics literature dating to L. Walras [20]; see 
also [2,4,10,19]. 

In the pure exchange model the consumer side of 
the economy is modeled by the excess demand func- 
tions and it is assumed that production is absent and 


consumers exchange commodities that they initially 
own. Production can be introduced into this basic 
framework in a variety of ways by including, for ex- 
ample, an activity analysis model to describe the pro- 
ductive techniques in the economy (cf. [14,15]). The 
excess demand functions are aggregated demand func- 
tions over the individual consumers in the economy. 
They represent the difference between the market de- 
mands for the commodities and the supply of the com- 
modities (based on the initial endowments of the con- 
sumers). 

Computation of economic equilibria has been, typ- 
ically, based either on classical algorithms for solv- 
ing nonlinear systems of equations (see, e. g., [6]), or, 
on simplicial approximation methods pioneered by H. 
Scarf [15] (see also the contributions of M.J. Todd 
[17,18], J.B. Shoven [16], J. Whalley [21], G. van der 
Laan and A.J.J. Talman [7]). The former techniques are 
applicable only when the equilibrium lies in the inte- 
rior of the feasible set, while the latter techniques are 
general-purpose algorithms, and are capable of han- 
dling inequality constraints, such as the requirement 
that the prices of the commodities be nonnegative. Nev- 
ertheless, in their present state of development, they 
may be unable to handle large scale problems (cf. [11]). 

General economic equilibrium problems have been 
formulated as nonlinear complementarity problems 
(see, e.g., [8]) and a Newton-type method based on 
this formulation has been used by many researchers for 
the computation of equilibria (see, e.g., [5,9,13]). Although this approach has been found to be more effective than fixed point methods, its convergence has not been established theoretically (see, e.g., [11]).

K.C. Border [1] provides a variational inequality formulation of Walrasian price equilibrium. Qualitative results using a variational inequality framework can be found in [3]. Algorithms and numerical examples, as well as the relationship of the Walrasian price equilibrium problem to a network equilibrium problem, can be found in [23].

Here we present the pure exchange (or pure trade) economic equilibrium model and give its variational inequality formulation. Some fundamental theoretical results are then presented. For proofs and additional background and results, see [3,12,22], and [23].

Consider a pure exchange economy with l commodities, and with column price vector p taking values in $\mathbb{R}^l_+$ and with components $p_1, \ldots, p_l$. Denote the induced aggregate excess demand function by z(p), which is a row vector with components $z_1(p), \ldots, z_l(p)$. Assume that z(p) is generally defined on a subcone C of $\mathbb{R}^l_+$ which contains the interior $\mathbb{R}^l_{++}$ of $\mathbb{R}^l_+$, that is, $\mathbb{R}^l_{++} \subseteq C \subseteq \mathbb{R}^l_+$. This allows for the possibility that the aggregate excess demand function becomes unbounded when the price of a certain commodity vanishes. Assume that z(p) satisfies Walras' law, that is, $\langle z(p), p \rangle = 0$ on C, and that z(p) is homogeneous of degree zero in p on C, that is, $z(\alpha p) = z(p)$ for all $p \in C$, $\alpha > 0$. Because of homogeneity, one may normalize the prices so that they take values in the simplex

$$S^l = \Big\{ p : p \in \mathbb{R}^l_+, \ \sum_{i=1}^{l} p_i = 1 \Big\},$$

and, therefore, one may restrict the aggregate excess demand function to the intersection D of $S^l$ with C. Let

$$S^l_{++} = \{ p : p > 0, \ p \in S^l \},$$

and note that $S^l_{++} \subseteq D \subseteq S^l$.
As is standard in general economic equilibrium theory, assume that:
i) the function $z(p): D \to \mathbb{R}^l$ is continuous;
ii) the function z(p) satisfies Walras' law:

$$\langle z(p), p \rangle = 0, \quad \forall p \in D.$$


The definition of a Walrasian equilibrium is now stated.


Theorem 1 (Walrasian price equilibrium) A price vector $p^* \in \mathbb{R}^l_+$ is a Walrasian equilibrium price vector if

$$z(p^*) \le 0.$$


The following theorem establishes that Walrasian price vectors can be characterized as solutions of a variational inequality.


Theorem 2 (variational inequality formulation) A price vector $p^* \in D$ is a Walrasian equilibrium if and only if it satisfies the variational inequality

$$\langle z(p^*), p - p^* \rangle \le 0, \quad \forall p \in S^l.$$

Geometrically, the above variational inequality states that $z(p^*)$ is 'orthogonal' to the set $S^l$ and points away from it. In particular, the result is the following:


Proposition 3 A price vector p* is a Walrasian equilibrium, or, equivalently, a solution of the above variational inequality, if and only if it is a fixed point of the projection map

$$G(p) = P_{S^l}(p + \rho z(p)),$$

where $\rho > 0$ and $P_{S^l}$ denotes the projection map onto the compact convex set $S^l$.
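Proposition 3 can be checked numerically. The Python sketch below implements the Euclidean projection onto $S^l$ with the standard sort-based algorithm, and the map G for a hypothetical two-commodity, two-consumer Cobb-Douglas economy that is not from the source: consumer 1 owns one unit of commodity 1, consumer 2 one unit of commodity 2, and each spends half of its wealth on each commodity. Its excess demand satisfies Walras' law and is homogeneous of degree zero, and its equilibrium in $S^l$ (l = 2) is p* = (1/2, 1/2), which the sketch confirms to be a fixed point of G. Note that this z is defined only for strictly positive prices, i.e. $D \neq S^l$ here.

```python
from typing import List, Sequence


def project_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto S^l = {p >= 0, p_1 + ... + p_l = 1}."""
    u = sorted(v, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0.0:
            tau = t  # the last k passing this test fixes the threshold
    return [max(x - tau, 0.0) for x in v]


def excess_demand(p: Sequence[float]) -> List[float]:
    """z(p) of the hypothetical Cobb-Douglas economy (needs p_1, p_2 > 0)."""
    p1, p2 = p
    spending = 0.5 * (p1 + p2)  # total spending on each commodity
    return [spending / p1 - 1.0, spending / p2 - 1.0]


def projection_map(p: Sequence[float], rho: float) -> List[float]:
    """G(p) = P_{S^l}(p + rho z(p)) from Proposition 3."""
    z = excess_demand(p)
    return project_simplex([pi + rho * zi for pi, zi in zip(p, z)])


if __name__ == "__main__":
    print(projection_map([0.5, 0.5], 0.1))  # [0.5, 0.5]: p* is a fixed point
    print(projection_map([0.8, 0.2], 0.1))  # approximately [0.70625, 0.29375]
```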


Note that if the aggregate excess demand function z(p) is defined and is continuous on all of $S^l$, that is, $D = S^l$, then the existence of at least one Walrasian equilibrium price vector in $S^l$ follows immediately from the standard theory of variational inequalities.

However, since D is not necessarily compact, existence does not follow immediately in general. One may still establish existence if z(p) exhibits the needed behavior near the boundary of $S^l$, in particular, if at least some of the components of z(p) become in a sense 'large' as p approaches points on the boundary of $S^l$ that are not contained in D. Several existence proofs of this type can be found in [1]. We now provide the result proven in [3]:


Theorem 4 Assume that the aggregate excess demand function z(p) satisfies the following assumption: If $S^l \setminus D$ is nonempty, then with any sequence $\{p_k\}$ in $S^l_{++}$ which converges to a point of $S^l \setminus D$ there is associated a point $\bar{p} \in S^l_{++}$, generally dependent on $\{p_k\}$, such that the sequence $z(p_k) \cdot \bar{p}$ contains infinitely many positive terms.

Then there exists a Walrasian equilibrium price vector $p^* \in D$.


A special class of aggregate excess demand functions is now considered, for which the following result holds true:


Theorem 5 Assume that $-z(p)$ is continuous and monotone on D. Then $p^* \in D$ is a Walrasian equilibrium price vector if and only if

$$\langle z(p), p - p^* \rangle \le 0, \quad \forall p \in D,$$

or, equivalently, if and only if

$$\langle z(p), p^* \rangle \ge 0, \quad \forall p \in D.$$


An immediate consequence of the above is the follow- 
ing: 


Corollary 6 Assume that —z(p) is continuous and 
monotone on D and D is compact. Then the set of Wal- 
rasian equilibrium price vectors is a convex subset of D. 


The uniqueness issue is now investigated; specifically, if one strengthens the monotonicity assumption somewhat, one obtains the following result.


Theorem 7 Assume that $-z(p)$ is strictly monotone on D, that is,

$$\langle z(p^1) - z(p^2), p^1 - p^2 \rangle < 0, \quad \forall p^1, p^2 \in D, \ p^1 \neq p^2.$$

Then there exists at most one Walrasian price equilibrium vector p*.


We now describe a general iterative scheme for the computation of Walrasian price equilibria. The scheme is based on the general iterative scheme of S.C. Dafermos (cf. [12], and the references therein).

In the study of algorithms and their convergence, the standard assumption in the economics literature (cf. [15]) is that the aggregate excess demand function z(p) is well-defined and continuous on all of $S^l$. Here this assumption is also made. The scheme is as follows.


The Iterative Scheme 


Construct a smooth function $g(p, q): S^l \times S^l \to \mathbb{R}^l$ with the following properties:

i) $g(p, p) = -z(p)$, $\forall p \in S^l$;

ii) for every $p, q \in S^l$, the $(l \times l)$-matrix $\nabla_p g(p, q)$ is positive definite.

Any smooth function g(p, q) with the above properties generates the following algorithm:


Step 0 Initialization:
Start with some $p^0 \in S^l$. Set k := 1.

Step 1 Construction and computation:
Compute $p^k$ by solving the variational inequality

$$\langle g(p^k, p^{k-1})^T, p - p^k \rangle \ge 0, \quad \forall p \in S^l.$$

Step 2 Convergence verification:
IF $|p^k - p^{k-1}| < \epsilon$, with $\epsilon > 0$ a prespecified tolerance, THEN STOP;
ELSE set k := k + 1, and go to Step 1.


For simplicity, we denote the above variational inequality by $VI^k(g, S^l)$. Since $\nabla_p g(p, q)$ is positive definite, $VI^k(g, S^l)$ admits a unique solution $p^k$. Thus, we obtain a well-defined sequence $\{p^k\}$. It is easy to verify that if the sequence $\{p^k\}$ is convergent, say $p^k \to p^*$ as $k \to \infty$, then p* is an equilibrium price vector, that is, it is a solution of the variational inequality. In fact, on account of the continuity of g(p, q), $VI^k(g, S^l)$ yields

$$\langle -z(p^*), p - p^* \rangle = \langle g(p^*, p^*)^T, p - p^* \rangle = \lim_{k \to \infty} \langle g(p^k, p^{k-1})^T, p - p^k \rangle \ge 0, \quad \forall p \in S^l,$$

so that p* is a solution of the original variational inequality.

Conditions for convergence may be found in [12] 
and [23]. 

We now show that the general iterative scheme in- 
duces a projection method and a relaxation method. In 
the context of the pure exchange model both the pro- 
jection method and the relaxation method resolve the 
variational inequality problem into simpler subprob- 
lems, which can then be solved using equilibration al- 
gorithms (cf. [23]). 


The Projection Method 
The projection method corresponds to the choice:

$$g(p, q) = -z(q) + \frac{1}{\rho} G (p - q),$$

where $\rho$ is a positive scalar and G is a fixed, symmetric positive definite matrix. In this case properties i) and ii) are satisfied. In fact,
i) $g(p, p) = -z(p) + \frac{1}{\rho} G (p - p) = -z(p)$;
ii) $\nabla_p g(p, q) = \rho^{-1} G$ is positive definite and symmetric.
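With G = I, the subproblem $VI^k(g, S^l)$ of the projection method has a closed-form solution: the variational inequality reads $\langle p^k - (p^{k-1} + \rho z(p^{k-1})), p - p^k \rangle \ge 0$ for all $p \in S^l$, which characterizes $p^k = P_{S^l}(p^{k-1} + \rho z(p^{k-1}))$. The Python sketch below runs the general iterative scheme in this form on the same hypothetical two-commodity Cobb-Douglas economy used above (not from the source), with the max-norm in the Step 2 test as an assumption. Because this toy z is defined only for strictly positive prices, it does not meet the standing assumption that z is continuous on all of $S^l$; the example works because the iterates stay in the interior for the starting point used.

```python
from typing import Callable, List, Sequence, Tuple


def project_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto S^l (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0.0:
            tau = t
    return [max(x - tau, 0.0) for x in v]


def excess_demand(p: Sequence[float]) -> List[float]:
    """Hypothetical Cobb-Douglas excess demand (needs p_1, p_2 > 0)."""
    spending = 0.5 * (p[0] + p[1])
    return [spending / p[0] - 1.0, spending / p[1] - 1.0]


def projection_method(z: Callable[[Sequence[float]], List[float]],
                      p0: Sequence[float], rho: float = 0.1, eps: float = 1e-10,
                      max_iter: int = 10_000) -> Tuple[List[float], int]:
    """General iterative scheme with g(p, q) = -z(q) + (p - q) / rho, i.e. G = I."""
    prev = list(p0)                                          # Step 0: p^0
    for k in range(1, max_iter + 1):
        zq = z(prev)
        # Step 1: VI^k(g, S^l) is solved by projecting p^{k-1} + rho z(p^{k-1})
        cur = project_simplex([q + rho * zi for q, zi in zip(prev, zq)])
        # Step 2: stop when |p^k - p^{k-1}| < eps (max-norm)
        if max(abs(a - b) for a, b in zip(cur, prev)) < eps:
            return cur, k
        prev = cur
    return prev, max_iter


if __name__ == "__main__":
    p_star, iters = projection_method(excess_demand, [0.8, 0.2])
    print(p_star, iters)  # close to [0.5, 0.5]
```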


The Relaxation Method 


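The relaxation method corresponds to the choice:

$$g_i(p, q) = -z_i(q_1, \ldots, q_{i-1}, p_i, q_{i+1}, \ldots, q_l), \quad \forall i = 1, 2, \ldots, l.$$

In this case properties i) and ii) are also satisfied. In fact,

i) $g(p, p) = -z(p)$;

ii)

$$\nabla_p g(p, q) = \begin{pmatrix} -\dfrac{\partial z_1}{\partial p_1} & & 0 \\ & \ddots & \\ 0 & & -\dfrac{\partial z_l}{\partial p_l} \end{pmatrix}$$

is a diagonal matrix, which is positive definite when each diagonal entry $-\partial z_i / \partial p_i$ is positive.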
Within the broad interface between computer science and operational research (OR), a major application area is locational decisions. Roughly speaking, one can claim that the location of any physical object whatsoever, partially or totally created by some living organism, represents the solution to a location problem. Even if we restrict 'living organism' to encompass human beings only and even if we assume some kind of 'intellectual process' behind the choice of a solution, the entire field of location problems is still overwhelming and its history dates back as far as that of mankind itself. Further limitations are necessary in order to find a suitable framework for the presentation of the subject, and the next subset of location problems emerges naturally if the rather vague phrase 'intellectual process' is replaced by some systematic approach based on what has become commonly accepted as OR methodology. This brings models into the focal point as a convenient tool for locational analyses.

A substantial proportion of any developed country’s 
gross national product (GNP) is spent simply on ‘mov- 
ing things around’. The location of the facilities (such 
as town halls, hospitals, factories, depots, retailers, su- 
permarkets, or components in electrical circuits, etc.) 
in relation to the customers or other facilities is therefore of crucial importance to the success of both private and public enterprises or to the outcome of today's computer-monitored military operations.

Several schemes have been proposed for classifying the wealth of models developed for locational decisions. To account for all the factors separating such models from one another is far beyond the scope of the present article. For our purpose, however, we need to emphasize the distinction between continuous and discrete models. Continuous models typically presume that the facilities to be located can be placed anywhere, that is, within the context of a continuous space, for example, in a plane. Discrete models, on the other hand, deal with situations where the set of potential sites for the facilities to be placed is finite and often represented by the vertices of a network.

Single-commodity models as opposed to multicom- 
modity models deal with the location of one or more 
facilities each providing the same kind of service to the 
customers allocated to it. For such models, the weight of 
a customer represents the amount demanded (per time 
unit) for the kind of commodity supplied by the facili- 
ties. It may well occur that all customers can be viewed 
as having the same demand. If the so-called single as- 
signment property applies, which means that each cus- 
tomer is supplied by a single facility only, we can then 
transform the original data of the problem such that 
all customers have unity demand, or weight equal to 1. 
Problems where the weight associated with each cus- 
tomer equals 1 are characterized as unweighted as op- 
posed to weighted. 

As the name indicates, single-facility problems involve the placement of a single facility only, as opposed to multifacility models. Likewise, single-criterion models consider only a single criterion when the quality of feasible solutions is assessed. In contrast to multicriteria decision making, it is then meaningful to talk of an objective function which is to be optimized.

How should a feasible solution comprising both location and allocation be assessed? Two of the most frequently encountered constituents of an objective function are the overall cost structure and the distance measure employed.

To exemplify some of the notions introduced above, 
we can consider the location of a regional wine depot 
which is to supply a chain of n supermarkets located in 
n towns. Disregarding the fact that wine bottles indeed 
may contain highly different fluids, we shall assume that 
the yearly demands can be expressed in terms of the to- 
tal number of wine cases. The problem thereby reduces 
to a single-commodity problem. 
It is rather unlikely that the regional depot to be located can be placed 'anywhere'. A discrete model is thus more appropriate in this case; let us say that the set of potential location sites is a subset of the vertices in the road network connecting the n cities. Now, will the trucks visit only a single supermarket at a time or a sequence of supermarkets before returning to the depot? For simplicity, let us disregard the routing aspect (how to combine location with the design of routes visiting two or more customers in some order) and accordingly assume that the objective function to be minimized can be expressed as the product sum

$$\sum (\text{annual demand}) \times (\text{shortest path distance between supermarket and depot}),$$

where the summation is taken over the n supermarkets.

This model, which represents a very simplified picture of the 'real world', is known in the literature as the 1-median problem in a network, or 1-MP for short.
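The 1-MP can be sketched in Python by computing shortest path distances with Dijkstra's algorithm from every potential site and evaluating the product sum above for each. The road network, town names and annual demands in the usage example are made up for illustration; edge lengths are assumed nonnegative and the network undirected. All function names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def undirected(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Adjacency lists of an undirected road network."""
    g: Graph = {}
    for u, v, length in edges:
        g.setdefault(u, []).append((v, length))
        g.setdefault(v, []).append((u, length))
    return g


def dijkstra(g: Graph, source: str) -> Dict[str, float]:
    """Shortest path distances from source (nonnegative edge lengths)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in g[u]:
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def one_median(g: Graph, demand: Dict[str, float],
               sites: Sequence[str]) -> Tuple[Optional[str], float]:
    """Site minimizing sum(annual demand x shortest path distance), the 1-MP."""
    best_site, best_cost = None, float("inf")
    for s in sites:
        dist = dijkstra(g, s)  # undirected network: d(s, v) = d(v, s)
        cost = sum(w * dist.get(v, float("inf")) for v, w in demand.items())
        if cost < best_cost:
            best_site, best_cost = s, cost
    return best_site, best_cost


if __name__ == "__main__":
    roads = undirected([("A", "B", 4.0), ("B", "C", 3.0), ("A", "C", 10.0), ("C", "D", 2.0)])
    cases = {"A": 100.0, "B": 50.0, "C": 80.0, "D": 120.0}  # annual demand per town
    print(one_median(roads, cases, ["A", "B", "C", "D"]))  # ('C', 1090.0)
```

On this small network the depot goes to town C with a product sum of 1090; running Dijkstra once per candidate site is adequate when the number of candidate sites is modest.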

Admittedly, the inclusion of other factors would 
contribute a lot to the model’s realism. Examples of 
such factors are: seasonal variations around Christmas 
and other peak events, the number and capacities of 
the trucks used, the distance measure employed (road 
lengths or travel time?), the fixed costs which may vary 
from one potential location site to another, et cetera. 
Nevertheless, it is a well-documented fact that fairly 
simple models often are able to capture the essential 
parts of a realistic problem whereas the additional con- 
tributions to cost savings or profit achieved via more 
complex models, viewed against the time and cost in- 
vested in their development, need not give ‘value for 
money’. 

A multifacility location problem arises, as men- 
tioned above, when two or more facilities are to be 
located simultaneously, each interacting with the cus- 
tomers or existing facilities or with each other. Suppose 
for example that the chain of supermarkets may wish 
to expand its activities by entering a new region. How 
many new supermarkets should be opened and where 
should they be located? Should they all necessarily pro- 
vide the same kind of service to their customers, say, in 
terms of assortment? Considering the competitive environment we are dealing with here, what will the expected market share become? Should the supermarkets be opened in different time periods and, if so, in which order?

The last question also relates to the distinction between static and dynamic models, where
the latter explicitly include time and thus can be viewed 
as examples of multiperiod planning. The usual aim of 
such dynamic models would be to investigate how best 
to incorporate additional new facilities in an existing 
structure, to rearrange an existing layout, or to plan 
a completely new system. 

The facilities to be located are normally regarded as
‘friendly’ in the sense that ‘closeness’ is viewed as an
attractive property. For example, real estate dealers praise
easy access to shopping centers, schools, public
transportation, and recreational areas when a house is
advertised for sale. Locational decisions, however, also
encompass the counterpart: the location of so-called
‘obnoxious’ facilities like nuclear power plants, shooting
ranges, and polluting factories, which society needs
although they produce undesirable effects
or represent a threat to their surroundings. Here, one
frequently used criterion is to maximize the distance
between a facility and the closest customer. It should be
noted in this context that even a friendly facility may
well become obnoxious unless ‘closeness’ is taken with
a grain of salt. Thus, optimal closeness to a noisy
elementary school means ‘reachable within a few minutes’
walk’ rather than ‘next door’, a feature known as the
NIMBY syndrome (not in my back yard).

The investigation of models with such truly
antagonistic criteria, capturing both the friendly and the
obnoxious aspects of a locational decision problem, has
attracted several researchers, notably in the 1990s, and
the field is still gaining momentum. ‘Semiobnoxious’
is among the new adjectives coined. Whereas
a nuclear power plant is indeed considered
obnoxious by the vast majority of people, a typical
semiobnoxious facility could be an airport, disliked by
its neighbors for its environmental pollution and
appreciated by its users for its accessibility.


Three Prototype Location Problems 


When is a discrete model more appropriate than a con- 
tinuous one? Both options are frequently available to 
practitioners and the following issues are often crucial 
when a choice is to be made: 
a) Is the transportation network so well developed in
the region being considered and so free from barriers
that a continuous formulation is reasonable?

b) Is there a relatively small set of identifiable facility 
sites so that a discrete formulation should be advo- 
cated? 

c) Are the optimal solutions to a continuous model 
readily transferable to a set of possible locations 
without resulting in serious errors in the measure(s) 
of performance used to evaluate solutions? 

d) Does either of the two model types offer
computational advantages?

Although the answers to such questions may be am- 
biguous, and although the analyst will often have con- 
siderable flexibility in her/his choice, experience from 
practice indicates that these answers most often lead to 
the choice of a discrete model. The major reasons are 
that in most cases decision-makers consider a discrete 
representation to be a more realistic and a more accu- 
rate portrayal of the problem at hand, and that contin- 
uous models appear to be relatively difficult to solve. 

Among the myriad models considered in discrete
location theory, only three, namely the p-median
problem (p-MP), the p-center problem (p-CP), and the
simple plant location problem (SPLP), at times
referred to as prototype location problems, have played
a particularly dominant role. Despite the seeming
simplicity of their underlying assumptions, these models
have provided important quantitative bases for the
investigation of numerous practical locational decision
problems. They have been used both as optimization
models in their own right and as subroutines in more
integrated models. Finally, owing to
the large number of extensions available, each of these
three prototype problems can be viewed as the foremost
member of a family of location problems.

We now present p-MP, p-CP, and SPLP in their 
most general forms and provide concise, symbolic for- 
mulations of each of these within a common frame- 
work. 

Let

• $m$ be the finite number of customers, indexed by
$i \in I = \{1, \dots, m\}$;

• $n$ be the finite number of sites for potential facilities,
indexed by $j \in J = \{1, \dots, n\}$;

• $p$ be the number of facilities to be opened or
established, $1 \le p \le n$.


Whereas the locations of the $p$ facilities to be
established are to be decided upon, the locations of the $m$
customers are assumed known and invariant. These
customers have prespecified demands for a common good,
which in principle can be provided by any potential
facility.

For each of the $mn$ facility-customer pairs, define

• $c_{ij}$ as the total variable cost of serving all of
customer $i$'s demands from facility $j$.

The ‘cost’ $c_{ij}$ may include measures of the distance from
customer $i$ to facility $j$ as well as of the time or cost of
serving customer $i$ from facility $j$. For example, $c_{ij}$
may be interpreted as $c_{ij} = w_i(h_j + t_{ij})$, where

• $w_i$ is the number of units demanded by customer $i$;

• $h_j$ is the per unit cost of operating facility $j$
(including variable production and administrative costs,
etc.), and

• $t_{ij}$ is the transportation cost of shipping one unit to
customer $i$ from facility $j$.

The cost $t_{ij}$ may also be interpreted as $t_{ij} = d_{ij}$, where

• $d_{ij}$ is the physical distance (or its time or monetary
equivalent) of a shortest path from customer $i$ to
facility $j$.

Then, for $h_j = 0$, $c_{ij}$ reduces to $c_{ij} = w_i d_{ij}$. Thus,
$c_{ij}$ captures the various notions of distance, time, and
variable costs referred to so far.
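As a minimal sketch of the cost coefficient $c_{ij} = w_i(h_j + t_{ij})$, the function below assembles the full cost matrix from demands, operating costs, and transport costs. The function name and the numbers are hypothetical illustrations, not data from the text; with $h_j = 0$ and $t_{ij} = d_{ij}$ it reproduces $c_{ij} = w_i d_{ij}$.

```python
def cost_matrix(w, h, t):
    """Build c[i][j] = w[i] * (h[j] + t[i][j]).

    w[i]    -- units demanded by customer i
    h[j]    -- per-unit operating cost of facility j
    t[i][j] -- per-unit transportation cost between customer i and facility j
    """
    m, n = len(w), len(h)
    return [[w[i] * (h[j] + t[i][j]) for j in range(n)] for i in range(m)]


# Hypothetical instance: 2 customers, 3 candidate sites.
w = [2, 5]
d = [[1, 4, 3],
     [6, 2, 5]]
# With h_j = 0 and t_ij = d_ij the coefficients reduce to c_ij = w_i * d_ij.
print(cost_matrix(w, [0, 0, 0], d))  # [[2, 8, 6], [30, 10, 25]]
```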

With $c_{ij}$ so defined, we can without loss of generality
assume all customers to have unit demand.
Furthermore, for each of the three problems, no capacity
constraints are imposed on the number of customers that
each potential facility can serve. Finally, also without
loss of generality, all data are assumed nonnegative.

As will be explained shortly, we can conveniently
express the locational decisions to be made in terms
of $Q$:

• $Q \subseteq J$ is a subset of potential facilities to be opened
and from which all customers are to be served. The
cardinality of $Q$ is denoted by $|Q|$.

Conceptually, an approach to identify an optimal
solution to each of the three problems can be said to involve
two phases:

1) determination of a location pattern in terms of Q 
specifying the location of the facilities, and 

2) an allocation phase in which each customer is as- 
signed to exactly one open facility and hence is as- 
sumed to receive all of its demand from that facility 
such that a certain objective function is minimized. 
Computationally, however, these two phases cannot in
general be separated from one another but may,
depending on how a specific algorithm is designed, be
carried out simultaneously. The conceptual decomposition
into two phases is suggested here solely to facilitate
comprehension of the compact formulations that follow.


p-MP 


Data instance: $m$, $n$, $p$, $C = \{c_{ij}\}$.

Open $p$ facilities and assign each customer to
exactly one of them such that the total variable cost is
minimized. For given $Q$, an assignment minimizing total
variable cost can be determined ‘by inspection’:
customer $i$ is assigned to an established facility
corresponding to the smallest $c_{ij}$ (up to ties), that is, to facility $k$
where $c_{ik} = \min_{j \in Q} c_{ij}$. Upon assigning all customers in
this manner, the resulting total variable cost becomes
$\sum_{i \in I} \min_{j \in Q} c_{ij}$. Hence, p-MP reads:


$$\text{p-MP:} \quad \min_{Q \subseteq J,\ |Q| = p} \ \sum_{i \in I} \min_{j \in Q} c_{ij}$$
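For tiny instances the formulation can be checked by plain enumeration. The sketch below (hypothetical function name and data, sites indexed from 0) enumerates every $Q \subseteq J$ with $|Q| = p$, performs the allocation ‘by inspection’ described above, and keeps the cheapest pattern. Its running time grows with $\binom{n}{p}$, so it only illustrates the definition and is not a practical p-MP algorithm.

```python
from itertools import combinations


def p_median_brute_force(c, p):
    """Solve p-MP by enumeration; c[i][j] is the cost of serving customer i from site j."""
    n = len(c[0])
    best_cost, best_q = float("inf"), None
    for q in combinations(range(n), p):  # every Q subset of J with |Q| = p
        # Allocation by inspection: each customer uses its cheapest open site.
        total = sum(min(row[j] for j in q) for row in c)
        if total < best_cost:
            best_cost, best_q = total, q
    return best_cost, best_q


c = [[1, 5, 9],
     [4, 2, 8],
     [7, 6, 1],
     [3, 8, 4]]
print(p_median_brute_force(c, 1))  # (15, (0,))
print(p_median_brute_force(c, 2))  # (9, (0, 2))
```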


The p-center problem p-CP differs significantly from p- 
MP in several respects, primarily with respect to the 
criterion used for assessing the quality of a feasible 
solution. Whereas 1-MP as exemplified above by the 
wine depot location problem and the more general p- 
MP are minisum problems, p-CP has a minimax ob- 
jective: open p facilities and assign each customer to 
exactly one of them such that the maximum distance 
(unweighted case) or the maximum weighted distance 
from any open facility to any of the customers assigned 
to it is a minimum. 

p-CP is often a suitable model for analyzing loca- 
tional decision problems for emergency services such 
as police, fire, and ambulance services. A common cri- 
terion for the effectiveness of such service coverage is 
that any demand point may be reached from the facility 
nearest it within a given weighted distance, time or cost. 


p-CP 


Data instance: $m$, $n$, $p$, $C = \{c_{ij}\}$.

Open $p$ facilities and assign each customer to
exactly one of them such that the maximum variable cost
of serving any customer is a minimum. Suppose $Q$ is
known. We can then do no better than assigning the $i$th
customer to the open facility from which the cost $c_{ij}$ is
a minimum, that is, to facility $k$ where $c_{ik} = \min_{j \in Q} c_{ij}$,
with ties resolved arbitrarily. Upon assigning
all customers in this manner, the resulting maximum
cost becomes $\max_{i \in I} \{\min_{j \in Q} c_{ij}\}$. Hence, the following
formulation obtains:
$$\text{p-CP:} \quad \min_{Q \subseteq J,\ |Q| = p} \ \max_{i \in I} \left\{ \min_{j \in Q} c_{ij} \right\}$$
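The minimax objective changes only the aggregation step of an enumeration. In this illustrative sketch (hypothetical name and data, sites indexed from 0) each customer is again served by its cheapest open site, but the pattern is scored by the worst such cost rather than the sum.

```python
from itertools import combinations


def p_center_brute_force(c, p):
    """Solve p-CP by enumeration: minimize the largest assignment cost."""
    n = len(c[0])
    best_cost, best_q = float("inf"), None
    for q in combinations(range(n), p):
        # Each customer is served by its cheapest open site; record the worst case.
        worst = max(min(row[j] for j in q) for row in c)
        if worst < best_cost:
            best_cost, best_q = worst, q
    return best_cost, best_q


c = [[1, 5, 9],
     [4, 2, 8],
     [7, 6, 1],
     [3, 8, 4]]
print(p_center_brute_force(c, 1))  # (7, (0,))
print(p_center_brute_force(c, 2))  # (4, (0, 2))
```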
Like p-MP, the third prototype location problem,
the simple plant location problem (SPLP), is also a minisum
problem. Two features, however, separate SPLP from
p-MP:
a) the inclusion of fixed costs associated with each
potential facility, and
b) the number of facilities to be established, which is no
longer prespecified but results from an optimal
solution.
For the $j$th potential facility define
• $f_j$ as the fixed cost of establishing facility $j$.
‘Fixed’ means that $f_j$ is to be paid only if facility $j$
is actually established, and $f_j$ is then independent of the
number of customers ($\ge 1$) served by that facility.


SPLP 


Data instance: $m$, $n$, $C = \{c_{ij}\}$, $f = (f_j)$.

Open a subset $Q \subseteq J$ of facilities and assign each
customer to exactly one of them such that the sum of the
fixed and the variable costs is minimized, that is,


$$\text{SPLP:} \quad \min_{\emptyset \neq Q \subseteq J} \left\{ \sum_{j \in Q} f_j + \sum_{i \in I} \min_{j \in Q} c_{ij} \right\}$$
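Because the number of open sites is now part of the decision, an enumeration sketch of SPLP has to range over all nonempty subsets of $J$ rather than over subsets of fixed size $p$. The function name and data below are hypothetical; with these fixed costs the optimum opens two sites, which is a different pattern from the 2-median above on the same cost matrix.

```python
from itertools import combinations


def splp_brute_force(c, f):
    """Solve SPLP by enumerating every nonempty Q subset of J.

    c[i][j] -- variable cost of serving customer i from site j
    f[j]    -- fixed cost of opening site j
    """
    n = len(f)
    best_cost, best_q = float("inf"), None
    for k in range(1, n + 1):  # the number of open sites is not prespecified
        for q in combinations(range(n), k):
            total = sum(f[j] for j in q) + sum(min(row[j] for j in q) for row in c)
            if total < best_cost:
                best_cost, best_q = total, q
    return best_cost, best_q


c = [[1, 5, 9],
     [4, 2, 8],
     [7, 6, 1],
     [3, 8, 4]]
f = [3, 2, 6]
print(splp_brute_force(c, f))  # (17, (0, 1))
```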


We note in passing that, while most well-defined
problems bear unambiguous names, SPLP has been dealt
with in the literature under a wide variety of different
titles, usually composed of an adjective (simple,
uncapacitated, optimal) and a noun (plant, warehouse,
facility, site) followed by ‘location problem’. It is furthermore
somewhat confusing that ‘simple’ in this context is
synonymous with ‘uncapacitated’, since p-MP and p-CP
also assume that the facilities to be located have
unlimited capacities.

This ultra-short sketch of the ‘nature of locational
decisions’ hardly reveals even the tip of the iceberg,
though hopefully it is enough to leave an impression
of an area of great practical importance, not to mention
the wealth of theoretical challenges and open questions
that still remain.

The literature is already huge and rapidly growing.
Among pertinent ‘broad-coverage’ references are the
three textbooks [1,3,4]. Also, [2] deserves to be
mentioned. Its idea is to treat decisions regarding the
location and the design of production facilities as
interrelated, that is, the optimal plant design (input mix
and output level) depends on the location of the plant,
and the optimal location of the plant depends on its
design.
Introduction


In recent decades, interest in protecting the
environment from anything that could lead to its
degradation and destruction has increased. Pollution
(discharge of materials or energy and discharge of
microorganisms that are pathogenic for people and
animals) of groundwater and surface water is one of
the most important problems facing ordinary people
and authorities around the world. Groundwater
pollution has many ecological consequences. For
example, the physicochemical characteristics of water are
changed, leading to severe economic consequences for
people, e. g., an increase in the cost of water processing
for its reuse.

The most important problems surrounding pollution
concern water (lakes, rivers, and oceans), which
suffers the strongest exploitation and use. One such
use is as a receiver of the outflows of combined sewer
networks [10,26,55]. The construction of treatment
plants, which enable sewage treatment before disposal in
a body of water, protects the quality of the water that
receives the outflows of the sewage networks. However,
urban combined sewer networks do not have separate
collectors for domestic and industrial sewage and
rainwater drainage. Therefore, during rainfall, networks
and/or treatment plants may be overloaded, and
overflows may take place upstream of overloaded stretches,
polluting the receiving waters. Placing retention
reservoirs at appropriate locations along the
network (by constructing special basins (offline
storage) or by installing throttle gates at the end of long
sewer stretches (in-line storage)) is a cost-efficient way
to avoid overflows during moderate rainfall events and
to reduce them in stronger rainfall, as the water is
stored in the reservoirs during the rainfall and is
directed toward the treatment plant after the rain has
stopped.

Fig. 1 Schematic representation of a small sewer network

Optimal operation of the combined sewer network 
(which contains retention reservoirs) (Fig. 1) implies 
that for each rain event the whole retention capac- 
ity of all reservoirs will be used before overflows take 
place somewhere in the network. This, however, can- 
not be guaranteed by fixed gate settings, such as fixed 
weirs or manually adjustable gates for the filling and 
emptying of these storage spaces. Especially if the rain- 
fall is distributed unevenly over an urban area, there 
may be reservoirs that are not totally filled, while over- 
flows already occur elsewhere in the network. In these 
cases, a further considerable reduction of overflows can 
be obtained by real-time operation of the reservoirs, 
e. g., by use of controllable gates. The decision on how 
to move the gates during a certain rain event may be 
made by a human operator or by some automatic
control strategy to be applied in real time. An efficient
control strategy can substantially reduce the overflows
from a sewer network. In addition, it may lead to
substantial cost savings, as the number and storage
capacities of the reservoirs required to keep overflows below
a certain (usually legislatively defined [22,60]) limit
depend upon the efficiency of the applied control
strategy.




Optimization of Wastewater Systems 


The development of a control system for combined 
sewer networks has as a goal the protection of the 
quality of waters that receive the outflows of the net- 
works. Thus, the main task of the control system is 
the minimization of overflows for any rainfall event. 
The development of optimization techniques for the 
planning, design, and management of complex water 
resource systems has been the subject of many inves- 
tigations [40] around the world. The choice of opti- 
mization method to be used depends on the character- 
istics of the reservoir system being considered, on the 
availability of data, and on the specific control objec- 
tives and constraints. Many researchers in the field have 
considered methods such as linear programming,
dynamic programming, nonlinear programming,
linear-quadratic control, genetic algorithms, and
combinations of these methods. Nonlinear optimal control is
the most efficient approach because it directly takes
inflow predictions, process nonlinearities, and
constraints into account. On the other hand, nonlinear
optimal control requires the development and implementation
of sophisticated codes for the real-time numerical solution
of the optimal control problem. Multivariable
regulators, if designed properly, may approximate the
efficiency of nonlinear optimal control with much
simpler calculations. Approaches based on dynamic
programming are difficult to apply to large-scale networks
due to the “curse of dimensionality,” while approaches
based on linear programming do not include the
nonlinearities of the process. Expert systems, fuzzy
control, and other heuristic approaches have also been
applied with remarkable results.

Linear programming is a very powerful and easy- 
to-use form of optimization and is most efficient for 
problems that can be expressed in linear terms. For 
sewer network control, linear programming is used 
in [4] for the development of a control algorithm for 
automatic control of detention storage in a large-scale 
combined sewer system and in [42] for the real-time 
control of urban drainage systems where the nonlin- 
ear programming problem is replaced by a succession 
of linear programming problems. In [1] the optimiza- 
tion of the discharge hydrograph of a pumping sta- 
tion located at the downstream end of a storm drainage 
channel located in the southeastern portion of Mexico 
City is considered and the initial nonlinear optimiza- 
tion problem is reduced to a series of linear program- 
ming problems whose solution determines the desired 
optimal discharge hydrograph. In [8] a new approach 
to the optimal design of wastewater treatment systems 
is presented. An algorithm that can be divided into two 
parts is proposed for finding global optimal solutions to 
the problem. The first part comprises a new linear pro- 
gram formulation that is used to generate good starting 
points for the solution of the general nonlinear program 
(second part). 

Dynamic programming has been used extensively 
in the optimization of water resource systems, as the 
nonlinear and stochastic features, which characterize 
a large number of water resource systems, can be trans- 
lated into a dynamic programming formulation. How- 
ever, when dynamic programming is applied to multi- 
ple reservoir systems, the usefulness of the technique is 
limited, as the computer memory requirement is quite 
large. In such cases, dynamic programming can only be 
applied if the complex problems with the large num- 
ber of variables are decomposed into a series of sub- 
problems, which are solved recursively. In the context 
of sewer network control, dynamic programming has 
been used for optimizing the design of drainage sys- 
tems [51], for designing the least expensive network of 
sewers that will drain water from a number of discrete 
sources [56], for designing the lowest-cost drainage
networks which include storage elements [17], and for
control of the combined sewer network of the city and
county of San Francisco [24].

Nonlinear programming offers a more general 
mathematical formulation than linear and dynamic 
programming and can effectively handle nonlinear ob- 
jective functions and nonlinear constraints. In [16] 
an algorithm that combines elements of discrete
dynamic programming (i.e., discrete state space,
backward stagewise optimization) with elements of
constrained optimization (i.e., nonlinear programming
with equality constraints) is used for the optimal
control of a multireservoir system. In [3], optimal control
theory is used for real-time automated control of com- 
bined sewers, in [41,45,46,47] nonlinear programming 
is applied for the flow control of Québec Urban Com- 
munity sewer network, in [20] a model-predictive con- 
trol strategy that uses a mixed linear/quadratic objec- 
tive function is applied in the Seattle metropolitan area 
to minimize combined sewer overflows, while a solu- 
tion algorithm developed for the sewer network con- 
trol problem applying the discrete maximum principle 
is used in [27,28,29,31,32,33,35,38]. 

Linear-quadratic control theory has been exten- 
sively applied in many fields, and a number of in- 
vestigators have incorporated various aspects of lin- 
ear-quadratic theory in their solutions to reservoir 
operations problems. In [59] a multivariable feedback 
controller is used for the control of combined storm- 
sewer systems, while in [27,28,29,30,31,34,36,37,38] 
a linear multivariable feedback regulator designed us- 
ing the linear-quadratic methodology is used for the 
sewer network control problem. 

Genetic algorithms have been proposed as a means 
of global optimization for a variety of engineering de- 
sign problems. They mimic the natural genetic pro- 
cesses of evolution, deliberately keeping a range of 
good solutions to avoid being drawn into false lo- 
cal optima. Genetic algorithms are robust methods for 
searching the optimum solution to complex problems. 
In [57] they are applied to a four-reservoir, determin- 
istic finite-horizon problem. To achieve water qual- 
ity goals and wastewater treatment cost optimization 
in the Youngsan River, where water quality has de- 
creased due to heavy pollutant loads from Kwangju City 
and surrounding areas, a water quality management 
model [11] has been developed through the integration 
of a genetic algorithm and a mathematical water quality
model with remarkable results.

Conventional rule-based control and fuzzy logic 
for real-time flow control of sewer systems have also 
been used with success. Conventional rule-based con- 
trol systems are based on a large number of rules, 
while control systems based on fuzzy logic combine the 
simple rules of an expert system with a flexible spec- 
ification of output parameters. In [23] a comparison 
of conventional rule-based flow control systems with 
a control system based on fuzzy logic is conducted for 
a combined sewer system, while in [18] a study was 
carried out for a part of the sewer system of the city 
of Flensburg using fuzzy logic for the real-time
control of the sewer system. An interactive fuzzy approach
proved suitable [25] for water quality management
when applied to developing a water quality
management plan for the Tou-Chen River Basin in northern
Taiwan, where a multiobjective optimization problem
involving vague and imprecise information related
to data, model formulation, and the decision maker’s
preferences had to be solved.

The real-time control (RTC) of wastewater systems 
has been a topic of research and application for many 
years, and the benefits of applying RTC strategies to 
various wastewater systems are presented in many re- 
search papers. In [9] a global optimal control prototype 
for the Barcelona urban drainage system is presented. 
[7] presents the results of the application of RTC strate- 
gies to the Roma-Cecchignola combined sewer system. 
In [58] RTC is applied to sewer systems in
Germany, while in [19] the analysis of the performance
improvement of a new automatic central control pro- 
cedure applied to the sewer system of Rotterdam is 
presented. The results of a study [48] showed that the 
Trebic sewer system is suitable for combined runoff 
control. The optimized control of a Moscow sewer sub- 
network enabled significant improvements in the sewer 
network operation as shown in [14]. A global opti- 
mal control system was implemented on the Québec 
Urban Community’s Westerly sewer network [45] to 
manage flows and water levels in real time and man- 
aged to decrease combined sewer overflow (CSO) vol- 
umes at four overflow sites by more than 85% for seven 
rainfall events recorded during the summer of 1999. 
In [44] fault-tolerant model predictive control strategies
for sewer networks are investigated and applied to
a portion of the Barcelona sewage network under
realistic rain and fault scenarios, while in [43] hybrid
model predictive control (HMPC) for sewer networks
is introduced and applied to the same sewer network.
In [15] CORAL offline, a new tool for sewerage network
modeling, simulation, and optimal strategy computation,
is demonstrated for a test catchment of the
Barcelona sewer network for the purpose of performing
global optimal control.

It should be noted that in the recent past the three 
parts of the urban wastewater system (sewer system, 
wastewater treatment plant (WWTP), and receiving 
water) have been considered as separate units in water 
quality management, and the aims of optimum perfor- 
mance were considered individually as well. The con- 
ventional RTC of sewer systems mainly aims at min- 
imization of overflow volumes and loads, while treat- 
ment plant operation traditionally is mainly concerned 
with maintaining effluent standards. However, recent 
years have seen increased attention being paid to the 
integrated analysis of sewer networks, wastewater treat- 
ment plants and receiving waters, and many researchers 
have focused their work on integrated modeling and 
integrated control. Integrated control is characterized 
by two aspects [6]: 

• Integration of objectives: control objectives within
one subsystem may be based on criteria measured
in other subsystems.

• Integration of information: control decisions taken
in one subsystem may be based on information
about the state of other subsystems.

One of the most important improvements in the 
field of integrated modeling and integrated control 
is due to Schütze and Butler’s work [5,6,52,53,54,61].
However, other researchers have also studied the inte- 
grated modeling and integrated control of wastewater 
systems [2,12,13,21,39,49,50]. 
The concept of Young programming is based on the
Young inequality. Wide-ranging applications in
mechanics [9], statistics and decision theory [4,7], and
information theory [3] give some beautiful interpretations
of the Young inequality. In section 1 we briefly recall
the inequality in a form that is convenient to use in
the rest of the paper. In section 2 some basic facts of
linear programming are restated in a form that is easy
to generalize to Young programming. In section 3
Young programming is introduced and duality is
discussed. In section 4 a parametric form of Young
programming is shown to be an analytical approximation
of the corresponding linear programming problem.
Finally, in section 5 a row-action method for the solution
of Young programming problems is presented.
Interpreting the algorithm in terms of the dual problem leads
to an alternative method, which we call the dir-action method.


The Young Inequality 


The Young inequality was first published by W.H.
Young in 1912. A generalized form of the inequality
can be found in [5]. We recall the inequality in a form
that is convenient to use in the rest of the paper. Let
$\varphi: \mathbb{R} \to \mathbb{R}$ be a continuous, strictly decreasing function,
and consider the curve $\gamma_\varphi = \{(x, \varphi(x)): x \in \mathbb{R}\}$. The
following definition offers a way to describe how far
an arbitrary point $(u, v)$ of the plane is ‘away’ from the
curve $\gamma_\varphi$.


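Definition 1 Let $(u, v) \in \mathbb{R}^2$ and denote $\psi = \varphi^{-1}$; then

$$S_\varphi(u, v) = (u - \psi(v))\,v - \int_{\psi(v)}^{u} \varphi(t)\,dt = (v - \varphi(u))\,u - \int_{\varphi(u)}^{v} \psi(t)\,dt$$

is called the inaccuracy of $(u, v)$ with respect to $\gamma_\varphi$.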


Geometrically, $S_\varphi(u, v)$ is the area of the shaded region
in Fig. 1. The Young inequality states that

$$S_\varphi(u, v) \ge 0 \quad \text{for every } (u, v) \in \mathbb{R}^2,$$
Young Programming, Figure 1 (shaded region: $S_\varphi(u, v)$)


and equality holds if and only if $v = \varphi(u)$. The geometric
interpretation or a straightforward computation proves
the statement. The inaccuracy can be extended to $\mathbb{R}^{2n}$ as
follows.


Definition 2 Let $(u, v) \in \mathbb{R}^{2n}$, $u = (u_1, \dots, u_n)$,
$v = (v_1, \dots, v_n)$; then

$$S_\varphi(u, v) := \sum_{j=1}^{n} S_\varphi(u_j, v_j)$$

is called the inaccuracy of $(u, v) \in \mathbb{R}^{2n}$, i.e. the
inaccuracy is computed coordinatewise.
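The inaccuracy can be evaluated numerically straight from Definition 1. The sketch below is illustrative only: it uses a composite Simpson rule (an assumption; any quadrature would do) and the hypothetical choice $\varphi(t) = -t$, for which $\psi(v) = -v$ and a short computation gives $S_\varphi(u, v) = (u + v)^2/2$, which vanishes exactly on the curve $v = \varphi(u)$. Definition 2 is then just a coordinatewise sum.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule for the integral of f from a to b (n even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


def inaccuracy(phi, psi, u, v):
    """S_phi(u, v) = (u - psi(v)) * v - integral of phi from psi(v) to u (Definition 1)."""
    a = psi(v)
    return (u - a) * v - simpson(phi, a, u)


def inaccuracy_vec(phi, psi, u, v):
    """Inaccuracy on R^{2n}: the coordinatewise sum of Definition 2."""
    return sum(inaccuracy(phi, psi, uj, vj) for uj, vj in zip(u, v))


def phi(t):
    """Hypothetical strictly decreasing function phi(t) = -t."""
    return -t


def psi(v):
    """Inverse of phi: psi(v) = -v."""
    return -v


print(inaccuracy(phi, psi, 1.0, 2.0))   # approximately (1 + 2)**2 / 2 = 4.5
print(inaccuracy(phi, psi, 1.0, -1.0))  # 0: the point lies on the curve v = phi(u)
```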


Interesting examples of functions of the form $S_\varphi(u, v)$
are provided by certain discrepancy or divergence
functions. Let $f: \mathbb{R} \to \mathbb{R}$ be a strictly concave and
differentiable function. Then a class of divergence functions
can be generated by the following mapping
$D_f: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$,

$$D_f(u \| v) = f(v) + f'(v)(u - v) - f(u), \qquad u, v \in \mathbb{R}.$$

$D_f$ can serve as a ‘measure of distance’ between $u$ and
$v \in \mathbb{R}$ with respect to $f$, although strictly speaking $D_f$ is
not a distance function, since it is not symmetric and the
triangle inequality does not hold. The family of functions
of the form $D_f$ ($f$ not necessarily being of a sum form)
was introduced by L.M. Bregman [1].


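Obviously

$$D_f(u \| v) = f(v) + f'(v)(u - v) - f(u) = f'(v)(u - v) + \int_u^v f'(t)\,dt, \qquad u, v \in \mathbb{R}.$$

Let $f: \mathbb{R} \to \mathbb{R}$ be defined by $f' = \varphi$. It is easy to verify that

$$S_\varphi(u, \varphi(v)) = D_f(u \| v) \quad \text{for all } u, v \in \mathbb{R},$$

or alternatively

$$S_\varphi(u, v) = D_f(u \| \psi(v)) \quad \text{for all } u, v \in \mathbb{R}.$$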

Some special cases, divergence functions known in the
literature, are obtained by choosing (here $u, v > 0$)

• $\varphi(t) = -\ln t - 1$; then $S_\varphi(u, \varphi(v)) = u \ln(u/v) - u + v$,
known as the I-divergence, introduced by S. Kullback [10];

• $\varphi(t) = 1/t$; then $S_\varphi(u, \varphi(v)) = \ln(v/u) + u/v - 1$,
known as the Itakura-Saito divergence [6];

• $\varphi(t) = t^{\alpha - 1}$, $\alpha < 1$, $\alpha \neq 0$; then
$S_\varphi(u, \varphi(v)) = (v^\alpha - u^\alpha + \alpha v^{\alpha - 1}(u - v))/\alpha$,
known as Csiszár's $\alpha$-divergence [4].
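The identity $S_\varphi(u, \varphi(v)) = D_f(u \| v)$ with $f' = \varphi$ can be spot-checked numerically. In the sketch below (function names are illustrative, not from the text), $f(t) = -t \ln t$, $f(t) = \ln t$, and $f(t) = t^\alpha/\alpha$ are antiderivatives of the three choices of $\varphi$ listed above, and the generic formula for $D_f$ is compared with the closed forms for $u, v > 0$.

```python
import math


def divergence(f, fprime, u, v):
    """D_f(u || v) = f(v) + f'(v) * (u - v) - f(u) for a strictly concave f."""
    return f(v) + fprime(v) * (u - v) - f(u)


def i_divergence(u, v):
    return u * math.log(u / v) - u + v


def itakura_saito(u, v):
    return math.log(v / u) + u / v - 1


def alpha_divergence(u, v, a):
    return (v**a - u**a + a * v**(a - 1) * (u - v)) / a


u, v = 0.7, 2.5
# phi(t) = -ln t - 1 is f' for f(t) = -t ln t.
print(divergence(lambda t: -t * math.log(t), lambda t: -math.log(t) - 1, u, v),
      i_divergence(u, v))
# phi(t) = 1/t is f' for f(t) = ln t.
print(divergence(math.log, lambda t: 1 / t, u, v), itakura_saito(u, v))
# phi(t) = t**(a - 1) is f' for f(t) = t**a / a (a < 1, a != 0).
a = 0.5
print(divergence(lambda t: t**a / a, lambda t: t**(a - 1), u, v),
      alpha_divergence(u, v, a))
```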


Linear Programming 


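Since Young programming will be introduced as
an analytical approximation of linear programming, it
is convenient to recall some basic facts of linear
programming. Let $A$ be an $m \times n$ matrix and denote by
$a_1, \dots, a_m \in \mathbb{R}^n$ the rows of $A$. Denote by $\mathcal{L}$ and $\mathcal{L}^\perp$ the
row space of $A$ and the solution space of $Ax = 0$,
respectively. Let $\bar{x}, \bar{z} \in \mathbb{R}^n$ be arbitrary but fixed
vectors. Let us define the affine subspaces
$\bar{z} \oplus \mathcal{L} = \{z \in \mathbb{R}^n : z = \bar{z} + w \text{ for some } w \in \mathcal{L}\}$ and
$\bar{x} \oplus \mathcal{L}^\perp = \{x \in \mathbb{R}^n : x = \bar{x} + w \text{ for some } w \in \mathcal{L}^\perp\}$. Then clearly

$$x \in \bar{x} \oplus \mathcal{L}^\perp \iff Ax = A\bar{x},$$

and

$$z \in \bar{z} \oplus \mathcal{L} \iff z = \bar{z} + yA$$

for some $y \in \mathbb{R}^m$.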
Denote by $x_j$, $z_j$ the $j$th coordinate of $x$, $z$, respectively,
and consider the following feasibility problem, which is
in fact the linear programming problem in an
equilibrium form.


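Problem 3 Let $\bar{x}, \bar{z} \in \mathbb{R}^n$ be arbitrary but fixed vectors.
Find a feasible solution (if any) to the set of constraints

$$x \in \bar{x} \oplus \mathcal{L}^\perp, \quad z \in \bar{z} \oplus \mathcal{L}, \tag{1}$$
$$x \ge 0, \quad z \ge 0, \tag{2}$$
$$x_j z_j = 0, \quad j = 1, \dots, n. \tag{3}$$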


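The next lemma states some elementary but crucial
observations about Problem 3. Here $xz = \sum_{j=1}^n x_j z_j$
denotes the inner product.

Lemma 4 Let $\bar{x}, \bar{z} \in \mathbb{R}^n$ be arbitrary but fixed vectors.
i) If $x$ and $z$ satisfy (1), then $x\bar{z} + \bar{x}z = xz + \bar{x}\bar{z}$.

ii) If $x$ and $z$ satisfy (1)-(2), then $x\bar{z} + \bar{x}z \ge \bar{x}\bar{z}$.

iii) If $x$ and $z$ satisfy (1)-(3), then $x\bar{z} + \bar{x}z = \bar{x}\bar{z}$.

Proof By (1), $x - \bar{x} \in \mathcal{L}^\perp$ and $z - \bar{z} \in \mathcal{L}$, so
$(x - \bar{x})(z - \bar{z}) = 0$, which is i). Statements ii) and
iii) follow from i), since $xz \ge 0$ under (2) and $xz = 0$ under (3).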
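Statement i) of Lemma 4 is easy to confirm numerically. The sketch below uses a hypothetical $2 \times 3$ matrix $A$ whose null space $\mathcal{L}^\perp$ is spanned by $(1, -1, 1)$; it draws random points $x \in \bar{x} \oplus \mathcal{L}^\perp$ and $z = \bar{z} + yA \in \bar{z} \oplus \mathcal{L}$ and checks $x\bar{z} + \bar{x}z = xz + \bar{x}\bar{z}$ up to a floating-point tolerance.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical example matrix; its rows span L.
A = [[1, 1, 0],
     [0, 1, 1]]
NULL_VEC = [1, -1, 1]  # spans L-perp = {x : Ax = 0} for this particular A
assert all(dot(row, NULL_VEC) == 0 for row in A)


def check_lemma4_i(xbar, zbar, trials=100, seed=0, tol=1e-9):
    """Return True if x zbar + xbar z == x z + xbar zbar for random x, z satisfying (1)."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = rng.uniform(-3.0, 3.0)
        y = [rng.uniform(-3.0, 3.0) for _ in A]
        x = [xb + s * w for xb, w in zip(xbar, NULL_VEC)]  # x in xbar + L-perp
        z = [zb + sum(y[k] * A[k][j] for k in range(len(A)))  # z = zbar + yA
             for j, zb in enumerate(zbar)]
        if abs((dot(x, zbar) + dot(xbar, z)) - (dot(x, z) + dot(xbar, zbar))) > tol:
            return False
    return True


print(check_lemma4_i([1.0, 2.0, 0.5], [0.3, 1.0, 2.0]))  # True
```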


Let $\mathcal{P} = \{x : x \in \bar{x} \oplus \mathcal{L}^\perp,\ x \ge 0\}$ and
$\mathcal{D} = \{z : z \in \bar{z} \oplus \mathcal{L},\ z \ge 0\}$. The following problem
presents three equivalent settings of the linear
programming problem.


Problem 5 Let $\bar{x}, \bar{z} \in \mathbb{R}^n$ be arbitrary but fixed vectors.

i) (Equilibrium form) Find a feasible solution to (1)-(3).

ii) (Optimization form) Find a feasible solution to
(1)-(2) such that $\sum_{j=1}^n x_j z_j$ is minimal.

iii) (Primal-dual form) Find solutions to both
problems:

primal: $\min \{ \sum_{j=1}^n x_j \bar{z}_j : x \in \mathcal{P} \}$,

dual: $\min \{ \sum_{j=1}^n \bar{x}_j z_j : z \in \mathcal{D} \}$.


The next theorem restates the well-known duality
theorem of linear programming in three equivalent forms
corresponding to the problem settings in Problem 5.

Theorem 6 (Duality theorem) Let $\bar{x}, \bar{z} \in \mathbb{R}^n$ be
arbitrary but fixed vectors.

i) If (1)-(2) is feasible, then (1)-(3) is feasible.

ii) If (1)-(2) is feasible, then there are $x^* \in \mathcal{P}$, $z^* \in \mathcal{D}$
such that $\sum_{j=1}^n x_j^* z_j^* = 0$.

iii) If $\mathcal{P} \neq \emptyset$ and $\mathcal{D} \neq \emptyset$, then there are optimal solutions
$x^* \in \mathcal{P}$ and $z^* \in \mathcal{D}$ to both the primal and the dual
problems, respectively; furthermore $x^* z^* = 0$.


The standard way in the literature is to prove Theorem
6 iii) directly; i) and ii) are then easy corollaries. Finally,
let us point out that if we drop the assumption in
Definition 1 that the curve $\gamma$ is the graph of a function $\varphi$,
then the term $x_j z_j$ can be interpreted as
the inaccuracy of $(x_j, z_j)$ with respect to
$\gamma = \{(x, z): x \ge 0,\ z \ge 0,\ xz = 0\}$. In the next section an equilibrium
function $\varphi: \mathbb{R}_+ \to \mathbb{R}_+$ will be introduced as an analytical
approximation of $\gamma$.
Let $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ be a continuous, strictly decreasing function with $\lim_{x \to 0+} \varphi(x) = \infty$ and $\lim_{x \to \infty} \varphi(x) = 0$. Let $\gamma_\varphi = \{(x, \varphi(x)) : x \in \mathbb{R}_+\}$ and denote $\psi = \varphi^{-1}$. According to Definition 1, the inaccuracy of $(x, z) \in \mathbb{R}^2$, $x > 0$, $z > 0$, with respect to $\gamma_\varphi$ is

$$S_\varphi(x, z) = (x - \psi(z))\, z - \int_{\psi(z)}^{x} \varphi(t)\, dt.$$


The basic properties of $S_\varphi(x, z)$ are summarized in the following lemma.


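Lemma 7 Let $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ be a continuous, strictly decreasing function with $\lim_{x \to 0+} \varphi(x) = \infty$, $\lim_{x \to \infty} \varphi(x) = 0$. Let $\psi = \varphi^{-1}$. Then

i) $S_\varphi(x, z)$ is a strictly convex function in $x > 0$ and in $z > 0$, respectively.

ii) $S_\varphi(x, z) \ge 0$ for every $x > 0$, $z > 0$, and $S_\varphi(x, z) = 0$ if and only if $x = \psi(z)$.

iii) $\frac{\partial}{\partial x} S_\varphi(x, z) = z - \varphi(x)$ for every $x > 0$, $z > 0$.

iv) $\frac{\partial}{\partial z} S_\varphi(x, z) = x - \psi(z)$ for every $x > 0$, $z > 0$.

v) $\lim_{x \to 0+} \frac{\partial}{\partial x} S_\varphi(x, z) = -\infty$ for every $z > 0$.
vi) $\lim_{z \to 0+} \frac{\partial}{\partial z} S_\varphi(x, z) = -\infty$ for every $x > 0$.
vii) $\lim_{x \to \infty} \frac{\partial}{\partial x} S_\varphi(x, z) = z$ for every $z > 0$.
viii) $\lim_{z \to \infty} \frac{\partial}{\partial z} S_\varphi(x, z) = x$ for every $x > 0$.
ix) $\lim_{x \to \infty} S_\varphi(x, z) = \infty$ for every $z > 0$.
x) $\lim_{z \to \infty} S_\varphi(x, z) = \infty$ for every $x > 0$.
xi) $S_\varphi(x, z) = S_\psi(z, x)$ for every $x > 0$ and $z > 0$.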


Proof Elementary computation proves each property. 
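To make Lemma 7 concrete, the sketch below evaluates $S_\varphi$ directly from its definition with a composite Simpson rule. The choice $\varphi(t) = 1/t^2$, so that $\psi(y) = y^{-1/2}$, is an assumption for illustration; for it the definition yields the closed form $S_\varphi(x, z) = xz - 2\sqrt{z} + 1/x$, which the printed checks of properties ii), iii) and xi) can be compared against.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule for the integral of f from lo to hi (n even; lo > hi allowed)."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def inaccuracy(x, z, phi, psi):
    """S_phi(x, z) = (x - psi(z)) z - integral_{psi(z)}^{x} phi(t) dt, for x, z > 0."""
    return (x - psi(z)) * z - simpson(phi, psi(z), x)


def phi(t):
    """Assumed equilibrium function: decreasing, phi(0+) = inf, phi(inf) = 0."""
    return 1.0 / t ** 2


def psi(y):
    """Its inverse, psi = phi^{-1}."""
    return 1.0 / math.sqrt(y)


x, z, h = 2.0, 1.0, 1e-4
print(inaccuracy(x, z, phi, psi))          # closed form x z - 2 sqrt(z) + 1/x gives 0.5
print(inaccuracy(psi(z), z, phi, psi))     # ii): zero on the curve x = psi(z)
dSdx = (inaccuracy(x + h, z, phi, psi) - inaccuracy(x - h, z, phi, psi)) / (2 * h)
print(dSdx, z - phi(x))                    # iii): central difference vs. z - phi(x)
print(inaccuracy(z, x, psi, phi))          # xi): S_psi(z, x) equals S_phi(x, z)
```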


The inaccuracy of $(x, z) \in \mathbb{R}^{2n}$, $x > 0$, $z > 0$, with respect to $\gamma_\varphi$, as introduced in Definition 2, is computed coordinatewise, i.e. for every $x = (x_1, \dots, x_n)$, $x_j > 0$, and $z = (z_1, \dots, z_n)$, $z_j > 0$, $j = 1, \dots, n$,

$$S_\varphi(x, z) = \sum_{j=1}^{n} S_\varphi(x_j, z_j).$$


In the rest of this section Young programming is presented in complete analogy to linear programming as discussed in section 2. Let us consider the following feasibility problem, which is in fact the Young programming problem in an equilibrium form.


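Problem 8 Let $\bar x > 0$, $\bar z > 0$ be arbitrary but fixed vectors. Find a feasible solution to the set of constraints below.

$$x \in \bar x \oplus \mathcal{L}^{\perp}, \quad z \in \bar z \oplus \mathcal{L}, \tag{4}$$
$$x > 0, \quad z > 0, \tag{5}$$
$$x_j = \psi(z_j), \quad j = 1, \dots, n. \tag{6}$$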


The next lemma states some elementary but crucial observations about Problem 8.


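Lemma 9 Let $\bar x > 0$, $\bar z > 0$ be arbitrary but fixed vectors.

i) If $x$ and $z$ satisfy (4)-(5), then $S_\varphi(x, \bar z) + S_\varphi(\bar x, z) = S_\varphi(x, z) + S_\varphi(\bar x, \bar z)$.

ii) If $x$ and $z$ satisfy (4)-(5), then $S_\varphi(x, \bar z) + S_\varphi(\bar x, z) \ge S_\varphi(\bar x, \bar z)$.

iii) If $x$ and $z$ satisfy (4)-(6), then $S_\varphi(x, \bar z) + S_\varphi(\bar x, z) = S_\varphi(\bar x, \bar z)$.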


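Proof Since $S_\varphi(x, z) = x^{\top} z - F(x) - G(z)$, where $F(x) = \sum_j \int_1^{x_j} \varphi(t)\, dt$ and $G(z) = \sum_j \big(\psi(z_j) z_j - \int_1^{\psi(z_j)} \varphi(t)\, dt\big)$ are separable, i) reduces to Lemma 4i); ii) and iii) then follow from Lemma 7ii).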
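Lemma 9 can be checked numerically as well. The sketch below assumes $\varphi(t) = 1/t$, for which the definition of $S_\varphi$ gives $S_\varphi(x, z) = xz - 1 - \ln(xz)$, and uses hypothetical data with $\mathcal{L}$ spanned by the single row $(1, 1, 1)$. It tests identity i) at an arbitrary feasible pair and equality iii) at $x^* = (2, 2, 2)$, $z^* = (1/2, 1/2, 1/2)$, which satisfies (4)-(6) for this data.

```python
import math


def S(x, z):
    """Coordinatewise sum of S_phi(x_j, z_j) for the assumed phi(t) = 1/t."""
    return sum(xj * zj - 1.0 - math.log(xj * zj) for xj, zj in zip(x, z))


x_bar, z_bar = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]   # hypothetical data, L = span{(1, 1, 1)}

# i) any positive x in x_bar + L^perp and z in z_bar + L
x = [1.5, 2.0, 2.5]        # x - x_bar = (0.5, 0, -0.5) is orthogonal to (1, 1, 1)
z = [1.4, 1.4, 1.4]        # z - z_bar = 0.4 * (1, 1, 1)
lhs_i = S(x, z_bar) + S(x_bar, z)
rhs_i = S(x, z) + S(x_bar, z_bar)
print(lhs_i, rhs_i)        # i): equal up to rounding

# iii) the equilibrium pair x_j* = psi(z_j*) = 1 / z_j*
x_star, z_star = [2.0, 2.0, 2.0], [0.5, 0.5, 0.5]
lhs_iii = S(x_star, z_bar) + S(x_bar, z_star)
print(lhs_iii, S(x_bar, z_bar), 3 - math.log(6))   # iii): the three values agree
```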


Let $P_+ = \{x : x \in \bar x \oplus \mathcal{L}^{\perp},\ x > 0\}$ and $D_+ = \{z : z \in \bar z \oplus \mathcal{L},\ z > 0\}$. The following problem presents three equivalent settings of the Young programming problem.


Problem 10 Let $\bar x > 0$, $\bar z > 0$ be arbitrary but fixed vectors.

i) (Equilibrium form) Find a feasible solution to (4)-(6).

ii) (Optimization form) Find a feasible solution to (4)-(5) such that $\sum_{j=1}^{n} S_\varphi(x_j, z_j)$ is minimal.

iii) (Primal-dual form) Find solutions to both problems below:
primal
$$\min\Big\{\sum_{j=1}^{n} S_\varphi(x_j, \bar z_j) : x \in P_+\Big\},$$
dual
$$\min\Big\{\sum_{j=1}^{n} S_\varphi(\bar x_j, z_j) : z \in D_+\Big\}.$$


The primal and dual Young programming problems as defined in Problem 10 are symmetrical in the sense that the dual of the dual is the primal. For a proof of this statement, see [8]. The next theorem presents three equivalent forms of the duality theorem corresponding to the problem settings in Problem 10.


Theorem 11 (Duality theorem) Let $\bar x > 0$, $\bar z > 0$ be arbitrary but fixed vectors.

i) The system (4)-(6) is feasible.

ii) There are unique $x^* \in P_+$ and $z^* \in D_+$ such that $\sum_{j=1}^{n} S_\varphi(x_j^*, z_j^*) = 0$.

iii) There exist $x^* \in P_+$ and $z^* \in D_+$, unique optimal solutions to the primal and the dual problems, respectively. Furthermore $x_j^* = \psi(z_j^*)$, $j = 1, \dots, n$.


Instead of referring to proofs of more general statements (duality theorems of convex programming, see e.g. [11]) we prefer to give a short proof of Theorem 11iii).

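Proof The proof can be formulated on both the primal and the dual sides. To emphasize the symmetry we show the proof in both cases.

Proof on the dual side. Because of Lemma 7i), 7vi) and 7x) the dual objective function attains its unique minimum in $D_+$. Denote the minimizer by $z^*$ and let $x_j^* = \psi(z_j^*) > 0$, $j = 1, \dots, n$. Suppose that $a_i^{\top} x^* \ne a_i^{\top} \bar x$ for some $i$. Let $z(\theta) := z^* + \theta a_i$. Clearly $z(\theta) \in D_+$ for small enough $|\theta|$, and

$$\frac{d}{d\theta} \sum_{j=1}^{n} S_\varphi\big(\bar x_j, z_j(\theta)\big)\Big|_{\theta = 0} = a_i^{\top} \bar x - a_i^{\top} x^* \ne 0,$$

which contradicts the assumption that $z^*$ is the dual optimal solution. Hence $a_i^{\top} x^* = a_i^{\top} \bar x$ for every $i$, i.e. $x^* \in P_+$, and the pair $x^*, z^*$ satisfies (4)-(6); by Lemma 9ii) and 9iii), $\sum_j S_\varphi(x_j, \bar z_j) \ge \sum_j S_\varphi(x_j^*, \bar z_j)$ for every $x \in P_+$. Therefore $x^*$ is the primal optimal solution.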

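Proof on the primal side. Because of Lemma 7i), 7v) and 7ix) the primal objective function attains its unique minimum in $P_+$. Denote the minimizer by $x^*$ and let $z_j^* = \varphi(x_j^*) > 0$, $j = 1, \dots, n$. Suppose that $z^* \notin \bar z \oplus \mathcal{L}$, i.e. $z^* - \bar z \notin \mathcal{L}(a_1, \dots, a_m)$. Then there exists an $\tilde x \in \mathcal{L}^{\perp}(a_1, \dots, a_m)$ such that $\tilde x^{\top}(z^* - \bar z) \ne 0$. Let $x(\theta) := x^* + \theta \tilde x$. Clearly $x(\theta) \in P_+$ for small enough $|\theta|$, and

$$\frac{d}{d\theta} \sum_{j=1}^{n} S_\varphi\big(x_j(\theta), \bar z_j\big)\Big|_{\theta = 0} = \tilde x^{\top}(\bar z - z^*) \ne 0,$$

which contradicts the assumption that $x^*$ is the primal optimal solution. Hence $z^* \in D_+$, and, as on the dual side, Lemma 9ii) and 9iii) show that $z^*$ is the dual optimal solution. This completes the proof of iii).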

To show that i), ii), iii) are equivalent statements it is enough to note that $S_\varphi(x, z) \ge 0$ for every $x > 0$, $z > 0$, and $S_\varphi(x, z) = 0$ if and only if $x_j = \psi(z_j)$, $j = 1, \dots, n$.


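The next corollary points out that the optimal solutions of the primal and dual problems do not depend on the choice of the parameter vectors within their affine sets: $\bar x$ may be replaced by any $x \in \bar x \oplus \mathcal{L}^{\perp}$, $x > 0$, and $\bar z$ by any $z \in \bar z \oplus \mathcal{L}$, $z > 0$.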


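Corollary 12 Let $\bar x$, $\bar z$ be given. There exist unique $x^* \in \bar x \oplus \mathcal{L}^{\perp}$, $z^* \in \bar z \oplus \mathcal{L}$ such that $x_j^* = \psi(z_j^*)$, $j = 1, \dots, n$, and

$$S_\varphi(x^*, z) + S_\varphi(x, z^*) = S_\varphi(x, z) \quad \forall x \in \bar x \oplus \mathcal{L}^{\perp},\ x > 0,\ \ \forall z \in \bar z \oplus \mathcal{L},\ z > 0.$$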


Proof Obvious from Theorem 11. 


Another important implication of the duality theorem 
is noted in the following corollary. 


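Corollary 13 Let $\bar x$, $\bar z$ be given and denote $x^* = \operatorname{argmin}\{S_\varphi(x, \bar z) : x \in \bar x \oplus \mathcal{L}^{\perp},\ x > 0\}$ and $z^* = \operatorname{argmin}\{S_\varphi(\bar x, z) : z \in \bar z \oplus \mathcal{L},\ z > 0\}$. Suppose that $\mathcal{L}' \subset \mathcal{L}$ and denote $z'^* = \operatorname{argmin}\{S_\varphi(\bar x, z) : z \in \bar z \oplus \mathcal{L}',\ z > 0\}$. Then

$$x^* = \operatorname{argmin}\{S_\varphi(x, z'^*) : x \in \bar x \oplus \mathcal{L}^{\perp},\ x > 0\}.$$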


Proof Since $z'^* \in \bar z \oplus \mathcal{L}' \subset \bar z \oplus \mathcal{L}$, the statement follows from Corollary 12.


Observations of Corollary 12 and Corollary 13 give rise 
to the row-action method proposed in section 5. 


Approximation of the LP Problems 


In this section we introduce a parameter $\varepsilon > 0$ in the Young programming problem, and prove that the sequence of optimal solutions of parametric Young programming problems converges to an optimal solution of the corresponding linear programming problem.


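Definition 14 Let $\varepsilon > 0$ and let $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ be a continuous, strictly decreasing function with $\lim_{x \to 0+} \varphi(x) = \infty$, $\lim_{x \to \infty} \varphi(x) = 0$. Define $\varphi_\varepsilon(x) := \varepsilon\, \varphi(x)$.

Denote $\psi_\varepsilon = \varphi_\varepsilon^{-1}$. Clearly $\psi_\varepsilon(x) = \psi(x/\varepsilon)$, where $\psi = \varphi^{-1}$. Let us consider the following parametric version of the Young programming primal-dual pair.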


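Problem 15 Let $\bar x > 0$, $\bar z > 0$ be arbitrary but fixed vectors. Find a solution to both problems below:

primal
$$\min\Big\{\sum_{j=1}^{n} S_{\varphi_\varepsilon}(x_j, \bar z_j) : x \in \bar x \oplus \mathcal{L}^{\perp},\ x > 0\Big\},$$
dual
$$\min\Big\{\sum_{j=1}^{n} S_{\varphi_\varepsilon}(\bar x_j, z_j) : z \in \bar z \oplus \mathcal{L},\ z > 0\Big\}.$$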


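For every given $\varepsilon > 0$, denote by $x^*(\varepsilon)$, $z^*(\varepsilon)$ the optimal solutions to Problem 15. The next theorem points out that the sets of optimal solutions $\{x^*(\varepsilon) : \varepsilon < \varepsilon_0\}$ and $\{z^*(\varepsilon) : \varepsilon < \varepsilon_0\}$ are bounded for a small enough $\varepsilon_0 > 0$.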


Theorem 16 Let $\varepsilon < \varepsilon_0 = \min_j \bar z_j / \varphi(\bar x_j)$. Then there exists a $K \in \mathbb{R}_+$ such that $x_j^*(\varepsilon) \le K$ and $z_j^*(\varepsilon) \le K$ for $j = 1, \dots, n$.


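Proof The optimality of $x^*(\varepsilon)$ and $z^*(\varepsilon)$ implies that

$$\sum_{j=1}^{n} S_{\varphi_\varepsilon}(x_j^*(\varepsilon), \bar z_j) + \sum_{j=1}^{n} S_{\varphi_\varepsilon}(\bar x_j, z_j^*(\varepsilon)) = \sum_{j=1}^{n} S_{\varphi_\varepsilon}(\bar x_j, \bar z_j).$$

Let $\varepsilon < \varepsilon_0 = \min_j \bar z_j / \varphi(\bar x_j)$. It is clear from the geometric interpretation of $S_\varphi(x, z)$ in Fig. 1 that $S_{\varphi_\varepsilon}(\bar x_j, \bar z_j) \le \bar x_j \bar z_j$ for every $\varepsilon < \varepsilon_0$ and $j = 1, \dots, n$. Since $S_{\varphi_\varepsilon}(\bar x_j, z_j^*(\varepsilon)) \ge 0$, $j = 1, \dots, n$, we have that

$$\sum_{j=1}^{n} S_{\varphi_\varepsilon}(x_j^*(\varepsilon), \bar z_j) \le \sum_{j=1}^{n} \bar x_j \bar z_j$$

for every $\varepsilon < \varepsilon_0$, which implies the boundedness of $x_j^*(\varepsilon)$, $j = 1, \dots, n$, since $\lim_{x \to \infty} S_{\varphi_\varepsilon}(x, \bar z_j) = \infty$. The proof of the boundedness of $z_j^*(\varepsilon)$, $j = 1, \dots, n$, follows the same line of thought.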
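The bound $S_{\varphi_\varepsilon}(\bar x_j, \bar z_j) \le \bar x_j \bar z_j$ used in the proof can also be checked numerically. Assuming again $\varphi(t) = 1/t$, we get $\varphi_\varepsilon(t) = \varepsilon/t$, $\psi_\varepsilon(y) = \varepsilon/y$ and $S_{\varphi_\varepsilon}(x, z) = xz - \varepsilon - \varepsilon \ln(xz/\varepsilon)$, and the threshold becomes $\varepsilon_0 = \min_j \bar x_j \bar z_j$. The data below are hypothetical.

```python
import math


def s_eps(x, z, eps):
    """S_{phi_eps}(x, z) for phi_eps(t) = eps / t, i.e. the assumed phi(t) = 1/t."""
    return x * z - eps - eps * math.log(x * z / eps)


x_bar = [1.0, 2.0, 0.5]      # hypothetical data
z_bar = [0.3, 1.0, 4.0]
eps0 = min(zb * xb for xb, zb in zip(x_bar, z_bar))   # min_j z_bar_j / phi(x_bar_j)

for eps in (0.29, 0.1, 1e-2, 1e-4, 1e-8):
    values = [s_eps(xb, zb, eps) for xb, zb in zip(x_bar, z_bar)]
    ok = all(0.0 <= v <= xb * zb for v, xb, zb in zip(values, x_bar, z_bar))
    print(eps, ok)           # 0 <= S_{phi_eps}(x_bar_j, z_bar_j) <= x_bar_j z_bar_j
```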
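Corollary 17 Let $x^*(\varepsilon)$, $z^*(\varepsilon)$ be optimal solutions of Problem 15. Then

$$\lim_{\varepsilon \to 0} x_j^*(\varepsilon)\, z_j^*(\varepsilon) = 0, \quad j = 1, \dots, n.$$

Proof If $\lim_{\varepsilon \to 0} z_j^*(\varepsilon) = 0$, then due to the boundedness of $x_j^*(\varepsilon)$ the corollary follows. If $z_j^*(\varepsilon) \ge a > 0$ for all $\varepsilon < \varepsilon_0$, then $\lim_{\varepsilon \to 0} x_j^*(\varepsilon) = \lim_{\varepsilon \to 0} \psi(z_j^*(\varepsilon)/\varepsilon) = 0$ and the corollary follows.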


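Putting together the observations of this section we get that $\lim_{\varepsilon \to 0} x^*(\varepsilon) = x^*$ and $\lim_{\varepsilon \to 0} z^*(\varepsilon) = z^*$ are optimal solutions of the linear programming primal-dual pair corresponding to Problem 15, i.e.

primal
$$\min\Big\{\sum_{j=1}^{n} x_j \bar z_j : x \in \bar x \oplus \mathcal{L}^{\perp},\ x \ge 0\Big\},$$
dual
$$\min\Big\{\sum_{j=1}^{n} \bar x_j z_j : z \in \bar z \oplus \mathcal{L},\ z \ge 0\Big\}.$$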
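A minimal numerical sketch of this limit, under assumptions not in the original: $\varphi(t) = 1/t$, $n = 3$ and $\mathcal{L} = \mathrm{span}\{(1, 1, 1)\}$, so that $x \in \bar x \oplus \mathcal{L}^{\perp}$ means $\sum_j x_j = \sum_j \bar x_j$ and every $z \in \bar z \oplus \mathcal{L}$ has the form $\bar z + \beta(1, 1, 1)$. By Theorem 11iii) the optimal pair of Problem 15 satisfies $x_j = \psi_\varepsilon(z_j) = \varepsilon / z_j$, so only the scalar $\beta$ has to be found, here by bisection. For this $\varphi$ the equilibrium condition reads $x_j z_j = \varepsilon$, i.e. the central-path equations familiar from primal-dual interior-point methods. With the hypothetical data $\bar x = (1, 1, 1)$, $\bar z = (1, 2, 3)$ the linear programs have the solutions $x^* = (3, 0, 0)$ and $z^* = (0, 1, 2)$, and the printed pairs approach them as $\varepsilon \to 0$.

```python
def parametric_pair(x_bar, z_bar, eps, iters=200):
    """Optimal pair of Problem 15 for phi(t) = 1/t and L = span{(1, ..., 1)} (both assumed)."""
    total = sum(x_bar)                        # x in x_bar + L^perp  <=>  sum(x) = sum(x_bar)

    def g(beta):                              # strictly decreasing on (-min(z_bar), inf)
        return sum(eps / (zb + beta) for zb in z_bar) - total

    lo, hi = -min(z_bar), 1.0                 # g -> +inf as beta decreases to lo
    while g(hi) > 0:                          # enlarge hi until the root is bracketed
        hi *= 2.0
    for _ in range(iters):                    # bisection; lo itself is never evaluated
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    z = [zb + beta for zb in z_bar]           # z in z_bar + L
    x = [eps / zj for zj in z]                # x_j = psi_eps(z_j) = eps / z_j
    return x, z


x_bar, z_bar = [1.0, 1.0, 1.0], [1.0, 2.0, 3.0]   # hypothetical data
for eps in (1.0, 1e-2, 1e-4, 1e-6):
    x, z = parametric_pair(x_bar, z_bar, eps)
    gap = sum(xj * zj for xj, zj in zip(x, z))    # equals n * eps for this phi
    print(eps, [round(v, 4) for v in x], [round(v, 4) for v in z], gap)
```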


Algorithms 


Row-action algorithms as defined in [2] are those that use only the previous iterate in each iterative step and require access to only one row of the system of equations of the constraint set at a time. A row-action method was first suggested in the pioneering work of Bregman [1]. The following algorithm is a row-action method for the solution of the Young programming problem as stated in Problem 10iii).


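Initialize: $z^0 := \bar z$.

Step $k$ ($k = 1, 2, \dots$), with $i := k \ (\mathrm{mod}\ m)$:
$$x^k = \operatorname{argmin}\{S_\varphi(x, z^{k-1}) : a_i^{\top} x = a_i^{\top} \bar x\},$$
$$z_j^k = \varphi(x_j^k), \quad j = 1, \dots, n.$$

Algorithm 1: row-action method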


It may be interesting to point out that, in terms of the dual, Step k reads as follows:

Step $k$:
$$z^k = z^{k-1} + \beta a_i, \qquad \beta = \operatorname{argmin}\{S_\varphi(\bar x, z^{k-1} + \beta a_i) : \beta \in \mathbb{R}\}.$$

That is, in dual terms, Step k is a one-dimensional minimization problem along the row vector $a_i^{\top}$, $i = k \ (\mathrm{mod}\ m)$, at each step. The convergence of this algorithm was shown by I. Csiszár [4] if either of the following two assumptions holds:

[A1] the set $\{x : a_i^{\top} x = a_i^{\top} \bar x,\ x > 0\}$ is bounded for at least one $i$, $1 \le i \le m$;

[A2] $\int_a^{\infty} \varphi(t)\, dt = \infty$ for some $a > 0$.

Typically A1) holds for problems involving discrete random variables, where the sum of the components is one. A2) holds, for example, for $\varphi(x) = x^{\alpha - 1}$, $0 < \alpha < 1$. Note that, due to Lemma 7xi), the dual objective function can be rewritten as $\sum_{j=1}^{n} S_\psi(z_j, \bar x_j)$, where $\psi = \varphi^{-1}$. Then the duality theorem (Theorem 11) enables us to add the remark that convergence is also ensured if

[A2'] $\int_a^{\infty} \psi(t)\, dt = \infty$ for some $a > 0$.

For example, A2') holds for $\varphi(x) = x^{\alpha - 1}$, $\alpha < 0$.
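Algorithm 1 is easy to prototype in its dual form. The Python sketch below is one possible implementation, not the authors' code. It assumes $\varphi(t) = 1/t$ (so $\psi(y) = 1/y$), the starting point $z^0 = \bar z$, and a bisection line search for $\beta$, which is justified because $\beta \mapsto a_i^{\top}\psi(z^{k-1} + \beta a_i)$ is strictly decreasing on the interval where $z^{k-1} + \beta a_i > 0$; the first-order condition of the one-dimensional dual problem is $a_i^{\top}\psi(z^{k-1} + \beta a_i) = a_i^{\top}\bar x$. The stopping rule (largest constraint residual below `tol` after a full sweep), the function names and the test data are likewise assumptions.

```python
import math


def psi(y):
    """Inverse of the assumed equilibrium function phi(t) = 1/t (phi is its own inverse)."""
    return 1.0 / y


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def probe(lo, hi):
    """A trial point strictly inside (lo, hi); an unbounded side is explored by doubling."""
    if math.isinf(lo) and math.isinf(hi):
        return 0.0
    if math.isinf(lo):
        return hi - 1.0 - abs(hi)
    if math.isinf(hi):
        return lo + 1.0 + abs(lo)
    return 0.5 * (lo + hi)


def dual_step(a, z, target, iters=200):
    """Solve a^T psi(z + beta a) = target for beta (the 1-D minimization in Step k)."""
    lo, hi = -math.inf, math.inf             # keep z + beta a > 0 componentwise
    for aj, zj in zip(a, z):
        if aj > 0:
            lo = max(lo, -zj / aj)
        elif aj < 0:
            hi = min(hi, -zj / aj)
    for _ in range(iters):                   # bisection: the left side decreases in beta
        beta = probe(lo, hi)
        if dot(a, [psi(zj + beta * aj) for aj, zj in zip(a, z)]) > target:
            lo = beta
        else:
            hi = beta
    return probe(lo, hi)


def row_action(A, x_bar, z_bar, tol=1e-10, max_sweeps=1000):
    """Cyclic row-action sketch of Algorithm 1 in dual form, started at z^0 = z_bar."""
    targets = [dot(a, x_bar) for a in A]     # a_i^T x_bar
    z = list(z_bar)
    for sweep in range(1, max_sweeps + 1):
        for a, target in zip(A, targets):    # rows taken cyclically, i = k (mod m)
            beta = dual_step(a, z, target)
            z = [zj + beta * aj for zj, aj in zip(z, a)]   # z^k = z^{k-1} + beta a_i
        x = [psi(zj) for zj in z]            # x_j = psi(z_j), i.e. z_j = phi(x_j)
        if max(abs(dot(a, x) - t) for a, t in zip(A, targets)) < tol:
            break
    return x, z, sweep


# Hypothetical data: two overlapping rows, x_bar > 0, z_bar > 0.
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
x, z, sweeps = row_action(A, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(sweeps, [round(v, 4) for v in x], [round(v, 4) for v in z])
```

Because every update adds a multiple of a row $a_i$, the iterates stay in $\bar z \oplus \mathcal{L}$ automatically; only the primal constraints $a_i^{\top} x = a_i^{\top}\bar x$ need to be monitored.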

The convergence of Algorithm 1 without any further assumption, although likely to be true, remains an open question to the best knowledge of the authors. Finally we present a small numerical example to display the steps of Algorithm 1.


Example 18 Let us consider the following Young programming problem:

$$\min \sum_{j=1}^{4} S_\varphi(x_j, \bar z_j)$$

with $\varphi(t) = 1/t$. The steps that Algorithm 1 takes on this example are arranged in the table below (iterations were stopped when the stopping criterion was met for all rows $i = 1, 2, 3, 4$).

k    x_1      x_2      x_3     x_4
0    6.000   10.000   3.000   1.000   0.596108
1    9.093    8.585   5.747   2.023   0.137325
2    9.655   10.002   5.426   3.139   0.038966
3    8.815   10.386   4.679   2.861   0.018048
4    8.898   10.597   4.632   3.028   0.016183
5    8.827   10.630   4.568   3.004   0.016004
6    8.834   10.648   4.564   3.018   0.016003
7    8.828   10.650   4.559   3.016   0.016003
8    8.828   10.651   4.564   3.017   0.016003
9    8.828   10.652   4.558   3.017   0.016003

So, the optimal solution is $(x^*)^{\top} = (8.828,\ 10.652,\ 4.558,\ 3.017)$. If we solve the dual problem, then $z_j^k = 1/x_j^k$, $j = 1, 2, 3, 4$, for every $k = 1, \dots, 9$, so the optimal solution for the dual problem is $(z^*)^{\top} = (0.113,\ 0.094,\ 0.219,\ 0.331)$.
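Since $\varphi(t) = 1/t$ is its own inverse, the dual solution follows from the primal one by $z_j^* = \varphi(x_j^*) = 1/x_j^*$. A quick Python check of the values reported in the example:

```python
x_star = [8.828, 10.652, 4.558, 3.017]          # primal optimum from the table
z_star = [round(1.0 / xj, 3) for xj in x_star]  # z_j* = phi(x_j*) = 1 / x_j*
print(z_star)                                   # [0.113, 0.094, 0.219, 0.331]
```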


See also 


> Linear Programming 
Preface to the Index Volume


This volume comprises the index to volumes 1-6 of the ENCYCLOPEDIA OF OPTIMIZATION. It contains two 
indices: a Subject Index, and a Name Index. In these indices, part of the information in the articles has been 
‘inverted’. 

To understand the contents of these indices, recall that each article in the first six volumes has the following global structure:
• title (in bold)
• text of the article; important notions are printed in italics, references to other articles are printed in bold, and the first mention of a scientist includes his/her initials
• bibliography
• author(s)
• AMS 2000 classification code
• list of keywords and phrases


Name Index. The Name Index has an entry for each scientist explicitly mentioned in the text of an article. The entry lists the titles of the articles in which that person is mentioned. The Name Index is alphabetically sorted according to the scientist’s last name.


Subject Index. The Subject Index contains entries of four types, using different fonts:
• article titles (in bold)
• phrases marked in the articles as being important (in italics, as in the articles)
• keywords and phrases as listed underneath each article (in a plain font)
• rotations of the three entries above (in a sans-serif font)


Article Titles. For each article title we first list the AMS 2000 subject classification code, then the titles of articles 
that refer to the article (i. e., those that mention the article), and, last, the titles of articles to which the article refers 
(i. e., those mentioned and printed in a bold font in the text of the article). 


Important Phrases. For each such phrase we give the list of article titles in which the phrase (or a standard form of it) appears, and all the AMS classification codes associated with these articles. These codes are thus taken from the articles.


Keywords and Phrases. For each such entry we give the list of article titles having exactly this word or phrase in the 
Keywords and phrases section, and all the AMS classification codes associated with these articles. 


Rotations. The rotation set of a phrase is formed by phrases obtained by successively moving the first part of the 
initial phrase to the end. If the phrase thus obtained does not start with an uninformative word (like ‘the’, ‘its’, 
‘to’), the phrase belongs to the rotation set of the initial phrase. 

Rotation entries list the last part and refer to the actual index entry. They allow you to locate an exact phrase 
when you only know a word occurring somewhere in it. 
An example to clarify this follows:
Suppose the initial phrase is “Adaptive simulated annealing and its application to protein folding”. Then its rotations are


1) simulated annealing and its application to protein folding see: Adaptive - 
2) annealing and its application to protein folding see: Adaptive simulated - 
3) and its application to protein folding see: Adaptive simulated annealing - 
4) its application to protein folding see: Adaptive simulated annealing and - 
5) application to protein folding see: Adaptive simulated annealing and its - 
6) to protein folding see: Adaptive simulated annealing and its application - 
7) protein folding see: Adaptive simulated annealing and its application to - 
8) folding see: Adaptive simulated annealing and its application to protein - 


Now, 3), 4), 6) clearly are not very helpful as regards the index, and only 1), 2), 5), 7), 8) remain and form the 
rotation set. So, by looking in the Subject Index for entries starting with any of the words ‘simulated’, ‘annealing’, 
‘application’, ‘protein’, ‘folding’, you will find the phrase ‘Adaptive simulated annealing and its application to 
protein folding’ as well as others! 
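The rotation rule is simple enough to state as code. The following Python sketch is illustrative only and is not the software used to produce this index; the set of uninformative words is an assumption, seeded with the words named above (‘the’, ‘its’, ‘to’) plus ‘and’, which the example also excludes.

```python
# Hypothetical stoplist; the preface only names 'the', 'its' and 'to' as examples.
UNINFORMATIVE = {"the", "its", "to", "and", "of", "in", "a", "an", "for"}


def rotation_set(phrase):
    """Move the first k words to the end; skip rotations that start with an uninformative word."""
    words = phrase.split()
    entries = []
    for k in range(1, len(words)):
        head, tail = words[:k], words[k:]
        if tail[0].lower() in UNINFORMATIVE:
            continue
        entries.append(" ".join(tail) + " see: " + " ".join(head) + " -")
    return entries


for entry in rotation_set("Adaptive simulated annealing and its application to protein folding"):
    print(entry)
```

For the example phrase this keeps exactly the rotations numbered 1), 2), 5), 7) and 8) above.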


Order of Entries. Both indices are in alphabetical order, with numerals and mathematical symbols first. Sub- and superscripts and small uninformative words (such as “in”, “of”, “the”) are ignored. Punctuation signs and the symbols ‘-’ and ‘~’ count as space.

Please note that Greek letters are sorted as if spelled out, e.g., ζ as ‘zeta’.

When two phrases are exactly the same but are in different fonts, the order is: bold, italics, plain, sans-serif (or: article titles, important notions, keywords and phrases, rotations).
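The ordering rules can likewise be sketched as a sort key. This Python fragment is a hypothetical illustration, not the procedure used for this index. The stoplist and the Greek-letter table are small assumed excerpts, sub- and superscripts are not handled, and numerals come first here simply because digits precede letters in Unicode order.

```python
import re

SMALL_WORDS = {"in", "of", "the"}                    # assumed excerpt of ignored small words
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "ε": "epsilon", "ζ": "zeta"}  # excerpt
FONT_RANK = {"bold": 0, "italics": 1, "plain": 2, "sans-serif": 3}


def sort_key(phrase, font):
    """Spell out Greek letters, treat punctuation as space, drop small words, then rank fonts."""
    text = "".join(f" {GREEK[c]} " if c in GREEK else c for c in phrase.lower())
    text = re.sub(r"[^\w\s]|_", " ", text)           # punctuation, '-' and '~' count as space
    words = [w for w in text.split() if w not in SMALL_WORDS]
    return (" ".join(words), FONT_RANK[font])        # font order breaks exact ties


entries = [("ζ-function", "plain"), ("active set", "plain"), ("0-1 knapsack", "plain"),
           ("abstract convexity", "plain"), ("active set", "bold")]
for phrase, font in sorted(entries, key=lambda e: sort_key(*e)):
    print(phrase, f"({font})")
```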


It is hoped that both indices will lead you quickly to the wealth of information provided in the six volumes of the 
ENCYCLOPEDIA OF OPTIMIZATION. 


July 2008 


Subject Index 


0-1-0 graph 
[58E05, 90C30] 
(see: Topology of global optimization) 
0-1 knapsack see: fractional — 
0-1 linear programming approach for DNA transcription 
element identification see: Mixed — 
0-1 mixed integer problems 
[90C09, 90C10, 90C11] 
(see: Disjunctive programming) 
0-1 programming problem see: fractional —; hyperbolic —; 
single-ratio fractional (hyperbolic) — 
0-1 programs see: mixed integer — 
0-diagonal operator see: block- —; off- — 
1 knapsack see: fractional 0- —
1 linear programming approach for DNA transcription 
element identification see: Mixed 0- — 
1-median problem in a network 
[90B80, 90B85] 
(see: Warehouse location problem) 
1 mixed integer problems see: 0- —
1-MP 
[90B80, 90B85] 
(see: Warehouse location problem) 
1 programming problem see: fractional 0- —; hyperbolic 0- —; 
single-ratio fractional (hyperbolic) 0- — 
1 programs see: mixed integer 0- — 
1D-diffusion fluxes see: estimation of — 
2 see: SSS- — 
2-dimensional grid
[65K05, 65Y05]
(see: Parallel computing: models)
2-dimensional torus
[65K05, 65Y05]
(see: Parallel computing: models)
2-matching problem
[90C05, 90C10, 90C11, 90C27, 90C35, 90C57]
(see: Assignment and matching; Integer programming)
2-opt
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90]
(see: Traveling salesman problem)
2-opt neighborhood
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)


2-partition
[68Q25, 90C60]
(see: NP-complete problems and proof methodology)
2-SAT see: MAX- —
2-separated
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)
2-step superlinear
[65K05, 65K10, 90C06, 90C30, 90C34, 90Cxx]
(see: Discontinuous optimization; Feasible sequential
quadratic programming)


2-valued function see: Boolean — 
2-valued logic algebra see: Boolean — 
2-valued normal forms see: Pl-algebras and — 
2B-consistency
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints)
3-colorability
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
3-DIMENSIONAL MATCHING
[90C60]
(see: Complexity classes in optimization)
3-dimensional matching problem
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
3-partition 
[68Q25, 90C60] 
(see: NP-complete problems and proof methodology) 
#3 problem see: Gomez — 
3-SAT 
[68Q25, 90C60] 
(see: Complexity classes in optimization; NP-complete 
problems and proof methodology) 
3-satisfiability 
[68Q25, 90C60] 
(see: Complexity classes in optimization; NP-complete 
problems and proof methodology) 


3B-consistency
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints)
3D-transportation problem
[90C35]
(see: Multi-index transportation problems)
3PM process
[65G20, 65G30, 65G40, 65L99]
(see: Interval analysis: differential equations)
4-element group see: Klein — 
6000 see: EasyModeler/ — 
=NP see: P— 
0* -critical point 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
0— -critical point 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
0* -function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization)
0~ -function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
oo-stationary point
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)
oo-stationary point see: Hadamard —
ε-subdifferential
[46A20, 52A01, 90C30]
(see: Farkas lemma: generalizations)


A 


a priori
[90C25, 94A17]
(see: Bilevel programming: applications in engineering;
Entropy optimization: shannon measure of entropy and its
properties)
a priori method
[65K05, 90B50, 90C05, 90C29, 91B06]
(see: Multi-objective optimization and decision support
systems)
a priori optimization 
[90C10, 90C15] 
(see: Stochastic vehicle routing problems) 
A-weighted Euclidean norm 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
A* search algorithm 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility 
index) 
Abadie CQ 
[49K27, 49K40, 90C30, 90C31] 
(see: First order constraint qualifications) 
Abaffi-Broyden-Spedicato algorithms for linear equations and 
linear least squares 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
Abaffian 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 


Abaffian matrices
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least
squares)
Abaffians
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least
squares)
abnormal extremal
[41A10, 47N10, 49K15, 49K27]
(see: High-order maximum principle for abnormal
extremals)
abnormal extremals see: High-order maximum principle for —
abnormal points
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for
abnormal points)
abnormal points see: High-order necessary conditions for
optimality for —
abnormal processes
[41A10, 46N10, 47N10, 49K15, 49K27]
(see: High-order maximum principle for abnormal
extremals; High-order necessary conditions for optimality
for abnormal points)
abnormal weak extremal
[41A10, 47N10, 49K15, 49K27]
(see: High-order maximum principle for abnormal
extremals)

ABS algorithms for linear equations and linear least squares 
(65K05, 65K10) 
(referred to in: ABS algorithms for optimization; Cholesky 
factorization; Gauss-Newton method: Least squares, 
relation to Newton’s method; Generalized total least 
squares; Interval linear systems; Large scale trust region 
problems; Large scale unconstrained optimization; Least 
squares orthogonal polynomials; Least squares problems; 
Nonlinear least squares: Newton-type methods; Nonlinear 
least squares problems; Nonlinear least squares: trust 
region methods; Orthogonal triangularization; 
Overdetermined systems of linear equations; QR 
factorization; Solving large scale and sparse semidefinite 
programs; Symmetric systems of linear equations) 
(refers to: ABS algorithms for optimization; Cholesky 
factorization; Gauss-Newton method: Least squares, 
relation to Newton’s method; Generalized total least 
squares; Interval linear systems; Large scale trust region 
problems; Large scale unconstrained optimization; Least 
squares orthogonal polynomials; Least squares problems; 
Linear programming; Nonlinear least squares: Newton-type 
methods; Nonlinear least squares problems; Nonlinear least 
squares: trust region methods; Orthogonal 
triangularization; Overdetermined systems of linear 
equations; QR factorization; Solving large scale and sparse 
semidefinite programs; Symmetric systems of linear 
equations) 

ABS algorithms for optimization 
(65K05, 65K10) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; Gauss-Newton method: Least squares, 
relation to Newton’s method; Generalized total least 
squares; Least squares orthogonal polynomials; Least 
squares problems; Nonlinear least squares: Newton-type
methods; Nonlinear least squares problems; Nonlinear least 
squares: trust region methods) 
(refers to: ABS algorithms for linear equations and linear 
least squares; Gauss-Newton method: Least squares, 
relation to Newton’s method; Generalized total least 
squares; Least squares orthogonal polynomials; Least 
squares problems; Nonlinear least squares: Newton-type 
methods; Nonlinear least squares problems; Nonlinear least 
squares: trust region methods) 
ABS class see: basic —; scaled —; unscaled —
ABS class of algorithms see: scaled — 
ABS methods 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares; ABS algorithms for optimization) 
absolute deviation see: least —; maximum —; mean — 
absolute estimation 
[26A24, 65D25] 
(see: Automatic differentiation: introduction, history and 
rounding error estimation) 
absolute limit
[01A99]
(see: Gauss, Carl Friedrich)
absolute qualification rule
[90C35]
(see: Feedback set problems)
absolutely continuous functional
[90C15]
(see: Stochastic programming: nonanticipativity and
lagrange multipliers)
abstract constraint
[49K27, 49K40, 90C30, 90C31]
(see: Second order constraint qualifications)
abstract constraints
[49K27, 49K40, 90C30, 90C31]
(see: First order constraint qualifications)
abstract convex analysis
[90C26]
(see: Global optimization: envelope representation)
abstract convex function
[90C26]
(see: Global optimization: envelope representation)
abstract convexity
[90C26]
(see: Global optimization: envelope representation)
abstract convexity
[90C26]
(see: Global optimization: envelope representation)
abstract group see: realization of an — 
abstract hemivariational inequality 
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in 
mechanics) 
abstract variational inequality of elliptic type 
[65M60] 
(see: Variational inequalities: F. E. approach) 
AC3 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 


acceleration devices and related techniques
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
acceleration function see: the mid-point —
acceleration steps
[90C30]
(see: Cyclic coordinate method)
accelleration of algorithms 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
acceptance measure 
(see: Bayesian networks)
acceptance/rejection
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C10, 90C26,
90C27, 90C30]
(see: Multidimensional knapsack problems; Stochastic
global optimization: two-phase methods)
accepted by a Turing machine see: language — 
accepting see: threshold — 
accepting algorithms see: threshold — 
accepting computation of a Turing machine 
[90C60] 
(see: Complexity classes in optimization) 
accepting state of a Turing machine 
[90C60] 
(see: Complexity classes in optimization) 
access machine see: parallel random — 
accessibility form of CEP see: restricted — 
accessible form of CEP see: universally — 
accessible state 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
accessory minimum problem 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in
unconstrained optimal control)
ACCPM
[90B10, 90C05, 90C06, 90C35]
(see: Nonoriented multicommodity flow problems)
accumulate
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)
accumulation of the Jacobian
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians)
accuracy
[93-XX]
(see: Dynamic programming: optimal control applications) 
achievable region method 
[90B36] 
(see: Stochastic scheduling) 
achievement 
[90C29] 
(see: Multiple objective programming support)
achievement function
[90C11, 90C29]
(see: Multi-objective mixed integer programming; Multiple
objective programming support)
achievement scalarizing program
[90C11, 90C29]
(see: Multi-objective mixed integer programming)




acid see: amino — 
acquisitions see: Multicriteria methods for mergers and — 
across a fault see: jump — 
across an s—t-cut see: flow — 
action see: Clarke dual —; corrective —; recourse —; total — 
action algorithm see: row- — 
action method see: row- — 
actions see: recourse — 
activation function 
[90C27, 90C30] 
(see: Neural networks for combinatorial optimization) 
active 
[05C85, 46N10, 47N10, 49M37, 65K10, 90C10, 90C26, 90C30, 
90C46, 90C60] 
(see: Complexity of degeneracy; Directed tree networks; 
Global optimization: tight convex underestimators; Integer 
programming duality; Railroad locomotive scheduling) 
active see: p-order — 
active constraint 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
active constraints 
[90C26, 90C30, 90C39] 
(see: Bilevel optimization: feasibility test and flexibility 
index; Kuhn-Tucker optimality conditions; Second order 
optimality conditions for nonlinear optimization) 
active constraints 
[90C26, 90C60] 
(see: Bilevel optimization: feasibility test and flexibility 
index; Complexity of degeneracy) 
active constraints see: strongly — 
active function 
[49K35, 49M27, 65K10, 90C25] 
(see: Convex max-functions) 
active index 
[65K05, 90C26, 90C33, 90C34] 
(see: Adaptive convexification in semi-infinite 
optimization) 
active index set 
[49J52, 49K35, 49M27, 49Q10, 57R12, 65K10, 74G60, 74H99, 
74K99, 74Pxx, 90C25, 90C31, 90C34, 90C46, 90C90] 
(see: Convex max-functions; Generalized semi-infinite 
programming: optimality conditions; Quasidifferentiable 
optimization: stability of dynamic systems; Semi-infinite 
programming: second order optimality conditions; 
Smoothing methods for semi-infinite optimization) 
active index set see: essentially — 
active inequality constraints 
[90C26] 
(see: Smooth nonlinear nonconvex optimization)
active points see: set of e-most —
active ridge
[90Cxx]
(see: Discontinuous optimization)
active set
[65K05, 65K10, 90C20, 90C30]
(see: ABS algorithms for optimization; Quadratic
programming with bound constraints; Rosen’s method, 
global convergence, and Powell’s conjecture) 


active set algorithm 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 
active set method 
[90Cxx] 
(see: Discontinuous optimization) 
active set methods 
[49M37, 65K05, 90C25, 90C30, 90C60] 
(see: Complexity of degeneracy; Inequality-constrained 
nonlinear optimization; Successive quadratic 
programming: full space methods) 
active set methods 
[90C25, 90C30, 90C60, 90Cxx] 
(see: Complexity of degeneracy; Discontinuous 
optimization; Successive quadratic programming; 
Successive quadratic programming: full space methods; 
Successive quadratic programming: solution by active sets 
and interior point methods) 
active set quadratic programming methods 
[62G07, 62G30, 65K05] 
(see: Isotonic regression problems) 
active set strategies 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 
active set strategy 
[90C25, 90C30] 
(see: Successive quadratic programming: solution by active 
sets and interior point methods) 
active set strategy see: Goldfarb-Idnani — 
active sets and interior point methods see: Successive 
quadratic programming: solution by — 
active site 
[92B05] 
(see: Genetic algorithms for protein structure prediction) 
active site 
[92B05] 
(see: Genetic algorithms for protein structure prediction) 
activities see: matrix of — 
activity see: direction, preserving an — 
activity coefficient 
[90C30] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
actual 
[65H20] 
(see: Multi-scale global optimization using 
terrain/funneling methods) 
acute angle condition 
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational 
inequalities) 
acyclic oriented matroid 
[90C09, 90C10] 
(see: Oriented matroids) 
acyclic oriented matroid see: totally — 
acyclic subdigraph problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
acyclic tournament see: spanning — 
AD
[49-04, 65Y05, 68N20] 
(see: Automatic differentiation: parallel computation) 
AD see: error estimates for —; forward mode of —; point —; 
reverse mode of — 
AD algorithm see: forward mode of an —; reverse mode of 
an — 
AD-enabled parallelism 
[49-04, 65Y05, 68N20] 
(see: Automatic differentiation: parallel computation) 
ad hoc networks see: Optimization in — 
AD intermediate form 
[49-04, 65Y05, 68N20] 
(see: Automatic differentiation: parallel computation) 
AD of parallel programs 
[49-04, 65Y05, 68N20] 
(see: Automatic differentiation: parallel computation) 
AD tools see: parallel — 
AD01
[65K05, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators) 
Adams-Johnson linearization 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
adaptation 
(see: Bayesian networks) 
adaptation see: subinterval — 
adaptive aggregation method 
49120, 90C39] 
(see: Dynamic programming: discounted problems) 
adaptive algorithm 
[60J65, 68Q25]
(see: Adaptive global search) 
adaptive computational method 
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX]
(see: Nonlocal sensitivity analysis with automatic 
differentiation) 
Adaptive convexification in semi-infinite optimization 
(90C34, 90C33, 90C26, 65K05) 
(refers to: αBB algorithm; Bilevel optimization: feasibility
test and flexibility index; Convex discrete optimization; 
Generalized semi-infinite programming: optimality 
conditions) 
adaptive) decision see: ex-post (risk prone — 
Adaptive global search 
(60J65, 68Q25) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Global optimization based 
on statistical models) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Global optimization based on statistical 
models) 


adaptive homotopy
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX]
(see: Nonlocal sensitivity analysis with automatic
differentiation)

adaptive memory
[05-04, 90C27]
(see: Evolutionary algorithms in combinatorial
optimization)

adaptive methods
[49M37, 65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25,
90C26]
(see: Information-based complexity and information-based
optimization; Nonlinear least squares: Newton-type
methods)

adaptive methods 
[49M37, 60J65, 68Q25] 
(see: Adaptive global search; Nonlinear least squares: 
Newton-type methods) 

adaptive partitioning 
[65K05, 90C26, 90C30] 
(see: Bounding derivative ranges; Direct global 
optimization algorithm) 

adaptive random search method
[65K05, 90C30]
(see: Random search methods)

adaptive search
[65K05, 90C26, 90C30, 90C90]
(see: Global optimization: hit and run methods; Random
search methods)

adaptive search
[90C08, 90C11, 90C26, 90C27, 90C90]
(see: Biquadratic assignment problem; Global optimization:
hit and run methods)

adaptive search see: greedy randomized —; hesitant —; 
pure — 

adaptive search procedure see: greedy randomized — 

adaptive search procedures see: Greedy randomized — 

adaptive simulated annealing 
[92C05] 
(see: Adaptive simulated annealing and its application to 
protein folding) 

Adaptive simulated annealing and its application to protein 
folding 
(92C05) 
(referred to in: Adaptive global search; Bayesian global 
optimization; Genetic algorithms; Genetic algorithms for 
protein structure prediction; Global optimization based on 
statistical models; Global optimization in Lennard-Jones 
and Morse clusters; Graph coloring; Molecular structure
determination: convex global underestimation; 
Monte-Carlo simulated annealing in protein folding; 
Multiple minima problem in protein folding: αBB global
optimization approach; Phase problem in X-ray 
crystallography: Shake and bake approach; Random search
methods; Simulated annealing; Simulated annealing 
methods in protein folding; Stochastic global optimization: 
stopping rules; Stochastic global optimization: two-phase 
methods) 
(refers to: Adaptive global search; Bayesian global 
optimization; Genetic algorithms; Genetic algorithms for 
protein structure prediction; Global optimization based on 
statistical models; Global optimization in Lennard-Jones 
and Morse clusters; Global optimization in protein folding;
Molecular structure determination: convex global 
underestimation; Monte-Carlo simulated annealing in 
protein folding; Multiple minima problem in protein 
folding: αBB global optimization approach; Packet
annealing; Phase problem in X-ray crystallography: Shake 
and bake approach; Protein folding: generalized-ensemble 
algorithms; Random search methods; Simulated annealing; 
Simulated annealing methods in protein folding; Stochastic 
global optimization: stopping rules; Stochastic global 
optimization: two-phase methods) 

adaptive subdivision rule 
[90C26] 
(see: D.C. programming) 

addition with order see: first order theory of real — 

additional reverse convex constraint see: linear program with 
an — 

additive tree
[62H30, 90C27]
(see: Assignment methods in clustering)

additive utility functions
[90C26, 91B28]
(see: Portfolio selection and multicriteria analysis)

additive utility functions
[90C26, 90C29, 91B28]
(see: Decision support systems with multiple criteria;
Portfolio selection and multicriteria analysis)

adic assignments problems see: N- — 

ADIFOR
[65K05, 90C30]
(see: Automatic differentiation: calculation of the Hessian;
Automatic differentiation: point and interval Taylor
operators) 


adjacency graph
[90B80]
(see: Facilities layout problems)

adjacency matrix
[05C15, 05C17, 05C35, 05C60, 05C69, 37B25, 90C20, 90C22,
90C27, 90C35, 90C59, 91A22]
(see: Lovász number; Replicator dynamics in combinatorial
optimization)


adjacent
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35]
(see: Graph coloring; Lovász number)

adjacent channel constrained frequency assignment
[05-XX]
(see: Frequency assignment problem)


adjacent vertices in a graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
adjacent violators algorithm see: pool — 
adjoint 
[65K05, 65L99, 90C30, 93-XX] 
(see: Automatic differentiation: calculation of Newton steps; 
Optimization strategies for dynamic systems) 
adjoint-based gradient 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
adjoint derivative method 
[90C26, 90C90] 
(see: Structural optimization: history) 
adjoint equation see: extended — 
adjoint linear map 
[49M29, 65K10, 90C06]
(see: Dynamic programming and Newton’s method in
unconstrained optimal control)
adjoint methods
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)
adjoint problem
[49M37, 90C11]
(see: MINLP: applications in the interaction of design and
control)
adjoint program
[26A24, 65D25]
(see: Automatic differentiation: introduction, history and 
rounding error estimation) 
adjoint recursion 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 
adjoint variables 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
adjoints 
[65H99, 65K99] 
(see: Automatic differentiation: point and interval) 
adjoints see: second order — 
adjustment 
[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 
adjustment see: multiplier —; simultaneous — 
adjustment process 
[65K10, 90C90]
(see: Variational inequalities: projected dynamical system)
adjustment process see: trip-route choice —
admissible arc
[90C35]
(see: Maximum flow problem)
admissible cluster
[62H30, 90C39]
(see: Dynamic programming in clustering)
admissible displacement see: kinematically —
admissible domain
[49J20, 49J52]
(see: Shape optimization)
admissible pair of a monomial ideal
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods) 
admissible pair of trajectory and control functions see: 
asymptotically — 
admissible pair of trajectory-function and control-function 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
admissible pivot 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules) 
admissible policy 
[49L20, 90C39] 
(see: Dynamic programming: discounted problems) 
admissible solution 
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 
admissible solution 
[90C15, 90C29] 
(see: Approximation of extremum problems with 
probability functionals; Discretely distributed stochastic 
programs: descent directions and efficient points) 
admissible space see: kinetically — 
admissible trajectory-control pair 
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
ADOL-C
[65K05, 90C30]
(see: Automatic differentiation: point and interval Taylor
operators)
ADOL-F
[65K05, 90C30]
(see: Automatic differentiation: calculation of the Hessian;
Automatic differentiation: point and interval Taylor
operators)
advance
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
advanced basis
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)
advanced search heuristics
[05C69, 05C85, 68W01, 90C59]
(see: Heuristics for maximum clique and independent set)
advanced warmstart
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)
adversary
[05C85]
(see: Directed tree networks)
AEL
[74A40, 90C26]
(see: Shape selective zeolite separation and catalysis:
optimization methods)
affine
[65K05, 90C30]
(see: Minimax: directional differentiability) 
affine equilibrium constraints see: mathematical program 
with — 


affine function 
[32B15, 51E15, 51N20, 90C26, 90C39] 
(see: Affine sets and functions; Second order optimality 
conditions for nonlinear optimization) 
affine functions see: product of —; program of minimizing 
a product of two — 
affine-reduced-Hessian 
[90C30] 
(see: Conjugate-gradient methods) 
affine reduced Hessian see: limited-memory — 
affine-reduced-Hessian algorithm 
[90C30] 
(see: Conjugate-gradient methods) 
affine reduction see: successive — 
affine reduction BFGS algorithm see: successive — 
affine scaling algorithm 
[90C05]
(see: Linear programming: interior point methods; Linear
programming: Karmarkar projective algorithm)
affine scaling SQPIP methods
[49K20, 49M99, 90C55]
(see: Sequential quadratic programming: interior point
methods for distributed optimal control problems)
affine set
[32B15, 51E15, 51N20]
(see: Affine sets and functions) 
Affine sets and functions 
(51E15, 32B15, 51N20) 
(referred to in: Linear programming; Linear space) 
(refers to: Convex max-functions; Linear programming; 
Linear space) 
after-arrival see: duty- — 
afterset 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
afterset representation of relations see: foreset and — 
against all see: one — 
against one see: one — 
agent see: principal — 
agents see: mass separating — 
aggregate excess demand function 
[91B50] 
(see: Walrasian price equilibrium) 
aggregation 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
Aggregation 
(see: Optimal planning of offshore oilfield infrastructure) 
aggregation see: feature-based —; scenario — 
aggregation function 
[90C30, 90C90] 
(see: Decomposition techniques for MILP: Lagrangian
relaxation) 
aggregation heuristic 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
aggregation method see: adaptive — 
aggregation schemes
[90C30, 90C90]
(see: Decomposition techniques for MILP: Lagrangian
relaxation) 
Agmon-Motzkin-Fourier relaxation method 
[90C25, 90C33, 90C55]
(see: Splitting method for linear complementarity
problems)
agricultural risks
[90C15]
(see: Two-stage stochastic programming: quasigradient 
method) 
agricultural systems see: State of the art in modeling — 
agriculture 
[90C29, 90C30, 90C90] 
(see: Decision support systems with multiple criteria; 
MINLP: applications in blending and pooling problems) 
ahead rules see: look- — 
AHP 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
AHT
[74A40, 90C26] 
(see: Shape selective zeolite separation and catalysis: 
optimization methods) 
aid see: multicriteria decision — 
AIDA* 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
aided techniques see: computer —
AIF
[49-04, 65Y05, 68N20]
(see: Automatic differentiation: parallel computation)
air pollution 
[90C05, 90C34] 
(see: Semi-infinite programming: methods for linear 
problems) 
air traffic control see: ground delay problem in — 
air traffic control and ground delay programs 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization)
aircraft routing
[90B06, 90C06, 90C08, 90C35, 90C90]
(see: Airline optimization)
airline crew scheduling 
[90C35] 
(see: Multicommodity flow problems) 
airline fleet assignment 
[90C35] 
(see: Multicommodity flow problems) 
airline maintenance routing problem 
[90C35] 
(see: Multicommodity flow problems) 
Airline optimization 
(90B06, 90C06, 90C08, 90C35, 90C90) 
(referred to in: Integer programming; Vehicle scheduling) 
(refers to: Integer programming; Vehicle scheduling) 
airplane hopping problem 
[90C35]
(see: Minimum cost flow problem)
Aitken double sweep method
[90C30]
(see: Cyclic coordinate method) 
Aizenberg-Rabinovich system 
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic 
algebras) 
Akaike information criterion 
[62F10, 94A17] 
(see: Entropy optimization: parameter estimation) 
Alanine) see: Poly(L- — 
algebra see: Boolean —; Boolean 2-valued logic —; 
computer —; Fundamental theorem of —; Lie —; linear —; 
many-valued logic —; MV- —; Orlik-Solomon —; Pi- —;
Pinkava —; Pinkava logical —; relational interval —; V —; 
von Neumann —; Wx- —; Zhegalkin — 
algebra connective see: logic — 
algebra framework see: linear — 
algebra package see: computer — 
algebraic equations 
[01A60, 03B30, 54C70, 68Q17] 
(see: Hilbert’s thirteenth problem) 
algebraic equations see: differential and —; linear — 
algebraic expressions 
(see: Planning in the process industry) 
algebraic methods see: Integer programming: — 
algebraic modeling language 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new 
paradigm) 
algebraic modeling languages 
(see: Planning in the process industry) 
algebraic QAP 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
algebraic quadratic assignment problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
algebraic statistics
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 

algebraically decreasing tail see: RSM-distribution with — 

algebras see: application of Pi- —; complexity theory of Pi- —;
families of Pi- —; Finite complete systems of many-valued
logic —; functional completeness of Pi- —; functionally
complete normal forms of Pi- —; many-valued families of
the Pinkava logic —; Pi-logic —; taxonomy of Pi-logic —;
use of Pi- —

algebras and 2-valued normal forms see: Pi- —

algebras of many-valued logics see: taxonomy of the Pi- —

algorithm 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Decomposition principle of linear programming; 
Modeling difficult optimization problems) 

algorithm see: A* search —; active set —; adaptive —; 
affine-reduced-Hessian —; affine scaling —; alpha-beta —; 
αBB —; αBB global optimization —; approximation —;
asynchronous —; asynchronous parallel CA —; auction —; 
augmenting path —; Balas —; Bialas-Karwan Kth-best —; 
binary search —; branch and bound —; branch and 
contract —; branching —; Buchberger —; bundle —; 
cCOMB —; CG-related —; CGU —; clustering —;
combinatorial —; complexity of an —; conical —; conjugate 
residual —; consistent labeling —; Conti-Traverso —; 
continuous-time equivalent of the dynamic 
programming —; convergent —; Convex-simplex —; 
copolyblock —; Corley-Moon —; Craig —; Craig conjugate 
gradient type —; cross decomposition —; cutting plane —; 
cycle-canceling —; cycling —; Dai-Yuan —; 
Daniel-Gragg-Kaufmann-Stewart reorthogonalized
Gram-Schmidt —; descent —; descent in a nonlinear 
programming —; deterministic global optimization —; 
dimension-by-dimension —; Dinkelbach —; Direct global 
optimization —; discrete polyblock —; distributed game 
tree search —; division —; dual exterior point —; 
dual-scaling —; dual-scalings —; dual simplex —; dynamic 
programming —; efficient —; efficient polynomially
bounded polynomial time —; EGOP —; ellipsoid —;
Elzinga-Hearn —; EM —; entropic proximal point —;
equilibration —; Esau-Williams —; evolutionary —;
exact —; exact penalty function based —; 
expectation-maximization —; exponential —; exponential 
time —; Extended cutting plane —; extra-gradient —; 
Feed —; Fletcher-Reeves —; Ford-Fulkerson —; forward
mode of an AD —; Frank-Wolfe —; Gauss-Seidel —;
general —; general structure mixed integer αBB —;
generalized bisection —; generalized game tree search —; 
generalized primal-relaxed dual —; generic augmenting 
path —; generic preflow-push —; generic vertex 
insertion —; globally convergent —; globally convergent 
probability-one homotopy —; GMIN-αBB —;
Goldfarb-Wang —; Gomory cutting plane —; GOP —;
gradient-free —; gradient-free minimization —; gradient 
projection —; graph collapsing auction —; greedy —; 
grouping genetic —; Gsat —; heavy ball —; 
Hestenes-Stiefel —; heuristic —; hide-and-seek —; high 
failure of the alpha-beta —; hit and run —; 
homogeneous —; Hopcroft-Tarjan planarity-testing —;
Huang —; Hungarian —; hybrid —; implicit Choleski —;
Implicit LU —; Implicit LX —; implicit QR —; incremental —; 
incremental-iterative solution —; incremental negamax —; 
infeasible-start interior-point —; Ingber —; interior
point —; interval Newton —; Jacobi —; Jünger-Mutzel
branch and cut —; K-iterated tour partitioning —;
Karmarkar —; Kruskal —; Lanczos —; learning —;
Lemke’s —; Levenberg-Marquardt —; lexicographic
search —; limited-memory —; limited-memory 
reduced-Hessian BFGS —; linear —; Linear programming:
Karmarkar projective —; low failure of the alpha-beta —;
machine-learning —; mandatory work first —; Martin —;
max-flow —; minimax —; minimum lower set —; MINLP:
branch and bound global optimization —; MINLP: outer 
approximation —; modified Huang —; modified Kruskal —; 
modified Prim —; modified standard auction —; 
Monte-Carlo simulation —; multilevel —; naive auction —; 
NC —; nDOMB —; nearest insertion optimal partitioning —;
Nelder-Mead —; network simplex —; nondeterministic 
polynomial —; nondeterministic polynomial time —; 
nonsmooth SSC-SABB —; on-line —; one clause at a time —; 
operator splitting —; optimal —; optimal state space 
search —; outer approximation —; P- —; parallel —; parallel 
minimax tree —; parallel routing —; parallel savings —; 
parallel-tangents —; Parametric linear programming: cost 
simplex —; parametric objective simplex —; parametric 
right-hand side simplex —; PARTAN —; partial proximal
point —; path following —; perceptron —; pivot —; 
pivoting —; Piyavskii-Shubert —; Pnueli —; 
Polyak-Polak-Ribière —; polyblock —; polynomial —;
polynomial time —; polynomial time deterministic —; pool 
adjacent violators —; potential reduction —; potential 
smoothing —; predictor-corrector —; preflow-push —; 
primal-dual —; primal-dual potential reduction —; 
primal-dual scaling —; primal potential reduction —; 
primal-scaling —; primal simplex —; principal pivot —; 
principal variation splitting —; probabilistic analysis of
an —; projected gradient —; projective —; proximal
point —; pseudopolynomial —; pseudopolynomial time —; 
QPP —; quadratic proximal point —; RA —; randomized —; 
recursive —; recursive least squares —; recursive state 
space search —; reduced gradient —; regularized 
Frank-Wolfe —; regularized stochastic decomposition —; 
relative positioning —; relaxation —; relaxation labeling —; 
reverse mode of an AD —; reverse polyblock —; revised 
polyblock —; revised reverse polyblock (copolyblock) —; 
row-action —; Schaible —; sequential CA —; Sequential 
cutting plane —; sequential deterministic —; sequential 
minimax game tree —; shadow-vertex —; shake and
bake —; simplex —; simplex type —; simulated annealing
and genetic —; SMIN-αBB —; Smith-Walford-one —;
special structure mixed integer αBB —; SQP type —;
SSC-SABB —; SSC-SBB —; state space search —; steepest
descent —; stochastic decomposition —; strongly 
polynomial —; strongly polynomial time —; subgradient 
projection —; successive affine reduction BFGS —; 
successive shortest path —; supervisor —; sweep —; 
synchronized distributed state space search —; 
synchronized parallel CA —; synchronous implementation 
of the auction —; TCF of an —; three phase —; 
three-term-recurrence —; time complexity function of 
an —; totally asynchronous implementation of the
auction —; tree-splitting —; truncated Buchberger —; 
unified —; UTASTAR —; variable-storage —; variant of the
simplex —; virtual source —; weakly polynomial time —; 
Zangwill — 
algorithm analysis see: nondegeneracy assumption for — 
algorithm for axial MITPs see: greedy — 
algorithm of complexity O(n^c)
[90C60] 
(see: Computational complexity theory) 
algorithm (definition) see: optimization — 
algorithm design 
[05-04, 65K05, 65Y05, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization; Parallel computing: models) 
algorithm design see: model for parallel — 
algorithm for entropy optimization see: path following — 
algorithm greedy-expanding 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching)
algorithm partition-flipping
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
algorithm partition-matching-I 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching) 
algorithm polynomial of degree c 
[90C60] 
(see: Computational complexity theory) 
algorithm pre-matching 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching) 
algorithm and robust stopping criteria see: Dykstra’s — 
algorithm running in O(n^c) time
[90C60] 
(see: Computational complexity theory) 
algorithm-SCG 
(see: Railroad crew scheduling) 
algorithm for solving CAP on trees see: exact — 
algorithm solving a problem instance in time m 
[90C60] 
(see: Computational complexity theory) 
algorithm for weighted graph planarization see: branch and 
bound — 
algorithmic approximation 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 
algorithmic complexity 
[90C60] 
(see: Kolmogorov complexity)
Algorithmic complexity
[90C60]
(see: Kolmogorov complexity)
algorithmic definition
[65H99, 65K99]
(see: Automatic differentiation: point and interval)
algorithmic development
[49M37, 90C11]
(see: MINLP: applications in the interaction of design and
control) 


algorithmic differentiation 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 

algorithmic entropy 
[90C60] 
(see: Kolmogorov complexity) 

Algorithmic entropy 
[90C60] 
(see: Kolmogorov complexity) 

Algorithmic improvements using a heuristic parameter, reject 
index for interval optimization 
(65K05, 90C30) 
(refers to: Interval analysis: unconstrained and constrained 
optimization; Interval Newton methods) 

algorithmic information
[90C60]
(see: Kolmogorov complexity)

Algorithmic information
[90C60]
(see: Kolmogorov complexity)

algorithmic knowledge
[90C10, 90C30]
(see: Modeling languages in optimization: a new paradigm)

algorithmic language
[90C10, 90C30]
(see: Modeling languages in optimization: a new paradigm)

algorithmic randomness
[90C60]
(see: Kolmogorov complexity)

Algorithmic randomness 
[90C60] 
(see: Kolmogorov complexity) 

algorithms 
[05C69, 05C85, 49J35, 49K35, 49M37, 52B11, 52B45, 52B55, 
62C20, 62G07, 62G30, 65K05, 65K10, 68Q25, 68W01, 90C26, 
90C27, 90C30, 90C35, 90C59, 90C60, 91A05, 91A12, 91A40, 
91B28] 
(see: αBB algorithm; Combinatorial optimization games;
Competitive ratio for portfolio management; Graph 
coloring; Heuristics for maximum clique and independent 
set; Inequality-constrained nonlinear optimization; 
Isotonic regression problems; Minimax game tree 
searching; Volume computation for polytopes: strategies 
and performances) 

algorithms see: acceleration of —; approximation —;
Asynchronous distributed optimization —; asynchronous 
iterative —; auction —; average case complexity of —; 
branch and bound —; bundle —; CA —; complexity theory 
of —; Cost approximation —; decomposition —; 
decomposition CA —; discrete-time —; efficient —; 
evolutionary —; exact —; fixed parameter tractable —; 
generic shortest path —; genetic —; geometric —; graph 
collapsing in auction —; graph reduction in auction —; 
greedy —; heuristic —; heuristic-metaheuristic —; hit and 
run —; inexact proximal point —; Integer programming: 
branch and cut —; Integer programming: cutting plane —; 
interior point —; local greedy —; local search —; 
memetic —; nonlinear CG-related —; numerical —; 
optimal —; optimization —; pair assignment —; parallel —;
polynomial time —; potential reduction —; primal and dual 
simplex —; Probabilistic analysis of simplex —; Protein 
folding: generalized-ensemble —; proximal —; random 
search —; randomized —; reducibility of —; robust —; 
scaled ABS class of —; search —; Shortest path tree —; 
simplicial —; Simplicial decomposition —; single 
assignment —; SLP —; smoothing —; solution —; SSC 
minimization —; Stable set problem: branch & cut —; 
Standard quadratic optimization problems: —; supervisor 
and searcher cooperation minimization —; threshold 
accepting —; training —; unconstrained optimization —; 
varying dimension pivoting —; virtual source concept in 
auction — 

algorithms in combinatorial optimization see: Evolutionary — 

algorithms and complexity see: Regression by special 
functions: — 

algorithms for entropy optimization 
[90C25, 94A17] 
(see: Entropy optimization: Shannon measure of entropy
and its properties) 

algorithms for entropy optimization see: interior point — 

algorithms for financial planning problems see: Global 
optimization — 

algorithms for GAP see: approximation — 

Algorithms for genomic analysis 
(90C27, 90C35, 90C11, 65K05, 90-08, 90-00) 

algorithms for hypodifferentiable functions see: 
Quasidifferentiable optimization: — 

algorithms for integer programming see: Simplicial pivoting — 

algorithms for isotonic regression problems 
[62G07, 62G30, 65K05] 
(see: Isotonic regression problems) 

algorithms for linear equations and linear least squares see: 
Abaffi-Broyden-Spedicato —; ABS — 

algorithms for linear programming generating two paths see: 
Pivoting — 

algorithms for nonconvex minimization problems see: 
decomposition — 

algorithms for nonsmooth and stochastic optimization see: 
SSC minimization — 

algorithms for optimization see: ABS — 

algorithms in pattern recognition see: Complementarity — 

algorithms for protein structure prediction see: Genetic — 

algorithms for QD functions see: Quasidifferentiable 
optimization: — 

algorithms in resource allocation problems see: Combinatorial 
optimization — 

algorithms and software see: Continuous global optimization: 
models — 

algorithms for the solution of multistage mean-variance 
optimization problems see: Decomposition — 

algorithms for stochastic bilevel programs 
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs) 

algorithms for stochastic linear programming problems see: 
Stabilization of cutting plane — 

algorithms for the traveling salesman problem see: Heuristic 
and metaheuristic — 


algorithms for unconstrained minimization 
[65K05, 90C30] 
(see: Nondifferentiable optimization: minimax problems) 

algorithms for unconstrained optimization see: New hybrid 
conjugate gradient —; Performance profiles of 
conjugate-gradient — 

algorithms for the vehicle routing problem see: 
Metaheuristic — 

aligned ellipsoid see: coordinate- — 

alignment see: communication-free —; multiple sequence —; 
trace of an — 

alignment constraint
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem)

alignment-distribution graph
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem)

alignment graph see: extended — 

Alignment problem
(05-02, 05-04, 15A04, 15A06, 68U99)
(referred to in: Integer programming) 
(refers to: Integer programming) 

alignment problem 
[05-02, 05-04, 15A04, 15A06, 68U99] 
(see: Alignment problem) 

alignment problem see: communication-free —; constant 
degree parallelism —; solution of the — 

alignment via mixed-integer linear optimization see: Global 
pairwise protein sequence — 

all see: find one, find —; one against — 

all-atom 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 

all azeotropes see: Nonlinear systems of equations: application 
to the enclosure of — 

all edge-directions of P see: covers — 

all instances see: all-to- —; one-to- — 

all-optical networks 
[05C85] 
(see: Directed tree networks) 

all-to-all instances
[05C85]
(see: Directed tree networks)

allele 
(see: Broadcast scheduling problem) 

allocation see: facility location- —; marginal —; median 
location- —; MINLP: application in facility location- —; 
multifacility location- —; resource —; task — 


allocation for epidemic control see: Resource — 
allocation of gas
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling)

allocation model see: location- —

allocation phase
[90B80, 90B85]
(see: Warehouse location problem)

allocation problem see: discrete resource —; location- —;
p-median location- —; resource — 

allocation problems see: Combinatorial optimization 
algorithms in resource — 

allocation scheme see: randomized — 

allocation subproblem
[90C26]
(see: MINLP: application in facility location-allocation)

allowed neighbor in tabu search
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10]
(see: Maximum satisfiability problem)

almost complementary solutions
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity
problems)

almost empty spaces see: analyzing — 

almost at equilibrium of an assignment and a set of prices 

[90C30, 90C35] 
(see: Auction algorithms) 

along-rays functions on topological vector spaces see: 
Increasing and convex- — 

alpha-beta algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 

alpha-beta algorithm see: high failure of the —; low failure of 
the — 

α-concave function
[90C15]
(see: Logconcave measures, logconvexity)

α-concave measure
[90C15]
(see: Logconcave measures, logconvexity)

α-cut of a fuzzy relation
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

α-divergence see: Csiszár —

alpha-helical proteins see: Predictive method for interhelical 
contacts in — 

a-helix 
[92C05] 
(see: Adaptive simulated annealing and its application to 
protein folding) 

α-helix 
[92C40] 
(see: Monte-Carlo simulated annealing in protein folding) 

αBB 
[49M37, 65K10, 90C26, 90C30, 92C40] 
(see: αBB algorithm; Multiple minima problem in protein 
folding: αBB global optimization approach) 

αBB see: GMIN- —; MINLP: global optimization with —; 
SMIN- — 

αBB algorithm 
(49M37, 65K10, 90C26, 90C30) 
(referred to in: Adaptive convexification in semi-infinite 
optimization; Bisection global optimization methods; 
Continuous global optimization: applications; Continuous 
global optimization: models, algorithms and software; 
Convex envelopes in optimization problems; Differential 
equations and global optimization; Direct global 
optimization algorithm; Eigenvalue enclosures for ordinary 
differential equations; Generalized primal-relaxed dual 
approach; Global optimization based on statistical models; 
Global optimization in batch design under uncertainty; 
Global optimization in binary star astronomy; Global 
optimization in generalized geometric programming; 
Global optimization methods for systems of nonlinear 
equations; Global optimization in phase and chemical 
reaction equilibrium; Global optimization using space 
filling; Hemivariational inequalities: eigenvalue problems; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval global optimization; MINLP: branch and bound 
global optimization algorithm; MINLP: global 
optimization with αBB; Quadratic knapsack; Reverse 
convex optimization; Semidefinite programming and 
determinant maximization; Smooth nonlinear nonconvex 
optimization; Standard quadratic optimization problems: 
theory; Topology of global optimization) 

(refers to: Bisection global optimization methods; 
Continuous global optimization: applications; Continuous 
global optimization: models, algorithms and software; 
Convex envelopes in optimization problems; D.C. 
programming; Differential equations and global 
optimization; Direct global optimization algorithm; 
Eigenvalue enclosures for ordinary differential equations; 
Generalized primal-relaxed dual approach; Global 
optimization based on statistical models; Global 
optimization in batch design under uncertainty; Global 
optimization in binary star astronomy; Global 
optimization in generalized geometric programming; 
Global optimization methods for systems of nonlinear 
equations; Global optimization in phase and chemical 
reaction equilibrium; Global optimization using space 
filling; Hemivariational inequalities: eigenvalue problems; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval global optimization; MINLP: branch and bound 
global optimization algorithm; MINLP: global 
optimization with αBB; Reformulation-linearization 
technique for global optimization; Reverse convex 
optimization; Semidefinite programming and determinant 
maximization; Smooth nonlinear nonconvex optimization; 
Topology of global optimization) 


αBB algorithm 

[49M37, 65K05, 65K10, 90C11, 90C26, 90C30] 
(see: αBB algorithm; MINLP: global optimization with 
αBB) 


αBB algorithm see: general structure mixed integer —; 
GMIN- —; SMIN- —; special structure mixed integer — 


αBB approach see: Global optimization: g- —; Global 
optimization: p- — 


αBB global optimization algorithm 


[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 


αBB global optimization approach see: Multiple minima 
problem in protein folding: — 


alphabet see: finite — 
alphabet of a Turing machine see: input — 
alternance see: Chebyshev — 
alternating procedure 

[90C26] 

(see: MINLP: application in facility location-allocation) 
alternating Turing machine 

[03D15, 68Q05, 68Q15] 

(see: Parallel computing: complexity classes) 
alternation see: Chebyshev — 
alternative see: Linear optimization: theorems of the —; 
maximal —; set of decision —; theorem of the — 
alternative linear system 

[15A39, 90C05] 

(see: Linear optimization: theorems of the alternative) 
alternative and optimization see: Theorems of the — 
Alternative set theory 

(03E70, 03H05, 91B16) 

(referred to in: Boolean and fuzzy relations; Checklist 
paradigm semantics for fuzzy logics; Finite complete 
systems of many-valued logic algebras; Inference of 
monotone boolean functions; Optimization in boolean 
classification problems; Optimization in classifying text 
documents) 

(refers to: Boolean and fuzzy relations; Checklist paradigm 
semantics for fuzzy logics; Finite complete systems of 
many-valued logic algebras; Inference of monotone boolean 
functions; Optimization in boolean classification problems; 
Optimization in classifying text documents) 
alternative set theory see: axioms of — 
alternative systems 

[15A39, 90C05] 

(see: Motzkin transposition theorem) 
alternative theorem 

[46A20, 52A01, 90C30] 

(see: Farkas lemma: generalizations) 
alternative theorem 

[46A20, 52A01, 90C30] 

(see: Farkas lemma: generalizations) 
alternative theorem see: basic — 
alternatives see: finite set of the —; set of —; theorem of the — 
alternatives to CG 

[90C30] 

(see: Conjugate-gradient methods) 
amino acid 

[92B05] 

(see: Genetic algorithms for protein structure prediction) 
amino acid 

[92B05, 92C05] 

(see: Adaptive simulated annealing and its application to 
protein folding; Genetic algorithms for protein structure 
prediction) 

analog of the dynamic programming equation see: 
continuous-time — 

analyses see: post-optimality — 

analysing declarative program structure 

[90C10, 90C30] 

(see: Modeling languages in optimization: a new paradigm) 
analysis see: abstract convex —; Algorithms for genomic —; 
applications of sensitivity —; approximation —; 
asymptotic —; automated Fortran program for nonlocal 
sensitivity —; average case —; cluster —; Combinatorial 
matrix —; competitive —; convex —; data envelopment —; 
decision —; dependence —; design —; discrete convex —; 
discriminant —; domination —; equilibrium —; exploratory 
statistical —; Financial applications of multicriteria —; 
functional —; infinitesimal perturbation —; interval —; 
investment —; linear Programming and Economic —; 
marginal —; matrix —; mean-variance portfolio —; 
model-based experimental —; monotonic —; 
multicriteria —; nondegeneracy assumption for 
algorithm —; nonlocal sensitivity —; nonsmooth —; 
nonstandard —; numerical —; perturbation —; Portfolio 
selection and multicriteria —; post-optimality —; 
post-optimality sensitivity —; preference disaggregation —; 
probabilistic —; range- —; regression —; relational —; 
robust stability —; robustness —; scenario —; sensitivity —; 
set-valued —; Shape reconstruction methods for 
nonconvex feasibility —; shape sensitivity —; Short-term 
scheduling under uncertainty: sensitivity —; stability —; 
target —; time series —; value —; worst-case — 

analysis of an algorithm see: probabilistic — 

analysis: application to chemical engineering design problems 
see: Interval — 

analysis with automatic differentiation see: Nonlocal 
sensitivity — 

analysis and balanced interval arithmetic see: Global 
optimization: interval — 

analysis of cable structures see: structural — 

analysis in combinatorial optimization see: Domination — 

analysis of complementarity problems see: Sensitivity — 

analysis: differential equations see: Interval — 

analysis: eigenvalue bounds of interval matrices see: Interval — 

analysis of flowsheets see: flexibility —; operability — 

analysis: Fréchet subdifferentials see: Nonsmooth — 

analysis: intermediate terms see: Interval — 

analysis and management of environmental systems see: 
Global optimization in the — 

analysis methodologies see: semantic — 

analysis: nondifferentiable problems see: Interval — 

analysis and optimization see: nonsmooth — 

analysis for optimization of dynamical systems see: Interval — 

analysis of optimization problems see: stability — 

analysis: parallel methods for global optimization see: 
Interval — 

analysis with respect to changes in cost coefficients see: 
sensitivity — 

analysis with respect to right-hand side changes see: 
sensitivity — 

analysis of simplex algorithms see: Probabilistic — 

analysis Step 
[90B15] 
(see: Evacuation networks) 

analysis: subdivision directions in interval branch and bound 
methods see: Interval — 

analysis system see: stability of a structural — 

analysis: systems of nonlinear equations see: Interval — 

analysis: unconstrained and constrained optimization see: 
Interval — 

analysis of variance see: one-way — 

analysis of variational inequality problems see: Sensitivity — 

analysis: verifying feasibility see: Interval — 

analysis: weak stationarity see: Nonsmooth — 
analytic center 
[46N10, 49M20, 90-00, 90-08, 90C25, 90C47] 
(see: Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods) 
analytic center cutting plane method 
[90B10, 90C05, 90C06, 90C35] 
(see: Nonoriented multicommodity flow problems) 
analytic hierarchy process 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
analytic hierarchy process 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
analytical approximation of linear programming 
[90C05, 90C25] 
(see: Young programming) 
analytical approximation of a linear programming problem 
[90C05, 90C25] 
(see: Young programming) 
analytical differentiation 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
analytical tractability 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 
analyzing almost empty spaces 
(see: Selection of maximally informative genes) 
anchor 
(see: Semidefinite programming and the sensor network 
localization problem, SNLP) 
anchor see: non- — 
AND-ing 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
angle see: bond —; dihedral — 
angle condition see: acute —; nonobtuse —; uniform — 
angle method see: cutting —; Global optimization: cutting — 
angle optimization see: beam — 
angle selection see: beam — 
angle selection and wedge orientation optimization see: 
beam — 
angles see: direction — 
angular form see: block — 
angular structure see: block- —; dual block- — 
annealed replication heuristic 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 
annealing 
[60J65, 68Q25, 90C27, 90C90] 
(see: Adaptive global search; Simulated annealing) 
annealing see: adaptive simulated —; Gaussian density —; 
Packet —; re- —; simulated —; simulating —; stochastic 
simulated — 
annealing and genetic algorithm see: simulated — 
annealing and its application to protein folding see: Adaptive 
simulated — 
annealing methods in protein folding see: Simulated — 


annealing in protein folding see: Monte-Carlo simulated — 
annealing schedule 
[90C27, 90C90] 
(see: Laplace method and applications to optimization 
problems; Simulated annealing) 
annealing temperature see: initial — 
annexation see: polyhedral — 
another) see: pseudomonotone bifunction (with respect to — 
Ansatz see: reduction — 
ant colony 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
ant system 
[68T20, 68T99, 90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Metaheuristics; Quadratic assignment problem) 
ant system see: MAX-MIN — 
ante (risk averse, anticipative) decision see: ex- — 
anti-cycling procedure 
[90C60] 
(see: Complexity of degeneracy) 
anti-Monge inequalities 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
anti-Monge matrix 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
anti-Robinson 
[62H30, 90C27] 
(see: Assignment methods in clustering) 
anti-Robinson matrix 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
anticipative see: non- — 
anticipative) decision see: ex-ante (risk averse — 
anticycling 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules; Least-index anticycling 
rules; Lexicographic pivoting rules) 
anticycling rules 
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules) 
anticycling rules see: Least-index — 
antisymmetric partial order 
[41A30, 47A99, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
antisymmetric relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
antisymmetric relation see: strictly — 
antitone Boolean function 
[90C09] 
(see: Inference of monotone boolean functions) 
antitone monotone Boolean function 
[90C09] 
(see: Inference of monotone boolean functions) 
antitone operator 
[90C33] 
(see: Order complementarity) 
APF 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals) 
appearance of control function see: linear — 
application to chemical engineering design problems see: 
Interval analysis: — 
application to the enclosure of all azeotropes see: Nonlinear 
systems of equations: — 
application in facility location-allocation see: MINLP: — 
application to phase equilibrium problems see: Global 
optimization: — 
application of PI-algebras 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
application process 
[68T20, 68T99, 90C27, 90C59] 
(see: Capacitated minimum spanning trees; Metaheuristics) 
application to protein folding see: Adaptive simulated 
annealing and its — 
applications 
[49M37, 65K10, 90-01, 90B30, 90B50, 90C26, 90C27, 90C30, 
91B06, 91B32, 91B52, 91B60, 91B74] 
(see: αBB algorithm; Bilevel programming in management; 
Financial applications of multicriteria analysis; Operations 
research and financial markets) 
applications see: Bilevel programming: —; Continuous global 
optimization: —; Dynamic programming: optimal 
control —; economic —; engineering —; Invexity and its —; 
medical —; minimization Methods for Non-Differentiable 
Functions and —; Multi-quadratic integer programming: 
models and —; multistage —; noneconomic —; 
Pseudomonotone maps: properties and —; 
Quasidifferentiable optimization: —; Robust linear 
programming with right-hand-side uncertainty, duality 
and —; scientific —; Standard quadratic optimization 
problems: —; Stochastic quasigradient methods: — 
applications in blending and pooling problems see: MINLP: — 
applications in distillation systems see: Successive quadratic 
programming: — 
applications in engineering see: Bilevel programming: — 
applications in environmental systems modeling and 
management 
[90C05] 
(see: Global optimization in the analysis and management 
of environmental systems) 
applications in finance see: Semi-infinite programming and — 
applications in the interaction of design and control see: 
MINLP: — 
applications in mechanics 
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33] 
(see: Hemivariational inequalities: applications in 
mechanics) 
applications in mechanics see: Hemivariational inequalities: — 
applications of multicriteria analysis see: Financial — 
applications to optimization problems see: Laplace method 
and — 
applications of parametric programming 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 


applications in the process industry see: Successive quadratic 
programming: — 

applications of sensitivity analysis 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 

applications in the supply chain management see: Bilinear 
programming: — 

applications to thermoelasticity see: Quasidifferentiable 
optimization: — 

applications to variational inequalities and equilibrium 
problems see: Generalized monotonicity: — 

approach see: Archimedian —; auction —; augmented 
Lagrangian decomposition —; axiomatic —; Bayesian —; 
Bayesian heuristic —; Benders decomposition —; Bilevel 
programming: implicit function —; closed form —; 
continuously differentiable exact penalty function —; 
cutting plane —; direct —; equation oriented —; Everett 
generalized Lagrange multiplier —; feasibility —; 
feasible —; feasible path —; Generalized primal-relaxed 
dual —; Global optimization: g-αBB —; Global optimization: 
p-αBB —; gradient based —; GRASP —; implicit function —; 
index —; infeasible path —; Kuhn-Tucker —; 
lexicographic —; limited-memory —; limited-memory 
symmetric rank-one —; material derivative —; 
Mixed-integer nonlinear optimization: A disjunctive cutting 
plane —; modified Cauchy —; modular —; Multiple minima 
problem in protein folding: αBB global optimization —; one 
clause at a time —; open form —; Optimization with 
equilibrium constraints: A piecewise SQP —; outranking 
relations —; parabolic curve —; parametric —; path 
following —; penalty —; Petrov-Galerkin —; Phase 
problem in X-ray crystallography: Shake and bake —; 
preference disaggregation —; primal-relaxed dual —; 
proximal point —; semidefinite programming —; 
simultaneous —; stochastic —; Stochastic programming: 
minimax —; subgraph —; Tikhonov’s regularization —; 
trust region —; value function —; Variational inequalities: F. 
E. —; worst-case — 

approach: basic features, examples from financial decision 
making see: Preference disaggregation — 

approach to bilevel programming see: implicit function — 

approach to clustering see: Nonsmooth optimization — 

approach for DNA transcription element identification see: 
Mixed 0-1 linear programming — 

approach to fractional optimization see: parametric — 

approach: global optimum search with enhanced positioning 
see: Gene clustering: A novel decomposition-based 
clustering — 

approach to image reconstruction from projection data see: 
feasibility —; optimization — 

approach to optimality see: parametric — 

approach to optimization see: Image space — 

approach to optimization in water resources see: stochastic — 

approach to solving CAP on trees see: heuristic — 

approaches see: cutting plane —; equation based —; 
heuristic —; logic-based —; Optimal solvent design —; 
Statistical classification: optimization — 

appropriateness 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 
approximate 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 

approximate continuous 
[90C10, 90C11, 90C27, 90C33] 
(see: Continuous reformulations of discrete-continuous 
optimization problems) 

approximate gradient see: v- — 

approximate inference see: interval-valued — 

approximate inverse 

[65H10, 65J15] 

(see: Contraction-mapping) 

approximate Jacobian 

[49J52, 90C30] 

(see: Nondifferentiable optimization: Newton method) 

approximate methods for solving vehicle routing problems 

[90B06] 

(see: Vehicle routing) 

approximate Newton method 

[90C30] 

(see: Generalized total least squares) 

approximate optimization see: sequential — 

approximate reasoning 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 

approximate reasoning 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 

approximate reasoning see: interval logic system of —; 
point-based logic system of — 

approximate solutions of nonlinear systems of equations see: 
error bound for — 

approximately see: exactly or — 

approximating cone see: high-order —; tangent high-order — 

approximating cone of decrease see: high-order — 

approximating cones see: feasible high-order —; tangent 
high-order — 

approximating curve see: feasible high-order —; 
high-order —; tangent high-order — 

approximating the recourse function 
[90C06, 90C15] 
(see: Stabilization of cutting plane algorithms for stochastic 
linear programming problems) 

approximating vector see: feasible high-order —; high-order 
tangent — 

approximating vector of decrease see: high-order — 

approximating vectors see: high-order — 

approximation 
[49J20, 49J52, 65H20, 65M60] 
(see: Multi-scale global optimization using 
terrain/funneling methods; Shape optimization; 
Variational inequalities: F. E. approach) 

approximation 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D10, 65D30, 
65K05, 65K10, 90C15, 90C25, 90C26, 90C34, 90C35] 
(see: ABS algorithms for optimization; Approximation of 
multivariate probability integrals; Graph coloring; 
Multistage stochastic programming: barycentric 
approximation; Overdetermined systems of linear 
equations; Semi-infinite programming: numerical methods; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 
approximation see: algorithmic —; barycentric —; best —; 
better —; Chebyshev best —; cost —; discrete —; 
ellipsoidal —; finite-difference —; finite element —; 
Generalized outer —; hybrid branch and bound and 
outer —; inner —; linear —; linear outer —; logic of —; 
Logic-based outer —; maximal best —; mean field —; 
minimal best —; mixed finite element —; multipoint —; 
Multistage stochastic programming: barycentric —; 
outer —; Padé —; Padé-type —; perturbative —; 
point-based —; polyblock —; polynomial of best —; 
proximal —; quadratic outer —; second order —; Sensitivity 
and stability in NLP: —; stochastic —; successive —; 
truncated Taylor — 

approximation algorithm 

[90C20, 90C25] 

(see: Quadratic programming over an ellipsoid) 
approximation algorithm see: MINLP: outer —; outer — 
approximation algorithms 

[05C05, 05C85, 68Q25, 90B06, 90B35, 90B80, 90C06, 90C10, 
90C27, 90C39, 90C57, 90C59, 90C60, 90C90] 
(see: Bottleneck steiner tree problems; Directed tree 
networks; Traveling salesman problem) 
approximation algorithms 

[03B05, 05C05, 05C85, 68P10, 68Q25, 68R05, 68T15, 68T20, 
90B80, 90C09, 90C27, 90C35, 90C60, 90C90, 94C10] 
(see: Bottleneck steiner tree problems; Complexity theory: 
quadratic programming; Maximum satisfiability problem; 
Multi-index transportation problems; Simulated annealing; 
Steiner tree problems) 
approximation algorithms see: Cost — 
approximation algorithms for GAP 

[90-00] 

(see: Generalized assignment problem) 
approximation Analysis 

[68Q25, 68R10, 68W40, 90C27, 90C59] 

(see: Domination analysis in combinatorial optimization) 
approximation by bounded or continuous functions see: 
Lipschitzian operators in best — 
approximation with equality relaxation see: outer — 
approximation with equality relaxation and augmented 
penalty see: outer — 

Approximation of extremum problems with probability 
functionals 

(90C15) 

(referred to in: Approximation of multivariate probability 
integrals; Discretely distributed stochastic programs: 
descent directions and efficient points; Extremum problems 
with probability functions: kernel type solution methods; 
General moment optimization problems; Logconcave 
measures, logconvexity; Logconcavity of discrete 
distributions; L-shaped method for two-stage stochastic 
programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of multivariate probability 
integrals; Discretely distributed stochastic programs: 
descent directions and efficient points; Extremum problems 
with probability functions: kernel type solution methods; 
General moment optimization problems; Logconcave 
measures, logconvexity; Logconcavity of discrete 
distributions; L-shaped method for two-stage stochastic 
programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

approximation of a function see: first order — 

approximation of linear programming see: analytical — 

approximation of a linear programming problem see: 
analytical — 

approximation measure see: contraction/ — 

approximation method see: Logic-based outer- —; outer —; 
polyblock —; Vogel — 

approximation methods see: Gaussian —; Semi-infinite 
programming: — 

Approximation of multivariate probability integrals 
(65C05, 65D30, 65Cxx, 65C30, 65C40, 65C50, 65C60, 90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

approximation of nonsmooth mappings 
[49J52, 90C30] 
(see: Nondifferentiable optimization: Newton method) 

approximation operator see: best — 
approximation in ordered normed linear spaces see: Best — 


approximation to the problem 
[41A30, 47A99, 65K10, 93-XX] 
(see: Boundary condition iteration BCI; Lipschitzian 
operators in best approximation by bounded or continuous 
functions) 
approximation problem see: simultaneous Diophantine — 
approximation ratio 
[05C05, 05C85, 68Q25, 90B80] 
(see: Bottleneck steiner tree problems; Directed tree 
networks) 
approximation scheme see: fully polynomial time —; 
polynomial time — 
approximation of space filling curves 
[90C26] 
(see: Global optimization using space filling) 
approximation techniques 
[90C15] 


(see: Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 
approximation in the uniform norm 
[90C34] 


(see: Semi-infinite programming: approximation methods) 


approximation of variational inequalities 

[65M60] 

(see: Variational inequalities: F. E. approach) 
approximations see: nonsmooth local — 
approximations of nonsmooth mappings 

[49J52, 90C30] 

(see: Nondifferentiable optimization: Newton method) 
Approximations to robust conic optimization problems 
approximations to subdifferentials see: Continuous — 
approximator 
[49J52, 90C30] 
(see: Nondifferentiable optimization: Newton method) 
aquifers 
[90C30, 90C35] 
(see: Optimization in water resources) 
arbitrage pricing theory 
[91B50] 
(see: Financial equilibrium) 
arbitrary 
[90C60] 

(see: Complexity classes in optimization) 
arbitrary multivariate distributions see: Stochastic linear 
programs with recourse and — 
arborescence see: minimum Steiner —; Steiner — 


arborescence problem see: capacitated minimum spanning — 


arborescence system see: multi-echelon — 
arborescence tree see: rectilinear Steiner — 
arboricity 

[90C35] 


(see: Fractional zero-one programming; Optimization in 
leveled graphs) 

arc see: admissible —; arrival-ground connection —; 
backward —; central —; conjunction —; disjunction —; 
dual —; endpoint of an —; entering —; forward —; 
ground-departure connection —; inadmissible —; 
incoming —; multiplier associated with an —; network —; 
outgoing —; primal —; root —; train — 


arc capacity 
[90C35] 

(see: Maximum flow problem) 

arc coloring 
[05C85] 

(see: Directed tree networks) 

arc consistency 
[65G20, 65G30, 65G40, 68T20] 

(see: Interval constraints) 

arc construction procedure see: best — 

arc cost see: piecewise linear — 

arc cost function see: sawtooth —; staircase — 

(arc) deletion problem see: vertex — 

arc in a directed network see: directed —; endpoint of an — 

arc flow bounds 
[90B10, 90C26, 90C30, 90C35] 

(see: Nonconvex network flow problems) 

arc flows see: capacity constraint on — 

arc formulation see: node- — 

arc formulation of the problem see: node- — 

arc incidence matrix see: node- — 

arc legend 
(see: Railroad crew scheduling) 

arc length vector 
[90C31, 90C39] 

(see: Multiple objective dynamic programming) 

arc in a network see: capacity of an —; cost of an —; 
directed — 

arc oriented branch and bound method 

[68T99, 90C27]

(see: Capacitated minimum spanning trees) 

arc oriented construction procedure 

[68T99, 90C27]

(see: Capacitated minimum spanning trees) 

arc routing 

[68T99, 90C27]

(see: Capacitated minimum spanning trees) 

arc routing 

[90B06]

(see: Vehicle routing) 

arc routing problem see: capacitated — 

arc separation procedure 

[90B10]

(see: Piecewise linear network flow problems) 

(arc) set problem see: feedback —; minimum feedback —; 
minimum feedback vertex —; minimum weight 
feedback —; subset feedback vertex —; subset minimum 
feedback vertex — 

Archimedes and the foundations of industrial engineering 
(01A20) 

archimedian 
(see: Planning in the process industry) 

Archimedian approach 
(see: Planning in the process industry) 

architecture see: selection of —; von Neumann — 

archive 
[34-xx, 34Bxx, 34Lxx, 93E24] 
(see: Complexity and large-scale least squares problems) 

arcs see: bold —; critical —; deadhead —; demand —; 
ground —; natural stream —; rest —; sequence of —;
train-train connection — 
are under control see: rounding errors —
area computer network see: local- — 
areas see: software package for specific mathematical — 
argon atoms 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
argument see: noncompensatory —; ordinal — 
argument function see: four- —; three- — 
argument principle 
[01A50, 01A55, 01A60] 
(see: Fundamental theorem of algebra) 
arithmetic see: balanced interval —; balanced random 
interval —; differentiation —; Global optimization: interval
analysis and balanced interval —; inclusion principle of
machine interval —; inner interval —; Interval —; 
Kaucher —; machine interval —; random interval —; 
slope — 


arithmetic degree of a monomial ideal 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
arithmetic operation see: interval — 
arithmetic operations on fuzzy numbers 
[90C29, 90C70] 
(see: Fuzzy multi-objective linear programming) 
arity of a constraint 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
arm see: flexible —; Optimal control of a flexible — 
armed restless bandit problem see: multi- — 
Armijo-like criterion see: test nonmonotone — 
Armijo rule 
[90C30] 
(see: Convex-simplex algorithm; Cost approximation 
algorithms) 
Armijo steplength rule 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations) 
Arora PTAS 
[90C27] 
(see: Steiner tree problems) 
ARR 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 
arr-station 
(see: Railroad locomotive scheduling) 
arr-time 
(see: Railroad locomotive scheduling) 
arrangement 
[05B35, 20F36, 20F55, 26A24, 52C35, 57N65, 65K99, 85-08] 
(see: Automatic differentiation: geometry of satellites and 
tracking stations; Hyperplane arrangements in 
optimization) 
arrangement see: face of an —; hyperplane —; linear —; 
polygonal —; simple —; two Polygons — 


arrangement of hyperplane 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
arrangement of hyperplanes see: Boolean —; braid —; 
cohomology of an —; complement of an —; divisor of 
an —; free —; reflection —; singularity of an — 
arrangement problem see: linear — 
arrangements see: Hyperplane — 
arrangements in optimization see: Hyperplane — 
Arrhenius constants 
[90C30, 90C52, 90C53, 90C55] 
(see: Gauss-Newton method: Least squares, relation to 
Newton’s method) 
arrival see: duty-after- — 
arrival-ground connection arc 
(see: Railroad locomotive scheduling)
arrival-ground node
(see: Railroad locomotive scheduling)
arrival node
(see: Railroad locomotive scheduling)
arrival-station
(see: Railroad crew scheduling)
Arrow-Hurwicz gradient method
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
art in modeling agricultural systems see: State of the —
artificial centering hit and run
[90C26, 90C90]
(see: Global optimization: hit and run methods)
artificial intelligence
[65G20, 65G30, 65G40, 65K05, 68T20, 90-08, 90C05, 90C06,
90C10, 90C11, 90C26, 90C30, 90C90]
(see: Disease diagnosis: optimization-based methods;
Forecasting; Interval constraints)
artificial intelligence
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 90C26,
90C30, 91Axx, 91B06, 92C60] 
(see: Boolean and fuzzy relations; Forecasting) 
ary relation see: n- — 
AS 


a 


it 


[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem)
as conic convex program see: semidefinite program — 
ASA 
[92C05] 
(see: Adaptive simulated annealing and its application to
protein folding) 
ascendant direction 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
ascendant direction see: feasible — 
ascent see: dual —; rate of steepest —; rule of steepest — 
ascent direction see: Dini steepest —; Hadamard steepest —; 
steepest — 
ascent flow 
[58E05, 90C30] 
(see: Topology of global optimization) 
asking strategy see: binary search-Hansel chains question- —; 
question- —; sequential Hansel chains question- — 
ASOG equation
[90C26, 90C90]
(see: Global optimization in phase and chemical reaction
equilibrium)

aspatial oligopoly problem
[91B06, 91B60]
(see: Oligopolistic market equilibrium)

aspatial and spatial markets
[91B06, 91B60]
(see: Oligopolistic market equilibrium)

aspiration criteria
[68M20, 90B06, 90B35, 90B80, 90C59]
(see: Flow shop scheduling problem; Heuristic and
metaheuristic algorithms for the traveling salesman
problem; Location routing problem)

aspiration level 
[05C69, 05C85, 68W01, 90C29, 90C59] 
(see: Heuristics for maximum clique and independent set; 
Multiple objective programming support) 

aspiration search 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 

aspiration search see: parallel — 

asplund 
[49K27, 58C20, 58E30, 90C46, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials; 
Nonsmooth analysis: weak stationarity) 

assembly crossover (EAX) see: edge — 

assessment see: comparative efficiency —; fuzzy truth — 

asset see: risk-free — 

asset Liability Management
[65K99, 90C29]
(see: Asset liability management decision support system)

Asset liability management decision support system
(90C29, 65K99)

asset pricing model see: capital —

asset selling problem
[49L20, 90C39]
(see: Dynamic programming: discounted problems)

assignment
[90C29]
(see: Decision support systems with multiple criteria; Mixed
integer programming/constraint programming hybrid
methods)

assignment 
[90C10, 90C35] 
(see: Bi-objective assignment problem) 

assignment see: adjacent channel constrained frequency —; 
airline fleet —; co-channel constrained frequency —; 
discrete location and —; dynamic traffic —; feasible —; 
fleet —; free —; optimal —; order of a T-coloring 
frequency —; partial —; quadratic —; single —; span of 
a T-coloring frequency —; traffic —; variable — 

assignment algorithms see: pair —; single — 

assignment constraints see: semi- — 

Assignment and matching 
(90C35, 90C27, 90C10, 90C05) 
(referred to in: Assignment methods in clustering; 
Bi-objective assignment problem; Communication network 
assignment problem; Frequency assignment problem; 
Linear ordering problem; Maximum partition matching;
Multidimensional assignment problem; Quadratic
assignment problem) 
(refers to: Assignment methods in clustering; Bi-objective 
assignment problem; Communication network assignment 
problem; Frequency assignment problem; Maximum 
partition matching; Quadratic assignment problem) 

assignment method 
[62H30, 90C27] 
(see: Assignment methods in clustering) 

Assignment methods in clustering 
(62H30, 90C27) 
(referred to in: Assignment and matching; Bi-objective 
assignment problem; Communication network assignment 
problem; Frequency assignment problem; Linear ordering 
problem; Maximum partition matching; Quadratic 
assignment problem) 
(refers to: Assignment and matching; Bi-objective 
assignment problem; Communication network assignment 
problem; Frequency assignment problem; Maximum 
partition matching; Quadratic assignment problem) 

assignment model 
[68M20, 90B06, 90B10, 90B35, 90B80, 90B85, 90C10, 90C27] 
(see: Single facility location: multi-objective euclidean 
distance location; Vehicle scheduling) 

assignment model see: the multi-resource weighted —; 
quasi- — 

assignment models see: locomotive — 

assignment problem 
[68Q25, 68R10, 68W40, 90B85, 90C05, 90C06, 90C08, 90C10, 
90C11, 90C27, 90C30, 90C35, 90C59, 90C60] 
(see: Assignment and matching; Auction algorithms; 
Bi-objective assignment problem; Complexity of 
degeneracy; Domination analysis in combinatorial 
optimization; Integer programming: cutting plane 
algorithms; Single facility location: multi-objective 
euclidean distance location) 

assignment problem 
[68Q25, 90B80, 90C05, 90C27, 90C30, 90C35] 
(see: Auction algorithms; Communication network 
assignment problem) 

assignment problem see: algebraic quadratic —; Asymptotic 
properties of random multidimensional —; Bi-objective —; 
Biquadratic —; bottleneck quadratic —; Communication 
network —; fleet —; Frequency —; general quadratic —; 
generalized —; Koopmans-Beckmann quadratic —; 
Multidimensional —; multilevel generalized —; 
multiperiod —; optimal —; order preserving —; 
quadratic —; Quadratic semi- —; radio link frequency —; 
traffic — 

assignment problems see: multi-index —; nonlinear —; 
three-index — 

assignment property see: single — 

assignment ranking 
[90C60] 
(see: Complexity of degeneracy) 

assignment and a set of prices see: almost at equilibrium of 
an —; equilibrium of an — 

assignment of wavelengths 
[05C85] 
(see: Directed tree networks) 

assignments problems see: N-adic — 
associated with an arc see: multiplier —

associated with A see: canonical function — 
Association see: atlas of the International Zeolite — 
association graph 

[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 

(see: Replicator dynamics in combinatorial optimization) 
association graph see: tree —; weighted tree — 
association problem see: data- — 
associative connective 

[03B50, 68T15, 68T30] 

(see: Finite complete systems of many-valued logic algebras) 
associativity of products of relations see: pseudo- — 
assumption see: backlog —; human rationality —; lost sales —;
nondegeneracy —; separability —
assumption for algorithm analysis see: nondegeneracy — 
assumption stability 

[90C31] 

(see: Sensitivity and stability in NLP: continuity and
differential stability)
assumptions see: regularity —; under weak — 
AST 

[03E70, 03H05, 91B16] 

(see: Alternative set theory) 
astrodynamics 

[26A24, 65K99, 85-08] 

(see: Automatic differentiation: geometry of satellites and
tracking stations)
Astrodynamics 

[26A24, 65K99, 85-08] 

(see: Automatic differentiation: geometry of satellites and
tracking stations)
astronomical problem 

[90C26, 90C90] 

(see: Global optimization in binary star astronomy) 
astronomy 

[49M37, 65K10, 90C26, 90C30, 90C90] 

(see: αBB algorithm; Global optimization in binary star
astronomy)
astronomy see: Global optimization in binary star — 
asymmetric Traveling Salesman Problem (ATSP) 

[68Q25, 68R10, 68W40, 90C27, 90C59] 

(see: Domination analysis in combinatorial optimization) 
asymmetric TSP 

[90C59] 

(see: Heuristic and metaheuristic algorithms for the
traveling salesman problem)
asymmetric TSP (ATSP) 

[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90]

(see: Traveling salesman problem) 
asymmetrical information 

[68Q25, 91B28] 

(see: Competitive ratio for portfolio management) 
asymptotic analysis 

[90C10, 90C15] 

(see: Stochastic integer programs) 
asymptotic analysis 

[90Cxx] 

(see: Discontinuous optimization) 


asymptotic behavior 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
asymptotic behavior 
[62C10, 65K05, 68Q25, 90B80, 90C05, 90C10, 90C15, 90C26, 
90C27] 
(see: Bayesian global optimization; Communication 
network assignment problem) 
asymptotic behavior of CAP on trees 
[68Q25, 90B80, 90C05, 90C27] 
(see: Communication network assignment problem)
asymptotic case of integral evaluation 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
asymptotic convergence 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems)
asymptotic convergence rates 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations)
asymptotic CQ 
[49K27, 49K40, 90C30, 90C31] 
(see: First order constraint qualifications) 
Asymptotic properties of random multidimensional 
assignment problem 
(90C27, 34E05)
asymptotic results for RSM-distributions 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms)
asymptotic stability 
[90C30] 
(see: Suboptimal control) 
asymptotical stability at an equilibrium 
[90B15] 
(see: Dynamic traffic networks) 
asymptotical stability of a system 
[90B15] 
(see: Dynamic traffic networks)
asymptotical system stability 
[90B15] 
(see: Dynamic traffic networks)
asymptotically admissible pair of trajectory and control functions 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
asymptotically stable 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
asymptotically stable stationary point 
[90C20] 
(see: Standard quadratic optimization problems:
algorithms) 
asynchronous algorithm 
[90C30, 90C52, 90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms)
asynchronous computation 
[90C30, 90C52, 90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms;
Cost approximation algorithms) 
asynchronous computation 
[90C30] 
(see: Cost approximation algorithms)
asynchronous computation see: partially —
asynchronous convergence theorem 

[90C30, 90C52, 90C53, 90C55] 

(see: Asynchronous distributed optimization algorithms) 
Asynchronous distributed optimization algorithms 

(90C30, 90C52, 90C53, 90C55)
(referred to in: Automatic differentiation: parallel
computation; Heuristic search; Load balancing for parallel
optimization techniques; Parallel computing: complexity
classes; Parallel computing: models; Parallel heuristic
search; Stochastic network problems: massively parallel
solution)
(refers to: Automatic differentiation: parallel computation;
Heuristic search; Interval analysis: parallel methods for
global optimization; Load balancing for parallel
optimization techniques; Parallel computing: complexity
classes; Parallel computing: models; Parallel heuristic
search; Stochastic network problems: massively parallel
solution)

asynchronous implementation of the auction algorithm see:
totally — 

asynchronous iterative algorithms 

[90C30, 90C52, 90C53, 90C55] 

(see: Asynchronous distributed optimization algorithms) 
asynchronous iterative method see: partially — 
asynchronous operation see: partially —; totally — 
asynchronous parallel CA algorithm 
[90C30]

(see: Cost approximation algorithms) 

asynchronous round robin balancing scheme 

[68W10, 90C27]

(see: Load balancing for parallel optimization techniques) 
Atkinson-Brakhage preconditioner 

[65H10, 65J15]

(see: Contraction-mapping) 

atlas of the International Zeolite Association 

[74A40, 90C26]

(see: Shape selective zeolite separation and catalysis:
optimization methods)
atom see: all- — 
ATOMFT 

[65K05, 90C30] 

(see: Automatic differentiation: point and interval taylor
operators)
atoms see: argon — 

(ATSP) see: asymmetric Traveling Salesman Problem —; 
asymmetric TSP — 

attentive convergence see: f- — 

attraction see: basins of —; region of — 

attraction and repulsion see: Global optimization in Weber's 
problem with —; Weber problem with — 

attractor see: local — 

attractors see: singular local — 

attractors for gradient-related descent iterations see: Local — 

attribute utility theory see: multi- — 

attributed tree 

[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 

(see: Replicator dynamics in combinatorial optimization) 
auction 

[90C30, 90C35] 

(see: Auction algorithms) 


auction algorithm 
[90C30, 90C35] 
(see: Auction algorithms) 

auction algorithm see: graph collapsing —; modified 
standard —; naive —; synchronous implementation of 
the —; totally asynchronous implementation of the — 

Auction algorithms 
(90C30, 90C35) 
(referred to in: Communication network assignment 
problem; Dynamic traffic networks; Equilibrium networks; 
Generalized networks; Maximum flow problem; Minimum 
cost flow problem; Multicommodity flow problems; 
Network design problems; Network location: covering 
problems; Nonconvex network flow problems; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium) 
(refers to: Communication network assignment problem; 
Directed tree networks; Dynamic traffic networks; 
Equilibrium networks; Evacuation networks; Generalized 
networks; Maximum flow problem; Minimum cost flow 
problem; Network design problems; Network location: 
covering problems; Nonconvex network flow problems; 
Piecewise linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium) 

auction algorithms 
[90B10, 90C27, 90C30, 90C52, 90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms; 
Shortest path tree algorithms) 

auction algorithms see: graph collapsing in —; graph 
reduction in —; virtual source concept in — 

auction approach 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 

auction technique 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 

auditing decisions see: Multicriteria decision support 
methodologies for — 

augmentation see: planar — 

augmentation oracle 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 

augmentation oracle see: oriented — 

augmented Lagrange functions 
[90C30] 
(see: Nonlinear least squares problems) 

augmented Lagrangian 
[90C25, 90C30, 90C33] 
(see: Implicit lagrangian; Lagrangian multipliers methods 
for convex programming) 

augmented Lagrangian decomposition approach 
[90C30, 90C35] 
(see: Optimization in water resources) 
augmented Lagrangian function

[90C30] 

(see: Image space approach to optimization) 
augmented Lagrangian methods see: Practical — 
augmented Lagrangians 

[90C25, 90C30] 

(see: Lagrangian multipliers methods for convex
programming)
augmented network 

[90B10, 90C26, 90C30, 90C35] 

(see: Nonconvex network flow problems) 
augmented penalty see: outer approximation with equality
relaxation and —
augmented performance index 

[93-XX]

(see: Direct search Luus-Jaakola optimization procedure)
augmenting flows 
[90C35]
(see: Minimum cost flow problem) 
augmenting path algorithm 
[90C35]
(see: Maximum flow problem) 
augmenting path algorithm see: generic — 
augmenting vector 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)
automata see: theory of — 
automated design optimization process 
[90C90]
(see: Design optimization in computational fluid dynamics) 
automated Fortran program for nonlocal sensitivity analysis 
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX]
(see: Nonlocal sensitivity analysis with automatic 
differentiation) 
automated hypothesis formation 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
automatic classification of documents 
[90C09, 90C10]
(see: Optimization in classifying text documents) 
automatic differentiation 
[26A24, 34-XX, 49-04, 49-XX, 65-XX, 65D25, 65G20, 65G30,
65G40, 65H20, 65K05, 65K99, 65L99, 65Y05, 68-XX, 68N20, 
68W30, 85-08, 90-XX, 90C26, 90C30] 
(see: Automatic differentiation: calculation of the Hessian; 
Automatic differentiation: calculation of Newton steps; 
Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: root problem and branch problem; 
Bounding derivative ranges; Complexity of gradients, 
Jacobians, and Hessians; Interval analysis: differential 
equations; Interval analysis: intermediate terms; Interval 
global optimization; Nonlocal sensitivity analysis with 
automatic differentiation) 


automatic differentiation 
[26A24, 34-XX, 49-04, 49-XX, 65-XX, 65D25, 65K05, 65K99, 
65Y05, 68-XX, 68N20, 68W30, 85-08, 90-XX, 90C26, 90C30] 
(see: Automatic differentiation: calculation of the Hessian; 
Automatic differentiation: calculation of Newton steps; 
Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: parallel 
computation; Automatic differentiation: point and interval 
taylor operators; Automatic differentiation: root problem 
and branch problem; Bounding derivative ranges; 
Complexity of gradients, Jacobians, and Hessians; Nonlocal 
sensitivity analysis with automatic differentiation) 
automatic differentiation see: backward mode in —; forward 
mode of —; interval —; Nonlocal sensitivity analysis 
with —; reverse mode —; vector forward — 
Automatic differentiation: calculation of the Hessian 
(90C30, 65K05) 
(referred to in: Automatic differentiation: calculation of 
Newton steps; Automatic differentiation: geometry of 
satellites and tracking stations; Automatic differentiation: 
introduction, history and rounding error estimation; 
Automatic differentiation: parallel computation; 
Automatic differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators; 
Automatic differentiation: root problem and branch 
problem; Nonlocal sensitivity analysis with automatic 
differentiation) 
(refers to: Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators; 
Automatic differentiation: root problem and branch 
problem; Nonlocal sensitivity analysis with automatic 
differentiation) 
Automatic differentiation: calculation of Newton steps 
(90C30, 65K05) 
(referred to in: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: geometry of satellites 
and tracking stations; Automatic differentiation: 
introduction, history and rounding error estimation; 
Automatic differentiation: parallel computation; 
Automatic differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators; 
Automatic differentiation: root problem and branch 
problem; Dynamic programming and Newton’s method in 
unconstrained optimal control; Interval Newton methods; 
Nondifferentiable optimization: Newton method; 
Nonlinear least squares: Newton-type methods; Nonlocal 
sensitivity analysis with automatic differentiation; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 
(refers to: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: geometry of satellites 
and tracking stations; Automatic differentiation: 
introduction, history and rounding error estimation; 
Automatic differentiation: parallel computation; 
Automatic differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators; 
Automatic differentiation: root problem and branch
problem; Dynamic programming and Newton’s method in 
unconstrained optimal control; Interval Newton methods; 
Nondifferentiable optimization: Newton method; Nonlocal 
sensitivity analysis with automatic differentiation; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 


Automatic differentiation: geometry of satellites and tracking
stations
(26A24, 65K99, 85-08)

(referred to in: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: introduction, history and 
rounding error estimation; Automatic differentiation: 
parallel computation; Automatic differentiation: point and 
interval; Automatic differentiation: point and interval 
taylor operators; Automatic differentiation: root problem 
and branch problem; Nonlocal sensitivity analysis with 
automatic differentiation) 

(refers to: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: introduction, history and 
rounding error estimation; Automatic differentiation: 
parallel computation; Automatic differentiation: point and 
interval; Automatic differentiation: point and interval 
taylor operators; Automatic differentiation: root problem 
and branch problem; Nonlocal sensitivity analysis with 
automatic differentiation) 


Automatic differentiation: introduction, history and
rounding error estimation
(65D25, 26A24)

(referred to in: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: parallel 
computation; Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Automatic differentiation: root problem and 
branch problem; Interval analysis: intermediate terms; 
Interval analysis: subdivision directions in interval branch 
and bound methods; Nonlocal sensitivity analysis with 
automatic differentiation; Unconstrained optimization in 
neural network training) 

(refers to: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: parallel 
computation; Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Automatic differentiation: root problem and 
branch problem; Nonlocal sensitivity analysis with 
automatic differentiation) 


Automatic differentiation: parallel computation
(65Y05, 68N20, 49-04)

(referred to in: Asynchronous distributed optimization 
algorithms; Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: point and interval; Automatic
differentiation: point and interval taylor operators;
Automatic differentiation: root problem and branch 
problem; Heuristic search; Load balancing for parallel 
optimization techniques; Nonlocal sensitivity analysis with 
automatic differentiation; Parallel computing: complexity 
classes; Parallel computing: models; Parallel heuristic 
search; Stochastic network problems: massively parallel 
solution) 

(refers to: Asynchronous distributed optimization 
algorithms; Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators; 
Automatic differentiation: root problem and branch 
problem; Heuristic search; Interval analysis: parallel 
methods for global optimization; Load balancing for 
parallel optimization techniques; Nonlocal sensitivity 
analysis with automatic differentiation; Parallel computing: 
complexity classes; Parallel computing: models; Parallel 
heuristic search; Stochastic network problems: massively 
parallel solution) 


Automatic differentiation: point and interval
(65H99, 65K99)

(referred to in: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval taylor operators; 
Automatic differentiation: root problem and branch 
problem; Bounding derivative ranges; Global optimization: 
application to phase equilibrium problems; Interval 
analysis: application to chemical engineering design 
problems; Interval analysis: differential equations; Interval 
analysis: eigenvalue bounds of interval matrices; Interval 
analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: systems of 
nonlinear equations; Interval analysis: unconstrained and 
constrained optimization; Interval analysis: verifying 
feasibility; Interval constraints; Interval fixed point theory; 
Interval global optimization; Interval linear systems; 
Interval Newton methods; Nonlocal sensitivity analysis 
with automatic differentiation) 

(refers to: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval taylor operators; 
Automatic differentiation: root problem and branch 
problem; Bounding derivative ranges; Global optimization: 
application to phase equilibrium problems; Interval 
analysis: application to chemical engineering design 
problems; Interval analysis: differential equations; Interval 
analysis: eigenvalue bounds of interval matrices; Interval 
analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods; Nonlocal sensitivity analysis with 
automatic differentiation) 

Automatic differentiation: point and interval Taylor operators
(65K05, 90C30) 
(referred to in: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval; Automatic 
differentiation: root problem and branch problem; 
Bounding derivative ranges; Global optimization: 
application to phase equilibrium problems; Interval 
analysis: application to chemical engineering design 
problems; Interval analysis: differential equations; Interval 
analysis: eigenvalue bounds of interval matrices; Interval 
analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: systems of 
nonlinear equations; Interval analysis: unconstrained and 
constrained optimization; Interval analysis: verifying 
feasibility; Interval constraints; Interval fixed point theory; 
Interval global optimization; Interval linear systems; 
Interval Newton methods; Nonlocal sensitivity analysis 
with automatic differentiation) 
(refers to: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval; Automatic 
differentiation: root problem and branch problem; 
Bounding derivative ranges; Global optimization: 
application to phase equilibrium problems; Interval 
analysis: application to chemical engineering design 
problems; Interval analysis: differential equations; Interval 
analysis: eigenvalue bounds of interval matrices; Interval 
analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods; Nonlocal sensitivity analysis with 
automatic differentiation) 

Automatic differentiation: root problem and branch problem 
(65K05) 
(referred to in: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton
steps; Automatic differentiation: geometry of satellites and
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval; Automatic 
differentiation: point and interval Taylor operators;
Nonlocal sensitivity analysis with automatic 
differentiation) 
(refers to: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval; Automatic 
differentiation: point and interval Taylor operators;
Nonlocal sensitivity analysis with automatic 
differentiation) 

automatic document classification 

[90C09, 90C10] 

(see: Optimization in classifying text documents)

automatic graph drawing 

[90C35] 

(see: Optimization in leveled graphs)

automatic parallelization 

[05-02, 05-04, 15A04, 15A06, 68U99] 

(see: Alignment problem)

automatic result verification 

[65G20, 65G30, 65G40, 65H20, 65K99] 

(see: Interval analysis: unconstrained and constrained
optimization; Interval analysis: verifying feasibility; 
Interval fixed point theory; Interval Newton methods) 

automation 

[93D09] 

(see: Robust control)

autonomy 

[49M37, 65K05, 65K10, 90C30, 93A13] 

(see: Multilevel methods for optimal design)

auxiliary function 

[65K05, 68W10, 90B15, 90C06, 90C30] 

(see: Direct global optimization algorithm; Stochastic
network problems: massively parallel solution)

auxiliary problem principle 

[90C30] 

(see: Cost approximation algorithms)

auxiliary variable 

[93-XX] 

(see: Dynamic programming: optimal control applications)

availability see: upper bound on gas lift — 

average 

[65H20] 

(see: Multi-scale global optimization using
terrain/funneling methods)

average see: on — 

average behavior 

[05B35, 65K05, 90C05, 90C20, 90C33] 

(see: Criss-cross pivoting rules)
average case analysis

[62C10, 65K05, 90C10, 90C15, 90C26]

(see: Bayesian global optimization) 

average case behavior 

[52A22, 60D05, 68Q25, 90C05]

(see: Probabilistic analysis of simplex algorithms) 

average case complexity 

[60J65, 68Q25] 

(see: Adaptive global search) 

average case complexity of algorithms 

[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms) 

average case setting 

[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 

(see: Information-based complexity and information-based
optimization)

average cost per stage 

[49L99] 

(see: Dynamic programming: average cost per stage
problems)

average cost per stage problem 

[49L20, 90C39, 90C40]

(see: Dynamic programming: infinite horizon problems,
overview)

average cost per stage problems 

[49L99] 

(see: Dynamic programming: average cost per stage
problems) 

average cost per stage problems see: Dynamic 
programming: — 

average model see: moving — 

average nonredundancy rate 

[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms) 

average number of pivot steps 

[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms)

average redundancy rate 

[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms) 

average rmsds by energy 

[92C05, 92C40] 

(see: Protein loop structure prediction methods) 
averaged Navier-Stokes code see: Reynolds- — 
averaging 
[55R15, 55R35, 65K05, 90C11] 
(see: Deterministic and probabilistic optimization models
for data classification) 
averaging method 
[47J20, 49J40, 65K10, 90C33] 
(see: Solution methods for multivalued variational
inequalities) 
averaging operation 
[90C15] 
(see: Stochastic quasigradient methods: applications)
averaging operation
[90C15]
(see: Stochastic quasigradient methods in minimax
problems)
averse, anticipative) decision see: ex-ante (risk — 
avoidance 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
AW. see: Tucker — 
away direction 
[90C30] 
(see: Frank-Wolfe algorithm) 
away terminals 
(see: Railroad crew scheduling) 
axes of coordinates 
[01A99] 
(see: Leibniz, Gottfried Wilhelm)
axial MITPs see: greedy algorithm for —; hub heuristics for — 
axial multi-index transportation problem 
[90C35] 
(see: Multi-index transportation problems) 
axiom see: choice —; existence of classes —; extensionality —; 
induction —; prolongation —; regularity —; two 
cardinalities — 
axiom of extensionality 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
axiom of prolongation 
[03E70, 03H05, 91B16]
(see: Alternative set theory)
axiom systems for oriented matroids
[90C09, 90C10]
(see: Oriented matroids)
axiomatic approach
[90C30]
(see: Global optimization based on statistical models)
axiomatic derivation of cross-entropy
[90C25, 94A17]
(see: Entropy optimization: Shannon measure of entropy
and its properties)
axiomatic derivation of entropy
[90C25, 94A17]
(see: Entropy optimization: Shannon measure of entropy
and its properties)
axiomatic derivation of the principle of maximum entropy
[90C25, 94A17]
(see: Entropy optimization: Shannon measure of entropy
and its properties)
axiomatic derivation of the principle of minimum cross-entropy
[90C25, 94A17]
(see: Entropy optimization: Shannon measure of entropy
and its properties)
axioms see: existence of sets —; painting — 
axioms of alternative set theory 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
azeotrope 
[90C30]
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
azeotropes see: heterogeneous —; Nonlinear systems of 
equations: application to the enclosure of all —; reactive — 
Azuma’s inequality
[05C85] 
(see: Directed tree networks) 


B see: gas lift wells of type —; flowing wells of type —
B-derivative 
[49J52, 90C30] 
(see: Nondifferentiable optimization: Newton method) 
b-matching 
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming) 
b-matching problem 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
b-matching problem see: perfect — 
B-subdifferential 
[49J52, 90C30] 
(see: Nondifferentiable optimization: Newton method) 
B well see: type — 
B wells see: type — 
BA 
[62C10, 65K05, 90C10, 90C15, 90C26] 
(see: Bayesian global optimization) 
back-off 
[49M37, 90C11] 
(see: MINLP: applications in the interaction of design and 
control) 
back propagation 
[90C15] 
(see: Stochastic quasigradient methods: applications) 
backboard wiring problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
backhauls 
[00-02, 01-02, 03-02] 
(see: Vehicle routing problem with simultaneous pickups 
and deliveries) 
backhauls see: vehicle routing problem with — 
backlog assumption 
[49L20]
(see: Dynamic programming: inventory control) 
backpropagation method 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
backsolving see: Gaussian elimination with — 
backtrack 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
backtrack phases 
[90C35] 
(see: Graph coloring) 
backtracking 

[90C10, 90C29, 90C30] 

(see: Multi-objective integer linear programming;
Nonlinear least squares problems) 

backtracking see: depth-first search with — 
backward arc 

[90C35]

(see: Maximum flow problem)

backward compatibility 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 

(see: Boolean and fuzzy relations)

backward mode in automatic differentiation 

[65G20, 65G30, 65G40, 65H20] 

(see: Interval analysis: intermediate terms) 
bad derivatives see: method of — 
Baire measure 

[03H10, 49J27, 90C34] 

(see: Semi-infinite programming and control problems) 
bake algorithm see: shake and — 
bake approach see: Phase problem in X-ray crystallography:
Shake and —
balance see: power — 
balance constraints see: mass — 
balance equation 

[49-XX, 90-XX, 93-XX] 

(see: Duality theory: triduality in global optimization) 
balance equations see: mass and energy —; node flow — 
balance equations for material flows 

(see: Planning in the process industry) 
balance sheet 

[91B50] 

(see: Financial equilibrium) 
balanced interval arithmetic 

[65G30, 65G40, 65K05, 90C30, 90C57] 

(see: Global optimization: interval analysis and balanced
interval arithmetic)
balanced interval arithmetic see: Global optimization: interval
analysis and —
balanced node 

[90C35]

(see: Minimum cost flow problem) 
balanced random interval arithmetic 

[65G30, 65G40, 65K05, 90C30, 90C57] 

(see: Global optimization: interval analysis and balanced
interval arithmetic)
balances see: mass —; mass, energy and momentum — 
balancing see: dynamic load —; load —; static load — 
balancing objectives 

[90B85] 

(see: Single facility location: multi-objective Euclidean
distance location; Single facility location: multi-objective
rectilinear distance location)
balancing for parallel optimization techniques see: Load — 
balancing problem see: turbine — 
balancing scheme see: asynchronous round robin —; 
near-neighbor load — 
balancing technique see: dynamic load — 
Balas algorithm 
[90C10, 90C29] 
(see: Multi-objective integer linear programming) 
ball algorithm see: heavy —
ball-constrained linear problem 
[37A35, 90C05] 
(see: Potential reduction methods for linear programming)
ball constraint 
[90C20, 90C25] 
(see: Quadratic programming over an ellipsoid)
ball method see: heavy — 
ball quotient 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements)
Banach linear extension theorem see: Hahn- —
Banach space see: monotone operator on a — 
Banach spaces see: Steiner ratio in — 
Banach theorem see: Hahn- —; Mazur-Orlicz version of the
Hahn- — 
banded matrix 
[65Fxx] 
(see: Least squares problems)
bandit problem see: multi-armed restless — 
Bandler-Kohout compatibility theorem 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations)
bandwidth 
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization)
bandwidth see: lower —; optical —; row —
bandwidth of interdisciplinary coupling 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design)
bandwidth packing problem 
[90C35] 
(see: Multicommodity flow problems)
bandwidth problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
bank see: first — 
bar of a truss see: elastic — 
BARON 
[90C11] 
(see: MINLP: branch and bound methods) 
barrier 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following
and singularities) 
barrier function 
[90C05, 90C22, 90C25, 90C30, 90C51] 
(see: Interior point methods for semidefinite programming) 
barrier function 
[90C25, 90C30]
(see: Successive quadratic programming: solution by active 
sets and interior point methods) 
barrier function see: logarithmic — 
barrier location problems 
[90B85] 
(see: Multifacility and restricted location problems) 
barrier method see: interior point logarithmic — 
barrier methods 

[65L99, 93-XX] 

(see: Optimization strategies for dynamic systems) 
barrier-penalty function see: logarithmic-quadratic — 
barrier policy 

[90B05, 90B06] 

(see: Global supply chain models) 
barycenter see: weighted — 
barycenters see: generalized — 
barycentric approximation 

[90C15] 

(see: Multistage stochastic programming: barycentric
approximation)
barycentric approximation see: Multistage stochastic
programming: —
barycentric scenario trees 
[90C15] 
(see: Multistage stochastic programming: barycentric 
approximation) 
barycentric weights 
[90C15] 
(see: Multistage stochastic programming: barycentric 
approximation) 
base polytope see: matroid — 
based see: index- —; method- —; metric- 
based aggregation see: feature- — 
based algorithm see: exact penalty function — 
based approach see: gradient — 
based approaches see: equation —; logic- — 
based approximation see: point- — 
based branch and bound see: IP/NLP —; QP/NLP — 
based clustering approach: global optimum search with 
enhanced positioning see: Gene clustering: A novel 
decomposition- — 

based complexity see: information- — 

based complexity and information-based optimization see:
Information- —
based control for drug delivery systems see: Model —
based controllers via parametric programming see: Design of
robust model- —
based experimental analysis see: model- — 
based framework see: graph — 
based framework for radiation therapy see: Optimization —
based gradient see: adjoint- —; sensitivity- — 
based heuristics see: continuous — 
based implementation see: PVM- — 
based implementations see: MPI- — 
based logic system of approximate reasoning see: point- — 
based lower bounds see: eigenvalue — 
based method see: KKT- —; penalty- —; reduction — 
based methods see: descent- —; Disease diagnosis: 
optimization- —; MINLP: logic- — 

based model see: glS design pattern —; information- — 

based NP methods see: knowledge- — 

based optimization see: information- —; Information-based 
complexity and information- —; simulation- — 

based outer approximation see: Logic- — 

based perspective see: metric- —; model- — 


based procedures see: gradient — 
based on semidefinite relaxations see: bounds — 
based on single-crystal X-ray diffraction data see: Optimization
techniques for phase retrieval — 

based on statistical models see: Global optimization — 

based system see: rule- — 

based theorem prover see: resolution — 

based visualization see: Optimization- — 

based yield see: (2- — 

bases see: neighboring — 

bases of a matroid see: set of — 

bases of an oriented matroid 

[90C09, 90C10]

(see: Oriented matroids) 

bases for polynomial equations see: Gröbner —

basic ABS class 

[65K05, 65K10]

(see: ABS algorithms for linear equations and linear least
squares; ABS algorithms for optimization)

basic alternative theorem 

[90C26]

(see: Invexity and its applications) 

basic column 

[90C05, 90C33]

(see: Pivoting algorithms for linear programming
generating two paths)

basic component 

[90C30]

(see: Convex-simplex algorithm) 

basic constraint qualification 

[46A20, 49K27, 49K40, 52A01, 90C30, 90C31]

(see: Composite nonsmooth optimization; First order
constraint qualifications)

basic feasible solution 

[90C05, 90C35, 90C60]
(see: Complexity of degeneracy; Generalized networks; 
Linear programming; Linear programming: Klee-Minty 
examples) 

basic feasible solution 
[90C60] 
(see: Complexity of degeneracy) 

basic features, examples from financial decision making see: 
Preference disaggregation approach: — 

basic GRASP 

[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]

(see: Greedy randomized adaptive search procedures) 

basic matrix 

[90C05, 90C33]

(see: Linear programming: Klee-Minty examples; Pivoting
algorithms for linear programming generating two paths)

basic operations in a program 

[26A24, 65D25]

(see: Automatic differentiation: introduction, history and
rounding error estimation)

basic outline of filled function methods 

[65K05, 90C26, 90C30, 90C59]

(see: Global optimization: filled function methods) 


basic QC 
[49K27, 49K40, 90C30, 90C31] 
(see: First order constraint qualifications) 
basic rules of branch and bound 
[90C10, 90C29] 
(see: Multi-objective integer linear programming) 
basic sensitivity theorem 
[90C31] 
(see: Bounds and solution vector estimates for parametric 
NLPs)
basic software routines see: package of — 
basic solution 
[05B35, 65K05, 90C05, 90C20, 90C33, 90C35] 
(see: Criss-cross pivoting rules; Generalized networks; 
Linear programming; Linear programming: Klee-Minty 
examples; Pivoting algorithms for linear programming 
generating two paths) 
basic solution 
[90C05] 
(see: Linear programming) 
basic solution see: degenerate — 
basic solutions 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules; Lexicographic pivoting
rules) 
basic variables 
[49M37, 90C05, 90C11] 
(see: Linear programming; Mixed integer nonlinear
programming) 
basic VNS 
[9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods)
BASICS see: Lagrangian duality: — 
basin see: g- —; local — 
basins of attraction 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
basis 
[65K05, 90C30] 
(see: Nondifferentiable optimization: minimax problems) 
basis 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
basis see: advanced —; complementary —; dual
degenerate —; extremal —; graver —; Gröbner —;
Hilbert —; optimal —; primal degenerate —; primal
feasible —; reduced Gröbner —; tangle —; universal
Gröbner —; working —
basis forest 
[90C35]
(see: Generalized networks) 
basis method see: extremal — 
basis orientation of an oriented matroid 
[90C09, 90C10] 
(see: Oriented matroids) 
basis tableau see: lexico-positive — 
Bassett-Maybee-Quirk theorem 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
batch design under uncertainty see: Global optimization in — 
batch learning
(see: Bayesian networks) 
batch mode 
(see: Planning in the process industry) 
batch plant see: multiproduct — 
batch plant design 
[90C05, 90C26] 
(see: Continuous global optimization: applications; Global 
optimization in batch design under uncertainty) 
batch process 
[90C26] 
(see: Global optimization in batch design under 
uncertainty) 
batch process 
[90C26] 
(see: MINLP: design and scheduling of batch processes) 
batch processes see: Medium-term scheduling of —; MINLP: 
design and scheduling of —; Reactive scheduling of — 
batch processes with resources see: Short-term scheduling 
of — 
batch production systems 
(see: Planning in the process industry)
batch reactor see: fed- — 
batch scheduling 
[62C10, 65K05, 90C10, 90C15, 90C26] 
(see: Bayesian global optimization) 
Bauer formula 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians)
Bayes see: naive —; simple — 
Bayes optimal rule 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods)
Bayesian approach 
[62C10, 65K05, 90C10, 90C15, 90C26] 
(see: Bayesian global optimization)
Bayesian decision-theoretic framework 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: stopping rules)
Bayesian global optimization 
(90C26, 90C10, 90C15, 65K05, 62C10) 
(referred to in: Adaptive simulated annealing and its
application to protein folding; Bayesian networks; Genetic 
algorithms for protein structure prediction; Global 
optimization based on statistical models; Monte-Carlo 
simulated annealing in protein folding; Packet annealing; 
Random search methods; Simulated annealing; Simulated 
annealing methods in protein folding; Stochastic global 
optimization: stopping rules; Stochastic global 
optimization: two-phase methods) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Genetic algorithms for protein structure 
prediction; Global optimization based on statistical models; 
Monte-Carlo simulated annealing in protein folding; 
Packet annealing; Random search methods; Simulated 
annealing; Simulated annealing methods in protein folding;
Stochastic global optimization: stopping rules; Stochastic
global optimization: two-phase methods) 
Bayesian heuristic approach 
[62C10, 65K05, 90C10, 90C15, 90C26] 
(see: Bayesian global optimization) 
Bayesian inference 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 
Bayesian methods 
[91B28] 
(see: Portfolio selection: Markowitz mean-variance model)
Bayesian networks 
(refers to: Bayesian global optimization; Evolutionary 
algorithms in combinatorial optimization; Neural networks 
for combinatorial optimization) 
Bayesian networks see: chain rule for —; dynamical — 
Bayesian parameter estimation 
[90C25, 94A17]
(see: Entropy optimization: Shannon measure of entropy
and its properties) 
Bayesian stopping rule 
[62C10, 65K05, 90C10, 90C15, 90C26]
(see: Bayesian global optimization) 
Bayesian stopping rule 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: stopping rules) 
bB procedure 
[90C26]
(see: D.C. programming) 
BB search engine 
[90C15, 90C30, 90C99]
(see: SSC minimization algorithms) 
BCI 
[93-XX] 
(see: Boundary condition iteration BCI) 
BCI see: Boundary condition iteration — 
BDMST 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
be stable without pivoting see: guaranteed to — 


beacon 
(see: Semidefinite programming and the sensor network 
localization problem, SNLP) 

beam angle optimization 

[68W01, 90-00, 90C90, 92-08, 92C50]

(see: Optimization based framework for radiation therapy)

beam angle selection 

[90C11, 90C59]

(see: Nested partitions optimization) 

beam angle selection and wedge orientation optimization 

[68W01, 90-00, 90C90, 92-08, 92C50]

(see: Optimization based framework for radiation therapy)

beam cross-sectional shapes 

[90C26, 90C90]

(see: Structural optimization: history) 

beam’s-eye-view 

[68W01, 90-00, 90C90, 92-08, 92C50]

(see: Optimization based framework for radiation therapy)

beam segmentation problem 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
Beam selection in radiotherapy treatment design 
(refers to: Credit rating and optimization methods; 
Evolutionary algorithms in combinatorial optimization; 
Optimization based framework for radiation therapy)
beam weight optimization 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
beams 
[90C26, 90C90] 
(see: Structural optimization: history) 
Beckmann QAP see: Koopmans- —
Beckmann quadratic assignment problem see: Koopmans- — 
before-departure see: duty- — 
behavior see: asymptotic —; average —; average case —; 
day-to-day dynamic travel —; exponential —; 
noncooperative —; random — 
behavior of CAP on trees see: asymptotic — 
behavior of a generally nonhomogeneous and nonisotropic 
body see: linear thermoelastic — 
beliefs see: homogeneous — 
Bellman’s equation 
[49L20, 49L99, 90C39, 90C40] 
(see: Dynamic programming: average cost per stage 
problems; Dynamic programming: discounted problems; 
Dynamic programming: infinite horizon problems, 
overview; Dynamic programming: stochastic shortest path 
problems; Dynamic programming: undiscounted problems;
Neuro-dynamic programming) 

Bellman equation see: derivation of the Hamilton-Jacobi- —;
Hamilton-Jacobi- —; solution of the Hamilton-Jacobi- —;
sufficiency theorem for the Hamilton-Jacobi- —

Bellman-Ford method 
[90B10, 90C27] 

(see: Shortest path tree algorithms) 

Ben-Tal SOCQ
[49K27, 49K40, 90C30, 90C31] 

(see: Second order constraint qualifications) 

Benders decomposition 
[90-02, 90C10, 90C15, 90C27, 90C90] 

(see: Chemical process planning; L-shaped method for 
two-stage stochastic programs with recourse; Operations 
research models for supply chain management and design; 
Stochastic integer programs; Stochastic linear programs 
with recourse and arbitrary multivariate distributions; 
Stochastic programs with recourse: upper bounds; 
Time-dependent traveling salesman problem) 

Benders decomposition 
[90C26, 90C27] 

(see: Bilevel optimization: feasibility test and flexibility 
index; Time-dependent traveling salesman problem) 

Benders decomposition see: generalized —; nested — 

Benders decomposition approach 
[90C30, 90C35] 

(see: Optimization in water resources) 

Benders method see: generalized — 

best algorithm see: Bialas-Karwan Kth- —

best approximation 
[41A30, 47A99, 65K10] 


(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 

best approximation see: Chebyshev —; maximal —; 
minimal —; polynomial of — 

best approximation by bounded or continuous functions see: 
Lipschitzian operators in — 

best approximation operator 

[41A30, 47A99, 65K10]

(see: Lipschitzian operators in best approximation by
bounded or continuous functions)

Best approximation in ordered normed linear spaces 

(90C46, 46B40, 41A50, 41A65)

best arc construction procedure 

[68T99, 90C27] 

(see: Capacitated minimum spanning trees)

best bound 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: branch and bound methods)

best bound rule 

[90C11] 

(see: MINLP: branch and bound methods)

best-compromise solution 

[90C11, 90C29, 90C90] 

(see: Multi-objective optimization: interaction of design
and control)

best estimate 

[41A30, 47A99, 65K10]

(see: Lipschitzian operators in best approximation by
bounded or continuous functions)

best estimate using pseudocosts 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: branch and bound methods)

best estimate using pseudoshadow prices 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: branch and bound methods)

best-first 

[65K05, 65Y05, 65Y10, 65Y20, 68W10] 

(see: Interval analysis: parallel methods for global
optimization)

best-first tree search 

[68W10, 90C27] 

(see: Load balancing for parallel optimization techniques)

Best-First Tree Search see: Parallel — 

best fit 

[41A30, 62J02, 90C26] 

(see: Regression by special functions: algorithms and
complexity)

best fitting to data 

[90C05, 90C25, 90C29, 90C30, 90C31] 

(see: Nondifferentiable optimization: parametric
programming)

best improvement 

[68Q25, 68R10, 68W40, 9008, 90C26, 90C27, 90C59] 

(see: Domination analysis in combinatorial optimization;
Variable neighborhood search methods)

Best improvement 

[9008, 90C26, 90C27, 90C59] 

(see: Variable neighborhood search methods)

best node construction procedure 

[68T99, 90C27] 

(see: Capacitated minimum spanning trees)
best projection
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods) 
best response mapping 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
best start 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: two-phase methods) 
best value 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems)
beta algorithm see: alpha- —; high failure of the alpha- —; low 
failure of the alpha- — 
β-sheet
[92C40] 
(see: Monte-Carlo simulated annealing in protein folding) 
better approximation 
[90C27] 
(see: Steiner tree problems) 
between nonlinear complementarity problem and fixed point 
problem see: Equivalence — 
between primal and dual solutions see: exploiting the 
interplay — 
BF see: level —; model — 
BFGS see: L- —; L-RH- — 
BFGS algorithm see: limited-memory reduced-Hessian —; 
successive affine reduction — 
BFGS-CG relationship 
[90C30] 
(see: Conjugate-gradient methods) 
BFGS method 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
BFGS method see: limited-memory —; memoryless — 
BFGS quasi-Newton update 
[15A15, 90C25, 90C55, 90C90] 
(see: Semidefinite programming and determinant 
maximization) 
BFGS update 
[90C30] 
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework) 
BFGS update 
[90C30] 
(see: Broyden family of methods and the BFGS update)
BFGS update see: Broyden family of methods and the — 
BFM 
[90C30] 
(see: Broyden family of methods and the BFGS update) 
BFR see: level —; model — 
BFS 
[90C60] 
(see: Complexity of degeneracy) 
BFS see: degenerate —; nearly degenerate —; 
nondegenerate — 
bi-knapsack problem 
[90C10, 90C27] 
(see: Multidimensional knapsack problems) 


Bi-objective assignment problem 
(90C35, 90C10) 
(referred to in: Assignment and matching; Assignment 
methods in clustering; Communication network 
assignment problem; Decision support systems with 
multiple criteria; Estimating data for multicriteria decision 
making problems: optimization techniques; Financial 
applications of multicriteria analysis; Frequency 
assignment problem; Fuzzy multi-objective linear 
programming; Linear ordering problem; Maximum 
partition matching; Multicriteria sorting methods; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling; Quadratic assignment 
problem) 
(refers to: Assignment and matching; Assignment methods 
in clustering; Communication network assignment 
problem; Decision support systems with multiple criteria; 
Estimating data for multicriteria decision making 
problems: optimization techniques; Financial applications 
of multicriteria analysis; Frequency assignment problem; 
Fuzzy multi-objective linear programming; Maximum 
partition matching; Multicriteria sorting methods; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling; Quadratic assignment 
problem) 


bI-VNS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods) 


Bialas-Karwan Kth-best algorithm 
[90C30, 90C90]
(see: Bilevel programming: global optimization) 


bias function 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures) 


bibliography of stochastic programming 
[90C11, 90C15]
(see: Stochastic programming with simple integer recourse) 
bidding increment
[90C30, 90C35] 
(see: Auction algorithms) 

bidding system see: preferential — 

bidimensional knapsack problem
[90C10, 90C27]
(see: Multidimensional knapsack problems)

bidual problem
[49K05, 49K10, 49K15, 49K20]
(see: Duality in optimal control with first order differential
equations)

biduality
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization)

biduality
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

biduality in nonconvex optimization see: Duality theory: — 

biduality theorem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

bifunction see: monotone —; pseudomonotone —; 
quasimonotone — 

bifunction (with respect to another) see: pseudomonotone — 

big-M
[90C09, 90C10, 90C11]
(see: MINLP: logic-based methods)

bigraph
[90C09, 90C10]
(see: Combinatorial matrix analysis)

bigraph see: signed — 

bilateral boundary value problem
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in
mechanics)

Bilevel fractional programming 
(90C32, 90C26) 
(referred to in: Bilevel linear programming; Bilevel linear 
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: global 
optimization; Bilevel programming: implicit function 
approach; Bilevel programming: introduction, history and 
overview; Bilevel programming in management; Bilevel 
programming: optimality conditions and duality; 
Fractional combinatorial optimization; Fractional 
programming; Multilevel methods for optimal design; 
Multilevel optimization in mechanics; Quadratic fractional 
programming: Dinkelbach method; Stochastic bilevel 
programs) 

Bilevel linear programming 
(49-01, 49K10, 49M37, 90-01, 91B52, 90C05, 90C27) 
(referred to in: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: global optimization; Bilevel programming: 
implicit function approach; Bilevel programming: 
introduction, history and overview; Bilevel programming in 
management; Bilevel programming: optimality conditions
and duality; Multilevel methods for optimal design;
Multilevel optimization in mechanics; Stochastic bilevel 
programs) 
(refers to: Bilevel fractional programming; Bilevel linear 
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: 
applications in engineering; Bilevel programming: implicit 
function approach; Bilevel programming: introduction, 
history and overview; Bilevel programming in management; 
Bilevel programming: optimality conditions and duality; 
Multilevel methods for optimal design; Multilevel 
optimization in mechanics; Stochastic bilevel programs) 

bilevel linear programming 
[49-01, 49K10, 49K45, 49M37, 49N10, 90-01, 90C05, 90C20, 
90C27, 91B52] 
(see: Bilevel linear programming; Bilevel linear 
programming: complexity, equivalence to minmax, concave 
programs) 

Bilevel linear programming: complexity, equivalence to 
minmax, concave programs 
(49-01, 49K45, 49N10, 90-01, 91B52, 90C20, 90C27) 
(referred to in: Bilevel linear programming; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: global optimization; Bilevel programming: 
implicit function approach; Bilevel programming: 
introduction, history and overview; Bilevel programming in 
management; Bilevel programming: optimality conditions 
and duality; Concave programming; Minimax: directional 
differentiability; Minimax theorems; Minimum concave 
transportation problems; Multilevel methods for optimal 
design; Multilevel optimization in mechanics; 
Nondifferentiable optimization: minimax problems; 
Stochastic bilevel programs; Stochastic programming: 
minimax approach) 
(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: 
applications in engineering; Bilevel programming: implicit 
function approach; Bilevel programming: introduction, 
history and overview; Bilevel programming in 
management; Bilevel programming: optimality conditions 
and duality; Concave programming; Minimax: directional 
differentiability; Minimax theorems; Minimum concave 
transportation problems; Multilevel methods for optimal 
design; Multilevel optimization in mechanics; 
Nondifferentiable optimization: minimax problems; 
Stochastic bilevel programs; Stochastic programming: 
minimax approach; Stochastic quasigradient methods in 
minimax problems) 

bilevel optimization 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 

bilevel optimization 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility 
index; Mixed integer nonlinear bilevel programming: 
deterministic global optimization) 
Bilevel Optimization see: Mixed Integer —

Bilevel optimization: feasibility test and flexibility index 
(90C26) 
(referred to in: Adaptive convexification in semi-infinite 
optimization; Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave
programs; Bilevel programming; Bilevel programming:
applications; Bilevel programming: global optimization;
Bilevel programming: implicit function approach; Bilevel
programming: introduction, history and overview; Bilevel
programming in management; Bilevel programming:
optimality conditions and duality; Generalized semi-infinite
programming: optimality conditions; Minimax: directional
differentiability; Minimax theorems; Multilevel methods 
for optimal design; Multilevel optimization in mechanics; 
Nondifferentiable optimization: minimax problems; 
Stochastic bilevel programs; Stochastic programming: 
minimax approach) 
(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 
programming: implicit function approach; Bilevel 
programming: introduction, history and overview; Bilevel 
programming in management; Bilevel programming: 
optimality conditions and duality; Minimax: directional 
differentiability; Minimax theorems; Multilevel methods 
for optimal design; Multilevel optimization in mechanics; 
Nondifferentiable optimization: minimax problems; 
Stochastic bilevel programs; Stochastic programming: 
minimax approach; Stochastic quasigradient methods in 
minimax problems) 

bilevel program 
[49-01, 49K10, 49M37, 90-01, 90B30, 90B50, 90C05, 90C25, 
90C26, 90C27, 90C29, 90C30, 90C31, 91A10, 91B32, 91B52, 
91B74] 
(see: Bilevel linear programming; Bilevel programming; 
Bilevel programming in management; Bilevel 
programming: optimality conditions and duality) 

bilevel program see: stochastic — 

Bilevel programming 
(49M37, 90C26, 91A10) 
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave
programs; Bilevel optimization: feasibility test and
flexibility index; Bilevel programming: applications; Bilevel
programming: global optimization; Bilevel programming:
implicit function approach; Bilevel programming:
introduction, history and overview; Bilevel programming in
management; Bilevel programming: optimality conditions
and duality; Multilevel methods for optimal design;
Multilevel optimization in mechanics; Stochastic bilevel
programs)
(refers to: Bilevel fractional programming; Bilevel linear
programming; Bilevel linear programming: complexity,
equivalence to minmax, concave programs; Bilevel
optimization: feasibility test and flexibility index; Bilevel
programming: applications; Bilevel programming:
applications in engineering; Bilevel programming: implicit
function approach; Bilevel programming: introduction,
history and overview; Bilevel programming in management;
Bilevel programming: optimality conditions and duality;
Multilevel methods for optimal design; Multilevel
optimization in mechanics; Stochastic bilevel programs)


bilevel programming
[90C26, 90C30, 90C31]
(see: Bilevel programming: introduction, history and
overview)


bilevel programming
[90-01, 90B30, 90B50, 90C25, 90C26, 90C29, 90C30, 90C31,
90C33, 90C90, 91B32, 91B52, 91B74]
(see: Bilevel programming: global optimization; Bilevel
programming: introduction, history and overview; Bilevel
programming in management; Bilevel programming:
optimality conditions and duality; MINLP: reactive
distillation column synthesis; Mixed integer nonlinear
bilevel programming: deterministic global optimization;
Optimization with equilibrium constraints: A piecewise
SQP approach)


bilevel programming see: complexity of —; duality for —;
enumeration in —; implicit function approach to —


Bilevel programming: applications
(91B99, 90C90, 91A65)
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: global optimization; Bilevel programming: 
implicit function approach; Bilevel programming: 
introduction, history and overview; Bilevel programming in 
management; Bilevel programming: optimality conditions 
and duality; Multilevel methods for optimal design; 
Multilevel optimization in mechanics; Stochastic bilevel 
programs) 

(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications in 
engineering; Bilevel programming: implicit function 
approach; Bilevel programming: introduction, history and 
overview; Bilevel programming in management; Bilevel 
programming: optimality conditions and duality; 
Multilevel methods for optimal design; Multilevel 
optimization in mechanics; Stochastic bilevel programs) 


Bilevel programming: applications in engineering
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: global 
optimization; Bilevel programming: implicit function 
approach; Bilevel programming: introduction, history and 
overview; Bilevel programming in management; Bilevel 
programming: optimality conditions and duality; Design 
optimization in computational fluid dynamics; Interval 
analysis: application to chemical engineering design 
problems; Multidisciplinary design optimization; 
Multilevel methods for optimal design; Multilevel 
optimization in mechanics; Optimal design of composite 
structures; Optimal design in nonlinear optics; Stochastic
bilevel programs; Structural optimization: history) 


programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 


bilevel programming: deterministic global optimization see: 
Mixed integer nonlinear — 

Bilevel programming framework for enterprise-wide process 
networks under uncertainty 

Bilevel programming: global optimization 
(90C90, 90C30) 


programming: implicit function approach; Bilevel 
programming in management; Bilevel programming: 
optimality conditions and duality; Multilevel methods for 
optimal design; Multilevel optimization in mechanics; 
Stochastic bilevel programs) 

Bilevel programming in management
(refers to: Bilevel fractional programming; Bilevel linear
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 
programming: implicit function approach; Bilevel 
programming: introduction, history and overview; Bilevel 
programming: optimality conditions and duality; 
Multilevel methods for optimal design; Multilevel 
optimization in mechanics; Stochastic bilevel programs) 
Bilevel programming: implicit function approach
(90C26, 90C31, 91A65)
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: global 
optimization; Bilevel programming: introduction, history 
and overview; Bilevel programming in management; Bilevel 
programming: optimality conditions and duality; 
Multilevel methods for optimal design; Multilevel 
optimization in mechanics; Stochastic bilevel programs) 
(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 
programming: introduction, history and overview; Bilevel 
programming in management; Bilevel programming: 
optimality conditions and duality; Multilevel methods for 
optimal design; Multilevel optimization in mechanics; 
Stochastic bilevel programs) 

Bilevel programming: introduction, history and overview
(90C26, 90C30, 90C31)
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: global 
optimization; Bilevel programming: implicit function 
approach; Bilevel programming in management; Bilevel 
programming: optimality conditions and duality; 
Multilevel methods for optimal design; Multilevel 
optimization in mechanics; Nondifferentiable 
optimization: parametric programming; Optimization with 
equilibrium constraints: A piecewise SQP approach; 
Stochastic bilevel programs) 

(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 


(90-01, 91B52, 91B74, 91B32, 90B30, 90B50) 

(referred to in: Bilevel linear programming; Bilevel linear 
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: implicit 
function approach; Bilevel programming: introduction, 
history and overview; Bilevel programming: optimality 
conditions and duality; Multilevel methods for optimal 
design; Multilevel optimization in mechanics; Stochastic 
bilevel programs) 

(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 
programming: implicit function approach; Bilevel 
programming: introduction, history and overview; Bilevel 
programming: optimality conditions and duality; 
Multilevel methods for optimal design; Multilevel 
optimization in mechanics; Stochastic bilevel programs) 


Bilevel programming: optimality conditions and duality
(90C25, 90C29, 90C30, 90C31)
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: global 
optimization; Bilevel programming: implicit function 
approach; Bilevel programming: introduction, history and 
overview; Bilevel programming in management; Multilevel 
methods for optimal design; Multilevel optimization in 
mechanics; Stochastic bilevel programs) 

(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 
programming: implicit function approach; Bilevel 
programming: introduction, history and overview; Bilevel 
programming in management; Multilevel methods for 
optimal design; Multilevel optimization in mechanics; 
Nondifferentiable optimization: parametric programming; 
Stochastic bilevel programs) 


bilevel programming problem
[90C25, 90C26, 90C29, 90C30, 90C31, 91A65]
(see: Bilevel programming: implicit function approach;
Bilevel programming: optimality conditions and duality)


bilevel programming problem
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach)
bilevel programming problem see: generalized —; linear —
bilevel programming problems 
[90C30, 90C90] 
(see: Bilevel programming: global optimization) 
bilevel programming problems see: solution of — 
bilevel programs 
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs) 
bilevel programs see: algorithms for stochastic —; 
Stochastic — 
bilinear 
[90C26] 
(see: MINLP: design and scheduling of batch processes) 
bilinear 
[90C11, 90C90] 
(see: MINLP: trim-loss problem) 
bilinear form see: K-local — 
bilinear forms 
[49-XX, 65M60, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization; 
Variational inequalities: F. E. approach) 
bilinear function 
[90C26] 
(see: Convex envelopes in optimization problems)
bilinear matrix inequality 
[93D09] 
(see: Robust control)
Bilinear programming 
bilinear programming 
[49M37, 90C09, 90C10, 90C11, 90C25, 90C26, 91A10] 
(see: Bilevel programming; Concave programming;
Disjunctive programming; MINLP: branch and bound 
methods) 
bilinear programming 
[49-01, 49K45, 49N10, 90-01, 90C20, 90C27, 91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs) 
bilinear programming see: difficulties in —; optimality in —; 
stable — 
Bilinear programming: applications in the supply chain 
management 
bilinear programming problem 
[49-01, 49K45, 49N10, 90-01, 90C20, 90C27, 91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs) 
bilinear symmetric continuous form see: coercive — 
bilinear terms 
[90C26, 90C90] 
(see: Global optimization of heat exchanger networks) 
bimatrix games 
[90C11, 90C33] 
(see: LCP: Pardalos-Rosen mixed integer formulation) 
bimatrix games 
[90C11, 90C33] 
(see: LCP: Pardalos-Rosen mixed integer formulation; 
Linear complementarity problem) 
bin packing problem 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 


binary 
[90C26, 90C90] 
(see: Global optimization in binary star astronomy) 
binary constraint satisfaction problem 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
binary CSPs 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
binary encoding 
[92B05] 
(see: Genetic algorithms) 
binary heap 
[90B10, 90C27]
(see: Shortest path tree algorithms) 
binary length 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C] 
(see: Convex discrete optimization) 
binary linear programming 
[90C30]
(see: Lagrangian duality: BASICS) 
binary matroid 
[90C09, 90C10]
(see: Matroids) 
binary noninterference constraints 
[90C10]
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
binary operations on relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
binary programming see: positive semi-definite quadratic — 
binary relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
binary search 
[90C09]
(see: Inference of monotone boolean functions) 
binary search algorithm 
[90C09]
(see: Inference of monotone boolean functions) 
binary search-Hansel chains question-asking strategy 
[90C09]
(see: Inference of monotone boolean functions) 
binary search method 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
binary star 
[90C26, 90C90] 
(see: Global optimization in binary star astronomy) 
binary star see: spectroscopic visual —; visual —
binary star astronomy see: Global optimization in — 
binary surrogates 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
binary tree 
[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 
binary trees 
[05C85] 
(see: Directed tree networks) 
binary variables 
[90C11] 
(see: MINLP: branch and bound methods) 
binary vectors see: ordering on — 
binomial see: degree of a — 
binomial distribution 
[90C15] 
(see: Logconcavity of discrete distributions) 
binomial moments 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
biochemical processes 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
biological constraints see: incorporation of — 
biology see: computational — 
biomolecular structures see: Steiner ratio of — 
biorthogonal polynomial 
[33C45, 65F20, 65F22, 65K10]
(see: Least squares orthogonal polynomials) 
bipartite 
[90C05, 90C10, 90C27, 90C35]
(see: Assignment and matching) 
bipartite chordal graph 
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems) 
bipartite graph 
[05C85, 90C09, 90C10]
(see: Combinatorial matrix analysis; Directed tree 
networks) 
bipartite graph see: convex — 
bipartite matching 
[90C05, 90C10, 90C27, 90C35] 
(see: Assignment and matching) 
bipartite matching problem see: weighted — 
bipartite network 
[90B06, 90B10, 90C26, 90C35] 
(see: Minimum concave transportation problems) 
bipartite subgraph see: maximum — 
bipartite tournament 
[90C35] 
(see: Feedback set problems) 
bipartitioning problem see: graph — 
bipartization see: graph — 
bipartization problem see: graph —; minimum weighted 
graph — 


BiQAP 
[90C08, 90C11, 90C27] 
(see: Biquadratic assignment problem) 

Biquadratic assignment problem 
(90C27, 90C11, 90C08) 
(refers to: Feedback set problems; Generalized assignment 
problem; Graph coloring; Graph planarization; Greedy 
randomized adaptive search procedures; Integer 
programming: branch and bound methods; Quadratic 
assignment problem) 

biquadratic assignment problem
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Biquadratic assignment problem; Quadratic
assignment problem)

Birkhoff’s theorem
[90C09, 90C10]
(see: Combinatorial matrix analysis)

bisection
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)

bisection
[65K05, 90C30]
(see: Bisection global optimization methods)

bisection see: max- —; multidimensional — 

bisection algorithm see: generalized — 

Bisection global optimization methods
(90C30, 65K05)
(referred to in: αBB algorithm; Global optimization: interval
analysis and balanced interval arithmetic)
(refers to: αBB algorithm)

bisection method
[65K05, 90C20, 90C25, 90C30]
(see: Bisection global optimization methods; Quadratic
programming over an ellipsoid)

bisection method
[90C20, 90C25]
(see: Quadratic programming over an ellipsoid)

bisection problem see: minimum —

bisection search
[90C30]
(see: Convex-simplex algorithm)

bisection search
[90C30]
(see: Frank-Wolfe algorithm)

bisectored
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)

bisectored unit disk graphs
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)

bisubmodular system 

[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 

bisymmetric matrix 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 

bisymmetric positive semidefinite matrix
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity
problems)
BK-product of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
bK-products 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
BK-products 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
BL-logic 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
black-box 
[65K99] 
(see: Global optimization of planar multilayered dielectric
structures; Maximum cut problem, MAX-CUT) 
black-box global optimization 
[65K05] 
(see: Direct global optimization algorithm)
black-box optimization 
[90C05] 
(see: Global optimization in the analysis and management
of environmental systems) 
black-box optimization 
[65K05] 
(see: Direct global optimization algorithm)
black-box strategy 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
black oil model 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
Black-Scholes model 
[91B50] 
(see: Financial equilibrium) 
blackball number 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
Bland see: rule of — 
Bland least index pivoting rule 
[90C05] 
(see: Linear programming: Klee-Minty examples)
Bland rule 
[90C05] 
(see: Linear programming: Klee-Minty examples)
Bland technique 
[90C60] 
(see: Complexity of degeneracy) 
blended panel
[90C26, 90C29]
(see: Optimal design of composite structures)
blending
[90C30, 90C90]
(see: MINLP: applications in blending and pooling
problems)
blending see: nonlinear — 
blending and distribution scheduling: an MILP model see: 
Gasoline — 
blending index 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 
blending and pooling problems see: MINLP: applications in — 
blending problems see: pooling and — 
bliss 
[49Jxx, 91Axx] 
(see: Infinite horizon control and dynamic games) 
block 
[90B35] 
(see: Job-shop scheduling problem) 
block see: solution —; vehicle — 
block-0-diagonal operator 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
block angular form 
[65Fxx] 
(see: Least squares problems) 
block-angular structure 
[90C06] 
(see: Decomposition principle of linear programming) 
block-angular structure see: dual — 
block-clique graph 
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems) 
block of a partition 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 
block pivot 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
block plan 
[90B80] 
(see: Facilities layout problems) 
block theorem 
[90B35] 
(see: Job-shop scheduling problem) 
block truncated Newton software package 
[90C10, 90C26, 90C30] 
(see: Optimization software) 
blocks for the process units see: building — 
blocks of variables see: eliminating — 
blossom 
[90C05, 90C10, 90C27, 90C35] 
(see: Assignment and matching) 
blossom inequalities 
[90C05, 90C10, 90C27, 90C35] 
(see: Assignment and matching) 
BLPP 
[90C30, 90C90] 
(see: Bilevel programming: global optimization) 
BLPP see: complexity of the linear — 
BMI
[93D09] 
(see: Robust control) 
BNR-Prolog 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
board matrix see: chess- — 
body see: linear thermoelastic behavior of a generally 
nonhomogeneous and nonisotropic — 
body-tail problem see: head- — 
bold arcs 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 
bold connective 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
bold-driver method 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
bold strategy 
[49L20, 90C40]
(see: Dynamic programming: undiscounted problems) 
boldface 
[90B15] 
(see: Evacuation networks) 
Boltzmann constant 
[90C27, 90C90] 
(see: Simulated annealing) 
boltzmann density 
(see: Laplace method and applications to optimization 
problems) 
Boltzmann distribution 
[65K05, 90C30]
(see: Random search methods) 
bond angle 
[92B05]
(see: Genetic algorithms for protein structure prediction) 
bond distance 
[92B05]
(see: Genetic algorithms for protein structure prediction) 
bonds with constant maturities see: estimating the spot rate 
for — 
Bonferroni bounds see: Boole- — 
Boole-Bonferroni bounds
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15]
(see: Approximation of multivariate probability integrals)
Boolean 2-valued function
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
Boolean 2-valued logic algebra
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
Boolean algebra
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)
Boolean arrangement of hyperplanes
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)
Boolean circuit see: depth of a —
Boolean classification problem
[90C09, 90C10]
(see: Optimization in boolean classification problems)
boolean classification problems see: Optimization in —

Boolean connective
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
Boolean formula see: satisfiable —
Boolean formula in conjunctive normal form
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10]
(see: Maximum satisfiability problem)
Boolean formulas see: satisfiability of —
Boolean function
[90C09]
(see: Inference of monotone boolean functions)
Boolean function see: antitone —; antitone monotone —;
isotone —; isotone monotone —; monotone —;
nondecreasing monotone —; nonincreasing monotone —
Boolean function inference see: monotone —
Boolean function inference problem
[90C09]
(see: Inference of monotone boolean functions)
boolean functions see: Inference of monotone —; interactive
learning of —

Boolean and fuzzy relations 
(03E72, 03B52, 47S40, 68T27, 68T35, 68Uxx, 91B06, 90Bxx,
91Axx, 92C60)
(referred to in: Alternative set theory; Checklist paradigm
semantics for fuzzy logics; Finite complete systems of
many-valued logic algebras; Inference of monotone boolean
functions; Optimization in boolean classification problems;
Optimization in classifying text documents)
(refers to: Alternative set theory; Checklist paradigm
semantics for fuzzy logics; Finite complete systems of
many-valued logic algebras; Inference of monotone boolean
functions; Optimization in boolean classification problems;
Optimization in classifying text documents)
Boolean satisfiability
[90C60]
(see: Complexity theory)
Boolean variables
(see: Logic-based outer approximation)

bootstrapping
[90C35]
(see: Feedback set problems)
border node
[90B10, 90C27]
(see: Shortest path tree algorithms)
border nodes set
[90B10, 90C27]
(see: Shortest path tree algorithms)
bordering method
[33C45, 65F20, 65F22, 65K10]
(see: Least squares orthogonal polynomials)

bore model see: well —
Borel set
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
BOT types of logical connectives see: TOP and —
both water environment see: minimizing the degradation in
quality of —
both-way compatibility
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

bottleneck machine
[90B35]
(see: Job-shop scheduling problem)
bottleneck measure
[62H30, 90C27]
(see: Assignment methods in clustering)
bottleneck quadratic assignment problem
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)
bottleneck Steiner ratio
[05C05, 05C85, 68Q25, 90B80]
(see: Bottleneck steiner tree problems)
Bottleneck steiner tree problems
(05C05, 05C85, 68Q25, 90B80)
(referred to in: Capacitated minimum spanning trees;
Minimax game tree searching; Shortest path tree
algorithms; Steiner tree problems)
(refers to: Capacitated minimum spanning trees; Directed
tree networks; Minimax game tree searching; Shortest path
tree algorithms; Steiner tree problems)

bottleneck Steiner trees
[05C05, 05C85, 68Q25, 90B80]
(see: Bottleneck steiner tree problems)
bottlenecks in NLP solvers
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)
Bouligand cone
[65K05, 90C30, 90Cxx]
(see: Nondifferentiable optimization: minimax problems;
Quasidifferentiable optimization: optimality conditions)
Bouligand cone
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)


bouligand tangent 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 

Bouligand tangent cone 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 

bound see: basic rules of branch and —; best —; Branch 
and —; branch-and- —; Edmundson—Madansky upper —; 
error —; Gilmore—Lawler lower —; global error —; 
guaranteed —; guaranteed lower —; Hunter—Worsley 
upper —; Jensen lower —; Lehmann-Goerisch —; lower —; 
IP/NLP based branch and —; optimal componentwise —; 
parametric lower —; parametric upper —; piecewise linear 
upper —; polynomial upper —; QP/NLP based branch 
and —; Rayleigh-Ritz —; reformulation/spatial branch 
and —; restricted-recourse —; stochastic branch and —;
upper —; valid lower —; valid upper — 

bound algorithm see: branch and — 

bound algorithm for weighted graph planarization see: branch 
and — 

bound algorithms see: branch and — 

bound for approximate solutions of nonlinear systems of 
equations see: error — 

bound consistency 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 

bound constrained quadratic problem 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 

bound constraints 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: unconstrained and constrained 
optimization) 

bound constraints see: flow —; Quadratic programming 
with — 

bound dichotomy see: generalized-upper- — 

bound enumerative techniques see: branch and — 

bound-factors 
[90C26] 
(see: Reformulation-linearization technique for global 
optimization) 

bound function see: lower — 

bound on gas lift availability see: upper — 

bound global optimization algorithm see: MINLP: branch 
and — 

bound-improvement see: dual —; primal — 

bound method see: arc oriented branch and —; branch and —; 
node oriented branch and — 

bound methods see: branch and —; Integer programming: 
branch and —; Interval analysis: subdivision directions in
interval branch and —; MINLP: branch and — 

bound to optimality see: guaranteed — 

bound and outer approximation see: hybrid branch and — 

bound principle see: branch and — 

bound rule see: best — 

bound scheme see: branch and — 

bound for a set see: lower —; upper — 

bound for solutions of nonlinear systems of equations see: 
rigorous — 

bound strategy see: branch and — 

bound techniques see: branch and — 
bound test see: lower —; upper- —

bound for unconstrained optimization see: branch and — 
boundary see: lower —; upper — 

Boundary condition iteration BCI
(93-XX)
(referred to in: Control vector iteration CVI)
(refers to: Control vector iteration CVI)
boundary condition iteration method
[93-XX]
(see: Boundary condition iteration BCI)
boundary conditions
[03H10, 49J27, 49K05, 49K10, 49K15, 49K20, 90C34]
(see: Duality in optimal control with first order differential
equations; Semi-infinite programming and control
problems)
boundary conditions see: elastostatics with nonlinear —;
quasidifferential elastic —; quasidifferential thermal —;
variational formulation of quasidifferential thermal —
boundary dependence property
[90C33]
(see: Topological methods in complementarity theory)
boundary effects
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
boundary flux estimation in distributed systems
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30,
76R50, 80A20, 80A23, 80A30]
(see: Identification methods for reaction kinetics and
transport)
boundary of a function
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions)
boundary laws and variational equalities see: single-valued —
boundary point see: lower —; upper —
boundary triangulation
[52B11, 52B45, 52B55]
(see: Volume computation for polytopes: strategies and
performances)
boundary value conditions
[34A55, 78A60, 90C30]
(see: Optimal design in nonlinear optics)
boundary value problem see: bilateral —; ODE two-point —;
two-point —; unilateral —
boundary variation technique
[49J20, 49J52]
(see: Shape optimization)
bounded
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
bounded
[90C30]
(see: Frank-Wolfe algorithm)
bounded or continuous functions see: Lipschitzian operators
in best approximation by —
bounded cost per stage see: discounted problem with —
bounded degree minimum spanning tree problem
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems)
bounded integer variable see: multiple branches for —
bounded level set
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming, semidefinite
programming and perfect duality)

bounded polynomial time algorithm see: efficient 
polynomially — 

bounded ratio disk graphs 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 

bounded rationality 
[90-01, 90B30, 90B50, 91B32, 91B52, 91B74] 
(see: Bilevel programming in management) 

bounded Turing machine see: exponentially space- —; 
exponentially time- —; polynomially space- —; 
polynomially time- — 

bounding see: lower —; upper — 

Bounding derivative ranges 
(90C30, 90C26) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Global optimization: application to phase 
equilibrium problems; Interval analysis: application to 
chemical engineering design problems; Interval analysis: 
differential equations; Interval analysis: eigenvalue bounds 
of interval matrices; Interval analysis: intermediate terms; 
Interval analysis: nondifferentiable problems; Interval 
analysis: systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval 
linear systems; Interval Newton methods) 
(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Global optimization: application to phase 
equilibrium problems; Interval analysis: application to 
chemical engineering design problems; Interval analysis: 
differential equations; Interval analysis: eigenvalue bounds 
of interval matrices; Interval analysis: intermediate terms; 
Interval analysis: nondifferentiable problems; Interval 
analysis: parallel methods for global optimization; Interval 
analysis: subdivision directions in interval branch and 
bound methods; Interval analysis: systems of nonlinear 
equations; Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods) 

bounding the expectation 
[90C15] 
(see: Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 

bounding Hessian see: lower — 

bounding step 
[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 

bounding structure see: generalized upper — 

bounds 
[62C20, 90B80, 90C15, 90C31] 
(see: Facilities layout problems; Sensitivity and stability in 
NLP: approximation; Stochastic programming: minimax 
approach) 
bounds see: arc flow —; Boole-Bonferroni —; computable
optimal value —; constructive lower —; eigenvalue based 
lower —; Gilmore-Lawler type lower —; 
Hunter-Worsley —; Lagrangian —; lower —; lower 
weight —; Maximum constraint satisfaction: relaxations and 
upper —; maximum flow problem with nonnegative 
lower —; parametric —; parametric upper and lower —; 
Stochastic programs with recourse: upper —; upper —; 
variance reduction lower — 
bounds based on semidefinite relaxations 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
bounds constraints 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 
bounds constraints see: generalized upper —; lower and 
upper — 
bounds on the distance of a feasible point to a solution point 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 
bounds to eigenvalues see: upper and lower — 
bounds of interval matrices see: Interval analysis: eigenvalue — 
bounds for linear equations 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 
bounds for multivariate probability integrals see: lower —; 
upper — 
bounds for NLP see: solution-point — 
bounds on simplices 
[90C15] 
(see: Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 
Bounds and solution vector estimates for parametric NLPS 
(90C31) 
(referred to in: Multiparametric linear programming; 
Multiparametric mixed integer linear programming; 
Nondifferentiable optimization: parametric programming; 
Parametric global optimization: sensitivity; Parametric 
linear programming: cost simplex algorithm; Parametric 
mixed integer nonlinear optimization; Parametric 
optimization: embeddings, path following and singularities; 
Selfdual parametric method for linear programs) 
(refers to: Multiparametric linear programming; 
Multiparametric mixed integer linear programming; 
Nondifferentiable optimization: parametric programming; 
Parametric global optimization: sensitivity; Parametric 
linear programming: cost simplex algorithm; Parametric 
mixed integer nonlinear optimization; Parametric 
optimization: embeddings, path following and singularities; 
Selfdual parametric method for linear programs) 
bounds subject to moment conditions see: optimal integral — 
box 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
box see: black- —; feasible —; indeterminate —; process- —; 
reduced — 
box(2)-consistency 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 


box consistency 
[65G20, 65G30, 65G40, 65K05, 68T20, 90C30] 
(see: Interval constraints; Interval global optimization) 
box constraints 
[90C60] 
(see: Complexity theory: quadratic programming) 
box global optimization see: black- — 
box optimization see: black- — 
box strategy see: black- — 
boxes see: canonical —; leading — 
BP
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach)
BPP
[90C30, 90C90]
(see: Bilevel programming: global optimization)
BQAP
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)
brachytherapy
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
bracket
[65K05, 90C30]
(see: Bisection global optimization methods)
bracket see: multidimensional —
bracketing
[90C30]
(see: Nonlinear least squares problems)
Braess paradox
[90B06, 90B20, 91B50]
(see: Traffic network equilibrium)
braid arrangement of hyperplanes
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)
braid group
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)
Brakhage preconditioner see: Atkinson— —
branch
[90C11]
(see: MINLP: branch and bound methods)
branch see: cut-and- —
branch & cut algorithms see: Stable set problem: —
branch-and-bound
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)
branch-and-price
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)
branch and bound
[49M37, 65G30, 65G40, 65K05, 65K10, 68M20, 68Q25,
68Q99, 90B06, 90B10, 90B35, 90B50, 90B80, 90C05, 90C10, 
90C11, 90C25, 90C26, 90C27, 90C29, 90C30, 90C35, 90C57, 
90C90, 92C40] 
(see: Branch and price: Integer programming with column 
generation; Chemical process planning; Communication 
network assignment problem; Concave programming; 
Global optimization in batch design under uncertainty; 
Global optimization of heat exchanger networks; Global 
optimization: interval analysis and balanced interval 
arithmetic; Graph coloring; Integer programming: 
lagrangian relaxation; Inventory management in supply 
chains; Lagrangian duality: BASICS; MINLP: branch and 
bound methods; Mixed integer nonlinear programming; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization; Interactive methods for 
preference value functions; Multiple minima problem in 
protein folding: αBB global optimization approach;
Nonlinear systems of equations: application to the 
enclosure of all azeotropes; Set covering, packing and 
partitioning problems; Time-dependent traveling salesman 
problem; Vehicle scheduling) 

Branch and bound 
[49M20, 49M37, 65K05, 65K10, 90B80, 90C05, 90C06, 90C08, 
90C10, 90C11, 90C20, 90C26, 90C30, 90C31] 
(see: αBB algorithm; Facilities layout problems;
Generalized outer approximation; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: lagrangian 
relaxation; Linear ordering problem; MINLP: branch and 
bound methods; MINLP: global optimization with αBB;
Mixed integer nonlinear programming; Multiparametric 
mixed integer linear programming; Standard quadratic 
optimization problems: applications; Stochastic 
transportation and location problems) 

branch and bound see: basic rules of —; IP/NLP based —; 
QP/NLP based —; reformulation/spatial —; stochastic — 

branch and bound algorithm
[65K05, 90C11, 90C26]
(see: MINLP: global optimization with αBB)
branch and bound algorithm for weighted graph planarization
[90C10, 90C27, 94C15]
(see: Graph planarization)
branch and bound algorithms
[65G20, 65G30, 65G40, 65H20, 65K99, 90B35]
(see: Interval Newton methods; Job-shop scheduling
problem)
branch and bound algorithms
[90C10, 90C26]
(see: MINLP: branch and bound global optimization
algorithm)
branch and bound enumerative techniques
[65K05, 90C20]
(see: Quadratic programming with bound constraints)
branch and bound global optimization algorithm see:
MINLP: —

branch and bound method 
[90B10] 
(see: Piecewise linear network flow problems) 

branch and bound method see: arc oriented —; node 
oriented — 


branch and bound methods 
[90C30, 90C90] 
(see: Bilevel programming: global optimization) 

branch and bound methods see: Integer programming: —; 
Interval analysis: subdivision directions in interval —; 
MINLP: — 

branch and bound and outer approximation see: hybrid — 

branch and bound principle
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
branch and bound scheme
[90C26]
(see: Convex envelopes in optimization problems)
branch and bound strategy
[49-01, 49K10, 49M37, 90-01, 90C05, 90C27, 91B52]
(see: Bilevel linear programming)
branch and bound techniques
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
branch and bound for unconstrained optimization
[65G20, 65G30, 65G40, 65H20]
(see: Interval analysis: unconstrained and constrained
optimization)
branch and contract algorithm
[90C26, 90C90]
(see: Global optimization of heat exchanger networks)
branch and cut
[90C10, 90C11, 90C26, 90C27, 90C35, 90C57]
(see: Cutting plane methods for global optimization;
MINLP: branch and bound methods; Multicommodity flow
problems; Optimization in leveled graphs; Set covering,
packing and partitioning problems)
branch and cut algorithm see: Jünger—Mutzel —
branch and cut algorithms see: Integer programming: —
branch and cut procedure
[90C10, 90C11, 90C27, 90C57]
(see: Integer programming)
branch decomposition
[68R10, 90C27]
(see: Branchwidth and branch decompositions)
branch decompositions see: Branchwidth and —
branch of a feasible set
[90C30, 90C33]
(see: Optimization with equilibrium constraints:
A piecewise SQP approach)
branch and Infer
(see: Mixed integer programming/constraint programming
hybrid methods)

branch and price
[90C35]
(see: Multicommodity flow problems)
branch and price and cut
[90C35]
(see: Multicommodity flow problems)
Branch and price: Integer programming with column
generation
(68Q99)
(referred to in: Decomposition techniques for MILP:
lagrangian relaxation; Graph coloring; Integer linear
complementary problem; Integer programming; Integer
programming: algebraic methods; Integer programming:
branch and bound methods; Integer programming: branch
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming: lagrangian relaxation; 
LCP: Pardalos—Rosen mixed integer formulation; MINLP: 
trim-loss problem; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Nonoriented multicommodity flow 
problems; Parametric mixed integer nonlinear 
optimization; Set covering, packing and partitioning 
problems; Simplicial pivoting algorithms for integer 
programming; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem) 
(refers to: Decomposition techniques for MILP: lagrangian 
relaxation; Integer linear complementary problem; Integer 
programming; Integer programming: algebraic methods; 
Integer programming: branch and bound methods; Integer 
programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Integer 
programming duality; Integer programming: lagrangian 
relaxation; LCP: Pardalos—Rosen mixed integer
formulation; Mixed integer classification problems; 
Multi-objective integer linear programming; 
Multi-objective mixed integer programming; Set covering, 
packing and partitioning problems; Simplicial pivoting 
algorithms for integer programming; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Time-dependent traveling 
salesman problem) 

branch problem 
[65K05] 
(see: Automatic differentiation: root problem and branch 
problem) 

branch problem see: Automatic differentiation: root problem 
and — 

branch and prune
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
branch and reduce
[49M37, 90C11]
(see: Mixed integer nonlinear programming)
branches for bounded integer variable see: multiple —
branching
[49M37, 65K10, 90C05, 90C06, 90C08, 90C10, 90C11, 90C26,
90C27, 90C30, 90C57, 90C59]
(see: αBB algorithm; Integer programming: branch and
bound methods; Quadratic assignment problem)

branching see: strong — 

branching algorithm 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 

branching step 
[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 


branchpoint of a graph 
[90C35] 
(see: Feedback set problems) 
branchwidth 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 
Branchwidth and branch decompositions 
(90C27, 68R10) 
breadth-first 
[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 
breaking rule see: tie — 
breakpoint 
[90C11, 90C31] 
(see: Parametric mixed integer nonlinear optimization) 
breast cancer diagnosis 
[90C09, 90C10]
(see: Optimization in boolean classification problems) 
breast tumors 
[90C09, 90C10]
(see: Optimization in boolean classification problems) 
Bregman parameter 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 
bridges 
[68Q25, 90B80, 90C05, 90C27]
(see: Communication network assignment problem) 
bridging model 
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes) 
brief review see: Generalized variational inequalities: A — 
(Brier) scoring rule see: quadratic — 
Broadcast scheduling problem 
(refers to: Frequency assignment problem; Genetic 
algorithms; Graph coloring; Greedy randomized adaptive 
search procedures; Multi-objective integer linear 
programming; Optimization problems in unit-disk graphs; 
Simulated annealing) 
Broeckx linearization see: Kaufman- — 
brother waits see: younger — 
Brouwer degree 
[90C33]
(see: Topological methods in complementarity theory) 
brouwer fixed point theorem 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 65G20, 65G30,
65G40, 65H20, 91A05] 
(see: Interval fixed point theory; Minimax theorems) 
Brownian motion 
[60G35, 65K05]
(see: Differential equations and global optimization) 
Brownian motion see: N-dimensional — 
Broyden class see: quasi-Newton method of — 
Broyden family 
[49M37, 90C30] 
(see: Nonlinear least squares: Newton-type methods; 
Rosen’s method, global convergence, and Powell’s 
conjecture) 
Broyden family of methods
[90C30] 
(see: Broyden family of methods and the BFGS update) 
Broyden family of methods and the BFGS update 
(90C30) 
(referred to in: Conjugate-gradient methods; Large scale 
unconstrained optimization; Numerical methods for unary 
optimization; Unconstrained nonlinear optimization: 
Newton-Cauchy framework; Unconstrained optimization 
in neural network training) 
(refers to: Conjugate-gradient methods; Large scale 
unconstrained optimization; Numerical methods for unary 
optimization; Unconstrained nonlinear optimization: 
Newton-Cauchy framework; Unconstrained optimization 
in neural network training) 
Broyden-Fletcher-Goldfarb-Shanno method 
[90C30]
(see: Successive quadratic programming) 
Broyden-Fletcher-Goldfarb-Shanno quasi-Newton update 
[15A15, 90C25, 90C55, 90C90]
(see: Semidefinite programming and determinant 
maximization) 
Broyden-Fletcher-Goldfarb-Shanno update 
[90C30]
(see: Broyden family of methods and the BFGS update) 
Broyden method see: Powell-symmetric- — 
Broyden methods 
[90C30] 
(see: Broyden family of methods and the BFGS update) 
Broyden-Spedicato algorithms for linear equations and linear 
least squares see: Abaffi- — 
Broyden theorem 
[15A39, 90C05]
(see: Farkas lemma) 
brute-force 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90] 
(see: Traveling salesman problem) 
BSM 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
BSP 
[65K05, 65Y05]
(see: Parallel computing: models) 
bSP model 
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes) 
BSP model 
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes) 
BSTP 
[05C05, 05C85, 68Q25, 90B80] 
(see: Bottleneck steiner tree problems) 
Buchberger algorithm 
[12D10, 12Y05, 13Cxx, 13P10, 13Pxx, 14Qxx, 90Cxx] 
(see: Gröbner bases for polynomial equations; Integer
programming: algebraic methods) 
Buchberger algorithm see: truncated — 


bucket 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 
budget constraint 
[78M50, 90B50, 91B28] 
(see: Global optimization algorithms for financial planning 
problems) 
budget of uncertainty 
(see: Price of robustness for linear optimization problems) 
building see: model — 
building blocks for the process units 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the
process industry) 
bulk synchronous parallel 
[65K05, 65Y05] 
(see: Parallel computing: models)
bulk synchronous parallel computer 
[65K05, 65Y05] 
(see: Parallel computing: models)
bulk synchronous parallel model 
[03D15, 68Q05, 68Q15] 
(see: Parallel computing: complexity classes)
bullwhip effect 
[90-02] 
(see: Operations research models for supply chain
management and design) 
Bunch and Parlett factorization 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in
distillation systems) 
bundle algorithm 
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational 
inequalities) 
bundle algorithms 
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach) 
bundle method see: proximal —; proximal point —; variable 
metric — 
bundle methods 
[46N10, 49J40, 49J52, 65K05, 90-00, 90C15, 90C26, 90C30, 
90C33, 90C47] 
(see: Nondifferentiable optimization; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Stochastic bilevel programs) 
bundle methods 
[49J40, 49J52, 65K05, 90C30] 
(see: Nondifferentiable optimization: relaxation methods; 
Solving hemivariational inequalities by nonsmooth 
optimization methods) 
bundle-Newton method 
[49J40, 49J52, 65K05, 90C30] 
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 
bundle trust region 
[49J40, 49J52, 65K05, 90C30] 
(see: Solving hemivariational inequalities by nonsmooth
optimization methods) 
Burke local dualization see: Ioffe- —
Burke-Poliquin reduction 
[46A20, 52A01, 90C30] 
(see: Composite nonsmooth optimization) 
Burmeister function see: Fischer- — 
business failure risk 
[91B06, 91B60] 
(see: Financial applications of multicriteria analysis) 
busting see: consist- — 
butterfly 
[90C35] 
(see: Feedback set problems) 
butterfly see: toroidal — 


Cc 


C see: ADOL- —; algorithm polynomial of degree — 
C-differentiable function
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method)
C-differential
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method)
(Cln)-efficient point
[90C15, 90C29]
(see: Discretely distributed stochastic programs: descent
directions and efficient points)
(Chn)-efficient solution
[90C15, 90C29]
(see: Discretely distributed stochastic programs: descent
directions and efficient points)
C!+ optimization problem
[49K27, 49K40, 90C30, 90C31]
(see: Second order constraint qualifications)
Ck-Riemannian metric
[58E05, 90C30]
(see: Topology of global optimization)
C-subdifferential
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method)
CA algorithm see: asynchronous parallel —; sequential —;
synchronized parallel —

CA algorithms 
[90C30] 
(see: Cost approximation algorithms) 
CA algorithms see: decomposition — 
cable see: slack — 
cable structures see: structural analysis of — 
cables
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35]
(see: Graph realization via semidefinite programming)
calculation of the Hessian see: Automatic differentiation: — 
calculation of Newton steps see: Automatic differentiation: — 
calculus see: infinitesimal —; quasidifferential — 
calculus of quasidifferentials
[90Cxx]
(see: Quasidifferentiable optimization: optimality
conditions)


calculus of quasidifferentials see: Quasidifferentiable 
optimization: — 
calculus of variations 
[01A99]
(see: Carathéodory, Constantine; Lagrange, Joseph-Louis) 
calculus of variations 
[01A99, 03H10, 49J27, 90C34] 
(see: Carathéodory, Constantine; Lagrange, Joseph-Louis; 
Semi-infinite programming and control problems) 
calculus of variations see: inverse problem of the —; 
Nonconvex-nonsmooth — 
calibration see: model — 
called 
[90C15]
(see: Two-stage stochastic programs with recourse) 
calm problem 
[65Kxx, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
calmness 
[49K27, 49K40, 90C30, 90C31]
(see: First order constraint qualifications) 
calmness condition 
[65Kxx, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
calmness condition see: partial — 
campaign 
(see: Planning in the process industry) 
campaign see: mixed-product —; single-product — 
campaigns see: long — 
canceling algorithm see: cycle- — 
cancer chemotherapy 
[93-XX] 
(see: Direct search Luus-Jaakola optimization procedure)
cancer diagnosis see: breast — 
candidate list 
[68T20, 68T99, 90B10, 90C27, 90C59] 
(see: Metaheuristics; Shortest path tree algorithms) 
Candidate List see: restricted — 
canonical boxes 
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints) 
canonical dual transformation 
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization) 
canonical dual transformation method 
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization) 
canonical form 
[49-XX, 90-XX, 90C26, 93-XX]
(see: D.C. programming; Duality theory: biduality in 
nonconvex optimization) 
canonical function associated with A 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
canonical function space 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
canonical function space see: extended — 
canonical monotonic optimization problem
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 
canonical normal form 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
canonical transformation 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
Cantor set theory 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
CAP 
[68Q25, 90B80, 90C05, 90C27] 
(see: Communication network assignment problem) 
CAP on trees 
[68Q25, 90B80, 90C05, 90C27] 
(see: Communication network assignment problem) 
CAP on trees see: asymptotic behavior of —; exact algorithm 
for solving —; heuristic approach to solving — 
capacitated arc routing problem 
[90B06]
(see: Vehicle routing) 
capacitated lot-sizing problem 
[90C90]
(see: Chemical process planning) 
capacitated minimum spanning arborescence problem 
[68T99, 90C27]
(see: Capacitated minimum spanning trees) 
capacitated minimum spanning tree problem 
[68T99, 90C27]
(see: Capacitated minimum spanning trees) 
Capacitated minimum spanning trees 
(90C27, 68T99) 
(referred to in: Bottleneck steiner tree problems; Shortest 
path tree algorithms) 
(refers to: Bottleneck steiner tree problems; Directed tree 
networks; Minimax game tree searching; Shortest path tree 
algorithms) 
capacitated network see: directed — 
capacitated transportation problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
capacitated vehicle routing problem 
[90B06] 
(see: Vehicle routing) 
capacity see: arc —; nodes with water storage —; problem 
with nonunit —; residual —; shannon zero-error — 
capacity of an arc in a network 
[90C35] 
(see: Minimum cost flow problem) 
capacity constraint 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
capacity constraint on arc flows 
[90B10] 
(see: Piecewise linear network flow problems) 


capacity constraints 
[90B80, 90B85, 90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems; Warehouse 
location problem) 
capacity constraints see: maximum oil, gas and water —; single 
fixed cost with —; single fixed cost with no — 
capacity of a cut 
[90C35] 
(see: Maximum flow problem)
capital asset pricing model 
[91B50] 
(see: Financial equilibrium)
capital investment see: venture — 
capital market line 
[91B28] 
(see: Portfolio selection: markowitz mean-variance model)
Carathéodory 
[01A99] 
(see: Carathéodory, Constantine)
Carathéodory, Constantine 
(01A99)
(referred to in: Carathéodory theorem; History of 
optimization) 
(refers to: Carathéodory theorem; History of optimization)
Carathéodory principle 
[01A99] 
(see: Carathéodory, Constantine) 
Carathéodory theorem 
(90C05)
(referred to in: Carathéodory, Constantine; History of
optimization; Krein-Milman theorem; Linear 
programming; Single facility location: circle covering 
problem) 
(refers to: Carathéodory, Constantine; Krein-Milman
theorem; Linear programming) 
Carathéodory theorem 
[90B85, 90C06, 90C25, 90C27, 90C30, 90C35] 
(see: Simplicial decomposition; Simplicial decomposition
algorithms; Single facility location: circle covering problem) 
Carathéodory theorem 
[90C06, 90C25, 90C30, 90C35] 
(see: Simplicial decomposition; Simplicial decomposition
algorithms) 
cardinalities axiom see: two — 
cardinality of a graph 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 
cardinality matching problem see: maximum — 
cardinality of a node 
[90C35] 
(see: Generalized networks) 
cargo routing problems 
(see: Maritime inventory routing problems) 
Carl Friedrich see: Gauss — 
Carlo see: Monte- —; pure Monte- — 
Carlo configuration see: Monte- — 
Carlo method see: metropolis Monte —; Monte- —; pure 
Monte- — 
Carlo simulated annealing in protein folding see: Monte- — 
Carlo simulation see: monte- — 
Carlo simulation algorithm see: Monte- — 
Carlo simulation procedure see: Monte- —
Carlo simulations for stochastic optimization see: Monte- — 
CARP 
[90B06] 
(see: Vehicle routing) 
carrying see: label — 
Cartesian coordinates 
[92B05]
(see: Genetic algorithms for protein structure prediction) 
Cartesian product 
[90C30]
(see: Cost approximation algorithms) 
Cartesian product set 
[90C30]
(see: Cost approximation algorithms) 
cascade see: temperature — 
case see: convex-concave —; perfectly consistent — 
case analysis see: average —; worst- — 
case approach see: worst- — 
case behavior see: average — 
case complexity see: average —; worst- — 
case complexity of algorithms see: average — 
case of integral evaluation see: asymptotic — 
case optimality see: worst- — 
case performance guarantee see: worst- — 
case setting see: average — 
case of the trust region problem see: general —; hard —; 
Newton step — 
case of a two-person game see: cooperative — 
cash flow see: maximize operating — 
catalysis: optimization methods see: Shape selective zeolite 
separation and — 
catchment management 
[90C30, 90C35] 
(see: Optimization in water resources) 
Cauchy approach see: modified — 
Cauchy formula 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
Cauchy framework see: Newton- —; Unconstrained nonlinear 
optimization: Newton- — 
Cauchy inequality 
[15A39, 90C05] 
(see: Motzkin transposition theorem) 
Cauchy method 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
Cauchy method see: modified — 
Cauchy point 
[90C06] 
(see: Large scale unconstrained optimization) 
Cayley transform 
[15A39, 90C05] 
(see: Farkas lemma) 


CCM 
[90C30] 
(see: Cyclic coordinate method) 
cCOMB algorithm 
[49M07, 49M10, 65K, 90C06] 
(see: New hybrid conjugate gradient algorithms for 
unconstrained optimization) 
CD see: IS- — 
c.d. function 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
CDPAP 
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem) 
cell 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements in optimization) 
cell of a function 
[90Cxx] 
(see: Discontinuous optimization) 
cell of a polyhedral subdivision 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
cell sectorization 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
cell of a Turing machine see: tape — 
center see: analytic —; uncapacitated — 
center cutting plane method see: analytic — 
center of gravity 
[49M20, 90-08, 90C25]
(see: Nondifferentiable optimization: cutting plane 
methods) 
center of gravity location 
[90B85]
(see: Single facility location: multi-objective euclidean 
distance location) 
center of gravity method 
[49M20, 90-08, 90C25]
(see: Nondifferentiable optimization: cutting plane 
methods) 
center of an interval linear system 
[15A99, 65G20, 65G30, 65G40, 90C26] 
(see: Interval linear systems) 
center node 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
center path 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
center problem see: p- — 
center problem on a network see: p- — 
centering see: design — 
centering direction 
[90C05] 
(see: Linear programming: interior point methods) 
centering hit and run see: artificial — 
central arc 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
central component
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
central path 
[37A35, 49-XX, 90-XX, 90C05, 90C25, 90C30, 93-XX] 
(see: Duality theory: monoduality in convex optimization; 
Linear programming: interior point methods; Potential 
reduction methods for linear programming; Solving large 
scale and sparse semidefinite programs) 
central path 
[90C25, 90C30] 
(see: Successive quadratic programming: solution by active 
sets and interior point methods) 
central trajectory 
[90C05] 
(see: Linear programming: interior point methods) 
centroid of a simplex 
[90C30] 
(see: Sequential simplex method) 
CEP 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 
CEP see: RAF of —; restricted accessibility form of —; UAF 
of —; universally accessible form of — 
cerevisiae see: saccharomyces — 
certainty see: mathematical and computational — 
certificate 
[15A39, 90C05, 90C60] 
(see: Complexity theory; Linear optimization: theorems of 
the alternative; Motzkin transposition theorem) 
certificate 
[15A39, 90C05]
(see: Farkas lemma; Linear optimization: theorems of the 
alternative; Motzkin transposition theorem) 
CFAP 
[05-02, 05-04, 15A04, 15A06, 68U99] 
(see: Alignment problem) 
CFD 
[90C90] 
(see: Design optimization in computational fluid dynamics) 
CFD see: design optimization in — 
CG see: alternatives to — 
CG family see: two-parameter — 
CG method see: linear —; nonlinear — 
CG-related algorithm 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
CG-related algorithms see: nonlinear — 
CG relationship see: BFGS- — 
CG-standard 
[90C30] 
(see: Conjugate-gradient methods) 
CG-standard for minimizing q 
[90C30] 
(see: Conjugate-gradient methods) 
CGM 
[90C90] 
(see: Design optimization in computational fluid dynamics) 


CGM
[65K05, 65Y05]
(see: Parallel computing: models)

cGS
[74A40, 90C26]
(see: Shape selective zeolite separation and catalysis:
optimization methods)

CGU algorithm
[65K05, 90C26]
(see: Molecular structure determination: convex global
underestimation)

chain see: ejection —; finite-state Markov —; global supply —; 
Hansel —; markov —; operational decisions in a supply —; 
stationary-state Markov —; strategic design of a supply —; 
supply —; two-stranded — 

chain design see: supply — 

chain justification 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching) 

chain justification see: left- —; right- — 

chain management see: Bilinear programming: applications in 
the supply —; Mathematical programming methods in 
supply —; operational supply —; strategic supply —; 
supply — 

chain management and design see: Operations research 
models for supply — 

chain methods see: ejection — 

chain models see: Global supply — 

chain networks 
[05C85] 
(see: Directed tree networks) 

chain optimization see: supply — 

chain performance measurement see: Supply — 

chain rule 
[90C30] 
(see: Generalized total least squares) 

chain rule for Bayesian networks 
(see: Bayesian networks) 

chain rules 
[90C15] 
(see: Stochastic quasigradient methods: applications) 

chain sampling see: Markov — 

chain simulation models see: supply — 

chained local search 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 

chains see: Inventory management in supply —; Markov — 

chains question-asking strategy see: binary search-Hansel —; 
sequential Hansel — 

Chaitin complexity see: Solomonoff-Kolmogorov- — 

Chaitin in Omega
[90C60]
(see: Kolmogorov complexity)

challenge see: grand — 

challenges in MINLP
[49M37, 90C11]
(see: Mixed integer nonlinear programming)

challenges for OR
[90C27]
(see: Operations research and financial markets)
chamber
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)

Chan method
[65Fxx]
(see: Least squares problems)

chance constraint programming
[90C10, 90C15]
(see: Stochastic vehicle routing problems)


changes see: sensitivity analysis with respect to right-hand 
side —; up to first order — 

changes in cost coefficients see: sensitivity analysis with 
respect to — 

channel constrained frequency assignment see: adjacent —; 
co- — 

chaotic iterative scheme
[90C30, 90C52, 90C53, 90C55]
(see: Asynchronous distributed optimization algorithms)

Characteristic see: spatial — 

characteristic equation
[93D09]
(see: Robust control)

characteristic function
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)

characteristic polynomial
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)

characteristic vector
[05C60, 05C69, 05C85, 37B25, 68W01, 90C10, 90C20, 90C25, 
90C27, 90C35, 90C59, 91A22] 
(see: Heuristics for maximum clique and independent set; 
L-convex functions and M-convex functions; Replicator 
dynamics in combinatorial optimization) 

characteristic vector see: weighted — 

characteristics see: method of — 

characterization of see: Convexifiable functions — 

characterization of Ep,
[90C15, 90C29]
(see: Discretely distributed stochastic programs: descent
directions and efficient points)

characterizing moments
[94A17]
(see: Jaynes’ maximum entropy principle)

characteristic polynomial
[49M37, 65K10, 90C26, 90C30]
(see: αBB algorithm)

characteristic vector
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization)

charge see: fixed —; linear fixed — 

charge function see: fixed — 

charge network flow problem see: fixed — 

charge networks see: fixed — 

charge problem see: fixed — 

charge transportation problem see: fixed — 

chart scores see: REL — 


Chebyshev alternance
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)

Chebyshev alternation
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)

Chebyshev best approximation
[49K35, 49M27, 65K10, 90C25]
(see: Convex max-functions)

Chebyshev iterative method
[90C05, 90C25]
(see: Metropolis, Nicholas Constantine)

Chebyshev polynomial
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)

Chebyshev problem
[65D10, 65K05]
(see: Overdetermined systems of linear equations)

Chebyshev set
[41A30, 47A99, 65K10]
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 

checklist see: valuation of a — 

checklist confirmation 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

checklist denial 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

checklist modus ponens 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

checklist modus tollens 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

checklist paradigm 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 


Checklist paradigm semantics for fuzzy logics 
(03B52, 03B50, 03C80, 62F30, 62Gxx, 68T27)
(referred to in: Alternative set theory; Boolean and fuzzy 
relations; Finite complete systems of many-valued logic 
algebras; Inference of monotone boolean functions; 
Optimization in boolean classification problems; 
Optimization in classifying text documents) 
(refers to: Alternative set theory; Boolean and fuzzy 
relations; Finite complete systems of many-valued logic 
algebras; Inference of monotone boolean functions; 
Optimization in boolean classification problems; 
Optimization in classifying text documents) 

checklist template 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

checkpointing 
[49-04, 65Y05, 68N20] 
(see: Automatic differentiation: parallel computation) 

chemical engineering design problems see: Interval analysis: 
application to — 
chemical equilibrium see: multiphase —; Optimality criteria for
multiphase —

chemical equilibrium problem 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 

chemical potential 
[49K99, 65K05, 80A10, 90C30] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes; Optimality criteria for 
multiphase chemical equilibrium) 

Chemical process planning 
(90C90) 
(referred to in: Generalized benders decomposition; 
Generalized outer approximation; MINLP: application in 
facility location-allocation; MINLP: applications in
blending and pooling problems; MINLP: applications in the
interaction of design and control; MINLP: branch and
bound global optimization algorithm; MINLP: branch and
bound methods; MINLP: design and scheduling of batch
processes; MINLP: generalized cross decomposition;
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods;
MINLP: outer approximation algorithm; MINLP: reactive 
distillation column synthesis; Mixed integer linear 
programming: heat exchanger network synthesis; Mixed 
integer linear programming: mass and heat exchanger 
networks; Mixed integer nonlinear programming) 
(refers to: Extended cutting plane algorithm; Generalized 
benders decomposition; Generalized outer approximation; 
MINLP: application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm; 
MINLP: branch and bound methods; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: mass and heat 
exchanger networks; Mixed integer nonlinear 
programming) 

chemical reaction equilibrium see: Global optimization in 
phase and — 

chemotherapy see: cancer — 

Chen-Harker-Kanzow-Smale function 
[49J52, 90C30] 
(see: Nondifferentiable optimization: Newton method) 

chess-board matrix 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 

Chevron method 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 

chi-square statistic see: Pearson — 

child of a vertex 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 


Chinese postman problem 
[90B20] 
(see: General routing problem) 

Chinese postman problem see: directed — 

chirotope 
[90C09, 90C10] 
(see: Oriented matroids) 

choice see: greedy —; rational —; rule of random — 

choice adjustment process see: trip-route — 

choice axiom 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 

choice of the entering variable 
[90C05, 90C33] 
(see: Pivoting algorithms for linear programming 
generating two paths) 

choice knapsack see: multiple — 

choice knapsack problem see: linear multiple- —; 
multidimensional multiple- —; multiple- — 

choice of the leaving variable 
[90C05, 90C33] 
(see: Pivoting algorithms for linear programming 
generating two paths) 

choice property see: greedy- — 

choices see: linguistic — 

Choleski algorithm see: implicit — 

Cholesky factorization 
(15-XX, 65-XX, 90-XX) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; Interval linear systems; Large scale 
trust region problems; Large scale unconstrained 
optimization; Orthogonal triangularization; 
Overdetermined systems of linear equations; QR 
factorization; Solving large scale and sparse semidefinite 
programs; Symmetric systems of linear equations) 
(refers to: ABS algorithms for linear equations and linear 
least squares; Large scale trust region problems; Large scale 
unconstrained optimization; Least squares problems; 
Linear programming; Orthogonal triangularization; 
Overdetermined systems of linear equations; QR 
factorization; Solving large scale and sparse semidefinite 
programs; Symmetric systems of linear equations) 

cholesky factorization
[15-XX, 65-XX, 65Fxx, 65K05, 90-XX, 90Cxx]
(see: Cholesky factorization; Least squares problems;
Symmetric systems of linear equations)

Cholesky triangle
[15-XX, 65-XX, 90-XX]
(see: Cholesky factorization)

chordal
[90C35]
(see: Feedback set problems)

chordal graph
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems)

chordal graph see: bipartite — 

chromatic number
[05-XX, 05C15, 05C17, 05C35, 05C62, 05C69, 05C85, 90C22,
90C27, 90C35, 90C59]
(see: Frequency assignment problem; Lovasz number;
Optimization problems in unit-disk graphs)
chromosome
[92B05]
(see: Genetic algorithms)


Chung-Gilbert conjecture
[90C27]
(see: Steiner tree problems)

Chvatal function
[90C10, 90C46]
(see: Integer programming duality)

Chvatal-Gomory cut
[90C05, 90C06, 90C08, 90C10, 90C11, 90C46]
(see: Integer programming: branch and cut algorithms;
Integer programming duality)

Chvatal-Gomory cutting plane
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: cutting plane algorithms)
Chvatal rank
[90C10, 90C11, 90C27, 90C57]
(see: Integer programming)

CI
[90C29]
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
CID 
[93A30, 93B50] 
(see: Mixed integer linear programming: mass and heat 
exchanger networks) 
circle see: largest empty —; least —; minimum —; smallest 
enclosing — 
circle covering problem see: Single facility location: — 
circle problem see: largest empty —; smallest enclosing- — 
circle product of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
circles in a square see: equal — 
circuit 
[13Cxx, 13Pxx, 14Qxx, 90C09, 90C10, 90Cxx] 
(see: Integer programming: algebraic methods; Oriented 
matroids) 
circuit see: combinatorial switching —; depth of a Boolean —; 
HAMILTON —; Hamiltonian —; sign of a — 
circuit design 
[90C10, 90C27, 94C15] 
(see: Graph planarization) 
circuit of a digraph 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
circuit orientation 
[90C09, 90C10] 
(see: Oriented matroids) 
circuit problem see: Hamiltonian — 
circuits 
[90C09, 90C10, 90C35] 
(see: Matroids; Optimization in leveled graphs) 
circuits see: signed — 


circulant matrix 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
circular path 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
circular unidimensional scale 
[62H30, 90C27] 
(see: Assignment methods in clustering) 
Clarke see: generalized subdifferential of F.H. — 
Clarke derivative 
[26E25, 49J52, 52A27, 90C99]
(see: Quasidifferentiable optimization: Dini derivatives, 
clarke derivatives) 
Clarke derivative see: directional — 
clarke derivatives see: Quasidifferentiable optimization: Dini 
derivatives — 
Clarke directional differential 
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX] 
(see: Nonconvex energy functions: hemivariational 
inequalities) 
Clarke dual action 
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization) 
Clarke duality 
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization) 
Clarke duality theorem 
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization) 
clarke generalized derivative 
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX, 90C26]
(see: Generalized monotone multivalued maps; Nonconvex 
energy functions: hemivariational inequalities) 
Clarke generalized directional derivative 
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 
Clarke generalized gradient 
[35A15, 47J20, 49J40]
(see: Hemivariational inequalities: static problems) 
Clarke generalized gradient 
[26E25, 49J52, 52A27, 90C99]
(see: Quasidifferentiable optimization: Dini derivatives, 
clarke derivatives) 
Clarke generalized Jacobian 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
Clarke generalized subdifferential 
[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90]
(see: Quasidifferentiable optimization: stability of dynamic 
systems) 
Clarke-Rockafellar generalized derivative 
[90C26]
(see: Generalized monotone multivalued maps) 
Clarke subdifferential 
[49J40, 49J52, 65K05, 90C30]
(see: Nonconvex-nonsmooth calculus of variations; Solving 
hemivariational inequalities by nonsmooth optimization 
methods) 
class
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 

class see: basic ABS —; closed —; color —; connected —; 
countable —; finite —; infinite —; quasi-Newton method of 
Broyden —; scaled ABS —; source —; universal —; unscaled
ABS — 

class of algorithms see: scaled ABS — 

class data classification via mixed-integer optimization see: 
Multi- — 

class distance see: inter- —; intra- — 

class invariant under principal pivoting see: matrix — 

class of a matrix see: qualitative — 

class P see: complexity — 

class software package see: multiple- —; single- — 

class of states see: recurrent —; transient — 

classes see: complexity —; equivalent —; matrix —; Parallel 
computing: complexity —; phase —; 2-—; Sd- —; 
separable —; set-definable —; o- — 

classes axiom see: existence of — 

classes in optimization see: Complexity — 

classes of problems see: equivalence — 

classical cutting plane method see: Kelley’s — 

classical Gram-Schmidt orthogonalization
[65Fxx] 
(see: Least squares problems) 

classical inference 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 

classical linear regression model 
[90C26, 90C30] 
(see: Forecasting) 

classical logic see: evaluation in — 

classical LU factorization 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 

classical Lyusternik theorem 
[41A10, 46N10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals; High-order necessary conditions for optimality 
for abnormal points) 

classical oligopoly problem 
[91B06, 91B60] 
(see: Oligopolistic market equilibrium) 

classical thermoelastic model 
[35R70, 47840, 74B99, 74D99, 74G99, 74H99] 
(see: Quasidifferentiable optimization: applications to 
thermoelasticity) 

classical traveling salesman problem 
[90C27] 
(see: Time-dependent traveling salesman problem) 

classification 
[03B52, 03E72, 47S40, 62H30, 68T27, 68T35, 68Uxx, 90Bxx,
90C27, 90C29, 90C39, 91Axx, 91B06, 92C60] 
(see: Assignment methods in clustering; Boolean and fuzzy 
relations; Dynamic programming in clustering; 
Multicriteria sorting methods) 


classification 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
classification see: automatic document —; computational 
issues in —; Deterministic and probabilistic optimization 
models for data —; document —; Linear programming 
models for —; optimization in document —; statistical —; 
statistical pattern —; supervised —; text —; 
unsupervised — 
classification of documents see: automatic — 
classification error see: minimizing the overall — 
classification of fractional programs 
[90C32]
(see: Fractional programming)
classification function 
[62H30, 90C11]
(see: Statistical classification: optimization approaches) 
classification of hard problems 
[90C60] 
(see: Computational complexity theory)
classification of large collections of documents 
[90C09, 90C10] 
(see: Optimization in classifying text documents)
classification of many-valued logics
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
classification matrix
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based methods)
classification: optimization approaches see: Statistical — 
classification problem 
[90C09] 
(see: Inference of monotone boolean functions)
classification problem see: Boolean —; g-group — 
classification problem (discriminant problem) see: g-group — 
classification problems see: Mixed integer —; Optimization in 
boolean — 
classification and regression trees 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 
classification of text documents 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
classification via mixed-integer optimization see: Multi-class 
data — 
classifying declarative programs 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
classifying text documents see: Optimization in — 
clause 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
clause see: logical — 
clause at a time see: one — 
clause at a time algorithm see: one — 
clause at a time approach see: one — 
clauses see: minimal number of DNF —; minimum number
of — 
clearly defined end 
(see: Planning in the process industry) 
clipping technique 
[90C30] 
(see: Suboptimal control) 
clique 
[03B50, 05C15, 05C17, 05C35, 05C60, 05C62, 05C69, 05C85, 
37B25, 65Fxx, 68Q25, 68R10, 68T15, 68T30, 68W40, 90C10, 
90C20, 90C22, 90C27, 90C35, 90C59, 91A22] 
(see: Domination analysis in combinatorial optimization; 
Finite complete systems of many-valued logic algebras; 
Least squares problems; Lovasz number; Maximum 
constraint satisfaction: relaxations and upper bounds; 
Multidimensional assignment problem; Optimization in 
leveled graphs; Optimization problems in unit-disk graphs; 
Replicator dynamics in combinatorial optimization; 
Standard quadratic optimization problems: applications) 
clique 
[05C60, 05C69, 05C85, 37B25, 68W01, 90C20, 90C27, 90C35, 
90C59, 91A22] 
(see: Heuristics for maximum clique and independent set; 
Replicator dynamics in combinatorial optimization) 
clique see: maximal —; maximum —; maximum weight — 
clique-cut 
(see: Contact map overlap maximization problem, CMO) 
clique graph see: block- — 
clique and independent set see: Heuristics for maximum — 
clique number 
[05C15, 05C17, 05C35, 05C60, 05C62, 05C69, 05C85, 37B25, 
68W01, 90C20, 90C22, 90C27, 90C35, 90C59, 91A22] 
(see: Heuristics for maximum clique and independent set; 
Lovasz number; Optimization problems in unit-disk 
graphs; Replicator dynamics in combinatorial 
optimization) 
clique number see: weighted — 
clique partition number 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovasz number) 
clique partitioning see: minimum — 
clique Problem 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
clique problem see: max- —; maximum —; maximum 
weight — 
closed see: minor — 
closed class 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
closed convex cone see: pointed — 
closed form approach 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
closed form transformation see: unimodular max- — 
closed form transformations see: unimodular max- — 
closed function see: max- — 
closed list 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 


closed-loop control 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
closed of a matroid 
[90C09, 90C10] 
(see: Matroids) 
closed point-to-set mapping 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
closed selfadjoint operator 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
closed set see: max- — 
closed sets see: max- — 
closure 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
closure operator for a matroid 
[90C09, 90C10] 
(see: Matroids) 
closure of a relation see: equivalence —; local equivalence —; 
local pre-order —; local tolerance —; pre-order —; 
property- —; reflexive —; tolerance — 
closures 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
CLP 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
CLP(BNR) 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
cluster 
[90C35] 
(see: Multi-index transportation problems) 
cluster see: admissible —; star — 
cluster analysis 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 65K05, 
68U20, 70-08, 82B21, 82B31, 82B41, 82B80, 90-00, 90-08, 
90C11, 90C27, 90C35, 92C40, 92E10] 
(see: Algorithms for genomic analysis; Global optimization 
in protein folding) 
cluster compactness 
[62H30, 90C27]
(see: Assignment methods in clustering) 
cluster first-schedule second strategy
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 
cluster-heads 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs) 
cluster isolation 
[62H30, 90C27]
(see: Assignment methods in clustering) 
cluster second see: schedule first- — 
cluster statistic see: generalized single — 
clustering 
[62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 
90C27, 90C30, 90C39, 90C90] 
(see: Assignment methods in clustering; Dynamic
programming in clustering; Optimization in medical 
imaging; Stochastic global optimization: two-phase 
methods) 
clustering 
[62H30, 90C27, 90C39] 
(see: Assignment methods in clustering; Dynamic 
programming in clustering) 
clustering see: Assignment methods in —; density —; Dynamic 
programming in —; fuzzy —; hard —; minimal variance —; 
Nonsmooth optimization approach to —; order constrained 
hierarchical — 
clustering algorithm 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
clustering approach: global optimum search with enhanced 
positioning see: Gene clustering: A novel 
decomposition-based — 
clustering: A novel decomposition-based clustering approach: 
global optimum search with enhanced positioning see: 
Gene — 
clustering problem 
[90C08, 90C11, 90C27] 
(see: Quadratic semi-assignment problem) 
clusters see: Determining the optimal number of —; Global 
optimization in Lennard-Jones and morse —; PC — 
clusters size threshold see: determination of — 
CMDT 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
CMO see: Contact map overlap maximization problem — 
CMST 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
CMST see: equal demand —; nonunit weight —; unit weight — 
CNF 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
CNF 
[90C09, 90C10] 
(see: Inference of monotone boolean functions; 
Optimization in boolean classification problems; 
Optimization in classifying text documents) 
CNF see: k- —; SAT-k- — 
CNF problem see: SAT- — 
CNSO 
[46A20, 52A01, 90C30] 
(see: Composite nonsmooth optimization) 
CNSO see: extended real-valued —; multi-objective —; 
real-valued — 
CNSO problems see: second order Lagrangian theory of — 
co 
[90C27, 90C29] 
(see: Multi-objective combinatorial optimization) 
co-channel constrained frequency assignment 
[05-XX] 
(see: Frequency assignment problem) 
co-coercive operator 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 


co-generation plant 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
co-index see: linear —; quadratic — 
co-optimal path 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms)
co-optimal vertex 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms)
coalition see: concordant —; discordant — 
coarse grained multicomputer 
[65K05, 65Y05] 
(see: Parallel computing: models)
coarse grid 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
coarse valuation structure 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
coarseness 
[68Q20] 
(see: Optimal triangulations)
cobipartite neighborhood edge elimination ordering 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs)
coboundary of a function 
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions) 
cocircuits see: signed — 
cocomparability graph 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C35, 90C59] 
(see: Feedback set problems; Optimization problems in 
unit-disk graphs) 
code see: Gray —; RANS —; Reynolds-averaged 
Navier-Stokes — 
code list 
[65G20, 65G30, 65G40, 65H20, 65H99, 65K05, 65K99, 90C26, 
90C30] 
(see: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Interval analysis: 
intermediate terms) 
code list 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators) 
code PCSP see: computer — 
code transformation 
[65H99, 65K99] 
(see: Automatic differentiation: point and interval) 
code transformation see: source — 
coderivative see: limiting — 
codifferentiability
[49J52, 65K99, 70-08, 90C25]
(see: Quasidifferentiable optimization: algorithms for 
hypodifferentiable functions; Quasidifferentiable 
optimization: codifferentiable functions) 

codifferentiable 
[49J52, 65K99]
(see: Quasidifferentiable optimization: algorithms for 
hypodifferentiable functions) 

codifferentiable see: continuously —; twice —; twice 
continuously — 

codifferentiable function 
[49J52, 65K99, 70-08, 90C25]
(see: Quasidifferentiable optimization: codifferentiable 
functions) 

codifferentiable function 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 

codifferentiable function see: continuously —; Dini —; 
Hadamard —; twice —; twice continuously — 

codifferentiable functions see: Quasidifferentiable 
optimization: — 

codifferential 
[26B25, 26E25, 49J52, 65K99, 65Kxx, 70-08, 90C25, 90C30, 
90C99, 90Cxx] 
(see: Nondifferentiable optimization: Newton method; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: 
codifferentiable functions) 

codifferential see: second order — 

codifferential descent see: method of — 

coefficient see: activity —; contraction —; expansion —; 
fugacity —; reflection — 

coefficient generation see: one-at-a-time — 

coefficient matrix see: ill-conditioned — 

coefficient pivoting rule see: Dantzig largest — 

coefficient reduction 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 

coefficient rule see: largest — 

coefficients see: estimation of kinetic —; flexible MOLP with 
fuzzy —; generalized linear programming with variable —; 
MOLP with fuzzy —; multi-objective linear programming 
with fuzzy —; real —; sensitivity analysis with respect to 
changes in cost —; statistical representation of cutting 
plane —; Taylor — 

coercive
[90C25, 90C26]
(see: Decomposition in global optimization)

coercive bilinear symmetric continuous form
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational
inequalities)

coercive hemivariational inequality problem
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational
inequalities)

coercive operator see: co- — 


coercivity condition 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
cognitive construct 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
cognitive element 
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 
cognitive science 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90] 
(see: Disease diagnosis: optimization-based methods) 
Cohen triangulation see: Hickey- — 
cohomology see: local system — 
cohomology of an arrangement of hyperplanes 
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements) 
coin graphs 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs) 
coincidence theorem 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems) 
cold spot 
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
cold spots 
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
coli see: escherichia — 
collaborative 
[65F10, 65F50, 65H10, 65K10] 
(see: Multidisciplinary design optimization) 
collaborative optimization 
[49M37, 65F10, 65F50, 65H10, 65K05, 65K10, 90C30, 93A13] 
(see: Multidisciplinary design optimization; Multilevel 
methods for optimal design) 
collapse see: probabilistic — 
collapsing auction algorithm see: graph — 
collapsing in auction algorithms see: graph — 
collecting traveling salesman problem see: prize — 
collection 
[90C10, 90C26, 90C30] 
(see: Optimization software) 
collection of margins see: hierarchical — 
collection of a partition see: left- —; right- — 
collection of subsets see: transversal of a — 
collections of documents see: classification of large — 
collectively compact
[65H10, 65J15] 
(see: Contraction-mapping) 
collision see: direct —; hidden — 
collocation
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
collocation see: orthogonal — 
collocation conditions 
[34A55, 78A60, 90C30] 
(see: Optimal design in nonlinear optics) 
(colloquial) see: optimization: definition — 
colony see: ant — 
coloop 
[90C09, 90C10]
(see: Matroids; Oriented matroids) 
color see: double —; single — 
color class 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C35, 90C59]
(see: Graph coloring; Optimization problems in unit-disk 
graphs) 
color-forced 
[05C85]
(see: Directed tree networks) 
colorability see: 3- — 
colorable see: k- — 
coloring 
[90C35] 
(see: Graph coloring) 
coloring see: arc —; conflict-free —; constrained edge —; 
edge —; frequency exhaustive sequential —; graph —;
greedy —; hypergraph q- —; list —; proper —; requirement
exhaustive sequential —; t- —; total —; uniform
sequential —; weighted — 

coloring extension 
[05C85] 
(see: Directed tree networks) 

coloring frequency assignment see: order of a T- —; span of 
a T- —

coloring heuristic see: sequential greedy — 

coloring problem see: edge —; graph —; m- —; path —; 
total —; weighted graph — 

column see: basic —; critical —; nonbasic — 

column dropping 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 

column dropping rule 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 

column generation 
[68Q99, 90C06, 90C10, 90C11, 90C25, 90C27, 90C30, 90C35, 
90C57] 
(see: Branch and price: Integer programming with column
generation; Frank-Wolfe algorithm; Integer programming;
Multicommodity flow problems; Simplicial decomposition;
Simplicial decomposition algorithms)

Column generation 
[68Q99, 90B90, 90C06, 90C10, 90C11, 90C25, 90C30, 90C35, 
90C57, 90C59, 90C90]
(see: Branch and price: Integer programming with column
generation; Cutting-stock problem; Frank-Wolfe 
algorithm; Modeling difficult optimization problems; 
Multicommodity flow problems; Simplicial decomposition; 
Simplicial decomposition algorithms) 

column generation see: Branch and price: Integer 
programming with — 

column generation formulation
[90C35]
(see: Multicommodity flow problems)

column generation methods
[90C10, 90C11, 90C27, 90C57]
(see: Set covering, packing and partitioning problems)

column generation subproblem
[90C06, 90C25, 90C35]
(see: Simplicial decomposition algorithms)

column incidence graph
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians)

column-pivoting see: QR factorization with — 

column sufficient
[90C33]
(see: Linear complementarity problem)

column sufficient matrix
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity 
problems) 

column synthesis see: MINLP: reactive distillation — 

combination of the extreme points see: convex — 

combinations see: convex — 

combinatorial
[90C60]
(see: Computational complexity theory)

combinatorial algorithm
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource
allocation problems)

combinatorial fractional programming
[90C32]
(see: Fractional programming) 

Combinatorial matrix analysis
(90C10, 90C09)
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Combinatorial optimization 
games; Evolutionary algorithms in combinatorial 
optimization; Fractional combinatorial optimization; 
Multi-objective combinatorial optimization; Replicator 
dynamics in combinatorial optimization) 
(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Combinatorial optimization 
games; Evolutionary algorithms in combinatorial 
optimization; Fractional combinatorial optimization; 
Multi-objective combinatorial optimization; Neural 
networks for combinatorial optimization; Replicator 
dynamics in combinatorial optimization) 

combinatorial matrix analysis 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 

combinatorial optimization 
[01A99, 05A, 05C60, 05C69, 15A, 37B25, 51M, 52A, 52B, 52C, 
60J15, 60J60, 60J70, 60K35, 62H, 62H30, 65C05, 65C10, 
65C20, 65H20, 65K05, 68Q, 68R, 68U, 68U20, 68W, 70-08,
82B21, 82B31, 82B41, 82B80, 90-01, 90-08, 90B, 90B40, 90C,
90C10, 90C11, 90C20, 90C26, 90C27, 90C29, 90C35, 90C39, 
90C57, 90C59, 90C60, 91A12, 91A22, 92C40, 92E10, 94C15] 
(see: Combinatorial optimization games; Convex discrete 
optimization; Dynamic programming in clustering; Global 
optimization in protein folding; Greedy randomized 
adaptive search procedures; History of optimization; 
Integer programming; Multi-objective combinatorial 
optimization; Replicator dynamics in combinatorial 
optimization; Variable neighborhood search methods) 

combinatorial optimization 
[05-04, 62H30, 65H20, 65K05, 68M20, 68T99, 90-01, 90B06, 
90B10, 90B35, 90B40, 90B80, 90C08, 90C09, 90C10, 90C11, 
90C15, 90C20, 90C27, 90C29, 90C30, 90C35, 90C39, 90C57, 
90C59, 90C60, 91A12, 94C15] 
(see: Assignment methods in clustering; Bi-objective 
assignment problem; Capacitated minimum spanning 
trees; Combinatorial optimization games; Computational 
complexity theory; Dynamic programming in clustering; 
Evolutionary algorithms in combinatorial optimization; 
Feedback set problems; Greedy randomized adaptive search 
procedures; Integer programming; Linear ordering 
problem; Matroids; Multidimensional knapsack problems; 
Multi-objective combinatorial optimization; Neural 
networks for combinatorial optimization; Oriented 
matroids; Quadratic assignment problem; Set covering, 
packing and partitioning problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic integer 
programs; Vehicle scheduling) 

combinatorial optimization see: convex —; Domination 
analysis in —; Evolutionary algorithms in —; Fractional —; 
large-scale —; linear fractional —; multi-objective —; 
Neural networks for —; Replicator dynamics in —; 
stochastic —; uniform fractional — 

Combinatorial optimization algorithms in resource allocation 
problems 
(90C09, 90C10) 
(referred to in: Combinatorial matrix analysis; Facilities 
layout problems; Facility location with externalities; 
Facility location problems with spatial interaction; Facility 
location with staircase costs; Fractional combinatorial 
optimization; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Multi-objective combinatorial 
optimization; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Replicator dynamics in combinatorial 
optimization; Resource allocation for epidemic control; 
Simple recourse problem; Single facility location: circle 
covering problem; Single facility location: multi-objective 
euclidean distance location; Single facility location: 
multi-objective rectilinear distance location; Stochastic 
transportation and location problems; Voronoi diagrams in 
facility location; Warehouse location problem) 
(refers to: Combinatorial matrix analysis; Combinatorial 
optimization games; Competitive facility location; 
Evolutionary algorithms in combinatorial optimization; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with
staircase costs; Fractional combinatorial optimization;
Global optimization in Weber’s problem with attraction 
and repulsion; MINLP: application in facility 
location-allocation; Multifacility and restricted location 
problems; Multi-objective combinatorial optimization; 
Network location: covering problems; Neural networks for 
combinatorial optimization; Optimizing facility location 
with euclidean and rectilinear distances; 
Production-distribution system design problem; Replicator 
dynamics in combinatorial optimization; Resource 
allocation for epidemic control; Simple recourse problem; 
Single facility location: circle covering problem; Single 
facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 


combinatorial optimization game 
[90C27, 90C60, 91A12]
(see: Combinatorial optimization games) 


Combinatorial optimization games 
(91A12, 90C27, 90C60) 
(referred to in: Combinatorial matrix analysis; 
Combinatorial optimization algorithms in resource 
allocation problems; Evolutionary algorithms in 
combinatorial optimization; Fractional combinatorial 
optimization; Multi-objective combinatorial optimization; 
Replicator dynamics in combinatorial optimization) 
(refers to: Combinatorial matrix analysis; Evolutionary 
algorithms in combinatorial optimization; Fractional 
combinatorial optimization; Multi-objective combinatorial 
optimization; Neural networks for combinatorial 
optimization; Replicator dynamics in combinatorial 
optimization) 

combinatorial optimization problem 
[90C27, 90C30, 90C60] 
(see: Computational complexity theory; Neural networks 
for combinatorial optimization) 

combinatorial optimization problem see: fractional —; integral 
linear fractional — 


combinatorial optimization problems
[90C11, 90C59]
(see: Nested partitions optimization)
combinatorial problem
[90C26, 90C90]
(see: Structural optimization: history)
combinatorial properties
[68Q20]
(see: Optimal triangulations)
combinatorial switching circuit
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)


Combinatorial test problems and problem generators 
(90B99, 05A99)
(referred to in: Maximum cut problem, MAX-CUT) 
combinatorics 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 


combinatorics see: polyhedral — 
combine
[46N10, 47N10, 49M37, 65K10, 90C26, 90C30] 
(see: Global optimization: tight convex underestimators) 
combined method of feasible directions 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods) 
combined relative measure 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
combined relaxation methods 
[47J20, 49J40, 65K10, 90C33] 
(see: Solution methods for multivalued variational 
inequalities) 
commodity 
[90C35] 
(see: Multicommodity flow problems) 
commodity flows see: Multi- — 
commodity model in OR see: single- — 
commodity network flow problem see: nonlinear single — 
commodity single-criterion uncapacitated static multifacility 
see: discrete single- — 
common dependency set 
[90C05, 90C20]
(see: Redundancy in nonlinear programs) 
common mutated sequence see: minimum weight — 
common random numbers 
[62F12, 65C05, 65K05, 90C15, 90C27, 90C31]
(see: Discrete stochastic optimization; Monte-Carlo 
simulations for stochastic optimization) 
communication 
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes) 
communication see: open —; rendez-vous — 
communication costs 
[65K05, 65Y05]
(see: Parallel computing: models) 
communication equilibrium 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
communication-free alignment 
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem) 
communication-free alignment problem 
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem) 
communication network 
[68Q25, 90B80, 90C05, 90C27]
(see: Communication network assignment problem) 
Communication network assignment problem 
(90B80, 90C05, 90C27, 68Q25) 
(referred to in: Assignment and matching; Assignment 
methods in clustering; Auction algorithms; Bi-objective 
assignment problem; Dynamic traffic networks; 
Equilibrium networks; Frequency assignment problem; 
Generalized networks; Linear ordering problem; Maximum 
flow problem; Maximum partition matching; Minimum 
cost flow problem; Multicommodity flow problems; 
Network design problems; Network location: covering 
problems; Nonconvex network flow problems; Piecewise 
linear network flow problems; Quadratic assignment 
problem; Shortest path tree algorithms; Steiner tree
problems; Stochastic network problems: massively parallel
solution; Survivable networks; Traffic network equilibrium) 
(refers to: Assignment and matching; Assignment methods 
in clustering; Auction algorithms; Bi-objective assignment 
problem; Directed tree networks; Dynamic traffic networks; 
Equilibrium networks; Evacuation networks; Frequency 
assignment problem; Generalized networks; Maximum flow 
problem; Maximum partition matching; Minimum cost 
flow problem; Network design problems; Network location: 
covering problems; Nonconvex network flow problems; 
Piecewise linear network flow problems; Quadratic 
assignment problem; Shortest path tree algorithms; Steiner 
tree problems; Stochastic network problems: massively 
parallel solution; Survivable networks; Traffic network 
equilibrium) 

communication network assignment problem
[68Q25, 90B80, 90C05, 90C27]
(see: Communication network assignment problem)

communication protocol
[90C30, 90C52, 90C53, 90C55]
(see: Asynchronous distributed optimization algorithms)

commutator K
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)

compact
[49L20, 49M29, 65K10, 90C06, 90C40]
(see: Dynamic programming: undiscounted problems; Local 
attractors for gradient-related descent iterations) 

compact see: collectively —

compact epi-Lipschitzness
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials)

compact graph
[49J20, 49J52]
(see: Shape optimization)

compact operator
[49J52]
(see: Hemivariational inequalities: eigenvalue problems)

compact representation
[90C30, 90C35]
(see: Optimization in water resources; Unconstrained
nonlinear optimization: Newton-Cauchy framework)

compact representations
[90C39]
(see: Neuro-dynamic programming)

compactness
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)

compactness see: cluster —; partial sequential normal —; 
sequential normal —; weak — 

company policies see: in- — 

comparative efficiency assessment
[90B30, 90B50, 90C05, 91B82]
(see: Data envelopment analysis)

comparison see: paired —; sequence —; technological — 

comparison of efficiency and nondomination
[90C15, 90C29]
(see: Discretely distributed stochastic programs: descent
directions and efficient points)

comparison oracle
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization) 
comparison of parametric solutions 
[90C11, 90C31] 
(see: Multiparametric mixed integer linear programming) 
comparisons see: missing —; pairwise — 
compatibility see: backward —; both-way —; forward — 
compatibility condition 
[35A15, 47J20, 49J40]
(see: Hemivariational inequalities: static problems) 
compatibility conditions 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
compatibility equations see: strain-displacement — 
compatibility theorem see: Bandler-Kohout — 
competition see: imperfect —; perfect — 
competition facility location model see: spatial — 
competitive see: perfectly — 
competitive analysis 
[68Q25, 91B28] 
(see: Competitive ratio for portfolio management)
competitive environment 
[90B80, 90B85] 
(see: Warehouse location problem)
competitive equilibrium model see: perfectly — 
Competitive facility location 
(90B60, 90B80, 90B85)
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 
competitive ratio 
[05C85, 68Q25, 91B28] 
(see: Competitive ratio for portfolio management; Directed
tree networks)
competitive ratio 
[68Q25, 91B28] 
(see: Competitive ratio for portfolio management)
Competitive ratio for portfolio management 
(91B28, 68Q25)
(referred to in: Financial applications of multicriteria
analysis; Financial optimization; Portfolio selection and 
multicriteria analysis; Robust optimization; Semi-infinite 
programming and applications in finance) 
(refers to: Financial applications of multicriteria analysis; 
Financial optimization; Portfolio selection and 
multicriteria analysis; Robust optimization; Semi-infinite 
programming and applications in finance) 


complement
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)


complement
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)


complement see: relative —; Schur — 


complement of an arrangement of hyperplanes
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)

complement graph
[05C69, 05C85, 68W01, 90C59]
(see: Heuristics for maximum clique and independent set)
complementarity
[90C10, 90C11, 90C27, 90C30, 90C33]
(see: Continuous reformulations of discrete-continuous 
optimization problems; Optimization with equilibrium 
constraints: A piecewise SQP approach; Topological 
methods in complementarity theory) 


complementarity
[49-01, 49-XX, 49K10, 49M37, 90-01, 90-XX, 90C05, 90C27,
90C30, 90C33, 91B52, 93-XX]
(see: Bilevel linear programming; Duality for semidefinite 
programming; Duality theory: monoduality in convex 
optimization; Topological methods in complementarity 
theory) 


complementarity see: generalized —; linear —; nonlinear —;
Order —; strict —


Complementarity algorithms in pattern recognition
(referred to in: Generalizations of interior point methods for
the linear complementarity problem; Simultaneous
estimation and optimization of nonlinear problems)
(refers to: Generalizations of interior point methods for the
linear complementarity problem; Generalized eigenvalue
proximal support vector machine problem)


complementarity condition
[49M37, 65K05, 90C30]
(see: Image space approach to optimization; 
Inequality-constrained nonlinear optimization) 


complementarity condition
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and 
stability) 


complementarity problem
[90C33]
(see: Generalized nonlinear complementarity problem; 
Linear complementarity problem) 


complementarity problem
[65K10, 65M60, 90C30, 90C31, 90C33]
(see: Implicit lagrangian; Nonsmooth and smoothing 
methods for nonlinear complementarity problems and 
variational inequalities; Sensitivity analysis of 
complementarity problems; Variational inequalities) 


complementarity problem see: discrete dynamic —;
dynamic —; extended linear —; general order —; 
Generalizations of interior point methods for the linear —; 
generalizations of the nonlinear —; generalized —; 
generalized linear order —; generalized mixed —; 
Generalized nonlinear —; generalized order —; horizontal 
linear —; implicit —; implicit general order —; 
infinite-dimensional generalized order —; linear —; linear
order —; mixed linear —; nonlinear —; nonlinear order —; 
order —; parametric linear —; parametric nonlinear —; 
vertical linear — 
complementarity problem and fixed point problem see: 
Equivalence between nonlinear — 
complementarity problems see: linear —; nonlinear —; 
parametric —; Principal pivoting methods for linear —; 
Sensitivity analysis of —; Splitting method for linear — 
complementarity problems and variational inequalities see: 
Nonsmooth and smoothing methods for nonlinear — 
complementarity slackness 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 
complementarity slackness see: strict — 
complementarity slackness condition see: strict — 
complementarity theory see: Topological methods in — 
complementary basis 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules; Principal pivoting 
methods for linear complementarity problems) 
complementary conditions see: strictly — 
complementary gap function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
complementary gap function see: pure — 
complementary graph 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovász number)
complementary ions 
(see: Peptide identification via mixed-integer optimization) 
complementary operator 
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)
complementary pair of variables
[90C33]
(see: Lemke method)
complementary pivot methods
[90C30, 90C90]
(see: Bilevel programming: global optimization)
complementary pivot theory
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity 
problems) 
complementary problem see: Integer linear —; linear —; 
nonlinear — 
complementary region 
[90C11, 90C59]
(see: Nested partitions optimization) 
complementary relation 
[03B52, 03E72, 47840, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
complementary slackness 
[15A39, 68W10, 90B15, 90C05, 90C06, 90C10, 90C30, 90C31,
90C33, 90C34, 90C46, 90C90] 
(see: Decomposition techniques for MILP: lagrangian 
relaxation; Integer programming duality; Integer 
programming: lagrangian relaxation; Linear 
complementarity problem; Semi-infinite programming: 
methods for linear problems; Sensitivity and stability in
NLP: continuity and differential stability; Stochastic
network problems: massively parallel solution; Tucker 
homogeneous systems of linear relations) 
complementary slackness see: ε- —; strict —
complementary slackness conditions 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 
complementary slackness relations 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
complementary solution see: strictly — 
complementary solutions see: almost — 
complete 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
complete see: NP- — 
complete completeness see: strong NP- — 
complete digraph 
[90C10, 90C11, 90C20] 
(see: Linear ordering problem)
complete game 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching)
complete graph 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set)
complete language see: F- — 
complete many-valued logic normal form 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras)
complete master problem 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms)
complete normal forms of Pi-algebras see: functionally — 
complete orientation 
[90B35] 
(see: Job-shop scheduling problem)
complete orthogonal factorization 
[15A23, 65F05, 65F20, 65F22, 65F25] 
(see: Orthogonal triangularization)
complete orthogonal factorization 
[15A23, 65F05, 65F20, 65F22, 65F25] 
(see: Orthogonal triangularization; QR factorization)
complete problem see: NP- — 
complete problems and proof methodology see: NP- — 
complete recourse 
[90B10, 90B15, 90C15, 90C35] 
(see: Preprocessing in stochastic programming; Stochastic 
programming: nonanticipativity and lagrange multipliers) 
complete recourse see: relatively — 
complete reduction 
[65K05, 90C30] 
(see: Bisection global optimization methods)
complete reductions see: ordinary NP- — 
complete set of connectives 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras)
complete systems of many-valued logic algebras see: Finite — 
completely continuous operator 
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems) 

completely positive 
[90C22, 90C25] 
(see: Copositive programming) 

completely positive and contraction matrices see: completion
to —

completely positive matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 

completely positive matrix see: partial — 

completely regular set 
[90C33]
(see: Order complementarity) 

completeness see: F- —; functional —; NP- —; ordinary NP- —; 
strong NP- —; strong NP-complete — 

completeness of Pl-algebras see: functional — 

completeness proofs see: NP- — 

completion see: rank matrix — 

completion to completely positive and contraction matrices 

[05C50, 15A48, 15A57, 90C25] 

(see: Matrix completion problems)

completion of matrices 

[05C50, 15A48, 15A57, 90C25] 

(see: Matrix completion problems)

completion of a partial matrix 

[05C50, 15A48, 15A57, 90C25] 

(see: Matrix completion problems)

completion problem see: distance matrix —; Euclidean 
distance matrix —; matrix —; maximum rank —; minimum 
rank —; positive (semi) definite —; positive semidefinite 
matrix — 

completion problems see: Matrix — 

completion time 

[90B36] 

(see: Stochastic scheduling)

complex interval matrix 

[65G20, 65G30, 65G40, 65L99] 

(see: Interval analysis: eigenvalue bounds of interval
matrices)

complexities see: predictability of — 

complexity 
[65K05, 68Q25, 90C08, 90C11, 90C27, 90C30, 90C57, 90C59, 
90C60] 
(see: Automatic differentiation: point and interval taylor 
operators; NP-complete problems and proof methodology; 
Quadratic assignment problem) 

complexity 
[90B35, 90C20, 90C25, 90C27, 90C30, 90C60, 90C90, 91A12] 
(see: Chemical process planning; Combinatorial 
optimization games; Complexity classes in optimization; 
Complexity theory; Complexity theory: quadratic 
programming; Job-shop scheduling problem; Kolmogorov 
complexity; Kuhn-Tucker optimality conditions; Quadratic 
programming over an ellipsoid) 

complexity see: Algorithmic —; average case —;
computational —; conditional Kolmogorov —;
Descriptional —; descriptive —; exponential —; graver —;
information-based —; Kolmogorov —; PLS- —;
polynomial —; Regression by special functions: algorithms
and —; Solomonoff-Kolmogorov-Chaitin —; worst-case —

complexity of an algorithm 

[90C60]

(see: Computational complexity theory) 

complexity of algorithms see: average case — 

complexity of bilevel programming 

[90C30, 90C90]

(see: Bilevel programming: global optimization) 

complexity class P 

[03B50, 68T15, 68T30]

(see: Finite complete systems of many-valued logic algebras) 

complexity classes 

[03D15, 68Q05, 68Q15, 90C60]

(see: Complexity classes in optimization; Parallel
computing: complexity classes)

complexity classes see: Parallel computing: — 

Complexity classes in optimization 
(90C60) 
(referred to in: Complexity of degeneracy; Complexity of 
gradients, Jacobians, and Hessians; Complexity theory; 
Complexity theory: quadratic programming; 
Computational complexity theory; Facilities layout 
problems; Fractional combinatorial optimization; Global 
optimization in multiplicative programming; 
Information-based complexity and information-based 
optimization; Interval Newton methods; Job-shop 
scheduling problem; Kolmogorov complexity; Mixed 
integer nonlinear programming; Multifacility and 
restricted location problems; Multiplicative programming; 
NP-complete problems and proof methodology; Parallel 
computing: complexity classes; Vehicle scheduling) 
(refers to: Complexity of degeneracy; Complexity of 
gradients, Jacobians, and Hessians; Complexity theory; 
Complexity theory: quadratic programming; 
Computational complexity theory; Fractional 
combinatorial optimization; Information-based complexity 
and information-based optimization; Kolmogorov 
complexity; Mixed integer nonlinear programming; 
NP-complete problems and proof methodology; Parallel 
computing: complexity classes) 

Complexity of degeneracy 
(90C60) 
(referred to in: Complexity classes in optimization; 
Complexity of gradients, Jacobians, and Hessians; 
Complexity theory; Complexity theory: quadratic 
programming; Computational complexity theory; 
Fractional combinatorial optimization; Information-based 
complexity and information-based optimization; 
Kolmogorov complexity; Mixed integer nonlinear 
programming; NP-complete problems and proof 
methodology; Parallel computing: complexity classes) 
(refers to: Complexity classes in optimization; Complexity of 
gradients, Jacobians, and Hessians; Complexity theory; 
Complexity theory: quadratic programming; 
Computational complexity theory; Fractional
combinatorial optimization; Information-based complexity 
and information-based optimization; Kolmogorov 
complexity; Mixed integer nonlinear programming; 
NP-complete problems and proof methodology; Parallel 
computing: complexity classes) 

complexity of a deterministic Turing machine see: space —; 
time — 

complexity, equivalence to minmax, concave programs see: 
Bilevel linear programming: — 

complexity and equivalent forms see: Quadratic integer 
programming: — 

complexity function see: time — 

complexity function of an algorithm see: time — 

Complexity of gradients, Jacobians, and Hessians 
(65D25, 68W30) 
(referred to in: Complexity classes in optimization; 
Complexity of degeneracy; Complexity theory; Complexity 
theory: quadratic programming; Computational 
complexity theory; Fractional combinatorial optimization; 
Information-based complexity and information-based 
optimization; Kolmogorov complexity; Mixed integer 
nonlinear programming; NP-complete problems and proof 
methodology; Parallel computing: complexity classes) 
(refers to: Complexity classes in optimization; Complexity of 
degeneracy; Complexity theory; Complexity theory: 
quadratic programming; Computational complexity 
theory; Fractional combinatorial optimization; 
Information-based complexity and information-based 
optimization; Kolmogorov complexity; Mixed integer 
nonlinear programming; NP-complete problems and proof 
methodology; Parallel computing: complexity classes) 

complexity and information-based optimization see: 
Information-based — 

Complexity and large-scale least squares problems 
(93E24, 34-xx, 34Bxx, 34Lxx) 

complexity of the linear BLPP 
[49-01, 49K45, 49N10, 90-01, 90C20, 90C27, 91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs) 

complexity of models 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 

complexity of a nondeterministic Turing machine see: 
space —; time — 

complexity O(n‘) see: algorithm of — 

complexity of optimization problems see: computational — 

Complexity theory 
(90C60) 
(referred to in: Complexity classes in optimization; 
Complexity of degeneracy; Complexity of gradients, 
Jacobians, and Hessians; Complexity theory: quadratic 
programming; Computational complexity theory; Facilities 
layout problems; Fractional combinatorial optimization; 
Global optimization in multiplicative programming; 
Information-based complexity and information-based 
optimization; Job-shop scheduling problem; Kolmogorov 
complexity; Mixed integer nonlinear programming; 
Multifacility and restricted location problems; 
NP-complete problems and proof methodology; Parallel
computing: complexity classes; Quadratic assignment
problem; Quadratic knapsack; Vehicle scheduling) 
(refers to: Complexity classes in optimization; Complexity of 
degeneracy; Complexity of gradients, Jacobians, and 
Hessians; Complexity theory: quadratic programming; 
Computational complexity theory; Fractional 
combinatorial optimization; Information-based complexity 
and information-based optimization; Kolmogorov 
complexity; Mixed integer nonlinear programming; 
NP-complete problems and proof methodology; Parallel 
computing: complexity classes; Shortest path tree 
algorithms) 

complexity theory 

[90C60]

(see: Complexity theory; Computational complexity theory) 

complexity theory 

[90C60]

(see: Computational complexity theory) 

complexity theory see: Computational — 

complexity theory of algorithms 

[03B50, 68T15, 68T30]

(see: Finite complete systems of many-valued logic algebras) 

complexity theory of PI-algebras 

[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 

Complexity theory: quadratic programming 
(90C60) 
(referred to in: Complexity classes in optimization; 
Complexity of degeneracy; Complexity of gradients, 
Jacobians, and Hessians; Complexity theory; 
Computational complexity theory; Fractional 
combinatorial optimization; Information-based complexity 
and information-based optimization; Kolmogorov 
complexity; Linear ordering problem; Mixed integer 
nonlinear programming; NP-complete problems and proof 
methodology; Parallel computing: complexity classes; 
Quadratic assignment problem; Quadratic fractional 
programming: Dinkelbach method; Quadratic knapsack; 
Quadratic programming with bound constraints; Quadratic 
programming over an ellipsoid; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications; Standard quadratic 
optimization problems: theory) 
(refers to: Complexity classes in optimization; Complexity of 
degeneracy; Complexity of gradients, Jacobians, and 
Hessians; Complexity theory; Computational complexity 
theory; Fractional combinatorial optimization; 
Information-based complexity and information-based 
optimization; Kolmogorov complexity; Mixed integer 
nonlinear programming; NP-complete problems and proof 
methodology; Parallel computing: complexity classes; 
Quadratic assignment problem; Quadratic fractional 
programming: Dinkelbach method; Quadratic knapsack; 
Quadratic programming with bound constraints; Quadratic 
programming over an ellipsoid; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications; Standard quadratic 
optimization problems: theory) 

complexity of Turing machines 
[90C60] 
(see: Complexity classes in optimization) 
compliance

[90C25, 90C27, 90C90]

(see: Semidefinite programming and structural 
optimization) 

complicating variables 

[90C26]

(see: Generalized primal-relaxed dual approach) 
component 

[68T99, 90C27]

(see: Capacitated minimum spanning trees) 


component see: basic —; central —; control —; feasible —; 
infeasible —; linear —; nonbasic —; noncentral —; 
polygonal —; singular — 


component species 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 
components see: full — 
components of a digraph see: strongly connected — 
components of a matrix see: irreducible — 
componentwise bound see: optimal — 
Composite Convexifiable Function see: integral Mean-Value 
for — 
composite materials see: laminated — 
Composite nonsmooth optimization 
(46A20, 90C30, 52A01) 
(referred to in: Nonconvex-nonsmooth calculus of 
variations; Nonsmooth and smoothing methods for 
nonlinear complementarity problems and variational 
inequalities; Solving hemivariational inequalities by 
nonsmooth optimization methods) 
(refers to: Nonconvex-nonsmooth calculus of variations; 
Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities; 
Solving hemivariational inequalities by nonsmooth 
optimization methods; Vector optimization) 
composite programming see: convex — 
composite structures see: design of —; Optimal design of — 
composition 
[90C09, 90C10] 
(see: Oriented matroids) 
composition difference see: minimum — 
composition interval diagram 
[93A30, 93B50]
(see: Mixed integer linear programming: mass and heat 
exchanger networks) 
composition of relations see: round —; square — 
composition theorem 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
compositional models 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling)
compositions see: equality of phase —; relational — 
compression 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
compromise programming 
[90C29] 
(see: Multi-objective optimization: pareto optimal
solutions, properties) 


compromise solution see: best- — 

computability see: partial — 

computable function see: polynomial time — 

computable optimal value bounds 

[90C31]

(see: Sensitivity and stability in NLP: approximation) 

computable solution 

[90C31]

(see: Sensitivity and stability in NLP: approximation) 

computation 

[33C45, 65F20, 65F22, 65K10, 90C10, 90C30]

(see: Least squares orthogonal polynomials; Modeling
languages in optimization: a new paradigm)

Computation 

[90C60]
(see: Kolmogorov complexity) 

computation see: asynchronous —; Automatic differentiation: 
parallel —; conjugate gradient parameter —; direction —; 
fixed point —; model of —; parallel —; partially 
asynchronous — 

computation and data mapping 
[05-02, 05-04, 15A04, 15A06, 68U99] 
(see: Alignment problem) 

computation in mechanics see: parallel — 

computation for polytopes: strategies and performances see: 
Volume — 

computation thesis see: parallel — 

computation of a Turing machine see: accepting —; length of 
a partial —; nonaccepting —; partial — 

computational biology 
[90C35] 
(see: Optimization in leveled graphs) 

computational certainty see: mathematical and — 

computational complexity 
[41A30, 62J02, 90C08, 90C11, 90C26, 90C27, 90C57, 90C59, 
90C60, 91A12] 
(see: Combinatorial optimization games; Computational 
complexity theory; Quadratic assignment problem; 
Regression by special functions: algorithms and 
complexity) 

computational complexity 
[03B50, 49-01, 49K45, 49N10, 68Q25, 68T15, 68T30, 90-01, 
90B80, 90C05, 90C20, 90C27, 90C60, 91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs; Communication network 
assignment problem; Computational complexity theory; 
Finite complete systems of many-valued logic algebras; 
NP-complete problems and proof methodology) 

computational complexity of optimization problems 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based 
optimization) 

Computational complexity theory 
(90C60) 
(referred to in: Complexity classes in optimization; 
Complexity of degeneracy; Complexity of gradients, 
Jacobians, and Hessians; Complexity theory; Complexity 
theory: quadratic programming; Fractional combinatorial 
optimization; Information-based complexity and 
information-based optimization; Kolmogorov complexity; 
Mixed integer nonlinear programming; Multiplicative 
programming; NP-complete problems and proof
methodology; Parallel computing: complexity classes; 
Quadratic assignment problem; Quadratic knapsack) 
(refers to: Complexity classes in optimization; Complexity of 
degeneracy; Complexity of gradients, Jacobians, and 
Hessians; Complexity theory; Complexity theory: quadratic 
programming; Fractional combinatorial optimization; 
Information-based complexity and information-based 
optimization; Kolmogorov complexity; Mixed integer 
nonlinear programming; Parallel computing: complexity 
classes) 
computational differentiation 
[26A24, 34-XX, 49-XX, 65-XX, 65D25, 68-XX, 68W30, 
90-XX] 
(see: Automatic differentiation: introduction, history and 
rounding error estimation; Complexity of gradients, 
Jacobians, and Hessians; Nonlocal sensitivity analysis with 
automatic differentiation) 
computational efficiency 
[90C31]
(see: Sensitivity and stability in NLP: approximation) 
computational equivalence 
[90C34]
(see: Semi-infinite programming: approximation methods) 
computational equivalent 
[90C34]
(see: Semi-infinite programming: approximation methods) 
computational fluid dynamics 
[90C90]
(see: Design optimization in computational fluid dynamics) 


computational fluid dynamics see: Design optimization in — 
computational graph 

[26A24, 65D25, 65K05, 68W30, 90C26, 90C30] 

(see: Automatic differentiation: introduction, history and
rounding error estimation; Automatic differentiation:
point and interval taylor operators; Bounding derivative
ranges; Complexity of gradients, Jacobians, and Hessians)
computational issues in classification 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods) 
computational linguistics 

[90C09, 90C10] 

(see: Optimization in classifying text documents) 
computational mechanics 

[49Q10, 74K99, 74Pxx, 90C90, 91A65]

(see: Multilevel optimization in mechanics) 
computational method see: adaptive — 
computational methods 

[65H99, 65K99] 

(see: Automatic differentiation: point and interval) 
computational model 

[65D25, 68W30] 

(see: Complexity of gradients, Jacobians, and Hessians) 
computational nonredundancy 

[90C30] 

(see: Cost approximation algorithms) 


computational performance 

[90C26] 

(see: Smooth nonlinear nonconvex optimization) 
computational performance see: optimization of — 
computational plasticity 

[49Q10, 74K99, 74Pxx, 90C90, 91A65] 

(see: Multilevel optimization in mechanics) 
computational process 

[26A24, 65D25] 

(see: Automatic differentiation: introduction, history and
rounding error estimation)
computational solution see: practically feasible — 
computational step 

[26A24, 65D25] 

(see: Automatic differentiation: introduction, history and
rounding error estimation)
computationally equivalent semi-infinite programs 

[90C34] 

(see: Semi-infinite programming: approximation methods) 
computations see: interval —; parallel —; uniform 
compute see: easy-to- — 
compute a safeguarded new trial steplength 
[49M07, 49M10, 65K, 90C06, 90C20] 

(see: Spectral projected gradient methods) 

compute the search direction 

[49M07, 49M10, 65K, 90C06, 90C20] 

(see: Spectral projected gradient methods)

compute the steplength 

[49M07, 49M10, 65K, 90C06, 90C20] 

(see: Spectral projected gradient methods)

computer see: bulk synchronous parallel —; distributed 
memory parallel —; parallel — 

computer aided techniques 

[90C26, 90C30] 

(see: Forecasting)

computer algebra 

[65D25, 68W30] 

(see: Complexity of gradients, Jacobians, and Hessians)

computer algebra 

[13Cxx, 13Pxx, 14Qxx, 65D25, 68W30, 90Cxx] 

(see: Complexity of gradients, Jacobians, and Hessians;
Integer programming: algebraic methods)

computer algebra package 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods)

computer code PCSP 

[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 

(see: Approximation of multivariate probability integrals) 
computer implementation example see: optimization — 
Computer implementation of optimization 

[90C10, 90C30, 90C35] 

(see: Optimization in operation of electric and energy
power systems)
computer network see: local-area — 
computerized tomography 

[94A08, 94A17] 

(see: Maximum entropy principle: image reconstruction) 
computing 

[65D25, 68W30] 

(see: Complexity of gradients, Jacobians, and Hessians) 
computing see: distributed —; high performance —;
interval —; massively parallel —; models for parallel —; 
parallel — 

computing: complexity classes see: Parallel — 

computing: models see: Parallel — 

computing processes in interactive methods 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming) 

computing system see: high performance — 

concave case see: convex- — 

concave fractional program 

[90C32] 

(see: Fractional programming) 

concave function 

[90C26, 90C39] 

(see: Second order optimality conditions for nonlinear
optimization)

concave function 

[65K05, 90Cxx] 

(see: Dini and Hadamard derivatives in optimization)

concave function see: a- —; U- — 

concave functions see: product of — 

concave increasing 
[65D18, 90B85, 90C26] 
(see: Global optimization in location problems) 

concave measure see: a- — 

concave minimization 
[90B10, 90C25] 
(see: Concave programming; Piecewise linear network flow 
problems) 

concave probability measure see: y- — 

Concave programming 
(90C25) 
(referred to in: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Global 
optimization in multiplicative programming; Minimum 
concave transportation problems; Multiplicative 
programming; Quadratic assignment problem; Reverse 
convex optimization; Stochastic global optimization: 
stopping rules; Stochastic global optimization: two-phase 
methods) 
(refers to: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Minimum 
concave transportation problems) 

concave programming 
[90C25, 90C90] 
(see: Chemical process planning; Concave programming) 

concave programming 
[49-01, 49K45, 49N10, 90-01, 90C20, 90C25, 90C27, 90C90, 
91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs; Chemical process planning; 
Concave programming) 

concave programming see: quadratic — 

concave programs see: Bilevel linear programming: 
complexity, equivalence to minmax — 

concave quadratic programming 
[90C60] 
(see: Complexity theory: quadratic programming) 

concave regression see: convex and — 

concave transportation problem see: minimum — 


concave transportation problems see: Minimum — 
concavity see: property of —; vector generalized — 
concavity cut 

[90C26] 

(see: Cutting plane methods for global optimization) 
concavity in multi-objective optimization see: Generalized — 
concentration theorem see: Jaynes entropy — 
concept in auction algorithms see: virtual source — 
conceptual design stage 

[90C90] 

(see: Design optimization in computational fluid dynamics) 
conceptual diagram 

(see: State of the art in modeling agricultural systems) 
conceptual modeling 

(see: State of the art in modeling agricultural systems) 
conciseness 
[90C10, 90C30]

(see: Modeling languages in optimization: a new paradigm) 
conclusion 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]

(see: Checklist paradigm semantics for fuzzy logics) 
concordance 

[90-XX]
(see: Outranking methods) 
concordance condition 

[90-XX]
(see: Outranking methods) 
concordance-discordance 
[90-XX]
(see: Outranking methods) 
concordance level 

[90-XX]
(see: Outranking methods) 
concordant coalition 

[90-XX]
(see: Outranking methods) 

concurrent subspace 

[65F10, 65F50, 65H10, 65K10]

(see: Multidisciplinary design optimization) 

concurrent subspace optimization 

[65F10, 65F50, 65H10, 65K10]

(see: Multidisciplinary design optimization) 

condensation see: posynomial — 

condensing operator 

[90C33]

(see: Order complementarity) 

CoNDEXPTIME 

[90C60]

(see: Complexity classes in optimization) 
condition see: acute angle —; calmness —; coercivity —;
compatibility —; complementarity —; concordance —;
conjugacy —; diagonal dominance —; first order
necessary —; Fritz John type —; general second order
sufficient —; general strong second order sufficient —;
high-order local minimum —; Karush-Kuhn-Tucker type —;
Kirchhoff- —; Kuhn-Tucker optimality —; linear growth —;
linear nondegeneracy —; maximum —; necessary
optimality —; nondegeneracy —; nondiscordance —;
nonobtuse angle —; nontriviality —; orthogonal —; partial
calmness —; quadratic nondegeneracy —; regularity —;
saddle-point sufficient —; sandwich —; second order
necessary —; second order optimality —; second order
sufficient —; separation —; Signorini —; Slater's —; strict
complementarity slackness —; strict feasibility —; strong
second order sufficient —; sufficient —; sufficient
optimality —; superlinear convergence —; uniform
angle —; unilateral growth —

condition iteration BCI see: Boundary — 

condition iteration method see: boundary — 

condition for LDSU see: nonarbitrage — 

condition measures 
[90C05, 90C22, 90C25, 90C30, 90C51] 
(see: Interior point methods for semidefinite programming) 

condition number 
[15-XX, 49M37, 65-XX, 90-XX, 90C31] 
(see: Cholesky factorization; Nonlinear least squares: 
Newton-type methods; Sensitivity and stability in NLP: 
approximation) 

condition number see: normwise relative — 

condition number of a matrix 
[65Fxx] 
(see: Least squares problems) 

condition for penalty methods see: regularity — 

condition without using (sub)gradients parametric 
representations see: necessary optimality — 

conditional see: logic — 

conditional expectation constraint 
[90C15] 
(see: Static stochastic programming models: conditional 
expectations) 

conditional expectations see: Static stochastic programming 
models: — 

conditional gradient method 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 

conditional Kolmogorov complexity 
[90C60] 
(see: Kolmogorov complexity) 

conditional lower derivative see: Dini —; Hadamard — 

conditional proximity data see: row — 

conditional upper derivative see: Dini —; Hadamard — 

conditionally differentiable function see: Dini —; Hadamard — 

conditionally directionally differentiable function see: Dini —; 
Hadamard — 

conditioned coefficient matrix see: ill- — 

conditioned matrix see: ill- —; well- — 

conditioned problem see: ill- —; well- — 

conditions see: boundary —; boundary value —;
collocation —; compatibility —; complementary
slackness —; continuity —; cut —; economic system —;
elastostatics with nonlinear boundary —;
Equality-constrained nonlinear programming: KKT
necessary optimality —; first order KKT —; first order
necessary —; first order necessary optimality —; first order
and second order optimality —; Fritz John —; Fritz John
generalized —; Fritz John necessary optimality —;
generalized Karush-Kuhn-Tucker —; generalized necessary
optimality —; Generalized semi-infinite programming:
optimality —; Goldstein —; Hebden —;
Karush-Kuhn-Tucker —; Karush-Kuhn-Tucker
optimality —; KKT —; KKT necessary optimality —; KKT
optimality —; KKT stationarity —; KT —; Kuhn-Tucker —;
Kuhn-Tucker necessary optimality —; Kuhn-Tucker
optimality —; Lagrangian —; market equilibrium —;
matching of derivative —; moment —; necessary —;
necessary optimality —; necessary and sufficient —;
necessary and sufficient optimality —; nonstoichiometric
form of KT —; optimal integral bounds subject to
moment —; optimality —; Penrose —; point —; Post —;
Quasidifferentiable optimization: optimality —;
quasidifferential elastic boundary —; quasidifferential
thermal boundary —; regularity —; Saddle point theory and
optimality —; second order necessary —; second order
necessary and sufficient optimality —; second order
sufficient —; Semi-infinite programming: second order
optimality —; stoichiometric form of KT —; strictly
complementary —; sufficient —; sufficient decrease —;
sufficient optimality —; uniform Hölder —; validity —;
variational formulation of quasidifferential thermal
boundary —

conditions for a constrained optimum 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 

conditions and duality see: Bilevel programming: optimality — 

conditions moment problem see: infinite many — 

conditions on multipliers see: orthogonality — 

conditions for nonlinear optimization see: Second order 
optimality — 

conditions for optimality see: high-order necessary — 

conditions for optimality for abnormal points see: High-order 
necessary — 

conditions for quadratic programming sub-problems see: 
Kuhn-Tucker — 

conditions and stability see: Semidefinite programming: 
optimality — 

conditions for an unconstrained optimum 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 

Condorcet paradox 
[90-XX] 
(see: Outranking methods) 

conduction see: Fourier law of heat —; heat — 

cone see: Bouligand —; Bouligand tangent —; contingent —; 
convex —; critical —; dual —; Fréchet normal —;
Galerkin —; high-order approximating —; inner 
linearization —; isotone projection —; limiting normal —; 
minimal —; normal —; order —; outer linearization —; 
Paretian —; pointed closed convex —; pointed convex —; 
polar —; second order —; secondary —; tangent —; 
tangent high-order approximating —; z-critical — 

cone-convex map 
[49K27, 90C29, 90C48] 
(see: Set-valued optimization) 

cone of critical directions 
[90C31, 90C34] 
(see: Semi-infinite programming: second order optimality 
conditions) 

cone of decrease see: high-order approximating — 
cone of feasible directions
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 
cone property see: uniform — 
cones see: convex —; feasible high-order approximating —; 
high-order feasible —; homogeneous —; ordering —;
tangent high-order approximating — 
cones of decrease see: high-order — 
confidence interval 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 
configuration 
[92B05] 
(see: Genetic algorithms for protein structure prediction) 
configuration see: Monte-Carlo —; point —; search —; 
unyielding —; vector — 
configuration space see: local properties of the — 
confirmans see: modus — 
confirmation 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
confirmation see: checklist — 
conflict-free coloring 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
conflict graph 
[05C85, 65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis; Directed tree
networks) 
conflicting populations see: Volterra model of — 
conformation 
[92B05] 
(see: Genetic algorithms for protein structure prediction) 
conformation see: molecular —; native — 
conformational search 
[65K10, 92C40] 
(see: Multiple minima problem in protein folding: αBB
global optimization approach)
conformations see: discarding far-from-native — 
conformity see: uniformity — 
confusion matrix 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 
congested network 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
conic convex program see: semidefinite program as — 
conic convex programs 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 


conic duality theorem 
[90C22, 90C25] 
(see: Copositive programming) 
conic extension 
[90C30] 
(see: Image space approach to optimization) 
conic optimization problems see: Approximations to robust — 
conic program 
[37A35, 90C05] 
(see: Potential reduction methods for linear programming) 
conical algorithm 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 
conjecture see: Chung-Gilbert —; Gilbert-Pollak —;
Graham-Hwang —; Jerrum —; Powell's —; Rosen's
method, global convergence, and Powell's —; Smith —
conjugacy condition 
[49M07, 49M10, 65K, 90C06] 
(see: New hybrid conjugate gradient algorithms for 
unconstrained optimization) 
conjugate see: Legendre — 
conjugate direction subclass 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
conjugate function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
conjugate functions 
[46A20, 49J40, 52A01, 62H30, 65C30, 65C40, 65C50, 65C60, 
65Cxx, 90C05, 90C30] 
(see: Farkas lemma: generalizations; Variational principles) 
conjugate functions see: Fenchel — 
conjugate gradient 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of Newton 
steps) 
conjugate gradient algorithms for unconstrained optimization 
see: New hybrid —; Performance profiles of — 
conjugate gradient method 
[49M37, 90C30] 
(see: Nonlinear least squares: trust region methods; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 
conjugate gradient method see: Shanno — 
Conjugate-gradient methods 
(90C30) 
(referred to in: Broyden family of methods and the BFGS 
update; Discontinuous optimization; Large scale trust 
region problems; Large scale unconstrained optimization; 
Local attractors for gradient-related descent iterations; 
Nonlinear least squares: Newton-type methods; Nonlinear 
least squares: trust region methods; Unconstrained 
optimization in neural network training) 
(refers to: Broyden family of methods and the BFGS update; 
Large scale trust region problems; Large scale 
unconstrained optimization; Local attractors for 
gradient-related descent iterations; Nonlinear least squares: 
Newton-type methods; Nonlinear least squares: trust region 
methods; Unconstrained nonlinear optimization: 
Newton-Cauchy framework; Unconstrained optimization
in neural network training) 
conjugate gradient methods 
[90C90] 
(see: Design optimization in computational fluid dynamics) 
conjugate gradient parameter computation 
[49M07, 49M10, 65K, 90C06] 
(see: New hybrid conjugate gradient algorithms for 
unconstrained optimization) 
conjugate gradient type algorithm see: Craig — 
conjugate gradients 
[90C30]
(see: Conjugate-gradient methods) 
conjugate pair 
[90C90]
(see: Design optimization in computational fluid dynamics) 
conjugate residual algorithm 
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least 
squares) 
conjugate subgradient method 
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 
conjugation see: level sets — 
conjunction arc 
[90B35] 
(see: Job-shop scheduling problem) 
conjunctive normal form 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T15, 68T27, 68T30, 
90C09, 90C10, 90C11] 
(see: Checklist paradigm semantics for fuzzy logics; 
Disjunctive programming; Finite complete systems of 
many-valued logic algebras; Inference of monotone boolean 
functions; Optimization in boolean classification problems; 
Optimization in classifying text documents) 
conjunctive normal form 
[90C09, 90C10] 
(see: Inference of monotone boolean functions; 
Optimization in boolean classification problems; 
Optimization in classifying text documents) 
conjunctive normal form see: Boolean formula in — 
conjunctive use of water resource systems 
[90C30, 90C35] 
(see: Optimization in water resources) 
connected 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 
connected class 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
connected components of a digraph see: strongly — 
connected cycle see: k-dimensional cube — 
connected digraph see: strongly — 
connected dominating set 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
connected edge list see: extended doubly — 
connected graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 


connected matroid see: infinitely — 
connected network see: strongly — 
connected set 
[90C29] 
(see: Multi-objective optimization: Pareto optimal
solutions, properties) 
connectedness 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 90C29, 91A05]
(see: Minimax theorems; Multi-objective optimization: 
Pareto optimal solutions, properties)
connectedness 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 90C29, 91A05]
(see: Generalized concavity in multi-objective optimization; 
Minimax theorems) 
connectedness of the efficient points sets 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
connecting path see: elementary — 
connection see: train-to-train — 
connection arc see: arrival-ground —; ground-departure — 
connection arcs see: train-train — 
connection of flow lines 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
connection of wells 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
connective see: associative —; bold —; Boolean —; logic 
algebra —; Łukasiewicz —; mid —
connectives see: complete set of —; emergence of logic —; 
logic —; semantics of MVL —; TOP and BOT types of 
logical — 
connectivity see: matroid —; network — 
connectivity graph 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
connectivity of a matroid 
[90C09, 90C10] 
(see: Matroids) 
conorm see: t- — 
conormal 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 
conorms see: t- — 
CoNP 
[90C60] 
(see: Complexity classes in optimization) 
conquer see: divide-and- — 
consecutive one constraint 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
conservation constraint see: flow — 
conservation of flow 
[91B06, 91B60] 
(see: Oligopolistic market equilibrium)
conservation of flow equation 
[90C30] 
(see: Equilibrium networks)
conservation of flow equations 
[90B10, 91B28, 91B50]
(see: Piecewise linear network flow problems; Spatial price
equilibrium) 
conservation law see: flow — 
conservation laws 
[90B36] 
(see: Stochastic scheduling) 
considerations see: uncertainty — 
considerations and controllability see: integration of 
dynamic — 
consist-busting 
(see: Railroad locomotive scheduling) 
consist flow formulation 
(see: Railroad locomotive scheduling) 
consistency see: 2B- —; 3B- —; arc —; bound —;
box —; box(2)- —; hull —; kB- —; local —; total —
consistency constraints 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 
consistency index 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
consistency property see: Jacobian — 
consistency ratio 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
consistent see: locally — 
consistent case see: perfectly — 
consistent judgment matrix 
[90C29] 
(see: Estimating data for multicriteria decision making
problems: optimization techniques) 
consistent labeling 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules; Least-index anticycling
rules) 
consistent labeling algorithm 
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules)
consistent least squares problem 
[65Fxx] 
(see: Least squares problems)
consistent matrix 
[90C29] 
(see: Estimating data for multicriteria decision making
problems: optimization techniques) 
consistent rounding 
[90C35] 
(see: Maximum flow problem) 
consistent variable 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 
conspiracy number 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching)
constant 
[68Q20] 
(see: Optimal triangulations)
constant see: Boltzmann —; Lipschitz — 
constant control see: piecewise —
constant degree parallelism alignment problem 
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem) 
constant maturities see: estimating the spot rate for bonds 
with — 
constant permutation QAP 
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem) 
constant perturbations see: piecewise- — 
constant rebalanced portfolio 
[68Q25, 91B28]
(see: Competitive ratio for portfolio management) 
Constantine see: Carathéodory —; Metropolis, Nicholas — 
constants 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems) 
constants see: Arrhenius — 
constrained edge coloring 
[05C85]
(see: Directed tree networks) 
constrained frequency assignment see: adjacent channel —; 
co-channel — 
constrained global optimization 
[90C26] 
(see: Global optimization using space filling) 
constrained global optimum 
[90C26] 
(see: Global optimization using space filling) 
constrained hierarchical clustering see: order — 
constrained labeling see: distance — 
constrained linear problem see: ball- — 
constrained linear programming see: probabilistic — 
constrained linear programming: duality theory see: 
Probabilistic — 
constrained logic programming 
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization) 
constrained minimax problem 
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems) 
constrained minimax problems 
[65K05, 65K10, 90C06, 90C30, 90C34]
(see: Feasible sequential quadratic programming) 
constrained minimization 
[90C26]
(see: Invexity and its applications) 
constrained minimization problem 
[65M60, 90C26]
(see: Invexity and its applications; Variational inequalities: 
F. E. approach) 
constrained minimum spanning tree problem see: resource- — 
constrained nonlinear optimization see: Inequality- — 
constrained nonlinear programming: KKT necessary optimality 
conditions see: Equality- — 
constrained nonlinear programming problem see: equality- — 
constrained optimization 
[65G20, 65G30, 65G40, 65H20, 65K05, 68Q05, 68Q10, 68Q25, 
90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based 
optimization; Interval analysis: unconstrained and 
constrained optimization) 
constrained optimization
[49M37, 65G20, 65G30, 65G40, 65H20, 65K05, 90C30] 
(see: Inequality-constrained nonlinear optimization; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility) 

constrained optimization see: equality- —; general —; Interval 
analysis: unconstrained and —; nonlinearly — 

constrained optimization problem see: global — 

constrained optimization problems see: linearly — 

constrained optimum see: conditions for a — 

constrained partitioning see: order — 

constrained path problem see: impossible pairs — 

constrained problems: convexity theory see: Probabilistic — 

constrained project scheduling see: Static resource — 

constrained quadratic problem see: bound — 

constrained stochastic programming see: probabilistic — 

constrained subgraph problem see: degree- — 

constrained: unified modeling frameworks see: Short-term 
scheduling, resource — 

constrained vehicle routing problem see: distance- — 

constraint see: abstract —; active —; alignment —; arity of 
a —; ball —; budget —; capacity —; conditional 
expectation —; consecutive one —; convex quadratic —; 
ellipsoid —; ellipsoidal —; flow conservation —; hidden —;
implicit equality —; implied —; integral —; integral
quadratic —; integrated probabilistic —; irredundant —;
joint probabilistic —; knapsack —; linear —; linear program
with an additional reverse convex —; locality —;
marginal —; necessary —; nonlinear —; nonredundant —; 
probabilistic —; programming under probabilistic —; 
pseudoquadratic —; redundant —; relatively redundant —; 
resource —; Slater —; state —; state inequality —; 
surrogate —; tongue-and-groove —; weakly necessary —; 
weight of a — 
constraint on arc flows see: capacity — 
constraint-by-constraint method see: lexicographic variant of 
the — 
constraint-factor 
[90C26]
(see: Reformulation-linearization technique for global 
optimization) 
constraint graph 
[90C10]
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
constraint logic programming 
[65G20, 65G30, 65G40, 68T20, 90C10, 90C30]
(see: Interval constraints; Modeling languages in 
optimization: a new paradigm) 
constraint logic programming see: modeling language and — 
constraint method see: ε- —; lexicographic variant of the
constraint-by- — 
constraint on a multiplicative function 
[90C26] 
(see: Global optimization in multiplicative programming) 
constraint narrowing operator 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
constraint programming 
[65G20, 65G30, 65G40, 68T20, 68T99, 90C27, 90C59] 
(see: Interval constraints; Metaheuristics) 


constraint programming 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
constraint programming see: chance — 
constraint programming hybrid methods see: Mixed integer 
programming/ — 
constraint propagation 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: intermediate terms) 
constraint qualification 
[49K27, 49K40, 49M37, 65K05, 65K10, 90C05, 90C25, 90C26, 
90C29, 90C30, 90C31, 90C39, 93A13] 
(see: Bilevel programming: optimality conditions and 
duality; Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; First order 
constraint qualifications; Kuhn-Tucker optimality 
conditions; Lagrangian duality: BASICS; Multilevel 
methods for optimal design; Second order optimality 
conditions for nonlinear optimization; Theorems of the 
alternative and optimization) 
constraint qualification 
[49K27, 49K40, 90C26, 90C30, 90C31, 90C39] 
(see: Duality for semidefinite programming; First order 
constraint qualifications; Kuhn-Tucker optimality 
conditions; Lagrangian duality: BASICS; Second order 
constraint qualifications; Second order optimality 
conditions for nonlinear optimization; Sensitivity and 
stability in NLP: continuity and differential stability) 
constraint qualification see: basic —; first order —; generalized 
Slater —; linear independence —; linear independency —;
Mangasarian-Fromovitz —; second order —; Slater —
constraint qualification (LICQ) see: linear independence — 
constraint qualifications 
[90C30, 90C31] 
(see: Image space approach to optimization; Sensitivity and 
stability in NLP: continuity and differential stability) 
constraint qualifications see: First order —; input —; Second 
order — 
constraint region see: relaxed — 
constraint satisfaction 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
constraint satisfaction see: maximum — 
constraint satisfaction problem 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
Constraint Satisfaction Problem see: binary —; max-r- —; 
maximum —; numerical — 
constraint satisfaction problems 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
constraint satisfaction problems see: continuous — 
constraint satisfaction: relaxations and upper bounds see: 
Maximum — 
constraint satisfaction techniques 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: intermediate terms) 
constraint set
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)

constraint set see: reduction of a —

constraint solving
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)

constraint violating point see: index of a —

constraints
[65G20, 65G30, 65G40, 68T20, 90C05, 90C10, 90C20, 90C26,
90C30]
(see: Global optimization using space filling; Interval 
constraints; Modeling languages in optimization: a new 
paradigm; Redundancy in nonlinear programs; Smooth 
nonlinear nonconvex optimization) 

constraints 
[65K05, 90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Direct global optimization algorithm; Modeling 
difficult optimization problems) 

constraints see: abstract —; active —; active inequality —; 
binary noninterference —; bound —; bounds —; box —; 
capacity —; consistency —; control —; coupling —; 
disjoint —; equality —; feasibility of equality —; feasibility 
of inequality —; flow —; flow bound —; fuzzy —; general 
linear —; generalized upper bounds —; inactive —; 
incorporation of biological —; individual probabilistic —; 
inequality —; infeasibility of inequality —; 
intercommodity —; Interval —; Lagrange multipliers for 
nonanticipativity —; Lagrange multipliers for phase —; 
linearization of —; lower and upper bounds —; mass 
balance —; mathematical program with affine
equilibrium —; mathematical program with equilibrium —;
maximum function with dependent —; maximum oil, gas
and water capacity —; multistage linking —; nested —;
network —; nonanticipativity —; noninterference —;
notation for —; odd-set —; optimization under network —;
phase —; positive semidefiniteness —;
precedence/coupling —; projection —; Quadratic 
programming with bound —; regular —; 
semi-assignment —; set-valued —; side —; simplicial —; 
single fixed cost with capacity —; single fixed cost with no 
capacity —; slack —; state —; strongly active —; 
structural —; sub-tour elimination —; submodular —; 
subtour elimination —; tight —; time window —; tree —; 
upper and lower well oil rate —; vehicle scheduling 
problems with time —; violation of —; weighted-sums 
programs with — 

constraints: A piecewise SQP approach see: Optimization with 
equilibrium — 

constraints in standard form 
[90C60] 
(see: Complexity of degeneracy) 

constraints on variables 
[49J52, 90C30]
(see: Nondifferentiable optimization: relaxation methods) 

construct see: cognitive — 

construction see: greedy —; network design and schedule —; 
random — 


construction of descent directions 
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 
construction of a dual problem 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 
construction heuristic 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
construction heuristics 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
construction methods 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
construction phase 
(see: Maximum cut problem, MAX-CUT) 
construction phase in GRASP 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 
construction procedure see: arc oriented —; best arc —; best 
node —; mixed —; mixed VAM —; node oriented — 
construction procedures 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
construction procedures 
[90B80] 
(see: Facilities layout problems) 
constructions see: linearly elastic mechanical — 
constructive lower bounds 
[90B35] 
(see: Job-shop scheduling problem) 
constructive methods for solving vehicle routing problems 
[90B06] 
(see: Vehicle routing) 
constructive nonlinear dynamics see: Robust design of 
dynamic systems by — 
consumption see: expected power — 
consumption of utilities 
(see: Planning in the process industry) 
contact see: Signorini-Coulomb unilateral frictional — 
contact map 
(see: Contact map overlap maximization problem, CMO) 
contact map overlap 
(see: Contact map overlap maximization problem, CMO) 
Contact map overlap maximization problem, CMO 
contact point 
[90Cxx] 
(see: Discontinuous optimization) 
contact problem with friction see: coupled unilateral — 
contacts in alpha-helical proteins see: Predictive method for 
interhelical — 
containment graph model 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
contaminated information 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based
optimization) 

context descriptors
[90C09, 90C10]
(see: Optimization in classifying text documents)

Conti-Traverso algorithm
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)

contingency table
[03B50, 03B52, 03C80, 62F30, 62Gxx, 62H30, 68T27, 90C27]
(see: Assignment methods in clustering; Checklist paradigm
semantics for fuzzy logics)

contingent
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials)

contingent cone
[90C31, 90C34, 90C46]
(see: Generalized semi-infinite programming: optimality
conditions)

contingent epiderivative
[49K27, 90C29, 90C48]
(see: Set-valued optimization)

continuation
[65F10, 65F50, 65H10, 65K10]
(see: Globally convergent homotopy methods)

continuation see: homotopy — 

continuation method
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31,
90C33, 90C34]
(see: Parametric optimization: embeddings, path following
and singularities)

continuation method see: homotopy — 

continuity see: Hölder —; joint —; Lipschitz —; rates of
quantitative — 

continuity conditions 
[34A55, 78A60, 90C30] 
(see: Optimal design in nonlinear optics) 

continuity and differential stability see: Sensitivity and stability 
in NLP: — 

continuity property of the objective function value 
[90C31] 
(see: Bounds and solution vector estimates for parametric 
NLPs)

continuity, stability, rates of convergence see: Stochastic 
integer programming: — 

continuous 
[58E05, 90C30] 
(see: Planning in the process industry; Topology of global 
optimization) 

continuous see: approximate —; exact —; Lipschitz — 


Continuous approximations to subdifferentials
(65K05, 90C56)
continuous based heuristics
[05C69, 05C85, 68W01, 90C59]
(see: Heuristics for maximum clique and independent set)
continuous constraint satisfaction problems
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints)
continuous dependence
[65K10, 90C90]
(see: Variational inequalities: projected dynamical system)
continuous and discrete free variables see: Generalized
geometric programming: mixed —
continuous and discrete time models
[90C26]
(see: MINLP: design and scheduling of batch processes)
continuous form see: coercive bilinear symmetric —
continuous function see: Lipschitz —; locally Lipschitz —;
radially —; U- —
continuous functional see: absolutely —
continuous functions see: Lipschitzian operators in best
approximation by bounded or —
continuous global optimization
[90C05]
(see: Continuous global optimization: models, algorithms
and software; Global optimization in the analysis and
management of environmental systems)
continuous global optimization see: mixed discrete- — 
Continuous global optimization: applications
(90C05)
(referred to in: αBB algorithm; Continuous global
optimization: models, algorithms and software; Differential
equations and global optimization; Direct global
optimization algorithm; Forecasting; Global optimization
in the analysis and management of environmental systems;
Global optimization based on statistical models; Global
optimization in binary star astronomy; Global
optimization methods for systems of nonlinear equations;
Global optimization using space filling; Interval global
optimization; Mixed integer nonlinear programming;
Topology of global optimization)
(refers to: αBB algorithm; Continuous global optimization:
models, algorithms and software; Differential equations
and global optimization; Direct global optimization
algorithm; Forecasting; Global optimization in the analysis
and management of environmental systems; Global
optimization based on statistical models; Global
optimization in binary star astronomy; Global
optimization methods for systems of nonlinear equations;
Global optimization using space filling; Interval global
optimization; Mixed integer nonlinear programming;
Topology of global optimization)
continuous global optimization model
[90C05]
(see: Continuous global optimization: models, algorithms
and software)

Continuous global optimization: models, algorithms and
software
(90C05)
(referred to in: αBB algorithm; Continuous global
optimization: applications; Differential equations and
global optimization; Direct global optimization algorithm;
Global optimization in the analysis and management of 
environmental systems; Global optimization based on 
statistical models; Global optimization in batch design 
under uncertainty; Global optimization in binary star 
astronomy; Global optimization in generalized geometric 
programming; Global optimization: interval analysis and 
balanced interval arithmetic; Global optimization methods 
for systems of nonlinear equations; Global optimization in 
phase and chemical reaction equilibrium; Global 
optimization using space filling; Interval global 
optimization; Large scale unconstrained optimization; 
Maximum cut problem, MAX-CUT; MINLP: branch and 
bound global optimization algorithm; MINLP: global 
optimization with αBB; Modeling languages in
optimization: a new paradigm; Optimization-based 
visualization; Optimization software; Smooth nonlinear 
nonconvex optimization; Topology of global optimization) 
(refers to: αBB algorithm; Continuous global optimization:
applications; Convex envelopes in optimization problems; 
Differential equations and global optimization; Direct 
global optimization algorithm; Global optimization in the 
analysis and management of environmental systems; Global 
optimization based on statistical models; Global 
optimization in batch design under uncertainty; Global 
optimization in binary star astronomy; Global 
optimization in generalized geometric programming; 
Global optimization of heat exchanger networks; Global 
optimization methods for systems of nonlinear equations; 
Global optimization in phase and chemical reaction 
equilibrium; Global optimization using space filling; 
Interval global optimization; Large scale unconstrained 
optimization; MINLP: branch and bound global 
optimization algorithm; MINLP: heat exchanger network 
synthesis; MINLP: mass and heat exchanger networks; 
Mixed integer linear programming: heat exchanger 
network synthesis; Mixed integer linear programming: 
mass and heat exchanger networks; Modeling languages in 
optimization: a new paradigm; Optimization software; 
Smooth nonlinear nonconvex optimization; Topology of 
global optimization) 


continuous location 
[90B80, 90B85, 90Cxx, 91Axx, 91Bxx]
(see: Facility location with externalities)


continuous model in OR 
[90B80, 90B85] 
(see: Warehouse location problem)


continuous multiple criteria problem
[90C29]
(see: Multiple objective programming support)
continuous operator see: completely — 
continuous optimization
[9008, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)


continuous optimization problems see: Continuous 
reformulations of discrete- — 


continuous piecewise linear function see: decomposition of
a —


continuous processes see: Short-term scheduling of — 


continuous programming 
[90C26] 
(see: Invexity and its applications) 
Continuous reformulations of discrete-continuous 
optimization problems 
(90C11, 90C10, 90C33, 90C27) 
(refers to: Disjunctive programming; Mixed integer 
programming/constraint programming hybrid methods; 
Order complementarity) 
continuous relaxation 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
Continuous review inventory models: (Q, R) policy 
(49-02, 90-02) 
continuous review model
[90B50]
(see: Inventory management in supply chains) 
continuous selection 
[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90]
(see: Quasidifferentiable optimization: stability of dynamic 
systems) 
continuous selection of functions 
[58E05, 90C30]
(see: Topology of global optimization) 
continuous selection operator 
[41A30, 47A99, 65K10]
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
continuous-time analog of the dynamic programming equation 
[34H05, 49L20, 90C39]
(see: Dynamic programming: continuous-time optimal 
control) 
continuous-time equivalent of the dynamic programming 
algorithm 
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 
continuous-time formulation 
[90C26] 
(see: MINLP: design and scheduling of batch processes) 
continuous Time Model 
(see: Integrated planning and scheduling) 
continuous-time optimal control 
[34H05, 49L20, 90C39]
(see: Dynamic programming: continuous-time optimal 
control; Hamilton-Jacobi-Bellman equation) 
continuous-time optimal control see: Dynamic 
programming: — 
continuous-time Riccati equation 
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 
continuously codifferentiable 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
continuously codifferentiable see: twice —
continuously codifferentiable function 
[49J52, 65K99, 70-08, 90C25] 
(see: Quasidifferentiable optimization: codifferentiable 
functions) 
continuously codifferentiable function see: twice — 
continuously differentiable exact penalty function approach 
[90C30] 
(see: Large scale trust region problems) 
continuously differentiable function see: piecewise — 
continuum 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
continuum see: true — 
contract see: energy purchase — 
contract algorithm see: branch and — 
contract-or-patch (COP) 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
contracting matroid elements 
[90C09, 90C10]
(see: Matroids) 
contracting measure 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
contraction 
[65H10, 65J15, 90C30] 
(see: Contraction-mapping; Sequential simplex method) 
contraction see: k-set- —; matroid —; path —; strict-set- —;
weighted sup-norm —
contraction/approximation measure 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
contraction coefficient 
[90C30] 
(see: Sequential simplex method) 
Contraction-mapping 
(65H10, 65J15) 
(referred to in: Global optimization methods for systems of 
nonlinear equations; Gröbner bases for polynomial
equations; Interval analysis: systems of nonlinear 
equations; Nonlinear least squares: Newton-type methods; 
Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
(refers to: Global optimization methods for systems of 
nonlinear equations; Interval analysis: systems of nonlinear 
equations; Nonlinear least squares: Newton-type methods; 
Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
contraction mapping 
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path 
problems) 
contraction mapping 
[65H10, 65J15]
(see: Contraction-mapping) 


contraction mappings 
[49L20, 90C30, 90C39, 90C40, 90C52, 90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms; 
Dynamic programming: infinite horizon problems, 
overview) 

contraction matrices see: completion to completely positive 
and — 

contraction matrix
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems)

contraction matrix see: partial — 

contraction of a matroid
[90C09, 90C10]
(see: Matroids)

contraction in matroids
[90C09, 90C10]
(see: Oriented matroids)

contraction method see: edge — 

contraction operation
[90C35]
(see: Feedback set problems)

contraction operation
[90C30]
(see: Sequential simplex method)

contractive operator
[49L20, 90C39]
(see: Dynamic programming: discounted problems)

contradual transformation
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)

contrapositivization
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)

control
[49Jxx, 49K20, 49M99, 90C55, 91Axx]
(see: Emergency evacuation, optimization modeling;
Infinite horizon control and dynamic games; Sequential
quadratic programming: interior point methods for
distributed optimal control problems)


control 
[92C05] 
(see: Adaptive simulated annealing and its application to 
protein folding) 

control see: closed-loop —; continuous-time optimal —; 
Discrete-Time Optimal —; Dynamic programming: 
continuous-time optimal —; Dynamic programming: 
inventory —; Dynamic programming and Newton's method 
in unconstrained optimal —; epidemic —; feedback —; 
ground delay problem in air traffic —; interaction of design 
and —; interaction of design, synthesis and —; inventory —;
MINLP: applications in the interaction of design and —; 
model predictive —; mu synthesis —; Multi-objective 
optimization: interaction of design and —; open-loop —; 
optimal —; parametric optimal —; piecewise constant —; 
piecewise linear —; pollution —; process —; relaxed —; 
Resource allocation for epidemic —; Robust —; rounding 
errors are under —; Suboptimal —; systems theory and —; 
temperature —; time optimal —; unconstrained optimal — 

control applications see: Dynamic programming: optimal — 
control component
[90C90, 91B28] 
(see: Robust optimization) 
control constraints 
[90C90, 91B28] 
(see: Robust optimization) 
control for drug delivery systems see: Model based — 
control and dynamic games see: Infinite horizon — 
control engineering 
[93D09] 
(see: Robust control) 
control with first order differential equations see: Duality in 
optimal — 
control of a flexible arm see: Optimal — 
control function 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
control-function see: admissible pair of trajectory-function 
and —; linear appearance of —
control functions see: asymptotically admissible pair of 
trajectory and — 
control and ground delay programs see: air traffic — 
control model see: logistics — 
control pair see: admissible trajectory- — 
control parameterization 
[49M37, 90C11] 
(see: MINLP: applications in the interaction of design and 
control) 
control parametrization 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
control policy see: optimal — 
control problem see: finite-dimensional —; inventory —; 
mixed integer optimal —; optimal —; relaxed —; 
singular —; time optimal — 
Control problems 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 
control problems see: discretized optimal —; distributed 
optimal —; Semi-infinite programming and —; Sequential 
quadratic programming: interior point methods for 
distributed optimal — 
control restrictions 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 
control: schur stability of polytopes of polynomials see: 
Robust — 
control state of a Turing machine 
[90C60] 
(see: Complexity classes in optimization) 
control synthesis see: robust — 
control theory 
[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX] 
(see: Resource allocation for epidemic control) 
control theory 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
control theory see: robust — 


control variables
[90C90, 91B28]
(see: Robust optimization)

control variates
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)
control vector
[49M29, 65K10, 90C06]
(see: Dynamic programming and Newton’s method in
unconstrained optimal control)


control vector iteration 
[93-XX] 
(see: Boundary condition iteration BCI) 

Control vector iteration CVI 
(93-XX) 
(referred to in: Boundary condition iteration BCI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Hamilton-Jacobi-Bellman 
equation; Infinite horizon control and dynamic games; 
MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control; Optimal control of a flexible arm; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control) 
(refers to: Boundary condition iteration BCI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Hamilton-Jacobi-Bellman 
equation; Infinite horizon control and dynamic games; 
MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control; Optimal control of a flexible arm; 
Optimization strategies for dynamic systems; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control) 


controllability 
[49M37, 90C11, 93-XX] 
(see: MINLP: applications in the interaction of design and 
control; Optimal control of a flexible arm) 

controllability see: integration of dynamic considerations 
and —; minimum norm — 

controllability measure 
[49M37, 90C11] 
(see: MINLP: applications in the interaction of design and 
control) 

controlled recharge facilities 
[90C30, 90C35] 
(see: Optimization in water resources) 
controlled selection
[90C35] 
(see: Multi-index transportation problems) 
controller see: feasible gradient —; nonfeasible gradient — 
controllers via parametric programming see: Design of robust 
model-based — 
controls see: suboptimal trajectories and —; unbounded — 
controls and non standard methods see: unbounded — 
convention see: extensionality — 
conventional 
[90C35] 
(see: Multicommodity flow problems) 
convergence 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 
convergence see: P —; asymptotic —; discrete —; discrete
Mosco —; discrete Painlevé-Kuratowski —; f-attentive —;
finite ε- —; global —; linear —; polynomial time —;
premature —; Q-quadratic —; Q-superlinear —; 
quadratic —; rate of —; Stochastic integer programming: 
continuity, stability, rates of —; superlinear —; weak —; 
weak discrete — 
convergence condition see: superlinear — 
convergence of GRASP see: global — 
convergence of the overall flowsheet
[90C30, 90C90]
(see: Successive quadratic programming: applications in the
process industry)
convergence, and Powell's conjecture see: Rosen’s method,
global —
convergence of PPA
[90C30]
(see: Relaxation in projection methods)
convergence of probability measures see: weak — 
convergence problem for the Rosen method see: global — 
convergence rate
[90C06, 93-XX]
(see: Boundary condition iteration BCI; Large scale
unconstrained optimization)
convergence rate
[90C30]
(see: Frank-Wolfe algorithm)
convergence rate see: geometric —; local —; r-linear —
convergence rates see: asymptotic — 
convergence tests see: feasibility —; value — 
convergence theorem
[60G35, 65K05]
(see: Differential equations and global optimization)
convergence theorem see: asynchronous —; local
quadratic —; monotone —
convergence and turnpike theory see: Statistical — 
convergent
[90C25, 90C26]
(see: Decomposition in global optimization)
convergent see: globally — 
convergent algorithm
[90C26]
(see: Cutting plane methods for global optimization)
convergent algorithm see: globally — 
convergent homotopies see: probability-one globally — 
convergent homotopy methods see: Globally — 


convergent probability-one homotopy algorithm see: 
globally — 

convergent rate see: superlinear — 

converges
[90C15]
(see: Approximation of extremum problems with
probability functionals)

converse relation
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

converting see: paper — 

convex
[41A30, 47A99, 49K05, 49K10, 49K15, 49K20, 49M37, 65K10,
90C10, 90C11, 90C25, 90C26, 90C27, 90C30, 90C31, 90C33,
90C35]
(see: αBB algorithm; Continuous reformulations of
discrete-continuous optimization problems; Duality in 
optimal control with first order differential equations; 
Generalized monotone single valued maps; Global 
optimization: functional forms; L-convex functions and 
M-convex functions; Lipschitzian operators in best 
approximation by bounded or continuous functions; 
Robust global optimization; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

convex 
[90C05, 90C11, 90C15, 90C25, 90C30] 
(see: Krein-Milman theorem; Stochastic programming with 
simple integer recourse; Successive quadratic 
programming: full space methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

convex see: n- —; invariant —; m- —; strongly — 

convex-along-rays functions on topological vector spaces see: 
Increasing and — 

convex analysis 
[26B25, 26E25, 46A22, 49J35, 49J40, 49J52, 49Q10, 49S05,
54D05, 54H25, 55M20, 65K99, 70-08, 70-XX, 74G99, 74H99, 
74K99, 74Pxx, 80-XX, 90C25, 90C33, 90C99, 91A05] 
(see: Hemivariational inequalities: applications in 
mechanics; Minimax theorems; Nonconvex energy 
functions: hemivariational inequalities; Quasidifferentiable 
optimization; Quasidifferentiable optimization: algorithms 
for hypodifferentiable functions; Quasidifferentiable 
optimization: codifferentiable functions) 

convex analysis
[32B15, 51E15, 51N20]
(see: Affine sets and functions)

convex analysis see: abstract —; discrete — 

convex bipartite graph
[90C35]
(see: Feedback set problems)

convex combination of the extreme points
[90C30]
(see: Simplicial decomposition)

convex combinations
[49M07, 49M10, 65K05, 68Q99, 90C06]
(see: Branch and price: Integer programming with column
generation; Performance profiles of conjugate-gradient
algorithms for unconstrained optimization)
convex combinations
[90C30]
(see: Simplicial decomposition)


convex combinatorial optimization
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)


convex composite programming
[46A20, 52A01, 90C30]
(see: Composite nonsmooth optimization)


convex-concave case
[90C15]
(see: Stochastic programs with recourse: upper bounds)


convex and concave regression
[41A30, 62J02, 90C26]
(see: Regression by special functions: algorithms and
complexity)


convex cone
[90C30]
(see: Duality for semidefinite programming)


convex cone see: pointed —; pointed closed — 
convex cones
[90C22, 90C25]
(see: Copositive programming)


convex constraint see: linear program with an additional
reverse —


convex decreasing
[65D18, 90B85, 90C26]
(see: Global optimization in location problems)


Convex discrete optimization
(05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C)
(referred to in: Adaptive convexification in semi-infinite
optimization)


Convex envelopes in optimization problems
(90C26)
(referred to in: αBB algorithm; Continuous global
optimization: models, algorithms and software; Global
optimization in generalized geometric programming;
Global optimization methods for systems of nonlinear
equations; Lipschitzian operators in best approximation by
bounded or continuous functions; MINLP: global
optimization with αBB)
(refers to: αBB algorithm; Global optimization in
generalized geometric programming; MINLP: global
optimization with αBB)


convex feasibility problem
[47H05, 65J15, 90C25, 90C30, 90C55]
(see: Fejér monotonicity in convex optimization; Relaxation
in projection methods)


convex function
[49J52, 90C26, 90C30, 90C31, 90C39]
(see: Nondifferentiable optimization: subgradient
optimization methods; Second order optimality conditions
for nonlinear optimization; Sensitivity and stability in NLP:
approximation)


convex function
[49J52, 65K05, 90C30, 90Cxx]
(see: Dini and Hadamard derivatives in optimization;
Frank-Wolfe algorithm; Nondifferentiable optimization:
relaxation methods; Nondifferentiable optimization:
subgradient optimization methods)

convex function see: abstract —; difference —; geodesic —; 
H- —; K- —; L- —; M- —; program of minimizing 
a generalized —; strictly —; uniformly — 

convex functions see: difference of —; Fenchel-type duality for 
M- and L- —; h- —; L-convex functions and M- —; product 
of — 

convex functions and M-convex functions see: L- — 


convex global underestimation 
[65K05, 90C26] 
(see: Molecular structure determination: convex global 
underestimation) 

convex global underestimation see: Molecular structure 
determination: — 

convex global underestimator 
[65K05, 90C26] 
(see: Molecular structure determination: convex global 
underestimation) 

convex hull 
[05A, 05C60, 05C69, 15A, 37B25, 41A30, 47A99, 51M, 52A, 
52B, 52C, 62H, 65K10, 68Q, 68R, 68U, 68W, 90B, 90B80, 
90B85, 90C, 90C05, 90C06, 90C08, 90C09, 90C10, 90C11,
90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Convex discrete optimization; Disjunctive 
programming; Integer programming: cutting plane 
algorithms; Lipschitzian operators in best approximation 
by bounded or continuous functions; Replicator dynamics 
in combinatorial optimization; Single facility location: 
circle covering problem; Voronoi diagrams in facility 
location) 

convex hull 
[90B80, 90C05, 90C09, 90C10, 90C11, 90C15, 90C27, 90C30] 
(see: Carathéodory theorem; Disjunctive programming; 
Frank-Wolfe algorithm; Krein-Milman theorem; 
Simplicial decomposition; Stochastic programming with 
simple integer recourse; Voronoi diagrams in facility 
location) 

convex hull see: lower — 

Convex hull disjunctions 
(see: Logic-based outer approximation) 

convex hull problem 
[52B12, 68Q25] 
(see: Fourier-Motzkin elimination method)

convex inequalities 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations) 

convex inequality see: reverse — 

convex inequality systems 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations) 

convex integer programming 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 

convex integer transportation problem 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
convex intersection problem
[90C30]
(see: Relaxation in projection methods)
convex-like
[90C26]
(see: Invexity and its applications)
convex-like function
[90C05, 90C30]
(see: Theorems of the alternative and optimization)
convex-like function pair
[46A20, 52A01, 90C30]
(see: Farkas lemma: generalizations)
convex-like systems
[46A20, 52A01, 90C30]
(see: Farkas lemma: generalizations)
convex map see: cone- — 
convex max-function
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)
Convex max-functions
(49K35, 49M27, 65K10, 90C25)
(referred to in: Affine sets and functions; Lagrangian
multipliers methods for convex programming)
(refers to: Lagrangian multipliers methods for convex
programming; Successive quadratic programming)
convex MINLP
[49M37, 90C11]
(see: Mixed integer nonlinear programming)
convex minorant see: greatest — 
convex model
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric
programming)
convex moment problem
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems)
convex moment problem see: solution of the — 
convex multiplicative function see: program of minimizing a — 
convex multiplicative functions see: sum of — 
convex multiplicative program
[90C26]
(see: Global optimization in multiplicative programming)
convex NDO
[46N10, 90-00, 90C47]
(see: Nondifferentiable optimization)
convex and nonconvex programming problems
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)
convex objective function see: separable — 
convex optimization
[15A15, 90C25, 90C55, 90C90]
(see: Semidefinite programming and determinant
maximization)


convex optimization 
[15A15, 49-XX, 49K35, 49M27, 65K10, 90-XX, 90C25, 90C30, 
90C55, 90C90, 93-XX] 
(see: Convex max-functions; Duality theory: monoduality in 
convex optimization; Lagrangian multipliers methods for 
convex programming; Semidefinite programming and 
determinant maximization) 

convex optimization see: Duality theory: monoduality in —; 
Fejér monotonicity in —; multi-objective —; 
nondifferentiable —; Reverse — 

convex optimization problem 
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs) 

convex parametric programming 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 

convex piecewise linearization in facility location problems with
staircase costs
[90B80, 90C11]
(see: Facility location with staircase costs)

convex polyhedral set
[90C05, 90C15]
(see: Probabilistic constrained linear programming: duality
theory)

convex polytope
[52B11, 52B45, 52B55]
(see: Volume computation for polytopes: strategies and
performances)

convex polytope
[52B11, 52B45, 52B55, 90B85]
(see: Multifacility and restricted location problems; Volume
computation for polytopes: strategies and performances)

convex problem
[90C25, 90C30, 90C31]
(see: Lagrangian multipliers methods for convex 
programming; Sensitivity and stability in NLP: continuity 
and differential stability) 

convex problem see: jointly — 

convex problems 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods) 

convex problems see: partly — 

convex program see: nondifferentiable —; partly —; 
semidefinite program as conic — 

convex programming 
[90C05, 90C06, 90C22, 90C25, 90C30, 90C51] 
(see: Interior point methods for semidefinite programming; 
Saddle point theory and optimality conditions) 

convex programming 
[47H05, 65J15, 90C05, 90C25, 90C30, 90C55] 
(see: Duality for semidefinite programming; Fejér 
monotonicity in convex optimization; Young 
programming) 

convex programming see: differentiable —; fundamental 
property in —; Lagrangian multipliers methods for —; 
reverse — 

convex programming problem 
[60G35, 65K05, 90C26, 90C39] 
(see: Differential equations and global optimization; Second 
order optimality conditions for nonlinear optimization) 
convex programs see: conic —; partly —; reverse —
convex quadratic constraint
[90C60]
(see: Complexity theory: quadratic programming)
convex quadratic function
[90C60]
(see: Complexity theory: quadratic programming)
convex quadratic knapsack problem
[90C60]
(see: Complexity theory: quadratic programming)
convex quadratic optimization
[05B35, 65K05, 90C05, 90C20, 90C33]
(see: Criss-cross pivoting rules)
convex quadratic program
[90B85, 90C27]
(see: Single facility location: circle covering problem)
convex quadratic programming
[90C30]
(see: Lagrangian duality: BASICS)

convex regression problem
[41A30, 62J02, 90C26]
(see: Regression by special functions: algorithms and
complexity)


convex relaxation problem
[65H10, 90C26, 90C30]
(see: Global optimization methods for systems of nonlinear
equations)


convex relaxations
[90C26, 90C90]
(see: Global optimization of heat exchanger networks)

convex semidefinite programming problem
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and
stability)

convex set
[49J52, 90C30]
(see: Nondifferentiable optimization: relaxation methods;
Nondifferentiable optimization: subgradient optimization
methods)

convex set see: geodesic —; L- —; M- —

convex sets see: differences of — 


5 reverse 


Convex-simplex algorithm 
(90C30) 
(referred to in: Equivalence between nonlinear 
complementarity problem and fixed point problem; 
Generalized nonlinear complementarity problem; Integer 
linear complementary problem; LCP: Pardalos-Rosen 
mixed integer formulation; Lemke method; Linear 
complementarity problem; Linear programming; Order 
complementarity; Parametric linear programming: cost 
simplex algorithm; Principal pivoting methods for linear 
complementarity problems; Sequential simplex method; 
Topological methods in complementarity theory) 
(refers to: Lemke method; Linear complementarity problem; 
Linear programming; Parametric linear programming: cost 
simplex algorithm; Sequential simplex method) 
convex-simplex algorithm 
[90C30] 
(see: Convex-simplex algorithm) 


convex SIP
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)
convex SQP
[90C25, 90C30]
(see: Successive quadratic programming: full space
methods)
convex subdifferential
[46A20, 52A01, 90C30]
(see: Composite nonsmooth optimization)
convex transformation
[90C11, 90C90]
(see: MINLP: trim-loss problem)
convex underestimator
[90C11, 90C26]
(see: Convex envelopes in optimization problems; MINLP:
branch and bound methods)
convex underestimator
[90C26]
(see: Convex envelopes in optimization problems)
convex underestimators see: Global optimization: tight —
convex variational inequality for an elastostatic problem
involving QD-superpotentials
[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]
(see: Quasidifferentiable optimization: variational
formulations)
convexifiable
[25A15, 34A05, 90C25, 90C26, 90C30, 90C31]
(see: Convexifiable functions, characterization of; Invexity
and its applications)
convexifiable
[90C26]
(see: Invexity and its applications)
Convexifiable Function see: integral Mean-Value for
Composite —
Convexifiable functions, characterization of
(90C25, 90C26, 90C30, 90C31, 25A15, 34A05)
convexifiable program see: sequentially — 
convexification
[25A15, 34A05, 90C25, 90C26, 90C30, 90C31]
(see: Convexifiable functions, characterization of)
convexification
[90C27]
(see: Time-dependent traveling salesman problem)
convexification parameter
[65K05, 90C26, 90C33, 90C34]
(see: Adaptive convexification in semi-infinite
optimization)
convexification/relaxation strategy
[65K05, 90C11, 90C26]
(see: MINLP: global optimization with αBB)
convexification in semi-infinite optimization see: Adaptive — 
Convexification Technique see: reformulation-Linearization/ — 
convexification techniques see: reformulation-linearization/ — 
convexifier
[25A15, 34A05, 90C25, 90C26, 90C30, 90C31] 
(see: Convexifiable functions, characterization of) 
convexity 
[90C26, 90C30, 90C33] 
(see: Generalized monotone single valued maps; Implicit 
lagrangian) 
convexity 
[28-XX, 49-XX, 49M37, 60-XX, 65K10, 90C26, 90C30] 
(see: αBB algorithm; Frank-Wolfe algorithm; General
moment optimization problems) 
convexity see: abstract —; discrete midpoint —; 
generalized —; geodesic —; K- —; L- —; M- —
convexity cut 
[90C26] 
(see: Cutting plane methods for global optimization) 
convexity property of the objective function value 
[90C31] 
(see: Bounds and solution vector estimates for parametric 
NLPS) 
convexity property of the solution space 
[90C31] 
(see: Bounds and solution vector estimates for parametric 
NLPS) 
convexity theory see: Probabilistic constrained problems: — 
convexized filled function see: globally — 
Cook-Levin theorem 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
Cook’s theorem
[03B50, 68T15, 68T30, 90C60] 
(see: Computational complexity theory; Finite complete 
systems of many-valued logic algebras) 
cooling schedule 
[65K05, 90C30] 
(see: Random search methods) 
cooperation see: region of — 
cooperation minimization algorithms see: supervisor and 
searcher — 
cooperative case of a two-person game
[90C30, 90C90]
(see: Bilevel programming: global optimization)
cooperative equilibrium
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
cooperative game
[90C27, 90C60, 91A12]
(see: Combinatorial optimization games)
cooperative solution
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
coordinate
[01A99]
(see: Leibniz, Gottfried Wilhelm)


coordinate-aligned ellipsoid 
[37A35, 90C05] 
(see: Potential reduction methods for linear programming) 
coordinate descent method 
[90C30] 
(see: Cost approximation algorithms) 
coordinate direction 
[90C26, 90C90] 
(see: Global optimization: hit and run methods) 
coordinate method see: Cyclic — 
coordinate search see: cyclic — 
coordinate system see: curvilinear —; moving — 
coordinate transformation 
[90C30] 
(see: Suboptimal control) 
coordinates
[01A99]
(see: Leibniz, Gottfried Wilhelm)
coordinates see: axes of —; Cartesian —; internal —; kth order 
form of — 
coordinatewise increasing function 
[90C29] 
(see: Multi-objective optimization: Interactive methods for
preference value functions)
coordinatewise increasing utility function 
[90C29] 
(see: Multi-objective optimization: Pareto optimal
solutions, properties) 
coordination see: decomposition/ — 
coordination method see: goal —; model — 
coordination step 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms)
(COP) see: contract-or-patch — 
copolyblock algorithm 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 
(copolyblock) algorithm see: revised reverse polyblock — 
copositive 
[05C15, 05C17, 05C35, 05C69, 65K05, 90C20, 90C22, 90C25, 
90C35] 
(see: Copositive programming; Lovász number; Quadratic
programming with bound constraints) 
copositive matrix 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity
problems; Standard quadratic optimization problems: 
theory) 
copositive matrix see: strictly — 
Copositive optimization 
(90C20, 90C22, 90C26)
Copositive programming 
(90C25, 90C22)
(referred to in: Lovász number)
copositive programming 
[90C22, 90C25] 
(see: Copositive programming)
copositivity 
[90C20] 
(see: Standard quadratic optimization problems:
algorithms; Standard quadratic optimization problems:
theory) 
copulas 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
core see: multi- — 
Corley-Moon algorithm 
[90C31, 90C39] 
(see: Multiple objective dynamic programming) 
corner rule see: North-West — 
corner solution 
[90C05] 
(see: Extension of the fundamental theorem of linear 
programming) 
cornered 
[90C05] 
(see: Extension of the fundamental theorem of linear 
programming) 
corrected seminormal equation 
[65Fxx] 
(see: Least squares problems) 
correcting methods see: label — 
corrective 
[90C10, 90C15] 
(see: Stochastic vehicle routing problems) 
corrective action 
[90C15] 
(see: Two-stage stochastic programs with recourse)
corrector 
[90C05] 
(see: Linear programming: interior point methods)
corrector see: predictor- — 
corrector algorithm see: predictor- — 
corridor method 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
cost 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 
cost see: decomposable —; differential —; Hamiltonian 
path —; linear platform —; mean-weight —; minimizing
network —; path —; piecewise linear arc —; production 
realizing with minimal social —; reduced —; setup —; 
staircase —; transportation —; unbounded —; variable — 
cost approximation 
[90C39]
(see: Neuro-dynamic programming) 
Cost approximation algorithms 
(90C30) 
(referred to in: Dynamic traffic networks) 
(refers to: Dynamic traffic networks; Frank-Wolfe 
algorithm) 
cost of an arc in a network 
[90C35] 
(see: Minimum cost flow problem) 
cost with capacity constraints see: single fixed — 
cost coefficients see: sensitivity analysis with respect to 
changes in — 
cost of a directed cycle 
[90C35]
(see: Minimum cost flow problem) 


cost fixing see: reduced — 
cost flow problem see: minimum — 
cost function 
[90B10, 90C26, 90C30, 90C35, 93-XX] 
(see: Direct search Luus-Jaakola optimization procedure;
Nonconvex network flow problems) 
cost function see: regular —; regular link —; sawtooth arc —; 
staircase —; staircase arc —; total — 
cost functional 
[49J20, 49J52]
(see: Shape optimization) 
cost functions in integer programming 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
cost index 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
cost infinite horizon problem see: total — 
cost network flow see: minimum — 
cost network flow problem see: minimum —; piecewise linear 
minimum — 
cost with no capacity constraints see: single fixed — 
cost per stage see: average —; discounted problem with 
bounded — 
cost per stage problem see: average — 
cost per stage problems see: average —; Dynamic 
programming: average — 
cost row 
[90C05] 
(see: Linear programming: Klee-Minty examples) 
cost scaling 
[90C30, 90C35] 
(see: Auction algorithms) 
cost simplex algorithm see: Parametric linear programming: — 
cost structure 
[90B80, 90B85] 
(see: Warehouse location problem) 
cost terms 
(see: Planning in the process industry) 
cost/time see: minimization of — 
cost-to-go 
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path 
problems) 
cost-to-time ratio 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 
cost-to-time ratio cycle see: minimum — 
cost vector see: differential —; generic — 
cost vectors see: equivalent — 
COSTADE 
[90C26, 90C29] 
(see: Optimal design of composite structures) 
costs see: communication —; convex piecewise linearization in 
facility location problems with staircase —; detention —; 
Facility location with staircase —; heuristics of facility 
location problems with staircase —; linearization in facility 
location problems with staircase —; reduction to finite —; 
solution of facility location problems with staircase — 
Coulomb unilateral frictional contact see: Signorini- — 
countability

[03E70, 03H05, 91B16] 

(see: Alternative set theory) 
countable class 

[03E70, 03H05, 91B16] 

(see: Alternative set theory) 
countable set D 

[49L99] 

(see: Dynamic programming: average cost per stage
problems)
counterpart see: robust — 
counterpart method see: stochastic — 
coupled fixed point 

[90C33] 

(see: Order complementarity) 
coupled HMM 

(see: Bayesian networks) 
coupled unilateral contact problem with friction 

[49J40, 49Q10, 70-08, 74K99, 74Pxx]

(see: Quasivariational inequalities) 
coupling see: bandwidth of interdisciplinary —;
model/optimizer —
coupling constraints 

[90B10, 90C05, 90C06, 90C35] 

(see: Nonoriented multicommodity flow problems) 
coupling constraints see: precedence/ — 

Courant penalty function 

[90C30] 

(see: Image space approach to optimization) 
Cournot equilibrium see: Stackelberg-Nash- —
Cournot-Nash equilibrium see: spatial — 
Cournot-Nash oligopolistic equilibrium 

[65K10, 90C31] 

(see: Sensitivity analysis of variational inequality problems) 
Cournot-Nash oligopolistic equilibrium model 

[90C31, 90C33] 

(see: Sensitivity analysis of complementarity problems) 
covariance matrix estimation 

[15A15, 90C25, 90C55, 90C90] 

(see: Semidefinite programming and determinant
maximization)
covector 

[90C09, 90C10] 

(see: Oriented matroids) 
cover 

[05C50, 15A48, 15A57, 90C25] 

(see: Matrix completion problems) 

COVER see: minimum weighted vertex —; node —; 
universal —; VERTEX — 

cover the extremal set 
(see: Planning in the process industry) 

Cover Problem see: minimum Vertex — 

coverage location problem see: maximum — 

covering, packing and partitioning problems see: Set — 

covering problem 

[90C35] 

(see: Feedback set problems) 
covering problem see: node —; set —; Single facility location:
circle —


covering problem on a network 
[90B10, 90B80, 90C35] 

(see: Network location: covering problems) 

covering problems see: Network location: — 

covering relation 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations)

covering subset 

[90C09, 90C10] 

(see: Matroids) 

covers all edge-directions of P 

[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 

(see: Convex discrete optimization) 

CP see: p- — 

CPP 
[90B20] 

(see: General routing problem) 

CQ 
[49K27, 49K40, 90C30, 90C31] 

(see: First order constraint qualifications) 

CQ see: Abadie —; asymptotic —; First order —; Gollan —; 
Kuhn-Tucker —; linear independence —; 
Mangasarian-Fromovitz —; Robinson —; second order —;
Strong Slater —; Weak Slater — 

CR 

[90C29] 

(see: Estimating data for multicriteria decision making
problems: optimization techniques)

Craig algorithm 

[65K05, 65K10] 

(see: ABS algorithms for linear equations and linear least
squares)

Craig conjugate gradient type algorithm 

[65K05, 65K10] 

(see: ABS algorithms for linear equations and linear least
squares) 

Crane Problem (SCP) see: stacker — 

CRCW PRAM 

[03D15, 68Q05, 68Q15] 

(see: Parallel computing: complexity classes)

credence see: degrees of — 

Credit rating and optimization methods 

(91B28, 90C90, 90C05, 90C20, 90C30)

(referred to in: Beam selection in radiotherapy treatment
design)

crew deadheading 

(see: Railroad crew scheduling)

crew district see: double-ended —; single-ended — 

crew districts 
(see: Railroad crew scheduling) 

crew pairing 
(see: Railroad crew scheduling) 

crew pools 
(see: Railroad crew scheduling) 

CREW PRAM 
[03D15, 68Q05, 68Q15] 

(see: Parallel computing: complexity classes) 

crew rostering 

(see: Railroad crew scheduling) 
crew scheduling
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization) 

crew scheduling 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization; Railroad crew scheduling) 

crew scheduling see: airline —; Railroad — 

crew-scheduling problem 

[90C10, 90C11, 90C27, 90C57] 

(see: Set covering, packing and partitioning problems)

crew types 

(see: Railroad crew scheduling)

crisp relation 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations)

crisp relations see: special properties of — 

criss-cross 

[05B35, 65K05, 90C05, 90C20, 90C33] 

(see: Criss-cross pivoting rules) 

criss-cross method 

[05B35, 65K05, 90C05, 90C20, 90C33] 

(see: Criss-cross pivoting rules)


criss-cross method see: least-index —; Terlaky —; Zionts —

Criss-cross pivoting rules 
(90C05, 90C33, 90C20, 05B35, 65K05) 
(referred to in: Least-index anticycling rules; Lexicographic 
pivoting rules; Linear programming; Linear programming: 
Klee-Minty examples; Pivoting algorithms for linear 
programming generating two paths; Principal pivoting 
methods for linear complementarity problems; 
Probabilistic analysis of simplex algorithms; Simplicial 
pivoting algorithms for integer programming) 
(refers to: Least-index anticycling rules; Lexicographic 
pivoting rules; Linear complementarity problem; Linear 
programming; Pivoting algorithms for linear programming 
generating two paths; Principal pivoting methods for linear 
complementarity problems; Probabilistic analysis of 
simplex algorithms; Simplicial pivoting algorithms for 
integer programming) 

criteria 
[90C29]
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 

criteria see: aspiration —; Decision support systems with 
multiple —; Dykstra’s algorithm and robust stopping — 

criteria decision making see: multiple — 

criteria design problem see: multiple — 

criteria evaluation see: multiple — 

criteria for multiphase chemical equilibrium see: Optimality — 

criteria problem see: continuous multiple —; discrete 
multiple — 

criteria problems see: multi- — 

criterion see: Akaike information —; dominance —; fuzzy —; 
infeasibility —; integrality —; k-means —; least squares —; 
measurable —; Metropolis —; minimum unfeasibility —; 
objective —; optimality —; ordinal —; probabilistic —; 
reaction tangent-plane —; scale invariance —; sector
stability —; stopping —; tangent-plane —; termination —;
test nonmonotone Armijo-like — 


criterion problem in OR see: single- — 
criterion space 


[90C29] 
(see: Multiple objective programming support) 


criterion uncapacitated static multifacility see: discrete
single-commodity single- —


critical 


[90C05, 90C11, 90C25, 90C30, 90C31, 90C34] 
(see: Parametric mixed integer nonlinear optimization; 
Semi-infinite programming: discretization methods) 


critical arcs 


(see: Emergency evacuation, optimization modeling) 


critical column 


[90C05, 90C06] 
(see: Selfdual parametric method for linear programs) 


critical cone 


[90C22, 90C25, 90C30, 90C31, 90C33] 

(see: Optimization with equilibrium constraints:
A piecewise SQP approach; Semidefinite programming:
optimality conditions and stability)


critical cone see: z- — 
critical direction 


[49K27, 49K40, 90C30, 90C31] 
(see: Second order constraint qualifications) 


critical direction see: high-order —; high-regular — 
critical directions see: cone of — 
critical interval 


[90C11, 90C31] 
(see: Parametric mixed integer nonlinear optimization) 


critical path 


[90B35] 
(see: Job-shop scheduling problem) 


critical point 


[49-XX, 49J52, 58E05, 90-XX, 90C30, 93-XX] 

(see: Duality theory: monoduality in convex optimization; 
Hemivariational inequalities: eigenvalue problems; 
Topology of global optimization) 


critical point see: dt- —; generalized —; nondegenerate — 
critical point of an energy functional see: generalized — 
critical point set see: generalized — 

critical point theory 


[49J52]
(see: Hemivariational inequalities: eigenvalue problems) 


critical points 


[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX] 
(see: Nonconvex energy functions: hemivariational 
inequalities) 


critical points see: nondegenerate — 
critical region
[90C05, 90C31]
(see: Multiparametric linear programming; Parametric
linear programming: cost simplex algorithm)



critical regions 


[90C11, 90C31]
(see: Multiparametric mixed integer linear programming) 




critical regions see: neighboring — 
critical row 
[90C05, 90C06] 
(see: Selfdual parametric method for linear programs) 
critical value 
[90C05, 90C06] 
(see: Selfdual parametric method for linear programs) 
cross see: criss- — 
cross decomposition 
[49M27, 90C11, 90C30] 
(see: MINLP: generalized cross decomposition) 
cross decomposition see: generalized —; mean value —; 
MINLP: generalized — 
cross decomposition algorithm 
[49M27, 90C11, 90C30] 
(see: MINLP: generalized cross decomposition) 
cross-entropy 
[62F10, 90C25, 94A17] 
(see: Entropy optimization: parameter estimation; Entropy 
optimization: Shannon measure of entropy and its
properties) 
cross-entropy 
[90C25, 94A17] 
(see: Entropy optimization: Shannon measure of entropy
and its properties) 
cross-entropy see: axiomatic derivation of —; axiomatic 
derivation of the principle of minimum —; 
Kullback-Leibler —; Kullback-Leibler measure of —;
principle of minimum — 
cross-entropy principle see: minimum — 
cross method see: criss- —; least-index criss- —; Terlaky 
criss- —; Zionts criss- —
cross pivoting rules see: Criss- — 
cross polytope 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 
cross-sectional shapes see: beam — 
cross-validation 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 
crossing see: edge —; non- — 
crossing minimization 


[90C35] 

(see: Optimization in leveled graphs) 
crossing minimization see: k-level —; leveled — 
crossing number 


[90C10, 90C27, 94C15] 
(see: Graph planarization) 
crossover 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90, 92B05] 
(see: Broadcast scheduling problem; Genetic algorithms; 
Traveling salesman problem) 
crossover 
[92B05] 
(see: Genetic algorithms) 
crossover (EAX) see: edge assembly — 


CRPM 
[90C27, 90C60, 91A12]
(see: Combinatorial optimization games) 
crystal structures see: prediction of — 
crystal X-ray diffraction data see: Optimization techniques for 
phase retrieval based on single- — 
crystallography: Shake and bake approach see: Phase problem 
in X-ray — 
CSA 
[90C30] 
(see: Convex-simplex algorithm)
CSC 
[90B10, 90C27] 
(see: Shortest path tree algorithms)
csd 
[03B52, 03E72, 47S40, 62G07, 62G30, 65K05, 68T27, 68T35,
68Uxx, 90Bxx, 91Axx, 91B06, 92C60]
(see: Boolean and fuzzy relations; Isotonic regression
problems) 
Csiszar α-divergence
[90C05, 90C25] 
(see: Young programming) 
CSP 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and
upper bounds) 
CSP see: MAX- —; max-r- —
CSPs see: binary — 
CST 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
cube connected cycle see: k-dimensional — 
cuboctahedron 
[90C35] 
(see: Optimization in leveled graphs) 
cumulative 
(see: Mixed integer programming/constraint programming 
hybrid methods) 
cumulative sum diagram 
[41A30, 62G07, 62G30, 62J02, 65K05, 90C26] 
(see: Isotonic regression problems; Regression by special 
functions: algorithms and complexity) 
curse of dimensionality
[90C05] 
(see: Continuous global optimization: models, algorithms 
and software) 
curse of dimensionality 
[65K05, 65T40, 68Q05, 68Q10, 68Q25, 90B36, 90C05, 90C25, 
90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval; 
Information-based complexity and information-based 
optimization; Stochastic optimal stopping: numerical 
methods; Stochastic scheduling) 
curse of dimensionality 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26, 90C34] 
(see: Information-based complexity and information-based 
optimization; Semi-infinite programming: methods for 
linear problems) 
curvature 
[90C22, 90C25, 90C31] 
(see: Semidefinite programming: optimality conditions and
stability) 

curvature see: direction of negative —; negative — 

curve see: feasible high-order approximating —; high-order 
approximating —; interest rate yield —; load —; 
mobilization —; parabolic —; Peano —; space filling —; 
switching —; tangent high-order approximating — 

curve approach see: parabolic — 

curve fitting 
[90C26, 90C30] 
(see: Forecasting) 

curve fitting see: subjective — 

curve fitting and extrapolation see: subjective — 

curves see: approximation of space filling — 

curvilinear coordinate system 

[90C26] 

(see: Smooth nonlinear nonconvex optimization) 

curvilinear line search 

[90C06] 

(see: Large scale unconstrained optimization) 

customer 

[90B80, 90B85] 

(see: Warehouse location problem)

customer see: weight of a — 

cut 

[90B10, 90B15, 90C15, 90C35] 

(see: Preprocessing in stochastic programming)

cut see: branch and —; branch and price and —; capacity of
a —; Chvatal-Gomory —; clique- —; concavity —;
convexity —; feasibility —; flow across an s-t- —;
global —; integer —; intersection —; knapsack —;
lift-and-project —; lifting —; local —; max- —;
maximum —; Maximum cut problem, MAX- —; maximum
mean —; maximum mean-weight —; minimal —;
minimum —; mixed integer rounding —; nonlinear —;
odd-hole- —; optimality —; s-t- —; valid —

cut algorithm see: Junger-Mutzel branch and —

cut algorithms see: Integer programming: branch and —; 
Stable set problem: branch — 

cut-and-branch 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and cut algorithms) 

cut conditions 
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching) 

cut of a fuzzy relation see: α- —

cut generation 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems; Planning in 
the process industry) 

cut-improvement see: dual —; primal — 

Cut (MC) see: max — 

cut principle see: disjunctive — 

cut problem see: minimum — 

cut problem, MAX-CUT see: Maximum — 

cut procedure see: branch and — 

cut theorem see: max-flow min- — 

cuts see: feasibility —; Fenchel —; lift-and-project —; 
nondominated —; parallel —; pool of —; quotient —;
reduction —; value — 


cutting angle method 

[90C26] 

(see: Global optimization: envelope representation) 
cutting angle method see: Global optimization: — 
cutting pattern 

[90B90, 90C59] 

(see: Cutting-stock problem) 
cutting patterns 

[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]

(see: Convex discrete optimization) 
cutting plane see: Chvatal-Gomory —; extended —;
generalized —; strong —; trade-off —
cutting plane algorithm 

[49M37, 90C08, 90C11, 90C27, 90C29, 90C57, 90C59, 90C90] 

(see: MINLP: applications in the interaction of design and
control; Multi-objective optimization: interaction of design
and control; Quadratic assignment problem)
cutting plane algorithm 

[90C06, 90C15] 

(see: Stabilization of cutting plane algorithms for stochastic
linear programming problems; Stochastic linear
programming: decomposition and cutting planes)
cutting plane algorithm see: Extended —; Gomory —;
Sequential —
cutting plane algorithms see: Integer programming: — 
cutting plane algorithms for stochastic linear programming
problems see: Stabilization of —
cutting plane approach 

[90C25] 

(see: Concave programming) 
cutting plane approach see: Mixed-integer nonlinear
optimization: A disjunctive —
cutting plane approaches 

[90C10, 90C11, 90C27, 90C57] 

(see: Set covering, packing and partitioning problems) 
cutting plane coefficients see: statistical representation of — 
cutting plane method 

[46N10, 49J40, 49J52, 65K05, 90-00, 90C05, 90C10, 90C11,
90C25, 90C27, 90C30, 90C34, 90C47, 90C57]

(see: Integer programming; Nondifferentiable optimization;
Semi-infinite programming: discretization methods;
Solving hemivariational inequalities by nonsmooth
optimization methods)
cutting plane method 

[90C26] 

(see: Cutting plane methods for global optimization) 
cutting plane method see: analytic center —; extended —;
generalized —; Kelley —; Kelley's classical —
cutting plane methods 

[62F12, 65C05, 65K05, 90C15, 90C31] 

(see: Monte-Carlo simulations for stochastic optimization) 
cutting plane methods see: Nondifferentiable optimization: —;
regularization of deterministic —

Cutting plane methods for global optimization 

(90C26) 
cutting plane model 

[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth
optimization methods) 

cutting planes 
[03B05, 49M37, 68P10, 68Q25, 68R05, 68T15, 68T20, 90-XX, 
90C05, 90C06, 90C08, 90C09, 90C10, 90C11, 90C27, 94C10]
(see: Integer programming: cutting plane algorithms; 
Maximum satisfiability problem; Mixed integer nonlinear 
programming; Survivable networks) 

cutting planes 
[49M20, 90-08, 90C05, 90C06, 90C08, 90C09, 90C10, 90C11, 
90C25] 
(see: Disjunctive programming; Integer programming: 
branch and cut algorithms; Integer programming: cutting 
plane algorithms; Nondifferentiable optimization: cutting 
plane methods) 

cutting planes see: polyhedral —; Stochastic linear 
programming: decomposition and — 

cutting stock 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 

Cutting-stock problem 
(90B90, 90C59) 
(refers to: Integer programming) 

cutting-stock problem 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68Q99, 68R, 68U, 
68W, 90B, 90B50, 90C, 90C10, 90C11, 90C27, 90C57] 
(see: Branch and price: Integer programming with column 
generation; Convex discrete optimization; Optimization 
and decision support systems; Set covering, packing and 
partitioning problems) 

cutting-stock problem 
[90B50, 90B90, 90C59] 
(see: Cutting-stock problem; Optimization and decision 
support systems) 

cutworthiness 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

cutworthy property 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

CVI see: Control vector iteration — 

CVRP 
[90B06] 
(see: Vehicle routing) 

cycle 
[90C35] 
(see: Minimum cost flow problem) 

cycle see: cost of a directed —; fundamental —; k-dimensional 
cube connected —; maximum profit-to-time ratio —; 
minimum cost-to-time ratio —; minimum mean —; 
mixed — 

cycle-canceling algorithm 
[90C35] 
(see: Minimum cost flow problem) 


cycle of a digraph see: directed — 


cycle factor 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem)
cycle in a graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization)
cycle problem see: Hamiltonian —; hitting — 
cycle time 
[90C26] 
(see: Global optimization in batch design under
uncertainty) 
cycles see: negative — 
Cyclic coordinate method 
(90C30)
(referred to in: Powell method; Rosenbrock method; 
Sequential simplex method) 
(refers to: Powell method; Rosenbrock method; Sequential
simplex method) 
cyclic coordinate search 
[90C30] 
(see: Cyclic coordinate method)
cyclic rule 
[90C30] 
(see: Cost approximation algorithms) 
cyclic shift function 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
cyclically reducible graph 
[90C35] 
(see: Feedback set problems)
cycling 
[90C05, 90C10, 90C60] 
(see: Complexity of degeneracy; Simplicial pivoting
algorithms for integer programming) 
cycling 
[05B35, 65K05, 90C05, 90C20, 90C33, 90C60] 
(see: Complexity of degeneracy; Criss-cross pivoting rules) 
cycling see: nondegenerate — 
cycling algorithm 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules) 
cycling procedure see: anti- — 
cZP 
[74A40, 90C26]
(see: Shape selective zeolite separation and catalysis: 
optimization methods) 


D 


D see: countable set — 
Dpy see: feasible for — 
d-dimensional hypercube 

[65K05, 65Y05] 

(see: Parallel computing: models) 
d-dimensional torus 

[65K05, 65Y05] 

(see: Parallel computing: models) 
D’Esopo-Pape method
[90B10, 90C27] 
(see: Shortest path tree algorithms) 
D-function 
[68W10, 90B15, 90C06, 90C30] 
(see: Stochastic network problems: massively parallel 
solution) 
D-functions see: proximal minimization with — 
D-optimal design 
[15A15, 90C25, 90C55, 90C90]
(see: Semidefinite programming and determinant 
maximization) 
DACE 
[65F10, 65F50, 65H10, 65K10]
(see: Multidisciplinary design optimization) 
DAE 
[49M37, 90C11]
(see: MINLP: applications in the interaction of design and 
control) 
Dai-Yuan algorithm 
[90C30]
(see: Conjugate-gradient methods) 
damped Gauss-Newton method 
[49M37]
(see: Nonlinear least squares: Newton-type methods) 
damped Newton method 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
damped NM 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
Daniel-Gragg-Kaufmann-Stewart reorthogonalized 
Gram-Schmidt algorithm 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
Dantzig largest coefficient pivoting rule 
[90C05] 
(see: Linear programming: Klee-Minty examples) 
Dantzig rule 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
Dantzig-Wolfe decomposition 
[49M20, 90-08, 90C06, 90C25, 90C35] 
(see: Nondifferentiable optimization: cutting plane
methods; Simplicial decomposition algorithms) 
Dantzig-Wolfe decomposition 
[90C06, 90C25, 90C30, 90C35] 
(see: Frank-Wolfe algorithm; Simplicial decomposition;
Simplicial decomposition algorithms) 
Dantzig-Wolfe decomposition see: nonlinear —
data 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
data see: best fitting to —; evaluation of empirical —; 
feasibility approach to image reconstruction from 
projection —; image reconstruction from projection —; 
length of input —; optimization approach to image
reconstruction from projection —; Optimization techniques
for phase retrieval based on single-crystal X-ray 
diffraction —; row conditional proximity —; size of input —; 
training — 
data-association problem 
[90C35] 
(see: Multi-index transportation problems) 
data classification see: Deterministic and probabilistic 
optimization models for — 
data classification via mixed-integer optimization see: 
Multi-class — 
data elicitation 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
Data envelopment analysis 
(90B50, 90B30, 91B82, 90C05) 
(refers to: Optimization and decision support systems) 
data envelopment analysis 
[90B30, 90B50, 90C05, 90C25, 90C29, 90C30, 90C31, 91B82] 
(see: Bilevel programming: optimality conditions and 
duality; Data envelopment analysis) 
data envelopment analysis 
[90C27] 
(see: Operations research and financial markets) 
data fitting 
[90C30] 
(see: Generalized total least squares) 
data mapping see: computation and — 
Data mining 
data mining
(see: Mathematical programming for data mining) 
data mining see: Mathematical programming for — 
data for multicriteria decision making problems: optimization 
techniques see: Estimating — 
data parallelism 
[49-04, 65Y05, 68N20] 
(see: Automatic differentiation: parallel computation) 
data perturbation 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
data sets see: least squares problems with massive — 
Davidon-Fletcher-Powell method 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
Davidon-Fletcher-Powell update 
[90C30] 
(see: Broyden family of methods and the BFGS update) 
day dynamic travel behavior see: day-to- — 
day-to-day dynamic travel behavior 
[90B15] 
(see: Dynamic traffic networks) 
D.C. decomposition
[90C30] 
(see: Large scale trust region problems) 
d.c. function 
[65H10, 90B80, 90C11, 90C26, 90C30, 90C31] 
(see: Global optimization methods for systems of nonlinear 
equations; Robust global optimization; Stochastic 
transportation and location problems) 
d.c. function 
[46A20, 52A01, 65Kxx, 90C30, 90Cxx] 
(see: Farkas lemma: generalizations; Quasidifferentiable 
optimization: algorithms for QD functions) 
dc functions 
[90C26] 
(see: D.C. programming) 
d.c. optimization 
[90C25, 90C26, 90C31] 
(see: Concave programming; Robust global optimization) 
D.C. programming 
(90C26) 
(referred to in: αBB algorithm; Global optimization
methods for systems of nonlinear equations; Large scale 
trust region problems; Quadratic knapsack; Quadratic 
programming with bound constraints; Reverse convex 
optimization; Standard quadratic optimization problems: 
theory; Stochastic global optimization: stopping rules; 
Stochastic global optimization: two-phase methods) 
d.c. programming 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization; 
Duality theory: monoduality in convex optimization) 
d.c. programming 
[49-XX, 90-XX, 90C30, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization; 
Large scale trust region problems) 
d.c. programming problem 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
d.c. set 
[90C26] 
(see: Global optimization in multiplicative programming) 
DCA 
[90C30] 
(see: Large scale trust region problems) 
De La Garza method 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
De novo protein design using flexible templates 
(92D20, 46N10, 90C10)
De novo protein design using rigid templates
DEA 
[90B30, 90B50, 90C05, 91B82] 
(see: Data envelopment analysis) 
dead or dog-lawed 
(see: Railroad crew scheduling) 


dead point 
[90Cxx]
(see: Discontinuous optimization) 
dead-point iterate 
[90Cxx]
(see: Discontinuous optimization) 
deadhead arcs 
(see: Railroad crew scheduling)
deadheading 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization; Railroad locomotive scheduling)
deadheading see: crew — 
decision see: ex-ante (risk averse, anticipative) —; ex-post (risk 
prone, adaptive) —; expectation and —; first-stage —;
funding —; fuzzy —; investment —; recourse —; 
second-stage — 
decision aid see: multicriteria — 
decision alternative see: set of — 
decision analysis 
[90C15] 
(see: Two-stage stochastic programs with recourse) 
decision maker 
[90C29] 
(see: Multi-objective optimization; Interactive methods for
preference value functions) 
decision making 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design)
decision making see: financial —; group —; hierarchical —; 
multicriteria —; multiple criteria —; Preference 
disaggregation approach: basic features, examples from 
financial — 
decision making problems: optimization techniques see: 
Estimating data for multicriteria — 
decision making with rolling horizon 
[90C15] 
(see: Two-stage stochastic programming: quasigradient 
method) 
decision making under extreme events 
[90C15] 
(see: Stochastic quasigradient methods in minimax 
problems) 
decision making under uncertainty 
[90C26] 
(see: MINLP: application in facility location-allocation) 
decision models see: nonlinear — 
decision problem 
[90C60] 
(see: Complexity theory; Computational complexity theory) 
decision problem see: locational —; polynomially
transformable — 

decision problems see: “hit-or-miss” — 

decision process see: Markov — 

decision rule 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals) 

decision rule see: minimax — 

decision set 
[90C29] 
(see: Multi-objective optimization; Interactive methods for 
preference value functions; Multi-objective optimization: 
pareto optimal solutions, properties; Vector optimization) 

decision support 
[65K05, 90B50, 90C05, 90C29, 91B06, 91B60] 
(see: Financial applications of multicriteria analysis; 
Multi-objective optimization and decision support systems; 
Railroad crew scheduling) 

decision support methodologies for auditing decisions see: 
Multicriteria — 

decision support system 
[65K05, 90B50, 90B80, 90C05, 90C29, 91B06] 
(see: Facilities layout problems; Multi-objective 
optimization and decision support systems) 

decision support system 
[90B80, 90C29, 91A99] 
(see: Decision support systems with multiple criteria; 
Facilities layout problems; Preference disaggregation) 

decision support system see: Asset liability management —; 
intelligent multicriteria —; multicriteria —; multicriteria 
group — 

decision support systems 
[90C29] 
(see: Decision support systems with multiple criteria) 

decision support systems see: intelligent multicriteria —; 
Multi-objective optimization and —; Optimization and — 

Decision support systems with multiple criteria 
(90C29) 
(referred to in: Bi-objective assignment problem; Estimating 
data for multicriteria decision making problems: 
optimization techniques; Financial applications of 
multicriteria analysis; Fuzzy multi-objective linear 
programming; Multicriteria sorting methods; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling) 
(refers to: Bi-objective assignment problem; Estimating data 
for multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria
sorting methods; Multi-objective combinatorial
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling) 
decision-theoretic framework see: Bayesian — 
decision theory 
[03B52, 03E72, 47B40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
decision tree 
[90C09, 90C10] 
(see: Optimization in boolean classification problems) 
decision variable see: flow — 
decision variables 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming; Plant layout problems and optimization) 
decision variables x 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
decisions see: diversified investment —; first-stage —; 
inventory and transportation —; Multicriteria decision 
support methodologies for auditing —; second-stage — 
decisions in dynamic optimization see: discrete — 
decisions in a supply chain see: operational — 
declarative knowledge 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
declarative language 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
declarative languages 
(see: Planning in the process industry) 
declarative program see: pretty-printing a — 
declarative program structure see: analysing — 
declarative programs see: classifying —; symbolically 
transforming — 
declarative representation 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
declared interval function see: pre- — 
decomposable cost 
[90C35] 
(see: Multi-index transportation problems) 
decompose 
[90C25, 90C26] 
(see: Decomposition in global optimization) 
decomposition
[15-XX, 49M29, 65-XX, 68Q99, 90-XX, 90C11, 90C15, 90C26, 
90C33, 90C35] 
(see: Branch and price: Integer programming with column 
generation; Cholesky factorization; Generalized benders 
decomposition; Multicommodity flow problems; Stochastic 
bilevel programs) 

Decomposition 
[49M27, 49M29, 49M37, 68Q99, 90C06, 90C11, 90C20, 
90C30, 90C35, 90C90] 
(see: Branch and price: Integer programming with column 
generation; Decomposition principle of linear 
programming; Decomposition techniques for MILP: 
lagrangian relaxation; Generalized benders decomposition; 
MINLP: generalized cross decomposition; Mixed integer 
nonlinear programming; Multicommodity flow problems; 
Railroad crew scheduling; Successive quadratic 
programming: decomposition methods) 

decomposition see: Benders —; branch —; cross —; 
Dantzig—Wolfe —; D.C. —; disaggregate simplicial —; 
dual —; generalized Benders —; generalized cross —; heat 
exchanger network synthesis without —; Jordan—Hahn —; 
L-shaped —; Lagrangian —; Lasserre signed —; Lawrence 
signed —; LU- —; mean value cross —; MINLP: generalized 
cross —; nested Benders —; nonlinear Dantzig—Wolfe —; 
operator —; path —; price-directive —; problem —; QR —; 
range and null space —; regularized Frank-Wolfe —; 
resource-directive —; restricted simplicial —; signed —; 
simplicial —; stochastic —; tree —; well-separated pair —; 
Yosida-Hewitt — 

decomposition algorithm see: cross —; regularized 
stochastic —; stochastic — 

decomposition algorithms 
[90C15] 
(see: Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 

decomposition algorithms 
[90B10, 90C05, 90C06, 90C11, 90C31, 90C35] 
(see: Nonoriented multicommodity flow problems; 
Parametric mixed integer nonlinear optimization) 

decomposition algorithms see: Simplicial — 

decomposition algorithms for nonconvex minimization problems 
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics) 

Decomposition algorithms for the solution of multistage 
mean-variance optimization problems 
(90C15, 90C90) 

decomposition approach see: augmented Lagrangian —; 
Benders — 

decomposition-based clustering approach: global optimum 
search with enhanced positioning see: Gene clustering: 
A novel — 

decomposition CA algorithms 
[90C30] 
(see: Cost approximation algorithms) 

decomposition of a continuous piecewise linear function 
[90Cxx] 
(see: Discontinuous optimization) 

decomposition/coordination 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 


decomposition and cutting planes see: Stochastic linear 
programming: — 

decomposition of a function see: second order — 

Decomposition in global optimization 

(90C26, 90C25)

decomposition heuristic
[68T99, 90C27]
(see: Capacitated minimum spanning trees)

decomposition method see: feasible —; nonfeasible — 

decomposition methods
[90C20, 90C30]
(see: Successive quadratic programming: decomposition
methods)

decomposition methods see: Successive quadratic 
programming: — 

decomposition of a monomial ideal see: standard pair — 

decomposition point 
[58E05, 90C30] 
(see: Topology of global optimization) 

decomposition points 
[58E05, 90C30] 
(see: Topology of global optimization) 

Decomposition principle of linear programming 
(90C06) 
(referred to in: Generalized benders decomposition; MINLP: 
generalized cross decomposition; MINLP: logic-based 
methods; Simplicial decomposition; Simplicial 
decomposition algorithms; Stochastic linear programming: 
decomposition and cutting planes; Successive quadratic 
programming: decomposition methods) 
(refers to: Generalized benders decomposition; MINLP: 
generalized cross decomposition; MINLP: logic-based 
methods; Simplicial decomposition; Simplicial 
decomposition algorithms; Stochastic linear programming: 
decomposition and cutting planes; Successive quadratic 
programming: decomposition methods) 

decomposition of SLP
[90C06, 90C15]
(see: Stochastic linear programming: decomposition and
cutting planes)

decomposition solution see: truncated singular value — 

decomposition step
[90C06, 90C25, 90C35]
(see: Simplicial decomposition algorithms)

decomposition techniques
[49K35, 49M27, 49Q10, 65K10, 74K99, 74Pxx, 90C15, 90C25,
90C90, 91A65]
(see: Convex max-functions; Multilevel optimization in
mechanics; Multistage stochastic programming: barycentric
approximation; Two-stage stochastic programs with
recourse)

decomposition techniques 
[90C15] 
(see: L-shaped method for two-stage stochastic programs 
with recourse) 

Decomposition techniques for MILP: lagrangian relaxation 
(90C90, 90C30) 
(referred to in: Branch and price: Integer programming with 
column generation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; Lagrange, 
Joseph-Louis; Lagrangian multipliers methods for convex 
programming; LCP: Pardalos—Rosen mixed integer 
formulation; MINLP: trim-loss problem; Multi-objective 
integer linear programming; Multi-objective mixed integer 
programming; Multi-objective optimization: lagrange 
duality; Multiparametric mixed integer linear 
programming; Parametric mixed integer nonlinear 
optimization; Set covering, packing and partitioning 
problems; Simplicial pivoting algorithms for integer 
programming; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem) 
(refers to: Branch and price: Integer programming with 
column generation; Integer linear complementary problem; 
Integer programming; Integer programming: algebraic 
methods; Integer programming: branch and bound 
methods; Integer programming: branch and cut algorithms; 
Integer programming: cutting plane algorithms; Integer 
programming duality; Integer programming: lagrangian 
relaxation; Lagrange, Joseph-Louis; Lagrangian multipliers 
methods for convex programming; LCP: Pardalos-Rosen 
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multi-objective optimization: lagrange duality; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Time-dependent 
traveling salesman problem) 

decompositions see: Branchwidth and branch — 

decrease 
[65K05, 90C30] 
(see: Random search methods) 

decrease see: high-order approximating cone of —; high-order 
approximating vector of —; high-order cones of —; 
high-order set of — 

decrease conditions see: sufficient — 

decreasing 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 

decreasing see: convex —; nonlinear — 

decreasing tail see: RSM-distribution with algebraically — 

Dedekind number
[90C09]
(see: Inference of monotone boolean functions)

deepening see: iterative — 

default strategies
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems; Planning in
the process industry)

deficit of a network node
[90C35]
(see: Minimum cost flow problem)

definable classes see: set- — 


defined end see: clearly — 
defined start-ups see: well- — 
definite see: positive — 
definite completion problem see: positive (semi) — 
definite matrices see: positive — 
definite matrix see: partial —; positive —; strongly positive — 
definite quadratic binary programming see: positive semi- — 
definite quadratic function see: positive — 
definite quadratic models see: positive — 
definiteness see: positive — 
definition see: algorithmic —; optimization algorithm — 
definition (colloquial) see: optimization: — 
deflected gradient methods 
[90C30]
(see: Cost approximation algorithms) 
deformable model 
[90C90]
(see: Optimization in medical imaging) 
deformable templates 
[90C90]
(see: Optimization in medical imaging) 
deformation process 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
degeneracy 
[90C60]
(see: Complexity of degeneracy) 
degeneracy see: Complexity of —; near —; resolving — 
degenerate 
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules) 
degenerate see: dual —; primal — 
degenerate basic solution 
[90C05]
(see: Linear programming) 
degenerate basis see: dual —; primal — 
degenerate BFS 
[90C60]
(see: Complexity of degeneracy) 
degenerate BFS see: nearly — 
degenerate pivot operation 
[90C35]
(see: Minimum cost flow problem) 
degenerate problem 
[05B35, 65K05, 90C05, 90C20, 90C33]
(see: Lexicographic pivoting rules) 
degenerate system 
[90C60]
(see: Complexity of degeneracy) 
degradation in quality of both water environment see: 
minimizing the — 
degree see: Brouwer —; graph —; Leray-Schauder —; 
maximum —; motionless —; topological — 
degree of a binomial 
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods) 
degree c see: algorithm polynomial of — 
degree-constrained subgraph problem
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming) 
degree deletion heuristic see: increasing- — 
degree of flexibility see: fixed —; optimal — 
degree of inclusion 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
degree of linearity 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
degree minimum spanning tree problem see: bounded — 
degree of a monomial ideal see: arithmetic — 
degree ordering see: minimum — 
degree parallelism alignment problem see: constant — 
degree theory 
[90C33]
(see: Topological methods in complementarity theory) 
degree zero see: homogeneous of — 
degrees of credence 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
Delaunay triangulation
[52B11, 52B45, 52B55, 68Q20]
(see: Optimal triangulations; Volume computation for 
polytopes: strategies and performances) 
delay problem in air traffic control see: ground — 
delay programs see: air traffic control and ground — 
delay system see: time- — 
delay systems see: time- — 
deleting matroid elements 
[90C09, 90C10] 
(see: Matroids) 
deletion 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 
deletion heuristic see: increasing-degree —; incremental — 
deletion in matroids 
[90C09, 90C10] 
(see: Oriented matroids) 
deletion problem see: vertex (arc) — 
deliveries see: Vehicle routing problem with simultaneous 
pickups and — 
delivers see: pick-up and — 
delivery see: express shipment — 
delivery problem see: express — 
delivery systems see: Model based control for drug — 
Delphi method 
[90C26, 90C30] 
(see: Forecasting) 
delta function 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
demand see: elastic travel —; fixed travel —; net —; 
regional —; unity —; water —; water resources planning 
under uncertainty on hydrological exogenous inflow and — 
demand arcs 
(see: Railroad crew scheduling) 
demand CMST see: equal — 
demand function see: aggregate excess — 


demand functions see: elastic demand traffic network 
problems with travel — 
demand node 
[90C30, 90C35] 
(see: Minimum cost flow problem; Optimization in water 
resources) 
demand traffic network equilibrium see: fixed — 
demand traffic network problems see: fixed — 
demand traffic network problems with travel demand 
functions see: elastic — 
denial 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
denial see: checklist — 
density 
[90C05, 90C25, 90C30, 90C34] 
(see: Fractional zero-one programming; Semi-infinite 
programming: discretization methods) 
density see: boltzmann —; steady-state distribution —; 
transition probability — 
density annealing see: Gaussian — 
density clustering 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: two-phase methods) 
density function see: logconcave probability — 
dep-station 
(see: Railroad locomotive scheduling) 
dep-time 
(see: Railroad locomotive scheduling) 
departure see: duty-before- — 
departure connection arc see: ground- — 
departure-ground 
(see: Railroad locomotive scheduling)
departure node 
(see: Railroad crew scheduling; Railroad locomotive
scheduling) 
departure-station 
(see: Railroad crew scheduling)
dependence see: continuous —; linear —; noisy functional — 
dependence analysis 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design)
dependence property see: boundary — 
dependency see: interval — 
dependency set see: common — 
dependent see: positively linearly — 
dependent constraints see: maximum function with — 
dependent hyperplanes 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
dependent property see: norm- — 
dependent protein force field via linear optimization see: 
Distance — 
dependent set 
[90C09, 90C10] 
(see: Matroids) 
dependent set see: minimal — 
dependent traveling salesman problem see: Time- — 
dependent variables 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
depot
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

depot see: multiple —; virtual — 

depot group 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

depot/multiple depots see: single — 

depot vehicle scheduling problem see: Multi- —; Single- — 

depot vehicle scheduling problems see: multi- —; Single- — 

depots see: single depot/multiple — 

depth see: evaluation — 

depth of a Boolean circuit
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes)

depth-first
[90C10, 90C26]
(see: MINLP: branch and bound global optimization
algorithm)

depth-first search
[05C85, 90C10, 90C29, 90C35]
(see: Directed tree networks; Generalized networks; 
Multi-objective integer linear programming) 

depth-first search with backtracking
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)

depth-first tree search
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques)

Depth-First Tree Search see: Parallel — 

derangements
[34E05, 90C27]
(see: Asymptotic properties of random multidimensional
assignment problem)

derivation of cross-entropy see: axiomatic — 

derivation of entropy see: axiomatic — 

derivation of the Hamilton-Jacobi-Bellman equation 
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 

derivation of the principle of maximum entropy see: 
axiomatic — 

derivation of the principle of minimum cross-entropy see: 
axiomatic — 

derivative 
[26E25, 49J52, 52A27, 90C99]
(see: Quasidifferentiable optimization: Dini derivatives, 
clarke derivatives) 

derivative see: B- —; Clarke —; clarke generalized —; Clarke 
generalized directional —; Clarke—Rockafellar 
generalized —; Dini —; Dini conditional lower —; Dini 
conditional upper —; Dini lower —; Dini lower 
directional —; Dini upper —; Dini upper directional —; 
directional —; directional Clarke —; generalized 
directional —; generalized second order directional —; 
Hadamard —; Hadamard conditional lower —; Hadamard 
conditional upper —; Hadamard lower directional —; 
Hadamard upper directional —; kth directional —; 
parameter —; upper — 

derivative approach see: material — 

derivative conditions see: matching of — 


derivative-free descent method
[90C30, 90C33]
(see: Implicit lagrangian)


Derivative-free methods for non-smooth optimization
(65K05, 90C56)
(referred to in: Maximum cut problem, MAX-CUT)


derivative of a function
[26E25, 49J52, 52A27, 90C99]
(see: Quasidifferentiable optimization: Dini derivatives,
clarke derivatives)


derivative of an integral
[90C15]
(see: Derivatives of probability and integral functions:
general theory and examples)


derivative method see: adjoint — 
derivative of a probability function
[90C15]
(see: Derivatives of probability and integral functions:
general theory and examples)


derivative ranges see: Bounding — 
derivative in shape optimization see: Topological — 
derivatives
[93-XX]
(see: Dynamic programming: optimal control applications)


derivatives
[60J05, 90C15, 90C27]
(see: Derivatives of markov processes and their simulation;
Derivatives of probability measures; Discrete stochastic
optimization)


derivatives see: Dini directional —; directional —;
distributional —; elementary partial —; evaluation of
objective functions and/or —; Hadamard directional —;
handcoded —; high-order directional —; higher-order —;
higher-order directional —; lower and upper directional —;
matrix of second partial —; method of bad —; pricing —;
process —; Quasidifferentiable optimization: Dini
derivatives, clarke —; sensitivity —; simulation of —


derivatives, clarke derivatives see: Quasidifferentiable
optimization: Dini —


Derivatives of markov processes and their simulation
(90C15, 60J05)
(referred to in: Derivatives of probability and integral 
functions: general theory and examples; Derivatives of 
probability measures; Discrete stochastic optimization) 
(refers to: Derivatives of probability and integral functions: 
general theory and examples; Derivatives of probability 
measures; Discrete stochastic optimization; Optimization 
in operation of electric and energy power systems; 
Stochastic quasigradient methods) 


derivatives in optimization see: Dini and Hadamard —
Derivatives of probability and integral functions: general
theory and examples
(90C15)
(referred to in: Derivatives of markov processes and their
simulation; Derivatives of probability measures; Discrete
stochastic optimization)
(refers to: Derivatives of markov processes and their
simulation; Derivatives of probability measures; Discrete
stochastic optimization; Optimization in operation of 
electric and energy power systems) 

Derivatives of probability measures 
(90C15) 
(referred to in: Derivatives of markov processes and their 
simulation; Derivatives of probability and integral 
functions: general theory and examples; Discrete stochastic 
optimization) 
(refers to: Derivatives of markov processes and their 
simulation; Derivatives of probability and integral 
functions: general theory and examples; Discrete stochastic 
optimization; Optimization in operation of electric and 
energy power systems; Stochastic quasigradient methods) 

derivatives of structural response 
[90C26, 90C90] 
(see: Structural optimization: history) 

descending index 
[60J05, 90C15] 
(see: Derivatives of markov processes and their simulation) 

descent 
[90C30] 
(see: Nonlinear least squares problems) 

descent see: direction of —; dual —; ε-steepest —; first —;
gradient —; gradient-related —; hypodifferential —; loss 
of —; method of codifferential —; method of 
hypodifferential —; method of steepest —; Newtonian —; 
rate of steepest —; reformulation —; steepest —; variable 
neighborhood — 

descent algorithm 
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach) 

descent algorithm see: steepest — 

descent-based methods 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 

descent direction 
[90C06, 90C30, 90C90] 
(see: Decomposition techniques for MILP: lagrangian 
relaxation; Large scale unconstrained optimization; 
Sequential simplex method) 

descent direction 
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 

descent direction see: Dini steepest —; Hadamard steepest —; 
quasi-Newtonian —; steepest — 

descent directions see: construction of — 

descent directions and efficient points see: Discretely 
distributed stochastic programs: — 

descent flow
[58E05, 90C30]
(see: Topology of global optimization)

descent iterations see: Local attractors for gradient-related — 

descent method
[49M37]
(see: Nonlinear least squares: trust region methods)


descent method 
[90C30] 
(see: Nonlinear least squares problems) 

descent method see: coordinate —; derivative-free —; 
steepest — 

descent in a nonlinear program see: loss of — 

descent in a nonlinear programming algorithm
[90C25, 90C30]
(see: Successive quadratic programming: full space
methods)

descent properties
[90C30]
(see: Cost approximation algorithms)

descent ray
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming, semidefinite
programming and perfect duality)

descent step
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational
inequalities)

descent vector
[49M29, 65K10, 90C05, 90C06, 90C25, 90C30, 90C34]
(see: Local attractors for gradient-related descent iterations;
Semi-infinite programming, semidefinite programming
and perfect duality)

descent vector 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

descent vector see: steepest — 

description see: problem — 

description method see: double — 

descriptional complexity
[90C60]
(see: Kolmogorov complexity)

Descriptional complexity
[90C60]
(see: Kolmogorov complexity)

descriptive complexity
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)

descriptive perspective
[90C29]
(see: Preference modeling)

descriptors see: context — 

design
[90C90]
(see: Design optimization in computational fluid dynamics)

design
[90C26]
(see: MINLP: design and scheduling of batch processes)

design see: algorithm —; batch plant —; Beam selection in 
radiotherapy treatment —; circuit —; D-optimal —; 
distribution system —; experiment —; experimental —; 
fully stressed —; global optimal —; logical —; model for 
parallel algorithm —; molecular —; multidisciplinary —; 
Multilevel methods for optimal —; multiload shape —; 
multiload truss —; network —; Operations research models 
for supply chain management and —; optimal —; optimal 
experimental —; optimal shape —; point —; process —; 
robust obstacle-free shape —; robust obstacle-free truss —;
sequential experimental —; shape —; structural —; supply 
chain —; truss — 

design analysis
[90C90]
(see: Design optimization in computational fluid dynamics)

design approaches see: Optimal solvent — 

design centering
[65D18, 90B85, 90C26]
(see: Global optimization in location problems)

design of composite structures
[90C26, 90C29]
(see: Optimal design of composite structures)

design of composite structures see: Optimal — 

design and control see: interaction of —; MINLP: applications 
in the interaction of —; Multi-objective optimization: 
interaction of — 

design of dynamic systems by constructive nonlinear 
dynamics see: Robust — 

design models see: strategic — 

design in nonlinear optics see: Optimal — 

design of operators
[65K05, 90C30]
(see: Automatic differentiation: point and interval Taylor
operators)

design of optimal shapes
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)

design optimization see: Multidisciplinary — 

design optimization in CFD
[90C90]
(see: Design optimization in computational fluid dynamics)

Design optimization in computational fluid dynamics 
(90C90) 
(referred to in: Interval analysis: application to chemical 
engineering design problems; Multidisciplinary design 
optimization; Multilevel methods for optimal design; 
Optimal design of composite structures; Optimal design in 
nonlinear optics; Structural optimization: history) 
(refers to: Bilevel programming: applications in engineering; 
Interval analysis: application to chemical engineering 
design problems; Multidisciplinary design optimization; 
Multilevel methods for optimal design; Optimal design of 
composite structures; Optimal design in nonlinear optics; 
Structural optimization: history) 

design optimization process see: automated — 

design pattern based model see: GIS —

design problem 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 

design problem see: multiple criteria —; network —; 
Production-distribution system —; survivable network — 

design problems 
[90C29] 
(see: Multiple objective programming support) 

design problems see: Interval analysis: application to chemical 
engineering —; Network —; optimal — 


Design of robust model-based controllers via parametric 
programming 

design and schedule construction see: network — 

design and scheduling of batch processes see: MINLP: — 

design space
[90C90]
(see: Design optimization in computational fluid dynamics)
design stage see: conceptual —; detailed —; preliminary — 
design superstructure
[90C90]
(see: MINLP: heat exchanger network synthesis)
design of a supply chain see: strategic — 
design, synthesis and control see: interaction of — 
design under uncertainty
[49M37, 90C11, 90C30, 90C90]
(see: Mixed integer nonlinear programming; Successive
quadratic programming: applications in the process
industry)

design under uncertainty see: Global optimization in batch —; 
process synthesis and — 

design using flexible templates see: De novo protein — 

design variables
[90C90, 91B28]
(see: Robust optimization)
design variables see: discrete — 
designs see: subset interconnection — 
design using rigid templates see: De novo protein —
destructive method
[90B35]
(see: Job-shop scheduling problem)
det problem see: max- — 
detailed design stage
[90C90]
(see: Design optimization in computational fluid dynamics)
detecting redundancy see: deterministic method for —;
probabilistic method for —
detection see: low-level feature — 
detection via semidefinite programming see: Maximum
likelihood —
detention costs
(see: Railroad crew scheduling)
determinant expansion of a matrix see: standard — 
determinant maximization see: Semidefinite programming
and —
determination see: molecular structure —; orbits — 
determination of clusters size threshold 

[92C05, 92C40] 

(see: Protein loop structure prediction methods) 
determination: convex global underestimation see: Molecular
structure —
determination of rmsd threshold 

[92C05, 92C40] 

(see: Protein loop structure prediction methods) 
determined graph see: rank — 
determined system of nonlinear equations see: well- — 
determined variable see: strongly — 

Determining the optimal number of clusters 

(90C26, 91C20, 68T20, 68W10, 90C11, 92-08, 92C05, 92D10) 
deterministic 

[90C60] 

(see: Complexity classes in optimization) 
deterministic
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
deterministic algorithm see: polynomial time —; sequential — 
deterministic cutting plane methods see: regularization of — 
deterministic equivalent model 
[90C30, 90C35] 
(see: Optimization in water resources) 
deterministic equivalent problem 
[90C15] 
(see: Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 
deterministic global optimization see: LP strategy for 
interval-Newton method in —; Mixed integer nonlinear 
bilevel programming: — 
deterministic global optimization algorithm 
[90C26]
(see: Bilevel optimization: feasibility test and flexibility 
index) 
deterministic method for detecting redundancy 
[90C05, 90C20]
(see: Redundancy in nonlinear programs) 
deterministic neural network 
[90C27, 90C30]
(see: Neural networks for combinatorial optimization) 
deterministic optimization 
[65K05]
(see: Direct global optimization algorithm) 
Deterministic and probabilistic optimization models for data 
classification 
(65K05, 55R15, 55R35, 90C11) 
(referred to in: Linear programming models for 
classification; Mixed integer classification problems) 
deterministic problem see: static —; underlying — 
deterministic shortest path problem 
[49L20, 90C39, 90C40]
(see: Dynamic programming: infinite horizon problems, 
overview) 
deterministic Turing machine 
[90C60] 
(see: Complexity theory) 
deterministic Turing machine see: space complexity of a —; 
time complexity of a — 
development see: algorithmic —; model — 
development and evaluation see: software — 
deviation see: external —; internal —; least absolute —; 
maximum absolute —; mean absolute — 
device see: local search — 
devices and related techniques see: acceleration — 
DEXPTIME 
[90C60] 
(see: Complexity classes in optimization) 
DFBB 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 
DFP method 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 


DFP update 
[90C30] 
(see: Broyden family of methods and the BFGS update) 
diagnosing and tracing infeasibilities 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems; Planning in 
the process industry) 
diagnosis see: breast cancer —; medical — 
diagnosis: optimization-based methods see: Disease — 
diagnostic rotation 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding)
diagnostic rotations 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding)
diagonal 
[47J20, 49J40, 65K10, 90C33]
(see: Interval analysis for optimization of dynamical
systems; Solution methods for multivalued variational
inequalities)
diagonal see: negative main —; quasi- — 
diagonal dominance condition 
[90C30, 90C52, 90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms) 
diagonal matrix 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
diagonal model 
[91B50] 
(see: Financial equilibrium) 
diagonal operator see: block-0- —; off-0- — 
diagonal pivot 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
diagonal shift matrix 
[49M37, 65K10, 90C26, 90C30] 
(see: αBB algorithm; QBB global optimization method)
diagonal underestimation matrix 
[49M37, 65K10, 90C26, 90C30] 
(see: QBB global optimization method) 
diagram see: composition interval —; conceptual —; 
cumulative sum —; farthest-point Voronoi —; Hasse —; 
temperature interval —; Voronoi — 
diagrams see: Voronoi — 
diagrams in facility location see: Voronoi — 
DIAL see: S- — 
dial-a-ride 
[90B06] 
(see: Vehicle routing) 
dial-a-ride see: m- — 
diameter 
[90C35] 
(see: Multi-index transportation problems) 
dichotomy see: generalized-upper-bound —; GUB —;
variable — 
dicycle 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
dielectric structures see: Global optimization of planar 
multilayered — 
Dienes implication see: Kleene- — 
difference see: minimum composition —; temporal — 
difference approximation see: finite- — 
difference convex function 
[26B25, 26E25, 49J40, 49J52, 49M05, 49S05, 74G99, 74H99,
74Pxx, 90C99] 
(see: Quasidifferentiable optimization; Quasidifferentiable 
optimization: variational formulations) 
difference of convex functions 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD
functions) 
difference equation 
[93-XX] 
(see: Dynamic programming: optimal control applications)
difference estimate 
[90C15] 
(see: Derivatives of probability measures)
difference of max-type functions 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD
functions) 
difference methods see: finite — 
difference of monotonic functions 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization)
difference quotients 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians)
difference of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
difference sublinear 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations)
difference sublinear function 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations) 
differences see: divided —; finite — 
differences of convex sets 
[26E25, 49J52, 52A27, 90C99] 
(see: Quasidifferentiable optimization: Dini derivatives,
Clarke derivatives)
differencing 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
different 
(see: Planning in the process industry) 
differentiability see: direct —; inverse —; Minimax: 
directional — 
differentiable 
[65K05, 90C15, 90C30, 90Cxx] 
(see: Derivatives of probability measures; Dini and
Hadamard derivatives in optimization; Image space
approach to optimization)

differentiable see: Dini —; Dini directionally —;
directionally —; Hadamard —; Hadamard directionally —;
process —; strictly —

differentiable convex programming 
[90C30] 
(see: Lagrangian duality: BASICS) 

differentiable exact penalty function approach see: 
continuously — 

differentiable family of measures see: weakly L (v)- — 

differentiable function 
[26B25, 26E25, 49J52, 90C99]
(see: Quasidifferentiable optimization) 

differentiable function 
[90C30] 
(see: Frank-Wolfe algorithm) 

differentiable function see: C- —; Dini —; Dini conditionally —; 
Dini conditionally directionally —; Dini directionally —; Dini 
uniformly —; Dini uniformly directionally —; 
directionally —; Fréchet —; Hadamard —; Hadamard 
conditionally —; Hadamard conditionally directionally —; 
Hadamard directionally —; piecewise —; piecewise 
continuously —; piecewise twice- — 

Differentiable Functions and Applications see: Minimization
Methods for Non- —

differentiable (GD) function see: generalized — 

differentiable MINLPs see: twice- — 

differentiable NLPs see: Twice- — 

differentiable part of a function see: twice- — 


differential 
[01A99] 
(see: Leibniz, Gottfried Wilhelm)

differential see: C- —; Clarke directional —; generalized 
directional —; limiting —; one-sided —; Rockafellar 
directional — 


differential and algebraic equations 
[49M37, 65L99, 90C11, 93-XX] 
(see: MINLP: applications in the interaction of design and 
control; Optimization strategies for dynamic systems) 
differential cost 
[49L99]
(see: Dynamic programming: average cost per stage 
problems) 
differential cost vector 
[49L99]
(see: Dynamic programming: average cost per stage 
problems) 
differential dynamic programming 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 
differential equation 
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: differential equations) 
differential equation see: Knizhnik-Zamolodchikov —; 
stochastic — 
differential equations
[34-xx, 34Bxx, 34Lxx, 93E24] 
(see: Complexity and large-scale least squares problems) 

differential equations see: Duality in optimal control with first 
order —; Eigenvalue enclosures for ordinary —; First order 
partial —; Interval analysis: —; ordinary —; partial — 

Differential equations and global optimization 
(60G35, 65K05) 
(referred to in: αBB algorithm; Continuous global
optimization: applications; Continuous global 
optimization: models, algorithms and software; Direct 
global optimization algorithm; Global optimization based 
on statistical models; Global optimization in binary star 
astronomy; Global optimization methods for systems of 
nonlinear equations; Global optimization using space 
filling; Topology of global optimization) 
(refers to: αBB algorithm; Continuous global optimization:
applications; Continuous global optimization: models, 
algorithms and software; Direct global optimization 
algorithm; Global optimization based on statistical models; 
Global optimization in binary star astronomy; Global 
optimization methods for systems of nonlinear equations; 
Global optimization using space filling; Simulated 
annealing methods in protein folding; Topology of global 
optimization) 

differential stability see: Sensitivity and stability in NLP: 
continuity and — 

differentiation 
[01A99] 
(see: Leibniz, Gottfried Wilhelm)

differentiation 
[01A99, 65H99, 65K99] 
(see: Automatic differentiation: point and interval; Leibniz,
Gottfried Wilhelm)

differentiation see: algorithmic —; analytical —; automatic —; 
backward mode in automatic —; computational —; forward 
mode of automatic —; goal-oriented —; internal 
numerical —; interval automatic —; Nonlocal sensitivity 
analysis with automatic —; numerical —; reverse —; reverse 
mode automatic —; symbolic —; vector forward 
automatic — 

differentiation arithmetic 
[90C26, 90C30] 
(see: Bounding derivative ranges) 

differentiation: calculation of the Hessian see: Automatic — 

differentiation: calculation of Newton steps see: Automatic — 

differentiation: geometry of satellites and tracking stations see: 
Automatic — 

differentiation: introduction, history and rounding error 
estimation see: Automatic — 

differentiation: parallel computation see: Automatic — 

differentiation: point and interval see: Automatic — 

differentiation: point and interval Taylor operators see:
Automatic —

differentiation: root problem and branch problem see: 
Automatic — 

difficult optimization problems see: Modeling — 

difficulties in bilinear programming 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 


diffraction data see: Optimization techniques for phase 
retrieval based on single-crystal X-ray — 

diffusion equation 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 

diffusion equation see: nonlinear — 

diffusion equation method 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 

diffusion flux models see: estimation of — 

diffusion fluxes see: estimation of 1D- — 

diffusion process 
[78M50, 90B50, 91B28] 
(see: Global optimization algorithms for financial planning 
problems; Laplace method and applications to optimization 
problems) 

digraph 
[03B52, 03E72, 05C05, 05C40, 47S40, 68R10, 68T27, 68T35,
68Uxx, 90Bxx, 90C09, 90C10, 90C35, 91Axx, 91B06, 92C60]
(see: Boolean and fuzzy relations; Combinatorial matrix 
analysis; Network design problems) 

digraph see: circuit of a —; complete —; directed cycle of a —;
min-max —; set of edges of a —; signed —; strongly 
connected —; strongly connected components of a —; 
vertex of a — 

digraph representation
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

dihedral angle 

[92B05] 

(see: Genetic algorithms for protein structure prediction)


dijoin 

[90C35] 

(see: Feedback set problems) 

dilatation 

[90C15, 90C29] 

(see: Discretely distributed stochastic programs: descent
directions and efficient points) 

dilation see: space — 

dimension 
[90C30] 
(see: Simplicial decomposition) 

dimension algorithm see: dimension-by- — 

dimension-by-dimension algorithm 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 

dimension pivoting algorithms see: varying — 

dimensional Brownian motion see: N- — 

dimensional control problem see: finite- — 

dimensional cube connected cycle see: k- — 

dimensional generalized order complementarity problem see: 
infinite- — 

dimensional grid see: 2- — 

dimensional hypercube see: d- — 

dimensional integration see: high- — 

dimensional knapsack problem see: m- — 




dimensional linear program see: finite- — 
dimensional linear programming see: infinite- — 
dimensional marginal probability distribution function see: 
one- —; two- — 
DIMENSIONAL MATCHING see: 3- — 
dimensional matching problem see: 3- — 
dimensional models for entropy optimization for image 
reconstruction see: finite- — 
dimensional nonlinear equation see: one- — 
dimensional optimization see: infinite- — 
dimensional subspace see: finite- — 
dimensional symmetric interval matrix 
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: eigenvalue bounds of interval 
matrices) 
dimensional torus see: 2- —; d- — 
dimensional transportation problem see: three- — 
dimensional variational inequality problem see: finite- — 
dimensional vectors see: lexicographical ordering for n- — 
dimensionality see: cures of —; curse of — 
Dini codifferentiable function 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD
functions) 
Dini conditional lower derivative 
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Dini conditional upper derivative 
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization) 
Dini conditionally differentiable function 
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Dini conditionally directionally differentiable function 
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Dini derivative 
[26E25, 49J52, 52A27, 65K05, 90C99, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization;
Quasidifferentiable optimization: Dini derivatives, Clarke
derivatives; Quasidifferentiable optimization: optimality
conditions)
Dini derivative 
[26E25, 49J52, 52A27, 90C99] 
(see: Quasidifferentiable optimization: Dini derivatives,
Clarke derivatives)
Dini derivatives, Clarke derivatives see: Quasidifferentiable
optimization: —
Dini differentiable 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
Dini differentiable function 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 
Dini directional derivatives 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 


Dini directionally differentiable
[65K05, 90C30, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization; 
Minimax: directional differentiability) 

Dini directionally differentiable function 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 

Dini and Hadamard derivatives in optimization 
(90Cxx, 65K05) 
(referred to in: Global optimization: envelope 
representation; Nondifferentiable optimization; 
Nondifferentiable optimization: cutting plane methods; 
Nondifferentiable optimization: minimax problems; 
Nondifferentiable optimization: Newton method; 
Nondifferentiable optimization: parametric programming; 
Nondifferentiable optimization: relaxation methods; 
Nondifferentiable optimization: subgradient optimization 
methods; Quasidifferentiable optimization: optimality 
conditions) 
(refers to: Global optimization: envelope representation; 
Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: minimax problems; Nondifferentiable 
optimization: Newton method; Nondifferentiable 
optimization: parametric programming; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods) 

Dini lower derivative 

[26E25, 49J52, 52A27, 90C99]
(see: Quasidifferentiable optimization: Dini derivatives,
Clarke derivatives)

Dini lower directional derivative 

[65K05, 90Cxx]

(see: Dini and Hadamard derivatives in optimization) 

Dini quasidifferentiable function 

[90Cxx]
(see: Quasidifferentiable optimization: optimality 
conditions) 

Dini quasidifferential 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 

Dini steepest ascent direction 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 

Dini steepest descent direction 

[65K05, 90Cxx]

(see: Dini and Hadamard derivatives in optimization) 

Dini sup-stationary point 

[65K05, 90Cxx]

(see: Dini and Hadamard derivatives in optimization) 

Dini uniformly differentiable function 

[65K05, 90Cxx]

(see: Dini and Hadamard derivatives in optimization) 

Dini uniformly directionally differentiable function 

[26E25, 49J52, 52A27, 90C99]
(see: Quasidifferentiable optimization: Dini derivatives,
Clarke derivatives)

Dini upper derivative 

[26E25, 49J52, 52A27, 90C99]
(see: Quasidifferentiable optimization: Dini derivatives,
Clarke derivatives)
Dini upper directional derivative 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
Dinkelbach algorithm 
[90C32]
(see: Quadratic fractional programming: Dinkelbach 
method) 
Dinkelbach method 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
Dinkelbach method 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization; Quadratic 
fractional programming: Dinkelbach method) 
Dinkelbach method see: Quadratic fractional programming: — 
Diophantine approximation problem see: simultaneous — 
Diophantine equations
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least
squares)

dipole moment
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)

direct approach
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)

direct collision
(see: Broadcast scheduling problem)

direct differentiability
[90C15]
(see: Derivatives of probability measures)

Direct global optimization algorithm
(65K05)
(referred to in: αBB algorithm; Continuous global
optimization: applications; Continuous global
optimization: models, algorithms and software; Differential
equations and global optimization; Global optimization
based on statistical models; Global optimization in binary
star astronomy; Global optimization methods for systems
of nonlinear equations; Global optimization using space
filling; Topology of global optimization)
(refers to: αBB algorithm; Continuous global optimization:
applications; Continuous global optimization: models,
algorithms and software; Differential equations and global
optimization; Global optimization based on statistical
models; Global optimization in binary star astronomy;
Global optimization methods for systems of nonlinear
equations; Global optimization using space filling;
Topology of global optimization)

direct iteration
[65H10, 65J15]
(see: Contraction-mapping)

direct search
[90C30]
(see: Suboptimal control)

Direct search Luus–Jaakola optimization procedure
(93-XX)
(referred to in: Interval analysis: unconstrained and
constrained optimization)
(refers to: Interval analysis: unconstrained and constrained
optimization)

direct search optimization
[93-XX]
(see: Direct search Luus–Jaakola optimization procedure)

direct-sequential
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)

directed arc in a directed network
[90C35]
(see: Maximum flow problem)

directed arc in a network
[90C35]
(see: Minimum cost flow problem)

directed capacitated network
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules)

directed Chinese postman problem 

[90C35] 

(see: Minimum cost flow problem)

directed cycle see: cost of a — 

directed cycle of a digraph 

[90C09, 90C10] 

(see: Combinatorial matrix analysis) 

directed divergence 

[90C25, 94A17]

(see: Entropy optimization: Shannon measure of entropy
and its properties)

directed graph 

[90C09, 90C10] 

(see: Combinatorial matrix analysis; Oriented matroids)

directed network 

[90C35] 

(see: Maximum flow problem; Minimum cost flow problem)

directed network see: directed arc in a —; endpoint of an arc in
a —; node in a —

directed path 

[90C35] 

(see: Maximum flow problem; Minimum cost flow problem)

directed tree 

[05C85] 

(see: Directed tree networks)

Directed tree networks 

(05C85) 

(referred to in: Auction algorithms; Bottleneck Steiner tree
problems; Capacitated minimum spanning trees;
Communication network assignment problem; Dynamic 
traffic networks; Equilibrium networks; Generalized 
networks; Maximum flow problem; Minimax game tree 
searching; Minimum cost flow problem; Multicommodity 
flow problems; Network design problems; Network 
location: covering problems; Nonconvex network flow 
problems; Piecewise linear network flow problems; Shortest 
path tree algorithms; Steiner tree problems; Stochastic 
network problems: massively parallel solution; Survivable 
networks; Traffic network equilibrium) 

directed walk 
[90C35] 
(see: Minimum cost flow problem) 
direction
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 

direction see: ascendant —; away —; centering —; compute 
the search —; coordinate —; critical —; descent —; Dini 
steepest ascent —; Dini steepest descent —; feasible —; 
feasible ascendant —; Hadamard steepest ascent —; 
Hadamard steepest descent —; high-order critical —; 
high-regular critical —; hyperspherical —; improving 
feasible —; jump —; quasi-Newtonian descent —; 
search —; soaring —; steepest ascent —; steepest 
descent — 

direction angles 
[26A24, 65K99, 85-08] 
(see: Automatic differentiation: geometry of satellites and 
tracking stations) 

direction computation 
[49M07, 49M10, 65K, 90C06] 
(see: New hybrid conjugate gradient algorithms for 
unconstrained optimization) 

direction of descent 

[90C30] 

(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)

direction finding problem 

[90C30] 

(see: Frank-Wolfe algorithm)


direction finding problem see: regularized — 

direction hit and run see: hyperspheres — 

direction method see: reference — 

direction method for nonlinear programming see: feasible — 

direction methods see: feasible — 

direction of negative curvature 
[90C06] 
(see: Large scale unconstrained optimization) 

direction, preserving an activity 
[90Cxx] 
(see: Discontinuous optimization) 

direction subclass see: conjugate — 

direction vector see: reference — 

directional Clarke derivative 
[35A15, 47J20, 49J40] 
(see: Hemivariational inequalities: static problems) 

directional derivative 
[26B25, 26E25, 46N10, 49J52, 90-00, 90C26, 90C30, 90C31, 
90C47, 90C99] 
(see: Global optimization: envelope representation; 
Lagrangian duality: BASICS; Nondifferentiable 
optimization; Quasidifferentiable optimization; Sensitivity 
and stability in NLP: continuity and differential stability) 

directional derivative 
[65K05, 90C30, 90Cxx] 
(see: Minimax: directional differentiability; 
Quasidifferentiable optimization: optimality conditions) 

directional derivative see: Clarke generalized —; Dini lower —; 
Dini upper —; generalized —; generalized second order —; 
Hadamard lower —; Hadamard upper —; kth — 


directional derivatives 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
directional derivatives see: Dini —; Hadamard —; 
high-order —; higher-order —; lower and upper — 
directional differentiability see: Minimax: — 
directional differential see: Clarke —; generalized —; 
Rockafellar — 
directional SOCQ 
[49K27, 49K40, 90C30, 90C31] 
(see: Second order constraint qualifications) 
directionally differentiable 
[90C30, 90C31, 90C34, 90C46] 
(see: Generalized semi-infinite programming: optimality 
conditions; Image space approach to optimization) 
directionally differentiable see: Dini —; Hadamard —
directionally differentiable function 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 
directionally differentiable function see: Dini —; Dini 
conditionally —; Dini uniformly —; Hadamard —; 
Hadamard conditionally — 
directions see: combined method of feasible —; cone of 
critical —; cone of feasible —; construction of descent —; 
methods of feasible —; orthogonal search —; steep — 
directions and efficient points see: Discretely distributed 
stochastic programs: descent — 
directions in interval branch and bound methods see: Interval 
analysis: subdivision — 
directions of P see: covers all edge- — 
directive decomposition see: price- —; resource- — 
directly left-reachable 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching) 
directly right-reachable 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching) 
Dirichlet distribution 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
Dirichlet problem see: nonsmooth — 
disaggregate simplicial decomposition 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 
disaggregated representation 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 
Disaggregation 
(see: Optimal planning of offshore oilfield infrastructure) 
disaggregation see: preference — 
disaggregation analysis see: preference — 
disaggregation approach see: preference — 
disaggregation approach: basic features, examples from 
financial decision making see: Preference — 
disaggregation method

[90C29, 91A99] 

(see: Preference disaggregation) 
disaggregation in multi-objective optimization 

[90C29, 91A99]

(see: Preference disaggregation) 
disaggregation paradigm 

[90C29, 91A99]

(see: Preference disaggregation) 
disaggregation of preferences 

[91B06, 91B60] 

(see: Financial applications of multicriteria analysis) 
disaggregation system see: interactive — 
disaggregation under uncertainty 
[90C29, 91A99]
(see: Preference disaggregation) 
disallowed node
[68T99, 90C27]
(see: Capacitated minimum spanning trees)
discarding far-from-native conformations
[92C05, 92C40]
(see: Protein loop structure prediction methods)
disconnected matroid
[90C09, 90C10]
(see: Matroids)
discontinuous function
[93-XX]
(see: Direct search Luus–Jaakola optimization procedure)
Discontinuous optimization 

(90Cxx) 

(referred to in: Nondifferentiable optimization) 

(refers to: Conjugate-gradient methods; Gauss-Newton
method: Least squares, relation to Newton’s method;
Nondifferentiable optimization)
discontinuous optimization 
[90Cxx]

(see: Discontinuous optimization) 
discordance
[91B06, 91B60]

(see: Financial applications of multicriteria analysis) 
discordance 

[90-XX]

(see: Outranking methods) 
discordance see: concordance- — 
discordant coalition 

[90-XX]

(see: Outranking methods) 

discount factor
[49L20, 49L99, 90C39, 90C40]
(see: Dynamic programming: average cost per stage
problems; Dynamic programming: discounted problems;
Dynamic programming: infinite horizon problems,
overview; Dynamic programming: undiscounted problems)
discounted infinite horizon problem 

[49L20]

(see: Dynamic programming: inventory control) 
discounted problem 

[49L20, 49L99, 90C40]




(see: Dynamic programming: average cost per stage 
problems; Dynamic programming: stochastic shortest path 
problems) 
discounted problem 
[49L20, 90C39, 90C40]
(see: Dynamic programming: discounted problems; 
Dynamic programming: undiscounted problems) 
discounted problem with bounded cost per stage 
[49L20, 90C39, 90C40] 
(see: Dynamic programming: infinite horizon problems, 
overview) 
discounted problems see: Dynamic programming: — 
Discovery see: logic of Scientific — 
discrepancy search see: limited — 
discrete approximation 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals) 
discrete-continuous global optimization see: mixed — 
discrete-continuous optimization problems see: Continuous 
reformulations of — 
discrete convergence 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals) 
discrete convergence see: weak — 
discrete convex analysis 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
discrete convex analysis 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
discrete decisions in dynamic optimization 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
discrete design variables 
[90C26, 90C90] 
(see: Structural optimization: history) 
discrete distributions see: Logconcavity of — 
discrete dynamic complementarity problem 
[90C33] 
(see: Order complementarity) 
discrete dynamical systems 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
discrete ε-global local maximizers see: set of —
discrete event dynamic system 
[90C15] 
(see: Stochastic quasigradient methods: applications) 
discrete filled function 
[65K05, 90C26, 90C30, 90C59] 
(see: Global optimization: filled function methods) 
discrete free variables see: Generalized geometric 
programming: mixed continuous and — 
discrete function
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 
discrete functions 
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 
discrete global maximizer
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods) 
discrete left local maximizer
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods) 
discrete location and assignment 
[68M20, 90B06, 90B10, 90B35, 90B80, 90B85, 90C10, 90C27,
90Cxx, 91Axx, 91Bxx]
(see: Facility location with externalities; Vehicle scheduling) 
discrete location problem 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location)
discrete logconcave distributions 
[90C15] 
(see: Logconcavity of discrete distributions) 
discrete measure 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals) 
discrete midpoint convexity 
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions) 
discrete model in OR 
[90B80, 90B85]
(see: Warehouse location problem) 
discrete monotonic optimization problems 
[65K05, 90C26, 90C30]
(see: Monotonic optimization) 
discrete Mosco convergence 
[90C15]
(see: Approximation of extremum problems with 
probability functionals) 
discrete multiple criteria problem
[90C29]
(see: Multiple objective programming support) 
discrete neighborhood
[65K05, 90C05, 90C25, 90C26, 90C30, 90C34, 90C59]
(see: Global optimization: filled function methods; 
Semi-infinite programming: discretization methods) 
discrete optimization 
[62C10, 65K05, 90C10, 90C15, 90C26]
(see: Bayesian global optimization) 
discrete optimization see: Convex — 
discrete optimization oracle see: linear — 
discrete Painlevé-Kuratowski convergence 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals)
discrete polyblock algorithm 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 

discrete probability distribution see: logconcave —; 
logconcave univariate — 

discrete resource allocation problem 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 

discrete separation theorem see: Frank — 

discrete single-commodity single-criterion uncapacitated static 
multifacility 
[90B80, 90B85] 
(see: Warehouse location problem) 

Discrete stochastic optimization 
(90C15, 90C27) 
(referred to in: Derivatives of markov processes and their 
simulation; Derivatives of probability and integral 
functions: general theory and examples; Derivatives of 
probability measures) 
(refers to: Derivatives of markov processes and their 
simulation; Derivatives of probability and integral 
functions: general theory and examples; Derivatives of 
probability measures; Integer programming: branch and 
bound methods; Optimization in operation of electric and 
energy power systems; Simulated annealing; Stochastic 
integer programming: continuity, stability, rates of 
convergence) 

discrete-time algorithms 
[90B15] 
(see: Dynamic traffic networks)

discrete-time formulations 
[90C26] 
(see: MINLP: design and scheduling of batch processes) 

discrete Time Model 
(see: Integrated planning and scheduling) 

discrete-time models 
(see: Planning in the process industry) 

discrete time models see: continuous and — 

Discrete-Time Optimal Control 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 

discrete-time systems
[39A11, 93C55, 93D09]
(see: Robust control: schur stability of polytopes of
polynomials)

discrete truncated Newton method
[90C06]
(see: Large scale unconstrained optimization)

discrete variables
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)

discretely distributed stochastic programs
[90C15, 90C29]
(see: Discretely distributed stochastic programs: descent
directions and efficient points)
Discretely distributed stochastic programs: descent directions
and efficient points 
(90C15, 90C29) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Extremum problems with probability 
functions: kernel type solution methods; General moment 
optimization problems; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; Multistage stochastic programming: barycentric 
approximation; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Simple recourse problem: dual method; Simple 
recourse problem: primal method; Stabilization of cutting 
plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Extremum problems with probability 
functions: kernel type solution methods; General moment 
optimization problems; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; Multistage stochastic programming: barycentric 
approximation; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Simple recourse problem: dual method; Simple 
recourse problem: primal method; Stabilization of cutting 
plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic
programming: quasigradient method; Two-stage stochastic
programs with recourse) 
discretization 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
discretization
[90B85]
(see: Multifacility and restricted location problems) 
discretization see: full —; partial —; uniform time —
discretization method
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)
discretization methods see: Semi-infinite programming: —
discretization of optimization problems
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)
discretization procedure see: stochastic — 
discretized hemivariational inequalities for nonlinear material
laws
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in
mechanics)
discretized optimal control problems
[49K20, 49M99, 90C55]
(see: Sequential quadratic programming: interior point
methods for distributed optimal control problems)
discretized SIP problem
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)
discretized SIP problem see: nonlinear — 
discriminant
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)
discriminant analysis
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based methods)
discriminant functions 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods)
(discriminant problem) see: g-group classification problem — 
discrimination 
[90C29] 
(see: Multicriteria sorting methods)
discrimination see: hierarchical —; multigroup hierarchical — 
Disease diagnosis: optimization-based methods 
(90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 90C90, 90-08,
65K05) 
disjoint 
(see: Bilinear programming)
disjoint see: link-diverse/ — 
disjoint constraints 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution) 
disjoint path see: edge- —; node- — 
disjunction 
[90C09, 90C10, 90C11] 
(see: Disjunctive programming)

disjunction arc
[90B35] 
(see: Job-shop scheduling problem) 

Disjunctions 
(see: Logic-based outer approximation) 

disjunctions see: Convex hull — 

disjunctive cut principle 
[90C09, 90C10, 90C11] 
(see: Disjunctive programming) 

disjunctive cutting plane approach see: Mixed-integer 
nonlinear optimization: A — 

disjunctive inequality 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and cut algorithms) 

disjunctive normal form 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T15, 68T27, 68T30, 
90C09, 90C10] 
(see: Checklist paradigm semantics for fuzzy logics; Finite 
complete systems of many-valued logic algebras; Inference 
of monotone boolean functions; Optimization in boolean 
classification problems; Optimization in classifying text 
documents) 

disjunctive normal form 
[90C09, 90C10] 
(see: Inference of monotone boolean functions; 
Optimization in boolean classification problems; 
Optimization in classifying text documents) 

disjunctive OA master problem 
[90C09, 90C10, 90C11] 
(see: MINLP: logic-based methods) 

disjunctive program see: facial — 

Disjunctive programming 
(90C09, 90C10, 90C11) 
(referred to in: Continuous reformulations of 
discrete-continuous optimization problems; MINLP: 
branch and bound global optimization algorithm; MINLP: 
branch and bound methods; MINLP: global optimization 
with αBB; MINLP: logic-based methods)
(refers to: MINLP: branch and bound global optimization 
algorithm; MINLP: branch and bound methods; MINLP: 
global optimization with αBB; MINLP: logic-based
methods; Reformulation-linearization technique for global 
optimization) 

disjunctive programming 
[90C10, 90C11, 90C27, 90C30, 90C33, 90C57] 
(see: Integer programming; Logic-based outer 
approximation; Optimization with equilibrium constraints: 
A piecewise SQP approach) 

disjunctive programming 
[90C09, 90C10, 90C11, 90C27, 90C30, 90C33, 90C57] 
(see: Disjunctive programming; Integer programming; 
MINLP: logic-based methods; Optimization with 
equilibrium constraints: A piecewise SQP approach; Set 
covering, packing and partitioning problems) 

disjunctive programming see: Generalized — 

disk graphs 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 

disk graphs see: bisectored unit —; bounded ratio —; 
double —; Optimization problems in unit- —; unit- — 

disk) representation see: geometric (or — 


dispatch problem 
[90B50] 
(see: Optimization and decision support systems) 
dispatcher see: load — 
dispatching see: load — 
displacement see: kinematically admissible — 
displacement compatibility equations see: strain- — 
displacements see: method of simultaneous —; method of 
successive —; virtual — 
dissection see: nested —; tree — 
dissimilarities 
[65K05, 90C27, 90C30, 90C57, 91C15] 
(see: Optimization-based visualization) 
dissimilarity measure 
[62H30, 90C27] 
(see: Assignment methods in clustering) 
distance see: bond —; elliptic —; euclidean —; inter-class —; 
intra-class —; Ly- —; Manhattan —; maximizing 
minimum —; maximum weighted —; method of 
optimal —; nonbonded —; rectilinear —; relative —
distance constrained labeling 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
distance-constrained vehicle routing problem 
[90B06] 
(see: Vehicle routing) 
Distance dependent protein force field via linear optimization 
distance of a feasible point to a solution point see: bounds on 
the — 
distance function see: least squares — 
distance functions 
[41A30, 47A99, 65K10]
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
distance geometry problem 
[65D18, 90B85, 90C26]
(see: Global optimization in location problems) 
distance geometry problem see: Molecular — 
distance in a graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization) 
distance label 
[90B10, 90C27, 90C35]
(see: Maximum flow problem; Shortest path tree 
algorithms) 
distance location see: Single facility location: multi-objective 
euclidean —; Single facility location: multi-objective 
rectilinear — 
distance location problem see: Euclidean —; iterative solution 
of the Euclidean —; rectilinear —; squared Euclidean — 
distance matrix 
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems) 
distance matrix see: Euclidean —; partial — 
distance matrix completion problem 
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems) 
distance matrix completion problem see: Euclidean — 
distance measure 
[90B80, 90B85]
(see: Warehouse location problem) 
distance scaling method
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
distances see: euclidean —; Manhattan —; minkowski —; 
Optimizing facility location with euclidean and 
rectilinear —; positiveness of — 
distillation 
[90C30, 90C90] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes; Successive quadratic 
programming: applications in distillation systems) 
distillation 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in 
distillation systems) 
distillation see: reactive — 
distillation column synthesis see: MINLP: reactive — 
distillation superstructure 
[90C90] 
(see: MINLP: reactive distillation column synthesis) 
distillation systems see: Successive quadratic programming: 
applications in — 
distinguishable 
[62H30, 68T10, 90C11]
(see: Mixed integer classification problems) 
distinguished point
[90C26]
(see: Cutting plane methods for global optimization) 
distinguished solution
[90C05]
(see: Linear programming: Klee-Minty examples) 
distinguished tableau
[90C05]
(see: Linear programming: Klee-Minty examples) 
distinguished variable
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity 
problems) 
distributed computing
[90C30, 90C52, 90C53, 90C55]
(see: Asynchronous distributed optimization algorithms) 
distributed game tree search algorithm
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching) 
distributed memory parallel computer 
[65K05, 65Y05]
(see: Parallel computing: models) 
distributed memory parallel machines 
[65K05, 65Y05]
(see: Parallel computing: models) 
distributed optimal control problems
[49K20, 49M99, 90C55]
(see: Sequential quadratic programming: interior point 
methods for distributed optimal control problems) 
distributed optimal control problems see: Sequential quadratic 
programming: interior point methods for — 
distributed optimization algorithms see: Asynchronous — 

distributed state space search algorithm see: synchronized — 

distributed stochastic programs see: discretely — 

distributed stochastic programs: descent directions and 
efficient points see: Discretely — 

distributed systems see: boundary flux estimation in — 

distribution see: binomial —; Boltzmann —; Dirichlet —; 
Gaussian —; geometric —; hypergeometric —; incomplete 
knowledge of a probability —; law of normal —; Levy 
probability —; logconcave discrete probability —; 
logconcave univariate discrete probability —; multivariate 
gamma —; multivariate normal —; Poisson —; posterior —;
prior —; probability —; quasiconcave probability —; 
trinomial —; Tsallis probability —; uncertainty embedded in 
a probability —; uniform — 

distribution with algebraically decreasing tail see: RSM- — 

distribution density see: steady-state — 

distribution of efforts see: optimal — 

distribution function 
[60G35, 65K05] 
(see: Differential equations and global optimization) 

distribution function see: multivariate probability —; 
one-dimensional marginal probability —; two-dimensional 
marginal probability — 

distribution functions see: gradient of multivariate —; 
marginal — 

distribution graph see: alignment- — 

distribution law see: Gauss — 

distribution problems 
[90C35] 
(see: Minimum cost flow problem) 

distribution scheduling: an MILP model see: Gasoline blending 
and — 

distribution system design 
[90-02] 
(see: Operations research models for supply chain 
management and design) 

distribution system design problem see: Production- — 

distribution systems 
[90-02] 
(see: Operations research models for supply chain 
management and design) 

distribution systems planning 
[90C35] 
(see: Multicommodity flow problems) 

distributional derivatives 
[60J05, 90C15] 
(see: Derivatives of markov processes and their simulation; 
Derivatives of probability measures) 

distributions see: asymptotic results for RSM- —; discrete 
logconcave —; Logconcavity of discrete —; Stochastic 
linear programs with recourse and arbitrary multivariate — 

distributive lattice 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 

distributive lattice see: free — 

district see: double-ended crew —; single-ended crew — 

districting problem see: political — 

districts see: crew — 
distrust region method
[49M37, 90C11] 
(see: Mixed integer nonlinear programming) 
disutility functions see: traffic network equilibrium with 
travel — 
dive-and-fix 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
divergence see: Csiszar a- —; directed —; Itakura-Saito —;
Kullback-Leibler —
divergent series rule 
[49J52, 90C30]
(see: Nondifferentiable optimization: subgradient 
optimization methods) 
divergent series step-size rule 
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational 
inequalities) 
divergent series steplength rule 
[90C30] 
(see: Cost approximation algorithms) 
diverging trails 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
diverse/disjoint see: link- — 
diversification 
[68M20, 90B06, 90B35, 90B80, 90C59] 
(see: Flow shop scheduling problem; Heuristic and 
metaheuristic algorithms for the traveling salesman 
problem; Location routing problem) 
diversified investment decisions 
[68Q25, 91B28] 
(see: Competitive ratio for portfolio management) 
divide-and-conquer 
[90C09, 90C10, 90C26] 
(see: Combinatorial optimization algorithms in resource 
allocation problems; MINLP: branch and bound global 
optimization algorithm) 
divided differences 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
division algorithm 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods)
division multiplexing see: wavelength- — 
divisor 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
divisor of an arrangement of hyperplanes 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
divorcing 
(see: Bayesian networks)
d.m. function 
[65Kxx, 90C26, 90C31, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD 
functions; Robust global optimization) 
dM functions 
[90C26]
(see: D.C. programming) 
dM optimization 
[90C26, 90C31]
(see: D.C. programming; Robust global optimization) 
DNA mapping 
[90C35]
(see: Optimization in leveled graphs) 
DNA sequencing
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35]
(see: Algorithms for genomic analysis) 
DNA transcription element identification see: Mixed 0-1 linear 
programming approach for — 
DNF 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
DNF 
[90C09, 90C10] 
(see: Inference of monotone boolean functions; 
Optimization in boolean classification problems; 
Optimization in classifying text documents) 
DNF see: TAUT- — 
DNF clauses see: minimal number of — 
document 
[90C09, 90C10]
(see: Optimization in classifying text documents) 
document classification 
[90C09, 90C10]
(see: Optimization in classifying text documents) 
document classification see: automatic —; optimization in — 
document surrogate 
[90C09, 90C10]
(see: Optimization in classifying text documents) 
documentation 
[90C10, 90C30]
(see: Modeling languages in optimization: a new paradigm) 
documents see: automatic classification of —; classification of 
large collections of —; classification of text —; Optimization 
in classifying text — 
dog-lawed see: dead or — 
dogleg method 
[65C20, 65G20, 65G30, 65G40, 65H20, 90C06, 90C90]
(see: Interval analysis: application to chemical engineering 
design problems; Large scale unconstrained optimization) 
dogleg path 
[49M37, 90C30]
(see: Nonlinear least squares problems; Nonlinear least 
squares: trust region methods) 
dogleg path 
[49M37]
(see: Nonlinear least squares: trust region methods) 
dogleg path see: multiple — 
domain 
[49J40, 49J53, 47H05, 47H04, 26B25]
(see: Pseudomonotone maps: properties and applications) 
domain see: admissible —; feasible —; natural —
domain of a function 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
domain of a function see: effective — 
domain method see: fictitious — 
domain of search 
[90C26] 
(see: Global optimization using space filling) 
domains see: global optimization over unbounded — 
dominance 
[62H30, 90C27] 
(see: Assignment methods in clustering) 
dominance condition see: diagonal — 
dominance criterion 
[90C11, 90C31] 
(see: Multiparametric mixed integer linear programming) 
dominance relation 
[90-XX] 
(see: Outranking methods) 
dominated see: not — 
dominated family of measures 
[90C15] 
(see: Derivatives of probability measures) 
dominated point 
[90C29] 
(see: Multi-objective optimization: pareto optimal 
solutions, properties) 
dominating set 
[03B50, 05C15, 05C62, 05C69, 05C85, 68T15, 68T30, 90C27, 
90C59] 
(see: Finite complete systems of many-valued logic algebras; 
Optimization problems in unit-disk graphs) 
dominating set see: connected —; finite —; independent — 
domination analysis
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
Domination analysis in combinatorial optimization 
(90C27, 90C59, 68Q25, 68W40, 68R10) 
(referred to in: Traveling salesman problem) 
(refers to: Traveling salesman problem) 
domination number 
[05C15, 05C62, 05C69, 05C85, 68Q25, 68R10, 68W40, 90B06, 
90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 90C60, 
90C90] 
(see: Domination analysis in combinatorial optimization; 
Optimization problems in unit-disk graphs; Traveling 
salesman problem) 
domination property 
[90C29] 
(see: Multi-objective optimization: pareto optimal 
solutions, properties) 
domination ratio 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
door point see: trap- — 
Doppler effect 
[90C26, 90C90] 
(see: Global optimization in binary star astronomy) 
dose (eud) see: equivalent uniform — 


double color 
[05C85] 
(see: Directed tree networks) 
double description method 
[52B12, 68Q25] 
(see: Fourier-Motzkin elimination method)
double disk graphs 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs)
double-ended 
(see: Railroad crew scheduling) 
double-ended crew district 
(see: Railroad crew scheduling)
double-max duality 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization)
double-min duality 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
double pivot 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
double star 
[90C26, 90C90] 
(see: Global optimization in binary star astronomy)
double sweep method see: Aitken — 
double-well function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization)
doublet 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of the Hessian) 
doublet see: sparse — 
doubly connected edge list see: extended — 
doubly nonnegative matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems)
doubly stochastic matrix 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
Douglas-Rachford method
[47H05, 65J15, 90C25, 90C55]
(see: Fejér monotonicity in convex optimization) 
down penalty 
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods) 
downhill simplex method 
[90C30]
(see: Sequential simplex method) 
DP
[90C09, 90C10, 90C11]
(see: Disjunctive programming)
drawing see: automatic graph —; graph — 
Driebeck-Tomlin penalty
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)
drifting see: population — 
driven see: message- —
driver method see: bold- — 
dropped negatively 
[90Cxx] 
(see: Discontinuous optimization) 
dropped positively 
[90Cxx] 
(see: Discontinuous optimization) 
dropping see: column — 
dropping rule see: column — 
drought out events 
[90C30, 90C35] 
(see: Optimization in water resources) 
drug delivery systems see: Model based control for — 
DS 
[90C26] 
(see: Global optimization using space filling) 
DSD 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 
DSL function 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations)
DSL system 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations)
DSPACE 
[90C60] 
(see: Complexity classes in optimization)
DSS see: multicriteria — 
DTIME 
[90C60] 
(see: Complexity classes in optimization)
Du-Hwang minimax theorem 
[90C27] 
(see: Steiner tree problems)
dual 
[90B85, 90C27] 
(see: Single facility location: circle covering problem)
dual see: extended Lagrange-Slater —; formal perfect —; 
functional —; Lagrangian —; Mond-Weir —; primal- —; 
SSS* - —; strong —; superadditive —; surrogate —; 
Wolfe — 
dual action see: Clarke — 
dual algorithm see: generalized primal-relaxed —; primal- — 
dual approach see: Generalized primal-relaxed —; 
primal-relaxed — 
dual arc 
[90B35] 
(see: Job-shop scheduling problem) 
dual ascent 
[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 
dual block-angular structure 
[90C15] 
(see: L-shaped method for two-stage stochastic programs
with recourse; Stochastic programming: parallel 
factorization of structured matrices) 


dual bound-improvement 
[49M27, 90C11, 90C30] 
(see: MINLP: generalized cross decomposition) 
dual cone 
[90C15, 90C22, 90C25] 
(see: Copositive programming; Stochastic programming: 
nonanticipativity and lagrange multipliers) 
dual cut-improvement 
[49M27, 90C11, 90C30]
(see: MINLP: generalized cross decomposition) 
dual decomposition 
[90C10, 90C15]
(see: Stochastic integer programs) 
dual degenerate 
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules) 
dual degenerate basis 
[05B35, 65K05, 90C05, 90C20, 90C33]
(see: Lexicographic pivoting rules) 
dual descent 
[90C30, 90C90]
(see: Decomposition techniques for MILP: lagrangian 
relaxation) 
dual in entropy optimization see: unconstrained — 
dual Euler-Lagrange equation 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
dual exterior point algorithm 
[90C05] 
(see: Linear programming: Klee-Minty examples) 
dual feasibility 
[68W10, 90B15, 90C05, 90C06, 90C30, 90C31] 
(see: Parametric linear programming: cost simplex 
algorithm; Stochastic network problems: massively parallel 
solution) 
dual feasible set 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
dual framework see: primal- — 
dual information 
[49M37, 90C11] 
(see: MINLP: applications in the interaction of design and 
control) 
dual integral system see: totally — 
dual interior-point methods see: primal- — 
dual linear program 
[90C30] 
(see: Lagrangian duality: BASICS) 
dual linear programs 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
dual matroid 
[90C09, 90C10] 
(see: Matroids; Oriented matroids) 
dual method see: Simple recourse problem: — 
dual method for the simple recourse problem 
[90C06, 90C08, 90C15] 
(see: Simple recourse problem) 
dual methods see: primal- — 
dual optimization problem
[90C30] 
(see: Lagrangian duality: BASICS) 

dual optimization problem see: Lagrangian —

dual (or Minty) GVI
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational
inequalities)

dual pair
[90B35]
(see: Job-shop scheduling problem)

dual potential function
[37A35, 90C05, 90C25, 90C30]
(see: Potential reduction methods for linear programming;
Solving large scale and sparse semidefinite programs)

dual potential function see: primal- —

dual potential reduction algorithm see: primal- —

dual price
[68Q99]
(see: Branch and price: Integer programming with column
generation)

dual price increase
[90B10, 90C27]
(see: Shortest path tree algorithms)

dual problem
[15A39, 49-XX, 49K05, 49K10, 49K15, 49K20, 49L99, 90-XX,
90C05, 90C29, 90C30, 93-XX]
(see: Duality in optimal control with first order differential
equations; Duality theory: monoduality in convex
optimization; Dynamic programming: average cost per
stage problems; Image space approach to optimization;
Motzkin transposition theorem; Multi-objective
optimization: lagrange duality; Theorems of the alternative
and optimization)

dual problem
[90C05, 90C30]
(see: Theorems of the alternative and optimization)

dual problem see: construction of a —; generalized —;
Lagrangian —; nonconvex —

dual problems see: primal and —

dual procedures
[68T99, 90C27]
(see: Capacitated minimum spanning trees)

dual program
[90C06]
(see: Saddle point theory and optimality conditions)

dual programming problem
[90C06]
(see: Saddle point theory and optimality conditions)

dual properness
[90C25, 90C26]
(see: Decomposition in global optimization)

dual ray
[15A39, 90C05]
(see: Motzkin transposition theorem)

dual scaling
[90C25, 90C30]
(see: Solving large scale and sparse semidefinite programs)


dual-scaling algorithm 
[90C25, 90C30] 
(see: Solving large scale and sparse semidefinite programs) 
dual scaling algorithm see: primal- — 
dual-scalings algorithm 
[90C25, 90C30] 
(see: Solving large scale and sparse semidefinite programs) 
dual SD problem 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural
optimization) 
dual semi-infinite program 
[90C05, 90C34, 91B28] 
(see: Semi-infinite programming and applications in
finance; Semi-infinite programming: methods for linear 
problems) 
dual semi-infinite program 
[90C05, 90C34] 
(see: Semi-infinite programming: methods for linear 
problems) 
dual semidefinite program 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 
dual side see: proof on the — 
dual simplex 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
dual simplex algorithm 
[90C35] 
(see: Generalized networks) 
dual simplex algorithms see: primal and — 
dual simplex method see: lexicographic — 
dual slacks 
[49-XX, 90-XX, 90C05, 93-XX] 
(see: Duality theory: monoduality in convex optimization; 
Homogeneous selfdual methods for linear programming) 
dual solution see: primal- — 
dual solutions see: exploiting the interplay between primal 
and — 
dual space 
[90C05, 90C30] 
(see: Theorems of the alternative and optimization) 
dual SQPIP methods see: primal- — 
dual system 
[15A39, 90C05, 90C33] 
(see: Equivalence between nonlinear complementarity
problem and fixed point problem; Tucker homogeneous 
systems of linear relations) 
dual systems see: homogeneous — 
dual techniques 
[90B80, 90C11] 
(see: Facility location with staircase costs) 
dual transformation 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
dual transformation see: canonical — 
dual transformation method see: canonical — 
dual variable
[90C30] 
(see: Image space approach to optimization) 

dual variables 
[90C05, 90C10, 90C30, 90C35, 90C46] 
(see: Generalized networks; Integer programming duality; 
Lagrangian duality: BASICS; Theorems of the alternative 
and optimization) 

dual variational inequality problem
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems)

dual vector 

[90C30] 

(see: Lagrangian duality: BASICS) 

duality 

[49-XX, 90-XX, 90B85, 90C30, 90C33, 93-XX] 

(see: Duality theory: monoduality in convex optimization;
Equivalence between nonlinear complementarity problem 
and fixed point problem; Large scale trust region problems; 
Single facility location: multi-objective euclidean distance 
location) 

duality 
[15A39, 49-XX, 49K27, 49K40, 49M29, 90-XX, 90C05, 90C10, 
90C11, 90C15, 90C22, 90C25, 90C29, 90C30, 90C31, 90C46, 
93-XX] 
(see: Bilevel programming: optimality conditions and 
duality; Duality theory: biduality in nonconvex 
optimization; Duality theory: monoduality in convex 
optimization; Duality theory: triduality in global 
optimization; First order constraint qualifications; 
Generalized benders decomposition; Integer programming 
duality; Linear optimization: theorems of the alternative; 
Motzkin transposition theorem; Probabilistic constrained 
linear programming: duality theory; Second order 
constraint qualifications; Semidefinite programming: 
optimality conditions and stability) 

duality see: Bilevel programming: optimality conditions 
and —; Clarke —; double-max —; double-min —; 
Fenchel —; Fenchel-Moreau —; Fenchel-Rockafellar —;
inference —; Integer programming —; Klotzler —; 
Lagrangian —; Legendre —; linear programming —; LP —; 
Minkowski —; Multi-objective optimization: lagrange —; 
perfect —; polar —; saddle Lagrange —; SDP —; 
Semi-infinite programming, semidefinite programming and 
perfect —; strong —; strong and weak —; superadditive —; 
superLagrangian —; surrogate —; weak — 

duality and applications see: Robust linear programming with 
right-hand-side uncertainty — 

duality: BASICS see: Lagrangian — 

duality for bilevel programming 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 

duality equality 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 


duality from the view of linear semi-infinite programming see: 
perfect — 

duality gap 
[49-XX, 90-XX, 90C05, 90C10, 90C22, 90C25, 90C30, 90C31, 
90C34, 90C51, 93-XX] 
(see: Duality theory: monoduality in convex optimization; 
Duality theory: triduality in global optimization; Image 
space approach to optimization; Integer programming: 
lagrangian relaxation; Interior point methods for 
semidefinite programming; Lagrangian duality: BASICS; 
Semidefinite programming: optimality conditions and 
stability; Semi-infinite programming, semidefinite 
programming and perfect duality; Theorems of the 
alternative and optimization) 

duality gap 
[90C30] 
(see: Lagrangian duality: BASICS) 

duality gap see: relative — 

Duality gaps in nonconvex optimization 
(90B50, 78M50) 

duality inequality 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

duality of the linear SIP problem 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

duality for M- and L-convex functions see: Fenchel-type — 

duality of matroids 
[90C09, 90C10] 
(see: Oriented matroids) 

duality and maximum principle 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 

Duality in optimal control with first order differential 
equations 
(49K05, 49K10, 49K15, 49K20) 
(referred to in: Control vector iteration CVI; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Hamilton-Jacobi-Bellman equation; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Optimal 
control of a flexible arm; Robust control; Robust control: 
schur stability of polytopes of polynomials; Semi-infinite 
programming and control problems; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Suboptimal control) 
(refers to: Control vector iteration CVI; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Hamilton-Jacobi-Bellman equation; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Optimal
control of a flexible arm; Robust control; Robust control: 
schur stability of polytopes of polynomials; Semi-infinite 
programming and control problems; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Suboptimal control) 


duality pair see: Fenchel —; Legendre — 


duality relation see: weak — 

duality relations see: Legendre —
duality result see: strong —; weak — 
Duality for semidefinite programming
(90C30)

(referred to in: Semidefinite programming and determinant
maximization; Semidefinite programming: optimality
conditions and stability; Semidefinite programming and
structural optimization; Semi-infinite programming,
semidefinite programming and perfect duality; Solving
large scale and sparse semidefinite programs)

(refers to: Interior point methods for semidefinite
programming; Semidefinite programming and determinant
maximization; Semidefinite programming: optimality
conditions and stability; Semidefinite programming and
structural optimization; Semi-infinite programming,
semidefinite programming and perfect duality; Solving
large scale and sparse semidefinite programs)

duality theorem
[01A99, 90C05, 90C15, 90C99]
(see: Probabilistic constrained linear programming: duality
theory; Von Neumann, John)

duality theorem
[01A99, 90C99]
(see: Von Neumann, John)

duality theorem see: Clarke —; conic —; saddle Lagrange —;
strong —; superLagrangian —; weak —

duality theorem for linear optimization
[15A39, 90C05]
(see: Motzkin transposition theorem)

duality theory
[49M29, 90C11]
(see: Generalized benders decomposition)

Duality theory
[49K05, 49K10, 49K15, 49K20]
(see: Duality in optimal control with first order differential
equations)

duality theory see: Fenchel-Rockafellar —; Probabilistic
constrained linear programming: —

Duality theory: biduality in nonconvex optimization 
(49-XX, 90-XX, 93-XX) 

(referred to in: Duality theory: monoduality in convex 
optimization; Duality theory: triduality in global 
optimization; Von Neumann, John) 

(refers to: Duality theory: monoduality in convex 
optimization; Duality theory: triduality in global 
optimization; History of optimization; Von Neumann, 
John) 

duality theory for entropy optimization
[90C25, 94A17]
(see: Entropy optimization: shannon measure of entropy
and its properties)

Duality theory: monoduality in convex optimization 
(49-XX, 90-XX, 93-XX) 


(referred to in: Duality theory: biduality in nonconvex 
optimization; Duality theory: triduality in global 
optimization; Von Neumann, John) 
(refers to: Duality theory: biduality in nonconvex 
optimization; Duality theory: triduality in global 
optimization; History of optimization) 

Duality theory: triduality in global optimization 
(49-XX, 90-XX, 93-XX) 
(referred to in: Duality theory: biduality in nonconvex 
optimization; Duality theory: monoduality in convex 
optimization; Von Neumann, John) 
(refers to: Duality theory: biduality in nonconvex 
optimization; Duality theory: monoduality in convex 
optimization; History of optimization; Von Neumann, 
John) 

dualization 
[46A20, 52A01, 90C10, 90C30, 90C46] 
(see: Composite nonsmooth optimization; Integer 
programming duality) 

dualization see: Ioffe-Burke local —

Dubovitskii-Milyutin theorem
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for
abnormal points)

Duffin theorem
[90C05, 90C30]
(see: Theorems of the alternative and optimization)

dummy nodes see: PSA with — 

duopoly
[91B06, 91B60]
(see: Oligopolistic market equilibrium)

duty-after-arrival
(see: Railroad crew scheduling)

duty-before-departure
(see: Railroad crew scheduling)

duty-period
(see: Railroad crew scheduling)

duty scheduling problems see: Integrated vehicle and — 

duty scheduling process
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling)

duty time see: on- — 

DVRP
[90B06]
(see: Vehicle routing)

Dykstra’s algorithm and robust stopping criteria
(90C25, 65K05, 65G50)

dynamic
[90B06]
(see: Vehicle routing)

dynamic complementarity problem
[90C33]
(see: Order complementarity)

dynamic complementarity problem see: discrete — 

dynamic considerations and controllability see: integration 
of — 

dynamic facility location 
[90B85] 
(see: Single facility location: multi-objective rectilinear 
distance location) 

dynamic games see: Infinite horizon control and — 
dynamic load balancing
[65K05, 65Y05, 65Y10, 65Y20, 68W10]
(see: Interval analysis: parallel methods for global
optimization)

dynamic load balancing technique
[90C10, 90C26, 90C30]
(see: Optimization software)

dynamic location problem
[90C35]
(see: Multi-index transportation problems)

dynamic model
[90B80, 90B85]
(see: Warehouse location problem)

dynamic network flow problem see: nonlinear — 

dynamic network flow problems
[90C30]
(see: Simplicial decomposition)

dynamic optimization
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)

Dynamic optimization
[49Jxx, 65L99, 91Axx, 93-XX]
(see: Infinite horizon control and dynamic games;
Optimization strategies for dynamic systems)

dynamic optimization see: discrete decisions in —; mixed 
integer — 

dynamic optimization problem see: stochastic — 

dynamic programming 
[34H05, 49Jxx, 49L20, 49L99, 62H30, 65L99, 68Q20, 90C10,
90C11, 90C20, 90C39, 90C40, 91Axx, 93-XX] 
(see: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming: optimal control applications; 
Hamilton-Jacobi-Bellman equation; Infinite horizon 
control and dynamic games; Linear ordering problem; 
Neuro-dynamic programming; Optimal triangulations; 
Optimization strategies for dynamic systems) 

dynamic programming
[34H05, 49L20, 49L99, 49M29, 62H30, 65K10, 90C06, 90C27,
90C31, 90C39, 90C40]
(see: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: stochastic 
shortest path problems; Dynamic programming: 
undiscounted problems; Hamilton-Jacobi-Bellman 
equation; Multiple objective dynamic programming; 
Neuro-dynamic programming; Operations research and 
financial markets) 

dynamic programming see: differential —; iterative —; 
Multiple objective —; Neuro- —; stochastic — 

dynamic programming algorithm 
[49L20, 90C39, 90C40]
(see: Dynamic programming: discounted problems; 
Dynamic programming: inventory control; Dynamic 
programming: stochastic shortest path problems) 

dynamic programming algorithm see: continuous-time 
equivalent of the — 

Dynamic programming: average cost per stage problems 
(49L99) 
(referred to in: Dynamic programming in clustering; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; Multiple 
objective dynamic programming; Neuro-dynamic 
programming) 
(refers to: Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; Multiple 
objective dynamic programming; Neuro-dynamic 
programming) 

Dynamic programming in clustering 
(90C39, 62H30) 
(referred to in: Dynamic programming: average cost per 
stage problems; Dynamic programming: continuous-time 
optimal control; Dynamic programming: discounted 
problems; Dynamic programming: infinite horizon 
problems, overview; Dynamic programming: inventory 
control; Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Dynamic programming: 
stochastic shortest path problems; Dynamic programming: 
undiscounted problems; Hamilton-Jacobi-Bellman 
equation; Multiple objective dynamic programming; 
Neuro-dynamic programming; Optimization-based 
visualization) 
(refers to: Dynamic programming: average cost per stage 
problems; Dynamic programming: continuous-time 
optimal control; Dynamic programming: discounted 
problems; Dynamic programming: infinite horizon 
problems, overview; Dynamic programming: inventory 
control; Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Dynamic programming: 
stochastic shortest path problems; Dynamic programming: 
undiscounted problems; Hamilton-Jacobi-Bellman 
equation; Multiple objective dynamic programming; 
Neuro-dynamic programming) 

Dynamic programming: continuous-time optimal control 
(49L20, 34H05, 90C39) 
(referred to in: Control vector iteration CVI; Duality in 
optimal control with first order differential equations;
Dynamic programming: average cost per stage problems; 
Dynamic programming in clustering; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; High-order 
maximum principle for abnormal extremals; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Multiple 
objective dynamic programming; Neuro-dynamic 
programming; Optimal control of a flexible arm; 
Optimization strategies for dynamic systems; Pontryagin 
maximum principle; Robust control; Robust control: schur 
stability of polytopes of polynomials; Semi-infinite 
programming and control problems; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Suboptimal control) 

(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: average cost per stage problems; Dynamic 
programming in clustering; Dynamic programming: 
discounted problems; Dynamic programming: infinite 
horizon problems, overview; Dynamic programming: 
inventory control; Dynamic programming and Newton’s 
method in unconstrained optimal control; Dynamic 
programming: optimal control applications; Dynamic 
programming: stochastic shortest path problems; Dynamic 
programming: undiscounted problems; 
Hamilton-Jacobi-Bellman equation; High-order maximum 
principle for abnormal extremals; Infinite horizon control 
and dynamic games; MINLP: applications in the interaction 
of design and control; Multi-objective optimization: 
interaction of design and control; Multiple objective 
dynamic programming; Neuro-dynamic programming; 
Optimal control of a flexible arm; Optimization strategies 
for dynamic systems; Pontryagin maximum principle; 
Robust control; Robust control: schur stability of polytopes 
of polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control) 

Dynamic programming: discounted problems 

(49L20, 90C39) 

(referred to in: Dynamic programming: average cost per 
stage problems; Dynamic programming in clustering; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming: infinite horizon problems, 
overview; Dynamic programming: inventory control; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Dynamic programming: 
stochastic shortest path problems; Dynamic programming: 
undiscounted problems; Hamilton-Jacobi-Bellman 
equation; Multiple objective dynamic programming; 
Neuro-dynamic programming) 


(refers to: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; Multiple 
objective dynamic programming; Neuro-dynamic 
programming) 

dynamic programming equation see: continuous-time analog 
of the — 

dynamic programming equations see: recursive — 

Dynamic programming: infinite horizon problems, overview 
(49L20, 90C39, 90C40)
(referred to in: Dynamic programming: average cost per 
stage problems; Dynamic programming in clustering; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming: discounted problems; Dynamic 
programming: inventory control; Dynamic programming 
and Newton’s method in unconstrained optimal control; 
Dynamic programming: optimal control applications;
Dynamic programming: stochastic shortest path problems;
Dynamic programming: undiscounted problems;
Hamilton-Jacobi-Bellman equation; Multiple objective 
dynamic programming; Neuro-dynamic programming; 
Optimization strategies for dynamic systems) 

(refers to: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: inventory control; Dynamic programming 
and Newton’s method in unconstrained optimal control; 
Dynamic programming: optimal control applications;
Dynamic programming: stochastic shortest path problems;
Dynamic programming: undiscounted problems;
Hamilton-Jacobi-Bellman equation; Multiple objective 
dynamic programming; Neuro-dynamic programming; 
Optimization strategies for dynamic systems) 

Dynamic programming: inventory control 
(49L20)
(referred to in: Dynamic programming: average cost per 
stage problems; Dynamic programming in clustering; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Dynamic programming: 
stochastic shortest path problems; Dynamic 
programming: undiscounted problems; 
Hamilton-Jacobi-Bellman equation; Multiple 
objective dynamic programming; Neuro-dynamic 
programming) 
(refers to: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 


Subject Index 


Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Dynamic programming: 
stochastic shortest path problems; Dynamic programming: 
undiscounted problems; Hamilton-Jacobi-Bellman 
equation; Multiple objective dynamic programming; 
Neuro-dynamic programming) 

Dynamic programming and Newton’s method in 
unconstrained optimal control 

(49M29, 65K10, 90C06) 

(referred to in: Automatic differentiation: calculation of 
Newton steps; Control vector iteration CVI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: average cost per stage problems; 
Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming: optimal control applications; Dynamic 
programming: stochastic shortest path problems; Dynamic 
programming: undiscounted problems; 
Hamilton-Jacobi-Bellman equation; Infinite horizon 
control and dynamic games; Interval Newton methods; 
MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control; Multiple objective dynamic programming; 
Neuro-dynamic programming; Nondifferentiable 
optimization: Newton method; Nonlinear least squares: 
Newton-type methods; Optimal control of a flexible arm; 
Optimization strategies for dynamic systems; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control; Unconstrained nonlinear 
optimization: Newton-Cauchy framework) 

(refers to: Automatic differentiation: calculation of Newton 
steps; Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: average cost per stage problems; Dynamic 
programming in clustering; Dynamic programming: 
continuous-time optimal control; Dynamic programming: 
discounted problems; Dynamic programming: infinite 
horizon problems, overview; Dynamic programming: 
inventory control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; Infinite 
horizon control and dynamic games; Interval Newton 
methods; MINLP: applications in the interaction of design 
and control; Multi-objective optimization: interaction of 
design and control; Multiple objective dynamic 
programming; Neuro-dynamic programming; 
Nondifferentiable optimization: Newton method; Optimal 
control of a flexible arm; Optimization strategies for 
dynamic systems; Robust control; Robust control: schur 
stability of polytopes of polynomials; Semi-infinite 
programming and control problems; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Suboptimal control;
Unconstrained nonlinear optimization: Newton-Cauchy
framework) 


Dynamic programming: optimal control applications 


(93-XX) 

(referred to in: Control vector iteration CVI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: average cost per stage problems; 
Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: stochastic 
shortest path problems; Dynamic programming: 
undiscounted problems; Hamilton-Jacobi-Bellman 
equation; Infinite horizon control and dynamic games; 
MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control; Multiple objective dynamic programming; 
Neuro-dynamic programming; Optimal control of a flexible 
arm; Optimization strategies for dynamic systems; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control) 

(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: average cost per stage problems; Dynamic 
programming in clustering; Dynamic programming: 
continuous-time optimal control; Dynamic programming: 
discounted problems; Dynamic programming: infinite 
horizon problems, overview; Dynamic programming: 
inventory control; Dynamic programming and Newton’s 
method in unconstrained optimal control; Dynamic 
programming: stochastic shortest path problems; Dynamic 
programming: undiscounted problems; 
Hamilton-Jacobi-Bellman equation; Infinite horizon 
control and dynamic games; MINLP: applications in the 
interaction of design and control; Multi-objective 
optimization: interaction of design and control; Multiple 
objective dynamic programming; Neuro-dynamic 
programming; Optimal control of a flexible arm; 
Optimization strategies for dynamic systems; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control) 


dynamic programming paradigm see: general — 
dynamic programming recursion 


[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 


dynamic programming recursions 


[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 


Dynamic programming: stochastic shortest path problems 


(49L20, 90C40) 
(referred to in: Dynamic programming: average cost per
stage problems; Dynamic programming in clustering; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; Multiple 
objective dynamic programming; Neuro-dynamic 
programming; Optimization strategies for dynamic 
systems) 
(refers to: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; Multiple 
objective dynamic programming; Neuro-dynamic 
programming; Optimization strategies for dynamic 
systems) 

Dynamic programming: undiscounted problems 
(49L20, 90C40) 
(referred to in: Dynamic programming: average cost per 
stage problems; Dynamic programming in clustering; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Hamilton-Jacobi-Bellman equation; 
Multiple objective dynamic programming; Neuro-dynamic 
programming) 
(refers to: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Hamilton-Jacobi-Bellman equation; 
Multiple objective dynamic programming; Neuro-dynamic 
programming) 

dynamic service needs see: static/ — 

dynamic simulation 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 

dynamic system see: discrete event — 

dynamic systems see: Optimization strategies for —; 
Quasidifferentiable optimization: stability of —; 
stochastic — 


dynamic systems by constructive nonlinear dynamics see: 
Robust design of — 

dynamic traffic assignment
[90C35]
(see: Multicommodity flow problems)

dynamic traffic network model
[90B15]
(see: Dynamic traffic networks)

Dynamic traffic networks
(90B15)
(referred to in: Auction algorithms; Communication
network assignment problem; Cost approximation 
algorithms; Equilibrium networks; Generalized networks; 
Maximum flow problem; Minimum cost flow problem; 
Multicommodity flow problems; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Piecewise linear network flow problems; 
Shortest path tree algorithms; Steiner tree problems; 
Stochastic network problems: massively parallel solution; 
Survivable networks; Traffic network equilibrium) 
(refers to: Auction algorithms; Communication network 
assignment problem; Cost approximation algorithms; 
Directed tree networks; Equilibrium networks; Evacuation 
networks; Generalized networks; Maximum flow problem; 
Minimum cost flow problem; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Piecewise linear network flow problems; 
Shortest path tree algorithms; Steiner tree problems; 
Stochastic network problems: massively parallel solution; 
Survivable networks; Traffic network equilibrium) 

dynamic travel behavior see: day-to-day — 

dynamic two-stage stochastic programming problem 

[90C15] 

(see: Two-stage stochastic programming: quasigradient method)

dynamic vehicle routing problem 

[90B06] 

(see: Vehicle routing)

dynamical Bayesian networks 

(see: Bayesian networks)

dynamical system 

[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 

(see: Replicator dynamics in combinatorial optimization;
Standard quadratic optimization problems: algorithms)

dynamical system 

[05C60, 05C69, 37B25, 90C20, 90C27, 90C34, 90C35, 90C59,
91A22, 91B28]

see: Replicator dynamics in combinatorial optimization; 
Semi-infinite programming and applications in finance) 

dynamical system see: projected —; Variational inequalities: 
projected —; variational inequality problem and 
a projected — 

dynamical systems see: discrete —; estimating uncertainty 
in —; Interval analysis for optimization of —; projected — 

dynamics see: computational fluid —; Design optimization in 
computational fluid —; molecular —; process —; 
replicator —; Robust design of dynamic systems by
constructive nonlinear — 
dynamics in combinatorial optimization see: Replicator — 


E 


Ep, j see: characterization of — 
E. approach see: Variational inequalities: F. — 
EACO 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
earliness see: minimization of order — 
easy-to-compute 
(see: Global optimization: functional forms) 
EasyModeler/6000 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
(EAX) see: edge assembly crossover — 
echelon arborescence system see: multi- — 
echelon form 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of Newton 
steps) 
echelon stock 
[90B50] 
(see: Inventory management in supply chains)
econometric methods 
[90C26, 90C30] 
(see: Forecasting) 
econometric models 
[90C26, 90C30] 
(see: Forecasting)
econometrics 
[90C26, 90C30] 
(see: Forecasting)
Economic Analysis see: linear Programming and — 
economic applications 
[90C05] 
(see: Continuous global optimization: applications) 
economic equilibrium 
[90C90, 91A65, 91B99]
(see: Bilevel programming: applications) 
economic equilibrium see: general — 
economic equilibrium model see: pure exchange —; pure 
trade — 
economic growth see: Ramsey rule of — 
Economic lot-sizing problem 
(90C11, 90C39, 90C90, 90B10, 90B05, 55M05) 
economic order quantity 
[90B50] 
(see: Inventory management in supply chains)
economic system conditions 
[91B50] 
(see: Financial equilibrium)
economics 
[01A99] 
(see: Kantorovich, Leonid Vitalyevich)
economics see: mathematical — 


economies of scale 
[90C25] 
(see: Concave programming) 
economy see: pure exchange — 
economy of scale 
[90C26, 90C30]
(see: Reverse convex optimization) 
economy of scale 
[90B10, 90C26, 90C30, 90C35]
(see: Nonconvex network flow problems) 
edge see: light —; pale —; required —; shortest —; Voronoi — 
edge assembly crossover (EAX) 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90] 
(see: Traveling salesman problem) 
edge coloring 
[05C85, 90C35]
(see: Directed tree networks; Graph coloring) 
edge coloring see: constrained — 
edge coloring problem 
[90C35]
(see: Graph coloring) 
edge contraction method 
(see: Maximum cut problem, MAX-CUT) 
edge crossing 
[90C10, 90C27, 94C15]
(see: Graph planarization) 
edge-directions of P see: covers all — 
edge-disjoint path 
[90-XX]
(see: Survivable networks) 
edge elimination ordering see: cobipartite neighborhood — 
edge exchange 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90] 
(see: Traveling salesman problem) 
edge filter see: Sobel — 
edge flips see: good — 
edge of a graph 
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems) 
edge insertion paradigm 
[68Q20]
(see: Optimal triangulations) 
edge list see: extended doubly connected — 
edge realization 
[90C35]
(see: Optimization in leveled graphs) 
edge set 
[90C35]
(see: Graph coloring) 
edge simplex method see: steepest — 
edges 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C] 
(see: Convex discrete optimization) 
edges see: unavoidable — 
edges of a digraph see: set of — 
EDM

[05C50, 15A48, 15A57, 90C25]

(see: Matrix completion problems) 
Edmundson-Madansky upper bound 

[90C15] 

(see: Stochastic programs with recourse: upper bounds) 
effect see: bullwhip —; Doppler —; Maratos —; wrapping — 
effective domain of a function 

[90C10, 90C25, 90C27, 90C35] 

(see: L-convex functions and M-convex functions) 
effective set of a function 

[65K05, 90Cxx] 

(see: Dini and Hadamard derivatives in optimization) 
effects see: boundary —; solvation — 
effects in microclusters see: size — 
efficiency 

[90C11, 90C29] 

(see: Multi-objective mixed integer programming) 


efficiency 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
efficiency see: computational —; local —; local strict —; local 
weak —; proper —; strict —; weak —


efficiency assessment see: comparative — 
efficiency and nondomination see: comparison of — 
efficient 
[90B30, 90B50, 90C05, 90C10, 90C11, 90C29, 91B82] 
(see: Data envelopment analysis; Generalized concavity in 
multi-objective optimization; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multi-objective optimization; Interactive 
methods for preference value functions; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support) 
efficient see: extreme- —; input- —; nonextreme —; 
output- —; weakly — 
efficient algorithm 
[03D15, 68Q05, 68Q15, 90C60] 
(see: Computational complexity theory; Parallel computing: 
complexity classes) 
efficient algorithm 
[90C60] 
(see: Computational complexity theory) 
efficient algorithms 
[68Q25, 91B28] 
(see: Competitive ratio for portfolio management) 
efficient frontier 
[91B50] 
(see: Financial equilibrium) 
efficient point 
[90C15] 
(see: Stochastic programming models: random objective) 
efficient point 
[90C15, 90C29, 90C30] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points; Multi-objective 
optimization: lagrange duality; Stochastic programming 
models: random objective) 
efficient point see: (G+ —; local —; local strictly —; local 
weakly —; strictly —; weakly —


efficient point set 
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 
efficient points see: Discretely distributed stochastic programs: 
descent directions and — 
efficient points sets see: connectedness of the — 
efficient polynomially bounded polynomial time algorithm 
[90C60] 
(see: Computational complexity theory)
efficient portfolios see: frontier of — 
efficient set 
[90C31, 90C39] 
(see: Multiple objective dynamic programming)
efficient solution 
[65K05, 90B50, 90C05, 90C29, 90C30, 90C90, 91B06] 
(see: Bilevel programming: global optimization;
Multi-objective optimization and decision support systems; 
Multiple objective programming support) 
efficient solution 
[65K05, 90B50, 90C05, 90C15, 90C29, 91B06] 
(see: Discretely distributed stochastic programs: descent
directions and efficient points; Multi-objective 
optimization and decision support systems; Multi-objective 
optimization: pareto optimal solutions, properties) 


efficient solution see: (C},)- —; nonsupported —; Pareto —; 
properly —; supported —; weakly — 
efficient solutions 


[90C27, 90C29] 
(see: Multi-objective combinatorial optimization) 

efficient solutions see: nonsupported —; set of potential —; 
supported — 

effort see: principle of least — 

efforts see: optimal distribution of — 

EGOP algorithm 

[90C26] 

(see: Generalized primal-relaxed dual approach) 

eigenvalue

[90C20, 90C25] 

(see: Quadratic programming over an ellipsoid) 

eigenvalue 

[90C29] 

(see: Estimating data for multicriteria decision making
problems: optimization techniques)

eigenvalue based lower bounds

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem)

eigenvalue bounds of interval matrices see: Interval analysis: — 

Eigenvalue enclosures for ordinary differential equations 
(49R50, 65L15, 65L60, 65G20, 65G30, 65G40) 
(referred to in: αBB algorithm; Hemivariational
inequalities: eigenvalue problems; Interval analysis: 
eigenvalue bounds of interval matrices; Semidefinite 
programming and determinant maximization) 
(refers to: αBB algorithm; Hemivariational inequalities:
eigenvalue problems; Interval analysis: eigenvalue bounds 
of interval matrices; Semidefinite programming and 
determinant maximization) 

eigenvalue formulation see: inverse interpolation 
parametric — 


eigenvalue of an interval matrix see: extreme —; interval of
variation of an — 
eigenvalue problem 
[49J52, 90C30]
(see: Hemivariational inequalities: eigenvalue problems; 
Large scale trust region problems) 
eigenvalue problem see: nonsmooth — 
eigenvalue problems see: Hemivariational inequalities: — 
eigenvalue proximal support vector machine see: 
generalized — 
eigenvalue proximal support vector machine problem see: 
Generalized — 
eigenvalue reformulation see: parametric — 
eigenvalue theorem see: interlocking — 
eigenvalues see: upper and lower bounds to — 
eigenvector 
[90C20, 90C25] 
(see: Quadratic programming over an ellipsoid) 
eigenvector 
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques) 
ejection chain 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
ejection chain methods 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
Ekeland point see: local — 
Ekeland variational principle
[58C20, 58E30, 90C46, 90C48] 
(see: Nonsmooth analysis: weak stationarity) 
elastic bar of a truss 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
elastic boundary conditions see: quasidifferential — 
elastic demand traffic network problems with travel demand 
functions 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
elastic mechanical constructions see: linearly — 
elastic stability 
[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90]
(see: Quasidifferentiable optimization: stability of dynamic 
systems) 
elastic systems see: linearly — 
elastic travel demand 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
elasticity see: price — 
elastostatic problem involving QD-superpotentials 
[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]
(see: Quasidifferentiable optimization: variational 
formulations) 
elastostatic problem involving QD-superpotentials see: convex 
variational inequality for an —; variational equality for an — 
elastostatics 
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]


(see: Hemivariational inequalities: applications in 
mechanics) 

elastostatics with nonlinear boundary conditions 
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in 
mechanics) 

ELECTRE 
[90-XX] 
(see: Outranking methods) 

ELECTRE I see: generalization of —

electric and energy power systems see: Optimization in 
operation of — 

electric field 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 

electric power system 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C30, 90C35, 
94C15] 
(see: Greedy randomized adaptive search procedures; 
Optimization in operation of electric and energy power 
systems) 

electric power system 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 

electrostatic interactions 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 

element see: cognitive —; finite —; left-paired —; 
left-unpaired —; mixed finite —; right-paired —; 
right-unpaired — 

element approximation see: finite —; mixed finite — 

element group see: Klein 4- — 

element in a Hilbert space see: symmetric — 

element identification see: Mixed 0-1 linear programming 
approach for DNA transcription — 

element method see: finite — 

elemental subset 

[62H30, 90C39]

(see: Dynamic programming in clustering) 

elementary connecting path 

[90C29]

(see: Estimating data for multicriteria decision making
problems: optimization techniques)

elementary functions

[26A48, 26A51, 52A07]
(see: Increasing and convex-along-rays functions on 
topological vector spaces) 

elementary functions see: set of — 

elementary orthogonal transformations 

[15A23, 65F05, 65F20, 65F22, 65F25]

(see: QR factorization) 

elementary partial derivatives 

[26A24, 65D25]

(see: Automatic differentiation: introduction, history and
rounding error estimation)

elementary transformations 

[15A23, 65F05, 65F20, 65F22, 65F25]

(see: Orthogonal triangularization) 


elements see: contracting matroid —; deleting matroid —
elevator problem 
[68Q25, 90B80, 90C05, 90C27]
(see: Communication network assignment problem) 
elicitation see: data — 
eligible nonbasic variable 
[90C05]
(see: Linear programming: Klee-Minty examples) 
eliminating blocks of variables
[52B12, 68Q25]
(see: Fourier-Motzkin elimination method)
elimination 
[13Cxx, 13Pxx, 14Qxx, 65K05, 90C30, 90Cxx]
(see: Bisection global optimization methods; Integer 
programming: algebraic methods) 
elimination 
[65K05, 90C30]
(see: Bisection global optimization methods) 
elimination see: Fourier-Motzkin —; Gaussian — 
elimination with backsolving see: Gaussian — 
elimination constraints see: sub-tour —; subtour — 
elimination graph 
[65Fxx] 
(see: Least squares problems) 
elimination method see: Fourier-Motzkin — 
elimination ordering see: cobipartite neighborhood edge — 
elitism 
[92B05] 
(see: Genetic algorithms) 
ellipsoid see: coordinate-aligned —; maximum-volume —; 
minimum-volume —; Quadratic programming over an — 
ellipsoid algorithm 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C06, 90C08, 90C10, 
90C11, 90C20, 90C25, 90C26, 90C60] 
(see: Information-based complexity and information-based 
optimization; Integer programming: cutting plane 
algorithms; Linear programming: karmarkar projective 
algorithm; Quadratic knapsack) 
ellipsoid constraint 
[90C20, 90C25] 
(see: Quadratic programming over an ellipsoid) 
Ellipsoid method 
(90C05) 
(refers to: Linear programming; Linear programming: 
interior point methods; Linear programming: karmarkar 
projective algorithm; Linear programming: Klee-Minty 
examples; Volume computation for polytopes: strategies 
and performances) 
ellipsoid method 
[90C60]
(see: Complexity theory; Complexity theory: quadratic 
programming) 
ellipsoid method 
[90C60]
(see: Complexity theory: quadratic programming) 
ellipsoid property
[90C05]
(see: Ellipsoid method) 




ellipsoidal approximation 
[15A15, 90C25, 90C55, 90C90] 
(see: Semidefinite programming and determinant 
maximization) 
ellipsoidal constraint 
[90C25, 90C30, 90C60] 
(see: Complexity theory: quadratic programming; Solving
large scale and sparse semidefinite programs) 
elliptic distance 
[90B80, 90C27]
(see: Voronoi diagrams in facility location)
elliptic type see: abstract variational inequality of — 
Elzinga-Hearn algorithm 
[90B85, 90C27]
(see: Single facility location: circle covering problem)
EM algorithm 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval)
embedded family of preferences 
[90C29] 
(see: Preference modeling)
embedded in a probability distribution see: uncertainty — 
embedding 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C27, 90C29, 90C30, 
90C31, 90C33, 90C34, 90C57, 91C15] 
(see: Optimization-based visualization; Parametric
optimization: embeddings, path following and 
singularities) 
embedding 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following 
and singularities) 
embeddings, path following and singularities see: Parametric 
optimization: — 
emergence of logic connectives 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)


Emergency evacuation, optimization modeling 

emergency facility location 

[90C10, 90C11, 90C27, 90C57] 

(see: Integer programming)

empirical data see: evaluation of — 

empirical measure 

[62F12, 65C05, 65K05, 90C15, 90C31] 

(see: Monte-Carlo simulations for stochastic optimization)
empirical method 

[90C90] 

(see: Design optimization in computational fluid dynamics) 
empirical potential 

[90C90] 

(see: Simulated annealing methods in protein folding)
empirical potential
[92B05] 


(see: Genetic algorithms for protein structure prediction) 


empirical potentials 
[92B05] 


(see: Genetic algorithms for protein structure prediction) 


empty circle see: largest — 
empty circle problem see: largest — 
empty neighborhood graphs 
[68Q20] 
(see: Optimal triangulations) 
empty spaces see: analyzing almost — 
empty tree 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
enabled parallelism see: AD- — 
enclosing circle see: smallest — 
enclosing-circle problem see: smallest — 
enclosure see: interval — 
enclosure of all azeotropes see: Nonlinear systems of 
equations: application to the — 
enclosures for ordinary differential equations see: 
Eigenvalue — 
encoding see: binary — 
encyclopedia of Optimization 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
end see: clearly defined — 
ended see: double- —; single- — 
ended crew district see: double- —; single- — 
endpoint of an arc 
[90C35] 
(see: Minimum cost flow problem)
endpoint of an arc in a directed network 
[90C35]
(see: Maximum flow problem)
endpoint of a graph
[90C35]
(see: Feedback set problems)


energy
[90C25, 94A17]
(see: Entropy optimization: shannon measure of entropy
and its properties)
energy
[90C90, 91A65, 91B99]
(see: Bilevel programming: applications)
energy see: average rmsds by —; external —; free —;
internal —; minimum potential —; molar Gibbs free —;
smoothing of the potential —; total Gibbs free —
energy balance equations see: mass and —
energy function
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX, 90C90]
(see: Nonconvex energy functions: hemivariational
inequalities; Optimization in medical imaging)
energy function see: Lennard-Jones potential —;
nonconvex —; Optimization techniques for minimizing
the —; potential —
energy functional see: generalized critical point of an —
energy functions: hemivariational inequalities see:
Nonconvex —
energy minimization 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 65K10, 
68U20, 70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding; Multiple minima 
problem in protein folding: αBB global optimization
approach) 
energy minimum see: global — 
energy model 
[90B50] 
(see: Optimization and decision support systems) 
energy modeling 
[90B50] 
(see: Optimization and decision support systems) 
energy and momentum balances see: mass — 
energy power systems see: Optimization in operation of 
electric and — 
energy purchase contract 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
engine see: BB search —; Newton search —; quasi-Newton 
search —; search — 
engine routing and industrial in-plant railroads 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
engineering see: Archimedes and the foundations of 
industrial —; Bilevel programming: applications in —; 
control — 
engineering applications 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 90C05,
91Axx, 91B06, 92C60]
(see: Boolean and fuzzy relations; Continuous global 
optimization: applications) 
engineering design problems see: Interval analysis: application 
to chemical — 
engineering optimization 
[65K05] 
(see: Direct global optimization algorithm) 
engineering structures 
[90C26, 90C90] 
(see: Structural optimization: history) 
engineering via negative fitness see: genetic — 
engines see: scheduling of switching — 
enhance 
(see: Maximum cut problem, MAX-CUT) 
enhanced heuristic 
[62C10, 65K05, 90C10, 90C15, 90C26] 
(see: Bayesian global optimization) 
enhanced positioning see: Gene clustering: A novel 
decomposition-based clustering approach: global optimum 
search with — 
Enkephalin see: Met- — 
enlargement 
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems) 
enlargement of a feasible region 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility
index) 
ensemble algorithms see: Protein folding: generalized- — 
ensembles see: Generalized — 
entering arc 
[90C35] 
(see: Minimum cost flow problem) 
entering variable 
[90C05] 
(see: Linear programming: Klee-Minty examples) 
entering variable see: choice of the — 
enterprise-wide process networks under uncertainty see: 
Bilevel programming framework for — 
entities see: multipurpose storage — 
entropic proximal point algorithm 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 
entropy 
[90C25, 94A08, 94A17] 
(see: Entropy optimization: shannon measure of entropy 
and its properties; Jaynes’ maximum entropy principle; 
Maximum entropy principle: image reconstruction) 
entropy 
[90C25, 94A17] 
(see: Entropy optimization: shannon measure of entropy 
and its properties; Jaynes’ maximum entropy principle) 
entropy see: Algorithmic —; axiomatic derivation of —;
axiomatic derivation of cross- —; axiomatic derivation of
the principle of maximum —; axiomatic derivation of the
principle of minimum cross- —; cross- —; ε- —; Jaynes
maximum —; Kolmogorov ε- —; Kullback-Leibler cross- —;
Kullback-Leibler measure of cross- —; maximum —;
principle of maximum —; principle of minimum cross- —;
relative —; Shannon —
entropy concentration theorem see: Jaynes — 
entropy and game theory see: Maximum — 
entropy and its properties see: Entropy optimization: shannon 
measure of — 
entropy optimization 
[94A08, 94A17] 
(see: Maximum entropy principle: image reconstruction) 
entropy optimization 
[90C25, 90C51, 94A08, 94A17] 
(see: Entropy optimization: interior point methods; 
Maximum entropy principle: image reconstruction) 
entropy optimization see: algorithms for —; duality theory 
for —; interior point algorithms for —; path following 
algorithm for —; unconstrained dual in — 
Entropy optimization for image reconstruction 
[94A08, 94A17] 
(see: Maximum entropy principle: image reconstruction) 
entropy optimization for image reconstruction see: 
finite-dimensional models for —; vector-space models 
for — 
Entropy optimization: interior point methods 
(94A17, 90C51, 90C25)
(referred to in: Entropy optimization: parameter estimation;
Entropy optimization: shannon measure of entropy and its
properties; Homogeneous selfdual methods for linear
programming; Jaynes’ maximum entropy principle; Linear
programming: interior point methods; Linear
programming: karmarkar projective algorithm; Maximum
entropy principle: image reconstruction; Potential
reduction methods for linear programming; Sequential
quadratic programming: interior point methods for
distributed optimal control problems; Successive quadratic
programming: solution by active sets and interior point
methods)

(refers to: Entropy optimization: parameter estimation; 
Entropy optimization: shannon measure of entropy and its 
properties; Homogeneous selfdual methods for linear 
programming; Interior point methods for semidefinite 
programming; Jaynes’ maximum entropy principle; Linear 
programming: interior point methods; Linear 
programming: karmarkar projective algorithm; Maximum 
entropy principle: image reconstruction; Potential 
reduction methods for linear programming; Sequential 
quadratic programming: interior point methods for 
distributed optimal control problems; Successive quadratic 
programming: solution by active sets and interior point 
methods) 


Entropy optimization: parameter estimation 
(94A17, 62F10) 
(referred to in: Entropy optimization: interior point 
methods; Entropy optimization: shannon measure of 
entropy and its properties; Jaynes’ maximum entropy 
principle; Maximum entropy principle: image 
reconstruction) 
(refers to: Entropy optimization: interior point methods; 
Entropy optimization: shannon measure of entropy and its 
properties; Jaynes’ maximum entropy principle; Maximum 
entropy principle: image reconstruction) 


Entropy optimization: shannon measure of entropy and its 
properties 
(94A17, 90C25) 
(referred to in: Entropy optimization: interior point 
methods; Entropy optimization: parameter estimation; 
Jaynes’ maximum entropy principle; Maximum entropy 
principle: image reconstruction; Optimization in medical 
imaging) 
(refers to: Entropy optimization: interior point methods; 
Entropy optimization: parameter estimation; Jaynes’ 
maximum entropy principle; Maximum entropy principle: 
image reconstruction; Optimization in medical imaging) 
entropy principle see: Jaynes’ maximum —; maximum —; 
minimum cross- — 
entropy principle: image reconstruction see: Maximum — 
entry-uniqueness problem 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 


enumerating extreme point solutions 
[90C60] 
(see: Complexity of degeneracy) 
enumeration 
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming; Set covering, packing and 
partitioning problems) 


enumeration see: extreme point —; implicit —; randomized — 
enumeration in bilevel programming
[90C30, 90C90] 
(see: Bilevel programming: global optimization) 
enumeration methods see: limited — 
enumeration techniques 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility
index) 
enumerative solution 
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming)
enumerative techniques see: branch and bound — 
envelope 
[90C26] 
(see: Global optimization: envelope representation)
envelope see: lower —; upper — 
envelope representation 
[90C26] 
(see: Global optimization: envelope representation)
envelope representation see: Global optimization: — 
envelopes see: theory of — 
envelopes in optimization problems see: Convex — 
envelopment analysis see: data — 
envelops see: theory of — 
environment see: competitive —; minimizing the degradation 
in quality of both water —; problem solving —; 
system-optimizing —; user-optimizing — 
environmental systems see: Global optimization in the analysis 
and management of — 
environmental systems modeling and management see: 
applications in — 
environmental targets 
[90C30, 90C35] 
(see: Optimization in water resources) 
EO 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
EOQ 
[90B50] 
(see: Inventory management in supply chains) 
eor operator 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
EPA 
[90C30] 
(see: Large scale trust region problems) 
epi-Lipschitzness see: compact — 
epi mapping see: zero- — 
epiconsistency 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 
epiconvergence 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 
epiconvergent sequence 
[90C30] 
(see: Cost approximation algorithms) 


epidemic control 
[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX]
(see: Resource allocation for epidemic control) 
epidemic control see: Resource allocation for — 
epidemic model 
[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX]
(see: Resource allocation for epidemic control) 
epidemiology 
[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX]
(see: Resource allocation for epidemic control) 
epiderivative see: contingent — 
epigraph 
[49K27, 58C20, 58E30, 65K05, 90C10, 90C29, 90C30, 90C48]
(see: Bisection global optimization methods; Integer 
programming: lagrangian relaxation; Nonsmooth analysis: 
Fréchet subdifferentials; Set-valued optimization) 
epigraph 
[65K05, 90C30]
(see: Bisection global optimization methods) 
epigraphs 
[46A20, 52A01, 62F12, 65C05, 65K05, 90C15, 90C30, 90C31]
(see: Farkas lemma: generalizations; Monte-Carlo 
simulations for stochastic optimization) 
epistemological interpretation 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
ε-complementary slackness
[90C30, 90C35]
(see: Auction algorithms)
ε-constraint method
[49M37, 90C11]
(see: MINLP: applications in the interaction of design and
control)
ε-convergence see: finite —
ε-entropy
[01A60, 03B30, 54C70, 68Q17]
(see: Hilbert’s thirteenth problem)
ε-entropy see: Kolmogorov —
ε-global local maximizers see: set of discrete —
ε-global points see: set of —
ε-minimizer
[90C20, 90C25]
(see: Quadratic programming over an ellipsoid)
ε-most active points see: set of —
ε-reserved solution
[90C26]
(see: Global optimization using space filling)
ε-scaling
[90C30, 90C35]
(see: Auction algorithms)
ε-stationary point
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)
ε-steepest descent
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth
optimization methods)
ε-subdifferential
[46A20, 52A01, 65Kxx, 90C30, 90Cxx]
(see: Farkas lemma: generalizations; Quasidifferentiable
optimization: algorithms for QD functions)
ε-subdifferential set
[46N10, 90-00, 90C47]
(see: Nondifferentiable optimization)
ε-subgradient method
[46N10, 90-00, 90C47]
(see: Nondifferentiable optimization)
EQNLP 
[49M37, 65K05, 90C30] 
(see: Equality-constrained nonlinear programming: KKT 
necessary optimality conditions) 
equal circles in a square 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
equal demand CMST 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
equalities see: single-valued boundary laws and variational — 
equality see: duality — 
Equality-constrained nonlinear programming: KKT necessary 
optimality conditions 
(49M37, 65K05, 90C30) 
(referred to in: First order constraint qualifications; Globally 
convergent homotopy methods; Inequality-constrained 
nonlinear optimization; Kuhn-Tucker optimality 
conditions; Lagrangian duality: BASICS; Redundancy in 
nonlinear programs; Relaxation in projection methods; 
Rosen’s method, global convergence, and Powell’s 
conjecture; Saddle point theory and optimality conditions; 
Second order constraint qualifications; Second order 
optimality conditions for nonlinear optimization; SSC 
minimization algorithms; SSC minimization algorithms for 
nonsmooth and stochastic optimization) 
(refers to: First order constraint qualifications; 
Inequality-constrained nonlinear optimization; 
Kuhn-Tucker optimality conditions; Lagrangian duality: 
BASICS; Relaxation in projection methods; Rosen’s 
method, global convergence, and Powell’s conjecture; 
Saddle point theory and optimality conditions; Second 
order constraint qualifications; Second order optimality 
conditions for nonlinear optimization; SSC minimization 
algorithms; SSC minimization algorithms for nonsmooth 
and stochastic optimization) 
equality-constrained nonlinear programming problem 
[49M37, 65K05, 90C30] 
(see: Equality-constrained nonlinear programming: KKT 
necessary optimality conditions) 
equality-constrained optimization 
[49M37, 65K05, 90C26, 90C30, 90C39] 
(see: Equality-constrained nonlinear programming: KKT 
necessary optimality conditions; Second order optimality 
conditions for nonlinear optimization) 
equality-constrained optimization 
[49M37, 65K05, 90C30] 
(see: Equality-constrained nonlinear programming: KKT 
necessary optimality conditions) 


equality constraint see: implicit — 

equality constraints 
[41A10, 46N10, 47N10, 49K27, 93-XX] 
(see: Direct search Luus–Jaakola optimization procedure;
High-order necessary conditions for optimality for 
abnormal points) 

equality constraints see: feasibility of — 

equality for an elastostatic problem involving 
QD-superpotentials see: variational — 

equality of phase compositions 
[90C30] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 

equality relation see: linear — 

equality relaxation see: outer approximation with — 

equality relaxation and augmented penalty see: outer 
approximation with — 

equalization see: quality — 

equation see: ASOG —; balance —; Bellman’s —; 
characteristic —; conservation of flow —; continuous-time 
analog of the dynamic programming —; continuous-time 
Riccati —; corrected seminormal —; derivation of the 
Hamilton–Jacobi–Bellman —; difference —; differential —;
diffusion —; dual Euler-Lagrange —; equilibrium —; 
Euler —; Euler-Lagrange —; extended adjoint —; 
generalized —; geometrical —; governing —; 
Hamilton-Jacobi —; Hamilton–Jacobi–Bellman —;
Hammerstein —; HJB —; Knizhnik-Zamolodchikov 
differential —; Kremser —; Lagrange —; Langevin —; 
linear —; linear interval —; nonlinear diffusion —; 
normal —; NRTL —; one-dimensional nonlinear —; 
Poisson —; reaction —; regular solution of the Wilson —; 
replicator —; Riccati —; Schroedinger —; secant —; 
Smoluchowski-Kramers —; solution of the 
Hamilton–Jacobi–Bellman —; stochastic differential —;
storage —; sufficiency theorem for the 
Hamilton–Jacobi–Bellman —; UNIFAC —; UNIQUAC —;
Wilson — 

equation-based 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 

equation based approaches 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction 
equilibrium) 

equation method see: diffusion — 

equation models see: simultaneous — 

equation oriented approach 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 

equation-solving see: iterative linear — 

equation of state 
[65H20, 80A10, 80A22, 90C90] 
(see: Global optimization: application to phase equilibrium 
problems) 

equations see: algebraic —; bounds for linear —; conservation 
of flow —; differential —; differential and algebraic —; 
Diophantine —; Duality in optimal control with first order 
differential —; Eigenvalue enclosures for ordinary 
differential —; error bound for approximate solutions of
nonlinear systems of —; Euler —; existence of solutions of 
nonlinear systems of —; extremal —; First order partial 
differential —; generalized state —; Global optimization 
methods for systems of nonlinear —; Gröbner bases for
polynomial —; ideal and nonideal phase equilibrium —; 
integral —; Interval analysis: differential —; Interval analysis: 
systems of nonlinear —; 
Kantorovich–Karush–Kuhn-Tucker —; KT —;
Kuhn-Tucker —; Lagrange —; linear —; linear algebraic —;
linear systems of —; mass and energy balance —; node flow 
balance —; nonlinear —; nonlinear system of —; nonlinear 
systems of —; nonsmooth —; normal —; ordinary 
differential —; overdetermined system of nonlinear —; 
Overdetermined systems of linear —; partial differential —; 
phase equilibrium —; polynomial —; polynomial system 
of —; recursive dynamic programming —; replicator —; 
resolvent —; rigorous bound for solutions of nonlinear 
systems of —; rotation in the solution of —; selection —; 
sensitivity —; solvability of —; state —; strain-displacement 
compatibility —; stress equilibrium —; Symmetric systems 
of linear —; system of —; systems of nonlinear —; test for 
the existence of solutions of —; underdetermined system of 
nonlinear —; uniqueness of solutions of nonlinear systems 
of —; well-determined system of nonlinear —; 
Wiener–Hopf —

equations: application to the enclosure of all azeotropes see: 
Nonlinear systems of — 

equations and global optimization see: Differential — 

equations and linear least squares see: 
Abaffi-Broyden-Spedicato algorithms for linear —; ABS 
algorithms for linear — 

equations for material flows see: balance — 

equiangularity 
[68Q20] 
(see: Optimal triangulations) 

equilibration 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 

equilibration algorithm 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 

equilibria see: reaction — 

equilibrium 
[49-XX, 49Jxx, 90-XX, 90B80, 90B85, 90C15, 90C26, 90C33,
90Cxx, 91Axx, 91Bxx, 93-XX]
(see: Duality theory: triduality in global optimization; 
Facility location with externalities; Infinite horizon control 
and dynamic games; Stochastic bilevel programs) 

equilibrium 
[49M37, 90C26, 91A10] 
(see: Bilevel programming) 

equilibrium see: asymptotical stability at an —; 
communication —; Cooperative —; Cournot–Nash
oligopolistic —; economic —; feedback Nash —; 
Financial —; fixed demand traffic network —; general —; 
general economic —; Global optimization in phase and 
chemical reaction —; memory strategy —; memory strategy 
Nash —; migration —; multimodal traffic network —; 
multiphase chemical —; Nash —; network —; 
Noncooperative —; Oligopolistic market —; open-loop
Nash —; Optimality criteria for multiphase chemical —;
Overtaking —; partial —; phase —; pure exchange —; 
spatial Cournot–Nash —; spatial price —; stability at an —;
Stackelberg–Nash —; Stackelberg–Nash–Cournot —;
symmetric network —; thermal —; traffic network —; 
Walrasian price — 

equilibrium analysis 
[90-01, 90B30, 90B50, 91B32, 91B52, 91B74] 
(see: Bilevel programming in management) 

equilibrium of an assignment and a set of prices 
[90C30, 90C35] 
(see: Auction algorithms) 

equilibrium of an assignment and a set of prices see: almost 
at — 

equilibrium conditions see: market — 

equilibrium constraints see: mathematical program with —; 
mathematical program with affine — 

equilibrium constraints: A piecewise SQP approach see: 
Optimization with — 

equilibrium equation 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 

equilibrium equations see: ideal and nonideal phase —; 
phase —; stress — 

equilibrium model see: Cournot-Nash oligopolistic —; 
migration network —; multi-sector multi-instrument 
financial —; multimodal traffic network —; partial —; 
perfectly competitive —; pure exchange economic —; pure 
trade economic — 

Equilibrium networks 
(90C30) 
(referred to in: Auction algorithms; Communication 
network assignment problem; Dynamic traffic networks; 
Financial equilibrium; Generalized monotonicity: 
applications to variational inequalities and equilibrium 
problems; Generalized networks; Maximum flow problem; 
Minimum cost flow problem; Multicommodity flow 
problems; Network design problems; Network location: 
covering problems; Nonconvex network flow problems; 
Oligopolistic market equilibrium; Piecewise linear network 
flow problems; Shortest path tree algorithms; Spatial price 
equilibrium; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium; Walrasian price equilibrium) 
(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Evacuation networks; Financial 
equilibrium; Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Generalized networks; Maximum flow problem; Minimum 
cost flow problem; Network design problems; Network 
location: covering problems; Nonconvex network flow 
problems; Oligopolistic market equilibrium; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Spatial price equilibrium; Steiner tree 
problems; Stochastic network problems: massively parallel 
solution; Survivable networks; Traffic network equilibrium; 
Walrasian price equilibrium) 

equilibrium point 
[65K10, 90C90] 
(see: Variational inequalities: projected dynamical system) 
equilibrium problem see: chemical —; network structure of the 
spatial price —; phase —; spatial price —; standard traffic — 
equilibrium problems 
[65K10, 90C31, 90C33] 
(see: Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems) 
equilibrium problems 
[46N10, 49J40, 90C26, 90C33] 
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Linear 
complementarity problem) 
equilibrium problems see: Generalized monotonicity: 
applications to variational inequalities and —; Global 
optimization: application to phase — 
equilibrium process 
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs) 
equilibrium search see: Global — 
equilibrium solutions 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction 
equilibrium) 
equilibrium solutions see: verifying — 
equilibrium stress 
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35] 
(see: Graph realization via semidefinite programming) 
equilibrium with travel disutility functions see: traffic 
network — 
equivalence 
[90C29] 
(see: Preference modeling) 
equivalence see: computational —; logical —; problem — 
Equivalence between nonlinear complementarity problem 
and fixed point problem 
(90C33) 
(referred to in: Generalized nonlinear complementarity 
problem; Integer linear complementary problem; LCP: 
Pardalos-Rosen mixed integer formulation; Linear 
complementarity problem; Order complementarity; 
Principal pivoting methods for linear complementarity 
problems; Topological methods in complementarity 
theory) 
(refers to: Convex-simplex algorithm; Generalized nonlinear 
complementarity problem; Integer linear complementary 
problem; LCP: Pardalos-Rosen mixed integer formulation; 
Lemke method; Linear complementarity problem; Linear 
programming; Order complementarity; Parametric linear 
programming: cost simplex algorithm; Principal pivoting 
methods for linear complementarity problems; Sequential 
simplex method; Topological methods in complementarity 
theory) 
equivalence classes of problems 
[90C60] 
(see: Computational complexity theory) 
equivalence closure of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 


equivalence closure of a relation see: local — 
equivalence to minmax, concave programs see: Bilevel linear 
programming: complexity — 
equivalence relation 
[26E25, 49J52, 52A27, 90C99]
(see: Quasidifferentiable optimization: Dini derivatives, 
clarke derivatives) 
equivalence relation see: local — 
equivalence of SIPs 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
equivalence theorem 
[49K27, 49K40, 90C30, 90C31] 
(see: Second order constraint qualifications) 
equivalences 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
equivalent 
[05C15, 05C62, 05C69, 05C85, 13Cxx, 13Pxx, 14Qxx, 90C27, 
90C59, 90Cxx] 
(see: Integer programming: algebraic methods; 
Optimization problems in unit-disk graphs) 
equivalent see: computational — 
equivalent classes 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
equivalent cost vectors 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
equivalent of the dynamic programming algorithm see: 
continuous-time — 
equivalent forms see: Quadratic integer programming: 
complexity and — 
equivalent model see: deterministic — 
equivalent primal SD problem 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
equivalent problem see: deterministic — 
equivalent semi-infinite programs see: computationally — 
equivalent uniform dose (eud) 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
equivariant Morse lemma 
[58E05, 90C30] 
(see: Topology of global optimization)
EREW PRAM 
[03D15, 68Q05, 68Q15] 
(see: Parallel computing: complexity classes)


error see: estimation —; feasibility —; linearization —; 
minimizing the overall classification —; round-off —; sum of 
squared —
error bound
[90C30, 90C33]
(see: Implicit lagrangian)
error bound see: global — 
error bound for approximate solutions of nonlinear systems of
equations 
[65G20, 65G30, 65G40, 65H20, 65K99] 
(see: Interval Newton methods) 
error capacity see: shannon zero- — 
error estimates for AD 
[26A24, 65D25] 
(see: Automatic differentiation: introduction, history and 
rounding error estimation) 
error estimation see: Automatic differentiation: introduction, 
history and rounding — 
error minimization 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
errors are under control see: rounding — 
errors-in-variables model 
[65Fxx] 
(see: Least squares problems) 
Esau–Williams algorithm
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
escape step 
[90C20] 
(see: Standard quadratic optimization problems: 
algorithms) 
escherichia coli 
[90C08] 
(see: Mixed 0-1 linear programming approach for DNA
transcription element identification)
essential
[90C26, 90C31]
(see: Robust global optimization)
essential optimal solution
[90C26, 90C31]
(see: Robust global optimization)
essential polyhedron
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
essential supremum
(see: Stochastic optimal stopping: problem formulations)
essentially active index set
[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90]
(see: Quasidifferentiable optimization: stability of dynamic
systems)
established node
[68T99, 90C27]
(see: Capacitated minimum spanning trees)
establishing solution quality
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)
estimate
[41A30, 62J02, 90C26]
(see: Regression by special functions: algorithms and
complexity) 
estimate see: best —; difference —; Matula —; maximum 
likelihood —; probabilistic —; pseudocost — 


estimate of the spot rate see: t- — 

estimate using pseudocosts see: best — 

estimate using pseudoshadow prices see: best — 

estimates see: kernel —; parameter — 

estimates for AD see: error — 

estimates for parametric NLPS see: Bounds and solution 
vector — 

Estimating data for multicriteria decision making problems: 
optimization techniques 
(90C29) 
(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Financial 
applications of multicriteria analysis; Fuzzy multi-objective 
linear programming; Multicriteria sorting methods; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization: Interactive methods
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling) 
(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Financial 
applications of multicriteria analysis; Fuzzy multi-objective 
linear programming; Multicriteria sorting methods; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization: Interactive methods
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling) 

estimating the spot rate for bonds with constant maturities 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 

estimating uncertainty in dynamical systems 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 

estimation 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 

estimation see: absolute —; Automatic differentiation: 
introduction, history and rounding error —; Bayesian 
parameter —; covariance matrix —; Entropy optimization: 
parameter —; gradient —; maximum likelihood —; 
parameter —; t-programmed problem of spot rate — 
estimation of 1D-diffusion fluxes
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
estimation of diffusion flux models 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
estimation in distributed systems see: boundary flux — 
estimation error 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
estimation of kinetic coefficients 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
estimation in lumped systems see: reaction flux — 
estimation of model parameters 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
estimation and optimization of nonlinear problems see: 
Simultaneous — 
estimation problem see: £; —; sinusoidal parameter — 
estimation procedure see: sequential — 
estimation of reaction rates and stoichiometry 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
estimation risk 
[91B28] 
(see: Portfolio selection: markowitz mean-variance model) 
estimation of subdifferentials 
[26E25, 49J52, 52A27, 90C99] 
(see: Quasidifferentiable optimization: Dini derivatives, 
clarke derivatives) 
estimation of utility functions 
[90C29] 
(see: Multicriteria sorting methods) 
estimator see: Huber M- —; robust — 
estimators see: James–Stein —; Stein —
η-convex
[90C26]
(see: Invexity and its applications)
η-pseudoconvex
[90C26]
(see: Invexity and its applications)
euclidean distance
[90B85]
(see: Bayesian networks; Single facility location: 
multi-objective rectilinear distance location) 


euclidean distance location see: Single facility location: 
multi-objective — 
Euclidean distance location problem 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 
Euclidean distance location problem see: iterative solution of 
the —; squared — 
Euclidean distance matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
Euclidean distance matrix completion problem 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
Euclidean distance matrix completion problem 
(see: Semidefinite programming and the sensor network 
localization problem, SNLP) 
euclidean distances 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
Euclidean norm see: A-weighted — 
euclidean and rectilinear distances see: Optimizing facility 
location with — 
Euclidean representation 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
Euclidean representation see: unidimensional — 
Euclidean space see: triangulation of — 
Euclidean Steiner ratio 
[90C27] 
(see: Steiner tree problems) 
euclidean TSP 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
(eud) see: equivalent uniform dose — 
Euler equation 
[90C30] 
(see: Image space approach to optimization)
Euler equations 
[90C90] 
(see: Design optimization in computational fluid dynamics)
Euler formula 
[58E05, 90C30] 
(see: Topology of global optimization) 
Euler-Lagrange equation 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
Euler-Lagrange equation see: dual — 
Euler method 
[90B15] 
(see: Dynamic traffic networks) 
Eulerian graph 
[90B06] 
(see: Vehicle routing)
european Journal of Operational Research 
[9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods) 
Evacuation networks 
(90B15) 
(referred to in: Auction algorithms; Communication 
network assignment problem; Dynamic traffic networks; 
Equilibrium networks; Generalized networks; Maximum 
flow problem; Minimum cost flow problem;
Multicommodity flow problems; Network design problems;
Network location: covering problems; Nonconvex network
flow problems; Piecewise linear network flow problems; 
Shortest path tree algorithms; Steiner tree problems; 
Stochastic network problems: massively parallel solution; 
Survivable networks; Traffic network equilibrium) 

evacuation, optimization modeling see: Emergency — 

evaluation 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 

evaluation see: asymptotic case of integral —; multiple 
criteria —; performance —; policy —; software 
development and — 

evaluation in classical logic
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)

evaluation depth
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)

evaluation of empirical data
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

evaluation in multiple-valued logic
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)

evaluation of objective functions and/or derivatives
[90C10, 90C26, 90C30]
(see: Optimization software)

evaluation problem
[90C29]
(see: Multiple objective programming support)

evaluator see: position — 

even sequence
[05C85]
(see: Directed tree networks)

event dynamic system see: discrete —

events see: decision making under extreme —; drought out —

eventually exact
[90C25, 90C26]
(see: Decomposition in global optimization)
Everett generalized Lagrange multiplier approach
[90C10, 90C27]
(see: Multidimensional knapsack problems)
evidence see: expected weight of —; likelihood —
evolution
[92B05]
(see: Genetic algorithms; Genetic algorithms for protein
structure prediction)
evolution strategy
[92B05]
(see: Genetic algorithms)

evolutionary algorithm
[90C26]
(see: MINLP: design and scheduling of batch processes)

evolutionary algorithm
[05-04, 90C27]
(see: Evolutionary algorithms in combinatorial
optimization)

evolutionary algorithms
[05-04, 68T20, 68T99, 90C27, 90C59]
(see: Evolutionary algorithms in combinatorial
optimization; Metaheuristics)

Evolutionary algorithms in combinatorial optimization 
(90C27, 05-04) 
(referred to in: Bayesian networks; Beam selection in 
radiotherapy treatment design; Combinatorial matrix 
analysis; Combinatorial optimization algorithms in 
resource allocation problems; Combinatorial optimization 
games; Fractional combinatorial optimization; 
Multi-objective combinatorial optimization; 
Optimization-based visualization; Replicator dynamics in 
combinatorial optimization; Simulated annealing; 
Traveling salesman problem) 
(refers to: Combinatorial matrix analysis; Combinatorial 
optimization games; Fractional combinatorial 
optimization; Genetic algorithms; Multi-objective 
combinatorial optimization; Neural networks for 
combinatorial optimization; Replicator dynamics in 
combinatorial optimization) 

evolutionary game theory 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 

evolutionary methods 
(see: Bayesian networks) 

evolutionary network
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems)

evolutionary strategies
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)

Evtushenko method
[65K05, 65K10]
(see: ABS algorithms for optimization)

EW
[68T99, 90C27]
(see: Capacitated minimum spanning trees)

ex-ante (risk averse, anticipative) decision
[90C15]
(see: Two-stage stochastic programming: quasigradient
method)
ex-post (risk prone, adaptive) decision 
[90C15] 
(see: Two-stage stochastic programming: quasigradient 
method) 
exact see: eventually — 
exact algorithm 
[9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods) 
exact algorithm for solving CAP on trees 
[68Q25, 90B80, 90C05, 90C27] 
(see: Communication network assignment problem) 
exact algorithms 
[68Q25, 68R10, 68W40, 90B06, 90B35, 90C06, 90C10, 90C27, 
90C39, 90C57, 90C59, 90C60, 90C90] 
(see: Domination analysis in combinatorial optimization; 
Heuristic and metaheuristic algorithms for the traveling 
salesman problem; Traveling salesman problem) 
exact algorithms 
[68Q25, 90B80, 90C05, 90C06, 90C08, 90C10, 90C11, 90C27, 
94C15] 
(see: Communication network assignment problem; Graph 
planarization; Integer programming: branch and bound 
methods; Integer programming: branch and cut algorithms; 
Integer programming: cutting plane algorithms) 
exact continuous 
[90C10, 90C11, 90C27, 90C33]
(see: Continuous reformulations of discrete-continuous
optimization problems)
exact Log -penalty function
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)
exact methods
[90B06]
(see: Vehicle routing)
exact methods for solving vehicle routing problems
[90B06]
(see: Vehicle routing)
exact penalty
[49K35, 49M27, 65K10, 90C25]
(see: Convex max-functions)
exact penalty function
[65L99, 90C15, 90C25, 90C29, 90C30, 90C31, 93-XX]
(see: Bilevel programming: optimality conditions and 
duality; Optimization strategies for dynamic systems; 
Stochastic quasigradient methods in minimax problems) 
exact penalty function 
[90C15, 90C30, 90Cxx] 
(see: Large scale trust region problems; Quasidifferentiable 
optimization: exact penalty methods; Stochastic 
quasigradient methods in minimax problems) 
exact penalty function see: |; — 
exact penalty function approach see: continuously 
differentiable — 
exact penalty function based algorithm 
[90C30] 
(see: Large scale trust region problems) 
Exact penalty method 
[90Cxx] 
(see: Discontinuous optimization) 


exact penalty methods see: Quasidifferentiable
optimization: —

exact penalty parameter
[90Cxx]
(see: Quasidifferentiable optimization: exact penalty
methods)

exact procedure
[90C10]
(see: Maximum constraint satisfaction: relaxations and
upper bounds)

exact sampling
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)

exact solution methods
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

exactly
(see: LP strategy for interval-Newton method in
deterministic global optimization)

exactly or approximately
[90B10]
(see: Piecewise linear network flow problems)

example
[90C06, 90C10, 90C11, 90C27, 90C30, 90C57, 90C90, 94C15]
(see: Graph planarization; Modeling difficult optimization
problems)

example see: optimization computer implementation —

example of a trim-loss problem see: numerical —

examples see: Derivatives of probability and integral functions:
general theory and —; Klee-Minty —; Linear programming:
Klee–Minty —; unclassifiable —

examples from financial decision making see: Preference
disaggregation approach: basic features —

examples of quasidifferentiable functions
[90Cxx]
(see: Quasidifferentiable optimization: optimality
conditions)

exceptional family
[90C33]
(see: Topological methods in complementarity theory)

excess demand function see: aggregate —

excess function
[90C26, 90C90]
(see: Global optimization in phase and chemical reaction
equilibrium)

excess of a network node
[90C35]
(see: Minimum cost flow problem)

excess part
[90C30, 90C90]
(see: Successive quadratic programming: applications in
distillation systems)

excess width 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 

exchange see: edge —; mass and heat —; modeling mass —; 
pure — 

exchange economic equilibrium model see: pure — 

exchange economy see: pure — 

exchange equilibrium see: pure — 

exchange heuristic see: min- — 
exchange matches see: mass —
exchange move
[68T99, 90C27]
(see: Capacitated minimum spanning trees)
exchange neighborhood
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)
exchange neighborhood see: k- —; pair- —
exchange network see: heat and mass —
exchange networks see: Flexible mass —
exchange pivot
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity
problems)
exchange procedures see: local — 
exchange property 
[90C09, 90C10] 
(see: Matroids) 
exchanger see: heat —; mass — 
exchanger network see: mass —; mass and heat — 
exchanger network superstructure see: heat — 
exchanger network synthesis see: heat —; MINLP: heat —; 
Mixed integer linear programming: heat — 
exchanger network synthesis without decomposition see: 
heat — 
exchanger networks see: Global optimization of heat —; 
heat —; MINLP: mass and heat —; Mixed integer linear 
programming: mass and heat — 
exclusion region 
[68Q20] 
(see: Optimal triangulations)
exclusive OR 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
exclusively 
[90C10, 90C11, 90C27, 90C33] 
(see: Continuous reformulations of discrete-continuous
optimization problems) 
execution of a Turing machine 
[90C60] 
(see: Complexity theory) 
exhaustion principle 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization)
exhaustive 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
exhaustive sequential coloring see: frequency —; 
requirement — 
existence 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 
existence of classes axiom 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
existence property 
[90C33]
(see: Topological methods in complementarity theory) 


existence-proving properties of interval Newton methods 
[65G20, 65G30, 65G40, 65H20, 65K99]
(see: Interval Newton methods) 
existence of sets axioms 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
existence of solutions of equations see: test for the — 
existence of solutions of nonlinear systems of equations 
[65G20, 65G30, 65G40, 65H20, 65K99]
(see: Interval Newton methods) 
existence and uniqueness see: Variational inequalities: 
geometric interpretation — 
exogenous inflow see: hydrological — 
exogenous inflow and demand see: water resources planning 
under uncertainty on hydrological — 
expanded transshipment model 
[93A30, 93B50] 
(see: Mixed integer linear programming: mass and heat 
exchanger networks) 
expanding see: algorithm greedy- — 
expanding grid see: menace of the — 
expansion 
[90C30]
(see: Sequential simplex method) 
expansion see: first order Taylor series — 
expansion coefficient 
[90C30]
(see: Sequential simplex method) 
expansion of a matrix see: standard determinant — 
expansion operations 
[90C30]
(see: Sequential simplex method) 
expectation see: bounding the — 
expectation constraint see: conditional — 
expectation and decision 
[90C26, 90C30] 
(see: Forecasting) 
expectation functions see: sample and — 
expectation of an indicator function 
[90C15] 
(see: Derivatives of probability and integral functions: 
general theory and examples) 
expectation-maximization 
(see: Bayesian networks) 
expectation maximization 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval) 
expectation-maximization algorithm 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval) 
expectation-maximization interval 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval) 
expectations see: Static stochastic programming models: 
conditional — 
expected number of pivot steps 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
expected number of shadow-vertices 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 




expected power consumption 

[68M12, 90B18, 90C11, 90C30] 

(see: Optimization in ad hoc networks) 
expected recourse 

[90C15] 

(see: Two-stage stochastic programs with recourse) 
expected recourse function 

[90C10, 90C15] 

(see: Stochastic integer programs) 
expected savings

[90B15] 

(see: Evacuation networks) 
expected value function 

[90C11, 90C15] 

(see: Stochastic programming with simple integer recourse) 
expected value of perfect information 

[90C15] 

(see: Two-stage stochastic programs with recourse) 
expected weight of evidence 

[90C25, 94A17]

(see: Entropy optimization: shannon measure of entropy
and its properties)
experiment 

[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30,
76R50, 80A20, 80A23, 80A30]

(see: Identification methods for reaction kinetics and
transport)
experiment design 

[15A15, 90C25, 90C55, 90C90] 

(see: Semidefinite programming and determinant
maximization)
experimental analysis see: model-based — 
experimental design 

[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30,
76R50, 80A20, 80A23, 80A30]

(see: Identification methods for reaction kinetics and
transport)
experimental design see: optimal —; sequential — 
expert system 

[90C26, 90C30] 

(see: Forecasting) 
expert systems 

[90C26, 90C30] 

(see: Forecasting) 
exploiting the interplay between primal and dual solutions 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: branch and bound methods) 
exploration of minimax trees see: parallelizing the — 
exploratory statistical analysis 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 
exponential 

[34E05, 90C27] 

(see: Asymptotic properties of random multidimensional
assignment problem)


exponential algorithm 

[90C60] 

(see: Computational complexity theory) 
exponential behavior 

[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules)
exponential complexity 

[03B50, 68T15, 68T30] 

(see: Finite complete systems of many-valued logic algebras) 
exponential function 

[90C60] 

(see: Complexity classes in optimization) 
exponential function see: parabolic- — 
exponential scale 

[90C29] 

(see: Estimating data for multicriteria decision making
problems: optimization techniques)
exponential smoothing 
[90C26, 90C30] 

(see: Forecasting) 

exponential time algorithm 

[90C60] 

(see: Computational complexity theory)
exponential transformation 

[90C11, 90C90] 

(see: MINLP: trim-loss problem) 
exponentially space-bounded Turing machine 
[90C60] 

(see: Complexity classes in optimization)
exponentially time-bounded Turing machine 
[90C60] 

(see: Complexity classes in optimization) 
export model 

[90C35] 

(see: Multicommodity flow problems)
exposure time see: radiation — 

express delivery problem 

[00-02, 01-02, 03-02] 

(see: Vehicle routing problem with simultaneous pickups
and deliveries)
express shipment delivery 
[90C35] 

(see: Multicommodity flow problems) 
expression parsing 

[65G20, 65G30, 65G40, 65H20] 

(see: Interval analysis: intermediate terms)
expressions see: algebraic — 

EXSPACE 

[90C60] 

(see: Complexity classes in optimization)
extended adjoint equation 

[41A10, 47N10, 49K15, 49K27] 

(see: High-order maximum principle for abnormal 
extremals) 
extended alignment graph

[90C35]

(see: Optimization in leveled graphs) 

extended canonical function space 

[49-XX, 90-XX, 93-XX] 

(see: Duality theory: triduality in global optimization)

extended cutting plane 

[49M37, 90C11] 

(see: MINLP: outer approximation algorithm; Mixed
integer nonlinear programming)

Extended cutting plane algorithm 

(90C11, 90C26)
(referred to in: Chemical process planning; Generalized 
benders decomposition; Generalized outer approximation; 
MINLP: application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm; 
MINLP: branch and bound methods; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: heat exchanger 
network synthesis; Mixed integer linear programming: 
mass and heat exchanger networks; Mixed integer 
nonlinear programming; Quadratic assignment problem) 

extended cutting plane method 

[90C11] 

(see: MINLP: outer approximation algorithm)

extended doubly connected edge list 

[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 

(see: Optimization problems in unit-disk graphs)

extended Extremal Principle 

[49K27, 58C20, 58E30, 90C48] 

(see: Nonsmooth analysis: Fréchet subdifferentials)


extended group relaxations 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods)
extended Lagrange-Slater dual 
[90C05, 90C25, 90C30, 90C34] 
(see: Duality for semidefinite programming; Semi-infinite
programming, semidefinite programming and perfect 
duality) 
extended linear complementarity problem 
[90C33] 
(see: Linear complementarity problem) 
extended linear programming problems 
[90C25, 90C33, 90C55] 
(see: Splitting method for linear complementarity 
problems) 
extended matrix 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of Newton 
steps) 
extended quadratic programming problem 
[90C25, 90C33, 90C55] 
(see: Splitting method for linear complementarity 
problems) 


extended real-valued CNSO 
[46A20, 52A01, 90C30] 
(see: Composite nonsmooth optimization) 
extended set of Lagrange multipliers 
[49M37, 65K05, 90C30] 
(see: Inequality-constrained nonlinear optimization) 
extended support problems method 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
extension see: coloring —; conic —; interval —; Lovász —;
mean value —; natural interval —; path —; uniform —; 
united — 
Extension of the fundamental theorem of linear programming 
(90C05) 
extension set 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
extension theorem see: Hahn-Banach linear — 
extensionality see: axiom of — 
extensionality axiom 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
extensionality convention 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
extensive form 
[90C15]
(see: L-shaped method for two-stage stochastic programs 
with recourse) 
exterior point algorithm see: dual — 
exterior point method 
[90C05, 90C33]
(see: Pivoting algorithms for linear programming 
generating two paths) 
exteriority 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization) 
external deviation 
[62H30, 68T10, 90C05]
(see: Linear programming models for classification) 
external energy 
[90C90]
(see: Optimization in medical imaging) 
externalities 
[90B80, 90B85, 90Cxx, 91Axx, 91Bxx]
(see: Facility location with externalities) 
externalities see: Facility location with — 
extra-gradient algorithm 
[90C30]
(see: Cost approximation algorithms) 
extra-urban transit planning 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 
extraction see: feature — 
extrapolation
[90C26, 90C30] 
(see: Forecasting) 
extrapolation see: subjective curve fitting and — 
extrapolation methods 
[90C26, 90C30] 
(see: Forecasting) 
extraTime 
(see: Medium-term scheduling of batch processes) 
extremal 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
extremal see: abnormal —; abnormal weak —; locally —; 
normal —; weak — 
extremal basis 
[49K35, 49M27, 65K10, 90C25]
(see: Convex max-functions) 
extremal basis method 
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems) 
extremal equations 
[90B85]
(see: Single facility location: multi-objective euclidean 
distance location) 
extremal global optimization see: multi- — 
extremal Principle 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 
Extremal Principle see: extended — 
extremal ray 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
extremal set see: cover the — 
extremality see: multi- — 
extremals see: High-order maximum principle for abnormal — 
extreme-efficient 
[90B30, 90B50, 90C05, 91B82] 
(see: Data envelopment analysis) 
extreme eigenvalue of an interval matrix 
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: eigenvalue bounds of interval 
matrices) 
extreme events see: decision making under — 
extreme face 
[90C09, 90C10, 90C11] 
(see: Disjunctive programming) 
extreme feasible solution 
[90C60] 
(see: Complexity of degeneracy) 
extreme point 
[90C05, 90C30] 
(see: Convex-simplex algorithm; Frank-Wolfe algorithm; 
Krein-Milman theorem; Simplicial decomposition) 


extreme point enumeration 

[90C60] 

(see: Complexity of degeneracy)

extreme point mathematical program 

[90C09, 90C10, 90C11] 

(see: Disjunctive programming)

extreme point ranking 

[90C60] 

(see: Complexity of degeneracy)

extreme point solution 

[90C60] 

(see: Complexity of degeneracy)

extreme point solutions see: enumerating — 

extreme points see: convex combination of the —; ranking — 

extremum principles 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 

extremum problems with probability functionals see: 
Approximation of — 

Extremum problems with probability functions: kernel type 
solution methods 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; General 
moment optimization problems; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; Multistage stochastic programming: barycentric 
approximation; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Simple recourse problem: dual method; Simple 
recourse problem: primal method; Stabilization of cutting 
plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; General 
moment optimization problems; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; Multistage stochastic programming: barycentric 
approximation; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Simple recourse problem: dual method; Simple
recourse problem: primal method; Stabilization of cutting 
plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 


eye-view see: beam’s- — 


F 


F see: ADOL- — 
f-attentive convergence 

[49K27, 58C20, 58E30, 90C48] 

(see: Nonsmooth analysis: Fréchet subdifferentials) 
F-complete language 

[90C60] 

(see: Complexity classes in optimization) 
F-completeness 

[90C60] 

(see: Complexity classes in optimization) 
F. E. approach see: Variational inequalities: — 
F-hard language 


[90C60] 


(see: Complexity classes in optimization)


F-hardness
[90C60]
(see: Complexity classes in optimization)
face
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements in optimization)
face see: extreme —; k- —; optimal —; points on the same —
face of an arrangement
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements in optimization)
face of a polyhedral subdivision
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
faces see: incident —
facet
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: cutting plane algorithms)
facet
[90C09, 90C10, 90C11]
(see: Disjunctive programming)


facets 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C] 


(see: Convex discrete optimization) 

facial disjunctive program 

[90C09, 90C10, 90C11]

(see: Disjunctive programming) 

facial program 

[90C10, 90C11, 90C27, 90C57]
(see: Integer programming) 

facilities see: controlled recharge —; groundwater 
pumping —; residents of special —; surface water
pumping — 

facilities layout 
[90B80] 
(see: Facilities layout problems) 

Facilities layout problems 


(90B80) 

(referred to in: Quadratic assignment problem) 

(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Complexity classes in optimization; Complexity theory; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Genetic algorithms; Global optimization in 
Weber’s problem with attraction and repulsion; Integer 
programming: branch and bound methods; Integer 
programming: lagrangian relaxation; MINLP: application 
in facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Production-distribution system design problem; 
Quadratic assignment problem; Resource allocation for 
epidemic control; Simulated annealing methods in protein 
folding; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 


facility layout 


[90C10, 90C27, 94C15] 
(see: Graph planarization) 


facility location 


[90-02, 90B05, 90B06, 90B80, 90B85, 90C11, 90Cxx, 91Axx, 
91Bxx] 

(see: Facility location with externalities; Global supply chain 
models; Operations research models for supply chain 
management and design; Stochastic transportation and 
location problems; Warehouse location problem) 


facility location 


[05C05, 05C85, 68Q25, 90B80, 90B85, 90C08, 90C11, 90C26, 
90C27, 90C57, 90C59, 90C90] 

(see: Bottleneck steiner tree problems; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; Quadratic assignment 
problem; Single facility location: multi-objective euclidean 
distance location; Single facility location: multi-objective 
rectilinear distance location; Voronoi diagrams in facility
location)
facility location see: Competitive —; dynamic —;
emergency —; multi-objective —; multiple- —; single —;
Voronoi diagrams in —
facility location-allocation 

[49M37, 90C11] 

(see: Mixed integer nonlinear programming) 
facility location-allocation 

[90C26] 

(see: MINLP: application in facility location-allocation) 
facility location-allocation see: MINLP: application in — 
facility location: circle covering problem see: Single — 
facility location with euclidean and rectilinear distances see:
Optimizing —

Facility location with externalities 

(90B80, 90B85, 91Bxx, 90Cxx, 91Axx)

(referred to in: Combinatorial optimization algorithms in
resource allocation problems; Facilities layout problems;
Facility location problems with spatial interaction; Facility
location with staircase costs; Global optimization in
Weber’s problem with attraction and repulsion; MINLP:
application in facility location-allocation; Multifacility and
restricted location problems; Network location: covering
problems; Optimizing facility location with euclidean and
rectilinear distances; Single facility location: circle covering
problem; Single facility location: multi-objective euclidean
distance location; Single facility location: multi-objective
rectilinear distance location; Stochastic transportation and
location problems; Voronoi diagrams in facility location;
Warehouse location problem)

(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location problems with spatial interaction; Facility 
location with staircase costs; Global optimization in 
Weber’s problem with attraction and repulsion; MINLP: 
application in facility location-allocation; Multifacility and 
restricted location problems; Network location: covering 
problems; Optimizing facility location with euclidean and 
rectilinear distances; Production-distribution system 
design problem; Resource allocation for epidemic control; 
Single facility location: circle covering problem; Single 
facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 


facility location model 


[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 


facility location model see: spatial competition —;
Stochastic —


facility location: multi-objective euclidean distance location
see: Single —


facility location: multi-objective rectilinear distance location
see: Single —


facility location problem 


[90B80, 90C10, 90C11, 90C27, 90C57] 
(see: Facility location with staircase costs; Integer 
programming) 


facility location problem see: uncapacitated — 


facility location problems 


[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 


Facility location problems with spatial interaction 


(90B80, 90C10) 

(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 

(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Production-distribution system design problem; 
Resource allocation for epidemic control; Single facility 
location: circle covering problem; Single facility location: 
multi-objective euclidean distance location; Single facility 
location: multi-objective rectilinear distance location; 
Stochastic transportation and location problems; Voronoi 
diagrams in facility location; Warehouse location problem) 


facility location problems with staircase costs see: convex
piecewise linearization in —; heuristics of —; linearization
in —; solution of —


Facility location with staircase costs 


(90B80, 90C11) 

(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Global optimization in 
Weber’s problem with attraction and repulsion; MINLP: 
application in facility location-allocation; Multifacility and 
restricted location problems; Network location: covering 
problems; Optimizing facility location with euclidean and 
rectilinear distances; Single facility location: circle covering 
problem; Single facility location: multi-objective euclidean 
distance location; Single facility location: multi-objective 
rectilinear distance location; Stochastic transportation and 
location problems; Voronoi diagrams in facility location; 
Warehouse location problem) 

(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location 
problems with spatial interaction; Global optimization in 
Weber’s problem with attraction and repulsion; MINLP: 
application in facility location-allocation; Multifacility and 
restricted location problems; Network location: covering 
problems; Optimizing facility location with euclidean and 
rectilinear distances; Production-distribution system
design problem; Resource allocation for epidemic control; 
Single facility location: circle covering problem; Single 
facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 

facility location with staircase costs 
[90B80, 90C11] 
(see: Facility location with staircase costs) 

facility planning and scheduling 
[49M37, 90C11] 
(see: Mixed integer nonlinear programming) 

facility problem in OR see: single- — 

factor see: constraint- —; cycle —; discount —; fading —;
human rationality —; Q- —; search overhead —
factor programming see: variable — 
factorial HMM 


(see: Bayesian networks) 

factorization see: Bunch and Parlett —; Cholesky —; classical 
LU —; complete orthogonal —; matrix —; modifying 
matrix —; orthogonal —; parallel matrix —; QR —; rank
revealing —; rank revealing QR —; rank revealing URV —; 
structured matrix — 

factorization with column-pivoting see: QR — 

factorization of structured matrices see: Stochastic 
programming: parallel — 

factorization using Householder transformations see: QR — 

factorized quasi-Newton methods 


[49M37] 

(see: Nonlinear least squares: Newton-type methods) 
factors see: bound- —; normalized structure — 
fading factor 


(see: Bayesian networks) 
failure of the alpha-beta algorithm see: high —; low — 
failure risk see: business — 
fair objective function 

[90C09, 90C10] 

(see: Combinatorial optimization algorithms in resource
allocation problems)
falsification 

[34-xx, 34Bxx, 34Lxx, 93E24] 

(see: Complexity and large-scale least squares problems) 
families of Pi-algebras 

[03B50, 68T15, 68T30] 

(see: Finite complete systems of many-valued logic algebras) 
families of the Pinkava logic algebras see: many-valued — 
family see: Broyden —; exceptional —; finite nested —;
laminar —; two-parameter CG —
family of measures see: dominated —; weakly L1(v)-differentiable —
family of methods see: Broyden — 
family of methods and the BFGS update see: Broyden — 
family of preferences see: embedded — 
family of probability measures see: regular — 
family of sets see: pseudoconnected — 
family of triangulations see: regular — 
fan see: Gröbner —; secondary —


FAP 
[05-XX] 
(see: Frequency assignment problem) 

far-from-native conformations see: discarding — 

Farkas lemma 
(15A39, 90C05) 
(referred to in: Farkas lemma: generalizations; 
Fourier-Motzkin elimination method; Fractional 
programming; Global optimization: envelope 
representation; Gröbner bases for polynomial equations;
Kuhn-Tucker optimality conditions; Lagrangian duality: 
BASICS; Least-index anticycling rules; Linear optimization: 
theorems of the alternative; Linear programming; Motzkin 
transposition theorem; Theorems of the alternative and 
optimization; Tucker homogeneous systems of linear 
relations) 
(refers to: Farkas lemma: generalizations; Linear 
optimization: theorems of the alternative; Linear 
programming; Motzkin transposition theorem; Theorems 
of the alternative and optimization; Tucker homogeneous 
systems of linear relations) 

Farkas lemma 
[05B35, 15A39, 90C05, 90C20, 90C30, 90C33] 
(see: Kuhn-Tucker optimality conditions; Least-index 
anticycling rules; Linear optimization: theorems of the 
alternative; Motzkin transposition theorem; Theorems of 
the alternative and optimization) 

Farkas lemma: generalizations 
(46A20, 90C30, 52A01)
(referred to in: Farkas lemma; Fourier-Motzkin elimination 
method; Fractional programming; Global optimization: 
envelope representation; Gröbner bases for polynomial
equations; Lagrangian duality: BASICS; Least-index 
anticycling rules; Theorems of the alternative and 
optimization) 
(refers to: Farkas lemma) 

farthest-point Voronoi diagram 
[90B80, 90C27] 
(see: Voronoi diagrams in facility location) 

farthest vertex insertion (FVI)
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 

fast Givens transformation 

[15A23, 65F05, 65F20, 65F22, 65F25]

(see: QR factorization) 

fast interchange 

[90C08, 90C26, 90C27, 90C59]

(see: Variable neighborhood search methods) 

fathom 

[90C11]

(see: MINLP: branch and bound methods) 

fathoming a node 

[90C05, 90C06, 90C08, 90C10, 90C11]

(see: Integer programming: branch and bound methods) 

fathoming step 

[90C10, 90C26]
(see: MINLP: branch and bound global optimization
algorithm) 
fatness 
[68Q20] 
(see: Optimal triangulations) 
fault see: jump across a —; negative —; positive — 
fault ridge 
[90Cxx] 
(see: Discontinuous optimization) 
faults see: set of — 
fc and max-regret heuristics see: max-regret- — 
FCO 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization)
FCOP
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization)
FCTP
[90B06, 90B10, 90C26, 90C35]
(see: Minimum concave transportation problems)
FDL
[90C09]
(see: Inference of monotone boolean functions)
FDS
[90B85]
(see: Multifacility and restricted location problems) 
feasibility 
[90B10, 90B15, 90C15, 90C35]
(see: Preprocessing in stochastic programming)
feasibility see: dual —; interdisciplinary —; Interval analysis:
verifying —; primal —
feasibility analysis see: Shape reconstruction methods for
nonconvex —
feasibility approach
[49M37, 90C11]
(see: Mixed integer nonlinear programming)
feasibility approach to image reconstruction from projection data
[94A08, 94A17]
(see: Maximum entropy principle: image reconstruction)
feasibility condition see: strict —
feasibility convergence tests
[49M27, 90C11, 90C30]
(see: MINLP: generalized cross decomposition)
feasibility cut
[90B10, 90B15, 90C15, 90C35]
(see: Preprocessing in stochastic programming)
feasibility cuts
[49M27, 90C11, 90C15, 90C30]
(see: L-shaped method for two-stage stochastic programs
with recourse; MINLP: generalized cross decomposition)
feasibility of equality constraints
[65G20, 65G30, 65G40, 65H20]
(see: Interval analysis: verifying feasibility)
feasibility error
[90C90, 91B28]
(see: Robust optimization)
feasibility of inequality constraints
[65G20, 65G30, 65G40, 65H20]
(see: Interval analysis: verifying feasibility)
feasibility problem
[05B35, 46A20, 52A01, 90C05, 90C20, 90C30, 90C33]
(see: Farkas lemma: generalizations; Least-index anticycling
rules)
feasibility problem see: convex —; nonlinear —; zero-one 
integer — 
feasibility set see: second-stage — 
feasibility test 
[65G20, 65G30, 65G40, 65K05, 90C26, 90C30] 
(see: Bilevel optimization: feasibility test and flexibility 
index; Interval global optimization) 
feasibility test and flexibility index see: Bilevel optimization: — 
feasible 
[49M30, 49M37, 65K05, 90C26, 90C30, 90C31] 
(see: Practical augmented Lagrangian methods; Robust 
global optimization; Smooth nonlinear nonconvex 
optimization) 
feasible approach 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
feasible ascendant direction 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s
conjecture)
feasible assignment
[90C10]
(see: Maximum constraint satisfaction: relaxations and
upper bounds)
feasible basis see: primal —
feasible box
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
feasible component
[68T99, 90C27]
(see: Capacitated minimum spanning trees)
feasible computational solution see: practically —
feasible cones see: high-order —
feasible for Dpy
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming, semidefinite
programming and perfect duality)
feasible decomposition method
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
feasible direction
[65K05, 65K10, 90C06, 90C30, 90C34]
(see: Feasible sequential quadratic programming;
Kuhn-Tucker optimality conditions; Rosen’s method,
global convergence, and Powell’s conjecture)
feasible direction
[90C30]
(see: Frank-Wolfe algorithm)
feasible direction see: improving —
feasible direction method for nonlinear programming
[90C30]
(see: Frank-Wolfe algorithm)
feasible direction methods
[90C29, 90C30]
(see: Convex-simplex algorithm; Multi-objective
optimization; Interactive methods for preference value
functions)
feasible direction methods
[65K05, 65K10, 90C30]
(see: ABS algorithms for optimization; Convex-simplex
algorithm; Frank-Wolfe algorithm)

feasible directions see: combined method of —; cone of —; 
methods of — 

feasible domain
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)

feasible flow
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource
allocation problems)

feasible flow problem
[90C35]
(see: Maximum flow problem)

feasible flow vector
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource
allocation problems)

feasible gradient controller
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)

feasible high-order approximating cones
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for
abnormal points)

feasible high-order approximating curve
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for
abnormal points)

feasible high-order approximating vector
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for
abnormal points)

feasible iterates
[65K05, 65K10, 90C06, 90C30, 90C34]
(see: Feasible sequential quadratic programming)

feasible move
[68T99, 90C27]
(see: Capacitated minimum spanning trees)

feasible node
[90C10, 90C29]
(see: Multi-objective integer linear programming)

feasible path approach
[90C30, 90C90]
(see: Successive quadratic programming: applications in the
process industry)

feasible path flow pattern
[90B06, 90B20, 91B50]
(see: Traffic network equilibrium)

feasible point
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30,
90C31]
(see: Rosen’s method, global convergence, and Powell’s
conjecture; Sensitivity and stability in NLP: approximation;
Stochastic global optimization: two-phase methods)

feasible point
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)

feasible point see: regular —

feasible point to a solution point see: bounds on the distance
of a —

feasible points see: set of — 

feasible problem 
[15A39, 90C05] 
(see: Linear optimization: theorems of the alternative) 

feasible region 
[65G30, 65G40, 65K05, 68W10, 90B15, 90C05, 90C06, 90C20, 
90C29, 90C30, 90C31, 90C57, 90C90] 
(see: Bilevel programming: global optimization; Global 
optimization: interval analysis and balanced interval 
arithmetic; Multiple objective programming support; 
Redundancy in nonlinear programs; Rosen’s method, 
global convergence, and Powell’s conjecture; Sensitivity and 
stability in NLP: continuity and differential stability; 
Stochastic network problems: massively parallel solution) 

feasible region see: enlargement of a —; minimal 
representation of a —; prime representation of a —; 
relaxation of a — 

feasible region reduction 
[90C29] 
(see: Multi-objective optimization; Interactive methods for 
preference value functions) 

Feasible sequential quadratic programming 
(65K05, 65K10, 90C06, 90C30, 90C34) 
(referred to in: Optimization with equilibrium constraints: 
A piecewise SQP approach; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Successive quadratic 
programming; Successive quadratic programming: 
applications in distillation systems; Successive quadratic 
programming: applications in the process industry; 
Successive quadratic programming: decomposition 
methods; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 
(refers to: Optimization with equilibrium constraints: 
A piecewise SQP approach; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Successive quadratic 
programming; Successive quadratic programming: 
applications in distillation systems; Successive quadratic 
programming: applications in the process industry; 
Successive quadratic programming: decomposition 
methods; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 

feasible sequential quadratic programming 
[65K05, 65K10, 90C06, 90C30, 90C34] 
(see: Feasible sequential quadratic programming) 

feasible set 
[37A35, 49M37, 65K05, 65K10, 90-08, 90C05, 90C26, 90C27,
90C29, 90C30, 90C59, 93A13]
(see: Multilevel methods for optimal design; Multiple
objective programming support; Potential reduction
methods for linear programming; Smooth nonlinear
nonconvex optimization; Variable neighborhood search
methods)

feasible set see: branch of a —; dual —; high-order —;
p-order —; primal —; strictly —
feasible solution
[90-08, 90C05, 90C25, 90C26, 90C27, 90C30, 90C31, 90C33,
90C59] 
(see: Lagrangian multipliers methods for convex 
programming; Pivoting algorithms for linear programming 
generating two paths; Robust global optimization; Variable 
neighborhood search methods) 
feasible solution see: basic —; extreme — 
feasible solutions see: set of — 
feasible spanning tree structure 
[90C35] 
(see: Minimum cost flow problem) 
feasible underestimators 
[90C11, 90C26] 
(see: Extended cutting plane algorithm) 
feature-based aggregation 
[49L20, 90C40] 
(see: Dynamic programming: stochastic shortest path 
problems) 
feature detection see: low-level — 
feature extraction 
[90C39]
(see: Neuro-dynamic programming)
feature segmentation
[90C90]
(see: Optimization in medical imaging)
feature selection
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based methods)
feature space
[90C90]
(see: Optimization in medical imaging)
feature vector
[90C39]
(see: Neuro-dynamic programming)
features see: special model —
features, examples from financial decision making see:
Preference disaggregation approach: basic —
fed-batch reactor
[93-XX]
(see: Dynamic programming: optimal control applications)
Feed
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX]
(see: Nonlocal sensitivity analysis with automatic
differentiation)
Feed algorithm
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX]
(see: Nonlocal sensitivity analysis with automatic 
differentiation) 
feed-forward network see: two-layer — 
feed-forward neural network 
[90C27, 90C30]
(see: Neural networks for combinatorial optimization)
feedback
[93D09]
(see: Robust control)
feedback see: incomplete state —; off-line —; on-line —
feedback arc set problem
[90C35]
(see: Feedback set problems)

feedback arc set problem see: minimum —; minimum
weight —

feedback control
[93-XX]
(see: Dynamic programming: optimal control applications)

feedback Nash equilibrium
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)

feedback set problem
[90C35]
(see: Feedback set problems)

Feedback set problems
(90C35)
(referred to in: Biquadratic assignment problem; Graph
coloring; Graph planarization; Greedy randomized 
adaptive search procedures; Linear ordering problem; 
Quadratic assignment problem; Quadratic 
semi-assignment problem) 
(refers to: Generalized assignment problem; Graph coloring; 
Graph planarization; Greedy randomized adaptive search 
procedures; Quadratic assignment problem; Quadratic 
semi-assignment problem) 

feedback Stackelberg solution 
(see: Bilevel programming framework for enterprise-wide 
process networks under uncertainty) 

feedback vertex (arc) set problem see: minimum —; subset —; 
subset minimum — 


feedback vertex set 
[90C35] 
(see: Feedback set problems) 

feedback vertex set see: minimum — 

feedback vertex set problem 
[90C35] 
(see: Feedback set problems) 

feedback vertex set problem see: minimum weighted —; 
unweighted — 

Fejér monotone sequence 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 

Fejér monotonicity 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 

Fejér monotonicity in convex optimization 
(47H05, 90C25, 90C55, 65J15, 90C25) 
(referred to in: Generalized monotone multivalued maps; 
Generalized monotone single valued maps; Generalized 
monotonicity: applications to variational inequalities and 
equilibrium problems) 
(refers to: Generalized monotone multivalued maps; 
Generalized monotone single valued maps; Generalized 
monotonicity: applications to variational inequalities and 
equilibrium problems) 

Fejérian see: S- — 

Fekete points problem
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 
Fenchel conjugate functions
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)
Fenchel cuts
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: cutting plane algorithms)
Fenchel duality
[90C25, 90C27, 90C90]
(see: Semidefinite programming and structural
optimization)
Fenchel duality pair
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)
Fenchel-Legendre transformation see: integral — 
Fenchel-Moreau duality 
[90C26] 
(see: Global optimization: envelope representation)
Fenchel-Moreau subdifferential
[90C26]
(see: Generalized monotone multivalued maps)
Fenchel-Rockafellar duality
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization;
Duality theory: triduality in global optimization)
Fenchel-Rockafellar duality
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization)
Fenchel-Rockafellar duality theory
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization)
Fenchel transformation
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization;
Duality theory: monoduality in convex optimization)
Fenchel-type duality for M- and L-convex functions
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions)
Fenchel-Young inequality
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization)
Fermat problem
[90C27]
(see: Steiner tree problems)
Fermat problem see: general —
F.H. Clarke see: generalized subdifferential of —
fH-VNS
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
fiber see: Gröbner —
Fibonacci section search
[90C30]
(see: Nonlinear least squares problems)
fictitious domain method
[49J20, 49J52]
(see: Shape optimization)
fictitious uncertainty
[93D09]
(see: Robust control)
field see: electric —; sigma- —; splitting —
field approximation see: mean —

field via linear optimization see: Distance dependent protein
force —

fields see: force —; offshore oil — 

Figure Legends 
(see: Mixed integer nonlinear bilevel programming: 
deterministic global optimization) 

figures 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 

fill-in see: intermediate — 

fill-in of a graph see: minimum — 

filled function 
[65K05, 90C26, 90C30, 90C59] 
(see: Global optimization: filled function methods) 

filled function see: discrete —; globally convexized —; 
locally — 

filled function methods 
[65K05, 90C26, 90C30, 90C59] 
(see: Global optimization: filled function methods) 

filled function methods see: basic outline of —; Global 
optimization: — 

filling see: Global optimization using space — 

filling curve see: space — 

filling curves see: approximation of space — 

FILO 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules) 

filter see: Kalman —; Sobel edge —; Volterra —

filters see: wedge — 

filtration see: stochastic process nonanticipative with respect
to a —

final state of a Turing machine 
[90C60] 
(see: Complexity classes in optimization) 

finance 
[90C26, 90C27, 91B06, 91B28, 91B60] 
(see: Financial applications of multicriteria analysis; 
Operations research and financial markets; Portfolio 
selection: markowitz mean-variance model; Portfolio 
selection and multicriteria analysis) 

finance see: mathematical —; Semi-infinite programming and 
applications in — 

Financial applications of multicriteria analysis 
(91B06, 91B60) 
(referred to in: Bi-objective assignment problem; 
Competitive ratio for portfolio management; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial optimization; Fuzzy multi-objective 
linear programming; Multicriteria sorting methods; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision
making; Preference modeling; Robust optimization; 
Semi-infinite programming and applications in finance) 
(refers to: Bi-objective assignment problem; Competitive 
ratio for portfolio management; Decision support systems 
with multiple criteria; Estimating data for multicriteria 
decision making problems: optimization techniques; 
Financial optimization; Fuzzy multi-objective linear 
programming; Multicriteria sorting methods; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling; Robust optimization; 
Semi-infinite programming and applications in finance) 

financial decision making 
[90C29] 
(see: Preference disaggregation approach: basic features, 
examples from financial decision making) 

financial decision making see: Preference disaggregation 
approach: basic features, examples from — 

Financial equilibrium 
(91B50) 
(referred to in: Equilibrium networks; Generalized 
monotonicity: applications to variational inequalities and 
equilibrium problems; Oligopolistic market equilibrium; 
Spatial price equilibrium; Traffic network equilibrium; 
Walrasian price equilibrium)
(refers to: Equilibrium networks; Generalized monotonicity:
applications to variational inequalities and equilibrium
problems; Oligopolistic market equilibrium; Spatial price 
equilibrium; Traffic network equilibrium; Walrasian price 
equilibrium) 

financial equilibrium 
[91B50] 
(see: Financial equilibrium) 

financial equilibrium model see: multi-sector 
multi-instrument — 

financial leverage hypothesis 
[90C05, 90C90, 91B28] 
(see: Multicriteria methods for mergers and acquisitions) 

financial markets see: Operations research and — 

Financial optimization 
(91B28) 
(referred to in: Competitive ratio for portfolio management; 
Financial applications of multicriteria analysis; Portfolio 
selection and multicriteria analysis; Robust optimization; 
Semi-infinite programming and applications in finance) 
(refers to: Competitive ratio for portfolio management;
Financial applications of multicriteria analysis; Portfolio
selection and multicriteria analysis; Robust optimization;
Semi-infinite programming and applications in finance)
financial planning
[91B28]
(see: Financial optimization)
financial planning problems see: Global optimization
algorithms for —
find all see: find one —
find one, find all
(see: Planning in the process industry)
finding a minimum
[49J52, 90C30]
(see: Nondifferentiable optimization: relaxation methods)
finding problem see: direction —; regularized direction —
finding procedure see: model —
finding shortest paths see: problem of —
fine structures see: maxdiag —; mindiag —
fine valuation structure
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)
finer grid
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
finite
[46N10, 47N10, 49M37, 57R12, 65K10, 90C26, 90C30, 90C31,
90C34]
(see: Global optimization: tight convex underestimators; LP
strategy for interval-Newton method in deterministic global
optimization; Parametric global optimization: sensitivity;
Smoothing methods for semi-infinite optimization)
finite alphabet
[90C60]
(see: Complexity classes in optimization)
finite class
[03E70, 03H05, 91B16]
(see: Alternative set theory)

Finite complete systems of many-valued logic algebras
(03B50, 68T15, 68T30)
(referred to in: Alternative set theory; Boolean and fuzzy
relations; Checklist paradigm semantics for fuzzy logics;
Inference of monotone boolean functions; Optimization in
boolean classification problems; Optimization in classifying
text documents)
(refers to: Alternative set theory; Boolean and fuzzy
relations; Checklist paradigm semantics for fuzzy logics;
Inference of monotone boolean functions; Optimization in
boolean classification problems; Optimization in classifying
text documents)
finite costs see: reduction to —

finite-difference approximation
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)
finite difference methods
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation)

finite differences
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians)
finite-dimensional control problem
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
finite-dimensional linear program
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
finite-dimensional models for entropy optimization for image
reconstruction
[94A08, 94A17]
(see: Maximum entropy principle: image reconstruction)
finite-dimensional subspace
[65M60]
(see: Variational inequalities: F. E. approach)
finite-dimensional variational inequality problem
[65K10, 65M60]
(see: Variational inequalities: geometric interpretation,
existence and uniqueness)
finite dominating set
[90B85]
(see: Multifacility and restricted location problems)
finite element
[49M37, 65K05, 90C26, 90C30, 90C90]
(see: Structural optimization; Structural optimization:
history)
finite element
[49M37, 65K05, 90C30]
(see: Structural optimization)
finite element see: mixed —
finite element approximation
[90C25, 90C27, 90C90]
(see: Semidefinite programming and structural
optimization)
finite element approximation see: mixed —
finite element method
[49J40, 49J52, 49M05, 49Q10, 49S05, 65M60, 70-08, 74G99,
74H99, 74K99, 74Pxx, 90C33, 90C90, 91A65, 94A08, 94A17]
(see: Hemivariational inequalities: applications in
mechanics; Maximum entropy principle: image
reconstruction; Multilevel optimization in mechanics;
Quasidifferentiable optimization: variational formulations;
Quasivariational inequalities; Variational inequalities: F. E.
approach)
finite ε-convergence
[49M29, 90C11]
(see: Generalized benders decomposition)
finite generation method see: Lagrangian —
finite horizon
(see: Bayesian networks)
finite jump system
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource
allocation problems)
finite minimax problem
[49K35, 49M27, 65K10, 90C25]
(see: Convex max-functions)
finite moment problem
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems)


finite natural numbers
[03E70, 03H05, 91B16]
(see: Alternative set theory)
finite nested family
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
finite optimality
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
finite optimization problem see: one-parametric —
finite rational numbers
[03E70, 03H05, 91B16]
(see: Alternative set theory)
finite sequence see: generalized —
finite set see: hierarchy in a —
finite set of the alternatives
[90-XX]
(see: Outranking methods)

finite-state Markov chain
[49L20, 90C39]
(see: Dynamic programming: discounted problems)
Finsler theorem
[93D09]
(see: Robust control)

firmly nonexpansive operator
[47H05, 65J15, 90C25, 90C55]
(see: Fejér monotonicity in convex optimization)
first see: best- —; breadth- —; depth- —
first algorithm see: mandatory work —
first bank
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)
first-cluster second see: schedule —
first descent
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
first-In-First-Out
(see: Railroad crew scheduling)
first-in last-out rule
[05B35, 65K05, 90C05, 90C20, 90C33]
(see: Criss-cross pivoting rules)
first level problem 
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and
duality)
first order approximation of a function
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
first order changes see: up to —
first order constraint qualification
[49K27, 49K40, 90C30, 90C31]
(see: First order constraint qualifications)
First order constraint qualifications
(90C30, 49K27, 90C31, 49K40)
(referred to in: Equality-constrained nonlinear
programming: KKT necessary optimality conditions;
Inequality-constrained nonlinear optimization;
Kuhn-Tucker optimality conditions; Lagrangian duality:
BASICS; Nondifferentiable optimization: parametric
programming; Rosen’s method, global convergence, and
Powell’s conjecture; Saddle point theory and optimality 
conditions; Second order constraint qualifications; Second 
order optimality conditions for nonlinear optimization) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; 
Inequality-constrained nonlinear optimization; 
Kuhn-Tucker optimality conditions; Lagrangian duality: 
BASICS; Rosen’s method, global convergence, and Powell’s 
conjecture; Saddle point theory and optimality conditions; 
Second order constraint qualifications; Second order 
optimality conditions for nonlinear optimization) 

first order constraint qualifications 
[49K27, 49K40, 90C30, 90C31] 
(see: Second order constraint qualifications) 

First order CQ 
[49K27, 49K40, 90C26, 90C30, 90C31, 90C39] 
(see: First order constraint qualifications; Second order 
optimality conditions for nonlinear optimization) 

first order differential equations see: Duality in optimal control 
with — 

first order KKT conditions 
[90C31] 
(see: Bounds and solution vector estimates for parametric 
NLPs)

first order necessary condition 
[49M29, 65K10, 90C06, 90C26, 90C39] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control; Second order optimality 
conditions for nonlinear optimization) 

first order necessary conditions 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

first order necessary optimality conditions 
[49M37, 65K05, 90C30] 
(see: Equality-constrained nonlinear programming: KKT 
necessary optimality conditions) 

first order optimality 
[49M37, 65K05, 90C30] 
(see: Inequality-constrained nonlinear optimization) 

First order partial differential equations 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 

first order and second order optimality conditions 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

first order tangent set 
[49K27, 49K40, 90C30, 90C31] 
(see: Second order constraint qualifications) 

first order Taylor series expansion 
[90C30] 
(see: Simplicial decomposition) 

first order Taylor series expansion 
[90C30] 
(see: Convex-simplex algorithm; Frank-Wolfe algorithm; 
Simplicial decomposition) 


first order theory of real addition with order 
[52B12, 68Q25] 
(see: Fourier-Motzkin elimination method)

First-Out see: first-In- — 

first principle see: Wardrop — 

first-schedule second strategy see: cluster — 

first search see: depth- — 

first search with backtracking see: depth- — 

first slope lemma
[90C30]
(see: Rosen’s method, global convergence, and Powell’s
conjecture)

first-stage decision
[90C15]
(see: Two-stage stochastic programs with recourse)

first-stage decisions
[90B10, 90B15, 90C10, 90C15, 90C35]
(see: Preprocessing in stochastic programming; Stochastic 
integer programs; Stochastic programming: parallel 
factorization of structured matrices; Stochastic vehicle 
routing problems) 

first tree search see: best- —; depth- —; Parallel Best- —; 
Parallel Depth- — 

Fischer-Burmeister function 
[90C30, 90C33] 
(see: Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities) 

fit see: best — 

fitness
[92B05]
(see: Broadcast scheduling problem; Genetic algorithms)

fitness
[92B05]
(see: Genetic algorithms)

fitness see: genetic engineering via negative —

fitness function
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)

fittest see: survival of the — 

fitting see: curve —; data —; subjective curve — 

fitting to data see: best — 

fitting and extrapolation see: subjective curve — 


fix see: dive-and- —; near-integer- —; relax-and- — 
fixed charge
[90C25]
(see: Concave programming)
fixed charge
[90B10, 90B80, 90C11]
(see: Piecewise linear network flow problems; Stochastic
transportation and location problems)
fixed charge see: linear —
fixed charge function
[90B10, 90C26, 90C30, 90C35]
(see: Nonconvex network flow problems)
fixed charge network flow problem
[90B10]
(see: Piecewise linear network flow problems)
fixed charge networks
[90B10, 90C26, 90C30, 90C35]
(see: Nonconvex network flow problems)
fixed charge problem
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming) 
fixed charge transportation problem 
[90B06, 90B10, 90C26, 90C35] 
(see: Minimum concave transportation problems) 
fixed cost with capacity constraints see: single — 
fixed cost with no capacity constraints see: single — 
fixed degree of flexibility 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility 
index) 
fixed demand traffic network equilibrium 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
fixed demand traffic network problems 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
fixed number of vehicles see: Vehicle scheduling problems 
with a — 
fixed parameter tractability 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs)
fixed parameter tractable algorithms 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 
fixed point 
[49L20, 49M29, 65H10, 65J15, 65K10, 90C06, 90C30, 90C39, 
90C52, 90C53, 90C55] 
see: Asynchronous distributed optimization algorithms; 
Contraction-mapping; Dynamic programming: discounted 
problems; Local attractors for gradient-related descent 
iterations) 
fixed point see: coupled — 
fixed point computation 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
see: Information-based complexity and information-based 
optimization) 
fixed point iteration 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval fixed point theory)
fixed point problem 
[47H05, 65J15, 90C25, 90C33, 90C55] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem; Fejér monotonicity in 
convex optimization) 
fixed point problem 
[65K10, 65M60, 90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem; Variational inequalities) 
fixed point problem see: Equivalence between nonlinear 
complementarity problem and — 
fixed point theorem 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05] 
(see: Minimax theorems) 
fixed point theorem see: Brouwer —; Miranda —; Schauder —;
Tychonoff — 


fixed point theory 
[90C05, 90C10, 90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem; Simplicial pivoting 
algorithms for integer programming) 
fixed point theory see: Interval — 
fixed recourse 
[90C15]
(see: Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 
fixed tabu search
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10] 
(see: Maximum satisfiability problem) 
fixed travel demand 
[90B06, 90B20, 91B50]
(see: Traffic network equilibrium) 
fixedTime 
(see: Medium-term scheduling of batch processes) 
fixing see: reduced cost — 
FL 
[15A39, 90B80, 90C05] 
(see: Facilities layout problems; Farkas lemma) 
flat fuzzy number see: L-R — 
fleet see: mixed — 
fleet assignment 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization) 
fleet assignment see: airline — 
fleet assignment problem 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization) 
Fletcher-Goldfarb-Shanno method see: Broyden- — 
Fletcher-Goldfarb-Shanno quasi-Newton update see: 
Broyden- — 
Fletcher-Goldfarb-Shanno update see: Broyden- — 
Fletcher-Powell method see: Davidon- — 
Fletcher-Powell update see: Davidon- — 
Fletcher-Reeves algorithm 
[90C30]
(see: Conjugate-gradient methods) 
Fletcher-Reeves formula 
[90C06]
(see: Large scale unconstrained optimization) 
Fletcher-Reeves method 
[90C06]
(see: Large scale unconstrained optimization) 
flexibility 
[90C26]
(see: Global optimization in batch design under 
uncertainty) 
flexibility 
[90C26]
(see: Bilevel optimization: feasibility test and flexibility 
index) 
flexibility see: fixed degree of —; optimal degree of —; 
stochastic — 
flexibility analysis of flowsheets 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
flexibility index
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility 
index) 
flexibility index see: Bilevel optimization: feasibility test and — 
flexible arm 
[93-XX] 
(see: Optimal control of a flexible arm) 
flexible arm see: Optimal control of a — 
Flexible mass exchange networks 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks) 
flexible MOLP with fuzzy coefficients 
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming) 
flexible programming 
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming) 
flexible templates see: De novo protein design using — 
flight schedule 
[90B06, 90C06, 90C08, 90C35, 90C90]
(see: Airline optimization) 
flipping see: algorithm partition- —; partition — 
flipping model 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
flips see: good edge — 
floating point intervals 
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints) 
floating point operation 
[15-XX, 65-XX, 90-XX]
(see: Cholesky factorization) 
floor function 
[65K05]
(see: Direct global optimization algorithm) 
flop 
[15-XX, 65-XX, 90-XX]
(see: Cholesky factorization) 
flow 
[90C35]
(see: Maximum flow problem) 
flow see: ascent —; conservation of —; descent —; feasible —; 
generalized —; material —; maximize operating cash —; 
maximum —; minimum cost network —; 
multicommodity —; relaxed multicommodity —; value of
a —; value of a network —
flow across an s—t-cut 
[90C35] 
(see: Maximum flow problem) 
flow algorithm see: max- — 
flow balance equations see: node — 
flow bound constraints 
[90C35] 
(see: Maximum flow problem; Minimum cost flow problem) 
flow bounds see: arc — 
flow conservation constraint 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 


flow conservation law 
(see: Peptide identification via mixed-integer optimization) 

flow constraints 
[90B10, 90C05, 90C06, 90C35] 
(see: Nonoriented multicommodity flow problems) 

flow decision variable 
[90B10, 90C26, 90C30, 90C35] 
(see: Nonconvex network flow problems) 

flow equation see: conservation of — 

flow equations see: conservation of — 

flow formulation see: consist —; link —; path — 

flow lines see: connection of — 

flow min-cut theorem see: max- — 

flow model see: network — 

flow models see: undirected multicommodity network — 

flow pattern see: feasible path — 

flow problem 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 

flow problem see: feasible —; fixed charge network —; linear
network —; maximal —; Maximum —; minimum cost —; 
minimum cost network —; multicommodity network —; 
network —; node-path formulation of the 
multicommodity —; nonconvex network —; nonlinear 
dynamic network —; nonlinear network —; nonlinear single 
commodity network —; package —; piecewise linear 
minimum cost network —; uncapacitated network — 

flow problem with nonnegative lower bounds see: 
maximum — 

flow problems see: dynamic network —; large nonlinear 
multicommodity —; maximum —; Multicommodity —; 
Nonconvex network —; nonlinear multicommodity —; 
nonlinear network —; Nonoriented multicommodity —; 
Piecewise linear network — 

flow-shop 
[05-04, 90B36, 90C26, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization; MINLP: design and scheduling of batch 
processes; Stochastic scheduling) 

flow-shop problem 
[62C10, 65K05, 90C10, 90C15, 90C26] 
(see: Bayesian global optimization) 

Flow shop scheduling problem 
(68M20, 90B35) 

flow solver 
[90C90] 
(see: Design optimization in computational fluid dynamics) 

flow vector see: feasible — 

flowing wells of type a see: naturally — 

flowing wells of type b see: naturally — 

flowlines see: set of — 

flowmax 
[90B35, 93A30]
(see: Gasoline blending and distribution scheduling: an 
MILP model) 

flowmin 
[90B35, 93A30]
(see: Gasoline blending and distribution scheduling: an 
MILP model) 

flowrate see: well oil — 
flows see: augmenting —; balance equations for material —;
capacity constraint on arc —; global gradient —; 
logistics —; Multi-commodity —; multicommodity 
network —; network —; variational inequality formulation 
in path — 
flows with gains 
[90C35] 
(see: Generalized networks) 
flows in networks 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C26, 90C27, 
90C30, 90C35] 
(see: Minimum concave transportation problems; 
Nonconvex network flow problems; Vehicle scheduling) 
flowsheet see: convergence of the overall —; process — 
flowsheet optimization 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
flowsheets see: flexibility analysis of —; operability analysis 
of —; sensitivity of optimal — 
flowtime 
[90B36] 
(see: Stochastic scheduling) 
FLP 
[90B80] 
(see: Facilities layout problems) 
fluctuations see: thermal — 
fluence map optimization 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
fluid dynamics see: computational —; Design optimization in 
computational — 
flux estimation in distributed systems see: boundary — 
flux estimation in lumped systems see: reaction — 
flux models see: estimation of diffusion — 
fluxes see: estimation of 1D-diffusion — 
FMOLP 
[90C29, 90C70] 
(see: Fuzzy multi-objective linear programming) 
fold integer programming see: n- — 
fold matrix see: n- — 
folding see: Adaptive simulated annealing and its application 
to protein —; Global optimization in protein —; 
Monte-Carlo simulated annealing in protein —; protein —; 
Simulated annealing methods in protein — 
folding: αBB global optimization approach see: Multiple
minima problem in protein — 
folding: generalized-ensemble algorithms see: Protein — 
folding problem see: protein — 
folks theorem 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
follower problem 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and
duality) 
following 
[90B60, 90B80, 90B85] 
(see: Competitive facility location)
following see: path — 
following algorithm see: path — 


following algorithm for entropy optimization see: path — 

following approach see: path — 

following methods see: path — 

following and singularities see: Parametric optimization: 
embeddings, path — 

forbidden or tabu 
[68M20, 90B06, 90B35, 90B80, 90C59] 
(see: Flow shop scheduling problem; Heuristic and 
metaheuristic algorithms for the traveling salesman 
problem; Location routing problem; Metaheuristic 
algorithms for the vehicle routing problem) 

force see: brute- — 

force field via linear optimization see: Distance dependent 
protein — 

force fields 
[65K10, 92C40] 
(see: Multiple minima problem in protein folding: αBB
global optimization approach) 

forced see: color- — 

Ford-Fulkerson algorithm 
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules) 

Ford method see: Bellman- — 

forecast see: judgemental — 

Forecasting 
(90C30, 90C26) 
(referred to in: Continuous global optimization: 
applications) 
(refers to: Continuous global optimization: applications; 
Genetic algorithms) 

forecasting methods see: qualitative —; quantitative — 

forecasting model 

[90C26, 90C30]

(see: Forecasting) 

foreset 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations) 

foreset and afterset representation of relations 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations) 

forest see: basis — 

forest management 
[90C35] 
(see: Multicommodity flow problems) 

form see: AD intermediate —; block angular —; Boolean 
formula in conjunctive normal —; canonical —; canonical 
normal —; coercive bilinear symmetric continuous —; 
complete many-valued logic normal —; conjunctive 
normal —; constraints in standard —; disjunctive normal —; 
echelon —; extensive —; game in normal —; K-local 
bilinear —; Lagrange —; Lagrangian —; linear optimization
problem in standard —; logarithmic —; logarithmic p- —; 
many-valued normal —; matrix in standard —; Mayer —; 
normal —; Pl-normal —; rational p- —; standard —; 
standard greedy —; Taylor — 

form approach see: closed —; open — 

form of CEP see: restricted accessibility —; universally 
accessible — 

form of coordinates see: kth order — 
form of KT conditions see: nonstoichiometric —;
stoichiometric — 

form of a polynomial see: normal — 

form test see: Taylor — 

form transformation see: unimodular max-closed — 

form transformations see: unimodular max-closed — 

formal orthogonal polynomials see: least squares — 

formal perfect dual 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

formation see: automated hypothesis — 

formation values see: set of — 

formats see: independent of solver — 

forms see: bilinear —; Global optimization: functional —; 
minimization of Pinkava normal —; Pl-algebras and 
2-valued normal —; Quadratic integer programming: 
complexity and equivalent —; tricanonical — 

forms of Pi-algebras see: functionally complete normal — 

formula see: Bauer —; Cauchy —; Euler —; Fletcher-Reeves —; 
integral over surface —; integral over volume —; marginal 
value —; Moré updating —; Polak-Ribière —; satisfiable
Boolean —; selfdual rank one —; set- —; 
Sherman-Morrison —; Sherman-Morrison rank-one 
update —; Sherman-Morrison-Woodbury —

formula in conjunctive normal form see: Boolean — 

formulas see: Horn —; satisfiability of Boolean — 

formulation see: column generation —; consist flow —; 
continuous-time —; inverse interpolation parametric 
eigenvalue —; LCP: Pardalos-Rosen mixed integer —; least
squares —; link flow —; mathematical —; mixed 
variational —; multilevel problem —; node-arc —; path —; 
path flow —; price —; problem —; quantity —; 
saddle-point —; Scarf —; separable —; variational 
inequality — 

formulation in link loads see: variational inequality — 

formulation of the multicommodity flow problem see: 
node-path — 

formulation in path flows see: variational inequality — 

formulation of the problem see: node-arc — 

formulation of quasidifferential laws see: variational — 

formulation of quasidifferential thermal boundary conditions 
see: variational — 

formulation and solution of inverse problems 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 

formulation of SP see: split-variable — 

formulation space search 
[9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods) 

formulation of subdifferential laws see: variational — 

formulations see: discrete-time —; Quasidifferentiable 
optimization: variational —; Stochastic optimal stopping: 
problem —; variational inequality — 

Forrest-Goldfarb method 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 

Fortran see: high performance —; Vienna — 


Fortran program for nonlocal sensitivity analysis see: 
automated — 

FORTRAN subroutines 

[90C35] 

(see: Feedback set problems) 

forward arc 

[90C35] 

(see: Maximum flow problem)

forward automatic differentiation see: vector — 

forward compatibility 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,

91B06, 92C60] 

(see: Boolean and fuzzy relations)

forward mode 

[65D25, 65H99, 65K99, 68W30] 

(see: Automatic differentiation: point and interval;

Complexity of gradients, Jacobians, and Hessians) 

forward mode of AD 

[49-04, 65Y05, 68N20] 

(see: Automatic differentiation: parallel computation)

forward mode of an AD algorithm 

[26A24, 65D25] 

(see: Automatic differentiation: introduction, history and

rounding error estimation) 

forward mode of automatic differentiation 

[26A24, 65G20, 65G30, 65G40, 65H20, 65K99, 85-08] 

(see: Automatic differentiation: geometry of satellites and
tracking stations; Interval analysis: intermediate terms) 

forward network see: two-layer feed- — 

forward neural network see: feed- — 

forward path 

[90B10, 90C27] 

(see: Shortest path tree algorithms)

forward phases 

[90C35] 

(see: Graph coloring)

forward substitution 

[65G20, 65G30, 65G40, 65H20] 

(see: Interval analysis: intermediate terms)

foundations of industrial engineering see: Archimedes and 
the — 

four-argument function 

[62H30, 90C27] 

(see: Assignment methods in clustering)

Fourier see: mechanical principle of — 

Fourier law of heat conduction 

[35R70, 47S40, 74B99, 74D99, 74G99, 74H99]

(see: Quasidifferentiable optimization: applications to

thermoelasticity) 

Fourier-Motzkin elimination 

[52B12, 68Q25] 

(see: Fourier-Motzkin elimination method)

Fourier-Motzkin elimination method 

(52B12, 68Q25)

(refers to: Farkas lemma; Farkas lemma: generalizations;

Linear programming) 

Fourier-Motzkin method 

[52B12, 68Q25] 

(see: Fourier-Motzkin elimination method)

Fourier relaxation method see: Agmon-Motzkin- — 
FP
[90C30, 90C90]
(see: Successive quadratic programming: applications in the
process industry)
fractal interface
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
fractal set
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
fractional 0-1 knapsack
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization)
fractional 0-1 programming problem
(see: Fractional zero-one programming)

Fractional combinatorial optimization 
(90-08, 90C27, 90C32, 68Q25, 68R05) 
(referred to in: Combinatorial matrix analysis; 
Combinatorial optimization algorithms in resource 
allocation problems; Combinatorial optimization games; 
Complexity classes in optimization; Complexity of 
degeneracy; Complexity of gradients, Jacobians, and 
Hessians; Complexity theory; Complexity theory: quadratic 
programming; Computational complexity theory; 
Evolutionary algorithms in combinatorial optimization; 
Fractional programming; Information-based complexity 
and information-based optimization; Kolmogorov 
complexity; Mixed integer nonlinear programming; 
Multi-objective combinatorial optimization; NP-complete 
problems and proof methodology; Parallel computing: 
complexity classes; Quadratic fractional programming: 
Dinkelbach method; Replicator dynamics in combinatorial 
optimization; Stochastic integer programs) 


(refers to: Bilevel fractional programming; Combinatorial 
matrix analysis; Combinatorial optimization algorithms in 
resource allocation problems; Combinatorial optimization 
games; Complexity classes in optimization; Complexity of 
degeneracy; Complexity of gradients, Jacobians, and 
Hessians; Complexity theory; Complexity theory: quadratic 
programming; Computational complexity theory; 
Evolutionary algorithms in combinatorial optimization; 
Fractional programming; Information-based complexity 
and information-based optimization; Kolmogorov 
complexity; Mixed integer nonlinear programming; 
Multi-objective combinatorial optimization; Neural 
networks for combinatorial optimization; NP-complete 
problems and proof methodology; Parallel computing: 
complexity classes; Quadratic fractional programming: 
Dinkelbach method; Replicator dynamics in combinatorial 
optimization) 


fractional combinatorial optimization see: linear —; uniform — 
fractional combinatorial optimization problem 


[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 


fractional combinatorial optimization problem see: integral
linear —


fractional (hyperbolic) 0-1 programming problem see:
single-ratio —


fractional linear programming 


[90C11] 
(see: MINLP: branch and bound methods) 


fractional optimization 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
fractional optimization see: parametric approach to — 
fractional program 
[90C32]
(see: Fractional programming) 
fractional program see: concave —; generalized —; linear —; 
max-min —; min-max —; multi-objective —; quadratic —; 
single-ratio —; sum-of-ratios — 
Fractional programming 
(90C32) 
(referred to in: Fractional combinatorial optimization; 
Quadratic fractional programming: Dinkelbach method) 
(refers to: Bilevel fractional programming; Farkas lemma; 
Farkas lemma: generalizations; Fractional combinatorial 
optimization; Quadratic fractional programming: 
Dinkelbach method) 
fractional programming 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 
fractional programming 
[90C27, 90C32] 
(see: Fractional programming; Operations research and 
financial markets; Quadratic fractional programming: 
Dinkelbach method) 
fractional programming see: Bilevel —; combinatorial —; 
integer —; linear- —; multi-objective — 
fractional programming: Dinkelbach method see: Quadratic — 
fractional programming problem 
[90C32] 
(see: Quadratic fractional programming: Dinkelbach 
method) 
fractional programming problems see: Multi-objective — 
fractional programs see: classification of — 
fractional routing pattern model 
[68Q25, 90B80, 90C05, 90C27] 
(see: Communication network assignment problem) 
fractional terms see: linear — 
fractional updating 
(see: Bayesian networks) 
Fractional zero-one programming 
frame 
[90B10, 90B15, 90C15, 90C35] 
(see: Preprocessing in stochastic programming) 
framework see: Bayesian decision-theoretic —; graph
based —; linear algebra —; multiperiod optimization
modeling —; Newton-Cauchy —; nonstandard —;
primal-dual —; proximal —; Unconstrained nonlinear
optimization: Newton-Cauchy —
framework for enterprise-wide process networks under 
uncertainty see: Bilevel programming — 
framework for radiation therapy see: Optimization based —
frameworks see: Short-term scheduling, resource constrained: 
unified modeling — 
Frank discrete separation theorem 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
Frank-Wolfe
[90C06, 90C25, 90C30, 90C35] 
(see: Cost approximation algorithms; Frank-Wolfe 
algorithm; Simplicial decomposition algorithms) 
Frank-Wolfe algorithm 
(90C30) 
(referred to in: Cost approximation algorithms; Simplicial 
decomposition; Stochastic transportation and location 
problems; Traffic network equilibrium) 
(refers to: Rosen’s method, global convergence, and Powell’s 
conjecture) 
Frank-Wolfe algorithm 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 
Frank-Wolfe algorithm 
[90C30] 
(see: Cost approximation algorithms; Simplicial 
decomposition) 
Frank-Wolfe algorithm see: regularized —
Frank-Wolfe decomposition see: regularized —
Fréchet 
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
Fréchet differentiable function 
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations) 
Fréchet normal cone
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
Fréchet subdifferential 
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
Fréchet subdifferential see: limiting —; singular — 
Fréchet subdifferentials 
[49K27, 58C20, 58E30, 90C46, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials; 
Nonsmooth analysis: weak stationarity) 
Fréchet subdifferentials see: limiting —; Nonsmooth 
analysis: — 
Fréchet superdifferential 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 
free see: univariate gradient — 
free algorithm see: gradient- — 
free alignment see: communication- — 
free alignment problem see: communication- — 
free arrangement of hyperplanes 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
free asset see: risk- — 
free assignment 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
free coloring see: conflict- — 
free descent method see: derivative- — 
free distributive lattice 
[90C09] 
(see: Inference of monotone boolean functions) 


free energy
[92B05]
(see: Genetic algorithms for protein structure prediction)
free energy see: molar Gibbs —; total Gibbs — 
free Givens transformation see: square-root- — 
free lunch see: no — 
free methods for non-smooth optimization see: Derivative- — 
free minimization see: gradient- — 
free minimization algorithm see: gradient- — 
free reduced Hessian SQP see: multiplier- — 
free shape design see: robust obstacle- — 
free truss design see: robust obstacle- — 
free variables see: Generalized geometric programming: mixed
continuous and discrete —
freight operation
[90C35]
(see: Multicommodity flow problems)
frequency assignment see: adjacent channel constrained —;
co-channel constrained —; order of a T-coloring —; span of
a T-coloring —

Frequency assignment problem
(05-XX)
(referred to in: Assignment and matching; Assignment
methods in clustering; Bi-objective assignment problem;
Broadcast scheduling problem; Communication network
assignment problem; Graph coloring; Linear ordering
problem; Maximum constraint satisfaction: relaxations and
upper bounds; Maximum partition matching; Quadratic
assignment problem)
(refers to: Assignment and matching; Assignment methods
in clustering; Bi-objective assignment problem;
Communication network assignment problem; Graph
coloring; Maximum constraint satisfaction: relaxations and
upper bounds; Maximum partition matching; Quadratic
assignment problem)

frequency assignment problem see: radio link — 

frequency exhaustive sequential coloring 

[05-XX] 

(see: Frequency assignment problem) 
frequentist
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based methods)
friction see: coupled unilateral contact problem with — 
frictional contact see: Signorini-Coulomb unilateral — 
Friedrich see: Gauss, Carl — 

Frieze-Yadegar linearization 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
Fritz John conditions 

[65G20, 65G30, 65G40, 65H20] 

(see: Interval analysis: verifying feasibility) 
Fritz John conditions 

[90C15] 
(see: Stochastic programming: nonanticipativity and
Lagrange multipliers)
Fritz John generalized conditions 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
Fritz John necessary optimality conditions
[49M37, 65K05, 90C29, 90C30] 
(see: Equality-constrained nonlinear programming: KKT 
necessary optimality conditions; Generalized concavity in 
multi-objective optimization) 
Fritz John rule 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
Lagrange multipliers)
Fritz John system 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: verifying feasibility) 
Fritz John type condition 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 
Frobenius theorem see: Perron- —
Fromovitz constraint qualification see: Mangasarian- —
Fromovitz CQ see: Mangasarian- —
frontier see: efficient — 
frontier of efficient portfolios 
[91B50] 
(see: Financial equilibrium)
FSP 
[90C35] 
(see: Feedback set problems)
FSQP 
[65K05, 65K10, 90C06, 90C30, 90C34] 
(see: Feasible sequential quadratic programming) 
fuel mixture problem 
(see: Planning in the process industry)
fugacity coefficient 
[90C30] 
(see: Nonlinear systems of equations: application to the
enclosure of all azeotropes) 
Fulkerson algorithm see: Ford- —
full components 
[90C27] 
see: Steiner tree problems) 
full discretization 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems)
full master problem 
[90C06] 
(see: Decomposition principle of linear programming) 
full master program 
[90B10, 90C05, 90C06, 90C35] 
(see: Nonoriented multicommodity flow problems) 
full recourse 
[90C30, 90C35] 
(see: Optimization in water resources)
full row rank 
[90C05, 90C33] 


(see: Pivoting algorithms for linear programming 
generating two paths) 
full space methods 
[90C30, 90C90] 
(see: Successive quadratic programming; Successive 
quadratic programming: applications in distillation 
systems) 
full space methods see: Successive quadratic programming: — 
full space SQP 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
full space SQP method 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in 
distillation systems) 
full space successive quadratic programming 
[90C25, 90C30] 
(see: Successive quadratic programming: full space 
methods) 
full space of x variables 
[90C30] 
(see: Successive quadratic programming) 
full space of x variables 
[90C25, 90C30] 
(see: Successive quadratic programming: full space 
methods) 
full Steiner tree 
[90C27] 
(see: Steiner tree problems) 
full-step Gauss-Newton method 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
fully indecomposable matrix 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
fully nonlinear problem 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
fully polynomial time approximation scheme 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
fully stressed design 
[90C26, 90C90] 
(see: Structural optimization: history) 


function 
[01A99] 
(see: Global optimization: functional forms; Leibniz, 
Gottfried Wilhelm)

function see: dt- —; abstract convex —; achievement —;
activation —; active —; admissible pair of
trajectory-function and control- —; affine —; aggregate
excess demand —; aggregation —; w-concave —; antitone
Boolean —; antitone monotone Boolean —; approximating
the recourse —; augmented Lagrangian —; auxiliary —;
barrier —; bias —; bilinear —; Boolean —; Boolean
2-valued —; boundary of a —; C-differentiable —; c.d. —;
cell of a —; characteristic —;
Chen-Harker-Kanzow-Smale —; Chvátal —;
classification —; coboundary of a —; codifferentiable —;
complementary gap —; concave —; conjugate —; 
constraint on a multiplicative —; continuously 
codifferentiable —; control —; convex —; convex-like —; 
convex max- —; convex quadratic —; coordinatewise 
increasing —; coordinatewise increasing utility —; cost —; 
Courant penalty —; cyclic shift —; D- —; d.c. —; 
decomposition of a continuous piecewise linear —;
delta —; derivative of a —; derivative of a probability —;
difference convex —; difference sublinear —;
differentiable —; Dini codifferentiable —; Dini conditionally
differentiable —; Dini conditionally directionally 
differentiable —; Dini differentiable —; Dini directionally 
differentiable —; Dini quasidifferentiable —; Dini uniformly 
differentiable —; Dini uniformly directionally
differentiable —; directionally differentiable —;
discontinuous —; discrete —; discrete filled —;
distribution —; d.m. —; domain of a —; double-well —;
DSL —; dual potential —; effective domain of a —; effective 
set of a —; energy —; exact Lgo-penalty —; exact
penalty —; excess —; expectation of an indicator —;
expected recourse —; expected value —; exponential —; 
fair objective —; filled —; first order approximation of a —; 
Fischer-Burmeister —; fitness —; fixed charge —; floor —; 
four-argument —; Fréchet differentiable —; gap —; 
generalized differentiable (GD) —; geodesic convex —; 
Gibbs —; globally convexized filled —; good inclusion —; 
gradient of a probability —; gradient-related set —;
greedy —; H-convex —; Hadamard codifferentiable —;
Hadamard conditionally differentiable —; Hadamard 
conditionally directionally differentiable —; Hadamard 
differentiable —; Hadamard directionally differentiable —; 
Hadamard quasidifferentiable —; Hamiltonian —; Hessian 
matrix of a Lagrangian —; hyperdifferentiable —; 
hypodifferentiable —; implicit utility —; inclusion —; 
increasing —; indicator —; infimum of a Lagrangian —; int 
U-quasiconcave —; integral Mean-Value for Composite 
Convexifiable —; invex —; IPH —; isotone Boolean —; 
isotone inclusion —; isotone monotone Boolean —; 
isotonic —; K-convex —; Karmarkar potential —; kernel —; 
Kojima —; Kreisselmeier-Steinhauser —; KyFan —;
L-convex —; |; exact penalty —; £; penalty —; Lagrange —; 
Lagrangian —; least squares distance —; Lennard-Jones 
potential energy —; lexicographically minimax objective —; 
LFS —; likelihood —; linear appearance of control —; linear 
supporting —; Lipschitz —; Lipschitz continuous —; list
square merit —; locally filled —; locally Lipschitz —; locally
Lipschitz continuous —; locally monotone —; locally strictly
monotone —; locally strongly monotone —; logarithmic
barrier —; logarithmic-quadratic barrier-penalty —; 
logconcave —; logconcave probability density —; 
logconvex —; lower bound —; lower semicontinuous —; 
Luc U-quasiconcave —; Lyapunov —; M-convex —; 
marginal —; max- —; max-closed —; max-type —; maximin 
objective —; maximum —; maximum-type —; maxmin —; 
mean value —; membership —; merit —; the mid-point 
acceleration —; min-type —; minimal —; minimax 
objective —; minimum —; mixed integer value —; 
Moebius —; monotone —; monotone Boolean —; 
monotonic —; multicriteria objective —; multifacility Weber 
objective —; multifacility Weber-Rawls objective —;
multivariate probability distribution —; nonconvex —;
nonconvex energy —; nondecreasing —; nondecreasing 
monotone Boolean —; nondifferentiable —; nonincreasing 
monotone Boolean —; nonsmooth —; objective —; 
one-dimensional marginal probability distribution —; 
optimal value —; Optimization techniques for minimizing 
the energy —; order of an inclusion —; 
parabolic-exponential —; partially separable —; Peano —; 
penalty —; perturbation —; piecewise continuously 
differentiable —; piecewise differentiable —; piecewise 
linear —; piecewise linear quadratic —; piecewise 
twice-differentiable —; polynomial time computable —; 
positive definite quadratic —; positively homogeneous —; 
potential —; potential energy —; pre-declared interval —; 
pre-invex —; preference value —; primal-dual potential —; 
primal gap —; primal potential —; probability —; program 
of minimizing a convex multiplicative —; program of 
minimizing a generalized convex —; projected Hessian 
matrix of a Lagrangian —; pseudoconvex —; pure 
complementary gap —; quadratic —; quantile —; 
quasiconcave —; quasiconvex —; quasidifferentiable —; 
R!-upper semicontinuous —; radially continuous —; 
random objective —; recourse —; regular cost —; regular 
link cost —; regularized gap —; rounding —; saddle —; 
sawtooth arc cost —; scalarizing —; scale —; score —; 
scoring —; second order decomposition of a —; 
semicoercive —; semismooth —; semistrictly 
quasiconvex —; separable convex objective —; separable 
objective —; set-valued objective —; Shannon —; 
Sheffer —; sign —; single smooth —; social utility —; 
stable —; staircase arc cost —; staircase cost —; 
standard —; step —; strictly convex —; strictly 
monotone —; strictly pseudoconvex —; strictly
quasiconvex —; strongly monotone —; strongly
semismooth —; subconjugate —; subcritical —; 
subdifferentiable —; subdual —; sublinear —; 
submodular —; supconjugate —; superadditive —; 
supercritical —; superdifferentiable —; superlinear —; 
supermodular —; support —; support set of a —; 
Tanabe-Todd-Ye potential —; three-argument —; time 
complexity —; total cost —; trajectory —; twice 
codifferentiable —; twice continuously codifferentiable —; 
twice-differentiable part of a —; two-dimensional marginal 
probability distribution —; U-concave —; U-continuous —; 
U-pseudoconcave —; U-quasiconcave —; U-weakly 
pseudoconcave —; umbrella —; uniform P- —; uniformly 
convex —; upper semicontinuous —; upper 
semismooth —; utility —; value —; zone of a —

function of an algorithm see: time complexity — 

function approach see: Bilevel programming: implicit —; 
continuously differentiable exact penalty —; implicit —; 
value — 

function approach to bilevel programming see: implicit — 

function associated with A see: canonical — 

function based algorithm see: exact penalty — 

function and control-function see: admissible pair of 
trajectory- — 

function with dependent constraints see: maximum — 

function inference see: monotone Boolean — 

function inference problem see: Boolean — 

function martingale see: score — 
function of a matroid see: weight —

function method see: score — 

function methods see: basic outline of filled —; filled —; Global 
optimization: filled — 

function minimax inequality see: two- — 

function optimization see: marginal — 

function pair see: convex-like — 

function parametrization see: objective — 

function space 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

function space see: canonical —; extended canonical — 

function system see: iterative — 

function theorem see: implicit — 

function value see: continuity property of the objective —; 
convexity property of the objective — 

functional see: absolutely continuous —; cost —; generalized 
critical point of an energy —; Lagrange —; recession —; 
substationarity point of a —; truth- — 

functional analysis 

[01A99] 

(see: Kantorovich, Leonid Vitalyevich)

functional completeness 

[03B50, 68T15, 68T30]

(see: Finite complete systems of many-valued logic algebras) 

functional completeness of PI-algebras 

[03B50, 68T15, 68T30] 

(see: Finite complete systems of many-valued logic algebras)

functional dependence see: noisy — 

functional dual 

[90C10, 90C46] 

(see: Integer programming duality)

functional forms see: Global optimization: — 

functional paradigm 

[90C10, 90C30]

(see: Modeling languages in optimization: a new paradigm) 

functional relation 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations) 

functionally complete normal forms of Pi-algebras 

[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 

functionals see: Approximation of extremum problems with 
probability —; probability — 

functions see: additive utility —; Affine sets and —; 
asymptotically admissible pair of trajectory and control —; 
augmented Lagrange —; conjugate —; continuous 
selection of —; Convex max- —; dc —; difference of 
convex —; difference of max-type —; difference of 
monotonic —; discrete —; discriminant —; distance —; 
dM —; elastic demand traffic network problems with travel
demand —; elementary —; estimation of utility —;
examples of quasidifferentiable —; Fenchel conjugate —;
Fenchel-type duality for M- and L-convex —; gradient of
multivariate distribution —; h-convex —; homotopic —; 
Inference of monotone boolean —; interactive learning of 
Boolean —; isotone —; L-convex functions and 
M-convex —; Lagrange-type —; linear —; Lipschitzian 
operators in best approximation by bounded or 
continuous —; marginal —; marginal distribution —; 
minimizing —; Multi-objective optimization; Interactive 
methods for preference value —; multimodal —; natural 
level —; nondifferentiable objective —; nonsmooth —; 
notation for objective —; objective —; optimal value —; 
penalty —; probability —; product of affine —; product of 
concave —; product of convex —; production —; program 
of minimizing a product of two affine —; proximal 
minimization with D- —; quasidifferentiable —; 
Quasidifferentiable optimization: algorithms for 
hypodifferentiable —; Quasidifferentiable optimization: 
algorithms for QD —; Quasidifferentiable optimization: 
codifferentiable —; quasidifferential —; sample and 
expectation —; scheduling —; separation —; set of 
elementary —; smoothing —; sum of convex 
multiplicative —; superpositions of —; theory of 
generalized —; traffic network equilibrium with travel
disutility — 

functions: algorithms and complexity see: Regression by 
special — 

functions and/or derivatives see: evaluation of objective — 

Functions and Applications see: minimization Methods for 
Non-Differentiable — 

functions, characterization of see: Convexifiable — 

functions: general theory and examples see: Derivatives of 
probability and integral — 

functions: hemivariational inequalities see: Nonconvex 
energy — 

functions in integer programming see: cost — 

functions: kernel type solution methods see: Extremum 
problems with probability — 

functions and M-convex functions see: L-convex — 

functions on topological vector spaces see: Increasing and 
convex-along-rays —; Increasing and positively 
homogeneous — 

fundamental cycle 

[90C35]

(see: Minimum cost flow problem) 

fundamental group 

[05B35, 20F36, 20F55, 52C35, 57N65]

(see: Hyperplane arrangements) 

fundamental indiscernibility 

[03E70, 03H05, 91B16]

(see: Alternative set theory) 

fundamental property in convex programming 

[90C06]
(see: Saddle point theory and optimality conditions) 

fundamental theorem see: Weyl — 

Fundamental theorem of algebra 
(01A55, 01A50, 01A60)
(referred to in: Gröbner bases for polynomial equations)
(refers to: Gröbner bases for polynomial equations)

fundamental theorem of algebra 
[01A99] 
(see: Gauss, Carl Friedrich) 


Subject Index 


4225 


fundamental theorem of linear programming see: Extension of 
the — 
fundamental theorem of natural selection 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
funding decision 
[90C27] 
(see: Operations research and financial markets) 
funnel 
[65H20] 
(see: Multi-scale global optimization using 
terrain/funneling methods) 
funneling methods see: Multi-scale global optimization using 
terrain/ — 
fuzzification 
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 
fuzziness 
[90C09, 90C10]
(see: Optimization in boolean classification problems) 
fuzziness see: unnormalized — 
fuzzy 
[94A17]
(see: Jaynes’ maximum entropy principle) 
fuzzy clustering 
[65K05, 90C26, 90C56, 90C90]
(see: Derivative-free methods for non-smooth optimization; 
Nonsmooth optimization approach to clustering) 
fuzzy coefficients see: flexible MOLP with —; MOLP with —; 
multi-objective linear programming with — 
fuzzy constraints 
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming) 
fuzzy criterion 
[90C29, 91A99]
(see: Preference disaggregation) 
fuzzy decision 
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming) 
fuzzy goals 
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming) 
fuzzy interval inference 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
fuzzy interval pairs 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
fuzzy logic 
[90C26, 90C30]
(see: Forecasting) 
fuzzy logics see: Checklist paradigm semantics for — 
Fuzzy multi-objective linear programming 
(90C70, 90C29) 
(referred to in: Bi-objective assignment problem; Decision
support systems with multiple criteria; Estimating data for
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Multicriteria sorting methods; Multi-objective 
combinatorial optimization; Multi-objective integer linear 
programming; Multi-objective optimization and decision 
support systems; Multi-objective optimization: interaction 
of design and control; Multi-objective optimization; 
Interactive methods for preference value functions; 
Multi-objective optimization: lagrange duality; 
Multi-objective optimization: pareto optimal solutions, 
properties; Multiple objective programming support; 
Outranking methods; Portfolio selection and multicriteria 
analysis; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Preference modeling) 
(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Multicriteria sorting methods; Multi-objective 
combinatorial optimization; Multi-objective integer linear 
programming; Multi-objective optimization and decision 
support systems; Multi-objective optimization: interaction 
of design and control; Multi-objective optimization; 
Interactive methods for preference value functions; 
Multi-objective optimization: lagrange duality; 
Multi-objective optimization: pareto optimal solutions, 
properties; Multiple objective programming support; 
Outranking methods; Portfolio selection and multicriteria 
analysis; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Preference modeling) 

fuzzy number 

[90C29, 90C70] 

(see: Fuzzy multi-objective linear programming)

fuzzy number see: L-R —; L-R flat — 

fuzzy numbers 

[90C29, 90C70] 

(see: Fuzzy multi-objective linear programming)

fuzzy numbers see: arithmetic operations on — 

fuzzy outranking relation 

[90-XX] 

(see: Outranking methods)

fuzzy power set 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics)

fuzzy product see: harsh — 

fuzzy programming 

[90C90] 

(see: Chemical process planning) 

fuzzy relation 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,

91B06, 92C60] 

(see: Boolean and fuzzy relations)

fuzzy relation see: α-cut of a —

fuzzy relational product 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,

91B06, 92C60] 

(see: Boolean and fuzzy relations)
fuzzy relations
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
fuzzy relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
fuzzy relations see: Boolean and —; special properties of — 
fuzzy set 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
fuzzy set-inclusion operator 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
fuzzy sets 
[03B50, 03B52, 03C80, 03E72, 47S40, 62F30, 62Gxx, 68T27, 
68T35, 68Uxx, 90Bxx, 90C29, 90C70, 91Axx, 91B06, 92C60]
(see: Boolean and fuzzy relations; Checklist paradigm
semantics for fuzzy logics; Fuzzy multi-objective linear 
programming) 
fuzzy sets 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27, 90C29, 90C70] 
(see: Checklist paradigm semantics for fuzzy logics; Fuzzy
multi-objective linear programming) 
fuzzy sum rule 
[58C20, 58E30, 90C46, 90C48] 
(see: Nonsmooth analysis: weak stationarity)
fuzzy triangle product 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations)
fuzzy truth assessment 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
(FVI) see: farthest vertex insertion — 


G 


g-αBB approach see: Global optimization: —

g-basin 

[65K05, 90C26, 90C30, 90C59]

(see: Global optimization: filled function methods) 
g-group classification problem 

[62H30, 68T10, 90C11]

(see: Mixed integer classification problems) 

g-group classification problem (discriminant problem) 
[62H30, 68T10, 90C05] 

(see: Linear programming models for classification)
GA 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
gabriel graph 

[68Q20] 

(see: Optimal triangulations)

gadget 

[05C85] 

(see: Directed tree networks)

gain see: small — 


gain legitimacy 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

Gaines implication see: Goguen- — 

gains see: flows with — 

Gale-Hoffman inequalities 
[90B10, 90B15, 90C15, 90C35] 
(see: Preprocessing in stochastic programming) 

Galerkin approach see: Petrov- — 

Galerkin cone 
[90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem) 

Galerkin iteration see: Petrov- — 

Galerkin method see: Ritz- —

Galerkin spectral method 
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 

gambling see: optimal — 

game see: combinatorial optimization —; complete —; 
cooperative —; cooperative case of a two-person —; 
minimax —; noncooperative —; nonzero-sum infinite 
horizon —; optimality in a —; packing —; polymatrix —;
Stackelberg —; two-person —; two-person zero-sum —; 
two-player zero-sum perfect-information —; von 
Stackelberg — 

game in normal form 

[49Jxx, 91Axx]

(see: Infinite horizon control and dynamic games) 

game with side payments 

[90C27, 90C60, 91A12]

(see: Combinatorial optimization games) 

game of strategy 

[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]

(see: Minimax theorems) 

game theory 

[01A99, 90C27, 90C60, 90C99, 91A12]
(see: Combinatorial optimization games; Von Neumann, 
John) 

game theory 
[49M37, 62C20, 90B80, 90B85, 90C15, 90C26, 90C30, 90C31, 
90Cxx, 91A10, 91Axx, 91B06, 91B60, 91Bxx] 
(see: Bilevel programming; Bilevel programming: 
introduction, history and overview; Facility location with 
externalities; Oligopolistic market equilibrium; Stochastic 
programming: minimax approach) 

game theory see: evolutionary —; Maximum entropy and —; 
Stackelberg — 

game tree 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 

game tree algorithm see: sequential minimax — 

game tree search algorithm see: distributed —; generalized — 

game tree searching see: Minimax — 

games 
[01A99]
(see: History of optimization) 

games 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 


games see: bimatrix —; Combinatorial optimization —; Infinite
horizon control and dynamic —; noncooperative —; theory
of —; von Stackelberg —
γ-concave probability measure

[90C15] 

(see: Logconcave measures, logconvexity) 
gamma distribution see: multivariate — 

GAP 

[68Q99] 

(see: Branch and price: Integer programming with column
generation)
gap see: approximation algorithms for —; duality —;
integrality —; relative duality —
gap function 

[90C15, 90C26, 90C30, 90C33] 

(see: Lagrangian duality: BASICS; Stochastic bilevel
programs)
gap function see: complementary —; primal —; pure
complementary —; regularized —
gap theorem 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 
gaps in nonconvex optimization see: Duality — 

Garza method see: De La — 

gas see: allocation of — 

gas lift availability see: upper bound on — 
gas lift wells of type a 

[76T30, 90C11, 90C90] 

(see: Mixed integer optimization in well scheduling) 
gas lift wells of type b 

[76T30, 90C11, 90C90] 

(see: Mixed integer optimization in well scheduling) 
gas and water capacity constraints see: maximum oil — 
Gasoline blending and distribution scheduling: an MILP
model

(90B35, 93A30) 
gateaux subdifferential 

[49K27, 58C20, 58E30, 90C48] 

(see: Nonsmooth analysis: Fréchet subdifferentials) 
gates see: logic — 
gauge 

[90B85] 

(see: Multifacility and restricted location problems) 
Gauss, Carl Friedrich 

(01A99)
(referred to in: Gauss-Newton method: Least squares,
relation to Newton’s method; Least squares problems;
Linear programming; Symmetric systems of linear
equations)
(refers to: Gauss-Newton method: Least squares, relation to
Newton’s method; Least squares problems; Linear
programming; Symmetric systems of linear equations)
Gauss distribution law 

[01A99] 

(see: Gauss, Carl Friedrich) 
Gauss-Markoff theorem 

[65Fxx] 

(see: Least squares problems) 


Gauss-Newton method
[49M37]
(see: Nonlinear least squares: trust region methods) 


Gauss-Newton method
[90C30, 90C52, 90C53, 90C55]
(see: Gauss-Newton method: Least squares, relation to 
Newton’s method; Generalized total least squares) 


Gauss-Newton method see: damped —; full-step —
Gauss-Newton method: Least squares, relation to Newton’s
method

(90C30, 90C30, 90C52, 90C53, 90C55) 

(referred to in: ABS algorithms for linear equations and 
linear least squares; ABS algorithms for optimization; 
Discontinuous optimization; Gauss, Carl Friedrich; 
Generalized total least squares; Least squares orthogonal 
polynomials; Least squares problems; Nonlinear least 
squares: Newton-type methods; Nonlinear least squares 
problems; Nonlinear least squares: trust region methods) 
(refers to: ABS algorithms for linear equations and linear 
least squares; ABS algorithms for optimization; Gauss, Carl 
Friedrich; Generalized total least squares; Least squares 
orthogonal polynomials; Least squares problems; Nonlinear
least squares: Newton-type methods; Nonlinear least
squares problems; Nonlinear least squares: trust region
methods)


Gauss problem
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems)

Gauss quadrature rule see: generalized —

Gauss-Seidel
[90C30]
(see: Cost approximation algorithms)

Gauss-Seidel algorithm
[90C30]
(see: Cost approximation algorithms)

Gauss-Seidel iteration
[49L20, 90C39]
(see: Dynamic programming: discounted problems)

Gauss-Seidel method
[90C33]
(see: Linear complementarity problem)

Gauss-Seidel value iteration
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path
problems)

Gauss-Southwell method
[90C30]
(see: Cyclic coordinate method)

gaussian
(see: Optimal sensor scheduling)

Gaussian see: linear-quadratic —

Gaussian approximation methods
[01A99]
(see: Gauss, Carl Friedrich)

Gaussian density annealing
[90C90]
(see: Simulated annealing methods in protein folding)
Gaussian distribution
[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms) 

Gaussian elimination 
[01A99, 65K05, 65K10] 

(see: ABS algorithms for linear equations and linear least 
squares; ABS algorithms for optimization; Gauss, Carl 
Friedrich) 

Gaussian elimination 

[01A99] 

(see: Gauss, Carl Friedrich)

Gaussian elimination with backsolving 

[15-XX, 65-XX, 90-XX] 

(see: Cholesky factorization) 

Gaussian measure 

[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 

(see: Information-based complexity and information-based
optimization)

Gaussian quadrature 

[33C45, 65F20, 65F22, 65K10, 90C26] 

(see: Global optimization in batch design under
uncertainty; Least squares orthogonal polynomials)

Gaussianity 

[90C26, 90C90] 

(see: Signal processing with higher order statistics) 

Gauvin theorem 

[49K27, 49K40, 90C30, 90C31] 

(see: First order constraint qualifications)

GBD 

[49M29, 90C11] 

(see: Generalized benders decomposition) 

GBD see: variants of — 

GC 

[90C35] 

(see: Graph coloring)

GCM 

[62G07, 62G30, 65K05] 

(see: Isotonic regression problems)

(GD) function see: generalized differentiable — 

gDP 

[90C10, 90C11, 90C27, 90C33] 

(see: Continuous reformulations of discrete-continuous
optimization problems) 

Gene clustering: A novel decomposition-based clustering 
approach: global optimum search with enhanced 
positioning 

(91C20, 90C11, 90C26)

general Algorithm 

[90B15] 

(see: Evacuation networks) 

general case of the trust region problem 

[49M37] 

(see: Nonlinear least squares: trust region methods)

general constrained optimization 

[90C26, 90C39] 

(see: Second order optimality conditions for nonlinear
optimization)

general dynamic programming paradigm 

[62H30, 90C39]

(see: Dynamic programming in clustering) 


general economic equilibrium 
[91B50] 
(see: Walrasian price equilibrium) 

general equilibrium 
[91B50] 
(see: Walrasian price equilibrium) 

general Fermat problem 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 

general gradient 
[90C15, 90C30, 90C99] 
(see: SSC minimization algorithms for nonsmooth and 
stochastic optimization) 

general linear constraints 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 

General moment optimization problems 
(28-XX, 49-XX, 60-XX) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; Logconcave measures, logconvexity; 
Logconcavity of discrete distributions; L-shaped method for 
two-stage stochastic programs with recourse; Multistage 
stochastic programming: barycentric approximation; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; Logconcave measures, logconvexity; 
Logconcavity of discrete distributions; L-shaped method for 
two-stage stochastic programs with recourse; Multistage 
stochastic programming: barycentric approximation; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 


general order complementarity problem 


[90C33]
(see: Topological methods in complementarity theory) 
general order complementarity problem see: implicit — 


general position 


[52B11, 52B45, 52B55]

(see: Volume computation for polytopes: strategies and 
performances) 

general position of hyperplanes 

[05B35, 20F36, 20F55, 52C35, 57N65]

(see: Hyperplane arrangements) 

general purpose 

[65H20, 80A10, 80A22, 90C90]

(see: Global optimization: application to phase equilibrium 
problems) 


general structure mixed integer αBB algorithm
[65K05, 90C11, 90C26]
(see: MINLP: global optimization with αBB)


general theory and examples see: Derivatives of probability
and integral functions: —


general univariate linear model 


[65Fxx] 
(see: Least squares problems)


generalization of ELECTRE I
[90-XX]
(see: Outranking methods)

generalization of Lyusternik theorem see: high-order —
generalizations see: Farkas lemma: —


Generalizations of interior point methods for the linear 
complementarity problem 


(90C33, 90C51, 65K10) 
(referred to in: Complementarity algorithms in pattern
recognition; Mathematical programming methods in
supply chain management; Simultaneous estimation and
optimization of nonlinear problems)
(refers to: Complementarity algorithms in pattern
recognition; Mathematical programming methods in
supply chain management; Simultaneous estimation and
optimization of nonlinear problems)

generalizations of the nonlinear complementarity problem 


[90C33] 
(see: Generalized nonlinear complementarity problem)


generalized 


[90C31, 90C34] 
(see: Semi-infinite programming: second order optimality
conditions) 


Generalized assignment problem 


(90-00)
(referred to in: Biquadratic assignment problem; Feedback
set problems; Graph coloring; Graph planarization; Greedy
randomized adaptive search procedures; Linear ordering
problem; Multi-index transportation problems; Quadratic
assignment problem; Quadratic semi-assignment problem)
generalized assignment problem 
[68Q99, 90-00] 
(see: Branch and price: Integer programming with column 
generation; Generalized assignment problem) 
generalized assignment problem see: multilevel — 
generalized barycenters
[90C15]
(see: Multistage stochastic programming: barycentric
approximation)

general-purpose software library
[90C10, 90C26, 90C30]
(see: Optimization software)

general QAP
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)
general quadratic assignment problem
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

General routing problem
(90B20)
(referred to in: Stochastic vehicle routing problems; Vehicle
routing; Vehicle scheduling)
(refers to: Stochastic vehicle routing problems; Vehicle
routing; Vehicle scheduling)
general routing problem
[90B20]
(see: General routing problem)
general second order sufficient condition
[90C31]
(see: Sensitivity and stability in NLP: continuity and
differential stability)
general strong second order sufficient condition
[90C31]
(see: Sensitivity and stability in NLP: continuity and
differential stability)

Generalized benders decomposition
(49M29, 90C11)
(referred to in: Chemical process planning; Decomposition
principle of linear programming; Generalized outer
approximation; MINLP: application in facility
location-allocation; MINLP: applications in blending and
pooling problems; MINLP: applications in the interaction
of design and control; MINLP: branch and bound global
optimization algorithm; MINLP: branch and bound
methods; MINLP: design and scheduling of batch processes;
MINLP: generalized cross decomposition; MINLP: global
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: outer
approximation algorithm; MINLP: reactive distillation
column synthesis; Mixed integer linear programming: heat
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming; Nondifferentiable 
optimization; Preprocessing in stochastic programming; 
Simplicial decomposition; Simplicial decomposition 
algorithms; Stochastic linear programming: decomposition 
and cutting planes; Successive quadratic programming: 
decomposition methods) 

(refers to: Chemical process planning; Decomposition 
principle of linear programming; Extended cutting plane 
algorithm; Generalized outer approximation; MINLP: 
application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm; 
MINLP: branch and bound methods; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: mass and heat 
exchanger networks; Mixed integer nonlinear 
programming; Simplicial decomposition; Simplicial 
decomposition algorithms; Stochastic linear programming: 
decomposition and cutting planes; Successive quadratic 
programming: decomposition methods) 


generalized Benders decomposition
[49M37, 65K05, 90C10, 90C11, 90C26, 90C29, 90C90]
(see: Bilevel optimization: feasibility test and flexibility
index; MINLP: applications in the interaction of design and
control; MINLP: branch and bound global optimization
algorithm; MINLP: branch and bound methods; MINLP:
global optimization with αBB; MINLP: outer
approximation algorithm; Mixed integer nonlinear
programming; Multi-objective optimization: interaction of
design and control)

generalized Benders decomposition
[90C09, 90C10, 90C11]
(see: MINLP: logic-based methods; MINLP: outer
approximation algorithm)

generalized Benders method
[90C09, 90C10, 90C11]
(see: MINLP: logic-based methods)

generalized bilevel programming problem
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)

generalized bisection algorithm
[65H20, 80A10, 80A22, 90C90]
(see: Global optimization: application to phase equilibrium
problems)

generalized complementarity 

[90C33] 

(see: Generalized nonlinear complementarity problem) 


generalized complementarity problem 


[47J20, 49J40, 65K10, 90C33] 
(see: Generalized nonlinear complementarity problem; 
Solution methods for multivalued variational inequalities) 


generalized concavity see: vector — 


Generalized concavity in multi-objective optimization 
(90C29) 
(referred to in: Invexity and its applications; L-convex 
functions and M-convex functions) 
(refers to: Invexity and its applications; Isotonic regression 
problems) 
generalized conditions see: Fritz John — 
generalized convex function see: program of minimizing a — 
generalized convexity
[90C26]
(see: Generalized monotone multivalued maps; Generalized
monotone single valued maps)
generalized critical point
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)
generalized critical point
[49J40]
(see: Nonconvex-nonsmooth calculus of variations)
generalized critical point of an energy functional
[49J40]
(see: Nonconvex-nonsmooth calculus of variations)
generalized critical point set
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)
generalized cross decomposition
[49M27, 49M37, 90C11, 90C30]
(see: MINLP: generalized cross decomposition; Mixed
integer nonlinear programming)
generalized cross decomposition see: MINLP: —
generalized cutting plane
[90C26]
(see: Global optimization: envelope representation)
generalized cutting plane method
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth
optimization methods)
generalized derivative see: Clarke —; Clarke–Rockafellar —
generalized differentiable (GD) function
[90C15]
(see: Stochastic quasigradient methods in minimax
problems)
generalized directional derivative
[49J52]
(see: Hemivariational inequalities: eigenvalue problems)
generalized directional derivative see: Clarke —
generalized directional differential
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational
inequalities)
Generalized Disjunctive Programming 
(see: Logic-based outer approximation) 
Generalized disjunctive programming 
[90C09, 90C10, 90C11] 
(see: Generalized disjunctive programming; MINLP: 
logic-based methods; Optimal planning of offshore oilfield 
infrastructure) 
generalized dual problem 
[90C30] 
(see: Image space approach to optimization) 
generalized eigenvalue proximal support vector machine 
[68Q32, 68T10] 
(see: Generalized eigenvalue proximal support vector machine
problem) 

Generalized eigenvalue proximal support vector machine 
problem 
[68Q32, 68T10] 
(see: Generalized eigenvalue proximal support vector machine 
problem) 

generalized-ensemble algorithms see: Protein folding: — 

Generalized ensembles 
[92-08, 92C05, 92C40] 
(see: Protein folding: generalized-ensemble algorithms) 

generalized equation 
[65K10, 90C31, 90C33] 
(see: Linear complementarity problem; Sensitivity analysis 
of complementarity problems; Sensitivity analysis of 
variational inequality problems) 

generalized finite sequence 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

generalized flow 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 

generalized fractional program 
[90C32] 
(see: Fractional programming) 

generalized functions see: theory of — 

generalized game tree search algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 

generalized Gauss quadrature rule 
[90C05, 90C34] 
(see: Semi-infinite programming: methods for linear 
problems) 

generalized geometric programming 
[90C26, 90C90] 
(see: Global optimization in generalized geometric 
programming) 

generalized geometric programming 
[90C26, 90C90] 
(see: Global optimization in generalized geometric 
programming) 

generalized geometric programming see: Global optimization 
in — 

Generalized geometric programming: mixed continuous and 
discrete free variables 
[49M37, 90C11, 90C30] 
(see: Generalized geometric programming: mixed continuous 
and discrete free variables) 

generalized gradient 
[26E25, 46N10, 49J40, 49J52, 49Q10, 52A27, 65G20, 65G30, 
65G40, 65K05, 70-XX, 74K99, 74Pxx, 80-XX, 90-00, 90C30, 
90C47, 90C99] 
(see: Hemivariational inequalities: eigenvalue problems; 
Interval global optimization; Nonconvex energy functions: 
hemivariational inequalities; Nondifferentiable 
optimization; Quasidifferentiable optimization: Dini 
derivatives, clarke derivatives) 

generalized gradient see: Clarke — 


generalized inverses
[49M37]
(see: Nonlinear least squares: Newton-type methods)
generalized invex
[90C26]
(see: Invexity and its applications)
generalized invex
[90C26]
(see: Invexity and its applications)
generalized Jacobian see: Clarke —
generalized Karush-Kuhn-Tucker conditions
[65K10, 90C31]
(see: Sensitivity analysis of variational inequality problems)
generalized Lagrange multiplier approach see: Everett —
generalized least squares problem
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric
programming)
generalized linear order complementarity problem
[90C33]
(see: Order complementarity)
generalized linear programming with variable coefficients
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming, semidefinite
programming and perfect duality)
generalized minimizing sequence
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems)
generalized mixed complementarity problem
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational
inequalities)

Generalized monotone multivalued maps
(90C26)
(referred to in: Fejér monotonicity in convex optimization;
Generalized monotone single valued maps; Generalized
monotonicity: applications to variational inequalities and
equilibrium problems; Pseudomonotone maps: properties
and applications; Set-valued optimization)
(refers to: Fejér monotonicity in convex optimization;
Generalized monotone single valued maps; Generalized
monotonicity: applications to variational inequalities and
equilibrium problems; Set-valued optimization)
generalized monotone operator
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems)
Generalized monotone single valued maps
(90C26)
(referred to in: Fejér monotonicity in convex optimization;
Generalized monotone multivalued maps; Generalized
monotonicity: applications to variational inequalities and
equilibrium problems; Pseudomonotone maps: properties
and applications; Set-valued optimization)
(refers to: Fejér monotonicity in convex optimization;
Generalized monotone multivalued maps; Generalized
monotonicity: applications to variational inequalities and
equilibrium problems; Set-valued optimization)
generalized monotonicity
[90C26]
(see: Generalized monotone single valued maps)
generalized monotonicity
[46N10, 49J40, 90C26]
(see: Generalized monotone multivalued maps; Generalized
monotone single valued maps; Generalized monotonicity:
applications to variational inequalities and equilibrium
problems)

Generalized monotonicity: applications to variational 
inequalities and equilibrium problems 

(90C26, 49J40, 46N10) 

(referred to in: Equilibrium networks; Fejér monotonicity in 
convex optimization; Financial equilibrium; Generalized 
monotone multivalued maps; Generalized monotone single 
valued maps; Hemivariational inequalities: applications in 
mechanics; Hemivariational inequalities: eigenvalue 
problems; Nonconvex energy functions: hemivariational 
inequalities; Nonconvex-nonsmooth calculus of variations; 
Oligopolistic market equilibrium; Optimization with 
equilibrium constraints: A piecewise SQP approach; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Spatial price equilibrium; Traffic network 
equilibrium; Variational inequalities; Variational 
inequalities: F. E. approach; Variational inequalities: 
geometric interpretation, existence and uniqueness; 
Variational inequalities: projected dynamical system; 
Walrasian price equilibrium) 

(refers to: Equilibrium networks; Fejér monotonicity in 
convex optimization; Financial equilibrium; Generalized 
monotone multivalued maps; Generalized monotone single 
valued maps; Hemivariational inequalities: applications in 
mechanics; Hemivariational inequalities: eigenvalue 
problems; Hemivariational inequalities: static problems; 
Nonconvex energy functions: hemivariational inequalities; 
Oligopolistic market equilibrium; Quasidifferentiable 
optimization; Quasidifferentiable optimization: algorithms 
for hypodifferentiable functions; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational
formulations; Quasivariational inequalities; Sensitivity
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Spatial price equilibrium; Traffic network 
equilibrium; Variational inequalities; Variational 
inequalities: F. E. approach; Variational inequalities: 
geometric interpretation, existence and uniqueness; 
Variational inequalities: projected dynamical system; 
Variational principles; Walrasian price equilibrium) 

generalized morphism
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

generalized morphism
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

generalized morphisms
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

generalized morphisms of relations
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

generalized necessary optimality conditions 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 

generalized network 
[90B10, 90C26, 90C30, 90C35] 
(see: Nonconvex network flow problems) 

generalized network optimization system 
[90C10, 90C26, 90C30] 
(see: Optimization software) 

generalized network problem 
[90C35] 
(see: Generalized networks) 

generalized network problems see: quadratic — 

Generalized networks 
(90C35) 
(referred to in: Auction algorithms; Communication
network assignment problem; Dynamic traffic networks; 
Equilibrium networks; Maximum flow problem; Minimum 
cost flow problem; Multicommodity flow problems; 
Network design problems; Network location: covering 
problems; Nonconvex network flow problems; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium) 
(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation 
networks; Maximum flow problem; Minimum cost flow 
problem; Network design problems; Network location: 
covering problems; Nonconvex network flow problems; 
Piecewise linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium) 
generalized networks
[90C35] 
(see: Generalized networks) 

generalized networks 
[90C30, 90C35] 
(see: Convex-simplex algorithm; Generalized networks) 

Generalized nonlinear complementarity problem 
(90C33) 
(referred to in: Equivalence between nonlinear 
complementarity problem and fixed point problem; Global 
optimization methods for systems of nonlinear equations; 
Integer linear complementary problem; LCP: 
Pardalos—Rosen mixed integer formulation; Linear 
complementarity problem; Order complementarity; 
Principal pivoting methods for linear complementarity 
problems; Topological methods in complementarity 
theory) 
(refers to: Convex-simplex algorithm; Equivalence between 
nonlinear complementarity problem and fixed point 
problem; Integer linear complementary problem; LCP: 
Pardalos—Rosen mixed integer formulation; Lemke method; 
Linear complementarity problem; Linear programming; 
Order complementarity; Parametric linear programming: 
cost simplex algorithm; Principal pivoting methods for 
linear complementarity problems; Sequential simplex 
method; Topological methods in complementarity theory) 

generalized nonlinear least squares 
[90C30] 
(see: Generalized total least squares) 

generalized nonlinear least squares problem 
[90C30] 
(see: Nonlinear least squares problems) 

generalized order complementarity problem 
[90C33] 
(see: Order complementarity) 

generalized order complementarity problem see: 
infinite-dimensional — 

Generalized outer approximation 
(90C11, 90C30, 49M20) 
(referred to in: Chemical process planning; Generalized 
benders decomposition; Global optimization in 
multiplicative programming; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: global
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming) 
(refers to: Chemical process planning; Extended cutting 
plane algorithm; Generalized benders decomposition; 
MINLP: application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm;
MINLP: branch and bound methods; MINLP: design and
scheduling of batch processes; MINLP: generalized cross
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: mass and heat 
exchanger networks; Mixed integer nonlinear 
programming) 

generalized outer approximation
[49M37, 90C11]
(see: Mixed integer nonlinear programming)

generalized primal-relaxed dual algorithm
[90C26]
(see: Generalized primal-relaxed dual approach)

Generalized primal-relaxed dual approach
(90C26)
(referred to in: αBB algorithm; Global optimization in phase
and chemical reaction equilibrium)
(refers to: αBB algorithm; Global optimization in phase and
chemical reaction equilibrium)

generalized quantifier
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)

generalized quantifier
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)

generalized second order directional derivative
[46A20, 52A01, 90C30]
(see: Composite nonsmooth optimization)

generalized semi-infinite problem
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)

Generalized semi-infinite programming: optimality conditions
[90C31, 90C34, 90C46]
(see: Generalized semi-infinite programming: optimality
conditions)

generalized single cluster statistic
[62H30, 90C27]
(see: Assignment methods in clustering)

generalized Slater constraint qualification
[90C31]
(see: Sensitivity and stability in NLP: continuity and
differential stability)

generalized state equations
[49K05, 49K10, 49K15, 49K20]
(see: Duality in optimal control with first order differential
equations)

generalized subdifferential
[26B25, 26E25, 49J52, 90C99]
(see: Quasidifferentiable optimization)

generalized subdifferential see: Clarke —

generalized subdifferential of F.H. Clarke
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in
mechanics)

Generalized total least squares
(90C30)
(referred to in: ABS algorithms for linear equations and
linear least squares; ABS algorithms for optimization;
Gauss-Newton method: Least squares, relation to Newton’s
method; Least squares orthogonal polynomials; Least
squares problems; Nonlinear least squares: Newton-type 
methods; Nonlinear least squares problems; Nonlinear least 
squares: trust region methods) 
(refers to: ABS algorithms for linear equations and linear 
least squares; ABS algorithms for optimization; 
Gauss-Newton method: Least squares, relation to Newton’s 
method; Least squares orthogonal polynomials; Least 
squares problems; Nonlinear least squares: Newton-type 
methods; Nonlinear least squares problems; Nonlinear least 
squares: trust region methods) 

generalized total least squares
[90C30]
(see: Generalized total least squares)

generalized TSP
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)

generalized-upper-bound dichotomy
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)

generalized upper bounding structure
[90C30]
(see: Convex-simplex algorithm)

generalized upper bounds constraints
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource
allocation problems)

Generalized variational inequalities: A brief review
[49J53, 90C30]
(see: Generalized variational inequalities: A brief review)

generalized variational inequality
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational
inequalities)

generalized Weber problem 

[65D18, 90B85, 90C26] 
(see: Global optimization in location problems) 

generally nonhomogeneous and nonisotropic body see: linear 
thermoelastic behavior of a — 

generated see: randomly — 

generating polynomial 
[33C45, 65F20, 65F22, 65K10] 
(see: Least squares orthogonal polynomials) 

generating two paths see: Pivoting algorithms for linear 
programming — 

generation 
[92B05] 
(see: Genetic algorithms; State of the art in modeling 
agricultural systems) 

generation 
[92B05] 
(see: Genetic algorithms) 

generation see: Branch and price: Integer programming with 
column —; Column —; cut —; hydro- —; one-at-a-time 
coefficient —; row —; scenario — 

generation formulation see: column — 

generation method see: Lagrangian finite — 

generation methods see: column — 

generation modeling languages see: second — 

generation plant see: co- — 

generation subproblem see: column — 


generations 
(see: Broadcast scheduling problem) 

generator see: hit and run —; Li-Pardalos —; Palubeckis —; 
supremal — 

generators
[90B80, 90C27]
(see: Voronoi diagrams in facility location)

generators see: Combinatorial test problems and problem —

generic
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)

generic augmenting path algorithm
[90C35]
(see: Maximum flow problem)

generic cost vector
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)

generic pivoting rule
[90C05]
(see: Linear programming: Klee-Minty examples)

generic preflow-push algorithm
[90C35]
(see: Maximum flow problem)

generic property
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and
stability)

generic shortest path algorithms
[90B10, 90C27]
(see: Shortest path tree algorithms)

generic singularities 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following 
and singularities) 

generic transitions 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 

generic vertex insertion algorithm 
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization) 

genericity 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 

GENEROUS 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 

genes see: Selection of maximally informative — 

genetic algorithm see: grouping —; simulated annealing 
and — 

Genetic algorithms 
(92B05) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Broadcast scheduling 
problem; Evolutionary algorithms in combinatorial 
optimization; Facilities layout problems; Forecasting; 
Genetic algorithms for protein structure prediction; Global 
optimization in Lennard-Jones and morse clusters; Graph 
coloring; Integer programming: branch and bound 
methods; Job-shop scheduling problem; Molecular 
structure determination: convex global underestimation;
Monte-Carlo simulated annealing in protein folding; 
Multidisciplinary design optimization; Multiple minima 
problem in protein folding: αBB global optimization
approach; Optimization in medical imaging; Packet 
annealing; Phase problem in X-ray crystallography: Shake 
and bake approach; Set covering, packing and partitioning 
problems; Simulated annealing methods in protein folding) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Genetic algorithms for protein structure 
prediction; Global optimization in Lennard-Jones and 
morse clusters; Global optimization in protein folding; 
Molecular structure determination: convex global 
underestimation; Monte-Carlo simulated annealing in 
protein folding; Multiple minima problem in protein 
folding: αBB global optimization approach; Packet
annealing; Phase problem in X-ray crystallography: Shake 
and bake approach; Protein folding: generalized-ensemble 
algorithms; Simulated annealing; Simulated annealing 
methods in protein folding) 

genetic algorithms 
[62C10, 65K05, 90B06, 90B35, 90C06, 90C08, 90C10, 90C11, 
90C15, 90C26, 90C27, 90C39, 90C57, 90C59, 90C60, 90C90, 
92B05] 
(see: Bayesian global optimization; Design optimization in 
computational fluid dynamics; Genetic algorithms; 
Maximum constraint satisfaction: relaxations and upper 
bounds; Quadratic assignment problem; Traveling 
salesman problem) 

genetic algorithms 
[90B80, 90C26, 90C30, 92B05] 
(see: Facilities layout problems; Forecasting; Genetic 
algorithms) 

Genetic algorithms for protein structure prediction 
(92B05) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Genetic algorithms; Global optimization 
based on statistical models; Monte-Carlo simulated 
annealing in protein folding; Packet annealing; Random 
search methods; Simulated annealing methods in protein 
folding; Stochastic global optimization: stopping rules; 
Stochastic global optimization: two-phase methods) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Bayesian global optimization; Genetic 
algorithms; Global optimization based on statistical 
models; Monte-Carlo simulated annealing in protein 
folding; Packet annealing; Random search methods; 
Simulated annealing methods in protein folding; Stochastic 
global optimization: stopping rules; Stochastic global 
optimization: two-phase methods) 

genetic engineering via negative fitness
[90C20]
(see: Standard quadratic optimization problems:
algorithms)

genetic operators
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90]
(see: Traveling salesman problem)

GENF
[90C20]
(see: Standard quadratic optimization problems:
algorithms)

genomic analysis see: Algorithms for —

geodesic convex function
[90C26]
(see: Smooth nonlinear nonconvex optimization)

geodesic convex set
[90C26]
(see: Smooth nonlinear nonconvex optimization)

geodesic convexity
[90C26]
(see: Smooth nonlinear nonconvex optimization)

geodesic convexity
[90C26]
(see: Smooth nonlinear nonconvex optimization)

geodesic gradient vector
[90C26]
(see: Smooth nonlinear nonconvex optimization)

geodesic Hessian matrix
[90C26]
(see: Smooth nonlinear nonconvex optimization)

Geoffrion theorem
[90C10, 90C29]
(see: Multi-objective integer linear programming)

geometric
[68Q20]
(see: Optimal triangulations)

geometric algorithms
[05C05, 05C85, 68Q25, 90B80]
(see: Bottleneck steiner tree problems)

geometric convergence rate
[49J52, 90C30]
(see: Nondifferentiable optimization: subgradient
optimization methods)

geometric distribution
[90C15]
(see: Logconcavity of discrete distributions)

geometric interpretation
[65K10, 65M60]
(see: Variational inequalities: geometric interpretation,
existence and uniqueness)

geometric interpretation, existence and uniqueness see:
Variational inequalities: —

geometric mean method
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques)

geometric mean method see: revised —

geometric moment theory
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems)

geometric (or disk) representation
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)

geometric programming
[01A99]
(see: History of optimization)

Geometric programming
[90C28, 90C30]
(see: Geometric programming)
geometric programming see: generalized —; Global
optimization in generalized — 

geometric programming: mixed continuous and discrete free 
variables see: Generalized — 

geometric semilattice
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)

geometric semilattice
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)

geometric series rule
[49J52, 90C30]
(see: Nondifferentiable optimization: subgradient
optimization methods)

geometrical equation
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)

geometrical operator
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)

geometrical problem
[90C30]
(see: Lagrangian duality: BASICS)

geometrically
[90C30]
(see: Frank-Wolfe algorithm)

geometrically linear problem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)

geometrically nonlinear problem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)

geometry
(see: State of the art in modeling agricultural systems)

geometry see: stochastic —; vector — 

geometry problem see: distance —; Molecular distance — 

geometry of satellites and tracking stations see: Automatic 
differentiation: — 

Gershgorin theorem
[49M37, 65K10, 90C09, 90C10, 90C26, 90C30]
(see: αBB algorithm; Combinatorial matrix analysis)

GGA
[05-04, 90C27]
(see: Evolutionary algorithms in combinatorial
optimization)

GGP
[90C26, 90C90]
(see: Global optimization in generalized geometric
programming)

Gibbs free energy see: molar —; total — 

Gibbs function 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 

Gibbs sampler see: hidden Markov model and — 

GIDEON 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 

Gilbert conjecture see: Chung— — 


Gilbert-Pollak conjecture 
[90C27] 
(see: Steiner tree problems) 
Gilmore-Lawler lower bound 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
Gilmore-Lawler type lower bounds 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
GIS design pattern based model
(see: State of the art in modeling agricultural systems) 
given marginals see: table with — 
Given transformation 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
Givens rotation 
[15A23, 65F05, 65F20, 65F22, 65F25] 
(see: QR factorization) 
Givens transformation see: fast —; square-root-free — 
glass model see: Ising —; Potts — 
global 
[65H20, 80A10, 80A22, 90C26, 90C31, 90C34, 90C90, 92-08, 
92C05, 92C40] 
(see: Generalized primal-relaxed dual approach; Global 
optimization: application to phase equilibrium problems; 
Interval analysis for optimization of dynamical systems; 
Parametric global optimization: sensitivity; Protein 
folding: generalized-ensemble algorithms) 
global constrained optimization problem 
[60G35, 65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Differential equations and global optimization; 
Interval global optimization) 
global convergence 
[49J52, 49M37, 90C06, 90C30]
(see: Large scale unconstrained optimization; 
Nondifferentiable optimization: Newton method; 
Nonlinear least squares: Newton-type methods; Rosen’s 
method, global convergence, and Powell’s conjecture) 
global convergence 
[49M37, 90C30] 
(see: Nonlinear least squares: Newton-type methods; 
Rosen’s method, global convergence, and Powell’s 
conjecture) 
global convergence of GRASP 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 
global convergence, and Powell’s conjecture see: Rosen’s 
method — 
global convergence problem for the Rosen method 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
global cut 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and cut algorithms) 
global energy minimum 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
Global equilibrium search 
(see: Global equilibrium search) 
global error bound
[49K27, 49K40, 90C30, 90C31] 
(see: First order constraint qualifications) 
global gradient flows 
[58E05, 90C30] 
(see: Topology of global optimization) 
global independence 
(see: Bayesian networks) 
global infimum 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
global Lagrange multiplier rule 
[90C26]
(see: Smooth nonlinear nonconvex optimization) 
global local maximizers see: set of discrete ε- —
global maximization problem 
[90C05, 90C34]
(see: Semi-infinite programming: methods for linear 
problems) 
global maximizer 
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization) 
global maximizer see: discrete — 
global maximum point 
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization) 
global minimization
[03H10, 49J27, 65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25,
90C26, 90C34] 
(see: Information-based complexity and information-based 
optimization; Semi-infinite programming and control 
problems) 
global minimization 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
global minimizer 
[46A20, 52A01, 65G20, 65G30, 65G40, 65K05, 90C30, 90Cxx] 
(see: Composite nonsmooth optimization; Dini and 
Hadamard derivatives in optimization; Interval global 
optimization) 
global minimizers 
[65G30, 65G40, 65K05, 90C30, 90C57] 
(see: Global optimization: interval analysis and balanced 
interval arithmetic) 
global minimum 
[03H10, 49J27, 65G20, 65G30, 65G40, 65K05, 90C26, 90C30, 
90C34, 90C39, 90C57] 
(see: Global optimization: interval analysis and balanced 
interval arithmetic; Interval global optimization; Second 
order optimality conditions for nonlinear optimization; 
Semi-infinite programming and control problems) 
global minimum 
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear 
optimization) 
global minimum KKT point
[65K05, 90C20]
(see: Quadratic programming with bound constraints) 
global minimum of an NNFP
[90B10, 90C26, 90C30, 90C35]
(see: Nonconvex network flow problems) 




global minimum point
[65K05, 90C30, 90Cxx]
(see: Dini and Hadamard derivatives in optimization; Image
space approach to optimization)

global minimum solution
[90C10, 90C26]
(see: MINLP: branch and bound global optimization
algorithm)

global MINLP
[49M37, 90C11]
(see: Mixed integer nonlinear programming)

global nonlinear optimization
[46A20, 52A01, 90C30]
(see: Farkas lemma: generalizations)

global optimal design
[90C26, 90C90]
(see: Global optimization of heat exchanger networks)

global optimal solution
[65H10, 90C26, 90C30]
(see: Global optimization methods for systems of nonlinear
equations)

global optimality
[46A20, 52A01, 90C30]
(see: Farkas lemma: generalizations)

global optimization
[01A99, 26E25, 46A20, 49-XX, 49J52, 52A01, 52A27, 60J15,
60J60, 60J65, 60J70, 60K35, 65C05, 65C10, 65C20, 65C30,
65C40, 65C50, 65C60, 65Cxx, 65G20, 65G30, 65G40, 65K05,
65T40, 68Q25, 68U20, 70-08, 82B21, 82B31, 82B41, 82B80,
90-XX, 90B50, 90C05, 90C10, 90C20, 90C26, 90C29, 90C30,
90C90, 90C99, 91B06, 92C40, 92E10, 93-XX]
(see: Adaptive global search; Duality theory: triduality in 
global optimization; Farkas lemma: generalizations; Global 
optimization methods for harmonic retrieval; Global 
optimization in protein folding; History of optimization; 
Interval analysis: systems of nonlinear equations; MINLP: 
branch and bound global optimization algorithm; 
Multi-objective optimization and decision support systems; 
Quadratic programming with bound constraints; 
Quasidifferentiable optimization: Dini derivatives, clarke 
derivatives; Reverse convex optimization; Selection of 
maximally informative genes; Stochastic global 
optimization: stopping rules; Stochastic global 
optimization: two-phase methods) 

Global optimization 
[46A20, 49K99, 49M29, 49M37, 52A01, 58E05, 60G35, 62C10, 
65C30, 65C40, 65C50, 65C60, 65Cxx, 65G20, 65G30, 65G40, 
65H10, 65H20, 65K05, 65K10, 65K99, 65T40, 80A10, 80A22, 
90B06, 90B10, 90C05, 90C10, 90C11, 90C15, 90C20, 90C26, 
90C27, 90C29, 90C30, 90C32, 90C35, 90C90, 90C99, 92C40] 
(see: αBB algorithm; Bayesian global optimization; Bilevel
programming: global optimization; Continuous global 
optimization: applications; Differential equations and 
global optimization; Direct global optimization algorithm; 
Farkas lemma: generalizations; Generalized benders 
decomposition; Generalized primal-relaxed dual approach; 
Global optimization: application to phase equilibrium 
problems; Global optimization based on statistical models; 
Global optimization: envelope representation; Global 
optimization in generalized geometric programming; 
Global optimization of heat exchanger networks; Global 
optimization: hit and run methods; Global optimization in
Lennard-Jones and morse clusters; Global optimization 
methods for harmonic retrieval; Global optimization 
methods for systems of nonlinear equations; Global 
optimization in multiplicative programming; Global 
optimization in phase and chemical reaction equilibrium; 
Global optimization in Weber’s problem with attraction 
and repulsion; Interval analysis: intermediate terms; 
Interval analysis: nondifferentiable problems; Interval 
analysis: unconstrained and constrained optimization; 
Interval analysis: verifying feasibility; Interval fixed point 
theory; Interval global optimization; Interval Newton 
methods; Minimum concave transportation problems; 
MINLP: branch and bound global optimization algorithm; 
MINLP: global optimization with αBB; Mixed integer
nonlinear bilevel programming: deterministic global 
optimization; Mixed integer nonlinear programming; 
Multiple minima problem in protein folding: αBB global
optimization approach; Neural networks for combinatorial 
optimization; Nonconvex network flow problems; Optimal 
design of composite structures; Optimality criteria for 
multiphase chemical equilibrium; Phase problem in X-ray 
crystallography: Shake and bake approach; Piecewise linear 
network flow problems; Quadratic fractional 
programming: Dinkelbach method; Quadratic 
programming with bound constraints; Random search 
methods; Reverse convex optimization; SSC minimization 
algorithms for nonsmooth and stochastic optimization; 
Standard quadratic optimization problems: algorithms; 
Standard quadratic optimization problems: theory; 
Stochastic global optimization: stopping rules; Stochastic 
global optimization: two-phase methods; Topology of 
global optimization) 

global optimization see: Bayesian —; Bilevel programming: —; 
black-box —; constrained —; continuous —; Cutting plane 
methods for —; Decomposition in —; Differential equations 
and —; Duality theory: triduality in —; Interval —; Interval 
analysis: parallel methods for —; LP strategy for 
interval-Newton method in deterministic —; mixed 
discrete-continuous —; Mixed integer nonlinear bilevel 
programming: deterministic —; multi-extremal —; 
Reformulation-linearization technique for —; Robust —; 
stochastic —; Topology of —; unconstrained — 


global optimization algorithm see: αBB —; deterministic —;
Direct —; MINLP: branch and bound — 


Global optimization algorithms for financial planning problems 
[78M50, 90B50, 91B28] 
(see: Global optimization algorithms for financial planning 
problems) 


global optimization with αBB see: MINLP: —


Global optimization in the analysis and management of 
environmental systems 
(90C05) 
(referred to in: Continuous global optimization: 
applications; Continuous global optimization: models, 
algorithms and software; Interval global optimization; 
Mixed integer nonlinear programming; Optimization in 
water resources) 
(refers to: Continuous global optimization: applications; 
Continuous global optimization: models, algorithms and
software; Interval global optimization; Mixed integer
nonlinear programming; Optimization in water resources) 

Global optimization: application to phase equilibrium 
problems 
(80A10, 80A22, 90C90, 65H20) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global optimization 
in phase and chemical reaction equilibrium; Interval 
analysis: application to chemical engineering design 
problems; Interval analysis: differential equations; Interval 
analysis: eigenvalue bounds of interval matrices; Interval 
analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: systems of 
nonlinear equations; Interval analysis: unconstrained and 
constrained optimization; Interval analysis: verifying 
feasibility; Interval constraints; Interval fixed point theory; 
Interval global optimization; Interval linear systems; 
Interval Newton methods; Optimality criteria for 
multiphase chemical equilibrium) 
(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global optimization 
in phase and chemical reaction equilibrium; Interval 
analysis: application to chemical engineering design 
problems; Interval analysis: differential equations; Interval 
analysis: eigenvalue bounds of interval matrices; Interval 
analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods; Optimality criteria for multiphase 
chemical equilibrium) 

global optimization: applications see: Continuous — 

global optimization approach see: Multiple minima problem in 
protein folding: αBB —

Global optimization based on statistical models 
(90C30) 
(referred to in: Adaptive global search; Adaptive simulated 
annealing and its application to protein folding; αBB
algorithm; Bayesian global optimization; Continuous 
global optimization: applications; Continuous global 
optimization: models, algorithms and software; Differential 
equations and global optimization; Direct global 
optimization algorithm; Genetic algorithms for protein 
structure prediction; Global optimization in binary star 
astronomy; Global optimization methods for systems of 
nonlinear equations; Global optimization using space 
filling; Monte-Carlo simulated annealing in protein folding; 
Packet annealing; Random search methods; Simulated 
annealing; Simulated annealing methods in protein folding; 
Stochastic global optimization: stopping rules; Stochastic 
global optimization: two-phase methods; Topology of 
global optimization) 
(refers to: Adaptive global search; Adaptive simulated 
annealing and its application to protein folding; αBB
algorithm; Bayesian global optimization; Continuous
global optimization: applications; Continuous global 
optimization: models, algorithms and software; Differential 
equations and global optimization; Direct global 
optimization algorithm; Genetic algorithms for protein 
structure prediction; Global optimization in binary star 
astronomy; Global optimization methods for systems of 
nonlinear equations; Global optimization using space 
filling; Integer programming: branch and bound methods; 
Monte-Carlo simulated annealing in protein folding; 
Packet annealing; Random search methods; Simulated 
annealing; Simulated annealing methods in protein folding; 
Stochastic global optimization: stopping rules; Stochastic 
global optimization: two-phase methods; Topology of 
global optimization) 

Global optimization in batch design under uncertainty 
(90C26) 

(referred to in: αBB algorithm; Continuous global
optimization: models, algorithms and software; Global 
optimization in generalized geometric programming; 
Global optimization methods for systems of nonlinear 
equations; Global optimization in phase and chemical 
reaction equilibrium; Interval global optimization; MINLP: 
branch and bound global optimization algorithm; MINLP: 
global optimization with αBB; Smooth nonlinear
nonconvex optimization) 

(refers to: αBB algorithm; Continuous global optimization:
models, algorithms and software; Global optimization in 
generalized geometric programming; Global optimization 
methods for systems of nonlinear equations; Global 
optimization in phase and chemical reaction equilibrium; 
Interval global optimization; MINLP: branch and bound 
global optimization algorithm; MINLP: global optimization 
with αBB; Smooth nonlinear nonconvex optimization)
Global optimization in binary star astronomy 

(90C26, 90C90) 

(referred to in: αBB algorithm; Continuous global
optimization: applications; Continuous global 
optimization: models, algorithms and software; Differential 
equations and global optimization; Direct global 
optimization algorithm; Global optimization based on 
statistical models; Global optimization methods for systems 
of nonlinear equations; Global optimization using space 
filling; Topology of global optimization) 

(refers to: αBB algorithm; Continuous global optimization:
applications; Continuous global optimization: models, 
algorithms and software; Differential equations and global 
optimization; Direct global optimization algorithm; Global 
optimization based on statistical models; Global 
optimization methods for systems of nonlinear equations; 
Global optimization using space filling; Topology of global 
optimization) 

Global optimization: cutting angle method 

(90C26, 65K05, 90C56, 65K10) 

Global optimization: envelope representation 

(90C26) 

(referred to in: Dini and Hadamard derivatives in 
optimization; Nondifferentiable optimization; 
Nondifferentiable optimization: cutting plane methods; 
Nondifferentiable optimization: minimax problems; 
Nondifferentiable optimization: Newton method;
Nondifferentiable optimization: parametric programming;
Nondifferentiable optimization: relaxation methods; 
Nondifferentiable optimization: subgradient optimization 
methods) 
(refers to: Dini and Hadamard derivatives in optimization; 
Farkas lemma; Farkas lemma: generalizations; 
Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: minimax problems; Nondifferentiable 
optimization: Newton method; Nondifferentiable 
optimization: parametric programming; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods) 

Global optimization: filled function methods 
(90C26, 90C30, 90C59, 65K05) 

Global optimization: functional forms 

Global optimization: g-αBB approach
(49M37, 65K10, 90C26, 90C30, 46N10, 47N10) 

Global optimization in generalized geometric programming 
(90C26, 90C90) 
(referred to in: αBB algorithm; Continuous global
optimization: models, algorithms and software; Convex 
envelopes in optimization problems; Global optimization 
in batch design under uncertainty; Global optimization 
methods for systems of nonlinear equations; Global 
optimization in phase and chemical reaction equilibrium; 
Interval global optimization; MINLP: branch and bound 
global optimization algorithm; MINLP: global optimization 
with αBB; Smooth nonlinear nonconvex optimization)
(refers to: αBB algorithm; Continuous global optimization:
models, algorithms and software; Convex envelopes in 
optimization problems; Global optimization in batch 
design under uncertainty; Global optimization methods for 
systems of nonlinear equations; Global optimization in 
phase and chemical reaction equilibrium; Interval global 
optimization; MINLP: branch and bound global 
optimization algorithm; MINLP: global optimization with 
αBB; Smooth nonlinear nonconvex optimization)

Global optimization of heat exchanger networks 
(90C26, 90C90) 
(referred to in: Continuous global optimization: models, 
algorithms and software; Global optimization methods for 
systems of nonlinear equations; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: mass and heat exchanger networks; 
Mixed integer linear programming: heat exchanger 
network synthesis; Mixed integer linear programming: 
mass and heat exchanger networks) 
(refers to: MINLP: global optimization with αBB; MINLP:
heat exchanger network synthesis; MINLP: mass and heat 
exchanger networks; Mixed integer linear programming: 
heat exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks) 

Global optimization: hit and run methods 
(90C26, 90C90) 
(referred to in: Optimal design of composite structures; 
Stochastic global optimization: stopping rules; Stochastic 
global optimization: two-phase methods) 
(refers to: Random search methods; Simulated annealing; 
Stochastic global optimization: stopping rules; Stochastic 
global optimization: two-phase methods) 
Global optimization: interval analysis and balanced interval
arithmetic 
(65K05, 90C30, 90C57, 65G30, 65G40) 
(refers to: Bisection global optimization methods; 
Continuous global optimization: models, algorithms and 
software; Interval analysis: parallel methods for global 
optimization; Interval analysis: subdivision directions in 
interval branch and bound methods; Interval analysis: 
unconstrained and constrained optimization; Interval 
global optimization; Interval linear systems) 

Global optimization in Lennard-Jones and morse 
clusters 
(90C26, 90C90) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Genetic algorithms; Graph 
coloring; Molecular structure determination: convex global 
underestimation; Monte-Carlo simulated annealing in 
protein folding; Multiple minima problem in protein 
folding: αBB global optimization approach; Packet
annealing; Phase problem in X-ray crystallography: Shake 
and bake approach; Simulated annealing methods in 
protein folding) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Genetic algorithms; Global optimization 
in protein folding; Molecular structure determination: 
convex global underestimation; Monte-Carlo simulated 
annealing in protein folding; Multiple minima problem in 
protein folding: αBB global optimization approach; Packet
annealing; Phase problem in X-ray crystallography: Shake 
and bake approach; Protein folding: generalized-ensemble 
algorithms; Simulated annealing; Simulated annealing 
methods in protein folding) 

Global optimization in location problems 
(90C26, 65D18, 90B85) 

global optimization method see: αBB —

global optimization methods see: Bisection — 

Global optimization methods for harmonic retrieval 
(90C26, 65T40, 90C30, 90C90) 
(referred to in: Signal processing with higher order statistics) 
(refers to: Signal processing with higher order statistics) 

Global optimization methods for systems of nonlinear 
equations 
(65H10, 90C26, 90C30) 
(referred to in: αBB algorithm; Continuous global
optimization: applications; Continuous global 
optimization: models, algorithms and software; 
Contraction-mapping; Differential equations and global 
optimization; Direct global optimization algorithm; Global 
optimization based on statistical models; Global 
optimization in batch design under uncertainty; Global 
optimization in binary star astronomy; Global 
optimization in generalized geometric programming; 
Global optimization in phase and chemical reaction 
equilibrium; Global optimization using space filling; 
Gröbner bases for polynomial equations; Interval analysis:
systems of nonlinear equations; Interval global 
optimization; MINLP: branch and bound global 
optimization algorithm; MINLP: global optimization with 
αBB; Nonlinear least squares: Newton-type methods;
Nonlinear systems of equations: application to the 
enclosure of all azeotropes; Smooth nonlinear nonconvex
optimization; Topology of global optimization)

(refers to: αBB algorithm; Continuous global optimization:
applications; Continuous global optimization: models, 
algorithms and software; Contraction-mapping; Convex 
envelopes in optimization problems; D.C. programming; 
Differential equations and global optimization; Direct 
global optimization algorithm; Generalized nonlinear 
complementarity problem; Global optimization based on 
statistical models; Global optimization in batch design 
under uncertainty; Global optimization in binary star 
astronomy; Global optimization in generalized geometric 
programming; Global optimization of heat exchanger 
networks; Global optimization in phase and chemical 
reaction equilibrium; Global optimization using space 
filling; Interval analysis: systems of nonlinear equations; 
Interval global optimization; MINLP: branch and bound 
global optimization algorithm; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: mass and heat exchanger networks; 
Mixed integer linear programming: heat exchanger 
network synthesis; Mixed integer linear programming: 
mass and heat exchanger networks; Nonlinear least squares: 
Newton-type methods; Nonlinear systems of equations: 
application to the enclosure of all azeotropes; Smooth 
nonlinear nonconvex optimization; Topology of global 
optimization; Variational inequalities) 


global optimization model see: continuous — 
global optimization: models, algorithms and software see:
Continuous —


Global optimization in multiplicative programming
(90C26)

(referred to in: Linear programming; Multiparametric linear 
programming; Multiplicative programming; Parametric 
linear programming: cost simplex algorithm) 

(refers to: Complexity classes in optimization; Complexity 
theory; Concave programming; Generalized outer 
approximation; Integer programming: branch and bound 
methods; Linear programming; Multiparametric linear 
programming; Multiplicative programming; Parametric 
linear programming: cost simplex algorithm) 


Global optimization: p-αBB approach
(49M37, 65K10, 90C26, 90C30, 46N10, 47N10)


Global optimization in phase and chemical reaction
equilibrium
(90C26, 90C90)

(referred to in: αBB algorithm; Continuous global
optimization: models, algorithms and software; 
Generalized primal-relaxed dual approach; Global 
optimization: application to phase equilibrium problems; 
Global optimization in batch design under uncertainty; 
Global optimization in generalized geometric 
programming; Global optimization methods for systems of 
nonlinear equations; Interval global optimization; MINLP: 
branch and bound global optimization algorithm; MINLP: 
global optimization with αBB; Optimality criteria for
multiphase chemical equilibrium; Smooth nonlinear 
nonconvex optimization) 

(refers to: αBB algorithm; Continuous global optimization:
models, algorithms and software; Generalized 
primal-relaxed dual approach; Global optimization: 
application to phase equilibrium problems; Global 
optimization in batch design under uncertainty; Global
optimization in generalized geometric programming;
Global optimization methods for systems of nonlinear
equations; Interval global optimization; MINLP: branch
and bound global optimization algorithm; MINLP: global
optimization with αBB; Optimality criteria for multiphase
chemical equilibrium; Smooth nonlinear nonconvex 
optimization) 

Global optimization of planar multilayered dielectric
structures
(65K99)
global optimization problem
[49K99, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65H10, 65K05,
80A10, 90C26, 90C27, 90C30]
(see: Global optimization methods for systems of nonlinear
equations; Neural networks for combinatorial
optimization; Optimality criteria for multiphase chemical
equilibrium; Random search methods; Stochastic global
optimization: two-phase methods)

Global optimization in protein folding
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)
global optimization: sensitivity see: Parametric — 
global optimization: stopping rules see: Stochastic — 

Global optimization: tight convex underestimators
[46N10, 47N10, 49M37, 65K10, 90C26, 90C30]
(see: Global optimization: tight convex underestimators)
global optimization: two-phase methods see: Stochastic — 
global optimization over unbounded domains
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)

Global optimization using space filling
(90C26)
(referred to in: αBB algorithm; Continuous global
optimization: applications; Continuous global
optimization: models, algorithms and software; Differential
equations and global optimization; Direct global
optimization algorithm; Global optimization based on
statistical models; Global optimization in binary star
astronomy; Global optimization methods for systems of
nonlinear equations; Topology of global optimization)
(refers to: αBB algorithm; Continuous global optimization:
applications; Continuous global optimization: models,
algorithms and software; Differential equations and global
optimization; Direct global optimization algorithm; Global
optimization based on statistical models; Global
optimization in binary star astronomy; Global
optimization methods for systems of nonlinear equations;
Topology of global optimization)
global optimization using terrain/funneling methods see:
Multi-scale —

Global optimization in Weber’s problem with attraction and 
repulsion 
(90C26, 90C90)
(referred to in: Combinatorial optimization algorithms in
resource allocation problems; Facilities layout problems;
Facility location with externalities; Facility location
problems with spatial interaction; Facility location with
staircase costs; MINLP: application in facility
location-allocation; Multifacility and restricted location
problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 
(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; MINLP: application in facility 
location-allocation; Multifacility and restricted location 
problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Production-distribution system design problem; 
Resource allocation for epidemic control; Single facility 
location: circle covering problem; Single facility location: 
multi-objective euclidean distance location; Single facility 
location: multi-objective rectilinear distance location; 
Stochastic transportation and location problems; Voronoi 
diagrams in facility location; Warehouse location problem) 

global optimizer 
[90C26] 
(see: Smooth nonlinear nonconvex optimization) 

global optimum 
[65F10, 65F50, 65G20, 65G30, 65G40, 65H10, 65H20, 65K10, 
93-XX] 
(see: Direct search Luus—Jaakola optimization procedure; 
Dynamic programming: optimal control applications; 
Globally convergent homotopy methods; Interval analysis: 
unconstrained and constrained optimization) 

global optimum see: constrained — 

global optimum search 
[49M29, 90C11] 
(see: Generalized benders decomposition) 

global optimum search with enhanced positioning see: Gene 
clustering: A novel decomposition-based clustering 
approach: — 

Global pairwise protein sequence alignment via mixed-integer
linear optimization
[90C10, 92-08]
(see: Global pairwise protein sequence alignment via
mixed-integer linear optimization)

global phase
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: two-phase methods)

global points see: set of ε- —

global round robin request
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques)

global search
[65K05, 90C26, 90C30, 90C59, 90C90]
(see: Global optimization in binary star astronomy; Global
optimization: filled function methods)

global search see: Adaptive — 

global solution 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods) 


global structural stability of SIP(f,h,g)
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)
global structural stability
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)




global supply chain
[90B05, 90B06]
(see: Global supply chain models)

Global supply chain models
(90B05, 90B06)
(referred to in: Inventory management in supply chains;
Nonconvex network flow problems; Operations research
models for supply chain management and design; Piecewise
linear network flow problems)

(refers to: Inventory management in supply chains; 
Nonconvex network flow problems; Operations research 
models for supply chain management and design; Piecewise 
linear network flow problems) 


Global terrain methods 
(see: Global terrain methods) 

global unconstrained optimization problem 
[60G35, 65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Differential equations and global optimization; 
Interval global optimization) 

global underestimation see: convex —; Molecular structure 
determination: convex — 

global underestimator 
[90C11, 90C26] 
(see: Extended cutting plane algorithm) 

global underestimator see: convex — 

globally convergent 
[49M37, 65F10, 65F50, 65H10, 65K05, 65K10, 90C30] 
(see: Globally convergent homotopy methods; Nonlinear 
least squares: trust region methods; Structural 
optimization) 

globally convergent 
[49M37, 65F10, 65F50, 65H10, 65K05, 65K10, 90C30] 
(see: Globally convergent homotopy methods; Structural 
optimization) 

globally convergent algorithm 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 

globally convergent homotopies see: probability-one — 

Globally convergent homotopy methods 
(65F10, 65F50, 65H10, 65K10) 
(referred to in: Parametric optimization: embeddings, path 
following and singularities; Topology of global 
optimization) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; Parametric 
optimization: embeddings, path following and 
singularities; Topology of global optimization) 

globally convergent probability-one homotopy algorithm 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 

globally convexized filled function 
[65K05, 90C26, 90C30, 90C59] 
(see: Global optimization: filled function methods) 


globally optimal 
[90C30] 
(see: Frank-Wolfe algorithm; Simplicial decomposition) 
globally optimal parameter 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
GlobSol 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: intermediate terms) 
GMIN 
[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 
GMIN-αBB
[49M37, 90C11]
(see: Mixed integer nonlinear programming) 
GMIN-αBB algorithm
[90C26]
(see: Bilevel optimization: feasibility test and flexibility 
index) 
GMMVM 
[90C26]
(see: Generalized monotone multivalued maps) 
GMRES 
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least 
squares) 
GMSVM 
[90C26]
(see: Generalized monotone single valued maps) 
GMVIPEP 
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems) 
GN 
[90C35]
(see: Generalized networks) 
go see: cost-to- — 
GO for SNE 
[65H10, 90C26, 90C30]
(see: Global optimization methods for systems of nonlinear 
equations) 
GO4BSA 
[90C26, 90C90]
(see: Global optimization in binary star astronomy) 
goal coordination method 
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics) 
goal-oriented differentiation 
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians) 
goal programming 
[90C27, 90C29]
(see: Decision support systems with multiple criteria; 
Multicriteria sorting methods; Multi-objective 
combinatorial optimization) 
goal programming 
[90C27, 90C29] 
(see: Multicriteria sorting methods; Operations research 
and financial markets) 
Goal Programming see: Lexicographic —
goals 
(see: Planning in the process industry) 
goals see: fuzzy — 
Goerisch bound see: Lehmann- — 
Goerisch method 
[49R50, 65G20, 65G30, 65G40, 65L15, 65L60] 
(see: Eigenvalue enclosures for ordinary differential 
equations) 
Goguen-Gaines implication 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
golden section method 
[90C30]
(see: Convex-simplex algorithm) 
golden section method 
[90C30]
(see: Convex-simplex algorithm; Frank-Wolfe algorithm) 
golden section search 
[90C30]
(see: Nonlinear least squares problems) 
Goldfarb-Idnani active set strategy 
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least 
squares) 
Goldfarb-Idnani method 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 
Goldfarb method see: Forrest- — 
Goldfarb-Shanno method see: Broyden-Fletcher- —
Goldfarb-Shanno quasi-Newton update see: 
Broyden-Fletcher- — 
Goldfarb-Shanno update see: Broyden-Fletcher- — 
Goldfarb-Wang algorithm 
[90C30]
(see: Numerical methods for unary optimization) 
Goldstein conditions 
[49M37, 90C30]
(see: Nonlinear least squares: Newton-type methods; 
Nonlinear least squares problems) 
Gollan CQ 
[49K27, 49K40, 90C30, 90C31]
(see: First order constraint qualifications) 
Gomez #3 problem 
[65K05]
(see: Direct global optimization algorithm) 
Gomory cut see: Chvátal- —
Gomory cutting plane see: Chvátal- —
Gomory cutting plane algorithm 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: cutting plane algorithms) 
Gomory relaxations 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
good edge flips 
[68Q20] 
(see: Optimal triangulations) 


good inclusion function
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)


good subset
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems)


Goodman-Kruskal γ statistic
[62H30, 90C27]
(see: Assignment methods in clustering)


GOP algorithm
[90C26, 90C30, 90C90]
(see: Bilevel programming: global optimization;
Generalized primal-relaxed dual approach)

GOP algorithm
[90C26]
(see: Generalized primal-relaxed dual approach)

GOR
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling;
Optimal planning of offshore oilfield infrastructure)

Gordan transposition theorem
[15A39, 90C05]
(see: Tucker homogeneous systems of linear relations)

GOS
[68T20, 68W10, 90C11, 90C26, 91C20, 92-08, 92C05, 92D10]
(see: Determining the optimal number of clusters)

GOSF
[90C26]
(see: Global optimization using space filling)

gottfried wilhelm see: Leibniz —

Gottfried Wilhelm Leibniz
[01A99]
(see: Leibniz, gottfried wilhelm)


governing equation
[49K20, 49M99, 90C55]
(see: Sequential quadratic programming: interior point
methods for distributed optimal control problems)
government regulation
[90-01, 90B30, 90B50, 91B32, 91B52, 91B74]
(see: Bilevel programming in management)


GPASP
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)
GPP
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

GPRD
[90C26]
(see: Generalized primal-relaxed dual approach)


gradient
[49M29, 58E05, 65H99, 65K05, 65K10, 65K99, 90C06, 90C25,
90C30, 90C31]
(see: Automatic differentiation: calculation of the Hessian;
Automatic differentiation: point and interval; Local
attractors for gradient-related descent iterations; Sensitivity
and stability in NLP; Solving large scale and sparse
semidefinite programs; Topology of global optimization)
gradient
[65K05, 90C30]
(see: Automatic differentiation: calculation of the Hessian)
gradient see: adjoint-based —; Clarke generalized —;
conjugate —; general —; generalized —; v-approximate —; 
projected negative —; projected positive —; reduced —; 
restricted —; sensitivity-based — 

gradient algorithm see: extra- —; projected —; reduced — 

gradient algorithms for unconstrained optimization see: New 
hybrid conjugate —; Performance profiles of conjugate- — 

gradient based approach
[90C30, 90C52, 90C53, 90C55]
(see: Gauss-Newton method: Least squares, relation to
Newton’s method)

gradient based procedures
[90C30]
(see: Powell method)

gradient controller see: feasible —; nonfeasible —

gradient descent
[65K05, 90Cxx]
(see: Symmetric systems of linear equations)

gradient estimation 

[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 

gradient flows see: global — 

gradient free see: univariate — 

gradient-free algorithm 
[90C30] 
(see: Cyclic coordinate method) 

gradient-free minimization 
[90C30] 
(see: Powell method; Rosenbrock method; Sequential 
simplex method) 

gradient-free minimization algorithm
[90C30]
(see: Powell method; Rosenbrock method)

gradient index
[62H30, 90C39]
(see: Dynamic programming in clustering)

gradient of an integral
[90C15]
(see: Derivatives of probability and integral functions:
general theory and examples)

gradient method see: Arrow-Hurwicz —; conditional —; 
conjugate —; incremental —; Shanno conjugate —; Wolfe 
reduced — 

gradient methods 
[90C30, 90C52, 90C53, 90C55] 
(see: Gauss-Newton method: Least squares, relation to 
Newton’s method) 

gradient methods see: Conjugate- —; deflected —; Spectral 
projected — 

gradient of multivariate distribution functions
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15]
(see: Approximation of multivariate probability integrals)

gradient parameter computation see: conjugate — 

gradient of a probability function
[90C15]
(see: Derivatives of probability and integral functions:
general theory and examples)

gradient of a probability function 

[90C15] 

see: Derivatives of probability and integral functions: 

general theory and examples) 


gradient projection
[90C30]
(see: Rosen’s method, global convergence, and Powell’s
conjecture)
gradient projection
[90C30]
(see: Cost approximation algorithms)
gradient projection algorithm
[90C30]
(see: Cost approximation algorithms)
gradient projection method see: Rosen — 
gradient-related descent
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)
gradient-related descent iterations see: Local attractors for —
gradient-related set function
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)
gradient type algorithm see: Craig conjugate — 
gradient vector
[37A35, 49M29, 65K10, 90C05, 90C06, 90C26, 90C39]
(see: Local attractors for gradient-related descent iterations;
Potential reduction methods for linear programming;
Second order optimality conditions for nonlinear
optimization)
gradient vector see: geodesic — 
gradients see: conjugate — 
gradients, Jacobians, and Hessians see: Complexity of — 
Gragg-Kaufmann-Stewart reorthogonalized Gram-Schmidt
algorithm see: Daniel- —
Graham-Hwang conjecture
[90C27]
(see: Steiner tree problems)
graham-Scan
[46N10, 47N10, 49M37, 65K10, 90C26, 90C30]
(see: Global optimization: tight convex underestimators)
grained multicomputer see: coarse —

GRAM
[65K05, 65Y05]
(see: Parallel computing: models)

Gram-Schmidt algorithm see:
Daniel-Gragg-Kaufmann-Stewart reorthogonalized —
Gram-Schmidt orthogonalization
[65Fxx]
(see: Least squares problems)

Gram-Schmidt orthogonalization
[90C30]
(see: Rosenbrock method)

Gram-Schmidt orthogonalization see: classical —; modified —
Gram-Schmidt type iteration
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least
squares)
grand challenge
[90C10, 90C26, 90C30]
(see: Optimization software)
graph
[90C10]
(see: Maximum constraint satisfaction: relaxations and
upper bounds)
graph
[05C60, 05C69, 37B25, 90B80, 90C10, 90C20, 90C27, 90C35, 
90C59, 91A22, 94C15] 
(see: Facilities layout problems; Graph coloring; Graph 
planarization; Replicator dynamics in combinatorial 
optimization) 

graph see: 0-1-0 —; adjacency —; adjacent vertices in a —; 
alignment-distribution —; association —; bipartite —; 
bipartite chordal —; block-clique —; branchpoint of a —; 
cardinality of a —; chordal —; cocomparability —; column 
incidence —; compact —; complement —; 
complementary —; complete —; computational —; 
conflict —; connected —; connectivity —; constraint —; 
convex bipartite —; cycle in a —; cyclically reducible —; 
directed —; distance in a —; edge of a —; elimination —; 
endpoint of a —; Eulerian —; extended alignment —; 
gabriel —; homeomorph of a —; interval —; k-leveled —; 
kernel of a —; length of a path in a —; level in a leveled —; 
level planar —; leveled —; linkpoint of a —; maximum 
weighted planar —; min-max —; minimum fill-in of a —; 
mixed —; order of a —; overlap —; path in a —; 
permutation —; planar —; proper k-leveled —; rank 
determined —; series-parallel —; size of a —;
Smith-Walford one-reducible —; trapezoid —; tree 
association —; unicursal —; vertex of a —; weighted —; 
weighted tree association — 

graph based framework
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem)

graph bipartitioning problem
[90C27, 90C30]
(see: Neural networks for combinatorial optimization)

graph bipartization
[90C35]
(see: Feedback set problems)

graph bipartization problem
[90C35]
(see: Feedback set problems)

graph bipartization problem see: minimum weighted —

graph collapsing auction algorithm
[90B10, 90C27]
(see: Shortest path tree algorithms)

graph collapsing in auction algorithms
[90B10, 90C27]
(see: Shortest path tree algorithms)

Graph coloring 
(90C35) 
(referred to in: Biquadratic assignment problem; Broadcast 
scheduling problem; Feedback set problems; Frequency 
assignment problem; Graph planarization; Greedy 
randomized adaptive search procedures; Heuristics for 
maximum clique and independent set; Integer 
programming; Linear ordering problem; Maximum 
constraint satisfaction: relaxations and upper bounds; 
Multi-objective mixed integer programming; Quadratic 
assignment problem; Quadratic semi-assignment problem; 
Replicator dynamics in combinatorial optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; 
Time-dependent traveling salesman problem) 
(refers to: Adaptive simulated annealing and its application
to protein folding; Branch and price: Integer programming
with column generation; Decomposition techniques for 
MILP: lagrangian relaxation; Feedback set problems; 
Frequency assignment problem; Generalized assignment 
problem; Genetic algorithms; Global optimization in 
Lennard-Jones and morse clusters; Global optimization in 
protein folding; Graph planarization; Greedy randomized 
adaptive search procedures; Heuristics for maximum clique 
and independent set; Integer linear complementary 
problem; Integer programming; Integer programming: 
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; Maximum constraint 
satisfaction: relaxations and upper bounds; Mixed integer 
classification problems; Molecular structure determination: 
convex global underestimation; Monte-Carlo simulated 
annealing in protein folding; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Multiple minima problem in protein 
folding: αBB global optimization approach; Packet
annealing; Parametric mixed integer nonlinear 
optimization; Phase problem in X-ray crystallography: 
Shake and bake approach; Protein folding: 
generalized-ensemble algorithms; Quadratic assignment 
problem; Quadratic semi-assignment problem; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Simulated 
annealing methods in protein folding; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Time-dependent traveling 
salesman problem) 

graph coloring 
[05-XX] 
(see: Frequency assignment problem) 

graph coloring problem 
[05-04, 05-XX, 90C27, 90C35] 
(see: Evolutionary algorithms in combinatorial 
optimization; Frequency assignment problem; Graph 
coloring) 

graph coloring problem see: weighted — 

graph degree
[90C35]
(see: Feedback set problems)

graph drawing
[90C10, 90C27, 94C15]
(see: Graph planarization)

graph drawing see: automatic —

graph isomorphism
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization)

graph isomorphism problem
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization)

graph of a matrix
[90C09, 90C10]
(see: Combinatorial matrix analysis)
graph model see: containment —; intersection —;
proximity — 

graph optimization 
[90C09, 90C10, 90C11, 90C20] 
(see: Linear ordering problem; Matroids; Oriented 
matroids) 

graph packing problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 

Graph partitioning 
(90C27, 05-04) 
(referred to in: Bayesian networks; Beam selection in 
radiotherapy treatment design; Combinatorial matrix 
analysis; Combinatorial optimization algorithms in 
resource allocation problems; Combinatorial optimization 
games; Fractional combinatorial optimization; 
Multi-objective combinatorial optimization; 
Optimization-based visualization; Replicator dynamics in 
combinatorial optimization; Simulated annealing; 
Traveling salesman problem) 
(refers to: Combinatorial matrix analysis; Combinatorial 
optimization games; Fractional combinatorial 
optimization; Genetic algorithms; Multi-objective 
combinatorial optimization; Neural networks for 
combinatorial optimization; Replicator dynamics in 
combinatorial optimization) 

graph partitioning problem 
[05C60, 05C69, 37B25, 90C08, 90C11, 90C20, 90C27, 90C35, 
90C57, 90C59, 91A22] 
(see: Quadratic assignment problem; Replicator dynamics 
in combinatorial optimization) 

graph partitioning problem see: k-way — 

Graph planarization 
(94C15, 90C10, 90C27) 
(referred to in: Biquadratic assignment problem; Feedback 
set problems; Graph coloring; Greedy randomized adaptive 
search procedures; Linear ordering problem; Optimization 
in leveled graphs; Quadratic assignment problem; 
Quadratic semi-assignment problem) 
(refers to: Feedback set problems; Generalized assignment 
problem; Graph coloring; Greedy randomized adaptive 
search procedures; Optimization in leveled graphs; 
Quadratic assignment problem; Quadratic 
semi-assignment problem) 

graph planarization 
[90C10, 90C27, 94C15] 
(see: Graph planarization) 

graph planarization see: branch and bound algorithm for 
weighted — 

graph problem 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 

graph Realization Problem 
[05C50, 15A48, 15A57, 51K05, 52C25, 68Q25, 68U05, 90C22, 
90C25, 90C35] 
(see: Graph realization via semidefinite programming; Matrix 
completion problems) 

Graph realization problem 
(see: Semidefinite programming and the sensor network 
localization problem, SNLP) 


Graph realization via semidefinite programming
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35]
(see: Graph realization via semidefinite programming)

graph reduction in auction algorithms
[90B10, 90C27]
(see: Shortest path tree algorithms)

graph search
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques)

graph theorem see: strong perfect — 

graph theory 
[05-XX] 

(see: Frequency assignment problem) 

graph theory 
[90B80, 90C05, 90C10, 90C27, 90C35] 

(see: Assignment and matching; Facilities layout problems) 
graphic matroid
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C, 90C09, 90C10]
(see: Convex discrete optimization; Matroids)

graphical traveling salesman problem 
[90B20] 

(see: General routing problem) 

graphical traveling salesman problem see: Steiner — 

graphical user interface 
[90C30, 90C90] 

(see: Successive quadratic programming: applications in the 
process industry) 

graphs see: bisectored unit disk —; bounded ratio disk —; 
coin —; disk —; double disk —; empty neighborhood —; 
grid —; isomorphic —; leveled —; matrix patterns and —; 
Optimization in leveled —; Optimization problems in 
unit-disk —; searching state space —; unit-Disk — 

GRASP 
[03B05, 65H20, 65K05, 68P10, 68Q25, 68R05, 68T15, 68T20, 
90-01, 90B40, 90C09, 90C10, 90C27, 90C35, 94C10, 94C15] 
(see: Greedy randomized adaptive search procedures; 
Maximum satisfiability problem) 

GRASP 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Feedback set problems; Greedy randomized adaptive 
search procedures) 

GRASP see: basic —; construction phase in —; global 
convergence of —; local search phase in —; long-term 
memory in —; parallel —; reactive — 

GRASP approach
[90C09, 90C10]
(see: Optimization in boolean classification problems)

GRASP in hybrid metaheuristics
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)

GRASP in industry
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)

GRASP in operations research
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)
graver basis
[05A, 13Cxx, 13Pxx, 14Qxx, 15A, 51M, 52A, 52B, 52C, 62H, 
68Q, 68R, 68U, 68W, 90B, 90C, 90Cxx] 
(see: Convex discrete optimization; Integer programming: 
algebraic methods) 

graver complexity 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 

gravity see: center of — 

gravity location see: center of — 

gravity method see: center of — 

Gray code
[92B05]
(see: Genetic algorithms)

greater see: lexicographically — 

greatest convex minorant 

[41A30, 47A99, 62G07, 62G30, 62J02, 65K05, 65K10, 90C26]
(see: Isotonic regression problems; Lipschitzian operators 
in best approximation by bounded or continuous functions; 
Regression by special functions: algorithms and 
complexity) 

greatest improvement see: rule of — 

greatest K-minorant 
[41A30, 47A99, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 

greatest quasiconvex minorant 
[41A30, 47A99, 62J02, 65K10, 90C26] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions; Regression by special 
functions: algorithms and complexity) 

greedy algorithm 
[68Q25, 68R10, 68W40, 90B06, 90B35, 90C06, 90C09, 90C10, 
90C27, 90C35, 90C39, 90C57, 90C59, 90C60, 90C90, 94C15] 
(see: Combinatorial optimization algorithms in resource 
allocation problems; Domination analysis in combinatorial 
optimization; Graph planarization; Multi-index 
transportation problems; Traveling salesman problem) 

greedy algorithm 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05, 90C35] 
(see: Maximum partition matching; Multi-index 
transportation problems) 

greedy algorithm see: the — 

greedy algorithm for axial MITPs 
[90C35] 
(see: Multi-index transportation problems) 

greedy algorithms 
[05C85] 
(see: Directed tree networks) 

greedy algorithms see: local — 

greedy choice 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 

greedy-choice property 
[90C09, 90C10]
(see: Matroids) 


greedy coloring 
[05-XX] 
(see: Frequency assignment problem) 

greedy coloring heuristic see: sequential — 

greedy construction 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 

greedy-expanding see: algorithm — 

greedy form see: standard — 

greedy function 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 

greedy heuristics 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 

greedy heuristics see: sequential — 

greedy method
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)

greedy randomized adaptive search
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

greedy randomized adaptive search procedure
[90C35]
(see: Feedback set problems)

Greedy randomized adaptive search procedures
(90-01, 90B40, 90C10, 90C35, 90C27, 94C15, 65H20, 65K05)
(referred to in: Biquadratic assignment problem; Broadcast
scheduling problem; Feedback set problems; Graph 
coloring; Graph planarization; Heuristics for maximum 
clique and independent set; Linear ordering problem; 
Maximum cut problem, MAX-CUT; Maximum 
satisfiability problem; Quadratic assignment problem; 
Quadratic semi-assignment problem; Replicator dynamics 
in combinatorial optimization) 
(refers to: Feedback set problems; Generalized assignment 
problem; Graph coloring; Graph planarization; Heuristics 
for maximum clique and independent set; Maximum 
satisfiability problem; Quadratic assignment problem; 
Quadratic semi-assignment problem) 

greedy swap 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 

greedy swaps see: monotone sequence of — 

greedy technique 
[90C09, 90C10] 
(see: Matroids) 

greedy technique 
[90C09, 90C10, 90C11, 90C20] 
(see: Linear ordering problem; Matroids; Oriented 
matroids) 

greedy triangulation
[68Q20]
(see: Optimal triangulations)

grid
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)

grid
[65K05, 65Y05]
(see: Parallel computing: models)
grid see: 2-dimensional —; finer —; menace of the
expanding — 
grid graphs 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
grid point 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
grid search 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization)
grid technique see: uniform — 
grids see: repertory — 
Gröbner bases for polynomial equations
(12D10, 12Y05, 13P10) 
(referred to in: Fundamental theorem of algebra) 
(refers to: Contraction-mapping; Farkas lemma; Farkas 
lemma: generalizations; Fundamental theorem of algebra; 
Global optimization methods for systems of nonlinear 
equations; Interval analysis: systems of nonlinear 
equations; Nonlinear least squares: Newton-type methods; 
Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
Gröbner basis
[12D10, 12Y05, 13Cxx, 13P10, 13Pxx, 14Qxx, 90Cxx]
(see: Gröbner bases for polynomial equations; Integer
programming: algebraic methods)
Gröbner basis see: reduced —; universal —
Gröbner fan
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
Gröbner fiber
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
groove constraint see: tongue-and- — 
ground see: departure- — 
ground arcs 
(see: Railroad locomotive scheduling) 
ground connection arc see: arrival- — 
ground delay problem in air traffic control 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization) 
ground delay programs see: air traffic control and — 
ground-departure connection arc 
(see: Railroad locomotive scheduling) 
ground node see: arrival- — 
ground set
[90C09, 90C10]
(see: Matroids)
ground state
[90C27, 90C90]
(see: Simulated annealing)
groundwater see: rational use of —
groundwater pumping facilities
[90C30, 90C35]
(see: Optimization in water resources)
groundwater resources see: surface and — 




groundwater systems see: surface and — 
group see: braid —; depot —; fundamental —; higher 
homotopy —; Klein 4-element —; quantum —; realization
of an abstract —; symmetric S2x2x2 — 
group classification problem see: g- — 
group classification problem (discriminant problem) see: g- — 
group decision making 
[90B50] 
(see: Optimization and decision support systems) 
group decision support system see: multicriteria — 
group relaxation 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
group relaxation in integer programming 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
group relaxations see: extended — 
group of transformation see: Piaget — 
grouping genetic algorithm 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
groupoid see: nonassociative —; noncommutative — 
groups see: PCR — 
growing technique see: sphere —
growth see: Ramsey rule of economic —; second order — 
growth condition see: linear —; unilateral — 
growth scheme 
[90C26, 90C90] 
(see: Global optimization in Lennard-Jones and morse 
clusters) 
GRP
[90B20]
(see: General routing problem)
GRR
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques)
GRR-M
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques)
Gsat algorithm
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10]
(see: Maximum satisfiability problem)
GSC
[90B05, 90B06]
(see: Global supply chain models)
GSEARCH
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching)
GSIP
[90C31, 90C34]
(see: Semi-infinite programming: second order optimality
conditions)
guarantee
(see: Interval analysis for optimization of dynamical 
systems) 
guarantee see: performance —; worst-case performance — 
guaranteed to be stable without pivoting 
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 
guaranteed bound
[90C26, 90C30]
(see: Bounding derivative ranges)
guaranteed bound to optimality
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: cutting plane algorithms)
guaranteed lower bound
[65K05, 90C11, 90C26]
(see: MINLP: global optimization with αBB)
guardian
[90C05]
(see: Ellipsoid method)
GUB dichotomy
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)
GUHA
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)
guided learning
[90C09, 90C10]
(see: Optimization in boolean classification problems)
guiding process
[68T20, 68T99, 90C27, 90C59]
(see: Capacitated minimum spanning trees; Metaheuristics)
guillotine subdivision
[90C27]
(see: Steiner tree problems)
gVI
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational
inequalities)
GVI see: dual (or Minty) — 
gVNS 
[9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods) 


H 


H-convex function 
[90C26] 
(see: Global optimization: envelope representation) 
h-convex functions 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations) 
H-matrix 
[49M37, 65K10, 90C26, 90C30] 
(see: αBB algorithm)
H-representation 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 


H-subdifferential
[90C26]
(see: Global optimization: envelope representation)
H-subgradient
[90C26]
(see: Global optimization: envelope representation)
Hadamard
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard ov-stationary point
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard codifferentiable function
[65Kxx, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD
functions)
Hadamard conditional lower derivative
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard conditional upper derivative
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard conditionally differentiable function
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard conditionally directionally differentiable function
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard derivative
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization;
Quasidifferentiable optimization: optimality conditions)
Hadamard derivatives in optimization see: Dini and —
Hadamard differentiable
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard differentiable function
[90Cxx]
(see: Quasidifferentiable optimization: optimality
conditions)
Hadamard directional derivatives
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard directionally differentiable
[65K05, 90C30, 90Cxx]
(see: Dini and Hadamard derivatives in optimization;
Minimax: directional differentiability)
Hadamard directionally differentiable function
[90Cxx]
(see: Quasidifferentiable optimization: optimality
conditions)
Hadamard lower directional derivative
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
Hadamard quasidifferentiable function
[90Cxx]
(see: Quasidifferentiable optimization: optimality
conditions)
Hadamard quasidifferential 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality
conditions) 

Hadamard steepest ascent direction
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)

Hadamard steepest descent direction
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)

Hadamard sup-stationary point
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)

Hadamard upper directional derivative
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)

Hahn-Banach linear extension theorem
[90C05, 90C30]
(see: Theorems of the alternative and optimization)

Hahn-Banach theorem
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 90C15, 91A05]
(see: Minimax theorems; Stochastic programming:
nonanticipativity and Lagrange multipliers)

Hahn-Banach theorem
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems)

Hahn-Banach theorem see: Mazur-Orlicz version of the — 

Hahn decomposition see: Jordan- — 

HAMILTON CIRCUIT 
[90C60] 
(see: Complexity classes in optimization) 

Hamilton-Jacobi-Bellman equation 
(49L20, 34H05, 90C39) 
(referred to in: Control vector iteration CVI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: average cost per stage problems; 
Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; High-order maximum principle for abnormal 
extremals; Infinite horizon control and dynamic games; 
MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control; Multiple objective dynamic programming; 
Neuro-dynamic programming; Optimal control of a flexible 
arm; Optimization strategies for dynamic systems; 
Pontryagin maximum principle; Robust control; Robust 
control: Schur stability of polytopes of polynomials;
Semi-infinite programming and control problems; 
Sequential quadratic programming: interior point methods 
for distributed optimal control problems; Suboptimal 
control) 
(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: average cost per stage problems; Dynamic 
programming in clustering; Dynamic programming: 
continuous-time optimal control; Dynamic programming:
discounted problems; Dynamic programming: infinite
horizon problems, overview; Dynamic programming: 
inventory control; Dynamic programming and Newton’s 
method in unconstrained optimal control; Dynamic 
programming: optimal control applications; Dynamic 
programming: stochastic shortest path problems; Dynamic 
programming: undiscounted problems; High-order 
maximum principle for abnormal extremals; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Multiple 
objective dynamic programming; Neuro-dynamic 
programming; Optimal control of a flexible arm; 
Optimization strategies for dynamic systems; Pontryagin 
maximum principle; Robust control; Robust control: Schur
stability of polytopes of polynomials; Semi-infinite 
programming and control problems; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Suboptimal control) 

Hamilton-Jacobi-Bellman equation 
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 

Hamilton-Jacobi-Bellman equation see: derivation of the —;
solution of the —; sufficiency theorem for the — 

Hamilton-Jacobi equation 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 

Hamilton-Jacobi inequality 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 

Hamiltonian 
[49-XX, 49K05, 49K10, 49K15, 49K20, 65L99, 90-XX, 93-XX] 
(see: Boundary condition iteration BCI; Duality in optimal 
control with first order differential equations; Duality 
theory: biduality in nonconvex optimization; Optimization 
strategies for dynamic systems) 

Hamiltonian circuit
[90C60]
(see: Computational complexity theory)

Hamiltonian circuit problem
[90C60]
(see: Computational complexity theory)

Hamiltonian cycle problem
[90C60]
(see: Complexity theory)

Hamiltonian function
[49M29, 65K10, 90C06]
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 

Hamiltonian path cost 
[90C35] 
(see: Multi-index transportation problems) 

Hamiltonian system 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
Hamiltonian system
[49-XX, 49J15, 49K15, 90-XX, 93-XX, 93C10] 
(see: Duality theory: biduality in nonconvex optimization; 
Pontryagin maximum principle) 
Hammerstein equation 
[65H10, 65J15] 
(see: Contraction-mapping) 
Hamming-reactive tabu search 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
hand side changes see: sensitivity analysis with respect to 
right- — 
hand side perturbation model see: right- — 
hand side perturbation problem see: right- — 
hand side problem see: right- — 
hand side simplex algorithm see: parametric right- — 
hand-side uncertainty, duality and applications see: Robust 
linear programming with right- — 
handbook on Semidefinite Programming 
[90C05, 90C22, 90C25, 90C30, 90C51]
(see: Interior point methods for semidefinite programming) 
handcoded derivatives 
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians) 
handcoding 
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians) 
Hansel chain
[90C09]
(see: Inference of monotone Boolean functions)
Hansel chains question-asking strategy see: binary search- —; 
sequential — 
Hansel theorem
[90C09]
(see: Inference of monotone Boolean functions)
hard see: NP- —; strongly NP- — 
hard case of the trust region problem 
[49M37]
(see: Nonlinear least squares: trust region methods) 
hard clustering 
[65K05, 90C26, 90C56, 90C90]
(see: Derivative-free methods for non-smooth optimization; 
Nonsmooth optimization approach to clustering) 
hard language see: F- — 
hard problem see: NP- — 
hard problems see: classification of — 
hardness see: F- — 
Hardy-Littlewood-Polya theorem 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
Harker-Kanzow-Smale function see: Chen- —
harmonic retrieval 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval)
harmonic retrieval see: Global optimization methods for — 
harsh fuzzy product
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
Hasse diagram
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
HCP
[90C60]
(see: Computational complexity theory)
hDY
[49M07, 49M10, 65K, 90C06]
(see: New hybrid conjugate gradient algorithms for
unconstrained optimization)
hDYz
[49M07, 49M10, 65K, 90C06]
(see: New hybrid conjugate gradient algorithms for
unconstrained optimization)
head-body-tail problem
[90B35]
(see: Job-shop scheduling problem)
head(l)
(see: Railroad crew scheduling)
head of operation
[90B35]
(see: Job-shop scheduling problem)
heads see: cluster- —; tape —
HEAP see: binary —; S- —
Hearn algorithm see: Elzinga- — 
heat conduction
[35R70, 47S40, 74B99, 74D99, 74G99, 74H99]
(see: Quasidifferentiable optimization: applications to
thermoelasticity)
heat conduction see: Fourier law of — 
heat exchange see: mass and — 
heat exchanger
[90C90]
(see: MINLP: heat exchanger network synthesis)
heat exchanger network see: mass and —
heat exchanger network superstructure
[90C90]
(see: MINLP: heat exchanger network synthesis)
heat exchanger network synthesis
[90C90]
(see: Mixed integer linear programming: heat exchanger
network synthesis)
heat exchanger network synthesis see: MINLP: —; Mixed
integer linear programming: —
heat exchanger network synthesis without decomposition
[90C90]
(see: MINLP: heat exchanger network synthesis)
heat exchanger networks
[90C26, 90C90]
(see: Global optimization of heat exchanger networks)
heat exchanger networks see: Global optimization of —;
MINLP: mass and —; Mixed integer linear programming:
mass and —
heat and mass exchange network
[90C90]
(see: MINLP: reactive distillation column synthesis)
heat transfer module see: mass/ —
heater
[90B35, 90C11, 90C30]
(see: Robust optimization: mixed-integer linear programs)
heavy ball algorithm
[90C30]
(see: Conjugate-gradient methods)
heavy ball method
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55]
(see: Unconstrained optimization in neural network
training)
Hebden conditions
[49M37]
(see: Nonlinear least squares: trust region methods)
hedging
[90C90, 91B28]
(see: Robust optimization)
helical proteins see: Predictive method for interhelical contacts
in alpha- —
helix see: α- —
hemicontinuous operator
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems)

hemicontinuous operator see: upper — 

hemivariational inequalities
[26B25, 26E25, 35R70, 47S40, 49J35, 49J40, 49J52, 49M05,
49Q10, 49S05, 65K05, 65K99, 70-08, 70-XX, 74A55, 74B99,
74D99, 74G99, 74H99, 74K99, 74M10, 74M15, 74Pxx, 80-XX,
90C26, 90C30, 90C33, 90C90, 90C99, 91A65]
(see: Hemivariational inequalities: applications in
mechanics; Hemivariational inequalities: eigenvalue
problems; Multilevel optimization in mechanics;
Nonconvex energy functions: hemivariational inequalities;
Quasidifferentiable optimization; Quasidifferentiable
optimization: applications; Quasidifferentiable
optimization: applications to thermoelasticity;
Quasidifferentiable optimization: variational formulations;
Quasivariational inequalities; Solving hemivariational
inequalities by nonsmooth optimization methods)
hemivariational inequalities 
[49J40, 49J52, 49S05, 65K05, 74G99, 74H99, 74Pxx, 90C30,
90C33] 
(see: Hemivariational inequalities: applications in 
mechanics; Hemivariational inequalities: eigenvalue 
problems; Solving hemivariational inequalities by 
nonsmooth optimization methods) 
hemivariational inequalities see: multivalued nonmonotone 
laws and —; Nonconvex energy functions: — 
Hemivariational inequalities: applications in mechanics 
(49S05, 74G99, 74H99, 74Pxx, 49J52, 90C33)
(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: eigenvalue problems;
Nonconvex energy functions: hemivariational inequalities;
Nonconvex-nonsmooth calculus of variations;
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


Hemivariational inequalities: eigenvalue problems
(49J52)
(referred to in: αBB algorithm; Eigenvalue enclosures for
ordinary differential equations; Generalized monotonicity: 
applications to variational inequalities and equilibrium 
problems; Hemivariational inequalities: applications in 
mechanics; Interval analysis: eigenvalue bounds of interval 
matrices; Nonconvex energy functions: hemivariational 
inequalities; Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications;
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Semidefinite 
programming and determinant maximization; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 
(refers to: αBB algorithm; Eigenvalue enclosures for
ordinary differential equations; Generalized monotonicity: 
applications to variational inequalities and equilibrium 
problems; Hemivariational inequalities: applications in 
mechanics; Hemivariational inequalities: static problems; 
Interval analysis: eigenvalue bounds of interval matrices; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Semidefinite 
programming and determinant maximization; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

hemivariational inequalities for nonlinear material laws see: 
discretized — 

hemivariational inequalities by nonsmooth optimization 
methods see: Solving — 

Hemivariational inequalities: static problems 
(49J40, 47J20, 49J40, 35A15) 
(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations;
Quasidifferentiable optimization; Quasidifferentiable
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

hemivariational inequality see: abstract —; semicoercive —; 
variational- — 

hemivariational inequality problem see: coercive — 

HEN synthesis
[90C90]
(see: MINLP: heat exchanger network synthesis; Mixed
integer linear programming: heat exchanger network
synthesis)

HEN synthesis using MINLP
[90C90]
(see: MINLP: heat exchanger network synthesis)

hereditary property
[49M37]
(see: Nonlinear least squares: Newton-type methods)

Hermitian interval matrix
[65G20, 65G30, 65G40, 65L99]
(see: Interval analysis: eigenvalue bounds of interval
matrices)

Hermitian matrix see: partial — 

hesitant adaptive search 
[65K05, 90C30] 
(see: Random search methods) 

Hessian 
[49-04, 65H99, 65K05, 65K99, 65Y05, 68N20, 90C30, 90C31] 
(see: Automatic differentiation: calculation of the Hessian; 
Automatic differentiation: calculation of Newton steps; 
Automatic differentiation: parallel computation; 
Automatic differentiation: point and interval; Sensitivity 
and stability in NLP) 

Hessian 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of the Hessian) 


Hessian see: affine-reduced- —; Automatic differentiation: 
calculation of the —; limited-memory affine reduced —; 
lower bounding —; quasi- —; reduced — 


Hessian algorithm see: affine-reduced- — 
Hessian BFGS algorithm see: limited-memory reduced- —
Hessian of a Lagrangian see: reduced — 
Hessian matrices 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: intermediate terms) 
Hessian matrix 
[90C25, 90C30, 90C90] 
(see: Design optimization in computational fluid dynamics;
Successive quadratic programming: full space methods)
Hessian matrix 
[65K05, 90C30]
(see: Automatic differentiation: calculation of Newton 
steps) 
Hessian matrix see: geodesic —; interval —; n —; projected 
Lagrangian — 
Hessian matrix of the Lagrangian function 
[90C25, 90C30, 90C90] 
(see: Successive quadratic programming; Successive 
quadratic programming: applications in distillation 
systems; Successive quadratic programming: full space 
methods) 
Hessian matrix of a Lagrangian function 
[90C30, 90C90] 
(see: Successive quadratic programming; Successive 
quadratic programming: applications in distillation 
systems) 
Hessian matrix of a Lagrangian function see: projected — 
Hessian SQP see: multiplier-free reduced — 
Hessian SQP method see: reduced — 
Hessian test 
[65K05, 65Y05, 65Y10, 65Y20, 68W10] 
(see: Interval analysis: parallel methods for global 
optimization) 
Hessians see: Complexity of gradients, Jacobians, and — 
Hestenes-Stiefel algorithm 
[90C30] 
(see: Conjugate-gradient methods) 
heterogeneity see: subset —; vehicles’ homogeneity/ — 
heterogeneous 
[90C30] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
heterogeneous azeotropes 
[90C30] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
heterogeneous relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations)
heterogeneous relations see: special properties of — 
heteroscedastic model 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods)
heterotonic operator 
[90C33]
(see: Order complementarity) 
Heun method 
[90B15] 
(see: Dynamic traffic networks)

heuristic
[68T20, 68T99, 90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 
90C57, 90C59, 90C60, 90C90] 
(see: Metaheuristics; Traveling salesman problem) 

heuristic see: aggregation —; annealed replication —; 
construction —; decomposition —; enhanced —; 
increasing-degree deletion —; incremental deletion —; 
maximal matching —; min-exchange —; multiple-hub —; 
R-opt —; rounding —; savings —; search —; semigreedy —; 
sequential greedy coloring —; single hub —; Toyoda 
primal —; Whitney savings — 

heuristic algorithm 
[90-XX] 
(see: Survivable networks) 

heuristic algorithms 
[68Q20, 90C90] 
(see: Optimal triangulations; Simulated annealing methods 
in protein folding) 

heuristic approach see: Bayesian — 

heuristic approach to solving CAP on trees
[68Q25, 90B80, 90C05, 90C27]
(see: Communication network assignment problem)

heuristic approaches
[68Q25, 90B80, 90C05, 90C27]
(see: Communication network assignment problem)

heuristic measure
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)

heuristic-metaheuristic algorithms
[90C59]
(see: Heuristic and metaheuristic algorithms for the 
traveling salesman problem) 

Heuristic and metaheuristic algorithms for the traveling 
salesman problem 
(90C59) 
(referred to in: Traveling salesman problem) 

heuristic methods 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 

heuristic optimization method 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 

heuristic parameter, reject index for interval optimization see: 
Algorithmic improvements using a — 

heuristic procedure 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 

Heuristic search 
(68T20, 90B40, 90C47) 
(referred to in: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Load balancing for parallel optimization 
techniques; Maximum cut problem, MAX-CUT; Parallel 
computing: complexity classes; Parallel computing: models; 
Parallel heuristic search; Quadratic assignment problem; 
Stochastic network problems: massively parallel solution) 
(refers to: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Load balancing for parallel optimization 
techniques; Parallel computing: complexity classes; Parallel 
computing: models; Parallel heuristic search; Stochastic
network problems: massively parallel solution) 
heuristic search see: Parallel — 
heuristics 
[90-08, 90B06, 90B35, 90C06, 90C10, 90C11, 90C26, 90C27,
90C30, 90C39, 90C57, 90C59, 90C60, 90C90] 
(see: Chemical process planning; Global optimization based 
on statistical models; Integer programming; Set covering, 
packing and partitioning problems; Traveling salesman 
problem; Variable neighborhood search methods) 
heuristics 
[05-04, 05C60, 05C69, 05C85, 37B25, 62C10, 65K05, 68W01,
90B35, 90B80, 90C10, 90C11, 90C15, 90C20, 90C26, 90C27, 
90C30, 90C35, 90C59, 91A22, 94C15] 
(see: Bayesian global optimization; Evolutionary algorithms 
in combinatorial optimization; Facilities layout problems; 
Facility location with staircase costs; Frank-Wolfe 
algorithm; Graph planarization; Heuristics for maximum 
clique and independent set; Job-shop scheduling problem; 
Replicator dynamics in combinatorial optimization) 
heuristics see: advanced search —; construction —; 
continuous based —; greedy —; history-sensitive —;
improvement —; journal of —; k-interchange —; local 
search —; max-regret-fc and max-regret —; primal —; 
randomized —; sequential —; sequential greedy — 
heuristics for axial MITPs see: hub — 
heuristics of facility location problems with staircase costs 
[90B80, 90C11] 
(see: Facility location with staircase costs) 
Heuristics for maximum clique and independent set 
(90C59, 05C69, 05C85, 68W01) 
(referred to in: Graph coloring; Greedy randomized adaptive 
search procedures; Replicator dynamics in combinatorial 
optimization; Stable set problem: branch & cut algorithms) 
(refers to: Graph coloring; Greedy randomized adaptive 
search procedures; Replicator dynamics in combinatorial 
optimization) 
Hewitt decomposition see: Yosida- — 
Hewitt theorem see: Yosida- — 
Hickey-Cohen triangulation 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 
hidden collision 
(see: Broadcast scheduling problem) 
hidden constraint 
[90C30] 
(see: Duality for semidefinite programming) 
hidden Markov model and Gibbs sampler 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 
hidden Markov models 
(see: Bayesian networks) 
hide-and-seek algorithm 
[90C26, 90C90] 
(see: Global optimization: hit and run methods) 
hierarchical 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
hierarchical clustering see: order constrained — 


hierarchical collection of margins
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)

hierarchical decision making
[90C30, 90C90]
(see: Bilevel programming: global optimization)

hierarchical discrimination
[90C29]
(see: Multicriteria sorting methods)

hierarchical discrimination see: multigroup —

hierarchical optimization
[49-01, 49K10, 49K45, 49M37, 49N10, 90-01, 90B30, 90B50,
90C05, 90C15, 90C20, 90C26, 90C27, 90C30, 90C31, 90C33,
91B32, 91B52, 91B74]
(see: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave
programs; Bilevel programming: introduction, history and
overview; Bilevel programming in management; Stochastic
bilevel programs)

hierarchical programming problem
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)

hierarchy
[90C10, 90C15]
(see: Stochastic integer programs)

hierarchy see: k-level —; lift-and-project —; partition —

hierarchy in a finite set
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

hierarchy process see: analytic —

high see: sufficiently —

high-dimensional integration
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26]
(see: Information-based complexity and information-based
optimization)

high failure of the alpha-beta algorithm
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching)

high-level software
[90C10, 90C26, 90C30]
(see: Optimization software)

high-order approximating cone
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for
abnormal points)

high-order approximating cone see: tangent — 

high-order approximating cone of decrease 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 

high-order approximating cones see: feasible —; tangent — 

high-order approximating curve 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order approximating curve see: feasible —; tangent —
high-order approximating vector see: feasible — 
high-order approximating vector of decrease 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order approximating vectors 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order cones of decrease 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order critical direction 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
high-order directional derivatives 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order feasible cones 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order feasible set 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order generalization of Lyusternik theorem 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for
abnormal points) 
high-order local maximum principle 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
high-order local maximum principle for Lagrangian problems 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal
extremals) 
high-order local minimum condition 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
High-order maximum principle for abnormal extremals 
(49K15, 49K27, 41A10, 47N10) 
(referred to in: Dynamic programming: continuous-time 
optimal control; Hamilton-Jacobi-Bellman equation; 
Pontryagin maximum principle) 
(refers to: Dynamic programming: continuous-time optimal 
control; Hamilton-Jacobi-Bellman equation; High-order 
necessary conditions for optimality for abnormal points; 
Pontryagin maximum principle) 
high-order necessary conditions for optimality 
[41A10, 46N10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals; High-order necessary conditions for optimality 
for abnormal points) 


High-order necessary conditions for optimality for abnormal 
points 
(49K27, 46N10, 41A10, 47N10) 
(referred to in: High-order maximum principle for 
abnormal extremals; Kuhn-Tucker optimality conditions) 
(refers to: Kuhn-Tucker optimality conditions) 
high-order set of decrease 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order tangent approximating vector 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order tangent sets 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
high-order tangent sets 
[41A10, 46N10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals; High-order necessary conditions for optimality 
for abnormal points) 
high performance computing 
[90C10, 90C26, 90C30] 
(see: Optimization software) 
high performance computing system 
[65K05, 65Y05] 
(see: Parallel computing: models) 
high performance Fortran 
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem) 
high point problem 
[49-01, 49K10, 49M37, 90-01, 90C05, 90C27, 91B52] 
(see: Bilevel linear programming) 
high-regular critical direction 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
higher homotopy group 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
higher-order derivatives 
[65K05, 90C30] 
(see: Minimax: directional differentiability) 
higher-order directional derivatives 
[65K05, 90C30] 
(see: Minimax: directional differentiability) 
higher-order spectrum 
[90C26, 90C90] 
(see: Signal processing with higher order statistics) 
higher-order statistics 
[90C26, 90C90] 
(see: Signal processing with higher order statistics) 
higher order statistics see: Signal processing with — 
Hilbert basis

[90C10, 90C46] 

(see: Integer programming duality) 
Hilbert scheme 

[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 

90B, 90C] 

(see: Convex discrete optimization) 

Hilbert space 

[01A99, 90C99] 

(see: Von Neumann, John) 

Hilbert space see: symmetric element in a — 
Hilbert tenth problem 

[90C60] 

(see: Complexity classes in optimization) 
Hilbert’s thirteenth problem 

(01A60, 03B30, 54C70, 68Q17) 

(refers to: History of optimization) 
hillclimbing procedure see: Rosenbrock — 

Hipparcos 

[90C26, 90C90] 

(see: Global optimization in binary star astronomy) 
Hirabayashi see: problem regular in the sense of Kojima- — 
history 

[49Jxx, 91Axx] 

(see: Infinite horizon control and dynamic games) 
history 

[01A99] 

(see: History of optimization) 
history see: Structural optimization: — 

History of optimization 

(01A99)

(referred to in: Carathéodory, Constantine; Duality theory: 

biduality in nonconvex optimization; Duality theory: 

monoduality in convex optimization; Duality theory: 
triduality in global optimization; Hilbert’s thirteenth 
problem; Inequality-constrained nonlinear optimization; 

Kantorovich, Leonid Vitalyevich; Leibniz, Gottfried

Wilhelm; Linear programming; Operations research; Von

Neumann, John) 

(refers to: Carathéodory, Constantine; Carathéodory 

theorem; Inequality-constrained nonlinear optimization; 

Kantorovich, Leonid Vitalyevich; Leibniz, Gottfried

Wilhelm; Linear programming; Operations research; Von

Neumann, John) 
history and overview see: Bilevel programming: 

introduction — 
history of parametric programming 

[90C05, 90C25, 90C29, 90C30, 90C31] 

(see: Nondifferentiable optimization: parametric 

programming) 
history and rounding error estimation see: Automatic 

differentiation: introduction — 
history of a search 

[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 

94C10] 

(see: Maximum satisfiability problem) 
history-sensitive heuristics 

[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 

94C10] 

(see: Maximum satisfiability problem) 


“hit-or-miss” decision problems 
[90C15] 
(see: Stochastic quasigradient methods: applications) 
hit and run 
[65K05, 90C30] 
(see: Random search methods)
hit and run see: artificial centering —; hyperspheres 
direction —; improving — 
hit and run algorithm 
[90C26, 90C90] 
(see: Global optimization: hit and run methods)
hit and run algorithms
[90C26, 90C90]
(see: Global optimization: hit and run methods)
hit and run generator
[90C26, 90C90]
(see: Global optimization: hit and run methods)
hit and run methods 
[90C26, 90C29, 90C90] 
(see: Global optimization: hit and run methods; Optimal
design of composite structures) 
hit and run methods see: Global optimization: — 
hitting cycle problem 
[90C35] 
(see: Feedback set problems) 
HJB equation
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 
HMM see: coupled —; factorial — 
hoc networks see: Optimization in ad — 
Hoffman inequalities see: Gale- — 
Hölder conditions see: uniform —
Hölder continuity
[90C11, 90C15, 90C31] 
(see: Stochastic integer programming: continuity, stability, 
rates of convergence) 
hole-cut see: odd- — 
home terminals 
(see: Railroad crew scheduling)
homeomorph of a graph 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems)
homogeneity/heterogeneity see: vehicles’ — 
homogeneous 
[90C05, 90C30] 
(see: Homogeneous selfdual methods for linear
programming; Nonlinear systems of equations: application 
to the enclosure of all azeotropes) 
homogeneous see: increasing and positively —; plus —; 
positively — 
homogeneous algorithm 
[90C05] 
(see: Homogeneous selfdual methods for linear 
programming) 
homogeneous beliefs 
[91B28] 
(see: Portfolio selection: Markowitz mean-variance model)


homogeneous of degree zero

[91B50] 

(see: Walrasian price equilibrium) 
homogeneous dual systems 

[15A39, 90C05] 

(see: Tucker homogeneous systems of linear relations) 
homogeneous function see: positively — 
homogeneous functions on topological vector spaces see: 

Increasing and positively — 
homogeneous process see: simple — 
homogeneous relation 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,

91B06, 92C60] 

(see: Boolean and fuzzy relations) 
homogeneous relations see: special properties of — 
Homogeneous selfdual methods for linear programming 

(90C05) 

(referred to in: Entropy optimization: interior point 

methods; Linear programming: interior point methods; 

Linear programming: Karmarkar projective algorithm;

Potential reduction methods for linear programming; 

Sequential quadratic programming: interior point methods 

for distributed optimal control problems; Successive 

quadratic programming: solution by active sets and interior 
point methods) 

(refers to: Entropy optimization: interior point methods; 

Interior point methods for semidefinite programming; 

Linear programming: interior point methods; Linear 

programming: Karmarkar projective algorithm; Potential

reduction methods for linear programming; Sequential 
quadratic programming: interior point methods for 
distributed optimal control problems; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

homogeneous and selfdual model 

[90C05] 

(see: Homogeneous selfdual methods for linear 

programming) 
homogeneous systems 

[15A39, 90C05] 

(see: Tucker homogeneous systems of linear relations) 
homogeneous systems of linear relations see: Tucker — 
homogenous 
(see: Approximations to robust conic optimization
problems) 
homogenous cones 
(see: Approximations to robust conic optimization
problems) 
homomorphism 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
homomorphism see: strong —; very strong —; weak — 
homomorphisms 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,

91B06, 92C60] 

(see: Boolean and fuzzy relations) 
homoscedastic model 

[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 

90C90] 

(see: Disease diagnosis: optimization-based methods) 


homotopic functions 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 
homotopic maps 
[01A50, 01A55, 01A60] 
(see: Fundamental theorem of algebra) 
homotopic method 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
homotopies see: optimization —; probability-one globally 
convergent — 
homotopy 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 
homotopy 
[01A50, 01A55, 01A60, 65F10, 65F50, 65H10, 65K10] 
(see: Fundamental theorem of algebra; Globally convergent 
homotopy methods) 
homotopy see: adaptive —; probability-one — 
homotopy algorithm see: globally convergent 
probability-one — 
homotopy continuation 
[65C20, 65G20, 65G30, 65G40, 65H20, 90C30, 90C90] 
(see: Interval analysis: application to chemical engineering 
design problems; Nonlinear systems of equations: 
application to the enclosure of all azeotropes) 
homotopy continuation method 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 
homotopy group see: higher — 
homotopy method 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following 
and singularities) 
homotopy methods 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 
homotopy methods see: Globally convergent —; software 
for — 
homotopy Newton method 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
homotopy property 
[90C33]
(see: Topological methods in complementarity theory) 
homotopy type 
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements) 
HOMPACK90
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 
Hooke law
[35A15, 47J20, 49J40]
(see: Hemivariational inequalities: static problems) 
hop neighboring stations see: one- — 
hop neighbors see: two- — 
Hopcroft-Tarjan planarity-testing algorithm 
[90C10, 90C27, 94C15] 
(see: Graph planarization) 
Hopf equations see: Wiener- — 
hopping problem see: airplane — 
horizon see: decision making with rolling —; finite —; 
infinite —; infinite time —; planning — 
horizon control and dynamic games see: Infinite — 
horizon game see: nonzero-sum infinite — 
horizon problem see: discounted infinite —; total cost 
infinite — 
horizon problems see: infinite — 
horizon problems, overview see: Dynamic programming: 
infinite — 
horizontal linear complementarity problem 
[90C33]
(see: Linear complementarity problem) 
Horn formulas 
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer 
programming) 
HOS 
[90C26, 90C90]
(see: Signal processing with higher order statistics) 
hot spot 
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
hot spots
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
Householder transformation 
[15A23, 49M37, 65F05, 65F20, 65F22, 65F25]
(see: Nonlinear least squares: Newton-type methods; QR 
factorization) 
Householder transformations see: QR factorization using — 
HPF 
[05-02, 05-04, 15A04, 15A06, 68U99] 
(see: Alignment problem) 
hPS 
(see: Short-term scheduling of batch processes with 
resources) 
Huang algorithm 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares; ABS algorithms for optimization) 
Huang algorithm 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 


Huang algorithm see: modified — 
Huang method 
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least 
squares) 
hub 
[90C35]
(see: Multi-index transportation problems) 
hub heuristic see: multiple- —; single — 
hub heuristics for axial MITPs 
[90C35]
(see: Multi-index transportation problems) 
Huber M-estimator 
[65D10, 65K05]
(see: Overdetermined systems of linear equations) 
hull see: convex —; lower convex —; normal —; reverse 
normal — 
hull consistency 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
hull disjunctions see: Convex — 
hull problem see: convex — 
Hull relaxation 
(see: Optimal planning of offshore oilfield infrastructure)
human rationality assumption 
[90C29] 
(see: Estimating data for multicriteria decision making
problems: optimization techniques) 
human rationality factor 
[90C29] 
(see: Estimating data for multicriteria decision making
problems: optimization techniques) 
Hungarian algorithm 
[90C05, 90C10, 90C27, 90C35] 
(see: Assignment and matching)
Hungarian method 
[90C10, 90C35] 
(see: Bi-objective assignment problem)
Hunter-Worsley bounds 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals)
Hunter-Worsley upper bound 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
Hurwicz gradient method see: Arrow- — 
huS 
[49M07, 49M10, 65K, 90C06] 
(see: New hybrid conjugate gradient algorithms for 
unconstrained optimization) 
HVI 
[35A15, 47J20, 49J40]
(see: Hemivariational inequalities: static problems) 
Hwang conjecture see: Graham- — 
Hwang minimax theorem see: Du- — 
hybrid algorithm 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
hybrid algorithm 
[90C15, 90C30, 90C99] 
(see: SSC minimization algorithms) 
hybrid branch and bound and outer approximation
[49M37, 90C11] 
(see: Mixed integer nonlinear programming) 
hybrid conjugate gradient algorithms for unconstrained 
optimization see: New — 
hybrid metaheuristic 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 
hybrid metaheuristics see: GRASP in — 
hybrid methods 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
hybrid methods see: Mixed integer programming/constraint 
programming — 
hybrid model 
[90C15] 
(see: Static stochastic programming models) 
hybrid NP methods 
[90C11, 90C59] 
(see: Nested partitions optimization) 
hybrid orthonormalization 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and
performances) 
hydro-generation 
[90C35] 
(see: Multicommodity flow problems)
hydro plants 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
hydro-reservoir 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
hydrological exogenous inflow 
[90C30, 90C35] 
(see: Optimization in water resources) 
hydrological exogenous inflow and demand see: water 
resources planning under uncertainty on — 
hydropower nodes see: on-the-river — 
hyperbolic 0-1 programming problem 
(see: Fractional zero-one programming) 
(hyperbolic) 0-1 programming problem see: single-ratio 
fractional — 
hyperbolic programming 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules) 
hypercube 
[65K05, 65Y05] 
(see: Parallel computing: models) 
hypercube see: d-dimensional — 
hyperdifferentiable 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
hyperdifferentiable function 
[49J52, 65K99, 70-08, 90C25]


(see: Quasidifferentiable optimization: codifferentiable 
functions) 
hyperdifferential 
[49J52, 65K99, 65Kxx, 70-08, 90C25, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for 
hypodifferentiable functions; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: codifferentiable 
functions) 
hyperdifferential see: second order — 
hypergeometric distribution 
[90C15] 
(see: Logconcavity of discrete distributions) 
hypergeometric integral 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
hyperglycemia 
(see: Model based control for drug delivery systems) 
hypergraph see: subtree — 
hypergraph q-coloring 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
hyperplane see: arrangement of —; separating —; support —; 
tangent — 
hyperplane arrangement 
[90C09, 90C10] 
(see: Oriented matroids) 
hyperplane arrangement 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements; Hyperplane arrangements 
in optimization) 
Hyperplane arrangements 
(52C35, 05B35, 57N65, 20F36, 20F55) 
(referred to in: Hyperplane arrangements in optimization) 
(refers to: Hyperplane arrangements in optimization) 
Hyperplane arrangements in optimization 
(05B35, 20F36, 20F55, 52C35, 57N65) 
(referred to in: Hyperplane arrangements) 
(refers to: Hyperplane arrangements) 
hyperplanes see: Boolean arrangement of —; braid 
arrangement of —; cohomology of an arrangement of —; 
complement of an arrangement of —; dependent —; 
divisor of an arrangement of —; free arrangement of —; 
general position of —; reflection arrangement of —; 
singularity of an arrangement of — 
hyperspheres direction hit and run 
[90C05, 90C20]
(see: Redundancy in nonlinear programs) 
hyperspherical direction 
[90C26, 90C90]
(see: Global optimization: hit and run methods) 
hypervoxels 
[90C90]
(see: Optimization in medical imaging) 
hypodifferentiability 
[65K05, 90C30]
(see: Minimax: directional differentiability) 
hypodifferentiable 

[49J52, 65K99, 65Kxx, 90Cxx]

(see: Quasidifferentiable optimization: algorithms for 

hypodifferentiable functions; Quasidifferentiable 

optimization: algorithms for QD functions) 
hypodifferentiable function 

[49J52, 65K99, 70-08, 90C25]

(see: Quasidifferentiable optimization: codifferentiable 

functions) 
hypodifferentiable functions see: Quasidifferentiable 

optimization: algorithms for — 
hypodifferentiable optimization 

[49J52, 65K99] 

(see: Quasidifferentiable optimization: algorithms for 

hypodifferentiable functions) 
hypodifferential 

[49J52, 65K05, 65K99, 65Kxx, 70-08, 90C25, 90C30, 90Cxx]

(see: Nondifferentiable optimization: minimax problems; 

Quasidifferentiable optimization: algorithms for 

hypodifferentiable functions; Quasidifferentiable 

optimization: algorithms for QD functions; 

Quasidifferentiable optimization: codifferentiable 

functions) 
hypodifferential 

[65K05, 90C30] 

(see: Nondifferentiable optimization: minimax problems) 
hypodifferential see: kth order —; second —; second order — 
hypodifferential descent 

[65K05, 90C30] 

(see: Nondifferentiable optimization: minimax problems) 
hypodifferential descent see: method of — 
hypoglycemia 

(see: Model based control for drug delivery systems) 
hypothesis see: financial leverage —; inefficient 

management — 
hypothesis formation see: automated — 
hysteresis 

[49J52]

(see: Hemivariational inequalities: eigenvalue problems) 
hZaw 

[49M07, 49M10, 65K, 90C06] 

(see: New hybrid conjugate gradient algorithms for 

unconstrained optimization) 
hZw 

[49M07, 49M10, 65K, 90C06] 

(see: New hybrid conjugate gradient algorithms for 

unconstrained optimization) 


I see: algorithm partition-matching- —; generalization of
ELECTRE —

I requirement see: Type —

IA 
[90C26] 
(see: Cutting plane methods for global optimization) 


IBC 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based 
optimization) 
IDA* 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 
ideal see: admissible pair of a monomial —; arithmetic degree 
of a monomial —; initial —; lattice —; localization of an —; 
monomial —; polynomial —; standard pair decomposition 
of a monomial —; standard pair of a monomial —; 
Stanley-Reisner —; toric — 
ideal and nonideal phase equilibrium equations 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in 
distillation systems) 
ideal part 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in 
distillation systems) 
identical machines 
[68Q99] 
(see: Branch and price: Integer programming with column 
generation) 
identification 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
identification see: Mixed 0-1 linear programming approach for 
DNA transcription element —; model —; parameter — 
Identification methods for reaction kinetics and transport 
(34A55, 35R30, 62G05, 62G08, 62P30, 62P10, 62J02, 62K05, 
76R50, 80A23, 80A30, 80A20) 
identification problem see: parameter — 
identification via mixed-integer optimization see: Peptide — 
identities see: primitive partition — 
identity transformation 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
idle time 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
Idnani active set strategy see: Goldfarb- — 
Idnani method see: Goldfarb- — 
IDP 
[90C30] 
(see: Suboptimal control) 
IDP 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
IEQNO 
[49M37, 65K05, 90C30] 
(see: Inequality-constrained nonlinear optimization)
if-when scenarios see: what- — 
IFS 
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
IHDG 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
II rule see: Polyak —

IIP 

[90C05]

(see: Linear programming: interior point methods) 

IL 

[90C30, 90C33]

(see: Implicit Lagrangian)

ill-conditioned coefficient matrix 

[90C25, 90C29, 90C30, 90C31]

(see: Bilevel programming: optimality conditions and 

duality) 

ill-conditioned matrix 

[15-XX, 65-XX, 90-XX]

(see: Cholesky factorization) 

ill-conditioned problem 

[90C22, 90C25, 90C31]

(see: Semidefinite programming: optimality conditions and 

stability) 

ill-posed 

[90C05, 90C25, 90C29, 90C30, 90C31]

(see: Nondifferentiable optimization: parametric 

programming) 

ill-posed problem 

[49J40, 49M30, 65K05, 65M30, 65M32]

(see: Ill-posed variational problems) 

ill-posed problems 

[90C30] 

(see: Cost approximation algorithms)

ill-posed variational problem 

[49J40, 49M30, 65K05, 65M30, 65M32] 

(see: Ill-posed variational problems)

Ill-posed variational problems 

(65K05, 65M30, 65M32, 49M30, 49J40) 

(referred to in: Sensitivity and stability in NLP)

(refers to: Sensitivity and stability in NLP)

IM see: multistage —; single-stage — 

IM in SC 

[90B50] 

(see: Inventory management in supply chains)

IMA Journal of Management Mathematics

[9008, 90C26, 90C27, 90C59] 

(see: Variable neighborhood search methods) 

image 

[90C05, 90C30] 

(see: Theorems of the alternative and optimization) 

image problem 

[90C30] 

(see: Image space approach to optimization)

image processing see: optimization in medical — 

image reconstruction 

[94A08, 94A17] 

(see: Maximum entropy principle: image reconstruction)

image reconstruction see: Entropy optimization for —; 
finite-dimensional models for entropy optimization for —; 
Maximum entropy principle: —; vector-space models for 
entropy optimization for — 

image reconstruction from projection data 
[94A08, 94A17]
(see: Maximum entropy principle: image reconstruction) 

image reconstruction from projection data see: feasibility 
approach to —; optimization approach to — 


image space 

[90C05, 90C30] 

(see: Theorems of the alternative and optimization) 
image space 

[90C05, 90C30] 

(see: Image space approach to optimization; Theorems of 

the alternative and optimization) 
Image space approach to optimization 

(90C30) 

(referred to in: Theorems of the alternative and 

optimization; Vector optimization) 

(refers to: Theorems of the alternative and optimization; 

Vector optimization) 
images 

[94A08, 94A17] 

(see: Maximum entropy principle: image reconstruction) 
imaging see: medical —; Optimization in medical — 
imbalance 
[90C35]

(see: Minimum cost flow problem) 
immediate selection 

[90B35]

(see: Job-shop scheduling problem) 
imperative programming paradigm 

[90C10, 90C30]

(see: Modeling languages in optimization: a new paradigm) 
imperfect competition 

[91B06, 91B60]

(see: Oligopolistic market equilibrium) 
implementation see: programmed —; PVM-based — 
implementation of the auction algorithm see: synchronous —; 

totally asynchronous — 
implementation example see: optimization computer — 
implementation of optimization see: Computer — 
implementations see: MPI-based —
implication 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 
implication see: Goguen-Gaines —; Kleene-Dienes —;

Łukasiewicz —; many-valued logic —; Reichenbach —
implication operator 

[03B50, 03B52, 03C80, 03E72, 47S40, 62F30, 62Gxx, 68T15,

68T27, 68T30, 68T35, 68Uxx, 90Bxx, 91Axx, 91B06, 92C60] 

(see: Boolean and fuzzy relations; Checklist paradigm 

semantics for fuzzy logics; Finite complete systems of 

many-valued logic algebras) 
implications see: logical — 
implicit Choleski algorithm 

[65K05, 65K10] 

(see: ABS algorithms for linear equations and linear least 

squares) 
implicit complementarity problem 

[90C33] 

(see: Equivalence between nonlinear complementarity 

problem and fixed point problem; Topological methods in 

complementarity theory) 
implicit enumeration

[90C35] 

(see: Graph coloring) 
implicit equality constraint 

[90C05, 90C20] 

(see: Redundancy in nonlinear programs) 
implicit function approach 

[90C26, 90C31, 91A65] 

(see: Bilevel programming: implicit function approach) 
implicit function approach see: Bilevel programming: — 
implicit function approach to bilevel programming 

[90C26, 90C31, 91A65] 

(see: Bilevel programming: implicit function approach) 
implicit function theorem 

[90C30, 90C31] 

(see: Bounds and solution vector estimates for parametric 

NLPs; Generalized total least squares)
implicit general order complementarity problem 

[90C33] 

(see: Order complementarity) 

Implicit Lagrangian

(90C33, 90C30) 

(referred to in: Kuhn-Tucker optimality conditions) 

(refers to: Kuhn-Tucker optimality conditions; Lagrangian 

duality: basics; Variational inequalities)
implicit Lagrangian see: restricted —; unconstrained — 
implicit LU algorithm 
[65K05, 65K10]

(see: ABS algorithms for linear equations and linear least 
squares; ABS algorithms for optimization) 

Implicit LU algorithm 

[65K05, 65K10]

(see: ABS algorithms for linear equations and linear least 
squares) 

implicit LX algorithm 

[65K05, 65K10]

(see: ABS algorithms for linear equations and linear least 
squares; ABS algorithms for optimization) 

Implicit LX algorithm 

[65K05, 65K10]

(see: ABS algorithms for linear equations and linear least 
squares) 

implicit QR algorithm 

[65K05, 65K10]

(see: ABS algorithms for linear equations and linear least 
squares) 

implicit restarted Lanczos method 

[90C30]

(see: Large scale trust region problems) 

implicit utility function 

[90C11, 90C29]

(see: Multi-objective mixed integer programming) 
implicit variational inequalities and quasivariational inequalities 
[49J40, 49Q10, 70-08, 74K99, 74Pxx]

(see: Quasivariational inequalities) 

implicit variational problems 

[49J40, 49Q10, 70-08, 74K99, 74Pxx]

(see: Quasivariational inequalities) 

implied constraint 

[90B10, 90B15, 90C15, 90C35]

(see: Preprocessing in stochastic programming) 


implied inequality 

[15A39, 90C05]

(see: Linear optimization: theorems of the alternative) 
import model 

[90C35] 

(see: Multicommodity flow problems) 

importance sampling 

[62F12, 65C05, 65K05, 90C15, 90C31] 

(see: Monte-Carlo simulations for stochastic optimization)
impossible pairs constrained path problem 

[68Q25, 90C60] 

(see: NP-complete problems and proof methodology) 
imprecise information 

[90C29, 90C70] 

(see: Fuzzy multi-objective linear programming) 
improper 

[65K05, 90C26, 90C30] 

(see: Monotonic optimization)

improper vertex 

[90C26] 

(see: Cutting plane methods for global optimization)
improved piecewise linearization 

[90B80, 90C11] 

(see: Facility location with staircase costs) 

improved procedure 


[90B80] 
(see: Facilities layout problems)
improvement see: best —; dual bound- —; dual cut- —; 
local —; primal bound- —; primal cut- —; rule of greatest — 


improvement heuristics 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
improvement of KKT points see: successive — 
improvement methods 
[90B06, 90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem; Vehicle routing) 
improvements using a heuristic parameter, reject index for 
interval optimization see: Algorithmic — 
improving 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
improving feasible direction 
[90C30] 
(see: Convex-simplex algorithm) 
improving hit and run 
[65K05, 90C26, 90C29, 90C30, 90C90] 
(see: Global optimization: hit and run methods; Optimal 
design of composite structures; Random search methods) 
improving hit and run 
[90C26, 90C29, 90C90] 
(see: Global optimization: hit and run methods; Optimal 
design of composite structures) 
impulse perturbations see: Vasicek model with — 
imputation 
[90C27, 90C60, 91A12]
(see: Combinatorial optimization games) 
IMSL subroutine library
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15]
(see: Approximation of multivariate probability integrals) 
imvex 
[90C26]
(see: Invexity and its applications) 
in-company policies 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 
(in logic) see: literal — 
inaccuracy 
[90C05, 90C25] 
(see: Young programming) 
inaccuracy in observations 
[65D10, 65K05] 
(see: Overdetermined systems of linear equations) 
inactive 
[90C60] 
(see: Complexity of degeneracy)
inactive constraints 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear
optimization) 
inactive constraints 
[90C60] 
(see: Complexity of degeneracy) 
inadmissible arc 
[90C35] 
(see: Maximum flow problem) 
incidence 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements in optimization)
incidence graph see: column — 
incidence matrix 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
incidence matrix see: node-arc — 
incidence in a network 
[90C35] 
(see: Minimum cost flow problem) 
incidence vector 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovász number)
incident 
[90C35] 
(see: Graph coloring)
incident faces 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements in optimization) 
inclusion see: degree of —; variational — 
inclusion function 
[65G20, 65G30, 65G40, 65K05, 65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval; 
Interval global optimization) 
inclusion function see: good —; isotone —; order of an — 
Inclusion Method 
[49R50, 65G20, 65G30, 65G40, 65L15, 65L60] 
(see: Eigenvalue enclosures for ordinary differential 
equations) 
inclusion operator see: fuzzy set- —
inclusion principle
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
inclusion principle of machine interval arithmetic
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
inclusion of relations
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
incoming arc
[90C35]
(see: Minimum cost flow problem)
incomparability
[90C29]
(see: Preference modeling)
incomplete information
[68Q25, 90C09, 90C10, 91B28]
(see: Competitive ratio for portfolio management; 
Optimization in boolean classification problems) 
incomplete information 
[62C20, 90C15] 
(see: Stochastic programming: minimax approach) 
incomplete judgments 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
incomplete knowledge of a probability distribution 
[62C20, 90C15] 
(see: Stochastic programming: minimax approach) 
incomplete methods 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 
incomplete solution 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
incomplete state feedback 
[90C30] 
(see: Suboptimal control) 
incorporation of biological constraints 
(see: Selection of maximally informative genes) 
increase see: dual price — 
increasing 
[65K05, 90B10, 90C26, 90C30] 
(see: Monotonic optimization; Piecewise linear network 
flow problems) 
increasing see: concave —; linear — 
Increasing and convex-along-rays functions on topological 
vector spaces 
(26A48, 52A07, 26A51) 
increasing-degree deletion heuristic 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
increasing function 
[90C26] 
(see: Cutting plane methods for global optimization) 
increasing function see: coordinatewise — 
increasing and positively homogeneous 
[26A48, 26A51, 52A07] 
(see: Increasing and convex-along-rays functions on
topological vector spaces) 
Increasing and positively homogeneous functions on 
topological vector spaces 
(26A48, 52A07, 26A51) 
increasing utility function see: coordinatewise — 
increment see: bidding — 
incremental algorithm 
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
incremental deletion heuristic 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
incremental gradient method 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
incremental-iterative solution algorithm 
[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90] 
(see: Quasidifferentiable optimization: stability of dynamic 
systems) 
incremental negamax algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
incremental strategy for model structure refinement 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
incumbent objective value 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
incumbent solution 
[90C06, 90C15, 90C25] 
(see: Concave programming; Stabilization of cutting plane 
algorithms for stochastic linear programming problems) 
incumbent value 
[90C10, 90C29] 
(see: Multi-objective integer linear programming) 
indecomposable matrix see: fully — 
indefinite 
[90C60]
(see: Complexity theory: quadratic programming)
indefinite integral
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians)
indefinite quadratic problems
[65K05, 90C20]
(see: Quadratic programming with bound constraints)
indefinite quadratic programming
[90C11, 90C25]
(see: Concave programming; MINLP: branch and bound 
methods) 
indefinite quadratic programs 
[90C25, 90C30] 
(see: Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods)
independence
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35]
(see: Lovasz number)
independence see: global —; linear —; local —
independence constraint qualification see: linear —
independence constraint qualification (LICQ) see: linear —
independence CQ see: linear —
independence number
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)
independence system
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)
independency constraint qualification see: linear —
independent
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
independent see: linearly —; model —; positively linearly —
independent dominating set
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)
independent set
[05C15, 05C17, 05C35, 05C62, 05C69, 05C85, 68Q25, 68R10,
68W01, 68W40, 90C09, 90C10, 90C22, 90C27, 90C35, 90C59]
(see: Domination analysis in combinatorial optimization;
Graph coloring; Heuristics for maximum clique and
independent set; Lovasz number; Matroids; Optimization
problems in unit-disk graphs)
independent set
[05C69, 05C85, 68W01, 90C59]
(see: Heuristics for maximum clique and independent set)
Independent Set see: Heuristics for maximum clique and —;
maximal —; maximum —; maximum weighted —
Independent Set Problem see: maximum —
independent sets
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)
independent sets see: maximum weight —
independent of solver formats
(see: Planning in the process industry)
independent subset
[90C09, 90C10]
(see: Matroids)
independent system
[90C09, 90C10]
(see: Matroids)
independent variables
[65D25, 65G20, 65G30, 65G40, 65H20, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians;
Interval analysis: intermediate terms)
indeterminate box
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
index see: active —; augmented performance —; Bilevel
optimization: feasibility test and flexibility —; blending —;
consistency —; cost —; descending —; flexibility —;
gradient —; least- —; linear —; linear co- —; merit —;
Morse —; performance —; quadratic —; quadratic co- —
index anticycling rules see: Least- —
index approach
[90C26]
(see: Global optimization using space filling)
index assignment problems see: multi- —; three- —
index-based 
(see: Planning in the process industry) 
index of a constraint violating point 
[90C26] 
(see: Global optimization using space filling) 
index criss-cross method see: least- — 
index for interval optimization see: Algorithmic improvements 
using a heuristic parameter, reject — 
index market model see: Sharpe single — 
index pivoting method see: least- — 
index pivoting rule see: Bland least — 
index refinement see: Murty least- — 
index rule see: smallest — 
index set
[90C34]
(see: Semi-infinite programming: approximation methods)
index set see: active —; essentially active —; SIP —; species —
index transportation problem see: axial multi- —; integer
multi- —; k- —; multi- —; planar multi- —; symmetric
multi- —; three- —
index transportation problems see: Multi- —
index tree
[34E05, 90C27]
(see: Asymptotic properties of random multidimensional
assignment problem)
indexing terms 
[90C09, 90C10]
(see: Optimization in classifying text documents)
indexing vocabulary 
[90C09, 90C10]
(see: Optimization in classifying text documents) 
indexing vocabulary see: optimal — 
iNDF 
[65K10, 90C33, 90C51] 
(see: Generalizations of interior point methods for the
linear complementarity problem) 
indicator 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
indicator function 
[49J40, 49J52, 49Q10, 49S05, 62F12, 65C05, 65K05, 70-08,
70-XX, 74G99, 74H99, 74K99, 74Pxx, 80-XX, 90C15, 90C31,
90C33]
(see: Hemivariational inequalities: applications in
mechanics; Monte-Carlo simulations for stochastic
optimization; Nonconvex energy functions:
hemivariational inequalities; Quasivariational inequalities;
Stochastic quasigradient methods: applications)
indicator function see: expectation of an — 
indices 
(see: Planning in the process industry; Short-term 
scheduling under uncertainty: sensitivity analysis) 
indices see: Morse —
indifference
[90C29]
(see: Preference modeling)
indifference threshold
[90-XX]
(see: Outranking methods)
indirect methods
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)
indiscernibility
[03E70, 03H05, 91B16]
(see: Alternative set theory)
indiscernibility see: fundamental —; relation of —
individual
[92B05]
(see: Genetic algorithms)
individual probabilistic constraints
[90C15]
(see: Static stochastic programming models)
individual rationality
[90C27, 90C60, 91A12]
(see: Combinatorial optimization games)
individual software routine
[90C10, 90C26, 90C30]
(see: Optimization software) 
individuals 
(see: Broadcast scheduling problem) 
induced region 
[49M37, 90C26, 91A10] 
(see: Bilevel programming) 
induced subgraph 
[05C50, 05C60, 05C69, 15A48, 15A57, 37B25, 90C20, 90C25, 
90C27, 90C35, 90C59, 91A22] 
(see: Matrix completion problems; Replicator dynamics in 
combinatorial optimization) 
induced by a vertex subset see: subgraph — 
inducible region 
[49-01, 49K10, 49M37, 90-01, 90C05, 90C27, 90C30, 90C90, 
91B52] 
(see: Bilevel linear programming; Bilevel programming: 
global optimization) 
induction axiom 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
inductive inference 
[90C26, 90C30] 
(see: Forecasting) 
inductive inference problem
[90C09, 90C10] 
(see: Optimization in boolean classification problems) 
inductive structure of an irreducible matrix 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
industrial engineering see: Archimedes and the foundations 
of — 
industrial in-plant railroads see: engine routing and — 
industrial problems see: SQP optimization in — 
industry see: GRASP in —; petrochemical —; Planning in the 
process —; refining —; Successive quadratic programming: 
applications in the process — 
inefficient 
[90B30, 90B50, 90C05, 91B82] 
(see: Data envelopment analysis) 
inefficient management hypothesis 
[90C05, 90C90, 91B28] 
(see: Multicriteria methods for mergers and acquisitions) 
inequalities see: anti-Monge —; approximation of 
variational —; blossom —; convex —; Gale-Hoffman —;
hemivariational —; implicit variational inequalities and 
quasivariational —; linear —; Monge —; multivalued 
monotone laws and variational —; multivalued 
nonmonotone laws and hemivariational —; Nonconvex 
energy functions: hemivariational —; Nonsmooth and 
smoothing methods for nonlinear complementarity 
problems and variational —; parametric variational —; QD 
laws and systems of variational —; Quasivariational —; 
saddle-point —; scalar variational —; Solution methods for 
multivalued variational —; system of —; system of 
variational —; valid —; variational —; variational-like —; 
vector variational — 
inequalities: applications in mechanics see: Hemivariational — 
inequalities: A brief review see: Generalized variational — 
inequalities: eigenvalue problems see: Hemivariational — 
inequalities and equilibrium problems see: Generalized 
monotonicity: applications to variational — 
inequalities: F. E. approach see: Variational — 
inequalities: geometric interpretation, existence and 
uniqueness see: Variational — 
inequalities for nonlinear material laws see: discretized 
hemivariational — 
inequalities by nonsmooth optimization methods see: Solving 
hemivariational — 
inequalities: projected dynamical system see: Variational — 
inequalities and quasivariational inequalities see: implicit 
variational — 
inequalities: static problems see: Hemivariational — 
inequality see: abstract hemivariational —; Azuma’s —; bilinear
matrix —; Cauchy —; disjunctive —; duality —;
Fenchel-Young —; generalized variational —;
Hamilton-Jacobi —; implied —; Jensen’s —; linear
matrix —; mixed variational —; nondominated valid —;
quasivariational —; reverse convex —; semicoercive
hemivariational —; strengthen triangle —; subgradient —;
triangle —; two-function minimax —; variational —;
variational-hemivariational —; vector —; Young —
Inequality-constrained nonlinear optimization 
(49M37, 65K05, 90C30) 
(referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; First 
order constraint qualifications; History of optimization; 
Kuhn-Tucker optimality conditions; Lagrangian duality: 
BASICS; Redundancy in nonlinear programs; Relaxation in 
projection methods; Rosen’s method, global convergence, 
and Powell’s conjecture; Saddle point theory and optimality 
conditions; Second order constraint qualifications; Second 
order optimality conditions for nonlinear optimization; 
SSC minimization algorithms; SSC minimization 
algorithms for nonsmooth and stochastic optimization) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; First order 
constraint qualifications; History of optimization; 
Kuhn-Tucker optimality conditions; Lagrangian duality: 
BASICS; Redundancy in nonlinear programs; Relaxation in 
projection methods; Rosen’s method, global convergence, 
and Powell’s conjecture; Saddle point theory and optimality 
conditions; Second order constraint qualifications; Second 
order optimality conditions for nonlinear optimization; 
SSC minimization algorithms; SSC minimization 
algorithms for nonsmooth and stochastic optimization)
inequality constraint see: state —
inequality constraints
[41A10, 46N10, 47N10, 49K27, 65K05]
(see: Direct global optimization algorithm; High-order
necessary conditions for optimality for abnormal points)
inequality constraints see: active —; feasibility of —;
infeasibility of —
inequality for an elastostatic problem involving
QD-superpotentials see: convex variational —
inequality of elliptic type see: abstract variational —
inequality formulation see: variational —
inequality formulation in link loads see: variational —
inequality formulation in path flows see: variational —
inequality formulations see: variational —
inequality or nonsmooth mechanics
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in
mechanics)
inequality problem see: coercive hemivariational —; dual
variational —; finite-dimensional variational —; parametric
variational —; variational —; vector variational —
inequality problem and a projected dynamical system see:
variational —
inequality problems
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in
mechanics)
inequality problems see: Sensitivity analysis of variational —;
variational —
inequality relation see: linear —
inequality systems
[15A39, 46A20, 52A01, 90C05, 90C30]
(see: Farkas lemma; Farkas lemma: generalizations; Linear
optimization: theorems of the alternative; Motzkin
transposition theorem; Tucker homogeneous systems of
linear relations)
inequality systems see: convex —
inertia of a matrix
[49M37]
(see: Nonlinear least squares: trust region methods)
inexact line search
[90C30]
(see: Rosen’s method, global convergence, and Powell’s
conjecture)
inexact line search line
[90C30]
(see: Frank-Wolfe algorithm)
inexact line search technique
[90C30]
(see: Convex-simplex algorithm)
inexact Newton method
[90C30]
(see: Numerical methods for unary optimization)
inexact Newton methods
[90C06]
(see: Large scale unconstrained optimization)
inexact proximal point algorithms
[90C30]
(see: Cost approximation algorithms)
inf-stationary
[65Kxx, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD
functions)
inf-stationary point
[65K05, 90C30, 90Cxx]
(see: Nondifferentiable optimization: minimax problems;
Quasidifferentiable optimization: optimality conditions)
inf-stationary point
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)
inf-stationary points
[49J52, 65K99]
(see: Quasidifferentiable optimization: algorithms for
hypodifferentiable functions)
infeasibilities see: diagnosing and tracing —; sum of integer — 
infeasibility criterion 
[90C11, 90C31] 
(see: Multiparametric mixed integer linear programming) 
infeasibility of inequality constraints 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: verifying feasibility) 
infeasibility proof 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and
upper bounds) 
infeasibility test 
[49M37, 65G20, 65G30, 65G40, 65K05, 90C11, 90C30] 
(see: Interval global optimization; Mixed integer nonlinear 
programming) 
infeasible 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds)
infeasible component 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
infeasible integer variable see: most/least — 
infeasible interior point 
[90C05] 
(see: Linear programming: interior point methods) 
infeasible node 
[90C10, 90C29] 
(see: Multi-objective integer linear programming) 
infeasible path approach 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
infeasible problem 
[15A39, 90C05] 
(see: Linear optimization: theorems of the alternative) 
infeasible program 
[90C05, 90C20] 
(see: Redundancy in nonlinear programs) 
infeasible solution 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
infeasible-start interior-point algorithm 
[90C05, 90C22, 90C25, 90C30, 90C51] 
(see: Interior point methods for semidefinite programming) 
infeasible system 
[15A39, 90C05] 
(see: Motzkin transposition theorem) 
Infer see: branch and — 
inference see: Bayesian —; classical —; fuzzy interval —; 
inductive —; interval-valued approximate —; monotone 
Boolean function —; order restricted statistical —; premis of 
an —; visual — 
inference duality 
[90C10, 90C46] 
(see: Integer programming duality) 
Inference of monotone boolean functions 
(90C09) 
(referred to in: Alternative set theory; Boolean and fuzzy 
relations; Checklist paradigm semantics for fuzzy logics; 
Finite complete systems of many-valued logic algebras; 
Optimization in boolean classification problems; 
Optimization in classifying text documents) 
(refers to: Alternative set theory; Boolean and fuzzy 
relations; Checklist paradigm semantics for fuzzy logics; 
Finite complete systems of many-valued logic algebras; 
Optimization in boolean classification problems; 
Optimization in classifying text documents) 
inference problem see: Boolean function —; inductive — 
infimum 
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)
infimum see: global —; local —
infimum of a Lagrangian function
[90C30]
(see: Lagrangian duality: BASICS)
infinite see: semi- —
infinite class
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
infinite-dimensional generalized order complementarity problem
[90C33] 
(see: Generalized nonlinear complementarity problem) 
infinite-dimensional linear programming 
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems) 
infinite-dimensional optimization 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations) 
infinite horizon 
(see: Bayesian networks) 
Infinite horizon control and dynamic games 
(91Axx, 49Jxx)
(referred to in: Control vector iteration CVI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Hamilton-Jacobi-Bellman 
equation; MINLP: applications in the interaction of design 
and control; Multi-objective optimization: interaction of 
design and control; Optimal control of a flexible arm; 
Optimization strategies for dynamic systems; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control) 
(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control
applications; Hamilton-Jacobi-Bellman equation; MINLP:
applications in the interaction of design and control;
Multi-objective optimization: interaction of design and
control; Optimal control of a flexible arm; Optimization
strategies for dynamic systems; Robust control; Robust
control: schur stability of polytopes of polynomials;
Semi-infinite programming and control problems;
Sequential quadratic programming: interior point methods
for distributed optimal control problems; Suboptimal
control)
infinite horizon game see: nonzero-sum —
infinite horizon problem see: discounted —; total cost —
infinite horizon problems
[49L20, 90C39, 90C40]
(see: Dynamic programming: infinite horizon problems,
overview; Dynamic programming: inventory control;
Dynamic programming: undiscounted problems)
infinite horizon problems
[49L20, 49L99, 90C39, 90C40]
(see: Dynamic programming: average cost per stage
problems; Dynamic programming: discounted problems;
Dynamic programming: infinite horizon problems,
overview; Dynamic programming: stochastic shortest path
problems; Dynamic programming: undiscounted
problems)
infinite horizon problems, overview see: Dynamic
programming: —
infinite linear programming see: semi- —
infinite many conditions moment problem
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems)
infinite moment problem
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems)
infinite optimization see: Adaptive convexification in semi- —;
one-parametric semi- —; semi- —; Smoothing methods for
semi- —
infinite optimization problem see: semi- —
infinite optimization problems see: semi- —
infinite problem see: generalized semi- —
infinite program see: dual semi- —; primal (linear) semi- —;
semi- —
infinite programming see: linear semi- —; perfect duality from
the view of linear semi- —; reduced problem in semi- —;
semi- —
infinite programming and applications in finance see: Semi- —
infinite programming: approximation methods see: Semi- —
infinite programming and control problems see: Semi- —
infinite programming: discretization methods see: Semi- —
infinite programming: methods for linear problems see:
Semi- —
infinite programming: numerical methods see: Semi- —
infinite programming: optimality conditions see: Generalized
semi- —
infinite programming: second order optimality conditions see:
Semi- —
infinite programming, semidefinite programming and perfect
duality see: Semi- —
infinite programs see: computationally equivalent semi- —;
nonlinear semi- —; semi- —
infinite time horizon
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
infinitely connected matroid
[90C09, 90C10]
(see: Matroids)
infinitely near rational numbers
[03E70, 03H05, 91B16]
(see: Alternative set theory)
infinitely small negative real numbers
[03E70, 03H05, 91B16]
(see: Alternative set theory)
infinitely small positive real numbers
[03E70, 03H05, 91B16]
(see: Alternative set theory)
infinitely small real numbers
[03E70, 03H05, 91B16]
(see: Alternative set theory)
infinitesimal
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
infinitesimal calculus
[01A99]
(see: Leibniz, Gottfried Wilhelm)
infinitesimal perturbation analysis
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)
infinity
[03E70, 03H05, 91B16]
(see: Alternative set theory)
inflow see: hydrological exogenous —
inflow and demand see: water resources planning under
uncertainty on hydrological exogenous —
information
[01A60, 03B30, 54C70, 65K05, 68Q05, 68Q10, 68Q17, 68Q25,
90C05, 90C25, 90C26, 94A17]
(see: Hilbert’s thirteenth problem; Information-based
complexity and information-based optimization; Jaynes’
maximum entropy principle)
information
[01A60, 03B30, 54C70, 68Q17, 90C60]
(see: Hilbert’s thirteenth problem; Kolmogorov complexity)
information see: Algorithmic —; asymmetrical —;
contaminated —; dual —; expected value of perfect —;
imprecise —; incomplete —; missing —; mutual —;
partial —; priced —; radius of —; uncertain —; unknown —
information-based complexity
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26]
(see: Information-based complexity and information-based
optimization)
information-based complexity
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26, 90C60]
(see: Complexity theory; Information-based complexity and
information-based optimization)
Information-based complexity and information-based 
optimization 
(65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26) 
(referred to in: Complexity classes in optimization; 
Complexity of degeneracy; Complexity of gradients, 
Jacobians, and Hessians; Complexity theory; Complexity 
theory: quadratic programming; Computational 
complexity theory; Fractional combinatorial optimization; 
Kolmogorov complexity; Mixed integer nonlinear 
programming; NP-complete problems and proof 
methodology; Parallel computing: complexity classes) 
(refers to: Complexity classes in optimization; Complexity of 
degeneracy; Complexity of gradients, Jacobians, and 
Hessians; Complexity theory; Complexity theory: quadratic 
programming; Computational complexity theory; 
Fractional combinatorial optimization; Kolmogorov 
complexity; Mixed integer nonlinear programming; 
NP-complete problems and proof methodology; Parallel 
computing: complexity classes)
information-based model
[90C60]
(see: Complexity theory)
information-based optimization
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26]
(see: Information-based complexity and information-based
optimization)
information-based optimization see: Information-based 
complexity and — 
information criterion see: Akaike — 
information game see: two-player zero-sum perfect- — 
information structure 
[49Jxx, 91Axx] 
(see: Infinite horizon control and dynamic games) 
informative genes see: Selection of maximally — 
informed see: weakly — 
infrastructure see: Optimal planning of offshore oilfield — 
ing see: AND- —; OR- — 
Ingber algorithm 
[90C26, 90C90] 
(see: Global optimization in binary star astronomy) 
inhibit procedure 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
inhibitor 
(see: Bayesian networks) 
initial annealing temperature 
[62C10, 65K05, 90C10, 90C15, 90C26]
(see: Bayesian global optimization)
initial ideal
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
initial simplex
[90C30]
(see: Sequential simplex method)
initial solution
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
initial system
[65K05, 90C30]
(see: Bisection global optimization methods)
initial tableau
[90C05]
(see: Linear programming: Klee-Minty examples)
initial temperature
[90C27, 90C90]
(see: Simulated annealing)
initial term of a polynomial
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
initial value problem
[65G20, 65G30, 65G40, 65K10, 65L99, 90C90]
(see: Interval analysis: differential equations; Variational 
inequalities: projected dynamical system) 
initial value problem 
[65K10, 90C90] 
(see: Variational inequalities: projected dynamical system) 
initialization 
[49L20, 90C05, 90C33, 90C39, 90C40, 92C05, 92C40] 
(see: Dynamic programming: infinite horizon problems, 
overview; Pivoting algorithms for linear programming 
generating two paths; Protein loop structure prediction 
methods) 
initializing unknown variables 
[90C25, 90C30] 
(see: Successive quadratic programming: full space
methods) 
initiated see: receiver- —; sender- — 
initiated mapping technique see: receiver —; sender — 
inner approximation 
[90C06, 90C25, 90C26, 90C35]
(see: Concave programming; Cutting plane methods for
global optimization; Simplicial decomposition algorithms)
inner approximation
[90C06, 90C25, 90C26, 90C35]
(see: Cutting plane methods for global optimization;
Simplicial decomposition algorithms)
inner interval arithmetic
[65G30, 65G40, 65K05, 90C30, 90C57]
(see: Global optimization: interval analysis and balanced
interval arithmetic)
inner linearization
[90C30]
(see: Simplicial decomposition)
inner linearization cone
[90C31, 90C34, 90C46]
(see: Generalized semi-infinite programming: optimality
conditions)
inner linearization/restriction
[90C30]
(see: Simplicial decomposition)
inner point
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems)
inner problem
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and
duality)
inner product see: K-local —
inner regular measure
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems)
input see: interval —; maximization of output/ —
input alphabet of a Turing machine
[90C60]
(see: Complexity classes in optimization)
input constraint qualifications
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric 
programming) 
input data see: length of —; size of — 
input-efficient 
[90B30, 90B50, 90C05, 91B82] 
(see: Data envelopment analysis) 
input layer 
(see: Bayesian networks) 
input neurons 
[90C27, 90C30] 
(see: Neural networks for combinatorial optimization) 
input optimization 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality; Nondifferentiable optimization: parametric 
programming) 
input-output matrices see: updating —
input-output tables see: triangulation problem for — 
input of a Turing machine see: size of the — 
inscribed sphere method see: largest — 
insertion 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 
insertion algorithm see: generic vertex — 
insertion (FVI) see: farthest vertex — 
insertion (NVI) see: nearest vertex — 
insertion optimal partitioning algorithm see: nearest — 
insertion paradigm see: edge — 
insertion (RVI) see: random vertex — 
insertion step 
[90C59] 
(see: Heuristic and metaheuristic algorithms for the 
traveling salesman problem) 
insertion supernode 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 
insertion of vertex v at 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem)
insertion (VI) see: vertex — 
insight 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm)
instability in parametric programming 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric
programming) 
instance 
[00-02, 01-02, 03-02] 
(see: Vehicle routing problem with simultaneous pickups
and deliveries) 
instance see: problem —; size of a problem —
instance in time m see: algorithm solving a problem — 
instances see: all-to-all —; one-to-all — 
instrument financial equilibrium model see: multi-sector 
multi- — 
insufficient reason see: laplace’s principle of — 
insufficient reasoning see: Laplace principle of — 
int U-quasiconcave function 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
integer 
[65K05, 90C10, 90C11, 90C26, 90C46] 
(see: Direct global optimization algorithm; Integer 
programming duality; MINLP: global optimization with 
αBB)
integer 
[90C10, 90C29] 
(see: Multi-objective integer linear programming) 
integer see: mixed — 
integer 0-1 programs see: mixed — 
integer αBB algorithm see: general structure mixed —; special
structure mixed — 
Integer Bilevel Optimization see: Mixed — 
integer classification problems see: Mixed — 
integer cut
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 

integer dynamic optimization see: mixed — 

integer feasibility problem see: zero-one — 

integer-fix see: near- — 

integer formulation see: LCP: Pardalos-Rosen mixed —

integer fractional programming 

[90C32] 

(see: Fractional programming)

integer infeasibilities see: sum of — 

integer L-shaped method 

[90C10, 90C15]

(see: Stochastic vehicle routing problems)

integer labeling 

[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer 
programming) 

integer LCP 
[90C25, 90C33]
(see: Integer linear complementary problem) 

Integer linear complementary problem 
(90C25, 90C33) 
(referred to in: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Equivalence between nonlinear 
complementarity problem and fixed point problem; 
Generalized nonlinear complementarity problem; Graph 
coloring; Integer programming; Integer programming: 
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane 
algorithms; Integer programming: lagrangian relaxation; 
LCP: Pardalos-Rosen mixed integer formulation; Linear
complementarity problem; MINLP: trim-loss problem; 
Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; Order 
complementarity; Parametric mixed integer nonlinear 
optimization; Principal pivoting methods for linear 
complementarity problems; Set covering, packing and 
partitioning problems; Simplicial pivoting algorithms for 
integer programming; Stochastic integer programming: 
continuity, stability, rates of convergence; Stochastic 
integer programs; Time-dependent traveling salesman 
problem; Topological methods in complementarity theory) 
(refers to: Branch and price: Integer programming with 
column generation; Convex-simplex algorithm; 
Decomposition techniques for MILP: lagrangian relaxation; 
Equivalence between nonlinear complementarity problem 
and fixed point problem; Generalized nonlinear 
complementarity problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen 
mixed integer formulation; Lemke method; Linear 
complementarity problem; Linear programming; Mixed 
integer classification problems; Multi-objective integer 
linear programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Order complementarity; Parametric linear 
programming: cost simplex algorithm; Parametric mixed 
integer nonlinear optimization; Principal pivoting methods 
for linear complementarity problems; Sequential simplex 
method; Set covering, packing and partitioning problems; 
Simplicial pivoting algorithms for integer programming; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem; Topological 
methods in complementarity theory) 

integer linear complementary problem 
[90C25, 90C33] 

(see: Integer linear complementary problem) 

integer linear optimization see: Global pairwise protein 
sequence alignment via mixed- — 

integer linear program 
[90C35]

(see: Optimization in leveled graphs) 

integer linear program see: single parametric mixed — 

integer linear programming 
[90C10, 90C29] 

(see: Multi-objective integer linear programming) 

integer linear programming see: mixed —; Multi-objective —; 
Multiparametric mixed — 

integer linear programming: heat exchanger network 
synthesis see: Mixed — 

integer linear programming: mass and heat exchanger 
networks see: Mixed — 

integer linear programs see: Robust optimization: mixed- — 

Integer linear programs for routing and protection problems 
in optical networks 
(68M10, 90B18, 90B25, 46N10) 

integer MITPs 
[90C35] 

(see: Multi-index transportation problems) 

integer multi-index transportation problem 
[90C35] 

(see: Multi-index transportation problems) 

integer nonconvex problem see: mixed — 

integer nonlinear bilevel programming: deterministic global 
optimization see: Mixed — 

integer nonlinear optimization see: mixed —; Parametric 
mixed — 

integer nonlinear optimization: A disjunctive cutting plane 
approach see: Mixed- — 

integer nonlinear program see: mixed — 

integer nonlinear programming see: mixed — 

integer nonlinear programming problem see: mixed — 

integer optimal control problem see: mixed — 

Integer Optimization see: Mixed —; Multi-class data 
classification via mixed- —; Peptide identification via 
mixed- — 

integer optimization problem 
[90C26] 

(see: Smooth nonlinear nonconvex optimization) 

integer optimization problem see: mixed — 

integer optimization in well scheduling see: Mixed — 

integer problem see: linear zero-one —; mixed — 

integer problems see: 0-1 mixed —; linear mixed — 
integer program
[05-XX, 90C10, 90C11, 90C27, 90C30, 90C57] 
(see: Frequency assignment problem; Integer programming; 
Lagrangian duality: BASICS) 

integer program 
[90C05, 90C06, 90C08, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Integer programming: branch and bound methods; 
Integer programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Modeling difficult 
optimization problems) 

integer program see: mixed —; zero-one — 

integer program with recourse see: stochastic — 


Integer programming
(90C10, 90C11, 90C27, 90C57)
(referred to in: Airline optimization; Alignment problem;
Branch and price: Integer programming with column
generation; Cutting-stock problem; Decomposition
techniques for MILP: lagrangian relaxation; Graph
coloring; Integer linear complementary problem; Integer
programming: algebraic methods; Integer programming:
branch and bound methods; Integer programming: branch
and cut algorithms; Integer programming: cutting plane
algorithms; Integer programming duality; Integer
programming: lagrangian relaxation; Large scale trust
region problems; LCP: Pardalos-Rosen mixed integer
formulation; Maximum cut problem, MAX-CUT;
Maximum satisfiability problem; MINLP: trim-loss
problem; Mixed integer classification problems;
Multidimensional knapsack problems; Multi-objective
integer linear programming; Multi-objective mixed integer
programming; Multiparametric mixed integer linear
programming; Optimization-based visualization;
Optimization in leveled graphs; Parametric mixed integer
nonlinear optimization; Quadratic knapsack; Set covering,
packing and partitioning problems; Simplicial pivoting
algorithms for integer programming; Stable set problem:
branch & cut algorithms; Stochastic integer programming:
continuity, stability, rates of convergence; Stochastic
integer programs; Time-dependent traveling salesman
problem; Vehicle scheduling)

(refers to: Airline optimization; Alignment problem; Branch
and price: Integer programming with column generation;
Decomposition techniques for MILP: lagrangian relaxation;
Graph coloring; Integer linear complementary problem;
Integer programming: algebraic methods; Integer
programming: branch and bound methods; Integer
programming: branch and cut algorithms; Integer
programming: cutting plane algorithms; Integer
programming duality; Integer programming: lagrangian
relaxation; LCP: Pardalos-Rosen mixed integer
formulation; Maximum satisfiability problem; Mixed
integer classification problems; Multidimensional knapsack
problems; Multi-objective integer linear programming;
Multi-objective mixed integer programming;
Multiparametric mixed integer linear programming;
Optimization in leveled graphs; Parametric mixed integer
nonlinear optimization; Quadratic knapsack; Set covering,
packing and partitioning problems; Simplicial pivoting
algorithms for integer programming; Stochastic integer
programming: continuity, stability, rates of convergence;
Stochastic integer programs; Time-dependent traveling
salesman problem; Vehicle scheduling)

integer programming
[01A99, 05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68Q20,
68Q99, 68R, 68U, 68W, 90-XX, 90B, 90B50, 90C]
(see: Branch and price: Integer programming with column
generation; Convex discrete optimization; History of
optimization; Optimal triangulations; Optimization and
decision support systems; Survivable networks)

integer programming
[13Cxx, 13Pxx, 14Qxx, 68Q99, 90B50, 90B80, 90C05, 90C10,
90C11, 90C27, 90C30, 90C35, 90C46, 90C57, 90Cxx]
(see: Assignment and matching; Branch and price: Integer
programming with column generation; Facilities layout
problems; Facility location problems with spatial
interaction; Integer programming; Integer programming:
algebraic methods; Integer programming duality; Integer
programming: lagrangian relaxation; Optimization and
decision support systems; Optimization in leveled graphs;
Set covering, packing and partitioning problems; Simplicial
pivoting algorithms for integer programming)


integer programming see: convex —; cost functions in —;
group relaxation in —; mixed —; Multi-objective mixed —;
multi-objective (multicriteria) mixed —; n-fold —; Simplicial
pivoting algorithms for —; stochastic —; stochastic
(mixed-) —; test sets in —; zero-one —


Integer programming: algebraic methods 


(13Cxx, 13Pxx, 14Qxx, 90Cxx) 

(referred to in: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: branch and bound methods; Integer 
programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Integer 
programming duality; Integer programming: lagrangian 
relaxation; LCP: Pardalos-Rosen mixed integer 
formulation; MINLP: trim-loss problem; Multi-objective 
integer linear programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Parametric mixed integer nonlinear 
optimization; Set covering, packing and partitioning 
problems; Simplicial pivoting algorithms for integer 
programming; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem) 

(refers to: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Integer linear complementary 
problem; Integer programming; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stochastic 
integer programming: continuity, stability, rates of
convergence; Stochastic integer programs; Time-dependent 
traveling salesman problem) 

Integer programming: branch and bound methods 

(90C10, 90C05, 90C08, 90C11, 90C06) 

(referred to in: Biquadratic assignment problem; Branch and 
price: Integer programming with column generation; 
Decomposition techniques for MILP: lagrangian relaxation; 
Discrete stochastic optimization; Facilities layout problems; 
Global optimization based on statistical models; Global 
optimization in multiplicative programming; Graph 
coloring; Integer linear complementary problem; Integer 
programming; Integer programming: algebraic methods; 
Integer programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Integer 
programming duality; Integer programming: lagrangian 
relaxation; Job-shop scheduling problem; LCP: 
Pardalos-Rosen mixed integer formulation; Maximum
satisfiability problem; MINLP: trim-loss problem; 
Multidimensional assignment problem; Multidimensional 
knapsack problems; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Optimization-based visualization; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stable set 
problem: branch & cut algorithms; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Time-dependent traveling 
salesman problem) 

(refers to: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Genetic algorithms; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and cut algorithms; Integer programming: cutting 
plane algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; Linear programming: interior 
point methods; Mixed integer classification problems; 
Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Time-dependent 
traveling salesman problem) 

Integer programming: branch and cut algorithms 

(90C10, 90C11, 90C05, 90C08, 90C06) 

(referred to in: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: cutting 
plane algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen 
mixed integer formulation; MINLP: trim-loss problem; 
Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; 
Quadratic assignment problem; Set covering, packing and 
partitioning problems; Simplicial pivoting algorithms for 
integer programming; Stable set problem: branch & cut 
algorithms; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Stochastic vehicle routing problems; Time-dependent 
traveling salesman problem) 
(refers to: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Integer linear complementary 
problem; Integer programming; Integer programming: 
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen 
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Time-dependent 
traveling salesman problem) 

Integer programming with column generation see: Branch and 
price: — 

integer programming: complexity and equivalent forms see: 
Quadratic — 

integer programming/constraint programming hybrid 
methods see: Mixed — 

integer programming: continuity, stability, rates of 
convergence see: Stochastic — 

Integer programming: cutting plane algorithms 
(90C10, 90C05, 90C08, 90C11, 90C06, 90C08) 
(referred to in: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; Job-shop scheduling 
problem; LCP: Pardalos-Rosen mixed integer formulation; 
MINLP: trim-loss problem; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Parametric mixed integer nonlinear 
optimization; Quadratic assignment problem; Set covering, 
packing and partitioning problems; Simplicial pivoting 
algorithms for integer programming; Stable set problem: 
branch & cut algorithms; Stochastic integer programming: 
continuity, stability, rates of convergence; Stochastic 
integer programs; Time-dependent traveling salesman 
problem) 
(refers to: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Integer linear complementary
problem; Integer programming; Integer programming: 
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
NP-complete problems and proof methodology; Parametric 
mixed integer nonlinear optimization; Set covering, 
packing and partitioning problems; Simplicial pivoting 
algorithms for integer programming; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Time-dependent traveling 
salesman problem) 

Integer programming duality

(90C10, 90C46)

(referred to in: Branch and price: Integer programming with
column generation; Decomposition techniques for MILP:
lagrangian relaxation; Graph coloring; Integer linear
complementary problem; Integer programming; Integer
programming: algebraic methods; Integer programming:
branch and bound methods; Integer programming: branch
and cut algorithms; Integer programming: cutting plane
algorithms; Integer programming: lagrangian relaxation;
LCP: Pardalos-Rosen mixed integer formulation; MINLP:
trim-loss problem; Multi-objective integer linear
programming; Multi-objective mixed integer
programming; Multiparametric mixed integer linear
programming; Parametric mixed integer nonlinear
optimization; Set covering, packing and partitioning
problems; Simplicial pivoting algorithms for integer
programming; Stochastic integer programming: continuity,
stability, rates of convergence; Stochastic integer programs;
Time-dependent traveling salesman problem)

(refers to: Decomposition techniques for MILP: lagrangian
relaxation; Integer programming; Integer programming:
algebraic methods; Integer programming: branch and
bound methods; Integer programming: branch and cut
algorithms; Integer programming: cutting plane
algorithms; Integer programming: lagrangian relaxation;
Simplicial pivoting algorithms for integer programming;
Time-dependent traveling salesman problem)

Integer programming: lagrangian relaxation
(90C10, 90C30)
(referred to in: Branch and price: Integer programming with
column generation; Decomposition techniques for MILP:
lagrangian relaxation; Facilities layout problems; Graph
coloring; Integer linear complementary problem; Integer
programming; Integer programming: algebraic methods;
Integer programming: branch and bound methods; Integer
programming: branch and cut algorithms; Integer
programming: cutting plane algorithms; Integer
programming duality; Lagrange, Joseph-Louis; Lagrangian
multipliers methods for convex programming; LCP:
Pardalos-Rosen mixed integer formulation; MINLP:
trim-loss problem; Multi-objective integer linear
programming; Multi-objective mixed integer
programming; Multi-objective optimization: lagrange
duality; Multiparametric mixed integer linear
programming; Nondifferentiable optimization;
Nondifferentiable optimization: subgradient optimization
methods; Parametric mixed integer nonlinear
optimization; Quadratic assignment problem; Set covering,
packing and partitioning problems; Simplicial pivoting
algorithms for integer programming; Stochastic integer
programming: continuity, stability, rates of convergence;
Stochastic integer programs; Time-dependent traveling
salesman problem)

(refers to: Branch and price: Integer programming with
column generation; Decomposition techniques for MILP:
lagrangian relaxation; Integer linear complementary
problem; Integer programming; Integer programming:
algebraic methods; Integer programming: branch and
bound methods; Integer programming: branch and cut
algorithms; Integer programming: cutting plane
algorithms; Integer programming duality; Lagrange,
Joseph-Louis; Lagrangian multipliers methods for convex
programming; LCP: Pardalos-Rosen mixed integer
formulation; Mixed integer classification problems;
Multi-objective integer linear programming;
Multi-objective mixed integer programming;
Multi-objective optimization: lagrange duality;
Multiparametric mixed integer linear programming;
Parametric mixed integer nonlinear optimization; Set
covering, packing and partitioning problems; Simplicial
pivoting algorithms for integer programming; Stochastic
integer programming: continuity, stability, rates of
convergence; Stochastic integer programs; Time-dependent
traveling salesman problem)

integer programming: models and applications see:
Multi-quadratic —

integer programming problem
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods;
Integer programming: cutting plane algorithms)

integer programming problem see: large scale nonlinear
mixed —; mixed nonlinear —

integer programming problems
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and cut algorithms)

integer programs see: mixed —; Stochastic —
integer quadratic programming see: mixed- —
integer recourse see: simple —; Stochastic programming with
simple —; two-stage stochastic programs with simple —
integer rounding
[90C05, 90C06, 90C08, 90C10, 90C11, 90C27, 90C57]
(see: Integer programming; Integer programming: cutting
plane algorithms)
integer rounding cut see: mixed —
integer solution
[90C11]
(see: MINLP: branch and bound methods)
integer transportation problem see: convex — 
integer value function see: mixed — 
integer variable see: most/least infeasible —; multiple 
branches for bounded — 
integer variables 
[90C11] 
(see: MINLP: branch and bound methods) 
integral
[01A99] 
(see: Leibniz, gottfried wilhelm) 
integral see: derivative of an —; gradient of an —; 
hypergeometric —; indefinite —; multivariate 
probability — 
integral bounds subject to moment conditions see: optimal — 
integral constraint 
[28-XX, 49-XX, 60-XX] 
(see: General moment optimization problems) 
integral equations 
[65H10, 65J15] 
(see: Contraction-mapping) 
integral evaluation see: asymptotic case of — 
integral Fenchel-Legendre transformation 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
integral functions: general theory and examples see: 
Derivatives of probability and — 
integral linear fractional combinatorial optimization problem 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization)
integral Mean-Value for Composite Convexifiable Function 
[25A15, 34A05, 90C25, 90C26, 90C30, 90C31] 
(see: Convexifiable functions, characterization of) 
integral quadratic constraint 
[93D09] 
(see: Robust control)
integral relationships 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems)
integral over surface formula 
[90C15] 
(see: Derivatives of probability and integral functions:
general theory and examples) 
integral system see: totally dual — 
integral vector see: primitive —; support of an — 
integral over a volume 
[90C15] 
(see: Derivatives of probability and integral functions:
general theory and examples) 
integral over volume formula 
[90C15] 
(see: Derivatives of probability and integral functions:
general theory and examples) 
integrality criterion 
[90C11, 90C31] 
(see: Multiparametric mixed integer linear programming)
integrality gap 
[90C35] 
(see: Feedback set problems)
integrality property 
[90C30, 90C90] 
(see: Decomposition techniques for MILP: lagrangian 
relaxation) 
integrality theorem 
[90C35]
(see: Maximum flow problem) 
integrals see: Approximation of multivariate probability —; 
lower bounds for multivariate probability —; probability —; 
upper bounds for multivariate probability — 


integrate 
(see: State of the art in modeling agricultural systems) 
Integrated planning and scheduling 
integrated probabilistic constraint 
[90C15] 
(see: Static stochastic programming models: conditional 
expectations) 
Integrated vehicle and duty scheduling problems 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
integration 
[01A99] 
(see: Leibniz, gottfried wilhelm) 
integration see: high-dimensional —; problem — 
integration of dynamic considerations and controllability 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
integration of surveys 
[90C35]
(see: Multi-index transportation problems) 
intelligence see: artificial — 
intelligent multicriteria decision support system 
[90C29]
(see: Decision support systems with multiple criteria) 
intelligent multicriteria decision support systems 
[90C29]
(see: Decision support systems with multiple criteria) 
intelligent search 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
intensification 
[68M20, 90B06, 90B35, 90B80, 90C59] 
(see: Flow shop scheduling problem; Heuristic and 
metaheuristic algorithms for the traveling salesman 
problem; Location routing problem) 
intensification phase 
(see: Maximum cut problem, MAX-CUT) 
inter-class distance 
[55R15, 55R35, 65K05, 90C11] 
(see: Deterministic and probabilistic optimization models 
for data classification) 
interaction see: Facility location problems with spatial —; level
of —; spatial —; visual —
interaction of design and control 
[49M37, 90C11, 90C29, 90C90] 
(see: MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control) 
interaction of design and control see: MINLP: applications in
the —; Multi-objective optimization: — 

interaction of design, synthesis and control 
[49M37, 90C11] 
(see: Mixed integer nonlinear programming) 

interaction model see: spatial- — 

interactions see: electrostatic — 

interactive 

[90C11, 90C29]

(see: Multi-objective mixed integer programming) 

interactive disaggregation system 

[90C29, 91A99]

(see: Preference disaggregation) 

interactive learning of Boolean functions 

[90C09]

(see: Inference of monotone boolean functions) 

interactive method 

[90C29]
(see: Multi-objective optimization; Interactive methods for 
preference value functions; Multi-objective optimization: 
pareto optimal solutions, properties) 

interactive method 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming; 
Multi-objective optimization; Interactive methods for 
preference value functions) 

interactive method see: visual — 

interactive methods 
[65K05, 90B50, 90C05, 90C27, 90C29, 90C70, 91B06] 
(see: Fuzzy multi-objective linear programming; 
Multi-objective combinatorial optimization; 
Multi-objective optimization and decision support systems) 


interactive methods see: computing processes in — 
Interactive methods for preference value functions see:
Multi-objective optimization; —
interactive procedures 

[90C29, 90C70] 

(see: Fuzzy multi-objective linear programming) 
interactive sampling procedure 

[90C29, 90C70] 

(see: Fuzzy multi-objective linear programming) 
interactive versus noninteractive methods 

[90C11, 90C29] 

(see: Multi-objective mixed integer programming) 
interchange see: fast — 
interchange heuristics see: k- — 
intercommodity constraints 

[90C35] 

(see: Feedback set problems) 
interconnection designs see: subset — 
interdisciplinary coupling see: bandwidth of — 
interdisciplinary feasibility 

[49M37, 65K05, 65K10, 90C30, 93A13] 

(see: Multilevel methods for optimal design) 
interest rate see: riskless —; spot — 
interest rate yield curve 

[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 

interest rates see: term structure of — 

interface see: fractal —; graphical user — 

interhelical contacts in alpha-helical proteins see: Predictive 
method for — 

interior see: nonempty — 

interior point 
[49M29, 65K10, 65L99, 90C06, 90C33, 93-XX] 
(see: Linear complementarity problem; Local attractors for 
gradient-related descent iterations; Optimization strategies 
for dynamic systems) 

interior point 
[90C20] 
(see: Standard quadratic optimization problems: 
algorithms; Standard quadratic optimization problems: 
theory) 

interior point see: infeasible — 

interior point algorithm 
[90C05, 90C33] 
(see: Pivoting algorithms for linear programming 
generating two paths) 

interior-point algorithm see: infeasible-start — 

interior point algorithms
[90C15, 90C20, 90C25]
(see: Quadratic programming over an ellipsoid; Stochastic
programming: parallel factorization of structured matrices)

interior point algorithms
[90C05, 90C33]
(see: Pivoting algorithms for linear programming
generating two paths)

interior point algorithms for entropy optimization
[90C25, 90C51, 94A17]
(see: Entropy optimization: interior point methods)

interior point logarithmic barrier method
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)

interior point method
[90C05, 90C09, 90C10]
(see: Linear programming: karmarkar projective algorithm;
Optimization in boolean classification problems)

interior point methods
[15A15, 49J52, 65K05, 65K10, 65L99, 90C05, 90C06, 90C08,
90C10, 90C11, 90C20, 90C25, 90C30, 90C34, 90C51, 90C55, 
90C60, 90C90, 93-XX, 94A17] 
(see: Complexity theory: quadratic programming; Entropy 
optimization: interior point methods; Feasible sequential 
quadratic programming; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Nondifferentiable optimization: Newton 
method; Optimization strategies for dynamic systems; 
Quadratic programming with bound constraints; 
Semidefinite programming and determinant maximization; 
Successive quadratic programming: solution by active sets 
and interior point methods) 

interior point methods 
[15A15, 49K20, 49M99, 65K05, 65K10, 90C05, 90C15, 90C25, 
90C27, 90C30, 90C51, 90C55, 90C60, 90C90, 94A17] 
(see: ABS algorithms for optimization; Complexity theory: 
quadratic programming; Entropy optimization: interior 
point methods; Homogeneous selfdual methods for linear 
programming; Linear programming: karmarkar projective
algorithm; Semidefinite programming and determinant 
maximization; Semidefinite programming and structural 
optimization; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Stochastic programming: parallel factorization of 
structured matrices; Successive quadratic programming; 
Successive quadratic programming: full space methods; 
Successive quadratic programming: solution by active sets 
and interior point methods) 

interior-point methods see: Entropy optimization: —; Linear 
programming: —; polynomial time —; primal-dual —; 
Successive quadratic programming: solution by active sets 
and — 

interior point methods for distributed optimal control 
problems see: Sequential quadratic programming: — 

interior point methods for the linear complementarity 
problem see: Generalizations of — 

Interior point methods for semidefinite programming 
(90C51, 90C22, 90C25, 90C05, 90C30) 
(referred to in: Duality for semidefinite programming; 
Entropy optimization: interior point methods; 
Homogeneous selfdual methods for linear programming; 
Linear programming: interior point methods; Linear 
programming: karmarkar projective algorithm; Matrix 
completion problems; Potential reduction methods for 
linear programming; Semidefinite programming and 
determinant maximization; Semidefinite programming: 
optimality conditions and stability; Semidefinite 
programming and structural optimization; Semi-infinite 
programming, semidefinite programming and perfect 
duality; Sequential quadratic programming: interior point
methods for distributed optimal control problems; Solving
large scale and sparse semidefinite programs; Standard
quadratic optimization problems: theory; Successive
quadratic programming: solution by active sets and interior
point methods)
interior of a relation see: symmetric — 
interior of a set 
[37A35, 90C05] 
(see: Potential reduction methods for linear programming) 
interior solution 
[90C25, 90C51, 94A17]
(see: Entropy optimization: interior point methods)
interiors 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
interlining 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling)
interlocking eigenvalue theorem 
[65K05, 90Cxx] 
(see: Symmetric systems of linear equations)
intermediate fill-in 
[65Fxx] 
(see: Least squares problems) 
intermediate form see: AD — 
intermediate scale network 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 


intermediate storage see: unlimited — 
intermediate term 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: intermediate terms) 
intermediate terms see: Interval analysis: — 
intermediate variables 
[26A24, 65D25] 
(see: Automatic differentiation: introduction, history and 
rounding error estimation) 
internal coordinates 
[92B05]
(see: Genetic algorithms for protein structure prediction) 
internal deviation 
[62H30, 68T10, 90C05]
(see: Linear programming models for classification) 
internal energy 
[90C90]
(see: Optimization in medical imaging) 
internal numerical differentiation 
[34-xx, 34Bxx, 34Lxx, 93E24]
(see: Complexity and large-scale least squares problems) 
International Zeolite Association see: atlas of the — 


Internet 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
interplay between primal and dual solutions see: exploiting 
the — 
interpolation see: Nyström —
interpolation parametric eigenvalue formulation see: 
inverse — 
interpolation problem 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
interpolatory operator 
[90C34]
(see: Semi-infinite programming: approximation methods) 
interpolatory operator see: nonnegative — 
interpretation 
[65H99, 65K99] 
(see: Automatic differentiation: point and interval) 
interpretation see: epistemological —; geometric —; 
objective —; subjective — 
interpretation, existence and uniqueness see: Variational 
inequalities: geometric — 
intersatured vertices 
[90C35] 
(see: Feedback set problems) 
intersection 
[03B52, 03E72, 47S40, 65G20, 65G30, 65G40, 68T27, 68T35,
68Uxx, 90Bxx, 90C26, 90C30, 91Axx, 91B06, 92C60] 
(see: Boolean and fuzzy relations; Bounding derivative 
ranges; Interval analysis: systems of nonlinear equations) 
intersection see: transversal — 
intersection cut 
[90C11] 
(see: MINLP: branch and bound methods) 
intersection cut
[90C26] 
(see: Cutting plane methods for global optimization) 

intersection graph model 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 

intersection problem see: convex — 

interval input
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: differential equations) 

interval see: Automatic differentiation: point and —; 
confidence —; critical —; expectation-maximization —; 
machine — 

interval algebra see: relational — 

interval analysis 
[49M37, 65G20, 65G30, 65G40, 65H20, 65K99, 90C11] 
(see: Interval Newton methods; Mixed integer nonlinear 
programming) 

interval analysis 
[65C20, 65G20, 65G30, 65G40, 65H20, 65L99, 80A10, 80A22, 
90C90] 
(see: Global optimization: application to phase equilibrium 
problems; Interval analysis: application to chemical 
engineering design problems; Interval analysis: differential 
equations) 

Interval analysis: application to chemical engineering design 
problems 
(65C20, 65G20, 65G30, 65G40, 90C90, 65H20) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Design 
optimization in computational fluid dynamics; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: differential equations; Interval analysis: 
eigenvalue bounds of interval matrices; Interval analysis: 
intermediate terms; Interval analysis: nondifferentiable 
problems; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods; Multidisciplinary design optimization; 
Multilevel methods for optimal design; Optimal design of 
composite structures; Optimal design in nonlinear optics; 
Structural optimization: history) 
(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bilevel programming: applications in 
engineering; Bounding derivative ranges; Design 
optimization in computational fluid dynamics; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: differential equations; Interval analysis: 
eigenvalue bounds of interval matrices; Interval analysis: 
intermediate terms; Interval analysis: nondifferentiable 
problems; Interval analysis: parallel methods for global 
optimization; Interval analysis: subdivision directions in 
interval branch and bound methods; Interval analysis: 
systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval
linear systems; Interval Newton methods; Multidisciplinary
design optimization; Multilevel methods for optimal 
design; Optimal design of composite structures; Optimal 
design in nonlinear optics; Structural optimization: history) 

interval analysis and balanced interval arithmetic see: Global 
optimization: — 

Interval analysis: differential equations 
(65G20, 65G30, 65G40, 65L99) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: eigenvalue bounds of 
interval matrices; Interval analysis: intermediate terms; 
Interval analysis: nondifferentiable problems; Interval 
analysis: systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval 
linear systems; Interval Newton methods) 
(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: eigenvalue bounds of 
interval matrices; Interval analysis: intermediate terms; 
Interval analysis: nondifferentiable problems; Interval 
analysis: parallel methods for global optimization; Interval 
analysis: subdivision directions in interval branch and 
bound methods; Interval analysis: systems of nonlinear 
equations; Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods) 

Interval analysis: eigenvalue bounds of interval matrices 
(65G20, 65G30, 65G40, 65L99) 
(referred to in: αBB algorithm; Automatic differentiation:
point and interval; Automatic differentiation: point and 
interval taylor operators; Bounding derivative ranges; 
Eigenvalue enclosures for ordinary differential equations; 
Global optimization: application to phase equilibrium 
problems; Hemivariational inequalities: eigenvalue 
problems; Interval analysis: application to chemical 
engineering design problems; Interval analysis: differential 
equations; Interval analysis: intermediate terms; Interval 
analysis: nondifferentiable problems; Interval analysis: 
systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval 
linear systems; Interval Newton methods; Semidefinite 
programming and determinant maximization; Standard 
quadratic optimization problems: algorithms) 
(refers to: αBB algorithm; Automatic differentiation: point
and interval; Automatic differentiation: point and interval 
taylor operators; Bounding derivative ranges; Eigenvalue 
enclosures for ordinary differential equations; Global 
optimization: application to phase equilibrium problems; 
Hemivariational inequalities: eigenvalue problems; Interval
analysis: application to chemical engineering design 
problems; Interval analysis: differential equations; Interval 
analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods; Semidefinite programming and 
determinant maximization) 

Interval analysis: intermediate terms 
(65G20, 65G30, 65G40, 65H20) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: nondifferentiable problems; Interval 
analysis: parallel methods for global optimization; Interval 
analysis: systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval 
linear systems; Interval Newton methods) 
(refers to: Automatic differentiation: introduction, history 
and rounding error estimation; Automatic differentiation: 
point and interval; Automatic differentiation: point and 
interval taylor operators; Bounding derivative ranges; 
Global optimization: application to phase equilibrium 
problems; Interval analysis: application to chemical 
engineering design problems; Interval analysis: differential 
equations; Interval analysis: eigenvalue bounds of interval 
matrices; Interval analysis: nondifferentiable problems; 
Interval analysis: parallel methods for global optimization; 
Interval analysis: subdivision directions in interval branch 
and bound methods; Interval analysis: systems of nonlinear 
equations; Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods) 

Interval analysis: nondifferentiable problems 
(65G20, 65G30, 65G40, 65H20) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval
linear systems; Interval Newton methods)
(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
parallel methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; Interval 
Newton methods) 

Interval analysis for optimization of dynamical systems 

Interval analysis: parallel methods for global optimization 
(65K05, 65Y05, 65Y10, 65Y20, 68W10) 
(referred to in: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Global optimization: interval analysis and balanced interval 
arithmetic; Interval analysis: application to chemical 
engineering design problems; Interval analysis: differential 
equations; Interval analysis: eigenvalue bounds of interval 
matrices; Interval analysis: intermediate terms; Interval 
analysis: nondifferentiable problems; Interval analysis: 
systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval 
linear systems; Interval Newton methods; Load balancing 
for parallel optimization techniques; Parallel computing: 
complexity classes; Parallel computing: models; Stochastic 
network problems: massively parallel solution) 
(refers to: Interval analysis: intermediate terms; Interval 
analysis: subdivision directions in interval branch and 
bound methods; Interval analysis: systems of nonlinear 
equations; Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval global optimization; Interval Newton methods) 

Interval analysis: subdivision directions in interval branch 
and bound methods 
(65K05, 90C30) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Global optimization: interval analysis and balanced interval 
arithmetic; Interval analysis: application to chemical 
engineering design problems; Interval analysis: differential 
equations; Interval analysis: eigenvalue bounds of interval 
matrices; Interval analysis: intermediate terms; Interval 
analysis: nondifferentiable problems; Interval analysis: 
parallel methods for global optimization; Interval analysis: 
systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval 
linear systems; Interval Newton methods) 

(refers to: Automatic differentiation: introduction, history 
and rounding error estimation; Interval analysis: 
unconstrained and constrained optimization; Interval 
Newton methods; MINLP: branch and bound global 
optimization algorithm) 

Interval analysis: systems of nonlinear equations 

(65G20, 65G30, 65G40) 

(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; 
Contraction-mapping; Global optimization: application to 
phase equilibrium problems; Global optimization methods 
for systems of nonlinear equations; Gröbner bases for
polynomial equations; Interval analysis: application to 
chemical engineering design problems; Interval analysis: 
differential equations; Interval analysis: eigenvalue bounds 
of interval matrices; Interval analysis: intermediate terms; 
Interval analysis: nondifferentiable problems; Interval 
analysis: parallel methods for global optimization; Interval 
analysis: unconstrained and constrained optimization; 
Interval analysis: verifying feasibility; Interval constraints; 
Interval fixed point theory; Interval global optimization; 
Interval linear systems; Interval Newton methods; 
Nonlinear least squares: Newton-type methods; Nonlinear 
systems of equations: application to the enclosure of all 
azeotropes) 

(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; 
Contraction-mapping; Global optimization: application to 
phase equilibrium problems; Global optimization methods 
for systems of nonlinear equations; Interval analysis: 
application to chemical engineering design problems; 
Interval analysis: differential equations; Interval analysis: 
eigenvalue bounds of interval matrices; Interval analysis: 
intermediate terms; Interval analysis: nondifferentiable 
problems; Interval analysis: parallel methods for global 
optimization; Interval analysis: subdivision directions in 
interval branch and bound methods; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval 
linear systems; Interval Newton methods; Nonlinear least 
squares: Newton-type methods; Nonlinear systems of 
equations: application to the enclosure of all azeotropes) 
Interval analysis: unconstrained and constrained 
optimization 

(65G20, 65G30, 65G40, 65H20) 

(referred to in: Algorithmic improvements using a heuristic 
parameter, reject index for interval optimization; 
Automatic differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators; 
Bounding derivative ranges; Direct search Luus–Jaakola
optimization procedure; Global optimization: application 
to phase equilibrium problems; Global optimization: 
interval analysis and balanced interval arithmetic; Interval 
analysis: application to chemical engineering design
problems; Interval analysis: differential equations; Interval
analysis: eigenvalue bounds of interval matrices; Interval 
analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: verifying feasibility; Interval constraints; 
Interval fixed point theory; Interval global optimization; 
Interval linear systems; Interval Newton methods) 

(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Direct search 
Luus–Jaakola optimization procedure; Global
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: verifying feasibility; Interval constraints; 
Interval fixed point theory; Interval global optimization; 
Interval linear systems; Interval Newton methods) 


Interval analysis: verifying feasibility 


(65G20, 65G30, 65G40, 65H20) 

(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: systems 
of nonlinear equations; Interval analysis: unconstrained 
and constrained optimization; Interval constraints; Interval 
fixed point theory; Interval global optimization; Interval 
linear systems; Interval Newton methods) 

(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval constraints; Interval fixed point 
theory; Interval global optimization; Interval linear 
systems; Interval Newton methods) 


interval arithmetic 


[49M37, 65G20, 65G30, 65G40, 65H99, 65K05, 65K10, 65K99, 
65T40, 68T20, 90C26, 90C30, 90C57, 90C90] 
(see: αBB algorithm; Automatic differentiation: point and
interval; Bounding derivative ranges; Global optimization: 
interval analysis and balanced interval arithmetic; Global 
optimization methods for harmonic retrieval; Interval 
constraints) 

Interval arithmetic 
[15A99, 49M37, 65G20, 65G30, 65G40, 65K05, 65K10, 90C26, 
90C30] 
(see: αBB algorithm; Automatic differentiation: point and
interval taylor operators; Bounding derivative ranges; 
Interval linear systems) 

interval arithmetic see: balanced —; balanced random —; 
Global optimization: interval analysis and balanced —; 
inclusion principle of machine —; inner —; machine —; 
random — 

interval arithmetic operation 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 

interval automatic differentiation 
[65H99, 65K99] 
(see: Automatic differentiation: point and interval) 

interval branch and bound methods see: Interval analysis: 
subdivision directions in — 

interval computations 
[65G20, 65G30, 65G40, 65H20, 65K99] 
(see: Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval fixed point theory; 
Interval Newton methods) 

interval computing 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

Interval constraints 
(68T20, 65G20, 65G30, 65G40)
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: systems of 
nonlinear equations; Interval analysis: unconstrained and 
constrained optimization; Interval analysis: verifying 
feasibility; Interval fixed point theory; Interval global 
optimization; Interval linear systems; Interval Newton 
methods) 
(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations;
Interval analysis: unconstrained and constrained
optimization; Interval analysis: verifying feasibility; 
Interval fixed point theory; Interval global optimization; 
Interval linear systems; Interval Newton methods) 

interval dependency 

[65G20, 65G30, 65G40, 65H20]

(see: Interval analysis: intermediate terms) 

interval diagram see: composition —; temperature — 

interval enclosure 

[65G20, 65G30, 65G40, 68T20]

(see: Interval constraints) 

interval equation see: linear — 

interval extension 

[65H20, 80A10, 80A22, 90C90]
(see: Global optimization: application to phase equilibrium 
problems; LP strategy for interval-Newton method in 
deterministic global optimization) 

interval extension see: natural — 

Interval fixed point theory 
(65G20, 65G30, 65G40, 65H20) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: systems of 
nonlinear equations; Interval analysis: unconstrained and 
constrained optimization; Interval analysis: verifying 
feasibility; Interval constraints; Interval global 
optimization; Interval linear systems; Interval Newton 
methods) 
(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval global optimization; Interval 
linear systems; Interval Newton methods) 

interval function see: pre-declared — 

Interval global optimization 
(65K05, 90C30, 65G20, 65G30, 65G40) 
(referred to in: αBB algorithm; Automatic differentiation:
point and interval; Automatic differentiation: point and 
interval taylor operators; Bounding derivative ranges; 
Continuous global optimization: applications; Continuous 
global optimization: models, algorithms and software; 
Global optimization in the analysis and management of
environmental systems; Global optimization: application to 
phase equilibrium problems; Global optimization in batch 
design under uncertainty; Global optimization in 
generalized geometric programming; Global optimization: 
interval analysis and balanced interval arithmetic; Global 
optimization methods for systems of nonlinear equations; 
Global optimization in phase and chemical reaction 
equilibrium; Interval analysis: application to chemical 
engineering design problems; Interval analysis: differential 
equations; Interval analysis: eigenvalue bounds of interval 
matrices; Interval analysis: intermediate terms; Interval 
analysis: nondifferentiable problems; Interval analysis: 
parallel methods for global optimization; Interval analysis: 
systems of nonlinear equations; Interval analysis: 
unconstrained and constrained optimization; Interval 
analysis: verifying feasibility; Interval constraints; Interval 
fixed point theory; Interval linear systems; Interval Newton 
methods; MINLP: branch and bound global optimization 
algorithm; MINLP: global optimization with αBB; Mixed
integer nonlinear programming; Smooth nonlinear 
nonconvex optimization) 

(refers to: αBB algorithm; Automatic differentiation: point
and interval; Automatic differentiation: point and interval 
taylor operators; Bounding derivative ranges; Continuous 
global optimization: applications; Continuous global 
optimization: models, algorithms and software; Global 
optimization in the analysis and management of 
environmental systems; Global optimization: application to 
phase equilibrium problems; Global optimization in batch 
design under uncertainty; Global optimization in 
generalized geometric programming; Global optimization 
methods for systems of nonlinear equations; Global 
optimization in phase and chemical reaction equilibrium; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
linear systems; Interval Newton methods; MINLP: branch 
and bound global optimization algorithm; MINLP: global 
optimization with αBB; Mixed integer nonlinear
programming; Smooth nonlinear nonconvex optimization) 


interval graph 
[05C85, 90C35] 
(see: Directed tree networks; Feedback set problems) 


interval Hessian matrix 
[49M37, 65K10, 90C26, 90C30] 
(see: αBB algorithm)


interval inference see: fuzzy — 


interval linear system 
[15A99, 65G20, 65G30, 65G40, 90C26] 
(see: Interval linear systems) 


interval linear system see: center of an — 


Interval linear systems
(15A99, 65G20, 65G30, 65G40, 90C26)
(referred to in: ABS algorithms for linear equations and
linear least squares; Automatic differentiation: point and
interval; Automatic differentiation: point and interval
taylor operators; Bounding derivative ranges; Global
optimization: application to phase equilibrium problems;
Global optimization: interval analysis and balanced interval
arithmetic; Interval analysis: application to chemical
engineering design problems; Interval analysis: differential
equations; Interval analysis: eigenvalue bounds of interval
matrices; Interval analysis: intermediate terms; Interval
analysis: nondifferentiable problems; Interval analysis:
systems of nonlinear equations; Interval analysis:
unconstrained and constrained optimization; Interval
analysis: verifying feasibility; Interval constraints; Interval
fixed point theory; Interval global optimization; Interval
Newton methods; Large scale trust region problems; Large
scale unconstrained optimization; Orthogonal
triangularization; Overdetermined systems of linear
equations; QR factorization; Solving large scale and sparse
semidefinite programs; Symmetric systems of linear
equations)

(refers to: ABS algorithms for linear equations and linear
least squares; Automatic differentiation: point and interval;
Automatic differentiation: point and interval taylor
operators; Bounding derivative ranges; Cholesky
factorization; Global optimization: application to phase
equilibrium problems; Interval analysis: application to
chemical engineering design problems; Interval analysis:
differential equations; Interval analysis: eigenvalue bounds
of interval matrices; Interval analysis: intermediate terms;
Interval analysis: nondifferentiable problems; Interval
analysis: parallel methods for global optimization; Interval
analysis: subdivision directions in interval branch and
bound methods; Interval analysis: systems of nonlinear
equations; Interval analysis: unconstrained and constrained
optimization; Interval analysis: verifying feasibility;
Interval constraints; Interval fixed point theory; Interval
global optimization; Interval Newton methods; Large scale
trust region problems; Large scale unconstrained
optimization; Linear programming; Nonlinear least
squares: trust region methods; Orthogonal
triangularization; Overdetermined systems of linear
equations; QR factorization; Solving large scale and sparse
semidefinite programs; Symmetric systems of linear
equations)

interval logic 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 
interval logic system of approximate reasoning 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 
interval matrices see: Interval analysis: eigenvalue bounds of — 
interval matrix see: complex —; dimensional symmetric —;
extreme eigenvalue of an —; Hermitian —; interval of
variation of an eigenvalue of an —; real —; real
symmetric —; vertex matrix of an —
interval methods 

[65T40, 90C26, 90C30, 90C90] 

(see: Global optimization methods for harmonic retrieval) 
interval methods
[65G20, 65G30, 65G40, 65K05, 65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval; 
Interval global optimization; Random search methods) 

interval Newton 

[65C20, 65G20, 65G30, 65G40, 65H20, 90C90] 

(see: Interval analysis: application to chemical engineering
design problems)

interval Newton 

[65H20, 80A10, 80A22, 90C90] 

(see: Global optimization: application to phase equilibrium
problems)

interval Newton algorithm 

[65H20, 80A10, 80A22, 90C90] 

(see: Global optimization: application to phase equilibrium
problems)

interval Newton iteration 

[65G20, 65G30, 65G40, 65K05, 90C30] 

(see: Interval global optimization)

interval Newton method 

[65G20, 65G30, 65G40, 65H20, 65K99] 

(see: Interval analysis: systems of nonlinear equations;
Interval Newton methods) 

interval Newton method see: Krawczyk variation of the —; 
multivariate —; univariate — 

interval-Newton method in deterministic global optimization 
see: LP strategy for — 

Interval Newton methods 
(65G20, 65G30, 65G40, 65H20, 65K99) 
(referred to in: Algorithmic improvements using a heuristic 
parameter, reject index for interval optimization; 
Automatic differentiation: calculation of Newton steps; 
Automatic differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators; 
Bounding derivative ranges; Dynamic programming and 
Newton’s method in unconstrained optimal control; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; 
Nondifferentiable optimization: Newton method; 
Nonlinear least squares: Newton-type methods; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 
(refers to: Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Complexity classes 
in optimization; Dynamic programming and Newton’s 
method in unconstrained optimal control; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering
design problems; Interval analysis: differential equations;
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval constraints; Interval fixed point theory; Interval 
global optimization; Interval linear systems; 
Nondifferentiable optimization: Newton method; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 

interval Newton methods 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: unconstrained and constrained 
optimization) 

interval Newton methods see: existence-proving properties 
of — 

interval Newton operator 

[65G20, 65G30, 65G40]

(see: Interval analysis: systems of nonlinear equations) 

interval Newton operator see: univariate — 

interval operator 

[65K05, 90C30]
(see: Automatic differentiation: point and interval taylor 
operators) 

interval optimization see: Algorithmic improvements using 
a heuristic parameter, reject index for — 

interval order 
[90C29] 
(see: Preference modeling) 

interval package see: variable precision — 

interval pairs see: fuzzy — 

Interval propagation 
(68T20, 65G20, 65G30, 65G40) 
(referred to in: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: systems of 
nonlinear equations; Interval analysis: unconstrained and 
constrained optimization; Interval analysis: verifying 
feasibility; Interval fixed point theory; Interval global 
optimization; Interval linear systems; Interval Newton 
methods) 
(refers to: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges; Global 
optimization: application to phase equilibrium problems; 
Interval analysis: application to chemical engineering 
design problems; Interval analysis: differential equations; 
Interval analysis: eigenvalue bounds of interval matrices; 
Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems; Interval analysis: parallel 
methods for global optimization; Interval analysis: 
subdivision directions in interval branch and bound 
methods; Interval analysis: systems of nonlinear equations; 
Interval analysis: unconstrained and constrained 
optimization; Interval analysis: verifying feasibility; 
Interval fixed point theory; Interval global optimization; 
Interval linear systems; Interval Newton methods) 
interval propagation 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
interval slopes 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: nondifferentiable problems) 
interval Taylor operator 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators) 
interval taylor operators see: Automatic differentiation: point 
and — 
interval-valued approximate inference 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
interval of variation of an eigenvalue of an interval matrix 
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: eigenvalue bounds of interval 
matrices) 
intervals see: floating point —; overlap of — 
INTLIB 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval) 
INTOPT_90 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators) 
intra-class distance 
[55R15, 55R35, 65K05, 90C11] 
(see: Deterministic and probabilistic optimization models 
for data classification) 
intractable problem
[90C60] 
(see: Complexity theory) 
introduction, history and overview see: Bilevel 
programming: — 
introduction, history and rounding error estimation see: 
Automatic differentiation: — 
iNV 
(see: Integrated planning and scheduling) 
invariance criterion see: scale — 
invariance model see: sign- — 
invariant see: scale- —; shift- — 
invariant convex 
[90C26] 
(see: Invexity and its applications) 
invariant set 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations) 
invariant under principal pivoting see: matrix class — 


invariants 

[05C85] 

(see: Directed tree networks) 
invariants see: structure — 
inventory control 

[49L20]

(see: Dynamic programming: inventory control) 
inventory control see: Dynamic programming: — 
inventory control problem 

[49L20]

(see: Dynamic programming: inventory control) 
inventory management 

[90B50] 

(see: Inventory management in supply chains) 
inventory management see: multistage — 
inventory management models see: multistage —; single
stage —

Inventory management in supply chains
(90B50)
(referred to in: Global supply chain models; Nonconvex
network flow problems; Operations research models for
supply chain management and design; Piecewise linear
network flow problems)
(refers to: Global supply chain models; Nonconvex network
flow problems; Operations research models for supply
chain management and design; Piecewise linear network
flow problems)
inventory models: (QR) policy see: Continuous review — 
Inventory Ordering see: zero- — 
inventory placement 

[90-02] 

(see: Operations research models for supply chain
management and design)
inventory routing 

[65H20, 65K05, 90-01, 90B06, 90B40, 90C10, 90C27, 90C35,
94C15]
(see: Greedy randomized adaptive search procedures;
Vehicle routing)
inventory routing problems see: Maritime — 
inventory ship routing problem 

(see: Maritime inventory routing problems) 
inventory systems 
[90-02]

(see: Operations research models for supply chain 
management and design) 

inventory and transportation decisions 

[90-02]

(see: Operations research models for supply chain 
management and design) 

inverse 

[15A15, 90C25, 90C55, 90C90]

(see: Semidefinite programming and determinant
maximization)
inverse see: approximate —; Moore-Penrose pseudo- —;
pseudo- —
inverse differentiability 

[90C15] 

(see: Derivatives of probability measures) 
inverse interpolation parametric eigenvalue formulation 

[90C30] 

(see: Large scale trust region problems) 
inverse problem of the calculus of variations
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]
(see: Variational principles) 
inverse problems see: formulation and solution of — 
inverse product of relations see: self- — 
inverse quasi-Newton updating 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
inverse relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
inverses see: generalized — 
inverted transformation 
[90C11, 90C90] 
(see: MINLP: trim-loss problem) 
investment see: maximization of return on —; venture 
capital — 
investment analysis 
[91B06, 91B60] 
(see: Financial applications of multicriteria analysis) 
investment decision 
[91B06, 91B60] 
(see: Financial applications of multicriteria analysis) 
investment decisions see: diversified — 
investments see: optimal — 


invex 

[90C26] 

(see: Invexity and its applications) 
invex see: generalized —; pseudo- —; quasi- —; V- — 
invex function 


[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]

(see: Variational principles) 

invex function see: pre- — 

invex set 

[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]

(see: Variational principles)

invex set see: pre- — 

Invexity and its applications 

(90C26)

(referred to in: Generalized concavity in multi-objective
optimization; L-convex functions and M-convex functions) 
(refers to: Generalized concavity in multi-objective 
optimization; Isotonic regression problems; L-convex 
functions and M-convex functions) 

invexity at a point 
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05] 
(see: Variational principles) 

invexity with respect to a set 
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05] 
(see: Variational principles) 

invexity with respect to a set see: pre- — 

iNVO 
(see: Integrated planning and scheduling) 

involutory operator 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 


involving QD-superpotentials see: convex variational
inequality for an elastostatic problem —; elastostatic
problem —; variational equality for an elastostatic
problem —
Ioffe-Burke local dualization
[46A20, 52A01, 90C30]
(see: Composite nonsmooth optimization) 
ions see: complementary — 
IP 
[68Q99]
(see: Branch and price: Integer programming with column 
generation) 
IPE 
[90C30]
(see: Large scale trust region problems) 
IPH function 
[90C26]
(see: Global optimization: envelope representation) 
iPMN 
[65K10, 90C33, 90C51]
(see: Generalizations of interior point methods for the 
linear complementarity problem) 
IPP 
[68Q25, 90C60]
(see: NP-complete problems and proof methodology) 
IQC
[93D09]
(see: Robust control) 
IQML method 
[65T40, 90C26, 90C30, 90C90]
(see: Global optimization methods for harmonic retrieval) 
irreducible components of a matrix 
[90C09, 90C10]
(see: Combinatorial matrix analysis) 
irreducible matrix 
[90C09, 90C10]
(see: Combinatorial matrix analysis) 
irreducible matrix see: inductive structure of an — 
irredundant constraint 
[90C05, 90C20]
(see: Redundancy in nonlinear programs) 
irregular operations 
[90B06, 90C06, 90C08, 90C35, 90C90]
(see: Airline optimization) 
irregular operations problem 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization) 
Ising glass model 
[90C27, 90C30] 
(see: Neural networks for combinatorial optimization) 
isodose 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
isolated local minimizer 
[90C26, 90C31] 
(see: Sensitivity and stability in NLP: continuity and 
differential stability; Smooth nonlinear nonconvex 
optimization) 
isolated stationary point 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations) 
isolation see: cluster —

isomorphic graphs 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 

isomorphism see: graph —; maximal similarity subtree —; 
maximal subtree —; maximum similarity subtree —; 
maximum subtree —; subtree — 

isomorphism problem see: graph — 

isotone Boolean function 
[90C09] 
(see: Inference of monotone boolean functions) 

isotone functions 
[41A30, 4799, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 

isotone inclusion function 

[65G20, 65G30, 65G40, 65K05, 90C30]

(see: Interval global optimization) 

isotone mapping see: ®- — 

isotone monotone Boolean function 

[90C09]

(see: Inference of monotone boolean functions) 

isotone operator 

[90C33]

(see: Order complementarity) 

isotone optimization 

[41A30, 62J02, 90C26]
(see: Regression by special functions: algorithms and 
complexity) 

isotone projection 
[90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem) 

isotone projection cone 
[90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem) 

isotonic function 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 

isotonic medium regression 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 

isotonic regression 
[41A30, 62G07, 62G30, 62J02, 65K05, 90C26] 
(see: Isotonic regression problems; Regression by special 
functions: algorithms and complexity) 

isotonic regression see: simple order — 

isotonic regression problem 
[41A30, 62G07, 62G30, 62J02, 65K05, 90C26] 
(see: Isotonic regression problems; Regression by special 
functions: algorithms and complexity) 

Isotonic regression problems 
(62G07, 62G30, 65K05) 
(referred to in: Generalized concavity in multi-objective 
optimization; Invexity and its applications; L-convex 
functions and M-convex functions; Regression by special 
functions: algorithms and complexity) 


(refers to: Regression by special functions: algorithms and 
complexity) 
isotonic regression problems see: algorithms for — 
isotonicity property 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval) 
issues in classification see: computational — 
Itakura-Saito divergence 
[90C05, 90C25] 
(see: Young programming) 
iterate see: dead-point — 
iterated local search 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem)
iterated tour partitioning algorithm see: K- — 
iterates see: feasible — 
iteration 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding)
iteration see: control vector —; direct —; fixed point —; 
Gauss-Seidel —; Gauss-Seidel value —; Gram-Schmidt 
type —; interval Newton —; Jacobi —; Petrov—Galerkin —; 
policy —; relative value —; Richardson —; value — 
iteration BCI see: Boundary condition — 
iteration CVI see: Control vector — 
iteration method see: boundary condition — 
iterations see: Local attractors for gradient-related descent — 
iterative algorithms see: asynchronous — 
iterative deepening 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
iterative dynamic programming 
[65L99, 90C30, 93-XX] 
(see: Boundary condition iteration BCI; Direct search 
Luus—Jaakola optimization procedure; Dynamic 
programming: optimal control applications; Optimization 
strategies for dynamic systems; Suboptimal control) 
iterative function system 
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
iterative linear equation-solving 
[90C25, 90C30] 
(see: Successive quadratic programming: solution by active
sets and interior point methods) 
iterative method 
[90C33]
(see: Linear complementarity problem)
iterative method 
[65H10, 65J15] 
(see: Contraction-mapping)
iterative method see: Chebyshev —; partially asynchronous — 
iterative model refinement 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and
transport) 
iterative quadratic maximum likelihood method 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval)
iterative regularization see: Tikhonov —
iterative regularization method 
[47J20, 49J40, 65K10, 90C33] 
(see: Solution methods for multivalued variational 
inequalities) 
iterative scheme 
[91B50] 
(see: Walrasian price equilibrium) 
iterative scheme see: chaotic — 
iterative solution algorithm see: incremental- — 
iterative solution of the Euclidean distance location problem 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 
IV see: Prolog — 
IVP 
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems) 


J 


J-normal primal problem 

[90C29, 90C30]

(see: Multi-objective optimization: lagrange duality) 

J-stable primal problem 

[90C29, 90C30]

(see: Multi-objective optimization: lagrange duality) 

j-VNS 

[9008, 90C26, 90C27, 90C59]

(see: Variable neighborhood search methods) 

Jacobi 

[49L20, 90C39] 

(see: Dynamic programming: discounted problems) 

Jacobi 

[90C30] 

(see: Cost approximation algorithms)

Jacobi algorithm 

[68W10, 90B15, 90C06, 90C30]

(see: Cost approximation algorithms; Stochastic network
problems: massively parallel solution) 

Jacobi-Bellman equation see: derivation of the Hamilton- —;
Hamilton- —; solution of the Hamilton- —; sufficiency
theorem for the Hamilton- —

Jacobi equation see: Hamilton- — 

Jacobi inequality see: Hamilton- — 

Jacobi iteration 
[15A23, 65F05, 65F20, 65F22, 65F25] 

(see: QR factorization) 

Jacobi method 
[90C33] 

(see: Linear complementarity problem) 

Jacobian see: accumulation of the —; approximate —; Clarke 
generalized —; preaccumulation of the — 

Jacobian consistency property 
[90C30, 90C33] 

(see: Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities) 

Jacobian matrix 
[65K05, 90C30] 


(see: Automatic differentiation: calculation of Newton 
steps) 

Jacobian matrix 
[65K05, 90C25, 90C30] 
(see: Automatic differentiation: calculation of Newton steps; 
Successive quadratic programming: full space methods; 
Successive quadratic programming: solution by active sets 
and interior point methods) 

Jacobians, and Hessians see: Complexity of gradients — 

James-Stein estimators 

[91B28]

(see: Portfolio selection: markowitz mean-variance model) 

James sup theorem 

[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]

(see: Minimax theorems) 

Jaynes 

[94A17]

(see: Jaynes’ maximum entropy principle) 

Jaynes entropy concentration theorem 

[94A17]

(see: Jaynes’ maximum entropy principle) 

Jaynes maximum entropy 

[62F10, 94A17]
(see: Entropy optimization: parameter estimation) 

Jaynes’ maximum entropy principle 
(94A17) 
(referred to in: Entropy optimization: interior point 
methods; Entropy optimization: parameter estimation; 
Entropy optimization: shannon measure of entropy and its 
properties; Maximum entropy principle: image 
reconstruction) 
(refers to: Entropy optimization: interior point methods; 
Entropy optimization: parameter estimation; Entropy 
optimization: shannon measure of entropy and its 
properties; Maximum entropy principle: image 
reconstruction) 


Jensen’s inequality
[90C15] 
(see: Multistage stochastic programming: barycentric 
approximation; Stochastic linear programs with recourse 
and arbitrary multivariate distributions) 
Jensen lower bound 
[90C15]
(see: Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 
Jerrum conjecture 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization) 
JJT-regular problem 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31,
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following 
and singularities) 
job-shop 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
job-shop 
[90B35] 
(see: Job-shop scheduling problem) 
job-shop problem
[62C10, 65K05, 90B35, 90C10, 90C15, 90C26] 
(see: Bayesian global optimization; Job-shop scheduling 
problem) 
Job-shop scheduling problem 
(90B35) 
(referred to in: MINLP: design and scheduling of batch 
processes; Vehicle scheduling) 
(refers to: Complexity classes in optimization; Complexity 
theory; Genetic algorithms; Integer programming: branch 
and bound methods; Integer programming: cutting plane 
algorithms; Linear programming; MINLP: design and 
scheduling of batch processes; Simulated annealing; 
Stochastic scheduling; Vehicle scheduling) 
John see: Von Neumann — 
John conditions see: Fritz — 
John generalized conditions see: Fritz — 
John necessary optimality conditions see: Fritz —
John rule see: Fritz — 
John system see: Fritz — 
John type condition see: Fritz — 
Johnson linearization see: Adams- — 
join 
[90C35] 
(see: Multi-index transportation problems) 
join procedure 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
joined sets 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems) 
joining see: neighbor — 
joint continuity 
[90C11, 90C15, 90C31]
(see: Stochastic integer programming: continuity, stability, 
rates of convergence) 
joint probabilistic constraint 
[90C15]
(see: Static stochastic programming models) 
jointly convex problem 
[90C31]
(see: Sensitivity and stability in NLP: continuity and 
differential stability) 
Jones see: Lennard- — 
Jones microcluster see: Lennard- — 
Jones and morse clusters see: Global optimization in 
Lennard- — 
Jones potential energy function see: Lennard- — 
Jongen-Jonker-Twilt see: problem regular in the sense of — 
Jonker-Twilt see: problem regular in the sense of Jongen- — 
Jordan-Hahn decomposition 
[90C15] 
(see: Derivatives of probability measures) 
Joseph-Louis see: Lagrange — 
journal of Heuristics 
[68T20, 68T99, 9008, 90C26, 90C27, 90C59] 
(see: Metaheuristics; Variable neighborhood search 
methods) 
Journal of Management Mathematics see: IMA —
Journal of Operational Research see: European —


judgemental forecast 
[90C26, 90C30] 
(see: Forecasting) 
judgment see: pairwise — 
judgment matrix see: consistent — 
judgments see: incomplete — 
jump across a fault 
[90Cxx] 
(see: Discontinuous optimization) 
jump direction 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
jump system see: finite — 
jumps of optimal solutions 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
junction nodes see: physical — 
Jünger-Mutzel branch and cut algorithm
[90C10, 90C27, 94C15] 
(see: Graph planarization) 
justice see: rule of — 
justification see: chain —; left-chain —; right-chain — 


K 


K see: commutator —; multivariable stability margin — 
k-CNF 

[03B50, 68T15, 68T30] 

(see: Finite complete systems of many-valued logic algebras)
k-CNF see: SAT- — 

k-colorable 

[05C15, 05C62, 05C69, 05C85, 90C27, 90C35, 90C59] 
(see: Graph coloring; Optimization problems in unit-disk
graphs) 

K-convex function 

[49L20]

(see: Dynamic programming: inventory control)
K-convexity 

[49L20]

(see: Dynamic programming: inventory control)
k-dimensional cube connected cycle 

[90C35] 

(see: Feedback set problems)

k-exchange neighborhood 

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem)

k-face 

[05B35, 20F36, 20F55, 52C35, 57N65] 

(see: Hyperplane arrangements in optimization) 
k-index transportation problem

[90C35] 

(see: Multi-index transportation problems)
interchange heuristics 

[05C69, 05C85, 68W01, 90C59] 

see: Heuristics for maximum clique and independent set) 
K-iterated tour partitioning algorithm
[68T99, 90C27]
(see: Capacitated minimum spanning trees)
K-L type neighborhood structure for the QAP
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem) 
k-level crossing minimization
[90C35]
(see: Optimization in leveled graphs)
k-level hierarchy
[90C35]
(see: Optimization in leveled graphs)
k-level planarization problem
[90C35]
(see: Optimization in leveled graphs)
k-leveled graph
[90C35]
(see: Optimization in leveled graphs)
k-leveled graph see: proper — 
K-local bilinear form 
[90C33] 
(see: Order complementarity)
K-local inner product 
[90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem) 
K-majorant see: smallest — 
k-means criterion 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
K-minorant see: greatest — 
k-neighbor 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 
k-neighborhood 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
k-Opt 
[68Q25, 68R10, 68W40, 90B06, 90B35, 90C06, 90C10, 90C27, 
90C39, 90C57, 90C59, 90C60, 90C90] 
(see: Domination analysis in combinatorial optimization;
Traveling salesman problem)
k-optimality
[68Q20]
(see: Optimal triangulations)
k-relations
[05C85]
(see: Directed tree networks)
k-restrictive 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 
k-restrictive multilayer 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis)
k-separation
[90C09, 90C10]
(see: Matroids)
k-set-contraction
[90C33]
(see: Order complementarity)
k-Steiner ratio 
[90C27] 
(see: Steiner tree problems)
k-tree 
[90C27] 
(see: Steiner tree problems) 
k-way graph partitioning problem 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
k-way polytope 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
k-way table 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
k-way transportation polytope 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
Kaczmarz method
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
Kalman filter
(see: Bayesian networks) 
Kalmanson matrix 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
Kantorovich 
[01A99]
(see: Kantorovich, Leonid Vitalyevich) 
Kantorovich-Karush-Kuhn-Tucker equations 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 
Kantorovich, Leonid Vitalyevich 
(01A99) 
(referred to in: History of optimization; Linear 
programming) 
(refers to: History of optimization; Linear programming) 
Kantorovich scheme 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
Kanzow-Smale function see: Chen-Harker- — 
Karmarkar algorithm 
[90C05]
(see: Linear programming: karmarkar projective algorithm)
Karmarkar method
[65K05, 65K10]
(see: ABS algorithms for optimization)
Karmarkar potential function
[37A35, 90C05]
(see: Potential reduction methods for linear programming) 
karmarkar projective algorithm see: Linear programming: — 
Karush-Kuhn-Tucker conditions 
[90C15, 90C26, 90C33]
(see: Stochastic bilevel programs)
Karush-Kuhn-Tucker conditions
[90C30]
(see: Convex-simplex algorithm)
Karush-Kuhn-Tucker conditions see: generalized —
Karush-Kuhn-Tucker equations see: Kantorovich- —
Karush-Kuhn-Tucker optimality conditions
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility 
index) 
Karush-Kuhn-Tucker point 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods) 
Karush-Kuhn-Tucker point 
[58E05, 90C26, 90C30] 
(see: Bilevel optimization: feasibility test and flexibility 
index; Topology of global optimization) 
Karush-Kuhn-Tucker type condition 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 
Karwan Kth-best algorithm see: Bialas- —
Kataoka principle 
[90C15] 
(see: Stochastic programming models: random objective) 
Kaucher arithmetic 
[65G30, 65G40, 65K05, 90C30, 90C57] 
(see: Global optimization: interval analysis and balanced 
interval arithmetic) 
Kaufman-Broeckx linearization 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
Kaufmann-Stewart reorthogonalized Gram-Schmidt 
algorithm see: Daniel-Gragg- — 
kB-consistency 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
Kiefer-Wolfowitz method
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 
Kelley’s classical cutting plane method 
[46N10, 90-00, 90C47] 
(see: Nondifferentiable optimization) 
Kelley cutting plane method 
[65K05, 90C30] 
(see: Nondifferentiable optimization: minimax problems) 
Kelley method 
[65K05, 90C30] 
(see: Nondifferentiable optimization: minimax problems) 
kernel 
[49-01, 49K45, 49N10, 90-01, 90C20, 90C27, 91B52]
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs) 
kernel see: Markov — 
kernel estimates 
[90C15] 
(see: Extremum problems with probability functions: kernel 
type solution methods) 
kernel function 
[90C30, 90C33] 
(see: Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities) 
kernel of a graph 
[90-XX] 
(see: Outranking methods) 
kernel transformation 
[55R15, 55R35, 65K05, 90C11]
(see: Deterministic and probabilistic optimization models 
for data classification) 
kernel type solution methods see: Extremum problems with 
probability functions: — 
Kernighan neighborhood see: Lin- — 
key variables 
[49M25, 90-08, 90C05, 90C06, 90C08, 90C15] 
(see: Simple recourse problem: primal method) 
keys method see: random — 
keywords 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
KH-regular problem 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following 
and singularities) 
Kharitonov theorem 
[49M37, 65K10, 90C26, 90C30] 
(see: αBB algorithm)
Kimura maximum principle 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization)
kinematically admissible displacement
[90C25, 90C27, 90C90]
(see: Semidefinite programming and structural
optimization)
kinetic coefficients see: estimation of — 
kinetically admissible space 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
kinetics and transport see: Identification methods for 
reaction — 
Kirchhoff-condition 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
KKT 
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs)
KKT-based method
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)
KKT conditions 
[49M37, 65K05, 90C26, 90C30, 90C39] 
(see: Bilevel optimization: feasibility test and flexibility 
index; Equality-constrained nonlinear programming: KKT 
necessary optimality conditions; Second order optimality 
conditions for nonlinear optimization) 
KKT conditions 
[90C26, 90C30, 90C33, 90C39] 
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach; Second order optimality 
conditions for nonlinear optimization) 
KKT conditions see: first order — 
KKT necessary optimality conditions
[49M37, 65K05, 90C30]
(see: Equality-constrained nonlinear programming: KKT
necessary optimality conditions)
KKT necessary optimality conditions see: Equality-constrained 
nonlinear programming: — 
KKT optimality conditions 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
KKT point
[90C60]
(see: Complexity theory: quadratic programming)
KKT point see: global minimum —
KKT points
[58E05, 90C30]
(see: Topology of global optimization)
KKT points see: successive improvement of —
KKT stationarity conditions
[65K05, 90C20]
(see: Quadratic programming with bound constraints)
Klee-Minty examples
[90C05]
(see: Linear programming: Klee-Minty examples)
Klee-Minty examples see: Linear programming: — 
Kleene-Dienes implication 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
Klein 4-element group 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
Klötzler duality
[49K05, 49K10, 49K15, 49K20]
(see: Duality in optimal control with first order differential
equations)
knapsack see: fractional 0-1 —; multiconstraint —; 
multidimensional —; multiple choice —; Quadratic — 
knapsack constraint 
[68Q99, 90C10, 90C27] 
(see: Branch and price: Integer programming with column 
generation; Multidimensional knapsack problems) 
knapsack constraint 
[90C20, 90C60] 
(see: Quadratic knapsack) 
knapsack cut 
[90C11] 
(see: MINLP: branch and bound methods) 
knapsack problem 
[05-04, 62C10, 65K05, 68Q99, 90C05, 90C10, 90C11, 90C15, 
90C26, 90C27, 90C29, 90C30, 90C57, 90C90] 
(see: Bayesian global optimization; Branch and price: 
Integer programming with column generation; Chemical 
process planning; Evolutionary algorithms in 
combinatorial optimization; Integer programming; 
Kuhn-Tucker optimality conditions; Multi-objective 
combinatorial optimization; Simplicial pivoting algorithms 
for integer programming) 
knapsack problem 
[90B90, 90C05, 90C10, 90C59, 90C60]
(see: Complexity theory: quadratic programming; 
Cutting-stock problem; Simplicial pivoting algorithms for 
integer programming) 
knapsack problem see: bi- —; bidimensional —; convex 
quadratic —; linear multiple-choice —; m-dimensional —; 
multi- —; multiconstraint —; multidimensional —; 
multidimensional multiple-choice —; multidimensional 
zero-one —; multiple —; multiple-choice —; zero-one — 
knapsack problems 
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and cut algorithms)
knapsack problems see: Multidimensional —
Knaster-Kuratowski-Mazurkiewicz lemma
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems)
Knizhnik-Zamolodchikov differential equation
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements) 
knowledge 
(see: Planning in the process industry) 
knowledge see: algorithmic —; declarative —; state of — 
knowledge-based NP methods 
[90C11, 90C59] 
(see: Nested partitions optimization) 
knowledge of a probability distribution see: incomplete — 
Kohout compatibility theorem see: Bandler- — 
Kojima function 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following 
and singularities) 
Kojima-Hirabayashi see: problem regular in the sense of —
Kojima-Shindo method 
[90C30, 90C33] 
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach) 
Kolmogorov-Chaitin complexity see: Solomonoff- — 
Kolmogorov complexity 
(90C60) 
(referred to in: Complexity classes in optimization; 
Complexity of degeneracy; Complexity of gradients, 
Jacobians, and Hessians; Complexity theory; Complexity 
theory: quadratic programming; Computational 
complexity theory; Fractional combinatorial optimization; 
Information-based complexity and information-based 
optimization; Mixed integer nonlinear programming; 
NP-complete problems and proof methodology; Parallel 
computing: complexity classes) 
(refers to: Complexity classes in optimization; Complexity of 
degeneracy; Complexity of gradients, Jacobians, and 
Hessians; Complexity theory; Complexity theory: quadratic 
programming; Computational complexity theory; 
Fractional combinatorial optimization; Information-based 
complexity and information-based optimization; Mixed 
integer nonlinear programming; Parallel computing: 
complexity classes) 
Kolmogorov complexity 
[90C60] 
(see: Kolmogorov complexity) 
Kolmogorov complexity see: conditional — 
Kolmogorov ε-entropy
[01A60, 03B30, 54C70, 68Q17] 
(see: Hilbert’s thirteenth problem) 
Koopmans-Beckmann QAP 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
Koopmans-Beckmann quadratic assignment problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
KP 
[90B06, 90B10, 90C26, 90C35] 
(see: Minimum concave transportation problems) 
Kramers equation see: Smoluchowski- —
Krawczyk method 
[65G20, 65G30, 65G40, 65H20, 65K99] 
(see: Interval fixed point theory; Interval Newton methods) 
Krawczyk variation of the interval Newton method 
[65G20, 65G30, 65G40] 
(see: Interval analysis: systems of nonlinear equations) 
Krein-Milman theorem 
(90C05) 
(referred to in: Carathéodory theorem; Linear 
programming) 
(refers to: Carathéodory theorem; Linear programming) 
Kreisselmeier-Steinhauser function 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
Kremser equation 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks) 
Kruskal algorithm 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
Kruskal algorithm see: modified — 
Kruskal Tp statistic see: Goodman- — 
Krylov space type methods 
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least
squares)
KSM
[90C15]
(see: Extremum problems with probability functions: kernel
type solution methods)
KT conditions
[90C30]
(see: Kuhn-Tucker optimality conditions)
KT conditions see: nonstoichiometric form of —;
stoichiometric form of —
KT equations
[65K05, 65K10]
(see: ABS algorithms for optimization)
KT point
[90C30]
(see: Nonlinear least squares problems)
KT point
[90C30]
(see: Kuhn-Tucker optimality conditions)
Kth-best algorithm see: Bialas-Karwan —
kth directional derivative
[65K05, 90C30]
(see: Minimax: directional differentiability)
kth order form of coordinates
[65K05, 90C30]
(see: Minimax: directional differentiability)
kth order hypodifferential
[65K05, 90C30]
(see: Minimax: directional differentiability)
Kuhn-Tucker approach
[49-01, 49K10, 49M37, 90-01, 90C05, 90C27, 91B52]
(see: Bilevel linear programming)
Kuhn-Tucker conditions
[90C25, 90C30]
(see: Successive quadratic programming: full space
methods)

Kuhn-Tucker conditions 

[90C20, 90C30] 
(see: Successive quadratic programming; Successive 
quadratic programming: decomposition methods) 

Kuhn-Tucker conditions see: generalized Karush- —; 
Karush- — 

Kuhn-Tucker conditions for quadratic programming
sub-problems
[90C25, 90C30]
(see: Successive quadratic programming: full space
methods)

Kuhn-Tucker CQ
[49K27, 49K40, 90C30, 90C31]
(see: First order constraint qualifications)

Kuhn-Tucker equations
[65K05, 65K10]
(see: ABS algorithms for optimization)
Kuhn-Tucker equations see: Kantorovich-Karush- —
Kuhn-Tucker necessary optimality conditions
[65F10, 65F50, 65H10, 65K10]
(see: Globally convergent homotopy methods)

Kuhn-Tucker optimality condition
[90C30]
(see: Nonlinear least squares problems)

Kuhn-Tucker optimality conditions
(90C30)
(referred to in: Equality-constrained nonlinear
programming: KKT necessary optimality conditions; First
order constraint qualifications; High-order necessary
conditions for optimality for abnormal points; Implicit
lagrangian; Inequality-constrained nonlinear optimization;
Lagrangian duality: BASICS; Rosen’s method, global
convergence, and Powell’s conjecture; Saddle point theory
and optimality conditions; Second order constraint
qualifications; Second order optimality conditions for
nonlinear optimization)
(refers to: Equality-constrained nonlinear programming:
KKT necessary optimality conditions; Farkas lemma; First
order constraint qualifications; High-order necessary
conditions for optimality for abnormal points; Implicit
lagrangian; Inequality-constrained nonlinear optimization;
Lagrangian duality: BASICS; Rosen’s method, global
convergence, and Powell’s conjecture; Saddle point theory
and optimality conditions; Second order constraint
qualifications; Second order optimality conditions for
nonlinear optimization)

Kuhn-Tucker optimality conditions see: Karush- — 
Kuhn-Tucker point
[90C30]
(see: Rosen’s method, global convergence, and Powell’s
conjecture)

Kuhn-Tucker point see: Karush- —

Kuhn-Tucker points see: multiple —; multiple QP — 
Kuhn-Tucker type condition see: Karush- — 
Kullback-Leibler cross-entropy
[90C25, 94A17]
(see: Entropy optimization: shannon measure of entropy
and its properties)
Kullback-Leibler divergence
[15A15, 90C25, 90C55, 90C90]
(see: Semidefinite programming and determinant
maximization)

Kullback-Leibler measure of cross-entropy
[62F10, 94A17]
(see: Entropy optimization: parameter estimation)
Kuratowski convergence see: discrete Painlevé- — 
Kuratowski-Mazurkiewicz lemma see: Knaster- — 
kurtosis
[90C26, 90C90]
(see: Signal processing with higher order statistics)
Ky Fan function
[90C15]
(see: Stochastic quasigradient methods in minimax
problems)


L 


(I) see: tie-up-time — 
L-BFGS 
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
L-convex function
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions) 
L2-convex function 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
-convex function 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions)
L-convex functions see: Fenchel-type duality for M- and — 
L-convex functions and M-convex functions 
(90C27, 90C25, 90C10, 90C35)
(referred to in: Invexity and its applications)
(refers to: Generalized concavity in multi-objective
optimization; Invexity and its applications; Isotonic 
regression problems) 
L-convex set
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions)
L-convexity
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions)
L -distance
[90B80, 90C27]
(see: Voronoi diagrams in facility location)
L1-distance
[90B80, 90C27]
(see: Voronoi diagrams in facility location)
L∞-distance
[90B80, 90C27]
(see: Voronoi diagrams in facility location)
Lp-distance
[90B80, 90C27]
(see: Voronoi diagrams in facility location)
Lp-distance
[90B80, 90B85, 90C27]
(see: Single facility location: multi-objective rectilinear
distance location; Voronoi diagrams in facility location)
ℓ1 estimation problem
[65D10, 65K05]
(see: Overdetermined systems of linear equations)
ℓ1 exact penalty function
[90Cxx]
(see: Discontinuous optimization)
L-matrix
[90C09, 90C10]
(see: Combinatorial matrix analysis)
L1-norm
[62H30, 90C39]
(see: Dynamic programming in clustering)
L2-norm
[62H30, 90C39]
(see: Dynamic programming in clustering)
L∞-norm
[62H30, 90C39]
(see: Dynamic programming in clustering)
ℓ1 penalty function
[90C30]
(see: Nonlinear least squares problems)
L∞-penalty function see: exact —
L-R flat fuzzy number
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming)
L-R fuzzy number
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming)
L-RH-BFGS
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
L-separation theorem
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions)
L-shaped decomposition
[68W10, 90B15, 90C06, 90C15, 90C30]
(see: Stochastic network problems: massively parallel
solution; Stochastic programs with recourse: upper bounds)
L-shaped method
[90C06, 90C15, 90C90] 
(see: Decomposition algorithms for the solution of 
multistage mean-variance optimization problems; L-shaped 
method for two-stage stochastic programs with recourse; 
Stochastic linear programming: decomposition and cutting 
planes) 

L-shaped method see: integer — 

L-shaped method for two-stage stochastic programs with 
recourse 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; Multistage stochastic programming: 
barycentric approximation; Preprocessing in stochastic 
programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programming: 
quasigradient method; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; Multistage stochastic programming: 
barycentric approximation; Preprocessing in stochastic 
programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes;
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

L type neighborhood structure for the QAP see: K- — 

Ltree of unused partitions see: set — 

Lreac of used partitions see: set — 

L, (v)-differentiable family of measures see: weakly — 

La Garza method see: De — 

label see: distance — 

label carrying
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)

label correcting methods
[90B10, 90C27]
(see: Shortest path tree algorithms)

label setting methods
[90B10, 90C27]
(see: Shortest path tree algorithms)

labeling
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)

labeling see: consistent —; distance constrained —; integer —; 
vector — 

labeling algorithm see: consistent —; relaxation — 

labeling procedure
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules)

labeling processes see: relaxation —

labelings
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)

lack of smoothness
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians)

Lagrange duality see: Multi-objective optimization: —; 
saddle — 

Lagrange equation
[90C30]
(see: Image space approach to optimization)
Lagrange equation see: dual Euler- —; Euler- —

Lagrange equations
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization)
Lagrange form
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)
Lagrange function
[49K27, 49K40, 90C30, 90C31, 90C34, 90C90]
(see: MINLP: applications in blending and pooling
problems; Second order constraint qualifications;
Semi-infinite programming: second order optimality
conditions)

Lagrange function 
[90C05, 90C30] 
(see: Image space approach to optimization; Theorems of 
the alternative and optimization) 

Lagrange functional 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 

Lagrange functions see: augmented — 

Lagrange, Joseph-Louis 
(01A99)
(referred to in: Decomposition techniques for MILP: 
lagrangian relaxation; Integer programming: lagrangian 
relaxation; Lagrangian multipliers methods for convex 
programming; Multi-objective optimization: lagrange 
duality) 
(refers to: Decomposition techniques for MILP: lagrangian 
relaxation; Integer programming: lagrangian relaxation; 
Lagrangian multipliers methods for convex programming; 
Multi-objective optimization: lagrange duality) 

Lagrange multiplier 
[90C29, 90C30, 90C33] 
(see: Implicit lagrangian; Multi-objective optimization; 
Interactive methods for preference value functions) 

Lagrange multiplier approach see: Everett generalized — 

Lagrange multiplier rule
[49K27, 90C26, 90C29, 90C48]
(see: Set-valued optimization; Smooth nonlinear nonconvex
optimization)

Lagrange multiplier rule
[90C26]
(see: Smooth nonlinear nonconvex optimization)

Lagrange multiplier rule see: global —

Lagrange multiplier sets
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric
programming)

Lagrange multiplier vector
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)

Lagrange multipliers
[01A99, 49M37, 65G20, 65G30, 65G40, 65H20, 65K05, 65K10,
90B85, 90C05, 90C10, 90C22, 90C25, 90C30, 90C31, 93A13]
(see: Equality-constrained nonlinear programming: KKT
necessary optimality conditions; Image space approach to
optimization; Integer programming: lagrangian relaxation;
Interval analysis: verifying feasibility; Lagrange,
Joseph-Louis; Lagrangian multipliers methods for convex
programming; Multilevel methods for optimal design;
Semidefinite programming: optimality conditions and
stability; Sensitivity and stability in NLP: approximation;
Sensitivity and stability in NLP: continuity and differential
stability; Single facility location: multi-objective euclidean
distance location; Theorems of the alternative and
optimization)

Lagrange multipliers 
[01A99, 90C05, 90C10, 90C15, 90C22, 90C25, 90C30, 90C31]
(see: Image space approach to optimization; Integer 
programming: lagrangian relaxation; Lagrange, 
Joseph-Louis; Semidefinite programming: optimality 
conditions and stability; Stochastic programming: 
nonanticipativity and lagrange multipliers; Theorems of 
the alternative and optimization) 

Lagrange multipliers see: extended set of —; Stochastic 
programming: nonanticipativity and — 

Lagrange multipliers for nonanticipativity constraints 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
lagrange multipliers) 

Lagrange multipliers for phase constraints 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
lagrange multipliers) 

Lagrange relaxation 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

Lagrange-Slater dual see: extended — 

Lagrange-type functions 
(90C30, 90C26, 90C46) 

Lagrange-type functions 
[90C26, 90C30, 90C46] 
(see: Lagrange-type functions) 

Lagrangian 
[05C60, 05C69, 05C85, 37B25, 46A20, 49M37, 52A01, 65K05, 
68W01, 90C15, 90C20, 90C22, 90C25, 90C27, 90C30, 90C31, 
90C34, 90C35, 90C59, 91A22] 
(see: Composite nonsmooth optimization; Duality for 
semidefinite programming; Heuristics for maximum clique 
and independent set; Inequality-constrained nonlinear 
optimization; Replicator dynamics in combinatorial 
optimization; Semidefinite programming: optimality 
conditions and stability; Semi-infinite programming: 
second order optimality conditions; Sensitivity and stability 
in NLP: continuity and differential stability; Stochastic 
programming: nonanticipativity and lagrange multipliers) 

Lagrangian see: augmented —; Implicit —; modified —; 
MPEC —; quadratic —; reduced Hessian of a —; restricted 
implicit —; saddle —; sub —; unconstrained implicit —; 
vector valued — 

Lagrangian bounds
[90C25, 90C26]
(see: Decomposition in global optimization)

Lagrangian conditions
[90C26]
(see: Invexity and its applications)

Lagrangian decomposition
[90C30, 90C90]
(see: Decomposition techniques for MILP: lagrangian
relaxation)

Lagrangian decomposition approach see: augmented — 
Lagrangian dual
[90C10, 90C30, 90C90]
(see: Decomposition techniques for MILP: lagrangian
relaxation; Integer programming: lagrangian relaxation)
Lagrangian dual
[90C10, 90C30]
(see: Integer programming: lagrangian relaxation)
Lagrangian dual optimization problem
[90C30]
(see: Lagrangian duality: BASICS)

Lagrangian dual problem
[90C10, 90C30, 90C46]
(see: Integer programming duality; Lagrangian duality:
BASICS)

Lagrangian duality
[90C10, 90C46]
(see: Integer programming duality)

Lagrangian duality
[90C30]
(see: Duality for semidefinite programming)

Lagrangian duality: BASICS 

(90C30) 

(referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; First 
order constraint qualifications; Implicit lagrangian; 
Inequality-constrained nonlinear optimization; 
Kuhn-Tucker optimality conditions; Rosen’s method, 
global convergence, and Powell’s conjecture; Saddle point 
theory and optimality conditions; Second order constraint 
qualifications; Second order optimality conditions for 
nonlinear optimization) 

(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; Farkas lemma; 
Farkas lemma: generalizations; First order constraint 
qualifications; Inequality-constrained nonlinear 
optimization; Kuhn-Tucker optimality conditions; Rosen’s 
method, global convergence, and Powell’s conjecture; 
Saddle point theory and optimality conditions; Second 
order constraint qualifications; Second order optimality 
conditions for nonlinear optimization) 

Lagrangian finite generation method 

[90C15]

(see: L-shaped method for two-stage stochastic programs 
with recourse) 

Lagrangian form 

[49-XX, 90-XX, 93-XX]

(see: Duality theory: monoduality in convex optimization) 
Lagrangian function 

[90C05, 90C20, 90C25, 90C26, 90C30, 90C39, 90C90]

(see: Image space approach to optimization; Lagrangian 
duality: BASICS; Second order optimality conditions for 
nonlinear optimization; Smooth nonlinear nonconvex 
optimization; Successive quadratic programming; 
Successive quadratic programming: applications in 
distillation systems; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: full space methods; Theorems of the 
alternative and optimization) 

Lagrangian function 

[90C20, 90C30, 90C90] 

(see: Successive quadratic programming; Successive
quadratic programming: applications in distillation
systems; Successive quadratic programming:
decomposition methods)

Lagrangian function see: augmented —; Hessian matrix of 
a —; infimum of a —; projected Hessian matrix of a — 

Lagrangian Hessian matrix see: projected — 

Lagrangian methods see: Practical augmented — 

Lagrangian multipliers 

[90C05, 90C30] 

(see: Theorems of the alternative and optimization)

Lagrangian multipliers 

[90C25, 90C30] 

(see: Lagrangian multipliers methods for convex
programming)

Lagrangian multipliers methods for convex programming 

(90C25, 90C30)
(referred to in: Convex max-functions; Decomposition 
techniques for MILP: lagrangian relaxation; Integer 
programming: lagrangian relaxation; Lagrange, 
Joseph-Louis; Multi-objective optimization: lagrange 
duality; Splitting method for linear complementarity 
problems) 
(refers to: Convex max-functions; Decomposition 
techniques for MILP: lagrangian relaxation; Integer 
programming: lagrangian relaxation; Lagrange, 
Joseph-Louis; Multi-objective optimization: lagrange 
duality) 

Lagrangian problems see: high-order local maximum principle 
for — 


Lagrangian relaxation 
[49J52, 68Q99, 90C10, 90C11, 90C15, 90C27, 90C30, 90C35, 
90C46, 90C57, 90C90] 
(see: Branch and price: Integer programming with column 
generation; Decomposition techniques for MILP: 
lagrangian relaxation; Integer programming; Integer 
programming duality; Integer programming: lagrangian 
relaxation; Multicommodity flow problems; Multi-index 
transportation problems; Nondifferentiable optimization: 
relaxation methods; Stochastic integer programs) 
Lagrangian relaxation 
[49J52, 90B80, 90C10, 90C30, 90C90] 
(see: Decomposition techniques for MILP: lagrangian 
relaxation; Facilities layout problems; Integer 
programming: lagrangian relaxation; Nondifferentiable 
optimization: relaxation methods) 
lagrangian relaxation see: Decomposition techniques for 
MILP: —; Integer programming: — 
Lagrangian relaxation with subgradient optimization 
[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 
Lagrangian theory 
[46A20, 52A01, 90C30] 
(see: Composite nonsmooth optimization) 
Lagrangian theory of CNSO problems see: second order — 
Lagrangians see: augmented — 
A see: canonical function associated with — 
laminar family 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
laminated composite materials
[90C26, 90C90] 
(see: Structural optimization: history) 
Lanczos algorithm 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
Lanczos method see: implicit restarted — 
Landau notation 
[49K27, 49K40, 90C30, 90C31] 
(see: Second order constraint qualifications) 
Langevin equation 
[65K05, 90C30] 
(see: Random search methods) 
language 
[90C60] 
(see: Complexity classes in optimization) 
language see: algebraic modeling —; algorithmic —; 
declarative —; F-complete —; F-hard —; modeling — 
language accepted by a Turing machine 
[90C60] 
(see: Complexity classes in optimization)
language and constraint logic programming see: modeling — 
language recognition problem 
[90C60] 
(see: Complexity classes in optimization; Complexity
theory) 
language recognition problems 
[90C60] 
(see: Complexity classes in optimization) 
languages see: algebraic modeling —; declarative —; second 
generation modeling — 
languages in optimization: a new paradigm see: Modeling — 
LAPACK 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
Laplace method and applications to optimization problems 
laplace’s principle of insufficient reason 
[94A17] 
(see: Jaynes’ maximum entropy principle) 
Laplace principle of insufficient reasoning 
[90C25, 94A17]
(see: Entropy optimization: shannon measure of entropy 
and its properties) 
large see: sufficiently — 
large collections of documents see: classification of — 
large nonlinear multicommodity flow problems 
[90C30] 
(see: Simplicial decomposition) 
large region network 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems)
large residual 
[90C30] 
(see: Nonlinear least squares problems)
large residual problem 
[90C30] 
(see: Generalized total least squares)
large-scale combinatorial optimization 
(see: Selection of maximally informative genes) 


large-scale least squares problems see: Complexity and — 

large scale linear systems 

[90C30]

(see: Conjugate-gradient methods) 

large-scale neighborhoods 

[68T20, 68T99, 90C27, 90C59]

(see: Metaheuristics) 

large scale nonlinear mixed integer programming problem 

[90B05, 90B06]

(see: Global supply chain models) 

large scale optimization 

[01A99]

(see: History of optimization) 

large scale problem 

[90C06]

(see: Large scale unconstrained optimization) 

large scale and sparse semidefinite programs see: Solving — 

large scale trust region 

[90C30]

(see: Large scale trust region problems) 

large scale trust region problem 

[90C30]
(see: Large scale trust region problems) 

Large scale trust region problems 
(90C30) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; Cholesky factorization; 
Conjugate-gradient methods; Interval linear systems; Large 
scale unconstrained optimization; Local attractors for 
gradient-related descent iterations; Nonlinear least squares: 
Newton-type methods; Nonlinear least squares: trust region 
methods; Orthogonal triangularization; Overdetermined 
systems of linear equations; QR factorization; Solving large 
scale and sparse semidefinite programs; Symmetric systems 
of linear equations) 
(refers to: ABS algorithms for linear equations and linear 
least squares; Cholesky factorization; Conjugate-gradient 
methods; D.C. programming; Integer programming; 
Interval linear systems; Large scale unconstrained 
optimization; Linear programming; Lipschitzian operators 
in best approximation by bounded or continuous functions; 
Local attractors for gradient-related descent iterations; 
Nonlinear least squares: Newton-type methods; Nonlinear 
least squares: trust region methods; Orthogonal 
triangularization; Overdetermined systems of linear 
equations; QR factorization; Solving large scale and sparse 
semidefinite programs; Symmetric systems of linear 
equations) 

Large scale unconstrained optimization 
(90C06) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; Broyden family of methods and the 
BFGS update; Cholesky factorization; Conjugate-gradient 
methods; Continuous global optimization: models, 
algorithms and software; Interval linear systems; Large 
scale trust region problems; Modeling languages in 
optimization: a new paradigm; Optimization software; 
Orthogonal triangularization; Overdetermined systems of 
linear equations; QR factorization; Solving large scale and 
sparse semidefinite programs; Symmetric systems of linear 
equations; Unconstrained nonlinear optimization: 
Newton-Cauchy framework; Unconstrained optimization 
in neural network training) 
(refers to: ABS algorithms for linear equations and linear 
least squares; Broyden family of methods and the BFGS 
update; Cholesky factorization; Conjugate-gradient 
methods; Continuous global optimization: models, 
algorithms and software; Interval linear systems; Large scale 
trust region problems; Linear programming; Modeling 
languages in optimization: a new paradigm; Nonlinear least 
squares: trust region methods; Optimization software; 
Orthogonal triangularization; Overdetermined systems of 
linear equations; QR factorization; Solving large scale and 
sparse semidefinite programs; Symmetric systems of linear 
equations; Unconstrained nonlinear optimization: 
Newton-Cauchy framework; Unconstrained optimization 
in neural network training) 

largest coefficient pivoting rule see: Dantzig — 

largest coefficient rule 

[90C05]

(see: Linear programming: Klee-Minty examples) 

largest empty circle 

[90B80, 90C27]

(see: Voronoi diagrams in facility location) 

largest empty circle problem 

[90B80, 90C27]

(see: Voronoi diagrams in facility location) 

largest inscribed sphere method 

[49M20, 90-08, 90C25]
(see: Nondifferentiable optimization: cutting plane 
methods) 

largest possible 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 

Lasserre signed decomposition 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 

last node 
[90C35] 
(see: Generalized networks) 

last-out rule see: first-in — 

lattice 
[03B50, 03B52, 03C80, 13Cxx, 13Pxx, 14Qxx, 62F30, 62Gxx, 
68T27, 90Cxx] 
(see: Checklist paradigm semantics for fuzzy logics; Integer 
programming: algebraic methods) 

lattice see: distributive —; free distributive —; vector — 

lattice ideal 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 

lattice program 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 

lattice-type many-valued logic system 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
law see: flow conservation —; Gauss distribution —; Hooke —;
Raoult —; Walras — 
law of heat conduction see: Fourier — 
law of normal distribution 
[01A99]
(see: Gauss, Carl Friedrich) 
lawed see: dead or dog- — 
Lawler linearization 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
Lawler lower bound see: Gilmore- — 
Lawler type lower bounds see: Gilmore- — 
Lawrence signed decomposition 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 
laws see: conservation —; discretized hemivariational 
inequalities for nonlinear material —; variational 
formulation of quasidifferential —; variational formulation 
of subdifferential — 
laws and hemivariational inequalities see: multivalued 
nonmonotone — 
laws and systems of variational inequalities see: QD — 
laws and variational equalities see: single-valued boundary — 
laws and variational inequalities see: multivalued monotone — 
layer see: input — 
layer feed-forward network see: two- — 
layer supergraph see: three- — 
layout see: facilities —; facility — 
layout manager 
[90B80] 
(see: Facilities layout problems) 
layout problem see: terminal — 
layout problems see: Facilities — 
layout problems and optimization see: Plant — 
layover 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
LBDOP 
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques)
LCP
[65K10, 90C11, 90C33, 90C51]
(see: Generalizations of interior point methods for the
linear complementarity problem; LCP: Pardalos-Rosen
mixed integer formulation)
LCP 
[05B35, 65K05, 90C05, 90C20, 90C25, 90C33, 90C55] 
(see: Integer linear complementary problem; Lexicographic 
pivoting rules; Principal pivoting methods for linear 
complementarity problems; Splitting method for linear 
complementarity problems) 
LCP see: integer —; PCP- —; process the — 
LCP: Pardalos-Rosen mixed integer formulation 
(90C33, 90C11)
(referred to in: Branch and price: Integer programming with
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Equivalence between nonlinear 
complementarity problem and fixed point problem; 
Generalized nonlinear complementarity problem; Graph 
coloring; Integer linear complementary problem; Integer 
programming; Integer programming: algebraic methods; 
Integer programming: branch and bound methods; Integer 
programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Integer 
programming: lagrangian relaxation; Linear 
complementarity problem; MINLP: trim-loss problem; 
Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; Order 
complementarity; Parametric mixed integer nonlinear 
optimization; Principal pivoting methods for linear 
complementarity problems; Set covering, packing and 
partitioning problems; Simplicial pivoting algorithms for 
integer programming; Stochastic integer programming: 
continuity, stability, rates of convergence; Stochastic 
integer programs; Time-dependent traveling salesman 
problem; Topological methods in complementarity theory) 
(refers to: Branch and price: Integer programming with 
column generation; Convex-simplex algorithm; 
Decomposition techniques for MILP: lagrangian relaxation; 
Equivalence between nonlinear complementarity problem 
and fixed point problem; Generalized nonlinear 
complementarity problem; Integer linear complementary 
problem; Integer programming; Integer programming: 
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; Lemke method; 
Linear complementarity problem; Linear programming; 
Mixed integer classification problems; Multi-objective 
integer linear programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Order complementarity; Parametric linear 
programming: cost simplex algorithm; Parametric mixed 
integer nonlinear optimization; Principal pivoting methods 
for linear complementarity problems; Sequential simplex 
method; Set covering, packing and partitioning problems; 
Simplicial pivoting algorithms for integer programming; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem; Topological 
methods in complementarity theory) 

LDL
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 

LDM
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 

LDSU see: nonarbitrage condition for — 

leader problem 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 


leading boxes 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
lean stream 
[93A30, 93B50] 
(see: Mixed integer linear programming: mass and heat
exchanger networks)
learning see: batch —; guided —; machine —; off-line —;
on-line —; Q- —; reinforcement —; supervised —


learning algorithm 
[90C09, 90C10] 
(see: Optimization in boolean classification problems) 
learning algorithm see: machine- — 
learning of Boolean functions see: interactive — 
least absolute deviation 
[65K05, 90C27, 90C30, 90C57, 91C15] 
(see: Optimization-based visualization) 
least circle 
[90B85, 90C27] 
(see: Single facility location: circle covering problem) 
least effort see: principle of — 
least-index 
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules) 
Least-index anticycling rules 
(90C05, 90C33, 90C20, 05B35) 
(referred to in: Criss-cross pivoting rules; Lexicographic 
pivoting rules; Linear programming; Linear programming: 
Klee-Minty examples; Pivoting algorithms for linear 
programming generating two paths; Principal pivoting 
methods for linear complementarity problems; 
Probabilistic analysis of simplex algorithms; Simplicial 
pivoting algorithms for integer programming) 
(refers to: Criss-cross pivoting rules; Farkas lemma; Farkas 
lemma: generalizations; Lexicographic pivoting rules; 
Linear complementarity problem; Linear programming; 
Oriented matroids; Pivoting algorithms for linear 
programming generating two paths; Principal pivoting 
methods for linear complementarity problems; 
Probabilistic analysis of simplex algorithms) 
least-index anticycling rules 
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules) 
least-index criss-cross method 
[05B35, 65K05, 90C05, 90C20, 90C33]
(see: Criss-cross pivoting rules; Principal pivoting methods 
for linear complementarity problems) 
least-index pivoting method 
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules) 
least index pivoting rule see: Bland — 
least-index refinement see: Murty — 
least infeasible integer variable see: most/ — 
least squares 
[33C45, 65F20, 65F22, 65K10, 65T40, 90C26, 90C29, 90C30, 
90C90] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques; Global optimization 
methods for harmonic retrieval; Least squares orthogonal
polynomials) 

least squares 
[33C45, 65F20, 65F22, 65Fxx, 65K10, 90C30, 90C52, 90C53, 
90C55] 
(see: Gauss-Newton method: Least squares, relation to 
Newton’s method; Least squares orthogonal polynomials; 
Least squares problems) 

least squares see: Abaffi-Broyden-Spedicato algorithms for 
linear equations and linear —; ABS algorithms for linear 
equations and linear —; generalized nonlinear —; 
Generalized total —; linear —; method of —; nonlinear —; 
weighted — 

least squares algorithm see: recursive — 

least squares criterion 

[62H30, 90C39]

(see: Dynamic programming in clustering) 

least squares distance function 

[41A30, 62J02, 90C26]

(see: Regression by special functions: algorithms and
complexity)

least squares formal orthogonal polynomials 

[33C45, 65F20, 65F22, 65K10]

(see: Least squares orthogonal polynomials) 

least squares formulation 

[90C29]
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 

least squares: Newton-type methods see: Nonlinear — 

Least squares orthogonal polynomials 
(33C45, 65K10, 65F20, 65F22) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; ABS algorithms for optimization; 
Gauss-Newton method: Least squares, relation to Newton’s 
method; Generalized total least squares; Least squares 
problems; Nonlinear least squares: Newton-type methods; 
Nonlinear least squares problems; Nonlinear least squares: 
trust region methods) 
(refers to: ABS algorithms for linear equations and linear 
least squares; ABS algorithms for optimization; 
Gauss-Newton method: Least squares, relation to Newton’s 
method; Generalized total least squares; Least squares 
problems; Nonlinear least squares: Newton-type methods; 
Nonlinear least squares problems; Nonlinear least squares: 
trust region methods) 

least squares problem 
[15-XX, 49K35, 49M27, 62G07, 62G30, 65-XX, 65D10, 65K05, 
65K10, 90-XX, 90C25, 90Cxx] 
(see: Cholesky factorization; Convex max-functions; 
Isotonic regression problems; Overdetermined systems of 
linear equations; Symmetric systems of linear equations) 

least squares problem 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 

least squares problem see: consistent —; generalized —; 
generalized nonlinear —; perturbed —; sparse —; total —; 
unconstrained nonlinear —; weighted — 

Least squares problems 
(65Fxx) 
(referred to in: ABS algorithms for linear equations and
linear least squares; ABS algorithms for optimization;
Cholesky factorization; Gauss, Carl Friedrich; 
Gauss-Newton method: Least squares, relation to Newton’s 
method; Generalized total least squares; Least squares 
orthogonal polynomials; Nonlinear least squares: 
Newton-type methods; Nonlinear least squares problems; 
Nonlinear least squares: trust region methods; 
Unconstrained optimization in neural network training) 
(refers to: ABS algorithms for linear equations and linear 
least squares; ABS algorithms for optimization; Gauss, Carl 
Friedrich; Gauss-Newton method: Least squares, relation 
to Newton’s method; Generalized total least squares; Least 
squares orthogonal polynomials; Nonlinear least squares: 
Newton-type methods; Nonlinear least squares problems; 
Nonlinear least squares: trust region methods) 

least squares problems see: Complexity and large-scale —; 
Nonlinear — 

least squares problems with massive data sets 
[34-xx, 34Bxx, 34Lxx, 93E24] 
(see: Complexity and large-scale least squares problems) 

Least squares, relation to Newton’s method see: 
Gauss-Newton method: —

least squares solutions 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 

least squares: trust region methods see: Nonlinear — 

leaving variable 

[90C05] 

(see: Linear programming: Klee-Minty examples)

leaving variable see: choice of the — 

left-chain justification 

[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]

(see: Maximum partition matching)

left-collection of a partition 

[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]

(see: Maximum partition matching)

left local maximizer see: discrete — 

left-paired element 

[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]

(see: Maximum partition matching)

left-paired set 

[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]

(see: Maximum partition matching) 

left-pairs 

[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]

(see: Maximum partition matching)

left-reachable 

[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]

(see: Maximum partition matching)

left-reachable see: directly — 

left saddle point 

[49-XX, 90-XX, 93-XX] 

(see: Duality theory: monoduality in convex optimization)

left-unpaired element 

[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]

(see: Maximum partition matching)

legal neighbor 

[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 

legend see: arc —; node — 
Legendre conjugate
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization) 
Legendre duality 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
Legendre duality pair 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
legendre duality relations 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization;
Duality theory: monoduality in convex optimization) 
Legendre transformation 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
Legendre transformation see: integral Fenchel- — 
Legends see: Figure — 
legitimacy see: gain — 
Lehmann-Goerisch bound 
[49R50, 65G20, 65G30, 65G40, 65L15, 65L60] 
(see: Eigenvalue enclosures for ordinary differential
equations) 
Lehmann-Maehly method 
[49R50, 65G20, 65G30, 65G40, 65L15, 65L60] 
(see: Eigenvalue enclosures for ordinary differential
equations) 
Leibler cross-entropy see: Kullback- — 
Leibler divergence see: Kullback- — 
Leibler measure of cross-entropy see: Kullback- — 
Leibniz see: Gottfried Wilhelm — 
Leibniz, gottfried wilhelm 
(01A99) 
(referred to in: History of optimization) 
(refers to: History of optimization) 


Lemke’s algorithm 
[05B35, 52A22, 60D05, 65K05, 68Q25, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules; Probabilistic analysis of 
simplex algorithms) 

Lemke method 
(90C33) 
(referred to in: Convex-simplex algorithm; Equivalence 
between nonlinear complementarity problem and fixed 
point problem; Generalized nonlinear complementarity 
problem; Integer linear complementary problem; LCP: 
Pardalos-Rosen mixed integer formulation; Linear
complementarity problem; Linear programming; Order 
complementarity; Parametric linear programming: cost 
simplex algorithm; Principal pivoting methods for linear 
complementarity problems; Sequential simplex method; 
Topological methods in complementarity theory) 
(refers to: Convex-simplex algorithm; Linear 
complementarity problem; Linear programming; 
Parametric linear programming: cost simplex algorithm; 
Sequential simplex method) 


Lemke method 
[90C33] 
(see: Linear complementarity problem) 
lemma see: equivariant Morse —; Farkas —; first slope —; 
Knaster-Kuratowski-Mazurkiewicz —; second slope —;
third slope — 
lemma: generalizations see: Farkas — 
lemmas see: slope — 
length see: binary —; maximin path —; maximum path —; 
minimax path —; minimum path —; path —; Shortest 
program —; unary —; variable stage- — 
length of input data 
[68Q25, 90C60]
(see: NP-complete problems and proof methodology) 
length of a partial computation of a Turing machine 
[90C60]
(see: Complexity classes in optimization) 
length of a path in a graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization) 
length of a subgraph 
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems) 
length vector see: arc — 
Lennard-Jones 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
Lennard-Jones microcluster 
[90C26, 90C90] 
(see: Global optimization in Lennard-Jones and morse 
clusters) 
Lennard-Jones and morse clusters see: Global optimization 
in — 
Lennard-Jones potential energy function 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Monotonic optimization; Stochastic global 
optimization: two-phase methods) 
Leonid Vitalyevich see: Kantorovich — 
Leray-Schauder degree 
[90C33] 
(see: Topological methods in complementarity theory) 
less-than-truckload 
[90C35] 
(see: Multicommodity flow problems) 
lessPreferred 
(see: Railroad locomotive scheduling) 
level see: aspiration —; concordance —; lower- — 
level B 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
level BF 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
level BFR 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and
transport) 

level crossing minimization see: k- — 

level feature detection see: low- — 

level functions see: natural — 

level hierarchy see: k- — 

level of interaction 

[90B85]

(see: Multifacility and restricted location problems) 

level in a leveled graph 

[90C35]

(see: Optimization in leveled graphs) 

level Optimization see: Two- — 

level planar graph 

[90C35]

(see: Optimization in leveled graphs) 

level planarization 

[90C35]

(see: Optimization in leveled graphs) 

level planarization problem 

[90C35]
(see: Optimization in leveled graphs) 

level planarization problem see: k- — 

level problem see: first —; lower- —; second —; upper- —

level set 
[62G07, 62G30, 65K05, 90C05, 90C25, 90C26, 90C30, 90C34] 
(see: Isotonic regression problems; Monotonic 
optimization; Random search methods; Semi-infinite 
programming: discretization methods) 

level set see: bounded — 

level sets conjugation
[90C26]
(see: Global optimization: envelope representation) 
level software see: high- —; low- —; medium- — 


level of a vertex in a rooted tree 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization) 
leveled crossing minimization 
[90C35]
(see: Optimization in leveled graphs) 
leveled graph 
[90C35]
(see: Optimization in leveled graphs) 
leveled graph see: k- —; level in a —; proper k- — 
leveled graphs 
[90C35]
(see: Optimization in leveled graphs) 
leveled graphs see: Optimization in — 
Levenberg-Marquardt 
[90C30]
(see: Cost approximation algorithms) 
Levenberg-Marquardt algorithm 
[90C30]
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
Levenberg-Marquardt method 
[49M37]
(see: Nonlinear least squares: trust region methods) 


Levenberg-Marquardt rule 
[90C25, 90C30] 
(see: Successive quadratic programming: full space 
methods) 
leverage hypothesis see: financial — 
Levin theorem see: Cook- — 
Levitin-Polyak method 
[90C30]
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
Levitin-Polyak minimizing sequence 
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems) 
Levitin-Polyak well-posed problem 
[49J40, 49M30, 65K05, 65M30, 65M32] 
(see: Ill-posed variational problems)
Levy probability distribution 
[90C90] 
(see: Simulated annealing methods in protein folding) 
lexico-positive basis tableau 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules) 
lexico-positive vector 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules)
lexicographic 
(see: Planning in the process industry) 
lexicographic approach 
(see: Planning in the process industry)
lexicographic dual simplex method 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules) 
Lexicographic Goal Programming 
(see: Planning in the process industry)
lexicographic ordering 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules)
lexicographic ordering and perturbation 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules) 
lexicographic pivot selection 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity
problems) 
lexicographic pivoting rule 
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules)
Lexicographic pivoting rules 
(90C05, 90C20, 90C33, 05B35, 65K05) 
(referred to in: Criss-cross pivoting rules; Least-index
anticycling rules; Linear programming; Linear 
programming: Klee-Minty examples; Pivoting algorithms 
for linear programming generating two paths; Principal 
pivoting methods for linear complementarity problems; 
Probabilistic analysis of simplex algorithms; Simplicial 
pivoting algorithms for integer programming) 
(refers to: Criss-cross pivoting rules; Least-index anticycling 
rules; Linear complementarity problem; Linear 
programming; Oriented matroids; Pivoting algorithms for
linear programming generating two paths; Principal 
pivoting methods for linear complementarity problems; 
Probabilistic analysis of simplex algorithms) 
lexicographic primal simplex method 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules)
lexicographic search algorithm 
[90C10, 90C11, 90C20] 
(see: Linear ordering problem) 
lexicographic simplex 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules)
lexicographic simplex method 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules)
lexicographic variant of the constraint-by-constraint method 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms)
lexicographical order 
[12D10, 12Y05, 13P10] 
(see: Gröbner bases for polynomial equations)
lexicographical ordering for n-dimensional vectors 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource
allocation problems) 
lexicographically greater 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource
allocation problems) 
lexicographically minimax objective function 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource
allocation problems) 
lexicographically positive vector 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules)
lexicographically smaller 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource
allocation problems) 
LexPr 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules)
LFS function 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 
Li-Pardalos generator 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
Liability Management see: asset — 
liability management decision support system see: Asset — 
library see: general-purpose software —; IMSL subroutine —; 
NAG —; NAG parallel —; rotamer — 
LICQ 
[49K27, 49K40, 90C30, 90C31] 
(see: First order constraint qualifications) 
(LICQ) see: linear independence constraint qualification —
Lie algebra
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)
lift-and-project
[90C05, 90C06, 90C08, 90C10, 90C11, 90C26, 90C27, 90C57]
(see: Integer programming; Integer programming: branch
and cut algorithms; Reformulation-linearization technique
for global optimization)
lift-and-project 
[90C09, 90C10, 90C11, 90C27, 90C57]
(see: Disjunctive programming; Integer programming; Set 
covering, packing and partitioning problems) 
lift-and-project cut 
[90C11]
(see: MINLP: branch and bound methods) 
lift-and-project cuts 
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: cutting plane algorithms) 
lift-and-project hierarchy 
[90C09, 90C10, 90C11]
(see: Disjunctive programming) 
lift availability see: upper bound on gas — 
lift wells of type a see: gas — 
lift wells of type b see: gas — 
lifting
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and cut algorithms) 
lifting cut
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and cut algorithms) 
lifting procedure
[90C09, 90C10, 90C11]
(see: Disjunctive programming)
light edge
[68Q20]
(see: Optimal triangulations) 
light travel
(see: Railroad locomotive scheduling) 
like see: convex- — 
like criterion see: test nonmonotone Armijo- — 
like function see: convex- — 
like function pair see: convex- — 
like inequalities see: variational- — 
like method see: proximal- — 
likelihood see: maximum — 
likelihood detection via semidefinite programming see:
Maximum —
likelihood estimate see: maximum — 
likelihood estimation see: maximum — 
likelihood evidence
(see: Bayesian networks) 
likelihood function
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based methods) 
likelihood method see: iterative quadratic maximum — 
likelihood principle see: maximum — 
likelihood ratio method
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization) 
limit see: absolute — 
limited discrepancy search
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics) 
limited enumeration methods
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem) 
limited-memory
[90C30]
(see: Conjugate-gradient methods) 
limited-memory affine reduced Hessian
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
limited-memory algorithm
[90C30]
(see: Conjugate-gradient methods) 
limited-memory approach
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
limited-memory BFGS method
[90C06]
(see: Large scale unconstrained optimization) 
limited-memory reduced-Hessian BFGS algorithm
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
limited-memory symmetric rank-one approach
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
limiting coderivative
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
limiting differential
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
limiting Fréchet subdifferential
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
limiting (Fréchet) subdifferentials
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
limiting normal cone
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
limiting subdifferential see: singular — 
limiting superdifferential
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
limiting value
[01A99]
(see: Gauss, Carl Friedrich) 
Lin-Kernighan neighborhood
[68T99, 90C27]
(see: Capacitated minimum spanning trees) 
LindAcR
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules)
line see: capital market —; inexact line search —; security
market —
line algorithm see: on- —
line feedback see: off- —; on- —
line learning see: off- —; on- —
line method see: on- —
line process optimization see: off- —; on- —
line search 
[90C25, 90C30, 90C33] 
(see: Conjugate-gradient methods; Implicit Lagrangian;
Nonlinear least squares problems; Successive quadratic 
programming: full space methods) 

line search
[90C30]
(see: Nonlinear least squares problems)
line search see: curvilinear —; inexact —; nonmonotone —
line search line see: inexact —
line search methods
[90C30]
(see: Cyclic coordinate method; Powell method; Rosenbrock
method)
line search problem
[90C30]
(see: Convex-simplex algorithm)
line search problem
[90C30]
(see: Convex-simplex algorithm; Frank-Wolfe algorithm)
line search technique
[49M37]
(see: Nonlinear least squares: Newton-type methods)
line search technique see: inexact —
line searches
[49M37, 90C30]
(see: Nonlinear least squares: trust region methods;
Rosenbrock method)
line searching
[90C30]
(see: Successive quadratic programming)
lineality space
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and
stability)
linear
[34A55, 34E05, 35R30, 49M20, 62G05, 62G08, 62J02, 62K05,
62P10, 62P30, 76R50, 80A20, 80A23, 80A30, 90C11, 90C27,
90C30]
(see: Asymptotic properties of random multidimensional 
assignment problem; Generalized outer approximation; 
Global optimization: functional forms; Identification 
methods for reaction kinetics and transport; Optimal 
sensor scheduling) 

linear see: non- — 

linear algebra 
[14R10, 15A03, 32B15, 51E15, 51N20] 
(see: Affine sets and functions; Linear space) 

linear algebra framework 
[05-02, 05-04, 15A04, 15A06, 68U99] 
(see: Alignment problem) 

linear algebraic equations 
[34-XX, 49-XX, 65-XX, 65K05, 65K10, 68-XX, 90-XX] 
(see: ABS algorithms for linear equations and linear least 
squares; Nonlocal sensitivity analysis with automatic 
differentiation) 
linear algorithm
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 

linear appearance of control function 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 

linear approximation 
[90C31] 
(see: Bounds and solution vector estimates for parametric 
NLPs)

linear arc cost see: piecewise — 

linear Arrangement 
(see: State of the art in modeling agricultural systems) 

linear arrangement problem
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

linear bilevel programming problem
[90C30, 90C90]
(see: Bilevel programming: global optimization)

linear BLPP see: complexity of the — 

linear CG method
[90C30]
(see: Conjugate-gradient methods)

linear co-index
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)

linear complementarity
[65K05, 90C20]
(see: Quadratic programming with bound constraints)

linear complementarity
[90C33]
(see: Lemke method)

Linear complementarity problem 
(90C33) 
(referred to in: Convex-simplex algorithm; Criss-cross 
pivoting rules; Equivalence between nonlinear 
complementarity problem and fixed point problem; 
Generalized nonlinear complementarity problem; Integer 
linear complementary problem; LCP: Pardalos-Rosen 
mixed integer formulation; Least-index anticycling rules; 
Lemke method; Lexicographic pivoting rules; Linear 
programming; Linear programming: interior point 
methods; Optimization with equilibrium constraints: 
A piecewise SQP approach; Order complementarity; 
Parametric linear programming: cost simplex algorithm; 
Principal pivoting methods for linear complementarity 
problems; Probabilistic analysis of simplex algorithms; 
Quadratic programming with bound constraints; 
Sequential simplex method; Splitting method for linear 
complementarity problems; Topological methods in 
complementarity theory) 
(refers to: Convex-simplex algorithm; Equivalence between 
nonlinear complementarity problem and fixed point 
problem; Generalized nonlinear complementarity problem; 
Integer linear complementary problem; LCP: 
Pardalos-Rosen mixed integer formulation; Lemke
method; Linear programming; Order complementarity;
Parametric linear programming: cost simplex algorithm;
Principal pivoting methods for linear complementarity 
problems; Sequential simplex method; Splitting method for 
linear complementarity problems; Topological methods in 
complementarity theory) 
linear complementarity problem 
[65K10, 65M60, 90C11, 90C30, 90C33] 
(see: Kuhn-Tucker optimality conditions; LCP: 
Pardalos-Rosen mixed integer formulation; Lemke
method; Variational inequalities) 
linear complementarity problem 
[90C11, 90C33] 
(see: LCP: Pardalos-Rosen mixed integer formulation) 
linear complementarity problem see: extended —; 
Generalizations of interior point methods for the —; 
horizontal —; mixed —; parametric —; vertical — 
linear complementarity problems 
[90C31, 90C33] 
(see: Sensitivity analysis of complementarity problems) 
linear complementarity problems see: Principal pivoting 
methods for —; Splitting method for — 
linear complementary problem 
[90C25, 90C33] 
(see: Integer linear complementary problem) 
linear complementary problem see: Integer — 
linear Component 
(see: State of the art in modeling agricultural systems) 
linear constraint 
[90C90] 
(see: Design optimization in computational fluid dynamics) 
linear constraints see: general — 
linear control see: piecewise — 
linear convergence 
[65K05, 90C30]
(see: Bisection global optimization methods) 
linear convergence rate see: r- — 
linear dependence 
[90C09, 90C10]
(see: Matroids; Oriented matroids) 
linear discrete optimization oracle 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C] 
(see: Convex discrete optimization) 


linear equality relation
[15A39, 90C05]
(see: Linear optimization: theorems of the alternative) 

linear equation
[15A39, 90C05]
(see: Linear optimization: theorems of the alternative) 

linear equation-solving see: iterative — 

linear equations
[65H10, 65J15, 65K05, 65K10]
(see: ABS algorithms for optimization; 
Contraction-mapping) 

linear equations see: bounds for —; Overdetermined systems 
of —; Symmetric systems of — 

linear equations and linear least squares see: 
Abaffi-Broyden-Spedicato algorithms for —; ABS 
algorithms for — 

linear extension theorem see: Hahn-Banach —
linear fixed charge
[90C25] 
(see: Concave programming) 
linear fractional combinatorial optimization 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 
linear fractional combinatorial optimization problem see: 
integral — 
linear fractional program 
[90C32] 
(see: Fractional programming) 
linear-fractional programming 
[90C29, 90C70] 
(see: Fuzzy multi-objective linear programming) 
linear fractional terms 
[90C26, 90C90] 
(see: Global optimization of heat exchanger networks) 
linear function see: decomposition of a continuous 
piecewise —; piecewise — 
linear functions 
[90C15] 
(see: Two-stage stochastic programming: quasigradient 
method) 
linear growth condition 
[65K10, 90C90] 
(see: Variational inequalities: projected dynamical system) 
linear, increasing 
[90B15] 
(see: Evacuation networks) 
linear independence 
[90C31] 
(see: Sensitivity and stability in NLP: continuity and 
differential stability) 
linear independence constraint qualification 
[49K20, 49M99, 65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 
90C30, 90C31, 90C33, 90C34, 90C55] 
(see: Parametric global optimization: sensitivity; Parametric 
optimization: embeddings, path following and 
singularities; Sequential quadratic programming: interior 
point methods for distributed optimal control problems) 
linear independence constraint qualification (LICQ) 
[90C31, 90C34, 90C46] 
(see: Generalized semi-infinite programming: optimality 
conditions) 
linear independence CQ 
[49K27, 49K40, 90C30, 90C31] 
(see: First order constraint qualifications) 
linear independency constraint qualification 
[90C31, 90C34] 
(see: Semi-infinite programming: second order optimality 
conditions) 
linear index 
[58E05, 90C30] 
(see: Topology of global optimization) 
linear inequalities 
[15A39, 90C05] 
(see: Motzkin transposition theorem) 
linear inequalities 
[52B12, 68Q25] 
(see: Fourier-Motzkin elimination method)


linear inequality relation
[15A39, 90C05]
(see: Linear optimization: theorems of the alternative) 

linear interval equation
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization) 

linear least squares
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least
squares) 

linear least squares see: Abaffi-Broyden-Spedicato algorithms 
for linear equations and —; ABS algorithms for linear 
equations and — 

linear map see: adjoint — 

linear matrix inequality
[15A15, 90C25, 90C55, 90C90, 93D09]
(see: Robust control; Semidefinite programming and
determinant maximization)

linear matroid
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)

linear minimum cost network flow problem see: piecewise — 

linear mixed integer problems
[49M27, 90C11, 90C30]
(see: MINLP: generalized cross decomposition)

linear model
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric
programming)

linear model see: general univariate — 

linear multiple-choice knapsack problem 
[90C10, 90C27] 
(see: Multidimensional knapsack problems) 

linear multiplicative program 
[90C26] 
(see: Global optimization in multiplicative programming) 

linear network flow problem 
[90C30, 90C52, 90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms) 

linear network flow problems see: Piecewise — 

linear nondegeneracy condition
[58E05, 90C30]
(see: Topology of global optimization)

linear optimization
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules)

linear optimization
[05B35, 65K05, 90C05, 90C20, 90C33]
(see: Criss-cross pivoting rules)

linear optimization see: Distance dependent protein force field 
via —; duality theorem for —; Global pairwise protein 
sequence alignment via mixed-integer — 

linear optimization problem 
[15A39, 90C05] 
(see: Linear optimization: theorems of the alternative; 
Motzkin transposition theorem) 
linear optimization problem in standard form
[15A39, 90C05]
(see: Linear optimization: theorems of the alternative)
linear optimization problems see: Price of robustness for —;
stochastic —
Linear optimization: theorems of the alternative
(15A39, 90C05)
(referred to in: Farkas lemma; Linear programming;
Motzkin transposition theorem; Theorems of the
alternative and optimization; Tucker homogeneous systems
of linear relations)
(refers to: Farkas lemma; Linear programming; Motzkin
transposition theorem; Theorems of the alternative and
optimization; Tucker homogeneous systems of linear
relations)
linear order complementarity problem
[90C33]
(see: Order complementarity)
linear order complementarity problem see: generalized —


linear ordering
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: cutting plane algorithms)


Linear ordering problem
(90C10, 90C11, 90C20)
(referred to in: Quadratic assignment problem)
(refers to: Assignment and matching; Assignment methods
in clustering; Bi-objective assignment problem; 
Communication network assignment problem; Complexity 
theory: quadratic programming; Feedback set problems; 
Frequency assignment problem; Generalized assignment 
problem; Graph coloring; Graph planarization; Greedy 
randomized adaptive search procedures; Maximum 
partition matching; Quadratic assignment problem; 
Quadratic fractional programming: Dinkelbach method; 
Quadratic knapsack; Quadratic programming with bound 
constraints; Quadratic programming over an ellipsoid; 
Quadratic semi-assignment problem; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications; Standard quadratic 
optimization problems: theory) 


linear ordering problem
[90C35]
(see: Optimization in leveled graphs)


linear outer approximation
[49M20, 90C11, 90C30]
(see: Generalized outer approximation)


linear path
[62H30, 90C39]
(see: Dynamic programming in clustering)


linear platform cost
[90C26]
(see: MINLP: application in facility location-allocation)


linear potential
[90C90]
(see: Design optimization in computational fluid dynamics)
linear problem see: ball-constrained —; geometrically —;
physically —
linear problems see: Semi-infinite programming: methods
for —


linear program
[68Q25, 90C05, 90C20, 90C30, 91B28]
(see: Competitive ratio for portfolio management;
Redundancy in nonlinear programs; Simplicial
decomposition)


linear program
[90C30]
(see: Convex-simplex algorithm; Frank-Wolfe algorithm; 
Simplicial decomposition) 


linear program see: dual —; finite-dimensional —; integer —;
single parametric mixed integer —; two-stage stochastic —


linear program with an additional reverse convex constraint
[90C26, 90C30]
(see: Reverse convex optimization) 


linear program with recourse see: stochastic — 
Linear programming
(90C05)
(referred to in: ABS algorithms for linear equations and 
linear least squares; Affine sets and functions; Carathéodory 
theorem; Cholesky factorization; Convex-simplex 
algorithm; Criss-cross pivoting rules; Ellipsoid method; 
Equivalence between nonlinear complementarity problem 
and fixed point problem; Farkas lemma; Fourier-Motzkin 
elimination method; Gauss, Carl Friedrich; Generalized 
nonlinear complementarity problem; Global optimization 
in multiplicative programming; History of optimization; 
Integer linear complementary problem; Interval linear 
systems; Job-shop scheduling problem; Kantorovich, 
Leonid Vitalyevich; Krein-Milman theorem; Large scale 
trust region problems; Large scale unconstrained 
optimization; LCP: Pardalos-Rosen mixed integer
formulation; Least-index anticycling rules; Lemke method; 
Lexicographic pivoting rules; Linear complementarity 
problem; Linear optimization: theorems of the alternative; 
Linear programming: Klee-Minty examples; Linear 
programming models for classification; Linear space; 
Motzkin transposition theorem; Multiparametric linear 
programming; Multiplicative programming; Optimization 
in medical imaging; Order complementarity; Orthogonal 
triangularization; Overdetermined systems of linear 
equations; Parametric linear programming: cost simplex 
algorithm; Pivoting algorithms for linear programming 
generating two paths; Principal pivoting methods for linear 
complementarity problems; Probabilistic analysis of 
simplex algorithms; QR factorization; Sequential simplex 
method; Simplicial pivoting algorithms for integer 
programming; Solving large scale and sparse semidefinite 
programs; Symmetric systems of linear equations; 
Topological methods in complementarity theory; Tucker 
homogeneous systems of linear relations; Young 
programming)
(refers to: Affine sets and functions; Carathéodory theorem; 
Convex-simplex algorithm; Criss-cross pivoting rules; 
Farkas lemma; Gauss, Carl Friedrich; Global optimization 
in multiplicative programming; History of optimization; 
Kantorovich, Leonid Vitalyevich; Krein-Milman theorem; 
Least-index anticycling rules; Lemke method; Lexicographic 
pivoting rules; Linear complementarity problem; Linear 
optimization: theorems of the alternative; Linear space; 
Motzkin transposition theorem; Multiparametric linear 
programming; Multiplicative programming; Parametric 
linear programming: cost simplex algorithm; Pivoting 
algorithms for linear programming generating two paths; 
Principal pivoting methods for linear complementarity
problems; Probabilistic analysis of simplex algorithms; 
Sequential simplex method; Simplicial pivoting algorithms 
for integer programming; Tucker homogeneous systems of 
linear relations) 

linear programming
[01A99, 37A35, 65K05, 68Q05, 68Q10, 68Q20, 68Q25, 68Q99,
90-XX, 90B50, 90C05, 90C06, 90C10, 90C11, 90C15, 90C22,
90C25, 90C26, 90C29, 90C30, 90C57, 90C60, 90C90]
(see: Branch and price: Integer programming with column 
generation; Complexity theory; Complexity theory: 
quadratic programming; Copositive programming; 
Information-based complexity and information-based 
optimization; Kantorovich, Leonid Vitalyevich; Lagrangian 
duality: BASICS; Modeling difficult optimization problems; 
Optimal triangulations; Optimization and decision support 
systems; Potential reduction methods for linear 
programming; Preference disaggregation approach: basic 
features, examples from financial decision making; 
Probabilistic constrained linear programming: duality 
theory; Suboptimal control; Survivable networks) 

linear programming
[01A99, 37A35, 49-XX, 52A22, 52B12, 60D05, 65K05, 65K10, 
68Q05, 68Q10, 68Q25, 90-XX, 90B30, 90B50, 90B85, 90C05, 
90C06, 90C10, 90C11, 90C15, 90C20, 90C25, 90C26, 90C27, 
90C29, 90C30, 90C31, 90C35, 91A99, 91B28, 91B82, 93-XX] 
(see: ABS algorithms for optimization; Auction algorithms; 
Data envelopment analysis; Duality theory: monoduality in 
convex optimization; Fourier-Motzkin elimination 
method; Homogeneous selfdual methods for linear 
programming; Information-based complexity and 
information-based optimization; Kantorovich, Leonid 
Vitalyevich; Linear ordering problem; Linear 
programming; Linear programming: Karmarkar projective
algorithm; Linear programming: Klee-Minty examples; 
Multifacility and restricted location problems; 
Multi-objective integer linear programming; Operations 
research and financial markets; Optimization and decision 
support systems; Portfolio selection: Markowitz
mean-variance model; Potential reduction methods for 
linear programming; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Probabilistic analysis of simplex 
algorithms; Probabilistic constrained linear programming: 
duality theory; Selfdual parametric method for linear 
programs; Sensitivity and stability in NLP) 

linear programming see: analytical approximation of —; 
Bilevel —; binary —; Decomposition principle of —; 
Extension of the fundamental theorem of —; fractional —; 
Fuzzy multi-objective —; Homogeneous selfdual methods 
for —; infinite-dimensional —; integer —; mixed integer —; 
multi-objective —; Multi-objective integer —; 
Multiparametric —; Multiparametric mixed integer —; 
multiple objective —; parametric —; Piecewise —; Potential 
reduction methods for —; probabilistic constrained —; 
semi-infinite —; stochastic — 

linear programming approach for DNA transcription element 
identification see: Mixed 0-1 — 

linear programming: complexity, equivalence to minmax, 
concave programs see: Bilevel — 

linear programming: cost simplex algorithm see: Parametric —

linear programming: decomposition and cutting planes see: 
Stochastic — 

linear programming duality 
[90C10, 90C46] 
(see: Integer programming duality) 

linear programming: duality theory see: Probabilistic 
constrained — 

linear Programming and Economic Analysis 
[35B40, 37C70, 40A05, 49J24] 
(see: Statistical convergence and turnpike theory; Turnpike 
theory: stability of optimal trajectories) 

linear programming with fuzzy coefficients see: 
multi-objective — 

linear programming generating two paths see: Pivoting 
algorithms for — 

linear programming: heat exchanger network synthesis see: 
Mixed integer — 

Linear programming: interior point methods 
(90C05) 
(referred to in: Ellipsoid method; Entropy optimization: 
interior point methods; Homogeneous selfdual methods for 
linear programming; Integer programming: branch and 
bound methods; Linear programming: Karmarkar
projective algorithm; Potential reduction methods for 
linear programming; Principal pivoting methods for linear 
complementarity problems; Quadratic assignment 
problem; Quadratic programming with bound constraints; 
Sequential quadratic programming: interior point methods 
for distributed optimal control problems; Successive 
quadratic programming: solution by active sets and interior 
point methods) 
(refers to: Entropy optimization: interior point methods; 
Homogeneous selfdual methods for linear programming; 
Interior point methods for semidefinite programming; 
Linear complementarity problem; Linear programming: 
Karmarkar projective algorithm; Potential reduction
methods for linear programming; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

Linear programming: Karmarkar projective algorithm
(90C05) 
(referred to in: Ellipsoid method; Entropy optimization: 
interior point methods; Homogeneous selfdual methods for 
linear programming; Linear programming: interior point 
methods; Nondifferentiable optimization: cutting plane 
methods; Potential reduction methods for linear 
programming; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming: solution by active sets 
and interior point methods) 
(refers to: Entropy optimization: interior point methods; 
Homogeneous selfdual methods for linear programming; 
Interior point methods for semidefinite programming; 
Linear programming: interior point methods; Potential 
reduction methods for linear programming; Sequential 
quadratic programming: interior point methods for 
distributed optimal control problems; Successive quadratic 
programming: solution by active sets and interior point 
methods) 
Linear programming: Klee-Minty examples
(90C05) 
(referred to in: Ellipsoid method) 
(refers to: Criss-cross pivoting rules; Least-index anticycling 
rules; Lexicographic pivoting rules; Linear programming) 

linear programming: mass and heat exchanger networks see: 
Mixed integer — 

Linear programming models for classification 
(62H30, 68T10, 90C05) 
(referred to in: Mixed integer classification problems; 
Optimization in boolean classification problems; 
Optimization in classifying text documents; Statistical 
classification: optimization approaches) 
(refers to: Deterministic and probabilistic optimization 
models for data classification; Linear programming; Mixed 
integer classification problems; Statistical classification: 
optimization approaches) 

linear programming problem 
[49L20, 52B12, 60G35, 65K05, 68Q25, 90C39, 90C90] 
(see: Design optimization in computational fluid dynamics; 
Differential equations and global optimization; Dynamic 
programming: discounted problems; Fourier-Motzkin 
elimination method) 

linear programming problem see: analytical approximation of 
a —

linear programming problems see: extended —; Stabilization 
of cutting plane algorithms for stochastic — 

linear programming program 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 

linear programming relaxation 
[90C05, 90C06, 90C08, 90C10, 90C11, 90C27, 90C57] 
(see: Integer programming: branch and bound methods; 
Integer programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Set covering, 
packing and partitioning problems) 

linear programming relaxations 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 

linear programming with right-hand-side uncertainty, duality 
and applications see: Robust — 

linear programming under uncertainty see: multi-objective — 

linear programming with variable coefficients see: 
generalized — 

linear programs see: dual —; Robust optimization: 
mixed-integer —; Selfdual parametric method for — 

linear programs with recourse and arbitrary multivariate 
distributions see: Stochastic — 

linear programs for routing and protection problems in optical 
networks see: Integer — 

linear quadratic function see: piecewise — 

linear-quadratic Gaussian 
[93D09] 
(see: Robust control) 

linear-quadratic problem 
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 

linear regression model see: classical —
linear relations
[15A39, 90C05]
(see: Linear optimization: theorems of the alternative)
linear relations see: Tucker homogeneous systems of — 
linear relaxation
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90]
(see: Traveling salesman problem) 
linear scales
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques)
(linear) semi-infinite program see: primal — 
linear semi-infinite programming
[90C05, 90C34]
(see: Semi-infinite programming: methods for linear
problems)
linear semi-infinite programming see: perfect duality from the
view of —
linear semidefinite program 
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and 
stability) 
linear semidefinite programming problem 
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and 
stability) 
linear SIP 
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods) 
linear SIP problem see: duality of the — 
linear SIP problems 
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods) 
Linear space
(15A03, 14R10, 51N20)
(referred to in: Affine sets and functions; Linear
programming)
(refers to: Affine sets and functions; Linear programming)
linear space 
[14R10, 15A03, 51N20]
(see: Linear space) 

linear spaces see: Best approximation in ordered normed — 
linear speedup
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching)
linear support
[90C30]
(see: Lagrangian duality: BASICS)
linear supporting function
[90C30]
(see: Lagrangian duality: BASICS) 
linear system see: alternative —; center of an interval —;
interval —; sign-solvable —
linear systems see: Interval —; large scale — 
linear systems of equations
[15A99, 65G20, 65G30, 65G40, 90C26]
(see: Interval linear systems) 
linear thermoelastic behavior of a generally nonhomogeneous and
nonisotropic body
[35R70, 47840, 74B99, 74D99, 74G99, 74H99] 
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(see: Quasidifferentiable optimization: applications to 
thermoelasticity) 
linear topological space 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems) 
linear transformation 
[90C11, 90C90]
(see: MINLP: trim-loss problem) 
linear two-stage model 
[90C11, 90C15, 90C31]
(see: Stochastic integer programming: continuity, stability, 
rates of convergence) 
linear unidimensional scale 
[62H30, 90C27]
(see: Assignment methods in clustering) 
linear upper bound see: piecewise — 
linear zero-one integer problem 
[90C25, 90C33] 
(see: Integer linear complementary problem) 
linearity see: degree of — 
linearization 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 
linearization see: Adams—Johnson —; Frieze—Yadegar —; 
improved piecewise —; inner —; Kaufman-Broeckx —; 
Lawler —; partial — 
linearization cone see: inner —; outer — 
linearization of constraints 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
Linearization/Convexification Technique see: 
reformulation- — 
linearization/convexification techniques see: reformulation- — 
linearization error 
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 
linearization in facility location problems with staircase costs 
[90B80, 90C11] 
(see: Facility location with staircase costs) 
linearization in facility location problems with staircase costs 
see: convex piecewise — 
linearization methods 
[90C30] 
(see: Cost approximation algorithms) 
linearization of programs 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
linearization/restriction see: inner — 
linearization technique see: reformulation- — 
linearization technique for global optimization see: 
Reformulation- — 
linearized reformulation 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 
linearly constrained optimization problems
[90C30]
(see: Rosen’s method, global convergence, and Powell’s
conjecture)
linearly dependent see: positively — 
linearly elastic mechanical constructions 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
linearly elastic systems 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
linearly independent 
[90C30] 
(see: Convex-simplex algorithm) 
linearly independent see: positively — 
linearly monotonic over see: strongly — 
lines see: connection of flow —; method of — 
linguistic choices 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
linguistics see: computational — 
link cost function see: regular — 
link-diverse/disjoint 
[46N10, 68M10, 90B18, 90B25] 
(see: Integer linear programs for routing and protection 
problems in optical networks) 
link flow formulation 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
link frequency assignment problem see: radio — 
link loads see: variational inequality formulation in — 
linkage see: multilevel single- —; simple — 
linking constraints see: multistage — 
linkpoint of a graph
[90C35]
(see: Feedback set problems)
links see: temporal — 
LINPACK 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least
squares) 
Liouville theorem 
[01A50, 01A55, 01A60] 
(see: Fundamental theorem of algebra)
Lipschitz 
[90C11, 90C15] 
(see: Stochastic programming with simple integer recourse) 
Lipschitz see: locally — 
Lipschitz constant 
[65K05, 90C30] 
(see: Bisection global optimization methods) 
Lipschitz continuity 
[65K10, 65M60, 90C11, 90C15, 90C31, 90C90] 
(see: Stochastic integer programming: continuity, stability, 
rates of convergence; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system) 
Lipschitz continuity 
[65K05, 65K10, 65M60, 90C30] 
(see: Bisection global optimization methods; Variational 
inequalities: geometric interpretation, existence and
uniqueness) 
Lipschitz continuous 
[49J20, 49J52, 65K05, 65K10, 65M60, 90C30, 90C31, 90C52, 
90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms; 
Bisection global optimization methods; Sensitivity and 
stability in NLP: approximation; Shape optimization; 
Variational inequalities: geometric interpretation, 
existence and uniqueness) 
Lipschitz continuous function 
[65K10, 90C90] 
(see: Variational inequalities: projected dynamical system) 
Lipschitz continuous function see: locally — 
Lipschitz function 
[90C26] 
(see: Generalized monotone multivalued maps; Global 
optimization using space filling) 
Lipschitz function 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
Lipschitz function see: locally — 
Lipschitz optimization 
[65G20, 65G30, 65G40, 65H20, 65K05, 90C26, 90C30] 
(see: Interval analysis: unconstrained and constrained
optimization; Monotonic optimization) 
Lipschitz optimization 
[90C26] 
(see: Global optimization using space filling) 
Lipschitz programming
[90C26]
(see: Global optimization: envelope representation)
Lipschitz stability 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric
programming) 
Lipschitz stable solution 
[90C22, 90C25, 90C31] 
(see: Semidefinite programming: optimality conditions and 
stability) 
Lipschitzian operators in best approximation by bounded or 
continuous functions 
(65K10, 41A30, 4799) 
(referred to in: Large scale trust region problems) 
(refers to: Convex envelopes in optimization problems) 
lipschitzian selection operator 
[41A30, 4799, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
Lipschitzian selection operator see: optimal — 
Lipschitzness see: compact epi- — 
liquid phases 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction 
equilibrium) 
list see: candidate —; closed —; code —; extended doubly 
connected edge —; open —; prediction —; restricted 
Candidate —; running —; tabu — 
list coloring 
[05-XX] 
(see: Frequency assignment problem) 
list size 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
list square merit function 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
literal 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
literal (in logic) 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
Littlewood-Polya theorem see: Hardy- — 
LJ optimization procedure 
[93-XX] 
(see: Direct search Luus—Jaakola optimization procedure) 
LMI 
[15A15, 90C25, 90C55, 90C90] 
(see: Semidefinite programming and determinant 
maximization) 
LMM 
[90C25, 90C30] 
(see: Lagrangian multipliers methods for convex 
programming) 
LMT-skeleton
[68Q20] 
(see: Optimal triangulations) 
load 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
load see: splitting/unsplitting of — 
load balancing 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 
load balancing see: dynamic —; static — 
Load balancing for parallel optimization techniques 
(68W10, 90C27) 
(referred to in: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Heuristic search; Parallel computing: 
complexity classes; Parallel computing: models; Parallel 
heuristic search; Stochastic network problems: massively 
parallel solution) 
(refers to: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Heuristic search; Interval analysis: parallel 
methods for global optimization; Parallel computing: 
complexity classes; Parallel computing: models; Parallel 
heuristic search; Stochastic network problems: massively 
parallel solution) 
load balancing scheme see: near-neighbor — 
load balancing technique see: dynamic — 
load curve 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy
power systems) 
load dispatcher 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
load dispatching 
[90B50] 
(see: Optimization and decision support systems) 
loadbalance 
[65K05, 65Y05, 65Y10, 65Y20, 68W10] 
(see: Interval analysis: parallel methods for global 
optimization) 
loads see: set of —; variational inequality formulation in link — 
local 
[65H20] 
(see: Multi-scale global optimization using 
terrain/funneling methods) 
local approximations see: nonsmooth — 
local-area computer network 
[68Q25, 90B80, 90C05, 90C27] 
(see: Communication network assignment problem) 
local attractor 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations) 
local attractors see: singular — 
Local attractors for gradient-related descent iterations 
(49M29, 65K10, 90C06) 
(referred to in: Conjugate-gradient methods; Large scale 
trust region problems; Nonlinear least squares: 
Newton-type methods; Nonlinear least squares: trust region 
methods) 
(refers to: Conjugate-gradient methods; Large scale trust 
region problems; Nonlinear least squares: Newton-type 
methods; Nonlinear least squares: trust region methods) 
local basin 
[65K05, 90C26, 90C30, 90C59] 
(see: Global optimization: filled function methods) 
local bilinear form see: K- — 
local consistency 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
local convergence rate 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
local cut 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and cut algorithms) 
local dualization see: Ioffe-Burke —
local efficiency 
[90C29]
(see: Generalized concavity in multi-objective optimization) 
local efficient point 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
local Ekeland point 
[58C20, 58E30, 90C46, 90C48] 
(see: Nonsmooth analysis: weak stationarity) 
local equivalence closure of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
local equivalence relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
local exchange procedures 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
local greedy algorithms 
[05C85] 
(see: Directed tree networks) 
local improvement 
[68Q20] 
(see: Optimal triangulations) 
local independence 
(see: Bayesian networks) 
local infimum 
[90Cxx] 
(see: Discontinuous optimization) 
local inner product see: K- — 
local maximizer 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
local maximizer see: discrete left —; strict — 
local maximizers see: set of discrete e-global — 
local maximum 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
local maximum point 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
local maximum point see: strict — 
local maximum principle 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
local maximum principle see: high-order — 
local maximum principle for Lagrangian problems see: 
high-order — 
local minima
[92B05]
(see: Genetic algorithms)
local minimization
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26]
(see: Information-based complexity and information-based
optimization)
local minimization
[90C60]
(see: Complexity theory: quadratic programming)
local minimizer
[65K05, 90C26, 90Cxx]
(see: Dini and Hadamard derivatives in optimization;
Smooth nonlinear nonconvex optimization)
local minimizer see: isolated —; nonsingular —; regular —;
strict —; strong —
local minimizer problem
[90C60]
(see: Complexity theory: quadratic programming)
local minimizers
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)
local minimum
[90-08, 90C08, 90C11, 90C26, 90C27, 90C39, 90C57, 90C59]
(see: Quadratic assignment problem; Second order
optimality conditions for nonlinear optimization; Variable
neighborhood search methods)
local minimum
[90C26, 90C39, 92B05]
(see: Genetic algorithms; Second order optimality
conditions for nonlinear optimization)
local minimum see: strict —; strong —
local minimum condition see: high-order —
local minimum point
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
local minimum point see: strict —
local MINLP
[49M37, 90C11]
(see: Mixed integer nonlinear programming)

local monotonicity 

[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 

local optima 
[62H30, 90C27] 
(see: Assignment methods in clustering) 

local optimization 
[90C30, 90C60, 90C90] 
(see: Complexity theory; MINLP: applications in blending 
and pooling problems) 

local optimizer see: strict — 

local optimum 
[03B05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 68P10, 
68Q25, 68R05, 68T15, 68T20, 90C09, 90C26, 90C27, 90C30, 
94C10] 
(see: Maximum satisfiability problem; Stochastic global 
optimization: two-phase methods) 

local order relation
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations) 
local phase 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: two-phase methods) 
local pre-order closure of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
local properties of the configuration space 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
local quadratic convergence theorem 
[90C30] 
(see: Numerical methods for unary optimization) 
local-ratio principle 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
local relational properties 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
local search 
[65K05, 90-08, 90B35, 90C10, 90C26, 90C27, 90C30, 90C59,
94C15] 
(see: Global optimization: filled function methods; Graph 
planarization; Job-shop scheduling problem; Maximum 
constraint satisfaction: relaxations and upper bounds; 
Variable neighborhood search methods) 
local search 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90-XX, 90C09, 
90C27, 90C35, 94C10] 
(see: Feedback set problems; Maximum satisfiability 
problem; Survivable networks) 
local search see: chained —; iterated —; nonoblivious —; 
stochastic — 
local search algorithms 
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)
local search device
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
local search heuristics
[05C69, 05C85, 68W01, 90C59]
(see: Heuristics for maximum clique and independent set)
local search method
[90-XX]
(see: Survivable networks)
local search phase in GRASP
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures) 
local search problems see: polynomial time — 
local solution 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods) 
local solutions 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 
local strict efficiency
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
local strict monotonicity 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
local strictly efficient point 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
local strong monotonicity 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
local system cohomology 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
local tolerance closure of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
local tolerance relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
local underestimator 
[90C11, 90C26]
(see: Extended cutting plane algorithm)
local weak efficiency
[90C29]
(see: Generalized concavity in multi-objective optimization)
local weakly efficient point
[90C29]
(see: Generalized concavity in multi-objective optimization)
locality constraint
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem)
locality measure see: subgradient —
localization
[65K05, 90C30]
(see: Random search methods)
localization of an ideal
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods) 
localization problem see: Sensor network — 
localization problem, SNLP see: Semidefinite programming 
and the sensor network — 
localization property 
[90C33] 
(see: Topological methods in complementarity theory) 
localization search 
[65K05, 90C30] 
(see: Random search methods) 
localization search see: pure — 
localization set 
[49M20, 90-08, 90C25] 
(see: Nondifferentiable optimization: cutting plane 
methods) 
locally
[68T20, 68T99, 90C27, 90C31, 90C34, 90C59]
(see: Metaheuristics; Parametric global optimization:
sensitivity)
locally consistent
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints)
locally extremal
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials)
locally filled function 
[65K05, 90C26, 90C30, 90C59] 
(see: Global optimization: filled function methods) 
locally Lipschitz 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
locally Lipschitz continuous function 
[65K10, 90C90] 
(see: Variational inequalities: projected dynamical system)
locally Lipschitz function 
[49J52, 65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Hemivariational inequalities: eigenvalue problems;
Interval global optimization) 
locally minimal 
[68Q20] 
(see: Optimal triangulations)
locally monotone function 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
locally optimal 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 
locally optimal parameter 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric
programming) 
locally optimal solution 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem)
locally reduced problem 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods)
locally reflexive relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
locally strictly monotone function 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
locally strongly monotone function 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
locally strongly monotonic at 
[90C05] 
(see: Extension of the fundamental theorem of linear
programming) 
location 
[49M07, 49M10, 65K, 90B80, 90C06, 90C20] 
(see: Facilities layout problems; Spectral projected gradient 
methods) 
location
[90B80, 90C10, 90C11] 
(see: Facilities layout problems; Facility location problems 
with spatial interaction; Stochastic transportation and 
location problems) 
location see: center of gravity —; Competitive facility —; 
continuous —; dynamic facility —; emergency facility —; 
facility —; median —; multi-objective facility —; 
multifacilities —; multifacility —; multiple-facility —; single 
facility —; Single facility location: multi-objective euclidean 
distance —; Single facility location: multi-objective 
rectilinear distance —; Voronoi diagrams in facility — 
location-allocation see: facility —; median —; MINLP: 
application in facility —; multifacility — 
location-allocation model 
[90C26] 
(see: MINLP: application in facility location-allocation) 
location-allocation problem 
[90C26] 
(see: MINLP: application in facility location-allocation) 
location-allocation problem see: p-median — 
location and assignment see: discrete — 
location: circle covering problem see: Single facility — 
location: covering problems see: Network — 
location with euclidean and rectilinear distances see: 
Optimizing facility — 
location with externalities see: Facility — 
location model see: facility —; plant —; spatial competition 
facility —; Stochastic facility — 
location: multi-objective euclidean distance location see: 
Single facility — 
location: multi-objective rectilinear distance location see: 
Single facility — 
location pattern 
[90B80, 90B85] 
(see: Warehouse location problem) 
location problem 
[90B80, 90B85] 
(see: Warehouse location problem) 
location problem see: discrete —; dynamic —; Euclidean 
distance —; facility —; iterative solution of the Euclidean 
distance —; maximum coverage —; objective for a —; 
prototype —; rectilinear distance —; restricted —; simple
plant —; squared Euclidean distance —; stochastic 
transportation and —; uncapacitated facility —; 
uncapacitated plant —; warehouse — 
location problems see: barrier —; facility —; Global 
optimization in —; Multifacility and restricted —; Stochastic 
transportation and — 
location problems with spatial interaction see: Facility — 
location problems with staircase costs see: convex piecewise 
linearization in facility —; heuristics of facility —; 
linearization in facility —; solution of facility — 
location-routing 
[90-02, 90B06] 
(see: Operations research models for supply chain 
management and design; Vehicle routing) 
location-routing models 
[90-02] 
(see: Operations research models for supply chain 
management and design) 
Location routing problem 
(90B06, 90B80) 
location on a sphere see: minimax — 
location with staircase costs see: Facility — 
location theory 
[90B85] 
(see: Multifacility and restricted location problems) 
locational decision problem 
[90B80, 90B85] 
(see: Warehouse location problem) 
locomotive assignment models 
(see: Railroad locomotive scheduling) 
Locomotive scheduling 
(see: Railroad locomotive scheduling) 
locomotive scheduling see: Railroad — 
locomotive scheduling models see: single — 
locomotive type see: single — 
locomotive type models see: multiple — 
Loeb measure 
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems) 
logarithmic barrier function 
[90C05]
(see: Linear programming: interior point methods) 
logarithmic barrier method see: interior point — 
logarithmic form 
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements) 
logarithmic p-form 
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements) 
logarithmic-quadratic barrier-penalty function 
[90C31]
(see: Sensitivity and stability in NLP: approximation) 
logarithmic scoring rule 
(see: Bayesian networks) 
logarithmic and square-root transformation 
[90C11, 90C90]
(see: MINLP: trim-loss problem) 
logarithmic volume 
[37A35, 90C05]
(see: Potential reduction methods for linear programming) 
logconcave discrete probability distribution 
[90C15]
(see: Logconcavity of discrete distributions) 
logconcave distributions see: discrete — 
logconcave function 
[90C15]
(see: Logconcave measures, logconvexity) 
logconcave function 
[90C15]
(see: Logconcave measures, logconvexity; Probabilistic 
constrained problems: convexity theory) 
logconcave measure 
[90C15]
(see: Logconcave measures, logconvexity) 
Logconcave measures, logconvexity 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points;
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcavity of discrete distributions; L-shaped method for 
two-stage stochastic programs with recourse; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programs with recourse: upper bounds; 
Stochastic vehicle routing problems; Two-stage stochastic 
programs with recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcavity of discrete distributions; L-shaped method for 
two-stage stochastic programs with recourse; Multistage 
stochastic programming: barycentric approximation; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

logconcave probability density function 
[90C15] 
(see: Probabilistic constrained problems: convexity theory) 

logconcave probability measure 
[90C15] 
(see: Logconcave measures, logconvexity) 


logconcave univariate discrete probability distribution 
[90C15] 
(see: Logconcavity of discrete distributions) 

Logconcavity of discrete distributions 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; L-shaped method for 
two-stage stochastic programs with recourse; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programs with recourse: upper bounds; 
Stochastic vehicle routing problems; Two-stage stochastic 
programs with recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; L-shaped method for 
two-stage stochastic programs with recourse; Multistage 
stochastic programming: barycentric approximation; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic
programs with recourse) 
logconvex function 
[90C15]
(see: Logconcave measures, logconvexity) 
logconvex measure 
[90C15] 
(see: Logconcave measures, logconvexity)
logconvex probability measure 
[90C15] 
(see: Logconcave measures, logconvexity)
logconvexity see: Logconcave measures — 
logic see: BL- —; evaluation in classical —; evaluation in 
multiple-valued —; fuzzy —; interval —; literal (in — 
logic algebra see: Boolean 2-valued —; many-valued — 
logic algebra connective 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
logic algebras see: Finite complete systems of many-valued —; 
many-valued families of the Pinkava —; PI- —; taxonomy of 
Pi- — 
logic of approximation 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
logic-based approaches 
[49M37, 90C11] 
(see: Mixed integer nonlinear programming) 
logic-based methods see: MINLP: — 
Logic-based outer approximation 
Logic-based outer-approximation method 
(see: Optimal planning of offshore oilfield infrastructure) 
logic conditional 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
logic connectives 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
logic connectives see: emergence of — 
logic gates 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras)
logic implication see: many-valued — 
logic normal form see: complete many-valued — 
logic programming 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: intermediate terms)
logic programming see: constrained —; constraint —; 
modeling language and constraint —; paradigm of — 
logic of Scientific Discovery 
[34-xx, 34Bxx, 34Lxx, 93E24] 
(see: Complexity and large-scale least squares problems) 
logic system see: lattice-type many-valued — 
logic system of approximate reasoning see: interval —; 
point-based — 
logical algebra see: Pinkava — 
logical clause 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 


logical connectives see: TOP and BOT types of — 
logical design 
[01A99, 90C99] 
(see: Von Neumann, John) 
logical equivalence 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
logical implications 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
logics see: Checklist paradigm semantics for fuzzy —; 
classification of many-valued —; many-valued —; 
taxonomy of the PI-algebras of many-valued —
logistics 
[90B05, 90B06]
(see: Global supply chain models) 
logistics 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 
logistics control model 
[90-02]
(see: Operations research models for supply chain 
management and design) 
logistics flows 
[90-02]
(see: Operations research models for supply chain 
management and design) 
logistics management 
[90-02]
(see: Operations research models for supply chain 
management and design) 
LogP model 
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes) 
logspace Turing machine 
[03D15, 68Q05, 68Q15] 
(see: Parallel computing: complexity classes) 
long campaigns 
(see: Planning in the process industry) 
long range planning 
[90C90]
(see: Chemical process planning) 
long serious step 
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 
long-term memory in GRASP 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures) 
look-ahead rules 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: stopping rules) 
lookahead-unit-resolution see: single- — 
lookup table representation 
[90C39]
(see: Neuro-dynamic programming) 
loop 
[90C09, 90C10]
(see: Matroids) 
loop control see: closed- —; open- — 
loop Nash equilibrium see: open- — 
loop structure prediction methods see: Protein —
loops see: nested — 
LOP 

[90C10, 90C11, 90C20] 

(see: Linear ordering problem) 
loss see: trim — 
loss of descent 

[90C25, 90C30] 

(see: Successive quadratic programming: full space 

methods) 
loss of descent in a nonlinear program 

[90C25, 90C30] 

(see: Successive quadratic programming: full space 

methods) 
loss problem see: MINLP: trim- —; numerical example of 

a trim- —; trim- — 
losses see: minimization of — 
lost sales assumption 

[49L20] 

(see: Dynamic programming: inventory control) 
lot sizing 

[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 

(see: Modeling difficult optimization problems) 
lot-sizing problem see: capacitated —; Economic — 
Louis see: Lagrange, Joseph- — 

Lovasz extension 

[90C10, 90C25, 90C27, 90C35] 

(see: L-convex functions and M-convex functions) 
Lovasz number 

(05C69, 05C15, 05C17, 05C35, 90C35, 90C22) 

(referred to in: Stable set problem: branch & cut algorithms) 

(refers to: Copositive programming) 
lovasz number 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35]

(see: Lovasz number) 

low failure of the alpha-beta algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching) 
low-level feature detection 

[90C90]

(see: Optimization in medical imaging) 
low-level software 

[90C10, 90C26, 90C30]

(see: Optimization software) 

low-rank nonconvexity 

[90C26, 90C31]

(see: Global optimization in multiplicative programming; 
Multiplicative programming) 
lower bandwidth 

[15-XX, 65-XX, 90-XX] 

(see: Cholesky factorization) 
lower bound 

[90C10] 

(see: Maximum constraint satisfaction: relaxations and 

upper bounds) 
lower bound see: Gilmore-Lawler —; guaranteed —; 

Jensen —; parametric —; valid — 


lower bound function 
[90C15, 90C27] 
(see: Discrete stochastic optimization) 

lower bound for a set 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 

lower bound test 

[49M37, 90C11]

(see: Mixed integer nonlinear programming) 

lower boundary 

[65K05, 90C26, 90C30]

(see: Monotonic optimization) 

lower boundary point 

[65K05, 90C26, 90C30]

(see: Monotonic optimization) 

lower bounding 

[49M37, 65K10, 90C26, 90C30]

(see: αBB algorithm)

lower bounding Hessian 

[49M37, 65K10, 90C26, 90C30]

(see: αBB algorithm)

lower bounds 

[05C85, 90B35]
(see: Directed tree networks; Job-shop scheduling problem) 

lower bounds see: constructive —; eigenvalue based —; 
Gilmore-Lawler type —; maximum flow problem with
nonnegative —; parametric upper and —; variance 
reduction — 

lower bounds to eigenvalues see: upper and — 

lower bounds for multivariate probability integrals 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 

lower convex hull 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 

lower derivative see: Dini —; Dini conditional —; Hadamard 
conditional — 


lower directional derivative see: Dini —; Hadamard — 
lower envelope 
[90C30] 
(see: Lagrangian duality: basics)
lower-level 
[49M30, 49M37, 65K05, 90C30] 
(see: Practical augmented Lagrangian methods) 
lower-level problem 
[57R12, 90C31, 90C34, 90C46] 
(see: Generalized semi-infinite programming: optimality
conditions; Parametric global optimization: sensitivity; 
Smoothing methods for semi-infinite optimization) 
lower problem 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 
lower semicontinuous 
[46A22, 49J20, 49J35, 49J40, 49J52, 54D05, 54H25, 55M20, 
91A05] 
(see: Minimax theorems; Shape optimization) 
lower semicontinuous

[90C11, 90C15]

(see: Stochastic programming with simple integer recourse) 
lower semicontinuous function 

[03H10, 49J27, 90C26, 90C34] 

(see: Convex envelopes in optimization problems;
Semi-infinite programming and control problems) 
lower set 

[62G07, 62G30, 65K05] 

(see: Isotonic regression problems)

lower set algorithm see: minimum — 

lower sets see: minimum — 

lower and upper bounds constraints 

[90C09, 90C10] 

(see: Combinatorial optimization algorithms in resource
allocation problems) 

lower and upper directional derivatives 

[90C31, 90C34, 90C46] 

(see: Generalized semi-infinite programming: optimality
conditions) 

lower weight bounds 

[68Q20] 

(see: Optimal triangulations)

lower well oil rate constraints see: upper and — 

Löwner partial order

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem)


LP 
[90C05] 
(see: Linear programming) 
LP 
[05B35, 65K05, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C33, 90C57, 90C90] 
(see: Lexicographic pivoting rules; Modeling difficult
optimization problems) 
LP duality 
[90C05, 90C15] 
(see: Probabilistic constrained linear programming: duality 
theory) 
LP/NLP based branch and bound
[49M20, 90C11, 90C30] 
(see: Generalized outer approximation) 
LP relaxation
[68Q99, 90B80, 90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Branch and price: Integer programming with column 
generation; Facility location problems with spatial 
interaction; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane 
algorithms) 
LP strategy for interval-Newton method in deterministic 
global optimization 
LPS


(see: Short-term scheduling of batch processes with
resources) 

LS-CD

[49M07, 49M10, 65K, 90C06] 

(see: New hybrid conjugate gradient algorithms for 
unconstrained optimization) 

LS problem 

[15-XX, 65-XX, 90-XX]

(see: Cholesky factorization) 


LSO 
[41A30, 47499, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
LSP 
[65Fxx] 
(see: Least squares problems) 
LSTR 
[90C30] 
(see: Large scale trust region problems) 
LSUO 
[90C06] 
(see: Large scale unconstrained optimization) 
LU algorithm see: Implicit — 
LU-decomposition 
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 
LU factorization see: classical — 
Luc U-quasiconcave function 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
Lukasiewicz connective 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
Lukasiewicz implication 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
lumped systems see: reaction flux estimation in — 
lunch see: no free — 
Luus-Jaakola optimization procedure see: Direct search —
LX algorithm see: Implicit — 
Lyapunov function 
[90C30] 
(see: Suboptimal control) 
Lyusternik theorem 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
Lyusternik theorem see: classical —; high-order generalization 
of — 


M 


m see: algorithm solving a problem instance in time —; 
big- —; GRR- —; skew-symmetric matrix — 
m-coloring problem 
[90C08, 90C11, 90C27] 
(see: Quadratic semi-assignment problem) 
m-convex 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
M-convex function 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
M♮-convex function
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
M♮-convex function
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
M-convex functions see: L-convex functions and — 
M-convex set 
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions) 
M♮-convex set
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions) 
M-convexity 
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions) 
m-dial-a-ride 
[00-02, 01-02, 03-02]
(see: Vehicle routing problem with simultaneous pickups 
and deliveries) 
m-dimensional knapsack problem 
[90C10, 90C27]
(see: Multidimensional knapsack problems) 
M-estimator see: Huber — 
M- and L-convex functions see: Fenchel-type duality for — 
M-Pareto optimal solution 
[90C29, 90C70] 
(see: Fuzzy multi-objective linear programming) 
M-separation theorem 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
m-TSP 
[90B06] 
(see: Vehicle routing) 
machine see: accepting computation of a Turing —; accepting 
state of a Turing —; alternating Turing —; bottleneck —; 
control state of a Turing —; deterministic Turing —; 
execution of a Turing —; exponentially space-bounded 
Turing —; exponentially time-bounded Turing —; final state
of a Turing —; generalized eigenvalue proximal support 
vector —; input alphabet of a Turing —; language accepted 
by a Turing —; length of a partial computation of 
a Turing —; logspace Turing —; move of a Turing —; 
nonaccepting computation of a Turing —; nondeterministic 
Turing —; parallel random access —; partial computation of 
a Turing —; polynomially space-bounded Turing —; 
polynomially time-bounded Turing —; running time of 
a Turing —; size of the input of a Turing —; space
complexity of a deterministic Turing —; space complexity of 
a nondeterministic Turing —; start state of a Turing —; state 
of a Turing —; tape cell of a Turing —; tape of a Turing —; 
time complexity of a deterministic Turing —; time 
complexity of a nondeterministic Turing —; transition rules 
of a Turing —; Turing — 
machine interval 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
machine interval arithmetic 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
machine interval arithmetic see: inclusion principle of — 
machine learning 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based methods) 
machine-learning algorithm 
[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX] 
(see: Resource allocation for epidemic control) 
machine model see: Turing — 
machine problem see: Generalized eigenvalue proximal 
support vector — 
machine repetition 
[90B35] 
(see: Job-shop scheduling problem) 
machine size 
[65K05, 65Y05] 
(see: Parallel computing: models) 
machine solving a problem see: Turing — 
machines see: complexity of Turing —; distributed memory 
parallel —; identical —; nonidentical —; parallel —; shared 
memory parallel —; support vector — 
macro scale network 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
macrostate 
[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 
Madansky upper bound see: Edmundson- — 
Maehly method see: Lehmann- — 
MAESTRO 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 
magic numbers 
[90C26, 90C90] 
(see: Global optimization in Lennard-Jones and Morse
clusters) 
magnitude see: order of — 
main diagonal see: negative — 
maintenance 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization) 
maintenance routing problem see: airline — 
majorant see: smallest K- — 
majority 
[90-XX] 
(see: Outranking methods)
majority theorem 
[90C26, 90C90] 
(see: Global optimization in Weber’s problem with
attraction and repulsion) 
majorization 
[90C09, 90C10] 
(see: Combinatorial matrix analysis)
maker see: decision — 
makespan 
[68Q25, 90B36, 90C60] 
(see: NP-complete problems and proof methodology;
Stochastic scheduling) 
makespan see: minimization of — 
making see: decision —; financial decision —; group 
decision —; hierarchical decision —; multicriteria 
decision —; multiple criteria decision —; Preference 
disaggregation approach: basic features, examples from 
financial decision — 
making problems: optimization techniques see: Estimating
data for multicriteria decision — 

making with rolling horizon see: decision — 

making under extreme events see: decision — 

making under uncertainty see: decision — 

mammography screening 

[90C09, 90C10] 

(see: Optimization in boolean classification problems) 

management 

[90C26, 90C30, 90C31] 

(see: Bilevel programming: introduction, history and
overview)

management 

[90-01, 90B30, 90B50, 91B32, 91B52, 91B74] 

(see: Bilevel programming in management)

management see: applications in environmental systems 
modeling and —; asset liability —; Bilevel programming
in —; Bilinear programming: applications in the supply 
chain —; catchment —; Competitive ratio for portfolio —; 
forest —; inventory —; logistics —; Mathematical 
programming methods in supply chain —; multistage 
inventory —; operational supply chain —; portfolio —; 
revenue —; strategic supply chain —; supply chain — 

management decision support system see: Asset liability — 

management and design see: Operations research models for 
supply chain — 

management of environmental systems see: Global 
optimization in the analysis and — 

management hypothesis see: inefficient — 

Management Mathematics see: IMA Journal of —

management models see: multistage inventory —; single 
stage inventory — 

management in supply chains see: Inventory — 

manager see: layout — 

mandatory work first algorithm 

[49J35, 49K35, 62C20, 91A05, 91A40]

(see: Minimax game tree searching) 

Mangasarian-Fromovitz constraint qualification 

[90C31]

(see: Sensitivity and stability in NLP: continuity and 

differential stability) 

Mangasarian-Fromovitz constraint qualification 

[90C26, 90C31, 90C34, 90C39]
(see: Parametric global optimization: sensitivity; Second 
order optimality conditions for nonlinear optimization) 

Mangasarian-Fromovitz CQ 

[49K27, 49K40, 90C30, 90C31] 

(see: First order constraint qualifications)

Manhattan distance 

[90B80, 90C27] 

(see: Voronoi diagrams in facility location)

Manhattan distances 

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem) 

manifold 

[05B35, 20F36, 20F55, 52C35, 57N65] 

(see: Hyperplane arrangements)

manifold see: Riemannian — 


manipulation see: symbolic — 
Mann-Whitney statistic 

[62H30, 90C27] 

(see: Assignment methods in clustering) 
many conditions moment problem see: infinite — 
many-valued families of the Pinkava logic algebras 

[03B50, 68T15, 68T30] 

(see: Finite complete systems of many-valued logic algebras) 
many-valued logic algebra 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 
many-valued logic algebras see: Finite complete systems of — 
many-valued logic implication 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,

91B06, 92C60] 
(see: Boolean and fuzzy relations) 
many-valued logic normal form see: complete — 
many-valued logic system see: lattice-type — 
many-valued logics 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
many-valued logics 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T15, 68T27, 68T30] 
(see: Checklist paradigm semantics for fuzzy logics; Finite 
complete systems of many-valued logic algebras) 
many-valued logics see: classification of —; taxonomy of the 
PI-algebras of —
many-valued normal form 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
MAP 
[90C08, 90C10, 90C11, 90C27, 90C57, 90C59] 
(see: Multidimensional assignment problem; Quadratic 
assignment problem) 
map see: adjoint linear —; cone-convex —; contact —; 
maximal monotone —; monotone —; normal —; Peano —; 
proximal —; proximity —; quasimonotone —; semistrictly 
quasimonotone —; standard part —; strictly monotone —;
strictly pseudomonotone —; strictly quasimonotone —

map optimization see: fluence — 

map overlap see: contact — 

map overlap maximization problem, CMO see: Contact — 

mapping see: best response —; closed point-to-set —; 
computation and data —; Contraction- —; DNA —; nearest 
point —; optimal solution —; ®-isotone —; point-to-set —; 
pseudomonotone —; semismooth —; strongly 
semismooth —; zero-epi —

mapping technique see: pictogram translation —; receiver 
initiated —; sender initiated — 

mappings see: approximation of nonsmooth —; 
approximations of nonsmooth —; contraction —; method 
of —; point-to-set — 

maps see: Generalized monotone multivalued —; Generalized 
monotone single valued —; homotopic — 

maps: properties and applications see: Pseudomonotone —

Maratos effect 

[65K05, 65K10, 90C06, 90C30, 90C34] 

(see: Feasible sequential quadratic programming) 
margin see: multivariable stability —; Stability — 
margin K see: multivariable stability — 
marginal allocation
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
marginal analysis 
[90C60] 
(see: Complexity of degeneracy) 
marginal constraint 
[90C35] 
(see: Multi-index transportation problems) 
marginal distribution functions 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
marginal function 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
marginal function optimization 
[49J35, 65K99, 74A55, 74M10, 74M15, 90C26] 
(see: Quasidifferentiable optimization: applications) 
marginal functions 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 
marginal probability distribution function see: 
one-dimensional —; two-dimensional — 
marginal value 
[90C60] 
(see: Complexity of degeneracy) 
marginal value see: positive — 
marginal value formula 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
marginal values see: negative —; positive — 
marginals see: table with given — 
margins see: hierarchical collection of — 
Margolin method see: Schruben- — 
Maritime inventory routing problems 
market equilibrium see: Oligopolistic — 
market equilibrium conditions 
[65K10, 90C90] 
(see: Variational inequalities: projected dynamical system) 
market equilibrium conditions 
[65K10, 65M60] 
(see: Variational inequalities) 
market line see: capital —; security — 
market model see: Sharpe single index — 
market portfolio 
[91B50] 
(see: Financial equilibrium) 
markets see: aspatial and spatial —; Operations research and 
financial —; spatial — 
Markoff theorem see: Gauss- — 
Markov chain
(see: Bayesian networks) 
Markov chain see: finite-state —; stationary-state — 
Markov chain sampling 
[65K05, 90C30] 
(see: Random search methods) 


Markov chains 

[90C27] 

(see: Operations research and financial markets) 

Markov decision process 

[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX] 

(see: Resource allocation for epidemic control) 

Markov kernel 

[28-XX, 49-XX, 60-XX, 90C15, 90C29] 

(see: Discretely distributed stochastic programs: descent
directions and efficient points; General moment 
optimization problems) 

Markov kernel 

[90C15, 90C29] 

(see: Discretely distributed stochastic programs: descent 

directions and efficient points) 

Markov model and Gibbs sampler see: hidden — 
Markov models see: hidden — 
Markov process 

[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 

70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 

(see: Global optimization in protein folding) 

Markov process 

[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX] 

(see: Resource allocation for epidemic control) 
Markov processes and their simulation see: Derivatives of —
Markov strategy 

[49Jxx, 91Axx]

(see: Infinite horizon control and dynamic games) 
Markov transformation 

[28-XX, 49-XX, 60-XX] 

(see: General moment optimization problems) 
Markowitz mean-variance model see: Portfolio selection: —
Marquardt see: Levenberg- — 

Marquardt algorithm see: Levenberg- —
Marquardt method see: Levenberg- — 
Marquardt rule see: Levenberg- — 
marriage problem 

[90C05, 90C10, 90C27, 90C35] 

(see: Assignment and matching) 
marriage problem see: stable — 

Martin algorithm 

[68T99, 90C27] 

(see: Capacitated minimum spanning trees) 
martingale see: score function — 
mass 
[90C26, 90C90] 
(see: Global optimization in binary star astronomy)
mass balance constraints 
[90C35] 
(see: Maximum flow problem; Minimum cost flow problem)
mass balances 
[90C30, 90C90] 

(see: Successive quadratic programming: applications in 
distillation systems) 
mass and energy balance equations
[90C30, 90C90]
(see: Successive quadratic programming: applications in 
distillation systems) 
mass, energy and momentum balances 
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling) 
mass exchange see: modeling — 
mass exchange matches 
[93A30, 93B50]
(see: MINLP: mass and heat exchanger networks; Mixed 
integer linear programming: mass and heat exchanger 
networks) 
mass exchange network see: heat and — 
mass exchange networks see: Flexible — 
mass exchanger 
[93A30, 93B50]
(see: Mixed integer linear programming: mass and heat
exchanger networks) 
mass exchanger network 
[93A30, 93B50]
(see: MINLP: mass and heat exchanger networks)
mass and heat exchange 
[93A30, 93B50]
(see: MINLP: mass and heat exchanger networks; Mixed 
integer linear programming: mass and heat exchanger 
networks) 
mass and heat exchanger network 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks) 
mass and heat exchanger networks see: MINLP: —; Mixed 
integer linear programming: — 
mass/heat transfer module 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks) 
mass separating agents 
[93A30, 93B50] 
(see: Mixed integer linear programming: mass and heat 
exchanger networks) 
massive data sets see: least squares problems with — 
massively parallel computing 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 
massively parallel solution see: Stochastic network 
problems: — 
master problem 
[49M29, 90C06, 90C10, 90C11, 90C15, 90C30, 90C35, 90C57, 
90C90] 
(see: Decomposition algorithms for the solution of 
multistage mean-variance optimization problems; 
Generalized benders decomposition; Modeling difficult 
optimization problems; Multicommodity flow problems; 
Simplicial decomposition; Stabilization of cutting plane 
algorithms for stochastic linear programming problems) 
master problem 
[90C30] 
(see: Simplicial decomposition) 
master problem see: complete —; disjunctive OA —; full —; 
MILP —; MIQP —; primal —; relaxed —; relaxed primal —; 
restricted — 


master program 
[90B10, 90C05, 90C06, 90C35]
(see: Nonoriented multicommodity flow problems) 
master program see: full —; reduced — 
master-slave scheme 
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques) 
master subproblem 
[90C11, 90C31]
(see: Parametric mixed integer nonlinear optimization) 
match-network problem 
[90C90]
(see: MINLP: heat exchanger network synthesis) 
matches see: mass exchange — 
matching 
[05C85, 90C05, 90C10, 90C27, 90C35]
(see: Assignment and matching; Directed tree networks; 
Maximum flow problem) 
matching see: 3-DIMENSIONAL —; algorithm pre- —;
Assignment and —; b- —; bipartite —; maximum —;
Maximum partition —; maximum pre- —; partition —; 
perfect —; pre- — 


matching of derivative conditions 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 

matching heuristic see: maximal — 

matching-I see: algorithm partition- —

matching model 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

matching problem 
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming) 

matching problem see: 2- —; 3-dimensional —; b- —; 
maximum cardinality —; maximum partition —; maximum 
pre- —; perfect —; perfect b- —; weighted —; weighted 
bipartite — 

Matching (ROM) see: recursive Opt — 

Matching Subgraph Problem see: minMax — 

matchings see: perfect — 

material derivative approach 
[49J20, 49J52] 
(see: Shape optimization) 

material flow 
[90-02] 
(see: Operations research models for supply chain 
management and design) 

material flows see: balance equations for — 

material laws see: discretized hemivariational inequalities for 
nonlinear — 

materials see: laminated composite — 

mathematical areas see: software package for specific — 

mathematical and computational certainty 
(see: LP strategy for interval-Newton method in 
deterministic global optimization) 

mathematical economics 
[90B80, 90B85, 90Cxx, 91Axx, 91Bxx]
(see: Facility location with externalities) 

mathematical finance 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based
optimization) 
mathematical formulation 
[90C11, 90C29, 90C90] 
(see: Multi-objective optimization: interaction of design 
and control) 
mathematical model 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30, 90C06, 90C10, 90C11, 90C30, 
90C57, 90C90] 
(see: Identification methods for reaction kinetics and 
transport; Modeling difficult optimization problems) 
mathematical modeling 
[49M37, 90C11] 
(see: MINLP: applications in the interaction of design and 
control) 
mathematical models 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
mathematical program see: extreme point — 
mathematical program with affine equilibrium constraints 
[90C30, 90C33] 
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach) 
mathematical program with equilibrium constraints 
[90C15, 90C26, 90C33]
(see: Stochastic bilevel programs) 
mathematical programming
[90C05, 90C06, 90C22, 90C25, 90C30, 90C51]
(see: Interior point methods for semidefinite programming; 
Saddle point theory and optimality conditions) 
mathematical programming 
[26B25, 26E25, 49J52, 62H30, 90B80, 90B85, 90C11, 90C27,
90C99, 90Cxx, 91Axx, 91Bxx] 
(see: Facility location with externalities; Operations 
research and financial markets; Quasidifferentiable 
optimization; Statistical classification: optimization 
approaches) 
mathematical programming see: multi-objective — 
Mathematical programming for data mining 
Mathematical programming methods in supply chain 
management 
(referred to in: Generalizations of interior point methods for 
the linear complementarity problem; Simultaneous 
estimation and optimization of nonlinear problems) 
(refers to: Generalizations of interior point methods for the 
linear complementarity problem; Simultaneous estimation 
and optimization of nonlinear problems) 
mathematical programming problem see: nonlinear — 
mathematical rigor see: with — 
mathematical software 
[90C10, 90C26, 90C30] 
(see: Optimization software) 



Mathematics see: IMA Journal of Management —

matric matroid 
[90C09, 90C10] 
(see: Matroids) 

matrices 
[90C33] 
(see: Linear complementarity problem) 

matrices see: Abaffian —; completion of —; completion to 
completely positive and contraction —; Hessian —; Interval 
analysis: eigenvalue bounds of interval —; positive 
definite —; positive semidefinite —; q- —; Stochastic 
programming: parallel factorization of structured —; 
updating input-output — 

matrix 
[90C09, 90C10, 90C25, 90C33, 90C55] 
(see: Combinatorial matrix analysis; Splitting method for 
linear complementarity problems) 

matrix 
[90C30] 
(see: Frank-Wolfe algorithm) 

matrix see: adjacency —; anti-Monge —; anti-Robinson —; 
banded —; basic —; bisymmetric —; bisymmetric positive 
semidefinite —; chess-board —; circulant —; 
classification —; column sufficient —; completely 
positive —; completion of a partial —; complex interval —; 
condition number of a —; confusion —; consistent —; 
consistent judgment —; contraction —; copositive —; 
diagonal —; diagonal shift —; diagonal underestimation —; 
dimensional symmetric interval —; distance —; doubly 
nonnegative —; doubly stochastic —; Euclidean 
distance —; extended —; extreme eigenvalue of an 
interval —; fully indecomposable —; geodesic Hessian —; 
graph of a —; H- —; Hermitian interval —; Hessian —; 
ill-conditioned —; ill-conditioned coefficient —;
incidence —; inductive structure of an irreducible —; inertia 
of a —; interval Hessian —; interval of variation of an 
eigenvalue of an interval —; irreducible —; irreducible 
components of a —; Jacobian —; Kalmanson —; L- —; 
Monge —; monotone —; n-fold —; n Hessian —; node-arc 
incidence —; nonbasic —; nonsingular —; oblique 
projection —; orthogonal —; p- —; partial —; partial 
completely positive —; partial contraction —; partial 
definite —; partial distance —; partial Hermitian —; partial 
semidefinite —; pattern of a —; permanent of a —; 
permutation —; polar —; polynomial —; positive 
definite —; positive semidefinite —; positive semidefinite 
symmetric —; product —; projected Lagrangian Hessian —; 
projection —; Q- —; qualitative class of a —; rank-one —; 
real interval —; real symmetric interval —; realization of
a —; regular —; rotation —; row sufficient —; S*-—;
seed —; sign —; sign-nonsingular —; sign pattern of a —; 
singular —; skew-symmetric —; sparse —; standard 
determinant expansion of a —; stiffness —; stochastic —; 
strictly copositive —; strongly nonsingular —; strongly 
positive definite —; strongly regular —; structured —; 
sufficient —; sum —; symmetric —; Toeplitz —; totally 
unimodular —; transition —; transition probability —; 
tridiagonal —; unimodular —; vertex matrix of an
interval —; well-conditioned — 
matrix of activities 
[90Cxx] 
(see: Discontinuous optimization) 
matrix analysis 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
matrix analysis see: Combinatorial — 
matrix class invariant under principal pivoting 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
matrix classes 
[90C33]
(see: Linear complementarity problem) 
matrix completion see: rank — 
matrix completion problem 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
matrix completion problem see: distance —; Euclidean 
distance —; positive semidefinite — 
Matrix completion problems 
(05C50, 15A48, 15A57, 90C25) 
(referred to in: Semidefinite programming and determinant 
maximization) 
(refers to: Interior point methods for semidefinite 
programming; Semidefinite programming and determinant 
maximization) 
matrix estimation see: covariance — 
matrix factorization 
[90C15] 
(see: Stochastic programming: parallel factorization of 
structured matrices) 
matrix factorization see: modifying —; parallel —; 
structured — 
matrix inequality see: bilinear —; linear — 
matrix of an interval matrix see: vertex — 
matrix of a Lagrangian function see: Hessian —; projected 
Hessian — 
matrix M see: skew-symmetric — 
matrix notation see: relational — 
matrix notation for relational operations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
matrix patterns and graphs 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
matrix representation of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 


matrix rounding problem 
[90C35]
(see: Maximum flow problem) 

matrix of second partial derivatives 
[90C25, 90C30] 
(see: Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 

matrix splitting methods in quadratic programming
[90C30]
(see: Cost approximation algorithms)

matrix in standard form
[65Fxx]
(see: Least squares problems)

matroid
[90C09, 90C10]
(see: Matroids)

matroid
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions)

matroid see: acyclic oriented —; bases of an oriented —; basis 
orientation of an oriented —; binary —; closed of a —; 
closure operator for a —; connectivity of a —; contraction of 
a —; disconnected —; dual —; graphic —; infinitely
connected —; linear —; matric —; minor of a —; 
orientable —; orthogonal —; partition —; rank of a —; 
regular —; representable —; restriction of a —; set of bases 
of a —; ternary —; totally acyclic oriented —; transversal —; 
underlying —; uniform —; vector of an oriented —; 
vectorial —; weight function of a —; weighted — 

matroid base polytope 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 


matroid connectivity 
[90C09, 90C10] 
(see: Matroids) 
matroid contraction 
[90C09, 90C10] 
(see: Matroids) 
matroid elements see: contracting —; deleting — 
matroid minor
[90C09, 90C10]
(see: Oriented matroids)
matroid representation
[90C09, 90C10]
(see: Matroids)
matroid restriction
[90C09, 90C10]
(see: Matroids)
matroid theory
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)
Matroids 
(90C09, 90C10) 
(referred to in: Oriented matroids) 
(refers to: Oriented matroids) 
matroids see: axiom systems for oriented —; contraction in —; 
deletion in —; duality of —; oriented — 
Matula estimate
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
mature 
(see: State of the art in modeling agricultural systems) 
maturities see: estimating the spot rate for bonds with 
constant — 
maturity see: term to —; yield to — 
MAX-2-SAT
[90C10]
(see: Maximum constraint satisfaction: relaxations and
upper bounds)
max-bisection
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)
max-clique problem
[90C60]
(see: Complexity theory)
max-closed form transformation see: unimodular —
max-closed form transformations see: unimodular —
max-closed function
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)
max-closed set
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)
max-closed sets
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)
MAX-CSP
[90C10]
(see: Maximum constraint satisfaction: relaxations and
upper bounds)
max-cut
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)
MAX-CUT see: Maximum cut problem —
max Cut (MC)
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)
max-det problem
[15A15, 90C25, 90C55, 90C90]
(see: Semidefinite programming and determinant
maximization)
max digraph see: min- —
max duality see: double- —
max-flow algorithm
[90-XX]
(see: Survivable networks)
max-flow min-cut theorem
[05C05, 05C40, 68R10, 90C35]
(see: Maximum flow problem; Network design problems)
max-flow min-cut theorem
[90C35]
(see: Maximum flow problem)
max fractional program see: min- —
max-function
[46A20, 52A01, 65K05, 90C30]
(see: Composite nonsmooth optimization; Minimax:
directional differentiability)

max-function 
[49K35, 49M27, 65K05, 65K10, 90C25, 90C30] 
(see: Convex max-functions; Minimax: directional 
differentiability) 

max-function see: convex — 

max-functions see: Convex — 

max graph see: min- — 

MAX-MIN ant system
[05-04, 90C27]
(see: Evolutionary algorithms in combinatorial
optimization)
max-min fractional program
[90C32]
(see: Fractional programming)
max-min-max optimization problem
[90C26]
(see: Bilevel optimization: feasibility test and flexibility
index)

max optimization problem see: max-min- — 

max-r-Constraint Satisfaction Problem 

[68Q25, 68R10, 68W40, 90C27, 90C59] 

(see: Domination analysis in combinatorial optimization) 

max-r-CSP
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)

max-regret-fc and max-regret heuristics
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)

max-regret heuristics see: max-regret-fc and — 

MAX-SAT
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C10,
90C27, 94C10]
(see: Maximum constraint satisfaction: relaxations and
upper bounds; Maximum satisfiability problem)

MAX-SAT problem see: weighted — 

max Steiner tree see: min- — 

max TSP
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)

max-type function 

[65K05, 90C30] 

(see: Nondifferentiable optimization: minimax problems) 

max-type functions see: difference of — 

maxdiag fine structures
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)
MaxEnt
[90C25, 94A17]
(see: Entropy optimization: shannon measure of entropy
and its properties)
maxima
[01A99]
(see: Leibniz, gottfried wilhelm)
maximal
[05C15, 05C17, 05C35, 05C62, 05C69, 05C85, 90C22, 90C27,
90C35, 90C59]
(see: Lovasz number; Optimization problems in unit-disk
graphs)
maximal alternative
[90-XX] 
(see: Outranking methods) 
maximal best approximation 
[41A30, 4799, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
maximal clique 
[05C69, 05C85, 68W01, 90C20, 90C59] 
(see: Heuristics for maximum clique and independent set; 
Standard quadratic optimization problems: applications) 
maximal flow problem 
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules) 
maximal independent set 
[90C09, 90C10] 
(see: Matroids)
maximal matching heuristic 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization)
maximal monotone map 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 
maximal planar subgraph 
[90C10, 90C27, 94C15] 
(see: Graph planarization) 
maximal similarity subtree isomorphism 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
maximal subtree isomorphism 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization)
maximally informative genes see: Selection of — 
maximin objective function 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
maximin path length 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 


maximization 
[90C60] 
(see: Computational complexity theory) 
maximization see: expectation- —; Semidefinite programming
and determinant —
maximization algorithm see: expectation- — 
maximization interval see: expectation- — 
maximization method see: vector — 
maximization of output/input 

[90C32] 

(see: Fractional programming) 
maximization problem see: global — 
maximization problem, CMO see: Contact map overlap — 
maximization of productivity 

[90C32] 

(see: Fractional programming) 
maximization of return on investment 

[90C32]

(see: Fractional programming) 


maximization of return/risk 
[90C32] 
(see: Fractional programming) 

maximization of sales 
(see: Short-term scheduling of batch processes with 
resources) 

Maximization of the Smallest of Several Ratios 
[90C32] 
(see: Fractional programming) 

maximize net present value 
(see: Planning in the process industry) 

maximize operating cash flow 
(see: Planning in the process industry) 

maximizer 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 

maximizer 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 

maximizer see: discrete global —; discrete left local —; 
global —; local —; strict local — 

maximizers see: set of discrete ε-global local —

maximizing minimum distance
[90C29]
(see: Multicriteria sorting methods)
maximizing a sum of ratios
[90C32]
(see: Fractional programming)
maximum
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35]
(see: Lovasz number)
maximum see: local —
maximum absolute deviation
[41A30, 62J02, 90C26]
(see: Regression by special functions: algorithms and
complexity)
maximum bipartite subgraph
[90C10, 90C27, 94C15]
(see: Graph planarization)
maximum cardinality matching problem
[90C05, 90C10, 90C27, 90C35]
(see: Assignment and matching)

maximum clique 

[05-04, 05C60, 05C69, 05C85, 37B25, 68Q25, 68W01, 90C10,
90C20, 90C27, 90C35, 90C59, 90C60, 91A22] 
(see: Evolutionary algorithms in combinatorial 
optimization; Heuristics for maximum clique and 
independent set; Maximum constraint satisfaction: 
relaxations and upper bounds; NP-complete problems and 
proof methodology; Quadratic knapsack; Replicator 
dynamics in combinatorial optimization) 

maximum clique and independent set see: Heuristics for — 

maximum clique problem 
[05C15, 05C60, 05C62, 05C69, 05C85, 37B25, 60G35, 65K05, 
68Q25, 68R10, 68W40, 90C08, 90C11, 90C20, 90C27, 90C35, 
90C57, 90C59, 91A22] 
(see: Differential equations and global optimization; 
Domination analysis in combinatorial optimization; 
Optimization problems in unit-disk graphs; Quadratic 
assignment problem; Replicator dynamics in combinatorial 
optimization; Standard quadratic optimization problems:
applications) 
maximum condition 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 
maximum constraint satisfaction 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
maximum constraint satisfaction problem 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
Maximum constraint satisfaction: relaxations and upper 
bounds 
(90C10) 
(referred to in: Frequency assignment problem; Graph 
coloring) 
(refers to: Frequency assignment problem; Graph coloring) 
maximum coverage location problem 
[90B10, 90B80, 90C35] 
(see: Network location: covering problems) 
maximum cut 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: cutting plane algorithms) 
Maximum cut problem, MAX-CUT 
(refers to: Combinatorial test problems and problem 
generators; Continuous global optimization: models, 
algorithms and software; Derivative-free methods for 
non-smooth optimization; Greedy randomized adaptive 
search procedures; Heuristic search; Integer programming; 
NP-complete problems and proof methodology; Quadratic 
integer programming: complexity and equivalent forms; 
Random search methods; Semidefinite programming: 
optimality conditions and stability; Semidefinite 
programming and the sensor network localization problem, 
SNLP; Solving large scale and sparse semidefinite programs; 
Variable neighborhood search methods) 
maximum degree 
[90C35] 
(see: Graph coloring) 
maximum entropy 
[62F10, 94A17] 
(see: Entropy optimization: parameter estimation) 
maximum entropy see: axiomatic derivation of the principle 
of —; Jaynes —; principle of — 
Maximum entropy and game theory 
maximum entropy principle 
[90C25, 94A08, 94A17]
(see: Entropy optimization: shannon measure of entropy 
and its properties; Jaynes’ maximum entropy principle; 
Maximum entropy principle: image reconstruction) 
maximum entropy principle 
[90C25, 94A08, 94A17]
(see: Entropy optimization: shannon measure of entropy 
and its properties; Maximum entropy principle: image 
reconstruction) 


maximum entropy principle see: Jaynes’ — 

Maximum entropy principle: image reconstruction 
(94A17, 94A08) 
(referred to in: Entropy optimization: interior point 
methods; Entropy optimization: parameter estimation; 
Entropy optimization: shannon measure of entropy and its 
properties; Jaynes’ maximum entropy principle; 
Optimization in medical imaging) 
(refers to: Entropy optimization: interior point methods; 
Entropy optimization: parameter estimation; Entropy 
optimization: shannon measure of entropy and its 
properties; Jaynes’ maximum entropy principle; 
Optimization in medical imaging) 

maximum flow 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 

Maximum flow problem 
(90C35) 
(referred to in: Auction algorithms; Communication 
network assignment problem; Dynamic traffic networks; 
Equilibrium networks; Generalized networks; Minimum 
cost flow problem; Multicommodity flow problems; 
Network design problems; Network location: covering 
problems; Nonconvex network flow problems; Nonoriented 
multicommodity flow problems; Piecewise linear network 
flow problems; Shortest path tree algorithms; Steiner tree 
problems; Stochastic network problems: massively parallel 
solution; Survivable networks; Traffic network equilibrium) 


(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation 
networks; Generalized networks; Minimum cost flow 
problem; Network design problems; Network location: 
covering problems; Nonconvex network flow problems; 
Nonoriented multicommodity flow problems; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium) 

maximum flow problem 

[90C35] 

(see: Maximum flow problem)


maximum flow problem with nonnegative lower bounds 

[90C35] 

(see: Maximum flow problem)

maximum flow problems 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: cutting plane algorithms)

maximum function 

[65K05, 90Cxx] 

(see: Dini and Hadamard derivatives in optimization)

maximum function with dependent constraints 

[65K05, 90C30] 

(see: Minimax: directional differentiability)

maximum Independent Set 

[68Q25, 68R10, 68W40, 90C27, 90C59] 

(see: Domination analysis in combinatorial optimization)
maximum Independent Set Problem

[68Q25, 68R10, 68W40, 90C20, 90C27, 90C59, 90C60] 

(see: Domination analysis in combinatorial optimization; 

Quadratic knapsack) 
maximum likelihood 

[62F10, 65T40, 90C26, 90C30, 90C90, 94A17] 

(see: Entropy optimization: parameter estimation; Global 

optimization methods for harmonic retrieval) 

Maximum likelihood detection via semidefinite programming 

(65Y20, 68W25, 90C27, 90C22, 49N15) 
maximum likelihood estimate 

[15A15, 34-xx, 34Bxx, 34Lxx, 90C25, 90C55, 90C90, 93E24] 

(see: Complexity and large-scale least squares problems; 

Semidefinite programming and determinant 

maximization) 
maximum likelihood estimation 

[62F12, 65C05, 65K05, 90C15, 90C31] 

(see: Monte-Carlo simulations for stochastic optimization) 
maximum likelihood method see: iterative quadratic — 
maximum likelihood principle 

[90C25, 94A17]

(see: Entropy optimization: shannon measure of entropy 

and its properties) 
maximum matching 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching)
maximum mean cut 
[68Q25, 68R05, 90-08, 90C27, 90C32] 

(see: Fractional combinatorial optimization) 
maximum mean-weight cut 

[68Q25, 68R05, 90-08, 90C27, 90C32] 

(see: Fractional combinatorial optimization)
maximum norm see: weighted — 

maximum number of well switches 

[76T30, 90C11, 90C90] 

(see: Mixed integer optimization in well scheduling)
maximum oil, gas and water capacity constraints 
[76T30, 90C11, 90C90] 

(see: Mixed integer optimization in well scheduling) 
Maximum partition matching 

(05A18, 05D15, 68M07, 68M10, 68Q25, 68R05) 

(referred to in: Assignment and matching; Assignment 

methods in clustering; Bi-objective assignment problem; 

Communication network assignment problem; Frequency 

assignment problem; Linear ordering problem; Quadratic 

assignment problem) 

(refers to: Assignment and matching; Assignment methods 

in clustering; Bi-objective assignment problem; 

Communication network assignment problem; Frequency 

assignment problem; Quadratic assignment problem) 
maximum partition matching problem 

[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]

(see: Maximum partition matching) 
maximum path length 

[62H30, 90C39] 

(see: Dynamic programming in clustering) 
maximum planar subgraph 

[90C10, 90C27, 94C15] 

(see: Graph planarization) 
maximum point see: global —; local —; strict local — 


maximum a posteriori principle 
(see: Bayesian networks) 
maximum pre-matching 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching) 
maximum pre-matching problem 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching) 
maximum principle 
[49J15, 49K15, 93C10] 
(see: Pontryagin maximum principle) 
maximum principle see: duality and —; high-order local —; 
Kimura —; local —; pontryagin’s — 
maximum principle for abnormal extremals see: High-order — 
maximum principle for Lagrangian problems see: high-order 
local — 
maximum profit-to-time ratio cycle 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 
maximum rank completion problem 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
maximum satisfiability 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
maximum satisfiability 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
Maximum satisfiability problem 
(03B05, 68Q25, 90C09, 90C27, 68P10, 68R05, 68T15, 68T20, 
94C10) 
(referred to in: Greedy randomized adaptive search 
procedures; Integer programming) 
(refers to: Greedy randomized adaptive search procedures; 
Integer programming; Integer programming: branch and 
bound methods; Simulated annealing methods in protein 
folding) 
maximum similarity subtree isomorphism 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
maximum subtree isomorphism 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
maximum-type function 
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational 
inequalities) 
maximum Variance Unfolding 
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35] 
(see: Graph realization via semidefinite programming) 
maximum-volume ellipsoid 
[15A15, 90C25, 90C55, 90C90] 
(see: Semidefinite programming and determinant 
maximization) 
maximum weight clique 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovasz number) 
maximum weight clique 
[90C20] 
(see: Standard quadratic optimization problems:
applications) 
maximum weight clique problem 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization; 
Standard quadratic optimization problems: applications) 
maximum weight independent sets 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovasz number) 
maximum weight trace
[90C35]
(see: Optimization in leveled graphs)
maximum weighted distance
[90B85, 90C27]
(see: Single facility location: circle covering problem)
maximum weighted independent set
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs)
maximum weighted planar graph
[90C10, 90C27, 94C15]
(see: Graph planarization)
maxmin function
[65K05, 90C30]
(see: Minimax: directional differentiability)
maxmin objective 
[90B85]
(see: Single facility location: multi-objective rectilinear 
distance location) 
Maybee-Quirk theorem see: Bassett- — 
Mayer form 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
Mazur-Orlicz theorem 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems) 
Mazur-Orlicz version of the Hahn-Banach theorem 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems) 
Mazurkiewicz lemma see: Knaster-Kuratowski- — 
(MC) see: max Cut — 
McCormick SOCQ
[49K27, 49K40, 90C30, 90C31]
(see: Second order constraint qualifications)
MCD
[65Kxx, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD
functions)
MCDM
[90C29]
(see: Decision support systems with multiple criteria)
MCDM
[90-XX, 90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques; Outranking methods)
mCl
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)


MCP 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
MCTP 
[90B06, 90B10, 90C26, 90C35] 
(see: Minimum concave transportation problems) 
MDO 
[90C90] 
(see: Design optimization in computational fluid dynamics)
MDO paradigm 
[65F10, 65F50, 65H10, 65K10] 
(see: Multidisciplinary design optimization)
MDVSP 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
Mead algorithm see: Nelder- — 
mean absolute deviation 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 
mean cut see: maximum — 
mean cycle see: minimum — 
mean field approximation 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
mean method see: geometric —; overall —; revised 
geometric — 
mean product 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
mean value 
[90C26, 90C30] 
(see: Bounding derivative ranges) 
Mean-Value for Composite Convexifiable Function see: 
integral — 
mean value cross decomposition 
[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 
mean value extension 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval fixed point theory) 
mean value function 
[90C15, 90C29]
(see: Discretely distributed stochastic programs: descent
directions and efficient points) 
mean value problem 
[90C90, 91B28]
(see: Robust optimization)
mean value theorem 
[65G20, 65G30, 65G40, 65H20, 65K05, 65K99, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization;
Interval Newton methods) 
mean-variance 
[90C27] 
(see: Operations research and financial markets)
mean-variance model see: Portfolio selection: markowitz —
mean-variance optimization problems see: Decomposition 
algorithms for the solution of multistage — 
mean-variance portfolio analysis 
[91B50] 
(see: Financial equilibrium) 
mean-weight cost 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 
mean-weight cut see: maximum — 
meaningful words 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
means criterion see: k- — 
measurable criterion 
[90C29, 91A99] 
(see: Preference disaggregation) 
measure see: acceptance —; α-concave —; Baire —;
bottleneck —; combined relative —; contracting —; 
contraction/approximation —; controllability —; 
discrete —; dissimilarity —; distance —; empirical —; 
γ-concave probability —; Gaussian —; heuristic —; inner
regular —; Loeb —; logconcave —; logconcave 
probability —; logconvex —; logconvex probability —; 
quasiconcave —; quasiconcave probability —; Radon —; 
similarity —; subgradient locality —; Wiener —; Wiener 
probability —; zemel — 
measure of cross-entropy see: Kullback-Leibler —
measure of entropy and its properties see: Entropy 
optimization: shannon — 
measure of noncompactness 
[90C33]
(see: Order complementarity) 
measure space 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
measure space see: probability — 
measure spaces 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
measure theory 
[01A99] 
(see: Carathéodory, Constantine) 
measure theory 
[01A99, 03H10, 49J27, 90C34] 
(see: Carathéodory, Constantine; Semi-infinite
programming and control problems) 
measure of uncertainty 
[90C25, 94A17]
(see: Entropy optimization: shannon measure of entropy
and its properties) 
measurement 
[90C29] 
(see: Preference modeling)
measurement see: Supply chain performance — 
measurement techniques 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and
transport) 


measures see: condition —; Derivatives of probability —; 
dominated family of —; normalization of —; Radon —; 
regular family of probability —; weak convergence of 
probability —; weakly L, (v)-differentiable family of — 
measures, logconvexity see: Logconcave — 
mechanical constructions see: linearly elastic — 
mechanical models 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
mechanical principle of Fourier 
[15A39, 90C05] 
(see: Farkas lemma) 
mechanics see: applications in —; computational —; 
Hemivariational inequalities: applications in —; inequality 
or nonsmooth —; molecular —; Multilevel optimization 
in —; nonsmooth —; parallel computation in —; smooth
potentials and stability in —; unilateral — 
mechanizing 
[01A99]
(see: History of optimization) 
median location 
[90B85]
(see: Single facility location: multi-objective rectilinear 
distance location) 
median location-allocation 
[90C26]
(see: MINLP: application in facility location-allocation) 
median location-allocation problem see: p- — 
median problem see: p- — 
median problem in a network see: 1- — 
medical applications 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
medical diagnosis 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 90C09,
90C10, 91Axx, 91B06, 92C60] 
(see: Boolean and fuzzy relations; Optimization in boolean 
classification problems) 
medical diagnosis 
[90C09, 90C10] 
(see: Optimization in boolean classification problems) 
medical image processing see: optimization in — 
medical imaging 
[90C90] 
(see: Optimization in medical imaging) 
medical imaging see: Optimization in — 
medicine 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
medium-level software 
[90C10, 90C26, 90C30] 
(see: Optimization software) 
medium regression see: isotonic —; quasiconvex — 
Medium-term scheduling of batch processes 
meet 
[90C35] 
(see: Multi-index transportation problems) 
meet semisublattice 
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational
inequalities) 

Megiddo’s parametric search
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization)
Megiddo parametric search
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization)
membership function
[90C90]
(see: Chemical process planning)
membership oracle
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)
memetic algorithms
[68T20, 68T99, 90B06, 90B35, 90C06, 90C10, 90C27, 90C39,
90C57, 90C59, 90C60, 90C90]
(see: Metaheuristics; Traveling salesman problem)
memory see: adaptive —; limited- —; short-term — 
memory affine reduced Hessian see: limited- — 
memory algorithm see: limited- — 
memory approach see: limited- — 
memory BFGS method see: limited- — 
memory in GRASP see: long-term — 
memory model see: queueing shared- — 
memory parallel computer see: distributed — 
memory parallel machines see: distributed —; shared — 
memory reduced-Hessian BFGS algorithm see: limited- — 
memory strategy equilibrium 

[49Jxx, 91Axx]

(see: Infinite horizon control and dynamic games) 
memory strategy Nash equilibrium 

[49Jxx, 91Axx] 

(see: Infinite horizon control and dynamic games) 
memory symmetric rank-one approach see: limited- — 
memoryless BFGS method 

[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 

(see: Unconstrained optimization in neural network 

training) 
MEN 

(93A30, 93B50) 

(referred to in: Continuous global optimization: models, 

algorithms and software; Global optimization of heat 

exchanger networks; Global optimization methods for 
systems of nonlinear equations; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; Mixed integer linear programming: mass and 
heat exchanger networks) 

(refers to: Global optimization of heat exchanger networks; 

MINLP: global optimization with «BB; MINLP: heat 

exchanger network synthesis; Mixed integer linear 

programming: heat exchanger network synthesis; Mixed 
integer linear programming: mass and heat exchanger 
networks) 
MEN 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks) 
MEN superstructure 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks) 


MEN synthesis method see: sequential —
MEN synthesis model see: multiperiod MINLP —
menace of the expanding grid
[93-XX]
(see: Dynamic programming: optimal control
applications)
mergers and acquisitions see: Multicriteria methods for —
merit function
[49M37, 65K05, 65K10, 90C06, 90C15, 90C25, 90C26, 90C30,
90C33, 90C34, 90C35]
(see: Feasible sequential quadratic programming; Implicit
lagrangian; Inequality-constrained nonlinear optimization;
Nonlinear least squares problems; Simplicial
decomposition algorithms; Stochastic bilevel programs)
merit function
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05,
90C06, 90C25, 90C30, 90C33, 90C35]
(see: Cost approximation algorithms; Implicit lagrangian;
Simplicial decomposition algorithms; Variational
principles)
merit function see: list square —
merit index
[62H30, 90C39]
(see: Dynamic programming in clustering)
mesh
[90C35]
(see: Feedback set problems)
mesh see: toroidal —
mesh networks
[05C85]
(see: Directed tree networks)
message-driven
[65K05, 65Y05, 65Y10, 65Y20, 68W10]
(see: Interval analysis: parallel methods for global
optimization)
Met-Enkephalin
[92C05]
(see: Adaptive simulated annealing and its application to
protein folding)
Met-Enkephalin
[92C05]
(see: Adaptive simulated annealing and its application to
protein folding)
meta-UTA
[90C29, 91A99]
(see: Preference disaggregation)
metaheuristic
[90C27, 90C29]
(see: Multi-objective combinatorial optimization)
metaheuristic see: hybrid —
metaheuristic algorithms see: heuristic- —
metaheuristic algorithms for the traveling salesman problem
see: Heuristic and —
Metaheuristic algorithms for the vehicle routing problem
(90B06, 90C59)
Metaheuristics
(68T20, 90C59, 90C27, 68T99)
metaheuristics
[90B06]
(see: Vehicle routing)
metaheuristics
[65H20, 65K05, 90-01, 90B06, 90B40, 90C10, 90C27, 90C35,
94C15]
(see: Greedy randomized adaptive search procedures;
Vehicle routing)
metaheuristics see: GRASP in hybrid —
metaminimax theorem
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems)
metamodel
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic
optimization)
method see: a priori —; achievable region —; active set —;
adaptive aggregation —; adaptive computational —;
adaptive random search —; adjoint derivative —;
Agmon-Motzkin-Fourier relaxation —; Aitken double
sweep —; analytic center cutting plane —; approximate
Newton —; arc oriented branch and bound —;
Arrow-Hurwicz gradient —; assignment —; averaging —;
backpropagation —; Bellman-Ford —; BFGS —; binary
search —; bisection —; bold-driver —; bordering —;
boundary condition iteration —; branch and bound —;
Broyden-Fletcher-Goldfarb-Shanno —; bundle-Newton —;
canonical dual transformation —; Cauchy —; center of
gravity —; Chan —; Chebyshev iterative —; Chevron —;
conditional gradient —; conjugate gradient —; conjugate
subgradient —; continuation —; coordinate descent —;
corridor —; criss-cross —; cutting angle —; cutting
plane —; Cyclic coordinate —; D'Esopo-Pape —; damped
Gauss-Newton —; damped Newton —;
Davidon-Fletcher-Powell —; De La Garza —; Delphi —;
derivative-free descent —; descent —; destructive —;
DFP —; diffusion equation —; Dinkelbach —;
disaggregation —; discrete truncated Newton —;
discretization —; distance scaling —; distrust region —;
dogleg —; double description —; Douglas-Rachford —;
downhill simplex —; edge contraction —; ellipsoid —;
empirical —; ε-constraint —; ε-subgradient —; Euler —;
Evtushenko —; Exact penalty —; extended cutting plane —;
extended support problems —; exterior point —; extremal
basis —; feasible decomposition —; fictitious domain —;
finite element —; Fletcher-Reeves —; Forrest-Goldfarb —;
Fourier-Motzkin —; Fourier-Motzkin elimination —; full
space SQP —; full-step Gauss-Newton —; Galerkin
spectral —; Gauss-Newton —; Gauss-Newton method:
Least squares, relation to Newton's —; Gauss-Seidel —;
Gauss-Southwell —; generalized Benders —; generalized
cutting plane —; geometric mean —; global convergence
problem for the Rosen —; Global optimization: cutting
angle —; goal coordination —; Goerisch —; golden
section —; Goldfarb-Idnani —; greedy —; heavy ball —;
Heun —; heuristic optimization —; homotopic —;
homotopy —; homotopy continuation —; homotopy
Newton —; Huang —; Hungarian —; implicit restarted
Lanczos —; Inclusion —; incremental gradient —; inexact
Newton —; integer L-shaped —; interactive —; interior
point —; interior point logarithmic barrier —; interval
Newton —; IQML —; iterative —; iterative quadratic
maximum likelihood —; iterative regularization —;
Jacobi —; Kaczmarz —; Karmarkar —; Kiefer-Wolfowitz —;
Kelley —; Kelley's classical cutting plane —; Kelley cutting
plane —; KKT-based —; Kojima-Shindo —; Krawczyk —;
Krawczyk variation of the interval Newton —; L-shaped —;
Lagrangian finite generation —; largest inscribed sphere —;
least-index criss-cross —; least-index pivoting —;
Lehmann-Maehly —; Lemke —; Levenberg-Marquardt —;
Levitin-Polyak —; lexicographic dual simplex —;
lexicographic primal simplex —; lexicographic simplex —;
lexicographic variant of the constraint-by-constraint —;
likelihood ratio —; limited-memory BFGS —; linear CG —;
local search —; Logic-based outer-approximation —;
memoryless BFGS —; metric-based —; metropolis Monte
Carlo —; model-based —; model coordination —; modified
Cauchy —; modified Newton —; Monte-Carlo —; MOSA —;
multicriteria sorting —; multifrontal —; multiplier —;
multivariate interval Newton —; NC —;
Newsam-Ramsdell —; Newton's —; Newton-Raphson —;
Newton-type —; node oriented branch and bound —;
noising —; nonadaptive —; Nondifferentiable optimization:
Newton —; nonfeasible decomposition —;
noninteractive —; nonlinear CG —; nonparametric
statistical —; nonsmooth Newton —; OA —; on-line —;
outer approximation —; overall mean —; overdetermined
Yule-Walker —; parameterization —; partial-update
Newton —; partially asynchronous iterative —; partitioned
quasi-Newton —; partitioning —; penalty-based —;
piecewise sequential quadratic programming —; Piela —;
pilot —; pivot —; Polak-Ribière —; polyblock
approximation —; Powell —;
Powell-symmetric-Broyden —; power —; primal —;
principal pivoting —; projection —; proximal bundle —;
proximal-like —; proximal point —; proximal point
bundle —; pure Monte-Carlo —; pure NP —; αBB global
optimization —; QR —; Quadratic fractional programming:
Dinkelbach —; quasi-Newton —; random keys —; random
search —; Rayleigh-Ritz —; reduced Hessian SQP —;
reduction based —; reference direction —; regression —;
regret —; relaxation —; response surface —; revised
geometric mean —; Ritz-Galerkin —; Robbins-Monro —;
Rodriguez —; rollout —; Rosen —; Rosen gradient
projection —; Rosenbrock —; row-action —; Rudolph —;
satisficing —; Schruben-Margolin —; score function —;
separated Newton —; sequential —; sequential MEN
synthesis —; Sequential simplex —; Shanno conjugate
gradient —; shaped —; Simple recourse problem: dual —;
Simple recourse problem: primal —; simplex —; single
underlying —; smoothing Newton —; Solanki —; SOR —;
splitting —; splitting Newton —; square-root —; SR1
quasi-Newton —; steepest descent —; steepest edge
simplex —; stochastic counterpart —; stochastic search —;
Stoica —; support problems solution —; supports
problems —; symmetric rank-one quasi-Newton —;
tensor —; Terlaky criss-cross —; topological —; truncated
Newton —; trust region —; tunneling —; two-phase —;
Two-stage stochastic programming: quasigradient —;
univariate interval Newton —; UTA —; variable metric —;
variable metric bundle —; vector maximization —; visual
interactive —; Vogel approximation —; volumetric —;
Wolfe reduced gradient —; Zionts criss-cross —
method and applications to optimization problems see:
Laplace —
method of bad derivatives
[90C90] 
(see: Simulated annealing methods in protein folding) 
method-based 
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics) 
method of Broyden class see: quasi-Newton — 
method of characteristics 
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 
method of codifferential descent 
[65Kxx, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
method of codifferential descent 
[65Kxx, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
method for detecting redundancy see: deterministic —; 
probabilistic — 
method in deterministic global optimization see: LP strategy 
for interval-Newton — 
method of feasible directions see: combined — 
method, global convergence, and Powell’s conjecture see: 
Rosen’s — 
method of hypodifferential descent 
[65K05, 90C30] 
(see: Nondifferentiable optimization: minimax problems) 
method of hypodifferential descent 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
method for interhelical contacts in alpha-helical proteins see: 
Predictive — 
method of least squares 
[01A99]
(see: Gauss, Carl Friedrich) 
method of least squares 
[01A99]
(see: Gauss, Carl Friedrich) 
method: Least squares, relation to Newton's method see: 
Gauss-Newton —
method for linear complementarity problems see: Splitting — 
method for linear programs see: Selfdual parametric — 
method of lines 
[34-xx, 34Bxx, 34Lxx, 93E24]
(see: Complexity and large-scale least squares problems) 
method of mappings 
[49J20, 49J52]
(see: Shape optimization) 
method for nonlinear programming see: feasible direction — 
method of optimal distance 
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems) 
method of optimal ratio 
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems) 


method for the simple recourse problem see: dual —; primal —
method of simultaneous displacements
[90C30]
(see: Cost approximation algorithms)
method of steepest descent
[65K05, 90C30, 90C90]
(see: Design optimization in computational fluid dynamics;
Nondifferentiable optimization: minimax problems)
method of successive displacements
[90C30]
(see: Cost approximation algorithms)
method for two-stage stochastic programs with recourse see:
L-shaped —
method in unconstrained optimal control see: Dynamic
programming and Newton's —
methodologies see: semantic analysis —; solution —
methodologies for auditing decisions see: Multicriteria
decision support —
methodology see: NP-complete problems and proof —;
OR- —; tabu search —; trust region —
methods see: ABS —; active set —; active set quadratic
programming —; adaptive —; adjoint —; affine scaling
SQPIP —; barrier —; basic outline of filled function —;
Bayesian —; Bisection global optimization —; branch and
bound —; Broyden —; Broyden family of —; bundle —;
column generation —; combined relaxation —;
complementary pivot —; computational —; computing
processes in interactive —; Conjugate-gradient —;
construction —; Credit rating and optimization —; cutting
plane —; decomposition —; deflected gradient —;
descent-based —; Disease diagnosis:
optimization-based —; econometric —; ejection chain —;
Entropy optimization: interior point —; evolutionary —;
exact —; exact solution —; existence-proving properties of
interval Newton —; extrapolation —; Extremum problems
with probability functions: kernel type solution —;
factorized quasi-Newton —; feasible direction —; filled
function —; finite difference —; full space —; Gaussian
approximation —; Global optimization: filled function —;
Global optimization: hit and run —; Global terrain —;
Globally convergent homotopy —; gradient —; heuristic —;
hit and run —; homotopy —; hybrid —; hybrid NP —;
improvement —; incomplete —; indirect —; inexact
Newton —; Integer programming: algebraic —; Integer
programming: branch and bound —; interactive —;
interactive versus noninteractive —; interior point —;
interval —; Interval analysis: subdivision directions in
interval branch and bound —; interval Newton —;
knowledge-based NP —; Krylov space type —; label
correcting —; label setting —; limited enumeration —; line
search —; Linear programming: interior point —;
linearization —; MINLP: branch and bound —; MINLP:
logic-based —; mixed —; Mixed integer
programming/constraint programming hybrid —;
Multi-scale global optimization using terrain/funneling —;
Multicriteria sorting —; multicut —; multilevel —;
multiplier —; Nondifferentiable optimization: cutting
plane —; Nondifferentiable optimization: relaxation —;
Nondifferentiable optimization: subgradient
optimization —; Nonlinear least squares: Newton-type —;
Nonlinear least squares: trust region —; nonsmooth —;
numerical —; Outranking —; parallel —; parametric —;
path following —; perturbation —; polyhedral —;
polynomial time interior point —; posterior —; Practical
augmented Lagrangian —; primal-dual —; primal-dual
interior-point —; primal-dual SQPIP —; Protein loop
structure prediction —; proximal point —; quadrature —;
qualitative forecasting —; quantitative forecasting —;
quasi-Newton —; Quasidifferentiable optimization: exact
penalty —; Random search —; regularity condition for
penalty —; regularization of deterministic cutting plane —;
Relaxation in projection —; Semi-infinite programming:
approximation —; Semi-infinite programming:
discretization —; Semi-infinite programming: numerical —;
sequential quadratic programming —; Shape selective
zeolite separation and catalysis: optimization —;
smoothing —; software for homotopy —; solution —;
Solving hemivariational inequalities by nonsmooth
optimization —; Spectral projected gradient —; sQG —;
sQG projection —; stochastic —; Stochastic global
optimization: two-phase —; Stochastic optimal stopping:
numerical —; stochastic quasigradient —; stochastic
Quasigradient (SQG) —; Subgradient —; Successive
quadratic programming: decomposition —; Successive
quadratic programming: full space —; Successive quadratic
programming: solution by active sets and interior point —;
topological —; trust region —; unbounded controls and
non standard —; variable metric —; Variable neighborhood
search —; variational —

methods: applications see: Stochastic quasigradient — 

methods and the BFGS update see: Broyden family of — 

methods in clustering see: Assignment — 

methods in complementarity theory see: Topological — 

methods for convex programming see: Lagrangian 
multipliers — 

methods for distributed optimal control problems see: 
Sequential quadratic programming: interior point — 

methods of feasible directions 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 

methods for global optimization see: Cutting plane —; Interval 
analysis: parallel — 

methods for harmonic retrieval see: Global optimization — 

methods for the linear complementarity problem see: 
Generalizations of interior point — 

methods for linear complementarity problems see: Principal 
pivoting — 

methods for linear problems see: Semi-infinite 
programming: — 

methods for linear programming see: Homogeneous 
selfdual —; Potential reduction — 

methods for mergers and acquisitions see: Multicriteria — 

methods in minimax problems see: Stochastic quasigradient — 

methods for multivalued variational inequalities see: 
Solution — 

Methods for Non-Differentiable Functions and Applications 
see: minimization — 

methods for non-smooth optimization see: Derivative-free — 

methods for nonconvex feasibility analysis see: Shape 
reconstruction — 

methods for nonlinear complementarity problems and 
variational inequalities see: Nonsmooth and smoothing — 

methods for optimal design see: Multilevel — 

methods for preference value functions see: Multi-objective 
optimization; Interactive — 

methods in protein folding see: Simulated annealing — 


methods in quadratic programming see: matrix splitting — 

methods for reaction kinetics and transport see: 
Identification — 

methods for semi-infinite optimization see: Smoothing — 

methods for semidefinite programming see: Interior point — 

methods for solving vehicle routing problems see: 
approximate —; constructive —; exact — 

methods in supply chain management see: Mathematical 
programming — 

methods for systems of nonlinear equations see: Global 
optimization — 

methods for unary optimization see: Numerical — 

metric 
[90B50] 
(see: Inventory management in supply chains) 

metric see: Ck-Riemannian —; probability —; Riemannian —; 
Shahshahani —; variable —; w-weighted Tchebycheff — 

metric-based
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
metric-based method
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
metric-based perspective
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)

metric bundle method see: variable — 

metric method see: variable — 

metric methods see: variable — 

metric projection 
[41A30, 47A99, 49J52, 65K10, 90C30]
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions; Nondifferentiable 
optimization: Newton method) 

metric regularity
[49K27, 49K40, 90C30, 90C31]
(see: First order constraint qualifications)
metric regularity*
[49K27, 49K40, 90C30, 90C31]
(see: First order constraint qualifications)
metric space
[90C35]
(see: Multi-index transportation problems)
metrically regular
[49K27, 49K40, 90C30, 90C31]
(see: First order constraint qualifications)
Metropolis
[90C05, 90C25]
(see: Metropolis, Nicholas Constantine)
Metropolis criterion
[65K05, 90C30]
(see: Random search methods)
metropolis Monte Carlo method
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)

Metropolis, Nicholas Constantine 
(90C05, 90C25) 
Metropolis process
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 
MHD 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
MHEN 
(93A30, 93B50) 
(referred to in: Continuous global optimization: models, 
algorithms and software; Global optimization of heat 
exchanger networks; Global optimization methods for 
systems of nonlinear equations; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; Mixed integer linear programming: mass and 
heat exchanger networks) 
(refers to: Global optimization of heat exchanger networks; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; Mixed integer linear 
programming: heat exchanger network synthesis; Mixed 
integer linear programming: mass and heat exchanger 
networks) 
MHEN 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks) 
micro scale network 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
microcluster see: Lennard—Jones —; Morse — 
microclusters see: size effects in — 
microstate 
[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 
mid connective 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
the mid-point acceleration function 
[25A15, 34A05, 90C25, 90C26, 90C30, 90C31] 
(see: Convexifiable functions, characterization of) 
mid-point acceleration function see: the — 
middle set 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 
midpoint convexity see: discrete — 
midpoint test
[65G20, 65G30, 65G40, 65H20, 65K05, 65Y05, 65Y10, 65Y20,
68W10, 90C30]
(see: Interval analysis: parallel methods for global 
optimization; Interval analysis: unconstrained and 
constrained optimization; Interval global optimization) 
midpoint tests 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
migration see: multiclass — 
migration equilibrium 
[90C30] 
(see: Equilibrium networks) 
migration equilibrium 
[90C30] 
(see: Equilibrium networks) 


migration network equilibrium model
[90C30]
(see: Equilibrium networks)
Milman theorem see: Krein- —
MILP
(see: Logic-based outer approximation)
MILP
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90, 93A30, 93B50]
(see: Mixed integer linear programming: heat exchanger
network synthesis; Mixed integer linear programming:
mass and heat exchanger networks; Modeling difficult
optimization problems; Optimal planning of offshore
oilfield infrastructure)
MILP: lagrangian relaxation see: Decomposition techniques
for —
MILP master problem
[49M20, 90C11, 90C30]
(see: Generalized outer approximation)
MILP model see: Gasoline blending and distribution
scheduling: an —
MIN ant system see: MAX- —
min-cut theorem see: max-flow —
min duality see: double- —
min-exchange heuristic
[68T99, 90C27]
(see: Capacitated minimum spanning trees)
min fractional program see: max- —
min-max digraph
[58E05, 90C30]
(see: Topology of global optimization)
min-max digraph
[58E05, 90C30]
(see: Topology of global optimization)
min-max fractional program
[90C32]
(see: Fractional programming)
min-max graph
[58E05, 90C30]
(see: Topology of global optimization)
min-max graph
[58E05, 90C30]
(see: Topology of global optimization)
min-max optimization problem see: max- —
min-max Steiner tree
[05C05, 05C85, 68Q25, 90B80]
(see: Bottleneck steiner tree problems)
min-type function
[90C26]
(see: Global optimization: envelope representation)
min-type function
[90C26]
(see: Global optimization: envelope representation)
mindiag fine structures
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)
minima
[01A99]
(see: Leibniz, Gottfried Wilhelm)
minima see: local —; multiple —
minima problem in protein folding: αBB global optimization
approach see: Multiple —
minimal
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
minimal see: locally — 
minimal best approximation 
[41A30, 47A99, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
minimal cone 
[90C30]
(see: Duality for semidefinite programming)
minimal cut
[90C09, 90C10]
(see: Oriented matroids)
minimal dependent set
[90C09, 90C10]
(see: Matroids)
minimal function
[90C26]
(see: Phase problem in X-ray crystallography: Shake and
bake approach)
minimal function
[90C26]
(see: Phase problem in X-ray crystallography: Shake and
bake approach)
minimal number of DNF clauses
[90C09, 90C10]
(see: Optimization in boolean classification problems)
minimal principle
[90C26]
(see: Phase problem in X-ray crystallography: Shake and
bake approach)
minimal principle
[90C26]
(see: Phase problem in X-ray crystallography: Shake and
bake approach)
minimal representation
[90C05, 90C20]
(see: Redundancy in nonlinear programs)
minimal representation of a feasible region
[90C05, 90C20]
(see: Redundancy in nonlinear programs) 
minimal runs 
(see: Planning in the process industry) 
minimal social cost see: production realizing with — 
minimal tree see: Steiner — 
minimal tree problem see: Steiner — 
minimal value 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods) 


minimal variance clustering 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
minimax 
[65K05, 65K10, 90C06, 90C30, 90C34] 
(see: Feasible sequential quadratic programming) 
minimax 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 


minimax algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching) 
minimax approach see: Stochastic programming: — 
minimax decision rule 
[62C20, 90C15] 
(see: Stochastic programming: minimax approach) 
minimax decision rule 
[62C20, 90C15] 
(see: Stochastic programming: minimax approach) 
Minimax: directional differentiability 
(90C30, 65K05) 
(referred to in: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Minimax 
theorems; Nondifferentiable optimization: minimax 
problems; Stochastic programming: minimax approach) 
(refers to: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel
optimization: feasibility test and flexibility index; Minimax
theorems; Nondifferentiable optimization: minimax
problems; Stochastic programming: minimax approach; 
Stochastic quasigradient methods in minimax problems) 
minimax game 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
minimax game tree algorithm see: sequential — 
Minimax game tree searching 
(49J35, 49K35, 62C20, 91A05, 91A40)
(referred to in: Bottleneck steiner tree problems; Capacitated
minimum spanning trees; Shortest path tree algorithms)
(refers to: Bottleneck steiner tree problems; Directed tree 
networks; Shortest path tree algorithms) 
minimax inequality see: two-function — 
minimax location on a sphere 
[90B85, 90C27] 
(see: Single facility location: circle covering problem) 
minimax objective 
[90B80, 90B85] 
(see: Warehouse location problem) 
minimax objective function 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
minimax objective function see: lexicographically — 
minimax observation problem 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
minimax observation problem under uncertainty with 
perturbations 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
minimax path length 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
minimax point see: saddle- — 
minimax principles 
[49J52] 
(see: Hemivariational inequalities: eigenvalue problems) 
minimax problem
[49K35, 49M27, 65K05, 65K10, 90C25, 90C30]
(see: Convex max-functions; Nondifferentiable
optimization: minimax problems)
minimax problem
[49K35, 49M27, 65K05, 65K10, 90C25, 90C30]
(see: Convex max-functions; Minimax: directional
differentiability; Nondifferentiable optimization: minimax
problems)
minimax problem see: constrained —; finite —
minimax problems see: constrained —; Nondifferentiable
optimization: —; Stochastic quasigradient methods in —
minimax solution
[62C20, 90C15]
(see: Stochastic programming: minimax approach)
minimax theorem
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 90C27, 91A05]
(see: Minimax theorems; Steiner tree problems)
minimax theorem
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems)
minimax theorem see: Du-Hwang —; mixed —; saddle- —
Minimax theorems
(46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05)
(referred to in: Bilevel linear programming: complexity,
equivalence to minmax, concave programs; Bilevel
optimization: feasibility test and flexibility index; Minimax:
directional differentiability; Nondifferentiable
optimization: minimax problems; Stochastic programming:
minimax approach; Stochastic quasigradient methods in
minimax problems)
(refers to: Bilevel linear programming: complexity,
equivalence to minmax, concave programs; Bilevel
optimization: feasibility test and flexibility index; Minimax:
directional differentiability; Nondifferentiable
optimization: minimax problems; Stochastic programming:
minimax approach; Stochastic quasigradient methods in
minimax problems)
minimax theory 
[65K05, 90C30] 
(see: Minimax: directional differentiability) 
minimax tree 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
minimax tree algorithm see: parallel — 
minimax trees see: parallelizing the exploration of — 
minimax value 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
minimization 
[90C60] 
(see: Computational complexity theory) 
minimization see: algorithms for unconstrained —; 
concave —; constrained —; crossing —; energy —; error 
global —; gradient-free —; k-level crossing —; leveled 
crossing —; local —; nonconvex —; proximal —; 
unconstrained —; vector — 
minimization algorithm see: gradient-free — 
minimization algorithms see: SSC —; supervisor and searcher 
cooperation — 




minimization algorithms for nonsmooth and stochastic
optimization see: SSC —
minimization of cost/time
[90C32]
(see: Fractional programming)
minimization with D-functions see: proximal —
minimization of losses
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques)
minimization of makespan
(see: Short-term scheduling of batch processes with
resources)
minimization Methods for Non-Differentiable Functions and
Applications
[01A70, 90-03]
(see: Shor, Naum Zuselevich)
minimization of order earliness
(see: Short-term scheduling of batch processes with
resources)
minimization of Pinkava normal forms
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
minimization problem
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling)
minimization problem see: constrained —
minimization problems see: decomposition algorithms for
nonconvex —
minimization of regret
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques)
minimizer
[49K27, 90C05, 90C25, 90C29, 90C30, 90C31, 90C48]
(see: Nondifferentiable optimization: parametric
programming; Set-valued optimization)
minimizer
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
minimizer see: ε- —; global —; isolated local —; local —;
near- —; nonsingular local —; regular local —; strict
local —; strong local —; weak —
minimizer problem see: local —
minimizers see: global —; local —
minimizing a convex multiplicative function see: program of —
minimizing the degradation in quality of both water environment
[90C30, 90C35]
(see: Optimization in water resources)
minimizing the energy function see: Optimization techniques
for —
minimizing functions
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
minimizing a generalized convex function see: program of —
minimizing misclassifications
[90C29]
(see: Multicriteria sorting methods)
minimizing network cost
[93A30, 93B50]
(see: MINLP: mass and heat exchanger networks)
minimizing the overall classification error
[90C29] 
(see: Multicriteria sorting methods) 
minimizing a product of two affine functions see: program 
of — 
minimizing q see: CG-standard for — 
minimizing sequence 
[49M29, 65K10, 90C06] 


(see: Local attractors for gradient-related descent iterations) 


minimizing sequence see: generalized —; Levitin-Polyak — 

minimum 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

minimum see: finding a —; global —; global energy —; 
local —; principle of —; relative —; strict local —; strict 
relative —; strong local —; strong relative — 

minimum Bisection Problem 

[68Q25, 68R10, 68W40, 90C27, 90C59] 

(see: Domination analysis in combinatorial optimization)

minimum circle 

[90B85, 90C27] 

(see: Single facility location: circle covering problem)

minimum clique partitioning 

[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 

(see: Optimization problems in unit-disk graphs) 

minimum composition difference 

[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks; Mixed 
integer linear programming: mass and heat exchanger 
networks) 

minimum concave transportation problem 
[90B06, 90B10, 90C26, 90C35] 
(see: Minimum concave transportation problems) 

Minimum concave transportation problems 
(90C26, 90C35, 90B06, 90B10) 
(referred to in: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Concave 
programming; Motzkin transposition theorem; Stochastic 
transportation and location problems) 
(refers to: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Concave 
programming; Motzkin transposition theorem; 
Multi-index transportation problems; Stochastic 
transportation and location problems) 

minimum condition see: high-order local — 

Minimum cost flow problem 
(90C35) 
(referred to in: Auction algorithms; Communication 
network assignment problem; Dynamic traffic networks; 
Equilibrium networks; Generalized networks; Maximum 
flow problem; Multicommodity flow problems; Network 
design problems; Network location: covering problems; 
Nonconvex network flow problems; Nonoriented 
multicommodity flow problems; Piecewise linear network 
flow problems; Shortest path tree algorithms; Steiner tree 
problems; Stochastic network problems: massively parallel
solution; Survivable networks; Traffic network equilibrium)


(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation
networks; Generalized networks; Maximum flow problem;
Multicommodity flow problems; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Nonoriented multicommodity flow 
problems; Piecewise linear network flow problems; Shortest 
path tree algorithms; Steiner tree problems; Stochastic 
network problems: massively parallel solution; Survivable 
networks; Traffic network equilibrium) 
minimum cost flow problem 
[05C05, 05C40, 68R10, 90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions; 
Minimum cost flow problem; Network design problems) 
minimum cost flow problem 
[90C35] 
(see: Minimum cost flow problem) 
minimum cost network flow 
[90B10] 
(see: Piecewise linear network flow problems) 
minimum cost network flow problem 
[90C35] 
(see: Generalized networks) 
minimum cost network flow problem see: piecewise linear — 
minimum cost-to-time ratio cycle 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 
minimum cross-entropy see: axiomatic derivation of the 
principle of —; principle of — 
minimum cross-entropy principle 
[94A08, 94A17] 
(see: Maximum entropy principle: image reconstruction) 
minimum cross-entropy principle 
[90C25, 94A17]
(see: Entropy optimization: shannon measure of entropy 
and its properties) 
minimum cut 
[90C35]
(see: Maximum flow problem) 
minimum cut problem 
[90C35]
(see: Maximum flow problem) 
minimum degree ordering 
[65Fxx]
(see: Least squares problems) 
minimum distance see: maximizing — 
minimum feedback arc set problem 
[90C35] 
(see: Feedback set problems) 
minimum feedback vertex (arc) set problem 
[90C35] 
(see: Feedback set problems) 
minimum feedback vertex (arc) set problem see: subset — 
minimum feedback vertex set 
[90C35] 
(see: Feedback set problems) 
minimum fill-in of a graph 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
minimum function
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
minimum KKT point see: global — 
minimum lower set algorithm 
[62G07, 62G30, 65K05]
(see: Isotonic regression problems) 
minimum lower sets 
[62G07, 62G30, 65K05]
(see: Isotonic regression problems) 
minimum mean cycle 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
minimum Multiprocessor Scheduling Problem 
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization) 
minimum of an NNFP see: global — 
minimum norm controllability 
[93-XX]
(see: Optimal control of a flexible arm) 
minimum norm solution 
[90C11, 90C25, 90C33]
(see: Integer linear complementary problem; LCP: 
Pardalos-Rosen mixed integer formulation) 
minimum norm solution 
[90C11, 90C33]
(see: LCP: Pardalos-Rosen mixed integer formulation) 
minimum number of clauses 
[90C09, 90C10]
(see: Optimization in boolean classification problems) 
minimum number of Steiner points see: Steiner tree problem 
with — 
minimum Partition Problem (MP) 
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization) 
minimum path length 
[62H30, 90C39]
(see: Dynamic programming in clustering) 
minimum phase 
[90C26, 90C90]
(see: Signal processing with higher order statistics) 
minimum point see: global —; local —; strict local — 
minimum potential energy 
[90C26, 90C90]
(see: Global optimization in Lennard-Jones and morse 
clusters) 
minimum principle see: Pontryagin — 
minimum problem see: accessory — 
minimum rank completion problem 
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems) 
minimum ratio spanning-tree 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
minimum ratio test 
[90C05]
(see: Linear programming: Klee-Minty examples) 
minimum set 
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems) 
minimum solution see: global — 


minimum spanning arborescence problem see: capacitated — 
minimum spanning tree 
[05C05, 05C40, 68R10, 90C27, 90C35] 
(see: Network design problems; Steiner tree problems) 
minimum spanning tree problem 
[05C05, 05C40, 68R10, 68T99, 90C09, 90C10, 90C27, 90C35] 
(see: Capacitated minimum spanning trees; Matroids; 
Network design problems) 
minimum spanning tree problem see: bounded degree —; 
capacitated —; resource-constrained — 
minimum spanning trees 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems)
minimum spanning trees see: Capacitated — 
minimum sphere 
[90B85, 90C27] 
(see: Single facility location: circle covering problem)
minimum sphere problem 
[90B85, 90C27] 
(see: Single facility location: circle covering problem)
minimum Steiner arborescence 
[90C27] 
(see: Steiner tree problems) 
minimum tree see: Steiner — 
minimum unfeasibility criterion 
[90C10, 90C29] 
(see: Multi-objective integer linear programming) 
minimum-units problem 
[90C90] 
(see: Mixed integer linear programming: heat exchanger 
network synthesis) 
minimum value see: positive — 
minimum Vertex Cover Problem 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
minimum-volume ellipsoid 
[15A15, 90C25, 90C55, 90C90] 
(see: Semidefinite programming and determinant
maximization) 
minimum weight common mutated sequence 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis)
minimum weight feedback arc set problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem)
minimum weight Steiner triangulation 
[68Q20] 
(see: Optimal triangulations) 
minimum weight triangulation 
[68Q20] 
(see: Optimal triangulations)
minimum weighted feedback vertex set problem 
[90C35] 
(see: Feedback set problems)
minimum weighted graph bipartization problem 
[90C35] 
(see: Feedback set problems)
minimum weighted vertex cover 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
mining see: Data —; Mathematical programming for data — 
minisum

[65D18, 90B85, 90C26]

(see: Global optimization in location problems) 

minisum problems 

[90B80, 90B85]

(see: Warehouse location problem) 

minkowski distances 

[65K05, 90C27, 90C30, 90C57, 91C15]

(see: Optimization-based visualization) 

Minkowski duality 

[90C26] 

(see: Global optimization: envelope representation)

Minkowski plane 

[90C27] 

(see: Steiner tree problems)

Minkowski sum 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods)

MINLP 

[49M37, 90C11] 

(see: Mixed integer nonlinear programming)

MINLP 

[90C06, 90C10, 90C11, 90C26, 90C30, 90C57, 90C90, 93A30,
93B50]

(see: MINLP: application in facility location-allocation;
MINLP: branch and bound methods; MINLP: heat 
exchanger network synthesis; MINLP: mass and heat 
exchanger networks; MINLP: reactive distillation column 
synthesis; Modeling difficult optimization problems; 
Optimal planning of offshore oilfield infrastructure) 

MINLP see: challenges in —; convex —; global —; HEN 
synthesis using —; local —; nonconvex — 

MINLP: application in facility location-allocation 
(90C26) 
(referred to in: Chemical process planning; Combinatorial 
optimization algorithms in resource allocation problems; 
Facilities layout problems; Facility location with 
externalities; Facility location problems with spatial 
interaction; Facility location with staircase costs; 
Generalized benders decomposition; Generalized outer 
approximation; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: applications in
blending and pooling problems; MINLP: applications in the
interaction of design and control; MINLP: branch and
bound global optimization algorithm; MINLP: branch and 
bound methods; MINLP: design and scheduling of batch 
processes; MINLP: generalized cross decomposition; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods;
MINLP: outer approximation algorithm; MINLP: reactive
distillation column synthesis; Mixed integer linear 
programming: heat exchanger network synthesis; Mixed 
integer linear programming: mass and heat exchanger 
networks; Mixed integer nonlinear programming; 
Multifacility and restricted location problems; Network 
location: covering problems; Optimizing facility location 
with euclidean and rectilinear distances; Single facility 
location: circle covering problem; Single facility location: 
multi-objective euclidean distance location; Single facility 
location: multi-objective rectilinear distance location; 
Stochastic transportation and location problems; Voronoi
diagrams in facility location; Warehouse location problem)
(refers to: Chemical process planning; Combinatorial 
optimization algorithms in resource allocation problems; 
Competitive facility location; Extended cutting plane 
algorithm; Facility location with externalities; Facility 
location problems with spatial interaction; Facility location 
with staircase costs; Generalized benders decomposition; 
Generalized outer approximation; Global optimization in 
Weber’s problem with attraction and repulsion; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm; 
MINLP: branch and bound methods; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: mass and heat 
exchanger networks; Mixed integer nonlinear 
programming; Multifacility and restricted location 
problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Production-distribution system design problem; 
Resource allocation for epidemic control; Single facility 
location: circle covering problem; Single facility location: 
multi-objective euclidean distance location; Single facility 
location: multi-objective rectilinear distance location; 
Stochastic transportation and location problems; Voronoi 
diagrams in facility location; Warehouse location problem) 


MINLP: applications in blending and pooling problems 


(90C90, 90C30) 

(referred to in: Chemical process planning; Generalized 
benders decomposition; Generalized outer approximation; 
MINLP: application in facility location-allocation; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm; 
MINLP: branch and bound methods; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: heat exchanger 
network synthesis; Mixed integer linear programming: 
mass and heat exchanger networks; Mixed integer 
nonlinear programming) 

(refers to: Chemical process planning; Extended cutting 
plane algorithm; Generalized benders decomposition; 
Generalized outer approximation; MINLP: application in 
facility location-allocation; MINLP: applications in the 
interaction of design and control; MINLP: branch and 
bound global optimization algorithm; MINLP: branch and 
bound methods; MINLP: design and scheduling of batch 
processes; MINLP: generalized cross decomposition; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods; 
MINLP: outer approximation algorithm; MINLP: reactive 
distillation column synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming) 
MINLP: applications in the interaction of design and control
(90C11, 49M37) 

(referred to in: Chemical process planning; Control vector 
iteration CVI; Duality in optimal control with first order 
differential equations; Dynamic programming: 
continuous-time optimal control; Dynamic programming 
and Newton’s method in unconstrained optimal control; 
Dynamic programming: optimal control applications; 
Generalized benders decomposition; Generalized outer 
approximation; Hamilton-Jacobi-Bellman equation; 
Infinite horizon control and dynamic games; MINLP: 
application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
branch and bound global optimization algorithm; MINLP: 
branch and bound methods; MINLP: design and scheduling 
of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: heat exchanger 
network synthesis; Mixed integer linear programming: 
mass and heat exchanger networks; Mixed integer 
nonlinear programming; Multi-objective optimization: 
interaction of design and control; Optimal control of a 
flexible arm; Robust control; Robust control: schur stability 
of polytopes of polynomials; Semi-infinite programming 
and control problems; Sequential quadratic programming: 
interior point methods for distributed optimal control 
problems; Suboptimal control) 

(refers to: Chemical process planning; Control vector 
iteration CVI; Duality in optimal control with first order 
differential equations; Dynamic programming: 
continuous-time optimal control; Dynamic programming 
and Newton’s method in unconstrained optimal control; 
Dynamic programming: optimal control applications; 
Extended cutting plane algorithm; Generalized benders 
decomposition; Generalized outer approximation; 
Hamilton-Jacobi-Bellman equation; Infinite horizon 
control and dynamic games; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: mass 
and heat exchanger networks; Mixed integer nonlinear 
programming; Multi-objective optimization: interaction of 
design and control; Optimal control of a flexible arm; 
Robust control; Robust control: schur stability of polytopes 
of polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control) 

MINLP: branch and bound global optimization algorithm 
(90C10, 90C26) 

(referred to in: αBB algorithm; Chemical process planning;
Continuous global optimization: models, algorithms and
software; Disjunctive programming; Generalized benders
decomposition; Generalized outer approximation; Global 
optimization in batch design under uncertainty; Global 
optimization in generalized geometric programming; 
Global optimization methods for systems of nonlinear 
equations; Global optimization in phase and chemical 
reaction equilibrium; Interval analysis: subdivision 
directions in interval branch and bound methods; Interval 
global optimization; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound methods; 
MINLP: design and scheduling of batch processes; MINLP: 
generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming; Smooth nonlinear 
nonconvex optimization) 

(refers to: αBB algorithm; Chemical process planning;
Continuous global optimization: models, algorithms and 
software; Disjunctive programming; Extended cutting 
plane algorithm; Generalized benders decomposition; 
Generalized outer approximation; Global optimization in 
batch design under uncertainty; Global optimization in 
generalized geometric programming; Global optimization 
methods for systems of nonlinear equations; Global 
optimization in phase and chemical reaction equilibrium; 
Interval global optimization; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound methods; 
MINLP: design and scheduling of batch processes; MINLP: 
generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: mass 
and heat exchanger networks; Mixed integer nonlinear 
programming; Reformulation-linearization technique for 
global optimization; Smooth nonlinear nonconvex 
optimization) 


MINLP: branch and bound methods 


(90C11) 

(referred to in: Chemical process planning; Disjunctive 
programming; Generalized benders decomposition; 
Generalized outer approximation; MINLP: application in 
facility location-allocation; MINLP: applications in 
blending and pooling problems; MINLP: applications in the 
interaction of design and control; MINLP: branch and 
bound global optimization algorithm; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: heat exchanger 
network synthesis; Mixed integer linear programming: 
mass and heat exchanger networks; Mixed integer
nonlinear programming) 

(refers to: Chemical process planning; Disjunctive 
programming; Extended cutting plane algorithm; 
Generalized benders decomposition; Generalized outer 
approximation; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: design and scheduling of 
batch processes; MINLP: generalized cross decomposition; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods; 
MINLP: outer approximation algorithm; MINLP: reactive 
distillation column synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming; 
Reformulation-linearization technique for global 
optimization) 

MINLP: design and scheduling of batch processes 

(90C26) 

(referred to in: Chemical process planning; Generalized 
benders decomposition; Generalized outer approximation; 
Job-shop scheduling problem; MINLP: application in 
facility location-allocation; MINLP: applications in 
blending and pooling problems; MINLP: applications in the 
interaction of design and control; MINLP: branch and 
bound global optimization algorithm; MINLP: branch and 
bound methods; MINLP: generalized cross decomposition; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods; 
MINLP: outer approximation algorithm; MINLP: reactive 
distillation column synthesis; Mixed integer linear 
programming: heat exchanger network synthesis; Mixed 
integer linear programming: mass and heat exchanger 
networks; Mixed integer nonlinear programming; Vehicle 
scheduling) 

(refers to: Chemical process planning; Extended cutting 
plane algorithm; Generalized benders decomposition; 
Generalized outer approximation; Job-shop scheduling 
problem; MINLP: application in facility location-allocation; 
MINLP: applications in blending and pooling problems; 
MINLP: applications in the interaction of design and 
control; MINLP: branch and bound global optimization 
algorithm; MINLP: branch and bound methods; MINLP: 
generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: mass 
and heat exchanger networks; Mixed integer nonlinear 
programming; Stochastic scheduling; Vehicle scheduling) 
MINLP: generalized cross decomposition

(90C11, 90C30, 49M27) 

(referred to in: Chemical process planning; Decomposition 
principle of linear programming; Generalized benders 
decomposition; Generalized outer approximation; MINLP: 
application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm;
MINLP: branch and bound methods; MINLP: design and
scheduling of batch processes; MINLP: global optimization 
with αBB; MINLP: heat exchanger network synthesis;
MINLP: logic-based methods; MINLP: outer
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming; Simplicial decomposition; 
Simplicial decomposition algorithms; Stochastic linear 
programming: decomposition and cutting planes; 
Successive quadratic programming: decomposition 
methods) 

(refers to: Chemical process planning; Decomposition 
principle of linear programming; Extended cutting plane 
algorithm; Generalized benders decomposition; 
Generalized outer approximation; MINLP: application in 
facility location-allocation; MINLP: applications in 
blending and pooling problems; MINLP: applications in the 
interaction of design and control; MINLP: branch and 
bound global optimization algorithm; MINLP: branch and 
bound methods; MINLP: design and scheduling of batch 
processes; MINLP: global optimization with αBB; MINLP:
heat exchanger network synthesis; MINLP: logic-based 
methods; MINLP: outer approximation algorithm; MINLP: 
reactive distillation column synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming; Simplicial decomposition; 
Simplicial decomposition algorithms; Stochastic linear 
programming: decomposition and cutting planes; 
Successive quadratic programming: decomposition 
methods) 


MINLP: global optimization with αBB


(65K05, 90C11, 90C26) 

(referred to in: αBB algorithm; Chemical process planning;
Convex envelopes in optimization problems; Disjunctive 
programming; Generalized benders decomposition; 
Generalized outer approximation; Global optimization in 
batch design under uncertainty; Global optimization in 
generalized geometric programming; Global optimization 
of heat exchanger networks; Global optimization methods 
for systems of nonlinear equations; Global optimization in 
phase and chemical reaction equilibrium; Interval global 
optimization; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: heat 
exchanger network synthesis; MINLP: logic-based methods; 
MINLP: mass and heat exchanger networks; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming; Smooth nonlinear 
nonconvex optimization) 

(refers to: αBB algorithm; Chemical process planning;
Continuous global optimization: models, algorithms and 
software; Convex envelopes in optimization problems; 
Disjunctive programming; Extended cutting plane
algorithm; Generalized benders decomposition; 
Generalized outer approximation; Global optimization in 
batch design under uncertainty; Global optimization in 
generalized geometric programming; Global optimization 
of heat exchanger networks; Global optimization methods 
for systems of nonlinear equations; Global optimization in 
phase and chemical reaction equilibrium; Interval global 
optimization; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods; 
MINLP: mass and heat exchanger networks; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming; 
Reformulation-linearization technique for global 
optimization; Smooth nonlinear nonconvex
optimization)

MINLP: heat exchanger network synthesis 

(90C90) 

(referred to in: Chemical process planning; Continuous 
global optimization: models, algorithms and software; 
Generalized benders decomposition; Generalized outer 
approximation; Global optimization of heat exchanger 
networks; Global optimization methods for systems of 
nonlinear equations; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: logic-based methods;
MINLP: mass and heat exchanger networks; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming) 

(refers to: Chemical process planning; Extended cutting 
plane algorithm; Generalized benders decomposition; 
Generalized outer approximation; Global optimization of 
heat exchanger networks; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: logic-based methods;
MINLP: mass and heat exchanger networks; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear
programming: mass and heat exchanger networks; Mixed
integer nonlinear programming) 


MINLP: logic-based methods 


(90C10, 90C09, 90C11) 

(referred to in: Chemical process planning; Decomposition 
principle of linear programming; Disjunctive 
programming; Generalized benders decomposition; 
Generalized outer approximation; MINLP: application in 
facility location-allocation; MINLP: applications in 
blending and pooling problems; MINLP: applications in the 
interaction of design and control; MINLP: branch and 
bound global optimization algorithm; MINLP: branch and 
bound methods; MINLP: design and scheduling of batch 
processes; MINLP: generalized cross decomposition; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: heat exchanger network 
synthesis; Mixed integer linear programming: mass and 
heat exchanger networks; Mixed integer nonlinear 
programming; Simplicial decomposition; Simplicial 
decomposition algorithms; Stochastic linear programming: 
decomposition and cutting planes; Successive quadratic 
programming: decomposition methods) 

(refers to: Chemical process planning; Decomposition 
principle of linear programming; Disjunctive 
programming; Extended cutting plane algorithm; 
Generalized benders decomposition; Generalized outer 
approximation; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: outer approximation algorithm; MINLP: 
reactive distillation column synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming; 
Reformulation-linearization technique for global 
optimization; Simplicial decomposition; Simplicial 
decomposition algorithms; Stochastic linear programming: 
decomposition and cutting planes; Successive quadratic 
programming: decomposition methods) 


MINLP: mass and heat exchanger networks 


(93A30, 93B50) 

(referred to in: Continuous global optimization: models, 
algorithms and software; Global optimization of heat 
exchanger networks; Global optimization methods for 
systems of nonlinear equations; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; Mixed integer linear programming: mass and 
heat exchanger networks) 

(refers to: Global optimization of heat exchanger networks; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; Mixed integer linear 
programming: heat exchanger network synthesis; Mixed 
integer linear programming: mass and heat exchanger 
networks) 


MINLP MEN synthesis model see: multiperiod — 
MINLP: outer approximation algorithm
(90C11) 
(referred to in: Chemical process planning; Generalized 
benders decomposition; Generalized outer approximation; 
MINLP: application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm; 
MINLP: branch and bound methods; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: reactive distillation column 
synthesis; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming) 
(refers to: Chemical process planning; Extended cutting 
plane algorithm; Generalized benders decomposition; 
Generalized outer approximation; MINLP: application in 
facility location-allocation; MINLP: applications in 
blending and pooling problems; MINLP: applications in the 
interaction of design and control; MINLP: branch and 
bound global optimization algorithm; MINLP: branch and 
bound methods; MINLP: design and scheduling of batch 
processes; MINLP: generalized cross decomposition; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods; 
MINLP: reactive distillation column synthesis; Mixed 
integer linear programming: mass and heat exchanger 
networks; Mixed integer nonlinear 
programming) 

MINLP: reactive distillation column synthesis 
(90C90) 
(referred to in: Chemical process planning; Generalized 
benders decomposition; Generalized outer approximation; 
MINLP: application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm; 
MINLP: branch and bound methods; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; Mixed integer linear programming: heat 
exchanger network synthesis; Mixed integer linear 
programming: mass and heat exchanger networks; Mixed 
integer nonlinear programming) 
(refers to: Chemical process planning; Extended cutting 
plane algorithm; Generalized benders decomposition; 
Generalized outer approximation; MINLP: application in 
facility location-allocation; MINLP: applications in 
blending and pooling problems; MINLP: applications in the 
interaction of design and control; MINLP: branch and 
bound global optimization algorithm; MINLP: branch and 
bound methods; MINLP: design and scheduling of batch 
processes; MINLP: generalized cross decomposition; 
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods; 
MINLP: outer approximation algorithm; Mixed integer
linear programming: mass and heat exchanger networks;
Mixed integer nonlinear programming) 

MINLP: trim-loss problem 
(90C11, 90C90) 
(referred to in: Mixed integer nonlinear programming) 
(refers to: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Integer linear complementary 
problem; Integer programming; Integer programming: 
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen 
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Time-dependent 
traveling salesman problem) 

mINLPs 
[65K05, 90C11, 90C26] 
(see: MINLP: global optimization with αBB)

MINLPs see: twice-differentiable — 

minmax, concave programs see: Bilevel linear programming: 
complexity, equivalence to — 

minMax Matching Subgraph Problem 

[68Q25, 68R10, 68W40, 90C27, 90C59]

(see: Domination analysis in combinatorial optimization) 

minmax multicenter 

[05C05, 05C85, 68Q25, 90B80]

(see: Bottleneck steiner tree problems) 

minmax problem 

[49-01, 49K45, 49N10, 90-01, 90C20, 90C27, 91B52]
(see: Bilevel linear programming: complexity, equivalence
to minmax, concave programs)

minor see: matroid — 

minor closed 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 

minor of a matroid 
[90C09, 90C10] 
(see: Matroids) 

minorant see: greatest convex —; greatest K- —; greatest 
quasiconvex — 

minority 
[90-XX] 
(see: Outranking methods) 

MINOS 
[49M37, 65K05, 90C30] 
(see: Inequality-constrained nonlinear optimization) 

mintasks 
(see: Medium-term scheduling of batch processes) 
Minty examples see: Klee- —; Linear programming: Klee- —
Minty) GVI see: dual (or — 
MinxEnt 
[90C25, 94A17]
(see: Entropy optimization: shannon measure of entropy 
and its properties) 
MIP 
[65K05, 90C09, 90C10, 90C11, 90C20] 
(see: Disjunctive programming; Multi-quadratic integer 
programming: models and applications) 
mipstart 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
MIQP 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems) 
MIQP master problem 
[49M20, 90C11, 90C30]
(see: Generalized outer approximation) 
Miranda fixed point theorem 
[65G20, 65G30, 65G40, 65H20]
(see: Interval fixed point theory) 
mIS 
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization) 
misclassifications see: minimizing — 
miss” decision problems see: “hit-or- — 
missing comparisons 
[90C29]
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
missing information 
[90C09, 90C10]
(see: Optimization in boolean classification problems) 
Mitchell PTAS 
[90C27]
(see: Steiner tree problems) 
mitigator 
[65K05, 90C26, 90C30, 90C59]
(see: Global optimization: filled function methods) 
MITP 
[90C35]
(see: Multi-index transportation problems) 
MITPs see: greedy algorithm for axial —; hub heuristics for 
axial —; integer — 
Mixed 0-1 linear programming approach for DNA 
transcription element identification 
(90C08) 
mixed complementarity problem see: generalized — 
mixed construction procedure 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
mixed continuous and discrete free variables see: Generalized 
geometric programming: — 
mixed cycle 
[90C35] 
(see: Optimization in leveled graphs) 


mixed discrete-continuous global optimization 
[90C26, 90C29] 

(see: Optimal design of composite structures)
mixed discrete-continuous global optimization 
[90C26, 90C29, 90C90] 

(see: Global optimization: hit and run methods; Optimal
design of composite structures) 

mixed finite element 

[65M60] 

(see: Variational inequalities: F. E. approach)
mixed finite element approximation 

[65M60] 

(see: Variational inequalities: F. E. approach)
mixed fleet 

[90B06] 

(see: Vehicle routing)

mixed graph 

[90B35] 

(see: Job-shop scheduling problem)

mixed integer 

[49M27, 90C11, 90C30] 

(see: MINLP: generalized cross decomposition)
mixed integer 0-1 programs 

[90C09, 90C10, 90C11] 

(see: Disjunctive programming)


mixed integer αBB algorithm see: general structure —; special
structure — 


Mixed Integer Bilevel Optimization 
(see: Mixed integer nonlinear bilevel programming: 
deterministic global optimization) 


Mixed integer classification problems 
(62H30, 68T10, 90C11)
(referred to in: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming: lagrangian relaxation; 
LCP: Pardalos-Rosen mixed integer formulation; Linear
programming models for classification; MINLP: trim-loss 
problem; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Optimization in boolean classification problems; 
Optimization in classifying text documents; Parametric 
mixed integer nonlinear optimization; Set covering, 
packing and partitioning problems; Simplicial pivoting 
algorithms for integer programming; Statistical 
classification: optimization approaches; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Time-dependent traveling 
salesman problem) 
(refers to: Deterministic and probabilistic optimization 
models for data classification; Integer programming; Linear 
programming models for classification; Optimization in 
boolean classification problems; Statistical classification: 
optimization approaches) 
mixed integer dynamic optimization
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 

mixed integer formulation see: LCP: Pardalos-Rosen — 

mixed-integer linear optimization see: Global pairwise protein 
sequence alignment via — 

mixed integer linear program see: single parametric — 

mixed integer/linear programming 
[90C06, 90C10, 90C11, 90C30, 90C46, 90C57, 90C90] 
(see: Integer programming duality; Modeling difficult 
optimization problems) 

mixed integer linear programming 
[90C90] 
(see: Chemical process planning) 

mixed integer linear programming see: Multiparametric — 

Mixed integer linear programming: heat exchanger network 
synthesis 
(90C90) 
(referred to in: Continuous global optimization: models, 
algorithms and software; Global optimization of heat 
exchanger networks; Global optimization methods for 
systems of nonlinear equations; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: mass and heat exchanger networks; 
Mixed integer linear programming: mass and heat 
exchanger networks) 
(refers to: Chemical process planning; Extended cutting 
plane algorithm; Generalized benders decomposition; 
Generalized outer approximation; Global optimization of 
heat exchanger networks; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: outer 
approximation algorithm; MINLP: reactive distillation 
column synthesis; Mixed integer linear programming: mass 
and heat exchanger networks; Mixed integer nonlinear 
programming) 

Mixed integer linear programming: mass and heat exchanger 
networks 
(93A30, 93B50) 
(referred to in: Chemical process planning; Continuous 
global optimization: models, algorithms and software; 
Generalized benders decomposition; Generalized outer 
approximation; Global optimization of heat exchanger 
networks; Global optimization methods for systems of 
nonlinear equations; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: global 
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: mass and 
heat exchanger networks; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis;
Mixed integer linear programming: heat exchanger
network synthesis; Mixed integer nonlinear programming) 
(refers to: Chemical process planning; Extended cutting 
plane algorithm; Generalized benders decomposition; 
Generalized outer approximation; Global optimization of 
heat exchanger networks; MINLP: application in facility 
location-allocation; MINLP: applications in blending and 
pooling problems; MINLP: applications in the interaction 
of design and control; MINLP: branch and bound global 
optimization algorithm; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
MINLP: generalized cross decomposition; MINLP: global
optimization with αBB; MINLP: heat exchanger network
synthesis; MINLP: logic-based methods; MINLP: mass and 
heat exchanger networks; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
Mixed integer linear programming: heat exchanger 
network synthesis; Mixed integer nonlinear programming) 


mixed-integer linear programs see: Robust optimization: — 
mixed integer nonconvex problem 


[65K05, 90C11, 90C26] 
(see: MINLP: global optimization with αBB)


Mixed integer nonlinear bilevel programming: deterministic
global optimization


mixed integer nonlinear optimization 


[49M29, 49M37, 90C11, 90C26] 

(see: Generalized benders decomposition; Global 
optimization in batch design under uncertainty; Mixed 
integer nonlinear programming) 


mixed integer nonlinear optimization 


[49M37, 90C11, 90C29, 90C90] 

(see: MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control) 


mixed integer nonlinear optimization see: Parametric — 
Mixed-integer nonlinear optimization: A disjunctive cutting
plane approach
(49M37, 90C11) 


mixed integer nonlinear program 


[49M37, 90C11] 
(see: MINLP: applications in the interaction of design and 
control) 


Mixed integer nonlinear programming 


(90C11, 49M37) 

(referred to in: Chemical process planning; Complexity 
classes in optimization; Complexity of degeneracy; 
Complexity of gradients, Jacobians, and Hessians; 
Complexity theory; Complexity theory: quadratic 
programming; Computational complexity theory; 
Continuous global optimization: applications; Fractional 
combinatorial optimization; Generalized benders 
decomposition; Generalized outer approximation; Global 
optimization in the analysis and management of 
environmental systems; Information-based complexity and 
information-based optimization; Interval global 
optimization; Kolmogorov complexity; MINLP: application 
in facility location-allocation; MINLP: applications in 
blending and pooling problems; MINLP: applications in the 
interaction of design and control; MINLP: branch and 
bound global optimization algorithm; MINLP: branch and 
bound methods; MINLP: design and scheduling of batch 
processes; MINLP: generalized cross decomposition;
MINLP: global optimization with αBB; MINLP: heat
exchanger network synthesis; MINLP: logic-based methods; 
MINLP: outer approximation algorithm; MINLP: reactive 
distillation column synthesis; Mixed integer linear 
programming: heat exchanger network synthesis; Mixed 
integer linear programming: mass and heat exchanger 
networks; NP-complete problems and proof methodology; 
Parallel computing: complexity classes) 

(refers to: Chemical process planning; Complexity classes in 
optimization; Complexity of degeneracy; Complexity of 
gradients, Jacobians, and Hessians; Complexity theory; 
Complexity theory: quadratic programming; 
Computational complexity theory; Continuous global 
optimization: applications; Extended cutting plane 
algorithm; Fractional combinatorial optimization; 
Generalized benders decomposition; Generalized outer 
approximation; Global optimization in the analysis and 
management of environmental systems; Information-based 
complexity and information-based optimization; Interval 
global optimization; Kolmogorov complexity; MINLP: 
application in facility location-allocation; MINLP: 
applications in blending and pooling problems; MINLP: 
applications in the interaction of design and control; 
MINLP: branch and bound global optimization algorithm; 
MINLP: branch and bound methods; MINLP: design and 
scheduling of batch processes; MINLP: generalized cross 
decomposition; MINLP: global optimization with αBB;
MINLP: heat exchanger network synthesis; MINLP: 
logic-based methods; MINLP: outer approximation 
algorithm; MINLP: reactive distillation column synthesis; 
MINLP: trim-loss problem; Mixed integer linear 
programming: mass and heat exchanger networks; Parallel 
computing: complexity classes) 

mixed integer nonlinear programming 

[49M20, 49M37, 90C06, 90C10, 90C11, 90C26, 90C29, 90C30, 
90C57, 90C90] 

(see: Generalized outer approximation; MINLP: branch and 
bound global optimization algorithm; Mixed integer 
nonlinear programming; Modeling difficult optimization 
problems; Multi-objective optimization: interaction of 
design and control) 

mixed integer nonlinear programming 

[49M20, 90C10, 90C11, 90C26, 90C30] 

(see: Generalized outer approximation; MINLP: branch and 
bound global optimization algorithm; MINLP: outer 
approximation algorithm) 

mixed integer nonlinear programming problem 

[90C11, 90C90] 

(see: MINLP: branch and bound methods; Mixed integer 
linear programming: heat exchanger network synthesis) 
mixed integer optimal control problem 

[49M37, 90C11, 90C29, 90C90] 

(see: MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control) 

Mixed Integer Optimization 

(see: Mixed integer nonlinear bilevel programming: 
deterministic global optimization) 

mixed-integer optimization see: Multi-class data classification 
via —; Peptide identification via — 


mixed integer optimization problem 

[90C26] 

(see: Bilevel optimization: feasibility test and flexibility
index)

Mixed integer optimization in well scheduling 

(76T30, 90C11, 90C90)

mixed integer problem 

[90C11, 90C33] 

(see: LCP: Pardalos-Rosen mixed integer formulation)

mixed integer problems see: 0-1 —; linear — 

mixed integer program 

[90C10, 90C11, 90C27, 90C57] 

(see: Integer programming)

Mixed-integer programming 

[90B50, 90C29] 

(see: Logic-based outer approximation; Multicriteria sorting
methods; Optimization and decision support systems)

mixed integer programming 

[90B50, 90B80, 90C09, 90C10, 90C11, 90C33] 

(see: Facility location with staircase costs; LCP:
Pardalos-Rosen mixed integer formulation; MINLP:
branch and bound methods; MINLP: logic-based methods; 
Mixed integer nonlinear bilevel programming: 
deterministic global optimization; Optimal planning of 
offshore oilfield infrastructure; Optimization and decision 
support systems; Railroad crew scheduling; Railroad 
locomotive scheduling) 

mixed integer programming see: Multi-objective —; 
multi-objective (multicriteria) —; stochastic — 

Mixed integer programming/constraint programming hybrid 
methods 
(referred to in: Continuous reformulations of 
discrete-continuous optimization problems) 

mixed integer programming problem see: large scale 
nonlinear — 

mixed integer programs 

[90C11, 90C59] 

(see: Nested partitions optimization)

mixed-integer quadratic programming 

[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 

(see: Modeling difficult optimization problems)

mixed integer rounding cut 

[90C11] 

(see: MINLP: branch and bound methods)

mixed integer value function 

[90C11, 90C15, 90C31] 

(see: Stochastic integer programming: continuity, stability,
rates of convergence)

mixed linear complementarity problem 

[49-XX, 90-XX, 93-XX] 

(see: Duality theory: monoduality in convex optimization)

mixed methods 

[90B06] 

(see: Vehicle routing)

mixed minimax theorem 

[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05] 

(see: Minimax theorems)
mixed nonlinear integer programming problem
[65K05, 90C26, 90C30, 90C59] 
(see: Global optimization: filled function methods) 
mixed-product campaign 
[90C26] 
(see: MINLP: design and scheduling of batch processes) 
mixed Time Representation 
(see: Integrated planning and scheduling) 
mixed VAM construction procedure 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
mixed variational formulation 
[65M60] 
(see: Variational inequalities: F. E. approach) 
mixed variational inequality 
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]
(see: Variational principles) 
mixture problem see: fuel — 
MLP 
[90C15] 
(see: L-shaped method for two-stage stochastic programs 
with recourse) 
MLS 
[62G07, 62G30, 65K05] 
(see: Isotonic regression problems) 
MMP 
[65K05, 90C30] 
(see: Nondifferentiable optimization: minimax problems)
mobilization curve 
(see: Emergency evacuation, optimization modeling) 
MOCO 
[90C10, 90C35] 
(see: Bi-objective assignment problem) 
mode 
[65K05, 65Y05] 
(see: Parallel computing: models) 
mode see: batch —; forward —; reverse — 
mode of AD see: forward —; reverse — 
mode of an AD algorithm see: forward —; reverse — 
mode of automatic differentiation see: backward —; 
forward —; reverse — 
model 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
model see: assignment —; black oil —; Black-Scholes —; 
bridging —; BSP —; bulk synchronous parallel —; capital 
asset pricing —; classical linear regression —; classical 
thermoelastic —; computational —; containment graph —; 
continuous global optimization —; continuous review —; 
continuous Time —; convex —; Cournot-Nash oligopolistic 
equilibrium —; cutting plane —; deformable —; 
deterministic equivalent —; diagonal —; discrete Time —; 
dynamic —; dynamic traffic network —; energy —; 
epidemic —; errors-in-variables —; expanded 
transshipment —; export —; facility location —; flipping —; 
forecasting —; fractional routing pattern —; Gasoline 
blending and distribution scheduling: an MILP —; general 
univariate linear —; GIS design pattern based —;
heteroscedastic —; homogeneous and selfdual —; 
homoscedastic —; hybrid —; import —; 
information-based —; intersection graph —; Ising glass —; 


linear —; linear two-stage —; location-allocation —; 
logistics control —; LogP —; matching —; mathematical —; 
migration network equilibrium —; moving average —; the 
multi-resource weighted assignment —; multi-sector 
multi-instrument financial equilibrium —; multimodal 
traffic network equilibrium —; multiperiod —; multiperiod 
MINLP MEN synthesis —; network flow —; newsboy —; 
oligopoly —; parametric programming —; partial 
equilibrium —; perfectly competitive equilibrium —; 
periodic review —; plant location —; Portfolio selection: 
markowitz mean-variance —; Potts glass —; price —; 
proximity graph —; pure exchange economic 
equilibrium —; pure trade economic equilibrium —; 
QSM —; quantity —; quasi-assignment —; queueing 
shared-memory —; Ramsey —; real number —; recourse —; 
reduced —; relational —; right-hand side perturbation —;
rotation-symmetry —; Sharpe single index market —; 
sign-invariance —; single path routing pattern —; 
single-period —; spatial competition facility location —; 
spatial-interaction —; spatial oligopoly —; static —; 
stochastic —; Stochastic facility location —; 
superstructure —; transshipment —; trust region —; Turing 
machine —; vector space —; well bore —; Wiener — 

model B 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 

model-based 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

Model based control for drug delivery systems 
(refers to: Nondifferentiable optimization: parametric 
programming) 

model-based controllers via parametric programming see: 
Design of robust — 

model-based experimental analysis 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 

model-based method 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

model-based perspective 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

model BF 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 

model BFR 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
model building
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
model calibration 
[90C05] 
(see: Global optimization in the analysis and management 
of environmental systems) 
model of computation 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26]
(see: Information-based complexity and information-based
optimization)
model of conflicting populations see: Volterra — 
model coordination method
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
model development
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30,
76R50, 80A20, 80A23, 80A30]
(see: Identification methods for reaction kinetics and
transport)
model features see: special — 
model finding procedure
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10]
(see: Maximum satisfiability problem)
model and Gibbs sampler see: hidden Markov — 
model identification 

[62F10, 94A17] 

(see: Entropy optimization: parameter estimation) 
model with impulse perturbations see: Vasicek — 
model independent
[65H20, 80A10, 80A22, 90C90]
(see: Global optimization: application to phase equilibrium
problems)
model nodes see: plant/ —; retailer/ — 
model/optimizer coupling
[90C30, 90C90]
(see: Successive quadratic programming: applications in the
process industry)
model in OR 
[90B80, 90B85] 
(see: Warehouse location problem) 
model in OR see: continuous —; discrete —; 
multicommodity —; single-commodity — 
model for parallel algorithm design 
[65K05, 65Y05] 
(see: Parallel computing: models) 
model parameters see: estimation of — 
model predictive control 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
model refinement see: iterative — 
model reformulation 
[90C09, 90C10, 90C11] 
(see: Disjunctive programming) 
model robust 
[90C90, 91B28] 
(see: Robust optimization) 
model structure refinement see: incremental strategy 
for — 


model structures
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30,
76R50, 80A20, 80A23, 80A30]
(see: Identification methods for reaction kinetics and
transport)

model types 

[90C05] 

(see: Continuous global optimization: models, algorithms
and software)

model validation 

[49Q10, 74K99, 74Pxx, 90C90, 91A65] 

(see: Multilevel optimization in mechanics)

model world 

[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 

(see: Modeling difficult optimization problems) 

modeling 

[90C06, 90C10, 90C11, 90C27, 90C30, 90C57, 90C90] 

(see: Modeling difficult optimization problems; Operations
research and financial markets) 

modeling see: conceptual —; Emergency evacuation, 
optimization —; energy —; mathematical —; 
nonsmooth —; preference —; problem —; uncertainty —

modeling agricultural systems see: State of the art in — 

Modeling difficult optimization problems 
(90C06, 90C10, 90C11, 90C30, 90C57, 90C90) 

modeling framework see: multiperiod optimization — 

modeling frameworks see: Short-term scheduling, resource 
constrained: unified — 

modeling language 

[90C10, 90C30] 

(see: Modeling languages in optimization: a new paradigm)

modeling language see: algebraic — 

modeling language and constraint logic programming 

[90C10, 90C30] 

(see: Modeling languages in optimization: a new paradigm)

modeling languages see: algebraic —; second generation — 

Modeling languages in optimization: a new paradigm 

(90C10, 90C30)
(referred to in: Continuous global optimization: models, 
algorithms and software; Large scale unconstrained 
optimization; Optimization software) 
(refers to: Continuous global optimization: models, 
algorithms and software; Large scale unconstrained 
optimization; Optimization software) 

modeling and management see: applications in environmental 
systems — 

modeling mass exchange 

[93A30, 93B50] 

(see: MINLP: mass and heat exchanger networks)

modeling production 

(see: Planning in the process industry)

modello 

[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 

(see: Modeling difficult optimization problems)

modellus 

[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 

(see: Modeling difficult optimization problems)
models see: complexity of —; compositional —; continuous
and discrete time —; discrete-time —; econometric —; 
estimation of diffusion flux —; Global optimization based 
on statistical —; Global supply chain —; hidden Markov —; 
location-routing —; locomotive assignment —; 
mathematical —; mechanical —; multiple locomotive 
type —; multipopulation replicator —; multistage inventory 
management —; nonlinear decision —; Parallel 
computing: —; parametric programming —; positive
definite quadratic —; purpose of —; recourse —; 
representation of —; restricted recourse —; simulation —; 
simultaneous equation —; single locomotive scheduling —; 
single stage inventory management —; single versus
Multiperiod —; Static stochastic programming —; 
statistical —; strategic design —; supply chain 
simulation —; thermodynamic —; time-stamped —; 
transportation —; two-stage stochastic programming —; 
undirected multicommodity network flow — 

models, algorithms and software see: Continuous global 
optimization: — 

models and applications see: Multi-quadratic integer 
programming: — 

models for classification see: Linear programming — 

models: conditional expectations see: Static stochastic 
programming — 

models for data classification see: Deterministic and 
probabilistic optimization — 

models for entropy optimization for image reconstruction see: 
finite-dimensional —; vector-space — 

models for parallel computing 
[65K05, 65Y05] 
(see: Parallel computing: models) 

models: (Q,R) policy see: Continuous review inventory —

models: random objective see: Stochastic programming — 

models for supply chain management and design see: 
Operations research — 

modified 
(see: Emergency evacuation, optimization modeling) 

modified Cauchy approach 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

modified Cauchy method 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

modified Gram-Schmidt orthogonalization 

[65Fxx] 

(see: Least squares problems) 

modified Huang algorithm 

[65K05, 65K10] 

(see: ABS algorithms for linear equations and linear least
squares; ABS algorithms for optimization)

modified Kruskal algorithm 

[68T99, 90C27] 

(see: Capacitated minimum spanning trees)

modified Lagrangian

[90C25, 90C30]

(see: Lagrangian multipliers methods for convex
programming)

modified Newton method

[90C30]

(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
modified Prim algorithm
[68T99, 90C27]

(see: Capacitated minimum spanning trees)
modified square-root transformation
[90C11, 90C90]

(see: MINLP: trim-loss problem)
modified standard auction algorithm
[90B10, 90C27]

(see: Shortest path tree algorithms)
modifying matrix factorization

[65Fxx]

(see: Least squares problems)
MODP

[90C31, 90C39]

(see: Multiple objective dynamic programming)
MODP see: principle of Pareto optimality of — 
modular 

[90C30, 90C90] 

(see: Successive quadratic programming: applications in the
process industry)
modular approach 

[90C30, 90C90] 

(see: Successive quadratic programming: applications in the
process industry)
module see: mass/heat transfer — 
modus confirmans 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]

(see: Checklist paradigm semantics for fuzzy logics) 
modus negans 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]

(see: Checklist paradigm semantics for fuzzy logics) 
modus ponens 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]

(see: Checklist paradigm semantics for fuzzy logics) 
modus ponens see: checklist — 

modus tollens 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]

(see: Checklist paradigm semantics for fuzzy logics) 

modus tollens see: checklist — 

Moebius function 

[05B35, 20F36, 20F55, 52C35, 57N65]

(see: Hyperplane arrangements) 

MOILP 

[90C27, 90C29]

(see: Multi-objective combinatorial optimization) 
molar Gibbs free energy 

[90C26, 90C90]

(see: Global optimization in phase and chemical reaction 
equilibrium) 

molecular conformation 

[65D18, 90B85, 90C26]

(see: Global optimization in location problems) 
molecular design

[49M37, 90C11] 

(see: Mixed integer nonlinear programming) 
Molecular distance geometry problem 

(46N60) 
molecular dynamics 

[90C90] 

(see: Simulated annealing methods in protein folding) 
molecular mechanics 
[65K10, 92C40]

(see: Multiple minima problem in protein folding: αBB
global optimization approach)

molecular optimization 

[90C90]

(see: Simulated annealing methods in protein folding) 
molecular structure determination 

[65K05, 90C26]

(see: Molecular structure determination: convex global
underestimation)

Molecular structure determination: convex global 
underestimation 

(65K05, 90C26) 

(referred to in: Adaptive simulated annealing and its
application to protein folding; Genetic algorithms; Global
optimization in Lennard-Jones and morse clusters; Graph
coloring; Monte-Carlo simulated annealing in protein
folding; Multiple minima problem in protein folding: αBB
global optimization approach; Packet annealing; Phase
problem in X-ray crystallography: Shake and bake
approach; Simulated annealing methods in protein folding)

(refers to: Adaptive simulated annealing and its application
to protein folding; Genetic algorithms; Global optimization
in Lennard-Jones and morse clusters; Global optimization
in protein folding; Monte-Carlo simulated annealing in
protein folding; Multiple minima problem in protein
folding: αBB global optimization approach; Packet
annealing; Phase problem in X-ray crystallography: Shake
and bake approach; Protein folding: generalized-ensemble
algorithms; Simulated annealing; Simulated annealing
methods in protein folding)

MOLFP 

[90C29, 90C70] 

(see: Fuzzy multi-objective linear programming) 
mollifier see: standard — 
mollifier quasigradient see: stochastic — 

MOLP with fuzzy coefficients 

[90C29, 90C70] 

(see: Fuzzy multi-objective linear programming) 
MOLP with fuzzy coefficients see: flexible — 
moment see: dipole — 
moment conditions 

[62C20, 90C15] 

(see: Stochastic programming: minimax approach) 
moment conditions see: optimal integral bounds subject to — 
moment optimization problems see: General — 
moment problem see: convex —; finite —; infinite —; infinite
many conditions —; solution of the convex —; standard —
moment theory 

[93-XX] 

(see: Optimal control of a flexible arm) 
moment theory see: geometric — 


moments see: binomial — 

momentum balances see: mass, energy and — 

momentum updating rule 

[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 

(see: Unconstrained optimization in neural network
training)

MOMILP 

[90C11, 90C29] 

(see: Multi-objective mixed integer programming)

MOMIP 

[90C11, 90C29] 

(see: Multi-objective mixed integer programming)

moments see: characterizing —

monads 

[03E70, 03H05, 91B16]

(see: Alternative set theory)

Mond-Weir dual 

[90C26] 

(see: Invexity and its applications)

Monge inequalities 

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem)

Monge inequalities see: anti- — 

Monge matrix 

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem)

Monge matrix see: anti- — 

Monge property 

[90C35] 

(see: Multi-index transportation problems)

monitored 

(see: Emergency evacuation, optimization modeling)

monoduality 

[49-XX, 90-XX, 93-XX] 

(see: Duality theory: biduality in nonconvex optimization)

monoduality in convex optimization see: Duality theory: — 

monomial 

[12D10, 12Y05, 13P10] 

(see: Gröbner bases for polynomial equations)

monomial ideal 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods)

monomial ideal see: admissible pair of a —; arithmetic degree 
of a —; standard pair of a —; standard pair decomposition 
of a —

monomials see: posynomial —; standard — 

monopoly 
[91B06, 91B60] 
(see: Oligopolistic market equilibrium) 

monotone 
[46N10, 47J20, 49J40, 49L20, 65K10, 90C26, 90C33, 90C40] 
(see: Dynamic programming: stochastic shortest path 
problems; Generalized monotone multivalued maps; 
Generalized monotone single valued maps; Generalized 
monotonicity: applications to variational inequalities and 
equilibrium problems; Solution methods for multivalued
variational inequalities) 
monotone see: strictly —; strongly — 
monotone bifunction 
[46N10, 49J40, 90C26] 
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems) 
monotone Boolean function 
[90C09] 
(see: Inference of monotone boolean functions)
monotone Boolean function see: antitone —; isotone —; 
nondecreasing —; nonincreasing — 
monotone Boolean function inference 
[90C09] 
(see: Inference of monotone boolean functions)
monotone boolean functions see: Inference of — 
monotone convergence theorem 
[49L20, 90C40] 
(see: Dynamic programming: undiscounted problems)
monotone function 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation,
existence and uniqueness) 
monotone function see: locally —; locally strictly —; locally 
strongly —; strictly —; strongly — 
monotone laws and variational inequalities see: multivalued — 
monotone map 
[90C26] 
(see: Generalized monotone single valued maps) 
monotone map see: maximal —; strictly — 
monotone matrix 
[90C33]
(see: Linear complementarity problem) 
monotone multivalued maps see: Generalized — 
monotone operator 
[46N10, 49J40, 90C26] 
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems) 
monotone operator see: generalized —; strictly — 
monotone operator on a Banach space 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05] 
(see: Minimax theorems)
monotone sequence see: Fejér — 
monotone sequence of greedy swaps 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem)
monotone single valued maps see: Generalized — 
monotonic 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization)
monotonic analysis 
[26A48, 26A51, 52A07] 
(see: Increasing and convex-along-rays functions on
topological vector spaces) 
monotonic at see: locally strongly — 
monotonic function 
[41A30, 62J02, 90C26]
(see: Regression by special functions: algorithms and
complexity)
monotonic functions see: difference of — 
Monotonic optimization 

(90C26, 65K05, 90C30) 
monotonic optimization 

[65K05, 90C26, 90C30, 90C31] 

(see: Cutting plane methods for global optimization; D.C.
programming; Monotonic optimization; Robust global
optimization)
monotonic optimization 

[90C26] 

(see: Cutting plane methods for global optimization) 
monotonic optimization problem see: canonical — 
monotonic optimization problems see: discrete — 
monotonic over see: strongly linearly — 
monotonicity 

[65K10, 65M60, 90C09, 90C10, 90C26, 90C30] 

(see: Bounding derivative ranges; Inference of monotone
boolean functions; Optimization in boolean classification
problems; Variational inequalities: geometric
interpretation, existence and uniqueness)
monotonicity 

[65K10, 65M60] 

(see: Variational inequalities: geometric interpretation,
existence and uniqueness)
monotonicity see: Fejér —; generalized —; local —; local
strict —; local strong —; partial —; strict —; strong —
monotonicity: applications to variational inequalities and
equilibrium problems see: Generalized —
monotonicity in convex optimization see: Fejér — 
monotonicity and nonconvexity test 

[65G20, 65G30, 65G40, 65K05, 90C30] 

(see: Interval global optimization) 
monotonicity test 

[49M37, 65G20, 65G30, 65G40, 65H20, 65K05, 65Y05, 65Y10,
65Y20, 68W10, 90C11, 90C30]

(see: Interval analysis: parallel methods for global
optimization; Interval analysis: unconstrained and
constrained optimization; Interval global optimization;
Mixed integer nonlinear programming)
monotonous see: partially — 

Monro method see: Robbins- —
Monte-Carlo 

[65K10, 92C40] 

(see: Multiple minima problem in protein folding: αBB
global optimization approach)
Monte-Carlo see: pure — 
Monte-Carlo configuration 

[92C05] 

(see: Adaptive simulated annealing and its application to
protein folding)
Monte-Carlo method 

[90C05, 90C25, 90C90] 

(see: Metropolis, Nicholas Constantine; Simulated
annealing methods in protein folding)
Monte-Carlo method 

[62F12, 65C05, 65K05, 90C05, 90C15, 90C25, 90C31] 

(see: Metropolis, Nicholas Constantine; Monte-Carlo
simulations for stochastic optimization)
Monte Carlo method see: metropolis —; pure — 
Monte-Carlo sampling and variance reduction
[90C27] 
(see: Operations research and financial markets) 

Monte-Carlo simulated annealing in protein folding 
(92C40) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Genetic algorithms; Genetic algorithms for 
protein structure prediction; Global optimization based on 
statistical models; Global optimization in Lennard-Jones 
and morse clusters; Graph coloring; Molecular structure 
determination: convex global underestimation; 
Monte-Carlo simulations for stochastic optimization; 
Multiple minima problem in protein folding: αBB global
optimization approach; Packet annealing; Phase problem in 
X-ray crystallography: Shake and bake approach; Random 
search methods; Simulated annealing; Simulated annealing 
methods in protein folding; Stochastic global optimization: 
stopping rules; Stochastic global optimization: two-phase 
methods) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Bayesian global optimization; Genetic 
algorithms; Genetic algorithms for protein structure 
prediction; Global optimization based on statistical models; 
Global optimization in Lennard-Jones and morse clusters; 
Global optimization in protein folding; Molecular structure 
determination: convex global underestimation; 
Monte-Carlo simulations for stochastic optimization; 
Multiple minima problem in protein folding: αBB global
optimization approach; Packet annealing; Phase problem in 
X-ray crystallography: Shake and bake approach; Protein 
folding: generalized-ensemble algorithms; Random search 
methods; Simulated annealing; Simulated annealing 
methods in protein folding; Stochastic global optimization: 
stopping rules; Stochastic global optimization: two-phase 
methods) 

Monte-Carlo simulation
[49L20, 49L99, 62F12, 65C05, 65K05, 90C15, 90C31, 90C40] 
(see: Dynamic programming: average cost per stage 
problems; Dynamic programming: stochastic shortest path 
problems; Monte-Carlo simulations for stochastic 
optimization) 

Monte-Carlo simulation algorithm 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 

Monte-Carlo simulation procedure 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 

Monte-Carlo simulations for stochastic optimization 
(90C15, 65C05, 65K05, 90C31, 62F12) 
(referred to in: Monte-Carlo simulated annealing in protein 
folding) 
(refers to: Monte-Carlo simulated annealing in protein 
folding) 

mood of play 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 

Moon algorithm see: Corley- — 

Moore-Penrose pseudo-inverse 
[65K05, 65K10] 


(see: ABS algorithms for linear equations and linear least 
squares) 

Moré updating formula 

[90C30]

(see: Generalized total least squares) 

Moreau duality see: Fenchel- — 

Moreau-Rockafellar subdifferential 

[49K27, 49K40, 90C30, 90C31]

(see: First order constraint qualifications) 

Moreau subdifferential see: Fenchel- — 

Moreau theorem 

[90C33]
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem) 

morphism see: generalized — 

morphisms see: generalized — 

morphisms of relations see: generalized — 

Morrison formula see: Sherman- — 

Morrison rank-one update formula see: Sherman- — 

Morrison-Woodbury formula see: Sherman- —

morse clusters see: Global optimization in Lennard-Jones 
and — 

morse index 

[57R12, 90C31, 90C34] 

(see: Smoothing methods for semi-infinite optimization)

morse indices 

[57R12, 90C31, 90C34]

(see: Smoothing methods for semi-infinite optimization) 

Morse lemma see: equivariant — 

Morse microcluster 

[90C26, 90C90]

(see: Global optimization in Lennard-Jones and morse
clusters)

Morse relations 

[58E05, 90C30]

(see: Topology of global optimization) 

Morse theory 

[58E05, 90C30]

(see: Topology of global optimization) 

MOSA method 

[90C27, 90C29]
(see: Multi-objective combinatorial optimization) 

Mosco convergence see: discrete — 

most active points see: set of ε- —

most/least infeasible integer variable 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: branch and bound methods)

most preferred solution 

[90C29] 

(see: Multiple objective programming support)

most promising region 

[90C11, 90C59] 

(see: Nested partitions optimization)

mostPreferred 

(see: Railroad locomotive scheduling)

motion 

[03E70, 03H05, 91B16]

(see: Alternative set theory)
motion see: Brownian —; N-dimensional Brownian — 
motion of a point 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
motion of a set 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
motionless degree 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
motivation 
[90B10, 90B15, 90C15, 90C35] 
(see: Preprocessing in stochastic programming) 
Motzkin elimination see: Fourier- — 
Motzkin elimination method see: Fourier- — 
Motzkin-Fourier relaxation method see: Agmon- — 
Motzkin method see: Fourier- — 
Motzkin theorem 
[90C05, 90C30] 
(see: Theorems of the alternative and optimization) 
Motzkin transposition theorem 
(15A39, 90C05) 
(referred to in: Farkas lemma; Linear optimization: 
theorems of the alternative; Linear programming; 
Minimum concave transportation problems; Stochastic 
transportation and location problems; Tucker 
homogeneous systems of linear relations) 
(refers to: Farkas lemma; Linear optimization: theorems of 
the alternative; Linear programming; Minimum concave 
transportation problems; Multi-index transportation 
problems; Stochastic transportation and location problems; 
Tucker homogeneous systems of linear relations) 
Motzkin transposition theorem 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
mountain pass theorem 
[49J52, 58E05, 90C30]
(see: Hemivariational inequalities: eigenvalue problems; 
Topology of global optimization) 
move 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
move see: exchange —; feasible —; shift —
move in a search 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem)
move of a Turing machine 
[90C60] 
(see: Complexity theory)
moving average model 
[90C26, 90C30] 
(see: Forecasting) 


moving coordinate system 
[65G20, 65G30, 65G40, 65L99]
(see: Interval analysis: differential equations) 
(MP) see: 1- —; minimum Partition Problem —; p- — 
MPC 
[90C26]
(see: MINLP: design and scheduling of batch processes) 
mPCC 
[65K05, 90C26, 90C33, 90C34]
(see: Adaptive convexification in semi-infinite 
optimization) 
MPEC 
[90C15, 90C26, 90C33]
(see: Stochastic bilevel programs) 
MPEC 
[90C30, 90C33]
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach) 
MPEC Lagrangian 
[90C30, 90C33]
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach) 
MPEC multipliers 
[90C30, 90C33]
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach) 
MPI 
[49-04, 65Y05, 68N20]
(see: Automatic differentiation: parallel computation) 
MPI-based implementations 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures) 
MPM 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching) 
MPS 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
MS scheme 
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques) 
MSIM 
[90B50]
(see: Inventory management in supply chains) 
MSP 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 
MSP see: STP- — 
MST 
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems) 
MTT 
[15A39, 90C05]
(see: Motzkin transposition theorem) 
MTVSP 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 
mu synthesis control 
[93D09]
(see: Robust control) 
multi-armed restless bandit problem
[90B36] 
(see: Stochastic scheduling) 
multi-attribute utility theory 
[90C29, 91B06, 91B60] 
(see: Decision support systems with multiple criteria; 
Financial applications of multicriteria analysis; Preference 
disaggregation approach: basic features, examples from 
financial decision making) 
Multi-class data classification via mixed-integer optimization 
Multi-commodity flows 
(see: Railroad crew scheduling; Railroad locomotive 
scheduling) 
multi-core 
[65K05, 65Y05, 65Y10, 65Y20, 68W10] 
(see: Interval analysis: parallel methods for global 
optimization) 
multi-criteria problems 
(see: Planning in the process industry) 
Multi-depot vehicle scheduling problem 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 
multi-depot vehicle scheduling problems 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 
multi-echelon arborescence system 
[90B50]
(see: Inventory management in supply chains) 
multi-extremal global optimization 
[90C25]
(see: Concave programming) 
multi-extremality 
[90C05]
(see: Continuous global optimization: applications; Global 
optimization in the analysis and management of 
environmental systems) 
multi-index assignment problems 
[90C08, 90C11, 90C27, 90C35, 90C57, 90C59] 
(see: Multi-index transportation problems; Quadratic 
assignment problem) 
multi-index transportation problem 
[90C35] 
(see: Multi-index transportation problems) 
multi-index transportation problem see: axial —; integer —; 
planar —; symmetric — 
Multi-index transportation problems 
(90C35) 
(referred to in: Minimum concave transportation problems; 
Motzkin transposition theorem; Multidimensional 
assignment problem; Stochastic transportation and 
location problems) 
(refers to: Generalized assignment problem; Stochastic 
transportation and location problems) 
multi-instrument financial equilibrium model see: 
multi-sector — 
multi-knapsack problem 
[90C10, 90C27] 
(see: Multidimensional knapsack problems) 


multi-objective 

[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 

(see: Convex discrete optimization)


multi-objective CNSO 
[46A20, 52A01, 90C30] 
(see: Composite nonsmooth optimization)


Multi-objective combinatorial optimization 

(90C29, 90C27)

(referred to in: Bi-objective assignment problem;
Combinatorial matrix analysis; Combinatorial 
optimization algorithms in resource allocation problems; 
Combinatorial optimization games; Decision support 
systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Evolutionary algorithms in combinatorial 
optimization; Financial applications of multicriteria 
analysis; Fractional combinatorial optimization; Fuzzy 
multi-objective linear programming; Multicriteria sorting 
methods; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization: Interactive methods
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling; Replicator dynamics in 
combinatorial optimization) 

(refers to: Bi-objective assignment problem; Combinatorial 
matrix analysis; Combinatorial optimization algorithms in 
resource allocation problems; Combinatorial optimization 
games; Decision support systems with multiple criteria; 
Estimating data for multicriteria decision making 
problems: optimization techniques; Evolutionary 
algorithms in combinatorial optimization; Financial 
applications of multicriteria analysis; Fractional 
combinatorial optimization; Fuzzy multi-objective linear 
programming; Multicriteria sorting methods; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization: Interactive methods
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Neural networks 
for combinatorial optimization; Outranking methods; 
Portfolio selection and multicriteria analysis; Preference 
disaggregation; Preference disaggregation approach: basic 
features, examples from financial decision making; 
Preference modeling; Replicator dynamics in combinatorial 
optimization) 


multi-objective combinatorial optimization 
[90C10, 90C35] 
(see: Bi-objective assignment problem) 
multi-objective convex optimization

[46A20, 52A01, 90C30] 

(see: Farkas lemma: generalizations) 
multi-objective euclidean distance location see: Single facility
location: —
multi-objective facility location 
[90B85]
(see: Single facility location: multi-objective rectilinear 
distance location) 
multi-objective fractional program 
[90C32]
(see: Fractional programming) 
multi-objective fractional programming 
[90C32] 
(see: Fractional programming)
Multi-objective fractional programming problems 
(90C29)
Multi-objective integer linear programming 
(90C29, 90C10)
(referred to in: Bi-objective assignment problem; Branch
and price: Integer programming with column generation; 
Broadcast scheduling problem; Decision support systems 
with multiple criteria; Decomposition techniques for MILP: 
lagrangian relaxation; Estimating data for multicriteria 
decision making problems: optimization techniques; 
Financial applications of multicriteria analysis; Fuzzy 
multi-objective linear programming; Graph coloring; 
Integer linear complementary problem; Integer 
programming; Integer programming: algebraic methods; 
Integer programming: branch and bound methods; Integer 
programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; MINLP: trim-loss problem; 
Multicriteria sorting methods; Multi-objective 
combinatorial optimization; Multi-objective mixed integer 
programming; Multi-objective optimization and decision 
support systems; Multi-objective optimization: interaction 
of design and control; Multi-objective optimization:
Interactive methods for preference value functions; 
Multi-objective optimization: lagrange duality; 
Multi-objective optimization: pareto optimal solutions, 
properties; Multiparametric mixed integer linear 
programming; Multiple objective programming support; 
Outranking methods; Parametric mixed integer nonlinear 
optimization; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling; Set covering, packing and 
partitioning problems; Simplicial pivoting algorithms for 
integer programming; Stochastic integer programming: 
continuity, stability, rates of convergence; Stochastic 
integer programs; Time-dependent traveling salesman 
problem) 
(refers to: Bi-objective assignment problem; Branch and 
price: Integer programming with column generation; 
Decision support systems with multiple criteria; 
Decomposition techniques for MILP: lagrangian relaxation; 
Estimating data for multicriteria decision making 
problems: optimization techniques; Financial applications 
of multicriteria analysis; Fuzzy multi-objective linear
programming; Integer linear complementary problem;
Integer programming; Integer programming: algebraic
methods; Integer programming: branch and bound
methods; Integer programming: branch and cut algorithms;
Integer programming: cutting plane algorithms; Integer
programming duality; Integer programming: lagrangian
relaxation; LCP: Pardalos-Rosen mixed integer
formulation; Mixed integer classification problems;
Multicriteria sorting methods; Multi-objective
combinatorial optimization; Multi-objective mixed integer
programming; Multi-objective optimization and decision
support systems; Multi-objective optimization: interaction
of design and control; Multi-objective optimization:
Interactive methods for preference value functions;
Multi-objective optimization: lagrange duality;
Multi-objective optimization: pareto optimal solutions,
properties; Multiparametric mixed integer linear
programming; Multiple objective programming support;
Outranking methods; Parametric mixed integer nonlinear
optimization; Portfolio selection and multicriteria analysis;
Preference disaggregation; Preference disaggregation
approach: basic features, examples from financial decision
making; Preference modeling; Set covering, packing and
partitioning problems; Simplicial pivoting algorithms for 
integer programming; Stochastic integer programming: 
continuity, stability, rates of convergence; Stochastic 
integer programs; Time-dependent traveling salesman 
problem) 

multi-objective linear programming 

[90C10, 90C26, 90C29, 91B28] 

(see: Multi-objective integer linear programming; Portfolio
selection and multicriteria analysis; Vector optimization)
multi-objective linear programming 

[90C26, 91B28] 

(see: Portfolio selection and multicriteria analysis) 
multi-objective linear programming see: Fuzzy — 
multi-objective linear programming with fuzzy coefficients 
[90C29, 90C70]

(see: Fuzzy multi-objective linear programming) 
multi-objective linear programming under uncertainty 
[90C29, 90C70]

(see: Fuzzy multi-objective linear programming) 
multi-objective mathematical programming 

[91B06, 91B60]

(see: Financial applications of multicriteria analysis) 
multi-objective mathematical programming 

[90C11, 90C29]

(see: Multi-objective mixed integer programming) 
Multi-objective mixed integer programming 

(90C29, 90C11) 

(referred to in: Branch and price: Integer programming with
column generation; Decomposition techniques for MILP:
lagrangian relaxation; Graph coloring; Integer linear
complementary problem; Integer programming; Integer
programming: algebraic methods; Integer programming:
branch and bound methods; Integer programming: branch
and cut algorithms; Integer programming: cutting plane
algorithms; Integer programming: lagrangian relaxation;
LCP: Pardalos-Rosen mixed integer formulation; MINLP:
trim-loss problem; Multi-objective integer linear
programming; Multiparametric mixed integer linear
programming; Parametric mixed integer nonlinear
optimization; Set covering, packing and partitioning 
problems; Simplicial pivoting algorithms for integer 
programming; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem) 

(refers to: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Time-dependent 
traveling salesman problem) 

multi-objective mixed integer programming
[90C11, 90C29]
(see: Multi-objective mixed integer programming)

multi-objective (multicriteria) mixed integer programming
[90C11, 90C29]
(see: Multi-objective mixed integer programming)

multi-objective optimization
[49M37, 65K05, 65K10, 90B50, 90B85, 90C11, 90C29, 90C30, 93A13]
(see: MINLP: applications in the interaction of design and
control; Multilevel methods for optimal design;
Multi-objective optimization; Interactive methods for
preference value functions; Multi-objective optimization:
pareto optimal solutions, properties; Optimization and
decision support systems; Selection of maximally
informative genes; Single facility location: multi-objective
rectilinear distance location)

multi-objective optimization
[90B50, 90C11, 90C29, 90C90]
(see: Multi-objective optimization: interaction of design
and control; Multi-objective optimization; Interactive
methods for preference value functions; Multi-objective
optimization: pareto optimal solutions, properties;
Optimization and decision support systems)

multi-objective optimization see: disaggregation in —;
Generalized concavity in —

Multi-objective optimization and decision support systems
(90B50, 90C29, 65K05, 90C05, 91B06)
(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods
for preference value functions; Multi-objective
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling) 

(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling)

Multi-objective optimization: interaction of design and
control
(90C29, 90C11, 90C90)
(referred to in: Bi-objective assignment problem; Control 
vector iteration CVI; Decision support systems with 
multiple criteria; Duality in optimal control with first order 
differential equations; Dynamic programming: 
continuous-time optimal control; Dynamic programming 
and Newton’s method in unconstrained optimal control; 
Dynamic programming: optimal control applications; 
Estimating data for multicriteria decision making 
problems: optimization techniques; Financial applications 
of multicriteria analysis; Fuzzy multi-objective linear 
programming; Hamilton-Jacobi-Bellman equation; 
Infinite horizon control and dynamic games; MINLP: 
applications in the interaction of design and control; 
Multicriteria sorting methods; Multi-objective 
combinatorial optimization; Multi-objective integer linear 
programming; Multi-objective optimization and decision 
support systems; Multi-objective optimization; Interactive 
methods for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Optimal control 
of a flexible arm; Outranking methods; Portfolio selection 
and multicriteria analysis; Preference disaggregation; 
Preference disaggregation approach: basic features, 
examples from financial decision making; Preference 
modeling; Robust control; Robust control: schur stability of 
polytopes of polynomials; Semi-infinite programming and 
control problems; Sequential quadratic programming: 
interior point methods for distributed optimal control 
problems; Suboptimal control) 

(refers to: Bi-objective assignment problem; Control vector 
iteration CVI; Decision support systems with multiple 
criteria; Duality in optimal control with first order 
differential equations; Dynamic programming:
continuous-time optimal control; Dynamic programming 
and Newton’s method in unconstrained optimal control; 
Dynamic programming: optimal control applications; 
Estimating data for multicriteria decision making 
problems: optimization techniques; Financial applications 
of multicriteria analysis; Fuzzy multi-objective linear 
programming; Hamilton-Jacobi-Bellman equation; 
Infinite horizon control and dynamic games; MINLP: 
applications in the interaction of design and control; 
Multicriteria sorting methods; Multi-objective 
combinatorial optimization; Multi-objective integer linear 
programming; Multi-objective optimization and decision 
support systems; Multi-objective optimization; Interactive 
methods for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Optimal control 
of a flexible arm; Outranking methods; Portfolio selection 
and multicriteria analysis; Preference disaggregation; 
Preference disaggregation approach: basic features, 
examples from financial decision making; Preference 
modeling; Robust control; Robust control: schur stability of 
polytopes of polynomials; Semi-infinite programming and 
control problems; Sequential quadratic programming: 
interior point methods for distributed optimal control 
problems; Suboptimal control) 

multi-objective optimization in the interaction of design and
control
[90C11, 90C29, 90C90]
(see: Multi-objective optimization: interaction of design
and control)

Multi-objective optimization; Interactive methods for
preference value functions
(90C29)
(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization: lagrange duality; 
Multi-objective optimization: pareto optimal solutions, 
properties; Multiple objective programming support; 
Outranking methods; Portfolio selection and multicriteria 
analysis; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Preference modeling) 

(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization: lagrange duality;
Multi-objective optimization: pareto optimal solutions,
properties; Multiple objective programming support; 
Outranking methods; Portfolio selection and multicriteria 
analysis; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Preference modeling)

Multi-objective optimization: lagrange duality
(90C29, 90C30)
(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Decomposition 
techniques for MILP: lagrangian relaxation; Estimating 
data for multicriteria decision making problems: 
optimization techniques; Financial applications of 
multicriteria analysis; Fuzzy multi-objective linear 
programming; Integer programming: lagrangian 
relaxation; Lagrange, Joseph-Louis; Lagrangian multipliers 
methods for convex programming; Multicriteria sorting 
methods; Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling) 

(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Decomposition 
techniques for MILP: lagrangian relaxation; Estimating 
data for multicriteria decision making problems: 
optimization techniques; Financial applications of 
multicriteria analysis; Fuzzy multi-objective linear 
programming; Integer programming: lagrangian 
relaxation; Lagrange, Joseph-Louis; Lagrangian multipliers 
methods for convex programming; Multicriteria sorting 
methods; Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making; Preference modeling)

Multi-objective optimization: pareto optimal solutions,
properties
(90C29)
(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multiple objective 
programming support; Outranking methods; Portfolio 
selection and multicriteria analysis; Preference 
disaggregation; Preference disaggregation approach: basic 
features, examples from financial decision making; 
Preference modeling) 
(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multiple objective 
programming support; Outranking methods; Portfolio 
selection and multicriteria analysis; Preference 
disaggregation; Preference disaggregation approach: basic 
features, examples from financial decision making; 
Preference modeling) 

multi-objective programming 
[90C29] 
(see: Preference disaggregation approach: basic features, 
examples from financial decision making) 

multi-objective programming 
[90C10, 90C27, 90C29, 90C35] 
(see: Bi-objective assignment problem; Multi-objective 
combinatorial optimization; Multi-objective integer linear 
programming) 

multi-objective rectilinear distance location see: Single facility 
location: — 

Multi-quadratic integer programming: models and 
applications 
(65K05, 90C11, 90C20) 

the multi-resource weighted assignment model 
[90-00] 
(see: Generalized assignment problem) 

multi-resource weighted assignment model see: the — 

Multi-scale global optimization using terrain/funneling 
methods 
(65H20) 

multi-sector multi-instrument financial equilibrium model 
[91B50] 
(see: Financial equilibrium) 

multicenter see: minmax — 

multiclass migration 
[90C30] 
(see: Equilibrium networks) 

multiclass queueing networks 
[90B36] 
(see: Stochastic scheduling) 


multicoloring 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 

multicommodity flow 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

multicommodity flow 
[90B10, 90C05, 90C06, 90C35] 
(see: Nonoriented multicommodity flow problems) 

multicommodity flow see: relaxed — 

multicommodity flow problem see: node-path formulation of 
the — 

Multicommodity flow problems 
(90C35) 
(referred to in: Minimum cost flow problem; Nonconvex 
network flow problems; Nonoriented multicommodity flow 
problems) 
(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation 
networks; Generalized networks; Maximum flow problem; 
Minimum cost flow problem; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Nonoriented multicommodity flow 
problems; Piecewise linear network flow problems; Shortest 
path tree algorithms; Steiner tree problems; Stochastic 
network problems: massively parallel solution; Survivable 
networks; Traffic network equilibrium) 

multicommodity flow problems see: large nonlinear —; 
nonlinear —; Nonoriented — 

multicommodity model in OR
[90B80, 90B85]
(see: Warehouse location problem)

multicommodity network
[90B10, 90C26, 90C30, 90C35]
(see: Nonconvex network flow problems)

multicommodity network flow models see: undirected —

multicommodity network flow problem
[90B06, 90C06, 90C08, 90C35, 90C90]
(see: Airline optimization)

multicommodity network flows
[90C35]
(see: Multicommodity flow problems)

multicomputer see: coarse grained —

multiconstraint knapsack
[90C10, 90C27]
(see: Multidimensional knapsack problems)

multiconstraint knapsack problem
[90C10, 90C27]
(see: Multidimensional knapsack problems)

multicriteria analysis
[90C29, 91A99, 91B06, 91B60]
(see: Financial applications of multicriteria analysis;
Preference disaggregation)

multicriteria analysis
[90C11, 90C29, 91A99, 91B06, 91B60]
(see: Decision support systems with multiple criteria;
Financial applications of multicriteria analysis;
Multicriteria sorting methods; Multi-objective mixed
integer programming; Preference disaggregation;
Preference disaggregation approach: basic features,
examples from financial decision making)
multicriteria analysis see: Financial applications of —; Portfolio 
selection and — 
multicriteria decision aid 
[90C29, 91B06, 91B60] 
(see: Decision support systems with multiple criteria; 
Financial applications of multicriteria analysis; 
Multicriteria sorting methods; Preference disaggregation 
approach: basic features, examples from financial decision 
making) 
multicriteria decision making 
[90B80, 90B85] 
(see: Warehouse location problem) 
multicriteria decision making 
[90C29, 90C70] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques; Fuzzy multi-objective 
linear programming) 
multicriteria decision making problems: optimization 
techniques see: Estimating data for — 
Multicriteria decision support methodologies for auditing 
decisions 
(90C90, 90C11, 91B28)
multicriteria decision support system 
[90C29] 
(see: Decision support systems with multiple criteria)
multicriteria decision support system see: intelligent — 
multicriteria decision support systems see: intelligent — 
multicriteria DSS 
[90C29] 
(see: Decision support systems with multiple criteria)
multicriteria group decision support system 
[90C29] 
(see: Decision support systems with multiple criteria) 
Multicriteria methods for mergers and acquisitions 
(91B28, 90C05, 90C90)
(multicriteria) mixed integer programming see: 
multi-objective — 
multicriteria objective function 
[90B80] 
(see: Facilities layout problems)
multicriteria sorting method 
[90C29] 
(see: Multicriteria sorting methods)
Multicriteria sorting methods 
(90C29)
(referred to in: Bi-objective assignment problem; Decision
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multi-objective 
combinatorial optimization; Multi-objective integer linear 
programming; Multi-objective optimization and decision 
support systems; Multi-objective optimization: interaction 
of design and control; Multi-objective optimization; 
Interactive methods for preference value functions; 
Multi-objective optimization: lagrange duality;
Multi-objective optimization: pareto optimal solutions,
properties; Multiple objective programming support; 
Outranking methods; Portfolio selection and multicriteria 
analysis; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Preference modeling) 
(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multi-objective 
combinatorial optimization; Multi-objective integer linear 
programming; Multi-objective optimization and decision 
support systems; Multi-objective optimization: interaction 
of design and control; Multi-objective optimization; 
Interactive methods for preference value functions; 
Multi-objective optimization: lagrange duality; 
Multi-objective optimization: pareto optimal solutions, 
properties; Multiple objective programming support; 
Outranking methods; Portfolio selection and multicriteria 
analysis; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Preference modeling) 

multicut methods 
[90C06, 90C15] 
(see: Stabilization of cutting plane algorithms for stochastic 
linear programming problems) 

Multidimensional assignment problem 
(90C10, 90C27) 
(refers to: Assignment and matching; Integer programming: 
branch and bound methods; Multi-index transportation 
problems) 

multidimensional assignment problem 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 

multidimensional assignment problem see: Asymptotic 
properties of random — 

multidimensional bisection 
[65K05, 90C30] 
(see: Bisection global optimization methods) 

multidimensional bracket
[65K05, 90C30]
(see: Bisection global optimization methods)

multidimensional knapsack
[90C10, 90C27]
(see: Multidimensional knapsack problems)

multidimensional knapsack problem
[90C10, 90C27]
(see: Multidimensional knapsack problems)

Multidimensional knapsack problems 
(90C27, 90C10) 
(referred to in: Integer programming; Quadratic knapsack) 
(refers to: Integer programming; Integer programming: 
branch and bound methods; Quadratic knapsack) 

multidimensional multiple-choice knapsack problem 
[90C10, 90C27] 
(see: Multidimensional knapsack problems) 

multidimensional scaling
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
multidimensional scaling problem 
[65D18, 90B85, 90C26] 
(see: Global optimization in location problems) 
multidimensional transportation problem 
[90C35] 
(see: Multi-index transportation problems) 
multidimensional zero-one knapsack problem 
[90C10, 90C27] 
(see: Multidimensional knapsack problems) 
multidisciplinary design 
[65F10, 65F50, 65H10, 65K10] 
(see: Multidisciplinary design optimization) 
Multidisciplinary design optimization 
(65F10, 65F50, 65H10, 65K10) 
(referred to in: Design optimization in computational fluid 
dynamics; Interval analysis: application to chemical 
engineering design problems; Multilevel methods for 
optimal design; Optimal design of composite structures; 
Optimal design in nonlinear optics; Structural 
optimization: history) 
(refers to: Bilevel programming: applications in engineering; 
Design optimization in computational fluid dynamics; 
Genetic algorithms; Interval analysis: application to 
chemical engineering design problems; Multilevel methods 
for optimal design; Optimal design of composite structures; 
Optimal design in nonlinear optics; Structural 
optimization: history) 
multidisciplinary design optimization 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 
multidisciplinary optimization 
[65F10, 65F50, 65H10, 65K10] 
(see: Multidisciplinary design optimization) 
multifacilities location 
[90B80] 
(see: Facilities layout problems) 
multifacility see: discrete single-commodity single-criterion 
uncapacitated static — 
multifacility location 
[05C05, 05C85, 68Q25, 90B80]
(see: Bottleneck steiner tree problems) 
multifacility location-allocation 
[90B85]
(see: Single facility location: multi-objective euclidean 
distance location) 
multifacility problem in OR 
[90B80, 90B85]
(see: Warehouse location problem) 
Multifacility and restricted location problems 
(90B85) 
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Network location: covering 
problems; Optimizing facility location with euclidean and
rectilinear distances; Single facility location: circle covering
problem; Single facility location: multi-objective euclidean 
distance location; Single facility location: multi-objective 
rectilinear distance location; Stochastic transportation and 
location problems; Voronoi diagrams in facility location; 
Warehouse location problem) 

(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Complexity classes in optimization; Complexity theory; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Network location: covering 
problems; Optimizing facility location with euclidean and 
rectilinear distances; Production-distribution system 
design problem; Resource allocation for epidemic control; 
Single facility location: circle covering problem; Single 
facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem)

multifacility Weber objective function
[90B85]
(see: Multifacility and restricted location problems)

multifacility Weber problem
[90B85]
(see: Multifacility and restricted location problems)

multifacility Weber-Rawls objective function
[90B85]
(see: Multifacility and restricted location problems)

multifrontal method
[65Fxx]
(see: Least squares problems)

multigraph
[05-XX]
(see: Frequency assignment problem)

multigroup hierarchical discrimination
[90C29]
(see: Multicriteria sorting methods)

multilayer see: k-restrictive —

multilayered dielectric structures see: Global optimization of
planar —

multilevel algorithm
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)

multilevel generalized assignment problem
[90-00]
(see: Generalized assignment problem)

multilevel methods
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: two-phase methods)

Multilevel methods for optimal design
(49M37, 65K05, 65K10, 90C30, 93A13)
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: global 
optimization; Bilevel programming: implicit function 
approach; Bilevel programming: introduction, history and 
overview; Bilevel programming in management; Bilevel 
programming: optimality conditions and duality; Design 
optimization in computational fluid dynamics; Interval 
analysis: application to chemical engineering design 
problems; Multidisciplinary design optimization; 
Multilevel optimization in mechanics; Optimal design of 
composite structures; Optimal design in nonlinear optics; 
Stochastic bilevel programs; Structural optimization: 
history) 
(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 
programming: implicit function approach; Bilevel 
programming: introduction, history and overview; Bilevel 
programming in management; Bilevel programming: 
optimality conditions and duality; Design optimization in 
computational fluid dynamics; Interval analysis: 
application to chemical engineering design problems; 
Multidisciplinary design optimization; Multilevel 
optimization in mechanics; Optimal design of composite 
structures; Optimal design in nonlinear optics; Stochastic 
bilevel programs; Structural optimization: history) 

multilevel optimization
[49J35, 65K99, 74A55, 74M10, 74M15, 90C26]
(see: Quasidifferentiable optimization: applications)

multilevel optimization
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)

Multilevel optimization in mechanics
(49Q10, 74K99, 74Pxx, 90C90, 91A65)
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave 
programs; Bilevel optimization: feasibility test and 
flexibility index; Bilevel programming; Bilevel 
programming: applications; Bilevel programming: global 
optimization; Bilevel programming: implicit function 
approach; Bilevel programming: introduction, history and 
overview; Bilevel programming in management; Bilevel 
programming: optimality conditions and duality; 
Multilevel methods for optimal design; Quasivariational 
inequalities; Stochastic bilevel programs) 
(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 
programming: implicit function approach; Bilevel 
programming: introduction, history and overview; Bilevel 
programming in management; Bilevel programming: 
optimality conditions and duality; Multilevel methods for 
optimal design; Stochastic bilevel programs) 

multilevel problem formulation 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 

multilevel programming 
[90C26, 90C30, 90C31]
(see: Bilevel programming: introduction, history and
overview)

multilevel programming problem
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)

multilevel single-linkage
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: stopping rules;
Stochastic global optimization: two-phase methods)

multiload shape design
[90C25, 90C27, 90C90]
(see: Semidefinite programming and structural 
optimization) 

multiload truss design 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 

multimodal functions 
[90C30]
(see: Global optimization based on statistical models)

multimodal networks
[90C30]
(see: Equilibrium networks)

multimodal traffic network equilibrium
[90C30]
(see: Equilibrium networks)

multimodal traffic network equilibrium model
[90C30]
(see: Equilibrium networks) 

Multiparametric linear programming 
(90C31, 90C05) 
(referred to in: Bounds and solution vector estimates for 
parametric NLPS; Global optimization in multiplicative 
programming; Linear programming; Multiparametric 
mixed integer linear programming; Multiplicative 
programming; Nondifferentiable optimization: parametric 
programming; Parametric global optimization: sensitivity; 
Parametric linear programming: cost simplex algorithm; 
Parametric mixed integer nonlinear optimization; 
Parametric optimization: embeddings, path following and 
singularities; Selfdual parametric method for linear 
programs) 
(refers to: Bounds and solution vector estimates for 
parametric NLPS; Global optimization in multiplicative 
programming; Linear programming; Multiparametric 
mixed integer linear programming; Multiplicative 
programming; Nondifferentiable optimization: parametric 
programming; Parametric global optimization: sensitivity; 
Parametric linear programming: cost simplex algorithm; 
Parametric mixed integer nonlinear optimization; 
Parametric optimization: embeddings, path following and 
singularities; Selfdual parametric method for linear 
programs) 

Multiparametric mixed integer linear programming 
(90C31, 90C11) 
(referred to in: Bounds and solution vector estimates for 
parametric NLPS; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane
algorithms; Integer programming: lagrangian relaxation; 
LCP: Pardalos-Rosen mixed integer formulation; MINLP:
trim-loss problem; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multiparametric linear programming; 
Nondifferentiable optimization: parametric programming; 
Parametric global optimization: sensitivity; Parametric 
linear programming: cost simplex algorithm; Parametric 
mixed integer nonlinear optimization; Parametric 
optimization: embeddings, path following and 
singularities; Selfdual parametric method for linear 
programs; Set covering, packing and partitioning problems; 
Simplicial pivoting algorithms for integer programming; 
Time-dependent traveling salesman problem) 
(refers to: Bounds and solution vector estimates for 
parametric NLPS; Branch and price: Integer programming 
with column generation; Decomposition techniques for 
MILP: lagrangian relaxation; Integer linear complementary 
problem; Integer programming; Integer programming: 
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen 
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric linear programming; Nondifferentiable 
optimization: parametric programming; Parametric global 
optimization: sensitivity; Parametric linear programming: 
cost simplex algorithm; Parametric mixed integer nonlinear 
optimization; Parametric optimization: embeddings, path 
following and singularities; Selfdual parametric method for 
linear programs; Set covering, packing and partitioning 
problems; Simplicial pivoting algorithms for integer 
programming; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem) 

multiperiod assignment problem 
[90C35] 
(see: Multi-index transportation problems) 

multiperiod MINLP MEN synthesis model 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks) 

multiperiod model 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 

Multiperiod Models see: single versus — 

multiperiod optimization 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems; Optimal planning of offshore oilfield 
infrastructure) 

multiperiod optimization modeling framework 
[90C30, 90C35] 
(see: Optimization in water resources) 

multiperiod planning 
[90B80, 90B85] 
(see: Warehouse location problem) 


multiperiod stochastic program 
[90B05, 90B06] 
(see: Global supply chain models) 
multiphase chemical equilibrium 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 
multiphase chemical equilibrium see: Optimality criteria for — 
multiphase spanning network 
[90C27] 
(see: Steiner tree problems) 
multiphase Steiner network 
[90C27] 
(see: Steiner tree problems) 
multiphase Steiner problems 
[90C27] 
(see: Steiner tree problems) 
multiple 
(see: Railroad locomotive scheduling) 
multiple branches for bounded integer variable 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
multiple choice knapsack 
[90C10, 90C27] 
(see: Multidimensional knapsack problems) 
multiple-choice knapsack problem 
[90C10, 90C27] 
(see: Multidimensional knapsack problems) 
multiple-choice knapsack problem see: linear —; 
multidimensional — 
multiple-class software package 
[90C10, 90C26, 90C30] 
(see: Optimization software) 
multiple criteria see: Decision support systems with — 
multiple criteria decision making 
[65K05, 90-XX, 90B50, 90C05, 90C26, 90C29, 91B06, 91B28, 
91B60] 
(see: Financial applications of multicriteria analysis; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization; Interactive methods for 
preference value functions; Multi-objective optimization: 
pareto optimal solutions, properties; Multiple objective 
programming support; Outranking methods; Portfolio 
selection and multicriteria analysis) 
multiple criteria decision making 
[65K05, 90B50, 90C05, 90C29, 91B06] 
(see: Multi-objective optimization and decision support 
systems; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support) 
multiple criteria design problem 
[90C29] 
(see: Multiple objective programming support) 
multiple criteria evaluation 
[90C29] 
(see: Multiple objective programming support) 
multiple criteria problem see: continuous —; discrete — 
multiple depot
[90B06]
(see: Vehicle routing)

multiple depots see: single depot/ — 

multiple dogleg path
[49M37]
(see: Nonlinear least squares: trust region methods)

multiple-facility location
[90B80, 90C27]
(see: Voronoi diagrams in facility location)

multiple-hub heuristic
[90C35]
(see: Multi-index transportation problems)

multiple knapsack problem
[90C10, 90C27]
(see: Multidimensional knapsack problems)

multiple Kuhn-Tucker points
[90C25, 90C30]
(see: Successive quadratic programming: full space
methods)

multiple locomotive type models 
(see: Railroad locomotive scheduling) 

multiple minima 
[65K10, 92C40] 
(see: Multiple minima problem in protein folding: αBB
global optimization approach) 

Multiple minima problem in protein folding: αBB global
optimization approach 
(92C40, 65K10) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Genetic algorithms; Global 
optimization in Lennard-Jones and morse clusters; Graph 
coloring; Molecular structure determination: convex global 
underestimation; Monte-Carlo simulated annealing in 
protein folding; Packet annealing; Phase problem in X-ray 
crystallography: Shake and bake approach; Simulated 
annealing methods in protein folding) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Genetic algorithms; Global optimization 
in Lennard-Jones and morse clusters; Global optimization 
in protein folding; Molecular structure determination: 
convex global underestimation; Monte-Carlo simulated 
annealing in protein folding; Packet annealing; Phase 
problem in X-ray crystallography: Shake and bake 
approach; Protein folding: generalized-ensemble 
algorithms; Simulated annealing; Simulated annealing 
methods in protein folding) 

Multiple objective dynamic programming 
(90C39, 90C31) 
(referred to in: Dynamic programming: average cost per 
stage problems; Dynamic programming in clustering; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; 
Neuro-dynamic programming) 
(refers to: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; 
Neuro-dynamic programming) 

multiple objective linear programming 
[65K05, 90B50, 90C05, 90C29, 91B06] 
(see: Multi-objective optimization and decision support 
systems; Multiple objective programming support) 

multiple objective programming 
[90C29] 
(see: Multiple objective programming support) 

multiple objective programming 
[90C29, 90C31, 90C39] 
(see: Multiple objective dynamic programming; Multiple 
objective programming support) 

Multiple objective programming support 
(90C29)
(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Outranking methods; Portfolio selection and multicriteria 
analysis; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Preference modeling) 
(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Outranking methods; Portfolio selection and multicriteria 
analysis; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from
financial decision making; Preference modeling)
multiple objective programming support
[90C29]
(see: Multiple objective programming support)
multiple objectives
[49-01, 49K10, 49M37, 90-01, 90C05, 90C27, 91B52]
(see: Bilevel linear programming)

multiple QP Kuhn-Tucker points
[90C25, 90C30]
(see: Successive quadratic programming: full space
methods)
multiple runs
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10]
(see: Maximum satisfiability problem)
multiple sequence alignment
[90C35]
(see: Optimization in leveled graphs)
multiple shooting
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)
multiple types of vehicles see: Vehicle scheduling problems
with —
multiple-valued logic see: evaluation in — 
multiplexing see: wavelength-division — 
multiplicative function see: constraint on a —; program of
minimizing a convex —
multiplicative functions see: sum of convex — 
multiplicative program see: convex —; linear — 
Multiplicative programming
(90C26, 90C31)
(referred to in: Global optimization in multiplicative
programming; Linear programming; Multiparametric
linear programming; Parametric linear programming: cost
simplex algorithm)
(refers to: Complexity classes in optimization;
Computational complexity theory; Concave programming;
Global optimization in multiplicative programming; Linear
programming; Multiparametric linear programming;
Parametric linear programming: cost simplex algorithm)
multiplicative programming
[65K05, 90C25, 90C26, 90C30]
(see: Concave programming; Monotonic optimization)
multiplicative programming see: Global optimization in — 
multiplicity of a prime
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
multiplier see: Lagrange — 
multiplier adjustment
[90C30, 90C90]
(see: Decomposition techniques for MILP: lagrangian
relaxation)
multiplier approach see: Everett generalized Lagrange — 


multiplier associated with an arc 
[90C35]
(see: Generalized networks) 
multiplier-free reduced Hessian SQP 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
multiplier method 
[90C25, 90C30] 
(see: Lagrangian multipliers methods for convex 
programming) 
multiplier methods 
[90C25, 90C30] 
(see: Lagrangian multipliers methods for convex 
programming) 
multiplier rule see: global Lagrange —; Lagrange — 
multiplier sets see: Lagrange — 
multiplier vector see: Lagrange — 
multipliers see: extended set of Lagrange —; Lagrange —; 
Lagrangian —; MPEC —; orthogonality conditions on —; 
Stochastic programming: nonanticipativity and lagrange — 
multipliers methods for convex programming see: 
Lagrangian — 
multipliers for nonanticipativity constraints see: Lagrange — 
multipliers for phase constraints see: Lagrange — 
multipoint approximation 
[65F10, 65F50, 65H10, 65K10] 
(see: Multidisciplinary design optimization)
multipopulation replicator models 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization)
Multiprocessor Scheduling Problem see: minimum — 
multiproduct 
[49L20]
(see: Dynamic programming: inventory control) 
multiproduct 
[90C26]
(see: Global optimization in batch design under 
uncertainty) 
multiproduct batch plant 
[90C26]
(see: MINLP: design and scheduling of batch processes) 
multiproduct plant 
[90C26] 
(see: Global optimization in batch design under
uncertainty) 
multipurpose 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility
index) 
multipurpose 
[90C26] 
(see: Global optimization in batch design under
uncertainty) 
multipurpose plant 
[90C26] 
(see: Global optimization in batch design under
uncertainty) 
multipurpose storage entities 
(see: Planning in the process industry)
multiratio programs
[90C32]
(see: Fractional programming)

multistage
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution)

multistage applications
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)

multistage IM
[90B50]
(see: Inventory management in supply chains)

multistage inventory management
[90B50]
(see: Inventory management in supply chains)

multistage inventory management models
[90B50]
(see: Inventory management in supply chains)

multistage linking constraints
[90C30, 90C35]
(see: Optimization in water resources)

multistage mean-variance optimization problems see: 
Decomposition algorithms for the solution of — 

multistage optimization
[90C27]
(see: Time-dependent traveling salesman problem)

multistage problems
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)

multistage stochastic program
[90C15]
(see: Multistage stochastic programming: barycentric
approximation)

multistage stochastic programming
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution)

Multistage stochastic programming: barycentric 
approximation 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Preprocessing in 
stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programming: 
quasigradient method; Two-stage stochastic programs with 
recourse) 

(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
L-shaped method for two-stage stochastic programs with 
recourse; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Simple recourse problem: dual method; Simple 
recourse problem: primal method; Stabilization of cutting 
plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Two-stage 
stochastic programming: quasigradient method; Two-stage 
stochastic programs with recourse) 


multistart
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: stopping rules;
Stochastic global optimization: two-phase methods)


multistart process
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)


multitarget tracking
[90C35]
(see: Multi-index transportation problems)


multivalued maps see: Generalized monotone — 
multivalued monotone laws and variational inequalities
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in
mechanics)


multivalued nonmonotone laws and hemivariational inequalities
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in
mechanics)
multivalued variational inequalities see: Solution methods
for — 
multivariable stability margin 
[93D09] 
(see: Robust control) 
multivariable stability margin K 
[93D09] 
(see: Robust control) 
multivariate distribution functions see: gradient of — 
multivariate distributions see: Stochastic linear programs with 
recourse and arbitrary — 
multivariate gamma distribution 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15]
(see: Approximation of multivariate probability integrals) 
multivariate interval Newton method 
[65G20, 65G30, 65G40, 65H20, 65K99]
(see: Interval Newton methods) 
multivariate normal distribution 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15]
(see: Approximation of multivariate probability integrals) 
multivariate normal distribution 
[90C15]
(see: Probabilistic constrained problems: convexity theory) 
multivariate probability distribution function 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15]
(see: Approximation of multivariate probability integrals) 
multivariate probability integral 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15]
(see: Approximation of multivariate probability integrals) 
multivariate probability integrals see: Approximation of —; 
lower bounds for —; upper bounds for — 
multiWeber problem 
[90B85]
(see: Multifacility and restricted location problems) 
multiWeber-Rawls problem 
[90B85]
(see: Multifacility and restricted location problems) 
multy-stage stochastic programs 
[90C15] 
(see: Two-stage stochastic programming: quasigradient 
method) 
Murty least-index refinement 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
mutated sequence see: minimum weight common — 
mutation 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90, 92B05] 
(see: Genetic algorithms; Traveling salesman problem) 
mutation 
[92B05] 
(see: Genetic algorithms) 


mutual information 
[62F10, 94A17] 
(see: Entropy optimization: parameter estimation)
Mutzel branch and cut algorithm see: Junger- — 
MV-algebra 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
mVC 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
mVI problem 
[47J20, 49J40, 65K10, 90C33] 
(see: Solution methods for multivalued variational
inequalities) 
MVL connectives see: semantics of — 
MWCP 
[90C20] 
(see: Standard quadratic optimization problems: 
applications) 
MWFAS 
[90C35] 
(see: Feedback set problems) 
MWFVS
[90C35] 
(see: Feedback set problems) 
MWW
[74A40, 90C26]
(see: Shape selective zeolite separation and catalysis: 
optimization methods) 
myopic 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 


N-adic assignments problems
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

n-ary relation
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

N-dimensional Brownian motion
[60G35, 65K05]
(see: Differential equations and global optimization)
n-dimensional vectors see: lexicographical ordering for — 
n-fold integer programming
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)

n-fold matrix
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)
n Hessian matrix
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 
n noninteracting 
[92-08, 92C05, 92C40] 
(see: Protein folding: generalized-ensemble algorithms) 
N-normal primal problem 
[90C29, 90C30] 
(see: Multi-objective optimization: lagrange duality) 
NP see: NP = co- —; P = —
NP = co-NP
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
n-queens problem 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
n-valued Pl-systems see: subfamilies of — 
NAG library 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
NAG parallel library 
[90C10, 90C26, 90C30] 
(see: Optimization software) 
naive auction algorithm 
[90C30, 90C35] 
(see: Auction algorithms) 
naive Bayes 
(see: Bayesian networks)
narrowing operator see: constraint — 
Nasa 
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX] 
(see: Nonlocal sensitivity analysis with automatic
differentiation) 
Nasa program 
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX] 
(see: Nonlocal sensitivity analysis with automatic
differentiation) 
Nash-Cournot equilibrium see: Stackelberg- — 
Nash equilibrium 
[46A22, 49J35, 49J40, 49Jxx, 54D05, 54H25, 55M20, 90C15, 
91A05, 91Axx, 91B06, 91B60] 
(see: Infinite horizon control and dynamic games; Minimax 
theorems; Oligopolistic market equilibrium; Stochastic 
quasigradient methods in minimax problems) 
Nash equilibrium 
[90C15, 91B06, 91B60] 
(see: Oligopolistic market equilibrium; Stochastic 
quasigradient methods in minimax problems) 
Nash equilibrium see: feedback —; memory strategy —; 
open-loop —; spatial Cournot- —; Stackelberg- —
Nash oligopolistic equilibrium see: Cournot- — 
Nash oligopolistic equilibrium model see: Cournot- — 
native conformation
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)
native conformations see: discarding far-from- — 
natural 
[65H20, 80A10, 80A22, 90C90]
(see: Global optimization: application to phase equilibrium 
problems) 
natural domain 
[65K05, 90C30]
(see: Bisection global optimization methods) 
natural interval extension 
[65G20, 65G30, 65G40, 65K05, 90C26, 90C30]
(see: Bounding derivative ranges; Interval global 
optimization) 
natural level functions 
[34-XX, 34Bxx, 34Lxx, 93E24]
(see: Complexity and large-scale least squares problems) 
natural numbers 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
natural numbers see: finite — 
natural residual 
[90C30, 90C33]
(see: Implicit lagrangian) 
natural selection see: fundamental theorem of — 
natural stream arcs 
[90C30, 90C35]
(see: Optimization in water resources) 
natural vector see: support of a — 
naturally flowing wells of type a 
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling) 
naturally flowing wells of type b 
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling) 
Naum Zuselevich see: Shor — 
Navier-Stokes code see: Reynolds-averaged — 
NC
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes)
NC algorithm 
[90C30]
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
NC method 
[90C30]
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
NC3 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
NDEXPTIME 
[90C60] 
(see: Complexity classes in optimization) 
NDO 
[46N10, 90-00, 90C47] 
(see: Nondifferentiable optimization) 
NDO see: convex —
nDOMB algorithm 
[49M07, 49M10, 65K, 90C06] 
(see: New hybrid conjugate gradient algorithms for 
unconstrained optimization) 
NDP 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
NDSPACE 
[90C60] 
(see: Complexity classes in optimization) 
NDTIME 
[90C60] 
(see: Complexity classes in optimization) 
near degeneracy 
[90C60] 
(see: Complexity of degeneracy) 
near-integer-fix 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
near-minimizer 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
near-neighbor load balancing scheme 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 
near rational numbers see: infinitely — 
near-simpliciality 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 
near-simplicity 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 
nearest insertion optimal partitioning algorithm 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
nearest-neighbor 
[65K05, 90-08, 90B06, 90B35, 90C05, 90C06, 90C10, 90C11, 
90C20, 90C27, 90C30, 90C39, 90C57, 90C59, 90C60, 90C90] 
(see: Disease diagnosis: optimization-based methods; 
Traveling salesman problem) 
nearest neighbor (NN) 
[68Q25, 68R10, 68W40, 90B06, 90B35, 90C06, 90C10, 90C27, 
90C39, 90C57, 90C59, 90C60, 90C90] 
(see: Domination analysis in combinatorial optimization; 
Traveling salesman problem) 
nearest neighbor (RNN) see: repeated — 
nearest point mapping 
[41A30, 47A99, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
nearest vertex insertion (NVI) 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
nearly degenerate BFS 
[90C60] 
(see: Complexity of degeneracy) 


necessary 
[90C22, 90C25, 90C31] 
(see: Semidefinite programming: optimality conditions and 
stability) 
necessary condition see: first order —; second order — 
necessary conditions 
[03H10, 49J27, 90C31, 90C34] 
(see: Semi-infinite programming and control problems; 
Semi-infinite programming: second order optimality 
conditions) 
necessary conditions 
[90C30] 
(see: Image space approach to optimization) 
necessary conditions see: first order —; second order — 
necessary conditions for optimality see: high-order — 
necessary conditions for optimality for abnormal points see: 
High-order — 
necessary constraint 
[90C05, 90C20] 
(see: Redundancy in nonlinear programs)
necessary constraint see: weakly — 
necessary optimality condition 
[90C26, 90C31, 91A65] 
(see: Bilevel programming: implicit function approach)
necessary optimality condition without using (sub)gradients 
parametric representations 
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 
necessary optimality conditions 
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 
necessary optimality conditions 
[90C26, 90C31, 90C39, 91A65] 
(see: Bilevel programming: implicit function approach; 
Second order optimality conditions for nonlinear 
optimization) 
necessary optimality conditions see: Equality-constrained 
nonlinear programming: KKT —; first order —; Fritz John —;
generalized —; KKT —; Kuhn-Tucker — 
necessary and sufficient conditions 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 
necessary and sufficient conditions 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 
necessary and sufficient optimality conditions 
[49K05, 49K10, 49K15, 49K20, 65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization; 
Duality in optimal control with first order differential 
equations) 
necessary and sufficient optimality conditions see: second 
order — 
needs see: static/dynamic service — 
negamax algorithm see: incremental — 
negans see: modus — 
negated relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
negation 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras)
negation transformation 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
negative 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations)
negative curvature 
[49M37] 
(see: Nonlinear least squares: trust region methods)
negative curvature see: direction of — 
negative cycles 
[90C35]
(see: Minimum cost flow problem) 
negative fault 
[90Cxx] 
(see: Discontinuous optimization) 
negative fitness see: genetic engineering via — 
negative gradient see: projected — 
negative main diagonal 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
negative marginal values 
[90C60] 
(see: Complexity of degeneracy) 
negative real numbers see: infinitely small — 
negative-zero pattern see: positive- — 
negatively see: dropped — 
neighbor see: k- —; legal —; nearest- — 
neighbor joining 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 
neighbor load balancing scheme see: near- — 
neighbor (NN) see: nearest — 
neighbor (RNN) see: repeated nearest — 
neighbor in tabu search see: allowed —; prohibited — 
neighborhood 
[05C15, 05C17, 05C35, 05C69, 65K05, 68T20, 68T99, 90C22, 
90C26, 90C27, 90C30, 90C35, 90C59] 
(see: Global optimization: filled function methods; Lovasz 
number; Metaheuristics) 
neighborhood see: 2-opt —; discrete —; exchange —; k- —; 
k-exchange —; Lin-Kernighan —; pair-exchange —
neighborhood descent see: variable — 
neighborhood edge elimination ordering see: cobipartite — 
neighborhood graphs see: empty — 
neighborhood of a permutation 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
neighborhood search methods see: Variable — 


neighborhood of a solution 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
neighborhood structure 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 
neighborhood structure for the QAP see: K-L type — 
neighborhoods see: large-scale — 
neighboring bases 
[90C05, 90C31] 
(see: Parametric linear programming: cost simplex 
algorithm) 
neighboring critical regions 
[90C05, 90C31] 
(see: Parametric linear programming: cost simplex 
algorithm) 
neighboring stations see: one-hop — 
neighbors 
[90C05, 90C31]
(see: Multiparametric linear programming) 
neighbors see: two-hop — 
neighbors of the origin 
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods) 
Nelder-Mead algorithm 
[90C26, 90C90]
(see: Global optimization in binary star astronomy) 
nested 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods) 
nested Benders decomposition 
[90C15, 90C90]
(see: Decomposition algorithms for the solution of 
multistage mean-variance optimization problems) 
nested constraints 
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
nested dissection 
[65Fxx]
(see: Least squares problems) 
nested family see: finite — 
nested loops 
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem) 
nested partitions 
[90C11, 90C59]
(see: Nested partitions optimization) 
Nested partitions optimization 
(90C59, 90C11) 
nested STO problem 
[90C15] 
(see: Stochastic quasigradient methods in minimax 
problems) 
net demand 
[90B10, 90C26, 90C30, 90C35] 
(see: Nonconvex network flow problems) 
net present value see: maximize — 
net supply
[90B10, 90C26, 90C30, 90C35]
(see: Nonconvex network flow problems)

network 
[05C05, 05C40, 68R10, 90C35] 
(see: Generalized networks; Network design problems) 

network 
[05C05, 05C40, 68R10, 68W10, 90B06, 90B10, 90B15, 90C05,
90C06, 90C30, 90C35] 
(see: Frank-Wolfe algorithm; Maximum flow problem; 
Minimum cost flow problem; Network design problems; 
Nonoriented multicommodity flow problems; Stochastic 
network problems: massively parallel solution; Vehicle 
routing) 

network see: 1-median problem in a —; augmented —; 
bipartite —; capacity of an arc in a —; communication —; 
congested —; cost of an arc in a —; covering problem on 
a —; deterministic neural —; directed —; directed arc in
a —; directed arc in a directed —; directed capacitated —; 
endpoint of an arc in a directed —; evolutionary —; 
feed-forward neural —; generalized —; heat and mass 
exchange —; incidence in a —; intermediate scale —; large 
region —; local-area computer —; macro scale —; mass 
exchanger —; mass and heat exchanger —; micro scale —; 
multicommodity —; multiphase spanning —; multiphase 
Steiner —; node in a —; node in a directed —; p-center 
problem on a —; recurrent neural —; regional —; 
residual —; routing of traffic in transmission —; 
Space-time —; star —; state-task- —; stochastic neural —; 
strongly connected —; survivable —; system-optimized 
transportation —; time replicated —; training a —; 
transformed —; two-layer feed-forward —; user-optimized 
transportation —; weekly space-time — 

network arc
[90C35]
(see: Generalized networks)

network assignment problem see: Communication — 

network connectivity
[90C35]
(see: Maximum flow problem)

network constraints
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource
allocation problems)

network constraints see: optimization under — 

network cost see: minimizing — 

network design
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)

network design
[90-XX, 90B10, 90C26, 90C30, 90C35, 90C90, 91A65, 91B99]
(see: Bilevel programming: applications; Nonconvex
network flow problems; Survivable networks)

network design problem
[90-01, 90B30, 90B50, 90C15, 90C26, 90C33, 91B32, 91B52,
91B74]
(see: Bilevel programming in management; Stochastic
bilevel programs)

network design problem see: survivable — 

Network design problems 
(05C05, 05C40, 68R10, 90C35)
(referred to in: Auction algorithms; Communication 
network assignment problem; Dynamic traffic networks; 
Equilibrium networks; Generalized networks; Maximum 
flow problem; Minimum cost flow problem; 
Multicommodity flow problems; Network location: 
covering problems; Nonconvex network flow problems; 
Piecewise linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium) 
(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation 
networks; Generalized networks; Maximum flow problem; 
Minimum cost flow problem; Network location: covering 
problems; Nonconvex network flow problems; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium) 

network design and schedule construction
[90B06, 90C06, 90C08, 90C35, 90C90]
(see: Airline optimization)

network equilibrium
[90B06, 90B20, 90C30, 91B50]
(see: Equilibrium networks; Traffic network equilibrium)

network equilibrium see: fixed demand traffic —; multimodal 
traffic —; symmetric —; traffic — 

network equilibrium model see: migration —; multimodal 
traffic — 

network equilibrium with travel disutility functions see: 
traffic — 

network flow see: minimum cost —; value of a — 

network flow model 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

network flow models see: undirected multicommodity — 

network flow problem 
[90B10, 90C26, 90C30, 90C35] 
(see: Nonconvex network flow problems) 

network flow problem see: fixed charge —; linear —; minimum 
cost —; multicommodity —; nonconvex —; nonlinear —; 
nonlinear dynamic —; nonlinear single commodity —; 
piecewise linear minimum cost —; uncapacitated — 

network flow problems see: dynamic —; Nonconvex —; 
nonlinear —; Piecewise linear — 

network flows 
[01A99]
(see: History of optimization) 

network flows 
[05C05, 05C40, 68R10, 90C35] 
(see: Generalized networks; Network design problems; 
Railroad crew scheduling; Railroad locomotive scheduling) 

network flows see: multicommodity — 

network localization problem see: Sensor — 

network localization problem, SNLP see: Semidefinite 
programming and the sensor — 
Network location: covering problems
(90C35, 90B10, 90B80) 
(referred to in: Auction algorithms; Combinatorial 
optimization algorithms in resource allocation problems; 
Communication network assignment problem; Dynamic 
traffic networks; Equilibrium networks; Facilities layout 
problems; Facility location with externalities; Facility 
location problems with spatial interaction; Facility location 
with staircase costs; Generalized networks; Global 
optimization in Weber’s problem with attraction and 
repulsion; Maximum flow problem; Minimum cost flow 
problem; MINLP: application in facility location-allocation; 
Multicommodity flow problems; Multifacility and 
restricted location problems; Network design problems; 
Nonconvex network flow problems; Nonoriented 
multicommodity flow problems; Optimizing facility 
location with euclidean and rectilinear distances; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Stochastic 
transportation and location problems; Survivable networks; 
Traffic network equilibrium; Voronoi diagrams in facility 
location; Warehouse location problem) 
(refers to: Auction algorithms; Combinatorial optimization 
algorithms in resource allocation problems; 
Communication network assignment problem; 
Competitive facility location; Directed tree networks; 
Dynamic traffic networks; Equilibrium networks; 
Evacuation networks; Facility location with externalities; 
Facility location problems with spatial interaction; Facility 
location with staircase costs; Generalized networks; Global 
optimization in Weber’s problem with attraction and 
repulsion; Maximum flow problem; Minimum cost flow 
problem; MINLP: application in facility location-allocation; 
Multifacility and restricted location problems; Network 
design problems; Nonconvex network flow problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Piecewise linear network flow problems; 
Production-distribution system design problem; Resource 
allocation for epidemic control; Shortest path tree 
algorithms; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Stochastic 
transportation and location problems; Survivable networks; 
Traffic network equilibrium; Voronoi diagrams in facility 
location; Warehouse location problem) 


network model see: dynamic traffic — 


network node 
[90C35] 
(see: Generalized networks) 


network node see: deficit of a —; excess of a — 


network optimization 
[90C30, 90C35] 
(see: Auction algorithms) 


network optimization 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 

network optimization system see: generalized — 

network problem see: generalized —; match- —; pure —; 
stochastic — 

network problems see: fixed demand traffic —; quadratic 
generalized — 

network problems: massively parallel solution see: 
Stochastic — 

network problems with travel demand functions see: elastic 
demand traffic — 

network programming
[90B10, 90C05, 90C06, 90C35]
(see: Nonoriented multicommodity flow problems)

network programming
[91B28]
(see: Financial optimization)

network simplex algorithm
[90C35]
(see: Minimum cost flow problem)

network structure of the spatial price equilibrium problem
[91B28, 91B50]
(see: Spatial price equilibrium)

network superstructure see: heat exchanger — 

network synthesis
[90C05]
(see: Continuous global optimization: applications)

network synthesis
[90C90]
(see: MINLP: heat exchanger network synthesis)

network synthesis see: heat exchanger —; MINLP: heat 
exchanger —; Mixed integer linear programming: heat 
exchanger — 

network synthesis without decomposition see: heat 
exchanger — 

network topology 
[65K05, 65Y05] 
(see: Parallel computing: models) 

network training see: Unconstrained optimization in neural — 

networks see: all-optical —; Bayesian —; chain —; chain rule 
for Bayesian —; Directed tree —; Dynamic traffic —; 
dynamical Bayesian —; Equilibrium —; Evacuation —; fixed 
charge —; Flexible mass exchange —; flows in —; 
generalized —; Global optimization of heat exchanger —; 
heat exchanger —; Integer linear programs for routing and 
protection problems in optical —; mesh —; MINLP: mass 
and heat exchanger —; Mixed integer linear programming: 
mass and heat exchanger —; multiclass queueing —; 
multimodal —; neural —; Optimization in ad hoc —; 
queuing —; regeneration —; ring —; Survivable —; 
topology of transportation — 

networks for combinatorial optimization see: Neural — 

networks under uncertainty see: Bilevel programming 
framework for enterprise-wide process — 

Neumann algebra see: von — 

Neumann architecture see: von — 

Neumann, John see: Von — 
neural network see: deterministic —; feed-forward —;
recurrent —; stochastic — 


neural network training see: Unconstrained optimization in — 


neural networks 
[65K05, 68T20, 68T99, 90-08, 90C05, 90C06, 90C09, 90C10, 
90C11, 90C20, 90C27, 90C30, 90C59, 90C90] 
(see: Disease diagnosis: optimization-based methods; 
Metaheuristics; Optimization in boolean classification 
problems) 


neural networks 
[65K05, 68T05, 90C26, 90C27, 90C30, 90C39, 90C52, 90C53, 
90C55] 
(see: Forecasting; Neural networks for combinatorial 
optimization; Neuro-dynamic programming; 
Unconstrained optimization in neural network training) 

Neural networks for combinatorial optimization 
(90C27, 90C30) 
(referred to in: Bayesian networks; Combinatorial matrix 
analysis; Combinatorial optimization algorithms in 
resource allocation problems; Combinatorial optimization 
games; Evolutionary algorithms in combinatorial 
optimization; Fractional combinatorial optimization; 
Multi-objective combinatorial optimization; 
Neuro-dynamic programming; Replicator dynamics in 
combinatorial optimization; Set covering, packing and 
partitioning problems; Unconstrained optimization in 
neural network training) 
(refers to: Neuro-dynamic programming; Replicator 
dynamics in combinatorial optimization; Unconstrained 
optimization in neural network training) 

Neuro-dynamic programming 
(90C39) 
(referred to in: Dynamic programming: average cost per 
stage problems; Dynamic programming in clustering; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; Multiple 
objective dynamic programming; Neural networks for 
combinatorial optimization; Replicator dynamics in 
combinatorial optimization; Unconstrained optimization 
in neural network training) 
(refers to: Dynamic programming: average cost per stage 
problems; Dynamic programming in clustering; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming: discounted problems; Dynamic 
programming: infinite horizon problems, overview; 
Dynamic programming: inventory control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Dynamic programming: stochastic shortest 
path problems; Dynamic programming: undiscounted 
problems; Hamilton-Jacobi-Bellman equation; Multiple 
objective dynamic programming; Neural networks for 
combinatorial optimization; Replicator dynamics in
combinatorial optimization; Unconstrained optimization
in neural network training) 

neurons 
[90C27, 90C30] 
(see: Neural networks for combinatorial optimization) 

neurons see: input —; output — 

New hybrid conjugate gradient algorithms for unconstrained 
optimization 
(49M07, 49M10, 90C06, 65K) 

new paradigm see: Modeling languages in optimization: a — 

new trial steplength see: compute a safeguarded — 

the New York Times
[90C05]
(see: Ellipsoid method)

New York Times see: the — 

Newsam-Ramsdell method
[65K05, 90C30]
(see: Automatic differentiation: calculation of Newton
steps)

newsboy model
[90B50]
(see: Inventory management in supply chains)

newsboy problem
[90C06, 90C08, 90C15]
(see: Simple recourse problem; Stochastic quasigradient
methods in minimax problems)

newsboy problem 
[90B50, 90C06, 90C08, 90C15] 
(see: Inventory management in supply chains; Simple 
recourse problem; Stochastic quasigradient methods in 
minimax problems) 

Newton see: interval —; quasi- —; truncated — 

Newton algorithm see: interval — 

Newton-Cauchy framework 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

Newton-Cauchy framework see: Unconstrained nonlinear 
optimization: — 

Newton iteration see: interval — 

Newton’s method 
[49J52, 49M37, 65K05, 68Q25, 68R05, 90-08, 90C05, 90C20,
90C22, 90C25, 90C27, 90C30, 90C32, 90C51, 90Cxx] 
(see: Cost approximation algorithms; Fractional 
combinatorial optimization; Interior point methods for 
semidefinite programming; Nondifferentiable 
optimization: Newton method; Nonlinear least squares: 
Newton-type methods; Quadratic programming over an 
ellipsoid; Symmetric systems of linear equations; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 

Newton method 
[49J52, 49M29, 49M37, 65K10, 68Q25, 68R05, 90-08, 90C06, 
90C20, 90C25, 90C27, 90C30, 90C32] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control; Fractional combinatorial 
optimization; Nondifferentiable optimization: Newton 
method; Nonlinear least squares: Newton-type methods;
Quadratic programming over an ellipsoid) 

Newton's method see: approximate —; bundle- —;
damped —; damped Gauss- —; discrete truncated —;
full-step Gauss- —; Gauss- —; Gauss-Newton method:
Least squares, relation to —; homotopy —; inexact —;
interval —; Krawczyk variation of the interval —;
modified —; multivariate interval —; Nondifferentiable
optimization: —; nonsmooth —; partial-update —;
partitioned quasi- —; quasi- —; separated —;
smoothing —; splitting —; SR1 quasi- —; symmetric
rank-one quasi- —; truncated —; univariate interval —


Newton method of Broyden class see: quasi- — 

Newton method in deterministic global optimization see: LP 
strategy for interval- — 

Newton method: Least squares, relation to Newton’s method 
see: Gauss- —

Newton's method in unconstrained optimal control see: 
Dynamic programming and — 

Newton methods see: existence-proving properties of
interval —; factorized quasi- —; inexact —; interval —;
quasi- —
Newton operator see: interval —; univariate interval — 
Newton procedure
[49M20, 90-08, 90C25]
(see: Nondifferentiable optimization: cutting plane
methods)
Newton-Raphson method 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 
Newton relation see: quasi- — 
Newton search engine 
[90C15, 90C30, 90C99]
(see: SSC minimization algorithms) 
Newton search engine see: quasi- — 
Newton software package see: block truncated — 
Newton step
[37A35, 65K05, 90C05, 90C30]
(see: Automatic differentiation: calculation of Newton steps;
Potential reduction methods for linear programming)
Newton step
[65K05, 90C30]
(see: Automatic differentiation: calculation of Newton
steps)

Newton step case of the trust region problem
[49M37]
(see: Nonlinear least squares: trust region methods)
Newton steps see: Automatic differentiation: calculation of — 
Newton test
[65K05, 65Y05, 65Y10, 65Y20, 68W10]
(see: Interval analysis: parallel methods for global
optimization)
Newton-type method
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)
Newton-type methods see: Nonlinear least squares: — 
Newton update see: BFGS quasi- —;
Broyden-Fletcher-Goldfarb-Shanno quasi- —; quasi- —
Newton updates see: quasi- — 

Newton updating see: inverse quasi- — 


Newtonian descent 
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations) 
Newtonian descent direction see: quasi- — 
next shortest path procedure 
[90C35]
(see: Multicommodity flow problems) 
Nicholas Constantine see: Metropolis — 
NIMBY syndrome 
[90B80, 90B85]
(see: Warehouse location problem) 
NLP 
[65L99, 90C06, 90C10, 90C11, 90C30, 90C57, 90C90, 93-XX]
(see: Modeling difficult optimization problems; 
Optimization strategies for dynamic systems) 
NLP see: Sensitivity and stability in —; solution-point bounds 
for — 
NLP: approximation see: Sensitivity and stability in — 
NLP based branch and bound see: IP/ —; QP/ — 
NLP: continuity and differential stability see: Sensitivity and 
stability in — 
NLP solvers see: bottlenecks in — 
NLP subproblem 
[49M20, 90C11, 90C30] 
(see: Generalized outer approximation) 
NLP techniques 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems) 
NLPs see: Bounds and solution vector estimates for 
parametric —; Twice-differentiable — 
NLS 
[90C30]
(see: Nonlinear least squares problems) 
NM 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
NM see: damped —; smoothing — 
NN 
[90C27, 90C30]
(see: Neural networks for combinatorial optimization) 
(NN) see: nearest neighbor — 
NNFP 
[90B10, 90C26, 90C30, 90C35]
(see: Nonconvex network flow problems) 
NNFP see: global minimum of an — 
no capacity constraints see: single fixed cost with — 
no free lunch 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
no pivoting required 
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 
Nobel Prize 
[01A99] 
(see: Kantorovich, Leonid Vitalyevich) 
node see: arrival —; arrival-ground —; balanced —; border —;
cardinality of a —; center —; deficit of a network —;
demand —; departure —; disallowed —; established —;
excess of a network —; fathoming a —; feasible —;
infeasible —; last —; network —; root —; sink —; source —;
supply —; transshipment —; tree —
node-arc formulation 
[90C35]
(see: Multicommodity flow problems) 
node-arc formulation of the problem 
[90B10, 90C05, 90C06, 90C35]
(see: Nonoriented multicommodity flow problems) 
node-arc incidence matrix 
[90C35]
(see: Generalized networks) 
node-arc incidence matrix 
[90C30]
(see: Simplicial decomposition) 
node construction procedure see: best — 
node cover 
[90C35]
(see: Maximum flow problem) 
node covering problem 
[90C20, 90C60]
(see: Quadratic knapsack) 
node in a directed network 
[90C35]
(see: Maximum flow problem) 
node-disjoint path 
[90-XX]
(see: Survivable networks) 
node flow balance equations 
[90B10, 90C26, 90C30, 90C35]
(see: Nonconvex network flow problems) 
node legend 
(see: Railroad crew scheduling) 
node in a network 
[90C35]
(see: Minimum cost flow problem) 
node oriented branch and bound method 
[68T99, 90C27]
(see: Capacitated minimum spanning trees) 
node oriented construction procedure 
[68T99, 90C27]
(see: Capacitated minimum spanning trees) 
node-path formulation of the multicommodity flow problem 
[90B10, 90C05, 90C06, 90C35]
(see: Nonoriented multicommodity flow problems) 
node potentials 
[90C35] 
(see: Minimum cost flow problem) 
node reconstruction see: parent — 
node routing 
[90B06] 
(see: Vehicle routing) 
node tightening 
(see: Fractional zero-one programming) 
node of a truss 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
nodes see: on-the-river hydropower —; physical junction —;
plant —; plant/model —; PSA with dummy —; retailer —;
retailer/model —; return —; Steiner —
nodes set see: border —; tree — 
nodes with water storage capacity 
[90C30, 90C35] 
(see: Optimization in water resources) 
noises see: optimization with — 
noising method 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
noisy-AND 
(see: Bayesian networks) 
noisy functional dependence 
(see: Bayesian networks) 
noisy-OR 
(see: Bayesian networks)
nomography 
[01A60, 03B30, 54C70, 68Q17] 
(see: Hilbert’s thirteenth problem)
non-anchor 
(see: Semidefinite programming and the sensor network
localization problem, SNLP) 
non-anticipative 
[90C15, 90C90] 
(see: Decomposition algorithms for the solution of
multistage mean-variance optimization problems) 
non-crossing 
(see: Contact map overlap maximization problem, CMO)
Non-Differentiable Functions and Applications see: 
minimization Methods for — 
non-linear 
(see: Global optimization: functional forms) 
non-smooth optimization see: Derivative-free methods for — 
non standard methods see: unbounded controls and — 
nonaccepting computation of a Turing machine 
[90C60] 
(see: Complexity classes in optimization)
nonadaptive method 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based
optimization) 
nonanticipative principle 
[90C30, 90C35] 
(see: Optimization in water resources)
nonanticipative with respect to a filtration see: stochastic 
process — 
nonanticipative water resources policies 
[90C30, 90C35] 
(see: Optimization in water resources) 
nonanticipativity 
[90C15, 91B28] 
(see: Financial optimization; Multistage stochastic 
programming: barycentric approximation; Stochastic 
linear programs with recourse and arbitrary multivariate 
distributions; Stochastic programming: nonanticipativity 
and lagrange multipliers) 
nonanticipativity
[90C15]
(see: Stochastic programming: nonanticipativity and
lagrange multipliers)


nonanticipativity constraints
[68W10, 90B15, 90C06, 90C15, 90C30, 90C35]
(see: Optimization in water resources; Stochastic network
problems: massively parallel solution; Stochastic
programming: parallel factorization of structured matrices)


nonanticipativity constraints see: Lagrange multipliers for —
nonanticipativity and lagrange multipliers see: Stochastic
programming: —


nonanticipativity water resources policies
[90C30, 90C35]
(see: Optimization in water resources)


nonarbitrage condition for LDSU
[90C34, 91B28]
(see: Semi-infinite programming and applications in
finance)


nonassociative groupoid
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)


nonassociative products
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)


nonassociativity
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)


nonbasic
[90C05]
(see: Linear programming: Klee-Minty examples)


nonbasic column
[90C05, 90C33]
(see: Pivoting algorithms for linear programming
generating two paths)


nonbasic component
[90C30]
(see: Convex-simplex algorithm)


nonbasic matrix
[90C05, 90C33]
(see: Pivoting algorithms for linear programming
generating two paths)


nonbasic variable see: eligible — 
nonbasic variables
[49M37, 90C11]
(see: Mixed integer nonlinear programming)


nonbonded distance
[92B05]
(see: Genetic algorithms for protein structure prediction)


noncentral component
[68T99, 90C27]
(see: Capacitated minimum spanning trees)
noncommutative groupoid
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
noncompactness see: measure of — 

noncompensatory argument
[90-XX]
(see: Outranking methods)

nonconvex
[35B40, 37C70, 49J24, 90C10, 90C11, 90C25, 90C27, 90C30,
90C33]
(see: Continuous reformulations of discrete-continuous
optimization problems; Successive quadratic
programming: solution by active sets and interior point
methods; Turnpike theory: stability of optimal trajectories)
nonconvex
[90C25, 90C30]
(see: Successive quadratic programming: full space
methods; Successive quadratic programming: solution by
active sets and interior point methods)

nonconvex dual problem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)
nonconvex energy function
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in
mechanics)

Nonconvex energy functions: hemivariational inequalities 
(49J40, 70-XX, 80-XX, 49J52, 49Q10, 74K99, 74Pxx)
(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles)
(refers to: Generalized monotonicity: applications to
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 


thermoelasticity; Quasidifferentiable optimization: calculus 


of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 


methods; Variational inequalities; Variational inequalities: 


F. E. approach; Variational inequalities: geometric
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

nonconvex feasibility analysis see: Shape reconstruction 
methods for — 

nonconvex function 
[90B10, 90C26, 90C30, 90C35] 
(see: Nonconvex network flow problems) 

nonconvex minimization 
[90C26] 
(see: Convex envelopes in optimization problems) 

nonconvex minimization 
[90C26, 90C31] 
(see: Global optimization in multiplicative programming; 
Multiplicative programming) 

nonconvex minimization problems see: decomposition 
algorithms for — 

nonconvex MINLP 
[49M37, 90C11] 
(see: Mixed integer nonlinear programming) 

nonconvex network flow problem 
[90B10, 90C26, 90C30, 90C35] 
(see: Nonconvex network flow problems) 

Nonconvex network flow problems 
(90C26, 90C30, 90C35, 90B10) 
(referred to in: Auction algorithms; Communication 
network assignment problem; Dynamic traffic networks; 
Equilibrium networks; Generalized networks; Global 
supply chain models; Inventory management in supply 
chains; Maximum flow problem; Minimum cost flow 
problem; Multicommodity flow problems; Network design 
problems; Network location: covering problems; 
Nonoriented multicommodity flow problems; Operations 
research models for supply chain management and design; 
Piecewise linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 


problems: massively parallel solution; Survivable networks; 


Traffic network equilibrium) 

(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation 


networks; Generalized networks; Global supply chain 
models; Inventory management in supply chains; 
Maximum flow problem; Minimum cost flow problem; 
Multicommodity flow problems; Network design problems; 
Network location: covering problems; Nonoriented 
multicommodity flow problems; Operations research 
models for supply chain management and design; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Traffic network equilibrium) 


Nonconvex-nonsmooth calculus of variations 


(49J40)

(referred to in: Composite nonsmooth optimization; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Composite nonsmooth optimization; Generalized 
monotonicity: applications to variational inequalities and 
equilibrium problems; Hemivariational inequalities: 
applications in mechanics; Hemivariational inequalities: 
eigenvalue problems; Hemivariational inequalities: static 
problems; Nonconvex energy functions: hemivariational 
inequalities; Nonsmooth and smoothing methods for 
nonlinear complementarity problems and variational 
inequalities; Quasidifferentiable optimization; 
Quasidifferentiable optimization: algorithms for 
hypodifferentiable functions; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

nonconvex optimization 
[93D09] 
(see: Robust control) 

nonconvex optimization 
[26B25, 26E25, 49-XX, 49J40, 49J52, 49M37, 65K05, 65K99, 
70-08, 90-XX, 90C25, 90C26, 90C30, 90C99, 91A10, 93-XX] 
(see: Bilevel programming; Convex envelopes in 
optimization problems; Duality theory: biduality in 
nonconvex optimization; Quasidifferentiable optimization; 
Quasidifferentiable optimization: codifferentiable 
functions; Smooth nonlinear nonconvex optimization; 
Solving hemivariational inequalities by nonsmooth 
optimization methods) 

nonconvex optimization see: Duality gaps in —; Duality theory: 
biduality in —; nonsmooth —; Smooth nonlinear — 

nonconvex optimization problem 
[90C26] 
(see: Global optimization in batch design under 
uncertainty; Smooth nonlinear nonconvex optimization) 

nonconvex primal problem 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 

nonconvex problem see: mixed integer — 

nonconvex program see: nondifferentiable — 

nonconvex programming
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

nonconvex programming
[90B06, 90B10, 90C26, 90C30, 90C35]
(see: Minimum concave transportation problems;
Nonconvex network flow problems; Reverse convex
optimization)

nonconvex programming problem
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)

nonconvex programming problems see: convex and —

nonconvex programs
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)

nonconvex programs
[90C09, 90C10, 90C11]
(see: Disjunctive programming)

nonconvex quadratic programming 

[90C60]
(see: Complexity theory; Complexity theory: quadratic 
programming) 

nonconvex set 
[90C29] 
(see: Multi-objective optimization: Pareto optimal
solutions, properties) 


nonconvex SQP 

[90C25, 90C30] 

(see: Successive quadratic programming: full space 

methods) 
nonconvex superpotential 

[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]

(see: Nonconvex energy functions: hemivariational 

inequalities) 
nonconvexity 

[49M37, 65K10, 90C26, 90C30] 

(see: eBB algorithm) 
nonconvexity 

[49-XX, 90-XX, 93-XX] 

(see: Duality theory: triduality in global optimization) 
nonconvexity see: low-rank — 
nonconvexity test 

[49M37, 90C11] 

(see: Mixed integer nonlinear programming) 
nonconvexity test see: monotonicity and — 
noncooperative behavior 

[91B06, 91B60] 

(see: Oligopolistic market equilibrium) 
noncooperative equilibrium
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
Noncooperative equilibrium
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
noncooperative game
[91B06, 91B60]
(see: Oligopolistic market equilibrium)
noncooperative games
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems)

noncooperative solution
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
noncycling
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)

nondecreasing function
[41A30, 62J02, 90C26]
(see: Regression by special functions: algorithms and
complexity)

nondecreasing monotone Boolean function
[90C09]
(see: Inference of monotone Boolean functions)
nondegeneracy
[90C60]
(see: Complexity of degeneracy)

nondegeneracy
[90C22, 90C25, 90C31, 90C60]
(see: Complexity of degeneracy; Semidefinite programming:
optimality conditions and stability)
nondegeneracy assumption
[90C33]
(see: Lemke method)
nondegeneracy assumption for algorithm analysis
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
nondegeneracy condition 
[90C33] 
(see: Linear complementarity problem) 
nondegeneracy condition see: linear —; quadratic — 
nondegenerate 
[41A10, 46N10, 47N10, 49K27, 57R12, 65K05, 90C20, 90C31, 
90C34, 90C46] 
(see: Generalized semi-infinite programming: optimality 
conditions; High-order necessary conditions for optimality 
for abnormal points; Quadratic programming with bound 
constraints; Smoothing methods for semi-infinite 
optimization) 
nondegenerate 
[90C05] 
(see: Linear programming) 
nondegenerate BFS 
[90C60] 
(see: Complexity of degeneracy) 
nondegenerate critical point 
[49J52, 49Q10, 58E05, 65K05, 65K10, 74G60, 74H99, 74K99, 
74Pxx, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 90C33, 
90C34, 90C90] 
(see: Parametric optimization: embeddings, path following 
and singularities; Quasidifferentiable optimization: 
stability of dynamic systems; Topology of global 
optimization) 
nondegenerate critical points 
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)
nondegenerate cycling
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules)
nondegenerate pivot operation
[90C35]
(see: Minimum cost flow problem)
nondegenerate point
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and
stability)
nondegenerate problems
[05B35, 65K05, 90C05, 90C20, 90C30, 90C33]
(see: Criss-cross pivoting rules; Least-index anticycling 
rules; Lexicographic pivoting rules; Rosen’s method, global 
convergence, and Powell’s conjecture) 


nondegenerate solution 
[90C60]
(see: Complexity of degeneracy)
nondegenerate systems
[90C60]
(see: Complexity of degeneracy)
nondeterministic
[90C60]
(see: Complexity classes in optimization)
nondeterministic polynomial algorithm 
[90C60] 
(see: Computational complexity theory) 

nondeterministic polynomial time algorithm 
[90C60] 
(see: Computational complexity theory) 

nondeterministic Turing machine 
[90C60] 
(see: Complexity theory) 

nondeterministic Turing machine see: space complexity of 
a—; time complexity of a — 

nondifferentiability
[49-04, 65Y05, 68N20]
(see: Automatic differentiation: parallel computation)

nondifferentiability
[65G20, 65G30, 65G40, 65H20]
(see: Interval analysis: nondifferentiable problems)

nondifferentiable convex optimization
[90C06, 90C25, 90C35]
(see: Simplicial decomposition algorithms)

nondifferentiable convex program
[90C06, 90C25, 90C35]
(see: Simplicial decomposition algorithms)

nondifferentiable function
[93-XX]
(see: Dynamic programming: optimal control applications)

nondifferentiable nonconvex program
[90C15, 90C26, 90C33]
(see: Stochastic bilevel programs)

nondifferentiable objective functions
[65T40, 90C26, 90C30, 90C90]
(see: Global optimization methods for harmonic retrieval)

nondifferentiable objective functions
[90C30]
(see: Sequential simplex method)

Nondifferentiable optimization 

(90-00, 90C47, 46N10)

(referred to in: Dini and Hadamard derivatives in
optimization; Discontinuous optimization; Global 
optimization: envelope representation; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: minimax problems; Nondifferentiable 
optimization: Newton method; Nondifferentiable 
optimization: parametric programming; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods) 
(refers to: Dini and Hadamard derivatives in optimization; 
Discontinuous optimization; Generalized Benders
decomposition; Global optimization: envelope
representation; Integer programming: Lagrangian
relaxation; Nondifferentiable optimization: cutting plane 
methods; Nondifferentiable optimization: minimax 
problems; Nondifferentiable optimization: Newton 
method; Nondifferentiable optimization: parametric 
programming; Nondifferentiable optimization: relaxation 
methods; Nondifferentiable optimization: subgradient 
optimization methods; Quasidifferentiable optimization: 
exact penalty methods) 

nondifferentiable optimization 
[46N10, 49M20, 65K05, 90-00, 90-08, 90C25, 90C26, 90C30, 
90C31, 90C47, 90Cxx] 
(see: Bilevel programming: introduction, history and
overview; Cyclic coordinate method; Dini and Hadamard 
derivatives in optimization; Discontinuous optimization; 
Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: minimax problems) 


Nondifferentiable optimization: cutting plane methods 


(49M20, 90C25, 90-08) 

(referred to in: Dini and Hadamard derivatives in 
optimization; Global optimization: envelope 
representation; Nondifferentiable optimization; 
Nondifferentiable optimization: minimax problems; 
Nondifferentiable optimization: Newton method; 
Nondifferentiable optimization: parametric programming; 
Nondifferentiable optimization: relaxation methods; 
Nondifferentiable optimization: subgradient optimization 
methods) 

(refers to: Dini and Hadamard derivatives in optimization; 
Global optimization: envelope representation; Linear 
programming: Karmarkar projective algorithm;
Nondifferentiable optimization; Nondifferentiable 
optimization: minimax problems; Nondifferentiable 
optimization: Newton method; Nondifferentiable 
optimization: parametric programming; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods) 


Nondifferentiable optimization: minimax problems 


(90C30, 65K05) 

(referred to in: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Dini and 
Hadamard derivatives in optimization; Global 
optimization: envelope representation; Minimax: 
directional differentiability; Minimax theorems; 
Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: Newton method; Nondifferentiable 
optimization: parametric programming; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods; 
Stochastic programming: minimax approach; Stochastic 
quasigradient methods in minimax problems) 

(refers to: Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Dini and 
Hadamard derivatives in optimization; Global 
optimization: envelope representation; Minimax: 
directional differentiability; Minimax theorems; 
Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: Newton method; Nondifferentiable 
optimization: parametric programming; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods; 
Stochastic programming: minimax approach; Stochastic 
quasigradient methods in minimax problems) 


Nondifferentiable optimization: Newton method 


(49J52, 90C30) 

(referred to in: Automatic differentiation: calculation of 
Newton steps; Dini and Hadamard derivatives in 
optimization; Dynamic programming and Newton’s 


method in unconstrained optimal control; Global 
optimization: envelope representation; Interval Newton 
methods; Nondifferentiable optimization; 
Nondifferentiable optimization: cutting plane methods; 
Nondifferentiable optimization: minimax problems; 
Nondifferentiable optimization: parametric programming; 
Nondifferentiable optimization: relaxation methods; 
Nondifferentiable optimization: subgradient optimization 
methods; Nonlinear least squares: Newton-type methods; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 

(refers to: Automatic differentiation: calculation of Newton 
steps; Dini and Hadamard derivatives in optimization; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Global optimization: 
envelope representation; Interval Newton methods; 
Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: minimax problems; Nondifferentiable 
optimization: parametric programming; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 


Nondifferentiable optimization: parametric programming 


(90C05, 90C25, 90C29, 90C30, 90C31) 

(referred to in: Bilevel programming: optimality conditions 
and duality; Bounds and solution vector estimates for 
parametric NLPS; Dini and Hadamard derivatives in 
optimization; Global optimization: envelope 
representation; Model based control for drug delivery 
systems; Multiparametric linear programming; 
Multiparametric mixed integer linear programming; 
Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: minimax problems; Nondifferentiable 
optimization: Newton method; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods; 
Parametric global optimization: sensitivity; Parametric 
linear programming: cost simplex algorithm; Parametric 
mixed integer nonlinear optimization; Parametric 
optimization: embeddings, path following and singularities; 
Selfdual parametric method for linear programs) 

(refers to: Bilevel programming: introduction, history and 
overview; Bounds and solution vector estimates for 
parametric NLPS; Dini and Hadamard derivatives in 
optimization; First order constraint qualifications; Global 
optimization: envelope representation; Multiparametric 
linear programming; Multiparametric mixed integer linear 
programming; Nondifferentiable optimization; 
Nondifferentiable optimization: cutting plane methods; 
Nondifferentiable optimization: minimax problems; 
Nondifferentiable optimization: Newton method; 
Nondifferentiable optimization: relaxation methods; 
Nondifferentiable optimization: subgradient optimization 
methods; Parametric global optimization: sensitivity; 
Parametric linear programming: cost simplex algorithm; 
Parametric mixed integer nonlinear optimization; 
Parametric optimization: embeddings, path following and 
singularities; Second order constraint qualifications;
Selfdual parametric method for linear programs) 


Nondifferentiable optimization: relaxation methods 
(49J52, 90C30) 
(referred to in: Dini and Hadamard derivatives in 
optimization; Global optimization: envelope 
representation; Nondifferentiable optimization; 
Nondifferentiable optimization: cutting plane methods; 
Nondifferentiable optimization: minimax problems; 
Nondifferentiable optimization: Newton method; 
Nondifferentiable optimization: parametric programming; 
Nondifferentiable optimization: subgradient optimization 
methods) 
(refers to: Dini and Hadamard derivatives in optimization; 
Global optimization: envelope representation; 
Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods; Nondifferentiable 
optimization: minimax problems; Nondifferentiable 
optimization: Newton method; Nondifferentiable 
optimization: parametric programming; Nondifferentiable 
optimization: subgradient optimization methods) 


Nondifferentiable optimization: subgradient optimization 
methods 
(49J52, 90C30) 
(referred to in: Dini and Hadamard derivatives in 
optimization; Global optimization: envelope 
representation; Nondifferentiable optimization; 
Nondifferentiable optimization: cutting plane methods; 
Nondifferentiable optimization: minimax problems; 
Nondifferentiable optimization: Newton method; 
Nondifferentiable optimization: parametric programming; 
Nondifferentiable optimization: relaxation methods; 
Quadratic assignment problem) 
(refers to: Dini and Hadamard derivatives in optimization; 
Global optimization: envelope representation; Integer 
programming: Lagrangian relaxation; Nondifferentiable
optimization; Nondifferentiable optimization: cutting 
plane methods; Nondifferentiable optimization: minimax 
problems; Nondifferentiable optimization: Newton 
method; Nondifferentiable optimization: parametric 
programming; Nondifferentiable optimization: relaxation 
methods) 


nondifferentiable problems see: Interval analysis: — 


nondifferential optimization 
[01A99] 
(see: History of optimization) 
nondiscordance condition 
[90-XX] 
(see: Outranking methods) 
nondominance 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming) 


nondominated 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming; 
Multi-objective optimization: Interactive methods for
preference value functions; Multi-objective optimization:
Pareto optimal solutions, properties; Multiple objective
programming support) 


nondominated cuts 
[90C09, 90C10, 90C11]
(see: Disjunctive programming)
nondominated path
[90C31, 90C39]
(see: Multiple objective dynamic programming)
nondominated solution
[90C29]
(see: Multiple objective programming support)
nondominated solution
[90C29]
(see: Multi-objective optimization: Pareto optimal
solutions, properties)
nondominated solution see: unsupported —; weakly —
nondominated solution set
[90C11, 90C29, 90C90]
(see: Multi-objective optimization: interaction of design
and control)
nondominated valid inequality
[90C09, 90C10, 90C11]
(see: Disjunctive programming)
nondomination see: comparison of efficiency and —
noneconomic applications
[90C32]
(see: Fractional programming)
nonempty
[90C30]
(see: Frank-Wolfe algorithm)
nonempty interior
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
nonexpansive
[65K10, 65M60]
(see: Variational inequalities)
nonexpansive operator
[65K10, 65M60]
(see: Variational inequalities)
nonexpansive operator see: firmly —
nonextreme efficient
[90B30, 90B50, 90C05, 91B82]
(see: Data envelopment analysis)
nonfeasible decomposition method
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
nonfeasible gradient controller
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
nonGaussian signal processing
[90C26, 90C90]
(see: Signal processing with higher order statistics)
nonhomogeneous and nonisotropic body see: linear 
thermoelastic behavior of a generally — 
nonideal part 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in 
distillation systems) 
nonideal phase equilibrium equations see: ideal and — 
nonidentical machines 
[68Q99] 
(see: Branch and price: Integer programming with column 
generation) 
nonincreasing monotone Boolean function

[90C09]
(see: Inference of monotone Boolean functions)

noninferior
[90C29]
(see: Multi-objective optimization: Interactive methods for
preference value functions)

noninferior solution
[90C29]
(see: Multi-objective optimization: Pareto optimal
solutions, properties)

noninferior solution see: weakly — 

noninferior solution set 

[49M37, 90C11, 90C29, 90C90] 

(see: MINLP: applications in the interaction of design and 

control; Multi-objective optimization: interaction of design 

and control) 
noninteracting see: n — 
noninteractive method 

[90C11, 90C29] 

(see: Multi-objective mixed integer programming) 
noninteractive methods see: interactive versus — 
noninterference constraints 

[90C10] 

(see: Maximum constraint satisfaction: relaxations and 

upper bounds) 
noninterference constraints see: binary — 
nonisotropic body see: linear thermoelastic behavior of a 

generally nonhomogeneous and — 
nonlinear 

[03H10, 49J27, 90C34] 

(see: Semi-infinite programming and control problems) 
nonlinear 

[49M27, 90C11, 90C30] 

(see: MINLP: generalized cross decomposition) 
nonlinear assignment problems 

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem) 
nonlinear bilevel programming: deterministic global 

optimization see: Mixed integer — 
nonlinear blending 

[90C30, 90C90] 

(see: MINLP: applications in blending and pooling 

problems) 
nonlinear boundary conditions see: elastostatics with — 
nonlinear CG method 
[90C30] 
(see: Conjugate-gradient methods)
nonlinear CG-related algorithms 
[90C30] 

(see: Conjugate-gradient methods) 


nonlinear complementarity 
[49M37, 90C26, 91A10] 
(see: Bilevel programming) 
nonlinear complementarity problem 
[65F10, 65F50, 65H10, 65K10, 65M60, 90C30, 90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem; Generalized nonlinear 
complementarity problem; Globally convergent homotopy 
methods; Implicit Lagrangian; Linear complementarity
problem; Nonsmooth and smoothing methods for 
nonlinear complementarity problems and variational 
inequalities; Topological methods in complementarity 
theory; Variational inequalities) 
nonlinear complementarity problem 
[90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem; Generalized nonlinear 
complementarity problem) 
nonlinear complementarity problem see: generalizations of 
the —; Generalized —; parametric — 
nonlinear complementarity problem and fixed point problem 
see: Equivalence between — 
nonlinear complementarity problems 
[90C31, 90C33] 
(see: Sensitivity analysis of complementarity problems) 
nonlinear complementarity problems and variational 
inequalities see: Nonsmooth and smoothing methods for — 
nonlinear complementary problem 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
nonlinear constraint 
[90C90]
(see: Design optimization in computational fluid dynamics)
nonlinear cut
[90C26]
(see: Cutting plane methods for global optimization) 
nonlinear Dantzig-Wolfe decomposition 
[90C06, 90C25, 90C35]
(see: Simplicial decomposition algorithms)
nonlinear decision models
[90C05]
(see: Continuous global optimization: applications; 
Continuous global optimization: models, algorithms and 
software; Global optimization in the analysis and 
management of environmental systems) 
nonlinear, decreasing 
[90B15] 
(see: Evacuation networks) 
nonlinear diffusion equation 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
nonlinear discretized SIP problem 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods) 
nonlinear dynamic network flow problem
[90C30] 
(see: Simplicial decomposition) 

nonlinear dynamics see: Robust design of dynamic systems by 
constructive — 

nonlinear equation see: one-dimensional — 

nonlinear equations 
[65G20, 65G30, 65G40] 
(see: Interval analysis: systems of nonlinear equations) 

nonlinear equations 
[65F10, 65F50, 65G20, 65G30, 65G40, 65H10, 65J15, 65K10] 
(see: Contraction-mapping; Globally convergent homotopy 
methods; Interval analysis: systems of nonlinear equations) 

nonlinear equations see: Global optimization methods for 
systems of —; Interval analysis: systems of —; 
overdetermined system of —; systems of —; 
underdetermined system of —; well-determined system 
of — 

nonlinear feasibility problem 
[49M20, 90C11, 90C30] 
(see: Generalized outer approximation) 

nonlinear integer programming problem see: mixed — 

nonlinear least squares 
[49M37, 90C30] 
(see: Generalized total least squares; Nonlinear least 
squares: trust region methods) 

nonlinear least squares 
[90C30] 
(see: Nonlinear least squares problems) 

nonlinear least squares see: generalized — 

Nonlinear least squares: Newton-type methods 
(49M37) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; ABS algorithms for optimization; 
Conjugate-gradient methods; Contraction-mapping; 
Gauss-Newton method: Least squares, relation to Newton’s 
method; Generalized total least squares; Global 
optimization methods for systems of nonlinear equations; 
Gröbner bases for polynomial equations; Interval analysis:
systems of nonlinear equations; Large scale trust region 
problems; Least squares orthogonal polynomials; Least 
squares problems; Local attractors for gradient-related 
descent iterations; Nonlinear least squares problems; 
Nonlinear least squares: trust region methods; Nonlinear 
systems of equations: application to the enclosure of all 
azeotropes; Optimization-based visualization) 
(refers to: ABS algorithms for linear equations and linear 
least squares; ABS algorithms for optimization; Automatic 
differentiation: calculation of Newton steps; 
Conjugate-gradient methods; Contraction-mapping; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Gauss-Newton method: 
Least squares, relation to Newton’s method; Generalized 
total least squares; Global optimization methods for 
systems of nonlinear equations; Interval analysis: systems 
of nonlinear equations; Interval Newton methods; Large 
scale trust region problems; Least squares orthogonal 
polynomials; Least squares problems; Local attractors for 
gradient-related descent iterations; Nondifferentiable 
optimization: Newton method; Nonlinear least squares 
problems; Nonlinear least squares: trust region methods; 


Nonlinear systems of equations: application to the 
enclosure of all azeotropes; Unconstrained nonlinear 
optimization: Newton-Cauchy framework) 

nonlinear least squares problem see: generalized —; 
unconstrained — 

Nonlinear least squares problems 
(90C30) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; ABS algorithms for optimization; 
Gauss-Newton method: Least squares, relation to Newton’s
method; Generalized total least squares; Least squares 
orthogonal polynomials; Least squares problems; Nonlinear 
least squares: Newton-type methods; Nonlinear least 
squares: trust region methods) 
(refers to: ABS algorithms for linear equations and linear 
least squares; ABS algorithms for optimization; 
Gauss-Newton method: Least squares, relation to Newton’s 
method; Generalized total least squares; Least squares 
orthogonal polynomials; Least squares problems; Nonlinear 
least squares: Newton-type methods; Nonlinear least 
squares: trust region methods) 

Nonlinear least squares: trust region methods 
(49M37) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; ABS algorithms for optimization; 
Conjugate-gradient methods; Gauss-Newton method: Least 
squares, relation to Newton’s method; Generalized total 
least squares; Interval linear systems; Large scale trust 
region problems; Large scale unconstrained optimization; 
Least squares orthogonal polynomials; Least squares 
problems; Local attractors for gradient-related descent 
iterations; Nonlinear least squares: Newton-type methods; 
Nonlinear least squares problems; Overdetermined systems 
of linear equations) 
(refers to: ABS algorithms for linear equations and linear 
least squares; ABS algorithms for optimization; 
Conjugate-gradient methods; Gauss-Newton method: Least 
squares, relation to Newton’s method; Generalized total 
least squares; Large scale trust region problems; Least 
squares orthogonal polynomials; Least squares problems; 
Local attractors for gradient-related descent iterations; 
Nonlinear least squares: Newton-type methods; Nonlinear 
least squares problems) 

nonlinear material laws see: discretized hemivariational 
inequalities for — 

nonlinear mathematical programming problem 
[49J20, 49J52]
(see: Shape optimization) 

nonlinear mixed integer programming problem see: large 
scale — 

nonlinear multicommodity flow problems
[90C30]
(see: Simplicial decomposition)

nonlinear multicommodity flow problems see: large — 

nonlinear network flow problem
[90C30, 90C52, 90C53, 90C55]
(see: Asynchronous distributed optimization algorithms)

nonlinear network flow problems
[90C30]
(see: Convex-simplex algorithm)

nonlinear nonconvex optimization see: Smooth —
nonlinear optics
[34A55, 78A60, 90C30] 
(see: Optimal design in nonlinear optics) 

nonlinear optics see: Optimal design in — 

nonlinear optimization 
[90B50, 90C26] 
(see: Optimization and decision support systems; Smooth 
nonlinear nonconvex optimization) 

nonlinear optimization 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26, 91B28] 
(see: Financial optimization; Information-based complexity 
and information-based optimization) 

nonlinear optimization see: global —; 
Inequality-constrained —; mixed integer —; parametric —; 
Parametric mixed integer —; Second order optimality 
conditions for —; smooth — 

nonlinear optimization: A disjunctive cutting plane approach 
see: Mixed-integer — 

nonlinear optimization: Newton-Cauchy framework see: 
Unconstrained — 

nonlinear optimization problem 
[49K99, 65K05, 80A10, 90C26, 90C90] 
(see: Design optimization in computational fluid dynamics; 
Optimality criteria for multiphase chemical equilibrium; 
Smooth nonlinear nonconvex optimization) 

nonlinear optimization problem 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

nonlinear optimization problems 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

nonlinear order complementarity problem
[90C33]
(see: Order complementarity)

nonlinear parametric optimization
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric
programming)

nonlinear potential 

[90C90] 
(see: Design optimization in computational fluid 
dynamics) 

nonlinear problem see: fully —; geometrically —; physically — 

nonlinear problems see: Simultaneous estimation and 
optimization of — 

nonlinear program 
[90C05, 90C20, 90C26, 90C30] 
(see: Global optimization in batch design under 
uncertainty; Redundancy in nonlinear programs; Simplicial 
decomposition) 

nonlinear program 
[90C30]
(see: Simplicial decomposition) 

nonlinear program see: loss of descent in a —; mixed 
integer —; relaxed — 


nonlinear programming 
[90C06, 90C10, 90C11, 90C22, 90C25, 90C30, 90C31, 90C57, 
90C90] 
(see: Modeling difficult optimization problems; 
Semidefinite programming: optimality conditions and 
stability; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 

nonlinear programming 
[65K05, 65K10, 90B50, 90C06, 90C27, 90C30, 90C31, 90C34, 
91B28] 
(see: Convex-simplex algorithm; Feasible sequential 
quadratic programming; Frank-Wolfe algorithm; Neural 
networks for combinatorial optimization; Operations 
research and financial markets; Optimization and decision 
support systems; Portfolio selection: Markowitz
mean-variance model; Rosen’s method, global convergence, 
and Powell’s conjecture; Sensitivity and stability in NLP; 
Sensitivity and stability in NLP: continuity and differential 
stability; Simplicial decomposition) 

nonlinear programming see: feasible direction method for —; 
mixed integer —; sensitivity in — 

nonlinear programming algorithm see: descent in a — 

nonlinear programming: KKT necessary optimality conditions 
see: Equality-constrained — 

nonlinear programming problem 
[49K20, 49M99, 90C26, 90C55, 90C90] 
(see: Bilevel optimization: feasibility test and flexibility 
index; Design optimization in computational fluid 
dynamics; Sequential quadratic programming: interior 
point methods for distributed optimal control problems) 

nonlinear programming problem see: equality-constrained —; 
mixed integer — 

nonlinear programs see: Redundancy in — 

nonlinear semi-infinite programs
[90C34]
(see: Semi-infinite programming: approximation methods)

nonlinear signal processing
[90C26, 90C90]
(see: Signal processing with higher order statistics)

nonlinear simplicial
[90C06, 90C25, 90C35]
(see: Simplicial decomposition algorithms)

nonlinear single commodity network flow problem
[90C30]
(see: Simplicial decomposition)

nonlinear SIP
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)

nonlinear system of equations
[90C30]
(see: Nonlinear systems of equations: application to the
enclosure of all azeotropes)


nonlinear system of equations 
[65G20, 65G30, 65G40, 65H20, 65K99]
(see: Interval Newton methods) 
nonlinear systems of equations
[65G20, 65G30, 65G40, 65H20, 65K99] 
(see: Interval Newton methods) 

nonlinear systems of equations 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 

nonlinear systems of equations see: error bound for 
approximate solutions of —; existence of solutions of —; 
rigorous bound for solutions of —; uniqueness of solutions 
of — 

Nonlinear systems of equations: application to the enclosure 
of all azeotropes 
(90C30) 
(referred to in: Contraction-mapping; Global optimization 
methods for systems of nonlinear equations; Gröbner bases
for polynomial equations; Interval analysis: systems of 
nonlinear equations; Nonlinear least squares: Newton-type 
methods) 
(refers to: Contraction-mapping; Global optimization 
methods for systems of nonlinear equations; Interval 
analysis: systems of nonlinear equations; Nonlinear least 
squares: Newton-type methods) 

nonlinearly constrained optimization 
[90C25, 90C30] 
(see: Successive quadratic programming: full space 
methods) 

nonlocal sensitivity analysis 
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX] 
(see: Nonlocal sensitivity analysis with automatic 
differentiation) 

nonlocal sensitivity analysis see: automated Fortran program 
for — 

Nonlocal sensitivity analysis with automatic differentiation 
(34-XX, 49-XX, 65-XX, 68-XX, 90-XX) 
(referred to in: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators; 
Automatic differentiation: root problem and branch 
problem; Parametric global optimization: sensitivity; 
Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: approximation; Sensitivity and stability in NLP: 
continuity and differential stability) 
(refers to: Automatic differentiation: calculation of the 
Hessian; Automatic differentiation: calculation of Newton 
steps; Automatic differentiation: geometry of satellites and 
tracking stations; Automatic differentiation: introduction, 
history and rounding error estimation; Automatic 
differentiation: parallel computation; Automatic 
differentiation: point and interval; Automatic 
differentiation: point and interval taylor operators;
Automatic differentiation: root problem and branch
problem; Parametric global optimization: sensitivity; 
Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: approximation; Sensitivity and stability in NLP: 
continuity and differential stability) 

nonminimum phase 

[90C26, 90C90] 

(see: Signal processing with higher order statistics) 
nonmonotone Armijo-like criterion see: test — 
nonmonotone laws and hemivariational inequalities see:
multivalued —
nonmonotone line search 

[90C06] 

(see: Large scale unconstrained optimization) 
nonnegative interpolatory operator 

[90C34] 

(see: Semi-infinite programming: approximation methods) 
nonnegative lower bounds see: maximum flow problem
with —
nonnegative matrix see: doubly — 
nonoblivious local search 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem)
nonobtuse angle condition 
[47J20, 49J40, 65K10, 90C33] 
(see: Solution methods for multivalued variational
inequalities) 

Nonoriented multicommodity flow problems
(90B10, 90C05, 90C06, 90C35)
(referred to in: Maximum flow problem; Minimum cost flow
problem; Multicommodity flow problems; Nonconvex
network flow problems; Piecewise linear network flow
problems)
(refers to: Branch and price: Integer programming with
column generation; Maximum flow problem; Minimum
cost flow problem; Multicommodity flow problems;
Network location: covering problems; Nonconvex network
flow problems; Piecewise linear network flow problems)
nonparametric 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods)
nonparametric statistical method 
[62H30, 90C27] 
(see: Assignment methods in clustering)
nonparticipant 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
nonpreemptive 
[90B36] 
(see: Stochastic scheduling)
nonredundancy see: computational — 
nonredundancy rate see: average — 
nonredundant constraint 

[90C05, 90C20] 

(see: Redundancy in nonlinear programs) 
nonregular operator
[41A10, 47N10, 49K15, 49K27]
(see: High-order maximum principle for abnormal
extremals)

nonsaturating push
[90C35]
(see: Maximum flow problem)

nonseparable optimization problem
[93-XX]
(see: Direct search Luus-Jaakola optimization procedure)
nonseparable problem 

[93-XX] 

(see: Dynamic programming: optimal control applications) 
nonsingular
[90C30]
(see: Convex-simplex algorithm)

nonsingular local minimizer
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)
nonsingular matrix
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
nonsingular matrix see: sign- —; strongly — 
nonsmooth 

[49K27, 49K40, 90C30, 90C31] 

(see: Second order constraint qualifications) 
nonsmooth analysis
[26B25, 26E25, 49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx,
80-XX, 90C15, 90C26, 90C99]
(see: Global optimization: envelope representation;
Nonconvex energy functions: hemivariational inequalities;
Quasidifferentiable optimization; Stochastic quasigradient
methods: applications)
nonsmooth analysis
[26B25, 26E25, 46A20, 49J52, 52A01, 52A27, 65K05, 65K99,
90C30, 90C90, 90C99, 90Cxx]
(see: Composite nonsmooth optimization; Dini and
Hadamard derivatives in optimization; Quasidifferentiable
optimization; Quasidifferentiable optimization: calculus of
quasidifferentials; Quasidifferentiable optimization: Dini
derivatives, Clarke derivatives)
Nonsmooth analysis: Fréchet subdifferentials
(49K27, 90C48, 58C20, 58E30)
(referred to in: Nonsmooth analysis: weak stationarity)
(refers to: Nonsmooth analysis: weak stationarity)
nonsmooth analysis and optimization 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design)
Nonsmooth analysis: weak stationarity
(90C46, 90C48, 58C20, 58E30)
(referred to in: Nonsmooth analysis: Fréchet subdifferentials;
Smoothing methods for semi-infinite optimization)
(refers to: Nonsmooth analysis: Fréchet subdifferentials)
nonsmooth calculus of variations see: Nonconvex- — 
nonsmooth Dirichlet problem
[49J52]
(see: Hemivariational inequalities: eigenvalue problems)
nonsmooth eigenvalue problem
[49J52]
(see: Hemivariational inequalities: eigenvalue problems)


nonsmooth equations 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
nonsmooth function 
[26B25, 26E25, 49J52, 90C99]
(see: Quasidifferentiable optimization) 
nonsmooth functions 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations) 
nonsmooth local approximations 
[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90] 
(see: Quasidifferentiable optimization: stability of dynamic 
systems) 
nonsmooth mappings see: approximation of —; 
approximations of — 
nonsmooth mechanics 
[49J35, 49J40, 49J52, 49Q10, 49S05, 65K99, 70-XX, 74A55,
74G99, 74H99, 74K99, 74M10, 74M15, 74Pxx, 80-XX, 90C26, 
90C33] 
(see: Hemivariational inequalities: applications in 
mechanics; Nonconvex energy functions: hemivariational 
inequalities; Quasidifferentiable optimization: 
applications) 
nonsmooth mechanics 
[49J40, 49J52, 49M05, 49Q10, 49S05, 70-08, 74G60, 74G99,
74H99, 74K99, 74Pxx, 90C33, 90C90] 
(see: Hemivariational inequalities: applications in 
mechanics; Quasidifferentiable optimization: stability of 
dynamic systems; Quasidifferentiable optimization: 
variational formulations; Quasivariational inequalities) 
nonsmooth mechanics see: inequality or — 
nonsmooth methods 
[90C30, 90C33]
(see: Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities) 
nonsmooth modeling 
[49J35, 65K99, 74A55, 74M10, 74M15, 90C26]
(see: Quasidifferentiable optimization: applications) 
nonsmooth modeling 
[49J35, 49J52, 65K99, 74A55, 74M10, 74M15, 90C26, 90C90]
(see: Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: calculus of 
quasidifferentials) 
nonsmooth Newton method 
[90C30, 90C33] 
(see: Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities) 
nonsmooth nonconvex optimization 
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 
nonsmooth optimization 
[49J20, 49J52, 90C15, 90C26, 90C33] 
(see: Shape optimization; Stochastic bilevel programs) 
nonsmooth optimization 
[26B25, 26E25, 46A20, 46N10, 49J35, 49J40, 49J52, 49M05, 
49M20, 49S05, 52A01, 65G20, 65G30, 65G40, 65K05, 65K99,
70-08, 74A55, 74G99, 74H99, 74M10, 74M15, 74Pxx, 90-00, 
90-08, 90C15, 90C25, 90C26, 90C30, 90C47, 90C90, 90C99] 
(see: Composite nonsmooth optimization; Direct global 
optimization algorithm; Farkas lemma: generalizations; 
Interval global optimization; Nondifferentiable
optimization; Nondifferentiable optimization: cutting 
plane methods; Quasidifferentiable optimization; 
Quasidifferentiable optimization: algorithms for 
hypodifferentiable functions; Quasidifferentiable 
optimization: applications; Quasidifferentiable 
optimization: calculus of quasidifferentials; 
Quasidifferentiable optimization: codifferentiable 
functions; Quasidifferentiable optimization: variational 
formulations; Random search methods; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; SSC minimization algorithms for nonsmooth and 
stochastic optimization) 

nonsmooth optimization see: Composite — 

Nonsmooth optimization approach to clustering 
(90C26, 90C56, 90C90) 

nonsmooth optimization methods see: Solving 
hemivariational inequalities by — 

nonsmooth optimization problems 
[90C15] 

(see: Two-stage stochastic programming: quasigradient 
method) 

nonsmooth reformulation see: smoothing- — 

Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities 
(90C33, 90C30) 

(referred to in: Composite nonsmooth optimization; 
Nonconvex-nonsmooth calculus of variations; Smoothing 
methods for semi-infinite optimization; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational principles) 

(refers to: Composite nonsmooth optimization; 
Nonconvex-nonsmooth calculus of variations; Solving 
hemivariational inequalities by nonsmooth optimization 
methods) 

nonsmooth SSC-SABB algorithm 
[90C15, 90C30, 90C99] 

(see: SSC minimization algorithms for nonsmooth and 
stochastic optimization) 

nonsmooth and stochastic optimization see: SSC minimization 
algorithms for — 

nonsmooth superpotential 
[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90] 

(see: Quasidifferentiable optimization: stability of dynamic 
systems) 

nonstandard analysis 

[03H10, 49J27, 90C34]

(see: Semi-infinite programming and control problems) 

nonstandard framework 

[03H10, 49J27, 90C34]

(see: Semi-infinite programming and control problems) 

nonstochastic uncertainty
[90C34, 91B28]
(see: Semi-infinite programming and applications in
finance)

nonstoichiometric form of KT conditions
[49K99, 65K05, 80A10]
(see: Optimality criteria for multiphase chemical
equilibrium)

nonsupported efficient solution
[90C10, 90C29]
(see: Multi-objective integer linear programming)

nonsupported efficient solutions
[90C10, 90C35]
(see: Bi-objective assignment problem)

nontriviality condition
[41A10, 47N10, 49K15, 49K27]
(see: High-order maximum principle for abnormal
extremals)

nonuniform
[49M37, 65K10, 90C26, 90C30]
(see: αBB algorithm)

nonunit capacity see: problem with — 

nonunit weight CMST 

[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 

nonzero pattern see: zero- — 

nonzero residual problem 
[90C30] 
(see: Nonlinear least squares problems) 

nonzero-sum infinite horizon game 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 

norm see: A-weighted Euclidean —; approximation in the 
uniform —; L,- —; normalized —; t- —; weighted 
maximum —; weighted sup —

norm contraction see: weighted sup- —

norm controllability see: minimum — 

norm-dependent property 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations) 

norm solution see: minimum — 

normal 
[65K05, 90C26, 90C30, 90C31] 
(see: Minimax: directional differentiability; Robust global 
optimization) 

normal compactness see: partial sequential —; sequential — 

normal cone 
[05A, 15A, 49J40, 49J52, 49Q10, 51M, 52A, 52B, 52C, 62H, 
65K05, 65K10, 65M60, 68Q, 68R, 68U, 68W, 70-XX, 74K99, 
74Pxx, 80-XX, 90B, 90C, 90C30, 90C31, 90C33, 90Cxx] 
(see: Convex discrete optimization; Nonconvex energy 
functions: hemivariational inequalities; Quasidifferentiable 
optimization: exact penalty methods; Sensitivity analysis of 
complementarity problems; Sensitivity analysis of 
variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities: geometric interpretation, 
existence and uniqueness) 

normal cone see: fréchet —; limiting — 

normal distribution see: law of —; multivariate — 

normal equation 

[65Fxx, 90C30] 
(see: Generalized total least squares; Least squares
problems) 
normal equations 
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 
normal extremal 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
normal form 
[12D10, 12Y05, 13P10] 
(see: Gröbner bases for polynomial equations)
normal form 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
normal form see: Boolean formula in conjunctive —; 
canonical —; complete many-valued logic —; 
conjunctive —; disjunctive —; game in —; many-valued —; 
Pi- —
normal form of a polynomial 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
normal forms see: minimization of Pinkava —; Pi-algebras and
2-valued — 
normal forms of Pi-algebras see: functionally complete — 
normal hull 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 
normal hull see: reverse — 
normal map 
[90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem) 
normal primal problem see: J- —; N- — 
normal set see: reverse — 
normalization 
[62H30, 68T10, 90C05] 
(see: Linear programming models for classification)
normalization of measures 
[90B85] 
(see: Single facility location: multi-objective euclidean
distance location) 
normalization property 
[90C33] 
(see: Topological methods in complementarity theory)
normalized norm 
[26A24, 65D25] 
(see: Automatic differentiation: introduction, history and 
rounding error estimation) 
normalized stress 
[65K05, 90C27, 90C30, 90C57, 91C15] 
(see: Optimization-based visualization) 
normalized structure factors 
[90C26] 
(see: Phase problem in X-ray crystallography: Shake and
bake approach) 


normalized volume 
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods) 
normative perspective 
[90C29] 
(see: Preference modeling) 
normed linear spaces see: Best approximation in ordered — 
norms see: t- — 
normwise relative condition number 
[65Fxx] 
(see: Least squares problems) 
North-West corner rule 
[90C35] 
(see: Multi-index transportation problems) 
not dominated 
[90C27, 90C29] 
(see: Multi-objective combinatorial optimization) 
notation see: Landau —; relational matrix — 
notation for constraints 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
notation for objective functions 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
notation for relational operations see: matrix — 
novel decomposition-based clustering approach: global 
optimum search with enhanced positioning see: Gene 
clustering: A — 
novo protein design using flexible templates see: De — 
novo protein design using rigid templates see: De —
NP 
[90C60] 
(see: Complexity classes in optimization) 
NP-complete
[49M37, 68Q25, 90C11, 90C60] 
(see: Complexity theory; Mixed integer nonlinear 
programming; NP-complete problems and proof 
methodology) 
NP-complete 
[90C60] 
(see: Complexity of degeneracy; Complexity theory; 
Complexity theory: quadratic programming) 
NP-complete completeness see: strong — 
NP-complete problem
[68Q25, 90C60] 
(see: Complexity theory; Computational complexity theory; 
NP-complete problems and proof methodology) 
NP-complete problem 
[68Q25, 90C60] 
(see: Complexity of degeneracy; Computational complexity 
theory; NP-complete problems and proof methodology) 
NP-complete problems and proof methodology 
(90C60, 68Q25) 
(referred to in: Complexity classes in optimization; 
Complexity of degeneracy; Complexity of gradients, 
Jacobians, and Hessians; Complexity theory; Complexity 
theory: quadratic programming; Fractional combinatorial 
optimization; Information-based complexity and 
information-based optimization; Integer programming: 
cutting plane algorithms; Maximum cut problem,
MAX-CUT) 
(refers to: Complexity classes in optimization; Complexity of 
degeneracy; Complexity of gradients, Jacobians, and 
Hessians; Complexity theory; Complexity theory: quadratic 
programming; Computational complexity theory; 
Fractional combinatorial optimization; Information-based 
complexity and information-based optimization; 
Kolmogorov complexity; Mixed integer nonlinear 
programming; Parallel computing: complexity classes) 

NP-complete reductions see: ordinary — 

NP-completeness
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)

NP-completeness see: ordinary —; strong — 

NP-completeness proofs
[68Q25, 90C60]
(see: NP-complete problems and proof methodology)

NP-hard 

[49M37, 68T99, 90-08, 90C05, 90C06, 90C08, 90C10, 90C11,
90C26, 90C27, 90C59, 90C60, 93D09] 
(see: Capacitated minimum spanning trees; Complexity 
theory; Integer programming: branch and bound methods; 
Mixed integer nonlinear programming; Price of robustness 
for linear optimization problems; Robust control; Variable 
neighborhood search methods) 

NP-hard 
[90B80, 90B85, 90C60] 
(see: Complexity theory; Complexity theory: quadratic 
programming; Facilities layout problems; Multifacility and 
restricted location problems) 

NP-hard see: strongly — 

NP-hard problem 
[68Q25, 90C60] 
(see: Computational complexity theory; NP-complete 
problems and proof methodology) 

NP method see: pure — 

NP methods see: hybrid —; knowledge-based — 

NPC 
[90C60] 
(see: Computational complexity theory) 

NPH 
[90C60] 
(see: Computational complexity theory) 

NRTL equation 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction 
equilibrium) 

NSD
[65K10, 90C06, 90C25, 90C33, 90C35, 90C51] 
(see: Generalizations of interior point methods for the 
linear complementarity problem; Simplicial decomposition 
algorithms) 

NSM 
[90C30, 90C33] 
(see: Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities) 


ν-approximate gradient
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)

nullspace see: numerical —

null space
[49-XX, 90-XX, 90C20, 90C30, 90Cxx, 93-XX]
(see: Discontinuous optimization; Duality theory: biduality
in nonconvex optimization; Successive quadratic
programming: decomposition methods)

null space
[90C20, 90C30]
(see: Successive quadratic programming: decomposition
methods)

null space decomposition see: range and — 

null step
[47J20, 49J40, 49J52, 65K05, 65K10, 90C30, 90C33]
(see: Solution methods for multivalued variational
inequalities; Solving hemivariational inequalities by
nonsmooth optimization methods)
number see: blackball —; chromatic —; clique —; clique
partition —; condition —; conspiracy —; crossing —;
Dedekind —; domination —; fuzzy —; independence —;
L-R flat fuzzy —; L-R fuzzy —; Lovász —; normwise relative
condition —; separation —; stability —; tangle —;
weighted clique —; weighted stability —
number of clauses see: minimum — 
number of clusters see: Determining the optimal — 
number of DNF clauses see: minimal — 
number of a matrix see: condition — 
number model see: real — 
number of operations 

[65D25, 68W30] 

(see: Complexity of gradients, Jacobians, and Hessians) 
number of pivot steps see: average —; expected — 
number of shadow-vertices see: expected —; variance of
the —
number of Steiner points see: Steiner tree problem with
minimum —
number of vehicles
[00-02, 01-02, 03-02]
(see: Vehicle routing problem with simultaneous pickups
and deliveries)
number of vehicles see: Vehicle scheduling problems with
a fixed —
number of well switches see: maximum — 
numbers see: arithmetic operations on fuzzy —; common
random —; finite natural —; finite rational —; fuzzy —;
infinitely near rational —; infinitely small negative real —;
infinitely small positive real —; infinitely small real —;
magic —; natural —; rational —; real —

Numerica
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints)

numerical algorithms
[49J40, 49Q10, 70-08, 74K99, 74Pxx]
(see: Quasivariational inequalities)

numerical algorithms
[90C25, 90C26, 90C34]
(see: Semi-infinite programming: numerical methods)
numerical analysis
[01A99, 90C99] 
(see: Von Neumann, John) 
numerical constraint satisfaction problem 
[65G20, 65G30, 65G40, 68T20] 
(see: Interval constraints) 
numerical differentiation 
[26A24, 65D25, 68W30] 
(see: Automatic differentiation: introduction, history and 
rounding error estimation; Complexity of gradients, 
Jacobians, and Hessians) 
numerical differentiation see: internal — 
numerical example of a trim-loss problem 
[90C11, 90C90]
(see: MINLP: trim-loss problem) 
numerical methods 
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and 
duality) 
numerical methods 
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization) 
numerical methods see: Semi-infinite programming: —; 
Stochastic optimal stopping: — 
Numerical methods for unary optimization 
(90C30) 
(referred to in: Broyden family of methods and the BFGS 
update; Unconstrained nonlinear optimization: 
Newton-Cauchy framework; Unconstrained optimization 
in neural network training)
(refers to: Broyden family of methods and the BFGS update;
Unconstrained nonlinear optimization: Newton-Cauchy
framework; Unconstrained optimization in neural network
training)
numerical nullspace
[65Fxx] 
(see: Least squares problems) 
numerical rank 
[15A23, 65F05, 65F20, 65F22, 65F25, 65Fxx]
(see: Least squares problems; Orthogonal triangularization)


numerical results 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 

numerical simulation 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 

(NVI) see: nearest vertex insertion — 

Nyström interpolation
[65H10, 65J15] 
(see: Contraction-mapping) 


O


O(n‘) see: algorithm of complexity — 
O(n‘) time see: algorithm running in — 
OA master problem see: disjunctive — 


OA method 
[90C26] 
(see: Cutting plane methods for global optimization) 
object 
(see: State of the art in modeling agricultural systems) 
objective 
[00-02, 01-02, 03-02] 
(see: Vehicle routing problem with simultaneous pickups 
and deliveries) 
objective see: maxmin —; minimax —; multi- —; Stochastic 
programming models: random — 
objective assignment problem see: Bi- — 
objective CNSO see: multi- — 
objective combinatorial optimization see: multi- — 
objective convex optimization see: multi- — 
objective criterion 
[90C29] 
(see: Multi-objective optimization: pareto optimal 
solutions, properties) 
objective dynamic programming see: Multiple — 
objective euclidean distance location see: Single facility 
location: multi- — 
objective facility location see: multi- — 
objective fractional program see: multi- — 
objective fractional programming see: multi- — 
objective fractional programming problems see: Multi- — 
objective function 
[65G30, 65G40, 65K05, 68Q25, 90C08, 90B10, 90B80, 90B85, 
90C05, 90C20, 90C26, 90C27, 90C30, 90C35, 90C57, 90C59, 
90C90, 91B28] 
(see: Competitive ratio for portfolio management; Global 
optimization: interval analysis and balanced interval 
arithmetic; Global optimization using space filling; MINLP: 
heat exchanger network synthesis; Nonconvex network flow 
problems; Redundancy in nonlinear programs; Rosen’s 
method, global convergence, and Powell’s conjecture; 
Variable neighborhood search methods; Warehouse 
location problem) 
objective function 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
objective function see: fair —; lexicographically minimax —; 
maximin —; minimax —; multicriteria —; multifacility 
Weber —; multifacility Weber-Rawls —; random —; 
separable —; separable convex —; set-valued — 
objective function parametrization 
[90C05, 90C31] 
(see: Parametric linear programming: cost simplex 
algorithm) 
objective function value see: continuity property of the —; 
convexity property of the — 
objective functions 
[90C29] 
(see: Multi-objective optimization; Interactive methods for 
preference value functions; Multi-objective optimization: 
pareto optimal solutions, properties; Multiple objective 
programming support) 
objective functions see: nondifferentiable —; notation for — 
objective functions and/or derivatives see: evaluation of — 
objective integer linear programming see: Multi- — 
objective interpretation 
[94A17] 
(see: Jaynes’ maximum entropy principle) 
objective linear programming see: Fuzzy multi- —; multi- —; 
multiple — 
objective linear programming with fuzzy coefficients see: 
multi- — 
objective linear programming under uncertainty see: multi- — 
objective for a location problem 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 
objective mathematical programming see: multi- — 
objective mixed integer programming see: Multi- — 
objective (multicriteria) mixed integer programming see: 
multi- — 
objective optimization see: disaggregation in multi- —; 
Generalized concavity in multi- —; multi- — 
objective optimization and decision support systems see: 
Multi- — 
objective optimization: interaction of design and control see: 
Multi- — 
objective optimization; Interactive methods for preference 
value functions see: Multi- — 
objective optimization: lagrange duality see: Multi- — 
objective optimization: pareto optimal solutions, properties 
see: Multi- — 
objective programming see: multi- —; multiple — 
objective programming support see: Multiple — 
objective rectilinear distance location see: Single facility 
location: multi- — 
objective simplex algorithm see: parametric — 
objective value see: incumbent — 
objectives see: balancing —; multiple —; pull —; push — 
objects 
[65K05, 90C27, 90C30, 90C57, 91C15] 
(see: Optimization-based visualization) 
oblique projection matrix 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
observation problem see: minimax — 
observation problem under uncertainty with perturbations 
see: minimax — 
observational quantifiers 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
observations see: inaccuracy in — 
obstacle 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
obstacle-free shape design see: robust — 
obstacle-free truss design see: robust — 
obstruction set 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 
OCAT 
[90C09, 90C10] 
(see: Optimization in boolean classification problems; 
Optimization in classifying text documents) 


Occam razor 
[90C60] 
(see: Kolmogorov complexity) 

odd-hole-cut 
(see: Contact map overlap maximization problem, CMO) 

odd sequence 
[05C85] 
(see: Directed tree networks) 

odd-set constraints 
[90C05, 90C10, 90C27, 90C35] 
(see: Assignment and matching) 

ODE two-point boundary value problem 
[34A55, 78A60, 90C30] 
(see: Optimal design in nonlinear optics) 

Odyssée 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators) 

off see: back- —; tailing- — 

off-0-diagonal operator 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
off cutting plane see: trade- — 
off error see: round- — 
off-line feedback 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
off-line learning 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
off-line process optimization 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
off question see: trade- — 
offs see: trade- — 
offshore oil fields 
[90C26] 
(see: MINLP: application in facility location-allocation) 
offshore oilfield infrastructure see: Optimal planning of — 
offspring see: perfect — 
oil fields see: offshore — 
oil flowrate see: well — 
oil, gas and water capacity constraints see: maximum — 
oil model see: black — 
oil rate constraints see: upper and lower well — 
oilfield infrastructure see: Optimal planning of offshore — 
oligopolistic equilibrium see: Cournot-Nash — 
oligopolistic equilibrium model see: Cournot-Nash — 
Oligopolistic market equilibrium 
(91B06, 91B60) 
(referred to in: Equilibrium networks; Financial 
equilibrium; Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; Spatial 
price equilibrium; Traffic network equilibrium; Walrasian 
price equilibrium) 
(refers to: Equilibrium networks; Financial equilibrium; 
Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Spatial price 
equilibrium; Traffic network equilibrium; Walrasian price 
equilibrium) 
oligopoly 
[91B06, 91B60] 
(see: Oligopolistic market equilibrium) 
oligopoly model 
[90C15] 
(see: Stochastic quasigradient methods in minimax 
problems) 
oligopoly model see: spatial — 
oligopoly problem see: aspatial —; classical — 
OLSO 
[41A30, 4799, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
OME 
[91B06, 91B60] 
(see: Oligopolistic market equilibrium) 
Omega see: Chaitin in — 
Ω-based yield 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
on average 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
on-duty time 
(see: Railroad crew scheduling) 
on-line algorithm 
[05C85] 
(see: Directed tree networks) 
on-line feedback 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
on-line learning 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
on-line method 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
on-line process optimization 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
on-the-river hydropower nodes 
[90C30, 90C35] 
(see: Optimization in water resources) 
one see: one against — 
one against all 
(see: Mathematical programming for data mining) 
one against one 
(see: Mathematical programming for data mining) 
one algorithm see: Smith-Walford- — 
one approach see: limited-memory symmetric rank- — 


one-at-a-time coefficient generation 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators) 
one clause at a time 
[90C09, 90C10] 
(see: Optimization in boolean classification problems) 
one clause at a time algorithm 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
one clause at a time approach 
[90C09, 90C10] 
(see: Optimization in boolean classification problems) 
one constraint see: consecutive — 
one-dimensional marginal probability distribution function 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
one-dimensional nonlinear equation 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
one, find all see: find — 
one-for-one ordering policy 
[90B50] 
(see: Inventory management in supply chains) 
one formula see: selfdual rank — 
one globally convergent homotopies see: probability- — 
one homotopy see: probability- — 
one homotopy algorithm see: globally convergent 
probability- — 
one-hop neighboring stations 
(see: Broadcast scheduling problem) 
one integer feasibility problem see: zero- — 
one integer problem see: linear zero- — 
one integer program see: zero- — 
one integer programming see: zero- — 
one knapsack problem see: multidimensional zero- —; zero- — 
one matrix see: rank- — 
one optimization see: zero- — 
one ordering policy see: one-for- — 
one-parametric finite optimization problem 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
one-parametric semi-infinite optimization 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
one problem see: quadratic zero- — 
one programming see: Fractional zero- —; pure zero- — 
one programming problem see: zero- — 
one quasi-Newton method see: symmetric rank- — 
one-reducible graph see: Smith-Walford — 
one-sided differential 
[26B25, 26E25, 49J52, 90C99] 
(see: Quasidifferentiable optimization) 
one-to-all instances 
[05C85] 
(see: Directed tree networks) 
one-tree 
[90C35] 
(see: Generalized networks) 
one update see: symmetric rank- — 
one update formula see: Sherman-Morrison rank- — 
one-way analysis of variance 
[62H30, 90C27] 
(see: Assignment methods in clustering) 
onto relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
open communication 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming) 
open form approach 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
open list 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 
open-loop control 
[49Jxx, 91Axx] 
(see: Infinite horizon control and dynamic games) 
open-loop Nash equilibrium 
[49Jxx, 91Axx] 
(see: Infinite horizon control and dynamic games) 
open shop problem 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
operability 
[49M37, 90C11] 
(see: MINLP: applications in the interaction of design and 
control) 
operability analysis of flowsheets 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
operating cash flow see: maximize — 
operation see: averaging —; contraction —; degenerate 
pivot —; floating point —; freight —; head of —; interval 
arithmetic —; nondegenerate pivot —; partially 
asynchronous —; pivot —; tail of —; totally 
asynchronous — 
operation of electric and energy power systems see: 
Optimization in — 
operation planning 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
operational decisions in a supply chain 
[90-02] 
(see: Operations research models for supply chain 
management and design) 
Operational Research see: european Journal of — 
operational restrictions 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 


operational status of the wells 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 

operational supply chain management 
[90-02] 
(see: Operations research models for supply chain 
management and design) 

operations see: expansion —; irregular —; matrix notation for 
relational —; number of —; process —; reduction —; 
reflection — 

operations on fuzzy numbers see: arithmetic — 

operations problem see: irregular — 

operations in a program see: basic — 

operations on relations see: binary —; unary — 

Operations research 
(90C27) 
(referred to in: History of optimization) 
(refers to: History of optimization) 

operations research 
[90C31] 
(see: Sensitivity and stability in NLP) 

operations research 
[90C27] 
(see: Operations research) 

operations research see: GRASP in — 

Operations research and financial markets 
(90C27) 

Operations research models for supply chain management 
and design 
(90-02) 
(referred to in: Global supply chain models; Inventory 
management in supply chains; Nonconvex network flow 
problems; Piecewise linear network flow problems) 
(refers to: Global supply chain models; Inventory 
management in supply chains; Nonconvex network flow 
problems; Piecewise linear network flow problems) 

operator see: antitone —; best approximation —; 
block-0-diagonal —; closed selfadjoint —; co-coercive —; 
compact —; complementary —; completely continuous —; 
condensing —; constraint narrowing —; continuous 
selection —; contractive —; eor —; firmly nonexpansive —; 
fuzzy set-inclusion —; generalized monotone —; 
geometrical —; hemicontinuous —; heterotonic —; 
implication —; interpolatory —; interval —; interval 
Newton —; interval Taylor —; involutory —; isotone —; 
lipschitzian selection —; monotone —; nonexpansive —; 
nonnegative interpolatory —; nonregular —; 
off-0-diagonal —; optimal Lipschitzian selection —; 
orthogonal projection —; overloaded —; p-regular —; 
Point Taylor —; properly quasimonotone —; 
pseudomonotone —; quasimonotone —; resolvent —; 
selection —; semistrictly quasimonotone —; strictly 
monotone —; strictly pseudomonotone —; strictly 
quasimonotone —; univariate interval Newton —; upper 
hemicontinuous — 

operator on a Banach space see: monotone — 

operator decomposition 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 

operator for a matroid see: closure — 
operator overloading 
[65H99, 65K05, 65K99, 90C30] 
(see: Automatic differentiation: point and interval; 
Automatic differentiation: point and interval taylor 
operators) 

operator splitting 
[90C30] 
(see: Cost approximation algorithms) 

operator splitting algorithm 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 

operator topology see: strong — 

operators see: Automatic differentiation: point and interval 
taylor —; design of —; genetic — 

operators in best approximation by bounded or continuous 
functions see: Lipschitzian — 

OPFAD 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of the Hessian) 

opposite of a signed set 
[90C09, 90C10] 
(see: Oriented matroids) 

OPRAD 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of the Hessian) 

Opt see: 2- —; k- — 

opt heuristic see: R- — 

Opt Matching (ROM) see: recursive — 

opt neighborhood see: 2- — 


optical bandwidth 
[05C85] 
(see: Directed tree networks) 
optical networks see: all- —; Integer linear programs for 
routing and protection problems in — 
optics see: nonlinear —; Optimal design in nonlinear — 
optima see: local — 
optimal 
[05B35, 65K05, 68R10, 90C08, 90C05, 90C20, 90C26, 90C27, 
90C33, 90C59] 
(see: Branchwidth and branch decompositions; Criss-cross 
pivoting rules; Variable neighborhood search methods) 
optimal see: globally —; locally —; Pareto — 
optimal algorithm 
[03D15, 68Q05, 68Q15] 
(see: Parallel computing: complexity classes) 
optimal algorithms 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based 
optimization) 
optimal assignment 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
optimal assignment problem 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
optimal basis 
[90C05, 90C06, 90C08, 90C10, 90C11, 90C33] 
(see: Integer programming: branch and bound methods; 
Pivoting algorithms for linear programming generating 
two paths) 

optimal componentwise bound 
[15A99, 65G20, 65G30, 65G40, 90C26] 
(see: Interval linear systems) 

optimal control 
[03H10, 49J15, 49J27, 49K15, 90C26, 90C34, 93-XX, 93C10] 
(see: Boundary condition iteration BCI; Invexity and its 
applications; Pontryagin maximum principle; Semi-infinite 
programming and control problems) 

optimal control 
[49-XX, 49J15, 49K15, 60Jxx, 65Lxx, 90C26, 91B32, 92D30, 
93-XX, 93C10] 
(see: Invexity and its applications; Optimal control of a 
flexible arm; Pontryagin maximum principle; Resource 
allocation for epidemic control) 

optimal control see: continuous-time —; Discrete-Time —; 
Dynamic programming: continuous-time —; Dynamic 
programming and Newton's method in unconstrained —; 
parametric —; time —; unconstrained — 

optimal control applications see: Dynamic programming: — 

optimal control with first order differential equations see: 
Duality in — 

Optimal control of a flexible arm 
(93-XX) 
(referred to in: Control vector iteration CVI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Hamilton-Jacobi-Bellman 
equation; Infinite horizon control and dynamic games; 
MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control; Robust control; Robust control: schur stability 
of polytopes of polynomials; Semi-infinite programming 
and control problems; Sequential quadratic programming: 
interior point methods for distributed optimal control 
problems; Suboptimal control) 
(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Hamilton-Jacobi-Bellman equation; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Suboptimal control) 

optimal control policy 
[90C30] 
(see: Suboptimal control) 

optimal control problem 
[93-XX] 
(see: Boundary condition iteration BCI) 

optimal control problem 
[49K20, 49M99, 90C55] 
(see: Sequential quadratic programming: interior point 
methods for distributed optimal control problems) 


optimal control problem see: mixed integer —; time — 


optimal control problems see: discretized —; distributed —; 
Sequential quadratic programming: interior point methods 
for distributed — 

optimal degree of flexibility 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility 
index) 


optimal design 
[34A55, 78A60, 90C26, 90C30, 90C31] 
(see: Bilevel programming: introduction, history and 
overview; Optimal design in nonlinear optics) 


optimal design 
[34A55, 78A60, 90C25, 90C27, 90C30, 90C90] 
(see: Optimal design in nonlinear optics; Semidefinite 
programming and structural optimization) 


optimal design see: D- —; global —; Multilevel methods for — 


Optimal design of composite structures 
(90C29, 90C26) 
(referred to in: Design optimization in computational fluid 
dynamics; Interval analysis: application to chemical 
engineering design problems; Multidisciplinary design 
optimization; Multilevel methods for optimal design; 
Optimal design in nonlinear optics; Structural 
optimization: history) 
(refers to: Bilevel programming: applications in engineering; 
Design optimization in computational fluid dynamics; 
Global optimization: hit and run methods; Interval 
analysis: application to chemical engineering design 
problems; Multidisciplinary design optimization; 
Multilevel methods for optimal design; Optimal design in 
nonlinear optics; Random search methods; Structural 
optimization: history) 

Optimal design in nonlinear optics 
(34A55, 90C30, 78A60) 
(referred to in: Design optimization in computational fluid 
dynamics; Interval analysis: application to chemical 
engineering design problems; Multidisciplinary design 
optimization; Multilevel methods for optimal design; 
Optimal design of composite structures; Structural 
optimization: history) 
(refers to: Bilevel programming: applications in engineering; 
Design optimization in computational fluid dynamics; 
Interval analysis: application to chemical engineering 
design problems; Multidisciplinary design optimization; 
Multilevel methods for optimal design; Optimal design of 
composite structures; Structural optimization: history) 

optimal design problems 
[49K20, 49M99, 90C55] 
(see: Sequential quadratic programming: interior point 
methods for distributed optimal control problems) 

optimal distance see: method of — 

optimal distribution of efforts 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 


optimal experimental design 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
optimal face 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 
optimal flowsheets see: sensitivity of — 
optimal gambling 
[49L20, 90C40] 
(see: Dynamic programming: undiscounted problems) 
optimal indexing vocabulary 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
optimal integral bounds subject to moment conditions 
[28-XX, 49-XX, 60-XX] 
(see: General moment optimization problems) 
optimal investments 
[90C15] 
(see: Two-stage stochastic programming: quasigradient 
method) 
optimal Lipschitzian selection operator 
[41A30, 4799, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
optimal number of clusters see: Determining the — 
optimal parameter 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
optimal parameter see: globally —; locally — 
optimal partitioning algorithm see: nearest insertion — 
optimal path 
[49M37] 
(see: Nonlinear least squares: trust region methods) 
optimal path see: co- — 
Optimal planning of offshore oilfield infrastructure 
optimal policies 
[90B50] 
(see: Inventory management in supply chains) 
optimal policies see: (s,S) — 
optimal ratio see: method of — 
optimal relaxation 
[90C30] 
(see: Relaxation in projection methods) 
optimal rule see: Bayes — 
Optimal sensor scheduling 
optimal shape design 
[49J20, 49J52] 
(see: Shape optimization) 

optimal shapes see: design of — 

optimal solution 
[90C08, 90C06, 90C10, 90C11, 90C25, 90C26, 90C27, 90C30, 
90C31, 90C57, 90C59, 90C90] 
(see: Lagrangian multipliers methods for convex 
programming; Modeling difficult optimization problems; 
Robust global optimization; Variable neighborhood search 
methods) 


optimal solution see: essential —; global —; locally —; 
M-Pareto —; Pareto —; quasi- —; strongly stable —; weakly 
Pareto — 


optimal solution mapping 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 
optimal solution of a program 
[90C06] 
(see: Saddle point theory and optimality conditions) 
optimal solution set see: Pareto — 
optimal solutions see: jumps of —; Pareto — 


optimal solutions, properties see: Multi-objective optimization: 
pareto — 
Optimal solvent design approaches 
(65K99) 
optimal spanning tree structure 
[90C35] 
(see: Minimum cost flow problem) 
optimal state space search algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
optimal steady state 
[49Jxx, 91Axx] 
(see: Infinite horizon control and dynamic games) 
optimal stopping 
[49L20, 90C40] 
(see: Dynamic programming: undiscounted problems) 
optimal stopping 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: stopping rules) 
optimal stopping: numerical methods see: Stochastic — 
optimal stopping: problem formulations see: Stochastic — 
optimal subset 
[90C09, 90C10] 
(see: Matroids) 
optimal substructure property 
[90C09, 90C10] 
(see: Matroids) 
optimal trajectories see: Turnpike theory: stability of — 
optimal trajectory 
[49J15, 49K15, 93C10] 
(see: Pontryagin maximum principle) 
Optimal triangulations 
(68Q20) 
optimal triangulations 
[68Q20] 
(see: Optimal triangulations) 
optimal value bounds see: computable — 


optimal value function 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Image space approach to optimization; 
Nondifferentiable optimization: parametric programming; 
Sensitivity and stability in NLP: continuity and differential 
stability) 

optimal value functions 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 

optimal vertex see: co- — 

optimal vocabulary 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 

optimality see: finite —; first order —; global —; guaranteed 
bound to —; high-order necessary conditions for —; k- —; 
overtaking —; parametric approach to —; Pareto —; 
principle of —; test of —; weak principle of —; weakly 
overtaking —; worst-case — 

optimality for abnormal points see: High-order necessary 
conditions for — 

optimality analyses see: post- — 

optimality analysis see: post- — 

optimality in bilinear programming 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 

optimality condition see: Kuhn-Tucker —; necessary —; 
second order —; sufficient — 

optimality condition without using (sub)gradients parametric 
representations see: necessary — 

optimality conditions 
[90C06, 90C26, 90C31, 90C39] 
(see: Bilevel optimization: feasibility test and flexibility 
index; Saddle point theory and optimality conditions; 
Second order optimality conditions for nonlinear 
optimization; Sensitivity and stability in NLP: continuity 
and differential stability) 

optimality conditions 
[46A20, 49J15, 49K15, 49K27, 49K40, 49M37, 52A01, 65K05, 
90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 90C34, 91B28, 
93C10] 
(see: Bilevel programming: optimality conditions and 
duality; Composite nonsmooth optimization; First order 
constraint qualifications; Generalized concavity in 
multi-objective optimization; Inequality-constrained 
nonlinear optimization; Kuhn-Tucker optimality 
conditions; Pontryagin maximum principle; Quadratic 
programming with bound constraints; Second order 
constraint qualifications; Semi-infinite programming and 
applications in finance; Sensitivity and stability in NLP; 
Sensitivity and stability in NLP: continuity and differential 
stability; Smooth nonlinear nonconvex optimization) 

optimality conditions see: Equality-constrained nonlinear 
programming: KKT necessary —; first order necessary —; 
first order and second order —; fritz John necessary —; 
generalized necessary —; Generalized semi-infinite 
programming: —; Karush-Kuhn-Tucker —; KKT —; KKT 
necessary —; Kuhn-Tucker —; Kuhn-Tucker necessary —; 
necessary —; necessary and sufficient —; 
Quasidifferentiable optimization: —; Saddle point theory 
and —; second order necessary and sufficient —; 
Semi-infinite programming: second order —; sufficient — 

optimality conditions and duality see: Bilevel programming: — 

optimality conditions for nonlinear optimization see: Second 
order — 

optimality conditions and stability see: Semidefinite 
programming: — 

Optimality criteria for multiphase chemical equilibrium 
(49K99, 65K05, 80A10) 
(referred to in: Global optimization: application to phase 
equilibrium problems; Global optimization in phase and 
chemical reaction equilibrium) 
(refers to: Global optimization: application to phase 
equilibrium problems; Global optimization in phase and 
chemical reaction equilibrium) 

optimality criterion 
[90C26, 90C90] 
(see: Structural optimization: history) 

optimality cut 
[90C15] 
(see: L-shaped method for two-stage stochastic programs 
with recourse) 

optimality in a game 
[49Jxx, 91Axx] 
(see: Infinite horizon control and dynamic games) 

optimality of MODP see: principle of Pareto — 

optimality in parametric programming 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 

optimality principal see: proximate — 

optimality principle see: proximate — 

optimality sensitivity analysis see: post- — 

optimally 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 

optimally scaled subclass 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 

optimization 
[65L99, 90C06, 90C10, 90C11, 90C30, 90C57, 90C90, 93-XX] 
(see: Modeling difficult optimization problems; 
Optimization strategies for dynamic systems; Simulated 
annealing methods in protein folding) 

optimization 
[01A99, 05-XX, 15A99, 49K99, 62G07, 62G30, 65D10, 65G20, 
65G30, 65G40, 65K05, 65K10, 68Q25, 68Q99, 68W10, 80A10, 
90B15, 90B80, 90B85, 90C05, 90C06, 90C08, 90C10, 90C11, 
90C15, 90C26, 90C27, 90C30, 90C31, 90C33, 90C35, 90C39, 
90C52, 90C53, 90C55, 90C90, 90Cxx, 91B28, 92B05, 92C05, 
94A17] 
(see: ABS algorithms for optimization; Adaptive simulated 
annealing and its application to protein folding; 
Assignment and matching; Asynchronous distributed 
optimization algorithms; Auction algorithms; Biquadratic 
assignment problem; Branch and price: Integer 
programming with column generation; Communication 
network assignment problem; Decomposition principle of 
linear programming; Design optimization in 
computational fluid dynamics; Frequency assignment 
problem; Genetic algorithms; Genetic algorithms for 
protein structure prediction; Graph coloring; History of 
optimization; Homogeneous selfdual methods for linear 
programming; Implicit lagrangian; Interval linear systems; 
Invexity and its applications; Isotonic regression problems; 
Jaynes’ maximum entropy principle; Multiplicative 
programming; Neuro-dynamic programming; Nonsmooth 
and smoothing methods for nonlinear complementarity 
problems and variational inequalities; Optimality criteria 
for multiphase chemical equilibrium; Optimization in 
medical imaging; Optimization software; Overdetermined 
systems of linear equations; Probabilistic constrained linear 
programming: duality theory; Quadratic semi-assignment 
problem; Robust optimization; Saddle point theory and 
optimality conditions; Simulated annealing; Single facility 
location: multi-objective euclidean distance location; Single 
facility location: multi-objective rectilinear distance 
location; Stochastic network problems: massively parallel 
solution; Symmetric systems of linear equations; Two-stage 
stochastic programs with recourse) 

optimization see: a priori —; ABS algorithms for —; Adaptive 
convexification in semi-infinite —; Airline —; Algorithmic 
improvements using a heuristic parameter, reject index for 
interval —; algorithms for entropy —; Bayesian global —; 
beam angle —; beam angle selection and wedge 
orientation —; beam weight —; bilevel —; Bilevel 
programming: global —; black-box —; black-box global —; 
branch and bound for unconstrained —; collaborative —; 
combinatorial —; Complexity classes in —; Composite 
nonsmooth —; Computer implementation of —; concurrent 
subspace —; constrained —; constrained global —; 
continuous —; continuous global —; convex —; convex 
combinatorial —; Convex discrete —; convex quadratic —; 
Copositive —; Cutting plane methods for global —; d.c. —; 
Decomposition in global —; Derivative-free methods for 
non-smooth —; deterministic —; Differential equations and 
global —; Dini and Hadamard derivatives in —; direct 
search —; disaggregation in multi-objective —; 
Discontinuous —; discrete —; discrete decisions in 
dynamic —; Discrete stochastic —; Distance dependent
protein force field via linear —; dM —; Domination analysis 
in combinatorial —; Duality gaps in nonconvex —; duality 
theorem for linear —; Duality theory: biduality in 
nonconvex —; duality theory for entropy —; Duality theory: 
monoduality in convex —; Duality theory: triduality in 
global —; Dynamic —; encyclopedia of —; engineering —; 
entropy —; equality-constrained —; Evolutionary 
algorithms in combinatorial —; Fejér monotonicity in 
convex —; Financial —; flowsheet —; fluence map —; 
fractional —; Fractional combinatorial —; general 
constrained —; Generalized concavity in multi-objective —; 
global —; global nonlinear —; Global pairwise protein 
sequence alignment via mixed-integer linear —; graph —; 
hierarchical —; History of —; Hyperplane arrangements 
in —; hypodifferentiable —; Image space approach to —; 
Inequality-constrained nonlinear —; 
infinite-dimensional —; information-based —; 
Information-based complexity and information-based —; 
input —; interior point algorithms for entropy —; Interval 
analysis: parallel methods for global —; Interval analysis: 
unconstrained and constrained —; Interval global —;
isotone —; Lagrangian relaxation with subgradient —; large 
scale —; large-scale combinatorial —; Large scale 
unconstrained —; linear —; linear fractional 
combinatorial —; Lipschitz —; local —; LP strategy for 
interval-Newton method in deterministic global —; 
marginal function —; mixed discrete-continuous global —; 
Mixed Integer —; Mixed Integer Bilevel —; mixed integer 
dynamic —; mixed integer nonlinear —; Mixed integer 
nonlinear bilevel programming: deterministic global —; 
molecular —; monotonic —; Monte-Carlo simulations for 
stochastic —; Multi-class data classification via 
mixed-integer —; multi-extremal global —; 
multi-objective —; multi-objective combinatorial —; 
multi-objective convex —; multidisciplinary —; 
Multidisciplinary design —; multilevel —; multiperiod —; 
multistage —; Nested partitions —; network —; Neural 
networks for combinatorial —; New hybrid conjugate 
gradient algorithms for unconstrained —; nonconvex —; 
nondifferentiable —; nondifferentiable convex —; 
nondifferential —; nonlinear —; nonlinear parametric —; 
nonlinearly constrained —; nonsmooth —; nonsmooth 
analysis and —; nonsmooth nonconvex —; Numerical 
methods for unary —; off-line process —; on-line
process —; one-parametric semi-infinite —; ordinal —;
parallel —; parametric —; parametric approach to 
fractional —; Parametric mixed integer nonlinear —; 
parametric nonlinear —; path following algorithm for 
entropy —; Peptide identification via mixed-integer —; 
Performance profiles of conjugate-gradient algorithms for 
unconstrained —; Plant layout problems and —;
portfolio —; process —; Quasidifferentiable —;
Reformulation-linearization technique for global —;
Replicator dynamics in combinatorial —; Reverse
convex —; Robust —; Robust global —; sample-path —;
Second order optimality conditions for nonlinear —;
semi-infinite —; Semidefinite programming and
structural —; separable —; sequential approximate —;
Set-valued —; Shape —; simulation-based —; sizing —; 
smooth nonlinear —; Smooth nonlinear nonconvex —; 
Smoothing methods for semi-infinite —; SSC minimization 
algorithms for nonsmooth and stochastic —; stochastic —; 
stochastic combinatorial —; stochastic global —; 
stochasticglobal —; structural —; structural shape —; 
structural topology —; subgradient —; supply chain —; 
system- —; tailored —; Theorems of the alternative and —; 
Topological derivative in shape —; topology —; Topology 
of global —; Two-level —; type of —; unary —; 
unbounded —; unconstrained —; unconstrained dual in 
entropy —; unconstrained global —; uniform fractional 
combinatorial —; unstructured —; user- —; vector —; 
Wastewater system —; zero-one — 


Optimization in ad hoc networks 


(68M12, 90B18, 90C11, 90C30) 


optimization algorithm see: αBB global —; deterministic
global —; Direct global —; MINLP: branch and bound
global —


optimization algorithm (definition) 


[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 


optimization algorithms 

[90C05, 90C25, 90C30, 90C34] 

(see: Semi-infinite programming: discretization methods) 
optimization algorithms see: Asynchronous distributed —;
unconstrained —
optimization algorithms for financial planning problems see:
Global —
optimization: algorithms for hypodifferentiable functions see:
Quasidifferentiable —
optimization: algorithms for QD functions see:
Quasidifferentiable —
optimization algorithms in resource allocation problems see:
Combinatorial —
optimization with αBB see: MINLP: global —
optimization in the analysis and management of
environmental systems see: Global —
optimization: application to phase equilibrium problems see:
Global —
optimization: applications see: Continuous global —;
Quasidifferentiable —
optimization: applications to thermoelasticity see:
Quasidifferentiable —
optimization approach see: Multiple minima problem in
protein folding: αBB global —
optimization approach to clustering see: Nonsmooth —
optimization approach to image reconstruction from projection
data

[94A08, 94A17] 

(see: Maximum entropy principle: image reconstruction) 
optimization approaches see: Statistical classification: — 
Optimization based framework for radiation therapy
(68W01, 90-00, 90C90, 92-08, 92C50)
(referred to in: Beam selection in radiotherapy treatment
design)
optimization-based methods see: Disease diagnosis: —
optimization based on statistical models see: Global —
Optimization-based visualization
(91C15, 65K05, 90C30, 90C27, 90C57)
(refers to: Continuous global optimization: models,
algorithms and software; Dynamic programming in
clustering; Evolutionary algorithms in combinatorial
optimization; Integer programming; Integer programming:
branch and bound methods; Nonlinear least squares:
Newton-type methods; Simulated annealing)
optimization in batch design under uncertainty see: Global — 
optimization in binary star astronomy see: Global — 
Optimization in boolean classification problems 

(90C09, 90C10) 

(referred to in: Alternative set theory; Boolean and fuzzy
relations; Checklist paradigm semantics for fuzzy logics;
Finite complete systems of many-valued logic algebras;
Inference of monotone boolean functions; Mixed integer
classification problems; Optimization in classifying text
documents; Statistical classification: optimization
approaches)
(refers to: Alternative set theory; Boolean and fuzzy
relations; Checklist paradigm semantics for fuzzy logics;
Finite complete systems of many-valued logic algebras;
Inference of monotone boolean functions; Linear
programming models for classification; Mixed integer
classification problems; Optimization in classifying text
documents; Statistical classification: optimization
approaches)
optimization: calculus of quasidifferentials see:
Quasidifferentiable —
optimization in CFD see: design — 
Optimization in classifying text documents 

(90C09, 90C10) 

(referred to in: Alternative set theory; Boolean and fuzzy
relations; Checklist paradigm semantics for fuzzy logics;
Finite complete systems of many-valued logic algebras;
Inference of monotone boolean functions; Optimization in
boolean classification problems; Statistical classification:
optimization approaches)
(refers to: Alternative set theory; Boolean and fuzzy
relations; Checklist paradigm semantics for fuzzy logics;
Finite complete systems of many-valued logic algebras;
Inference of monotone boolean functions; Linear
programming models for classification; Mixed integer
classification problems; Optimization in boolean
classification problems; Statistical classification:
optimization approaches)
optimization: codifferentiable functions see:
Quasidifferentiable —
optimization in computational fluid dynamics see: Design — 
optimization of computational performance 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations) 
optimization computer implementation example 

[90C10, 90C30, 90C35] 

(see: Optimization in operation of electric and energy
power systems)
optimization: cutting angle method see: Global — 
optimization: cutting plane methods see: Nondifferentiable — 
Optimization and decision support systems 

(90B50) 

(referred to in: Data envelopment analysis) 
optimization and decision support systems see:
Multi-objective —
optimization: definition (colloquial) 

[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 

(see: Modeling difficult optimization problems) 
optimization: Dini derivatives, Clarke derivatives see:
Quasidifferentiable —
optimization: A disjunctive cutting plane approach see:
Mixed-integer nonlinear —
optimization in document classification
[90C09, 90C10]
(see: Optimization in classifying text documents)
optimization of dynamical systems see: Interval analysis for —
optimization: embeddings, path following and singularities
see: Parametric —
optimization: envelope representation see: Global — 
Optimization with equilibrium constraints: A piecewise SQP
approach
(90C30, 90C33)
(referred to in: Feasible sequential quadratic programming;
Sequential quadratic programming: interior point methods
for distributed optimal control problems; Successive
quadratic programming; Successive quadratic
programming: applications in distillation systems;
Successive quadratic programming: applications in the
process industry; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: full space methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 
(refers to: Bilevel programming: introduction, history and 
overview; Feasible sequential quadratic programming; 
Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Linear 
complementarity problem; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Successive quadratic 
programming; Successive quadratic programming: 
applications in distillation systems; Successive quadratic 
programming: applications in the process industry; 
Successive quadratic programming: decomposition 
methods; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods; Variational 
inequalities) 

optimization: exact penalty methods see: 
Quasidifferentiable — 

optimization: feasibility test and flexibility index see: Bilevel — 

optimization: filled function methods see: Global — 

optimization: functional forms see: Global — 

optimization: g-αBB approach see: Global —

optimization game see: combinatorial — 

optimization games see: Combinatorial — 

optimization in generalized geometric programming see: 
Global — 

optimization of heat exchanger networks see: Global — 

optimization: history see: Structural — 

optimization: hit and run methods see: Global — 

optimization homotopies 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 

optimization for image reconstruction see: Entropy —; 
finite-dimensional models for entropy —; vector-space 
models for entropy — 

optimization in industrial problems see: SQP — 

optimization: interaction of design and control see: 
Multi-objective — 

optimization; Interactive methods for preference value 
functions see: Multi-objective — 

optimization: interior point methods see: Entropy — 

optimization: interval analysis and balanced interval 
arithmetic see: Global — 

optimization: lagrange duality see: Multi-objective — 

optimization in Lennard-Jones and morse clusters see: 
Global — 

Optimization in leveled graphs
(90C35)
(referred to in: Graph planarization; Integer programming)
(refers to: Graph planarization; Integer programming)

optimization in location problems see: Global — 

optimization in mechanics see: Multilevel — 

optimization in medical image processing 

[90C90] 

(see: Optimization in medical imaging)
Optimization in medical imaging
(90C90) 
(referred to in: Entropy optimization: shannon measure of 
entropy and its properties; Maximum entropy principle: 
image reconstruction) 
(refers to: Entropy optimization: shannon measure of 
entropy and its properties; Genetic algorithms; Linear 
programming; Maximum entropy principle: image 
reconstruction; Simulated annealing) 
optimization method see: heuristic —; QBB global — 
optimization methods see: Bisection global —; Credit rating 
and —; Nondifferentiable optimization: subgradient —; 
Shape selective zeolite separation and catalysis: —; Solving 
hemivariational inequalities by nonsmooth — 
optimization methods for harmonic retrieval see: Global — 
optimization methods for systems of nonlinear equations see: 
Global — 
optimization: minimax problems see: Nondifferentiable — 
optimization: mixed-integer linear programs see: Robust — 
optimization model see: continuous global — 
optimization modeling see: Emergency evacuation — 
optimization modeling framework see: multiperiod — 
optimization: models, algorithms and software see: 
Continuous global — 
optimization models for data classification see: Deterministic 
and probabilistic — 
optimization in multiplicative programming see: Global — 
optimization in neural network training see: Unconstrained — 
optimization: a new paradigm see: Modeling languages in — 
optimization: Newton-Cauchy framework see: Unconstrained
nonlinear — 
optimization: Newton method see: Nondifferentiable — 
optimization with noises 
[90C15, 90C30, 90C99] 
(see: SSC minimization algorithms) 
optimization of nonlinear problems see: Simultaneous 
estimation and — 
Optimization in operation of electric and energy power 
systems 
(90C35, 90C30, 90C10) 
(referred to in: Derivatives of markov processes and their 
simulation; Derivatives of probability and integral 
functions: general theory and examples; Derivatives of 
probability measures; Discrete stochastic optimization) 
optimization: optimality conditions see: Quasidifferentiable — 
optimization oracle see: linear discrete — 
optimization: p-αBB approach see: Global —
optimization paradigm 
[90C10, 90C26, 90C30] 
(see: Optimization software) 
optimization: parameter estimation see: Entropy — 
optimization: parametric programming see: 
Nondifferentiable — 
optimization: pareto optimal solutions, properties see: 
Multi-objective — 
optimization in phase and chemical reaction equilibrium see: 
Global — 
optimization of planar multilayered dielectric structures see: 
Global — 


optimization problem 
[65K10, 65M60, 90C26, 90C31] 
(see: Robust global optimization; Variational inequalities) 
optimization problem 
[65K10, 65M60] 
(see: Variational inequalities) 
optimization problem see: Cc! _, canonical monotonic —; 
combinatorial —; convex —; dual —; fractional 
combinatorial —; global —; global constrained —; global 
unconstrained —; integer —; integral linear fractional 
combinatorial —; Lagrangian dual —; linear —; 
max-min-max —; mixed integer —; nonconvex —; 
nonlinear —; nonseparable —; one-parametric finite —; 
parametric —; primal —; semi-infinite —; separable —; 
set-valued —; standard quadratic —; stochastic dynamic —; 
unary —; unconstrained — 
optimization problem in standard form see: linear — 
optimization problems 
[68Q25, 68R10, 68W40, 90C26, 90C27, 90C30, 90C59, 90C90] 


(see: Domination analysis in combinatorial optimization; 
Planning in the process industry; Smooth nonlinear 
nonconvex optimization; Successive quadratic 
programming: applications in the process industry) 
optimization problems see: Approximations to robust conic —; 
combinatorial —; computational complexity of —; 
Continuous reformulations of discrete-continuous —; 
Convex envelopes in —; Decomposition algorithms for the 
solution of multistage mean-variance —; discrete 
monotonic —; discretization of —; General moment —; 
Laplace method and applications to —; linearly 
constrained —; Modeling difficult —; nonlinear —; 
nonsmooth —-; Price of robustness for linear —; 
semi-infinite —; stability analysis of —; stochastic linear — 
optimization problems: algorithms see: Standard quadratic — 
optimization problems: applications see: Standard 
quadratic — 
optimization problems: theory see: Standard quadratic — 
Optimization problems in unit-disk graphs 
(05C85, 05C69, 05C15, 05C62, 90C27, 90C59) 
(referred to in: Broadcast scheduling problem) 
optimization procedure see: Direct search Luus-Jaakola —
optimization process see: automated design — 
optimization in protein folding see: Global — 
optimization: relaxation methods see: Nondifferentiable — 
optimization: sensitivity see: Parametric global — 
optimization: shannon measure of entropy and its properties 
see: Entropy — 
optimization-simulation 
(see: Emergency evacuation, optimization modeling) 
Optimization software 
(90C30, 90C26, 90C10) 
(referred to in: Continuous global optimization: models, 
algorithms and software; Large scale unconstrained 
optimization; Modeling languages in optimization: a new 
paradigm) 
(refers to: Continuous global optimization: models, 
algorithms and software; Large scale unconstrained 
optimization; Modeling languages in optimization: a new 
paradigm) 
optimization: stability of dynamic systems see:
Quasidifferentiable —
optimization: stopping rules see: Stochastic global — 
Optimization strategies for dynamic systems 

(93-XX, 65L99) 

(referred to in: Control vector iteration CVI; Dynamic
programming: continuous-time optimal control; Dynamic
programming: infinite horizon problems, overview;
Dynamic programming and Newton’s method in
unconstrained optimal control; Dynamic programming:
optimal control applications; Dynamic programming:
stochastic shortest path problems;
Hamilton-Jacobi-Bellman equation; Infinite horizon
control and dynamic games; Quasidifferentiable
optimization: stability of dynamic systems; Suboptimal
control)
(refers to: Dynamic programming: continuous-time optimal
control; Dynamic programming: infinite horizon problems,
overview; Dynamic programming and Newton’s method in
unconstrained optimal control; Dynamic programming:
optimal control applications; Dynamic programming:
stochastic shortest path problems;
Hamilton-Jacobi-Bellman equation; Infinite horizon
control and dynamic games; Quasidifferentiable
optimization: stability of dynamic systems)
optimization: subgradient optimization methods see:
Nondifferentiable —
optimization system 

[90C10, 90C26, 90C30] 

(see: Optimization software) 
optimization system see: generalized network — 
optimization techniques see: Estimating data for multicriteria
decision making problems: —; Load balancing for
parallel —

Optimization techniques for minimizing the energy function 

[90C90] 

(see: Optimization in medical imaging) 

Optimization techniques for phase retrieval based on 
single-crystal X-ray diffraction data 

optimization: theorems of the alternative see: Linear — 

optimization: tight convex underestimators see: Global — 

optimization over a trajectory 

[93-XX] 

(see: Dynamic programming: optimal control applications) 
optimization: two-phase methods see: Stochastic global — 
optimization over unbounded domains see: global — 
optimization under network constraints 

[65K05, 90C26, 90C30] 

(see: Monotonic optimization) 
optimization using space filling see: Global — 
optimization using terrain/funneling methods see: Multi-scale
global —
optimization: variational formulations see:
Quasidifferentiable —
optimization in a vector space 

[94A08, 94A17] 

(see: Maximum entropy principle: image reconstruction) 
Optimization in water resources 

(90C30, 90C35) 

(referred to in: Global optimization in the analysis and
management of environmental systems)
(refers to: Global optimization in the analysis and
management of environmental systems)

optimization in water resources see: stochastic approach to — 

optimization in Weber's problem with attraction and repulsion 
see: Global — 

optimization in well scheduling see: Mixed integer — 

optimized transportation network see: system- —; user- — 

optimizer 
[90C90] 
(see: Design optimization in computational fluid dynamics) 

optimizer see: global —; strict local — 

optimizer coupling see: model/ — 

optimizing environment see: system- —; user- — 

Optimizing facility location with euclidean and rectilinear 
distances 
(90B80, 90B85) 
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Single facility location: circle covering problem; Single 
facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 
(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Production-distribution system design problem; Resource 
allocation for epidemic control; Single facility location: 
circle covering problem; Single facility location: 
multi-objective euclidean distance location; Single facility 
location: multi-objective rectilinear distance location; 
Stochastic transportation and location problems; Voronoi 
diagrams in facility location; Warehouse location problem) 

optimum see: conditions for a constrained —; conditions for 
an unconstrained —; constrained global —; global —; 
local —; Pareto —; unconstrained — 

optimum search see: global — 

optimum search with enhanced positioning see: Gene 
clustering: A novel decomposition-based clustering 
approach: global — 

optimum solution 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

option 
[91B50] 
(see: Financial equilibrium) 

(or disk) representation see: geometric — 

OR-ing 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations) 

OR-methodology 

[90B80, 90B85]

(see: Warehouse location problem) 

(or Minty) GVI see: dual — 

oracle 

[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource 
allocation problems; Inference of monotone boolean 
functions; Maximum constraint satisfaction: relaxations 
and upper bounds) 

oracle see: augmentation —; comparison —; linear discrete
optimization —; membership —; oriented augmentation — 

orbit 

[90C26, 90C90] 

(see: Global optimization in binary star astronomy)

orbit see: satellite — 

orbital period 

[26A24, 65K99, 85-08] 

(see: Automatic differentiation: geometry of satellites and
tracking stations)

orbits determination 

[90C26, 90C90] 

(see: Global optimization in binary star astronomy)

order see: antisymmetric partial —; first order theory of real
addition with —; interval —; lexicographical —; Lowner
partial —; partial —; pre- —; pseudo- —; quasi- —; 
semi- —; weak — 


order active see: p- — 

order adjoints see: second — 

order approximating cone see: high- —; tangent high- — 
order approximating cone of decrease see: high- — 


order approximating cones see: feasible high- —; tangent 
high- — 
order approximating curve see: feasible high- —; high- —;
tangent high- —

order approximating vector see: feasible high- — 

order approximating vector of decrease see: high- — 

order approximating vectors see: high- — 

order approximation see: second — 

order approximation of a function see: first — 

order changes see: up to first — 

order closure of a relation see: local pre- —; pre- — 

order codifferential see: second — 

Order complementarity 
(90C33) 
(referred to in: Continuous reformulations of 
discrete-continuous optimization problems; Equivalence 
between nonlinear complementarity problem and fixed 
point problem; Generalized nonlinear complementarity 
problem; Integer linear complementary problem; LCP: 
Pardalos-Rosen mixed integer formulation; Linear
complementarity problem; Principal pivoting methods for
linear complementarity problems; Topological methods in
complementarity theory)
(refers to: Convex-simplex algorithm; Equivalence between 
nonlinear complementarity problem and fixed point 
problem; Generalized nonlinear complementarity problem; 
Integer linear complementary problem; LCP: 
Pardalos-Rosen mixed integer formulation; Lemke 
method; Linear complementarity problem; Linear 
programming; Parametric linear programming: cost 
simplex algorithm; Principal pivoting methods for linear 
complementarity problems; Sequential simplex method; 
Topological methods in complementarity theory) 

order complementarity 
[90C33] 
(see: Order complementarity) 

order complementarity problem 
[90C33]
(see: Order complementarity) 

order complementarity problem see: general —; 
generalized —; generalized linear —; implicit general —; 
infinite-dimensional generalized —; linear —; nonlinear — 

order cone 
[90C26] 
(see: Invexity and its applications) 

order cone see: second — 

order cones of decrease see: high- — 

order constrained hierarchical clustering 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 

order constrained partitioning 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 

order constraint qualification see: first —; second — 

order constraint qualifications see: First —; Second — 

order CQ see: First —; second — 

order critical direction see: high- — 

order decomposition of a function see: second — 

order derivatives see: higher- — 

order differential equations see: Duality in optimal control with 
first — 

order directional derivative see: generalized second — 

order directional derivatives see: high- —; higher- — 

order earliness see: minimization of — 

order feasible cones see: high- — 

order feasible set see: high- —; p- — 

order form of coordinates see: kth — 

order generalization of Lyusternik theorem see: high- — 

order of a graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 

order growth see: second — 

order hyperdifferential see: second — 

order hypodifferential see: kth —; second — 

order of an inclusion function 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 

order isotonic regression see: simple — 

order KKT conditions see: first — 

order Lagrangian theory of CNSO problems see: second — 

order local maximum principle see: high- — 

order local maximum principle for Lagrangian problems see: 
high- — 
order local minimum condition see: high- —
order of magnitude 

[90C60] 

(see: Computational complexity theory) 
order maximum principle for abnormal extremals see: High- — 
order necessary condition see: first —; second — 
order necessary conditions see: first —; second — 
order necessary conditions for optimality see: high- — 
order necessary conditions for optimality for abnormal points
see: High- —
order necessary optimality conditions see: first —
order necessary and sufficient optimality conditions see:
second —
order optimality see: first —
order optimality condition see: second —
order optimality conditions see: first order and second —;
Semi-infinite programming: second —
order optimality conditions for nonlinear optimization see:
Second —
order partial differential equations see: First — 
order preserving assignment problem 

[90C05, 90C10, 90C27, 90C35] 

(see: Assignment and matching) 
order procedures see: second — 
order quantity see: economic — 
order regular set see: second — 
order relation 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations) 
order relation see: local —; partial —
order restricted statistical inference 

[41A30, 62J02, 90C26] 

(see: Regression by special functions: algorithms and
complexity)
order restriction 

[62G07, 62G30, 65K05] 

(see: Isotonic regression problems) 
order and second order optimality conditions see: first — 
order set of decrease see: high- — 
order spectrum see: higher- — 
order statistics see: higher- —; Signal processing with
higher —
order sufficiency see: second —; strong second —
order sufficient condition see: general second —; general
strong second —; second —; strong second —
order sufficient conditions see: second — 
order of a T-coloring frequency assignment 

[05-XX] 

(see: Frequency assignment problem) 
order tangent approximating vector see: high- — 
order tangent set see: first —; second — 
order tangent sets see: high- — 
order Taylor series expansion see: first — 
order theory of real addition with order see: first — 
ordered normed linear spaces see: Best approximation in — 
ordered partition 

[62H30, 90C27, 90C39] 

(see: Assignment methods in clustering; Dynamic
programming in clustering)


; pre- 


ordered partitioning 
[68Q25, 68R10, 68W40, 90C27, 90C59] 

(see: Domination analysis in combinatorial optimization) 
ordered set 

[90C29] 

(see: Preference modeling) 

ordered spaces see: semi- — 

ordered vector spaces 
[90C33] 

(see: Order complementarity) 

Ordering see: cobipartite neighborhood edge elimination —; 
lexicographic —; linear —; minimum degree —; 
zero-Inventory — 

ordering on binary vectors 
[90C09] 

(see: Inference of monotone boolean functions) 
ordering cones 

[90C29] 

(see: Vector optimization) 

ordering for n-dimensional vectors see: lexicographical — 

ordering and perturbation see: lexicographic — 

ordering policy see: one-for-one — 

ordering problem see: Linear — 

orders see: partial — 

ordinal argument 

[90-XX] 

(see: Outranking methods) 

ordinal criterion 

[90C29, 91A99]

(see: Preference disaggregation)

ordinal optimization 

[90C15, 90C27] 

(see: Discrete stochastic optimization)

ordinal regression 

[90C26, 91B28] 

(see: Portfolio selection and multicriteria analysis)

ordinary 

[65G20, 65G30, 65G40] 

(see: Interval analysis: systems of nonlinear equations) 

ordinary differential equations 

[49K05, 49K10, 49K15, 49K20] 

(see: Duality in optimal control with first order differential
equations)

ordinary differential equations 

[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX]

(see: Resource allocation for epidemic control)

ordinary differential equations see: Eigenvalue enclosures 
for — 

ordinary NP-complete reductions 

[68Q25, 90C60] 

(see: NP-complete problems and proof methodology)

ordinary NP-completeness 

[68Q25, 90C60] 

(see: NP-complete problems and proof methodology)

organization see: self- — 

orientable matroid 

[90C09, 90C10] 

(see: Oriented matroids)
orientation
[90B35] 
(see: Job-shop scheduling problem) 
orientation see: circuit —; complete — 
orientation optimization see: beam angle selection and 
wedge — 
orientation of an oriented matroid see: basis — 
oriented approach see: equation — 
oriented augmentation oracle 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
oriented branch and bound method see: arc —; node — 
oriented construction procedure see: arc —; node — 
oriented differentiation see: goal- — 
oriented matroid see: acyclic —; bases of an —; basis 
orientation of an —; totally acyclic —; vector of an — 
Oriented matroids 
(90C09, 90C10)
(referred to in: Least-index anticycling rules; Lexicographic 
pivoting rules; Matroids) 
(refers to: Matroids)
oriented matroids 
[05B35, 65K05, 90C05, 90C09, 90C10, 90C20, 90C33] 
(see: Criss-cross pivoting rules; Oriented matroids) 
oriented matroids 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules; Least-index anticycling 
rules; Lexicographic pivoting rules) 
oriented matroids see: axiom systems for — 
origin see: neighbors of the — 
origin tracing 
(see: Planning in the process industry) 
Orlicz theorem see: Mazur- — 
Orlicz version of the Hahn-Banach theorem see: Mazur- —
Orlik-Solomon algebra 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
orthogonal collocation 
[90C30]
(see: Suboptimal control) 
orthogonal condition 
[90C30] 
(see: Image space approach to optimization)
orthogonal factorization 
[15A23, 65F05, 65F20, 65F22, 65F25] 
(see: QR factorization)
orthogonal factorization see: complete — 
orthogonal matrix 
[90C09, 90C10] 
(see: Combinatorial matrix analysis)
orthogonal matrix 
[15A39, 90C05] 
(see: Farkas lemma) 


orthogonal matroid 
[90C09, 90C10] 
(see: Oriented matroids) 

orthogonal polynomials 
[33C45, 65F20, 65F22, 65K10, 90C30] 
(see: Generalized total least squares; Least squares 
orthogonal polynomials) 

orthogonal polynomials 
[33C45, 65F20, 65F22, 65K10] 
(see: Least squares orthogonal polynomials) 

orthogonal polynomials see: Least squares —; least squares 
formal — 

orthogonal projection 

[65K10, 65M60]

(see: Variational inequalities) 

orthogonal projection operator 

[90C30]

(see: Rosen’s method, global convergence, and Powell’s
conjecture)

orthogonal search directions 

[90C30]

(see: Rosenbrock method) 

orthogonal signed sets 

[90C09, 90C10]

(see: Oriented matroids) 

orthogonal transform 

[15A23, 65F05, 65F20, 65F22, 65F25]
(see: QR factorization) 

orthogonal transformations see: elementary — 

Orthogonal triangularization 
(65F25, 15A23, 65F05, 65F20, 65F22) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; Cholesky factorization; Interval linear 
systems; Large scale trust region problems; Large scale 
unconstrained optimization; Overdetermined systems of 
linear equations; QR factorization; Solving large scale and 
sparse semidefinite programs; Symmetric systems of linear 
equations) 
(refers to: ABS algorithms for linear equations and linear 
least squares; Cholesky factorization; Interval linear 
systems; Large scale trust region problems; Large scale 
unconstrained optimization; Linear programming; 
Overdetermined systems of linear equations; QR 
factorization; Solving large scale and sparse semidefinite 
programs; Symmetric systems of linear equations) 

orthogonality conditions on multipliers 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 

orthogonalization see: classical Gram-Schmidt —; 
Gram-Schmidt —; modified Gram-Schmidt —

orthogonalization scheme see: sequential row — 

orthogonally scaled subclass 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 

orthonormal representation 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovász number)

orthonormalization see: hybrid — 
OSL


[90C10, 90C26, 90C30] 
(see: Optimization software) 


OSLE 


[65D10, 65K05] 

(see: Overdetermined systems of linear equations) 
Out see: first-In-First- —; pricing- — 

out events see: drought — 


out rule see: first-in last- — 


out trip see: pull- — 


outcome 


[90C26]
(see: Global optimization using space filling) 


outcome set 


[90C29]

(see: Multi-objective optimization: Interactive methods for
preference value functions)

outcome space 

[65K05, 90B50, 90C05, 90C29, 91B06]

(see: Multi-objective optimization and decision support 
systems) 

outer approximation 

[49M37, 90C05, 90C11, 90C25, 90C26, 90C29, 90C30, 90C34,
90C90] 

(see: Bilevel optimization: feasibility test and flexibility 
index; Concave programming; MINLP: branch and bound 
methods; MINLP: design and scheduling of batch processes; 
Mixed integer nonlinear programming; Multi-objective 
optimization: interaction of design and control; 
Semi-infinite programming: discretization methods) 

outer approximation 

[49M20, 49M37, 90C11, 90C26, 90C30] 

(see: Cutting plane methods for global optimization; 
Generalized outer approximation; Mixed integer nonlinear
programming)
outer approximation see: Generalized —; hybrid branch and 
bound and —; linear —; Logic-based —; quadratic —


outer approximation algorithm 

[90C10, 90C11, 90C26] 

(see: MINLP: branch and bound global optimization 
algorithm; MINLP: outer approximation algorithm) 
outer approximation algorithm see: MINLP: — 

outer approximation with equality relaxation 

[65K05, 90C11, 90C26, 90C29, 90C90] 

(see: MINLP: global optimization with αBB; Multi-objective
optimization: interaction of design and control) 

outer approximation with equality relaxation and augmented 
penalty 

[90C11, 90C29, 90C90]

(see: Multi-objective optimization: interaction of design 
and control) 

outer approximation method 

[90C26]

(see: Cutting plane methods for global optimization) 
outer approximation method 

[90C09, 90C10, 90C11]

(see: MINLP: logic-based methods; MINLP: outer 
approximation algorithm) 

outer-approximation method see: Logic-based — 


outer linearization cone 


[90C31, 90C34, 90C46] 


(see: Generalized semi-infinite programming: optimality
conditions) 

outer problem 

[90C25, 90C29, 90C30, 90C31] 

(see: Bilevel programming: optimality conditions and
duality) 

outgoing arc 

[90C35] 

(see: Minimum cost flow problem)

outline of filled function methods see: basic — 

output-efficient 

[90B30, 90B50, 90C05, 91B82] 

(see: Data envelopment analysis)

output/input see: maximization of — 

output matrices see: updating input- — 

output neurons 

[90C27, 90C30] 

(see: Neural networks for combinatorial optimization)
output-polynomial 

[52B12, 68Q25] 

(see: Fourier-Motzkin elimination method)
output-polynomial time 

[52B12, 68Q25] 

(see: Fourier-Motzkin elimination method)

output tables see: triangulation problem for input- — 
outranking 

[90-XX] 

(see: Outranking methods)

Outranking methods 

(90-XX)

(referred to in: Bi-objective assignment problem; Decision
support systems with multiple criteria; Estimating data for
multicriteria decision making problems: optimization
techniques; Financial applications of multicriteria analysis;
Fuzzy multi-objective linear programming; Multicriteria
sorting methods; Multi-objective combinatorial
optimization; Multi-objective integer linear programming;
Multi-objective optimization and decision support systems;
Multi-objective optimization: interaction of design and
control; Multi-objective optimization: Interactive methods
for preference value functions; Multi-objective
optimization: Lagrange duality; Multi-objective
optimization: Pareto optimal solutions, properties;
Multiple objective programming support; Portfolio 
selection and multicriteria analysis; Preference 
disaggregation; Preference disaggregation approach: basic 
features, examples from financial decision making; 
Preference modeling) 

(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization: Interactive methods
for preference value functions; Multi-objective
optimization: Lagrange duality; Multi-objective
optimization: Pareto optimal solutions, properties;
Multiple objective programming support; Portfolio
selection and multicriteria analysis; Preference 
disaggregation; Preference disaggregation approach: basic 
features, examples from financial decision making; 
Preference modeling) 
outranking methods 
[90-XX] 
(see: Outranking methods) 
outranking relation 
[90-XX, 90C29] 
(see: Outranking methods; Preference disaggregation 
approach: basic features, examples from financial decision 
making) 
outranking relation 
[90C26, 90C29, 91B28] 
(see: Multicriteria sorting methods; Portfolio selection and 
multicriteria analysis) 
outranking relation see: fuzzy — 
outranking relations 
[90C29, 91B06, 91B60] 
(see: Financial applications of multicriteria analysis;
Multicriteria sorting methods) 
outranking relations approach 
[90C29] 
(see: Decision support systems with multiple criteria)
outward rounding 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval fixed point theory)
over see: strongly linearly monotonic — 
over an ellipsoid see: Quadratic programming — 
over surface formula see: integral — 
over a trajectory see: optimization — 
over unbounded domains see: global optimization — 
over a volume see: integral — 
over volume formula see: integral — 
overall classification error see: minimizing the — 
overall flowsheet see: convergence of the — 
overall mean method 
[91B28] 
(see: Portfolio selection: Markowitz mean-variance model)
overdetermined system of nonlinear equations 
[90C30] 
(see: Nonlinear least squares problems)
Overdetermined systems of linear equations 
(65K05, 65D10) 
(referred to in: ABS algorithms for linear equations and
linear least squares; Cholesky factorization; Interval linear 
systems; Large scale trust region problems; Large scale 
unconstrained optimization; Orthogonal triangularization; 
QR factorization; Solving large scale and sparse 
semidefinite programs; Symmetric systems of linear 
equations) 
(refers to: ABS algorithms for linear equations and linear 
least squares; Cholesky factorization; Interval linear 
systems; Large scale trust region problems; Large scale 
unconstrained optimization; Linear programming; 
Nonlinear least squares: trust region methods; Orthogonal 
triangularization; QR factorization; Solving large scale and 
sparse semidefinite programs; Symmetric systems of linear 
equations) 


overdetermined Yule-Walker method 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval) 
overhead 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
overhead factor see: search — 
overlap see: contact map — 
overlap graph 
[90C10, 90C27, 94C15] 
(see: Graph planarization) 
overlap of intervals 
[90C10, 90C27, 94C15] 
(see: Graph planarization) 
overlap maximization problem, CMO see: Contact map — 
overloaded operator 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval Taylor
operators) 
overloading see: operator — 
overprojection
[90C30] 
(see: Relaxation in projection methods) 
overrelaxation see: successive — 
Overtaking equilibrium 
[49Jxx, 91Axx] 
(see: Infinite horizon control and dynamic games) 
overtaking optimality 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
overtaking optimality see: weakly — 
overview see: Bilevel programming: introduction, history 
and —; Dynamic programming: infinite horizon 
problems — 


P


[90C60] 
(see: Complexity classes in optimization) 
P see: complexity class —; covers all edge-directions of —; N 
P=co N—; P=N — 
Px 
[90C35] 
(see: Multicommodity flow problems) 
P=NP 
[49-01, 49K45, 49N10, 90-01, 90C20, 90C27, 91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs) 
P-algorithm 
[60J65, 68Q25]
(see: Adaptive global search) 
p-αBB approach see: Global optimization: —
p-center problem 
[90B80, 90B85] 
(see: Warehouse location problem) 
p-center problem
[90B10, 90B80, 90B85, 90C35] 
(see: Network location: covering problems; Warehouse 
location problem) 
p-center problem on a network 
[90B10, 90B80, 90C35] 
(see: Network location: covering problems) 
P convergence 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals) 
p-CP 
[90B80, 90B85] 
(see: Warehouse location problem) 
p-form see: logarithmic —; rational — 
P-function see: uniform — 
p-matrix 
[05B35, 65K05, 90C05, 90C20, 90C25, 90C30, 90C33, 90C55] 
(see: Criss-cross pivoting rules; Implicit Lagrangian;
Least-index anticycling rules; Principal pivoting methods 
for linear complementarity problems; Splitting method for 
linear complementarity problems) 
P0-matrix
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
P*-matrix
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
p-median location-allocation problem 
[90C26] 
(see: MINLP: application in facility location-allocation) 
p-median problem 
[90B80, 90B85] 
(see: Warehouse location problem) 
p-median problem 
[90-08, 90B80, 90B85, 90C26, 90C27, 90C59, 90Cxx, 91Axx,
91Bxx] 
(see: Facility location with externalities; Variable 
neighborhood search methods; Warehouse location 
problem) 
p-MP 
[90B80, 90B85] 
(see: Warehouse location problem) 
p-order active 
[41A10, 46N10, 47N10, 49K27] 
(see: High-order necessary conditions for optimality for 
abnormal points) 
p-order feasible set 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
p-partition 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
p-regular 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 


p-regular operator 

[41A10, 46N10, 47N10, 49K27]

(see: High-order necessary conditions for optimality for
abnormal points)

p-simplex 

[90C30]

(see: Simplicial decomposition) 

p-VSP 

[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 

(see: Vehicle scheduling)

P=co NP see: N— 

PA 

[90C26] 

(see: Cutting plane methods for global optimization)

PA of SA 

[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms)

package see: block truncated Newton software —; computer 
algebra —; multiple-class software —; single-class 
software —; variable precision interval — 

package of basic software routines 
[90C10, 90C26, 90C30] 
(see: Optimization software) 

package flow problem 
[90C35] 
(see: Multicommodity flow problems) 

package for specific mathematical areas see: software — 

Packet annealing 
(92B05) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Genetic algorithms; Genetic algorithms for 
protein structure prediction; Global optimization based on 
statistical models; Global optimization in Lennard-Jones 
and Morse clusters; Graph coloring; Molecular structure
determination: convex global underestimation;
Monte-Carlo simulated annealing in protein folding;
Multiple minima problem in protein folding: αBB global
optimization approach; Phase problem in X-ray 
crystallography: Shake and bake approach; Random search 
methods; Simulated annealing; Simulated annealing 
methods in protein folding; Stochastic global optimization: 
stopping rules; Stochastic global optimization: two-phase 
methods) 
(refers to: Bayesian global optimization; Genetic algorithms; 
Genetic algorithms for protein structure prediction; Global 
optimization based on statistical models; Global 
optimization in Lennard-Jones and Morse clusters; Global
optimization in protein folding; Molecular structure 
determination: convex global underestimation; 
Monte-Carlo simulated annealing in protein folding; 
Multiple minima problem in protein folding: αBB global
optimization approach; Phase problem in X-ray 
crystallography: Shake and bake approach; Protein folding: 
generalized-ensemble algorithms; Random search methods; 
Simulated annealing; Simulated annealing methods in 
protein folding; Stochastic global optimization: stopping 
rules; Stochastic global optimization: two-phase methods) 
packing
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: cutting plane algorithms) 
packing see: vertex — 
packing game 
[90C27, 90C60, 91A12] 
(see: Combinatorial optimization games) 
packing and partitioning problems see: Set covering — 
packing problem
[90C35]
(see: Feedback set problems)
packing problem see: bandwidth —; bin —; graph —; set —
Padé approximation
[33C45, 65F20, 65F22, 65K10]
(see: Least squares orthogonal polynomials)
Padé-type approximation
[33C45, 65F20, 65F22, 65K10]
(see: Least squares orthogonal polynomials)
pADRE2
[65K05, 90C30]
(see: Automatic differentiation: calculation of the Hessian;
Automatic differentiation: point and interval Taylor
operators)
Painlevé-Kuratowski convergence see: discrete —
painting axioms
[90C09, 90C10]
(see: Oriented matroids)
pair
(see: Contact map overlap maximization problem, CMO)
pair see: admissible trajectory-control —; conjugate —;
convex-like function —; dual —; Fenchel duality —;
Legendre duality —; primal —; quasimonotone —
pair assignment algorithms
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)
pair decomposition see: well-separated —
pair decomposition of a monomial ideal see: standard —
pair-exchange neighborhood
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)
pair of a monomial ideal see: admissible —; standard —
pair of trajectory and control functions see: asymptotically
admissible —
pair of trajectory-function and control-function see:
admissible —
pair of variables see: complementary —
paired comparison
[90C29]
(see: Multi-objective optimization: Interactive methods for
preference value functions)
paired element see: left- —; right- —
paired set see: left- —; right- —
pairing see: crew —
pairs see: fuzzy interval —; left- —; right- —
pairs constrained path problem see: impossible —
pairwise comparisons
[90-XX, 90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques; Outranking methods)
pairwise comparisons 

[90C29] 

(see: Estimating data for multicriteria decision making
problems: optimization techniques)
pairwise judgment 

[90C29] 

(see: Estimating data for multicriteria decision making
problems: optimization techniques)
pairwise protein sequence alignment via mixed-integer linear
optimization see: Global —
pale edge 

[90C10, 90C27, 94C15] 

(see: Graph planarization) 
Palubeckis generator 

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem) 
panel see: blended — 

Pape method see: D’Esopo- —

paper converting 

[90C11, 90C90]

(see: MINLP: trim-loss problem) 

parabolic curve 

[49K27, 49K40, 90C30, 90C31]

(see: Second order constraint qualifications) 
parabolic curve approach 

[49K27, 49K40, 90C30, 90C31]

(see: Second order constraint qualifications) 
parabolic-exponential function 

[90C30]

(see: Image space approach to optimization) 
paradigm see: checklist —; disaggregation —; edge
insertion —; functional —; general dynamic
programming —; imperative programming —; MDO —;
Modeling languages in optimization: a new —;
optimization —
paradigm of logic programming 

[90C10, 90C30] 

(see: Modeling languages in optimization: a new paradigm) 
paradigm semantics for fuzzy logics see: Checklist — 
paradox see: Braess —; Condorcet — 
parallax 

[90C26, 90C90] 

(see: Global optimization in binary star astronomy) 
parallel 

[68T20, 68T99, 90C27, 90C59] 

(see: Contact map overlap maximization problem, CMO;
Metaheuristics)
parallel see: bulk synchronous — 
parallel AD tools 

[49-04, 65Y05, 68N20] 

(see: Automatic differentiation: parallel computation) 
parallel algorithm 

[65K05, 65Y05] 

(see: Parallel computing: models) 
parallel algorithm 

[65K05, 65Y05, 68W10, 90C27]

(see: Load balancing for parallel optimization techniques;
Parallel computing: models)
parallel algorithm design see: model for — 
parallel algorithms
[65G20, 65G30, 65G40, 65K05, 68T20, 68T99, 90C27, 90C30, 
90C59] 
(see: Interval global optimization; Metaheuristics) 

parallel aspiration search 

[49J35, 49K35, 62C20, 91A05, 91A40]

(see: Minimax game tree searching) 

Parallel Best-First Tree Search 

[68W10, 90C27]

(see: Load balancing for parallel optimization techniques) 

parallel CA algorithm see: asynchronous —; synchronized — 

parallel computation 

[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26]
(see: Information-based complexity and information-based 
optimization) 

parallel computation see: Automatic differentiation: — 

parallel computation in mechanics 

[49Q10, 74K99, 74Pxx, 90C90, 91A65]

(see: Multilevel optimization in mechanics) 

parallel computation thesis 

[03D15, 68Q05, 68Q15]

(see: Parallel computing: complexity classes) 

parallel computations 

[49-04, 65Y05, 68N20]

(see: Automatic differentiation: parallel computation) 

parallel computer 

[65K05, 65Y05, 90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and cut algorithms; 
Parallel computing: models) 

parallel computer see: bulk synchronous —; distributed 
memory — 

parallel computing 
[65K05, 65Y05] 
(see: Parallel computing: models) 

parallel computing 
[49-04, 65Y05, 68N20, 90C15] 
(see: Automatic differentiation: parallel computation; 
Stochastic programming: parallel factorization of 
structured matrices) 

parallel computing see: massively —; models for — 

Parallel computing: complexity classes 
(68Q05, 68Q15, 03D15) 
(referred to in: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Complexity classes in optimization; 
Complexity of degeneracy; Complexity of gradients, 
Jacobians, and Hessians; Complexity theory; Complexity 
theory: quadratic programming; Computational 
complexity theory; Fractional combinatorial optimization; 
Heuristic search; Information-based complexity and 
information-based optimization; Kolmogorov complexity; 
Load balancing for parallel optimization techniques; Mixed 
integer nonlinear programming; NP-complete problems 
and proof methodology; Parallel computing: models; 
Parallel heuristic search; Stochastic network problems: 
massively parallel solution) 
(refers to: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Complexity classes in optimization; 
Complexity of degeneracy; Complexity of gradients, 
Jacobians, and Hessians; Complexity theory; Complexity
theory: quadratic programming; Computational
complexity theory; Fractional combinatorial optimization; 
Heuristic search; Information-based complexity and 
information-based optimization; Interval analysis: parallel 
methods for global optimization; Kolmogorov complexity; 
Load balancing for parallel optimization techniques; Mixed 
integer nonlinear programming; Parallel computing: 
models; Parallel heuristic search; Stochastic network 
problems: massively parallel solution) 

Parallel computing: models 
(65K05, 65Y05) 
(referred to in: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Heuristic search; Load balancing for parallel 
optimization techniques; Parallel computing: complexity 
classes; Parallel heuristic search; Stochastic network 
problems: massively parallel solution) 
(refers to: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Heuristic search; Interval analysis: parallel 
methods for global optimization; Load balancing for 
parallel optimization techniques; Parallel computing: 
complexity classes; Parallel heuristic search; Stochastic 
network problems: massively parallel solution) 

parallel cuts 
[90C05] 
(see: Ellipsoid method) 

Parallel Depth-First Tree Search 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 

parallel factorization of structured matrices see: Stochastic 
programming: — 

parallel graph see: series- — 

parallel GRASP 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 

Parallel heuristic search 
(68W10, 68W15, 68R05, 68T20)
(referred to in: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Heuristic search; Load balancing for parallel 
optimization techniques; Parallel computing: complexity 
classes; Parallel computing: models; Stochastic network 
problems: massively parallel solution) 
(refers to: Asynchronous distributed optimization 
algorithms; Automatic differentiation: parallel 
computation; Heuristic search; Load balancing for parallel 
optimization techniques; Parallel computing: complexity 
classes; Parallel computing: models; Stochastic network 
problems: massively parallel solution) 

parallel library see: NAG — 

parallel machines 
[03D15, 68Q05, 68Q15] 
(see: Parallel computing: complexity classes) 

parallel machines see: distributed memory —; shared 
memory — 

parallel matrix factorization 
[90C15] 
(see: Stochastic programming: parallel factorization of 
structured matrices) 
parallel methods
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs) 
parallel methods for global optimization see: Interval 
analysis: — 
parallel minimax tree algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
parallel model see: bulk synchronous — 
parallel optimization 
[90C10, 90C26, 90C30] 
(see: Optimization software) 
parallel optimization techniques see: Load balancing for — 
parallel programming 
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: differential equations)
parallel programs see: AD of — 
parallel random access machine 
[65K05, 65Y05] 
(see: Parallel computing: models)
parallel routing algorithm 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching)
parallel savings algorithm 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees)
parallel solution see: Stochastic network problems: 
massively — 
parallel tangents 
[90C30] 
(see: Frank-Wolfe algorithm)
parallel-tangents algorithm 
[90C30] 
(see: Conjugate-gradient methods)
Parallel VNS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
parallelism see: AD-enabled —; data —; time — 
parallelism alignment problem see: constant degree — 
parallelization 
[65G20, 65G30, 65G40] 
(see: Interval analysis: systems of nonlinear equations) 
parallelization see: automatic — 
parallelizing the exploration of minimax trees 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
parameter 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric
programming)
parameter see: Bregman —; convexification —; exact
penalty —; globally optimal —; locally optimal —;
optimal —; penalty —; prohibition —; temperature —;
tuning —

parameter CG family see: two- — 

parameter computation see: conjugate gradient — 


parameter derivative 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 
parameter estimates 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 
parameter estimation 
[90C30, 90C52, 90C53, 90C55] 
(see: Gauss-Newton method: Least squares, relation to 
Newton’s method; Generalized total least squares) 
parameter estimation 
[62F10, 65D10, 65K05, 94A17] 
(see: Entropy optimization: parameter estimation; 
Overdetermined systems of linear equations) 
parameter estimation see: Bayesian —; Entropy 
optimization: — 
parameter estimation problem see: sinusoidal — 
parameter identification 
[34A55, 78A60, 90C30] 
(see: Optimal design in nonlinear optics) 
parameter identification 
[34A55, 78A60, 90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming; Optimal design in nonlinear optics) 
parameter identification problem 
[49K20, 49M99, 90C05, 90C25, 90C29, 90C30, 90C31, 90C55] 
(see: Nondifferentiable optimization: parametric 
programming; Sequential quadratic programming: interior 
point methods for distributed optimal control problems) 
parameter, reject index for interval optimization see: 
Algorithmic improvements using a heuristic — 
parameter T see: prohibition — 
parameter tractability see: fixed — 
parameter tractable algorithms see: fixed — 
parameterization see: control — 
parameterization method 
[90C11, 90C90] 
(see: MINLP: trim-loss problem) 
parameters 
(see: Planning in the process industry; Short-term 
scheduling under uncertainty: sensitivity analysis) 
parameters see: estimation of model —; problem —; 
sensitivity — 
parametric 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 
parametric approach 
(see: Fractional zero-one programming) 
parametric approach to fractional optimization 
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization) 
parametric approach to optimality 
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and 
duality) 
parametric bounds 
[90C11, 90C31]
(see: Multiparametric mixed integer linear programming) 
parametric complementarity problems
[90C30, 90C33] 
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach) 
parametric eigenvalue formulation see: inverse 
interpolation — 
parametric eigenvalue reformulation 
[90C30] 
(see: Large scale trust region problems) 
parametric finite optimization problem see: one- — 
Parametric global optimization: sensitivity 
(90C31, 90C34, 90C34) 
(referred to in: Bounds and solution vector estimates for
parametric NLPS; Multiparametric linear programming;
Multiparametric mixed integer linear programming;
Nondifferentiable optimization: parametric programming;
Nonlocal sensitivity analysis with automatic differentiation;
Parametric linear programming: cost simplex algorithm;
Parametric mixed integer nonlinear optimization;
Parametric optimization: embeddings, path following and
singularities; Selfdual parametric method for linear
programs; Sensitivity analysis of complementarity
problems; Sensitivity analysis of variational inequality
problems; Sensitivity and stability in NLP; Sensitivity and
stability in NLP: approximation; Sensitivity and stability in
NLP: continuity and differential stability)
(refers to: Bounds and solution vector estimates for
parametric NLPS; Multiparametric linear programming;
Multiparametric mixed integer linear programming;
Nondifferentiable optimization: parametric programming;
Nonlocal sensitivity analysis with automatic differentiation;
Parametric linear programming: cost simplex algorithm;
Parametric mixed integer nonlinear optimization;
Parametric optimization: embeddings, path following and
singularities; Selfdual parametric method for linear
programs; Sensitivity analysis of complementarity
problems; Sensitivity analysis of variational inequality
problems; Sensitivity and stability in NLP; Sensitivity and
stability in NLP: approximation; Sensitivity and stability in
NLP: continuity and differential stability)

parametric linear complementarity problem 
[90C31, 90C33] 
(see: Sensitivity analysis of complementarity problems) 

parametric linear programming 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming; Parametric linear programming: cost 
simplex algorithm) 

parametric linear programming 
[90C05, 90C31] 
(see: Parametric linear programming: cost simplex 
algorithm) 

Parametric linear programming: cost simplex algorithm 
(90C05, 90C31) 
(referred to in: Bounds and solution vector estimates for 
parametric NLPS; Convex-simplex algorithm; Equivalence 
between nonlinear complementarity problem and fixed 
point problem; Generalized nonlinear complementarity 
problem; Global optimization in multiplicative 
programming; Integer linear complementary problem; 
LCP: Pardalos-Rosen mixed integer formulation; Lemke
method; Linear complementarity problem; Linear
programming; Multiparametric linear programming; 
Multiparametric mixed integer linear programming; 
Multiplicative programming; Nondifferentiable 
optimization: parametric programming; Order 
complementarity; Parametric global optimization: 
sensitivity; Parametric mixed integer nonlinear 
optimization; Parametric optimization: embeddings, path 
following and singularities; Principal pivoting methods for 
linear complementarity problems; Selfdual parametric 
method for linear programs; Sequential simplex method; 
Topological methods in complementarity theory) 

(refers to: Bounds and solution vector estimates for 
parametric NLPS; Convex-simplex algorithm; Global 
optimization in multiplicative programming; Lemke 
method; Linear complementarity problem; Linear 
programming; Multiparametric linear programming; 
Multiparametric mixed integer linear programming; 
Multiplicative programming; Nondifferentiable 
optimization: parametric programming; Parametric global 
optimization: sensitivity; Parametric mixed integer 
nonlinear optimization; Parametric optimization: 
embeddings, path following and singularities; Selfdual 
parametric method for linear programs; Sequential simplex 
method) 


parametric lower bound
[90C11, 90C31]
(see: Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization) 


parametric method for linear programs see: Selfdual — 
parametric methods
[90C26]
(see: Global optimization in multiplicative programming) 


parametric mixed integer linear program see: single — 
Parametric mixed integer nonlinear optimization
(90C31, 90C11)
(referred to in: Bounds and solution vector estimates for 
parametric NLPS; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming: lagrangian relaxation; 
LCP: Pardalos-Rosen mixed integer formulation; MINLP:
trim-loss problem; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multiparametric linear programming; 
Multiparametric mixed integer linear programming; 
Nondifferentiable optimization: parametric programming; 
Parametric global optimization: sensitivity; Parametric 
linear programming: cost simplex algorithm; Parametric 
optimization: embeddings, path following and 
singularities; Selfdual parametric method for linear 
programs; Set covering, packing and partitioning problems; 
Simplicial pivoting algorithms for integer programming; 
Time-dependent traveling salesman problem) 

(refers to: Bounds and solution vector estimates for 
parametric NLPS; Branch and price: Integer programming 
with column generation; Decomposition techniques for 
MILP: lagrangian relaxation; Integer linear complementary 
problem; Integer programming; Integer programming:
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen 
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric linear programming; Multiparametric 
mixed integer linear programming; Nondifferentiable 
optimization: parametric programming; Parametric global 
optimization: sensitivity; Parametric linear programming: 
cost simplex algorithm; Parametric optimization: 
embeddings, path following and singularities; Selfdual 
parametric method for linear programs; Set covering, 
packing and partitioning problems; Simplicial pivoting 
algorithms for integer programming; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Time-dependent traveling 
salesman problem) 

parametric NLPS see: Bounds and solution vector estimates 
for — 

parametric nonlinear complementarity problem 
[90C31, 90C33] 
(see: Sensitivity analysis of complementarity problems) 

parametric nonlinear optimization 
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following 
and singularities) 

parametric objective simplex algorithm
[90C26]
(see: Global optimization in multiplicative programming)

parametric optimal control
[49M37, 90C11]
(see: MINLP: applications in the interaction of design and
control)

parametric optimization
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31,
90C33, 90C34]
(see: Design of robust model-based controllers via
parametric programming; Parametric optimization:
embeddings, path following and singularities)

parametric optimization 

[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31,
90C33, 90C34] 
(see: Parametric optimization: embeddings, path following 
and singularities) 

parametric optimization see: nonlinear — 

Parametric optimization: embeddings, path following and 
singularities 
(90C20, 90C25, 90C26, 90C29, 90C30, 90C31, 90C33, 90C34, 
65K05, 65K10) 
(referred to in: Bounds and solution vector estimates for 
parametric NLPS; Generalized semi-infinite programming: 
optimality conditions; Globally convergent homotopy 
methods; Multiparametric linear programming; 
Multiparametric mixed integer linear programming; 
Nondifferentiable optimization: parametric programming; 
Parametric global optimization: sensitivity; Parametric
linear programming: cost simplex algorithm; Parametric
mixed integer nonlinear optimization; Selfdual parametric 
method for linear programs; Topology of global 
optimization) 
(refers to: Bounds and solution vector estimates for 
parametric NLPS; Globally convergent homotopy methods; 
Multiparametric linear programming; Multiparametric 
mixed integer linear programming; Nondifferentiable 
optimization: parametric programming; Parametric global 
optimization: sensitivity; Parametric linear programming: 
cost simplex algorithm; Parametric mixed integer 
nonlinear optimization; Selfdual parametric method for 
linear programs; Topology of global optimization) 
parametric optimization problem 
[90C31, 90C34] 
(see: Semi-infinite programming: second order optimality 
conditions) 
parametric problem 
[68Q25, 68R05, 90-08, 90C27, 90C32] 
(see: Fractional combinatorial optimization) 
parametric programming 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality; Nondifferentiable optimization: parametric 
programming) 
parametric programming 
[90C25, 90C29, 90C30, 90C31, 90C34] 
(see: Bilevel programming: optimality conditions and 
duality; Parametric global optimization: sensitivity) 
parametric programming see: applications of —; convex —; 
Design of robust model-based controllers via —; history 
of —; instability in —; Nondifferentiable optimization: —; 
optimality in —; stability on —; stable —; structural stability 
in —; topological stability in — 
parametric programming model 
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric 
programming) 
parametric programming models 
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric 
programming) 
parametric programs see: robust — 
parametric representation 
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 
parametric representations see: necessary optimality condition 
without using (sub)gradients — 
parametric right-hand side simplex algorithm 
[90C26] 
(see: Global optimization in multiplicative programming) 
parametric rule 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
parametric search see: Megiddo — 
parametric semi-infinite optimization see: one- — 
parametric solutions see: comparison of —
parametric upper bound 
[90C11, 90C31]
(see: Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization) 
parametric upper and lower bounds 
[90C11, 90C31]
(see: Bounds and solution vector estimates for parametric 
NLPS; Parametric mixed integer nonlinear optimization) 
parametric variational inequalities 
[90C30, 90C33]
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach) 
parametric variational inequality problem 
[65K10, 90C31]
(see: Sensitivity analysis of variational inequality problems) 
parametrization see: control —; objective function — 
parametrized Sard theorem 
[65F10, 65F50, 65H10, 65K10]
(see: Globally convergent homotopy methods) 
Pardalos generator see: Li- — 
Pardalos-Rosen mixed integer formulation see: LCP: —
parent node reconstruction 
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and cut algorithms) 
parent of a vertex 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization) 
parergon 
[01A20]
(see: Archimedes and the foundations of industrial 
engineering) 
Paretian cone 
[90C29]
(see: Generalized concavity in multi-objective optimization; 
Vector optimization) 
Pareto efficient solution 
[90C29] 
(see: Vector optimization) 
Pareto optimal 
[49L20, 90C29, 90C39] 
(see: Dynamic programming: discounted problems; 
Multi-objective optimization; Interactive methods for 
preference value functions; Multi-objective optimization: 
pareto optimal solutions, properties; Planning in the 
process industry) 
Pareto optimal 
(see: Planning in the process industry) 
Pareto optimal solution 
[90B85, 90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points; Single facility location: 
multi-objective rectilinear distance location) 
Pareto optimal solution 
[90C11, 90C15, 90C29, 90C90] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points; Multi-objective 
optimization: interaction of design and control; 
Multi-objective optimization: pareto optimal solutions, 
properties) 
Pareto optimal solution see: M- —; weakly — 


Pareto optimal solution set 
[90C11, 90C29, 90C90] 
(see: Multi-objective optimization: interaction of design 
and control) 
Pareto optimal solutions 
[90C29, 90C70] 
(see: Fuzzy multi-objective linear programming) 
pareto optimal solutions, properties see: Multi-objective 
optimization: — 
Pareto optimality 
[90B85, 90C27] 
(see: Single facility location: multi-objective euclidean 
distance location; Time-dependent traveling salesman 
problem) 
Pareto optimality 
[90C29] 
(see: Vector optimization) 
Pareto optimality of MODP see: principle of — 
Pareto optimum 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and
duality) 
Pareto point 
[90C29] 
(see: Generalized concavity in multi-objective optimization)
Pareto race 
[90C29] 
(see: Multiple objective programming support) 
Pareto solution 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
Parlett factorization see: Bunch and — 
Parrott theorem 
[93D09] 
(see: Robust control)
parsing see: expression — 
part see: excess —; ideal —; nonideal —; seed — 
part of a function see: twice-differentiable — 
part map see: standard — 
PARTAN 
[90C30] 
(see: Frank-Wolfe algorithm)
PARTAN algorithm 
[90C30]
(see: Conjugate-gradient methods) 
partial assignment 
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization) 
partial calmness condition 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and
duality) 
partial completely positive matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems)
partial computability 
[90C26] 
(see: Global optimization using space filling)
partial computation of a Turing machine
[90C60] 
(see: Complexity classes in optimization) 
partial computation of a Turing machine see: length of a — 
partial contraction matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
partial definite matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
partial derivatives see: elementary —; matrix of second — 
partial differential equations 
[03H10, 49J27, 49K20, 49M99, 90C34, 90C55] 
(see: Semi-infinite programming and control problems; 
Sequential quadratic programming: interior point methods 
for distributed optimal control problems) 
partial differential equations see: First order — 
partial discretization 
[65L99, 93-XX] 
(see: Optimization strategies for dynamic systems)
partial distance matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
partial equilibrium 
[91B50] 
(see: Walrasian price equilibrium)
partial equilibrium 
[91B28, 91B50] 
(see: Spatial price equilibrium)
partial equilibrium model 
[91B28, 91B50] 
(see: Spatial price equilibrium)
partial Hermitian matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
partial information 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based
optimization) 
partial linearization 
[90C30] 
(see: Cost approximation algorithms)
partial matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
partial matrix see: completion of a — 
partial monotonicity 
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 
partial order 
[41A30, 47A99, 62G07, 62G30, 65K05, 65K10, 90C29] 
(see: Isotonic regression problems; Lipschitzian operators
in best approximation by bounded or continuous functions;
Vector optimization)

partial order see: antisymmetric —; Löwner —

partial order relation
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

partial orders
[90C29]
(see: Preference modeling)

partial proximal point algorithm
[90C30]
(see: Cost approximation algorithms)

partial semidefinite matrix
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems)

partial sequential normal compactness
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials)

partial-update Newton method
[90C30]
(see: Numerical methods for unary optimization)


partial updating
[90C05]
(see: Linear programming: interior point methods; Linear
programming: karmarkar projective algorithm)

partially 
(see: Mixed integer programming/constraint programming 
hybrid methods) 

partially asynchronous computation
[90C30]
(see: Cost approximation algorithms)

partially asynchronous iterative method
[90C30, 90C52, 90C53, 90C55]
(see: Asynchronous distributed optimization algorithms)

partially asynchronous operation
[90C30, 90C52, 90C53, 90C55]
(see: Asynchronous distributed optimization algorithms)

partially monotonous
[90C15, 90C29]
(see: Discretely distributed stochastic programs: descent
directions and efficient points)

partially separable function
[90C06]
(see: Large scale unconstrained optimization)

participant
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

participants see: subset of — 

partition 
[68Q25, 90C09, 90C10, 90C15, 90C60] 
(see: Matroids; NP-complete problems and proof 
methodology; Stochastic linear programs with recourse and 
arbitrary multivariate distributions) 


partition see: 2- —; 3- —; block of a —; left-collection of a —; 
ordered —; p- —; rectangular —; right-collection of a —; 
simplicial — 




partition flipping 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching) 
partition-flipping see: algorithm — 
partition hierarchy 
[62H30, 90C27, 90C39] 
(see: Assignment methods in clustering; Dynamic 
programming in clustering) 
partition identities see: primitive — 
partition matching 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching) 
partition matching see: Maximum — 
partition-matching-| see: algorithm — 
partition matching problem see: maximum — 
partition matroid 
[90C09, 90C10] 
(see: Matroids) 
partition number see: clique — 
partition problem see: rectangular — 
Partition Problem (MP) see: minimum — 
partition on a set 
[03B52, 03E72, 41A30, 47S40, 62J02, 68T27, 68T35, 68Uxx, 
90Bxx, 90C26, 91Axx, 91B06, 92C60]
(see: Boolean and fuzzy relations; Regression by special 
functions: algorithms and complexity) 
partitioned quasi-Newton method 
[90C06] 
(see: Large scale unconstrained optimization) 
partitioning 
[65K05] 
(see: Direct global optimization algorithm) 
partitioning see: adaptive —; Graph —; minimum clique —; 
order constrained —; ordered —; set — 
partitioning algorithm see: K-iterated tour —; nearest insertion 
optimal — 
partitioning method 
[90C35] 
(see: Multicommodity flow problems) 
partitioning problem see: graph —; k-way graph —; set — 
partitioning problems see: Set covering, packing and — 
partitions 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
partitions see: nested —; set Ltree of unused —; set Lreac of 
used —; set Rrree of unused —; set Rreac of used — 
partitions optimization see: Nested — 
partly convex problems 
[90C25, 90C26]
(see: Decomposition in global optimization) 
partly convex program 
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric 
programming) 
partly convex programs 
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and 
duality) 
pass theorem see: mountain — 


passenger trip 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

patch (COP) see: contract-or- — 

patching 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 

path 
[90C35]
(see: Minimum cost flow problem) 

path see: center —; central —; circular —; co-optimal —; 
critical —; directed —; dogleg —; edge-disjoint —; 
elementary connecting —; forward —; linear —; multiple 
dogleg —; node-disjoint —; nondominated —; optimal —; 
principal variation —; shortest —; stochastic shortest —; 
unique — 

path algorithm see: augmenting —; generic augmenting —; 
successive shortest — 

path algorithms see: generic shortest — 

path approach see: feasible —; infeasible — 

path coloring problem 
[05C85] 
(see: Directed tree networks) 

path contraction 
[90B06, 90B10, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 
90C59, 90C60, 90C90] 
(see: Shortest path tree algorithms; Traveling salesman 
problem) 

path cost 
[90C35]
(see: Multi-index transportation problems) 

path cost see: Hamiltonian — 

path decomposition
[68R10, 90C27]
(see: Branchwidth and branch decompositions)

path extension
[90B10, 90C27]
(see: Shortest path tree algorithms)

path flow formulation
[90B06, 90B15, 90B20, 91B50]
(see: Dynamic traffic networks; Traffic network
equilibrium)

path flow pattern see: feasible — 

path flows see: variational inequality formulation in — 

path following
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31,
90C33, 90C34]
(see: Parametric optimization: embeddings, path following
and singularities)


path following algorithm
[90C05]
(see: Linear programming: interior point methods)

path following algorithm for entropy optimization
[90C25, 90C51, 94A17]
(see: Entropy optimization: interior point methods)
path following approach
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
path following methods 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 
path following and singularities see: Parametric optimization: 
embeddings — 
path formulation 
[90C35]
(see: Multicommodity flow problems) 
path formulation of the multicommodity flow problem see: 
node- — 
path in a graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
path in a graph see: length of a — 
path length 
[62H30, 90C27] 
(see: Assignment methods in clustering) 
path length see: maximin —; maximum —; minimax —; 
minimum — 
path optimization see: sample- — 
path problem see: deterministic shortest —; impossible pairs 
constrained —; shortest —; stochastic shortest — 
path problems see: Dynamic programming: stochastic 
shortest —; stochastic shortest — 
path procedure see: next shortest — 
path-protection 
[46N10, 68M10, 90B18, 90B25] 
(see: Integer linear programs for routing and protection 
problems in optical networks) 
path relinking 
[65H20, 65K05, 68T20, 68T99, 90-01, 90B40, 90C10, 90C11, 
90C20, 90C27, 90C35, 90C59, 94C15] 
(see: Greedy randomized adaptive search procedures; Linear 
ordering problem; Metaheuristics) 
path routing pattern model see: single — 
path-string 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
path tree algorithms see: Shortest — 
path tree problem see: single source shortest — 
path tree problems see: shortest — 
paths see: Pivoting algorithms for linear programming 
generating two —; problem of finding shortest — 
pathwidth 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 
pattern see: cutting —; feasible path flow —; location —; 
positive-negative-zero —; strategy —; zero-nonzero — 
pattern based model see: glS design — 
pattern classification see: statistical — 
pattern of a matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
pattern of a matrix see: sign — 
pattern model see: fractional routing —; single path routing — 


pattern recognition 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 
pattern recognition see: Complementarity algorithms in —; 
statistical — 
pattern search 
[90C30] 
(see: Rosenbrock method) 
pattern search 
[90C30] 
(see: Cyclic coordinate method; Powell method; Rosenbrock 
method) 
pattern searches 
[90C30] 
(see: Cyclic coordinate method) 
patterns see: cutting —; word — 
patterns and graphs see: matrix — 
PAV 
[62G07, 62G30, 65K05] 
(see: Isotonic regression problems) 
payments see: game with side — 
payoff space see: resource- — 
PC clusters 
[65K05, 65Y05] 
(see: Parallel computing: models) 
PCP-LCP 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
PCR groups 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of Newton 
steps) 
PCSP see: computer code — 
pd 
[05C50, 15A48, 15A57, 90C25, 90C26, 90C39] 
(see: Matrix completion problems; Second order optimality 
conditions for nonlinear optimization) 
pD-VNS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods) 
PDS 
[90C10, 90C26, 90C30]
(see: Optimization software) 
Peano curve 
[90C26]
(see: Global optimization using space filling) 
Peano function 
[90C30]
(see: Image space approach to optimization) 
Peano map 
[62C10, 65K05, 90C10, 90C15, 90C26]
(see: Bayesian global optimization) 
Pearson chi-square statistic 
[62H30, 90C27]
(see: Assignment methods in clustering) 
penalties
[90C26] 
(see: Global optimization using space filling) 
penalty 
[90C11, 90C26] 
(see: Global optimization in batch design under 
uncertainty; MINLP: branch and bound methods) 
penalty see: down —; Driebeck-Tomlin —; exact —; outer 
approximation with equality relaxation and augmented —; 
up — 
penalty approach 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
penalty-based method 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 
penalty function 
[05C69, 05C85, 65K05, 68W01, 90C59, 90C90, 91B28, 93-XX] 
(see: Direct global optimization algorithm; Dynamic 
programming: optimal control applications; Heuristics for 
maximum clique and independent set; Robust 
optimization) 
penalty function see: Courant —; exact —; exact L∞- —; ℓ1 —;
|; exact —; logarithmic-quadratic barrier- — 
penalty function approach see: continuously differentiable 
exact — 
penalty function based algorithm see: exact — 
penalty functions 
[93-XX] 
(see: Direct search Luus-Jaakola optimization procedure)
penalty method see: Exact — 
penalty methods see: Quasidifferentiable optimization: 
exact —; regularity condition for — 
penalty parameter 
[90Cxx]
(see: Discontinuous optimization) 
penalty parameter see: exact — 
penalty technique 
[65K05, 90C20]
(see: Quadratic programming with bound constraints) 
Penrose conditions 
[65Fxx]
(see: Least squares problems) 
Penrose pseudo-inverse see: Moore- — 
PEP 
[49K99, 65K05, 80A10]
(see: Optimality criteria for multiphase chemical 
equilibrium) 
Peptide identification via mixed-integer optimization 
per stage see: average cost —; discounted problem with 
bounded cost — 
per stage problem see: average cost — 
per stage problems see: average cost —; Dynamic 
programming: average cost — 
perceptron algorithm 
[62H30, 68T10, 90C05] 
(see: Linear programming models for classification) 
perfect 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovász number)
perfect see: subgame — 


perfect b-matching problem
[90C05, 90C10, 90C27, 90C35]
(see: Assignment and matching)

perfect competition
[91B50]
(see: Financial equilibrium; Walrasian price equilibrium)


perfect dual see: formal — 

perfect duality
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

perfect duality see: Semi-infinite programming, semidefinite 
programming and — 

perfect duality from the view of linear semi-infinite programming 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

perfect graph theorem see: strong — 

perfect information see: expected value of — 

perfect-information game see: two-player zero-sum — 

perfect matching
[90C05, 90C10, 90C27, 90C35]
(see: Assignment and matching)

perfect matching problem
[90C10, 90C11, 90C27, 90C57]
(see: Integer programming)

perfect matchings
[05C85]
(see: Directed tree networks)

perfect offspring
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)

perfectly competitive
[91B28, 91B50]
(see: Spatial price equilibrium)

perfectly competitive equilibrium model
[91B28, 91B50]
(see: Spatial price equilibrium)

perfectly consistent case
[90C29]
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 

performance see: computational —; optimization of 
computational — 

performance computing see: high — 

performance computing system see: high — 

performance evaluation
[90B30, 90B50, 90C05, 91B82]
(see: Data envelopment analysis)

performance Fortran see: high — 

performance guarantee
[05C85, 90C35]
(see: Directed tree networks; Graph coloring)

performance guarantee see: worst-case — 

performance index
[93-XX]
(see: Direct search Luus-Jaakola optimization procedure;
Dynamic programming: optimal control applications)
performance index see: augmented —
performance measurement see: Supply chain — 
Performance profiles of conjugate-gradient algorithms for 
unconstrained optimization 
(49M07, 49M10, 90C06, 65K05)
performances see: Volume computation for polytopes: 
strategies and — 
perimeter 
(see: State of the art in modeling agricultural systems) 
period see: duty- —; orbital — 
period model see: single- — 
period routing 
[90B06] 
(see: Vehicle routing) 
periodic review model 
[90B50] 
(see: Inventory management in supply chains) 
permanent of a matrix 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
permanent residents 
(see: Emergency evacuation, optimization modeling) 
permutation see: neighborhood of a — 
permutation graph 
[90C35] 
(see: Feedback set problems)
permutation matrix 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem)
permutation QAP see: constant — 
permutations 
[05C85] 
(see: Directed tree networks) 
Perron-Frobenius theorem 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
person game see: cooperative case of a two- —; two- — 
person zero-sum game see: two- — 
perspective see: descriptive —; metric-based —; 
model-based —; normative —; prescriptive — 
perturbation see: data —; lexicographic ordering and — 
perturbation analysis 
[90C15] 
(see: Derivatives of probability measures) 
perturbation analysis see: infinitesimal — 
perturbation function 
[90C30] 
(see: Image space approach to optimization) 
perturbation methods 
[90C30] 
(see: Cost approximation algorithms) 
perturbation model see: right-hand side — 
perturbation problem see: right-hand side — 
perturbation technique 
[05B35, 65K05, 90C05, 90C20, 90C33, 90C60] 
(see: Complexity of degeneracy; Lexicographic pivoting 
rules) 


perturbations 
[90C05, 90C25, 90C30, 90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance; Semi-infinite programming, semidefinite 
programming and perfect duality) 

perturbations see: minimax observation problem under 
uncertainty with —; piecewise-constant —; Vasicek model 
with impulse — 

perturbative approximation 

[90C15, 90C26, 90C33]

(see: Stochastic bilevel programs) 

perturbed least squares problem 

[65Fxx]

(see: Least squares problems) 

perturbed system 

[90C60]

(see: Complexity of degeneracy) 

petrochemical industry 

[90C30, 90C90]
(see: MINLP: applications in blending and pooling 
problems) 

Petrov-Galerkin approach 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 

Petrov-Galerkin iteration 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 

phase see: allocation —; construction —; global —; 
intensification —; local —; minimum —; nonminimum —; 
two- — 

phase algorithm see: three — 

phase and chemical reaction equilibrium see: Global 
optimization in — 

phase classes 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 

phase compositions see: equality of — 

phase constraints 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
Lagrange multipliers)

phase constraints see: Lagrange multipliers for — 

phase equilibrium 
[90C30] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 

phase equilibrium 
[65H20, 80A10, 80A22, 90C26, 90C90] 
(see: Global optimization: application to phase equilibrium 
problems; Global optimization in phase and chemical 
reaction equilibrium) 

phase equilibrium equations 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in 
distillation systems) 

phase equilibrium equations see: ideal and nonideal — 

phase equilibrium problem 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical
equilibrium) 

phase equilibrium problems see: Global optimization: 
application to — 

phase in GRASP see: construction —; local search — 

phase method see: two- — 

phase methods see: Stochastic global optimization: two- — 

phase problem 
[90C26] 
(see: Phase problem in X-ray crystallography: Shake and 
bake approach) 

Phase problem in X-ray crystallography: Shake and bake 
approach 
(90C26) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Genetic algorithms; Global 
optimization in Lennard-Jones and Morse clusters; Graph
coloring; Molecular structure determination: convex global 
underestimation; Monte-Carlo simulated annealing in 
protein folding; Multiple minima problem in protein 
folding: αBB global optimization approach; Packet
annealing; Simulated annealing methods in protein 
folding) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Genetic algorithms; Global optimization 
in Lennard-Jones and Morse clusters; Global optimization
in protein folding; Molecular structure determination: 
convex global underestimation; Monte-Carlo simulated 
annealing in protein folding; Multiple minima problem in 
protein folding: αBB global optimization approach; Packet
annealing; Protein folding: generalized-ensemble 
algorithms; Simulated annealing; Simulated annealing 
methods in protein folding) 

phase procedure see: two- — 

phase retrieval based on single-crystal X-ray diffraction data 
see: Optimization techniques for — 

phase space 
[49J15, 49K15, 93C10] 
(see: Pontryagin maximum principle) 

phase split 

[65H20, 80A10, 80A22, 90C90]
(see: Global optimization: application to phase equilibrium
problems)

phase stability 

[65H20, 80A10, 80A22, 90C90]
(see: Global optimization: application to phase equilibrium
problems)

phase stability problem 

[90C26, 90C90]
(see: Global optimization in phase and chemical reaction
equilibrium)

phases 

[90C10]


(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
phases see: backtrack —; forward —; liquid —; vapor —
Φ-isotone mapping
[90C33]
(see: Order complementarity)
physical junction nodes
[90C30, 90C35]
(see: Optimization in water resources)
physically linear problem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)
physically nonlinear problem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)
Pi-algebra
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
Pi-algebras see: application of —; complexity theory of —;
families of —; functional completeness of —; functionally
complete normal forms of —; use of —
PI-algebras and 2-valued normal forms
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
PI-algebras of many-valued logics see: taxonomy of the —
π-classes
[03E70, 03H05, 91B16]
(see: Alternative set theory)
PI-logic algebras
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
Pi-logic algebras see: taxonomy of —
PI-normal form
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
PI-systems see: subfamilies of n-valued —
Piaget group of transformation
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)
pick-up and delivers
[90B06]
(see: Vehicle routing)
pickups and deliveries see: Vehicle routing problem with
simultaneous —
pictogram translation mapping technique
(see: State of the art in modeling agricultural systems)
pieces
[46N10, 47N10, 49M37, 65K10, 90C26, 90C30]
(see: Global optimization: tight convex underestimators)
piecewise
[46N10, 47N10, 49M37, 65K10, 90C26, 90C30]
(see: Global optimization: tight convex underestimators)
piecewise constant control
[93-XX]
(see: Dynamic programming: optimal control applications)
piecewise-constant perturbations
[90C34, 91B28]
(see: Semi-infinite programming and applications in
finance)
piecewise continuously differentiable function 
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach) 
piecewise differentiable function 
[90Cxx]
(see: Discontinuous optimization) 
piecewise linear arc cost 
[90B10] 
(see: Piecewise linear network flow problems) 
piecewise linear control 
[93-XX] 
(see: Dynamic programming: optimal control applications)
piecewise linear function 
[65M60, 90C26] 
(see: MINLP: application in facility location-allocation;
Variational inequalities: F. E. approach) 
piecewise linear function see: decomposition of 
a continuous — 


piecewise linear minimum cost network flow problem 
[90B10] 
(see: Piecewise linear network flow problems) 

Piecewise linear network flow problems 
(90B10) 
(referred to in: Auction algorithms; Communication 
network assignment problem; Dynamic traffic networks; 
Equilibrium networks; Generalized networks; Global 
supply chain models; Inventory management in supply 
chains; Maximum flow problem; Minimum cost flow 
problem; Multicommodity flow problems; Network design 
problems; Network location: covering problems; 
Nonconvex network flow problems; Nonoriented 
multicommodity flow problems; Operations research 
models for supply chain management and design; Shortest 
path tree algorithms; Steiner tree problems; Stochastic 
network problems: massively parallel solution; Survivable 
networks; Traffic network equilibrium) 
(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation 
networks; Generalized networks; Global supply chain 
models; Inventory management in supply chains; 
Maximum flow problem; Minimum cost flow problem; 
Network design problems; Network location: covering 
problems; Nonconvex network flow problems; Nonoriented 
multicommodity flow problems; Operations research 
models for supply chain management and design; Shortest 
path tree algorithms; Steiner tree problems; Stochastic 
network problems: massively parallel solution; Survivable 
networks; Traffic network equilibrium) 

Piecewise linear programming 
[90Cxx] 
(see: Discontinuous optimization) 

piecewise linear quadratic function 
[46A20, 52A01, 90C30] 
(see: Composite nonsmooth optimization) 


piecewise linear upper bound 

[90C15] 

(see: Stochastic programs with recourse: upper bounds) 
piecewise linearization see: improved — 
piecewise linearization in facility location problems with 

staircase costs see: convex — 
piecewise sequential quadratic programming method 

[90C30, 90C33] 

(see: Optimization with equilibrium constraints: 

A piecewise SQP approach) 
piecewise SQP approach see: Optimization with equilibrium 

constraints: A — 
piecewise twice-differentiable function 

[90Cxx] 

(see: Discontinuous optimization) 
Piela method 

[90C26, 90C90] 

(see: Global optimization in Lennard-Jones and Morse
clusters)
pilot method 

[68T20, 68T99, 90C27, 90C59] 

(see: Metaheuristics) 
pinch point 

[93A30, 93B50] 

(see: Mixed integer linear programming: mass and heat 

exchanger networks) 
Pinkava algebra 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 
Pinkava logic algebras see: many-valued families of the — 
Pinkava logical algebra 

[03B50, 68T15, 68T30] 

(see: Finite complete systems of many-valued logic algebras) 
Pinkava normal forms see: minimization of — 
pinned 

[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35] 

(see: Graph realization via semidefinite programming) 
pitchfork 

(see: Global terrain methods) 
pivot 

[15-XX, 65-XX, 90-XX] 

(see: Cholesky factorization) 
pivot see: admissible —; block —; diagonal —; double —; 

exchange —; simple principal — 
pivot algorithm 

[05B35, 65K05, 90C05, 90C20, 90C33] 

(see: Criss-cross pivoting rules; Least-index anticycling 

rules; Lexicographic pivoting rules) 
pivot algorithm see: principal — 
pivot method 

[65K05, 90C20, 90C33] 

(see: Principal pivoting methods for linear complementarity 

problems) 
pivot methods see: complementary — 
pivot operation 

[90C05, 90C33, 90C35] 
(see: Minimum cost flow problem; Pivoting algorithms for
linear programming generating two paths) 

pivot operation see: degenerate —; nondegenerate — 

pivot rules 
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules) 

pivot rules 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules; Least-index anticycling 
rules; Lexicographic pivoting rules; Principal pivoting 
methods for linear complementarity problems) 

pivot selection see: lexicographic — 

pivot steps see: average number of —; expected number of — 

pivot theory see: complementary — 

pivotal transform see: principal — 

pivotal transformation see: principal — 

pivoting 
[90C05, 90C10] 
(see: Linear programming; Simplicial pivoting algorithms 
for integer programming) 

pivoting 
[90C05, 90C33] 
(see: Lemke method; Linear programming) 

pivoting see: guaranteed to be stable without —; matrix class 
invariant under principal —; principal —; QR factorization 
with column- — 

pivoting algorithm 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 

pivoting algorithms see: varying dimension — 

pivoting algorithms for integer programming see: Simplicial — 

Pivoting algorithms for linear programming generating two 
paths 
(90C05, 90C33) 
(referred to in: Criss-cross pivoting rules; Least-index 
anticycling rules; Lexicographic pivoting rules; Linear 
programming; Principal pivoting methods for linear 
complementarity problems; Probabilistic analysis of 
simplex algorithms; Simplicial pivoting algorithms for 
integer programming) 
(refers to: Criss-cross pivoting rules; Least-index anticycling 
rules; Lexicographic pivoting rules; Linear programming; 
Principal pivoting methods for linear complementarity 
problems; Probabilistic analysis of simplex algorithms; 
Simplicial pivoting algorithms for integer programming) 

pivoting method see: least-index —; principal — 

pivoting methods for linear complementarity problems see: 
Principal — 

pivoting property 
[90C09, 90C10] 
(see: Oriented matroids) 

pivoting required see: no — 

pivoting rule see: Bland least index —; Dantzig largest 
coefficient —; generic —; lexicographic — 

pivoting rules 
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules; Linear programming: 
Klee-Minty examples) 


Pivoting rules 

[90C05] 

(see: Linear programming: Klee-Minty examples)

pivoting rules see: Criss-cross —; Lexicographic — 

pixels 

[90C90] 

(see: Optimization in medical imaging) 

Piyavskii-Shubert algorithm 

[65K05, 90C30] 
(see: Bisection global optimization methods) 

placement see: inventory — 

plan see: block — 

planar 

[90C10, 90C27] 

(see: Multidimensional assignment problem)

planar augmentation 

[90C10, 90C27, 94C15] 

(see: Graph planarization) 

planar graph 

[68R10, 90C10, 90C27, 94C15] 

(see: Branchwidth and branch decompositions; Graph
planarization)

planar graph 

[90C10, 90C27, 94C15] 

(see: Graph planarization)

planar graph see: level —; maximum weighted — 

planar multi-index transportation problem 

[90C35]

(see: Multi-index transportation problems)

planar multilayered dielectric structures see: Global 
optimization of — 

planar subgraph 

[90B80] 

(see: Facilities layout problems)

planar subgraph see: maximal —; maximum — 

planarity testing 

[90C10, 90C27, 94C15] 

(see: Graph planarization)

planarity-testing algorithm see: Hopcroft-Tarjan — 

planarization 

[90C10, 90C27, 90C35, 94C15] 

(see: Graph planarization; Optimization in leveled graphs)

planarization see: branch and bound algorithm for weighted 
graph —; Graph —; level —

planarization problem see: k-level —; level — 

plane see: Chvátal-Gomory cutting —; extended cutting —;
generalized cutting —; Minkowski —; strong cutting —; 
trade-off cutting — 

plane algorithm see: cutting —; Extended cutting —; Gomory 
cutting —; Sequential cutting — 

plane algorithms see: Integer programming: cutting — 

plane algorithms for stochastic linear programming problems 
see: Stabilization of cutting — 

plane approach see: cutting —; Mixed-integer nonlinear 
optimization: A disjunctive cutting — 

plane approaches see: cutting — 

plane coefficients see: statistical representation of cutting — 

plane criterion see: reaction tangent- —; tangent- — 

plane method see: analytic center cutting —; cutting —; 
extended cutting —; generalized cutting —; Kelley's 
classical cutting —; Kelley cutting — 
plane methods see: cutting —; Nondifferentiable optimization:
cutting —; regularization of deterministic cutting —
plane methods for global optimization see: Cutting — 
plane model see: cutting — 
planes see: cutting —; polyhedral cutting —; Stochastic linear 
programming: decomposition and cutting — 
planning 
[90C26, 90C30, 90C31] 
(see: Bilevel programming: introduction, history and 
overview) 
planning see: Chemical process —; distribution systems —; 
extra-urban transit —; financial —; long range —; 
multiperiod —; operation —; production —; 
requirements —; Resource —; scheduling (staff —; 
street —; water resource — 

planning horizon 

[90C26] 

(see: MINLP: design and scheduling of batch processes) 
planning of offshore oilfield infrastructure see: Optimal — 
planning problem see: process — 
planning problems see: Global optimization algorithms for 

financial — 

Planning in the process industry 

planning and scheduling see: facility —; Integrated — 

planning under uncertainty see: Production — 

planning under uncertainty on hydrological exogenous inflow 
and demand see: water resources — 

plant 

(see: State of the art in modeling agricultural systems) 
plant see: co-generation —; multiproduct —; multiproduct 

batch —; multipurpose —; run-of-river —; storage —; 

thermal — 
plant design see: batch — 
Plant layout problems and optimization 
plant location model 

[90-02] 

(see: Operations research models for supply chain 

management and design) 
plant location problem see: simple —; uncapacitated — 
plant/model nodes 

[90C35]

(see: Minimum cost flow problem) 
plant nodes 

[90C35]

(see: Minimum cost flow problem) 
plant railroads see: engine routing and industrial in- — 
plants see: hydro —; run-of-river —; storage —; tracing the 

states of — 
plasticity see: computational — 
plates 
[90C26, 90C90] 
(see: Structural optimization: history)
platform cost see: linear — 
plausible rules 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
play see: mood of — 
player 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
player zero-sum perfect-information game see: two- — 


pLCP 

[65K10, 90C33, 90C51]

(see: Generalizations of interior point methods for the 

linear complementarity problem) 

PLE 

[90C09, 90C10]

(see: Optimization in classifying text documents) 

PLNFP 

[90B10]

(see: Piecewise linear network flow problems) 

PLS-complexity 

[90C08, 90C11, 90C27, 90C57, 90C59]

(see: Quadratic assignment problem) 

PLS problems 

[90C08, 90C11, 90C27, 90C57, 90C59]

(see: Quadratic assignment problem) 

plus homogeneous 

[41A50, 41A65, 46B40, 90C46]
(see: Best approximation in ordered normed linear spaces) 

PM 
[90B80, 90C10] 
(see: Facility location problems with spatial interaction) 

PMD 

[68W10, 90B15, 90C06, 90C30]

(see: Stochastic network problems: massively parallel 

solution) 

Pnueli algorithm 

[90C05, 90C10]

(see: Simplicial pivoting algorithms for integer 

programming) 

Poincaré polynomial 

[05B35, 20F36, 20F55, 52C35, 57N65]

(see: Hyperplane arrangements) 

point see: 0*-critical —; oo-stationary —; asymptotically stable
stationary —; bounds on the distance of a feasible point to
a solution —; (C},)-efficient —; Cauchy —; contact —;
coupled fixed —; critical —; dead —; decomposition —;
Dini sup-stationary —; distinguished —; dominated —;
efficient —; ε-stationary —; equilibrium —; extreme —;
feasible —; fixed —; generalized critical —; global
maximum —; global minimum —; global minimum KKT —;
grid —; Hadamard oo-stationary —; Hadamard
sup-stationary —; index of a constraint violating —;
inf-stationary —; infeasible interior —; inner —; interior —;
invexity at a —; isolated stationary —;
Karush-Kuhn-Tucker —; KKT —; KT —; Kuhn-Tucker —; left
saddle —; local efficient —; local Ekeland —; local
maximum —; local minimum —; local strictly efficient —;
local weakly efficient —; lower boundary —; motion of a —;
nondegenerate —; nondegenerate critical —; Pareto —;
pinch —; proximal —; quadratic turning —; query —;
reference —; regular —; regular feasible —; regular
stationary —; right saddle —; saddle —; saddle-minimax —;
stationary —; Steiner —; strict local maximum —; strict local
minimum —; strictly efficient —; strong stability of
a stationary —; subcritical —; substationary —;
sup-stationary —; supercritical —; supermaximum —;
superminimax —; trap-door —; trial —; turning —; upper
boundary —; Voronoi —; weakly efficient —

point acceleration function see: the mid- — 

point AD 
[65H99, 65K99] 
(see: Automatic differentiation: point and interval) 

point algorithm see: dual exterior —; entropic proximal —; 


infeasible-start interior- —; interior —; partial proximal —; 
proximal —; quadratic proximal — 
point algorithms see: inexact proximal —; interior — 


point algorithms for entropy optimization see: interior — 
point approach see: proximal — 
point-based approximation 

[49J52, 90C30] 

(see: Nondifferentiable optimization: Newton method) 
point-based logic system of approximate reasoning 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 

(see: Checklist paradigm semantics for fuzzy logics) 
point boundary value problem see: ODE two- —; two- — 
point bounds for NLP see: solution- — 
point bundle method see: proximal — 
point computation see: fixed — 
point conditions 

[65L99, 93-XX] 

(see: Optimization strategies for dynamic systems) 
point configuration 

[90C09, 90C10] 

(see: Oriented matroids) 
point design 

[90C26, 90C29] 

(see: Optimal design of composite structures) 
point of an energy functional see: generalized critical — 
point enumeration see: extreme — 
point formulation see: saddle- — 
point of a functional see: substationarity — 
point inequalities see: saddle- — 
point and interval see: Automatic differentiation: — 
point and interval Taylor operators see: Automatic
differentiation: —
point intervals see: floating — 
point iterate see: dead- — 
point iteration see: fixed — 
point logarithmic barrier method see: interior — 
point mapping see: nearest — 
point mathematical program see: extreme — 
point method see: exterior —; interior —; proximal — 
point methods see: Entropy optimization: interior —; 

interior —; Linear programming: interior —; polynomial 
time interior —; primal-dual interior- —; proximal —; 

Successive quadratic programming: solution by active sets 

and interior — 
point methods for distributed optimal control problems see: 

Sequential quadratic programming: interior — 
point methods for the linear complementarity problem see: 

Generalizations of interior — 
point methods for semidefinite programming see: Interior — 
point operation see: floating — 


point problem see: Equivalence between nonlinear 
complementarity problem and fixed —; fixed —; high —; 
saddle- — 

point ranking see: extreme — 

point with respect to a set see: substationarity — 

point set see: efficient —; generalized critical — 

point solution see: extreme — 

point to a solution point see: bounds on the distance of 
a feasible — 

point solutions see: enumerating extreme — 

point sufficient condition see: saddle- — 

Point Taylor Operator 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval Taylor
operators) 

point theorem see: Brouwer fixed —; fixed —; Miranda
fixed —; right saddle- —; Schauder fixed —; 
supercritical —; Tychonoff fixed — 

point theory see: critical —; fixed —; Interval fixed — 

point theory and optimality conditions see: Saddle — 

point-to-set mapping 
[65K10, 90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality; Rosen’s method, global convergence, and Powell’s 
conjecture; Sensitivity analysis of variational inequality 
problems) 

point-to-set mapping 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 

point-to-set mapping see: closed — 

point-to-set mappings 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 

point Voronoi diagram see: farthest- — 

pointed closed convex cone 
[90C33]
(see: Topological methods in complementarity theory) 

pointed convex cone 
[90C33]
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem; Order complementarity) 

points see: abnormal —; convex combination of the 
extreme —; critical —; decomposition —; Discretely 
distributed stochastic programs: descent directions and 
efficient —; High-order necessary conditions for optimality 
for abnormal —; inf-stationary —; KKT —; multiple 
Kuhn-Tucker —; multiple QP Kuhn-Tucker —; 
nondegenerate critical —; ranking extreme —; saddle —; 
set of ε-global —; set of ε-most active —; set of feasible —;
stationary —; Steiner —; Steiner tree problem with 
minimum number of Steiner —; successive improvement of 
KKT — 

points problem see: Fekete —

points on the same face 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements in optimization) 

points sets see: connectedness of the efficient — 
Poisson distribution
[90C15] 
(see: Logconcavity of discrete distributions) 
Poisson equation 
[60J05, 90C15] 
(see: Derivatives of Markov processes and their simulation)
Polak-Ribière algorithm see: Polyak- —
Polak-Ribière formula
[90C06]
(see: Large scale unconstrained optimization)
Polak-Ribière method
[90C06]
(see: Large scale unconstrained optimization)
polar 
[90C22, 90C25, 90C31] 
(see: Semidefinite programming: optimality conditions and
stability) 
polar cone 
[41A10, 46N10, 47N10, 49K27, 90C30] 
(see: Duality for semidefinite programming; High-order 
necessary conditions for optimality for abnormal points) 
polar duality 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 
polar matrix 
[90B10, 90B15, 90C15, 90C35] 
(see: Preprocessing in stochastic programming) 
polar polyhedron 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
policies see: in-company —; nonanticipative water 
resources —; nonanticipativity water resources —; 
optimal —; (s,S) optimal — 
policy 
[49Jxx, 49L20, 90B50, 91Axx]
(see: Dynamic programming: inventory control; Infinite 
horizon control and dynamic games; Inventory 
management in supply chains) 
policy see: admissible —; barrier —; Continuous review 
inventory models: (QR) —; one-for-one ordering —; optimal 
control —; proper —; (Q,R) —; (s,S) —; scheduling —; 
stationary —; unichain — 
policy evaluation 
[49L20, 90C39, 90C40]
(see: Dynamic programming: infinite horizon problems, 
overview)
policy iteration 
[49L20, 49L99, 90C39, 90C40] 
(see: Dynamic programming: average cost per stage 
problems; Dynamic programming: discounted problems; 
Dynamic programming: infinite horizon problems, 
overview; Dynamic programming: stochastic shortest path 
problems) 
Poliquin reduction see: Burke- — 
political districting problem 
[90C10, 90C11, 90C27, 90C57] 
(see: Set covering, packing and partitioning problems) 
Pollak conjecture see: Gilbert- — 
polling scheme see: random — 
pollution see: air — 


pollution control 
[90C15] 
(see: Stochastic quasigradient methods: applications) 
Poly(L-Alanine) 
[92C05] 
(see: Adaptive simulated annealing and its application to 
protein folding) 
Pólya theorem see: Hardy-Littlewood- —
Polyak II rule
[49J52, 90C30] 
(see: Nondifferentiable optimization: relaxation methods; 
Nondifferentiable optimization: subgradient optimization 
methods) 
Polyak method see: Levitin- — 
Polyak minimizing sequence see: Levitin- — 
Polyak-Polak-Ribière algorithm
[90C30] 
(see: Conjugate-gradient methods) 
Polyak well-posed problem see: Levitin- — 
polyblock 
[90C26] 
(see: Cutting plane methods for global optimization) 
polyblock see: reduced —; reverse — 
polyblock algorithm 
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
polyblock algorithm see: discrete —; reverse —; revised —
polyblock approximation
[90C26]
(see: Cutting plane methods for global optimization)
polyblock approximation method
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
polyblock (copolyblock) algorithm see: revised reverse —
polyblocks
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
polygon
(see: State of the art in modeling agricultural systems)
polygonal Arrangement
(see: State of the art in modeling agricultural systems)
polygonal Component
(see: State of the art in modeling agricultural systems)
Polygons Arrangement see: two —
polyhedra see: segments of —
polyhedral annexation
[90C25, 90C26]
(see: Concave programming; Cutting plane methods for
global optimization)
polyhedral annexation
[90C09, 90C10, 90C11, 90C26]
(see: Cutting plane methods for global optimization;
Disjunctive programming)


polyhedral combinatorics 

[90C05, 90C06, 90C08, 90C10, 90C11, 90C27, 90C35, 90C57] 

(see: Integer programming; Integer programming: cutting 

plane algorithms; Optimization in leveled graphs) 
polyhedral cutting planes 

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem) 
polyhedral methods 

[05-XX] 

(see: Frequency assignment problem) 
polyhedral methods 

[90C10, 90C11, 90C27, 90C57] 

(see: Integer programming; Set covering, packing and 

partitioning problems) 
polyhedral set 

[90C30] 

(see: Simplicial decomposition) 
polyhedral set see: convex — 
polyhedral subdivision 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods) 
polyhedral subdivision see: cell of a —; face of a — 
polyhedral theory 

[90C10, 90C11, 90C27, 90C57] 

(see: Set covering, packing and partitioning problems) 
polyhedron 

[90C30] 

(see: Convex-simplex algorithm; Simplicial decomposition) 
polyhedron 

[90C30] 

(see: Convex-simplex algorithm; Frank-Wolfe algorithm; 

Simplicial decomposition) 
polyhedron see: essential —; polar —; regular —; segment of
a —; simple —; state —; submodular —
polylogarithmic time
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes)


polymatrix game 
[90C25, 90C33] 
(see: Integer linear complementary problem) 
polynomial 
[05C15, 05C62, 05C69, 05C85, 34E05, 90C27, 90C59] 
(see: Asymptotic properties of random multidimensional 
assignment problem; Optimization problems in unit-disk 
graphs) 
polynomial see: biorthogonal —; characteristic —;
Chebyshev —; generating —; initial term
of a —; normal form of a —; output- —; Poincaré —
polynomial algorithm 
[68Q25, 90C60] 
(see: Computational complexity theory; NP-complete 
problems and proof methodology) 
polynomial algorithm 
[90C60] 
(see: Computational complexity theory) 
polynomial algorithm see: nondeterministic —; strongly — 


polynomial of best approximation 
[65K05, 90C30] 
(see: Nondifferentiable optimization: minimax problems) 
polynomial complexity 
[03B50, 41A30, 62J02, 68T15, 68T30, 90C26] 
(see: Finite complete systems of many-valued logic algebras; 
Regression by special functions: algorithms and 
complexity) 
polynomial of degree c see: algorithm — 
polynomial equations
[12D10, 12Y05, 13P10]
(see: Gröbner bases for polynomial equations)
polynomial equations see: Gröbner bases for —
polynomial ideal
[12D10, 12Y05, 13P10]
(see: Gröbner bases for polynomial equations)
polynomial matrix 
[90C10, 90C25, 90C27, 90C35] 
(see: L-convex functions and M-convex functions) 
polynomial programming
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
polynomial programs
[90C09, 90C10, 90C11]
(see: Disjunctive programming)
polynomial reducibility
[90C60]
(see: Complexity classes in optimization)
polynomial solution see: strongly —
polynomial solvability
[90C35]
(see: Feedback set problems)
polynomial system of equations
[65F10, 65F50, 65H10, 65K10]
(see: Globally convergent homotopy methods)
polynomial time 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68Q25, 68R, 68U, 
68W, 90B, 90C, 90C35, 90C60, 91B28] 
(see: Competitive ratio for portfolio management; 
Complexity theory; Computational complexity theory; 
Convex discrete optimization; Graph coloring) 
polynomial time 
[90C60] 
(see: Complexity theory; Complexity theory: quadratic 
programming) 
polynomial time see: output- —; strongly —; super- — 
polynomial time algorithm 
[49-01, 49K45, 49N10, 90-01, 90C05, 90C06, 90C08, 90C10, 
90C11, 90C20, 90C27, 90C35, 91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs; Integer programming: 
branch and bound methods; Linear programming: 
karmarkar projective algorithm; Maximum flow problem; 
Minimum cost flow problem) 
polynomial time algorithm 
[05B35, 20F36, 20F55, 52C35, 57N65, 90C05] 
(see: Hyperplane arrangements in optimization; Linear 
programming: karmarkar projective algorithm) 
polynomial time algorithm see: efficient polynomially 
bounded —; nondeterministic —; strongly —; weakly — 
polynomial time algorithms
[90C05] 
(see: Linear programming: interior point methods) 
polynomial time approximation scheme 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59, 90C60] 
(see: Complexity classes in optimization; Optimization 
problems in unit-disk graphs) 
polynomial time approximation scheme see: fully — 
polynomial time computable function
[90C60]
(see: Complexity classes in optimization)
polynomial time convergence
[90C25, 90C51, 94A17]
(see: Entropy optimization: interior point methods)
polynomial time deterministic algorithm
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
polynomial time interior point methods
[90C25, 90C27, 90C90]
(see: Semidefinite programming and structural
optimization)
polynomial time local search problems
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)
polynomial time problem see: strongly —
polynomial time reduction
[68Q25, 90C60]
(see: NP-complete problems and proof methodology)
polynomial time solution
[90C60]
(see: Complexity theory: quadratic programming)
polynomial transformation
[90C60]
(see: Computational complexity theory)
polynomial Turing reducibility
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
polynomial upper bound
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)
polynomially bounded polynomial time algorithm see: 
efficient — 
polynomially space-bounded Turing machine 
[90C60] 
(see: Complexity classes in optimization) 
polynomially time-bounded Turing machine 
[90C60] 
(see: Complexity classes in optimization) 
polynomially transformable decision problem 
[90C60] 
(see: Complexity theory) 
polynomials see: least squares formal orthogonal —; Least 
squares orthogonal —; orthogonal —; polytope —; Robust 
control: schur stability of polytopes of — 


polyspectrum 
[90C26, 90C90] 
(see: Signal processing with higher order statistics) 
polytope 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 
polytope 
[90C05] 
(see: Carathéodory theorem) 
polytope see: convex —; cross —; k-way —; k-way 
transportation —; matroid base —; secondary —; state —; 
trace — 
polytope polynomials 
[39A11, 93C55, 93D09] 
(see: Robust control: schur stability of polytopes of 
polynomials) 
polytopes of polynomials see: Robust control: schur stability 
of — 
polytopes: strategies and performances see: Volume 
computation for — 
ponens see: checklist modus —; modus — 
Pontryagin maximum principle 
(49J15, 49K15, 93C10) 
(referred to in: Dynamic programming: continuous-time 
optimal control; Hamilton-Jacobi-Bellman equation; 
High-order maximum principle for abnormal extremals) 
(refers to: Dynamic programming: continuous-time optimal 
control; Hamilton-Jacobi-Bellman equation; High-order 
maximum principle for abnormal extremals) 
Pontryagin’s maximum principle
[41A10, 47N10, 49K15, 49K27, 65L99, 93-XX] 
(see: Boundary condition iteration BCI; Dynamic 
programming: optimal control applications; High-order 
maximum principle for abnormal extremals; Optimization 
strategies for dynamic systems) 
Pontryagin maximum principle
[49K05, 49K10, 49K15, 49K20]
(see: Duality in optimal control with first order differential
equations)
Pontryagin minimum principle
[34H05, 49L20, 90C39]
(see: Dynamic programming: continuous-time optimal
control)
pool adjacent violators algorithm
[41A30, 62J02, 90C26]
(see: Regression by special functions: algorithms and
complexity)
pool of cuts
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and cut algorithms)
pool template
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)
pooling
[90C05]
(see: Continuous global optimization: applications;
Planning in the process industry)
pooling
[90C30, 90C90]
(see: MINLP: applications in blending and pooling
problems)
pooling and blending problems
[90C30, 90C90]
(see: MINLP: applications in blending and pooling
problems)

pooling problems see: MINLP: applications in blending and —
pools see: crew —
POP
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)
population
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90, 92B05]
(see: Broadcast scheduling problem; Design optimization in
computational fluid dynamics; Genetic algorithms;
Traveling salesman problem)
population
[92B05]
(see: Genetic algorithms)
population drifting
[90C11, 90C90, 91B28]
(see: Multicriteria decision support methodologies for
auditing decisions)
population size
[92B05]
(see: Genetic algorithms)
populations see: Volterra model of conflicting —
portfolio see: constant rebalanced —; market —; universal —
portfolio analysis see: mean-variance —
portfolio management
[68Q25, 90C26, 91B06, 91B28, 91B60]
(see: Competitive ratio for portfolio management; Financial
applications of multicriteria analysis; Portfolio selection
and multicriteria analysis)
portfolio management
[68Q25, 90C26, 91B28]
(see: Competitive ratio for portfolio management; Portfolio
selection and multicriteria analysis)
portfolio management see: Competitive ratio for —
portfolio optimization
[91B50]
(see: Financial equilibrium)
portfolio selection
[90C20, 90C29]
(see: Decision support systems with multiple criteria;
Standard quadratic optimization problems: applications)
portfolio selection
[90C20]
(see: Standard quadratic optimization problems:
applications)

Portfolio selection: markowitz mean-variance model 
(91B28) 

Portfolio selection and multicriteria analysis 
(91B28, 90C26) 
(referred to in: Bi-objective assignment problem; 
Competitive ratio for portfolio management; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis;
Financial optimization; Fuzzy multi-objective linear
programming; Multicriteria sorting methods; 
Multi-objective combinatorial optimization; 
Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Preference disaggregation; Preference 
disaggregation approach: basic features, examples from 
financial decision making; Preference modeling; Robust 
optimization; Semi-infinite programming and applications 
in finance; Standard quadratic optimization problems: 
applications) 
(refers to: Bi-objective assignment problem; Competitive 
ratio for portfolio management; Decision support systems 
with multiple criteria; Estimating data for multicriteria 
decision making problems: optimization techniques; 
Financial applications of multicriteria analysis; Financial 
optimization; Fuzzy multi-objective linear programming; 
Multicriteria sorting methods; Multi-objective 
combinatorial optimization; Multi-objective integer linear 
programming; Multi-objective optimization and decision 
support systems; Multi-objective optimization: interaction 
of design and control; Multi-objective optimization; 
Interactive methods for preference value functions; 
Multi-objective optimization: lagrange duality; 
Multi-objective optimization: pareto optimal solutions, 
properties; Multiple objective programming support; 
Outranking methods; Preference disaggregation; 
Preference disaggregation approach: basic features, 
examples from financial decision making; Preference 
modeling; Robust optimization; Semi-infinite 
programming and applications in finance) 

portfolio selection problem
[91B28]
(see: Portfolio selection: markowitz mean-variance model)
portfolio theory
[90C27]
(see: Operations research and financial markets)
portfolios see: frontier of efficient —
posed see: ill- —; well- —
posed problem see: ill- —; Levitin-Polyak well- —; well- —
posed problems see: ill- —
posed variational problem see: ill- —
posed variational problems see: Ill- —
posedness see: well- —
position see: general —
position evaluator
[90C39]
(see: Neuro-dynamic programming)
position of hyperplanes see: general —
position vector
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements in optimization)
positioning see: Gene clustering: A novel
decomposition-based clustering approach: global optimum 
search with enhanced — 
positioning algorithm see: relative — 
positive 
[58E05, 90C30] 
(see: Topology of global optimization) 
positive see: completely — 
positive basis tableau see: lexico- — 
positive and contraction matrices see: completion to 
completely — 
positive definite 
[05C50, 15-XX, 15A48, 15A57, 65-XX, 90-XX, 90C05, 90C22, 
90C25, 90C30, 90C33, 90C51] 
(see: Cholesky factorization; Interior point methods for 
semidefinite programming; Linear complementarity 
problem; Matrix completion problems) 
positive definite matrices 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity
problems) 
positive definite matrix 
[65K10, 65M60] 
(see: Variational inequalities) 
positive definite matrix 
[90C30] 
(see: Frank-Wolfe algorithm; Simplicial decomposition) 
positive definite matrix see: strongly — 
positive definite quadratic function 
[90C30] 
(see: Suboptimal control) 
positive definite quadratic models 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
positive definiteness 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 
positive fault 
[90Cxx] 
(see: Discontinuous optimization) 
positive gradient see: projected — 
positive marginal value 
[90C60] 
(see: Complexity of degeneracy) 
positive marginal values 
[90C60] 
(see: Complexity of degeneracy) 
positive matrix see: completely —; partial completely — 
positive minimum value 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations) 
positive-negative-zero pattern 
[90C09, 90C10]
(see: Combinatorial matrix analysis) 
positive real numbers see: infinitely small — 
positive (semi) definite completion problem 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems)
positive semi-definite quadratic binary programming 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
positive semidefinite 
[05C15, 05C17, 05C35, 05C50, 05C69, 15A48, 15A57, 90C05,
90C20, 90C22, 90C25, 90C30, 90C33, 90C35, 90C51, 90C60] 
(see: Copositive programming; Interior point methods for 
semidefinite programming; Linear complementarity 
problem; Lovasz number; Matrix completion problems; 
Quadratic knapsack) 
positive semidefinite matrices 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
positive semidefinite matrix 
[65K10, 65M60] 
(see: Variational inequalities) 
positive semidefinite matrix see: bisymmetric — 
positive semidefinite matrix completion problem 
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems) 
positive semidefinite symmetric matrix 
[65K05, 90C20, 90C33] 
(see: Principal pivoting methods for linear complementarity 
problems) 
positive semidefiniteness constraints 
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming) 
positive vector see: lexico- —; lexicographically — 
positively see: dropped — 
positively homogeneous 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
positively homogeneous see: increasing and — 
positively homogeneous function 
[90C26] 
(see: Global optimization: envelope representation) 
positively homogeneous functions on topological vector 
spaces see: Increasing and — 
positively linearly dependent 
[49M30, 49M37, 65K05, 90C30] 
(see: Practical augmented Lagrangian methods) 
positively linearly independent 
[49M30, 49M37, 65K05, 90C30] 
(see: Practical augmented Lagrangian methods) 
positiveness of distances 
[65K05, 90C27, 90C30, 90C57, 91C15] 
(see: Optimization-based visualization) 
positivity 
[93D09] 
(see: Robust control) 
possible see: largest — 
Post conditions 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
post-optimality analyses 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
post-optimality analysis
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 
post-optimality sensitivity analysis 
[90C31] 
(see: Sensitivity and stability in NLP) 
post (risk prone, adaptive) decision see: ex- — 
Post system 
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 
posterior distribution 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based methods) 
posterior methods 
[65K05, 90B50, 90C05, 90C29, 91B06]
(see: Multi-objective optimization and decision support 
systems) 
posteriori principle see: maximum a — 
postman problem see: Chinese —; directed Chinese —; 
rural — 
posynomial condensation
[90C26, 90C90]
(see: Global optimization in generalized geometric
programming)
posynomial monomials
[90C26, 90C90]
(see: Global optimization in generalized geometric
programming)
posynomial terms
[90C26, 90C90]
(see: Global optimization in generalized geometric
programming)
posynomials
[90C26, 90C90]
(see: Global optimization in generalized geometric
programming)
potential
[49J40, 49J52, 49Q10, 60J15, 60J60, 60J70, 60K35, 65C05,
65C10, 65C20, 68U20, 70-08, 70-XX, 74K99, 74Pxx, 80-XX,
82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding; Nonconvex 
energy functions: hemivariational inequalities) 
potential see: chemical —; empirical —; linear —; nonlinear — 
potential efficient solutions see: set of — 
potential energy see: minimum —; smoothing of the — 
potential energy function 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
potential energy function see: Lennard-Jones — 
potential function 
[37A35, 49M20, 90-08, 90C05, 90C22, 90C25, 90C30, 90C51] 
(see: Interior point methods for semidefinite programming; 
Linear programming: interior point methods; Linear 
programming: karmarkar projective algorithm; 
Nondifferentiable optimization: cutting plane methods; 
Potential reduction methods for linear programming) 
potential function 
[37A35, 90C05, 90C25, 90C30] 
(see: Linear programming: karmarkar projective algorithm;
Potential reduction methods for linear programming;
Solving large scale and sparse semidefinite programs) 
potential function see: dual —; Karmarkar —; primal —; 
primal-dual —; Tanabe-Todd-Ye — 
potential reduction 
[05-XX, 90C60] 
(see: Complexity of degeneracy; Frequency assignment 
problem) 
potential reduction 
[37A35, 90C05, 90C25, 90C30] 
(see: Potential reduction methods for linear programming; 
Solving large scale and sparse semidefinite programs)
potential reduction algorithm
[37A35, 90C05]
(see: Potential reduction methods for linear programming)
potential reduction algorithm see: primal —; primal-dual —
potential reduction algorithms
[90C05]
(see: Linear programming: interior point methods; Linear
programming: karmarkar projective algorithm)

Potential reduction methods for linear programming 
(90C05, 37A35) 
(referred to in: Entropy optimization: interior point 
methods; Homogeneous selfdual methods for linear 
programming; Linear programming: interior point 
methods; Linear programming: karmarkar projective 
algorithm; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming: solution by active sets 
and interior point methods) 
(refers to: Entropy optimization: interior point methods; 
Homogeneous selfdual methods for linear programming; 
Interior point methods for semidefinite programming; 
Linear programming: interior point methods; Linear 
programming: karmarkar projective algorithm; Sequential 
quadratic programming: interior point methods for 
distributed optimal control problems; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

potential smoothing algorithm 
[90C90] 
(see: Simulated annealing methods in protein folding) 

potentials see: empirical —; node — 

potentials and stability in mechanics see: smooth — 


Potts glass model 
[90C27, 90C30] 
(see: Neural networks for combinatorial optimization) 
Powell’s conjecture
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
Powell’s conjecture see: Rosen’s method, global convergence, 
and — 
Powell method 
(90C30) 
(referred to in: Cyclic coordinate method; Rosenbrock 
method; Sequential simplex method) 
(refers to: Cyclic coordinate method; Rosenbrock method; 
Sequential simplex method) 
Powell method
[90C30] 
(see: Powell method) 
Powell method see: Davidon-Fletcher- — 
Powell-symmetric-Broyden method 
[90C30] 
(see: Successive quadratic programming) 
Powell update see: Davidon-Fletcher- — 
power balance 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
power consumption see: expected — 
power method 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
power set see: fuzzy — 
power system see: electric — 
power systems see: Optimization in operation of electric and 
energy — 
PP 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
PPA see: convergence of — 
PPM 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
practical 
(see: Global optimization: functional forms) 
Practical augmented Lagrangian methods 
(90C30, 49M30, 49M37, 65K05)
practically feasible computational solution
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)
PRAM
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes)
PRAM 
[03D15, 65K05, 65Y05, 68Q05, 68Q15] 
(see: Parallel computing: complexity classes; Parallel 
computing: models) 
PRAM see: CRCW —; CREW —; EREW —
pre-declared interval function
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
pre-invex function
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]
(see: Variational principles)
pre-invex set
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]
(see: Variational principles)
pre-invexity with respect to a set
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]
(see: Variational principles)
pre-matching 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 
(see: Maximum partition matching) 
pre-matching see: algorithm —; maximum — 
pre-matching problem see: maximum — 
pre-order 
[90C35] 
(see: Generalized networks) 
pre-order 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
pre-order closure of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
pre-order closure of a relation see: local — 
pre-order relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
preaccumulation of the Jacobian 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
precedence/coupling constraints 
[00-02, 01-02, 03-02] 
(see: Vehicle routing problem with simultaneous pickups 
and deliveries) 
precision interval package see: variable — 
preconditioner 
[65H10, 65J15]
(see: Contraction-mapping) 
preconditioner see: Atkinson-Brakhage —
preconditioning 
[65G20, 65G30, 65G40, 65H20, 65K99] 
(see: Interval Newton methods) 
preconditioning step 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
predecessor 
[90C35] 
(see: Generalized networks) 
predefined probabilities see: randomly with — 
predictability of complexities 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
prediction see: Genetic algorithms for protein structure —; 
tertiary structure — 
prediction of crystal structures 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
prediction list 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
prediction methods see: Protein loop structure — 
predictive control see: model — 
Predictive method for interhelical contacts in alpha-helical 
proteins 
(90C11) 
predictor-corrector
(see: Global terrain methods)
predictor-corrector algorithm
[90C05]
(see: Linear programming: interior point methods)
preemptive
[90B36]
(see: Planning in the process industry; Stochastic
scheduling)
preference
[90C29]
(see: Preference modeling; Vector optimization)
preference
[90C29]
(see: Preference modeling)

Preference disaggregation
(90C29, 91A99)

(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation approach: basic features, 
examples from financial decision making; Preference 
modeling) 

(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation approach: basic features, 
examples from financial decision making; Preference 
modeling)

preference disaggregation
[90C29]
(see: Multicriteria sorting methods)
preference disaggregation
[90C29, 91A99]
(see: Multicriteria sorting methods; Preference
disaggregation; Preference disaggregation approach: basic
features, examples from financial decision making)
preference disaggregation analysis
[90C29]
(see: Decision support systems with multiple criteria;
Preference disaggregation approach: basic features,
examples from financial decision making)
preference disaggregation approach
[90C29]
(see: Decision support systems with multiple criteria)

Preference disaggregation approach: basic features, examples
from financial decision making
(90C29)
(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference modeling) 

(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference modeling)

Preference modeling
(90C29)
(referred to in: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision
making) 

(refers to: Bi-objective assignment problem; Decision 
support systems with multiple criteria; Estimating data for 
multicriteria decision making problems: optimization 
techniques; Financial applications of multicriteria analysis; 
Fuzzy multi-objective linear programming; Multicriteria 
sorting methods; Multi-objective combinatorial 
optimization; Multi-objective integer linear programming; 
Multi-objective optimization and decision support systems; 
Multi-objective optimization: interaction of design and 
control; Multi-objective optimization; Interactive methods 
for preference value functions; Multi-objective 
optimization: lagrange duality; Multi-objective 
optimization: pareto optimal solutions, properties; 
Multiple objective programming support; Outranking 
methods; Portfolio selection and multicriteria analysis; 
Preference disaggregation; Preference disaggregation 
approach: basic features, examples from financial decision 
making)

preference modeling
[90-XX]
(see: Outranking methods)
preference relation
[03E70, 03H05, 91B16]
(see: Alternative set theory)
preference threshold
[90-XX]
(see: Outranking methods)
preference value function
[90C29]
(see: Multi-objective optimization; Interactive methods for
preference value functions)
preference value functions see: Multi-objective optimization;
Interactive methods for —
preferences
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming)
preferences see: disaggregation of —; embedded family of —
preferential bidding system
[90B06, 90C06, 90C08, 90C35, 90C90]
(see: Airline optimization)
preferred solution see: most — 
preflow 


[90C35] 
see: Maximum flow problem) 


preflow-push algorithm 


[90C35] 
see: Maximum flow problem) 


preflow-push algorithm 


[90C35] 
see: Maximum flow problem) 


preflow-push algorithm see: generic — 
preliminary design stage 


[90C90] 
see: Design optimization in computational fluid dynamics) 


premature convergence 


92B05] 
(see: Genetic algorithms) 


premature convergence 


92B05] 
(see: Genetic algorithms) 


premis of an inference 


03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 


preparation 


65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 


preprocessing 


65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 


preprocessing 


90B10, 90B15, 90C15, 90C35] 
(see: Preprocessing in stochastic programming) 


preprocessing see: symbolic — 
preprocessing and reformulation 


[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods) 


Preprocessing in stochastic programming 


(90C15, 90C35, 90B10, 90B15) 

(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 

(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; Generalized benders decomposition; 
General moment optimization problems; Logconcave 
measures, logconvexity; Logconcavity of discrete 
distributions; L-shaped method for two-stage stochastic
programs with recourse; Multistage stochastic 
programming: barycentric approximation; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

prescriptive perspective 
[90C29] 
(see: Preference modeling) 

present value see: maximize net — 

preserving an activity see: direction — 

preserving assignment problem see: order — 

presolving techniques 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems; Planning in 
the process industry) 

prespecification see: weak — 

pressure see: Reid vapor — 

pretty-printing a declarative program 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 

price see: branch and —; branch-and- —; dual —; 
pseudoshadow —; shadow — 

price and cut see: branch and — 

price-directive decomposition 
[90C35] 
(see: Multicommodity flow problems) 

price elasticity 
[90C26] 
(see: MINLP: application in facility location-allocation) 

price equilibrium see: spatial —; Walrasian — 

price equilibrium problem see: network structure of the 
spatial —; spatial — 

price formulation 
[91B28, 91B50] 
(see: Spatial price equilibrium) 

price increase see: dual — 

price: Integer programming with column generation see: 
Branch and — 

price model 
[91B28, 91B50] 
(see: Spatial price equilibrium) 


Price of robustness for linear optimization problems 
price taker 
[91B50] 
(see: Financial equilibrium) 
priced information 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based 
optimization) 
prices see: almost at equilibrium of an assignment and a set 
of —; best estimate using pseudoshadow —; equilibrium of
an assignment and a set of —; shadow — 
pricing 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms)
pricing derivatives 
[90C27] 
(see: Operations research and financial markets)
pricing model see: capital asset — 
pricing-out 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms)
pricing problem 
[68Q99]
(see: Branch and price: Integer programming with column 
generation) 
pricing theory see: arbitrage — 
Prim algorithm see: modified — 
primal 
[90C22, 90C25] 
(see: Copositive programming)
primal arc 
[90B35] 
(see: Job-shop scheduling problem) 
primal bound-improvement 
[49M27, 90C11, 90C30] 
(see: MINLP: generalized cross decomposition)
primal cut-improvement 
[49M27, 90C11, 90C30] 
(see: MINLP: generalized cross decomposition)
primal degenerate 
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules)
primal degenerate basis 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Lexicographic pivoting rules) 
primal-dual 
[37A35, 49M27, 90C05, 90C11, 90C25, 90C30] 
(see: MINLP: generalized cross decomposition; Potential
reduction methods for linear programming; Solving large 
scale and sparse semidefinite programs) 
primal-dual algorithm 
[90C05] 
(see: Linear programming: interior point methods) 
primal-dual algorithm 
[90C25, 90C30, 90C51, 94A17]
(see: Entropy optimization: interior point methods; 
Successive quadratic programming: solution by active sets 
and interior point methods) 
primal-dual framework 
[90C25, 90C30] 
(see: Lagrangian multipliers methods for convex 
programming) 
primal-dual interior-point methods 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
primal-dual methods 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
primal-dual methods 
[90C25, 90C30] 
(see: Lagrangian multipliers methods for convex 
programming) 
primal-dual potential function 
[37A35, 90C05, 90C25, 90C30, 90C51, 94A17] 
(see: Entropy optimization: interior point methods; 
Potential reduction methods for linear programming; 
Solving large scale and sparse semidefinite programs) 
primal-dual potential reduction algorithm 
[37A35, 90C05] 
(see: Potential reduction methods for linear programming)
primal and dual problems 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules)
primal-dual scaling algorithm 
[90C25, 90C30] 
(see: Solving large scale and sparse semidefinite programs) 
primal and dual simplex algorithms 
[90C05, 90C06] 
(see: Selfdual parametric method for linear programs) 
primal-dual solution 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization)
primal and dual solutions see: exploiting the interplay 
between — 
primal-dual SQPIP methods 
[49K20, 49M99, 90C55] 
(see: Sequential quadratic programming: interior point 
methods for distributed optimal control problems) 
primal feasibility 
[68W10, 90B15, 90C05, 90C06, 90C30, 90C31]
(see: Multiparametric linear programming; Parametric 
linear programming: cost simplex algorithm; Stochastic 
network problems: massively parallel solution) 


primal feasible basis 

[90C05, 90C31] 

(see: Multiparametric linear programming) 
primal feasible set 

[49-XX, 90-XX, 93-XX] 

(see: Duality theory: biduality in nonconvex optimization) 
primal gap function 

[90C06, 90C25, 90C30, 90C35] 

(see: Cost approximation algorithms; Simplicial
decomposition algorithms)
primal heuristic see: Toyoda — 


primal heuristics 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
primal linear semi-infinite program 
[90C05, 90C34] 
(see: Semi-infinite programming: methods for linear 
problems) 
primal (linear) semi-infinite program 
[90C05, 90C34] 
(see: Semi-infinite programming: methods for linear 
problems) 
primal master problem 
[90C06, 90C15] 
(see: Stabilization of cutting plane algorithms for stochastic 
linear programming problems) 
primal master problem see: relaxed — 
primal method 
[90C06, 90C08, 90C15] 
(see: Simple recourse problem) 
primal method see: Simple recourse problem: — 
primal method for the simple recourse problem 
[90-08, 90C05, 90C06, 90C08, 90C15] 
(see: Simple recourse problem: dual method) 
primal optimization problem 
[90C30] 
(see: Lagrangian duality: BASICS) 
primal pair 
[90B35] 
(see: Job-shop scheduling problem) 
primal potential function 
[37A35, 90C05] 
(see: Potential reduction methods for linear programming) 
primal potential reduction algorithm 
[37A35, 90C05] 
(see: Potential reduction methods for linear programming) 
primal problem 
[15A39, 49-XX, 49L99, 49M29, 90-XX, 90C05, 90C11, 90C22, 
90C25, 90C29, 90C30, 90C31, 93-XX] 
(see: Duality theory: monoduality in convex optimization; 
Dynamic programming: average cost per stage problems; 
Generalized benders decomposition; Lagrangian duality: 
BASICS; Motzkin transposition theorem; Multi-objective 
optimization: lagrange duality; Semidefinite programming: 
optimality conditions and stability; Theorems of the 
alternative and optimization) 
primal problem see: J-normal —; J-stable —; N-normal —; 
nonconvex — 
primal programming problem 
[90C06] 
(see: Saddle point theory and optimality conditions) 
primal ray 
[15A39, 90C05] 
(see: Motzkin transposition theorem) 
primal-relaxed dual algorithm see: generalized — 
primal-relaxed dual approach 
[90C26] 
(see: Generalized primal-relaxed dual approach) 
primal-relaxed dual approach see: Generalized — 
primal-scaling algorithm
[90C25, 90C30] 
(see: Solving large scale and sparse semidefinite programs) 
primal SD problem 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
primal SD problem see: equivalent — 
primal simplex algorithm 
[90C35]
(see: Generalized networks) 
primal simplex method see: lexicographic — 
primal solution 
[90B80, 90C11]
(see: Facility location with staircase costs) 
primal subproblem 
[90C11, 90C31]
(see: Parametric mixed integer nonlinear optimization) 
primary structure 
[92B05]
(see: Genetic algorithms for protein structure prediction) 
prime see: multiplicity of a — 
prime representation of a feasible region 
[90C05, 90C20]
(see: Redundancy in nonlinear programs) 
primitive integral vector 
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods) 
primitive partition identities 
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods) 
principal see: proximate optimality — 
principal agent 
[90C90, 91A65, 91B99]
(see: Bilevel programming: applications) 
principal pivot see: simple — 
principal pivot algorithm 
[05B35, 90C05, 90C20, 90C33]
(see: Least-index anticycling rules) 
principal pivotal transform
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity
problems)
principal pivotal transformation
[90C33]
(see: Linear complementarity problem)
principal pivoting
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity
problems)
principal pivoting see: matrix class invariant under —
principal pivoting method
[90C33]
(see: Linear complementarity problem)

Principal pivoting methods for linear complementarity 
problems 
(90C33, 90C20, 65K05) 
(referred to in: Criss-cross pivoting rules; Equivalence 
between nonlinear complementarity problem and fixed 
point problem; Generalized nonlinear complementarity 
problem; Integer linear complementary problem; LCP: 
Pardalos–Rosen mixed integer formulation; Least-index
anticycling rules; Lexicographic pivoting rules; Linear 
complementarity problem; Linear programming; Order 
complementarity; Pivoting algorithms for linear 
programming generating two paths; Probabilistic analysis 
of simplex algorithms; Simplicial pivoting algorithms for 
integer programming; Topological methods in 
complementarity theory) 
(refers to: Convex-simplex algorithm; Criss-cross pivoting 
rules; Equivalence between nonlinear complementarity 
problem and fixed point problem; Generalized nonlinear 
complementarity problem; Integer linear complementary 
problem; LCP: Pardalos–Rosen mixed integer formulation;
Least-index anticycling rules; Lemke method; Lexicographic 
pivoting rules; Linear complementarity problem; Linear 
programming; Linear programming: interior point 
methods; Order complementarity; Parametric linear 
programming: cost simplex algorithm; Pivoting algorithms 
for linear programming generating two paths; Probabilistic 
analysis of simplex algorithms; Sequential simplex method; 
Topological methods in complementarity theory) 

principal submatrix 
[65K05, 90C20, 90C33] 
(see: Linear complementarity problem; Principal pivoting 
methods for linear complementarity problems) 

principal variation path 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 

principal variation splitting algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 

principle see: argument —; auxiliary problem —; branch and 
bound —; Carathéodory —; disjunctive cut —; duality and 
maximum —; ekeland variational —; exhaustion —; 
extended Extremal —; extremal —; high-order local 
maximum —; inclusion —; Jaynes’ maximum entropy —; 
Kataoka —; Kimura maximum —; local maximum —; 
local-ratio —; maximum —; maximum entropy —; 
maximum likelihood —; maximum a posteriori —; 
minimal —; minimum cross-entropy —; nonanticipative —; 
pontryagin’s maximum —; Pontryagin minimum —; 
proximate optimality —; subdifferential Variational —; 
Wardrop first —; Wardrop second — 

principle for abnormal extremals see: High-order maximum — 

principle of Fourier see: mechanical — 

principle: image reconstruction see: Maximum entropy — 

principle of insufficient reason see: laplace’s — 

principle of insufficient reasoning see: Laplace — 

principle for Lagrangian problems see: high-order local 
maximum — 

principle of least effort 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
principle of linear programming see: Decomposition — 
principle of machine interval arithmetic see: inclusion — 
principle of maximum entropy 

[90C25, 94A08, 94A17] 

(see: Entropy optimization: shannon measure of entropy
and its properties; Maximum entropy principle: image
reconstruction)
principle of maximum entropy 

[94A08, 94A17]

(see: Maximum entropy principle: image reconstruction) 
principle of maximum entropy see: axiomatic derivation of
the —
principle of minimum 

[90C25, 94A17]

(see: Entropy optimization: shannon measure of entropy
and its properties)
principle of minimum cross-entropy 

[94A08, 94A17]

(see: Maximum entropy principle: image reconstruction) 
principle of minimum cross-entropy see: axiomatic derivation
of the —
principle of optimality 
[93-XX] 
(see: Dynamic programming: optimal control applications)
principle of optimality see: weak — 
principle of Pareto optimality of MODP 
[90C31, 90C39] 
(see: Multiple objective dynamic programming)
principle of transfers
[90B85] 

(see: Single facility location: multi-objective euclidean
distance location)
principle of virtual work 

[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]

(see: Hemivariational inequalities: applications in
mechanics)
principles see: extremum —; minimax —; variational — 
printing a declarative program see: pretty- — 
prior distribution 

[62C10, 65K05, 90C10, 90C15, 90C26] 

(see: Bayesian global optimization) 
prior probability 

[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]

(see: Disease diagnosis: optimization-based methods) 
priori see: a —
priori method see: a — 
priori optimization see: a — 
priorities 

(see: Planning in the process industry) 
priorities see: relative — 
priorities selection 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: branch and bound methods) 
Prize see: Nobel — 
prize collecting traveling salesman problem 

[90C10, 90C11, 90C27, 90C57] 

(see: Integer programming)

probabilistic analysis 
[90C90] 
(see: Chemical process planning) 

probabilistic analysis of an algorithm 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 

Probabilistic analysis of simplex algorithms 
(90C05, 68Q25, 60D05, 52A22) 
(referred to in: Criss-cross pivoting rules; Least-index 
anticycling rules; Lexicographic pivoting rules; Linear 
programming; Pivoting algorithms for linear programming 
generating two paths; Principal pivoting methods for linear 
complementarity problems; Simplicial pivoting algorithms 
for integer programming) 
(refers to: Criss-cross pivoting rules; Least-index anticycling 
rules; Lexicographic pivoting rules; Linear 
complementarity problem; Linear programming; Pivoting 
algorithms for linear programming generating two paths; 
Principal pivoting methods for linear complementarity 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems) 

probabilistic collapse 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]

(see: Checklist paradigm semantics for fuzzy logics) 

probabilistic constrained linear programming 

[90C05, 90C15]

(see: Probabilistic constrained linear programming: duality
theory)

Probabilistic constrained linear programming: duality theory 
(90C05, 90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming:
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

Probabilistic constrained problems: convexity theory 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Simple recourse problem: 
dual method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems:
massively parallel solution; Stochastic programming:
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Simple recourse problem: 
dual method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

probabilistic constrained stochastic programming 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 

probabilistic constraint 
[90C15] 
(see: Probabilistic constrained problems: convexity theory; 
Static stochastic programming models) 

probabilistic constraint see: integrated —; joint —; 
programming under — 

probabilistic constraints see: individual — 

probabilistic criterion 

[90C29, 91A99]

(see: Preference disaggregation)

probabilistic estimate 

[26A24, 65D25] 

(see: Automatic differentiation: introduction, history and
rounding error estimation)

probabilistic method for detecting redundancy 

[90C05, 90C20]
(see: Redundancy in nonlinear programs) 

probabilistic optimization models for data classification see: 
Deterministic and — 
probabilistic programming

[90C15]

(see: Two-stage stochastic programs with recourse) 
probabilistic traveling salesman 

[90C10, 90C15]

(see: Stochastic vehicle routing problems) 
probabilistic uncertainty 

[94A17]

(see: Jaynes’ maximum entropy principle) 
probabilities see: randomly with predefined — 
probability 

[28-XX, 49-XX, 60-XX] 

(see: General moment optimization problems) 
probability see: prior —; randomly with the same — 
probability density see: transition — 
probability density function see: logconcave — 
probability distribution 

[68Q25, 90C26, 91B28] 

(see: Competitive ratio for portfolio management;
Emergency evacuation, optimization modeling; Global
optimization in batch design under uncertainty)
probability distribution see: incomplete knowledge of a —;
Levy —; logconcave discrete —; logconcave univariate
discrete —; quasiconcave —; Tsallis —; uncertainty
embedded in a —
probability distribution function see: multivariate —; 
one-dimensional marginal —; two-dimensional marginal — 
probability function 

[90C15] 

(see: Approximation of extremum problems with
probability functionals; Derivatives of probability and
integral functions: general theory and examples)
probability function 

[90C15] 

(see: Derivatives of probability and integral functions:
general theory and examples; Extremum problems with
probability functions: kernel type solution methods)
probability function see: derivative of a —; gradient of a — 
probability functionals 

[90C15] 

(see: Approximation of extremum problems with
probability functionals)
probability functionals see: Approximation of extremum
problems with —
probability functions 

[90C15] 

(see: Derivatives of probability and integral functions:
general theory and examples)
probability functions: kernel type solution methods see:
Extremum problems with —
probability integral see: multivariate — 
probability and integral functions: general theory and
examples see: Derivatives of —
probability integrals 

[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 

(see: Approximation of multivariate probability integrals) 


probability integrals see: Approximation of multivariate —; 
lower bounds for multivariate —; upper bounds for 
multivariate — 

probability matrix see: transition — 

probability measure see: γ-concave —; logconcave —;
logconvex —; quasiconcave —; Wiener — 

probability measure space 
[60G35, 65K05] 
(see: Differential equations and global optimization) 

probability measures see: Derivatives of —; regular family 
of —; weak convergence of — 

probability metric 
[90C11, 90C15, 90C31] 
(see: Stochastic integer programming: continuity, stability, 
rates of convergence) 

probability-one globally convergent homotopies 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 

probability-one homotopy 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 

probability-one homotopy algorithm see: globally 
convergent — 

probing 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 

problem 
[47J20, 49J40, 65K10, 90C33, 90C60] 
(see: Computational complexity theory; Solution methods 
for multivalued variational inequalities) 

problem see: 2-matching —; 3-dimensional matching —; 
3D-transportation —; accessory minimum —; acyclic
subdigraph —; adjoint —; airline maintenance routing —; 
airplane hopping —; algebraic quadratic assignment —; 
Alignment —; analytical approximation of a linear 
programming —; approximation to the —; aspatial 
oligopoly —; asset selling —; assignment —; 
astronomical —; Asymptotic properties of random 
multidimensional assignment —; Automatic differentiation: 
root problem and branch —; average cost per stage —; axial 
multi-index transportation —; b-matching —; backboard 
wiring —; ball-constrained linear —; bandwidth —; 
bandwidth packing —; beam segmentation —; 
bi-knapsack —; Bi-objective assignment —; bidimensional 
knapsack —; bidual —; bilateral boundary value —; bilevel 
programming —; bilinear programming —; bin packing —;
binary constraint satisfaction —; Biquadratic assignment —; 
Boolean classification —; Boolean function inference —; 
bottleneck quadratic assignment —; bound constrained 
quadratic —; bounded degree minimum spanning tree —; 
branch —; Broadcast scheduling —; qc optimization —; 
calm —; canonical monotonic optimization —; capacitated 
arc routing —; capacitated lot-sizing —; capacitated 
minimum spanning arborescence —; capacitated minimum 
spanning tree —; capacitated transportation —; 
capacitated vehicle routing —; Chebyshev —; chemical 
equilibrium —; Chinese postman —; classical oligopoly —;
classical traveling salesman —; classification —; clique —;
clustering —; coercive hemivariational inequality —; 
combinatorial —; combinatorial optimization —; 
communication-free alignment —; Communication 
network assignment —; complementarity —; complete
master —; consistent least squares —; constant degree 
parallelism alignment —; constrained minimax —; 
constrained minimization —; constraint satisfaction —; 
construction of a dual —; continuous multiple criteria —; 
convex —; convex feasibility —; convex hull —; convex 
integer transportation —; convex intersection —; convex 
moment —; convex optimization —; convex
programming —; convex quadratic knapsack —; convex
regression —; convex relaxation —; convex semidefinite 
programming —; covering —; crew-scheduling —; 
cutting-stock —; data-association —; d.c. programming —; 
decision —; degenerate —; degree-constrained
subgraph —; design —; deterministic equivalent —;
deterministic shortest path —; directed Chinese
postman —; direction finding —; discounted —; discounted
infinite horizon —; discrete dynamic complementarity —; 
discrete location —; discrete multiple criteria —; discrete 
resource allocation —; discretized SIP —; disjunctive OA 
master —; dispatch —; distance-constrained vehicle 
routing —; distance geometry —; distance matrix 
completion —; dual —; dual method for the simple 
recourse —; dual optimization —; dual programming —; 
dual SD —; dual variational inequality —; duality of the 
linear SIP —; dynamic complementarity —; dynamic 
location —; dynamic two-stage stochastic programming —; 
dynamic vehicle routing —; Economic lot-sizing —; edge 
coloring —; eigenvalue —; elevator —;
entry-uniqueness —; equality-constrained nonlinear
programming —; Equivalence between nonlinear 
complementarity problem and fixed point —; equivalent 
primal SD —; Euclidean distance location —; Euclidean 
distance matrix completion —; evaluation —; express 
delivery —; extended linear complementarity —; extended 
quadratic programming —; facility location —;
feasibility —; feasible —; feasible flow —; feedback arc
set —; feedback set —; feedback vertex set —; Fekete
points —; Fermat —; finite-dimensional control —; finite- 
dimensional variational inequality —; finite minimax —; 
finite moment —; first level —; fixed charge —; fixed charge
network flow —; fixed charge transportation —; fixed 
point —; fleet assignment —; flow —; flow-shop —; Flow 
shop scheduling —; follower —; fractional 0-1 
programming —; fractional combinatorial optimization —;
fractional programming —; Frequency assignment —; fuel
mixture —; full master —; fully nonlinear —; g-group 
classification —; g-group classification problem 
(discriminant —; Gauss —; general case of the trust
region —; general Fermat —; general order
complementarity —; general quadratic assignment —; 
General routing —; Generalizations of interior point 
methods for the linear complementarity —; generalizations 
of the nonlinear complementarity —; generalized 
assignment —; generalized bilevel programming —; 
generalized complementarity —; generalized dual —; 
Generalized eigenvalue proximal support vector
machine —; generalized least squares —; generalized linear
order complementarity —; generalized mixed 
complementarity —; generalized network —; Generalized 
nonlinear complementarity —; generalized nonlinear least 
squares —; generalized order complementarity —;
generalized semi-infinite —; generalized Weber —;
geometrical —; geometrically linear —; geometrically 
nonlinear —; global constrained optimization —; global 
maximization —; global optimization —; global 
unconstrained optimization —; Gomez #3 —; graph —; 
graph bipartitioning —; graph bipartization —; graph 
coloring —; graph isomorphism —; graph packing —; 
graph partitioning —; graph Realization —; graphical 
traveling salesman —; Hamiltonian circuit —; Hamiltonian 
cycle —; hard case of the trust region —; head-body-tail —; 
Heuristic and metaheuristic algorithms for the traveling 
salesman —; hierarchical programming —; high point —;
Hilbert tenth —; Hilbert’s thirteenth —; hitting cycle —; 
horizontal linear complementarity —; hyperbolic 0-1 
programming —-; ill-conditioned —-;; ill-posed —-; ill-posed 
variational —; image —; implicit complementarity —; 
implicit general order complementarity —; impossible pairs 
constrained path —; inductive inference —; infeasible —; 
infinite-dimensional generalized order complementarity —; 
infinite many conditions moment —; infinite moment —; 
initial value —; inner —; Integer linear complementary —; 
integer multi-index transportation —; integer 
optimization —; integer programming —; integral linear
fractional combinatorial optimization —; interpolation —; 
intractable —; inventory control —; inventory ship
routing —; irregular operations —; isotonic regression —;
iterative solution of the Euclidean distance location —; 
J-normal primal —; J-stable primal —; JJT-regular —; 
job-shop —; Job-shop scheduling —; jointly convex —; 
k-index transportation —; k-level planarization —; k-way 
graph partitioning —; KH-regular —; knapsack —; 
Koopmans-Beckmann quadratic assignment —; £; 
estimation —; Lagrangian dual —; Lagrangian dual 
optimization —; language recognition —; large residual —; 
large scale —; large scale nonlinear mixed integer 
programming —; large scale trust region —; largest empty
circle —; leader —; least squares —; level planarization —; 
Levitin-Polyak well-posed —; line search —; linear
arrangement —; linear bilevel programming —; linear
complementarity —; linear complementary —; linear 
multiple-choice knapsack —; linear network flow —; linear 
optimization —; linear order complementarity —; Linear 
ordering —; linear programming —; linear-quadratic —;
linear semidefinite programming —; linear zero-one
integer —; local minimizer —; locally reduced —;
location —; location-allocation —; Location routing —;
locational decision —; lower —; lower-level —; LS —; 
m-coloring —; m-dimensional knapsack —; marriage —; 
master —; match-network —; matching —; matrix 
completion —; matrix rounding —; max-clique —; 
max-det —; max-min-max optimization —; 
max-r-Constraint Satisfaction —; maximal flow —; 
maximum cardinality matching —; maximum clique —; 
maximum constraint satisfaction —; maximum coverage 
location —; Maximum flow —; maximum Independent
Set —; maximum partition matching —; maximum
pre-matching —; maximum rank completion —; Maximum 
satisfiability —; maximum weight clique —; mean value —; 
Metaheuristic algorithms for the vehicle routing —; MILP 
master —; minimax —; minimax observation —; 
minimization —; minimum Bisection —; minimum concave
transportation —; minimum cost flow —; minimum cost
network flow —; minimum cut —; minimum feedback arc 
set —; minimum feedback vertex (arc) set —; minimum 
Multiprocessor Scheduling —; minimum rank
completion —; minimum spanning tree —; minimum
sphere —; minimum-units —; minimum Vertex Cover —; 
minimum weight feedback arc set —; minimum weighted 
feedback vertex set —; minimum weighted graph 
bipartization —; MINLP: trim-loss —; minmax —; minMax 
Matching Subgraph —; MIQP master —; mixed integer —; 
mixed integer nonconvex —; mixed integer nonlinear 
programming —; mixed integer optimal control —; mixed 
integer optimization —; mixed linear complementarity —; 
mixed nonlinear integer programming —; Molecular 
distance geometry —; multi-armed restless bandit —; 
Multi-depot vehicle scheduling —; multi-index 
transportation —; multi-knapsack —; multicommodity 
network flow —; multiconstraint knapsack —; 
Multidimensional assignment —; multidimensional 
knapsack —; multidimensional multiple-choice
knapsack —; multidimensional scaling —; multidimensional
transportation —; multidimensional zero-one knapsack —; 
multifacility Weber —; multilevel generalized
assignment —; multilevel programming —; multiperiod
assignment —; multiple-choice knapsack —; multiple 
criteria design —; multiple knapsack —; multiWeber —; 
multiWeber-Rawls —; mVI —; N-normal primal —;
n-queens —; nested STO —; network design —; network 
flow —; network structure of the spatial price
equilibrium —; newsboy —; Newton step case of the trust
region —; node-arc formulation of the —; node
covering —; node-path formulation of the multicommodity
flow —; nonconvex dual —; nonconvex network flow —; 
nonconvex optimization —; nonconvex primal —; 
nonconvex programming —; nonlinear
complementarity —; nonlinear complementary —;
nonlinear discretized SIP —; nonlinear dynamic network 
flow —; nonlinear feasibility —; nonlinear mathematical 
programming —; nonlinear network flow —; nonlinear 
optimization —; nonlinear order complementarity —; 
nonlinear programming —; nonlinear single commodity 
network flow —; nonseparable —; nonseparable 
optimization —; nonsmooth Dirichlet —; nonsmooth 
eigenvalue —; nonzero residual —; NP-complete —; 
NP-hard —; numerical constraint satisfaction —; numerical 
example of a trim-loss —; objective for a location —; ODE 
two-point boundary value —; one-parametric finite 
optimization —; open shop —; optimal assignment —; 
optimal control —; optimization —; order 
complementarity —; order preserving assignment —; 
outer —; p-center —; p-median —; p-median 
location-allocation —; package flow —; packing —; 
parameter identification —; parametric —; parametric 
linear complementarity —; parametric nonlinear 
complementarity —; parametric optimization —; 
parametric variational inequality —; path coloring —; 
perfect b-matching —; perfect matching —; perturbed least 
squares —; phase —; phase equilibrium —; phase 
stability —; physically linear —; physically nonlinear —; 
piecewise linear minimum cost network flow —; planar 
multi-index transportation —; political districting —;
polynomially transformable decision —; portfolio
selection —; positive (semi) definite completion —; positive
semidefinite matrix completion —; pricing —; primal —; 
primal master —; primal method for the simple recourse —; 
primal optimization —; primal programming —; primal
SD —; prize collecting traveling salesman —; process
planning —; Production-distribution system design —; 
programming —; protein folding —; prototype location —; 
pure network —; quadratic assignment —; quadratic 
programming —; Quadratic semi-assignment —; quadratic 
zero-one —; quasidifferentiable programming —; radio link 
frequency assignment —; real-world —; realisable —; 
recognition —; recourse —; rectangular partition —; 
rectilinear distance location —; regularized direction 
finding —; regularizing state —; relaxed —; relaxed
control —; relaxed master —; relaxed primal master —;
resource allocation —; resource-constrained minimum 
spanning tree —; restricted location —; restricted
master —; p-regular —; right-hand side —; right-hand side
perturbation —; road traveling salesman —; robust 
programming —; root —; rural postman —;
saddle-point —; sample —; SAT-CNF —; satellite —;
satisfiability —; scheduling —; second level —; selection —; 
selfdual —; semi-infinite optimization —; semidefinite 
programming —; Sensor network localization —; 
separable —; separable optimization —; separation —; 
sequencing —; set covering —; set packing —; set 
partitioning —; set-valued optimization —; shortest
path —; simple plant location —; Simple recourse —;
simultaneous Diophantine approximation —; Single-depot 
vehicle scheduling —; Single facility location: circle 
covering —; single-ratio fractional (hyperbolic) 0-1 
programming —; single source shortest path tree —;
singular control —; sinusoidal parameter estimation —; 
Skorokhod —; smallest enclosing-circle —; solution of a —;
solution of the alignment —; solution of the convex 
moment —; sorting —; sparse least squares —; spatial price 
equilibrium —; squared Euclidean distance location —; 
stability —; stable —; stable marriage —; standard 
moment —; standard quadratic optimization —; standard 
SD —; standard traffic equilibrium —; state —; static 
deterministic —; Steiner —; Steiner graphical traveling 
salesman —; Steiner minimal tree —; Steiner-Weber —; 
stiff —; stochastic dynamic optimization —; stochastic 
network —; stochastic programming —; stochastic shortest 
path —; stochastic transportation —; stochastic 
transportation and location —; stochastic vehicle
routing —; strongly polynomial time —; subset feedback
vertex (arc) set —; subset minimum feedback vertex (arc) 
set —; subset-sum —; survivable network design —;
Sylvester —; symmetric multi-index transportation —; 
synthesis —; terminal layout —; three-dimensional 
transportation —; three-index transportation —; 
Time-dependent traveling salesman —; time optimal 
control —; total coloring —; total cost infinite horizon —; 
total least squares —; traffic assignment —; tramp
steamer —; transportation —; transshipment —; Traveling
purchaser —; traveling salesman —; traveling
salesperson —; trim-loss —; trust region —; turbine
balancing —; Turing machine solving a —; two-point 
boundary value —; unary optimization —; uncapacitated 
facility location —; uncapacitated network flow —;
uncapacitated plant location —; unconstrained —; 
unconstrained nonlinear least squares —; unconstrained 
optimization —; underlying deterministic —; 
undiscounted —; unilateral boundary value —; unweighted 
feedback vertex set —; upper —; upper level —; variational 
inequality —; vector variational inequality —; vehicle 
routing —; vehicle scheduling —; vertex (arc) deletion —; 
vertical linear complementarity —; warehouse location —; 
Weber —; Weber-Rawls —; weighted bipartite 
matching —; weighted graph coloring —; weighted least 
squares —; weighted matching —; weighted MAX-SAT —; 
well-conditioned —; well-posed —; zero-one integer 
feasibility —; zero-one knapsack —; zero-one 
programming —; zero residual —
problem in air traffic control see: ground delay — 
Problem (ATSP) see: asymmetric Traveling Salesman — 
problem with attraction and repulsion see: Global optimization 
in Weber's —; Weber — 
problem with backhauls see: vehicle routing — 
problem with bounded cost per stage see: discounted — 
problem: branch & cut algorithms see: Stable set — 
problem and branch problem see: Automatic differentiation: 
root — 
problem of the calculus of variations see: inverse — 
problem, CMO see: Contact map overlap maximization — 
problem decomposition 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 
problem description 
[90C30, 90C35] 
(see: Optimization in water resources) 
problem (discriminant problem) see: g-group classification — 
problem: dual method see: Simple recourse — 
problem equivalence 
[90C60] 
(see: Computational complexity theory) 
problem of finding shortest paths 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
problem and fixed point problem see: Equivalence between 
nonlinear complementarity — 
problem formulation 
[68M12, 90B18, 90C11, 90C30]
(see: Optimization in ad hoc networks) 
problem formulation see: multilevel — 
problem formulations see: Stochastic optimal stopping: — 
problem with friction see: coupled unilateral contact — 
problem generators see: Combinatorial test problems and — 
problem for input-output tables see: triangulation — 
problem instance 
[68Q25, 90C60] 
(see: NP-complete problems and proof methodology) 
problem instance see: size of a — 
problem instance in time m see: algorithm solving a — 
problem integration 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 
problem involving QD-superpotentials see: convex variational 
inequality for an elastostatic —; elastostatic —; variational 
equality for an elastostatic — 


problem, MAX-CUT see: Maximum cut — 

problem with minimum number of Steiner points see: Steiner 
tree — 

problem modeling
[90C10, 90C30]
(see: Modeling languages in optimization: a new paradigm)
Problem (MP) see: minimum Partition — 
problem on a network see: 1-median —; covering —;
p-center —
problem with nonnegative lower bounds see: maximum
flow —
problem with nonunit capacity
[00-02, 01-02, 03-02]
(see: Vehicle routing problem with simultaneous pickups
and deliveries)
problem in OR see: multifacility —; single-criterion —;
single-facility —; unweighted —; weighted —
problem parameters 
[90C60] 
(see: Computational complexity theory) 
problem: primal method see: Simple recourse — 
problem principle see: auxiliary — 
problem and a projected dynamical system see: variational 
inequality — 
problem in protein folding: αBB global optimization approach
see: Multiple minima — 

problem regular in the sense of Jongen-Jonker-Twilt
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31,
90C33, 90C34]
(see: Parametric optimization: embeddings, path following
and singularities)

problem regular in the sense of Kojima-Hirabayashi
[65K05, 65K10, 90C20, 90C25, 90C26, 90C29, 90C30, 90C31,
90C33, 90C34]
(see: Parametric optimization: embeddings, path following
and singularities)

problem representation
[90C30]
(see: Cost approximation algorithms)

problem result
[90C60]
(see: Complexity theory)

problem for the Rosen method see: global convergence — 

Problem (SCP) see: stacker Crane — 

problem in semi-infinite programming see: reduced — 

problem with simultaneous pickups and deliveries see: Vehicle 
routing — 

problem, SNLP see: Semidefinite programming and the sensor 
network localization — 

problem solution
[90C10, 90C30]
(see: Modeling languages in optimization: a new paradigm)
problem solving see: restriction to the solution set in — 
problem solving environment
[90C10, 90C26, 90C30]
(see: Optimization software)
problem of spot rate estimation see: t-programmed — 
problem in SQP see: quadratic programming — 
problem in standard form see: linear optimization — 

Problem (SVP) see: seismic Vessel — 
problem synthesis
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 

problem with time windows see: vehicle routing — 

problem under uncertainty with perturbations see: minimax 
observation — 

problem variables see: separable — 

problem in X-ray crystallography: Shake and bake approach 
see: Phase — 

problems see: 0-1 mixed integer —; algorithms for isotonic 
regression —; approximate methods for solving vehicle 
routing —; Approximations to robust conic optimization —; 
average cost per stage —; barrier location —; bilevel 
programming —; Bottleneck steiner tree —; cargo
routing —; classification of hard —; combinatorial 
optimization —; Combinatorial optimization algorithms in 
resource allocation —; Complexity and large-scale least 
squares —; computational complexity of optimization —; 
constrained minimax —; constraint satisfaction —; 
constructive methods for solving vehicle routing —; 
continuous constraint satisfaction —; Continuous 
reformulations of discrete-continuous optimization —; 
Control —; convex —; Convex envelopes in 
optimization —; convex and nonconvex programming —; 
decomposition algorithms for nonconvex minimization —; 
Decomposition algorithms for the solution of multistage 
mean-variance optimization —; design —; discrete 
monotonic optimization —; discretization of 
optimization —; discretized optimal control —; distributed 
optimal control —; distribution —; dynamic network 
flow —; Dynamic programming: average cost per stage —; 
Dynamic programming: discounted —; Dynamic 
programming: stochastic shortest path —; Dynamic 
programming: undiscounted —; equilibrium —; 
equivalence classes of —; exact methods for solving vehicle 
routing —; extended linear programming —; Facilities 
layout —; facility location —; Feedback set —; fixed 
demand traffic network —; formulation and solution of 
inverse —; General moment optimization —; Generalized 
monotonicity: applications to variational inequalities and 
equilibrium —; Global optimization algorithms for financial 
planning —; Global optimization: application to phase 
equilibrium —; Global optimization in location —; 
Hemivariational inequalities: eigenvalue —; Hemivariational 
inequalities: static —; high-order local maximum principle 
for Lagrangian —; “hit-or-miss” decision —; ill-posed —;
Ill-posed variational —; implicit variational —; indefinite 
quadratic —; inequality —; infinite horizon —; integer 
programming —; Integrated vehicle and duty 
scheduling —; Interval analysis: application to chemical 
engineering design —; Interval analysis: 
nondifferentiable —; Isotonic regression —; knapsack —; 
Kuhn-Tucker conditions for quadratic programming 
sub- —; language recognition —; Laplace method and 
applications to optimization —; large nonlinear 
multicommodity flow —; Large scale trust region —; Least 
squares —; linear complementarity —; linear mixed 
integer —; linear SIP —; linearly constrained 
optimization —; Maritime inventory routing —; Matrix 
completion —; maximum flow —; Minimum concave 
transportation —; minisum —; MINLP: applications in
blending and pooling —; Mixed integer classification —;
Modeling difficult optimization —; multi-criteria —; 
multi-depot vehicle scheduling —; multi-index 
assignment —; Multi-index transportation —; 
Multi-objective fractional programming —; 
Multicommodity flow —; Multidimensional knapsack —; 
Multifacility and restricted location —; multiphase 
Steiner —; multistage —; N-adic assignments —; Network 
design —; Network location: covering —; Nonconvex 
network flow —; nondegenerate —; Nondifferentiable 
optimization: minimax —; nonlinear assignment —; 
nonlinear complementarity —; Nonlinear least squares —; 
nonlinear multicommodity flow —; nonlinear network 
flow —; nonlinear optimization —; Nonoriented 
multicommodity flow —; nonsmooth optimization —; 
optimal design —; optimization —; Optimization in 
boolean classification —; parametric complementarity —; 
partly convex —; Piecewise linear network flow —; PLS —; 
polynomial time local search —; pooling and blending —; 
Price of robustness for linear optimization —; primal and 
dual —; Principal pivoting methods for linear 
complementarity —; quadratic generalized network —; 
quasidifferentiable —; reducibility of —; reducible —; 
second order Lagrangian theory of CNSO —; semi-infinite 
optimization —; Semi-infinite programming and control —; 
Semi-infinite programming: methods for linear —; 
Sensitivity analysis of complementarity —; Sensitivity 
analysis of variational inequality —; Sequential quadratic 
programming: interior point methods for distributed 
optimal control —; Set covering, packing and 
partitioning —; shortest path tree —; simulation —; 
Simultaneous estimation and optimization of nonlinear —; 
Single-depot vehicle scheduling —; solution of bilevel 
programming —; sorting —; Splitting method for linear 
complementarity —; SQP optimization in industrial —; 
stability analysis of optimization —; Stabilization of cutting 
plane algorithms for stochastic linear programming —; 
Stackelberg —; Steiner tree —; stochastic —; stochastic 
linear optimization —; Stochastic quasigradient methods in 
minimax —; stochastic shortest path —; Stochastic 
transportation and location —; Stochastic vehicle 
routing —; substationarity —; three-index assignment —; 
toy —; transformation of —; traveling salesman —; 
variational —; variational inequality — 

problems: algorithms see: Standard quadratic optimization — 

problems: applications see: Standard quadratic 
optimization — 

problems: convexity theory see: Probabilistic constrained — 

problems with a fixed number of vehicles see: Vehicle 
scheduling — 

problems with massive data sets see: least squares — 

problems: massively parallel solution see: Stochastic 
network — 

problems method see: extended support —; supports — 

problems with multiple types of vehicles see: Vehicle 
scheduling — 

problems in optical networks see: Integer linear programs for 
routing and protection — 

problems and optimization see: Plant layout — 

problems: optimization techniques see: Estimating data for 
multicriteria decision making — 
problems, overview see: Dynamic programming: infinite
horizon — 

problems with probability functionals see: Approximation of 
extremum — 

problems with probability functions: kernel type solution 
methods see: Extremum — 

problems and problem generators see: Combinatorial test — 

problems and proof methodology see: NP-complete — 

problems solution method see: support — 

problems with spatial interaction see: Facility location — 

problems with staircase costs see: convex piecewise 
linearization in facility location —; heuristics of facility 
location —; linearization in facility location —; solution of 
facility location — 

problems: theory see: Standard quadratic optimization — 

problems with time constraints see: vehicle scheduling — 

problems with travel demand functions see: elastic demand 
traffic network — 

problems in unit-disk graphs see: Optimization — 

problems and variational inequalities see: Nonsmooth and 
smoothing methods for nonlinear complementarity — 

procedure see: alternating —; anti-cycling —; arc oriented 
construction —; arc separation —; αBB —; best arc
construction —; best node construction —; branch and 
cut —; Direct search Luus-Jaakola optimization —;
exact —; greedy randomized adaptive search —; 
heuristic —; improved —; inhibit —; interactive 
sampling —; join —; labeling —; lifting —; LJ 
optimization —; mixed construction —; mixed VAM 
construction —; model finding —; Monte-Carlo 
simulation —; Newton —; next shortest path —; node 
oriented construction —; randomized rounding —;
recursive —; Rosenbrock hillclimbing —; roulette wheel —; 
S- —; sequential estimation —; stochastic discretization —; 
two-phase —; Weiszfeld —; Zionts-Wallenius —

procedures see: construction —; dual —; gradient based —; 
Greedy randomized adaptive search —; interactive —; local 
exchange —; savings —; second order —; solution —; 
statistical — 

process see: 3PM —; adjustment —; analytic hierarchy —; 
application —; automated design optimization —; 
batch —; computational —; deformation —; diffusion —; 
duty scheduling —; equilibrium —; guiding —; Markov —; 
Markov decision —; Metropolis —; multistart —; simple 
homogeneous —; stochastic —; tatonnement —; trip-route
choice adjustment —; Wiener — 

process-Box 
[65K05, 65Y05, 65Y10, 65Y20, 68W10] 
(see: Interval analysis: parallel methods for global 
optimization) 

process control 
[49M37, 90C11, 90C29, 90C90] 
(see: MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control) 

process derivatives 
[60J05, 90C15] 
(see: Derivatives of Markov processes and their simulation;
Derivatives of probability measures) 

process design 
[49M37, 65C20, 65G20, 65G30, 65G40, 65H20, 90C11, 90C29,
90C90]
(see: Interval analysis: application to chemical engineering 
design problems; MINLP: applications in the interaction of 
design and control; Mixed integer nonlinear programming; 
Multi-objective optimization: interaction of design and 
control) 

process design
[65C20, 65G20, 65G30, 65G40, 65H20, 90C90]
(see: Interval analysis: application to chemical engineering
design problems)

process differentiable
[90C15]
(see: Derivatives of probability measures)

process dynamics
[49M37, 90C11]
(see: MINLP: applications in the interaction of design and
control)

process flowsheet
[90C30, 90C90]
(see: Successive quadratic programming: applications in the
process industry)

process industry see: Planning in the —; Successive quadratic 
programming: applications in the — 

process the LCP 
[90C33] 
(see: Lemke method) 

process networks under uncertainty see: Bilevel programming 
framework for enterprise-wide — 

process nonanticipative with respect to a filtration see: 
stochastic — 

process operations
[49M37, 90C11]
(see: Mixed integer nonlinear programming)

process optimization
[65C20, 65G20, 65G30, 65G40, 65H20, 90C90]
(see: Interval analysis: application to chemical engineering
design problems)

process optimization see: off-line —; on-line — 

process planning see: Chemical — 

process planning problem
[90C90]
(see: Chemical process planning)

process representation
[90C15]
(see: Derivatives of probability measures)

process simulation
[65C20, 65G20, 65G30, 65G40, 65H20, 90C90]
(see: Interval analysis: application to chemical engineering
design problems)

process simulation programs
[90C30, 90C90]
(see: Successive quadratic programming: applications in the
process industry)
process synthesis
[49M37, 65C20, 65G20, 65G30, 65G40, 65H20, 90C11, 90C29, 
90C90] 
(see: Interval analysis: application to chemical engineering 
design problems; Mixed integer nonlinear programming; 
Multi-objective optimization: interaction of design and 
control) 

process synthesis 
[65C20, 65G20, 65G30, 65G40, 65H20, 90C90] 
(see: Interval analysis: application to chemical engineering 
design problems) 

process synthesis and design under uncertainty 
[49M37, 90C11] 
(see: Mixed integer nonlinear programming) 

process units see: building blocks for the — 

processes see: abnormal —; biochemical —; Medium-term 
scheduling of batch —; MINLP: design and scheduling of 
batch —; Reactive scheduling of batch —; regenerative —; 
relaxation labeling —; Short-term scheduling of 
continuous —; synthesis of separation — 

processes in interactive methods see: computing — 

processes with resources see: Short-term scheduling of 
batch — 

processes and their simulation see: Derivatives of Markov —

processing see: nonGaussian signal —; nonlinear signal —; 
optimization in medical image — 

processing with higher order statistics see: Signal — 

processor see: virtual — 

product see: Cartesian —; fuzzy relational —; fuzzy triangle —; 
harsh fuzzy —; K-local inner —; mean —; strong —

product of affine functions
[90C26, 90C31]
(see: Multiplicative programming)

product campaign see: mixed- —; single- — 

product of concave functions
[90C26, 90C31]
(see: Multiplicative programming)

product of convex functions
[90C26, 90C31]
(see: Multiplicative programming)

product matrix
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

product of relations see: BK- —; circle —; self-inverse —; 
square — 

product set see: Cartesian — 

product of two affine functions see: program of minimizing 
a —

production see: modeling — 

Production-distribution system design problem 
(90B06) 
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem;
Single facility location: multi-objective euclidean distance
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 

production functions
[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX]
(see: Resource allocation for epidemic control)

production planning
[90-01, 90B30, 90B50, 91B32, 91B52, 91B74]
(see: Bilevel programming in management)

Production planning under uncertainty
[90C15]
(see: Stochastic quasigradient methods in minimax
problems)

production realizing with minimal social cost
[90C33]
(see: Order complementarity)

production set 

[90B30, 90B50, 90C05, 91B82]
(see: Data envelopment analysis) 

production systems see: batch — 

productivity see: maximization of — 

products see: BK- —; nonassociative — 

products of relations see: pseudo-associativity of — 

profiles of conjugate-gradient algorithms for unconstrained 
optimization see: Performance — 

profit 
[90C26] 
(see: Global optimization in batch design under 
uncertainty) 

profit-to-time ratio cycle see: maximum — 

program see: achievement scalarizing —; adjoint —; basic 
operations in a —; bilevel —; concave fractional —; 
conic —; convex multiplicative —; convex quadratic —; 
dual —; dual linear —; dual semi-infinite —; dual 
semidefinite —; extreme point mathematical —; facial —; 
facial disjunctive —; finite-dimensional linear —; 
fractional —; full master —; generalized fractional —; 
infeasible —; integer —; integer linear —; lattice —; 
linear —; linear fractional —; linear multiplicative —; linear 
programming —; linear semidefinite —; loss of descent in
a nonlinear —; master —; max-min fractional —; min-max 
fractional —; mixed integer —; mixed integer nonlinear —; 
multi-objective fractional —; multiperiod stochastic —; 
multistage stochastic —; Nasa —; nondifferentiable 
convex —; nondifferentiable nonconvex —; nonlinear —; 
optimal solution of a —; partly convex —; pretty-printing 
a declarative —; primal (linear) semi-infinite —; 
quadratic —; quadratic fractional —; reduced master —; 
reduced quadratic —; relaxed nonlinear —; semi-infinite —; 
semidefinite —; semidefinite program as conic convex —; 
sequentially convexifiable —; simplex —; single parametric 
mixed integer linear —; single-ratio fractional —; 
stochastic —; stochastic bilevel —; sum-of-ratios 
fractional —; Tchebycheff —; two-stage stochastic linear —; 
unbounded —; weighted-sums —; zero-one integer — 

program with an additional reverse convex constraint see: 
linear — 

program with affine equilibrium constraints see: 
mathematical — 
program as conic convex program see: semidefinite —
program with equilibrium constraints see: mathematical — 
program length see: Shortest — 

program of minimizing a convex multiplicative function 

[90C26, 90C31] 

(see: Multiplicative programming) 
program of minimizing a generalized convex function 

[90C26] 

(see: Global optimization in multiplicative programming) 
program of minimizing a product of two affine functions 

[90C26, 90C31] 

(see: Multiplicative programming) 
program for nonlocal sensitivity analysis see: automated
Fortran —
program with recourse see: stochastic —; stochastic
integer —; stochastic linear —; two-stage stochastic —
program structure see: analysing declarative — 
programmed implementation 

[65G20, 65G30, 65G40] 

(see: Interval analysis: systems of nonlinear equations) 
programmed problem of spot rate estimation see: t- — 
programming see: analytical approximation of linear —;
applications of parametric —; bibliography of stochastic —;
Bilevel —; Bilevel fractional —; Bilevel linear —; bilinear —;
binary linear —; chance constraint —; combinatorial
fractional —; complexity of bilevel —; Complexity theory:
quadratic —; compromise —; concave —; concave
quadratic —; constrained logic —; constraint —; constraint
logic —; continuous —; convex —; convex composite —;
convex integer —; convex parametric —; convex
quadratic —; Copositive —; cost functions in integer —;
D.C. —; Decomposition principle of linear —; Design of
robust model-based controllers via parametric —;
differentiable convex —; differential dynamic —; difficulties
in bilinear —; Disjunctive —; duality for bilevel —; Duality
for semidefinite —; dynamic —; enumeration in bilevel —;
Extension of the fundamental theorem of linear —; feasible
direction method for nonlinear —; Feasible sequential
quadratic —; flexible —; Fractional —; fractional linear —;
Fractional zero-one —; full space successive quadratic —;
fundamental property in convex —; fuzzy —; Fuzzy
multi-objective linear —; Generalized disjunctive —;
generalized geometric —; Geometric —; Global
optimization in generalized geometric —; Global
optimization in multiplicative —; goal —; Graph realization
via semidefinite —; group relaxation in integer —;
handbook on Semidefinite —; history of parametric —;
Homogeneous selfdual methods for linear —;
hyperbolic —; implicit function approach to bilevel —;
indefinite quadratic —; infinite-dimensional linear —;
instability in parametric —; integer —; integer fractional —;
integer linear —; Interior point methods for semidefinite —;
iterative dynamic —; Lagrangian multipliers methods for
convex —; Lexicographic Goal —; linear —;
linear-fractional —; linear semi-infinite —; Lipschitz —;
logic —; mathematical —; matrix splitting methods in
quadratic —; Maximum likelihood detection via
semidefinite —; mixed integer —; mixed integer linear —;
mixed integer nonlinear —; mixed-integer quadratic —;
modeling language and constraint logic —;
multi-objective —; multi-objective fractional —;
Multi-objective integer linear —; multi-objective linear —;
multi-objective mathematical —; Multi-objective mixed 
integer —; multi-objective (multicriteria) mixed integer —; 
multilevel —; Multiparametric linear —; Multiparametric 
mixed integer linear —; multiple objective —; Multiple 
objective dynamic —; multiple objective linear —; 
multiplicative —; multistage stochastic —; n-fold 
integer —; network —; Neuro-dynamic —; nonconvex —; 
nonconvex quadratic —; Nondifferentiable optimization: 
parametric —; nonlinear —; optimality in bilinear —; 
optimality in parametric —; paradigm of logic —; 
parallel —; parametric —; parametric linear —; perfect 
duality from the view of linear semi-infinite —; Piecewise 
linear —; polynomial —; positive semi-definite quadratic 
binary —; Potential reduction methods for linear —; 
Preprocessing in stochastic —; probabilistic —; probabilistic 
constrained linear —; probabilistic constrained 
stochastic —; pure zero-one —; quadratic —; quadratic 
concave —; reduced problem in semi-infinite —; reverse 
convex —; semi-infinite —; semi-infinite linear —; 
semidefinite —; sensitivity in nonlinear —; sequential 
quadratic —; signomial —; Simplicial pivoting algorithms 
for integer —; stability on parametric —; stable bilinear —; 
stable parametric —; stochastic —; stochastic dynamic —; 
stochastic integer —; stochastic linear —; stochastic 
(mixed-)integer —; structural stability in parametric —; 
successive quadratic —; test sets in integer —; topological 
stability in parametric —; two-stage stochastic —; variable 
factor —; Young —; zero-one integer —

programming: algebraic methods see: Integer — 

programming algorithm see: continuous-time equivalent of 
the dynamic —; descent in a nonlinear —; dynamic — 

programming: applications see: Bilevel — 

programming: applications in distillation systems see: 
Successive quadratic — 

programming: applications in engineering see: Bilevel — 

programming and applications in finance see: Semi-infinite — 

programming: applications in the process industry see: 
Successive quadratic — 

programming: applications in the supply chain management 
see: Bilinear — 

programming approach see: semidefinite — 

programming approach for DNA transcription element 
identification see: Mixed 0-1 linear — 

programming: approximation methods see: Semi-infinite — 

programming: average cost per stage problems see: 
Dynamic — 

programming: barycentric approximation see: Multistage 
stochastic — 

programming with bound constraints see: Quadratic — 

programming: branch and bound methods see: Integer — 

programming: branch and cut algorithms see: Integer — 

programming in clustering see: Dynamic — 

programming with column generation see: Branch and price: 
Integer — 

programming: complexity, equivalence to minmax, concave 
programs see: Bilevel linear — 

programming: complexity and equivalent forms see: Quadratic 
integer — 

programming/constraint programming hybrid methods see: 
Mixed integer — 
programming: continuity, stability, rates of convergence see:
Stochastic integer — 

programming: continuous-time optimal control see: 
Dynamic — 

programming and control problems see: Semi-infinite — 

programming: cost simplex algorithm see: Parametric linear — 

programming: cutting plane algorithms see: Integer — 

programming for data mining see: Mathematical — 

programming: decomposition and cutting planes see: 
Stochastic linear — 

programming: decomposition methods see: Successive 
quadratic — 

programming and determinant maximization see: 
Semidefinite — 

programming: deterministic global optimization see: Mixed 
integer nonlinear bilevel — 

programming: Dinkelbach method see: Quadratic fractional — 

programming: discounted problems see: Dynamic — 

programming: discretization methods see: Semi-infinite — 

programming duality see: Integer —; linear — 

programming: duality theory see: Probabilistic constrained 
linear — 

Programming and Economic Analysis see: linear — 

programming over an ellipsoid see: Quadratic — 

programming equation see: continuous-time analog of the 
dynamic — 

programming equations see: recursive dynamic — 

programming framework for enterprise-wide process 
networks under uncertainty see: Bilevel — 

programming: full space methods see: Successive quadratic — 

programming with fuzzy coefficients see: multi-objective 
linear — 

programming generating two paths see: Pivoting algorithms 
for linear — 

programming: global optimization see: Bilevel — 

programming: heat exchanger network synthesis see: Mixed 
integer linear — 

programming hybrid methods see: Mixed integer 
programming/constraint — 

programming: implicit function approach see: Bilevel — 

programming: infinite horizon problems, overview see: 
Dynamic — 

programming: interior point methods see: Linear — 

programming: interior point methods for distributed optimal 
control problems see: Sequential quadratic — 

programming: introduction, history and overview see: 
Bilevel — 

programming: inventory control see: Dynamic — 

programming: Karmarkar projective algorithm see: Linear —

programming: KKT necessary optimality conditions see: 
Equality-constrained nonlinear — 

programming: Klee–Minty examples see: Linear —

programming: Lagrangian relaxation see: Integer —

programming in management see: Bilevel — 

programming: mass and heat exchanger networks see: Mixed 
integer linear — 

programming method see: piecewise sequential quadratic — 

programming methods see: active set quadratic —; sequential 
quadratic — 

programming: methods for linear problems see: 
Semi-infinite — 


programming methods in supply chain management see: 
Mathematical — 

programming: minimax approach see: Stochastic — 

programming: mixed continuous and discrete free variables 
see: Generalized geometric — 

programming model see: parametric — 

programming models see: parametric —; Static stochastic —; 
two-stage stochastic — 

programming: models and applications see: Multi-quadratic 
integer — 

programming models for classification see: Linear — 

programming models: conditional expectations see: Static 
stochastic — 

programming models: random objective see: Stochastic — 

programming and Newton's method in unconstrained 
optimal control see: Dynamic — 

programming: nonanticipativity and Lagrange multipliers see:
Stochastic — 

programming: numerical methods see: Semi-infinite — 

programming: optimal control applications see: Dynamic — 

programming: optimality conditions see: Generalized 
semi-infinite — 

programming: optimality conditions and duality see: Bilevel — 

programming: optimality conditions and stability see: 
Semidefinite — 

programming paradigm see: general dynamic —; 
imperative — 

programming: parallel factorization of structured matrices see: 
Stochastic — 

programming and perfect duality see: Semi-infinite 
programming, semidefinite — 

programming problem 
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems) 

programming problem see: analytical approximation of 
a linear —; bilevel —; bilinear —; convex —; convex 
semidefinite —; d.c. —; dual —; dynamic two-stage 
stochastic —; equality-constrained nonlinear —; extended 
quadratic —; fractional —; fractional 0-1 —; generalized
bilevel —; hierarchical —; hyperbolic 0-1 —; integer —;
large scale nonlinear mixed integer —; linear —; linear
bilevel —; linear semidefinite —; mixed integer
nonlinear —; mixed nonlinear integer —; multilevel —;
nonconvex —; nonlinear —; nonlinear mathematical —; 
primal —; quadratic —; quasidifferentiable —; robust —; 
semidefinite —; single-ratio fractional (hyperbolic) 0-1 —; 
stochastic —; zero-one — 

programming problem in SQP see: quadratic — 

programming problems see: bilevel —; convex and 
nonconvex —; extended linear —; integer —; 
Multi-objective fractional —; solution of bilevel —; 
Stabilization of cutting plane algorithms for stochastic 
linear — 

programming program see: linear — 

programming: quasigradient method see: Two-stage 
stochastic — 

programming recursion see: dynamic — 

programming recursions see: dynamic — 

programming relaxation see: linear — 

programming relaxations see: linear — 
programming with right-hand-side uncertainty, duality and
applications see: Robust linear — 

programming: second order optimality conditions see: 
Semi-infinite — 

programming, semidefinite programming and perfect duality 
see: Semi-infinite — 

programming and the sensor network localization problem, 
SNLP see: Semidefinite — 

programming with simple integer recourse see: Stochastic — 

programming: solution by active sets and interior point 
methods see: Successive quadratic — 

programming: stochastic shortest path problems see: 
Dynamic — 

programming and structural optimization see: Semidefinite — 

programming sub-problems see: Kuhn-Tucker conditions for 
quadratic — 

programming subproblem see: quadratic —; reduced 
quadratic — 

programming support see: Multiple objective — 

programming under probabilistic constraint 
[90C15] 
(see: Static stochastic programming models) 

programming under uncertainty see: multi-objective linear — 

programming: undiscounted problems see: Dynamic — 

programming with variable coefficients see: generalized 
linear — 

programs see: AD of parallel —; air traffic control and ground 
delay —; algorithms for stochastic bilevel —; bilevel —; 
Bilevel linear programming: complexity, equivalence to 
minmax, concave —; classification of fractional —; 
classifying declarative —; computationally equivalent 
semi-infinite —; conic convex —; discretely distributed 
stochastic —; dual linear —; indefinite quadratic —; 
linearization of —; mixed integer —; mixed integer 0-1 —; 
multiratio —; multistage stochastic —; nonconvex —;
nonlinear semi-infinite —; partly convex —; polynomial —; 
process simulation —; Redundancy in nonlinear —; reverse 
convex —; Robust optimization: mixed-integer linear —; 
robust parametric —; Selfdual parametric method for 
linear —; semi-infinite —; single-ratio —; Solving large scale 
and sparse semidefinite —; Stochastic bilevel —; Stochastic 
integer —; symbolically transforming declarative — 

programs with constraints see: weighted-sums — 

programs: descent directions and efficient points see: 
Discretely distributed stochastic — 

programs with recourse see: L-shaped method for two-stage 
stochastic —; two-stage stochastic — 

programs with recourse and arbitrary multivariate 
distributions see: Stochastic linear — 

programs with recourse: upper bounds see: Stochastic — 

programs for routing and protection problems in optical 
networks see: Integer linear — 

programs with simple integer recourse see: two-stage 
stochastic — 

prohibited neighbor in tabu search 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 

prohibition parameter 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 


prohibition parameter T 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
project see: lift-and- — 
project cut see: lift-and- — 
project cuts see: lift-and- — 
project hierarchy see: lift-and- — 
project scheduling see: Static resource constrained — 
projected dynamical system 
[65K10, 90B15, 90C90] 
(see: Dynamic traffic networks; Variational inequalities: 
projected dynamical system) 
projected dynamical system see: Variational inequalities: —; 
variational inequality problem and a — 
projected dynamical systems 
[65K10, 90C90] 
(see: Variational inequalities: projected dynamical system) 
projected gradient algorithm 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 
projected gradient methods see: Spectral — 
projected Hessian matrix of a Lagrangian function 
[90C20, 90C30]
(see: Successive quadratic programming: decomposition 
methods) 
projected Lagrangian Hessian matrix 
[90C20, 90C30]
(see: Successive quadratic programming: decomposition 
methods) 
projected negative gradient 
[58E05, 90C30]
(see: Topology of global optimization) 
projected positive gradient 
[58E05, 90C30]
(see: Topology of global optimization) 
projection 
[49J52, 49M29, 65K10, 65M60, 90C11, 90C30]
(see: Generalized Benders decomposition; Nondifferentiable
optimization: relaxation methods; Variational inequalities) 
projection 
[52B12, 65K10, 65M60, 68Q25]
(see: Fourier–Motzkin elimination method; Variational
inequalities) 
projection see: best —; gradient —; isotone —; metric —;
orthogonal —; subgradient —
projection algorithm see: gradient —; subgradient —
projection cone see: isotone —
projection constraints
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints)


projection data see: feasibility approach to image 
reconstruction from —; image reconstruction from —; 
optimization approach to image reconstruction from — 
projection matrix 

[65K05, 65K10] 

(see: ABS algorithms for linear equations and linear least
squares)
projection matrix see: oblique — 
projection method 

[47J20, 49J40, 65K10, 90C33, 91B50] 

(see: Solution methods for multivalued variational
inequalities; Walrasian price equilibrium)
projection method 

[90C30, 91B50] 

(see: Relaxation in projection methods; Walrasian price
equilibrium)
projection method see: Rosen gradient — 
projection methods see: Relaxation in —; sQG — 
projection operator see: orthogonal — 
projection-restriction strategy 

[90C26] 

(see: Bilevel optimization: feasibility test and flexibility
index)
projective 

[49M07, 49M10, 65K05, 90C06] 

(see: Performance profiles of conjugate-gradient algorithms
for unconstrained optimization)
projective algorithm 

[90C05] 

(see: Linear programming: interior point methods) 
projective algorithm see: Linear programming: Karmarkar —
projective transformation 
[90C05] 

(see: Linear programming: Karmarkar projective algorithm)
Prolog 

[65G20, 65G30, 65G40, 68T20] 

(see: Interval constraints)

Prolog see: BNR- — 

Prolog IV 

[65G20, 65G30, 65G40, 68T20] 

(see: Interval constraints)

prolongation see: axiom of — 

prolongation axiom 

[03E70, 03H05, 91B16]

(see: Alternative set theory)

promising region see: most — 

prone, adaptive) decision see: ex-post (risk — 

proof see: infeasibility — 

proof on the dual side 

[90C05, 90C25] 

(see: Young programming) 
proof methodology see: NP-complete problems and — 
proof system see: propositional — 
proofs see: NP-completeness — 
propagation 

[65G20, 65G30, 65G40, 68T20] 

(see: Interval constraints) 
propagation see: back —; constraint —; Interval — 


proper 
[51K05, 52C25, 65K05, 68Q25, 68U05, 90C22, 90C26, 90C30, 
90C35] 
(see: Graph realization via semidefinite programming; 
Monotonic optimization) 

proper see: strictly — 

proper coloring 

[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]

(see: Optimization problems in unit-disk graphs) 

proper efficiency 

[90C29]

(see: Vector optimization) 

proper k-leveled graph 

[90C35]

(see: Optimization in leveled graphs) 

proper policy 

[49L20, 90C40]

(see: Dynamic programming: stochastic shortest path
problems)


proper reduction 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 

properly efficient solution 
[90C29] 
(see: Multi-objective optimization: Pareto optimal
solutions, properties) 

properly quasimonotone operator 
[46N10, 49J40, 90C26] 
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems) 

properness see: dual — 

properties see: combinatorial —; descent —; Entropy 
optimization: Shannon measure of entropy and its —; local
relational —; Multi-objective optimization: Pareto optimal
solutions —; regularity —; testing relational —; 
thermodynamic — 

properties and applications see: Pseudomonotone maps: — 

properties of the configuration space see: local — 

properties of crisp relations see: special — 

properties of fuzzy relations see: special — 

properties of heterogeneous relations see: special — 

properties of homogeneous relations see: special — 

properties of interval Newton methods see: 
existence-proving — 

properties of random multidimensional assignment problem 
see: Asymptotic — 

properties of relations see: special —; universal — 

property see: boundary dependence —; cutworthy —; 
domination —; ellipsoid —; exchange —; existence —; 
generic —; greedy-choice —; hereditary —; homotopy —; 
integrality —; isotonicity —; Jacobian consistency —; 
localization —; Monge —; norm-dependent —; 
normalization —; optimal substructure —; pivoting —; 
scalarization —; single assignment —; uniform cone — 
property-closure of a relation

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations) 
property of concavity 

[94A17] 

(see: Jaynes’ maximum entropy principle) 
property in convex programming see: fundamental — 
property of the objective function value see: continuity —;
convexity —
property of the solution space see: convexity — 
proposal vector 
[90C15, 90C90]

(see: Decomposition algorithms for the solution of 
multistage mean-variance optimization problems) 
proposition 

[41A30, 47A99, 65K10]

(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
propositional proof system 

[03B50, 68T15, 68T30]

(see: Finite complete systems of many-valued logic algebras) 
protection see: path- — 
protection problems in optical networks see: Integer linear
programs for routing and —
protein 

[90C90] 

(see: Simulated annealing methods in protein folding) 
protein design using flexible templates see: De novo — 
protein design using rigid templates see: De novo —
protein folding 

[65K10, 90C90, 92C40] 

(see: Multiple minima problem in protein folding: αBB
global optimization approach; Simulated annealing
methods in protein folding)
protein folding 

[65K05, 65K10, 90C26, 90C90, 92C05, 92C40] 

(see: Adaptive simulated annealing and its application to
protein folding; Molecular structure determination: convex
global underestimation; Monte-Carlo simulated annealing
in protein folding; Multiple minima problem in protein
folding: αBB global optimization approach; Simulated
annealing methods in protein folding)

protein folding see: Adaptive simulated annealing and its 
application to —; Global optimization in —; Monte-Carlo 
simulated annealing in —; Simulated annealing methods 
in — 

protein folding: αBB global optimization approach see:
Multiple minima problem in — 

Protein folding: generalized-ensemble algorithms 

(92C05, 92C40, 92-08) 

(referred to in: Adaptive simulated annealing and its
application to protein folding; Genetic algorithms; Global
optimization in Lennard-Jones and Morse clusters; Graph
coloring; Molecular structure determination: convex global
underestimation; Monte-Carlo simulated annealing in
protein folding; Multiple minima problem in protein
folding: αBB global optimization approach; Packet
annealing; Phase problem in X-ray crystallography: Shake
and bake approach; Simulated annealing methods in
protein folding)


protein folding problem 

[65K05, 90C26] 

(see: Molecular structure determination: convex global
underestimation)
protein force field via linear optimization see: Distance
dependent —
Protein loop structure prediction methods 

(92C05, 92C40) 
protein sequence alignment via mixed-integer linear
optimization see: Global pairwise —
protein structure 

[92B05] 

(see: Genetic algorithms for protein structure prediction) 
protein structure prediction see: Genetic algorithms for — 
proteins see: Predictive method for interhelical contacts in
alpha-helical —
protocol see: communication — 
protoconvex 
[90C26]

(see: Invexity and its applications) 
prototype location problem 

[90B80, 90B85]

(see: Warehouse location problem) 
prover see: resolution based theorem — 
proving properties of interval Newton methods see:
existence- —
proximal algorithms 

[90C25, 90C30] 

(see: Lagrangian multipliers methods for convex
programming)
proximal approximation 

[90C25, 90C30] 

(see: Lagrangian multipliers methods for convex
programming)
proximal bundle method 

[49J40, 49J52, 65K05, 90C30] 

(see: Solving hemivariational inequalities by nonsmooth
optimization methods)
proximal framework 
[90C25, 90C30] 
(see: Lagrangian multipliers methods for convex
programming) 
proximal-like method 
[49J40, 49M30, 65K05, 65M30, 65M32] 
(see: Ill-posed variational problems)
proximal map 
[90C25, 90C30] 
(see: Lagrangian multipliers methods for convex
programming) 
proximal minimization 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution) 
proximal minimization with D-functions 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution) 
proximal point 
[90C30] 
(see: Cost approximation algorithms) 
proximal point algorithm 
[47H05, 65J15, 90C25, 90C30, 90C55] 
(see: Cost approximation algorithms; Fejér monotonicity in 
convex optimization; Lagrangian multipliers methods for 
convex programming) 
proximal point algorithm see: entropic —; partial —; 
quadratic — 
proximal point algorithms see: inexact — 
proximal point approach 
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems)
proximal point bundle method 
[49J52, 90C30]
(see: Nondifferentiable optimization: relaxation methods)
proximal point method 
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational
inequalities) 
proximal point methods 
[90C30] 
(see: Cost approximation algorithms)
proximal point methods 
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems)
proximal set 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 
proximal support vector machine see: generalized 
eigenvalue — 
proximal support vector machine problem see: Generalized 
eigenvalue — 
proximate optimality principle 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 
proximinal 
[41A30, 47A99, 65K10]
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
proximity see: skew-symmetric —; symmetric — 
proximity data see: row conditional — 
proximity graph model 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs)
proximity map 
[41A30, 47A99, 65K10]
(see: Lipschitzian operators in best approximation by
bounded or continuous functions) 
PRP
[49M07, 49M10, 65K05, 90C06]
(see: New hybrid conjugate gradient algorithms for
unconstrained optimization) 
prune see: branch and — 


pruning 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 
PSA 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
PSA with dummy nodes 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
PSD
[05C50, 15A48, 15A57, 65K10, 90C25, 90C26, 90C33, 90C39, 
90C51] 
(see: Generalizations of interior point methods for the 
linear complementarity problem; Matrix completion 
problems; Second order optimality conditions for 
nonlinear optimization) 
pseudo-associativity of products of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
pseudo-inverse 
[65Fxx] 
(see: Least squares problems) 
pseudo-inverse see: Moore-Penrose — 
pseudo-invex 
[90C26]
(see: Invexity and its applications) 
pseudo-order 
[90C29]
(see: Preference modeling) 
pseudo-triangulations 
[68Q20] 
(see: Optimal triangulations) 
pseudoconcave function see: U- —; U-weakly — 
pseudoconnected family of sets 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems) 
pseudoconnectedness 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05] 
(see: Minimax theorems) 
pseudoconvex 
[90C26, 90C30] 
(see: Generalized monotone multivalued maps; Generalized 
monotone single valued maps; Invexity and its applications; 
Simplicial decomposition) 
pseudoconvex 
[90C26, 90C30] 
(see: Frank-Wolfe algorithm; Invexity and its applications) 
pseudoconvex see: n- —; strictly — 
pseudoconvex function 
[90C06, 90C25, 90C30, 90C35] 
(see: Convex-simplex algorithm; Simplicial decomposition 
algorithms) 
pseudoconvex function see: strictly — 
pseudoconvexity
[90C30] 
(see: Simplicial decomposition) 
pseudocost 
[90C11] 
(see: MINLP: branch and bound methods) 
pseudocost estimate 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
pseudocosts 
[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 
pseudocosts see: best estimate using — 
pseudoflow 
[90C35] 
(see: Minimum cost flow problem) 
pseudomonotone 
[47J20, 49J40, 65K10, 90C26, 90C33] 
(see: Generalized monotone multivalued maps; Solution 
methods for multivalued variational inequalities) 
pseudomonotone bifunction 
[46N10, 49J40, 90C26] 
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems) 
pseudomonotone bifunction (with respect to another) 
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems) 
pseudomonotone map see: strictly — 
pseudomonotone mapping 
[90C26] 
(see: Generalized monotone single valued maps) 
pseudomonotone mapping 
[35A15, 47J20, 49J40]
(see: Hemivariational inequalities: static problems) 
Pseudomonotone maps: properties and applications 
(49J40; 49J53; 47H05; 47H04; 26B25)
(refers to: Generalized monotone multivalued maps; 
Generalized monotone single valued maps) 
pseudomonotone operator 
[35A15, 46N10, 47J20, 49J40, 90C26] 
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Hemivariational 
inequalities: static problems) 
pseudomonotone operator see: strictly — 
pseudopolynomial algorithm 
[68Q25, 90C60] 
(see: NP-complete problems and proof methodology) 
pseudopolynomial time algorithm 
[49-01, 49K45, 49N10, 90-01, 90C20, 90C27, 90C35, 91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs; Maximum flow problem; 
Minimum cost flow problem) 
pseudoquadratic constraint 
[90C05, 90C20] 
(see: Redundancy in nonlinear programs) 
pseudorandom 
[90C05, 90C34] 
(see: Semi-infinite programming: methods for linear 
problems) 


pseudoshadow price 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: branch and bound methods) 
pseudoshadow prices see: best estimate using — 
pseudosphere 

[90C09, 90C10] 

(see: Oriented matroids) 
pSM 

(see: State of the art in modeling agricultural systems) 
PSPACE 

[03D15, 68Q05, 68Q15, 90C60] 

(see: Complexity classes in optimization; Parallel
computing: complexity classes)
PSQP 

[90C30, 90C33] 

(see: Optimization with equilibrium constraints:
A piecewise SQP approach)
psychology 

[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]

(see: Boolean and fuzzy relations) 

PTAS 
[90C60] 
(see: Complexity classes in optimization) 

PTAS see: Arora —; Mitchell — 

PTSP 

[90C10, 90C15] 

(see: Stochastic vehicle routing problems)

pull-in trip 

[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 

(see: Vehicle scheduling)

pull objectives 

[90B85] 

(see: Single facility location: multi-objective euclidean
distance location; Single facility location: multi-objective 
rectilinear distance location) 

pull-out trip 

[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 

(see: Vehicle scheduling) 
pumping facilities see: groundwater —; surface water — 
purchase contract see: energy — 
purchaser problem see: Traveling — 
pure adaptive search 
[65K05, 90C26, 90C30, 90C90] 
(see: Global optimization: hit and run methods; Random
search methods) 
pure adaptive search 
[90C26, 90C90] 
(see: Global optimization: hit and run methods)
pure complementary gap function 
[49-XX, 90-XX, 93-XX]

(see: Duality theory: triduality in global optimization) 
pure exchange 

[91B50]

(see: Walrasian price equilibrium) 

pure exchange economic equilibrium model 

[91B50]

(see: Walrasian price equilibrium) 

pure exchange economy 

[90C27, 90C60, 91A12, 91B50]
(see: Combinatorial optimization games; Walrasian price
equilibrium) 


pure exchange equilibrium 


[91B50] 
(see: Walrasian price equilibrium) 


pure localization search 


[65K05, 90C30] 
(see: Random search methods) 


pure Monte-Carlo
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: stopping rules)
pure Monte-Carlo method
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: two-phase methods)
pure network problem
[90C35]
(see: Generalized networks)
pure NP method
[90C11, 90C59]
(see: Nested partitions optimization)
pure random search
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30,
90C90]
(see: Global optimization: hit and run methods; Random
search methods; Stochastic global optimization: stopping
rules; Stochastic global optimization: two-phase methods)
pure random search
[90C26, 90C90]
(see: Global optimization: hit and run methods)
pure strategy
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization)
pure trade
[91B50]
(see: Walrasian price equilibrium)
pure trade economic equilibrium model
[91B50]
(see: Walrasian price equilibrium)
pure trust region strategy
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)
pure zero-one programming
[90C10, 90C11, 90C27, 90C57]
(see: Set covering, packing and partitioning problems)
purpose see: general —
purpose of models
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)
purpose software library see: general- —
push
[90C35]
(see: Maximum flow problem)
push see: nonsaturating —; saturating —
push algorithm see: generic preflow- —; preflow- —
push objectives
[90B85]
(see: Single facility location: multi-objective euclidean
distance location; Single facility location: multi-objective
rectilinear distance location)
PVM
[49-04, 65Y05, 68N20]
(see: Automatic differentiation: parallel computation)
PVM-based implementation
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)
PVSPLIT
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching)


Q 


q see: CG-standard for minimizing —


q-coloring see: hypergraph — 


Q-factor
[90C39]
(see: Neuro-dynamic programming)
Q-learning
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path
problems)


q-matrices 


[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity 
problems) 


Q-matrix 


[90C33]


(see: Linear complementarity problem) 


Q-quadratic convergence 


[49J52, 90C30]


(see: Nondifferentiable optimization: Newton method) 


(QR) policy 


[90B50]


(see: Inventory management in supply chains) 


Q-splitting see: regular — 
Q-superlinear 


[65K05, 65K10, 90C06, 90C30, 90C34]


(see: Feasible sequential quadratic programming) 


Q-superlinear convergence 


[49J52, 90C30]


(see: Nondifferentiable optimization: Newton method) 
qAP 


[68Q25, 68R10, 68W40, 90C27, 90C59, 90C90]


(see: Domination analysis in combinatorial optimization; 
Simulated annealing) 

QAP see: algebraic —; constant permutation —; general —; 
K-L type neighborhood structure for the —; 
Koopmans-Beckmann — 

QBB global optimization method 
(49M37, 65K10, 90C26, 90C30) 

QC see: basic — 


QD functions see: Quasidifferentiable optimization: algorithms
for —
QD laws and systems of variational inequalities 


[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]


(see: Quasidifferentiable optimization: variational 
formulations) 


QD-superpotentials see: convex variational inequality for an
elastostatic problem involving —; elastostatic problem
involving —; variational equality for an elastostatic problem
involving —
QP
[90C20, 90C25]
(see: Quadratic programming over an ellipsoid)
QP
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)


QP Kuhn-Tucker points see: multiple — 
QP/NLP based branch and bound 


[49M20, 90C11, 90C30]

(see: Generalized outer approximation) 

QPP algorithm 

[68W10, 90B15, 90C06, 90C30]

(see: Stochastic network problems: massively parallel 
solution) 


QPwBC 


[65K05, 90C20]
(see: Quadratic programming with bound constraints) 


QR algorithm see: implicit — 


QR decomposition 
[15A23, 65F05, 65F20, 65F22, 65F25]
(see: QR factorization) 


QR factorization 


(65F25, 15A23, 65F05, 65F20, 65F22) 

(referred to in: ABS algorithms for linear equations and 
linear least squares; Cholesky factorization; Interval linear 
systems; Large scale trust region problems; Large scale 
unconstrained optimization; Orthogonal triangularization; 
Overdetermined systems of linear equations; Solving large 
scale and sparse semidefinite programs; Symmetric systems 
of linear equations) 

(refers to: ABS algorithms for linear equations and linear 
least squares; Cholesky factorization; Interval linear 
systems; Large scale trust region problems; Large scale 
unconstrained optimization; Linear programming; 
Orthogonal triangularization; Overdetermined systems of 
linear equations; Solving large scale and sparse semidefinite 
programs; Symmetric systems of linear equations) 

QR factorization

[15A23, 65F05, 65F20, 65F22, 65F25, 65Fxx, 90C30] 

(see: Generalized total least squares; Least squares 
problems; QR factorization) 


QR factorization 


[15A23, 65F05, 65F20, 65F22, 65F25, 90C20, 90C30] 
(see: Orthogonal triangularization; Successive quadratic 
programming: decomposition methods) 


QR factorization see: rank revealing — 


QR factorization with column-pivoting 

[15A23, 65F05, 65F20, 65F22, 65F25] 

(see: Orthogonal triangularization) 

QR factorization using Householder transformations 
[15A23, 65F05, 65F20, 65F22, 65F25] 

(see: QR factorization) 


QR method 


[65K05, 65K10] 
(see: ABS algorithms for optimization) 
(QR) policy see: Continuous review inventory models: — 


QSAP 


[90C08, 90C11, 90C27] 
(see: Quadratic semi-assignment problem) 


QSM model 

[03D15, 68Q05, 68Q15]

(see: Parallel computing: complexity classes) 

quadratic 

[49M20, 90C06, 90C10, 90C11, 90C27, 90C30, 90C57, 90C90]
(see: Generalized outer approximation; Modeling difficult 
optimization problems; Simulated annealing) 

quadratic assignment 

[05-XX, 62H30, 90C27]

(see: Assignment methods in clustering; Frequency 
assignment problem) 


quadratic assignment 
[62H30, 90C27] 
(see: Assignment methods in clustering) 


Quadratic assignment problem 
(90C08, 90C11, 90C27, 90C57, 90C59) 
(referred to in: Assignment and matching; Assignment 
methods in clustering; Bi-objective assignment problem; 
Biquadratic assignment problem; Communication network 
assignment problem; Complexity theory: quadratic 
programming; Facilities layout problems; Feedback set 
problems; Frequency assignment problem; Graph coloring; 
Graph planarization; Greedy randomized adaptive search 
procedures; Linear ordering problem; Maximum partition 
matching; Quadratic fractional programming: Dinkelbach 
method; Quadratic knapsack; Quadratic programming with 
bound constraints; Quadratic programming over an 
ellipsoid; Quadratic semi-assignment problem; Standard 
quadratic optimization problems: algorithms; Standard 
quadratic optimization problems: applications; Standard 
quadratic optimization problems: theory) 
(refers to: Assignment and matching; Assignment methods 
in clustering; Bi-objective assignment problem; 
Communication network assignment problem; Complexity 
theory; Complexity theory: quadratic programming; 
Computational complexity theory; Concave programming; 
Extended cutting plane algorithm; Facilities layout 
problems; Feedback set problems; Frequency assignment 
problem; Generalized assignment problem; Graph coloring; 
Graph planarization; Greedy randomized adaptive search 
procedures; Heuristic search; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming: lagrangian relaxation; 
Linear ordering problem; Linear programming: interior 
point methods; Maximum partition matching; 
Nondifferentiable optimization: subgradient optimization 
methods; Quadratic fractional programming: Dinkelbach 
method; Quadratic knapsack; Quadratic programming with 
bound constraints; Quadratic programming over an 
ellipsoid; Quadratic semi-assignment problem; Standard 
quadratic optimization problems: algorithms; Standard 
quadratic optimization problems: applications; Standard 
quadratic optimization problems: theory) 


quadratic assignment problem 
[05-04, 68Q25, 68R10, 68W40, 90B80, 90C05, 90C06, 90C08, 
90C10, 90C11, 90C20, 90C27, 90C59] 
(see: Communication network assignment problem; 
Domination analysis in combinatorial optimization; 
Evolutionary algorithms in combinatorial optimization; 
Integer programming: branch and cut algorithms; Linear 
ordering problem; Time-dependent traveling salesman
problem) 
quadratic assignment problem 


[90B80, 90C08, 90C11, 90C27, 90C57, 90C59] 


(see: Facilities layout problems; Quadratic assignment 
problem) 


quadratic assignment problem see: algebraic —;
bottleneck —; general —; Koopmans-Beckmann —
quadratic barrier-penalty function see: logarithmic- —
quadratic binary programming see: positive semi-definite —


quadratic (Brier) scoring rule 


(see: Bayesian networks)


quadratic co-index 


[90C31, 90C34] 
(see: Parametric global optimization: sensitivity)


quadratic concave programming
[49M37, 90C26, 91A10]
(see: Bilevel programming)
quadratic constraint see: convex —; integral —


quadratic convergence
[90C30, 90C33]
(see: Nonsmooth and smoothing methods for nonlinear
complementarity problems and variational inequalities)
quadratic convergence see: Q- —
quadratic convergence theorem see: local —


quadratic fractional program 


[90C32] 
(see: Fractional programming) 


Quadratic fractional programming: Dinkelbach method 


(90C32) 
(referred to in: Complexity theory: quadratic programming;
Fractional combinatorial optimization; Fractional
programming; Linear ordering problem; Quadratic 
assignment problem; Quadratic knapsack; Quadratic 
programming with bound constraints; Quadratic 
programming over an ellipsoid; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications; Standard quadratic 
optimization problems: theory) 


(refers to: Bilevel fractional programming; Complexity
theory: quadratic programming; Fractional combinatorial
optimization; Fractional programming; Quadratic 
assignment problem; Quadratic knapsack; Quadratic 
programming with bound constraints; Quadratic 
programming over an ellipsoid; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications; Standard quadratic 
optimization problems: theory) 

quadratic function
[90C20, 90C25]
(see: Quadratic programming over an ellipsoid)
quadratic function see: convex —; piecewise linear —; positive
definite —
quadratic Gaussian see: linear- —
quadratic generalized network problems
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution)
quadratic index
[49J52, 49Q10, 58E05, 74G60, 74H99, 74K99, 74Pxx, 90C30,
90C90]
(see: Quasidifferentiable optimization: stability of dynamic
systems; Topology of global optimization)

Quadratic integer programming: complexity and equivalent 
forms 
(65K05, 90C11, 90C20) 
(referred to in: Maximum cut problem, MAX-CUT) 

quadratic integer programming: models and applications see: 
Multi- — 

Quadratic knapsack 
(90C20, 90C60) 
(referred to in: Complexity theory: quadratic programming; 
Integer programming; Linear ordering problem; 
Multidimensional knapsack problems; Quadratic 
assignment problem; Quadratic fractional programming: 
Dinkelbach method; Quadratic programming with bound 
constraints; Quadratic programming over an ellipsoid; 
Reverse convex optimization; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications; Standard quadratic 
optimization problems: theory) 
(refers to: αBB algorithm; Complexity theory; Complexity
theory: quadratic programming; Computational 
complexity theory; D.C. programming; Integer 
programming; Multidimensional knapsack problems; 
Quadratic assignment problem; Quadratic fractional 
programming: Dinkelbach method; Quadratic 
programming with bound constraints; Quadratic 
programming over an ellipsoid; Reverse convex 
optimization; Standard quadratic optimization problems: 
algorithms; Standard quadratic optimization problems: 
applications; Standard quadratic optimization problems: 
theory) 

quadratic knapsack problem see: convex — 

quadratic Lagrangian 
[90C25, 90C30] 
(see: Lagrangian multipliers methods for convex 
programming) 

quadratic maximum likelihood method see: iterative — 

quadratic models see: positive definite — 

quadratic nondegeneracy condition 
[58E05, 90C30] 
(see: Topology of global optimization) 

quadratic optimization see: convex — 

quadratic optimization problem see: standard — 

quadratic optimization problems: algorithms see: Standard — 

quadratic optimization problems: applications see: 
Standard — 

quadratic optimization problems: theory see: Standard — 

quadratic outer approximation 
[49M20, 90C11, 90C30] 
(see: Generalized outer approximation) 


quadratic problem see: bound constrained —; linear- — 
quadratic problems see: indefinite — 
quadratic program 


[65F10, 65F50, 65H10, 65K10, 90C31] 
(see: Globally convergent homotopy methods; Sensitivity 
and stability in NLP: approximation) 

quadratic program see: convex —; reduced — 

quadratic programming 
[05C60, 05C69, 37B25, 65K05, 65L99, 90C20, 90C25, 90C27, 
90C30, 90C35, 90C59, 90C60, 91A22, 93-XX] 
(see: Complexity theory: quadratic programming;
Optimization strategies for dynamic systems; Quadratic 
programming with bound constraints; Quadratic 
programming over an ellipsoid; Replicator dynamics in 
combinatorial optimization; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

quadratic programming 
[05C60, 05C69, 37B25, 65K05, 90C20, 90C25, 90C27, 90C30, 
90C31, 90C33, 90C35, 90C59, 90C60, 91A22, 91B28] 
(see: Complexity theory: quadratic programming; 
Frank-Wolfe algorithm; Linear complementarity problem; 
Operations research and financial markets; Portfolio 
selection: markowitz mean-variance model; Quadratic 
knapsack; Quadratic programming with bound constraints; 
Quadratic programming over an ellipsoid; Replicator 
dynamics in combinatorial optimization; Sensitivity and 
stability in NLP: approximation; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

quadratic programming see: Complexity theory: —; 
concave —; convex —; Feasible sequential —; full space 
successive —; indefinite —; matrix splitting methods in —; 
mixed-integer —; nonconvex —; sequential —; 
successive — 

quadratic programming: applications in distillation systems 
see: Successive — 

quadratic programming: applications in the process industry 
see: Successive — 

Quadratic programming with bound constraints 
(90C20, 65K05) 
(referred to in: Complexity theory: quadratic programming; 
Linear ordering problem; Quadratic assignment problem; 
Quadratic fractional programming: Dinkelbach method; 
Quadratic knapsack; Quadratic programming over an 
ellipsoid; Reverse convex optimization; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications; Standard quadratic 
optimization problems: theory) 
(refers to: Complexity theory: quadratic programming; D.C. 
programming; Linear complementarity problem; Linear 
programming: interior point methods; Quadratic 
assignment problem; Quadratic fractional programming: 
Dinkelbach method; Quadratic knapsack; Quadratic 
programming over an ellipsoid; Reverse convex 
optimization; Standard quadratic optimization problems: 
algorithms; Standard quadratic optimization problems: 
applications; Standard quadratic optimization problems: 
theory) 

quadratic programming: decomposition methods see: 
Successive — 

Quadratic programming over an ellipsoid 
(90C20, 90C25) 
(referred to in: Complexity theory: quadratic programming; 
Linear ordering problem; Quadratic assignment problem; 
Quadratic fractional programming: Dinkelbach method; 
Quadratic knapsack; Quadratic programming with bound 
constraints; Standard quadratic optimization problems: 
algorithms; Standard quadratic optimization problems: 
applications; Standard quadratic optimization problems:
theory; Volume computation for polytopes: strategies and
performances) 
(refers to: Complexity theory: quadratic programming; 
Quadratic assignment problem; Quadratic fractional 
programming: Dinkelbach method; Quadratic knapsack; 
Quadratic programming with bound constraints; Standard 
quadratic optimization problems: algorithms; Standard 
quadratic optimization problems: applications; Standard 
quadratic optimization problems: theory; Volume 
computation for polytopes: strategies and performances) 
quadratic programming: full space methods see: Successive — 
quadratic programming: interior point methods for 
distributed optimal control problems see: Sequential — 
quadratic programming method see: piecewise sequential — 
quadratic programming methods see: active set —; 
sequential — 
quadratic programming problem 
[90C30, 90C90] 
(see: Design optimization in computational fluid dynamics; 
Successive quadratic programming: applications in 
distillation systems) 
quadratic programming problem see: extended — 
quadratic programming problem in SQP 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 
quadratic programming: solution by active sets and interior 
point methods see: Successive — 
quadratic programming sub-problems see: Kuhn-Tucker
conditions for — 
quadratic programming subproblem 
[90C30, 90C90] 
(see: Successive quadratic programming; Successive 
quadratic programming: applications in distillation 
systems) 
quadratic programming subproblem see: reduced — 
quadratic programs see: indefinite — 
quadratic proximal point algorithm 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 
Quadratic semi-assignment problem 
(90C27, 90C11, 90C08) 
(referred to in: Feedback set problems; Graph coloring; 
Graph planarization; Greedy randomized adaptive search 
procedures; Linear ordering problem; Quadratic 
assignment problem) 
(refers to: Feedback set problems; Generalized assignment 
problem; Graph coloring; Graph planarization; Greedy 
randomized adaptive search procedures; Quadratic 
assignment problem) 
quadratic semi-assignment problem 
[90C08, 90C11, 90C27] 
(see: Quadratic semi-assignment problem) 
quadratic turning point 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
quadratic zero-one problem 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 
quadrature see: Gaussian — 
quadrature methods
[33C45, 65F20, 65F22, 65K10] 
(see: Least squares orthogonal polynomials) 
quadrature rule see: generalized Gauss — 
qualification see: basic constraint —; constraint —; first order 
constraint —; generalized Slater constraint —; linear 
independence constraint —; linear independency 
constraint —; Mangasarian-Fromovitz constraint —;
second order constraint —; Slater constraint — 
qualification (LICQ) see: linear independence constraint — 
qualification rule see: absolute — 
qualifications see: constraint —; First order constraint —; input 
constraint —; Second order constraint — 
qualitative class of a matrix 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
qualitative forecasting methods 
[90C26, 90C30] 
(see: Forecasting) 
quality see: establishing solution — 
quality of both water environment see: minimizing the 
degradation in — 
quality equalization 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 
quantifier see: generalized — 
quantifiers see: observational — 
quantile function 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals) 
quantitative continuity see: rates of — 
quantitative forecasting methods 
[90C26, 90C30] 
(see: Forecasting)
quantity see: economic order —; relaxation — 
quantity formulation 
[91B28, 91B50] 
(see: Spatial price equilibrium)
quantity model 
[91B28, 91B50] 
(see: Spatial price equilibrium) 
quantum group 
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements) 
quasi-assignment model 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 
quasi-diagonal 
[47J20, 49J40, 65K10, 90C33]
(see: Solution methods for multivalued variational 
inequalities) 
quasi-Hessian 
[90C90] 
(see: Design optimization in computational fluid dynamics) 
quasi-invex 
[90C26] 
(see: Invexity and its applications) 


quasi-Newton 
[90C30] 
(see: Cost approximation algorithms) 
quasi-Newton method 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
quasi-Newton method see: partitioned —; SR1 —; symmetric 
rank-one — 
quasi-Newton method of Broyden class 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
quasi-Newton methods 
[49M37, 65K05, 90C30, 90Cxx] 
(see: Broyden family of methods and the BFGS update; Cost 
approximation algorithms; Nonlinear least squares: 
Newton-type methods; Symmetric systems of linear 
equations; Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
quasi-Newton methods 
[49M37, 65K05, 65K10, 90C30] 
(see: ABS algorithms for optimization; Broyden family of 
methods and the BFGS update; Nonlinear least squares: 
Newton-type methods) 
quasi-Newton methods see: factorized — 
quasi-Newton relation 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
quasi-Newton search engine 
[90C15, 90C30, 90C99] 
(see: SSC minimization algorithms) 
quasi-Newton update 
[15A15, 90C25, 90C55, 90C90] 
(see: Semidefinite programming and determinant 
maximization) 
quasi-Newton update see: BFGS —; 
Broyden-Fletcher-Goldfarb-Shanno —
quasi-Newton updates 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
quasi-Newton updating see: inverse — 
quasi-Newtonian descent direction 
[49M29, 65K10, 90C06]
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 
quasi-optimal solution 
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods) 
quasi-order 
[62G07, 62G30, 65K05]
(see: Isotonic regression problems) 
quasiconcave 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 90C15, 90C29,
91A05] 
(see: Generalized concavity in multi-objective optimization; 
Logconcave measures, logconvexity; Minimax theorems) 
quasiconcave function
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 

quasiconcave function 


[90C15] 

(see: Logconcave measures, logconvexity) 
quasiconcave function see: int U- —; Luc U- —; U- — 
quasiconcave measure 

[90C15] 


(see: Logconcave measures, logconvexity) 

quasiconcave probability distribution 
[90C05, 90C15] 
(see: Probabilistic constrained linear programming: duality 
theory) 

quasiconcave probability measure 
[90C15] 
(see: Logconcave measures, logconvexity) 

quasiconvex 
[41A30, 46A22, 47A99, 49J35, 49J40, 54D05, 54H25, 55M20, 
65K10, 90C26, 91A05] 
(see: Generalized monotone single valued maps; Invexity 
and its applications; Lipschitzian operators in best 
approximation by bounded or continuous functions; 
Minimax theorems) 

quasiconvex 

[90C26]

(see: Invexity and its applications) 

quasiconvex function 

[41A30, 62J02, 90C26]

(see: Regression by special functions: algorithms and
complexity)

quasiconvex function 

[90C26]
(see: Generalized monotone multivalued maps; Generalized 
monotone single valued maps) 

quasiconvex function see: semistrictly —; strictly — 

quasiconvex medium regression 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 

quasiconvex minorant see: greatest — 

quasiconvex and umbrella regression 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 

quasidifferentiability 
[26B25, 26E25, 49J35, 49J40, 49J52, 49M05, 49Q10, 49S05,
52A27, 65K99, 70-08, 74A55, 74G60, 74G99, 74H99, 74K99,
74M10, 74M15, 74Pxx, 90C25, 90C26, 90C90, 90C99]
(see: Quasidifferentiable optimization; Quasidifferentiable 
optimization: applications; Quasidifferentiable 
optimization: calculus of quasidifferentials; 
Quasidifferentiable optimization: codifferentiable 
functions; Quasidifferentiable optimization: Dini 
derivatives, clarke derivatives; Quasidifferentiable 
optimization: stability of dynamic systems; 
Quasidifferentiable optimization: variational formulations) 

quasidifferentiable 
[26B25, 26E25, 49J35, 49J52, 65K99, 65Kxx, 70-08, 74A55, 
74M10, 74M15, 90C25, 90C26, 90C90, 90C99, 90Cxx]


(see: Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: calculus of 
quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions) 

quasidifferentiable function 

[90Cxx] 

(see: Quasidifferentiable optimization: optimality
conditions)
quasidifferentiable function 

[65K05, 65Kxx, 90Cxx] 

(see: Dini and Hadamard derivatives in optimization; 

Quasidifferentiable optimization: algorithms for QD 

functions) 
quasidifferentiable function see: Dini —;
Hadamard —
quasidifferentiable functions 

[65K05, 90C30] 

(see: Minimax: directional differentiability) 
quasidifferentiable functions see: examples of — 
Quasidifferentiable optimization 

(49J52, 26B25, 90C99, 26E25)

(referred to in: Generalized monotonicity: applications to
variational inequalities and equilibrium problems;
Hemivariational inequalities: applications in mechanics;
Hemivariational inequalities: eigenvalue problems;
Nonconvex energy functions: hemivariational inequalities;
Nonconvex-nonsmooth calculus of variations;
Quasidifferentiable optimization: algorithms for
hypodifferentiable functions; Quasidifferentiable
optimization: algorithms for QD functions;
Quasidifferentiable optimization: applications;
Quasidifferentiable optimization: applications to
thermoelasticity; Quasidifferentiable optimization: calculus
of quasidifferentials; Quasidifferentiable optimization:
codifferentiable functions; Quasidifferentiable
optimization: Dini derivatives, clarke derivatives;
Quasidifferentiable optimization: exact penalty methods;
Quasidifferentiable optimization: optimality conditions;
Quasidifferentiable optimization: stability of dynamic
systems; Quasidifferentiable optimization: variational
formulations; Quasivariational inequalities; Sensitivity
analysis of variational inequality problems; Solving
hemivariational inequalities by nonsmooth optimization
methods; Variational inequalities; Variational inequalities:
F. E. approach; Variational inequalities: geometric
interpretation, existence and uniqueness; Variational
inequalities: projected dynamical system; Variational
principles)

(refers to: Generalized monotonicity: applications to
variational inequalities and equilibrium problems;
Hemivariational inequalities: applications in mechanics;
Hemivariational inequalities: eigenvalue problems;
Hemivariational inequalities: static problems; Nonconvex
energy functions: hemivariational inequalities;
Nonconvex-nonsmooth calculus of variations;
Quasidifferentiable optimization: algorithms for
hypodifferentiable functions; Quasidifferentiable
optimization: algorithms for QD functions;
Quasidifferentiable optimization: applications;
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


Quasidifferentiable optimization: algorithms for hypodifferentiable functions

(49J52, 65K99)

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


Quasidifferentiable optimization: algorithms for QD functions

(90Cxx, 65Kxx) 

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity
analysis of variational inequality problems; Solving
hemivariational inequalities by nonsmooth optimization
methods; Variational inequalities; Variational inequalities:
F. E. approach; Variational inequalities: geometric
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

Quasidifferentiable optimization: applications 

(74A55, 74M10, 74M15, 65K99, 90C26, 49J35)

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational
inequalities: projected dynamical system; Variational
principles) 


Quasidifferentiable optimization: applications to thermoelasticity

(74B99, 74D99, 74G99, 74H99, 47840, 35R70) 

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: calculus of 
quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: calculus of 
quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


Quasidifferentiable optimization: calculus of quasidifferentials
(49J52, 65K99, 90C90)

(referred to in: Generalized monotonicity: applications to
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization
methods; Variational inequalities; Variational inequalities:
F. E. approach; Variational inequalities: geometric
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization
methods; Variational inequalities; Variational inequalities:
F. E. approach; Variational inequalities: geometric
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

Quasidifferentiable optimization: codifferentiable functions 
(65K99, 70-08, 49J52, 90C25) 
(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations;
Quasidifferentiable optimization; Quasidifferentiable
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
Dini derivatives, Clarke derivatives; Quasidifferentiable
optimization: exact penalty methods; Quasidifferentiable 
optimization: optimality conditions; Quasidifferentiable 
optimization: stability of dynamic systems; 
Quasidifferentiable optimization: variational formulations; 
Quasivariational inequalities; Sensitivity analysis of 
variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
Dini derivatives, Clarke derivatives; Quasidifferentiable
optimization: exact penalty methods; Quasidifferentiable 
optimization: optimality conditions; Quasidifferentiable 
optimization: stability of dynamic systems; 
Quasidifferentiable optimization: variational formulations; 
Quasivariational inequalities; Sensitivity analysis of 
variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


Quasidifferentiable optimization: Dini derivatives, Clarke derivatives

(49J52, 26E25, 52A27, 90C99) 

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: exact penalty methods; Quasidifferentiable 
optimization: optimality conditions; Quasidifferentiable 
optimization: stability of dynamic systems; 
Quasidifferentiable optimization: variational formulations; 
Quasivariational inequalities; Sensitivity analysis of 
variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: exact penalty methods; Quasidifferentiable 
optimization: optimality conditions; Quasidifferentiable 
optimization: stability of dynamic systems; 
Quasidifferentiable optimization: variational formulations; 
Quasivariational inequalities; Sensitivity analysis of 
variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

Quasidifferentiable optimization: exact penalty methods 
(90Cxx) 

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Nondifferentiable optimization; Quasidifferentiable 
optimization; Quasidifferentiable optimization: algorithms 
for hypodifferentiable functions; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: optimality conditions;
Quasidifferentiable optimization: stability of dynamic
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


Quasidifferentiable optimization: optimality conditions
(90Cxx)

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Dini and Hadamard derivatives in optimization; 
Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Hemivariational 
inequalities: applications in mechanics; Hemivariational 
inequalities: eigenvalue problems; Hemivariational 
inequalities: static problems; Nonconvex energy functions: 
hemivariational inequalities; Nonconvex-nonsmooth 
calculus of variations; Quasidifferentiable optimization; 
Quasidifferentiable optimization: algorithms for 
hypodifferentiable functions; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


Quasidifferentiable optimization: stability of dynamic systems
(74G60, 74H99, 49J52, 49Q10, 74K99, 74Pxx, 90C90)
(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Optimization strategies for dynamic systems; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: variational formulations; 
Quasivariational inequalities; Sensitivity analysis of 
variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Optimization strategies for dynamic systems; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: variational formulations; 
Quasivariational inequalities; Sensitivity analysis of 
variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 


Quasidifferentiable optimization: variational formulations

(74G99, 74H99, 74Pxx, 49J40, 49M05, 49S05)

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasivariational inequalities; Sensitivity analysis 
of variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities;
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasivariational inequalities; Sensitivity analysis 
of variational inequality problems; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

quasidifferentiable problems 

[46A20, 52A01, 90C30]

(see: Farkas lemma: generalizations) 

quasidifferentiable programming problem 

[65Kxx, 90Cxx]

(see: Quasidifferentiable optimization: algorithms for QD 
functions) 

quasidifferentiable set 

[90Cxx]

(see: Quasidifferentiable optimization: optimality 
conditions) 

quasidifferentiable superpotential 

[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]

(see: Quasidifferentiable optimization: variational 
formulations) 

quasidifferential 

[26B25, 26E25, 49J52, 52A27, 90C99, 90Cxx]

(see: Quasidifferentiable optimization; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: optimality conditions) 
quasidifferential 

[90Cxx] 

(see: Quasidifferentiable optimization: exact penalty 
methods; Quasidifferentiable optimization: optimality 
conditions) 


quasidifferential see: Dini —; Hadamard — 


quasidifferential calculus 

[90Cxx] 

(see: Quasidifferentiable optimization: optimality 
conditions) 

quasidifferential elastic boundary conditions 

[35R70, 47840, 74B99, 74D99, 74G99, 74H99] 

(see: Quasidifferentiable optimization: applications to 
thermoelasticity) 

quasidifferential functions 

[90Cxx] 

(see: Quasidifferentiable optimization: exact penalty 
methods) 


quasidifferential laws see: variational formulation of — 


quasidifferential thermal boundary conditions 
[35R70, 47840, 74B99, 74D99, 74G99, 74H99] 
(see: Quasidifferentiable optimization: applications to 
thermoelasticity) 
quasidifferential thermal boundary conditions see: variational 
formulation of — 
quasidifferentials see: calculus of —; Quasidifferentiable 
optimization: calculus of — 
quasigradient see: stochastic —; stochastic mollifier — 
quasigradient method see: Two-stage stochastic 
programming: — 
quasigradient methods see: stochastic — 
quasigradient methods: applications see: Stochastic — 
quasigradient methods in minimax problems see: Stochastic — 
Quasigradient (SQG) methods see: stochastic — 
quasigradients see: stochastic — 
quasimonotone 
[90C26] 
(see: Generalized monotone multivalued maps)
quasimonotone bifunction 
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems) 
quasimonotone map 
[90C26] 
(see: Generalized monotone single valued maps)
quasimonotone map see: semistrictly —; strictly — 
quasimonotone operator 
[46N10, 49J40, 90C26] 
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems) 
quasimonotone operator 
[90C26] 
(see: Generalized monotone multivalued maps)
quasimonotone operator see: properly —; semistrictly —; 
strictly — 
quasimonotone pair 
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems) 
quasirandom 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: two-phase methods)
Quasivariational inequalities 
(49J40, 70-08, 49Q10, 74K99, 74Pxx)
(referred to in: Generalized monotonicity: applications to
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Sensitivity analysis of variational inequality 
problems; Solving hemivariational inequalities by 
nonsmooth optimization methods; Variational inequalities; 
Variational inequalities: F. E. approach; Variational 
inequalities: geometric interpretation, existence and 
uniqueness; Variational inequalities: projected dynamical 
system; Variational principles) 
(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Multilevel 
optimization in mechanics; Nonconvex energy functions: 
hemivariational inequalities; Nonconvex-nonsmooth 
calculus of variations; Quasidifferentiable optimization; 
Quasidifferentiable optimization: algorithms for 
hypodifferentiable functions; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Sensitivity analysis of variational inequality 
problems; Solving hemivariational inequalities by 
nonsmooth optimization methods; Variational inequalities; 
Variational inequalities: F. E. approach; Variational 
inequalities: geometric interpretation, existence and 
uniqueness; Variational inequalities: projected dynamical 
system; Variational principles) 

quasivariational inequalities see: implicit variational 
inequalities and — 

quasivariational inequality 
[49Q10, 60G35, 65K05, 74K99, 74Pxx, 90C90, 91A65] 
(see: Differential equations and global optimization; 
Multilevel optimization in mechanics) 

queens problem see: n- — 

quench see: simulated — 

query point 

[46N10, 90-00, 90C47] 

(see: Nondifferentiable optimization)

question 

[68Q25, 90C60] 

(see: NP-complete problems and proof methodology)

question see: trade-off — 

question-asking strategy 

[90C09] 

(see: Inference of monotone boolean functions)

question-asking strategy see: binary search-Hansel chains —; 
sequential Hansel chains — 


queueing networks see: multiclass — 
queueing shared-memory model 

[03D15, 68Q05, 68Q15] 

(see: Parallel computing: complexity classes) 
queuing networks 

[90C15] 

(see: Stochastic quasigradient methods: applications) 
Quirk theorem see: Bassett-Maybee- — 
quotient see: ball —; Rayleigh —; Temple — 
quotient cuts 

[90C35] 

(see: Feedback set problems) 
quotients see: difference — 


R 


r-Constraint Satisfaction Problem see: max- — 
r-CSP see: max- — 

R flat fuzzy number see: L- — 

R fuzzy number see: L- — 

r-linear convergence rate 

[49J52, 90C30]

(see: Nondifferentiable optimization: subgradient
optimization methods)

R-opt heuristic 

[90C27] 

(see: Time-dependent traveling salesman problem) 
Rfree Of unused partitions see: set — 

R}-upper semicontinuous function 
[90C29] 
(see: Vector optimization) 
Rreac Of used partitions see: set — 
RA algorithm 

[68W10, 90B15, 90C06, 90C30]

(see: Stochastic network problems: massively parallel
solution)

Rabinovich system see: Aizenberg- — 
race see: Pareto — 
radially continuous function 

[90C26] 

(see: Generalized monotone multivalued maps) 
radiation exposure time 

[68W01, 90-00, 90C90, 92-08, 92C50] 

(see: Optimization based framework for radiation therapy)
radiation therapy 

[68W01, 90-00, 90C90, 92-08, 92C50] 

(see: Optimization based framework for radiation therapy)
radiation therapy see: Optimization based framework for — 
radio link frequency assignment problem 

[90C10] 

(see: Maximum constraint satisfaction: relaxations and
upper bounds)
radiotherapy treatment design see: Beam selection in — 
radius 

[05A, 05C15, 05C62, 05C69, 05C85, 15A, 51M, 52A, 52B, 52C,
62H, 68Q, 68R, 68U, 68W, 90B, 90C, 90C27, 90C59]

(see: Convex discrete optimization; Optimization problems
in unit-disk graphs)
radius see: spectral — 
radius of information
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26]
(see: Information-based complexity and information-based
optimization)
radius of stability 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
Radon measure 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
Radon measures 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
RAF of CEP 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 
Railroad 
(see: Railroad crew scheduling; Railroad locomotive 
scheduling) 
Railroad crew scheduling 
Railroad locomotive scheduling 
railroads see: engine routing and industrial in-plant — 
RAM 
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes) 
Ramsdell method see: Newsam- — 
Ramsey model 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
Ramsey rule of economic growth 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
Rand statistic 
[62H30, 90C27]
(see: Assignment methods in clustering) 
random access machine see: parallel — 
random behavior 
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms) 
random choice see: rule of — 
random construction 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures) 
random interval arithmetic 
[65G30, 65G40, 65K05, 90C30, 90C57]
(see: Global optimization: interval analysis and balanced 
interval arithmetic) 
random interval arithmetic see: balanced — 
random keys method 
[00-02, 01-02, 03-02] 
(see: Vehicle routing problem with simultaneous pickups 
and deliveries) 
random multidimensional assignment problem see: 
Asymptotic properties of — 
random numbers see: common — 
random objective see: Stochastic programming models: — 
random objective function 
[90C15] 
(see: Stochastic programming models: random objective) 


random polling scheme 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 

random sampling 
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs) 

random sampling 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: stopping rules; 
Stochastic global optimization: two-phase methods) 

random sampling see: uniform — 

random search see: pure — 

random search algorithms 

[90C26, 90C29]

(see: Optimal design of composite structures) 

random search algorithms 

[90C26, 90C29, 90C90]

(see: Global optimization: hit and run methods; Optimal
design of composite structures)

random search method 

[65K05, 90C30]
(see: Random search methods) 

random search method see: adaptive — 

Random search methods 
(65K05, 90C30) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Genetic algorithms for protein structure 
prediction; Global optimization based on statistical models; 
Global optimization: hit and run methods; Maximum cut 
problem, MAX-CUT; Monte-Carlo simulated annealing in 
protein folding; Optimal design of composite structures; 
Packet annealing; Simulated annealing; Simulated 
annealing methods in protein folding; Stochastic global 
optimization: stopping rules; Stochastic global 
optimization: two-phase methods) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Bayesian global optimization; Genetic 
algorithms for protein structure prediction; Global 
optimization based on statistical models; Monte-Carlo 
simulated annealing in protein folding; Packet annealing; 
Simulated annealing; Simulated annealing methods in 
protein folding; Stochastic global optimization: stopping 
rules; Stochastic global optimization: two-phase methods) 

random vertex insertion (RVI) 

[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90]

(see: Traveling salesman problem)

random walk search 

[92B05] 

(see: Genetic algorithms) 

randomization 

[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C10,
90C27, 94C10, 94C15]

(see: Graph planarization; Maximum satisfiability problem)

randomized adaptive search see: greedy — 

randomized adaptive search procedure see: greedy — 

randomized adaptive search procedures see: Greedy — 
randomized algorithm
[60J65, 68Q25] 
(see: Adaptive global search) 
randomized algorithms 
[05C85, 52A22, 60D05, 68Q25, 90C05] 
(see: Directed tree networks; Probabilistic analysis of 
simplex algorithms) 
randomized algorithms 
[60J65, 68Q25] 
(see: Adaptive global search) 
randomized allocation scheme 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques)
randomized enumeration 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods)
randomized heuristics 
[05C69, 05C85, 68W01, 90C09, 90C10, 90C59] 
(see: Heuristics for maximum clique and independent set;
Optimization in boolean classification problems) 
randomized heuristics 
[90C09, 90C10] 
(see: Optimization in boolean classification problems) 
randomized rounding 
[05C85] 
(see: Directed tree networks)
randomized rounding procedure
[49N15, 65Y20, 68W25, 90C22, 90C27] 
(see: Maximum likelihood detection via semidefinite 
programming) 
randomized setting 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based 
optimization) 
randomly generated 
[92C40] 
(see: Monte-Carlo simulated annealing in protein folding) 
randomly with predefined probabilities 
[65G30, 65G40, 65K05, 90C30, 90C57] 
(see: Global optimization: interval analysis and balanced 
interval arithmetic) 
randomly with the same probability 
[65G30, 65G40, 65K05, 90C30, 90C57] 
(see: Global optimization: interval analysis and balanced 
interval arithmetic) 
Randomness 
[90C60] 
(see: Kolmogorov complexity) 
randomness see: Algorithmic — 
range-analysis 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 
range and null space decomposition 
[90C20, 90C30, 90C90] 
(see: Successive quadratic programming: applications in 
distillation systems; Successive quadratic programming: 
decomposition methods) 
range and null space decomposition 
[90C30, 90C90] 
(see: Successive quadratic programming; Successive
quadratic programming: applications in distillation
systems)
range planning see: long — 
range space 
[90C20, 90C30] 
(see: Successive quadratic programming: decomposition 
methods) 
ranges see: Bounding derivative — 
rank 
[90C30] 
(see: Simplicial decomposition) 
rank see: Chvatal —; full row —; numerical — 
rank completion problem see: maximum —; minimum — 
rank determined graph 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
rank matrix completion 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
rank of a matroid 
[90C09, 90C10] 
(see: Matroids) 
rank nonconvexity see: low- — 
rank-one approach see: limited-memory symmetric — 
rank one formula see: selfdual — 
rank-one matrix 
[90C25, 90C30] 
(see: Solving large scale and sparse semidefinite programs) 
rank-one quasi-Newton method see: symmetric — 
rank-one update see: symmetric — 
rank-one update formula see: Sherman-Morrison — 
rank revealing factorization 
[15A23, 65F05, 65F20, 65F22, 65F25]
(see: Orthogonal triangularization) 
rank revealing QR factorization 
[65Fxx]
(see: Least squares problems) 
rank revealing URV factorization 
[65Fxx]
(see: Least squares problems) 
rank-two updates 
[90C30]
(see: Broyden family of methods and the BFGS update) 
ranking see: assignment —; extreme point — 
ranking extreme points 
[90C60]
(see: Complexity of degeneracy) 
RANS code 
[90C90]
(see: Design optimization in computational fluid dynamics) 
Raoult law 
[90C30]
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
Raphson method see: Newton- — 

RASP 
[90C60] 
(see: Complexity classes in optimization) 

rate see: average nonredundancy —; average redundancy —;
convergence —; geometric convergence —; local
convergence —; r-linear convergence —; riskless
interest —; spot interest —; superlinear convergent —;
t-estimate of the spot — 

rate for bonds with constant maturities see: estimating the 
spot — 

rate constraints see: upper and lower well oil — 

rate of convergence 

[90C30, 90C33]

(see: Implicit lagrangian) 

rate estimation see: t-programmed problem of spot — 

rate of steepest ascent 

[90Cxx]

(see: Quasidifferentiable optimization: optimality
conditions)

rate of steepest descent 

[90Cxx]
(see: Quasidifferentiable optimization: optimality 
conditions) 

rate yield curve see: interest — 

rateCT 
(see: Medium-term scheduling of batch processes) 

rates see: asymptotic convergence —; term structure of 
interest — 

rates of convergence see: Stochastic integer programming: 
continuity, stability — 

rates of quantitative continuity 
[90C11, 90C15, 90C31] 
(see: Stochastic integer programming: continuity, stability, 
rates of convergence) 

rates and stoichiometry see: estimation of reaction — 

rating and optimization methods see: Credit — 

ratio see: approximation —; bottleneck Steiner —; 
competitive —; consistency —; cost-to-time —; 
domination —; Euclidean Steiner —; k-Steiner —; method 
of optimal —; Sharpe —; Steiner — 

ratio in Banach spaces see: Steiner — 

ratio of biomolecular structures see: Steiner — 

ratio cycle see: maximum profit-to-time —; minimum 
cost-to-time — 

ratio disk graphs see: bounded — 

ratio fractional (hyperbolic) 0-1 programming problem see: 
single- — 

ratio fractional program see: single- — 

ratio method see: likelihood — 

ratio for portfolio management see: Competitive — 

ratio principle see: local- — 

ratio programs see: single- — 

ratio spanning-tree see: minimum — 

ratio test see: minimum — 

rational choice 
[90C30] 
(see: Global optimization based on statistical models) 

rational numbers 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 


rational numbers see: finite —; infinitely near — 

rational p-form 

[05B35, 20F36, 20F55, 52C35, 57N65] 

(see: Hyperplane arrangements)

rational reaction set 

[49-01, 49K10, 49M37, 90-01, 90C05, 90C27, 90C30, 90C90,
91B52]

(see: Bilevel linear programming; Bilevel programming:
global optimization)

rational use of groundwater 

[90C30, 90C35] 

(see: Optimization in water resources)

rationality see: bounded —; individual — 

rationality assumption see: human — 

rationality factor see: human — 

Ratios see: Maximization of the Smallest of Several —; 
maximizing a sum of — 

ratios fractional program see: sum-of- — 

Rawls objective function see: multifacility Weber- — 

Rawls problem see: multiWeber- —; Weber- — 

ray see: descent —; dual —; extremal —; primal —; 
termination on a secondary — 

ray crystallography: Shake and bake approach see: Phase 
problem in X- — 

ray diffraction data see: Optimization techniques for phase 
retrieval based on single-crystal X- — 

Rayleigh quotient 

[49R50, 65G20, 65G30, 65G40, 65L15, 65L60]

(see: Eigenvalue enclosures for ordinary differential
equations)

Rayleigh-Ritz bound 

[49R50, 65G20, 65G30, 65G40, 65L15, 65L60]

(see: Eigenvalue enclosures for ordinary differential
equations)

Rayleigh-Ritz method 

[49R50, 65G20, 65G30, 65G40, 65L15, 65L60]

(see: Eigenvalue enclosures for ordinary differential
equations)

rays functions on topological vector spaces see: Increasing and 
convex-along- — 

razor see: Occam — 

RCL 

[90C35]

(see: Feedback set problems)


RD 

[90C90] 

(see: MINLP: reactive distillation column synthesis) 

re-annealing 

[92C05] 

(see: Adaptive simulated annealing and its application to
protein folding) 

reachable see: directly left- —; directly right- —; left- —; 
right- — 

reaction equation 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
reaction equilibria

[90C26, 90C90]

(see: Global optimization in phase and chemical reaction
equilibrium)
reaction equilibrium see: Global optimization in phase and
chemical — 
reaction flux estimation in lumped systems

[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30,
76R50, 80A20, 80A23, 80A30]

(see: Identification methods for reaction kinetics and
transport)
reaction kinetics and transport see: Identification methods
for — 
reaction rates and stoichiometry see: estimation of — 
reaction set see: rational — 
reaction tangent-plane criterion 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical
equilibrium) 
reactive azeotropes 
[90C30] 
(see: Nonlinear systems of equations: application to the
enclosure of all azeotropes) 
reactive distillation 
[90C90] 
(see: MINLP: reactive distillation column synthesis)
reactive distillation column synthesis see: MINLP: — 
reactive GRASP 

[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]

(see: Feedback set problems; Greedy randomized adaptive
search procedures)

Reactive scheduling of batch processes 
reactive tabu search 

[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10]

(see: Maximum satisfiability problem) 
reactive tabu search see: Hamming- — 
reactive TS 

[68T20, 68T99, 90C27, 90C59] 

(see: Metaheuristics) 
reactor see: fed-batch — 
real addition with order see: first order theory of — 
real coefficients 
[01A50, 01A55, 01A60] 

(see: Fundamental theorem of algebra) 

real interval matrix 

[65G20, 65G30, 65G40, 65L99] 

(see: Interval analysis: eigenvalue bounds of interval
matrices) 


real number model 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26, 90C60] 
(see: Complexity theory; Information-based complexity and 
information-based optimization) 
real numbers 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
real numbers see: infinitely small —; infinitely small 
negative —; infinitely small positive — 
real symmetric interval matrix 
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: eigenvalue bounds of interval 
matrices) 
real-valued CNSO 
[46A20, 52A01, 90C30] 
(see: Composite nonsmooth optimization) 
real-valued CNSO see: extended — 
real vectors space 
[90C09, 90C10]
(see: Oriented matroids) 
real world 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems) 
real-world problem 
[34-xx, 34Bxx, 34Lxx, 93E24]
(see: Complexity and large-scale least squares problems) 
realisable problem 
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization) 
realization 
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35]
(see: Graph realization via semidefinite programming) 
realization see: edge — 
realization of an abstract group 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
realization of a matrix 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 
Realization Problem see: graph — 
realization via semidefinite programming see: Graph — 
realizing with minimal social cost see: production — 
reason see: Laplace’s principle of insufficient — 
reasoning see: approximate —; interval logic system of 
approximate —; Laplace principle of insufficient —; 
point-based logic system of approximate — 
rebalanced portfolio see: constant — 
receiver-initiated 
[65K05, 65Y05, 65Y10, 65Y20, 68W10] 
(see: Interval analysis: parallel methods for global 
optimization) 
receiver initiated mapping technique
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques) 
recession functional 
[35A15, 47J20, 49J40]
(see: Hemivariational inequalities: static problems) 
recharge facilities see: controlled — 
recognition see: Complementarity algorithms in pattern —; 
pattern —; statistical pattern — 
recognition problem 
[90C05, 90C10, 90C60] 
(see: Computational complexity theory; Simplicial pivoting 
algorithms for integer programming) 
recognition problem 
[90C60] 
(see: Computational complexity theory) 
recognition problem see: language — 
recognition problems see: language — 
reconditioner 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
reconstruction 
[92C05, 92C40] 
(see: Protein loop structure prediction methods) 
reconstruction see: Entropy optimization for image —; 
finite-dimensional models for entropy optimization for 
image —; image —; Maximum entropy principle: image —; 
parent node —; vector-space models for entropy 
optimization for image — 
reconstruction from projection data see: feasibility approach to 
image —; image —; optimization approach to image — 
reconstruction methods for nonconvex feasibility analysis see: 
Shape — 
recourse 
[90C15] 
(see: Stochastic programming: parallel factorization of 
structured matrices) 
recourse 
[49M25, 90-08, 90C05, 90C06, 90C08, 90C15] 
(see: L-shaped method for two-stage stochastic programs 
with recourse; Simple recourse problem; Simple recourse 
problem: dual method; Simple recourse problem: primal 
method) 
recourse see: complete —; expected —; fixed —; full —; 
L-shaped method for two-stage stochastic programs 
with —; relatively complete —; simple —; simple integer —; 
stochastic integer program with —; stochastic linear 
program with —; stochastic program with —; Stochastic 
programming with simple integer —; two-stage stochastic 
program with —; two-stage stochastic programs with —; 
two-stage stochastic programs with simple integer — 
recourse action 
[90C10, 90C15] 
(see: Stochastic vehicle routing problems) 
recourse actions 
[90C10, 90C15] 
(see: Stochastic integer programs) 


recourse and arbitrary multivariate distributions see: Stochastic 
linear programs with — 
recourse bound see: restricted- — 
recourse decision 
[90B10, 90B15, 90C15, 90C35] 
(see: Preprocessing in stochastic programming; Stochastic 
programs with recourse: upper bounds) 
recourse function 
[90C15] 
(see: Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 
recourse function see: approximating the —; expected — 
recourse model 
[90C15] 
(see: Static stochastic programming models) 
recourse models 
[90C11, 90C15] 
(see: Stochastic programming with simple integer recourse) 
recourse models see: restricted — 
recourse problem 
[90B10, 90B15, 90C15, 90C35] 
(see: Preprocessing in stochastic programming) 
recourse problem see: dual method for the simple —; primal 
method for the simple —; Simple — 
recourse problem: dual method see: Simple — 
recourse problem: primal method see: Simple — 
recourse: upper bounds see: Stochastic programs with — 
rectangular partition 
[90C27] 
(see: Steiner tree problems) 
rectangular partition problem 
[90C27] 
(see: Steiner tree problems) 
rectilinear distance 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location; Single facility location: multi-objective 
rectilinear distance location) 
rectilinear distance location see: Single facility location: 
multi-objective — 
rectilinear distance location problem 
[90B85] 
(see: Single facility location: multi-objective rectilinear 
distance location) 
rectilinear distances see: Optimizing facility location with 
euclidean and — 
rectilinear Steiner arborescence tree 
[90C27] 
(see: Steiner tree problems) 
rectilinear Steiner tree 
[90C27] 
(see: Steiner tree problems) 
recurrence see: three-term- — 
recurrence algorithm see: three-term- — 
recurrence relation 
[90C30] 
(see: Generalized total least squares) 
recurrent class of states 
[49L99] 
(see: Dynamic programming: average cost per stage 
problems) 
recurrent neural network
[90C27, 90C30]
(see: Neural networks for combinatorial optimization) 
recursion 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer
programming) 
recursion 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules; Least-index anticycling
rules) 
recursion see: adjoint —; dynamic programming — 
recursions see: dynamic programming — 
recursive algorithm 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules) 
recursive dynamic programming equations 
[90C15] 
(see: Two-stage stochastic programming: quasigradient
method) 
recursive least squares algorithm 
[65Fxx] 
(see: Least squares problems)
recursive Opt Matching (ROM) 
[68Q25, 68R10, 68W40, 90C27, 90C59] 
(see: Domination analysis in combinatorial optimization) 
recursive procedure 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
recursive state space search algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching)
reduce see: branch and — 
reduced 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
reduced box 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization)
reduced cost 
[68Q99, 90C35] 
(see: Branch and price: Integer programming with column 
generation; Generalized networks) 
reduced cost fixing 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods; 
Integer programming: cutting plane algorithms) 
reduced gradient 
[49M37, 65K05, 90C30] 
(see: Inequality-constrained nonlinear optimization) 
reduced gradient algorithm 
[90C30] 
(see: Convex-simplex algorithm) 
reduced gradient method see: Wolfe — 


reduced Gröbner basis
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
reduced Hessian
[90Cxx]
(see: Discontinuous optimization)
reduced-Hessian see: affine- —; limited-memory affine —
reduced-Hessian algorithm see: affine- —
reduced-Hessian BFGS algorithm see: limited-memory — 
reduced Hessian of a Lagrangian 
[49M37, 65K05, 90C30] 
(see: Inequality-constrained nonlinear optimization) 
reduced Hessian SQP see: multiplier-free — 
reduced Hessian SQP method 
[90C30, 90C90]
(see: Successive quadratic programming: applications in the 
process industry) 
reduced master program 
[90B10, 90C05, 90C06, 90C35]
(see: Nonoriented multicommodity flow problems) 
reduced model 
[90C30]
(see: Suboptimal control) 
reduced polyblock 
[65K05, 90C26, 90C30]
(see: Monotonic optimization) 
reduced problem see: locally — 
reduced problem in semi-infinite programming 
[90C31, 90C34] 
(see: Semi-infinite programming: second order optimality 
conditions) 
reduced quadratic program 
[90C20, 90C30] 
(see: Successive quadratic programming: decomposition 
methods) 
reduced quadratic programming subproblem 
[90C20, 90C30] 
(see: Successive quadratic programming: decomposition 
methods) 
reduced RLT system 
[90C26]
(see: Reformulation-linearization technique for global 
optimization) 
reduced space SQP 
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems) 
Reduced VNS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods) 
reducibility 
[68Q25, 90C60]
(see: NP-complete problems and proof methodology) 
reducibility 
[68Q25, 90C60]
(see: Computational complexity theory; NP-complete 
problems and proof methodology) 
reducibility see: polynomial —; polynomial Turing — 
reducibility of algorithms 
[90C60] 
(see: Computational complexity theory) 
reducibility of problems
[90C60] 
(see: Computational complexity theory) 
reducible 
[90C60] 
(see: Computational complexity theory) 
reducible graph see: cyclically —; Smith-Walford one- — 
reducible problems 
[90C60] 
(see: Computational complexity theory) 
reduction 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
reduction 
[65K05, 90C30] 
(see: Bisection global optimization methods) 
reduction see: Burke-Poliquin —; coefficient —; complete —;
feasible region —; Monte-Carlo sampling and variance —; 
polynomial time —; potential —; proper —; region —; 
simplex —; spherical —; successive affine —; variance —; 
weighting space — 
reduction algorithm see: potential —; primal-dual potential —; 
primal potential — 
reduction algorithms see: potential — 
reduction Ansatz 
[57R12, 90C25, 90C29, 90C30, 90C31, 90C34, 90C46] 
(see: Bilevel programming: optimality conditions and 
duality; Generalized semi-infinite programming: optimality 
conditions; Parametric global optimization: sensitivity; 
Smoothing methods for semi-infinite optimization) 
reduction Ansatz 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
reduction in auction algorithms see: graph — 
reduction based method 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming: discretization methods) 
reduction BFGS algorithm see: successive affine — 
reduction of a constraint set 
[90C05, 90C20] 
(see: Redundancy in nonlinear programs) 
reduction cuts 
[65K05, 90C26, 90C30] 
(see: Monotonic optimization) 
reduction to finite costs 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
reduction lower bounds see: variance — 
reduction methods for linear programming see: Potential — 
reduction operations 
[49-04, 65Y05, 68N20] 
(see: Automatic differentiation: parallel computation) 
reduction technique see: variance — 
reductions 
[05C85] 
(see: Directed tree networks) 
reductions see: ordinary NP-complete — 
redundancy 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 


redundancy 
[90C05, 90C20] 
(see: Redundancy in nonlinear programs) 
redundancy see: deterministic method for detecting —; 
probabilistic method for detecting — 
Redundancy in nonlinear programs 
(90C05, 90C20) 
(referred to in: Inequality-constrained nonlinear 
optimization) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; 
Inequality-constrained nonlinear optimization) 
redundancy rate see: average — 
redundancy test 
[90C11, 90C31] 
(see: Multiparametric mixed integer linear programming)
redundant constraint 
[90C05, 90C20] 
(see: Redundancy in nonlinear programs)
redundant constraint see: relatively — 
redundant restriction 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
Reeves algorithm see: Fletcher- — 
Reeves formula see: Fletcher- — 
Reeves method see: Fletcher- — 
reference direction method 
[90C29] 
(see: Multiple objective programming support)
reference direction vector 
[90C29] 
(see: Multiple objective programming support)
reference point
[90C29]
(see: Multiple objective programming support) 
refinement 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C] 
(see: Convex discrete optimization) 
refinement see: incremental strategy for model structure —; 
iterative model —; Murty least-index — 
refining industry 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 
reflection 
[90C30] 
(see: Sequential simplex method) 
reflection arrangement of hyperplanes 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements)
reflection coefficient 
[90C30] 
(see: Sequential simplex method)
reflection operations 
[90C30] 
(see: Sequential simplex method)
reflexive closure of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
reflexive relation
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
reflexive relation see: locally — 
reflexivity
[41A30, 47A99, 65K10]
(see: Lipschitzian operators in best approximation by
bounded or continuous functions)
reformulation
[90C10, 90C11, 90C27, 90C57]
(see: Set covering, packing and partitioning problems)
reformulation see: linearized —; model —; parametric
eigenvalue —; preprocessing and —;
smoothing-nonsmooth —
reformulation descent 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
reformulation-Linearization/Convexification Technique 
[90C26] 
(see: Reformulation-linearization technique for global
optimization) 
reformulation-linearization/convexification techniques 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 
reformulation-linearization technique
[90C09, 90C10, 90C11, 90C26, 90C27]
(see: Disjunctive programming;
Reformulation-linearization technique for global
optimization; Time-dependent traveling salesman
problem)
reformulation-linearization technique
[90C09, 90C10, 90C11]
(see: Disjunctive programming)
Reformulation-Linearization-Technique see: the — 
Reformulation-linearization technique for global
optimization
(90C26)
(referred to in: αBB algorithm; Disjunctive programming;
MINLP: branch and bound global optimization algorithm;
MINLP: branch and bound methods; MINLP: global
optimization with αBB; MINLP: logic-based methods)
reformulation/spatial branch and bound
[49M37, 90C11]
(see: Mixed integer nonlinear programming)
reformulation techniques 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility 
index) 

reformulations of discrete-continuous optimization problems 
see: Continuous — 

regeneration networks 
[93A30, 93B50] 
(see: MINLP: mass and heat exchanger networks; Mixed 
integer linear programming: mass and heat exchanger 
networks) 

regenerative processes 
[60J05, 90C15] 
(see: Derivatives of Markov processes and their simulation)


regenerative set 
[60J05, 90C15] 
(see: Derivatives of Markov processes and their simulation)
regenerative stopping times 
[60J05, 90C15] 
(see: Derivatives of Markov processes and their simulation)
regime see: stationary —; transient — 
region see: bundle trust —; complementary —; critical —; 
enlargement of a feasible —; exclusion —; feasible —; 
induced —; inducible —; large scale trust —; minimal 
representation of a feasible —; most promising —; prime 
representation of a feasible —; relaxation of a feasible —; 
relaxed constraint —; safe starting —; trust —; Voronoi — 
region approach see: trust — 
region of attraction 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: stopping rules; 
Stochastic global optimization: two-phase methods) 
region of cooperation 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 
region method see: achievable —
region methodology see: trust — 
region methods see: Nonlinear least squares: trust —; trust — 
region model see: trust — 
region network see: large — 
region problem see: general case of the trust —; hard case of 
the trust —; large scale trust —; Newton step case of the 
trust —; trust — 
region problems see: Large scale trust — 
region reduction 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
region reduction see: feasible — 
region strategy see: pure trust — 
region technique see: trust — 
regional demand 
[90B85] 
(see: Single facility location: multi-objective rectilinear 
distance location) 
regional network 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
regions see: critical —; neighboring critical —; trust — 
regions of stability 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
regression 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 
regression see: convex and concave —; isotonic —; isotonic 
medium —; ordinal —; quasiconvex medium —;
quasiconvex and umbrella —; simple order isotonic — 
regression analysis 
[90C26, 90C30] 
(see: Forecasting) 


regression method 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
regression model see: classical linear — 
regression problem see: convex —; isotonic — 
regression problems see: algorithms for isotonic —; Isotonic —
Regression by special functions: algorithms and complexity 
(90C26, 41A30, 62J02) 
(referred to in: Isotonic regression problems) 
(refers to: Isotonic regression problems) 
regression trees see: classification and — 
regret see: minimization of — 
regret-fc and max-regret heuristics see: max- — 
regret heuristics see: max-regret-fc and max- — 
regret method 
[68T99, 90C27]
(see: Capacitated minimum spanning trees) 
regular 
[49M37, 65K05, 90C30]
(see: Inequality-constrained nonlinear optimization) 
regular see: metrically —; p- — 
regular constraints 
[90C30]
(see: Kuhn-Tucker optimality conditions) 
regular cost function 
[90B15]
(see: Dynamic traffic networks) 
regular critical direction see: high- — 
regular family of probability measures 
[49K05, 49K10, 49K15, 49K20]
(see: Duality in optimal control with first order differential 
equations) 
regular family of triangulations 
[65M60] 
(see: Variational inequalities: F. E. approach) 
regular feasible point 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
regular link cost function 
[90B15] 
(see: Dynamic traffic networks) 
regular local minimizer 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 
regular matrix 
[15A99, 65G20, 65G30, 65G40, 90C26] 
(see: Interval linear systems) 
regular matrix see: strongly — 
regular matroid 
[90C09, 90C10] 
(see: Matroids) 
regular measure see: inner — 
regular operator see: p- — 
regular point 
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear 
optimization) 
regular polyhedron 
[90C60] 
(see: Complexity of degeneracy) 
regular problem see: JJT- —; KH- —; p- — 
regular Q-splitting 
[90C25, 90C33, 90C55] 
(see: Splitting method for linear complementarity 
problems) 
regular in the sense of Jongen-Jonker-Twilt see: problem —
regular in the sense of Kojima-Hirabayashi see: problem —
regular set 
[90C33] 
(see: Order complementarity)
regular set see: completely —; second order — 
regular simplex 
[90C30] 
(see: Sequential simplex method)
regular solution 
[65K10, 90C31, 90C33] 
(see: Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems) 
regular solution of the Wilson equation 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction 
equilibrium) 
regular stationary point 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 
regular subdivision 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
regular triangulation 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods)
regular triangulations 
[68Q20]
(see: Optimal triangulations) 
regular value 
[90C26] 
(see: Smooth nonlinear nonconvex optimization) 
regularity 
[49M30, 49M37, 65G20, 65G30, 65G40, 65H20, 65K05, 
65K10, 90C30, 93A13] 
(see: Interval fixed point theory; Multilevel methods for 
optimal design; Practical augmented Lagrangian methods) 
regularity 
[49K27, 49K40, 90C30, 90C31, 90Cxx] 
(see: First order constraint qualifications; 
Quasidifferentiable optimization: exact penalty methods) 
regularity see: metric —; p- — 
regularity assumptions 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
regularity axiom
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 

regularity condition 
[90C05, 90C26, 90C30, 90C39, 90Cxx] 
(see: Quasidifferentiable optimization: exact penalty 
methods; Rosen’s method, global convergence, and Powell’s 
conjecture; Second order optimality conditions for 
nonlinear optimization; Smooth nonlinear nonconvex 
optimization; Theorems of the alternative and 
optimization) 

regularity condition
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)

regularity condition for penalty methods
[90Cxx]
(see: Quasidifferentiable optimization: exact penalty
methods)

regularity conditions
[49K27, 49K40, 90C05, 90C15, 90C26, 90C30, 90C31]
(see: First order constraint qualifications; Global
optimization using space filling; Image space approach to
optimization; Probabilistic constrained linear
programming: duality theory)

regularity properties 
[57R12, 90C31, 90C34] 
(see: Smoothing methods for semi-infinite optimization) 

regularity* see: metric — 

regularization 
[90C06, 90C15] 
(see: Stabilization of cutting plane algorithms for stochastic 
linear programming problems) 

regularization see: Tikhonov —; Tikhonov iterative — 

regularization approach see: Tikhonov’s — 

regularization of deterministic cutting plane methods 
[90C06, 90C15] 
(see: Stabilization of cutting plane algorithms for stochastic 
linear programming problems) 

regularization method see: iterative — 

regularized direction finding problem
[90C30]
(see: Frank-Wolfe algorithm)

regularized Frank-Wolfe algorithm
[90C30]
(see: Simplicial decomposition)


regularized Frank-Wolfe decomposition
[90C30]
(see: Frank-Wolfe algorithm)

regularized gap function
[90C30, 90C33]
(see: Implicit Lagrangian)

regularized stochastic decomposition algorithm
[90C06, 90C15]
(see: Stabilization of cutting plane algorithms for stochastic
linear programming problems)


regularized subproblem 
[90C30] 
(see: Simplicial decomposition) 

regularizing state problem 
[49J20, 49J52]
(see: Shape optimization) 

regulation see: government — 

Reichenbach implication 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

Reid vapor pressure 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 

reinforcement learning 
[49L99, 90C39] 
(see: Dynamic programming: average cost per stage 
problems; Neuro-dynamic programming) 

reinforcement learning 
[90C39] 
(see: Neuro-dynamic programming) 

Reisner ideal see: Stanley- — 

reject index for interval optimization see: Algorithmic 
improvements using a heuristic parameter — 

rejection see: acceptance/ — 

REL chart scores 
[90B80] 
(see: Facilities layout problems) 

relabel 
[90C35] 
(see: Maximum flow problem) 

related algorithm see: CG- — 

related algorithms see: nonlinear CG- — 

related descent see: gradient- — 

related descent iterations see: Local attractors for gradient- — 

related set function see: gradient- — 

related techniques see: acceleration devices and — 

relation see: α-cut of a fuzzy —; antisymmetric —; binary —;
complementary —; converse —; covering —; crisp —; 
dominance —; equivalence —; equivalence closure of a —; 
functional —; fuzzy —; fuzzy outranking —; 
heterogeneous —; homogeneous —; inverse —; linear
equality —; linear inequality —; local equivalence —; local 
equivalence closure of a —; local order —; local pre-order 
closure of a —; local tolerance —; local tolerance closure of 
a —; locally reflexive —; matrix representation of a —; 
n-ary —; negated —; onto —; order —; outranking —; 
partial order —; pre-order —; pre-order closure of a —; 
preference —; property-closure of a —; quasi-Newton —; 
recurrence —; reflexive —; reflexive closure of a —; 
secant —; separating —; state —; strictly antisymmetric —; 
symmetric —; symmetric interior of a —; tolerance closure 
of a —; trace of —; transitive —; transposed —; 
univalent —; valued —; weak duality — 

relation of indiscernibility 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
relation to Newton's method see: Gauss-Newton method:
Least squares — 

relational analysis 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

relational compositions 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

relational interval algebra 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization) 

relational matrix notation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

relational model 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

relational operations see: matrix notation for — 

relational product see: fuzzy — 

relational properties see: local —; testing — 

relations see: binary operations on —; BK-product of —; 
Boolean and fuzzy —; circle product of —; complementary 
slackness —; difference of —; foreset and afterset 
representation of —; fuzzy —; generalized morphisms of —; 
inclusion of —; k- —; Legendre duality —; linear —;
Morse —; outranking —; pseudo-associativity of products 
of —; round composition of —; self-inverse product of —; 
special properties of —; special properties of crisp —; 
special properties of fuzzy —; special properties of 
heterogeneous —; special properties of homogeneous —;
square composition of —; square product of —; subproduct 
of —; superproduct of —; Tucker homogeneous systems of 
linear —; unary operations on —; universal properties of — 

relations approach see: outranking — 

relationship see: BFGS-CG — 

relationships see: integral — 

relative 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

relative complement 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

relative condition number see: normwise — 

relative distance 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 

relative duality gap 
[90C25, 90C30] 
(see: Solving large scale and sparse semidefinite programs) 

relative entropy 
[15A15, 90C25, 90C55, 90C90, 94A17] 
(see: Entropy optimization: Shannon measure of entropy
and its properties; Semidefinite programming and 
determinant maximization) 

relative measure see: combined — 


relative minimum 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

relative minimum see: strict —; strong — 

relative positioning algorithm
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

relative priorities
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques)

relative value iteration
[49L99]
(see: Dynamic programming: average cost per stage
problems)

relatively complete recourse
[90B10, 90B15, 90C15, 90C35, 90C90, 91B28]
(see: Preprocessing in stochastic programming; Robust
optimization)

relatively redundant constraint
[90C05, 90C20]
(see: Redundancy in nonlinear programs)

relax-and-fix
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)

relaxation
[49M29, 90C05, 90C06, 90C08, 90C10, 90C11, 90C30]
(see: Generalized Benders decomposition; Integer
programming: branch and bound methods; Integer
programming: Lagrangian relaxation; Maximum constraint
satisfaction: relaxations and upper bounds)

relaxation 
[90C10, 90C30] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds; Relaxation in projection methods) 

relaxation see: continuous —; Decomposition techniques for 
MILP: Lagrangian —; group —; Hull —; Integer
programming: Lagrangian —; Lagrange —; Lagrangian —;
linear —; linear programming —; IP —; optimal —; outer 
approximation with equality —; surrogate — 

relaxation algorithm 
[90C30] 
(see: Cost approximation algorithms) 

relaxation and augmented penalty see: outer approximation 
with equality — 

relaxation of a feasible region 
[90C26] 
(see: Bilevel optimization: feasibility test and flexibility 
index) 

relaxation in integer programming see: group — 

relaxation labeling algorithm
[05C69, 05C85, 68W01, 90C59]
(see: Heuristics for maximum clique and independent set)

relaxation labeling processes
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization)

relaxation method
[91B50]
(see: Walrasian price equilibrium)
relaxation method
[49J52, 90C30, 91B50]
(see: Nondifferentiable optimization: relaxation methods; 
Walrasian price equilibrium) 

relaxation method see: Agmon-Motzkin-Fourier —

relaxation methods see: combined —; Nondifferentiable 
optimization: — 

relaxation problem see: convex — 

Relaxation in projection methods 
(90C30) 
(referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; 
Inequality-constrained nonlinear optimization) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; 
Inequality-constrained nonlinear optimization) 

relaxation quantity
[90C29]
(see: Multi-objective optimization; Interactive methods for
preference value functions)

relaxation rule
[49J52, 90C30]
(see: Nondifferentiable optimization: subgradient
optimization methods)

relaxation step 

[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Cost approximation algorithms; Interval global 
optimization) 

relaxation strategy see: convexification/ — 

relaxation with subgradient optimization see: Lagrangian — 

Relaxation technique 
(see: Railroad crew scheduling) 

relaxations see: bounds based on semidefinite —; convex —; 
extended group —; Gomory —; linear programming —;
tight — 

relaxations and upper bounds see: Maximum constraint 
satisfaction: — 

relaxed
[90B36]
(see: Stochastic scheduling)

relaxed constraint region
[90C30, 90C90]
(see: Bilevel programming: global optimization)

relaxed control
[49K05, 49K10, 49K15, 49K20]
(see: Duality in optimal control with first order differential
equations)

relaxed control problem
[49K05, 49K10, 49K15, 49K20]
(see: Duality in optimal control with first order differential
equations)

relaxed dual algorithm see: generalized primal- — 

relaxed dual approach see: Generalized primal- —; primal- — 

relaxed master problem 
[49M20, 90-08, 90C25] 
(see: Nondifferentiable optimization: cutting plane 
methods) 

relaxed multicommodity flow 
[90C35] 
(see: Feedback set problems) 


relaxed nonlinear program
[90C30, 90C33]
(see: Optimization with equilibrium constraints:
A piecewise SQP approach)

relaxed primal master problem
[49M27, 90C11, 90C30]
(see: MINLP: generalized cross decomposition)

relaxed Problem 
(see: Railroad crew scheduling) 

reliability
[90C10, 90C30, 93-XX]
(see: Dynamic programming: optimal control applications;
Modeling languages in optimization: a new paradigm)
relinking see: path —

rendez-vous communication
[65K05, 65Y05]
(see: Parallel computing: models)

reorthogonalized Gram-Schmidt algorithm see:
Daniel-Gragg-Kaufmann-Stewart —

repeated nearest neighbor (RNN)
[68Q25, 68R10, 68W40, 90C27, 90C59]
(see: Domination analysis in combinatorial optimization)
repertory grids
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

repetition
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)

repetition see: machine — 

repetitive see: strictly — 

replacement 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)

replicated network see: time — 

replication heuristic see: annealed — 

replicator dynamics 
[90C20]
(see: Standard quadratic optimization problems:
algorithms)

Replicator dynamics in combinatorial optimization 
(90C27, 90C20, 90C35, 90C59, 91A22, 37B25, 05C69, 05C60) 
(referred to in: Combinatorial matrix analysis; 
Combinatorial optimization algorithms in resource 
allocation problems; Combinatorial optimization games; 
Evolutionary algorithms in combinatorial optimization; 
Fractional combinatorial optimization; Heuristics for 
maximum clique and independent set; Multi-objective 
combinatorial optimization; Neural networks for 
combinatorial optimization; Neuro-dynamic 
programming; Unconstrained optimization in neural 
network training)
(refers to: Combinatorial matrix analysis; Combinatorial 
optimization algorithms in resource allocation problems; 
Combinatorial optimization games; Evolutionary 
algorithms in combinatorial optimization; Fractional 
combinatorial optimization; Graph coloring; Greedy 
randomized adaptive search procedures; Heuristics for 
maximum clique and independent set; Multi-objective 
combinatorial optimization; Neural networks for 
combinatorial optimization; Neuro-dynamic
programming; Unconstrained optimization in neural 
network training) 
replicator equation 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
replicator equations 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 
replicator models see: multipopulation — 
repositioned 
(see: Railroad locomotive scheduling) 
representable 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
representable matroid 
[90C09, 90C10] 
(see: Matroids) 
representation 
[52B11, 52B45, 52B55, 90C05, 90C20] 
(see: Redundancy in nonlinear programs; Volume 
computation for polytopes: strategies and performances) 
representation 
[90C05] 
(see: Carathéodory theorem) 
representation see: compact —; declarative —; digraph —; 
disaggregated —; envelope —; Euclidean —; geometric (or 
disk) —; Global optimization: envelope —; H- —; lookup 
table —; matroid —; minimal —; mixed Time —; 
orthonormal —; parametric —; problem —; process —; 
splitting variable —; unidimensional Euclidean —; V- — 
representation of cutting plane coefficients see: statistical — 
representation of a feasible region see: minimal —; prime — 
representation of models 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
representation of a relation see: matrix — 
representation of relations see: foreset and afterset — 
representation theorem 
[90C06, 90C25, 90C35] 
(see: Decomposition principle of linear programming; 
Simplicial decomposition algorithms) 
representation theorem 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 
representation theorem see: topological — 
representations see: compact —; necessary optimality 
condition without using (sub)gradients parametric — 
representative set 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
reproductive 
(see: State of the art in modeling agricultural systems) 
repulsion see: Global optimization in Weber's problem with 
attraction and —; Weber problem with attraction and — 
request see: global round robin — 
required see: no pivoting — 


required edge 
[90B20] 
(see: General routing problem) 
required vertex 
[90B20] 
(see: General routing problem) 
requirement see: Type I —
requirement exhaustive sequential coloring 
[05-XX] 
(see: Frequency assignment problem) 
requirements planning 
[90-02] 
(see: Operations research models for supply chain 
management and design) 
research see: European Journal of Operational —; GRASP in
operations —; Operations — 
research and financial markets see: Operations — 
research models for supply chain management and design see: 
Operations — 
reserved solution see: ε- —
reservoir see: hydro- — 
residents see: permanent — 
residents of special facilities 
(see: Emergency evacuation, optimization modeling) 
residual 
[90C30] 
(see: Conjugate-gradient methods) 
residual see: large —; natural —; small —
residual algorithm see: conjugate — 
residual capacity 
[90C35] 
(see: Minimum cost flow problem) 
residual network 
[90C35]
(see: Maximum flow problem; Minimum cost flow problem) 
residual problem see: large —; nonzero —; zero — 
residual vector 
[65D10, 65K05] 
(see: Overdetermined systems of linear equations) 
residuals 
[90C30] 
(see: Nonlinear least squares problems)
residuation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
residues see: selected — 
resolution see: single-lookahead-unit- — 
resolution based theorem prover 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
resolvent equations 
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05] 
(see: Variational principles) 
resolvent operator
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]
(see: Variational principles)

resolving degeneracy
[90C60]
(see: Complexity of degeneracy)

resource allocation
[49-XX, 60Jxx, 65Lxx, 90B80, 90B85, 90Cxx, 91Axx, 91B32,
91Bxx, 92D30, 93-XX]
(see: Facility location with externalities; Resource allocation 
for epidemic control) 

resource allocation 
[49-XX, 60Jxx, 65Lxx, 90C09, 90C10, 91B32, 92D30, 93-XX] 
(see: Combinatorial optimization algorithms in resource 
allocation problems; Resource allocation for epidemic 
control) 

Resource allocation for epidemic control 
(49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX)
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location; 
Warehouse location problem) 
(refers to: Combinatorial optimization algorithms in 
resource allocation problems) 

resource allocation problem 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 

resource allocation problem see: discrete — 

resource allocation problems see: Combinatorial optimization 
algorithms in — 

resource-constrained minimum spanning tree problem 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 

resource constrained project scheduling see: Static — 

resource constrained: unified modeling frameworks see: 
Short-term scheduling — 

resource constraint 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 

resource-directive decomposition 
[90C35] 
(see: Multicommodity flow problems) 

resource-payoff space 
[90C30] 
(see: Lagrangian duality: BASICS) 


Resource planning 
(see: Railroad locomotive scheduling) 
resource planning see: water — 
resource systems see: conjunctive use of water — 
resource weighted assignment model see: the multi- — 
resources see: Optimization in water —; Short-term scheduling 
of batch processes with —; stochastic approach to 
optimization in water —; surface and groundwater — 
resources planning under uncertainty on hydrological 
exogenous inflow and demand see: water — 
resources policies see: nonanticipative water —; 
nonanticipativity water — 
respect to another) see: pseudomonotone bifunction (with — 
respect to changes in cost coefficients see: sensitivity analysis 
with — 
respect to a filtration see: stochastic process nonanticipative 
with — 
respect to right-hand side changes see: sensitivity analysis 
with — 
respect to a set see: invexity with —; pre-invexity with —; 
substationarity point with — 
response see: derivatives of structural — 
response mapping see: best — 
response surface 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 
response surface 
[65F10, 65F50, 65H10, 65K10] 
(see: Multidisciplinary design optimization) 
response surface method 
[90C90] 
(see: Design optimization in computational fluid dynamics) 
rest arcs 
(see: Railroad crew scheduling) 
restart 
[90C06] 
(see: Large scale unconstrained optimization) 
restarted Lanczos method see: implicit — 
restless bandit problem see: multi-armed — 
restrict 
[90C15] 
(see: Stochastic programs with recourse: upper bounds) 
restricted accessibility form of CEP 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 
restricted Candidate List 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Broadcast scheduling problem; Feedback set problems; 
Greedy randomized adaptive search procedures) 
restricted gradient 
[90Cxx] 
(see: Discontinuous optimization) 
restricted implicit Lagrangian 
[90C30, 90C33] 
(see: Implicit lagrangian) 
restricted location problem 
[90B85] 
(see: Multifacility and restricted location problems) 
restricted location problems see: Multifacility and — 
restricted master problem
[90C06, 90C10, 90C11, 90C25, 90C30, 90C35, 90C57, 90C90]
(see: Decomposition principle of linear programming;
Modeling difficult optimization problems;
Multicommodity flow problems; Simplicial decomposition
algorithms)
restricted master problem
[90C06, 90C25, 90C35]
(see: Simplicial decomposition algorithms)
restricted-recourse bound
[90C15]
(see: Stochastic programs with recourse: upper bounds)
restricted recourse models
[90C90, 91B28]
(see: Robust optimization)

restricted simplicial decomposition
[90C30]
(see: Simplicial decomposition)
restricted statistical inference see: order — 
restriction 

[90C30] 

(see: Simplicial decomposition) 
restriction see: inner linearization/ —; matroid —; order —;
redundant —; taboo —
restriction of a matroid
[90C09, 90C10]
(see: Matroids)

restriction to the solution set in problem solving
[90C15]
(see: Stochastic programs with recourse: upper bounds)
restriction strategy see: projection- — 

restrictions
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90]
(see: Modeling difficult optimization problems)
restrictions see: control —; operational — 
restrictive see: k- — 
restrictive multilayer see: k- — 
result see: problem —; strong duality —; weak duality — 
result verification see: automatic — 
resultant
[01A99]
(see: Leibniz, Gottfried Wilhelm)
results see: numerical — 
results for RSM-distributions see: asymptotic — 
retailer/model nodes 

[90C35] 

(see: Minimum cost flow problem) 
retailer nodes 

[90C35] 

(see: Minimum cost flow problem) 
retrieval see: Global optimization methods for harmonic —;
harmonic —
retrieval based on single-crystal X-ray diffraction data see:
Optimization techniques for phase —


return on investment see: maximization of — 
return nodes 
[90C30, 90C35] 
(see: Optimization in water resources) 
return and risk 
[68Q25, 91B28] 
(see: Competitive ratio for portfolio management) 
return/risk see: maximization of — 
revealing factorization see: rank — 
revealing QR factorization see: rank — 
revealing URV factorization see: rank — 
revenue management 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization) 
reverse convex constraint see: linear program with an 
additional — 
reverse convex inequality 
[46A20, 52A01, 90C30] 
(see: Farkas lemma: generalizations)
Reverse convex optimization 
(90C26, 90C30) 
(referred to in: αBB algorithm; Quadratic knapsack;
Quadratic programming with bound constraints; Standard
quadratic optimization problems: theory)
(refers to: αBB algorithm; Concave programming; D.C.
programming; Quadratic knapsack; Quadratic 
programming with bound constraints; Standard quadratic 
optimization problems: theory) 
reverse convex programming 
[90C05, 90C26, 90C30] 
(see: Continuous global optimization: models, algorithms
and software; Reverse convex optimization) 
reverse convex programming 
[90C26, 90C30]
(see: Reverse convex optimization)
reverse convex programs
[90C26, 90C30]
(see: Reverse convex optimization)
reverse convex set
[90C26, 90C30]
(see: Reverse convex optimization)
reverse differentiation 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of the Hessian)
reverse mode
[65H99, 65K99]
(see: Automatic differentiation: point and interval)
reverse mode of AD
[49-04, 65Y05, 68N20]
(see: Automatic differentiation: parallel computation)
reverse mode of an AD algorithm
[26A24, 65D25]
(see: Automatic differentiation: introduction, history and
rounding error estimation)
reverse mode automatic differentiation
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians)
reverse normal hull
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
reverse normal set
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
reverse polyblock
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
reverse polyblock algorithm
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
reverse polyblock (copolyblock) algorithm see: revised —
review see: Generalized variational inequalities: A brief —
review inventory models: (QR) policy see: Continuous —
review model see: continuous —; periodic —
revised geometric mean method
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques)
revised polyblock algorithm
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
revised reverse polyblock (copolyblock) algorithm
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
Reynolds-averaged Navier-Stokes code
[90C90]
(see: Design optimization in computational fluid dynamics)
RGM
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques)
RH-BFGS see: L- — 
p-regular problem 
[90Cxx] 
(see: Quasidifferentiable optimization: exact penalty 
methods) 
p-regularity 
[90Cxx] 
(see: Quasidifferentiable optimization: exact penalty 
methods) 
Ribière algorithm see: Polyak-Polak- —
Ribière formula see: Polak- —
Ribière method see: Polak- —
Riccati equation 
[90C30] 
(see: Suboptimal control)
Riccati equation see: continuous-time —
rich stream
[93A30, 93B50]
(see: Mixed integer linear programming: mass and heat
exchanger networks)
Richardson iteration
[65H10, 65J15]
(see: Contraction-mapping)
ride see: dial-a- —; m-dial-a- —
ridge
[90Cxx]
(see: Discontinuous optimization)
ridge see: active —; fault — 


Riemannian manifold
[90C26]
(see: Smooth nonlinear nonconvex optimization)
Riemannian metric
[90C26]
(see: Smooth nonlinear nonconvex optimization)
Riemannian metric see: C<- — 

Riesz theorem
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
right-chain justification
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
right-collection of a partition
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
right-hand side changes see: sensitivity analysis with respect
to —
right-hand side perturbation model
[90C15, 90C26, 90C33]
(see: Stochastic bilevel programs)

right-hand side perturbation problem
[90C31]
(see: Sensitivity and stability in NLP: continuity and
differential stability)

right-hand side problem
[90C31]
(see: Sensitivity and stability in NLP: approximation)
right-hand side simplex algorithm see: parametric — 
right-hand-side uncertainty, duality and applications see:
Robust linear programming with —
right-paired element
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
right-paired set
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
right-pairs
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
right-reachable
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
right-reachable see: directly —
right saddle point
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization)
right saddle-point theorem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: monoduality in convex optimization)
right-unpaired element
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching)
rigid templates see: De novo protein design using —
rigor see: with mathematical — 
rigorous bound for solutions of nonlinear systems of equations 

[65G20, 65G30, 65G40, 65H20, 65K99] 

(see: Interval Newton methods) 
rigorously 

[65G20, 65G30, 65G40, 65H20] 

(see: Interval fixed point theory) 
ring networks

[05C85] 

(see: Directed tree networks) 
ring topology 

[90-XX] 

(see: Survivable networks) 
risk see: business failure —; estimation —; maximization of
return/ —; return and —

(risk averse, anticipative) decision see: ex-ante — 
risk-free asset
[91B50]
(see: Financial equilibrium)

(risk prone, adaptive) decision see: ex-post — 
riskless interest rate
[68Q25, 91B28]
(see: Competitive ratio for portfolio management)
risks see: agricultural — 

Ritz bound see: Rayleigh- — 
Ritz-Galerkin method 

[65M60] 

(see: Variational inequalities: F. E. approach) 
Ritz method see: Rayleigh- — 
river hydropower nodes see: on-the- — 
river plant see: run-of- — 
river plants see: run-of- — 

RLT system see: reduced — 
RMP 

[90C06, 90C25, 90C35] 

(see: Simplicial decomposition algorithms) 
rmsd threshold see: determination of — 
rmsds by energy see: average — 

(RNN) see: repeated nearest neighbor — 
road traveling salesman problem 

[90B20] 

(see: General routing problem) 
Robbins-Monro method

[62F12, 65C05, 65K05, 90C15, 90C31] 

(see: Monte-Carlo simulations for stochastic optimization) 
robin balancing scheme see: asynchronous round — 
robin request see: global round — 

Robinson see: anti- — 

Robinson CQ 

[49K27, 49K40, 90C30, 90C31]

(see: First order constraint qualifications) 
Robinson matrix see: anti- — 

robust 

[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]

(see: Optimization problems in unit-disk graphs) 
robust see: model —; solution — 

robust algorithms 

[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]

(see: Optimization problems in unit-disk graphs) 


robust conic optimization problems see: Approximations to — 


Robust control 
(93D09) 


(referred to in: Control vector iteration CVI; Duality in
optimal control with first order differential equations;
Dynamic programming: continuous-time optimal control;
Dynamic programming and Newton’s method in
unconstrained optimal control; Dynamic programming:
optimal control applications; Hamilton-Jacobi-Bellman
equation; Infinite horizon control and dynamic games;
MINLP: applications in the interaction of design and
control; Multi-objective optimization: interaction of design
and control; Optimal control of a flexible arm; Robust
control: schur stability of polytopes of polynomials;
Semi-infinite programming and control problems;
Sequential quadratic programming: interior point methods
for distributed optimal control problems; Suboptimal
control)
(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Hamilton-Jacobi-Bellman equation; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Optimal 
control of a flexible arm; Robust control: schur stability of 
polytopes of polynomials; Semi-infinite programming and 
control problems; Sequential quadratic programming: 
interior point methods for distributed optimal control 
problems; Suboptimal control) 

Robust control: schur stability of polytopes of polynomials 
(93D09, 93C55, 39A11) 
(referred to in: Control vector iteration CVI; Duality in
optimal control with first order differential equations;
Dynamic programming: continuous-time optimal control;
Dynamic programming and Newton’s method in
unconstrained optimal control; Dynamic programming:
optimal control applications; Hamilton-Jacobi-Bellman
equation; Infinite horizon control and dynamic games;
MINLP: applications in the interaction of design and
control; Multi-objective optimization: interaction of design
and control; Optimal control of a flexible arm; Robust
control; Semi-infinite programming and control problems;
Sequential quadratic programming: interior point methods
for distributed optimal control problems; Suboptimal
control)
(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Hamilton-Jacobi-Bellman equation; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Optimal 
control of a flexible arm; Robust control; Semi-infinite 
programming and control problems; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Suboptimal control) 

robust control synthesis 
[93D09] 
(see: Robust control) 
robust control theory
[93D09] 
(see: Robust control) 

robust counterpart 
(see: Approximations to robust conic optimization 
problems; Design of robust model-based controllers via 
parametric programming; Price of robustness for linear 
optimization problems) 

Robust design of dynamic systems by constructive nonlinear 
dynamics 
(37N40, 90C30, 90C34) 

robust estimator 
[65D10, 65K05]
(see: Overdetermined systems of linear equations) 

Robust global optimization 
(90C26, 90C31) 

Robust linear programming with right-hand-side uncertainty, 
duality and applications 

robust model-based controllers via parametric programming 
see: Design of — 

robust obstacle-free shape design
[90C25, 90C27, 90C90]
(see: Semidefinite programming and structural
optimization)

robust obstacle-free truss design
[90C25, 90C27, 90C90]
(see: Semidefinite programming and structural
optimization)

Robust optimization 

(90C90, 91B28) 

(referred to in: Competitive ratio for portfolio management;
Financial applications of multicriteria analysis; Financial 
optimization; Portfolio selection and multicriteria analysis; 
Semi-infinite programming and applications in finance) 
(refers to: Competitive ratio for portfolio management; 
Financial applications of multicriteria analysis; Financial 
optimization; Portfolio selection and multicriteria analysis; 
Semi-infinite programming and applications in finance) 

Robust optimization 

[90C90, 91B28] 

(see: Robust optimization)

Robust optimization: mixed-integer linear programs 

(90C11, 90C30, 90B35)

robust parametric programs 

[90C90, 91B28] 

(see: Robust optimization)

robust programming problem 

[90C29, 90C70] 

(see: Fuzzy multi-objective linear programming) 

robust stability 

[39A11, 93C55, 93D09] 

(see: Robust control: schur stability of polytopes of
polynomials)

robust stability analysis 

[90C26, 90C90] 

(see: Global optimization in generalized geometric
programming)


robust stopping criteria see: Dykstra’s algorithm and — 

robustness 
[90C25, 90C27, 90C90, 93D09] 
(see: Robust control; Semidefinite programming and 
structural optimization) 

robustness 
[90C15, 90C29, 90C30, 90C99] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points; SSC minimization 
algorithms; SSC minimization algorithms for nonsmooth 
and stochastic optimization) 

robustness analysis 
[90C29, 90C70, 93D09] 
(see: Fuzzy multi-objective linear programming; Robust 
control) 

robustness for linear optimization problems see: Price of — 

Rockafellar directional differential 
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX] 
(see: Nonconvex energy functions: hemivariational 
inequalities) 

Rockafellar duality see: Fenchel- — 

Rockafellar duality theory see: Fenchel- — 

Rockafellar generalized derivative see: Clarke- — 

Rockafellar subdifferential see: Moreau- —

Rodriguez method 

[65F10, 65F50, 65H10, 65K10]

(see: Multidisciplinary design optimization) 

rolling horizon see: decision making with — 

rollout method 

[68T20, 68T99, 90C27, 90C59]

(see: Metaheuristics) 

(ROM) see: recursive Opt Matching — 

root arc 

[90C35]
(see: Generalized networks) 

root-free Givens transformation see: square- — 

root method see: square- — 

root node 
[34E05, 90C27] 
(see: Asymptotic properties of random multidimensional 
assignment problem) 

root problem 
[65K05, 90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 
90C59, 90C60, 90C90] 
(see: Automatic differentiation: root problem and branch 
problem; Traveling salesman problem) 

root problem 
[65K05] 
(see: Automatic differentiation: root problem and branch 
problem) 

root problem and branch problem see: Automatic 
differentiation: — 

root transformation see: logarithmic and square- —; modified 
square- —; square- — 

root of a tree 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 

rooted tree 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Generalized networks; Replicator dynamics in 
combinatorial optimization) 


rooted tree see: level of a vertex in a — 

Rosen gradient projection method 

[65K05, 65K10]

(see: ABS algorithms for optimization) 

Rosen gradient projection method
[90C30]
(see: Rosen’s method, global convergence, and Powell’s
conjecture)

Rosen method 

[90C30]
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 

Rosen method see: global convergence problem for the — 

Rosen’s method, global convergence, and Powell’s conjecture 
(90C30) 
(referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; First 
order constraint qualifications; Frank-Wolfe algorithm; 
Inequality-constrained nonlinear optimization; 
Kuhn-Tucker optimality conditions; Lagrangian duality: 
BASICS; Saddle point theory and optimality conditions; 
Second order constraint qualifications; Second order 
optimality conditions for nonlinear optimization) 
(refers to: Equality-constrained nonlinear programming:
KKT necessary optimality conditions; First order
constraint qualifications; Inequality-constrained nonlinear
optimization; Kuhn-Tucker optimality conditions;
Lagrangian duality: BASICS; Saddle point theory and
optimality conditions; Second order constraint
qualifications; Second order optimality conditions for
nonlinear optimization; Successive quadratic
programming: full space methods)
Rosen mixed integer formulation see: LCP: Pardalos- — 
Rosenbloom theorem 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
Rosenbrock hillclimbing procedure 
[90C30] 
(see: Suboptimal control) 
Rosenbrock method 
(90C30) 
(referred to in: Cyclic coordinate method; Powell method; 
Sequential simplex method) 
(refers to: Cyclic coordinate method; Powell method; 
Sequential simplex method) 
Rosenbrock method 
[90C30] 
(see: Rosenbrock method) 
rostering see: crew — 
rotamer 
[92B05] 
(see: Genetic algorithms for protein structure prediction) 
rotamer library 
[92B05] 
(see: Genetic algorithms for protein structure prediction) 


rotation see: diagnostic —; Givens — 
rotation matrix 
[46N10, 47N10, 49M37, 65K10, 90C26, 90C30]
(see: Global optimization: tight convex underestimators)
rotation in the solution of equations
[65K05, 90Cxx]
(see: Symmetric systems of linear equations)
rotation-symmetry model
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)
rotations see: diagnostic —
roughness
[68Q20]
(see: Optimal triangulations) 
roulette wheel procedure 
[92B05] 
(see: Genetic algorithms)
round composition of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
round-off error 
[90C60] 
(see: Complexity of degeneracy) 
round robin balancing scheme see: asynchronous — 
round robin request see: global — 
rounding see: consistent —; integer —; outward —; 
randomized — 
rounding cut see: mixed integer — 
rounding error estimation see: Automatic differentiation: 
introduction, history and — 
rounding errors are under control
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
rounding function
[90C10, 90C46]
(see: Integer programming duality)
rounding heuristic
[90C35]
(see: Multi-index transportation problems)
rounding problem see: matrix —
rounding procedure see: randomized —
route choice adjustment process see: trip- — 
routine see: individual software —; separation — 
routines see: package of basic software — 
routing 
[90B80, 90B85, 90C35] 
(see: Multicommodity flow problems; Warehouse location 
problem) 
routing 
[90B20] 
(see: General routing problem) 
routing see: aircraft —; arc —; inventory —; location- —;
node —; period —; vehicle —; VLSI —
routing algorithm see: parallel — 
routing and industrial in-plant railroads see: engine — 
routing models see: location- — 
routing pattern model see: fractional —; single path — 
routing problem see: airline maintenance —; capacitated
arc —; capacitated vehicle —; distance-constrained 
vehicle —; dynamic vehicle —; General —; inventory 
ship —; Location —; Metaheuristic algorithms for the 
vehicle —; stochastic vehicle —; vehicle — 

routing problem with backhauls see: vehicle — 

routing problem with simultaneous pickups and deliveries see: 
Vehicle — 

routing problem with time windows see: vehicle — 

routing problems see: approximate methods for solving 
vehicle —; cargo —; constructive methods for solving 
vehicle —; exact methods for solving vehicle —; Maritime 
inventory —; Stochastic vehicle — 

routing and protection problems in optical networks see: 
Integer linear programs for — 

routing of traffic in transmission network 
[90B10, 90C05, 90C06, 90C35] 
(see: Nonoriented multicommodity flow problems) 

row see: cost —; critical — 

row-action algorithm
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution)

row-action method 

[90C05, 90C25] 

(see: Young programming)

row bandwidth 

[65Fxx] 

(see: Least squares problems) 

row conditional proximity data 

[62H30, 90C27] 

(see: Assignment methods in clustering)

row generation
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90]
(see: Traveling salesman problem)

row orthogonalization scheme see: sequential — 

row rank see: full — 


row sufficient 
[90C33] 
(see: Linear complementarity problem) 
row sufficient matrix 
[65K05, 90C20, 90C25, 90C33, 90C55] 
(see: Principal pivoting methods for linear complementarity 
problems; Splitting method for linear complementarity 
problems) 
RP 
[90C30] 
(see: Lagrangian duality: BASICS) 
RPP 
[90B20] 
(see: General routing problem) 
rRO 
[74A40, 90C26] 
(see: Shape selective zeolite separation and catalysis: 
optimization methods) 


RSD 
[90C30] 
(see: Simplicial decomposition) 
RSM-distribution with algebraically decreasing tail 
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms) 
RSM-distributions see: asymptotic results for — 
rSQP 
[90C30, 90C90]
(see: Successive quadratic programming: applications in the 
process industry) 
RTPC 
[49K99, 65K05, 80A10]
(see: Optimality criteria for multiphase chemical 
equilibrium) 
Rudolph method 
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems) 
rule see: absolute qualification —; adaptive subdivision —; 
Armijo —; Armijo steplength —; Bayes optimal —; Bayesian 
stopping —; best bound —; Bland —; Bland least index 
pivoting —; chain —; column dropping —; cyclic —; 
Dantzig —; Dantzig largest coefficient pivoting —; 
decision —; divergent series —; divergent series 
step-size —; divergent series steplength —; first-in
last-out —; Fritz John —; fuzzy sum —; generalized Gauss
quadrature —; generic pivoting —; geometric series —;
global Lagrange multiplier —; Lagrange multiplier —;
largest coefficient —; Levenberg-Marquardt —;
lexicographic pivoting —; logarithmic scoring —; minimax
decision —; momentum updating —; North-West
corner —; parametric —; Polyak II —; quadratic (Brier)
scoring —; relaxation —; smallest index —; splitting —;
stopping —; sum —; tie breaking —
rule-based system
[90C09, 90C10]
(see: Optimization in boolean classification problems)
rule for Bayesian networks see: chain —
rule of Bland
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)
rule of economic growth see: Ramsey —
rule of greatest improvement
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)
rule of justice
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)
rule of random choice
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)
rule of steepest ascent
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)
rule of thumb
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)
rules see: anticycling —; chain —; Criss-cross pivoting —;
Least-index anticycling —; Lexicographic pivoting —; 
look-ahead —; pivot —; pivoting —; plausible —; 
Stochastic global optimization: stopping — 
rules of branch and bound see: basic — 
rules of a Turing machine see: transition — 
run see: artificial centering hit and —; hit and —; hyperspheres 
direction hit and —; improving hit and — 
run algorithm see: hit and — 
run algorithms see: hit and — 
run generator see: hit and — 
run methods see: Global optimization: hit and —; hit and — 
run-of-river plant 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
run-of-river plants 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 
run SQG see: single — 
running list 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
running in O(n‘) time see: algorithm — 
running time of a Turing machine 
[90C60] 
(see: Complexity theory) 
runs see: minimal —; multiple — 
rural postman problem 
[90B20] 
(see: General routing problem) 
(RVI) see: random vertex insertion — 
rVNS 
[9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods) 
RVP 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 


s 


S2%2x2 group see: symmetric — 
S-DIAL 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 
S-face 
[90C20] 
(see: Standard quadratic optimization problems: 
applications) 
S-Fejérian 
[47H05, 65J15, 90C25, 90C55] 
(see: Fejér monotonicity in convex optimization) 


S-HEAP 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 

S)policy see: (s — 

S-procedure 

[93D09]

(see: Robust control) 

(s,S) optimal policies 

[49L20]

(see: Dynamic programming: inventory control) 

(s,S) policy 

[49L20]

(see: Dynamic programming: inventory control) 

(s, S)policy 

[90B50]

(see: Inventory management in supply chains) 

s-stress 

[65K05, 90C27, 90C30, 90C57, 91C15]

(see: Optimization-based visualization) 

s-t-cut
[90C35]
(see: Maximum flow problem)

s-t-cut see: flow across an —

S*-matrix 

[90C09, 90C10] 

(see: Combinatorial matrix analysis)

SA 

90C90, 90C27) 

referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Broadcast scheduling problem; Discrete 
stochastic optimization; Genetic algorithms; Global 
optimization based on statistical models; Global 
optimization: hit and run methods; Global optimization in 
Lennard-Jones and morse clusters; Job-shop scheduling 
problem; Molecular structure determination: convex global 
underestimation; Monte-Carlo simulated annealing in 
protein folding; Multiple minima problem in protein 
folding: αBB global optimization approach;
Optimization-based visualization; Optimization in medical 
imaging; Packet annealing; Phase problem in X-ray 
crystallography: Shake and bake approach; Random search 
methods; Simulated annealing methods in protein folding; 
Stochastic global optimization: stopping rules; Stochastic 
global optimization: two-phase methods) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Bayesian global optimization; 
Evolutionary algorithms in combinatorial optimization; 
Global optimization based on statistical models; 
Monte-Carlo simulated annealing in protein folding; 
Packet annealing; Random search methods; Simulated 
annealing methods in protein folding; Stochastic global 
optimization: stopping rules; Stochastic global 
optimization: two-phase methods) 

SA 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 

SA see: PA of — 

SABB algorithm see: nonsmooth SSC- —; SSC- — 
saccharomyces cerevisiae
[92B05]
(see: Genetic algorithms) 
saddle duality theorem 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization)
saddle function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization)
saddle Lagrange duality 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization;
Duality theory: monoduality in convex optimization) 
saddle Lagrangian 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
saddle-minimax point 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization)
saddle-minimax theorem 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization)
saddle point 
[49K35, 49M27, 65K10, 90C05, 90C06, 90C15, 90C25] 
(see: Convex max-functions; Probabilistic constrained 
linear programming: duality theory; Saddle point theory 
and optimality conditions) 
saddle point 
[90C06] 
(see: Saddle point theory and optimality conditions) 
saddle point see: left —; right — 
saddle-point formulation 
[65M60] 
(see: Variational inequalities: F. E. approach) 
saddle-point inequalities 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
saddle-point problem 
[49K35, 49M27, 65K10, 90C25] 
(see: Convex max-functions)
saddle-point sufficient condition 
[90C30] 
(see: Image space approach to optimization)
saddle-point theorem see: right — 
Saddle point theory and optimality conditions 
(90C06)
referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; First 
order constraint qualifications; Inequality-constrained 
nonlinear optimization; Kuhn-Tucker optimality 
conditions; Lagrangian duality: BASICS; Rosen’s method, 
global convergence, and Powell’s conjecture; Second order 
constraint qualifications; Second order optimality 
conditions for nonlinear optimization) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; First order 
constraint qualifications; Inequality-constrained nonlinear 
optimization; Kuhn-Tucker optimality conditions; 
Lagrangian duality: BASICS; Rosen’s method, global 
convergence, and Powell’s conjecture; Second order
constraint qualifications; Second order optimality
conditions for nonlinear optimization) 
saddle points 
[90C15] 
(see: Stochastic quasigradient methods in minimax 
problems) 
saddle value 
[90C05, 90C15] 
(see: Probabilistic constrained linear programming: duality 
theory) 
safe starting region 
[65G20, 65G30, 65G40] 
(see: Interval analysis: systems of nonlinear equations) 
safeguarded new trial steplength see: compute a — 
Saito divergence see: Itakura- —
sales see: maximization of — 
sales assumption see: lost — 
salesman see: probabilistic traveling — 
salesman problem see: classical traveling —; graphical 
traveling —; Heuristic and metaheuristic algorithms for the 
traveling —; prize collecting traveling —; road traveling —; 
Steiner graphical traveling —; Time-dependent 
traveling —; traveling — 
Salesman Problem (ATSP) see: asymmetric Traveling — 
salesman problems see: traveling — 
salesperson problem see: traveling — 
same face see: points on the — 
same probability see: randomly with the — 
sample see: validation — 
sample and expectation functions 
[90C15] 
(see: Stochastic quasigradient methods: applications) 
sample-path optimization 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 
sample problem 
[90C26, 90C29] 
(see: Optimal design of composite structures) 
sampler see: hidden Markov model and Gibbs — 
samples see: training — 
sampling 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 
sampling see: exact —; importance —; Markov chain —; 
random —; uniform random — 
sampling procedure see: interactive — 
sampling and variance reduction see: Monte-Carlo — 
sandwich condition 
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and 
duality) 
sandwich theorem 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35]
(see: Lovász number)
SAR 
[90C30]
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
Sard theorem see: parametrized — 
SAT 
[05-04, 68Q25, 90C27, 90C60] 
(see: Evolutionary algorithms in combinatorial
optimization; NP-complete problems and proof 
methodology) 
SAT 
[90C09, 90C10]
(see: Optimization in boolean classification problems) 
SAT see: 3- —; MAX- —; MAX-2- — 
SAT-CNF problem 
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 
SAT-k-CNF 
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras) 
SAT problem see: weighted MAX- — 
satellite orbit 
[26A24, 65K99, 85-08]
(see: Automatic differentiation: geometry of satellites and 
tracking stations) 
satellite problem 
[90B10, 90C05, 90C06, 90C35]
(see: Nonoriented multicommodity flow problems) 
satellite systems 
[26A24, 65K99, 85-08]
(see: Automatic differentiation: geometry of satellites and 
tracking stations) 
satellites and tracking stations see: Automatic differentiation: 
geometry of — 
satisfaction see: constraint —; maximum constraint — 
Satisfaction Problem see: binary constraint —; constraint —; 
max-r-Constraint —; maximum constraint —; numerical 
constraint — 
satisfaction problems see: constraint —; continuous 
constraint — 
satisfaction: relaxations and upper bounds see: Maximum 
constraint — 
satisfaction set 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
satisfaction techniques see: constraint — 
satisfiability 
[68Q25, 90C05, 90C10, 90C60] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds; NP-complete problems and proof 
methodology; Simplicial pivoting algorithms for integer 
programming) 
satisfiability 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
satisfiability see: 3- —; Boolean —; maximum — 
satisfiability of Boolean formulas 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
satisfiability problem 
[03B50, 68T15, 68T30, 90C60] 
(see: Complexity theory; Finite complete systems of 
many-valued logic algebras) 
satisfiability problem 
[90C09, 90C10] 
(see: Optimization in boolean classification problems) 
satisfiability problem see: Maximum — 


satisfiable Boolean formula 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras)
satisficing method 
[90C29] 
(see: Multi-objective optimization: pareto optimal
solutions, properties) 
saturating push 
[90C35] 
(see: Maximum flow problem) 
Savings see: expected — 
savings algorithm see: parallel — 
savings heuristic 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem)
savings heuristic see: Whitney — 
savings procedures 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees)
Savitch theorem 
[90C60] 
(see: Complexity classes in optimization)
sawtooth arc cost function 
[90B10] 
(see: Piecewise linear network flow problems)
SBB algorithm see: sSC- — 
SBP 
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs)


SC 
[90B05, 90B06] 
(see: Global supply chain models) 
SC see: IM in — 
scalar variational inequalities 
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems) 
scalarization 
[90C29]
(see: Vector optimization) 
scalarization property 
[90C29]
(see: Multi-objective optimization: pareto optimal 
solutions, properties) 
scalarizing function 
[90C29] 
(see: Multiple objective programming support) 
scalarizing program see: achievement — 
scalars 
[14R10, 15A03, 51N20]
(see: Linear space) 
scale 
[90C29] 
(see: Estimating data for multicriteria decision making
problems: optimization techniques) 

scale see: circular unidimensional —; economies of —; 
economy of —; exponential —; linear unidimensional —; 
unidimensional — 

scale combinatorial optimization see: large- — 

scale function 
[90C26] 
(see: Invexity and its applications) 


scale global optimization using terrain/funneling methods see: Multi- —
scale invariance criterion 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 
scale-invariant 
(see: Global optimization: functional forms) 
scale least squares problems see: Complexity and large- — 
scale linear systems see: large — 
scale neighborhoods see: large- — 
scale network see: intermediate —; macro —; micro —
scale nonlinear mixed integer programming problem see: 
large — 
scale optimization see: large — 
scale problem see: large — 
scale and sparse semidefinite programs see: Solving large — 
scale trust region see: large — 
scale trust region problem see: large — 
scale trust region problems see: Large — 
scale unconstrained optimization see: Large — 
scaled ABS class 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
scaled ABS class of algorithms 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 
scaled subclass see: optimally —; orthogonally — 
scales see: linear —; unidimensional — 
scaling see: cost —; dual —; ε- —; multidimensional —;
unidimensional — 
scaling algorithm see: affine —; dual- —; primal- —;
primal-dual —
scaling method see: distance —
scaling problem see: multidimensional —
scaling SQPIP methods see: affine —
scalings algorithm see: dual- —
scan
[90C35]
(see: Multi-index transportation problems)
Scan see: Graham- —
Scarf formulation
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)

scatter search
[05-04, 68T20, 68T99, 90C10, 90C11, 90C20, 90C27, 90C59]
(see: Evolutionary algorithms in combinatorial
optimization; Linear ordering problem; Metaheuristics)

scenario
[90C15, 90C26, 90C90, 91B28]
(see: Decomposition algorithms for the solution of 
multistage mean-variance optimization problems; 
Financial optimization; Global optimization in batch 
design under uncertainty; Stochastic programming: parallel 
factorization of structured matrices; Two-stage stochastic 
programs with recourse) 

scenario aggregation 
[90C15] 
(see: L-shaped method for two-stage stochastic programs 
with recourse) 

scenario analysis 
[90C15, 90C29, 90C30, 90C35] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points; Optimization in water 
resources; Stochastic quasigradient methods in minimax 
problems) 

scenario generation
[91B28]
(see: Financial optimization)

scenario set
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution)

scenario tree
[90C15, 90C30, 90C35, 90C90]
(see: Decomposition algorithms for the solution of 
multistage mean-variance optimization problems; 
Multistage stochastic programming: barycentric 
approximation; Optimization in water resources) 

scenario trees see: barycentric — 

scenarios see: what-if-when — 

SCG see: algorithm- — 

Schaible algorithm 
[90C32] 
(see: Quadratic fractional programming: Dinkelbach 
method) 

Schauder degree see: Leray- — 

Schauder fixed point theorem 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval fixed point theory) 

schedule see: annealing —; cooling —; flight —

schedule construction see: network design and — 

schedule first-cluster second 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

schedule second strategy see: cluster first- — 
scheduling
[90C26] 
(see: Global optimization in batch design under 
uncertainty) 
scheduling 
[90B35, 90C11, 90C26, 90C90] 
(see: Job-shop scheduling problem; MINLP: design and 
scheduling of batch processes; MINLP: trim-loss problem) 
scheduling see: airline crew —; batch —; crew —; facility 
planning and —; Integrated planning and —; 
Locomotive —; Mixed integer optimization in well —; 
Optimal sensor —; Railroad crew —; Railroad 
locomotive —; Static resource constrained project —; 
Stochastic —; Vehicle — 
scheduling of batch processes see: Medium-term —; MINLP: 
design and —; Reactive — 
scheduling of batch processes with resources see: 
Short-term — 
scheduling of continuous processes see: Short-term — 
scheduling functions 
[05-02, 05-04, 15A04, 15A06, 68U99] 
(see: Alignment problem) 
scheduling: an MILP model see: Gasoline blending and 
distribution — 
scheduling models see: single locomotive — 
scheduling policy 
[90B36, 90C26] 
(see: MINLP: design and scheduling of batch processes; 
Stochastic scheduling) 
scheduling problem 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
scheduling problem see: Broadcast —; crew- —; Flow shop —; 
Job-shop —; minimum Multiprocessor —; Multi-depot 
vehicle —; Single-depot vehicle —; vehicle — 
scheduling problems see: Integrated vehicle and duty —; 
multi-depot vehicle —; Single-depot vehicle — 
scheduling problems with a fixed number of vehicles see: 
Vehicle — 
scheduling problems with multiple types of vehicles see: 
Vehicle — 
scheduling problems with time constraints see: vehicle — 
scheduling process see: duty — 
scheduling, resource constrained: unified modeling 
frameworks see: Short-term — 
scheduling (staff planning) 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
scheduling of switching engines 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
scheduling theory 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
scheduling with trip shifting see: Vehicle — 
scheduling under uncertainty: sensitivity analysis see: 
Short-term — 
schema 
[92B05] 
(see: Genetic algorithms) 


Schema theorem
[92B05]
(see: Genetic algorithms)

Scheme
[90C10, 90C30]
(see: Modeling languages in optimization: a new paradigm)

scheme see: asynchronous round robin balancing —; branch
and bound —; chaotic iterative —; fully polynomial time
approximation —; growth —; Hilbert —; iterative —; 
Kantorovich —; master-slave —; MS —; near-neighbor load 
balancing —; polynomial time approximation —; random 
polling —; randomized allocation —; sequential row 
orthogonalization — 

schemes see: aggregation — 

Schmidt algorithm see: Daniel-Gragg-Kaufmann-Stewart
reorthogonalized Gram- — 

Schmidt orthogonalization see: classical Gram- —; Gram- —;
modified Gram- — 

Schmidt type iteration see: Gram- — 

Scholes model see: Black- — 

Schroedinger equation
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)

Schruben-Margolin method
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)

Schur complement
[65K05, 90C30]
(see: Automatic differentiation: calculation of Newton
steps)

Schur stability
[39A11, 93C55, 93D09]
(see: Robust control: schur stability of polytopes of
polynomials)

schur stability of polytopes of polynomials see: Robust 
control: — 

science see: cognitive — 

scientific applications 
[03B50, 03B52, 03E72, 47S40, 68T15, 68T27, 68T30, 68T35,
68Uxx, 90Bxx, 90C05, 91Axx, 91B06, 92C60] 
(see: Boolean and fuzzy relations; Continuous global 
optimization: applications; Finite complete systems of 
many-valued logic algebras) 

Scientific Discovery see: logic of — 

sCM
[68T20, 68T99, 90-02, 90C27, 90C59]
(see: Metaheuristics; Operations research models for supply
chain management and design)

score
[90C39]
(see: Neuro-dynamic programming)

score function
[60J05, 90C15]
(see: Derivatives of markov processes and their simulation;
Derivatives of probability measures)

score function martingale 
[60J05, 90C15] 
(see: Derivatives of markov processes and their simulation) 

score function method 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 

scores see: REL chart — 

scoring function 
[62H30, 68T10, 90C05, 90C11, 90C39] 
(see: Linear programming models for classification; Mixed 
integer classification problems; Neuro-dynamic 
programming) 

scoring rule see: logarithmic —; quadratic (Brier) — 

SCOUT 
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching) 

SCP 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 

(SCP) see: stacker Crane Problem — 

screening see: mammography — 

SD 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 

Sd-classes 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 

SD problem see: dual —; equivalent primal —; primal —; 
standard — 

SDP
[90C25, 90C30]
(see: Solving large scale and sparse semidefinite programs)

SDP duality
[90C30]
(see: Duality for semidefinite programming)

SDSSS
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching)

SDVSP
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling)

SE
[90C15, 90C30, 90C99]
(see: SSC minimization algorithms)

search
[65G20, 65G30, 65G40, 68T20]
(see: Interval constraints)

search see: adaptive —; Adaptive global —; allowed neighbor
in tabu —; aspiration —; best-first tree —; binary —;
bisection —; chained local —; conformational —;
curvilinear line —; cyclic coordinate —; depth-first —;
depth-first tree —; direct —; domain of —; Fibonacci
section —; fixed tabu —; formulation space —; global —;
Global equilibrium —; global optimum —; golden
section —; graph —; greedy randomized adaptive —;
grid —; Hamming-reactive tabu —; hesitant adaptive —;
Heuristic —; history of a —; inexact line —; intelligent —;
iterated local —; limited discrepancy —; line —; local —;
localization —; Megiddo parametric —; move in a —;
nonmonotone line —; nonoblivious local —; parallel
aspiration —; Parallel Best-First Tree —; Parallel Depth-First
Tree —; Parallel heuristic —; pattern —; prohibited
neighbor in tabu —; pure adaptive —; pure localization —;
pure random —; random walk —; reactive tabu —;
scatter —; stochastic local —; systematic —; tabu —;
topological —; tree —
search algorithm see: A* —; binary —; distributed game 
tree —; generalized game tree —; lexicographic —; optimal 
state space —; recursive state space —; state space —; 
synchronized distributed state space — 
search algorithms 
[90C30] 
(see: Frank-Wolfe algorithm) 
search algorithms see: local —; random — 
search with backtracking see: depth-first — 
search configuration 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
search device see: local — 
search direction 
[90C05, 90C22, 90C25, 90C30, 90C51] 
(see: Interior point methods for semidefinite programming) 
search direction see: compute the — 
search directions see: orthogonal — 
search engine 
[90C15, 90C30, 90C99] 
(see: SSC minimization algorithms) 
search engine see: BB —; Newton —; quasi-Newton — 
search with enhanced positioning see: Gene clustering: 
A novel decomposition-based clustering approach: global 
optimum — 
search-Hansel chains question-asking strategy see: binary — 
search heuristic 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15] 
(see: Greedy randomized adaptive search procedures) 
search heuristics see: advanced —; local — 
search line see: inexact line — 
search Luus-Jaakola optimization procedure see: Direct —
search method see: adaptive random —; binary —; local —; 
random —; stochastic — 
search methodology see: tabu — 
search methods see: line —; Random —; Variable
neighborhood — 
search optimization see: direct — 
search overhead factor 
[68W10, 90C27] 
(see: Load balancing for parallel optimization techniques) 
search phase in GRASP see: local — 
search problem see: line — 
search problems see: polynomial time local — 
search procedure see: greedy randomized adaptive — 
search procedures see: Greedy randomized adaptive — 
search technique see: inexact line —; line — 
search trajectory 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem) 
searcher cooperation minimization algorithms see: supervisor
and — 

searches see: line —; pattern — 

searching
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching)

searching see: line —; Minimax game tree — 

searching state space graphs
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques)

secant equation
[49M07, 49M10, 65K, 90C06, 90C20]
(see: Spectral projected gradient methods)

secant relation
[90C30]
(see: Unconstrained nonlinear optimization:
Newton-Cauchy framework)

secant updating
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians)

second see: schedule first-cluster — 

second generation modeling languages
[90C10, 90C30]
(see: Modeling languages in optimization: a new paradigm)

second hypodifferential
[65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems)

second level problem
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and
duality)

second order adjoints
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians)

second order approximation
[90C30]
(see: Convex-simplex algorithm)

second order codifferential
[49J52, 65K99, 65Kxx, 70-08, 90C25, 90Cxx]
(see: Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: 
codifferentiable functions) 

second order cone 
[90C05] 
(see: Linear programming: interior point methods) 

second order constraint qualification 
[49K27, 49K40, 90C30, 90C31] 
(see: Second order constraint qualifications) 

Second order constraint qualifications 
(90C30, 49K27, 90C31, 49K40) 
(referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; First 
order constraint qualifications; Generalized semi-infinite 
programming: optimality conditions; Inequality-constrained 
nonlinear optimization; Kuhn-Tucker optimality 
conditions; Lagrangian duality: BASICS; Nondifferentiable 
optimization: parametric programming; Rosen’s method, 
global convergence, and Powell’s conjecture; Saddle point 
theory and optimality conditions; Second order optimality 
conditions for nonlinear optimization) 
(refers to: Equality-constrained nonlinear programming:
KKT necessary optimality conditions; First order
constraint qualifications; Inequality-constrained nonlinear 
optimization; Kuhn-Tucker optimality conditions; 
Lagrangian duality: BASICS; Rosen’s method, global 
convergence, and Powell’s conjecture; Saddle point theory 
and optimality conditions; Second order optimality 
conditions for nonlinear optimization) 

second order CQ
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)

second order decomposition of a function
[90Cxx]
(see: Discontinuous optimization)

second order directional derivative see: generalized — 

second order growth
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and 
stability) 

second order hyperdifferential
[49J52, 65K99, 70-08, 90C25]
(see: Quasidifferentiable optimization: codifferentiable
functions)

second order hypodifferential
[49J52, 65K99, 70-08, 90C25]
(see: Quasidifferentiable optimization: codifferentiable
functions)

second order Lagrangian theory of CNSO problems
[46A20, 52A01, 90C30]
(see: Composite nonsmooth optimization)

second order necessary condition
[49M29, 65K10, 90C06, 90C31]
(see: Dynamic programming and Newton’s method in
unconstrained optimal control; Sensitivity and stability in
NLP: approximation)

second order necessary conditions
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)

second order necessary and sufficient optimality conditions
[90C31, 90C34]
(see: Semi-infinite programming: second order optimality
conditions)

second order optimality condition 

[49M37, 65K05, 90C30]
(see: Inequality-constrained nonlinear optimization) 

second order optimality conditions see: first order and —; 
Semi-infinite programming: — 

Second order optimality conditions for nonlinear 
optimization 
(90C39, 90C26) 
(referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; First 
order constraint qualifications; Inequality-constrained 
nonlinear optimization; Kuhn-Tucker optimality 
conditions; Lagrangian duality: BASICS; Rosen’s method, 
global convergence, and Powell’s conjecture; Saddle point 
theory and optimality conditions; Second order constraint 
qualifications) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; First order 
constraint qualifications; Inequality-constrained nonlinear
optimization; Kuhn-Tucker optimality conditions; 
Lagrangian duality: BASICS; Rosen’s method, global 
convergence, and Powell’s conjecture; Saddle point theory 
and optimality conditions; Second order constraint 
qualifications) 

second order procedures
[68T99, 90C27]
(see: Capacitated minimum spanning trees)


second order regular set
[49K27, 49K40, 90C30, 90C31]
(see: Second order constraint qualifications)


second order sufficiency
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)


second order sufficiency see: strong — 
second order sufficient condition
[90C30, 90C31, 90C33]
(see: Optimization with equilibrium constraints:
A piecewise SQP approach; Sensitivity and stability in NLP:
continuity and differential stability)

second order sufficient condition see: general —; general 
strong —; strong — 

second order sufficient conditions
[90C26, 90C31, 90C39]
(see: Second order optimality conditions for nonlinear
optimization; Sensitivity and stability in NLP)
second order tangent set
[49K27, 49K40, 90C30, 90C31]
(see: Second order constraint qualifications)
second partial derivatives see: matrix of — 
second principle see: Wardrop — 
second slope lemma
[90C30]
(see: Rosen’s method, global convergence, and Powell’s
conjecture)
second-stage
[90C10, 90C15]
(see: Stochastic vehicle routing problems)
second-stage decision
[90C15]
(see: Two-stage stochastic programs with recourse)
second-stage decisions
[90C10, 90C15]
(see: Stochastic integer programs; Stochastic programming:
parallel factorization of structured matrices)
second-stage feasibility set
[90C15]
(see: Two-stage stochastic programs with recourse)
second strategy see: cluster first-schedule —
secondary cone
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
secondary fan
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
secondary polytope
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)


secondary ray see: termination on a — 


secondary structure 
[92B05] 
(see: Genetic algorithms for protein structure prediction) 
section method see: golden — 
section search see: Fibonacci —; golden —
sectional shapes see: beam cross- — 
sectioning 
[90C30] 
(see: Nonlinear least squares problems) 
sector multi-instrument financial equilibrium model see: 
multi- — 
sector stability criterion 
[93D09]
(see: Robust control) 
sectorization see: cell — 
security market line 
[91B28]
(see: Portfolio selection: markowitz mean-variance model) 
see see: wait-and- — 
seed matrix 
[65D25, 68W30]
(see: Complexity of gradients, Jacobians, and Hessians) 
seed part 
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics) 
seed structure 
[90C26, 90C90]
(see: Global optimization in Lennard-Jones and morse 
clusters) 
seek algorithm see: hide-and- — 
segment of a polyhedron 
[90C60] 
(see: Complexity of degeneracy) 
segmentation 
[90C90] 
(see: Optimization in medical imaging) 
segmentation see: feature —; spatial — 
segmentation problem see: beam — 
segments of polyhedra 
[90C60] 
(see: Complexity of degeneracy) 
Seidel see: Gauss- —
Seidel algorithm see: Gauss- —
Seidel iteration see: Gauss- —
Seidel method see: Gauss- —
Seidel value iteration see: Gauss- —
seismic Vessel Problem (SVP) 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59,
90C60, 90C90] 
(see: Traveling salesman problem) 
selected residues 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
selection 
[41A30, 47A99, 65K10, 92B05]
(see: Genetic algorithms; Lipschitzian operators in best 
approximation by bounded or continuous functions) 
selection
[92B05] 
(see: Genetic algorithms) 

selection see: beam angle —; continuous —; controlled —; 
feature —; fundamental theorem of natural —; 
immediate —; lexicographic pivot —; portfolio —; 
priorities —; subinterval —; subset — 

selection of architecture 
[90C39]
(see: Neuro-dynamic programming) 

selection equations 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 

selection of functions see: continuous — 

selection: markowitz mean-variance model see: Portfolio — 

Selection of maximally informative genes 

selection and multicriteria analysis see: Portfolio — 

selection operator 
[41A30, 47A99, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 

selection operator see: continuous —; Lipschitzian —; optimal
Lipschitzian —

selection problem 
[90C29] 
(see: Multiple objective programming support) 

selection problem see: portfolio — 

selection in radiotherapy treatment design see: Beam — 

selection step 
[90C59] 
(see: Heuristic and metaheuristic algorithms for the 
traveling salesman problem) 

selection and wedge orientation optimization see: beam 
angle — 

selective zeolite separation and catalysis: optimization 
methods see: Shape — 

self-inverse product of relations
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

self-organization
[68M20, 90B35]
(see: Flow shop scheduling problem)

selfadjoint operator see: closed — 

selfdual
[90C05]
(see: Homogeneous selfdual methods for linear
programming)

selfdual methods for linear programming see: 
Homogeneous — 

selfdual model see: homogeneous and — 

Selfdual parametric method for linear programs 
(90C05, 90C06) 
(referred to in: Bounds and solution vector estimates for 
parametric NLPs; Multiparametric linear programming;
Multiparametric mixed integer linear programming; 
Nondifferentiable optimization: parametric programming; 
Parametric global optimization: sensitivity; Parametric 
linear programming: cost simplex algorithm; Parametric 
mixed integer nonlinear optimization; Parametric 
optimization: embeddings, path following and
singularities)
(refers to: Bounds and solution vector estimates for 
parametric NLPs; Multiparametric linear programming;
Multiparametric mixed integer linear programming; 
Nondifferentiable optimization: parametric programming; 
Parametric global optimization: sensitivity; Parametric 
linear programming: cost simplex algorithm; Parametric 
mixed integer nonlinear optimization; Parametric 
optimization: embeddings, path following and 
singularities) 
selfdual problem 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules) 
selfdual rank one formula 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
selfdual system 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
selling see: short — 
selling problem see: asset — 
semantic analysis methodologies 
[90C09, 90C10]
(see: Optimization in classifying text documents) 
semantics 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
semantics for fuzzy logics see: Checklist paradigm — 
semantics of MVL connectives 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
semi-assignment constraints 
[90-00]
(see: Generalized assignment problem)
semi-assignment problem see: Quadratic — 
(semi) definite completion problem see: positive — 
semi-definite quadratic binary programming see: positive — 
semi-infinite 
[03H10, 49J27, 57R12, 90C31, 90C34] 
(see: Semi-infinite programming and control problems;
Smoothing methods for semi-infinite optimization) 
semi-infinite linear programming 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
semi-infinite optimization 
[65K05, 65K10, 90C06, 90C30, 90C34] 
(see: Feasible sequential quadratic programming)
semi-infinite optimization 
[90C25, 90C26, 90C31, 90C34] 
(see: Semi-infinite programming: numerical methods;
Semi-infinite programming: second order optimality 
conditions) 
semi-infinite optimization see: Adaptive convexification in —; 
one-parametric —; Smoothing methods for — 
semi-infinite optimization problem 
[90C26] 
(see: Smooth nonlinear nonconvex optimization) 
semi-infinite optimization problems 
[90C31, 90C34] 
(see: Semi-infinite programming: second order optimality 
conditions) 
semi-infinite problem see: generalized —
semi-infinite program
[90C34]
(see: Semi-infinite programming: approximation methods)
semi-infinite program see: dual —; primal (linear) —


semi-infinite programming
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)


semi-infinite programming
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods;
Semi-infinite programming, semidefinite programming
and perfect duality)


semi-infinite programming see: linear —; perfect duality from
the view of linear —; reduced problem in —


Semi-infinite programming and applications in finance 


(90C34, 91B28) 

(referred to in: Competitive ratio for portfolio management; 
Financial applications of multicriteria analysis; Financial 
optimization; Portfolio selection and multicriteria analysis; 
Robust optimization) 

(refers to: Competitive ratio for portfolio management; 
Financial applications of multicriteria analysis; Financial 
optimization; Portfolio selection and multicriteria analysis; 
Robust optimization; Semi-infinite programming: 
approximation methods; Semi-infinite programming and 
control problems; Semi-infinite programming: 
discretization methods; Semi-infinite programming: 
methods for linear problems; Semi-infinite programming: 
numerical methods; Semi-infinite programming: second 
order optimality conditions; Semi-infinite programming, 
semidefinite programming and perfect duality) 


Semi-infinite programming: approximation methods 


(90C34) 

(referred to in: Semi-infinite programming and applications 
in finance; Semi-infinite programming and control 
problems; Semi-infinite programming: discretization 
methods; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Semi-infinite programming, semidefinite 
programming and perfect duality) 

(refers to: Semi-infinite programming and control 
problems; Semi-infinite programming: discretization 
methods; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Semi-infinite programming, semidefinite 
programming and perfect duality) 


Semi-infinite programming and control problems 


(49J27, 90C34, 03H10)

(referred to in: Control vector iteration CVI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Hamilton-Jacobi-Bellman 
equation; Infinite horizon control and dynamic games; 
MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control; Optimal control of a flexible arm; Robust
control; Robust control: Schur stability of polytopes of
polynomials; Semi-infinite programming and applications 
in finance; Semi-infinite programming: approximation 
methods; Semi-infinite programming: discretization 
methods; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Semi-infinite programming, semidefinite 
programming and perfect duality; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems; Suboptimal control) 

(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Hamilton-Jacobi-Bellman equation; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Optimal 
control of a flexible arm; Robust control; Robust control: 
Schur stability of polytopes of polynomials; Semi-infinite
programming: approximation methods; Semi-infinite 
programming: discretization methods; Semi-infinite 
programming: methods for linear problems; Semi-infinite 
programming: numerical methods; Semi-infinite 
programming: second order optimality conditions; 
Semi-infinite programming, semidefinite programming 
and perfect duality; Sequential quadratic programming: 
interior point methods for distributed optimal control 
problems; Suboptimal control) 


Semi-infinite programming: discretization methods 


(90C34, 90C05, 90C25, 90C30) 

(referred to in: Semi-infinite programming and applications 
in finance; Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Semi-infinite programming, semidefinite 
programming and perfect duality; Two-stage stochastic 
programs with recourse) 

(refers to: Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Semi-infinite programming, semidefinite 
programming and perfect duality) 


Semi-infinite programming: methods for linear problems 


(90C34, 90C05) 

(referred to in: Semi-infinite programming and applications 
in finance; Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: discretization 
methods; Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Semi-infinite programming, semidefinite 
programming and perfect duality) 

(refers to: Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: discretization
methods; Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Semi-infinite programming, semidefinite 
programming and perfect duality) 

Semi-infinite programming: numerical methods 

(90C34, 90C26, 90C25, 90C34) 

(referred to in: Semi-infinite programming and applications 
in finance; Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: discretization 
methods; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: second order 
optimality conditions; Semi-infinite programming, 
semidefinite programming and perfect duality) 

(refers to: Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: discretization 
methods; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: second order 
optimality conditions; Semi-infinite programming, 
semidefinite programming and perfect duality) 
semi-infinite programming: optimality conditions see: 
Generalized — 

Semi-infinite programming: second order optimality 
conditions 

(90C31, 90C34) 

(referred to in: Semi-infinite programming and applications 
in finance; Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: discretization 
methods; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: numerical methods; 
Semi-infinite programming, semidefinite programming 
and perfect duality) 

(refers to: Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: discretization 
methods; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: numerical methods; 
Semi-infinite programming, semidefinite programming 
and perfect duality) 

Semi-infinite programming, semidefinite programming and 
perfect duality 

(90C05, 90C25, 90C30, 90C34) 

(referred to in: Duality for semidefinite programming; 
Semidefinite programming and determinant maximization; 
Semidefinite programming: optimality conditions and 
stability; Semidefinite programming and structural 
optimization; Semi-infinite programming and applications 
in finance; Semi-infinite programming: approximation 
methods; Semi-infinite programming and control 
problems; Semi-infinite programming: discretization 
methods; Semi-infinite programming: methods for linear 
problems; Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Smoothing methods for semi-infinite 
optimization; Solving large scale and sparse semidefinite 
programs) 

(refers to: Duality for semidefinite programming; Interior 
point methods for semidefinite programming; Semidefinite
programming and determinant maximization; Semidefinite
programming: optimality conditions and stability; 
Semidefinite programming and structural optimization; 
Semi-infinite programming: approximation methods; 
Semi-infinite programming and control problems; 
Semi-infinite programming: discretization methods; 
Semi-infinite programming: methods for linear problems; 
Semi-infinite programming: numerical methods; 
Semi-infinite programming: second order optimality 
conditions; Solving large scale and sparse semidefinite 
programs) 

semi-infinite programs 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 

semi-infinite programs see: computationally equivalent —; 
nonlinear — 

semi-order 
[90C29] 
(see: Preference modeling) 

semi-ordered spaces 
[01A99] 
(see: Kantorovich, Leonid Vitalyevich) 

semicoercive function 
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational 
inequalities) 

semicoercive hemivariational inequality
[35A15, 47J20, 49J40]
(see: Hemivariational inequalities: static problems)

semicontinuity
[90C11, 90C15, 90C31]
(see: Stochastic integer programming: continuity, stability,
rates of convergence)

semicontinuous see: lower — 

semicontinuous function see: lower —; R!'-upper —; upper — 

semidefinite 
[05C50, 15A48, 15A57, 90C25] 
(see: Matrix completion problems) 

semidefinite see: positive — 

semidefinite matrices see: positive — 

semidefinite matrix see: bisymmetric positive —; partial —; 
positive — 

semidefinite matrix completion problem see: positive — 

semidefinite program 
[90C25, 90C27, 90C30, 90C90] 
(see: Duality for semidefinite programming; Semidefinite 
programming and structural optimization; Solving large 
scale and sparse semidefinite programs) 

semidefinite program see: dual —; linear — 

semidefinite program as conic convex program 
[90C05, 90C25, 90C30, 90C34] 
(see: Semi-infinite programming, semidefinite 
programming and perfect duality) 

semidefinite programming 
[46A20, 52A01, 90C05, 90C22, 90C25, 90C30, 90C51, 93D09] 
(see: Copositive programming; Farkas lemma: 
generalizations; Interior point methods for semidefinite
programming; Robust control) 


semidefinite programming 


[15A15, 46A20, 52A01, 90C22, 90C25, 90C27, 90C30, 90C31, 
90C55, 90C90, 93D09] 

(see: Duality for semidefinite programming; Farkas lemma: 
generalizations; Large scale trust region problems; Robust 
control; Semidefinite programming and determinant 
maximization; Semidefinite programming: optimality 
conditions and stability; Semidefinite programming and 
structural optimization; Solving large scale and sparse 
semidefinite programs) 


semidefinite programming see: Duality for —; Graph
realization via —; handbook on —; Interior point methods
for —; Maximum likelihood detection via —


semidefinite programming approach 


[90C30] 
(see: Large scale trust region problems) 


Semidefinite programming and determinant maximization 


(90C25, 90C55, 90C25, 90C90, 15A15) 

(referred to in: αBB algorithm; Duality for semidefinite
programming; Eigenvalue enclosures for ordinary 
differential equations; Hemivariational inequalities: 
eigenvalue problems; Interval analysis: eigenvalue bounds 
of interval matrices; Matrix completion problems; 
Semidefinite programming: optimality conditions and 
stability; Semidefinite programming and the sensor 
network localization problem, SNLP; Semidefinite 
programming and structural optimization; Semi-infinite 
programming, semidefinite programming and perfect 
duality; Solving large scale and sparse semidefinite 
programs) 

(refers to: αBB algorithm; Duality for semidefinite
programming; Eigenvalue enclosures for ordinary 
differential equations; Hemivariational inequalities: 
eigenvalue problems; Interior point methods for 
semidefinite programming; Interval analysis: eigenvalue 
bounds of interval matrices; Matrix completion problems; 
Semidefinite programming: optimality conditions and 
stability; Semidefinite programming and structural 
optimization; Semi-infinite programming, semidefinite 
programming and perfect duality; Solving large scale and 
sparse semidefinite programs) 


Semidefinite programming: optimality conditions and
stability

(90C22, 90C25, 90C31) 

(referred to in: Duality for semidefinite programming; 
Maximum cut problem, MAX-CUT; Semidefinite
programming and determinant maximization; Semidefinite
programming and the sensor network localization problem,
SNLP; Semidefinite programming and structural
optimization; Semi-infinite programming, semidefinite 
programming and perfect duality; Solving large scale and 
sparse semidefinite programs; Standard quadratic 
optimization problems: algorithms) 

(refers to: Duality for semidefinite programming; Interior 
point methods for semidefinite programming; Semidefinite
programming and determinant maximization; Semidefinite
programming and structural optimization; Semi-infinite
programming, semidefinite programming and perfect
duality; Solving large scale and sparse semidefinite
programs) 

semidefinite programming and perfect duality see: 
Semi-infinite programming — 

semidefinite programming problem 
[15A15, 90C22, 90C25, 90C31, 90C55, 90C90] 
(see: Semidefinite programming and determinant 
maximization; Semidefinite programming: optimality 
conditions and stability) 

semidefinite programming problem see: convex —; linear — 

Semidefinite programming and the sensor network 
localization problem, SNLP 
(referred to in: Maximum cut problem, MAX-CUT) 
(refers to: Graph realization via semidefinite programming; 
Semidefinite programming and determinant maximization; 
Semidefinite programming: optimality conditions and 
stability; Semidefinite programming and structural 
optimization; Solving large scale and sparse semidefinite 
programs) 

Semidefinite programming and structural optimization 
(90C25, 90C27, 90C90) 
(referred to in: Duality for semidefinite programming; 
Semidefinite programming and determinant maximization; 
Semidefinite programming: optimality conditions and 
stability; Semidefinite programming and the sensor 
network localization problem, SNLP; Semi-infinite 
programming, semidefinite programming and perfect 
duality; Solving large scale and sparse semidefinite 
programs; Topology of global optimization; Topology 
optimization) 
(refers to: Duality for semidefinite programming; Interior 
point methods for semidefinite programming; Semidefinite 
programming and determinant maximization; Semidefinite 
programming: optimality conditions and stability; 
Semi-infinite programming, semidefinite programming 
and perfect duality; Solving large scale and sparse 
semidefinite programs; Structural optimization; Structural 
optimization: history; Topology of global optimization; 
Topology optimization) 

semidefinite programs see: Solving large scale and sparse — 

semidefinite relaxations see: bounds based on — 

semidefinite symmetric matrix see: positive — 

semidefiniteness constraints see: positive — 

semigreedy heuristic
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)

semilattice see: geometric — 

semilinear set
[52B12, 68Q25]
(see: Fourier-Motzkin elimination method)

seminormal equation see: corrected — 

semipermeability
[35R70, 47S40, 74B99, 74D99, 74G99, 74H99]
(see: Quasidifferentiable optimization: applications to
thermoelasticity)

semisets
[03E70, 03H05, 91B16]
(see: Alternative set theory)


semismooth function
[90C30, 90C33]
(see: Nonsmooth and smoothing methods for nonlinear
complementarity problems and variational inequalities)
semismooth function see: strongly —; upper — 
semismooth mapping
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method)
semismooth mapping see: strongly — 
semismoothness
[90C30, 90C33]
(see: Nonsmooth and smoothing methods for nonlinear
complementarity problems and variational inequalities)
semismoothness see: strong — 
semistrictly quasiconvex function
[90C26]
(see: Generalized monotone single valued maps)
semistrictly quasimonotone map
[90C26]
(see: Generalized monotone single valued maps)
semistrictly quasimonotone operator
[90C26]
(see: Generalized monotone multivalued maps)
semisublattice see: meet — 
sender-initiated
[65K05, 65Y05, 65Y10, 65Y20, 68W10]
(see: Interval analysis: parallel methods for global
optimization)
sender initiated mapping technique
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques)
sense of Jongen-Jonker-Twilt see: problem regular in the —
sense of Kojima-Hirabayashi see: problem regular in the —
sensitive heuristics see: history- — 
sensitivity
[90B85, 90C30, 93-XX]
(see: Boundary condition iteration BCI; Dynamic
programming: optimal control applications; Single facility
location: multi-objective Euclidean distance location;
Suboptimal control)
sensitivity
[65L99, 90C05, 90C25, 90C29, 90C30, 90C31, 93-XX]
(see: Nondifferentiable optimization: parametric
programming; Optimization strategies for dynamic
systems)

sensitivity see: Parametric global optimization: —; 
variational — 

sensitivity analysis
[13Cxx, 13Pxx, 14Qxx, 90C05, 90C10, 90C25, 90C29, 90C30,
90C31, 90C46, 90Cxx]
(see: Integer programming: algebraic methods; Integer
programming duality; Nondifferentiable optimization:
parametric programming; Sensitivity and stability in
NLP)


sensitivity analysis
[13Cxx, 13Pxx, 14Qxx, 65K10, 90C22, 90C25, 90C31, 90C33,
90Cxx]
(see: Bounds and solution vector estimates for parametric
NLPs; Integer programming: algebraic methods;
Semidefinite programming: optimality conditions and 
stability; Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: approximation; Sensitivity and stability in NLP: 
continuity and differential stability) 


sensitivity analysis see: applications of —; automated Fortran
program for nonlocal —; nonlocal —; post-optimality —;
shape —; Short-term scheduling under uncertainty: —

sensitivity analysis with automatic differentiation see:
Nonlocal —


Sensitivity analysis of complementarity problems 


(90C31, 90C33) 

(referred to in: Nonlocal sensitivity analysis with automatic 
differentiation; Parametric global optimization: sensitivity; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: approximation; Sensitivity and stability in NLP: 
continuity and differential stability) 

(refers to: Nonlocal sensitivity analysis with automatic 
differentiation; Parametric global optimization: sensitivity; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: approximation; Sensitivity and stability in NLP: 
continuity and differential stability) 


sensitivity analysis with respect to changes in cost coefficients 


[90C05, 90C31]
(see: Parametric linear programming: cost simplex 
algorithm) 


sensitivity analysis with respect to right-hand side changes 


[90C05, 90C31]
(see: Multiparametric linear programming) 


Sensitivity analysis of variational inequality problems 


(90C31, 65K10)
(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; Nonlocal 
sensitivity analysis with automatic differentiation; 
Parametric global optimization: sensitivity; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of complementarity problems; Sensitivity and
stability in NLP; Sensitivity and stability in NLP: 
approximation; Sensitivity and stability in NLP: continuity 
and differential stability; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 
(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; Nonlocal 
sensitivity analysis with automatic differentiation; 
Parametric global optimization: sensitivity; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of complementarity problems; Sensitivity and 
stability in NLP; Sensitivity and stability in NLP: 
approximation; Sensitivity and stability in NLP: continuity 
and differential stability; Solving hemivariational 
inequalities by nonsmooth optimization methods; 
Variational inequalities; Variational inequalities: F. E. 
approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system; Variational 
principles) 

sensitivity-based gradient
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)

sensitivity derivatives
[90C26, 90C90]
(see: Structural optimization: history)

sensitivity equations
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems)

sensitivity in nonlinear programming
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)

sensitivity of optimal flowsheets
[90C30, 90C90]
(see: Successive quadratic programming: applications in the
process industry)

sensitivity parameters
[49K99, 65K05, 80A10]
(see: Optimality criteria for multiphase chemical
equilibrium)

Sensitivity and stability in NLP 
(90C31) 
(referred to in: Ill-posed variational problems; Nonlocal 
sensitivity analysis with automatic differentiation; 
Parametric global optimization: sensitivity; Sensitivity 
analysis of complementarity problems; Sensitivity analysis 
of variational inequality problems; Sensitivity and stability 
in NLP: approximation; Sensitivity and stability in NLP: 
continuity and differential stability) 
(refers to: Ill-posed variational problems; Nonlocal 
sensitivity analysis with automatic differentiation; 
Parametric global optimization: sensitivity; Sensitivity 
analysis of complementarity problems; Sensitivity analysis 
of variational inequality problems; Sensitivity and stability 
in NLP: approximation; Sensitivity and stability in NLP: 
continuity and differential stability) 

Sensitivity and stability in NLP: approximation 
(90C31) 
(referred to in: Nonlocal sensitivity analysis with automatic 
differentiation; Parametric global optimization: sensitivity; 
Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: continuity and differential stability) 
(refers to: Nonlocal sensitivity analysis with automatic 
differentiation; Parametric global optimization: sensitivity; 
Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: continuity and differential stability) 

Sensitivity and stability in NLP: continuity and differential 
stability 
(90C31) 
(referred to in: Nonlocal sensitivity analysis with automatic 
differentiation; Parametric global optimization: sensitivity; 
Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: approximation) 
(refers to: Nonlocal sensitivity analysis with automatic 
differentiation; Parametric global optimization: sensitivity; 
Sensitivity analysis of complementarity problems; 
Sensitivity analysis of variational inequality problems; 
Sensitivity and stability in NLP; Sensitivity and stability in 
NLP: approximation) 

sensitivity theorem see: basic — 

Sensor network localization problem 
(see: Semidefinite programming and the sensor network 
localization problem, SNLP) 

sensor network localization problem, SNLP see: Semidefinite 
programming and the — 

sensor scheduling see: Optimal — 

separability assumption 
[49M29, 90C11] 
(see: Generalized Benders decomposition)

separable classes 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
separable convex objective function
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 
separable formulation 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 
separable function see: partially — 
separable objective function 
[90C25] 
(see: Concave programming) 
separable optimization 
[90C30]
(see: Generalized total least squares) 
separable optimization problem 
[90C30]
(see: Generalized total least squares) 
separable problem 
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design) 
separable problem variables 
[90C30]
(see: Nonlinear least squares problems) 
separated see: 2- — 
separated Newton method 
[90C30]
(see: Generalized total least squares) 
separated pair decomposition see: well- — 
separating agents see: mass — 
separating hyperplane 
[62H30, 68T10, 90C05]
(see: Linear programming models for classification) 
separating relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations) 
separation 
[49K27, 58C20, 58E30, 62H30, 68R10, 90B35, 90C11, 90C27,
90C30, 90C48] 
(see: Assignment methods in clustering; Branchwidth and 
branch decompositions; Nonsmooth analysis: Fréchet 
subdifferentials; Robust optimization: mixed-integer linear 
programs) 
separation 
[90C30, 93A30, 93B50] 
(see: Image space approach to optimization; MINLP: mass 
and heat exchanger networks; Mixed integer linear 
programming: mass and heat exchanger networks) 
separation see: k- —; topological — 
separation and catalysis: optimization methods see: Shape 
selective zeolite — 
separation condition 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
separation functions 
[90C30] 
(see: Image space approach to optimization) 


separation number 

[52B11, 52B45, 52B55] 

(see: Volume computation for polytopes: strategies and
performances)
separation problem 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: cutting plane algorithms) 
separation procedure see: arc — 
separation processes see: synthesis of — 
separation routine 

[90C05, 90C06, 90C08, 90C10, 90C11] 

(see: Integer programming: cutting plane algorithms) 
separation theorem see: Frank discrete —; L- —; M- — 
separation theorems 

[90C05, 90C30] 

(see: Theorems of the alternative and optimization) 
sequence 

[62H30, 90C39] 

(see: Dynamic programming in clustering) 
sequence see: epiconvergent —; even —; Fejér monotone —;
generalized finite —; generalized minimizing —;
Levitin-Polyak minimizing —; minimizing —; minimum
weight common mutated —; odd —
sequence alignment see: multiple — 
sequence alignment via mixed-integer linear optimization see:
Global pairwise protein —
sequence of arcs 

[90C35]

(see: Minimum cost flow problem) 
sequence comparison 

[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 

(see: Algorithms for genomic analysis) 
sequence of greedy swaps see: monotone — 
sequencing 

[62H30, 90C39] 

(see: Dynamic programming in clustering; Mixed integer
programming/constraint programming hybrid methods)
sequencing see: DNA —
sequencing problem 

[05-04, 90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57,
90C59, 90C60, 90C90]

(see: Evolutionary algorithms in combinatorial
optimization; Traveling salesman problem)
sequential 

[68T20, 68T99, 90-08, 90C26, 90C27, 90C59]

(see: Metaheuristics; Variable neighborhood search
methods)
sequential see: direct- — 
sequential approximate optimization 
[90C26, 90C90] 

(see: Structural optimization: history) 

sequential CA algorithm 

[90C30] 

(see: Cost approximation algorithms)

sequential coloring see: frequency exhaustive —; requirement 
exhaustive —; uniform — 

Sequential cutting plane algorithm 

(90C11, 90C26) 
sequential deterministic algorithm

[65G20, 65G30, 65G40, 65K05, 90C30]

(see: Interval global optimization) 

sequential estimation procedure 

[90C15]

(see: Stochastic quasigradient methods in minimax
problems)

sequential experimental design 

[90C15, 90C27]

(see: Discrete stochastic optimization) 

sequential greedy coloring heuristic 

[90C35]

(see: Graph coloring) 

sequential greedy heuristics 

[05C69, 05C85, 68W01, 90C59] 

(see: Heuristics for maximum clique and independent set)

sequential Hansel chains question-asking strategy 

[90C09] 

(see: Inference of monotone boolean functions)

sequential heuristics 

[05-XX] 

(see: Frequency assignment problem)

sequential MEN synthesis method 

[93A30, 93B50] 

(see: Mixed integer linear programming: mass and heat
exchanger networks)

sequential method 

[65L99, 93-XX] 

(see: Optimization strategies for dynamic systems)

sequential minimax game tree algorithm 

[49J35, 49K35, 62C20, 91A05, 91A40] 

(see: Minimax game tree searching)

sequential normal compactness 

[49K27, 58C20, 58E30, 90C48] 

(see: Nonsmooth analysis: Fréchet subdifferentials)

sequential normal compactness see: partial — 

sequential quadratic programming 

[65K05, 65K10, 90C05, 90C06, 90C25, 90C30, 90C34] 

(see: Cost approximation algorithms; Feasible sequential
quadratic programming; Semi-infinite programming: 
discretization methods) 

sequential quadratic programming 


[49K20, 49M99, 65K05, 65K10, 90C06, 90C30, 90C34, 90C55] 


(see: Cost approximation algorithms; Feasible sequential 
quadratic programming; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems) 
sequential quadratic programming see: Feasible — 
Sequential quadratic programming: interior point methods 
for distributed optimal control problems 
(49M99, 49K20, 90C55) 
(referred to in: Control vector iteration CVI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: continuous-time optimal control;
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Entropy optimization: 
interior point methods; Feasible sequential quadratic 
programming; Hamilton-Jacobi-Bellman equation; 
Homogeneous selfdual methods for linear programming; 
Infinite horizon control and dynamic games; Linear 
programming: interior point methods; Linear 
programming: karmarkar projective algorithm; MINLP: 
applications in the interaction of design and control; 
Multi-objective optimization: interaction of design and 
control; Optimal control of a flexible arm; Optimization 
with equilibrium constraints: A piecewise SQP approach; 
Potential reduction methods for linear programming; 
Probabilistic analysis of simplex algorithms; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Standard quadratic optimization problems: 
theory; Suboptimal control; Successive quadratic 
programming; Successive quadratic programming: 
applications in distillation systems; Successive quadratic 
programming: applications in the process industry; 
Successive quadratic programming: decomposition 
methods; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 

(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Entropy optimization: interior point 
methods; Feasible sequential quadratic programming; 
Hamilton-Jacobi-Bellman equation; Homogeneous 
selfdual methods for linear programming; Infinite horizon 
control and dynamic games; Interior point methods for 
semidefinite programming; Linear programming: interior 
point methods; Linear programming: karmarkar projective 
algorithm; MINLP: applications in the interaction of design 
and control; Multi-objective optimization: interaction of 
design and control; Optimal control of a flexible arm; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Potential reduction methods for linear 
programming; Robust control; Robust control: schur 
stability of polytopes of polynomials; Semi-infinite 
programming and control problems; Suboptimal control; 
Successive quadratic programming; Successive quadratic 
programming: applications in distillation systems; 
Successive quadratic programming: applications in the 
process industry; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: full space methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 


sequential quadratic programming method see: piecewise — 
sequential quadratic programming methods 


[90C90] 
(see: Design optimization in computational fluid dynamics) 


sequential row orthogonalization scheme 


[65Fxx] 
(see: Least squares problems) 
Sequential simplex method
(90C30) 
(referred to in: Convex-simplex algorithm; Cyclic coordinate 
method; Equivalence between nonlinear complementarity 
problem and fixed point problem; Generalized nonlinear 
complementarity problem; Integer linear complementary 
problem; LCP: Pardalos-Rosen mixed integer formulation; 
Lemke method; Linear complementarity problem; Linear 
programming; Order complementarity; Parametric linear 
programming: cost simplex algorithm; Powell method; 
Principal pivoting methods for linear complementarity 
problems; Rosenbrock method; Topological methods in 
complementarity theory) 
(refers to: Convex-simplex algorithm; Cyclic coordinate 
method; Lemke method; Linear complementarity problem; 
Linear programming; Parametric linear programming: cost 
simplex algorithm; Powell method; Rosenbrock method) 

sequential simplex method 
[90C30] 
(see: Sequential simplex method) 

sequential synthesis 
[93A30, 93B50] 
(see: Mixed integer linear programming: mass and heat 
exchanger networks) 

sequentially convexifiable program 
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming) 

seriation 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 

series see: Taylor —; time — 

series analysis see: time — 

series expansion see: first order Taylor — 

series-parallel graph 
[05C50, 15A48, 15A57, 90C25]
(see: Matrix completion problems) 

series rule see: divergent —; geometric — 

series step-size rule see: divergent — 

series steplength rule see: divergent — 

serious step see: long —; short — 

service needs see: static/dynamic — 

set see: active —; active index —; affine —; border nodes —; 
Borel —; bounded level —; branch of a feasible —; 
Cartesian product —; Chebyshev —; common 
dependency —; completely regular —; connected —; 
connected dominating —; constraint —; convex —; convex 
polyhedral —; cover the extremal —; d.c. —; decision —; 
dependent —; dominating —; dual feasible —; edge —; 
efficient —; efficient point —; ε-subdifferential —;
essentially active index —; extension —; feasible —; 
feedback vertex —; finite dominating —; first order 
tangent —; fractal —; fuzzy —; fuzzy power —; generalized 
critical point —; geodesic convex —; ground —; Heuristics 
for maximum clique and independent —; hierarchy in 
a finite —; high-order feasible —; independent —; 
independent dominating —; index —; interior of a —; 
invariant —; invex —; invexity with respect to a —; 
L-convex —; left-paired —; level —; localization —;
lower —; lower bound for a —; M-convex —; 
max-closed —; maximal independent —; maximum 
independent —; maximum weighted independent —;
middle —; minimal dependent —; minimum —; minimum 
feedback vertex —; motion of a —; nonconvex —; 
nondominated solution —; noninferior solution —; 
obstruction —; opposite of a signed —; ordered —; 
outcome —; p-order feasible —; Pareto optimal solution —;
partition on a —; polyhedral —; pre-invex —; pre-invexity 
with respect to a —; primal feasible —; production —; 
proximal —; quasidifferentiable —; rational reaction —; 
reduction of a constraint —; regenerative —; regular —; 
representative —; reverse convex —; reverse normal —; 
right-paired —; satisfaction —; scenario —; second order 
regular —; second order tangent —; second-stage 
feasibility —; semilinear —; signed —; singular —; SIP 
index —; slope —; solution —; species index —; stability of 
a solution —; stable —; star-shaped —; strictly feasible —;
subdifferential —; substationarity point with respect to a —; 
support —; test —; training —; tree nodes —; 
uncertainty —; upper —; upper bound for a —; vertex — 

set algorithm see: active —; minimum lower — 

set of alternatives 
[90C29] 
(see: Multi-objective optimization: pareto optimal 
solutions, properties) 

set of the alternatives see: finite — 

set of bases of a matroid 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 

set of connectives see: complete — 

set constraints see: odd- — 

set-contraction see: k- —; strict- — 

Set covering, packing and partitioning problems 
(90C10, 90C11, 90C27, 90C57) 
(referred to in: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming: lagrangian relaxation; 
LCP: Pardalos-Rosen mixed integer formulation; MINLP:
trim-loss problem; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Parametric mixed integer nonlinear 
optimization; Simplicial pivoting algorithms for integer 
programming; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem) 
(refers to: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Genetic algorithms; Graph coloring; 
Integer linear complementary problem; Integer 
programming; Integer programming: algebraic methods; 
Integer programming: branch and bound methods; Integer 
programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Integer
programming duality; Integer programming: lagrangian 
relaxation; LCP: Pardalos-Rosen mixed integer 
formulation; Mixed integer classification problems; 
Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Neural networks for combinatorial optimization; 
Parametric mixed integer nonlinear optimization; 
Simplicial pivoting algorithms for integer programming; 
Simulated annealing methods in protein folding; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Time-dependent 
traveling salesman problem) 

set covering problem 
[90C05, 90C10, 90C11, 90C20, 90C27, 90C57] 
(see: Integer programming; Redundancy in nonlinear 
programs; Set covering, packing and partitioning problems) 

set D see: countable — 

set of decision alternative 
[90C29] 
(see: Multi-objective optimization; Interactive methods for 
preference value functions) 

set of decrease see: high-order — 

set-definable classes 

[03E70, 03H05, 91B16]

(see: Alternative set theory)

set of discrete ε-global local maximizers

[90C05, 90C25, 90C30, 90C34] 

(see: Semi-infinite programming: discretization methods)

set of edges of a digraph

[05C05, 05C40, 68R10, 90C35] 

(see: Network design problems)

set of elementary functions

[90C26] 

(see: Global optimization: envelope representation)

set of ε-global points

[90C05, 90C25, 90C30, 90C34] 

(see: Semi-infinite programming: discretization methods)

set of ε-most active points

[90C05, 90C25, 90C30, 90C34] 

(see: Semi-infinite programming: discretization methods)

set of faults 

[90Cxx] 

(see: Discontinuous optimization)

set of feasible points 

[90C05, 90C25, 90C30, 90C33, 90C34] 
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach; Semi-infinite programming: 
discretization methods) 

set of feasible solutions 

[90C05, 90C34] 

(see: Semi-infinite programming: methods for linear
problems)

set of flowlines 

[76T30, 90C11, 90C90] 

(see: Mixed integer optimization in well scheduling) 

set of formation values 

[49K99, 65K05, 80A10] 

(see: Optimality criteria for multiphase chemical
equilibrium)


set-formula 
[03E70, 03H05, 91B16] 

(see: Alternative set theory) 

set of a function see: effective —; gradient-related —; 
support — 

set-inclusion operator see: fuzzy — 

set Lfree of unused partitions 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 

(see: Maximum partition matching) 

set Lreac of used partitions 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 

(see: Maximum partition matching) 

set of Lagrange multipliers see: extended — 

set of loads 
[90C25, 90C27, 90C90] 

(see: Semidefinite programming and structural 
optimization) 

set mapping see: closed point-to- —; point-to- — 

set mappings see: point-to- — 

set method see: active — 

set methods see: active — 

set packing problem 
[90C10, 90C11, 90C27, 90C57] 

(see: Set covering, packing and partitioning problems) 

set partitioning 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C06, 90C08, 90C10, 
90C27, 90C35, 90C90] 

(see: Airline optimization; Vehicle scheduling) 

set partitioning problem 
[05-04, 90C10, 90C11, 90C27, 90C57] 

(see: Evolutionary algorithms in combinatorial 
optimization; Integer programming; Set covering, packing 
and partitioning problems) 

set of potential efficient solutions 
[90C27, 90C29] 

(see: Multi-objective combinatorial optimization) 

set of prices see: almost at equilibrium of an assignment and 
a —; equilibrium of an assignment and a — 

Set Problem see: feedback —; feedback arc —; feedback 
vertex —; maximum Independent —; minimum feedback 
arc —; minimum feedback vertex (arc) —; minimum weight 
feedback arc —; minimum weighted feedback vertex —; 
subset feedback vertex (arc) —; subset minimum feedback 
vertex (arc) —; unweighted feedback vertex — 

set problem: branch & cut algorithms see: Stable — 

set in problem solving see: restriction to the solution — 

set problems see: Feedback — 

set quadratic programming methods see: active — 

set Rfree of unused partitions
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 

(see: Maximum partition matching) 

set Rreac of used partitions
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05] 

(see: Maximum partition matching) 

set strategies see: active — 

set strategy see: active —; Goldfarb-Idnani active — 

set theory see: Alternative —; axioms of alternative —; 
Cantor — 

set unification 
[65K05, 90C26, 90C33, 90C34] 
(see: Adaptive convexification in semi-infinite
optimization) 
set V see: vertex — 
set-valued analysis 
[49K27, 90C29, 90C48]
(see: Set-valued optimization) 
set-valued constraints 
[49K27, 90C29, 90C48]
(see: Set-valued optimization) 
set-valued objective function 
[49K27, 90C29, 90C48]
(see: Set-valued optimization) 
Set-valued optimization 
(90C48, 90C29, 49K27) 
(referred to in: Generalized monotone multivalued maps; 
Generalized monotone single valued maps) 
(refers to: Generalized monotone multivalued maps; 
Generalized monotone single valued maps) 
set-valued optimization 
[49K27, 90C29, 90C48]
(see: Set-valued optimization) 
set-valued optimization problem 
[49K27, 90C29, 90C48]
(see: Set-valued optimization) 
set of wells 
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling) 
sets 


[03E70, 03H05, 91B16]

(see: Alternative set theory) 

sets see: connectedness of the efficient points —; differences 
of convex —; fuzzy —; high-order tangent —; 
independent —; joined —; Lagrange multiplier —; least 
squares problems with massive data —; max-closed —; 
maximum weight independent —; minimum lower —; 
orthogonal signed —; pseudoconnected family of —; 
test — 

sets axioms see: existence of — 

sets conjugation see: level — 

sets and functions see: Affine — 

sets in integer programming see: test — 

sets and interior point methods see: Successive quadratic 
programming: solution by active — 

setting see: average case —; randomized — 

setting methods see: label — 

settle-value 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 

setup cost 
[90C25] 
(see: Concave programming) 

Several Ratios see: Maximization of the Smallest of — 


shadow price 
[90C60] 
(see: Complexity of degeneracy) 

shadow prices 
[90C05, 90C06, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming; Saddle point theory and optimality 
conditions; Sensitivity and stability in NLP: continuity and 
differential stability) 

shadow of shadows 

[01A99] 

(see: Gauss, Carl Friedrich) 

shadow-vertex 

[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms) 

shadow-vertex algorithm 

[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms)

shadow-vertices see: expected number of —; variance of the 
number of — 

shadows see: shadow of — 


Shahshahani metric 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization)
shake and bake algorithm 
[90C26] 
(see: Phase problem in X-ray crystallography: Shake and
bake approach) 
Shake and bake approach see: Phase problem in X-ray 
crystallography: — 
Shanno conjugate gradient method 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
Shanno method see: Broyden-Fletcher-Goldfarb- — 
Shanno quasi-Newton update see: 
Broyden-Fletcher-Goldfarb- —
Shanno update see: Broyden-Fletcher-Goldfarb- — 
Shannon 
[94A17] 
(see: Jaynes’ maximum entropy principle)
Shannon entropy 
[94A08, 94A17] 
(see: Maximum entropy principle: image reconstruction)
Shannon function 
[90C09] 
(see: Inference of monotone boolean functions)
shannon measure of entropy and its properties see: Entropy 
optimization: — 
shannon zero-error capacity 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovasz number) 
shape
[90C26, 90C90] 
(see: Structural optimization: history) 
shape design 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
shape design see: multiload —; optimal —; robust 
obstacle-free — 
Shape optimization 
(49J20, 49J52)
(refers to: Structural optimization; Structural optimization:
history; Topological derivative in shape optimization) 
shape optimization 
[49J20, 49J52]
(see: Shape optimization) 
shape optimization see: structural —; Topological derivative 
in — 
Shape reconstruction methods for nonconvex feasibility 
analysis 
(90-08, 90C26, 90C31)
Shape selective zeolite separation and catalysis: optimization 
methods 
(74A40, 90C26)
shape sensitivity analysis 
[49J20, 49J52]
(see: Shape optimization)
shaped decomposition see: L- — 
shaped method 
[90C06, 90C15] 
(see: Stabilization of cutting plane algorithms for stochastic 
linear programming problems) 
shaped method see: integer L- —; L- —
shaped method for two-stage stochastic programs with 
recourse see: L- — 
shaped set see: star- — 
shapes see: beam cross-sectional —; design of optimal — 
share 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: stopping rules) 
shared-memory model see: queueing — 
shared memory parallel machines 
[65K05, 65Y05] 
(see: Parallel computing: models)
Sharpe ratio 
[91B28] 
(see: Portfolio selection: markowitz mean-variance model) 
Sharpe single index market model 
[91B28] 
(see: Portfolio selection: markowitz mean-variance model)
sheet see: balance —; B- — 
Sheffer function 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 


Sherman-Morrison formula 

[49M37] 

(see: Nonlinear least squares: Newton-type methods) 
Sherman-Morrison rank-one update formula 

[90C30] 

(see: Numerical methods for unary optimization) 
Sherman-Morrison-Woodbury formula 

[90C15] 

(see: Stochastic programming: parallel factorization of
structured matrices)
shift function see: cyclic — 
shift-invariant 

(see: Global optimization: functional forms) 
shift matrix see: diagonal — 
shift move 

[68T99, 90C27] 

(see: Capacitated minimum spanning trees) 
shift terms 

[90C31, 90C34, 90C46] 

(see: Generalized semi-infinite programming: optimality
conditions)
shifting see: Vehicle scheduling with trip — 
Shindo method see: Kojima— — 
ship routing problem see: inventory — 
shipment delivery see: express — 
shock 

[03H10, 49J27, 90C34] 

(see: Semi-infinite programming and control problems) 
shooting see: multiple — 
shop see: flow- —; job- — 
shop problem see: flow- —; job- —; open — 
shop scheduling problem see: Flow —; Job- — 

Shor, Naum Zuselevich 

(01A70, 90-03) 
short selling 

[91B28] 

(see: Portfolio selection: markowitz mean-variance model) 
short serious step 

[49J40, 49J52, 65K05, 90C30]

(see: Solving hemivariational inequalities by nonsmooth
optimization methods)
short-term memory 

[05C69, 05C85, 68W01, 90C59] 

(see: Heuristics for maximum clique and independent set) 
Short-term scheduling of batch processes with resources 
Short-term scheduling of continuous processes 

(90B35, 65K05, 90C90, 90C11) 

Short-term scheduling, resource constrained: unified 
modeling frameworks 

(90B35, 65K05, 90C90, 90C11) 

Short-term scheduling under uncertainty: sensitivity analysis 
shortest edge 

[68Q20] 

(see: Optimal triangulations) 
shortest path
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: cutting plane algorithms) 

shortest path 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 

shortest path see: stochastic — 

shortest path algorithm see: successive — 

shortest path algorithms see: generic — 

shortest path problem 
[49L20, 90C30, 90C40, 90C52, 90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms; 
Dynamic programming: stochastic shortest path problems; 
Simplicial decomposition) 

shortest path problem 
[90B10, 90C27, 90C30] 
(see: Shortest path tree algorithms; Simplicial 
decomposition) 

shortest path problem see: deterministic —; stochastic — 

shortest path problems see: Dynamic programming: 
stochastic —; stochastic — 

shortest path procedure see: next — 

Shortest path tree algorithms 
(90C27, 90B10) 
(referred to in: Auction algorithms; Bottleneck steiner tree 
problems; Capacitated minimum spanning trees; 
Communication network assignment problem; Complexity 
theory; Dynamic traffic networks; Equilibrium networks; 
Generalized networks; Maximum flow problem; Minimax 
game tree searching; Minimum cost flow problem; 
Multicommodity flow problems; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Piecewise linear network flow problems; 
Steiner tree problems; Stochastic network problems: 
massively parallel solution; Survivable networks; Traffic 
network equilibrium) 
(refers to: Auction algorithms; Bottleneck steiner tree 
problems; Capacitated minimum spanning trees; 
Communication network assignment problem; Directed 
tree networks; Dynamic traffic networks; Equilibrium 
networks; Evacuation networks; Generalized networks; 
Maximum flow problem; Minimax game tree searching; 
Minimum cost flow problem; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Piecewise linear network flow problems; 
Steiner tree problems; Stochastic network problems: 
massively parallel solution; Survivable networks; Traffic 
network equilibrium) 

shortest path tree problem see: single source — 

shortest path tree problems 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 

shortest paths see: problem of finding — 

shortest program length 
[90C60] 
(see: Kolmogorov complexity) 


Shubert algorithm see: Piyavskii- — 
shutdown 
(see: Reactive scheduling of batch processes) 
sibles 
[90C30, 90C35] 
(see: Optimization in water resources) 
side see: proof on the dual — 
side changes see: sensitivity analysis with respect to 
right-hand — 
side constraints 
[90B06, 90C06, 90C08, 90C35, 90C90] 
(see: Airline optimization) 
side constraints 
[90C30] 
(see: Simplicial decomposition) 
side payments see: game with — 
side perturbation model see: right-hand — 
side perturbation problem see: right-hand — 
side problem see: right-hand — 
side simplex algorithm see: parametric right-hand — 
side uncertainty, duality and applications see: Robust linear 
programming with right-hand- — 
sided differential see: one- — 
Sierpinski theorem 
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
σ-classes
[03E70, 03H05, 91B16]
(see: Alternative set theory) 
sigma-field 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
lagrange multipliers) 
sign of a circuit 
[90C09, 90C10] 
(see: Combinatorial matrix analysis)
sign function 
[65D10, 65K05] 
(see: Overdetermined systems of linear equations) 
sign-invariance model 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms)
sign matrix 
[15A39, 52A22, 60D05, 68Q25, 90C05] 
(see: Farkas lemma; Probabilistic analysis of simplex
algorithms) 
sign-nonsingular matrix 
[90C09, 90C10]
(see: Combinatorial matrix analysis)
sign pattern of a matrix 
[90C09, 90C10]
(see: Combinatorial matrix analysis)
sign-solvable linear system 
[90C09, 90C10]
(see: Combinatorial matrix analysis)
sign vector 
[90C09, 90C10]
(see: Oriented matroids)
signal processing see: nonGaussian —; nonlinear — 
Signal processing with higher order statistics 
(90C26, 90C90)
(referred to in: Global optimization methods for harmonic
retrieval) 
(refers to: Global optimization methods for harmonic
retrieval) 
signature 
[90C09, 90C10]
(see: Oriented matroids)
signed bigraph 
[90C09, 90C10]
(see: Combinatorial matrix analysis)
signed circuits 
[90C09, 90C10]
(see: Oriented matroids) 
signed cocircuits 
[90C09, 90C10]
(see: Oriented matroids) 
signed decomposition 
[52B11, 52B45, 52B55]
(see: Volume computation for polytopes: strategies and 
performances) 
signed decomposition see: Lasserre —; Lawrence — 
signed digraph 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 
signed set 
[90C09, 90C10] 
(see: Oriented matroids)
signed set see: opposite of a — 
signed sets see: orthogonal — 
signed subset 
[90C09, 90C10] 
(see: Oriented matroids)
signomial programming 
[90C26, 90C90] 
(see: Global optimization in generalized geometric
programming) 
signomials 
[90C26, 90C90] 
(see: Global optimization in generalized geometric
programming) 
Signorini condition 
[49J40, 49Q10, 70-08, 74K99, 74Pxx]
(see: Quasivariational inequalities)
Signorini-Coulomb unilateral frictional contact 
[49J40, 49Q10, 70-08, 74K99, 74Pxx]
(see: Quasivariational inequalities)
SIM 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms)
similarity measure 
[62H30, 90C27] 
(see: Assignment methods in clustering)
similarity subtree isomorphism see: maximal —; maximum — 
similarity of surrogates 
[90C09, 90C10]
(see: Optimization in classifying text documents) 
simple arrangement 
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements in optimization) 
simple Bayes 
(see: Bayesian networks) 


simple homogeneous process 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30] 
(see: Stochastic global optimization: stopping rules) 

simple integer recourse 
[90C10, 90C15] 
(see: Stochastic integer programs) 

simple integer recourse 
[90C11, 90C15] 
(see: Stochastic programming with simple integer recourse) 

simple integer recourse see: Stochastic programming with —; 
two-stage stochastic programs with — 

simple linkage
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: stopping rules;
Stochastic global optimization: two-phase methods)

simple order isotonic regression
[62G07, 62G30, 65K05]
(see: Isotonic regression problems)

simple plant location problem
[90B80, 90B85]
(see: Warehouse location problem)

simple polyhedron
[90C60]
(see: Complexity of degeneracy)

simple principal pivot
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity
problems)

simple recourse 

[90C15, 90C30, 90C35]
(see: Optimization in water resources; Stochastic linear 
programs with recourse and arbitrary multivariate 
distributions) 

simple recourse 
[90C11, 90C15] 
(see: Stochastic programming with simple integer 
recourse) 

Simple recourse problem 
(90C06, 90C08, 90C15) 
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Simple recourse problem: 
dual method; Simple recourse problem: primal method) 
(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Simple recourse problem: 
dual method; Simple recourse problem: primal method; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions) 

simple recourse problem see: dual method for the —; primal 
method for the — 

Simple recourse problem: dual method 
(90-08, 90C05, 90C06, 90C08, 90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 


Subject Index 4509 


Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem; 
Simple recourse problem: primal method; Stabilization of 
cutting plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programming: 
quasigradient method; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem; 
Simple recourse problem: primal method; Stabilization of 
cutting plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

Simple recourse problem: primal method 
(90-08, 90C05, 90C06, 90C08, 90C15, 49M25) 
(referred to in: Approximation of extremum problems with
probability functionals; Approximation of multivariate
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem; 
Simple recourse problem: dual method; Stabilization of 
cutting plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programming: 
quasigradient method; Two-stage stochastic programs with 
recourse) 

(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem; 
Simple recourse problem: dual method; Stabilization of 
cutting plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 
simplex
[52B11, 52B45, 52B55, 65K05, 90C06, 90C25, 90C26, 90C30, 
90C31, 90C35, 90C90] 
(see: Bisection global optimization methods; Global 
optimization in binary star astronomy; Sensitivity and 
stability in NLP: approximation; Sequential simplex 
method; Simplicial decomposition algorithms; Volume 
computation for polytopes: strategies and performances) 

simplex 
[65K05, 90C06, 90C25, 90C30, 90C35] 
(see: Bisection global optimization methods; Simplicial 
decomposition algorithms) 

simplex see: centroid of a —; dual —; initial —; 
lexicographic —; p- —; regular —; standard — 

simplex algorithm 
[03H10, 49J27, 90C05, 90C34] 
(see: Linear programming: Klee-Minty examples; 
Semi-infinite programming and control problems) 

simplex algorithm 
[90C05, 90C06, 90C10, 90C11, 90C30, 90C33, 90C57, 90C90] 
(see: Convex-simplex algorithm; Frank-Wolfe algorithm; 
Linear programming: Klee-Minty examples; Modeling 
difficult optimization problems; Pivoting algorithms for 
linear programming generating two paths; Simplicial 
decomposition) 

simplex algorithm see: Convex- —; dual —; network —; 
Parametric linear programming: cost —; parametric 
objective —; parametric right-hand side —; primal —; 
variant of the — 

simplex algorithms see: primal and dual —; Probabilistic 
analysis of — 

simplex method 
[05B35, 65K05, 65K10, 68Q99, 90C05, 90C06, 90C20, 90C25, 
90C33, 90C35] 
(see: ABS algorithms for optimization; Branch and price: 
Integer programming with column generation; Least-index 
anticycling rules; Lexicographic pivoting rules; Linear 
programming: karmarkar projective algorithm; Simplicial 
decomposition algorithms) 

simplex method
[90C05]
(see: Linear programming) 

simplex method see: downhill —; lexicographic —; 
lexicographic dual —; lexicographic primal —;
Sequential —; steepest edge —
simplex program 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems)
simplex reduction 
[65K05, 90C30] 
(see: Bisection global optimization methods)
simplex tableau see: terminal — 
simplex type algorithm 
[90C05, 90C33] 
(see: Pivoting algorithms for linear programming
generating two paths) 
simplexes see: system of — 
simplices 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 


simplices see: bounds on — 

simplicial see: nonlinear — 

simplicial algorithms
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)

simplicial constraints
[90C60]
(see: Complexity theory: quadratic programming)
Simplicial decomposition
(90C30)
(referred to in: Decomposition principle of linear
programming; Generalized benders decomposition;
MINLP: generalized cross decomposition; MINLP:
logic-based methods; Simplicial decomposition algorithms;
Standard quadratic optimization problems: algorithms;
Standard quadratic optimization problems: applications;
Stochastic linear programming: decomposition and cutting
planes; Successive quadratic programming: decomposition
methods)
(refers to: Decomposition principle of linear programming;
Frank-Wolfe algorithm; Generalized benders
decomposition; MINLP: generalized cross decomposition;
MINLP: logic-based methods; Simplicial decomposition
algorithms; Stochastic linear programming: decomposition
and cutting planes; Successive quadratic programming:
decomposition methods)
simplicial decomposition
[90C06, 90C25, 90C30, 90C35]
(see: Frank-Wolfe algorithm; Simplicial decomposition;
Simplicial decomposition algorithms)
simplicial decomposition
[90C30]
(see: Frank-Wolfe algorithm; Simplicial decomposition)
simplicial decomposition see: disaggregate —; restricted — 
Simplicial decomposition algorithms
(90C06, 90C25, 90C35)
(referred to in: Decomposition principle of linear
programming; Generalized benders decomposition;
MINLP: generalized cross decomposition; MINLP:
logic-based methods; Simplicial decomposition; Stochastic
linear programming: decomposition and cutting planes;
Successive quadratic programming: decomposition
methods)
(refers to: Decomposition principle of linear programming;
Generalized benders decomposition; MINLP: generalized
cross decomposition; MINLP: logic-based methods;
Simplicial decomposition; Stochastic linear programming:
decomposition and cutting planes; Successive quadratic
programming: decomposition methods)
simplicial partition
[90C20]
(see: Standard quadratic optimization problems:
applications)

Simplicial pivoting algorithms for integer programming
(90C10, 90C05)
(referred to in: Branch and price: Integer programming with
column generation; Criss-cross pivoting rules;
Decomposition techniques for MILP: lagrangian relaxation;
Graph coloring; Integer linear complementary problem; 
Integer programming; Integer programming: algebraic 
methods; Integer programming: branch and bound 
methods; Integer programming: branch and cut algorithms; 
Integer programming: cutting plane algorithms; Integer 
programming duality; Integer programming: lagrangian 
relaxation; LCP: Pardalos–Rosen mixed integer
formulation; Linear programming; MINLP: trim-loss 
problem; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Pivoting 
algorithms for linear programming generating two paths; 
Set covering, packing and partitioning problems; Stable set 
problem: branch & cut algorithms; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Time-dependent traveling 
salesman problem) 
(refers to: Branch and price: Integer programming with 
column generation; Criss-cross pivoting rules; 
Decomposition techniques for MILP: lagrangian relaxation; 
Graph coloring; Integer linear complementary problem; 
Integer programming; Integer programming: algebraic 
methods; Integer programming: branch and bound 
methods; Integer programming: branch and cut algorithms; 
Integer programming: cutting plane algorithms; Integer 
programming duality; Integer programming: lagrangian 
relaxation; LCP: Pardalos–Rosen mixed integer
formulation; Least-index anticycling rules; Lexicographic 
pivoting rules; Linear programming; Mixed integer 
classification problems; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Parametric mixed integer nonlinear 
optimization; Pivoting algorithms for linear programming 
generating two paths; Principal pivoting methods for linear 
complementarity problems; Probabilistic analysis of 
simplex algorithms; Set covering, packing and partitioning 
problems; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Time-dependent traveling salesman problem) 

simpliciality 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 

simpliciality see: near- — 

simplicity 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 

simplicity see: near- — 

Simulated annealing 
(90C90, 90C27) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Broadcast scheduling problem; Discrete 
stochastic optimization; Genetic algorithms; Global 
optimization based on statistical models; Global 
optimization: hit and run methods; Global optimization in 
Lennard-Jones and morse clusters; Job-shop scheduling
problem; Molecular structure determination: convex global
underestimation; Monte-Carlo simulated annealing in
protein folding; Multiple minima problem in protein
folding: αBB global optimization approach;
Optimization-based visualization; Optimization in medical 
imaging; Packet annealing; Phase problem in X-ray 
crystallography: Shake and bake approach; Random search 
methods; Simulated annealing methods in protein folding; 
Stochastic global optimization: stopping rules; Stochastic 
global optimization: two-phase methods) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Bayesian global optimization; 
Evolutionary algorithms in combinatorial optimization; 
Global optimization based on statistical models; 
Monte-Carlo simulated annealing in protein folding; 
Packet annealing; Random search methods; Simulated 
annealing methods in protein folding; Stochastic global 
optimization: stopping rules; Stochastic global 
optimization: two-phase methods) 

simulated annealing 
[00-02, 01-02, 03-02, 05-04, 62C10, 65K05, 68Q25, 68T20, 
68T99, 90B80, 90C05, 90C10, 90C15, 90C25, 90C26, 90C27, 
90C30, 90C59, 90C90] 
(see: Bayesian global optimization; Bayesian networks; 
Capacitated minimum spanning trees; Communication 
network assignment problem; Design optimization in 
computational fluid dynamics; Evolutionary algorithms in 
combinatorial optimization; Global optimization in binary 
star astronomy; Global optimization: hit and run methods; 
Maximum constraint satisfaction: relaxations and upper 
bounds; Metaheuristics; Metropolis, Nicholas Constantine; 
MINLP: design and scheduling of batch processes; Random 
search methods; Simulated annealing methods in protein 
folding; Vehicle routing problem with simultaneous 
pickups and deliveries) 

simulated annealing 
[90B80, 90C05, 90C25, 90C26, 90C29, 90C90, 92C40] 
(see: Facilities layout problems; Global optimization: hit 
and run methods; Metropolis, Nicholas Constantine; 
Monte-Carlo simulated annealing in protein folding; 
Optimal design of composite structures) 

simulated annealing see: adaptive —; stochastic — 

simulated annealing and genetic algorithm 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 

simulated annealing and its application to protein folding see: 
Adaptive — 

Simulated annealing methods in protein folding 
(90C90) 
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Differential equations and global 
optimization; Facilities layout problems; Genetic 
algorithms; Genetic algorithms for protein structure 
prediction; Global optimization based on statistical models; 
Global optimization in Lennard-Jones and morse clusters; 
Graph coloring; Maximum satisfiability problem; 
Molecular structure determination: convex global 
underestimation; Monte-Carlo simulated annealing in 
protein folding; Multiple minima problem in protein 
folding: αBB global optimization approach; Packet
annealing; Phase problem in X-ray crystallography: Shake
and bake approach; Random search methods; Set covering, 
packing and partitioning problems; Simulated annealing; 
Stochastic global optimization: stopping rules; Stochastic 
global optimization: two-phase methods) 
(refers to: Adaptive simulated annealing and its application 
to protein folding; Bayesian global optimization; Genetic 
algorithms; Genetic algorithms for protein structure 
prediction; Global optimization based on statistical models; 
Global optimization in Lennard-Jones and morse clusters; 
Global optimization in protein folding; Molecular structure 
determination: convex global underestimation; 
Monte-Carlo simulated annealing in protein folding; 
Multiple minima problem in protein folding: αBB global
optimization approach; Packet annealing; Phase problem in 
X-ray crystallography: Shake and bake approach; Protein 
folding: generalized-ensemble algorithms; Random search 
methods; Simulated annealing; Stochastic global 
optimization: stopping rules; Stochastic global 
optimization: two-phase methods) 

simulated annealing in protein folding see: Monte-Carlo — 

simulated quench
[92C05]
(see: Adaptive simulated annealing and its application to
protein folding)

simulating annealing
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)

simulation
[90C39]
(see: Emergency evacuation, optimization modeling;
Neuro-dynamic programming)

simulation
[90C27, 90C39]
(see: Neuro-dynamic programming; Operations research 
and financial markets) 

simulation see: Derivatives of markov processes and their —; 
dynamic —; monte-Carlo —; numerical —; 
optimization- —; process — 

simulation algorithm see: Monte-Carlo — 

simulation-based optimization
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)

simulation of derivatives
[90C15]
(see: Derivatives of probability measures)

simulation models
[90-02]
(see: Operations research models for supply chain
management and design)

simulation models see: supply chain — 

simulation problems 
[90C30, 90C90] 
(see: Successive quadratic programming: applications in the 
process industry) 

simulation procedure see: Monte-Carlo — 

simulation programs see: process — 


simulations for stochastic optimization see: Monte-Carlo — 

simulator 
(see: State of the art in modeling agricultural systems) 

simultaneous 
[34A55, 35R30, 62G05, 62G08, 62J02, 62K05, 62P10, 62P30, 
76R50, 80A20, 80A23, 80A30] 
(see: Identification methods for reaction kinetics and 
transport) 

simultaneous adjustment
[90C26, 90C90]
(see: Global optimization in binary star astronomy)

simultaneous approach
[90C30, 90C90]
(see: Successive quadratic programming: applications in the
process industry)

simultaneous Diophantine approximation problem
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer 
programming) 

simultaneous displacements see: method of — 

simultaneous equation models 
[90C26, 90C30] 
(see: Forecasting) 

Simultaneous estimation and optimization of nonlinear 
problems 
(93E20, 93E12, 49J15, 62J02, 62M10, 62M20, 91B28) 
(referred to in: Generalizations of interior point methods for 
the linear complementarity problem; Mathematical 
programming methods in supply chain management) 
(refers to: Complementarity algorithms in pattern 
recognition; Generalizations of interior point methods for 
the linear complementarity problem; Mathematical 
programming methods in supply chain management) 

simultaneous pickups and deliveries see: Vehicle routing 
problem with — 

simultaneous synthesis 
[93A30, 93B50] 
(see: Mixed integer linear programming: mass and heat 
exchanger networks) 

single 
(see: Peptide identification via mixed-integer optimization) 

single assignment
[26A24, 65D25]
(see: Automatic differentiation: introduction, history and
rounding error estimation)

single assignment algorithms
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

single assignment property
[90B80, 90B85]
(see: Warehouse location problem)

single-class software package
[90C10, 90C26, 90C30]
(see: Optimization software)

single cluster statistic see: generalized — 

single color
[05C85]
(see: Directed tree networks)

single-commodity model in OR
[90B80, 90B85]
(see: Warehouse location problem)
single commodity network flow problem see: nonlinear —

single-commodity single-criterion uncapacitated static 
multifacility see: discrete — 

single-criterion problem in OR 
[90B80, 90B85] 
(see: Warehouse location problem) 

single-criterion uncapacitated static multifacility see: discrete 
single-commodity — 

single-crystal X-ray diffraction data see: Optimization 
techniques for phase retrieval based on — 

single depot/multiple depots
[00-02, 01-02, 03-02]
(see: Vehicle routing problem with simultaneous pickups
and deliveries)

Single-depot vehicle scheduling problem
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling)

Single-depot vehicle scheduling problems
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27]
(see: Vehicle scheduling) 

single-ended 
(see: Railroad crew scheduling) 

single-ended crew district 
(see: Railroad crew scheduling) 

single facility location 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location; Single facility location: multi-objective 
rectilinear distance location) 

Single facility location: circle covering problem 
(90B85, 90C27) 
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear
distances; Single facility location: multi-objective euclidean
distance location; Single facility location: multi-objective
rectilinear distance location; Stochastic transportation and
location problems; Voronoi diagrams in facility location;
Warehouse location problem) 

(refers to: Carathéodory theorem; Combinatorial 
optimization algorithms in resource allocation problems; 
Competitive facility location; Facility location with 
externalities; Facility location problems with spatial 
interaction; Facility location with staircase costs; Global 
optimization in Weber’s problem with attraction and 
repulsion; MINLP: application in facility 
location-allocation; Multifacility and restricted location 
problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear
distances; Production-distribution system design problem;
Resource allocation for epidemic control; Single facility
location: multi-objective euclidean distance location; Single
facility location: multi-objective rectilinear distance
location; Stochastic transportation and location problems;
Voronoi diagrams in facility location; Warehouse location
problem) 


Single facility location: multi-objective euclidean distance
location
(90B85)

(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem; 
Single facility location: multi-objective rectilinear distance 
location; Stochastic transportation and location problems; 
Voronoi diagrams in facility location; Warehouse location 
problem) 

(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Production-distribution system design problem; 
Resource allocation for epidemic control; Single facility 
location: circle covering problem; Single facility location: 
multi-objective rectilinear distance location; Stochastic 
transportation and location problems; Voronoi diagrams in 
facility location; Warehouse location problem) 


Single facility location: multi-objective rectilinear distance
location
(90B85)

(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Stochastic transportation and location problems; 
Voronoi diagrams in facility location; Warehouse location 
problem) 

(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Production-distribution system design problem; 
Resource allocation for epidemic control; Single facility
location: circle covering problem; Single facility location: 
multi-objective euclidean distance location; Stochastic 
transportation and location problems; Voronoi diagrams in 
facility location; Warehouse location problem) 
single-facility problem in OR 
[90B80, 90B85]
(see: Warehouse location problem) 
single fixed cost with capacity constraints 
[90C26]
(see: MINLP: application in facility location-allocation) 
single fixed cost with no capacity constraints
[90C26]
(see: MINLP: application in facility location-allocation) 
single hub heuristic 
[90C35]
(see: Multi-index transportation problems) 
single index market model see: Sharpe — 
single-linkage see: multilevel — 
single locomotive scheduling models 
(see: Railroad locomotive scheduling) 
single locomotive type 
(see: Railroad locomotive scheduling) 
single-lookahead-unit-resolution 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer
programming)
single parametric mixed integer linear program
[90C11, 90C31]
(see: Multiparametric mixed integer linear programming)
single path routing pattern model
[68Q25, 90B80, 90C05, 90C27]
(see: Communication network assignment problem)
single-period model
[91B28]
(see: Financial optimization)
single-product campaign
[90C26]
(see: Global optimization in batch design under uncertainty;
MINLP: design and scheduling of batch processes)
single-ratio fractional (hyperbolic) 0-1 programming problem
(see: Fractional zero-one programming)
single-ratio fractional program
[90C32]
(see: Fractional programming)
single-ratio programs
[90C32]
(see: Fractional programming)
single run SQG
[90C15]
(see: Stochastic quasigradient methods: applications)
single smooth function 
[57R12, 90C31, 90C34]
(see: Smoothing methods for semi-infinite optimization)
single source shortest path tree problem
[90B10, 90C27]
(see: Shortest path tree algorithms)
single-stage IM
[90B50]
(see: Inventory management in supply chains)


single stage inventory management models 
[90B50] 
(see: Inventory management in supply chains) 
single underlying method 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
single-valued boundary laws and variational equalities 
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in 
mechanics) 
single valued maps see: Generalized monotone — 
single versus Multiperiod Models 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 
singleton 
[90C35] 
(see: Feedback set problems) 
singular component 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
lagrange multipliers) 
singular control problem 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
singular Fréchet subdifferential 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 
singular limiting subdifferential 
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials)
singular local attractors
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)
singular matrix
[15A99, 65G20, 65G30, 65G40, 90C26]
(see: Interval linear systems)
singular set
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching) 
singular value see: structured — 
singular value decomposition solution see: truncated — 
singularities 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
singularities see: generic —; Parametric optimization: 
embeddings, path following and — 
singularity 
[05B35, 20F36, 20F55, 52C35, 57N65, 90C31, 90C34] 
(see: Hyperplane arrangements; Parametric global 
optimization: sensitivity) 
singularity of an arrangement of hyperplanes 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements) 
sink 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
sink node 
[90C35] 
(see: Maximum flow problem) 
sinusoidal parameter estimation problem
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval) 
SIP
[57R12, 65K05, 90C26, 90C31, 90C33, 90C34]
(see: Adaptive convexification in semi-infinite optimization;
Parametric global optimization: sensitivity; Smoothing
methods for semi-infinite optimization)
SIP see: convex —; linear —; nonlinear —; structural stability
of —
SIP(f,h,g) see: global structural stability of —
SIP index set
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming, semidefinite
programming and perfect duality)
SIP problem see: discretized —; duality of the linear —; 
nonlinear discretized — 
SIP problems see: linear — 
SIPs see: equivalence of — 
site see: active — 
size see: list —; machine —; population —; step — 
size effects in microclusters 
[90C26, 90C90] 
(see: Global optimization in Lennard-Jones and morse 
clusters) 
size of a graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization)
size of input data
[68Q25, 90C60]
(see: NP-complete problems and proof methodology)
size of the input of a Turing machine
[90C60]
(see: Complexity theory)
size of a problem instance
[90C60]
(see: Computational complexity theory) 
size rule see: divergent series step- — 
size threshold see: determination of clusters — 
sizing see: lot — 
sizing optimization 
[49J20, 49J52]
(see: Shape optimization) 
sizing problem see: capacitated lot- —; Economic lot- — 
sizing variables 
[90C26, 90C90] 
(see: Structural optimization: history) 
skeleton 
[68Q20] 
(see: Optimal triangulations) 
skeleton see: IMT- — 
skew-symmetric matrix 
[15A39, 90C05]
(see: Farkas lemma; Linear optimization: theorems of the
alternative)
skew-symmetric matrix
[15A39, 90C05]
(see: Farkas lemma)
skew-symmetric matrix M
[65K05, 90C20, 90C33]
(see: Principal pivoting methods for linear complementarity
problems)
skew-symmetric proximity
[62H30, 90C27]
(see: Assignment methods in clustering)

Skewed VNS
[9008, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
skewness
[90C26, 90C90]
(see: Signal processing with higher order statistics)
Skorokhod problem
[60G35, 65K05, 65K10, 90C90]
(see: Differential equations and global optimization;
Variational inequalities: projected dynamical system)
Skorokhod problem
[65K10, 90C90]
(see: Variational inequalities: projected dynamical system)
slack cable
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
slack constraints
[90C60]
(see: Complexity of degeneracy)

slackness see: complementarity —; complementary —; 
ε-complementary —; strict complementarity —; strict
complementary — 

slackness condition see: strict complementarity — 

slackness conditions see: complementary — 

slackness relations see: complementary — 

slacks see: dual — 

Slater’s condition 
[90C05, 90C15, 90C25, 90C26, 90C29, 90C30, 90C31, 90C34, 
91A65, 91B28] 
(see: Bilevel programming: implicit function approach; 
Duality for semidefinite programming; Kuhn-Tucker 
optimality conditions; Nondifferentiable optimization: 
parametric programming; Probabilistic constrained linear 
programming: duality theory; Semi-infinite programming 
and applications in finance; Semi-infinite programming, 
semidefinite programming and perfect duality) 

Slater constraint 
[90C26] 
(see: Invexity and its applications) 

Slater constraint qualification 
[90C25, 90C29, 90C30, 90C33] 
(see: Lagrangian multipliers methods for convex 
programming; Multi-objective optimization: lagrange 
duality; Optimization with equilibrium constraints: 
A piecewise SQP approach) 

Slater constraint qualification see: generalized — 

Slater CQ see: Strong —; Weak — 

Slater dual see: extended Lagrange- — 

Slater theorem 
[90C05, 90C30] 
(see: Theorems of the alternative and optimization) 

slave scheme see: master- — 

slice see: time — 
slope
[62G07, 62G30, 65K05]
(see: Isotonic regression problems)
slope arithmetic
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)
slope lemma see: first —; second —; third —
slope lemmas
[90C30]
(see: Rosen’s method, global convergence, and Powell’s
conjecture)
slope set
[65G20, 65G30, 65G40, 65H20, 65K99]
(see: Interval Newton methods)
slopes see: interval —
SLP
[90C15]
(see: L-shaped method for two-stage stochastic programs
with recourse)
SLP see: decomposition of — 
SLP algorithms 
[90C30, 90C90] 
(see: MINLP: applications in blending and pooling 
problems) 
Smale function see: Chen-Harker-Kanzow- —
small see: sufficiently —
small gain
[93D09]
(see: Robust control)
small negative real numbers see: infinitely —
small positive real numbers see: infinitely —
small real numbers see: infinitely —
small residual
[90C30]
(see: Nonlinear least squares problems)
smaller see: lexicographically —
smallest enclosing circle
[90B80, 90C27]
(see: Voronoi diagrams in facility location)
smallest enclosing-circle problem
[90B80, 90C27]
(see: Voronoi diagrams in facility location)
smallest index rule
[90C05]
(see: Linear programming: Klee-Minty examples)
smallest K-majorant
[41A30, 47A99, 65K10]
(see: Lipschitzian operators in best approximation by
bounded or continuous functions)
Smallest of Several Ratios see: Maximization of the —
SMIN
[90C10, 90C26]
(see: MINLP: branch and bound global optimization
algorithm)
SMIN-αBB
[49M37, 90C11]
(see: Mixed integer nonlinear programming)
SMIN-αBB algorithm
[90C26]
(see: Bilevel optimization: feasibility test and flexibility 
index) 


Smith conjecture 
[90C27] 
(see: Steiner tree problems) 
Smith-Walford-one algorithm 
[90C35] 
(see: Feedback set problems) 
Smith-Walford one-reducible graph 
[90C35] 
(see: Feedback set problems) 
Smoluchowski-Kramers equation 
[60G35, 65K05] 
(see: Differential equations and global optimization) 
smooth function see: single — 
Smooth nonlinear nonconvex optimization 
(90C26) 
(referred to in: αBB algorithm; Continuous global
optimization: models, algorithms and software; Global 
optimization in batch design under uncertainty; Global 
optimization in generalized geometric programming; 
Global optimization methods for systems of nonlinear 
equations; Global optimization in phase and chemical 
reaction equilibrium; Interval global optimization; MINLP: 
branch and bound global optimization algorithm; MINLP: 
global optimization with αBB)
(refers to: αBB algorithm; Continuous global optimization:
models, algorithms and software; Global optimization in 
batch design under uncertainty; Global optimization in 
generalized geometric programming; Global optimization 
methods for systems of nonlinear equations; Global 
optimization in phase and chemical reaction equilibrium; 
Interval global optimization; MINLP: branch and bound 
global optimization algorithm; MINLP: global 
optimization with αBB)
smooth nonlinear optimization 
[90C26] 
(see: Smooth nonlinear nonconvex optimization) 
smooth optimization see: Derivative-free methods for non- — 
smooth potentials and stability in mechanics 
[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90] 
(see: Quasidifferentiable optimization: stability of dynamic 
systems) 
smoothing 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
smoothing see: exponential — 
smoothing algorithm see: potential — 
smoothing algorithms 
[90Cxx] 
(see: Discontinuous optimization) 
smoothing functions 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method) 
smoothing methods 
[90C26, 90C30, 90C33] 
(see: Forecasting; Nonsmooth and smoothing methods for 
nonlinear complementarity problems and variational 
inequalities) 
smoothing methods 
[90C30, 90C33] 
(see: Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities) 
smoothing methods for nonlinear complementarity problems
and variational inequalities see: Nonsmooth and —
Smoothing methods for semi-infinite optimization
(90C34, 90C31, 57R12)
(refers to: Generalized semi-infinite programming: optimality
conditions; Nonsmooth analysis: weak stationarity;
Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities; 
Semi-infinite programming, semidefinite programming 
and perfect duality) 
smoothing Newton method 
[49J52, 90C30] 
(see: Nondifferentiable optimization: Newton method) 
smoothing NM 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method)
smoothing-nonsmooth reformulation
[90C30, 90C33]
(see: Nonsmooth and smoothing methods for nonlinear
complementarity problems and variational inequalities)
smoothing of the potential energy
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding) 
smoothness see: lack of — 
snakes 
[90C90] 
(see: Optimization in medical imaging) 
SNDP 
[90-XX] 
(see: Survivable networks) 
SNE see: GO for — 
SNLP see: Semidefinite programming and the sensor network 
localization problem — 
SO
[49J20, 49J52]
(see: Shape optimization)
soaring direction
[90Cxx]
(see: Discontinuous optimization)
Sobel edge filter
[90C90]
(see: Optimization in medical imaging)
Sobolev space
[49J52]
(see: Hemivariational inequalities: eigenvalue problems)
social cost see: production realizing with minimal —
social utility function
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)
SOCQ see: ben-Tal —; directional —; McCormick — 
software see: Continuous global optimization: models,
algorithms and —; high-level —; low-level —;
mathematical —; medium-level —; Optimization —
software development and evaluation
[90C05]
(see: Continuous global optimization: models, algorithms
and software)
software for homotopy methods
[65F10, 65F50, 65H10, 65K10]
(see: Globally convergent homotopy methods)
software library see: general-purpose —
software package see: block truncated Newton —;
multiple-class —; single-class —
software package for specific mathematical areas
[90C10, 90C26, 90C30]
(see: Optimization software)
software routine see: individual —
software routines see: package of basic —
soil
(see: State of the art in modeling agricultural systems)
Solanki method
[90C11, 90C29]
(see: Multi-objective mixed integer programming)
Solomon algebra see: Orlik- —
Solomonoff-Kolmogorov-Chaitin complexity
[90C60]
(see: Kolmogorov complexity)
solution
[90C10, 90C29]
(see: Maximum constraint satisfaction: relaxations and
upper bounds; Multi-objective optimization: pareto
optimal solutions, properties)
solution see: admissible —; basic —; basic feasible —;
best-compromise —; (C},)-efficient —; computable —;
cooperative —; corner —; degenerate basic —;
distinguished —; efficient —; enumerative —;
ε-reserved —; essential optimal —; extreme feasible —;
extreme point —; feasible —; feedback Stackelberg —;
global —; global minimum —; global optimal —;
incomplete —; incumbent —; infeasible —; initial —;
integer —; interior —; Lipschitz stable —; local —; locally
optimal —; M-Pareto optimal —; minimax —; minimum
norm —; most preferred —; neighborhood of a —;
noncooperative —; nondegenerate —; nondominated —;
noninferior —; nonsupported efficient —; optimal —;
optimum —; Pareto —; Pareto efficient —; Pareto
optimal —; polynomial time —; practically feasible
computational —; primal —; primal-dual —; problem —;
properly efficient —; quasi-optimal —; regular —; spanning
tree —; stable —; Stochastic network problems: massively
parallel —; strictly complementary —; strongly
polynomial —; strongly stable —; strongly stable
optimal —; supported efficient —; truncated singular value
decomposition —; unbounded —; unsupported
nondominated —; value of stochastic —; weakly
efficient —; weakly nondominated —; weakly
noninferior —; weakly Pareto optimal —


solution by active sets and interior point methods see:
Successive quadratic programming: —
solution algorithm see: incremental-iterative —
solution algorithms
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach)
solution of the alignment problem
[05-02, 05-04, 15A04, 15A06, 68U99]
(see: Alignment problem)
solution of bilevel programming problems
[90C30, 90C90]
(see: Bilevel programming: global optimization)
solution block
[62G07, 62G30, 65K05]
(see: Isotonic regression problems)
solution of the convex moment problem
[28-XX, 49-XX, 60-XX]
(see: General moment optimization problems)
solution of equations see: rotation in the — 
solution of the Euclidean distance location problem see: 
iterative — 
solution of facility location problems with staircase costs 
[90B80, 90C11] 
(see: Facility location with staircase costs) 
solution of the Hamilton-Jacobi-Bellman equation 
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation) 
solution of inverse problems see: formulation and — 
solution mapping see: optimal — 
solution method see: support problems — 
solution methodologies 
[90C30] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 
solution methods 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
solution methods see: exact —; Extremum problems with 
probability functions: kernel type — 
Solution methods for multivalued variational inequalities 
(47J20, 49J40, 90C33, 65K10) 
solution of multistage mean-variance optimization problems 
see: Decomposition algorithms for the — 
solution point see: bounds on the distance of a feasible point
to a —
solution-point bounds for NLP 
[90C31] 
(see: Sensitivity and stability in NLP: approximation) 
solution of a problem 
[90C60] 
(see: Computational complexity theory) 
solution procedures 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 
solution of a program see: optimal — 
solution quality see: establishing — 
solution robust 
[90C90, 91B28] 
(see: Robust optimization) 
solution set 
[15A99, 65G20, 65G30, 65G40, 90C26, 90C31, 90C33] 
(see: Interval linear systems; Sensitivity and stability in 
NLP: continuity and differential stability; Topological 
methods in complementarity theory) 
solution set see: nondominated —; noninferior —; Pareto 
optimal —; stability of a — 
solution set in problem solving see: restriction to the — 


solution space 
[9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods) 

solution space see: convexity property of the — 

solution strategies
[90C05]
(see: Continuous global optimization: models, algorithms
and software)

solution of the system
[49M37]
(see: Nonlinear least squares: trust region methods)

solution of a system
[49M37]
(see: Nonlinear least squares: trust region methods)

solution vector estimates for parametric NLPS see: Bounds 
and — 

solution of the Wilson equation see: regular — 

solutions see: almost complementary —; basic —; comparison 
of parametric —; efficient —; enumerating extreme 
point —; equilibrium —; exploiting the interplay between 
primal and dual —; jumps of optimal —; least squares —; 
local —; nonsupported efficient —; Pareto optimal —; set of 
feasible —; set of potential efficient —; supported 
efficient —; verifying equilibrium — 

solutions of equations see: test for the existence of — 

solutions of nonlinear systems of equations see: error bound 
for approximate —; existence of —; rigorous bound for —; 
uniqueness of — 

solutions, properties see: Multi-objective optimization: pareto 
optimal — 

solvability see: polynomial — 

solvability of equations
[01A99]
(see: Lagrange, Joseph-Louis)

solvability theorem
[90C26]
(see: Global optimization: envelope representation)

solvable linear system see: sign- —

solvation effects
[65K10, 92C40]
(see: Multiple minima problem in protein folding: αBB
global optimization approach)

solvent design approaches see: Optimal — 

solver 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 

solver see: flow — 

solver formats see: independent of — 

solvers 
(see: Planning in the process industry) 

solvers see: bottlenecks in NLP — 

solving see: constraint —; iterative linear equation- —; 
restriction to the solution set in problem — 

solving CAP on trees see: exact algorithm for —; heuristic 
approach to — 

solving environment see: problem — 

Solving hemivariational inequalities by nonsmooth 
optimization methods 
(49J40, 49J52, 90C30, 65K05)

(referred to in: Composite nonsmooth optimization; 
Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Hemivariational 
inequalities: applications in mechanics; Hemivariational 
inequalities: eigenvalue problems; Nonconvex energy 
functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; Nonsmooth 
and smoothing methods for nonlinear complementarity 
problems and variational inequalities; Quasidifferentiable
optimization; Quasidifferentiable optimization: algorithms
for hypodifferentiable functions; Quasidifferentiable
optimization: algorithms for QD functions;
Quasidifferentiable optimization: applications;
Quasidifferentiable optimization: applications to
thermoelasticity; Quasidifferentiable optimization: calculus
of quasidifferentials; Quasidifferentiable optimization:
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Variational 
inequalities; Variational inequalities: F. E. approach; 
Variational inequalities: geometric interpretation, 
existence and uniqueness; Variational inequalities: 
projected dynamical system; Variational principles)
(refers to: Composite nonsmooth optimization; Generalized
monotonicity: applications to variational inequalities and
equilibrium problems; Hemivariational inequalities:
applications in mechanics; Hemivariational inequalities:
eigenvalue problems; Hemivariational inequalities: static
problems; Nonconvex energy functions: hemivariational
inequalities; Nonconvex-nonsmooth calculus of variations;
Nonsmooth and smoothing methods for nonlinear
complementarity problems and variational inequalities;
Quasidifferentiable optimization; Quasidifferentiable
optimization: algorithms for hypodifferentiable functions;
Quasidifferentiable optimization: algorithms for QD
functions; Quasidifferentiable optimization: applications;
Quasidifferentiable optimization: applications to
thermoelasticity; Quasidifferentiable optimization: calculus
of quasidifferentials; Quasidifferentiable optimization:
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, clarke derivatives; 
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Variational 
inequalities; Variational inequalities: F. E. approach; 
Variational inequalities: geometric interpretation, 
existence and uniqueness; Variational inequalities: 
projected dynamical system; Variational principles) 
Solving large scale and sparse semidefinite programs 
(90C25, 90C30) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; Cholesky factorization; Duality for
semidefinite programming; Interval linear systems; Large
scale trust region problems; Large scale unconstrained 
optimization; Maximum cut problem, MAX-CUT; 
Orthogonal triangularization; Overdetermined systems of 
linear equations; QR factorization; Semidefinite 
programming and determinant maximization; Semidefinite 
programming: optimality conditions and stability; 
Semidefinite programming and the sensor network 
localization problem, SNLP; Semidefinite programming 
and structural optimization; Semi-infinite programming, 
semidefinite programming and perfect duality; Symmetric 
systems of linear equations) 
(refers to: ABS algorithms for linear equations and linear 
least squares; Cholesky factorization; Duality for 
semidefinite programming; Interior point methods for 
semidefinite programming; Interval linear systems; Large 
scale trust region problems; Large scale unconstrained 
optimization; Linear programming; Orthogonal 
triangularization; Overdetermined systems of linear 
equations; QR factorization; Semidefinite programming 
and determinant maximization; Semidefinite 
programming: optimality conditions and stability; 
Semidefinite programming and structural optimization; 
Semi-infinite programming, semidefinite programming 
and perfect duality; Symmetric systems of linear equations) 

solving a problem see: Turing machine — 

solving a problem instance in time m see: algorithm — 

solving vehicle routing problems see: approximate methods 
for —; constructive methods for —; exact methods for — 

SONC
[90C31]
(see: Sensitivity and stability in NLP: approximation)

SOR method
[90C25, 90C33, 90C55]
(see: Splitting method for linear complementarity
problems)

sorting
[90C29]
(see: Multicriteria sorting methods)

sorting method see: multicriteria — 

sorting methods see: Multicriteria — 

sorting problem 
[90C29] 
(see: Preference disaggregation approach: basic features, 
examples from financial decision making) 

sorting problems 
[90C26, 90C29, 91B28] 
(see: Portfolio selection and multicriteria analysis; 
Preference disaggregation approach: basic features, 
examples from financial decision making) 

SOS
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 

source 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 

source see: virtual — 

source algorithm see: virtual — 
source class
[90C05, 90C34] 
(see: Semi-infinite programming: methods for linear 
problems) 

source code transformation 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators) 

source concept in auction algorithms see: virtual — 

source node 
[90C35] 
(see: Maximum flow problem) 

source shortest path tree problem see: single — 

sources 
[65D18, 90B85, 90C26] 
(see: Global optimization in location problems) 

Southwell method see: Gauss- — 

SP see: split-variable formulation of — 

space see: canonical function —; convexity property of the 
solution —; criterion —; design —; dual —; extended 
canonical function —; feature —; function —; Hilbert —; 
image —; kinetically admissible —; lineality —; Linear —; 
linear topological —; local properties of the 
configuration —; measure —; metric —; monotone 
operator on a Banach —; null —; optimization in 
a vector —; outcome —; phase —; probability measure —; 
range —; real vectors —; resource-payoff —; Sobolev —; 
solution —; symmetric element in a Hilbert —; triangulation 
of Euclidean —; trustworthy —; variable —; vector — 

space approach to optimization see: Image — 

space-bounded Turing machine see: exponentially —; 
polynomially — 

space complexity of a deterministic Turing machine 
[90C60] 
(see: Complexity classes in optimization) 

space complexity of a nondeterministic Turing machine 
[90C60] 
(see: Complexity classes in optimization) 

space decomposition see: range and null — 

space dilation 
[49J52, 90C30]
(see: Nondifferentiable optimization: subgradient 
optimization methods) 

space filling see: Global optimization using — 

space filling curve 
[90C26] 
(see: Global optimization using space filling) 

space filling curves see: approximation of — 

space graphs see: searching state — 

space methods see: full —; Successive quadratic 
programming: full — 

space model see: vector — 

space models for entropy optimization for image 
reconstruction see: vector- — 

space reduction see: weighting — 

space search see: formulation — 

space search algorithm see: optimal state —; recursive 
state —; state —; synchronized distributed state — 


space SQP see: full —; reduced —

space SQP method see: full — 

space successive quadratic programming see: full — 
Space-time network
(see: Railroad crew scheduling; Railroad locomotive
scheduling)
space-time network see: weekly —
space type methods see: Krylov —
space of x variables see: full —
spaces see: analyzing almost empty —; Best approximation in
ordered normed linear —; Increasing and
convex-along-rays functions on topological vector —;
Increasing and positively homogeneous functions on
topological vector —; measure —; ordered vector —;
semi-ordered —; Steiner ratio in Banach —;
subdifferentiability —
span of a T-coloring frequency assignment
[05-XX]
(see: Frequency assignment problem)
spanning acyclic tournament
[90C10, 90C11, 90C20]
(see: Linear ordering problem)
spanning arborescence problem see: capacitated minimum —
spanning network see: multiphase —
spanning tree
[90-XX]
(see: Survivable networks)
spanning tree
[68T99, 90C27]
(see: Capacitated minimum spanning trees)
spanning-tree see: minimum —; minimum ratio —
spanning tree problem see: bounded degree minimum —;
capacitated minimum —; minimum —;
resource-constrained minimum —
spanning tree solution
[90C35]
(see: Minimum cost flow problem)
spanning tree structure
[90C35]
(see: Minimum cost flow problem)
spanning tree structure see: feasible —; optimal —
spanning trees see: Capacitated minimum —; minimum —
sparse doublet
[65K05, 90C30]
(see: Automatic differentiation: calculation of the Hessian)
sparse least squares problem
[65Fxx]
(see: Least squares problems)
sparse matrix
[90C25, 90C30]
(see: Solving large scale and sparse semidefinite programs;
Successive quadratic programming: full space methods)
sparse matrix
[90C25, 90C30]
(see: Solving large scale and sparse semidefinite programs)
sparse semidefinite programs see: Solving large scale and —
sparse triplet
[65K05, 90C30]
(see: Automatic differentiation: calculation of the Hessian)
sparsity
[49-04, 65K05, 65Y05, 68N20, 90C30]
(see: Automatic differentiation: calculation of Newton steps;
Automatic differentiation: parallel computation)
sparsity 
[65K05, 90C25, 90C30] 
(see: Automatic differentiation: calculation of Newton steps; 
Successive quadratic programming: full space methods; 
Successive quadratic programming: solution by active sets 
and interior point methods) 
spatial branch and bound see: reformulation/ — 
spatial Characteristic 
(see: State of the art in modeling agricultural systems) 
spatial competition facility location model 
[65K10, 90C31]
(see: Sensitivity analysis of variational inequality problems)
spatial Cournot-Nash equilibrium
[91B06, 91B60]
(see: Oligopolistic market equilibrium)
spatial interaction
[90B80, 90C10]
(see: Facility location problems with spatial interaction) 
spatial interaction see: Facility location problems with — 
spatial-interaction model 
[90C26]
(see: MINLP: application in facility location-allocation)
spatial markets
[91B28, 91B50]
(see: Spatial price equilibrium)
spatial markets see: aspatial and —
spatial oligopoly model
[91B06, 91B60]
(see: Oligopolistic market equilibrium) 
Spatial price equilibrium 
(91B50, 91B28) 
(referred to in: Equilibrium networks; Financial 
equilibrium; Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Oligopolistic market equilibrium; Traffic network 
equilibrium; Walrasian price equilibrium) 
(refers to: Equilibrium networks; Financial equilibrium; 
Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Oligopolistic 
market equilibrium; Traffic network equilibrium; 
Walrasian price equilibrium) 
spatial price equilibrium 
[65K10, 90C31, 90C35, 91B28, 91B50] 
(see: Multicommodity flow problems; Sensitivity analysis of 
variational inequality problems; Spatial price equilibrium) 
spatial price equilibrium 
[90C30] 
(see: Equilibrium networks) 
spatial price equilibrium problem 
[91B28, 91B50] 
(see: Spatial price equilibrium) 
spatial price equilibrium problem 
[90C30] 
(see: Equilibrium networks) 
spatial price equilibrium problem see: network structure of 
the — 


spatial segmentation 
[90C90] 
(see: Optimization in medical imaging) 
SPE 
[90C35] 
(see: Multicommodity flow problems) 
special facilities see: residents of — 
special functions: algorithms and complexity see: Regression 
by — 
special model features 
(see: Planning in the process industry) 
special properties of crisp relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
special properties of fuzzy relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
special properties of heterogeneous relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
special properties of homogeneous relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
special properties of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
special structure mixed integer aBB algorithm 
[65K05, 90C11, 90C26] 
(see: MINLP: global optimization with αBB)
species 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical
equilibrium) 
species see: Component — 


species index set 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 
specific see: unit- — 
specific mathematical areas see: software package for — 
spectral method see: Galerkin — 
Spectral projected gradient methods 
(49M07, 49M10, 65K, 90C06, 90C20) 
spectral radius 
[65K05, 90Cxx] 
(see: Symmetric systems of linear equations) 
spectroscopic visual binary star 
[90C26, 90C90] 
(see: Global optimization in binary star astronomy) 
spectrum see: higher-order — 
Spedicato algorithms for linear equations and linear least 
squares see: Abaffi-Broyden- — 
speedup see: linear — 
sphere see: minimax location on a —; minimum — 
sphere growing technique
[90C35] 
(see: Feedback set problems) 
sphere method see: largest inscribed — 
sphere problem see: minimum — 
spherical reduction 
[65K05, 90C30] 
(see: Bisection global optimization methods) 
split see: phase — 
split-variable formulation of SP 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 
splitting 
[90C30] 
(see: Cost approximation algorithms) 
splitting see: operator —; regular Q- — 
splitting algorithm see: operator —; principal variation —; 
tree- — 
splitting field 
[01A50, 01A55, 01A60] 
(see: Fundamental theorem of algebra) 
splitting method 
[90C25, 90C33, 90C55] 
(see: Splitting method for linear complementarity 
problems) 
Splitting method for linear complementarity problems 
(90C25, 90C55, 90C25, 90C33) 
(referred to in: Linear complementarity problem) 
(refers to: Lagrangian multipliers methods for convex 
programming; Linear complementarity problem) 
splitting methods in quadratic programming see: matrix — 
splitting Newton method 
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method)
splitting rule 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem)
splitting/unsplitting of load 
[00-02, 01-02, 03-02] 
(see: Vehicle routing problem with simultaneous pickups 
and deliveries) 
splitting variable representation 
[90C30, 90C35] 
(see: Optimization in water resources) 
splitting variables 
[90C30, 90C35] 
(see: Optimization in water resources)
SPLP 
[90B80, 90B85] 
(see: Warehouse location problem)
spot see: cold —; hot — 
spot interest rate 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in
finance) 
spot rate see: t-estimate of the — 
spot rate for bonds with constant maturities see: estimating 
the — 
spot rate estimation see: t-programmed problem of — 


spots see: cold —; hot — 

SPP 
[05-04, 90C27] 

(see: Evolutionary algorithms in combinatorial 
optimization) 

SPR 
[90C10, 90C15] 

(see: Stochastic vehicle routing problems) 

spread 
[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem) 

SPT 
[90B10, 90C27] 

(see: Shortest path tree algorithms) 

SQG 
[90C15] 

(see: Derivatives of probability measures) 

SQG see: single run — 

SQG methods
[90C15]

(see: Two-stage stochastic programming: quasigradient
method)

(SQG) methods see: stochastic Quasigradient — 

SQG projection methods
[90C15]

(see: Two-stage stochastic programming: quasigradient
method)

SQP
[90C30, 90C90, 93-XX] 

(see: Dynamic programming: optimal control applications; 
Successive quadratic programming: applications in the 
process industry) 

SQP 
[65L99, 90C30, 90C90, 93-XX] 

(see: Optimization strategies for dynamic systems; 
Successive quadratic programming: applications in the 
process industry) 

SQP see: convex —; full space —; multiplier-free reduced 
Hessian —; nonconvex —; quadratic programming 
problem in —; reduced space — 

SQP approach see: Optimization with equilibrium constraints: 
A piecewise — 

SQP method see: full space —; reduced Hessian — 

SQP optimization in industrial problems 

[90C30, 90C90]

(see: Successive quadratic programming: applications in the
process industry)

SQP type algorithm 

[90C05, 90C25, 90C30, 90C34]

(see: Semi-infinite programming: discretization methods) 

SQPIP 

[49K20, 49M99, 90C55]
(see: Sequential quadratic programming: interior point
methods for distributed optimal control problems) 

SQPIP methods see: affine scaling —; primal-dual — 

square see: equal circles in a —

square composition of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

square merit function see: list — 

square product of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

square-root-free Givens transformation 
[15A23, 65F05, 65F20, 65F22, 65F25] 
(see: QR factorization) 

square-root method 
[65Fxx] 
(see: Least squares problems) 

square-root transformation 
[90C11, 90C90] 
(see: MINLP: trim-loss problem) 

square-root transformation see: logarithmic and —; 
modified — 

square statistic see: Pearson chi- — 

squared error see: sum of — 

squared Euclidean distance location problem 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 

squares see: Abaffi-Broyden-Spedicato algorithms for linear 
equations and linear least —; ABS algorithms for linear 
equations and linear least —; generalized nonlinear least —; 
Generalized total least —; least —; linear least —; method 
of least —; nonlinear least —; sum of —; weighted least — 

squares algorithm see: recursive least — 

squares criterion see: least — 

squares distance function see: least — 

squares formal orthogonal polynomials see: least — 

squares formulation see: least — 

squares: Newton-type methods see: Nonlinear least — 

squares orthogonal polynomials see: Least — 

squares problem see: consistent least —; generalized least —; 
generalized nonlinear least —; least —; perturbed least —; 
sparse least —; total least —; unconstrained nonlinear 
least —; weighted least — 

squares problems see: Complexity and large-scale least —; 
Least —; Nonlinear least — 

squares problems with massive data sets see: least — 

squares, relation to Newton's method see: Gauss-Newton 
method: Least — 

squares solutions see: least — 

squares: trust region methods see: Nonlinear least — 

SR 
[90C15, 90C30, 90C99] 
(see: SSC minimization algorithms) 

SR1 quasi-Newton method
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 


SR1 update
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

SSC 
[90C15, 90C30, 90C99] 
(see: SSC minimization algorithms) 

SSC minimization algorithms 
(90C30, 90C15, 90C99) 
(referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; 
Inequality-constrained nonlinear optimization; SSC 
minimization algorithms for nonsmooth and stochastic 
optimization) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; 
Inequality-constrained nonlinear optimization; SSC 
minimization algorithms for nonsmooth and stochastic 
optimization) 

SSC minimization algorithms for nonsmooth and stochastic 
optimization 
(90C30, 90C15, 90C99) 
(referred to in: Equality-constrained nonlinear 
programming: KKT necessary optimality conditions; 
Inequality-constrained nonlinear optimization; SSC 
minimization algorithms) 
(refers to: Equality-constrained nonlinear programming: 
KKT necessary optimality conditions; 
Inequality-constrained nonlinear optimization; SSC 
minimization algorithms) 

SSC-SABB algorithm 
[90C15, 90C30, 90C99] 
(see: SSC minimization algorithms for nonsmooth and 
stochastic optimization) 

SSC-SABB algorithm see: nonsmooth — 

SSC-SBB algorithm

[90C15, 90C30, 90C99]

(see: SSC minimization algorithms for nonsmooth and
stochastic optimization)

SSM 

[90C30] 

(see: Sequential simplex method)

SSS-2 

[49J35, 49K35, 62C20, 91A05, 91A40] 

(see: Minimax game tree searching)

SSS* 

[49J35, 49K35, 62C20, 91A05, 91A40] 

(see: Minimax game tree searching)

SSS -dual 

[49J35, 49K35, 62C20, 91A05, 91A40] 

(see: Minimax game tree searching)

stability 

[05C15, 05C17, 05C35, 05C69, 90C05, 90C11, 90C15, 90C22,
90C25, 90C29, 90C30, 90C31, 90C35]

(see: Lovasz number; Nondifferentiable optimization:
parametric programming; Stochastic integer programming:
continuity, stability, rates of convergence; Suboptimal
control)

stability 
[49K27, 49K40, 90C05, 90C22, 90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and
duality; First order constraint qualifications;
Nondifferentiable optimization: parametric programming;
Second order constraint qualifications; Semidefinite
programming: optimality conditions and stability)
stability see: assumption —; asymptotic —; asymptotical
system —; elastic —; global structural —; Lipschitz —;
phase —; radius of —; regions of —; robust —; Schur —; 
Semidefinite programming: optimality conditions and —; 
Sensitivity and stability in NLP: continuity and 
differential —; structural —; system —; topological — 
stability analysis 
[90B15, 90C15, 90C31, 90C33] 
(see: Approximation of extremum problems with 
probability functionals; Dynamic traffic networks; 
Sensitivity analysis of complementarity problems) 
stability analysis 
[90C11, 90C15, 90C31] 
(see: Stochastic integer programming: continuity, stability, 
rates of convergence) 
stability analysis see: robust — 
stability analysis of optimization problems 
[90C11, 90C15, 90C31] 
(see: Stochastic integer programming: continuity, stability, 
rates of convergence) 
stability criterion see: sector — 
stability of dynamic systems see: Quasidifferentiable 
optimization: — 
stability at an equilibrium 
[90B15] 
(see: Dynamic traffic networks) 
stability at an equilibrium see: asymptotical — 
Stability margin 
[93D09] 
(see: Robust control) 
stability margin see: multivariable — 
stability margin K see: multivariable — 
stability in mechanics see: smooth potentials and — 
stability in NLP see: Sensitivity and — 
stability in NLP: approximation see: Sensitivity and — 
stability in NLP: continuity and differential stability see: 
Sensitivity and — 
stability number 
[05C15, 05C62, 05C69, 05C85, 68W01, 90C27, 90C59] 
(see: Heuristics for maximum clique and independent set; 
Optimization problems in unit-disk graphs) 
stability number see: weighted — 
stability of optimal trajectories see: Turnpike theory: — 
stability on parametric programming 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
stability in parametric programming see: structural —; 
topological — 
stability of polytopes of polynomials see: Robust control:
Schur —
stability problem 
[90B36] 
(see: Stochastic scheduling) 
stability problem see: phase — 
stability, rates of convergence see: Stochastic integer 
programming: continuity — 


stability of SIP see: structural — 

stability of SIP(f,h,g) see: global structural —

stability of a solution set 

[90C33]

(see: Topological methods in complementarity theory) 

stability of a stationary point see: strong — 

stability of a structural analysis system 

[49J52, 49Q10, 74G60, 74H99, 74K99, 74Pxx, 90C90]

(see: Quasidifferentiable optimization: stability of dynamic
systems)

stability of a system 

[90B15]

(see: Dynamic traffic networks) 

stability of a system see: asymptotical — 

stabilization 

[90C06, 90C15]
(see: Stabilization of cutting plane algorithms for stochastic 
linear programming problems) 

Stabilization of cutting plane algorithms for stochastic linear 
programming problems 
(90C15, 90C06) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; Static 
stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual
method; Simple recourse problem: primal method; Static 
stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 


stable
[90B36]
(see: Stochastic scheduling)

stable see: asymptotically —

stable bilinear programming
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and
duality)

stable function
[90C06]
(see: Saddle point theory and optimality conditions)

stable marriage problem
[90C05, 90C10, 90C27, 90C35]
(see: Assignment and matching)

stable optimal solution see: strongly —

stable parametric programming
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and
duality)

stable primal problem see: J- —

stable problem
[90C26]
(see: Cutting plane methods for global optimization)

stable set
[05C15, 05C62, 05C69, 05C85, 68W01, 90C27, 90C35, 90C59]
(see: Graph coloring; Heuristics for maximum clique and
independent set; Optimization problems in unit-disk
graphs)

Stable set problem: branch & cut algorithms 
(90C27, 90C35) 
(refers to: Heuristics for maximum clique and independent 
set; Integer programming; Integer programming: branch 
and bound methods; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane 
algorithms; Lovasz number; Simplicial pivoting algorithms 
for integer programming) 


stable solution
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and
stability)

stable solution see: Lipschitz —; strongly — 

stable stationary point see: asymptotically — 


stable without pivoting see: guaranteed to be — 

Stackelberg game 
[49-01, 49K10, 49K45, 49M37, 49N10, 90-01, 90C05, 90C20, 
90C26, 90C27, 90C31, 91A65, 91B52] 
(see: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave
programs; Bilevel programming: implicit function
approach)

Stackelberg game 
[49-01, 49K10, 49K45, 49M37, 49N10, 90-01, 90B30, 90B50, 
90C05, 90C15, 90C20, 90C26, 90C27, 90C30, 90C31, 90C33, 
90C90, 91A65, 91B32, 91B52, 91B74] 
(see: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave
programs; Bilevel programming: global optimization;
Bilevel programming: implicit function approach; Bilevel
programming in management; Stochastic bilevel programs)


Stackelberg game see: von — 

Stackelberg game theory 

[90C30, 90C90] 

(see: Bilevel programming: global optimization)

Stackelberg games see: von — 

Stackelberg-Nash-Cournot equilibrium 

[90C15, 90C26, 90C33] 

(see: Stochastic bilevel programs)

Stackelberg-Nash equilibrium

[90C15, 90C26, 90C33]

(see: Stochastic bilevel programs)

Stackelberg problems 

[90C26, 90C30, 90C31] 

(see: Bilevel programming: introduction, history and
overview) 

Stackelberg solution see: feedback — 

stacker Crane Problem (SCP) 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 

(staff planning) see: scheduling — 

stage see: average cost per —; conceptual design —; detailed 
design —; discounted problem with bounded cost per —; 
preliminary design —; second- — 

stage decision see: first- —; second- — 

stage decisions see: first- —; second- — 

stage feasibility set see: second- — 

stage IM see: single- — 

stage inventory management models see: single — 

stage-length see: variable — 

stage model see: linear two- — 

stage problem see: average cost per — 

stage problems see: average cost per —; Dynamic 
programming: average cost per — 

stage stochastic linear program see: two- — 

stage stochastic program with recourse see: two- — 

stage stochastic programming see: two- — 

stage stochastic programming models see: two- — 

stage stochastic programming problem see: dynamic two- — 

stage stochastic programming: quasigradient method see: 
Two- — 

stage stochastic programs see: multi- —

stage stochastic programs with recourse see: L-shaped 
method for two- —; two- — 




stage stochastic programs with simple integer recourse see: 
two- — 

staircase arc cost function 

[90B10]

(see: Piecewise linear network flow problems) 

staircase cost 

[90B80, 90C11] 

(see: Facility location with staircase costs)

staircase cost function

[90B80, 90C11] 

(see: Facility location with staircase costs)

staircase costs see: convex piecewise linearization in facility 
location problems with —; Facility location with —; 
heuristics of facility location problems with —; linearization 
in facility location problems with —; solution of facility 
location problems with — 

stalling 

[90C60] 

(see: Complexity of degeneracy) 


stamped models see: time- — 

standard 

[90C31, 90C34, 90C46] 

(see: Generalized semi-infinite programming: optimality
conditions) 

standard see: CG- — 

standard auction algorithm see: modified — 

standard determinant expansion of a matrix 
[90C09, 90C10] 
(see: Combinatorial matrix analysis) 

standard form 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 

standard form see: constraints in —; linear optimization 
problem in —; matrix in — 

standard function 
[90C26, 90C30] 
(see: Bounding derivative ranges) 

standard greedy form 
[90B10, 90B80, 90C35] 
(see: Network location: covering problems) 

standard methods see: unbounded controls and non — 

standard for minimizing q see: CG- — 

standard mollifier 

[57R12, 90C31, 90C34] 

(see: Smoothing methods for semi-infinite optimization)

standard moment problem 

[28-XX, 49-XX, 60-XX] 

(see: General moment optimization problems)

standard monomials 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods)

standard pair decomposition of a monomial ideal 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods)

standard pair of a monomial ideal 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods)




standard part map 
[03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 

standard quadratic optimization problem 
[90C20] 
(see: Standard quadratic optimization problems: 
algorithms; Standard quadratic optimization problems: 
applications; Standard quadratic optimization problems: 
theory) 

Standard quadratic optimization problems: algorithms 
(90C20) 
(referred to in: Complexity theory: quadratic programming; 
Linear ordering problem; Quadratic assignment problem; 
Quadratic fractional programming: Dinkelbach method; 
Quadratic knapsack; Quadratic programming with bound 
constraints; Quadratic programming over an ellipsoid; 
Standard quadratic optimization problems: applications; 
Standard quadratic optimization problems: theory) 
(refers to: Complexity theory: quadratic programming; 
Interval analysis: eigenvalue bounds of interval matrices; 
Quadratic assignment problem; Quadratic fractional 
programming: Dinkelbach method; Quadratic knapsack; 
Quadratic programming with bound constraints; Quadratic 
programming over an ellipsoid; Semidefinite 
programming: optimality conditions and stability; 
Simplicial decomposition; Standard quadratic optimization 
problems: applications; Standard quadratic optimization 
problems: theory) 

Standard quadratic optimization problems: applications 
(90C20) 
(referred to in: Complexity theory: quadratic programming; 
Linear ordering problem; Quadratic assignment problem; 
Quadratic fractional programming: Dinkelbach method; 
Quadratic knapsack; Quadratic programming with bound 
constraints; Quadratic programming over an ellipsoid; 
Standard quadratic optimization problems: algorithms; 
Standard quadratic optimization problems: theory) 
(refers to: Complexity theory: quadratic programming; 
Portfolio selection and multicriteria analysis; Quadratic 
assignment problem; Quadratic fractional programming: 
Dinkelbach method; Quadratic knapsack; Quadratic 
programming with bound constraints; Quadratic 
programming over an ellipsoid; Simplicial decomposition; 
Standard quadratic optimization problems: algorithms; 
Standard quadratic optimization problems: theory) 

Standard quadratic optimization problems: theory 
(90C20) 
(referred to in: Complexity theory: quadratic programming; 
Linear ordering problem; Quadratic assignment problem; 
Quadratic fractional programming: Dinkelbach method; 
Quadratic knapsack; Quadratic programming with bound 
constraints; Quadratic programming over an ellipsoid; 
Reverse convex optimization; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications) 
(refers to: BB algorithm; Complexity theory: quadratic 
programming; D.C. programming; Interior point methods 
for semidefinite programming; Quadratic assignment 
problem; Quadratic fractional programming: Dinkelbach 
method; Quadratic knapsack; Quadratic programming with 
bound constraints; Quadratic programming over an 
ellipsoid; Reverse convex optimization; Sequential
quadratic programming: interior point methods for 
distributed optimal control problems; Standard quadratic 
optimization problems: algorithms; Standard quadratic 
optimization problems: applications) 
standard SD problem 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
standard simplex 
[65K05, 90C20, 90C30] 
(see: Bisection global optimization methods; Standard 
quadratic optimization problems: algorithms; Standard 
quadratic optimization problems: applications; Standard 
quadratic optimization problems: theory) 
standard state 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 
standard traffic equilibrium problem 
[90B06, 90B20, 91B50]
(see: Traffic network equilibrium) 
Stanley-Reisner ideal 
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods) 
star 
[68T99, 90C27]
(see: Capacitated minimum spanning trees) 
star 
[90C26, 90C90]
(see: Global optimization in binary star astronomy) 
star see: binary —; double —; spectroscopic visual binary —; 
visual binary — 
star astronomy see: Global optimization in binary — 
star cluster 
[62H30, 90C27] 
(see: Assignment methods in clustering) 
star network 
[05A18, 05D15, 68M07, 68M10, 68Q25, 68R05]
(see: Maximum partition matching) 
star-shaped set 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
start see: best — 
start interior-point algorithm see: infeasible- — 
start state of a Turing machine 
[90C60] 
(see: Complexity classes in optimization) 
start temperature 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
start-ups see: well-defined — 
starting region see: safe — 
state 
[49K20, 49M99, 90C55] 
(see: Sequential quadratic programming: interior point 
methods for distributed optimal control problems) 
state see: accessible —; equation of —; ground —; optimal 
steady —; standard — 
State of the art in modeling agricultural systems 


state constraint 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
state constraints 
[49K05, 49K10, 49K15, 49K20, 93-XX] 
(see: Direct search Luus-Jaakola optimization procedure;
Duality in optimal control with first order differential 
equations) 
state distribution density see: steady- — 
state equations 
[49K05, 49K10, 49K15, 49K20, 49M99, 90C55] 
(see: Duality in optimal control with first order differential 
equations; Sequential quadratic programming: interior 
point methods for distributed optimal control problems) 
state equations see: generalized — 
state feedback see: incomplete — 
state inequality constraint 
[93-XX] 
(see: Dynamic programming: optimal control applications)
state of knowledge
[94A17]
(see: Jaynes’ maximum entropy principle)
state Markov chain see: finite- —; stationary- — 
state polyhedron 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
state polytope 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods)
state problem 
[49J20, 49J52]
(see: Shape optimization)
state problem see: regularizing — 
state relation 
[49J20, 49J52]
(see: Shape optimization)
state space graphs see: searching — 
state space search algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching) 
state space search algorithm see: optimal —; recursive —; 
synchronized distributed — 
state of a system 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
state-task-network 
[90C26] 
(see: MINLP: design and scheduling of batch processes) 
state of a Turing machine 
[90C60] 
(see: Complexity theory) 
state of a Turing machine see: accepting —; control —; 
final —; start — 
state vector 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 
states see: recurrent class of —; transient class of — 
states of plants see: tracing the — 
static

[90B06]

(see: Vehicle routing) 

static deterministic problem 

[90B06]

(see: Vehicle routing) 

static/dynamic service needs 

[00-02, 01-02, 03-02]

(see: Vehicle routing problem with simultaneous pickups 

and deliveries) 

static load balancing 

[65K05, 65Y05, 65Y10, 65Y20, 68W10]

(see: Interval analysis: parallel methods for global 

optimization) 

static model 

[90B80, 90B85]
(see: Warehouse location problem) 

static multifacility see: discrete single-commodity 
single-criterion uncapacitated — 

static problems see: Hemivariational inequalities: — 

Static resource constrained project scheduling 

Static stochastic programming models 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic
programming: barycentric approximation; Preprocessing
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

Static stochastic programming models: conditional 
expectations 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Stochastic integer programming: 
continuity, stability, rates of convergence; Stochastic 
integer programs; Stochastic linear programming: 
decomposition and cutting planes; Stochastic linear 
programs with recourse and arbitrary multivariate 
distributions; Stochastic network problems: massively 
parallel solution; Stochastic programming: minimax 
approach; Stochastic programming models: random 
objective; Stochastic programming: nonanticipativity and 
lagrange multipliers; Stochastic programs with recourse: 
upper bounds; Stochastic vehicle routing problems; 
Two-stage stochastic programs with recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage
stochastic programs with recourse; Multistage stochastic
programming: barycentric approximation; Preprocessing
in stochastic programming; Probabilistic constrained linear
programming: duality theory; Probabilistic constrained
problems: convexity theory; Simple recourse problem: dual
method; Simple recourse problem: primal method;
Stabilization of cutting plane algorithms for stochastic
linear programming problems; Static stochastic
programming models; Stochastic integer programming:
continuity, stability, rates of convergence; Stochastic 
integer programs; Stochastic linear programming: 
decomposition and cutting planes; Stochastic linear 
programs with recourse and arbitrary multivariate 
distributions; Stochastic network problems: massively 
parallel solution; Stochastic programming: minimax 
approach; Stochastic programming models: random 
objective; Stochastic programming: nonanticipativity and 
lagrange multipliers; Stochastic programming with simple 
integer recourse; Stochastic programs with recourse: upper 
bounds; Stochastic quasigradient methods in minimax 
problems; Stochastic vehicle routing problems; Two-stage 
stochastic programming: quasigradient method; Two-stage 
stochastic programs with recourse) 

static TS
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)
station see: arr- —; arrival- —; dep- —; departure- —
stationarity
[49K27, 58C20, 58E30, 65K05, 90C26, 90C33, 90C34, 90C48]
(see: Adaptive convexification in semi-infinite optimization;
Nonsmooth analysis: Fréchet subdifferentials)
stationarity see: Nonsmooth analysis: weak —; weak —
stationarity conditions see: KKT —
stationary
[05C60, 05C69, 37B25, 49L20, 49L99, 49M07, 49M10, 65K,
90C05, 90C06, 90C20, 90C22, 90C25, 90C27, 90C30, 90C31,
90C33, 90C34, 90C35, 90C40, 90C59, 91A22]
(see: Dynamic programming: average cost per stage
problems; Dynamic programming: undiscounted problems;
Optimization with equilibrium constraints: A piecewise
SQP approach; Replicator dynamics in combinatorial
optimization; Semidefinite programming: optimality
conditions and stability; Semi-infinite programming:
discretization methods; Spectral projected gradient
methods)
stationary see: inf- —
stationary point
[49M29, 58C20, 58E30, 65K05, 65K10, 90C06, 90C26, 90C31,
90C34, 90C39, 90C46, 90C48, 90C90, 90Cxx]
(see: Dini and Hadamard derivatives in optimization;
Dynamic programming and Newton’s method in
unconstrained optimal control; Local attractors for
gradient-related descent iterations; Nonsmooth analysis:
weak stationarity; Parametric global optimization:
sensitivity; Second order optimality conditions for
nonlinear optimization; Variational inequalities: projected
dynamical system)
stationary point
[65K10, 90C26, 90C39, 90C90]
(see: Second order optimality conditions for nonlinear
optimization; Variational inequalities: projected dynamical
system)
stationary point see: oo- —; asymptotically stable —; Dini
sup- —; ε- —; Hadamard oo- —; Hadamard sup- —; inf- —;
isolated —; regular —; strong stability of a —; sup- —
stationary points
[90C20]
(see: Standard quadratic optimization problems:
algorithms)
stationary points see: inf- — 
stationary policy
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path
problems; Dynamic programming: undiscounted
problems)
stationary regime
[60J05, 90C15]
(see: Derivatives of markov processes and their
simulation)
stationary-state Markov chain
[49L99]
(see: Dynamic programming: average cost per stage
problems)
stations see: Automatic differentiation: geometry of satellites
and tracking —; one-hop neighboring —; tracking —
statistic see: generalized single cluster —; Goodman-Kruskal
Th —; Mann-Whitney —; Pearson chi-square —; Rand —
statistical analysis see: exploratory — 
statistical classification 
[62H30, 90C11] 
(see: Statistical classification: optimization approaches)
Statistical classification: optimization approaches
(62H30, 90C11)
(referred to in: Linear programming models for
classification; Mixed integer classification problems;
Optimization in boolean classification problems;
Optimization in classifying text documents)
(refers to: Linear programming models for classification;
Mixed integer classification problems; Optimization in
boolean classification problems; Optimization in classifying
text documents)
Statistical convergence and turnpike theory
(40A05, 49J24)
(referred to in: Turnpike theory: stability of optimal
trajectories)
(refers to: Turnpike theory: stability of optimal trajectories)
statistical inference see: order restricted — 
statistical method see: nonparametric — 
statistical models
[90C30]
(see: Global optimization based on statistical models)
statistical models see: Global optimization based on — 
statistical pattern classification
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based
methods)
statistical pattern recognition
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 
statistical procedures 
[90C26, 90C30] 
(see: Forecasting)
statistical representation of cutting plane coefficients 
[90C06, 90C15]
(see: Stochastic linear programming: decomposition and
cutting planes) 
statistics 
[90C26, 90C30]
(see: Forecasting)
statistics see: algebraic —; higher-order —; Signal processing 
with higher order — 
status of the wells see: operational — 
steady state see: optimal — 
steady-state distribution density 
[60G35, 65K05] 
(see: Differential equations and global optimization) 
steamer problem see: tramp — 
steep directions 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 
steepest ascent see: rate of —; rule of — 
steepest ascent direction 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 
steepest ascent direction 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
steepest ascent direction see: Dini —; Hadamard — 
steepest descent 
[46N10, 68T20, 68T99, 90-00, 90C27, 90C47, 90C59] 
(see: Metaheuristics; Nondifferentiable optimization)
steepest descent see: ε- —; method of —; rate of —
steepest descent algorithm 
[60G35, 62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Differential equations and global optimization; 
Monte-Carlo simulations for stochastic optimization) 
steepest-descent direction 
[90C05, 90C22, 90C25, 90C30, 90C51, 90Cxx] 
(see: Interior point methods for semidefinite programming;
Quasidifferentiable optimization: optimality conditions; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework) 
steepest descent direction 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
steepest descent direction see: Dini —; Hadamard — 
steepest descent method 
[49M29, 65K10, 90C06, 90C30] 
(see: Cost approximation algorithms; Large scale 
unconstrained optimization; Local attractors for 
gradient-related descent iterations) 


steepest descent vector 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations) 
steepest edge simplex method 
[90C05] 
(see: Linear programming: Klee-Minty examples) 
Stein estimators 
[91B28] 
(see: Portfolio selection: markowitz mean-variance model) 
Stein estimators see: James- —
Steiner arborescence 
[90C27] 
(see: Steiner tree problems) 
Steiner arborescence see: minimum — 
Steiner arborescence tree see: rectilinear — 
Steiner graphical traveling salesman problem 
[90B20]
(see: General routing problem) 
Steiner minimal tree 
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems) 
Steiner minimal tree problem 
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems) 
Steiner minimum tree 
[90C27]
(see: Steiner tree problems) 
Steiner network see: multiphase — 
Steiner nodes 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs) 
Steiner point 
[90C27]
(see: Steiner tree problems) 
Steiner points 
[05C05, 05C40, 68Q20, 68R10, 90C27, 90C35]
(see: Network design problems; Optimal triangulations; 
Steiner tree problems) 
Steiner points see: Steiner tree problem with minimum 
number of — 
Steiner problem 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
Steiner problems see: multiphase — 
Steiner ratio 
[05C05, 05C40, 05C85, 68Q25, 68R10, 90B80, 90C35] 
(see: Bottleneck steiner tree problems; Network design 
problems) 
Steiner ratio 
[90C27] 
(see: Steiner tree problems) 
Steiner ratio see: bottleneck —; Euclidean —; k- — 
Steiner ratio in Banach spaces 
[90C27] 
(see: Steiner tree problems) 
Steiner ratio of biomolecular structures 
(90C27) 
Steiner tree
[05-04, 05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Evolutionary algorithms in combinatorial 
optimization; Optimization problems in unit-disk graphs) 
Steiner tree
[90C27]
(see: Steiner tree problems)
Steiner tree see: full —; min-max —; rectilinear — 


Steiner tree problem with minimum number of Steiner points 
[90C27] 
(see: Steiner tree problems) 

Steiner tree problems 
(90C27) 
(referred to in: Auction algorithms; Bottleneck steiner tree 
problems; Communication network assignment problem; 
Dynamic traffic networks; Equilibrium networks; 
Generalized networks; Maximum flow problem; Minimum 
cost flow problem; Multicommodity flow problems; 
Network design problems; Network location: covering 
problems; Nonconvex network flow problems; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Stochastic network problems: massively parallel 
solution; Survivable networks; Traffic network equilibrium) 


(refers to: Auction algorithms; Bottleneck steiner tree 
problems; Communication network assignment problem; 
Directed tree networks; Dynamic traffic networks; 
Equilibrium networks; Evacuation networks; Generalized 
networks; Maximum flow problem; Minimum cost flow 
problem; Network design problems; Network location: 
covering problems; Nonconvex network flow problems; 
Piecewise linear network flow problems; Shortest path tree 
algorithms; Stochastic network problems: massively parallel 
solution; Survivable networks; Traffic network equilibrium) 
steiner tree problems see: Bottleneck — 
Steiner trees see: bottleneck —; variations of — 
Steiner triangulation see: minimum weight — 
Steiner-Weber problem 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 
Steinhauser function see: Kreisselmeier- — 
step see: analysis —; bounding —; branching —; 
computational —; coordination —; decomposition —; 
descent —; escape —; fathoming —; insertion —; long 
serious —; Newton —; null —; preconditioning —; 
relaxation —; selection —; short serious —; time- —; trial — 
step case of the trust region problem see: Newton — 
step function 
[90C26] 
(see: MINLP: application in facility location-allocation) 
step Gauss-Newton method see: full- —
step size 
[90C05, 90C22, 90C25, 90C30, 90C51] 
(see: Interior point methods for semidefinite programming) 
step-size rule see: divergent series — 
step superlinear see: 2- — 
steplength 
[49M29, 65K10, 90C06] 
(see: Local attractors for gradient-related descent iterations) 


steplength see: compute the —; compute a safeguarded new 
trial — 

steplength rule see: Armijo —; divergent series — 

steps see: acceleration —; Automatic differentiation: 
calculation of Newton —; average number of pivot —; 
expected number of pivot — 

stepsize 
[90C30] 
(see: Frank-Wolfe algorithm) 

Stewart reorthogonalized Gram-Schmidt algorithm see: 
Daniel-Gragg-Kaufmann- — 

sTF
[90B35, 90C11, 90C30]
(see: Robust optimization: mixed-integer linear programs)

Stiefel algorithm see: Hestenes- —

Stiemke theorem
[90C05, 90C30]
(see: Theorems of the alternative and optimization)

Stiemke transposition theorem
[15A39, 90C05]
(see: Tucker homogeneous systems of linear relations)

stiff problem
[65Fxx]
(see: Least squares problems)

stiffness matrix
[49M37, 65K05, 90C30]
(see: Structural optimization)

sTO
[90B35, 90C11, 90C30]
(see: Robust optimization: mixed-integer linear programs)

STO problem see: nested — 

stochastic approach
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding)

stochastic approach to optimization in water resources
[90C30, 90C35]
(see: Optimization in water resources)

stochastic approximation
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)

stochastic approximation
[90C15]
(see: Extremum problems with probability functions: kernel
type solution methods)

stochastic bilevel program
[90C15, 90C26, 90C33]
(see: Stochastic bilevel programs)

Stochastic bilevel programs
(90C15, 90C26, 90C33)
(referred to in: Bilevel linear programming; Bilevel linear
programming: complexity, equivalence to minmax, concave
programs; Bilevel optimization: feasibility test and
flexibility index; Bilevel programming; Bilevel
programming: applications; Bilevel programming: global
optimization; Bilevel programming: implicit function
approach; Bilevel programming: introduction, history and
overview; Bilevel programming in management; Bilevel
programming: optimality conditions and duality;
Multilevel methods for optimal design; Multilevel 
optimization in mechanics) 
(refers to: Bilevel fractional programming; Bilevel linear 
programming; Bilevel linear programming: complexity, 
equivalence to minmax, concave programs; Bilevel 
optimization: feasibility test and flexibility index; Bilevel 
programming; Bilevel programming: applications; Bilevel 
programming: applications in engineering; Bilevel 
programming: implicit function approach; Bilevel 
programming: introduction, history and overview; Bilevel 
programming in management; Bilevel programming: 
optimality conditions and duality; Multilevel methods for 
optimal design; Multilevel optimization in mechanics) 

stochastic bilevel programs see: algorithms for — 

stochastic branch and bound
[90C15, 90C27]
(see: Discrete stochastic optimization)

stochastic combinatorial optimization
[90C15, 90C27]
(see: Discrete stochastic optimization)

stochastic counterpart method
[62F12, 65C05, 65K05, 90C15, 90C31]
(see: Monte-Carlo simulations for stochastic optimization)

stochastic decomposition
[62F12, 65C05, 65K05, 90C15, 90C26, 90C31, 90C33]
(see: Monte-Carlo simulations for stochastic optimization;
Stochastic bilevel programs; Two-stage stochastic
programming: quasigradient method)

stochastic decomposition 
[90C15] 
(see: Two-stage stochastic programming: quasigradient 
method) 

stochastic decomposition algorithm 
[90C06, 90C15] 
(see: Stochastic linear programming: decomposition and 
cutting planes) 

stochastic decomposition algorithm see: regularized — 

stochastic differential equation
[60G35, 65K05]
(see: Differential equations and global optimization)

stochastic discretization procedure
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming: discretization methods)

stochastic dynamic optimization problem
[90C15]
(see: Stochastic quasigradient methods: applications)

stochastic dynamic programming
[90B05, 90B06]
(see: Global supply chain models)

stochastic dynamic systems
[90C15]
(see: Stochastic quasigradient methods: applications)

Stochastic facility location model
[90C15]
(see: Stochastic quasigradient methods in minimax
problems)
stochastic flexibility
[90C90]
(see: Chemical process planning)


stochastic geometry
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)


stochastic global optimization
[92B05]
(see: Genetic algorithms)


Stochastic global optimization: stopping rules
(65K05, 90C26, 90C30, 65Cxx, 65C30, 65C40, 65C50, 65C60)
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Genetic algorithms for protein structure 
prediction; Global optimization based on statistical models; 
Global optimization: hit and run methods; Monte-Carlo 
simulated annealing in protein folding; Packet annealing; 
Random search methods; Simulated annealing; Simulated 
annealing methods in protein folding; Stochastic global 
optimization: two-phase methods) 

(refers to: Adaptive simulated annealing and its application 
to protein folding; Bayesian global optimization; Concave 
programming; D.C. programming; Genetic algorithms for 
protein structure prediction; Global optimization based on 
statistical models; Global optimization: hit and run 
methods; Monte-Carlo simulated annealing in protein 
folding; Packet annealing; Random search methods; 
Simulated annealing; Simulated annealing methods in 
protein folding; Stochastic global optimization: two-phase 
methods) 


Stochastic global optimization: two-phase methods
(65K05, 90C26, 90C30, 65Cxx, 65C30, 65C40, 65C50, 65C60)
(referred to in: Adaptive simulated annealing and its 
application to protein folding; Bayesian global 
optimization; Genetic algorithms for protein structure 
prediction; Global optimization based on statistical models; 
Global optimization: hit and run methods; Monte-Carlo 
simulated annealing in protein folding; Packet annealing; 
Random search methods; Simulated annealing; Simulated 
annealing methods in protein folding; Stochastic global 
optimization: stopping rules) 

(refers to: Adaptive simulated annealing and its application 
to protein folding; Bayesian global optimization; Concave 
programming; D.C. programming; Genetic algorithms for 
protein structure prediction; Global optimization based on 
statistical models; Global optimization: hit and run 
methods; Monte-Carlo simulated annealing in protein 
folding; Packet annealing; Random search methods; 
Simulated annealing; Simulated annealing methods in 
protein folding; Stochastic global optimization: stopping 
rules) 


stochastic integer program with recourse
[90C10, 90C15]
(see: Stochastic integer programs)


stochastic integer programming
[90C11, 90C15, 90C31]
(see: Stochastic integer programming: continuity, stability,
rates of convergence)
Stochastic integer programming: continuity, stability, rates of
convergence
(90C15, 90C11, 90C31)
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Branch and price: Integer 
programming with column generation; Decomposition 
techniques for MILP: lagrangian relaxation; Discretely 
distributed stochastic programs: descent directions and 
efficient points; Discrete stochastic optimization; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Graph coloring; Integer linear complementary problem; 
Integer programming; Integer programming: algebraic 
methods; Integer programming: branch and bound 
methods; Integer programming: branch and cut algorithms; 
Integer programming: cutting plane algorithms; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; MINLP: trim-loss problem; Multi-objective 
integer linear programming; Multi-objective mixed integer 
programming; Multiparametric mixed integer linear 
programming; Multistage stochastic programming: 
barycentric approximation; Parametric mixed integer 
nonlinear optimization; Preprocessing in stochastic 
programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Set covering, packing and 
partitioning problems; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Simplicial pivoting algorithms for integer programming; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programs; Stochastic linear programming: decomposition 
and cutting planes; Stochastic linear programs with 
recourse and arbitrary multivariate distributions; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programs with recourse: upper bounds; 
Stochastic vehicle routing problems; Time-dependent 
traveling salesman problem; Two-stage stochastic programs 
with recourse) 

(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Branch and price: Integer 
programming with column generation; Decomposition 
techniques for MILP: lagrangian relaxation; Discretely 
distributed stochastic programs: descent directions and 
efficient points; Extremum problems with probability 
functions: kernel type solution methods; General moment 
optimization problems; Integer linear complementary 
problem; Integer programming; Integer programming: 
algebraic methods; Integer programming: branch and 
bound methods; Integer programming: branch and cut 
algorithms; Integer programming: cutting plane
algorithms; Integer programming duality; Integer
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; Mixed integer classification problems; 
Multi-objective integer linear programming; 
Multi-objective mixed integer programming; Multistage 
stochastic programming: barycentric approximation; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; Set 
covering, packing and partitioning problems; Simple 
recourse problem: dual method; Simple recourse problem: 
primal method; Simplicial pivoting algorithms for integer 
programming; Stabilization of cutting plane algorithms for 
stochastic linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programs; Stochastic linear programming: decomposition 
and cutting planes; Stochastic linear programs with 
recourse and arbitrary multivariate distributions; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programming with simple integer recourse; 
Stochastic programs with recourse: upper bounds; 
Stochastic quasigradient methods in minimax problems; 
Stochastic vehicle routing problems; Time-dependent 
traveling salesman problem; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 


Stochastic integer programs
(90C15, 90C10)
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Branch and price: Integer 
programming with column generation; Decomposition 
techniques for MILP: lagrangian relaxation; Discretely 
distributed stochastic programs: descent directions and 
efficient points; Extremum problems with probability 
functions: kernel type solution methods; General moment 
optimization problems; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming: lagrangian relaxation; 
LCP: Pardalos-Rosen mixed integer formulation; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; MINLP: trim-loss 
problem; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Multistage stochastic programming: barycentric 
approximation; Parametric mixed integer nonlinear 
optimization; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Set covering, packing and partitioning problems;
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Simplicial pivoting algorithms 
for integer programming; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic linear programming: 
decomposition and cutting planes; Stochastic linear 
programs with recourse and arbitrary multivariate 
distributions; Stochastic network problems: massively 
parallel solution; Stochastic programming: minimax 
approach; Stochastic programming models: random 
objective; Stochastic programming: nonanticipativity and 
lagrange multipliers; Stochastic programs with recourse: 
upper bounds; Stochastic vehicle routing problems; 
Time-dependent traveling salesman problem; Two-stage 
stochastic programs with recourse) 

(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Branch and price: Integer 
programming with column generation; Decomposition 
techniques for MILP: lagrangian relaxation; Discretely 
distributed stochastic programs: descent directions and 
efficient points; Extremum problems with probability 
functions: kernel type solution methods; Fractional 
combinatorial optimization; General moment optimization 
problems; Integer linear complementary problem; Integer 
programming; Integer programming: algebraic methods; 
Integer programming: branch and bound methods; Integer 
programming: branch and cut algorithms; Integer 
programming: cutting plane algorithms; Integer 
programming duality; Integer programming: lagrangian 
relaxation; LCP: Pardalos-Rosen mixed integer 
formulation; Logconcave measures, logconvexity; 
Logconcavity of discrete distributions; L-shaped method for 
two-stage stochastic programs with recourse; Mixed integer 
classification problems; Multi-objective integer linear 
programming; Multi-objective mixed integer 
programming; Multistage stochastic programming: 
barycentric approximation; Preprocessing in stochastic 
programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Set covering, packing and 
partitioning problems; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Simplicial pivoting algorithms for integer programming; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic linear programming: decomposition and cutting 
planes; Stochastic linear programs with recourse and 
arbitrary multivariate distributions; Stochastic network 
problems: massively parallel solution; Stochastic 
programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programming with simple integer recourse; 
Stochastic programs with recourse: upper bounds; 
Stochastic quasigradient methods in minimax problems; 
Stochastic vehicle routing problems; Time-dependent 
traveling salesman problem; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

stochastic linear optimization problems 
[90C15, 90C27] 
(see: Discrete stochastic optimization) 

stochastic linear program see: two-stage — 

stochastic linear program with recourse 
[90C10, 90C15] 
(see: L-shaped method for two-stage stochastic programs 
with recourse; Stochastic integer programs; Stochastic 
linear programs with recourse and arbitrary multivariate 
distributions; Two-stage stochastic programs with 
recourse) 

stochastic linear programming 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 

Stochastic linear programming: decomposition and cutting 
planes 
(90C15, 90C06) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Decomposition principle of linear 
programming; Discretely distributed stochastic programs: 
descent directions and efficient points; Extremum problems 
with probability functions: kernel type solution methods; 
Generalized benders decomposition; General moment 
optimization problems; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; MINLP: generalized cross decomposition; 
MINLP: logic-based methods; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Simplicial decomposition; Simplicial decomposition 
algorithms; Stabilization of cutting plane algorithms for 
stochastic linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear programs 
with recourse and arbitrary multivariate distributions; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programs with recourse: upper bounds; 
Stochastic vehicle routing problems; Successive quadratic 
programming: decomposition methods; Two-stage 
stochastic programming: quasigradient method; Two-stage 
stochastic programs with recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Decomposition principle of linear 
programming; Discretely distributed stochastic programs: 
descent directions and efficient points; Extremum problems
with probability functions: kernel type solution methods; 
Generalized benders decomposition; General moment 
optimization problems; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; MINLP: generalized cross decomposition; 
MINLP: logic-based methods; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Simplicial decomposition; Simplicial decomposition 
algorithms; Stabilization of cutting plane algorithms for 
stochastic linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear programs 
with recourse and arbitrary multivariate distributions; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programming with simple integer recourse; 
Stochastic programs with recourse: upper bounds; 
Stochastic quasigradient methods in minimax problems; 
Stochastic vehicle routing problems; Successive quadratic 
programming: decomposition methods; Two-stage 
stochastic programming: quasigradient method; Two-stage 
stochastic programs with recourse) 

stochastic linear programming problems see: Stabilization of 
cutting plane algorithms for — 

Stochastic linear programs with recourse and arbitrary 
multivariate distributions 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programs with recourse: upper bounds; 
Stochastic vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
L-shaped method for two-stage stochastic programs with 
recourse; Multistage stochastic programming: barycentric 
approximation; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Simple recourse problem: dual method; Simple 
recourse problem: primal method; Stabilization of cutting 
plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programming with simple integer recourse; 
Stochastic programs with recourse: upper bounds; 
Stochastic quasigradient methods in minimax problems; 
Two-stage stochastic programming: quasigradient method; 
Two-stage stochastic programs with recourse) 

stochastic local search
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics)

stochastic matrix
[90C15, 90C29]
(see: Discretely distributed stochastic programs: descent
directions and efficient points)

stochastic matrix see: doubly —

stochastic methods
[90C26, 90C90]
(see: Global optimization: hit and run methods)

stochastic (mixed-)integer programming
[90C11, 90C15]
(see: Stochastic programming with simple integer recourse)

stochastic model
[52A22, 60D05, 68Q25, 90C05]
(see: Probabilistic analysis of simplex algorithms)

stochastic mollifier quasigradient
[90C15]
(see: Stochastic quasigradient methods in minimax
problems)

stochastic network problem
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel
solution)

Stochastic network problems: massively parallel solution
(90B15, 68W10, 90C06, 90C30)
(referred to in: Approximation of extremum problems with
probability functionals; Approximation of multivariate
probability integrals; Asynchronous distributed 
optimization algorithms; Auction algorithms; Automatic 
differentiation: parallel computation; Communication 
network assignment problem; Discretely distributed 
stochastic programs: descent directions and efficient points; 
Dynamic traffic networks; Equilibrium networks; 
Extremum problems with probability functions: kernel type 
solution methods; Generalized networks; General moment 
optimization problems; Heuristic search; Load balancing 
for parallel optimization techniques; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; Maximum flow problem; Minimum cost flow 
problem; Multicommodity flow problems; Multistage 
stochastic programming: barycentric approximation; 
Network design problems; Network location: covering 
problems; Nonconvex network flow problems; Parallel 
computing: complexity classes; Parallel computing: models; 
Parallel heuristic search; Piecewise linear network flow 
problems; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Shortest path tree algorithms; Simple recourse 
problem: dual method; Simple recourse problem: primal 
method; Stabilization of cutting plane algorithms for 
stochastic linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Steiner tree problems; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Survivable networks; Traffic network 
equilibrium; Two-stage stochastic programs with recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Asynchronous distributed 
optimization algorithms; Auction algorithms; Automatic 
differentiation: parallel computation; Communication 
network assignment problem; Directed tree networks; 
Discretely distributed stochastic programs: descent 
directions and efficient points; Dynamic traffic networks; 
Equilibrium networks; Evacuation networks; Extremum 
problems with probability functions: kernel type solution 
methods; Generalized networks; General moment 
optimization problems; Heuristic search; Interval analysis: 
parallel methods for global optimization; Load balancing 
for parallel optimization techniques; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; Maximum flow problem; Minimum cost flow 
problem; Multistage stochastic programming: barycentric 
approximation; Network design problems; Network 
location: covering problems; Nonconvex network flow 
problems; Parallel computing: complexity classes; Parallel 
computing: models; Parallel heuristic search; Piecewise 
linear network flow problems; Preprocessing in stochastic 
programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Shortest path tree algorithms; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Steiner 
tree problems; Stochastic integer programming: continuity, 
stability, rates of convergence; Stochastic integer programs; 
Stochastic linear programming: decomposition and cutting 
planes; Stochastic linear programs with recourse and 
arbitrary multivariate distributions; Stochastic 
programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programming with simple integer recourse; 
Stochastic programs with recourse: upper bounds; 
Stochastic quasigradient methods in minimax problems; 
Stochastic vehicle routing problems; Survivable networks; 
Traffic network equilibrium; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

stochastic neural network 
[90C27, 90C30] 
(see: Neural networks for combinatorial optimization) 

Stochastic optimal stopping: numerical methods 

Stochastic optimal stopping: problem formulations 

stochastic optimization 
[90C15] 
(see: Derivatives of probability and integral functions: 
general theory and examples) 

stochastic optimization 
[60J05, 90C15, 90C27, 90C29, 90C30, 90C99] 
(see: Derivatives of markov processes and their simulation; 
Derivatives of probability measures; Discretely distributed 
stochastic programs: descent directions and efficient points; 
Discrete stochastic optimization; SSC minimization 
algorithms for nonsmooth and stochastic optimization) 

stochastic optimization see: Discrete —; Monte-Carlo 
simulations for —; SSC minimization algorithms for 
nonsmooth and — 

stochastic problems 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 

stochastic process 
[60G35, 65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Differential equations and global optimization; 
Unconstrained optimization in neural network training) 

stochastic process nonanticipative with respect to a filtration 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
lagrange multipliers) 

stochastic program 
[68W10, 90B15, 90C06, 90C15, 90C30] 
(see: Stochastic network problems: massively parallel 
solution; Stochastic programming: nonanticipativity and 
lagrange multipliers)

stochastic program
[90C15, 90C29] 
(see: Discretely distributed stochastic programs: descent 
directions and efficient points) 

stochastic program see: multiperiod —; multistage — 

stochastic program with recourse 
[90C10, 90C15] 
(see: Stochastic vehicle routing problems) 

stochastic program with recourse see: two-stage — 

stochastic programming 
[01A99, 62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Approximation of extremum problems with 
probability functionals; History of optimization; 
Monte-Carlo simulations for stochastic optimization; 
Stochastic programming: parallel factorization of 
structured matrices; Two-stage stochastic programs with 
recourse) 

stochastic programming 
[49M25, 62C20, 62F12, 65C05, 65K05, 68W10, 90-08, 90B10, 
90B15, 90C05, 90C06, 90C08, 90C10, 90C15, 90C26, 90C27, 
90C30, 90C31, 90C33, 90C35, 90C90, 91B28] 
(see: Chemical process planning; Financial optimization; 
L-shaped method for two-stage stochastic programs with 
recourse; Monte-Carlo simulations for stochastic 
optimization; Multistage stochastic programming: 
barycentric approximation; Operations research and 
financial markets; Preprocessing in stochastic 
programming; Simple recourse problem; Simple recourse 
problem: dual method; Simple recourse problem: primal 
method; Stabilization of cutting plane algorithms for 
stochastic linear programming problems; Stochastic bilevel 
programs; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming: parallel factorization of structured matrices; 
Stochastic programs with recourse: upper bounds; 
Stochastic vehicle routing problems; Two-stage stochastic 
programs with recourse) 

stochastic programming see: bibliography of —; 
multistage —; Preprocessing in —; probabilistic 
constrained —; two-stage — 

stochastic programming: barycentric approximation see: 
Multistage — 

Stochastic programming: minimax approach 
(90C15, 62C20) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Bilevel linear programming: 
complexity, equivalence to minmax, concave programs; 
Bilevel optimization: feasibility test and flexibility index; 
Discretely distributed stochastic programs: descent 
directions and efficient points; Extremum problems with 
probability functions: kernel type solution methods; 
General moment optimization problems; Logconcave 
measures, logconvexity; Logconcavity of discrete 
distributions; L-shaped method for two-stage stochastic 
programs with recourse; Minimax: directional 
differentiability; Minimax theorems; Multistage stochastic 
programming: barycentric approximation; 
Nondifferentiable optimization: minimax problems; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming 
models: random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Bilevel linear programming: 
complexity, equivalence to minmax, concave programs; 
Bilevel optimization: feasibility test and flexibility index; 
Discretely distributed stochastic programs: descent 
directions and efficient points; Extremum problems with 
probability functions: kernel type solution methods; 
General moment optimization problems; Logconcave 
measures, logconvexity; Logconcavity of discrete 
distributions; L-shaped method for two-stage stochastic 
programs with recourse; Minimax: directional 
differentiability; Minimax theorems; Multistage stochastic 
programming: barycentric approximation; 
Nondifferentiable optimization: minimax problems; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming 
models: random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

stochastic programming models see: Static —; two-stage —

stochastic programming models: conditional expectations see:
Static — 

Stochastic programming models: random objective 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programs with 
recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

Stochastic programming: nonanticipativity and lagrange 
multipliers 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programs with recourse: 
upper bounds; Stochastic quasigradient methods; 
Stochastic vehicle routing problems; Two-stage stochastic 
programs with recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming with simple
integer recourse; Stochastic programs with recourse: upper 
bounds; Stochastic quasigradient methods in minimax 
problems; Stochastic vehicle routing problems; Two-stage 
stochastic programming: quasigradient method; Two-stage 
stochastic programs with recourse) 

Stochastic programming: parallel factorization of structured 
matrices 
(90C15) 

stochastic programming problem 
[90C15] 
(see: Static stochastic programming models) 

stochastic programming problem see: dynamic two-stage — 

stochastic programming: quasigradient method see: 
Two-stage — 

Stochastic programming with simple integer recourse 
(90C15, 90C11) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
vehicle routing problems; Two-stage stochastic programs 
with recourse) 

stochastic programs see: discretely distributed —;
multistage —

stochastic programs: descent directions and efficient points 
see: Discretely distributed — 

stochastic programs with recourse see: L-shaped method for 
two-stage —; two-stage — 

Stochastic programs with recourse: upper bounds 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
vehicle routing problems; Two-stage stochastic programs 
with recourse) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method; Two-stage stochastic 
programs with recourse) 

stochastic programs with simple integer recourse see: 
two-stage — 

stochastic quasigradient 
[62F12, 65C05, 65K05, 90C15, 90C31] 
(see: Monte-Carlo simulations for stochastic optimization) 

Stochastic quasigradient methods 
(90C15) 
(referred to in: Derivatives of markov processes and their 
simulation; Derivatives of probability measures; Stochastic
quasigradient methods: applications; Stochastic 
quasigradient methods in minimax problems; Two-stage 
stochastic programming: quasigradient method) 

(refers to: Stochastic programming: nonanticipativity and 
lagrange multipliers) 


stochastic quasigradient methods 


[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs; Stochastic quasigradient 
methods: applications) 


Stochastic quasigradient methods: applications 


(90C15) 

(referred to in: Stochastic quasigradient methods in 
minimax problems; Two-stage stochastic programming: 
quasigradient method) 

(refers to: Stochastic quasigradient methods) 


Stochastic quasigradient methods in minimax problems 


(90C15) 

(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Bilevel linear programming: 
complexity, equivalence to minmax, concave programs; 
Bilevel optimization: feasibility test and flexibility index; 
Discretely distributed stochastic programs: descent 
directions and efficient points; Extremum problems with 
probability functions: kernel type solution methods; 
General moment optimization problems; Logconcave 
measures, logconvexity; Logconcavity of discrete 
distributions; L-shaped method for two-stage stochastic 
programs with recourse; Minimax: directional 
differentiability; Minimax theorems; Multistage stochastic 
programming: barycentric approximation; 
Nondifferentiable optimization: minimax problems; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic vehicle 
routing problems; Two-stage stochastic programming: 
quasigradient method; Two-stage stochastic programs with 
recourse) 

(refers to: Minimax theorems; Nondifferentiable 
optimization: minimax problems; Stochastic quasigradient 
methods; Stochastic quasigradient methods: applications; 
Two-stage stochastic programming: quasigradient method; 
Two-stage stochastic programs with recourse) 


stochastic Quasigradient (SQG) methods 
[90C15] 
(see: Stochastic quasigradient methods) 

stochastic quasigradients 
[90C15] 
(see: Stochastic quasigradient methods) 

Stochastic scheduling 
(90B36) 
(referred to in: Job-shop scheduling problem; MINLP: design 
and scheduling of batch processes; Vehicle scheduling) 

stochastic search method
[90C27, 90C90]
(see: Simulated annealing)

stochastic shortest path
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path
problems)

stochastic shortest path problem
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path
problems)

stochastic shortest path problems
[49L20, 49L99, 90C39, 90C40]
(see: Dynamic programming: average cost per stage 
problems; Dynamic programming: infinite horizon 
problems, overview) 

stochastic shortest path problems see: Dynamic 
programming: — 

stochastic simulated annealing 
[90C15, 90C27] 
(see: Discrete stochastic optimization) 

stochastic solution see: value of — 

stochastic transportation and location problem 
[90B80, 90C11] 
(see: Stochastic transportation and location problems) 

Stochastic transportation and location problems 
(90B80, 90C11) 
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; Minimum concave 
transportation problems; MINLP: application in facility 
location-allocation; Motzkin transposition theorem; 
Multifacility and restricted location problems; Multi-index 
transportation problems; Network location: covering 
problems; Optimizing facility location with euclidean and 
rectilinear distances; Single facility location: circle covering 
problem; Single facility location: multi-objective euclidean 
distance location; Single facility location: multi-objective 
rectilinear distance location; Voronoi diagrams in facility 
location; Warehouse location problem) 
(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Frank-Wolfe algorithm; Global 
optimization in Weber’s problem with attraction and 
repulsion; Minimum concave transportation problems; 
MINLP: application in facility location-allocation; Motzkin 
transposition theorem; Multifacility and restricted location
problems; Multi-index transportation problems; Network 
location: covering problems; Optimizing facility location 
with euclidean and rectilinear distances; 
Production-distribution system design problem; Resource 
allocation for epidemic control; Single facility location: 
circle covering problem; Single facility location: 
multi-objective euclidean distance location; Single facility 
location: multi-objective rectilinear distance location; 
Voronoi diagrams in facility location; Warehouse location 
problem) 

stochastic transportation problem 
[90B80, 90C11] 
(see: Stochastic transportation and location problems) 

stochastic travel times 
[90C10, 90C15] 
(see: Stochastic vehicle routing problems) 

stochastic vehicle routing problem 
[90C10, 90C15] 
(see: Stochastic vehicle routing problems) 

Stochastic vehicle routing problems 
(90C15, 90C10) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
General routing problem; Logconcave measures, 
logconvexity; Logconcavity of discrete distributions; 
L-shaped method for two-stage stochastic programs with 
recourse; Preprocessing in stochastic programming; 
Probabilistic constrained linear programming: duality 
theory; Probabilistic constrained problems: convexity 
theory; Simple recourse problem: dual method; Simple 
recourse problem: primal method; Stabilization of cutting 
plane algorithms for stochastic linear programming 
problems; Static stochastic programming models; Static 
stochastic programming models: conditional expectations; 
Stochastic integer programming: continuity, stability, rates 
of convergence; Stochastic integer programs; Stochastic 
linear programming: decomposition and cutting planes; 
Stochastic network problems: massively parallel solution; 
Stochastic programming: minimax approach; Stochastic 
programming models: random objective; Stochastic 
programming: nonanticipativity and lagrange multipliers; 
Stochastic programs with recourse: upper bounds; 
Two-stage stochastic programs with recourse; Vehicle 
routing; Vehicle scheduling) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
General routing problem; Integer programming: branch 
and cut algorithms; Logconcave measures, logconvexity; 
Logconcavity of discrete distributions; L-shaped method for 
two-stage stochastic programs with recourse; Multistage 
stochastic programming: barycentric approximation; 
Preprocessing in stochastic programming; Probabilistic 
constrained linear programming: duality theory; 
Probabilistic constrained problems: convexity theory; 
Simple recourse problem: dual method; Simple recourse 
problem: primal method; Stabilization of cutting plane 
algorithms for stochastic linear programming problems; 
Static stochastic programming models; Static stochastic 
programming models: conditional expectations; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Two-stage 
stochastic programming: quasigradient method; Two-stage 
stochastic programs with recourse; Vehicle routing; Vehicle 
scheduling) 

stochastic global optimization
[92B05] 
(see: Genetic algorithms) 

stochasticity 
[90C30, 90C35] 
(see: Optimization in water resources) 

stock see: cutting —; echelon — 

stock problem see: cutting- — 

Stoica method 
[65T40, 90C26, 90C30, 90C90] 
(see: Global optimization methods for harmonic retrieval) 

stoichiometric form of KT conditions 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical 
equilibrium) 

stoichiometry see: estimation of reaction rates and — 

Stokes code see: Reynolds-averaged Navier- — 

stopping see: optimal — 

stopping criteria see: Dykstra’s algorithm and robust — 

stopping criterion 
[90C30] 
(see: Frank-Wolfe algorithm) 

stopping: numerical methods see: Stochastic optimal — 

stopping: problem formulations see: Stochastic optimal — 

stopping rule 
[90C05, 90C20] 
(see: Redundancy in nonlinear programs) 

stopping rule see: Bayesian — 

stopping rules see: Stochastic global optimization: — 

stopping times see: regenerative — 

storage see: unlimited intermediate —; variable- — 

storage algorithm see: variable- — 

storage capacity see: nodes with water — 

storage entities see: multipurpose — 

storage equation 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy 
power systems) 

storage plant
[90C10, 90C30, 90C35]
(see: Optimization in operation of electric and energy
power systems)
storage plants 
[90C10, 90C30, 90C35]
(see: Optimization in operation of electric and energy
power systems)
STP
[90B80, 90C11]
(see: Stochastic transportation and location problems)
STP-MSP
[90C27]
(see: Steiner tree problems)
StQP
[90C20]
(see: Standard quadratic optimization problems: 
applications) 
strain-displacement compatibility equations 
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in 
mechanics) 
strain tensor 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
stranded chain see: two- — 
strategic design models 
[90-02] 
(see: Operations research models for supply chain 
management and design) 
strategic design of a supply chain 
[90-02] 
(see: Operations research models for supply chain 
management and design) 
strategic supply chain management 
[90-02] 
(see: Operations research models for supply chain 
management and design) 
strategies 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
strategies see: active set —; default —; evolutionary —; 
solution — 
strategies for dynamic systems see: Optimization — 
strategies and performances see: Volume computation for 
polytopes: — 
strategy 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
strategy see: active set —; binary search-Hansel chains
question-asking —; black-box —; bold —; branch and
bound —; cluster first-schedule second —;
convexification/relaxation —; evolution —; game of —;
Goldfarb-Idnani active set —; Markov —;
projection-restriction —; pure —; pure trust region —;
question-asking —; sequential Hansel chains
question-asking —; TR —
strategy equilibrium see: memory — 
strategy for interval-Newton method in deterministic global 
optimization see: LP — 
strategy for model structure refinement see: incremental — 
strategy Nash equilibrium see: memory — 
strategy pattern 
(see: State of the art in modeling agricultural systems) 
stream see: lean —; rich — 
stream arcs see: natural — 
street planning 
[90C35] 
(see: Multicommodity flow problems) 
strengthen triangle inequality 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
stress 
[65K05, 90C27, 90C30, 90C57, 91C15] 
(see: Optimization-based visualization) 
stress see: equilibrium —; normalized —; s- — 
stress equilibrium equations 
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in 
mechanics) 
stressed design see: fully — 
strict 
[58C20, 58E30, 90C46, 90C48] 
(see: Nonsmooth analysis: weak stationarity) 
strict complementarity 
[49M37, 65K05, 90C22, 90C25, 90C30, 90C31] 
(see: Inequality-constrained nonlinear optimization; 
Semidefinite programming: optimality conditions and 
stability) 
strict complementarity 
[15A39, 90C05]
(see: Tucker homogeneous systems of linear relations)
strict complementarity slackness
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)
strict complementarity slackness condition
[65K10, 90C31]
(see: Sensitivity analysis of variational inequality problems)
strict complementary slackness
[90C31, 90C34]
(see: Semi-infinite programming: second order optimality 
conditions; Sensitivity and stability in NLP: continuity and 
differential stability) 
strict efficiency 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
strict efficiency see: local — 
strict feasibility condition 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
lagrange multipliers) 
strict local maximizer 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
strict local maximum point
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
strict local minimizer 
[49M29, 65K05, 65K10, 90C06, 90C31, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization; Local 
attractors for gradient-related descent iterations; Sensitivity 
and stability in NLP: continuity and differential stability) 
strict local minimum 
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)
strict local minimum point
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
strict local optimizer
[90C26]
(see: Smooth nonlinear nonconvex optimization)
strict monotonicity 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
strict monotonicity see: local — 
strict relative minimum 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 
strict-set-contraction 
[90C33] 
(see: Order complementarity) 
strict TS 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
strictly antisymmetric relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
strictly complementary conditions 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
strictly complementary solution 
[15A39, 90C05] 
(see: Homogeneous selfdual methods for linear 
programming; Tucker homogeneous systems of linear 
relations) 
strictly convex function 
[90C26] 
(see: Generalized monotone single valued maps) 
strictly copositive matrix 
[65K05, 90C20] 
(see: Quadratic programming with bound constraints) 
strictly differentiable 
[49K27, 58C20, 58E30, 90C46, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials; 
Nonsmooth analysis: weak stationarity) 


strictly efficient point 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
strictly efficient point see: local — 
strictly feasible set 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
strictly monotone 
[47J20, 49J40, 65K10, 90C33] 
(see: Solution methods for multivalued variational 
inequalities) 
strictly monotone function 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
strictly monotone function see: locally — 
strictly monotone map 
[90C26] 
(see: Generalized monotone single valued maps) 
strictly monotone operator 
[90C26] 
(see: Generalized monotone multivalued maps) 
strictly proper 
(see: Bayesian networks)
strictly pseudoconvex 
[90C26] 
(see: Generalized monotone multivalued maps; Generalized 
monotone single valued maps) 
strictly pseudoconvex function 
[90C06, 90C25, 90C35] 
(see: Simplicial decomposition algorithms) 
strictly pseudomonotone map 
[90C26] 
(see: Generalized monotone single valued maps)
strictly pseudomonotone operator 
[90C26] 
(see: Generalized monotone multivalued maps)
strictly quasiconvex function 
[90C26] 
(see: Generalized monotone single valued maps) 
strictly quasimonotone map 
[90C26] 
(see: Generalized monotone single valued maps) 
strictly quasimonotone operator 
[90C26] 
(see: Generalized monotone multivalued maps)
strictly repetitive 
(see: Bayesian networks) 
string see: path- — 
strong branching 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
strong cutting plane 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: cutting plane algorithms) 
strong dual 
[90C10, 90C46]
(see: Integer programming duality)
strong duality
[90C06, 90C30]
(see: Duality for semidefinite programming; Lagrangian
duality: BASICS; Saddle point theory and optimality
conditions) 
strong duality result 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
strong duality theorem 
[05B35, 49-XX, 65K05, 90-XX, 90C05, 90C20, 90C33, 93-XX] 
(see: Criss-cross pivoting rules; Duality theory: monoduality 
in convex optimization) 
strong homomorphism 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
strong homomorphism see: very — 
strong local minimizer 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 
strong local minimum 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 
strong monotonicity 
[65K10, 65M60, 91B06, 91B60] 
(see: Oligopolistic market equilibrium; Variational 
inequalities: geometric interpretation, existence and 
uniqueness) 
strong monotonicity 
[65K10, 65M60] 
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
strong monotonicity see: local — 
strong NP-complete completeness 
[68Q25, 90C60] 
(see: NP-complete problems and proof methodology) 
strong NP-completeness 
[68Q25, 90C60] 
(see: NP-complete problems and proof methodology) 
strong operator topology 
[46N10, 49J40, 90C26] 
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems) 
strong perfect graph theorem 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovasz number)
strong product 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovasz number) 
strong relative minimum 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear
optimization) 
strong second order sufficiency 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design)
strong second order sufficient condition 
[90C31] 
(see: Sensitivity and stability in NLP: continuity and 
differential stability) 
strong second order sufficient condition see: general — 
strong semismoothness 
[90C30, 90C33] 
(see: Nonsmooth and smoothing methods for nonlinear 
complementarity problems and variational inequalities) 
Strong Slater CQ 
[49K27, 49K40, 90C30, 90C31] 
(see: First order constraint qualifications) 
strong stability of a stationary point 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
strong and weak duality 
[90C30]
(see: Duality for semidefinite programming)
strongly active constraints
[49M20, 90C11, 90C30]
(see: Generalized outer approximation)
strongly connected components of a digraph
[90C09, 90C10]
(see: Combinatorial matrix analysis)
strongly connected digraph
[90C09, 90C10]
(see: Combinatorial matrix analysis)
strongly connected network
[90C35]
(see: Minimum cost flow problem)
strongly convex
[90C30]
(see: Frank-Wolfe algorithm) 
strongly determined variable 
[65H20, 65K05, 90-01, 90B40, 90C10, 90C27, 90C35, 94C15]
(see: Greedy randomized adaptive search procedures)
strongly linearly monotonic over
[90C05]
(see: Extension of the fundamental theorem of linear
programming)
strongly monotone
[47J20, 49J40, 65K10, 90C30, 90C33]
(see: Cost approximation algorithms; Implicit lagrangian;
Solution methods for multivalued variational inequalities)
strongly monotone function
[65K10, 65M60]
(see: Variational inequalities: geometric interpretation, 
existence and uniqueness) 
strongly monotone function see: locally — 
strongly monotonic at see: locally — 
strongly nonsingular matrix 
[65K05, 65K10] 
(see: ABS algorithms for linear equations and linear least 
squares) 
strongly NP-hard 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
strongly polynomial algorithm 
[62G07, 62G30, 65K05] 
(see: Isotonic regression problems) 
strongly polynomial algorithm
[90C11, 90C15] 
(see: Stochastic programming with simple integer recourse) 
strongly polynomial solution 
[62G07, 62G30, 65K05] 
(see: Isotonic regression problems) 
strongly polynomial time 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 65K05, 68Q, 68Q05, 
68Q10, 68Q25, 68R, 68U, 68W, 90B, 90C, 90C05, 90C25, 
90C26] 
(see: Convex discrete optimization; Information-based 
complexity and information-based optimization) 
strongly polynomial time 
[90C60]
(see: Complexity theory: quadratic programming)
strongly polynomial time algorithm
[90C35]
(see: Minimum cost flow problem)
strongly polynomial time problem
[90C60]
(see: Complexity theory: quadratic programming)
strongly positive definite matrix
[65K10, 65M60]
(see: Variational inequalities)
strongly regular matrix
[15A99, 65G20, 65G30, 65G40, 90C26]
(see: Interval linear systems)
strongly semismooth function
[90C30, 90C33]
(see: Nonsmooth and smoothing methods for nonlinear
complementarity problems and variational inequalities)
strongly semismooth mapping
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method)
strongly stable optimal solution
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach)
strongly stable solution
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach) 
structural stability of SIP(f,h,g) see: global —
structural analysis of cable structures 
[49Q10, 74K99, 74Pxx, 90C90, 91A65]
(see: Multilevel optimization in mechanics)
structural analysis system see: stability of a —
structural constraints
[90C90, 91B28]
(see: Robust optimization)
structural design
[90C25, 90C26, 90C27, 90C29, 90C90]
(see: Optimal design of composite structures; Semidefinite 
programming and structural optimization) 
Structural optimization 
(49M37, 65K05, 90C30) 
(referred to in: Semidefinite programming and structural 
optimization; Shape optimization; Structural optimization: 
history; Topology of global optimization; Topology 
optimization) 
structural optimization 
[49M37, 65K05, 90C15, 90C26, 90C30, 90C33, 90C90] 
(see: Stochastic bilevel programs; Structural optimization; 
Structural optimization: history) 

structural optimization 
[49M37, 65K05, 90C26, 90C29, 90C30, 90C90] 
(see: Optimal design of composite structures; Structural 
optimization; Structural optimization: history) 

structural optimization see: Semidefinite programming and — 

Structural optimization: history 
(90C90, 90C26) 
(referred to in: Design optimization in computational fluid 
dynamics; Interval analysis: application to chemical 
engineering design problems; Multidisciplinary design 
optimization; Multilevel methods for optimal design; 
Optimal design of composite structures; Optimal design in 
nonlinear optics; Semidefinite programming and structural 
optimization; Shape optimization; Topology of global 
optimization; Topology optimization) 
(refers to: Bilevel programming: applications in engineering; 
Design optimization in computational fluid dynamics; 
Interval analysis: application to chemical engineering 
design problems; Multidisciplinary design optimization; 
Multilevel methods for optimal design; Optimal design of 
composite structures; Optimal design in nonlinear optics; 
Structural optimization; Topology optimization) 

structural response see: derivatives of — 

structural shape optimization 
[90C26, 90C90] 
(see: Structural optimization: history) 

structural stability 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 

structural stability see: global — 

structural stability in parametric programming
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric
programming)

structural stability of SIP
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)

structural topology optimization
[90C26, 90C90]
(see: Structural optimization: history)

structural variables
[90C90, 91B28]
(see: Robust optimization)

structure see: analysing declarative program —; 
block-angular —; coarse valuation —; cost —; dual 
block-angular —; feasible spanning tree —; fine 
valuation —; generalized upper bounding —; 
information —; neighborhood —; optimal spanning 
tree —; primary —; protein —; secondary —; seed —; 
spanning tree —; tertiary — 

structure determination see: molecular — 

structure determination: convex global underestimation see: 
Molecular — 

structure factors see: normalized — 

structure of interest rates see: term — 

structure invariants 
[90C26] 
(see: Phase problem in X-ray crystallography: Shake and 
bake approach) 
structure of an irreducible matrix see: inductive — 
structure mixed integer αBB algorithm see: general —;
special — 
structure prediction see: Genetic algorithms for protein —; 
tertiary — 
structure prediction methods see: Protein loop — 
structure for the QAP see: K-L type neighborhood — 
structure refinement see: incremental strategy for model — 
structure of the spatial price equilibrium problem see: 
network — 
structured matrices see: Stochastic programming: parallel 
factorization of — 
structured matrix 
[90C25, 90C30] 
(see: Solving large scale and sparse semidefinite programs)
structured matrix factorization 
[90C15] 
(see: Stochastic programming: parallel factorization of
structured matrices) 
structured singular value 
[93D09] 
(see: Robust control) 
structures see: design of composite —; engineering —; Global 
optimization of planar multilayered dielectric —; maxdiag 
fine —; mindiag fine —; model —; Optimal design of 
composite —; prediction of crystal —; Steiner ratio of 
biomolecular —; structural analysis of cable — 
struts 
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35] 
(see: Graph realization via semidefinite programming) 
(STSP) see: symmetric TSP — 
(sub)gradients parametric representations see: necessary 
optimality condition without using — 
sub Lagrangian 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
sub-problems see: Kuhn-Tucker conditions for quadratic 
programming — 
sub-tour elimination constraints 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
sub-tours 
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
subclass see: conjugate direction —; optimally scaled —; 
orthogonally scaled — 
subconjugate function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
subcritical function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 
subcritical point 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 


subdifferentiability spaces 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 
subdifferentiable 
[26B25, 26E25, 49J52, 65K99, 90C99] 
(see: Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions) 
subdifferentiable function 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 
subdifferential 
[26B25, 26E25, 46A20, 49-XX, 49J40, 49J52, 49K27, 49K35,
49M27, 49Q10, 52A01, 58C20, 58E30, 65G20, 65G30, 65G40, 
65K05, 65K10, 65K99, 70-08, 70-XX, 74K99, 74Pxx, 80-XX, 
90-XX, 90C06, 90C25, 90C26, 90C29, 90C30, 90C35, 90C46, 
90C48, 90C99, 90Cxx, 93-XX] 
(see: Composite nonsmooth optimization; Convex 
max-functions; Duality theory: triduality in global 
optimization; Farkas lemma: generalizations; Generalized 
monotone multivalued maps; Global optimization: 
envelope representation; Image space approach to 
optimization; Interval global optimization; Minimax: 
directional differentiability; Nonconvex energy functions: 
hemivariational inequalities; Nondifferentiable 
optimization: minimax problems; Nondifferentiable 
optimization: subgradient optimization methods; 
Nonsmooth analysis: weak stationarity; Quasidifferentiable 
optimization; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: optimality conditions; Set-valued 
optimization; Simplicial decomposition algorithms) 
subdifferential 
[49J52, 65K05, 90C30]
(see: Nondifferentiable optimization: minimax problems; 
Nondifferentiable optimization: relaxation methods; 
Nondifferentiable optimization: subgradient optimization 
methods) 


subdifferential see: ε- —; B- —; C- —; Clarke —; Clarke
generalized —; convex —; e- —; Fenchel-Moreau —;
Fréchet —; gateaux —; generalized —; H- —; limiting
Fréchet —; Moreau-Rockafellar —; singular Fréchet —;
singular limiting —
subdifferential of F.H. Clarke see: generalized — 
subdifferential laws see: variational formulation of — 
subdifferential set 
[46N10, 90-00, 90C47] 
(see: Nondifferentiable optimization) 
subdifferential set see: ε- —
subdifferential Variational Principle 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 
subdifferentials 
[46A20, 52A01, 65Kxx, 90C30, 90Cxx] 
(see: Farkas lemma: generalizations; Quasidifferentiable 
optimization: algorithms for QD functions) 
subdifferentials see: Continuous approximations to —; 
estimation of —; Fréchet —; limiting (Fréchet) —; 
Nonsmooth analysis: Fréchet — 
subdigraph problem see: acyclic — 
subdivision see: cell of a polyhedral —; face of a polyhedral —;
guillotine —; polyhedral —; regular — 

subdivision directions in interval branch and bound methods 
see: Interval analysis: — 

subdivision rule see: adaptive — 

subdivision via (w,i)
[90B80, 90C11]
(see: Stochastic transportation and location problems)

subdual function
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and
duality)

subface
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements in optimization)

subfamilies of n-valued PI-systems
[03B50, 68T15, 68T30]
(see: Finite complete systems of many-valued logic algebras)

subgame perfect
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games)

subgradient
[49J52, 49K27, 90C26, 90C29, 90C30, 90C35, 90C48]
(see: Global optimization: envelope representation; Image 
space approach to optimization; Lagrangian duality: 
BASICS; Multicommodity flow problems; 
Nondifferentiable optimization: relaxation methods; 
Nondifferentiable optimization: subgradient optimization 
methods; Set-valued optimization) 

subgradient 
[49J52, 90C30]
(see: Lagrangian duality: BASICS; Nondifferentiable 
optimization: relaxation methods; Nondifferentiable 
optimization: subgradient optimization methods) 

subgradient see: H- — 

subgradient inequality 
[46N10, 49M20, 90-00, 90-08, 90C25, 90C47] 
(see: Nondifferentiable optimization; Nondifferentiable 
optimization: cutting plane methods) 

subgradient locality measure 
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 

subgradient method see: conjugate —; ε- —

Subgradient Methods 
[49J40, 49J52, 65K05, 90C30]
(see: Nondifferentiable optimization: subgradient 
optimization methods; Solving hemivariational inequalities 
by nonsmooth optimization methods) 

subgradient optimization 
[90C10, 90C11, 90C15, 90C26, 90C27, 90C30, 90C33, 90C57] 
(see: Cost approximation algorithms; Set covering, packing 
and partitioning problems; Stochastic bilevel programs) 


subgradient optimization 
[90C30] 
(see: Cost approximation algorithms) 
subgradient optimization see: Lagrangian relaxation with — 
subgradient optimization methods see: Nondifferentiable 
optimization: — 


subgradient projection 
[47H05, 65J15, 90C25, 90C55]
(see: Fejér monotonicity in convex optimization)
subgradient projection algorithm
[90C15, 90C26, 90C33]
(see: Stochastic bilevel programs)
subgradient techniques
[90C30, 90C90]
(see: Decomposition techniques for MILP: lagrangian
relaxation)
subgradients
[46N10, 49K35, 49M27, 65K10, 90-00, 90C25, 90C30, 90C47]
(see: Convex max-functions; Lagrangian duality: BASICS; 
Nondifferentiable optimization) 
subgraph 
[90C35] 
(see: Feedback set problems) 
subgraph see: induced —; length of a —; maximal planar —; 
maximum bipartite —; maximum planar —; planar — 
subgraph approach 
[05C69, 05C85, 68Q20, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set; 
Optimal triangulations) 
subgraph induced by a vertex subset 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 
Subgraph Problem see: degree-constrained —; minMax 
Matching — 
subinterval adaptation 
[90C26, 90C30] 
(see: Bounding derivative ranges) 
subinterval selection 
[65K05, 90C30] 
(see: Algorithmic improvements using a heuristic 
parameter, reject index for interval optimization) 
subject to moment conditions see: optimal integral bounds — 
subjective curve fitting 
[90C26, 90C30] 
(see: Forecasting) 
subjective curve fitting and extrapolation 
[90C26, 90C30] 
(see: Forecasting)
subjective interpretation
[94A17]
(see: Jaynes’ maximum entropy principle)
subjet
[90C26]
(see: Global optimization: envelope representation)
sublattice
[90C35]
(see: Multi-index transportation problems)
sublinear see: difference —
sublinear function
[90C30]
(see: Image space approach to optimization)
sublinear function see: difference —
sublinear system
[46A20, 52A01, 90C30]
(see: Farkas lemma: generalizations)
submatrix see: principal — 
submatroid
[90C09, 90C10]
(see: Oriented matroids)

submodular constraints
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource
allocation problems)

submodular function
[90C09, 90C10, 90C25, 90C27, 90C35]
(see: Combinatorial optimization algorithms in resource
allocation problems; L-convex functions and M-convex
functions)

submodular function
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions)

submodular polyhedron
[90C09, 90C10, 90C35]
(see: Combinatorial optimization algorithms in resource
allocation problems; Multi-index transportation problems)

submodular system
[90C09, 90C10]
(see: Combinatorial optimization algorithms in resource
allocation problems)


submodularity 
[90C35] 
(see: Multi-index transportation problems) 

Suboptimal control 
(90C30) 
(referred to in: Control vector iteration CVI; Duality in 
optimal control with first order differential equations; 
Dynamic programming: continuous-time optimal control; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Dynamic programming: 
optimal control applications; Hamilton-Jacobi-Bellman 
equation; Infinite horizon control and dynamic games; 
MINLP: applications in the interaction of design and 
control; Multi-objective optimization: interaction of design 
and control; Optimal control of a flexible arm; Robust 
control; Robust control: schur stability of polytopes of 
polynomials; Semi-infinite programming and control 
problems; Sequential quadratic programming: interior 
point methods for distributed optimal control problems) 
(refers to: Control vector iteration CVI; Duality in optimal 
control with first order differential equations; Dynamic 
programming: continuous-time optimal control; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Dynamic programming: optimal control 
applications; Hamilton-Jacobi-Bellman equation; Infinite 
horizon control and dynamic games; MINLP: applications 
in the interaction of design and control; Multi-objective 
optimization: interaction of design and control; Optimal 
control of a flexible arm; Optimization strategies for 
dynamic systems; Robust control; Robust control: schur 
stability of polytopes of polynomials; Semi-infinite 
programming and control problems; Sequential quadratic 
programming: interior point methods for distributed 
optimal control problems) 

suboptimal control 
[90C30] 
(see: Suboptimal control) 


suboptimal trajectories and controls
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)

suboptimality
[90C30]
(see: Suboptimal control)

suborder
[90C29]
(see: Preference modeling)

subproblem
[90C06, 90C30]
(see: Decomposition principle of linear programming;
Simplicial decomposition)

subproblem 
[90C30] 
(see: Simplicial decomposition) 

subproblem see: allocation —; column generation —; 
master —; NLP —; primal —; quadratic programming —; 
reduced quadratic programming —; regularized — 

subproduct of relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

subrelation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

subroutine library see: IMSL — 

subroutines see: FORTRAN — 

subset see: covering —; elemental —; good —; 
independent —; optimal —; signed —; subgraph induced 
by a vertex — 

subset feedback vertex (arc) set problem
[90C35]
(see: Feedback set problems)

subset heterogeneity
[62H30, 90C39]
(see: Dynamic programming in clustering)

subset interconnection designs
[90C27]
(see: Steiner tree problems)

subset minimum feedback vertex (arc) set problem
[90C35]
(see: Feedback set problems)

subset of participants
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

subset selection
[90C15, 90C27]
(see: Discrete stochastic optimization)

subset-sum problem
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C, 90C20, 90C60]
(see: Convex discrete optimization; Quadratic knapsack)

subsets see: transversal of a collection of — 

subspace see: concurrent —; finite-dimensional — 

subspace optimization see: concurrent — 

substationarity
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational
inequalities)

substationarity point of a functional
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational
inequalities)

substationarity point with respect to a set
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational
inequalities)

substationarity problems
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX, 90C90,
91A65]
(see: Multilevel optimization in mechanics; Nonconvex
energy functions: hemivariational inequalities)

substationary point
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth
optimization methods)

substitution
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35]
(see: Algorithms for genomic analysis)

substitution see: forward —; successive —

substitution supernode
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35]
(see: Algorithms for genomic analysis)

substructure property see: optimal —

substructuring
[65Fxx]
(see: Least squares problems)

subtour elimination constraints
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: cutting plane algorithms)

subtours
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: cutting plane algorithms)

subtract
[90C11]
(see: Predictive method for interhelical contacts in
alpha-helical proteins)

subtree
[68T99, 90C27]
(see: Capacitated minimum spanning trees)

subtree hypergraph
[90C27]
(see: Steiner tree problems)

subtree isomorphism
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22]
(see: Replicator dynamics in combinatorial optimization)

subtree isomorphism see: maximal —; maximal similarity —; 
maximum —; maximum similarity — 

successive affine reduction 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

successive affine reduction BFGS algorithm 
[90C30] 


(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 

successive approximation
[49L20, 90C40]
(see: Dynamic programming: undiscounted problems)

successive displacements see: method of —

successive improvement of KKT points
[90C30]
(see: Large scale trust region problems)

successive overrelaxation
[90C33]
(see: Linear complementarity problem)

Successive quadratic programming 
(90C30) 
(referred to in: Convex max-functions; Feasible sequential 
quadratic programming; Optimization with equilibrium 
constraints: A piecewise SQP approach; Sequential 
quadratic programming: interior point methods for 
distributed optimal control problems; Successive quadratic 
programming: applications in distillation systems; 
Successive quadratic programming: applications in the 
process industry; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: full space methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 
(refers to: Feasible sequential quadratic programming; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming: applications in 
distillation systems; Successive quadratic programming: 
applications in the process industry; Successive quadratic 
programming: decomposition methods; Successive 
quadratic programming: full space methods; Successive 
quadratic programming: solution by active sets and interior 
point methods) 

successive quadratic programming 
[65L99, 90C20, 90C25, 90C30, 90C90, 93-XX] 
(see: Dynamic programming: optimal control applications; 
Optimization strategies for dynamic systems; Successive 
quadratic programming: applications in distillation 
systems; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: full space methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

successive quadratic programming 
[65K05, 65K10, 90C06, 90C20, 90C30, 90C34, 90C90] 
(see: Feasible sequential quadratic programming; Successive 
quadratic programming; Successive quadratic 
programming: applications in distillation systems; 
Successive quadratic programming: decomposition 
methods) 

successive quadratic programming see: full space — 

Successive quadratic programming: applications in 
distillation systems 
(90C30, 90C90) 
(referred to in: Feasible sequential quadratic programming; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Sequential quadratic programming: interior 
point methods for distributed optimal control problems;
Successive quadratic programming; Successive quadratic 
programming: applications in the process industry; 
Successive quadratic programming: decomposition 
methods; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 
(refers to: Feasible sequential quadratic programming; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming; Successive quadratic 
programming: applications in the process industry; 
Successive quadratic programming: decomposition 
methods; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 

Successive quadratic programming: applications in the 
process industry 
(90C30, 90C90) 
(referred to in: Feasible sequential quadratic programming; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming; Successive quadratic 
programming: applications in distillation systems; 
Successive quadratic programming: decomposition 
methods; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 
(refers to: Feasible sequential quadratic programming; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming; Successive quadratic 
programming: applications in distillation systems; 
Successive quadratic programming: decomposition 
methods; Successive quadratic programming: full space 
methods; Successive quadratic programming: solution by 
active sets and interior point methods) 

Successive quadratic programming: decomposition methods 
(90C30, 90C20) 
(referred to in: Decomposition principle of linear 
programming; Feasible sequential quadratic programming; 
Generalized benders decomposition; MINLP: generalized 
cross decomposition; MINLP: logic-based methods; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Simplicial decomposition; Simplicial decomposition 
algorithms; Stochastic linear programming: decomposition 
and cutting planes; Successive quadratic programming; 
Successive quadratic programming: applications in 
distillation systems; Successive quadratic programming: 
applications in the process industry; Successive quadratic 
programming: full space methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 
(refers to: Decomposition principle of linear programming; 
Feasible sequential quadratic programming; Generalized 
benders decomposition; MINLP: generalized cross
decomposition; MINLP: logic-based methods;
Optimization with equilibrium constraints: A piecewise 
SQP approach; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Simplicial decomposition; Simplicial decomposition 
algorithms; Stochastic linear programming: decomposition 
and cutting planes; Successive quadratic programming; 
Successive quadratic programming: applications in 
distillation systems; Successive quadratic programming: 
applications in the process industry; Successive quadratic 
programming: full space methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

Successive quadratic programming: full space methods 
(90C30, 90C25) 
(referred to in: Feasible sequential quadratic programming; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Rosen’s method, global convergence, and 
Powell’s conjecture; Sequential quadratic programming: 
interior point methods for distributed optimal control 
problems; Successive quadratic programming; Successive 
quadratic programming: applications in distillation 
systems; Successive quadratic programming: applications 
in the process industry; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 
(refers to: Feasible sequential quadratic programming; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming; Successive quadratic 
programming: applications in distillation systems; 
Successive quadratic programming: applications in the 
process industry; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: solution by active sets and interior point 
methods) 

Successive quadratic programming: solution by active sets and 
interior point methods 
(90C30, 90C25) 
(referred to in: Entropy optimization: interior point 
methods; Feasible sequential quadratic programming; 
Homogeneous selfdual methods for linear programming; 
Linear programming: interior point methods; Linear 
programming: karmarkar projective algorithm; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Potential reduction methods for linear 
programming; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming; Successive quadratic 
programming: applications in distillation systems; 
Successive quadratic programming: applications in the 
process industry; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: full space methods) 
(refers to: Entropy optimization: interior point methods; 
Feasible sequential quadratic programming; Homogeneous 
selfdual methods for linear programming; Interior point 
methods for semidefinite programming; Linear 
programming: interior point methods; Linear
programming: karmarkar projective algorithm;
Optimization with equilibrium constraints: A piecewise 
SQP approach; Potential reduction methods for linear 
programming; Sequential quadratic programming: interior 
point methods for distributed optimal control problems; 
Successive quadratic programming; Successive quadratic 
programming: applications in distillation systems; 
Successive quadratic programming: applications in the 
process industry; Successive quadratic programming: 
decomposition methods; Successive quadratic 
programming: full space methods) 

successive shortest path algorithm
[90C35]
(see: Minimum cost flow problem)

successive substitution
[65H10, 65J15]
(see: Contraction-mapping)

sufficiency see: second order —; strong second order —

sufficiency theorem for the Hamilton-Jacobi-Bellman equation
[34H05, 49L20, 90C39]
(see: Hamilton-Jacobi-Bellman equation)

sufficient
[90C22, 90C25, 90C31, 90C33, 90C34, 90C46]
(see: Generalized semi-infinite programming: optimality
conditions; Linear complementarity problem; Semidefinite
programming: optimality conditions and stability)

sufficient see: column —; row — 

sufficient condition 
[90C26, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization) 

sufficient condition 
[90C30] 
(see: Image space approach to optimization) 

sufficient condition see: general second order —; general 
strong second order —; saddle-point —; second order —; 
strong second order — 

sufficient conditions 
[90C26, 90C31, 90C34, 90C39] 
(see: Second order optimality conditions for nonlinear 
optimization; Semi-infinite programming: second order 
optimality conditions) 

sufficient conditions see: necessary and —; second order — 

sufficient decrease conditions 
[49M37, 65K05, 65K10, 90C30, 93A13] 
(see: Multilevel methods for optimal design) 

sufficient matrix 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules; Principal pivoting methods 
for linear complementarity problems) 

sufficient matrix see: column —; row — 

sufficient optimality condition 
[90C26, 90C31, 91A65]
(see: Bilevel programming: implicit function approach) 

sufficient optimality conditions 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 


sufficient optimality conditions 
[90C26, 90C31, 90C39, 91A65] 
(see: Bilevel programming: implicit function approach; 
Second order optimality conditions for nonlinear 
optimization) 
sufficient optimality conditions see: necessary and —; second 
order necessary and — 
sufficiently high 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding)
sufficiently large
[90C11, 90C26]
(see: Extended cutting plane algorithm)
sufficiently small
[25A15, 34A05, 90C25, 90C26, 90C30, 90C31]
(see: Convexifiable functions, characterization of)
sum see: Minkowski —
sum of convex multiplicative functions
[90C26, 90C31]
(see: Multiplicative programming)
sum diagram see: cumulative — 
sum game see: two-person zero- — 
sum infinite horizon game see: nonzero- — 
sum of integer infeasibilities 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 
sum matrix 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
sum perfect-information game see: two-player zero- — 
sum problem see: subset- — 
sum of ratios see: maximizing a — 
sum-of-ratios fractional program 
[90C32]
(see: Fractional programming) 
sum Rule 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 
sum rule see: fuzzy — 
sum of squared error 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
sum of squares 
[90C29] 
(see: Estimating data for multicriteria decision making 
problems: optimization techniques) 
sums program see: weighted- — 
sums programs with constraints see: weighted- — 
sup norm see: weighted —
sup-norm contraction see: weighted —
sup-stationary point 
[90Cxx] 
(see: Quasidifferentiable optimization: optimality 
conditions) 
sup-stationary point see: Dini —; Hadamard — 
sup theorem see: James — 
supconjugate function 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization) 
super-polynomial time
[90C27, 90C60, 91A12]
(see: Combinatorial optimization games)

superadditive dual
[90C10, 90C46]
(see: Integer programming duality)

superadditive duality
[90C10, 90C46]
(see: Integer programming duality)

superadditive function
[90C10, 90C46]
(see: Integer programming duality)

superbasic variables
[90C30]
(see: Convex-simplex algorithm)

superconsistency
[90C05, 90C25, 90C30, 90C34]
(see: Semi-infinite programming, semidefinite
programming and perfect duality)

supercritical function
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

supercritical point
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

supercritical point theorem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

superdifferentiable 

[26B25, 26E25, 49J52, 90C99, 90Cxx] 
(see: Quasidifferentiable optimization; Quasidifferentiable 
optimization: optimality conditions) 

superdifferentiable function 
[90C06] 
(see: Saddle point theory and optimality conditions) 

superdifferential 
[26B25, 26E25, 46A20, 49-XX, 49J52, 49K27, 52A01, 58C20, 
58E30, 65K05, 65K99, 70-08, 90-XX, 90C25, 90C30, 90C48, 
90C99, 90Cxx, 93-XX] 
(see: Duality theory: triduality in global optimization; 
Farkas lemma: generalizations; Minimax: directional 
differentiability; Nonsmooth analysis: Fréchet 
subdifferentials; Quasidifferentiable optimization; 
Quasidifferentiable optimization: codifferentiable 
functions; Quasidifferentiable optimization: optimality 
conditions) 

superdifferential see: Fréchet —; limiting — 

superfluous 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovász number)

supergraph see: three-layer — 

superLagrangian 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization) 

superLagrangian 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: biduality in nonconvex optimization; 
Duality theory: triduality in global optimization) 


superLagrangian duality
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

superLagrangian duality theorem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

superlinear see: 2-step —; Q- —

superlinear convergence
[49J52, 90C30]
(see: Nondifferentiable optimization: Newton method)

superlinear convergence see: Q- —

superlinear convergence condition
[49M29, 65K10, 90C06]
(see: Dynamic programming and Newton’s method in
unconstrained optimal control)

superlinear convergent rate
[90C30]
(see: Simplicial decomposition)

superlinear function
[90C30]
(see: Image space approach to optimization)

supermaximum point
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

superminimax point
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

superminimax theorem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

supermodular function
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions)

supernode see: insertion —; substitution — 

superpositions of functions
[01A60, 03B30, 54C70, 68Q17]
(see: Hilbert’s thirteenth problem)

superpotential
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational
inequalities)

superpotential see: nonconvex —; nonsmooth —; 
quasidifferentiable — 

superpotentials see: convex variational inequality for an 
elastostatic problem involving QD- —; elastostatic problem 
involving QD- —; variational equality for an elastostatic 
problem involving QD- — 

superproduct of relations
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

superrelation
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

superstep
[65K05, 65Y05]
(see: Parallel computing: models)
superstructure
[03H10, 49J27, 49M37, 90C11, 90C34] 
(see: MINLP: applications in the interaction of design and 
control; Semi-infinite programming and control problems) 
superstructure see: design —; distillation —; heat exchanger 
network —; MEN — 
superstructure model 
[90C90] 
(see: MINLP: heat exchanger network synthesis) 
supervised classification 
[90C90] 
(see: Optimization in medical imaging) 
supervised learning 
[65K05, 68T05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 
90C30, 90C52, 90C53, 90C55, 90C90] 
(see: Disease diagnosis: optimization-based methods; 
Unconstrained optimization in neural network training) 
supervisor algorithm 
90C15, 90C30, 90C99] 
(see: SSC minimization algorithms) 
supervisor and searcher cooperation minimization algorithms 
[90C15, 90C30, 90C99]
(see: SSC minimization algorithms) 
supply see: net — 
supply chain 
[90-02, 90B05, 90B06]
(see: Global supply chain models; Operations research 
models for supply chain management and design) 
supply chain 
[90B50]
(see: Inventory management in supply chains) 
supply chain see: global —; operational decisions in a —; 
strategic design of a — 
supply chain design 
[90-02]
(see: Operations research models for supply chain 
management and design) 
Supply chain management 
[90-02]
(see: Operations research models for supply chain 
management and design) 
supply chain management 
[90-02]
(see: Operations research models for supply chain 
management and design) 
supply chain management see: Bilinear programming: 
applications in the —; Mathematical programming 
methods in —; operational —; strategic — 
supply chain management and design see: Operations 
research models for — 
supply chain models see: Global — 
supply chain optimization 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
Supply chain performance measurement 
(00-02, 01-02, 03-02) 
supply chain simulation models 
[90-02] 
(see: Operations research models for supply chain 
management and design) 
supply chains see: Inventory management in — 


supply node 
[90C35]
(see: Minimum cost flow problem) 

support see: decision —; linear —; Multiple objective 
programming —; total — 

support function 


[26E25, 49J52, 52A27, 65K05, 90C26, 90C30, 90C99] 
(see: Global optimization: envelope representation; 
Minimax: directional differentiability; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives)

support function 

[65K05, 90C30] 

(see: Minimax: directional differentiability)

support hyperplane 

[90C26] 

(see: Global optimization: envelope representation)

support of an integral vector 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods)

support methodologies for auditing decisions see: Multicriteria 
decision — 

support of a natural vector 

[13Cxx, 13Pxx, 14Qxx, 90Cxx] 

(see: Integer programming: algebraic methods)

support problems method see: extended — 

support problems solution method 

[90C34, 91B28] 

(see: Semi-infinite programming and applications in
finance)

support set 

[90C05, 90C10, 90C15] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds; Probabilistic constrained linear 
programming: duality theory) 

support set 
[90C26] 
(see: Global optimization: envelope representation) 

support set of a function 
[90C26] 
(see: Global optimization: envelope representation) 

support system see: Asset liability management decision —; 
decision —; intelligent multicriteria decision —; 
multicriteria decision —; multicriteria group decision — 

support systems see: decision —; intelligent multicriteria 
decision —; Multi-objective optimization and decision —; 
Optimization and decision — 

support systems with multiple criteria see: Decision — 

support vector machine see: generalized eigenvalue 
proximal — 

support vector machine problem see: Generalized eigenvalue 
proximal — 

support vector machines 

[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]

(see: Disease diagnosis: optimization-based methods)

supported efficient solution 

[90C10, 90C29] 

(see: Multi-objective integer linear programming)

supported efficient solutions 

[90C10, 90C35] 

(see: Bi-objective assignment problem)
supporting function see: linear —

supports problems method 

[90C34, 91B28] 

(see: Semi-infinite programming and applications in
finance)

supremal generator 

[90C26]

(see: Global optimization: envelope representation) 

supremum see: essential — 

surface 

(see: State of the art in modeling agricultural systems)

surface see: response — 

surface formula see: integral over — 

surface and groundwater resources 

[90C30, 90C35] 

(see: Optimization in water resources)

surface and groundwater systems 

[90C30, 90C35] 

(see: Optimization in water resources)

surface method see: response — 

surface traction 

[35A15, 47J20, 49J40]

(see: Hemivariational inequalities: static problems)

surface water pumping facilities 

[90C30, 90C35] 

(see: Optimization in water resources)

surplus wealth 

[91B28] 

(see: Financial optimization)

surrogate see: document — 

surrogate constraint 

[90C10, 90C46]

(see: Integer programming duality)

surrogate dual 

[90C10, 90C30]

(see: Integer programming: lagrangian relaxation)

surrogate duality 

[90C10, 90C46]

(see: Integer programming duality) 

surrogate relaxation 

[90C10, 90C46]
(see: Integer programming duality) 

surrogates see: binary —; similarity of — 

surveys see: integration of — 

survivability 

[90-XX] 

(see: Survivable networks) 

survivable network 

[90C10, 90C27, 94C15] 

(see: Graph planarization)

survivable network design problem 

[90-XX] 

(see: Survivable networks)

Survivable networks 

(90-XX) 

(referred to in: Auction algorithms; Communication
network assignment problem; Dynamic traffic networks;
Equilibrium networks; Generalized networks; Maximum
flow problem; Minimum cost flow problem;
Multicommodity flow problems; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Piecewise linear network flow problems; 
Shortest path tree algorithms; Steiner tree problems; 
Stochastic network problems: massively parallel solution; 
Traffic network equilibrium) 
(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation 
networks; Generalized networks; Maximum flow problem; 
Minimum cost flow problem; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Piecewise linear network flow problems; 
Shortest path tree algorithms; Steiner tree problems; 
Stochastic network problems: massively parallel solution; 
Traffic network equilibrium) 
survival of the fittest 
(see: Broadcast scheduling problem) 
sVNS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods) 
(SVP) see: seismic Vessel Problem — 
SVRP 
[90C10, 90C15] 
(see: Stochastic vehicle routing problems) 
swap see: greedy — 
swaps see: monotone sequence of greedy — 
sweep algorithm 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
sweep method see: Aitken double — 
switches see: maximum number of well — 
switching 
[93-XX]
(see: Dynamic programming: optimal control applications) 
switching circuit see: combinatorial — 
switching curve 
[90C30]
(see: Suboptimal control) 
switching engines see: scheduling of — 
switching time 
[93-XX]
(see: Dynamic programming: optimal control applications) 
Sylvester problem 
[90B85, 90C27]
(see: Single facility location: circle covering problem) 
symbolic differentiation 
[26A24, 65D25, 68W30]
(see: Automatic differentiation: introduction, history and 
rounding error estimation; Complexity of gradients, 
Jacobians, and Hessians) 
symbolic manipulation 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 
symbolic preprocessing 
[65G20, 65G30, 65G40, 65H20] 
(see: Interval analysis: intermediate terms) 
symbolic translation 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
symbolically transforming declarative programs
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
symmetric 
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 
symmetric-Broyden method see: Powell- — 
symmetric continuous form see: coercive bilinear — 
symmetric element in a Hilbert space 
[65M60] 
(see: Variational inequalities: F. E. approach) 
symmetric interior of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
symmetric interval matrix see: dimensional —; real — 
symmetric matrix 
[65K05, 90Cxx] 
(see: Symmetric systems of linear equations) 
symmetric matrix see: positive semidefinite —; skew- — 
symmetric matrix M see: skew- — 
symmetric multi-index transportation problem 
[90C35] 
(see: Multi-index transportation problems) 
symmetric network equilibrium 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
symmetric proximity 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
symmetric proximity see: skew- — 
symmetric rank-one approach see: limited-memory — 
symmetric rank-one quasi-Newton method 
[90C30]
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
symmetric rank-one update 
[90C30]
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
symmetric relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
symmetric S2x2%2 group 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
Symmetric systems of linear equations 
(65K05, 90Cxx) 
(referred to in: ABS algorithms for linear equations and 
linear least squares; Cholesky factorization; Gauss, Carl 
Friedrich; Interval linear systems; Large scale trust region 
problems; Large scale unconstrained optimization; 
Orthogonal triangularization; Overdetermined systems of 
linear equations; QR factorization; Solving large scale and 
sparse semidefinite programs) 
(refers to: ABS algorithms for linear equations and linear 
least squares; Cholesky factorization; Gauss, Carl Friedrich;
Interval linear systems; Large scale trust region problems;
Large scale unconstrained optimization; Linear 
programming; Orthogonal triangularization; 
Overdetermined systems of linear equations; QR 
factorization; Solving large scale and sparse semidefinite 
programs) 

symmetric TSP 
[90C59] 
(see: Heuristic and metaheuristic algorithms for the 
traveling salesman problem) 

symmetric TSP (STSP) 
[68Q25, 68R10, 68W40, 90B06, 90B35, 90C06, 90C10, 90C27, 
90C39, 90C57, 90C59, 90C60, 90C90] 
(see: Domination analysis in combinatorial optimization; 
Traveling salesman problem) 

symmetry model see: rotation- — 

synchronized distributed state space search algorithm 

[49J35, 49K35, 62C20, 91A05, 91A40]

(see: Minimax game tree searching)

synchronized parallel CA algorithm 

[90C30] 

(see: Cost approximation algorithms)

synchronous implementation of the auction algorithm 

[90C30, 90C35] 

(see: Auction algorithms)

synchronous parallel see: bulk — 

synchronous parallel computer see: bulk — 

synchronous parallel model see: bulk — 

syndrome see: NIMBY — 

synthesis see: heat exchanger network —; HEN —; MINLP: heat 
exchanger network —; MINLP: reactive distillation 
column —; Mixed integer linear programming: heat 
exchanger network —; network —; problem —; process —; 
robust control —; sequential —; simultaneous — 

synthesis and control see: interaction of design —; mu — 

synthesis and design under uncertainty see: process — 

synthesis method see: sequential MEN — 

synthesis model see: multiperiod MINLP MEN — 

synthesis problem 
[93A30, 93B50] 
(see: Mixed integer linear programming: mass and heat 
exchanger networks) 

synthesis of separation processes 
[90C30]
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes) 

synthesis using MINLP see: HEN — 

synthesis without decomposition see: heat exchanger 
network — 

system 
[65K05, 90C30] 
(see: Bisection global optimization methods) 

system see: Aizenberg-Rabinovich —; alternative linear —; 
ant —; Asset liability management decision support —; 
asymptotical stability of a —; bisubmodular —; center of an 
interval linear —; curvilinear coordinate —; decision 
support —; degenerate —; discrete event dynamic —; 
DSL —; dual —; dynamical —; electric power —; expert —; 
finite jump —; Fritz John —; generalized network 
optimization —; Hamiltonian —; high performance 
computing —; independence —; independent —; 
infeasible —; initial —; intelligent multicriteria decision
support —; interactive disaggregation —; interval linear —; 
iterative function —; lattice-type many-valued logic —; 
MAX-MIN ant —; moving coordinate —; multi-echelon 
arborescence —; multicriteria decision support —; 
multicriteria group decision support —; optimization —; 
perturbed —; Post —; preferential bidding —; projected 
dynamical —; propositional proof —; reduced RLT —; 
rule-based —; selfdual —; sign-solvable linear —; solution 
of a —; stability of a —; stability of a structural analysis —; 
state of a —; sublinear —; submodular —; time-delay —; 
tolerant —; totally dual integral —; variation of a —; 
Variational inequalities: projected dynamical —; variational 
inequality problem and a projected dynamical — 

system of approximate reasoning see: interval logic —; 
point-based logic — 

system cohomology see: local — 

system conditions see: economic — 

system design see: distribution — 

system design problem see: Production-distribution — 

system of equations 
[65H10, 65K10, 65M60, 90C26, 90C30] 
(see: Global optimization methods for systems of nonlinear 
equations; Variational inequalities) 

system of equations 
[65K10, 65M60] 
(see: Variational inequalities) 

system of equations see: nonlinear —; polynomial — 

system of inequalities 
[65H10, 90C26, 90C30] 
(see: Global optimization methods for systems of nonlinear 
equations) 

system of nonlinear equations see: overdetermined —; 
underdetermined —; well-determined — 

system-optimization 

[90B06, 90B20, 91B50] 

(see: Traffic network equilibrium)

system, optimization of see: Wastewater — 

system-optimized transportation network 

[90B06, 90B20, 91B50] 

(see: Traffic network equilibrium)

system-optimizing environment 

[90B80, 90B85, 90Cxx, 91Axx, 91Bxx]

(see: Facility location with externalities) 

system of simplexes 

[65K05, 90C30] 

(see: Bisection global optimization methods) 

system stability 

[90B15] 

(see: Dynamic traffic networks)

system stability see: asymptotical — 

system of variational inequalities 
[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]
(see: Quasidifferentiable optimization: variational 
formulations) 

systematic search 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20,
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10]
(see: Global optimization in protein folding) 

systems see: alternative —; batch production —; boundary 
flux estimation in distributed —; conjunctive use of water 
resource —; convex inequality —; convex-like —; decision 
support —; discrete dynamical —; discrete-time —; 
distribution —; estimating uncertainty in dynamical —; 
expert —; Global optimization in the analysis and 
management of environmental —; homogeneous —;
homogeneous dual —; inequality —; intelligent
multicriteria decision support —; Interval analysis for
optimization of dynamical —; Interval linear —;
inventory —; large scale linear —; linearly elastic —; Model
based control for drug delivery —; Multi-objective
optimization and decision support —; nondegenerate —; 
Optimization and decision support —; Optimization in 
operation of electric and energy power —; Optimization 
strategies for dynamic —; projected dynamical —; 
Quasidifferentiable optimization: stability of dynamic —; 
reaction flux estimation in lumped —; satellite —; State of 
the art in modeling agricultural —; stochastic dynamic —; 
subfamilies of n-valued PI- —; Successive quadratic 
programming: applications in distillation —; surface and 
groundwater —; time-delay —; uncertain —; water 
transportation — 

systems by constructive nonlinear dynamics see: Robust 
design of dynamic — 

systems of equations see: error bound for approximate 
solutions of nonlinear —; existence of solutions of 
nonlinear —; linear —; nonlinear —; rigorous bound for 
solutions of nonlinear —; uniqueness of solutions of 
nonlinear — 

systems of equations: application to the enclosure of all 
azeotropes see: Nonlinear — 

systems of linear equations see: Overdetermined —; 
Symmetric — 

systems of linear relations see: Tucker homogeneous — 

systems of many-valued logic algebras see: Finite complete — 

systems modeling and management see: applications in 
environmental — 

systems with multiple criteria see: Decision support — 

systems of nonlinear equations 
[49M37, 65K10, 90C26, 90C30] 
(see: αBB algorithm)

systems of nonlinear equations 
[65H10, 90C26, 90C30] 
(see: Global optimization methods for systems of nonlinear 
equations) 

systems of nonlinear equations see: Global optimization 
methods for —; Interval analysis: — 

systems for oriented matroids see: axiom — 

systems planning see: distribution — 

systems theory and control 
[49-XX, 60Jxx, 65Lxx, 91B32, 92D30, 93-XX] 
(see: Resource allocation for epidemic control) 

systems of variational inequalities see: QD laws and — 


T 


T see: prohibition parameter — 
t-coloring
[05-XX] 
(see: Frequency assignment problem) 

T-coloring frequency assignment see: order of a —; span of
a —

t-conorm 
[03B50, 03B52, 03E72, 47S40, 68T15, 68T27, 68T30, 68T35,
68Uxx, 90Bxx, 91Axx, 91B06, 92C60] 
(see: Boolean and fuzzy relations; Finite complete systems 
of many-valued logic algebras) 

t-conorms 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 

t-norm 
[03B50, 03B52, 03E72, 47S40, 68T15, 68T27, 68T30, 68T35,
68Uxx, 90Bxx, 91Axx, 91B06, 92C60] 
(see: Boolean and fuzzy relations; Finite complete systems 
of many-valued logic algebras) 

t-norms 

[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]

(see: Checklist paradigm semantics for fuzzy logics) 

TA 

[90C05, 90C30]

(see: Theorems of the alternative and optimization) 

table see: contingency —; k-way — 

table with given marginals 

[90C35]
(see: Multi-index transportation problems) 

table representation see: lookup — 

tableau see: distinguished —; initial —; lexico-positive 
basis —; terminal simplex — 

tables see: triangulation problem for input-output — 

taboo restriction 
[90C26, 90C90] 
(see: Global optimization in binary star astronomy) 

tabu search see: fixed —

tabu see: forbidden or — 

tabu list 
[05C69, 05C85, 68T99, 68W01, 90C27, 90C59] 
(see: Capacitated minimum spanning trees; Heuristics for 
maximum clique and independent set) 

tabu search 
[03B05, 62C10, 65K05, 68P10, 68Q25, 68R05, 68T15, 68T20, 
68T99, 90B80, 90C05, 90C08, 90C09, 90C10, 90C11, 90C15, 
90C26, 90C27, 90C35, 90C57, 90C59, 94C10] 
(see: Bayesian global optimization; Capacitated minimum 
spanning trees; Communication network assignment 
problem; Maximum constraint satisfaction: relaxations and 
upper bounds; Maximum satisfiability problem; 
Metaheuristics; Multi-index transportation problems; 
Quadratic assignment problem) 

tabu search see: allowed neighbor in —; Hamming-reactive —; 
prohibited neighbor in —; reactive — 

tabu search methodology 
[90C10, 90C11, 90C20] 
(see: Linear ordering problem) 

tactics see: trading — 

TAG 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 

tail see: RSM-distribution with algebraically decreasing — 


tail(l) 
(see: Railroad crew scheduling) 
tail of operation 
[90B35]
(see: Job-shop scheduling problem) 
tail problem see: head-body- — 
tailing-off 
[68Q99]
(see: Branch and price: Integer programming with column 
generation) 
tailored optimization 
[90C30, 90C90]
(see: Successive quadratic programming: applications in the 
process industry) 
taker see: price — 
Tal SOCQ see: ben- — 
Tanabe-Todd-Ye potential function 
[37A35, 90C05] 
(see: Potential reduction methods for linear programming) 
tangent 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 
tangent see: bouligand —; weak — 
tangent approximating vector see: high-order — 
tangent cone 
[65K05, 90C20, 90C22, 90C25, 90C31] 
(see: Quadratic programming with bound constraints; 
Semidefinite programming: optimality conditions and 
stability) 
tangent cone 
[90C22, 90C25, 90C31] 
(see: Semidefinite programming: optimality conditions and 
stability) 
tangent cone see: Bouligand — 
tangent high-order approximating cone 
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for 
abnormal points) 
tangent high-order approximating cones 
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for 
abnormal points) 
tangent high-order approximating curve 
[41A10, 46N10, 47N10, 49K27]
(see: High-order necessary conditions for optimality for 
abnormal points) 
tangent hyperplane 
[90C26]
(see: Global optimization: envelope representation) 
tangent-plane criterion 
[49K99, 65K05, 80A10] 
(see: Optimality criteria for multiphase chemical
equilibrium) 
tangent-plane criterion see: reaction — 
tangent set see: first order —; second order — 
tangent sets see: high-order — 
tangents see: parallel — 
tangents algorithm see: parallel- — 
tangle basis 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 
tangle number
[68R10, 90C27]
(see: Branchwidth and branch decompositions) 
tape 
[65G20, 65G30, 65G40, 65H20]
(see: Interval analysis: intermediate terms) 
tape cell of a Turing machine 
[90C60]
(see: Complexity classes in optimization) 
tape heads 
[90C60]
(see: Complexity classes in optimization) 
tape of a Turing machine 
[90C60]
(see: Complexity classes in optimization) 
target 
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
target analysis 
[68T20, 68T99, 90C27, 90C59]
(see: Metaheuristics) 
targets 
(see: Planning in the process industry) 
targets see: environmental — 
Tarjan planarity-testing algorithm see: Hopcroft- — 
task 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10] 
(see: Maximum satisfiability problem) 
task allocation 
[90C30, 90C52, 90C53, 90C55]
(see: Asynchronous distributed optimization algorithms) 
task-network see: state- — 
tatonnement process 
[65K10, 90C90]
(see: Variational inequalities: projected dynamical system) 
t-estimate of the spot rate 
[90C34, 91B28]
(see: Semi-infinite programming and applications in 
finance) 
t-programmed problem of spot rate estimation 
[90C34, 91B28]
(see: Semi-infinite programming and applications in 
finance) 
τ statistic see: Goodman-Kruskal —
TAUT 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras)
TAUT-DNF 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras)
taxonomy 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
taxonomy of the PI-algebras of many-valued logics 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
taxonomy of Pi-logic algebras 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 


Taylor approximation see: truncated — 
Taylor coefficients 
[90C26, 90C30] 
(see: Bounding derivative ranges) 
Taylor form 
[90C26, 90C30] 
(see: Bounding derivative ranges) 
Taylor form test 
[90C26, 90C30] 
(see: Bounding derivative ranges) 
Taylor Operator see: interval —; Point — 
taylor operators see: Automatic differentiation: point and 
interval — 
Taylor series 
[65G20, 65G30, 65G40, 65L99, 90C30, 90C52, 90C53, 90C55] 
(see: Gauss-Newton method: Least squares, relation to 
Newton’s method; Interval analysis: differential equations; 
Suboptimal control) 
Taylor series 
[65K05, 90C26, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators; Bounding derivative ranges) 
Taylor series expansion see: first order — 
Taylor theorem 
[65K05, 90C30] 
(see: Automatic differentiation: point and interval taylor 
operators) 
TBP 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem) 
TCF 
[90C60] 
(see: Computational complexity theory) 
TCF of an algorithm 
[90C60] 
(see: Computational complexity theory) 
Tchebycheff metric see: w-weighted — 
Tchebycheff program 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming) 
TCVSP 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
TDI 
[90C35] 
(see: Feedback set problems) 
TDTSP 
[90C27] 
(see: Time-dependent traveling salesman problem) 
technique see: auction —; Bland —; boundary variation —; 
clipping —; dynamic load balancing —; greedy —; inexact 
line search —; line search —; penalty —; perturbation —; 
pictogram translation mapping —; receiver initiated 
mapping —; reformulation-linearization —; 
reformulation-Linearization/Convexification —; 
Relaxation —; sender initiated mapping —; sphere 
growing —; trust region —; uniform grid —; variance
reduction — 
technique for global optimization see: 
Reformulation-linearization — 
techniques see: acceleration devices and related —;
approximation —; branch and bound —; branch and bound
enumerative —; computer aided —; constraint
satisfaction —; decomposition —; dual —; enumeration —;
Estimating data for multicriteria decision making problems:
optimization —; Load balancing for parallel optimization —;
measurement —; NLP —; presolving —; reformulation —;
reformulation-linearization/convexification —;
subgradient —
techniques for MILP: lagrangian relaxation see:
Decomposition —
techniques for minimizing the energy function see:
Optimization —
techniques for phase retrieval based on single-crystal X-ray
diffraction data see: Optimization —
technological comparison 
[90C26, 90C30]

(see: Forecasting) 

telecommunication 

[90C15, 90C26, 90C33]

(see: Stochastic bilevel programs) 
telecommunication 

[68T99, 90C27]

(see: Capacitated minimum spanning trees) 
telecommunications 

[90C35]

(see: Multicommodity flow problems) 
teletherapy 

[68W01, 90-00, 90C90, 92-08, 92C50]

(see: Optimization based framework for radiation therapy)
temperature 

[05C69, 05C85, 68W01, 90C59]

(see: Heuristics for maximum clique and independent set) 
temperature see: initial —; initial annealing —; start — 
temperature cascade 

[90C90] 

(see: Mixed integer linear programming: heat exchanger
network synthesis)
temperature control 

[35R70, 47S40, 74B99, 74D99, 74G99, 74H99]

(see: Quasidifferentiable optimization: applications to
thermoelasticity)
temperature interval diagram 

[93A30, 93B50] 

(see: Mixed integer linear programming: mass and heat
exchanger networks)
temperature parameter 

[65K05, 90C30] 

(see: Random search methods) 
template see: checklist —; pool — 
templates see: De novo protein design using flexible —; De
novo protein design using rigid —; deformable —
Temple quotient 

[49R50, 65G20, 65G30, 65G40, 65L15, 65L60] 

(see: Eigenvalue enclosures for ordinary differential
equations)
temporal difference 

[49L20, 90C39, 90C40]

(see: Dynamic programming: stochastic shortest path
problems; Neuro-dynamic programming)


temporal links 
(see: Bayesian networks) 
tensegrity 
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35] 
(see: Graph realization via semidefinite programming) 
tensegrity see: unyielding — 
tensor see: strain — 
tensor method 
[90C06] 
(see: Large scale unconstrained optimization)
tenth problem see: Hilbert — 
Terlaky criss-cross method 
[90C05] 
(see: Linear programming: Klee-Minty examples)
term see: intermediate — 
term to maturity 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in
finance) 
term memory see: short- — 
term memory in GRASP see: long- — 
term of a polynomial see: initial — 
term-recurrence see: three- — 
term-recurrence algorithm see: three- — 
term scheduling of batch processes see: Medium- — 
term scheduling of batch processes with resources see: 
Short- — 
term scheduling of continuous processes see: Short- — 
term scheduling, resource constrained: unified modeling 
frameworks see: Short- — 
term scheduling under uncertainty: sensitivity analysis see: 
Short- — 
term structure of interest rates 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
terminal layout problem 
[68T99, 90C27]
(see: Capacitated minimum spanning trees) 
terminal simplex tableau 
[05B35, 65K05, 90C05, 90C20, 90C33]
(see: Criss-cross pivoting rules) 
terminals 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59]
(see: Optimization problems in unit-disk graphs) 
terminals see: away —; home — 
terminate 
[90C26, 90C31] 
(see: Robust global optimization) 
termination 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 
termination criterion
[92C05, 92C40] 
(see: Protein loop structure prediction methods) 
termination on a secondary ray 
[90C33]
(see: Linear complementarity problem) 
terms see: bilinear —; cost —; indexing —; Interval analysis: 
intermediate —; linear fractional —; posynomial —; shift —; 
transportation — 
ternary matroid 
[90C09, 90C10] 
(see: Matroids) 
terrain/funneling methods see: Multi-scale global optimization 
using — 
terrain methods see: Global — 
tertiary structure 
[92B05] 
(see: Genetic algorithms for protein structure prediction) 
tertiary structure prediction 
[92C40]
(see: Monte-Carlo simulated annealing in protein folding) 
test see: feasibility —; Hessian —; infeasibility —; lower 
bound —; midpoint —; minimum ratio —; monotonicity —; 
monotonicity and nonconvexity —; Newton —; 
nonconvexity —; redundancy —; Taylor form —; 
upper-bound —; Wolfe — 
test for the existence of solutions of equations 
[65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization)
test and flexibility index see: Bilevel optimization: feasibility —
test nonmonotone Armijo-like criterion 
[49M07, 49M10, 65K, 90C06, 90C20] 
(see: Spectral projected gradient methods)
test of optimality
[90C05, 90C33]
(see: Pivoting algorithms for linear programming
generating two paths) 
test problems and problem generators see: Combinatorial — 
test set 
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30, 
90C90] 
(see: Disease diagnosis: optimization-based methods) 
test sets 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
test sets in integer programming 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
testing see: planarity — 
testing algorithm see: Hopcroft-Tarjan planarity- — 
testing relational properties 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
tests see: feasibility convergence —; midpoint —; value 
convergence — 
text classification
[90C09, 90C10] 
(see: Optimization in classifying text documents) 


text documents see: classification of —; Optimization in 
classifying — 

ThAlt
[15A39, 90C05] 
(see: Linear optimization: theorems of the alternative) 


than-truckload see: less- — 
their simulation see: Derivatives of markov processes and — 


theorem see: alternative —; asynchronous convergence —; 
Bandler-Kohout compatibility —; basic alternative —; basic
sensitivity —; Bassett-Maybee-Quirk —; biduality —; 
Birkhoff’s —; block —; Brouwer fixed point —; Broyden —;
Carathéodory —; Clarke duality —; classical Lyusternik —; 
coincidence —; composition —; conic duality —; 
convergence —; cook’s —; Cook-Levin —; Du-Hwang 
minimax —; duality —; Dubovitskii-Milyutin —; Duffin —;
equivalence —; Finsler —; fixed point —; folks —; Frank 
discrete separation —; gap —; Gauss-Markoff —;
Gauvin —; Geoffrion —; Gershgorin —; Gordan 
transposition —; Hahn-Banach —; Hahn-Banach linear 
extension —; Hansel —; Hardy-Littlewood-Polya —; 
high-order generalization of Lyusternik —; implicit 
function —; integrality —; interlocking eigenvalue —; 
James sup —; Jaynes entropy concentration —; 
Kharitonov —; Krein-Milman —; L-separation —; 
Liouville —; local quadratic convergence —; Lyusternik —; 
M-separation —; majority —; max-flow min-cut —; 
Mazur-Orlicz —; Mazur-Orlicz version of the
Hahn-Banach —; mean value —; metaminimax —; 
minimax —; Miranda fixed point —; mixed minimax —; 
monotone convergence —; Moreau —; Motzkin —; 
Motzkin transposition —; mountain pass —; parametrized 
Sard —; Parrott —; Perron-Frobenius —; representation —;
Riesz —; right saddle-point —; Rosenbloom —; saddle 
duality —; saddle-minimax —; sandwich —; Savitch —; 
Schauder fixed point —; Schema —; Sierpinski —; Slater —; 
solvability —; Stiemke —; Stiemke transposition —; strong 
duality —; strong perfect graph —; supercritical point —; 
superLagrangian duality —; superminimax —; Taylor —; 
topological representation —; transposition —; triality —; 
triduality —; Tucker's —; Tucker transposition —; Tychonoff 
fixed point —; weak duality —; Weyl fundamental —; 
Yosida-Hewitt —; Zangwill —


theorem of algebra see: Fundamental — 


theorem of the alternative 
[15A39, 90C05, 90C30] 
(see: Farkas lemma; Motzkin transposition theorem; 
Theorems of the alternative and optimization; Tucker 
homogeneous systems of linear relations) 


theorem of the alternative 
[15A39, 90C05, 90C30] 
(see: Farkas lemma; Theorems of the alternative and 
optimization; Tucker homogeneous systems of linear 
relations) 
theorem of the alternatives
[05B35, 90C05, 90C20, 90C33] 
(see: Least-index anticycling rules) 

theorem for the Hamilton-Jacobi-Bellman equation see:
sufficiency — 

theorem for linear optimization see: duality — 

theorem of linear programming see: Extension of the 
fundamental — 

theorem of natural selection see: fundamental — 

theorem prover see: resolution based — 

theorems see: Minimax —; separation — 

theorems of the alternative see: Linear optimization: — 

Theorems of the alternative and optimization 
(90C05, 90C30) 
(referred to in: Farkas lemma; Image space approach to 
optimization; Linear optimization: theorems of the 
alternative) 
(refers to: Farkas lemma; Farkas lemma: generalizations; 
Image space approach to optimization; Linear 
optimization: theorems of the alternative) 

theoretic framework see: Bayesian decision- — 

theoretical 
(see: Global optimization: functional forms) 

theory see: Alternative set —; arbitrage pricing —; axioms of 
alternative set —; Cantor set —; complementary pivot —; 
Complexity —; Computational complexity —; control —; 
critical point —; decision —; degree —; Duality —; 
evolutionary game —; Fenchel-Rockafellar duality —; fixed 
point —; game —; geometric moment —; graph —; 
Interval fixed point —; Lagrangian —; location —; 
matroid —; Maximum entropy and game —; measure —; 
minimax —; moment —; Morse —; multi-attribute utility —; 
polyhedral —; portfolio —; Probabilistic constrained linear 
programming: duality —; Probabilistic constrained 
problems: convexity —; robust control —; scheduling —; 
Stackelberg game —; Standard quadratic optimization 
problems: —; Statistical convergence and turnpike —; 
Topological methods in complementarity —; triality —; 
triduality —; utility — 

theory of algorithms see: complexity — 

theory of automata 
[01A99, 90C99] 
(see: Von Neumann, John) 

theory: biduality in nonconvex optimization see: Duality — 

theory of CNSO problems see: second order Lagrangian — 

theory and control see: systems — 

theory for entropy optimization see: duality — 

theory of envelopes
[01A99]
(see: Leibniz, Gottfried Wilhelm)

theory and examples see: Derivatives of probability and 
integral functions: general — 

theory of games 
[01A99, 90C99] 
(see: Von Neumann, John) 


theory of generalized functions 
[01A99]
(see: Kantorovich, Leonid Vitalyevich) 
theory: monoduality in convex optimization see: Duality — 
theory and optimality conditions see: Saddle point — 
theory of Pl-algebras see: complexity — 
theory: quadratic programming see: Complexity — 
theory of real addition with order see: first order — 
theory: stability of optimal trajectories see: Turnpike — 
theory: triduality in global optimization see: Duality — 
therapy see: Optimization based framework for radiation —;
radiation — 
thermal boundary conditions see: quasidifferential —; 
variational formulation of quasidifferential — 
thermal equilibrium 
[90C27, 90C90] 
(see: Simulated annealing)
thermal fluctuations 
[60J15, 60J60, 60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 
70-08, 82B21, 82B31, 82B41, 82B80, 92C40, 92E10] 
(see: Global optimization in protein folding)
thermal plant 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy
power systems) 
thermodynamic models 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction
equilibrium) 
thermodynamic properties 
[49K99, 65H20, 65K05, 80A10, 80A22, 90C90] 
(see: Global optimization: application to phase equilibrium
problems; Optimality criteria for multiphase chemical 
equilibrium) 
thermodynamics 
[01A99] 
(see: Carathéodory, Constantine) 
thermodynamics 
[49K99, 65K05, 80A10, 90C30] 
(see: Nonlinear systems of equations: application to the 
enclosure of all azeotropes; Optimality criteria for 
multiphase chemical equilibrium) 
thermoelastic behavior of a generally nonhomogeneous and 
nonisotropic body see: linear — 
thermoelastic model see: classical — 
thermoelasticity 
[35R70, 47S40, 74B99, 74D99, 74G99, 74H99]
(see: Quasidifferentiable optimization: applications to 
thermoelasticity) 
thermoelasticity see: Quasidifferentiable optimization: 
applications to — 
thesis see: parallel computation — 
thickness 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
third slope lemma 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s
conjecture) 
thirteenth problem see: Hilbert’s — 
thread 
[90C35]
(see: Generalized networks) 
three-argument function 
[62H30, 90C27]
(see: Assignment methods in clustering) 
three-dimensional transportation problem 
[90C35] 
(see: Multi-index transportation problems)
three-index assignment problems 
[90C35] 
(see: Multi-index transportation problems) 
three-index transportation problem 
[90C35] 
(see: Multi-index transportation problems)
three-layer supergraph 
[65K05, 90-00, 90-08, 90C11, 90C27, 90C35] 
(see: Algorithms for genomic analysis) 
three phase algorithm 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in 
finance) 
three-term-recurrence 
[90C30]
(see: Conjugate-gradient methods) 
three-term-recurrence algorithm 
[90C30]
(see: Conjugate-gradient methods) 
threshold see: determination of clusters size —; determination 
of rmsd —; indifference —; preference —;
unsatisfiability —; veto — 


threshold accepting 
[68T20, 68T99, 90C27, 90C59] 
(see: Metaheuristics) 
threshold accepting algorithms 
[90C59] 
(see: Heuristic and metaheuristic algorithms for the 
traveling salesman problem) 
THS 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
thumb see: rule of — 
TID 
[93A30, 93B50] 
(see: Mixed integer linear programming: mass and heat 
exchanger networks) 
tie breaking rule 
[90C60] 
(see: Complexity of degeneracy) 
tie-up time 
(see: Railroad crew scheduling) 
tie-up-time (I) 
(see: Railroad crew scheduling) 


tight constraints 
[90C60] 
(see: Complexity of degeneracy) 

tight convex underestimators see: Global optimization: — 

tight relaxations
[49M37, 65K10, 90C26, 90C30]
(see: αBB algorithm)

tight relaxations
[90C09, 90C10, 90C11]
(see: Disjunctive programming)

tightening see: node — 

Tikhonov iterative regularization
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems)

Tikhonov regularization
[65Fxx]
(see: Least squares problems)

Tikhonov regularization
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems)

Tikhonov’s regularization approach
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems)

tiling
[65K05, 90C30]
(see: Bisection global optimization methods)

time see: algorithm running in (nS) —; arr- —; completion —; 
cycle —; dep- —; idle —; minimization of cost/ —; 
on-duty —; one clause at a —; output-polynomial —; 
polylogarithmic —; polynomial —; radiation exposure —; 
strongly polynomial —; super-polynomial —; switching —; 
tie-up — 

time algorithm see: efficient polynomially bounded 
polynomial —; exponential —; nondeterministic 
polynomial —; one clause at a —; polynomial —; 
pseudopolynomial —; strongly polynomial —; weakly 
polynomial — 

time algorithms see: discrete- —; polynomial — 

time analog of the dynamic programming equation see: 
continuous- — 

time approach see: one clause at a — 

time approximation scheme see: fully polynomial —; 
polynomial — 

time-bounded Turing machine see: exponentially —; 
polynomially — 

time coefficient generation see: one-at-a- — 

time complexity of a deterministic Turing machine
[90C60]
(see: Complexity classes in optimization)

time complexity function
[90C60]
(see: Computational complexity theory)

time complexity function of an algorithm
[90C60]
(see: Computational complexity theory)
time complexity of a nondeterministic Turing machine
[90C60] 
(see: Complexity classes in optimization) 

time computable function see: polynomial — 

time constraints see: vehicle scheduling problems with — 

time convergence see: polynomial — 

time-delay system 
[90C30] 
(see: Suboptimal control) 

time-delay systems 
[93-XX] 
(see: Dynamic programming: optimal control applications) 

Time-dependent traveling salesman problem 
(90C27) 
(referred to in: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen
mixed integer formulation; MINLP: trim-loss problem; 
Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs) 
(refers to: Branch and price: Integer programming with 
column generation; Decomposition techniques for MILP: 
lagrangian relaxation; Graph coloring; Integer linear 
complementary problem; Integer programming; Integer 
programming: algebraic methods; Integer programming: 
branch and bound methods; Integer programming: branch 
and cut algorithms; Integer programming: cutting plane 
algorithms; Integer programming duality; Integer 
programming: lagrangian relaxation; LCP: Pardalos-Rosen 
mixed integer formulation; Mixed integer classification 
problems; Multi-objective integer linear programming; 
Multi-objective mixed integer programming; 
Multiparametric mixed integer linear programming; 
Parametric mixed integer nonlinear optimization; Set 
covering, packing and partitioning problems; Simplicial 
pivoting algorithms for integer programming; Stochastic 
integer programming: continuity, stability, rates of 
convergence; Stochastic integer programs) 

time-dependent traveling salesman problem 
[90C27] 
(see: Time-dependent traveling salesman problem) 

time deterministic algorithm see: polynomial — 

time discretization see: uniform — 

time equivalent of the dynamic programming algorithm see: 
continuous- — 

time formulation see: continuous- — 

time formulations see: discrete- — 

time horizon see: infinite — 

time interior point methods see: polynomial — 


time (I) see: tie-up- — 
time local search problems see: polynomial — 
time m see: algorithm solving a problem instance in — 
Time Model see: continuous —; discrete — 
time models see: continuous and discrete —; discrete- — 
time network see: Space- —; weekly space- — 
time optimal control 
[90C30] 
(see: Suboptimal control) 
time optimal control see: continuous- —; Discrete- —; 
Dynamic programming: continuous- — 
time optimal control problem 
[93-XX] 
(see: Dynamic programming: optimal control applications) 
time parallelism 
[49-04, 65Y05, 68N20] 
(see: Automatic differentiation: parallel computation) 
time problem see: strongly polynomial — 
time ratio see: cost-to- — 
time ratio cycle see: maximum profit-to- —; minimum 
cost-to- — 
time reduction see: polynomial — 
time replicated network 
[90C30, 90C35] 
(see: Optimization in water resources) 
Time Representation see: mixed — 
time Riccati equation see: continuous- — 
time series 
[90C26, 90C30] 
(see: Forecasting)
time series analysis 
[90C26, 90C30] 
(see: Forecasting)
time slice 
(see: Bayesian networks)
time solution see: polynomial — 
time-stamped models 
(see: Bayesian networks)
time-step 
[90C10, 90C30, 90C35] 
(see: Optimization in operation of electric and energy
power systems) 
time systems see: discrete- — 
time of a Turing machine see: running — 
time window constraints 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
time windows 
[00-02, 01-02, 03-02] 
(see: Vehicle routing problem with simultaneous pickups 
and deliveries) 
time windows see: vehicle routing problem with — 
times see: the New York —; regenerative stopping —; 
stochastic travel — 
timetabling 
[90C35]
(see: Multi-index transportation problems) 
TM
[90C60] 
(see: Complexity theory) 
TNE
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
Todd-Ye potential function see: Tanabe- — 
Toeplitz matrix 
[90C08, 90C11, 90C27, 90C57, 90C59] 
(see: Quadratic assignment problem)
tolerance 
[90C33] 
(see: Order complementarity; Peptide identification via
mixed-integer optimization) 
tolerance closure of a relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations)
tolerance closure of a relation see: local — 
tolerance relation see: local — 
tolerances 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations)
tolerant system 
[90C33] 
(see: Order complementarity)
tollens see: checklist modus —; modus — 
Tomlin penalty see: Driebeck- — 
tomography see: computerized — 
tON 
[74440, 90C26]
(see: Shape selective zeolite separation and catalysis: 
optimization methods) 
tongue-and-groove constraint 
[68W01, 90-00, 90C90, 92-08, 92C50] 
(see: Optimization based framework for radiation therapy)
tools see: parallel AD — 
TOP and BOT types of logical connectives 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics)
topological degree 
[65G20, 65G30, 65G40, 90C33] 
(see: Interval analysis: systems of nonlinear equations; 
Topological methods in complementarity theory) 
Topological derivative in shape optimization 
(49Q10, 49Q12, 74P05, 35J85)
(referred to in: Shape optimization) 
topological method 
[90C33] 
(see: Topological methods in complementarity theory)
topological methods 
[90C33] 
(see: Topological methods in complementarity theory)
Topological methods in complementarity theory 
(90C33)
(referred to in: Equivalence between nonlinear
complementarity problem and fixed point problem; 
Generalized nonlinear complementarity problem; Integer 
linear complementary problem; LCP: Pardalos-Rosen 
mixed integer formulation; Linear complementarity 
problem; Order complementarity; Principal pivoting 
methods for linear complementarity problems) 
(refers to: Convex-simplex algorithm; Equivalence between
nonlinear complementarity problem and fixed point
problem; Generalized nonlinear complementarity problem; 
Integer linear complementary problem; LCP: 
Pardalos-Rosen mixed integer formulation; Lemke
method; Linear complementarity problem; Linear 
programming; Order complementarity; Parametric linear 
programming: cost simplex algorithm; Principal pivoting 
methods for linear complementarity problems; Sequential 
simplex method) 


topological representation theorem
[90C09, 90C10]
(see: Oriented matroids)


topological search
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: two-phase methods)


topological separation
[93D09]
(see: Robust control)


topological space see: linear — 
topological stability
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)


topological stability in parametric programming
[90C05, 90C25, 90C29, 90C30, 90C31]
(see: Nondifferentiable optimization: parametric
programming)


topological vector spaces see: Increasing and
convex-along-rays functions on —; Increasing and
positively homogeneous functions on —


topology
[05C05, 05C85, 68Q25, 90B80, 90C26, 90C90]
(see: Bottleneck steiner tree problems; Structural
optimization: history)


topology
[03E70, 03H05, 91B16]
(see: Alternative set theory)


topology see: network —; ring —; strong operator —; tree — 
Topology of global optimization
(90C30, 58E05)
(referred to in: αBB algorithm; Continuous global
optimization: applications; Continuous global 
optimization: models, algorithms and software; Differential 
equations and global optimization; Direct global 
optimization algorithm; Globally convergent homotopy 
methods; Global optimization based on statistical models; 
Global optimization in binary star astronomy; Global 
optimization methods for systems of nonlinear equations; 
Global optimization using space filling; Parametric 
optimization: embeddings, path following and 
singularities; Semidefinite programming and structural 
optimization; Topology optimization) 

(refers to: αBB algorithm; Continuous global optimization:
applications; Continuous global optimization: models, 
algorithms and software; Differential equations and global 
optimization; Direct global optimization algorithm; 
Globally convergent homotopy methods; Global 
optimization based on statistical models; Global 
optimization in binary star astronomy; Global 
optimization methods for systems of nonlinear equations; 
Global optimization using space filling; Parametric 
optimization: embeddings, path following and 
singularities; Semidefinite programming and structural
optimization; Structural optimization; Structural 
optimization: history; Topology optimization) 

Topology optimization 
(90C90, 90C99) 
(referred to in: Semidefinite programming and structural 
optimization; Structural optimization: history; Topology of 
global optimization) 
(refers to: Semidefinite programming and structural 
optimization; Structural optimization; Structural 
optimization: history; Topology of global optimization) 

topology optimization 
[49J20, 49J52, 49M37, 65K05, 90C26, 90C30, 90C90] 
(see: Shape optimization; Structural optimization; 
Structural optimization: history) 

topology optimization
[49M37, 65K05, 90C30]
(see: Structural optimization)

topology optimization see: structural — 

topology of transportation networks
[49M37, 90C11]
(see: Mixed integer nonlinear programming)

toric ideal
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)

toroidal butterfly
[90C35]
(see: Feedback set problems)

toroidal mesh
[90C35]
(see: Feedback set problems)

torus see: 2-dimensional —; d-dimensional — 

total action
[49-XX, 90-XX, 93-XX]
(see: Duality theory: biduality in nonconvex optimization)

total coloring
[90C35]
(see: Graph coloring)

total coloring problem
[90C35]
(see: Graph coloring)

total consistency
[90C29]
(see: Estimating data for multicriteria decision making
problems: optimization techniques)

total cost function
[90C10, 90C25, 90C27, 90C35]
(see: L-convex functions and M-convex functions)

total cost infinite horizon problem
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path
problems)

total Gibbs free energy
[90C26, 90C90]
(see: Global optimization in phase and chemical reaction
equilibrium)

total least squares see: Generalized — 


total least squares problem 
[65Fxx]
(see: Least squares problems) 
total support 
[90C09, 90C10]
(see: Combinatorial matrix analysis) 
total variation 
[90C11, 90C15]
(see: Stochastic programming with simple integer recourse) 
totally acyclic oriented matroid 
[90C09, 90C10]
(see: Oriented matroids) 
totally asynchronous implementation of the auction algorithm 
[90C30, 90C35]
(see: Auction algorithms) 
totally asynchronous operation 
[90C30, 90C52, 90C53, 90C55]
(see: Asynchronous distributed optimization algorithms) 
totally dual integral system 
[90C35]
(see: Feedback set problems) 
totally unimodular 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C, 90C05, 90C09, 90C10, 90C27, 90C35] 
(see: Assignment and matching; Combinatorial
optimization algorithms in resource allocation problems; 
Convex discrete optimization) 
totally unimodular matrix 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: cutting plane algorithms) 
tour 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: cutting plane algorithms) 
tour elimination constraints see: sub- — 
tour partitioning algorithm see: K-iterated — 
tournament see: bipartite —; spanning acyclic — 
tours see: sub- — 
toy problems 
(see: Planning in the process industry)
Toyoda primal heuristic 
[90C10, 90C27]
(see: Multidimensional knapsack problems) 
TPBVP 
[65L99, 93-XX]
(see: Optimization strategies for dynamic systems) 
TPC 
[49K99, 65K05, 80A10]
(see: Optimality criteria for multiphase chemical 
equilibrium) 
TR 
[90C30] 
(see: Large scale trust region problems) 
TR strategy 
[90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
trace see: maximum weight — 
trace of an alignment 
[90C35] 
(see: Optimization in leveled graphs) 
trace polytope
[90C35] 
(see: Optimization in leveled graphs) 
trace of relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
tracing see: origin — 
tracing infeasibilities see: diagnosing and — 
tracing the states of plants 
(see: Planning in the process industry) 
track 
[90C35] 
(see: Multi-index transportation problems) 
tracking 
[34-XX, 49-XX, 65-XX, 68-XX, 90-XX] 
(see: Nonlocal sensitivity analysis with automatic 
differentiation) 
tracking see: multitarget — 
tracking stations 
[26A24, 65K99, 85-08] 
(see: Automatic differentiation: geometry of satellites and 
tracking stations) 
tracking stations see: Automatic differentiation: geometry of 
satellites and — 
tractability see: analytical —; fixed parameter — 
tractable algorithms see: fixed parameter — 
traction see: surface — 
trade see: pure — 
trade economic equilibrium model see: pure — 
trade-off 
[90-XX] 
(see: Outranking methods)
trade-off cutting plane 
[90C29] 
(see: Multi-objective optimization; Interactive methods for
preference value functions) 
trade-off question 
[90C29] 
(see: Multi-objective optimization; Interactive methods for
preference value functions) 
trade-offs 
[49M37, 90C11, 90C29, 90C90] 
(see: MINLP: applications in the interaction of design and
control; Multi-objective optimization: interaction of design
and control)
trading tactics 
[90C27] 
(see: Operations research and financial markets)
traffic assignment 
[90B06, 90B20, 90C90, 91A65, 91B50, 91B99] 
(see: Bilevel programming: applications; Traffic network
equilibrium) 
traffic assignment 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium)
traffic assignment see: dynamic — 
traffic assignment problem 
[90C30] 
(see: Simplicial decomposition) 


traffic control see: ground delay problem in air — 

traffic control and ground delay programs see: air — 

traffic equilibrium problem see: standard — 

Traffic network equilibrium 
(90B06, 90B20, 91B50) 
(referred to in: Auction algorithms; Communication 
network assignment problem; Dynamic traffic networks; 
Equilibrium networks; Financial equilibrium; Generalized 
monotonicity: applications to variational inequalities and 
equilibrium problems; Generalized networks; Maximum 
flow problem; Minimum cost flow problem; 
Multicommodity flow problems; Network design problems; 
Network location: covering problems; Nonconvex network 
flow problems; Oligopolistic market equilibrium; Piecewise 
linear network flow problems; Shortest path tree 
algorithms; Spatial price equilibrium; Steiner tree 
problems; Stochastic network problems: massively parallel 
solution; Survivable networks; Walrasian price 
equilibrium) 
(refers to: Auction algorithms; Communication network 
assignment problem; Directed tree networks; Dynamic 
traffic networks; Equilibrium networks; Evacuation 
networks; Financial equilibrium; Frank-Wolfe algorithm; 
Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Generalized 
networks; Maximum flow problem; Minimum cost flow 
problem; Network design problems; Network location: 
covering problems; Nonconvex network flow problems; 
Oligopolistic market equilibrium; Piecewise linear network 
flow problems; Shortest path tree algorithms; Spatial price 
equilibrium; Steiner tree problems; Stochastic network 
problems: massively parallel solution; Survivable networks; 
Walrasian price equilibrium) 

traffic network equilibrium 
[90B06, 90B15, 90B20, 91B50] 
(see: Dynamic traffic networks; Traffic network 
equilibrium) 

traffic network equilibrium 
[90C30] 
(see: Equilibrium networks) 

traffic network equilibrium see: fixed demand —; 
multimodal — 

traffic network equilibrium model see: multimodal — 

traffic network equilibrium with travel disutility functions 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 

traffic network model see: dynamic — 

traffic network problems see: fixed demand — 

traffic network problems with travel demand functions see: 
elastic demand — 

traffic networks see: Dynamic — 

traffic in transmission network see: routing of — 

trails see: diverging — 

train arc 
(see: Railroad crew scheduling; Railroad locomotive 
scheduling) 

train connection see: train-to- — 

train connection arcs see: train- — 
train-to-train connection
(see: Railroad locomotive scheduling)
train-train connection arcs
(see: Railroad locomotive scheduling)
training
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55]
(see: Unconstrained optimization in neural network
training)
training see: Unconstrained optimization in neural network — 
training algorithms
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55]
(see: Unconstrained optimization in neural network
training)
training data
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90, 91B28]
(see: Credit rating and optimization methods; Disease
diagnosis: optimization-based methods)
training a network
[90C39]
(see: Neuro-dynamic programming)
training samples
[62H30, 68T10, 90C05, 90C11]
(see: Linear programming models for classification; Mixed
integer classification problems)
training set
[65K05, 90-08, 90C05, 90C06, 90C10, 90C11, 90C20, 90C30,
90C90]
(see: Disease diagnosis: optimization-based methods)
trajectories see: Turnpike theory: stability of optimal — 
trajectories and controls see: suboptimal — 
trajectory see: central —; optimal —; optimization over a —;
search —
trajectory and control functions see: asymptotically admissible
pair of —
trajectory-control pair see: admissible — 
trajectory function
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems)
trajectory-function and control-function see: admissible pair
of —
tramp steamer problem
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization)
transcription element identification see: Mixed 0-1 linear
programming approach for DNA —
transfer module see: mass/heat — 
transfers see: principle of — 
transform see: Cayley —; orthogonal —; principal pivotal — 
transformable decision problem see: polynomially — 
transformation see: canonical —; canonical dual —; code —;
contradual —; convex —; coordinate —; dual —;
exponential —; fast Givens —; Fenchel —; Given —;
Householder —; identity —; integral Fenchel-Legendre —;
inverted —; kernel —; Legendre —; linear —; logarithmic
and square-root —; Markov —; modified square-root —;
negation —; Piaget group of —; polynomial —; principal
pivotal —; projective —; source code —; square-root —;
square-root-free Givens —; unimodular max-closed form —
transformation method see: canonical dual — 


transformation of problems 
[90C60] 
(see: Computational complexity theory) 
transformations see: elementary —; elementary orthogonal —; 
QR factorization using Householder —; unimodular 
max-closed form — 
transformed network 
[90C35]
(see: Maximum flow problem)
transforming declarative programs see: symbolically — 
transient class of states 
[49L99] 
(see: Dynamic programming: average cost per stage
problems) 
transient regime 
[60J05, 90C15]
(see: Derivatives of markov processes and their simulation) 
transients 
(see: Emergency evacuation, optimization modeling)
transit planning see: extra-urban — 
transition matrix 
[93-XX] 
(see: Boundary condition iteration BCI) 
transition probability density 
[60G35, 65K05] 
(see: Differential equations and global optimization)
transition probability matrix 
[49L20, 49L99, 90C39]
(see: Dynamic programming: average cost per stage
problems; Dynamic programming: discounted problems) 
transition rules of a Turing machine 
[90C60] 
(see: Complexity theory)
transitions 
(see: State of the art in modeling agricultural systems) 
transitions see: generic — 
transitive relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
transitivity 
[41A30, 4799, 65K10] 
(see: Lipschitzian operators in best approximation by 
bounded or continuous functions) 
translation see: symbolic — 
translation mapping technique see: pictogram — 
transmission network see: routing of traffic in — 
transparency 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
transport see: Identification methods for reaction kinetics 
and — 
transportation 
[90C15, 90C26, 90C30, 90C31, 90C33] 
(see: Bilevel programming: introduction, history and 
overview; Stochastic bilevel programs) 
transportation 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C11, 90C27, 
90C35] 
(see: Multicommodity flow problems; Stochastic 
transportation and location problems; Vehicle scheduling) 
transportation cost
[90B80] 
(see: Facilities layout problems) 

transportation decisions see: inventory and — 

transportation and location problem see: stochastic — 

transportation and location problems see: Stochastic — 

transportation models 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

transportation network see: system-optimized —; 
user-optimized — 

transportation networks see: topology of — 

transportation polytope see: k-way — 

transportation problem 
[90B80, 90C11, 90C35] 
(see: Minimum cost flow problem; Multi-index 
transportation problems; Stochastic transportation and 
location problems) 

transportation problem 
[90C35] 
(see: Multi-index transportation problems) 

transportation problem see: 3D- —; axial multi-index —; 
capacitated —; convex integer —; fixed charge —; integer 
multi-index —; k-index —; minimum concave —; 
multi-index —; multidimensional —; planar multi-index —; 
stochastic —; symmetric multi-index —; 
three-dimensional —; three-index — 

transportation problems see: Minimum concave —; 
Multi-index — 

transportation systems see: water — 

transportation terms 
(see: Planning in the process industry) 

transposed relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

transposition 

[90C08, 90C11, 90C27, 90C57, 90C59] 

(see: Quadratic assignment problem) 

transposition theorem
[15A39, 90C05, 90C30]
(see: Farkas lemma; Linear optimization: theorems of the
alternative; Theorems of the alternative and optimization)

transposition theorem
[15A39, 90C05, 90C30]
(see: Linear optimization: theorems of the alternative;
Motzkin transposition theorem; Theorems of the
alternative and optimization; Tucker homogeneous systems
of linear relations)

transposition theorem see: Gordan —; Motzkin —; Stiemke —; 
Tucker — 

transshipment 
[90C30, 90C35] 
(see: Auction algorithms) 

transshipment model 
[90C90, 93A30, 93B50]
(see: MINLP: heat exchanger network synthesis; Mixed 
integer linear programming: heat exchanger network 
synthesis; Mixed integer linear programming: mass and 
heat exchanger networks) 




transshipment model 
[90C90] 
(see: Mixed integer linear programming: heat exchanger 
network synthesis) 

transshipment model see: expanded — 

transshipment node
[90C35]
(see: Minimum cost flow problem)

transshipment problem
[90C26]
(see: MINLP: application in facility location-allocation)

transshipment problem
[90C30, 90C35]
(see: Auction algorithms)

transshipment vertex
[05C05, 05C40, 68R10, 90C35]
(see: Network design problems)

transversal of a collection of subsets
[90C09, 90C10]
(see: Matroids)

transversal intersection
[90C22, 90C25, 90C31]
(see: Semidefinite programming: optimality conditions and
stability)

transversal matroid
[90C09, 90C10]
(see: Matroids)

transversal to zero
[65F10, 65F50, 65H10, 65K10]
(see: Globally convergent homotopy methods)

trap-door point
[90C31, 90C34]
(see: Parametric global optimization: sensitivity)


trapezoid graph 

[90C35]
(see: Feedback set problems) 

travel see: light — 

travel behavior see: day-to-day dynamic — 

travel demand see: elastic —; fixed — 

travel demand functions see: elastic demand traffic network 
problems with — 

travel disutility functions see: traffic network equilibrium 
with — 

travel times see: stochastic — 

Traveling purchaser problem 
[9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods) 

traveling salesman see: probabilistic — 

Traveling salesman problem 
(90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90) 
(referred to in: Domination analysis in combinatorial 
optimization) 
(refers to: Domination analysis in combinatorial 
optimization; Evolutionary algorithms in combinatorial 
optimization; Heuristic and metaheuristic algorithms for 
the traveling salesman problem) 
traveling salesman problem
[05C05, 05C40, 68R10, 90C05, 90C06, 90C08, 90C10, 90C11, 
90C27, 90C30, 90C35, 90C57] 
(see: Assignment and matching; Integer programming; 
Integer programming: cutting plane algorithms; Integer 
programming: lagrangian relaxation; Network design 
problems) 
traveling salesman problem see: classical —; graphical —; 
Heuristic and metaheuristic algorithms for the —; prize 
collecting —; road —; Steiner graphical —; 
Time-dependent — 
Traveling Salesman Problem (ATSP) see: asymmetric — 
traveling salesman problems 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and cut algorithms) 
traveling salesperson problem 
[90C60] 
(see: Computational complexity theory) 
Traverso algorithm see: Conti- — 
treatment design see: Beam selection in radiotherapy — 
tree 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
tree see: additive —; attributed —; binary —; decision —;
directed —; empty —; full Steiner —; game —; index —;
k- —; level of a vertex in a rooted —; min-max Steiner —;
minimax —; minimum ratio spanning- —; minimum
spanning —; one- —; rectilinear Steiner —; rectilinear
Steiner arborescence —; root of a —; rooted —; scenario —;
spanning —; Steiner —; Steiner minimal —; Steiner
minimum —

tree algorithm see: parallel minimax —; sequential minimax 
game — 

tree algorithms see: Shortest path — 

tree association graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 

tree association graph see: weighted — 

tree constraints 
[90C09, 90C10] 
(see: Combinatorial optimization algorithms in resource 
allocation problems) 

tree decomposition 
[68R10, 90C27] 
(see: Branchwidth and branch decompositions) 

tree dissection 
[90C15] 
(see: Stochastic programming: parallel factorization of 
structured matrices) 

tree networks see: Directed — 

tree node 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 

tree nodes set 
[90B10, 90C27] 
(see: Shortest path tree algorithms) 

tree problem see: bounded degree minimum spanning —; 
capacitated minimum spanning —; minimum spanning —; 
resource-constrained minimum spanning —; single source 
shortest path —; Steiner minimal — 


tree problem with minimum number of Steiner points see: 
Steiner — 

tree problems see: Bottleneck steiner —; shortest path —; 
Steiner — 

tree search 
[90C05, 90C06, 90C08, 90C10, 90C11] 
(see: Integer programming: branch and bound methods) 

tree search 
[68W10, 90C27]
(see: Load balancing for parallel optimization techniques) 

tree search see: best-first —; depth-first —; Parallel 
Best-First —; Parallel Depth-First — 

tree search algorithm see: distributed game —; generalized 
game — 

tree searching see: Minimax game — 

tree solution see: spanning — 

tree-splitting algorithm 
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching) 

tree structure see: feasible spanning —; optimal spanning —; 
spanning — 

tree topology 
[05C85] 
(see: Directed tree networks) 

trees see: asymptotic behavior of CAP on —; barycentric 
scenario —; binary —; bottleneck Steiner —; CAP on —; 
Capacitated minimum spanning —; classification and
regression —; exact algorithm for solving CAP on —;
heuristic approach to solving CAP on —; minimum
spanning —; parallelizing the exploration of minimax —;
variations of Steiner — 

treewidth
[68R10, 90C27]
(see: Branchwidth and branch decompositions)

trial point
[49M07, 49M10, 65K, 90C06, 90C20]
(see: Spectral projected gradient methods)

trial step
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design)

trial steplength see: compute a safeguarded new — 

triality
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)

triality theorem
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)

triality theory
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)

triangle see: Cholesky — 

triangle inequality
[05C05, 05C40, 68R10, 90B06, 90B35, 90C06, 90C10, 90C27,
90C35, 90C39, 90C57, 90C59, 90C60, 90C90]
(see: Multi-index transportation problems; Network design
problems; Traveling salesman problem)

triangle inequality see: strengthen — 

triangle product see: fuzzy — 

triangularization see: Orthogonal — 
triangulate
(see: Semidefinite programming and the sensor network
localization problem, SNLP)
triangulation
[13Cxx, 13Pxx, 14Qxx, 68Q20, 90C05, 90C10, 90Cxx]
(see: Integer programming: algebraic methods; Optimal
triangulations; Simplicial pivoting algorithms for integer
programming)
triangulation see: boundary —; delaunay —; greedy —;
Hickey-Cohen —; minimum weight —; minimum weight
Steiner —; regular —
triangulation of Euclidean space
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)
triangulation problem for input-output tables
[90C10, 90C11, 90C20]
(see: Linear ordering problem)
triangulations see: Optimal —; pseudo- —; regular —; regular
family of —
tricanonical forms 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: triduality in global optimization)
tridiagonal matrix 
[65K05, 90Cxx] 
(see: Symmetric systems of linear equations)
triduality 
[49-XX, 90-XX, 93-XX] 

(see: Duality theory: monoduality in convex optimization) 
triduality 
[49-XX, 90-XX, 93-XX] 

(see: Duality theory: triduality in global optimization) 
triduality in global optimization see: Duality theory: — 
triduality theorem 
[49-XX, 90-XX, 93-XX] 

(see: Duality theory: triduality in global optimization) 
triduality theory
[49-XX, 90-XX, 93-XX]
(see: Duality theory: triduality in global optimization)
trim loss
[90C11, 90C90]
(see: MINLP: trim-loss problem)

trim-loss problem 

[90C11, 90C90] 

(see: MINLP: trim-loss problem) 

trim-loss problem see: MINLP: —; numerical example of a — 
trinomial distribution 

[90C15] 

(see: Logconcavity of discrete distributions) 
trip see: passenger —; pull-in —; pull-out — 
trip-route choice adjustment process 

[90B15] 

(see: Dynamic traffic networks) 
trip shifting see: Vehicle scheduling with — 


triplet
[65K05, 90C30]
(see: Automatic differentiation: calculation of the Hessian)
triplet see: sparse — 
triplets 
[05C85]
(see: Directed tree networks) 
trisection 
[65K05]
(see: Direct global optimization algorithm) 
tRT 
(see: Integrated planning and scheduling) 
truckload see: less-than- — 
true continuum 
[90C30] 
(see: Conjugate-gradient methods) 
truncated Buchberger algorithm 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
truncated Newton 
[65K05, 90C30] 
(see: Automatic differentiation: calculation of Newton 
steps) 
truncated Newton method 
[90C06] 
(see: Large scale unconstrained optimization) 
truncated Newton method 
[90C25, 90C30] 
(see: Successive quadratic programming: solution by active 
sets and interior point methods) 
truncated Newton method see: discrete — 
truncated Newton software package see: block — 
truncated singular value decomposition solution 
[65Fxx] 
(see: Least squares problems) 
truncated Taylor approximation 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 
truss 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
truss see: elastic bar of a —; node of a — 
truss design 
[90C25, 90C27, 90C90] 
(see: Semidefinite programming and structural 
optimization) 
truss design see: multiload —; robust obstacle-free — 
trust region 
[49M37, 90C30] 
(see: Large scale trust region problems; Nonlinear least 
squares problems; Nonlinear least squares: trust region 
methods; Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
trust region 
[90C30] 
(see: Nonlinear least squares problems; Successive quadratic 
programming) 
trust region see: bundle —; large scale —
trust region approach 
[90C30]
(see: Numerical methods for unary optimization) 
trust region method 
[90C20, 90C25]
(see: Quadratic programming over an ellipsoid) 
trust region method 
[49M37]
(see: Nonlinear least squares: trust region methods) 
trust region methodology 
[49M37, 65K05, 65K10, 90C30, 93A13]
(see: Multilevel methods for optimal design) 
trust region methods 
[49M37]
(see: Nonlinear least squares: trust region methods) 
trust region methods see: Nonlinear least squares: — 
trust region model 
[65F10, 65F50, 65H10, 65K10]
(see: Multidisciplinary design optimization) 
trust region model 
[90C20, 90C25]
(see: Quadratic programming over an ellipsoid) 
trust region problem
[90C60]
(see: Complexity theory: quadratic programming)
trust region problem see: general case of the —; hard case of 
the —; large scale —; Newton step case of the — 
trust region problems see: Large scale — 
trust region strategy see: pure — 
trust region technique 
[49M37]
(see: Nonlinear least squares: Newton-type methods) 
trust regions 
[90C30]
(see: Broyden family of methods and the BFGS update) 
trustworthy space 
[49K27, 58C20, 58E30, 90C48]
(see: Nonsmooth analysis: Fréchet subdifferentials) 
truth assessment see: fuzzy — 
truth-functional 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
TS
[68T99, 90C27]
(see: Capacitated minimum spanning trees)

TS see: reactive —; static —; strict 

Tsallis probability distribution
[90C90]
(see: Simulated annealing methods in protein folding)
TSP
[05-04, 68Q25, 68R10, 68W40, 90B06, 90B35, 90C06, 90C10,
90C27, 90C39, 90C57, 90C59, 90C60, 90C90] 
(see: Domination analysis in combinatorial optimization; 
Evolutionary algorithms in combinatorial optimization; 
Traveling salesman problem) 

TSP see: asymmetric —; euclidean —; generalized —; m- —; 
max —; symmetric — 


TSP (ATSP) see: asymmetric — 

TSP (STSP) see: symmetric — 

TSVSP 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

Tucker approach see: Kuhn- — 

Tucker, A.W. 
[05B35, 65K05, 90C05, 90C20, 90C33] 
(see: Criss-cross pivoting rules) 

Tucker conditions see: generalized Karush-Kuhn- —;
Karush-Kuhn- —; Kuhn- —

Tucker conditions for quadratic programming sub-problems 
see: Kuhn- —

Tucker CQ see: Kuhn- — 

Tucker equations see: Kantorovich-Karush-Kuhn- —;
Kuhn- — 

Tucker homogeneous systems of linear relations 
(15A39, 90C05)
(referred to in: Farkas lemma; Linear optimization: 
theorems of the alternative; Linear programming; Motzkin 
transposition theorem) 
(refers to: Farkas lemma; Linear optimization: theorems of 
the alternative; Linear programming; Motzkin 
transposition theorem) 

Tucker necessary optimality conditions see: Kuhn- —

Tucker optimality condition see: Kuhn- — 


Tucker optimality conditions see: Karush-Kuhn- —; Kuhn- — 
Tucker point see: Karush-Kuhn- —; Kuhn- —
Tucker points see: multiple Kuhn- —; multiple QP Kuhn- —


Tucker’s theorem 
[15A39, 90C05, 90C30] 

(see: Farkas lemma; Theorems of the alternative and 
optimization) 

Tucker transposition theorem 
[15A39, 90C05] 

(see: Tucker homogeneous systems of linear relations) 

Tucker type condition see: Karush-Kuhn- — 

tumors see: breast — 

tuning
(see: Bayesian networks)

tuning parameter
[65K05]
(see: Direct global optimization algorithm)

tunneling method
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55]
(see: Unconstrained optimization in neural network
training)

turbine balancing problem
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem)

Turing machine
[90C20, 90C25, 90C60]
(see: Complexity theory; Complexity theory: quadratic
programming; Quadratic programming over an ellipsoid)

Turing machine
[90C60]
(see: Complexity theory)

Turing machine see: accepting computation of a —; accepting 
state of a —; alternating —; control state of a —; 
deterministic —; execution of a —; exponentially 
space-bounded —; exponentially time-bounded —; final
state of a —; input alphabet of a —; language accepted by
a —; length of a partial computation of a —; logspace —; 
move of a —; nonaccepting computation of a —; 
nondeterministic —; partial computation of a —; 
polynomially space-bounded —; polynomially 
time-bounded —; running time of a —; size of the input of
a —; space complexity of a deterministic —; space 
complexity of a nondeterministic —; start state of a —; state 
of a —; tape of a —; tape cell of a —; time complexity of 
a deterministic —; time complexity of 
a nondeterministic —; transition rules of a — 
Turing machine model 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based 
optimization) 
Turing machine solving a problem 
[90C60] 
(see: Complexity theory) 
Turing machines see: complexity of — 
Turing reducibility see: polynomial — 
turning point 
[90C31, 90C34] 
(see: Parametric global optimization: sensitivity) 
turning point see: quadratic — 
turnpike theory see: Statistical convergence and — 
Turnpike theory: stability of optimal trajectories 
(49J24, 35B40, 37C70)
(referred to in: Statistical convergence and turnpike theory)
(refers to: Statistical convergence and turnpike theory) 
twice codifferentiable 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD
functions) 
twice codifferentiable function 
[49J52, 65K99, 70-08, 90C25]
(see: Quasidifferentiable optimization: codifferentiable 
functions) 
twice continuously codifferentiable 
[65Kxx, 90Cxx] 
(see: Quasidifferentiable optimization: algorithms for QD 
functions) 
twice continuously codifferentiable function 
[49J52, 65K99, 70-08, 90C25] 
(see: Quasidifferentiable optimization: codifferentiable 
functions) 
twice-differentiable function see: piecewise — 
twice-differentiable MINLPs 
[65K05, 90C11, 90C26] 
(see: MINLP: global optimization with αBB)
Twice-differentiable NLPs 
[49M37, 65K10, 90C26, 90C30] 
(see: αBB algorithm)
twice-differentiable part of a function 
[90Cxx] 
(see: Discontinuous optimization) 
Twilt see: problem regular in the sense of Jongen-Jonker- —
twinplex 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 


two affine functions see: program of minimizing a product 
of — 
two cardinalities axiom 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
two-dimensional marginal probability distribution function 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
two-function minimax inequality 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems) 
two-hop neighbors 
(see: Broadcast scheduling problem) 
two-layer feed-forward network 
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55] 
(see: Unconstrained optimization in neural network 
training) 
Two-level Optimization 
(see: Mixed integer nonlinear bilevel programming: 
deterministic global optimization) 
two-parameter CG family 
[90C30] 
(see: Conjugate-gradient methods) 
two paths see: Pivoting algorithms for linear programming 
generating — 
two-person game 
[90C10, 90C30] 
(see: Modeling languages in optimization: a new paradigm) 
two-person game see: cooperative case of a — 
two-person zero-sum game 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 62C20, 90C15,
91A05] 
(see: Minimax theorems; Stochastic programming: minimax 
approach) 
two-phase 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C10, 90C26, 
90C30, 90C35] 
(see: Bi-objective assignment problem; Stochastic global 
optimization: two-phase methods) 
two-phase method 
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: two-phase methods) 
two-phase methods see: Stochastic global optimization: — 
two-phase procedure 
[05B35, 65K05, 90C05, 90C20, 90C33]
(see: Criss-cross pivoting rules) 
two-player zero-sum perfect-information game 
[49J35, 49K35, 62C20, 91A05, 91A40]
(see: Minimax game tree searching) 
two-point boundary value problem 
[34A55, 65L99, 78A60, 90C30, 93-XX]
(see: Optimal design in nonlinear optics; Optimization 
strategies for dynamic systems) 
two-point boundary value problem see: ODE — 
two Polygons Arrangement 
(see: State of the art in modeling agricultural systems) 
two-stage model see: linear — 
two-stage stochastic linear program 
[90C15, 90C90] 
(see: Chemical process planning; Two-stage stochastic 
programs with recourse) 
two-stage stochastic program with recourse
[90C15] 
(see: Stochastic programming: parallel factorization of 
structured matrices) 

two-stage stochastic programming 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 

two-stage stochastic programming models 
[90C15] 
(see: Two-stage stochastic programming: quasigradient 
method) 

Two-stage stochastic programming models 
[90C15] 
(see: Two-stage stochastic programming: quasigradient 
method) 

two-stage stochastic programming problem see: dynamic — 

Two-stage stochastic programming: quasigradient method 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic programs 
with recourse) 
(refers to: L-shaped method for two-stage stochastic 
programs with recourse; Multistage stochastic 
programming: barycentric approximation; Simple recourse 
problem: dual method; Simple recourse problem: primal 
method; Stochastic linear programming: decomposition 
and cutting planes; Stochastic linear programs with 
recourse and arbitrary multivariate distributions; 
Stochastic quasigradient methods; Stochastic quasigradient 
methods: applications; Stochastic quasigradient methods in 
minimax problems; Two-stage stochastic programs with 
recourse) 


Two-stage stochastic programs with recourse 
(90C15) 
(referred to in: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method) 
(refers to: Approximation of extremum problems with 
probability functionals; Approximation of multivariate 
probability integrals; Discretely distributed stochastic 
programs: descent directions and efficient points; 
Extremum problems with probability functions: kernel type 
solution methods; General moment optimization problems; 
Logconcave measures, logconvexity; Logconcavity of 
discrete distributions; L-shaped method for two-stage 
stochastic programs with recourse; Multistage stochastic 
programming: barycentric approximation; Preprocessing 
in stochastic programming; Probabilistic constrained linear 
programming: duality theory; Probabilistic constrained 
problems: convexity theory; Semi-infinite programming: 
discretization methods; Simple recourse problem: dual 
method; Simple recourse problem: primal method; 
Stabilization of cutting plane algorithms for stochastic 
linear programming problems; Static stochastic 
programming models; Static stochastic programming 
models: conditional expectations; Stochastic integer 
programming: continuity, stability, rates of convergence; 
Stochastic integer programs; Stochastic linear 
programming: decomposition and cutting planes; 
Stochastic linear programs with recourse and arbitrary 
multivariate distributions; Stochastic network problems: 
massively parallel solution; Stochastic programming: 
minimax approach; Stochastic programming models: 
random objective; Stochastic programming: 
nonanticipativity and lagrange multipliers; Stochastic 
programming with simple integer recourse; Stochastic
programs with recourse: upper bounds; Stochastic 
quasigradient methods in minimax problems; Stochastic 
vehicle routing problems; Two-stage stochastic 
programming: quasigradient method) 
two-stage stochastic programs with recourse 
[90C06, 90C15] 
(see: Stochastic linear programming: decomposition and 
cutting planes) 
two-stage stochastic programs with recourse see: L-shaped 
method for — 
two-stage stochastic programs with simple integer recourse 
[90C11, 90C15, 90C31] 
(see: Stochastic integer programming: continuity, stability,
rates of convergence) 
two-stranded chain 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians)
two updates see: rank- — 
Tychonoff fixed point theorem 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05] 
(see: Minimax theorems)
type see: abstract variational inequality of elliptic —; 
homotopy —; single locomotive — 
type a see: gas lift wells of —; naturally flowing wells of — 
type A well 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
type A wells 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
type algorithm see: Craig conjugate gradient —; simplex —; 
SQP — 
type approximation see: Padé- — 
type b see: gas lift wells of —; naturally flowing wells of — 
type B well 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
type B wells 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
type condition see: Fritz John —; Karush-Kuhn-Tucker — 
type duality for M- and L-convex functions see: Fenchel- — 


type function see: max- —; maximum-—; min- — 
type functions see: difference of max- —; Lagrange- — 
Type I requirement
[90C26]
(see: Invexity and its applications)
type iteration see: Gram-Schmidt — 
type lower bounds see: Gilmore-Lawler — 
type many-valued logic system see: lattice- — 
type method see: Newton- — 
type methods see: Krylov space —; Nonlinear least squares: 
Newton- — 
type models see: multiple locomotive — 
type neighborhood structure for the QAP see: K-L — 
type of optimization 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
type solution methods see: Extremum problems with 
probability functions: kernel — 


type variable 
(see: Bayesian networks) 

types see: crew —; model — 

types of logical connectives see: TOP and BOT — 

types of vehicles see: Vehicle scheduling problems with 
multiple — 


U 


U-concave function 
[90C29]
(see: Generalized concavity in multi-objective optimization)
U-continuous function
[90C29]
(see: Generalized concavity in multi-objective optimization)
U-pseudoconcave function
[90C29]
(see: Generalized concavity in multi-objective optimization)
U-quasiconcave function
[90C29]
(see: Generalized concavity in multi-objective optimization) 
U-quasiconcave function see: int —; Luc — 
U°-quasiconcave function 
[90C29]
(see: Generalized concavity in multi-objective optimization) 
U-weakly pseudoconcave function 
[90C29]
(see: Generalized concavity in multi-objective optimization) 
UAF of CEP 
[49K99, 65K05, 80A10]
(see: Optimality criteria for multiphase chemical 
equilibrium) 
UBD
[49M37, 90C11]
(see: Mixed-integer nonlinear optimization: A disjunctive 
cutting plane approach) 
UFVS 
[90C35] 
(see: Feedback set problems) 
ultrametric 
[62H30, 90C27, 90C39] 
(see: Assignment methods in clustering; Dynamic 
programming in clustering) 
ultrametric 
[62H30, 90C39] 
(see: Dynamic programming in clustering) 
umbrella function 
[41A30, 62J02, 90C26] 
(see: Regression by special functions: algorithms and 
complexity) 
umbrella regression see: quasiconvex and — 
unary length 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 
unary operations on relations 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
unary optimization
[90C30] 
(see: Numerical methods for unary optimization) 
unary optimization see: Numerical methods for — 
unary optimization problem 
[90C30] 
(see: Numerical methods for unary optimization) 
unavoidable edges 
[68Q20] 
(see: Optimal triangulations) 
unbounded 
[15A39, 90C05]
(see: Tucker homogeneous systems of linear relations) 
unbounded controls 
[03H10, 49J27, 90C34]
(see: Semi-infinite programming and control problems) 
unbounded controls and non standard methods 
03H10, 49J27, 90C34] 
(see: Semi-infinite programming and control problems) 
unbounded cost 
49Jxx, 91Axx] 
(see: Infinite horizon control and dynamic games) 
unbounded domains see: global optimization over — 
unbounded optimization 
65G20, 65G30, 65G40, 65K05, 90C30] 
(see: Interval global optimization; Random search methods) 
unbounded program 
90C05, 90C20] 
(see: Redundancy in nonlinear programs) 
unbounded solution 
90C29] 
(see: Multicriteria sorting methods) 
uncapacitated center 
90C26] 
(see: MINLP: application in facility location-allocation) 
uncapacitated facility location problem 
90B10, 90B80, 90C10, 90C11, 90C27, 90C35, 90C57] 
(see: Integer programming; Network location: covering 
problems) 
uncapacitated facility location problem 
[90B10, 90B80, 90C35] 
(see: Network location: covering problems) 
uncapacitated network flow problem 
[90B10] 
(see: Piecewise linear network flow problems) 
uncapacitated plant location problem 
[90B80, 90B85] 
(see: Warehouse location problem) 
uncapacitated static multifacility see: discrete 
single-commodity single-criterion — 
uncertain 
[37N40, 90C30, 90C34] 
(see: Robust design of dynamic systems by constructive 
nonlinear dynamics) 
uncertain information 
[90C29, 90C70] 
(see: Fuzzy multi-objective linear programming)
uncertain systems
[93D09]
(see: Robust control)
uncertainties
[93D09]
(see: Robust control)
uncertainty
[68Q25, 90B85, 90C26, 90C29, 90C70, 91B28, 94A17]
(see: Bilevel optimization: feasibility test and flexibility
index; Competitive ratio for portfolio management; Fuzzy
multi-objective linear programming; Jaynes’ maximum
entropy principle; Single facility location: multi-objective
rectilinear distance location)
uncertainty
[90C15, 90C26, 90C29, 94A17]
(see: Bilevel optimization: feasibility test and flexibility
index; Discretely distributed stochastic programs: descent
directions and efficient points; Global optimization in
batch design under uncertainty; Jaynes’ maximum entropy
principle; Two-stage stochastic programs with recourse)
uncertainty see: Bilevel programming framework for
enterprise-wide process networks under —; budget of —;
decision making under —; design under —; disaggregation
under —; fictitious —; Global optimization in batch design
under —; measure of —; multi-objective linear
programming under —; nonstochastic —; probabilistic —;
process synthesis and design under —; Production
planning under —
uncertainty considerations
(see: Selection of maximally informative genes)
uncertainty, duality and applications see: Robust linear
programming with right-hand-side —
uncertainty in dynamical systems see: estimating —
uncertainty embedded in a probability distribution
[90C25, 94A17]
(see: Entropy optimization: Shannon measure of entropy
and its properties)
uncertainty on hydrological exogenous inflow and demand
see: water resources planning under —
uncertainty modeling
[90C29, 90C70]
(see: Fuzzy multi-objective linear programming)
uncertainty with perturbations see: minimax observation
problem under —
uncertainty: sensitivity analysis see: Short-term scheduling
under —
uncertainty set
[93D09]
(see: Robust control)
unclassifiable examples
[90C09, 90C10]
(see: Optimization in Boolean classification problems)
unconstrained and constrained optimization see: Interval
analysis: —
unconstrained dual in entropy optimization
[90C25, 90C51, 94A17]
(see: Entropy optimization: interior point methods)
unconstrained global optimization
[65T40, 90C26, 90C30, 90C90]
(see: Global optimization methods for harmonic retrieval)
unconstrained implicit Lagrangian
[90C30, 90C33]
(see: Implicit Lagrangian)
unconstrained minimization
[90C26, 90C39]
(see: Second order optimality conditions for nonlinear
optimization)
unconstrained minimization
[49M29, 65K10, 90C06, 90C30]
(see: Conjugate-gradient methods; Local attractors for
gradient-related descent iterations; Unconstrained
nonlinear optimization: Newton-Cauchy framework)
unconstrained minimization see: algorithms for —
unconstrained nonlinear least squares problem 
[49M37] 
(see: Nonlinear least squares: Newton-type methods) 

Unconstrained nonlinear optimization: Newton-Cauchy 
framework 
(90C30) 
(referred to in: Automatic differentiation: calculation of 
Newton steps; Broyden family of methods and the BFGS 
update; Conjugate-gradient methods; Dynamic 
programming and Newton’s method in unconstrained 
optimal control; Interval Newton methods; Large scale 
unconstrained optimization; Nondifferentiable 
optimization: Newton method; Nonlinear least squares: 
Newton-type methods; Numerical methods for unary 
optimization; Unconstrained optimization in neural 
network training) 
(refers to: Automatic differentiation: calculation of Newton 
steps; Broyden family of methods and the BFGS update; 
Dynamic programming and Newton’s method in 
unconstrained optimal control; Interval Newton methods; 
Large scale unconstrained optimization; Nondifferentiable 
optimization: Newton method; Numerical methods for 
unary optimization; Unconstrained optimization in neural 
network training) 

unconstrained optimal control 
[49M29, 65K10, 90C06] 
(see: Dynamic programming and Newton’s method in 
unconstrained optimal control) 

unconstrained optimal control see: Dynamic programming 
and Newton’s method in — 

unconstrained optimization 
[65F10, 65F50, 65H10, 65K05, 65K10, 90C30] 
(see: ABS algorithms for optimization; Broyden family of 
methods and the BFGS update; Globally convergent 
homotopy methods) 

unconstrained optimization 
[65K05, 68T05, 90C06, 90C30, 90C52, 90C53, 90C55] 
(see: Broyden family of methods and the BFGS update; 
Large scale unconstrained optimization; Unconstrained 
optimization in neural network training) 

unconstrained optimization see: branch and bound for —; 
Large scale —; New hybrid conjugate gradient algorithms 
for —; Performance profiles of conjugate-gradient 
algorithms for —
unconstrained optimization algorithms
[65G20, 65G30, 65G40, 65K05, 90C30]
(see: Interval global optimization)

Unconstrained optimization in neural network training 
(90C30, 90C30, 90C52, 90C53, 90C55, 65K05, 68T05) 
(referred to in: Broyden family of methods and the BFGS 
update; Conjugate-gradient methods; Large scale 
unconstrained optimization; Neural networks for 
combinatorial optimization; Neuro-dynamic 
programming; Numerical methods for unary optimization; 
Replicator dynamics in combinatorial optimization; 
Unconstrained nonlinear optimization: Newton-Cauchy 
framework)
(refers to: Automatic differentiation: introduction, history
and rounding error estimation; Broyden family of methods
and the BFGS update; Conjugate-gradient methods; Large
scale unconstrained optimization; Least squares problems;
Neural networks for combinatorial optimization;
Neuro-dynamic programming; Numerical methods for
unary optimization; Replicator dynamics in combinatorial
optimization; Unconstrained nonlinear optimization:
Newton-Cauchy framework)
unconstrained optimization problem
[65K05, 90C26, 90Cxx]
(see: Dini and Hadamard derivatives in optimization;
Discontinuous optimization; Smooth nonlinear nonconvex
optimization)
unconstrained optimization problem see: global —
unconstrained optimum
[65K05, 90Cxx]
(see: Dini and Hadamard derivatives in optimization)
unconstrained optimum see: conditions for an —
unconstrained problem
[90C31]
(see: Sensitivity and stability in NLP)
undefined
[49M29, 65K10, 90C06]
(see: Local attractors for gradient-related descent iterations)
under control see: rounding errors are — 
under extreme events see: decision making — 
under network constraints see: optimization — 
under principal pivoting see: matrix class invariant — 
under probabilistic constraint see: programming — 
under uncertainty see: Bilevel programming framework for
enterprise-wide process networks —; decision making —;
design —; disaggregation —; Global optimization in batch
design —; multi-objective linear programming —; process
synthesis and design —; Production planning —
under uncertainty on hydrological exogenous inflow and
demand see: water resources planning —
under uncertainty with perturbations see: minimax
observation problem —
under uncertainty: sensitivity analysis see: Short-term
scheduling —
under weak assumptions
[57R12, 90C31, 90C34]
(see: Smoothing methods for semi-infinite optimization)
underdetermined system of nonlinear equations
[90C30]
(see: Nonlinear least squares problems)
underestimation see: convex global —; Molecular structure
determination: convex global —
underestimation matrix see: diagonal —
underestimator
[90C26]
(see: Global optimization in multiplicative programming)
underestimator see: convex —; convex global —; global —;
local —
underestimators see: feasible —; Global optimization: tight
convex —
underlying deterministic problem
[90C05, 90C15]
(see: Probabilistic constrained linear programming: duality
theory)
underlying deterministic problem
[90C15]
(see: Static stochastic programming models)
underlying matroid
[90C09, 90C10]
(see: Oriented matroids)
underlying method see: single —
underprojection
[90C30]
(see: Relaxation in projection methods)
undirected multicommodity network flow models
[90B10, 90C05, 90C06, 90C35]
(see: Nonoriented multicommodity flow problems)
undiscounted problem
[49L20, 90C39, 90C40]
(see: Dynamic programming: infinite horizon problems,
overview)
undiscounted problems see: Dynamic programming: —
unfeasibility criterion see: minimum —
Unfolding see: maximum Variance —
UniCalc
[65G20, 65G30, 65G40, 65H20]
(see: Interval analysis: intermediate terms)
unichain policy
[49L99]
(see: Dynamic programming: average cost per stage
problems)
unicursal graph
[90B06]
(see: Vehicle routing)
unidimensional Euclidean representation
[62H30, 90C39]
(see: Dynamic programming in clustering)
unidimensional scale
[62H30, 90C39]
(see: Dynamic programming in clustering)
unidimensional scale see: circular —; linear —
unidimensional scales
[62H30, 90C27]
(see: Assignment methods in clustering)
unidimensional scaling
[62H30, 90C39]
(see: Dynamic programming in clustering)
UNIFAC equation 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction 
equilibrium) 
unification see: set — 
unified algorithm 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
unified modeling frameworks see: Short-term scheduling, 
resource constrained: — 
uniform
[49M37, 65K10, 90B36, 90C26, 90C30]
(see: αBB algorithm; Maximum cut problem, MAX-CUT;
Stochastic scheduling)
uniform angle condition
[49J20, 49J52]
(see: Shape optimization)
uniform computations
[03D15, 68Q05, 68Q15]
(see: Parallel computing: complexity classes)
uniform cone property
[49J20, 49J52]
(see: Shape optimization)
uniform distribution
[52A22, 60D05, 68Q25, 90C05, 90C27, 90C90]
(see: Probabilistic analysis of simplex algorithms; Simulated
annealing)
uniform distribution
[90C11, 90C15]
(see: Stochastic programming with simple integer recourse)
uniform dose (EUD) see: equivalent —
uniform extension
[49J20, 49J52]
(see: Shape optimization)
uniform fractional combinatorial optimization
[68Q25, 68R05, 90-08, 90C27, 90C32]
(see: Fractional combinatorial optimization)
uniform grid technique
[90C26]
(see: Global optimization using space filling)
uniform Hölder conditions
[90C26]
(see: Global optimization using space filling)
uniform matroid
[90C09, 90C10]
(see: Matroids)
uniform norm see: approximation in the —
uniform P-function
[90C30, 90C33]
(see: Implicit Lagrangian)
uniform random sampling
[65C30, 65C40, 65C50, 65C60, 65Cxx, 65K05, 90C26, 90C30]
(see: Stochastic global optimization: two-phase methods)
uniform sequential coloring
[05-XX]
(see: Frequency assignment problem)
uniform time discretization
[90C26]
(see: MINLP: design and scheduling of batch processes)
uniformity, conformity
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
uniformly convex function 
[65F10, 65F50, 65H10, 65K10] 
(see: Globally convergent homotopy methods) 
uniformly differentiable function see: Dini — 
uniformly directionally differentiable function see: Dini — 
unilateral boundary value problem 
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in 
mechanics) 
unilateral contact problem with friction see: coupled — 
unilateral frictional contact see: Signorini-Coulomb — 
unilateral growth condition
[35A15, 47J20, 49J40]
(see: Hemivariational inequalities: static problems)
unilateral mechanics 
[49J40]
(see: Nonconvex-nonsmooth calculus of variations)
unilateral mechanics
[49J40, 49J52]
(see: Hemivariational inequalities: eigenvalue problems; 
Nonconvex-nonsmooth calculus of variations) 
unimodular see: totally — 
unimodular matrix 
[13Cxx, 13Pxx, 14Qxx, 90Cxx] 
(see: Integer programming: algebraic methods) 
unimodular matrix see: totally — 
unimodular max-closed form transformation 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 
unimodular max-closed form transformations 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 
union 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
UNIQUAC equation 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction
equilibrium) 
unique 
[12D10, 12Y05, 13P10, 34-xx, 34Bxx, 34Lxx, 60J15, 60J60,
60J70, 60K35, 65C05, 65C10, 65C20, 68U20, 70-08, 82B21, 
82B31, 82B41, 82B80, 90C05, 90C10, 90C22, 90C25, 90C31, 
92C40, 92E10, 93E24] 
(see: Complexity and large-scale least squares problems; 
Global optimization in protein folding; Gröbner bases for
polynomial equations; LP strategy for interval-Newton 
method in deterministic global optimization; Semidefinite 
programming: optimality conditions and stability; 
Simplicial pivoting algorithms for integer programming) 


unique path 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
uniqueness see: Variational inequalities: geometric 
interpretation, existence and — 
uniqueness problem see: entry- — 
uniqueness of solutions of nonlinear systems of equations 
[65G20, 65G30, 65G40, 65H20, 65K99] 
(see: Interval Newton methods) 
unit-Disk Graphs 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
unit-disk graphs see: bisectored —; Optimization problems
in —
unit-resolution see: single-lookahead- — 
unit-specific 
(see: Medium-term scheduling of batch processes) 
unit weight CMST 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
united extension 
[65H99, 65K99] 
(see: Automatic differentiation: point and interval) 
units see: building blocks for the process — 
units problem see: minimum- — 
unity demand 
[90B80, 90B85] 
(see: Warehouse location problem) 
univalent relation 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
univariate discrete probability distribution see: logconcave — 
univariate gradient free 
[90C30]
(see: Powell method)
univariate interval Newton method
[65G20, 65G30, 65G40, 65H20, 65K99]
(see: Interval Newton methods)
univariate interval Newton operator
[65G20, 65G30, 65G40, 65H20, 65K99]
(see: Interval Newton methods)
univariate linear model see: general —
universal
[49K99, 65K05, 80A10]
(see: Optimality criteria for multiphase chemical
equilibrium)
universal class
[03E70, 03H05, 91B16]
(see: Alternative set theory)
universal cover
[05B35, 20F36, 20F55, 52C35, 57N65]
(see: Hyperplane arrangements)
universal Gröbner basis
[05A, 13Cxx, 13Pxx, 14Qxx, 15A, 51M, 52A, 52B, 52C, 62H,
68Q, 68R, 68U, 68W, 90B, 90C, 90Cxx]
(see: Convex discrete optimization; Integer programming:
algebraic methods)
universal Gröbner basis
[13Cxx, 13Pxx, 14Qxx, 90Cxx]
(see: Integer programming: algebraic methods)
universal portfolio
[68Q25, 91B28]
(see: Competitive ratio for portfolio management)
universal properties of relations
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)
universal wedge
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
universality
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W,
90B, 90C]
(see: Convex discrete optimization)
universally accessible form of CEP
[49K99, 65K05, 80A10]
(see: Optimality criteria for multiphase chemical
equilibrium)
unknown information
[68Q25, 91B28]
(see: Competitive ratio for portfolio management)
unknown variables see: initializing —
unlimited intermediate storage
[90C26]
(see: MINLP: design and scheduling of batch processes)
unnormalized fuzziness
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics)
unpaired element see: left- —; right- —
unpinned
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35]
(see: Graph realization via semidefinite programming)
unsatisfiability threshold
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27,
94C10]
(see: Maximum satisfiability problem)
unscaled ABS class
[65K05, 65K10]
(see: ABS algorithms for linear equations and linear least
squares)
unsplitting of load see: splitting/ —
unstructured optimization
[90C06]
(see: Large scale unconstrained optimization)
unsupervised
[90C26, 90C56, 90C90]
(see: Nonsmooth optimization approach to clustering)
unsupervised classification
[90C90]
(see: Optimization in medical imaging)
unsupported nondominated solution
[90C11, 90C29]
(see: Multi-objective mixed integer programming)
unused partitions see: set Lfree of —; set Rfree of —
unweighted feedback vertex set problem
[90C35]
(see: Feedback set problems)
unweighted problem in OR
[90B80, 90B85]
(see: Warehouse location problem)
unyielding configuration
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35]
(see: Graph realization via semidefinite programming)
unyielding tensegrity
[51K05, 52C25, 68Q25, 68U05, 90C22, 90C35]
(see: Graph realization via semidefinite programming)
UONNT
[65K05, 68T05, 90C30, 90C52, 90C53, 90C55]
(see: Unconstrained optimization in neural network
training)
up to first order changes
[90Cxx]
(see: Discontinuous optimization)
up penalty
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)

update see: BFGS —; BFGS quasi-Newton —; Broyden family of 
methods and the BFGS —; 
Broyden-Fletcher-Goldfarb-Shanno —;
Broyden-Fletcher-Goldfarb-Shanno quasi-Newton —;
Davidon-Fletcher-Powell —; DFP —; quasi-Newton —;
SR1 —; symmetric rank-one — 

update formula see: Sherman-Morrison rank-one — 

update Newton method see: partial- — 

updates see: quasi-Newton —; rank-two — 

updating see: fractional —; inverse quasi-Newton —; 
partial —; secant — 

updating formula see: Moré — 

updating input-output matrices 
[90C35]
(see: Multi-index transportation problems) 

updating rule see: momentum — 

upper bandwidth 
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 

upper bound 
[90B35, 90C10] 
(see: Job-shop scheduling problem; Maximum constraint 
satisfaction: relaxations and upper bounds) 

upper bound 
[90C15] 
(see: Stochastic programs with recourse: upper bounds) 

upper bound see: Edmundson-Madansky —;
Hunter-Worsley —; parametric —; piecewise linear —;
polynomial —; valid — 

upper-bound dichotomy see: generalized- — 

upper bound on gas lift availability
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling)
upper bound for a set
[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer
programming)
upper-bound test
[49M37, 90C11]
(see: Mixed integer nonlinear programming)
upper boundary
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
upper boundary point
[65K05, 90C26, 90C30]
(see: Monotonic optimization)
upper bounding
[49M37, 65K10, 90C26, 90C30]
(see: αBB algorithm)
upper bounding structure see: generalized —
upper bounds
[90C10]
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
upper bounds see: Maximum constraint satisfaction: 
relaxations and —; Stochastic programs with recourse: — 
upper bounds constraints see: generalized —; lower and — 
upper bounds for multivariate probability integrals 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15] 
(see: Approximation of multivariate probability integrals) 
upper derivative 
[65K05, 90Cxx] 
(see: Dini and Hadamard derivatives in optimization) 
upper derivative see: Dini —; Dini conditional —; Hadamard 
conditional — 
upper directional derivative see: Dini —; Hadamard — 
upper directional derivatives see: lower and — 
upper envelope 
[90C26] 
(see: Global optimization: envelope representation) 
upper hemicontinuous operator 
[46N10, 49J40, 90C26] 
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems) 
upper level problem 
[90C26, 90C31, 90C34, 91A65] 
(see: Bilevel programming: implicit function approach; 
Parametric global optimization: sensitivity) 
upper and lower bounds see: parametric — 
upper and lower bounds to eigenvalues 
[49R50, 65G20, 65G30, 65G40, 65L15, 65L60] 
(see: Eigenvalue enclosures for ordinary differential
equations)
upper and lower well oil rate constraints
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling)
upper problem
[90C25, 90C29, 90C30, 90C31]
(see: Bilevel programming: optimality conditions and
duality) 
upper semicontinuous function 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05] 
(see: Minimax theorems) 
upper semicontinuous function see: Rf - — 
upper semismooth function 
[49J40, 49J52, 65K05, 90C30]
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 
upper set 
[62G07, 62G30, 65K05] 
(see: Isotonic regression problems) 
ups see: well-defined start- — 
urban transit planning see: extra- — 
URV factorization see: rank revealing — 


use of groundwater see: rational — 
use of PI-algebras 
[03B50, 68T15, 68T30] 
(see: Finite complete systems of many-valued logic algebras) 
use of water resource systems see: conjunctive — 
used partitions see: set Lreac of —; set Rreac of —
user interface see: graphical — 
user-optimization 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
user-optimized transportation network 
[90B06, 90B20, 91B50] 
(see: Traffic network equilibrium) 
user-optimizing environment 
[90B80, 90B85, 90Cxx, 91Axx, 91Bxx]
(see: Facility location with externalities) 
using flexible templates see: De novo protein design — 
using a heuristic parameter, reject index for interval 
optimization see: Algorithmic improvements — 
using Householder transformations see: QR factorization — 
using MINLP see: HEN synthesis — 
using pseudocosts see: best estimate — 
using pseudoshadow prices see: best estimate — 
using space filling see: Global optimization — 
using (sub)gradients parametric representations see: necessary 
optimality condition without — 
using terrain/funneling methods see: Multi-scale global 
optimization — 
UTA see: meta- — 
UTA method 
[90C29, 91A99]
(see: Preference disaggregation)
UTASTAR algorithm
[90C29, 91A99]
(see: Preference disaggregation)
utilité
[90C05, 90C90, 91B28]
(see: Multicriteria methods for mergers and acquisitions)
utilities see: consumption of —
utility
[90C29, 90C90, 91A65, 91B99]
(see: Bilevel programming: applications; Multiple objective
programming support)
utility
[90C29]
(see: Preference modeling)
utility function
[90-01, 90B30, 90B50, 90C29, 91B32, 91B52, 91B74]
(see: Bilevel programming in management; Preference 
disaggregation approach: basic features, examples from 
financial decision making) 
utility function see: coordinatewise increasing —; implicit —; 
social — 
utility functions see: additive —; estimation of — 
utility theory 
[03E70, 03H05, 91B16] 
(see: Alternative set theory) 
utility theory see: multi-attribute — 
V


V see: vertex set — 
V algebra 
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27] 
(see: Checklist paradigm semantics for fuzzy logics) 
V at see: insertion of vertex — 
(v)-differentiable family of measures see: weakly L; — 
V-invex 
[90C26] 
(see: Invexity and its applications) 
V-representation 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and 
performances) 
valid cut 
[90C26] 
(see: Cutting plane methods for global optimization) 
valid inequalities 
[90C09, 90C10, 90C11] 
(see: Disjunctive programming) 
valid inequality see: nondominated — 
valid lower bound 
[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 
valid upper bound 
[90C10, 90C26] 
(see: MINLP: branch and bound global optimization 
algorithm) 
validation see: cross- —; model — 
validation sample 
[91B28, 90C90, 90C05, 90C20, 90C30]
(see: Credit rating and optimization methods)
validity conditions
[90C35]
(see: Maximum flow problem)
valuation
[03E70, 03H05, 91B16]
(see: Alternative set theory)
valuation of a checklist
[03B50, 03B52, 03C80, 62F30, 62Gxx, 68T27]
(see: Checklist paradigm semantics for fuzzy logics) 
valuation structure see: coarse —; fine — 
value see: best —; continuity property of the objective 
function —; convexity property of the objective function —; 
critical —; incumbent —; incumbent objective —; 
limiting —; marginal —; maximize net present —; mean —; 
minimal —; minimax —; positive marginal —; positive 
minimum —; regular —; saddle —; settle- —; structured 
singular — 
value analysis 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60]
(see: Boolean and fuzzy relations)

value bounds see: computable optimal — 

Value for Composite Convexifiable Function see: integral 
Mean- — 

value conditions see: boundary — 

value convergence tests 
[49M27, 90C11, 90C30] 
(see: MINLP: generalized cross decomposition) 

value cross decomposition see: mean — 

value cuts 
[49M27, 90C11, 90C30] 
(see: MINLP: generalized cross decomposition) 

value decomposition solution see: truncated singular — 

value extension see: mean — 

value of a flow 
[90C35]
(see: Maximum flow problem) 

value formula see: marginal — 

value function 
[90C10, 90C15, 90C29, 90C46] 
(see: Integer programming duality; Multi-objective 
optimization; Interactive methods for preference value 
functions; Multiple objective programming support; 
Multistage stochastic programming: barycentric 
approximation) 

value function 
[90C29] 
(see: Multi-objective optimization; Interactive methods for 
preference value functions; Multiple objective 
programming support) 

value function see: expected —; mean —; mixed integer —; 
optimal —; preference — 

value function approach 
[90-XX] 
(see: Outranking methods) 

value functions see: Multi-objective optimization; Interactive 
methods for preference —; optimal — 

value iteration 
[49L20, 49L99, 90C39, 90C40] 
(see: Dynamic programming: average cost per stage 
problems; Dynamic programming: discounted problems; 
Dynamic programming: infinite horizon problems, 
overview; Dynamic programming: stochastic shortest path 
problems; Dynamic programming: undiscounted 
problems) 

value iteration see: Gauss-Seidel —; relative — 

value of a network flow 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 

value of perfect information see: expected — 

value problem see: bilateral boundary —; initial —; mean —; 
ODE two-point boundary —; two-point boundary —; 
unilateral boundary — 

value of stochastic solution 
[90C15] 
(see: Two-stage stochastic programs with recourse) 

value theorem see: mean — 

valued analysis see: set- — 

valued approximate inference see: interval- — 

valued boundary laws and variational equalities see: single- — 
valued CNSO see: extended real- —; real- —

valued constraints see: set- — 

valued families of the Pinkava logic algebras see: many- — 
valued function see: Boolean 2- — 

valued Lagrangian see: vector — 

valued logic see: evaluation in multiple- — 

valued logic algebra see: Boolean 2- —; many- — 


valued logic algebras see: Finite complete systems of many- — 


valued logic implication see: many- — 

valued logic normal form see: complete many- — 
valued logic system see: lattice-type many- — 

valued logics see: classification of many- —; many- —;
taxonomy of the PI-algebras of many- —


valued maps see: Generalized monotone single — 
valued normal form see: many- — 

valued normal forms see: PI-algebras and 2- —
valued objective function see: set- — 

valued optimization see: Set- — 

valued optimization problem see: set- — 

valued PI-systems see: subfamilies of n- —

valued relation
[90C29]
(see: Preference modeling)


values see: negative marginal —; positive marginal —; set of
formation —


VAM
[68T99, 90C27]
(see: Capacitated minimum spanning trees)


VAM construction procedure see: mixed — 
vapor phases
[90C26, 90C90]
(see: Global optimization in phase and chemical reaction
equilibrium)


vapor pressure see: Reid — 
variable
[90C10, 90C30]
(see: Modeling languages in optimization: a new paradigm)


variable see: auxiliary —; choice of the entering —; choice of
the leaving —; consistent —; distinguished —; dual —;
eligible nonbasic —; entering —; flow decision —;
leaving —; most/least infeasible integer —; multiple
branches for bounded integer —; strongly determined —;
type —


variable assignment
[90C60]
(see: Complexity theory)


variable coefficients see: generalized linear programming
with —


variable cost
[90C25]
(see: Concave programming)


variable dichotomy
[90C05, 90C06, 90C08, 90C10, 90C11]
(see: Integer programming: branch and bound methods)
variable factor programming
[49M29, 90C11]
(see: Generalized Benders decomposition)


variable formulation of SP see: split- — 
variable metric
[58E05, 90C30]
(see: Topology of global optimization)


variable metric bundle method 
49J40, 49]52, 65K05, 90C30] 
(see: Solving hemivariational inequalities by nonsmooth 
optimization methods) 
variable metric method 
90C30] 
(see: Unconstrained nonlinear optimization: 
Newton-Cauchy framework) 
variable metric methods 
90C26] 
(see: Smooth nonlinear nonconvex optimization) 
variable neighborhood descent 
9008, 90C26, 90C27, 90C59] 
(see: Variable neighborhood search methods) 
Variable neighborhood search methods 
(9008, 90C59, 90C27, 90C26) 
(referred to in: Maximum cut problem, MAX-CUT) 
variable precision interval package 
65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: differential equations) 
variable representation see: splitting — 
variable space 
90C29 
(see: Multiple objective programming support) 
variable stage-length 
93-XX] 
(see: Dynamic programming: optimal control applications) 
variable-storage 
[90C30]
(see: Conjugate-gradient methods) 
variable-storage algorithm 
[90C30]
(see: Conjugate-gradient methods) 
variables 
(see: Planning in the process industry; Short-term 
scheduling under uncertainty: sensitivity analysis) 
variables 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
variables see: adjoint —; basic —; binary —; Boolean —;
complementary pair of —; complicating —; constraints
on —; control —; decision —; dependent —; design —;
discrete —; discrete design —; dual —; eliminating blocks
of —; full space of x —; Generalized geometric
programming: mixed continuous and discrete free —;
independent —; initializing unknown —; integer —;
intermediate —; key —; nonbasic —; separable problem —;
sizing —; splitting —; structural —; superbasic —
variables model see: errors-in- — 
variables x see: decision — 
variance 

[90C26, 90C90] 

(see: Signal processing with higher order statistics) 
variance see: mean- —; one-way analysis of — 
variance clustering see: minimal — 
variance model see: Portfolio selection: Markowitz mean- —
variance of the number of shadow-vertices 

[52A22, 60D05, 68Q25, 90C05] 

(see: Probabilistic analysis of simplex algorithms) 
variance optimization problems see: Decomposition
algorithms for the solution of multistage mean- —
variance portfolio analysis see: mean- —
variance reduction 
[65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30, 90C15]
(see: Approximation of multivariate probability integrals) 
variance reduction see: Monte-Carlo sampling and — 
variance reduction lower bounds 
[90C08, 90C11, 90C27, 90C57, 90C59]
(see: Quadratic assignment problem) 
variance reduction technique 
[62F12, 65C05, 65C30, 65C40, 65C50, 65C60, 65Cxx, 65D30,
65K05, 90C15, 90C31] 
(see: Approximation of multivariate probability integrals; 
Monte-Carlo simulations for stochastic optimization) 
Variance Unfolding see: maximum — 
variant of the constraint-by-constraint method see: 
lexicographic — 
variant of the simplex algorithm 
[52A22, 60D05, 68Q25, 90C05] 
(see: Probabilistic analysis of simplex algorithms) 
variants of GBD 
[49M29, 90C11] 
(see: Generalized Benders decomposition)
variates see: control — 
variation see: total — 
variation of an eigenvalue of an interval matrix see: interval 
of — 
variation of the interval Newton method see: Krawczyk — 
variation path see: principal — 
variation splitting algorithm see: principal — 
variation of a system 
[65K05, 90C30] 
(see: Bisection global optimization methods) 
variation technique see: boundary — 
variational equalities see: single-valued boundary laws and — 
variational equality for an elastostatic problem involving 
QD-superpotentials 
[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]
(see: Quasidifferentiable optimization: variational 
formulations) 
variational formulation see: mixed — 
variational formulation of quasidifferential laws 
[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]
(see: Quasidifferentiable optimization: variational 
formulations) 
variational formulation of quasidifferential thermal boundary 
conditions 
[35R70, 47840, 74B99, 74D99, 74G99, 74H99] 
(see: Quasidifferentiable optimization: applications to 
thermoelasticity) 
variational formulation of subdifferential laws 
[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]
(see: Quasidifferentiable optimization: variational 
formulations) 
variational formulations see: Quasidifferentiable 
optimization: — 
variational-hemivariational inequality 
[49J40, 49J52, 49Q10, 70-XX, 74K99, 74Pxx, 80-XX]
(see: Nonconvex energy functions: hemivariational 
inequalities; Nonconvex-nonsmooth calculus of variations) 


variational-hemivariational inequality 
[49J40]
(see: Nonconvex-nonsmooth calculus of variations) 

variational inclusion 
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05] 
(see: Variational principles) 

Variational inequalities 
(65K10, 65M60) 
(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; Global 
optimization methods for systems of nonlinear equations; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; Implicit 
Lagrangian; Nonconvex energy functions: hemivariational
inequalities; Nonconvex-nonsmooth calculus of variations; 
Optimization with equilibrium constraints: A piecewise 
SQP approach; Quasidifferentiable optimization; 
Quasidifferentiable optimization: algorithms for 
hypodifferentiable functions; Quasidifferentiable 
optimization: algorithms for QD functions; 
Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities: F. E. approach; 
Variational inequalities: geometric interpretation, 
existence and uniqueness; Variational inequalities: 
projected dynamical system; Variational principles) 
(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities: F. E. approach;
Variational inequalities: geometric interpretation,
existence and uniqueness; Variational inequalities: 
projected dynamical system; Variational principles) 

variational inequalities 
[26B25, 26E25, 46A22, 49J35, 49J40, 49J52, 49Q10, 49S05,
54D05, 54H25, 55M20, 65K99, 70-08, 74A55, 74G99, 74H99,
74K99, 74M10, 74M15, 74Pxx, 90C26, 90C30, 90C33, 90C90,
90C99, 91A05, 91A65] 
(see: Hemivariational inequalities: applications in 
mechanics; Minimax theorems; Multilevel optimization in 
mechanics; Nonsmooth and smoothing methods for 
nonlinear complementarity problems and variational 
inequalities; Quasidifferentiable optimization; 
Quasidifferentiable optimization: applications; 
Quasivariational inequalities) 

variational inequalities 
[46N10, 49J40, 49M05, 49Q10, 49S05, 65K10, 70-08, 74G99,
74H99, 74K99, 74Pxx, 90C06, 90C25, 90C26, 90C30, 90C31, 
90C33, 90C35] 
(see: Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Nonsmooth and 
smoothing methods for nonlinear complementarity 
problems and variational inequalities; Quasidifferentiable 
optimization: variational formulations; Quasivariational 
inequalities; Sensitivity analysis of variational inequality 
problems; Simplicial decomposition algorithms) 

variational inequalities see: approximation of —; multivalued 
monotone laws and —; Nonsmooth and smoothing 
methods for nonlinear complementarity problems and —; 
parametric —; QD laws and systems of —; scalar —; 
Solution methods for multivalued —; system of —; 
vector — 

variational inequalities: A brief review see: Generalized — 

variational inequalities and equilibrium problems see: 
Generalized monotonicity: applications to — 

Variational inequalities: F. E. approach 
(65M60) 
(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
geometric interpretation, existence and uniqueness;
Variational inequalities: projected dynamical system;
Variational principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
geometric interpretation, existence and uniqueness; 
Variational inequalities: projected dynamical system; 
Variational principles) 


Variational inequalities: geometric interpretation, existence
and uniqueness

(65K10, 65M60) 

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: projected 
dynamical system; Variational principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: projected 
dynamical system; Variational principles) 

Variational inequalities: projected dynamical system 

(65K10, 90C90) 

(referred to in: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Nonconvex energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications; 
Quasidifferentiable optimization: applications to 
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
principles) 

(refers to: Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Hemivariational inequalities: applications in mechanics; 
Hemivariational inequalities: eigenvalue problems; 
Hemivariational inequalities: static problems; Nonconvex 
energy functions: hemivariational inequalities; 
Nonconvex-nonsmooth calculus of variations; 
Quasidifferentiable optimization; Quasidifferentiable 
optimization: algorithms for hypodifferentiable functions; 
Quasidifferentiable optimization: algorithms for QD 
functions; Quasidifferentiable optimization: applications;
Quasidifferentiable optimization: applications to
thermoelasticity; Quasidifferentiable optimization: calculus 
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
principles) 

variational inequalities and quasivariational inequalities see: 
implicit — 

variational inequality 
[47H05, 47J20, 49J40, 49J52, 49M37, 49Q10, 65J15, 65K10,
70-XX, 74K99, 74Pxx, 80-XX, 90C25, 90C26, 90C33, 90C55, 
91A10] 
(see: Bilevel programming; Fejér monotonicity in convex 
optimization; Nonconvex energy functions: 
hemivariational inequalities; Solution methods for 
multivalued variational inequalities) 

variational inequality see: generalized —; mixed — 

variational inequality for an elastostatic problem involving 
QD-superpotentials see: convex — 

variational inequality of elliptic type see: abstract — 

variational inequality formulation 
[90B06, 90B20, 90C30, 91B06, 91B28, 91B50, 91B60] 
(see: Equilibrium networks; Financial equilibrium; 
Oligopolistic market equilibrium; Spatial price 
equilibrium; Traffic network equilibrium; Walrasian price 
equilibrium) 

variational inequality formulation 

[91B50] 

(see: Financial equilibrium; Walrasian price equilibrium)

variational inequality formulation in link loads 

[90B06, 90B20, 91B50] 

(see: Traffic network equilibrium)

variational inequality formulation in path flows 

[90B06, 90B20, 91B50] 

(see: Traffic network equilibrium)

variational inequality formulations 

[65K10, 65M60, 90B06, 90B20, 91B06, 91B28, 91B50, 91B60] 

(see: Oligopolistic market equilibrium; Spatial price
equilibrium; Traffic network equilibrium; Variational 
inequalities) 

variational inequality problem 
[49J52, 65K10, 65M60, 90C06, 90C15, 90C25, 90C26, 90C30, 
90C33, 90C35, 90C90] 
(see: Cost approximation algorithms; Nondifferentiable 
optimization: Newton method; Simplicial decomposition 
algorithms; Stochastic bilevel programs; Variational 
inequalities; Variational inequalities: projected dynamical 
system) 

variational inequality problem 
[49J52, 90C15, 90C26, 90C30, 90C33] 
(see: Cost approximation algorithms; Nondifferentiable
optimization: Newton method; Stochastic bilevel programs)
variational inequality problem see: dual —; finite-dimensional —;
parametric —; vector —
variational inequality problem and a projected dynamical system 
[65K10, 90C90] 
(see: Variational inequalities: projected dynamical system)
variational inequality problems 
[90C33] 
(see: Linear complementarity problem)
variational inequality problems see: Sensitivity analysis of — 
variational-like inequalities 
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05] 
(see: Variational principles)
variational methods 
[65L99, 93-XX] 

(see: Optimization strategies for dynamic systems) 
Variational Principle see: Ekeland —; subdifferential —
Variational principles 

(62H30, 65Cxx, 65C30, 65C40, 65C50, 65C60, 90C05, 49J40) 

(referred to in: Generalized monotonicity: applications to
variational inequalities and equilibrium problems;
Hemivariational inequalities: applications in mechanics;
Hemivariational inequalities: eigenvalue problems;
Nonconvex energy functions: hemivariational inequalities;
Nonconvex-nonsmooth calculus of variations;
Quasidifferentiable optimization; Quasidifferentiable
optimization: algorithms for hypodifferentiable functions;
Quasidifferentiable optimization: algorithms for QD
functions; Quasidifferentiable optimization: applications;
Quasidifferentiable optimization: applications to
thermoelasticity; Quasidifferentiable optimization: calculus
of quasidifferentials; Quasidifferentiable optimization:
codifferentiable functions; Quasidifferentiable
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods;
Quasidifferentiable optimization: optimality conditions;
Quasidifferentiable optimization: stability of dynamic
systems; Quasidifferentiable optimization: variational
formulations; Quasivariational inequalities; Sensitivity
analysis of variational inequality problems; Solving
hemivariational inequalities by nonsmooth optimization
methods; Variational inequalities; Variational inequalities:
F. E. approach; Variational inequalities: geometric
interpretation, existence and uniqueness; Variational
inequalities: projected dynamical system)

(refers to: Hemivariational inequalities: applications in
mechanics; Hemivariational inequalities: eigenvalue
problems; Hemivariational inequalities: static problems;
Nonconvex energy functions: hemivariational inequalities;
Nonconvex-nonsmooth calculus of variations; Nonsmooth
and smoothing methods for nonlinear complementarity
problems and variational inequalities; Quasidifferentiable
optimization; Quasidifferentiable optimization: algorithms
for hypodifferentiable functions; Quasidifferentiable
optimization: algorithms for QD functions;
Quasidifferentiable optimization: applications;
Quasidifferentiable optimization: applications to
thermoelasticity; Quasidifferentiable optimization: calculus
of quasidifferentials; Quasidifferentiable optimization: 
codifferentiable functions; Quasidifferentiable 
optimization: Dini derivatives, Clarke derivatives;
Quasidifferentiable optimization: exact penalty methods; 
Quasidifferentiable optimization: optimality conditions; 
Quasidifferentiable optimization: stability of dynamic 
systems; Quasidifferentiable optimization: variational 
formulations; Quasivariational inequalities; Sensitivity 
analysis of variational inequality problems; Solving 
hemivariational inequalities by nonsmooth optimization 
methods; Variational inequalities; Variational inequalities: 
F. E. approach; Variational inequalities: geometric 
interpretation, existence and uniqueness; Variational 
inequalities: projected dynamical system) 


variational principles 


[49J40, 49J52, 49K27, 49M05, 49S05, 58C20, 58E30, 74G99,
74H99, 74Pxx, 90C33, 90C48] 

(see: Hemivariational inequalities: applications in 
mechanics; Nonsmooth analysis: Fréchet subdifferentials; 
Quasidifferentiable optimization: variational formulations) 


variational principles 


[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05] 
(see: Variational principles) 


variational problem see: ill-posed — 
variational problems 


[03H10, 49J27, 49K05, 49K10, 49K15, 49K20, 90C34] 

(see: Duality in optimal control with first order differential 
equations; Semi-infinite programming and control 
problems) 


variational problems 


[49J40, 49M05, 49S05, 74G99, 74H99, 74Pxx]
(see: Quasidifferentiable optimization: variational 
formulations) 


variational problems see: ill-posed —; implicit —
variational sensitivity 


[90C90] 
(see: Design optimization in computational fluid dynamics) 


variations see: calculus of —; inverse problem of the calculus
of —; Nonconvex-nonsmooth calculus of —


variations of Steiner trees 


[90C27]
(see: Steiner tree problems) 


varying dimension pivoting algorithms 


[90C05, 90C10]
(see: Simplicial pivoting algorithms for integer 
programming) 


Vasicek model with impulse perturbations 


[90C34, 91B28]
(see: Semi-infinite programming and applications in 
finance) 


VDSP 


[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 


vector 


[90C09, 90C10] 
(see: Oriented matroids) 


vector see: arc length —; augmenting —; characteristic —; 
control —; descent —; differential cost —;
dual —; feasible flow —; feasible high-order 
approximating —; feature —; generic cost —; geodesic 
gradient —; gradient —; high-order tangent 
approximating —; incidence —; Lagrange multiplier —; 
lexico-positive —; lexicographically positive —; position —; 
primitive integral —; proposal —; reference direction —; 
residual —; sign —; state —; steepest descent —; support 
of an integral —; support of a natural —; weight —; 
weighted characteristic — 

vector configuration 
[90C09, 90C10] 
(see: Oriented matroids) 

vector of decrease see: high-order approximating — 

vector estimates for parametric NLPS see: Bounds and 
solution — 

vector forward automatic differentiation 
[65D25, 68W30] 
(see: Complexity of gradients, Jacobians, and Hessians) 

vector generalized concavity 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 

vector geometry 
[26A24, 65K99, 85-08] 
(see: Automatic differentiation: geometry of satellites and 
tracking stations) 

vector inequality 
[90C29, 90C30] 
(see: Multi-objective optimization: Lagrange duality)

vector iteration see: control — 

vector iteration CVI see: Control — 

vector labeling 
[90C05, 90C10] 
(see: Simplicial pivoting algorithms for integer 
programming) 

vector lattice 
[90C33] 
(see: Equivalence between nonlinear complementarity 
problem and fixed point problem; Order complementarity; 
Topological methods in complementarity theory) 

vector lattice 
[90C33] 
(see: Order complementarity) 

vector machine see: generalized eigenvalue proximal 
support — 

vector machine problem see: Generalized eigenvalue proximal 
support — 

vector machines see: support — 

vector maximization method 
[90C29] 
(see: Multi-objective optimization: Pareto optimal
solutions, properties) 

vector minimization 
(see: Planning in the process industry) 

Vector optimization 
(90C29) 
(referred to in: Composite nonsmooth optimization; Image 
space approach to optimization) 
(refers to: Image space approach to optimization) 


vector optimization 
[49K27, 65K05, 90B50, 90C05, 90C29, 90C48, 91B06] 
(see: Multi-objective optimization and decision support 
systems; Set-valued optimization) 
vector optimization 
[46A20, 49K27, 52A01, 65K05, 90B50, 90C05, 90C29, 90C30, 
90C48, 91B06] 
(see: Composite nonsmooth optimization; Multi-objective 
optimization and decision support systems; Set-valued 
optimization) 
vector of an oriented matroid 
[90C09, 90C10] 
(see: Oriented matroids)
vector space 
[14R10, 15A03, 51N20] 
(see: Linear space)
vector space see: optimization in a — 
vector space model 
[90C09, 90C10]
(see: Optimization in classifying text documents)
vector-space models for entropy optimization for image 
reconstruction 
[94A08, 94A17]
(see: Maximum entropy principle: image reconstruction) 
vector spaces see: Increasing and convex-along-rays functions 
on topological —; Increasing and positively homogeneous 
functions on topological —; ordered — 
vector valued Lagrangian 
[90C29, 90C30] 
(see: Multi-objective optimization: Lagrange duality)
Vector variational inequalities 
(58E35, 90C29) 
vector variational inequalities 
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems) 
vector variational inequality problem 
[46N10, 49J40, 90C26]
(see: Generalized monotonicity: applications to variational
inequalities and equilibrium problems) 
vectorial matroid 
[90C09, 90C10] 
(see: Matroids)
vectors 
[14R10, 15A03, 51N20]
(see: Linear space)
vectors see: equivalent cost —; high-order approximating —; 
lexicographical ordering for n-dimensional —; ordering on 
binary — 
vectors space see: real — 
vegetative 
(see: State of the art in modeling agricultural systems) 
vehicle block 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
vehicle and duty scheduling problems see: Integrated — 
Vehicle routing 
(90B06) 
(referred to in: General routing problem; Stochastic vehicle
routing problems; Vehicle scheduling) 
(refers to: General routing problem; Stochastic vehicle 
routing problems; Vehicle scheduling) 

vehicle routing 
[90-02, 90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems; Operations 
research models for supply chain management and design) 

vehicle routing 
[90C10, 90C15] 
(see: Stochastic vehicle routing problems) 

vehicle routing problem 
[05-04, 90B06, 90C10, 90C11, 90C15, 90C27, 90C57] 
(see: Evolutionary algorithms in combinatorial 
optimization; Set covering, packing and partitioning 
problems; Stochastic vehicle routing problems; Vehicle 
routing) 

vehicle routing problem see: capacitated —; 
distance-constrained —; dynamic —; Metaheuristic 
algorithms for the —; stochastic — 

vehicle routing problem with backhauls 
[90B06] 
(see: Vehicle routing) 

Vehicle routing problem with simultaneous pickups and 
deliveries 
(00-02, 01-02, 03-02) 

vehicle routing problem with time windows 
[90B06] 
(see: Vehicle routing) 

vehicle routing problems see: approximate methods for 
solving —; constructive methods for solving —; exact 
methods for solving —; Stochastic — 

Vehicle scheduling 
(90B06, 90B35, 68M20, 90C27, 90B80, 90B10, 90C10) 
(referred to in: Airline optimization; General routing 
problem; Integer programming; Job-shop scheduling 
problem; MINLP: design and scheduling of batch processes; 
Stochastic vehicle routing problems; Vehicle routing) 
(refers to: Airline optimization; Complexity classes in 
optimization; Complexity theory; General routing problem; 
Integer programming; Job-shop scheduling problem; 
MINLP: design and scheduling of batch processes; 
Stochastic scheduling; Stochastic vehicle routing problems; 
Vehicle routing) 

vehicle scheduling problem 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

vehicle scheduling problem see: Multi-depot —; 
Single-depot — 

vehicle scheduling problems see: multi-depot —; 
Single-depot — 

Vehicle scheduling problems with a fixed number of vehicles 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

Vehicle scheduling problems with multiple types of vehicles 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 

vehicle scheduling problems with time constraints 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 


Vehicle scheduling with trip shifting 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
vehicles see: number of —; Vehicle scheduling problems with 
a fixed number of —; Vehicle scheduling problems with 
multiple types of — 
vehicles’ homogeneity/heterogeneity 
[00-02, 01-02, 03-02]
(see: Vehicle routing problem with simultaneous pickups 
and deliveries) 
venture capital investment 
[91B06, 91B60]
(see: Financial applications of multicriteria analysis) 
verification 
[34-xx, 34Bxx, 34Lxx, 90C60, 93E24]
(see: Complexity and large-scale least squares problems; 
Computational complexity theory) 
verification 
[65G20, 65G30, 65G40, 65H20]
(see: Interval analysis: intermediate terms; Interval analysis: 
nondifferentiable problems) 
verification see: automatic result — 
verifying equilibrium solutions 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction 
equilibrium) 
verifying feasibility see: Interval analysis: — 
version of the Hahn-Banach theorem see: Mazur-Orlicz — 
versus Multiperiod Models see: single — 
versus noninteractive methods see: interactive — 
vertex 
[05B35, 20F36, 20F55, 52C35, 57N65] 
(see: Hyperplane arrangements in optimization) 
vertex see: child of a —; co-optimal —; improper —; parent of 
a —; required —; shadow- —; transshipment — 
vertex algorithm see: shadow- — 
vertex (arc) deletion problem 
[90C35] 
(see: Feedback set problems) 
vertex (arc) set problem see: minimum feedback —; subset 
feedback —; subset minimum feedback — 
VERTEX COVER 
[03B50, 05C15, 05C62, 05C69, 05C85, 68Q25, 68R10, 68T15, 
68T30, 68W40, 90C27, 90C59, 90C60] 
(see: Complexity classes in optimization; Domination 
analysis in combinatorial optimization; Finite complete 
systems of many-valued logic algebras; Optimization 
problems in unit-disk graphs) 
vertex cover see: minimum weighted — 
Vertex Cover Problem see: minimum — 
vertex of a digraph 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
vertex of a graph 
[05C05, 05C40, 68R10, 90C35] 
(see: Network design problems) 
vertex insertion algorithm see: generic — 
vertex insertion (FVI) see: farthest — 
vertex insertion (NVI) see: nearest — 
vertex insertion (RVI) see: random — 
vertex insertion (VI)
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 

vertex matrix of an interval matrix 
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: eigenvalue bounds of interval 
matrices) 

vertex packing 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 

vertex in a rooted tree see: level of a — 

vertex set 
[49-01, 49K45, 49N10, 65K05, 90-01, 90C20, 90C26, 90C27, 
90C30, 91B52] 
(see: Bilevel linear programming: complexity, equivalence 
to minmax, concave programs; Monotonic optimization) 

vertex set see: feedback —; minimum feedback — 

vertex set problem see: feedback —; minimum weighted 
feedback —; unweighted feedback — 

vertex set V 
[90C35] 
(see: Graph coloring) 

vertex subset see: subgraph induced by a — 

vertex v at see: insertion of — 

vertex weights 
[05C15, 05C17, 05C35, 05C69, 90C22, 90C35] 
(see: Lovász number)

vertical linear complementarity problem 
[90C33] 
(see: Linear complementarity problem) 

vertices 
[05A, 15A, 51M, 52A, 52B, 52C, 62H, 68Q, 68R, 68U, 68W, 
90B, 90C] 
(see: Convex discrete optimization) 

vertices see: expected number of shadow- —; intersatured —; 
variance of the number of shadow- — 

vertices in a graph see: adjacent — 

very strong homomorphism 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 

Vessel Problem (SVP) see: seismic — 

veto threshold 
[90-XX] 
(see: Outranking methods) 

(VI) see: vertex insertion — 

via linear optimization see: Distance dependent protein force 
field — 

via mixed-integer linear optimization see: Global pairwise 
protein sequence alignment — 

via mixed-integer optimization see: Multi-class data 
classification —; Peptide identification — 

via negative fitness see: genetic engineering — 

via parametric programming see: Design of robust 
model-based controllers — 

via semidefinite programming see: Graph realization —; 
Maximum likelihood detection — 

via (w,i) see: subdivision — 


Vienna Fortran 
[05-02, 05-04, 15A04, 15A06, 68U99] 
(see: Alignment problem) 
view see: beam’s-eye- — 
view of linear semi-infinite programming see: perfect duality 
from the — 
violating point see: index of a constraint — 
violation of constraints 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
violators algorithm see: pool adjacent — 
VIP 
[90C15, 90C26, 90C33] 
(see: Stochastic bilevel programs)
virtual
[34-xx, 34Bxx, 34Lxx, 93E24] 
(see: Complexity and large-scale least squares problems) 
virtual depot
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
virtual displacements
[49J52, 49S05, 74G99, 74H99, 74Pxx, 90C33]
(see: Hemivariational inequalities: applications in 
mechanics) 
virtual processor 
[68W10, 90B15, 90C06, 90C30]
(see: Stochastic network problems: massively parallel 
solution) 
virtual source 
[90B10, 90C27]
(see: Shortest path tree algorithms) 
virtual source algorithm
[90B10, 90C27]
(see: Shortest path tree algorithms) 
virtual source concept in auction algorithms 
[90B10, 90C27]
(see: Shortest path tree algorithms)
virtual work see: principle of — 
visual binary star 
[90C26, 90C90]
(see: Global optimization in binary star astronomy) 
visual binary star see: spectroscopic — 
visual inference 
[90C26, 90C30] 
(see: Forecasting) 
visual interaction
[90C29, 90C70] 
(see: Fuzzy multi-objective linear programming) 
visual interactive method 
[90C29] 
(see: Multi-objective optimization; Interactive methods for
preference value functions) 
visualization see: Optimization-based — 
Vitalyevich see: Kantorovich, Leonid — 
VLSI routing 
[05C05, 05C85, 68Q25, 90B80] 
(see: Bottleneck steiner tree problems) 
vND 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods) 
VND
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods) 
vNDS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
VNDS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
vNFSS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods) 
vNS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
VNS 
[90-08, 90C26, 90C27, 90C59]
(see: Variable neighborhood search methods)
VNS see: basic —; bl- —; fH- —; j- —; Parallel —; pD- —; 
Reduced —; Skewed — 
vocabulary see: indexing —; optimal —; optimal indexing — 
Vogel approximation method 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees)
Volterra filter 
[90C26, 90C90] 
(see: Signal processing with higher order statistics)
Volterra model of conflicting populations 
[65G20, 65G30, 65G40, 65L99] 
(see: Interval analysis: differential equations) 
volume 
[90C05] 
(see: Continuous global optimization: applications)
volume 
[52B11, 52B45, 52B55] 
(see: Volume computation for polytopes: strategies and
performances) 
volume see: integral over a —; logarithmic —; normalized — 
Volume computation for polytopes: strategies and 
performances 
(52B11, 52B45, 52B55) 
(referred to in: Ellipsoid method; Quadratic programming 
over an ellipsoid) 
(refers to: Quadratic programming over an ellipsoid)
volume ellipsoid see: maximum-—; minimum- — 
volume formula see: integral over — 
volumetric method 
[49M20, 90-08, 90C25] 
(see: Nondifferentiable optimization: cutting plane
methods) 
von Neumann algebra 
[01A99, 90C99]
(see: Von Neumann, John) 
von Neumann architecture 
[01A99, 90C99]
(see: Von Neumann, John) 
Von Neumann, John 
(01A99, 90C99)
(referred to in: Duality theory: biduality in nonconvex 
optimization; Duality theory: triduality in global 
optimization; History of optimization) 


(refers to: Duality theory: biduality in nonconvex 
optimization; Duality theory: monoduality in convex 
optimization; Duality theory: triduality in global 
optimization; History of optimization) 

von Stackelberg game 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 

von Stackelberg games 
[90C25, 90C29, 90C30, 90C31] 
(see: Bilevel programming: optimality conditions and 
duality) 

Voronoi diagram 
[68Q20, 90B80, 90C27] 
(see: Optimal triangulations; Voronoi diagrams in facility 
location) 


Voronoi diagram 
[90B80, 90B85, 90C27] 
(see: Multifacility and restricted location problems; 
Voronoi diagrams in facility location) 


Voronoi diagram see: farthest-point — 


Voronoi diagrams 
[90B85, 90C27] 
(see: Single facility location: circle covering problem) 


Voronoi diagrams in facility location 
(90C27, 90B80) 
(referred to in: Combinatorial optimization algorithms in 
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Warehouse location problem) 
(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Production-distribution system design problem; 
Resource allocation for epidemic control; Single facility 
location: circle covering problem; Single facility location: 
multi-objective euclidean distance location; Single facility 
location: multi-objective rectilinear distance location; 
Stochastic transportation and location problems; 
Warehouse location problem) 


Voronoi edge 
[90B80, 90C27] 
(see: Voronoi diagrams in facility location) 
Voronoi point
[90B80, 90C27] 
(see: Voronoi diagrams in facility location) 
Voronoi region 
[90B80, 90C27] 
(see: Voronoi diagrams in facility location) 
voting 
[55R15, 55R35, 65K05, 90C11] 
(see: Deterministic and probabilistic optimization models 
for data classification) 
vous communication see: rendez- — 
voxels 
[90C90] 
(see: Optimization in medical imaging) 
VRP 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
VRPB 
[90B06] 
(see: Vehicle routing) 
VRPTW 
[05-04, 90C27] 
(see: Evolutionary algorithms in combinatorial 
optimization) 
VSM 
[90C09, 90C10]
(see: Optimization in classifying text documents) 
VSP 
[68M20, 90B06, 90B10, 90B35, 90B80, 90C10, 90C27] 
(see: Vehicle scheduling) 
VSP see: p- — 


W 


(w,i) see: subdivision via — 
w-weighted Tchebycheff metric 

[90C11, 90C29] 

(see: Multi-objective mixed integer programming) 
W*-algebra

[01A99, 90C99] 

(see: Von Neumann, John) 
wait see: zero- — 
wait-and-see 

[90C15] 

(see: Two-stage stochastic programs with recourse) 
waits see: younger brother — 
Walford-one algorithm see: Smith- — 
Walford one-reducible graph see: Smith- — 
walk 

[90C35] 

(see: Minimum cost flow problem) 
walk see: directed — 
walk search see: random — 
Walker method see: overdetermined Yule- — 
Wallenius procedure see: Zionts- — 


Walras law 

[91B50] 

(see: Walrasian price equilibrium)

Walrasian price equilibrium 

(91B50)

(referred to in: Equilibrium networks; Financial
equilibrium; Generalized monotonicity: applications to 
variational inequalities and equilibrium problems; 
Oligopolistic market equilibrium; Spatial price 
equilibrium; Traffic network equilibrium) 

(refers to: Equilibrium networks; Financial equilibrium; 
Generalized monotonicity: applications to variational 
inequalities and equilibrium problems; Oligopolistic 
market equilibrium; Spatial price equilibrium; Traffic 
network equilibrium) 


Walrasian price equilibrium 

[91B50] 

(see: Walrasian price equilibrium)

Wang algorithm see: Goldfarb- — 

Wardrop first principle 

[90B06, 90B20, 91B50] 

(see: Traffic network equilibrium)

Wardrop second principle 

[90B06, 90B20, 91B50] 

(see: Traffic network equilibrium)

Warehouse location problem 

(90B80, 90B85)

(referred to in: Combinatorial optimization algorithms in
resource allocation problems; Facilities layout problems; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Single facility location: circle covering problem; 
Single facility location: multi-objective euclidean distance 
location; Single facility location: multi-objective rectilinear 
distance location; Stochastic transportation and location 
problems; Voronoi diagrams in facility location) 

(refers to: Combinatorial optimization algorithms in 
resource allocation problems; Competitive facility location; 
Facility location with externalities; Facility location 
problems with spatial interaction; Facility location with 
staircase costs; Global optimization in Weber’s problem 
with attraction and repulsion; MINLP: application in 
facility location-allocation; Multifacility and restricted 
location problems; Network location: covering problems; 
Optimizing facility location with euclidean and rectilinear 
distances; Production-distribution system design problem; 
Resource allocation for epidemic control; Single facility 
location: circle covering problem; Single facility location: 
multi-objective euclidean distance location; Single facility 
location: multi-objective rectilinear distance location; 
Stochastic transportation and location problems; Voronoi 
diagrams in facility location) 
warehouse location problem
[90C10, 90C11, 90C27, 90C57] 
(see: Integer programming) 
warmstart see: advanced — 
Wastewater system, optimization of 
(76D55) 
water capacity constraints see: maximum oil, gas and — 
water demand 
[90C30, 90C35] 
(see: Optimization in water resources) 
water environment see: minimizing the degradation in quality 
of both — 
water pumping facilities see: surface — 
water resource planning 
[90C30, 90C35] 
(see: Optimization in water resources) 
water resource systems see: conjunctive use of — 
water resources see: Optimization in —; stochastic approach to 
optimization in — 
water resources planning under uncertainty on hydrological 
exogenous inflow and demand 
[90C30, 90C35] 
(see: Optimization in water resources) 
water resources policies see: nonanticipative —; 
nonanticipativity — 
water storage capacity see: nodes with — 
water transportation systems 
[90C30, 90C35] 
(see: Optimization in water resources) 
wavelength-division multiplexing 
[05C85] 
(see: Directed tree networks) 
wavelengths see: assignment of — 
way analysis of variance see: one- — 
way compatibility see: both- — 
way graph partitioning problem see: k- — 
way polytope see: k- — 
way table see: k- — 
way transportation polytope see: k- — 
weak assumptions see: under — 
weak compactness 
[46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05]
(see: Minimax theorems) 
weak convergence 
[90C15] 
(see: Approximation of extremum problems with 
probability functionals) 
weak convergence of probability measures 
[90C11, 90C15, 90C31] 
(see: Stochastic integer programming: continuity, stability, 
rates of convergence) 
weak discrete convergence 
[90C15]
(see: Approximation of extremum problems with 
probability functionals) 
weak duality 
[90C06, 90C30]
(see: Duality for semidefinite programming; Lagrangian 
duality: BASICS; Saddle point theory and optimality 
conditions) 
weak duality see: strong and — 


weak duality relation 
[49K05, 49K10, 49K15, 49K20] 
(see: Duality in optimal control with first order differential 
equations) 
weak duality result 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
weak duality theorem 
[49-XX, 90-XX, 93-XX] 
(see: Duality theory: monoduality in convex optimization) 
weak efficiency 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
weak efficiency see: local — 
weak extremal 
[41A10, 47N10, 49K15, 49K27] 
(see: High-order maximum principle for abnormal 
extremals) 
weak extremal see: abnormal — 
weak homomorphism 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx,
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
weak minimizer 
[49K27, 90C29, 90C48] 
(see: Set-valued optimization) 
weak order 
[90-XX, 90C29] 
(see: Outranking methods; Preference modeling) 
weak prespecification 
[03B52, 03E72, 47S40, 68T27, 68T35, 68Uxx, 90Bxx, 91Axx, 
91B06, 92C60] 
(see: Boolean and fuzzy relations) 
weak principle of optimality 
[90C31, 90C39] 
(see: Multiple objective dynamic programming) 
Weak Slater CQ 
[49K27, 49K40, 90C30, 90C31] 
(see: First order constraint qualifications) 
weak stationarity 
[58C20, 58E30, 90C46, 90C48] 
(see: Nonsmooth analysis: weak stationarity) 
weak stationarity see: Nonsmooth analysis: — 
weak tangent 
[49K27, 58C20, 58E30, 90C48] 
(see: Nonsmooth analysis: Fréchet subdifferentials) 
weakly efficient 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming; 
Multi-objective optimization: pareto optimal solutions, 
properties) 
weakly efficient point 
[90C29] 
(see: Generalized concavity in multi-objective optimization) 
weakly efficient point see: local — 
weakly efficient solution 
[90C29] 
(see: Multiple objective programming support) 
weakly efficient solution 
[90C29] 
(see: Multi-objective optimization: pareto optimal
solutions, properties) 
weakly informed 
(see: Beam selection in radiotherapy treatment design) 
weakly L (v)-differentiable family of measures 
[90C15] 
(see: Derivatives of probability measures) 
weakly necessary constraint 
[90C05, 90C20] 
(see: Redundancy in nonlinear programs) 
weakly nondominated solution 
[90C29] 
(see: Multi-objective optimization: pareto optimal 
solutions, properties; Multiple objective programming 
support) 
weakly nondominated solution 
[90C29]
(see: Multi-objective optimization: pareto optimal 
solutions, properties) 
weakly noninferior solution 
[90C29]
(see: Multi-objective optimization: pareto optimal 
solutions, properties) 
weakly overtaking optimality 
[49Jxx, 91Axx]
(see: Infinite horizon control and dynamic games) 
weakly Pareto optimal solution 
[90C29]
(see: Multi-objective optimization: pareto optimal 
solutions, properties) 
weakly polynomial time algorithm 
[90C35]
(see: Minimum cost flow problem) 
weakly pseudoconcave function see: U- — 
wealth see: surplus — 
weather 
(see: State of the art in modeling agricultural systems) 
Weber objective function see: multifacility — 
Weber problem 
[90B85, 90C26] 
(see: MINLP: application in facility location-allocation; 
Multifacility and restricted location problems) 
Weber problem 
[90B85, 90C26, 90C90] 
(see: Global optimization in Weber’s problem with 
attraction and repulsion; Multifacility and restricted 
location problems) 
Weber problem see: generalized —; multifacility —; 
Steiner- — 
Weber problem with attraction and repulsion 
[90C26, 90C90] 
(see: Global optimization in Weber’s problem with 
attraction and repulsion) 


Weber's problem with attraction and repulsion see: Global 
optimization in — 
Weber-Rawls objective function see: multifacility — 
Weber-Rawls problem 
[90B85]
(see: Multifacility and restricted location problems) 
wedge see: universal — 
wedge filters 
[68W01, 90-00, 90C90, 92-08, 92C50]
(see: Optimization based framework for radiation therapy)
wedge orientation optimization see: beam angle selection 
and — 
weekly space-time network 
(see: Railroad locomotive scheduling) 
weight 
[68Q20, 90-XX, 90B10, 90C26, 90C27] 
(see: Invexity and its applications; Optimal triangulations; 
Outranking methods; Shortest path tree algorithms) 
weight bounds see: lower — 
weight clique see: maximum — 
weight clique problem see: maximum — 
weight CMST see: nonunit —; unit — 
weight common mutated sequence see: minimum — 
weight of a constraint 
[90C10] 
(see: Maximum constraint satisfaction: relaxations and 
upper bounds) 
weight cost see: mean- — 
weight of a customer 
[90B80, 90B85] 
(see: Warehouse location problem) 
weight cut see: maximum mean- — 
weight of evidence see: expected — 
weight feedback arc set problem see: minimum — 
weight function of a matroid 
[90C09, 90C10] 
(see: Matroids) 
weight independent sets see: maximum — 
weight optimization see: beam — 
weight Steiner triangulation see: minimum — 
weight trace see: maximum — 
weight triangulation see: minimum — 
weight vector 
[05C60, 05C69, 05C85, 37B25, 68W01, 90C20, 90C27, 90C35, 
90C59, 91A22] 
(see: Heuristics for maximum clique and independent set;
Replicator dynamics in combinatorial optimization) 
weighted 
[90C09, 90C10] 
(see: Matroids)
weighted assignment model see: the multi-resource — 
weighted barycenter 
[90C20] 
(see: Standard quadratic optimization problems:
applications) 
weighted bipartite matching problem 
[90C05, 90C10, 90C27, 90C35] 
(see: Assignment and matching)
weighted characteristic vector
[05C60, 05C69, 05C85, 37B25, 68W01, 90C20, 90C27, 90C35, 
90C59, 91A22] 
(see: Heuristics for maximum clique and independent set; 
Replicator dynamics in combinatorial optimization) 
weighted clique number 
[05C60, 05C69, 05C85, 37B25, 68W01, 90C20, 90C27, 90C35, 
90C59, 91A22] 
(see: Heuristics for maximum clique and independent set; 
Replicator dynamics in combinatorial optimization) 
weighted coloring 
[90C35] 
(see: Graph coloring) 
weighted distance see: maximum — 
weighted Euclidean norm see: A- — 
weighted feedback vertex set problem see: minimum — 
weighted graph 
[90C35] 
(see: Feedback set problems) 
weighted graph bipartization problem see: minimum — 
weighted graph coloring problem 
[90C35] 
(see: Graph coloring) 
weighted graph planarization see: branch and bound 
algorithm for — 
weighted independent set see: maximum — 
weighted least squares 
[90C30, 90C52, 90C53, 90C55] 
(see: Gauss-Newton method: Least squares, relation to
Newton’s method) 
weighted least squares problem 
[65Fxx] 
(see: Least squares problems)
weighted matching problem 
[90C05, 90C10, 90C27, 90C35] 
(see: Assignment and matching)
weighted matroid 
[90C09, 90C10] 
(see: Matroids)
weighted MAX-SAT problem 
[03B05, 68P10, 68Q25, 68R05, 68T15, 68T20, 90C09, 90C27, 
94C10] 
(see: Maximum satisfiability problem)
weighted maximum norm 
[90C30, 90C52, 90C53, 90C55] 
(see: Asynchronous distributed optimization algorithms) 
weighted planar graph see: maximum — 
weighted problem in OR 
[90B80, 90B85] 
(see: Warehouse location problem) 
weighted stability number 
[05C69, 05C85, 68W01, 90C59] 
(see: Heuristics for maximum clique and independent set) 
weighted-sums program 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming)
weighted-sums programs with constraints 
[90C11, 90C29] 
(see: Multi-objective mixed integer programming)
weighted Tchebycheff metric see: w- — 


weighted tree association graph 
[05C60, 05C69, 37B25, 90C20, 90C27, 90C35, 90C59, 91A22] 
(see: Replicator dynamics in combinatorial optimization) 
weighted vertex cover see: minimum — 
weighted sup norm
[49L20, 90C40]
(see: Dynamic programming: stochastic shortest path 
problems) 
weighted sup-norm contraction
[49L99] 
(see: Dynamic programming: average cost per stage 
problems) 
weighting space reduction 
[90C29] 
(see: Multi-objective optimization; Interactive methods for 
preference value functions) 
weights 
[65K05, 90C27, 90C29, 90C30, 90C57, 91C15] 
(see: Multi-objective optimization; Interactive methods for 
preference value functions; Optimization-based 
visualization) 
weights see: barycentric —; vertex — 
Weir dual see: Mond- — 
Weiszfeld procedure 
[90B85] 
(see: Single facility location: multi-objective euclidean 
distance location) 
well see: type A —; type B — 
well bore model 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
well-conditioned matrix 
[15-XX, 65-XX, 90-XX] 
(see: Cholesky factorization) 
well-conditioned problem 
[90C31] 
(see: Sensitivity and stability in NLP) 
well-defined start-ups 
(see: Planning in the process industry) 
well-determined system of nonlinear equations 
[90C30]
(see: Nonlinear least squares problems) 
well function see: double- — 
well oil flowrate 
[76T30, 90C11, 90C90]
(see: Mixed integer optimization in well scheduling) 
well oil rate constraints see: upper and lower — 
well-posed 
[49J40, 49M30, 65K05, 65M30, 65M32]
(see: Ill-posed variational problems) 
well-posed problem 
[90C05, 90C25, 90C29, 90C30, 90C31] 
(see: Nondifferentiable optimization: parametric 
programming) 
well-posed problem see: Levitin-Polyak — 
well-posedness 
[49J40, 49M30, 65K05, 65M30, 65M32] 
(see: Ill-posed variational problems) 
well scheduling see: Mixed integer optimization in —
well-separated pair decomposition 
[05C15, 05C62, 05C69, 05C85, 90C27, 90C59] 
(see: Optimization problems in unit-disk graphs) 
well switches see: maximum number of — 
wells see: connection of —; operational status of the —; set 
of —; type A —; type B —
wells of type a see: gas lift —; naturally flowing — 
wells of type b see: gas lift —; naturally flowing — 
West corner rule see: North- — 
Weyl fundamental theorem 
[15A39, 90C05] 
(see: Tucker homogeneous systems of linear relations) 
what-if-when scenarios 
[90C06, 90C10, 90C11, 90C30, 90C57, 90C90] 
(see: Modeling difficult optimization problems) 
wheel procedure see: roulette — 
when scenarios see: what-if- — 
Whitney savings heuristic 
[68T99, 90C27] 
(see: Capacitated minimum spanning trees) 
Whitney statistic see: Mann- —
wide process networks under uncertainty see: Bilevel 
programming framework for enterprise- — 
width see: excess — 
Wiener-Hopf equations 
[49J40, 62H30, 65C30, 65C40, 65C50, 65C60, 65Cxx, 90C05]
(see: Variational principles) 
Wiener measure 
[65K05, 68Q05, 68Q10, 68Q25, 90C05, 90C25, 90C26] 
(see: Information-based complexity and information-based 
optimization) 
Wiener model 
[62C10, 65K05, 90C10, 90C15, 90C26]
(see: Bayesian global optimization) 
Wiener probability measure 
[60J65, 68Q25]
(see: Adaptive global search) 
Wiener process 
[60J65, 68Q25]
(see: Adaptive global search) 
Wilhelm see: Leibniz, Gottfried —
Wilhelm Leibniz see: Gottfried — 
Williams algorithm see: Esau- — 
Wilson equation 
[90C26, 90C90] 
(see: Global optimization in phase and chemical reaction 
equilibrium) 
Wilson equation see: regular solution of the — 
window constraints see: time — 
windows see: time —; vehicle routing problem with time — 
wiring problem see: backboard — 
with mathematical rigor 
[65G20, 65G30, 65G40, 65H20] 


(see: Interval analysis: unconstrained and constrained 
optimization) 
(with respect to another) see: pseudomonotone bifunction — 
without decomposition see: heat exchanger network 
synthesis — 
without pivoting see: guaranteed to be stable — 
without using (sub)gradients parametric representations see: 
necessary optimality condition — 
Wolfe see: Frank- — 
Wolfe algorithm see: Frank- —; regularized Frank- —
Wolfe decomposition see: Dantzig- —; nonlinear Dantzig- —; 
regularized Frank- —
Wolfe dual 
[90C26] 
(see: Invexity and its applications) 
Wolfe reduced gradient method 
[65K05, 65K10] 
(see: ABS algorithms for optimization) 
Wolfe test 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
Wolfowitz method see: Kiefer- —
Woodbury formula see: Sherman-Morrison- —
wOR 
[76T30, 90C11, 90C90] 
(see: Mixed integer optimization in well scheduling) 
word patterns 
[90C09, 90C10] 
(see: Optimization in classifying text documents) 
words see: meaningful — 
work 
[03D15, 68Q05, 68Q15] 
(see: Parallel computing: complexity classes) 
work see: principle of virtual — 
work first algorithm see: mandatory — 
working basis 
[49M25, 90-08, 90C05, 90C06, 90C08, 90C15] 
(see: Simple recourse problem: primal method) 
workloadBalanced 
[65K05, 65Y05, 65Y10, 65Y20, 68W10] 
(see: Interval analysis: parallel methods for global 
optimization) 
world see: model —; real — 
world problem see: real- — 
Worsley bounds see: Hunter- — 
Worsley upper bound see: Hunter- — 
worst-case analysis 
[60J65, 62C20, 68Q25, 90C15] 
(see: Adaptive global search; Stochastic programming: 
minimax approach) 
worst-case analysis

[62C20, 90C15]

(see: Stochastic programming: minimax approach) 
worst-case approach 

[90C15, 90C26, 90C33] 

(see: Stochastic bilevel programs) 

worst-case complexity 

[90C35] 

(see: Minimum cost flow problem)

worst-case optimality 

[65D25, 68W30] 

(see: Complexity of gradients, Jacobians, and Hessians) 
worst-case performance guarantee 

[05C85] 

(see: Directed tree networks) 

WPE 

[91B50] 

(see: Walrasian price equilibrium)

wrapping effect 

[65G20, 65G30, 65G40, 65L99] 

(see: Interval analysis: differential equations)


X 


x see: decision variables — 

X-ray crystallography: Shake and bake approach see: Phase 
problem in — 

X-ray diffraction data see: Optimization techniques for phase 
retrieval based on single-crystal — 

x variables see: full space of — 


Y 


Yadegar linearization see: Frieze- — 
Ye potential function see: Tanabe-Todd- —
yield see: §2-based — 
yield curve see: interest rate — 
yield to maturity 
[90C34, 91B28] 
(see: Semi-infinite programming and applications in
finance) 
yOP 
(see: Integrated planning and scheduling)
York Times see: the New — 
Yosida-Hewitt decomposition 
[90C15] 
(see: Stochastic programming: nonanticipativity and 
Lagrange multipliers)
Yosida-Hewitt theorem 
[90C15] 
(see: Stochastic programming: nonanticipativity and
Lagrange multipliers)
Young inequality 
[90C05, 90C25] 
(see: Young programming)
Young inequality see: Fenchel- — 
Young programming 
(90C25, 90C05)
(refers to: Linear programming)


Young programming 
[90C05, 90C25] 
(see: Young programming) 
younger brother waits 
[49J35, 49K35, 62C20, 91A05, 91A40] 
(see: Minimax game tree searching) 
Yuan algorithm see: Dai- — 
Yule-Walker method see: overdetermined — 


Z 


z-critical cone 
[90C30, 90C33] 
(see: Optimization with equilibrium constraints: 
A piecewise SQP approach) 
Zamolodchikov differential equation see: Knizhnik- —
Zangwill algorithm 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
Zangwill theorem 
[90C30] 
(see: Rosen’s method, global convergence, and Powell’s 
conjecture) 
Zemel measure
[90B06, 90B35, 90C06, 90C10, 90C27, 90C39, 90C57, 90C59, 
90C60, 90C90] 
(see: Traveling salesman problem) 
Zeolite Association see: atlas of the International — 
zeolite separation and catalysis: optimization methods see: 